-
Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(\bfmuv)\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(\bftauv)\%$, respectively. The product of the decay constant and Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(\mufdsxvcsresult)_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(\taufdsxvcsresult)_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(\mufdsresult)_{μν}$\,MeV and ${f_{D^+_s}}=(\taufdsresult)_{τν}$\,MeV, respectively. Conversely, taking the value for $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(\muvcsresult)_{μν}$ and $|V_{cs}| = (\tauvcsresult)_{τν}$, respectively.
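For context, the step from a measured branching fraction to $f_{D_s^+}|V_{cs}|$ conventionally relies on the lowest-order Standard Model expression for the leptonic width. The relation below is that textbook formula, given as a sketch and not quoted from the paper; the symbols $G_F$ (Fermi constant), $m_\ell$ (lepton mass), $m_{D_s^+}$ (meson mass) and $\tau_{D_s^+}$ (meson lifetime) are assumptions introduced here, and radiative corrections are neglected.

$$\Gamma(D_s^+\to\ell^+\nu_\ell)=\frac{G_F^2}{8\pi}\,f_{D_s^+}^2\,|V_{cs}|^2\,m_\ell^2\,m_{D_s^+}\left(1-\frac{m_\ell^2}{m_{D_s^+}^2}\right)^2,\qquad \mathcal{B}(D_s^+\to\ell^+\nu_\ell)=\frac{\tau_{D_s^+}}{\hbar}\,\Gamma(D_s^+\to\ell^+\nu_\ell)$$

Under these assumptions, a measured $\mathcal{B}$ fixes the product $f_{D_s^+}|V_{cs}|$, and an external input for either factor then yields the other.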
Submitted 16 July, 2024;
originally announced July 2024.
-
Video-Language Alignment Pre-training via Spatio-Temporal Graph Transformer
Authors:
Shi-Xue Zhang,
Hongfa Wang,
Xiaobin Zhu,
Weibo Gu,
Tianjin Zhang,
Chun Yang,
Wei Liu,
Xu-Cheng Yin
Abstract:
Video-language alignment is a crucial multi-modal task that benefits various downstream applications, e.g., video-text retrieval and video question answering. Existing methods either utilize multi-modal information in video-text pairs or apply global and local alignment techniques to promote alignment precision. However, these methods often fail to fully explore the spatio-temporal relationships among vision tokens within a video and across different video-text pairs. In this paper, we propose a novel Spatio-Temporal Graph Transformer module (dubbed STGT) to uniformly learn spatial and temporal contexts for video-language alignment pre-training. Specifically, our STGT combines spatio-temporal graph structure information with attention in the transformer block, effectively utilizing the spatio-temporal contexts. In this way, we can model the relationships between vision tokens, improving video-text alignment precision to the benefit of downstream tasks. In addition, we propose a self-similarity alignment loss to explore the inherent self-similarity in the video and text. Building on the initial optimization achieved by contrastive learning, it can further improve the alignment accuracy between video and text. Experimental results on challenging downstream tasks, including video-text retrieval and video question answering, verify the superior performance of our method.
Submitted 16 July, 2024;
originally announced July 2024.
-
REMM: Rotation-Equivariant Framework for End-to-End Multimodal Image Matching
Authors:
Han Nie,
Bin Luo,
Jun Liu,
Zhitao Fu,
Weixing Liu,
Xin Su
Abstract:
We present REMM, a rotation-equivariant framework for end-to-end multimodal image matching, which fully encodes rotational differences of descriptors in the whole matching pipeline. Previous learning-based methods mainly focus on extracting modal-invariant descriptors, while consistently ignoring rotational invariance. In this paper, we demonstrate that REMM, which includes a multimodal feature learning module and a cyclic shift module, is very useful for multimodal image matching. We first learn modal-invariant features through the multimodal feature learning module. Then, we design the cyclic shift module to rotationally encode the descriptors, which makes them robust to any angle and greatly improves the performance of rotation-equivariant matching. To validate our method, we establish a comprehensive rotation and scale-matching benchmark for evaluating the anti-rotation performance of multimodal images, which contains a combination of multi-angle and multi-scale transformations from four publicly available datasets. Extensive experiments show that our method outperforms existing methods on this benchmark and generalizes well to independent datasets. Additionally, we conduct an in-depth analysis of the key components of REMM to validate the improvements brought about by the cyclic shift module. Code and dataset at https://github.com/HanNieWHU/REMM.
Submitted 16 July, 2024;
originally announced July 2024.
-
TCFormer: Visual Recognition via Token Clustering Transformer
Authors:
Wang Zeng,
Sheng Jin,
Lumin Xu,
Wentao Liu,
Chen Qian,
Wanli Ouyang,
Ping Luo,
Xiaogang Wang
Abstract:
Transformers are widely used in computer vision and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, a fixed token distribution disregards the semantic meaning of different image regions, resulting in sub-optimal performance. To address this issue, we propose the Token Clustering Transformer (TCFormer), which generates dynamic vision tokens based on semantic meaning. Our dynamic tokens possess two crucial characteristics: (1) representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and representing them using fine tokens. Through extensive experimentation across various applications, including image classification, human pose estimation, semantic segmentation, and object detection, we demonstrate the effectiveness of our TCFormer. The code and models for this work are available at https://github.com/zengwang430521/TCFormer.
Submitted 15 July, 2024;
originally announced July 2024.
-
A Survey of Distance-Based Vessel Trajectory Clustering: Data Pre-processing, Methodologies, Applications, and Experimental Evaluation
Authors:
Maohan Liang,
Ryan Wen Liu,
Ruobin Gao,
Zhe Xiao,
Xiaocai Zhang,
Hua Wang
Abstract:
Vessel trajectory clustering, a crucial component of maritime intelligent transportation systems, provides valuable insights for applications such as anomaly detection and trajectory prediction. This paper presents a comprehensive survey of the most prevalent distance-based vessel trajectory clustering methods, which encompass two main steps: trajectory similarity measurement and clustering. First, we conduct a thorough literature review using relevant keywords to gather and summarize pertinent research papers and datasets. The paper then discusses the principal methods of data pre-processing that prepare data for further analysis. The survey progresses to detail the leading algorithms for measuring vessel trajectory similarity and the main clustering techniques used in the field today. Furthermore, the various applications of trajectory clustering within the maritime context are explored. Finally, the paper evaluates the effectiveness of different algorithm combinations and pre-processing methods through experimental analysis, focusing on their impact on the performance of distance-based trajectory clustering algorithms. The experimental results demonstrate the effectiveness of various trajectory clustering algorithms and notably highlight the significant improvements that trajectory compression techniques contribute to the efficiency and accuracy of trajectory clustering. This comprehensive approach ensures a deep understanding of current capabilities and future directions in vessel trajectory clustering.
Submitted 13 July, 2024;
originally announced July 2024.
-
Unified Implementation of Relativistic Wave Function Methods: 4C-iCIPT2 as a Showcase
Authors:
Ning Zhang,
Wenjian Liu
Abstract:
In parallel to the unified construction of relativistic Hamiltonians based solely on physical arguments [J. Chem. Phys. 160, 084111 (2024)], a unified implementation of relativistic wave function methods is achieved here via programming techniques (e.g., template metaprogramming and polymorphism in C++). That is, once the code for constructing the Hamiltonian matrix is made ready, all the rest can be generated automatically from existing templates used for the nonrelativistic counterparts. This is facilitated by breaking a second-quantized relativistic Hamiltonian down to diagrams that are topologically the same as those required for computing the basic coupling coefficients between spin-free configuration state functions (CSF). Moreover, both time reversal and binary double point group symmetries can readily be incorporated into molecular integrals and Hamiltonian matrix elements. The latter can first be evaluated in the space of (randomly selected) spin-dependent determinants and then transformed to that of spin-dependent CSFs, thanks to simple relations in between. As a showcase, we consider here the no-pair four-component relativistic iterative configuration interaction with selection and perturbation correction (4C-iCIPT2), which is a natural extension of the spin-free iCIPT2 [J. Chem. Theory Comput. 17, 949 (2021)], and can provide near-exact numerical results within the manifold of positive energy states (PES), as demonstrated by numerical examples.
Submitted 15 July, 2024;
originally announced July 2024.
-
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis
Authors:
Weizhi Liu,
Yue Li,
Dongdong Lin,
Hui Tian,
Haizhou Li
Abstract:
Amid the burgeoning development of generative models like diffusion models, the task of differentiating synthesized audio from its natural counterpart grows more daunting. Deepfake detection offers a viable solution to combat this challenge. Yet, this defensive measure unintentionally fuels the continued refinement of generative models. Watermarking emerges as a proactive and sustainable tactic, preemptively regulating the creation and dissemination of synthesized content. Thus, this paper, as a pioneer, proposes the generative robust audio watermarking method (Groot), presenting a paradigm for proactively supervising the synthesized audio and its source diffusion models. In this paradigm, the processes of watermark generation and audio synthesis occur simultaneously, facilitated by parameter-fixed diffusion models equipped with a dedicated encoder. The watermark embedded within the audio can subsequently be retrieved by a lightweight decoder. The experimental results highlight Groot's outstanding performance, particularly in terms of robustness, surpassing that of the leading state-of-the-art methods. Beyond its impressive resilience against individual post-processing attacks, Groot exhibits exceptional robustness when facing compound attacks, maintaining an average watermark extraction accuracy of around 95%.
Submitted 15 July, 2024;
originally announced July 2024.
-
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
Authors:
Yi Zhang,
Wang Zeng,
Sheng Jin,
Chen Qian,
Ping Luo,
Wentao Liu
Abstract:
Recent years have witnessed increasing research attention towards pedestrian detection that takes advantage of different sensor modalities (e.g. RGB, IR, Depth, LiDAR and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike previous specialist models that only process one or a pair of specific modality inputs, MMPedestron is able to process multiple modal inputs and their dynamic combinations. The proposed approach comprises a unified encoder for modal representation and fusion and a general head for pedestrian detection. We introduce two extra learnable tokens, i.e. MAA and MAF, for adaptive multi-modal feature fusion. In addition, we construct the MMPD dataset, the first large-scale benchmark for multi-modal pedestrian detection. This benchmark incorporates existing public datasets and a newly collected dataset called EventPed, covering a wide range of sensor modalities including RGB, IR, Depth, LiDAR, and Event data. With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored for a specific sensor modality. For example, it achieves 71.1 AP on COCO-Persons and 72.6 AP on LLVIP. Notably, our model achieves comparable performance to the InternImage-H model on CrowdHuman with 30x fewer parameters. Code and data are available at https://github.com/BubblyYi/MMPedestron.
Submitted 14 July, 2024;
originally announced July 2024.
-
All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era
Authors:
Bo Chen,
Xinyi Dai,
Huifeng Guo,
Wei Guo,
Weiwen Liu,
Yong Liu,
Jiarui Qin,
Ruiming Tang,
Yichao Wang,
Chuhan Wu,
Yaxiong Wu,
Hao Zhang
Abstract:
Recommender systems (RS) are vital for managing information overload and delivering personalized content, responding to users' diverse information needs. The emergence of large language models (LLMs) offers a new horizon for redefining recommender systems with vast general knowledge and reasoning capabilities. Standing across this LLM era, we aim to integrate recommender systems into a broader picture, and pave the way for more comprehensive solutions for future research. Therefore, we first offer a comprehensive overview of the technical progression of recommender systems, particularly focusing on language foundation models and their applications in recommendation. We identify two evolution paths of modern recommender systems -- via list-wise recommendation and conversational recommendation. These two paths finally converge at LLM agents with superior capabilities of long-term memory, reflection, and tool intelligence. Along these two paths, we point out that the information effectiveness of the recommendation is increased, while the user's acquisition cost is decreased. Technical features, research methodologies, and inherent challenges for each milestone along the path are carefully investigated -- from traditional list-wise recommendation to LLM-enhanced recommendation to recommendation with LLM agents. Finally, we highlight several unresolved challenges crucial for the development of future personalization technologies and interfaces and discuss the future prospects.
Submitted 14 July, 2024;
originally announced July 2024.
-
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Authors:
Zihao Zhou,
Shudong Liu,
Maizhen Ning,
Wei Liu,
Jindong Wang,
Derek F. Wong,
Xiaowei Huang,
Qiufeng Wang,
Kaizhu Huang
Abstract:
Exceptional mathematical reasoning ability is one of the key features that demonstrate the power of large language models (LLMs). How to comprehensively define and evaluate the mathematical abilities of LLMs, and even reflect the user experience in real-world scenarios, has emerged as a critical issue. Current benchmarks predominantly concentrate on problem-solving capabilities, which presents a substantial risk of model overfitting and fails to accurately represent genuine mathematical reasoning abilities. In this paper, we argue that if a model really understands a problem, it should be robustly and readily applied across a diverse array of tasks. Motivated by this, we introduce MATHCHECK, a well-designed checklist for testing task generalization and reasoning robustness, as well as an automatic tool to generate checklists efficiently. MATHCHECK includes multiple mathematical reasoning tasks and robustness test types to facilitate a comprehensive evaluation of both mathematical reasoning ability and behavior testing. Utilizing MATHCHECK, we develop MATHCHECK-GSM and MATHCHECK-GEO to assess mathematical textual reasoning and multi-modal reasoning capabilities, respectively, serving as upgraded versions of benchmarks including GSM8k, GeoQA, UniGeo, and Geometry3K. We adopt MATHCHECK-GSM and MATHCHECK-GEO to evaluate over 20 LLMs and 11 MLLMs, assessing their comprehensive mathematical reasoning abilities. Our results demonstrate that while frontier LLMs like GPT-4o continue to excel in various abilities on the checklist, many other model families exhibit a significant decline. Further experiments indicate that, compared to traditional math benchmarks, MATHCHECK better reflects true mathematical abilities and represents mathematical intelligence more linearly, thereby supporting our design. On our MATHCHECK, we can easily conduct detailed behavior analysis to deeply investigate models.
Submitted 11 July, 2024;
originally announced July 2024.
-
Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models
Authors:
Wanling Gao,
Yunyou Huang,
Dandan Cui,
Zhuoming Yu,
Wenjing Liu,
Xiaoshuang Liang,
Jiahui Zhao,
Jiyue Xie,
Hao Li,
Li Ma,
Ning Ye,
Yumiao Kang,
Dingfeng Luo,
Peng Pan,
Wei Huang,
Zhongmou Liu,
Jizhong Hu,
Gangyuan Zhao,
Chongrong Jiang,
Fan Huang,
Tianyi Wei,
Suqin Tang,
Bingjie Xia,
Zhifei Zhang,
Jianfeng Zhan
Abstract:
A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-phase inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.
Submitted 11 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
Fusion of atomic W-like states in cavity QED systems
Authors:
Cheng-Yun Ding,
Wan-Fang Liu,
Li-Hua Zhang
Abstract:
It is well known that maximally entangled GHZ states can achieve perfect teleportation and superdense coding, whereas maximally entangled W states cannot. However, it has been demonstrated that there exists a special class of non-maximally entangled W states, called \textit{W-like} states, which can overcome this limitation. Therefore, it is of great significance to prepare such W-like states for efficient quantum communication. Here, we propose two kinds of novel and efficient fusion schemes for atomic W-like states based on the large-detuning interactions between several atoms and a single-mode cavity field, with which large-scale atomic $|\mathcal{W}_{N+M-1}\rangle$ and $|\mathcal{W}_{N+M+T-2}\rangle$ states can be prepared, respectively, from two small-scale atomic $|\mathcal{W}_{N}\rangle$ and $|\mathcal{W}_{M}\rangle$ states and three small-scale atomic $|\mathcal{W}_{N}\rangle$, $|\mathcal{W}_{M}\rangle$ and $|\mathcal{W}_{T}\rangle$ states, by detecting the states of one or two of the fused atoms. In particular, although the fusion process of our scheme involves particle loss, the corresponding success probability is high and fixed, which may lead to high fusion efficiency. Furthermore, an investigation of the resource cost and a feasibility analysis indicate that our protocol is simple and feasible under current experimental conditions. All these results suggest that it provides an alternative strategy for preparing large-scale atomic W-like states for perfect teleportation and superdense coding.
Submitted 10 July, 2024;
originally announced July 2024.
-
Timing Recovery for Non-Orthogonal Multiple Access with Asynchronous Clock
Authors:
Qingxin Lu,
Haide Wang,
Wenxuan Mo,
Ji Zhou,
Weiping Liu,
Changyuan Yu
Abstract:
A passive optical network (PON) based on non-orthogonal multiple access (NOMA) meets the demands for low latency and high capacity. In the NOMA-PON, the asynchronous clock between the strong and weak optical network units (ONUs) causes timing error and phase noise on the signal of the weak ONU. The theoretical derivation shows that the timing error and phase noise can be independently compensated. In this Letter, we propose a timing recovery (TR) algorithm based on an absolute timing error detector (Abs TED) and a pilot-based carrier phase recovery (CPR) to eliminate the timing error and phase noise separately. An experiment for 25G NOMA-PON is set up to verify the feasibility of the proposed algorithms. The weak ONU can achieve the 20% soft-decision forward error correction limit after compensating for timing error and phase noise. In conclusion, the proposed TR and the pilot-based CPR show great potential for the NOMA-PON.
Submitted 10 July, 2024;
originally announced July 2024.
-
Shadow of slowly rotating Kalb-Ramond black holes
Authors:
Wentao Liu,
Di Wu,
Jieci Wang
Abstract:
Real astronomical objects possess spin, yet deriving exact solutions for rotating black holes within gravitational theories is a formidable challenge. To understand the shadow of rotating black holes in Lorentz-violating spacetimes induced by antisymmetric tensor fields, known as Kalb-Ramond (KR) fields, we have focused on the slow-rotation approximation framework. Using this approach, we have obtained first-order rotation series solutions, which describe slowly rotating KR black holes. For these solutions, we have plotted the black hole shadow contours under various parameters using the numerical backward ray-tracing method. As the Lorentz-violating parameter increases, not only does the apparent size of the black hole shadow decrease, but the effects of rotation, such as the D-shaped structure and frame-dragging, are also amplified. Furthermore, the KR field enhances gravitational lensing, causing the shadow to occupy a larger area within the photon ring. This distinctive feature can differentiate KR gravity from general relativity. Additionally, using the latest observational data from EHT on M87* and Sgr A*, we have provided constraints on the Lorentz-violating parameter of rotating KR black holes. We found that, compared to static black holes, rotating black holes allow for the presence of stronger Lorentz violation effects.
Submitted 13 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
On the Iitaka volumes of log canonical surfaces and threefolds
Authors:
Guodu Chen,
Jingjun Han,
Wenfei Liu
Abstract:
Given positive integers $d\geqκ$, and a subset $Γ\subset [0,1]$, let $\mathrm{Ivol}_{\mathrm{lc}}^Γ(d,κ)$ denote the set of Iitaka volumes of $d$-dimensional projective log canonical pairs $(X, B)$ such that the Iitaka--Kodaira dimension $κ(K_X+B)=κ$ and the coefficients of $B$ come from $Γ$. In this paper, we show that, if $Γ$ satisfies the descending chain condition, then so does $\mathrm{Ivol}_\mathrm{lc}^Γ(d,κ)$ for $d\leq 3$. In case $d\leq 3$ and $κ=1$, $Γ$ and $\mathrm{Ivol}_\mathrm{lc}^Γ(d,κ)$ are shown to share more topological properties, such as closedness in $\mathbb{R}$ and local finiteness of accumulation complexity. In higher dimensions, we show that the set of Iitaka volumes for $d$-dimensional klt pairs with Iitaka dimension $\geq d-2$ satisfies the DCC, partially confirming a conjecture of Zhan Li.
We give a more detailed description of the sets of Iitaka volumes for the following classes of projective log canonical surfaces: (1) smooth properly elliptic surfaces, (2) projective log canonical surfaces with coefficients from $\{0\}$ or $\{0,1\}$. In particular, the minima as well as the minimal accumulation points are found in these cases.
Submitted 10 July, 2024;
originally announced July 2024.
-
In-Orbit Processing or Not? Sunlight-Aware Task Scheduling for Energy-Efficient Space Edge Computing Networks
Authors:
Weisen Liu,
Zeqi Lai,
Qian Wu,
Hewu Li,
Qi Zhang,
Zonglun Li,
Yuanjie Li,
Jun Liu
Abstract:
With the rapid evolution of space-borne capabilities, space edge computing (SEC) is becoming a new computation paradigm for future integrated space and terrestrial networks. Satellite edges adopt advanced on-board hardware, which not only enables new opportunities to perform complex intelligent tasks in orbit, but also involves new challenges due to the additional energy consumption in the power-constrained space environment. In this paper, we present PHOENIX, an energy-efficient task scheduling framework for emerging SEC networks. PHOENIX exploits a key insight that in the SEC network, there always exist a number of sunlit edges which are illuminated during the entire orbital period and have a sufficient energy supply from the sun. PHOENIX accomplishes energy-efficient in-orbit computing by judiciously offloading space tasks to "sunlight-sufficient" edges or to the ground. Specifically, PHOENIX first formulates the SEC battery energy optimizing (SBEO) problem which aims at minimizing the average battery energy consumption while satisfying various task completion constraints. Then PHOENIX incorporates a sunlight-aware scheduling mechanism to solve the SBEO problem and schedule SEC tasks efficiently. Finally, we implement a PHOENIX prototype and build an SEC testbed. Extensive data-driven evaluations demonstrate that, compared to other state-of-the-art solutions, PHOENIX can effectively reduce SEC battery energy consumption by up to 54.8% and prolong battery lifetime to 2.9$\times$ while still completing tasks on time.
Submitted 9 July, 2024;
originally announced July 2024.
-
Plasmonic Vortices Host Magnetoelectric Interactions
Authors:
Atreyie Ghosh,
Sena Yang,
Yanan Dai,
W. Vincent Liu,
Hrvoje Petek
Abstract:
The vector cross product and pseudoscalar dot products of electric (E) and magnetic (H) fields are separately finite in vacuum transverse electric and magnetic (TEM) plane waves, and angular momentum structured light. Current theories of interactions beyond the standard model of particle physics invoke non-zero dot(E,H) as the source term in the axion law that describes interactions with the cosmological dark matter axion particles outside of the quartet of Maxwell's equations. The non-zero dot(E,H) also drives relativistic spin-charge magnetoelectric excitations of axion quasiparticles at a distinctively higher condensed matter scale in magnetic and topological materials. Yet, how to drive coherent dot(E,H) responses is unknown, and provides motivation to examine the field polarizations in structured light on a deep sub-diffraction limited spatial scale and sub-optical cycle temporal scale by ultrafast nonlinear photoemission electron microscopy. By analytical theory and ultrafast coherent photoemission electron microscopy, we image dot(E,H) fields in surface plasmon polariton vortex cores at subwavelength scales, where we find that the magnetoelectric relative to the dipole density is intensified on a ~10 nm diameter scale as a universal property of plasmonic vortex fields. The generation and nanoscale localization of dot(E,H) fields introduces the magnetoelectric symmetry class, having the parity and time reversal broken, but the joint parity-time reversal symmetry preserved. The ability to image the optical fields of plasmonic vortex cores opens the research of ultrafast microscopy of magnetoelectric responses and interactions with axion quasiparticles in solid state materials.
Submitted 9 July, 2024;
originally announced July 2024.
-
MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images
Authors:
Ziyang Xu,
Huangxuan Zhao,
Ziwei Cui,
Wenyu Liu,
Chuansheng Zheng,
Xinggang Wang
Abstract:
Artificial intelligence has become a crucial tool for medical image analysis. As an advanced cerebral angiography technique, Digital Subtraction Angiography (DSA) poses a challenge where the radiation dose to humans is proportional to the image count. By reducing images and using AI interpolation instead, the radiation can be cut significantly. However, DSA images present more complex motion and structural features than natural scenes, making interpolation more challenging. We propose MoSt-DSA, the first work that uses deep learning for DSA frame interpolation. Unlike natural scene Video Frame Interpolation (VFI) methods that extract unclear or coarse-grained features, we devise a general module that models motion and structural context interactions between frames in an efficient full convolution manner by adjusting optimal context range and transforming contexts into linear functions. Benefiting from this, MoSt-DSA is also the first method that directly achieves any number of interpolations at any time steps with just one forward pass during both training and testing. In extensive comparisons with 7 representative VFI models for interpolating 1 to 3 frames, MoSt-DSA demonstrates robust results across 470 DSA image sequences (each typically 152 images), with average SSIM over 0.93 and average PSNR over 38 (standard deviations of less than 0.030 and 3.6, respectively), comprehensively achieving state-of-the-art performance in accuracy, speed, visual effect, and memory usage. Our code is available at https://github.com/ZyoungXu/MoSt-DSA.
Submitted 9 July, 2024;
originally announced July 2024.
-
Regularization by Nonlinear Noise for PDEs: Well-posedness and Finite Time Extinction
Authors:
Wei Hong,
Shihu Li,
Wei Liu
Abstract:
This work focuses on the global existence, uniqueness and the Feller property for a class of stochastic partial differential equations by adding a suitable nonlinear noise, while the corresponding deterministic equations may only have local solutions. In particular, we discover a new phenomenon that for a potentially explosive deterministic system, an appropriate intervention of nonlinear noise can not only prevent blow-up but also lead to the finite time extinction of the associated stochastic system.
Furthermore, our main results have broad applications, including not only all locally monotone stochastic equations in the variational framework (cf. \cite{LR2,LR1,RSZ1}), but also several new models such as stochastic $p$-Laplace equations with heat sources, stochastic 3D Navier-Stokes equations, stochastic quasi-geostrophic equations and stochastic surface growth models, which provide positive answers to some longstanding open problems in this field.
Submitted 9 July, 2024;
originally announced July 2024.
-
SKYCASTLE: Taming LEO Mobility to Facilitate Seamless and Low-latency Satellite Internet Services
Authors:
Jihao Li,
Hewu Li,
Zeqi Lai,
Qian Wu,
Weisen Liu,
Xiaomo Wang,
Yuanjie Li,
Jun Liu,
Qi Zhang
Abstract:
Emerging integrated space and terrestrial networks (ISTN) built upon low earth orbit (LEO) satellite constellations aim at providing planet-wide Internet services, not only for residential users, but also for mobile users (e.g., in airplane and cruise scenarios). Efficiently managing global mobility and keeping connections active for mobile users is critical for ISTN operators. However, our quantitative analysis identifies that existing mobility management (MM) schemes suffer from frequent connection interruptions and long latency in ISTN scenarios. The fundamental challenge stems from a unique characteristic of ISTNs: not only are users mobile, but core network infrastructures (i.e., LEO satellites) are also frequently changing their locations in the network. To facilitate seamless and low-latency satellite Internet services, this paper presents SKYCASTLE, a novel network-based global mobility management mechanism. SKYCASTLE incorporates two key techniques to address frequent connection interruptions in ISTNs. First, to reduce the interruption time, SKYCASTLE adopts distributed satellite anchors to track the location changes of mobile nodes, manage handovers and avoid routing convergence. Second, SKYCASTLE leverages an anchor manager to schedule MM functionalities at satellites to reduce deployment costs while guaranteeing low latency. Extensive evaluations combining real constellation information and mobile user trajectories show that SKYCASTLE can improve uninterrupted time by up to 55.8% and reduce latency by 47.8% as compared to other existing MM solutions.
Submitted 9 July, 2024;
originally announced July 2024.
-
Using Graph Neural Networks and Frequency Domain Data for Automated Operational Modal Analysis of Populations of Structures
Authors:
Xudong Jian,
Yutong Xia,
Gregory Duthé,
Kiran Bacsa,
Wei Liu,
Eleni Chatzi
Abstract:
The Population-Based Structural Health Monitoring (PBSHM) paradigm has recently emerged as a promising approach to enhance data-driven assessment of engineering structures by facilitating transfer learning between structures with some degree of similarity. In this work, we apply this concept to the automated modal identification of structural systems. We introduce a Graph Neural Network (GNN)-based deep learning scheme to identify modal properties, including natural frequencies, damping ratios, and mode shapes of engineering structures based on the Power Spectral Density (PSD) of spatially-sparse vibration measurements. Systematic numerical experiments are conducted to evaluate the proposed model, employing two distinct truss populations that possess similar topological characteristics but varying geometric (size, shape) and material (stiffness) properties. The results demonstrate that, once trained, the proposed GNN-based model can identify modal properties of unseen structures within the same structural population with good efficiency and acceptable accuracy, even in the presence of measurement noise and sparse measurement locations. The GNN-based model exhibits advantages over the classic Frequency Domain Decomposition (FDD) method in terms of identification speed, as well as against an alternate Multi-Layer Perceptron (MLP) architecture in terms of identification accuracy, rendering this a promising tool for PBSHM purposes.
Submitted 8 July, 2024;
originally announced July 2024.
-
Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations
Authors:
Bowen Shen,
Zheng Lin,
Daren Zha,
Wei Liu,
Jian Luan,
Bin Wang,
Weiping Wang
Abstract:
Structured pruning fundamentally reduces computational and memory overheads of large language models (LLMs) and offers a feasible solution for end-side LLM deployment. Structurally pruned models remain dense and high-precision, highly compatible with further tuning and compression. However, as the coarse-grained structured pruning poses large damage to the highly interconnected model, achieving a high compression ratio for scaled-up LLMs remains a challenge. In this paper, we introduce a task-agnostic structured pruning approach coupled with a compact Transformer architecture design. The proposed approach, named TransAct, reduces transitional activations inside multi-head attention (MHA) and multi-layer perceptron (MLP) modules, while preserving the inter-module activations that are sensitive to perturbations. Hence, the LLM is pruned into an intra-module low-rank architecture, significantly reducing weights, KV Cache and attention computation. TransAct is implemented on the LLaMA model and evaluated on downstream benchmarks. Results verify the optimality of our approach at high compression with respect to both efficiency and performance. Further, ablation studies reveal the strength of activation-guided iterative pruning and provide experimental analysis on the redundancy of MHA and MLP modules.
Submitted 8 July, 2024;
originally announced July 2024.
-
Experimental investigation of direct non-Hermitian measurement and uncertainty relation towards high-dimensional quantum domain
Authors:
Yi-Tao Wang,
Zhao-An Wang,
Zhi-Peng Li,
Xiao-Dong Zeng,
Jia-Ming Ren,
Wei Liu,
Yuan-Ze Yang,
Nai-Jie Guo,
Lin-Ke Xie,
Jun-You Liu,
Yu-Hang Ma,
Jian-Shun Tang,
Chengjie Zhang,
Chuan-Feng Li,
Guang-Can Guo
Abstract:
Non-Hermitian dynamics in quantum systems have unveiled novel phenomena, yet the implementation of valid non-Hermitian quantum measurement remains a challenge, because a universal quantum projective mechanism on the complete but skewed non-Hermitian eigenstates is not explicit in experiment. This limitation hinders the direct acquisition of non-Hermitian observable statistics (e.g., non-Hermitian population dynamics), and also constrains investigations of non-Hermitian quantum measurement properties such as the uncertainty relation. Here, we address these challenges by presenting a non-Hermitian projective protocol and investigating the non-Hermitian uncertainty relation. We derive the uncertainty relation for pseudo-Hermitian (PH) observables that is generalized beyond the Hermitian ones. We then investigate the projective properties of general quantum states onto complete non-Hermitian eigenvectors, and present a quantum simulating method to apply the valid non-Hermitian projective measurement on a direct-sum dilated space. Subsequently, we experimentally construct a quantum simulator in the quantum optical circuit and realize the 3-dimensional non-Hermitian quantum measurement on the single-photon qutrit. Employing this platform, we explore the uncertainty relation experimentally with different PH metrics. Our non-Hermitian quantum measurement method is state-independent and directly outputs the non-Hermitian quantum projective statistics, paving the way for studies of extensive non-Hermitian observables in the quantum domain.
Submitted 7 July, 2024;
originally announced July 2024.
-
MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models
Authors:
Yunjia Xi,
Weiwen Liu,
Jianghao Lin,
Bo Chen,
Ruiming Tang,
Weinan Zhang,
Yong Yu
Abstract:
Conversational recommender systems (CRSs) aim to capture user preferences and provide personalized recommendations through multi-round natural language dialogues. However, most existing CRS models mainly focus on dialogue comprehension and preferences mining from the current dialogue session, overlooking user preferences in historical dialogue sessions. The preferences embedded in the user's historical dialogue sessions and the current session exhibit continuity and sequentiality, and we refer to CRSs with this characteristic as sequential CRSs. In this work, we leverage memory-enhanced LLMs to model the preference continuity, primarily focusing on addressing two key issues: (1) redundancy and noise in historical dialogue sessions, and (2) the cold-start users problem. To this end, we propose a Memory-enhanced Conversational Recommender System Framework with Large Language Models (dubbed MemoCRS) consisting of user-specific memory and general memory. User-specific memory is tailored to each user for their personalized interests and implemented by an entity-based memory bank to refine preferences and retrieve relevant memory, thereby reducing the redundancy and noise of historical sessions. The general memory, encapsulating collaborative knowledge and reasoning guidelines, can provide shared knowledge for users, especially cold-start users. With the two kinds of memory, LLMs are empowered to deliver more precise and tailored recommendations for each user. Extensive experiments on both Chinese and English datasets demonstrate the effectiveness of MemoCRS.
Submitted 6 July, 2024;
originally announced July 2024.
-
OneRestore: A Universal Restoration Framework for Composite Degradation
Authors:
Yu Guo,
Yuan Gao,
Yuxu Lu,
Huilin Zhu,
Ryan Wen Liu,
Shengfeng He
Abstract:
In real-world scenarios, image impairments often manifest as composite degradations, presenting a complex interplay of elements such as low light, haze, rain, and snow. Despite this reality, existing restoration methods typically target isolated degradation types, thereby falling short in environments where multiple degrading factors coexist. To bridge this gap, our study proposes a versatile imaging model that consolidates four physical corruption paradigms to accurately represent complex, composite degradation scenarios. In this context, we propose OneRestore, a novel transformer-based framework designed for adaptive, controllable scene restoration. The proposed framework leverages a unique cross-attention mechanism, merging degraded scene descriptors with image features, allowing for nuanced restoration. Our model allows versatile input scene descriptors, ranging from manual text embeddings to automatic extractions based on visual attributes. Our methodology is further enhanced through a composite degradation restoration loss, using extra degraded images as negative samples to fortify model constraints. Comparative results on synthetic and real-world datasets demonstrate OneRestore as a superior solution, significantly advancing the state-of-the-art in addressing complex, composite degradations.
Submitted 10 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Segment Any 4D Gaussians
Authors:
Shengxiang Ji,
Guanjun Wu,
Jiemin Fang,
Jiazhong Cen,
Taoran Yi,
Wenyu Liu,
Qi Tian,
Xinggang Wang
Abstract:
Modeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations. In this paper, we propose Segment Any 4D Gaussians (SA4D), one of the first frameworks to segment anything in the 4D digital world based on 4D Gaussians. In SA4D, an efficient temporal identity feature field is introduced to handle Gaussian drifting, with the potential to learn precise identity features from noisy and sparse input. Additionally, a 4D segmentation refinement process is proposed to remove artifacts. Our SA4D achieves precise, high-quality segmentation within seconds in 4D Gaussians and shows the ability to remove, recolor, compose, and render high-quality anything masks. More demos are available at: https://jsxzs.github.io/sa4d/.
Submitted 12 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Occupancy as Set of Points
Authors:
Yiang Shi,
Tianheng Cheng,
Qian Zhang,
Wenyu Liu,
Xinggang Wang
Abstract:
In this paper, we explore a novel point representation for 3D occupancy prediction from multi-view images, which is named Occupancy as Set of Points. Existing camera-based methods tend to exploit dense volume-based representation to predict the occupancy of the whole scene, making it hard to focus on the special areas or areas out of the perception range. In comparison, we present the Points of Interest (PoIs) to represent the scene and propose OSP, a novel framework for point-based 3D occupancy prediction. Owing to the inherent flexibility of the point-based representation, OSP achieves strong performance compared with existing methods and excels in terms of training and inference adaptability. It extends beyond traditional perception boundaries and can be seamlessly integrated with volume-based methods to significantly enhance their effectiveness. Experiments on the Occ3D nuScenes occupancy benchmark show that OSP has strong performance and flexibility. Code and models are available at https://github.com/hustvl/osp.
Submitted 4 July, 2024;
originally announced July 2024.
-
Compact ultra-broadband light coupling on chip via nonadiabatic pumping
Authors:
Weiwei Liu,
Chijun Li,
Bing Wang,
Tianyan Chai,
Lingzhi Zheng,
Zhuoxiong Liu,
Haoru Zhang,
Shuaifei Ren,
Xiaohong Li,
Cheng Zeng,
Jinsong Xia,
Peixiang Lu
Abstract:
Enlarging the bandwidth capacity of integrated photonic systems demands efficient and broadband light coupling among optical elements, which has been a vital issue in integrated photonics. Here, we have developed a compact ultra-broadband light coupling strategy based on nonadiabatic pumping in coupled optical waveguides, and experimentally demonstrated the designs on the thin-film lithium niobate on insulator (LNOI) platform. We found that nonadiabatic transition would produce a decreased dispersion of the phases related to eigenstates in the waveguides. As a consequence, we realized high-efficiency directional transfer between edge states for various wavelengths covering a 1-dB bandwidth of ~320 nm in experiment (>400 nm in simulation), with a coupling length (~50 μm) approximately 1/10 of that required in the adiabatic regime. Furthermore, we have constructed complex functional devices including a beamsplitter and multiple-level cascaded networks for broadband light routing and splitting. Our work preserves significant advantages simultaneously in extending the operation bandwidth and minimizing the footprint, which demonstrates great potential for large-scale and compact photonic integration on chip.
Submitted 4 July, 2024;
originally announced July 2024.
-
Continuous-variable quantum digital signatures against coherent attacks
Authors:
Yi-Fan Zhang,
Wen-Bo Liu,
Bing-Hong Li,
Hua-Lei Yin,
Zeng-Bing Chen
Abstract:
Quantum digital signatures (QDS), which utilize correlated bit strings among sender and recipients, guarantee the authenticity, integrity and non-repudiation of classical messages based on quantum laws. Continuous-variable (CV) quantum protocols with heterodyne and homodyne measurement have the obvious advantages of low-cost implementation and easy wavelength division multiplexing. However, security analyses in previous studies are limited to the proof against collective attacks in finite-size scenarios. Moreover, existing multi-bit CV QDS schemes have primarily focused on adapting single-bit protocols for simplicity of security proof, often sacrificing signature efficiency. Here, we introduce a CV QDS protocol designed to withstand general coherent attacks through the use of a cutting-edge fidelity test function, while achieving high signature efficiency by employing a refined one-time universal hashing signing technique. Our protocol is proved to be robust against finite-size effects and excess noise in quantum channels. Simulation results demonstrate a significant reduction of over 6 orders of magnitude in signature length for a megabit message signing task compared to existing CV QDS protocols, and this advantage expands as the message size grows. Our work offers a solution with enhanced security and efficiency, paving the way for large-scale deployment of CV QDS in future quantum networks.
Submitted 3 July, 2024;
originally announced July 2024.
-
Consistent Point Orientation for Manifold Surfaces via Boundary Integration
Authors:
Weizhou Liu,
Xingce Wang,
Haichuan Zhao,
Xingfei Xue,
Zhongke Wu,
Xuequan Lu,
Ying He
Abstract:
This paper introduces a new approach for generating globally consistent normals for point clouds sampled from manifold surfaces. Given that the generalized winding number (GWN) field generated by a point cloud with globally consistent normals is a solution to a PDE with jump boundary conditions and possesses harmonic properties, and the Dirichlet energy of the GWN field can be defined as an integral over the boundary surface, we formulate a boundary energy derived from the Dirichlet energy of the GWN. Taking as input a point cloud with randomly oriented normals, we optimize this energy to restore the global harmonicity of the GWN field, thereby recovering the globally consistent normals. Experiments show that our method outperforms state-of-the-art approaches, exhibiting enhanced robustness to noise, outliers, complex topologies, and thin structures. Our code can be found at https://github.com/liuweizhou319/BIM.
Submitted 3 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
Submitted 3 July, 2024;
originally announced July 2024.
-
Strange metal at van Hove singularity in magnetic heterostructure
Authors:
Yi-Hui Xing,
Wu-Ming Liu,
Xiao-Tian Zhang
Abstract:
We investigate the non-Fermi liquid (NFL) behaviors at the two-dimensional (2D) magnetic heterostructure interface between a normal metal and a magnetic insulator. The normal metal undergoes a Lifshitz transition with van Hove singularity (VHS) saddle points tuned onto the Fermi surface, which represents a convex-to-concave geometric transition of the 2D Fermi surface. By coupling to critical spin fluctuations via interfacial exchange interactions, we uncover a novel NFL phase at the VHS with the quasiparticle lifetime scaling as $\sim ω^{1/2}$. This NFL feature dominates over the rest of the Fermi surface points in the low-energy limit. At finite energies, it crosses over to the marginal Fermi liquid. Importantly, we demonstrate that the strange metal behaviors, which include the linear-in-$T$ resistivity and the $T\ln (1/T)$ specific heat, are already present with the spatially uniform interaction, which broadens the scenario in Science 381, 790 (2023) (https://www.science.org/doi/abs/10.1126/science.abq6011). Furthermore, we propose that non-magnetic ARPES can directly probe the NFL features at the heterostructure interface. Our findings explore the interplay of VHS in lower dimension, disorder and strong interactions, which paves a novel path to study the quantum critical phenomenon, 2D magnetism and spintronics.
Submitted 2 July, 2024;
originally announced July 2024.
-
Feynman-Kac Operator Expectation Estimator
Authors:
Jingyuan Li,
Wei Liu
Abstract:
The Feynman-Kac Operator Expectation Estimator (FKEE) is an innovative method for estimating the target Mathematical Expectation $\mathbb{E}_{X\sim P}[f(X)]$ without relying on a large number of samples, in contrast to the commonly used Markov Chain Monte Carlo (MCMC) Expectation Estimator. FKEE comprises diffusion bridge models and approximation of the Feynman-Kac operator. The key idea is to use the solution to the Feynman-Kac equation at the initial time $u(x_0,0)=\mathbb{E}[f(X_T)|X_0=x_0]$. We use Physically Informed Neural Networks (PINN) to approximate the Feynman-Kac operator, which enables the incorporation of diffusion bridge models into the expectation estimator and significantly improves the efficiency of using data while substantially reducing the variance. Diffusion Bridge Model is a more general MCMC method. In order to incorporate extensive MCMC algorithms, we propose a new diffusion bridge model based on the Minimum Wasserstein distance. This diffusion bridge model is universal and reduces the training time of the PINN. FKEE also reduces the adverse impact of the curse of dimensionality and weakens the assumptions on the distribution of $X$ and performance function $f$ in the general MCMC expectation estimator. The theoretical properties of this universal diffusion bridge model are also shown. Finally, we demonstrate the advantages and potential applications of this method through various concrete experiments, including the challenging task of approximating the partition function in the random graph model such as the Ising model.
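The identity $u(x_0,0)=\mathbb{E}[f(X_T)|X_0=x_0]$ quoted in the abstract can be checked numerically in the simplest setting. The sketch below is not FKEE: it is the plain sample-based Monte Carlo baseline, under the assumptions that $X_t$ is standard Brownian motion and $f(x)=x^2$, for which the backward heat equation $u_t + \tfrac{1}{2}u_{xx}=0$ with $u(x,T)=x^2$ has the closed form $u(x_0,0)=x_0^2+T$. The function names are hypothetical.

```python
import math
import random

def feynman_kac_mc(f, x0, T, n_samples=200_000, seed=0):
    """Plain Monte Carlo estimate of u(x0, 0) = E[f(X_T) | X_0 = x0].

    Assumption: X is standard Brownian motion, so X_T = x0 + sqrt(T) * Z
    with Z ~ N(0, 1).
    """
    rng = random.Random(seed)
    scale = math.sqrt(T)
    total = 0.0
    for _ in range(n_samples):
        total += f(x0 + scale * rng.gauss(0.0, 1.0))
    return total / n_samples

def exact_u(x0, T):
    """Closed form for f(x) = x**2: u(x0, 0) = x0**2 + T."""
    return x0 ** 2 + T

estimate = feynman_kac_mc(lambda x: x * x, x0=1.0, T=1.0)
print(f"Monte Carlo: {estimate:.4f}, exact: {exact_u(1.0, 1.0):.4f}")
```

The estimate converges to the exact value only at the usual $1/\sqrt{n}$ Monte Carlo rate, which is the sample cost the abstract says FKEE aims to avoid.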
Submitted 2 July, 2024;
originally announced July 2024.
-
SymPoint Revolutionized: Boosting Panoptic Symbol Spotting with Layer Feature Enhancement
Authors:
Wenlong Liu,
Tianyu Yang,
Qizhi Yu,
Lei Zhang
Abstract:
SymPoint is an initial attempt that utilizes point set representation to solve the panoptic symbol spotting task on CAD drawings. Despite its considerable success, it overlooks graphical layer information and suffers from prohibitively slow training convergence. To tackle these issues, we introduce SymPoint-V2, a robust and efficient solution featuring novel, streamlined designs that overcome these limitations. In particular, we first propose a Layer Feature-Enhanced module (LFE) to encode the graphical layer information into the primitive feature, which significantly boosts the performance. We also design a Position-Guided Training (PGT) method to make it easier to learn, which accelerates the convergence of the model in the early stages and further promotes performance. Extensive experiments show that our model achieves better performance and faster convergence than its predecessor SymPoint on the public benchmark. Our code and trained models are available at https://github.com/nicehuster/SymPointV2.
Submitted 1 July, 2024;
originally announced July 2024.
-
Memory Kernel Coupling Theory: Obtain Time Correlation Function from Higher-order Moments
Authors:
Wei Liu,
Yu Su,
Yao Wang,
Wenjie Dou
Abstract:
Dynamical observables can often be described by time correlation functions (TCFs). However, efficiently calculating TCFs for complex quantum systems is a significant challenge, which generally requires solving the full dynamics of the systems. This Letter presents the memory kernel coupling theory (MKCT), a general formalism for evaluating TCFs. The MKCT builds upon Mori's memory kernel formalism for TCFs. Our theory further decomposes the memory kernel into auxiliary kernels. Rapid decay of auxiliary kernels allows us to truncate the coupled equations of motion with high accuracy. Notably, only higher-order moments are sufficient as the input for obtaining TCFs. While this formalism is general, we carry out the numerical demonstration for a typical open quantum system--the spin-boson model.
Submitted 2 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Sequential Manipulation Against Rank Aggregation: Theory and Algorithm
Authors:
Ke Ma,
Qianqian Xu,
Jinshan Zeng,
Wei Liu,
Xiaochun Cao,
Yingfei Sun,
Qingming Huang
Abstract:
Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since it is independent of rank aggregation and lacks effective protection mechanisms, we disrupt the data collection process by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From the game-theoretic perspective, the confrontation scenario between the online manipulator and the ranker who takes control of the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. Then we demonstrate that the equilibrium in the above game is potentially favorable to the adversary by analyzing the vulnerability of the sampling algorithms such as Bernoulli and reservoir methods. According to the above theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of the sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces the maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, the corroborating empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.
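The abstract names Bernoulli and reservoir sampling as the data-collection schemes under analysis. For readers unfamiliar with them, the following minimal sketch shows both in their textbook form (reservoir sampling as Algorithm R). It illustrates only the samplers themselves, not the manipulation policies of the paper; the stream of `(winner, loser)` pairs and all function names are hypothetical.

```python
import random

def bernoulli_sample(stream, p, rng):
    """Keep each item independently with probability p."""
    return [item for item in stream if rng.random() < p]

def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

rng = random.Random(0)
# Hypothetical stream of pairwise comparisons as (winner, loser) pairs.
comparisons = [(i, i + 1) for i in range(1000)]
kept_bernoulli = bernoulli_sample(comparisons, p=0.1, rng=rng)
kept_reservoir = reservoir_sample(comparisons, k=100, rng=rng)
print(len(kept_bernoulli), len(kept_reservoir))
```

Bernoulli sampling yields a random sample size, whereas the reservoir always holds exactly `k` items once the stream is long enough.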
Submitted 1 July, 2024;
originally announced July 2024.
-
Centerline Boundary Dice Loss for Vascular Segmentation
Authors:
Pengcheng Shi,
Jiesi Hu,
Yanwu Yang,
Zilve Gao,
Wei Liu,
Ting Ma
Abstract:
Vascular segmentation in medical imaging plays a crucial role in analysing morphological and functional assessments. Traditional methods, like the centerline Dice (clDice) loss, ensure topology preservation but falter in capturing geometric details, especially under translation and deformation. The combination of clDice with traditional Dice loss can lead to diameter imbalance, favoring larger vessels. Addressing these challenges, we introduce the centerline boundary Dice (cbDice) loss function, which harmonizes topological integrity and geometric nuances, ensuring consistent segmentation across various vessel sizes. cbDice enriches the clDice approach by including boundary-aware aspects, thereby improving geometric detail recognition. It matches the performance of the boundary difference over union (B-DoU) loss through a mask-distance-based approach, enhancing translation sensitivity. Crucially, cbDice incorporates radius information from vascular skeletons, enabling uniform adaptation to vascular diameter changes and maintaining balance in branch growth and fracture impacts. Furthermore, we conducted a theoretical analysis of clDice variants (cl-X-Dice). We validated cbDice's efficacy on three diverse vascular segmentation datasets, encompassing both 2D and 3D, and binary and multi-class segmentation. Particularly, the method integrated with cbDice demonstrated outstanding performance on the MICCAI 2023 TopCoW Challenge dataset. Our code is made publicly available at: https://github.com/PengchengShi1220/cbDice.
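The diameter imbalance mentioned in the abstract can be seen with the ordinary Dice coefficient alone. The toy sketch below is not cbDice or clDice: it only computes the standard set-based Dice score on a hypothetical mask containing one thick and one thin vessel, and shows that dropping the entire thin vessel is penalized no more than trimming a few pixels from the thick one. All names and pixel counts are illustrative assumptions.

```python
def dice(pred, target):
    """Standard Dice coefficient 2|P & T| / (|P| + |T|) on sets of pixel indices."""
    if not pred and not target:
        return 1.0
    return 2.0 * len(pred & target) / (len(pred) + len(target))

# Hypothetical ground truth: a thick vessel (100 pixels) and a thin vessel (5 pixels).
thick = set(range(100))
thin = set(range(1000, 1005))
target = thick | thin

# Prediction A loses the thin vessel entirely (a topological error).
pred_missing_thin = set(thick)
# Prediction B keeps both vessels but loses 5 pixels of the thick one.
pred_missing_part = set(range(5, 100)) | thin

score_missing_thin = dice(pred_missing_thin, target)
score_missing_part = dice(pred_missing_part, target)
print(score_missing_thin, score_missing_part)
```

Both predictions miss the same number of pixels, so plain Dice scores them identically even though only the first one removes a whole branch. This pixel-counting behavior is the kind of bias toward large vessels that topology- and radius-aware losses are designed to address.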
Submitted 1 July, 2024;
originally announced July 2024.
-
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents
Authors:
Shihan Deng,
Weikai Xu,
Hongda Sun,
Wei Liu,
Tao Tan,
Jianfeng Liu,
Ang Li,
Jian Luan,
Bin Wang,
Rui Yan,
Shuo Shang
Abstract:
With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions within a singular application lack adequacy for assessing the multi-dimensional reasoning and decision-making capacities of LLM mobile agents. (3) Current evaluation metrics are insufficient to accurately assess the process of sequential actions. To this end, we propose Mobile-Bench, a novel benchmark for evaluating the capabilities of LLM-based mobile agents. First, we expand conventional UI operations by incorporating 103 collected APIs to accelerate the efficiency of task completion. Subsequently, we collect evaluation data by combining real user queries with augmentation from LLMs. To better evaluate different levels of planning capabilities for mobile agents, our data is categorized into three distinct groups: SAST, SAMT, and MAMT, reflecting varying levels of task complexity. Mobile-Bench comprises 832 data entries, with more than 200 tasks specifically designed to evaluate multi-APP collaboration scenarios. Furthermore, we introduce a more accurate evaluation metric, named CheckPoint, to assess whether LLM-based mobile agents reach essential points during their planning and reasoning steps.
Submitted 1 July, 2024;
originally announced July 2024.
-
UWBAD: Towards Effective and Imperceptible Jamming Attacks Against UWB Ranging Systems with COTS Chips
Authors:
Yuqiao Yang,
Zhongjie Wu,
Yongzhao Zhang,
Ting Chen,
Jun Li,
Jie Yang,
Wenhao Liu,
Xiaosong Zhang,
Ruicong Shi,
Jingwei Li,
Yu Jiang,
Zhuo Su
Abstract:
UWB ranging systems have been adopted in many critical and security-sensitive applications due to their precise positioning and secure ranging capabilities. We present a practical jamming attack, namely UWBAD, against commercial UWB ranging systems, which exploits a vulnerability in the normalized cross-correlation process used in UWB ranging and can selectively and quickly block ranging sessions without prior knowledge of the configurations of the victim devices, potentially leading to severe consequences such as property loss, unauthorized access, or vehicle theft. UWBAD achieves more effective and less perceptible jamming because: (i) it efficiently blocks every ranging session by leveraging field-level jamming, thereby exerting a tangible impact on commercial UWB ranging systems, and (ii) its compact, reactive, and selective system design is based on COTS UWB chips, making it affordable and less perceptible. We successfully conducted real attacks against commercial UWB ranging systems from the three largest UWB chip vendors on the market, i.e., Apple, NXP, and Qorvo. We reported our findings to Apple, related Original Equipment Manufacturers (OEM), and the Automotive Security Research Group, triggering internal security incident response procedures at Volkswagen, Audi, Bosch, and NXP. As of the writing of this paper, the related OEM has acknowledged this vulnerability in their automotive systems and has offered a $5,000 reward as a bounty.
Submitted 30 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
Submitted 2 July, 2024; v1 submitted 28 June, 2024;
originally announced July 2024.
-
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Authors:
Yuxuan Zhang,
Tianheng Cheng,
Rui Hu,
Lei Liu,
Heng Liu,
Longjin Ran,
Xiaoxin Chen,
Wenyu Liu,
Xinggang Wang
Abstract:
Segment Anything Model (SAM) has attracted widespread attention for its superior interactive segmentation capabilities with visual prompts while lacking further exploration of text prompts. In this paper, we empirically investigate what text prompt encoders (e.g., CLIP or LLM) are good for adapting SAM for referring expression segmentation and introduce the Early Vision-language Fusion-based SAM (EVF-SAM). EVF-SAM is a simple yet effective referring segmentation method which exploits multimodal prompts (i.e., image and text) and comprises a pre-trained vision-language model to generate referring prompts and a SAM model for segmentation. Surprisingly, we observe that: (1) multimodal prompts and (2) vision-language models with early fusion (e.g., BEIT-3) are beneficial for prompting SAM for accurate referring segmentation. Our experiments show that the proposed EVF-SAM based on BEIT-3 can obtain state-of-the-art performance on RefCOCO/+/g for referring expression segmentation and demonstrate the superiority of prompting SAM with early vision-language fusion. In addition, the proposed EVF-SAM with 1.32B parameters achieves remarkably higher performance while reducing nearly 82% of parameters compared to previous SAM methods based on large multimodal models.
Submitted 3 July, 2024; v1 submitted 28 June, 2024;
originally announced June 2024.
-
"Hidden" mechanisms for Gouy-Chapman layer and other critical features via Poisson-Boltzmann equations
Authors:
Kaiyin Huang,
Weishi Liu
Abstract:
In this work a dynamical system approach is taken to systematically investigate the one-dimensional classical Poisson-Boltzmann (PB) equation with various boundary conditions. This framework, particularly the phase space portrait, has the unique advantage of a geometric view of the dynamical systems, which allows one to reveal and examine critical features of the PB models. More specifically, we are able to reveal the mechanism of the Gouy-Chapman layer: the presence of an equilibrium for the PB equation, including equilibrium-at-infinity for Gouy-Chapman's original setup as the limiting case. Several other critical, somewhat counterintuitive, features revealed in this work are the saturation phenomenon of surface charge density, the uniform boundedness of electric pressure (given length) and of length (given electric pressure) in surface charge, and the critical length for a reversal of electric force direction. All have a common mechanism: the presence of an equilibrium for the PB equation. We believe that the critical features presented from the classical PB models persist for modified PB systems up to a certain degree. On the other hand, any qualitative change in these features as the sophistication of the model increases is an indication of new phenomena with new mechanisms.
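To make the phase-plane picture concrete, here is a short worked sketch under an assumption not stated in the abstract: the dimensionless PB equation for a symmetric 1:1 electrolyte, $\phi''=\sinh\phi$, with $\phi$ the scaled potential and $x$ measured in Debye lengths. The symbols $\phi$, $v$, $H$ and $\phi_0$ are introduced here for illustration only and may differ from the paper's notation.

```latex
% Assumption: dimensionless PB equation for a symmetric 1:1 electrolyte.
\begin{align}
  \phi'' &= \sinh\phi
  \quad\Longleftrightarrow\quad
  \phi' = v,\qquad v' = \sinh\phi, \\
  H(\phi, v) &= \tfrac{1}{2}v^{2} - \cosh\phi, \qquad
  \frac{dH}{dx} = v\,v' - \sinh(\phi)\,\phi' = 0 .
\end{align}
% The equilibrium (\phi, v) = (0, 0) is a saddle and lies on the level set H = -1.
% A half-space solution with \phi \to 0 as x \to \infty stays on its stable manifold:
\begin{align}
  \tfrac{1}{2}v^{2} = \cosh\phi - 1 = 2\sinh^{2}(\phi/2)
  \;\Longrightarrow\;
  \phi' = -2\sinh(\phi/2), \qquad
  \tanh\!\big(\tfrac{\phi(x)}{4}\big) = \tanh\!\big(\tfrac{\phi_0}{4}\big)\,e^{-x}.
\end{align}
```

In this simplified setting the classical Gouy-Chapman profile is the orbit that approaches the saddle equilibrium only as $x\to\infty$, which is one way to read the abstract's "equilibrium-at-infinity".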
Submitted 28 June, 2024;
originally announced June 2024.
-
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
Authors:
Zijun Yao,
Weijian Qi,
Liangming Pan,
Shulin Cao,
Linmei Hu,
Weichuan Liu,
Lei Hou,
Juanzi Li
Abstract:
This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that extracts self-aware uncertainty of LLMs from their internal states. SeaKR activates retrieval when the LLMs present high self-aware uncertainty for generation. To effectively integrate retrieved knowledge snippets, SeaKR re-ranks them based on LLM's self-aware uncertainty to preserve the snippet that reduces their uncertainty to the utmost. To facilitate solving complex tasks that require multiple retrievals, SeaKR utilizes their self-aware uncertainty to choose among different reasoning strategies. Our experiments on both complex and simple Question Answering datasets show that SeaKR outperforms existing adaptive RAG methods. We release our code at https://github.com/THU-KEG/SeaKR.
Submitted 27 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
Interference Cancellation Based Neural Receiver for Superimposed Pilot in Multi-Layer Transmission
Authors:
Han Xiao,
Wenqiang Tian,
Shi Jin,
Wendong Liu,
Jia Shen,
Zhihua Shi,
Zhi Zhang
Abstract:
In this paper, an interference cancellation based neural receiver for superimposed pilot (SIP) in multi-layer transmission is proposed, where the data and pilot are non-orthogonally superimposed in the same time-frequency resource. Specifically, to deal with the intra-layer and inter-layer interference of SIP under multi-layer transmission, the interference cancellation with superimposed symbol aided channel estimation is leveraged in the neural receiver, accompanied by the pre-design of pilot code-division orthogonal mechanism at transmitter. In addition, to address the complexity issue for inter-vendor collaboration and the generalization problem in practical deployments, respectively, this paper also provides a fixed SIP (F-SIP) design based on constant pilot power ratio and scalable mechanisms for different modulation and coding schemes (MCSs) and transmission layers. Simulation results demonstrate the superiority of the proposed schemes on the performance of block error rate and throughput compared with existing counterparts.
Submitted 27 June, 2024;
originally announced June 2024.
-
AFBench: A Large-scale Benchmark for Airfoil Design
Authors:
Jian Liu,
Jianyu Wu,
Hairun Xie,
Guoqing Zhang,
Jing Wang,
Wei Liu,
Wanli Ouyang,
Junjun Jiang,
Xianming Liu,
Shixiang Tang,
Miao Zhang
Abstract:
Data-driven generative models have emerged as promising approaches towards achieving efficient mechanical inverse design. However, due to the prohibitively high cost in time and money, there is still a lack of open-source and large-scale benchmarks in this field. This is particularly the case for airfoil inverse design, which requires generating and editing diverse geometric-qualified and aerodynamic-qualified airfoils following multimodal instructions, i.e., dragging points and physical parameters. This paper presents the open-source endeavors in airfoil inverse design, AFBench, including a large-scale dataset with 200 thousand airfoils and high-quality aerodynamic and geometric labels, two novel and practical airfoil inverse design tasks, i.e., conditional generation on multimodal physical parameters and controllable editing, and comprehensive metrics to evaluate various existing airfoil inverse design methods. Our aim is to establish AFBench as an ecosystem for training and evaluating airfoil inverse design methods, with a specific focus on data-driven controllable inverse design models by multimodal instructions capable of bridging the gap between ideas and execution, the academic research and industrial applications. We have provided baseline models, comprehensive experimental observations, and analysis to accelerate future research. Our baseline model is trained on an RTX 3090 GPU within 16 hours. The codebase, datasets and benchmarks will be available at https://hitcslj.github.io/afbench/.
Submitted 26 June, 2024;
originally announced June 2024.
-
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Authors:
Le Zhuo,
Ruoyi Du,
Han Xiao,
Yangguang Li,
Dongyang Liu,
Rongjie Huang,
Wenze Liu,
Lirui Zhao,
Fu-Yun Wang,
Zhanyu Ma,
Xu Luo,
Zehan Wang,
Kaipeng Zhang,
Xiangyang Zhu,
Si Liu,
Xiangyu Yue,
Dingning Liu,
Wanli Ouyang,
Ziwei Liu,
Yu Qiao,
Hongsheng Li,
Peng Gao
Abstract:
Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lumina-Next, an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency. We begin with a comprehensive analysis of the Flag-DiT architecture and identify several suboptimal components, which we address by introducing the Next-DiT architecture with 3D RoPE and sandwich normalizations. To enable better resolution extrapolation, we thoroughly compare different context extrapolation methods applied to text-to-image generation with 3D RoPE, and propose Frequency- and Time-Aware Scaled RoPE tailored for diffusion transformers. Additionally, we introduce a sigmoid time discretization schedule to reduce sampling steps in solving the Flow ODE and the Context Drop method to merge redundant visual tokens for faster network evaluation, effectively boosting the overall sampling speed. Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities and multilingual generation using decoder-based LLMs as the text encoder, all in a zero-shot manner. To further validate Lumina-Next as a versatile generative framework, we instantiate it on diverse tasks including visual recognition, multi-view, audio, music, and point cloud generation, showcasing strong performance across these domains. By releasing all code and model weights, we aim to advance the development of next-generation generative AI capable of universal modeling.
Submitted 5 June, 2024;
originally announced June 2024.
-
GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality
Authors:
Taoran Yi,
Jiemin Fang,
Zanwei Zhou,
Junjie Wang,
Guanjun Wu,
Lingxi Xie,
Xiaopeng Zhang,
Wenyu Liu,
Xinggang Wang,
Qi Tian
Abstract:
Recently, 3D Gaussian splatting (3D-GS) has achieved great success in reconstructing and rendering real-world scenes. To transfer the high rendering quality to generation tasks, a series of research works attempt to generate 3D-Gaussian assets from text. However, the generated assets have not achieved the same quality as those in reconstruction tasks. We observe that Gaussians tend to grow without control, as the generation process may cause indeterminacy. To substantially improve generation quality, we propose a novel framework named GaussianDreamerPro. The main idea is to bind Gaussians to reasonable geometry, which evolves over the whole generation process. Across the stages of our framework, both the geometry and appearance are enriched progressively. The final output asset is constructed with 3D Gaussians bound to a mesh, which shows significantly enhanced details and quality compared with previous methods. Notably, the generated asset can also be seamlessly integrated into downstream manipulation pipelines, e.g., animation, composition, and simulation, greatly broadening its potential applications. Demos are available at https://taoranyi.com/gaussiandreamerpro/.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.