-
Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(\bfmuv)\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(\bftauv)\%$, respectively. The product of the decay constant and Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(\mufdsxvcsresult)_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(\taufdsxvcsresult)_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(\mufdsresult)_{μν}$\,MeV and ${f_{D^+_s}}=(\taufdsresult)_{τν}$\,MeV, respectively. Conversely, taking the value for $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(\muvcsresult)_{μν}$ and $|V_{cs}| = (\tauvcsresult)_{τν}$, respectively.
Submitted 16 July, 2024;
originally announced July 2024.
-
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Authors:
Haodong Duan,
Junming Yang,
Yuxuan Qiao,
Xinyu Fang,
Lin Chen,
Yuan Liu,
Xiaoyi Dong,
Yuhang Zang,
Pan Zhang,
Jiaqi Wang,
Dahua Lin,
Kai Chen
Abstract:
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary APIs and open-source models, as well as more than 20 different multi-modal benchmarks. By implementing a single interface, new models can be easily added to the toolkit, while the toolkit automatically handles the remaining workloads, including data preparation, distributed inference, prediction post-processing, and metric calculation. Although the toolkit is currently mainly used for evaluating large vision-language models, its design is compatible with future updates that incorporate additional modalities, such as audio and video. Based on the evaluation results obtained with the toolkit, we host OpenVLM Leaderboard, a comprehensive leaderboard to track the progress of multi-modality learning research. The toolkit is released at https://github.com/open-compass/VLMEvalKit and is actively maintained.
Submitted 16 July, 2024;
originally announced July 2024.
-
First Measurement of Solar $^8$B Neutrino Flux through Coherent Elastic Neutrino-Nucleus Scattering in PandaX-4T
Authors:
PandaX Collaboration,
Zihao Bo,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Zhixing Gao,
Lisheng Geng,
Karl Giboni,
Xunan Guo,
Xuyuan Guo,
Zichao Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Houqi Huang,
Junting Huang,
Ruquan Hou,
Yu Hou,
Xiangdong Ji
, et al. (77 additional authors not shown)
Abstract:
The PandaX-4T liquid xenon detector at the China Jinping Underground Laboratory is used to measure the solar $^8$B neutrino flux by detecting neutrinos through coherent scattering with xenon nuclei. Data samples requiring the coincidence of scintillation and ionization signals (paired), as well as unpaired ionization-only signals (US2), are selected with an energy threshold of approximately 1.1 keV (0.33 keV) nuclear recoil energy. Combining the commissioning run and the first science run of PandaX-4T, total exposures of 1.25 and 1.04 tonne$\cdot$year are collected for the paired and US2 data, respectively. After unblinding, 3 and 332 events are observed with an expectation of 2.8$\pm$0.5 and 251$\pm$32 background events, for the paired and US2 data, respectively. A combined analysis yields a best-fit $^8$B neutrino signal of 3.5 (75) events from the paired (US2) data sample, with $\sim$37\% uncertainty, and the background-only hypothesis is disfavored at 2.64$σ$ significance. This gives a solar $^8$B neutrino flux of ($8.4\pm3.1$)$\times$10$^6$ cm$^{-2}$s$^{-1}$, consistent with the standard solar model prediction. This is the first indication of solar $^8$B neutrino ``fog'' in a dark matter direct detection experiment.
Submitted 15 July, 2024;
originally announced July 2024.
-
Displacement memory for flyby
Authors:
P. -M. Zhang,
Q. -L. Zhao,
P. A. Horvathy
Abstract:
Zel'dovich and Polnarev, in their seminal paper on the displacement memory effect [1], suggested that particles hit by a burst of gravitational waves generated by flyby would be merely displaced. Their prediction is confirmed numerically for the wave profile which is the derivative of a Gaussian proposed by Gibbons and Hawking [2]. The study is extended to higher-order derivative profiles as proposed e.g. for gravitational collapse, etc.
Submitted 15 July, 2024;
originally announced July 2024.
-
Qwen2 Technical Report
Authors:
An Yang,
Baosong Yang,
Binyuan Hui,
Bo Zheng,
Bowen Yu,
Chang Zhou,
Chengpeng Li,
Chengyuan Li,
Dayiheng Liu,
Fei Huang,
Guanting Dong,
Haoran Wei,
Huan Lin,
Jialong Tang,
Jialin Wang,
Jian Yang,
Jianhong Tu,
Jianwei Zhang,
Jianxin Ma,
Jin Xu,
Jingren Zhou,
Jinze Bai,
Jinzheng He,
Junyang Lin,
Kai Dang
, et al. (34 additional authors not shown)
Abstract:
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning.
The flagship model, Qwen2-72B, showcases remarkable performance: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH as a base language model. The instruction-tuned variant, Qwen2-72B-Instruct, attains 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Moreover, Qwen2 demonstrates robust multilingual capabilities, proficient in approximately 30 languages, spanning English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, Vietnamese, and more, underscoring its versatility and global reach.
To foster community innovation and accessibility, we have made the Qwen2 model weights openly available on Hugging Face and ModelScope, and the supplementary materials including example code on GitHub. These platforms also include resources for quantization, fine-tuning, and deployment, facilitating a wide range of applications and research endeavors.
Submitted 16 July, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
How buildings change the fundamental allometry
Authors:
Fabiano L. Ribeiro,
Peiran Zhang,
Liang Gao,
Diego Rybski
Abstract:
We demonstrate that the original fundamental allometry alone cannot accurately describe the relationship between urban area and population size. Instead, building height is a third factor that interplays with area and population. To illustrate this, we propose a straightforward model based on the idea that city area is the result of people's desire to live close to one another while also having sufficient living space. This leads to a more general form of fundamental allometry (relating area, population, and building height). Our argument is supported by empirical data from different countries.
Submitted 12 July, 2024;
originally announced July 2024.
-
DRM Revisited: A Complete Error Analysis
Authors:
Yuling Jiao,
Ruoxuan Li,
Peiying Wu,
Jerry Zhijian Yang,
Pingwen Zhang
Abstract:
In this work, we address a foundational question in the theoretical analysis of the Deep Ritz Method (DRM) under the over-parameterization regime: Given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number of iterations, such that the output of the gradient descent process closely approximates the true solution of the underlying partial differential equation to the specified precision?
Submitted 12 July, 2024;
originally announced July 2024.
-
Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation
Authors:
Miao Yan,
Ping Zhang,
Haofei Zhang,
Ruqian Hao,
Juanxiu Liu,
Xiaoyang Wang,
Lin Liu
Abstract:
Thermal infrared tracking is an essential topic in computer vision tasks because of its advantage of all-weather imaging. However, most conventional methods utilize only hand-crafted features, while deep learning-based correlation filtering methods are limited by simple correlation operations. Transformer-based methods ignore temporal and coordinate information, which is critical for TIR tracking that lacks texture and color information. In this paper, to address these issues, we apply natural language modeling to TIR tracking and propose a novel model called NLMTrack, which enhances the utilization of coordinate and temporal information. NLMTrack applies an encoder that unifies feature extraction and feature fusion, which simplifies the TIR tracking pipeline. To address the challenge of low detail and low contrast in TIR images, on the one hand, we design a multi-level progressive fusion module that enhances the semantic representation and incorporates multi-scale features. On the other hand, the decoder combines the TIR features and the coordinate sequence features using a causal transformer to generate the target sequence step by step. Moreover, we explore an adaptive loss aimed at elevating tracking accuracy and a simple template update strategy to accommodate the target's appearance variations. Experiments show that NLMTrack achieves state-of-the-art performance on multiple benchmarks. The code is publicly available at https://github.com/ELOESZHANG/NLMTrack.
Submitted 11 July, 2024;
originally announced July 2024.
-
Thermodynamic bounce effect in quantum BTZ black hole
Authors:
Zhen-Ming Xu,
Pan-Pan Zhang,
Bin Wu,
Xing Zhang
Abstract:
A novel thermodynamic phenomenon has been observed in the quantum Bañados-Teitelboim-Zanelli (qBTZ) black hole, utilizing generalized free energy and the Kramers escape rate. This phenomenon also reveals the unique property of the quantum black hole. The stochastic thermal motion of various thermodynamic states within the black hole system induces phase transitions, under the influence of generalized free energy, which is obtained by extending Maxwell's construction. Through the analysis of the Kramers escape rate, it is discovered that the qBTZ black hole thermodynamic system exhibits a bounce effect. Furthermore, the overall thermodynamic picture of the qBTZ black hole has been obtained under different quantum backreactions.
Submitted 11 July, 2024;
originally announced July 2024.
-
Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter
Authors:
Suqi Song,
Chenxu Zhang,
Peng Zhang,
Pengkun Li,
Fenglong Song,
Lei Zhang
Abstract:
Urban waterlogging poses a major risk to public safety and infrastructure. Conventional methods using water-level sensors require heavy maintenance and can hardly achieve full coverage. Recent advances employ surveillance camera imagery and deep learning for detection, yet these struggle amidst scarce data and adverse environmental conditions. In this paper, we establish a challenging Urban Waterlogging Benchmark (UW-Bench) under diverse adverse conditions to advance real-world applications. We propose a Large-Small Model co-adapter paradigm (LSM-adapter), which harnesses the substantial generic segmentation potential of a large model and the specific task-directed guidance of a small model. Specifically, a Triple-S Prompt Adapter module and a Dynamic Prompt Combiner are proposed to generate and then merge multiple prompts for mask decoder adaptation. Meanwhile, a Histogram Equalization Adapter module is designed to infuse image-specific information for image encoder adaptation. Results and analysis show the challenge and superiority of our developed benchmark and algorithm. Project page: https://github.com/zhang-chenxu/LSM-Adapter
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
Gravitational orbital Hall effect of vortex photons in Lense-Thirring metric
Authors:
Wei-Si Qiu,
Dan-Dan Lian,
Peng-Ming Zhang
Abstract:
Vortex photons, possessing an intrinsic orbital angular momentum (OAM) aligned with the direction of propagation, are described using vortex electromagnetic wave packets. Similar to the gravitational spin Hall effect (SHE), these vortex photons are expected to exhibit intrinsic OAM-dependent trajectories and separations when propagating through a gravitational field, a phenomenon termed the gravitational orbital Hall effect (OHE). In this work, we construct a vortex Laguerre-Gaussian electromagnetic wave packet and analyze its motion by solving covariant Maxwell equations within the Lense-Thirring metric. Our findings reveal that vortex photons with different intrinsic OAM not only separate perpendicular to the null geodesic plane but also within it. This behavior contrasts with the gravitational SHE, where photons of opposite spins separate primarily perpendicular to the null geodesic plane. Moreover, the relationship between the separation and intrinsic OAM differs significantly from that between the separation and spin. These results suggest a unique interaction between intrinsic OAM and gravity, distinct from the spin-gravity coupling, indicating that the gravitational OHE might not be precisely predicted by merely substituting spin with intrinsic OAM in the gravitational SHE.
Submitted 9 July, 2024;
originally announced July 2024.
-
Asymmetric Mask Scheme for Self-Supervised Real Image Denoising
Authors:
Xiangyu Liao,
Tianheng Zheng,
Jiayu Zhong,
Pingping Zhang,
Chao Ren
Abstract:
In recent years, self-supervised denoising methods have gained significant success and become critically important in the field of image restoration. Among them, blind-spot-network-based methods are the most typical and have attracted the attention of a large number of researchers. Although the introduction of blind spot operations can prevent identity mapping from noise to noise, it imposes stringent requirements on the receptive fields in the network design, thereby limiting overall performance. To address this challenge, we propose a single mask scheme for self-supervised denoising training, which eliminates the need for blind spot operations and thereby removes constraints on the network structure design. Furthermore, to achieve denoising across the entire image during inference, we propose a multi-mask scheme. Our method, featuring the asymmetric mask scheme in training and inference, achieves state-of-the-art performance on existing real noisy image datasets. All the source code will be made available to the public.
Submitted 14 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Exploring the Capability of ChatGPT to Reproduce Human Labels for Social Computing Tasks (Extended Version)
Authors:
Yiming Zhu,
Peixian Zhang,
Ehsan-Ul Haq,
Pan Hui,
Gareth Tyson
Abstract:
Harnessing the potential of large language models (LLMs) like ChatGPT can help address social challenges through inclusive, ethical, and sustainable means. In this paper, we investigate the extent to which ChatGPT can annotate data for social computing tasks, aiming to reduce the complexity and cost of undertaking web research. To evaluate ChatGPT's potential, we re-annotate seven datasets using ChatGPT, covering topics related to pressing social issues like COVID-19 misinformation, social bot deception, cyberbullying, clickbait news, and the Russo-Ukrainian War. Our findings demonstrate that ChatGPT exhibits promise in handling these data annotation tasks, albeit with some challenges. Across the seven datasets, ChatGPT achieves an average annotation F1-score of 72.00%. Its performance excels in clickbait news annotation, correctly labeling 89.66% of the data. However, we also observe significant variations in performance across individual labels. Our study reveals predictable patterns in ChatGPT's annotation performance. Thus, we propose GPT-Rater, a tool to predict if ChatGPT can correctly label data for a given annotation task. Researchers can use this to identify where ChatGPT might be suitable for their annotation requirements. We show that GPT-Rater effectively predicts ChatGPT's performance. It performs best on a clickbait headlines dataset by achieving an average F1-score of 95.00%. We believe that this research opens new avenues for analysis and can reduce barriers to engaging in social computing research.
Submitted 8 July, 2024;
originally announced July 2024.
-
Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation
Authors:
Detian Chu,
Linyuan Bai,
Jianuo Huang,
Zhenlong Fang,
Peng Zhang,
Wei Kang
Abstract:
With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming more and more important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the safety issue in the control optimization problem of autonomous driving, formulated as Constrained Markov Decision Processes (CMDPs). We propose a novel, model-based approach for policy optimization, utilizing a conditional Value-at-Risk based Soft Actor Critic to manage constraints in complex, high-dimensional state spaces effectively. Our method introduces a worst-case actor to guide safe exploration, ensuring rigorous adherence to safety requirements even in unpredictable scenarios. The policy optimization employs the Augmented Lagrangian method and leverages latent diffusion models to predict and simulate future trajectories. This dual approach not only aids in navigating environments safely but also refines the policy's performance by integrating distribution modeling to account for environmental uncertainties. Empirical evaluations conducted in both simulated and real environments demonstrate that our approach outperforms existing methods in terms of safety, efficiency, and decision-making capabilities.
Submitted 16 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Receiver Selection and Transmit Beamforming for Multi-static Integrated Sensing and Communications
Authors:
Dan Wang,
Yuanming Tian,
Chuan Huang,
Hao Chen,
Xiaodong Xu,
Ping Zhang
Abstract:
Next-generation wireless networks are expected to develop a novel paradigm of integrated sensing and communications (ISAC) to enable both high-accuracy sensing and high-speed communications. However, conventional mono-static ISAC systems, which simultaneously transmit and receive at the same equipment, may suffer from severe self-interference, which significantly degrades system performance. To address this issue, this paper studies a multi-static ISAC system for cooperative target localization and communications, where the transmitter transmits ISAC signals to multiple receivers (REs) deployed at different positions. We derive the closed-form Cramér-Rao bound (CRB) on the joint estimations of both the transmission delay and Doppler shift for cooperative target localization, and the CRB minimization problem is formulated by considering the cooperative cost and communication rate requirements for the REs. To solve this problem, we first decouple it into two subproblems for RE selection and transmit beamforming, respectively. Then, a minimax linkage-based method is proposed to solve the RE selection subproblem, and a successive convex approximation algorithm is adopted to deal with the transmit beamforming subproblem with non-convex constraints. Finally, numerical results validate our analysis and reveal that our proposed multi-static ISAC scheme achieves better ISAC performance than the conventional mono-static ones when the number of cooperative REs is large.
Submitted 8 July, 2024;
originally announced July 2024.
-
Neuromorphic Imaging with Super-Resolution
Authors:
Pei Zhang,
Shuo Zhu,
Chutian Wang,
Yaping Zhao,
Edmund Y. Lam
Abstract:
Neuromorphic imaging is a bio-inspired technique that imitates the human retina to sense variations in a dynamic scene. It responds to pixel-level brightness changes by asynchronous streaming events and boasts microsecond temporal precision over a high dynamic range, yielding blur-free recordings under extreme illumination. Nevertheless, such a modality falls short in spatial resolution and leads to a low level of visual richness and clarity. Pursuing hardware upgrades is expensive and might compromise performance because of heavier computational requirements. Another option is to harness offline, plug-and-play neuromorphic super-resolution solutions. However, existing ones, which demand substantial sample volumes for lengthy training on massive computing resources, are largely restricted by real data availability owing to the current imperfect high-resolution devices, as well as the randomness and variability of motion. To tackle these challenges, we introduce the first self-supervised neuromorphic super-resolution prototype. It can adapt to each input source from any low-resolution camera to estimate an optimal, high-resolution counterpart of any scale, without the need for side knowledge or prior training. Evaluated on downstream event-driven tasks, such a simple yet effective method can obtain competitive results against the state of the art, significantly promoting flexibility without sacrificing accuracy. It also delivers enhancements for inferior natural images and optical micrographs acquired under non-ideal imaging conditions, breaking through limitations that are challenging to overcome with traditional frame techniques. In the current landscape where the use of high-resolution cameras for event-based sensing remains an open debate, our solution serves as a cost-efficient and practical alternative, paving the way for more intelligent imaging systems.
Submitted 8 July, 2024;
originally announced July 2024.
-
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Authors:
Haorui He,
Zengqiang Shang,
Chaoren Wang,
Xuyuan Li,
Yicheng Gu,
Hua Hua,
Liwei Liu,
Chen Yang,
Jiaqi Li,
Peiyang Shi,
Yuancheng Wang,
Kai Chen,
Pengyuan Zhang,
Zhizheng Wu
Abstract:
Recently, speech generation models have made significant progress by using large-scale training data. However, the research community still struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper presents Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation. Emilia starts with over 101k hours of speech in six languages and features diverse speech with varied speaking styles. To facilitate the scale-up of Emilia, the open-source pipeline Emilia-Pipe can process one hour of raw speech data into data ready for model training in a few minutes, which enables the research community to collaborate on large-scale speech generation research. Experimental results validate the effectiveness of Emilia. Demos are available at: https://emilia-dataset.github.io/Emilia-Demo-Page/.
Submitted 12 July, 2024; v1 submitted 7 July, 2024;
originally announced July 2024.
-
Unraveling Radiomics Complexity: Strategies for Optimal Simplicity in Predictive Modeling
Authors:
Mahdi Ait Lhaj Loutfi,
Teodora Boblea Podasca,
Alex Zwanenburg,
Taman Upadhaya,
Jorge Barrios,
David R. Raleigh,
William C. Chen,
Dante P. I. Capaldi,
Hong Zheng,
Olivier Gevaert,
Jing Wu,
Alvin C. Silva,
Paul J. Zhang,
Harrison X. Bai,
Jan Seuntjens,
Steffen Löck,
Patrick O. Richard,
Olivier Morin,
Caroline Reinhold,
Martin Lepage,
Martin Vallières
Abstract:
Background: The high dimensionality of radiomic feature sets, the variability in radiomic feature types and potentially high computational requirements all underscore the need for an effective method to identify the smallest set of predictive features for a given clinical problem. Purpose: Develop a methodology and tools to identify and explain the smallest set of predictive radiomic features. Materials and Methods: 89,714 radiomic features were extracted from five cancer datasets: low-grade glioma, meningioma, non-small cell lung cancer (NSCLC), and two renal cell carcinoma cohorts (n=2104). Features were categorized by computational complexity into morphological, intensity, texture, linear filters, and nonlinear filters. Models were trained and evaluated on each complexity level using the area under the curve (AUC). The most informative features were identified, and their importance was explained. The optimal complexity level and associated most informative features were identified using systematic statistical significance analyses and a false discovery avoidance procedure, respectively. Their predictive importance was explained using a novel tree-based method. Results: MEDimage, a new open-source tool, was developed to facilitate radiomic studies. Morphological features were optimal for MRI-based meningioma (AUC: 0.65) and low-grade glioma (AUC: 0.68). Intensity features were optimal for CECT-based renal cell carcinoma (AUC: 0.82) and CT-based NSCLC (AUC: 0.76). Texture features were optimal for MRI-based renal cell carcinoma (AUC: 0.72). Tuning the Hounsfield unit range improved results for CECT-based renal cell carcinoma (AUC: 0.86). Conclusion: Our proposed methodology and software can estimate the optimal radiomics complexity level for specific medical outcomes, potentially simplifying the use of radiomics in predictive modeling across various contexts.
Submitted 5 July, 2024;
originally announced July 2024.
-
TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models
Authors:
Jiahuan Cao,
Dezhi Peng,
Peirong Zhang,
Yongxin Shi,
Yang Liu,
Kai Ding,
Lianwen Jin
Abstract:
Classical Chinese is a gateway to the rich heritage and wisdom of ancient China, yet its complexities pose formidable comprehension barriers for most modern people without specialized knowledge. While Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), they struggle with Classical Chinese Understanding (CCU), especially in data-demanding and knowledge-intensive tasks. In response to this dilemma, we propose TongGu (meaning "understanding ancient and modern"), the first CCU-specific LLM, underpinned by three core contributions. First, we construct a two-stage instruction-tuning dataset, ACCN-INS, derived from rich classical Chinese corpora, aiming to unlock the full CCU potential of LLMs. Second, we propose Redundancy-Aware Tuning (RAT) to prevent catastrophic forgetting, enabling TongGu to acquire new capabilities while preserving its foundational knowledge. Third, we present a CCU Retrieval-Augmented Generation (CCU-RAG) technique to reduce hallucinations through knowledge grounding. Extensive experiments across 24 diverse CCU tasks validate TongGu's superior ability, underscoring the effectiveness of RAT and CCU-RAG. The model and dataset will be publicly available.
Submitted 4 July, 2024;
originally announced July 2024.
-
Evidence of $h_{b}(\text{2P}) \to Υ(\text{1S})η$ decay and search for $h_{b}(\text{1P,2P}) \to Υ(\text{1S})π^0$ with the Belle detector
Authors:
Belle Collaboration,
E. Kovalenko,
I. Adachi,
H. Aihara,
D. M. Asner,
T. Aushev,
R. Ayad,
V. Babu,
Sw. Banerjee,
K. Belous,
J. Bennett,
M. Bessner,
T. Bilka,
D. Biswas,
A. Bobrov,
D. Bodrov,
A. Bondar,
A. Bozek,
M. Bračko,
P. Branchini,
T. E. Browder,
A. Budano,
M. Campajola,
M. -C. Chang,
B. G. Cheon
, et al. (142 additional authors not shown)
Abstract:
We report the first evidence for the $h_{b}(\text{2P}) \to Υ(\text{1S})η$ transition with a significance of $3.5$ standard deviations. The decay branching fraction is measured to be $\mathcal{B}[h_{b}(\text{2P}) \to Υ(\text{1S})η]=(7.1 ~^{+3.7} _{-3.2}\pm 0.8)\times10^{-3}$, which is noticeably smaller than expected. We also set upper limits on $π^0$ transitions of $\mathcal{B}[h_{b}(\text{2P}) \to Υ(\text{1S})π^0] < 1.8\times10^{-3}$, and $\mathcal{B}[h_{b}(\text{1P})\to Υ(\text{1S})π^0] < 1.8\times10^{-3}$, at the $90\%$ confidence level. These results are obtained with a $131.4$~fb$^{-1}$ data sample collected near the $Υ(\text{5S})$ resonance with the Belle detector at the KEKB asymmetric-energy $e^+e^-$ collider.
Submitted 4 July, 2024;
originally announced July 2024.
-
CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition
Authors:
Huanzhang Dou,
Pengyi Zhang,
Yuhan Zhao,
Lu Jin,
Xi Li
Abstract:
Gait recognition, which aims at identifying individuals by their walking patterns, has achieved great success based on silhouettes. The binary silhouette sequence encodes the walking pattern within the sparse boundary representation. Therefore, most pixels in the silhouette are under-sensitive to the walking pattern, since the sparse boundary lacks dense spatial-temporal information, which is better represented by dense texture. To enhance the sensitivity to the walking pattern while maintaining the robustness of recognition, we present a Complementary Learning with neural Architecture Search (CLASH) framework, consisting of a walking-pattern-sensitive gait descriptor named dense spatial-temporal field (DSTF) and neural-architecture-search-based complementary learning (NCL). Specifically, DSTF transforms the representation from the sparse binary boundary into a dense distance-based texture, which is sensitive to the walking pattern at the pixel level. Further, NCL presents a task-specific search space for complementary learning, which mutually complements the sensitivity of DSTF and the robustness of the silhouette to represent the walking pattern effectively. Extensive experiments demonstrate the effectiveness of the proposed methods under both in-the-lab and in-the-wild scenarios. On CASIA-B, we achieve rank-1 accuracy of 98.8%, 96.5%, and 89.3% under three conditions. On OU-MVLP, we achieve rank-1 accuracy of 91.9%. On the latest in-the-wild datasets, we outperform the latest silhouette-based methods by 16.3% and 19.7% on Gait3D and GREW, respectively.
Submitted 4 July, 2024;
originally announced July 2024.
-
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Authors:
Pan Zhang,
Xiaoyi Dong,
Yuhang Zang,
Yuhang Cao,
Rui Qian,
Lin Chen,
Qipeng Guo,
Haodong Duan,
Bin Wang,
Linke Ouyang,
Songyang Zhang,
Wenwei Zhang,
Yining Li,
Yang Gao,
Peng Sun,
Xinyue Zhang,
Wei Li,
Jingwen Li,
Wenhai Wang,
Hang Yan,
Conghui He,
Xingcheng Zhang,
Kai Chen,
Jifeng Dai,
Yu Qiao
, et al. (2 additional authors not shown)
Abstract:
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts. Compared to its previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In addition to comprehension, IXC-2.5 extends to two compelling applications using extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer.
Submitted 3 July, 2024;
originally announced July 2024.
-
Thorium doped strontium fluoride crystal: a unique candidate for solid nuclear optical clock material
Authors:
Qiaorui Gong,
Shanming Li,
Shulong Zhang,
Siliang Tao,
Guoliang Deng,
Peixiong Zhang,
Chengchun Zhao,
Yin Hang,
Shining Zhu,
Longsheng Ma
Abstract:
We report a candidate with unique advantages in the cultivation of solid-state nuclear clock material, Th:SrF2 crystal. It not only has a segregation coefficient close to 1, which can achieve highly efficient and uniform doping of Th, but also ensures a high transmittance (~69% at 150 nm) while achieving an extremely high doping concentration (232Th > 6*10^20 cm^(-3)). In addition, SrF2 crystal will not be colored by irradiation under strong α radiation like CaF2 crystal. Th:SrF2 crystal is therefore expected to fully exploit its high-concentration doping characteristics while ensuring that its transmission performance in the nuclear transition band is not severely affected by 229Th radiation damage.
Submitted 3 July, 2024;
originally announced July 2024.
-
ISWSST: Index-space-wave State Superposition Transformers for Multispectral Remotely Sensed Imagery Semantic Segmentation
Authors:
Chang Li,
Pengfei Zhang,
Yu Wang
Abstract:
Currently, the semantic segmentation of multispectral remotely sensed imagery (MSRSI) faces the following problems: 1) usually, only a single-domain feature (i.e., space domain or frequency domain) is considered; 2) the downsampling operation in the encoder generally leads to a loss of edge extraction accuracy; 3) multichannel features of MSRSI are not fully considered; and 4) prior knowledge of remote sensing is not fully utilized. To solve these issues, an index-space-wave state superposition Transformer (ISWSST), inspired by quantum mechanics, is proposed for the first time for MSRSI semantic segmentation. Its advantages are as follows: 1) index, space and wave states are superposed or fused to simulate quantum superposition through adaptive voting (i.e., the idea of ensemble learning), yielding a stronger classifier and improving the segmentation accuracy; 2) a lossless wavelet pyramid encoder-decoder module is designed to losslessly reconstruct the image and simulate quantum entanglement based on the wavelet transform and inverse wavelet transform, avoiding the edge extraction loss; 3) a combination of multispectral features (i.e., remote sensing index and channel attention mechanism) is proposed to accurately extract ground objects from original-resolution images; and 4) quantum mechanics is introduced to interpret the underlying superiority of ISWSST. Experiments validate ISWSST and show that it is superior to state-of-the-art architectures for the MSRSI segmentation task, effectively improving segmentation and edge extraction accuracy. Code will be made publicly available after our paper is accepted.
Submitted 3 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
Submitted 3 July, 2024;
originally announced July 2024.
-
Generalized Gouy Rotation of Electron Vortex beams in uniform magnetic fields
Authors:
Qi Meng,
Xuan Liu,
Wei Ma,
Zhen Yang,
Liang Lu,
Alexander J. Silenko,
Pengming Zhang,
Liping Zou
Abstract:
The rotation of electron vortex beams (EVBs) presents a complex interplay of the Gouy phase characterizing free-space behavior and Landau states or Larmor rotation observed in magnetic fields. Although these phenomena have been studied separately, they manifest within a single beam during its propagation in magnetic fields, and a comprehensive description has been lacking. We address this by utilizing exact solutions of the relativistic paraxial equation in magnetic fields, termed "paraxial Landau modes". The paraxial Landau modes describe the quantum states of EVBs in magnetic fields. Our study of rotation angles demonstrates consistency with experimental data, supporting the practical presence of these modes. We provide a unified description of different regimes under generalized Gouy rotation, linking the Gouy phase to EVB rotation angles. This connection enhances our understanding of the Gouy phase and can be extended to nonuniform magnetic fields. Our theoretical analysis is validated through numerical simulations using the Chebyshev method. This work offers new insights into the dynamics of EVBs in magnetic fields and suggests practical applications in beam manipulation and beam optics of vortex particles.
Submitted 2 July, 2024;
originally announced July 2024.
-
Inducing superconductivity in quantum anomalous Hall regime
Authors:
Yu Huang,
Yu Fu,
Peng Zhang,
Kang L. Wang,
Qing Lin He
Abstract:
Interfacing the quantum anomalous Hall insulator with a conventional superconductor is known to be a promising route to realizing a topological superconductor, which has been continuously pursued for years. Such a proximity route depends to a great extent on the control of the delicate interfacial coupling of the two constituents. However, a recent experiment reported the failure to reproduce such a topological superconductor, which is ascribed to the neglect of the electrical short by the superconductor in the theoretical proposal. Here, we reproduce this topological superconductor with attention to the interface control. The resulting conductance matrix under a wide magnetic field range agrees with the fingerprint of this topological superconductor. This allows us to develop a phase diagram that unveils three regions parameterized by various coupling limits, which not only supports the feasibility of fabricating the topological superconductor by proximity but also fully explains the origin of the previous debate. The present work provides a comprehensible guide on fabricating the topological superconductor.
Submitted 2 July, 2024;
originally announced July 2024.
-
MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context
Authors:
Zishan Gu,
Changchang Yin,
Fenglin Liu,
Ping Zhang
Abstract:
Large Vision Language Models (LVLMs) have recently achieved superior performance in various tasks on natural image and text data, which has inspired a large number of studies on LVLM fine-tuning and training. Despite these advancements, there has been scant research on the robustness of these models against hallucination when fine-tuned on smaller datasets. In this study, we introduce a new benchmark dataset, the Medical Visual Hallucination Test (MedVH), to evaluate the hallucination of domain-specific LVLMs. MedVH comprises five tasks to evaluate hallucinations in LVLMs within the medical context, including tasks for comprehensive understanding of textual and visual input, as well as long textual response generation. Our extensive experiments with both general and medical LVLMs reveal that, although medical LVLMs demonstrate promising performance on standard medical tasks, they are particularly susceptible to hallucinations, often more so than the general models, raising significant concerns about the reliability of these domain-specific models. For medical LVLMs to be truly valuable in real-world applications, they must not only accurately integrate medical knowledge but also maintain robust reasoning abilities to prevent hallucination. Our work paves the way for future evaluations of these studies.
Submitted 2 July, 2024;
originally announced July 2024.
-
A new subclass of gamma-ray burst originating from compact binary merger
Authors:
Chen-Wei Wang,
Wen-Jun Tan,
Shao-Lin Xiong,
Shu-Xu Yi,
Rahim Moradi,
Bing Li,
Zhen Zhang,
Yu Wang,
Yan-Zhi Meng,
Jia-Cong Liu,
Yue Wang,
Sheng-Lun Xie,
Wang-Chen Xue,
Zheng-Hang Yu,
Peng Zhang,
Wen-Long Zhang,
Yan-Qiu Zhang,
Chao Zheng
Abstract:
Type I gamma-ray bursts (GRBs) are believed to originate from compact binary mergers, usually with a duration of less than 2 seconds for the main emission. However, recent observations of GRB 211211A and GRB 230307A indicate that some merger-origin GRBs could last much longer. Since they show strikingly similar properties (indicating a common mechanism) which are different from the classic "long"-short burst (e.g. GRB 060614), forming an interesting subclass of type I GRBs, we suggest naming them type IL GRBs. By identifying the first peak of GRB 230307A as a quasi-thermal precursor, we find that the prompt emission of a type IL GRB is composed of three episodes: (1) a precursor followed by a short quiescent (or weak emission) period, (2) a long-duration main emission, and (3) an extended emission. With this burst pattern, a good candidate, GRB 170228A, was found in the Fermi/GBM archive data, and subsequent temporal and spectral analyses indeed show that GRB 170228A falls in the same cluster as GRB 211211A and GRB 230307A in many diagnostic figures. Thus this burst pattern could be a good reference for rapidly identifying type IL GRBs and conducting low-latency follow-up observations. We estimated the occurrence rate and discussed the physical origins and implications for the three emission episodes of type IL GRBs. Our analysis suggests that the pre-merger precursor model, especially the super flare model, is more favored for type IL GRBs.
Submitted 2 July, 2024;
originally announced July 2024.
-
Structure-Aware Consensus Network on Graphs with Few Labeled Nodes
Authors:
Shuaike Xu,
Xiaolin Zhang,
Peng Zhang,
Kun Zhan
Abstract:
Graph node classification with few labeled nodes presents significant challenges due to limited supervision. Conventional methods often exploit the graph in a transductive learning manner. They fail to effectively utilize the abundant unlabeled data and the structural information inherent in graphs. To address these issues, we introduce a Structure-Aware Consensus Network (SACN) from three perspectives. Firstly, SACN leverages a novel structure-aware consensus learning strategy between two strongly augmented views. The proposed strategy can fully exploit the potentially useful information of the unlabeled nodes and the structural information of the entire graph. Secondly, SACN uniquely integrates the graph's structural information to achieve strong-to-strong consensus learning, improving the utilization of unlabeled data while maintaining multiview learning. Thirdly, unlike two-branch graph neural network-based methods, SACN is designed for multiview feature learning within a single-branch architecture. Furthermore, a class-aware pseudolabel selection strategy helps address class imbalance and achieve effective weak-to-strong supervision. Extensive experiments on three benchmark datasets demonstrate SACN's superior performance in node classification tasks, particularly at very low label rates, outperforming state-of-the-art methods while maintaining computational simplicity. The source code is available at https://github.com/kunzhan/SACN
Submitted 2 July, 2024;
originally announced July 2024.
-
Research on target detection method of distracted driving behavior based on improved YOLOv8
Authors:
Shiquan Shen,
Zhizhong Wu,
Pan Zhang
Abstract:
With the development of deep learning technology, the detection and classification of distracted driving behaviour requires higher accuracy. Existing deep learning-based methods are computationally intensive and parameter redundant, limiting their efficiency and accuracy in practical applications. To solve this problem, this study proposes an improved YOLOv8 detection method based on the original YOLOv8 model by integrating the BoTNet module, GAM attention mechanism and EIoU loss function. By optimising the feature extraction and multi-scale feature fusion strategies, the training and inference processes are simplified, and the detection accuracy and efficiency are significantly improved. Experimental results show that the improved model performs well in both detection speed and accuracy, with an accuracy rate of 99.4%. The model is also smaller and easy to deploy, and is able to identify and classify distracted driving behaviours in real time, provide timely warnings, and enhance driving safety.
Submitted 5 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Authors:
Haibo Jin,
Leyang Hu,
Xinuo Li,
Peiyan Zhang,
Chonghan Chen,
Jun Zhuang,
Haohan Wang
Abstract:
The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking--deliberately circumventing the ethical and operational boundaries of LLMs and VLMs--and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models. More details can be found on our website: https://chonghan-chen.com/llm-jailbreak-zoo-survey/.
Submitted 25 June, 2024;
originally announced July 2024.
-
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Authors:
Yubo Ma,
Yuhang Zang,
Liangyu Chen,
Meiqi Chen,
Yizhu Jiao,
Xinze Li,
Xinyuan Lu,
Ziyu Liu,
Yan Ma,
Xiaoyi Dong,
Pan Zhang,
Liangming Pan,
Yu-Gang Jiang,
Jiaqi Wang,
Yixin Cao,
Aixin Sun
Abstract:
Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark comprising 1,062 expert-annotated questions. Distinct from previous datasets, it is constructed upon 130 lengthy PDF-formatted documents with an average of 49.4 pages and 20,971 textual tokens. Towards comprehensive evaluation, answers to these questions rely on pieces of evidence from (1) different sources (text, image, chart, table, and layout structure) and (2) various locations (i.e. page number). Moreover, 33.2% of the questions are cross-page questions requiring evidence across multiple pages. 22.8% of the questions are designed to be unanswerable for detecting potential hallucinations. Experiments on 14 LVLMs demonstrate that long-context DU greatly challenges current models. Notably, the best-performing model, GPT-4o, achieves an F1 score of only 42.7%, while the second-best, GPT-4V, scores 31.4%. Furthermore, 12 LVLMs (all except GPT-4o and GPT-4V) even present worse performance than their LLM counterparts which are fed with lossy-parsed OCR documents. These results validate the necessity of future research toward more capable long-context LVLMs. Project Page: https://mayubo2333.github.io/MMLongBench-Doc
Submitted 10 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
MARS: Multimodal Active Robotic Sensing for Articulated Characterization
Authors:
Hongliang Zeng,
Ping Zhang,
Chengjiong Wu,
Jiahua Wang,
Tingyu Ye,
Fang Li
Abstract:
Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point cloud, a single-modal approach, often neglecting vital texture and lighting details and assuming ideal conditions like optimal viewpoints, unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characterization. It features a multi-modal fusion module utilizing multi-scale RGB features to enhance point cloud features, coupled with reinforcement learning-based active sensing for autonomous optimization of observation viewpoints. In experiments conducted with various articulated object instances from the PartNet-Mobility dataset, our method outperformed current state-of-the-art methods in joint parameter estimation accuracy. Additionally, through active sensing, MARS further reduces errors, demonstrating enhanced efficiency in handling suboptimal viewpoints. Furthermore, our method effectively generalizes to real-world articulated objects, enhancing robot interactions. Code is available at https://github.com/robhlzeng/MARS.
Submitted 1 July, 2024;
originally announced July 2024.
-
Study of $χ_{bJ}(2P)\toωΥ(1S)$ at Belle
Authors:
Belle Collaboration,
Z. S. Stottler,
T. K. Pedlar,
B. G. Fulsom,
I. Adachi,
K. Adamczyk,
H. Aihara,
S. Al Said,
D. M. Asner,
H. Atmacan,
T. Aushev,
R. Ayad,
V. Babu,
Sw. Banerjee,
M. Bauer,
P. Behera,
K. Belous,
J. Bennett,
F. Bernlochner,
M. Bessner,
T. Bilka,
D. Biswas,
A. Bobrov,
D. Bodrov,
G. Bonvicini
, et al. (157 additional authors not shown)
Abstract:
We report a study of the hadronic transitions $χ_{bJ}(2P)\toωΥ(1S)$, with $ω\toπ^{+}π^{-}π^{0}$, using $28.2\times10^6~Υ(3S)$ mesons recorded by the Belle detector. We present the first evidence for the near--threshold transition $χ_{b0}(2P)\toωΥ(1S)$, the analog of the charm sector decay $χ_{c1}(3872)\toωJ/ψ$, with a branching fraction of $B\big(χ_{b0}(2P)\toωΥ(1S)\big) = \big(0.55\pm0.19\pm0.07\big)\%$. We also obtain branching fractions of $B\big(χ_{b1}(2P)\toωΥ(1S)\big) = \big(2.39{}^{+0.20}_{-0.19}\pm0.24\big)\%$ and $B\big(χ_{b2}(2P)\toωΥ(1S)\big) = \big(0.47{}^{+0.13}_{-0.12}\pm0.06\big)\%$, confirming the measurement of the $ω$ transitions of the $J=1,2~P$--wave states. The ratio for the $J=2$ to $J=1$ transitions is also measured and found to differ by 3.3 standard deviations from the expected value in the QCD multipole expansion.
Submitted 8 July, 2024; v1 submitted 30 June, 2024;
originally announced July 2024.
-
Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation
Authors:
Rong Fu,
Zhongling Su,
Han-Sen Zhong,
Xiti Zhao,
Jianyang Zhang,
Feng Pan,
Pan Zhang,
Xianhe Zhao,
Ming-Cheng Chen,
Chao-Yang Lu,
Jian-Wei Pan,
Zhiling Pei,
Xingcheng Zhang,
Wanli Ouyang
Abstract:
Quantum Computational Superiority boasts rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's Sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a groundbreaking large-scale system technology that leverages optimization on global, node, and device levels to achieve unprecedented scalability for tensor networks. This enables the handling of large-scale tensor networks requiring up to tens of terabytes of memory, surpassing memory space constraints on a single node, and scales up to 2304 GPUs with a peak computing power of 561 PFLOPS in half precision. Notably, we achieve a time-to-solution of 14.22 seconds with an energy consumption of 2.39 kWh at a fidelity of 0.002. Our most remarkable result is a time-to-solution of 17.18 seconds with an energy consumption of only 0.29 kWh, achieving an XEB of 0.002 after post-processing and outperforming Google's quantum processor Sycamore, which recorded 600 seconds and 4.3 kWh, in both speed and energy efficiency.
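The comparison with Sycamore quoted in this abstract can be checked with simple arithmetic. The following is a minimal sketch that uses only the times and energies stated above; the dictionary keys and the `ratios` helper are illustrative names, not part of the authors' work.

```python
# Figures quoted in the abstract; names and structure are illustrative only.
SYCAMORE = {"seconds": 600.0, "kwh": 4.3}
RUNS = {
    "fidelity_0.002": {"seconds": 14.22, "kwh": 2.39},
    "xeb_0.002_post_processed": {"seconds": 17.18, "kwh": 0.29},
}


def ratios(run, baseline=SYCAMORE):
    """Return (speed-up, energy-saving factor) of a run relative to the baseline."""
    speedup = baseline["seconds"] / run["seconds"]
    energy_saving = baseline["kwh"] / run["kwh"]
    return speedup, energy_saving


if __name__ == "__main__":
    for name, run in RUNS.items():
        speedup, energy_saving = ratios(run)
        print(f"{name}: {speedup:.1f}x faster, {energy_saving:.1f}x less energy")
```

These ratios compare the quoted headline numbers only; they say nothing about sample quality beyond the fidelity and XEB values stated in the abstract.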
Submitted 30 June, 2024;
originally announced July 2024.
-
JSCDS: A Core Data Selection Method with Jason-Shannon Divergence for Caries RGB Images-Efficient Learning
Authors:
Peiliang Zhang,
Yujia Tong,
Chenghu Du,
Chao Che,
Yongjun Zhu
Abstract:
Deep learning-based RGB caries detection improves the efficiency of caries identification and is crucial for preventing oral diseases. The performance of deep learning models depends on high-quality data and requires substantial training resources, making efficient deployment challenging. Core data selection, by eliminating low-quality and confusing data, aims to enhance training efficiency without significantly compromising model performance. However, distance-based data selection methods struggle to distinguish dependencies among high-dimensional caries data. To address this issue, we propose a Core Data Selection Method with Jensen-Shannon Divergence (JSCDS) for efficient caries image learning and caries classification. We describe the core data selection criterion as the distribution of samples in different classes. JSCDS calculates the cluster centers by sample embedding representation in the caries classification network and utilizes Jensen-Shannon Divergence to compute the mutual information between data samples and cluster centers, capturing nonlinear dependencies among high-dimensional data. The average mutual information is calculated to fit the above distribution, serving as the criterion for constructing the core set for model training. Extensive experiments on RGB caries datasets show that JSCDS outperforms other data selection methods in prediction performance and time consumption. Notably, JSCDS exceeds the performance of the full-dataset model with only 50% of the core data, and its performance advantage becomes more pronounced with 70% of the core data.
Submitted 6 July, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
Electronic Correlations and Hund's Rule Coupling in Trilayer Nickelate La4Ni3O10
Authors:
Zihao Huo,
Peng Zhang,
Zihan Zhang,
Defang Duan,
Tian Cui
Abstract:
Trilayer Ruddlesden-Popper phase La4Ni3O10 has been observed with Tc over 30 K at high pressure in a recent experiment, which further expanded the nickelate superconductor family. In this study, we explored the effects of electronic correlations in La4Ni3O10 using density functional theory plus dynamical mean-field theory at ambient pressure and high pressure. Our derived spectral functions and Fermi surface of the ambient phase are consistent with the experimental results of angle-resolved photoemission spectroscopy, which emphasizes the importance of electronic correlations in La4Ni3O10. We also found that the electronic correlations in pressurized La4Ni3O10 are both orbital-dependent and layer-dependent due to the presence of Hund's rule coupling. There is a competition between Hund's rule coupling and crystal-field splitting, where the Ni-O layers with weaker crystal-field splitting energy have stronger electronic correlations.
Submitted 29 June, 2024;
originally announced July 2024.
-
The Emergence of Threads: The Birth of a New Social Network
Authors:
Peixian Zhang,
Yupeng He,
Ehsan-Ul Haq,
Jiahui He,
Gareth Tyson
Abstract:
Threads, a new microblogging platform from Meta, was launched in July 2023. In contrast to prior new platforms, Threads was born out of an existing parent platform, Instagram, for which all users must already possess an account. This offers a unique opportunity to study platform evolution, to understand how one existing platform can support the "birth" of another. With this in mind, this paper provides an initial exploration of Threads, contrasting it with its parent, Instagram. We compare user behaviour within and across the two social media platforms, focusing on posting frequency, content preferences, and engagement patterns. Utilising a temporal analysis framework, we identify consistent daily posting trends on the parent platform and uncover contrasting behaviours when comparing intra-platform and cross-platform activities. Our findings reveal that Threads engages more with political and AI-related topics, compared to Instagram which focuses more on lifestyle and fashion topics. Our analysis also shows that user activities align more closely on weekends across both platforms. Engagement analysis suggests that users prefer to post about topics that garner more likes and that topic consistency is maintained when users transition from Instagram to Threads. Our research provides insights into user behaviour and offers a basis for future studies on Threads.
Submitted 27 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
Leapfrogging Sycamore: Harnessing 1432 GPUs for 7$\times$ Faster Quantum Random Circuit Sampling
Authors:
Xian-He Zhao,
Han-Sen Zhong,
Feng Pan,
Zi-Han Chen,
Rong Fu,
Zhongling Su,
Xiaotong Xie,
Chaoxing Zhao,
Pan Zhang,
Wanli Ouyang,
Chao-Yang Lu,
Jian-Wei Pan,
Ming-Cheng Chen
Abstract:
Random quantum circuit sampling serves as a benchmark to demonstrate quantum computational advantage. Recent progress in classical algorithms, especially those based on tensor network methods, has significantly reduced the classical simulation time and challenged the claim of the first-generation quantum advantage experiments. However, in terms of generating uncorrelated samples, time-to-solution, and energy consumption, previous classical simulation experiments still underperform the \textit{Sycamore} processor. Here we report an energy-efficient classical simulation algorithm, using 1432 GPUs to simulate quantum random circuit sampling, which generates uncorrelated samples with a higher linear cross-entropy score and is 7 times faster than the \textit{Sycamore} 53-qubit experiment. We propose a post-processing algorithm to reduce the overall complexity, and integrate state-of-the-art high-performance general-purpose GPUs to achieve two orders of magnitude lower energy consumption compared to previous works. Our work provides the first unambiguous experimental evidence to refute \textit{Sycamore}'s claim of quantum advantage, and redefines the boundary of quantum computational advantage using random circuit sampling.
Submitted 27 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
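The asymmetry definition in this abstract, $R = (\mathcal{B}_S - \mathcal{B}_L)/(\mathcal{B}_S + \mathcal{B}_L)$, can be inverted to $\mathcal{B}_S = \mathcal{B}_L\,(1+R)/(1-R)$. The sketch below uses that relation to back out the $K_S^0$ branching fractions implied by the quoted central values. It is a consistency check on central values only, with no uncertainty propagation, and the function names are illustrative rather than taken from the analysis.

```python
def asymmetry(b_s, b_l):
    """K_S-K_L asymmetry R = (B_S - B_L) / (B_S + B_L)."""
    return (b_s - b_l) / (b_s + b_l)


def implied_b_s(r, b_l):
    """Solve R = (B_S - B_L) / (B_S + B_L) for B_S, given R and B_L."""
    return b_l * (1.0 + r) / (1.0 - r)


# Central values quoted in the abstract: (R, B(K_L mode) in percent).
MODES = {
    "p K0": (-0.025, 1.67),
    "p K0 pi+ pi-": (-0.027, 1.69),
    "p K0 pi0": (-0.015, 2.02),
}

if __name__ == "__main__":
    for mode, (r, b_l) in MODES.items():
        b_s = implied_b_s(r, b_l)
        # Round trip: recomputing R from the implied B_S recovers the input.
        assert abs(asymmetry(b_s, b_l) - r) < 1e-12
        print(f"{mode}: implied B(K_S mode) = {b_s:.2f}%")
```

A negative $R$ simply means the $K_L^0$ mode has the slightly larger central value; all three values are compatible with zero within the quoted uncertainties.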
Submitted 26 June, 2024;
originally announced June 2024.
-
Probing many-body Bell correlation depth with superconducting qubits
Authors:
Ke Wang,
Weikang Li,
Shibo Xu,
Mengyao Hu,
Jiachen Chen,
Yaozu Wu,
Chuanyu Zhang,
Feitong Jin,
Xuhao Zhu,
Yu Gao,
Ziqi Tan,
Aosai Zhang,
Ning Wang,
Yiren Zou,
Tingting Li,
Fanhao Shen,
Jiarun Zhong,
Zehang Bao,
Zitian Zhu,
Zixuan Song,
Jinfeng Deng,
Hang Dong,
Xu Zhang,
Pengfei Zhang,
Wenjie Jiang
, et al. (10 additional authors not shown)
Abstract:
Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing to machine learning. Nevertheless, the detection of nonlocality, especially in quantum many-body systems, is notoriously challenging. Here, we report an experimental certification of genuine multipartite Bell correlations, which signal nonlocality in quantum many-body systems, up to 24 qubits with a fully programmable superconducting quantum processor. In particular, we employ energy as a Bell correlation witness and variationally decrease the energy of a many-body system across a hierarchy of thresholds, below which an increasing Bell correlation depth can be certified from experimental data. As an illustrative example, we variationally prepare the low-energy state of a two-dimensional honeycomb model with 73 qubits and certify its Bell correlations by measuring an energy that surpasses the corresponding classical bound with up to 48 standard deviations. In addition, we variationally prepare a sequence of low-energy states and certify their genuine multipartite Bell correlations up to 24 qubits via energies measured efficiently by parity oscillation and multiple quantum coherence techniques. Our results establish a viable approach for preparing and certifying multipartite Bell correlations, which provide not only a finer benchmark beyond entanglement for quantum devices, but also a valuable guide towards exploiting multipartite Bell correlation in a wide spectrum of practical applications.
Submitted 25 June, 2024;
originally announced June 2024.
-
Physics-Informed AI Inverter
Authors:
Qing Shen,
Yifan Zhou,
Peng Zhang,
Yacov A. Shamash,
Roshan Sharma,
Bo Chen
Abstract:
This letter devises an AI-Inverter that pilots the use of a physics-informed neural network (PINN) to enable AI-based electromagnetic transient simulations (EMT) of grid-forming inverters. The contributions are threefold: (1) A PINN-enabled AI-Inverter is formulated; (2) An enhanced learning strategy, balanced-adaptive PINN, is devised; (3) Extensive validations and comparative analysis of the accuracy and efficiency of AI-Inverter are made to show its superiority over the classical electromagnetic transient programs (EMTP).
Submitted 10 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
Submitted 25 June, 2024;
originally announced June 2024.
-
Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds
Authors:
Hongliang Zeng,
Ping Zhang,
Fang Li,
Jiahua Wang,
Tingyu Ye,
Pengteng Guo
Abstract:
In the field of 2D image generation modeling and representation learning, Masked Generative Encoder (MAGE) has demonstrated the synergistic potential between generative modeling and representation learning. Inspired by this, we propose Point-MAGE to extend this concept to point cloud data. Specifically, this framework first utilizes a Vector Quantized Variational Autoencoder (VQVAE) to reconstruct a neural field representation of 3D shapes, thereby learning discrete semantic features of point patches. Subsequently, by combining the masking model with variable masking ratios, we achieve synchronous training for both generation and representation learning. Furthermore, our framework seamlessly integrates with existing point cloud self-supervised learning (SSL) models, thereby enhancing their performance. We extensively evaluate the representation learning and generation capabilities of Point-MAGE. In shape classification tasks, Point-MAGE achieved an accuracy of 94.2% on the ModelNet40 dataset and 92.9% (+1.3%) on the ScanObjectNN dataset. Additionally, it achieved new state-of-the-art performance in few-shot learning and part segmentation tasks. Experimental results also confirmed that Point-MAGE can generate detailed and high-quality 3D shapes in both unconditional and conditional settings.
Submitted 25 June, 2024;
originally announced June 2024.
-
MindSpore Quantum: A User-Friendly, High-Performance, and AI-Compatible Quantum Computing Framework
Authors:
Xusheng Xu,
Jiangyu Cui,
Zidong Cui,
Runhong He,
Qingyu Li,
Xiaowei Li,
Yanling Lin,
Jiale Liu,
Wuxin Liu,
Jiale Lu,
Maolin Luo,
Chufan Lyu,
Shijie Pan,
Mosharev Pavel,
Runqiu Shu,
Jialiang Tang,
Ruoqian Xu,
Shu Xu,
Kang Yang,
Fan Yu,
Qingguo Zeng,
Haiying Zhao,
Qiang Zheng,
Junyuan Zhou,
Xu Zhou
, et al. (14 additional authors not shown)
Abstract:
We introduce MindSpore Quantum, a pioneering hybrid quantum-classical framework with a primary focus on the design and implementation of noisy intermediate-scale quantum (NISQ) algorithms. Leveraging the robust support of MindSpore, an advanced open-source deep learning training/inference framework, MindSpore Quantum exhibits exceptional efficiency in the design and training of variational quantum algorithms on both CPU and GPU platforms, delivering remarkable performance. Furthermore, this framework places a strong emphasis on enhancing the operational efficiency of quantum algorithms when executed on real quantum hardware. This encompasses the development of algorithms for quantum circuit compilation and qubit mapping, crucial components for achieving optimal performance on quantum processors. In addition to the core framework, we introduce QuPack, a meticulously crafted quantum computing acceleration engine. QuPack significantly accelerates the simulation speed of MindSpore Quantum, particularly in variational quantum eigensolver (VQE), quantum approximate optimization algorithm (QAOA), and tensor network simulations, providing astonishing speed. This combination of cutting-edge technologies empowers researchers and practitioners to explore the frontiers of quantum computing with unprecedented efficiency and performance.
Submitted 10 July, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Long Context Transfer from Language to Vision
Authors:
Peiyuan Zhang,
Kaichen Zhang,
Bo Li,
Guangtao Zeng,
Jingkang Yang,
Yuanhan Zhang,
Ziyue Wang,
Haoran Tan,
Chunyuan Li,
Ziwei Liu
Abstract:
Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively, in this paper, we approach this problem from the perspective of the language model. By simply extrapolating the context length of the language backbone, we enable LMMs to comprehend orders of magnitude more visual tokens without any video training. We call this phenomenon long context transfer and carefully ablate its properties. To effectively measure LMMs' ability to generalize to long contexts in the vision modality, we develop V-NIAH (Visual Needle-In-A-Haystack), a purely synthetic long vision benchmark inspired by the language model's NIAH test. Our proposed Long Video Assistant (LongVA) can process 2000 frames or over 200K visual tokens without additional complexities. With its extended context length, LongVA achieves state-of-the-art performance on Video-MME among 7B-scale models by densely sampling more input frames. Our work is open-sourced at https://github.com/EvolvingLMMs-Lab/LongVA.
Submitted 30 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.