-
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
Authors:
Mo Li,
Songyang Zhang,
Yunxin Liu,
Kai Chen
Abstract:
In evaluating the long-context capabilities of large language models (LLMs), identifying content relevant to a user's query from original long documents is a crucial prerequisite for any LLM to answer questions based on long text. We present NeedleBench, a framework consisting of a series of progressively more challenging tasks for assessing bilingual long-context capabilities, spanning multiple length intervals (4k, 8k, 32k, 128k, 200k, 1000k, and beyond) and different depth ranges, allowing the strategic insertion of critical data points in different text depth zones to rigorously test the retrieval and reasoning capabilities of models in diverse contexts. We use the NeedleBench framework to assess how well leading open-source models can identify key information relevant to the question and use that information for reasoning in bilingual long texts. Furthermore, we propose the Ancestral Trace Challenge (ATC) to mimic the complexity of logical reasoning challenges that are likely to be present in real-world long-context tasks, providing a simple method for evaluating how LLMs deal with complex long-context situations. Our results suggest that current LLMs have significant room for improvement in practical long-context applications, as they struggle with the kind of complex logical reasoning that such tasks are likely to require. All code and resources are available at OpenCompass: https://github.com/open-compass/opencompass.
Submitted 16 July, 2024;
originally announced July 2024.
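Here is a minimal sketch of the needle-insertion idea behind NeedleBench. It assumes the haystack is already split into sentences and measures depth as a percentage of sentence count. The helper name `insert_needle` is hypothetical and is not part of the OpenCompass API.

```python
def insert_needle(sentences: list[str], needle: str, depth_percent: float) -> str:
    """Place `needle` at roughly `depth_percent` (0 = start, 100 = end) of the text.

    Hypothetical helper: depth is measured in sentences, so the needle never
    splits a sentence of the haystack.
    """
    if not 0 <= depth_percent <= 100:
        raise ValueError("depth_percent must be between 0 and 100")
    idx = round(len(sentences) * depth_percent / 100)
    return " ".join(sentences[:idx] + [needle] + sentences[idx:])


# Sweep several depths, as a benchmark grid might (values chosen for illustration).
haystack = ["A.", "B.", "C.", "D."]
for depth in (0, 50, 100):
    print(depth, insert_needle(haystack, "N.", depth))
```

A full harness would also vary the haystack length across the length intervals listed above. It would then check whether the model's answer recovers the needle's fact.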
-
Impossibility of latent inner product recovery via rate distortion
Authors:
Cheng Mao,
Shenduo Zhang
Abstract:
In this largely expository note, we present an impossibility result for inner product recovery in a random geometric graph or latent space model using the rate-distortion theory. More precisely, suppose that we observe a graph $A$ on $n$ vertices with average edge density $p$ generated from Gaussian or spherical latent locations $z_1, \dots, z_n \in \mathbb{R}^d$ associated with the $n$ vertices. It is of interest to estimate the inner products $\langle z_i, z_j \rangle$ which represent the geometry of the latent points. We prove that it is impossible to recover the inner products if $d \gtrsim n h(p)$ where $h(p)$ is the binary entropy function. This matches the condition required for positive results on inner product recovery in the literature. The proof follows the well-established rate-distortion theory with the main technical ingredient being a lower bound on the rate-distortion function of the Wishart distribution which is interesting in its own right.
Submitted 16 July, 2024;
originally announced July 2024.
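The threshold $d \gtrsim n h(p)$ becomes more concrete once the binary entropy $h(p) = -p\log p - (1-p)\log(1-p)$ is evaluated directly. In the sketch below, `c` is a placeholder for the unspecified constant hidden in $\gtrsim$. Natural logarithms are used, since the base only rescales that constant. The function names are illustrative only.

```python
import math


def binary_entropy(p: float) -> float:
    """h(p) = -p ln p - (1 - p) ln(1 - p), in nats; h(0) = h(1) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def in_impossible_regime(n: int, d: int, p: float, c: float = 1.0) -> bool:
    """True when d >= c * n * h(p); `c` is a placeholder for the unknown constant."""
    return d >= c * n * binary_entropy(p)


n, p = 1000, 0.01
print(n * binary_entropy(p))  # threshold scale n * h(p) for this sparse graph
print(in_impossible_regime(n, 100, p), in_impossible_regime(n, 10, p))
```

For small $p$, $h(p) \approx p\log(1/p)$. The critical dimension therefore grows roughly like $np\log(1/p)$, which is much smaller than $n$ for sparse graphs.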
-
Improved Quantum Power Method and Numerical Integration Using Quantum Singular Value Transformation
Authors:
Nhat A. Nghiem,
Hiroki Sukeno,
Shuyu Zhang,
Tzu-Chieh Wei
Abstract:
Quantum singular value transformation (QSVT) is a framework that has been shown to unify many primitives in quantum algorithms. In this work, we leverage the QSVT framework in two directions. We first show that the QSVT framework can accelerate one recently introduced quantum power method, which substantially improves its running time. Additionally, we incorporate several elementary numerical integration techniques, such as the rectangular method, Monte Carlo method, and quadrature method, into the QSVT framework, which results in polynomial speedup with respect to the size or the number of points of the grid. Our results thus provide further examples to demonstrate the potential of the QSVT and how it may enhance quantum algorithmic tasks.
Submitted 16 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\to μ^+ν_μ)=(\bfmuv)\%$ and $\mathcal{B}(D_s^+\to τ^+ν_τ)=(\bftauv)\%$, respectively. The product of the decay constant and the Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(\mufdsxvcsresult)_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(\taufdsxvcsresult)_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(\mufdsresult)_{μν}$\,MeV and ${f_{D^+_s}}=(\taufdsresult)_{τν}$\,MeV, respectively. Conversely, taking the value of $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(\muvcsresult)_{μν}$ and $|V_{cs}| = (\tauvcsresult)_{τν}$, respectively.
Submitted 16 July, 2024;
originally announced July 2024.
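For context, the lowest-order Standard Model relation between the measured branching fraction and the product $f_{D_s^+}|V_{cs}|$ is the textbook formula below, in natural units ($\hbar = 1$). It is shown as background and is not taken from this paper, whose treatment may include further corrections.

```latex
\begin{aligned}
\mathcal{B}(D_s^+\to\ell^+\nu_\ell)
  &= \frac{G_F^2}{8\pi}\, f_{D_s^+}^2\, |V_{cs}|^2\, m_\ell^2\, m_{D_s^+}
     \left(1 - \frac{m_\ell^2}{m_{D_s^+}^2}\right)^2 \tau_{D_s^+}, \\
f_{D_s^+}|V_{cs}|
  &= \left[\frac{8\pi\,\mathcal{B}(D_s^+\to\ell^+\nu_\ell)}
       {G_F^2\, m_\ell^2\, m_{D_s^+}\left(1 - m_\ell^2/m_{D_s^+}^2\right)^2 \tau_{D_s^+}}\right]^{1/2}.
\end{aligned}
```

Dividing $f_{D_s^+}|V_{cs}|$ by an external value of $|V_{cs}|$ gives $f_{D_s^+}$, and dividing it by an external value of $f_{D_s^+}$ gives $|V_{cs}|$. This is the logic of the last two sentences of the abstract.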
-
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
Authors:
Xiuquan Hou,
Meiqin Liu,
Senlin Zhang,
Ping Wei,
Badong Chen,
Xuguang Lan
Abstract:
This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from self-attention, which introduces no structural bias over inputs. To address this issue, we explore incorporating a position relation prior as attention bias to augment object detection, after verifying its statistical significance using a proposed quantitative macroscopic correlation (MC) metric. Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement, and further extends the traditional streaming pipeline of DETR into a contrastive relation pipeline to address the conflicts between non-duplicate predictions and positive supervision. Extensive experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach. Under the same configurations, Relation-DETR achieves a significant improvement (+2.0% AP compared to DINO), state-of-the-art performance (51.7% AP for 1x and 52.1% AP for 2x settings), and a remarkably faster convergence speed (over 40% AP with only 2 training epochs) than existing DETR detectors on COCO val2017. Moreover, the proposed relation encoder serves as a universal plug-and-play component, bringing clear improvements to, in principle, any DETR-like method. Furthermore, we introduce a class-agnostic detection dataset, SA-Det-100k. The experimental results on this dataset show that the proposed explicit position relation achieves a clear improvement of 1.3% AP, highlighting its potential for universal object detection. The code and dataset are available at https://github.com/xiuqhou/Relation-DETR.
Submitted 16 July, 2024;
originally announced July 2024.
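The core idea of injecting a position-relation prior as an additive attention bias can be sketched in plain Python. The relation feature used here, a log-scaled and size-normalized center distance, is a simplifying assumption for illustration, and so are the function names. The paper's relation encoder builds richer embeddings and is not reproduced here.

```python
import math

# Hypothetical box format: (center_x, center_y, width, height).
Box = tuple[float, float, float, float]


def relation_bias(boxes: list[Box]) -> list[list[float]]:
    """Pairwise position-relation prior (simplified assumption).

    bias[i][j] = -log(1 + |dx|/w_i + |dy|/h_i): boxes close to box i, relative
    to box i's size, receive a larger (less negative) bias.
    """
    bias = []
    for cx_i, cy_i, w_i, h_i in boxes:
        row = []
        for cx_j, cy_j, _, _ in boxes:
            dist = abs(cx_i - cx_j) / w_i + abs(cy_i - cy_j) / h_i
            row.append(-math.log1p(dist))
        bias.append(row)
    return bias


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, v, bias):
    """Single-head attention: softmax(q k^T / sqrt(d) + bias) v."""
    d = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        logits = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d) + bias[i][j]
            for j, k_j in enumerate(k)
        ]
        weights = softmax(logits)
        out.append([
            sum(weights[j] * v[j][c] for j in range(len(v)))
            for c in range(len(v[0]))
        ])
    return out


boxes = [(0.0, 0.0, 1.0, 1.0), (0.1, 0.0, 1.0, 1.0), (10.0, 0.0, 1.0, 1.0)]
zeros = [[0.0], [0.0], [0.0]]
values = [[1.0], [2.0], [3.0]]
print(biased_attention(zeros, zeros, values, relation_bias(boxes))[0])
```

With zero queries and keys, the bias alone determines the attention weights, so nearer boxes dominate the weighted average. Setting the bias to zero recovers uniform attention, which is the structure-free behavior the abstract identifies as a cause of slow convergence.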
-
Video-Language Alignment Pre-training via Spatio-Temporal Graph Transformer
Authors:
Shi-Xue Zhang,
Hongfa Wang,
Xiaobin Zhu,
Weibo Gu,
Tianjin Zhang,
Chun Yang,
Wei Liu,
Xu-Cheng Yin
Abstract:
Video-language alignment is a crucial multi-modal task that benefits various downstream applications, e.g., video-text retrieval and video question answering. Existing methods either utilize multi-modal information in video-text pairs or apply global and local alignment techniques to promote alignment precision. However, these methods often fail to fully explore the spatio-temporal relationships among vision tokens within a video and across different video-text pairs. In this paper, we propose a novel Spatio-Temporal Graph Transformer module (dubbed STGT) to uniformly learn spatial and temporal contexts for video-language alignment pre-training. Specifically, STGT combines spatio-temporal graph structure information with attention in the transformer block, effectively utilizing the spatio-temporal contexts. In this way, we can model the relationships between vision tokens, improving video-text alignment precision to benefit downstream tasks. In addition, we propose a self-similarity alignment loss to explore the inherent self-similarity in the video and text. Building on the initial optimization achieved by contrastive learning, it further improves the alignment accuracy between video and text. Experimental results on challenging downstream tasks, including video-text retrieval and video question answering, verify the superior performance of our method.
Submitted 16 July, 2024;
originally announced July 2024.
-
Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures
Authors:
Guoliang You,
Xiaomeng Chu,
Yifan Duan,
Wenyu Zhang,
Xingchen Li,
Sha Zhang,
Yao Li,
Jianmin Ji,
Yanyong Zhang
Abstract:
When planning for autonomous driving, it is crucial to consider essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, these elements are often overlooked by traditional end-to-end planning methods, likely leading to inefficiencies and non-compliance with traffic regulations. In this work, we endeavor to integrate the perception of these elements into the planning task. To this end, we propose Perception Helps Planning (PHP), a novel framework that reconciles lane-level planning with perception. This integration ensures that planning is inherently aligned with traffic constraints, thus facilitating safe and efficient driving. Specifically, PHP focuses on both edges of a lane for planning and perception purposes, taking into consideration the 3D positions of both lane edges and attributes for lane intersections, lane directions, lane occupancy, and planning. In the algorithmic design, the process begins with a transformer encoding multi-camera images to extract the above features and predict lane-level perception results. Next, the hierarchical feature early fusion module refines the features for predicting planning attributes. Finally, the double-edge interpreter uses a late-fusion process specifically designed to integrate lane-level perception and planning information, culminating in the generation of vehicle control signals. Experiments on three CARLA benchmarks show significant improvements in driving score of 27.20%, 33.47%, and 15.54% over existing algorithms, respectively, achieving state-of-the-art performance, with the system running at up to 22.57 FPS.
Submitted 16 July, 2024;
originally announced July 2024.
-
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
Authors:
Yuke Lin,
Ming Cheng,
Fulin Zhang,
Yingying Gao,
Shilei Zhang,
Ming Li
Abstract:
In this paper, we provide a large audio-visual speaker recognition dataset, VoxBlink2, which includes approximately 10M utterances with videos from 110K+ speakers in the wild. This dataset represents a significant expansion over the VoxBlink dataset, encompassing a broader diversity of speakers and scenarios thanks to an optimized data collection pipeline. We then explore the impact of training strategies, data scale, and model complexity on speaker verification and establish a new single-model state-of-the-art EER of 0.170% and minDCF of 0.006% on the VoxCeleb1-O test set. These results motivate us to explore speaker recognition from a new, challenging perspective. We introduce the Open-Set Speaker-Identification task, which is designed to either match a probe utterance with a known gallery speaker or categorize it as an unknown query. For this task, we design a concrete benchmark and evaluation protocols. The data and model resources can be found at http://voxblink2.github.io.
Submitted 16 July, 2024;
originally announced July 2024.
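The open-set identification decision described above can be sketched with cosine scoring and a rejection threshold: a probe is either matched to a gallery speaker or rejected as unknown. The embeddings, the threshold value, and the function names are hypothetical. The benchmark's actual scoring backend and protocol are defined by the authors.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify(probe: list[float], gallery: dict[str, list[float]],
             threshold: float) -> Optional[str]:
    """Return the best-matching gallery speaker, or None if the best cosine
    score falls below `threshold` (probe treated as an unknown speaker)."""
    best_id, best_score = None, -math.inf
    for speaker, embedding in gallery.items():
        score = cosine(probe, embedding)
        if score > best_score:
            best_id, best_score = speaker, score
    return best_id if best_score >= threshold else None


gallery = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}  # toy 2-D embeddings
print(identify([0.9, 0.1], gallery, threshold=0.5))    # close to spk_a
print(identify([-1.0, -1.0], gallery, threshold=0.5))  # far from every speaker
```

The threshold trades off false acceptances of unknown speakers against false rejections of enrolled ones. That trade-off is what distinguishes open-set identification from closed-set classification.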
-
Search for the rare $Λ_c^+ \to p μ^+ μ^-$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1062 additional authors not shown)
Abstract:
A search for the nonresonant $Λ_c^+ \to p μ^+ μ^-$ decay is performed using proton-proton collision data recorded at a centre-of-mass energy of 13 TeV by the LHCb experiment, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. No evidence for the decay is found in the dimuon invariant-mass regions where the expected contributions of resonances are subdominant. The upper limit on the branching fraction of the $Λ_c^+ \to p μ^+ μ^-$ decay is determined to be $2.9~(3.2) \times 10^{-8}$ at 90% (95%) confidence level. The branching fractions in the dimuon invariant-mass regions dominated by the $η$, $ρ$ and $ω$ resonances are also determined.
Submitted 16 July, 2024;
originally announced July 2024.
-
Follow-up observations of apparently one-off sources from the Parkes telescope
Authors:
S. B. Zhang,
X. Yang
Abstract:
A small fraction of fast radio bursts (FRBs) have been observed to emit multiple bursts, whereas most Galactic sources emitting radio pulses are known to repeat. Here we present the results of follow-up observations of two FRBs and four rotating radio transients (RRATs). Among these, only one RRAT has been observed with repeating pulses, with an estimated period of around 1.297047 s. For comparison, we reanalysed the Parkes archival follow-up observations in CSIRO's data archive for all apparently one-off sources discovered by the Parkes telescope, including 13 RRATs and 29 FRBs. In total, 3 RRATs are suggested to be repeaters, but no repeating signals were detected from the other sources. Reporting details of the non-detection observations for the apparently one-off sources would help investigate their origins, and catastrophic scenarios are worth proposing for both extragalactic and Galactic sources.
Submitted 15 July, 2024;
originally announced July 2024.
-
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models
Authors:
Mianxin Liu,
Jinru Ding,
Jie Xu,
Weiguo Hu,
Xiaoyang Li,
Lifeng Zhu,
Zhian Bai,
Xiaoming Shi,
Benyou Wang,
Haitao Song,
Pengfei Liu,
Xiaofan Zhang,
Shanshan Wang,
Kang Li,
Haofen Wang,
Tong Ruan,
Xuanjing Huang,
Xin Sun,
Shaoting Zhang
Abstract:
Ensuring the general efficacy and benefit to humans of medical large language models (LLMs) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLMs. First, MedBench assembles the largest evaluation dataset to date (300,901 questions), covering 43 clinical specialties, and performs multi-faceted evaluation of medical LLMs. Second, MedBench provides a standardized and fully automatic cloud-based evaluation infrastructure, with physical separation of questions and ground truth. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer memorization. Applying MedBench to popular general and medical LLMs, we observe unbiased, reproducible evaluation results that largely align with medical professionals' perspectives. This study establishes a significant foundation for preparing the practical applications of Chinese medical LLMs. MedBench is publicly accessible at https://medbench.opencompass.org.cn.
Submitted 23 June, 2024;
originally announced July 2024.
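The abstract does not detail MedBench's dynamic evaluation mechanisms. One common technique for discouraging answer memorization is to randomize the order of multiple-choice options in each evaluation round and remap the gold label. The sketch below illustrates only that generic technique, as an assumption; it is not MedBench's implementation, and the function name is hypothetical.

```python
import random


def shuffle_options(options: list[str], answer_idx: int,
                    seed: int) -> tuple[list[str], int]:
    """Return a seeded permutation of `options` and the gold answer's new index."""
    rng = random.Random(seed)  # per-round seed keeps each round reproducible
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(answer_idx)


options = ["option 1", "option 2", "option 3", "option 4"]  # toy item
for round_seed in (1, 2):
    shuffled, gold = shuffle_options(options, answer_idx=2, seed=round_seed)
    print(round_seed, shuffled, "gold:", shuffled[gold])
```

Because the gold option's letter changes from round to round, a model that memorized "the answer is C" gains no advantage. Seeding each round keeps the evaluation reproducible.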
-
When Heterophily Meets Heterogeneity: New Graph Benchmarks and Effective Methods
Authors:
Junhong Lin,
Xiaojie Guo,
Shuaicheng Zhang,
Dawei Zhou,
Yada Zhu,
Julian Shun
Abstract:
Many real-world graphs frequently present challenges for graph learning due to the presence of both heterophily and heterogeneity. However, existing benchmarks for graph learning often focus on heterogeneous graphs with homophily or homogeneous graphs with heterophily, leaving a gap in understanding how methods perform on graphs that are both heterogeneous and heterophilic. To bridge this gap, we introduce H2GB, a novel graph benchmark that brings together the complexities of both the heterophily and heterogeneity properties of graphs. Our benchmark encompasses 9 diverse real-world datasets across 5 domains, 28 baseline model implementations, and 26 benchmark results. In addition, we present a modular graph transformer framework UnifiedGT and a new model variant, H2G-former, that excels at this challenging benchmark. By integrating masked label embeddings, cross-type heterogeneous attention, and type-specific FFNs, H2G-former effectively tackles graph heterophily and heterogeneity. Extensive experiments across 26 baselines on H2GB reveal inadequacies of current models on heterogeneous heterophilic graph learning, and demonstrate the superiority of our H2G-former over existing solutions. Both the benchmark and the framework are available on GitHub (https://github.com/junhongmit/H2GB) and PyPI (https://pypi.org/project/H2GB), and documentation can be found at https://junhongmit.github.io/H2GB/.
Submitted 15 July, 2024;
originally announced July 2024.
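Heterophily is often quantified with the edge homophily ratio, the fraction of edges whose endpoints share a label. For a heterogeneous graph, one simple extension computes this ratio separately for each relation type, using only edges whose endpoints are both labeled. That extension is an assumption here and is not necessarily H2GB's exact metric.

```python
from collections import defaultdict


def homophily_per_relation(edges, labels):
    """Edge homophily ratio per relation type.

    edges: iterable of (src, relation, dst) triples.
    labels: dict mapping labeled nodes to class labels; edges touching an
    unlabeled node are skipped.
    Returns {relation: fraction of same-label edges}. Values near 0 indicate
    strong heterophily for that relation.
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for src, rel, dst in edges:
        if src not in labels or dst not in labels:
            continue
        total[rel] += 1
        same[rel] += int(labels[src] == labels[dst])
    return {rel: same[rel] / total[rel] for rel in total}


# Toy heterogeneous graph with two relation types and one unlabeled node (99).
edges = [(0, "cites", 1), (1, "cites", 2), (0, "writes", 3), (0, "cites", 99)]
labels = {0: "x", 1: "x", 2: "y", 3: "y"}
print(homophily_per_relation(edges, labels))
```

A graph can be homophilic under one relation and heterophilic under another. Per-relation statistics like these make that combination visible, whereas a single global ratio would hide it.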
-
First Measurement of Solar $^8$B Neutrino Flux through Coherent Elastic Neutrino-Nucleus Scattering in PandaX-4T
Authors:
PandaX Collaboration,
Zihao Bo,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Zhixing Gao,
Lisheng Geng,
Karl Giboni,
Xunan Guo,
Xuyuan Guo,
Zichao Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Houqi Huang,
Junting Huang,
Ruquan Hou,
Yu Hou,
Xiangdong Ji
, et al. (77 additional authors not shown)
Abstract:
The PandaX-4T liquid xenon detector at the China Jinping Underground Laboratory is used to measure the solar $^8$B neutrino flux by detecting neutrinos through coherent scattering with xenon nuclei. Data samples requiring the coincidence of scintillation and ionization signals (paired), as well as unpaired ionization-only signals (US2), are selected with energy thresholds of approximately 1.1 keV (0.33 keV) nuclear recoil energy. Combining the commissioning run and the first science run of PandaX-4T, total exposures of 1.25 and 1.04 tonne$\cdot$year are collected for the paired and US2 data, respectively. After unblinding, 3 and 332 events are observed with an expectation of 2.8$\pm$0.5 and 251$\pm$32 background events for the paired and US2 data, respectively. A combined analysis yields a best-fit $^8$B neutrino signal of 3.5 (75) events from the paired (US2) data sample, with $\sim$37% uncertainty, and the background-only hypothesis is disfavored at 2.64$σ$ significance. This gives a solar $^8$B neutrino flux of ($8.4\pm3.1$)$\times$10$^6$ cm$^{-2}$s$^{-1}$, consistent with the standard solar model prediction. This is the first indication of the solar $^8$B neutrino "fog" in a dark matter direct detection experiment.
Submitted 15 July, 2024;
originally announced July 2024.
-
R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
Authors:
Zheyuan Zhou,
Le Wang,
Naiyu Fang,
Zili Wang,
Lemiao Qiu,
Shuyou Zhang
Abstract:
3D anomaly detection plays a crucial role in monitoring parts for localized inherent defects in precision manufacturing. Embedding-based and reconstruction-based approaches are among the most popular and successful methods. However, there are two major challenges to the practical application of current approaches: 1) embedding-based models suffer from prohibitive computational and storage costs due to their memory bank structure; 2) reconstruction-based models built on the MAE mechanism fail to detect anomalies in unmasked regions. In this paper, we propose R3D-AD, which reconstructs anomalous point clouds with a diffusion model for precise 3D anomaly detection. Our approach capitalizes on the data distribution conversion of the diffusion process to entirely obscure the input's anomalous geometry. It learns a strict point-level displacement behavior step by step, methodically correcting the aberrant points. To improve the generalization of the model, we further present a novel 3D anomaly simulation strategy named Patch-Gen to generate realistic and diverse defect shapes, which narrows the domain gap between training and testing. R3D-AD ensures a uniform spatial transformation, which allows anomaly results to be generated straightforwardly by distance comparison. Extensive experiments show that R3D-AD outperforms previous state-of-the-art methods, achieving 73.4% image-level AUROC on the Real3D-AD dataset and 74.9% image-level AUROC on the Anomaly-ShapeNet dataset with exceptional efficiency.
Submitted 15 July, 2024;
originally announced July 2024.
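Since R3D-AD learns point-level displacements, scoring anomalies by distance comparison can be as simple as measuring how far each point moved between the input and its reconstruction. The sketch assumes one-to-one point correspondence and uses the maximum per-point distance as the object-level score. Both choices, and the function name, are assumptions for illustration rather than the paper's exact scoring rule.

```python
import math

Point = tuple[float, float, float]


def anomaly_scores(original: list[Point],
                   reconstructed: list[Point]) -> tuple[list[float], float]:
    """Per-point displacement between input and reconstruction, plus an
    object-level score taken as the maximum (an assumed aggregation)."""
    if len(original) != len(reconstructed):
        raise ValueError("clouds must be in one-to-one point correspondence")
    per_point = [math.dist(p, q) for p, q in zip(original, reconstructed)]
    return per_point, max(per_point)


# Toy cloud: the third point sits 0.5 above the surface the model restores.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
restored = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(anomaly_scores(cloud, restored))
```

The per-point distances give a localization map, and the aggregated value gives a single score per object, which is what an image-level AUROC is computed from.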
-
Revealing Gas Inflows towards the Galactic Central Molecular Zone
Authors:
Yang Su,
Shiyu Zhang,
Yan Sun,
Ji Yang,
Qing-Zeng Yan,
Shaobo Zhang,
Zhiwei Chen,
Xuepeng Chen,
Xin Zhou,
Lixia Yuan
Abstract:
We study the gas inflows towards the Galactic Central Molecular Zone (CMZ) based on the gas morphological and kinematic features from the MWISP survey in the region of l=1.2 deg--19.0 deg and |b|<3.0 deg. We find that the near dust lane extends to l~15 deg, where the end of the large-scale gas structure intersects the 3 kpc-ring at a distance of ~5 kpc. Intriguingly, many filamentary molecular clouds (MCs), together with bow-like/ballistic-like clouds and continuous CO features with notable velocity gradients, are finely outlined along the long structure. These MCs also have relatively large velocity dispersions, indicating shocked gas generated by continuous local accretion and thus enhanced turbulence along the entire gas structure. We suggest that the ~3.1--3.6 kpc long CO structure originates from accreting molecular gas driven by the Galactic bar. The gas near the bar end at the 3 kpc-ring becomes an important reservoir for the large-scale flows accreting inwards to the CMZ through the bar channel. The inclination angle of the bar is estimated to be 20--26 deg, while the pattern speed of the bar is 30--35 km/s/kpc. The total mass of the whole gas lane is about (0.9-1.7)x10^7 Msun according to the calculated X_CO=(0.6-1.4)x10^20 cm^-2 (K km/s)^-1 from the large-scale CO data and the complementary HI data. The mean gas inflow rate is about 0.8-1.4 Msun/yr, which seems to be comparable to the outflow rate of the Galactic nuclear winds after applying the updated lower X-factor value above.
Submitted 15 July, 2024;
originally announced July 2024.
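For readers unfamiliar with the X-factor, the standard way to obtain a CO-based molecular mass from $X_{\mathrm{CO}}$ is sketched below. The mean molecular weight $\mu \approx 2.8$ per H$_2$ molecule, which accounts for helium, and the pixel-sum form are common conventions assumed here. The paper's exact mass calculation, which also uses HI data, may differ.

```latex
\begin{aligned}
W_{\mathrm{CO}} &= \int T_{\mathrm{mb}}\,\mathrm{d}v \quad [\mathrm{K\,km\,s^{-1}}], \\
N(\mathrm{H_2}) &= X_{\mathrm{CO}}\, W_{\mathrm{CO}} \quad [\mathrm{cm^{-2}}], \\
M_{\mathrm{H_2}} &= \mu\, m_{\mathrm{H}}\, d^2\, \Omega_{\mathrm{pix}} \sum_{\mathrm{pixels}} N(\mathrm{H_2}).
\end{aligned}
```

Here $d$ is the distance and $\Omega_{\mathrm{pix}}$ is the solid angle of one pixel, so $d^2\Omega_{\mathrm{pix}}$ is the physical pixel area. At fixed $W_{\mathrm{CO}}$, the molecular mass scales linearly with $X_{\mathrm{CO}}$. A lower X-factor therefore lowers the inferred mass, and with it any inflow rate derived from that mass.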
-
Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification
Authors:
Linhao Qu,
Dingkang Yang,
Dan Huang,
Qinhao Guo,
Rongkui Luo,
Shaoting Zhang,
Xiaosong Wang
Abstract:
Current multi-instance learning algorithms for pathology image analysis often require a substantial number of Whole Slide Images (WSIs) for effective training but exhibit suboptimal performance in scenarios with limited training data. In clinical settings, restricted access to pathology slides is inevitable due to patient privacy concerns and the prevalence of rare or emerging diseases. The emerging task of Few-shot Weakly Supervised WSI Classification addresses the significant challenge of limited slide data and sparse slide-level labels for diagnosis. Prompt learning based on pre-trained models (e.g., CLIP) appears to be a promising scheme for this setting; however, current research in this area is limited, and existing algorithms often focus solely on patch-level prompts or confine themselves to language prompts. This paper proposes a multi-instance prompt learning framework enhanced with pathology knowledge, i.e., integrating visual and textual prior knowledge into prompts at both patch and slide levels. The training process employs a combination of static and learnable prompts, effectively guiding the activation of pre-trained models and further facilitating the diagnosis of key pathology patterns. Lightweight Messenger (self-attention) and Summary (attention-pooling) layers are introduced to model relationships between patches and slides within the same patient data. Additionally, alignment-wise contrastive losses ensure feature-level alignment between visual and textual learnable prompts for both patches and slides. Our method demonstrates superior performance in three challenging clinical tasks, significantly outperforming comparative few-shot methods.
Submitted 15 July, 2024;
originally announced July 2024.
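Attention pooling, the operation behind the Summary layer mentioned above, aggregates a variable number of patch features into one slide-level vector using softmax weights. The minimal sketch below assumes a fixed scoring vector, which in practice would be learned. The function name is hypothetical.

```python
import math


def attention_pool(features: list[list[float]],
                   score_vec: list[float]) -> tuple[list[float], list[float]]:
    """Pool N patch features into one vector with softmax attention weights.

    score_vec stands in for learned parameters (hypothetical fixed values here).
    """
    logits = [sum(f * w for f, w in zip(feat, score_vec)) for feat in features]
    m = max(logits)  # stabilize the softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(weights[i] * features[i][c] for i in range(len(features)))
              for c in range(dim)]
    return pooled, weights


patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy patch embeddings
print(attention_pool(patches, score_vec=[2.0, 0.0]))
```

Unlike mean pooling, the weights let a few diagnostically relevant patches dominate the slide representation. This matters when only a small fraction of a slide shows the pathology.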
-
Edwards thermodynamic framework controls density segregation in cyclically sheared granular materials
Authors:
Haiyang Lu,
Houfei Yuan,
Shuyang Zhang,
Zhikun Zeng,
Yi Xing,
Jiazhao Xu,
Xin Wang,
Yujie Wang
Abstract:
Using X-ray tomography, we experimentally investigate granular segregation in a mixture of particles with different densities under quasi-static cyclic shear. We quantitatively characterize their steady-state height distributions by minimizing an effective free energy based on a segregation temperature that captures the competition between mixing entropy and gravitational potential energy. We find that this temperature coincides, within error, with Edwards' compactivity under various pressures and cyclic shear amplitudes. Therefore, granular segregation under quasi-static conditions can be fundamentally explained by an effective granular thermodynamic framework that includes real energy terms based on the Edwards statistical ensemble.
Submitted 15 July, 2024;
originally announced July 2024.
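To make the free-energy argument in the Edwards abstract above concrete, here is a minimal derivation under an assumed model that is not taken from the paper: $\phi(z)$ is the local fraction of the heavier species at height $z$, $\Delta m$ is the mass difference per particle, the mixing entropy has the ideal lattice form, $T_s$ is the segregation temperature in energy units, and $\mu$ is a Lagrange multiplier fixing the total amount of the heavy species.

$$
F[\phi] = \int \Big\{ \Delta m\, g\, z\, \phi(z) + T_s \big[\phi \ln \phi + (1-\phi)\ln(1-\phi)\big] \Big\}\, dz \;-\; \mu \int \phi(z)\, dz
$$

$$
\frac{\delta F}{\delta \phi} = \Delta m\, g\, z + T_s \ln\frac{\phi}{1-\phi} - \mu = 0
\quad\Longrightarrow\quad
\phi(z) = \frac{1}{1 + \exp\!\big[(\Delta m\, g\, z - \mu)/T_s\big]}
$$

In this toy picture a large $T_s$ flattens the profile (mixing entropy wins) and a small $T_s$ sharpens it toward full segregation (gravity wins), which is the competition the abstract describes; the paper's actual free-energy functional may contain additional terms.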
-
Sudden polarization angle jumps of the repeating fast radio burst FRB 20201124A
Authors:
J. R. Niu,
W. Y. Wang,
J. C. Jiang,
Y. Qu,
D. J. Zhou,
W. W. Zhu,
K. J. Lee,
J. L. Han,
B. Zhang,
D. Li,
S. Cao,
Z. Y. Fang,
Y. Feng,
Q. Y. Fu,
P. Jiang,
W. C. Jing,
J. Li,
Y. Li,
R. Luo,
L. Q. Meng,
C. C. Miao,
X. L. Miao,
C. H. Niu,
Y. C. Pan,
B. J. Wang
, et al. (19 additional authors not shown)
Abstract:
We report the first detection of orthogonal polarization angle (PA) jumps, a phenomenon previously observed only in radio pulsars, from the fast radio burst (FRB) source FRB 20201124A. We find three cases of orthogonal jumps in over two thousand bursts, all resembling those observed in pulsar single pulses. We propose that the jumps are due to the superposition of two orthogonal emission modes that could only be produced in a highly magnetized plasma, and that they are caused by the line of sight sweeping across a rotating magnetosphere. The shortest jump timescale is of the order of one millisecond, which hints that the emission modes come from regions smaller than the light cylinder of most pulsars or magnetars. This discovery provides convincing evidence that FRB emission originates from the complex magnetosphere of a magnetar, suggesting an FRB emission mechanism analogous to that of radio pulsars despite the huge luminosity difference between the two types of objects.
Submitted 15 July, 2024;
originally announced July 2024.
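The two-mode explanation in the FRB 20201124A abstract above can be illustrated with standard polarimetry, under the simplifying assumption (not from the paper) of two fully linearly polarized modes at position angles ψ and ψ + 90°. Because a 90° rotation flips the signs of Stokes Q and U, the two modes partially cancel and the observed PA follows whichever mode is brighter, so it jumps by exactly 90° when the dominant mode changes. The function name and intensities are illustrative.

```python
import math

def position_angle(l_a: float, l_b: float, psi_deg: float) -> float:
    """Observed PA in degrees, in [0, 180), of two superposed orthogonal linear modes.

    Mode A has linear intensity l_a at PA psi; mode B has l_b at psi + 90 deg.
    Mode B contributes Stokes Q and U with the opposite sign, so only the
    difference l_a - l_b survives. If l_a == l_b there is no net linear
    polarization and the returned angle is meaningless.
    """
    two_psi = math.radians(2.0 * psi_deg)
    q = (l_a - l_b) * math.cos(two_psi)
    u = (l_a - l_b) * math.sin(two_psi)
    return (0.5 * math.degrees(math.atan2(u, q))) % 180.0

if __name__ == "__main__":
    # Sweep mode B's intensity: the PA sits at 30 deg, then jumps to 120 deg.
    for l_b in (0.5, 0.9, 1.1, 1.5):
        print(l_b, round(position_angle(1.0, l_b, 30.0), 3))
```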
-
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
Authors:
Songyang Zhang,
Chuyu Zhang,
Yingfan Hu,
Haowen Shen,
Kuikun Liu,
Zerun Ma,
Fengzhe Zhou,
Wenwei Zhang,
Xuming He,
Dahua Lin,
Kai Chen
Abstract:
While LLM-based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their abilities is challenging, which hinders a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation framework includes an evaluation dataset and two evaluation modes. The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions. The two evaluation modes assess LLMs' ability with and without human assistance. We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.
Submitted 15 July, 2024;
originally announced July 2024.
-
Cooperative Reward Shaping for Multi-Agent Pathfinding
Authors:
Zhenyu Song,
Ronghao Zheng,
Senlin Zhang,
Meiqin Liu
Abstract:
The primary objective of Multi-Agent Pathfinding (MAPF) is to plan efficient and conflict-free paths for all agents. Traditional multi-agent path planning algorithms struggle to achieve efficient distributed path planning for multiple agents. In contrast, Multi-Agent Reinforcement Learning (MARL) has been demonstrated to be an effective approach to this objective. By modeling the MAPF problem as a MARL problem, agents can achieve efficient path planning and collision avoidance through distributed strategies under partial observation. However, MARL strategies often lack cooperation among agents due to the absence of global information, which in turn reduces MAPF efficiency. To address this challenge, this letter introduces a unique reward shaping technique based on Independent Q-Learning (IQL). The method evaluates the influence of one agent on its neighbors and integrates this interaction into the reward function, leading to active cooperation among agents. This reward shaping method facilitates cooperation among agents while operating in a distributed manner. The proposed approach has been evaluated through experiments across various scenarios with different scales and agent counts, and the results are compared with those of other state-of-the-art (SOTA) planners. The evidence suggests that the proposed approach performs on par with other planners in many respects and outperforms them in scenarios featuring a large number of agents.
Submitted 14 July, 2024;
originally announced July 2024.
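Independent Q-Learning means each agent runs its own Q-learning update as if the other agents were part of the environment. The sketch below shows that standard tabular update plus a hook for reward shaping. The shaping term (adding the mean effect an agent had on its neighbours' rewards, scaled by a hypothetical weight `beta`) is a placeholder assumption; the letter's actual influence measure is not given in the abstract above.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple

QTable = DefaultDict[Tuple[str, int], float]  # (state, action) -> value, default 0.0

def shaped_reward(own_reward: float, neighbor_effects: Sequence[float],
                  beta: float = 0.5) -> float:
    # Hypothetical shaping: add beta times the mean effect this agent's action had
    # on its neighbours' rewards. The letter's actual influence term may differ.
    if not neighbor_effects:
        return own_reward
    return own_reward + beta * sum(neighbor_effects) / len(neighbor_effects)

def iql_update(q: QTable, state: str, action: int, reward: float, next_state: str,
               n_actions: int, alpha: float = 0.1, gamma: float = 0.95) -> None:
    # Standard tabular Q-learning step on one agent's own table. In IQL every agent
    # runs this independently, treating the other agents as part of the environment.
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])

if __name__ == "__main__":
    q: QTable = defaultdict(float)
    r = shaped_reward(own_reward=-1.0, neighbor_effects=[0.4, -0.2], beta=0.5)
    iql_update(q, state="s0", action=2, reward=r, next_state="s1", n_actions=5)
    print(r, q[("s0", 2)])
```

Because the shaping only needs rewards observed from local neighbours, each agent can still update its own table without any global information.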
-
Supernova Pointing Capabilities of DUNE
Authors:
DUNE Collaboration,
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
B. Aimard,
F. Akbar,
K. Allison,
S. Alonso Monsalve,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1340 additional authors not shown)
Abstract:
The determination of the direction of a stellar core collapse via its neutrino emission is crucial for identifying the progenitor in a multimessenger follow-up. A highly effective method of reconstructing supernova directions within the Deep Underground Neutrino Experiment (DUNE) is introduced. The supernova neutrino pointing resolution is studied by simulating and reconstructing electron-neutrino charged-current absorption on $^{40}$Ar and elastic scattering of neutrinos on electrons. Procedures to reconstruct individual interactions, including a newly developed technique called "brems flipping", as well as the burst direction from an ensemble of interactions are described. Performance of the burst direction reconstruction is evaluated for supernovae at a distance of 10 kpc for a specific supernova burst flux model. The pointing resolution is found to be 3.4 degrees at 68% coverage for a perfect interaction-channel classification and a fiducial mass of 40 kton, and 6.6 degrees for a 10 kton fiducial mass. Assuming a 4% rate of charged-current interactions being misidentified as elastic scattering, DUNE's burst pointing resolution is found to be 4.3 degrees (8.7 degrees) at 68% coverage.
Submitted 14 July, 2024;
originally announced July 2024.
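The DUNE abstract above quotes pointing resolution "at 68% coverage". A common reading, assumed here rather than taken from the paper, is the opening angle around the true supernova direction that contains 68% of the reconstructed burst directions, each entry standing for the direction reconstructed from one simulated supernova. The sketch computes that angle from a list of reconstructed unit vectors; the toy inputs are made up.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]

def angle_deg(a: Vec, b: Vec) -> float:
    # Opening angle between two direction vectors, in degrees.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def pointing_resolution(true_dir: Vec, reco_dirs: Sequence[Vec],
                        coverage: float = 0.68) -> float:
    """Smallest opening angle around true_dir that contains `coverage` of reco_dirs."""
    angles = sorted(angle_deg(true_dir, d) for d in reco_dirs)
    # Number of reconstructions that must fall inside the cone; the tiny epsilon
    # stops float noise (0.68 * n landing just above an integer) from rounding up.
    k = max(1, math.ceil(coverage * len(angles) - 1e-9))
    return angles[k - 1]

if __name__ == "__main__":
    truth = (0.0, 0.0, 1.0)
    # Toy reconstructions tilted 1..10 degrees away from the true direction.
    recos = [(math.sin(math.radians(t)), 0.0, math.cos(math.radians(t)))
             for t in range(1, 11)]
    print(pointing_resolution(truth, recos))
```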
-
The inadequacy of the $ρ$-T curve for phase transitions in the presence of magnetic fields
Authors:
Shengnan Zhang,
Zhong Fang,
Hongming Weng,
Quansheng Wu
Abstract:
The $ρ(T)$ curve is traditionally employed to discern metallic, semiconducting, and insulating behaviors in materials, with any deviations often interpreted as indicative of phase transitions. However, does this interpretation hold under the influence of a magnetic field? Our research addresses this critical question by reevaluating the $ρ(T)$ curve in the presence of a magnetic field. We find that metal-insulator shifts and reentrant metallic states may not indicate true phase transitions but may instead originate from the scaling behavior of magnetoresistance, which depends on magnetic field and temperature through a power law. Employing first-principles calculations and the Boltzmann method, we analyzed the magnetoresistance of SiP$_2$ and NbP across a range of conditions, not only explaining the reentrant behavior observed in experiments but also resolving the discrepancies in magnetoresistance behavior reported by different research groups. These findings challenge the conventional use of the $ρ(T)$ curve as a straightforward indicator of phase transitions under magnetic fields, highlighting the need to exclude typical magnetoresistance effects due to the Lorentz force before confirming such transitions. This insight reshapes our understanding of complex material properties in magnetic fields and sets a new precedent for the interpretation of transport phenomena in condensed matter physics.
Submitted 14 July, 2024;
originally announced July 2024.
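The $ρ$-T abstract above attributes apparent metal-insulator shifts to power-law scaling of magnetoresistance rather than a phase transition. A toy model, assumed here and not taken from the paper (which uses first-principles and Boltzmann calculations), shows the mechanism: take Kohler-type scaling MR = k (B/ρ0)^m with a purely metallic ρ0(T). With m = 2 this gives ρ(T, B) = ρ0 + k B²/ρ0, which is minimized where ρ0 = √k B, so ρ(T) measured in field turns up on cooling even though ρ0 itself never does. All parameter values are hypothetical.

```python
from typing import List

def rho0(t: float, rho_res: float = 1.0, a: float = 0.01) -> float:
    # Hypothetical zero-field resistivity of a plain metal: residual + a*T^2,
    # monotonically increasing in T (no transition anywhere).
    return rho_res + a * t * t

def rho_in_field(t: float, b: float, k: float = 1.0, m: float = 2.0) -> float:
    # Kohler-type scaling assumed here: MR = k * (B / rho0)^m, rho = rho0 * (1 + MR).
    r0 = rho0(t)
    return r0 * (1.0 + k * (b / r0) ** m)

if __name__ == "__main__":
    temps: List[float] = [i * 0.1 for i in range(301)]  # 0 to 30, arbitrary units
    curve = [rho_in_field(t, b=3.0) for t in temps]
    i_min = min(range(len(curve)), key=curve.__getitem__)
    # Below temps[i_min], rho falls on heating (insulator-like); above it, rho rises
    # on heating (metal-like), although rho0(T) is metallic at every temperature.
    print(temps[i_min], curve[0], curve[i_min], curve[-1])
```

For B = 3 and k = 1 the minimum sits where ρ0 = 3, i.e. T = √200 ≈ 14.1 in these units.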
-
Triggering the Untriggered: The First Einstein Probe-Detected Gamma-Ray Burst 240219A and Its Implications
Authors:
Yi-Han Iris Yin,
Bin-Bin Zhang,
Jun Yang,
Hui Sun,
Chen Zhang,
Yi-Xuan Shao,
You-Dong Hu,
Zi-Pei Zhu,
Dong Xu,
Li An,
He Gao,
Xue-Feng Wu,
Bing Zhang,
Alberto Javier Castro-Tirado,
Shashi B. Pandey,
Arne Rau,
Weihua Lei,
Wei Xie,
Giancarlo Ghirlanda,
Luigi Piro,
Paul O'Brien,
Eleonora Troja,
Peter Jonker,
Yun-Wei Yu,
Jie An
, et al. (26 additional authors not shown)
Abstract:
The Einstein Probe (EP) achieved its first detection and localization of a bright X-ray flare, EP240219a, on February 19, 2024, during its commissioning phase. Subsequent targeted searches triggered by the EP240219a alert identified a faint, untriggered gamma-ray burst (GRB) in the archived data of Fermi/GBM, Swift/BAT, Insight-HXMT/HE and INTEGRAL/SPI-ACS. The EP/WXT light curve reveals a long duration of approximately 160 seconds with a slow decay, whereas the Fermi/GBM light curve shows a total duration of approximately 70 seconds. The peak in the Fermi/GBM light curve occurs slightly later than the peak seen in the EP/WXT light curve. Our spectral analysis shows that a single cutoff power-law model generally describes the joint EP/WXT-Fermi/GBM spectra well, indicating coherent broad emission typical of GRBs. The model yielded a photon index of $\sim -1.70 \pm 0.05$ and a peak energy of $\sim 257 \pm 134$ keV. After the detection of GRB 240219A, long-term observations identified several candidates at optical and radio wavelengths, none of which was confirmed as the afterglow counterpart during subsequent optical and near-infrared follow-ups. The analysis of GRB 240219A classifies it as an X-ray rich GRB with a high peak energy, presenting both challenges and opportunities for studying the physical origins of X-ray flashes (XRFs), X-ray rich GRBs (XRRs), and classical GRBs (C-GRBs). Furthermore, linking the cutoff power-law component to non-thermal synchrotron radiation suggests that the burst is driven by a Poynting flux-dominated outflow.
Submitted 14 July, 2024;
originally announced July 2024.
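The GRB 240219A abstract above reports a photon index and a peak energy for a cutoff power-law fit. Assuming the convention N(E) ∝ E^α exp(-E/E_c) with α negative (fitting packages differ in sign convention, and the paper's exact parameterization is not given in the abstract), the νFν spectrum E²N(E) ∝ E^(α+2) exp(-E/E_c) peaks at E_p = (2 + α) E_c whenever α > -2. The sketch encodes that relation and its inverse; the pivot energy of 100 keV is an arbitrary choice.

```python
import math

def cpl_photon_spectrum(e_kev: float, alpha: float, e_c_kev: float,
                        norm: float = 1.0, e_piv_kev: float = 100.0) -> float:
    # Cutoff power law N(E) = K * (E / E_piv)^alpha * exp(-E / E_c); alpha < 0 here.
    return norm * (e_kev / e_piv_kev) ** alpha * math.exp(-e_kev / e_c_kev)

def peak_energy(alpha: float, e_c_kev: float) -> float:
    # E^2 N(E) ∝ E^(alpha + 2) exp(-E / E_c); setting its log-derivative to zero
    # gives E_p = (2 + alpha) * E_c, which is a peak only when alpha > -2.
    if alpha <= -2.0:
        raise ValueError("E^2 N(E) has no peak for alpha <= -2")
    return (2.0 + alpha) * e_c_kev

def cutoff_from_peak(alpha: float, e_p_kev: float) -> float:
    # Inverse relation: the cutoff energy implied by a quoted peak energy.
    return e_p_kev / (2.0 + alpha)

if __name__ == "__main__":
    e_c = cutoff_from_peak(-1.70, 257.0)  # central values quoted in the abstract
    print(round(e_c, 1))
```

With the quoted central values, α ≈ -1.70 and E_p ≈ 257 keV, the implied cutoff is E_c = 257/0.30 ≈ 857 keV; the large uncertainty on E_p (±134 keV) carries straight over to E_c.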
-
The STAR Forward Silicon Tracker
Authors:
J. D. Brandenburg,
Y. Chang,
J. Dong,
Y. He,
Y. Hu,
H. Huang,
T. Huang,
H. Li,
M. Nie,
R. Sharma,
X. Sun,
P. Tribedy,
F. Videbæk,
G. Visser,
G. Wilks,
P. Wang,
G. Xie,
G. Yan,
Z. Ye,
L. Yi,
Y. Yang,
S. Zhang,
Z. Zhang
Abstract:
The Forward Silicon Tracker (FST) is a pivotal component of the forward upgrade of the Solenoidal Tracker at RHIC (STAR), designed to discern hadron charge signs with a momentum resolution better than 30% for $0.2 < p_T < 2$ GeV/c in the $2.5 < η < 4$ pseudorapidity range. Its compact design features three disks along the beam direction and minimizes material budget and scattering effects. The FST uses Hamamatsu's p-in-n silicon strip sensors with a double metal layer for efficient signal processing. The flexible hybrid boards, essential for the readout system, are constructed with Kapton and copper layers to optimize signal handling and power distribution. These boards connect the silicon strips to APV25-S1 analogue pipeline ASIC chips, each of which reads up to 128 channels. A cooling system with nonconducting, volatile NOVEC 7200 coolant at 22.2°C mitigates ASIC-generated heat. The FST enhances forward tracking performance at RHIC, showcasing unique design solutions to complex challenges.
Submitted 13 July, 2024;
originally announced July 2024.
-
Detection of hidden emissions in two rotating radio transients with high surface magnetic fields
Authors:
S. B. Zhang,
X. Yang,
J. J. Geng,
Y. P. Yang,
X. F. Wu
Abstract:
Rotating Radio Transients (RRATs) are neutron stars emitting sporadic radio pulses. The unique emission of RRATs has been proposed to resemble that of known pulsar types, such as extreme nulling pulsars or pulsars with giant pulses. However, whether there is additional radiation beyond these sporadic pulses remains unclear. Through high-sensitivity observations and extended tracking, we detected sequential weak emission in two RRATs with relatively high surface magnetic fields ($B_s > 10^{13}$ G): J1846-0257 and J1854+0306. These emissions show peak flux densities of 0.15 and 0.41 mJy, up to 687 and 512 times weaker than our detected RRAT single pulses, respectively. The weak emissions contribute small fractions (~16% and ~5%) of the total radio pulse energy release, contrasting significantly with giant-pulse pulsars, where normal pulses dominate. Polarization analysis of J1854+0306 suggests that its sporadic RRAT pulses may originate from intermittent enhanced sparking processes due to magnetospheric evolution. Our findings indicate that some RRATs may represent a novel class of pulsars, distinct from any previously known subclass. Further observations of sources with similar rotational properties using high-sensitivity instruments could validate the generality of these hidden emissions.
Submitted 13 July, 2024;
originally announced July 2024.
-
Don't Fear Peculiar Activation Functions: EUAF and Beyond
Authors:
Qianchao Wang,
Shijun Zhang,
Dong Zeng,
Zhaoheng Xie,
Hengtao Guo,
Feng-Lei Fan,
Tieyong Zeng
Abstract:
In this paper, we propose a new super-expressive activation function called the Parametric Elementary Universal Activation Function (PEUAF). We demonstrate the effectiveness of PEUAF through systematic and comprehensive experiments on various industrial and image datasets, including CIFAR10, Tiny-ImageNet, and ImageNet. Moreover, we significantly generalize the family of super-expressive activation functions, whose existence has been demonstrated in several recent works showing that any continuous function can be approximated to any desired accuracy by a fixed-size network with a specific super-expressive activation function. Specifically, our work addresses two major bottlenecks impeding the development of super-expressive activation functions: the limited identification of super-expressive functions, which raises doubts about their broad applicability, and their often peculiar forms, which lead to skepticism regarding their scalability and practicality in real-world applications.
Submitted 11 July, 2024;
originally announced July 2024.
-
Structure preserving schemes for a class of Wasserstein gradient flows
Authors:
Shiheng Zhang,
Jie Shen
Abstract:
In this paper, we introduce two time discretization schemes tailored for a range of Wasserstein gradient flows. These schemes are designed to preserve mass and positivity and to be uniquely solvable. In addition, they ensure energy dissipation in many typical scenarios. Through extensive numerical experiments, we demonstrate the schemes' robustness, accuracy, and efficiency.
Submitted 12 July, 2024;
originally announced July 2024.
-
Grain boundaries control lithiation of solid solution substrates in lithium metal batteries
Authors:
Leonardo Shoji Aota,
Chanwon Jung,
Siyuan Zhang,
Ömer K. Büyükuslu,
Poonam Yadav,
Mahander Pratap Singh,
Xinren Chen,
Eric Woods,
Christina Scheu,
Se-Ho Kim,
Dierk Raabe,
Baptiste Gault
Abstract:
The development of sustainable transportation and communication systems requires an increase in both energy density and capacity retention of Li-batteries. Using substrates that form a solid solution with body-centered cubic Li enhances the cycle stability of anode-less batteries. However, it remains unclear how the substrate microstructure affects lithiation behavior. Here, we deploy a correlative, near-atomic-scale probing approach combining ion and electron microscopy to examine the distribution of Li in Li-Ag diffusion couples as a model system. We reveal that Li-rich regions with over 93.8 at.% Li nucleate within Ag at random high-angle grain boundaries, whereas grain interiors are not lithiated. We show that kinetics and mechanical constraints from the microstructure, rather than equilibrium thermodynamics, dictate the lithiation process. The findings suggest that grain size and grain boundary character are critical to enhancing the electrochemical performance of interlayers/electrodes, particularly for improving lithiation kinetics and hence reducing dendrite formation.
Submitted 12 July, 2024;
originally announced July 2024.
-
FedVAE: Trajectory privacy preserving based on Federated Variational AutoEncoder
Authors:
Yuchen Jiang,
Ying Wu,
Shiyao Zhang,
James J. Q. Yu
Abstract:
The use of trajectory data with abundant spatial-temporal information is pivotal in Intelligent Transport Systems (ITS) and various traffic system tasks. Location-Based Services (LBS) capitalize on this trajectory data to offer users personalized services tailored to their location information. However, this trajectory data contains sensitive information about users' movement patterns and habits, necessitating confidentiality and protection from unknown collectors. To address this challenge, privacy-preserving methods like K-anonymity and Differential Privacy have been proposed to safeguard private information in the dataset. Despite their effectiveness, these methods can impact the original features by introducing perturbations or generating unrealistic trajectory data, leading to suboptimal performance in downstream tasks. To overcome these limitations, we propose a Federated Variational AutoEncoder (FedVAE) approach, which effectively generates a new trajectory dataset while preserving the confidentiality of private information and retaining the structure of the original features. In addition, FedVAE leverages Variational AutoEncoder (VAE) to maintain the original feature space and generate new trajectory data, and incorporates Federated Learning (FL) during the training stage, ensuring that users' data remains locally stored to protect their personal information. The results demonstrate its superior performance compared to other existing methods, affirming FedVAE as a promising solution for enhancing data privacy and utility in location-based applications.
Submitted 12 July, 2024;
originally announced July 2024.
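The FedVAE abstract above says the VAE is trained with federated learning so that trajectories stay on users' devices, but it does not specify the server's aggregation rule. Federated averaging (FedAvg), a data-size-weighted mean of client parameters, is the most common choice and is assumed here purely for illustration; parameter names such as `"encoder"` are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values

def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """FedAvg-style aggregation: data-size-weighted mean of each client parameter.

    Only model parameters reach the server; the trajectories stay on the clients.
    """
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("clients must hold at least one training sample in total")
    averaged: Weights = {}
    for name in client_weights[0]:
        dim = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)
        ]
    return averaged

if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 trajectories respectively.
    clients = [{"encoder": [1.0, 2.0]}, {"encoder": [3.0, 6.0]}]
    print(fed_avg(clients, [1, 3]))
```

Weighting by local data size keeps a client with few trajectories from pulling the shared VAE as strongly as one with many.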
-
Zeroth-Order Katyusha: An Accelerated Derivative-Free Method for Composite Convex Optimization
Authors:
Silan Zhang,
Yujie Tang
Abstract:
We investigate accelerated zeroth-order algorithms for smooth composite convex optimization problems. While for unconstrained optimization, existing methods that merge 2-point zeroth-order gradient estimators with first-order frameworks usually lead to satisfactory performance, for constrained/composite problems, there is still a gap in the complexity bound that is related to the non-vanishing variance of the 2-point gradient estimator near an optimal point. To bridge this gap, we propose the Zeroth-Order Loopless Katyusha (ZO-L-Katyusha) algorithm, leveraging the variance reduction as well as acceleration techniques from the first-order loopless Katyusha algorithm. We show that ZO-L-Katyusha is able to achieve accelerated linear convergence for composite smooth and strongly convex problems, and has the same oracle complexity as the unconstrained case. Moreover, the number of function queries to construct a zeroth-order gradient estimator in ZO-L-Katyusha can be made O(1) on average. These results suggest that ZO-L-Katyusha provides a promising approach towards bridging the gap in the complexity bound for zeroth-order composite optimization.
Submitted 12 July, 2024;
originally announced July 2024.
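The 2-point zeroth-order gradient estimator mentioned in the ZO-L-Katyusha abstract above is commonly written as $g = \frac{d}{2\mu}\,[f(x+\mu u) - f(x-\mu u)]\,u$ with $u$ uniform on the unit sphere in $\mathbb{R}^d$; this is one of several variants, and the paper's exact smoothing distribution may differ. It needs two function queries and no derivatives, and it is an unbiased estimate of the gradient of a smoothed version of $f$, which approaches $\nabla f(x)$ as $\mu \to 0$. The sketch averages many such estimates for a simple quadratic.

```python
import math
import random
from typing import Callable, List

Vector = List[float]

def random_unit_vector(d: int, rng: random.Random) -> Vector:
    # A normalised standard Gaussian vector is uniformly distributed on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def two_point_grad(f: Callable[[Vector], float], x: Vector, mu: float,
                   rng: random.Random) -> Vector:
    # g = d / (2 mu) * (f(x + mu u) - f(x - mu u)) * u: two queries, no derivatives.
    d = len(x)
    u = random_unit_vector(d, rng)
    f_plus = f([xi + mu * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - mu * ui for xi, ui in zip(x, u)])
    scale = d * (f_plus - f_minus) / (2.0 * mu)
    return [scale * ui for ui in u]

def sum_of_squares(v: Vector) -> float:
    return sum(t * t for t in v)

if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0, 3.0]  # exact gradient of sum_of_squares here is (2, 4, 6)
    n = 20000
    totals = [0.0] * len(x)
    for _ in range(n):
        g = two_point_grad(sum_of_squares, x, 1e-3, rng)
        totals = [s + gi for s, gi in zip(totals, g)]
    print([round(s / n, 2) for s in totals])
```

For a quadratic the single-sample estimate equals $d\,(\nabla f(x)\cdot u)\,u$ exactly, so its spread grows with $\|\nabla f(x)\|$. In a constrained or composite problem the smooth part's gradient is generally nonzero at the optimum, which is one way to see why the estimator's variance need not vanish there.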
-
Accurate Prior-centric Monocular Positioning with Offline LiDAR Fusion
Authors:
Jinhao He,
Huaiyang Huang,
Shuyang Zhang,
Jianhao Jiao,
Chengju Liu,
Ming Liu
Abstract:
Unmanned vehicles usually rely on Global Positioning System (GPS) and Light Detection and Ranging (LiDAR) sensors to achieve high-precision localization for navigation purposes. However, the associated costs and infrastructure demands of this combination pose challenges for widespread adoption in mass-market applications. In this paper, we aim to use only a monocular camera to achieve comparable onboard localization performance by tracking deep-learning visual features on a LiDAR-enhanced visual prior map. Experiments show that the proposed algorithm can provide centimeter-level global positioning results with scale, and that it is easily integrated and well suited to low-cost robot system deployment in real-world applications.
Submitted 12 July, 2024;
originally announced July 2024.
-
Understanding chiral charge-density wave by frozen chiral phonon
Authors:
Shuai Zhang,
Kaifa Luo,
Tiantian Zhang
Abstract:
Charge density waves (CDWs) are found in a wide range of solids; however, their microscopic nature is still not transparent in most realistic materials, and that of the recently studied chiral CDWs, which involve chiral structural distortions, remains unclear. In this paper, we try to understand the driving forces of the chiral CDW transition via chiral phonons in an electron-phonon coupling scenario. We use the prototypical monolayer 1T-TiSe$_2$ as a case study to reveal the absence of chirality in its CDW transition, and we propose a general approach, i.e., symmetry-breaking stimuli, to engineer the chirality of CDWs in experiments. Inelastic scattering patterns are also studied as a benchmark of chiral CDW (CCDW, which breaks mirror/inversion symmetry in 2D/3D systems). We find that the change in anisotropy of the Bragg peak profiles, contributed by the soft chiral phonons, can serve as a remarkable signature of CCDW. Our findings pave a path toward understanding CCDW from the chiral phonon perspective, especially in van der Waals materials, and provide a powerful way to manipulate the chirality of CDWs.
Submitted 12 July, 2024;
originally announced July 2024.
-
MAVIS: Mathematical Visual Instruction Tuning
Authors:
Renrui Zhang,
Xinyu Wei,
Dongzhi Jiang,
Yichi Zhang,
Ziyu Guo,
Chengzhuo Tong,
Jiaming Liu,
Aojun Zhou,
Bin Wei,
Shanghang Zhang,
Peng Gao,
Hongsheng Li
Abstract:
Multi-modal Large Language Models (MLLMs) have recently emerged as a significant focus in academia and industry. Despite their proficiency in general multi-modal scenarios, their mathematical problem-solving capabilities in visual contexts remain insufficiently explored. We identify three key areas within MLLMs that need to be improved: visual encoding of math diagrams, diagram-language alignment, and mathematical reasoning skills. This creates an urgent demand for large-scale, high-quality data and training pipelines in visual mathematics. In this paper, we propose MAVIS, the first MAthematical VISual instruction tuning paradigm for MLLMs, involving a series of mathematical visual datasets and specialized MLLMs. Targeting the three issues, MAVIS contains three progressive training stages from scratch. First, we curate MAVIS-Caption, consisting of 558K diagram-caption pairs, to fine-tune a math-specific vision encoder (CLIP-Math) through contrastive learning, tailored for improved diagram visual encoding. Second, we utilize MAVIS-Caption to align CLIP-Math with a large language model (LLM) by a projection layer, enhancing vision-language alignment in mathematical domains. Third, we introduce MAVIS-Instruct, including 900K meticulously collected and annotated visual math problems, which is used to finally instruction-tune the MLLM for robust mathematical reasoning skills. In MAVIS-Instruct, we incorporate complete chain-of-thought (CoT) rationales for each problem and minimize textual redundancy, thereby focusing the model on the visual elements. Data and models are released at https://github.com/ZrrSkywalker/MAVIS
Submitted 11 July, 2024;
originally announced July 2024.
-
GTA: A Benchmark for General Tool Agents
Authors:
Jize Wang,
Zerun Ma,
Yining Li,
Songyang Zhang,
Cailian Chen,
Kai Chen,
Xinyi Le
Abstract:
Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, failing to reveal the agents' real-world problem-solving abilities effectively. To address this, we propose GTA, a benchmark for General Tool Agents, featuring three main aspects: (i) Real user queries: human-written queries with simple real-world objectives but implicit tool use, requiring the LLM to reason about suitable tools and plan the solution steps. (ii) Real deployed tools: an evaluation platform equipped with tools across perception, operation, logic, and creativity categories to evaluate the agents' actual task execution performance. (iii) Real multimodal inputs: authentic image files, such as spatial scenes, web page screenshots, tables, code snippets, and printed/handwritten materials, used as the query contexts to closely align with real-world scenarios. We design 229 real-world tasks and executable tool chains to evaluate mainstream LLMs. Our findings show that real-world user queries are challenging for existing LLMs, with GPT-4 completing less than 50% of the tasks and most LLMs achieving below 25%. This evaluation reveals the bottlenecks in the tool-use capabilities of current LLMs in real-world scenarios, which provides future direction for advancing general-purpose tool agents. The code and dataset are available at https://github.com/open-compass/GTA.
Submitted 11 July, 2024;
originally announced July 2024.
-
Revisiting the Formulation of Charged Defect in Solids
Authors:
Hanzhi Shang,
Zeyu Jiang,
Yiyang Sun,
Damien West,
Shengbai Zhang
Abstract:
Defect physics is at the heart of microelectronics. By keeping track of the reference energy in total-energy calculations, we explicitly show that the "potential alignment" correction vanishes and that the classic Makov-Payne correction yields accurate results. From linear response theory, we further formulate an accurate expression for the quadrupole correction. Application to numerous defects, including those in anisotropic materials, yields accurate formation energies in small supercells, and the historically slow convergence of the 2+ diamond vacancy is shown to result from slowly varying gap levels of the defect, which lead to a size-dependent dielectric constant.
Submitted 11 July, 2024;
originally announced July 2024.
-
Skin Effect of Nonlinear Optical Responses in Antiferromagnets
Authors:
Hang Zhou,
Rui-Chun Xiao,
Shu-Hui Zhang,
Wei Gan,
Hui Han,
Hong-Miao Zhao,
Wenjian Lu,
Changjin Zhang,
Yuping Sun,
Hui Li,
Ding-Fu Shao
Abstract:
Nonlinear optics plays an important role in fundamental physics research and in high-performance optoelectronic devices. Bulk nonlinear optical responses arise from uniform light absorption in noncentrosymmetric crystals and are therefore usually considered collective phenomena of all atoms. Here we show that, contrary to this common expectation, the nonlinear optical responses in antiferromagnets can be selectively accumulated near the surfaces, representing a skin effect. This is because inversion symmetry, although broken globally, is barely violated locally deep inside these antiferromagnets. Using A-type layered antiferromagnets as representatives, we predict that spatially dependent nonlinear optical responses, such as the bulk photovoltaic effect (BPVE) and second harmonic generation (SHG), are prominent in the topmost and bottommost layers and decay rapidly away from the surfaces. This phenomenon exists in a broad range of antiferromagnets composed of centrosymmetric sublattices, offering promising device applications for these antiferromagnets. Our work uncovers a previously overlooked property of nonlinear optical responses and opens new opportunities for high-performance antiferromagnetic optospintronics.
Submitted 11 July, 2024;
originally announced July 2024.
-
Revisiting the dead time effects of Insight-HXMT/ME on timing analysis
Authors:
Youli Tuo,
Xiaobo Li,
Ying Tan,
Baiyang Wu,
Weichun Jiang,
Liming Song,
Jinlu Qu,
Sudeep Gogate,
Shuang-Nan Zhang,
Andrea Santangelo
Abstract:
Dead time is a common instrumental effect of X-ray detectors that alters the timing properties of astronomical signals, for example by distorting the shape of power density spectra (PDS) and affecting the root-mean-square (rms) amplitude of potential quasi-periodic oscillation signals. We revisit the dead-time effects of the Medium Energy X-ray telescope (ME) onboard Insight-HXMT, based on both a simulation of the electronic read-out mechanism that causes the dead time and real data. We investigate dead-time effects on the pulse profile as well as on Quasi-Periodic Oscillation (QPO) signals. The simulation of a periodic signal, as well as real data observed on Swift J0243.6+6124, suggests that the dead-time coefficient is linearly correlated with the observed count rate in each phase bin of the pulse profile. The Fourier-amplitude-difference (FAD) method can recover the intrinsic shape of the observed PDS when the PDS comes from two identical detectors. We apply this technique to ME by splitting the 9 FPGA modules into 2 groups. The results indicate that the FAD technique is suitable when the two groups of detectors do not differ greatly; the recovered PDS of Sco X-1 observed by ME slightly enhances the significance of the previously known QPO signal, while the rms of the QPO is significantly improved. We provide an FAD correction tool implemented in HXMTDAS to help users better analyze QPO signals in the future.
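The linear relation between the dead-time coefficient and the observed count rate can be illustrated with the textbook non-paralyzable dead-time model, in which each recorded event blocks the detector for a fixed time tau. This is a minimal sketch under that assumption only: it is not the ME read-out simulation or the FAD method described above, and the function names and the `tau` value used below are hypothetical.

```python
from typing import List


# Assumed model: non-paralyzable detector with a fixed dead time tau (seconds).
# This is a generic textbook model, not the Insight-HXMT/ME electronics.

def dead_time_fraction(observed_rate: float, tau: float) -> float:
    """Fraction of exposure lost to dead time; linear in the observed rate."""
    return observed_rate * tau


def true_rate(observed_rate: float, tau: float) -> float:
    """Recover the incident rate: r_true = r_obs / (1 - r_obs * tau)."""
    frac = dead_time_fraction(observed_rate, tau)
    if frac >= 1.0:
        raise ValueError("observed rate is inconsistent with this dead time")
    return observed_rate / (1.0 - frac)


def correct_profile(observed_rates: List[float], tau: float) -> List[float]:
    """Apply the correction independently to each phase bin of a pulse profile."""
    return [true_rate(r, tau) for r in observed_rates]
```

Applied bin by bin, the correction is largest in the brightest phase bins, which is one way dead time distorts the shape of a pulse profile.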
Submitted 10 July, 2024;
originally announced July 2024.
-
Revealing spontaneous symmetry breaking in continuous time crystals
Authors:
Yuanjiang Tang,
Chenyang Wang,
Bei Liu,
Jin Peng,
Chao Liang,
Yaohua Li,
Xian Zhao,
Cuicui Lu,
Shuang Zhang,
Yong-Chun Liu
Abstract:
Spontaneous symmetry breaking plays a pivotal role in physics, ranging from the emergence of elementary particles to the phase transitions of matter. The spontaneous breaking of continuous time-translation symmetry leads to a novel state of matter called the continuous time crystal (CTC). A CTC exhibits periodic oscillation without periodic driving, and the relative phases of repeatedly realized oscillations are random. However, the mechanism behind spontaneous symmetry breaking in CTCs, particularly the random phases, remains elusive. Here we propose and experimentally realize two types of CTCs based on distinct mechanisms: manifold topology and near-chaotic motion. We observe both types of CTCs in thermal atomic ensembles by artificially synthesizing spin-spin nonlinear interactions through a measurement-feedback scheme. Our work provides general recipes for realizing CTCs and paves the way for exploring CTCs in various systems.
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
Authors:
Yatai Ji,
Shilong Zhang,
Jie Wu,
Peize Sun,
Weifeng Chen,
Xuefeng Xiao,
Sidi Yang,
Yujiu Yang,
Ping Luo
Abstract:
The rapid advancement of Large Vision-Language Models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models focus only on the visual content of a single scenario, and their ability to associate instances across different scenes, which is essential for understanding complex visual content such as movies with multiple characters and intricate plots, has not yet been explored. Towards movie understanding, a critical initial step for LVLMs is to unlock memory and recognition of character identities across multiple visual scenarios. To this end, we propose visual instruction tuning with ID reference and develop an ID-Aware Large Vision-Language Model, IDA-VLM. Furthermore, we introduce a novel benchmark, MM-ID, to examine LVLMs on memory and recognition of instance IDs across four dimensions: matching, location, question answering, and captioning. Our findings highlight the limitations of existing LVLMs in recognizing and associating instance identities with ID reference. This paper paves the way for future artificial intelligence systems to handle multi-identity visual inputs, thereby facilitating the comprehension of complex visual narratives such as movies.
Submitted 10 July, 2024;
originally announced July 2024.
-
Electrical Impedance Tomography Based Closed-loop Tumor Treating Fields in Dynamic Lung Tumors
Authors:
Minmin Wang,
Xu Xie,
Yuxi Guo,
Liying Zhu,
Yue Lan,
Haitang Yang,
Yun Pan,
Guangdi Chen,
Shaomin Zhang,
Maomao Zhang
Abstract:
Tumor Treating Fields (TTFields) is a non-invasive anticancer modality that uses alternating electric fields to disrupt cancer cell division and growth. Although generally well tolerated with minimal side effects, traditional TTFields therapy for lung tumors faces challenges due to respiratory motion. We design a novel closed-loop TTFields strategy for lung tumors that incorporates electrical impedance tomography (EIT) for real-time respiratory phase monitoring and dynamic parameter adjustment. Furthermore, we conduct a theoretical analysis to evaluate the performance of the proposed method using a lung motion model. With conventional TTFields settings, we observed that variations in the electrical conductivity of the lung across respiratory phases decreased the average electric field intensity within lung tumors from the end-expiratory (1.08 V/cm) to the end-inspiratory (0.87 V/cm) phase. Using our proposed closed-loop TTFields approach at the same dose setting (2400 mA, consistent with the traditional TTFields setting), we can achieve a higher and consistent average electric field strength at the tumor site (1.30 V/cm) across different respiratory stages. Our proposed closed-loop TTFields method has the potential to improve lung tumor therapy by mitigating the impact of respiratory motion.
Submitted 9 July, 2024;
originally announced July 2024.
-
Revealing the evanescent components in Kronecker-product based codebooks: insights and implications
Authors:
Jun Yang,
Yijian Chen,
Yunqi Sun,
Yuan Si,
Hongkang Yu,
Shujuan Zhang,
Zhaohua Lu
Abstract:
The orthogonal discrete Fourier transform (DFT) bases have been adopted by the 3rd Generation Partnership Project (3GPP) as the standard spatial-domain bases for Type I, Type II, and enhanced Type II codewords. For uniform planar arrays, these spatial-domain bases are derived as the Kronecker product of one-dimensional DFT bases. Theoretically, each spatial basis corresponds to a beam directed towards a specific angle of departure, and the set of bases represents orthogonal beams covering the front hemisphere of an array. While the Kronecker-product based precoding scheme allows a codeword to be indexed concisely through precoding matrix indicators (PMIs) in channel state information feedback, it introduces redundant spatial beams characterized by high spatial-frequency components. This paper investigates the codewords representing high spatial-frequency components within Kronecker-product based codebooks. Through theoretical analysis and simulations, we confirm the redundancy of these codewords in MIMO communications and advocate removing them from the codebooks to enhance system performance. Several topics related to the high spatial-frequency components are also discussed. Practical suggestions for future standard design are provided based on our theoretical analysis and simulation results.
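The Kronecker construction of 2D beams from 1D DFT bases can be sketched in NumPy. The sketch assumes a uniform planar array with half-wavelength element spacing and no oversampling (3GPP codebooks use oversampling factors, omitted here), and it flags a 2D beam as outside the visible region when its direction cosines satisfy u^2 + v^2 > 1. That criterion is an illustrative reading of "high spatial-frequency components", not necessarily the paper's exact analysis, and all function names are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def dft_vector(index: int, n: int) -> np.ndarray:
    """Unit-norm 1D DFT basis vector of length n (phase step 2*pi*index/n)."""
    k = np.arange(n)
    return np.exp(2j * np.pi * index * k / n) / np.sqrt(n)


def kronecker_codebook(n1: int, n2: int) -> np.ndarray:
    """Columns are 2D beams kron(v_l, u_m); column index is l * n2 + m."""
    cols = [np.kron(dft_vector(l, n1), dft_vector(m, n2))
            for l in range(n1) for m in range(n2)]
    return np.stack(cols, axis=1)


def spatial_frequency(index: int, n: int) -> float:
    """Direction cosine of a DFT index, assuming half-wavelength spacing.

    The index is wrapped into [-n/2, n/2) and mapped to u = 2 * signed / n.
    """
    signed = ((index + n // 2) % n) - n // 2
    return 2.0 * signed / n


def evanescent_beams(n1: int, n2: int) -> List[Tuple[int, int]]:
    """(l, m) pairs whose direction cosines fall outside u^2 + v^2 <= 1."""
    out = []
    for l in range(n1):
        for m in range(n2):
            u = spatial_frequency(l, n1)
            v = spatial_frequency(m, n2)
            if u * u + v * v > 1.0:
                out.append((l, m))
    return out
```

For a 4-by-4 array, all 16 Kronecker beams are mutually orthonormal, yet under these assumptions 5 of them, (1, 2), (2, 1), (2, 2), (2, 3) and (3, 2), point at no real direction, which is the kind of redundancy the abstract describes.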
Submitted 9 July, 2024;
originally announced July 2024.
-
Realization of Conditional Operations through Transition Pathway Engineering
Authors:
Sheng Zhang,
Peng Duan,
Yun-Jie Wang,
Tian-Le Wang,
Peng Wang,
Ren-Ze Zhao,
Xiao-Yan Yang,
Ze-An Zhao,
Liang-Liang Guo,
Yong Chen,
Hai-Feng Zhang,
Lei Du,
Hao-Ran Tao,
Zhi-Fei Li,
Yuan Wu,
Zhi-Long Jia,
Wei-Cheng Kong,
Zhao-Yun Chen,
Yu-Chun Wu,
Guo-Ping Guo
Abstract:
In the NISQ era, achieving large-scale quantum computing demands compact circuits to mitigate decoherence and gate-error accumulation. Quantum operations with diverse degrees of freedom hold promise for circuit compression, but conventional approaches struggle to adjust multiple parameters simultaneously. Here, we propose a transition composite gate (TCG) scheme grounded in state-selective transition-path engineering, enabling more expressive conditional operations. As an example, we experimentally validate a controlled-unitary (CU) gate with independent and continuous parameters. By adjusting the parameters of the $\rm X^{12}$ gate, we obtain the CU family with fidelities ranging from 95.2% to 99.0%, as measured by quantum process tomography (QPT). To demonstrate circuit compression, we use the TCG scheme to prepare 3-qubit Greenberger-Horne-Zeilinger (GHZ) and W states with fidelities of 96.77% and 95.72%, respectively. TCG reduces circuit depth by about 40% and 44% compared with using CZ gates only. Moreover, we show that short-path TCG (SPTCG) can further reduce the time cost of state-preparation circuits. The TCG scheme exhibits advantages in certain quantum circuits and shows significant potential for large-scale quantum algorithms.
Submitted 10 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot (BEAM-1)
Authors:
Yanlong Peng,
Zhigang Wang,
Yisheng Zhang,
Shengmin Zhang,
Nan Cai,
Fan Wu,
Ming Chen
Abstract:
The efficient disassembly of end-of-life electric vehicle batteries (EOL-EVBs) is crucial for green manufacturing and sustainable development. Current pre-programmed disassembly by Autonomous Mobile Manipulator Robots (AMMRs) struggles to meet disassembly requirements in dynamic environments, complex scenarios, and unstructured processes. In this paper, we propose a Battery Disassembly AMMR (BEAM-1) system based on NeuroSymbolic AI. It detects the environmental state by combining multiple sensors with neural predicates and then translates this information into a quasi-symbolic space. In real time, it identifies the optimal sequence of action primitives through LLM-heuristic tree search, ensuring high-precision execution of these primitives. Additionally, it employs positional speculative sampling using intuitive networks and disassembles various bolt types with a meticulously designed end-effector. Importantly, BEAM-1 is a continuously learning embodied intelligence system capable of human-like subjective reasoning and intuition. Extensive real-scene experiments show that it can autonomously perceive, decide, and execute to complete the continuous disassembly of bolts in multiple, multi-category, and complex situations, with a success rate of 98.78%. This research attempts to use NeuroSymbolic AI to give robots genuine autonomous reasoning, planning, and learning capabilities. BEAM-1 realizes a revolution in battery disassembly. Its framework can be easily ported to any robotic system for different application scenarios, providing a ground-breaking idea for the design and implementation of future embodied intelligent robotic systems.
Submitted 9 July, 2024;
originally announced July 2024.
-
LuSNAR: A Lunar Segmentation, Navigation and Reconstruction Dataset based on Multi-sensor for Autonomous Exploration
Authors:
Jiayi Liu,
Qianyu Zhang,
Xue Wan,
Shengyang Zhang,
Yaolin Tian,
Haodong Han,
Yutao Zhao,
Baichuan Liu,
Zeyuan Zhao,
Xubo Luo
Abstract:
As lunar exploration missions grow more complex, lunar rovers need a higher level of autonomy. Environmental perception and navigation algorithms are the foundation for lunar rovers to achieve autonomous exploration. Developing and verifying these algorithms requires highly reliable data. Most existing lunar datasets target a single task and lack diverse scenes and high-precision ground-truth labels. To address this issue, we propose LuSNAR, a multi-task, multi-scene, and multi-label lunar benchmark dataset. It can be used for comprehensive evaluation of autonomous perception and navigation systems and includes high-resolution stereo image pairs, panoramic semantic labels, dense depth maps, LiDAR point clouds, and rover positions. To provide richer scene data, we built 9 lunar simulation scenes based on Unreal Engine, divided according to topographic relief and object density. To verify the usability of the dataset, we evaluated and analyzed algorithms for semantic segmentation, 3D reconstruction, and autonomous navigation. The experimental results show that the proposed dataset can be used for ground verification of tasks such as autonomous environment perception and navigation, and that it provides a lunar benchmark for testing algorithm performance metrics. We make LuSNAR publicly available at: https://github.com/autumn999999/LuSNAR-dataset.
Submitted 8 July, 2024;
originally announced July 2024.
-
A comparative study of ultraluminous infrared galaxies in the IRAS and SDSS Surveys
Authors:
Shaohua Zhang,
Zhijian Luo,
Xiheng Shi,
Chenggan Shu,
Hubing Xiao,
Hongyan Zhou
Abstract:
We present a comprehensive study of Ultraluminous Infrared Galaxies (ULIRGs), leveraging data from the IRAS Faint Source Catalogue (FSC) and the spectroscopic catalog of the Sloan Digital Sky Survey (SDSS) DR16. Our careful cross-matching technique significantly enhances the reliability of ULIRG identification, yielding 283 reliable ULIRGs, including 102 new detections, while discarding 120 previously reported false sources. Covering a redshift range of $z = 0.018 - 0.996$, with a median redshift of $\bar{z} = 0.259$, our uniform sample reveals apparent interaction features in approximately 40% of ULIRGs, increasing to 92% for those with $z < 0.1$. Optical spectral analysis indicates that over 58% of ULIRGs host an AGN, twice the fraction detected from infrared colors alone. Moreover, a pronounced excess of radio emission associated with AGN activity results in a steeper radio-far-infrared correlation. Notably, Type I ULIRGs exhibit properties similar to those of narrow-line Seyfert 1 galaxies (NLS1s), with an elevated incidence rate of Mg II BALs (16.7%), more than ten times that of typical optically selected quasars, consistent with current evolutionary models. We anticipate that forthcoming telescopes such as the China Space Station Telescope (CSST) and Leighton Chajnantor Telescope (LCT) will provide deeper insights into ULIRG morphology, dust distribution, molecular gas, and AGN activity.
Submitted 8 July, 2024;
originally announced July 2024.
-
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Authors:
Ruibo Fu,
Xin Qi,
Zhengqi Wen,
Jianhua Tao,
Tao Wang,
Chunyu Qiang,
Zhiyong Wang,
Yi Lu,
Xiaopeng Wang,
Shuchen Shi,
Yukun Liu,
Xuefei Liu,
Shuai Zhang
Abstract:
Speaker adaptation, which involves cloning the voices of unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multimedia. Despite recent advancements, existing methods often suffer from inaccurate speaker representations and overfitting, particularly when reference speech is limited. To address these challenges, we propose an Agile Speaker Representation Reinforcement Learning (ASRRL) strategy to enhance speaker similarity in speaker adaptation tasks. ASRRL is the first work to apply reinforcement learning to improve the modeling accuracy of speaker embeddings in speaker adaptation, addressing the challenge of decoupling voice content and timbre. Our approach introduces two action strategies tailored to different reference-speech scenarios. In the single-sentence scenario, a knowledge-oriented optimal-routine-searching RL method is employed to expedite the exploration and retrieval of refinement information on the fringe of speaker representations. In the few-sentence scenario, we use a dynamic RL method to adaptively fuse reference speeches, enhancing the robustness and accuracy of speaker modeling. To achieve optimal results in the target domain, we propose a reward model based on a multi-scale fusion scoring mechanism that evaluates speaker similarity, speech quality, and intelligibility, ensuring that improvements in speaker similarity do not compromise speech quality or intelligibility. Experimental results on the LibriTTS and VCTK datasets within mainstream TTS frameworks demonstrate the extensibility and generalization capabilities of the proposed ASRRL method. The results indicate that ASRRL significantly outperforms traditional fine-tuning approaches, achieving higher speaker similarity and better overall speech quality with limited reference speech.
Submitted 7 July, 2024;
originally announced July 2024.
-
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Authors:
Zhihao Du,
Qian Chen,
Shiliang Zhang,
Kai Hu,
Heng Lu,
Yexin Yang,
Hangrui Hu,
Siqi Zheng,
Yue Gu,
Ziyang Ma,
Zhifu Gao,
Zhijie Yan
Abstract:
In recent years, large language model (LLM) based text-to-speech (TTS) has entered the mainstream owing to its high naturalness and zero-shot capability. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed into waveforms by a token-based vocoder. Speech tokens therefore play a critical role in LLM-based TTS models. Current speech tokens are learned in an unsupervised manner and lack explicit semantic information and alignment to the text. In this paper, we propose to represent speech with supervised semantic tokens, which are derived from a multilingual speech recognition model by inserting vector quantization into its encoder. Based on these tokens, we further propose CosyVoice, a scalable zero-shot TTS synthesizer consisting of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis. Experimental results show that supervised semantic tokens significantly outperform existing unsupervised tokens in terms of content consistency and speaker similarity for zero-shot voice cloning. Moreover, we find that using large-scale data further improves synthesis performance, indicating the scalability of CosyVoice. To the best of our knowledge, this is the first attempt to incorporate supervised speech tokens into TTS models.
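Inserting vector quantization into an encoder means replacing each continuous encoder frame with the index of its nearest codebook vector, which turns a speech signal into a sequence of discrete tokens. The minimal NumPy sketch below shows only that lookup step. It is a generic nearest-neighbour VQ, not CosyVoice's actual quantizer or training procedure (a real system must also learn the codebook), and the function name and toy values are hypothetical.

```python
from typing import Tuple

import numpy as np


def quantize(frames: np.ndarray, codebook: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Map each encoder frame to its nearest codebook entry (squared L2 distance).

    frames:   (T, D) continuous encoder outputs, one row per frame.
    codebook: (K, D) code vectors.
    Returns (tokens, quantized): tokens has shape (T,), quantized has shape (T, D).
    """
    # Pairwise squared distances via broadcasting: shape (T, K).
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = d2.argmin(axis=1)  # discrete speech tokens
    return tokens, codebook[tokens]


# Toy usage with a 3-entry, 2-dimensional codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
frames = np.array([[0.1, -0.1], [0.9, 1.2], [4.0, 6.0]])
tokens, quantized = quantize(frames, codebook)
```

Here the three frames map to tokens 0, 1 and 2; in an LLM-based TTS pipeline, such token sequences are what the language model learns to generate from text.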
Submitted 9 July, 2024; v1 submitted 7 July, 2024;
originally announced July 2024.
-
A timing view of the additional high-energy spectral component discovered in the black hole candidate Swift J1727.8-1613
Authors:
Zi-Xu Yang,
Liang Zhang,
Shuang-Nan Zhang,
L. Tao,
Shu Zhang,
Ruican Ma,
Qingcui Bu,
Yue Huang,
He-Xin Liu,
Wei Yu,
Guang C. Xiao,
Peng-Ju Wang,
Hua Feng,
Li-Ming Song,
Xiang Ma,
Mingyu Ge,
QingChang Zhao,
J. L. Qu
Abstract:
We present an energy-dependent analysis of the type-C quasi-periodic oscillations (QPOs) observed in the black hole X-ray binary Swift J1727.8-1613 using Insight-HXMT observations. We find that the QPO fractional rms at energies above 40 keV is significantly higher than that below 20 keV. This is the first report of a high-energy (HE) rms excess in the rms spectrum of a black hole X-ray binary. In the high-energy band, an extra hard component is observed in addition to the standard thermal Comptonization component in a similar energy band. The QPO HE-rms excess is not only correlated with the disk parameters and the photon index of the standard Comptonization component, but also exhibits a moderate positive correlation with the flux of the additional hard spectral component. No features corresponding to the additional hard component are seen in the QPO phase-lag spectra. We propose that the additional hard component may originate from jet emission and that the associated QPO HE-rms excess can be explained by precession of the jet base.
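The QPO fractional rms that is compared across energy bands is, at its simplest, the square root of the variability power in the QPO frequency range divided by the mean count rate. The NumPy sketch below computes this from an evenly binned light curve using Parseval's theorem. It is an illustrative simplification: it does not subtract Poisson noise, correct for dead time, or fit a Lorentzian to the QPO as a real Insight-HXMT analysis would, and the function name and band limits are hypothetical.

```python
import numpy as np


def fractional_rms(rate: np.ndarray, dt: float, f_lo: float, f_hi: float) -> float:
    """Fractional rms of variability between f_lo and f_hi (Hz).

    rate: evenly binned count-rate light curve with bin width dt (seconds).
    Uses the one-sided FFT and Parseval's theorem; no noise subtraction.
    """
    n = rate.size
    spectrum = np.fft.rfft(rate - rate.mean())
    freqs = np.fft.rfftfreq(n, dt)
    band = (freqs >= f_lo) & (freqs <= f_hi) & (freqs > 0)
    # Factor 2 folds in the negative frequencies (valid for bins below Nyquist).
    variance = 2.0 * np.sum(np.abs(spectrum[band]) ** 2) / n**2
    return float(np.sqrt(variance) / rate.mean())
```

Computing this quantity separately for light curves extracted in each energy band gives an rms spectrum; an HE-rms excess shows up as larger values in the band above 40 keV than below 20 keV. For a pure sinusoid of amplitude A on a mean rate R, the function returns A / (sqrt(2) R).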
Submitted 6 July, 2024;
originally announced July 2024.
-
Three-Body Recombination of Ultracold Microwave-Shielded Polar Molecules
Authors:
Ian Stevenson,
Shayamal Singh,
Ahmed Elkamshishy,
Niccoló Bigagli,
Weijun Yuan,
Siwei Zhang,
Chris H. Greene,
Sebastian Will
Abstract:
A combined experimental and theoretical study is carried out on the three-body recombination process in a gas of microwave-shielded polar molecules. For ground-state polar molecules dressed with a strong microwave field, field-linked bound states can appear in the intermolecular potential. We model three-body recombination into such bound states using classical trajectory calculations. Our results show that recombination can explain the enhanced loss rates observed at small microwave detunings in trapped samples of bosonic NaCs [Bigagli, $\textit{et al.}$, Nat. Phys. $\textbf{19}$ 1579-1584 (2023)]. Specifically, our calculations reproduce the experimentally measured three-body loss rates across a wide range of microwave Rabi couplings, detunings, and temperatures. This work suggests that for bosonic shielded molecular systems in which the two-body loss is sufficiently suppressed and a field-linked bound state is present, the dominant loss process will be three-body recombination.
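A three-body loss rate is conventionally defined through the rate equation dn/dt = -L3 n^3 for a gas in which three-body recombination dominates; separating variables gives n(t) = n0 / sqrt(1 + 2 L3 n0^2 t). The sketch below evaluates this closed form. It assumes a spatially uniform density and neglects one- and two-body loss and heating, so it is not the trapped-sample model or the classical trajectory calculation used in the paper; the function names and parameter values are hypothetical.

```python
import math


def density_three_body(n0: float, l3: float, t: float) -> float:
    """Density after time t for dn/dt = -l3 * n**3 (uniform gas, three-body loss only)."""
    # Separating variables: d(n**-2)/dt = 2 * l3, so n**-2 = n0**-2 + 2 * l3 * t.
    return n0 / math.sqrt(1.0 + 2.0 * l3 * n0 * n0 * t)


def half_life(n0: float, l3: float) -> float:
    """Time for the density to fall to n0 / 2, i.e. solve 1 + 2 * l3 * n0**2 * t = 4."""
    return 3.0 / (2.0 * l3 * n0 * n0)
```

Because the loss rate scales as n^2, the decay slows as the sample thins, which is how three-body loss is distinguished from exponential one-body loss in measured density curves.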
Submitted 5 July, 2024;
originally announced July 2024.