-
Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(\bfmuv)\%$ and…
▽ More
Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(\bfmuv)\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(\bftauv)\%$, respectively. The product of the decay constant and Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(\mufdsxvcsresult)_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(\taufdsxvcsresult))_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(\mufdsresult)_{μν}$\,MeV and ${f_{D^+_s}}=(\taufdsresult)_{τν}$\,MeV, respectively. Conversely, taking the value for $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(\muvcsresult)_{μν}$ and $|V_{cs}| = (\tauvcsresult)_{τν}$, respectively.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
A Quantum Automatic Tool for Finding Impossible Differentials
Authors:
Huiqin Xie,
Qiqing Xia,
Ke Wang,
Yanjun Li,
Li Yang
Abstract:
Due to the superiority of quantum computing, traditional cryptography is facing severe threat. This makes the security evaluation of cryptographic systems in quantum attack models significant and urgent. For symmetric ciphers, the security analysis heavily relies on cyptanalytic tools. Thus exploring the use of quantum algorithms to traditional cyptanalytic tools has drawn a lot of attention. In t…
▽ More
Due to the superiority of quantum computing, traditional cryptography is facing severe threat. This makes the security evaluation of cryptographic systems in quantum attack models significant and urgent. For symmetric ciphers, the security analysis heavily relies on cyptanalytic tools. Thus exploring the use of quantum algorithms to traditional cyptanalytic tools has drawn a lot of attention. In this study, we utilize quantum algorithms to improve impossible differential attack, and design two quantum automatic tools for searching impossible differentials. The proposed quantum algorithms exploit the idea of miss-in-the-middle and the properties of truncated differentials. We rigorously prove their validity and calculate the quantum resources required to implement them. Compared to existing classical automatic cryptanalysis, the quantum tools proposed have the advantage of accurately characterizing S-boxes while only requiring polynomial complexity, and can take into consideration the impact of the key schedules in single-key model.
△ Less
Submitted 13 July, 2024;
originally announced July 2024.
-
Language-Augmented Symbolic Planner for Open-World Task Planning
Authors:
Guanqi Chen,
Lei Yang,
Ruixing Jia,
Zhe Hu,
Yizhou Chen,
Wei Zhang,
Wenping Wang,
Jia Pan
Abstract:
Enabling robotic agents to perform complex long-horizon tasks has been a long-standing goal in robotics and artificial intelligence (AI). Despite the potential shown by large language models (LLMs), their planning capabilities remain limited to short-horizon tasks and they are unable to replace the symbolic planning approach. Symbolic planners, on the other hand, may encounter execution errors due…
▽ More
Enabling robotic agents to perform complex long-horizon tasks has been a long-standing goal in robotics and artificial intelligence (AI). Despite the potential shown by large language models (LLMs), their planning capabilities remain limited to short-horizon tasks and they are unable to replace the symbolic planning approach. Symbolic planners, on the other hand, may encounter execution errors due to their common assumption of complete domain knowledge which is hard to manually prepare for an open-world setting. In this paper, we introduce a Language-Augmented Symbolic Planner (LASP) that integrates pre-trained LLMs to enable conventional symbolic planners to operate in an open-world environment where only incomplete knowledge of action preconditions, objects, and properties is initially available. In case of execution errors, LASP can utilize the LLM to diagnose the cause of the error based on the observation and interact with the environment to incrementally build up its knowledge base necessary for accomplishing the given tasks. Experiments demonstrate that LASP is proficient in solving planning problems in the open-world setting, performing well even in situations where there are multiple gaps in the knowledge.
△ Less
Submitted 13 July, 2024;
originally announced July 2024.
-
Quantum Beam Splitter as a Quantum Coherence Controller
Authors:
Li-Ping Yang,
Yue Chang
Abstract:
We propose a quantum beam splitter (QBS) with tunable reflection and transmission coefficients. More importantly, our device based on a Hermitian parity-time ($\mathcal{PT}$) symmetric system enables the generation and manipulation of asymmetric quantum coherence of the output photons. For the interference of two weak coherent-state inputs, our QBS can produce anti-bunched photons from one output…
▽ More
We propose a quantum beam splitter (QBS) with tunable reflection and transmission coefficients. More importantly, our device based on a Hermitian parity-time ($\mathcal{PT}$) symmetric system enables the generation and manipulation of asymmetric quantum coherence of the output photons. For the interference of two weak coherent-state inputs, our QBS can produce anti-bunched photons from one output port and bunched photons from the other, showcasing high parity asymmetry and strong coherence control capabilities. Beyond the Hong-Ou-Mandel effect, perfect photon blockade with vanishing $g^{(2)}(0)$ is achievable in two-photon interference. These striking effects of the QBS fundamentally arise from the parity-symmetry-breaking interaction and the quantum interference between the photon scattering channels. Our results could inspire novel applications and the development of innovative photonic devices for the manipulation of weak quantum light.
△ Less
Submitted 13 July, 2024;
originally announced July 2024.
-
Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection
Authors:
Xingyu Peng,
Yan Bai,
Chen Gao,
Lirong Yang,
Fei Xia,
Beipeng Mu,
Xiaofei Wang,
Si Liu
Abstract:
Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to deal with the OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, both object level and scene level, to generate trustful detection results. However, previous l…
▽ More
Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to deal with the OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, both object level and scene level, to generate trustful detection results. However, previous lidar-based OVD methods only focus on the usage of object-level features, ignoring the essence of scene-level information. In this paper, we propose a Global-Local Collaborative Scheme (GLIS) for the lidar-based OVD task, which contains a local branch to generate object-level detection result and a global branch to obtain scene-level global feature. With the global-local information, a Large Language Model (LLM) is applied for chain-of-thought inference, and the detection result can be refined accordingly. We further propose Reflected Pseudo Labels Generation (RPLG) to generate high-quality pseudo labels for supervision and Background-Aware Object Localization (BAOL) to select precise object proposals. Extensive experiments on ScanNetV2 and SUN RGB-D demonstrate the superiority of our methods. Code is released at https://github.com/GradiusTwinbee/GLIS.
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
Authors:
Liang Zeng,
Liangjun Zhong,
Liang Zhao,
Tianwen Wei,
Liu Yang,
Jujie He,
Cheng Cheng,
Rui Hu,
Yang Liu,
Shuicheng Yan,
Han Fang,
Yahui Zhou
Abstract:
In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far from being saturated, highlighting how the model's quality improves with increases in data quantity. To support this claim, we introduce the Skywork-Math model…
▽ More
In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far from being saturated, highlighting how the model's quality improves with increases in data quantity. To support this claim, we introduce the Skywork-Math model series, supervised fine-tuned (SFT) on common 7B LLMs using our proposed 2.5M-instance Skywork-MathQA dataset. Skywork-Math 7B has achieved impressive accuracies of 51.2% on the competition-level MATH benchmark and 83.9% on the GSM8K benchmark using only SFT data, outperforming an early version of GPT-4 on MATH. The superior performance of Skywork-Math models contributes to our novel two-stage data synthesis and model SFT pipelines, which include three different augmentation methods and a diverse seed problem set, ensuring both the quantity and quality of Skywork-MathQA dataset across varying difficulty levels. Most importantly, we provide several practical takeaways to enhance math reasoning abilities in LLMs for both research and industry applications.
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
A Critical Review of Causal Reasoning Benchmarks for Large Language Models
Authors:
Linying Yang,
Vik Shirvaikar,
Oscar Clivio,
Fabian Falck
Abstract:
Numerous benchmarks aim to evaluate the capabilities of Large Language Models (LLMs) for causal inference and reasoning. However, many of them can likely be solved through the retrieval of domain knowledge, questioning whether they achieve their purpose. In this review, we present a comprehensive overview of LLM benchmarks for causality. We highlight how recent benchmarks move towards a more thoro…
▽ More
Numerous benchmarks aim to evaluate the capabilities of Large Language Models (LLMs) for causal inference and reasoning. However, many of them can likely be solved through the retrieval of domain knowledge, questioning whether they achieve their purpose. In this review, we present a comprehensive overview of LLM benchmarks for causality. We highlight how recent benchmarks move towards a more thorough definition of causal reasoning by incorporating interventional or counterfactual reasoning. We derive a set of criteria that a useful benchmark or set of benchmarks should aim to satisfy. We hope this work will pave the way towards a general framework for the assessment of causal understanding in LLMs and the design of novel benchmarks.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Envelopes created by pseudo-circle families in the Minkowski plane
Authors:
Yongqiao Wang,
Lin Yang,
Yuan Chang,
Pengcheng Li
Abstract:
In this paper, we address the topic of envelopes created by pseudo-circle families in the Minkowski plane, which exhibit some different properties when compared with the Euclidean case. We provide solutions to all four basic problems associated with these envelopes, namely the existence problem, representation problem, problem on the number of envelopes, and problem on the relationships of definit…
▽ More
In this paper, we address the topic of envelopes created by pseudo-circle families in the Minkowski plane, which exhibit some different properties when compared with the Euclidean case. We provide solutions to all four basic problems associated with these envelopes, namely the existence problem, representation problem, problem on the number of envelopes, and problem on the relationships of definitions.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be…
▽ More
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Electronic Correlation and Pseudogap-like Behavior of High-Temperature Superconductor La3Ni2O7
Authors:
Yidian Li,
Xian Du,
Yantao Cao,
Cuiying Pei,
Mingxin Zhang,
Wenxuan Zhao,
Kaiyi Zhai,
Runzhe Xu,
Zhongkai Liu,
Zhiwei Li,
Jinkui Zhao,
Gang Li,
Yanpeng Qi,
Hanjie Guo,
Yulin Chen,
Lexian Yang
Abstract:
High-temperature superconductivity (HTSC) remains one of the most challenging and fascinating mysteries in condensed matter physics. Recently, superconductivity with transition temperature exceeding liquid-nitrogen temperature is discovered in La3Ni2O7 at high pressure, which provides a new platform to explore the unconventional HTSC. In this work, using high-resolution angle-resolved photoemissio…
▽ More
High-temperature superconductivity (HTSC) remains one of the most challenging and fascinating mysteries in condensed matter physics. Recently, superconductivity with transition temperature exceeding liquid-nitrogen temperature is discovered in La3Ni2O7 at high pressure, which provides a new platform to explore the unconventional HTSC. In this work, using high-resolution angle-resolved photoemission spectroscopy and ab-initio calculation, we systematically investigate the electronic structures of La3Ni2O7 at ambient pressure. Our experiments are in nice agreement with ab-initio calculations after considering an orbital-dependent band renormalization effect. The strong electron correlation effect pushes a flat band of d_(z^2 ) orbital component below the Fermi level (EF), which is predicted to locate right at EF under high pressure. Moreover, the d_(x^2-y^2 ) band shows a pseudogap-like behavior with suppressed spectral weight and diminished quasiparticle peak near EF. Our findings provide important insights into the electronic structure of La3Ni2O7, which will shed light on the understanding of the unconventional superconductivity in nickelates.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting
Authors:
Luoxiao Yang,
Yun Wang,
Xinqi Fan,
Israel Cohen,
Yue Zhao,
Zijun Zhang
Abstract:
The success of large pretrained models in natural language processing (NLP) and computer vision (CV) has opened new avenues for constructing foundation models for time series forecasting (TSF). Traditional TSF foundation models rely heavily on numerical data fitting. In contrast, the human brain is inherently skilled at processing visual information, prefer predicting future trends by observing vi…
▽ More
The success of large pretrained models in natural language processing (NLP) and computer vision (CV) has opened new avenues for constructing foundation models for time series forecasting (TSF). Traditional TSF foundation models rely heavily on numerical data fitting. In contrast, the human brain is inherently skilled at processing visual information, prefer predicting future trends by observing visualized sequences. From a biomimetic perspective, utilizing models to directly process numerical sequences might not be the most effective route to achieving Artificial General Intelligence (AGI). This paper proposes ViTime, a novel Visual Intelligence-based foundation model for TSF. ViTime overcomes the limitations of numerical time series data fitting by utilizing visual data processing paradigms and employs a innovative data synthesis method during training, called Real Time Series (RealTS). Experiments on a diverse set of previously unseen forecasting datasets demonstrate that ViTime achieves state-of-the-art zero-shot performance, even surpassing the best individually trained supervised models in some situations. These findings suggest that visual intelligence can significantly enhance time series analysis and forecasting, paving the way for more advanced and versatile models in the field. The code for our framework is accessible at https://github.com/IkeYang/ViTime.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
TARGO: Benchmarking Target-driven Object Grasping under Occlusions
Authors:
Yan Xia,
Ran Ding,
Ziyuan Qin,
Guanqi Zhan,
Kaichen Zhou,
Long Yang,
Hao Dong,
Daniel Cremers
Abstract:
Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contr…
▽ More
Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contributions: 1) We are the first to study the occlusion level of grasping. 2) We set up an evaluation benchmark consisting of large-scale synthetic data and part of real-world data, and we evaluated five grasp models and found that even the current SOTA model suffers when the occlusion level increases, leaving grasping under occlusion still a challenge. 3) We also generate a large-scale training dataset via a scalable pipeline, which can be used to boost the performance of grasping under occlusion and generalized to the real world. 4) We further propose a transformer-based grasping model involving a shape completion module, termed TARGO-Net, which performs most robustly as occlusion increases. Our benchmark dataset can be found at https://TARGO-benchmark.github.io/.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU
Authors:
Daliang Xu,
Hao Zhang,
Liming Yang,
Ruiqi Liu,
Gang Huang,
Mengwei Xu,
Xuanzhe Liu
Abstract:
On-device large language models (LLMs) are catalyzing novel mobile applications such as UI task automation and personalized email auto-reply, without giving away users' private data. However, on-device LLMs still suffer from unacceptably long inference latency, especially the time to first token (prefill stage) due to the need of long context for accurate, personalized content generation, as well…
▽ More
On-device large language models (LLMs) are catalyzing novel mobile applications such as UI task automation and personalized email auto-reply, without giving away users' private data. However, on-device LLMs still suffer from unacceptably long inference latency, especially the time to first token (prefill stage) due to the need of long context for accurate, personalized content generation, as well as the lack of parallel computing capacity of mobile CPU/GPU.
To enable practical on-device LLM, we present mllm-NPU, the first-of-its-kind LLM inference system that efficiently leverages on-device Neural Processing Unit (NPU) offloading. Essentially, mllm-NPU is an algorithm-system co-design that tackles a few semantic gaps between the LLM architecture and contemporary NPU design. Specifically, it re-constructs the prompt and model in three levels: (1) At prompt level, it divides variable-length prompts into multiple fixed-sized chunks while maintaining data dependencies; (2) At tensor level, it identifies and extracts significant outliers to run on the CPU/GPU in parallel with minimal overhead; (3) At block level, it schedules Transformer blocks in an out-of-order manner to the CPU/GPU and NPU based on their hardware affinity and sensitivity to accuracy. Compared to competitive baselines, mllm-NPU achieves 22.4x faster prefill speed and 30.7x energy savings on average, and up to 32.8x speedup in an end-to-end real-world application. For the first time, mllm-NPU achieves more than 1,000 tokens/sec prefilling for a billion-sized model (Qwen1.5-1.8B), paving the way towards practical on-device LLM.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
Authors:
Pingyi Chen,
Chenglu Zhu,
Sunyi Zheng,
Honglin Li,
Lin Yang
Abstract:
Whole slide imaging is routinely adopted for carcinoma diagnosis and prognosis. Abundant experience is required for pathologists to achieve accurate and reliable diagnostic results of whole slide images (WSI). The huge size and heterogeneous features of WSIs make the workflow of pathological reading extremely time-consuming. In this paper, we propose a novel framework (WSI-VQA) to interpret WSIs b…
▽ More
Whole slide imaging is routinely adopted for carcinoma diagnosis and prognosis. Abundant experience is required for pathologists to achieve accurate and reliable diagnostic results of whole slide images (WSI). The huge size and heterogeneous features of WSIs make the workflow of pathological reading extremely time-consuming. In this paper, we propose a novel framework (WSI-VQA) to interpret WSIs by generative visual question answering. WSI-VQA shows universality by reframing various kinds of slide-level tasks in a question-answering pattern, in which pathologists can achieve immunohistochemical grading, survival prediction, and tumor subtyping following human-machine interaction. Furthermore, we establish a WSI-VQA dataset which contains 8672 slide-level question-answering pairs with 977 WSIs. Besides the ability to deal with different slide-level tasks, our generative model which is named Wsi2Text Transformer (W2T) outperforms existing discriminative models in medical correctness, which reveals the potential of our model to be applied in the clinical scenario. Additionally, we also visualize the co-attention mapping between word embeddings and WSIs as an intuitive explanation for diagnostic results. The dataset and related code are available at https://github.com/cpystan/WSI-VQA.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
$\mathrm{E^{2}CFD}$: Towards Effective and Efficient Cost Function Design for Safe Reinforcement Learning via Large Language Model
Authors:
Zepeng Wang,
Chao Ma,
Linjiang Zhou,
Libing Wu,
Lei Yang,
Xiaochuan Shi,
Guojun Peng
Abstract:
Different classes of safe reinforcement learning algorithms have shown satisfactory performance in various types of safety requirement scenarios. However, the existing methods mainly address one or several classes of specific safety requirement scenario problems and cannot be applied to arbitrary safety requirement scenarios. In addition, the optimization objectives of existing reinforcement learn…
▽ More
Different classes of safe reinforcement learning algorithms have shown satisfactory performance in various types of safety requirement scenarios. However, the existing methods mainly address one or several classes of specific safety requirement scenario problems and cannot be applied to arbitrary safety requirement scenarios. In addition, the optimization objectives of existing reinforcement learning algorithms are misaligned with the task requirements. Based on the need to address these issues, we propose $\mathrm{E^{2}CFD}$, an effective and efficient cost function design framework. $\mathrm{E^{2}CFD}$ leverages the capabilities of a large language model (LLM) to comprehend various safety scenarios and generate corresponding cost functions. It incorporates the \textit{fast performance evaluation (FPE)} method to facilitate rapid and iterative updates to the generated cost function. Through this iterative process, $\mathrm{E^{2}CFD}$ aims to obtain the most suitable cost function for policy training, tailored to the specific tasks within the safety scenario. Experiments have proven that the performance of policies trained using this framework is superior to traditional safe reinforcement learning algorithms and policies trained with carefully designed cost functions.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Fine-grained Dynamic Network for Generic Event Boundary Detection
Authors:
Ziwei Zheng,
Lijun He,
Le Yang,
Fan Li
Abstract:
Generic event boundary detection (GEBD) aims at pinpointing event boundaries naturally perceived by humans, playing a crucial role in understanding long-form videos. Given the diverse nature of generic boundaries, spanning different video appearances, objects, and actions, this task remains challenging. Existing methods usually detect various boundaries by the same protocol, regardless of their di…
▽ More
Generic event boundary detection (GEBD) aims at pinpointing event boundaries naturally perceived by humans, playing a crucial role in understanding long-form videos. Given the diverse nature of generic boundaries, spanning different video appearances, objects, and actions, this task remains challenging. Existing methods usually detect various boundaries by the same protocol, regardless of their distinctive characteristics and detection difficulties, resulting in suboptimal performance. Intuitively, a more intelligent and reasonable way is to adaptively detect boundaries by considering their special properties. In light of this, we propose a novel dynamic pipeline for generic event boundaries named DyBDet. By introducing a multi-exit network architecture, DyBDet automatically learns the subnet allocation to different video snippets, enabling fine-grained detection for various boundaries. Besides, a multi-order difference detector is also proposed to ensure generic boundaries can be effectively identified and adaptively processed. Extensive experiments on the challenging Kinetics-GEBD and TAPOS datasets demonstrate that adopting the dynamic strategy significantly benefits GEBD tasks, leading to obvious improvements in both performance and efficiency compared to the current state-of-the-art.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
Authors:
Le Yang,
Ziwei Zheng,
Yizeng Han,
Hao Cheng,
Shiji Song,
Gao Huang,
Fan Li
Abstract:
Recent proposed neural network-based Temporal Action Detection (TAD) models are inherently limited to extracting the discriminative representations and modeling action instances with various lengths from complex scenes by shared-weights detection heads. Inspired by the successes in dynamic neural networks, in this paper, we build a novel dynamic feature aggregation (DFA) module that can simultaneo…
▽ More
Recent proposed neural network-based Temporal Action Detection (TAD) models are inherently limited to extracting the discriminative representations and modeling action instances with various lengths from complex scenes by shared-weights detection heads. Inspired by the successes in dynamic neural networks, in this paper, we build a novel dynamic feature aggregation (DFA) module that can simultaneously adapt kernel weights and receptive fields at different timestamps. Based on DFA, the proposed dynamic encoder layer aggregates the temporal features within the action time ranges and guarantees the discriminability of the extracted representations. Moreover, using DFA helps to develop a Dynamic TAD head (DyHead), which adaptively aggregates the multi-scale features with adjusted parameters and learned receptive fields better to detect the action instances with diverse ranges from videos. With the proposed encoder layer and DyHead, a new dynamic TAD model, DyFADet, achieves promising performance on a series of challenging TAD benchmarks, including HACS-Segment, THUMOS14, ActivityNet-1.3, Epic-Kitchen 100, Ego4D-Moment QueriesV1.0, and FineAction. Code is released to https://github.com/yangle15/DyFADet-pytorch.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be…
▽ More
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
Authors:
Ling Yang,
Zixiang Zhang,
Zhilong Zhang,
Xingchao Liu,
Minkai Xu,
Wentao Zhang,
Chenlin Meng,
Stefano Ermon,
Bin Cui
Abstract:
Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate high-quality samples with fewer function evaluations, typically through iterative rectification methods or optimal transport solutions. In this paper, we introduce…
▽ More
Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate high-quality samples with fewer function evaluations, typically through iterative rectification methods or optimal transport solutions. In this paper, we introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field. Consistency-FM directly defines straight flows starting from different times to the same endpoint, imposing constraints on their velocity values. Additionally, we propose a multi-segment training approach for Consistency-FM to enhance expressiveness, achieving a better trade-off between sampling quality and speed. Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models and 1.7x faster than rectified flow models while achieving better generation quality. Our code is available at: https://github.com/YangLing0818/consistency_flow_matching
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules
Authors:
Suyi Li,
Lingyun Yang,
Xiaoxiao Jiang,
Hanfeng Lu,
Zhipeng Di,
Weiyi Lu,
Jiawei Chen,
Kan Liu,
Yinghao Yu,
Tao Lan,
Guodong Yang,
Lin Qu,
Liping Zhang,
Wei Wang
Abstract:
This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We first comprehensively analyze inference request traces for commercial text-to-image applications. It commences with our observation that add-on modules, i.e., ControlNets and LoRAs, that augment the base stable diffusion models, are ubiquitous in generatin…
▽ More
This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We first comprehensively analyze inference request traces for commercial text-to-image applications. It commences with our observation that add-on modules, i.e., ControlNets and LoRAs, that augment the base stable diffusion models, are ubiquitous in generating images for commercial applications. Despite their efficacy, these add-on modules incur high loading overhead, prolong the serving latency, and swallow up expensive GPU resources. Driven by our characterization study, we present SwiftDiffusion, a system that efficiently generates high-quality images using stable diffusion models and add-on modules. To achieve this, SwiftDiffusion reconstructs the existing text-to-image serving workflow by identifying the opportunities for parallel computation and distributing ControlNet computations across multiple GPUs. Further, SwiftDiffusion thoroughly analyzes the dynamics of image generation and develops techniques to eliminate the overhead associated with LoRA loading and patching while preserving the image quality. Last, SwiftDiffusion proposes specialized optimizations in the backbone architecture of the stable diffusion models, which are also compatible with the efficient serving of add-on modules. Compared to state-of-the-art text-to-image serving systems, SwiftDiffusion reduces serving latency by up to 5x and improves serving throughput by up to 2x without compromising image quality.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
Origin of the Chromospheric Umbral Waves in Sunspots
Authors:
Xinsheng Zhang,
Xiaoli Yan,
Zhike Xue,
Jincheng Wang,
Zhe Xu,
Qiaoling Li,
Yang Peng,
Liping Yang
Abstract:
Oscillations are ubiquitous in sunspots and the associated higher atmospheres. However, it is still unclear whether these oscillations are driven by the external acoustic waves (p-modes) or generated by the internal magnetoconvection. To obtain clues about the driving source of umbral waves in sunspots, we analyzed the spiral wave patterns (SWPs) in two sunspots registered by IRIS MgII 2796 Å slit…
▽ More
Oscillations are ubiquitous in sunspots and the associated higher atmospheres. However, it is still unclear whether these oscillations are driven by the external acoustic waves (p-modes) or generated by the internal magnetoconvection. To obtain clues about the driving source of umbral waves in sunspots, we analyzed the spiral wave patterns (SWPs) in two sunspots registered by IRIS MgII 2796 Å slit-jaw images. By tracking the motion of the SWPs, we find for the first time that two one-armed SWPs coexist in the umbra, and they can rotate either in the same or opposite directions. Furthermore, by analyzing the spatial distribution of the oscillation centers of the one-armed SWPs within the umbra (the oscillation center is defined as the location where the SWP first appears), we find that the chromospheric umbral waves repeatedly originate from the regions with high oscillation power and most of the umbral waves occur in the dark nuclei and strong magnetic field regions of the umbra. Our study results indicate that the chromospheric umbral waves are likely excited by the p-mode oscillations.
△ Less
Submitted 3 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
CURLS: Causal Rule Learning for Subgroups with Significant Treatment Effect
Authors:
Jiehui Zhou,
Linxiao Yang,
Xingyu Liu,
Xinyue Gu,
Liang Sun,
Wei Chen
Abstract:
In causal inference, estimating heterogeneous treatment effects (HTE) is critical for identifying how different subgroups respond to interventions, with broad applications in fields such as precision medicine and personalized advertising. Although HTE estimation methods aim to improve accuracy, how to provide explicit subgroup descriptions remains unclear, hindering data interpretation and strateg…
▽ More
In causal inference, estimating heterogeneous treatment effects (HTE) is critical for identifying how different subgroups respond to interventions, with broad applications in fields such as precision medicine and personalized advertising. Although HTE estimation methods aim to improve accuracy, how to provide explicit subgroup descriptions remains unclear, hindering data interpretation and strategic intervention management. In this paper, we propose CURLS, a novel rule learning method leveraging HTE, which can effectively describe subgroups with significant treatment effects. Specifically, we frame causal rule learning as a discrete optimization problem, finely balancing treatment effect with variance and considering the rule interpretability. We design an iterative procedure based on the minorize-maximization algorithm and solve a submodular lower bound as an approximation for the original. Quantitative experiments and qualitative case studies verify that compared with state-of-the-art methods, CURLS can find subgroups where the estimated and true effects are 16.1% and 13.8% higher and the variance is 12.0% smaller, while maintaining similar or better estimation accuracy and rule interpretability. Code is available at https://osf.io/zwp2k/.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Multi-Modal Fusion-Based Multi-Task Semantic Communication System
Authors:
Zengle Zhu,
Rongqing Zhang,
Xiang Cheng,
Liuqing Yang
Abstract:
In recent years, there has been significant progress in semantic communication systems empowered by deep learning techniques. It has greatly improved the efficiency of information transmission. Nevertheless, traditional semantic communication models still face challenges, particularly due to their single-task and single-modal orientation. Many of these models are designed for specific tasks, which…
▽ More
In recent years, there has been significant progress in semantic communication systems empowered by deep learning techniques. It has greatly improved the efficiency of information transmission. Nevertheless, traditional semantic communication models still face challenges, particularly due to their single-task and single-modal orientation. Many of these models are designed for specific tasks, which may result in limitations when applied to multi-task communication systems. Moreover, these models often overlook the correlations among different modal data in multi-modal tasks. It leads to an incomplete understanding of complex information, causing increased communication overhead and diminished performance. To address these problems, we propose a multi-modal fusion-based multi-task semantic communication (MFMSC) framework. In contrast to traditional semantic communication approaches, MFMSC can effectively handle various tasks across multiple modalities. Furthermore, we design a fusion module based on Bidirectional Encoder Representations from Transformers (BERT) for multi-modal semantic information fusion. By leveraging the powerful semantic understanding capabilities and self-attention mechanism of BERT, we achieve effective fusion of semantic information from different modalities. We compare our model with multiple benchmarks. Simulation results show that MFMSC outperforms these models in terms of both performance and communication overhead.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Active-RIS-Aided Covert Communications in NOMA-Inspired ISAC Wireless Systems
Authors:
Miaomiao Zhu,
Pengxu Chen,
Liang Yang,
Alexandros-Apostolos A. Boulogeorgos,
Theodoros A. Tsiftsis,
Hongwu Liu
Abstract:
Non-orthogonal multiple access (NOMA)-inspired integrated sensing and communication (ISAC) facilitates spectrum sharing for radar sensing and NOMA communications, whereas facing privacy and security challenges due to open wireless propagation. In this paper, active reconfigurable intelligent surface (RIS) is employed to aid covert communications in NOMA-inspired ISAC wireless system with the aim o…
▽ More
Non-orthogonal multiple access (NOMA)-inspired integrated sensing and communication (ISAC) facilitates spectrum sharing for radar sensing and NOMA communications, whereas facing privacy and security challenges due to open wireless propagation. In this paper, active reconfigurable intelligent surface (RIS) is employed to aid covert communications in NOMA-inspired ISAC wireless system with the aim of maximizing the covert rate. Specifically, a dual-function base-station (BS) transmits the superposition signal to sense multiple targets, while achieving covert and reliable communications for a pair of NOMA covert and public users, respectively, in the presence of a warden. Two superposition transmission schemes, namely, the transmissions with dedicated sensing signal (w-DSS) and without dedicated sensing signal (w/o-DSS), are respectively considered in the formulations of the joint transmission and reflection beamforming optimization problems. Numerical results demonstrate that active-RIS-aided NOMA-ISAC system outperforms the passive-RIS-aided and without-RIS counterparts in terms of covert rate and trade-off between covert communication and sensing performance metrics. Finally, the w/o-DSS scheme, which omits the dedicated sensing signal, achieves a higher covert rate than the w-DSS scheme by allocating more transmit power for the covert transmissions, while preserving a comparable multi-target sensing performance.
△ Less
Submitted 29 June, 2024;
originally announced July 2024.
-
Imaging of single barium atoms in a second matrix site in solid xenon for barium tagging in a $^{136}$Xe double beta decay experiment
Authors:
M. Yvaine,
D. Fairbank,
J. Soderstrom,
C. Taylor,
J. Stanley,
T. Walton,
C. Chambers,
A. Iverson,
W. Fairbank,
S. Al Kharusi,
A. Amy,
E. Angelico,
A. Anker,
I. J. Arnquist,
A. Atencio,
J. Bane,
V. Belov,
E. P. Bernard,
T. Bhatta,
A. Bolotnikov,
J. Breslin,
P. A. Breur,
J. P. Brodsky,
E. Brown,
T. Brunner
, et al. (112 additional authors not shown)
Abstract:
Neutrinoless double beta decay is one of the most sensitive probes for new physics beyond the Standard Model of particle physics. One of the isotopes under investigation is $^{136}$Xe, which would double beta decay into $^{136}$Ba. Detecting the single $^{136}$Ba daughter provides a sort of ultimate tool in the discrimination against backgrounds. Previous work demonstrated the ability to perform s…
▽ More
Neutrinoless double beta decay is one of the most sensitive probes for new physics beyond the Standard Model of particle physics. One of the isotopes under investigation is $^{136}$Xe, which would double beta decay into $^{136}$Ba. Detecting the single $^{136}$Ba daughter provides a sort of ultimate tool in the discrimination against backgrounds. Previous work demonstrated the ability to perform single atom imaging of Ba atoms in a single-vacancy site of a solid xenon matrix. In this paper, the effort to identify signal from individual barium atoms is extended to Ba atoms in a hexa-vacancy site in the matrix and is achieved despite increased photobleaching in this site. Abrupt fluorescence turn-off of a single Ba atom is also observed. Significant recovery of fluorescence signal lost through photobleaching is demonstrated upon annealing of Ba deposits in the Xe ice. Following annealing, it is observed that Ba atoms in the hexa-vacancy site exhibit antibleaching while Ba atoms in the tetra-vacancy site exhibit bleaching. This may be evidence for a matrix site transfer upon laser excitation. Our findings offer a path of continued research toward tagging of Ba daughters in all significant sites in solid xenon.
△ Less
Submitted 28 June, 2024;
originally announced July 2024.
-
PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration
Authors:
Yuxuan Sun,
Yunlong Zhang,
Yixuan Si,
Chenglu Zhu,
Zhongyi Shui,
Kai Zhang,
Jingxiong Li,
Xingheng Lyu,
Tao Lin,
Lin Yang
Abstract:
Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology imag…
▽ More
Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology image-text pairs from platforms like PubMed, YouTube, and Twitter, which provide limited, unscalable data with generally suboptimal image quality. In this work, we leverage large-scale WSI datasets like TCGA to extract numerous high-quality image patches. We then train a large multimodal model to generate captions for these images, creating PathGen-1.6M, a dataset containing 1.6 million high-quality image-caption pairs. Our approach involves multiple agent models collaborating to extract representative WSI patches, generating and refining captions to obtain high-quality image-text pairs. Extensive experiments show that integrating these generated pairs with existing datasets to train a pathology-specific CLIP model, PathGen-CLIP, significantly enhances its ability to analyze pathological images, with substantial improvements across nine pathology-related zero-shot image classification tasks and three whole-slide image tasks. Furthermore, we construct 200K instruction-tuning data based on PathGen-1.6M and integrate PathGen-CLIP with the Vicuna LLM to create more powerful multimodal models through instruction tuning. Overall, we provide a scalable pathway for high-quality data generation in pathology, paving the way for next-generation general pathology models.
△ Less
Submitted 28 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…
▽ More
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
△ Less
Submitted 2 July, 2024; v1 submitted 28 June, 2024;
originally announced July 2024.
-
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Authors:
Longrong Yang,
Dong Sheng,
Chaoxiang Cai,
Fan Yang,
Size Li,
Di Zhang,
Xi Li
Abstract:
The Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and thus they e…
▽ More
The Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and thus they employ a router to predict the routing for each token. However, the predictions are based solely on sample features and do not truly reveal the optimization direction of tokens. This can lead to severe optimization conflicts between different tokens within an expert. To address this problem, this paper proposes a novel method based on token-level gradient analysis. Specifically, we first use token-level gradients to identify conflicting tokens in experts. Then, we add a specialized loss tailored to eliminate conflicts among tokens within each expert. Our method can serve as a plug-in for diverse Large Vision-Language Models, and extensive experimental results demonstrate the effectiveness of our method. The code will be publicly available at https://github.com/longrongyang/STGC.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
PathAlign: A vision-language model for whole slide images in histopathology
Authors:
Faruk Ahmed,
Andrew Sellergren,
Lin Yang,
Shawn Xu,
Boris Babenko,
Abbi Ward,
Niels Olson,
Arash Mohtashamian,
Yossi Matias,
Greg S. Corrado,
Quang Duong,
Dale R. Webster,
Shravya Shetty,
Daniel Golden,
Yun Liu,
David F. Steiner,
Ellery Wulczyn
Abstract:
Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggrega…
▽ More
Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch-level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential dec…
▽ More
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Geometric Features Enhanced Human-Object Interaction Detection
Authors:
Manli Zhu,
Edmond S. L. Ho,
Shuang Chen,
Longzhi Yang,
Hubert P. H. Shum
Abstract:
Cameras are essential vision instruments to capture images for pattern detection and measurement. Human-object interaction (HOI) detection is one of the most popular pattern detection approaches for captured human-centric visual scenes. Recently, Transformer-based models have become the dominant approach for HOI detection due to their advanced network architectures and thus promising results. Howe…
▽ More
Cameras are essential vision instruments to capture images for pattern detection and measurement. Human-object interaction (HOI) detection is one of the most popular pattern detection approaches for captured human-centric visual scenes. Recently, Transformer-based models have become the dominant approach for HOI detection due to their advanced network architectures and thus promising results. However, most of them follow the one-stage design of vanilla Transformer, leaving rich geometric priors under-exploited and leading to compromised performance especially when occlusion occurs. Given that geometric features tend to outperform visual ones in occluded scenarios and offer information that complements visual cues, we propose a novel end-to-end Transformer-style HOI detection model, i.e., geometric features enhanced HOI detector (GeoHOI). One key part of the model is a new unified self-supervised keypoint learning method named UniPointNet that bridges the gap of consistent keypoint representation across diverse object categories, including humans. GeoHOI effectively upgrades a Transformer-based HOI detector benefiting from the keypoints similarities measuring the likelihood of human-object interactions as well as local keypoint patches to enhance interaction query representation, so as to boost HOI predictions. Extensive experiments show that the proposed method outperforms the state-of-the-art models on V-COCO and achieves competitive performance on HICO-DET. Case study results on the post-disaster rescue with vision-based instruments showcase the applicability of the proposed GeoHOI in real-world applications.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Confident Natural Policy Gradient for Local Planning in $q_π$-realizable Constrained MDPs
Authors:
Tian Tian,
Lin F. Yang,
Csaba Szepesvári
Abstract:
The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives while maximizing cumulative reward. However, the current understanding of how to learn efficiently in a CMDP environment with a potentially infinite number of states remains under investigation, particularly when function approximation is…
▽ More
The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives while maximizing cumulative reward. However, the current understanding of how to learn efficiently in a CMDP environment with a potentially infinite number of states remains under investigation, particularly when function approximation is applied to the value functions. In this paper, we address the learning problem given linear function approximation with $q_π$-realizability, where the value functions of all policies are linearly representable with a known feature map, a setting known to be more general and challenging than other linear settings. Utilizing a local-access model, we propose a novel primal-dual algorithm that, after $\tilde{O}(\text{poly}(d) ε^{-3})$ queries, outputs with high probability a policy that strictly satisfies the constraints while nearly optimizing the value with respect to a reward function. Here, $d$ is the feature dimension and $ε> 0$ is a given error. The algorithm relies on a carefully crafted off-policy evaluation procedure to evaluate the policy using historical data, which informs policy updates through policy gradients and conserves samples. To our knowledge, this is the first result achieving polynomial sample complexity for CMDP in the $q_π$-realizable setting.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Authors:
Zuxin Liu,
Thai Hoang,
Jianguo Zhang,
Ming Zhu,
Tian Lan,
Shirley Kokane,
Juntao Tan,
Weiran Yao,
Zhiwei Liu,
Yihao Feng,
Rithesh Murthy,
Liangwei Yang,
Silvio Savarese,
Juan Carlos Niebles,
Huan Wang,
Shelby Heinecke,
Caiming Xiong
Abstract:
The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scal…
▽ More
The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each data in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agent domains. The dataset is available on Huggingface: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage: https://apigen-pipeline.github.io/
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of…
▽ More
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
An Empirical Study of Unit Test Generation with Large Language Models
Authors:
Lin Yang,
Chen Yang,
Shutao Gao,
Weijing Wang,
Bo Wang,
Qihao Zhu,
Xiao Chu,
Jianyi Zhou,
Guangtai Liang,
Qianxiang Wang,
Junjie Chen
Abstract:
Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting s…
▽ More
Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting strategies, leaving the capabilities of advanced open-source LLMs with various prompting settings unexplored. Particularly, open-source LLMs offer advantages in data privacy protection and have demonstrated superior performance in some tasks. Moreover, effective prompting is crucial for maximizing LLMs' capabilities. In this paper, we conduct the first empirical study to fill this gap, based on 17 Java projects, five widely-used open-source LLMs with different structures and parameter sizes, and comprehensive evaluation metrics. Our findings highlight the significant influence of various prompt factors, show the performance of open-source LLMs compared to the commercial GPT-4 and the traditional Evosuite, and identify limitations in LLM-based unit test generation. We then derive a series of implications from our study to guide future research and practical use of LLM-based unit test generation.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0. 04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, an…
▽ More
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0. 04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Learning for Bandits under Action Erasures
Authors:
Osama Hanna,
Merve Karakas,
Lin F. Yang,
Christina Fragouli
Abstract:
We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know if an action is erased, the central learner does not (there is no feedback), and thus does not know whether th…
▽ More
We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know if an action is erased, the central learner does not (there is no feedback), and thus does not know whether the observed reward resulted from the desired action or not. We propose a scheme that can work on top of any (existing or future) MAB algorithm and make it robust to action erasures. Our scheme results in a worst-case regret over action-erasure channels that is at most a factor of $O(1/\sqrt{1-ε})$ away from the no-erasure worst-case regret of the underlying MAB algorithm, where $ε$ is the erasure probability. We also propose a modification of the successive arm elimination algorithm and prove that its worst-case regret is $\Tilde{O}(\sqrt{KT}+K/(1-ε))$, which we prove is optimal by providing a matching lower bound.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Effects of model size in density-functional-theory study of alloys: A case study of CsPbBr$_2$Cl
Authors:
Fang Pan,
Lin Yang,
Zhuangde Jiang,
Wei Ren,
Zuo-Guang Ye,
Jingrui Li
Abstract:
The primary challenge of density-functional-theory exploration of alloy systems concerns the size of computational model. Small alloy models can hardly exhibit the chemical disorder properly, while large models induce difficulty in sampling the alignments within the massive material space. We study this problem with the γ phase of the mixed halide inorganic perovskite alloy CsPbBr$_2$Cl. The distr…
▽ More
The primary challenge of density-functional-theory exploration of alloy systems concerns the size of computational model. Small alloy models can hardly exhibit the chemical disorder properly, while large models induce difficulty in sampling the alignments within the massive material space. We study this problem with the γ phase of the mixed halide inorganic perovskite alloy CsPbBr$_2$Cl. The distribution of alloy formation energy becomes narrower when the size of the model system increases along $\sqrt{2}\times\sqrt{2}\times2$, $2\times2\times2$, and $2\sqrt{2}\times2\sqrt{2}\times2$ models. This is primarily because the distribution of Br distribution parameters, which plays a leading role in determining the formation energy range, is more narrow for larger models. As a result, larger entropy stability effect can be observed with larger models especially at high temperatures, for which the approximation using mixing entropy based on the ideal solution model becomes better.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and…
▽ More
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
The State-Action-Reward-State-Action Algorithm in Spatial Prisoner's Dilemma Game
Authors:
Lanyu Yang,
Dongchun Jiang,
Fuqiang Guo,
Mingjian Fu
Abstract:
Cooperative behavior is prevalent in both human society and nature. Understanding the emergence and maintenance of cooperation among self-interested individuals remains a significant challenge in evolutionary biology and social sciences. Reinforcement learning (RL) provides a suitable framework for studying evolutionary game theory as it can adapt to environmental changes and maximize expected ben…
▽ More
Cooperative behavior is prevalent in both human society and nature. Understanding the emergence and maintenance of cooperation among self-interested individuals remains a significant challenge in evolutionary biology and social sciences. Reinforcement learning (RL) provides a suitable framework for studying evolutionary game theory as it can adapt to environmental changes and maximize expected benefits. In this study, we employ the State-Action-Reward-State-Action (SARSA) algorithm as the decision-making mechanism for individuals in evolutionary game theory. Initially, we apply SARSA to imitation learning, where agents select neighbors to imitate based on rewards. This approach allows us to observe behavioral changes in agents without independent decision-making abilities. Subsequently, SARSA is utilized for primary agents to independently choose cooperation or betrayal with their neighbors. We evaluate the impact of SARSA on cooperation rates by analyzing variations in rewards and the distribution of cooperators and defectors within the network.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Robust Zero Trust Architecture: Joint Blockchain based Federated learning and Anomaly Detection based Framework
Authors:
Shiva Raj Pokhrel,
Luxing Yang,
Sutharshan Rajasegarar,
Gang Li
Abstract:
This paper introduces a robust zero-trust architecture (ZTA) tailored for the decentralized system that empowers efficient remote work and collaboration within IoT networks. Using blockchain-based federated learning principles, our proposed framework includes a robust aggregation mechanism designed to counteract malicious updates from compromised clients, enhancing the security of the global learn…
▽ More
This paper introduces a robust zero-trust architecture (ZTA) tailored for the decentralized system that empowers efficient remote work and collaboration within IoT networks. Using blockchain-based federated learning principles, our proposed framework includes a robust aggregation mechanism designed to counteract malicious updates from compromised clients, enhancing the security of the global learning process. Moreover, secure and reliable trust computation is essential for remote work and collaboration. The robust ZTA framework integrates anomaly detection and trust computation, ensuring secure and reliable device collaboration in a decentralized fashion. We introduce an adaptive algorithm that dynamically adjusts to varying user contexts, using unsupervised clustering to detect novel anomalies, like zero-day attacks. To ensure a reliable and scalable trust computation, we develop an algorithm that dynamically adapts to varying user contexts by employing incremental anomaly detection and clustering techniques to identify and share local and global anomalies between nodes. Future directions include scalability improvements, Dirichlet process for advanced anomaly detection, privacy-preserving techniques, and the integration of post-quantum cryptographic methods to safeguard against emerging quantum threats.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
UniCoder: Scaling Code Large Language Model via Universal Code
Authors:
Tao Sun,
Linzheng Chai,
Jian Yang,
Yuwei Yin,
Hongcheng Guo,
Jiaheng Liu,
Bing Wang,
Liqun Yang,
Zhoujun Li
Abstract:
Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on directing the models to articulate intermediate natural-language reasoning steps, as in chain-of-thought (CoT) prompting, and then output code with the natural lan…
▽ More
Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on directing the models to articulate intermediate natural-language reasoning steps, as in chain-of-thought (CoT) prompting, and then output code with the natural language or other structured intermediate steps. However, such output is not suitable for code translation or generation tasks since the standard CoT has different logical structures and forms of expression with the code. In this work, we introduce the universal code (UniCode) as the intermediate representation. It is a description of algorithm steps using a mix of conventions of programming languages, such as assignment operator, conditional operator, and loop. Hence, we collect an instruction dataset UniCoder-Instruct to train our model UniCoder on multi-task learning objectives. UniCoder-Instruct comprises natural-language questions, code solutions, and the corresponding universal code. The alignment between the intermediate universal code representation and the final code solution significantly improves the quality of the generated code. The experimental results demonstrate that UniCoder with the universal code significantly outperforms the previous prompting methods by a large margin, showcasing the effectiveness of the structural clues in pseudo-code.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models
Authors:
Jiajia Li,
Lu Yang,
Mingni Tang,
Cong Chen,
Zuchao Li,
Ping Wang,
Hai Zhao
Abstract:
Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel…
▽ More
Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs. ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. By leveraging ZIQI-Eval, we conduct a comprehensive evaluation over 16 LLMs to evaluate and analyze LLMs' performance in the domain of music. Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities. With ZIQI-Eval, we aim to provide a standardized and robust evaluation framework that facilitates a comprehensive assessment of LLMs' music-related abilities. The dataset is available at GitHub\footnote{https://github.com/zcli-charlie/ZIQI-Eval} and HuggingFace\footnote{https://huggingface.co/datasets/MYTH-Lab/ZIQI-Eval}.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
ADR: Attention Diversification Regularization for Mitigating Overfitting in Multiple Instance Learning based Whole Slide Image Classification
Authors:
Yunlong Zhang,
Zhongyi Shui,
Yunxuan Sun,
Honglin Li,
Jingxiong Li,
Chenglu Zhu,
Sunyi Zheng,
Lin Yang
Abstract:
Multiple Instance Learning (MIL) has demonstrated effectiveness in analyzing whole slide images (WSIs), yet it often encounters overfitting challenges in real-world applications. This paper reveals the correlation between MIL's performance and the entropy of attention values. Based on this observation, we propose Attention Diversity Regularization (ADR), a simple but effective technique aimed at p…
▽ More
Multiple Instance Learning (MIL) has demonstrated effectiveness in analyzing whole slide images (WSIs), yet it often encounters overfitting challenges in real-world applications. This paper reveals the correlation between MIL's performance and the entropy of attention values. Based on this observation, we propose Attention Diversity Regularization (ADR), a simple but effective technique aimed at promoting high entropy in attention values. Specifically, ADR introduces a negative Shannon entropy loss for attention values into the regular MIL framework. Compared to existing methods aimed at alleviating overfitting, which often necessitate additional modules or processing steps, our ADR approach requires no such extras, demonstrating simplicity and efficiency. We evaluate our ADR on three WSI classification tasks. ADR achieves superior performance over the state-of-the-art on most of them. We also show that ADR can enhance heatmaps, aligning them better with pathologists' diagnostic criteria. The source code is available at \url{https://github.com/dazhangyu123/ADR}.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Sublattice Dichotomy in Monolayer FeSe Superconductor
Authors:
Cui Ding,
Zhipeng Xu,
Xiaotong Jiao,
Qiyin Hu,
Wenxuan Zhao,
Lexian Yang,
Kun Jiang,
Jin-Feng Jia,
Lili Wang,
Jiangping Hu,
Qi-Kun Xue
Abstract:
The pairing mechanism behind the monolayer FeSe is one essential question for iron-based superconductors. In this work, we show the sublattice degree of freedoms of monolayer FeSe plays a special role in its pairing properties, namely the sublattice dichotomy. The high-quality monolayer FeSe samples with atomic flat $1\times1$ topography on the SrTiO$_3$(001) substrates are grown by molecular beam…
▽ More
The pairing mechanism behind the monolayer FeSe is one essential question for iron-based superconductors. In this work, we show the sublattice degree of freedoms of monolayer FeSe plays a special role in its pairing properties, namely the sublattice dichotomy. The high-quality monolayer FeSe samples with atomic flat $1\times1$ topography on the SrTiO$_3$(001) substrates are grown by molecular beam epitaxy. By comparing the tunneling spectra at $α$ and $β$ Fe sublattices, we find the coherence peak of $α$-Fe at the inner gap $+V_i$ is higher than $β$-Fe while the coherence peak of $β$-Fe at $-V_i$ is higher than $α$-Fe with a similar amount. We also observed a reversed effect at the outer gap $\pm V_o$. We propose the $η$-pairing mechanism between $k$ and $-k+Q$ is the key mechanism for this unconventional sublattice dichotomy effect.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction…
▽ More
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ collider and deepen our understanding about the nature of this particle.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
CoCPF: Coordinate-based Continuous Projection Field for Ill-Posed Inverse Problem in Imaging
Authors:
Zixuan Chen,
Lingxiao Yang,
Jian-Huang Lai,
Xiaohua Xie
Abstract:
Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements. It allows the subjects exposed to less ionizing radiation, reducing the lifetime risk of developing cancers. Recent researches employ implicit neural representation (INR) techniques to reconstruct CT images from a single SV sinogram. However, due to ill-posedness, these INR-based…
▽ More
Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements. It allows the subjects exposed to less ionizing radiation, reducing the lifetime risk of developing cancers. Recent researches employ implicit neural representation (INR) techniques to reconstruct CT images from a single SV sinogram. However, due to ill-posedness, these INR-based methods may leave considerable ``holes'' (i.e., unmodeled spaces) in their fields, leading to sub-optimal results. In this paper, we propose the Coordinate-based Continuous Projection Field (CoCPF), which aims to build hole-free representation fields for SVCT reconstruction, achieving better reconstruction quality. Specifically, to fill the holes, CoCPF first employs the stripe-based volume sampling module to broaden the sampling regions of Radon transformation from rays (1D space) to stripes (2D space), which can well cover the internal regions between SV projections. Then, by feeding the sampling regions into the proposed differentiable rendering modules, the holes can be jointly optimized during training, reducing the ill-posed levels. As a result, CoCPF can accurately estimate the internal measurements between SV projections (i.e., DV sinograms), producing high-quality CT images after re-projection. Extensive experiments on simulated and real projection datasets demonstrate that CoCPF outperforms state-of-the-art methods for 2D and 3D SVCT reconstructions under various projection numbers and geometries, yielding fine-grained details and fewer artifacts. Our code will be publicly available.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation
Authors:
Zixuan Chen,
Ruijie Su,
Jiahao Zhu,
Lingxiao Yang,
Jian-Huang Lai,
Xiaohua Xie
Abstract:
Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent bottleneck in generation quality because the widely-used objectives such as Score Distillation Sampling (SDS) inappropriately omit U-Net jacobians for swift generation, leading to significant bias compared to the "true" gradient obtained by full denoising sampling. This bi…
▽ More
Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent bottleneck in generation quality because the widely-used objectives such as Score Distillation Sampling (SDS) inappropriately omit U-Net jacobians for swift generation, leading to significant bias compared to the "true" gradient obtained by full denoising sampling. This bias brings inconsistent updating direction, resulting in implausible 3D generation e.g., color deviation, Janus problem, and semantically inconsistent details). In this work, we propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks. Specifically, PCDS builds the pose-dependent consistency function within diffusion trajectories, allowing to approximate true gradients through minimal sampling steps (1-3). Compared to SDS, PCDS can acquire a more accurate updating direction with the same sampling time (1 sampling step), while enabling few-step (2-3) sampling to trade compute for higher generation quality. For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details. Extensive experiments demonstrate that our approach outperforms the state-of-the-art in generation quality and training efficiency, conspicuously alleviating the implausible 3D generation issues caused by the deviated updating direction. Moreover, it can be simply applied to many 3D generative applications to yield impressive 3D assets, please see our project page: https://narcissusex.github.io/VividDreamer.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
TemPrompt: Multi-Task Prompt Learning for Temporal Relation Extraction in RAG-based Crowdsourcing Systems
Authors:
Jing Yang,
Yu Zhao,
Linyao Yang,
Xiao Wang,
Long Chen,
Fei-Yue Wang
Abstract:
Temporal relation extraction (TRE) aims to grasp the evolution of events or actions, and thus shape the workflow of associated tasks, so it holds promise in helping understand task requests initiated by requesters in crowdsourcing systems. However, existing methods still struggle with limited and unevenly distributed annotated data. Therefore, inspired by the abundant global knowledge stored withi…
▽ More
Temporal relation extraction (TRE) aims to grasp the evolution of events or actions, and thus shape the workflow of associated tasks, so it holds promise in helping understand task requests initiated by requesters in crowdsourcing systems. However, existing methods still struggle with limited and unevenly distributed annotated data. Therefore, inspired by the abundant global knowledge stored within pre-trained language models (PLMs), we propose a multi-task prompt learning framework for TRE (TemPrompt), incorporating prompt tuning and contrastive learning to tackle these issues. To elicit more effective prompts for PLMs, we introduce a task-oriented prompt construction approach that thoroughly takes the myriad factors of TRE into consideration for automatic prompt generation. In addition, we design temporal event reasoning in the form of masked language modeling as auxiliary tasks to bolster the model's focus on events and temporal cues. The experimental results demonstrate that TemPrompt outperforms all compared baselines across the majority of metrics under both standard and few-shot settings. A case study on designing and manufacturing printed circuit boards is provided to validate its effectiveness in crowdsourcing scenarios.
△ Less
Submitted 9 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
LLM4CP: Adapting Large Language Models for Channel Prediction
Authors:
Boxun Liu,
Xuanyu Liu,
Shijian Gao,
Xiang Cheng,
Liuqing Yang
Abstract:
Channel prediction is an effective approach for reducing the feedback or estimation overhead in massive multi-input multi-output (m-MIMO) systems. However, existing channel prediction methods lack precision due to model mismatch errors or network generalization issues. Large language models (LLMs) have demonstrated powerful modeling and generalization abilities, and have been successfully applied…
▽ More
Channel prediction is an effective approach for reducing the feedback or estimation overhead in massive multi-input multi-output (m-MIMO) systems. However, existing channel prediction methods lack precision due to model mismatch errors or network generalization issues. Large language models (LLMs) have demonstrated powerful modeling and generalization abilities, and have been successfully applied to cross-modal tasks, including the time series analysis. Leveraging the expressive power of LLMs, we propose a pre-trained LLM-empowered channel prediction method (LLM4CP) to predict the future downlink channel state information (CSI) sequence based on the historical uplink CSI sequence. We fine-tune the network while freezing most of the parameters of the pre-trained LLM for better cross-modality knowledge transfer. To bridge the gap between the channel data and the feature space of the LLM, preprocessor, embedding, and output modules are specifically tailored by taking into account unique channel characteristics. Simulations validate that the proposed method achieves SOTA prediction performance on full-sample, few-shot, and generalization tests with low training and inference costs.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.