-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching fraction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
Submitted 13 June, 2024;
originally announced June 2024.
-
Conformance Testing of Relational DBMS Against SQL Specifications
Authors:
Shuang Liu,
Chenglin Tian,
Jun Sun,
Ruifeng Wang,
Wei Lu,
Yongxin Zhao,
Yinxing Xue,
Junjie Wang,
Xiaoyong Du
Abstract:
A Relational Database Management System (RDBMS) is a fundamental piece of software that supports a wide range of applications, making it critical to identify bugs within these systems. There has been active research on testing RDBMSs, most of which uses crashes or metamorphic relations as the oracle. Although existing approaches can detect bugs in RDBMSs, they are far from comprehensively evaluating an RDBMS's correctness (i.e., with respect to the semantics of SQL). In this work, we propose a method to test the semantic conformance of an RDBMS, i.e., whether its behavior respects the intended semantics of SQL. Specifically, we have formally defined the semantics of SQL and implemented them in Prolog. The Prolog implementation then serves as the reference RDBMS, enabling differential testing of existing RDBMSs. We applied our approach to four widely used and thoroughly tested RDBMSs, i.e., MySQL, TiDB, SQLite, and DuckDB. In total, our approach uncovered 19 bugs and 11 inconsistencies, all of which relate to violations of the SQL specification or to missing or unclear specification, thereby demonstrating the effectiveness and applicability of our approach.
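The differential-testing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' tool: the paper's reference implementation is written in Prolog, whereas here a small hand-written Python function stands in as the reference for a single query shape, and SQLite (via Python's standard `sqlite3` module) plays the system under test. The function names `reference_select_where_gt`, `run_on_sqlite`, and `differential_test` are hypothetical.

```python
import sqlite3


def reference_select_where_gt(rows, threshold):
    """Hypothetical reference semantics for: SELECT x FROM t WHERE x > threshold.

    Under SQL three-valued logic, NULL > threshold evaluates to UNKNOWN,
    so rows holding NULL are filtered out.
    """
    return sorted(x for x in rows if x is not None and x > threshold)


def run_on_sqlite(rows, threshold):
    """Run the same query on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t (x) VALUES (?)", [(x,) for x in rows])
        cur = conn.execute("SELECT x FROM t WHERE x > ?", (threshold,))
        return sorted(r[0] for r in cur.fetchall())
    finally:
        conn.close()


def differential_test(rows, threshold):
    """Compare the reference result with the DBMS result.

    Returns (agree, expected, actual); a disagreement is a candidate bug
    or specification inconsistency that needs manual inspection.
    """
    expected = reference_select_where_gt(rows, threshold)
    actual = run_on_sqlite(rows, threshold)
    return expected == actual, expected, actual
```

Calling `differential_test([1, None, 5, 3], 2)` compares both results for one small table; a real conformance tester would generate many queries and cover far more of the SQL semantics than this single predicate.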
Submitted 13 June, 2024;
originally announced June 2024.
-
Self-Sustainable Active Reconfigurable Intelligent Surfaces for Anti-Jamming in Wireless Communications
Authors:
Yang Cao,
Wenchi Cheng,
Jingqing Wang,
Wei Zhang
Abstract:
Wireless devices can be easily attacked by jammers during transmission, which is a potential security threat for wireless communications. The active reconfigurable intelligent surface (RIS) has attracted considerable attention and is expected to be employed in anti-jamming systems for secure transmission to significantly enhance anti-jamming performance. However, active RIS introduces an external power load, which increases hardware complexity and restricts the flexible deployment of active RIS. To overcome these drawbacks, we design an innovative self-sustainable structure in this paper, where the active RIS is energized by harvesting energy from base station (BS) signals through the time dividing based simultaneous wireless information and power transfer (TD-SWIPT) scheme. Based on this structure, we develop the BS harvesting scheme based on joint transmit and reflecting beamforming with the aim of maximizing the achievable rate of the active RIS-assisted system, where the alternating optimization (AO) algorithm based on stochastic successive convex approximation (SSCA) tackles the nonconvex optimization problem in the scheme. Simulation results verify the effectiveness of our BS harvesting scheme, which attains higher anti-jamming performance than other schemes under the same maximum transmit power.
Submitted 11 June, 2024;
originally announced June 2024.
-
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
Authors:
Junke Wang,
Yi Jiang,
Zehuan Yuan,
Binyue Peng,
Zuxuan Wu,
Yu-Gang Jiang
Abstract:
The tokenizer, serving as a translator that maps intricate visual data into a compact latent space, lies at the core of visual generative models. Based on the finding that existing tokenizers are tailored to either image or video inputs, this paper presents OmniTokenizer, a transformer-based tokenizer for joint image and video tokenization. OmniTokenizer is designed with a spatial-temporal decoupled architecture, which integrates window and causal attention for spatial and temporal modeling. To exploit the complementary nature of image and video data, we further propose a progressive training strategy, where OmniTokenizer is first trained on image data at a fixed resolution to develop the spatial encoding capacity and then jointly trained on image and video data at multiple resolutions to learn the temporal dynamics. OmniTokenizer, for the first time, handles both image and video inputs within a unified framework and proves the possibility of realizing their synergy. Extensive experiments demonstrate that OmniTokenizer achieves state-of-the-art (SOTA) reconstruction performance on various image and video datasets, e.g., 1.11 reconstruction FID on ImageNet and 42 reconstruction FVD on UCF-101, beating the previous SOTA methods by 13% and 26%, respectively. Additionally, we show that when integrated with OmniTokenizer, both language model-based approaches and diffusion models can realize advanced visual synthesis performance, underscoring the superiority and versatility of our method. Code is available at https://github.com/FoundationVision/OmniTokenizer.
Submitted 13 June, 2024;
originally announced June 2024.
-
Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior
Authors:
Baiang Li,
Sizhuo Ma,
Yanhong Zeng,
Xiaogang Xu,
Youqing Fang,
Zhao Zhang,
Jian Wang,
Kai Chen
Abstract:
Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness. However, these approaches fail to effectively restore content in dynamic range extremes, which are regions with pixel values close to 0 or 255. To address the full scope of challenges in HDR imaging and surpass the limitations of current models, we propose a novel two-stage approach. The first stage maps the color and brightness to an appropriate range while keeping the existing details, and the second stage utilizes a diffusion prior to generate content in dynamic range extremes lost during capture. This generative refinement module can also be used as a plug-and-play module to enhance and complement existing LDR enhancement models. The proposed method markedly improves the quality and details of LDR images, demonstrating superior performance through rigorous experimental validation. The project page is at https://sagiri0208.github.io
Submitted 13 June, 2024;
originally announced June 2024.
-
Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases
Authors:
Meng Wang,
Tian Lin,
Aidi Lin,
Kai Yu,
Yuanyuan Peng,
Lianyu Wang,
Cheng Chen,
Ke Zou,
Huiyu Liang,
Man Chen,
Xue Yao,
Meiqin Zhang,
Binwei Huang,
Chaoxin Zheng,
Peixin Zhang,
Wei Chen,
Yilong Luo,
Yifan Chen,
Honghe Xia,
Tingkun Shi,
Qi Zhang,
Jinming Guo,
Xiaolin Chen,
Jingcheng Wang,
Yih Chung Tham
, et al. (24 additional authors not shown)
Abstract:
Previous foundation models for retinal images were pre-trained with limited disease categories and a limited knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.
Submitted 30 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Assessing Model Generalization in Vicinity
Authors:
Yuchi Liu,
Yifan Sun,
Jingdong Wang,
Liang Zheng
Abstract:
This paper evaluates the generalization ability of classification models on out-of-distribution test sets without depending on ground truth labels. Common approaches often calculate an unsupervised metric related to a specific model property, like confidence or invariance, which correlates with out-of-distribution accuracy. However, these metrics are typically computed for each test sample individually, leading to potential issues caused by spurious model responses, such as overly high or low confidence. To tackle this challenge, we propose incorporating responses from neighboring test samples into the correctness assessment of each individual sample. In essence, if a model consistently demonstrates high correctness scores for nearby samples, it increases the likelihood of correctly predicting the target sample, and vice versa. The resulting scores are then averaged across all test samples to provide a holistic indication of model accuracy. Developed under the vicinal risk formulation, this approach, named vicinal risk proxy (VRP), computes accuracy without relying on labels. We show that applying the VRP method to existing generalization indicators, such as average confidence and effective invariance, consistently improves over these baselines both methodologically and experimentally. This yields a stronger correlation with model accuracy, especially on challenging out-of-distribution test sets.
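The neighbor-averaging idea described in this abstract can be sketched roughly as follows. This is a simplified illustration and not the paper's exact VRP formulation: it assumes each test sample has a feature vector and a per-sample proxy score (for example, softmax confidence), and it replaces each sample's score with a similarity-weighted average over itself and its `k` nearest neighbors before averaging across the test set. The names `cosine`, `vicinal_scores`, and `dataset_estimate`, and the choice of cosine similarity, are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def vicinal_scores(features, scores, k=2):
    """Smooth each sample's proxy score using its k most similar neighbors.

    The sample itself gets weight 1.0; each neighbor is weighted by its
    cosine similarity, clamped at 0 so dissimilar samples cannot
    contribute negatively.
    """
    n = len(features)
    result = []
    for i in range(n):
        sims = [
            (max(cosine(features[i], features[j]), 0.0), j)
            for j in range(n)
            if j != i
        ]
        sims.sort(reverse=True)
        weight_sum = 1.0
        weighted = scores[i]
        for w, j in sims[:k]:
            weight_sum += w
            weighted += w * scores[j]
        result.append(weighted / weight_sum)
    return result


def dataset_estimate(features, scores, k=2):
    """Average the vicinal scores over all test samples."""
    v = vicinal_scores(features, scores, k)
    return sum(v) / len(v)
```

In this sketch, a sample with a spuriously high confidence that sits next to a low-confidence neighbor is pulled toward the neighbor's score, which is the kind of correction the abstract motivates.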
Submitted 13 June, 2024;
originally announced June 2024.
-
Dense Outflowing Molecular Gas in Massive Star-forming Regions
Authors:
Yani Xu,
Junzhi Wang,
Shu Liu,
Juan Li,
Yuqiang Li,
Rui Luo,
Chao Ou,
Siqi Zheng,
Yijia Liu
Abstract:
Dense outflowing gas, traced by transitions of molecules with large dipole moments, is important for understanding mass loss and feedback in massive star formation. HCN 3-2 and HCO$^+$ 3-2 are good tracers of dense outflowing molecular gas, which is closely related to active star formation. In this study, we present on-the-fly (OTF) mapping observations of HCN 3-2 and HCO$^+$ 3-2 toward a sample of 33 massive star-forming regions using the 10-m Submillimeter Telescope (SMT). With the spatial distribution of line wings of HCO$^+$ 3-2 and HCN 3-2, outflows are detected in 25 sources, resulting in a detection rate of 76$\%$. The optically thin H$^{13}$CN and H$^{13}$CO$^+$ 3-2 lines are used to identify line wings as outflows and estimate core mass. The outflow mass $M_{out}$, momentum $P_{out}$, kinetic energy $E_{K}$, force $F_{out}$ and mass loss rate $\dot M_{out}$, together with the core mass, are obtained for each source. A tight sublinear correlation is found between the mass of dense molecular outflow and core mass, with an index of $\sim$ 0.8 and a correlation coefficient of 0.88.
Submitted 13 June, 2024;
originally announced June 2024.
-
Spatially resolved analysis of Stellar Populations in NGC 2992: Impact of AGN feedback
Authors:
Xiaoyu Xu,
Junfeng Wang,
Zhiyuan Li,
Yanmei Chen
Abstract:
In NGC 2992, a galaxy-scale ionized gas outflow driven by AGN has long been recognized, yet its impact on the host galaxy has remained elusive. In this paper, we utilize data from the archival Very Large Telescope (VLT)/MUSE to present a spatially resolved analysis of stellar populations in this galaxy. Two different stellar population templates are employed to fit the stellar continuum, allowing us to determine the light-weighted stellar age, metallicity, the fraction of the young stellar population (age $<100$ Myr, $P_{\rm Y}$), and the average age and metallicity of $P_{\rm Y}$. Our results reveal the presence of a very young stellar population ($\leq40$ Myr) within the dust lane and nearly along the galaxy's major axis. The light-weighted stellar age and the fraction of $P_{\rm Y}$ show negative trends along the major and minor axes. The average age and metallicity of $P_{\rm Y}$ present positive trends with increasing distance, except along the northern direction of the major axis. Within the circumnuclear region ($<1$ kpc), the distribution of the young stellar population is spatially anti-correlated with the AGN outflow cone. The highest fraction of $P_{\rm Y}$ is observed at the outskirts of the nuclear radio bubble in the northern region near the nucleus. Considering the coupling efficiency and timescales, we propose that the AGN outflow in this galaxy may exert both negative and positive feedback on its host. Additionally, the star formation and the AGN activities could be attributed to the interaction between NGC 2992 and NGC 2993.
Submitted 13 June, 2024;
originally announced June 2024.
-
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
Authors:
Jiangshan Wang,
Yue Ma,
Jiayi Guo,
Yicheng Xiao,
Gao Huang,
Xiu Li
Abstract:
Video editing is an emerging task, in which most current methods adopt the pre-trained text-to-image (T2I) diffusion model to edit the source video in a zero-shot manner. Despite extensive efforts, maintaining the temporal consistency of edited videos remains challenging due to the lack of temporal constraints in the regular T2I diffusion model. To address this issue, we propose COrrespondence-guided Video Editing (COVE), leveraging the inherent diffusion feature correspondence to achieve high-quality and consistent video editing. Specifically, we propose an efficient sliding-window-based strategy to calculate the similarity among tokens in the diffusion features of source videos, identifying the tokens with high correspondence across frames. During the inversion and denoising process, we sample the tokens in the noisy latent based on the correspondence and then perform self-attention within them. To reduce GPU memory usage and accelerate the editing process, we further introduce the temporal-dimensional token merging strategy, which can effectively reduce redundancy. COVE can be seamlessly integrated into the pre-trained T2I diffusion model without the need for extra training or optimization. Extensive experimental results demonstrate that COVE achieves state-of-the-art performance in various video editing scenarios, outperforming existing methods both quantitatively and qualitatively. The code will be released at https://github.com/wangjiangshan0725/COVE
Submitted 13 June, 2024;
originally announced June 2024.
-
Electronic processes in collisions between nitrogen impurity ions and hydrogen atoms
Authors:
C. C. Jia,
Y. Y. Qi,
J. J. Niu,
Y. Wu,
J. G. Wang,
A. Dubois,
N. Sisourat,
J. W. Gao
Abstract:
In order to interpret and predict the behavior and properties of fusion plasma, accurate cross sections for electronic processes in collisions between plasma impurities and atomic hydrogen are required. In this work, we investigate the electron capture, target excitation, and ionization processes occurring in collisions of ${\rm N}^{4+}$ with atomic hydrogen in a broad energy domain ranging from 0.06 to 225 keV/u. We consider the ${\rm N}^{4+}$ ground state ${\rm N}^{4+} (2s)$ and also ${\rm N}^{4+} (2p)$, since the impurities in the edge plasma environment may be excited due to collisions with electrons and ions/atoms. Total and partial cross sections in both spin-averaged and spin-resolved cases are calculated using a two-active-electron semiclassical asymptotic-state close-coupling approach. For electron capture cross sections, the present results show the best overall agreement with available experimental data for both total and partial cross sections, and the origins of observed discrepancies are discussed. Furthermore, we provide new data for target excitation and ionization processes, which are essential to improve our understanding of this relevant collision system. The International Atomic Energy Agency (IAEA) has recently published a report highlighting the importance and the scarcity of such data. Our work will therefore allow better modeling, and thus a better understanding, of magnetically confined fusion plasma.
Submitted 1 July, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Estimating Difficulty Levels of Programming Problems with Pre-trained Model
Authors:
Zhiyuan Wang,
Wei Zhang,
Jun Wang
Abstract:
As the demand for programming skills grows across industries and academia, students often turn to Programming Online Judge (POJ) platforms for coding practice and competition. The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning. However, current methods of determining difficulty levels either require extensive expert annotations or take a long time to accumulate enough student solutions for each problem. To address this issue, we formulate the problem of automatically estimating the difficulty level of each programming problem, given its textual description and an example code solution. To tackle this problem, we propose to couple two pre-trained models, one for the text modality and the other for the code modality, into a unified model. We built two POJ datasets for the task, and the results demonstrate the effectiveness of the proposed approach and the contributions of both modalities.
Submitted 13 June, 2024;
originally announced June 2024.
-
Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting
Authors:
Zhengqi Zhao,
Xiaohu Huang,
Hao Zhou,
Kun Yao,
Errui Ding,
Jingdong Wang,
Xinggang Wang,
Wenyu Liu,
Bin Feng
Abstract:
The key to action counting is accurately locating each video's repetitive actions. Instead of estimating the probability of each frame belonging to an action directly, we propose a dual-branch network, i.e., SkimFocusNet, working in a two-step manner. The model draws inspiration from empirical observations indicating that humans typically engage in coarse skimming of entire sequences to grasp the general action pattern initially, followed by a finer, frame-by-frame focus to determine if it aligns with the target action. Specifically, SkimFocusNet incorporates a skim branch and a focus branch. The skim branch scans the global contextual information throughout the sequence to identify the potential target action for guidance. Subsequently, the focus branch utilizes the guidance to diligently identify repetitive actions using a long-short adaptive guidance (LSAG) block. Additionally, we have observed that videos in existing datasets often feature only one type of repetitive action, which inadequately represents real-world scenarios. To more accurately describe real-life situations, we establish the Multi-RepCount dataset, which includes videos containing multiple repetitive motions. On Multi-RepCount, our SkimFocusNet can perform specified action counting, that is, counting a particular action type by referencing an exemplary video. This capability demonstrates the robustness of our method. Extensive experiments demonstrate that SkimFocusNet achieves state-of-the-art performance with significant improvements. We also conduct a thorough ablation study to evaluate the network components. The source code will be published upon acceptance.
Submitted 13 June, 2024;
originally announced June 2024.
-
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Authors:
Mingwang Xu,
Hui Li,
Qingkun Su,
Hanlin Shang,
Liwei Zhang,
Ce Liu,
Jingdong Wang,
Yao Yao,
Siyu Zhu
Abstract:
The field of portrait image animation, driven by speech audio input, has experienced significant advancements in the generation of realistic and dynamic portraits. This research delves into the complexities of synchronizing facial movements and creating visually appealing, temporally consistent animations within the framework of diffusion-based methodologies. Moving away from traditional paradigms that rely on parametric models for intermediate facial representations, our innovative approach embraces the end-to-end diffusion paradigm and introduces a hierarchical audio-driven visual synthesis module to enhance the precision of alignment between audio inputs and visual outputs, encompassing lip, expression, and pose motion. Our proposed network architecture seamlessly integrates diffusion-based generative models, a UNet-based denoiser, temporal alignment techniques, and a reference network. The proposed hierarchical audio-driven visual synthesis offers adaptive control over expression and pose diversity, enabling more effective personalization tailored to different identities. Through a comprehensive evaluation that incorporates both qualitative and quantitative analyses, our approach demonstrates obvious enhancements in image and video quality, lip synchronization precision, and motion diversity. Further visualization and access to the source code can be found at: https://fudan-generative-vision.github.io/hallo.
Submitted 16 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection
Authors:
Wenjie Wang,
Yehao Lu,
Guangcong Zheng,
Shuigen Zhan,
Xiaoqing Ye,
Zichang Tan,
Jingdong Wang,
Gaoang Wang,
Xi Li
Abstract:
Vision-based roadside 3D object detection has attracted rising attention in the autonomous driving domain, since it has inherent advantages in reducing blind spots and expanding perception range. Previous work mainly focuses on accurately estimating depth or height for 2D-to-3D mapping, while ignoring the position approximation error in the voxel pooling process. Inspired by this insight, we propose a novel voxel pooling strategy to reduce such error, dubbed BEVSpread. Specifically, instead of bringing the image features contained in a frustum point to a single BEV grid, BEVSpread considers each frustum point as a source and spreads the image features to the surrounding BEV grids with adaptive weights. To achieve superior propagation performance, a specific weight function is designed to dynamically control the decay speed of the weights according to distance and depth. Aided by customized CUDA parallel acceleration, BEVSpread achieves inference time comparable to that of the original voxel pooling. Extensive experiments on two large-scale roadside benchmarks demonstrate that, as a plug-in, BEVSpread can improve the performance of existing frustum-based BEV methods by a large margin of (1.12, 5.26, 3.01) AP in vehicle, pedestrian and cyclist.
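The contrast between assigning a frustum point to a single BEV cell and spreading it over neighboring cells can be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the paper's weight function, so a Gaussian in distance whose width grows linearly with depth is used as a stand-in, the weights are normalized to sum to one, scalar features replace feature vectors, and the parameters `sigma_base` and `depth_scale` are hypothetical.

```python
import math
from collections import defaultdict


def spread_pool(points, sigma_base=0.5, depth_scale=0.01):
    """Spread each point's feature over its four surrounding BEV cells.

    points: iterable of (x, y, depth, feature), where x and y are
    continuous BEV coordinates in grid units and feature is a scalar.
    Returns a dict mapping integer cell coordinates to pooled features.
    """
    grid = defaultdict(float)
    for x, y, depth, feat in points:
        # Assumed weight function: the Gaussian width grows with depth,
        # so distant points spread more evenly over their neighbors.
        sigma = sigma_base + depth_scale * depth
        x0, y0 = math.floor(x), math.floor(y)
        cells = [(x0 + dx, y0 + dy) for dx in (0, 1) for dy in (0, 1)]
        weights = [
            math.exp(-((cx - x) ** 2 + (cy - y) ** 2) / (2 * sigma ** 2))
            for cx, cy in cells
        ]
        total = sum(weights)
        for cell, w in zip(cells, weights):
            grid[cell] += feat * w / total
    return dict(grid)
```

A point exactly between four cell centers contributes equally to all four, while a point sitting on a cell center contributes mostly to that cell, which avoids the rounding error of snapping every point to one cell.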
Submitted 12 June, 2024;
originally announced June 2024.
-
Electric field controlled valley-polarized photocurrent switch based on the circular bulk photovoltaic effect
Authors:
Yaqing Yang,
Xiaoyu Cheng,
Liantuan Xiao,
Suotang Jia,
Jun Chen,
Lei Zhang,
Jian Wang
Abstract:
Efficient electric manipulation of valley degrees of freedom is critical and challenging for the advancement of valley-based information science and technology. We put forth an electrical scheme, based on a two-band Dirac model, that can switch the fully valley-polarized photocurrent between K and K' valleys using the circular bulk electro-photovoltaic effect. This is accomplished by applying an out-of-plane electric field to the two-dimensional valley materials, which enables continuous tuning of the Berry curvature and its sign flip. We found that the switch of the fully valley-polarized photocurrent is directly tied to the sign change of Berry curvature, which accompanies a topological phase transition, for instance, the quantum spin Hall effect and the quantum valley Hall effect. This scheme has been confirmed in monolayer BiAsI2 and germanene through first-principles calculations. Our paper offers a promising strategy for the development of a volatile valley-addressable memory device and could inspire further research in this area.
Submitted 13 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen, et al. (255 additional authors not shown)
Abstract:
In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma-ray emission from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, as they have low fluxes of astrophysical $γ$-ray background and large amounts of dark matter. Analyzing more than 700 days of observational data from LHAASO, we detect no significant dark matter signal from 1 TeV to 1 EeV. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.
Submitted 12 June, 2024;
originally announced June 2024.
-
Exploring Fuzzing as Data Augmentation for Neural Test Generation
Authors:
Yifeng He,
Jicheng Wang,
Yuyang Rong,
Hao Chen
Abstract:
Testing is an essential part of modern software engineering to build reliable programs. As testing the software is important but expensive, automatic test case generation methods have become popular in software development. Unlike traditional search-based coverage-guided test generation like fuzzing, neural test generation backed by large language models can write tests that are semantically meaningful and can be understood by other maintainers. However, compared to regular code corpus, unit tests in the datasets are limited in amount and diversity. In this paper, we present a novel data augmentation technique **FuzzAug**, that combines the advantages of fuzzing and large language models. FuzzAug not only keeps valid program semantics in the augmented data, but also provides more diverse inputs to the function under test, helping the model to associate correct inputs embedded with the function's dynamic behaviors with the function under test. We evaluate FuzzAug's benefits by using it on a neural test generation dataset to train state-of-the-art code generation models. By augmenting the training set, our model generates test cases with $11\%$ accuracy increases. Models trained with FuzzAug generate unit test functions with double the branch coverage compared to those without it. FuzzAug can be used across various datasets to train advanced code generation models, enhancing their utility in automated software testing. Our work shows the benefits of using dynamic analysis results to enhance neural test generation. Code and data will be publicly available.
Submitted 12 June, 2024;
originally announced June 2024.
-
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Authors:
Xuehai He,
Weixi Feng,
Kaizhi Zheng,
Yujie Lu,
Wanrong Zhu,
Jiachen Li,
Yue Fan,
Jianfeng Wang,
Linjie Li,
Zhengyuan Yang,
Kevin Lin,
William Yang Wang,
Lijuan Wang,
Xin Eric Wang
Abstract:
Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multimodal video understanding. MMWorld distinguishes itself from previous video understanding benchmarks with two unique advantages: (1) multi-discipline, covering various disciplines that often require domain expertise for comprehensive understanding; (2) multi-faceted reasoning, including explanation, counterfactual thinking, future prediction, etc. MMWorld consists of a human-annotated dataset to evaluate MLLMs with questions about the whole videos and a synthetic dataset to analyze MLLMs within a single modality of perception. Together, MMWorld encompasses 1,910 videos across seven broad disciplines and 69 subdisciplines, complete with 6,627 question-answer pairs and associated captions. The evaluation includes 2 proprietary and 10 open-source MLLMs, which struggle on MMWorld (e.g., GPT-4V performs the best with only 52.3\% accuracy), showing large room for improvement. Further ablation studies reveal other interesting findings such as models' different skill sets from humans. We hope MMWorld can serve as an essential step towards world model evaluation in videos.
Submitted 13 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
FSH: 3D Representation via Fibonacci Spherical Harmonics
Authors:
Zikuan Li,
Anyi Huang,
Wenru Jia,
Qiaoyun Wu,
Mingqiang Wei,
Jun Wang
Abstract:
Spherical harmonics are a favorable technique for 3D representation, employing a frequency-based approach through the spherical harmonic transform (SHT). Typically, SHT is performed using equiangular sampling grids. However, these grids are non-uniform on spherical surfaces and exhibit local anisotropy, a common limitation in existing spherical harmonic decomposition methods. This paper proposes a 3D representation method using Fibonacci Spherical Harmonics (FSH). We introduce a spherical Fibonacci grid (SFG), which is more uniform than equiangular grids for SHT in the frequency domain. Our method employs analytical weights for SHT on SFG, effectively assigning sampling errors to spherical harmonic degrees higher than the recovered band-limited function. This provides a novel solution for spherical harmonic transformation on non-equiangular grids. The key advantages of our FSH method include: 1) With the same number of sampling points, SFG captures more features without bias compared to equiangular grids; 2) The root mean square error of 32-degree spherical harmonic coefficients is reduced by approximately 34.6\% for SFG compared to equiangular grids; and 3) FSH offers more stable frequency domain representations, especially for rotating functions. FSH enhances the stability of frequency domain representations under rotational transformations. Its application in 3D shape reconstruction and 3D shape classification results in more accurate and robust representations.
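For readers unfamiliar with the spherical Fibonacci grid mentioned in this abstract, a minimal sketch follows. It uses the common golden-angle spiral construction, which may differ in detail from the grid used in the paper, and it does not implement the paper's analytical SHT weights. The function name `fibonacci_sphere` is a hypothetical one chosen for illustration.

```python
import math


def fibonacci_sphere(n):
    """Return n points (x, y, z) on the unit sphere.

    Assumption: the standard golden-angle spiral, not necessarily
    the exact grid definition used by the FSH authors.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        # z values sit at the midpoints of n equal-height bands,
        # which gives each point roughly the same surface area.
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        # Successive points are rotated by the golden angle in longitude.
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


if __name__ == "__main__":
    for point in fibonacci_sphere(5):
        print(point)
```

Unlike an equiangular latitude-longitude grid, this layout does not crowd samples near the poles, which is the uniformity property the abstract refers to.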
Submitted 12 June, 2024;
originally announced June 2024.
-
HiFAST : An HI Data Calibration and Imaging Pipeline for FAST II. Flux Density Calibration
Authors:
Ziming Liu,
Jie Wang,
Yingjie Jing,
Zhi-Yu Zhang,
Chen Xu,
Tiantian Liang,
Qingze Chen,
Ningyu Tang,
Qingliang Yang
Abstract:
Accurate flux density calibration is essential for precise analysis and interpretation of observations across different observation modes and instruments. In this research, we first introduce the flux calibration model incorporated in the HIFAST pipeline, designed for processing HI 21-cm spectra. Furthermore, we investigate different calibration techniques and assess the dependence of the gain parameter on time and environmental factors. A comparison is carried out in various observation modes (e.g. tracking and scanning modes) to determine the flux density gain ($G$), revealing insignificant discrepancies in $G$ among different methods. Long-term monitoring data shows a linear correlation between $G$ and atmospheric temperature. After subtracting the $G$--Temperature dependence, the dispersion of $G$ is reduced to $<$3% over a one-year time scale. The stability of the receiver response of FAST is considered sufficient to facilitate HI observations that can accommodate a moderate error in flux calibration (e.g., $>\sim5\%$) when utilizing a constant $G$ for calibration purposes. Our study will serve as a useful addition to the results provided by Jiang et al. (2020). Detailed measurement of $G$ for the 19 beams of FAST, covering the frequency range 1000 MHz -- 1500 MHz can be found on the HIFAST homepage: https://hifast.readthedocs.io/fluxgain.
Submitted 12 June, 2024;
originally announced June 2024.
-
Light-induced fictitious magnetic fields for quantum storage in cold atomic ensembles
Authors:
Jianmin Wang,
Liang Dong,
Xingchang Wang,
Zihan Zhou,
Ying Zuo,
Georgios A. Siviloglou,
J. F. Chen
Abstract:
In this work, we have demonstrated that optically generated fictitious magnetic fields can be utilized to extend the lifetime of quantum memories in cold atomic ensembles. All the degrees of freedom of an AC Stark shift such as polarization, spatial profile, and temporal waveform can be readily controlled in a precise manner. Temporal fluctuations over several experimental cycles, and spatial inhomogeneities along a cold atomic gas have been compensated by an optical beam. The advantage of the use of fictitious magnetic fields for quantum storage stems from the speed and spatial precision with which these fields can be synthesized. Our simple and versatile technique can find widespread application in coherent pulse and single-photon storage in any atomic species.
Submitted 12 June, 2024;
originally announced June 2024.
-
Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere, et al. (636 additional authors not shown)
Abstract:
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.
Submitted 12 June, 2024;
originally announced June 2024.
-
OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding
Authors:
Yinan Deng,
Jiahui Wang,
Jingyu Zhao,
Jianyu Dou,
Yi Yang,
Yufeng Yue
Abstract:
In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval. However, existing methods face some limitations: they either focus on learning point-wise features, resulting in blurry semantic understanding, or solely tackle object-level reconstruction, thereby overlooking the intricate details of the object's interior. To address these challenges, we introduce OpenObj, an innovative approach to build open-vocabulary object-level Neural Radiance Fields (NeRF) with fine-grained understanding. In essence, OpenObj establishes a robust framework for efficient and watertight scene modeling and comprehension at the object-level. Moreover, we incorporate part-level features into the neural fields, enabling a nuanced representation of object interiors. This approach captures object-level instances while maintaining a fine-grained understanding. The results on multiple datasets demonstrate that OpenObj achieves superior performance in zero-shot semantic segmentation and retrieval tasks. Additionally, OpenObj supports real-world robotics tasks at multiple scales, including global movement and local manipulation.
Submitted 12 June, 2024;
originally announced June 2024.
-
Semantic-Aware Resource Allocation Based on Deep Reinforcement Learning for 5G-V2X HetNets
Authors:
Zhiyu Shao,
Qiong Wu,
Pingyi Fan,
Nan Cheng,
Qiang Fan,
Jiangzhou Wang
Abstract:
This letter proposes a semantic-aware resource allocation (SARA) framework with flexible duty cycle (DC) coexistence mechanism (SARADC) for 5G-V2X Heterogeneous Network (HetNets) based on deep reinforcement learning (DRL) proximal policy optimization (PPO). Specifically, we investigate V2X networks within a two-tiered HetNets structure. In response to the needs of high-speed vehicular networking in urban environments, we design a semantic communication system and introduce two resource allocation metrics: high-speed semantic transmission rate (HSR) and semantic spectrum efficiency (HSSE). Our main goal is to maximize HSSE. Additionally, we address the coexistence of vehicular users and WiFi users in 5G New Radio Unlicensed (NR-U) networks. To tackle this complex challenge, we propose a novel approach that jointly optimizes flexible DC coexistence mechanism and the allocation of resources and base stations (BSs). Unlike traditional bit transmission methods, our approach integrates the semantic communication paradigm into the communication system. Experimental results demonstrate that our proposed solution outperforms traditional bit transmission methods with traditional DC coexistence mechanism in terms of HSSE and semantic throughput (ST) for both vehicular and WiFi users.
Submitted 12 June, 2024;
originally announced June 2024.
-
Proposal for realizing and probing topological crystalline insulators in optical lattices
Authors:
Jing-Xin Liu,
Jian-Te Wang,
Shi-Liang Zhu
Abstract:
We develop a lattice model which exhibits topological transitions from $Z_2$ topological insulators to mirror symmetry-protected topological crystalline insulators by introducing additional spin-orbit coupling terms. The topological phase is characterized by the mirror winding number, defined within the mirror symmetry invariant subspace, which ensures the protection of gapless edge states and zero-energy corner states under specific boundary conditions. Additionally, we propose a feasible scheme using ultracold atoms confined in a stacked hexagonal optical lattice with Raman fields to realize the two-dimensional topological crystalline insulators. Detection of the mirror winding number in these systems can be achieved by implementing a simple quench sequence and observing the evolution of the time-of-flight patterns.
Submitted 12 June, 2024;
originally announced June 2024.
-
Dynamic Energy-Saving Design for Double-Faced Active RIS Assisted Communications with Perfect/Imperfect CSI
Authors:
Yang Cao,
Wenchi Cheng,
Jingqing Wang,
Wei Zhang
Abstract:
Although the emerging reconfigurable intelligent surface (RIS) paves a new way for next-generation wireless communications, it suffers from inherent flaws, i.e., double-fading attenuation effects and half-space coverage limitations. The state-of-the-art double-face active (DFA)-RIS architecture is proposed for significantly amplifying and transmitting incident signals in full-space. Despite the efficacy of DFA-RIS in mitigating the aforementioned flaws, its potential drawback is that the complex active hardware also incurs intolerable energy consumption. To overcome this drawback, in this paper we propose a novel dynamic energy-saving design for the DFA-RIS, called the sub-array based DFA-RIS architecture. This architecture divides the DFA-RIS into multiple sub-arrays, where the signal amplification function in each sub-array can be activated/deactivated dynamically and flexibly. Utilizing the above architecture, we develop the joint optimization scheme based on transmit beamforming, DFA-RIS configuration, and reflection amplifier (RA) operating pattern to maximize the energy efficiency (EE) of the DFA-RIS assisted multiuser MISO system considering the perfect/imperfect channel state information (CSI) case. Then, the penalty dual decomposition (PDD) based alternating optimization (AO) algorithm and the constrained stochastic majorization-minimization (CSMM) based AO algorithm address non-convex problems in the perfect/imperfect CSI case, respectively. Simulation results verified that our proposed sub-array based DFA-RIS architecture can benefit the EE of the system more than other RIS architectures.
Submitted 11 June, 2024;
originally announced June 2024.
-
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Authors:
Chenyu Yang,
Xizhou Zhu,
Jinguo Zhu,
Weijie Su,
Junjie Wang,
Xuan Dong,
Wenhai Wang,
Lewei Lu,
Bin Li,
Jie Zhou,
Yu Qiao,
Jifeng Dai
Abstract:
Recently, vision model pre-training has evolved from relying on manually annotated datasets to leveraging large-scale, web-crawled image-text data. Despite these advances, there is no pre-training method that effectively exploits the interleaved image-text data, which is very prevalent on the Internet. Inspired by the recent success of compression learning in natural language processing, we propose a novel vision model pre-training method called Latent Compression Learning (LCL) for interleaved image-text data. This method performs latent compression learning by maximizing the mutual information between the inputs and outputs of a causal attention model. The training objective can be decomposed into two basic tasks: 1) contrastive learning between visual representation and preceding context, and 2) generating subsequent text based on visual representation. Our experiments demonstrate that our method not only matches the performance of CLIP on paired pre-training datasets (e.g., LAION), but can also leverage interleaved pre-training data (e.g., MMC4) to learn robust visual representation from scratch, showcasing the potential of vision model pre-training with interleaved image-text data. Code is released at https://github.com/OpenGVLab/LCL.
Submitted 11 June, 2024;
originally announced June 2024.
-
Few-Body Quantum Chaos, Localization, and Multi-Photon Entanglement in Optical Synthetic Frequency Dimension
Authors:
Junlin Wang,
Luojia Wang,
Jinlou Ma,
Ang Yang,
Luqi Yuan,
Lei Ying
Abstract:
Generation and control of entanglement are fundamental tasks in quantum information processing. In this paper, we propose a novel approach to generate controllable frequency-entangled photons by using the concept of synthetic frequency dimension in an optical system. Such a system consists of a ring resonator made of a tailored third-order nonlinear medium to induce photon-photon interactions and a periodic modulator to manipulate coupling between different frequency modes. We show this system provides a unique platform for the exploration of distinct few- or many-body quantum phases including chaos, localization, and integrability in a highly integrable photonics platform. In particular, we develop a potential experimental method to calculate the spectral form factor, which characterizes the degree of chaos in the system and differentiates between these phases based on observable measurements. Interestingly, the transition signatures of each phase can lead to an efficient generation of frequency-entangled multi-photon states. This work is the first to explore rich and controllable quantum phases beyond single particle in a synthetic dimension.
Submitted 11 June, 2024;
originally announced June 2024.
-
Rethinking the impact of noisy labels in graph classification: A utility and privacy perspective
Authors:
De Li,
Xianxian Li,
Zeming Gan,
Qiyu Li,
Bin Qu,
Jinyan Wang
Abstract:
Graph neural networks based on message-passing mechanisms have achieved advanced results in graph classification tasks. However, their generalization performance degrades when noisy labels are present in the training data. Most existing noisy labeling approaches focus on the visual domain or graph node classification tasks and analyze the impact of noisy labels only from a utility perspective. Unlike existing work, in this paper, we measure the effects of noise labels on graph classification from data privacy and model utility perspectives. We find that noise labels degrade the model's generalization performance and enhance the ability of membership inference attacks on graph data privacy. To this end, we propose the robust graph neural network approach with noisy labeled graph classification. Specifically, we first accurately filter the noisy samples by high-confidence samples and the first feature principal component vector of each class. Then, the robust principal component vectors and the model output under data augmentation are utilized to achieve noise label correction guided by dual spatial information. Finally, supervised graph contrastive learning is introduced to enhance the embedding quality of the model and protect the privacy of the training graph data. The utility and privacy of the proposed method are validated by comparing twelve different methods on eight real graph classification datasets. Compared with the state-of-the-art methods, the RGLC method achieves at most and at least 7.8% and 0.8% performance gain at 30% noisy labeling rate, respectively, and reduces the accuracy of privacy attacks to below 60%.
Submitted 11 June, 2024;
originally announced June 2024.
-
Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning
Authors:
Zhiyu Shao,
Qiong Wu,
Pingyi Fan,
Nan Cheng,
Wen Chen,
Jiangzhou Wang,
Khaled B. Letaief
Abstract:
This work aims to investigate semantic communication in high-speed mobile Internet of vehicles (IoV) environments, with a focus on the spectrum sharing between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. We specifically address spectrum scarcity and network traffic and then propose a semantic-aware spectrum sharing algorithm (SSS) based on the deep reinforcement learning (DRL) soft actor-critic (SAC) approach. Firstly, we delve into the extraction of semantic information. Secondly, we redefine metrics for semantic information in V2V and V2I spectrum sharing in IoV environments, introducing high-speed semantic spectrum efficiency (HSSE) and semantic transmission rate (HSR). Finally, we employ the SAC algorithm for decision optimization in V2V and V2I spectrum sharing based on semantic information. This optimization encompasses the optimal link of V2V and V2I sharing strategies, the transmission power for vehicles sending semantic information and the length of transmitted semantic symbols, aiming at maximizing HSSE of V2I and enhancing success rate of effective semantic information transmission (SRS) of V2V. Experimental results demonstrate that the SSS algorithm outperforms other baseline algorithms, including other traditional-communication-based spectrum sharing algorithms and spectrum sharing algorithm using other reinforcement learning approaches. The SSS algorithm exhibits a 15% increase in HSSE and approximately a 7% increase in SRS.
Submitted 17 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Convergence of bi-spatial pullback random attractors and stochastic Liouville type equations for nonautonomous stochastic p-Laplacian lattice system
Authors:
Jintao Wang,
Qinghai Peng,
Chunqiu Li
Abstract:
We consider convergence properties of the long-term behaviors with respect to the coefficient of the stochastic term for a nonautonomous stochastic $p$-Laplacian lattice equation with multiplicative noise. First, the upper semi-continuity of the pullback random $(\ell^2,\ell^q)$-attractor is proved for each $q\in[1,+\infty)$. Then, a convergence result for the time-dependent invariant sample Borel probability measures is obtained in $\ell^2$. Next, we show that the invariant sample measures satisfy a stochastic Liouville type equation, and a termwise convergence of the stochastic Liouville type equations is verified. Furthermore, each family of the invariant sample measures turns out to be a sample statistical solution, which hence also satisfies a convergence result.
Submitted 11 June, 2024;
originally announced June 2024.
-
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents
Authors:
Wenjia Xu,
Zijian Yu,
Yixu Wang,
Jiuniu Wang,
Mugen Peng
Abstract:
An increasing number of models have achieved great performance in remote sensing tasks with the recent development of Large Language Models (LLMs) and Visual Language Models (VLMs). However, these models are constrained to basic vision and language instruction-tuning tasks, facing challenges in complex remote sensing applications. Additionally, these models lack specialized expertise in profession…
▽ More
An increasing number of models have achieved great performance in remote sensing tasks with the recent development of Large Language Models (LLMs) and Visual Language Models (VLMs). However, these models are constrained to basic vision and language instruction-tuning tasks, facing challenges in complex remote sensing applications. Additionally, these models lack specialized expertise in professional domains. To address these limitations, we propose a LLM-driven remote sensing intelligent agent named RS-Agent. Firstly, RS-Agent is powered by a large language model (LLM) that acts as its "Central Controller," enabling it to understand and respond to various problems intelligently. Secondly, our RS-Agent integrates many high-performance remote sensing image processing tools, facilitating multi-tool and multi-turn conversations. Thirdly, our RS-Agent can answer professional questions by leveraging robust knowledge documents. We conducted experiments using several datasets, e.g., RSSDIVCS, RSVQA, and DOTAv1. The experimental results demonstrate that our RS-Agent delivers outstanding performance in many tasks, i.e., scene classification, visual question answering, and object counting tasks.
Submitted 11 June, 2024;
originally announced June 2024.
-
Bayesian inference of nuclear incompressibility from proton elliptic flow in central Au+Au collisions at 400 MeV/nucleon
Authors:
J. M. Wang,
X. G. Deng,
W. J. Xie,
B. A. Li,
Y. G. Ma
Abstract:
The incompressibility $K$ of symmetric nuclear matter (SNM) is inferred in a Bayesian analysis of proton elliptic flow in mid-central Au+Au collisions at $E = 400$ MeV/nucleon using a Gaussian process (GP) emulator of the isospin-dependent quantum molecular dynamics (IQMD) model for heavy-ion collisions, with or without considering the momentum dependence of single-nucleon potentials. Considering the momentum dependence of nucleon potentials, $K=191.3^{+3.7}_{-6.3}$ MeV at the 68\% confidence level is inferred from the combined data on the rapidity and transverse momentum dependence of the proton elliptic flow in the Au+Au collisions considered, indicating a very soft SNM equation of state. This value is consistent with, but has smaller quantified uncertainties than, previous results from forward modeling of the collective flow in heavy-ion collisions using IQMD. Ignoring the momentum dependence of single-nucleon potentials, the extracted value for $K$ is $234.7^{+14.6}_{-11.4}$ MeV, in agreement with its fiducial value derived from giant resonance studies.
Submitted 11 June, 2024;
originally announced June 2024.
-
DualMamba: A Lightweight Spectral-Spatial Mamba-Convolution Network for Hyperspectral Image Classification
Authors:
Jiamu Sheng,
Jingyi Zhou,
Jiong Wang,
Peng Ye,
Jiayuan Fan
Abstract:
The effectiveness and efficiency of modeling complex spectral-spatial relations are both crucial for hyperspectral image (HSI) classification. Most existing methods based on CNNs and transformers still suffer from heavy computational burdens and have room for improvement in capturing the global-local spectral-spatial feature representation. To this end, we propose a novel lightweight parallel design called lightweight dual-stream Mamba-convolution network (DualMamba) for HSI classification. Specifically, parallel lightweight Mamba and CNN blocks are first developed to extract global and local spectral-spatial features. First, the cross-attention spectral-spatial Mamba module is proposed to leverage the global modeling of Mamba at linear complexity. Within this module, dynamic positional embedding is designed to enhance the spatial location information of visual sequences. The lightweight spectral/spatial Mamba blocks comprise an efficient scanning strategy and a lightweight Mamba design to efficiently extract global spectral-spatial features. The cross-attention spectral-spatial fusion is designed to learn cross-correlation and fuse spectral-spatial features. Second, the lightweight spectral-spatial residual convolution module is proposed with lightweight spectral and spatial branches to extract local spectral-spatial features through residual learning. Finally, the adaptive global-local fusion is proposed to dynamically combine global Mamba features and local convolution features for a global-local spectral-spatial representation. Compared with state-of-the-art HSI classification methods, experimental results demonstrate that DualMamba achieves significant classification accuracy on three public HSI datasets and a superior reduction in model parameters and floating point operations (FLOPs).
Submitted 11 June, 2024;
originally announced June 2024.
-
Sensitivity Analysis for the Test-Negative Design
Authors:
Soumyabrata Kundu,
Peng Ding,
Xinran Li,
Jingshu Wang
Abstract:
The test-negative design has become popular for evaluating the effectiveness of post-licensure vaccines using observational data. In addition to its logistical convenience in data collection, the design is also believed to control for the differential health-care-seeking behavior between vaccinated and unvaccinated individuals, which is an important but often unmeasured confounder between vaccination and infection. Hence, the design has been employed routinely to monitor seasonal flu vaccines and more recently to measure COVID-19 vaccine effectiveness. Despite its popularity, the design has been questioned, in particular about its ability to fully control for unmeasured confounding. In this paper, we explore deviations from a perfect test-negative design, and propose various sensitivity analysis methods for estimating the effect of vaccination measured by the causal odds ratio on the subpopulation of individuals with good health-care-seeking behavior. We start with point identification of the causal odds ratio under a test-negative design, considering two forms of assumptions on the unmeasured confounder. These assumptions then lead to two approaches for conducting sensitivity analysis, addressing the influence of unmeasured confounding in different ways. Specifically, one approach investigates partial control for the unmeasured confounder in the test-negative design, while the other examines the impact of the unmeasured confounder on both vaccination and infection. Furthermore, these approaches can be combined to provide narrower bounds on the true causal odds ratio, and can be further extended to sharpen the bounds by restricting the treatment effect heterogeneity. Finally, we apply the proposed methods to evaluate the effectiveness of COVID-19 vaccines using observational data from test-negative designs.
Submitted 11 June, 2024;
originally announced June 2024.
-
Cross-domain-aware Worker Selection with Training for Crowdsourced Annotation
Authors:
Yushi Sun,
Jiachuan Wang,
Peng Cheng,
Libin Zheng,
Lei Chen,
Jian Yin
Abstract:
Annotation through crowdsourcing is drawing increasing attention, and it relies on an effective scheme for selecting workers from a given pool. Existing methods select workers based on their performance on tasks with ground truth, but they miss two important points. 1) The historical performance of workers on other tasks. In real-world scenarios, workers need to solve a new task whose correlation with previous tasks is not well known before training, a setting called cross-domain. 2) The dynamic performance of workers, who learn from the ground truth. In this paper, we consider both factors in designing an allocation scheme named the cross-domain-aware worker selection with training approach. Our approach uses two estimation modules to statistically analyze the cross-domain correlation and to dynamically simulate the learning gain of workers. A framework with a theoretical analysis of the worker elimination process is given. To validate the effectiveness of our methods, we collect two novel real-world datasets and generate synthetic datasets. The experimental results show that our method outperforms the baselines on both real-world and synthetic datasets.
Submitted 11 June, 2024;
originally announced June 2024.
-
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
Authors:
Yuanhao Zhai,
Kevin Lin,
Zhengyuan Yang,
Linjie Li,
Jianfeng Wang,
Chung-Ching Lin,
David Doermann,
Junsong Yuan,
Lijuan Wang
Abstract:
Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation while improving frame appearance using abundant high-quality image data. We propose the motion consistency model (MCM), a single-stage video diffusion distillation method that disentangles motion and appearance learning. Specifically, MCM includes a video consistency model that distills motion from the video teacher model, and an image discriminator that enhances frame appearance to match high-quality image data. This combination presents two challenges: (1) conflicting frame learning objectives, as video distillation learns from low-quality video frames while the image discriminator targets high-quality images; and (2) training-inference discrepancies due to the differing quality of video samples used during training and inference. To address these challenges, we introduce disentangled motion distillation and mixed trajectory distillation. The former applies the distillation objective solely to the motion representation, while the latter mitigates training-inference discrepancies by mixing distillation trajectories from both the low- and high-quality video domains. Extensive experiments show that our MCM achieves state-of-the-art video diffusion distillation performance. Additionally, our method can enhance frame quality in video diffusion models, producing frames with high aesthetic scores or specific styles without corresponding video data.
Submitted 10 June, 2024;
originally announced June 2024.
-
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Authors:
Jikai Wang,
Qifan Zhang,
Yu-Wei Chao,
Bowen Wen,
Xiaohu Guo,
Yu Xiang
Abstract:
We introduce a data capture system and a new dataset named HO-Cap that can be used to study 3D reconstruction and pose tracking of hands and objects in videos. The capture system uses multiple RGB-D cameras and a HoloLens headset for data collection, avoiding the use of expensive 3D scanners or mocap systems. We propose a semi-automatic method to obtain annotations of shape and pose of hands and objects in the collected videos, which significantly reduces the required annotation time compared to manual labeling. With this system, we captured a video dataset of humans using objects to perform different tasks, as well as simple pick-and-place and handover of an object from one hand to the other, which can be used as human demonstrations for embodied AI and robot manipulation research. Our data capture setup and annotation framework can be used by the community to reconstruct 3D shapes of objects and human hands and track their poses in videos.
Submitted 16 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
EAVE: Efficient Product Attribute Value Extraction via Lightweight Sparse-layer Interaction
Authors:
Li Yang,
Qifan Wang,
Jianfeng Chi,
Jiahao Liu,
Jingang Wang,
Fuli Feng,
Zenglin Xu,
Yi Fang,
Lifu Huang,
Dongfang Liu
Abstract:
Product attribute value extraction involves identifying the specific values associated with various attributes from a product profile. While existing methods often prioritize the development of effective models to improve extraction performance, there has been limited emphasis on extraction efficiency. However, in real-world scenarios, products are typically associated with multiple attributes, necessitating multiple extractions to obtain all corresponding values. In this work, we propose an Efficient product Attribute Value Extraction (EAVE) approach via lightweight sparse-layer interaction. Specifically, we employ a heavy encoder to separately encode the product context and attribute. The resulting non-interacting heavy representations of the context can be cached and reused for all attributes. Additionally, we introduce a light encoder to jointly encode the context and the attribute, facilitating lightweight interactions between them. To enrich the interaction within the lightweight encoder, we design a sparse-layer interaction module to fuse the non-interacting heavy representation into the lightweight encoder. Comprehensive evaluation on two benchmarks demonstrates that our method achieves significant efficiency gains with neutral or marginal loss in performance when the context is long and the number of attributes is large. Our code is available at https://anonymous.4open.science/r/EAVE-EA18.
Submitted 10 June, 2024;
originally announced June 2024.
-
Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications
Authors:
Junlin Wang,
Tianyi Yang,
Roy Xie,
Bhuwan Dhingra
Abstract:
With the proliferation of LLM-integrated applications such as GPT-s, millions are deployed, offering valuable services through proprietary instruction prompts. These systems, however, are prone to prompt extraction attacks through meticulously designed queries. To help mitigate this problem, we introduce the Raccoon benchmark, which comprehensively evaluates a model's susceptibility to prompt extraction attacks. Our novel evaluation method assesses models under both defenseless and defended scenarios, employing a dual approach to evaluate the effectiveness of existing defenses and the resilience of the models. The benchmark encompasses 14 categories of prompt extraction attacks, with additional compounded attacks that closely mimic the strategies of potential attackers, alongside a diverse collection of defense templates. This array is, to our knowledge, the most extensive compilation of prompt theft attacks and defense mechanisms to date. Our findings highlight universal susceptibility to prompt theft in the absence of defenses, with OpenAI models demonstrating notable resilience when protected. This paper aims to establish a more systematic benchmark for assessing LLM robustness against prompt extraction attacks, offering insights into their causes and potential countermeasures. Resources of Raccoon are publicly available at https://github.com/M0gician/RaccoonBench.
Submitted 10 June, 2024;
originally announced June 2024.
-
Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges
Authors:
Usman Gohar,
Zeyu Tang,
Jialu Wang,
Kun Zhang,
Peter L. Spirtes,
Yang Liu,
Lu Cheng
Abstract:
The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairness. Additionally, the existence of feedback loops and the interaction between models and the environment introduces additional complexities that may deviate from the initial fairness goals. In this survey, we review existing literature on long-term fairness from different perspectives and present a taxonomy for long-term fairness studies. We highlight key challenges and consider future research directions, analyzing both current issues and potential further explorations.
Submitted 10 June, 2024;
originally announced June 2024.
-
Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies
Authors:
Junlin Wang,
Siddhartha Jain,
Dejiao Zhang,
Baishakhi Ray,
Varun Kumar,
Ben Athiwaratkun
Abstract:
A diverse array of reasoning strategies has been proposed to elicit the capabilities of large language models. However, in this paper, we point out that traditional evaluations which focus solely on performance metrics miss a key factor: the increased effectiveness due to additional compute. By overlooking this aspect, a skewed view of strategy efficiency is often presented. This paper introduces a framework that incorporates the compute budget into the evaluation, providing a more informative comparison that takes into account both performance metrics and computational cost. In this budget-aware perspective, we find that complex reasoning strategies often don't surpass simpler baselines purely due to algorithmic ingenuity, but rather due to the larger computational resources allocated. When we provide a simple baseline like chain-of-thought self-consistency with comparable compute resources, it frequently outperforms reasoning strategies proposed in the literature. In this scale-aware perspective, we find that unlike self-consistency, certain strategies such as multi-agent debate or Reflexion can become worse if more compute budget is utilized.
Submitted 14 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
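The chain-of-thought self-consistency baseline named in the abstract above is commonly described as sampling several independent answers and returning the most frequent one. The following is a minimal sketch of that majority-vote step only, not the authors' evaluation framework. The names `self_consistency`, `sample_answer`, and `num_samples` are hypothetical, and treating the number of samples as the compute budget is an assumption made for illustration.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[], str], num_samples: int) -> str:
    """Return the most frequent answer among `num_samples` sampled answers.

    `sample_answer` is a hypothetical stand-in for one stochastic
    chain-of-thought query to a language model; it returns only the
    final answer string. `num_samples` plays the role of the budget here.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    answers = [sample_answer() for _ in range(num_samples)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Canned answers replace real model calls so the sketch runs offline.
    canned = iter(["12", "15", "12", "12", "9"])
    print(self_consistency(lambda: next(canned), num_samples=5))
```

Under this reading, comparing strategies at equal budget would mean fixing the total number of model queries (or tokens) before comparing accuracy.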
-
Can I understand what I create? Self-Knowledge Evaluation of Large Language Models
Authors:
Zhiquan Tan,
Lai Wei,
Jindong Wang,
Xing Xie,
Weiran Huang
Abstract:
Large language models (LLMs) have achieved remarkable progress in linguistic tasks, necessitating robust evaluation frameworks to understand their capabilities and limitations. Inspired by Feynman's principle of understanding through creation, we introduce a self-knowledge evaluation framework that is easy to implement, evaluating models on their ability to comprehend and respond to self-generated questions. Our findings, based on testing multiple models across diverse tasks, reveal significant gaps in the models' self-knowledge ability. Further analysis indicates these gaps may be due to misalignment with human attention mechanisms. Additionally, fine-tuning on self-generated math tasks may enhance the model's math performance, highlighting the potential of the framework for efficient and insightful model evaluation; the framework may also contribute to the improvement of LLMs.
Submitted 10 June, 2024;
originally announced June 2024.
-
Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial for understanding the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.
Submitted 10 June, 2024;
originally announced June 2024.
-
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
Authors:
Chensen Huang,
Guibo Zhu,
Xuepeng Wang,
Yifei Luo,
Guojing Ge,
Haoran Chen,
Dong Yi,
Jinqiao Wang
Abstract:
Efforts to extend the context length of Transformer-based large language models (LLMs) and improve their comprehension capabilities are often limited by computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also investigate the issue of poor model responses when both instructions and context are compressed in downstream tasks, and propose an instruction reconstruction method to mitigate this problem. We validated the effectiveness of our approach on multiple tasks, achieving a compression rate of up to 32x on text reconstruction tasks with a BLEU4 score close to 0.95, and nearly 100\% accuracy on a passkey retrieval task with a sequence length of 1M. Finally, our method demonstrated competitive performance in long-text question-answering tasks compared to non-compressed methods, while significantly saving storage resources in long-text inference tasks. Our code, models, and demo are available at https://github.com/WUHU-G/RCC_Transformer
Submitted 10 June, 2024;
originally announced June 2024.
-
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
Authors:
Zijian Chen,
Wei Sun,
Yuan Tian,
Jun Jia,
Zicheng Zhang,
Jiarui Wang,
Ru Huang,
Xiongkuo Min,
Guangtao Zhai,
Wenjun Zhang
Abstract:
Assessing action quality is both imperative and challenging due to its significant impact on the quality of AI-generated videos, further complicated by the inherently ambiguous nature of actions within AI-generated videos (AIGVs). Current action quality assessment (AQA) algorithms predominantly focus on actions in specific real-world scenarios and are pre-trained with normative action features, thus rendering them inapplicable to AIGVs. To address these problems, we construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective, resulting in 971,244 ratings among 9,180 video-action pairs. Based on GAIA, we evaluate a suite of popular text-to-video (T2V) models on their ability to generate visually rational actions, revealing their pros and cons on different categories of actions. We also extend GAIA as a testbed to benchmark the AQA capacity of existing automatic evaluation methods. Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods correlate poorly with human opinions, indicating a sizable gap between current models and human action perception patterns in AIGVs. Our findings underscore the significance of action quality as a unique perspective for studying AIGVs and can catalyze progress towards methods with enhanced capacities for AQA in AIGVs.
Submitted 10 June, 2024;
originally announced June 2024.
-
Enabling Large-Scale and High-Precision Fluid Simulations on Near-Term Quantum Computers
Authors:
Zhao-Yun Chen,
Teng-Yang Ma,
Chuang-Chao Ye,
Liang Xu,
Ming-Yang Tan,
Xi-Ning Zhuang,
Xiao-Fan Xu,
Yun-Jie Wang,
Tai-Ping Sun,
Yong Chen,
Lei Du,
Liang-Liang Guo,
Hai-Feng Zhang,
Hao-Ran Tao,
Tian-Le Wang,
Xiao-Yan Yang,
Ze-An Zhao,
Peng Wang,
Sheng Zhang,
Chi Zhang,
Ren-Ze Zhao,
Zhi-Long Jia,
Wei-Cheng Kong,
Meng-Han Dou,
Jun-Chao Wang
, et al. (7 additional authors not shown)
Abstract:
Quantum computational fluid dynamics (QCFD) offers a promising alternative to classical computational fluid dynamics (CFD) by leveraging quantum algorithms for higher efficiency. This paper introduces a comprehensive QCFD method, including an iterative method "Iterative-QLS" that suppresses errors in the quantum linear solver, and a subspace method to scale the solution to a larger size. We implement our method on a superconducting quantum computer, demonstrating successful simulations of steady Poiseuille flow and unsteady acoustic wave propagation. The Poiseuille flow simulation achieved a relative error of less than $0.2\%$, and the unsteady acoustic wave simulation solved a 5043-dimensional matrix. We emphasize the utilization of the quantum-classical hybrid approach in applications of near-term quantum computers. By adapting to quantum hardware constraints and offering scalable solutions for large-scale CFD problems, our method paves the way for practical applications of near-term quantum computers in computational science.
Submitted 19 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Authors:
Xi Li,
Yusen Zhang,
Renze Lou,
Chen Wu,
Jiaqi Wang
Abstract:
Backdoor attacks present significant threats to Large Language Models (LLMs), particularly with the rise of third-party services that offer API integration and prompt engineering. Untrustworthy third parties can plant backdoors into LLMs and pose risks to users by embedding malicious instructions into user queries. The backdoor-compromised LLM will generate malicious output when an input is embedded with a specific trigger predetermined by an attacker. Traditional defense strategies, which primarily involve model parameter fine-tuning and gradient calculation, are inadequate for LLMs due to their extensive computational and clean data requirements. In this paper, we propose a novel solution, Chain-of-Scrutiny (CoS), to address these challenges. Backdoor attacks fundamentally create a shortcut from the trigger to the target output, and thus lack reasoning support. Accordingly, CoS guides the LLMs to generate detailed reasoning steps for the input, then scrutinizes the reasoning process to ensure consistency with the final answer. Any inconsistency may indicate an attack. CoS only requires black-box access to the LLM, offering a practical defense, particularly for API-accessible LLMs. It is user-friendly, enabling users to conduct the defense themselves. Driven by natural language, the entire defense process is transparent to users. We validate the effectiveness of CoS through extensive experiments across various tasks and LLMs. Additionally, experimental results show that CoS is more beneficial for more powerful LLMs.
Submitted 9 June, 2024;
originally announced June 2024.
-
Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.
Submitted 9 June, 2024;
originally announced June 2024.
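As a small worked example of the arithmetic implied by the abstract above, the three quoted central values can be added to give the total integrated luminosity of the 2021 to 2024 data sets. The total is not stated in the abstract; it is computed here only from the quoted numbers. The uncertainties are deliberately not combined, because the abstract says they are dominated by systematic effects and gives no information on correlations between periods. The dictionary keys are labels chosen for illustration.

```python
# Central values of the integrated luminosity in fb^-1, as quoted in the abstract.
luminosity_fb = {
    "Dec 2021 - Jun 2022": 4.995,
    "Nov 2022 - Jun 2023": 8.157,
    "Oct 2023 - Feb 2024": 4.191,
}

# Sum of central values only; uncertainties are not propagated here because
# the correlation between the three periods is not given in the abstract.
total_luminosity_fb = sum(luminosity_fb.values())

print(f"Total integrated luminosity: {total_luminosity_fb:.3f} fb^-1")
# prints "Total integrated luminosity: 17.343 fb^-1"
```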