Prioritized League Reinforcement Learning for Large-Scale Heterogeneous Multiagent Systems

Authors: Qingxu Fu, Zhiqiang Pu, Min Chen, Tenghai Qiu, Jianqiang Yi

Abstract: Large-scale heterogeneous multiagent systems feature various realistic factors in the real world, such as agents with diverse abilities and overall system cost. In comparison to homogeneous systems, heterogeneous systems offer significant practical advantages. Nonetheless, they also present challenges for multiagent reinforcement learning, including addressing the non-stationary problem and managing an imbalanced number of agents with different types. We propose a Prioritized Heterogeneous League Reinforcement Learning (PHLRL) method to address large-scale heterogeneous cooperation problems. PHLRL maintains a record of various policies that agents have explored during their training and establishes a heterogeneous league consisting of diverse policies to aid in future policy optimization. Furthermore, we design a prioritized policy gradient approach to compensate for the gap caused by differences in the number of different types of agents. Next, we use Unreal Engine to design a large-scale heterogeneous cooperation benchmark named Large-Scale Multiagent Operation (LSMO), which is a complex two-team competition scenario that requires collaboration from both ground and airborne agents. We use experiments to show that PHLRL outperforms state-of-the-art methods, including QTRAN and QPLEX in LSMO.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.17460 [pdf, other]

Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

Authors: Runmin Dong, Shuai Yuan, Bin Luo, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Weijia Li, Juepeng Zheng, Haohuan Fu

Abstract: Reference-based super-resolution (RefSR) has the potential to build bridges across spatial and temporal resolutions of remote sensing images. However, existing RefSR methods are limited by the faithfulness of content reconstruction and the effectiveness of texture transfer in large scaling factors. Conditional diffusion models have opened up new opportunities for generating realistic high-resolution images, but effectively utilizing reference images within these models remains an area for further exploration. Furthermore, content fidelity is difficult to guarantee in areas without relevant reference information. To solve these issues, we propose a change-aware diffusion model named Ref-Diff for RefSR, using the land cover change priors to guide the denoising process explicitly. Specifically, we inject the priors into the denoising model to improve the utilization of reference information in unchanged areas and regulate the reconstruction of semantically relevant content in changed areas. With this powerful guidance, we decouple the semantics-guided denoising and reference texture-guided denoising processes to improve the model performance. Extensive experiments demonstrate the superior effectiveness and robustness of the proposed method compared with state-of-the-art RefSR methods in both quantitative and qualitative evaluations. The code and data are available at https://github.com/dongrunmin/RefDiff.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024

arXiv:2403.16811 [pdf, ps, other]

Cross section measurement of $e^+e^-\to ηψ(2S)$ and search for $e^+e^-\toη\tilde{X}(3872)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass energies, upper limits at the 90% confidence level on the cross section for $e^+e^-\toηψ(2S)$ and on the product of the $e^+e^-\toη\tilde{X}(3872)$ cross section with the branching fraction of $\tilde{X}(3872)\toπ^+π^- J/ψ$ are reported.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16698 [pdf, other]

Boson sampling enhanced quantum chemistry

Authors: Zhong-Xia Shang, Han-Sen Zhong, Yu-Kun Zhang, Cheng-Cheng Yu, Xiao Yuan, Chao-Yang Lu, Jian-Wei Pan, Ming-Cheng Chen

Abstract: In this work, we give a hybrid quantum-classical algorithm for solving electronic structure problems of molecules using only linear quantum optical systems. The variational ansatz we proposed is a hybrid of non-interacting Boson dynamics and classical computational chemistry methods, specifically, the Hartree-Fock method and the Configuration Interaction method. The Boson part is built by a linear optical interferometer which is easier to realize compared with the well-known Unitary Coupled Cluster (UCC) ansatz composed of quantum gates in conventional VQE and the classical part is merely classical processing acting on the Hamiltonian. We called such ansatzes Boson Sampling-Classic (BS-C). The appearance of permanents in the Boson part has its physical intuition to provide different kinds of resources from commonly used single-, double-, and higher-excitations in classical methods and the UCC ansatz to exploring chemical quantum states. Such resources can help enhance the accuracy of methods used in the classical parts. We give a scalable hybrid homodyne and photon number measurement procedure for evaluating the energy value which has intrinsic abilities to mitigate photon loss errors and discuss the extra measurement cost induced by the no Pauli exclusion principle for Bosons with its solutions. To demonstrate our proposal, we run numerical experiments on several molecules and obtain their potential energy curves reaching chemical accuracy.

Submitted 18 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: 15 pages, 5 figures

arXiv:2403.16461 [pdf]

Recent Advances on Transition-Metal-Based Layered Double Hydroxides Nanosheets for Electrocatalytic Energy Conversion

Authors: Yuchen Wang, Man Zhang, Yaoyu Liu, Zhikeng Zheng, Biying Liu, Meng Chen, Guoqing Guan, Kai Yan

Abstract: Transition-metal-based layered double hydroxides (TM-LDHs) nanosheets are promising electrocatalysts in the renewable electrochemical energy conversion system, which are regarded as alternatives to noble metal-based materials. In this review, recent advances on effective and facile strategies to rationally design TM-LDHs nanosheets as electrocatalysts, such as increasing the number of active sites, improving the utilization of active sites (atomic-scale catalysts), modulating the electron configurations, and controlling the lattice facets, are summarized and compared. Then, the utilization of these fabricated TM-LDHs nanosheets for oxygen evolution reaction, hydrogen evolution reaction, urea oxidation reaction, nitrogen reduction reaction, small molecule oxidations, and biomass derivatives upgrading is articulated through systematically discussing the corresponding fundamental design principles and reaction mechanism. Finally, the existing challenges in increasing the density of catalytically active sites and future prospects of TM-LDHs nanosheets-based electrocatalysts in each application are also commented.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16097 [pdf, other]

Can Language Models Pretend Solvers? Logic Code Simulation with LLMs

Authors: Minyu Chen, Guoqiang Li, Ling-I Wu, Ruibang Liu, Yuxin Su, Xi Chang, Jianxin Xue

Abstract: Transformer-based large language models (LLMs) have demonstrated significant potential in addressing logic problems. Capitalizing on the great capabilities of LLMs for code-related activities, several frameworks leveraging logical solvers for logic reasoning have been proposed recently. While existing research predominantly focuses on viewing LLMs as natural language logic solvers or translators, their roles as logic code interpreters and executors have received limited attention. This study delves into a novel aspect, namely logic code simulation, which forces LLMs to emulate logical solvers in predicting the results of logical programs. To further investigate this novel task, we formulate our three research questions: Can LLMs efficiently simulate the outputs of logic codes? What strength arises along with logic code simulation? And what pitfalls? To address these inquiries, we curate three novel datasets tailored for the logic code simulation task and undertake thorough experiments to establish the baseline performance of LLMs in code simulation. Subsequently, we introduce a pioneering LLM-based code simulation technique, Dual Chains of Logic (DCoL). This technique advocates a dual-path thinking approach for LLMs, which has demonstrated state-of-the-art performance compared to other LLM prompt strategies, achieving a notable improvement in accuracy by 7.06% with GPT-4-Turbo.

Submitted 28 March, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: 12 pages, 8 figures

arXiv:2403.16038 [pdf, other]

Monotonic Paraphrasing Improves Generalization of Language Model Prompting

Authors: Qin Liu, Fei Wang, Nan Xu, Tianyi Yan, Tao Meng, Muhao Chen

Abstract: Performance of large language models (LLMs) may vary with different prompts or instructions of even the same task. One commonly recognized factor for this phenomenon is the model's familiarity with the given prompt or instruction, which is typically estimated by its perplexity. However, finding the prompt with the lowest perplexity is challenging, given the enormous space of possible prompting phrases. In this paper, we propose monotonic paraphrasing (MonoPara), an end-to-end decoding strategy that paraphrases given prompts or instructions into their lower perplexity counterparts based on an ensemble of a paraphrase LM for prompt (or instruction) rewriting, and a target LM (i.e. the prompt or instruction executor) that constrains the generation for lower perplexity. The ensemble decoding process can efficiently paraphrase the original prompt without altering its semantic meaning, while monotonically decreasing the perplexity of each generation as calculated by the target LM. We explore in detail both greedy and search-based decoding as two alternative decoding schemes of MonoPara. Notably, MonoPara does not require any training and can monotonically lower the perplexity of the paraphrased prompt or instruction, leading to improved performance of zero-shot LM prompting as evaluated on a wide selection of tasks. In addition, MonoPara is also shown to effectively improve LMs' generalization on perturbed and unseen task instructions.

Submitted 18 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: Under review at ARR 2024 April
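
Note: the abstract above relies on perplexity as a proxy for a model's familiarity with a prompt. As a minimal sketch of that quantity only (not of MonoPara's ensemble decoding), perplexity can be computed from per-token log-probabilities as the exponential of the negative mean log-probability. The function names and the log-probability values below are hypothetical, and a real setup would obtain the log-probabilities from the target LM.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a token sequence from its natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def lowest_perplexity_prompt(candidates):
    """Pick the candidate prompt with the lowest perplexity.

    `candidates` maps each prompt string to the (hypothetical) token
    log-probabilities a target LM assigned to it. This exhaustive choice
    is only an illustration; it is not the decoding strategy in the paper.
    """
    return min(candidates, key=lambda prompt: perplexity(candidates[prompt]))

# Hypothetical log-probabilities, made up for illustration.
candidates = {
    "Summarize the text.": [math.log(0.5)] * 4,
    "Kindly produce a precis.": [math.log(0.25)] * 4,
}
best = lowest_perplexity_prompt(candidates)
```

With these made-up values the first prompt is selected, because a uniform per-token probability of 0.5 corresponds to a lower perplexity than a uniform 0.25.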

arXiv:2403.15764 [pdf, other]

Radiation Effects on Scientific CMOS Detectors for X-ray Astronomy: II. Total Ionizing Dose Irradiation

Authors: Mengxi Chen, Zhixing Ling, Mingjun Liu, Qinyu Wu, Chen Zhang, Jiaqiang Liu, Zhenlong Zhang, Weimin Yuan, Shuang-Nan Zhang

Abstract: Complementary metal-oxide-semiconductor (CMOS) detectors are a competitive choice for current and upcoming astronomical missions. To understand the performance variations of CMOS detectors in space environment, we investigate the total ionizing dose effects on custom-made large-format X-ray CMOS detectors. Three CMOS detector samples were irradiated with a Co-60 source with a total dose of 70 krad and 105 krad. We test and compare the performance of these detectors before and after irradiation. After irradiation, the dark current increases by roughly 20 to 100 times, and the readout noise increases from 3 e- to 6 e-. The bias level at 50 ms integration time decreases by 13 to 18 Digital Number (DN) at -30 degree. The energy resolution increases from about 150 eV to about 170 eV at 4.5 keV at -30 degree. The conversion gain of the detectors varies for less than 2% after the irradiation. Furthermore, there are about 50 pixels whose bias at 50 ms has changed by more than 20 DN after the exposure to the radiation and about 30 to 140 pixels whose readout noise has increased by over 20 e- at -30 degree at 50 ms integration time. These results demonstrate that the performances of large-format CMOS detectors do not suffer significant degeneration in space environment.

Submitted 23 March, 2024; originally announced March 2024.

Comments: accepted by JATIS

arXiv:2403.15676 [pdf, other]

AC4: Algebraic Computation Checker for Circuit Constraints in ZKPs

Authors: Hao Chen, Minyu Chen, Ruibang Liu, Guoqiang Li, Sinka Gao

Abstract: ZKP systems have surged attention and held a fundamental role in contemporary cryptography. Zk-SNARK protocols dominate the ZKP usage, often implemented through arithmetic circuit programming paradigm. However, underconstrained or overconstrained circuits may lead to bugs. Underconstrained circuits refer to circuits that lack the necessary constraints, resulting in unexpected solutions in the circuit and causing the verifier to accept a bogus witness. Overconstrained circuits refer to circuits that are constrained excessively, resulting in the circuit lacking necessary solutions and causing the verifier to accept no witness, rendering the circuit meaningless. This paper introduces a novel approach for pinpointing two distinct types of bugs in ZKP circuits. The method involves encoding the arithmetic circuit constraints to polynomial equation systems and solving polynomial equation systems over a finite field by algebraic computation. The classification of verification results is refined, greatly enhancing the expressive power of the system. We proposed a tool, AC4, to represent the implementation of this method. Experiments demonstrate that AC4 represents a substantial 29% increase in the checked ratio compared to prior work. Within a solvable range, the checking time of AC4 has also exhibited noticeable improvement, demonstrating a magnitude increase compared to previous efforts.

Submitted 7 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: 20 pages, 4 figures
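
Note: the underconstrained/overconstrained distinction in the abstract above can be illustrated with a toy brute-force check over a very small prime field. This is only a sketch under stated assumptions and not AC4's method, which the abstract describes as solving polynomial equation systems by algebraic computation. The field size `P`, the single-input single-output circuit shape, the per-input reading of "no solutions", and the function name `classify` are all hypothetical choices made for illustration.

```python
P = 7  # tiny prime field, chosen only so exhaustive search is feasible

def classify(constraints):
    """Classify a toy circuit with one input x and one output y over GF(P).

    Each constraint is a function (x, y) -> int that must equal 0 mod P.
    For every input x, count the outputs y satisfying all constraints:
      - some input with no valid output   -> "overconstrained"
      - some input with several outputs   -> "underconstrained"
      - exactly one output for each input -> "well-constrained"
    """
    counts = []
    for x in range(P):
        n = sum(
            all(c(x, y) % P == 0 for c in constraints)
            for y in range(P)
        )
        counts.append(n)
    if any(n == 0 for n in counts):
        return "overconstrained"
    if any(n > 1 for n in counts):
        return "underconstrained"
    return "well-constrained"

# Intended behaviour for all three toy circuits: y = x^2 (mod P).
well = [lambda x, y: y - x * x]                            # pins y uniquely
under = [lambda x, y: y * (y - 1)]                         # only forces y in {0, 1}
over = [lambda x, y: y - x * x, lambda x, y: y - x * x - 1]  # contradictory pair

results = [classify(well), classify(under), classify(over)]
```

Exhaustive enumeration like this scales only to toy field sizes; practical ZKP circuits use large prime fields, which is why a symbolic approach is needed there.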

arXiv:2403.14998 [pdf, other]

Precise measurement of the $e^+e^-\to D_s^+D_s^-$ cross sections at center-of-mass energies from threshold to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$ GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel analysis and model tests, which are critical to understand vector charmonium-like states with masses between 4 and 5 GeV.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures, published to PRL

arXiv:2403.13909 [pdf, other]

Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)

Authors: Yimeng Fan, Pedram Agand, Mo Chen, Edward J. Park, Allison Kennedy, Chanwoo Bae

Abstract: The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of a ferry in west coast Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static states, actions, and disturbances. This model is designed to predict dynamic states based on the actions provided, subsequently serving as an evaluative tool to assess the proficiency of the ferry's operation under the captain's guidance. Additionally, it lays the foundation for future optimization algorithms, providing valuable feedback on decision-making processes. To facilitate future studies, our code is available at https://github.com/pagand/model_optimze_vessel/tree/AAAI

Submitted 20 March, 2024; originally announced March 2024.

Comments: 5 pages, 3 figures, AAAI 2024 student abstract

arXiv:2403.13437 [pdf, other]

Search for $ΔS=2$ nonleptonic hyperon decays $Ω^-\toΣ^{0}π^{-}$ and $Ω^-\to nK^{-}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be $\mathcal{B}(Ω^-\toΣ^{0}π^-) < 5.4\times 10^{-4}$ and $\mathcal{B}(Ω^-\to nK^{-}) < 2.4\times 10^{-4}$ at the $90\%$ confidence level.

Submitted 14 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13219 [pdf, other]

Diffusion Model for Data-Driven Black-Box Optimization

Authors: Zihao Li, Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Yinyu Ye, Minshuo Chen, Mengdi Wang

Abstract: Generative AI has redefined artificial intelligence, enabling the creation of innovative content and customized solutions that drive business practices into a new era of efficiency and creativity. In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables. Consider the practical scenario where one wants to optimize some structured design in a high-dimensional space, based on massive unlabeled data (representing design variables) and a small labeled dataset. We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons. The goal is to generate new designs that are near-optimal and preserve the designed latent structures. Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models for modeling complex distributions. In particular, we propose a reward-directed conditional diffusion model, to be trained on the mixed data, for sampling a near-optimal solution conditioned on high predicted rewards. Theoretically, we establish sub-optimality error bounds for the generated designs. The sub-optimality gap nearly matches the optimal guarantee in off-policy bandits, demonstrating the efficiency of reward-directed diffusion models for black-box optimization. Moreover, when the data admits a low-dimensional latent subspace structure, our model efficiently generates high-fidelity designs that closely respect the latent structure. We provide empirical experiments validating our model in decision-making and content-creation tasks.

Submitted 19 March, 2024; originally announced March 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2307.07055

arXiv:2403.13207 [pdf, ps, other]

Condensed-Phase Quantum Chemistry

Authors: Paul J. Robinson, Adam Rettig, Hieu Q. Dinh, Meng-Fu Chen, Joonho Lee

Abstract: Molecular quantum chemistry has seen enormous progress in the last few decades thanks to the more advanced and sophisticated numerical techniques and computing power. Following the recent interest in extending these capabilities to condensed-phase problems, we summarize basic knowledge of condensed-phase quantum chemistry for ones with experience in molecular quantum chemistry. We highlight recent efforts in this direction, including solving the electron repulsion integrals bottleneck and implementing hybrid density functional theory and wavefunction methods, and lattice dynamics for periodic systems within atom-centered basis sets. Many computational techniques presented here are inspired by the extensive method developments rooted in quantum chemistry. In this Focus Article, we selectively focus on the computational techniques rooted in molecular quantum chemistry, emphasize some challenges, and point out open questions. We hope our perspectives will encourage researchers to pursue this exciting and promising research avenue.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 17 pages, 4 figures

arXiv:2403.12965 [pdf, other]

Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment

Authors: Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao

Abstract: This paper introduces a novel framework for virtual try-on, termed Wear-Any-Way. Different from previous methods, Wear-Any-Way is a customizable solution. Besides generating high-fidelity results, our method supports users to precisely manipulate the wearing style. To achieve this goal, we first construct a strong pipeline for standard virtual try-on, supporting single/multiple garment try-on and model-to-model settings in complicated scenarios. To make it manipulable, we propose sparse correspondence alignment which involves point-based control to guide the generation for specific locations. With this design, Wear-Any-Way gets state-of-the-art performance for the standard setting and provides a novel interaction form for customizing the wearing style. For instance, it supports users to drag the sleeve to make it rolled up, drag the coat to make it open, and utilize clicks to control the style of tuck, etc. Wear-Any-Way enables more liberated and flexible expressions of the attires, holding profound implications in the fashion industry.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Project Page: https://mengtingchen.github.io/wear-any-way-page/

arXiv:2403.12580 [pdf, other]

Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection

Authors: Chengjie Wang, Wenbing Zhu, Bin-Bin Gao, Zhenye Gan, Jianning Zhang, Zhihao Gu, Shuguang Qian, Mingang Chen, Lizhuang Ma

Abstract: Industrial anomaly detection (IAD) has garnered significant attention and experienced rapid development. However, the recent development of IAD approach has encountered certain difficulties due to dataset limitations. On the one hand, most of the state-of-the-art methods have achieved saturation (over 99% in AUROC) on mainstream datasets such as MVTec, and the differences of methods cannot be well distinguished, leading to a significant gap between public datasets and actual application scenarios. On the other hand, the research on various new practical anomaly detection settings is limited by the scale of the dataset, posing a risk of overfitting in evaluation results. Therefore, we propose a large-scale, Real-world, and multi-view Industrial Anomaly Detection dataset, named Real-IAD, which contains 150K high-resolution images of 30 different objects, an order of magnitude larger than existing datasets. It has a larger range of defect area and ratio proportions, making it more challenging than previous datasets. To make the dataset closer to real application scenarios, we adopted a multi-view shooting method and proposed sample-level evaluation metrics. In addition, beyond the general unsupervised anomaly detection setting, we propose a new setting for Fully Unsupervised Industrial Anomaly Detection (FUIAD) based on the observation that the yield rate in industrial production is usually greater than 60%, which has more practical application value. Finally, we report the results of popular IAD methods on the Real-IAD dataset, providing a highly challenging benchmark to promote the development of the IAD field.

Submitted 19 March, 2024; originally announced March 2024.

Comments: It is accepted by CVPR2024

arXiv:2403.12028 [pdf, other]

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail

Authors: Mingjin Chen, Junhao Chen, Xiaojun Ye, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao

Abstract: 3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and difficult to capture the detailed appearance of the human body. In this paper, we propose a new method called Ultraman for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, Ultraman greatly improves the reconstruction speed and accuracy while preserving high-quality texture details. We present a set of new frameworks for human reconstruction consisting of three parts, geometric reconstruction, texture generation and texture mapping. Firstly, a mesh reconstruction framework is used, which accurately extracts 3D human shapes from a single image. At the same time, we propose a method to generate a multi-view consistent image of the human body based on a single image. This is finally combined with a novel texture mapping method to optimize texture details and ensure color consistency during reconstruction. Through extensive experiments and evaluations, we demonstrate the superior performance of Ultraman on various standard datasets. In addition, Ultraman outperforms state-of-the-art methods in terms of human rendering quality and speed. Upon acceptance of the article, we will make the code and data publicly available.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Project Page: https://air-discover.github.io/Ultraman/

arXiv:2403.11968 [pdf, other]

Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory

Authors: Hengyu Fu, Zhuoran Yang, Mengdi Wang, Minshuo Chen

Abstract: Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning. In these applications, conditional diffusion models incorporate various conditional information, such as prompt input, to guide the sample generation towards desired properties. Despite the empirical success, the theory of conditional diffusion models is largely missing. This paper bridges this gap by presenting a sharp statistical theory of distribution estimation using conditional diffusion models. Our analysis yields a sample complexity bound that adapts to the smoothness of the data distribution and matches the minimax lower bound. The key to our theoretical development lies in an approximation result for the conditional score function, which relies on a novel diffused Taylor approximation technique. Moreover, we demonstrate the utility of our statistical theory in elucidating the performance of conditional diffusion models across diverse applications, including model-based transition kernel estimation in reinforcement learning, solving inverse problems, and reward conditioned sample generation.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 92 pages, 5 figures

arXiv:2403.11813 [pdf, other]

Polarization multistates in composite ferroelectrics

Authors: Chuhan Tang, Zhiqiang Tian, Tao Ouyang, Anlian Pan, Mingxing Chen

Abstract: Going beyond the bistability paradigm of the charge polarizations in ferroelectrics is highly desired for ferroelectric memory devices toward ultra-high density information storage. Here, we propose to build multistates in composite ferroelectrics, which have both the intrinsic and sliding-induced polarizations. We illustrate the concept in H-stacking bilayers of 1T'' transition-metal dichalcogenides by first-principles calculations. We find that there is at least one order of magnitude difference in the energy barriers between these two types of polarizations, which suggests that the external electric fields required to flip them are significantly different. This difference allows for a novel flipping mechanism involving layer sliding and layer-by-layer flipping for transforming the polarization states. As a result, sextuple switchable states can be achieved for the 1T'' bilayers by properly controlling the electric field. Our study provides a new route to design polarization multistates for developing next-generation memory devices.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 6 pages, 3 figures

arXiv:2403.11417 [pdf, ps, other]

Positioning Using Wireless Networks: Applications, Recent Progress and Future Challenges

Authors: Yang Yang, Mingzhe Chen, Yufei Blankenship, Jemin Lee, Zabih Ghassemlooy, Julian Cheng, Shiwen Mao

Abstract: Positioning has recently received considerable attention as a key enabler in emerging applications such as extended reality, unmanned aerial vehicles and smart environments. These applications require both data communication and high-precision positioning, and thus they are particularly well-suited to be offered in wireless networks (WNs). The purpose of this paper is to provide a comprehensive overview of existing works and new trends in the field of positioning techniques from both the academic and industrial perspectives. The paper provides a comprehensive overview of positioning in WNs, covering the background, applications, measurements, state-of-the-art technologies and future challenges. The paper outlines the applications of positioning from the perspectives of public facilities, enterprises and individual users. We investigate the key performance indicators and measurements of positioning systems, followed by a review of key enabling techniques such as artificial intelligence/large models and adaptive systems. Next, we discuss a number of typical wireless positioning technologies. We extend our overview beyond the academic progress to include the standardization efforts, and finally, we provide insight into the challenges that remain. The comprehensive overview of existing efforts and new trends in the field of positioning from both the academic and industrial communities would be a useful reference to researchers in the field.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11239 [pdf]

A component-level co-rotational 3D continuum finite element framework for efficient flexible multibody analysis

Authors: Ziyun Kan, Mingdong Chen, Haijun Peng, Yizhu Guo, Xueguan Song

Abstract: This paper proposes a systematic and novel component-level co-rotational (CR) framework for upgrading existing 3D continuum finite elements to flexible multibody analysis. Without using any model reduction techniques, high efficiency is achieved through sophisticated operations in both the modeling and numerical implementation phases. In the modeling phase, as in conventional 3D nonlinear finite analysis, the nodal absolute coordinates are used as the system generalized coordinates, so simple formulations of the inertia force terms can be obtained. For the elastic force terms, inspired by the existing floating frame of reference formulation (FFRF) and the conventional element-level CR formulation, a component-level CR modeling strategy is developed. In combination with Schur complement theory, and by fully exploiting the nature of the component-level CR modeling method, an extremely efficient procedure is developed, which enables us to transform the linear equations arising from each Newton-Raphson iteration step into linear systems with a constant coefficient matrix. The coefficient matrix can thus be pre-calculated and decomposed only once, and at all subsequent time steps only back substitutions are needed, which avoids frequently updating the Jacobian matrix and avoids directly solving the large-scale linearized equation in each iteration. Multiple examples are presented to demonstrate the performance of the proposed framework.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11102 [pdf, other]

Jointly Optimizing Terahertz based Sensing and Communications in Vehicular Networks: A Dynamic Graph Neural Network Approach

Authors: Xuefei Li, Mingzhe Chen, Ye Hu, Zhilong Zhang, Danpu Liu, Shiwen Mao

Abstract: In this paper, the problem of vehicle service mode selection (sensing, communication, or both) and vehicle connections within terahertz (THz) enabled joint sensing and communications over vehicular networks is studied. The considered network consists of several service provider vehicles (SPVs) that can provide 1) only sensing service, 2) only communication service, or 3) both services; sensing service request vehicles; and communication service request vehicles. Based on the vehicle network topology and their service accessibility, SPVs strategically select service request vehicles to provide sensing, communication, or both services. This problem is formulated as an optimization problem, aiming to maximize the number of successfully served vehicles by jointly determining the service mode of each SPV and its associated vehicles. To solve this problem, we propose a dynamic graph neural network (GNN) model that selects appropriate graph information aggregation functions according to the vehicle network topology, thus extracting more vehicle network information compared to traditional static GNNs that use fixed aggregation functions for different vehicle network topologies. Using the extracted vehicle network information, the service mode of each SPV and its served service request vehicles will be determined. Simulation results show that the proposed dynamic GNN based method can improve the number of successfully served vehicles by up to 17% and 28% compared to a GNN based algorithm with a fixed neural network model and a conventional optimization algorithm without using GNNs, respectively.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.10877 [pdf, ps, other]

Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (637 additional authors not shown)

Abstract: We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate in addition to the dominant $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is given to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$.

Submitted 16 March, 2024; originally announced March 2024.

Comments: 9 pages, 3 figures

arXiv:2403.10010 [pdf, other]

doi 10.1103/PhysRevLett.132.131002

Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen , et al. (256 additional authors not shown)

Abstract: We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.

Submitted 26 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures

Journal ref: Physical Review Letters 132, 131002 (2024)

arXiv:2403.09877 [pdf, other]

Quantifying Distributional Input Uncertainty via Inflated Kolmogorov-Smirnov Confidence Band

Authors: Motong Chen, Henry Lam, Zhenyuan Liu

Abstract: In stochastic simulation, input uncertainty refers to the propagation of the statistical noise in calibrating input models to impact output accuracy, in addition to the Monte Carlo simulation noise. The vast majority of the input uncertainty literature focuses on estimating target output quantities that are real-valued. However, outputs of simulation models are random and real-valued targets essentially serve only as summary statistics. To provide a more holistic assessment, we study the input uncertainty problem from a distributional view, namely we construct confidence bands for the entire output distribution function. Our approach utilizes a novel test statistic whose asymptotic limit is the supremum of the sum of a Brownian bridge and a suitable mean-zero Gaussian process, which generalizes the Kolmogorov-Smirnov statistic to account for input uncertainty. Regarding implementation, we also demonstrate how to use subsampling to efficiently estimate the covariance function of the Gaussian process, thereby leading to an implementable estimation of the quantile of the test statistic and a statistically valid confidence band. Numerical results demonstrate how our new confidence bands provide valid coverage for output distributions under input uncertainty that is not achievable by conventional approaches.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.09513 [pdf, other]

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

Authors: Yu Wang, Xiaogeng Liu, Yu Li, Muhao Chen, Chaowei Xiao

Abstract: With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), the imperative to ensure their safety has become increasingly pronounced. However, with the integration of additional modalities, MLLMs are exposed to new vulnerabilities, rendering them prone to structure-based jailbreak attacks, where semantic content (e.g., "harmful text") has been injected into the images to mislead MLLMs. In this work, we aim to defend against such threats. Specifically, we propose Adaptive Shield Prompting (AdaShield), which prepends inputs with defense prompts to defend MLLMs against structure-based jailbreak attacks without fine-tuning MLLMs or training additional modules (e.g., post-stage content detector). Initially, we present a manually designed static defense prompt, which thoroughly examines the image and instruction content step by step and specifies response methods to malicious queries. Furthermore, we introduce an adaptive auto-refinement framework, consisting of a target MLLM and an LLM-based defense prompt generator (Defender). These components collaboratively and iteratively communicate to generate a defense prompt. Extensive experiments on the popular structure-based jailbreak attacks and benign datasets show that our methods can consistently improve MLLMs' robustness against structure-based jailbreak attacks without compromising the model's general capabilities evaluated on standard benign tasks. Our code is available at https://github.com/rain305f/AdaShield.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Multimodal Large Language Models Defense, 25 Pages

arXiv:2403.08048 [pdf, other]

A Spectroscopic Hunt for Post-Red Supergiants in the Large Magellanic Cloud I: Preliminary Results

Authors: Kaitlyn M. Chen, Trevor Z. Dorn-Wallenstein

Abstract: Yellow supergiants (YSGs) are rare and poorly understood, and studying them is critical to constraining massive star evolution. We obtained flux-calibrated Magellan Inamori Kyocera Echelle (MIKE) high-resolution spectra of 40 YSGs in the Large Magellanic Cloud (LMC); this sample likely contains post-red supergiants (RSGs). Fitting these data with ATLAS9 model atmospheres, we determined fundamental parameters for these stars. We measure the first spectroscopic luminosities for YSGs above 20 $M_\odot$, providing us a novel probe of the luminosity-to-mass ratio. Many stars in our sample appear to have anomalously high surface gravities, despite being confirmed LMC supergiants. We manually inspected our data finding evidence for binary companions and ongoing mass loss. Our work demonstrates the valuable role of high-resolution spectroscopy in interpreting the evolutionary status of cool supergiants.

Submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted for publication in RNAAS. 4 pages, 1 Table. Comments welcome

arXiv:2403.08002 [pdf, other]

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

Authors: Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Akshay Chaudhari, Serena Yeung-Levy, Curtis P. Langlotz , et al. (2 additional authors not shown)

Abstract: The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant performance gaps in multimodal biomedical applications. More importantly, less-acknowledged pragmatic issues, including accessibility, model cost, and tedious manual evaluation make it hard for clinicians to use state-of-the-art large models directly on private patient data. Here, we explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. To maximize data efficiency, we adopt a modular approach by incorporating state-of-the-art pre-trained models for image and text modalities, and focusing on training a lightweight adapter to ground each modality to the text embedding space, as exemplified by LLaVA-Med. For training, we assemble a large dataset of over 697 thousand radiology image-text pairs. For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation. For best practice, we conduct a systematic ablation study on various choices in data engineering and multimodal training. The resulting LLaVA-Rad (7B) model attains state-of-the-art results on standard radiology tasks such as report generation and cross-modal retrieval, even outperforming much larger models such as GPT-4V and Med-PaLM M (84B). The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.

Submitted 26 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07901 [pdf, other]

MIP: CLIP-based Image Reconstruction from PEFT Gradients

Authors: Peiheng Zhou, Ming Hu, Xiaofei Xie, Yihao Huang, Kangjie Chen, Mingsong Chen

Abstract: The Contrastive Language-Image Pre-training (CLIP) model, as an effective pre-trained multimodal neural network, has been widely used in distributed machine learning tasks, especially Federated Learning (FL). Typically, CLIP-based FL adopts Parameter-Efficient Fine-Tuning (PEFT) for model training, which only fine-tunes adapter parameters or soft prompts rather than the full parameters. Although PEFT is different from the traditional training mode, in this paper we show theoretically that the gradients of adapters or soft prompts can still be used to perform image reconstruction attacks. Based on our theoretical analysis, we propose Multm-In-Parvo (MIP), a proprietary reconstruction attack method targeting CLIP-based distributed machine learning architecture. Specifically, MIP can reconstruct CLIP training images according to the gradients of soft prompts or an adapter. In addition, MIP includes a label prediction strategy to accelerate convergence and an inverse gradient estimation mechanism to avoid the vanishing gradient problem on the text encoder. Experimental results show that MIP can effectively reconstruct training images according to the gradients of soft prompts or adapters of CLIP models.

Submitted 25 February, 2024; originally announced March 2024.

arXiv:2403.07838 [pdf, other]

MPCPA: Multi-Center Privacy Computing with Predictions Aggregation based on Denoising Diffusion Probabilistic Model

Authors: Guibo Luo, Hanwen Zhang, Xiuling Wang, Mingzhi Chen, Yuesheng Zhu

Abstract: Privacy-preserving computing is crucial for multi-center machine learning in many applications such as healthcare and finance. In this paper a Multi-center Privacy Computing framework with Predictions Aggregation (MPCPA) based on denoising diffusion probabilistic model (DDPM) is proposed, in which conditional diffusion model training, DDPM data generation, a classifier, and strategy of prediction aggregation are included. Compared to federated learning, this framework necessitates fewer communications and leverages high-quality generated data to support robust privacy computing. Experimental validation across multiple datasets demonstrates that the proposed framework outperforms classic federated learning and approaches the performance of centralized learning with original data. Moreover, our approach demonstrates robust security, effectively addressing challenges such as image memorization and membership inference attacks. Our experiments underscore the efficacy of the proposed framework in the realm of privacy computing, with the code set to be released soon.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.06766 [pdf, other]

Determination of the number of $ψ(3686)$ events taken at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$.

Submitted 28 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06104 [pdf, other]

Universal Debiased Editing on Foundation Models for Fair Medical Image Classification

Authors: Ruinan Jin, Wenlong Deng, Minghui Chen, Xiaoxiao Li

Abstract: In the era of Foundation Models' (FMs) rising prominence in AI, our study addresses the challenge of biases in medical images while using FM API, particularly spurious correlations between pixels and sensitive attributes. Traditional methods for bias mitigation face limitations due to the restricted access to web-hosted FMs and difficulties in addressing the underlying bias encoded within the FM API. We propose a U(niversal) D(ebiased) E(diting) strategy, termed UDE, which generates UDE noise to mask such spurious correlation. UDE is capable of mitigating bias both within the FM API embedding and the images themselves. Furthermore, UDE is suitable for both white-box and black-box FM APIs, where we introduced G(reedy) (Z)eroth-O(rder) (GeZO) optimization for it when the gradient is inaccessible in black-box APIs. Our whole pipeline enables fairness-aware image editing that can be applied across various medical contexts without requiring direct model manipulation or significant computational resources. Our empirical results demonstrate the method's effectiveness in maintaining fairness and utility across different patient groups and diseases. In the era of AI-driven medicine, this work contributes to making healthcare diagnostics more equitable, showcasing a practical solution for bias mitigation in pre-trained image FMs.

Submitted 16 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05125 [pdf, other]

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

Authors: Muxi Chen, Yi Liu, Jian Yi, Changran Xu, Qiuxia Lai, Hongliang Wang, Tsung-Yi Ho, Qiang Xu

Abstract: In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: first, focusing on image qualities such as aesthetics and realism, and second, examining text conditions through concept coverage and fairness. We introduce an innovative aesthetic score prediction model that assesses the visual appeal of generated images and unveil the first dataset marked with low-quality regions in generated human images to facilitate automatic defect detection. Our exploration into concept coverage probes the model's effectiveness in interpreting and rendering text-based concepts accurately, while our analysis of fairness reveals biases in model outputs, with an emphasis on gender, race, and age. While our study is grounded in human imagery, this dual-faceted approach is designed with the flexibility to be applicable to other forms of image generation, enhancing our understanding of generative models and paving the way to the next generation of more sophisticated, contextually aware, and ethically attuned generative models. We will release our code, the data used for evaluating generative models and the dataset annotated with defective areas soon.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04203 [pdf, other]

Noise Reduction of Stochastic Density Functional Theory for Metals

Authors: Jake P. Vu, Ming Chen

Abstract: Density Functional Theory (DFT) has become a cornerstone in the modeling of metals. However, accurately simulating metals, particularly under extreme conditions, presents two significant challenges. First, simulating complex metallic systems at low electron temperatures is difficult due to their highly delocalized density matrix. Second, modeling metallic warm-dense materials at very high electron temperatures is challenging because it requires the computation of a large number of partially occupied orbitals. This study demonstrates that both challenges can be effectively addressed using the latest advances in linear-scaling stochastic DFT methodologies. Despite the inherent introduction of noise into all computed properties by stochastic DFT, this research evaluates the efficacy of various noise reduction techniques under different thermal conditions. Our observations indicate that the effectiveness of noise reduction strategies varies significantly with the electron temperature. Furthermore, we provide evidence that the computational cost of stochastic DFT methods scales linearly with system size for metal systems, regardless of the electron temperature regime.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.04000 [pdf, other]

doi 10.3847/1538-3881/ad3077

Direct-imaging Discovery of a Substellar Companion Orbiting the Accelerating Variable Star, HIP 39017

Authors: Taylor L. Tobin, Thayne Currie, Yiting Li, Jeffrey Chilcote, Timothy D. Brandt, Brianna Lacy, Masayuki Kuzuhara, Maria Vincent, Mona El Morsy, Vincent Deo, Jonathan P. Williams, Olivier Guyon, Julien Lozi, Sebastien Vievard, Nour Skaf, Kyohoon Ahn, Tyler Groff, N. Jeremy Kasdin, Taichi Uyama, Motohide Tamura, Aidan Gibbs, Briley L. Lewis, Rachel Bowens-Rubin, Maïssa Salama, Qier An, et al. (1 additional author not shown)

Abstract: We present the direct-imaging discovery of a substellar companion (a massive planet or low-mass brown dwarf) to the young, $γ$ Doradus-type variable star, HIP 39017 (HD 65526). The companion's SCExAO/CHARIS JHK ($1.1-2.4μ$m) spectrum and Keck/NIRC2 L$^{\prime}$ photometry indicate that it is an L/T transition object. A comparison of the JHK+L$^{\prime}$ spectrum to several atmospheric model grids finds a significantly better fit to cloudy models than cloudless models. Orbit modeling with relative astrometry and precision stellar astrometry from Hipparcos and Gaia yields a semi-major axis of $23.8^{+8.7}_{-6.1}$ au, a dynamical companion mass of $30^{+31}_{-12}$~M$_J$, and a mass ratio of $\sim$1.9\%, properties most consistent with low-mass brown dwarfs. However, its mass estimated from luminosity models is a lower $\sim$13.8 $M_{\rm J}$ due to an estimated young age ($\lesssim$ 115 Myr); using a weighted posterior distribution informed by conservative mass constraints from luminosity evolutionary models yields a lower dynamical mass of $23.6_{-7.4}^{+9.1}$~M$_J$ and a mass ratio of $\sim$1.4\%. Analysis of the host star's multi-frequency $γ$ Dor-type pulsations, astrometric monitoring of HIP 39017b, and Gaia Data Release 4 astrometry of the star will clarify the system age and better constrain the mass and orbit of the companion. This discovery further reinforces the improved efficiency of targeted direct-imaging campaigns informed by long-baseline, precision stellar astrometry.

Submitted 15 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: Published in AJ: April 9, 2024

Journal ref: 2024, AJ, 167, 205

arXiv:2403.03983 [pdf, other]

The Cosmic Ultraviolet Baryon Survey (CUBS) VIII: Group Environment of the Most Luminous Quasars at $z\approx1$

Authors: Jennifer I. Li, Sean D. Johnson, Erin Boettcher, Sebastiano Cantalupo, Hsiao-Wen Chen, Mandy C. Chen, David R. DePalma, Zhuoqi Liu, Nishant Mishra, Patrick Petitjean, Zhijie Qu, Gwen C. Rudie, Joop Schaye, Fakhri S. Zahedy

Abstract: We investigate the group-scale environment of 15 luminous quasars (luminosity $L_{\rm 3000}>10^{46}$ erg s$^{-1}$) from the Cosmic Ultraviolet Baryon Survey (CUBS) at redshift $z\approx1$. Using the Multi Unit Spectroscopic Explorer (MUSE) integral field spectrograph on the Very Large Telescope (VLT), we conduct a deep galaxy redshift survey in the CUBS quasar fields to identify group members and measure the physical properties of individual galaxies and galaxy groups. We find that the CUBS quasars reside in diverse environments. The majority (11 out of 15) of the CUBS quasars reside in overdense environments with typical halo masses exceeding $10^{13}{\rm M}_{\odot}$, while the remaining quasars reside in moderate-size galaxy groups. No correlation is observed between overdensity and redshift, black hole (BH) mass, or luminosity. Radio-loud quasars (5 out of 15 CUBS quasars) are more likely to be in overdense environments than their radio-quiet counterparts in the sample, consistent with the mean trends from previous statistical observations and clustering analyses. Nonetheless, we also observe radio-loud quasars in moderate groups and radio-quiet quasars in overdense environments, indicating a large scatter in the connection between radio properties and environment. We find that the most UV luminous quasars might be outliers in the stellar mass-to-halo mass relations or may represent departures from the standard single-epoch BH relations.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 18 pages, 8 figures, accepted for publication in ApJ

arXiv:2403.03863 [pdf, other]

X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification

Authors: Hanzi Xu, Muhao Chen, Lifu Huang, Slobodan Vucetic, Wenpeng Yin

Abstract: In recent years, few-shot and zero-shot learning, which learn to predict labels with limited annotated instances, have garnered significant attention. Traditional approaches often treat frequent-shot (freq-shot; labels with abundant instances), few-shot, and zero-shot learning as distinct challenges, optimizing systems for just one of these scenarios. Yet, in real-world settings, label occurrences vary greatly. Some of them might appear thousands of times, while others might only appear sporadically or not at all. For practical deployment, it is crucial that a system can adapt to any label occurrence. We introduce a novel classification challenge: X-shot, reflecting a real-world context where freq-shot, few-shot, and zero-shot labels co-occur without predefined limits. Here, X can span from 0 to positive infinity. The crux of X-shot centers on open-domain generalization and devising a system versatile enough to manage various label scenarios. To solve X-shot, we propose BinBin (Binary INference Based on INstruction following) that leverages the Indirect Supervision from a large collection of NLP tasks via instruction following, bolstered by Weak Supervision provided by large language models. BinBin surpasses previous state-of-the-art techniques on three benchmark datasets across multiple domains. To our knowledge, this is the first work addressing X-shot learning, where X remains variable.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03500 [pdf, other]

Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 11 pages, 3 figures

arXiv:2403.03218 [pdf, other]

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Authors: Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, et al. (32 additional authors not shown)

Abstract: The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai

Submitted 15 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: See the project page at https://wmdp.ai

arXiv:2403.03212 [pdf, other]

Performance of a modular ton-scale pixel-readout liquid argon time projection chamber

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade, et al. (1340 additional authors not shown)

Abstract: The Module-0 Demonstrator is a single-phase 600 kg liquid argon time projection chamber operated as a prototype for the DUNE liquid argon near detector. Based on the ArgonCube design concept, Module-0 features a novel 80k-channel pixelated charge readout and advanced high-coverage photon detection system. In this paper, we present an analysis of an eight-day data set consisting of 25 million cosmic ray events collected in the spring of 2021. We use this sample to demonstrate the imaging performance of the charge and light readout systems as well as the signal correlations between the two. We also report argon purity and detector uniformity measurements, and provide comparisons to detector simulations.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 47 pages, 41 figures

Report number: FERMILAB-PUB-24-0073-LBNF

arXiv:2403.02012 [pdf, other]

OTFS vs OFDM: Which is Superior in Multiuser LEO Satellite Communications

Authors: Yu Liu, Ming Chen, Cunhua Pan, Tantao Gong, Jinhong Yuan, Jiangzhou Wang

Abstract: Orthogonal time frequency space (OTFS) modulation, a delay-Doppler (DD) domain communication scheme exhibiting strong robustness against the Doppler shifts, has the potential to be employed in LEO satellite communications. However, the performance comparison with the orthogonal frequency division multiplexing (OFDM) modulation and the resource allocation scheme for multiuser OTFS-based LEO satellite communication systems have rarely been investigated. In this paper, we conduct a performance comparison under various channel conditions between the OTFS and OFDM modulations, encompassing evaluations of sum-rate and bit error ratio (BER). Additionally, we investigate the joint optimal allocation of power and delay-Doppler resource blocks aiming at maximizing sum-rate for multiuser downlink OTFS-based LEO satellite communication systems. Unlike the conventional modulations relying on complex input-output relations within the Time-Frequency (TF) domain, the OTFS modulation exploits both time and frequency diversities, i.e., delay and Doppler shifts remain constant during an OTFS frame, which facilitates a simple DD domain input-output relation for our investigation. We transform the resulting non-convex and combinatorial optimization problem into an equivalent difference of convex problem by decoupling the conditional constraints, and solve the transformed problem via the penalty convex-concave procedure algorithm. Simulation results demonstrate that the OTFS modulation is robust to carrier frequency offsets (CFO) caused by high-mobility of LEO satellites, and has superior performance to the OFDM modulation. Moreover, numerical results indicate that our proposed resource allocation scheme has higher sum-rate than existing schemes for the OTFS modulation, such as delay divided multiple access and Doppler divided multiple access, especially in the high signal-to-noise ratio (SNR) regime.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 13 pages, 9 figures

arXiv:2403.01761 [pdf, other]

Observation of $ψ(3686)\to 3φ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (645 additional authors not shown)

Abstract: Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01710 [pdf, other]

Sensor-based Multi-Robot Coverage Control with Spatial Separation in Unstructured Environments

Authors: Xinyi Wang, Jiwen Xu, Chuanxiang Gao, Yizhou Chen, Jihan Zhang, Chenggang Wang, Ben M. Chen

Abstract: Multi-robot systems have increasingly become instrumental in tackling search and coverage problems. However, the challenge of optimizing task efficiency without compromising task success still persists, particularly in expansive, unstructured environments with dense obstacles. This paper presents an innovative, decentralized Voronoi-based approach for search and coverage to reactively navigate these complexities while maintaining safety. This approach leverages the active sensing capabilities of multi-robot systems to supplement GIS (Geographic Information System), offering a more comprehensive and real-time understanding of the environment. Based on point cloud data, which is inherently non-convex and unstructured, this method efficiently generates collision-free Voronoi regions using only local sensing information through spatial decomposition and spherical mirroring techniques. Then, a deadlock-aware guided map, integrated with a gradient-optimized, centroid Voronoi-based coverage control policy, is constructed to improve efficiency by avoiding exhaustive searches and local sensing pitfalls. The effectiveness of our algorithm has been validated through extensive numerical simulations in high-fidelity environments, demonstrating significant improvements in task success rate, coverage ratio, and task execution time compared with other methods.

Submitted 10 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2403.01639 [pdf, other]

Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models

Authors: Yuchen Wu, Minshuo Chen, Zihao Li, Mengdi Wang, Yuting Wei

Abstract: Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties. Such information is coined as guidance. For example, in text-to-image synthesis, text input is encoded as guidance to generate semantically aligned images. Proper guidance inputs are closely tied to the performance of diffusion models. A common observation is that strong guidance promotes a tight alignment to the task-specific information, while reducing the diversity of the generated samples. In this paper, we provide the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models. Under mild conditions, we prove that incorporating diffusion guidance not only boosts classification confidence but also diminishes distribution diversity, leading to a reduction in the differential entropy of the output distribution. Our analysis covers the widely adopted sampling schemes including DDPM and DDIM, and leverages comparison inequalities for differential equations as well as the Fokker-Planck equation that characterizes the evolution of probability density function, which may be of independent theoretical interest.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 41 pages, 12 figures

arXiv:2403.01504 [pdf, ps, other]

doi 10.1109/JMMCT.2024.3370729

Spin and Orbital Angular Momenta of Electromagnetic Waves: From Classical to Quantum Forms

Authors: Wei E. I. Sha, Zhihao Lan, Menglin L. N. Chen, Yongpin P. Chen, Sheng Sun

Abstract: Angular momenta of electromagnetic waves are important both in concepts and applications. In this work, we systematically discuss two types of angular momenta, i.e., spin angular momentum and orbital angular momentum in various cases, e.g., with source and without source, in classical and quantum forms. Numerical results demonstrating how to extract the topological charge of a classical vortex beam by spectral method are also presented.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 5 pages, 3 figures

Journal ref: IEEE Journal on Multiscale and Multiphysics Computational Techniques, 2024

arXiv:2403.01374 [pdf, other]

A Novel Dynamic Light-Section 3D Reconstruction Method for Wide-Range Sensing

Authors: Mengjuan Chen, Qing Li, Kohei Shimasaki, Shaopeng Hu, Qingyi Gu, Idaku Ishii

Abstract: Existing galvanometer-based laser scanning systems are challenging to apply in multi-scale 3D reconstruction because of the difficulty in achieving a balance between high reconstruction accuracy and a wide reconstruction range. This paper presents a novel method that synchronizes laser scanning by switching the field-of-view (FOV) of a camera using multi-galvanometers. In addition to the advanced hardware setup, we establish a comprehensive mathematical model of the system by modeling dynamic camera, dynamic laser, and their combined interaction. We then propose a high-precision and flexible calibration method by constructing an error model and minimizing the objective function. Finally, we evaluate the performance of the proposed system by scanning standard components. The evaluation results demonstrate that the accuracy of the proposed 3D reconstruction system achieves 0.3 mm when the measurement range is extended to 1100 mm $\times$ 1300 mm $\times$ 650 mm. With the same reconstruction accuracy, the reconstruction range is expanded by a factor of 25, indicating that the proposed method simultaneously allows for high-precision and wide-range 3D reconstruction in industrial applications.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 9 pages, 6 figures, Journal

MSC Class: First-level 68

ACM Class: I.4.9

arXiv:2403.01198 [pdf]

Organic solvent boosts charge storage and charging dynamics of conductive MOF supercapacitors

Authors: Ming Chen, Taizheng Wu, Liang Niu, Ting Ye, Wenlei Dai, Liang Zeng, Alexei A. Kornyshev, Zhenxiang Wang, Zhou Liu, Guang Feng

Abstract: Conductive metal-organic frameworks (c-MOFs) and ionic liquids (ILs) have emerged as auspicious combinations for high-performance supercapacitors. However, the nanoconfinement from c-MOFs and high viscosity of ILs slow down the charging process. This hindrance can, however, be resolved by adding solvent. Here, we performed constant-potential molecular simulations to scrutinize the solvent impact on charge storage and charging dynamics of MOF-IL-based supercapacitors. We find conditions for >100% enhancement in capacity and ~6 times increase in charging speed. These improvements were confirmed by synthesizing near-ideal c-MOFs and developing multiscale models linking molecular simulations to electrochemical measurements. Fundamentally, our findings elucidate that the solvent acts as an ionophobic agent to induce a substantial enhancement in charge storage, and as an ion traffic police to eliminate convoluted counterion and co-ion motion paths and create two distinct ion transport highways to accelerate charging dynamics. This work paves the way for the optimal design of MOF supercapacitors.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.01192 [pdf, other]

A Composite Decomposition Method for Large-Scale Global Optimization

Authors: Maojiang Tian, Minyang Chen, Wei Du, Yang Tang, Yaochu Jin, Gary G. Yen

Abstract: Cooperative co-evolution (CC) algorithms, based on the divide-and-conquer strategy, have emerged as the predominant approach to solving large-scale global optimization (LSGO) problems. The efficiency and accuracy of the grouping stage significantly impact the performance of the optimization process. While the general separability grouping (GSG) method has overcome the limitation of previous differential grouping (DG) methods by enabling the decomposition of non-additively separable functions, it suffers from high computational complexity. To address this challenge, this article proposes a composite separability grouping (CSG) method, seamlessly integrating DG and GSG into a problem decomposition framework to utilize the strengths of both approaches. CSG introduces a step-by-step decomposition framework that accurately decomposes various problem types using fewer computational resources. By sequentially identifying additively, multiplicatively and generally separable variables, CSG progressively groups non-separable variables by recursively considering the interactions between each non-separable variable and the formed non-separable groups. Furthermore, to enhance the efficiency and accuracy of CSG, we introduce two innovative methods: a multiplicatively separable variable detection method and a non-separable variable grouping method. These two methods are designed to effectively detect multiplicatively separable variables and efficiently group non-separable variables, respectively. Extensive experimental results demonstrate that CSG achieves more accurate variable grouping with lower computational complexity compared to GSG and state-of-the-art DG series designs.

Submitted 8 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.01053 [pdf, other]

Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling

Authors: Jianan Fan, Dongnan Liu, Hang Chang, Heng Huang, Mei Chen, Weidong Cai

Abstract: Machine learning holds tremendous promise for transforming the fundamental practice of scientific discovery by virtue of its data-driven nature. With the ever-increasing stream of research data collection, it would be appealing to autonomously explore patterns and insights from observational data for discovering novel classes of phenotypes and concepts. However, in the biomedical domain, there are several challenges inherently presented in the cumulated data which hamper the progress of novel class discovery. The non-i.i.d. data distribution accompanied by the severe imbalance among different groups of classes essentially leads to ambiguous and biased semantic representations. In this work, we present a geometry-constrained probabilistic modeling treatment to resolve the identified issues. First, we propose to parameterize the approximated posterior of instance embedding as a marginal von Mises-Fisher distribution to account for the interference of distributional latent bias. Then, we incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space, which in turn minimizes the uncontrollable risk for unknown class learning and structuring. Furthermore, a spectral graph-theoretic method is devised to estimate the number of potential novel classes. It inherits two intriguing merits compared to existent approaches, namely high computational efficiency and flexibility for taxonomy-adaptive estimation. Extensive experiments across various biomedical scenarios substantiate the effectiveness and general applicability of our method.

Submitted 5 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: CVPR 2024

Showing 201–250 of 3,457 results for author: Chen, M