Highest Fusion Performance without Harmful Edge Energy Bursts in Tokamak

Authors: SangKyeun Kim, Ricardo Shousha, SeongMoo Yang, Qiming Hu, SangHee Hahn, Azarakhsh Jalalvand, Jong-Kyu Park, Nikolas Christopher Logan, Andrew Oakleigh Nelson, Yong-Su Na, Raffi Nazikian, Robert Wilcox, Rongjie Hong, Terry Rhodes, Carlos Paz-Soldan, YoungMu Jeon, MinWoo Kim, WongHa Ko, JongHa Lee, Alexander Battey, Alessandro Bortolon, Joseph Snipes, Egemen Kolemen

Abstract: The path of tokamak fusion and ITER is maintaining high-performance plasma to produce sufficient fusion power. This effort is hindered by the transient energy burst arising from the instabilities at the boundary of high-confinement plasmas. The application of 3D magnetic perturbations is the method in ITER and possibly in future fusion power plants to suppress this instability and avoid energy bursts damaging the device. Unfortunately, the conventional use of the 3D field in tokamaks typically leads to degraded fusion performance and an increased risk of other plasma instabilities, two severe issues for reactor implementation. In this work, we present an innovative 3D field optimization, exploiting machine learning, real-time adaptability, and multi-device capabilities to overcome these limitations. This integrated scheme is successfully deployed on DIII-D and KSTAR tokamaks, consistently achieving reactor-relevant core confinement and the highest fusion performance without triggering damaging instabilities or bursts while demonstrating ITER-relevant automated 3D optimization for the first time. This is enabled both by advances in the physics understanding of self-organized transport in the plasma edge and by advances in machine-learning technology, which is used to optimize the 3D field spectrum for automated management of a volatile and complex system.
These findings establish real-time adaptive 3D field optimization as a crucial tool for ITER and future reactors to maximize fusion performance while simultaneously minimizing damage to machine components.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04819 [pdf, other]

DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

Authors: Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen

Abstract: Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on studying Alzheimer's Disease (AD), a specialized sub-field in biomedicine and a global health priority. With a synergized framework of LLM and KG mutually enhancing each other, we first leverage LLM to construct an evolving AD-specific knowledge graph (KG) sourced from AD-related scientific literature, and then we utilize a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment LLM inference capabilities. The experimental results, conducted on our constructed AD question answering (ADQA) benchmark, underscore the efficacy of DALK. Additionally, we perform a series of detailed analyses that can offer valuable insights and guidelines for the emerging topic of mutually enhancing KG and LLM. We will release the code and data at https://github.com/David-Li0406/DALK.

Submitted 12 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: Under Review; Incorrect author name revised

arXiv:2405.04532 [pdf, other]

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Authors: Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han

Abstract: Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. Nonetheless, state-of-the-art INT4 quantization techniques only accelerate low-batch, edge LLM inference, failing to deliver performance gains in large-batch, cloud-based LLM serving. We uncover a critical issue: existing INT4 quantization methods suffer from significant runtime overhead (20-90%) when dequantizing either weights or partial sums on GPUs. To address this challenge, we introduce QoQ, a W4A8KV4 quantization algorithm with 4-bit weight, 8-bit activation, and 4-bit KV cache. QoQ stands for quattuor-octo-quattuor, which represents 4-8-4 in Latin. QoQ is implemented by the QServe inference library that achieves measured speedup. The key insight driving QServe is that the efficiency of LLM serving on GPUs is critically influenced by operations on low-throughput CUDA cores. Building upon this insight, in the QoQ algorithm, we introduce progressive quantization that can allow low dequantization overhead in W4A8 GEMM. Additionally, we develop SmoothAttention to effectively mitigate the accuracy degradation incurred by 4-bit KV quantization. In the QServe system, we perform compute-aware weight reordering and take advantage of register-level parallelism to reduce dequantization latency. We also make fused attention memory-bound, harnessing the performance gain brought by KV4 quantization.
As a result, QServe improves the maximum achievable serving throughput of Llama-3-8B by 1.2x on A100, 1.4x on L40S; and Qwen1.5-72B by 2.4x on A100, 3.5x on L40S, compared to TensorRT-LLM. Remarkably, QServe on L40S GPU can achieve even higher throughput than TensorRT-LLM on A100. Thus, QServe effectively reduces the dollar cost of LLM serving by 3x. Code is available at https://github.com/mit-han-lab/qserve.

Submitted 10 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: The first three authors contribute equally to this project and are listed in the alphabetical order. Yujun Lin leads the quantization algorithm, Haotian Tang and Shang Yang lead the GPU kernels and the serving system. Code is available at https://github.com/mit-han-lab/qserve

arXiv:2405.04496 [pdf, other]

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

Authors: Yi Zuo, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Shuyuan Yang, Yuwei Guo

Abstract: Existing diffusion-based video editing methods have achieved impressive results in motion editing. Most of the existing methods focus on the motion alignment between the edited video and the reference video. However, these methods do not constrain the background and object content of the video to remain unchanged, which makes it possible for users to generate unexpected videos. In this paper, we propose a one-shot video motion editing method called Edit-Your-Motion that requires only a single text-video pair for training. Specifically, we design the Detailed Prompt-Guided Learning Strategy (DPL) to decouple spatio-temporal features in space-time diffusion models. DPL separates learning object content and motion into two training stages. In the first training stage, we focus on learning the spatial features (the features of object content) and breaking down the temporal relationships in the video frames by shuffling them. We further propose Recurrent-Causal Attention (RC-Attn) to learn the consistent content features of the object from unordered video frames. In the second training stage, we restore the temporal relationship in video frames to learn the temporal feature (the features of the background and object's motion). We also adopt the Noise Constraint Loss to smooth out inter-frame differences. Finally, in the inference stage, we inject the content features of the source object into the editing branch through a two-branch structure (editing branch and reconstruction branch).
With Edit-Your-Motion, users can edit the motion of objects in the source video to generate more exciting and diverse videos. Comprehensive qualitative experiments, quantitative experiments and user preference studies demonstrate that Edit-Your-Motion performs better than other methods.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04286 [pdf, other]

Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore

Authors: Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xuebo Liu, Lidia S. Chao, Min Zhang

Abstract: The efficacy of a large language model (LLM) generated text detector depends substantially on the availability of sizable training data. White-box zero-shot detectors, which require no such data, are nonetheless limited by the accessibility of the source model of the LLM-generated text. In this paper, we propose a simple but effective black-box zero-shot detection approach, predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts. This approach entails computing the Grammar Error Correction Score (GECScore) for the given text to distinguish between human-written and LLM-generated text. Extensive experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods, achieving an average AUROC of 98.7% and showing strong robustness against paraphrase and adversarial perturbation attacks.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03989 [pdf]

A Method for Parsing and Vectorization of Semi-structured Data used in Retrieval Augmented Generation

Authors: Hang Yang, Jing Guo, Jianchuan Qi, Jinliang Xie, Si Zhang, Siqi Yang, Nan Li, Ming Xu

Abstract: This paper presents a novel method for parsing and vectorizing semi-structured data to enhance the functionality of Retrieval-Augmented Generation (RAG) within Large Language Models (LLMs). We developed a comprehensive pipeline for converting various data formats into .docx, enabling efficient parsing and structured data extraction. The core of our methodology involves the construction of a vector database using Pinecone, which integrates seamlessly with LLMs to provide accurate, context-specific responses, particularly in environmental management and wastewater treatment operations. Through rigorous testing with both English and Chinese texts in diverse document formats, our results demonstrate a marked improvement in the precision and reliability of LLM outputs. The RAG-enhanced models displayed enhanced ability to generate contextually rich and technically accurate responses, underscoring the potential of vector knowledge bases in significantly boosting the performance of LLMs in specialized domains. This research not only illustrates the effectiveness of our method but also highlights its potential to revolutionize data processing and analysis in environmental sciences, setting a precedent for future advancements in AI-driven applications. Our code is available at https://github.com/linancn/TianGong-AI-Unstructure.git.

Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: 20 pages, 4 figures, 5 tables

arXiv:2405.03711 [pdf, other]

doi 10.1109/ACCESS.2024.3383322

Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

Authors: Xiao Hu, Tianshu Wang, Min Gong, Shaoshi Yang

Abstract: Guidance commands of flight vehicles are a series of data sets with fixed time intervals, thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. For the EFV, the objective of the guidance design entails progressively maximizing the residual velocity, subject to the constraint imposed by the given evasion distance. Thus an irregular dynamic max-min problem of extremely large-scale is formulated, where the time instant when the optimal solution can be attained is uncertain and the optimum solution depends on all the intermediate guidance commands generated before. For solving this problem, a two-step strategy is conceived. In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, despite the fact that the reward function, the neural network parameters and the learning rate are designed elaborately. Therefore, in the second step, we propose to invoke the evolution strategy (ES) based algorithm, which uses the result of PPO as the initial value, to further improve the quality of the solution by searching in the local space.
Simulation results demonstrate that the proposed guidance design method based on the PPO algorithm is capable of achieving a residual velocity of 67.24 m/s, higher than the residual velocities achieved by the benchmark soft actor-critic and deep deterministic policy gradient algorithms. Furthermore, the proposed ES-enhanced PPO algorithm outperforms the PPO algorithm by 2.7%, achieving a residual velocity of 69.04 m/s.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 13 pages, 13 figures, accepted to appear on IEEE Access, Mar. 2024

Journal ref: IEEE Access, vol. 12, pp. 48210-48222, Mar. 2024

arXiv:2405.03064 [pdf, other]

RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Sabrina Yang, Gang Wang, Xinyu Xing

Abstract: Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can often be trapped in a bottleneck without further progress. In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. The high-level idea of RICE is to construct a new initial state distribution that combines both the default initial states and critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states. Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound. We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.

Submitted 5 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024
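
The mixed initial state distribution described in the abstract above can be illustrated with a minimal sketch. This is not the RICE implementation: the abstract does not say how the two sets of states are combined, so the mixing probability `p_critical`, the function name `sample_initial_state`, and the representation of states as list items are all assumptions made for illustration.

```python
import random

def sample_initial_state(default_states, critical_states, p_critical=0.5, rng=random):
    """Draw one episode start state from a mixture of two state sets.

    With probability p_critical (a hypothetical knob, not from the paper)
    the start state comes from the critical states; otherwise it comes
    from the environment's default initial states.
    """
    if critical_states and rng.random() < p_critical:
        return rng.choice(critical_states)
    return rng.choice(default_states)

# Usage: states are opaque labels here; a real agent would reset the
# environment to the sampled state before continuing training.
defaults = ["s0"]
critical = ["s17", "s42"]
start = sample_initial_state(defaults, critical, p_critical=0.3)
```

Setting `p_critical` to 0 recovers ordinary training from the default initial states, which makes the mixture easy to compare against a baseline.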

arXiv:2405.02928 [pdf, other]

Probabilistic cellular automata with local transition matrices: synchronization, ergodicity, and inference

Authors: Erhan Bayraktar, Fei Lu, Mauro Maggioni, Ruoyu Wu, Sichen Yang

Abstract: We introduce a new class of probabilistic cellular automata that are capable of exhibiting rich dynamics such as synchronization and ergodicity and can be easily inferred from data. The system is a finite-state locally interacting Markov chain on a circular graph. Each site's subsequent state is random, with a distribution determined by its neighborhood's empirical distribution multiplied by a local transition matrix. We establish sufficient and necessary conditions on the local transition matrix for synchronization and ergodicity. Also, we introduce novel least squares estimators for inferring the local transition matrix from various types of data, which may consist of either multiple trajectories, a long trajectory, or ensemble sequences without trajectory information. Under suitable identifiability conditions, we show the asymptotic normality of these estimators and provide non-asymptotic bounds for their accuracy.

Submitted 23 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 30 pages, 3 figures

MSC Class: 60J10; 62F12
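
The update rule stated in the abstract above (each site's next state is drawn from its neighborhood's empirical distribution multiplied by a local transition matrix) can be sketched as follows. This is an illustrative reading, not the authors' code: the neighborhood is assumed to be the site itself plus `radius` sites on each side of the circle, all sites are assumed to update synchronously, and the names `step`, `P`, and `radius` are hypothetical.

```python
import random

def step(states, P, radius=1, rng=random):
    """One synchronous update of a probabilistic cellular automaton on a circle.

    states: list of ints in range(k), one per site.
    P: k x k row-stochastic local transition matrix (list of lists).
    radius: assumed neighborhood half-width (site plus radius neighbors per side).
    """
    n, k = len(states), len(P)
    size = 2 * radius + 1
    new_states = []
    for i in range(n):
        # Empirical distribution of states in the neighborhood of site i.
        counts = [0] * k
        for d in range(-radius, radius + 1):
            counts[states[(i + d) % n]] += 1
        mu = [c / size for c in counts]
        # Next-state distribution: row vector mu multiplied by P.
        dist = [sum(mu[a] * P[a][b] for a in range(k)) for b in range(k)]
        new_states.append(rng.choices(range(k), weights=dist)[0])
    return new_states

# Usage: two states on a ring of 8 sites with a mildly noisy matrix.
P = [[0.9, 0.1],
     [0.2, 0.8]]
config = [0, 1, 1, 0, 1, 0, 0, 1]
config = step(config, P)
```

Because the rule is linear in the local transition matrix, recording neighborhood distributions and observed next states along a trajectory gives a regression problem for `P`, which is the kind of setting the least squares estimators in the abstract address.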

arXiv:2405.02633 [pdf, other]

Risk Assessment for Nonlinear Cyber-Physical Systems under Stealth Attacks

Authors: Guang Chen, Zhicong Sun, Yulong Ding, Shuang-hua Yang

Abstract: Stealth attacks pose potential risks to cyber-physical systems because they are difficult to detect. Assessing the risk of systems under stealth attacks remains an open challenge, especially in nonlinear systems. To comprehensively quantify these risks, we propose a framework that considers both the reachability of a system and the risk distribution of a scenario. We propose an algorithm to approximate the reachability of a nonlinear system under stealth attacks with a union of standard sets. Meanwhile, we present a method to construct a risk field to formally describe the risk distribution in a given scenario. The intersection relationships of system reachability and risk regions in the risk field indicate that attackers can cause corresponding risks without being detected. Based on this, we introduce a metric to dynamically quantify the risk. Compared to traditional methods, our framework predicts the risk value in an explainable way and provides early warnings for safety control. We demonstrate the effectiveness of our framework through a case study of an automated warehouse.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 12 pages and 9 figures

arXiv:2405.02554 [pdf, ps, other]

Kinetic energy and streamline properties for irrotational equatorial wind waves

Authors: Jian Li, Shaojie Yang

Abstract: In this paper, we investigate kinetic energy and streamline properties for irrotational periodic geophysical traveling surface water waves propagating in equatorial oceanic regions. Relying on the methods from complex analysis, we prove the logarithmic convexity and monotonicity of specific flow variables. By means of conformal mappings, we derive some qualitative results for kinetic energy and streamline, such as streamline time-period being independent of any moment and any point on the streamline in steady flow, the concavity and monotonicity of total kinetic energy within the region between two streamlines and the convexity and monotonicity of total kinetic energy over a streamline time-period. Moreover, we present several results about irrotational equatorial wind waves, such as an upper bound of the minimum of streamline time-period, an upper bound of the maximum of area within the region between two streamlines. Taking advantage of Bernoulli's law and the Schwarz reflection principle, we show that the extremum of the kinetic energy is attained on the free surface for irrotational equatorial wind waves.

Submitted 19 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

MSC Class: 76B15; 30C20

arXiv:2405.02549 [pdf, other]

Defect-Assisted Domain Nucleation Drives Unique Exchange Bias Phenomena in $\bf{MnBi_2Te_4}$

Authors: Shiqi Yang, Xiaolong Xu, Yuchen Gao, Roger Guzman, Pingfan Gu, Huan Wang, Yuan Huang, Wu Zhou, Tianlong Xia, Yu Ye

Abstract: The study of the mechanism of exchange bias phenomena and the achievement of its efficient control are of great importance, as it promotes the revelation of unique exchange interactions and the development of exotic applications. However, it is challenging due to the elusive interface between magnetic phases. In this study, we report an unprecedented exchange bias phenomenon observed in ultrathin uncompensated antiferromagnetic MnBi$_2$Te$_4$. The magnitude and direction of the exchange field can be intentionally controlled by designing a magnetic field sweep protocol without a field cooling process. The combined experimental and theoretical simulation results indicate that the spin-flip process assisted by the ubiquitous defect-induced pinning domain sites with varying inner exchange interactions might give rise to the emergence and robustness of this peculiar exchange bias. The temperature and thickness dependence of the exchange bias phenomena are systematically investigated for further study and exploitation of its unique properties. This mechanism holds promise for highly tunable exchange bias in prevalent magnetic systems by engineering the properties of domain structures, and also offers promising avenues for the design of spintronic devices combining its topology based on MnBi$_2$Te$_4$.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 8 pages, 4 figures

arXiv:2405.02304 [pdf, other]

High-finesse nanofiber Fabry-Pérot resonator in a portable storage container

Authors: S. Horikawa, S. Yang, T. Tanaka, T. Aoki, S. Kato

Abstract: We present characterization and storage methods for a high-finesse nanofiber Fabry-Pérot resonator. Reflection spectroscopy from both ends of the resonator allows for evaluation of the mirror transmittances and optical loss inside the resonator. To maintain the quality of the nanofiber resonator after the fabrication, we have developed a portable storage container. By filling the container with dry, clean nitrogen gas, we can prevent contamination of the nanofiber during storage. This approach allows us to minimize the additional optical loss to less than 0.08% over a week. The portable container facilitates both the fabrication and subsequent experimentation with the resonator in different locations. This flexibility expands the range of applications, including quantum optics, communication, and sensing.

Submitted 7 May, 2024; v1 submitted 18 March, 2024; originally announced May 2024.

Comments: 4 pages, 3 figures; corrected typos and updated the address

arXiv:2405.01329 [pdf, other]

Decentralization of Ethereum's Builder Market

Authors: Sen Yang, Kartik Nayak, Fan Zhang

Abstract: Blockchains protect an ecosystem worth more than $500bn with their strong security properties derived from the principle of decentralization. Is today's blockchain really decentralized? In this paper, we empirically studied one of the least decentralized parts of Ethereum -- the most used blockchain system in practice -- and shed light on the decentralization issue from a new perspective. To avoid centralization caused by Maximal Extractable Value (MEV), Ethereum adopts a novel mechanism that produces blocks through a builder market. After two years in operation, however, the builder market has evolved to a highly centralized one with three builders producing more than 90% of blocks. Why does the builder market centralize, given that it is permissionless and anyone can join? Moreover, what are the security implications of a centralized builder market to MEV-Boost auctions? Through a rigorous empirical study of the builder market's core mechanism, MEV-Boost auctions, we answered these two questions using a large-scale auction dataset we curated since 2022. Unlike previous works that focus on who wins the auctions, we focus on why they win, to shed light on the openness, competitiveness, and efficiency of MEV-Boost auctions. Our findings also help identify directions for improving the decentralization of builder markets.

Submitted 2 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00713 [pdf, ps, other]

Some inequalities related to Riesz transform on exterior Lipschitz domains

Authors: Renjin Jiang, Sibei Yang

Abstract: Let $n\ge2$ and $\mathcal{L}=-\mathrm{div}(A\nabla\cdot)$ be an elliptic operator on $\mathbb{R}^n$. Given an exterior Lipschitz domain $Ω$, let $\mathcal{L}_D$ and $\mathcal{L}_N$ be the elliptic operators $\mathcal{L}$ on $Ω$ subject to the Dirichlet and the Neumann boundary conditions, respectively. For the Neumann operator, we show that the reverse inequality $\|\mathcal{L}_N^{1/2}f\|_{L^p(Ω)} \le C\|\nabla f\|_{L^p(Ω)}$ holds true for any $p\in(1,\infty)$. For the Dirichlet operator, it was known that the Riesz operator $\nabla \mathcal{L}_D^{-1/2}$ is not bounded for $p>2$ and $p\ge n$, even if $\mathcal{L}=-Δ$ is the Laplace operator. Assuming that $A$ has CMO coefficients or VMO coefficients satisfying a certain perturbation property, and that $\partialΩ$ is $C^1$, we prove that for $p>2$ and $p\in [n,\infty)$, it holds that $$ \inf_{φ\in\mathcal{A}^p_0(Ω)}\left\|\nabla f-\nablaφ\right\|_{L^p(Ω)}\le C\left\|\mathcal{L}^{1/2}_D f\right\|_{L^p(Ω)} $$ for $f\in \dot{W}^{1,p}_0(Ω)$. Here $\mathcal{A}^p_0(Ω)=\{f\in \dot{W}^{1,p}_0(Ω):\,\mathcal{L}_Df=0\}$ is a non-trivial subspace generated by harmonic functions in $Ω$ with zero boundary value.

Submitted 25 April, 2024; originally announced May 2024.

Comments: 24pp, comments are welcome

arXiv:2405.00098 [pdf, other]

Amplitude analysis and branching fraction measurement of $B^{+}\to D^{*-}D^{+}_{s}π^{+}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1057 additional authors not shown)

Abstract: The decays of the $B^{+}$ meson to the final state $D^{*-}D^{+}_{s}π^{+}$ are studied in proton-proton collision data collected with the LHCb detector at centre-of-mass energies of 7, 8, and 13 TeV, corresponding to a total integrated luminosity of 9 fb$^{-1}$. The ratio of branching fractions of the $B^{+}\to D^{*-}D^{+}_{s}π^{+}$ and $B^{0}\to D^{*-}D^{+}_{s}$ decays is measured to be $0.173\pm 0.006\pm 0.010$, where the first uncertainty is statistical and the second is systematic. Using partially reconstructed $D^{*+}_{s}\to D^{+}_{s}γ$ and $D^{+}_{s}π^{0}$ decays, the ratio of branching fractions between the $B^{+}\to D^{*-}D^{*+}_{s}π^{+}$ and $B^{+}\to D^{*-}D^{+}_{s}π^{+}$ decays is determined as $1.31\pm 0.07\pm 0.14$. An amplitude analysis of the $B^{+}\to D^{*-}D^{+}_{s}π^{+}$ decay is performed for the first time, revealing dominant contributions from known excited charm resonances decaying to the $D^{*-}π^{+}$ final state. No significant evidence of exotic contributions in the $D^{+}_{s}π^{+}$ or $D^{*-}D^{+}_{s}$ channels is found. The fit fraction of the scalar state $T_{c\bar{s} 0}^{\ast}(2900)^{++}$ observed in the $B^{+}\to D^{-}D^{+}_{s}π^{+}$ decay is determined to be less than 2.3% at a 90% confidence level.

Submitted 30 April, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-001.html (LHCb public pages)

Report number: LHCb-PAPER-2024-001, CERN-EP-2024-110

arXiv:2404.19510 [pdf, other]

First observation of $Λ_{b}^{0} \rightarrow Σ_c^{(*)++} D^{(*)-} K^{-}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1067 additional authors not shown)

Abstract: The four decays, $Λ_{b}^{0} \rightarrow Σ_c^{(*)++} D^{(*)-} K^{-}$, are observed for the first time using proton-proton collision data collected with the LHCb detector at a centre-of-mass energy of $13\,\rm{TeV}$, corresponding to an integrated luminosity of $6\,\rm{fb}^{-1}$. By considering the $Λ_b^0 \rightarrow Λ_c^{+} \overline{D}^0 K^{-}$ decay as reference channel, the following branching fraction ratios are measured to be, $$\frac{\cal{B} (Λ_{b}^{0} \rightarrow Σ_{c}^{++} \rm{D}^{-} {K}^{-})}{\cal{B}(Λ_{b}^{0} \rightarrow Λ_c^{+} \rm \overline{D}^0 {K}^{-})} = {0.282}\pm{0.016}\pm{0.016}\pm{0.005}, \frac{\cal{B}(Λ_{b}^{0} \rightarrow Σ_{c}^{*++} \rm {D}^{-} {K}^{-})}{\cal{B}(Λ_{b}^{0} \rightarrow Σ_c^{++} \rm {D}^{-} {K}^{-})} = {0.460}\pm{0.052}\pm{0.028}, \frac{\cal{B}(Λ_{b}^{0} \rightarrow Σ_{c}^{++} \rm {D}^{*-} {K}^{-})}{\cal{B}(Λ_{b}^{0} \rightarrow Σ_c^{++} \rm {D}^{-} {K}^{-})} = {2.261}\pm{0.202}\pm{0.129}\pm{0.046}, \frac{\cal{B}(Λ_{b}^{0} \rightarrow Σ_{c}^{*++} \rm D^{*-} K^{-})}{\cal{B}(Λ_{b}^{0} \rightarrow Σ_c^{++} \rm D^{-} K^{-})} = {0.896}\pm{0.137}\pm{0.066}\pm{0.018},$$ where the first uncertainties are statistical, the second are systematic, and the third are due to uncertainties in the branching fractions of intermediate particle decays. These initial observations mark the beginning of pentaquark searches in these modes, with more data set to become available following the LHCb upgrade.

Submitted 11 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-044.html (LHCb public pages)

Report number: LHCb-PAPER-2023-044, CERN-EP-2024-098

arXiv:2404.19382 [pdf, other]

Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective

Authors: Xiaoxuan Han, Songlin Yang, Wei Wang, Yang Li, Jing Dong

Abstract: Advanced text-to-image diffusion models raise safety concerns regarding identity privacy violation, copyright infringement, and Not Safe For Work content generation. Towards this, unlearning methods have been developed to erase these involved concepts from diffusion models. However, these unlearning methods only shift the text-to-image mapping and preserve the visual content within the generative space of diffusion models, leaving a fatal flaw for restoring these erased concepts. This erasure trustworthiness problem needs probing, but previous methods are sub-optimal from two perspectives: (1) Lack of transferability: Some methods operate within a white-box setting, requiring access to the unlearned model, and the learned adversarial input often fails to transfer to other unlearned models for concept restoration; (2) Limited attack: The prompt-level methods struggle to restore narrow concepts from unlearned models, such as celebrity identity. Therefore, this paper aims to leverage the transferability of the adversarial attack to probe the unlearning robustness under a black-box setting. This challenging scenario assumes that the unlearning method is unknown and the unlearned model is inaccessible for optimization, requiring the attack to be capable of transferring across different unlearned models. Specifically, we employ an adversarial search strategy to search for the adversarial embedding which can transfer across different unlearned models. This strategy adopts the original Stable Diffusion model as a surrogate model to iteratively erase and search for embeddings, enabling it to find the embedding that can restore the target concept for different unlearning methods. Extensive experiments demonstrate the transferability of the searched adversarial embedding across several state-of-the-art unlearning methods and its effectiveness for different levels of concepts.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.18861 [pdf, other]

A Survey on Vision Mamba: Models, Applications and Challenges

Authors: Rui Xu, Shu Yang, Yihui Wang, Bo Du, Hao Chen

Abstract: Mamba, a recent selective structured state space model, performs excellently on long sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional neural networks and offers advanced modeling capabilities similar to those of Transformers, through global receptive fields and dynamic weighting. Crucially, it achieves this without incurring the quadratic computational complexity typically associated with Transformers. Due to its advantages over the former two mainstream foundation models, Mamba exhibits great potential to be a visual foundation model. Researchers are actively applying Mamba to various computer vision tasks, leading to numerous emerging works. To help keep pace with the rapid advancements in computer vision, this paper aims to provide a comprehensive review of visual Mamba approaches. This paper begins by delineating the formulation of the original Mamba model. Subsequently, our review of visual Mamba delves into several representative backbone networks to elucidate the core insights of the visual Mamba. We then categorize related works using different modalities, including image, video, point cloud, multi-modal, and others. Specifically, for image applications, we further organize them into distinct tasks to facilitate a more structured discussion. Finally, we discuss the challenges and future research directions for visual Mamba, providing insights for future research in this quickly evolving area. A comprehensive list of visual Mamba models reviewed in this work is available at https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18814 [pdf, ps, other]

Belt and Brace: When Federated Learning Meets Differential Privacy

Authors: Xuebin Ren, Shusen Yang, Cong Zhao, Julie McCann, Zongben Xu

Abstract: Federated learning (FL) has great potential for large-scale machine learning (ML) without exposing raw data. Differential privacy (DP) is the de facto standard of privacy protection with provable guarantees. Advances in ML suggest that DP would be a perfect fit for FL with comprehensive privacy preservation. Hence, extensive efforts have been devoted to achieving practically usable FL with DP, which however is still challenging. Practitioners are often not fully aware of its development and categorization, and also face a hard choice between privacy and utility. Therefore, it calls for a holistic review of current advances and an investigation on the challenges and opportunities for highly usable FL systems with a DP guarantee. In this article, we first introduce the primary concepts of FL and DP, and highlight the benefits of integration. We then review the current developments by categorizing different paradigms and notions. Aiming at usable FL with DP, we present the optimization principles to seek a better tradeoff between model utility and privacy loss. Finally, we discuss future challenges in the emergent areas and relevant research topics.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 10 pages, 4 figures, accepted by and to appear in Communications of the ACM (CACM)

arXiv:2404.17173 [pdf, other]

Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification

Authors: Yanbiao Ma, Licheng Jiao, Fang Liu, Lingling Li, Shuyuan Yang, Xu Liu

Abstract: In semi-supervised learning, methods that rely on confidence learning to generate pseudo-labels have been widely proposed. However, increasing research finds that when faced with noisy and biased data, the model's representation network is more reliable than the classification network. Additionally, label generation methods based on model predictions often show poor adaptability across different datasets, necessitating customization of the classification network. Therefore, we propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels. We also introduce an adaptive method for selecting hyperparameters in HDL, enhancing its versatility. Moreover, HDL can be combined with general image encoders (e.g., CLIP) to serve as a fundamental data processing module. We extract embeddings from datasets with class-balanced and long-tailed distributions using pre-trained semi-supervised models. Subsequently, samples are re-labeled using HDL, and the re-labeled samples are used to further train the semi-supervised models. Experiments demonstrate improved model performance, validating the motivation that representation networks are more reliable than classifiers or predictors. Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16770 [pdf, other]

Pseudogap phase as fluctuating pair density wave

Authors: Zheng-Yuan Yue, Zheng-Tao Xu, Shuo Yang, Zheng-Cheng Gu

Abstract: The physical nature of pseudogap phase is one of the most important and intriguing problems towards understanding the key mechanism of high temperature superconductivity in cuprates. Theoretically, the square-lattice $t$-$J$ model is widely believed to be the simplest toy model that captures the essential physics of cuprate superconductors. We employ the Grassmann tensor product state approach to investigate uniform states in the underdoped ($δ\lesssim 0.1$) region. In addition to the previously known uniform $d$-wave state, we discover a strongly fluctuating pair density wave (PDW) state with wave vector $Q = (π, π)$. This fluctuating PDW state weakly breaks the $C_4$ rotational symmetry of the square lattice and has a lower or comparable energy to the $d$-wave state (depending on doping and the $t/J$ ratio), making it a promising candidate state for describing the pseudogap phase.

Submitted 15 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 10 pages, 13 figures, references added

arXiv:2404.15825 [pdf]

Impact of Top SiO2 interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

Authors: Tao Hu, Xianzhou Shao, Mingkai Bai, Xinpei Jia, Saifei Dai, Xiaoqing Sun, Runhao Han, Jia Yang, Xiaoyu Ke, Fengbin Tian, Shuai Yang, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

Abstract: We study the impact of top SiO2 interlayer thickness on memory window of Si channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. The memory window increases with thicker top SiO2. We realize the memory window of 6.3 V for 3.4 nm top SiO2. Moreover, we find that the endurance characteristic degrades with increasing the initial memory window.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 4 pages, 7 figures

arXiv:2404.15127 [pdf, other]

MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

Authors: Sunan He, Yuxiang Nie, Zhixuan Chen, Zhiyuan Cai, Hongmei Wang, Shu Yang, Hao Chen

Abstract: The rapid advancement of large-scale vision-language models has showcased remarkable capabilities across various tasks. However, the lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models. In this work, we present a diagnosis-guided bootstrapping strategy that exploits both image and label information to construct vision-language datasets. Based on the constructed dataset, we developed MedDr, a generalist foundation model for healthcare capable of handling diverse medical data modalities, including radiology, pathology, dermatology, retinography, and endoscopy. Moreover, during inference, we propose a simple but effective retrieval-augmented medical diagnosis strategy, which enhances the model's generalization ability. Extensive experiments on visual question answering, medical report generation, and medical image diagnosis demonstrate the superiority of our method.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14684 [pdf, other]

Tests of gravitational wave propagation with LIGO-Virgo catalog

Authors: Xian-Liang Wang, Shu-Cheng Yang, Wen-Biao Han

Abstract: In the framework of general relativity (GR), gravitational waves (GWs) are theorized to travel at the speed of light across all frequencies. However, Lorentz invariance (LI) violation and weak equivalence principle (WEP) violation may lead to frequency-dependent variations in the propagation speed of GWs, which can be examined by comparing the theoretical and observed discrepancies in the arrival times of GW signals at various frequencies. This provides us with an opportunity to test these theories. In theories involving LI violations, we focus on the massive gravity with the graviton mass $m_g$. In the case of WEP violation, different massless particles exposed to the same gravitational source should exhibit varying gravitational time delays. The gravitational time delay induced by massive gravitational sources is proportional to $γ+1$, where the parameter $γ=1$ in GR. Therefore, we can quantify these two violations using the graviton mass $m_g$ and $|Δγ|$, respectively. In this study, we use selected GW data from binary black hole coalescences in the LIGO-Virgo catalogs GWTC-2.1 and GWTC-3 to place constraints on the parameters $m_g$ and $|Δγ|$. Our most stringent constraints suggest that $m_g \lesssim 1.40\times10^{-26}eV/c^2$ at the upper limit of the 90% credible interval and $|Δγ| \lesssim 7.05 \times 10^{-16}$ at the 90% credible interval. We also compute Bayes factors for models that assume LI and WEP violations compared to the standard GW model, respectively. The absolute value of the natural logarithm of the Bayes factor is generally less than 2. Our analysis reveals no statistically significant preference for either model. Additionally, the Bayes factors between these two models do not provide obvious evidence in favor of either one.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 10 pages, 9 figures

arXiv:2404.13859 [pdf, other]

Unveiling and Mitigating Generalized Biases of DNNs through the Intrinsic Dimensions of Perceptual Manifolds

Authors: Yanbiao Ma, Licheng Jiao, Fang Liu, Lingling Li, Wenping Ma, Shuyuan Yang, Xu Liu, Puhua Chen

Abstract: Building fair deep neural networks (DNNs) is a crucial step towards achieving trustworthy artificial intelligence. Delving into deeper factors that affect the fairness of DNNs is paramount and serves as the foundation for mitigating model biases. However, current methods are limited in accurately predicting DNN biases, relying solely on the number of training samples and lacking more precise measurement tools. Here, we establish a geometric perspective for analyzing the fairness of DNNs, comprehensively exploring how DNNs internally shape the intrinsic geometric characteristics of datasets, i.e., the intrinsic dimensions (IDs) of perceptual manifolds, and the impact of IDs on the fairness of DNNs. Based on multiple findings, we propose Intrinsic Dimension Regularization (IDR), which enhances the fairness and performance of models by promoting the learning of concise and ID-balanced class perceptual manifolds. In various image recognition benchmark tests, IDR significantly mitigates model bias while improving its performance.

Submitted 17 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 8 pages, 6 figures, Submitted to TPAMI

arXiv:2404.13603 [pdf, other]

Beyond MMSE: Rank-1 Subspace Channel Estimator for Massive MIMO Systems

Authors: Bin Li, Ziping Wei, Shaoshi Yang, Yang Zhang, Jun Zhang, Chenglin Zhao, Sheng Chen

Abstract: To glean the benefits offered by massive multi-input multi-output (MIMO) systems, channel state information must be accurately acquired. Despite the high accuracy, the computational complexity of classical linear minimum mean squared error (MMSE) estimator becomes prohibitively high in the context of massive MIMO, while the other low-complexity methods degrade the estimation accuracy seriously. In this paper, we develop a novel rank-1 subspace channel estimator to approximate the maximum likelihood (ML) estimator, which outperforms the linear MMSE estimator, but incurs a surprisingly low computational complexity. Our method first acquires the highly accurate angle-of-arrival (AoA) information via a constructed space-embedding matrix and the rank-1 subspace method. Then, it adopts the post-reception beamforming to acquire the unbiased estimate of channel gains. Furthermore, a fast method is designed to implement our new estimator. Theoretical analysis shows that the extra gain achieved by our method over the linear MMSE estimator grows according to the rule of O($\log_{10}M$), while its computational complexity is linearly scalable to the number of antennas $M$. Numerical simulations also validate the theoretical results. Our new method substantially extends the accuracy-complexity region and constitutes a promising channel estimation solution to the emerging massive MIMO communications.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 15 pages, 12 figures, accepted to appear on IEEE Transactions on Communications, Apr. 2024

arXiv:2404.12817 [pdf, other]

Determination of the CKM angle $φ_{3}$ from a combination of Belle and Belle II results

Authors: Belle, Belle II Collaborations, I. Adachi, L. Aggarwal, H. Aihara, N. Akopov, A. Aloisio, S. Al Said, N. Anh Ky, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, et al. (377 additional authors not shown)

Abstract: We report a determination of the CKM angle $φ_{3}$, also known as $γ$, from a combination of measurements using samples of up to 711~fb$^{-1}$ from the Belle experiment and up to 362~fb$^{-1}$ from the Belle II experiment. We combine results from analyses of $B^+\to DK^+, B^+\to Dπ^+$, and $B^+ \to D^{*}K^+$ decays, where $D$ is an admixture of $D^0$ and $\overline{D}{}^{0}$ mesons, in a likelihood fit to obtain $φ_{3} = (78.6^{+7.2}_{-7.3})^{\circ}$. We also briefly discuss the interpretation of this result.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 31 pages, 4 figures

Report number: Belle II Preprint 2023-015, KEK Preprint 2023-31

arXiv:2404.12412 [pdf]

Alloyed Re$_x$Mo$_{1-x}$S$_2$ Nanoflakes with Enlarged Interlayer Distances for Hydrogen Evolution

Authors: Jing Li, René Hübner, Marielle Deconinck, Ankita Bora, Markus Göbel, Dana Schwarz, Guangbo Chen, Guangzhao Wang, Shengyuan A. Yang, Yana Vaynzof, Vladimir Lesnyak

Abstract: Molybdenum sulfide (MoS$_2$) has attracted significant attention due to its great potential as a low-cost and efficient catalyst for the hydrogen evolution reaction. Developing a facile, easily upscalable, and inexpensive approach to produce catalytically active nanostructured MoS$_2$ with a high yield would significantly advance its practical application. Colloidal synthesis offers several advantages over other preparation techniques to overcome the low reaction yield of exfoliation and drawbacks of expensive equipment and processes used in chemical vapor deposition. In this work, we report an efficient synthesis of alloyed Re$_x$Mo$_{1-x}$S$_2$ nanoflakes with an enlarged interlayer distance, among which the composition Re$_{0.55}$Mo$_{0.45}$S$_2$ exhibits excellent catalytic performance with overpotentials as low as 79 mV at 10 mA/cm$^2$ and a small Tafel slope of 42 mV/dec. Density functional theory calculations prove that enlarging the distance between layers in the Re$_x$Mo$_{1-x}$S$_2$ alloy can greatly improve its catalytic performance due to a significantly reduced free energy of hydrogen adsorption. The developed approach paves the way to design advanced transition metal dichalcogenide-based catalysts for hydrogen evolution and to promote their large-scale practical application.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.12031 [pdf, other]

MLS-Track: Multilevel Semantic Interaction in RMOT

Authors: Zeliang Ma, Song Yang, Zhe Cui, Zhicheng Zhao, Fei Su, Delong Liu, Jingyu Wang

Abstract: The new trend in the multi-object tracking task is to track objects of interest using natural language. However, the scarcity of paired prompt-instance data hinders its progress. To address this challenge, we propose a high-quality yet low-cost data generation method based on Unreal Engine 5 and construct a brand-new benchmark dataset, named Refer-UE-City, which primarily includes scenes from intersection surveillance videos, detailing the appearance and actions of people and vehicles. Specifically, it provides 14 videos with a total of 714 expressions, and is comparable in scale to the Refer-KITTI dataset. Additionally, we propose a multi-level semantic-guided multi-object framework called MLS-Track, where the interaction between the model and text is enhanced layer by layer through the introduction of the Semantic Guidance Module (SGM) and Semantic Correlation Branch (SCB). Extensive experiments on the Refer-UE-City and Refer-KITTI datasets demonstrate the effectiveness of our proposed framework, which achieves state-of-the-art performance. Code and datasets will be available.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 17 pages, 8 figures

arXiv:2404.11998 [pdf, other]

Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation

Authors: Qiyuan Dai, Sibei Yang

Abstract: Referring image segmentation (RIS) aims to precisely segment referents in images through corresponding natural language expressions, yet relying on cost-intensive mask annotations. Weakly supervised RIS thus learns from image-text pairs to pixel-level semantics, which is challenging for segmenting fine-grained masks. A natural approach to enhancing segmentation precision is to empower weakly supervised RIS with the image segmentation foundation model SAM. Nevertheless, we observe that simply integrating SAM yields limited benefits and can even lead to performance regression due to the inevitable noise issues and challenges in excessive focus on object parts. In this paper, we present an innovative framework, Point PrompTing (PPT), incorporated with the proposed multi-source curriculum learning strategy to address these challenges. Specifically, the core of PPT is a point generator that not only harnesses CLIP's text-image alignment capability and SAM's powerful mask generation ability but also generates negative point prompts to address the noisy and excessive focus issues inherently and effectively. In addition, we introduce a curriculum learning strategy with object-centric images to help PPT gradually learn from simpler yet precise semantic alignment to more complex RIS. Experiments demonstrate that our PPT significantly and consistently outperforms prior weakly supervised techniques on mIoU by 11.34%, 14.14%, and 6.97% across RefCOCO, RefCOCO+, and G-Ref, respectively.

Submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2404.11957 [pdf, other]

The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models

Authors: Cheng Shi, Sibei Yang

Abstract: Foundation models, pre-trained on a large amount of data, have demonstrated impressive zero-shot capabilities in various downstream tasks. However, in object detection and instance segmentation, two fundamental computer vision tasks heavily reliant on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, \textit{i.e.}, these foundation models fail to discern boundaries between individual objects. For the first time, we probe that CLIP, which has never accessed any instance-level annotations, can provide a highly beneficial and strong instance-level boundary prior in the clustering results of its particular intermediate layer. Following this surprising observation, we propose $\textbf{Zip}$ which $\textbf{Z}$ips up CL$\textbf{ip}$ and SAM in a novel classification-first-then-discovery pipeline, enabling annotation-free, complex-scene-capable, open-vocabulary object detection and instance segmentation. Our Zip significantly boosts SAM's mask AP on COCO dataset by 12.5% and establishes state-of-the-art performance in various settings, including training-free, self-training, and label-efficient finetuning. Furthermore, annotation-free Zip even achieves comparable performance to the best-performing open-vocabulary object detectors using base annotations. Code is released at https://github.com/ChengShiest/Zip-Your-CLIP

Submitted 18 April, 2024; originally announced April 2024.

Comments: ICLR2024, Code is released at https://github.com/ChengShiest/Zip-Your-CLIP

arXiv:2404.11539 [pdf, other]

Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis

Authors: Soyoung Yang, Won Ik Cho

Abstract: In the era of rapid evolution of generative language models within the realm of natural language processing, there is an imperative call to revisit and reformulate evaluation methodologies, especially in the domain of aspect-based sentiment analysis (ABSA). This paper addresses the emerging challenges introduced by the generative paradigm, which has moderately blurred traditional boundaries between understanding and generation tasks. Building upon prevailing practices in the field, we analyze the advantages and shortcomings associated with the prevalent ABSA evaluation paradigms. Through an in-depth examination, supplemented by illustrative examples, we highlight the intricacies involved in aligning generative outputs with other evaluative metrics, specifically those derived from other tasks, including question answering. While we steer clear of advocating for a singular and definitive metric, our contribution lies in paving the path for a comprehensive guideline tailored for ABSA evaluations in this generative paradigm. In this position paper, we aim to provide practitioners with profound reflections, offering insights and directions that can aid in navigating this evolving landscape, ensuring evaluations that are both accurate and reflective of generative capabilities.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.10874 [pdf, other]

doi 10.1103/PhysRevD.109.L111103

Measurement of the branching fraction of the decay $B^- \to D^0 ρ(770)^-$ at Belle II

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Aihara, N. Akopov, A. Aloisio, N. Anh Ky, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, F. Becherer, J. Becker, J. V. Bennett , et al. (367 additional authors not shown)

Abstract: We measure the branching fraction of the decay $B^- \to D^0 ρ(770)^-$ using data collected with the Belle II detector. The data contain 387 million $B\overline{B}$ pairs produced in $e^+e^-$ collisions at the $Υ(4S)$ resonance. We reconstruct $8360\pm 180$ decays from an analysis of the distributions of the $B^-$ energy and the $ρ(770)^-$ helicity angle. We determine the branching fraction to be $(0.939 \pm 0.021\mathrm{(stat)} \pm 0.050\mathrm{(syst)})\%$, in agreement with previous results. Our measurement improves the relative precision of the world average by more than a factor of two.

Submitted 27 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Report number: Belle II Preprint 2024-011, KEK Preprint 2024-4

Journal ref: PRD 109, 111103 (2024)

arXiv:2404.10660 [pdf, other]

Discovery of the optical and radio counterpart to the fast X-ray transient EP240315a

Authors: J. H. Gillanders, L. Rhodes, S. Srivastav, F. Carotenuto, J. Bright, M. E. Huber, H. F. Stevance, S. J. Smartt, K. C. Chambers, T. -W. Chen, R. Fender, A. Andersson, A. J. Cooper, P. G. Jonker, F. J. Cowie, T. deBoer, N. Erasmus, M. D. Fulton, H. Gao, J. Herman, C. -C. Lin, T. Lowe, E. A. Magnier, H. -Y. Miao, P. Minguez , et al. (14 additional authors not shown)

Abstract: Fast X-ray Transients (FXTs) are extragalactic bursts of soft X-rays first identified >10 years ago. Since then, nearly 40 events have been discovered, although almost all of these have been recovered from archival Chandra and XMM-Newton data. To date, optical sky surveys and follow-up searches have not revealed any multi-wavelength counterparts. The Einstein Probe, launched in January 2024, has started surveying the sky in the soft X-ray regime (0.5-4 keV) and will rapidly increase the sample of FXTs discovered in real time. Here, we report the first discovery of both an optical and radio counterpart to a distant FXT, the fourth source publicly released by the Einstein Probe. We discovered a fast-fading optical transient within the 3 arcmin localisation radius of EP240315a with the all-sky optical survey ATLAS, and our follow-up Gemini spectrum provides a redshift, z=4.859+/-0.002. Furthermore, we uncovered a radio counterpart in the S-band (3.0 GHz) with the MeerKAT radio interferometer. The optical (rest-frame UV) and radio luminosities indicate the FXT most likely originates from either a long gamma-ray burst or a relativistic tidal disruption event. This may be a fortuitous early mission detection by the Einstein Probe or may signpost a mode of discovery for high-redshift, high-energy transients through soft X-ray surveys, combined with locating multi-wavelength counterparts.

Submitted 19 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: Updated to match version accepted for publication in ApJL (17 pages, 4 figures, 2 tables)

arXiv:2404.10576 [pdf, other]

doi 10.1103/PhysRevA.109.062226

Quantum phase transition and critical behavior between the gapless topological phases

Authors: Hao-Long Zhang, Han-Ze Li, Sheng Yang, Xue-Jia Yu

Abstract: The phase transition between gapped topological phases represents a class of unconventional criticality beyond the Landau paradigm. However, recent research has shifted attention to topological phases without a bulk gap, where the phase transitions between them are still elusive. In this work, based on large-scale density matrix renormalization group techniques, we investigate the critical behaviors of the extended quantum XXZ model obtained by the Kennedy-Tasaki transformation. Using fidelity susceptibility as a diagnostic, we obtain a complete phase diagram, which includes both topologically nontrivial and trivial gapless phases. Furthermore, as the XXZ-type anisotropy parameter $Δ$ varies, both the critical points $h_c$ and correlation length exponent $ν$ remain the same as in the $Δ=0$ case, characterized by $c=3/2$ (Ising + free boson) conformal field theory. Our results indicate that fidelity susceptibility can effectively detect and reveal a stable unconventional critical line between the topologically distinct gapless phases for general $Δ$. This work serves as a valuable reference for further research on phase transitions within the gapless topological phase of matter.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 10 pages, 11 figures. Any comments or suggestions are welcome!

arXiv:2404.10306 [pdf, other]

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

Authors: Hengyuan Zhang, Yanru Wu, Dawei Li, Sak Yang, Rui Zhao, Yong Jiang, Fei Tan

Abstract: Aligned Large Language Models (LLMs) showcase remarkable versatility, capable of handling diverse real-world tasks. Meanwhile, aligned LLMs are also expected to exhibit speciality, excelling in specific applications. However, fine-tuning with extra data, a common practice to gain speciality, often leads to catastrophic forgetting (CF) of previously acquired versatility, hindering the model's performance across diverse tasks. In response to this challenge, we propose CoFiTune, a coarse to fine framework in an attempt to strike the balance between speciality and versatility. At the coarse-grained level, an empirical tree-search algorithm is utilized to pinpoint and update specific modules that are crucial for speciality, while keeping other parameters frozen; at the fine-grained level, a soft-masking mechanism regulates the update to the LLMs, mitigating the CF issue without harming speciality. In an overall evaluation of both speciality and versatility, CoFiTune consistently outperforms baseline methods across diverse tasks and model scales. Compared to the full-parameter SFT, CoFiTune leads to about 14% versatility improvement and marginal speciality loss on a 13B model. Lastly, based on further analysis, we provide a speculative insight into the information forwarding process in LLMs, which helps explain the effectiveness of the proposed method. The code is available at https://github.com/rattlesnakey/CoFiTune.

Submitted 3 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 43 pages, 10 figures, accepted by ACL 2024 Findings

arXiv:2404.09990 [pdf, other]

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Authors: Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie

Abstract: This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches relying on attribute guidance or human feedback on building datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure its high quality, diverse examples are first collected online, expanded, and then used to create high-quality diptychs featuring input and output images with detailed text prompts, followed by precise alignment ensured through post-processing. In addition, we propose two evaluation metrics, Alignment and Coherence, to quantitatively assess the quality of image edit pairs using GPT-4V. HQ-Edit's high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing models. For example, an HQ-Edit finetuned InstructPix2Pix can attain state-of-the-art image editing performance, even surpassing those models fine-tuned with human-annotated data. The project page is https://thefllood.github.io/HQEdit_web.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Project Page: https://thefllood.github.io/HQEdit_web

arXiv:2404.09385 [pdf, other]

A Large-Scale Evaluation of Speech Foundation Models

Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles.
Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol and the statistical significance and robustness of the benchmark.

Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

arXiv:2404.09248 [pdf, other]

Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

Authors: Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu

Abstract: Reinforcement learning (RL) trains agents to accomplish complex tasks through environmental interaction data, but its capacity is also limited by the scope of the available data. To obtain a knowledgeable agent, a promising approach is to leverage the knowledge from large language models (LLMs). Despite previous studies combining LLMs with RL, seamless integration of the two components remains challenging due to their semantic gap. This paper introduces a novel method, Knowledgeable Agents from Language Model Rollouts (KALM), which extracts knowledge from LLMs in the form of imaginary rollouts that can be easily learned by the agent through offline reinforcement learning methods. The primary challenge of KALM lies in LLM grounding, as LLMs are inherently limited to textual data, whereas environmental data often comprise numerical vectors unseen to LLMs. To address this, KALM fine-tunes the LLM to perform various tasks based on environmental data, including bidirectional translation between natural language descriptions of skills and their corresponding rollout data. This grounding process enhances the LLM's comprehension of environmental dynamics, enabling it to generate diverse and meaningful imaginary rollouts that reflect novel skills. Initial empirical evaluations on the CLEVR-Robot environment demonstrate that KALM enables agents to complete complex rephrasings of task goals and extend their capabilities to novel tasks requiring unprecedented optimal behaviors.
KALM achieves a success rate of 46% in executing tasks with unseen goals, substantially surpassing the 26% success rate achieved by baseline methods. Furthermore, KALM effectively enables the LLM to comprehend environmental dynamics, resulting in the generation of meaningful imaginary rollouts that reflect novel skills and demonstrate the seamless integration of large language models and reinforcement learning.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.08725 [pdf, other]

Development of a data overflow protection system for Super-Kamiokande to maximize data from nearby supernovae

Authors: M. Mori, K. Abe, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, J. Kameda, Y. Kanemura, R. Kaneshima, Y. Kashiwagi, Y. Kataoka, S. Miki, S. Mine, M. Miura, S. Moriyama, Y. Nakano, M. Nakahata, S. Nakayama, Y. Noguchi, K. Okamoto, K. Sato, H. Sekiya, H. Shiba, K. Shimizu , et al. (230 additional authors not shown)

Abstract: Neutrinos from very nearby supernovae, such as Betelgeuse, are expected to generate more than ten million events over 10\,s in Super-Kamiokande (SK). At such large event rates, the buffers of the SK analog-to-digital conversion board (QBEE) will overflow, causing random loss of data that is critical for understanding the dynamics of the supernova explosion mechanism. In order to solve this problem, two new DAQ modules were developed to aid in the observation of very nearby supernovae. The first of these, the SN module, is designed to save only the number of hit PMTs during a supernova burst and the second, the Veto module, prescales the high rate neutrino events to prevent the QBEE from overflowing based on information from the SN module. In the event of a very nearby supernova, these modules allow SK to reconstruct the time evolution of the neutrino event rate from beginning to end using both QBEE and SN module data. This paper presents the development and testing of these modules together with an analysis of supernova-like data generated with a flashing laser diode. We demonstrate that the Veto module successfully prevents DAQ overflows for Betelgeuse-like supernovae as well as the long-term stability of the new modules. During normal running the Veto module is found to issue DAQ vetos a few times per month resulting in a total dead time less than 1\,ms, and does not influence ordinary operations.
Additionally, using simulation data we find that supernovae closer than 800~pc will trigger the Veto module, resulting in a prescaling of the observed neutrino data.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 28 pages, 18 figures. Submitted to PTEP

arXiv:2404.08450 [pdf, other]

Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

Authors: Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu

Abstract: Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods are considered to integrate a model that simultaneously addresses both physical and digital attacks, implying the necessity to develop and maintain multiple models. To jointly detect physical and digital attacks within a single model, we propose an innovative approach that can adapt to any network architecture. Our approach mainly contains two types of data augmentation, which we call Simulated Physical Spoofing Clues augmentation (SPSC) and Simulated Digital Spoofing Clues augmentation (SDSC). SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improve the capability of the model to detect "unseen" attack types. Extensive experiments show that SPSC and SDSC can achieve state-of-the-art generalization in Protocols 2.1 and 2.2 of the UniAttackData dataset, respectively. Our method won first place in "Unified Physical-Digital Face Attack Detection" of the 5th Face Anti-spoofing Challenge@CVPR2024. Our final submission obtains 3.75% APCER, 0.93% BPCER, and 2.34% ACER, respectively. Our code is available at https://github.com/Xianhua-He/cvpr2024-face-anti-spoofing-challenge.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 10 pages with 6 figures, Accepted by CVPRW 2024

arXiv:2404.08341 [pdf, other]

Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

Authors: Yang Li, Songlin Yang, Wei Wang, Ziwen He, Bo Peng, Jing Dong

Abstract: Highly realistic AI-generated face forgeries known as deepfakes have raised serious social concerns. Although DNN-based face forgery detection models have achieved good performance, they are vulnerable to the latest generative methods that have fewer forgery traces and to adversarial attacks. This limitation of generalization and robustness hinders the credibility of detection results and requires more explanations. In this work, we provide counterfactual explanations for face forgery detection from an artifact removal perspective. Specifically, we first invert the forgery images into the StyleGAN latent space, and then adversarially optimize their latent representations with the discrimination supervision from the target detection model. We verify the effectiveness of the proposed explanations from two aspects: (1) Counterfactual Trace Visualization: the enhanced forgery images are useful to reveal artifacts by visually contrasting the original images and two different visualization methods; (2) Transferable Adversarial Attacks: the adversarial forgery images generated by attacking the detection model are able to mislead other detection models, implying the removed artifacts are general. Extensive experiments demonstrate that our method achieves over 90% attack success rate and superior attack transferability.
Compared with naive adversarial noise methods, our method adopts both generative and discriminative model priors, and optimizes the latent representations in a synthesis-by-analysis way, which forces the search of counterfactual explanations on the natural face manifold. Thus, more general counterfactual traces can be found and better adversarial attack transferability can be achieved.

Submitted 12 April, 2024; originally announced April 2024.

Comments: Accepted to ICME2024

arXiv:2404.08215 [pdf, other]

Stability and noncentered PT symmetry of real topological phases

Authors: S. J. Yue, Qing Liu, Shengyuan A. Yang, Y. X. Zhao

Abstract: Real topological phases protected by the spacetime inversion (PT) symmetry are a current research focus. The basis is that the PT symmetry endows a real structure in momentum space, which leads to Z2 topological classifications in 1D and 2D. Here, we provide solutions to two outstanding problems in the diagnosis of real topology. First, based on the stable equivalence in K-theory, we clarify that the 2D topological invariant remains well defined in the presence of a nontrivial 1D invariant, and we develop a general numerical approach for its evaluation, which was hitherto unavailable. Second, under the unit-cell convention, noncentered PT symmetries assume momentum dependence, which violates the presumption in previous methods for computing the topological invariants. We clarify the classifications for this case and formulate the invariants by introducing a twisted Wilson-loop operator for both 1D and 2D. A simple model on a rectangular lattice is constructed to demonstrate our theory, which can be readily realized using artificial crystals.

Submitted 16 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.08133 [pdf, other]

Search for rare $b \to d\ell^+\ell^-$ transitions at Belle

Authors: Belle, Belle II Collaborations, :, I. Adachi, L. Aggarwal, H. Aihara, N. Akopov, A. Aloisio, S. Al Said, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Beaubien, F. Becherer, J. Becker , et al. (371 additional authors not shown)

Abstract: We present the results of a search for the $b \to d\ell^+\ell^-$ flavor-changing neutral-current rare decays $B^{+, 0} \to (η, ω, π^{+,0}, ρ^{+, 0}) e^+e^-$ and $B^{+, 0} \to (η, ω, π^{0}, ρ^{+}) μ^+μ^-$ using a $711$ fb$^{-1}$ data sample that contains $772 \times 10^{6}$ $B\overline{B}$ events. The data were collected at the $Υ(4S)$ resonance with the Belle detector at the KEKB asymmetric-energy $e^+e^-$ collider. We find no evidence for signal and set upper limits on branching fractions at the $90\%$ confidence level in the range $(3.8 - 47) \times 10^{-8}$ depending on the decay channel. The obtained limits are the world's best results. This is the first search for the channels $B^{+, 0} \to (ω, ρ^{+,0}) e^+e^-$ and $B^{+, 0} \to (ω, ρ^{+})μ^+μ^-$.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 7 pages, 12 figures

Report number: Belle II Preprint 2024-005, KEK Preprint 2023-52

arXiv:2404.08064 [pdf]

The Impact of Speech Anonymization on Pathology and Its Limits

Authors: Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

Abstract: Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. This study investigates anonymization's impact on pathological speech across over 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal processing-based anonymization methods, and document substantial privacy improvements across disorders, evidenced by equal error rate increases of up to 1933%, with minimal overall impact on utility. Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate experienced minimal utility changes, while Dysglossia showed slight improvements. Our findings underscore that the impact of anonymization varies substantially across different disorders. This necessitates disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis revealed consistent anonymization effects across most of the demographics.
This study demonstrates the effectiveness of anonymization in pathological speech for enhancing privacy, while also highlighting the importance of customized and disorder-specific approaches to account for inversion attacks.

Submitted 22 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07904 [pdf, other]

HGRN2: Gated Linear RNNs with State Expansion

Authors: Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

Abstract: Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, which limits its expressiveness. To address this issue, inspired by linear attention, we introduce a simple outer-product-based state expansion mechanism so that the recurrent state size can be significantly enlarged without introducing any additional parameters. The linear attention form also allows for hardware-efficient training. Our extensive experiments verify the advantage of HGRN2 over HGRN1 in language modeling, image classification, and Long Range Arena. Our largest 3B HGRN2 model slightly outperforms Mamba and LLaMa Architecture Transformer for language modeling in a controlled experiment setting, and performs competitively with many open-source 3B models in downstream evaluation while using far fewer total training tokens.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Technical Report. Yiran Zhong is the corresponding author. The source code is available at https://github.com/OpenNLPLab/HGRN2

arXiv:2404.07343 [pdf, other]

Monitoring AGNs with H$β$ Asymmetry. IV. First Reverberation Mapping Results of 14 AGNs

Authors: T. E. Zastrocky, Michael S. Brotherton, Pu Du, Jacob N. McLane, Kianna A. Olson, D. A. Dale, H. A. Kobulnicky, Jaya Maithil, My L. Nguyen, William T. Chick, David H. Kasper, Derek Hand, C. Adelman, Z. Carter, G. Murphree, M. Oeur, T. Roth, S. Schonsberg, M. J. Caradonna, J. Favro, A. J. Ferguson, I. M. Gonzalez, L. M. Hadding, H. D. Hagler, C. J. Rogers, et al. (19 additional authors not shown)

Abstract: We report first-time reverberation mapping results for 14 AGNs from the ongoing Monitoring AGNs with H$β$ Asymmetry campaign (MAHA). These results utilize optical spectra obtained with the Long Slit Spectrograph on the Wyoming Infrared 2.3 m Telescope between 2017 November and 2023 May. MAHA combines long-duration monitoring with high cadence. We report results from multiple observing seasons for 9 of the 14 objects. These results include H$β$ time lags, supermassive black hole masses, and velocity-resolved time lags. The velocity-resolved lags allow us to investigate the kinematics of the broad-line region.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 35 pages, 19 figures, accepted for publication in ApJ Supplement

arXiv:2404.07155 [pdf, other]

Unified Language-driven Zero-shot Domain Adaptation

Authors: Senqiao Yang, Zhuotao Tian, Li Jiang, Jiaya Jia

Abstract: This paper introduces Unified Language-driven Zero-shot Domain Adaptation (ULDA), a novel task setting that enables a single model to adapt to diverse target domains without explicit domain-ID knowledge. We identify the constraints in the existing language-driven zero-shot domain adaptation task, particularly the requirement for domain IDs and domain-specific models, which may restrict flexibility and scalability. To overcome these issues, we propose a new framework for ULDA, consisting of Hierarchical Context Alignment (HCA), Domain Consistent Representation Learning (DCRL), and Text-Driven Rectifier (TDR). These components work synergistically to align simulated features with target text across multiple visual levels, retain semantic correlations between different regional representations, and rectify biases between simulated and real target visual features, respectively. Our extensive empirical evaluations demonstrate that this framework achieves competitive performance in both settings, surpassing even the model that requires domain-ID, showcasing its superiority and generalization ability. The proposed method is not only effective but also maintains practicality and efficiency, as it does not introduce additional computational costs during inference. Our project page is https://senqiaoyang.com/project/ULDA.

Submitted 10 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.07131 [pdf, other]

Search for prompt production of pentaquarks in charm hadron final states

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, B. Adeva, M. Adinolfi, P. Adlarson, H. Afsharnia, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, A. Alfonso Albero, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, et al. (1090 additional authors not shown)

Abstract: A search for hidden-charm pentaquark states decaying to a range of $Σ_{c}\bar{D}$ and $Λ_{c}\bar{D}$ final states, as well as doubly-charmed pentaquark states to $Σ_{c}D$ and $Λ_{c}^{+}D$, is made using samples of proton-proton collision data corresponding to an integrated luminosity of $5.7\,\mathrm{fb}^{-1}$ recorded by the LHCb detector at $\sqrt{s} = 13\,\mathrm{TeV}$. Since no significant signals are found, upper limits are set on the pentaquark yields relative to that of the $Λ_{c}^{+}$ baryon in the $Λ_{c}^{+}\to pK^{-}π^{+}$ decay mode. The known pentaquark states are also investigated, and their signal yields are found to be consistent with zero in all cases.

Submitted 2 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-018.html (LHCb public pages)

Report number: LHCb-PAPER-2023-018, CERN-EP-2024-071

Showing 151–200 of 4,396 results for author: Yang, S