-
Metal-Poor Stars in the MW Disk: Resonant Cooling of Vertical Oscillations of Halo Stars in Barred Galaxies
Authors:
Xingchen Li,
Isaac Shlosman,
Daniel Pfenniger,
Clayton Heller
Abstract:
Using numerical simulations of a barred disk galaxy embedded in nonspinning and spinning dark matter (DM) halos, we present a novel mechanism of `cooling' the vertical oscillations of halo DM particles, which acquire the disk kinematics. The underlying mechanism consists of resonant interactions between halo particles and the stellar bar. The cooling mechanism acts on both dynamical and secular timescales, i.e., from ~0.5 Gyr to a few Gyr, and the stellar bar acts to absorb the kinetic energy of the vertical motions. Using a Milky Way-type stellar halo, we estimate the population of metal-poor disk stars which have been trapped by the MW disk and analyze its kinematics. We find that the population of metal-poor MW disk stars with $|z| \lesssim 3$ kpc detected by Gaia DR3 and other surveys can have its origin in the stellar halo. The cooled population also migrates radially outwards by acquiring energy from the spinning bar, and prograde-moving stars have a different distribution from the retrograde ones. Next, we calculate the ratio of prograde-to-retrograde orbits of the cooled population and find that this ratio varies radially, with the fast-spinning stellar halo resulting in a shallower radial increase of this ratio outside corotation. The nonspinning stellar halo shows a monotonic increase of this ratio with radius outside corotation. Together with the analyzed radial migration of these halo stars, the cooling phenomenon of halo metal-poor stars can explain their current disk population, and has corollaries for the chemical evolution of disk galaxies in general.
Submitted 10 June, 2024;
originally announced June 2024.
-
Measurement of the branching fractions of $\bar{B}\to D^{(*)} K^- K^{(*)0}_{(S)}$ and $\bar{B}\to D^{(*)}D_s^{-}$ decays at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien,
F. Becherer
, et al. (382 additional authors not shown)
Abstract:
We present measurements of the branching fractions of eight $\overline B{}^0\to D^{(*)+} K^- K^{(*)0}_{(S)}$ and $B^{-}\to D^{(*)0} K^- K^{(*)0}_{(S)}$ decay channels. The results are based on data from SuperKEKB electron-positron collisions at the $Υ(4S)$ resonance collected with the Belle II detector, corresponding to an integrated luminosity of $362~\text{fb}^{-1}$. The event yields are extracted from fits to the distributions of the difference between expected and observed $B$ meson energy, and are efficiency-corrected as a function of $m(K^-K^{(*)0}_{(S)})$ and $m(D^{(*)}K^{(*)0}_{(S)})$ in order to avoid dependence on the decay model. These results include the first observation of $\overline B{}^0\to D^+K^-K_S^0$, $B^-\to D^{*0}K^-K_S^0$, and $\overline B{}^0\to D^{*+}K^-K_S^0$ decays and a significant improvement in the precision of the other channels compared to previous measurements. The helicity-angle distributions and the invariant mass distributions of the $K^- K^{(*)0}_{(S)}$ systems are compatible with quasi-two-body decays via a resonant transition with spin-parity $J^P=1^-$ for the $K^-K_S^0$ systems and $J^P= 1^+$ for the $K^-K^{*0}$ systems. We also present measurements of the branching fractions of four $\overline B{}^0\to D^{(*)+} D_s^-$ and $B^{-}\to D^{(*)0} D_s^- $ decay channels with a precision comparable to the current world averages.
Submitted 10 June, 2024;
originally announced June 2024.
-
Global-in-time energy stability analysis for the exponential time differencing Runge-Kutta scheme for the phase field crystal equation
Authors:
Xiao Li,
Zhonghua Qiao,
Cheng Wang,
Nan Zheng
Abstract:
A global-in-time energy estimate is derived for the second-order accurate exponential time differencing Runge-Kutta (ETDRK2) numerical scheme for the phase field crystal (PFC) equation, a sixth-order parabolic equation modeling crystal evolution. To recover the value of the stabilization constant, some local-in-time convergence analysis has been reported, and the energy stability becomes available over a fixed final time. In this work, we develop a global-in-time energy estimate for the ETDRK2 numerical scheme for the PFC equation by showing the energy dissipation property for any final time. An a priori assumption at the previous time step, combined with a single-step $H^2$ estimate of the numerical solution, is the key point in the analysis. Such an $H^2$ estimate recovers the maximum norm bound of the numerical solution at the next time step, and then the value of the stabilization parameter can be theoretically justified. This justification ensures the energy dissipation at the next time step, so that mathematical induction can be effectively applied, by which the global-in-time energy estimate is accomplished. This paper represents the first effort to theoretically establish a global-in-time energy stability analysis for a second-order stabilized numerical scheme in terms of the original free energy functional. The presented methodology is expected to be applicable to many other Runge-Kutta numerical schemes for gradient flow equations.
Submitted 10 June, 2024;
originally announced June 2024.
-
Tuning-Free Visual Customization via View Iterative Self-Attention Control
Authors:
Xiaojie Li,
Chenghao Gu,
Shuzhao Xie,
Yunpeng Bai,
Weixiang Zhang,
Zhi Wang
Abstract:
Fine-tuning diffusion models enables a wide range of personalized generation and editing applications on diverse visual modalities. While Low-Rank Adaptation (LoRA) accelerates the fine-tuning process, it still requires multiple reference images and time-consuming training, which constrains its scalability for large-scale and real-time applications. In this paper, we propose View Iterative Self-Attention Control (VisCtrl) to tackle this challenge. Specifically, VisCtrl is a training-free method that injects the appearance and structure of a user-specified subject into another subject in the target image, unlike previous approaches that require fine-tuning the model. First, we obtain the initial noise for both the reference and target images through DDIM inversion. Then, during the denoising phase, features from the reference image are injected into the target image via the self-attention mechanism. Notably, by iteratively performing this feature injection process, we ensure that the reference image features are gradually integrated into the target image. This approach results in consistent and harmonious editing with only one reference image in a few denoising steps. Moreover, benefiting from our plug-and-play architecture design and the proposed Feature Gradual Sampling strategy for multi-view editing, our method can be easily extended to edit in complex visual domains. Extensive experiments show the efficacy of VisCtrl across a spectrum of tasks, including personalized editing of images, videos, and 3D scenes.
Submitted 10 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Matrix norm shrinkage estimators and priors
Authors:
Xiao Li,
Takeru Matsuda,
Fumiyasu Komaki
Abstract:
We develop a class of minimax estimators for a normal mean matrix under the Frobenius loss, which generalizes the James--Stein and Efron--Morris estimators. It shrinks the Schatten norm towards zero and works well for low-rank matrices. We also propose a class of superharmonic priors based on the Schatten norm, which generalizes Stein's prior and the singular value shrinkage prior. The generalized Bayes estimators and Bayesian predictive densities with respect to these priors are minimax. We examine the performance of the proposed estimators and priors in simulation.
Submitted 10 June, 2024;
originally announced June 2024.
-
A Survey on Incomplete Multi-label Learning: Recent Advances and Future Trends
Authors:
Xiang Li,
Jiexi Liu,
Xinrui Wang,
Songcan Chen
Abstract:
In reality, data are often associated with multiple labels, which has made multi-label learning (MLL) a prominent research topic. The last two decades have witnessed the success of MLL, which is inseparable from complete and accurate supervised information. However, obtaining such information in practice is always laborious and sometimes even impossible. To circumvent this dilemma, incomplete multi-label learning (InMLL) has emerged, aiming to learn from incompletely labeled data. To date, numerous InMLL works have been proposed to narrow the performance gap with complete MLL, whereas a systematic review of InMLL is still absent. In this paper, we not only attempt to fill this gap but also strive to pave the way for innovative research. Specifically, we review the origin of InMLL, analyze the challenges of InMLL, and build a taxonomy of InMLL from the data-oriented and algorithm-oriented perspectives. We also present real applications of InMLL in various domains. More importantly, we highlight several potential future trends, including four open problems that are more in line with practice and three under-explored or unexplored techniques for addressing the challenges of InMLL, which may shed new light on developing novel research directions in the field of InMLL.
Submitted 10 June, 2024;
originally announced June 2024.
-
Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.
Submitted 10 June, 2024;
originally announced June 2024.
-
Approximating arrival costs in distributed moving horizon estimation: A recursive method
Authors:
Xiaojie Li,
Xunyuan Yin
Abstract:
In this paper, we present a new approach to distributed moving horizon estimation for constrained nonlinear processes. The method involves approximating the arrival costs of local estimators through a recursive framework. First, distributed full-information estimation for linear unconstrained systems is presented, which serves as the foundation for deriving the analytical expression of the arrival costs for the local estimators. Subsequently, we develop a recursive arrival cost design for linear distributed moving horizon estimation. Sufficient conditions are derived to ensure the stability of the estimation error for constrained linear systems. Next, we extend the arrival cost design derived for linear systems to account for nonlinear systems, and a partition-based constrained distributed moving horizon estimation algorithm for nonlinear systems is formulated. A benchmark chemical process is used to illustrate the effectiveness and superiority of the proposed method.
Submitted 9 June, 2024;
originally announced June 2024.
-
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Authors:
Xi Li,
Yusen Zhang,
Renze Lou,
Chen Wu,
Jiaqi Wang
Abstract:
Backdoor attacks present significant threats to Large Language Models (LLMs), particularly with the rise of third-party services that offer API integration and prompt engineering. Untrustworthy third parties can plant backdoors into LLMs and pose risks to users by embedding malicious instructions into user queries. The backdoor-compromised LLM will generate malicious output when an input is embedded with a specific trigger predetermined by an attacker. Traditional defense strategies, which primarily involve model parameter fine-tuning and gradient calculation, are inadequate for LLMs due to their extensive computational and clean data requirements. In this paper, we propose a novel solution, Chain-of-Scrutiny (CoS), to address these challenges. Backdoor attacks fundamentally create a shortcut from the trigger to the target output and thus lack reasoning support. Accordingly, CoS guides the LLM to generate detailed reasoning steps for the input, then scrutinizes the reasoning process to ensure consistency with the final answer. Any inconsistency may indicate an attack. CoS only requires black-box access to the LLM, offering a practical defense, particularly for API-accessible LLMs. It is user-friendly, enabling users to conduct the defense themselves. Driven by natural language, the entire defense process is transparent to users. We validate the effectiveness of CoS through extensive experiments across various tasks and LLMs. Additionally, experimental results show that CoS proves more beneficial for more powerful LLMs.
Submitted 9 June, 2024;
originally announced June 2024.
-
Self-Distilled Disentangled Learning for Counterfactual Prediction
Authors:
Xinshu Li,
Mingming Gong,
Lina Yao
Abstract:
Advancements in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing method for achieving the independent separation of these factors is mutual information minimization, a task that presents challenges in numerous machine learning scenarios, especially within high-dimensional spaces. To circumvent this challenge, we propose the Self-Distilled Disentanglement framework, referred to as $SD^2$. Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs for high-dimensional representations. Our comprehensive experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach in facilitating counterfactual inference in the presence of both observed and unobserved confounders.
Submitted 14 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.
Submitted 9 June, 2024;
originally announced June 2024.
-
A Survey on LLM-Based Agents: Common Workflows and Reusable LLM-Profiled Components
Authors:
Xinzhe Li
Abstract:
Recent advancements in Large Language Models (LLMs) have catalyzed the development of sophisticated frameworks for developing LLM-based agents. However, the complexity of these frameworks poses a hurdle for nuanced differentiation at a granular level, a critical aspect for enabling efficient implementations across different frameworks and fostering future research. Hence, the primary purpose of this survey is to facilitate a cohesive understanding of diverse recently proposed frameworks by identifying common workflows and reusable LLM-Profiled Components (LMPCs).
Submitted 15 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Efficient Beamforming Feedback Information-Based Wi-Fi Sensing by Feature Selection
Authors:
Xin Li,
Jingzhi Hu,
Jun Luo
Abstract:
Wi-Fi sensing leveraging plain-text beamforming feedback information (BFI) in multiple-input-multiple-output (MIMO) systems attracts increasing attention. However, due to the implicit relationship between BFI and the channel state information (CSI), quantifying the sensing capability of BFI poses a challenge in building efficient BFI-based sensing algorithms. In this letter, we first derive a mathematical model of BFI, characterizing its relationship with CSI explicitly, and then develop a closed-form expression of BFI for 2x2 MIMO systems. To enhance the efficiency of BFI-based sensing by selecting only the most informative features, we quantify the sensing capacity of BFI using the Cramer-Rao bound (CRB) and then propose an efficient CRB-based BFI feature selection algorithm. Simulation results verify that BFI and CSI exhibit comparable sensing capabilities and that the proposed algorithm halves the number of features, reducing 20% more parameters than baseline methods, at the cost of only slightly increasing positioning errors.
Submitted 9 June, 2024;
originally announced June 2024.
-
A Generalized Version of Chung's Lemma and its Applications
Authors:
Li Jiang,
Xiao Li,
Andre Milzarek,
Junwen Qiu
Abstract:
Chung's lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate the broad applicability of the proposed generalized Chung's lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as stochastic gradient descent and random reshuffling, under a general $(θ,μ)$-Polyak-Lojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes can adapt to the objective function's geometry, achieving the optimal convergence rate without requiring exact knowledge of the underlying landscape. Our results demonstrate that the developed variant of Chung's lemma offers a versatile, systematic, and streamlined approach to establishing non-asymptotic convergence rates under general step size rules.
Submitted 9 June, 2024;
originally announced June 2024.
-
CCSI: Continual Class-Specific Impression for Data-free Class Incremental Learning
Authors:
Sana Ayromlou,
Teresa Tsang,
Purang Abolmaesumi,
Xiaoxiao Li
Abstract:
In real-world clinical settings, traditional deep learning-based classification methods struggle with diagnosing newly introduced disease types because they require samples from all disease classes for offline training. Class incremental learning offers a promising solution by adapting a deep network trained on specific disease classes to handle new diseases. However, catastrophic forgetting occurs, decreasing the performance of earlier classes when adapting the model to new data. Prior proposed methodologies to overcome this require perpetual storage of previous samples, posing potential practical concerns regarding privacy and storage regulations in healthcare. To this end, we propose a novel data-free class incremental learning framework that utilizes data synthesis on learned classes instead of data storage from previous classes. Our key contributions include acquiring synthetic data known as Continual Class-Specific Impression (CCSI) for previously inaccessible trained classes and presenting a methodology to effectively utilize this data for updating networks when introducing new classes. We obtain CCSI by employing data inversion over gradients of the trained classification model on previous classes starting from the mean image of each class inspired by common landmarks shared among medical images and utilizing continual normalization layers statistics as a regularizer in this pixel-wise optimization process. Subsequently, we update the network by combining the synthesized data with new class data and incorporate several losses, including an intra-domain contrastive loss to generalize the deep network trained on the synthesized data to real data, a margin loss to increase separation among previous classes and new ones, and a cosine-normalized cross-entropy loss to alleviate the adverse effects of imbalanced distributions in training data.
Submitted 8 June, 2024;
originally announced June 2024.
-
Medical Vision Generalist: Unifying Medical Imaging Tasks in Context
Authors:
Sucheng Ren,
Xiaoke Huang,
Xianhang Li,
Junfei Xiao,
Jieru Mei,
Zeyu Wang,
Alan Yuille,
Yuyin Zhou
Abstract:
This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework. Specifically, MVG employs an in-context generation strategy that standardizes the handling of inputs and outputs as images. By treating these tasks as an image generation process conditioned on prompt image-label pairs and input images, this approach enables a flexible unification of various tasks, even those spanning different modalities and datasets. To capitalize on both local and global context, we design a hybrid method combining masked image modeling with autoregressive training for conditional image generation. This hybrid approach yields the most robust performance across all involved medical imaging tasks. To rigorously evaluate MVG's capabilities, we curated the first comprehensive generalist medical vision benchmark, comprising 13 datasets and spanning four imaging modalities (CT, MRI, X-ray, and micro-ultrasound). Our results consistently establish MVG's superior performance, outperforming existing vision generalists, such as Painter and LVM. Furthermore, MVG exhibits strong scalability, with its performance demonstrably improving when trained on a more diverse set of tasks, and can be effectively adapted to unseen datasets with only minimal task-specific samples. The code is available at \url{https://github.com/OliverRensu/MVG}.
Submitted 8 June, 2024;
originally announced June 2024.
-
Revisit to the WGVC schemes: a nonlinear order-preserving and spectral-property-optimized methodology and its enhancement
Authors:
Kang He,
Hongwei Liu,
Tongbiao Guo,
Xinliang Li,
Zhiwei He
Abstract:
The numerical simulation of supersonic complex flow problems demands capabilities in identifying multiscale structures and capturing shocks, imposing stringent requirements on the numerical scheme. The capability to identify multiscale structures is closely related to the spectral properties of the numerical scheme. Existing methods to improve the spectral properties of finite difference schemes face shortcomings such as difficulty in parallelization (compact methods) or the introduction of unnecessary dispersion errors at low wavenumbers due to accuracy loss (spectral-like optimization methods). In this paper, we propose an order-preserving spectral property optimization method based on group velocity control theory: the weighted group velocity control (WGVC) scheme. This method, centered around the concept of group velocity, achieves low-wavenumber accuracy control and mid-wavenumber group velocity control by designing smoothness indicators and a nonlinear weighting approach for wave packets. Furthermore, by embedding the WGVC scheme into shock-capturing schemes such as the WENO/TENO schemes, we not only preserve the spectral properties of the WGVC scheme at medium to low wavenumbers but also enhance the shock-capturing capability of the scheme. Theoretical analysis and numerical experiments verify that the new method is order-preserving, has small dispersion and dissipation errors, and is well suited to the numerical simulation of complex flow problems such as turbulence-shock boundary layer interactions.
Submitted 8 June, 2024;
originally announced June 2024.
-
Selecting the Number of Communities for Weighted Degree-Corrected Stochastic Block Models
Authors:
Yucheng Liu,
Xiaodong Li
Abstract:
We investigate how to select the number of communities for weighted networks without full likelihood modeling. First, we propose a novel weighted degree-corrected stochastic block model (DCSBM), in which the mean adjacency matrix is modeled in the same way as in the standard DCSBM, while the variance profile matrix is assumed to be related to the mean adjacency matrix through a given variance function. Our method of selecting the number of communities is based on a sequential testing framework, in each step of which the weighted DCSBM is fitted via some spectral clustering method. A key step is to carry out matrix scaling on the estimated variance profile matrix. The resulting scaling factors can be used to normalize the adjacency matrix, from which the test statistic is obtained. Under mild conditions on the weighted DCSBM, our proposed procedure is shown to be consistent in estimating the true number of communities. Numerical experiments on both simulated and real network data also demonstrate the desirable empirical properties of our method.
Submitted 7 June, 2024;
originally announced June 2024.
-
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Authors:
Xiaoqi Wang,
Wenbin He,
Xiwei Xuan,
Clint Sebastian,
Jorge Piazentin Ono,
Xin Li,
Sima Behpour,
Thang Doan,
Liang Gou,
Han Wei Shen,
Liu Ren
Abstract:
The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in accurately classifying these segments into text-defined categories. In this paper, we introduce the Universal Segment Embedding (USE) framework to address this challenge. This framework is comprised of two key components: 1) a data pipeline designed to efficiently curate a large amount of segment-text pairs at various granularities, and 2) a universal segment embedding model that enables precise segment classification into a vast range of text-defined categories. The USE model can not only help open-vocabulary image segmentation but also facilitate other downstream tasks (e.g., querying and ranking). Through comprehensive experimental studies on semantic segmentation and part segmentation benchmarks, we demonstrate that the USE framework outperforms state-of-the-art open-vocabulary segmentation methods.
Submitted 7 June, 2024;
originally announced June 2024.
-
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models
Authors:
Yibo Yang,
Xiaojie Li,
Zhongzhu Zhou,
Shuaiwen Leon Song,
Jianlong Wu,
Liqiang Nie,
Bernard Ghanem
Abstract:
Current parameter-efficient fine-tuning (PEFT) methods build adapters without considering the context of downstream task to learn, or the context of important knowledge to maintain. As a result, there is often a performance gap compared to full-parameter finetuning, and meanwhile the finetuned model suffers from catastrophic forgetting of the pre-trained world knowledge. In this paper, we propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable adapters from weight decomposition oriented by the context of downstream task or world knowledge. Concretely, we collect a few data samples, and perform singular value decomposition for each linear layer of a pre-trained LLM multiplied by the covariance matrix of the input activation using these samples. By doing so, the context of the representative samples is captured through deciding the factorizing orientation. Our method enables two options, the knowledge-preserved adaptation and the instruction-previewed adaptation. For the former, we use question-answering samples to obtain the covariance matrices, and use the decomposed components with the smallest $r$ singular values to initialize a learnable adapter, with the others frozen such that the world knowledge is better preserved. For the latter, we use the instruction data from the finetuning task, such as math or coding, to orientate the decomposition and train the largest $r$ components that capture the main characteristics of the task to learn. We conduct extensive experiments on Math, Code, and Instruction Following tasks. Our knowledge-preserved adaptation not only achieves better performance than LoRA on finetuning tasks, but also mitigates the forgetting of world knowledge. Our instruction-previewed adaptation is able to further enhance the finetuning performance, surpassing full-parameter finetuning and the state-of-the-art PEFT methods.
Submitted 7 June, 2024;
originally announced June 2024.
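The decomposition step described in this abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `corda_style_init`, the ridge term `eps`, the even split of singular values between the two adapter factors, and the use of an explicit matrix inverse are all choices made here for illustration. It covers only the knowledge-preserved option, in which the components with the smallest `r` singular values initialize the adapter and the rest stay frozen.

```python
import numpy as np


def corda_style_init(W, X, r, eps=1e-6):
    """Split a linear layer's weight into a frozen part and a rank-r adapter.

    W   : (d_out, d_in) weight matrix of the linear layer.
    X   : (d_in, n) input activations collected from a few samples.
    r   : adapter rank (number of smallest singular components to train).
    eps : small ridge term (an assumption) that keeps the covariance invertible.
    """
    d_in = W.shape[1]
    # Covariance of the input activations, regularized so it can be inverted.
    C = X @ X.T + eps * np.eye(d_in)

    # SVD of the weight multiplied by the covariance matrix.
    U, S, Vt = np.linalg.svd(W @ C, full_matrices=False)

    # Undo the covariance so the factors reconstruct W itself:
    # W = U diag(S) (Vt C^{-1}).
    Vt_hat = Vt @ np.linalg.inv(C)

    # NumPy returns singular values in descending order, so the last r
    # components are the ones with the smallest singular values.
    sqrt_s = np.sqrt(S[-r:])
    B = U[:, -r:] * sqrt_s                # (d_out, r), learnable
    A = sqrt_s[:, None] * Vt_hat[-r:, :]  # (r, d_in), learnable

    # Remaining components are kept frozen.
    W_frozen = (U[:, :-r] * S[:-r]) @ Vt_hat[:-r, :]
    return W_frozen, B, A
```

At initialization the frozen part plus the adapter product `B @ A` reproduces the original weight up to numerical error, so the layer's output is unchanged before any training step.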
-
Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation
Authors:
Yibo Yang,
Xiaojie Li,
Motasem Alfarra,
Hasan Hammoud,
Adel Bibi,
Philip Torr,
Bernard Ghanem
Abstract:
Relieving the reliance of neural network training on a global back-propagation (BP) has emerged as a notable research topic due to the biological implausibility and huge memory consumption caused by BP. Among the existing solutions, local learning optimizes gradient-isolated modules of a neural network with local errors and has been proved to be effective even on large-scale datasets. However, the reconciliation among local errors has never been investigated. In this paper, we first theoretically study non-greedy layer-wise training and show that the convergence cannot be assured when the local gradient in a module w.r.t. its input is not reconciled with the local gradient in the previous module w.r.t. its output. Inspired by the theoretical result, we further propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules without breaking gradient isolation or introducing any learnable parameters. Our method can be integrated into both local-BP and BP-free settings. In experiments, we achieve significant performance improvements compared to previous methods. Particularly, our method for CNN and Transformer architectures on ImageNet is able to attain a competitive performance with global BP, saving more than 40% memory consumption.
Submitted 7 June, 2024;
originally announced June 2024.
-
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Authors:
Shengqiong Wu,
Hao Fei,
Xiangtai Li,
Jiayi Ji,
Hanwang Zhang,
Tat-Seng Chua,
Shuicheng Yan
Abstract:
Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in processing vision-language tasks. One crux of MLLMs is vision tokenization, which involves efficiently transforming input visual signals into feature representations that are most beneficial for LLMs. However, existing vision tokenizers, essential for semantic alignment between vision and language, remain problematic. Existing methods aggressively fragment visual input, corrupting the visual semantic integrity. To address this, this paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok), which groups visual features into semantic units via a dynamic clustering algorithm, flexibly determining the number of tokens based on image complexity. The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features. The proposed MLLM (Setokim) equipped with SeTok demonstrates significantly superior performance across various tasks, as evidenced by our experimental results. The project page is at https://chocowu.github.io/SeTok-web/.
Submitted 27 June, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion
Authors:
Xingrui Wang,
Xin Li,
Zhibo Chen
Abstract:
Tuning-free long video diffusion has been proposed to generate extended-duration videos with enriched content by reusing the knowledge from a pre-trained short video diffusion model without retraining. However, most works overlook fine-grained long-term video consistency modeling, resulting in limited scene consistency (i.e., unreasonable object or background transitions), especially with multiple text inputs. To mitigate this, we propose Consistency Noise Injection, dubbed CoNo, which introduces a "look-back" mechanism to enhance the fine-grained scene transition between different video clips, and designs a long-term consistency regularization to eliminate content shifts when extending video contents through noise prediction. In particular, the "look-back" mechanism breaks the noise scheduling process into three essential parts, where one internal noise prediction part is injected into two video-extending parts, intending to achieve a fine-grained transition between two video clips. The long-term consistency regularization focuses on explicitly minimizing the pixel-wise distance between the predicted noises of the extended video clip and the original one, thereby preventing abrupt scene transitions. Extensive experiments have shown the effectiveness of the above strategies by performing long-video generation under both single- and multi-text prompt conditions. The project page is available at https://wxrui182.github.io/CoNo.github.io/.
Submitted 7 June, 2024;
originally announced June 2024.
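The long-term consistency regularization mentioned in this abstract penalizes the pixel-wise distance between two predicted noise tensors. A minimal sketch of such a penalty is given below; the abstract does not specify the distance, so the mean squared difference used here is an assumption, and the function name `consistency_penalty` is hypothetical.

```python
import numpy as np


def consistency_penalty(noise_extended, noise_original):
    """Mean squared pixel-wise difference between two predicted-noise arrays.

    Both inputs must have the same shape, for example
    (frames, channels, height, width). The choice of a squared L2
    distance is an assumption made for this illustration.
    """
    a = np.asarray(noise_extended, dtype=float)
    b = np.asarray(noise_original, dtype=float)
    if a.shape != b.shape:
        raise ValueError("noise arrays must have identical shapes")
    return float(np.mean((a - b) ** 2))
```

The penalty is zero when the two noise predictions coincide and grows as the extended clip's prediction drifts away from that of the original clip.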
-
CityCraft: A Real Crafter for 3D City Generation
Authors:
Jie Deng,
Wenhao Chai,
Junsheng Huang,
Zhonghan Zhao,
Qixuan Huang,
Mingyan Gao,
Jianshu Guo,
Shengyu Hao,
Wenhao Hu,
Jenq-Neng Hwang,
Xi Li,
Gaoang Wang
Abstract:
City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neural rendering. These techniques often exhibit limited diversity and noticeable artifacts in the rendered city scenes. The rendered scenes lack variety, resembling the training images, resulting in monotonous styles. Additionally, these methods lack planning capabilities, leading to less realistic generated scenes. In this paper, we introduce CityCraft, an innovative framework designed to enhance both the diversity and quality of urban scene generation. Our approach integrates three key stages. First, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts. Next, a Large Language Model (LLM) is utilized to strategically make land-use plans within these layouts based on user prompts and language guidelines. Finally, based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction. Furthermore, we contribute two new datasets to the field: 1) the CityCraft-OSM dataset, including 2D semantic layouts of urban areas, corresponding satellite images, and detailed annotations; and 2) the CityCraft-Buildings dataset, featuring thousands of diverse, high-quality 3D building assets. CityCraft achieves state-of-the-art performance in generating realistic 3D cities.
Submitted 7 June, 2024;
originally announced June 2024.
-
The free boundary problem of an epidemic model with nonlocal diffusions and nonlocal reactions: spreading-vanishing dichotomy
Authors:
Xueping Li,
Lei Li,
Mingxin Wang
Abstract:
This paper concerns the free boundary problem of an epidemic model. The spatial movements of the infectious agents and the infective humans are approximated by nonlocal diffusion operators. In particular, both the growth rate of the agents and the infective rate of humans are represented by nonlocal reaction terms. Thus our model has four integral terms, which bring some difficulties to the study of the corresponding principal eigenvalue problem. Firstly, using some elementary analysis instead of the Krein-Rutman theorem and the variational characteristic, we obtain the existence and asymptotic behaviors of the principal eigenvalue. Then a spreading-vanishing dichotomy is proved to hold, and the criteria for spreading and vanishing are derived. Lastly, comparing our results with those in the existing works, we discuss the effect of nonlocal reaction terms on spreading and vanishing, finding that the more nonlocal reaction terms a model has, the harder it is for spreading to happen.
Submitted 10 June, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Skill-aware Mutual Information Optimisation for Generalisation in Reinforcement Learning
Authors:
Xuehui Yu,
Mhairi Dunion,
Xin Li,
Stefano V. Albrecht
Abstract:
Meta-Reinforcement Learning (Meta-RL) agents can struggle to operate across tasks with varying environmental features that require different optimal skills (i.e., different modes of behaviours). Using context encoders based on contrastive learning to enhance the generalisability of Meta-RL agents is now widely studied but faces challenges such as the requirement for a large sample size, also referred to as the $\log$-$K$ curse. To improve RL generalisation to different tasks, we first introduce Skill-aware Mutual Information (SaMI), an optimisation objective that aids in distinguishing context embeddings according to skills, thereby equipping RL agents with the ability to identify and execute different skills across tasks. We then propose Skill-aware Noise Contrastive Estimation (SaNCE), a $K$-sample estimator used to optimise the SaMI objective. We provide a framework for equipping an RL agent with SaNCE in practice and conduct experimental validation on modified MuJoCo and Panda-gym benchmarks. We empirically find that RL agents that learn by maximising SaMI achieve substantially improved zero-shot generalisation to unseen tasks. Additionally, the context encoder equipped with SaNCE demonstrates greater robustness to reductions in the number of available samples, thus possessing the potential to overcome the $\log$-$K$ curse.
Submitted 7 June, 2024;
originally announced June 2024.
-
Logic Synthesis with Generative Deep Neural Networks
Authors:
Xihan Li,
Xing Li,
Lei Chen,
Xing Zhang,
Mingxuan Yuan,
Jun Wang
Abstract:
While deep learning has achieved significant success in various domains, its application to logic circuit design has been limited due to complex constraints and strict feasibility requirement. However, a recent generative deep neural model, "Circuit Transformer", has shown promise in this area by enabling equivalence-preserving circuit transformation on a small scale. In this paper, we introduce a logic synthesis rewriting operator based on the Circuit Transformer model, named "ctrw" (Circuit Transformer Rewriting), which incorporates the following techniques: (1) a two-stage training scheme for the Circuit Transformer tailored for logic synthesis, with iterative improvement of optimality through self-improvement training; (2) integration of the Circuit Transformer with state-of-the-art rewriting techniques to address scalability issues, allowing for guided DAG-aware rewriting. Experimental results on the IWLS 2023 contest benchmark demonstrate the effectiveness of our proposed rewriting methods.
Submitted 7 June, 2024;
originally announced June 2024.
-
The Potential Energy of Heavy Quarkonium in Flavor-Dependent Systems from a Holographic Model
Authors:
Xi Guo,
Xun Chen,
Dong Xiang,
Miguel Angel Martin Contreras,
Xiao-Hua Li
Abstract:
Within the framework of the Einstein-Maxwell-Dilaton (EMD) model, which incorporates information on the equation of state and baryon number susceptibility from lattice results, we have conducted a comprehensive analysis of the potential energy, running coupling, and dissociation time for heavy quark-antiquark pairs using gauge/gravity duality. This study encompasses various systems, including pure gluon systems, 2 flavor systems, 2+1 flavor systems, and 2+1+1 flavor systems under finite temperature and chemical potential. The results reveal that the linear component of the potential energy diminishes as the flavor increases. It is also found that our results are extremely close to the recent lattice results for 2+1 flavors at finite temperature. Moreover, we have thoroughly investigated the dissociation distance and running coupling constant of quark-antiquark pairs to gain a comprehensive understanding of their behavior across various flavors. Finally, we have examined real-time dynamics of quark dissociation. The findings indicate that the dissociation time of quark-antiquark pairs is dependent on temperature, chemical potential, and flavor of the systems.
Submitted 7 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of $Ξ_{c}^{0}\toΞ^{0}π^{0}$, $Ξ_{c}^{0}\toΞ^{0}η$, and $Ξ_{c}^{0}\toΞ^{0}η^{\prime}$ and asymmetry parameter of $Ξ_{c}^{0}\toΞ^{0}π^{0}$
Authors:
Belle,
Belle II Collaborations,
:,
I. Adachi,
L. Aggarwal,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien
, et al. (360 additional authors not shown)
Abstract:
We present a study of $Ξ_{c}^{0}\toΞ^{0}π^{0}$, $Ξ_{c}^{0}\toΞ^{0}η$, and $Ξ_{c}^{0}\toΞ^{0}η^{\prime}$ decays using the Belle and Belle~II data samples, which have integrated luminosities of 980~$\mathrm{fb}^{-1}$ and 426~$\mathrm{fb}^{-1}$, respectively. We measure the following relative branching fractions $${\cal B}(Ξ_{c}^{0}\toΞ^{0}π^{0})/{\cal B}(Ξ_{c}^{0}\toΞ^{-}π^{+}) = 0.48 \pm 0.02 ({\rm stat}) \pm 0.03 ({\rm syst}) ,$$ $${\cal B}(Ξ_{c}^{0}\toΞ^{0}η)/{\cal B}(Ξ_{c}^{0}\toΞ^{-}π^{+}) = 0.11 \pm 0.01 ({\rm stat}) \pm 0.01 ({\rm syst}) ,$$ $${\cal B}(Ξ_{c}^{0}\toΞ^{0}η^{\prime})/{\cal B}(Ξ_{c}^{0}\toΞ^{-}π^{+}) = 0.08 \pm 0.02 ({\rm stat}) \pm 0.01 ({\rm syst}) $$ for the first time, where the uncertainties are statistical ($\rm stat$) and systematic ($\rm syst$). By multiplying by the branching fraction of the normalization mode, ${\mathcal B}(Ξ_{c}^{0}\toΞ^{-}π^{+})$, we obtain the following absolute branching fraction results $(6.9 \pm 0.3 ({\rm stat}) \pm 0.5 ({\rm syst}) \pm 1.3 ({\rm norm})) \times 10^{-3}$, $(1.6 \pm 0.2 ({\rm stat}) \pm 0.2 ({\rm syst}) \pm 0.3 ({\rm norm})) \times 10^{-3}$, and $(1.2 \pm 0.3 ({\rm stat}) \pm 0.1 ({\rm syst}) \pm 0.2 ({\rm norm})) \times 10^{-3}$, for $Ξ_{c}^{0}$ decays to $Ξ^{0}π^{0}$, $Ξ^{0}η$, and $Ξ^{0}η^{\prime}$ final states, respectively. The third errors are from the uncertainty on ${\mathcal B}(Ξ_{c}^{0}\toΞ^{-}π^{+})$. The asymmetry parameter for $Ξ_{c}^{0}\toΞ^{0}π^{0}$ is measured to be $α(Ξ_{c}^{0}\toΞ^{0}π^{0}) = -0.90\pm0.15({\rm stat})\pm0.23({\rm syst})$.
Submitted 7 June, 2024;
originally announced June 2024.
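The relation between the relative and absolute branching fractions quoted in this abstract can be checked with simple arithmetic. The sketch below is not part of the collaborations' analysis: it only divides each quoted absolute central value by the corresponding quoted ratio to recover the normalization-mode branching fraction implied by the rounded numbers. The dictionary keys and variable names are illustrative, and uncertainties are ignored.

```python
# Central values copied from the abstract (uncertainties omitted).
relative_bf = {
    "Xi0 pi0": 0.48,
    "Xi0 eta": 0.11,
    "Xi0 eta'": 0.08,
}
absolute_bf = {
    "Xi0 pi0": 6.9e-3,
    "Xi0 eta": 1.6e-3,
    "Xi0 eta'": 1.2e-3,
}

# absolute = relative * B(norm), so B(norm) = absolute / relative.
implied_norm_bf = {
    mode: absolute_bf[mode] / relative_bf[mode] for mode in relative_bf
}

for mode, value in implied_norm_bf.items():
    print(f"{mode}: implied normalization branching fraction = {value:.4f}")
```

Because the abstract's values are rounded, the three implied normalization values agree only approximately with one another.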
-
Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network
Authors:
Xinquan Yang,
Xuguang Li,
Xiaoling Luo,
Leilei Zeng,
Yudi Zhang,
Linlin Shen,
Yongqiang Deng
Abstract:
The surgical guide plate is an important tool for dental implant surgery. However, the design process relies heavily on the dentist to manually simulate the implant angle and depth. Although deep neural networks have been applied to help the dentist quickly locate the implant position, most of them are not able to determine the implant depth. Inspired by the video grounding task, which localizes the starting and ending time of the target video segment, in this paper we simplify implant depth prediction as video grounding and develop a Texture Perceive Implant Depth Prediction Network (TPNet), which enables us to directly output the implant depth without complex measurements of oral bone. TPNet consists of an implant region detector (IRD) and an implant depth prediction network (IDPNet). IRD is an object detector designed to crop the candidate implant volume from the CBCT, which greatly saves computation resources. IDPNet takes the cropped CBCT data to predict the implant depth. A Texture Perceive Loss (TPL) is devised to enable the encoder of IDPNet to perceive the texture variation among slices. Extensive experiments on a large dental implant dataset demonstrate that the proposed TPNet achieves performance superior to that of existing methods.
Submitted 6 June, 2024;
originally announced June 2024.
-
1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation
Authors:
Deshui Miao,
Xin Li,
Zhenyu He,
Yaowei Wang,
Ming-Hsuan Yang
Abstract:
Tracking and segmenting multiple objects in complex scenes has always been a challenge in the field of video object segmentation, especially in scenarios where objects are occluded and split into parts. In such cases, the definition of objects becomes very ambiguous. The MOSE dataset is motivated by the question of how to clearly recognize and distinguish objects in complex scenes. In this challenge, we propose a semantic embedding video object segmentation model and use the salient features of objects as query representations. The semantic understanding helps the model to recognize parts of the objects, and the salient feature captures the more discriminative features of the objects. Trained on a large-scale video object segmentation dataset, our model achieves first place (84.45%) on the test set of the PVUW Challenge 2024: Complex Video Object Segmentation Track.
Submitted 6 June, 2024;
originally announced June 2024.
-
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach
Authors:
Jianbo Dong,
Bin Luo,
Jun Zhang,
Pengcheng Zhang,
Fei Feng,
Yikai Zhu,
Ang Liu,
Zian Chen,
Yi Shi,
Hairong Jiao,
Gang Lu,
Yu Guan,
Ennan Zhai,
Wencong Xiao,
Hanyu Zhao,
Man Yuan,
Siran Yang,
Xiang Li,
Jiamang Wang,
Rui Men,
Jianwei Zhang,
Huang Zhong,
Dennis Cai,
Yuan Xie,
Binzhang Fu
Abstract:
The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the training tasks. The inability to quickly identify the faulty components results in a substantial waste of GPU resources. Secondly, since GPUs must wait for parameter synchronization to complete before proceeding to the next round of computation, network congestion can greatly increase the waiting time for GPUs. To address these challenges, this paper introduces a communication-driven solution, namely C4. The key insights of C4 are twofold. First, in parallel training, collective communication exhibits periodic and homogeneous characteristics, so any anomalies are certainly due to some form of hardware malfunction. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection. Second, the predictable communication model of collective communication, involving few large flows, allows C4 to efficiently execute traffic planning, substantially reducing network congestion. C4 has been extensively implemented across our production systems, cutting error-induced overhead by roughly 30% and enhancing runtime performance by about 15% for certain applications with moderate communication costs.
Submitted 6 June, 2024;
originally announced June 2024.
-
Every Answer Matters: Evaluating Commonsense with Probabilistic Measures
Authors:
Qi Cheng,
Michael Boratko,
Pranay Kumar Yelugam,
Tim O'Gorman,
Nalini Singh,
Andrew McCallum,
Xiang Lorraine Li
Abstract:
Large language models have demonstrated impressive performance on commonsense tasks; however, these tasks are often posed as multiple-choice questions, allowing models to exploit systematic biases. Commonsense is also inherently probabilistic with multiple correct answers. The purpose of "boiling water" could be making tea and cooking, but it also could be killing germs. Existing tasks do not capture the probabilistic nature of common sense. To this end, we present commonsense frame completion (CFC), a new generative task that evaluates common sense via multiple open-ended generations. We also propose a method of probabilistic evaluation that strongly correlates with human judgments. Humans drastically outperform strong language model baselines on our dataset, indicating this approach is both a challenging and useful evaluation of machine common sense.
Submitted 6 June, 2024;
originally announced June 2024.
-
Class-Aware Cartilage Segmentation for Autonomous US-CT Registration in Robotic Intercostal Ultrasound Imaging
Authors:
Zhongliang Jiang,
Yunfeng Kang,
Yuan Bi,
Xuesong Li,
Chenyang Li,
Nassir Navab
Abstract:
Ultrasound imaging has been widely used in clinical examinations owing to the advantages of being portable, real-time, and radiation-free. Considering the potential of extensive deployment of autonomous examination systems in hospitals, robotic US imaging has attracted increased attention. However, due to inter-patient variations, it is still challenging to obtain an optimal path for each patient, particularly for thoracic applications with limited acoustic windows, e.g., intercostal liver imaging. To address this problem, a class-aware cartilage bone segmentation network with geometry-constraint post-processing is presented to capture patient-specific rib skeletons. Then, a dense skeleton graph-based non-rigid registration is presented to map the intercostal scanning path from a generic template to individual patients. By explicitly considering the high-acoustic-impedance bone structures, the transferred scanning path can be precisely located in the intercostal space, enhancing the visibility of internal organs by reducing the acoustic shadow. To evaluate the proposed approach, the final path mapping performance is validated on five distinct CTs and US data from two volunteers, resulting in ten pairs of CT-US combinations. Results demonstrate that the proposed graph-based registration method can robustly and precisely map the path from the CT template to individual patients (Euclidean error: $2.21\pm1.11$ mm).
Submitted 6 June, 2024;
originally announced June 2024.
-
C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
Authors:
Yiqun Lin,
Jiewen Yang,
Hualiang Wang,
Xinpeng Ding,
Wei Zhao,
Xiaomeng Li
Abstract:
Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios, such as diagnosis and preoperative planning. Using fewer projection views to reconstruct CT, also known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT reconstruction is more challenging due to the increased dimensionality caused by the measurement process based on cone-shaped X-ray beams. As a 2D-to-3D reconstruction problem, although implicit neural representations have been introduced to enable efficient training, only local features are considered and different views are processed equally in previous works, resulting in spatial inconsistency and poor performance on complicated anatomies. To this end, we propose C^2RV by leveraging explicit multi-scale volumetric representations to enable cross-regional learning in the 3D space. Additionally, the scale-view cross-attention module is introduced to adaptively aggregate multi-scale and multi-view features. Extensive experiments demonstrate that our C^2RV achieves consistent and significant improvement over previous state-of-the-art methods on datasets with diverse anatomy.
Submitted 6 June, 2024;
originally announced June 2024.
-
A Comprehensive Study of Quantum Arithmetic Circuits
Authors:
Siyi Wang,
Xiufan Li,
Wei Jie Bryan Lee,
Suman Deb,
Eugene Lim,
Anupam Chattopadhyay
Abstract:
In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention. Despite extensive exploration of various designs in the existing literature, researchers remain keen on developing novel designs and improving existing ones.
In this review article, we aim to provide a systematically organized and easily comprehensible overview of the current state-of-the-art in quantum arithmetic circuits. Specifically, this study covers fundamental operations such as addition, subtraction, multiplication, division and modular exponentiation. We delve into the detailed quantum implementations of these prominent designs and evaluate their efficiency considering various objectives. We also discuss potential applications of presented arithmetic circuits and suggest future research directions.
Submitted 6 June, 2024;
originally announced June 2024.
-
Time delay of fast radio burst population with respect to the star formation history
Authors:
Hai-Nan Lin,
Xin-Yi Li,
Rui Zou
Abstract:
Despite significant progress in fast radio burst (FRB) research over the past decade, the origin of FRBs is still under extensive debate. Investigating the FRB population can provide new insight into this problem. In this paper, based on the first CHIME/FRB catalog, we construct a Bayesian framework to analyze the FRB population, with the selection effect of the CHIME telescope properly taken into account. The energy function is modeled as a power law with an exponential cutoff. Four redshift distribution models are considered, i.e., the star formation history (SFH) model and three time-delayed models (Gaussian delay, log-normal delay, and power-law delay). The free parameters are simultaneously constrained using Bayesian inference, and the Bayesian information criterion (BIC) is used for model comparison. According to the BIC, the log-normal delay model fits the data best. The power-law delay model and the Gaussian delay model also give reasonable fits, although they are not as good as the log-normal delay model. However, the SFH model is strongly disfavored compared with the three time-delayed models. The energy function is tightly constrained and is almost independent of the redshift models, with the best-fitting power-law index $α\approx 1.8$ and cut-off energy $\log(E_c/{\rm erg})\approx 42$. The FRB population shows, on average, a time delay of $3\sim 5$ billion years with respect to the SFH. Therefore, the hypothesis that the FRB population traces the SFH is conclusively ruled out.
Submitted 6 June, 2024;
originally announced June 2024.
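The model comparison described in the abstract above relies on the Bayesian information criterion. As a minimal sketch, assuming the standard definition BIC = k ln(n) - 2 ln(L_max), with k free parameters, n data points, and maximized likelihood L_max, the snippet below ranks candidate models by BIC. The function names, model names, and all numbers are hypothetical placeholders and are not taken from the paper.

```python
import math

def bic(max_log_likelihood, n_params, n_data):
    """Standard BIC: k * ln(n) - 2 * ln(L_max). Lower values are preferred."""
    return n_params * math.log(n_data) - 2.0 * max_log_likelihood

def rank_by_bic(models, n_data):
    """Return (name, BIC, delta-BIC relative to the best model), sorted by BIC.

    `models` maps a model name to (max_log_likelihood, n_params).
    """
    scores = {name: bic(ll, k, n_data) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return sorted(((name, s, s - best) for name, s in scores.items()),
                  key=lambda row: row[1])

# Hypothetical inputs for illustration only (not results from the paper).
example_models = {
    "model_A": (-1200.0, 4),
    "model_B": (-1195.0, 6),
}
ranking = rank_by_bic(example_models, n_data=500)
```

In this toy input, the model with more parameters has the higher likelihood but is still ranked second, because the k ln(n) penalty outweighs the likelihood gain.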
-
The impact of nodes of information dissemination on epidemic spreading in dynamic multiplex networks
Authors:
Minyu Feng,
Xiangxi Li,
Yuhan Li,
Qin Li
Abstract:
Epidemic spreading processes on dynamic multiplex networks provide a more accurate description of natural spreading processes than those on single layered networks. To describe the influence of different individuals in the awareness layer on epidemic spreading, we propose a two-layer network-based epidemic spreading model, including some individuals who neglect the epidemic, and we explore how individuals with different properties in the awareness layer will affect the spread of epidemics. The two-layer network model is divided into an information transmission layer and a disease spreading layer. Each node in the layer represents an individual with different connections in different layers. Individuals with awareness will be infected with a lower probability compared to unaware individuals, which corresponds to the various epidemic prevention measures in real life. We adopt the micro-Markov chain approach to analytically derive the threshold for the proposed epidemic model, which demonstrates that the awareness layer affects the threshold of disease spreading. We then explore how individuals with different properties would affect the disease spreading process through extensive Monte Carlo numerical simulations. We find that individuals with high centrality in the awareness layer would significantly inhibit the transmission of infectious diseases. Additionally, we propose conjectures and explanations for the approximately linear effect of individuals with low centrality in the awareness layer on the number of infected individuals.
Submitted 6 June, 2024;
originally announced June 2024.
-
Dynamic properties of a class of van der Pol-Duffing oscillators
Authors:
Yelei Kuang,
Xuemei Li
Abstract:
In this paper, we study the existence of bifurcations of a van der Pol-Duffing oscillator with quintic terms and its quasi-periodic solutions by means of qualitative and bifurcation theories. First, we analyze the autonomous system and find that it has two kinds of local bifurcations and one global bifurcation: pitchfork bifurcation, Hopf bifurcation, and homoclinic bifurcation. It is worth noting that the disappearance of the homoclinic orbit is synchronized with the emergence of a large limit cycle. Then, by discussing the stability of equilibria at infinity and the orientation of the trajectory, the existence and stability of limit cycles of the autonomous system are analyzed by combining the Poincaré-Bendixson theorem and index theory. The global phase portraits and numerical simulations of the autonomous system for different parameter values are given. Finally, the existence of periodic and quasi-periodic solutions of the periodically forced system is proved by a KAM theorem.
Submitted 5 June, 2024;
originally announced June 2024.
-
Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model
Authors:
Jinyin Chen,
Xiaoming Zhao,
Haibin Zheng,
Xiao Li,
Sheng Xiang,
Haifeng Guo
Abstract:
Benefiting from well-trained deep neural networks (DNNs), model compression has attracted special attention for equipment with limited computing resources, especially edge devices. Knowledge distillation (KD) is one of the widely used compression techniques for edge deployment; it obtains a lightweight student model from a well-trained teacher model released on public platforms. However, it has been empirically observed that a backdoor in the teacher model is transferred to the student model during KD. Although numerous KD methods have been proposed, most of them focus on distilling a high-performing student model without considering robustness. In addition, some research adopts KD techniques as effective backdoor mitigation tools, but these approaches fail to perform model compression at the same time. Consequently, achieving both objectives of robust KD, i.e., student model performance and backdoor mitigation, remains an open problem. To address these issues, we propose RobustKD, a robust knowledge distillation method that compresses the model while mitigating backdoors based on feature variance. Specifically, RobustKD differs from previous works in three key aspects: (1) effectiveness: by distilling the feature map of the teacher model after detoxification, the main task performance of the student model is comparable to that of the teacher model; (2) robustness: by reducing the characteristic variance between the teacher model and the student model, it mitigates the backdoor of the student model under the backdoored teacher model scenario; (3) generality: RobustKD still performs well across multiple data models (e.g., WRN 28-4, Pyramid-200) and diverse DNNs (e.g., ResNet50, MobileNet).
Submitted 1 June, 2024;
originally announced June 2024.
-
Gaussian Representation for Deformable Image Registration
Authors:
Jihe Li,
Fabian Zhang,
Xia Li,
Tianhao Zhang,
Ye Zhang,
Joachim Buhmann
Abstract:
Deformable image registration (DIR) is a fundamental task in radiotherapy, with existing methods often struggling to balance computational efficiency, registration accuracy, and speed effectively. We introduce a novel DIR approach employing parametric 3D Gaussian control points achieving a better tradeoff. It provides an explicit and flexible representation for spatial deformation fields between 3D volumetric medical images, producing a displacement vector field (DVF) across all volumetric positions. The movement of individual voxels is derived using linear blend skinning (LBS) through localized interpolation of transformations associated with neighboring Gaussians. This interpolation strategy not only simplifies the determination of voxel motions but also acts as an effective regularization technique. Our approach incorporates a unified optimization process through backpropagation, enabling iterative learning of both the parameters of the 3D Gaussians and their transformations. Additionally, the density of Gaussians is adjusted adaptively during the learning phase to accommodate varying degrees of motion complexity. We validated our approach on the 4D-CT lung DIR-Lab and cardiac ACDC datasets, achieving an average target registration error (TRE) of 1.06 mm within a much-improved processing time of 2.43 seconds for the DIR-Lab dataset over existing methods, demonstrating significant advancements in both accuracy and efficiency.
Submitted 5 June, 2024;
originally announced June 2024.
-
Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
Authors:
Wang Dai,
Xiaofei Li,
Archontis Politis,
Tuomas Virtanen
Abstract:
In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. This limitation is particularly evident in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances, or with compact, highly directional microphone arrays where speaker or microphone positions change over time. Current mask-based methods often fix the reference channel during training, which makes it impossible to adaptively select the reference channel for optimal performance. To address this problem, we introduce an adaptive approach for selecting the optimal reference channel. Our method leverages a multi-channel masking-based scheme, where multiple masked signals are combined to generate a single-channel output signal. This enhanced signal is then used for loss calculation, while the reference clean speech is adjusted based on the highest scale-invariant signal-to-distortion ratio (SI-SDR). The experimental results on the Spear challenge simulated dataset D4 demonstrate the superiority of our proposed method over the conventional approach of using a fixed reference channel with single-channel masking.
Submitted 11 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
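The abstract above selects the clean reference by the highest SI-SDR. As a rough sketch, assuming the commonly used SI-SDR definition (project the estimate onto the reference, then compare the energy of the projection with the energy of the residual), the snippet below scores one enhanced signal against several candidate clean channels and picks the best one. The helper names `si_sdr` and `pick_reference` and the toy signals are hypothetical; this is not the authors' training code.

```python
import math

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length signals."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)           # optimal scaling of the reference
    target = [scale * r for r in reference]    # projection of the estimate
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))

def pick_reference(estimate, clean_channels):
    """Return (index of the clean channel with the highest SI-SDR, all scores)."""
    scores = [si_sdr(estimate, ch) for ch in clean_channels]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy signals for illustration only.
enhanced = [1.0, 2.0, 3.0, 4.0]
candidates = [
    [1.0, -1.0, 1.0, -1.0],   # poorly matched channel
    [1.0, 2.0, 3.0, 4.5],     # closely matched channel
]
best_idx, scores = pick_reference(enhanced, candidates)
```

Because the estimate is rescaled by the projection, multiplying `enhanced` by a constant leaves the score essentially unchanged, which is the property that makes the metric scale-invariant.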
-
A Quantum Neural Network-Based Approach to Power Quality Disturbances Detection and Recognition
Authors:
Guo-Dong Li,
Hai-Yan He,
Yue Li,
Xin-Hao Li,
Hao Liu,
Qing-Le Wang,
Long Cheng
Abstract:
Power quality disturbances (PQDs) significantly impact the stability and reliability of power systems, necessitating accurate and efficient detection and recognition methods. While numerous classical algorithms for PQD detection and recognition have been extensively studied and applied, related work in the quantum domain is still in its infancy. In this paper, an improved quantum neural network (QNN) model for PQD detection and recognition is proposed. Specifically, the model constructs a quantum circuit comprising data qubits and ancilla qubits. Classical data is transformed into quantum data by embedding it into data qubits via the encoding layer. Subsequently, parametric quantum gates are utilized to form the variational layer, which facilitates qubit information transformation, thereby extracting essential feature information for detection and recognition. The expected value is obtained by measuring ancilla qubits, enabling the completion of disturbance classification based on this expected value. An analysis reveals that the runtime and space complexities of the QNN are $O(\mathrm{poly}(N))$ and $O(N)$, respectively. Extensive experiments validate the feasibility and superiority of the proposed model in PQD detection and recognition. The model achieves accuracies of 99.75\%, 97.85\% and 95.5\% in experiments involving the detection of disturbances, recognition of seven single disturbances, and recognition of ten mixed disturbances, respectively. Additionally, noise simulation and comparative experiments demonstrate that the proposed model exhibits robust anti-noise capabilities, requires few training parameters, and maintains high accuracy.
Submitted 5 June, 2024;
originally announced June 2024.
-
Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes
Authors:
Yan Huang,
Xiang Li,
Yipeng Shen,
Niao He,
Jinming Xu
Abstract:
In this paper, we show that applying adaptive methods directly to distributed minimax problems can result in non-convergence due to inconsistency in locally computed adaptive stepsizes. To address this challenge, we propose D-AdaST, a Distributed Adaptive minimax method with Stepsize Tracking. The key strategy is to employ an adaptive stepsize tracking protocol involving the transmission of two extra (scalar) variables. This protocol ensures consistency among the stepsizes of nodes, eliminating the steady-state error due to the lack of coordination of stepsizes among nodes that commonly exists in vanilla distributed adaptive methods, and thus guarantees exact convergence. For nonconvex-strongly-concave distributed minimax problems, we characterize the specific transient times that ensure time-scale separation of stepsizes and quasi-independence of networks, leading to a near-optimal convergence rate of $\tilde{\mathcal{O}} \left( ε^{-\left( 4+δ\right)} \right)$ for any small $δ> 0$, matching that of the centralized counterpart. To the best of our knowledge, D-AdaST is the first distributed adaptive method achieving near-optimal convergence without knowing any problem-dependent parameters for nonconvex minimax problems. Extensive experiments are conducted to validate our theoretical results.
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of $9.6σ$ after taking into account systematic uncertainties. Evidence for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ is found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit of its decay branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at the 90% confidence level.
Submitted 5 June, 2024;
originally announced June 2024.
-
DenoDet: Attention as Deformable Multi-Subspace Feature Denoising for Target Detection in SAR Images
Authors:
Yimian Dai,
Minrui Zou,
Yuxuan Li,
Xiang Li,
Kang Ni,
Jian Yang
Abstract:
Synthetic Aperture Radar (SAR) target detection has long been impeded by inherent speckle noise and the prevalence of diminutive, ambiguous targets. While deep neural networks have advanced SAR target detection, their intrinsic low-frequency bias and static post-training weights falter with coherent noise and preserving subtle details across heterogeneous terrains. Motivated by traditional SAR image denoising, we propose DenoDet, a network aided by explicit frequency domain transform to calibrate convolutional biases and pay more attention to high-frequencies, forming a natural multi-scale subspace representation to detect targets from the perspective of multi-subspace denoising. We design TransDeno, a dynamic frequency domain attention module that performs as a transform domain soft thresholding operation, dynamically denoising across subspaces by preserving salient target signals and attenuating noise. To adaptively adjust the granularity of subspace processing, we also propose a deformable group fully-connected layer (DeGroFC) that dynamically varies the group conditioned on the input features. Without bells and whistles, our plug-and-play TransDeno sets state-of-the-art scores on multiple SAR target detection datasets. The code is available at https://github.com/GrokCV/GrokSAR.
Submitted 4 June, 2024;
originally announced June 2024.
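The abstract above describes TransDeno as acting like a transform-domain soft thresholding operation. For orientation, here is a minimal sketch of classical soft thresholding, sign(x) * max(|x| - t, 0), applied to a list of coefficients. It illustrates only the textbook operator with a fixed, hand-picked threshold; it is not the authors' attention module, and the function name and sample values are hypothetical.

```python
import math

def soft_threshold(values, threshold):
    """Classical soft thresholding: shrink each value toward zero by `threshold`.

    Values with magnitude at or below the threshold become zero; larger
    values keep their sign and lose `threshold` in magnitude.
    """
    return [math.copysign(max(abs(x) - threshold, 0.0), x) for x in values]

# Hypothetical transform-domain coefficients for illustration only.
coeffs = [3.0, -0.5, -2.0, 0.2]
denoised = soft_threshold(coeffs, 1.0)
```

Small coefficients, which are more likely to be noise, are zeroed out, while large coefficients are kept with reduced magnitude.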
-
Evidentially Calibrated Source-Free Time-Series Domain Adaptation with Temporal Imputation
Authors:
Mohamed Ragab,
Peiliang Gong,
Emadeldeen Eldele,
Wenyu Zhang,
Min Wu,
Chuan-Sheng Foo,
Daoqiang Zhang,
Xiaoli Li,
Zhenghua Chen
Abstract:
Source-free domain adaptation (SFDA) aims to adapt a model pre-trained on a labeled source domain to an unlabeled target domain without access to source data, preserving the source domain's privacy. While SFDA is prevalent in computer vision, it remains largely unexplored in time series analysis. Existing SFDA methods, designed for visual data, struggle to capture the inherent temporal dynamics of time series, hindering adaptation performance. This paper proposes MAsk And imPUte (MAPU), a novel and effective approach for time series SFDA. MAPU addresses the critical challenge of temporal consistency by introducing a novel temporal imputation task. This task involves randomly masking time series signals and leveraging a dedicated temporal imputer to recover the original signal within the learned embedding space, bypassing the complexities of noisy raw data. Notably, MAPU is the first method to explicitly address temporal consistency in the context of time series SFDA. Additionally, it offers seamless integration with existing SFDA methods, providing greater flexibility. We further introduce E-MAPU, which incorporates evidential uncertainty estimation to address the overconfidence issue inherent in softmax predictions. To achieve that, we leverage evidential deep learning to obtain a better-calibrated pre-trained model and adapt the target encoder to map out-of-support target samples to a new feature representation closer to the source domain's support. This fosters better alignment, ultimately enhancing adaptation performance. Extensive experiments on five real-world time series datasets demonstrate that both MAPU and E-MAPU achieve significant performance gains compared to existing methods. These results highlight the effectiveness of our proposed approaches for tackling various time series domain adaptation problems.
Submitted 12 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Authors:
Philip Anastassiou,
Jiawei Chen,
Jitong Chen,
Yuanzhe Chen,
Zhuo Chen,
Ziyi Chen,
Jian Cong,
Lelai Deng,
Chuang Ding,
Lu Gao,
Mingqing Gong,
Peisong Huang,
Qingqing Huang,
Zhiying Huang,
Yuanyuan Huo,
Dongya Jia,
Chumin Li,
Feiya Li,
Hui Li,
Jiaxin Li,
Xiaoyang Li,
Xingxing Li,
Lin Liu,
Shouda Liu,
Sichao Liu
, et al. (21 additional authors not shown)
Abstract:
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at https://bytedancespeech.github.io/seedtts_tech_report.
Submitted 4 June, 2024;
originally announced June 2024.
-
GrootVL: Tree Topology is All You Need in State Space Model
Authors:
Yicheng Xiao,
Lin Song,
Shaoli Huang,
Jiangshan Wang,
Siyu Song,
Yixiao Ge,
Xiu Li,
Ying Shan
Abstract:
State space models, which employ recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models, with superior efficiency. However, constrained by the inherent geometry of sequences, they still fall short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree topology based on spatial relationships and input features. Then, feature propagation is performed based on this graph, thereby breaking the original sequence constraints to achieve stronger representation capabilities. Additionally, we introduce a linear complexity dynamic programming algorithm to enhance long-range interactions without increasing computational cost. GrootVL is a versatile multimodal framework that can be applied to both visual and textual tasks. Extensive experiments demonstrate that our method significantly outperforms existing structured state space models on image classification, object detection and segmentation. Moreover, by fine-tuning large language models, our approach achieves consistent improvements in multiple textual tasks at minor training cost.
Submitted 4 June, 2024;
originally announced June 2024.
-
A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods
Authors:
Junwen Qiu,
Bohao Ma,
Xiao Li,
Andre Milzarek
Abstract:
We propose a novel analysis framework for non-descent-type optimization methodologies in nonconvex scenarios based on the Kurdyka-Lojasiewicz property. Our framework allows covering a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the analysis of first-order methods that lack a sufficient descent property and do not require access to full (deterministic) gradient information. We leverage this framework to establish, for the first time, iterate convergence and the corresponding rates for the decentralized gradient method and federated averaging under mild assumptions. Furthermore, based on the new analysis techniques, we show the convergence of the random reshuffling and stochastic gradient descent method without necessitating typical a priori bounded iterates assumptions.
Submitted 4 June, 2024;
originally announced June 2024.