-
Benchmarking the readout of a superconducting qubit for repeated measurements
Authors:
S. Hazra,
W. Dai,
T. Connolly,
P. D. Kurilovich,
Z. Wang,
L. Frunzio,
M. H. Devoret
Abstract:
Readout of superconducting qubits faces a trade-off between measurement speed and unwanted back-action on the qubit caused by the readout drive, such as $T_1$ degradation and leakage out of the computational subspace. The readout is typically benchmarked by integrating the readout signal and choosing a binary threshold to extract the "readout fidelity". We show that such a characterization may significantly overlook readout-induced leakage errors. We introduce a method to quantitatively assess this error by repeatedly executing a composite operation -- a readout preceded by a randomized qubit-flip. We apply this technique to characterize the dispersive readout of an intrinsically Purcell-protected qubit. We report a binary readout fidelity of $99.63\%$ and quantum non-demolition (QND) fidelity exceeding $99.00\%$ which takes into account a leakage error rate of $0.12\pm0.03\%$, under a repetition rate of $(380 \rm{ns})^{-1}$ for the composite operation.
Submitted 15 July, 2024;
originally announced July 2024.
-
Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation
Authors:
Han Li,
Shaohui Li,
Shuangrui Ding,
Wenrui Dai,
Maida Cao,
Chenglin Li,
Junni Zou,
Hongkai Xiong
Abstract:
Image compression for machine and human vision (ICMH) has gained increasing attention in recent years. Existing ICMH methods are limited by high training and storage overheads due to heavy design of task-specific networks. To address this issue, in this paper, we develop a novel lightweight adapter-based tuning framework for ICMH, named Adapt-ICMH, that better balances task performance and bitrates with reduced overheads. We propose a spatial-frequency modulation adapter (SFMA) that simultaneously eliminates non-semantic redundancy with a spatial modulation adapter, and enhances task-relevant frequency components and suppresses task-irrelevant frequency components with a frequency modulation adapter. The proposed adapter is plug-and-play and compatible with almost all existing learned image compression models without compromising the performance of pre-trained models. Experiments demonstrate that Adapt-ICMH consistently outperforms existing ICMH frameworks on various machine vision tasks with fewer fine-tuned parameters and reduced computational complexity. Code will be released at https://github.com/qingshi9974/ECCV2024-AdpatICMH .
Submitted 13 July, 2024;
originally announced July 2024.
-
Constraining Holographic Dark Energy and Analyzing Cosmological Tensions
Authors:
Xin Tang,
Yin-Zhe Ma,
Wei-Ming Dai,
Hong-Jian He
Abstract:
We investigate cosmological constraints on the holographic dark energy (HDE) using the state-of-the-art cosmological datasets: Planck CMB angular power spectra and weak lensing power spectra, Atacama Cosmology Telescope (ACT) temperature power spectra, baryon acoustic oscillation (BAO) and redshift-space distortion (RSD) measurements from the six-degree-field galaxy survey and Sloan Digital Sky Survey (DR12 & DR16), and the Cepheids-Supernovae measurement from the SH0ES team (R22). We also examine the HDE model and $Λ$CDM with and without $N_{\rm eff}$ (effective number of relativistic species) being treated as a free parameter. We find that the HDE model can relieve the tensions of $H_0$ and $S_8$ to certain degrees. With the "Planck+ACT+BAO+RSD" datasets, the constraints are $H_0 = 69.70 \pm 1.39\ \mathrm{km\ s^{-1} Mpc^{-1}}$ and $S_8 = 0.823 \pm 0.011$ in the HDE model, which brings the Hubble tension down to the $1.92σ$ confidence level (C.L.) and the $S_8$ tension to $1$-$2σ$ C.L. Adding the R22 data improves the values to $H_0 = 71.86 \pm 0.93 \,\mathrm{km\ s^{-1} Mpc^{-1}}$ and $S_8 = 0.813 \pm 0.010$, which further brings the Hubble tension down to $0.85σ$ C.L. and relieves the $S_{8}$ tension. We also quantify the goodness-of-fit of different models with the Akaike information criterion (AIC) and Bayesian information criterion (BIC), and find that the HDE agrees with the observational data better than the $Λ$CDM and other extended models (treating $N_{\rm eff}$ as free for fitting).
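One rough way to read the tension figures quoted in this abstract is to treat the two $H_0$ values as independent Gaussian measurements and divide their difference by the combined uncertainty. The sketch below does only that. The comparison value $H_0 = 73.04 \pm 1.04$ for R22 is an assumption not stated in the abstract, the function name is hypothetical, and the paper's own tension estimator may differ.

```python
import math


def gaussian_tension(mean_a, sigma_a, mean_b, sigma_b):
    """Tension in units of sigma between two independent Gaussian measurements."""
    return abs(mean_a - mean_b) / math.sqrt(sigma_a ** 2 + sigma_b ** 2)


# Assumed local-distance-ladder value (R22); not given in the abstract.
R22_H0, R22_SIGMA = 73.04, 1.04

# HDE constraints quoted in the abstract.
without_r22 = gaussian_tension(69.70, 1.39, R22_H0, R22_SIGMA)
with_r22 = gaussian_tension(71.86, 0.93, R22_H0, R22_SIGMA)

print(f"Planck+ACT+BAO+RSD:     {without_r22:.2f} sigma")
print(f"Planck+ACT+BAO+RSD+R22: {with_r22:.2f} sigma")
```

Under these assumptions the two results round to the 1.92σ and 0.85σ quoted in the abstract.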
Submitted 11 July, 2024;
originally announced July 2024.
-
SvANet: A Scale-variant Attention-based Network for Small Medical Object Segmentation
Authors:
Wei Dai
Abstract:
Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. A mild syndrome with small infected regions is an ominous warning and is foremost in the early diagnosis of diseases. Deep learning algorithms, such as convolutional neural networks (CNNs), have been used to segment natural or medical objects, showing promising results. However, analyzing medical objects of small areas in images remains a challenge due to information losses and compression defects caused by convolution and pooling operations in CNNs. These losses and defects become increasingly significant as the network deepens, particularly for small medical objects. To address these challenges, we propose a novel scale-variant attention-based network (SvANet) for accurate small-scale object segmentation in medical images. The SvANet consists of Monte Carlo attention, scale-variant attention, and vision transformer, which incorporates cross-scale features and alleviates compression artifacts for enhancing the discrimination of small medical objects. Quantitative experimental results demonstrate the superior performance of SvANet, achieving 96.12%, 96.11%, 89.79%, 84.15%, 80.25%, 73.05%, and 72.58% in mean Dice coefficient for segmenting kidney tumors, skin lesions, hepatic tumors, polyps, surgical excision cells, retinal vasculatures, and sperms, which occupy less than 1% of the image areas in KiTS23, ISIC 2018, ATLAS, PolypGen, TissueNet, FIVES, and SpermHealth datasets, respectively.
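For readers unfamiliar with the metric reported above, the Dice coefficient of two binary masks $A$ and $B$ is $2|A \cap B| / (|A| + |B|)$. The following is a minimal sketch of that standard definition on nested lists. The function name and the toy masks are illustrative assumptions and are not taken from the SvANet code.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / total


# A toy 4x4 image containing a two-pixel object (hypothetical data).
target = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
# The prediction is shifted by one pixel, so only one pixel overlaps.
pred = [[0, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

print(dice_coefficient(pred, target))
```

A one-pixel shift already halves the score for an object this small, which illustrates why small-object segmentation is sensitive to the information losses the abstract describes.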
Submitted 10 July, 2024;
originally announced July 2024.
-
Anisotropic Finsler $N$-Laplacian Liouville equation in convex cones
Authors:
Wei Dai,
Changfeng Gui,
YunPeng Luo
Abstract:
We consider the anisotropic Finsler $N$-Laplacian Liouville equation \[-Δ^{H}_{N}u=e^u \qquad {\rm{in}}\,\, \mathcal{C},\] where $N\geq2$, $\mathcal{C}\subseteq\mathbb{R}^{N}$ is an open convex cone including $\mathbb{R}^{N}$, the half space $\mathbb{R}^{N}_{+}$ and $\frac{1}{2^{m}}$-space $\mathbb{R}^{N}_{2^{-m}}:=\{x\in\mathbb{R}^{N}\mid x_{1},\cdots,x_{m}>0\}$ ($m=1,\cdots,N$), and the anisotropic Finsler $N$-Laplacian $Δ^{H}_{N}$ is induced by a positively homogeneous function $H(x)$ of degree $1$. All solutions to the Finsler $N$-Laplacian Liouville equation with finite mass are completely classified. In particular, if $H(ξ)=|ξ|$, then the Finsler $N$-Laplacian $Δ^{H}_{N}$ reduces to the regular $N$-Laplacian $Δ_N$. Our result is a counterpart in the limiting case $p=N$ of the classification results in \cite{CFR} for the critical anisotropic $p$-Laplacian equations with $1<p<N$ in convex cones, and also extends the classification results in \cite{CK,CL,CW,CL2,E} for Liouville equation in the whole space $\mathbb{R}^{N}$ to general convex cones. In our proof, besides exploiting the anisotropic isoperimetric inequality inside convex cones, we have also proved and applied the radial Poincaré type inequality (Lemma \ref{A1}), which are key ingredients in the proof and of their own importance and interests.
Submitted 6 July, 2024;
originally announced July 2024.
-
Stationary and Sparse Denoising Approach for Corticomuscular Causality Estimation
Authors:
Farwa Abbas,
Verity McClelland,
Zoran Cvetkovic,
Wei Dai
Abstract:
Objective: Cortico-muscular communication patterns are instrumental in understanding movement control. Estimating significant causal relationships between motor cortex electroencephalogram (EEG) and surface electromyogram (sEMG) from concurrently active muscles presents a formidable challenge since the relevant processes underlying muscle control are typically weak in comparison to measurement noise and background activities. Methodology: In this paper, a novel framework is proposed to simultaneously estimate the order of the autoregressive model of cortico-muscular interactions along with the parameters while enforcing a stationarity condition in a convex program to ensure global optimality. The proposed method is further extended to a non-convex program to account for the presence of measurement noise in the recorded signals by introducing a wavelet sparsity assumption on the excitation noise in the model. Results: The proposed methodology is validated using both simulated data and neurophysiological signals. In the case of simulated data, the performance of the proposed methods is compared with the benchmark approaches in terms of order identification, computational efficiency, and goodness of fit in relation to various noise levels. In the case of physiological signals, the proposed methods are compared against the state-of-the-art approaches in terms of the ability to detect Granger causality. Significance: The proposed methods are shown to be effective in handling stationarity and measurement noise assumptions, revealing significant causal interactions from brain to muscles and vice versa.
Submitted 24 June, 2024;
originally announced June 2024.
-
Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
Authors:
Wang Dai,
Xiaofei Li,
Archontis Politis,
Tuomas Virtanen
Abstract:
In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. The limitation is particularly evident in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances, or with compact, highly directional microphone arrays where speaker or microphone positions change over time. Current mask-based methods often fix the reference channel during training, which makes it impossible to adaptively select the reference channel for optimal performance. To address this problem, we introduce an adaptive approach for selecting the optimal reference channel. Our method leverages a multi-channel masking-based scheme, where multiple masked signals are combined to generate a single-channel output signal. This enhanced signal is then used for loss calculation, while the reference clean speech is adjusted based on the highest scale-invariant signal-to-distortion ratio (SI-SDR). The experimental results on the Spear challenge simulated dataset D4 demonstrate the superiority of our proposed method over the conventional approach of using a fixed reference channel with single-channel masking.
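The selection rule described in this abstract can be sketched in a few lines: compute the SI-SDR of the enhanced signal against the clean speech at each microphone, then keep the channel with the highest score as the training target. The code below uses the standard SI-SDR definition on plain Python lists. The function names, the toy signals, and the `eps` stabiliser are assumptions for illustration and are not taken from the authors' implementation.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)          # optimal scaling of the reference
    target = [scale * r for r in reference]   # projection of estimate onto reference
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def select_reference(estimate, clean_channels):
    """Return (index, scores): the clean channel with the highest SI-SDR."""
    scores = [si_sdr(estimate, clean) for clean in clean_channels]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical clean speech at two microphones and one enhanced output.
clean_channels = [
    [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0],
    [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0],
]
enhanced = [0.1, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]

best_channel, channel_scores = select_reference(enhanced, clean_channels)
print(best_channel, [round(s, 2) for s in channel_scores])
```

Because SI-SDR is invariant to rescaling of the estimate, the choice of reference does not depend on the output gain of the network.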
Submitted 11 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning
Authors:
Depeng Li,
Tianqi Wang,
Junwei Chen,
Wei Dai,
Zhigang Zeng
Abstract:
Class-incremental learning (CIL) aims to train a model to learn new classes from non-stationary data streams without forgetting old ones. In this paper, we propose a new kind of connectionist model by tailoring neural unit dynamics that adapt the behavior of neural networks for CIL. In each training session, it introduces a supervisory mechanism to guide network expansion whose growth size is compactly commensurate with the intrinsic complexity of a newly arriving task. This constructs a near-minimal network while allowing the model to expand its capacity when it cannot sufficiently hold new classes. At inference time, it automatically reactivates the required neural units to retrieve knowledge and leaves the remaining units inactivated to prevent interference. We name our model AutoActivator, which is effective and scalable. To gain insights into the neural unit dynamics, we theoretically analyze the model's convergence property via a universal approximation theorem on learning sequential mappings, which is under-explored in the CIL community. Experiments show that our method achieves strong CIL performance in rehearsal-free and minimal-expansion settings with different backbones.
Submitted 4 June, 2024;
originally announced June 2024.
-
ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation
Authors:
Guanxing Lu,
Zifeng Gao,
Tianxing Chen,
Wenxun Dai,
Ziwei Wang,
Yansong Tang
Abstract:
Diffusion models have been verified to be effective in generating complex distributions from natural images to motion trajectories. Recent diffusion-based methods show impressive performance in 3D robotic manipulation tasks, but they suffer from severe runtime inefficiency due to multiple denoising steps, especially with high-dimensional observations. To this end, we propose a real-time robotic manipulation model named ManiCM that imposes the consistency constraint on the diffusion process, so that the model can generate robot actions with only one-step inference. Specifically, we formulate a consistent diffusion process in the robot action space conditioned on the point cloud input, where the original action is required to be directly denoised from any point along the ODE trajectory. To model this process, we design a consistency distillation technique that predicts the action sample directly, instead of predicting the noise as is common in the vision community, for fast convergence in the low-dimensional action manifold. We evaluate ManiCM on 31 robotic manipulation tasks from Adroit and Metaworld, and the results demonstrate that our approach accelerates the state-of-the-art method by 10 times in average inference speed while maintaining a competitive average success rate.
Submitted 3 June, 2024;
originally announced June 2024.
-
"Stumbling-to-Fetters" mechanism and Virginia Creeper model in hydrogel for designing bionic cardiovascular system
Authors:
Hanqing Dai,
Wenqing Dai,
Yuanyuan Chen,
Wanlu Zhang,
Yimeng Wang,
Ruiqian Guo,
Guoqi Zhang
Abstract:
Manufacturing hydrogels with identical electrochemical properties is typically riddled with unresolved inquiries and challenges. Here, we utilized ultra-light graphene flakes to trace the influence of convection phenomena during reactions on hydrogels' formation and structural non-uniformity, elucidating its mechanisms. Furthermore, we confirmed that an external electric field induced the orientation of functional groups of hydrogels along the direction of this field, revealing the mechanism of its influence on the structural non-uniformity and electrochemical properties of hydrogels. Additionally, we discovered that ion diffusion was "Stumbling-to-Fetters" by the functional groups on the polymer chains within the hydrogel, unveiling this mechanism and developing the Virginia Creeper (VC) model for hydrogels. We demonstrated the scalability and application of the VC model. Furthermore, we proposed a molecular-ion diffusion and current decay equation to describe the electrochemical properties of hydrogels. As an application of the VC model, we developed a bionic cardiovascular system and proved its potential to seamlessly interface with living organisms and generate bio-like bioelectricity. Our findings provide novel insights into triboelectricity and guidance for producing hydrogels with identical electrochemical properties, and offer a new pathway for bioelectric generation and the design of new hydrogel devices.
Submitted 29 May, 2024;
originally announced May 2024.
-
Prototype Design of a Digital Low-level RF System for S-band Deflectors
Authors:
J. F. Zhu,
H. L. Ding,
H. K. Li,
Y. Li,
X. W. Dai,
J. W. Han,
W. Q. Zhang
Abstract:
S-band deflectors are generally operated in pulsed mode for beam diagnosis. We plan to deploy 5 S-band (2997 MHz) deflectors to accurately measure the longitudinal time distribution of ultra-short electron beam pulses in the Shenzhen Superconducting Soft X-ray Free Electron Laser (S3FEL). The microwave system of one deflector consists of a low-level RF (LLRF) system, a solid-state amplifier, waveguide couplers, and a klystron, operated in pulsed mode with a maximum repetition frequency of 50 Hz. Its microwave amplitude and phase stability must be better than 0.06%/0.08° (RMS). This article introduces the prototype design of the hardware, firmware, and software of the digital LLRF system. We use homemade Local Oscillators (LOs) and commercial cards based on the MicroTCA standard in the hardware design. The firmware design will use non-IQ demodulation and a pulse feedforward algorithm to suppress noise from the klystron high voltage. The software design is based on the EPICS control system architecture, achieving slow control and interface display functions. This report also shows some preliminary test results.
Submitted 15 May, 2024;
originally announced May 2024.
-
A Low-rank Projected Proximal Gradient Method for Spectral Compressed Sensing
Authors:
Xi Yao,
Wei Dai
Abstract:
This paper presents a new approach to the recovery of a spectrally sparse signal (SSS) from partially observed entries, focusing on challenges posed by large-scale data and heavy noise environments. The SSS reconstruction can be formulated as a non-convex low-rank Hankel recovery problem. Traditional formulations for SSS recovery often suffer from reconstruction inaccuracies due to unequally weighted norms and over-relaxation of the Hankel structure in noisy conditions. Moreover, a critical limitation of standard proximal gradient (PG) methods for solving the optimization problem is their slow convergence. We overcome this by introducing a more accurate formulation and a Low-rank Projected Proximal Gradient (LPPG) method, designed to efficiently converge to stationary points through a two-step process. The first step involves a modified PG approach, allowing for a constant step size independent of signal size, which significantly accelerates the gradient descent phase. The second step employs a subspace projection strategy, optimizing within a low-rank matrix space to further decrease the objective function. Both steps of the LPPG method are meticulously tailored to exploit the intrinsic low-rank and Hankel structures of the problem, thereby enhancing computational efficiency. Our numerical simulations reveal a substantial improvement in both the efficiency and recovery accuracy of the LPPG method compared to existing benchmark algorithms. This performance gain is particularly pronounced in scenarios with significant noise, demonstrating the method's robustness and applicability to large-scale SSS recovery tasks.
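The low-rank Hankel formulation mentioned in this abstract rests on a standard fact: a signal made of $r$ complex sinusoids with distinct frequencies yields a Hankel matrix of rank $r$, provided the matrix has at least $r$ rows and columns. The sketch below checks this numerically with the standard library only. All names, frequencies, and amplitudes are hypothetical, and this is not the LPPG algorithm itself, only the structure it exploits.

```python
import cmath


def spectrally_sparse_signal(freqs, amps, length):
    """Sum of complex exponentials sampled at t = 0, 1, ..., length - 1."""
    return [sum(a * cmath.exp(2j * cmath.pi * f * t) for f, a in zip(freqs, amps))
            for t in range(length)]


def hankel(signal, rows):
    """Hankel matrix H[i][j] = signal[i + j] with the given number of rows."""
    cols = len(signal) - rows + 1
    return [[signal[i + j] for j in range(cols)] for i in range(rows)]


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Three sinusoids, 16 samples, arranged as an 8 x 9 Hankel matrix.
signal = spectrally_sparse_signal([0.10, 0.23, 0.37], [1.0, 0.8, 0.5], 16)
H = hankel(signal, 8)
print(len(H), len(H[0]), numerical_rank(H))
```

The rank equals the number of sinusoids rather than the matrix size, which is why recovery from partial observations can be posed as a low-rank problem.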
Submitted 13 May, 2024;
originally announced May 2024.
-
Search for solar axions by Primakoff effect with the full dataset of the CDEX-1B Experiment
Authors:
L. T. Yang,
S. K. Liu,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
J. R. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (61 additional authors not shown)
Abstract:
We present the first limit on the $g_{Aγ}$ coupling constant using the Bragg-Primakoff conversion, based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. A limit on the coupling of $g_{Aγ}<2.08\times10^{-9}$ GeV$^{-1}$ (95% C.L.) is derived for axions with mass up to 100 eV/$c^2$. Within the hadronic model of KSVZ, our results exclude axion mass $>5.3~\rm{eV}/c^2$ at 95% C.L.
Submitted 12 May, 2024;
originally announced May 2024.
-
Ground-state properties of dipolar Bose-Einstein condensates with spin-orbit coupling and quantum fluctuations
Authors:
Xianghua Su,
Wenting Dai,
Tianyu Li,
Jiyuan Wang,
Linghua Wen
Abstract:
We study the ground-state properties of dipolar spin-1/2 Bose-Einstein condensates with quantum fluctuations and Rashba spin-orbit coupling (SOC). The combined effects of dipole-dipole interaction (DDI), SOC, and Lee-Huang-Yang (LHY) correction induced by quantum fluctuations on the ground-state structures and spin textures of the system are analyzed and discussed. For the nonrotating case and fixed nonlinear interspecies contact interaction strengths, our results show that structural phase transitions can be achieved by adjusting the strengths of the DDI and LHY correction. In the absence of SOC, a ground-state phase diagram is given with respect to the DDI strength and the LHY correction strength. We find that the system exhibits rich quantum phases including square droplet lattice phase, annular phase, loop-island structure, stripe-droplet coexistence phase, toroidal stripe phase, and Thomas-Fermi (TF) phase. For the rotating case, the increase of DDI strength can lead to a quantum phase transition from superfluid phase to supersolid phase. In the presence of SOC, the quantum droplets display obvious stretching and hidden vortex-antivortex clusters are formed in each component. In particular, weak or moderate SOC favors the formation of droplets while for strong SOC the ground state of the system develops into a stripe phase with hidden vortex-antivortex clusters. Furthermore, the system sustains exotic spin textures and topological excitations, such as composite skyrmion-antiskyrmion-meron-antimeron cluster, meron-antimeron string cluster, antimeron-meron-antimeron chain cluster, and peculiar skyrmion-antiskyrmion-meron-antimeron necklace with a meron-antimeron necklace embedded inside and a central spin Neel domain wall.
Submitted 8 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
Authors:
Wenxun Dai,
Ling-Hao Chen,
Jingbo Wang,
Jinpeng Liu,
Bo Dai,
Yansong Tang
Abstract:
This work introduces MotionLCM, extending controllable motion generation to a real-time level. Existing methods for spatial control in text-conditioned motion generation suffer from significant runtime inefficiency. To address this issue, we first propose the motion latent consistency model (MotionLCM) for motion generation, building upon the latent diffusion model (MLD). By employing one-step (or few-step) inference, we further improve the runtime efficiency of the motion latent diffusion model for motion generation. To ensure effective controllability, we incorporate a motion ControlNet within the latent space of MotionLCM and enable explicit control signals (e.g., pelvis trajectory) in the vanilla motion space to control the generation process directly, similar to controlling other latent-free diffusion models for motion generation. By employing these techniques, our approach can generate human motions with text and control signals in real-time. Experimental results demonstrate the remarkable generation and controlling capabilities of MotionLCM while maintaining real-time runtime efficiency.
Submitted 30 April, 2024;
originally announced April 2024.
-
1st Place Solution for ICCV 2023 OmniObject3D Challenge: Sparse-View Reconstruction
Authors:
Hang Du,
Yaping Xue,
Weidong Dai,
Xuejun Yan,
Jingjing Wang
Abstract:
In this report, we present the 1st place solution for ICCV 2023 OmniObject3D Challenge: Sparse-View Reconstruction. The challenge aims to evaluate approaches for novel view synthesis and surface reconstruction using only a few posed images of each object. We utilize Pixel-NeRF as the basic model, and apply depth supervision as well as coarse-to-fine positional encoding. The experiments demonstrate the effectiveness of our approach in improving sparse-view reconstruction quality. We ranked first in the final test with a PSNR of 25.44614.
Submitted 16 April, 2024;
originally announced April 2024.
-
First Search for Light Fermionic Dark Matter Absorption on Electrons Using Germanium Detector in CDEX-10 Experiment
Authors:
J. X. Liu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
J. R. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (61 additional authors not shown)
Abstract:
We present the first results of the search for sub-MeV fermionic dark matter absorbed by electron targets of Germanium using the 205.4~kg$\cdot$day data collected by the CDEX-10 experiment, with the analysis threshold of 160~eVee. No significant dark matter (DM) signals over the background are observed. Results are presented as limits on the cross section of DM--electron interaction. We present new constraints of cross section in the DM range of 0.1--10 keV/$c^2$ for vector and axial-vector interaction. The upper limit on the cross section is set to be $\rm 5.5\times10^{-46}~cm^2$ for vector interaction, and $\rm 1.8\times10^{-46}~cm^2$ for axial-vector interaction at DM mass of 5 keV/$c^2$.
Submitted 15 April, 2024;
originally announced April 2024.
-
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
Authors:
Shiyi Zhang,
Wenxun Dai,
Sujia Wang,
Xiangwei Shen,
Jiwen Lu,
Jie Zhou,
Yansong Tang
Abstract:
Action quality assessment (AQA) has become an emerging topic since it can be extensively applied in numerous scenarios. However, most existing methods and datasets focus on single-person short-sequence scenes, hindering the application of AQA in more complex situations. To address this issue, we construct a new multi-person long-form video dataset for action quality assessment named LOGO. Distinguished in scenario complexity, our dataset contains 200 videos from 26 artistic swimming events with 8 athletes in each sample along with an average duration of 204.2 seconds. As for richness in annotations, LOGO includes formation labels to depict group information of multiple athletes and detailed annotations on action procedures. Furthermore, we propose a simple yet effective method to model relations among athletes and reason about the potential temporal logic in long-form videos. Specifically, we design a group-aware attention module, which can be easily plugged into existing AQA methods, to enrich the clip-wise representations based on contextual group information. To benchmark LOGO, we systematically conduct investigations on the performance of several popular methods in AQA and action segmentation. The results reveal the challenges our dataset brings. Extensive experiments also show that our approach achieves state-of-the-art on the LOGO dataset. The dataset and code will be released at \url{https://github.com/shiyi-zh0408/LOGO }.
Submitted 7 April, 2024;
originally announced April 2024.
-
Radio Frequency Interference Detection Using Efficient Multi-Scale Convolutional Attention UNet
Authors:
Fei Gu,
Longfei Hao,
Bo Liang,
Song Feng,
Shoulin Wei,
Wei Dai,
Yonghua Xu,
Zhixuan Li,
Yihang Dao
Abstract:
Studying the universe through radio telescope observation is crucial. However, radio telescopes capture not only signals from the universe but also various interfering signals, known as Radio Frequency Interference (RFI). The presence of RFI can significantly impact data analysis. Ensuring the accuracy, reliability, and scientific integrity of research findings by detecting and mitigating or eliminating RFI in observational data, presents a persistent challenge in radio astronomy. In this study, we proposed a novel deep learning model called EMSCA-UNet for RFI detection. The model employs multi-scale convolutional operations to extract RFI features of various scale sizes. Additionally, an attention mechanism is utilized to assign different weights to the extracted RFI feature maps, enabling the model to focus on vital features for RFI detection. We evaluated the performance of the model using real data observed from the 40-meter radio telescope at Yunnan Observatory. Furthermore, we compared our results to other models, including U-Net, RFI-Net, and R-Net, using four commonly employed evaluation metrics: precision, recall, F1 score, and IoU. The results demonstrate that our model outperforms the other models on all evaluation metrics, achieving an average improvement of approximately 5\% compared to U-Net. Our model not only enhances the accuracy and comprehensiveness of RFI detection but also provides more detailed edge detection while minimizing the loss of useful signals.
Submitted 30 March, 2024;
originally announced April 2024.
-
Constraints on the Blazar-Boosted Dark Matter from the CDEX-10 Experiment
Authors:
R. Xu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for DM masses between 10 keV and 1 GeV, and the results derived from BL Lacertae exclude DM-nucleon elastic scattering cross sections from $2.4\times 10^{-34}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for the same range of DM masses. The constraints correspond to the best sensitivities among solid-state detector experiments in the sub-MeV mass range.
Submitted 29 March, 2024;
originally announced March 2024.
-
Probing Dark Matter Particles from Evaporating Primordial Black Holes via Electron Scattering in the CDEX-10 Experiment
Authors:
Z. H. Zhang,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
Dark matter (DM) is a major constituent of the Universe. However, no definite evidence of DM particles (denoted as $χ$) has been found in DM direct detection (DD) experiments to date. A novel concept is to detect $χ$ from evaporating primordial black holes (PBHs). We search for $χ$ emitted from PBHs by investigating their interaction with target electrons. The examined PBH masses range from 1$\times$10$^{15}$ to 7$\times$10$^{16}$ g under the current limits of PBH abundance $f_{PBH}$. Using 205.4 kg$\cdot$day data obtained from the CDEX-10 experiment conducted in the China Jinping Underground Laboratory, we exclude the $χ$--electron ($χ$--$e$) elastic-scattering cross section $σ_{χe} \sim 5\times10^{-29}$ cm$^2$ for $χ$ with a mass $m_χ\lesssim$ 0.1 keV. If ($m_χ$, $σ_{χe}$) can be determined in the future, DD experiments are expected to impose strong constraints on $f_{PBH}$ for large $M_{PBH}$s.
Submitted 29 March, 2024;
originally announced March 2024.
-
Towards Scalable Semidefinite Programming: Optimal Metric ADMM with A Worst-case Performance Guarantee
Authors:
Yifan Ran,
Stefan Vlaski,
Wei Dai
Abstract:
Despite the numerous uses of semidefinite programming (SDP) and its universal solvability via interior point methods (IPMs), it is rarely applied to practical large-scale problems. This is mainly owing to the computational cost of IPMs, which increases in a bad exponential way with the data size. While first-order algorithms such as ADMM can alleviate this issue, the scalability improvement appears far from sufficient. In this work, we aim to achieve extra acceleration for ADMM by appealing to a non-Euclidean metric space, while maintaining everything in closed-form expressions. The efficiency gain comes from the extra degrees of freedom of a variable metric compared to a scalar step-size, which allows us to capture some additional ill-conditioning structures.
On the application side, we consider the quadratically constrained quadratic program (QCQP), which naturally appears in an SDP form after a dualization procedure. This technique, known as semidefinite relaxation, has important uses across different fields, particularly in wireless communications. Numerically, we observe that the scalability property is significantly improved. Depending on the data generation process, the extra acceleration can easily surpass the scalar-parameter efficiency limit, and the advantage is rapidly increasing as the data conditioning becomes worse.
Submitted 18 March, 2024;
originally announced March 2024.
-
Thermal oscillations and resonance in electron-phonon interaction process
Authors:
Emad Awad,
Weizhong Dai,
Sergey Sobolev
Abstract:
Thermal resonance, in which the temperature amplitude attains a maximum value (peak) in response to an external exciting frequency source, is a phenomenon pertinent to the presence of underdamped thermal oscillations and explicit finite-speed for the thermal wave propagation. The present work investigates the occurrence condition for thermal resonance phenomenon during the electron-phonon interaction process in metals based on the hyperbolic two-temperature model. First, a sufficient condition for underdamped electron and lattice temperature oscillations is discussed by deriving a critical frequency (a material characteristic). It is shown that the critical frequency of thermal waves near room temperature, during electron-phonon interactions, may be on the order of terahertz ($10-20$ THz for Cu and Au, i.e., lying within the terahertz gap). It is found that whenever the natural frequency of metal temperature exceeds this frequency threshold, the temperature oscillations are of underdamped type. However, this condition is not necessary, since there is a small frequency domain, below this threshold, in which the underdamped thermal wave solution is available but not effective. Otherwise, the critical damping and the overdamping conditions of the temperature waves are determined numerically for a sample of pure metals. The thermal resonance conditions in both electron and lattice temperatures are investigated. The occurrence of resonance in both electron and lattice temperature is conditional on violating two distinct critical values of frequencies. When the natural frequency of the system becomes larger than these two critical values, an applied frequency equal to such a natural frequency can drive both electron and lattice temperatures to resonate together with different amplitudes and behaviors. However, the electron temperature resonates earlier than the lattice temperature.
Submitted 9 February, 2024;
originally announced March 2024.
-
Broadband NIR photon upconversion generates NIR persistent luminescence for bioimaging
Authors:
Shuting Yang,
Bing Qi,
Mingzi Sun,
Wenjing Dai,
Ziyun Miao,
Wei Zheng,
Bolong Huang,
Jie Wang
Abstract:
Upconversion persistent luminescence (UCPL) phosphors that can be directly charged by near-infrared (NIR) light have gained considerable attention due to their promising applications ranging from photonics to biomedicine. However, current lanthanide-based UCPL phosphors show small absorption cross-sections and low upconversion charging efficiency. The development of UCPL phosphors is challenged by a lack of flexible upconversion charging pathways and poor design flexibility. Herein, we discovered a new lattice defect-mediated broadband photon upconversion process and the accompanying NIR-to-NIR UCPL in Cr-doped zinc gallate nanoparticles. The zinc gallate nanoparticles can be directly activated by broadband NIR light in the 700-1000 nm range to produce persistent luminescence at about 700 nm, which is also readily enhanced by rationally tailoring the lattice defects in the phosphors. The proposed UCPL phosphors achieved a signal-to-background ratio of over 200 in bioimaging by efficiently avoiding interference from autofluorescence and light scattering. Our findings reported the lattice defect-mediated photon upconversion for the first time, which significantly expanded the horizons for the flexible design of NIR-to-NIR UCPL phosphors toward broad applications.
Submitted 14 March, 2024;
originally announced March 2024.
-
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Authors:
Chenyu You,
Yifei Min,
Weicheng Dai,
Jasjeet S. Sekhon,
Lawrence Staib,
James S. Duncan
Abstract:
Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse downstream tasks. However, several pain points persist for this paradigm: (i) directly tuning entire pre-trained models becomes both time-intensive and computationally costly. Additionally, these tuned models tend to become highly specialized, limiting their practicality for real-world deployment; (ii) recent studies indicate that pre-trained vision-language classifiers may overly depend on spurious features -- patterns that correlate with the target in training data, but are not related to the true labeling function; and (iii) existing studies on mitigating the reliance on spurious features, largely based on the assumption that we can identify such features, do not provide definitive assurance for real-world applications. As a pilot study, this work focuses on mitigating the reliance on spurious features for CLIP without using any group annotation. To this end, we systematically study the existence of spurious correlation on CLIP and CLIP+ERM. Following recent work on Deep Feature Reweighting (DFR), we first verify that last-layer retraining can greatly improve group robustness on pretrained CLIP. In view of this, we advocate a lightweight representation calibration method for fine-tuning CLIP, by first generating a calibration set using the pretrained CLIP, and then calibrating representations of samples within this set through contrastive learning, all without the need for group labels. Extensive experiments and in-depth visualizations on several benchmarks validate the effectiveness of our proposals, largely reducing reliance on spurious features and significantly boosting the model generalization.
Submitted 11 March, 2024;
originally announced March 2024.
-
Organic solvent boosts charge storage and charging dynamics of conductive MOF supercapacitors
Authors:
Ming Chen,
Taizheng Wu,
Liang Niu,
Ting Ye,
Wenlei Dai,
Liang Zeng,
Alexei A. Kornyshev,
Zhenxiang Wang,
Zhou Liu,
Guang Feng
Abstract:
Conductive metal-organic frameworks (c-MOFs) and ionic liquids (ILs) have emerged as auspicious combinations for high-performance supercapacitors. However, the nanoconfinement from c-MOFs and high viscosity of ILs slow down the charging process. This hindrance can, however, be resolved by adding solvent. Here, we performed constant-potential molecular simulations to scrutinize the solvent impact on charge storage and charging dynamics of MOF-IL-based supercapacitors. We find conditions for >100% enhancement in capacity and ~6 times increase in charging speed. These improvements were confirmed by synthesizing near-ideal c-MOFs and developing multiscale models linking molecular simulations to electrochemical measurements. Fundamentally, our findings elucidate that the solvent acts as an ionophobic agent to induce a substantial enhancement in charge storage, and as an ion traffic police to eliminate convoluted counterion and co-ion motion paths and create two distinct ion transport highways to accelerate charging dynamics. This work paves the way for the optimal design of MOF supercapacitors.
Submitted 2 March, 2024;
originally announced March 2024.
-
Proximal Dogleg Opportunistic Majorization for Nonconvex and Nonsmooth Optimization
Authors:
Yiming Zhou,
Wei Dai
Abstract:
We consider minimizing a function consisting of a quadratic term and a proximable term which is possibly nonconvex and nonsmooth. This problem is also known as the scaled proximal operator. Despite its simple form, existing methods suffer from slow convergence or high implementation complexity or both. To overcome these limitations, we develop a fast and user-friendly second-order proximal algorithm. The key innovation involves building and solving a series of opportunistically majorized problems along a hybrid Newton direction. The approach directly uses the precise Hessian of the quadratic term, and calculates the inverse only once, eliminating the iterative numerical approximation of the Hessian, a common practice in quasi-Newton methods. The algorithm's convergence to a critical point is established, and a local convergence rate is derived based on the Kurdyka-Lojasiewicz property of the objective function. Numerical comparisons are conducted on well-known optimization problems. The results demonstrate that the proposed algorithm not only achieves faster convergence but also tends to converge to a better local optimum compared to benchmark algorithms.
Submitted 29 February, 2024;
originally announced February 2024.
-
Radial symmetry and sharp asymptotic behaviors of nonnegative solutions to $D^{1,p}$-critical quasi-linear static Schrödinger-Hartree equation involving $p$-Laplacian $-Δ_{p}$
Authors:
Wei Dai,
Yafei Li,
Zhao Liu
Abstract:
In this paper, we mainly consider nonnegative weak solutions to the $D^{1,p}(\R^{N})$-critical quasi-linear static Schrödinger-Hartree equation with $p$-Laplacian $-Δ_{p}$ and nonlocal nonlinearity: \begin{align*} -Δ_p u =\left(|x|^{-2p}\ast |u|^{p}\right)|u|^{p-2}u \qquad &\mbox{in} \,\, \mathbb{R}^N, \end{align*} where $1<p<\frac{N}{2}$, $N\geq3$ and $u\in D^{1,p}(\R^N)$. In contrast to the $D^{1,p}(\R^{N})$-critical local nonlinear term $u^{p^{\star}-1}$ with $p^{\star}:=\frac{Np}{N-p}$ investigated in \cite{CFR,LDSMLMSB,GV,Ou,BS16,VJ16} etc., the nonlocal convolution $|x|^{-2p}*u^p$ appears in the Hartree type nonlinearity, so it is impossible for us to use the scaling arguments and the Doubling Lemma as in \cite{VJ16} to get preliminary estimates on upper bounds of asymptotic behaviors for any positive solution $u$. Moreover, it is also quite difficult to obtain the boundedness of the quasi-norm $\|u \|_{L^{s,\infty}(\R^N)}$ and hence derive the sharp estimates on upper bounds of asymptotic behaviors from the preliminary estimates as in \cite{VJ16}. Fortunately, by showing better preliminary estimates on upper bounds of asymptotic behaviors through the De Giorgi-Moser-Nash iteration method and combining the result from \cite{XCL}, we are able to overcome these difficulties and establish regularity and the sharp estimates on both upper and lower bounds of asymptotic behaviors for any positive solution $u$ to the more general equation $-Δ_p u=V(x)u^{p-1}$ with $V\in L^{\frac{N}{p}}(\mathbb{R}^{N})$. Then, by using the arguments from \cite{BS16,VJ16}, we can deduce the sharp estimates on both upper and lower bounds for the decay rate of $|\nabla u|$. Finally, as a consequence, we can apply the method of moving planes to prove that all the nontrivial nonnegative solutions are radially symmetric and strictly decreasing about some point $x_0\in\R^N$.
Submitted 20 April, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance
Authors:
Xinyu Peng,
Ziyang Zheng,
Wenrui Dai,
Nuoqian Xiao,
Chenglin Li,
Junni Zou,
Hongkai Xiong
Abstract:
Recent diffusion models provide a promising zero-shot solution to noisy linear inverse problems without retraining for specific inverse problems. In this paper, we reveal that recent methods can be uniformly interpreted as employing a Gaussian approximation with hand-crafted isotropic covariance for the intractable denoising posterior to approximate the conditional posterior mean. Inspired by this finding, we propose to improve recent methods by using more principled covariance determined by maximum likelihood estimation. To achieve posterior covariance optimization without retraining, we provide general plug-and-play solutions based on two approaches specifically designed for leveraging pre-trained models with and without reverse covariance. We further propose a scalable method for learning posterior covariance prediction based on representation with orthonormal basis. Experimental results demonstrate that the proposed methods significantly enhance reconstruction performance without requiring hyperparameter tuning.
Submitted 2 June, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Characteristic initial value problem for nonlinear wave equation with singular initial data
Authors:
Wei Dai,
Shiwu Yang
Abstract:
In this paper, we study the characteristic initial value problem for a class of nonlinear wave equations with data on a conic light cone in the Minkowski space $\mathbb{R}^{1+3}$. We show the existence of a local solution for a class of singular initial data in the sense that the standard energy could be infinite and the solution may blow up at the conic point. As an application, we improve our previous result on the inverse scattering problem for the Maxwell-Klein-Gordon equations with scattering data on the future null infinity.
Submitted 31 January, 2024;
originally announced January 2024.
-
Distributionally Robust Receive Beamforming
Authors:
Shixiong Wang,
Wei Dai,
Geoffrey Ye Li
Abstract:
This article investigates signal estimation in wireless transmission (i.e., receive beamforming) from the perspective of statistical machine learning, where the transmit signals may be from an integrated sensing and communication system; that is, 1) signals may be not only discrete constellation points but also arbitrary complex values; 2) signals may be spatially correlated. Particular attention is paid to handling various uncertainties such as the uncertainty of the transmit signal covariance, the uncertainty of the channel matrix, the uncertainty of the channel noise covariance, the existence of channel impulse noises, and the limited sample size of pilots. To proceed, a distributionally robust machine learning framework that is insensitive to the above uncertainties is proposed, which reveals that channel estimation is not a necessary operation. For optimal linear estimation, the proposed framework includes several existing beamformers as special cases such as diagonal loading and eigenvalue thresholding. For optimal nonlinear estimation, estimators are limited in reproducing kernel Hilbert spaces and neural network function spaces, and corresponding uncertainty-aware solutions (e.g., kernelized diagonal loading) are derived. In addition, we prove that the ridge and kernel ridge regression methods in machine learning are distributionally robust against diagonal perturbation in feature covariance.
Submitted 10 June, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
MInD: Improving Multimodal Sentiment Analysis via Multimodal Information Disentanglement
Authors:
Weichen Dai,
Xingyu Li,
Pengbo Hu,
Zeyu Wang,
Ji Qi,
Jianlin Peng,
Yi Zhou
Abstract:
Learning effective joint representations has been a central task in multimodal sentiment analysis. Previous methods focus on leveraging the correlations between different modalities and enhancing performance through sophisticated fusion techniques. However, challenges still exist due to the inherent heterogeneity of distinct modalities, which may lead to a distributional gap, impeding the full exploitation of inter-modal information and resulting in redundancy and impurity in the information extracted from features. To address this problem, we introduce the Multimodal Information Disentanglement (MInD) approach. MInD decomposes the multimodal inputs into a modality-invariant component, a modality-specific component, and a remnant noise component for each modality through a shared encoder and multiple private encoders. The shared encoder aims to explore the shared information and commonality across modalities, while the private encoders are deployed to capture the distinctive information and characteristic features. These representations thus furnish a comprehensive perspective of the multimodal data, facilitating the fusion process instrumental for subsequent prediction tasks. Furthermore, MInD improves the learned representations by explicitly modeling the task-irrelevant noise in an adversarial manner. Experimental evaluations conducted on benchmark datasets, including CMU-MOSI, CMU-MOSEI, and UR-Funny, demonstrate MInD's superior performance over existing state-of-the-art methods in both multimodal emotion recognition and multimodal humor detection tasks.
Submitted 22 January, 2024;
originally announced January 2024.
-
HG3-NeRF: Hierarchical Geometric, Semantic, and Photometric Guided Neural Radiance Fields for Sparse View Inputs
Authors:
Zelin Gao,
Weichen Dai,
Yu Zhang
Abstract:
Neural Radiance Fields (NeRF) have garnered considerable attention as a paradigm for novel view synthesis by learning scene representations from discrete observations. Nevertheless, NeRF exhibits pronounced performance degradation when confronted with sparse view inputs, consequently curtailing its further applicability. In this work, we introduce Hierarchical Geometric, Semantic, and Photometric Guided NeRF (HG3-NeRF), a novel methodology that can address the aforementioned limitation and enhance consistency of geometry, semantic content, and appearance across different views. We propose Hierarchical Geometric Guidance (HGG) to incorporate the attachment of Structure from Motion (SfM), namely a sparse depth prior, into the scene representations. Different from direct depth supervision, HGG samples volume points from local-to-global geometric regions, mitigating the misalignment caused by inherent bias in the depth prior. Furthermore, we draw inspiration from notable variations in semantic consistency observed across images of different resolutions and propose Hierarchical Semantic Guidance (HSG) to learn the coarse-to-fine semantic content, which corresponds to the coarse-to-fine scene representations. Experimental results demonstrate that HG3-NeRF can outperform other state-of-the-art methods on different standard benchmarks and achieve high-fidelity synthesis results for sparse view inputs.
Submitted 22 January, 2024;
originally announced January 2024.
-
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Authors:
Bowen Shi,
Peisen Zhao,
Zichen Wang,
Yuhang Zhang,
Yaoming Wang,
Jin Li,
Wenrui Dai,
Junni Zou,
Hongkai Xiong,
Qi Tian,
Xiaopeng Zhang
Abstract:
Vision-language foundation models, represented by Contrastive Language-Image Pre-training (CLIP), have gained increasing attention for jointly understanding both visual and textual tasks. However, existing approaches primarily focus on training models to match global image representations with textual descriptions, thereby overlooking the critical alignment between local regions and corresponding text tokens. This paper extends CLIP with multi-granularity alignment. Notably, we deliberately construct a new dataset comprising pseudo annotations at various levels of granularity, encompassing image-level, region-level, and pixel-level captions/tags. Accordingly, we develop a unified multi-granularity learning framework, named UMG-CLIP, that simultaneously empowers the model with versatile perception abilities across different levels of detail. Equipped with parameter-efficient tuning, UMG-CLIP surpasses current widely used CLIP models and achieves state-of-the-art performance on diverse image understanding benchmarks, including open-world recognition, retrieval, semantic segmentation, and panoptic segmentation tasks. We hope UMG-CLIP can serve as a valuable option for advancing vision-language foundation models.
Submitted 18 January, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
Approximation Algorithms for Minimizing Congestion in Demand-Aware Networks
Authors:
Wenkai Dai,
Michael Dinitz,
Klaus-Tycho Foerster,
Long Luo,
Stefan Schmid
Abstract:
Emerging reconfigurable optical communication technologies make it possible to enhance datacenter topologies with demand-aware links optimized towards traffic patterns. This paper studies the algorithmic problem of jointly optimizing topology and routing in such demand-aware networks to minimize congestion, along two dimensions: (1) splittable or unsplittable flows, and (2) whether routing is segregated, i.e., whether routes can or cannot combine both demand-aware and demand-oblivious (static) links.
For splittable and segregated routing, we show that the problem is generally $2$-approximable, but APX-hard even for uniform demands induced by a bipartite demand graph. For unsplittable and segregated routing, we establish upper and lower bounds of $O\left(\log m/ \log\log m \right)$ and $Ω\left(\log m/ \log\log m \right)$, respectively, for polynomial-time approximation algorithms, where $m$ is the number of static links. We further reveal that under un-/splittable and non-segregated routing, even for demands of a single source (resp., destination), the problem cannot be approximated better than $Ω\left(\frac{c_{\max}}{c_{\min}} \right)$ unless P=NP, where $c_{\max}$ (resp., $c_{\min}$) denotes the maximum (resp., minimum) capacity. It remains NP-hard for uniform capacities, but is tractable for a single commodity and uniform capacities.
Our trace-driven simulations show a significant reduction in network congestion compared to existing solutions.
Submitted 9 January, 2024;
originally announced January 2024.
-
Classification of solutions to $3$-D and $4$-D mixed order conformally invariant systems with critical and exponential growth
Authors:
Wei Dai,
Lixiu Duan,
Rong Zhang
Abstract:
In this paper, without any assumption on $v$ and under the extremely mild assumption $u(x)= O(|x|^{K})$ as $|x|\rightarrow+\infty$ for some $K\gg1$ arbitrarily large, we classify solutions of the following conformally invariant system with mixed order and exponentially increasing nonlinearity in $\mathbb{R}^{3}$: $$ \begin{cases} \ (-Δ)^{\frac{1}{2}} u=v^{4} ,&x\in \mathbb{R}^{3},\\ \ -Δv=e^{pw} ,&x\in \mathbb{R}^{3},\\ \ (-Δ)^{\frac{3}{2}} w=u^{3} ,&x\in \mathbb{R}^{3}, \end{cases} $$ where $p>0$, $w(x)=o(|x|^{2})$ at $\infty$ and $u,v\geq0$ satisfies the finite total curvature condition $\int_{\mathbb{R}^{3}}u^{3}(x)\mathrm{d}x<+\infty$. Moreover, under the extremely mild assumption that \emph{either} $u(x)$ or $v(x)=O(|x|^{K})$ as $|x|\rightarrow+\infty$ for some $K\gg1$ arbitrarily large \emph{or} $\int_{\mathbb{R}^{4}}e^{Λpw(y)}\mathrm{d}y<+\infty$ for some $Λ\geq1$, we also prove classification of solutions to the conformally invariant system with mixed order and exponentially increasing nonlinearity in $\mathbb{R}^{4}$: \begin{align*} \begin{cases} \ (-Δ)^{\frac{1}{2}} u=e^{pw} ,&x\in \mathbb{R}^{4},\\ \ -Δv=u^2 ,&x\in \mathbb{R}^{4},\\ \ (-Δ)^{2} w=v^{4} ,&x\in \mathbb{R}^{4}, \end{cases} \end{align*} where $p>0$, and $w(x)=o(|x|^{2})$ at $\infty$ and $u,v\geq0$ satisfies the finite total curvature condition $\int_{\mathbb{R}^{4}}v^{4}(x)\mathrm{d}x<+\infty$. The key ingredients are deriving the integral representation formulae and crucial asymptotic behaviors of solutions $(u,v,w)$ and calculating the explicit value of the total curvature.
Submitted 19 January, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Authors:
Fan Liu,
Tianshu Zhang,
Wenwen Dai,
Wenwen Cai,
Xiaocong Zhou,
Delong Chen
Abstract:
Multi-modal (vision-language) models, such as CLIP, are replacing traditional supervised pre-training models (e.g., ImageNet-based pre-training) as the new generation of visual foundation models. These models have robust and aligned semantic representations learned from billions of internet image-text pairs and can be applied to various downstream tasks in a zero-shot manner. However, in some fine-grained domains like medical imaging and remote sensing, the performance of multi-modal foundation models often leaves much to be desired. Consequently, many researchers have begun to explore few-shot adaptation methods for these models, gradually deriving three main technical approaches: 1) prompt-based methods, 2) adapter-based methods, and 3) external knowledge-based methods. Nevertheless, this rapidly developing field has produced numerous results without a comprehensive survey to systematically organize the research progress. Therefore, in this survey, we introduce and analyze the research advancements in few-shot adaptation methods for multi-modal models, summarizing commonly used datasets and experimental setups, and comparing the results of different methods. In addition, due to the lack of reliable theoretical support for existing methods, we derive the few-shot adaptation generalization error bound for multi-modal models. The theorem reveals that the generalization error of multi-modal foundation models is constrained by three factors: domain gap, model capacity, and sample size. Based on this, we propose three possible solutions from the following aspects: 1) adaptive domain generalization, 2) adaptive model selection, and 3) adaptive knowledge utilization.
Submitted 4 January, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
-
P-TimeSync: A Precise Time Synchronization Simulation with Network Propagation Delays
Authors:
Wei Dai,
Rui Zhang,
Jinwei Liu
Abstract:
Time serves as the foundation of modern society and will continue to grow in value in the future. Unlike previous research papers, we delve into various time sources, ranging from atomic time and GPS time to quartz time. Specifically, we explore the time uncertainty associated with the four major Global Navigation Satellite Systems. Existing time synchronization simulations offer only partial functionality. In contrast, our research introduces a comprehensive and precise time synchronization simulation named P-TimeSync, leading to a better understanding of time synchronization in distributed environments. It is a state-of-the-art simulation tool for time because (1) it can simulate atomic clocks and quartz clocks with user-defined software clock algorithms, (2) the simulation provides nanosecond-level precision time across different network propagation paths and distances, and (3) the tool offers a visualization platform with classic algorithms for distributed time synchronization, such as Cristian's algorithm and the Berkeley algorithm. The simulation easily allows for the redefinition of configurations and functions, supporting advanced research and development. The simulation tool can be downloaded from https://github.com/rui5097/purdue_timesync
Submitted 26 March, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
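The P-TimeSync abstract above names Cristian's algorithm as one of the classic synchronization schemes. As a rough illustration only, the textbook form of that algorithm can be sketched as follows. The function and parameter names are hypothetical and are not taken from the P-TimeSync code base; the sketch assumes a symmetric network delay.

```python
def cristian_offset(t_request: float, t_server: float, t_response: float) -> float:
    """Estimate the correction to add to the client clock (textbook Cristian's algorithm).

    t_request  : client clock reading when the request was sent
    t_server   : server clock reading carried in the reply
    t_response : client clock reading when the reply arrived
    All names are illustrative assumptions, not P-TimeSync identifiers.
    """
    # Round-trip time as seen by the client.
    rtt = t_response - t_request
    # Assume the reply spent half the round trip in flight.
    estimated_server_now = t_server + rtt / 2.0
    # Positive result means the client clock is behind the server.
    return estimated_server_now - t_response
```

The half-round-trip assumption is the main source of error: any asymmetry between the outbound and return paths shows up directly in the estimated offset.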
-
Plan, Posture and Go: Towards Open-World Text-to-Motion Generation
Authors:
Jinpeng Liu,
Wenxun Dai,
Chunyu Wang,
Yiji Cheng,
Yansong Tang,
Xin Tong
Abstract:
Conventional text-to-motion generation methods are usually trained on limited text-motion pairs, making them hard to generalize to open-world scenarios. Some works use the CLIP model to align the motion space and the text space, aiming to enable motion generation from natural language motion descriptions. However, they are still constrained to generate limited and unrealistic in-place motions. To address these issues, we present a divide-and-conquer framework named PRO-Motion, which consists of three modules: a motion planner, a posture-diffuser, and a go-diffuser. The motion planner instructs Large Language Models (LLMs) to generate a sequence of scripts describing the key postures in the target motion. Unlike natural language, the scripts can describe all possible postures following very simple text templates. This significantly reduces the complexity of the posture-diffuser, which transforms a script into a posture, paving the way for open-world generation. Finally, the go-diffuser, implemented as another diffusion model, estimates whole-body translations and rotations for all postures, resulting in realistic motions. Experimental results show the superiority of our method over its counterparts and demonstrate its capability of generating diverse and realistic motions from complex open-world prompts such as "Experiencing a profound sense of joy". The project page is available at https://moonsliu.github.io/Pro-Motion.
Submitted 22 December, 2023;
originally announced December 2023.
-
LightGCNet: A Lightweight Geometric Constructive Neural Network for Data-Driven Soft sensors
Authors:
Jing Nan,
Yan Qin,
Wei Dai,
Chau Yuen
Abstract:
Data-driven soft sensors provide a potentially cost-effective and more accurate modeling approach to measure difficult-to-measure indices in industrial processes compared to mechanistic approaches. Artificial intelligence (AI) techniques, such as deep learning, have become a popular soft sensor modeling approach in the area of machine learning and big data. However, soft sensor models based on deep learning potentially lead to complex model structures and excessive training time. In addition, industrial processes often rely on distributed control systems (DCS) characterized by resource constraints. Herein, guided by spatial geometry, a lightweight geometric constructive neural network, namely LightGCNet, is proposed, which utilizes a compact angle constraint to assign the hidden parameters from dynamic intervals. At the same time, a node pool strategy and spatial geometric relationships are used to visualize and optimize the process of assigning hidden parameters, enhancing interpretability. In addition, the universal approximation property of LightGCNet is proved by spatial geometric analysis. Two algorithmic implementations of LightGCNet are presented in this article. Simulation results concerning both benchmark datasets and the ore grinding process indicate remarkable merits of LightGCNet in terms of small network size, fast learning speed, and sound generalization.
Submitted 19 December, 2023;
originally announced December 2023.
-
Spatial-Temporal DAG Convolutional Networks for End-to-End Joint Effective Connectivity Learning and Resting-State fMRI Classification
Authors:
Rui Yang,
Wenrui Dai,
Huajun She,
Yiping P. Du,
Dapeng Wu,
Hongkai Xiong
Abstract:
Building comprehensive brain connectomes has proved of fundamental importance in resting-state fMRI (rs-fMRI) analysis. Based on the foundation of brain network, spatial-temporal-based graph convolutional networks have dramatically improved the performance of deep learning methods in rs-fMRI time series classification. However, existing works either pre-define the brain network as the correlation matrix derived from the raw time series or jointly learn the connectome and model parameters without any topology constraint. These methods could suffer from degraded classification performance caused by the deviation from the intrinsic brain connectivity and lack biological interpretability of demonstrating the causal structure (i.e., effective connectivity) among brain regions. Moreover, most existing methods for effective connectivity learning are unaware of the downstream classification task and cannot sufficiently exploit useful rs-fMRI label information. To address these issues in an end-to-end manner, we model the brain network as a directed acyclic graph (DAG) to discover direct causal connections between brain regions and propose Spatial-Temporal DAG Convolutional Network (ST-DAGCN) to jointly infer effective connectivity and classify rs-fMRI time series by learning brain representations based on nonlinear structural equation model. The optimization problem is formulated into a continuous program and solved with score-based learning method via gradient descent. We evaluate ST-DAGCN on two public rs-fMRI databases. Experiments show that ST-DAGCN outperforms existing models by evident margins in rs-fMRI classification and simultaneously learns meaningful edges of effective connectivity that help understand brain activity patterns and pathological mechanisms in brain disease.
Submitted 15 December, 2023;
originally announced December 2023.
-
scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data
Authors:
Rui Yang,
Wenrui Dai,
Chenglin Li,
Junni Zou,
Dapeng Wu,
Hongkai Xiong
Abstract:
Single-cell RNA sequencing (scRNA-seq) technology provides high-throughput gene expression data to study the cellular heterogeneity and dynamics of complex organisms. Graph neural networks (GNNs) have been widely used for automatic cell type classification, which is a fundamental problem to solve in scRNA-seq analysis. However, existing methods do not sufficiently exploit both gene-gene and cell-cell relationships, and thus the true potential of GNNs is not realized. In this work, we propose a bilevel graph representation learning method, named scBiGNN, to simultaneously mine the relationships at both gene and cell levels for more accurate single-cell classification. Specifically, scBiGNN comprises two GNN modules to identify cell types. A gene-level GNN is established to adaptively learn gene-gene interactions and cell representations via the self-attention mechanism, and a cell-level GNN builds on the cell-cell graph that is constructed from the cell representations generated by the gene-level GNN. To tackle the scalability issue for processing a large number of cells, scBiGNN adopts an Expectation Maximization (EM) framework in which the two modules are alternately trained via the E-step and M-step to learn from each other. Through this interaction, the gene- and cell-level structural information is integrated to gradually enhance the classification performance of both GNN modules. Experiments on benchmark datasets demonstrate that our scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
Submitted 15 December, 2023;
originally announced December 2023.
-
AEGIS-Net: Attention-guided Multi-Level Feature Aggregation for Indoor Place Recognition
Authors:
Yuhang Ming,
Jian Ma,
Xingrui Yang,
Weichen Dai,
Yong Peng,
Wanzeng Kong
Abstract:
We present AEGIS-Net, a novel indoor place recognition model that takes in RGB point clouds and generates global place descriptors by aggregating lower-level color, geometry features and higher-level implicit semantic features. However, rather than simple feature concatenation, self-attention modules are employed to select the most important local features that best describe an indoor place. Our AEGIS-Net is made of a semantic encoder, a semantic decoder and an attention-guided feature embedding. The model is trained in a 2-stage process with the first stage focusing on an auxiliary semantic segmentation task and the second one on the place recognition task. We evaluate our AEGIS-Net on the ScanNetPR dataset and compare its performance with a pre-deep-learning feature-based method and five state-of-the-art deep-learning-based methods. Our AEGIS-Net achieves exceptional performance and outperforms all six methods.
Submitted 15 December, 2023;
originally announced December 2023.
-
A quantitative fusion strategy of stock picking and timing based on Particle Swarm Optimized-Back Propagation Neural Network and Multivariate Gaussian-Hidden Markov Model
Authors:
Huajian Li,
Longjian Li,
Jiajian Liang,
Weinan Dai
Abstract:
In recent years, machine learning (ML) has brought effective approaches and novel techniques to economic decision-making, investment forecasting, and risk management, helping to cope with the variable and intricate nature of economic and financial environments. For stock market investment, this research introduces a pioneering quantitative fusion model that combines stock timing and stock picking by leveraging the Multivariate Gaussian-Hidden Markov Model (MGHMM) and a Back Propagation Neural Network optimized by Particle Swarm (PSO-BPNN). First, the information coefficients (IC) between fifty-two factors, which have been winsorized, neutralized, and standardized, and the return of the CSI 300 index are calculated. A given number of top-ranked factors are then chosen as candidate factors, reduced in dimension by Principal Component Analysis (PCA), and fed to the PSO-BPNN, which outputs a certain number of constituent stocks. Subsequently, we conduct prediction and trading on the basis of the screened stocks and the stock market state output by the MGHMM, which is trained on CSI 300 index data after a Box-Cox transformation; the strategy shows excellent performance over the past four years. Finally, some conventional forecasting and trading methods are compared with our strategy in the Chinese stock market. The fusion strategy incorporating stock picking and timing presented in this article provides an innovative technique for financial analysis.
Submitted 22 December, 2023; v1 submitted 9 December, 2023;
originally announced December 2023.
-
Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
Authors:
Yabo Chen,
Jiemin Fang,
Yuyang Huang,
Taoran Yi,
Xiaopeng Zhang,
Lingxi Xie,
Xinggang Wang,
Wenrui Dai,
Hongkai Xiong,
Qi Tian
Abstract:
Synthesizing multi-view 3D from a single image is a significant and challenging task. To this end, Zero-1-to-3 methods aim to extend a 2D latent diffusion model to the 3D scope. These approaches generate the target-view image with a single-view source image and the camera pose as condition information. However, the one-to-one manner adopted in Zero-1-to-3 incurs challenges for building geometric and visual consistency across views, especially for complex objects. To tackle this issue, we propose a cascade generation framework constructed with two Zero-1-to-3 models, named Cascade-Zero123, which progressively extracts 3D information from the source image. Specifically, a self-prompting mechanism is designed to first generate several nearby views. These views are then fed into the second-stage model along with the source image as generation conditions. With the self-prompted multiple views as supplementary information, our Cascade-Zero123 generates more consistent novel-view images than Zero-1-to-3. The improvement is significant for various complex and challenging scenes, involving insects, humans, transparent objects, and multiple stacked objects. The project page is at https://cascadezero123.github.io/.
Submitted 7 December, 2023;
originally announced December 2023.
-
ASI: Accuracy-Stability Index for Evaluating Deep Learning Models
Authors:
Wei Dai,
Daniel Berleant
Abstract:
In the context of deep learning research, where model introductions continually occur, the need for effective and efficient evaluation remains paramount. Existing methods often emphasize accuracy metrics, overlooking stability. To address this, the paper introduces the Accuracy-Stability Index (ASI), a quantitative measure incorporating both accuracy and stability for assessing deep learning models. Experimental results demonstrate the application of ASI, and a 3D surface model is presented for visualizing ASI, mean accuracy, and coefficient of variation. This paper addresses the important issue of quantitative benchmarking metrics for deep learning models, providing a new approach for accurately evaluating accuracy and stability of deep learning models. The paper concludes with discussions on potential weaknesses and outlines future research directions.
Submitted 14 February, 2024; v1 submitted 26 November, 2023;
originally announced November 2023.
-
Low-level radiofrequency system upgrade for the Dalian Coherent Light Source
Authors:
H. L. Ding,
J. F. Zhu,
H. K. Li,
J. W. Han,
X. W. Dai,
J. Y. Yang,
W. Q. Zhang
Abstract:
DCLS (Dalian Coherent Light Source) is an FEL (Free-Electron Laser) user facility at EUV (Extreme Ultraviolet). The primary accelerator of DCLS operates at a repetition rate of 20 Hz, and the beam is divided at the end of the linear accelerator through Kicker to make two 10 Hz beamlines work simultaneously. In the past year, we have completed the upgrade of the DCLS LLRF (Low-Level Radiofrequency) system, including setting the microwave amplitude and phase for two beamlines based on event timing, optimizing the microwave stability, and generating microwave excitation with the arbitrary shape of amplitude and phase. We added two special event codes and a repetition rate division of 10 Hz in the event timing system and set the microwave amplitude and phase by judging the event code in LLRF. The amplitude and phase stability of the microwave was improved with an intra-pulse feedforward algorithm. In addition, we have also generated microwave excitation with arbitrary amplitude and phase shapes to meet the dual beam operation in the future. Detailed information on functions or algorithms will be presented in this paper.
Submitted 24 October, 2023;
originally announced November 2023.
-
A low-delay reference tracking algorithm for microwave measurement and control
Authors:
J. F. Zhu,
H. L. Ding,
H. K. Li,
J. W. Han,
X. W. Dai,
Z. C. Chen,
J. Y. Yang,
W. Q. Zhang
Abstract:
In FEL (Free-Electron Laser) accelerators, LLRF (Low-Level Radiofrequency) systems usually deploy feedback or feedforward algorithms requiring precise microwave measurement. The slow drift of the clock allocation network of LLRF significantly impacts the measured microwave phase, thereby affecting the stability of the closed-loop operation. The reference tracking algorithm is used to eliminate the measurement drift. The conventional algorithm is to perform phase and amplitude demodulation on the synchronous reference signal from the main oscillator and subtract the reference phase in other measurement channels. The demodulation is usually based on the CORDIC, which requires approximately 16 clock cycles in FPGA (Field Programmable Gate Arrays). This paper uses the multiplication of complex numbers, which only requires four clock cycles of computational delay and achieves phase subtraction point by point. However, experiments show that it causes irrelevant amplitude noise to overlap and increase the amplitude measurement noise. Nevertheless, this reference tracking algorithm is suitable for control algorithms with low-delay requirements of microwave measurement.
Submitted 24 October, 2023;
originally announced November 2023.
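The reference-tracking abstract above describes subtracting the reference phase point by point through complex multiplication instead of CORDIC demodulation. One way to read that idea, sketched here in floating point rather than FPGA logic, is to multiply each measured I/Q sample by the complex conjugate of the reference sample. The function name is hypothetical, and this is an illustration of the general technique, not the authors' implementation.

```python
import cmath


def track_reference(sample: complex, reference: complex) -> complex:
    """Subtract the reference phase from a measured I/Q sample.

    Multiplying by the conjugate gives
        |sample| * |reference| * exp(j * (phase_sample - phase_reference)),
    so the phases subtract while the amplitudes multiply.
    """
    return sample * reference.conjugate()


# Usage: a measured sample at 0.9 rad against a reference at 0.4 rad.
measured = cmath.rect(2.0, 0.9)
ref = cmath.rect(1.0, 0.4)
corrected = track_reference(measured, ref)
```

Because the output amplitude is the product of the two input amplitudes, any amplitude noise on the reference channel is carried into the result. That is consistent with the abstract's remark that the method increases amplitude measurement noise.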
-
The microwave amplitude and phase setting based on event timing for the DCLS
Authors:
J. F. Zhu,
H. L. Ding,
H. K. Li,
J. W. Han,
X. W. Dai,
B. Xu,
L. Shi,
J. Y. Yang,
W. Q. Zhang
Abstract:
The primary accelerator of DCLS (Dalian Coherent Light Source) operates at a repetition rate of 20 Hz now, and the beam is divided at the end of the linear accelerator through Kicker to make two 10 Hz beamlines work simultaneously. For the simultaneous emission FEL of two beamlines, the beam energy of the two beamlines is required to be controlled independently, so we need to set the amplitude and phase of each beamline. This paper implements a microwave amplitude and phase setting function based on event timing. We upgraded the EVG/EVR event timing system and LLRF (Low-Level Radiofrequency) system. Two special event codes and a repetition rate division of 10 Hz are added to the event timing system, and we can set the microwave amplitude and phase by judging the event code in LLRF. We ultimately perform the microwave triggering at a repetition rate of 10 Hz for each beamline and validate this function through beam experiments.
Submitted 24 October, 2023;
originally announced November 2023.
-
Erasure detection of a dual-rail qubit encoded in a double-post superconducting cavity
Authors:
Akshay Koottandavida,
Ioannis Tsioutsios,
Aikaterini Kargioti,
Cassady R. Smith,
Vidul R. Joshi,
Wei Dai,
James D. Teoh,
Jacob C. Curtis,
Luigi Frunzio,
Robert J. Schoelkopf,
Michel H. Devoret
Abstract:
Qubits with predominantly erasure errors present distinctive advantages for quantum error correction (QEC) and fault-tolerant quantum computing. Logical qubits based on dual-rail encoding that exploit erasure detection have been recently proposed in superconducting circuit architectures, either with coupled transmons or cavities. Here, we implement a dual-rail qubit encoded in a compact, double-post superconducting cavity. Using an auxiliary transmon, we perform erasure detection on the dual-rail subspace. We characterize the behaviour of the codespace by a novel method to perform joint-Wigner tomography. This is based on modifying the cross-Kerr interaction between the cavity modes and the transmon. We measure an erasure rate of $3.981 \pm 0.003$ (ms)$^{-1}$ and a residual dephasing error rate up to 0.17 (ms)$^{-1}$ within the codespace. This strong hierarchy of error rates, together with the compact and hardware-efficient nature of this novel architecture, holds promise for realising QEC schemes with enhanced thresholds and improved scaling.
Submitted 17 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.