-
HPC: Hierarchical Progressive Coding Framework for Volumetric Video
Authors:
Zihan Zheng,
Houqiang Zhong,
Qiang Hu,
Xiaoyun Zhang,
Li Song,
Ya Zhang,
Yanfeng Wang
Abstract:
Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression methods lack the flexibility to adjust video quality and bitrate within a single model to suit varying network and device capacities. To address these issues, we propose HPC, a novel hierarchical progressive volumetric video coding framework that achieves variable bitrate using a single model. Specifically, HPC introduces a hierarchical representation with a multi-resolution residual radiance field that reduces temporal redundancy in long-duration sequences while simultaneously generating various levels of detail. We then propose an end-to-end progressive learning approach with a multi-rate-distortion loss function to jointly optimize the hierarchical representation and compression. Trained only once, HPC realizes multiple compression levels, whereas current methods need to train multiple fixed-bitrate models for different rate-distortion (RD) tradeoffs. Extensive experiments demonstrate that HPC achieves flexible quality levels with variable bitrate using a single model and exhibits competitive RD performance, even outperforming fixed-bitrate models across various datasets.
Submitted 12 July, 2024;
originally announced July 2024.
-
From domain walls and the stripe phase to full suppression of charge density wave in the superconducting 1T-Ti$_{1-\text{x}}$Ta$_\text{x}$Se$_2$
Authors:
Q. Hu,
R. Venturini,
Y. Vaskivskyi,
J. Lipič,
Z. Jagličić,
D. Mihailovic
Abstract:
1T-TiSe$_2$ hosts a $2 \times 2 \times 2$ charge density wave (CDW) that is known to form a state of localized domains separated by domain walls upon Cu intercalation. The CDW state with a domain wall network has attracted significant interest due to its coexistence with superconductivity. Here we present a scanning tunneling microscopy, transport, and magnetic susceptibility study of 1T-Ti$_{1-\text{x}}$Ta$_\text{x}$Se$_2$. Substituting Ta for Ti atoms allows us to perform experiments over a wide doping range ($ 0 \leqslant \text{x} \leqslant 0.2$), providing access to a significantly broader phase diagram than Cu intercalation experiments. At x = 0.02, we observe a complex network of domains and domain walls. We identify two distinct types of domain walls and image their structure with atomic resolution. Additionally, an elusive symmetry-breaking stripe CDW is found at the light substitution of x = 0.02. We also measure highly substituted x = 0.2 crystals, which are superconducting despite the full collapse of the CDW order. Our results uncover rich CDW physics in Ta-substituted 1T-TiSe$_2$ crystals and illuminate the interplay between the CDW and superconductivity.
Submitted 1 July, 2024;
originally announced July 2024.
-
A plan for a super $η$ factory at Huizhou accelerator complex
Authors:
Xu-Rong Chen,
Xiong-Hong He,
Qiang Hu,
De-Xu Lin,
Yang Liu,
Hao Qiu,
Xu Sun,
Ye Tian,
Rong Wang,
Hong-Lin Zhang,
Ya-Peng Zhang,
Cheng-Xin Zhao
Abstract:
As a Goldstone boson with zero quantum numbers and zero SM charge, the long-lived $η$ ($η^{\prime}$) meson provides through its decays a unique window to search for new physics beyond the standard model and new sources of CP violation, to test low-energy QCD theory, and to measure the fundamental parameters of light quarks. For these frontier physics goals, we discuss a plan to build a super $η$ factory at the HIAF high-energy terminal, or at CiADS after an upgrade. The high-intensity proton beam at HIAF, with multiple layers of thin light-nucleus targets, provides a great opportunity to produce more than $10^{13}$ $η$ events per year. The physics goals, the first-version conceptual design of the spectrometer, and preliminary simulation results are presented.
Submitted 30 June, 2024;
originally announced July 2024.
-
Prospects for the detection of very-high-energy pulsars with LHAASO and SWGO
Authors:
Quan Hu,
Yi Zhang,
Kaikai Duan,
Houdun Zeng
Abstract:
Pulsations from the Crab pulsar have been detected by the MAGIC telescopes at energies up to 1.5 TeV, and pulsed emission from the Vela pulsar has been detected by H.E.S.S., reaching tens of TeV. These discoveries, along with the proposed additional emission due to inverse Compton scattering at TeV energies, lead us to consider suitable candidates for detection with current and future extensive air shower (EAS) experiments in the very-high-energy (VHE; 0.1 $-$ 100 TeV) range. Leveraging energy spectrum data of pulsars observed by Fermi and Imaging Atmospheric Cherenkov Telescopes (IACTs), and considering the sensitivities of both LHAASO and SWGO, this study evaluates their detectability and estimates the time required for a significant detection. Our results indicate that LHAASO could detect the Crab's pulsed signal within six years, while SWGO might detect Vela's signal within one year. Observations of the most energetic Fermi pulsars with EAS experiments will provide insight into the nature of VHE pulsar emission, helping to clarify the primary characteristics of VHE pulsars.
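To show how detection-time estimates of this kind scale, here is a minimal sketch assuming the simplest background-dominated case, where significance is approximated by $S = N_s/\sqrt{N_b}$ with constant signal and background rates, so that $S$ grows as $\sqrt{t}$. This is a textbook simplification, not the study's method (which relies on the LHAASO and SWGO sensitivities); the function names and rates are hypothetical.

```python
import math


def significance(signal_rate: float, background_rate: float, t: float) -> float:
    """Approximate significance S = N_s / sqrt(N_b) after observing for time t.

    Hypothetical simplification: constant rates, background-dominated regime,
    no full likelihood analysis.
    """
    return signal_rate * t / math.sqrt(background_rate * t)


def time_to_significance(s_ref: float, t_ref: float, s_target: float = 5.0) -> float:
    """Time needed to reach s_target, given that s_ref was reached after t_ref.

    Because S grows as sqrt(t), t_target = t_ref * (s_target / s_ref) ** 2.
    """
    if s_ref <= 0 or t_ref <= 0:
        raise ValueError("s_ref and t_ref must be positive")
    return t_ref * (s_target / s_ref) ** 2


# A hypothetical source seen at 2 sigma after 1 year needs (5 / 2)**2 years for 5 sigma.
print(time_to_significance(2.0, 1.0))  # 6.25
```

Quadrupling the exposure only doubles the significance in this regime, which is why marginal sources can require multi-year campaigns.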
Submitted 28 June, 2024;
originally announced July 2024.
-
Basketball-SORT: An Association Method for Complex Multi-object Occlusion Problems in Basketball Multi-object Tracking
Authors:
Qingrui Hu,
Atom Scott,
Calvin Yeung,
Keisuke Fujii
Abstract:
Recent deep learning-based object detection approaches have led to significant progress in multi-object tracking (MOT) algorithms. Current MOT methods mainly focus on pedestrian or vehicle scenes, but basketball scenes usually involve occlusions among three or more objects with similar appearances and high-intensity complex motions, which we call complex multi-object occlusion (CMOO). Here, we propose an online and robust MOT approach, named Basketball-SORT, which focuses on the CMOO problem in basketball videos. To overcome the CMOO problem, instead of using an intersection-over-union (IoU) based approach, we use the trajectories of neighboring frames based on the projected positions of the players. Based on the characteristics of basketball scenes, we design the basketball game restriction (BGR) and reacquiring Long-Lost IDs (RLLI), and we also resolve occlusions based on player trajectories and appearance features. Experimental results show that our method achieves a Higher Order Tracking Accuracy (HOTA) score of 63.48$\%$ on the basketball fixed video dataset and outperforms other recent popular approaches. Overall, our approach solves the CMOO problem more effectively than recent MOT algorithms.
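The idea of associating players by trajectory rather than box overlap can be illustrated with a minimal sketch. It assumes players are already projected onto court-plane coordinates, predicts each track's next position by constant-velocity extrapolation from its last two positions, and matches detections greedily by distance with a gating threshold. The names `predict`, `associate`, and `max_dist`, the constant-velocity model, and the greedy matcher (a simplification of the Hungarian assignment used in SORT-style trackers) are hypothetical stand-ins, not Basketball-SORT's actual BGR or RLLI logic.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # position projected onto the court plane (e.g. metres)


def predict(history: List[Point]) -> Point:
    """Constant-velocity extrapolation from the last two positions (assumed motion model)."""
    if len(history) < 2:
        return history[-1]
    (x1, y1), (x2, y2) = history[-2], history[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def associate(tracks: Dict[int, List[Point]], detections: List[Point],
              max_dist: float = 1.5) -> Dict[int, int]:
    """Greedily map track IDs to detection indices by distance to predicted positions.

    Pairs farther apart than max_dist (hypothetical gate) are never matched.
    """
    pairs = []
    for tid, history in tracks.items():
        px, py = predict(history)
        for j, (dx, dy) in enumerate(detections):
            pairs.append((math.hypot(px - dx, py - dy), tid, j))
    pairs.sort()  # closest pairs first

    matches: Dict[int, int] = {}
    used = set()
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even farther apart
        if tid not in matches and j not in used:
            matches[tid] = j
            used.add(j)
    return matches


tracks = {1: [(0.0, 0.0), (1.0, 0.0)], 2: [(0.0, 1.0), (1.0, 1.2)]}
print(associate(tracks, [(2.0, 1.3), (2.1, 0.1)]))
```

Because matching uses predicted court positions, two players whose image boxes overlap heavily can still be separated as long as their motion differs.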
Submitted 28 June, 2024;
originally announced June 2024.
-
LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism
Authors:
Diandian Gu,
Peng Sun,
Qinghao Hu,
Ting Huang,
Xun Chen,
Yingtong Xiong,
Guoteng Wang,
Qiaoling Chen,
Shangchun Zhao,
Jiarui Fang,
Yonggang Wen,
Tianwei Zhang,
Xin Jin,
Xuanzhe Liu
Abstract:
Efficiently training LLMs with long sequences is important yet challenging due to massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency issues. We propose LoongTrain, a novel system to efficiently train LLMs with long sequences at scale. The core of LoongTrain is the 2D-Attention mechanism, which combines head-parallel and context-parallel techniques to break scalability constraints while maintaining efficiency. We introduce Double-Ring-Attention and analyze the performance of device placement strategies to further speed up training. We implement LoongTrain with hybrid ZeRO and Selective Checkpoint++ techniques. Experimental results show that LoongTrain outperforms state-of-the-art baselines, namely DeepSpeed-Ulysses and Megatron Context Parallelism, in both end-to-end training speed and scalability, and improves Model FLOPs Utilization (MFU) by up to 2.88x.
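One way to picture the 2D-Attention layout is as a grid of devices: $P$ devices are split into a head-parallel degree `hp` and a context-parallel degree `cp = P / hp`, and each device owns a slice of attention heads and a contiguous chunk of the sequence. This minimal sketch assumes even divisibility and a particular rank ordering (head-parallel index varying fastest); that ordering and the names `layout_2d` and `Shard` are illustrative assumptions, not LoongTrain's actual placement, which the abstract says is chosen by analyzing device placement strategies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shard:
    rank: int
    heads: range   # attention heads owned by this rank
    tokens: range  # sequence positions owned by this rank


def layout_2d(world_size: int, hp: int, num_heads: int, seq_len: int) -> List[Shard]:
    """Split heads over `hp` head-parallel groups and tokens over cp = world_size // hp chunks."""
    if world_size % hp:
        raise ValueError("world_size must be divisible by hp")
    cp = world_size // hp
    if num_heads % hp or seq_len % cp:
        raise ValueError("num_heads must divide by hp and seq_len by cp")
    h, s = num_heads // hp, seq_len // cp
    shards = []
    for rank in range(world_size):
        head_group, ctx_chunk = rank % hp, rank // hp  # assumed rank ordering
        shards.append(Shard(rank,
                            range(head_group * h, (head_group + 1) * h),
                            range(ctx_chunk * s, (ctx_chunk + 1) * s)))
    return shards


# 8 devices, 2-way head parallel x 4-way context parallel, 32 heads, 8192 tokens.
print(layout_2d(8, 2, 32, 8192)[5])
# Shard(rank=5, heads=range(16, 32), tokens=range(4096, 6144))
```

Roughly speaking, setting `hp` equal to the world size reduces this to pure head parallelism (in the spirit of DeepSpeed-Ulysses), while `hp = 1` reduces it to pure context parallelism; the 2D grid lets one trade between the two.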
Submitted 26 June, 2024;
originally announced June 2024.
-
SNR-Progressive Model with Harmonic Compensation for Low-SNR Speech Enhancement
Authors:
Zhongshu Hou,
Qinwen Hu,
Zhanzhong Cao,
Ming Tang,
Jing Lu
Abstract:
Despite significant progress in the last decade, deep neural network (DNN) based speech enhancement (SE) still suffers notable degradation in the quality of recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. Reliable pitch estimation is obtained from the intermediate output, which retains more speech components than the coarse estimate while having a significantly higher SNR than the input noisy speech. An effective harmonic compensation mechanism is introduced for better harmonic recovery. Extensive experiments demonstrate the advantages of our proposed model. A multi-modal speech extraction system based on the proposed backbone model ranks first in the ICASSP 2024 MISP Challenge: https://mispchallenge.github.io/mispchallenge2023/index.html.
Submitted 24 June, 2024;
originally announced June 2024.
-
Sublattice Dichotomy in Monolayer FeSe Superconductor
Authors:
Cui Ding,
Zhipeng Xu,
Xiaotong Jiao,
Qiyin Hu,
Wenxuan Zhao,
Lexian Yang,
Kun Jiang,
Jin-Feng Jia,
Lili Wang,
Jiangping Hu,
Qi-Kun Xue
Abstract:
The pairing mechanism of monolayer FeSe is an essential question for iron-based superconductors. In this work, we show that the sublattice degree of freedom of monolayer FeSe plays a special role in its pairing properties, which we term the sublattice dichotomy. High-quality monolayer FeSe samples with atomically flat $1\times1$ topography are grown on SrTiO$_3$(001) substrates by molecular beam epitaxy. By comparing the tunneling spectra at the $α$ and $β$ Fe sublattices, we find that the coherence peak of $α$-Fe at the inner gap $+V_i$ is higher than that of $β$-Fe, while the coherence peak of $β$-Fe at $-V_i$ is higher than that of $α$-Fe by a similar amount. We also observe a reversed effect at the outer gap $\pm V_o$. We propose that $η$-pairing between $k$ and $-k+Q$ is the key mechanism for this unconventional sublattice dichotomy effect.
Submitted 21 June, 2024;
originally announced June 2024.
-
SALI: Short-term Alignment and Long-term Interaction Network for Colonoscopy Video Polyp Segmentation
Authors:
Qiang Hu,
Zhenyu Yi,
Ying Zhou,
Fang Peng,
Mei Liu,
Qiang Li,
Zhiwei Wang
Abstract:
Colonoscopy videos provide richer information for polyp segmentation in rectal cancer diagnosis. However, the fast movement and close-up observation of the endoscope cause current methods to suffer from large spatial incoherence and continuous low-quality frames, and thus yield limited segmentation accuracy. In this context, we focus on robust video polyp segmentation by enhancing adjacent feature consistency and rebuilding reliable polyp representations. To achieve this goal, in this paper we propose the SALI network, a hybrid of a Short-term Alignment Module (SAM) and a Long-term Interaction Module (LIM). The SAM learns spatially aligned features of adjacent frames via deformable convolution and further harmonizes them to capture a more stable short-term polyp representation. For low-quality frames, the LIM stores historical polyp representations in a long-term memory bank and explores retrospective relations to interactively rebuild more reliable polyp features for the current segmentation. Combining SAM and LIM, the SALI network shows strong robustness to spatial variations and weak visual cues. Benchmarking on the large-scale SUNSEG dataset verifies the superiority of SALI over current state-of-the-art methods, improving Dice by 2.1%, 2.5%, 4.1% and 1.9% on the four test subsets, respectively. Code is available at https://github.com/Scatteredrain/SALI.
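The Dice scores quoted above measure overlap between predicted and ground-truth masks. A minimal sketch of the standard per-mask Dice coefficient, $\mathrm{Dice} = 2|P \cap T| / (|P| + |T|)$, on flattened binary masks is given here; the `eps` smoothing term is a common convention assumed for illustration, and the SUNSEG benchmark may aggregate per-frame scores differently.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-8) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks flattened to 0/1 sequences.

    eps (assumed smoothing) makes two empty masks score 1.0 instead of dividing by zero.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2 * intersection + eps) / (total + eps)


# One overlapping pixel out of two predicted and two true pixels -> about 0.5.
print(round(dice([1, 1, 0, 0], [1, 0, 1, 0]), 4))  # 0.5
```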
Submitted 19 June, 2024;
originally announced June 2024.
-
Interlayer Fermi polarons of excited exciton states in quantizing magnetic fields
Authors:
Huiying Cui,
Qianying Hu,
Xuan Zhao,
Liguo Ma,
Feng Jin,
Qingming Zhang,
Kenji Watanabe,
Takashi Taniguchi,
Jie Shan,
Kin Fai Mak,
Yongqing Li,
Yang Xu
Abstract:
The study of exciton polarons has offered profound insights into the many-body interactions between bosonic excitations and the Fermi sea in which they are immersed within layered heterostructures. However, little is known about the properties of exciton polarons with interlayer interactions. Here, through magneto-optical reflectance contrast measurements, we experimentally investigate interlayer Fermi polarons for 2s excitons in WSe$_2$/graphene heterostructures, where the excited exciton states (2s) in the WSe$_2$ layer are dressed by free charge carriers of the adjacent graphene layer in the Landau quantization regime. First, such a system enables optical detection of integer and fractional quantum Hall states (e.g. $ν=\pm1/3$, $\pm2/3$) of monolayer graphene. Furthermore, we observe that the 2s state evolves into two distinct branches, denoted attractive and repulsive polarons, when graphene is doped out of the incompressible quantum Hall gaps. Our work paves the way for understanding excited composite quasiparticles and Bose-Fermi mixtures.
Submitted 15 June, 2024;
originally announced June 2024.
-
Fine-Grained Domain Generalization with Feature Structuralization
Authors:
Wenlong Yu,
Dongyue Chen,
Qilong Wang,
Qinghua Hu
Abstract:
Fine-grained domain generalization (FGDG) is more challenging than traditional DG tasks due to its small inter-class variations and relatively large intra-class disparities. When the domain distribution changes, the vulnerability of subtle features leads to a severe deterioration in model performance. Nevertheless, humans inherently demonstrate the capacity to generalize to out-of-distribution data, leveraging structured multi-granularity knowledge that emerges from discerning the commonality and specificity within categories. Likewise, we propose a Feature Structuralized Domain Generalization (FSDG) model, in which features are structuralized into common, specific, and confounding segments aligned with their relevant semantic concepts, to improve performance in FGDG. Specifically, feature structuralization (FS) is accomplished through joint optimization of five constraints: a decorrelation function applied to disentangled segments, three constraints ensuring common feature consistency and specific feature distinctiveness, and a prediction calibration term. By imposing these constraints, FSDG is prompted to disentangle and align features based on multi-granularity knowledge, facilitating robust subtle distinctions among categories. Extensive experiments on three benchmarks consistently validate the superiority of FSDG over state-of-the-art counterparts, with an average improvement of 6.2% in FGDG performance. Beyond that, an explainability analysis of explicit concept matching intensity between the concepts shared among categories and the model channels, along with experiments on various mainstream model architectures, substantiates the validity of FS.
Submitted 17 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals
Authors:
Qihan Hu,
Daomiao Wang,
Hong Wu,
Jian Liu,
Cuiwei Yang
Abstract:
The progression of deep learning and the widespread adoption of sensors have facilitated automatic multi-view fusion (MVF) of cardiovascular system (CVS) signals. However, prevalent MVF model architectures often amalgamate CVS signals from the same time step but different views into a unified representation, disregarding the asynchronous nature of cardiovascular events and the inherent heterogeneity across views, which leads to catastrophic view confusion. Efficient training strategies tailored to MVF models for attaining comprehensive representations must be considered at the same time. Crucially, real-world data frequently arrive with incomplete views, an aspect rarely addressed by researchers. Thus, the View-Centric Transformer (VCT) and Multitask Masked Autoencoder (M2AE) are designed to emphasize the centrality of each view and harness unlabeled data to achieve superior fused representations. Additionally, we systematically define the missing-view problem for the first time and introduce prompt techniques that help pretrained MVF models flexibly adapt to various missing-view scenarios. Rigorous experiments on atrial fibrillation detection, blood pressure estimation, and sleep staging, which are typical health monitoring tasks, demonstrate the remarkable advantage of our method in MVF compared with prevailing methodologies. Notably, the prompt technique requires fine-tuning less than 3% of the entire model's data, substantially fortifying the model's resilience to missing views while avoiding complete retraining. The results demonstrate the effectiveness of our approaches and highlight their potential for practical applications in cardiovascular health monitoring. Code and models are released at URL.
Submitted 13 June, 2024;
originally announced June 2024.
-
Link between cascade transitions and correlated Chern insulators in magic-angle twisted bilayer graphene
Authors:
Qianying Hu,
Shu Liang,
Xinheng Li,
Hao Shi,
Xi Dai,
Yang Xu
Abstract:
Chern insulators are topologically non-trivial states of matter characterized by an incompressible bulk and chiral edge states. Combining topological Chern bands with strong electronic correlations provides a versatile playground for studying emergent quantum phenomena. In this study, we resolve the correlated Chern insulators (CCIs) in magic-angle twisted bilayer graphene (MATBG) through Rydberg exciton sensing spectroscopy and unveil their direct link with the zero-field cascade features in the electronic compressibility. The compressibility minima in the cascade are found to deviate substantially from nearby integer fillings (by $Δν$) and to coincide with the onsets of CCIs in doping density, yielding a quasi-universal relation $B_c = Φ_0 Δν / C$, where $B_c$ is the onset magnetic field, $Φ_0$ the magnetic flux quantum, and $C$ the Chern number. We suggest that these onsets lie where integer fillings of the localized "f-orbitals" and of the Chern bands are reached simultaneously. Our findings update the field-dependent phase diagram of MATBG and directly support the topological heavy fermion model.
Submitted 12 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work, we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter because they have low astrophysical $γ$-ray backgrounds but large amounts of dark matter. By analyzing more than 700 days of LHAASO observational data, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. Constraints on the dark matter lifetime in the decay mode are also derived.
Submitted 12 June, 2024;
originally announced June 2024.
-
Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning
Authors:
Qi Qi,
Quanqi Hu,
Qihang Lin,
Tianbao Yang
Abstract:
This paper studies learning fair encoders in a self-supervised learning (SSL) setting, in which all data are unlabeled and only a small portion of them are annotated with the sensitive attribute.
Adversarial fair representation learning is well suited to this scenario: it minimizes a contrastive loss over the unlabeled data while maximizing an adversarial loss of predicting the sensitive attribute over the data annotated with it. Nevertheless, optimizing adversarial fair representation learning is challenging because it requires solving a non-convex non-concave minimax game. The complexity deepens when incorporating a global contrastive loss that contrasts each anchor data point against all other examples. A central question is: "Can we design a provable yet efficient algorithm for solving adversarial fair self-supervised contrastive learning?" Building on advanced optimization techniques, we propose a stochastic algorithm dubbed SoFCLR, with a convergence analysis under reasonable conditions that does not require a large batch size. We conduct extensive experiments to demonstrate the effectiveness of the proposed approach for downstream classification under eight fairness notions.
Submitted 9 June, 2024;
originally announced June 2024.
-
From Analog to Digital: Multi-Order Digital Joint Coding-Modulation for Semantic Communication
Authors:
Guangyi Zhang,
Pujing Yang,
Yunlong Cai,
Qiyu Hu,
Guanding Yu
Abstract:
Recent studies in joint source-channel coding (JSCC) have fostered a new paradigm in end-to-end semantic communication. Despite notable performance achievements, current semantic communication systems primarily rely on transmitting continuous channel symbols, which makes them difficult to integrate with established digital systems. In this paper, we address this challenge by developing a multi-order digital joint coding-modulation (MDJCM) scheme for semantic communications. First, we construct a digital semantic communication system by integrating a multi-order modulation/demodulation module into a nonlinear transform source-channel coding (NTSCC) framework. Since modulation/demodulation is non-differentiable, we propose a novel substitution training strategy: we treat modulation/demodulation as a constrained quantization process and introduce scaling operations alongside manually crafted noise to approximate this process. As a result, semantic communication systems trained with this approximation can be deployed in practical modulation/demodulation scenarios with superior performance. Additionally, we demonstrate the equivalence by analyzing the involved probability distribution. Moreover, to further improve performance, we develop a hierarchical dimension-reduction strategy that provides a gradual information extraction process. Extensive experimental evaluations demonstrate the superiority of our proposed method over existing digital and non-digital JSCC techniques.
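To illustrate why modulation can be viewed as a constrained quantization, here is a minimal sketch using square 16-QAM. At deployment each continuous symbol is snapped to the nearest constellation point, a step with zero gradient almost everywhere; a training-time proxy instead clips the symbol to the constellation range and adds uniform noise spanning one decision region. The additive-uniform-noise proxy is a generic technique borrowed from learned compression and only stands in for the paper's scaling-plus-crafted-noise approximation; all function names are hypothetical.

```python
import random
from typing import List, Optional


def qam_levels(order: int) -> List[int]:
    """Per-axis amplitude levels of square M-QAM, e.g. 16-QAM -> [-3, -1, 1, 3]."""
    side = int(round(order ** 0.5))
    if side < 2 or side * side != order:
        raise ValueError("order must be a perfect square such as 4, 16 or 64")
    return [2 * i - (side - 1) for i in range(side)]


def hard_modulate(x: complex, order: int = 16) -> complex:
    """Deployment: quantize a continuous symbol to the nearest constellation point."""
    levels = qam_levels(order)

    def nearest(v: float) -> int:
        return min(levels, key=lambda level: abs(v - level))

    return complex(nearest(x.real), nearest(x.imag))


def soft_modulate(x: complex, order: int = 16,
                  rng: Optional[random.Random] = None) -> complex:
    """Training proxy: clip to the constellation range, then add uniform noise of
    width 2 (one decision region). In an autodiff framework this map would stay
    differentiable, unlike hard_modulate."""
    rng = rng or random.Random()
    levels = qam_levels(order)
    lo, hi = levels[0], levels[-1]

    def clip(v: float) -> float:
        return max(lo, min(hi, v))

    return complex(clip(x.real) + rng.uniform(-1.0, 1.0),
                   clip(x.imag) + rng.uniform(-1.0, 1.0))


print(hard_modulate(0.4 - 2.7j))  # (1-3j)
```

The noise width matches the spacing between adjacent levels, so during training the network sees perturbations of roughly the same size as the rounding error it will face once symbols are hard-quantized.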
Submitted 8 June, 2024;
originally announced June 2024.
-
Predictive Dynamic Fusion
Authors:
Bing Cao,
Yinan Xia,
Yi Ding,
Changqing Zhang,
Qinghua Hu
Abstract:
Multimodal fusion is crucial in joint decision-making systems for rendering holistic judgments. Since multimodal data change in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most existing dynamic multimodal fusion methods lack theoretical guarantees and easily fall into suboptimal solutions, yielding unreliability and instability. To address this issue, we propose a Predictive Dynamic Fusion (PDF) framework for multimodal learning. We analyze multimodal fusion from a generalization perspective and theoretically derive the predictable Collaborative Belief (Co-Belief) with Mono- and Holo-Confidence, which provably reduces the upper bound of the generalization error. Accordingly, we further propose a relative calibration strategy to calibrate the predicted Co-Belief for potential uncertainty. Extensive experiments on multiple benchmarks confirm the superiority of our method. Our code is available at https://github.com/Yinan-Xia/PDF.
Submitted 13 July, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
GWnext 2024: Meeting Summary
Authors:
Alejandro Torres-Orjuela,
Veronica Vazquez-Aceves,
Rui Xu,
Jin-Hong Chen,
Andrea Derdzinski,
Matthias U. Kruckow,
Stefano Rinaldi,
Lorenzo Speri,
Ziming Wang,
Garvin Yim,
Xue-Ting Zhang,
Qian Hu,
Miaoxin Liu,
Xiangyu Lyu,
Zheng Wu,
Cong Zhou,
Manuel Arca Sedda,
Yan-Chen Bi,
Hong-Yu Chen,
Xian Chen,
Jiageng Jiao,
Yu-Mei Wu
Abstract:
GWnext 2024 was a meeting held at the Kavli Institute for Astronomy and Astrophysics at Peking University from March $4^\text{th}$ to $8^\text{th}$, 2024. At the meeting, researchers at different career stages (with a particular focus on early-career scientists) working on different aspects of gravitational-wave (GW) astronomy gathered to discuss the current status and prospects of the field. The meeting was divided into three core sessions: Astrophysics, GW Theory, and Detection. Each session consisted of introductory talks and extended discussions. In addition, a poster session gave students the opportunity to present their results. In this paper, we summarize the results presented during the meeting and highlight the most important outcomes.
Submitted 27 May, 2024;
originally announced June 2024.
-
Noninvasive magnetic detection of 2D van der Waals room-temperature ferromagnet Fe3GaTe2 using divacancy spins in SiC
Authors:
Xia Chen,
Qin-Yue Luo,
Pei-Jie Guo,
Hao-Jie Zhou,
Qi-Cheng Hu,
Hong-Peng Wu,
Xiao-Wen Shen,
Ru-Yue Cui,
Lei Dong,
Tian-Xing Wei,
Yu-Hang Xiao,
De-Ren Li,
Li Lei,
Xi Zhang,
Jun-Feng Wang,
Gang Xiang
Abstract:
Room-temperature (RT) two-dimensional (2D) van der Waals (vdW) ferromagnets hold immense promise for next-generation spintronic devices for information storage and processing. To achieve high-density, energy-efficient spintronic devices, it is essential to understand the local magnetic properties of RT 2D vdW magnets. In this work, we realize noninvasive in situ magnetic detection in the vdW-layered ferromagnet Fe3GaTe2 at RT using divacancy spin quantum sensors in silicon carbide (SiC). The structural features and magnetic properties of Fe3GaTe2 are characterized using Raman spectroscopy, magnetization, and magneto-transport measurements. Detailed analysis of the temperature- and magnetic-field-dependent optically detected magnetic resonance of the PL6 divacancy near the Fe3GaTe2 reveals that the Curie temperature (Tc) of Fe3GaTe2 is ~360 K and that its magnetization increases with the external magnetic field. Additionally, spin relaxometry is employed to probe the magnetic fluctuations of Fe3GaTe2, revealing a peak in the spin relaxation rate around Tc. These experiments give insights into the intriguing local magnetic properties of the 2D vdW RT ferromagnet Fe3GaTe2 and pave the way for applying SiC quantum sensors to noninvasive in situ magnetic detection of related 2D vdW magnets.
Submitted 4 June, 2024;
originally announced June 2024.
-
Hybrid Reinforcement Learning Framework for Mixed-Variable Problems
Authors:
Haoyan Zhai,
Qianli Hu,
Jiangning Chen
Abstract:
Optimization problems characterized by both discrete and continuous variables are common across various disciplines, presenting unique challenges due to their complex solution landscapes and the difficulty of navigating mixed-variable spaces effectively. To address these challenges, we introduce a hybrid Reinforcement Learning (RL) framework that combines RL for discrete variable selection with Bayesian Optimization for continuous variable adjustment. The framework stands out for its strategic integration of RL and continuous optimization techniques, which enables it to adapt dynamically to the problem's mixed-variable nature. By employing RL to explore discrete decision spaces and Bayesian Optimization to refine continuous parameters, our approach not only is flexible but also improves optimization performance. Our experiments on synthetic functions and real-world machine learning hyperparameter tuning tasks show that our method consistently outperforms traditional RL, random search, and standalone Bayesian optimization in both effectiveness and efficiency.
Submitted 30 May, 2024;
originally announced May 2024.
-
Hierarchical Classification Auxiliary Network for Time Series Forecasting
Authors:
Yanru Sun,
Zongxia Xie,
Dongyue Chen,
Emadeldeen Eldele,
Qinghua Hu
Abstract:
Deep learning has significantly advanced time series forecasting through its powerful capacity to capture sequence relationships. However, training these models with the Mean Square Error (MSE) loss often results in over-smooth predictions, making it difficult to handle the complexity of, and learn high-entropy features from, time series data with high variability and unpredictability. In this work, we introduce a novel approach that tokenizes time series values to train forecasting models via cross-entropy loss, while accounting for the continuous nature of time series data. Specifically, we propose the Hierarchical Classification Auxiliary Network (HCAN), a general model-agnostic component that can be integrated with any forecasting model. HCAN is based on a Hierarchy-Aware Attention module that integrates multi-granularity high-entropy features at different hierarchy levels. At each level, we assign class labels to timesteps to train an Uncertainty-Aware Classifier. This classifier mitigates the over-confidence of the softmax loss via evidence theory. We also implement a Hierarchical Consistency Loss to maintain prediction consistency across hierarchy levels. Extensive experiments integrating HCAN with state-of-the-art forecasting models demonstrate substantial improvements over baselines on several real-world datasets. Code is available at: https://github.com/syrGitHub/HCAN.
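Turning continuous values into class labels at several granularities can be sketched with equal-width bins whose count doubles at each finer level, so that every fine label has a unique parent (fine label // 2). This binning scheme, the number of levels, and the function name are assumptions for illustration; HCAN's actual tokenization may differ.

```python
from typing import List, Sequence


def hierarchical_labels(values: Sequence[float], levels: int = 3) -> List[List[int]]:
    """Hypothetical hierarchical tokenization of a time series.

    Level l (0 = coarsest) splits the observed value range into 2 ** (l + 1)
    equal-width bins; the label of a timestep is the index of its bin.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant series
    labels = []
    for level in range(levels):
        n_bins = 2 ** (level + 1)
        # min(...) puts the maximum value into the last bin instead of bin n_bins.
        labels.append([min(int((v - lo) / span * n_bins), n_bins - 1) for v in values])
    return labels


series = [0.0, 0.3, 0.6, 1.0]
for level, row in enumerate(hierarchical_labels(series)):
    print(level, row)
# 0 [0, 0, 1, 1]
# 1 [0, 1, 2, 3]
# 2 [0, 2, 4, 7]
```

Because the bin count doubles, integer-dividing one level's labels by 2 recovers the level above, which is the kind of cross-level agreement a consistency term can check.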
Submitted 29 May, 2024;
originally announced May 2024.
-
Quantitative Certification of Bias in Large Language Models
Authors:
Isha Chaudhary,
Qian Hu,
Manoj Kumar,
Morteza Ziyadi,
Rahul Gupta,
Gagandeep Singh
Abstract:
Large Language Models (LLMs) can produce responses that exhibit social biases and support stereotypes. However, conventional benchmarking is insufficient to thoroughly evaluate LLM bias, as it cannot scale to large sets of prompts and provides no guarantees. Therefore, we propose a novel certification framework, QuaCer-B (Quantitative Certification of Bias), that provides formal guarantees on obtaining unbiased responses from target LLMs under large sets of prompts. A certificate consists of high-confidence bounds on the probability of obtaining biased responses from the LLM for any set of prompts containing sensitive attributes, sampled from a distribution. We illustrate bias certification in LLMs for prompts with various prefixes drawn from given distributions. We consider distributions of random token sequences, mixtures of manual jailbreaks, and jailbreaks in the LLM's embedding space to certify its bias. We certify popular LLMs with QuaCer-B and present novel insights into their biases.
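A certificate of this kind is a high-confidence bound on a probability estimated from samples. One standard way to bound it, given $n$ sampled prompts of which $k$ yield biased responses, is the exact (Clopper-Pearson) binomial interval. The sketch below assumes that choice and computes the bounds by bisection using only the standard library; whether QuaCer-B uses exactly this construction should be checked against the paper.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f: Callable[[float], float], target: float, increasing: bool, iters: int = 100) -> float:
    """Solve f(p) = target on [0, 1] for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # For increasing f, f(mid) < target means the root lies right of mid;
        # for decreasing f, it means the root lies left of mid.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Exact two-sided bounds on a binomial proportion from k successes in n trials."""
    alpha = 1.0 - confidence
    # Lower bound solves P(X >= k | p) = alpha / 2; P(X >= k) increases with p.
    lower = 0.0 if k == 0 else _bisect(lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= k | p) = alpha / 2; P(X <= k) decreases with p.
    upper = 1.0 if k == n else _bisect(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


# Hypothetical audit: 3 of 200 sampled prompts produced a biased response.
low, high = clopper_pearson(3, 200)
print(f"P(biased) in [{low:.4f}, {high:.4f}] with 95% confidence")
```

A one-sided upper bound, which is what a "probability of bias is at most ..." statement needs, follows by solving $P(X \le k \mid p) = \alpha$ instead of $\alpha/2$.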
Submitted 29 May, 2024;
originally announced May 2024.
-
Single-loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions
Authors:
Quanqi Hu,
Qi Qi,
Zhaosong Lu,
Tianbao Yang
Abstract:
In this paper, we study a class of non-smooth non-convex problems of the form $\min_{x}\left[\max_{y\in Y}\phi(x, y) - \max_{z\in Z}\psi(x, z)\right]$, where both $\Phi(x) = \max_{y\in Y}\phi(x, y)$ and $\Psi(x) = \max_{z\in Z}\psi(x, z)$ are weakly convex functions, and $\phi(x, y)$ and $\psi(x, z)$ are strongly concave in $y$ and $z$, respectively. This class covers two families of problems that have been studied but lack single-loop stochastic algorithms, namely difference of weakly convex functions and weakly convex strongly-concave min-max problems. We propose a stochastic Moreau envelope approximate gradient method, dubbed SMAG, the first single-loop algorithm for solving these problems, and provide a state-of-the-art non-asymptotic convergence rate. The key idea of the design is to compute an approximate gradient of the Moreau envelopes of $\Phi$ and $\Psi$ using only one step of stochastic gradient update of the primal and dual variables. Empirically, we conduct experiments on positive-unlabeled (PU) learning and partial area under the ROC curve (pAUC) optimization with an adversarial fairness regularizer to validate the effectiveness of our proposed algorithms.
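The Moreau-envelope idea mentioned above rests on a standard identity, recalled here for reference. For a $\rho$-weakly convex function $\Phi$ and a smoothing parameter $0 < \lambda < 1/\rho$ (the symbols $\rho$ and $\lambda$ are introduced here, not taken from the abstract), the Moreau envelope and proximal map are

$$
M_{\lambda}\Phi(x) = \min_{u}\left\{\Phi(u) + \frac{1}{2\lambda}\|u - x\|^{2}\right\}, \qquad
\operatorname{prox}_{\lambda\Phi}(x) = \arg\min_{u}\left\{\Phi(u) + \frac{1}{2\lambda}\|u - x\|^{2}\right\},
$$

and $M_{\lambda}\Phi$ is continuously differentiable with

$$
\nabla M_{\lambda}\Phi(x) = \frac{1}{\lambda}\left(x - \operatorname{prox}_{\lambda\Phi}(x)\right).
$$

Applying the same identity to $\Psi$, with a common $\lambda$ below both weak-convexity thresholds (an assumption made here for brevity), gives the gradient of the smoothed difference:

$$
\nabla M_{\lambda}\Phi(x) - \nabla M_{\lambda}\Psi(x) = \frac{1}{\lambda}\left(\operatorname{prox}_{\lambda\Psi}(x) - \operatorname{prox}_{\lambda\Phi}(x)\right).
$$

The two proximal points are therefore what must be approximated. According to the abstract, SMAG does this with a single stochastic gradient step on the primal and dual variables per iteration; its exact update rules and step sizes are given in the paper.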
Submitted 29 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Single and Double Diffractive Production of Dilepton and Photon at LHC
Authors:
Gongming Yu,
Rabia Hameed,
Liyuan Hu,
Qiang Hu
Abstract:
We have investigated the single and double diffractive production of dileptons and photons in ultra-peripheral collisions at the Large Hadron Collider (LHC). Using theoretical models that integrate the quantum electrodynamics (QED) and quantum chromodynamics (QCD) frameworks, we analyze the differential cross sections of these processes, with particular emphasis on the role of the Pomeron and resolved Pomeron structures, as well as resolved photon structures. We employ diffractive production mechanisms to predict dilepton and photon production rates under various LHC energy scenarios. Our results demonstrate distinct production patterns for single and double diffractive processes, highlighting their potential as probes of the electromagnetic structure of heavy ions and of the dynamics of soft interactions in high-energy collisions. This paper provides new insights into photon-mediated and Pomeron-mediated production mechanisms and sets the stage for future experimental investigations at collider facilities.
Submitted 25 May, 2024;
originally announced May 2024.
-
Minimizing UCB: a Better Local Search Strategy in Local Bayesian Optimization
Authors:
Zheyi Fan,
Wenyu Wang,
Szu Hui Ng,
Qingpei Hu
Abstract:
Local Bayesian optimization (BO) is a promising practical approach to high-dimensional black-box function optimization. Among such approaches is the class of approximate-gradient methods, which implement a strategy similar to gradient descent. These methods have achieved good experimental results and come with theoretical guarantees. However, given the distributional properties of the Gaussian processes used in these methods, there may be potential to further exploit the information in the Gaussian processes to facilitate the BO search. In this work, we establish the relationship between a gradient descent step and a step that minimizes the Upper Confidence Bound (UCB), and show that the latter can be a better strategy than direct gradient descent when a Gaussian process is used as a surrogate. Building on this insight, we propose a new local Bayesian optimization algorithm, MinUCB, which replaces the gradient descent step in GIBO with minimizing the UCB. We further show that MinUCB maintains a convergence rate similar to that of GIBO. We then improve the acquisition function of MinUCB through a look-ahead strategy and obtain a more efficient algorithm, LA-MinUCB. We apply our algorithms to different synthetic and real-world functions, and the results show the effectiveness of our method. Our algorithms also illustrate improvements to local search strategies from an upper-bound perspective in Bayesian optimization and provide a new direction for future algorithm design.
Submitted 24 May, 2024;
originally announced May 2024.
-
JointRF: End-to-End Joint Optimization for Dynamic Neural Radiance Field Representation and Compression
Authors:
Zihan Zheng,
Houqiang Zhong,
Qiang Hu,
Xiaoyun Zhang,
Li Song,
Ya Zhang,
Yanfeng Wang
Abstract:
Neural Radiance Field (NeRF) excels at photo-realistic rendering of static scenes, inspiring numerous efforts to apply it to volumetric videos. However, rendering dynamic and long-sequence radiance fields remains challenging due to the large amount of data required to represent volumetric videos. In this paper, we propose a novel end-to-end joint optimization scheme of dynamic NeRF representation and compression, called JointRF, which achieves significantly improved quality and compression efficiency compared with previous methods. Specifically, JointRF employs a compact residual feature grid and a coefficient feature grid to represent the dynamic NeRF. This representation handles large motions without compromising quality while reducing temporal redundancy. We also introduce a sequential feature compression subnetwork to further reduce spatial-temporal redundancy. Finally, the representation and compression subnetworks are trained jointly end-to-end within JointRF. Extensive experiments demonstrate that JointRF achieves superior compression performance across various datasets.
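Reducing temporal redundancy with residuals can be illustrated on plain feature vectors: store the first frame, then code each later frame as a quantized difference from the previously decoded frame. This is a generic closed-loop residual coder written for illustration, not JointRF's residual and coefficient feature grids or its learned compression subnetwork; the function names and the uniform quantization step are assumptions.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def encode_residuals(frames: Sequence[Sequence[float]], step: float) -> Tuple[Frame, List[List[int]]]:
    """Keep frame 0 verbatim and code every later frame as integer residual symbols.

    Residuals are taken against the *decoded* previous frame (closed loop), so the
    quantization error stays within step / 2 instead of accumulating over time.
    """
    key = list(frames[0])
    recon = key
    codes: List[List[int]] = []
    for frame in frames[1:]:
        q = [round((f - r) / step) for f, r in zip(frame, recon)]
        codes.append(q)
        recon = [r + k * step for r, k in zip(recon, q)]  # mirror the decoder exactly
    return key, codes


def decode_residuals(key: Sequence[float], codes: Sequence[Sequence[int]], step: float) -> List[Frame]:
    """Rebuild every frame by accumulating dequantized residuals onto the key frame."""
    frames = [list(key)]
    for q in codes:
        frames.append([r + k * step for r, k in zip(frames[-1], q)])
    return frames


# Three frames of a 3-value "feature grid"; the middle frame is static.
seq = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.1, 1.2, 2.0]]
key, codes = encode_residuals(seq, step=0.05)
print(codes)  # [[0, 0, 0], [2, 4, 0]]
decoded = decode_residuals(key, codes, step=0.05)
```

Static content produces all-zero symbols, which an entropy coder can store very cheaply; that is the redundancy residual representations aim to expose.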
Submitted 8 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Exploring quantum criticality and ergodicity-breaking dynamics in spin-1 Kitaev chains via single-ion anisotropies
Authors:
Wen-Yi Zhang,
Qing-Min Hu,
Jie Ren,
Liangsheng Li,
Wen-Long You
Abstract:
We investigate the topological gauge-theory terms and quantum criticality in the spin-1 Kitaev chain with generic single-ion anisotropies (SIAs). The ground-state phase diagram, including the Kitaev spin liquid (KSL) phase and the gapless dimer phase, is determined by the infinite time-evolving block decimation (iTEBD) method. Varying the strength of the uniaxial SIA drives a quantum phase transition between the KSL phase and the dimer phase, which is equivalent to the confinement-deconfinement transition in the lattice Schwinger model with a topological $\theta$-angle of $\pi$. Here, we demonstrate that adding a rhombic SIA to the $\mathbb{Z}_2$-symmetric model shifts the topological angle $\theta$ away from $\pi$, leading to the emergence of $y$-ferroquadrupole and $x$-ferroquadrupole phases for negative and positive rhombic SIAs, respectively. When these rhombic SIAs are tuned, the system passes from the $y$-ferroquadrupole phase to the $x$-ferroquadrupole phase either through a crossover via the KSL phase or through a genuine phase transition across the deconfined line, thus providing an example of unnecessary phase transitions. We find that the spin-1 Hamiltonian can be exactly mapped to a spin-1/2 effective extended PXP Hamiltonian coupled to all positive $\mathbb{Z}_2$ gauge charges, where the uniaxial SIA is equivalent to a uniform detuning while the rhombic SIA plays the role of a staggered detuning. We then examine the hierarchical fragmentation of the Hilbert space and its associated dynamics in specific limiting cases. Quantum many-body scars (QMBS) arise from the Fibonacci constraint under a weak SIA field. When the staggered detuning is sufficiently large, the resulting slow dynamics can nominally be understood using a second-order effective Hamiltonian derived by the Schrieffer-Wolff transformation.
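In PXP-type models the "Fibonacci constraint" is the rule that no two neighboring sites may be excited at once. On an open chain of $L$ sites the number of allowed basis states is the Fibonacci number $F_{L+2}$ (a periodic chain gives the $L$-th Lucas number instead), so the constrained Hilbert space grows like $\varphi^L$, with $\varphi$ the golden ratio, rather than like $2^L$. The brute-force count below checks the open-chain statement. It concerns only the effective spin-1/2 PXP description, and the open boundary condition is an assumption made for the sketch.

```python
from itertools import product
from typing import List, Tuple


def constrained_basis(length: int) -> List[Tuple[int, ...]]:
    """All 0/1 configurations of an open chain with no two adjacent 1s (PXP-type constraint)."""
    return [
        s for s in product((0, 1), repeat=length)
        if not any(a == 1 and b == 1 for a, b in zip(s, s[1:]))
    ]


def fibonacci(n: int) -> int:
    """F(0) = 0, F(1) = 1, F(n) = F(n - 1) + F(n - 2)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


for length in range(1, 11):
    dim = len(constrained_basis(length))
    assert dim == fibonacci(length + 2)  # constrained dimension vs. Fibonacci number
    print(length, dim, 2 ** length)       # chain length, constrained dim, full dim
```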
Submitted 21 May, 2024;
originally announced May 2024.
-
Single Aperture Large Telescope for Universe Studies (SALTUS): Science Overview
Authors:
Gordon Chin,
Carrie M. Anderson,
Jennifer Bergner,
Nicolas Biver,
Gordon L. Bjoraker,
Thibault Cavalie,
Michael DiSanti,
Jian-Rong Gao,
Paul Hartogh,
Leon K. Harding,
Qing Hu,
Daewook Kim,
Craig Kulesa,
Gert de Lange,
David T. Leisawitz,
Rebecca C. Levy,
Arthur Lichtenberger,
Daniel P. Marronh,
Joan Najita,
Trent Newswander,
George H. Rieke,
Dimitra Rigopoulou,
Peter Roefsema,
Nathan X. Roth,
Kamber Schwarz
, et al. (11 additional authors not shown)
Abstract:
The SALTUS Probe mission will provide a powerful far-infrared (far-IR) pointed space observatory to explore our cosmic origins and the possibility of life elsewhere. The observatory employs an innovative deployable 14-m aperture, with a sunshield that will radiatively cool the off-axis primary to <45 K. This cooled primary reflector works in tandem with cryogenic coherent and incoherent instruments that span the 34 to 660 micron far-IR range at both high and moderate spectral resolutions.
Submitted 21 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct the primary information of cosmic-ray and gamma-ray showers, which is then used for physics analyses in gamma-ray astronomy and cosmic-ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It monitors the status of the detector units, the stability of the reconstructed parameters, and the performance of the array based on observations of the Crab Nebula and the Moon shadow. This paper introduces the control system and its application to the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable, and the results obtained from the Moon shadow and Crab Nebula observations are consistent with each other. Based on observations of the Crab Nebula at energies from 25 TeV to 100 TeV, the time-averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Visible and Clear: Finding Tiny Objects in Difference Map
Authors:
Bing Cao,
Haiyu Yao,
Pengfei Zhu,
Qinghua Hu
Abstract:
Tiny object detection is one of the key challenges in the field of object detection. The performance of most generic detectors drops dramatically on tiny object detection tasks. The main challenge lies in extracting effective features of tiny objects. Existing methods usually perform generation-based feature enhancement, which is seriously affected by spurious textures and artifacts, making it difficult to make tiny-object-specific features visible and clear for detection. To address this issue, we propose a self-reconstructed tiny object detection (SR-TOD) framework. For the first time, we introduce a self-reconstruction mechanism into the detection model and discover a strong correlation between it and tiny objects. Specifically, we place a reconstruction head within the neck of a detector and construct a difference map between the reconstructed image and the input, which is highly sensitive to tiny objects. This inspires us to enhance the weak representations of tiny objects under the guidance of the difference maps, thereby improving the visibility of tiny objects for the detectors. Building on this, we further develop a Difference Map Guided Feature Enhancement (DGFE) module to make the tiny-object feature representation clearer. In addition, we propose a new multi-instance anti-UAV dataset, called DroneSwarms, which contains a large number of tiny drones with the smallest average size to date. Extensive experiments on the DroneSwarms dataset and other datasets demonstrate the effectiveness of the proposed method. The code and dataset will be made publicly available.
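The difference map itself is simple to state: the per-pixel absolute difference between the input and its reconstruction, rescaled here to [0, 1]. In the sketch below the "reconstruction" is a hand-made stand-in (SR-TOD learns it with a reconstruction head), images are grayscale nested lists, and the max-normalization is an assumption. The DGFE module that consumes the map is not shown.

```python
from typing import List

Image = List[List[float]]


def difference_map(image: Image, reconstruction: Image) -> Image:
    """Per-pixel |input - reconstruction|, rescaled so the largest difference is 1."""
    diff = [[abs(a - b) for a, b in zip(row_in, row_rec)]
            for row_in, row_rec in zip(image, reconstruction)]
    peak = max(max(row) for row in diff)
    if peak == 0.0:
        return diff  # perfect reconstruction: nothing stands out
    return [[v / peak for v in row] for row in diff]


# Hypothetical 4x4 frame: flat background plus one bright "tiny object" pixel that a
# smooth reconstruction fails to reproduce.
frame = [[0.1] * 4 for _ in range(4)]
frame[2][1] = 0.9
smooth_reconstruction = [[0.1] * 4 for _ in range(4)]
dmap = difference_map(frame, smooth_reconstruction)
for row in dmap:
    print(row)
```

Only the pixel the reconstruction missed lights up, which is the property the abstract relies on when it uses the map to guide feature enhancement.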
Submitted 11 July, 2024; v1 submitted 18 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of the Large High Altitude Air Shower Observatory (LHAASO) reported the detection of a very-high-energy (VHE) gamma-ray source, 1LHAASO J1219+2915. In this paper, a more detailed study of the spectral and temporal behavior of this point-like source is carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03^{\circ}$. Variability analysis indicates variability on a timescale of a few months in the TeV band, which is consistent with low-frequency observations. Based on these observations, we report the detection of TeV $\gamma$-ray emission from the low-luminosity AGN NGC 4278. The LHAASO-WCDA observations during the active period have a significance of $8.8\,\sigma$, with a best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{stat}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula flux. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
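Checking whether a TeV position is compatible with a counterpart starts from the great-circle separation between the two sky positions. The sketch uses the haversine formula, which stays accurate at the hundredths-of-a-degree scale quoted above. The counterpart coordinates must be taken from a catalog, since the abstract does not list them; `offset_from_tev` is a hypothetical helper.

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation (degrees) between two equatorial positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula; min() guards against rounding pushing the argument above 1.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


TEV_RA, TEV_DEC = 185.05, 29.25  # best-fit TeV position quoted in the abstract


def offset_from_tev(ra_deg: float, dec_deg: float) -> float:
    """Separation of a counterpart position (e.g. a catalog position of NGC 4278) from the TeV position."""
    return angular_separation_deg(TEV_RA, TEV_DEC, ra_deg, dec_deg)
```

An offset in RA alone shrinks on the sky by roughly cos(Dec), about 0.87 at Dec ≈ 29°, so raw RA and Dec differences should not be compared directly with a quoted separation.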
Submitted 13 May, 2024;
originally announced May 2024.
-
Backdoor Removal for Generative Large Language Models
Authors:
Haoran Li,
Yulin Chen,
Zihao Zheng,
Qi Hu,
Chunkit Chan,
Heshan Liu,
Yangqiu Song
Abstract:
With rapid advances, generative large language models (LLMs) dominate various Natural Language Processing (NLP) tasks, from understanding to reasoning. Yet language models' inherent vulnerabilities may be exacerbated by increased accessibility and unrestricted model training on massive textual data from the Internet. A malicious adversary may publish poisoned data online and conduct backdoor attacks on victim LLMs pre-trained on the poisoned data. Backdoored LLMs behave innocuously for normal queries and generate harmful responses when the backdoor trigger is activated. Despite significant efforts devoted to LLM safety, LLMs still struggle against backdoor attacks. As Anthropic recently revealed, existing safety training strategies, including supervised fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), fail to remove backdoors once the LLM has been backdoored during the pre-training stage. In this paper, we present Simulate and Eliminate (SANDE) to erase the undesired backdoored mappings in generative LLMs. We first propose Overwrite Supervised Fine-tuning (OSFT) for effective backdoor removal when the trigger is known. Then, to handle scenarios where the trigger patterns are unknown, we integrate OSFT into our two-stage framework, SANDE. Unlike previous works that center on identifying backdoors, our safety-enhanced LLMs behave normally even when the exact triggers are activated. We conduct comprehensive experiments showing that SANDE is effective against backdoor attacks while causing minimal harm to the LLMs' capabilities, without any additional access to unbackdoored clean models. We will release code to reproduce our results.
Submitted 13 May, 2024;
originally announced May 2024.
-
Discovering hidden physics using ML-based multimodal super-resolution measurement and its application to fusion plasmas
Authors:
Azarakhsh Jalalvand,
SangKyeun Kim,
Jaemin Seo,
Qiming Hu,
Max Curie,
Peter Steiner,
Andrew Oakleigh Nelson,
Yong-Su Na,
Egemen Kolemen
Abstract:
A non-linear complex system governed by multi-spatial and multi-temporal physics scales cannot be fully understood with a single diagnostic, as each provides only a partial view and much information is lost during data extraction. Combining multiple diagnostics also results in imperfect projections of the system's physics. By identifying hidden inter-correlations between diagnostics, we can leverage mutual support to fill in these gaps, but uncovering these inter-correlations analytically is too complex. We introduce a groundbreaking machine learning methodology to address this issue. Our multimodal approach generates super-resolution data encompassing multiple physical phenomena, capturing detailed structural evolution and responses to perturbations that were previously unobservable. This methodology addresses a critical problem in fusion plasmas: the Edge Localized Mode (ELM), a plasma instability that can severely damage reactor walls. One method of stabilizing ELMs is to use resonant magnetic perturbations to trigger magnetic islands. However, the low spatial and temporal resolution of measurements limits the analysis of these magnetic islands because of their small size, rapid dynamics, and complex interactions within the plasma. With super-resolution diagnostics, we can experimentally verify theoretical models of magnetic islands for the first time, providing unprecedented insights into their role in ELM stabilization. This advancement aids in developing effective ELM suppression strategies for future fusion reactors like ITER and has broader applications, potentially revolutionizing diagnostics in fields such as astronomy, astrophysics, and medical imaging.
Submitted 27 June, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Exploration of morphological coherence in open clusters with a "core-shell" structure
Authors:
Qingshun Hu,
Yu Zhang,
Songmei Qin,
Jing Zhong,
Li Chen,
Yangping Luo
Abstract:
Studying the morphological coherence of open clusters allows us to better understand their morphological evolution. We employed the ellipsoid fitting method to delineate the 3D spatial structure of the sample clusters, and used the morphological dislocation (MD) defined in our previous work and the ellipticity ratio (ER) of the clusters' inner and outer structures to characterize the morphological coherence of the sample clusters. The results show an inverse correlation between the ER of the sample clusters and the number of their members, indicating that sample clusters with a much more elliptical external morphology than internal shape generally tend to host a large number of members. Meanwhile, the MD of the sample clusters shrinks slightly as their number of members increases, which may shed light on the significant role of gravitational binding in maintaining their morphological stability. Moreover, neither the MD nor the ER of the sample clusters correlates with their age. They are also not significantly correlated with the X-axis, the Y-axis, the clusters' orbital eccentricities, or the radial and vertical forces acting on them. However, the distributions of the ER against these covariates display some fluctuations, implying that the morphologies of the sample clusters are sensitive to the external environment if sample effects are not taken into account. Finally, the analysis of the 3D spatial shapes of sample clusters with a small or a large ER demonstrates that the number of members lays an important foundation for forming a dense internal system. At the same time, the MD of the sample clusters can serve as a good indicator of their morphological stability, which is built upon a sufficient number of member stars.
Submitted 11 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Highest Fusion Performance without Harmful Edge Energy Bursts in Tokamak
Authors:
SangKyeun Kim,
Ricardo Shousha,
SeongMoo Yang,
Qiming Hu,
SangHee Hahn,
Azarakhsh Jalalvand,
Jong-Kyu Park,
Nikolas Christopher Logan,
Andrew Oakleigh Nelson,
Yong-Su Na,
Raffi Nazikian,
Robert Wilcox,
Rongjie Hong,
Terry Rhodes,
Carlos Paz-Soldan,
YoungMu Jeon,
MinWoo Kim,
WongHa Ko,
JongHa Lee,
Alexander Battey,
Alessandro Bortolon,
Joseph Snipes,
Egemen Kolemen
Abstract:
The path to tokamak fusion and ITER requires maintaining high-performance plasma to produce sufficient fusion power. This effort is hindered by transient energy bursts arising from instabilities at the boundary of high-confinement plasmas. Applying 3D magnetic perturbations is the method chosen for ITER, and possibly for future fusion power plants, to suppress this instability and prevent energy bursts from damaging the device. Unfortunately, the conventional use of the 3D field in tokamaks typically leads to degraded fusion performance and an increased risk of other plasma instabilities, two severe issues for reactor implementation. In this work, we present an innovative 3D field optimization, exploiting machine learning, real-time adaptability, and multi-device capabilities to overcome these limitations. This integrated scheme is successfully deployed on DIII-D and KSTAR tokamaks, consistently achieving reactor-relevant core confinement and the highest fusion performance without triggering damaging instabilities or bursts while demonstrating ITER-relevant automated 3D optimization for the first time. This is enabled both by advances in the physics understanding of self-organized transport in the plasma edge and by advances in machine-learning technology, which is used to optimize the 3D field spectrum for automated management of a volatile and complex system. These findings establish real-time adaptive 3D field optimization as a crucial tool for ITER and future reactors to maximize fusion performance while simultaneously minimizing damage to machine components.
Submitted 8 May, 2024;
originally announced May 2024.
-
Possible Causes of False General Relativity Violations in Gravitational Wave Observations
Authors:
Anuradha Gupta,
K. G. Arun,
Enrico Barausse,
Laura Bernard,
Emanuele Berti,
Sajad A. Bhat,
Alessandra Buonanno,
Vitor Cardoso,
Shun Yin Cheung,
Teagan A. Clarke,
Sayantani Datta,
Arnab Dhani,
Jose María Ezquiaga,
Ish Gupta,
Nir Guttman,
Tanja Hinderer,
Qian Hu,
Justin Janquart,
Nathan K. Johnson-McDaniel,
Rahul Kashyap,
N. V. Krishnendu,
Paul D. Lasky,
Andrew Lundgren,
Elisa Maggio,
Parthapratim Mahapatra
, et al. (18 additional authors not shown)
Abstract:
General relativity (GR) has proven to be a highly successful theory of gravity since its inception. The theory has passed numerous experimental tests, predominantly in the weak-gravity, low-relative-speed, and linear regimes, but also in the strong-field and very low-speed regimes with binary pulsars. Observable gravitational waves (GWs) originate from regions of spacetime where gravity is extremely strong, making them a unique tool for testing GR in previously inaccessible regions of large curvature, relativistic speeds, and strong gravity. Since their first detection, GWs have been extensively used to test GR, but no deviations have been found so far. Given GR's tremendous success in explaining current astronomical observations and laboratory experiments, accepting any deviation from it requires a very high level of statistical confidence and consistency of the deviation across GW sources. In this paper, we compile a comprehensive list of potential causes that can lead to a false identification of a GR violation in standard tests of GR on data from current and future ground-based GW detectors. These causes include detector noise, signal overlaps, gaps in the data, detector calibration, source model inaccuracy, missing physics in the source and in the underlying environment model, source misidentification, and mismodeling of the astrophysical population. We also provide a rough estimate of when each of these causes will become important for tests of GR for different detector sensitivities. We argue that each of these causes should be thoroughly investigated, quantified, and ruled out before claiming a GR violation in GW observations.
Submitted 3 May, 2024;
originally announced May 2024.
-
Multimodal Fusion on Low-quality Data: A Comprehensive Survey
Authors:
Qingyang Zhang,
Yake Wei,
Zongbo Han,
Huazhu Fu,
Xi Peng,
Cheng Deng,
Qinghua Hu,
Cai Xu,
Jie Wen,
Di Hu,
Changqing Zhang
Abstract:
Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, which has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical diagnosis. However, the reliability of multimodal fusion remains largely unexplored, especially under low-quality data settings. This paper surveys the common challenges and recent advances of multimodal fusion in the wild and presents them in a comprehensive taxonomy. From a data-centric view, we identify four main challenges faced by multimodal fusion on low-quality data, namely (1) noisy multimodal data contaminated with heterogeneous noise, (2) incomplete multimodal data in which some modalities are missing, (3) imbalanced multimodal data in which the qualities or properties of different modalities differ significantly, and (4) quality-varying multimodal data in which the quality of each modality changes dynamically across samples. This new taxonomy will enable researchers to understand the state of the field and identify several potential directions. We also discuss open problems in this field together with interesting future research directions.
Submitted 5 May, 2024; v1 submitted 27 April, 2024;
originally announced April 2024.
-
CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions
Authors:
Haoyuan Li,
Qi Hu,
You Yao,
Kailun Yang,
Peng Chen
Abstract:
Cross-modality images that integrate visible-infrared spectral cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods degrade severely under adverse weather conditions. This failure stems from the pronounced sensitivity of visible images to environmental perturbations, such as rain, haze, and snow, which frequently cause false negatives and false positives in detection. To address this issue, we introduce a novel and challenging task, termed visible-infrared object detection under adverse weather conditions. To foster this task, we have constructed a new Severe Weather Visible-Infrared Dataset (SWVID) with diverse severe weather scenes. Furthermore, we introduce the Cross-modality Fusion Mamba with Weather-removal (CFMW) to improve detection accuracy in adverse weather conditions. Thanks to the proposed Weather Removal Diffusion Model (WRDM) and Cross-modality Fusion Mamba (CFM) modules, CFMW can mine more essential information about pedestrian features during cross-modality fusion, can therefore transfer efficiently to other, rarer scenarios, and remains practical on platforms with low computing power. To the best of our knowledge, this is the first study to integrate both Diffusion and Mamba modules for targeted improvement of cross-modality object detection, successfully expanding the practical application of this type of model with its higher accuracy and more advanced architecture. Extensive experiments on both well-recognized and self-created datasets demonstrate that our CFMW achieves state-of-the-art detection performance, surpassing existing benchmarks. The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW.
Submitted 24 April, 2024;
originally announced April 2024.
-
Magnetic flux induced topological superconductivity in magnetic atomic rings
Authors:
Jinpeng Xiao,
Qianglin Hu,
Xiaobing Luo
Abstract:
There have been numerous studies on topological superconductivity in magnetic atomic chains deposited on s-wave superconductors. Most of these investigations have focused on spin-orbit interactions or helical spin orders. In this paper, we propose a model for achieving one-dimensional topological superconductivity in a magnetic atomic ring. This model utilizes a magnetic field and an antiferromagnetic/ferromagnetic order, under the condition that the magnetic field is perpendicular to the moments of the magnetic order. On a quasi-one-dimensional substrate surface, where the half-filled ring favors an antiferromagnetic configuration, we demonstrate that either the magnetic field itself or a Rashba spin-orbit coupling guarantees the perpendicularity. On a two-dimensional surface, where the ring favors ferromagnetic orders, the perpendicularity is achieved by introducing a minor Rashba spin-orbit coupling.
Submitted 23 May, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
Authors:
Haozhe Cheng,
Cheng Ju,
Haicheng Wang,
Jinxiang Liu,
Mengting Chen,
Qiang Hu,
Xiaoyun Zhang,
Yanfeng Wang
Abstract:
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action Recognition (OVAR) has recently gained increasing attention with the development of vision-language pre-training. To enable generalization to arbitrary classes, existing methods treat class labels as text descriptions, then formulate OVAR as evaluating embedding similarity between visual samples and textual classes. However, one crucial issue has been completely ignored: the class descriptions given by users may be noisy, e.g., contain misspellings and typos, limiting the real-world practicality of vanilla OVAR. To fill this research gap, this paper is the first to evaluate existing methods by simulating multi-level noise of various types, and reveals their poor robustness. To tackle the noisy OVAR task, we further propose a novel DENOISER framework consisting of two parts: generation and discrimination. Concretely, the generative part denoises noisy class-text names via a decoding process: it proposes text candidates, then uses inter-modal and intra-modal information to vote for the best one. In the discriminative part, we use vanilla OVAR models to assign visual samples to class-text names, thus obtaining more semantics. For optimization, we alternate between the generative and discriminative parts for progressive refinement. The denoised text classes help OVAR models classify visual samples more accurately; in return, the classified visual samples help improve denoising. On three datasets, we carry out extensive experiments that show its superior robustness, along with thorough ablations that dissect the effectiveness of each component.
Submitted 23 April, 2024;
originally announced April 2024.
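The DENOISER abstract mentions two mechanical steps: simulating multi-level noise in class names, and proposing text candidates before voting. A minimal sketch of both, assuming character-level insert/delete/substitute edits as the noise model and plain string similarity (Python's `difflib`) as a text-only stand-in for candidate proposal; the noise types, rates, and the inter-modal (video-side) vote used in the paper are not reproduced, and `add_typos` / `propose_candidates` are hypothetical helpers.

```python
import difflib
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def add_typos(label: str, rate: float, seed: int = 0) -> str:
    """Hypothetical noise model: corrupt a class name with random character edits.

    Each character is edited with probability `rate` (the noise level); an edit
    is an insertion, deletion, or substitution chosen uniformly at random.
    """
    rng = random.Random(seed)  # seeded so a noisy benchmark is reproducible
    out = []
    for ch in label:
        if rng.random() >= rate:
            out.append(ch)  # keep the character unchanged
            continue
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert":
            out.extend([ch, rng.choice(LETTERS)])  # keep ch, add a stray letter
        elif op == "substitute":
            out.append(rng.choice(LETTERS))  # replace ch with a random letter
        # "delete": append nothing, so ch is dropped
    return "".join(out)

def propose_candidates(noisy: str, vocabulary: list[str], n: int = 3) -> list[str]:
    """Hypothetical text-only (intra-modal) proposal: closest vocabulary entries."""
    return difflib.get_close_matches(noisy, vocabulary, n=n, cutoff=0.6)
```

For example, `propose_candidates("playing gutiar", ["playing guitar", "playing piano"])` ranks "playing guitar" first; in the framework described above, such candidates would then be re-scored with visual evidence.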
-
Enhancing Fault Detection for Large Language Models via Mutation-Based Confidence Smoothing
Authors:
Qiang Hu,
Jin Wen,
Maxime Cordy,
Yuheng Huang,
Xiaofei Xie,
Lei Ma
Abstract:
Large language models (LLMs) have achieved great success in multiple application domains and have recently attracted huge attention from different research communities. Unfortunately, even the best LLM still has many faults, i.e., inputs that it cannot predict correctly. Such faults harm the usability of LLMs. Revealing them quickly is important but challenging, for two reasons: 1) the heavy labeling effort needed to prepare test data, and 2) the monetary cost of accessing closed-source LLMs such as GPT4. In the traditional deep learning testing field, test selection methods have been proposed to address this problem by prioritizing faults for efficient testing of deep learning models. However, the usefulness of these methods on LLMs remains unclear and underexplored. In this paper, we first study the effectiveness of existing fault detection methods for LLMs. Experimental results on four different tasks (including both code tasks and natural language processing tasks) and four LLMs (e.g., LLaMA and GPT4) demonstrate that existing fault detection methods do not perform well on LLMs (e.g., seven out of eight methods perform worse than random selection on LLaMA). To enhance existing fault detection methods, we propose MuCS, a prompt Mutation-based prediction Confidence Smoothing method for LLMs. Concretely, we mutate the prompts and compute the average prediction confidence of all mutants as the input of fault detection methods. The results show that our proposed solution significantly enhances existing methods, improving test relative coverage by up to 97.64%.
Submitted 14 April, 2024;
originally announced April 2024.
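The MuCS abstract says prompts are mutated and the average prediction confidence over all mutants is fed to an existing fault-detection method. A minimal sketch, assuming the per-mutant class probabilities are already available and using a Gini-impurity score (in the style of DeepGini) as one example downstream prioritizer; the mutation operators and the specific detection methods evaluated in the paper are not reproduced here.

```python
import numpy as np

def smoothed_confidence(mutant_probs: np.ndarray) -> np.ndarray:
    """Average the softmax outputs from all mutated prompts of one input.

    mutant_probs has shape (num_mutants, num_classes).
    """
    return mutant_probs.mean(axis=0)

def gini_impurity(probs: np.ndarray) -> float:
    """DeepGini-style score 1 - sum(p^2): higher means less confident."""
    return float(1.0 - np.sum(probs ** 2))

def prioritize(inputs_mutant_probs: list[np.ndarray]) -> list[int]:
    """Order input indices from most to least likely to be a fault."""
    scores = [gini_impurity(smoothed_confidence(p)) for p in inputs_mutant_probs]
    return [int(i) for i in np.argsort(scores)[::-1]]
```

The point of smoothing shows up when one prompt looks confident but its mutants disagree: averaging `[[0.9, 0.1], [0.1, 0.9]]` gives `[0.5, 0.5]`, so that input is ranked ahead of one whose mutants consistently predict `[0.9, 0.1]`.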
-
CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
Authors:
Wencheng Zhu,
Xin Zhou,
Pengfei Zhu,
Yu Wang,
Qinghua Hu
Abstract:
In this paper, we present a simple yet effective contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints. Unlike traditional knowledge distillation methods that concentrate on maximizing feature similarities or preserving class-wise semantic correlations between teacher and student features, our method attempts to recover the "dark knowledge" by aligning sample-wise teacher and student logits. Specifically, our method first minimizes logit differences within the same sample by considering their numerical values, thus preserving intra-sample similarities. Next, we bridge semantic disparities by leveraging dissimilarities across different samples. Note that constraints on intra-sample similarities and inter-sample dissimilarities can be efficiently and effectively reformulated into a contrastive learning framework with newly designed positive and negative pairs. The positive pair consists of the teacher's and student's logits derived from an identical sample, while the negative pairs are formed by using logits from different samples. With this formulation, our method benefits from the simplicity and efficiency of contrastive learning through the optimization of InfoNCE, yielding a run-time complexity that is far less than $O(n^2)$, where $n$ represents the total number of training samples. Furthermore, our method can eliminate the need for hyperparameter tuning, particularly related to temperature parameters and large batch sizes. We conduct comprehensive experiments on three datasets including CIFAR-100, ImageNet-1K, and MS COCO. Experimental results clearly confirm the effectiveness of the proposed method on both image classification and object detection tasks. Our source code will be made publicly available at https://github.com/wencheng-zhu/CKD.
Submitted 22 April, 2024;
originally announced April 2024.
-
Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems
Authors:
Xiaofei Wang,
Yunfeng Zhao,
Chao Qiu,
Qinghua Hu,
Victor C. M. Leung
Abstract:
Amidst the robust impetus from artificial intelligence (AI) and big data, edge intelligence (EI) has emerged as a nascent computing paradigm, synthesizing AI with edge computing (EC) to become an exemplary solution for unleashing the full potential of AI services. Nonetheless, challenges in communication costs, resource allocation, privacy, and security continue to constrain its proficiency in supporting services with diverse requirements. In response to these issues, this paper introduces socialized learning (SL) as a promising solution, further propelling the advancement of EI. SL is a learning paradigm predicated on social principles and behaviors, aimed at amplifying the collaborative capacity and collective intelligence of agents within the EI system. SL not only enhances the system's adaptability but also optimizes communication and networking processes, which are essential for distributed intelligence across diverse devices and platforms. Therefore, combining SL and EI may greatly facilitate the development of collaborative intelligence in future networks. This paper presents the findings of a literature review on the integration of EI and SL, summarizing the latest achievements in existing research on EI and SL. Subsequently, we delve comprehensively into the limitations of EI and how it could benefit from SL. Special emphasis is placed on the communication challenges, networking strategies, and other aspects within these systems, underlining the role of optimized network solutions in improving system efficacy. Based on these discussions, we elaborate in detail on three integrated components: socialized architecture, socialized training, and socialized inference, analyzing their strengths and weaknesses. Finally, we identify some possible future applications of combining SL and EI, discuss open problems, and suggest directions for future research.
Submitted 20 April, 2024;
originally announced April 2024.
-
Backdoor Attacks and Defenses on Semantic-Symbol Reconstruction in Semantic Communications
Authors:
Yuan Zhou,
Rose Qingyang Hu,
Yi Qian
Abstract:
Semantic communication is of crucial importance for the next-generation wireless communication networks. The existing works have developed semantic communication frameworks based on deep learning. However, systems powered by deep learning are vulnerable to threats such as backdoor attacks and adversarial attacks. This paper delves into backdoor attacks targeting deep learning-enabled semantic communication systems. Since current works on backdoor attacks are not tailored for semantic communication scenarios, a new backdoor attack paradigm on semantic symbols (BASS) is introduced, based on which the corresponding defense measures are designed. Specifically, a training framework is proposed to prevent BASS. Additionally, reverse engineering-based and pruning-based defense strategies are designed to protect against backdoor attacks in semantic communication. Simulation results demonstrate the effectiveness of both the proposed attack paradigm and the defense strategies.
Submitted 20 April, 2024;
originally announced April 2024.
-
Approximate Wireless Communication for Lossy Gradient Updates in IoT Federated Learning
Authors:
Xiang Ma,
Haijian Sun,
Rose Qingyang Hu,
Yi Qian
Abstract:
Federated learning (FL) has emerged as a distributed machine learning (ML) technique that can protect local data privacy for participating clients and improve system efficiency. Instead of sharing raw data, FL exchanges intermediate learning parameters, such as gradients, among clients. This article presents an efficient wireless communication approach tailored for FL parameter transmission, especially for Internet of Things (IoT) devices, to facilitate model aggregation. Our study considers practical wireless channels that can introduce random bit errors, which can substantially affect FL performance. Motivated by the empirical distribution of gradient values, we introduce a novel received-bit masking method that confines received gradient values within prescribed limits. Moreover, given the intrinsic error resilience of ML gradients, our approach delivers approximate gradient values, errors included, without resorting to extensive error correction coding or retransmission. This strategy reduces computational overhead at both the transmitter and the receiver and minimizes communication latency. Consequently, our scheme is particularly well suited for resource-constrained IoT devices. Additionally, we explore the inherent protection of the most significant bits (MSBs) through Gray coding in high-order modulation. Our simulations demonstrate that the proposed scheme effectively mitigates the impact of random bit errors on FL performance, achieving similar learning objectives while requiring only 50% of the air time of existing methods that rely on error correction and retransmission.
Submitted 16 April, 2024;
originally announced April 2024.
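The abstract does not spell out the masking rule, so the sketch below is one hypothetical illustration, not the authors' scheme. Assume every transmitted gradient is a float32 with magnitude below 2. Then bit 30 (the most significant exponent bit) is always 0; a channel flip of that bit turns a small gradient into a value near 1e38, and clearing it at the receiver confines every received value to (-2, 2) and rules out inf/NaN. Errors in other bits are left as approximate values, in the spirit of the abstract. The Gray-code helper shows the property used for high-order modulation: adjacent levels differ in exactly one bit.

```python
import numpy as np

# All bits set except bit 30 (top exponent bit of an IEEE 754 float32)
EXP_MSB_CLEAR = np.uint32(0xBFFFFFFF)

def flip_bit(values: np.ndarray, bit: int) -> np.ndarray:
    """Simulate a channel bit error at position `bit` of each float32 word."""
    bits = np.asarray(values, dtype=np.float32).view(np.uint32)
    return (bits ^ np.uint32(1 << bit)).view(np.float32)

def mask_received(values: np.ndarray) -> np.ndarray:
    """Hypothetical receiver-side mask: clear bit 30 so every value has |x| < 2."""
    bits = np.asarray(values, dtype=np.float32).view(np.uint32)
    return (bits & EXP_MSB_CLEAR).view(np.float32)

def gray_code(n: int) -> int:
    """Binary-reflected Gray code: consecutive integers map to codes one bit apart."""
    return n ^ (n >> 1)
```

For gradients that already satisfy the assumed bound, flipping bit 30 and then masking recovers the original values exactly; in practice the bound would come from the empirical gradient distribution the abstract mentions.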
-
MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems
Authors:
Kaixin Li,
Yuchen Tian,
Qisheng Hu,
Ziyang Luo,
Jing Ma
Abstract:
Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models have demonstrated remarkable abilities in visual reasoning and mathematical tasks, there is little work on investigating whether these models can effectively interpret visual elements for code generation. To this end, we present MMCode, the first multi-modal coding dataset for evaluating algorithmic problem-solving skills in visually rich contexts. MMCode contains 3,548 questions and 6,620 images collected from real-world programming challenges harvested from 10 code competition websites, presenting significant challenges due to the extreme demand for reasoning abilities. Our experiment results show that current state-of-the-art models struggle to solve these problems. The results highlight the lack of powerful vision-code models, and we hope MMCode can serve as an inspiration for future works in this domain. The data and code are publicly available at https://github.com/happylkx/MMCode.
Submitted 15 April, 2024;
originally announced April 2024.
-
AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning
Authors:
Yuwei Tang,
Zhenyi Lin,
Qilong Wang,
Pengfei Zhu,
Qinghua Hu
Abstract:
Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve the few-shot ability of CLIP, the key factors behind the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introduce a unified formulation to analyze CLIP-based few-shot learning methods from the perspective of logit bias, which encourages us to learn an effective logit bias for further improving the performance of CLIP-based few-shot learning methods. To this end, we disassemble the computation of logit bias into three key components (i.e., logit features, logit predictor, and logit fusion) and empirically analyze their effect on few-shot classification performance. Based on this analysis, this paper proposes a novel AMU-Tuning method to learn an effective logit bias for CLIP-based few-shot classification. Specifically, our AMU-Tuning predicts logit bias by exploiting the appropriate $\underline{\textbf{A}}$uxiliary features, which are fed into an efficient feature-initialized linear classifier with $\underline{\textbf{M}}$ulti-branch training. Finally, an $\underline{\textbf{U}}$ncertainty-based fusion is developed to incorporate logit bias into CLIP for few-shot classification. The experiments are conducted on several widely used benchmarks, and the results show that AMU-Tuning clearly outperforms its counterparts while achieving state-of-the-art performance in CLIP-based few-shot learning without bells and whistles.
Submitted 13 April, 2024;
originally announced April 2024.
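The AMU-Tuning abstract frames CLIP-based few-shot methods as adding a learned logit bias to CLIP's zero-shot logits, with the bias fused according to uncertainty. The exact fusion rule is not given in the abstract. The sketch below is a hypothetical version in which the bias is scaled by the bias predictor's confidence (one minus its normalised softmax entropy); `fuse_logit_bias` and `alpha` are illustrative names, not the paper's.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    z = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def normalized_entropy(logits: np.ndarray) -> np.ndarray:
    """Entropy of softmax(logits) divided by log(C), so it lies in [0, 1]."""
    p = np.clip(softmax(logits), 1e-12, 1.0)
    h = -np.sum(p * np.log(p), axis=-1)
    return h / np.log(logits.shape[-1])

def fuse_logit_bias(clip_logits: np.ndarray, bias_logits: np.ndarray,
                    alpha: float = 1.0) -> np.ndarray:
    """Final logits = CLIP zero-shot logits + lambda * logit bias (hypothetical rule).

    lambda = alpha * (1 - normalised entropy of the bias predictor), so an
    uncertain bias predictor contributes little and a confident one contributes more.
    """
    lam = alpha * (1.0 - normalized_entropy(bias_logits))
    return clip_logits + lam[..., None] * bias_logits
```

With a uniform bias the fused logits equal CLIP's zero-shot logits, while a sharply peaked bias can overturn CLIP's top class.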
-
Net 835-Gb/s/λ Carrier- and LO-Free 100-km Transmission Using Channel-Aware Phase Retrieval Reception
Authors:
Hanzi Huang,
Haoshuo Chen,
Qian Hu,
Di Che,
Yetian Huang,
Brian Stern,
Nicolas K. Fontaine,
Mikael Mazur,
Lauren Dallachiesa,
Roland Ryf,
Zhengxuan Li,
Yingxiong Song
Abstract:
We experimentally demonstrate the first carrier- and LO-free 800G/λ receiver enabling direct compatibility with standard coherent transmitters via phase retrieval, achieving net 835-Gb/s transmission over 100-km SMF and record 8.27-b/s/Hz net optical spectral efficiency.
Submitted 10 April, 2024;
originally announced April 2024.
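As a quick consistency check on the two reported figures, assume net optical spectral efficiency is the net bit rate divided by the occupied optical bandwidth (the abstract does not state the bandwidth or the exact definition). The figures then imply a signal bandwidth of roughly 101 GHz:

```latex
B_{\text{opt}} \approx \frac{R_{\text{net}}}{\mathrm{SE}_{\text{net}}}
  = \frac{835\ \text{Gb/s}}{8.27\ \text{b/s/Hz}}
  \approx 101\ \text{GHz}
```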
-
LHAASO-KM2A detector simulation using Geant4
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (254 additional authors not shown)
Abstract:
KM2A is one of the main sub-arrays of LHAASO, working on gamma-ray astronomy and cosmic-ray physics at energies above 10 TeV. Detector simulation is an important foundation for estimating detector performance and for data analysis. Simulating the KM2A detector in the framework of Geant4 is a big challenge because numerous photons must be tracked in a large number of detector units (>6000) spread over a large altitude difference (30 m) and a huge coverage area (1.3 km²). In this paper, the design of the KM2A simulation code G4KM2A, based on Geant4, is introduced. G4KM2A is optimized mainly for memory consumption to avoid memory overflow. Several simplifications are used to significantly speed up the execution of G4KM2A, reducing the running time by a factor of at least 30 compared with a full detector simulation. Comparisons between simulation and experimental data for the full KM2A array, covering particle distributions and core/angle resolution, are also presented and show good agreement.
Submitted 7 April, 2024;
originally announced April 2024.
-
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
Authors:
Jooyoung Lee,
Fan Yang,
Thanh Tran,
Qian Hu,
Emre Barut,
Kai-Wei Chang,
Chengwei Su
Abstract:
We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) to guide a black-box large (i.e., >10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resource-efficient in the sense that it only requires training the lightweight LM. We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals. We assess our method on two multi-hop extractive question answering (QA) benchmarks, HotpotQA and 2WikiMultiHopQA. Experimental results show that our approach outperforms all baselines in answer prediction accuracy. We also find that reinforcement learning helps the model produce higher-quality rationales with improved QA performance.
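The two-stage inference flow described in this abstract can be sketched as follows. This is a minimal illustration, assuming that both models are plain text-in, text-out callables. The class name, prompt template, and toy stand-in models are hypothetical. The training of the lightweight LM (knowledge distillation plus reinforcement learning) is not shown.

```python
from dataclasses import dataclass
from typing import Callable

# Any text-in, text-out model; stands in for either LM in this sketch.
TextModel = Callable[[str], str]


@dataclass
class LMGuidedCoT:
    rationale_model: TextModel  # lightweight LM (<1B); the only part that would be trained
    answer_model: TextModel     # frozen black-box large LM (>10B); never updated

    def predict(self, question: str, context: str) -> str:
        # Hypothetical prompt template; the paper's actual prompts are not given here.
        base = f"Context: {context}\nQuestion: {question}\n"
        # Stage 1: the lightweight LM writes a rationale for this input instance.
        rationale = self.rationale_model(base + "Rationale:")
        # Stage 2: the frozen large LM predicts the answer conditioned on that rationale.
        return self.answer_model(base + f"Rationale: {rationale}\nAnswer:")


# Toy stand-ins for real models (hypothetical behaviour, for illustration only).
def small_lm(prompt: str) -> str:
    return "The Eiffel Tower is in Paris, so the answer is Paris."


def large_lm(prompt: str) -> str:
    return "Paris" if "the answer is Paris" in prompt else "unknown"


pipeline = LMGuidedCoT(rationale_model=small_lm, answer_model=large_lm)
print(pipeline.predict("Where is the Eiffel Tower?", "The Eiffel Tower is in Paris."))
```

Because the large LM is only ever called, never updated, this design keeps training cost limited to the small rationale generator. Swapping in an empty rationale makes the toy large LM fall back to `"unknown"`, which mirrors the claim that rationale quality drives answer quality.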
Submitted 4 April, 2024;
originally announced April 2024.