-
SegPoint: Segment Any Point Cloud via Large Language Model
Authors:
Shuting He,
Henghui Ding,
Xudong Jiang,
Bihan Wen
Abstract:
Despite significant progress in 3D point cloud segmentation, existing methods primarily address specific tasks and depend on explicit instructions to identify targets, lacking the capability to infer and understand implicit user intentions in a unified framework. In this work, we propose a model, called SegPoint, that leverages the reasoning capabilities of a multi-modal Large Language Model (LLM) to produce point-wise segmentation masks across a diverse range of tasks: 1) 3D instruction segmentation, 2) 3D referring segmentation, 3) 3D semantic segmentation, and 4) 3D open-vocabulary semantic segmentation. To advance 3D instruction research, we introduce a new benchmark, Instruct3D, designed to evaluate segmentation performance from complex and implicit instructional texts, featuring 2,565 point cloud-instruction pairs. Our experimental results demonstrate that SegPoint achieves competitive performance on established benchmarks such as ScanRefer for referring segmentation and ScanNet for semantic segmentation, while delivering outstanding outcomes on the Instruct3D dataset. To our knowledge, SegPoint is the first model to address these varied segmentation tasks within a single framework, achieving satisfactory performance.
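SegPoint outputs point-wise masks, so results on tasks such as referring segmentation are often summarized with per-point IoU statistics. The sketch below is a minimal illustration assuming boolean per-point masks; the mIoU and Acc@k summaries are metrics commonly reported for 3D referring segmentation, not necessarily the exact evaluation protocol used in the paper, and the function names are hypothetical.

```python
import numpy as np

def pointwise_iou(pred_mask, gt_mask):
    """IoU between two boolean per-point masks of equal length."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)

def summarize(ious, thresholds=(0.25, 0.5)):
    """Mean IoU plus the fraction of samples whose IoU is at least each threshold."""
    ious = np.asarray(ious, dtype=float)
    out = {"mIoU": float(ious.mean())}
    for t in thresholds:
        out[f"Acc@{t}"] = float((ious >= t).mean())
    return out
```

For example, a prediction covering points {0, 1, 4} against ground truth {0, 3, 4} shares 2 of 4 covered points, giving an IoU of 0.5.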
Submitted 18 July, 2024;
originally announced July 2024.
-
A millisecond pulsar position determined to 0.2 milliarcsecond precision with VLBI
Authors:
Hao Ding,
Adam T. Deller,
Paulo C. C. Freire,
Leonid Petrov
Abstract:
Precise millisecond pulsar (MSP) positions determined with very long baseline interferometry (VLBI) hold the key to building the connection between the kinematic and dynamic reference frames used by VLBI and pulsar timing, respectively. This frame connection would provide an important pathway to examining the planetary ephemerides used in pulsar timing, and could potentially enhance the sensitivity of pulsar timing arrays used to detect the stochastic gravitational-wave background in the nano-Hz regime. We aim to significantly improve VLBI-based MSP positions beyond the current $\gtrsim1\,$mas precision level by reducing the two dominant components of the positional uncertainty: the propagation-related uncertainty and the uncertainty resulting from the frequency-dependent core shifts of the reference sources. We introduce a new differential astrometry strategy that uses multiple calibrators observed at several widely separated frequencies, which we call PINPT (Phase-screen Interpolation plus frequeNcy-dePendent core shifT correction; read as "pinpoint"). The strategy allows determination of the core shifts and mitigates the impact of residual atmospheric delay. We implemented the strategy on PSR J2222-0137, an MSP well constrained astrometrically with both VLBI and pulsar timing. Using the PINPT strategy, we determined core shifts for 4 AGNs around PSR J2222-0137 and derived a VLBI-based pulsar position with uncertainties of 0.17 mas and 0.32 mas in right ascension and declination, respectively, approaching the uncertainty level of the best-determined timing-based MSP positions. The realization of the PINPT strategy promises a factor-of-5 improvement in positional precision (over conventional VLBI astrometry) for all kinds of compact radio sources observed at $\lesssim2$ GHz, including most fast radio bursts.
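The two ingredients named in PINPT can be illustrated in their simplest generic forms. This is a minimal sketch, not the authors' pipeline: it assumes (1) a core shift that scales as $\nu^{-1/k_r}$ with $k_r = 1$, a conventional equipartition assumption, so the position at infinite frequency can be extrapolated by a linear fit in $\nu^{-1}$, and (2) a planar phase screen fitted through calibrator offsets and evaluated at the target. Offsets are treated as one coordinate at a time, and all function names are hypothetical.

```python
import numpy as np

def fit_core_shift(freqs_ghz, offsets_mas, k_r=1.0):
    """Fit offset(nu) = r_inf + A * nu**(-1/k_r) by linear least squares.

    Returns (r_inf, A): r_inf is the extrapolated position at infinite
    frequency (mas), A the core-shift amplitude.
    """
    nu = np.asarray(freqs_ghz, dtype=float)
    y = np.asarray(offsets_mas, dtype=float)
    design = np.column_stack([np.ones_like(nu), nu ** (-1.0 / k_r)])
    (r_inf, amp), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(r_inf), float(amp)

def interpolate_plane(cal_xy, cal_values, target_xy):
    """Fit value = c0 + c1*x + c2*y through calibrator values; evaluate at the target."""
    xy = np.asarray(cal_xy, dtype=float)
    design = np.column_stack([np.ones(len(xy)), xy[:, 0], xy[:, 1]])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(cal_values, dtype=float), rcond=None)
    tx, ty = target_xy
    return float(coeffs[0] + coeffs[1] * tx + coeffs[2] * ty)
```

With more calibrators than plane coefficients, the least-squares fit also averages down per-calibrator noise, which is one reason to observe several calibrators around the target.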
Submitted 18 July, 2024;
originally announced July 2024.
-
SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
Authors:
Hao Ding,
Tuxun Lu,
Yuqian Zhang,
Ruixing Liang,
Hongchao Shu,
Lalithkumar Seenivasan,
Yonghao Long,
Qi Dou,
Cong Gao,
Mathias Unberath
Abstract:
Accurate segmentation of tools in robot-assisted surgery is critical for machine perception, as it facilitates numerous downstream tasks, including augmented reality feedback. While current feed-forward neural network-based methods exhibit excellent segmentation performance under ideal conditions, these models have proven susceptible to even minor corruptions, which significantly impair their performance. This vulnerability is especially problematic in surgical settings, where predictions might be used to inform high-stakes decisions. To better understand model behavior under non-adversarial corruptions, prior work has explored introducing artificial corruptions, such as Gaussian noise or contrast perturbation, into test-set images to assess model robustness. However, these corruptions are either not photo-realistic or are model- and task-agnostic. Consequently, these investigations provide limited insight into model deterioration under realistic surgical corruptions. To address this limitation, we introduce the SegSTRONG-C challenge, which aims to promote the development of algorithms robust to unforeseen but plausible surgical image corruptions, such as smoke, bleeding, and low brightness. We collect and release corruption-free mock endoscopic video sequences on which challenge participants train their algorithms, and we benchmark the algorithms on video sequences with photo-realistic non-adversarial corruptions for a binary robot tool segmentation task. This new benchmark will allow us to carefully study neural network robustness to non-adversarial surgical corruptions, constituting an important first step towards more robust models for surgical computer vision. In this paper, we describe the data collection and annotation protocol, baseline evaluations of established segmentation models, and data augmentation-based techniques to enhance model robustness.
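As a generic illustration of a data augmentation aimed at one of the listed corruptions (low brightness), the sketch below darkens an image with a random gamma and adds Gaussian noise. It is a hypothetical example, not one of the techniques evaluated in the paper, and it is exactly the kind of simple synthetic corruption the abstract notes is not photo-realistic; smoke and bleeding are not modeled. The parameter ranges are assumptions.

```python
import numpy as np

def low_light_augment(image, rng, gamma_range=(1.5, 3.0), noise_std=0.02):
    """Darken an RGB image with values in [0, 1] using a random gamma, then add noise.

    gamma > 1 pushes mid-tones towards black; the result is clipped to [0, 1].
    """
    img = np.asarray(image, dtype=float)
    gamma = rng.uniform(*gamma_range)
    darkened = img ** gamma
    noisy = darkened + rng.normal(0.0, noise_std, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)
```

Applied at training time to corruption-free frames, e.g. `low_light_augment(frame, np.random.default_rng(0))`, such an augmentation exposes a model to darker inputs without changing the tool masks.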
Submitted 16 July, 2024;
originally announced July 2024.
-
Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images
Authors:
Tianyu Luan,
Zhongpai Gao,
Luyuan Xie,
Abhishek Sharma,
Hao Ding,
Benjamin Planche,
Meng Zheng,
Ange Lou,
Terrence Chen,
Junsong Yuan,
Ziyan Wu
Abstract:
We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruction. To overcome this limitation, our method employs a "Divide and Fuse (D&F)" strategy, reconstructing human body parts independently before fusing them, thereby ensuring robustness against occlusions. We design Human Part Parametric Models (HPPM) that independently reconstruct the mesh from a few shape and global-location parameters, without inter-part dependency. A specially designed fusion module then seamlessly integrates the reconstructed parts, even when only a few are visible. We harness a large volume of ground-truth SMPL data to train our parametric mesh models. To facilitate the training and evaluation of our method, we have established benchmark datasets featuring images of partially visible humans with HPPM annotations. Our experiments, conducted on these benchmark datasets, demonstrate the effectiveness of our D&F method, particularly in scenarios with substantial invisibility, where traditional approaches struggle to maintain reconstruction quality.
Submitted 12 July, 2024;
originally announced July 2024.
-
Strangeness-Correlations on the pseudo-critical line in (2+1)-flavor QCD
Authors:
D. Bollweg,
H. -T. Ding,
J. Goswami,
F. Karsch,
Swagato Mukherjee,
P. Petreczky,
C. Schmidt
Abstract:
We present lattice QCD results on the first ($χ_1^i$) and second ($χ_2^i$) cumulants of, and correlations ($χ_{11}^{ij}$) among, net baryon number ($B$), strangeness ($S$), and electric charge ($Q$) along the pseudo-critical line ($T_{pc}(μ_B)$) in the temperature ($T$)--baryon chemical potential ($μ_B$) phase diagram of (2+1)-flavor QCD. We point out that violations of the isospin-symmetric limit of vanishing electric charge chemical potential are small along the $T_{pc}(μ_B)$ line for the entire range of $μ_B$ covered in the RHIC beam energy scan. For the strangeness-neutral matter produced in heavy-ion collisions, this leads to a close relation between $χ_{11}^{BS}$ and $χ_{11}^{QS}$. We compare lattice QCD results for $χ_{11}^{BS}/χ_2^S$ along the $T_{pc}(μ_B)$ line with preliminary experimental measurements of $χ_{11}^{BS}/χ_2^S$ for collision energies $7.7~{\rm GeV}\le \sqrt{s_{_{NN}}}\le 62.4~{\rm GeV}$. While we find good agreement for $\sqrt{s_{_{NN}}}\ge 39$ GeV, the differences are sizeable at smaller values of $\sqrt{s_{_{NN}}}$. Moreover, we compare lattice QCD results for the ratio of the strangeness ($μ_S$) to baryon ($μ_B$) chemical potentials, which define a strangeness-neutral system with a fixed ratio of electric charge to baryon number density, with experimental results obtained by the STAR collaboration for $μ_S/μ_B$ using strange baryon yields on the freeze-out line. Finally, we determine the baryon chemical potential at freeze-out ($μ_B^f$) by comparing $χ_1^B/χ_2^B$ along the $T_{pc}(μ_B)$ line with the experimentally measured net-proton cumulant ratio $χ_1^p/χ_2^p$. We find that $\{μ_B^f, T_{pc}(μ_B^f) \}$ are consistent with the freeze-out parameters of statistical-model fits to experimentally measured hadron yields for $\sqrt{s_{_{NN}}} \geq 11.5$ GeV.
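One textbook route to a relation of this kind, sketched here under stated assumptions and not necessarily the derivation used in the paper, starts from the Gell-Mann–Nishijima relation $Q = I_3 + (B+S)/2$, which holds for the $u$, $d$, $s$ quarks (with $S=-1$ for the $s$ quark). Because cumulants are linear in the conserved charges, it carries over to the correlations. At $μ_Q = 0$ with degenerate light quarks, as in (2+1)-flavor QCD, the $u \leftrightarrow d$ symmetry makes the $I_3$–$S$ correlation vanish:

```latex
$$Q = I_3 + \tfrac{1}{2}(B + S)
\quad\Longrightarrow\quad
\chi_{11}^{QS} = \chi_{11}^{I_3 S} + \tfrac{1}{2}\left(\chi_{11}^{BS} + \chi_2^{S}\right),$$
$$\mu_Q = 0,\; m_u = m_d \;\Rightarrow\; \chi_{11}^{I_3 S} = 0
\quad\Longrightarrow\quad
\frac{\chi_{11}^{QS}}{\chi_2^{S}} = \frac{1}{2}\left(1 + \frac{\chi_{11}^{BS}}{\chi_2^{S}}\right).$$
```

As a check, in a free strange-quark gas $\chi_{11}^{BS}/\chi_2^S = -1/3$, and the formula gives $\chi_{11}^{QS}/\chi_2^S = 1/3$, matching the $s$-quark product $Q\cdot S = (-1/3)(-1)$.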
Submitted 12 July, 2024;
originally announced July 2024.
-
Reproducing Kernel Hilbert Space Approach to Non-Markovian Quantum Stochastic Models
Authors:
John E. Gough,
Haijin Ding,
Nina H. Amini
Abstract:
We give a derivation of the non-Markovian quantum state diffusion equation of Diósi and Strunz starting from a model of a quantum mechanical system coupled to a bosonic bath. We show that the complex trajectories arise as a consequence of using the Bargmann-Segal (complex wave) representation of the bath. In particular, we construct a reproducing kernel Hilbert space for the bath auto-correlation and realize the space of complex trajectories as a Hilbert subspace. The reproducing kernel arises naturally from a feature space, namely the one-particle Hilbert space of the bath quanta. We exploit this to derive the unravelling of the open quantum system dynamics and show its equivalence to the equation of Diósi and Strunz. We also give an explicit expression for the reduced dynamics of a two-level system coupled to the bath via a Jaynes-Cummings interaction and show that this indeed corresponds to an exact solution of the Diósi-Strunz equation. Finally, we discuss the physical interpretation of the complex trajectories and show that they are intrinsically unobservable.
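The feature-space view of the kernel can be checked numerically: writing the bath auto-correlation as $\alpha(t-s) = \int J(\omega)\,e^{-i\omega(t-s)}\,d\omega = \langle \phi_s, \phi_t \rangle$ with feature vectors $\phi_t(\omega) = \sqrt{J(\omega)}\,e^{-i\omega t}$ in the one-particle space makes every Gram matrix Hermitian and positive semidefinite. This is a minimal sketch assuming a Gaussian spectral density $J$, chosen only because its correlation function has the closed form $\alpha(\tau) = e^{-\tau^2/2}$; the frequency grid and names are illustrative, not the paper's construction.

```python
import numpy as np

# Illustrative assumption: Gaussian spectral density on a discretized frequency grid.
OMEGA = np.linspace(-12.0, 12.0, 4001)
D_OMEGA = OMEGA[1] - OMEGA[0]
J = np.exp(-OMEGA**2 / 2) / np.sqrt(2 * np.pi)

def feature(t):
    """Feature vector of time t in the discretized one-particle space L^2(J d omega)."""
    return np.sqrt(J * D_OMEGA) * np.exp(-1j * OMEGA * t)

def alpha(tau):
    """Closed-form bath auto-correlation for this J."""
    return np.exp(-np.asarray(tau, dtype=float) ** 2 / 2)

def gram(times):
    """Kernel matrix K[i, j] = <phi(t_i), phi(t_j)>, approximating alpha(t_j - t_i)."""
    feats = np.array([feature(t) for t in times])
    return feats.conj() @ feats.T
```

Because `gram` is built as an inner product of feature vectors, its eigenvalues are non-negative up to rounding, which is the property that makes $\alpha$ a reproducing kernel.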
Submitted 9 July, 2024;
originally announced July 2024.
-
PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer
Authors:
Qian Feng,
Hanbin Zhao,
Chao Zhang,
Jiahua Dong,
Henghui Ding,
Yu-Gang Jiang,
Hui Qian
Abstract:
Incremental Learning (IL) aims to learn deep models continually on sequential tasks, where each new task includes a batch of new classes and the models have no access to task-ID information at inference time. Recent large pre-trained models (PTMs) have achieved outstanding performance using prompting techniques in practical IL without old samples (rehearsal-free) and under a memory constraint (memory-constrained). These methods fall into two groups: prompt-extending and prompt-fixed methods. However, prompt-extending methods need a large memory buffer to maintain an ever-expanding prompt pool and face an additional, challenging prompt-selection problem. Prompt-fixed methods learn only a single set of prompts on one of the incremental tasks and cannot handle all the incremental tasks effectively. To achieve a good balance between memory cost and performance on all tasks, we propose a Parameter-Efficient Cross-Task Prompt (PECTP) framework with a Prompt Retention Module (PRM) and a classifier Head Retention Module (HRM). To make the final learned prompts effective on all incremental tasks, PRM constrains the evolution of the cross-task prompts' parameters at both Outer Prompt Granularity and Inner Prompt Granularity. In addition, we employ HRM to inherit old knowledge from the previously learned classifier heads to improve the generalization ability of the cross-task prompts. Extensive experiments show the effectiveness of our method. The source code will be available at https://github.com/RAIAN08/PECTP.
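The abstract does not specify how PRM constrains the evolution of prompt parameters. Purely as a hypothetical illustration of what constraining drift at two granularities could mean, the sketch below penalizes the change of prompts relative to their values after the previous task: a coarse term on each prompt's mean token and a fine term token by token. This is not PECTP's PRM; the penalty form, weights, and names are assumptions.

```python
import numpy as np

def prompt_drift_penalty(prompts, old_prompts, w_outer=1.0, w_inner=1.0):
    """Hypothetical two-granularity drift penalty.

    prompts, old_prompts: arrays of shape (num_prompts, num_tokens, dim).
    outer: squared change of each prompt's mean token (prompt-level constraint).
    inner: squared change of every individual token (token-level constraint).
    """
    p = np.asarray(prompts, dtype=float)
    q = np.asarray(old_prompts, dtype=float)
    outer = np.sum((p.mean(axis=1) - q.mean(axis=1)) ** 2)
    inner = np.sum((p - q) ** 2)
    return float(w_outer * outer + w_inner * inner)
```

The coarse term alone ignores changes that rearrange tokens while preserving their mean; the fine term catches them, which is one reason to combine both levels.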
Submitted 4 July, 2024;
originally announced July 2024.
-
Three-dimensional Imaging of Pion using Lattice QCD: Generalized Parton Distributions
Authors:
Heng-Tong Ding,
Xiang Gao,
Swagato Mukherjee,
Peter Petreczky,
Qi Shi,
Sergey Syritsyn,
Yong Zhao
Abstract:
In this work, we report a lattice calculation of $x$-dependent valence pion generalized parton distributions (GPDs) at zero skewness with multiple values of the momentum transfer $-t$. The calculations are based on an $N_f=2+1$ gauge ensemble of highly improved staggered quarks with Wilson-Clover valence fermions. The lattice spacing is 0.04 fm, and the pion valence mass is tuned to 300 MeV. We determine the Lorentz-invariant amplitudes of the quasi-GPD matrix elements in both symmetric and asymmetric frames, with similar values of the momentum transfer, and show that the two frames are equivalent. Then, focusing on the asymmetric frame, we use a hybrid scheme to renormalize the quasi-GPD matrix elements obtained from the lattice calculations. After Fourier transformation, the quasi-GPDs are matched to the light-cone GPDs within the framework of large-momentum effective theory with improved matching, including the next-to-next-to-leading-order perturbative corrections and the leading-renormalon and renormalization-group resummations. We also present a 3-dimensional image of the pion in impact-parameter space, obtained through the Fourier transform over the momentum transfer $-t$.
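At zero skewness, the impact-parameter density at fixed $x$ is the 2D Fourier transform $q(b) = \int \frac{d^2\Delta}{(2\pi)^2} e^{-i\mathbf{b}\cdot\mathbf{\Delta}} H(t=-\Delta^2)$, which reduces to a Hankel transform for a radially symmetric $t$-dependence. This is a minimal sketch under assumptions: a toy Gaussian $t$-profile $H = e^{Bt}$ (so the result can be checked against the closed form $e^{-b^2/4B}/(4\pi B)$), NumPy-only quadrature, and a normalization convention that may differ from the paper's.

```python
import numpy as np

def j0(z, n_theta=256):
    """Bessel J0 via J0(z) = (1/pi) * int_0^pi cos(z sin theta) d theta (midpoint rule)."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    z = np.asarray(z, dtype=float)
    return np.cos(np.multiply.outer(z, np.sin(theta))).mean(axis=-1)

def impact_parameter_density(h_of_t, b, delta_max=10.0, n_delta=4000):
    """q(b) = (1/2pi) * int_0^inf dDelta Delta J0(b Delta) H(-Delta^2),
    evaluated with the midpoint rule on [0, delta_max]."""
    step = delta_max / n_delta
    delta = (np.arange(n_delta) + 0.5) * step
    integrand = delta * j0(b * delta) * h_of_t(-delta**2)
    return float(integrand.sum() * step / (2 * np.pi))
```

With $B$ in GeV$^{-2}$, $b$ comes out in GeV$^{-1}$ and can be converted to fm with $\hbar c \approx 0.1973$ GeV fm. A steeper $t$-falloff (larger $B$) gives a wider distribution in $b$.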
Submitted 3 July, 2024;
originally announced July 2024.
-
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results
Authors:
Henghui Ding,
Chang Liu,
Yunchao Wei,
Nikhila Ravi,
Shuting He,
Song Bai,
Philip Torr,
Deshui Miao,
Xin Li,
Zhenyu He,
Yaowei Wang,
Ming-Hsuan Yang,
Zhensong Xu,
Jiangtao Yao,
Chengjing Wu,
Ting Liu,
Luoqi Liu,
Xinyu Liu,
Jing Zhang,
Kexin Zhang,
Yuting Yang,
Licheng Jiao,
Shuyuan Yang,
Mingqi Gao,
Jingnan Luo
, et al. (12 additional authors not shown)
Abstract:
The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: the Complex Video Object Segmentation Track, based on the MOSE dataset, and the Motion Expression guided Video Segmentation Track, based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide MeViS, a new motion-expression-guided video segmentation dataset, to study natural-language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios. The MOSE challenge had 140 registered teams in total; 65 teams participated in the validation phase and 12 teams made valid submissions in the final challenge phase. The MeViS challenge had 225 registered teams in total; 50 teams participated in the validation phase and 5 teams made valid submissions in the final challenge phase.
Submitted 24 June, 2024;
originally announced June 2024.
-
Preliminary Design of a General Electronics Platform for Accelerator Facilities
Authors:
Jinfu Zhu,
Hongli Ding,
Haokui Li,
Qiaoye Ran,
Xiwen Dai,
Wei Li,
Jiawei Han,
Yue Li,
Zhiyuan Zhang,
Weixin Qiu,
Weiqing Zhang
Abstract:
Many accelerators require extensive electronic systems for testing, verification, and operation. For the Shenzhen Superconducting Soft X-ray Free Electron Laser (S3FEL), to support early testing and verification of various systems, reduce development costs, and improve the reusability of hardware, firmware, and software, we have considered the needs of each system and carried out a preliminary design of a general electronics platform based on MicroTCA.4. The Advanced Mezzanine Card (AMC) will carry an FPGA Mezzanine Card (FMC) supporting 500 MSPS to 2 GSPS ADCs/DACs. We will design two FMC cards on the Rear Transition Module (RTM), which can be used for analog signal conditioning and waveform digitization with 10 MSPS to 250 MSPS ADCs/DACs, or for motor control. A commercial MCH, CPU, power module, and MTCA crate are deployed. This platform can also be applied to other accelerator facilities.
Submitted 11 May, 2024;
originally announced June 2024.
-
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models
Authors:
Xincheng Shuai,
Henghui Ding,
Xingjun Ma,
Rongcheng Tu,
Yu-Gang Jiang,
Dacheng Tao
Abstract:
Image editing aims to modify a given synthetic or real image to meet specific user requirements. It has been widely studied in recent years as a promising and challenging field of Artificial Intelligence Generative Content (AIGC). Recent significant advances in this field build on the development of text-to-image (T2I) diffusion models, which generate images according to text prompts. These models demonstrate remarkable generative capabilities and have become widely used tools for image editing. T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs. In this survey, we provide a comprehensive review of multimodal-guided image editing techniques that leverage T2I diffusion models. First, we define the scope of image editing from a holistic perspective and detail various control signals and editing scenarios. We then propose a unified framework to formalize the editing process, categorizing it into two primary algorithm families. This framework offers a design space for users to achieve specific goals. Subsequently, we present an in-depth analysis of each component within this framework, examining the characteristics and applicable scenarios of different combinations. Given that training-based methods learn to directly map the source image to the target image under user guidance, we discuss them separately and introduce schemes for injecting the source image in different scenarios. Additionally, we review the application of 2D techniques to video editing, highlighting solutions for inter-frame inconsistency. Finally, we discuss open challenges in the field and suggest potential future research directions. We keep track of related work at https://github.com/xinchengshuai/Awesome-Image-Editing.
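A common way for training-free T2I editing to start from a real image is to partially noise it (SDEdit-style) and then denoise under the new text condition. The sketch below shows only the forward noising step $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$; it assumes a DDPM-style linear $\beta$ schedule (1e-4 to 0.02 over 1000 steps), and it is one illustrative injection scheme rather than the survey's unified framework. The denoising sampler is omitted and the function names are hypothetical.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product alpha_bar_t of a DDPM-style linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_source_image(x0, strength, alpha_bar, rng):
    """Forward-diffuse a source image to step t = round(strength * (T - 1)).

    Returns (x_t, t). A text-conditioned sampler (not shown) would then
    denoise x_t from step t down to 0.
    """
    t = int(round(strength * (len(alpha_bar) - 1)))
    eps = rng.standard_normal(np.shape(x0))
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, t
```

The `strength` knob trades fidelity for editability: small values keep the source layout nearly intact, while values near 1 leave little of the source image beyond noise.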
Submitted 20 June, 2024;
originally announced June 2024.
-
The expressway network design problem for multiple urban subregions based on the macroscopic fundamental diagram
Authors:
Yunran Di,
Weihua Zhang,
Haotian Shi,
Heng Ding,
Jinbiao Huo,
Bin Ran
Abstract:
As urbanization advances, cities are expanding, leading to a more decentralized urban structure and longer average commuting times. The construction of an urban expressway system emerges as a critical strategy to tackle this challenge. However, the traditional link-level network design method faces modeling and solution challenges when dealing with the large-scale expressway network design problem (ENDP). To address these challenges, this paper proposes an expressway network design method for multiple urban subregions based on the macroscopic fundamental diagram (MFD). First, a mixed road network traffic model that describes the traffic dynamics of multiple subregions and candidate expressways is developed by integrating the MFD and the cell transmission model (CTM). Then, treating urban subregions and candidate expressways as route nodes in the mixed road network, a route choice model is established based on stochastic user equilibrium. Finally, a decision model for the ENDP is proposed to minimize vehicle travel time under a construction budget constraint. In a case study, the impacts of financial investment and of traffic demand on the expressway network design schemes are explored separately. The simulation results indicate that during the initial stages of expressway planning, the construction of new expressways can significantly alleviate traffic congestion. However, as the expressway network expands further, the benefit of new expressway construction for traffic conditions gradually diminishes if traffic demand does not continue to increase. Additionally, variations in traffic demand between subregions result in different construction schemes, emphasizing the importance of adjusting budget allocations to specific traffic demands.
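The route choice step rests on stochastic user equilibrium (SUE). As a generic sketch, not the paper's model, the code below computes a logit SUE over parallel routes with the method of successive averages (MSA). It assumes a logit choice model with dispersion `theta`, the classic BPR travel-time function (0.15, 4), and a single origin-destination pair; the paper's mixed MFD–CTM network and subregion route nodes are not modeled.

```python
import numpy as np

def bpr_cost(flow, free_time, capacity, a=0.15, b=4.0):
    """BPR travel time: free_time * (1 + a * (flow / capacity) ** b)."""
    return free_time * (1.0 + a * (flow / capacity) ** b)

def logit_sue_msa(demand, free_time, capacity, theta=0.2, iters=1000):
    """Logit SUE flows on parallel routes, solved with the method of successive averages."""
    free_time = np.asarray(free_time, dtype=float)
    capacity = np.asarray(capacity, dtype=float)

    def logit_flows(cost):
        util = -theta * cost
        p = np.exp(util - util.max())  # shift utilities for numerical stability
        return demand * p / p.sum()

    flow = logit_flows(free_time)  # start from free-flow choice probabilities
    for n in range(1, iters + 1):
        target = logit_flows(bpr_cost(flow, free_time, capacity))
        flow = flow + (target - flow) / (n + 1)  # MSA step size 1/(n+1)
    return flow
```

For example, `logit_sue_msa(1500.0, [10.0, 12.0], [1000.0, 800.0])` splits 1500 vehicles so that the flows reproduce the logit shares of their own congested travel times. The averaging step keeps the total equal to the demand at every iteration.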
Submitted 12 June, 2024;
originally announced June 2024.
-
VLBA Astrometry of the Fastest-spinning Magnetar Swift J1818.0-1607: A Large Trigonometric Distance & A Small Transverse Velocity
Authors:
Hao Ding,
Marcus E. Lower,
Adam T. Deller,
Ryan M. Shannon,
Fernando Camilo,
John Sarkissian
Abstract:
In addition to being the most magnetic objects in the known universe, magnetars are the only objects observed to generate fast-radio-burst-like emission. The formation mechanism of magnetars is still highly debated, and may potentially be probed with the magnetar velocity distribution. We carried out a 3-year astrometric campaign on Swift J1818.0-1607, the fastest-spinning magnetar, using the Very Long Baseline Array. After applying the phase-calibrating 1D interpolation strategy, we obtained a small proper-motion magnitude of 8.5 $\mathrm{mas~yr^{-1}}$ and a parallax of $0.12\pm0.02$ mas (uncertainties at $1\,σ$ confidence throughout the Letter) for Swift J1818.0-1607. The latter is the second magnetar parallax, and is among the smallest neutron star parallaxes ever determined. From the parallax, we derived a distance of $9.4^{+2.0}_{-1.6}$ kpc, which places Swift J1818.0-1607 on the far side of the Galactic central region. Combined with the distance, the small proper motion leads to a transverse peculiar velocity $v_\perp=48^{+50}_{-16}$ $\mathrm{km~s^{-1}}$, a new lower limit to magnetar $v_\perp$. Incorporating previous $v_\perp$ estimates of seven other magnetars, we obtained $v_\perp=149^{+132}_{-68}$ $\mathrm{km~s^{-1}}$ for the sample of astrometrically studied magnetars, corresponding to a three-dimensional space velocity of $\sim190^{+168}_{-87}$ $\mathrm{km~s^{-1}}$, smaller than the average for young pulsars. Additionally, we found that the magnetar velocity sample does not follow the unimodal young pulsar velocity distribution reported by Hobbs et al. at $>2\,σ$ confidence, while loosely agreeing with more recent bimodal young pulsar velocity distributions derived from relatively small samples of high-quality astrometric determinations.
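The basic conversions behind such numbers are $d \approx 1/\varpi$ and $v = 4.74047\,\mu\,d$ (km s$^{-1}$ for $\mu$ in mas yr$^{-1}$ and $d$ in kpc, the constant being 1 AU/yr in km/s). This minimal sketch shows only these textbook relations. It does not reproduce the authors' distance inference, which gives 9.4 kpc rather than the naive $1/0.12 \approx 8.3$ kpc, nor the removal of Galactic rotation and solar motion needed for a peculiar velocity. The function names are illustrative.

```python
K_KMS = 4.74047  # km/s per (mas/yr * kpc), i.e. 1 AU/yr expressed in km/s

def naive_distance_kpc(parallax_mas):
    """Inverse-parallax distance; ignores priors and bias corrections."""
    return 1.0 / parallax_mas

def apparent_transverse_velocity(pm_mas_per_yr, distance_kpc):
    """Heliocentric transverse velocity in km/s: v = 4.74047 * mu * d."""
    return K_KMS * pm_mas_per_yr * distance_kpc
```

With the quoted 8.5 mas yr$^{-1}$ at 9.4 kpc, the heliocentric transverse velocity is about 379 km s$^{-1}$, far above the quoted peculiar $v_\perp \approx 48$ km s$^{-1}$. The gap is presumably dominated by the motions that a peculiar-velocity calculation removes for a source on the far side of the Galactic centre.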
Submitted 7 June, 2024;
originally announced June 2024.
-
Sensing, Communication, and Control Co-design for Energy Efficient Satellite-UAV Networks
Authors:
Tianhao Liang,
Huahao Ding,
Yuqi Ping,
Bin Cao,
Tingting Zhang,
Qinyu Zhang
Abstract:
Traditional terrestrial communication infrastructures often fail to collect timely information from Internet of Things (IoT) devices in remote areas. To address this challenge, we investigate a satellite-unmanned aerial vehicle (UAV) integrated non-terrestrial network (NTN), in which the UAV is controlled by a remote control center via UAV-to-satellite connections. To maximize the energy efficiency (EE) of the UAV, we optimize the UAV trajectory, power allocation, and state-sensing strategy while guaranteeing control stability and communication reliability. We address this challenging problem with an efficient algorithm that combines Deep Q-Network (DQN)-based trajectory determination, closed-form power allocation, and a one-dimensional search for the sensing strategy. Numerical simulations validate the effectiveness of our approach. The results show that the size of the collected data has a greater impact than the transmission power, and reveal the relationship among the sensing interval, the maximum communication power, and control performance. This study provides promising solutions and valuable insights for efficient data collection in remote IoT.
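The trajectory component is DQN-based. Whatever the state and reward design, a DQN update regresses $Q(s,a)$ onto the Bellman target $y = r + \gamma \max_{a'} Q_{\text{target}}(s', a')$. This sketch shows only that target computation for a batch. The network, replay buffer, and the UAV-specific states, actions, and EE-based rewards are assumptions left out here, not details taken from the paper.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a') for a batch.

    rewards: (batch,); next_q_target: (batch, num_actions) from a target
    network; dones: (batch,) booleans. Terminal transitions do not bootstrap.
    """
    rewards = np.asarray(rewards, dtype=float)
    best_next = np.asarray(next_q_target, dtype=float).max(axis=1)
    not_done = 1.0 - np.asarray(dones, dtype=float)
    return rewards + gamma * not_done * best_next
```

Using a separate, periodically synchronized target network for `next_q_target` is the standard way to keep these regression targets from chasing the network being trained.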
Submitted 3 June, 2024;
originally announced June 2024.
-
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
Authors:
Chaoyang Wang,
Xiangtai Li,
Lu Qi,
Henghui Ding,
Yunhai Tong,
Ming-Hsuan Yang
Abstract:
Semantic segmentation and semantic image synthesis are two representative tasks in visual perception and generation. While existing methods treat them as two distinct tasks, we propose a unified diffusion-based framework (SemFlow) and model them as a pair of reverse problems. Specifically, motivated by rectified flow theory, we train an ordinary differential equation (ODE) model to transport between the distributions of real images and semantic masks. As the training objective is symmetric, samples belonging to the two distributions, images and semantic masks, can be transported reversibly in either direction. For semantic segmentation, our approach resolves the contradiction between the randomness of diffusion outputs and the uniqueness of segmentation results. For image synthesis, we propose a finite perturbation approach to enhance the diversity of generated results without changing the semantic categories. Experiments show that SemFlow achieves competitive results on semantic segmentation and semantic image synthesis tasks. We hope this simple framework will motivate the community to rethink the unification of low-level and high-level vision. Project page: https://github.com/wang-chaoyang/SemFlow.
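In rectified flow, training samples lie on the straight path $x_t = (1-t)\,x_0 + t\,x_1$ with velocity target $x_1 - x_0$. The learned ODE can then be integrated forward (image to mask) or backward (mask to image). This minimal sketch shows that symmetry with a toy "oracle" velocity for a single paired sample standing in for a trained network. The oracle, the float encoding of class ids, and the function names are assumptions, not SemFlow's implementation.

```python
import numpy as np

def rf_training_pair(x0, x1, t):
    """Point on the straight path from x0 (t = 0) to x1 (t = 1), plus the
    velocity target x1 - x0, which is constant along that path."""
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0

def euler_integrate(velocity, x, t0, t1, steps=10):
    """Integrate dx/dt = velocity(x, t) from t0 to t1 with explicit Euler.
    Passing t1 < t0 integrates backward in time."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

# Toy pair: an image-like array and a mask-like array of class ids stored as floats.
rng = np.random.default_rng(0)
image = rng.normal(size=(4, 4))
mask = rng.integers(0, 3, size=(4, 4)).astype(float)

def oracle_velocity(x, t):
    """Stands in for a trained velocity network on this single pair."""
    return mask - image

seg = euler_integrate(oracle_velocity, image, 0.0, 1.0)    # image -> mask
recon = euler_integrate(oracle_velocity, mask, 1.0, 0.0)   # mask -> image
```

Because the target velocity is the same whichever endpoint is treated as the source, one trained field serves both segmentation and synthesis, which is the reversibility the abstract describes.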
Submitted 30 May, 2024;
originally announced May 2024.
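The following is a minimal sketch of the rectified-flow idea that the SemFlow abstract mentions. It is not the authors' code. Training targets are built on the straight line between a sample x0 from one distribution (e.g. an image) and its paired sample x1 from the other (e.g. a mask encoding). The learned velocity field is then integrated forward or backward in time. The function names and the constant velocity field used in the check are hypothetical stand-ins for a trained network.

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Build one training example for a rectified-flow velocity model.

    x0, x1: paired samples from the two distributions; t: scalar in [0, 1].
    Returns the interpolated point x_t and the regression target x1 - x0.
    """
    x_t = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0
    return x_t, target_velocity

def euler_integrate(velocity_fn, x, t_start, t_end, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t_start to t_end with Euler steps.

    Passing t_end < t_start integrates backwards, so one velocity field
    maps samples in both directions.
    """
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        x = x + dt * velocity_fn(x, t)
        t += dt
    return x
```

Because the same ODE is integrated in both directions, one model can in principle serve both segmentation (image to mask) and synthesis (mask to image).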
-
Delay Performance Analysis of Delay-Deterministic Wireless Networks with Infinite and Finite Blocklength Transmission
Authors:
Hanxue Ding,
Shaoyi Xu,
Ziheng Xu,
Rongtao Xu,
Zonghui Li,
Junhui Zhao
Abstract:
In order to achieve stable and reliable industrial manufacturing, wireless networks must meet the stringent communication requirements of industrial automation, particularly the need for deterministic low-latency communication. Limited wireless resources and time-varying fading channels cause random fluctuations in transmission delay, making it challenging to realize delay-deterministic wireless networks. An open challenge in this context is to model delay determinism, also known as jitter, and analyze delay performance. In this paper, we model jitter as the variance of delay and conduct a comprehensive analysis of delay performance. Specifically, we consider two transmission regimes: infinite blocklength (IBL) and finite blocklength (FBL). In the IBL regime, the distribution of the transmission delay is analyzed, and closed-form expressions for the average delay, jitter, and delay violation probability are derived. In the FBL regime, an upper bound on the transmission delay is first approximated at high signal-to-noise ratio. Based on this upper bound, the delay distribution, delay violation probability, average delay, and jitter are derived. Finally, simulation results are provided to validate the accuracy of the analysis and derivations. Additionally, the impact of system parameters on jitter is analyzed to gain further insights.
Submitted 27 May, 2024;
originally announced May 2024.
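The abstract defines jitter as the variance of the delay. This minimal sketch estimates empirically, from delay samples such as those produced by a simulation, the three quantities that the paper derives in closed form. It is not the paper's analytical method. The function name and the sample values are hypothetical.

```python
import numpy as np

def delay_statistics(delays, max_delay):
    """Empirical average delay, jitter, and delay violation probability.

    delays: per-packet transmission delays (any consistent time unit).
    max_delay: deadline; a packet violates it if its delay exceeds it.
    Jitter follows the abstract's definition: the variance of the delay.
    """
    d = np.asarray(delays, dtype=float)
    average_delay = d.mean()
    jitter = d.var()                       # population variance (ddof=0)
    violation_prob = np.mean(d > max_delay)
    return average_delay, jitter, violation_prob
```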
-
UnKE: Unstructured Knowledge Editing in Large Language Models
Authors:
Jingcheng Deng,
Zihao Wei,
Liang Pang,
Hanxing Ding,
Huawei Shen,
Xueqi Cheng
Abstract:
Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models, relying heavily on the assumption that structured knowledge is stored locally as key-value pairs in MLP layers or specific neurons. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by long-form content, noise, and a complex yet comprehensive nature. The "knowledge locating" and "term-driven optimization" techniques that previous methods (e.g., MEMIT) derive from this assumption are ill-suited for unstructured knowledge. To address these challenges, we propose a novel unstructured knowledge editing method, UnKE, which extends previous assumptions in both the layer dimension and the token dimension. First, in the layer dimension, we discard the "knowledge locating" step and treat the first few layers as the key, expanding knowledge storage across layers to break the "knowledge stored locally" assumption. Next, in the token dimension, we replace "term-driven optimization" with "cause-driven optimization" across all input tokens, directly optimizing the last layer of the key generator to produce the key vectors required for the edit. By utilizing key-value pairs at the layer level, UnKE effectively represents and edits complex and comprehensive unstructured knowledge, leveraging the potential of both the MLP and attention layers. Results on the newly proposed unstructured knowledge editing dataset (UnKEBench) and on traditional structured datasets demonstrate that UnKE achieves remarkable performance, surpassing strong baselines.
Submitted 24 May, 2024;
originally announced May 2024.
-
Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation
Authors:
Zejun Gu,
Zhong-Qiu Zhao,
Henghui Ding,
Hao Shen,
Zhao Zhang,
De-Shuang Huang
Abstract:
In practical applications of human pose estimation, low-resolution inputs occur frequently, and existing state-of-the-art models perform poorly on low-resolution images. This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model. However, applying knowledge distillation to networks with different input resolutions raises two challenges: feature size mismatch and class number mismatch. To address these issues, we propose a novel cross-domain knowledge distillation (CDKD) framework. In this framework, we construct a scale-adaptive projector ensemble (SAPE) module to spatially align feature maps between models of varying input resolutions. It adopts a projector ensemble to map low-resolution features into multiple common spaces and adaptively merges them based on multi-scale information to match high-resolution features. Additionally, we construct a cross-class alignment (CCA) module to address the mismatch in class numbers. Combined with an easy-to-hard training (ETHT) strategy, the CCA module further enhances distillation performance. The effectiveness and efficiency of our approach are demonstrated by extensive experiments on two common benchmark datasets: MPII and COCO. The code is available in the supplementary material.
Submitted 19 May, 2024;
originally announced May 2024.
-
Prototype Design of a Digital Low-level RF System for S-band Deflectors
Authors:
J. F. Zhu,
H. L. Ding,
H. K. Li,
Y. Li,
X. W. Dai,
J. W. Han,
W. Q. Zhang
Abstract:
S-band deflectors are generally operated in pulsed mode for beam diagnostics. We plan to deploy 5 S-band (2997 MHz) deflectors to accurately measure the longitudinal time distribution of ultra-short electron beam pulses in the Shenzhen Superconducting Soft X-ray Free Electron Laser (S3FEL). The microwave system of each deflector consists of a low-level RF (LLRF) system, a solid-state amplifier, waveguide couplers, and a klystron, operated in pulsed mode with a maximum repetition frequency of 50 Hz. Its microwave amplitude and phase stability must be better than 0.06% and 0.08° (RMS), respectively. This article introduces the prototype design of the hardware, firmware, and software of the digital LLRF system. The hardware design uses homemade local oscillators (LOs) and commercial cards based on the MicroTCA standard. The firmware uses non-IQ demodulation and a pulse feedforward algorithm to suppress noise from the klystron high voltage. The software design is based on the EPICS control system architecture and provides slow control and interface display functions. This report also presents some preliminary test results.
Submitted 15 May, 2024;
originally announced May 2024.
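The abstract names non-IQ demodulation without giving parameters. A minimal sketch of the standard technique follows. It assumes the ADC takes n samples spanning m intermediate-frequency (IF) periods, so that consecutive samples advance the IF phase by 2πm/n. The ratio m/n = 3/14 used in the check is a hypothetical choice, not the S3FEL firmware setting, and amplitudes are in arbitrary ADC units.

```python
import numpy as np

def non_iq_demodulate(samples, m, n):
    """Estimate amplitude and phase (degrees) of an IF signal via non-IQ sampling.

    Assumes exactly n samples covering m IF periods, with 2*m/n not an
    integer so the double-frequency terms sum to zero over the window.
    """
    x = np.asarray(samples, dtype=float)
    if x.size != n:
        raise ValueError("expected exactly n samples")
    theta = 2.0 * np.pi * m * np.arange(n) / n      # IF phase of each sample
    i_comp = 2.0 / n * np.sum(x * np.cos(theta))     # A*cos(phi)
    q_comp = -2.0 / n * np.sum(x * np.sin(theta))    # A*sin(phi)
    amplitude = np.hypot(i_comp, q_comp)
    phase_deg = np.degrees(np.arctan2(q_comp, i_comp))
    return amplitude, phase_deg
```

Repeating this over successive windows gives amplitude and phase time series. The RMS of those series is what would be compared against the 0.06%/0.08° stability requirement.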
-
Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms
Authors:
Guanlin Mo,
Shihong Song,
Hu Ding
Abstract:
DBSCAN is a popular density-based clustering algorithm with many applications in practice. However, the running time of DBSCAN in high-dimensional space or general metric space (e.g., clustering a set of texts using edit distance) can be as large as quadratic in the input size. Moreover, most existing acceleration techniques for DBSCAN are only available for low-dimensional Euclidean space. In this paper, we study the DBSCAN problem under the assumption that the inliers (the core points and border points) have a low intrinsic dimension (a realistic assumption for many high-dimensional applications), while the outliers can be located anywhere in the space without any assumption. First, we propose a $k$-center clustering based algorithm that reduces the time-consuming labeling and merging tasks of DBSCAN to linear time. Further, we propose a linear-time approximate DBSCAN algorithm, whose key idea is to build a novel small-size summary for the core points. Our algorithm can also be efficiently implemented for streaming data, and the required memory is independent of the input size. Finally, we conduct experiments comparing our algorithms with several popular DBSCAN algorithms. The experimental results suggest that our proposed approach can significantly reduce the computational complexity in practice.
Submitted 10 May, 2024;
originally announced May 2024.
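The algorithm in the abstract builds on $k$-center clustering in a general metric. A standard building block for that problem is Gonzalez's greedy farthest-first traversal, which gives a 2-approximation. It is sketched here with edit distance to mirror the abstract's text-clustering example. This is not the authors' DBSCAN algorithm, and the function names and word list are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def greedy_k_center(points, k, dist):
    """Gonzalez's farthest-first traversal for k-center in any metric space.

    Returns the chosen center indices and each point's distance to its
    nearest center; the maximum of the latter is the clustering radius.
    """
    centers = [0]                                   # arbitrary first center
    nearest = [dist(points[0], p) for p in points]
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(far)
        nearest = [min(nearest[i], dist(points[far], points[i]))
                   for i in range(len(points))]
    return centers, nearest
```

Each round needs only one pass of distance evaluations against the newest center, so k centers cost O(kn) distance calls.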
-
Cooperative Route Guidance and Flow Control for Mixed Road Networks Comprising Expressway and Arterial Network
Authors:
Yunran Di,
Haotian Shi,
Weihua Zhang,
Heng Ding,
Xiaoyan Zheng,
Bin Ran
Abstract:
Facing the congestion challenges of mixed road networks comprising expressways and arterial road networks, traditional control solutions fall short. To effectively alleviate traffic congestion in mixed road networks, it is crucial to clarify the interaction between expressways and arterial networks and achieve orderly coordination between them. This study employs the multi-class cell transmission model (CTM) combined with the macroscopic fundamental diagram (MFD) to model the traffic dynamics of expressway systems and arterial subregions, enabling vehicle path tracking across these two systems. These components are integrated into a comprehensive traffic transmission model suitable for mixed road networks. A simulation platform for the mixed road network is established in SUMO, and the average trip lengths within the model are calibrated. Based on the proposed traffic model, this study constructs a route guidance model for mixed road networks and develops an integrated model predictive control (MPC) strategy that merges route guidance, perimeter control, and ramp metering to address the traffic flow control challenges of mixed road networks. A case study in which a bidirectional expressway connects two subregions is conducted, and the results validate the effectiveness of the proposed cooperative guidance and control (CGC) method in reducing overall congestion in mixed road networks.
Submitted 9 May, 2024;
originally announced May 2024.
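The study builds on the cell transmission model (CTM). As a simplified sketch, the single-class CTM update of Daganzo is shown below. It is not the paper's multi-class CTM and it has no MFD coupling. Each cell sends min(n, Q) vehicles and can receive min(Q, (w/v)(N - n)). The flow across each boundary is the minimum of the upstream send and the downstream receive. Parameter names and values are hypothetical.

```python
def ctm_step(n, capacity, jam, wave_ratio, inflow=0.0):
    """One time step of a single-class cell transmission model.

    n: vehicles per cell; capacity: max flow per step (Q);
    jam: max vehicles a cell can hold (N); wave_ratio: w / v.
    inflow: demand offered to the first cell. The last cell discharges
    freely at up to `capacity`. Returns (new_n, vehicles_exited).
    """
    cells = len(n)
    send = [min(n[i], capacity) for i in range(cells)]
    receive = [min(capacity, wave_ratio * (jam - n[i])) for i in range(cells)]
    # flows[i] enters cell i; flows[cells] leaves the network.
    flows = [min(inflow, receive[0])]
    flows += [min(send[i - 1], receive[i]) for i in range(1, cells)]
    flows.append(send[-1])
    new_n = [n[i] + flows[i] - flows[i + 1] for i in range(cells)]
    return new_n, flows[-1]
```

Because every vehicle leaving one cell enters the next or exits, the update conserves vehicles. That bookkeeping is what allows path tracking across coupled subsystems.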
-
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Authors:
DeepSeek-AI,
Aixin Liu,
Bei Feng,
Bin Wang,
Bingxuan Wang,
Bo Liu,
Chenggang Zhao,
Chengqi Dengr,
Chong Ruan,
Damai Dai,
Daya Guo,
Dejian Yang,
Deli Chen,
Dongjie Ji,
Erhang Li,
Fangyun Lin,
Fuli Luo,
Guangbo Hao,
Guanting Chen,
Guowei Li,
H. Zhang,
Hanwei Xu,
Hao Yang,
Haowei Zhang,
Honghui Ding, et al. (132 additional authors not shown)
Abstract:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality, multi-source corpus of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
Submitted 19 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
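The DeepSeek-V2 abstract attributes economical training to sparse computation: of 236B total parameters, only 21B (about 8.9%) are activated per token. The sketch below is a generic top-k routed mixture-of-experts layer that evaluates only k experts per token. It is explicitly not DeepSeekMoE, whose details the abstract does not give. The function name, shapes, and the renormalization of the selected gate weights are assumptions.

```python
import numpy as np

def topk_moe(x, gate_w, experts, k=2):
    """Sparse mixture-of-experts layer with top-k routing (generic sketch).

    x: (tokens, d) inputs; gate_w: (d, num_experts) router weights;
    experts: list of callables mapping a (d,) vector to a (d,) vector.
    Only k experts are evaluated per token. Returns the outputs and,
    for each token, the sorted indices of the experts it used.
    """
    logits = x @ gate_w
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    probs /= probs.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    used = []
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[::-1][:k]            # k highest-scoring experts
        weights = probs[t, top] / probs[t, top].sum()   # renormalize (an assumption)
        out[t] = sum(w * experts[e](x[t]) for w, e in zip(weights, top))
        used.append(sorted(top.tolist()))
    return out, used
```

Per token, the compute scales with k rather than with the total number of experts. This is the sense in which parameters are "activated".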
-
VLBA Astrometry of the Galactic Double Neutron Stars PSR J0509+3801 and PSR J1930-1852: A Preliminary Transverse Velocity Distribution of Double Neutron Stars and Its Implications
Authors:
Hao Ding,
Adam T. Deller,
Joseph K. Swiggum,
Ryan S. Lynch,
Shami Chatterjee,
Thomas M. Tauris
Abstract:
Mergers of double neutron star (DNS) systems are believed to drive the majority of short $\gamma$-ray bursts (SGRBs), while also serving as production sites of heavy r-process elements. Despite being key to i) confirming the nature of the extragalactic SGRBs, ii) addressing the poorly understood r-process enrichment in ultra-faint dwarf galaxies (UFDGs), and iii) probing the formation process of DNS systems, the space velocity distribution of DNSs is still poorly constrained due to the small number of DNSs with well-determined astrometry. In this work, we determine new proper motions and parallaxes of two Galactic DNSs, PSR J0509+3801 and PSR J1930-1852, using the Very Long Baseline Array, and estimate in a consistent manner the transverse velocities $v_\perp$ of all 11 isolated Galactic DNSs with proper motion measurements. Our correlation analysis reveals that the DNS $v_\perp$ is tentatively correlated with three parameters: spin period, orbital eccentricity, and companion mass. With the preliminary $v_\perp$ distribution, we obtain the following findings. First, the refined $v_\perp$ distribution is confirmed to agree with the observed displacements of localized SGRBs from their host galaxy birth sites. Second, we estimate that around 11% and 25% of DNSs remain gravitationally bound to UFDGs with escape velocities of 15$\mathrm{~km~s^{-1}}$ and 25$\mathrm{~km~s^{-1}}$, respectively. Hence, the retained DNSs might indeed be responsible for the r-process enrichment confirmed so far in a few UFDGs. Finally, we discuss how a future ensemble of astrometrically determined DNSs may probe the multimodality of the $v_\perp$ distribution.
Submitted 6 May, 2024;
originally announced May 2024.
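The basic relation behind $v_\perp$ is $v_\perp \approx 4.74\,\mu D$ km/s, with the total proper motion $\mu$ in mas/yr and the distance $D = 1/\varpi$ in kpc for a parallax $\varpi$ in mas. The sketch below applies this relation and then counts how many velocities fall below an escape velocity. It omits the corrections a careful analysis such as the paper's would include (e.g. Galactic rotation, solar motion, parallax bias). It also compares only transverse speed with escape speed, which is a simplification. The function names and test values are hypothetical.

```python
import math

K = 4.740470  # km/s corresponding to 1 AU/yr, i.e. per (mas/yr * kpc)

def transverse_velocity(mu_ra_cosdec, mu_dec, parallax_mas):
    """Transverse velocity (km/s) from proper motion (mas/yr) and parallax (mas).

    Uses the naive distance D = 1/parallax in kpc; no kinematic corrections.
    """
    mu_total = math.hypot(mu_ra_cosdec, mu_dec)
    distance_kpc = 1.0 / parallax_mas
    return K * mu_total * distance_kpc

def retained_fraction(velocities, v_escape):
    """Fraction of systems whose speed is below a galaxy's escape velocity."""
    return sum(v < v_escape for v in velocities) / len(velocities)
```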
-
Trapping polar molecules by surface acoustic waves
Authors:
Haijin Ding,
Re-Bing Wu,
Yu-xi Liu
Abstract:
We propose a method to trap polar molecules using the electrical force induced by a surface acoustic wave (SAW) on piezoelectric materials. In this approach, the electrical force is perpendicular to the direction of motion of the polar molecules and is used to control the positions of the trapped polar molecules in the direction orthogonal to the acoustic propagation. With the help of an external electrical force, the SAW-induced electric field can trap the polar molecules in single-layer or multi-layer lattices. The arrangement of the molecules affects the binding energy and localization of the molecule array. The one- or two-dimensional trapped polar molecule arrays can then be used to construct the Bose-Hubbard (BH) model, whose energy and dynamics are affected by the localization of the trapped molecules. We find that the phase transitions between the superfluid and Mott insulator phases in the trapped polar molecule BH model can be modulated by the SAW-induced electrical potential.
Submitted 7 June, 2024; v1 submitted 27 April, 2024;
originally announced April 2024.
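The abstract uses trapped-molecule arrays to build a Bose-Hubbard model. The smallest instance is two bosons on two sites, which can be diagonalized exactly to show the competition between hopping J and on-site interaction U. At U = 0 the ground-state energy is -2J. At large U the ground state concentrates on the singly occupied state |1,1>, a Mott-like configuration. This sketch uses hypothetical parameters and does not model the SAW potential. A two-site system shows only a crossover, not a true phase transition.

```python
import numpy as np

def two_site_bose_hubbard(J, U):
    """Two bosons on two sites, basis |2,0>, |1,1>, |0,2>.

    H = -J (b1^dag b2 + b2^dag b1) + (U/2) sum_i n_i (n_i - 1).
    Hopping matrix elements are -sqrt(2) J; doubly occupied states cost U.
    """
    h = -np.sqrt(2.0) * J
    return np.array([[U, h, 0.0],
                     [h, 0.0, h],
                     [0.0, h, U]])

def bh_ground_state(J, U):
    """Lowest eigenvalue and eigenvector of the two-site Hamiltonian."""
    energies, vectors = np.linalg.eigh(two_site_bose_hubbard(J, U))
    return energies[0], vectors[:, 0]
```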
-
When to Trust LLMs: Aligning Confidence with Response Quality
Authors:
Shuchang Tao,
Liuyi Yao,
Hanxing Ding,
Yuexiang Xie,
Qi Cao,
Fei Sun,
Jinyang Gao,
Huawei Shen,
Bolin Ding
Abstract:
Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability through a confidence level; however, their effectiveness is limited by the lack of objective guidance. To address this, we propose the CONfidence-Quality-ORDer-preserving alignment approach (CONQORD), which leverages reinforcement learning guided by a tailored dual-component reward function. This function integrates a quality reward and an order-preserving alignment reward. Specifically, the order-preserving reward incentivizes the model to verbalize greater confidence for higher-quality responses, aligning the order of confidence with the order of quality. Experiments demonstrate that CONQORD significantly improves the alignment between confidence and response accuracy without making the model over-cautious. Furthermore, the aligned confidence provided by CONQORD indicates when to trust LLMs and acts as a determinant for initiating the retrieval of external knowledge. Aligning confidence with response quality yields more transparent and reliable responses, improving trustworthiness.
Submitted 9 June, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
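The abstract does not give the formula for CONQORD's order-preserving reward. The pairwise score below is a hypothetical illustration of the idea only. It rewards each pair of responses whose verbalized confidence is ordered the same way as their quality and penalizes each pair where the order is reversed or the confidences are tied.

```python
from itertools import combinations

def order_preserving_reward(confidences, qualities):
    """Hypothetical pairwise order-preservation score in [-1, 1].

    +1 for each pair whose confidence order matches its quality order,
    -1 for each reversed pair (equal confidences on unequal quality also
    count as -1). Pairs with equal quality are skipped; returns 0.0 if
    no pair has distinct quality.
    """
    score, pairs = 0, 0
    for i, j in combinations(range(len(qualities)), 2):
        dq = qualities[i] - qualities[j]
        if dq == 0:
            continue
        dc = confidences[i] - confidences[j]
        score += 1 if dc * dq > 0 else -1
        pairs += 1
    return score / pairs if pairs else 0.0
```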
-
Identifying non-Hermitian critical points with quantum metric
Authors:
Jun-Feng Ren,
Jing Li,
Hai-Tao Ding,
Dan-Wei Zhang
Abstract:
The geometric properties of quantum states are fully encoded by the quantum geometric tensor. The real and imaginary parts of the quantum geometric tensor are the quantum metric and Berry curvature, which characterize, respectively, the distance and phase difference between two nearby quantum states in Hilbert space. For conventional Hermitian quantum systems, the quantum metric corresponds to the fidelity susceptibility and has already been used to identify quantum phase transitions from a geometric perspective. In this work, we extend this approach to non-Hermitian systems to reveal non-Hermitian critical points. Concretely, using numerical exact diagonalization and analytical methods, we calculate the quantum metric and the corresponding order parameters in various non-Hermitian models, including two non-Hermitian generalized Aubry-André models and non-Hermitian cluster and mixed-field Ising models. We demonstrate that the quantum metric of eigenstates in these non-Hermitian models exactly identifies the localization transitions, mobility edges, and many-body quantum phase transitions, respectively. We further show that this strategy is robust against finite-size effects and different boundary conditions.
Submitted 1 May, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Approximate Algorithms For $k$-Sparse Wasserstein Barycenter With Outliers
Authors:
Qingyuan Yang,
Hu Ding
Abstract:
Wasserstein Barycenter (WB) is one of the most fundamental optimization problems in optimal transportation. Given a set of distributions, the goal of WB is to find a new distribution that minimizes the average Wasserstein distance to them. The problem becomes even harder if we restrict the solution to be "$k$-sparse". In this paper, we study the $k$-sparse WB problem in the presence of outliers, which is a more practical setting since real-world data often contain noise. Existing WB algorithms cannot be directly extended to handle outliers, so novel ideas are needed. First, we investigate the relation between $k$-sparse WB with outliers and clustering (with outliers) problems. In particular, we propose a clustering-based LP method that yields a constant approximation factor for the $k$-sparse WB with outliers problem. Further, we utilize the coreset technique to achieve a $(1+ε)$-approximation factor for any $ε>0$ if the dimensionality is not high. Finally, we conduct experiments on our proposed algorithms and illustrate their efficiency in practice.
Submitted 20 April, 2024;
originally announced April 2024.
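To make the WB objective concrete, here is the simplest special case. It is not the paper's $k$-sparse or outlier-robust algorithm. It takes one-dimensional empirical distributions with equal sample counts and uniform weights, under the squared 2-Wasserstein objective (an assumed choice of cost). In 1-D the optimal coupling matches sorted samples, so the barycenter is obtained by averaging order statistics. The function names are hypothetical.

```python
import numpy as np

def barycenter_1d(samples_list):
    """Squared-W2 barycenter of 1-D empirical distributions of equal size.

    The optimal coupling in 1-D pairs sorted samples, so the i-th support
    point of the barycenter is the mean of the i-th order statistics.
    """
    sorted_samples = np.sort(np.asarray(samples_list, dtype=float), axis=1)
    return sorted_samples.mean(axis=0)

def w2_1d(a, b):
    """W2 distance between two equal-size, uniformly weighted 1-D samples."""
    a, b = np.sort(a), np.sort(b)
    return np.sqrt(np.mean((a - b) ** 2))
```

In higher dimensions no such sorting shortcut exists. That is one reason the general and $k$-sparse problems require the clustering and coreset machinery described in the abstract.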
-
Fewer Truncations Improve Language Modeling
Authors:
Hantian Ding,
Zijian Wang,
Giovanni Paolini,
Varun Kumar,
Anoop Deoras,
Dan Roth,
Stefano Soatto
Abstract:
In large language model training, input documents are typically concatenated and then split into sequences of equal length to avoid padding tokens. Despite its efficiency, the concatenation approach compromises data integrity: it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and factually consistent content that is grounded on the complete context. To address the issue, we propose Best-fit Packing, a scalable and efficient method that packs documents into training sequences through length-aware combinatorial optimization. Our method completely eliminates unnecessary truncations while retaining the same training efficiency as concatenation. Empirical results from both text and code pre-training show that our method achieves superior performance (e.g., relatively +4.7% on reading comprehension, +16.8% in context following, and +9.2% on program synthesis) and reduces closed-domain hallucination by up to 58.3%.
Submitted 2 May, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing
Authors:
Song Xia,
Yi Yu,
Xudong Jiang,
Henghui Ding
Abstract:
Randomized Smoothing (RS) has proven to be a promising method for endowing an arbitrary image classifier with certified robustness. However, the substantial uncertainty inherent in high-dimensional isotropic Gaussian noise imposes the curse of dimensionality on RS. Specifically, the upper bound of the ${\ell_2}$ certified robustness radius provided by RS diminishes as the input dimension $d$ grows, decreasing proportionally at a rate of $1/\sqrt{d}$. This paper explores the feasibility of providing ${\ell_2}$ certified robustness for high-dimensional input through dual smoothing in a lower-dimensional space. The proposed Dual Randomized Smoothing (DRS) down-samples the input image into two sub-images and smooths the two sub-images in lower dimensions. Theoretically, we prove that DRS guarantees a tight ${\ell_2}$ certified robustness radius for the original input and show that DRS attains a superior upper bound on the ${\ell_2}$ robustness radius, which decreases proportionally at a rate of $(1/\sqrt m + 1/\sqrt n )$ with $m+n=d$. Extensive experiments demonstrate the generalizability and effectiveness of DRS, which integrates readily with established methodologies and yields substantial improvements over the RS baselines in both accuracy and ${\ell_2}$ certified robustness on the CIFAR-10 and ImageNet datasets. Code is available at https://github.com/xiasong0501/DRS.
Submitted 15 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
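For context, the standard randomized-smoothing ${\ell_2}$ radius (Cohen et al., 2019) is $R = \sigma\,\Phi^{-1}(p_A)$, where $p_A$ is a lower bound on the top-class probability under Gaussian noise. The sketch below computes it and compares the two scaling rates quoted in the abstract. It assumes $d = 3 \times 32 \times 32 = 3072$ (a CIFAR-10 image) and a hypothetical even split $m = n = 1536$. For that split the DRS rate is $2\sqrt{2} \approx 2.83$ times the RS rate. These are rates of the upper bounds, not achieved radii, and the DRS radius formula itself is not reproduced here.

```python
import math
from statistics import NormalDist

def rs_certified_radius(p_a, sigma):
    """Standard randomized-smoothing l2 radius sigma * Phi^{-1}(p_a).

    p_a: lower bound on the top-class probability, must be < 1.
    Returns 0.0 (abstain) when p_a <= 0.5.
    """
    if p_a <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a)

def bound_rates(d, m):
    """Scaling rates from the abstract: 1/sqrt(d) for RS and
    1/sqrt(m) + 1/sqrt(n), with n = d - m, for DRS."""
    n = d - m
    return 1 / math.sqrt(d), 1 / math.sqrt(m) + 1 / math.sqrt(n)
```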
-
Satellite observations reveal shorter periodic inner core oscillation
Authors:
Yachong An,
Hao Ding,
Fred D. Richards,
Weiping Jiang,
Jiancheng Li,
Wenbin Shen
Abstract:
Detecting the Earth's inner core motions relative to the mantle presents a considerable challenge because they are only indirectly accessible. Seismological observations initially provided evidence for differential rotation (super-rotation) of the inner core, but more recent studies suggest a possible periodic oscillation of about 70 years. These contrasting results underscore the ongoing enigma surrounding inner core motion, leaving debates unresolved, including the precise oscillation period. In parallel with seismic observations, satellite geodesy has accumulated decades of global high-precision records, providing a novel avenue to probe inner core motions. Here, we detect an oscillation of about 6 years in the degree-2 order-2 Stokes coefficients of the gravitational field derived from satellite observations, and find that it has a unique phase correlation with the roughly 6-year signal in the Earth's length-of-day variations. This correlation is attributed to an inner core oscillation controlled by the gravitational coupling between the inner core and lower mantle (mainly due to the density heterogeneity of the two large low-velocity provinces, LLVPs). That is, we independently corroborate the periodic oscillation of the inner core, albeit with a significantly shorter period than previously suggested. Our findings indicate a dense layer in the LLVPs (mean density anomaly of about +0.9 percent at the bottom), consistent with inversions from tidal tomography and Stoneley modes. Furthermore, our research reveals equatorial topographic undulations of about 187 m at the inner core boundary.
Submitted 14 April, 2024;
originally announced April 2024.
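Comparing the roughly 6-year signals in the Stokes coefficients and in length-of-day requires estimating each series' amplitude and phase at that period. As a hypothetical sketch (not the authors' pipeline, which would also need detrending, other periodic terms, and uncertainty estimates), a fixed-period sinusoid can be fit by linear least squares. The phases of two series fitted on the same time axis can then be compared directly.

```python
import numpy as np

def fit_sinusoid(t, y, period):
    """Least-squares fit y ~ c + a*cos(w t) + b*sin(w t), with w = 2*pi/period.

    Returns (amplitude, phase_rad) such that the periodic part of y is
    amplitude * cos(w t - phase_rad), with phase referenced to t = 0.
    """
    w = 2.0 * np.pi / period
    design = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    (c, a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.hypot(a, b), np.arctan2(b, a)
```

The phase difference between two fitted series (wrapped to [-π, π]) is the quantity whose consistency would indicate a shared source.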
-
Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection
Authors:
Litao Li,
Steven H. H. Ding,
Andrew Walenstein,
Philippe Charland,
Benjamin C. M. Fung
Abstract:
Software vulnerabilities are a challenge in cybersecurity. Manual security patches are often difficult and slow to deploy, while new vulnerabilities continue to be created. Binary code vulnerability detection is less studied and more complex than source code vulnerability detection, and this has important practical implications. Deep learning has become an efficient and powerful tool in the security domain, where it provides end-to-end and accurate prediction. Modern deep learning approaches learn program semantics through sequence and graph neural networks, using various intermediate representations of programs, such as abstract syntax trees (AST) or control flow graphs (CFG). Due to the complex nature of program execution, the output of an execution depends on many program states and inputs. Also, a CFG generated from static analysis can overestimate the true program flow. Moreover, the size of programs often prevents a graph neural network with a fixed number of layers from aggregating global information. To address these issues, we propose DeepEXE, an agent-based implicit neural network that mimics the execution path of a program. We use reinforcement learning to enhance the branching decision at every program state transition and create a dynamic environment to learn the dependency between a vulnerability and certain program states. An implicitly defined neural network enables nearly infinite state transitions until convergence, which captures structural information at a higher level. The experiments are conducted on two semi-synthetic and two real-world datasets. We show that DeepEXE is an accurate and efficient method that outperforms state-of-the-art vulnerability detection methods.
Submitted 3 April, 2024;
originally announced April 2024.
-
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
Authors:
Zihao Wei,
Jingcheng Deng,
Liang Pang,
Hanxing Ding,
Huawei Shen,
Xueqi Cheng
Abstract:
The widespread use of large language models (LLMs) underscores the need for accurate and up-to-date knowledge embedded in their parameters. Existing research on knowledge editing focuses primarily on monolingual scenarios, neglecting the complexities of multilingual contexts and multi-hop reasoning. To address these challenges, our study introduces MLaKE (Multilingual Language Knowledge Editing), a novel benchmark comprising 4072 multi-hop and 5360 single-hop questions designed to evaluate the adaptability of knowledge editing methods across five languages: English, Chinese, Japanese, French, and German. MLaKE aggregates fact chains from Wikipedia across languages and uses LLMs to generate questions in both free-form and multiple-choice formats. We evaluate the multilingual generalization of existing knowledge editing methods on MLaKE. These methods achieve higher success rates on English samples than on other languages, but their generalization is limited in multilingual experiments. Notably, they often generalize relatively well between languages of the same language family but less well across different language families. These results underscore the need for advances in multilingual knowledge editing, and we hope MLaKE can serve as a valuable resource for benchmarking and solution development.
Submitted 7 April, 2024;
originally announced April 2024.
-
Possible charge density wave induced lattice distortion in ferromagnetic FeGe film
Authors:
Guangdong Nie,
Guanghui Han,
Erfa S. Z.,
Shijian Chen,
Hao Ding,
Fangdong Tang,
Licong Peng,
Young Sun,
Deshun Hong
Abstract:
The binary compound FeGe hosts multiple structures: a skyrmion lattice emerges in the chiral B20 phase, and an antiferromagnetic state with a charge density wave appears in the hexagonal phase. Here, we synthesized monoclinic FeGe films that are ferromagnetic with a Curie temperature as high as 800 K. Using low-temperature transmission electron microscopy, we captured lattice reconstructions in both real and reciprocal space at 100 K, whereas no transition was observed in either transport or magnetic characterizations. We infer that the lattice distortion may be induced by a charge density wave. Our work suggests that FeGe films are an ideal platform for understanding the intertwining of charge density wave, lattice distortion and magnetism, and paves the way for tuning the charge density wave by means of lattice engineering.
Submitted 5 April, 2024;
originally announced April 2024.
-
QCD Predictions for Meson Electromagnetic Form Factors at High Momenta: Testing Factorization in Exclusive Processes
Authors:
Heng-Tong Ding,
Xiang Gao,
Andrew D. Hanlon,
Swagato Mukherjee,
Peter Petreczky,
Qi Shi,
Sergey Syritsyn,
Rui Zhang,
Yong Zhao
Abstract:
We report the first lattice QCD computation of pion and kaon electromagnetic form factors, $F_M(Q^2)$, at large momentum transfer up to 10 and 28 $\mathrm{GeV}^2$, respectively. Utilizing physical masses and two fine lattices, we achieve good agreement with JLab experimental results at $Q^2 \lesssim 4~\mathrm{GeV}^2$. For $Q^2 \gtrsim 4~\mathrm{GeV}^2$, our results provide $\textit{ab-initio}$ QCD benchmarks for the forthcoming experiments at JLab 12 GeV and future electron-ion colliders. We also test the QCD collinear factorization framework utilizing our high-$Q^2$ form factors at next-to-next-to-leading order in perturbation theory, which relates the form factors to the leading Fock-state meson distribution amplitudes. Comparisons with independent lattice QCD calculations using the same framework demonstrate, within estimated uncertainties, the universality of these nonperturbative quantities.
Submitted 5 April, 2024;
originally announced April 2024.
-
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
Authors:
Shuting He,
Henghui Ding
Abstract:
Referring video segmentation relies on natural language expressions to identify and segment objects, often emphasizing motion cues. Previous works treat a sentence as a whole and perform identification directly at the video level, mixing static image-level cues with temporal motion cues. However, image-level features cannot adequately comprehend motion cues in sentences, and static cues are not crucial for temporal perception. In fact, static cues can sometimes interfere with temporal perception by overshadowing motion cues. In this work, we propose to decouple video-level referring expression understanding into static and motion perception, with a specific emphasis on enhancing temporal comprehension. First, we introduce an expression-decoupling module that lets static cues and motion cues play their distinct roles, alleviating the issue of sentence embeddings overlooking motion cues. Second, we propose a hierarchical motion perception module to capture temporal information effectively across varying timescales. Furthermore, we employ contrastive learning to distinguish the motions of visually similar objects. These contributions yield state-of-the-art performance across five datasets, including a remarkable $\textbf{9.2\%}$ $\mathcal{J\&F}$ improvement on the challenging $\textbf{MeViS}$ dataset. Code is available at https://github.com/heshuting555/DsHmp.
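The abstract does not spell out its contrastive objective. As a hedged illustration only, a standard InfoNCE loss can pull a motion-cue embedding of the expression toward the referred object's motion feature and push it away from objects whose motion differs; the function name, the cosine similarity, and the temperature of 0.07 are assumptions, not details taken from DsHmp.

```python
import numpy as np

def info_nce(anchor, candidates, positive_idx, temperature=0.07):
    """Generic InfoNCE loss: -log softmax(cos_sim / T)[positive_idx].

    anchor:     (d,) e.g. a motion-cue embedding of the referring expression
    candidates: (n, d) motion features of n objects, one of which is the target
    """
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = c @ a / temperature
    logits = logits - logits.max()                     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[positive_idx]

anchor = np.array([1.0, 0.0, 0.0])
objects = np.array([[0.9, 0.1, 0.0],    # referred object, moving as described
                    [0.1, 0.9, 0.0],    # similar-looking object, different motion
                    [0.0, 0.2, 0.9]])   # unrelated object
loss_correct = info_nce(anchor, objects, positive_idx=0)
loss_wrong = info_nce(anchor, objects, positive_idx=1)   # much larger
```

Minimizing this loss over many (expression, object set) pairs is what would make visually similar objects separable by motion alone.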
Submitted 4 April, 2024;
originally announced April 2024.
-
A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data
Authors:
Junlan Chen,
Ziyuan Pu,
Nan Zheng,
Xiao Wen,
Hongliang Ding,
Xiucheng Guo
Abstract:
Crash data are often highly imbalanced: most crashes are non-fatal, and only a small number are fatal because of their rarity. This imbalance poses a challenge for crash severity modeling, since models struggle to fit and interpret fatal crash outcomes from very limited samples. Such imbalance is usually addressed with data resampling methods, such as under-sampling and over-sampling techniques. However, most traditional and deep learning-based resampling methods, such as the synthetic minority oversampling technique (SMOTE) and generative adversarial networks (GAN), are designed for continuous variables. Though some resampling methods have been improved to handle both continuous and discrete variables, they may have difficulty dealing with the collapse issue associated with sparse discrete risk factors. Moreover, there is a lack of comprehensive studies comparing the performance of various resampling methods in crash severity modeling. To address these issues, the current study proposes a crash data generation method based on the Conditional Tabular GAN (CTGAN). After data balancing, a crash severity model is employed to evaluate classification and interpretation performance. A comparative study is conducted to assess the classification accuracy and distribution consistency of the proposed generation method using a 4-year imbalanced crash dataset collected in Washington State, U.S. Additionally, Monte Carlo simulation is employed to evaluate parameter and probability estimation in both two- and three-class imbalance scenarios. The results indicate that crash severity modeling with synthetic data generated by CTGAN-RU outperforms modeling with the original data or with synthetic data generated by other resampling methods.
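To see why interpolation-based oversampling such as SMOTE is geared toward continuous variables, consider its core step: a synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority neighbours. The sketch below is a simplified SMOTE written for illustration (the function name and the toy crash features are hypothetical). Applied to a one-hot encoded attribute, it produces fractional values that correspond to no real category.

```python
import math
import random

def smote_sample(minority, k, rng):
    """Generate one synthetic sample by SMOTE-style linear interpolation."""
    i = rng.randrange(len(minority))
    base = minority[i]
    # k nearest neighbours of `base` within the minority class (excluding itself)
    others = [p for j, p in enumerate(minority) if j != i]
    others.sort(key=lambda p: math.dist(p, base))
    neighbour = rng.choice(others[:k])
    gap = rng.random()                          # uniform in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]

# Hypothetical fatal-crash rows: [normalized speed, is_dark, is_daylight]
minority = [[0.8, 1.0, 0.0],
            [0.6, 0.0, 1.0],
            [0.9, 1.0, 0.0]]
rng = random.Random(0)
samples = [smote_sample(minority, k=2, rng=rng) for _ in range(200)]
# Many samples have a lighting encoding like [0.3, 0.7]: valid numerically,
# meaningless as a category.
```

This is the gap that tabular generators such as CTGAN, which model discrete columns with conditional sampling, are meant to close.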
Submitted 2 April, 2024;
originally announced April 2024.
-
Learning Trimaps via Clicks for Image Matting
Authors:
Chenyi Zhang,
Yihan Hu,
Henghui Ding,
Humphrey Shi,
Yao Zhao,
Yunchao Wei
Abstract:
Despite significant advancements in image matting, existing models depend heavily on manually drawn trimaps for accurate results in natural image scenarios. However, obtaining trimaps is time-consuming and lacks user-friendliness and device compatibility. This reliance greatly limits the practical application of all trimap-based matting methods. To address this issue, we introduce Click2Trimap, an interactive model capable of predicting high-quality trimaps and alpha mattes with minimal user click inputs. By analyzing real users' behavioral logic and the characteristics of trimaps, we propose a powerful iterative three-class training strategy and a dedicated simulation function, making Click2Trimap versatile across various scenarios. Quantitative and qualitative assessments on synthetic and real-world matting datasets demonstrate Click2Trimap's superior performance compared to all existing trimap-free matting methods. In particular, in the user study, Click2Trimap achieves high-quality trimap and matting predictions in an average of just 5 seconds per image, demonstrating its substantial practical value in real-world applications.
Submitted 6 April, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment
Authors:
Li Siyao,
Tianpei Gu,
Zhitao Yang,
Zhengyu Lin,
Ziwei Liu,
Henghui Ding,
Lei Yang,
Chen Change Loy
Abstract:
We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which requires generating responsive movements for a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between the two participants, requiring delicate coordination in both pose and position. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances. To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's movements, and the follower's movements. To further improve the GPT's ability to generate stable results under unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samples, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.
Submitted 27 March, 2024;
originally announced March 2024.
-
RakutenAI-7B: Extending Large Language Models for Japanese
Authors:
Rakuten Group,
Aaron Levine,
Connie Huang,
Chenguang Wang,
Eduardo Batista,
Ewa Szymanska,
Hongyi Ding,
Hou Wei Chou,
Jean-François Pessiot,
Johanes Effendi,
Justin Chiu,
Kai Torben Ohlhus,
Karan Chopra,
Keiji Shinzato,
Koji Murakami,
Lee Xiong,
Lei Chen,
Maki Kubota,
Maksim Tkachenko,
Miroku Lee,
Naoki Takahashi,
Prathyusha Jwalapuram,
Ryutaro Tatsushima,
Saurabh Jain,
Sunil Kumar Yadav
, et al. (5 additional authors not shown)
Abstract:
We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.
Submitted 21 March, 2024;
originally announced March 2024.
-
Direct Probe of Topology and Geometry of Quantum States on IBM Q
Authors:
Tianqi Chen,
Hai-Tao Ding,
Ruizhe Shen,
Shi-Liang Zhu,
Jiangbin Gong
Abstract:
The concepts of topology and geometry are of critical importance in exploring exotic phases of quantum matter. Though they have been investigated on various experimental platforms, a direct probe of topological and geometric properties on a universal quantum computer, even for a minimal model, has not yet been achieved. In this work, we first show that a density matrix form of the quantum geometric tensor (QGT) can be explicitly reconstructed from Pauli operator measurements on a quantum circuit. We then propose two algorithms, suitable for IBM quantum computers, to directly probe the QGT. The first algorithm is a variational quantum algorithm particularly suitable for Noisy Intermediate-Scale Quantum (NISQ)-era devices, whereas the second one is a pure quantum algorithm based on quantum imaginary time evolution. Explicit results obtained from IBM Q simulating a Chern insulator model are presented and analysed. Our results indicate that transmon qubit-based universal quantum computers have the potential to directly simulate and investigate topological and geometric properties of a quantum system.
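For a two-band model, the Bloch vector n(k) = (<X>, <Y>, <Z>) of Pauli expectation values fixes the pure-state density matrix rho(k) = (I + n.sigma)/2, so Pauli measurements suffice to recover the Berry curvature, i.e. the antisymmetric (imaginary) part of the QGT. The sketch below assumes the Qi-Wu-Zhang model as the Chern insulator (the abstract does not name the model) and evaluates n(k) analytically instead of measuring it. It sums the signed solid angles swept by n(k) over a k-space grid, which yields the Chern number as an integer. It illustrates the quantity being probed, not the paper's variational or imaginary-time algorithms.

```python
import math

def qwz_bloch_vector(kx, ky, m):
    """Unit vector n(k) = d(k)/|d(k)| of the Qi-Wu-Zhang model (assumed example).

    On hardware, its components would be the measured <X>, <Y>, <Z>.
    """
    d = (math.sin(kx), math.sin(ky), m + math.cos(kx) + math.cos(ky))
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def solid_angle(a, b, c):
    """Signed solid angle of the geodesic triangle (a, b, c) on the unit sphere."""
    numerator = _dot(a, _cross(b, c))
    denominator = 1.0 + _dot(a, b) + _dot(b, c) + _dot(c, a)
    return 2.0 * math.atan2(numerator, denominator)

def chern_number(m, n=40):
    """Total solid angle swept by n(k) over the Brillouin zone, divided by 4*pi."""
    ks = [2.0 * math.pi * i / n for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            ip, jp = (i + 1) % n, (j + 1) % n          # periodic Brillouin zone
            p00 = qwz_bloch_vector(ks[i], ks[j], m)
            p10 = qwz_bloch_vector(ks[ip], ks[j], m)
            p11 = qwz_bloch_vector(ks[ip], ks[jp], m)
            p01 = qwz_bloch_vector(ks[i], ks[jp], m)
            # two consistently oriented triangles per plaquette
            total += solid_angle(p00, p10, p11) + solid_angle(p00, p11, p01)
    return total / (4.0 * math.pi)

# |C| = 1 in the topological phase 0 < |m| < 2, C = 0 in the trivial phase |m| > 2
c_topological = chern_number(1.0)
c_trivial = chern_number(3.0)
```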
Submitted 6 June, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
LERENet: Eliminating Intra-class Differences for Metal Surface Defect Few-shot Semantic Segmentation
Authors:
Hanze Ding,
Zhangkai Wu,
Jiyan Zhang,
Ming Ping,
Yanfang Liu
Abstract:
Few-shot segmentation models excel in metal defect detection due to their rapid generalization to new classes and their pixel-level segmentation, making them well suited to addressing data scarcity and achieving refined object delineation in industrial applications. Existing works neglect the \textit{Intra-Class Differences} inherent in metal surface defect data, which hinder the model from learning sufficient knowledge from the support set to guide query set segmentation. Specifically, these differences fall into two types: the \textit{Semantic Difference} induced by internal factors in metal samples and the \textit{Distortion Difference} caused by external environmental factors. To address these differences, we introduce a \textbf{L}ocal d\textbf{E}scriptor based \textbf{R}easoning and \textbf{E}xcitation \textbf{Net}work (\textbf{LERENet}) that learns two-view guidance, i.e., local and global information from the graph and feature spaces, and fuses them to segment precisely. Since the relation structure of local features embedded in graph space helps eliminate the \textit{Semantic Difference}, we employ a Multi-Prototype Reasoning (MPR) module that extracts prototypes based on local descriptors and analyzes local-view feature relevance in support-query pairs. In addition, since global information helps counter the \textit{Distortion Difference} in observations, we utilize a Multi-Prototype Excitation (MPE) module to capture global-view relations in support-query pairs. Finally, we employ an Information Fusion Module (IFM) to fuse the prototypes learned in the local and global views to generate pixel-level masks. Comprehensive experiments on defect datasets demonstrate that LERENet outperforms existing methods, establishing a new state of the art.
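A common starting point for prototype-based few-shot segmentation, which LERENet extends with multiple local and global prototypes, is a single class prototype obtained by masked average pooling of support features, with each query location scored by cosine similarity to it. The sketch below shows only that single-prototype baseline; the shapes, the 0.5 threshold, the toy features, and the function names are assumptions rather than details of the MPR, MPE, or IFM modules.

```python
import numpy as np

def masked_average_prototype(support_feat, support_mask):
    """support_feat: (C, H, W) features; support_mask: (H, W) binary defect mask."""
    m = support_mask[None].astype(float)
    return (support_feat * m).sum(axis=(1, 2)) / (m.sum() + 1e-8)

def cosine_score_map(query_feat, prototype):
    """Cosine similarity between the prototype and every query location: (H, W)."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum("chw,c->hw", q, p)

def make_feat(mask):
    """Toy 2-channel features: channel 0 fires on defect pixels, channel 1 on background."""
    f = np.zeros((2,) + mask.shape)
    f[0][mask] = 1.0
    f[1][~mask] = 1.0
    return f

support_mask = np.zeros((4, 4), dtype=bool)
support_mask[:2, :2] = True          # defect in the top-left of the support image
query_mask = np.zeros((4, 4), dtype=bool)
query_mask[2:, 2:] = True            # same defect class, bottom-right of the query

proto = masked_average_prototype(make_feat(support_mask), support_mask)
pred = cosine_score_map(make_feat(query_mask), proto) > 0.5
```

A single averaged prototype is exactly what intra-class variation breaks, which is the motivation the abstract gives for multiple local-descriptor prototypes.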
Submitted 17 March, 2024;
originally announced March 2024.
-
An Empirical Study on Developers Shared Conversations with ChatGPT in GitHub Pull Requests and Issues
Authors:
Huizi Hao,
Kazi Amit Hasan,
Hong Qin,
Marcos Macedo,
Yuan Tian,
Steven H. H. Ding,
Ahmed E. Hassan
Abstract:
ChatGPT has significantly impacted software development practices, providing substantial assistance to developers in a variety of tasks, including coding, testing, and debugging. Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored. In this paper, we analyze a dataset of 210 and 370 developers' shared conversations with ChatGPT in GitHub pull requests (PRs) and issues, respectively. We manually examined the content of the conversations and characterized the dynamics of the sharing behavior, i.e., understanding the rationale behind the sharing, identifying the locations where the conversations were shared, and determining the roles of the developers who shared them. Our main observations are: (1) Developers seek ChatGPT assistance across 16 types of software engineering inquiries. In conversations shared in both PRs and issues, the most frequently encountered inquiry categories include code generation, conceptual questions, how-to guides, issue resolution, and code review. (2) Developers frequently engage with ChatGPT via multi-turn conversations where each prompt can fulfill various roles, such as unveiling initial or new tasks, iterative follow-up, and prompt refinement. Multi-turn conversations account for 33.2% of the conversations shared in PRs and 36.9% in issues. (3) In collaborative coding, developers leverage shared conversations with ChatGPT to facilitate their role-specific contributions, whether as authors of PRs or issues, code reviewers, or collaborators on issues. Our work serves as a first step towards understanding the dynamics between developers and ChatGPT in collaborative software development and opens up new directions for future research on the topic.
Submitted 15 March, 2024;
originally announced March 2024.
-
Explore In-Context Segmentation via Latent Diffusion Models
Authors:
Chaoyang Wang,
Xiangtai Li,
Henghui Ding,
Lu Qi,
Jiangning Zhang,
Yunhai Tong,
Chen Change Loy,
Shuicheng Yan
Abstract:
In-context segmentation has drawn increasing attention with the introduction of vision foundation models. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. In this work, we explore this problem from a new perspective, using a representative generative model, the latent diffusion model (LDM). We observe a task gap between generation and segmentation in diffusion models, but the LDM is still an effective minimalist approach to in-context segmentation. In particular, we propose two meta-architectures and correspondingly design several output alignment and optimization strategies. We have conducted comprehensive ablation studies and empirically found that segmentation quality depends on output alignment and in-context instructions. Moreover, we build a new and fair in-context segmentation benchmark that includes both image and video datasets. Experiments validate the efficiency of our approach, demonstrating comparable or even stronger results than previous specialist models or visual foundation models. Our study shows that LDMs can also achieve sufficiently good results on challenging in-context segmentation tasks.
Submitted 14 March, 2024;
originally announced March 2024.
-
Curvature of the chiral phase transition line from the magnetic equation of state of (2+1)-flavor QCD
Authors:
H. -T. Ding,
O. Kaczmarek,
F. Karsch,
P. Petreczky,
Mugdha Sarkar,
C. Schmidt,
Sipaz Sharma
Abstract:
We analyze the dependence of the chiral phase transition temperature on baryon number and strangeness chemical potentials by calculating the leading-order curvature coefficients in the light and strange quark flavor basis as well as in the conserved charge ($B, S$) basis. Making use of scaling properties of the magnetic equation of state (MEoS) and including diagonal as well as off-diagonal contributions in the expansion of the energy-like scaling variable that enters the parametrization of the MEoS, we explore the variation of $T_c(μ_B,μ_S) = T_c ( 1 - (κ_2^B \hatμ_B^2 + κ_2^S \hatμ_S^2 + 2κ_{11}^{BS} \hatμ_B \hatμ_S))$ along different lines in the $(μ_B,μ_S)$ plane. On lattices with fixed cut-off in units of temperature, $aT=1/8$, we find $κ_2^B=0.015(1)$, $κ_2^S=0.0124(5)$ and $κ_{11}^{BS}=-0.0050(7)$. We show that the chemical potential dependence along the line of vanishing strangeness chemical potential is about 10\% larger than along the strangeness-neutral line. The latter differs by only about 3\% from the curvature on a line of vanishing strange quark chemical potential, $μ_s=0$. We also show that close to the chiral limit the strange quark mass acts like an energy-like variable in scaling relations for pseudo-critical temperatures. The chiral phase transition temperature decreases with decreasing strange quark mass, $T_c(m_s)= T_c(m_s^{\rm phys}) \left(1 - 0.097(2)\,(m_s-m_s^{\rm phys})/m_s^{\rm phys}+{\cal O}((Δm_s)^2)\right)$.
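Along any straight line mu_S_hat = r * mu_B_hat in the (mu_B, mu_S) plane, the curvature expansion of T_c(mu_B, mu_S) collapses to a single effective coefficient: T_c(mu_B)/T_c = 1 - kappa_eff(r) * mu_B_hat^2 with kappa_eff(r) = kappa_2^B + r^2 kappa_2^S + 2 r kappa_11^BS. The short script below evaluates this with the central values quoted above, ignoring their uncertainties and taking mu_hat = mu/T (the usual lattice convention, assumed here). Using r = 1/3 for the mu_s = 0 line assumes the common convention mu_s = mu_B/3 - mu_S at vanishing electric-charge chemical potential; the strangeness-neutral line requires a ratio r fixed by lattice data and is not reproduced here.

```python
# Central values of the curvature coefficients quoted in the abstract (aT = 1/8)
KAPPA_2_B = 0.015
KAPPA_2_S = 0.0124
KAPPA_11_BS = -0.0050

def kappa_eff(r):
    """Effective curvature along the line mu_S_hat = r * mu_B_hat."""
    return KAPPA_2_B + r**2 * KAPPA_2_S + 2.0 * r * KAPPA_11_BS

def tc_ratio(mu_b_hat, r):
    """T_c(mu_B, mu_S = r mu_B) / T_c at leading order in mu_hat^2."""
    return 1.0 - kappa_eff(r) * mu_b_hat**2

k_mu_S_zero = kappa_eff(0.0)        # 0.015: the mu_S = 0 line is just kappa_2^B
k_mu_s_zero = kappa_eff(1.0 / 3.0)  # about 0.01304 under the assumed flavor convention
```

The negative off-diagonal coefficient is what lowers the curvature once mu_S is switched on together with mu_B.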
Submitted 14 March, 2024;
originally announced March 2024.
-
Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
Authors:
Ben Athiwaratkun,
Sujan Kumar Gonugondla,
Sanjay Krishna Gouda,
Haifeng Qian,
Hantian Ding,
Qing Sun,
Jun Wang,
Jiacheng Guo,
Liangfu Chen,
Parminder Bhatia,
Ramesh Nallapati,
Sudipta Sengupta,
Bing Xiang
Abstract:
This study introduces bifurcated attention, a method designed to enhance language model inference in shared-context batch decoding scenarios. Our approach addresses the challenge of redundant memory IO costs, a critical factor contributing to latency in high batch sizes and extended context lengths. Bifurcated attention achieves this by strategically dividing the attention mechanism during incremental decoding into two separate GEMM operations: one focusing on the KV cache from prefill, and another on the decoding process itself. While maintaining the computational load (FLOPs) of standard attention mechanisms, bifurcated attention ensures precise computation with significantly reduced memory IO. Our empirical results show over 2.1$\times$ speedup when sampling 16 output sequences and more than 6.2$\times$ speedup when sampling 32 sequences at context lengths exceeding 8k tokens on a 7B model that uses multi-head attention. The efficiency gains from bifurcated attention translate into lower latency, making it particularly suitable for real-time applications. For instance, it enables massively parallel answer generation without substantially increasing latency, thus enhancing performance when integrated with post-processing techniques such as re-ranking.
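The identity behind bifurcated attention is that softmax attention over a concatenated key set can be computed as two separate matrix products, one against the shared prefill KV cache (read once for the whole batch) and one against each sample's own decoded-token KV cache, provided the softmax is normalized jointly over both parts. The NumPy sketch below checks this for a single head at a single decoding step; the function names and shapes are assumptions, and a real kernel would also handle multiple heads, masking and memory layout.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(q, k_ctx, v_ctx, k_dec, v_dec):
    """Reference: every sample attends over its own copy of the shared prefix."""
    b = q.shape[0]
    # Replicating the prefix per sample is the redundant memory IO.
    k = np.concatenate([np.broadcast_to(k_ctx, (b,) + k_ctx.shape), k_dec], axis=1)
    v = np.concatenate([np.broadcast_to(v_ctx, (b,) + v_ctx.shape), v_dec], axis=1)
    w = softmax(np.einsum("bd,bnd->bn", q, k) / np.sqrt(q.shape[-1]))
    return np.einsum("bn,bnd->bd", w, v)

def bifurcated_attention(q, k_ctx, v_ctx, k_dec, v_dec):
    """Same output; the shared prefix KV is used once for the whole batch."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    s_ctx = (q @ k_ctx.T) * scale                       # GEMM 1: (b, m_ctx)
    s_dec = np.einsum("bd,bnd->bn", q, k_dec) * scale   # GEMM 2: (b, m_dec)
    w = softmax(np.concatenate([s_ctx, s_dec], axis=1)) # joint normalization
    m_ctx = k_ctx.shape[0]
    return w[:, :m_ctx] @ v_ctx + np.einsum("bn,bnd->bd", w[:, m_ctx:], v_dec)

rng = np.random.default_rng(0)
b, d, m_ctx, m_dec = 4, 8, 16, 3     # batch, head dim, prefix length, decoded length
q = rng.standard_normal((b, d))
k_ctx, v_ctx = rng.standard_normal((m_ctx, d)), rng.standard_normal((m_ctx, d))
k_dec, v_dec = rng.standard_normal((b, m_dec, d)), rng.standard_normal((b, m_dec, d))
out_ref = standard_attention(q, k_ctx, v_ctx, k_dec, v_dec)
out_bif = bifurcated_attention(q, k_ctx, v_ctx, k_dec, v_dec)
print(np.allclose(out_ref, out_bif))  # True
```

The FLOP count is unchanged; the saving is that k_ctx and v_ctx are read once rather than b times, which matters most when m_ctx is long and b is large.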
Submitted 11 July, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Automating SBOM Generation with Zero-Shot Semantic Similarity
Authors:
Devin Pereira,
Christopher Molloy,
Sudipta Acharya,
Steven H. H. Ding
Abstract:
With the growing complexity of software ecosystems and the emphasis on security and compliance, it is becoming increasingly important in the software industry for manufacturers to inventory the software used on their systems. A Software-Bill-of-Materials (SBOM) is a comprehensive inventory detailing a software application's components and dependencies. Current approaches rely on case-based reasoning, which identifies the software components embedded in binary files inconsistently. We propose a different route: an automated method for generating SBOMs to prevent disastrous supply-chain attacks. Staying within static code analysis, we frame this problem as a semantic similarity task in which a transformer model is trained to relate a product name to its corresponding version strings. Our test results are compelling, demonstrating the model's strong performance on the zero-shot classification task and its potential for use in a real-world cybersecurity context.
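The abstract frames component identification as scoring the similarity between a product name and a version string extracted from a binary, then picking the best match zero-shot. The sketch below keeps that framing but swaps the trained transformer for character-trigram count vectors with cosine similarity, purely so the example runs without a model; the function names, the candidate product list, and the example strings are hypothetical.

```python
import math
from collections import Counter

def trigrams(text):
    """Character trigram counts of a lowercased string (stand-in for an embedding)."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a, b):
    dot = sum(count * b[g] for g, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def identify(version_string, products):
    """Zero-shot pick: the product name most similar to the version string."""
    target = trigrams(version_string)
    return max(products, key=lambda p: cosine(trigrams(p), target))

PRODUCTS = ["openssl", "zlib", "libcurl", "sqlite"]   # hypothetical label set
match_a = identify("OpenSSL 1.1.1k  25 Mar 2021", PRODUCTS)  # "openssl"
match_b = identify("zlib 1.2.11", PRODUCTS)                  # "zlib"
```

Because the label set is only consulted at scoring time, new products can be added without retraining, which is the property that makes the zero-shot setup attractive for SBOM generation.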
Submitted 3 February, 2024;
originally announced March 2024.
-
Detecting degenerate bands topological invariants in optical lattice
Authors:
Jing-Xin Liu,
Jian-Te Wang,
Hai-Tao Ding
Abstract:
In this paper, we present a novel experimental approach for simulating and detecting topological invariants using ultracold fermions confined in two-dimensional hexagonal optical lattices. We propose achieving two-fold degenerate four-band models with non-trivial topologies in both the AII and A classes by introducing additional inertial forces, Raman processes, or periodic driving. By implementing various quench sequences and observing the evolution of the time-of-flight pattern, we can gather comprehensive information about the ground states and determine the topology of the valence bands. Through the analysis of tomographic results, we are able to extract and calculate the spin Chern number for both spin-conserving and spin-nonconserving cases. Additionally, we demonstrate the robustness of the quantized topological invariants and discuss the effects of various experimental parameters.
Submitted 11 March, 2024;
originally announced March 2024.
-
VLBI Astrometry of Radio Stars to Link Radio and Optical Celestial Reference Frames: Observing Strategies
Authors:
Jingdong Zhang,
Bo Zhang,
Shuangjing Xu,
Niu Liu,
Wen Chen,
Hao Ding,
Pengfei Jiang,
Yan Sun,
Jinqing Wang,
Lang Cui,
Shiming Wen,
Xiaofeng Mai,
Jinling Li,
Fengchun Shu,
Yidan Huang
Abstract:
The Gaia celestial reference frame (Gaia-CRF) will benefit from close assessment with independent methods, such as Very Long Baseline Interferometry (VLBI) measurements of radio stars at bright magnitudes. However, obtaining full astrometric parameters for each radio star through VLBI measurements demands a significant amount of observing time. This study proposes an efficient observing strategy that acquires double-epoch VLBI positions to measure the positions and proper motions of radio stars at reduced cost. A CRF-link solution compatible with individual VLBI position measurements is introduced, and the optimized scheduling of observing epochs is discussed. Applying this solution to observational data yields results that are sensitive to adding or removing stars from the sample, yet remain consistent with the literature at the 1-sigma level. This suggests potential for improvement with a larger sample. Simulations of additional observations demonstrate that the double-epoch strategy reduces CRF link parameter uncertainties by over 30% compared to the five-parameter strategy.
Submitted 26 March, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
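The double-epoch idea in the abstract above amounts to a two-point linear fit per coordinate. The sketch below shows that arithmetic under stated assumptions: parallax is ignored, the two position errors are independent, and each coordinate (RA*cos(Dec) or Dec offset, in mas) is handled separately. The names `Epoch` and `two_epoch_motion` are hypothetical, and the CRF-link rotation fit itself is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Epoch:
    t: float      # observing epoch in (decimal) years
    x: float      # position offset along one coordinate, in mas
    sigma: float  # 1-sigma position uncertainty, in mas


def two_epoch_motion(e1, e2, t_ref):
    """Position at t_ref and proper motion from two epochs (parallax ignored)."""
    dt = e2.t - e1.t
    if dt == 0:
        raise ValueError("epochs must differ")
    mu = (e2.x - e1.x) / dt                        # proper motion, mas/yr
    sigma_mu = math.hypot(e1.sigma, e2.sigma) / abs(dt)
    # Linear propagation to t_ref: x_ref = w1 * x1 + w2 * x2.
    w2 = (t_ref - e1.t) / dt
    w1 = 1.0 - w2
    x_ref = w1 * e1.x + w2 * e2.x
    sigma_ref = math.hypot(w1 * e1.sigma, w2 * e2.sigma)
    return x_ref, sigma_ref, mu, sigma_mu


x_ref, sigma_ref, mu, sigma_mu = two_epoch_motion(
    Epoch(2020.0, 0.0, 0.3), Epoch(2022.0, 4.0, 0.4), t_ref=2021.0)
```

The proper-motion uncertainty scales as 1/|dt|, which is why the separation of the two epochs matters when scheduling observations.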
-
DaReNeRF: Direction-aware Representation for Dynamic Scenes
Authors:
Ange Lou,
Benjamin Planche,
Zhongpai Gao,
Yamin Li,
Tianyu Luan,
Hao Ding,
Terrence Chen,
Jack Noble,
Ziyan Wu
Abstract:
Addressing the intricate challenge of modeling and re-rendering dynamic scenes, most recent approaches have sought to simplify these complexities using plane-based explicit representations, overcoming the slow training of methods such as Neural Radiance Fields (NeRF) and other implicit representations. However, the straightforward decomposition of 4D dynamic scenes into multiple 2D plane-based representations proves insufficient for re-rendering high-fidelity scenes with complex motions. In response, we present a novel direction-aware representation (DaRe) approach that captures scene dynamics from six different directions. This learned representation undergoes an inverse dual-tree complex wavelet transform (DTCWT) to recover plane-based information. DaReNeRF computes features for each space-time point by fusing vectors from these recovered planes. Combining DaReNeRF with a tiny MLP for color regression and leveraging volume rendering in training yield state-of-the-art performance in novel view synthesis for complex dynamic scenes. Notably, to address the redundancy introduced by the six real and six imaginary direction-aware wavelet coefficients, we introduce a trainable masking approach that mitigates storage issues without significant performance decline. Moreover, DaReNeRF achieves a 2× reduction in training time compared to prior art while delivering superior performance.
Submitted 4 March, 2024;
originally announced March 2024.
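The DaReNeRF abstract above says per-point features are obtained by fusing vectors from six recovered planes but does not specify the fusion. The sketch below assumes a K-Planes-style scheme: one feature plane per coordinate pair (xy, xz, yz, xt, yt, zt), bilinear sampling, and an elementwise (Hadamard) product across planes. The DTCWT reconstruction, the trainable masking, and the color MLP are omitted; `bilinear`, `fuse_features`, and `AXIS_PAIRS` are hypothetical names, and coordinates are assumed normalized to [0, 1].

```python
from typing import List, Sequence

Plane = List[List[List[float]]]  # [rows][cols][features], at least 2 x 2

# Coordinate index pairs for the six planes: xy, xz, yz, xt, yt, zt.
AXIS_PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a feature vector at normalized coordinates (u, v)."""
    rows, cols = len(plane), len(plane[0])
    fu, fv = u * (rows - 1), v * (cols - 1)
    r0, c0 = min(int(fu), rows - 2), min(int(fv), cols - 2)
    a, b = fu - r0, fv - c0
    out = []
    for f in range(len(plane[0][0])):
        top = (1 - b) * plane[r0][c0][f] + b * plane[r0][c0 + 1][f]
        bot = (1 - b) * plane[r0 + 1][c0][f] + b * plane[r0 + 1][c0 + 1][f]
        out.append((1 - a) * top + a * bot)
    return out


def fuse_features(planes: Sequence[Plane], point: Sequence[float]) -> List[float]:
    """Fuse the six plane samples for a point (x, y, z, t) by elementwise product."""
    feat = None
    for plane, (i, j) in zip(planes, AXIS_PAIRS):
        vec = bilinear(plane, point[i], point[j])
        feat = vec if feat is None else [p * q for p, q in zip(feat, vec)]
    return feat
```

A product (rather than a sum) lets any single plane suppress a feature, which is one common motivation for multiplicative fusion in factorized 4D representations; the paper may use a different operator.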
-
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
Authors:
Kun-Yu Lin,
Henghui Ding,
Jiaming Zhou,
Yu-Ming Tang,
Yi-Xing Peng,
Zhilin Zhao,
Chen Change Loy,
Wei-Shi Zheng
Abstract:
Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining), recent pioneering works have proposed adapting the powerful CLIP to video data, leading to efficient and effective video learners for open-vocabulary action recognition. Inspired by the fact that humans perform actions in diverse environments, our work delves into an intriguing question: Can CLIP-based video learners effectively generalize to video domains they have not encountered during training? To answer this, we establish a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action, and conduct a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps. The evaluation demonstrates that previous methods exhibit limited action recognition performance in unseen video domains, revealing potential challenges of the cross-domain open-vocabulary action recognition task. In this paper, we focus on one critical challenge of the task, namely scene bias, and accordingly contribute a novel scene-aware video-text alignment method. Our key idea is to separate video representations from scene-encoded text representations, aiming to learn scene-agnostic video representations for recognizing actions across domains. Extensive experiments demonstrate the effectiveness of our method. The benchmark and code will be available at https://github.com/KunyuLin/XOV-Action/.
Submitted 24 May, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
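To make the "key idea" in the XOV-Action abstract above concrete, here is a hypothetical loss, not the paper's formulation: a CLIP-style softmax cross-entropy over cosine similarities to action-text embeddings, plus a hinge penalty that pushes the video embedding away from scene-text embeddings. The temperature `tau`, weight `lam`, `margin`, and all function names are assumptions for illustration.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def scene_aware_loss(video, action_texts, label, scene_texts,
                     tau=0.07, lam=1.0, margin=0.0):
    """Hypothetical loss: action alignment plus a scene-separation penalty."""
    # Video-text alignment: cross-entropy over temperature-scaled cosine logits.
    logits = [cosine(video, t) / tau for t in action_texts]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))  # stable log-sum-exp
    ce = log_z - logits[label]
    # Scene separation: penalize similarity to scene-encoded text above a margin.
    penalty = sum(max(0.0, cosine(video, s) - margin) for s in scene_texts)
    return ce + lam * penalty / len(scene_texts)


actions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
scenes = [[0.0, 0.0, 1.0]]
loss_clean = scene_aware_loss([1.0, 0.0, 0.0], actions, 0, scenes)
loss_biased = scene_aware_loss([1.0, 0.0, 1.0], actions, 0, scenes)
```

A video embedding that also encodes the scene direction is penalized even when it still ranks the correct action first, which is the intended pressure toward scene-agnostic representations.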