Showing 1–50 of 943 results for author: Wei, X

  1. arXiv:2407.10759  [pdf, other]

    eess.AS cs.CL cs.LG

    Qwen2-Audio Technical Report

    Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data an…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, codes and scripts will be open-sourced soon

  2. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, et al. (34 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  3. arXiv:2407.10199  [pdf, other]

    nucl-ex nucl-th

    Charge radii of $^{11-16}$C, $^{13-17}$N and $^{15-18}$O determined from their charge-changing cross-sections and the mirror-difference charge radii

    Authors: J. W. Zhao, B. -H. Sun, I. Tanihata, J. Y. Xu, K. Y. Zhang, A. Prochazka, L. H. Zhu, S. Terashima, J. Meng, L. C. He, C. Y. Liu, G. S. Li, C. G. Lu, W. J. Lin, W. P. Lin, Z. Liu, P. P Ren, Z. Y. Sun, F. Wang, J. Wang, M. Wang, S. T. Wang, X. L. Wei, X. D. Xu, J. C. Zhang, et al. (2 additional authors not shown)

    Abstract: Charge-changing cross-sections of $^{11-16}$C, $^{13-17}$N and $^{15-18}$O on a carbon target have been determined at energies around 300 MeV/nucleon. A nucleon separation energy dependent correction factor has been introduced to the Glauber model calculation for extracting the nuclear charge radii from the experimental CCCSs. The charge radii of $^{11}$C, $^{13,16}$N and $^{15}$O thus were determ…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 3 figures, submitted to Physics Letters B

  4. arXiv:2407.10043  [pdf, other]

    cond-mat.dis-nn

    Phase induced localization transition

    Authors: Tong Liu, Xingbo Wei, Youguo Wang

    Abstract: Localization phenomenon is an important research field in condensed matter physics. However, due to the complexity and subtlety of disordered systems, new localization phenomena always emerge unexpectedly. For example, it is generally believed that the phase of the hopping term does not affect the localization properties of the system, so the calculation of the phase is often ignored in the study…

    Submitted 13 July, 2024; originally announced July 2024.

  5. arXiv:2407.09835  [pdf, other]

    cs.CL

    Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis

    Authors: Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre

    Abstract: State-of-the-art LLMs often rely on scale with high computational costs, which has sparked a research agenda to reduce parameter counts and costs without significantly impacting performance. Our study focuses on Transformer-based LLMs, specifically applying low-rank parametrization to the computationally intensive feedforward networks (FFNs), which are less studied than attention blocks. In contra…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML 2024 Next Generation of Sequence Modeling Architectures Workshop. arXiv admin note: substantial text overlap with arXiv:2406.16450

  6. arXiv:2407.08981  [pdf]

    eess.SY

    Joint Load and Capacity Scheduling for Flexible Radio Resource Management of High-Throughput Satellites

    Authors: Jia Zhuoya, Xiong Wei, Hao Hongxing, Liu Zheng, Han Chi

    Abstract: This work first explores using flexible beam-user mapping to optimize the beam service range and beam position, in order to adapt the non-uniform traffic demand to offer in high-throughput satellite (HTS) systems. Second, on this basis, the joint flexible bandwidth allocation is adopted to adapt the offer to demand at the same time. This strategy allows both beam capacity and load to be adjusted t…

    Submitted 10 July, 2024; originally announced July 2024.

  7. arXiv:2407.08739  [pdf, other]

    cs.CV

    MAVIS: Mathematical Visual Instruction Tuning

    Authors: Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Yichi Zhang, Ziyu Guo, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Hongsheng Li

    Abstract: Multi-modal Large Language Models (MLLMs) have recently emerged as a significant focus in academia and industry. Despite their proficiency in general multi-modal scenarios, the mathematical problem-solving capabilities in visual contexts remain insufficiently explored. We identify three key areas within MLLMs that need to be improved: visual encoding of math diagrams, diagram-language alignment, a…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Work in progress. Data and Models are released at https://github.com/ZrrSkywalker/MAVIS

  8. arXiv:2407.08489  [pdf, other]

    cs.CV

    Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation

    Authors: Zeyang Zhao, Qilong Xue, Yuhang He, Yifan Bai, Xing Wei, Yihong Gong

    Abstract: This paper introduces the point-axis representation for oriented object detection, emphasizing its flexibility and geometrically intuitive nature with two key components: points and axes. 1) Points delineate the spatial extent and contours of objects, providing detailed shape descriptions. 2) Axes define the primary directionalities of objects, providing essential orientation cues crucial for prec…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 19 pages, 7 figures, accepted by ECCV24!

  9. arXiv:2407.07125  [pdf, other]

    gr-qc astro-ph.IM physics.data-an

    Rapid Parameter Estimation for Merging Massive Black Hole Binaries Using ODE-Based Generative Models

    Authors: Bo Liang, Minghui Du, He Wang, Yuxiang Xu, Chang Liu, Xiaotong Wei, Peng Xu, Li-e Qiang, Ziren Luo

    Abstract: Detecting the coalescences of massive black hole binaries (MBHBs) is one of the primary targets for space-based gravitational wave observatories such as LISA, Taiji, and Tianqin. The fast and accurate parameter estimation of merging MBHBs is of great significance for both astrophysics and the global fitting of all resolvable sources. However, such analyses entail significant computational costs. T…

    Submitted 9 July, 2024; originally announced July 2024.

  10. arXiv:2407.05356  [pdf, ps, other]

    math.OC math.PR

    Extended mean-field control problems with Poissonian common noise: Stochastic maximum principle and Hamiltonian-Jacobi-Bellman equation

    Authors: Lijun Bo, Jingfei Wang, Xiaoli Wei, Xiang Yu

    Abstract: This paper studies the extended mean-field control problems with state-control joint law dependence and Poissonian common noise. We develop the stochastic maximum principle (SMP) and establish the connection to the Hamiltonian-Jacobi-Bellman (HJB) equation on the Wasserstein space. The presence of the conditional joint law in the McKean-Vlasov dynamics and its discontinuity caused by the Poissonia…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Keywords: Extended mean-field control, Poissonian common noise, relaxed control formulation, stochastic maximum principle, HJB equation

  11. arXiv:2407.04521  [pdf, ps, other]

    math.OC cs.LG q-fin.CP

    Unified continuous-time q-learning for mean-field game and mean-field control problems

    Authors: Xiaoli Wei, Xiang Yu, Fengyi Yuan

    Abstract: This paper studies the continuous-time q-learning in the mean-field jump-diffusion models from the representative agent's perspective. To overcome the challenge when the population distribution may not be directly observable, we introduce the integrated q-function in decoupled form (decoupled Iq-function) and establish its martingale characterization together with the value function, which provide…

    Submitted 5 July, 2024; originally announced July 2024.

  12. arXiv:2407.02129  [pdf, other]

    cs.HC

    ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction

    Authors: Bo Qian, Zhenhuan Wei, Jiashuo Li, Xing Wei

    Abstract: Efficiently estimating the full-body pose with minimal wearable devices presents a worthwhile research direction. Despite significant advancements in this field, most current research neglects to explore full-body avatar estimation under low-quality signal conditions, which is prevalent in practical usage. To bridge this gap, we summarize three scenarios that may be encountered in real-world appli…

    Submitted 2 July, 2024; originally announced July 2024.

  13. arXiv:2407.01445  [pdf, other]

    cs.LG cs.CV

    FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources

    Authors: Xiyuan Wei, Fanjiang Ye, Ori Yonay, Xingyu Chen, Baixi Sun, Dingwen Tao, Tianbao Yang

    Abstract: Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds of or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstra…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 23 pages

  14. Joint Beamforming and Antenna Position Optimization for Movable Antenna-Assisted Spectrum Sharing

    Authors: Xin Wei, Weidong Mei, Dong Wang, Boyu Ning, Zhi Chen

    Abstract: Fluid antennas (FAs) and movable antennas (MAs) have drawn increasing attention in wireless communications recently due to their ability to create favorable channel conditions via local antenna movement within a confined region. In this letter, we advance their application for cognitive radio to facilitate efficient spectrum sharing between primary and secondary communication systems. In particula…

    Submitted 27 June, 2024; originally announced June 2024.

  15. arXiv:2406.18605  [pdf, other]

    physics.ins-det nucl-ex

    The neutron array of the compact spectrometer for heavy ion experiments in Fermi energy region

    Authors: Dawei Si, Sheng Xiao, Yuhao Qin, Yijie Wang, Junhuai Xu, Baiting Tian, Boyuan Zhang, Dong Guo, Qin Zhi, Xiaobao Wei, Yibo Hao, Zengxiang Wang, Tianren Zhuo, Yuansheng Yang, Xianglun Wei, Herun Yang, Peng Ma, Limin Duan, Fangfang Duan, Junbing Ma, Shiwei Xu, Zhen Bai, Guo Yang, Yanyun Yang, Zhigang Xiao

    Abstract: The emission of neutrons from heavy ion reactions is an important observable for studying the asymmetric nuclear equation of state and the reaction dynamics. A 20-unit neutron array has been developed and mounted on the compact spectrometer for heavy ion experiments (CSHINE) to measure the neutron spectra, neutron-neutron and neutron-proton correlation functions. Each unit consists of a…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 8 pages, 11 figures

  16. arXiv:2406.16450  [pdf, other]

    cs.CL

    Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers

    Authors: Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre

    Abstract: State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive. This has sparked a research agenda to reduce these models' parameter count and computational costs without significantly impacting their performance. Our study focuses on transformer-based LLMs, specifically targeting the computationally intensive feedforward networks (FFN), which…

    Submitted 24 June, 2024; originally announced June 2024.

  17. arXiv:2406.15768  [pdf, other]

    cs.CV

    MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

    Authors: Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang

    Abstract: In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception tasks, such as detection and segmentation. However, MLLMs mainly focus on high-level image-text interpretations and struggle with fine-grained visual understanding,…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages, 8 figures

  18. arXiv:2406.15539  [pdf, other]

    hep-ex nucl-ex

    First Measurement of Deeply Virtual Compton Scattering on the Neutron with Detection of the Active Neutron

    Authors: CLAS Collaboration, A. Hobart, S. Niccolai, M. Čuić, K. Kumerički, P. Achenbach, J. S. Alvarado, W. R. Armstrong, H. Atac, H. Avakian, L. Baashen, N. A. Baltzell, L. Barion, M. Bashkanov, M. Battaglieri, B. Benkel, F. Benmokhtar, A. Bianconi, A. S. Biselli, S. Boiarinov, M. Bondi, W. A. Booth, F. Bossù, K. -Th. Brinkmann, W. J. Briscoe, et al. (124 additional authors not shown)

    Abstract: Measuring Deeply Virtual Compton Scattering on the neutron is one of the necessary steps to understand the structure of the nucleon in terms of Generalized Parton Distributions (GPDs). Neutron targets play a complementary role to transversely polarized proton targets in the determination of the GPD $E$. This poorly known and poorly constrained GPD is essential to obtain the contribution of the qua…

    Submitted 25 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures

    Report number: JLAB-PHY-24-4089

  19. arXiv:2406.12878  [pdf, other]

    physics.ins-det hep-ex nucl-ex

    Beam test results of the prototype of the multi wire drift chamber for the CSR external-target experiment

    Authors: Zhi Qin, Zhoubo He, Zhe Cao, Tao Chen, Zhi Deng, Limin Duan, Dong Guo, Rongjiang Hu, Jie Kong, Canwen Liu, Peng Ma, Xianglun Wei, Shihai Wen, Xiangjie Wen, Junwei Yan, Herun Yang, Zuoqiao Yang, Yuhong Yu, Zhigang Xiao

    Abstract: The half-size prototype of the multi wire drift chamber (MWDC) for the cooling storage ring (CSR) external-target experiment (CEE) was assembled and tested in 350 MeV/u Kr+Fe reactions on the heavy ion research facility in Lanzhou (HIRFL). The prototype consists of 6 sense layers, where the sense wires are stretched in three directions X, U and V, meeting $0^\circ$, $30^\circ$ and $-30^\circ$ with…

    Submitted 15 May, 2024; originally announced June 2024.

  20. arXiv:2406.12324  [pdf, other]

    cs.RO

    AutoDSL: Automated domain-specific language design for structural representation of procedures with constraints

    Authors: Yu-Zhe Shi, Haofei Hou, Zhangqian Bi, Fanxu Meng, Xiang Wei, Lecheng Ruan, Qining Wang

    Abstract: Accurate representation of procedures in restricted scenarios, such as non-standardized scientific experiments, requires precise depiction of constraints. Unfortunately, Domain-specific Language (DSL), as an effective tool to express constraints structurally, often requires case-by-case hand-crafting, necessitating customized, labor-intensive efforts. To overcome this challenge, we introduce the A…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL'24)

  21. arXiv:2406.11833  [pdf, other]

    cs.CV cs.AI cs.LG

    MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

    Authors: Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: Generating natural and meaningful responses to communicate with multi-modal human inputs is a fundamental capability of Large Vision-Language Models (LVLMs). While current open-source LVLMs demonstrate promising performance in simplified scenarios such as single-turn single-image input, they fall short in real-world conversation scenarios such as following instructions in a long context history wit…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This project is available at https://github.com/Liuziyu77/MMDU

  22. arXiv:2406.11185  [pdf, other]

    physics.chem-ph

    Acceleration without Disruption: DFT Software as a Service

    Authors: Fusong Ju, Xinran Wei, Lin Huang, Andrew J. Jenkins, Leo Xia, Jia Zhang, Jianwei Zhu, Han Yang, Bin Shao, Peggy Dai, Ashwin Mayya, Zahra Hooshmand, Alexandra Efimovskaya, Nathan A. Baker, Matthias Troyer, Hongbin Liu

    Abstract: Density functional theory (DFT) has been a cornerstone in computational chemistry, physics, and materials science for decades, benefiting from advancements in computational power and theoretical methods. This paper introduces a novel, cloud-native application, Accelerated DFT, which offers an order of magnitude acceleration in DFT simulations. By integrating state-of-the-art cloud infrastructure a…

    Submitted 16 June, 2024; originally announced June 2024.

  23. arXiv:2406.10527  [pdf, other]

    cs.CV

    Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center

    Authors: Zichen Yu, Changyong Shu, Qianpu Sun, Junjie Linghu, Xiaobao Wei, Jiangyong Yu, Zongdai Liu, Dawei Yang, Hui Li, Yan Chen

    Abstract: Panoptic occupancy poses a novel challenge by aiming to integrate instance occupancy and semantic occupancy within a unified framework. However, there is still a lack of efficient solutions for panoptic occupancy. In this paper, we propose Panoptic-FlashOcc, a straightforward yet robust 2D feature framework that enables realtime panoptic occupancy. Building upon the lightweight design of FlashOcc,…

    Submitted 15 June, 2024; originally announced June 2024.

  24. arXiv:2406.10193  [pdf]

    cond-mat.str-el cond-mat.supr-con

    Three-dimensional quantum Griffiths singularity in bulk iron-pnictide superconductors

    Authors: Shao-Bo Liu, Congkuan Tian, Yongqing Cai, Hang Cui, Xinjian Wei, Mantang Chen, Yang Zhao, Yuan Sui, Shuyue Guan, Shuang Jia, Yu Zhang, Ya Feng, Jiankun Li, Jian Cui, Yuanjun Song, Tingting Hao, Chaoyu Chen, Jian-Hao Chen

    Abstract: The quantum Griffiths singularity (QGS) is a phenomenon driven by quenched disorders that break conventional scaling invariance and result in a divergent dynamical critical exponent during quantum phase transitions (QPT). While this phenomenon has been well-documented in low-dimensional conventional superconductors and in three-dimensional (3D) magnetic metal systems, its presence in 3D supercondu…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 17 pages, 4 figures

  25. arXiv:2406.08418  [pdf, other]

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an…

    Submitted 12 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  26. arXiv:2406.07175  [pdf, ps, other]

    cond-mat.mtrl-sci

    Phase Diagram of growth modes in Graphene Growth on Cooper by Vapor Deposition

    Authors: Tongtong Wang, Jian Zheng, Xin Wei, Dajun Shu

    Abstract: Understanding the atomistic mechanism in graphene growth is crucial for controlling the number of layers or domain sizes to meet practical needs. In this work, focusing on the growth of graphene by chemical vapor deposition on copper substrates, the surface kinetics in the growth are systematically investigated by first-principles calculations. The phase diagram, predicting whether the growth mode…

    Submitted 11 June, 2024; originally announced June 2024.

  27. arXiv:2406.07057  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

    Authors: Yichi Zhang, Yao Huang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Yifan Wang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu

    Abstract: Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges. Yet, current literature on the assessment of trustworthy MLLMs remains limited, lacking a holistic evaluation to offer thorough insights into future improvements. In this work, we establish MultiTrust, the first comprehensive and unified benchm…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 100 pages, 84 figures, 33 tables

  28. arXiv:2406.05485  [pdf, other]

    cs.CV

    Training-Free Robust Interactive Video Object Segmentation

    Authors: Xiaoli Wei, Zhaoqing Wang, Yandong Guo, Chunxia Zhang, Tongliang Liu, Mingming Gong

    Abstract: Interactive video object segmentation is a crucial video task, having various applications from video editing to data annotating. However, current approaches struggle to accurately segment objects across diverse domains. Recently, Segment Anything Model (SAM) introduces interactive visual prompts and demonstrates impressive performance across different domains. In this paper, we propose a training…

    Submitted 8 June, 2024; originally announced June 2024.

  29. arXiv:2406.04743  [pdf, other]

    cs.LG cs.CR cs.DC stat.AP

    When Swarm Learning meets energy series data: A decentralized collaborative learning design based on blockchain

    Authors: Lei Xu, Yulong Chen, Yuntian Chen, Longfeng Nie, Xuetao Wei, Liang Xue, Dongxiao Zhang

    Abstract: Machine learning models offer the capability to forecast future energy production or consumption and infer essential unknown variables from existing data. However, legal and policy constraints within specific energy sectors render the data sensitive, presenting technical hurdles in utilizing data from diverse sources. Therefore, we propose adopting a Swarm Learning (SL) scheme, which replaces the…

    Submitted 7 June, 2024; originally announced June 2024.

  30. arXiv:2406.04325  [pdf, other]

    cs.CV

    ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

    Authors: Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang

    Abstract: We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V annotated dense captions of videos with various lengths and sources, developed through carefully designed data filtering and annotating st…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://sharegpt4video.github.io/

  31. arXiv:2406.03794  [pdf, other]

    cs.LG

    Infusing Self-Consistency into Density Functional Theory Hamiltonian Prediction via Deep Equilibrium Models

    Authors: Zun Wang, Chang Liu, Nianlong Zou, He Zhang, Xinran Wei, Lin Huang, Lijun Wu, Bin Shao

    Abstract: In this study, we introduce a unified neural network architecture, the Deep Equilibrium Density Functional Theory Hamiltonian (DEQH) model, which incorporates Deep Equilibrium Models (DEQs) for predicting Density Functional Theory (DFT) Hamiltonians. The DEQH model inherently captures the self-consistency nature of Hamiltonian, a critical aspect often overlooked by traditional machine learning app…

    Submitted 6 June, 2024; originally announced June 2024.

  32. arXiv:2405.20323  [pdf, other]

    cs.CV cs.AI

    $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

    Authors: Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang

    Abstract: Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving. Despite the efficacy of Neural Radiance Fields (NeRF) for driving scenes, 3D Gaussian Splatting (3DGS) emerges as a promising direction due to its faster speed and more explicit representation. However, most existing street 3DGS methods require tracked 3D vehicle b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/nnanhuang/S3Gaussian/

  33. arXiv:2405.18860  [pdf, other]

    cs.RO

    Empowering Embodied Manipulation: A Bimanual-Mobile Robot Manipulation Dataset for Household Tasks

    Authors: Tianle Zhang, Dongjiang Li, Yihang Li, Zecui Zeng, Lin Zhao, Lei Sun, Yue Chen, Xuelong Wei, Yibing Zhan, Lusong Li, Xiaodong He

    Abstract: The advancements in embodied AI are increasingly enabling robots to tackle complex real-world tasks, such as household manipulation. However, the deployment of robots in these environments remains constrained by the lack of comprehensive bimanual-mobile robot manipulation data that can be learned. Existing datasets predominantly focus on single-arm manipulation tasks, while the few dual-arm datase…

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  34. arXiv:2405.18729  [pdf, other]

    cs.LG cs.AI

    Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

    Authors: Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

    Abstract: Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for offline RL issues. However, previous offline RL algorithms based on diffusion policies generally adopt weighted regression to improve the policy. This approach opt…

    Submitted 28 May, 2024; originally announced May 2024.

  35. arXiv:2405.18361  [pdf, other]

    cs.CV

    Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

    Authors: Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang

    Abstract: Rapid advancements in Autonomous Driving (AD) tasks turned a significant shift toward end-to-end fashion, particularly in the utilization of vision-language models (VLMs) that integrate robust logical reasoning and cognitive abilities to enable comprehensive end-to-end planning. However, these VLM-based approaches tend to integrate 2D vision tokenizers and a large language model (LLM) for ego-car…

    Submitted 28 May, 2024; originally announced May 2024.

  36. arXiv:2405.17935  [pdf, other]

    cs.CL cs.AI

    Tool Learning with Large Language Models: A Survey

    Authors: Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, Ji-Rong Wen

    Abstract: Recently, tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization, posing barriers to entry for newcomers. This gap motivates us to conduct a comprehensive…

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  37. arXiv:2405.16865  [pdf, other]

    q-bio.NC cs.LG stat.ML

    An Investigation of Conformal Isometry Hypothesis for Grid Cells

    Authors: Dehong Xu, Ruiqi Gao, Wen-Hao Zhang, Xue-Xin Wei, Ying Nian Wu

    Abstract: This paper investigates the conformal isometry hypothesis as a potential explanation for the emergence of hexagonal periodic patterns in the response maps of grid cells. The hypothesis posits that the activities of the population of grid cells form a high-dimensional vector in the neural space, representing the agent's self-position in 2D physical space. As the agent moves in the 2D physical space…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.19192

  38. arXiv:2405.16089  [pdf, other]

    cs.CL cs.IR

    COLT: Towards Completeness-Oriented Tool Retrieval for Large Language Models

    Authors: Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, Ji-Rong Wen

    Abstract: Recently, the integration of external tools with Large Language Models (LLMs) has emerged as a promising approach to overcome the inherent constraints of their pre-training data. However, real-world applications often involve a diverse range of tools, making it infeasible to incorporate all tools directly into LLMs due to constraints on input length and response time. Therefore, to fully exploit th…

    Submitted 25 May, 2024; originally announced May 2024.

  39. arXiv:2405.14702  [pdf, other]

    cs.CV cs.AI

    G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

    Authors: Pengyue Jia, Yiding Liu, Xiaopeng Li, Xiangyu Zhao, Yuhao Wang, Yantong Du, Xiao Han, Xuetao Wei, Shuaiqiang Wang, Dawei Yin

    Abstract: Worldwide geolocalization aims to locate the precise location at the coordinate level of photos taken anywhere on the Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the heterogeneous geographical distribution of image data. As a result, existing studies have clear limitations when scaled to a worldwide context. They may easily con…

    Submitted 23 May, 2024; originally announced May 2024.

  40. arXiv:2405.12079  [pdf, other]

    cs.DC cs.OS

    PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation

    Authors: Zhuobin Huang, Xingda Wei, Yingyi Hao, Rong Chen, Mingcong Han, Jinyu Gu, Haibo Chen

    Abstract: Checkpointing (C) and restoring (R) are key components for GPU tasks. POS is an OS-level GPU C/R system: It can transparently checkpoint or restore processes that use the GPU, without requiring any cooperation from the application, a key feature required by modern systems like the cloud. Moreover, POS is the first OS-level C/R system that can concurrently execute C/R with the application execution…

    Submitted 20 May, 2024; originally announced May 2024.

  41. arXiv:2405.11335  [pdf, other]

    cs.CR

    Detecting Complex Multi-step Attacks with Explainable Graph Neural Network

    Authors: Wei Liu, Peng Gao, Haotian Zhang, Ke Li, Weiyong Yang, Xingshen Wei, Jiwu Shu

    Abstract: Complex multi-step attacks have caused significant damage to numerous critical infrastructures. To detect such attacks, graph neural network based methods have shown promising results by modeling the system's events as a graph. However, existing methods still face several challenges when deployed in practice. First, there is a lack of sufficient real attack data, especially considering the large vo…

    Submitted 13 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: Corresponding author: Peng Gao (gao.itslab@gmail.com)

  42. arXiv:2405.11236  [pdf, other]

    cs.CV

    TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation

    Authors: Chengcheng Feng, Mu He, Qiuyu Tian, Haojie Yin, Xiaofang Zhao, Hongwei Tang, Xingqiang Wei

    Abstract: As deep learning technology continues to advance, image generation models, especially models like Stable Diffusion, are finding increasingly widespread application in visual arts creation. However, these models often face challenges such as overfitting, lack of stability in generated results, and difficulties in accurately capturing the features desired by creators during the fine-tuning process.…

    Submitted 13 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  43. arXiv:2405.09150  [pdf, other]

    cs.CV

    Curriculum Dataset Distillation

    Authors: Zhiheng Ma, Anjia Cao, Funing Yang, Xing Wei

    Abstract: Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. In this paper, we present a curriculum-based dataset distillation framework designed to harmonize scalability with efficiency. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incor…

    Submitted 15 May, 2024; originally announced May 2024.

  44. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  45. arXiv:2405.08340  [pdf, other]

    cs.CR cs.CV

    Achieving Resolution-Agnostic DNN-based Image Watermarking: A Novel Perspective of Implicit Neural Representation

    Authors: Yuchen Wang, Xingyu Zhu, Guanhui Ye, Shiyao Zhang, Xuetao Wei

    Abstract: DNN-based watermarking methods are rapidly developing and delivering impressive performances. Recent advances achieve resolution-agnostic image watermarking by reducing the variant resolution watermarking problem to a fixed resolution watermarking problem. However, such a reduction process can potentially introduce artifacts and low robustness. To address this issue, we propose the first, to the b…

    Submitted 14 May, 2024; originally announced May 2024.

  46. arXiv:2405.08245  [pdf]

    cs.CV cs.AI

    Progressive enhancement and restoration for mural images under low-light and defected conditions based on multi-receptive field strategy

    Authors: Xiameng Wei, Binbin Fan, Ying Wang, Yanxiang Feng, Laiyi Fu

    Abstract: Ancient murals are valuable cultural heritage with great archaeological value. Through their content, they provide insights into ancient religions, ceremonies, and folklore, among other things. However, due to long-term oxidation and inadequate protection, ancient murals have suffered continuous damage, including peeling and mold. Additionally, since ancient murals were typically painted indoors, t…

    Submitted 13 May, 2024; originally announced May 2024.

  47. Selective Focus: Investigating Semantics Sensitivity in Post-training Quantization for Lane Detection

    Authors: Yunqian Fan, Xiuying Wei, Ruihao Gong, Yuqing Ma, Xiangguo Zhang, Qi Zhang, Xianglong Liu

    Abstract: Lane detection (LD) plays a crucial role in enhancing the L2+ capabilities of autonomous driving, capturing widespread attention. Post-Training Quantization (PTQ) could facilitate the practical application of LD models, enabling fast speeds and limited memories without labeled data. However, prior PTQ methods do not consider the complex LD outputs that contain physical semantics, such as off…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted by AAAI-24

    Journal ref: AAAI 2024, 38, 11936-11943

  48. arXiv:2405.05808  [pdf, other]

    cs.CV

    Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes

    Authors: Ruihao Gong, Yang Yong, Zining Wang, Jinyang Guo, Xiuying Wei, Yuqing Ma, Xianglong Liu

    Abstract: Neural network sparsity has attracted much research interest due to its similarity to biological schemes and high energy efficiency. However, existing methods depend on long-time training or fine-tuning, which prevents large-scale applications. Recently, some works focusing on post-training sparsity (PTS) have emerged. They get rid of the high training cost but usually suffer from distinct accura…

    Submitted 9 May, 2024; originally announced May 2024.

  49. arXiv:2405.05538  [pdf, other]

    cs.CV

    A Survey on Personalized Content Synthesis with Diffusion Models

    Authors: Xulu Zhang, Xiao-Yong Wei, Wengyu Zhang, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li

    Abstract: Recent advancements in generative models have significantly impacted content creation, leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user-provided examples, PCS aims to customize the subject of interest to specific user-defined prompts. Over the past two years, more than 150 methods have been proposed. However, existing surveys mainly focus on text-to-image…

    Submitted 9 May, 2024; originally announced May 2024.

  50. arXiv:2405.04765  [pdf, other]

    cs.LG cs.AI cs.DC

    When Foresight Pruning Meets Zeroth-Order Optimization: Efficient Federated Learning for Low-Memory Devices

    Authors: Pengyu Zhang, Yingjie Liu, Yingbo Zhou, Xiao Du, Xian Wei, Ting Wang, Mingsong Chen

    Abstract: Although Federated Learning (FL) enables collaborative learning in Artificial Intelligence of Things (AIoT) design, it fails to work on low-memory AIoT devices due to its heavy memory usage. To address this problem, various federated pruning methods are proposed to reduce memory usage during inference. However, few of them can substantially mitigate the memory burdens during pruning and training.…

    Submitted 7 May, 2024; originally announced May 2024.