Skip to main content

Showing 1–50 of 895 results for author: Zhong, Y

  1. arXiv:2407.13746  [pdf, ps, other

    cs.LG stat.ML

    Multi-Label Learning with Stronger Consistency Guarantees

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of surrogate losses and algorithms for multi-label learning, supported by $H$-consistency bounds. We first show that, for the simplest form of multi-label loss (the popular Hamming loss), the well-known consistent binary relevance surrogate suffers from a sub-optimal dependency on the number of labels in terms of $H$-consistency bounds, when using smooth losses such as… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13732  [pdf, other

    cs.LG stat.ML

    Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a comprehensive study of surrogate loss functions for learning to defer. We introduce a broad family of surrogate losses, parameterized by a non-increasing function $Ψ$, and establish their realizable $H$-consistency under mild conditions. For cost functions based on classification error, we further show that these losses admit $H$-consistency bounds when the hypothesis set is symmetric… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.13722  [pdf, ps, other

    cs.LG stat.ML

    Enhanced $H$-Consistency Bounds

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: Recent research has introduced a key notion of $H$-consistency bounds for surrogate losses. These bounds offer finite-sample guarantees, quantifying the relationship between the zero-one estimation error (or other target loss) and the surrogate loss estimation error for a specific hypothesis set. However, previous bounds were derived under the condition that a lower bound of the surrogate loss con… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13509  [pdf, other

    cs.SD cs.CL cs.LG eess.AS

    Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models

    Authors: Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng

    Abstract: Spontaneous style speech synthesis, which aims to generate human-like speech, often encounters challenges due to the scarcity of high-quality data and limitations in model capabilities. Recent language model-based TTS systems can be trained on large, diverse, and low-quality speech datasets, resulting in highly natural synthesized speech. However, they are limited by the difficulty of simulating v… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by INTERSPEECH 2024

  5. arXiv:2407.10979  [pdf, ps, other

    cs.NI

    Diffusion Model-based Incentive Mechanism with Prospect Theory for Edge AIGC Services in 6G IoT

    Authors: Jinbo Wen, Jiangtian Nie, Yue Zhong, Changyan Yi, Xiaohuan Li, Jiangming Jin, Yang Zhang, Dusit Niyato

    Abstract: The fusion of Internet of Things (IoT) with Sixth-Generation (6G) technology has significant potential to revolutionize the IoT landscape. Utilizing the ultra-reliable and low-latency communication capabilities of 6G, 6G-IoT networks can transmit high-quality and diverse data to enhance edge learning. Artificial Intelligence-Generated Content (AIGC) harnesses advanced AI algorithms to automaticall… ▽ More

    Submitted 10 June, 2024; originally announced July 2024.

  6. arXiv:2407.09946  [pdf, other

    cs.CV

    Low-Rank Interconnected Adaptation Across Layers

    Authors: Yibo Zhong, Yao Zhou

    Abstract: Low-rank adaptation (LoRA), as one of the most well-known representative methods of parameter-efficient fine-tuning, freezes the backbone model and introduces parallel adapter modules to each layer of the model. These modules consist of two low-rank trainable matrices: a low-dimension projector (LP) and a high-dimension projector (HP) with their product approximating the change for updating the mo… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 13 pages, 4 figures

  7. arXiv:2407.08496  [pdf, ps, other

    math.DG math.GT

    Convergences of Combinatorial Ricci Flows to Degenerated Circle Packings in Hyperbolic Background Geometry

    Authors: Guangming Hu, Sicheng Lu, Dong Tan, Youliang Zhong, Puchun Zhou

    Abstract: This paper investigates a kind of degenerated circle packings in hyperbolic background geometry. A main problem is whether a prescribed total geodesic curvature data can be realized by a degenerated circle packing or not. We fully characterize the sufficient and necessary conditions and show the uniqueness. Furthermore, we introduce the combinatoral Ricci flow to find the desired degenerated circl… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 36 pages, 9 figures

    MSC Class: 52C26; 57M50

  8. arXiv:2407.08126  [pdf, other

    cs.AI cs.CV cs.MM

    Label-anticipated Event Disentanglement for Audio-Visual Video Parsing

    Authors: Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang

    Abstract: Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate events within audio and visual modalities. Multiple events can overlap in the timeline, making identification challenging. While traditional methods usually focus on improving the early audio-visual encoders to embed more effective features, the decoding phase -- crucial for final event classification, often receives less… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  9. arXiv:2407.07406  [pdf, other

    cs.CV cs.AI

    Weakly-supervised Medical Image Segmentation with Gaze Annotations

    Authors: Yuan Zhong, Chenhui Tang, Yumeng Yang, Ruoxi Qi, Kang Zhou, Yuqi Gong, Pheng Ann Heng, Janet H. Hsiao, Qi Dou

    Abstract: Eye gaze that reveals human observational patterns has increasingly been incorporated into solutions for vision tasks. Despite recent explorations on leveraging gaze to aid deep networks, few studies exploit gaze as an efficient annotation approach for medical image segmentation which typically entails heavy annotating costs. In this paper, we propose to collect dense weak supervision for medical… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  10. arXiv:2407.07140  [pdf, other

    cs.LG stat.ML

    Cardinality-Aware Set Prediction and Top-$k$ Classification

    Authors: Corinna Cortes, Anqi Mao, Christopher Mohri, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. We introduce a new target loss function tailored to this setting that accounts for both the classification error and the cardinality of the set predicted. To optimize this loss function, we propose two families of surrog… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.19625

  11. arXiv:2407.06794  [pdf, other

    cs.CV

    ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

    Authors: Yunshan Zhong, Jiawei Hu, You Huang, Yuxin Zhang, Rongrong Ji

    Abstract: Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the intricate interdependence between quantized weight and activation, leading to considerable quantization error. In this paper, we propose ERQ, a two-step PTQ approach meticulously crafted to sequentially redu… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ICML2024 (Spotlight)

  12. arXiv:2407.05220  [pdf, other

    cond-mat.str-el cond-mat.mes-hall cond-mat.mtrl-sci cond-mat.quant-gas

    Altermagnetism in Heavy Fermion Systems

    Authors: Miaomiao Zhao, Wei-Wei Yang, Xueming Guo, Hong-Gang Luo, Yin Zhong

    Abstract: Novel collinear magnet, the altermagnet (AM) with spin-splitting energy band and zero net magnetization have attracted great interest due to its potential spintronic applications. Here, we demonstrate AM-like phases in a microscopic Kondo lattice model, widely used for heavy fermion compounds. With the framework of fermionic parton mean-field theory, we find the $d$-wave AM state can coexist with… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures

  13. arXiv:2407.04941  [pdf, other

    astro-ph.HE

    Spindown of Pulsars Interacting with Companion Winds: Impact of Magnetospheric Compression

    Authors: Yici Zhong, Anatoly Spitkovsky, Jens F. Mahlmann, Hayk Hakobyan

    Abstract: The presence of a companion wind in neutron star binary systems can form a contact discontinuity well within the pulsar's light cylinder, effectively creating a waveguide that confines the pulsar's electromagnetic fields and significantly alters its spindown. We parametrize this confinement as the ratio between the equatorial position of the contact discontinuity (or standoff distance)… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 13 pages, 7 figures, comments are welcomed

  14. arXiv:2407.03992  [pdf, other

    eess.IV

    Performance of Medical Image Fusion in High-level Analysis Tasks: A Mutual Enhancement Framework for Unaligned PAT and MRI Image Fusion

    Authors: Yutian Zhong, Jinchuan He, Zhichao Liang, Shuangyang Zhang, Qianjin Feng, Wufan Chen, Li Qi

    Abstract: Photoacoustic tomography (PAT) offers optical contrast, whereas magnetic resonance imaging (MRI) excels in imaging soft tissue and organ anatomy. The fusion of PAT with MRI holds promising application prospects due to their complementary advantages. Existing image fusion have made considerable progress in pre-registered images, yet spatial deformations are difficult to avoid in medical imaging sce… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  15. arXiv:2407.02842  [pdf, other

    cs.CV cs.AI cs.CL

    MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

    Authors: Lei Chen, Feng Yan, Yujie Zhong, Shaoxiang Chen, Zequn Jie, Lin Ma

    Abstract: Multimodal Large Language Models (MLLM) have made significant progress in the field of document analysis. Despite this, existing benchmarks typically focus only on extracting text and simple layout information, neglecting the complex interactions between elements in structured documents such as mind maps and flowcharts. To address this issue, we introduce the new benchmark named MindBench, which n… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: technical report

  16. arXiv:2407.02785  [pdf

    cond-mat.mtrl-sci physics.comp-ph

    Identifying Direct Bandgap Silicon Structures with High-throughput Search and Machine Learning Methods

    Authors: Rui Wang, Hongyu Yu, Yang Zhong, Hongjun Xiang

    Abstract: Utilizations of silicon-based luminescent devices are restricted by the indirect-gap nature of diamond silicon. In this study, the high-throughput method is employed to expedite discoveries of direct-gap silicon crystals. The machine learning (ML) potential is utilized to construct a dataset comprising 2637 silicon allotropes, which is subsequently screened using an ML Hamiltonian model and densit… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  17. arXiv:2407.00983  [pdf, other

    cs.CV

    FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models

    Authors: Ruinan Jin, Zikang Xu, Yuan Zhong, Qiongsong Yao, Qi Dou, S. Kevin Zhou, Xiaoxiao Li

    Abstract: The advent of foundation models (FMs) in healthcare offers unprecedented opportunities to enhance medical diagnostics through automated classification and segmentation tasks. However, these models also raise significant concerns about their fairness, especially when applied to diverse and underrepresented populations in healthcare applications. Currently, there is a lack of comprehensive benchmark… ▽ More

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 29 pages, 17 figures

  18. arXiv:2407.00765  [pdf, other

    cs.LG cs.NE math.NA stat.ML

    Structured and Balanced Multi-component and Multi-layer Neural Networks

    Authors: Shijun Zhang, Hongkai Zhao, Yimin Zhong, Haomin Zhou

    Abstract: In this work, we propose a balanced multi-component and multi-layer neural network (MMNN) structure to approximate functions with complex features with both accuracy and efficiency in terms of degrees of freedom and computation cost. The main idea is motivated by a multi-component, each of which can be approximated effectively by a single-layer network, and multi-layer decomposition in a "divide-a… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Our codes and implementation details are available at https://github.com/ShijunZhangMath/MMNN

  19. arXiv:2406.18173  [pdf, other

    cs.CL

    UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs

    Authors: Wenhao Li, Mingbao Lin, Yunshan Zhong, Shuicheng Yan, Rongrong Ji

    Abstract: Managing long texts is challenging for large language models (LLMs) due to limited context window sizes. This study introduces UIO-LLMs, an unbiased incremental optimization approach for memory-enhanced transformers under long-context settings. We initially conceptualize the process as a streamlined encoder-decoder framework where the weights-shared encoder and decoder respectively encapsulate a c… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  20. arXiv:2406.18018  [pdf, other

    eess.IV

    A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

    Authors: Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

    Abstract: Recently, intelligent analysis of lung nodules with the assistant of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in im… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  21. arXiv:2406.17998  [pdf, other

    cs.CV

    Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model

    Authors: Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong

    Abstract: Our understanding of the temporal dynamics of the Earth's surface has been advanced by deep vision models, which often require lots of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present change data generators based on gene… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: The enhanced extension of our ICCV 2023 (Changen)

  22. arXiv:2406.17894  [pdf, other

    cs.LG

    Efficient and Effective Implicit Dynamic Graph Neural Network

    Authors: Yongjian Zhong, Hieu Vu, Tianbao Yang, Bijaya Adhikari

    Abstract: Implicit graph neural networks have gained popularity in recent years as they capture long-range dependencies while improving predictive performance in static graphs. Despite the tussle between performance degradation due to the oversmoothing of learned embeddings and long-range dependency being more pronounced in dynamic graphs, as features are aggregated both across neighborhood and time, no pri… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  23. arXiv:2406.16690  [pdf, other

    cs.CL

    Scaling Laws for Linear Complexity Language Models

    Authors: Xuyang Shen, Dong Li, Ruitao Leng, Zhen Qin, Weigao Sun, Yiran Zhong

    Abstract: The interest in linear complexity models for large language models is on the rise, although their scaling capacity remains uncertain. In this study, we present the scaling laws for linear complexity language models to establish a foundation for their scalability. Specifically, we examine the scaling behaviors of three efficient linear architectures. These include TNL, a linear attention model with… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author

  24. Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection

    Authors: Zhuo Zheng, Yanfei Zhong, Ailong Ma, Liangpei Zhang

    Abstract: Bitemporal supervised learning paradigm always dominates remote sensing change detection using numerous labeled bitemporal image pairs, especially for high spatial resolution (HSR) remote sensing imagery. However, it is very expensive and labor-intensive to label change regions in large-scale bitemporal HSR remote sensing image pairs. In this paper, we propose single-temporal supervised learning (… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: IJCV 2024. arXiv admin note: text overlap with arXiv:2108.07002

  25. arXiv:2406.15306  [pdf

    cs.LG cs.CL cs.CV

    Advanced Multimodal Deep Learning Architecture for Image-Text Matching

    Authors: Jinyin Wang, Haijing Zhang, Yihao Zhong, Yingbin Liang, Rongwei Ji, Yiru Cang

    Abstract: Image-text matching is a key multimodal task that aims to model the semantic association between images and text as a matching relationship. With the advent of the multimedia information age, image, and text data show explosive growth, and how to accurately realize the efficient and accurate semantic correspondence between them has become the core issue of common concern in academia and industry.… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.17460 by other authors

  26. arXiv:2406.15000  [pdf, other

    cs.CL cs.AI

    Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

    Authors: Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan

    Abstract: Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We cond… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  27. arXiv:2406.14069  [pdf, other

    eess.IV cs.CV

    Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classifica… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  28. arXiv:2406.13942  [pdf, other

    cs.LG

    Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models

    Authors: Yuan Zhong, Xiaochen Wang, Jiaqi Wang, Xiaokun Zhang, Yaqing Wang, Mengdi Huai, Cao Xiao, Fenglong Ma

    Abstract: Synthesizing electronic health records (EHR) data has become a preferred strategy to address data scarcity, improve data quality, and model fairness in healthcare. However, existing approaches for EHR data generation predominantly rely on state-of-the-art generative techniques like generative adversarial networks, variational autoencoders, and language models. These methods typically replicate inp… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  29. arXiv:2406.13524  [pdf, ps, other

    math.CV math.DS

    Koebe uniformization for infinitely connected attracting Fatou domains

    Authors: Xiaoguang Wang, Yi Zhong

    Abstract: This paper works on the structure of infinitely connected Fatou damains of rational maps in terms of Koebe uniformization. Due to the complicated boundary behavior, the existing uniformization results are failed to apply in general. We proved that if the rational map is geometrically finite, then its infinitely connected attracting Fatou damain is conformally homeomorphic to a circle domain.

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

    MSC Class: 30C20(Primary); 30C35(Secondary)

  30. arXiv:2406.13078  [pdf

    physics.med-ph

    A universal bioluminescence tomography system for pre-clinical image-guided radiotherapy research

    Authors: Zhishen Tong, Zijian Deng, Xiangkun Xu, Ciara Newman, Xun Jia, Yuncheng Zhong, Merle Reinhart, Paul Tsouchlos, Tim Devling, Hamid Dehghani, Iulian Iordachita, Debabrata Saha, John W. Wong, Ken Kang-Hsin Wang

    Abstract: CBCT-guided small animal irradiators encounter challenges in localizing soft-tissue targets due to low imaging contrast. Bioluminescence tomography (BLT) offers a promising solution, but they have largely remained in laboratorial development, limiting accessibility for researchers. In this work, we develop a universal, commercial-graded BLT-guided system (MuriGlo) designed to seamlessly integrate… ▽ More

    Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  31. arXiv:2406.11194  [pdf, other

    cs.CL

    In-Context Editing: Learning Knowledge from Self-Induced Distributions

    Authors: Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng

    Abstract: The existing fine-tuning paradigm for language models is brittle in knowledge editing scenarios, where the model must incorporate new information without extensive retraining. This brittleness often results in overfitting, reduced performance, and unnatural language generation. To address this, we propose Consistent In-Context Editing (ICE), a novel approach that leverages the model's in-context l… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  32. arXiv:2406.10514  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

    Authors: Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong, Pinxin Liu, Zhiyao Duan

    Abstract: Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically,… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  33. arXiv:2406.10457  [pdf, other

    quant-ph

    Noise-induced quantum synchronization and maximally entangled mixed states in superconducting circuits

    Authors: Ziyu Tao, Finn Schmolke, Chang-Kang Hu, Wenhui Huang, Yuxuan Zhou, Jiawei Zhang, Ji Chu, Libo Zhang, Xuandong Sun, Zecheng Guo, Jingjing Niu, Wenle Weng, Song Liu, Youpeng Zhong, Dian Tan, Dapeng Yu, Eric Lutz

    Abstract: Random fluctuations can lead to cooperative effects in complex systems. We here report the experimental observation of noise-induced quantum synchronization in a chain of superconducting transmon qubits with nearest-neighbor interactions. The application of Gaussian white noise to a single site leads to synchronous oscillations in the entire chain. We show that the two synchronized end qubits are… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  34. arXiv:2406.07490  [pdf, other

    astro-ph.SR physics.plasm-ph

    Transition from decaying to decayless kink oscillations of solar coronal loops

    Authors: Valery M. Nakariakov, Yu Zhong, Dmitrii Y. Kolotkov

    Abstract: The transition of an impulsively excited kink oscillation of a solar coronal loop to an oscillation with a stationary amplitude, i.e., the damping pattern, is determined using the low-dimensional self-oscillation model. In the model, the decayless kink oscillations are sustained by the interaction of the oscillating loop with an external quasi-steady flow. The analytical solution is based on the a… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in MNRAS, 8 pages, 5 figures

  35. arXiv:2406.06858  [pdf, other

    cs.LG cs.DC

    FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

    Authors: Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu

    Abstract: Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor, and/or to accelerate computation… ▽ More

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  36. arXiv:2406.03075  [pdf, other

    cs.CL

    Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

    Authors: Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan

    Abstract: The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial val… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 3 figures

  37. arXiv:2406.00919  [pdf, other

    cs.CV cs.MM

    Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling

    Authors: Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang

    Abstract: The Audio-Visual Video Parsing task aims to identify and temporally localize the events that occur in either or both the audio and visual streams of audible videos. It often performs in a weakly-supervised manner, where only video event labels are provided, \ie, the modalities and the timestamps of the labels are unknown. Due to the lack of densely annotated labels, recent work attempts to leverag… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: IJCV 2024 Accepted. arXiv admin note: substantial text overlap with arXiv:2303.02344

  38. arXiv:2406.00491  [pdf, other

    cs.NI

    Optimizing Age of Information in Random Access Networks: A Second-Order Approach for Active/Passive Users

    Authors: Siqi Fan, Yuxin Zhong, I-Hong Hou, Clement K Kam

    Abstract: In this paper, we study the moments of the Age of Information (AoI) for both active and passive users in a random access network. In this network, active users broadcast sensing data, while passive users detect in-band radio activities from out-of-network devices, such as jammers. Collisions occur when multiple active users transmit simultaneously. Passive users can detect radio activities only wh… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transaction on Communications. arXiv admin note: text overlap with arXiv:2305.05137

  39. arXiv:2405.21022  [pdf, other

    cs.CL cs.CV

    You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet

    Authors: Zhen Qin, Yuxin Mao, Xuyang Shen, Dong Li, Jing Zhang, Yuchao Dai, Yiran Zhong

    Abstract: Linear attention mechanisms have gained prominence in causal language models due to their linear computational complexity and enhanced speed. However, the inherent decay mechanism in linear attention presents challenges when applied to multi-dimensional sequence modeling tasks, such as image processing and multi-modal learning. In these scenarios, the utilization of sequential scanning to establis… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author. The code is available at https://github.com/OpenNLPLab/LightNet

  40. arXiv:2405.17383  [pdf, other

    cs.CL

    Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

    Authors: Zhen Qin, Xuyang Shen, Dong Li, Weigao Sun, Stan Birchfield, Richard Hartley, Yiran Zhong

    Abstract: We present the Linear Complexity Sequence Model (LCSM), a comprehensive solution that unites various sequence modeling techniques with linear complexity, including linear attention, state space model, long convolution, and linear RNN, within a single framework. The goal is to enhance comprehension of these models by analyzing the impact of each component from a cohesive and streamlined viewpoint.… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author

  41. arXiv:2405.17381  [pdf, other

    cs.CL

    Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

    Authors: Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

    Abstract: We present Lightning Attention, the first linear attention implementation that maintains a constant training speed for various sequence lengths under fixed memory consumption. Due to the issue with cumulative summation operations (cumsum), previous linear attention implementations cannot achieve their theoretical advantage in a casual setting. However, this issue can be effectively solved by utili… ▽ More

    Submitted 20 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024. Yiran Zhong is the corresponding author. Code is released at github.com/OpenNLPLab/TransnormerLLM

  42. arXiv:2405.15542  [pdf, other

    cs.NI cs.DC cs.LG eess.SP

    SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing

    Authors: Haoxuan Yuan, Zhe Chen, Zheng Lin, Jinbo Peng, Zihan Fang, Yuhang Zhong, Zihang Song, Yue Gao

    Abstract: Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, limited spectrum resources will not be allocated enough. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential.… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 13 pages, 16 figures

  43. arXiv:2405.14582  [pdf, other

    cs.CV cs.AI

    PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control

    Authors: Yong Zhong, Min Zhao, Zebin You, Xiaofeng Yu, Changwang Zhang, Chongxuan Li

    Abstract: In this paper, we introduce PoseCrafter, a one-shot method for personalized video generation following the control of flexible poses. Built upon Stable Diffusion and ControlNet, we carefully design an inference process to produce high-quality videos without the corresponding ground-truth frames. First, we select an appropriate reference frame from the training video and invert it to initialize all… ▽ More

    Submitted 18 July, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  44. arXiv:2405.13967  [pdf, other

    cs.CL

    DeTox: Toxic Subspace Projection for Model Editing

    Authors: Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, Junjie Hu

    Abstract: Recent alignment algorithms such as direct preference optimization (DPO) have been developed to improve the safety of large language models (LLMs) by training these models to match human behaviors exemplified by preference data. However, these methods are both computationally intensive and lacking in controllability and transparency, making them prone to jailbreaking and inhibiting their widesprea… ▽ More

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Preprint

  45. arXiv:2405.11349  [pdf, other

    cs.LG

    Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection

    Authors: Xingyu Wu, Yan Zhong, Jibin Wu, Yuxiao Huang, Sheng-hao Wu, Kay Chen Tan

    Abstract: In the algorithm selection research, the discussion surrounding algorithm features has been significantly overshadowed by the emphasis on problem features. Although a few empirical studies have yielded evidence regarding the effectiveness of algorithm features, the potential benefits of incorporating algorithm features into algorithm selection models and their suitability for different scenarios r… ▽ More

    Submitted 3 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  46. arXiv:2405.05968  [pdf, other

    cs.LG stat.ML

    A Universal Growth Rate for Learning with Smooth Surrogate Losses

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification. We prove a square-root growth rate near zero for smooth margin-based surrogate losses in binary classification, providing both upper and lower bounds under mild assumptions. This result also translates to excess error bounds. Our l… ▽ More

    Submitted 8 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  47. arXiv:2405.03091  [pdf

    cs.CV cs.LG

    Research on Image Recognition Technology Based on Multimodal Deep Learning

    Authors: Jinyin Wang, Xingchen Li, Yixuan Jin, Yihao Zhong, Keke Zhang, Chang Zhou

    Abstract: This project investigates the human multi-modal behavior identification algorithm utilizing deep neural networks. According to the characteristics of different modal information, different deep neural networks are used to adapt to different modal video information. Through the integration of various deep neural networks, the algorithm successfully identifies behaviors across multiple modalities. I… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  48. arXiv:2405.03025  [pdf, other

    cs.CV

    Matten: Video Generation with Mamba-Attention

    Authors: Yu Gao, Jiancheng Huang, Xiaopeng Sun, Zequn Jie, Yujie Zhong, Lin Ma

    Abstract: In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation. With minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten has competitive performance with the… ▽ More

    Submitted 10 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  49. arXiv:2405.02660  [pdf, other

    cs.IT eess.SP

    AFDM Channel Estimation in Multi-Scale Multi-Lag Channels

    Authors: Rongyou Cao, Yuheng Zhong, Jiangbin Lyu, Deqing Wang, Liqun Fu

    Abstract: Affine Frequency Division Multiplexing (AFDM) is a brand new chirp-based multi-carrier (MC) waveform for high mobility communications, with promising advantages over Orthogonal Frequency Division Multiplexing (OFDM) and other MC waveforms. Existing AFDM research focuses on wireless communication at high carrier frequency (CF), which typically considers only Doppler frequency shift (DFS) as a resul… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures. Investigate AFDM under underwater multi-scale multi-lag channels. Derive the new input-output formula with the impact of Doppler time scaling. Propose two new channel estimation methods to tackle different level of Doppler factors. Perform diversity analyis based on CFR overlap probability (COP) and mutual incoherent property (MIP)

  50. arXiv:2405.01312  [pdf, other

    cs.DB cs.CR

    Privacy-Enhanced Database Synthesis for Benchmark Publishing

    Authors: Yongrui Zhong, Yunqing Ge, Jianbin Qin, Shuyuan Zheng, Bo Tang, Yu-Xuan Qiu, Rui Mao, Ye Yuan, Makoto Onizuka, Chuan Xiao

    Abstract: Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating syn… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.