
Showing 1–50 of 180 results for author: Xu, C

  1. arXiv:2407.07554  [pdf, other]

    cs.GR cs.SD eess.AS

    Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation

    Authors: Zikai Huang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Chenxi Zheng, Jing Qin, Shengfeng He

    Abstract: Dance, as an art form, fundamentally hinges on the precise synchronization with musical beats. However, achieving aesthetically pleasing dance sequences from music is challenging, with existing methods often falling short in controllability and beat alignment. To address these shortcomings, this paper introduces Beat-It, a novel framework for beat-specific, key pose-guided dance generation. Unlike…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  2. arXiv:2407.03050  [pdf, other]

    eess.SP

    Semantic-Aware Power Allocation for Generative Semantic Communications with Foundation Models

    Authors: Chunmei Xu, Mahdi Boloursaz Mashhadi, Yi Ma, Rahim Tafazolli

    Abstract: Recent advancements in diffusion models have made a significant breakthrough in generative modeling. The combination of the generative model and semantic communication (SemCom) enables high-fidelity semantic information exchange at ultra-low rates. A novel generative SemCom framework for image tasks is proposed, wherein pre-trained foundation models serve as semantic encoders and decoders for sema…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.00467  [pdf, other]

    cs.LG cs.DC eess.IV

    VcLLM: Video Codecs are Secretly Tensor Codecs

    Authors: Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills

    Abstract: As the parameter size of large language models (LLMs) continues to expand, the need for a large memory footprint and high communication bandwidth have become significant bottlenecks for the training and inference of LLMs. To mitigate these bottlenecks, various tensor compression techniques have been proposed to reduce the data size, thereby alleviating memory requirements and communication pressur…

    Submitted 29 June, 2024; originally announced July 2024.

  4. arXiv:2406.18201  [pdf, other]

    eess.IV cs.CV

    EFCNet: Every Feature Counts for Small Medical Object Segmentation

    Authors: Lingjie Kong, Qiaoling Wei, Chengming Xu, Han Chen, Yanwei Fu

    Abstract: This paper explores the segmentation of very small medical objects with significant clinical value. While Convolutional Neural Networks (CNNs), particularly UNet-like models, and recent Transformers have shown substantial progress in image segmentation, our empirical findings reveal their poor performance in segmenting the small medical objects and lesions concerned in this paper. This limitation…

    Submitted 26 June, 2024; originally announced June 2024.

  5. arXiv:2406.17173  [pdf, other]

    eess.IV cs.CV cs.LG

    Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks

    Authors: Zihao Jin, Yingying Fang, Jiahao Huang, Caiwen Xu, Simon Walsh, Guang Yang

    Abstract: The manifestation of symptoms associated with lung diseases can vary in different depths for individual patients, highlighting the significance of 3D information in CT scans for medical image classification. While Vision Transformer has shown superior performance over convolutional neural networks in image classification tasks, their effectiveness is often demonstrated on sufficiently large 2D dat…

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: conference

  6. arXiv:2406.16896  [pdf, other]

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due…

    Submitted 15 May, 2024; originally announced June 2024.

  7. arXiv:2406.16200  [pdf, other]

    cs.LG cs.CR cs.IT eess.SP

    Towards unlocking the mystery of adversarial fragility of neural networks

    Authors: Jingchao Gao, Raghu Mudumbai, Xiaodong Wu, Jirong Yi, Catherine Xu, Hui Xie, Weiyu Xu

    Abstract: In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of possible additive perturbations that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that neural ne…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 21 pages

  8. arXiv:2406.15846  [pdf, other]

    cs.CL eess.AS

    Revisiting Interpolation Augmentation for Speech-to-Text Generation

    Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

    Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  9. arXiv:2406.15668  [pdf, other]

    cs.CL cs.SD eess.AS

    PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics

    Authors: Amir Nassereldine, Dancheng Liu, Chenhui Xu, Jinjun Xiong

    Abstract: As edge-based automatic speech recognition (ASR) technologies become increasingly prevalent for the development of intelligent and personalized assistants, three important challenges must be addressed for these resource-constrained ASR models, i.e., adaptivity, incrementality, and inclusivity. We propose a novel ASR framework, PI-Whisper, in this work and show how it can improve an ASR's recogniti…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 11 pages, 3 figures

  10. arXiv:2406.08248  [pdf, other]

    eess.SY

    Traffic Signal Cycle Control with Centralized Critic and Decentralized Actors under Varying Intervention Frequencies

    Authors: Maonan Wang, Yirong Chen, Yuheng Kan, Chengcheng Xu, Michael Lepech, Man-On Pun, Xi Xiong

    Abstract: Traffic congestion in urban areas is a significant problem, leading to prolonged travel times, reduced efficiency, and increased environmental concerns. Effective traffic signal control (TSC) is a key strategy for reducing congestion. Unlike most TSC systems that rely on high-frequency control, this study introduces an innovative joint phase traffic signal cycle control method that operates effect…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 26 pages, 17 figures

  11. arXiv:2406.03510  [pdf, other]

    cs.SD cs.AI eess.AS

    Speech-based Clinical Depression Screening: An Empirical Study

    Authors: Yangbin Chen, Chenyang Xu, Chunfeng Liang, Yanbao Tao, Chuan Shi

    Abstract: This study investigates the utility of speech signals for AI-based depression screening across varied interaction scenarios, including psychiatric interviews, chatbot conversations, and text readings. Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital and control group members from the community, all diagnosed by psychiatrists followin…

    Submitted 12 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures

  12. arXiv:2406.01138  [pdf, ps, other]

    eess.SP cs.IT

    Precise Analysis of Covariance Identifiability for Activity Detection in Grant-Free Random Access

    Authors: Shengsong Luo, Junjie Ma, Chongbin Xu, Xin Wang

    Abstract: We consider the identifiability issue of maximum likelihood based activity detection in massive MIMO based grant-free random access. A prior work by Chen et al. indicates that the identifiability undergoes a phase transition for commonly-used random signatures. In this paper, we provide an analytical characterization of the boundary of the phase transition curve. Our theoretical results agree well…

    Submitted 3 June, 2024; originally announced June 2024.

  13. arXiv:2406.00497  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    Recent Advances in End-to-End Simultaneous Speech Translation

    Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, YingFeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

    Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.…

    Submitted 1 June, 2024; originally announced June 2024.

  14. arXiv:2405.15438  [pdf, other]

    cs.CV cs.LG eess.IV

    Comparing remote sensing-based forest biomass mapping approaches using new forest inventory plots in contrasting forests in northeastern and southwestern China

    Authors: Wenquan Dong, Edward T. A. Mitchard, Yuwei Chen, Man Chen, Congfeng Cao, Peilun Hu, Cong Xu, Steven Hancock

    Abstract: Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbia…

    Submitted 24 May, 2024; originally announced May 2024.

  15. arXiv:2405.14336  [pdf, other]

    eess.IV

    I$^2$VC: A Unified Framework for Intra- & Inter-frame Video Compression

    Authors: Meiqin Liu, Chenming Xu, Yukai Gu, Chao Yao, Yao Zhao

    Abstract: Video compression aims to reconstruct seamless frames by encoding the motion and residual information from existing frames. Previous neural video compression methods necessitate distinct codecs for three types of frames (I-frame, P-frame and B-frame), which hinders a unified approach and generalization across different video contexts. Intra-codec techniques lack the advanced Motion Estimation and…

    Submitted 1 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 19 pages, 10 figures

  16. arXiv:2405.13678  [pdf, other]

    cs.IT eess.SP

    Integrated Sensing and Communication Exploiting Prior Information: How Many Sensing Beams are Needed?

    Authors: Chan Xu, Shuowen Zhang

    Abstract: This paper studies an integrated sensing and communication (ISAC) system where a multi-antenna base station (BS) aims to communicate with a single-antenna user in the downlink and sense the unknown and random angle parameter of a target via exploiting its prior distribution information. We consider a general transmit beamforming structure where the BS sends one communication beam and potentially o…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: This is the longer version of a paper to appear in IEEE International Symposium on Information Theory (ISIT), 2024

  17. arXiv:2405.09753  [pdf, other]

    cs.IT eess.SP

    Stacked Intelligent Metasurfaces for Holographic MIMO Aided Cell-Free Networks

    Authors: Qingchao Li, Mohammed El-Hajjar, Chao Xu, Jiancheng An, Chau Yuen, Lajos Hanzo

    Abstract: Large-scale multiple-input and multiple-output (MIMO) systems are capable of achieving high data rate. However, given the high hardware cost and excessive power consumption of massive MIMO systems, as a remedy, intelligent metasurfaces have been designed for efficient holographic MIMO (HMIMO) systems. In this paper, we propose a HMIMO architecture based on stacked intelligent metasurfaces (SIM) fo…

    Submitted 15 May, 2024; originally announced May 2024.

  18. arXiv:2405.06230  [pdf]

    eess.IV

    Fire in SRRN: Next-Gen 3D Temperature Field Reconstruction Technology

    Authors: Shenxiang Feng, Xiaojian Hao, Xiaodong Huang, Pan Pei, Tong Wei, Chenyang Xu

    Abstract: In aerospace and energy engineering, accurate 3D combustion field temperature measurement is critical. The resolution of traditional methods based on algebraic iteration is limited by the initial voxel division. This study introduces a novel method for reconstructing three-dimensional temperature fields using the Spatial Radiation Representation Network (SRRN). This method utilizes the flame therm…

    Submitted 9 May, 2024; originally announced May 2024.

  19. arXiv:2405.01882  [pdf, other]

    cs.RO cs.AI eess.SP

    Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot

    Authors: Zhanzhong Gu, Xiangjian He, Gengfa Fang, Chengpei Xu, Feng Xia, Wenjing Jia

    Abstract: Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter cha…

    Submitted 3 May, 2024; originally announced May 2024.

  20. arXiv:2404.15961  [pdf, other]

    eess.SP cs.AI

    Soil analysis with machine-learning-based processing of stepped-frequency GPR field measurements: Preliminary study

    Authors: Chunlei Xu, Michael Pregesbauer, Naga Sravani Chilukuri, Daniel Windhager, Mahsa Yousefi, Pedro Julian, Lothar Ratschbacher

    Abstract: Ground Penetrating Radar (GPR) has been widely studied as a tool for extracting soil parameters relevant to agriculture and horticulture. When combined with Machine-Learning-based (ML) methods, high-resolution Stepped Frequency Continuous Wave Radar (SFCW) measurements hold the promise to give cost-effective access to depth-resolved soil parameters, including at root-level depth. In a first step…

    Submitted 24 April, 2024; originally announced April 2024.

  21. arXiv:2404.10235  [pdf, ps, other]

    eess.SP

    Integrated Sensing and Communication for Edge Inference with End-to-End Multi-View Fusion

    Authors: Xibin Jin, Guoliang Li, Shuai Wang, Miaowen Wen, Chengzhong Xu, H. Vincent Poor

    Abstract: Integrated sensing and communication (ISAC) is a promising solution to accelerate edge inference via the dual use of wireless signals. However, this paradigm needs to minimize the inference error and latency under ISAC co-functionality interference, for which the existing ISAC or edge resource allocation algorithms become inefficient, as they ignore the inter-dependency between low-level ISAC desi…

    Submitted 15 April, 2024; originally announced April 2024.

  22. arXiv:2404.06676  [pdf]

    cs.LG eess.SP stat.AP

    Topological Feature Search Method for Multichannel EEG: Application in ADHD classification

    Authors: Tianming Cai, Guoying Zhao, Junbin Zang, Chen Zong, Zhidong Zhang, Chenyang Xue

    Abstract: In recent years, the preliminary diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) using electroencephalography (EEG) has garnered attention from researchers. EEG, known for its expediency and efficiency, plays a pivotal role in the diagnosis and treatment of ADHD. However, the non-stationarity of EEG signals and inter-subject variability pose challenges to the diagnostic and classifica…

    Submitted 9 April, 2024; originally announced April 2024.

  23. arXiv:2403.18826  [pdf]

    q-bio.QM eess.IV eess.SY

    SAM-dPCR: Real-Time and High-throughput Absolute Quantification of Biological Samples Using Zero-Shot Segment Anything Model

    Authors: Yuanyuan Wei, Shanhang Luo, Changran Xu, Yingqi Fu, Qingyue Dong, Yi Zhang, Fuyang Qu, Guangyao Cheng, Yi-Ping Ho, Ho-Pui Ho, Wu Yuan

    Abstract: Digital PCR (dPCR) has revolutionized nucleic acid diagnostics by enabling absolute quantification of rare mutations and target sequences. However, current detection methodologies face challenges, as flow cytometers are costly and complex, while fluorescence imaging methods, relying on software or manual counting, are time-consuming and prone to errors. To address these limitations, we present SAM…

    Submitted 22 January, 2024; originally announced March 2024.

    Comments: 23 pages, 6 figures

  24. arXiv:2403.15944  [pdf, other]

    cs.CV cs.AI eess.IV

    Adaptive Super Resolution For One-Shot Talking-Head Generation

    Authors: Luchuan Song, Pinxin Liu, Guojun Yin, Chenliang Xu

    Abstract: The one-shot talking-head generation learns to synthesize a talking-head video with one source portrait image under the driving of same or different identity video. Usually these methods require plane-based pixel transformations via Jacobian matrices or facial image warps for novel poses generation. The constraints of using a single image source and pixel displacements often compromise the clarity…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures

  25. arXiv:2402.19020  [pdf, other]

    eess.IV cs.CV

    Unsupervised Learning of High-resolution Light Field Imaging via Beam Splitter-based Hybrid Lenses

    Authors: Jianxin Lei, Chengcai Xu, Langqing Shi, Junhui Hou, Ping Zhou

    Abstract: In this paper, we design a beam splitter-based hybrid light field imaging prototype to record 4D light field image and high-resolution 2D image simultaneously, and make a hybrid light field dataset. The 2D image could be considered as the high-resolution ground truth corresponding to the low-resolution central sub-aperture image of 4D light field image. Subsequently, we propose an unsupervised lea…

    Submitted 29 February, 2024; originally announced February 2024.

  26. arXiv:2402.13763  [pdf, other]

    cs.SD eess.AS

    Music Style Transfer with Time-Varying Inversion of Diffusion Models

    Authors: Sifei Li, Yuxin Zhang, Fan Tang, Chongyang Ma, Weiming Dong, Changsheng Xu

    Abstract: With the development of diffusion models, text-guided image style transfer has demonstrated high-quality controllable synthesis results. However, the utilization of text for diverse music style transfer poses significant challenges, primarily due to the limited availability of matched audio-text datasets. Music, being an abstract and complex art form, exhibits variations and intricacies even withi…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 7 pages, 4 figures, AAAI 2024

  27. arXiv:2402.01808  [pdf, other]

    cs.SD eess.AS

    KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge

    Authors: Guochen Yu, Runqiang Han, Chenglin Xu, Haoran Zhao, Nan Li, Chen Zhang, Xiguang Zheng, Chao Zhou, Qi Huang, Bing Yu

    Abstract: This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a generative adversarial network (GAN) in complex-domain for speech restoration and a fine-grained multi-band fusion module for speech enhancement. In the blind test set of SSI, the proposed system achieves an overall mean…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024; Rank 1st in ICASSP 2024 Speech Signal Improvement (SSI) Challenge

  28. arXiv:2402.01509  [pdf, other]

    eess.IV cs.CV cs.LG

    Advancing Brain Tumor Inpainting with Generative Models

    Authors: Ruizhi Zhu, Xinru Zhang, Haowen Pang, Chundan Xu, Chuyang Ye

    Abstract: Synthesizing healthy brain scans from diseased brain scans offers a potential solution to address the limitations of general-purpose algorithms, such as tissue segmentation and brain extraction algorithms, which may not effectively handle diseased images. We consider this a 3D inpainting task and investigate the adaptation of 2D inpainting methods to meet the requirements of 3D magnetic resonance…

    Submitted 2 February, 2024; originally announced February 2024.

  29. arXiv:2401.17800  [pdf, other]

    cs.SD cs.MM eess.AS

    Dance-to-Music Generation with Encoder-based Textual Inversion of Diffusion Models

    Authors: Sifei Li, Weiming Dong, Yuxin Zhang, Fan Tang, Chongyang Ma, Oliver Deussen, Tong-Yee Lee, Changsheng Xu

    Abstract: The harmonious integration of music with dance movements is pivotal in vividly conveying the artistic essence of dance. This alignment also significantly elevates the immersive quality of gaming experiences and animation productions. While there has been remarkable advancement in creating high-fidelity music from textual descriptions, current methodologies mainly concentrate on modulating overarch…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 9 pages, 3 figures

  30. arXiv:2401.10269  [pdf, ps, other]

    cs.IT eess.SP stat.ME

    Robust Multi-Sensor Multi-Target Tracking Using Possibility Labeled Multi-Bernoulli Filter

    Authors: Han Cai, Chenbao Xue, Jeremie Houssineau, Zhirun Xue

    Abstract: With the increasing complexity of multiple target tracking scenes, a single sensor may not be able to effectively monitor a large number of targets. Therefore, it is imperative to extend the single-sensor technique to Multi-Sensor Multi-Target Tracking (MSMTT) for enhanced functionality. Typical MSMTT methods presume complete randomness of all uncertain components, and therefore effective solution…

    Submitted 4 January, 2024; originally announced January 2024.

  31. arXiv:2401.04935  [pdf, other]

    cs.MM cs.CL cs.SD eess.AS

    Learning Audio Concepts from Counterfactual Natural Language

    Authors: Ali Vosoughi, Luca Bondi, Ho-Hsiang Wu, Chenliang Xu

    Abstract: Conventional audio classification relied on predefined classes, lacking the ability to learn from free-form text. Recent methods unlock learning joint audio-text embeddings from raw audio-text pairs describing audio in natural language. Despite recent advancements, there is little exploration of systematic methods to train models for recognizing sound events and sources in alternative scenarios, s…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  32. arXiv:2312.13048  [pdf, other]

    cs.IT eess.SP

    MIMO Integrated Sensing and Communication Exploiting Prior Information

    Authors: Chan Xu, Shuowen Zhang

    Abstract: In this paper, we study a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system where one multi-antenna base station (BS) sends information to a user with multiple antennas in the downlink and simultaneously senses the location parameter of a target based on its reflected echo signals received back at the BS receive antennas. We focus on the case where the locati…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: submitted for possible journal publication

  33. arXiv:2312.10952  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Soft Alignment of Modality Space for End-to-end Speech Translation

    Authors: Yuhao Zhang, Kaiqi Kou, Bei Li, Chen Xu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

    Abstract: End-to-end Speech Translation (ST) aims to convert speech into target text within a unified model. The inherent differences between speech and text modalities often impede effective cross-modal and cross-lingual transfer. Existing methods typically employ hard alignment (H-Align) of individual speech and text segments, which can degrade textual representations. To address this, we introduce Soft A…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP2024

  34. UniTSA: A Universal Reinforcement Learning Framework for V2X Traffic Signal Control

    Authors: Maonan Wang, Xi Xiong, Yuheng Kan, Chengcheng Xu, Man-On Pun

    Abstract: Traffic congestion is a persistent problem in urban areas, which calls for the development of effective traffic signal control (TSC) systems. While existing Reinforcement Learning (RL)-based methods have shown promising performance in optimizing TSC, it is challenging to generalize these methods across intersections of different structures. In this work, a universal RL-based TSC framework is propo…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 18 pages, 9 figures

    Journal ref: IEEE Transactions on Vehicular Technology, 2024

  35. arXiv:2311.09814  [pdf, ps, other]

    cs.IT eess.SP

    Stacked Intelligent Metasurface-Aided MIMO Transceiver Design

    Authors: Jiancheng An, Chau Yuen, Chao Xu, Hongbin Li, Derrick Wing Kwan Ng, Marco Di Renzo, Mérouane Debbah, Lajos Hanzo

    Abstract: Next-generation wireless networks are expected to utilize the limited radio frequency (RF) resources more efficiently with the aid of intelligent transceivers. To this end, we propose a promising transceiver architecture relying on stacked intelligent metasurfaces (SIM). An SIM is constructed by stacking an array of programmable metasurface layers, where each layer consists of a massive number of…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 9 pages, 5 figures, 1 table

  36. arXiv:2311.03810  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

    Authors: Yuhao Zhang, Chen Xu, Bei Li, Hao Chen, Tong Xiao, Chunliang Zhang, Jingbo Zhu

    Abstract: Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are highly consistent with the ST task, and how much this approach truly helps, have not been thoroughly studied. In this paper, we investigate the consistency between different tasks, considering different times and modules.…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP2023 main conference

  37. arXiv:2310.19216  [pdf, other]

    cs.NI eess.SP

    Optimal Status Updates for Minimizing Age of Correlated Information in IoT Networks with Energy Harvesting Sensors

    Authors: Chao Xu, Xinyan Zhang, Howard H. Yang, Xijun Wang, Nikolaos Pappas, Dusit Niyato, Tony Q. S. Quek

    Abstract: Many real-time applications of the Internet of Things (IoT) need to deal with correlated information generated by multiple sensors. The design of efficient status update strategies that minimize the Age of Correlated Information (AoCI) is a key factor. In this paper, we consider an IoT network consisting of sensors equipped with the energy harvesting (EH) capability. We optimize the average AoCI a…

    Submitted 29 October, 2023; originally announced October 2023.

  38. arXiv:2310.17579  [pdf, other]

    cs.LG eess.SP

    BLIS-Net: Classifying and Analyzing Signals on Graphs

    Authors: Charles Xu, Laney Goldman, Valentina Guo, Benjamin Hollander-Bodie, Maedee Trank-Greene, Ian Adelstein, Edward De Brouwer, Rex Ying, Smita Krishnaswamy, Michael Perlmutter

    Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for tasks such as node classification and graph classification. However, much less work has been done on signal classification, where the data consists of many functions (referred to as signals) defined on the vertices of a single graph. These tasks require networks designed differently from those designed for traditional GNN tasks. Inde…

    Submitted 26 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4537-4545, 2024

  39. arXiv:2310.15584  [pdf, other]

    cs.LG cs.NI eess.SP

    Accelerating Split Federated Learning over Wireless Communication Networks

    Authors: Ce Xu, Jinxuan Li, Yuan Liu, Yushi Ling, Miaowen Wen

    Abstract: The development of artificial intelligence (AI) provides opportunities for the promotion of deep neural network (DNN)-based applications. However, the large amount of parameters and computational complexity of DNN makes it difficult to deploy it on edge devices which are resource-constrained. An efficient method to address this challenge is model partition/splitting, in which DNN is divided into t…

    Submitted 24 October, 2023; originally announced October 2023.

  40. arXiv:2310.14355  [pdf]

    cs.LG eess.IV

    A global product of fine-scale urban building height based on spaceborne lidar

    Authors: Xiao Ma, Guang Zheng, Chi Xu, L. Monika Moskal, Peng Gong, Qinghua Guo, Huabing Huang, Xuecao Li, Yong Pang, Cheng Wang, Huan Xie, Bailang Yu, Bo Zhao, Yuyu Zhou

    Abstract: Characterizing urban environments with broad coverages and high precision is more important than ever for achieving the UN's Sustainable Development Goals (SDGs) as half of the world's populations are living in cities. Urban building height as a fundamental 3D urban structural feature has far-reaching applications. However, so far, producing readily available datasets of recent urban building heig…

    Submitted 22 October, 2023; originally announced October 2023.

  41. arXiv:2310.11713  [pdf, other]

    cs.CV cs.SD eess.AS

    Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation

    Authors: Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu

    Abstract: The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view. Current methods struggle with such sounds lacking visible cues. This paper introduces a novel "Audio-Visual Scene-Aware Separation" (AVSA-Sep) framework. It includes a semantic parser for visible and invisible sounds and a separator for scene-informed separation.…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted at ICCV 2023 - AV4D, 4 figures, 3 tables

  42. arXiv:2310.10822  [pdf, other]

    cs.RO cs.CV eess.SY

    Vision and Language Navigation in the Real World via Online Visual Language Mapping

    Authors: Chengguang Xu, Hieu T. Nguyen, Christopher Amato, Lawson L. S. Wong

    Abstract: Navigating in unseen environments is crucial for mobile robots. Enhancing them with the ability to follow instructions in natural language will further improve navigation efficiency in unseen cases. However, state-of-the-art (SOTA) vision-and-language navigation (VLN) methods are mainly evaluated in simulation, neglecting the complex and noisy real world. Directly transferring SOTA navigation poli…

    Submitted 16 October, 2023; originally announced October 2023.

  43. arXiv:2310.07284  [pdf, other]

    eess.AS cs.CL

    Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

    Authors: Xiang Hao, Jibin Wu, Jianwei Yu, Chenglin Xu, Kay Chen Tan

    Abstract: Humans possess an extraordinary ability to selectively focus on the sound source of interest amidst complex acoustic environments, commonly referred to as cocktail party scenarios. In an attempt to replicate this remarkable auditory attention capability in machines, target speaker extraction (TSE) models have been developed. These models leverage the pre-registered cues of the target speaker to ex…

    Submitted 14 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Under review, https://github.com/haoxiangsnr/llm-tse

  44. arXiv:2310.04779  [pdf, other]

    eess.IV cs.CV

    TransCC: Transformer Network for Coronary Artery CCTA Segmentation

    Authors: Chenchu Xu, Meng Li, Xue Wu

    Abstract: The accurate segmentation of Coronary Computed Tomography Angiography (CCTA) images holds substantial clinical value for the early detection and treatment of Coronary Heart Disease (CHD). The Transformer, utilizing a self-attention mechanism, has demonstrated commendable performance in the realm of medical image processing. However, challenges persist in coronary segmentation tasks due to (1) the…

    Submitted 7 October, 2023; originally announced October 2023.

  45. arXiv:2309.15977  [pdf, other]

    cs.SD cs.CV eess.AS

    Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

    Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment. Some prior work has proposed representing RIR as a neural field function of the sound emitter and receiver positions. However, these methods do not sufficiently consider the acoustic properties of an audio scene, leading to unsatisfactor…

    Submitted 27 September, 2023; originally announced September 2023.

  46. IBVC: Interpolation-driven B-frame Video Compression

    Authors: Chenming Xu, Meiqin Liu, Chao Yao, Weisi Lin, Yao Zhao

    Abstract: Learned B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction. However, previous learned approaches often directly extend neural P-frame codecs to B-frame relying on bi-directional optical-flow estimation or video frame interpolation. They suffer from inaccurate quantized motions and inefficient motion compens…

    Submitted 14 March, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: Submitted to Pattern Recognition

  47. arXiv:2309.12234  [pdf, ps, other]

    cs.CL eess.AS

    Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition

    Authors: Chen Xu, Xiaoqian Liu, Erfeng He, Yuhao Zhang, Qianqian Dong, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

    Abstract: In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task. Utilizing transcript and translation as concurrent objectives for CTC, our model bridges the gap between audio and text as well as between source and target languages. Build…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  48. arXiv:2309.09924  [pdf, other]

    cs.LG eess.SP stat.ML

    Learning graph geometry and topology using dynamical systems based message-passing

    Authors: Dhananjay Bhaskar, Yanlei Zhang, Charles Xu, Xingzhi Sun, Oluwadamilola Fasina, Guy Wolf, Maximilian Nickel, Michael Perlmutter, Smita Krishnaswamy

    Abstract: In this paper we introduce DYMAG: a message passing paradigm for GNNs built on the expressive power of continuous, multiscale graph-dynamics. Standard discrete-time message passing algorithms implicitly make use of simplistic graph dynamics and aggregation schemes which limit their ability to capture fundamental graph topological properties. By contrast, DYMAG makes use of complex graph dynamics b…

    Submitted 7 July, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

  49. arXiv:2308.06285  [pdf, other]

    cs.HC eess.IV

    An Integrated Visual Analytics System for Studying Clinical Carotid Artery Plaques

    Authors: Chaoqing Xu, Zhentao Zheng, Yiting Fu, Baofeng Chang, Legao Chen, Minghui Wu, Mingli Song, Jinsong Jiang

    Abstract: Carotid artery plaques can cause arterial vascular diseases such as stroke and myocardial infarction, posing a severe threat to human life. However, the current clinical examination mainly relies on a direct assessment by physicians of patients' clinical indicators and medical images, lacking an integrated visualization tool for analyzing the influencing factors and composition of carotid artery p…

    Submitted 8 August, 2023; originally announced August 2023.

  50. arXiv:2308.00122  [pdf, other]

    cs.CV cs.SD eess.AS

    DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

    Authors: Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner. While existing discriminative methods that perform mask regression have made remarkable progress in this field, they face limitations in capturing the complex data distribution required for high-quality separation of sounds from diverse…

    Submitted 31 July, 2023; originally announced August 2023.