Showing 1–50 of 171 results for author: Liu, D

  1. arXiv:2407.11541  [pdf, other]

    eess.IV cs.CV

    Uniformly Accelerated Motion Model for Inter Prediction

    Authors: Zhuoyuan Li, Yao Li, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: Inter prediction is a key technology to reduce the temporal redundancy in video coding. In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods usually assume uniform speed motion between consecutive frames and use the linear… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  2. arXiv:2407.10926  [pdf, other]

    eess.IV cs.CV

    In-Loop Filtering via Trained Look-Up Tables

    Authors: Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

    Abstract: In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods achieve remarkable coding gains beyond the capability of advanced video coding standards, which becomes a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures

  3. arXiv:2407.08093  [pdf, other]

    eess.IV cs.AI cs.CV eess.SP

    MemWarp: Discontinuity-Preserving Cardiac Registration with Memorized Anatomical Filters

    Authors: Hang Zhang, Xiang Chen, Renjiu Hu, Dongdong Liu, Gaolei Li, Rongguang Wang

    Abstract: Many existing learning-based deformable image registration methods impose constraints on deformation fields to ensure they are globally smooth and continuous. However, this assumption does not hold in cardiac image registration, where different anatomical regions exhibit asymmetric motions during respiration and movements due to sliding organs within the chest. Consequently, such global constraint… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 pages, 2 figures, 2 tables

  4. arXiv:2407.07667  [pdf, other]

    cs.CV eess.IV

    VEnhancer: Generative Space-Time Enhancement for Video Generation

    Authors: Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

    Abstract: We present VEnhancer, a generative space-time enhancement framework that improves the existing text-to-video results by adding more details in spatial domain and synthetic detailed motion in temporal domain. Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously with arbitrary up-sampling space and time scales through a unified video diffu… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: technical report

  5. arXiv:2407.00280  [pdf, other]

    eess.IV cs.CV

    IVCA: Inter-Relation-Aware Video Complexity Analyzer

    Authors: Junqi Liao, Yao Li, Zhuoyuan Li, Li Li, Dong Liu

    Abstract: To meet the real-time analysis requirements of video streaming applications, we propose an inter-relation-aware video complexity analyzer (IVCA) as an extension to VCA. The IVCA addresses the limitation of VCA by considering inter-frame relations, namely motion and reference structure. First, we enhance the accuracy of temporal features by introducing feature-domain motion estimation into the IVCA… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: The report for the solution of second prize winner in ICIP 2024 Grand Challenge on Video Complexity (Team: USTC-iVC_Team1, USTC-iVC_Team2)

  6. arXiv:2406.17926  [pdf, other]

    cs.CL cs.SD eess.AS

    FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data

    Authors: Dancheng Liu, Jinjun Xiong

    Abstract: Automatic Speech Recognition (ASR) for adults' speeches has made significant progress by employing deep neural network (DNN) models recently, but improvement in children's speech is still unsatisfactory due to children's speech's distinct characteristics. DNN models pre-trained on adult data often struggle in generalizing children's speeches with fine tuning because of the lack of high-quality ali… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 4 pages, 1 figure

  7. arXiv:2406.15668  [pdf, other]

    cs.CL cs.SD eess.AS

    PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics

    Authors: Amir Nassereldine, Dancheng Liu, Chenhui Xu, Jinjun Xiong

    Abstract: As edge-based automatic speech recognition (ASR) technologies become increasingly prevalent for the development of intelligent and personalized assistants, three important challenges must be addressed for these resource-constrained ASR models, i.e., adaptivity, incrementality, and inclusivity. We propose a novel ASR framework, PI-Whisper, in this work and show how it can improve an ASR's recogniti… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 11 pages, 3 figures

  8. arXiv:2406.14118  [pdf, other]

    eess.IV cs.CV

    Prediction and Reference Quality Adaptation for Learned Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Temporal prediction is one of the most important technologies for video compression. Various prediction coding modes are designed in traditional video codecs. Traditional video codecs adaptively decide the optimal coding mode according to the prediction quality and reference quality. Recently, learned video codecs have made great progress. However, they ignore the prediction and reference… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2405.16485  [pdf, other]

    eess.SY

    Make Safe Decisions in Power System: Safe Reinforcement Learning Based Pre-decision Making for Voltage Stability Emergency Control

    Authors: Congbo Bi, Lipeng Zhu, Di Liu, Chao Lu

    Abstract: The high penetration of renewable energy and power electronic equipment bring significant challenges to the efficient construction of adaptive emergency control strategies against various presumed contingencies in today's power systems. Traditional model-based emergency control methods have difficulty adapting well to various complicated operating conditions in practice. Fr emerging artificial int… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 11 pages

  11. arXiv:2405.09266  [pdf, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Dance Any Beat: Blending Beats with Visuals in Dance Video Generation

    Authors: Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai

    Abstract: Automated choreography advances by generating dance from music. Current methods create skeleton keypoint sequences, not full dance videos, and cannot make specific individuals dance, limiting their real-world use. These methods also need precise keypoint annotations, making data collection difficult and restricting the use of self-made video datasets. To overcome these challenges, we introduce a n… ▽ More

    Submitted 16 July, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: 11 pages, 6 figures, demo page: https://DabFusion.github.io

  12. arXiv:2405.08237  [pdf, other]

    cs.CL cs.SD eess.AS

    A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech

    Authors: Oli Danyi Liu, Hao Tang, Naomi Feldman, Sharon Goldwater

    Abstract: Speech perception involves storing and integrating sequentially presented items. Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech that may facilitate this temporal processing. In this study, we simulated similar analyses with representations extracted from a computational model that was trained on unlabelled speech wi… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted to CogSci 2024

  13. arXiv:2405.05252  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV eess.SP

    Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

    Authors: Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

    Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  14. arXiv:2403.19001  [pdf, other]

    cs.CV cs.AI eess.IV q-bio.NC

    Cross-domain Fiber Cluster Shape Analysis for Language Performance Cognitive Score Prediction

    Authors: Yui Lo, Yuqian Chen, Dongnan Liu, Wan Liu, Leo Zekelman, Fan Zhang, Yogesh Rathi, Nikos Makris, Alexandra J. Golby, Weidong Cai, Lauren J. O'Donnell

    Abstract: Shape plays an important role in computer graphics, offering informative features to convey an object's morphology and functionality. Shape analysis in brain imaging can help interpret structural and functionality correlations of the human brain. In this work, we investigate the shape of the brain's 3D white matter connections and its potential predictive relationship to human cognitive function.… ▽ More

    Submitted 29 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 2 figures, 11 pages

  15. arXiv:2403.13356  [pdf, other]

    eess.AS cs.SD eess.IV

    KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario

    Authors: Huali Zhou, Yuke Lin, Dong Liu, Ming Li

    Abstract: This work aims to promote Chinese opera research in both musical and speech domains, with a primary focus on overcoming the data limitations. We introduce KunquDB, a relatively large-scale, well-annotated audio-visual dataset comprising 339 speakers and 128 hours of content. Originating from the Kunqu Opera Art Canon (Kunqu yishu dadian), KunquDB is meticulously structured by dialogue lines, provi… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  16. arXiv:2403.11694  [pdf, other]

    eess.IV cs.CV

    Object Segmentation-Assisted Inter Prediction for Versatile Video Coding

    Authors: Zhuoyuan Li, Zikun Yuan, Li Li, Dong Liu, Xiaohu Tang, Feng Wu

    Abstract: In modern video coding standards, block-based inter prediction is widely adopted, which brings high compression efficiency. However, in natural videos, there are usually multiple moving objects of arbitrary shapes, resulting in complex motion fields that are difficult to compactly represent. This problem has been tackled by more flexible block partitioning methods in the Versatile Video Coding (VV… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 22 pages, 15 figures

  17. arXiv:2403.11102  [pdf, other]

    cs.NI eess.SP

    Jointly Optimizing Terahertz based Sensing and Communications in Vehicular Networks: A Dynamic Graph Neural Network Approach

    Authors: Xuefei Li, Mingzhe Chen, Ye Hu, Zhilong Zhang, Danpu Liu, Shiwen Mao

    Abstract: In this paper, the problem of vehicle service mode selection (sensing, communication, or both) and vehicle connections within terahertz (THz) enabled joint sensing and communications over vehicular networks is studied. The considered network consists of several service provider vehicles (SPVs) that can provide: 1) only sensing service, 2) only communication service, and 3) both services, sensing s… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

  18. arXiv:2403.05937  [pdf, other]

    cs.CV eess.IV

    Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding

    Authors: Cunhui Dong, Haichuan Ma, Haotian Zhang, Changsheng Gao, Li Li, Dong Liu

    Abstract: Neural network-based image coding has been developing rapidly since its birth. By 2022, its performance had surpassed that of the best-performing traditional image coding framework -- H.266/VVC. Witnessing such success, the IEEE 1857.11 working subgroup initializes a neural network-based image coding standard project and issues a corresponding call for proposals (CfP). In response to the CfP, t… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

  19. arXiv:2403.01250  [pdf]

    eess.SY

    Resilient Mobile Energy Storage Resources Based Distribution Network Restoration in Interdependent Power-Transportation-Information Networks

    Authors: Jian Zhong, Chen Chen, Qiming Yang, Dafu Liu, Wentao Shen, Chenlin Ji, Zhaohong Bie

    Abstract: The interactions between power, transportation, and information networks (PTIN), are becoming more profound with the advent of smart city technologies. Existing mobile energy storage resource (MESR)-based power distribution network (PDN) restoration schemes often neglect the interdependencies among PTIN, thus, efficient PDN restoration cannot be achieved. This paper outlines the interacting factor… ▽ More

    Submitted 2 March, 2024; originally announced March 2024.

  20. arXiv:2402.16865  [pdf, other]

    eess.IV cs.CV cs.LG

    Improve Robustness of Eye Disease Detection by including Learnable Probabilistic Discrete Latent Variables into Machine Learning Models

    Authors: Anirudh Prabhakaran, YeKun Xiao, Ching-Yu Cheng, Dianbo Liu

    Abstract: Ocular diseases, ranging from diabetic retinopathy to glaucoma, present a significant public health challenge due to their prevalence and potential for causing vision impairment. Early and accurate diagnosis is crucial for effective treatment and management. In recent years, deep learning models have emerged as powerful tools for analysing medical images, including ocular imaging. However, challen… ▽ More

    Submitted 20 January, 2024; originally announced February 2024.

    Comments: This is a work in progress

  21. arXiv:2402.13276  [pdf, other]

    eess.AS cs.AI cs.SD

    When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection

    Authors: Xiangyu Zhang, Hexin Liu, Kaishuai Xu, Qiquan Zhang, Daijiao Liu, Beena Ahmed, Julien Epps

    Abstract: Depression is a critical concern in global mental health, prompting extensive research into AI-based detection methods. Among various AI technologies, Large Language Models (LLMs) stand out for their versatility in mental healthcare applications. However, their primary limitation arises from their exclusive dependence on textual input, which constrains their overall capabilities. Furthermore, the… ▽ More

    Submitted 17 February, 2024; originally announced February 2024.

  22. arXiv:2402.10642  [pdf, other]

    eess.AS cs.AI

    Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

    Authors: Xiangyu Zhang, Daijiao Liu, Hexin Liu, Qiquan Zhang, Hanyu Meng, Leibny Paola Garcia, Eng Siong Chng, Lina Yao

    Abstract: Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

  23. arXiv:2401.15864  [pdf, other]

    cs.CV eess.IV

    Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Video compression performance is closely related to the accuracy of inter prediction. It tends to be difficult to obtain accurate inter prediction for the local video regions with inconsistent motion and occlusion. Traditional video coding standards propose various technologies to handle motion inconsistency and occlusion, such as recursive partitions, geometric partitions, and long-term reference… ▽ More

    Submitted 28 January, 2024; originally announced January 2024.

  24. arXiv:2401.09833  [pdf, other]

    eess.IV cs.AI cs.CV

    Slicer Networks

    Authors: Hang Zhang, Xiang Chen, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li

    Abstract: In medical imaging, scans often reveal objects with varied contrasts but consistent internal intensities or textures. This characteristic enables the use of low-frequency approximations for tasks such as segmentation and deformation field estimation. Yet, integrating this concept into neural network architectures for medical image analysis remains underexplored. In this paper, we propose the Slice… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 8 figures and 3 tables

  25. arXiv:2401.07496  [pdf, other]

    cs.IT cs.LG eess.SP

    Low-Rank Gradient Compression with Error Feedback for MIMO Wireless Federated Learning

    Authors: Mingzhao Guo, Dongzhu Liu, Osvaldo Simeone, Dingzhu Wen

    Abstract: This paper presents a novel approach to enhance the communication efficiency of federated learning (FL) in multiple input and multiple output (MIMO) wireless systems. The proposed method centers on a low-rank matrix factorization strategy for local gradient compression based on alternating least squares, along with over-the-air computation and error feedback. The proposed protocol, termed over-the… ▽ More

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 5 pages, 3 figures, 27 references, submitted

  26. arXiv:2312.08343  [pdf]

    eess.IV cs.CV q-bio.QM

    Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework

    Authors: Zhuoyao Xin, Christopher Wu, Dong Liu, Chunming Gu, Jia Guo, Jun Hua

    Abstract: Image segmentation, real-value prediction, and cross-modal translation are critical challenges in medical imaging. In this study, we propose a versatile multi-task neural network framework, based on an enhanced Transformer U-Net architecture, capable of simultaneously, selectively, and adaptively addressing these medical image tasks. Validation is performed on a public repository of human brain MR… ▽ More

    Submitted 17 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 4 pages, 3 figures, 2 tables

  27. arXiv:2311.15607  [pdf, other]

    eess.IV cs.AI cs.CV

    Spatially Covariant Image Registration with Text Prompts

    Authors: Xiang Chen, Min Liu, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li, Hang Zhang

    Abstract: Medical images are often characterized by their structured anatomical representations and spatially inhomogeneous contrasts. Leveraging anatomical priors in neural networks can greatly enhance their utility in resource-constrained clinical settings. Prior research has harnessed such information for image segmentation, yet progress in deformable image registration has been modest. Our work introduc… ▽ More

    Submitted 5 February, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 13 pages, 8 figures, 6 tables

  28. arXiv:2311.08808  [pdf, other]

    eess.IV

    Degradation Estimation Recurrent Neural Network with Local and Non-Local Priors for Compressive Spectral Imaging

    Authors: Yubo Dong, Dahua Gao, Yuyan Li, Guangming Shi, Danhua Liu

    Abstract: In the Coded Aperture Snapshot Spectral Imaging (CASSI) system, deep unfolding networks (DUNs) have demonstrated excellent performance in recovering 3D hyperspectral images (HSIs) from 2D measurements. However, some noticeable gaps exist between the imaging model used in DUNs and the real CASSI imaging process, such as the sensing error as well as photon and dark current noise, compromising the ac… ▽ More

    Submitted 14 January, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  29. arXiv:2311.05415  [pdf, other]

    eess.SP

    EEG-DG: A Multi-Source Domain Generalization Framework for Motor Imagery EEG Classification

    Authors: Xiao-Cong Zhong, Qisong Wang, Dan Liu, Zhihuang Chen, Jing-Xiao Liao, Jinwei Sun, Yudong Zhang, Feng-Lei Fan

    Abstract: Motor imagery EEG classification plays a crucial role in non-invasive Brain-Computer Interface (BCI) research. However, the classification is affected by the non-stationarity and individual variations of EEG signals. Simply pooling EEG data with different statistical distributions to train a classification model can severely degrade the generalization performance. To address this issue, the existi… ▽ More

    Submitted 9 November, 2023; originally announced November 2023.

  30. arXiv:2309.05846  [pdf, other]

    eess.IV

    Designs and Implementations in Neural Network-based Video Coding

    Authors: Yue Li, Junru Li, Chaoyi Lin, Kai Zhang, Li Zhang, Franck Galpin, Thierry Dumas, Hongtao Wang, Muhammed Coban, Jacob Ström, Du Liu, Kenneth Andersson

    Abstract: The past decade has witnessed the huge success of deep learning in well-known artificial intelligence applications such as face recognition, autonomous driving, and large language models like ChatGPT. Recently, the application of deep learning has been extended to a much wider range, with neural network-based video coding being one of them. Neural network-based video coding can be performed at two… ▽ More

    Submitted 13 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

  31. arXiv:2309.04154  [pdf, other]

    cs.RO eess.SY

    A novel model for layer jamming-based continuum robots

    Authors: Bowen Yi, Yeman Fan, Dikai Liu

    Abstract: Continuum robots with variable stiffness have gained wide popularity in the last decade. Layer jamming (LJ) has emerged as a simple and efficient technique to achieve tunable stiffness for continuum robots. Despite its merits, the development of a control-oriented dynamical model tailored for this specific class of robots remains an open problem in the literature. This paper aims to present the fi… ▽ More

    Submitted 11 September, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

  32. arXiv:2308.16376  [pdf, other]

    eess.IV cs.CV cs.DC

    Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training

    Authors: Lei Bai, Dongang Wang, Michael Barnett, Mariano Cabezas, Weidong Cai, Fernando Calamante, Kain Kyle, Dongnan Liu, Linda Ly, Aria Nguyen, Chun-Chien Shieh, Ryan Sullivan, Hengrui Wang, Geng Zhan, Wanli Ouyang, Chenyu Wang

    Abstract: Accurately measuring the evolution of Multiple Sclerosis (MS) with magnetic resonance imaging (MRI) critically informs understanding of disease progression and helps to direct therapeutic strategy. Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. Obtaining sufficient data from a single clin… ▽ More

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 11 pages, 4 figures, journal submission

  33. arXiv:2308.11627  [pdf, other]

    eess.SP cs.AI cs.CV eess.IV eess.SY

    Non-Intrusive Electric Load Monitoring Approach Based on Current Feature Visualization for Smart Energy Management

    Authors: Yiwen Xu, Dengfeng Liu, Liangtao Huang, Zhiquan Lin, Tiesong Zhao, Sam Kwong

    Abstract: The state-of-the-art smart city has been calling for an economic but efficient energy management over large-scale network, especially for the electric power system. It is a critical issue to monitor, analyze and control electric loads of all users in system. In this paper, we employ the popular computer vision techniques of AI to design a non-invasive load monitoring method for smart electric ener… ▽ More

    Submitted 8 August, 2023; originally announced August 2023.

  34. arXiv:2307.12027  [pdf, other]

    cs.CV eess.IV

    On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement

    Authors: Xin Luo, Yunan Zhu, Shunxin Xu, Dong Liu

    Abstract: Several recent studies advocate the use of spectral discriminators, which evaluate the Fourier spectra of images for generative modeling. However, the effectiveness of the spectral discriminators is not well interpreted yet. We tackle this issue by examining the spectral discriminators in the context of perceptual image super-resolution (i.e., GAN-based SR), as SR image quality is susceptible to s… ▽ More

    Submitted 16 August, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023. Code and Models are publicly available at https://github.com/Luciennnnnnn/DualFormer

  35. arXiv:2307.11993  [pdf, other]

    cs.CR cs.CY cs.DC cs.OS eess.SY

    Verifiable Sustainability in Data Centers

    Authors: Syed Rafiul Hussain, Patrick McDaniel, Anshul Gandhi, Kanad Ghose, Kartik Gopalan, Dongyoon Lee, Yu David Liu, Zhenhua Liu, Shuai Mu, Erez Zadok

    Abstract: Data centers have significant energy needs, both embodied and operational, affecting sustainability adversely. The current techniques and tools for collecting, aggregating, and reporting verifiable sustainability data are vulnerable to cyberattacks and misuse, requiring new security and privacy-preserving solutions. This paper outlines security challenges and research directions for addressing the… ▽ More

    Submitted 12 January, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

  36. arXiv:2307.05092  [pdf, other]

    cs.CV eess.IV

    Offline and Online Optical Flow Enhancement for Deep Video Compression

    Authors: Chuanbo Tang, Xihua Sheng, Zhuoyuan Li, Haotian Zhang, Li Li, Dong Liu

    Abstract: Video compression relies heavily on exploiting the temporal redundancy between video frames, which is usually achieved by estimating and using the motion information. The motion information is represented as optical flows in most of the existing deep video compression networks. Indeed, these networks often adopt pre-trained optical flow estimation networks for motion estimation. The optical flows,… ▽ More

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: 9 pages, 6 figures

  37. arXiv:2306.15490  [pdf, other]

    eess.IV cs.CV

    EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction using Head-Mounted Augmented Reality Device

    Authors: Haowei Li, Wenqing Yan, Du Liu, Long Qian, Yuxing Yang, Yihao Liu, Zhe Zhao, Hui Ding, Guangzhi Wang

    Abstract: Augmented Reality (AR) has been used to facilitate surgical guidance during External Ventricular Drain (EVD) surgery, reducing the risks of misplacement in manual operations. During this procedure, the key challenge is accurately estimating the spatial relationship between pre-operative images and actual patient anatomy in AR environment. This research proposes a novel framework utilizing Time of… ▽ More

    Submitted 3 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  38. arXiv:2306.10681  [pdf, other]

    eess.IV cs.CV

    VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Almost all digital videos are coded into compact representations before being transmitted. Such compact representations need to be decoded back to pixels before being displayed to humans and - as usual - before being enhanced/analyzed by machine vision algorithms. Intuitively, it is more efficient to enhance/analyze the coded representations directly without decoding them into pixels. Therefore, w… ▽ More

    Submitted 1 November, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

  39. arXiv:2306.07089  [pdf, other]

    eess.IV cs.AI cs.CV

    Topology Repairing of Disconnected Pulmonary Airways and Vessels: Baselines and a Dataset

    Authors: Ziqiao Weng, Jiancheng Yang, Dongnan Liu, Weidong Cai

    Abstract: Accurate segmentation of pulmonary airways and vessels is crucial for the diagnosis and treatment of pulmonary diseases. However, current deep learning approaches suffer from disconnectivity issues that hinder their clinical usefulness. To address this challenge, we propose a post-processing approach that leverages a data-driven method to repair the topology of disconnected pulmonary tubular struc… ▽ More

    Submitted 28 June, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: MICCAI 2023 Early Accepted

  40. arXiv:2306.06603  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Task-Oriented Integrated Sensing, Computation and Communication for Wireless Edge AI

    Authors: Hong Xing, Guangxu Zhu, Dongzhu Liu, Haifeng Wen, Kaibin Huang, Kaishun Wu

    Abstract: With the advent of emerging IoT applications such as autonomous driving, digital-twin and metaverse etc. featuring massive data sensing, analyzing and inference as well as critical latency in beyond 5G (B5G) networks, edge artificial intelligence (AI) has been proposed to provide high-performance computation of a conventional cloud down to the network edge. Recently, convergence of wireless sensing,… ▽ More

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: 18 pages, 6 figures, submitted for possible journal publication

  41. arXiv:2306.05912  [pdf, other]

    eess.IV cs.CV

    Single-Image-Based Deep Learning for Segmentation of Early Esophageal Cancer Lesions

    Authors: Haipeng Li, Dingrui Liu, Yu Zeng, Shuaicheng Liu, Tao Gan, Nini Rao, Jinlin Yang, Bing Zeng

    Abstract: Accurate segmentation of lesions is crucial for diagnosis and treatment of early esophageal cancer (EEC). However, neither traditional nor deep learning-based methods up to today can meet the clinical requirements, with the mean Dice score - the most important metric in medical image analysis - hardly exceeding 0.75. In this paper, we present a novel deep learning approach for segmenting EEC lesio… ▽ More

    Submitted 9 June, 2023; originally announced June 2023.

  42. arXiv:2306.04579  [pdf, other]

    eess.IV cs.CV

    A Dataset for Deep Learning-based Bone Structure Analyses in Total Hip Arthroplasty

    Authors: Kaidong Zhang, Ziyang Gan, Dong Liu, Xifu Shang

    Abstract: Total hip arthroplasty (THA) is a widely used surgical procedure in orthopedics. For THA, it is of clinical significance to analyze the bone structure from the CT images, especially to observe the structure of the acetabulum and femoral head, before the surgical procedure. For such bone structure analyses, deep learning technologies are promising but require high-quality labeled data for the learn…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: 16 pages, 17 figures

  43. arXiv:2306.03865  [pdf, other]

    cs.RO eess.SY

    Simultaneous Position-and-Stiffness Control of Underactuated Antagonistic Tendon-Driven Continuum Robots

    Authors: Bowen Yi, Yeman Fan, Dikai Liu, Jose Guadalupe Romero

    Abstract: Continuum robots have gained widespread popularity due to their inherent compliance and flexibility, particularly their adjustable levels of stiffness for various application scenarios. Despite efforts in dynamic modeling and control synthesis over the past decade, few studies have incorporated stiffness regulation into their feedback control design; however, this is one of the initial motivations…

    Submitted 13 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  44. arXiv:2305.20024  [pdf]

    eess.SY

    Cooperative IoT Data Sharing with Heterogeneity of Participants Based on Electricity Retail

    Authors: Bohong Wang, Qinglai Guo, Tian Xia, Qiang Li, Di Liu, Feng Zhao

    Abstract: With the development of Internet of Things (IoT) and big data technology, the data value is increasingly explored in multiple practical scenarios, including electricity transactions. However, the isolation of IoT data among several entities makes it difficult to achieve optimal allocation of data resources and convert data resources into real economic value, thus it is necessary to introduce the I…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 18 pages, 14 figures

  45. Resilience in Platoons of Cooperative Heterogeneous Vehicles: Self-organization Strategies and Provably-correct Design

    Authors: Di Liu, Sebastian Mair, Kang Yang, Simone Baldi, Paolo Frasca, Matthias Althoff

    Abstract: This work proposes provably-correct self-organizing strategies for platoons of heterogeneous vehicles. We refer to self-organization as the capability of a platoon to autonomously homogenize to a common group behavior. We show that self-organization promotes resilience to acceleration limits and communication failures, i.e., homogenizing to a common group behavior makes the platoon recover from th…

    Submitted 22 February, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

  46. arXiv:2305.04156  [pdf, other]

    eess.IV cs.CV

    SynthMix: Mixing up Aligned Synthesis for Medical Cross-Modality Domain Adaptation

    Authors: Xinwen Zhang, Chaoyi Zhang, Dongnan Liu, Qianbi Yu, Weidong Cai

    Abstract: Adversarial methods have shown advanced performance by producing synthetic images to mitigate the domain shift, a common problem due to the hardship of acquiring labelled data in the medical field. Most existing studies focus on modifying the network architecture, but little work has addressed the GAN training strategy. In this work, we propose SynthMix, an add-on module with a natural yet effective traini…

    Submitted 6 May, 2023; originally announced May 2023.

    Comments: Accepted by The IEEE International Symposium on Biomedical Imaging (ISBI) 2023

  47. arXiv:2305.04152  [pdf, ps, other]

    cs.IT eess.SP

    Bayesian Over-the-Air FedAvg via Channel Driven Stochastic Gradient Langevin Dynamics

    Authors: Boning Zhang, Dongzhu Liu, Osvaldo Simeone, Guangxu Zhu

    Abstract: The recent development of scalable Bayesian inference methods has renewed interest in the adoption of Bayesian learning as an alternative to conventional frequentist learning that offers improved model calibration via uncertainty quantification. Recently, federated averaging Langevin dynamics (FALD) was introduced as a variant of federated averaging that can efficiently implement distributed Bayes…

    Submitted 9 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: 6 pages, 4 figures, 26 references, submitted

  48. arXiv:2305.00104  [pdf, other]

    cs.CV eess.AS eess.IV

    MMViT: Multiscale Multiview Vision Transformers

    Authors: Yuchen Liu, Natasha Ong, Kaiyan Peng, Bo Xiong, Qifan Wang, Rui Hou, Madian Khabsa, Kaiyue Yang, David Liu, Donald S. Williamson, Hanchao Yu

    Abstract: We present Multiscale Multiview Vision Transformers (MMViT), which introduces multiscale feature maps and multiview encodings to transformer models. Our model encodes different views of the input signal and builds several channel-resolution feature stages to process the multiple views of the input at different resolutions in parallel. At each scale stage, we use a cross-attention block to fuse inf…

    Submitted 28 April, 2023; originally announced May 2023.

  49. arXiv:2304.14053  [pdf, other]

    eess.IV cs.CV cs.LG

    Precise Few-shot Fat-free Thigh Muscle Segmentation in T1-weighted MRI

    Authors: Sheng Chen, Zihao Tang, Dongnan Liu, Ché Fornusek, Michael Barnett, Chenyu Wang, Mariano Cabezas, Weidong Cai

    Abstract: Precise thigh muscle volumes are crucial to monitor the motor functionality of patients with diseases that may result in various degrees of thigh muscle loss. T1-weighted MRI is the default surrogate to obtain thigh muscle masks due to its contrast between muscle and fat signals. Deep learning approaches have recently been widely used to obtain these masks through segmentation. However, due to the…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: ISBI2023, Few-shot, Intra-muscular fat, Thigh muscle segmentation, Pseudo-label denoising, MRI

  50. arXiv:2304.04647  [pdf]

    eess.SY

    L0-norm constraint normalized subband adaptive filtering algorithm: Performance development and AEC application

    Authors: Dongxu Liu, Haiquan Zhao, Yang Zhou

    Abstract: Limited by fixed step-size and sparsity penalty factor, the conventional sparsity-aware normalized subband adaptive filtering (NSAF) type algorithms suffer from trade-off requirements of high filtering accurateness and quicker convergence behavior for sparse system identification. To deal with this problem, this paper proposes variable step-size L0-norm constraint NSAF algorithms (VSS-L0-NSAFs). W…

    Submitted 11 May, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible