Showing 1–50 of 118 results for author: Cheng, Y

  1. arXiv:2407.10828  [pdf]

    cs.SD cs.AI eess.AS

    Towards Enhanced Classification of Abnormal Lung sound in Multi-breath: A Light Weight Multi-label and Multi-head Attention Classification Method

    Authors: Yi-Wei Chua, Yun-Chien Cheng

    Abstract: This study aims to develop an auxiliary diagnostic system for classifying abnormal lung respiratory sounds, enhancing the accuracy of automatic abnormal breath sound classification through an innovative multi-label learning approach and multi-head attention mechanism. Addressing the issue of class imbalance and lack of diversity in existing respiratory sound datasets, our study employs a lightweig…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.10646  [pdf, other]

    cs.SD eess.AS

    Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control

    Authors: Yu-Hua Chen, Yen-Tung Yeh, Yuan-Chiao Cheng, Jui-Te Wu, Yu-Hsiang Ho, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: Replicating analog device circuits through neural audio effect modeling has garnered increasing interest in recent years. Existing work has predominantly focused on a one-to-one emulation strategy, modeling specific devices individually. In this paper, we tackle the less-explored scenario of one-to-many emulation, utilizing conditioning mechanisms to emulate multiple guitar amplifiers through a si…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ISMIR 2024

  3. arXiv:2407.03374  [pdf]

    cs.AI cs.SE eess.SP eess.SY

    An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

    Authors: Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, Yiling Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

    Abstract: Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Larg…

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2406.16942  [pdf, other]

    eess.IV cs.AI cs.CV

    Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

    Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

    Abstract: Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RE…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: All codes are available at https://github.com/yuanyuanpeng0129/FMUE

  5. arXiv:2404.09595  [pdf, ps, other]

    eess.SP cs.AI

    Building Semantic Communication System via Molecules: An End-to-End Training Approach

    Authors: Yukun Cheng, Wei Chen, Bo Ai

    Abstract: The concept of semantic communication provides a novel approach for applications in scenarios with limited communication resources. In this paper, we propose an end-to-end (E2E) semantic molecular communication system, aiming to enhance the efficiency of molecular communication systems by reducing the transmitted information. Specifically, following the joint source channel coding paradigm, the ne…

    Submitted 15 April, 2024; originally announced April 2024.

  6. arXiv:2404.06080  [pdf]

    eess.IV cs.CV

    Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures

    Authors: Ching-Kai Lin, Di-Chun Wei, Yun-Chien Cheng

    Abstract: This study aims to establish a computer-aided diagnosis system for endobronchial ultrasound (EBUS) surgery to assist physicians in the preliminary diagnosis of metastatic cancer. This involves arranging immediate examinations for other sites of metastatic cancer after EBUS surgery, eliminating the need to wait for reports, thereby shortening the waiting time by more than half and enabling patients…

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  7. arXiv:2404.01929  [pdf]

    eess.IV cs.CV

    Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method

    Authors: Jyun-An Lin, Yun-Chien Cheng, Ching-Kai Lin

    Abstract: This study aims to establish a computer-aided diagnostic system for lung lesions using endobronchial ultrasound (EBUS) to assist physicians in identifying lesion areas. During EBUS-transbronchial needle aspiration (EBUS-TBNA) procedures, physicians rely on grayscale ultrasound images to determine the location of lesions. However, these images often contain significant noise and can be influenced by…

    Submitted 20 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  8. arXiv:2403.09301  [pdf, ps, other]

    eess.SP

    CRB Analysis for Mixed-ADC Based DOA Estimation

    Authors: Xinnan Zhang, Yuanbo Cheng, Xiaolei Shang, Jun Liu

    Abstract: We consider a mixed analog-to-digital converter (ADC) based architecture consisting of high-precision and one-bit ADCs with the antenna-varying threshold for direction of arrival (DOA) estimation using a uniform linear array (ULA), which utilizes fixed but different thresholds for one-bit ADCs across different receive antennas. The Cramér-Rao bound (CRB) with the antenna-varying threshold is obt…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Major Revision under Trans on Signal Processing

  9. arXiv:2403.09136  [pdf, other]

    eess.IV cs.CV

    Biophysics Informed Pathological Regularisation for Brain Tumour Segmentation

    Authors: Lipei Zhang, Yanqi Cheng, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

    Abstract: Recent advancements in deep learning have significantly improved brain tumour segmentation techniques; however, the results still lack confidence and robustness as they solely consider image data without biophysical priors or pathological information. Integrating biophysics-informed regularisation is one effective way to change this situation, as it provides a prior regularisation for automated e…

    Submitted 17 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures and 1 table

  10. arXiv:2401.03476  [pdf, other]

    cs.MM cs.AI cs.HC cs.SD eess.AS

    Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness

    Authors: Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu

    Abstract: Current talking avatars mostly generate co-speech gestures based on audio and text of the utterance, without considering the non-speaking motion of the speaker. Furthermore, previous works on co-speech gesture generation have designed network structures based on individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements. To tac…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 6 pages, 3 figures, ICASSP 2024

  11. arXiv:2312.00082  [pdf, other]

    eess.IV cs.CV

    A Compact Implicit Neural Representation for Efficient Storage of Massive 4D Functional Magnetic Resonance Imaging

    Authors: Ruoran Li, Runzhao Yang, Wenxin Xiang, Yuxiao Cheng, Tingxiong Xiao, Jinli Suo

    Abstract: Functional Magnetic Resonance Imaging (fMRI) data is a widely used kind of four-dimensional biomedical data, which requires effective compression. However, fMRI compression poses unique challenges due to its intricate temporal dynamics, low signal-to-noise ratio, and complicated underlying redundancies. This paper reports a novel compression paradigm specifically tailored for fMRI data based on Im…

    Submitted 29 February, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

  12. arXiv:2311.13682  [pdf, other]

    cs.CV eess.IV

    Single-Shot Plug-and-Play Methods for Inverse Problems

    Authors: Yanqi Cheng, Lipei Zhang, Zhenda Shen, Shujun Wang, Lequan Yu, Raymond H. Chan, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

    Abstract: The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on p…

    Submitted 22 November, 2023; originally announced November 2023.

  13. arXiv:2311.13610  [pdf, other]

    cs.CV eess.IV

    TRIDENT: The Nonlinear Trilogy for Implicit Neural Representations

    Authors: Zhenda Shen, Yanqi Cheng, Raymond H. Chan, Pietro Liò, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

    Abstract: Implicit neural representations (INRs) have garnered significant interest recently for their ability to model complex, high-dimensional data without explicit parameterisation. In this work, we introduce TRIDENT, a novel function for implicit neural representations characterised by a trilogy of nonlinearities. Firstly, it is designed to represent high-order features through order compactness. Secon…

    Submitted 21 November, 2023; originally announced November 2023.

  14. arXiv:2311.13134  [pdf, other]

    cs.CV eess.IV

    Lightweight High-Speed Photography Built on Coded Exposure and Implicit Neural Representation of Videos

    Authors: Zhihong Zhang, Runzhao Yang, Jinli Suo, Yuxiao Cheng, Qionghai Dai

    Abstract: Compact cameras that record high-speed scenes at high resolution are in high demand, but the required high bandwidth often leads to bulky, heavy systems, which limits their applications on low-capacity platforms. Adopting a coded exposure setup to encode a frame sequence into a blurry snapshot and retrieve the latent sharp video afterward can serve as a lightweight solution. However, restorin…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 19 pages, 10 figures

  15. arXiv:2311.07110  [pdf, other]

    eess.SY cs.CR cs.LG

    Adversarial Purification for Data-Driven Power System Event Classifiers with Diffusion Models

    Authors: Yuanbin Cheng, Koji Yamashita, Jim Follum, Nanpeng Yu

    Abstract: The global deployment of the phasor measurement units (PMUs) enables real-time monitoring of the power system, which has stimulated considerable research into machine learning-based models for event detection and classification. However, recent studies reveal that machine learning-based methods are vulnerable to adversarial attacks, which can fool the event classifiers by adding small perturbation…

    Submitted 13 November, 2023; originally announced November 2023.

  16. arXiv:2310.06256  [pdf, ps, other]

    eess.SP

    Rate Compatible LDPC Neural Decoding Network: A Multi-Task Learning Approach

    Authors: Yukun Cheng, Wei Chen, Lun Li, Bo Ai

    Abstract: Deep learning based decoding networks have shown significant improvement in decoding LDPC codes, but the neural decoders are limited by rate-matching operations such as puncturing or extending, thus needing to train multiple decoders with different code rates for a variety of channel conditions. In this correspondence, we propose a Multi-Task Learning based rate-compatible LDPC decoding network, wh…

    Submitted 9 October, 2023; originally announced October 2023.

  17. arXiv:2309.15800  [pdf, other]

    cs.CL cs.SD eess.AS

    Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

    Authors: Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang

    Abstract: Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech features such as spectrograms are often used as the input for the subsequent model. However, they can still be redundant. Recent investigations proposed the use of discrete speech units derived from self-supervised learning repre…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  18. arXiv:2307.14588  [pdf]

    eess.IV cs.CV cs.LG

    MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation

    Authors: Liang Xu, Mingxiao Chen, Yi Cheng, Pengfei Shao, Shuwei Shen, Peng Yao, Ronald X. Xu

    Abstract: The UNet architecture, based on Convolutional Neural Networks (CNN), has demonstrated its remarkable performance in medical image analysis. However, it faces challenges in capturing long-range dependencies due to the limited receptive fields and inherent bias of convolutional operations. Recently, numerous transformer-based techniques have been incorporated into the UNet architecture to overcome t…

    Submitted 26 July, 2023; originally announced July 2023.

  19. arXiv:2307.07688  [pdf, other]

    cs.CV eess.IV

    DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image Restoration

    Authors: Yuanshuo Cheng, Mingwen Shao, Yecong Wan, Chao Wang

    Abstract: Existing All-In-One image restoration (IR) methods usually lack flexible modeling on various types of degradation, thus impeding the restoration performance. To achieve All-In-One IR with higher task dexterity, this work proposes an efficient Dynamic Reference Modeling paradigm (DRM-IR), which consists of task-adaptive degradation modeling and model-based image restoring. Specifically, these two s…

    Submitted 30 November, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

  20. arXiv:2307.07096  [pdf, other]

    eess.AS cs.SD

    Low Rank Properties for Estimating Microphones Start Time and Sources Emission Time

    Authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang, S. M. Ahsan Kazmi, Yingxiu Chang

    Abstract: Uncertainty in timing information pertaining to the start time of microphone recordings and sources' emission time poses significant challenges in various applications, such as joint microphones and sources localization. Traditional optimization methods, which directly estimate this unknown timing information (UTIm), often fall short compared to approaches exploiting the low-rank property (LRP). LR…

    Submitted 21 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: 13 pages for main content; 9 pages for proof of proposed low rank properties; 13 figures

  21. arXiv:2306.16022  [pdf, other]

    cs.SD cs.CR eess.AS

    Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound

    Authors: Xinfeng Li, Junning Ze, Chen Yan, Yushi Cheng, Xiaoyu Ji, Wenyuan Xu

    Abstract: Automatic Speaker Recognition Systems (SRSs) have been widely used in voice applications for personal identification and access control. A typical SRS consists of three stages, i.e., training, enrollment, and recognition. Previous work has revealed that SRSs can be bypassed by backdoor attacks at the training stage or by adversarial example attacks at the recognition stage. In this paper, we propo…

    Submitted 10 December, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: Published in Internet of Things Journal (IoT-J)

  22. arXiv:2306.08985  [pdf, other]

    eess.SP

    Mixed-ADC Based PMCW MIMO Radar Angle-Doppler Imaging

    Authors: Xiaolei Shang, Ronghao Lin, Yuanbo Cheng

    Abstract: Phase-modulated continuous-wave (PMCW) multiple-input multiple-output (MIMO) radar systems are known to possess excellent mutual interference mitigation capabilities, but require costly and power-hungry high sampling rate and high-precision analog-to-digital converters (ADCs). To reduce cost and power consumption, we consider a mixed-ADC architecture, in which most receive antenna outputs are sam…

    Submitted 15 June, 2023; originally announced June 2023.

  23. arXiv:2306.08965  [pdf, other]

    eess.SP

    Code Optimization and Angle-Doppler Imaging for ST-CDM LFMCW MIMO Radar Systems

    Authors: Xiaolei Shang, Yuanbo Cheng

    Abstract: We consider code optimization and angle-Doppler imaging for slow-time code division multiplexing (ST-CDM) linear frequency-modulated continuous-wave (LFMCW) multiple-input multiple-output (MIMO) radar systems. We optimize the slow-time code via the minimization of a Cramér-Rao Bound (CRB)-based metric to enhance the parameter estimation performance. Then, a computationally efficient RELAX-based…

    Submitted 15 June, 2023; originally announced June 2023.

  24. arXiv:2305.11397  [pdf, other]

    eess.AS cs.SD

    Are Microphone Signals Alone Sufficient for Self-Positioning?

    Authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang

    Abstract: In an era where asynchronous environments pose challenges to traditional self-positioning methods, we propose a new transformation to the existing paradigm. Traditionally, time of arrival (TOA) measurements require both microphone and source signals, limiting their applicability in environments with unknown emission time of human voices or sources and unknown recording start time of independent mi…

    Submitted 6 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 1 figure, including 3 sub-figures

  25. arXiv:2305.02719  [pdf]

    eess.IV cs.CV

    Using Spatio-Temporal Dual-Stream Network with Self-Supervised Learning for Lung Tumor Classification on Radial Probe Endobronchial Ultrasound Video

    Authors: Ching-Kai Lin, Chin-Wen Chen, Yun-Chien Cheng

    Abstract: The purpose of this study is to develop a computer-aided diagnosis system for classifying benign and malignant lung lesions, and to assist physicians in real-time analysis of radial probe endobronchial ultrasound (EBUS) videos. During the biopsy process of lung cancer, physicians use real-time ultrasound images to find suitable lesion locations for sampling. However, most of these images are diffi…

    Submitted 6 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

  26. arXiv:2304.08038  [pdf, other]

    cs.IT eess.SP

    Orthogonal AMP for Problems with Multiple Measurement Vectors and/or Multiple Transforms

    Authors: Yiyao Cheng, Lei Liu, Shansuo Liang, Jonathan H. Manton, Li Ping

    Abstract: Approximate message passing (AMP) algorithms break a (high-dimensional) statistical problem into parts then repeatedly solve each part in turn, akin to alternating projections. A distinguishing feature is that their asymptotic behaviours can be accurately predicted via their associated state evolution equations. Orthogonal AMP (OAMP) was recently developed to avoid the need for computing the so-called…

    Submitted 17 April, 2023; originally announced April 2023.

  27. arXiv:2303.15112  [pdf, ps, other]

    eess.SP

    Optimal Mixed-ADC arrangement for DOA Estimation via CRB using ULA

    Authors: Xinnan Zhang, Yuanbo Cheng, Xiaolei Shang, Jun Liu

    Abstract: We consider a mixed analog-to-digital converter (ADC) based architecture for direction of arrival (DOA) estimation using a uniform linear array (ULA). We derive the Cramér-Rao bound (CRB) of the DOA under the optimal time-varying threshold, and find that the asymptotic CRB is related to the arrangement of high-precision and one-bit ADCs for a fixed number of ADCs. Then, a new concept called "mi…

    Submitted 14 May, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 5 pages, 3 figures, accepted by ICASSP2023

  28. arXiv:2301.04479  [pdf, other]

    eess.SP

    Super-resolution of Ray-tracing Channel Simulation via Attention Mechanism based Deep Learning Model

    Authors: Haoyang Zhang, Danping He, Xiping Wang, Wenbin Wang, Yunhao Cheng, Ke Guan

    Abstract: As an emerging approach, deep learning plays an increasingly influential role in channel modeling. Traditional ray tracing (RT) methods of channel modeling tend to be inefficient and expensive. In this paper, we present a super-resolution (SR) model for channel characteristics. Residual connection and attention mechanism are applied to this convolutional neural network (CNN) model. Experiments pro…

    Submitted 21 January, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  29. arXiv:2212.09553  [pdf, other]

    cs.CL cs.SD eess.AS

    Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models

    Authors: Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna

    Abstract: We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-t…

    Submitted 26 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: ICML 2023

  30. arXiv:2211.17250  [pdf, other]

    cs.RO cs.LG eess.SY

    Safe and Efficient Reinforcement Learning Using Disturbance-Observer-Based Control Barrier Functions

    Authors: Yikun Cheng, Pan Zhao, Naira Hovakimyan

    Abstract: Safe reinforcement learning (RL) with assured satisfaction of hard state constraints during training has recently received a lot of attention. Safety filters, e.g., based on control barrier functions (CBFs), provide a promising way for safe RL via modifying the unsafe actions of an RL agent on the fly. Existing safety filter-based approaches typically involve learning of uncertain dynamics and qua…

    Submitted 28 August, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

  31. Super-resolution Reconstruction of Single Image for Latent features

    Authors: Xin Wang, Jing-Ke Yan, Jing-Ye Cai, Jian-Hua Deng, Qin Qin, Yao Cheng

    Abstract: Single-image super-resolution (SISR) typically focuses on restoring various degraded low-resolution (LR) images to a single high-resolution (HR) image. However, during SISR tasks, it is often challenging for models to simultaneously maintain high quality and rapid sampling while preserving diversity in details and texture features. This challenge can lead to issues such as model collapse, lack of…

    Submitted 9 November, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Journal ref: Computational Visual Media, 2023

  32. arXiv:2211.02625  [pdf, other]

    eess.SP cs.LG

    MAEEG: Masked Auto-encoder for EEG Representation Learning

    Authors: Hsiang-Yun Sherry Chien, Hanlin Goh, Christopher M. Sandino, Joseph Y. Cheng

    Abstract: Decoding information from bio-signals such as EEG using machine learning has been a challenge due to small datasets and the difficulty of obtaining labels. We propose a reconstruction-based self-supervised learning model, the masked auto-encoder for EEG (MAEEG), for learning EEG representations by learning to reconstruct the masked EEG features using a transformer architecture. We found that MAEEG…

    Submitted 27 October, 2022; originally announced November 2022.

    Comments: 10 pages, 5 figures, accepted by Workshop on Learning from Time Series for Health, NeurIPS2022 as poster presentation

  33. arXiv:2210.15370  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    CasNet: Investigating Channel Robustness for Speech Separation

    Authors: Fan-Lin Wang, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

    Abstract: Recording channel mismatch between training and testing conditions has been shown to be a serious problem for speech separation. This situation greatly reduces the separation performance, and cannot meet the requirement of daily use. In this study, inheriting the use of our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  34. arXiv:2210.15368  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech

    Authors: Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

    Abstract: The lack of clean speech is a practical challenge to the development of speech enhancement systems, which means that there is an inevitable mismatch between their training criterion and evaluation metric. In response to this unfavorable situation, we propose a training and inference strategy that additionally uses enhanced speech as a target by improving the previously proposed noisy-target traini…

    Submitted 22 May, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted by Interspeech 2023

  35. arXiv:2209.15180  [pdf, other]

    eess.IV cs.CV

    SCI: A Spectrum Concentrated Implicit Neural Compression for Biomedical Data

    Authors: Runzhao Yang, Tingxiong Xiao, Yuxiao Cheng, Qianni Cao, Jinyuan Qu, Jinli Suo, Qionghai Dai

    Abstract: Massive collection and explosive growth of biomedical data demand effective compression for efficient storage, transmission and sharing. Readily available visual data compression techniques have been studied extensively but tailored for natural images/videos, and thus show limited performance on biomedical data which are of different features and larger diversity. Emerging implicit neural repres…

    Submitted 23 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: accepted to AAAI2023

    ACM Class: I.4.2; I.2.10

  36. Monte-Carlo Sampling Approach to Model Selection: A Primer

    Authors: Petre Stoica, Xiaolei Shang, Yuanbo Cheng

    Abstract: Any data modeling exercise has two main components: parameter estimation and model selection. The latter will be the topic of this lecture note. More concretely we will introduce several Monte-Carlo sampling-based rules for model selection using the maximum a posteriori (MAP) approach. Model selection problems are omnipresent in signal processing applications: examples include selecting the order…

    Submitted 27 September, 2022; originally announced September 2022.

    Journal ref: IEEE Signal Processing Magazine, vol. 39, no. 5, pp. 85--2, 2022

  37. The Cramer-Rao Bound for Signal Parameter Estimation from Quantized Data

    Authors: Petre Stoica, Xiaolei Shang, Yuanbo Cheng

    Abstract: Several current ultra-wide band applications, such as millimeter wave radar and communication systems, require high sampling rates and therefore expensive and energy-hungry analog-to-digital converters (ADCs). In applications where cost and power constraints exist, the use of high-precision ADCs is not feasible and the designer must resort to ADCs with coarse quantization. Consequently the interest…

    Submitted 27 September, 2022; originally announced September 2022.

    Journal ref: IEEE Signal Processing Magazine, vol. 39, no. 1, pp. 118-125, 2021

  38. Robust machine learning segmentation for large-scale analysis of heterogeneous clinical brain MRI datasets

    Authors: Benjamin Billot, Colin Magdamo, You Cheng, Steven E. Arnold, Sudeshna Das, Juan E. Iglesias

    Abstract: Every year, millions of brain MRI scans are acquired in hospitals, which is a figure considerably larger than the size of any research dataset. Therefore, the ability to analyse such scans could transform neuroimaging research. Yet, their potential remains untapped, since no automated algorithm is robust enough to cope with the high variability in clinical acquisitions (MR contrasts, resolutions,…

    Submitted 4 January, 2023; v1 submitted 5 September, 2022; originally announced September 2022.

    Comments: under review, extension of MICCAI 2022 paper

  39. arXiv:2208.03067  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

    Authors: Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van Esch, Bo Li, Khe Chai Sim

    Abstract: Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages. We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open source data…

    Submitted 4 October, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  40. arXiv:2207.11746  [pdf, other]

    eess.SY

    Consensus-based Frequency and Voltage Regulation for Fully Inverter-based Islanded Microgrids

    Authors: Y. Cheng, Tao Liu, David J. Hill, Xue Lyu

    Abstract: This paper proposes a new distributed consensus-based control method for voltage and frequency control of fully inverter-based islanded microgrids (MGs). The proposed method includes the active power sharing in voltage control to improve the reactive power sharing accuracy and thus generalizes some existing secondary frequency and voltage control methods. Firstly, frequency is regulated by distrib…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: In proceedings of the 11th Bulk Power Systems Dynamics and Control Symposium (IREP 2022), July 25-30, 2022, Banff, Canada

    Report number: IREP2022-8

  41. INFWIDE: Image and Feature Space Wiener Deconvolution Network for Non-blind Image Deblurring in Low-Light Conditions

    Authors: Zhihong Zhang, Yuxiao Cheng, Jinli Suo, Liheng Bian, Qionghai Dai

    Abstract: In low-light environments, handheld photography suffers from severe camera shake under long exposure settings. Although existing deblurring algorithms have shown promising performance on well-exposed blurry images, they still cannot cope with low-light snapshots. Sophisticated noise and saturation regions are two dominating challenges in practical low-light deblurring. In this work, we propose a…

    Submitted 17 February, 2023; v1 submitted 17 July, 2022; originally announced July 2022.

    Comments: Accepted by IEEE Trans. Image Process, early access version available at https://ieeexplore.ieee.org/document/10047966

  42. arXiv:2207.07611  [pdf, other]

    cs.LG cs.CV cs.SD eess.AS

    Position Prediction as an Effective Pretraining Strategy

    Authors: Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

    Abstract: Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing this representational capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Tr…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Accepted to ICML 2022

  43. arXiv:2207.03334  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation

    Authors: Vikramjit Mitra, Hsiang-Yun Sherry Chien, Vasudha Kowtha, Joseph Yitan Cheng, Erdrin Azemi

    Abstract: Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seem to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance. Lexical…

    Submitted 2 July, 2022; originally announced July 2022.

    Comments: 5 pages, 3 figures, Interspeech 2022

  44. arXiv:2207.03190  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization

    Authors: Jiashuo Yu, Junfu Pu, Ying Cheng, Rui Feng, Ying Shan

    Abstract: Although audio-visual representation has been proved to be applicable in many downstream tasks, the representation of dancing videos, which is more specific and always accompanied by music with complex auditory contents, remains challenging and uninvestigated. Considering the intrinsic alignment between the cadent movement of dancer and music rhythm, we introduce MuDaR, a novel Music-Dance Represe…

    Submitted 10 August, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted for publication in IEEE Transactions on Multimedia

  45. arXiv:2207.02209  [pdf]

    cs.LG cond-mat.mtrl-sci eess.IV physics.optics

    Tackling Data Scarcity with Transfer Learning: A Case Study of Thickness Characterization from Optical Spectra of Perovskite Thin Films

    Authors: Siyu Isaac Parker Tian, Zekun Ren, Selvaraj Venkataraj, Yuanhang Cheng, Daniil Bash, Felipe Oviedo, J. Senthilnath, Vijila Chellappan, Yee-Fun Lim, Armin G. Aberle, Benjamin P MacLeod, Fraser G. L. Parlane, Curtis P. Berlinguette, Qianxiao Li, Tonio Buonassisi, Zhe Liu

    Abstract: Transfer learning increasingly becomes an important tool in handling data scarcity often encountered in machine learning. In the application of high-throughput thickness as a downstream process of the high-throughput optimization of optoelectronic thin films with autonomous workflows, data scarcity occurs especially for new materials. To achieve high-throughput thickness characterization, we propo…

    Submitted 20 December, 2022; v1 submitted 14 June, 2022; originally announced July 2022.

  46. arXiv:2206.12060  [pdf, other]

    eess.SP

    Dual Power Spectrum Manifold and Toeplitz HPD Manifold: Enhancement and Analysis for Matrix CFAR Detection

    Authors: Hao Wu, Yongqiang Cheng, Xixi Chen, Zheng Yang, Xiang Li, Hongqiang Wang

    Abstract: Recently, an innovative matrix CFAR detection scheme based on information geometry, also referred to as the geometric detector, has been developed speedily and exhibits distinct advantages in several practical applications. These advantages benefit from the geometry of the Toeplitz Hermitian positive definite (HPD) manifold $\mathcal{M}_{\mathcal{T}H_{++}}$, but the sophisticated geometry also res…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Submitted to IEEE Transactions on Information Theory

  47. arXiv:2206.01344  [pdf]

    eess.IV cs.CV cs.LG

    Detecting Pulmonary Embolism from Computed Tomography Using Convolutional Neural Network

    Authors: Chia-Hung Yang, Yun-Chien Cheng, Chin Kuo

    Abstract: The clinical symptoms of pulmonary embolism (PE) are very diverse and non-specific, which makes it difficult to diagnose. In addition, pulmonary embolism has multiple triggers and is one of the major causes of vascular death. Therefore, if it can be detected and treated quickly, it can significantly reduce the risk of death in hospitalized patients. In the detection process, the cost of computed t…

    Submitted 2 June, 2022; originally announced June 2022.

  48. arXiv:2205.08106  [pdf]

    eess.IV cs.CV cs.LG

    Computerized Tomography Pulmonary Angiography Image Simulation using Cycle Generative Adversarial Network from Chest CT imaging in Pulmonary Embolism Patients

    Authors: Chia-Hung Yang, Yun-Chien Cheng, Chin Kuo

    Abstract: The purpose of this research is to develop a system that generates simulated computed tomography pulmonary angiography (CTPA) images clinically for pulmonary embolism diagnoses. Nowadays, CTPA images are the gold standard computerized detection method to determine and identify the symptoms of pulmonary embolism (PE), although performing CTPA is harmful for patients and also expensive. Therefore, w…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 23 pages, 14 figures, 6 tables

  49. A Dual Sensor Computational Camera for High Quality Dark Videography

    Authors: Yuxiao Cheng, Runzhao Yang, Zhihong Zhang, Jinli Suo, Qionghai Dai

    Abstract: Videos captured under low light conditions suffer from severe noise. A variety of efforts have been devoted to image/video noise suppression and made large progress. However, in extremely dark scenarios, extensive photon starvation would hamper precise noise modeling. Instead, developing an imaging system collecting more photons is a more effective way for high-quality video capture under low illu…

    Submitted 11 April, 2022; originally announced April 2022.

    Journal ref: Information Fusion Volume 93, May 2023, Pages 429-440

  50. arXiv:2204.04217  [pdf]

    eess.IV cs.AI cs.CV

    Feature-enhanced Adversarial Semi-supervised Semantic Segmentation Network for Pulmonary Embolism Annotation

    Authors: Ting-Wei Cheng, Jerry Chang, Ching-Chun Huang, Chin Kuo, Yun-Chien Cheng

    Abstract: This study established a feature-enhanced adversarial semi-supervised semantic segmentation model to automatically annotate pulmonary embolism lesion areas in computed tomography pulmonary angiogram (CTPA) images. In current studies, all of the PE CTPA image segmentation methods are trained by supervised learning. However, the supervised learning models need to be retrained and the images need to…

    Submitted 8 April, 2022; originally announced April 2022.