Showing 1–31 of 31 results for author: Jin, Q

  1. arXiv:2406.10911  [pdf, other]

    cs.SD eess.AS

    SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

    Authors: Yuxun Tang, Jiatong Shi, Yuning Wu, Qin Jin

    Abstract: In speech generation tasks, human subjective ratings, usually referred to as the opinion score, are considered the "gold standard" for speech quality evaluation, with the mean opinion score (MOS) serving as the primary evaluation metric. Due to the high cost of human annotation, several MOS prediction systems have emerged in the speech domain, demonstrating good performance. These MOS prediction m…

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  2. arXiv:2406.08905  [pdf, other]

    cs.SD eess.AS

    SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

    Authors: Yuxun Tang, Yuning Wu, Jiatong Shi, Qin Jin

    Abstract: Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation th…

    Submitted 20 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.08416  [pdf, other]

    cs.SD eess.AS

    TokSing: Singing Voice Synthesis based on Discrete Tokens

    Authors: Yuning Wu, Chunlei Zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin Jin

    Abstract: Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis (SVS), achieving higher levels of melody…

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.07725  [pdf, ps, other]

    cs.SD eess.AS

    The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

    Authors: Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin

    Abstract: Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compression and restoration, speech recognition, and speech generation. To foster exploration in this domain, we introduce the Interspeech 2024 Challenge,…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: This manuscript has been accepted by Interspeech2024

  5. arXiv:2406.03688  [pdf, other]

    eess.IV cs.CV

    Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification

    Authors: Benjamin Hou, Qingqing Zhu, Tejas Sudarshan Mathai, Qiao Jin, Zhiyong Lu, Ronald M. Summers

    Abstract: In this paper, we introduce DRR-RATE, a large-scale synthetic chest X-ray dataset derived from the recently released CT-RATE dataset. DRR-RATE comprises 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients. Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes. Given the controllable nature of DRR generation,…

    Submitted 5 June, 2024; originally announced June 2024.

  6. Fast, nonlocal and neural: a lightweight high quality solution to image denoising

    Authors: Yu Guo, Axel Davy, Gabriele Facciolo, Jean-Michel Morel, Qiyu Jin

    Abstract: With the widespread application of convolutional neural networks (CNNs), the traditional model-based denoising algorithms are now outperformed. However, CNNs face two problems. First, they are computationally demanding, which makes their deployment especially difficult for mobile terminals. Second, experimental evidence shows that CNNs often over-smooth regular textures present in images, in contr…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 5 pages. This paper was accepted by IEEE Signal Processing Letters on July 1, 2021

    Journal ref: IEEE Signal Processing Letters, 2021, 28:1515-1519

  7. arXiv:2401.17619  [pdf, ps, other]

    cs.SD eess.AS

    Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

    Authors: Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe

    Abstract: In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatu…

    Submitted 12 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted by Interspeech2024

  8. arXiv:2311.14473  [pdf, other]

    eess.IV cs.CV

    Joint Diffusion: Mutual Consistency-Driven Diffusion Model for PET-MRI Co-Reconstruction

    Authors: Taofeng Xie, Zhuo-Xu Cui, Chen Luo, Huayu Wang, Congcong Liu, Yuanzhi Zhang, Xuemei Wang, Yanjie Zhu, Guoqing Chen, Dong Liang, Qiyu Jin, Yihang Zhou, Haifeng Wang

    Abstract: Positron Emission Tomography and Magnetic Resonance Imaging (PET-MRI) systems can obtain functional and anatomical scans. PET suffers from a low signal-to-noise ratio. Meanwhile, the k-space data acquisition process in MRI is time-consuming. The study aims to accelerate MRI and enhance PET image quality. Conventional approaches involve the separate reconstruction of each modality within PET-MRI sy…

    Submitted 10 July, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  9. arXiv:2311.12581  [pdf, other]

    eess.IV cs.CV

    A Region of Interest Focused Triple UNet Architecture for Skin Lesion Segmentation

    Authors: Guoqing Liu, Yu Guo, Caiying Wu, Guoqing Chen, Barintag Saheya, Qiyu Jin

    Abstract: Skin lesion segmentation is of great significance for skin lesion analysis and subsequent treatment. It is still a challenging task due to the irregular and fuzzy lesion borders, and diversity of skin lesions. In this paper, we propose Triple-UNet to automatically segment skin lesions. It is an organic combination of three UNet architectures with suitable modules. In order to concatenate the first…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 15 pages, 5 figures

  10. arXiv:2309.13571  [pdf, other]

    eess.IV cs.CV

    Matrix Completion-Informed Deep Unfolded Equilibrium Models for Self-Supervised k-Space Interpolation in MRI

    Authors: Chen Luo, Huayu Wang, Taofeng Xie, Qiyu Jin, Guoqing Chen, Zhuo-Xu Cui, Dong Liang

    Abstract: Recently, regularization model-driven deep learning (DL) has gained significant attention due to its ability to leverage the potent representational capabilities of DL while retaining the theoretical guarantees of regularization models. However, most of these methods are tailored for supervised learning scenarios that necessitate fully sampled labels, which can pose challenges in practical MRI app…

    Submitted 24 September, 2023; originally announced September 2023.

  11. arXiv:2309.09250  [pdf, other]

    cs.CV eess.IV

    Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems

    Authors: Huayu Wang, Chen Luo, Taofeng Xie, Qiyu Jin, Guoqing Chen, Zhuo-Xu Cui, Dong Liang

    Abstract: Recently, data-driven techniques have demonstrated remarkable effectiveness in addressing challenges related to MR imaging inverse problems. However, these methods still exhibit certain limitations in terms of interpretability and robustness. In response, we introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm. CLEAR represents a fusion…

    Submitted 17 September, 2023; originally announced September 2023.

  12. arXiv:2308.09370  [pdf, other]

    cs.CL cs.SD eess.AS

    TrOMR: Transformer-Based Polyphonic Optical Music Recognition

    Authors: Yixuan Li, Huaping Liu, Qiang Jin, Miaomiao Cai, Peng Li

    Abstract: Optical Music Recognition (OMR) is an important technology in music and has been researched for a long time. Previous approaches for OMR are usually based on CNN for image understanding and RNN for music symbol classification. In this paper, we propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a novel con…

    Submitted 18 August, 2023; originally announced August 2023.

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  13. arXiv:2308.02867  [pdf, other]

    cs.SD eess.AS

    A Systematic Exploration of Joint-training for Singing Voice Synthesis

    Authors: Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin Jin

    Abstract: There has been a growing interest in using end-to-end acoustic models for singing voice synthesis (SVS). Typically, these models require an additional vocoder to transform the generated acoustic features into the final waveform. However, since the acoustic model and the vocoder are not jointly optimized, a gap can exist between the two models, leading to suboptimal performance. Although a similar…

    Submitted 5 August, 2023; originally announced August 2023.

  14. arXiv:2303.08607  [pdf, other]

    cs.SD eess.AS

    PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor

    Authors: Yuning Wu, Jiatong Shi, Tao Qian, Dongji Gao, Qin Jin

    Abstract: Singing voice synthesis (SVS), as a specific task for generating the vocal singing voice from a music score, has drawn much attention in recent years. SVS faces the challenge that singing allows considerable pronunciation flexibility for the same music score. Most previous SVS works cannot properly handle the misalignment between the music score and actual singing. In this paper, we…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  15. arXiv:2205.04029  [pdf, other]

    cs.SD cs.MM eess.AS

    Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

    Authors: Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin

    Abstract: This paper introduces a new open-source platform named Muskits for end-to-end music processing, which mainly focuses on end-to-end singing voice synthesis (E2E-SVS). Muskits supports state-of-the-art SVS models, including RNN SVS, transformer SVS, and XiaoiceSing. The design of Muskits follows the style of widely-used speech processing toolkits, ESPnet and Kaldi, for data preprocessing, training,…

    Submitted 2 July, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: Accepted by Interspeech

  16. arXiv:2203.17001  [pdf, other]

    eess.AS cs.LG cs.SD

    SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy

    Authors: Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin

    Abstract: Deep learning based singing voice synthesis (SVS) systems have been demonstrated to flexibly generate singing with better quality, compared to conventional statistical parametric based methods. However, neural systems are generally data-hungry and have difficulty reaching reasonable singing quality with limited publicly available training data. In this work, we explore different data augmentation…

    Submitted 6 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by INTERSPEECH 2022

  17. arXiv:2203.13032  [pdf, ps, other]

    cs.CV eess.IV

    Multi-modal Emotion Estimation for in-the-wild Videos

    Authors: Liyu Meng, Yuchen Liu, Xiaolong Liu, Zhaopei Huang, Yuan Cheng, Meng Wang, Chuanhe Liu, Qin Jin

    Abstract: In this paper, we briefly introduce our submission to the Valence-Arousal Estimation Challenge of the 3rd Affective Behavior Analysis in-the-wild (ABAW) competition. Our method utilizes the multi-modal information, i.e., the visual and audio information, and employs a temporal encoder to model the temporal context in the videos. Besides, a smooth processor is applied to get more reasonable predict…

    Submitted 31 March, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

  18. arXiv:2203.10678  [pdf, other]

    eess.IV

    SweiNet: Deep Learning Based Uncertainty Quantification for Ultrasound Shear Wave Elasticity Imaging

    Authors: Felix Q. Jin, Lindsey C. Carlson, Helen Feltovich, Timothy J. Hall, Mark L. Palmeri

    Abstract: In ultrasound shear wave elasticity (SWE) imaging, a number of algorithms exist for estimating the shear wave speed (SWS) from spatiotemporal displacement data. However, no method provides a well-calibrated and practical uncertainty metric, hindering SWE's clinical adoption and utility in downstream decision-making. Here, we designed a deep learning SWS estimator that simultaneously outputs a quan…

    Submitted 20 March, 2022; originally announced March 2022.

    Comments: 9 pages, 8 figures

  19. arXiv:2111.00865  [pdf, other]

    cs.CV eess.IV

    MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition

    Authors: Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li

    Abstract: Multimodal emotion recognition study is hindered by the lack of labelled corpora in terms of scale and diversity, due to the high annotation cost and label ambiguity. In this paper, we propose a pre-training model MEmoBERT for multimodal emotion recognition, which learns multimodal joint representations through self-supervised learning from large-scale unlabeled video data that come in sh…

    Submitted 27 October, 2021; originally announced November 2021.

    Comments: 4 pages, 2 figures

  20. GT U-Net: A U-Net Like Group Transformer Network for Tooth Root Segmentation

    Authors: Yunxiang Li, Shuai Wang, Jun Wang, Guodong Zeng, Wenjun Liu, Qianni Zhang, Qun Jin, Yaqi Wang

    Abstract: To achieve an accurate assessment of root canal therapy, a fundamental step is to perform tooth root segmentation on oral X-ray images, in that the position of tooth root boundary is significant anatomy information in root canal therapy evaluation. However, the fuzzy boundary makes the tooth root segmentation very challenging. In this paper, we propose a novel end-to-end U-Net like Group Transform…

    Submitted 29 September, 2021; originally announced September 2021.

    Journal ref: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021

  21. arXiv:2107.06779  [pdf, other]

    cs.CL cs.SD eess.AS

    MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation

    Authors: Jingwen Hu, Yuchen Liu, Jinming Zhao, Qin Jin

    Abstract: Emotion recognition in conversation (ERC) is a crucial component in affective dialogue systems, which helps the system understand users' emotions and generate empathetic responses. However, most works focus on modeling speaker and contextual information primarily on the textual modality or simply leveraging multimodal information through feature concatenation. In order to explore a more effective…

    Submitted 14 July, 2021; originally announced July 2021.

  22. Free-form tumor synthesis in computed tomography images via richer generative adversarial network

    Authors: Qiangguo Jin, Hui Cui, Changming Sun, Zhaopeng Meng, Ran Su

    Abstract: The insufficiency of annotated medical imaging scans for cancer makes it challenging to train and validate data-hungry deep learning models in precision oncology. We propose a new richer generative adversarial network for free-form 3D tumor/lesion synthesis in computed tomography (CT) images. The network is composed of a new richer convolutional feature enhanced dilated-gated generator (RicherDG)…

    Submitted 19 April, 2021; originally announced April 2021.

  23. Domain adaptation based self-correction model for COVID-19 infection segmentation in CT images

    Authors: Qiangguo Jin, Hui Cui, Changming Sun, Zhaopeng Meng, Leyi Wei, Ran Su

    Abstract: The capability of generalization to unseen domains is crucial for deep learning models when considering real-world scenarios. However, currently available medical image datasets, such as those for COVID-19 CT images, have large variations of infections and domain shift problems. To address this issue, we propose a prior knowledge driven domain adaptation and a dual-domain enhanced self-correction le…

    Submitted 19 April, 2021; originally announced April 2021.

  24. arXiv:2012.12686  [pdf, other]

    eess.IV math.NA physics.comp-ph

    Adorym: A multi-platform generic x-ray image reconstruction framework based on automatic differentiation

    Authors: Ming Du, Saugat Kandel, Junjing Deng, Xiaojing Huang, Arnaud Demortiere, Tuan Tu Nguyen, Remi Tucoulou, Vincent De Andrade, Qiaoling Jin, Chris Jacobsen

    Abstract: We describe and demonstrate an optimization-based x-ray image reconstruction framework called Adorym. Our framework provides a generic forward model, allowing one code framework to be used for a wide range of imaging methods ranging from near-field holography to fly-scan ptychographic tomography. By using automatic differentiation for optimization, Adorym has the flexibility to refine experime…

    Submitted 22 December, 2020; originally announced December 2020.

    MSC Class: 78-04

  25. arXiv:2012.02278  [pdf, other]

    eess.IV cs.CV cs.LG

    Multiscale Attention Guided Network for COVID-19 Diagnosis Using Chest X-ray Images

    Authors: Jingxiong Li, Yaqi Wang, Shuai Wang, Jun Wang, Jun Liu, Qun Jin, Lingling Sun

    Abstract: Coronavirus disease 2019 (COVID-19) is one of the most destructive pandemics since the millennium, forcing the world to tackle a health crisis. Automated lung infection classification using chest X-ray (CXR) images could strengthen diagnostic capability when handling COVID-19. However, classifying COVID-19 from pneumonia cases using CXR images is a difficult task because of shared spatial characteristi…

    Submitted 8 January, 2021; v1 submitted 11 November, 2020; originally announced December 2020.

  26. arXiv:2011.10260  [pdf, other]

    eess.IV cs.CV

    Edge Adaptive Hybrid Regularization Model For Image Deblurring

    Authors: Tingting Zhang, Jie Chen, Caiying Wu, Zhifei He, Tieyong Zeng, Qiyu Jin

    Abstract: Parameter selection is crucial to regularization-based image restoration methods. Generally speaking, a spatially fixed parameter for the regularization term over the whole image does not perform well for both edge and smooth areas. A larger regularization parameter reduces noise better in smooth areas but blurs edge regions, while a small parameter sharpens edges but causes residual noise. I…

    Submitted 6 April, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

  27. arXiv:2010.12024  [pdf, other]

    eess.AS cs.LG cs.SD

    Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss

    Authors: Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, Qin Jin

    Abstract: Neural network (NN) based singing voice synthesis (SVS) systems require sufficient data to train well and are prone to over-fitting due to data scarcity. However, we often encounter data limitation problems in building SVS systems because of high data acquisition and annotation costs. In this work, we propose a Perceptual Entropy (PE) loss derived from a psycho-acoustic hearing model to regular…

    Submitted 26 February, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: Accepted by ICASSP2021

  28. arXiv:2009.02598  [pdf, other]

    eess.AS cs.MM cs.SD

    Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching

    Authors: Jingjun Liang, Ruichen Li, Qin Jin

    Abstract: Automatic emotion recognition is an active research topic with a wide range of applications. Due to the high manual annotation cost and inevitable label ambiguity, the development of emotion recognition datasets is limited in both scale and quality. Therefore, one of the key challenges is how to build effective models with limited data resources. Previous works have explored different approaches to ta…

    Submitted 5 September, 2020; originally announced September 2020.

    Comments: 10 pages, 5 figures, to be published in ACM Multimedia 2020

  29. arXiv:2008.08647  [pdf, other]

    eess.AS cs.SD

    Context-aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training

    Authors: Jiatong Shi, Nan Huo, Qin Jin

    Abstract: Mispronunciation detection is an essential component of the Computer-Assisted Pronunciation Training (CAPT) systems. State-of-the-art mispronunciation detection models use Deep Neural Networks (DNN) for acoustic modeling, and a Goodness of Pronunciation (GOP) based algorithm for pronunciation scoring. However, GOP based scoring models have two major limitations: i.e., (i) They depend on forced a…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: Accepted by Interspeech2020

  30. arXiv:2007.02017  [pdf, other]

    cs.CV eess.IV

    FracBits: Mixed Precision Quantization via Fractional Bit-Widths

    Authors: Linjie Yang, Qing Jin

    Abstract: Model quantization helps to reduce model size and latency of deep neural networks. Mixed precision quantization is favorable with customized hardware supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation constraints and model sizes. During the optimizat…

    Submitted 2 December, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

    Comments: Accepted by AAAI 2021

  31. arXiv:2004.11577  [pdf, other]

    eess.IV cs.CV

    A Review of an Old Dilemma: Demosaicking First, or Denoising First?

    Authors: Qiyu Jin, Gabriele Facciolo, Jean-Michel Morel

    Abstract: Image denoising and demosaicking are the most important early stages in digital camera pipelines. They constitute a severely ill-posed problem that aims at reconstructing a full color image from a noisy color filter array (CFA) image. In most of the literature, denoising and demosaicking are treated as two independent problems, without considering their interaction, or asking which should be appli…

    Submitted 24 April, 2020; originally announced April 2020.