Showing 1–50 of 69 results for author: Geng, M

  1. arXiv:2407.06310  [pdf, other]

    cs.SD cs.AI cs.HC cs.LG eess.AS

    Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

    Authors: Mengzhe Geng, Xurong Xie, Jiajun Deng, Zengrui Jin, Guinan Li, Tianzi Wang, Shujie Hu, Zhaoqing Li, Helen Meng, Xunying Liu

    Abstract: The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: In submission to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  2. arXiv:2406.10160  [pdf, other]

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.10152  [pdf, other]

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.10034  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s…

    Submitted 16 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  5. arXiv:2406.08911  [pdf, other]

    cs.CL eess.AS

    An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

    Authors: Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi

    Abstract: Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  6. arXiv:2405.19323  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Are Large Language Models Chameleons?

    Authors: Mingmeng Geng, Sihong He, Roberto Trotta

    Abstract: Do large language models (LLMs) have their own worldviews and personality tendencies? Simulations in which an LLM was asked to answer subjective questions were conducted more than 1 million times. Comparison of the responses from different LLMs with real data from the European Social Survey (ESS) suggests that the effect of prompts on bias and variability is fundamental, highlighting major cultura…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 16 pages, 8 figures

  7. arXiv:2405.07639  [pdf]

    astro-ph.EP physics.geo-ph

    Unveiling the Magmatic Architecture Beneath Oceanus Procellarum: Insights from GRAIL Mission Data

    Authors: Meixia Geng, Qingjie Yang, Chaouki Kasmi, J. Kim Welford, Alexander L. Peace

    Abstract: The Oceanus Procellarum region, characterized by its vast basaltic plains and pronounced volcanic activity, serves as a focal point for understanding the volcanic history of the Moon. Leveraging the Gravity Recovery and Interior Laboratory (GRAIL) mission data, we imaged the magmatic structures beneath the Oceanus Procellarum region. Our 3D density models uncover pronounced linear magmatic structu…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 30 pages, 6 figures, and 1 table

  8. arXiv:2405.05814  [pdf]

    eess.IV cs.CV

    MSDiff: Multi-Scale Diffusion Model for Ultra-Sparse View CT Reconstruction

    Authors: Pinhuang Tan, Mengxiao Geng, Jingya Lu, Liu Shi, Bin Huang, Qiegen Liu

    Abstract: Computed Tomography (CT) technology reduces radiation hazards to the human body through sparse sampling, but fewer sampling angles pose challenges for image reconstruction. Score-based generative models are widely used in sparse-view CT reconstruction, but performance diminishes significantly with a sharp reduction in projection angles. Therefore, we propose an ultra-sparse view CT reconstruction me…

    Submitted 9 May, 2024; originally announced May 2024.

  9. arXiv:2405.05763  [pdf]

    cs.CV cs.AI

    DP-MDM: Detail-Preserving MR Reconstruction via Multiple Diffusion Models

    Authors: Mengxiao Geng, Jiahao Zhu, Xiaolin Zhu, Qiqing Liu, Dong Liang, Qiegen Liu

    Abstract: Detail features of magnetic resonance images play a crucial role in accurate medical diagnosis and treatment, as they capture subtle changes that pose challenges for doctors when performing precise judgments. However, the widely utilized naive diffusion model has limitations, as it fails to accurately capture more intricate details. To enhance the quality of MRI reconstruction, we propose a com…

    Submitted 9 May, 2024; originally announced May 2024.

  10. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2404.08627  [pdf, other]

    cs.CL cs.AI cs.DL cs.LG

    Is ChatGPT Transforming Academics' Writing Style?

    Authors: Mingmeng Geng, Roberto Trotta

    Abstract: Based on one million arXiv papers submitted from May 2018 to January 2024, we assess the textual density of ChatGPT's writing style in their abstracts by means of a statistical analysis of word frequency changes. Our model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a careful noise analysis. We find that ChatGPT is having an incr…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 15 pages, 19 figures

  12. arXiv:2403.03382  [pdf, other]

    cs.AI

    Adaptive Discovering and Merging for Incremental Novel Class Discovery

    Authors: Guangyao Chen, Peixi Peng, Yangru Huang, Mengyue Geng, Yonghong Tian

    Abstract: One important desideratum of lifelong learning aims to discover novel classes from unlabelled data in a continuous manner. The central challenge is twofold: discovering and learning novel classes while mitigating the issue of catastrophic forgetting of established knowledge. To this end, we introduce a new paradigm called Adaptive Discovering and Merging (ADM) to discover novel categories adaptive…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: AAAI 2024. arXiv admin note: text overlap with arXiv:2207.08605 by other authors

  13. arXiv:2401.00662  [pdf, other]

    cs.SD eess.AS

    Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

    Authors: Huimeng Wang, Zengrui Jin, Mengzhe Geng, Shujie Hu, Guinan Li, Tianzi Wang, Haoning Xu, Xunying Liu

    Abstract: Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL pre-trained ASR models to limited dysarthric speech via data-intensive parameter fine-tuning leads to poor generalization. To this end, this paper presents an ext…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: To appear at IEEE ICASSP 2024

  14. arXiv:2312.11562  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    A Survey of Reasoning with Foundation Models

    Authors: Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, et al. (9 additional authors not shown)

    Abstract: Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring…

    Submitted 25 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: 20 Figures, 160 Pages, 750+ References, Project Page https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models

  15. arXiv:2312.08641  [pdf, other]

    eess.AS cs.SD

    Towards Automatic Data Augmentation for Disordered Speech Recognition

    Authors: Zengrui Jin, Xurong Xie, Tianzi Wang, Mengzhe Geng, Jiajun Deng, Guinan Li, Shujie Hu, Xunying Liu

    Abstract: Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity. This paper presents a reinforcement learning (RL) based on-the-fly data augmentation approach for training state-of-the-art PyChain TDNN and end-to-end Conformer ASR systems on such data. The handcrafted temporal and spectral mask operations in the standard SpecAugment method that are task an…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: To appear at IEEE ICASSP 2024

  16. arXiv:2311.18442  [pdf, other]

    cond-mat.supr-con cond-mat.mtrl-sci cond-mat.str-el

    Electronic Phase Propagation Speed in BaFe$_2$As$_2$ Revealed by Dilatometry

    Authors: Xin Qin, Xingyu Wang, Wenshan Hong, Mengqiao Geng, Yuan Li, Huiqian Luo, Shiliang Li, Yang Liu

    Abstract: Thermal expansion offers deep insights into phase transitions in condensed matter physics. Utilizing an advanced AC-temperature dilatometer with picometer resolution, this study clearly resolves the antiferromagnetic and structural transition in BaFe$_2$As$_2$. The implementation of temperature oscillation reveals a hysteresis near the transition temperature $T_\mathrm{N}$ with unprecedented resol…

    Submitted 26 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  17. arXiv:2311.16641  [pdf, other]

    physics.ins-det cond-mat.mtrl-sci cond-mat.str-el cond-mat.supr-con physics.app-ph

    A High Resolution Dilatometer Using Optical Fiber Interferometer

    Authors: Xin Qin, Guoxin Cao, Mengqiao Geng, Shengchun Liu, Yang Liu

    Abstract: We introduce a high performance differential dilatometer based on an all-fiber Michelson interferometer at cryogenic temperature with $10^{-10}$ resolution in $δL/L$. It resolves the linear thermal expansion coefficient by measuring the oscillating changes of sample thickness and sample temperature with the interferometer and in-situ thermometer, respectively. By measuring the linear thermal expans…

    Submitted 13 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  18. arXiv:2311.09667  [pdf, other]

    cs.DB

    Repetitive nonoverlapping sequential pattern mining

    Authors: Meng Geng, Youxi Wu, Yan Li, Jing Liu, Philippe Fournier-Viger, Xingquan Zhu, Xindong Wu

    Abstract: Sequential pattern mining (SPM) is an important branch of knowledge discovery that aims to mine frequent sub-sequences (patterns) in a sequential database. Various SPM methods have been investigated, and most of them are classical SPM methods, since these methods only consider whether or not a given pattern occurs within a sequence. Classical SPM can only find the common features of sequences, but…

    Submitted 16 November, 2023; originally announced November 2023.

  19. arXiv:2309.10986  [pdf, other]

    math.NA econ.GN

    Research on the Impact of Executive Shareholding on New Investment in Enterprises Based on Multivariable Linear Regression Model

    Authors: Shanyi Zhou, Ning Yan, Zhijun Li, Mo Geng, Xulong Zhang, Hongbiao Si, Lihua Tang, Wenyuan Sun, Longda Zhang, Yi Cao

    Abstract: Based on principal-agent theory and optimal contract theory, companies use the method of increasing executives' shareholding to stimulate collaborative innovation. However, from the aspect of agency costs between management and shareholders (i.e. the first type) and between major shareholders and minority shareholders (i.e. the second type), the interests of management, shareholders and creditors…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted by the 7th APWeb-WAIM International Joint Conference on Web and Big Data. (APWeb 2023)

  20. arXiv:2308.03963  [pdf, other]

    cond-mat.mtrl-sci physics.comp-ph physics.geo-ph

    Influence of electronic entropy on Hellmann-Feynman forces in ab initio molecular dynamics with large temperature changes

    Authors: Ming Geng, Chris E. Mohn

    Abstract: The Z method is a popular atomistic simulation method for determining the melting temperature where a sequence of molecular dynamics runs are carried out to target the lowest system energy where the solid always melts. Homogeneous melting at the limit of critical superheating, Th, is accompanied by a drop in temperature as kinetic energy is converted to potential energy and the equilibrium melting…

    Submitted 20 September, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: 10 figures, 17 pages

  21. arXiv:2308.03018  [pdf, other]

    cs.CV eess.IV

    Recurrent Spike-based Image Restoration under General Illumination

    Authors: Lin Zhu, Yunlong Zheng, Mengyue Geng, Lizhi Wang, Hua Huang

    Abstract: Spike camera is a new type of bio-inspired vision sensor that records light intensity in the form of a spike array with high temporal resolution (20,000 Hz). This new paradigm of vision sensor offers significant advantages for many vision tasks such as high speed image reconstruction. However, existing spike-based approaches typically assume that the scenes are with sufficient light intensity, whi…

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  22. arXiv:2307.05444  [pdf, other]

    cond-mat.mtrl-sci physics.geo-ph

    Ab initio constraints on silica melting to 500 GPa

    Authors: Ming Geng, Chris E. Mohn

    Abstract: The melting curve of pure silica (SiO$_2$) was determined using {\it ab initio} density functional theory together with the solid-liquid coexisting approach, thermodynamic integration and the Z method. The melting curves are consistent with a smooth slow increase in a large region from 50 GPa (dT/dP $\approx$ 15 K/GPa) to about 500 GPa (dT/dP $\approx$ 5 K/GPa) without any abrupt changes at around…

    Submitted 18 November, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

  23. arXiv:2307.02909  [pdf, other]

    eess.AS cs.AI cs.SD

    Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

    Authors: Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu

    Abstract: Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all system components is pro…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  24. arXiv:2306.15265  [pdf, other]

    eess.AS cs.LG

    Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition

    Authors: Tianzi Wang, Shoukang Hu, Jiajun Deng, Zengrui Jin, Mengzhe Geng, Yi Wang, Helen Meng, Xunying Liu

    Abstract: Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to data scarcity. Parameter fine-tuning is often used to exploit the large quantities of non-aged and healthy speech pre-trained models, while neural architecture hyper-parameters are set using expert knowledge and remain unchanged. This paper investigates hyper-parameter adaptation for Conformer AS…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 5 pages, 3 figures, 3 tables, accepted by Interspeech2023

  25. arXiv:2306.14608  [pdf, other]

    eess.AS cs.CL

    Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

    Authors: Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu

    Abstract: Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies. To model both speaker and environment level diversity, this paper proposes a novel Bayesian factorised speaker-environment adaptive training and test time adaptation approach for Conformer ASR models. Speaker and environment level characteristics are separately mo…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  26. arXiv:2306.06564   

    quant-ph

    Guarding Quantum Key Distribution with integrated Magnetic-free Nonreciprocal Structures

    Authors: Qiang Liu, Yinming Huang, Tingting Luo, Chunfeng Huang, Minming Geng, Zhenrong Zhang, Kejin Wei

    Abstract: Inserting nonreciprocal devices at the doorways of Alice and Bob is a widely recognized countermeasure against quantum hacking attacks in quantum key distribution (QKD) systems. However, traditional integrated nonreciprocal devices, which are typically based on magneto-optical effects, face challenges in compatibility with current semiconductor integration technology. As a result, earlier chip-bas…

    Submitted 4 August, 2023; v1 submitted 10 June, 2023; originally announced June 2023.

    Comments: We have found that the presented structure is a mode convertor which is suitable for guarding quantum key distribution

  27. arXiv:2305.10659  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Use of Speech Impairment Severity for Dysarthric Speech Recognition

    Authors: Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu

    Abstract: A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity. Most prior research on addressing this issue focused on using speaker-identity only. To this end, this paper proposes a novel set of techniques to use both severity and speaker-identity in dysarthric speech recognit…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH2023

  28. arXiv:2304.11384  [pdf, other]

    cs.SE

    Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning

    Authors: Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, Xiangke Liao

    Abstract: Code comment generation aims at generating natural language descriptions for a code snippet to facilitate developers' program comprehension activities. Despite being studied for a long time, a bottleneck for existing approaches is that given a code snippet, they can only generate one comment while developers usually need to know information from diverse perspectives such as what is the functionali…

    Submitted 14 June, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

    Comments: Accepted by the 46th International Conference on Software Engineering (ICSE 2024)

  29. arXiv:2302.14564  [pdf, other]

    cs.SD cs.AI eess.AS

    Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

    Abstract: Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to the difficulty in collecting such data in large quantities. This paper explores a series of approaches to integrate domain adapted SSL pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends…

    Submitted 22 June, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: accepted by ICASSP 2023

  30. arXiv:2211.01646  [pdf, other]

    eess.AS cs.SD

    Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

    Authors: Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu

    Abstract: Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of impaired speech required for ASR system development. This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personali…

    Submitted 19 March, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  31. arXiv:2208.13259  [pdf, other]

    cs.CL cs.AI

    Bayesian Neural Network Language Modeling for Speech Recognition

    Authors: Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng

    Abstract: State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. They are prone to overfitting and poor generalization when given limited training data. To this end, an overarching full Bayesian learning framework encompassing three methods is proposed in this paper to account for the u…

    Submitted 28 August, 2022; originally announced August 2022.

  32. arXiv:2206.13232  [pdf, other]

    eess.AS cs.LG cs.SD

    Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection

    Authors: Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng

    Abstract: Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression. This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection. The baseline Conformer system trained with speed perturbation and SpecAugment based data augmentation is significantl…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 5 pages, 1 figure, accepted by INTERSPEECH 2022

  33. arXiv:2206.12045  [pdf, other]

    eess.AS cs.SD

    Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

    Authors: Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng

    Abstract: A key challenge for automatic speech recognition (ASR) systems is to model the speaker level variability. In this paper, compact speaker dependent learning hidden unit contributions (LHUC) are used to facilitate both speaker adaptive training (SAT) and test time unsupervised speaker adaptation for state-of-the-art Conformer based end-to-end ASR systems. The sensitivity during adaptation to supervi…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: It's accepted to INTERSPEECH 2022. arXiv admin note: text overlap with arXiv:2206.11596

  34. Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

    Authors: Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng

    Abstract: Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them. This paper investigates multi-pass rescoring and cross adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, state-of-the-art hybrid LF-MMI trained CNN-TDNN system fea…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: It's accepted to ISCA 2022

  35. arXiv:2206.07327  [pdf, other]

    eess.AS cs.AI

    Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Xunying Liu, Helen Meng

    Abstract: Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech. Their practical application to atypical task domains such as elderly and disordered speech across languages is often limited by the difficulty in collecting such specialist data from target speakers. This pa…

    Submitted 22 June, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: accepted by INTERSPEECH 2023

  36. Two-party secure semiquantum summation against the collective-dephasing noise

    Authors: Tian-Yu Ye, Tian-Jie Xu, Mao-Jie Geng, Ying Chen

    Abstract: In this paper, we propose a two-party semiquantum summation protocol, where two classical users can accomplish the summation of their private binary sequences with the assistance of a quantum semi-honest third party (TP). The term 'semi-honest' implies that TP cannot conspire with others but is able to implement all kinds of attacks. This protocol employs logical qubits as traveling particles to…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: 9 pages, 2 tables

    Journal ref: Quantum Information Processing, 2022, 21:118

  37. Quantum dialogue based on quantum encryption with single photons in both polarization and spatial-mode degrees of freedom

    Authors: Tian-Yu Ye, Mao-Jie Geng, Tian-Jie Xu, Ying Chen

    Abstract: In this paper, a novel information leakage resistant quantum dialogue (QD) protocol with single photons in both polarization and spatial-mode degrees of freedom is proposed, which utilizes quantum encryption technology to overcome the information leakage problem. In the proposed QD protocol, during the transmission process, the single photons in both polarization and spatial-mode degrees of freedo…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: 7 pages

    Journal ref: Scientia Sinica Physica, Mechanica & Astronomica, 2021, 51(10): 100311

  38. Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

    Authors: Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu

    Abstract: Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. It is difficult to collect large quantities of such data for ASR system development due to the mobility issues often found among these users. To this end, data augmentation techniques play a vital role…

    Submitted 23 June, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2202.10290

  39. arXiv:2205.04927  [pdf]

    quant-ph

    Semiquantum private comparison based on Bell states without quantum measurements from the classical user

    Authors: Mao-Jie Geng, Xia Li, Tian-Yu Ye

    Abstract: In this paper, we propose a novel semiquantum private comparison (SQPC) protocol based on Bell states, which enables one quantum user and one classical user to compare the equality of their private inputs with the help of a semi-honest quantum third party (TP). TP is assumed to be semi-honest in the sense that she may take all possible attacks to steal users' private inputs except conspiring with…

    Submitted 25 September, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: 17 pages, 1 figure, 3 tables

  40. arXiv:2203.14593  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition

    Authors: Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu

    Abstract: Accurate recognition of dysarthric and elderly speech remain challenging tasks to date. Speaker-level heterogeneity attributed to accent or gender, when aggregated with age and speech impairment, create large diversity among these speakers. Scarcity of speaker-level data limits the practical use of data-intensive model based speaker adaptation methods. To this end, this paper proposes two novel fo…

    Submitted 28 May, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted to INTERSPEECH 2023

  41. arXiv:2203.10274  [pdf, other]

    eess.AS cs.AI

    Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition

    Authors: Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng

    Abstract: Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech. Their practical application to disordered speech recognition is often limited by the difficulty in collecting such specialist data from impaired speakers. This paper presents a cross-domain acoustic-to-articulatory (…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: accepted by ICASSP 2022

  42. arXiv:2202.10290  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD q-bio.QM

    Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

    Authors: Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng

    Abstract: Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech in recent decades, accurate recognition of dysarthric and elderly speech remains highly challenging tasks to date. Sources of heterogeneity commonly found in normal speech including accent or gender, when further compounded with the variability over age and speech pathology severity level, create…

    Submitted 17 March, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

    Comments: In submission to IEEE/ACM Transactions on Audio Speech and Language Processing

  43. arXiv:2201.05845  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Recent Progress in the CUHK Dysarthric Speech Recognition System

    Authors: Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng

    Abstract: Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date. Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based ASR technologies that predominantly target normal speech. This paper presents recent research efforts at…

    Submitted 26 February, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

  44. arXiv:2201.05562  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Investigation of Data Augmentation Techniques for Disordered Speech Recognition

    Authors: Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng

    Abstract: Disordered speech recognition is a highly challenging task. The underlying neuro-motor conditions of people with speech disorders, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of speech required for system development. This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal t…

    Submitted 14 January, 2022; originally announced January 2022.

    Comments: Proceedings of INTERSPEECH 2020

  45. arXiv:2201.05554  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition

    Authors: Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng

    Abstract: Automatic recognition of disordered speech remains a highly challenging task to date. Sources of variability commonly found in normal speech including accent, age or gender, when further compounded with the underlying causes of speech impairment and varying severity levels, create large diversity among speakers. To this end, speaker adaptation techniques play a vital role in current speech recogni…

    Submitted 14 January, 2022; originally announced January 2022.

    Comments: Proceedings of INTERSPEECH 2021

  46. Semiquantum Private Comparison of Size Relationship Based on d-level Single-Particle States

    Authors: Mao-Jie Geng, Tian-Jie Xu, Ying Chen, Tian-Yu Ye

    Abstract: In this paper, we propose a novel semiquantum private comparison (SQPC) protocol of size relationship based on d-level single-particle states. The designed protocol can compare the size relationship of different privacy messages from two classical users with the help of a semi-honest third party (TP), who is permitted to misbehave on her own but cannot be in collusion with anyone else. The correct…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 12 pages, 2 figures, 2 tables

    Journal ref: Scientia Sinica Physica, Mechanica & Astronomica, 2022, 52(9): 290311

  47. arXiv:2201.03943  [pdf, other]

    eess.AS cs.SD

    Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

    Authors: Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng

    Abstract: State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive. The optimal design of deep neural networks (DNNs) for these systems often require expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of factored time delay neural network…

    Submitted 28 March, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP). arXiv admin note: text overlap with arXiv:2007.08818

  48. Experimental secure quantum key distribution in presence of polarization-dependent loss

    Authors: Chunfeng Huang, Ye Chen, Long Jin, Minming Geng, Junwei Wang, Zhenrong Zhang, Kejin Wei

    Abstract: Quantum key distribution (QKD) is theoretically secure using the principle of quantum mechanics; therefore, QKD is a promising solution for the future of secure communication. Although several experimental demonstrations of QKD have been reported, they have not considered the polarization-dependent loss in state preparation in the key-rate estimation. In this study, we experimentally characterized…

    Submitted 3 January, 2022; originally announced January 2022.

  49. arXiv:2112.14379  [pdf, other]

    cs.CV

    Background-aware Classification Activation Map for Weakly Supervised Object Localization

    Authors: Lei Zhu, Qi She, Qian Chen, Xiangxi Meng, Mufeng Geng, Lujia Jin, Zhe Jiang, Bin Qiu, Yunfei You, Yibao Zhang, Qiushi Ren, Yanye Lu

    Abstract: Weakly supervised object localization (WSOL) relaxes the requirement of dense annotations for object localization by using image-level classification masks to supervise its learning process. However, current WSOL methods suffer from excessive activation of background locations and need post-processing to obtain the localization mask. This paper attributes these issues to the unawareness of backgro…

    Submitted 28 December, 2021; originally announced December 2021.

  50. Single-state multi-party semiquantum key agreement protocol based on multi-particle GHZ entangled states

    Authors: Tian-Jie Xu, Ying Chen, Mao-Jie Geng, Tian-Yu Ye

    Abstract: In this paper, we put forward a novel single-state three-party semiquantum key agreement (SQKA) protocol with three-particle GHZ entangled states first. Different with previous quantum key agreement (QKA) protocols, the proposed single-state three-party SQKA protocol can realize the goal that a quantum party and two classical parties who only possess limited quantum capabilities equally contribute…

    Submitted 30 July, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: 21 pages, 2 figures, 2 tables

    Journal ref: Quantum Information Processing, 2022, 21: 266