Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2405.13344 (eess)

[Submitted on 22 May 2024]

Title: Contextualized Automatic Speech Recognition with Dynamic Vocabulary

Authors: Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe

Abstract: Deep biasing (DB) improves the performance of end-to-end automatic speech recognition (E2E-ASR) on rare words or contextual phrases by using a bias list. However, most existing methods treat bias phrases as sequences of subwords in a predefined static vocabulary, which can result in ineffective learning of the dependencies between subwords. More advanced techniques address this problem by incorporating additional text data, which increases the overall workload. This paper proposes a dynamic vocabulary in which phrase-level bias tokens can be added during the inference phase. Each bias token represents an entire bias phrase as a single token, thereby eliminating the need to learn the dependencies between the subwords within the bias phrases. This method can be applied to various architectures because it only extends the embedding and output layers common to E2E-ASR architectures. Experimental results demonstrate that the proposed method improves recognition performance on bias phrases on English and Japanese datasets.
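The central mechanism in the abstract, appending one token per bias phrase to the embedding and output layers at inference time, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, the abstract does not say how bias-token embeddings are produced (here they are approximated by mean-pooling the phrase's subword embeddings), the output layer is assumed to be tied to the embeddings, and `_` is an assumed word-boundary marker in the subword vocabulary.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean; a stand-in (assumption) for a learned bias encoder."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


class DynamicVocabulary:
    """Hypothetical static subword vocabulary extended with phrase-level bias tokens."""

    def __init__(self, static_tokens: List[str], static_emb: List[Vector]) -> None:
        self.static_tokens = list(static_tokens)
        self.static_emb = [list(row) for row in static_emb]
        self.index: Dict[str, int] = {t: i for i, t in enumerate(self.static_tokens)}
        self.bias_phrases: List[List[str]] = []  # subword spelling of each bias token
        self.bias_emb: List[Vector] = []         # one embedding row per bias token

    @property
    def size(self) -> int:
        # Static vocabulary size plus the number of bias tokens currently added.
        return len(self.static_tokens) + len(self.bias_phrases)

    def add_bias_phrase(self, subwords: List[str]) -> int:
        """Add a whole bias phrase as ONE new token at inference time; return its id."""
        rows = [self.static_emb[self.index[s]] for s in subwords]
        self.bias_phrases.append(list(subwords))
        self.bias_emb.append(mean_pool(rows))
        return self.size - 1

    def clear_bias(self) -> None:
        """Drop all bias tokens, e.g. before switching to a new bias list."""
        self.bias_phrases.clear()
        self.bias_emb.clear()

    def embed(self, token_id: int) -> Vector:
        # Extended embedding layer: static rows first, bias rows appended after them.
        n = len(self.static_tokens)
        if token_id < n:
            return self.static_emb[token_id]
        return self.bias_emb[token_id - n]

    def logits(self, hidden: Sequence[float]) -> List[float]:
        # Extended output layer with weights tied to the embeddings (assumption),
        # so every bias token is scored alongside the static subwords.
        return [dot(hidden, self.embed(i)) for i in range(self.size)]

    def detokenize(self, token_ids: List[int]) -> str:
        # A bias token expands back into its full subword sequence in one step.
        n = len(self.static_tokens)
        pieces: List[str] = []
        for t in token_ids:
            if t < n:
                pieces.append(self.static_tokens[t])
            else:
                pieces.extend(self.bias_phrases[t - n])
        # '_' is treated as an assumed word-boundary marker.
        return "".join(pieces).replace("_", " ").strip()


if __name__ == "__main__":
    vocab = DynamicVocabulary(
        ["_the", "_shin", "ji", "_lab"],
        [[1.0, 0.0], [0.0, 1.0], [0.0, 3.0], [1.0, 1.0]],
    )
    bias_id = vocab.add_bias_phrase(["_shin", "ji"])
    print(bias_id, vocab.size)                # 4 5
    print(vocab.logits([0.0, 1.0]))           # [0.0, 1.0, 3.0, 1.0, 2.0]
    print(vocab.detokenize([0, bias_id, 3]))  # the shinji lab
```

Because the phrase `_shin ji` occupies a single output index, a decoder can emit it in one step instead of two, and clearing and re-adding bias tokens per utterance is what makes the vocabulary dynamic while the static part stays untouched. Note that with tied, mean-pooled embeddings a bias token's score is the average of its subwords' scores and so can never exceed the best of them; this stand-in therefore illustrates only the bookkeeping of the extended layers, not how the proposed method scores bias phrases.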

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2405.13344 [eess.AS]
	(or arXiv:2405.13344v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2405.13344

Submission history

From: Yui Sudo
[v1] Wed, 22 May 2024 05:03:39 UTC (987 KB)