Computer Science > Sound

arXiv:2203.15643 (cs)

[Submitted on 29 Mar 2022 (v1), last revised 5 Nov 2022 (this version, v2)]

Title: Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation

Authors: Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti

Abstract: Several solutions for lightweight TTS have shown promising results. Still, they either rely on a hand-crafted design that reaches a non-optimal size or use neural architecture search, which often suffers from high training costs. We present Nix-TTS, a lightweight TTS model obtained by applying knowledge distillation to a high-quality yet large, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model. Specifically, we offer module-wise distillation, enabling flexible and independent distillation of the encoder and decoder modules. The resulting Nix-TTS inherits the advantageous properties of being non-autoregressive and end-to-end from the teacher, yet is significantly smaller, with only 5.23M parameters, a reduction of up to 89.34% relative to the teacher model. It also achieves over 3.04x and 8.36x inference speedup on an Intel-i7 CPU and a Raspberry Pi 3B, respectively, while retaining fair voice naturalness and intelligibility compared to the teacher model. We provide pretrained models and audio samples of Nix-TTS.

Comments:	Accepted at SLT 2022 (this https URL). Associated materials can be seen in this https URL
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
MSC classes:	68T50 (Primary) 68T07, 68T10, 68T99 (Secondary)
ACM classes:	I.2.7; I.2.6; H.5.5
Cite as:	arXiv:2203.15643 [cs.SD]
	(or arXiv:2203.15643v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2203.15643

Submission history

From: Radityo Eko Prasojo
[v1] Tue, 29 Mar 2022 15:04:26 UTC (963 KB)
[v2] Sat, 5 Nov 2022 12:43:44 UTC (2,657 KB)
