lucataco / xtts-v2

Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 7 seconds.

Readme

This model expects at least 6 seconds of reference audio for the voice you want to clone.

Note: Don't include spaces in your input audio file name.
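
The two requirements above can be checked locally before submitting a prediction. The following is a minimal sketch, not part of this model's code: the helper names (`safe_audio_name`, `wav_duration_seconds`, `check_reference_audio`) are hypothetical, and it assumes the reference clip is an uncompressed WAV file so that Python's standard `wave` module can read it.

```python
import wave
from pathlib import Path

MIN_SECONDS = 6.0  # minimum reference length stated in this README


def safe_audio_name(name):
    """Return the file name with runs of whitespace replaced by underscores."""
    return "_".join(name.split())


def wav_duration_seconds(path):
    """Duration of a WAV file in seconds (assumes an uncompressed WAV)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_audio(path):
    """Return a list of problems found; an empty list means the clip looks fine."""
    problems = []
    if any(ch.isspace() for ch in Path(path).name):
        problems.append("file name contains spaces")
    if wav_duration_seconds(path) < MIN_SECONDS:
        problems.append("audio is shorter than 6 seconds")
    return problems
```

For example, `safe_audio_name("my voice sample.wav")` gives `my_voice_sample.wav`. Other formats such as MP3 would need a different duration check.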

About

XTTS-v2: the open foundation speech model by Coqui 🐸

Language Settings:
English: en 🇺🇸
French: fr 🇫🇷
German: de 🇩🇪
Spanish: es 🇪🇸
Italian: it 🇮🇹
Portuguese: pt 🇵🇹
Czech: cs 🇨🇿
Polish: pl 🇵🇱
Russian: ru 🇷🇺
Dutch: nl 🇳🇱
Turkish: tr 🇹🇷
Arabic: ar 🇦🇪
Mandarin Chinese: zh-cn 🇨🇳
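
The language settings above can be kept in a small lookup table when preparing inputs. This is an illustrative sketch only: `LANGUAGE_CODES` and `language_code` are hypothetical names, not part of the model's interface, and the table contains only the codes listed on this page (Hindi is left out because its code is not given here).

```python
# Codes copied from the language settings listed in this README.
LANGUAGE_CODES = {
    "English": "en",
    "French": "fr",
    "German": "de",
    "Spanish": "es",
    "Italian": "it",
    "Portuguese": "pt",
    "Czech": "cs",
    "Polish": "pl",
    "Russian": "ru",
    "Dutch": "nl",
    "Turkish": "tr",
    "Arabic": "ar",
    "Mandarin Chinese": "zh-cn",
}


def language_code(name_or_code):
    """Accept a language name or a code and return the listed code."""
    key = name_or_code.strip().lower()
    by_name = {name.lower(): code for name, code in LANGUAGE_CODES.items()}
    if key in by_name:
        return by_name[key]
    if key in LANGUAGE_CODES.values():
        return key
    raise ValueError(f"unsupported language: {name_or_code!r}")
```

Calling `language_code("Turkish")` or `language_code("TR")` returns `tr`, and an unlisted language raises `ValueError` rather than passing through silently.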

Changelog

11/28/23 - Added Hindi support