ChatTTS: Revolutionary Conversational Text-to-Speech Model
Explore a cutting-edge text-to-speech model tailored for everyday dialogue, with advanced features in both English and Chinese
ChatTTS Features
Conversational TTS
ChatTTS is optimized for dialogue-based tasks, enabling natural and expressive speech synthesis. It supports multiple speakers, facilitating interactive conversations.
Fine-grained Control
The model can predict and control fine-grained prosodic features, including laughter, pauses, and interjections, providing a more natural speech output.
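As an illustration of this fine-grained control, ChatTTS accepts markup tokens embedded directly in the input text. The token names used below (such as `[laugh]` and `[uv_break]`) follow the project's README and may vary between releases; the pure helper here only demonstrates the markup shape, not the real inference API.

```python
# Hedged illustration: ChatTTS reads control tokens embedded in the input
# text. Token names like [laugh] and [uv_break] follow the project README
# and may differ by version; this helper only shows the markup shape.
def add_pause_tokens(text: str, token: str = "[uv_break]") -> str:
    """Insert a pause token between comma-delimited clauses."""
    parts = [p.strip() for p in text.split(",")]
    return f" {token} ".join(parts)

marked = add_pause_tokens("Well, that is quite funny [laugh]")
# → "Well [uv_break] that is quite funny [laugh]"
```

The marked-up string would then be passed to the model as ordinary input text, letting the user place pauses and laughter exactly where they are wanted.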
ChatTTS Specifications
Dataset and Model Details
The main model was trained on more than 100,000 hours of Chinese and English audio. An open-source version, pretrained on 40,000 hours of audio without supervised fine-tuning (SFT), is available on HuggingFace.
Roadmap
Future plans for ChatTTS include open-sourcing the 40k-hour base model and the spk_stats file, streaming audio generation, and multi-emotion control. A ChatTTS.cpp port is also planned, pointing toward broader compatibility and integration.
ChatTTS Usage Instructions
Installation and Setup
To get started with ChatTTS, clone the repository from GitHub and install the required packages. The project can be installed directly via pip or from conda, with optional extra packages for NVIDIA GPU users.
Basic and Advanced Usage
Basic usage involves importing the ChatTTS module, loading the model, and running inference to convert text to speech. Advanced usage offers finer control, including sampling a random speaker, adjusting the sampling temperature, and manual control at the sentence and word level.
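The basic and advanced flows described above can be sketched as follows. The class and method names (`Chat`, `load`, `infer`, `sample_random_speaker`, `InferCodeParams`) follow the project's README and may differ between releases, so treat this as an illustrative sketch rather than a definitive API reference.

```python
# Sketch of basic + advanced ChatTTS usage (names per the project README;
# they may change between versions). Assumes `pip install ChatTTS` has run.
def synthesize(texts, out_prefix="out"):
    import ChatTTS
    import torch
    import torchaudio

    chat = ChatTTS.Chat()
    chat.load(compile=False)  # compile=True trades startup time for faster inference

    # Advanced control: fix a random speaker and lower the sampling temperature
    # for more stable, less varied output.
    spk = chat.sample_random_speaker()
    params = ChatTTS.Chat.InferCodeParams(spk_emb=spk, temperature=0.3)

    wavs = chat.infer(texts, params_infer_code=params)  # list of mono waveforms
    for i, wav in enumerate(wavs):
        torchaudio.save(f"{out_prefix}_{i}.wav",
                        torch.from_numpy(wav).unsqueeze(0), 24000)
    return wavs

# synthesize(["Hello, welcome to ChatTTS!"])  # uncomment once the model is installed
```

Keeping the heavy imports inside the function means the snippet can be read (and the helper reused) without the model weights downloaded.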
ChatTTS Customer Service Details
Contact Information
For formal inquiries about the model and roadmap, users can contact the team at [email protected]. Additionally, there are multiple online chat options available, including QQ groups for Chinese users and a Discord server for global community interaction.
ChatTTS Common Issues and Problems
Model Stability and Performance
Some users may experience stability issues, such as the voice drifting toward another speaker mid-utterance or poor audio quality. These issues are common with autoregressive models and can be mitigated by generating multiple samples for the same text and selecting the best result.
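The "try multiple samples" mitigation can be sketched as a best-of-N loop. The scoring heuristic below is hypothetical (a simple duration-times-loudness check to penalize near-silent or truncated outputs); in practice one might score candidates with an ASR pass or listen manually.

```python
# Hedged sketch of the best-of-N mitigation: generate several candidates
# for the same text and keep the one a (hypothetical) score prefers.
import numpy as np

def best_of_n(infer, text, n=4, sample_rate=24000):
    def score(wav):
        # Hypothetical heuristic: penalize near-silent or truncated outputs.
        duration = len(wav) / sample_rate
        loudness = float(np.abs(wav).mean())
        return duration * loudness

    candidates = [infer(text) for _ in range(n)]
    return max(candidates, key=score)

# Usage with a stand-in `infer` for illustration; a real call would wrap
# the model's inference (e.g. one ChatTTS generation per candidate).
fake_infer = lambda t: np.random.default_rng().standard_normal(24000) * 0.1
wav = best_of_n(fake_infer, "hello")
```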
Control Over Emotions
Currently, the released model allows for control over laughter, breaks, and intonation. Future versions may include more emotional control capabilities.