GigaSpeech
GigaSpeech is a large multi-domain English speech corpus offering up to 10,000 hours of high-quality transcribed audio from audiobooks, podcasts, and YouTube videos. It covers speech styles ranging from read to spontaneous speech, across a variety of topics. The dataset is designed for automatic speech recognition (ASR) and text-to-speech (TTS) synthesis.
Up to 10,000 hours of transcribed audio, WAV/opus files, various audio segments
Apache 2.0
Description
The GigaSpeech dataset contains a large collection of transcribed English audio gathered from audiobooks, podcasts, and YouTube videos. It is offered in several configurations, ranging from 10 hours (XS) to 10,000 hours (XL), to suit both research and industrial needs. Each audio segment is paired with a text transcript, making it possible to train robust speech recognition and synthesis models.
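As a rough illustration of choosing among the configurations described above, the sketch below picks the largest subset that fits a given budget of audio hours. The XS and XL sizes come from this page; the S, M, and L sizes (250, 1,000, and 2,500 hours) are assumed from the public GigaSpeech release and should be checked against the official documentation. The function name `pick_subset` is hypothetical.

```python
# Subset sizes in hours. XS and XL are stated on this page; S, M and L
# are assumptions taken from the public GigaSpeech release.
SUBSET_HOURS = {"XS": 10, "S": 250, "M": 1000, "L": 2500, "XL": 10000}


def pick_subset(max_hours: float) -> str:
    """Return the name of the largest subset whose size fits within max_hours."""
    fitting = [(hours, name) for name, hours in SUBSET_HOURS.items() if hours <= max_hours]
    if not fitting:
        raise ValueError(f"No subset fits within {max_hours} hours")
    # Tuples compare by hours first, so max() yields the largest fitting subset.
    return max(fitting)[1]


if __name__ == "__main__":
    for budget in (10, 300, 5000, 20000):
        print(budget, "->", pick_subset(budget))
```

The returned name could then be used as the configuration name in whichever loader you rely on, provided that loader exposes the subsets under these names.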
What is this dataset for?
- Train automatic speech recognition (ASR) models in English on large amounts of data.
- Train text-to-speech (TTS) systems on varied, high-quality audio.
- Test and evaluate models across a range of topics and speech styles.
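For the evaluation use case, ASR output is commonly scored with word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch of that metric, not an official GigaSpeech scoring tool; the function name `word_error_rate` is illustrative, and it assumes both strings have already been normalized the same way (casing, punctuation).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the bat sat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.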
Can it be enriched or improved?
Yes, the dataset can be supplemented with additional annotations, finer segmentation, or new audio sources. Transcripts can also be adapted to specific use cases, and metadata can be added to support richer downstream applications.
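One example of adapting transcripts is normalizing them before training. The sketch below assumes the transcripts are upper-case text with punctuation written as tags such as `<COMMA>` and non-speech segments marked with tags such as `<MUSIC>`, which is how the public GigaSpeech release is understood to format them; verify the tag set against your copy of the data. The function name `normalize_transcript` is hypothetical.

```python
# Assumed tag conventions; check them against the actual transcripts.
PUNCTUATION_TAGS = {
    "<COMMA>": ",",
    "<PERIOD>": ".",
    "<QUESTIONMARK>": "?",
    "<EXCLAMATIONPOINT>": "!",
}
NON_SPEECH_TAGS = {"<SIL>", "<MUSIC>", "<NOISE>", "<OTHER>"}


def normalize_transcript(text: str, keep_punctuation: bool = True) -> str:
    """Lower-case a transcript, drop non-speech tags, and convert or strip punctuation tags."""
    words = []
    for token in text.split():
        if token in NON_SPEECH_TAGS:
            continue
        if token in PUNCTUATION_TAGS:
            # Attach the punctuation mark to the preceding word, if there is one.
            if keep_punctuation and words:
                words[-1] += PUNCTUATION_TAGS[token]
            continue
        words.append(token.lower())
    return " ".join(words)


if __name__ == "__main__":
    print(normalize_transcript("HELLO <COMMA> HOW ARE YOU <QUESTIONMARK>"))
```

Keeping punctuation may suit TTS text front ends, while stripping it (`keep_punctuation=False`) is a common choice for ASR training targets.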
🔎 In summary
🧠 Recommended for
- Specialists in ASR
- TTS projects
- Audio AI researchers
🔧 Compatible tools
- Kaldi
- ESPnet
- Hugging Face Transformers
- Wav2vec 2.0
- SpeechBrain
💡 Tip
Use the different configurations to match the amount of training data to your compute resources and needs.
Frequently Asked Questions
What are GigaSpeech's main audio sources?
Audiobooks, podcasts, and YouTube videos covering a variety of topics and speaking styles.
Can GigaSpeech be used for text-to-speech (TTS)?
Yes, the dataset is suitable for training text-to-speech models in addition to speech recognition.
Does the dataset contain multiple subset sizes?
Yes, it offers five configurations of different sizes, from 10 hours (XS) to 10,000 hours (XL), to adapt to various uses.