GigaSpeech

GigaSpeech is a multi-domain English speech corpus of up to 10,000 hours of high-quality transcribed audio drawn from audiobooks, podcasts, and YouTube videos. It covers speech styles from read to spontaneous, across a variety of topics. The dataset is designed for automatic speech recognition (ASR) and text-to-speech (TTS).

Size

Up to 10,000 hours of transcribed audio in WAV/Opus files, split into audio segments

License

Apache 2.0

Description

GigaSpeech contains a large collection of transcribed English audio collected from audiobooks, podcasts, and YouTube videos. It offers several configurations, ranging from 10 hours (XS) to 10,000 hours (XL), to suit both research and industrial needs. Each audio segment comes with an accurate text transcript, which makes it possible to train robust speech recognition and synthesis models.
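
To get a feel for what these sizes mean on disk, here is a rough back-of-the-envelope sketch. It assumes uncompressed 16 kHz, 16-bit, mono PCM, which is an assumption made for illustration and not a figure stated on this page. The function name `wav_size_gb` is hypothetical.

```python
def wav_size_gb(hours, sample_rate=16_000, bytes_per_sample=2, channels=1):
    """Approximate size in GB (10**9 bytes) of uncompressed PCM audio.

    The defaults (16 kHz, 16-bit, mono) are illustrative assumptions.
    """
    seconds = hours * 3600
    total_bytes = seconds * sample_rate * bytes_per_sample * channels
    return total_bytes / 1e9


# The two sizes quoted on this page: XS (10 hours) and XL (10,000 hours).
for name, hours in [("XS", 10), ("XL", 10_000)]:
    print(f"{name}: {hours} h is about {wav_size_gb(hours):,.1f} GB as raw PCM")
```

Under these assumptions the smallest configuration is a little over 1 GB and the largest is over 1 TB. Opus-compressed audio would be considerably smaller, so treat the result as an upper bound for planning storage.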

What is this dataset for?

  • Train automatic speech recognition (ASR) models in English on large amounts of data.
  • Train text-to-speech (TTS) systems on varied, high-quality audio.
  • Test and evaluate models in various thematic areas and speech styles.
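
For the evaluation use case, a common ASR metric is word error rate (WER). This page does not name a metric, so the following is only a minimal sketch of one reasonable choice: a word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In practice the reference and hypothesis should be normalized the same way (casing, punctuation) before scoring, otherwise formatting differences are counted as errors.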

Can it be enriched or improved?

Yes. The dataset can be supplemented with additional annotations, finer segmentation, or new audio sources. Transcripts can also be adapted for specific use cases, and extra metadata can be added to enrich downstream applications.
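
As one example of adapting transcripts, the sketch below lowercases the text and handles the punctuation and non-speech tags used in GigaSpeech transcripts. The tag lists reflect the GigaSpeech release as we understand it and are not stated on this page, so verify them against the copy you download. The function name `normalize_transcript` and the sample sentence are hypothetical.

```python
# Tag sets are an assumption based on the GigaSpeech release; check them
# against the transcripts in the version you actually download.
PUNCTUATION_TAGS = {
    "<COMMA>": ",",
    "<PERIOD>": ".",
    "<QUESTIONMARK>": "?",
    "<EXCLAMATIONPOINT>": "!",
}
NON_SPEECH_TAGS = {"<SIL>", "<MUSIC>", "<NOISE>", "<OTHER>"}


def normalize_transcript(text, keep_punctuation=False):
    """Lowercase a transcript, drop non-speech tags, and either drop the
    punctuation tags or attach them to the preceding word."""
    words = []
    for token in text.split():
        if token in NON_SPEECH_TAGS:
            continue
        if token in PUNCTUATION_TAGS:
            if keep_punctuation and words:
                words[-1] += PUNCTUATION_TAGS[token]
            continue
        words.append(token.lower())
    return " ".join(words)


sample = "HELLO <COMMA> HOW ARE YOU <QUESTIONMARK> <NOISE>"
print(normalize_transcript(sample))                         # hello how are you
print(normalize_transcript(sample, keep_punctuation=True))  # hello, how are you?
```

Dropping punctuation suits ASR training and scoring, while keeping it may be preferable for TTS, where punctuation carries prosodic cues.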

🔎 In summary

Criterion | Evaluation
🧩 Ease of Use | ⭐⭐⭐☆☆ (Requires handling large volumes and varied formats)
🧼 Cleaning Required | ⭐⭐⭐☆☆ (Moderate: quality control recommended depending on audio sources)
🏷️ Annotation Richness | ⭐⭐⭐☆☆ (Accurate text transcriptions, few additional annotations)
📜 Commercial License | ✅ Free and commercial (Apache 2.0)
👨‍💻 Ideal for Beginners | ⚠️ Recommended for users with audio experience
🔁 Reusable for Fine-Tuning | 🔥 Excellent for ASR and TTS fine-tuning
🌍 Cultural Diversity | 🌐 English only, multi-domain

🧠 Recommended for

  • Specialists in ASR
  • TTS projects
  • Audio AI researchers

🔧 Compatible tools

  • Kaldi
  • ESPnet
  • Hugging Face Transformers
  • Wav2vec 2.0
  • SpeechBrain

💡 Tip

Use the different configurations to match the amount of data to your resources and needs.
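
A minimal sketch of that choice: given a budget in hours of audio you can afford to process, pick the largest configuration that fits. The XS and XL sizes come from this page; the S, M, and L sizes are the ones listed in the GigaSpeech release as we understand it and should be checked against the version you download. The names `SUBSET_HOURS` and `pick_subset` are hypothetical.

```python
# Approximate hours per configuration. XS and XL are quoted on this page;
# S, M and L are assumptions taken from the GigaSpeech release.
SUBSET_HOURS = {"XS": 10, "S": 250, "M": 1_000, "L": 2_500, "XL": 10_000}


def pick_subset(max_hours):
    """Return the largest configuration whose size fits within max_hours."""
    fitting = [name for name, hours in SUBSET_HOURS.items() if hours <= max_hours]
    if not fitting:
        raise ValueError(f"no configuration fits in {max_hours} hours")
    return max(fitting, key=SUBSET_HOURS.get)


print(pick_subset(500))     # S
print(pick_subset(10_000))  # XL
```

Starting with a small configuration to validate a pipeline end to end, then scaling up, is usually cheaper than debugging on the full corpus.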

Frequently Asked Questions

What are GigaSpeech's main audio sources?

Audiobooks, podcasts, and YouTube videos covering a variety of topics and speaking styles.

Can GigaSpeech be used for text-to-speech (TTS)?

Yes, the dataset is suitable for training text-to-speech models in addition to speech recognition.

Does the dataset contain multiple subset sizes?

Yes, it offers five configurations (XS, S, M, L, and XL), ranging from 10 hours to 10,000 hours, to suit different uses.
