LibriSpeech

LibriSpeech is a benchmark audio corpus for automatic speech recognition (ASR). It consists of recordings of public domain books read aloud in English, paired with their text transcriptions.

Size

Approximately 1,000 hours of 16 kHz audio in FLAC format, with associated transcripts in TXT files

Licence

Free for academic and commercial use, under the Creative Commons Attribution 4.0 (CC BY 4.0) license

Description


The LibriSpeech dataset includes:

  • Approximately 1,000 hours of English read speech, sampled at 16 kHz and stored in FLAC format
  • Word-for-word transcripts in TXT format
  • Subsets grouped by how difficult the recordings are to transcribe (clean, other)
  • Source recordings taken from the LibriVox project, with texts from the public domain
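
As a rough illustration of how the audio and transcripts fit together, the sketch below walks an extracted copy of the corpus and pairs each FLAC file with its text. It assumes the layout of the official archives, where each chapter folder holds a `<speaker>-<chapter>.trans.txt` file with one `<utterance-id> <TEXT>` entry per line. The function names and the example ID are made up for illustration.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def parse_transcript_line(line: str) -> Tuple[str, str]:
    """Split '<utterance-id> <TEXT>' into (utterance_id, text)."""
    utterance_id, _, text = line.strip().partition(" ")
    return utterance_id, text


def iter_utterances(root: Path) -> Iterator[Dict[str, object]]:
    """Yield one record per utterance found under an extracted corpus folder."""
    for trans_file in sorted(root.rglob("*.trans.txt")):
        for line in trans_file.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue  # skip blank lines defensively
            utterance_id, text = parse_transcript_line(line)
            # Assumed ID shape: <speaker>-<chapter>-<index>
            speaker_id, chapter_id, _ = utterance_id.split("-")
            yield {
                "id": utterance_id,
                "speaker": speaker_id,
                "chapter": chapter_id,
                # The audio is assumed to sit next to the transcript file.
                "audio": trans_file.parent / f"{utterance_id}.flac",
                "text": text,
            }


# Made-up line in the assumed format, not taken from the corpus.
print(parse_transcript_line("1234-5678-0000 HELLO WORLD"))
```

Calling `iter_utterances(Path("LibriSpeech/dev-clean"))` on a local copy would then yield dictionaries ready to feed into a data loader; the path here is only an example.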

What is this dataset for?


LibriSpeech is widely used for:

  • Training automatic speech recognition (ASR) models
  • Fine-tuning or evaluating pre-trained models such as Whisper, Wav2Vec, or DeepSpeech
  • Research on speech understanding, audio segmentation, or audio-text alignment
  • Improving speech synthesis and interaction technologies
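
Evaluation on LibriSpeech is usually reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model output into the reference, divided by the reference length. The sketch below is a minimal standard-library version. The uppercase normalization is an assumption chosen to match the corpus transcripts, and the function names are illustrative rather than part of any toolkit.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = edit distance / number of reference words."""
    ref_words = reference.upper().split()
    hyp_words = hypothesis.upper().split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.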

Can it be enriched or improved?


Yes. Although it is already well structured, LibriSpeech can be extended to:

  • Add prosodic or phonetic annotations
  • Combine with multilingual corpora for code-switching recognition
  • Create noisy or accented variants to test the robustness of the models
  • Integrate audio-text into multimodal alignment pipelines
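
As one example of the noisy variants mentioned above, the sketch below mixes white Gaussian noise into a waveform at a chosen signal-to-noise ratio (SNR). It assumes the audio has already been decoded into a list of float samples. The function name and the 440 Hz test tone are illustrative, and real robustness tests often use recorded noise rather than white noise.

```python
import math
import random
from typing import List, Optional


def add_white_noise(samples: List[float], snr_db: float,
                    seed: Optional[int] = None) -> List[float]:
    """Return a copy of `samples` with Gaussian noise at roughly `snr_db` dB SNR."""
    if not samples:
        return []
    rng = random.Random(seed)  # seeded for reproducible variants
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, noise_std) for s in samples]


# One second of a 440 Hz tone at 16 kHz stands in for a decoded utterance.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noisy = add_white_noise(clean, snr_db=10.0, seed=0)
```

Lower `snr_db` values give harsher conditions; sweeping a few values yields a simple robustness curve for a given model.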

🔗 Source: LibriSpeech Dataset

Frequently Asked Questions

What is the difference between the “clean” and “other” subsets?

“Clean” recordings have higher audio quality and clearer diction, while “other” recordings are harder to transcribe (strong accents, background noise, faster speech, etc.).
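
The distinction shows up directly in the names of the distributed subsets (train-clean-100, train-clean-360, train-other-500, dev-clean, dev-other, test-clean, test-other). The helper below is a small hypothetical sketch that splits such a name into its parts, which can be handy when selecting evaluation conditions in a script.

```python
from typing import NamedTuple, Optional


class Subset(NamedTuple):
    split: str            # "train", "dev", or "test"
    condition: str        # "clean" or "other"
    hours: Optional[int]  # nominal size, present only for training subsets


def parse_subset_name(name: str) -> Subset:
    """Split a subset name such as 'train-clean-100' into its components."""
    parts = name.split("-")
    if (len(parts) not in (2, 3)
            or parts[0] not in {"train", "dev", "test"}
            or parts[1] not in {"clean", "other"}):
        raise ValueError(f"unrecognized subset name: {name!r}")
    hours = int(parts[2]) if len(parts) == 3 else None
    return Subset(parts[0], parts[1], hours)


print(parse_subset_name("train-other-500"))
```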

Can LibriSpeech be used for languages other than English?

No, LibriSpeech is exclusively in English. For other languages, alternatives include Common Voice, Multilingual LibriSpeech, and VoxPopuli.

Is LibriSpeech suitable for speech synthesis?

Yes, although that is not its main purpose. The well-segmented recordings and aligned transcripts make it useful for training or evaluating text-to-speech (TTS) systems.
