AudioMnist

AudioMnist is an audio dataset designed for automatic speech recognition. It contains recordings of the digits 0 to 9 spoken by 60 speakers under controlled conditions. It is a reference dataset for short-word classification tasks and for the study of vocal representations.

Size

Approximately 30,000 audio files, WAV format

Licence

Open access for academic and research use, under a Creative Commons Attribution license

Description


Each recording is a WAV file containing a single spoken digit. The dataset is structured with:

  • 30,000 audio clips of spoken digits (0 to 9)
  • 60 different speakers (male and female)
  • Information on the gender, age, and linguistic background of participants
  • A controlled sound environment to minimize extraneous noise
  • A 48 kHz sampling rate, which preserves the full audible frequency range for analysis

The dataset is often used for supervised classification and self-supervised learning tasks in audio.
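
As a rough illustration of that structure, the sketch below reads a clip's header with Python's standard `wave` module and derives a label from the file name. The `<digit>_<speaker>_<take>.wav` naming pattern and the helper name `describe_clip` are assumptions made for this example; check the layout of the copy you download before relying on them.

```python
import wave
from pathlib import Path


def describe_clip(path):
    """Return label info parsed from the file name plus basic WAV header facts."""
    path = Path(path)
    # Assumed naming pattern: <digit>_<speaker>_<take>.wav (not confirmed by this page).
    digit, speaker, take = path.stem.split("_")
    with wave.open(str(path), "rb") as wav:
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()
        channels = wav.getnchannels()
    return {
        "digit": int(digit),          # class label, 0 to 9
        "speaker": speaker,           # kept as a string to preserve leading zeros
        "take": int(take),
        "sample_rate": sample_rate,   # expected to be 48000 for this dataset
        "channels": channels,
        "duration_s": n_frames / sample_rate,
    }
```

Checking `sample_rate` on a few files is a cheap way to confirm that a downloaded or re-encoded copy still matches the 48 kHz figure given above.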

What is this dataset for?


AudioMnist is used for:

  • Training audio classification models on short, single-word utterances
  • Benchmarking neural networks for speech recognition
  • The study of inter-speaker variability (age, gender, accent)
  • Research on vocal embeddings, phonetics, and acoustics
  • Experimentation with CNN or Transformer models on spectrograms
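
For the benchmarking and inter-speaker variability uses listed above, one common precaution is to keep each speaker entirely in either the training set or the test set, so that a model is scored on voices it has not heard. The sketch below shows one possible way to do that. The function name `split_by_speaker`, the `(speaker_id, path)` tuple layout, and the 20% test share are illustrative assumptions, not something the dataset prescribes.

```python
import random


def split_by_speaker(clips, test_fraction=0.2, seed=0):
    """Split (speaker_id, path) pairs so that no speaker appears in both sets."""
    # Collect the distinct speakers and shuffle them reproducibly.
    speakers = sorted({speaker for speaker, _ in clips})
    rng = random.Random(seed)
    rng.shuffle(speakers)

    # Hold out a fraction of the speakers (at least one) for testing.
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [clip for clip in clips if clip[0] not in test_speakers]
    test = [clip for clip in clips if clip[0] in test_speakers]
    return train, test
```

Splitting on speakers rather than on individual clips tends to give a more conservative accuracy estimate, because the test set then measures generalization to new voices and accents.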

Can it be enriched or improved?


Yes, in several ways:

  • Add background noise or distortions to test robustness
  • Extend the dataset to other languages or accents
  • Supplement with visual data for audio-visual approaches
  • Use the data for contrastive learning or for training audio autoencoders
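
The first of these ideas, adding noise to test robustness, can be sketched in a few lines. The example below mixes white Gaussian noise into a list of samples at a chosen signal-to-noise ratio. White noise, the function name `add_noise`, and the plain-list sample format are assumptions made for illustration; recorded background noise could be scaled and mixed in the same way.

```python
import math
import random


def add_noise(samples, snr_db, seed=0):
    """Return a copy of samples with white Gaussian noise at roughly snr_db."""
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]
```

Evaluating the same model on copies of the test clips at several SNR levels (for example 20, 10, and 0 dB) gives a simple picture of how quickly accuracy degrades as conditions move away from the controlled recording environment.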

🔗 Source: AudioMnist Dataset

Frequently Asked Questions

Can this dataset be used for commercial purposes?

The access terms described here cover academic and research use. Before any commercial use, check the license file in the source repository and contact the authors of the dataset.

Why is it called AudioMnist?

The name refers to the well-known MNIST dataset of handwritten digits. AudioMNIST is its spoken counterpart, built on the same idea: classifying the digits 0 to 9.

Are the speakers multilingual?

All recordings are in English, but the speakers come from a variety of linguistic backgrounds, so the dataset covers a range of accents.
