
AudioMnist


AudioMnist is an audio dataset designed for automatic speech recognition. It contains recordings of the digits 0 to 9 spoken in English by 60 speakers under controlled conditions. It is a common reference for short-word classification and for studying vocal representations.


Size

Approximately 30,000 audio files, WAV format

Licence

Open access for academic and research use, under a Creative Commons Attribution license

Description

Each recording is a WAV file containing a single isolated spoken digit. The dataset includes:

30,000 audio clips of spoken digits (0-9)
60 different speakers (male and female)
Information on the gender, age, and linguistic background of participants
A controlled sound environment to minimize extraneous noise
48 kHz sampling rate, which captures frequencies up to 24 kHz

The dataset is often used for supervised classification and self-supervised learning tasks in audio.
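Before any of these tasks, the clips have to be indexed together with their labels. The sketch below is one way to do that using only the Python standard library. It assumes, and this is not stated on this page, that files are named `<digit>_<speaker>_<repetition>.wav` (for example `7_01_3.wav`) and stored somewhere under a single root folder; check this against your downloaded copy. The names `Clip`, `parse_clip`, `index_dataset`, and `wav_info` are illustrative, not part of any official tooling.

```python
import wave
from pathlib import Path
from typing import NamedTuple


class Clip(NamedTuple):
    path: Path
    digit: int       # spoken digit, 0-9 (the classification label)
    speaker: str     # speaker ID exactly as written in the file name
    repetition: int  # repetition index for this speaker and digit


def parse_clip(path: Path) -> Clip:
    """Parse a file name of the ASSUMED form '<digit>_<speaker>_<repetition>.wav'."""
    digit, speaker, repetition = path.stem.split("_")
    return Clip(path, int(digit), speaker, int(repetition))


def index_dataset(root: Path) -> list[Clip]:
    """Collect every WAV file found under root into labeled records."""
    clips = [parse_clip(p) for p in root.rglob("*.wav")]
    return sorted(clips, key=lambda clip: clip.path)


def wav_info(path: Path) -> tuple[int, int, float]:
    """Read (sample_rate_hz, n_channels, duration_seconds) from the WAV header."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate
```

Calling `wav_info` on a few files is a quick way to confirm the sampling rate of the copy you downloaded before building a preprocessing pipeline around it.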

What is this dataset for?

AudioMnist is used for:

Training audio classification models on simple spoken commands
Benchmarking neural networks for speech recognition
Studying inter-speaker variability (age, gender, accent)
Research on voice embeddings, phonetics, and acoustics
Experimenting with CNN or Transformer models on spectrograms
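When the dataset is used as a benchmark or to study inter-speaker variability, it usually matters whether test speakers were also heard during training. The sketch below shows one way to build a speaker-disjoint split, so that test accuracy reflects generalization to unseen voices. It is an illustration under stated assumptions, not the protocol used by the dataset's authors: the function name `speaker_split`, the 20% test fraction, and the `(speaker, item)` input format are all hypothetical choices.

```python
import random
from collections import defaultdict


def speaker_split(clips, test_fraction=0.2, seed=0):
    """Split (speaker, item) pairs so that no speaker appears in both sets.

    `clips` is any iterable of (speaker_id, item) pairs; `item` can be a file
    path, a feature array, or anything else tied to that recording.
    """
    by_speaker = defaultdict(list)
    for speaker, item in clips:
        by_speaker[speaker].append(item)

    # Shuffle speakers (not clips) with a fixed seed for reproducibility.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))

    test = [(s, x) for s in speakers[:n_test] for x in by_speaker[s]]
    train = [(s, x) for s in speakers[n_test:] for x in by_speaker[s]]
    return train, test
```

With 60 speakers and the default fraction, 12 speakers end up in the test set and 48 in the training set. Splitting by clip instead of by speaker would let the same voice appear on both sides and tends to give optimistic scores.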

Can it be enriched or improved?

Yes, in several ways:

Add background noise or distortions to test robustness
Extend the dataset to other languages or accents
Supplement with visual data for audio-visual approaches
Use the data for contrastive learning or audio autoencoding
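The first idea in this list, adding background noise to test robustness, can be sketched as follows. The example mixes white Gaussian noise into a clip at a chosen signal-to-noise ratio (SNR). It assumes the clip has already been decoded into a non-empty list of float samples (for example in the range -1 to 1); the names `mean_power` and `add_noise` are illustrative, and white noise is only one of many possible distortions.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a non-empty sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise(samples, snr_db, seed=0):
    """Return the samples plus white Gaussian noise at the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in samples]

    # SNR_dB = 10 * log10(P_signal / P_noise)
    # so the noise power we want is P_signal / 10^(SNR_dB / 10).
    target_noise_power = mean_power(samples) / (10 ** (snr_db / 10))

    # Rescale the generated noise so its measured power hits the target.
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * n for s, n in zip(samples, noise)]
```

Evaluating a trained model on copies of the test set at, say, 20 dB, 10 dB, and 0 dB SNR gives a simple robustness curve. The function does not clip the result, so samples may exceed the original amplitude range at low SNR values.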

🔗 Source: AudioMnist Dataset

Frequently Asked Questions

Can this dataset be used for commercial purposes?

No. Use is limited to academic research; for commercial use, contact the dataset's authors.

Why is it called AudioMnist?

The name refers to the well-known MNIST dataset of handwritten digits. AudioMNIST is its spoken counterpart: the same task of classifying single digits, applied to audio.

Are the speakers multilingual?

Yes. The recordings are all in English, but the speakers come from a range of linguistic backgrounds, so the dataset contains a variety of accents.

Similar datasets

DCASE Challenge Dataset

LUNA16

Open Images Dataset

Copyright © Innovatiana SAS (SIREN 913 684 668), a French & Malagasy company, 2021-2025. All rights reserved
