ESC-50 (Environmental Sound Classification)

ESC-50 (Environmental Sound Classification) is an audio dataset for training and benchmarking models that recognize environmental sounds. It contains 2,000 labeled clips covering 50 classes, grouped into five main categories of natural and everyday sounds.

Size

2,000 audio clips of 5 seconds each (40 per class), WAV format

Licence

Free to use under the Creative Commons Attribution-NonCommercial license (CC BY-NC)

Description

The ESC-50 dataset includes:

2,000 high-quality audio files (44.1 kHz, mono)
Standardized duration of 5 seconds per clip
50 classes (40 clips each) divided into 5 main categories:
- Animals (birds, dogs, insects...)
- Natural soundscapes and water sounds (rain, wind, crackling fire...)
- Human non-speech sounds (laughing, coughing, sneezing...)
- Interior and domestic sounds (clock ticks, door knocks, vacuum cleaners...)
- Exterior and urban noises (sirens, car horns, engines...)

Each clip is labeled with its class and assigned to one of five predefined folds, so the dataset can be used directly for supervised tasks and cross-validation.
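
As an illustration of working with the annotations, here is a minimal sketch that reads a metadata table and holds out one fold for testing. It assumes a CSV with `filename`, `fold`, `target` and `category` columns, which is how the metadata file distributed with ESC-50 is commonly described; the sample rows and file names below are hypothetical and stand in for the real file.

```python
import csv
import io

# Hypothetical rows in the assumed metadata layout (not real ESC-50 entries).
SAMPLE_CSV = """filename,fold,target,category
example-dog.wav,1,0,dog
example-rain.wav,1,10,rain
example-siren.wav,2,42,siren
example-clock.wav,3,38,clock_tick
"""

def load_metadata(csv_file):
    """Read metadata rows, converting fold and target to integers."""
    rows = []
    for row in csv.DictReader(csv_file):
        rows.append({
            "filename": row["filename"],
            "fold": int(row["fold"]),
            "target": int(row["target"]),
            "category": row["category"],
        })
    return rows

def split_by_fold(rows, test_fold):
    """Hold out one fold for testing and keep the others for training."""
    train = [r for r in rows if r["fold"] != test_fold]
    test = [r for r in rows if r["fold"] == test_fold]
    return train, test

rows = load_metadata(io.StringIO(SAMPLE_CSV))
train_rows, test_rows = split_by_fold(rows, test_fold=1)
```

With the real metadata file, the `io.StringIO` object would be replaced by an opened file handle, and the split repeated once per fold to cross-validate.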

What is this dataset for?

ESC-50 is mainly used for:

Training supervised audio classification models
Validating machine learning techniques on real-world sounds
Developing embedded audio recognition systems
Acoustic or psychoacoustic analysis of natural and urban sounds
Audio AI research and benchmarking of new algorithms
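
Classification models for this kind of data are often trained on time-frequency features rather than raw waveforms. The sketch below computes a log-magnitude spectrogram with NumPy for a synthetic 5-second, 44.1 kHz mono tone that stands in for a clip. The frame length, hop size and the function name `log_spectrogram` are illustrative assumptions, not settings prescribed by the dataset.

```python
import numpy as np

def log_spectrogram(signal, frame_len=1024, hop=512, eps=1e-10):
    """Return a (frames, bins) log-magnitude spectrogram of a mono signal."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping frames and apply a Hann window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # eps avoids taking the log of zero in silent frames.
    return np.log(magnitude + eps)

SAMPLE_RATE = 44100  # matches the clip format described above
t = np.arange(SAMPLE_RATE * 5) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 1000.0 * t)  # synthetic stand-in for a real clip
spec = log_spectrogram(tone)
```

The resulting array has one row per frame and one column per frequency bin, and can be fed to a classifier as a 2-D input.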

Can it be enriched or improved?

Yes, several options are possible:

Adding realistic background noise to clips to increase model robustness
Mixing or superimposing sounds to study source separation
Extending it with additional categories or recordings
Integrating it with other corpora (AudioSet, UrbanSound8K) to broaden class diversity
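
The noise-addition and mixing options above can be sketched as scaling one signal against another to reach a chosen signal-to-noise ratio (SNR). This is a minimal sketch with NumPy on synthetic signals; the helper name `mix_at_snr` and the 10 dB target are illustrative assumptions.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to clean_power / 10^(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return clean + scaled_noise, scaled_noise

rng = np.random.default_rng(0)
# Synthetic stand-ins: a 440 Hz tone as the clip, white noise as the background.
clean = np.sin(2 * np.pi * 440.0 * np.arange(44100) / 44100)
noise = rng.normal(size=clean.shape)
mixture, scaled_noise = mix_at_snr(clean, noise, snr_db=10.0)
```

Keeping the scaled noise alongside the mixture is convenient for source separation experiments, where the individual sources serve as training targets.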

🔗 Source: ESC-50 Dataset
