AudioSet

AudioSet is a large-scale audio corpus compiled by Google, containing over 2 million 10-second sound clips drawn from YouTube videos. Each clip is annotated with one or more labels from a structured vocabulary (ontology) of more than 600 sound categories.
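To make "one clip, one or more labels" concrete, here is a minimal parsing sketch. It assumes the segment lists are CSV files whose comment lines start with `#` and whose rows hold a YouTube video ID, start and end times in seconds, and a quoted, comma-separated list of ontology label IDs. That layout is our understanding of the distributed files, so check it against your copy. The `Segment` class, `parse_segments` function and the sample row (including its video ID) are hypothetical illustrations.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Segment:
    ytid: str             # YouTube video ID
    start_seconds: float  # start of the clip inside the video
    end_seconds: float    # end of the clip (normally start + 10 s)
    labels: list          # ontology label IDs attached to the clip


def parse_segments(text: str) -> list:
    """Parse segment-list CSV text, skipping '#' comment lines and blank lines."""
    lines = (
        line for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    )
    # skipinitialspace lets the reader treat the quote after ", " as a field quote,
    # so the comma-separated label list stays in a single field.
    reader = csv.reader(lines, skipinitialspace=True)
    return [
        Segment(ytid, float(start), float(end), labels.split(","))
        for ytid, start, end, labels in reader
    ]


# Made-up sample in the assumed layout (hypothetical video ID, illustrative label IDs).
SAMPLE = '''# YTID, start_seconds, end_seconds, positive_labels
abcdefghijk, 30.000, 40.000, "/m/09x0r,/t/dd00088"
'''

for seg in parse_segments(SAMPLE):
    print(seg.ytid, seg.end_seconds - seg.start_seconds, seg.labels)
```

Reading the label field as one quoted string and splitting it afterwards keeps the row shape fixed at four columns, however many labels a clip carries.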

Size

Over 2 million annotated 10-second audio clips; segment annotations in CSV and the sound ontology in JSON (audio must be extracted separately, e.g. to WAV)

Licence

Free access for research purposes, with annotations provided by Google under a Creative Commons license (original audio remains hosted on YouTube)

Description


AudioSet covers a wide variety of sounds from the real world:

  • Human sounds: speech, laughter, cough, screams, applause,...
  • Animal sounds: barking, birdsong, neighing,...
  • Mechanical sounds: engines, alarms, sirens, tools, vehicles,...
  • Environments: rain, wind, crowd, forest, classroom,...
  • Music: instruments, songs, various musical genres

The labels follow a hierarchical ontology and were produced by a semi-automated process, with manual validation on a subset.
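Because the categories form a hierarchy, a common first step is to find the top-level branches and collect everything beneath one of them. The sketch below assumes the ontology file is a JSON list of objects with `id`, `name` and `child_ids` fields, which matches our understanding of the published file but should be verified. `build_hierarchy`, `descendants` and the tiny sample ontology (with made-up `/x/...` IDs) are hypothetical.

```python
import json


def build_hierarchy(ontology_text: str):
    """Return (id -> name, id -> child IDs, top-level IDs) from ontology JSON text."""
    entries = json.loads(ontology_text)
    names = {e["id"]: e["name"] for e in entries}
    children = {e["id"]: e.get("child_ids", []) for e in entries}
    all_children = {c for kids in children.values() for c in kids}
    # A top-level class is one that never appears as anyone's child.
    roots = [e["id"] for e in entries if e["id"] not in all_children]
    return names, children, roots


def descendants(node_id: str, children: dict) -> set:
    """Collect every class below node_id with an iterative depth-first walk.

    The visited set guards against a class being reached through more than one parent.
    """
    seen = set()
    stack = list(children.get(node_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    return seen


# Tiny hypothetical ontology using the assumed field names.
SAMPLE_ONTOLOGY = json.dumps([
    {"id": "/x/animal", "name": "Animal", "child_ids": ["/x/dog"]},
    {"id": "/x/dog", "name": "Dog", "child_ids": ["/x/bark"]},
    {"id": "/x/bark", "name": "Bark", "child_ids": []},
    {"id": "/x/music", "name": "Music", "child_ids": []},
])

names, children, roots = build_hierarchy(SAMPLE_ONTOLOGY)
print([names[r] for r in roots])
print(sorted(names[c] for c in descendants("/x/animal", children)))
```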

What is this dataset for?


AudioSet is used for:

  • Training models to classify and detect environmental sounds
  • Developing real-time sound recognition systems
  • Annotating complex audio scenes for robotics or embedded devices
  • Studying acoustic contexts in audio-only or multimodal AI projects
  • Analyzing sound events to build audio banks or to support generative synthesis
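Since a clip can carry several labels at once, classification on AudioSet is usually framed as multi-label: each clip gets a 0/1 target vector over all classes, and the model is scored per class with binary cross-entropy on sigmoid outputs rather than a single softmax. The sketch below illustrates that framing in plain Python; the function names and `/x/...` class IDs are hypothetical, and a real training loop would use a framework's built-in loss.

```python
import math


def class_index(class_ids):
    """Map each class ID to a fixed column position (sorted for reproducibility)."""
    return {cid: i for i, cid in enumerate(sorted(class_ids))}


def multi_hot(labels, index):
    """Turn a clip's label list into a 0/1 target vector; unknown labels are ignored."""
    target = [0.0] * len(index)
    for label in labels:
        pos = index.get(label)
        if pos is not None:
            target[pos] = 1.0
    return target


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean per-class binary cross-entropy between sigmoid outputs and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


index = class_index(["/x/speech", "/x/bark", "/x/music"])
target = multi_hot(["/x/speech", "/x/bark"], index)
print(target)
print(binary_cross_entropy([0.9, 0.2, 0.7], target))
```

Each class contributes its own independent yes/no term, which is what lets a single clip be "speech" and "bark" at the same time.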

Can it be enriched or improved?


Yes, for example:

  • By combining AudioSet with locally stored or live-captured recordings
  • By refining categories for specific industrial or medical contexts
  • By applying segmentation or source separation techniques
  • By using audio embeddings as inputs to multimodal models
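Refining categories for a specific context often amounts to mapping the generic labels onto a smaller, project-specific vocabulary and dropping clips that no longer match. The sketch below shows one way to do this; the mapping, label IDs, clip names and category names are all hypothetical examples, not part of AudioSet.

```python
def remap_segments(segments, mapping):
    """Relabel (ytid, labels) pairs with custom categories.

    mapping: AudioSet label ID -> project-specific category name (hypothetical).
    Labels missing from the mapping are dropped; clips left with no label are skipped.
    """
    refined = []
    for ytid, labels in segments:
        categories = sorted({mapping[label] for label in labels if label in mapping})
        if categories:
            refined.append((ytid, categories))
    return refined


# Hypothetical IDs and categories, e.g. for an industrial-safety project.
MAPPING = {"/x/siren": "alarm", "/x/smoke_alarm": "alarm", "/x/drill": "power_tool"}
SEGMENTS = [
    ("clip_a", ["/x/siren", "/x/speech"]),
    ("clip_b", ["/x/music"]),
    ("clip_c", ["/x/drill", "/x/smoke_alarm"]),
]
print(remap_segments(SEGMENTS, MAPPING))
```

Several source labels may collapse into one target category (both alarm types become `alarm` here), and the set comprehension keeps that category from being listed twice.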

🔗 Source: AudioSet Dataset

Frequently Asked Questions

Are the audio files directly downloadable?

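No. Only the annotations and YouTube video references (video ID plus the start and end time of each clip) are provided, along with precomputed audio embeddings. The audio itself must be extracted from YouTube, in accordance with YouTube's terms of service.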
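Once a clip's video ID and time window are known, the remaining work is locating the video and cutting the 10-second window. The sketch below only builds a YouTube watch URL and an `ffmpeg` argument list for trimming a file you have already obtained locally in a way the terms of service allow; the download step is deliberately not shown. The function names, file names and the 16 kHz mono target are assumptions for illustration, not AudioSet requirements.

```python
import shlex


def youtube_url(ytid: str, start_seconds: float) -> str:
    """Watch URL that opens the video at the clip's start time."""
    return f"https://www.youtube.com/watch?v={ytid}&t={int(start_seconds)}s"


def ffmpeg_trim_args(src, dst, start_seconds, end_seconds, sample_rate=16000):
    """ffmpeg arguments that cut the clip window from a local file into mono audio.

    sample_rate=16000 is an assumption (a common choice for audio models).
    """
    return [
        "ffmpeg",
        "-ss", f"{start_seconds:.3f}",               # seek to the clip start (before -i: input seeking)
        "-t", f"{end_seconds - start_seconds:.3f}",  # keep only the clip duration
        "-i", src,
        "-ac", "1",                                  # downmix to mono
        "-ar", str(sample_rate),                     # resample
        dst,                                         # a .wav name makes ffmpeg write WAV
    ]


print(youtube_url("abcdefghijk", 30.0))
print(shlex.join(ffmpeg_trim_args("full_audio.m4a", "clip.wav", 30.0, 40.0)))
```

Returning an argument list rather than running it keeps the sketch side-effect free; it can be passed to `subprocess.run` once the source file exists.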

Can AudioSet be used commercially?

The annotations are freely available, but the original audio remains subject to the YouTube uploaders' copyright, so check the licensing before any commercial use.

Is the dataset multilingual?

Indirectly, yes. The speech comes from videos in many languages, but the annotations are in English.
