
Google Speech Commands

The Google Speech Commands dataset consists of short voice recordings of single command words spoken by many different speakers. The corpus is designed for training and evaluating low-latency keyword spotting models, which recognize a small, fixed vocabulary of spoken words.

Size

Approximately 105,000 audio files in WAV format (16 kHz, mono)

Licence

Open access under a Creative Commons Attribution 4.0 license

Description


This audio dataset contains:

  • Approximately 105,000 clips, each at most 1 second long
  • Over 30 distinct spoken command words
  • Recordings collected from thousands of speakers
  • Mostly clean or slightly noisy backgrounds
  • A set of separate background-noise recordings that can be mixed into clips for robust training

The dataset is particularly suitable for embedded or mobile applications requiring fast and accurate recognition of voice keywords.
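Because many clips are shorter than one second, models that expect fixed-size input usually pad or trim each clip to exactly 16,000 samples. Below is a minimal sketch using Python's standard `wave` module and NumPy, assuming 16-bit mono 16 kHz files as distributed; `load_clip` and `pad_or_trim` are hypothetical helper names, not part of an official loader.

```python
import wave

import numpy as np

SAMPLE_RATE = 16_000      # Speech Commands clips are 16 kHz mono
TARGET_LEN = SAMPLE_RATE  # one second of audio, in samples


def load_clip(path: str) -> np.ndarray:
    """Read a 16-bit mono WAV file and return float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono WAV")
        if wf.getframerate() != SAMPLE_RATE:
            raise ValueError(f"expected {SAMPLE_RATE} Hz, got {wf.getframerate()}")
        raw = wf.readframes(wf.getnframes())
    # Little-endian signed 16-bit integers, scaled to floating point
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0


def pad_or_trim(samples: np.ndarray, length: int = TARGET_LEN) -> np.ndarray:
    """Zero-pad short clips at the end, or cut long ones, to a fixed length."""
    if len(samples) >= length:
        return samples[:length]
    return np.pad(samples, (0, length - len(samples)))
```

Zero-padding at the end is only one simple choice; padding symmetrically around the word is a common alternative.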

What is this dataset for?


Google Speech Commands is used to:

  • Train lightweight keyword recognition models
  • Develop voice interfaces for connected devices (IoT, home automation)
  • Evaluate performance on command detection tasks
  • Analyze short audio signals and their phonetic properties
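Evaluation results are only meaningful if no speaker appears in both the training and test data. Filenames follow the pattern `<speaker_id>_nohash_<n>.wav`, so a split can be derived by hashing only the part before `_nohash_`. The sketch below is modeled on the `which_set` helper in TensorFlow's speech_commands example; the 10% defaults are assumptions, and the validation and testing lists shipped with the dataset should be preferred when reporting comparable numbers.

```python
import hashlib
import os
import re

MAX_NUM_WAVS_PER_CLASS = 2**27 - 1  # ~134M, as in the TensorFlow example


def which_set(filename: str, validation_pct: float = 10.0,
              testing_pct: float = 10.0) -> str:
    """Assign a clip to 'training', 'validation' or 'testing' by speaker.

    Everything from '_nohash_' onward is ignored, so all clips from one
    speaker hash to the same value and land in the same partition.
    """
    base_name = os.path.basename(filename)
    speaker_key = re.sub(r"_nohash_.*$", "", base_name)
    digest = hashlib.sha1(speaker_key.encode("utf-8")).hexdigest()
    # Map the hash to a stable percentage in [0, 100]
    percentage = (int(digest, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_WAVS_PER_CLASS
    )
    if percentage < validation_pct:
        return "validation"
    if percentage < validation_pct + testing_pct:
        return "testing"
    return "training"
```

Because the assignment depends only on the speaker key, adding new recordings later does not move existing speakers between partitions.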

Can it be enriched or improved?


Yes, for example by:

  • Adding real background noise (voices, street, nature, etc.) to test robustness
  • Creating new keyword sets specific to an application
  • Fine-tuning with local voices or in other languages
  • Integrating it into real-time architectures (TinyML, on-device AI)
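Real noise recordings are typically mixed into clean clips at a controlled signal-to-noise ratio (SNR). Below is a minimal NumPy sketch; `mix_at_snr` is a hypothetical helper, and taking a random one-second segment from a longer noise recording is one common convention, not a prescribed procedure.

```python
from typing import Optional

import numpy as np


def mix_at_snr(clip: np.ndarray, noise: np.ndarray, snr_db: float,
               rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Add a random segment of `noise` to `clip` at the requested SNR in dB."""
    rng = rng or np.random.default_rng()
    if len(noise) < len(clip):
        raise ValueError("noise recording must be at least as long as the clip")
    # Pick a random window of the noise with the same length as the clip
    start = rng.integers(0, len(noise) - len(clip) + 1)
    segment = noise[start:start + len(clip)].astype(np.float64)

    clip_power = np.mean(clip.astype(np.float64) ** 2)
    noise_power = np.mean(segment ** 2)
    if noise_power == 0.0:
        return clip.copy()
    # Scale noise so that 10 * log10(clip_power / scaled_noise_power) == snr_db
    scale = np.sqrt(clip_power / (noise_power * 10 ** (snr_db / 10)))
    return (clip + scale * segment).astype(clip.dtype)
```

Sampling `snr_db` at random per training example (for instance between 0 and 20 dB) exposes a model to a range of noise levels; the exact range is a tuning choice.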

🔗 Source: Google Speech Commands Dataset

Frequently Asked Questions

Can this dataset be used for commercial applications?

Yes, as long as you comply with the terms of the CC BY 4.0 license, including proper attribution to Google.

Is it multilingual?

No. The commands are English words, so multilingual models require other datasets.

Can it be used with models like Whisper, Wav2Vec, or DeepSpeech?

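Yes. The clips are 16 kHz mono WAV files, the sample rate that Wav2Vec 2.0 and Whisper expect. Because each clip carries a single word label, the dataset is best suited to supervised audio classification (keyword spotting) rather than open-ended transcription; pretrained speech models are typically fine-tuned with a classification head or used as feature extractors for this task.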
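Treating the data as a supervised classification problem starts by turning its one-folder-per-word layout into (file, label) pairs. Below is a minimal sketch with `pathlib`, assuming the extracted archive contains one sub-folder per word plus a `_background_noise_` folder; `index_clips` and the `unknown` catch-all label are illustrative choices for restricting training to an application-specific keyword set.

```python
from pathlib import Path

BACKGROUND_DIR = "_background_noise_"


def index_clips(root: str, keywords: set[str],
                unknown_label: str = "unknown") -> list[tuple[Path, str]]:
    """Return (path, label) pairs, one per WAV clip under `root`.

    Each sub-folder name is the spoken word. Words outside `keywords` are
    collapsed into a single `unknown_label` class. The background-noise
    folder is skipped because it holds long noise recordings, not
    labelled commands.
    """
    pairs = []
    for word_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if word_dir.name == BACKGROUND_DIR:
            continue
        label = word_dir.name if word_dir.name in keywords else unknown_label
        for wav in sorted(word_dir.glob("*.wav")):
            pairs.append((wav, label))
    return pairs
```

The resulting pairs can feed any training pipeline, whether a small CNN on spectrograms or a fine-tuned pretrained encoder.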
