Google Speech Commands
The Google Speech Commands dataset consists of short voice recordings of simple commands spoken by many different speakers. The corpus is designed for training and evaluating low-latency, limited-vocabulary speech recognition (keyword spotting) models.
Approximately 105,000 audio files, WAV format
Open access under a Creative Commons Attribution 4.0 license
Description
This audio dataset contains:
- Approximately 105,000 clips, each about 1 second long
- Over 30 distinct voice commands
- Recordings collected from thousands of speakers
- A relatively clean or slightly noisy background
- Background noise recordings that can be mixed into the clips for robust training
The dataset is particularly suitable for embedded or mobile applications requiring fast and accurate recognition of voice keywords.
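Because the clips last approximately 1 second rather than exactly 1 second, models that expect fixed-size input usually need each clip padded or trimmed first. The following is a minimal sketch using only the Python standard library. It assumes mono 16-bit PCM WAV files at 16 kHz; check this against your copy of the data. The helper names `read_pcm16` and `pad_or_trim` are illustrative and not part of any official tooling.

```python
import array
import sys
import wave

SAMPLE_RATE = 16_000        # assumption: 16 kHz mono, 16-bit PCM
CLIP_SAMPLES = SAMPLE_RATE  # one second of audio


def read_pcm16(path):
    """Read a mono 16-bit PCM WAV file and return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()      # WAV data is little-endian
    return list(samples), rate


def pad_or_trim(samples, length=CLIP_SAMPLES):
    """Zero-pad short clips and truncate long ones to a fixed length."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0] * (length - len(samples))
```

A typical call would be `samples, rate = read_pcm16(path)` followed by `pad_or_trim(samples)`, where `path` points to any clip in the dataset.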
What is this dataset for?
Google Speech Commands is used for:
- Training lightweight keyword recognition models
- Developing voice interfaces for connected devices (IoT, home automation)
- Evaluating performance on command detection tasks
- Analyzing short audio signals and their phonetic properties
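For the evaluation use case, a common starting point is overall accuracy together with per-keyword recall, which shows which commands a model tends to miss. The sketch below is one simple way to compute both from lists of labels. The function names are hypothetical, and the labels in the usage note are only illustrative.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of clips whose predicted command matches the true command."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_keyword_recall(true_labels, predicted_labels):
    """For each true command, the fraction of its clips predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}
```

For example, with true labels `["yes", "no", "yes", "stop"]` and predictions `["yes", "yes", "yes", "stop"]`, `accuracy` returns 0.75 and `per_keyword_recall` reports a recall of 0.0 for "no", which a single accuracy figure would hide.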
Can it be enriched or improved?
Yes, in particular by:
- Adding real background noise (voices, street, nature, etc.) to test robustness
- Creating new keyword sets specific to an application
- Fine-tuning with local voices or in other languages
- Integrating it into real-time architectures (TinyML, on-device AI)
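The first item above, adding background noise, is often done by mixing a noise recording into each clip at a chosen signal-to-noise ratio (SNR). The following is a minimal sketch under stated assumptions: both signals are plain lists of samples at the same sample rate, and the noise recording is at least as long as the speech clip. The function `mix_at_snr` and its parameters are hypothetical names used for illustration only.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db, rng=random):
    """Add a randomly chosen segment of `noise` to `speech` at `snr_db` dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech clip")
    start = rng.randrange(len(noise) - len(speech) + 1)
    segment = noise[start:start + len(speech)]
    speech_rms, noise_rms = rms(speech), rms(segment)
    if speech_rms == 0 or noise_rms == 0:
        return list(speech)  # silent input: nothing to scale against
    # Choose the gain so that 20 * log10(speech_rms / (gain * noise_rms))
    # equals snr_db.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, segment)]
```

Lower `snr_db` values produce noisier clips. The output is not clipped or rescaled, so convert it back to the 16-bit range before writing it to a WAV file.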
🔗 Source: Google Speech Commands Dataset
Frequently Asked Questions
Can this dataset be used for commercial applications?
Yes, as long as you comply with the terms of the CC-BY 4.0 license, including the correct attribution to Google.
Is it multilingual?
No. The commands are English words, so multilingual models require additional datasets.
Can it be used with models like Whisper, Wav2Vec, or DeepSpeech?
Yes. The clips are standard WAV files, so most open-source speech recognition frameworks can load them, and the labeled commands make the dataset well suited to supervised audio classification tasks.