Google Speech Commands

The Google Speech Commands dataset consists of short voice recordings containing simple commands spoken by different speakers. This corpus is designed to train low-latency speech recognition models.

Size

Approximately 105,000 audio files in WAV format

License

Open access under a Creative Commons Attribution 4.0 license

Description

This audio dataset contains:

Approximately 105,000 clips, each about 1 second long
Over 30 distinct voice commands
Recordings collected from thousands of speakers
A relatively clean or slightly noisy background
Separate background noise recordings that can be mixed into the clips for more robust training
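
Since the clips are WAV files of roughly one second, a common first step is to read each file and pad or trim it to a fixed length so that every example has the same shape. The sketch below uses only the Python standard library. The 16 kHz, 16-bit mono format and the helper names (`read_clip`, `fix_length`, `write_tone`) are assumptions made for illustration, not details stated on this page; `write_tone` only generates a synthetic stand-in file so the example runs without downloading the dataset.

```python
import array
import math
import os
import sys
import tempfile
import wave

SAMPLE_RATE = 16_000  # assumption: 16 kHz, 16-bit mono PCM


def read_clip(path):
    """Read a 16-bit mono PCM WAV file and return (sample_rate, samples)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":  # WAV data is little-endian
        samples.byteswap()
    return rate, samples.tolist()


def fix_length(samples, target_len):
    """Truncate or zero-pad so the clip has exactly target_len samples."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0] * (target_len - len(samples))


def write_tone(path, seconds, rate=SAMPLE_RATE, freq=440.0):
    """Write a synthetic sine tone (hypothetical stand-in for a real clip)."""
    n = round(seconds * rate)
    data = array.array(
        "h",
        (int(10_000 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)),
    )
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    write_tone(demo_path, seconds=0.8)          # shorter than one second
    rate, clip = read_clip(demo_path)
    clip = fix_length(clip, SAMPLE_RATE)        # pad to exactly one second
    print(rate, len(clip))
```

Zero-padding at the end is only one option; centering the clip or padding with low-level noise are equally plausible choices depending on the model.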

The dataset is particularly suitable for embedded or mobile applications requiring fast and accurate recognition of voice keywords.

What is this dataset for?

Google Speech Commands is used to:

Train lightweight keyword recognition models
Develop voice interfaces for connected devices (IoT, home automation)
Evaluate performance on command detection tasks
Analyze short audio signals and their phonetic properties

Can it be enriched or improved?

Yes, in particular by:

Adding real background noise (voices, street, nature...) to test robustness
Creating new keyword sets specific to an application
Fine-tuning with local voices or in other languages
Integrating it into real-time architectures (TinyML, on-device AI)
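
Adding background noise usually means mixing a noise recording into each clip at a chosen signal-to-noise ratio (SNR). The following is a minimal sketch, assuming both signals are lists of floats in [-1, 1] at the same sample rate. `rms` and `mix_at_snr` are hypothetical helper names, and the synthetic tone and Gaussian noise in the demo stand in for real recordings.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db, rng=None):
    """Add a random segment of `noise` to `speech` at the requested SNR (dB)."""
    rng = rng or random.Random()
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech clip")

    # Pick a random noise segment with the same length as the clip.
    start = rng.randrange(len(noise) - len(speech) + 1)
    segment = noise[start:start + len(speech)]

    speech_rms = rms(speech)
    noise_rms = rms(segment)
    if speech_rms == 0 or noise_rms == 0:
        return list(speech)  # nothing meaningful to scale

    # Scale the noise so that 20*log10(speech_rms / scaled_noise_rms) == snr_db.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    mixed = (s + gain * n for s, n in zip(speech, segment))

    # Clip to the valid range to avoid overflow when converting back to PCM.
    return [max(-1.0, min(1.0, v)) for v in mixed]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins: a 1-second tone as "speech", Gaussian noise as background.
    tone = [0.3 * math.sin(2 * math.pi * 440 * i / 16_000) for i in range(16_000)]
    background = [rng.gauss(0.0, 0.1) for _ in range(48_000)]
    noisy = mix_at_snr(tone, background, snr_db=10.0, rng=rng)
    print(len(noisy))
```

Sweeping `snr_db` over a range of values (for example from clean down to very noisy) gives a simple way to chart how recognition accuracy degrades as conditions worsen.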

🔗 Source: Google Speech Commands Dataset
