VoxCeleb

VoxCeleb is a large-scale dataset of voice recordings taken from public videos, mostly interviews and media appearances. It contains the voices of more than 7,000 speakers, mostly celebrities, and is designed for robust identification of people from their voices despite noise, accents, or changes in the recording environment.

Size

Over 1 million audio clips of human voices (VoxCeleb1 in WAV format; VoxCeleb2 distributed as AAC/M4A)

Licence

Free access for non-commercial use (restricted license with prior access request)

Description


The audio is extracted from YouTube videos, with semi-automatic verification that each voice matches the face of the speaker on screen. The dataset includes:

  • Over 1 million voice clips
  • More than 7,000 identified speakers across VoxCeleb1 and VoxCeleb2
  • Metadata about each speaker (identity, nationality, gender...)
  • Recordings in real, noisy or varied environments
  • A roughly balanced mix of male and female voices, with a wide diversity of linguistic origins

It is used to train systems that can recognize or distinguish individuals based on their voiceprints alone.
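
As a rough illustration of how voiceprints are compared, the sketch below scores two speaker embeddings with cosine similarity and applies a decision threshold. The embedding vectors, the `same_speaker` helper, and the threshold of 0.7 are all assumptions made for this example; in practice the embeddings would come from a model trained on data such as VoxCeleb, and the threshold would be tuned on held-out trials.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.7):
    """Hypothetical decision rule: accept when similarity reaches the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy 3-dimensional "voiceprints" (real embeddings have hundreds of dimensions).
enrolled = [0.9, 0.1, 0.3]
probe_same = [0.8, 0.2, 0.35]
probe_other = [-0.2, 0.9, -0.4]

print(same_speaker(enrolled, probe_same))
print(same_speaker(enrolled, probe_other))
```

A higher threshold rejects more impostors at the cost of rejecting more genuine speakers, so the value is a trade-off rather than a fixed constant.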

What is this dataset for?


VoxCeleb is used in numerous projects related to:

  • Automatic speaker recognition (identification and verification)
  • Improving speech recognition systems in noisy environments
  • Research in voice biometrics and audio security
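  • Training speaker-embedding models such as ECAPA-TDNN, and fine-tuning or evaluating pre-trained speech models such as Wav2Vec or Whisper on speaker tasks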
  • The creation of voiceprints for personalized voice assistants
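
Speaker verification systems of the kind listed above are commonly compared using the equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The following is a minimal sketch of that calculation on toy scores; the function name and the score lists are assumptions for illustration, and a production evaluation would typically interpolate between thresholds rather than pick the closest one.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    target_scores: similarity scores for same-speaker trials.
    nontarget_scores: similarity scores for different-speaker trials.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy scores: one genuine trial (0.4) falls below one impostor trial (0.6).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))
```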

Can it be enriched or improved?


Yes, for example:

  • By adding data from underrepresented languages
  • By supplementing with extracts from non-media domains (podcasts, calls)
  • By standardizing audio signals for better comparative performance
  • By adding test scenarios for voice spoofing attacks and spoofing resistance
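
One possible reading of "standardizing audio signals" is bringing every clip to the same peak level before comparison. The sketch below shows peak normalization on a list of float samples in the range -1.0 to 1.0; the `peak_normalize` name and the 0.95 target are assumptions for this example, and a fuller pipeline might also unify sample rate or perceived loudness.

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the largest absolute value equals target_peak.

    Silent clips (all zeros) are returned unchanged to avoid dividing by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet_clip = [0.1, -0.5, 0.25]
normalized = peak_normalize(quiet_clip)
print(max(abs(s) for s in normalized))
```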

🔗 Source: VoxCeleb Dataset

Frequently Asked Questions

Are voices anonymized or identifiable?

The voices are identifiable, not anonymized. Each one is linked to a public identity (mainly celebrities) with detailed metadata, and use of the data is reserved for research.

Can this dataset be used for commercial projects?

No. VoxCeleb is only available for academic or non-commercial use. An access request must be submitted to the research team.

Is the dataset multilingual?

Yes, it covers a wide range of languages and accents, making it a robust basis for multilingual voice identification tasks.
