Audio

RAVDESS

RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) is a multimodal reference dataset for emotion recognition. It contains audio and video recordings of professional actors expressing different emotions through speech and song, under controlled conditions.

Size

7,356 audio and video files in WAV and MP4 formats

License

Available free of charge for research, under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license (CC BY-NC-SA 4.0)

Description


The dataset includes:

  • 24 actors (12 men and 12 women)
  • 2 types of content: speech and song
  • 8 emotions: calm, joy, sadness, anger, fear, surprise, disgust, neutral
  • 7,356 files in total (audio, video, audio-visual)
  • Precise annotations of emotions, intensity, gender, and modality
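
The annotations listed above are usually read from the file names rather than from a separate label file. The sketch below assumes the seven-field naming convention described in the dataset's documentation (modality, vocal channel, emotion, intensity, statement, repetition, actor, with odd actor numbers male and even ones female); check it against your own copy before relying on it. The names `RavdessLabels` and `parse_ravdess_filename` are hypothetical helpers, not part of any official tooling, and the label "happy" corresponds to "joy" in the list above.

```python
from pathlib import Path
from typing import NamedTuple

# Code tables assumed from the dataset's documented naming convention.
MODALITIES = {"01": "audio-video", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNELS = {"01": "speech", "02": "song"}
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITIES = {"01": "normal", "02": "strong"}


class RavdessLabels(NamedTuple):
    """Annotations decoded from one file name (hypothetical container)."""
    modality: str
    vocal_channel: str
    emotion: str
    intensity: str
    statement: int
    repetition: int
    actor: int
    gender: str


def parse_ravdess_filename(path: str) -> RavdessLabels:
    """Decode a name such as '03-01-06-01-02-01-12.wav' into its labels."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 dash-separated fields in {path!r}")
    modality, channel, emotion, intensity, statement, repetition, actor = fields
    actor_id = int(actor)
    return RavdessLabels(
        modality=MODALITIES[modality],
        vocal_channel=VOCAL_CHANNELS[channel],
        emotion=EMOTIONS[emotion],
        intensity=INTENSITIES[intensity],
        statement=int(statement),
        repetition=int(repetition),
        actor=actor_id,
        # Assumed convention: odd actor numbers are male, even are female.
        gender="male" if actor_id % 2 == 1 else "female",
    )


if __name__ == "__main__":
    print(parse_ravdess_filename("03-01-06-01-02-01-12.wav"))
```

Keeping the actor number in the parsed labels makes it straightforward to split training and test sets by actor, so that no speaker appears in both.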

The recordings were made in a studio under controlled conditions, which yields clean audio and video signals for analysis.

What is this dataset for?


RAVDESS is widely used for:

  • Training models to recognize emotions from the voice or the face
  • Developing voice assistants, chatbots, or empathetic interfaces
  • Analyzing human emotional expression across modalities
  • Evaluating speech-to-emotion or vision-to-emotion systems
  • Supporting projects in computational psychology and affective neuroscience

Can it be enriched or improved?


Yes. Some possible directions:

  • Combine with other emotional datasets (CREMA-D, SAVEE) to increase the diversity of speakers
  • Add background noise or filters to test the robustness of models
  • Extract spectrograms or facial features for hybrid audio/video models
  • Extend analysis to subtle emotions or varied cultural expressions
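
As an illustration of the background-noise idea above, here is a minimal sketch that mixes white Gaussian noise into a waveform at a chosen signal-to-noise ratio (SNR). It uses only the Python standard library and plain lists of samples; in practice you would more likely work with NumPy arrays and real recorded noise. The function name `add_noise_at_snr` and its parameters are assumptions made for this example.

```python
import math
import random
from typing import List, Sequence


def add_noise_at_snr(signal: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Return the signal plus white Gaussian noise scaled to the target SNR in dB."""
    if not signal:
        raise ValueError("signal is empty")
    signal_power = sum(x * x for x in signal) / len(signal)
    if signal_power == 0.0:
        raise ValueError("signal is silent, so the SNR is undefined")

    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)

    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for the noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz stands in for a real recording.
    sample_rate = 16000
    clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    noisy = add_noise_at_snr(clean, snr_db=10.0)
    print(len(noisy))
```

Evaluating a model on the same clips at several SNR values (for example 20, 10, and 0 dB) gives a simple robustness curve.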

🔗 Source: RAVDESS Dataset

Frequently Asked Questions

Can RAVDESS be used in commercial applications?

No, commercial use is prohibited without explicit permission. The dataset is intended for academic research and non-commercial projects.

Does the dataset contain real emotions?

No. The emotions are acted by professional actors under studio conditions, which ensures clarity but can make them less natural than spontaneous emotions.

Is it a multilingual dataset?

No. The recordings are exclusively in North American English.
