RAVDESS

RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) is a multimodal reference dataset for emotion recognition. It contains audio and video recordings of professional actors expressing emotions through speech and song, captured under controlled conditions.

Size

7,356 audio and video files (WAV for audio, MP4 for video)

License

Free for research use under a Creative Commons Attribution-NonCommercial 4.0 license (CC BY-NC 4.0)

Description

The dataset includes:

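24 professional actors (12 men and 12 women)
2 vocal channels: speech and song
8 emotions: neutral, calm, joy, sadness, anger, fear, surprise, disgust (the song recordings cover only neutral, calm, joy, sadness, anger, and fear)
7,356 files in total across three modalities (audio-only, video-only, audio-visual)
Labels for emotion, emotional intensity (normal or strong), actor gender, and modality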
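
These labels are not stored in a separate file: each file name encodes them as seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), for example 03-01-06-01-02-01-12.wav, and gender follows from the actor ID (odd = male, even = female). The sketch below decodes such a name in Python, assuming that naming scheme; parse_ravdess_filename and RavdessLabel are illustrative names, not part of any official RAVDESS tooling.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Code tables for the RAVDESS file-naming convention (assumed as described above).
MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}


@dataclass(frozen=True)
class RavdessLabel:
    modality: str
    vocal_channel: str
    emotion: str
    intensity: str
    statement: int
    repetition: int
    actor: int
    gender: str


def parse_ravdess_filename(path: str) -> RavdessLabel:
    """Decode the labels from a name like '03-01-06-01-02-01-12.wav'."""
    # .stem drops any directory part and the .wav/.mp4 extension.
    fields = PurePath(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {path!r}")
    mod, chan, emo, inten, stmt, rep, actor = fields
    actor_id = int(actor)
    return RavdessLabel(
        modality=MODALITY[mod],
        vocal_channel=VOCAL_CHANNEL[chan],
        emotion=EMOTION[emo],
        intensity=INTENSITY[inten],
        statement=int(stmt),
        repetition=int(rep),
        actor=actor_id,
        # Odd actor IDs are male, even IDs are female.
        gender="male" if actor_id % 2 == 1 else "female",
    )
```

For instance, parse_ravdess_filename("Actor_12/03-01-06-01-02-01-12.wav") yields an audio-only speech clip of actor 12 (female) expressing fear at normal intensity.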

The recordings were made in a studio under controlled conditions, which gives clean, high-quality signals for audio and visual analysis.

What is this dataset for?

RAVDESS is widely used for:

Training models to recognize emotions from the voice or the face
Developing voice assistants, chatbots, or empathetic interfaces
Analyzing human emotional expression across modalities
Evaluating speech-to-emotion or vision-to-emotion systems
Research in computational psychology and affective neuroscience
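
When training or evaluating a recognizer on RAVDESS, a common precaution is to split by actor rather than by file, so that no speaker's voice or face appears in both the training and test sets. A minimal Python sketch, assuming the actor ID is the last two-digit field of each file name (as in 03-01-04-02-02-01-23.wav); split_by_actor and the choice of held-out actors are hypothetical.

```python
from pathlib import PurePath


def actor_id(path: str) -> int:
    # Assumes the actor ID is the last dash-separated field of the file stem.
    return int(PurePath(path).stem.split("-")[-1])


def split_by_actor(paths, test_actors):
    """Return (train, test) lists such that no actor appears in both."""
    test_set = set(test_actors)
    train, test = [], []
    for p in paths:
        (test if actor_id(p) in test_set else train).append(p)
    return train, test


files = [
    "03-01-05-01-01-01-01.wav",
    "03-01-05-01-01-01-02.wav",
    "03-01-04-02-02-01-23.wav",
    "03-01-04-02-02-01-24.wav",
]
# Holding out one odd and one even actor keeps the test set gender-balanced.
train, test = split_by_actor(files, test_actors=[23, 24])
```

Here the first two files go to the training set and the files from actors 23 and 24 go to the test set. Repeating the split with different held-out actor pairs gives a leave-speakers-out cross-validation.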

Can it be enriched or improved?

Yes. Some possible directions:

Combine with other emotional datasets (CREMA-D, SAVEE) to increase the diversity of speakers
Add background noise or filters to test the robustness of models
Extract spectrogram or facial features for hybrid audio/video models
Extend the analysis to subtle emotions or varied cultural expressions
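
For the noise-robustness idea, one simple approach is to mix noise into each clip at a chosen signal-to-noise ratio, where SNR in dB is 10 * log10(P_signal / P_noise) and P is the mean squared amplitude. The Python sketch below assumes the audio has already been loaded as a sequence of float samples and uses white Gaussian noise as a stand-in; recorded background noise could be substituted. The function names are hypothetical.

```python
import math
import random


def mean_power(x):
    # Mean squared amplitude of a sequence of samples.
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, snr_db, seed=None):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    if not signal:
        return []
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Choose a scale so that 10 * log10(P_signal / P_scaled_noise) == snr_db.
    scale = math.sqrt(mean_power(signal) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


# Example: a 1-second 440 Hz tone at 16 kHz, degraded to 10 dB SNR.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_noise_at_snr(clean, 10.0, seed=0)
```

Because the noise is rescaled against the exact samples drawn, the resulting SNR matches the target up to floating-point error; sweeping snr_db (for example from 20 dB down to 0 dB) shows how quickly a model's accuracy degrades.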

🔗 Source: RAVDESS Dataset