RAVDESS
RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) is a multimodal benchmark dataset for emotion recognition. It contains audio and video recordings of professional actors expressing emotions through speech and song, captured under controlled conditions.
7,356 audio and video files in WAV and MP4 formats
Available free for research under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license (CC BY-NC-SA 4.0)
Description
The dataset includes:
- 24 actors (12 men and 12 women)
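- 2 vocal channels: speech and song
- 8 emotions: neutral, calm, joy, sadness, anger, fear, surprise, disgust (the song recordings cover only neutral, calm, joy, sadness, anger, and fear)
- 7,356 files in total (audio-only, video-only, and full audio-visual)
- Labels for emotion, emotional intensity (normal or strong), actor gender, and modality, encoded in each file name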
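The total of 7,356 files follows from the recording design described in the RAVDESS documentation: neutral is recorded at normal intensity only, every other emotion at both normal and strong intensity, each combination is spoken with 2 statements and 2 repetitions, and every trial is distributed in 3 modality versions (full audio-visual, video-only, audio-only). Actor 18 has no song recordings, so the song channel covers 23 actors.

$$
\begin{aligned}
N_{\text{speech}} &= (1 + 7 \times 2) \times 2 \times 2 = 60 \text{ trials per actor} \\
N_{\text{song}} &= (1 + 5 \times 2) \times 2 \times 2 = 44 \text{ trials per actor} \\
N_{\text{total}} &= 3 \times (24 \times 60 + 23 \times 44) = 3 \times (1440 + 1012) = 3 \times 2452 = 7356
\end{aligned}
$$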
Recordings were made in a studio under controlled conditions, which provides clean signals for both audio and visual analysis.
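Because the labels are encoded in each file name as seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), they can be decoded without a separate annotation file. The following is a minimal sketch, assuming the files keep their original RAVDESS names; the function name `parse_ravdess_name` and the output keys are illustrative, not part of any official tooling. Emotion names follow the labels used in the RAVDESS documentation (for example "happy" rather than "joy").

```python
from pathlib import Path

# Code tables from the RAVDESS file-naming convention
MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {
    "01": "Kids are talking by the door",
    "02": "Dogs are sitting by the door",
}


def parse_ravdess_name(filename: str) -> dict:
    """Decode the seven two-digit fields of a RAVDESS file name."""
    # Path.stem drops any directory prefix and the .wav/.mp4 extension
    fields = Path(filename).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {filename}")
    modality, channel, emotion, intensity, statement, repetition, actor = fields
    actor_id = int(actor)
    return {
        "modality": MODALITY[modality],
        "vocal_channel": VOCAL_CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": STATEMENT[statement],
        "repetition": int(repetition),
        "actor": actor_id,
        # Odd-numbered actors are male, even-numbered actors are female
        "gender": "male" if actor_id % 2 == 1 else "female",
    }
```

For example, `parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav")` describes an audio-only speech file in which actor 12 (female) says "Dogs are sitting by the door" with fearful emotion at normal intensity.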
What is this dataset for?
RAVDESS is widely used for:
- Training models to recognize emotions from the voice or the face
- Developing voice assistants, chatbots, or empathetic interfaces
- Analyzing human emotional expression across modalities
- Evaluating speech-to-emotion and vision-to-emotion systems
- Supporting research in computational psychology and affective neuroscience
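When training or evaluating emotion recognizers on RAVDESS, a common precaution is to split the data by actor, so that no speaker appears in both the training and test sets; otherwise a model can score well by recognizing voices or faces rather than emotions. The following is a minimal sketch of such an actor-independent split, assuming original RAVDESS file names (actor ID in the seventh field); the helper names and the choice of two test actors per gender are hypothetical.

```python
import random
from pathlib import Path


def actor_of(filename: str) -> int:
    """Return the actor ID, stored in the seventh field of a RAVDESS file name."""
    return int(Path(filename).stem.split("-")[6])


def pick_test_actors(k_per_gender: int = 2, seed: int = 0) -> list:
    """Randomly pick k male (odd ID) and k female (even ID) actors for testing."""
    rng = random.Random(seed)
    males = [a for a in range(1, 25) if a % 2 == 1]
    females = [a for a in range(1, 25) if a % 2 == 0]
    return sorted(rng.sample(males, k_per_gender) + rng.sample(females, k_per_gender))


def split_by_actor(filenames, test_actors):
    """Split files so that no actor appears in both the train and test sets."""
    test_actors = set(test_actors)
    train, test = [], []
    for name in filenames:
        (test if actor_of(name) in test_actors else train).append(name)
    return train, test
```

Balancing the held-out actors by gender keeps the test set closer to the 12/12 composition of the full dataset.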
Can it be enriched or improved?
Yes. Possible directions include:
- Combine it with other emotional datasets (CREMA-D, SAVEE) to increase speaker diversity
- Add background noise or filtering to test model robustness
- Extract spectrograms or facial features for hybrid audio/video models
- Extend the analysis to subtler emotions or a wider range of cultural expressions
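For the noise-robustness direction above, noise is typically mixed in at a chosen signal-to-noise ratio (SNR). The following is a minimal sketch in plain Python, assuming the clean recording and the noise have already been decoded into lists of float samples at the same sample rate (RAVDESS audio is 48 kHz WAV); the function names are hypothetical, and a real pipeline would more likely use NumPy arrays.

```python
import math


def mean_power(samples):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise_at_snr(clean, noise, snr_db):
    """Mix noise into a clean signal at a target SNR in decibels.

    The noise is tiled or truncated to the length of the clean signal, then
    scaled so that 10 * log10(P_clean / P_scaled_noise) equals snr_db.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Repeat the noise if it is shorter than the clean signal, then trim it
    reps = -(-len(clean) // len(noise))  # ceiling division
    noise = (list(noise) * reps)[: len(clean)]
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # Scale factor that brings the noise power to p_clean / 10^(snr_db / 10)
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]
```

Lower `snr_db` values produce harder test conditions; sweeping several values (for example 20, 10, and 0 dB) shows how quickly a model degrades.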
🔗 Source: RAVDESS Dataset
Frequently Asked Questions
Can RAVDESS be used in commercial applications?
No. Commercial use is not permitted without explicit permission; the dataset is intended for academic research and non-commercial projects.
Does the dataset contain real emotions?
No. The emotions are acted by professional actors in studio conditions, which makes them clear and consistent but may make them less natural than spontaneous emotional expression.
Is it a multilingual dataset?
No. The recordings are exclusively in North American English.