GoEmotions
GoEmotions is a text dataset of Reddit comments annotated with 27 distinct emotion categories plus a neutral label. It makes it possible to train models on nuanced emotions expressed in real-world text.
Approximately 58,000 plain text examples with multi-label annotations (JSON)
Apache 2.0
Description
GoEmotions is a dataset built from Reddit comments that were manually annotated to identify the emotions expressed. Each entry can be labeled with one or more of 27 distinct emotion categories, or marked as neutral. It is a rich corpus for emotion classification, with complex and realistic cases.
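As a rough illustration of the multi-label format described above, the sketch below converts a list of emotion names into a fixed-length 0/1 vector, a common target format for multi-label classifiers. The label list is only an illustrative subset, and the example records and the `emotions` field name are hypothetical rather than taken from the dataset files.

```python
# Illustrative subset of label names (assumption); the full dataset
# uses 27 emotion categories plus neutral.
LABELS = ["admiration", "anger", "gratitude", "joy", "sadness", "neutral"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}


def to_multi_hot(emotions):
    """Convert a list of emotion names into a fixed-length 0/1 vector."""
    vector = [0] * len(LABELS)
    for name in emotions:
        vector[LABEL_TO_INDEX[name]] = 1
    return vector


# Hypothetical records, not taken from the dataset.
examples = [
    {"text": "Thanks so much, this made my day!", "emotions": ["gratitude", "joy"]},
    {"text": "The meeting is at 3 pm.", "emotions": ["neutral"]},
]

# One vector per example; several positions can be 1 at the same time.
encoded = [to_multi_hot(example["emotions"]) for example in examples]
```

Because an entry can carry several emotions at once, each position in the vector is an independent yes/no target rather than one choice among mutually exclusive classes.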
What is this dataset for?
- Train emotion-detection models from text
- Develop empathetic chatbots or more humane virtual assistants
- Improve automatic moderation and the detection of sensitive speech
Can it be enriched or improved?
Yes, you can extend the dataset with other sources of social comments, or translate it into other languages. It is also possible to add conversational context or combine the data with metadata (e.g. subreddit) to refine emotion models. Additional annotations such as emotional intensity could also be incorporated.
🔎 In summary
🧠 Recommended for
- Emotion detection projects
- Conversational assistants
- Social NLP research
🔧 Compatible tools
- Hugging Face Transformers
- Scikit-learn
- PyTorch
- TensorFlow
- spaCy
💡 Tip
First train a model on GoEmotions, then fine-tune it on data specific to your domain (e.g. customer service, forums).
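The two-stage idea in this tip can be sketched with a toy model. The example below is a minimal sketch, assuming a tiny bag-of-words logistic regression written in plain Python as a stand-in for a real model; in practice you would use one of the libraries listed above. The texts, label subset, and hyperparameters are all hypothetical.

```python
import math
from collections import defaultdict

LABELS = ["anger", "gratitude"]  # illustrative subset of the label set


def tokenize(text):
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, examples, epochs=20, lr=0.5):
    """Run SGD for one-vs-rest logistic regression, updating weights in place."""
    for _ in range(epochs):
        for text, gold in examples:
            tokens = tokenize(text)
            for label in LABELS:
                score = sum(weights[label][t] for t in tokens)
                error = (1.0 if label in gold else 0.0) - sigmoid(score)
                for t in tokens:
                    weights[label][t] += lr * error
    return weights


def predict(weights, text, threshold=0.5):
    """Return every label whose predicted probability reaches the threshold."""
    tokens = tokenize(text)
    return {
        label
        for label in LABELS
        if sigmoid(sum(weights[label].get(t, 0.0) for t in tokens)) >= threshold
    }


# Stage 1: general examples (hypothetical stand-ins for GoEmotions data).
general = [
    ("thanks a lot", {"gratitude"}),
    ("i am so angry", {"anger"}),
    ("thank you friend", {"gratitude"}),
    ("this makes me angry", {"anger"}),
]
weights = {label: defaultdict(float) for label in LABELS}
train(weights, general)
knew_refund_before = "refund" in weights["anger"]

# Stage 2: continue training the same weights on domain-specific examples
# (hypothetical customer-service messages).
domain = [
    ("refund still missing", {"anger"}),
    ("agent fixed my refund thanks", {"gratitude"}),
]
train(weights, domain)
domain_prediction = predict(weights, "refund still missing")
```

The second call to `train` starts from the weights learned in the first stage instead of from zero, which is the essence of the tip: general emotional vocabulary is kept while domain-specific words are learned on top of it.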
Frequently Asked Questions
Does the GoEmotions dataset cover multiple languages?
No, it is entirely in English, but it can be translated manually or automatically for multilingual use cases.
Can GoEmotions be used in commercial projects?
Yes, the Apache 2.0 license allows commercial use, subject to compliance with the standard license terms.
Does this dataset contain biases?
Yes, like any social media data, it may contain biases related to Reddit and its users. It is important to take this into account when interpreting the results.