GoEmotions

GoEmotions is a text dataset of Reddit comments annotated with 27 emotion categories plus a neutral label. It supports training models on nuanced emotions as they are expressed in real-world text.

Size

Approximately 58,000 plain text examples with multi-label annotations (JSON)

License

Apache 2.0

Description

GoEmotions is a dataset of Reddit comments that were manually annotated with the emotions they express. Each comment can carry one or more of 27 emotion categories, or be labeled neutral. It is a rich corpus for emotion classification, with complex and realistic cases.
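Because a comment can carry several emotions at once, a common way to prepare such data for training is a multi-hot vector with one slot per label. The sketch below illustrates the idea under stated assumptions: the label list is only a small subset of the full label set, and the record layout (the "text" and "emotions" field names) is hypothetical rather than the dataset's actual schema.

```python
# Illustrative subset only; the full dataset has 27 emotions plus "neutral".
LABELS = ["admiration", "anger", "gratitude", "joy", "neutral"]
LABEL_TO_INDEX = {name: index for index, name in enumerate(LABELS)}


def to_multi_hot(emotions):
    """Return a 0/1 vector with a 1 in the slot of each listed emotion."""
    vector = [0] * len(LABELS)
    for name in emotions:
        vector[LABEL_TO_INDEX[name]] = 1
    return vector


# Hypothetical record: field names are assumptions, not the real schema.
example = {"text": "Thank you, this made my day!", "emotions": ["gratitude", "joy"]}
encoded = to_multi_hot(example["emotions"])
print(encoded)
```

A vector like this can be fed to any multi-label classifier that predicts each label independently.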

What is this dataset for?

  • Train emotion-detection models on text
  • Develop empathetic chatbots or more human-like virtual assistants
  • Improve automatic moderation and the detection of sensitive speech

Can it be enriched or improved?

Yes. You can supplement the dataset with comments from other social platforms, or translate it into other languages. You can also add conversational context or attach metadata (e.g. the subreddit) to refine emotion models. Additional annotations, such as emotional intensity, could also be incorporated.
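As a rough illustration of the metadata idea, the sketch below joins annotated comments with extra fields keyed by a comment identifier. Everything here is an assumption made for the example: the "id", "subreddit", and "intensity" field names, the sample values, and the `enrich` helper are hypothetical and not part of the dataset.

```python
def enrich(records, metadata_by_id):
    """Attach optional metadata to each record; missing values become None."""
    enriched = []
    for record in records:
        extra = metadata_by_id.get(record["id"], {})
        enriched.append({
            **record,
            "subreddit": extra.get("subreddit"),
            "intensity": extra.get("intensity"),
        })
    return enriched


# Hypothetical sample data.
records = [
    {"id": "c1", "text": "Thanks a lot!", "emotions": ["gratitude"]},
    {"id": "c2", "text": "This is so annoying.", "emotions": ["annoyance"]},
]
metadata_by_id = {"c1": {"subreddit": "example_sub", "intensity": 0.8}}

enriched_records = enrich(records, metadata_by_id)
print(enriched_records[0]["subreddit"], enriched_records[1]["subreddit"])
```

Keeping the new fields optional means comments without metadata stay usable instead of being dropped.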

🔎 In summary

Criterion | Evaluation
🧩 Ease of Use | ⭐⭐⭐⭐☆ (Clear JSON format with explicit labels)
🧼 Cleaning Required | ⭐⭐⭐⭐⭐ (Very low, ready-to-use data)
🏷️ Annotation Richness | ⭐⭐⭐⭐☆ (Multi-label with 28 labels: 27 emotions plus neutral)
📜 Commercial License | ✅ Yes (Apache 2.0)
👨‍💻 Ideal for Beginners | 👩‍💻 Highly suitable, well-documented dataset
🔁 Reusable for Fine-Tuning | 🔥 Excellent base for emotion models
🌍 Cultural Diversity | 🌐 Moderate, English only with Reddit bias

🧠 Recommended for

  • Emotion detection projects
  • Conversational assistants
  • Social NLP research

🔧 Compatible tools

  • Hugging Face Transformers
  • Scikit-learn
  • PyTorch
  • TensorFlow
  • spaCy

💡 Tip

Train a model on GoEmotions first, then fine-tune it on data specific to your domain (e.g. customer service, forums).
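The two-stage idea in this tip can be sketched with a deliberately tiny model. The example below is a toy stand-in, not a recommended setup: it uses a pure-Python bag-of-words logistic regression for a single binary label, and all sentences are invented. In practice you would more likely fine-tune a pretrained transformer, but the pattern is the same: the second stage continues from the weights learned in the first.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a real pipeline would use a proper one."""
    return text.lower().split()


def predict_proba(weights, bias, text):
    """Probability that `text` expresses the target emotion."""
    score = bias + sum(weights[token] for token in tokenize(text))
    return 1.0 / (1.0 + math.exp(-score))


def train(weights, bias, examples, epochs=20, learning_rate=0.5):
    """Run SGD on log loss, starting from the given weights and bias."""
    for _ in range(epochs):
        for text, label in examples:
            error = predict_proba(weights, bias, text) - label
            for token in tokenize(text):
                weights[token] -= learning_rate * error
            bias -= learning_rate * error
    return weights, bias


# Invented examples: 1 = expresses gratitude, 0 = does not.
general_examples = [
    ("thanks so much for this", 1),
    ("thank you this helped", 1),
    ("this is terrible", 0),
    ("i hate this so much", 0),
]
domain_examples = [
    ("appreciate the quick refund", 1),
    ("refund still missing", 0),
    ("appreciate your support", 1),
    ("support never answered", 0),
]

# Stage 1: train on general-purpose data.
weights, bias = train(defaultdict(float), 0.0, general_examples)
general_score = predict_proba(weights, bias, "thanks for the help")

# Stage 2: keep the learned weights and continue on domain data.
weights, bias = train(weights, bias, domain_examples)
domain_score = predict_proba(weights, bias, "appreciate the quick reply")

print(round(general_score, 3), round(domain_score, 3))
```

Storing weights in a dictionary keyed by token lets the second stage add domain vocabulary without discarding what the first stage learned.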

Frequently Asked Questions

Does the GoEmotions dataset cover multiple languages?

No, it is entirely in English, but it can be translated manually or automatically for multilingual use cases.

Can GoEmotions be used in commercial projects?

Yes, the Apache 2.0 license allows commercial use, subject to compliance with the standard license terms.

Does this dataset contain biases?

Yes, like any social media data, it may contain biases related to Reddit and its users. It is important to take this into account when interpreting the results.
