Sentiment Analysis for Mental Health

Text dataset compiling statements from multiple sources (social networks, forums), annotated according to 7 mental health states (normal, depression, suicidal, anxiety, stress, bipolar disorder, personality disorder). Intended for training AI models for emotional analysis and mental health chatbots.

Size

Approximately 51,000 text statements annotated into 7 categories, CSV/JSON format
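
As a first sanity check after download, it can help to load the CSV and look at how the 7 categories are distributed. The sketch below is a minimal example using only the Python standard library. The column names `statement` and `status` are assumptions (check the header of the actual file), and the sample rows are invented purely for illustration.

```python
import csv
import io
from collections import Counter

def load_statements(csv_file, text_col="statement", label_col="status"):
    """Read (text, label) pairs from an open CSV file.

    text_col and label_col are assumed column names, not confirmed by the source.
    """
    reader = csv.DictReader(csv_file)
    return [(row[text_col], row[label_col]) for row in reader]

def label_distribution(rows):
    """Return the share of each label among the loaded rows."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

# Invented sample standing in for the real file.
sample = (
    "statement,status\n"
    "I went for a walk today,normal\n"
    "I cannot stop worrying,anxiety\n"
    "Work deadlines are piling up,stress\n"
    "Lunch was good,normal\n"
)

rows = load_statements(io.StringIO(sample))
for label, share in sorted(label_distribution(rows).items()):
    print(f"{label}: {share:.0%}")
```

With the real file, replace `io.StringIO(sample)` with `open(path, newline="", encoding="utf-8")`. A strongly skewed distribution is a hint that class weighting or resampling may be needed during training.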

Licence

Open Database License (ODbL) or equivalent free license (please check before using)

Description

The Sentiment Analysis for Mental Health dataset brings together more than 51,000 textual statements from various platforms (Reddit, Twitter, etc.), annotated according to 7 categories of mental states. It provides a rich and diverse corpus for studying psychological disorders through natural language processing.

What is this dataset for?

  • Train models for the classification of mental states from text.
  • Develop intelligent chatbots for psychological support.
  • Conduct emotional analyses to detect mental health trends and crises.
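
To make the first use case concrete, here is a minimal sketch of a text classifier: a multinomial Naive Bayes model with add-one smoothing, written with the standard library only. It is a baseline for illustration, not the method used by the dataset authors. In practice a library such as Scikit-learn or Hugging Face Transformers would be the usual choice. The training sentences and the two labels shown are invented examples.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy data; the real dataset has 7 categories.
train_texts = [
    "I had a nice walk in the park today",
    "Dinner with friends was fun",
    "I feel worried and nervous all the time",
    "My heart races and I am so nervous before every exam",
]
train_labels = ["normal", "normal", "anxiety", "anxiety"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("I am nervous and worried"))
```

A bag-of-words baseline like this ignores word order and context, so it mainly serves as a reference point against which stronger models can be compared.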

Can it be enriched or improved?

Yes. The annotations can be made more granular, contextual metadata can be added, and the corpus can be extended with other sources. Data cleaning and bias management are essential before use.
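
A minimal cleaning pass for social media text might look like the sketch below. It assumes rows are `(text, label)` pairs and applies a few common steps: removing URLs and @mentions, normalizing whitespace, and dropping empty or duplicate statements. Which steps are appropriate depends on the actual state of the file, so treat the rules here as a starting point.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")

def clean_statement(text):
    """Remove URLs and @mentions, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()

def clean_rows(rows):
    """Clean (text, label) pairs, dropping empty and duplicate entries."""
    seen = set()
    cleaned = []
    for text, label in rows:
        if not text or not label:
            continue  # missing text or missing annotation
        text = clean_statement(text)
        key = (text.lower(), label)
        if not text or key in seen:
            continue  # nothing left after cleaning, or a case-insensitive duplicate
        seen.add(key)
        cleaned.append((text, label))
    return cleaned

# Invented rows for illustration.
raw = [
    ("Hello   world", "normal"),
    ("hello world", "normal"),
    ("", "normal"),
    ("@user", "stress"),
    ("Hi", None),
]
print(clean_rows(raw))
```

Removing mentions and links also reduces the amount of personally identifying material in the corpus, which matters for data of this kind.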

🔎 In summary

Criterion: Evaluation
🧩 Ease of use: ⭐⭐⭐✩✩ (Requires some cleaning and preprocessing)
🧼 Need for cleaning: ⭐⭐⭐✩✩ (Moderate – aggregation from multiple sources, quality control needed)
🏷️ Annotation richness: ⭐⭐⭐⭐✩ (Good – 7 distinct mental state categories)
📜 Commercial license: ⚠️ Probably yes (ODbL), to verify depending on use
👨‍💻 Beginner friendly: ⚠️ Medium – requires NLP and health ethics knowledge
🔁 Fine-tuning ready: ⚡ Suitable for classification and dialogue models
🌍 Cultural diversity: 🌏 Large – data from multiple social platforms

🧠 Recommended for

  • AI mental health researchers
  • Chatbot developers
  • Sentiment analysts

🔧 Compatible tools

  • Hugging Face Transformers
  • spaCy
  • Scikit-learn
  • Rasa

💡 Tip

Use textual data augmentation methods to improve the robustness of models.
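
One simple family of augmentation methods perturbs the tokens of a statement while keeping its label. The sketch below shows random word deletion and random word swap, using only the standard library. The function names and default parameters are illustrative choices, not part of any library. Note that deleting a word such as "not" can change the meaning of a statement, so augmented samples should be spot-checked before training.

```python
import random

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n_swaps, rng):
    """Swap the positions of two randomly chosen tokens, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def augment(text, n_variants=2, p_delete=0.1, n_swaps=1, seed=0):
    """Return n_variants perturbed copies of text (same label is assumed)."""
    rng = random.Random(seed)  # seeded for reproducibility
    tokens = text.split()
    variants = []
    for _ in range(n_variants):
        new_tokens = random_deletion(tokens, p_delete, rng)
        new_tokens = random_swap(new_tokens, n_swaps, rng)
        variants.append(" ".join(new_tokens))
    return variants

original = "I have been feeling tired and stressed at work lately"
for variant in augment(original, n_variants=3):
    print(variant)
```

Augmentation should be applied to the training split only, so that validation and test scores still reflect performance on unmodified text.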

Frequently Asked Questions

Does this dataset make it possible to detect suicide risks automatically?

Yes, it includes a specific “Suicidal” category that can be used to train models for early risk detection.


⚠️ Important Disclaimer: While this dataset can be used to help identify potential early warning signs of self-harm or suicidal ideation, it is not a substitute for professional evaluation or emergency services. The outputs of models trained on it are experimental and cannot guarantee accurate or comprehensive detection of all risks. If you or someone you know may be at risk, please seek immediate help from qualified mental health professionals or emergency services rather than relying on a model alone.
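
Because no model detects every case, overall accuracy is not enough to judge a classifier trained on this data: recall on the risk category shows how many at-risk statements are missed. The sketch below computes per-class precision and recall from true and predicted labels, using only the standard library. The label strings and the example predictions are invented for illustration.

```python
from collections import Counter

def per_class_precision_recall(y_true, y_pred):
    """Compute precision and recall for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed an instance of the true label
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pred_total = tp[label] + fp[label]
        true_total = tp[label] + fn[label]
        report[label] = {
            "precision": tp[label] / pred_total if pred_total else 0.0,
            "recall": tp[label] / true_total if true_total else 0.0,
        }
    return report

# Invented labels and predictions.
y_true = ["suicidal", "suicidal", "normal", "normal", "anxiety"]
y_pred = ["suicidal", "normal", "normal", "normal", "normal"]

for label, scores in per_class_precision_recall(y_true, y_pred).items():
    print(label, scores)
```

In this toy example the overall accuracy is 60%, yet half of the statements in the risk category are missed, which is exactly the kind of gap that an aggregate score hides.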

Does the diversity of sources impact the quality of the data?

Yes, the variety of platforms requires thorough cleaning to avoid biases related to specific contexts.

Is this dataset suitable for commercial use?

The Open Database License (ODbL) generally allows commercial use, subject to attribution and share-alike conditions for derived databases. Check the exact terms that apply to this dataset and to your project before using it.
