Sentiment Analysis for Mental Health

A text dataset compiling statements from multiple sources (social networks, forums), annotated with 7 mental health states (normal, depression, suicidal, anxiety, stress, bipolar disorder, personality disorder). Intended for training AI models for emotion analysis and mental health chatbots.

Size

Approximately 51,000 text statements annotated into 7 categories, CSV/JSON format
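
Before training anything, it helps to check how the statements are spread across the 7 categories. The sketch below uses only the Python standard library and assumes a CSV export with columns named `statement` and `status`; these column names and the sample rows are assumptions for illustration, so adjust them to the file you actually download.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, label_column="status"):
    """Count how many rows carry each label in an open CSV file.

    `label_column` is an assumed column name; change it to match the real file.
    Rows with a missing or empty label are skipped.
    """
    reader = csv.DictReader(csv_file)
    return Counter(
        row[label_column].strip() for row in reader if row.get(label_column)
    )


# Tiny in-memory stand-in for the real file; these rows are invented placeholders.
sample = io.StringIO(
    "statement,status\n"
    "I slept well and feel fine,Normal\n"
    "I cannot stop worrying about tomorrow,Anxiety\n"
    "Had a quiet day at work,Normal\n"
)

counts = label_counts(sample)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

With the real file, replace `sample` with `open(path, newline="", encoding="utf-8")`. A strongly skewed distribution is a signal to consider class weighting or resampling.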

License

Open Database License (ODbL) or equivalent free license (please check before using)

Description

The Sentiment Analysis for Mental Health dataset brings together more than 51,000 text statements from various platforms (Reddit, Twitter, etc.), each annotated with one of 7 mental state categories. It provides a rich and diverse corpus for studying how psychological disorders are expressed in language, using natural language processing.

What is this dataset for?

Train models for the classification of mental states from text.
Develop intelligent chatbots for psychological support.
Conduct emotional analyses to detect mental health trends and crises.
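
As a rough illustration of the first use case, here is a minimal bag-of-words Naive Bayes classifier written with the standard library only. It is a baseline sketch, not the method used by the dataset authors; the class name `NaiveBayes`, the tokenizer, and the four training sentences are all hypothetical. In practice a library such as Scikit-learn or Hugging Face Transformers would be the more usual choice.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # token counts per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            n_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; real training would use the dataset's statements and labels.
texts = [
    "I feel calm and rested today",
    "I am so worried and nervous about everything",
    "Nice walk in the park today",
    "My heart races and I feel nervous",
]
labels = ["Normal", "Anxiety", "Normal", "Anxiety"]

model = NaiveBayes().fit(texts, labels)
prediction = model.predict("I am nervous and worried")
print(prediction)
```

A baseline like this mainly gives a reference score to compare stronger models against, using a held-out test split.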

Can it be enriched or improved?

Yes, it is possible to improve the granularity of annotations, to add contextual metadata, or to extend the corpus with other sources. Data cleaning and bias management are essential for optimal use.
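
A possible first cleaning pass is sketched below: strip URLs and @-mentions, normalise whitespace, and drop empty or duplicate statements. The regular expressions are deliberately crude and the function names are hypothetical; which artifacts actually occur depends on the source platforms, so inspect a sample of the data before settling on rules.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def clean_statement(text):
    """Remove URLs and @-mentions, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def deduplicate(rows):
    """Clean each (statement, label) pair and keep the first occurrence
    of every statement, compared case-insensitively. Empty statements are dropped."""
    seen = set()
    unique = []
    for statement, label in rows:
        cleaned = clean_statement(statement)
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append((cleaned, label))
    return unique


# Invented rows for illustration only.
rows = [
    ("Feeling ok today  https://example.com/post", "Normal"),
    ("@someone feeling OK today", "Normal"),
    ("   ", "Normal"),
]

cleaned_rows = deduplicate(rows)
print(cleaned_rows)
```

Deduplicating before the train/test split also avoids the same statement appearing on both sides, which would inflate evaluation scores.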

🔎 In summary

Criterion	Evaluation
🧩 Ease of use	⭐⭐⭐✩✩ (Requires some cleaning and preprocessing)
🧼 Need for cleaning	⭐⭐⭐✩✩ (Moderate – aggregation from multiple sources, quality control needed)
🏷️ Annotation richness	⭐⭐⭐⭐✩ (Good – 7 distinct mental state categories)
📜 Commercial license	⚠️ Probably yes (ODbL); verify for your use case
👨‍💻 Beginner friendly	⚠️ Medium – requires NLP and health ethics knowledge
🔁 Fine-tuning ready	⚡ Suitable for classification and dialogue models
🌍 Cultural diversity	🌏 Broad – data from multiple social platforms

🧠 Recommended for

AI mental health researchers
Chatbot developers
Sentiment analysts

🔧 Compatible tools

Hugging Face Transformers
spaCy
Scikit-learn
Rasa

💡 Tip

Use textual data augmentation methods to improve the robustness of models.
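
One simple family of augmentation methods perturbs the words of a statement while keeping its label. The sketch below implements random word deletion and random word swap with the standard library; the function names, the default deletion probability, and the example sentence are assumptions for illustration, and other techniques (synonym replacement, back-translation) are equally valid.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token with probability p; always keep at least one token."""
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, rng):
    """Swap two randomly chosen positions (no-op for fewer than two tokens)."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def augment(text, n_variants=2, p_delete=0.1, seed=0):
    """Return n_variants perturbed copies of text (empty list for empty text)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    tokens = text.split()
    if not tokens:
        return []
    variants = []
    for _ in range(n_variants):
        new_tokens = random_swap(random_deletion(tokens, p_delete, rng), rng)
        variants.append(" ".join(new_tokens))
    return variants


# Invented example sentence.
original = "I have been feeling restless and tired all week"
variants = augment(original, n_variants=3)
for v in variants:
    print(v)
```

Deleting or moving a word such as "not" can change the meaning of a statement, so it is worth reviewing a sample of augmented texts and applying augmentation to the training split only.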

Frequently Asked Questions

Does this dataset make it possible to detect suicide risks automatically?

Yes, it includes a specific “Suicidal” category to model the early detection of risks.

⚠️ Important Disclaimer: While this dataset can be used to help identify potential early warning signs of self-harm or suicidal ideation, it is not a substitute for professional evaluation or emergency services. Its outputs are experimental and cannot guarantee accurate or comprehensive detection of all risks. If you or someone you know may be at risk, please seek immediate help from qualified mental health professionals or emergency services rather than relying on the model alone.

Does the diversity of sources impact the quality of the data?

Yes, the variety of platforms requires thorough cleaning to avoid biases related to specific contexts.

Is this dataset suitable for commercial use?

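If the dataset is indeed released under the Open Database License (ODbL), commercial use is permitted, subject to attribution and share-alike conditions on derived databases. Check the exact terms that apply to this dataset and to your project before using it.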

Similar datasets

Twitter Sentiment Analysis Dataset (Text)
SQuAD (Stanford Question Answering Dataset) (Text)
Labeled Faces in the Wild (LFW) (Image)