Twitter Sentiment Analysis Dataset

The Twitter Sentiment Analysis dataset is widely used in NLP for sentiment analysis (opinion mining) tasks. It contains over a million tweets, each annotated with its sentiment polarity: positive, negative, or neutral.

Size

Approximately 1.6 million annotated tweets, in CSV format

Licence

Use is subject to the Twitter API terms of use. Commercial use requires checking these terms beforehand.

Description


The Twitter Sentiment dataset includes:

  • 1.6 million English-language tweets, each annotated with a sentiment label
  • Sentiment classes: positive, negative and, depending on the version, neutral
  • CSV format, easy to integrate into NLP pipelines
  • Optional metadata (depending on the version): tweet ID, date, username, etc.
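As a rough illustration of how the CSV can feed a pipeline, the sketch below reads rows with Python's standard `csv` module and tallies the label distribution, a useful first check on class balance. The column layout (label first, text second, no header row) and the label strings are assumptions for illustration only; check the header, column order, encoding, and label coding of the version you actually download.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def load_tweets(stream: TextIO, label_col: int = 0, text_col: int = 1) -> list[tuple[str, str]]:
    """Read (label, text) pairs from a headerless CSV stream.

    label_col and text_col are hypothetical defaults; adjust them to the
    real column layout of your copy of the dataset.
    """
    pairs = []
    for row in csv.reader(stream):
        if len(row) <= max(label_col, text_col):
            continue  # skip malformed or truncated rows
        pairs.append((row[label_col].strip(), row[text_col]))
    return pairs


def label_distribution(pairs: Iterable[tuple[str, str]]) -> Counter:
    """Count how many tweets carry each label."""
    return Counter(label for label, _ in pairs)


if __name__ == "__main__":
    # Small in-memory sample standing in for the real file.
    sample = io.StringIO(
        'positive,"Loving the new update, works great!"\n'
        'negative,"Worst customer service ever"\n'
        'neutral,"Meeting moved to 3pm"\n'
        'positive,"What a game last night"\n'
    )
    print(label_distribution(load_tweets(sample)))
```

For a file on disk, open it with `newline=""` (as the `csv` module documentation recommends) and with whatever encoding the file actually uses, then pass the file object to `load_tweets`.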

What is this dataset for?


This dataset is commonly used to:

  • Train sentiment classification models for short texts
  • Analyze trends and opinions on social networks
  • Monitor online reputation (brand monitoring)
  • Improve moderation systems, recommendation systems, or opinion summarization
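To make the first use case concrete, here is a minimal sketch of a sentiment classifier for short texts: a multinomial Naive Bayes model with add-one (Laplace) smoothing over a lowercase bag of words, using only the standard library. It is a simple baseline chosen for illustration, not a method prescribed by the dataset; the tokenizer and the training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: plain words, plus hashtags and mentions kept whole.
TOKEN_RE = re.compile(r"[a-z']+|[#@]\w+")


def tokenize(text: str) -> list[str]:
    """Lowercase the tweet and split it into words, hashtags and mentions."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSentiment":
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior: log P(class).
            score = math.log(doc_count / total_docs)
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen during training
                # Smoothed log likelihood: log P(token | class).
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: train with `NaiveBayesSentiment().fit(texts, labels)` on the tweet texts and their labels, then call `predict` on new tweets. Working in log space avoids numeric underflow when multiplying many small probabilities; for serious work, a library implementation and a held-out evaluation split would be the natural next step.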

Can it be enriched or improved?


Yes. Despite its already large size, this dataset can be enriched through:

  • Addition of emotional subcategories (joy, anger, surprise, etc.)
  • Integration of contextual data (hashtags, emojis, images)
  • Creation of thematic filters (politics, sport, health, etc.)
  • Translation or adaptation for multilingual analysis
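Two of these enrichment ideas, extracting contextual signals (hashtags, emojis) and tagging tweets by theme, can be prototyped with a few regular expressions. The sketch below is only a starting point built on assumptions: the emoji check covers a handful of common Unicode blocks rather than the full emoji specification, and `THEME_KEYWORDS` is a hypothetical placeholder, not a curated lexicon.

```python
import re
from dataclasses import dataclass, field

HASHTAG_RE = re.compile(r"#(\w+)")

# Approximate emoji detection: a few common Unicode emoji blocks only.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]

# Hypothetical keyword lists for thematic filters; replace with curated ones.
THEME_KEYWORDS = {
    "politics": {"election", "vote", "senate", "policy"},
    "sport": {"match", "goal", "football", "nba"},
    "health": {"vaccine", "doctor", "hospital", "flu"},
}


@dataclass
class EnrichedTweet:
    text: str
    label: str
    hashtags: list[str] = field(default_factory=list)
    emojis: list[str] = field(default_factory=list)
    themes: set[str] = field(default_factory=set)


def is_emoji(ch: str) -> bool:
    """Return True if the character falls in one of the approximate emoji blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def enrich(text: str, label: str) -> EnrichedTweet:
    """Attach hashtags, emojis and keyword-based themes to a labelled tweet."""
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    emojis = [ch for ch in text if is_emoji(ch)]
    # "#" is not a word character, so hashtag words also count toward themes.
    words = set(re.findall(r"\w+", text.lower()))
    themes = {theme for theme, keywords in THEME_KEYWORDS.items() if words & keywords}
    return EnrichedTweet(text, label, hashtags, emojis, themes)
```

A thematic subset can then be built by keeping only the enriched tweets whose `themes` contain the topic of interest. Keyword matching is crude, so a trained topic classifier would likely give cleaner filters.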

🔗 Source: Twitter Sentiment Dataset

Frequently Asked Questions

Are the tweets in the dataset still available?

Not necessarily. Some tweets may since have been deleted or made private, so check their availability before use.

Can this dataset be used in a commercial context?

It depends on the Twitter API terms of use. Consult the platform's policy before any commercial use.

Are there newer alternatives?

Yes, other datasets like TweetEval or Sentiment140 offer variants, sometimes enriched or more recent, for similar uses.
