MVSC
URL: https://github.com/MIND-Lab/MVSC
Description:
The Multi-View Sentiment Corpus (MVSC) consists of 3000 tweets, each labeled by three annotators for sentiment, emotion, irony, subjectivity, and implicitness. The dataset is designed to support sentiment analysis along several dimensions at once: sentiment polarity, emotion, and the presence of irony or implicitness.
Methods
The MVSC corpus is structured to provide a rich set of labels for each tweet, covering different aspects that are crucial for understanding sentiment and irony in text:
- Emotion: Labels include anger, anticipation, joy, trust, fear, surprise, sadness, disgust, and none.
- Irony: Tweets are classified as either ironic or not ironic.
- Implicitness/Explicitness: Classifications into explicit, implicit, or none.
- Subjectivity/Objectivity: Classifications of tweets as either subjective or objective.
- Sentiment: Sentiment polarity is labeled as positive, negative, or neutral.
Each emoji in a tweet, if present, is also annotated either with a sentiment polarity (positive, negative, neutral) or as topic-related.
Files
The MVSC corpus includes two files. Both are tab-separated, despite the .csv extension:
- MVSC Tweet Corpus.csv: Contains tweet text, sentiment, emotion, irony, subjectivity, and implicitness annotations.
- MVSC Emoji Corpus.csv: Provides sentiment labels related to emojis used in the tweets.
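Because the files are tab-separated rather than comma-separated, a reader has to set the delimiter explicitly. The following is a minimal sketch using only the Python standard library. The helper names read_tab_separated and load_corpus are hypothetical, and the two sample rows are made up for illustration; they are not taken from the corpus.

```python
import csv
import io


def read_tab_separated(handle):
    """Parse an open tab-separated file into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def load_corpus(path):
    """Open a corpus file as UTF-8 and parse it (hypothetical helper)."""
    # newline="" is the setting the csv module documentation recommends.
    with open(path, encoding="utf-8", newline="") as handle:
        return read_tab_separated(handle)


# Made-up sample with a subset of the documented columns.
# Commas inside the tweet text are harmless because the delimiter is a tab.
sample = (
    "tweet_id\ttweet_text\tkeyword\n"
    "1\tLoved it, honestly\tdeadpool\n"
    "2\tNot, my thing\tsuicide squad\n"
)
rows = read_tab_separated(io.StringIO(sample))
print(rows[0]["tweet_text"])
```

With the real data, load_corpus("MVSC Tweet Corpus.csv") would be the equivalent call, assuming the file sits in the working directory. If tweets contain unbalanced double quotes, the quoting settings of the csv module may need adjusting.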
Format Example (MVSC Tweet Corpus):
Each entry in the MVSC Tweet Corpus.csv includes the following fields:
- tweet_id: Numerical ID of the tweet.
- tweet_text: The UTF-8 encoded text of the tweet.
- user_id: Unique identifier for the user who posted the tweet.
- keyword: The search keyword used to retrieve the tweet. Keywords relate to two popular movies: Deadpool and Suicide Squad.
- Emotion_A1, A2, A3: Emotion labels for each of the three annotators (anger, anticipation, joy, trust, fear, surprise, sadness, disgust, None).
- Irony_A1, A2, A3: Irony labels (Ironic, Not Ironic).
- Implicit-Explicit_A1, A2, A3: Labels for implicitness/explicitness (Explicit, Implicit, None).
- Subjectivity_A1, A2, A3: Labels for subjectivity/objectivity (Subjective, Objective).
- Tweet_Sentiment_A1, A2, A3: Sentiment labels for polarity (Positive, Negative, Neutral).
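Since every dimension carries three independent labels, a common first step is to collapse them into one label per tweet. The sketch below uses a simple majority vote, which is one possible aggregation and not necessarily the one the corpus authors intend. It assumes the per-annotator columns are named with the suffixes _A1, _A2, and _A3 (for example Tweet_Sentiment_A2), which is inferred from the shorthand in the field list; the sample row is made up.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def aggregate(row, dimension, annotators=("A1", "A2", "A3")):
    """Majority label for one dimension of one row.

    Column names of the form '<dimension>_<annotator>' are an assumption.
    """
    return majority_label([row[f"{dimension}_{a}"] for a in annotators])


# Made-up row for illustration only.
row = {
    "Tweet_Sentiment_A1": "Positive",
    "Tweet_Sentiment_A2": "Positive",
    "Tweet_Sentiment_A3": "Neutral",
}
print(aggregate(row, "Tweet_Sentiment"))
```

Returning None when all three annotators disagree keeps those tweets visible, so they can be filtered out or inspected separately rather than silently assigned an arbitrary label.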
Format Example (MVSC Emoji Corpus):
Each entry in the MVSC Emoji Corpus.csv includes:
- tweet_id: Numerical ID of the tweet.
- emoji_position: The position of the emoji in the tweet, ordered by appearance.
- Tweet_Sentiment_A1, A2, A3: Sentiment polarity and topic-relatedness labels for the emoji (Topic, Positive, Negative, Neutral).
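The emoji file shares tweet_id with the tweet file, so emoji annotations can be attached to their tweets by grouping on that field. The following sketch assumes rows have already been parsed into dicts of strings and that emoji_position holds an integer; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_emojis_by_tweet(emoji_rows):
    """Map each tweet_id to its emoji rows, ordered by emoji_position."""
    grouped = defaultdict(list)
    for row in emoji_rows:
        grouped[row["tweet_id"]].append(row)
    for rows in grouped.values():
        # Positions are read as strings, so convert before sorting.
        rows.sort(key=lambda r: int(r["emoji_position"]))
    return dict(grouped)


# Made-up rows for illustration only.
emoji_rows = [
    {"tweet_id": "10", "emoji_position": "2", "Tweet_Sentiment_A1": "Positive"},
    {"tweet_id": "10", "emoji_position": "1", "Tweet_Sentiment_A1": "Topic"},
    {"tweet_id": "11", "emoji_position": "1", "Tweet_Sentiment_A1": "Negative"},
]
by_tweet = group_emojis_by_tweet(emoji_rows)
print([r["emoji_position"] for r in by_tweet["10"]])
```

Looking up by_tweet.get(tweet_id, []) for each tweet row then yields its emoji annotations, or an empty list for tweets without emojis.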
Results
This repository distributes a dataset only, so it reports no experimental results. Researchers can use the multi-annotated corpus to train and evaluate sentiment and irony detection models, and to study how emotion, sentiment, irony, and subjectivity interact.
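Before training on multi-annotated labels, it can help to check how often the annotators agree on a given dimension. The sketch below computes raw pairwise agreement, a simple measure chosen here for illustration; it is not a statistic reported by the repository and does not correct for chance agreement. Column names with the suffixes _A1, _A2, and _A3 are assumed, and the sample rows are made up.

```python
from itertools import combinations


def pairwise_agreement(rows, dimension, annotators=("A1", "A2", "A3")):
    """Fraction of annotator pairs, over all rows, that gave the same label."""
    matches = 0
    total = 0
    for row in rows:
        for a, b in combinations(annotators, 2):
            total += 1
            if row[f"{dimension}_{a}"] == row[f"{dimension}_{b}"]:
                matches += 1
    return matches / total if total else 0.0


# Made-up rows for illustration only.
rows = [
    {"Irony_A1": "Ironic", "Irony_A2": "Ironic", "Irony_A3": "Not Ironic"},
    {"Irony_A1": "Not Ironic", "Irony_A2": "Not Ironic", "Irony_A3": "Not Ironic"},
]
agreement = pairwise_agreement(rows, "Irony")
print(round(agreement, 3))
```

A chance-corrected coefficient would be a more rigorous choice for reporting, but raw agreement is enough to spot dimensions where the labels are noticeably noisier.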
Dataset
The MVSC dataset includes:
- Tweets: 3000 tweets labeled by 3 annotators across five dimensions: sentiment, emotion, irony, subjectivity, and implicitness.
- Emojis: Labels for emojis used in tweets, considering sentiment and topic-relatedness.
Files included in the repository:
- MVSC Tweet Corpus.csv
- MVSC Emoji Corpus.csv
The corpus and the raw data are available from the repository: https://github.com/MIND-Lab/MVSC