SIDATA

SPIRS

URL: https://github.com/bshmueli/SPIRS

Description

SPIRS is a sarcasm dataset of 30,000 tweets: 15,000 sarcastic and 15,000 non-sarcastic. It was collected with reactive supervision, a novel data capturing method that collects both intended and perceived sarcasm. Because the method works from conversation threads, each sarcastic tweet also comes with contextual tweets that can be used in sarcasm detection tasks.

Dataset Details

SPIRS stands for Sarcasm, Perceived and Intended, by Reactive Supervision.
The dataset includes two files:
- SPIRS-sarcastic-ids.csv: Contains 15,000 sarcastic tweet IDs (positive samples).
- SPIRS-non-sarcastic-ids.csv: Contains 15,000 non-sarcastic tweet IDs (negative samples).
Additional metadata for sarcastic tweets includes:
- Sarcasm perspective (intended or perceived).
- Author sequence.
- Contextual tweet IDs (cue, oblivious, and eliciting tweets).
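
As a minimal sketch of how the two ID files could be combined into one labeled list, the example below reads each CSV and attaches a binary label (1 for sarcastic, 0 for non-sarcastic). The column layout is an assumption, not something documented here: the code takes the tweet ID from the first column and skips any row whose ID cell is not purely numeric, such as a header. The function names `load_ids` and `build_labeled_ids` are hypothetical helpers introduced for illustration.

```python
import csv
from pathlib import Path


def load_ids(path, label, id_column=0):
    """Read tweet IDs from one CSV file and pair each with `label`.

    Assumption: the tweet ID sits in column `id_column` (default: first).
    Rows whose ID cell is not all digits (e.g. a header) are skipped.
    """
    rows = []
    with Path(path).open(newline="", encoding="utf-8") as handle:
        for record in csv.reader(handle):
            if len(record) <= id_column:
                continue  # blank or short line
            tweet_id = record[id_column].strip()
            if not tweet_id.isdigit():
                continue  # header or malformed row
            # Keep IDs as strings so no tool downstream rounds them.
            rows.append((tweet_id, label))
    return rows


def build_labeled_ids(sarcastic_path, non_sarcastic_path):
    """Return (tweet_id, label) pairs: 1 = sarcastic, 0 = non-sarcastic."""
    return load_ids(sarcastic_path, 1) + load_ids(non_sarcastic_path, 0)
```

Calling `build_labeled_ids("SPIRS-sarcastic-ids.csv", "SPIRS-non-sarcastic-ids.csv")` would return the combined list. Since the files contain tweet IDs, the tweet text itself would still have to be retrieved separately before training a model.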

Key Features

- Reactive Supervision: A collection method that captures both intended and perceived sarcasm.
- Rich Context: Each sarcastic tweet comes with contextual information, such as the author sequence and related tweets (cue, oblivious, and eliciting), that can help in understanding the sarcasm.
- Research: The reactive supervision paper describes the dataset in detail; a Medium article and a YouTube video offer further insight.

Dataset Size

- SPIRS-sarcastic-ids.csv: 1.31 MB
- SPIRS-non-sarcastic-ids.csv: 673 KB

Methods

No specific methods information is provided in the repository.

Results

No specific results information is provided in the repository.

Models

No specific models information is provided in the repository.