SIDATA

ArSarcasT

URL: https://github.com/Mabdelaziz/ArSarcasT

Description: An Arabic sarcasm detection dataset.

Dataset

Name: ArSarcasT
Size:
- Training data: 3.75 MB
- Testing data: 904 KB

Additional Information

How the datasets were created

The ArSarcasT corpus is a dataset of Arabic tweets for sarcasm detection. It combines tweets from earlier benchmark datasets with new tweets on social and political topics from 2020 to 2022.

- Annotation process: Native Arabic speakers from Egypt annotated the tweets manually, and each final label was decided by majority vote. Tweets taken from the earlier benchmark datasets were re-annotated for consistency, so some of their labels differ from the original ones.
- Composition: The dataset contains 26,014 tweets, of which 28% are labeled sarcastic.
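The snippet below is a minimal sketch of two points from the list: how a majority vote turns several annotators' labels into one final label, and roughly how many tweets the 28% figure corresponds to. The label strings, the per-tweet list of annotator labels, and the rule that a tie yields no label are assumptions for illustration. The description does not state how many annotators labeled each tweet or how ties were handled. Because 28% is a rounded share, the computed counts are approximate.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by more than half of the annotators, else None.

    `labels` is a hypothetical list of per-annotator labels for one tweet.
    Returning None on a tie is an assumption, not ArSarcasT's documented rule.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

# Approximate class sizes implied by the dataset description.
TOTAL_TWEETS = 26_014
SARCASTIC_SHARE = 0.28  # rounded percentage from the description

sarcastic = round(TOTAL_TWEETS * SARCASTIC_SHARE)
non_sarcastic = TOTAL_TWEETS - sarcastic

print(majority_vote(["sarcastic", "sarcastic", "not_sarcastic"]))  # sarcastic
print(sarcastic, non_sarcastic)  # 7284 18730
```

The estimate works out to roughly 7,300 sarcastic and 18,700 non-sarcastic tweets. The non-sarcastic class is more than twice as large, so imbalance is worth considering when splitting or evaluating on this data.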

Training methods applied

Information not available.

Results obtained

Information not available.