ArSarcasT
URL: https://github.com/Mabdelaziz/ArSarcasT
Description: An Arabic sarcasm detection dataset.
Dataset
- Name: ArSarcasT
- Size:
  - Training data: 3.75 MB
  - Testing data: 904 KB
Additional Information
How the datasets were created
The ArSarcasT corpus is a dataset of Arabic tweets built for sarcasm detection. It combines tweets from earlier benchmark datasets with newly collected tweets covering social and political topics from 2020 to 2022.
- Annotation process: Native Arabic speakers from Egypt manually annotated the tweets, with final labels determined through majority voting. Benchmark examples were re-annotated to ensure consistency, which led to some differences from the original labels.
- Composition: The dataset contains 26,014 tweets, with 28% labeled as sarcastic.
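As an illustration of the majority-voting step described above, here is a minimal sketch. The tweet IDs, label strings, number of annotators, and the `majority_label` helper are hypothetical and are not taken from the ArSarcasT repository. How ties were actually resolved is not stated, so the sketch simply returns `None` for them. The last lines estimate the number of sarcastic tweets implied by the reported 28% share; because that percentage is rounded, the count is only approximate.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators, or None if there is no clear winner."""
    counts = Counter(votes).most_common()
    if not counts:
        return None  # no annotations for this tweet
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the top two labels (tie handling is an assumption)
    return counts[0][0]


# Hypothetical annotations: tweet ID -> labels from individual annotators.
annotations = {
    "tweet_1": ["sarcastic", "sarcastic", "not_sarcastic"],
    "tweet_2": ["not_sarcastic", "not_sarcastic", "not_sarcastic"],
    "tweet_3": ["sarcastic", "not_sarcastic"],
}

final_labels = {tweet_id: majority_label(v) for tweet_id, v in annotations.items()}
print(final_labels)

# Rough class size implied by the reported figures (26,014 tweets, 28% sarcastic).
TOTAL_TWEETS = 26_014
SARCASTIC_SHARE = 0.28
approx_sarcastic = round(TOTAL_TWEETS * SARCASTIC_SHARE)
print(approx_sarcastic)
```

An odd number of annotators per tweet avoids ties when there are only two labels, which is one common reason to use three or five annotators in setups like this.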
Training methods applied
Information not available.
Results obtained
Information not available.