SIDATA

ArSarcasm-v2

URL: https://github.com/iabufarha/ArSarcasm-v2

Description: ArSarcasm-v2 is an extension of the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analysis held as part of WANLP 2021.

Dataset

Name: ArSarcasm-v2
Size:
- Training data: 2.28 MB
- Testing data: 571 KB

Additional Information

How the datasets were created

ArSarcasm-v2 extends the original ArSarcasm dataset with data from the DAICT corpus and additional tweets. Each tweet was annotated for sarcasm, sentiment (positive, negative, or neutral), and dialect (Modern Standard Arabic or one of the regional Arabic dialects). The dataset consists of 15,548 tweets in total, divided into:

- Training set: 12,548 tweets
- Test set: 3,000 tweets
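
As a quick sanity check on the annotation scheme and split sizes described above, the following is a minimal sketch of how the label distribution of a split could be tallied from a CSV file. The column names (`tweet`, `sarcasm`, `sentiment`, `dialect`), the label values, and the sample rows are assumptions made for illustration, not taken from the repository; check the header of the actual files before relying on them. Only the split sizes come from the description above.

```python
import csv
import io
from collections import Counter

# Split sizes as stated in the dataset description.
EXPECTED_SIZES = {"train": 12548, "test": 3000}
EXPECTED_TOTAL = sum(EXPECTED_SIZES.values())  # 15548

# Hypothetical column names; verify against the real CSV header.
LABEL_COLUMNS = ("sarcasm", "sentiment", "dialect")


def label_distribution(csv_file, columns=LABEL_COLUMNS):
    """Return the row count and a per-column Counter of label values."""
    reader = csv.DictReader(csv_file)
    counts = {col: Counter() for col in columns}
    total = 0
    for row in reader:
        total += 1
        for col in columns:
            counts[col][row[col]] += 1
    return total, counts


# Placeholder rows with made-up label values, not real tweets from the dataset.
SAMPLE = """tweet,sarcasm,sentiment,dialect
placeholder tweet 1,False,NEU,msa
placeholder tweet 2,True,NEG,egypt
placeholder tweet 3,False,POS,msa
"""

total, counts = label_distribution(io.StringIO(SAMPLE))
print(total)
print(dict(counts["sentiment"]))
```

To run this against a real split, pass an open file handle instead of the `io.StringIO` sample (for example `open(path, encoding="utf-8", newline="")`, where `path` points to a downloaded CSV) and compare the returned row count with the matching entry in `EXPECTED_SIZES`.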

The data was created for the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic. All annotations were performed manually.

Training methods applied

Information not available.

Results obtained

Information not available.