SIDATA

Irony-detection

URL: https://github.com/fatemenajafi135/Irony-detection

Description: Persian irony detection: a Persian dataset, an automatic dataset-creation pipeline, and transformer-based language models fine-tuned for the task.

Dataset Creation

The Persian irony detection dataset was created using two methods:

MirasIrony: A manually labeled dataset for Persian irony detection.
Persian Irony Detection: An automatically labeled dataset built from Persian tweets crawled from a Telegram channel. The process has four steps:
- Crawling: Collecting public messages from Telegram through the Telegram API (the crawling.py script).
- Gathering: Combining the crawled data and saving it to a CSV file (messages.csv).
- Cleaning: Applying basic text cleaning and saving the cleaned dataset (messages_cleaned.csv).
- Labeling: Labeling each tweet according to the most common user reactions it received (its top-2 reactions).
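
The cleaning and labeling steps above can be sketched as follows. This is a minimal illustration, not the repository's code: the reaction sets, the cleaning rules, and the function names `clean_text` and `label_from_reactions` are all assumptions, since the description does not say which reactions map to which label or what "basic text cleaning" covers.

```python
import re
from collections import Counter

# Hypothetical reaction-to-label mapping (an assumption, not from the source).
IRONIC_REACTIONS = {"😂", "🤣"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Assumed basic cleaning: drop URLs and @mentions, collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def label_from_reactions(reactions, top_k=2):
    """Label a message from its top_k most frequent reactions.

    reactions maps a reaction emoji to its count.
    Returns "ironic", "non-ironic", or None when the top reactions disagree
    or the message has no reactions at all.
    """
    top = [emoji for emoji, _ in Counter(reactions).most_common(top_k)]
    if not top:
        return None
    if all(emoji in IRONIC_REACTIONS for emoji in top):
        return "ironic"
    if not any(emoji in IRONIC_REACTIONS for emoji in top):
        return "non-ironic"
    return None  # mixed signal: leave the message unlabeled
```

Leaving mixed-signal messages unlabeled is one possible design choice for keeping automatic labels cleaner; the actual rule used for the dataset may differ.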

Training Methods

The dataset is used to train transformer-based language models for Persian irony detection. Fine-tuning is applied to the following models:

ParsBert vr3
XLM-RoBERTa-Base
XLM-RoBERTa-Large
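
Fine-tuning one of the models listed above could look roughly like the sketch below, which uses the Hugging Face `transformers` library with PyTorch. The library choice, the hyperparameters, the label mapping, and the names `LABEL2ID`, `encode_labels`, and `build_trainer` are assumptions for illustration; the description does not state the training setup that was actually used.

```python
MODEL_NAME = "xlm-roberta-base"  # Hugging Face Hub id for XLM-RoBERTa-Base
LABEL2ID = {"non-ironic": 0, "ironic": 1}  # assumed label encoding


def encode_labels(labels):
    """Map string labels to the integer ids expected by the classifier head."""
    return [LABEL2ID[label] for label in labels]


def build_trainer(train_texts, train_labels, eval_texts, eval_labels,
                  model_name=MODEL_NAME, output_dir="irony-model", max_length=128):
    """Build a Trainer for binary irony classification (labels are integer ids)."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    class TweetDataset(torch.utils.data.Dataset):
        def __init__(self, texts, labels):
            # Tokenize once up front; pad so every item has the same length.
            self.enc = tokenizer(list(texts), truncation=True,
                                 max_length=max_length, padding=True)
            self.labels = list(labels)

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, i):
            item = {key: torch.tensor(values[i]) for key, values in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    # Illustrative hyperparameters only (assumptions, not the reported setup).
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    return Trainer(model=model, args=args,
                   train_dataset=TweetDataset(train_texts, train_labels),
                   eval_dataset=TweetDataset(eval_texts, eval_labels))
```

Calling `build_trainer(...).train()` would start fine-tuning; swapping `model_name` selects one of the other checkpoints.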

The models are evaluated using common metrics like accuracy, recall, precision, and F1 score.
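
For reference, the four metrics can be computed from gold and predicted labels as in the sketch below. It assumes macro averaging over the two classes; the description does not say which averaging was used, and the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred, labels=(0, 1)):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

With `y_true = [1, 1, 0, 0]` and `y_pred = [1, 0, 0, 0]`, accuracy and macro recall are both 0.75, while macro precision is higher because the one positive prediction is correct.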

Results

Results of the fine-tuned language models on the Persian dataset (XLM-RoBERTa-Large scores highest on every metric):

Language Model	Accuracy	Recall	Precision	F1
ParsBert vr3	81.3%	81.4%	81.3%	81.3%
XLM-RoBERTa-Base	82.6%	82.8%	82.6%	82.5%
XLM-RoBERTa-Large	84.7%	84.7%	84.6%	84.6%

Dataset Files:

Persian_irony_detection.csv: 4.7 MB
test.csv: 987 KB
train.csv: 3.81 MB

Dataset Statistics:

Ironic Tweets: 7,014
Non-Ironic Tweets: 7,932
Avg. Tokens per Ironic Tweet: 30
Avg. Tokens per Non-Ironic Tweet: 45
Max Tokens per Ironic Tweet: 260
Max Tokens per Non-Ironic Tweet: 430
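
The class balance implied by these counts can be checked with a few lines of arithmetic. The majority-class figure is only a rough reference point computed over the full dataset; the class proportions of the actual test split are not given here.

```python
ironic = 7014
non_ironic = 7932

total = ironic + non_ironic          # all labeled tweets
ironic_share = ironic / total        # fraction of tweets labeled ironic
majority_share = max(ironic, non_ironic) / total  # always predicting "non-ironic"

print(f"total tweets:        {total}")
print(f"ironic share:        {ironic_share:.1%}")
print(f"majority-class rate: {majority_share:.1%}")
```

The two classes are fairly balanced, so accuracy and F1 are expected to track each other closely on this data.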

Sample Tweets:

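Ironic: “پشت یه کامیونه نوشته بود: سلطان خیانت هیدروژن! هم پیوند کوالانسی میگیره هم هیدروژنی! فکر کنم رانندش لیسانس شیمی داشته 🙁😂🤦🏻‍♂️”
English: “Written on the back of a truck: ‘The king of betrayal is hydrogen! It forms covalent bonds and hydrogen bonds too!’ I think the driver had a bachelor's degree in chemistry.”
Non-Ironic: “آره مهاجرت خوبه ولی قشنگترش این بود که همینجا کنار خانواده و دوستامون به خواسته‌هایی که داشتیم برسیم :(”
English: “Yeah, emigrating is good, but it would have been nicer to reach the things we wanted right here, next to our family and friends :(”
Ironic: “تاس کباب داشتیم بابام جفت شیش آورد همه‌شو خورد”
English: “We had tas kabab; my dad rolled double sixes and ate all of it.” (A pun: “tas” in the dish name also means “dice”.)
Non-Ironic: “مدیون تاول های پامون تو راه اشتباه نباشیم! هر جا که فهمیدیم مسیر درست را انتخاب نکردیم،بدون تردید دور بزنیم و برگردیم!”
English: “Let's not feel indebted to the blisters we got on the wrong road! Wherever we realize we did not choose the right path, let's turn around and go back without hesitation!”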