SIDATA

Turkish-Irony-Dataset

URL: https://github.com/teghub/Turkish-Irony-Dataset

Description: Turkish Social Media Dataset for Irony Detection

Project Overview

The Turkish-Irony-Dataset is a dataset for irony detection in Turkish social media text. It consists of 220 Turkish microblog texts, split evenly between ironic and non-ironic examples (110 each).

Dataset Details:

Dataset Usage:

This dataset can be used to train and evaluate models that aim to detect irony in Turkish social media posts.

Training Methods:

The dataset requires preprocessing before training. The following steps are involved:

  1. Data Preparation: Execute the data_prep.py script to convert the .csv file into .tsv format.
  2. Model Execution: After preparing the data, run the run_model.py script to evaluate your model.
  3. Cross-validation: The dataset includes 10-fold cross-validation files (e.g., 0_train.tsv, 1_test.tsv) for evaluating the model’s performance.
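
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of data_prep.py or run_model.py: the helper names (csv_to_tsv, fold_paths, read_tsv) are hypothetical, the k_train.tsv / k_test.tsv naming for folds 0 to 9 is inferred from the two example file names, and the column layout of the files is an assumption.

```python
import csv
from pathlib import Path


def csv_to_tsv(csv_path, tsv_path):
    """Rewrite a comma-separated file as tab-separated, row by row."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(tsv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter="\t")
        for row in csv.reader(src):
            writer.writerow(row)


def fold_paths(data_dir, n_folds=10):
    """Yield (train, test) paths per fold.

    Assumes files are named k_train.tsv and k_test.tsv for k = 0..n_folds-1.
    """
    data_dir = Path(data_dir)
    for k in range(n_folds):
        yield data_dir / f"{k}_train.tsv", data_dir / f"{k}_test.tsv"


def read_tsv(path):
    """Return all rows of a TSV file as lists of strings."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter="\t"))
```

Using the csv module rather than a plain string replace keeps commas inside quoted text fields intact, which matters for free-form social media text.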

Results:

The evaluation results, including accuracy and other metrics, are saved in the outputs directory. No specific figures for accuracy, precision, recall, or F1 are reported here.
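
For reference, the metrics named above can be computed from gold labels and predictions as in the following sketch. It does not reproduce what run_model.py writes to the outputs directory: the function names are hypothetical, and the label encoding (1 for ironic, 0 for non-ironic) is an assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one fold.

    `positive` is the label treated as the ironic class (assumed to be 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mean_over_folds(per_fold):
    """Average each metric across a list of per-fold metric dicts."""
    return {k: sum(m[k] for m in per_fold) / len(per_fold)
            for k in per_fold[0]}
```

With 10-fold cross-validation, binary_metrics would be called once per test fold and mean_over_folds applied to the ten resulting dictionaries.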