dataset-pruning-sarcasm-detection

URL: https://github.com/priyank96/dataset-pruning-sarcasm-detection

Project Overview

This repository focuses on fine-tuning models for sarcasm detection using a pruned dataset.

Training Methods:

The repository includes scripts for fine-tuning several pre-trained models:
- RoBERTa
- XLNet
- Electra

Each model is run through its own script. Replace model-name with the model's directory name under Models (for example, RoBERTa):

python ./Models/model-name/model-name.py
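
The path pattern above can be expanded for each listed model with a small helper. This is a minimal sketch, not part of the repository: it assumes the directory and script names match the model names in the list exactly (only the RoBERTa path is confirmed in this README), and `script_path` and `build_command` are hypothetical helper names.

```python
from pathlib import Path
from typing import List

# Model names as listed in this README. Assumption: each directory under
# Models and its script use exactly this spelling.
MODEL_NAMES = ["RoBERTa", "XLNet", "Electra"]


def script_path(model_name: str, root: str = ".") -> Path:
    """Build the path Models/<model-name>/<model-name>.py under root."""
    return Path(root) / "Models" / model_name / f"{model_name}.py"


def build_command(model_name: str, root: str = ".") -> List[str]:
    """Return the argument list for running one model script with python."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model: {model_name}")
    return ["python", str(script_path(model_name, root))]


if __name__ == "__main__":
    # Print the command for every listed model without executing it.
    for name in MODEL_NAMES:
        print(" ".join(build_command(name)))
```

The argument list returned by `build_command` can be passed to `subprocess.run` to launch a training script from the repository root.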

Dependencies are installed via:
```
pip install -r requirements.txt
```
The dataset used for training is SARC (Sarcastic Comments on Reddit), which can be downloaded from Kaggle.

Results:

The best results from the paper can be reproduced by running:
```
python ./Models/RoBERTa/RoBERTa.py
```
The repository is based on a fork of a SemEval 2022 Task 6 (Sarcasm Detection) repository on GitHub.
More details on the approach can be found in the related SemEval-2022 Task 6 paper.

Dataset:

Source: Kaggle (SARC dataset).
File: train-balanced-sarcasm.csv (not included in the repository but can be downloaded separately).
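
After downloading the file, a quick sanity check is to count the rows per class. The sketch below uses only the standard library. It assumes the CSV has a header row with a column named `label`; that column name is an assumption and should be checked against the downloaded file. `label_counts` and `load_label_counts` are hypothetical helper names.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable


def label_counts(rows: Iterable[Dict[str, str]], label_column: str = "label") -> Counter:
    """Count how many rows carry each value of the label column."""
    return Counter(row[label_column] for row in rows)


def load_label_counts(csv_path: str, label_column: str = "label") -> Counter:
    """Read a CSV with a header row and count its label values."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return label_counts(csv.DictReader(handle), label_column)


if __name__ == "__main__":
    # Assumption: the file sits in the current working directory.
    path = Path("train-balanced-sarcasm.csv")
    if path.exists():
        print(load_label_counts(str(path)))
    else:
        print(f"{path} not found; download it from Kaggle first")
```

Since the file name describes the data as balanced, the counts for the two classes would be expected to come out roughly equal.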