S3D

URL: https://github.com/surrey-nlp/S3D

Description:
This repository contains sarcasm-annotated datasets along with notebooks for using the fine-tuned language models. The accompanying paper, “Utilizing Weak Supervision to Create S3D: A Sarcasm Annotated Dataset,” was presented at an EMNLP 2022 workshop.

Datasets

The repository provides three datasets focused on sarcasm detection in Twitter data:

Experiments

The repository includes a notebook demonstrating the dataset labeling process. The experiments for creating S3D-v1 and S3D-v2 can be reproduced using the Python notebooks in the repository. The fine-tuned models are loaded from Hugging Face.

Models Used

| Model | Fine-tuned Version | Description |
| --- | --- | --- |
| BERTweet | bertweet-base-finetuned-SARC-combined-DS | Fine-tuned on a combined dataset for sarcasm detection |
| BERTweet | bertweet-base-finetuned-SARC-DS | Fine-tuned specifically on the SARC dataset |
| RoBERTa-large | roberta-large-finetuned-SARC-combined-DS | RoBERTa-large model fine-tuned on the combined dataset |
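
As a rough illustration of loading one of these checkpoints from Hugging Face, the sketch below builds a Hub model id from the names listed above and passes it to the `transformers` text-classification pipeline. The `surrey-nlp` namespace is an assumption taken from the GitHub organisation name, and the dictionary keys and helper names (`hub_id`, `load_classifier`) are hypothetical; the repository's own notebooks remain the reference for the exact loading code.

```python
# Fine-tuned checkpoint names as listed under "Models Used".
# The short dictionary keys are hypothetical labels chosen for this sketch.
CHECKPOINTS = {
    "bertweet-combined": "bertweet-base-finetuned-SARC-combined-DS",
    "bertweet-sarc": "bertweet-base-finetuned-SARC-DS",
    "roberta-large-combined": "roberta-large-finetuned-SARC-combined-DS",
}

# ASSUMPTION: the Hub namespace is guessed from the GitHub organisation
# (github.com/surrey-nlp); verify it before relying on it.
HUB_NAMESPACE = "surrey-nlp"


def hub_id(key, namespace=HUB_NAMESPACE):
    """Return the '<namespace>/<checkpoint>' id for a short key."""
    return f"{namespace}/{CHECKPOINTS[key]}"


def load_classifier(key):
    """Build a text-classification pipeline for the chosen checkpoint.

    transformers is imported lazily so that hub_id() can be used
    without the library installed. Calling this downloads the model.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=hub_id(key))


if __name__ == "__main__":
    classifier = load_classifier("bertweet-sarc")
    # The label names in the output depend on the checkpoint's config.
    print(classifier("Oh great, another Monday."))
```

Running the script requires `transformers` with a backend such as PyTorch, plus network access for the initial download.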

Maintainers