
SIDATA

KoCoSa_sarcasm_detection

URL: https://github.com/Yu-billie/KoCoSa_sarcasm_detection

Description: Repository for “KoCoSa: Korean Context-aware Sarcasm Detection Dataset”, accepted at COLING 2024.

Dataset Creation

KoCoSa is a Korean sarcasm detection dataset with a particular emphasis on context-awareness. It was created to address the difficulty of recognizing sarcasm in Korean text, which depends on both the linguistic features of an utterance and the context around it. The authors annotated the dataset manually, following rigorous standards to ensure high-quality labels.

The dataset consists of Korean sarcasm instances from various domains. The detection task is to analyze the context within the text and decide whether sarcasm is present.
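To make the idea of a context-aware instance concrete, here is a minimal sketch of how one might be represented and flattened into a single model input. The schema is an assumption for illustration only: the field names (`context`, `response`, `label`), the label encoding, the `[SEP]` separator, and the helper `build_input` are hypothetical and may not match the released files. The Korean sentences are invented for this sketch and are not taken from KoCoSa.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SarcasmExample:
    """One context-aware instance (hypothetical schema, not the released format)."""
    context: List[str]  # preceding dialogue turns, oldest first
    response: str       # the utterance being judged
    label: int          # assumed encoding: 1 = sarcastic, 0 = not sarcastic


def build_input(example: SarcasmExample, sep: str = " [SEP] ", max_turns: int = 3) -> str:
    """Join the last `max_turns` context turns and the response into one string."""
    # A slice of [-0:] would return the whole list, so handle zero explicitly.
    recent = example.context[-max_turns:] if max_turns > 0 else []
    return sep.join(recent + [example.response])


# Invented illustration: "The picnic was cancelled because of rain." /
# "I lost my umbrella too." / "What a perfect day."
example = SarcasmExample(
    context=["비가 와서 소풍이 취소됐어.", "게다가 우산도 잃어버렸어."],
    response="정말 완벽한 하루네.",
    label=1,
)
print(build_input(example))
```

Read alone, the response looks like plain praise. Only the preceding turns make the sarcastic reading likely, which is why the context is kept in the input rather than discarded.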

Training Methods

The repository description does not detail specific training methods, but the dataset can be used to train a range of sarcasm detection models, from traditional machine learning models to deep learning approaches such as transformer-based models (e.g., BERT).
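As one example of the traditional end of that range, the sketch below implements a multinomial Naive Bayes classifier over character bigrams using only the Python standard library. It is not the method used in the paper. The class `NaiveBayes`, the helper `char_ngrams`, and the four training sentences are hypothetical placeholders written for this sketch, and real use would replace them with instances loaded from KoCoSa.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Character n-grams: a simple feature choice that needs no Korean tokenizer."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over character n-grams."""

    def fit(self, texts: List[str], labels: List[int]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.feature_counts: Dict[int, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text))
        # Vocabulary across all classes, used for smoothing.
        self.vocab = {g for c in self.feature_counts.values() for g in c}
        return self

    def predict(self, text: str) -> int:
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)  # smoothed log likelihood
            scores[label] = score
        return max(scores, key=scores.get)


# Toy placeholder data, NOT drawn from KoCoSa (assumed: 1 = sarcastic, 0 = not).
train_texts = ["와 정말 대단하네", "참 잘했네 아주", "오늘 날씨가 좋아요", "회의는 세 시에 시작해요"]
train_labels = [1, 1, 0, 0]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("정말 대단하네"))
```

A baseline like this sees only surface character patterns in whatever string it is given, so it cannot reason about context. That makes it a useful lower bound when comparing against context-aware transformer models.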

Results

The results of experiments and models trained on the KoCoSa dataset are shared in the associated paper:

Paper: KoCoSa: Korean Context-aware Sarcasm Detection Dataset (arXiv version).

Additional Information

The KoCoSa dataset is also available through Huggingface for easy access and integration with machine learning pipelines.

Huggingface Dataset: KoCoSa

Dataset Files:

Dataset details are available on request or through the Huggingface repository.
The files are expected to contain Korean text instances annotated for sarcasm.