View on GitHub

SIDATA

Sarcasm Detection

URL: https://github.com/MirunaPislar/Sarcasm-Detection

Description

This project explores sarcasm detection on Twitter using both traditional machine learning and deep learning techniques. Sarcasm, a form of verbal irony, is challenging to detect in written text due to the absence of paralinguistic cues such as intonation and facial expressions.

The project was completed as part of a Bachelor of Science in Computer Science at the University of Manchester, under the supervision of Mr. John McNaught. A video summarizing the project achievements is available here.

Methods and Models

The study investigates various methods for sarcasm detection in tweets:

Traditional Machine Learning Models:
- Support Vector Machines (SVMs)
- Logistic Regression
- Discrete feature-based classifiers
Deep Learning Models:
- Convolutional Neural Networks (CNNs)
- Long Short-Term Memory Networks (LSTMs)
- Gated Recurrent Units (GRUs)
- Bi-directional LSTMs
- Attention-based LSTMs

These models were evaluated on four different Twitter datasets (details available in res/ folder).

Datasets

The repository contains both raw and processed Twitter datasets, as well as vocabularies, word lists, and emoji selections that were useful for preprocessing.

Implementation Details

The source code is structured into multiple directories:

src/ – Code for data processing, analysis, training, and evaluation.
res/ – Datasets and preprocessed resources.
models/ – Pre-trained and trained models.
plots/ – Visualization plots used to support the results.
stats/ – Statistical comparisons between different preprocessing and model evaluation stages.
images/ – Model architecture visualizations and screenshots from the project report.

Requirements

The code has been tested on Python 3.5 (Ubuntu 16.04) with Keras 2.0.8 and TensorFlow 1.3 as the backend.

Required dependencies:

Python 3.5
Keras 2.0
TensorFlow 1.3
gensim 3.0
numpy 1.13
scikit-learn
h5py
emoji
tqdm
pandas
itertools
matplotlib

Results

The performance of different deep learning models on the considered datasets is summarized in the table below:

Results

Visualizations

The project includes tools to visualize deep network layers. Running specific scripts generates .html files in plots/html_visualizations/, which allow detailed analysis of hidden unit activations.

Disclaimer

The primary goal of this project was not to develop the most computationally efficient implementation but rather to derive meaningful insights about sarcasm detection in Twitter data. While the code has been reviewed and verified, users should apply it at their own discretion.