Sentiment-Analysis—Amazon-Fine-Food-Reviews

URL: https://github.com/amritajose/Sentiment-Analysis—Amazon-Fine-Food-Reviews

Description

This project implements a sentiment analysis system that uses NLP techniques to handle the complexities of language in user reviews, such as context, ambiguity, sarcasm, and irony. It is built on the Amazon Fine Food Reviews dataset and aims to help customers make faster, better-informed decisions about products. The system also includes a review summarization feature that condenses each review to fewer than 20 words.

Project Overview

Solution Implementation

  1. Data Preprocessing: Handling missing values, duplicates, and redundant data, and adding new variables for analysis.
  2. Random Undersampling: Balancing the number of reviews in each sentiment class so that no class dominates training, which improves model performance.
  3. Sentiment Classification: Training three models (Logistic Regression, Naïve Bayes, and K-Nearest Neighbors) on data vectorized with Bag of Words and TF-IDF.
  4. Exploratory Analysis: Analyzing the data distribution, generating word clouds, and identifying helpful reviews.
  5. Sentiment Prediction: Predicting the sentiment of reviews with the trained machine learning models.
  6. Summarization: Condensing reviews to fewer than 20 words with NLTK, then comparing the sentiment of each summary with that of the original review.
  7. Model Evaluation: Comparing the accuracy of the models and of the summarization techniques.
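Step 1 (data preprocessing) could look roughly like the following. This is a minimal sketch, not the project's code. It assumes each review is a dict keyed by the Kaggle CSV column names (`Text`, `Score`, `UserId`, `Time`, `HelpfulnessNumerator`, `HelpfulnessDenominator`). The helper name `preprocess`, the duplicate rule, and the derived `HelpfulnessRatio` and `WordCount` columns are hypothetical choices made for illustration.

```python
from typing import Any, Dict, List


def preprocess(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: drop missing, invalid, and duplicate reviews, then add derived columns."""
    seen = set()
    cleaned = []
    for row in rows:
        text = row.get("Text")
        score = row.get("Score")
        # Missing values: skip rows without review text or a rating.
        if not text or score is None:
            continue
        num = int(row.get("HelpfulnessNumerator") or 0)
        den = int(row.get("HelpfulnessDenominator") or 0)
        # Invalid data: helpful votes cannot exceed total votes.
        if num > den:
            continue
        # Duplicates (assumed rule): same user, same timestamp, same text counts as one review.
        key = (row.get("UserId"), row.get("Time"), text)
        if key in seen:
            continue
        seen.add(key)
        enriched = dict(row)  # copy so the caller's rows are not mutated
        # New variables: share of helpful votes (None if nobody voted) and review length in words.
        enriched["HelpfulnessRatio"] = num / den if den else None
        enriched["WordCount"] = len(text.split())
        cleaned.append(enriched)
    return cleaned
```

The helpfulness ratio is also one simple way to identify helpful reviews during exploratory analysis (step 4).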
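Random undersampling (step 2) keeps every minority-class review and randomly discards majority-class reviews until all classes are the same size. The pure-Python sketch below illustrates the idea. The function name `random_undersample` and the fixed seed are illustrative assumptions. The project may instead have used a library class such as imbalanced-learn's `RandomUnderSampler`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def random_undersample(
    texts: List[str], labels: List[str], seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Randomly drop examples so every class has as many as the smallest class."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    out_texts: List[str] = []
    out_labels: List[str] = []
    for label in sorted(by_label):
        # Sample without replacement; the smallest class is kept in full.
        for text in rng.sample(by_label[label], n_min):
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels
```

Undersampling discards data, so it trades training-set size for balance. This is usually acceptable when the majority class (here typically positive reviews) is very large.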
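For the word clouds in step 4, the key input is the frequency of each word per sentiment class. The sketch below computes those counts. The small `STOPWORDS` set and the function name `top_words_by_label` are hypothetical. The resulting counts could then be handed to a word-cloud library, for example the `wordcloud` package's `generate_from_frequencies`.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Tiny illustrative stopword list; a full list such as NLTK's would normally be used.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "i", "to", "of", "was", "for", "in", "my"}


def top_words_by_label(
    texts: List[str], labels: List[str], n: int = 10
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most frequent non-stopword tokens for each sentiment label."""
    counters: Dict[str, Counter] = {}
    for text, label in zip(texts, labels):
        tokens = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
        counters.setdefault(label, Counter()).update(tokens)
    return {label: counter.most_common(n) for label, counter in counters.items()}
```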
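The two vectorizations named in step 3 differ in how they weight words. Bag of Words uses raw term counts. TF-IDF scales each count by how rare the term is across documents. The following is a minimal from-scratch sketch of both, with hypothetical names. The smoothed IDF, ln((1 + N) / (1 + df)) + 1, matches scikit-learn's `TfidfVectorizer` default, but this sketch omits that class's L2 row normalization.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphabetic runs; a simplification of real NLP tokenizers.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs: List[str]) -> List[Dict[str, int]]:
    """Raw term counts per document, as sparse dicts."""
    return [dict(Counter(tokenize(d))) for d in docs]


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Term count times smoothed idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in counts for term in doc)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in doc.items()} for doc in counts]
```

For example, in the two reviews "great taste great price" and "bad taste", `taste` appears in both and gets IDF 1, while `bad` appears in only one and is weighted higher.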
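Naïve Bayes is one of the three classifiers listed in step 3. The following is a compact sketch of a multinomial Naïve Bayes over Bag-of-Words counts with add-one (Laplace) smoothing, plus the accuracy metric used for comparison in step 7. It is written from scratch for illustration, with hypothetical names `NaiveBayesSentiment` and `accuracy`, and is not presented as the project's implementation. Library equivalents include scikit-learn's `MultinomialNB` and `accuracy_score`.

```python
import math
import re
from collections import Counter, defaultdict
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesSentiment":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict_one(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum of log P(word | label); words unseen in training are ignored.
            score = math.log(count / n_docs)
            for w in tokenize(text):
                if w in self.vocab:
                    score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts: List[str]) -> List[str]:
        return [self.predict_one(t) for t in texts]


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Working in log space avoids floating-point underflow when multiplying many small word probabilities for long reviews.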
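The summarization in step 6 can be approximated with a frequency-based extractive method. Each sentence is scored by the average frequency of its content words, and the best sentences are kept while the total stays under 20 words. The project uses NLTK. To stay dependency-free, this sketch substitutes a regex sentence splitter and a tiny hypothetical `STOPWORDS` set for NLTK's sentence tokenizer and stopword corpus. The scoring rule and the name `summarize` are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list standing in for NLTK's stopwords corpus.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "i", "to", "of", "was", "for",
             "in", "my", "but", "are", "with", "so", "very"}


def summarize(review: str, max_words: int = 19) -> str:
    """Extractive summary of at most max_words words (19 keeps it under 20)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", review.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    # Rank sentences by average content-word frequency, best first.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= max_words:
            chosen.append(i)
            used += n
    if not chosen:
        # Every sentence is too long (or there are none): truncate the best one.
        return " ".join(sentences[ranked[0]].split()[:max_words]) if sentences else ""
    # Restore the original sentence order for readability.
    return " ".join(sentences[i] for i in sorted(chosen))
```

The summary can then be passed to the same classifier as the full review to compare the two predicted sentiments.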

Dataset

The dataset used in this project is the Amazon Fine Food Reviews dataset, which contains product reviews and ratings. It is available for download on Kaggle: Amazon Fine Food Reviews Dataset.
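Because the dataset provides star ratings rather than sentiment labels, the ratings have to be mapped to classes before training. This sketch assumes the Kaggle CSV (assumed here to be named `Reviews.csv`) has `Score` and `Text` columns. It also follows a common convention that the README does not state: 4 to 5 stars count as positive, 1 to 2 stars as negative, and 3-star reviews are dropped as neutral. Both function names are hypothetical.

```python
import csv
from typing import Iterable, List, Optional, Tuple


def score_to_sentiment(score: int) -> Optional[str]:
    # Assumed convention: 4-5 stars positive, 1-2 stars negative, 3 stars neutral (None).
    if score > 3:
        return "positive"
    if score < 3:
        return "negative"
    return None


def load_labeled_reviews(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read CSV rows and return (review text, sentiment) pairs, skipping neutral ratings."""
    pairs = []
    for row in csv.DictReader(lines):
        label = score_to_sentiment(int(row["Score"]))
        if label is not None:
            pairs.append((row["Text"], label))
    return pairs
```

With a local copy of the file, usage would be `with open("Reviews.csv", newline="", encoding="utf-8") as f: pairs = load_labeled_reviews(f)`.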

Results

Sample Input and Output