This project aims to develop a product review system powered by NLP models that aggregate customer feedback from different sources. The key tasks include classifying reviews, clustering product categories, and using generative AI to summarize reviews into recommendation articles.
With thousands of reviews available across multiple platforms, manually analyzing them is inefficient. This project seeks to automate the process using NLP models to extract insights and provide users with valuable product recommendations.
- Objective: Classify customer reviews into positive, negative, or neutral categories to help the company improve its products and services.
- Task: Create a model for classifying the textual content of reviews into these three categories.
Since the dataset contains star ratings (1 to 5), you should map them to three sentiment classes as follows:
| Star Rating | Sentiment Class |
|---|---|
| 1 - 2 | Negative |
| 3 | Neutral |
| 4 - 5 | Positive |
This is a simple approach, but you are encouraged to experiment with different mappings!
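The table above can be expressed as a small helper (a minimal sketch; the function name is illustrative):

```python
def rating_to_sentiment(stars: int) -> str:
    """Map a 1-5 star rating to one of three sentiment classes."""
    if stars <= 2:
        return "negative"   # 1-2 stars
    if stars == 3:
        return "neutral"    # 3 stars
    return "positive"       # 4-5 stars

# Example: label a column of ratings
ratings = [5, 3, 1, 4]
labels = [rating_to_sentiment(r) for r in ratings]
# labels == ["positive", "neutral", "negative", "positive"]
```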
Model Building
For classifying customer reviews into positive, negative, or neutral, use pretrained transformer-based models to leverage powerful language representations without training from scratch.
- `distilbert-base-uncased` – Lightweight and fast, ideal for limited resources.
- `bert-base-uncased` – A strong general-purpose model for sentiment analysis.
- `roberta-base` – More robust to nuanced sentiment variations.
- `nlptown/bert-base-multilingual-uncased-sentiment` – Handles multiple languages, useful for diverse datasets.
- `cardiffnlp/twitter-roberta-base-sentiment` – Optimized for short texts like social media reviews.
Explore models on Hugging Face and experiment with fine-tuning to improve accuracy.
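As a starting point, a checkpoint can be loaded through the `transformers` pipeline API. The sketch below assumes the `transformers` package is installed; the three-class label mapping is our own convention from the star-rating table, not part of the checkpoint:

```python
# Our convention for the three sentiment classes (id order is arbitrary).
LABEL2ID = {"negative": 0, "neutral": 1, "positive": 2}
ID2LABEL = {v: k for k, v in LABEL2ID.items()}

def classify_reviews(texts, model_name="distilbert-base-uncased"):
    """Run a checkpoint over a batch of review texts.

    Note: the base checkpoint above ships with a randomly initialized
    classification head -- fine-tune it on the mapped star ratings
    before trusting its predictions.
    """
    from transformers import pipeline  # imported lazily; heavy dependency
    clf = pipeline("text-classification", model=model_name)
    return clf(texts, truncation=True)
```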
- Evaluate the model's performance on a separate test dataset using these metrics:
  - Accuracy: Percentage of correctly classified instances.
  - Precision: Proportion of true positive predictions among all positive predictions.
  - Recall: Proportion of true positive predictions among all actual positive instances.
  - F1-score: Harmonic mean of precision and recall.
- Calculate a confusion matrix to analyze the model's performance across the three classes.
- Model achieved an accuracy of X% on the test dataset.
- Precision, recall, and F1-score for each class:
  - Class 1: Precision=X%, Recall=X%, F1-score=X%
  - Class 2: Precision=X%, Recall=X%, F1-score=X%
  - ...
- Confusion matrix, presented in both tabular and graphical form.
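All of these metrics are available from scikit-learn. A minimal sketch with toy labels (replace them with real test-set output; assumes scikit-learn is installed):

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

CLASSES = ["negative", "neutral", "positive"]

# Toy ground truth vs. model predictions -- replace with real test-set output.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]

acc = accuracy_score(y_true, y_pred)
report = classification_report(y_true, y_pred, labels=CLASSES, zero_division=0)
cm = confusion_matrix(y_true, y_pred, labels=CLASSES)  # rows = true class

print(f"Accuracy: {acc:.2%}")
print(report)   # per-class precision, recall, F1
print(cm)
```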
- Objective: Simplify the dataset by clustering product categories into 4-6 meta-categories.
- Task: Create a model to group all reviews into 4-6 broader categories. Example suggestions:
  - Ebook readers
  - Batteries
  - Accessories (keyboards, laptop stands, etc.)
  - Non-electronics (Nespresso pods, pet carriers, etc.)
- Note: Analyze the dataset in depth to determine the most appropriate categories.
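One lightweight baseline is TF-IDF vectors plus k-means over the product text. This is a sketch assuming scikit-learn; sentence-embedding models would likely give better clusters, and the toy documents below are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy product descriptions -- in practice, use the review/product text.
docs = [
    "kindle paperwhite ebook reader backlit screen",
    "ereader with e-ink display and long battery",
    "aa alkaline batteries 48 pack long lasting",
    "rechargeable aaa battery charger set",
    "bluetooth keyboard for laptop and tablet",
    "nespresso coffee pods variety pack",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(vectors)
print(kmeans.labels_)  # cluster id per document
```

Inspect the top TF-IDF terms per cluster to decide what each meta-category actually is before naming it.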
- Objective: Summarize reviews into articles that recommend the top products for each category.
- Task: Create a model that generates a short article (like a blog post) for each product category. The output should include:
  - Top 3 products and key differences between them.
  - Top complaints for each of those products.
  - Worst product in the category and why it should be avoided.
Consider using pretrained generative models such as T5, GPT-3, or BART to produce coherent, well-structured summaries. These models excel at summarization and text generation, and they can be fine-tuned to produce high-quality output from the insights extracted from the reviews. You are also encouraged to explore other transformer-based models on platforms like Hugging Face; fine-tuning any of these pretrained models on your specific dataset can further improve the relevance and quality of the generated summaries.
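One possible shape for the generation step is sketched below. The helper names and the "brief" format are our own assumptions, and the `transformers` summarization call is only one option (it requires the package and a checkpoint download):

```python
def build_category_brief(category, insights):
    """Assemble extracted review insights into a source text for the
    summarizer. `insights` is a list of short findings, e.g. output of
    the classification and clustering steps."""
    lines = [f"Product category: {category}"] + [f"- {i}" for i in insights]
    return "\n".join(lines)

def generate_article(brief, model_name="facebook/bart-large-cnn"):
    """Summarize the brief into a short blog-style article."""
    from transformers import pipeline  # imported lazily; heavy dependency
    summarizer = pipeline("summarization", model=model_name)
    return summarizer(brief, max_length=200, min_length=60)[0]["summary_text"]

brief = build_category_brief(
    "Ebook readers",
    ["Top pick: Model A -- praised for battery life",
     "Common complaint about Model B: slow page turns"],
)
```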
- Primary Dataset: Amazon Product Reviews
- Larger Dataset: Amazon Reviews Dataset
- Additional Datasets: You are free to use other datasets from sources like HuggingFace, Kaggle, or any other platform.
- All three components (classification, clustering, and summarization) should be visible or available to interact with on the page in some form.
- You are free to host the models on your laptop or any cloud platform (e.g., Gradio, AWS, etc.).
- Source Code:
  - Well-organized and linted code (use tools like `pylint`).
  - Notebooks should be structured with clear headers/sections.
  - Alternatively, provide plain Python files with a `main()` function.
- README:
  - A detailed README file explaining how to run the code and reproduce the results.
- Final Output:
  - Generated blog posts with product recommendations.
  - A website, text file, or Word document containing the final results.
- PPT Presentation:
  - A presentation (no more than 15 minutes) tailored for both technical and non-technical audiences.
- Bonus | Deployed Model:
  - A deployed website/app using the framework of your choice.
  - Host the app so it can be queried by anyone.
| Task | Points |
|---|---|
| Data Preprocessing | 15 |
| Model for Review Classification | 20 |
| Clustering Model | 20 |
| Summarization Model | 30 |
| PDF Report (Approach, Results, Analysis) | 5 |
| PPT Presentation | 10 |
| Bonus: Deployment & Hosting the App Publicly | 10 |
Passing Score: 70 points.
- Teamwork: Work individually or in groups of no more than 2 people.
- Presentation: Tailor your presentation for both technical and non-technical audiences.
- Data Collection: Gather and preprocess the dataset(s).
- Model Development:
- Build and evaluate the review classification model.
- Develop and test the clustering model.
- Create the summarization model using Generative AI.
- OPTIONAL | Deployment: Deploy the models using your chosen framework.
- Documentation: Prepare the README, PDF report, and PPT presentation.
- Final Delivery: Submit all deliverables.
