Sparkify User Churn Prediction

Flask, AWS, Azure, GCP, ML, Classification, Apache cluster, HPC

Project Overview

This project predicts user churn for Sparkify, a fictional music streaming service. Churn prediction is critical for subscription-based businesses that want to retain customers and improve their service.

Motivation

Predicting user churn helps businesses identify users who are likely to cancel their subscription. This information can be used to take proactive measures to retain customers, such as targeted promotions, personalized offers, or improved customer service.

Data

The dataset used for this project contains user activity logs for Sparkify. The data includes information such as user demographics, session details, page views, and the length of time users listened to songs. The target variable is churn, which indicates whether a user has canceled their subscription.
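
The exact loading and labelling code lives in the notebook; as a rough orientation, a first step might look like the sketch below. The file name mini_sparkify_event_data.json and the churn definition via a "Cancellation Confirmation" page event are illustrative assumptions, not necessarily what this repository uses.

    # Sketch: load the event log with PySpark and derive a per-user churn label.
    # File name and churn definition are assumptions for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sparkify-data").getOrCreate()

    events = spark.read.json("mini_sparkify_event_data.json")  # assumed file name
    events.printSchema()  # demographics, session details, page views, song lengths

    # A user counts as churned if they ever reach a cancellation event (assumed definition).
    churn_labels = (
        events.withColumn("is_cancel", (F.col("page") == "Cancellation Confirmation").cast("int"))
              .groupBy("userId")
              .agg(F.max("is_cancel").alias("churn"))
    )
    churn_labels.show(5)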

Architecture

The project uses the following architecture:

  1. Data Preprocessing: Cleaning and transforming the data using PySpark.
  2. Feature Engineering: Creating relevant features for the prediction model.
  3. Model Training: Training a machine learning model using PySpark's MLlib (a minimal sketch follows this list).
  4. Model Evaluation: Evaluating the model's performance using appropriate metrics.
  5. Web Application: Building a Flask web application to serve the model for real-time predictions.
  6. Project Report: Summarising the project workflow, results, conclusions, and future improvements.
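
Steps 2-4 boil down to a standard Spark ML pipeline. The sketch below is a minimal, self-contained illustration; the feature names, the toy rows, and the choice of logistic regression are assumptions and not the exact setup used in Sparkify.ipynb or sparkify_etl_model.py.

    # Minimal sketch of feature assembly, training, and evaluation with PySpark MLlib.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler, StandardScaler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    spark = SparkSession.builder.appName("sparkify-train").getOrCreate()

    # Toy stand-in for the per-user feature table produced by the feature-engineering step.
    features_df = spark.createDataFrame(
        [(1, 40.0, 25.0, 30.0, 2.0, 0),
         (2, 3.0, 10.0, 1.0, 8.0, 1),
         (3, 22.0, 18.0, 12.0, 3.0, 0),
         (4, 5.0, 7.0, 0.0, 6.0, 1)],
        ["userId", "num_sessions", "songs_per_session", "thumbs_up", "thumbs_down", "churn"],
    )

    assembler = VectorAssembler(
        inputCols=["num_sessions", "songs_per_session", "thumbs_up", "thumbs_down"],
        outputCol="raw_features",
    )
    scaler = StandardScaler(inputCol="raw_features", outputCol="features")
    clf = LogisticRegression(featuresCol="features", labelCol="churn")

    model = Pipeline(stages=[assembler, scaler, clf]).fit(features_df)

    # In practice, evaluate on a held-out split; F1 suits the imbalanced churn label.
    evaluator = MulticlassClassificationEvaluator(labelCol="churn", metricName="f1")
    print("F1:", evaluator.evaluate(model.transform(features_df)))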

General Analysis

The general data processing and exploratory analysis are documented in Sparkify.ipynb and rendered as Sparkify.html.

Web Application

A Flask web application is built to interact with the trained machine learning model. Users can submit their data through the web interface and get predictions about whether they are likely to churn.

You can find the web app in Sparkify_app/. The trained model is stored as Sparkify_app/final_model.zip.
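
As a rough idea of how such an app can serve a saved Spark pipeline, here is a minimal sketch. The /predict route, the "dataset" form field, and the layout inside final_model.zip are assumptions made for illustration; see Sparkify_app/app.py for the actual implementation.

    # Sketch of a Flask app serving a saved Spark pipeline model (assumed names and paths).
    import zipfile
    from flask import Flask, request, jsonify
    from pyspark.sql import SparkSession
    from pyspark.ml import PipelineModel

    app = Flask(__name__)
    spark = SparkSession.builder.appName("sparkify-serve").getOrCreate()

    # Unpack and load the trained pipeline once at startup (assumed zip layout).
    with zipfile.ZipFile("Sparkify_app/final_model.zip") as zf:
        zf.extractall("final_model")
    model = PipelineModel.load("final_model")

    @app.route("/predict", methods=["POST"])    # assumed route name
    def predict():
        uploaded = request.files["dataset"]      # assumed form field name
        uploaded.save("uploaded.json")
        events = spark.read.json("uploaded.json")
        # In practice the same feature engineering as training would run here first;
        # this sketch assumes the saved pipeline accepts these columns directly.
        preds = model.transform(events).select("userId", "prediction").toPandas()
        return jsonify(preds.to_dict(orient="records"))

    if __name__ == "__main__":
        app.run(debug=True)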

Installation

To run this project locally, follow these steps:

  • Set up a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  • Set up PySpark: Ensure you have Apache Spark installed and configured. Follow the instructions in the official Spark documentation. A quick sanity check is sketched below.
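
If everything is wired up correctly, the following throwaway session should start without errors (the app name here is arbitrary):

    # Quick check that PySpark can start a local session from inside the virtual environment.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("sparkify-check").getOrCreate()
    print("Spark version:", spark.version)
    spark.stop()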

Usage

  • Download the Sparkify_app directory
  • Run app.py
  • Open the URL shown in the running terminal
  • Upload the user dataset through the web interface as indicated
  • Wait for the prediction

Training the Model

To train the model, run:

sparkify_etl_model.py
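
Depending on your setup, the script can be launched with a plain Python interpreter (for a local Spark session) or submitted to a cluster with spark-submit.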
