Flask, AWS, Azure, GCP, ML, Classification, Apache cluster, HPC
The goal is to predict user churn for a fictional music streaming service called Sparkify. Churn prediction is critical for subscription-based businesses to retain customers and improve their service.
Predicting user churn helps businesses identify users who are likely to cancel their subscription. This information can be used to take proactive measures to retain customers, such as targeted promotions, personalized offers, or improved customer service.
The dataset used for this project contains user activity logs for Sparkify. The data includes information such as user demographics, session details, page views, and the length of time users listened to songs. The target variable is churn, which indicates whether a user has canceled their subscription.
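As a rough illustration of how event-level logs can be rolled up into one row per user with a churn label, here is a minimal pure-Python sketch. The field names (`userId`, `sessionId`, `page`, `length`) and the use of a "Cancellation Confirmation" page event as the churn signal are assumptions made for this example; the project's own logic lives in the notebook and may differ.

```python
from collections import defaultdict

# Assumption: a cancellation shows up in the logs as an event on this page.
CHURN_PAGE = "Cancellation Confirmation"


def summarise_users(events):
    """Aggregate raw log events into one feature row per user."""
    users = defaultdict(lambda: {"sessions": set(), "page_views": 0,
                                 "listen_seconds": 0.0, "churn": 0})
    for event in events:
        row = users[event["userId"]]
        row["sessions"].add(event["sessionId"])
        row["page_views"] += 1
        # Only song-play events carry a length; a missing value adds nothing.
        row["listen_seconds"] += event.get("length") or 0.0
        if event["page"] == CHURN_PAGE:
            row["churn"] = 1
    return {
        user_id: {
            "n_sessions": len(row["sessions"]),
            "page_views": row["page_views"],
            "listen_seconds": row["listen_seconds"],
            "churn": row["churn"],
        }
        for user_id, row in users.items()
    }


# Hypothetical sample events, invented for illustration only.
events = [
    {"userId": "u1", "sessionId": 1, "page": "NextSong", "length": 200.0},
    {"userId": "u1", "sessionId": 2, "page": "NextSong", "length": 180.5},
    {"userId": "u1", "sessionId": 2, "page": CHURN_PAGE, "length": None},
    {"userId": "u2", "sessionId": 7, "page": "Home", "length": None},
]
summary = summarise_users(events)
```

The same grouping could be expressed in PySpark with a `groupBy` on the user column; the plain-Python version is used here only to keep the example self-contained.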
The project uses the following architecture:
- Data Preprocessing: Cleaning and transforming the data using PySpark.
- Feature Engineering: Creating relevant features for the prediction model.
- Model Training: Training a machine learning model using PySpark's MLlib.
- Model Evaluation: Evaluating the model's performance using appropriate metrics.
- Web Application: Building a Flask web application to serve the model for real-time predictions.
- Project Report: Summarising the project workflow, results, conclusions, and future improvements.
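For the Model Evaluation step, the choice of metric matters because churned users are usually a minority, so accuracy alone can look good while missing most churners. The sketch below computes accuracy, precision, recall, and F1 for binary labels. Which metrics the project actually reports is not stated here, so treat this as an illustration of one common choice; the function name and sample labels are made up.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy example the accuracy is higher than the F1 score, which reflects the one churner that the predictions missed.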
The overall data processing workflow is documented in Sparkify.ipynb, with a rendered version in Sparkify.html.
A Flask web application is built to interact with the trained machine learning model. Users can submit their data through the web interface and get predictions about whether they are likely to churn.
You can find the web app in Sparkify_app/. The trained model is stored as Sparkify_app/final_model.zip.
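Since the trained model ships as a zip archive, it needs to be unpacked before it can be loaded. Below is a minimal sketch using only the standard library; the helper name `extract_model` and the temporary-directory default are illustrative assumptions, not the app's actual code.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_model(archive_path, target_dir=None):
    """Unpack a zipped model directory and return the extraction path."""
    # Fall back to a fresh temporary directory when no target is given.
    target = Path(target_dir or tempfile.mkdtemp(prefix="sparkify_model_"))
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target


# Example usage (not executed here):
#   model_dir = extract_model("Sparkify_app/final_model.zip")
# If the archive holds a saved Spark ML pipeline, it could then be loaded
# with pyspark.ml.PipelineModel.load on the extracted directory. Whether
# the project saves a PipelineModel or another model class is an assumption.
```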
To run this project locally, follow these steps:
- Set up a virtual environment:
      python3 -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Set up PySpark: Ensure you have Apache Spark installed and configured, following the instructions in the official Spark documentation.
- Download the Sparkify_app directory
- Run app.py
- Open the URL shown in the terminal where the app is running
- Upload the user dataset through the web page as indicated
- Wait for the prediction
To train the model, run:
sparkify_etl_model.py