This repository contains an end-to-end machine learning project that predicts whether a patient is diabetic or non-diabetic based on diagnostic health measurements. The project covers the full ML workflow: exploratory data analysis (EDA), preprocessing, model building, evaluation, and deployment of a simple prediction function.
- Explore dataset distributions and class balance.
- Visualize relationships (e.g., glucose vs outcome, BMI vs outcome).
- Use charts and summary statistics to surface patterns, outliers, and data-quality issues (e.g., missing or implausible values).
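A minimal sketch of the first EDA checks. It assumes the data is loaded into a pandas DataFrame with a binary `Outcome` column (1 = diabetic, 0 = non-diabetic) and feature columns such as `Glucose` and `BMI`, as in the commonly used Pima Indians Diabetes layout. The column names and the `eda_summary` helper are hypothetical, and the sample rows are made up for illustration.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame, target: str = "Outcome") -> dict:
    """Return class balance, per-class feature means, and missing counts (hypothetical helper)."""
    # Fraction of rows in each class: reveals class imbalance
    class_balance = df[target].value_counts(normalize=True).sort_index()
    # Mean of every numeric feature within each outcome group, e.g. Glucose vs Outcome
    group_means = df.groupby(target).mean(numeric_only=True)
    # Number of missing values per column
    missing = df.isna().sum()
    return {"class_balance": class_balance, "group_means": group_means, "missing": missing}


if __name__ == "__main__":
    # Made-up sample rows; replace with pd.read_csv(...) on the real dataset
    sample = pd.DataFrame({
        "Glucose": [85, 168, 100, 183],
        "BMI": [26.6, 43.1, 30.0, 23.3],
        "Outcome": [0, 1, 0, 1],
    })
    summary = eda_summary(sample)
    print(summary["class_balance"])
    print(summary["group_means"])
```

For the charts, a call such as `seaborn.boxplot(data=df, x="Outcome", y="Glucose")` shows the glucose distribution for each outcome side by side.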
- Standardize features to a common scale (zero mean, unit variance), which matters for scale-sensitive models such as Logistic Regression and SVM.
- Split the dataset into training and testing sets.
- Evaluate on the held-out test set to check that the model generalizes to unseen data.
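One way to combine the split and the scaling without leaking test information is to fit the scaler on the training rows only. This is a sketch under stated assumptions: the 80/20 split, `random_state=42`, stratification by label, and the `split_and_scale` name are illustrative choices, not the project's confirmed settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(X, y, test_size=0.2, random_state=42):
    """Stratified train/test split, then standardize using training statistics only."""
    # stratify=y keeps the diabetic/non-diabetic ratio the same in both sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    scaler = StandardScaler()
    # Learn mean and standard deviation from the training set only
    X_train_scaled = scaler.fit_transform(X_train)
    # Apply the same training statistics to the test set
    X_test_scaled = scaler.transform(X_test)
    return X_train_scaled, X_test_scaled, y_train, y_test, scaler


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(loc=100.0, scale=20.0, size=(100, 8))  # stand-in for 8 diagnostic features
    y = np.array([0] * 65 + [1] * 35)
    X_tr, X_te, y_tr, y_te, scaler = split_and_scale(X, y)
    print(X_tr.shape, X_te.shape)
```

Keeping the fitted `scaler` lets the prediction function apply exactly the same transformation to new patient data.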
Implement at least two ML models:
- Logistic Regression
- Support Vector Machine (SVM)
- Random Forest (optional, for robustness)
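The three candidate models can be defined side by side. In this minimal sketch, scaling is bundled into a scikit-learn `Pipeline` for the scale-sensitive models so it is always applied consistently. The `build_models` helper and the hyperparameter values (`max_iter=1000`, `n_estimators=200`) are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(random_state=42):
    """Return the candidate classifiers keyed by name (hypothetical helper)."""
    return {
        "logistic_regression": Pipeline([
            ("scaler", StandardScaler()),
            # A higher max_iter avoids convergence warnings on unscaled-looking data
            ("clf", LogisticRegression(max_iter=1000)),
        ]),
        "svm": Pipeline([
            ("scaler", StandardScaler()),
            ("clf", SVC(kernel="rbf", random_state=random_state)),
        ]),
        # Tree ensembles do not depend on feature scale, so no scaler is needed
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
    }


if __name__ == "__main__":
    # Synthetic stand-in data with 8 features
    X, y = make_classification(n_samples=120, n_features=8, random_state=0)
    for name, model in build_models().items():
        model.fit(X, y)
        print(name, model.score(X, y))
```

Wrapping the scaler inside the pipeline means `fit` and `predict` always apply the same preprocessing, which also keeps cross-validation free of leakage.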
Apply hyperparameter tuning using GridSearchCV.
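A sketch of GridSearchCV applied to the SVM. The grid values, 5-fold CV, F1 scoring (which focuses on the diabetic class), and the `tune_svm` name are assumptions chosen for illustration. The `clf__` prefix routes each parameter to the pipeline step named `clf`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_svm(X_train, y_train, cv=5):
    """Exhaustively search an illustrative SVM grid with cross-validation."""
    pipe = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])
    param_grid = {
        "clf__C": [0.1, 1, 10],
        "clf__kernel": ["linear", "rbf"],
        "clf__gamma": ["scale"],
    }
    # The scaler is refit inside every CV fold, so validation folds never leak into it
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1")
    search.fit(X_train, y_train)  # refits the best combination on all training data
    return search


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=8, random_state=0)
    search = tune_svm(X, y)
    print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is the tuned pipeline and can be evaluated on the held-out test set.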
Compare model performance using accuracy, precision, recall, and F1-score.
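The comparison can be sketched with `cross_validate`, which scores each model on several metrics at once and stores them under `test_<metric>` keys. The model settings and the `compare_models` helper are illustrative assumptions. Ranking by F1 is one reasonable choice when the classes are imbalanced.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, cv=5):
    """Cross-validate several classifiers and tabulate mean scores per metric."""
    models = {
        "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "RandomForest": RandomForestClassifier(random_state=0),
    }
    scoring = ["accuracy", "precision", "recall", "f1"]
    rows = []
    for model in models.values():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        # Average each metric over the CV folds
        rows.append([scores[f"test_{m}"].mean() for m in scoring])
    table = pd.DataFrame(rows, index=list(models), columns=scoring)
    # Best F1 first
    return table.sort_values("f1", ascending=False)


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=8, random_state=0)
    print(compare_models(X, y).round(3))
```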
Build a function that:
- Accepts new patient data as input.
- Returns an instant prediction → Diabetic or Non-Diabetic.
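A minimal sketch of such a prediction function. The feature names follow the common Pima Indians Diabetes layout and are an assumption, as is the `predict_patient` name. The order in `FEATURES` must match the columns the model was trained on. The demo model is trained on synthetic data only as a stand-in for the tuned model from the notebooks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed feature order (Pima-style layout); must match the training columns
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]


def predict_patient(model, patient: dict) -> str:
    """Map one patient's measurements to 'Diabetic' or 'Non-Diabetic' (hypothetical helper)."""
    missing = [f for f in FEATURES if f not in patient]
    if missing:
        raise ValueError(f"missing features: {missing}")
    # Single-row 2-D array in the training column order
    row = np.array([[patient[f] for f in FEATURES]], dtype=float)
    label = model.predict(row)[0]
    return "Diabetic" if label == 1 else "Non-Diabetic"


if __name__ == "__main__":
    # Stand-in model on synthetic data; swap in the real trained pipeline
    X, y = make_classification(n_samples=200, n_features=len(FEATURES), random_state=0)
    model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
    patient = dict(zip(FEATURES, [2, 138, 62, 35, 0, 33.6, 0.127, 47]))
    print(predict_patient(model, patient))
```

Passing a fitted pipeline (scaler plus classifier) means the function does not need to scale inputs itself. If the model was trained on a DataFrame, building a one-row DataFrame with the same column names avoids scikit-learn's feature-name warning.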
diabetes-prediction-ml/
│-- data/ # Dataset (if shareable)
│-- notebooks/ # Jupyter notebooks for each phase
│-- src/ # Source code (preprocessing, models, utils)
│-- results/ # Saved graphs, charts, and reports
│-- LICENSE # MIT License
│-- README.md # Project documentation
- Python 🐍
- Pandas & NumPy → Data manipulation
- Matplotlib & Seaborn → Data visualization
- Scikit-learn → ML models & evaluation
- Accuracy
- Precision
- Recall
- F1-Score
- Confusion Matrix
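The metrics above can be computed with `sklearn.metrics`, treating 1 (diabetic) as the positive class. The `evaluate` helper is a hypothetical name, and the labels in the usage note are made up to show how the numbers relate.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def evaluate(y_true, y_pred) -> dict:
    """Compute accuracy, precision, recall, F1, and the confusion matrix."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Of the patients predicted diabetic, how many really are
        "precision": precision_score(y_true, y_pred, pos_label=1),
        # Of the truly diabetic patients, how many were caught
        "recall": recall_score(y_true, y_pred, pos_label=1),
        "f1": f1_score(y_true, y_pred, pos_label=1),
        # Rows are true labels [0, 1], columns are predicted labels [0, 1]
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
    for name, value in evaluate(y_true, y_pred).items():
        print(name, value)
```

For these labels there are 3 true positives, 2 true negatives, 2 false positives, and 1 false negative, so accuracy is 5/8 = 0.625, precision 3/5 = 0.6, recall 3/4 = 0.75, F1 = 2/3, and the confusion matrix is `[[2, 2], [1, 3]]`. In a medical screening setting, recall on the diabetic class is often weighed more heavily than raw accuracy.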
Clone this repo:
git clone https://github.com/abanoub-refaat/gtc-diabetes-prediction-model.git
cd gtc-diabetes-prediction-model
Open the notebooks to explore each phase:
jupyter notebook notebooks/
Run the prediction function (see the example in the notebooks or src/predict.py).
- Understand and apply the end-to-end ML pipeline.
- Gain insights from EDA and data preprocessing.
- Compare models and improve performance with tuning.
- Build a working prediction tool for diabetes detection.
- Scikit-learn Documentation
- EDA for Classification
- Why Standardize Data for ML?
- Hyperparameter Tuning with GridSearchCV
Developed by Abanoub Refaat as part of the GTC Machine Learning Internship project.