This repository contains an end-to-end machine learning project focused on predicting whether a patient is diabetic or non-diabetic based on diagnostic health measurements.

abanoub-refaat/gtc-diabetes-predection-model

GTC Internship - Diabetes Prediction using Machine Learning

Repository Description

This repository contains an end-to-end machine learning project that predicts whether a patient is diabetic or non-diabetic based on diagnostic health measurements. The project covers the full ML workflow: exploratory data analysis (EDA), preprocessing, model building, evaluation, and deployment of a simple prediction function.

Project Workflow

Phase 1: Data Exploration

  • Explore dataset distributions and class balance.
  • Visualize relationships (e.g., glucose vs outcome, BMI vs outcome).
  • Use charts and summary statistics to identify which measurements differ most between diabetic and non-diabetic patients.
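
The exploration steps above can be sketched with pandas. This is a minimal illustration, not the notebooks' actual code: the rows are made up, and the column names `Glucose`, `BMI`, and `Outcome` are assumptions about the dataset's schema.

```python
import pandas as pd

# Tiny made-up sample; the column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6],
    "Outcome": [1, 0, 1, 0, 1, 0],  # 1 = diabetic, 0 = non-diabetic
})

# Class balance: share of each outcome label.
class_balance = df["Outcome"].value_counts(normalize=True).sort_index()

# Mean of each measurement per outcome group (glucose vs outcome, BMI vs outcome).
group_means = df.groupby("Outcome")[["Glucose", "BMI"]].mean()

print(class_balance)
print(group_means)
```

A large gap between the two group means for a measurement suggests that it is worth plotting against the outcome.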

Phase 2: Data Preprocessing

  • Standardize features so that scale-sensitive models such as Logistic Regression and SVM are compared fairly.
  • Split the dataset into training and testing sets.
  • Evaluate on the held-out test set to check that the model generalizes to unseen data.
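
One way to combine these preprocessing steps with scikit-learn is sketched below. It uses synthetic data from `make_classification` as a stand-in for the real dataset. The 80/20 split, the stratification, and fitting the scaler on the training split only (so that test-set statistics do not leak into training) are assumptions of this sketch, not details confirmed by the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes dataset (8 numeric features, binary label).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hold out 20% of the rows; stratify keeps the class ratio similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then reuse it for the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```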

Phase 3: Model Building & Training

  • Implement at least two ML models:

    • Logistic Regression
    • Support Vector Machine (SVM)
    • Random Forest (optional, for robustness)
  • Apply hyperparameter tuning using GridSearchCV.

  • Compare model performance based on accuracy, precision, recall, and F1-score.
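
The tuning and comparison step might look like the following sketch. The parameter grids, the 5-fold cross-validation, the F1 scoring choice, and the synthetic data are all illustrative assumptions. The project's actual grids are not stated in this README.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the diabetes dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search spaces; "clf__" targets the pipeline step named "clf".
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "svm": (SVC(), {"clf__C": [0.1, 1.0, 10.0], "clf__kernel": ["linear", "rbf"]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline, so it is refit on each CV training fold.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_f1": search.best_score_,
        # With scoring="f1", score() reports F1 on the held-out test set.
        "test_f1": search.score(X_test, y_test),
    }

for name, summary in results.items():
    print(name, summary)
```

Putting the scaler inside the pipeline keeps each cross-validation fold free of information from its own validation rows.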

Phase 4: Prediction Engine

  • Build a function that:

    • Accepts new patient data as input.
    • Returns an instant prediction → Diabetic or Non-Diabetic.
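
A prediction function of this kind could be sketched as follows. The name `predict_patient`, the three-feature list, and the made-up training rows are hypothetical and exist only to make the example runnable. The project's own function (see the notebooks or `src/predict.py`) may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature order; the real model may use more or different inputs.
FEATURES = ["Glucose", "BMI", "Age"]


def predict_patient(model, patient: dict) -> str:
    """Return "Diabetic" or "Non-Diabetic" for one patient record."""
    missing = [name for name in FEATURES if name not in patient]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    # Build a single-row 2D array in the order the model was trained on.
    row = np.array([[float(patient[name]) for name in FEATURES]])
    return "Diabetic" if model.predict(row)[0] == 1 else "Non-Diabetic"


# Made-up training rows, only so the sketch has a fitted model to call.
X = np.array([
    [85, 22.0, 25], [90, 24.5, 30], [95, 26.0, 28], [100, 23.0, 35],
    [160, 34.0, 50], [170, 36.5, 55], [180, 38.0, 60], [150, 33.0, 45],
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

label = predict_patient(model, {"Glucose": 175, "BMI": 37.0, "Age": 52})
print(label)
```

Bundling the scaler and classifier in one pipeline means new patient data is standardized the same way as the training data before prediction.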

Repository Structure

diabetes-prediction-ml/
│-- data/                # Dataset (if shareable)
│-- notebooks/           # Jupyter notebooks for each phase
│-- src/                 # Source code (preprocessing, models, utils)
│-- results/             # Saved graphs, charts, and reports
│-- LICENSE              # MIT License
│-- README.md            # Project documentation

⚙️ Technologies Used

  • Python 🐍
  • Pandas & NumPy → Data manipulation
  • Matplotlib & Seaborn → Data visualization
  • Scikit-learn → ML models & evaluation

Evaluation Metrics

  • Accuracy
  • Precision
  • Recall
  • F1-Score
  • Confusion Matrix
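
As a small worked example, the metrics above can be computed with `sklearn.metrics`. The label vectors are made up for illustration and are not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

print(accuracy, precision, recall, f1)
print(cm)
```

Here there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four scores come out to 0.75. On imbalanced data the scores usually diverge, which is why accuracy alone can mislead.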

How to Run

  1. Clone this repo:

    git clone https://github.com/abanoub-refaat/gtc-diabetes-prediction-model.git
    cd gtc-diabetes-prediction-model
  2. Open the notebooks to explore each phase:

    jupyter notebook notebooks/
  3. Run the prediction function (example in notebooks or src/predict.py).

Project Goals

  • Understand and apply the end-to-end ML pipeline.
  • Gain insights from EDA and data preprocessing.
  • Compare models and improve performance with tuning.
  • Build a working prediction tool for diabetes detection.

References & Learning Resources

Author

Developed by Abanoub Refaat as part of the GTC Machine Learning Internship project.
