A KNIME-based project for predicting healthcare insurance costs using regression and clustering models
KNIME (Konstanz Information Miner) is an open-source data analytics platform that allows users to build workflows for data preprocessing, visualization, and machine learning without heavy coding.
It provides a drag-and-drop interface to connect different nodes for tasks like:
- Data cleaning
- Exploratory analysis
- Machine learning modeling
- Predictive analytics
In this project, I have used KNIME to perform predictive analysis of medical insurance charges.
The workflow includes:
- Importing and preprocessing healthcare data
- Exploring features such as age, BMI, smoking status, and region
- Applying machine learning models (regression) to predict insurance charges
- Visualizing important relationships between features and charges
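Outside KNIME, the preprocessing step can be illustrated with a minimal Python sketch. It is not the workflow itself, only an approximation of what encoding nodes typically do. The column names (`age`, `bmi`, `smoker`, `region`), the `yes`/`no` smoker values, and the four region labels are assumptions; adjust them to match the actual dataset.

```python
# Assumed region labels; the first one serves as the baseline category.
REGIONS = ["northeast", "northwest", "southeast", "southwest"]


def encode_row(row):
    """Convert one raw record (a dict of strings) into a numeric feature list.

    Output layout: [age, bmi, smoker_flag, one flag per non-baseline region].
    """
    age = float(row["age"])
    bmi = float(row["bmi"])
    # Binary-encode smoking status (assumes the values "yes" / "no").
    smoker = 1.0 if row["smoker"].strip().lower() == "yes" else 0.0
    region = row["region"].strip().lower()
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    # One-hot encode region, dropping the baseline to avoid redundant columns.
    region_flags = [1.0 if region == name else 0.0 for name in REGIONS[1:]]
    return [age, bmi, smoker] + region_flags


# Hypothetical record, used only to show the output shape.
example = {"age": "19", "bmi": "27.9", "smoker": "yes", "region": "southwest"}
features = encode_row(example)
```

Dropping one region category keeps the encoded columns linearly independent, which matters for a linear regression with an intercept.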
Objective: To understand the factors influencing medical insurance costs and build a reliable predictive model.
- Predict insurance charges using regression models
- Quantify the impact of smoking on medical costs
- Identify clusters of policyholders with similar risk profiles
- Investigate the combined effect of high BMI and smoking
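Two of these questions, the impact of smoking and the combined effect of high BMI and smoking, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the KNIME workflow: the records below are made-up toy values, and the BMI cutoff of 30 is an assumed threshold for "high BMI".

```python
from statistics import mean

# Made-up toy records for illustration only; not taken from the project's data.
records = [
    {"bmi": 24.0, "smoker": 0, "charges": 4000.0},
    {"bmi": 33.0, "smoker": 0, "charges": 6000.0},
    {"bmi": 25.0, "smoker": 1, "charges": 18000.0},
    {"bmi": 35.0, "smoker": 1, "charges": 42000.0},
]

BMI_THRESHOLD = 30.0  # assumed cutoff for "high BMI"


def mean_charges(rows, predicate):
    """Average charges over the rows that satisfy the predicate."""
    return mean(r["charges"] for r in rows if predicate(r))


# Raw difference in average charges between smokers and non-smokers.
smoker_gap = (
    mean_charges(records, lambda r: r["smoker"] == 1)
    - mean_charges(records, lambda r: r["smoker"] == 0)
)


def add_interaction(row):
    """Return a copy of the row with BMI x smoking interaction features."""
    out = dict(row)
    # Continuous interaction term: BMI counts only for smokers.
    out["bmi_x_smoker"] = row["bmi"] * row["smoker"]
    # Binary flag for the combined high-BMI smoker group.
    out["high_bmi_smoker"] = int(
        row["bmi"] >= BMI_THRESHOLD and row["smoker"] == 1
    )
    return out


augmented = [add_interaction(r) for r in records]
```

A group-mean gap is only a descriptive measure. Including `bmi_x_smoker` as a regression input lets the model estimate how much the effect of BMI differs between smokers and non-smokers.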
Key findings:
- Smoking has the strongest predictive influence on medical costs
- Age and BMI significantly impact charges
- Interaction effects (BMI × Smoking) amplify risk
- The regression model achieved an R² of 0.751
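For reference, the coefficient of determination reported above is conventionally defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below shows that calculation in plain Python. The input values are made up for illustration and are unrelated to the project's results.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Squared error left unexplained by the model.
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Total squared deviation of the targets around their mean.
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up values, chosen only to exercise the function.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 3.0]
score = r_squared(actual, predicted)
```

Read this way, a value of 0.751 means the model accounts for roughly three quarters of the variance in charges on the data it was scored against.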