This project addresses the problem of predicting whether a company will go bankrupt from its financial ratios, using several classification techniques.
The repository includes:
- 📂 `project2company/`: All core Python scripts, data preprocessing, model training and evaluation.
- 📄 ΕφΠλη_Εργασία 2.pdf: Final report (Assignment 2).
- 📄 ΕφΠλη_Εργασία 3.pdf: Follow-up report.
- 📄 Καλές πρακτικές checklist.pdf: Report writing guidelines.

Tools and libraries:
- Python 3.x
- Pandas, NumPy, Matplotlib, Seaborn
- Scikit-learn (MinMaxScaler, StratifiedKFold, classifiers)
- Google Colab (execution environment)
- Excel (for pivot table visualization)

The pipeline runs the following steps:
- Data Validation – Check for missing values (NaN)
- Normalization – Apply MinMax scaling to numeric features
- Stratified K-Fold Split (k=4)
- Downsampling – Balance classes (3:1 ratio of healthy to bankrupt)
- Model Training & Evaluation on 8 classifiers:
  - Logistic Regression, LDA, KNN, Decision Tree, Random Forest, SVM, Naive Bayes, Gradient Boosting
- Performance Metrics – Accuracy, Precision, Recall, F1, AUC, Recall_Healthy
- Confusion Matrix – Visualized for both the train and test sets
- Export to `.csv` for further Excel-based analysis
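
The steps above can be sketched in a few lines of scikit-learn. This is a minimal illustration under stated assumptions, not the repository's actual code: it uses a synthetic dataset from `make_classification` in place of the real financial ratios, a single classifier (Logistic Regression) instead of all eight, and it assumes downsampling is applied to the training folds only. The helper names `downsample` and `evaluate` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler

RANDOM_STATE = 0  # assumption: fixed seed so the sketch is reproducible


def downsample(X, y, ratio=3, seed=RANDOM_STATE):
    """Keep every bankrupt row (y == 1) and at most `ratio` times as many healthy rows (y == 0)."""
    rng = np.random.default_rng(seed)
    bankrupt = np.flatnonzero(y == 1)
    healthy = np.flatnonzero(y == 0)
    n_keep = min(len(healthy), ratio * len(bankrupt))
    kept = rng.choice(healthy, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([bankrupt, kept]))
    return X[idx], y[idx]


def evaluate(model, X, y, n_splits=4):
    """Stratified k-fold evaluation; returns one row of test metrics per fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=RANDOM_STATE)
    rows = []
    for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
        # Assumption: only the training fold is balanced; the test fold is left as-is.
        X_train, y_train = downsample(X[train_idx], y[train_idx])
        X_test, y_test = X[test_idx], y[test_idx]
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        proba = model.predict_proba(X_test)[:, 1]
        rows.append({
            "fold": fold,
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred),
            "f1": f1_score(y_test, pred),
            "auc": roc_auc_score(y_test, proba),
            "recall_healthy": recall_score(y_test, pred, pos_label=0),
            "confusion_matrix": confusion_matrix(y_test, pred).tolist(),
        })
    return pd.DataFrame(rows)


# Synthetic stand-in for the financial-ratio data (about 5% bankrupt companies).
X_raw, y = make_classification(n_samples=2000, n_features=8,
                               weights=[0.95, 0.05], random_state=RANDOM_STATE)
features = pd.DataFrame(X_raw, columns=[f"ratio_{i}" for i in range(8)])

# Data validation: stop early if any value is missing.
if features.isna().any().any():
    raise ValueError("dataset contains missing values")

# Normalization: scale every feature to [0, 1], following the order of the steps listed above.
X = MinMaxScaler().fit_transform(features)

results = evaluate(LogisticRegression(max_iter=1000), X, y)
print(results.drop(columns="confusion_matrix").round(3))

# Export: with no path, to_csv returns the CSV text; pass a file name to write it to disk.
csv_text = results.to_csv(index=False)
```

Any other scikit-learn classifier that exposes `predict_proba` can be passed to `evaluate` in place of `LogisticRegression`.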

Results:
- Output metrics are saved to `balancedDataOutcomes.csv`.
- Excel pivot tables were used to compare average performance across classifiers.
- Visual comparisons: stacked bar charts, F1 vs Recall, grouped charts.
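
The pivot-table comparison can also be reproduced in pandas. The following is a minimal sketch, assuming the exported file holds one row per classifier and fold. The column names (`classifier`, `fold`, `f1`, `recall`) and the numbers are illustrative placeholders, not the actual schema or results of `balancedDataOutcomes.csv`.

```python
import pandas as pd

# Placeholder rows standing in for the exported metrics; replace with
# pd.read_csv("balancedDataOutcomes.csv") once the real column names are known.
outcomes = pd.DataFrame({
    "classifier": ["Logistic Regression", "Logistic Regression",
                   "Random Forest", "Random Forest"],
    "fold": [1, 2, 1, 2],
    "f1": [0.40, 0.50, 0.60, 0.70],
    "recall": [0.50, 0.70, 0.60, 0.80],
})

# Average each metric over the folds, one row per classifier.
summary = outcomes.pivot_table(index="classifier",
                               values=["f1", "recall"],
                               aggfunc="mean")
print(summary)
```

Calling `summary.plot.bar()` on the result would give a grouped bar chart similar to the Excel visual comparisons.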

To run in Google Colab:
- Open any `.py` script from this repository in Google Colab.
- Run the notebook cells sequentially.
- Outputs will appear inline (metrics, confusion matrices, graphs).

Alternatively, download the repo and run locally with:

    pip install -r requirements.txt
    python check_and_normalize.py
    python model_loop_all.py