AutoML

A powerful web application that combines automated machine learning capabilities with interactive data analysis and natural language processing. Built with Streamlit and PyCaret, this application makes machine learning accessible to everyone.

Features

1. Data Management

Upload your own CSV datasets with automatic encoding detection
Support for 30+ different file encodings including UTF-8, Latin1, and various Asian encodings
Choose from pre-loaded sample datasets
Automatic data persistence between sessions

2. Exploratory Data Analysis

Interactive data profiling with ydata-profiling
Full-page report display with comprehensive statistics
Visual data exploration
Automatic data type detection and analysis

3. Automated Machine Learning

Classification
- Multiple model comparison
- Model performance metrics
- Model download capability
Regression
- Multiple model comparison
- Model performance metrics
- Model download capability
Clustering
- K-means clustering
- Cluster assignment visualization
Anomaly Detection
- Isolation Forest implementation
- Anomaly scoring
Time Series Forecasting
- ARIMA modeling
- Time series visualization
- Forecast predictions

4. Advanced Preprocessing

TF-IDF text preprocessing with customizable features
PCA dimensionality reduction
Automatic feature engineering
Support for mixed data types
Text data transformation

5. Natural Language Interface

Powered by Groq's Llama 3.3 70B model
Interactive data exploration through natural language
Intelligent data analysis responses
Context-aware question answering

Installation

Clone the repository:

git clone https://github.com/yourusername/AutoML.git
cd AutoML

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Create a .streamlit/secrets.toml file with your Groq API key:

[groq]
api_key = "your-groq-api-key"

Usage

Start the Streamlit app:

streamlit run app.py

Navigate through the different sections using the sidebar:
- Upload Dataset: Import your own CSV files with automatic encoding detection
- Use Sample Dataset: Choose from pre-loaded datasets
- Exploratory Data Analysis: View detailed data profiles
- Machine Learning: Run automated ML tasks
- Ask Questions: Interact with your data using natural language

Supported Sample Datasets

Classification

Iris: Classic classification dataset
Diabetes: Medical classification dataset
Bank: Banking customer classification

Regression

Boston: Housing price prediction
Insurance: Insurance cost prediction

Clustering

Wine: Wine quality clustering

Anomaly Detection

Credit Card: Credit card fraud detection

Time Series

Airline: Airline passenger forecasting

Model Features

Classification & Regression

Multiple model comparison
Performance metrics visualization
Model download capability
Feature importance analysis
Optional preprocessing with TF-IDF or PCA

Clustering

K-means implementation
Cluster visualization
Cluster assignment export

Anomaly Detection

Isolation Forest algorithm
Anomaly score calculation
Threshold-based detection

Time Series

ARIMA modeling
Forecast visualization
Time series decomposition

Requirements

The project uses the following main packages:

streamlit
pycaret
ydata-profiling
langchain
groq
pandas
numpy
scikit-learn
plotly
matplotlib
seaborn

For a complete list of dependencies, see requirements.txt.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. When contributing:

Fork the repository
Create a new branch for your feature
Submit a pull request with a clear description of your changes

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.devcontainer		.devcontainer
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
index.html		index.html
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AutoML

Features

1. Data Management

2. Exploratory Data Analysis

3. Automated Machine Learning

4. Advanced Preprocessing

5. Natural Language Interface

Installation

Usage

Supported Sample Datasets

Classification

Regression

Clustering

Anomaly Detection

Time Series

Model Features

Classification & Regression

Clustering

Anomaly Detection

Time Series

Requirements

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AutoML

Features

1. Data Management

2. Exploratory Data Analysis

3. Automated Machine Learning

4. Advanced Preprocessing

5. Natural Language Interface

Installation

Usage

Supported Sample Datasets

Classification

Regression

Clustering

Anomaly Detection

Time Series

Model Features

Classification & Regression

Clustering

Anomaly Detection

Time Series

Requirements

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages