A web application that combines automated machine learning with interactive data analysis and natural-language question answering. Built with Streamlit and PyCaret, it aims to make common machine learning tasks accessible to users who are not ML specialists.
- Upload your own CSV datasets with automatic encoding detection
- Support for 30+ different file encodings including UTF-8, Latin1, and various Asian encodings
- Choose from pre-loaded sample datasets
- Automatic data persistence between sessions
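One common way to implement automatic encoding detection is to try a list of candidate encodings in order and keep the first one that decodes the file without error. The sketch below illustrates that idea only; the function name `detect_encoding` and the short candidate list are assumptions for illustration, not the app's actual code or its full list of 30+ encodings.

```python
from typing import Iterable, Optional

# Hypothetical candidate list, ordered from strictest to most permissive.
# latin1 accepts every byte sequence, so it only makes sense as a last resort.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin1")


def detect_encoding(
    raw: bytes, candidates: Iterable[str] = CANDIDATE_ENCODINGS
) -> Optional[str]:
    """Return the first candidate encoding that decodes `raw` without error."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # This encoding cannot represent the bytes; try the next one.
        return encoding
    return None  # No candidate worked.


if __name__ == "__main__":
    sample = "name,city\nJosé,Málaga\n".encode("cp1252")
    print(detect_encoding(sample))
```

A successful decode does not prove the encoding is the right one, only that it is a plausible one, which is why the order of the candidates matters.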
- Interactive data profiling with ydata-profiling
- Full-page report display with comprehensive statistics
- Visual data exploration
- Automatic data type detection and analysis
- Classification
  - Multiple model comparison
  - Model performance metrics
  - Model download capability
- Regression
  - Multiple model comparison
  - Model performance metrics
  - Model download capability
- Clustering
  - K-means clustering
  - Cluster assignment visualization
- Anomaly Detection
  - Isolation Forest implementation
  - Anomaly scoring
- Time Series Forecasting
  - ARIMA modeling
  - Time series visualization
  - Forecast predictions
- TF-IDF text preprocessing with customizable features
- PCA dimensionality reduction
- Automatic feature engineering
- Support for mixed data types
- Text data transformation
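To make the TF-IDF step concrete, here is a minimal standard-library sketch of the textbook weighting, tf(t, d) * log(N / df(t)). It is an illustration under stated assumptions: the function `tfidf` and its `max_features` parameter are hypothetical stand-ins for the "customizable features" setting, and library implementations typically use smoothed or normalized variants of this formula.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[str], max_features: int = 100) -> List[Dict[str, float]]:
    """Compute textbook TF-IDF weights for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Keep only the terms that appear in the most documents (hypothetical cap).
    vocab = {term for term, _ in df.most_common(max_features)}
    n_docs = len(docs)
    weights = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in vocab)
        total = sum(counts.values()) or 1  # Avoid division by zero for empty docs.
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    for row in tfidf(["red wine", "white wine"]):
        print(row)
```

A term that occurs in every document, such as "wine" in the usage example, receives a weight of zero because it does not help distinguish one document from another.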
- Powered by Groq's Llama 3.3 70B model
- Interactive data exploration through natural language
- Intelligent data analysis responses
- Context-aware question answering
- Clone the repository:

      git clone https://github.com/yourusername/AutoML.git
      cd AutoML

- Create a virtual environment:

      python -m venv venv
      source venv/bin/activate # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Create a .streamlit/secrets.toml file with your Groq API key:

      [groq]
      api_key = "your-groq-api-key"

- Start the Streamlit app:

      streamlit run app.py

- Navigate through the different sections using the sidebar:
- Upload Dataset: Import your own CSV files with automatic encoding detection
- Use Sample Dataset: Choose from pre-loaded datasets
- Exploratory Data Analysis: View detailed data profiles
- Machine Learning: Run automated ML tasks
- Ask Questions: Interact with your data using natural language
- Iris: Classic classification dataset
- Diabetes: Medical classification dataset
- Bank: Banking customer classification
- Boston: Housing price prediction
- Insurance: Insurance cost prediction
- Wine: Wine quality clustering
- Credit Card: Credit card fraud detection
- Airline: Airline passenger forecasting
- Multiple model comparison
- Performance metrics visualization
- Model download capability
- Feature importance analysis
- Optional preprocessing with TF-IDF or PCA
- K-means implementation
- Cluster visualization
- Cluster assignment export
- Isolation Forest algorithm
- Anomaly score calculation
- Threshold-based detection
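Threshold-based detection can be sketched as follows: given one anomaly score per row, flag the highest-scoring fraction of rows. This is a simplified illustration, not the app's implementation. The function `flag_anomalies`, the `contamination` parameter name, and the convention that a higher score means "more anomalous" are all assumptions here.

```python
from typing import List, Sequence


def flag_anomalies(scores: Sequence[float], contamination: float = 0.05) -> List[bool]:
    """Flag roughly the top `contamination` fraction of scores as anomalies."""
    if not 0 < contamination < 1:
        raise ValueError("contamination must be between 0 and 1 (exclusive)")
    if not scores:
        return []
    # Number of points to flag, with at least one so the threshold is defined.
    n_flagged = max(1, round(len(scores) * contamination))
    # The threshold is the n-th largest score; ties at the threshold are all flagged.
    threshold = sorted(scores, reverse=True)[n_flagged - 1]
    return [score >= threshold for score in scores]


if __name__ == "__main__":
    print(flag_anomalies([0.1, 0.2, 0.9, 0.15], contamination=0.25))
```

Deriving the threshold from an expected anomaly fraction avoids hard-coding a score cutoff, whose scale depends on the model that produced the scores.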
- ARIMA modeling
- Forecast visualization
- Time series decomposition
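As a rough illustration of what time series decomposition involves, classical decomposition starts by estimating the trend with a centered moving average; the seasonal and residual parts are then derived from what remains. The sketch below covers only that first step, and the function name `moving_average_trend` is hypothetical rather than taken from the app.

```python
from typing import List, Optional, Sequence


def moving_average_trend(series: Sequence[float], window: int) -> List[Optional[float]]:
    """Centered moving average; None where the full window does not fit."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    trend: List[Optional[float]] = [None] * len(series)
    for i in range(half, len(series) - half):
        # Average the `window` values centered on position i.
        trend[i] = sum(series[i - half : i + half + 1]) / window
    return trend


if __name__ == "__main__":
    print(moving_average_trend([1, 2, 3, 4, 5], window=3))
```

Subtracting the trend from the original series leaves the seasonal and irregular components, which a full decomposition would then separate.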
The project uses the following main packages:
- streamlit
- pycaret
- ydata-profiling
- langchain
- groq
- pandas
- numpy
- scikit-learn
- plotly
- matplotlib
- seaborn
For a complete list of dependencies, see requirements.txt.
Contributions are welcome! Please feel free to submit a Pull Request. When contributing:
- Fork the repository
- Create a new branch for your feature
- Submit a pull request with a clear description of your changes
This project is licensed under the MIT License - see the LICENSE file for details.