Real-Time ETL Pipelines, Automated MLOps Retraining, and Blockchain-Secured Logs
This project implements a network security system that combines machine-learning threat detection, automated MLOps pipelines, and blockchain-backed logging. It provides real-time threat detection together with immutable audit trails for forensic analysis.
- Automated ETL Pipeline: Extract, transform, and load network security data with blockchain logging
- AI-Powered Threat Detection: Deep learning models (MLP-GRU) for anomaly detection
- Blockchain-Secured Audit System: Tamper-proof logs for forensic analysis
- MLOps Automation: Continuous model retraining and deployment
- Real-Time Monitoring Dashboard: Streamlit-based interface for live threat analytics
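The tamper-proof logging idea behind the ETL and audit features can be illustrated with a hash chain, where each log entry commits to the hash of the entry before it. The following is a minimal sketch under that assumption and not the project's actual blockchain module; the `AuditLog` class, its methods, and the entry fields are hypothetical names chosen for illustration.

```python
import hashlib
import json
import time


def _hash_entry(index, timestamp, event, prev_hash):
    """Return the SHA-256 hex digest of an entry's canonical JSON form."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp, "event": event, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical append-only log; each entry stores the previous entry's hash."""

    GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry

    def __init__(self):
        self.entries = []

    def append(self, event, timestamp=None):
        """Add a JSON-serializable event and return the stored entry."""
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS_HASH
        index = len(self.entries)
        timestamp = time.time() if timestamp is None else timestamp
        entry = {
            "index": index,
            "timestamp": timestamp,
            "event": event,
            "prev_hash": prev_hash,
            "hash": _hash_entry(index, timestamp, event, prev_hash),
        }
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash; return False if any entry or link was altered."""
        prev_hash = self.GENESIS_HASH
        for position, entry in enumerate(self.entries):
            expected = _hash_entry(
                entry["index"], entry["timestamp"], entry["event"], prev_hash
            )
            if (
                entry["index"] != position
                or entry["prev_hash"] != prev_hash
                or entry["hash"] != expected
            ):
                return False
            prev_hash = entry["hash"]
        return True
```

An ETL stage could call `log.append({"stage": "extract", "rows": 100})` after each step and run `log.verify()` before forensic analysis. A local chain like this only makes tampering detectable; anchoring the latest hash on an external ledger is what would make it hard to rewrite the whole chain unnoticed.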
5th-Semester-Project-/
├── src/
│ ├── etl/ # ETL Pipeline Module
│ ├── models/ # AI/ML Models
│ ├── blockchain/ # Blockchain Integration
│ ├── mlops/ # MLOps Pipeline
│ ├── api/ # FastAPI Backend
│ └── dashboard/ # Streamlit Dashboard
├── data/
│ ├── raw/ # Raw security datasets
│ ├── processed/ # Processed data
│ └── models/ # Trained models
├── config/ # Configuration files
├── tests/ # Unit and integration tests
├── docs/ # Documentation
├── scripts/ # Utility scripts
├── notebooks/ # Jupyter notebooks for exploration
└── deployment/ # Docker & Kubernetes configs
- ETL & Processing: Python, Pandas, Apache Airflow, Apache Spark
- Data Streaming: Apache Kafka
- Machine Learning: TensorFlow, PyTorch, Scikit-learn
- MLOps: MLflow, DagsHub
- Blockchain: Web3.py, Smart Contracts, IPFS
- Databases: MongoDB, PostgreSQL
- Cloud: AWS (S3, EC2)
- Deployment: Docker, Kubernetes
- Frontend & API: Streamlit (dashboard), FastAPI (backend)
- AI Integration: Google Gemini API
- CIC-IDS2017: Intrusion detection dataset
- UNSW-NB15: Network intrusion dataset
- Phishing Websites Dataset
- Real-time logs: Firewall logs, network packets, system events
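Flow-based intrusion datasets such as CIC-IDS2017 are commonly reported to ship CSV files with stray whitespace in column names and with `Infinity` or `NaN` values in rate features, so the transform step usually needs a cleaning pass. The sketch below shows one way to do that with the standard library only. The function name `clean_rows`, the label column name, and the sample rows are illustrative assumptions, not taken from the project's ETL code or from the real dataset files.

```python
import csv
import io
import math


def clean_rows(csv_text, label_column="Label"):
    """Yield (features, label) pairs, skipping malformed or non-finite rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for raw in reader:
        # DictReader marks short or long rows with None values or a None key.
        if None in raw or None in raw.values():
            continue
        # Strip stray whitespace from column names and values.
        row = {key.strip(): value.strip() for key, value in raw.items()}
        label = row.pop(label_column)
        try:
            features = {name: float(value) for name, value in row.items()}
        except ValueError:
            continue  # non-numeric feature value
        # float() accepts "Infinity" and "NaN", so filter them explicitly.
        if any(math.isnan(v) or math.isinf(v) for v in features.values()):
            continue
        yield features, label


# Made-up sample rows in the style of a flow-feature CSV (assumption).
SAMPLE = (
    " Destination Port, Flow Duration,Flow Bytes/s, Label\n"
    "80,1200,350.5,BENIGN\n"
    "443,0,Infinity,BENIGN\n"
    "22,560,NaN,SSH-Patator\n"
    "21,900,12.0,FTP-Patator\n"
)

cleaned = list(clean_rows(SAMPLE))
```

Dropping non-finite rows is the simplest policy; imputing or clipping those values is an alternative when the affected rows are too numerous to discard.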
- Python 3.10+
- macOS (or Linux/Windows with WSL)
- Git
- Sufficient disk space (at least 10GB)
- Clone the repository:
cd /Users/omsuneri/5th-Semester-Project-
- Create and activate virtual environment:
python3 -m venv venv
source venv/bin/activate # On macOS/Linux
- Install dependencies:
pip install --upgrade pip
pip install -r requirements.txt
- Set up environment variables:
cp config/.env.example config/.env
# Edit config/.env with your API keys and configurations
- Initialize the database:
python scripts/init_database.py
- Download datasets:
python scripts/download_datasets.py
Run the ETL pipeline:
python src/etl/pipeline.py
Train the model:
python src/models/train_model.py
Start the blockchain node:
python src/blockchain/blockchain_node.py
Launch the dashboard:
streamlit run src/dashboard/app.py
Start the API server:
uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
Edit config/config.yaml to customize:
- Database connections
- API endpoints
- Model parameters
- Blockchain settings
- Alert thresholds
Add your API keys to config/.env:
GEMINI_API_KEY=your_gemini_api_key_here
MONGODB_URI=your_mongodb_connection_string
POSTGRES_URI=your_postgres_connection_string
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
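A service that depends on these keys benefits from failing early when one is missing or still holds its placeholder. The sketch below checks the five variables listed above. It assumes the values from config/.env have already been exported into the process environment; how the project actually loads that file is not stated here, and `load_settings` and `REQUIRED_KEYS` are hypothetical names.

```python
import os

# Keys listed in config/.env above.
REQUIRED_KEYS = (
    "GEMINI_API_KEY",
    "MONGODB_URI",
    "POSTGRES_URI",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def load_settings(environ=None):
    """Return the required settings, or raise if any is unset or a placeholder."""
    environ = os.environ if environ is None else environ
    missing = [
        key
        for key in REQUIRED_KEYS
        # Treat empty values and untouched "your_..." template values as missing.
        if not environ.get(key) or environ[key].startswith("your_")
    ]
    if missing:
        raise RuntimeError("Missing or placeholder settings: " + ", ".join(missing))
    return {key: environ[key] for key in REQUIRED_KEYS}
```

Calling `load_settings()` once at startup surfaces configuration problems before the pipeline, API, or dashboard tries to open a connection.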
Access the dashboard at: http://localhost:8501
Features:
- Real-time threat detection
- Network traffic analysis
- Model performance metrics
- Blockchain audit logs
- Alert management
Run all tests:
pytest tests/ -v --cov=src
Run specific test modules:
pytest tests/test_etl.py -v
pytest tests/test_models.py -v
pytest tests/test_blockchain.py -v
Detailed documentation is available in the docs/ folder:
- Architecture Overview
- ETL Pipeline Guide
- Model Training Guide
- Blockchain Integration
- API Documentation
- Deployment Guide
This is an academic project developed by:
- Om Santosh Suneri (UE238066)
- Shubham Choubey (UE238101)
- Sourav Biswas (UE238103)
- Tanuj Ramchandani (UE238108)
- Sehwag Meena (UE238095)
- Yatin Kumar (UE238112)
This project is developed as part of B.E.(IT) coursework at the University Institute of Engineering & Technology, Panjab University, Chandigarh.
- Dr. Amandeep Verma Ma'am
- Amrit Sandhu Ma'am
- Department of Information Technology, UIET, Panjab University
For issues or questions, please create an issue in the repository or contact the project team.
Project Type: 5th Semester Project (B.E. Information Technology)
Institution: UIET, Panjab University, Chandigarh
Academic Year: 2023-2027