AI-Powered Meeting Transcription & Summarization Platform
SummitAI is an end-to-end meeting intelligence system that automatically transcribes, analyzes, and summarizes meetings. It leverages Whisper for speech recognition, Gemini for abstractive summarization, and a FastAPI backend integrated with MongoDB and Tebi for efficient data storage and retrieval.
- 🎧 Automatic Speech Recognition (ASR) using OpenAI Whisper
- 🧠 Summarization powered by Google Gemini for concise, context-aware meeting overviews
- ⚡ Audio Preprocessing with FFmpeg for robust media handling
- 🧩 FastAPI Backend for RESTful endpoints and async data processing
- 🗃️ MongoDB + Tebi Integration for scalable storage of transcripts, summaries, and metadata
- 📊 Performance Benchmarking utilities for ASR and summarization workloads
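As an illustration of the preprocessing step, here is a minimal sketch of how an FFmpeg invocation could be assembled before transcription. The function names, the output naming rule, and the choice of 16 kHz mono are assumptions for this example, not settings taken from the SummitAI code; only the FFmpeg flags themselves (`-y`, `-i`, `-ac`, `-ar`, `-vn`) are standard.

```python
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Return an ffmpeg argument list that converts `src` to mono WAV.

    Assumption: 16 kHz mono is used because Whisper resamples its input
    to that format; the project's real settings may differ.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", src,                 # input media (.wav, .mp3, ...)
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the target rate
        "-vn",                     # ignore any video stream
        dst,
    ]


def normalized_path(src: str) -> str:
    """Hypothetical naming rule: meeting.mp3 -> meeting.16k.wav."""
    p = Path(src)
    return str(p.with_name(p.stem + ".16k.wav"))


if __name__ == "__main__":
    src = "meeting.mp3"
    cmd = build_ffmpeg_command(src, normalized_path(src))
    # To actually run it: subprocess.run(cmd, check=True)
    print(" ".join(cmd))
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.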
┌──────────────────┐
│   Audio Input    │
│ (.wav/.mp3 file) │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│      FFmpeg      │
│  Audio Preproc   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│     Whisper      │
│    Transcribe    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│      Gemini      │
│  Summarization   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ FastAPI + Mongo  │
│  + Tebi Backend  │
└──────────────────┘
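The flow in the diagram can be sketched as a chain of four stages. This is a rough outline under stated assumptions, not the project's actual code: `MeetingResult`, `run_pipeline`, and the stage signatures are hypothetical, and each stage is passed in as a plain callable so the wiring can be exercised without FFmpeg, Whisper, or Gemini installed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MeetingResult:
    """Hypothetical container for what the pipeline produces."""
    audio_path: str
    transcript: str
    summary: str


def run_pipeline(
    audio_path: str,
    preprocess: Callable[[str], str],        # raw audio path -> cleaned audio path
    transcribe: Callable[[str], str],        # cleaned audio path -> transcript text
    summarize: Callable[[str], str],         # transcript text -> summary text
    store: Callable[[MeetingResult], None],  # persist the result
) -> MeetingResult:
    """Run the stages in the order shown in the diagram."""
    clean_path = preprocess(audio_path)
    transcript = transcribe(clean_path)
    summary = summarize(transcript)
    result = MeetingResult(audio_path, transcript, summary)
    store(result)
    return result


if __name__ == "__main__":
    saved: list[MeetingResult] = []
    # Stand-in stages; real ones would wrap FFmpeg, Whisper, and Gemini.
    result = run_pipeline(
        "meeting.mp3",
        preprocess=lambda p: p.rsplit(".", 1)[0] + ".wav",
        transcribe=lambda p: f"transcript of {p}",
        summarize=lambda t: t.upper(),
        store=saved.append,
    )
    print(result.summary)
```

Keeping the stages injectable makes it straightforward to swap a model or to test the orchestration with stubs.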
| Component | Technology |
|---|---|
| ASR | OpenAI Whisper |
| Summarization | Google Gemini API |
| Audio Processing | FFmpeg |
| Backend | FastAPI |
| Database | MongoDB, Tebi |
| Benchmarking | Custom Python Scripts |
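To make the storage row more concrete, the sketch below shows one possible way to split data between the two stores. It assumes that Tebi holds the audio file as an S3-style object and that MongoDB holds the text and metadata; the README does not specify this split, and the field names and key layout are hypothetical.

```python
from datetime import datetime, timezone


def audio_object_key(meeting_id: str, filename: str) -> str:
    """Hypothetical object key layout for the audio file in object storage."""
    return f"meetings/{meeting_id}/{filename}"


def build_meeting_document(
    meeting_id: str,
    filename: str,
    transcript: str,
    summary: str,
    duration_seconds: float,
) -> dict:
    """Build a MongoDB-style document that points at the stored audio."""
    return {
        "_id": meeting_id,
        "audio_key": audio_object_key(meeting_id, filename),
        "transcript": transcript,
        "summary": summary,
        "duration_seconds": duration_seconds,
        "created_at": datetime.now(timezone.utc),
    }


if __name__ == "__main__":
    doc = build_meeting_document("m1", "standup.wav", "hello team", "greeting", 60.0)
    # With pymongo, this dict could be passed to collection.insert_one(doc).
    print(doc["audio_key"])
```

Storing only a key in the document keeps large binary audio out of the database while leaving it easy to locate.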
Tested on a 1-minute WAV file processed end-to-end.
| Metric | Value / Range |
|---|---|
| Avg CPU Usage | ~49.8% |
| Peak RAM Usage | ~64.1% |
| Disk Read (MB) | Negligible (~0.08 MB total) |
| Disk Write (MB) | ~15 MB (peaks around 5 MB/s) |
| Avg Processing Time | ~25 seconds for 1 min audio |
| Throughput | ~2.4× real-time |
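Interpretation: In this run, SummitAI averaged about 50% CPU and peaked at about 64% RAM, with little disk I/O (roughly 15 MB written and almost nothing read). The low disk activity suggests that processing happens largely in memory, which is a promising sign for multi-session workloads, although these figures come from a single 1-minute file.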
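The throughput figure follows directly from the table: 60 seconds of audio divided by about 25 seconds of processing gives roughly 2.4× real-time. The helpers below are a small sketch of that arithmetic plus a generic timer; `speed_factor` and `timed` are illustrative names and are not taken from the project's benchmarking scripts.

```python
import time
from typing import Any, Callable


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Values from the table: 1 minute of audio, about 25 s of processing.
    print(f"{speed_factor(60, 25):.1f}x real-time")
```

A factor above 1 means the pipeline keeps up with live audio; its inverse (about 0.42 here) is the processing time needed per second of audio.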
git clone https://github.com/prabhsuratsingh/SummitAI.git
cd SummitAI
docker build -t server .
docker run -p 8000:8000 server
- 🔊 Real-time streaming ASR pipeline
- 🗣️ Speaker diarization and emotion tagging
- 📅 Meeting analytics dashboard (insights, action items)
- ☁️ Multi-cloud deployment (Tebi, GCP, AWS)
Prabhsurat Singh, LinkedIn: @prabhsuratsingh