A lightweight, fast audio transcription service that automatically converts audio files to text, generates summaries, and saves both as Markdown files. Built with Python, OpenAI Whisper, and Google Gemini.
- 🎯 Automatic audio file detection and processing
- 🔄 Automatic file format conversion to MP3 using FFmpeg
- 📝 High-quality transcription using OpenAI's Whisper
- 📚 AI-powered summaries using Google's Gemini
- 📋 Clean markdown output format
- 🔄 Background processing service
- 🚀 FastAPI endpoint for direct uploads
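The MP3 conversion step can be sketched as a thin wrapper around the `ffmpeg` command-line tool. This is a minimal illustration, not the project's code. The helper names `ffmpeg_to_mp3_command` and `convert_to_mp3` are hypothetical, and the encoder settings (`libmp3lame` with VBR quality `-q:a 2`) are an assumed choice that needs an FFmpeg build with LAME support.

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_to_mp3_command(src: Path, dst: Path) -> List[str]:
    """Build an FFmpeg argument list that re-encodes src as MP3 at dst (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",              # overwrite dst if it already exists
        "-i", str(src),    # input in any format FFmpeg can decode
        "-vn",             # drop video streams such as embedded cover art
        "-codec:a", "libmp3lame",
        "-q:a", "2",       # LAME variable-bitrate quality (lower is better); assumed setting
        str(dst),
    ]


def convert_to_mp3(src: Path) -> Path:
    """Return an MP3 version of src, converting next to the original if needed."""
    if src.suffix.lower() == ".mp3":
        return src  # already MP3, nothing to do
    dst = src.with_suffix(".mp3")
    # check=True raises subprocess.CalledProcessError if FFmpeg exits non-zero.
    subprocess.run(ffmpeg_to_mp3_command(src, dst), check=True, capture_output=True)
    return dst
```

For example, `convert_to_mp3(Path("meeting.m4a"))` would write `meeting.mp3` alongside the original. Passing the command as a list rather than a shell string avoids quoting problems with spaces in file names.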
Setup:
- Install the Python dependencies (Whisper is published on PyPI as `openai-whisper`):
`pip install fastapi uvicorn openai-whisper google-generativeai python-dotenv python-multipart`
- Install FFmpeg (required for audio conversion).
- Create a `.env` file with:
GENERATIVEAI_API_KEY=your_gemini_api_key
SCAN_DIR=path/to/input/folder
TRANSCRIPT_DIR=path/to/output/folder

Usage:
- Start the server:
`uvicorn main:app --host 0.0.0.0 --port 8000`
- Drop audio files into your `SCAN_DIR` folder.
- Files are processed automatically, and transcripts appear in `TRANSCRIPT_DIR`.
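One way to implement the folder-watching behavior is a simple polling loop over `SCAN_DIR`. The sketch below is an assumption about how such a loop could look, not the service's actual implementation. It reads `SCAN_DIR` and `TRANSCRIPT_DIR` from the environment (the real service would presumably populate these from `.env` via python-dotenv's `load_dotenv()`), treats a file as done once a matching `<name>.md` transcript exists, and uses a hypothetical extension list and `process` callback.

```python
import os
import time
from pathlib import Path
from typing import Callable, List

# Assumed set of extensions; the project may accept more or fewer formats.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}


def pending_audio_files(scan_dir: Path, transcript_dir: Path) -> List[Path]:
    """List audio files in scan_dir that do not yet have a matching .md transcript."""
    pending = []
    for path in sorted(scan_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        if not (transcript_dir / f"{path.stem}.md").exists():
            pending.append(path)
    return pending


def watch(process: Callable[[Path, Path], None], interval: float = 5.0) -> None:
    """Poll SCAN_DIR forever and call process(audio_path, transcript_dir) for new files."""
    scan_dir = Path(os.environ["SCAN_DIR"])
    transcript_dir = Path(os.environ["TRANSCRIPT_DIR"])
    transcript_dir.mkdir(parents=True, exist_ok=True)
    while True:
        for path in pending_audio_files(scan_dir, transcript_dir):
            process(path, transcript_dir)  # e.g. convert, transcribe, summarize, write .md
        time.sleep(interval)
```

Polling is the simplest portable approach; a filesystem-event library such as `watchdog` could replace the loop if lower latency matters.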
To upload directly, send a POST request to `/transcribe/` with the audio file in the `file` form field:
`curl -X POST -F "file=@your_audio.mp3" http://localhost:8000/transcribe/`

Transcripts are saved as Markdown files containing:
- Timestamp of generation
- AI-generated summary
- Full transcription text
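Assembling those three parts into a Markdown file might look like the following. The heading layout, timestamp format, and the helper names `render_transcript` and `save_transcript` are assumptions for illustration; the project's actual template may differ.

```python
from datetime import datetime
from pathlib import Path


def render_transcript(title: str, summary: str, transcript: str, generated_at: datetime) -> str:
    """Combine timestamp, summary, and full text into one Markdown document."""
    return "\n".join([
        f"# {title}",
        "",
        f"*Generated: {generated_at:%Y-%m-%d %H:%M:%S}*",
        "",
        "## Summary",
        "",
        summary.strip(),
        "",
        "## Transcript",
        "",
        transcript.strip(),
        "",  # ensures the file ends with a newline
    ])


def save_transcript(transcript_dir: Path, stem: str, markdown: str) -> Path:
    """Write the Markdown to <transcript_dir>/<stem>.md and return the written path."""
    out = transcript_dir / f"{stem}.md"
    out.write_text(markdown, encoding="utf-8")
    return out
```

Naming the output after the audio file's stem (`meeting.mp3` → `meeting.md`) also gives a watcher an easy way to tell which files have already been processed.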
Requirements:
- Python 3.8+
- FFmpeg
- Google Gemini API key
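The transcription and summary steps map onto the `openai-whisper` and `google-generativeai` packages roughly as follows. This is a sketch, not the project's code: the model names (`base`, `gemini-1.5-flash`), the prompt wording, and all helper names are assumptions. Both libraries are imported lazily inside the factory functions, so the orchestration function can be exercised with stand-in callables.

```python
import os
from pathlib import Path
from typing import Callable, Tuple

Transcriber = Callable[[Path], str]
Summarizer = Callable[[str], str]


def whisper_transcriber(model_name: str = "base") -> Transcriber:
    """Load a Whisper model once and return a path -> text function."""
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # assumed model size
    return lambda path: model.transcribe(str(path))["text"].strip()


def gemini_summarizer(model_name: str = "gemini-1.5-flash") -> Summarizer:
    """Configure Gemini from GENERATIVEAI_API_KEY and return a text -> summary function."""
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GENERATIVEAI_API_KEY"])
    model = genai.GenerativeModel(model_name)  # assumed model name
    prompt = "Summarize this transcript in a short paragraph:\n\n"  # assumed prompt
    return lambda text: model.generate_content(prompt + text).text.strip()


def transcribe_and_summarize(
    audio: Path, transcribe: Transcriber, summarize: Summarizer
) -> Tuple[str, str]:
    """Run transcription, then summarize the resulting text."""
    text = transcribe(audio)
    return text, summarize(text)
```

In the service this would be called as `transcribe_and_summarize(path, whisper_transcriber(), gemini_summarizer())`, with both factories created once at startup: loading a Whisper model is slow, so reusing it across files matters. Whisper itself also calls FFmpeg to decode audio, which is one more reason FFmpeg must be on the `PATH`.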