A powerful, easy-to-use platform for question answering over documents, web pages, research papers, audio, and video files using Retrieval-Augmented Generation (RAG) with Google Gemini and Qdrant.
- PDF, URL, and Research Paper (Grobid) ingestion
- Audio and Video Q&A (automatic transcription and semantic search)
- Streamlit web interface
- Google Gemini LLM integration
- In-memory Qdrant vector store
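The retrieval step behind the features above can be sketched in plain Python. This is a hypothetical toy, not the app's code: a hashed bag-of-words vector stands in for Gemini embeddings and a Python list stands in for the in-memory Qdrant collection, so it runs without an API key. The names chunk_text, embed, InMemoryStore and build_prompt, and the chunk size and overlap values, are assumptions for illustration.

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical stand-ins: hashed bag-of-words replaces Gemini embeddings,
# a plain list replaces the in-memory Qdrant collection.
DIM = 1024


def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows (sizes are assumptions)."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Hashed bag-of-words vector, L2-normalised so a dot product is a cosine."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryStore:
    """List-backed vector store searched by brute-force cosine similarity."""

    def __init__(self):
        self.items = []  # (vector, chunk) pairs

    def add(self, chunks):
        for chunk in chunks:
            self.items.append((embed(chunk), chunk))

    def search(self, query, k=3):
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), chunk) for v, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]


def build_prompt(question, contexts):
    """Assemble retrieved chunks into a grounded prompt for the LLM."""
    context = "\n---\n".join(contexts)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = InMemoryStore()
    store.add(["Qdrant stores vectors.", "FFmpeg decodes audio.", "Streamlit renders the UI."])
    hits = store.search("Which tool decodes audio?", k=1)
    print(build_prompt("Which tool decodes audio?", hits))
```

In the real pipeline the prompt would go to Gemini and the vectors would live in Qdrant; brute-force search is used here only because it keeps the sketch dependency-free.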
Clone the repository:
git clone https://github.com/himaenshuu/Multi_modal_rag-application
cd "my app"
It is recommended to use a virtual environment:
python -m venv venv
venv\Scripts\activate # On Windows
# or
source venv/bin/activate # On Mac/Linux
Install the dependencies:
pip install -r requirements.txt
Create a .env file in the project root and add your Google Gemini API key:
GEMINI_API=your_google_gemini_api_key
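How the app actually reads this key is not documented here; projects like this commonly use python-dotenv. As a dependency-free illustration, the hypothetical helpers below parse KEY=VALUE lines from .env into os.environ (without overriding variables that are already set) and fail with a clear message if GEMINI_API is missing.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Minimal .env parser (hypothetical stand-in for python-dotenv)."""
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)  # existing env vars win
    return loaded


def get_gemini_key():
    """Return GEMINI_API or raise a readable error if it is not configured."""
    load_env_file()
    key = os.environ.get("GEMINI_API")
    if not key:
        raise RuntimeError("GEMINI_API is not set; add it to .env in the project root")
    return key
```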
Audio and video Q&A needs FFmpeg. To install it on Windows:
- Download FFmpeg from the official site: https://ffmpeg.org/download.html
- Choose the Windows build (e.g., from gyan.dev).
- Extract the downloaded ZIP (e.g., ffmpeg-release-essentials.zip).
- Move the extracted ffmpeg folder to a location like C:\ffmpeg.
- Add C:\ffmpeg\bin to your Windows PATH:
  - Open System Properties > Environment Variables.
  - Under System variables, find and select Path, then click Edit.
  - Click New and add: C:\ffmpeg\bin
  - Click OK to save.
- Open a new Command Prompt and run:
ffmpeg -version
You should see version info if FFmpeg is installed correctly.
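The same check can be run from Python, which confirms that the interpreter running the app can see FFmpeg on its PATH. This is a hypothetical helper, not part of the app: it looks the binary up with shutil.which and returns the first line of ffmpeg -version, or None if FFmpeg is not found.

```python
import shutil
import subprocess


def ffmpeg_version(binary="ffmpeg"):
    """Return the first line of `<binary> -version`, or None if not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "FFmpeg not found on PATH")
```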
Start the app:
streamlit run app.py
The app will open in your browser. Use the interface to upload files, add URLs, or ask questions!
Project structure:
my app/
app.py # Streamlit UI
rag_app.py # PDF, URL, research paper RAG logic
audio_video_rag.py # Audio/Video RAG logic
requirements.txt # Python dependencies
.gitignore # Git ignore rules
README.md # This file
- Temporary files (audio, video, PDFs) are saved as temp_* and ignored by git.
- For research papers, the Grobid parser is used for better structure extraction.
- All data is stored in memory, so nothing persists after the app stops.
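Grobid returns a parsed paper as TEI XML. Which fields the app keeps is not stated here, so the sketch below is only an assumption of how that output can be turned into title, abstract and section chunks using the standard library; parse_grobid_tei is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text(el):
    """Flatten an element's text (including nested tags) and collapse whitespace."""
    return " ".join("".join(el.itertext()).split()) if el is not None else ""


def parse_grobid_tei(tei_xml):
    """Extract title, abstract and body sections from Grobid TEI output."""
    root = ET.fromstring(tei_xml)
    title = _text(root.find(".//tei:titleStmt/tei:title", TEI_NS))
    abstract = _text(root.find(".//tei:profileDesc/tei:abstract", TEI_NS))
    sections = []
    for div in root.findall(".//tei:text/tei:body/tei:div", TEI_NS):
        paragraphs = [_text(p) for p in div.findall("tei:p", TEI_NS)]
        sections.append({
            "heading": _text(div.find("tei:head", TEI_NS)),
            "text": "\n".join(paragraphs),
        })
    return {"title": title, "abstract": abstract, "sections": sections}
```

Keeping section headings alongside their text is one reason a structure-aware parser can help retrieval: each chunk carries its context.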
MIT License