A Retrieval-Augmented Generation (RAG) application built with Streamlit that enables users to ask questions about their documents and receive AI-generated answers based on the document content.
- Dual Document Loading: Upload via UI or load from disk
- Local Models: Uses locally stored AI models (no internet required for inference)
- Configurable Models: Change embedding model and LLM via `.env` file
- Vector Search: FAISS-powered semantic similarity search
- Persistent Storage: Save and load indexes using pickle
- Batch Processing: Efficient handling of large document sets
- Interactive UI: User-friendly Streamlit interface
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.8 | 3.10+ |
| RAM | 8GB | 16GB |
| Storage | 2GB | 5GB |
| OS | Windows/macOS/Linux | Any |
```text
streamlit>=1.28.0
sentence-transformers>=2.2.0
transformers>=4.30.0
faiss-cpu>=1.7.4
torch>=2.0.0
python-dotenv>=1.0.0
numpy
```
```bash
git clone https://github.com/sukantsondhi/Rag-Application.git
cd Rag-Application
```

Windows:

```bash
python -m venv .venv
.venv\Scripts\activate
```

macOS/Linux:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

```bash
pip install -r requirements.txt
```

Or install manually:

```bash
pip install streamlit sentence-transformers transformers faiss-cpu torch python-dotenv numpy
```

Create a `.env` file in the project root:

```bash
cp .env.example .env
```

Or create it manually with the following content:
```env
# Embedding Model Configuration
# Options: sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/all-mpnet-base-v2, etc.
EMBEDDING_MODEL_NAME=sentence-transformers/all-MiniLM-L6-v2

# LLM Configuration
# Options: google/flan-t5-small, google/flan-t5-base, google/flan-t5-large, etc.
LLM_MODEL_NAME=google/flan-t5-small

# Local model directory (models will be downloaded here)
MODELS_DIR=.models
```

Models are downloaded automatically on first run, or you can pre-download them:
```bash
python -c "
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import os
os.makedirs('.models', exist_ok=True)

# Download embedding model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
model.save('.models/all-MiniLM-L6-v2')

# Download LLM
tokenizer = AutoTokenizer.from_pretrained('google/flan-t5-small')
llm = AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-small')
tokenizer.save_pretrained('.models/flan-t5-small')
llm.save_pretrained('.models/flan-t5-small')
"
```

Place your `.txt` files in:

- `to_upload_from_UI/` - for UI upload testing
- `to_upload_from_disk/` - for bulk loading from disk

```bash
streamlit run RAGnarok.py
```

The app will open in your default browser at http://localhost:8501
Configure the application by editing the .env file:
| Variable | Description | Default |
|---|---|---|
| `EMBEDDING_MODEL_NAME` | HuggingFace embedding model name | `sentence-transformers/all-MiniLM-L6-v2` |
| `LLM_MODEL_NAME` | HuggingFace LLM model name | `google/flan-t5-small` |
| `MODELS_DIR` | Directory for storing downloaded models | `.models` |
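These variables are read at startup. Below is a minimal sketch of how such settings can be loaded, assuming python-dotenv's `load_dotenv()` has already populated the environment; the helper name `load_config` is hypothetical, not necessarily what `RAGnarok.py` uses:

```python
import os

def load_config() -> dict:
    """Read the app's settings from environment variables.

    Hypothetical helper for illustration; defaults mirror the table above.
    """
    return {
        "embedding_model": os.getenv(
            "EMBEDDING_MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        "llm_model": os.getenv("LLM_MODEL_NAME", "google/flan-t5-small"),
        "models_dir": os.getenv("MODELS_DIR", ".models"),
    }

config = load_config()
```

Providing defaults in `os.getenv` means the app can still start when the `.env` file is missing.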
| Model | Size | Dimensions | Best For |
|---|---|---|---|
| `sentence-transformers/all-MiniLM-L6-v2` | 80MB | 384 | Fast, general purpose |
| `sentence-transformers/all-mpnet-base-v2` | 420MB | 768 | Higher accuracy |
| `sentence-transformers/multi-qa-MiniLM-L6-cos-v1` | 80MB | 384 | Q&A tasks |
| Model | Size | Parameters | Best For |
|---|---|---|---|
| `google/flan-t5-small` | 300MB | 80M | Fast inference, basic tasks |
| `google/flan-t5-base` | 990MB | 250M | Balanced performance |
| `google/flan-t5-large` | 3GB | 780M | Higher quality answers |
| Setting | Range | Default | Description |
|---|---|---|---|
| Chunk Size | 200-1000 | 500 | Characters per text chunk |
| Overlap | 0-200 | 50 | Overlapping characters between chunks |
| Top K | 1-10 | 3 | Number of relevant chunks to retrieve |
| Temperature | 0.1-1.5 | 0.9 | LLM creativity (lower = more deterministic) |
| Top P | 0.1-1.0 | 0.95 | Nucleus sampling threshold |
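The Chunk Size and Overlap settings drive the text splitter. A simplified sketch of overlapping chunking (hypothetical helper; the app's real splitter also snaps to sentence and paragraph boundaries, which is omitted here):

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size chunks whose tails overlap.

    Simplified sketch of the idea behind the Chunk Size / Overlap settings;
    boundary-aware splitting is omitted.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# 1200 chars with step 450 -> chunks starting at 0, 450, 900
chunks = chunk_text("".join(chr(65 + i % 26) for i in range(1200)))
```

The overlap means the last 50 characters of each chunk reappear at the start of the next, so sentences cut at a boundary still show up whole in at least one chunk.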
- Select "Fresh Index" option
- Upload your `.txt` files using the file uploader
- Click "🚀 Build Index"
- Wait for processing to complete
- Ask questions in the chat interface
- Place `.txt` files in the `to_upload_from_disk/` folder
- Select "Load from Disk" option
- Click "🚀 Build Index"
- Ask questions in the chat interface
- Build an initial index using Method 1 or 2
- Select "Add to Previous Index" option
- Upload additional `.txt` files
- Click "🚀 Add to Index"
- The new documents will be merged with the existing index
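Conceptually, adding to a previous index just appends the new chunks' embedding vectors to the existing vector store, which is then searched as one. A toy numpy stand-in for FAISS's `IndexFlatL2` illustrating this merge (the class and names here are illustrative, not the app's code):

```python
import numpy as np

class FlatL2Index:
    """Toy stand-in for faiss.IndexFlatL2, for illustration only."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, new_vectors: np.ndarray) -> None:
        # Merging new documents == stacking their embeddings onto the index
        self.vectors = np.vstack([self.vectors, new_vectors.astype(np.float32)])

    def search(self, query: np.ndarray, k: int) -> np.ndarray:
        # Squared L2 distance from the query to every stored vector
        dists = ((self.vectors - query) ** 2).sum(axis=1)
        return np.argsort(dists)[:k]  # indices of the k nearest chunks

index = FlatL2Index(dim=4)
index.add(np.eye(4))             # initial build: 4 one-hot "embeddings"
index.add(np.full((1, 4), 0.9))  # "Add to Index": one more document
hits = index.search(np.full(4, 1.0), k=2)
```

The real app calls FAISS's `index.add()`, which works the same way: the index grows in place and later searches see old and new documents alike.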
- Type your question in the text input
- Press Enter or wait for automatic processing
- View the generated answer
- Expand "📚 Sources" to see the relevant document chunks
Rag-Application/
├── RAGnarok.py # Main application file
├── README.md # Documentation
├── requirements.txt # Python dependencies
├── .env # Environment configuration
├── .env.example # Example environment file
├── .gitignore # Git ignore patterns
│
├── .models/ # Local AI models (auto-downloaded)
│ ├── all-MiniLM-L6-v2/ # Embedding model
│ └── flan-t5-small/ # Language model
│
├── to_upload_from_UI/ # Sample files for UI upload
│ └── *.txt
│
├── to_upload_from_disk/ # Bulk document folder
│ └── *.txt
│
├── faiss_index.pkl # Saved FAISS index (auto-generated)
└── chunks.pkl # Saved text chunks (auto-generated)
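The two `.pkl` files implement the persistent-storage feature. A minimal sketch of the save/load cycle, using a numpy array as a stand-in for the serialized index (the real `faiss_index.pkl` holds the FAISS index object itself):

```python
import pickle
import numpy as np

chunks = ["First chunk of text.", "Second chunk of text."]
embeddings = np.random.rand(len(chunks), 384).astype(np.float32)

# Save both artifacts so the next run can skip re-embedding everything
with open("chunks.pkl", "wb") as f:
    pickle.dump(chunks, f)
with open("faiss_index.pkl", "wb") as f:
    pickle.dump(embeddings, f)  # stand-in: the real file holds a FAISS index

# On startup, reload instead of rebuilding
with open("chunks.pkl", "rb") as f:
    restored_chunks = pickle.load(f)
```

Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.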
┌─────────────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ (Streamlit App) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ DOCUMENT INGESTION │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ UI Upload │ │ Disk Loader │ │
│ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ TEXT PROCESSING │
│ Chunking (configurable size & overlap) │
│ Smart boundary detection (sentences, paragraphs) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ EMBEDDING GENERATION │
│ SentenceTransformer (configurable via .env) │
│ Batch processing for large datasets │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ VECTOR INDEXING │
│ FAISS IndexFlatL2 (L2/Euclidean distance) │
│ Pickle serialization for persistence │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ QUERY PROCESSING │
│ Question → Embedding → Vector similarity search │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ ANSWER GENERATION │
│ FLAN-T5 (configurable via .env) │
│ Context-aware text generation │
└─────────────────────────────────────────────────────────────┘
1. Document Loading: Text files are loaded from UI upload or disk
2. Chunking: Documents are split into overlapping chunks (default: 500 chars, 50 overlap)
3. Embedding: Each chunk is converted to a 384-dimensional vector
4. Indexing: Vectors are stored in a FAISS index for fast similarity search
5. Query Processing: User questions are embedded using the same model
6. Retrieval: Top-K most similar chunks are retrieved from the index
7. Generation: The LLM generates an answer using the retrieved context
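The retrieval and generation steps meet in the prompt: the Top-K chunks are stitched into the context that FLAN-T5 conditions on. A hypothetical prompt builder (the app's actual template may differ):

```python
from typing import List

def build_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Assemble a context-grounded prompt for a FLAN-T5 style model.

    Hypothetical template for illustration; RAGnarok's wording may differ.
    """
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt("What is FAISS?", ["FAISS is a vector search library."])
```

Grounding the model in retrieved context this way is what lets a small LLM answer questions about documents it was never trained on.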
❌ Models not found
Solution: Click the "📥 Download Models" button or manually download models using the script in Step 5.
❌ Out of memory

Solution:
- Reduce chunk size in sidebar settings
- Use a smaller model in `.env`
- Close other applications
❌ Slow indexing or embedding

Solution:
- Use batch processing (automatic for >100 chunks)
- Use a smaller embedding model
- Ensure you have sufficient RAM
ImportError: No module named 'faiss'
Solution:
```bash
pip install faiss-cpu
```

❌ .env configuration not loading

Solution:
- Ensure the `.env` file is in the project root
- Install python-dotenv: `pip install python-dotenv`
- Restart the Streamlit app
- Use smaller models for faster inference on limited hardware
- Adjust chunk size based on your document structure
- Pre-build indexes for frequently used document sets
- Use GPU if available (install `faiss-gpu` instead of `faiss-cpu`)
This project is open source and available under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.
If you encounter any issues, please open an issue on GitHub.