This repo contains a local RAG demo, a Streamlit chatbot UI, utilities to build and reuse a FAISS vector DB, and LoRA fine-tuning/merge helpers.
## Prerequisites

- Python 3.11+ (tested on Windows 10)
- Ollama running locally (http://localhost:11434)
- Recommended shell: Windows PowerShell
## Setup

```powershell
python -m pip install -r requirements.txt
ollama pull nomic-embed-text
ollama pull qwen2.5:0.5b-instruct
```
## loader.py

`loader.py` loads PDF(s), splits and sanitizes the text, embeds it with Ollama, and saves a FAISS index to disk.
Examples:

```powershell
# Single PDF → saved to .\faiss_index
python .\loader.py --pdf C:\Source\research\docx\report-ko.pdf --out C:\Source\research\faiss_index --emb nomic-embed-text --base http://localhost:11434

# All PDFs in a folder
python .\loader.py --dir C:\Source\research\docx --out C:\Source\research\faiss_index --emb nomic-embed-text --base http://localhost:11434
```
Notes:

- Text is sanitized to remove invalid surrogate characters, which would otherwise cause JSON encoding errors in the Ollama client.
- Default chunking is 250/50 (chunk size/overlap). Adjust in `loader.py` if desired.
- The output directory contains FAISS index files that can be reloaded later without recomputing embeddings.
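The pipeline behind these options is roughly the following. This is a minimal sketch assuming the `langchain_community` integrations (`PyPDFLoader` requires `pypdf`); it is not the exact contents of `loader.py`:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

def sanitize(text: str) -> str:
    # Drop lone surrogate characters that break JSON encoding in the Ollama client.
    return text.encode("utf-8", errors="ignore").decode("utf-8")

docs = PyPDFLoader(r"C:\Source\research\docx\report-ko.pdf").load()
for d in docs:
    d.page_content = sanitize(d.page_content)

# Default 250/50 chunking, as noted above.
chunks = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=50).split_documents(docs)

embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url="http://localhost:11434")
FAISS.from_documents(chunks, embeddings).save_local(r"C:\Source\research\faiss_index")
```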
## interface.py

`interface.py` provides a dark-mode chat UI with chat history, retrieval controls, a source panel, and an embedded PDF viewer with pagination.

Run:

```powershell
streamlit run interface.py
```
Behavior:

- On startup, the app auto-loads a FAISS index from `FAISS_INDEX_DIR` or `./faiss_index` if present.
- Uses Ollama locally with the defaults below; you can override them via environment variables.
- The right panel shows the original PDF (picker + viewer). Pagination controls are centered at the bottom of the viewer.
Environment variables (optional):

- `FAISS_INDEX_DIR` → path to a saved FAISS index (default `./faiss_index`)
- `OLLAMA_HOST` → e.g., `http://localhost:11434`
- `OLLAMA_EMBED` → embedding model tag (default `nomic-embed-text`)
- `OLLAMA_LLM` → chat model tag (default `qwen2.5:0.5b-instruct`)
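For orientation, here is a minimal sketch of how a Streamlit app can read these variables, auto-load the index, and answer one chat turn. It assumes the `langchain_community` Ollama/FAISS integrations (recent versions require `allow_dangerous_deserialization=True` to load the pickled FAISS docstore) and is not the exact contents of `interface.py`:

```python
import os

import streamlit as st
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# Defaults mirror the list above; every value can be overridden in the environment.
INDEX_DIR = os.getenv("FAISS_INDEX_DIR", "./faiss_index")
HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
EMBED = os.getenv("OLLAMA_EMBED", "nomic-embed-text")
LLM = os.getenv("OLLAMA_LLM", "qwen2.5:0.5b-instruct")

embeddings = OllamaEmbeddings(model=EMBED, base_url=HOST)
db = FAISS.load_local(INDEX_DIR, embeddings, allow_dangerous_deserialization=True)
llm = ChatOllama(model=LLM, base_url=HOST)

if question := st.chat_input("Ask about the indexed PDFs"):
    st.chat_message("user").write(question)
    docs = db.similarity_search(question, k=4)            # retrieval step
    context = "\n\n".join(d.page_content for d in docs)
    answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {question}")
    st.chat_message("assistant").write(answer.content)
```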
## main.py

`main.py` shows a minimal RAG flow using a prebuilt FAISS index.

Edit the index path in `main.py` if needed:

```python
INDEX_DIR = r"C:\Source\research\faiss_index"
```

Run:

```powershell
python .\main.py
```
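The core of such a flow looks roughly like this (a sketch under the same `langchain_community` assumptions as above, not the exact contents of `main.py`; the question is a placeholder):

```python
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

INDEX_DIR = r"C:\Source\research\faiss_index"

# Reload the prebuilt index without recomputing embeddings.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
db = FAISS.load_local(INDEX_DIR, embeddings, allow_dangerous_deserialization=True)

question = "What does the report conclude?"             # placeholder question
docs = db.similarity_search(question, k=4)              # top-k nearest chunks
context = "\n\n".join(d.page_content for d in docs)

llm = ChatOllama(model="qwen2.5:0.5b-instruct")
reply = llm.invoke(f"Answer using only this context:\n{context}\n\nQ: {question}")
print(reply.content)
```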
## train.py

`train.py` runs a quick demonstration fine-tune using TRL.

Input data format: `data.jsonl`, one JSON object per line with keys `prompt` and `response`:

```json
{"prompt": "...", "response": "..."}
```
Run:

```powershell
python .\train.py
```

Outputs are written to `OUT_DIR` (default `qwen2.5-3b-lora`). See the code for tunables.
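For orientation, the rough shape of such a TRL LoRA fine-tune is sketched below; argument names vary across TRL releases, and this is not the repo's exact `train.py`:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="data.jsonl", split="train")
# Fold each prompt/response pair into a single training string.
dataset = dataset.map(lambda ex: {"text": f"{ex['prompt']}\n{ex['response']}"})

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen2.5-3b-lora", dataset_text_field="text"),
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
trainer.save_model("qwen2.5-3b-lora")   # writes the LoRA adapter to OUT_DIR
```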
## merge.py

`merge.py` creates a merged Hugging Face model folder that you can use directly or export to GGUF.

Examples:

```powershell
# Merge only
python .\merge.py --base Qwen/Qwen2.5-3B-Instruct --adapter C:\Source\research\qwen2.5-3b-lora --out C:\Source\research\qwen2.5-3b-merged --cpu-only --dtype fp32

# Merge and test generation
python .\merge.py --base Qwen/Qwen2.5-0.5B-Instruct --adapter C:\Source\research\qwen2.5-3b-lora --out C:\Source\research\qwen2.5-3b-merged --cpu-only --dtype fp32 --infer
```
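Under the hood, a LoRA merge boils down to PEFT's `merge_and_unload`; a minimal sketch (not the exact contents of `merge.py`):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen2.5-3B-Instruct"
ADAPTER = r"C:\Source\research\qwen2.5-3b-lora"
OUT = r"C:\Source\research\qwen2.5-3b-merged"

# Load the base model on CPU in fp32, mirroring --cpu-only --dtype fp32.
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float32)
model = PeftModel.from_pretrained(base, ADAPTER)

merged = model.merge_and_unload()       # fold the LoRA deltas into the base weights
merged.save_pretrained(OUT)             # plain HF folder, usable directly or via GGUF export
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```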