Sfenbox (SFIT Enquiry box) - Advanced Admission Enquiry Chatbot with Unified Hybrid RAG Framework (URAG)

A modular pipeline for building a college information chatbot using LLMs, with support for ingesting both crawled website data and PDF documents as context.

Features

URAG-D (Document Augmentation):
Processes crawled web data or PDFs: semantically chunks, rewrites, and summarizes the content for robust retrieval.
URAG-F (FAQ Enrichment):
Generates and paraphrases FAQs from augmented documents for diverse and accurate chatbot responses.
PDF Support:
Ingests and processes PDF files as context.
FastAPI Backend:
Exposes a /chat endpoint for chatbot queries.
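
To make the chunking step of URAG-D more concrete, here is a minimal sketch of one simple strategy: grouping paragraphs under a character budget. This is a simplified stand-in, not semantic chunking and not the code in urag_preparation.py; the function name `chunk_paragraphs` and the `max_chars` parameter are hypothetical.

```python
def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    Hypothetical helper for illustration only. A single paragraph longer
    than max_chars is kept whole as its own chunk rather than being split.
    """
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A real semantic chunker would typically decide boundaries from embeddings or an LLM rather than from blank lines; the sketch only shows the shape of the step (text in, list of retrievable chunks out).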

Folder Structure

urag-sfenbox/
│
├── python_backend/
│   ├── urag_preparation.py      # Data preparation and augmentation pipeline
│   ├── main.py                  # FastAPI app for chatbot API
│   └── __init__.py
├── pdf_docs/                    # (Recommended) Place your PDF files here
├── data/                        # (Optional) JSON data files
├── .gitignore
└── README.md

Setup

Clone the repository:

git clone https://github.com/AnleaMJ/urag-sfenbox.git
cd urag-sfenbox

Create and activate a virtual environment:

python -m venv sfenbox
sfenbox\Scripts\activate        # On Windows
source sfenbox/bin/activate     # On macOS/Linux

Install dependencies:
```
pip install -r requirements.txt
```
(If requirements.txt is missing, install the packages manually: pip install fastapi uvicorn langchain-community langchain-core, plus any other packages the code imports.)

Configure your environment:
- Edit config.py with your HuggingFace API token and model names.
- Place your PDF files in a folder (e.g., pdf_docs/).

Data Preparation

Run the preparation pipeline to process your data:

python python_backend/urag_preparation.py

PDF Crawling & Caching:
All PDFs in the pdf_docs folder are automatically extracted and cached as pdf_crawled_data.json for faster future runs.
On subsequent runs, the pipeline loads PDF data from this JSON file instead of re-processing the PDFs.
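
The caching behavior described above can be sketched as a load-or-build pattern. This is an illustration under assumptions, not the project's actual implementation: the function name `load_or_build_pdf_cache`, the `extract_text` callback, and the record fields `source` and `text` are all hypothetical, and the real layout of pdf_crawled_data.json may differ.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build_pdf_cache(
    pdf_folder: str,
    cache_path: str,
    extract_text: Callable[[Path], str],
) -> list[dict]:
    """Return cached PDF records if the cache exists, else extract and cache them.

    extract_text is a caller-supplied function (hypothetical) that turns one
    PDF path into plain text, e.g. a thin wrapper around a PDF library.
    """
    cache = Path(cache_path)
    if cache.exists():
        # Fast path: reuse the JSON written by an earlier run.
        return json.loads(cache.read_text(encoding="utf-8"))

    # Slow path: extract every PDF once, in a stable order.
    records = [
        {"source": pdf.name, "text": extract_text(pdf)}
        for pdf in sorted(Path(pdf_folder).glob("*.pdf"))
    ]
    cache.write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return records
```

In this sketch an existing cache always wins, so newly added PDFs would only be picked up after deleting the cache file. Whether the real pipeline refreshes the cache automatically is not stated here.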

To use both PDF files and Firecrawl (web-crawled JSON) data as context, enable both sources when calling urag_d_augment_documents:

# In urag_preparation.py __main__ section:
augmented_docs = prep.urag_d_augment_documents(
    use_pdf=True,
    pdf_folder="pdf_docs",
    use_firecrawl=True,
    firecrawl_json=None,  # or path to your firecrawl JSON file
    pdf_json="pdf_crawled_data.json"
)

Running the Chatbot API

Start the FastAPI server:

uvicorn python_backend.main:app --reload

The API will be available at http://127.0.0.1:8000

Test the /chat endpoint by sending a POST request with a JSON body such as:

{
  "question": "What courses are offered?"
}
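
For a quick check from Python, the request above can be sent with the standard library alone. This is a minimal sketch: the helper names `build_chat_request` and `ask` are hypothetical, the base URL assumes the default local server shown above, and the response is returned as raw text because its schema is not documented here.

```python
import json
import urllib.request


def build_chat_request(
    question: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a POST request for /chat carrying {"question": ...} as JSON."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the question to a running server and return the raw response body."""
    with urllib.request.urlopen(build_chat_request(question), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the FastAPI server to be running locally.
    print(ask("What courses are offered?"))
```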

Deployment

For production, use a process manager (e.g., Gunicorn with Uvicorn workers).
Deploy on a cloud VM or platform (Azure, AWS, GCP, Heroku, etc.).
Connect a frontend (React, Streamlit, etc.) to the FastAPI backend.
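
As one possible starting point for the Gunicorn setup mentioned above, Gunicorn reads its settings from a Python config file. The values below are assumptions to adjust for your host, not settings taken from this project (see DEPLOYMENT.md in the repository for project-specific guidance).

```python
# gunicorn.conf.py (illustrative values, not project defaults)
import multiprocessing

# Listen on all interfaces, port 8000 (assumed; match your proxy or firewall).
bind = "0.0.0.0:8000"

# A common rule of thumb for worker count. If each worker loads its own
# model or index into memory, a smaller number may be more appropriate.
workers = multiprocessing.cpu_count() * 2 + 1

# Run the ASGI (FastAPI) app through Uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# LLM-backed responses can be slow; allow more than the 30 s default.
timeout = 120
```

With this file in the repository root, the server would be started with gunicorn -c gunicorn.conf.py python_backend.main:app. Recent Uvicorn releases may emit a deprecation notice for this worker class and suggest the separate uvicorn-worker package instead.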

Tips

Reuse the cached JSON (e.g., pdf_crawled_data.json) as context to speed up repeated runs.
Add new PDFs to pdf_docs/ and re-run the preparation pipeline as needed.
Use .gitignore to avoid committing cache files.

License

MIT License
