Discord RAG Pipeline

A full-stack application for scraping Discord servers and querying the collected messages with either RAG (Retrieval-Augmented Generation) or SQL. The system uses Modal for serverless deployment, FastAPI for the backend, and React with TypeScript for the frontend.

Architecture Overview

The application consists of two main services:

Backend Service (Modal/FastAPI)
- Handles Discord server scraping
- Manages SQLite database with vector embeddings
- Processes queries using either RAG or SQL approaches
- Integrates with OpenAI for embeddings and completions

Frontend Service (React/TypeScript)
- Provides user interface for server scraping
- Enables natural language querying of Discord data
- Visualizes the query processing pipeline
- Built with shadcn/ui components

Prerequisites

Python 3.8+
Bun 1.0+
Modal CLI
OpenAI API key
Discord Bot Token with necessary permissions
uv (Python package installer)

Environment Setup

Clone the repository:

git clone <repository-url>
cd discord_rag_app

Create a .env file in the root directory:

OPENAI_API_KEY=your_openai_api_key
DISCORD_TOKEN=your_discord_bot_token

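Create a .env file in the frontend_service directory that points the frontend at the backend (replace the value with your deployed Modal URL when not running locally):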

VITE_MODAL_URL=http://localhost:8000  # For local development
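
A missing key is easier to catch before deploying than after. The following is a minimal sketch, not part of the repository, that reports which of the backend variables listed above are unset. It assumes the values from .env have already been loaded into the process environment; the function name `missing_env_vars` is hypothetical.

```python
import os
from typing import Iterable, List, Mapping

# Variable names taken from the root .env example above.
REQUIRED_BACKEND_VARS = ("OPENAI_API_KEY", "DISCORD_TOKEN")


def missing_env_vars(
    required: Iterable[str] = REQUIRED_BACKEND_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing `env` explicitly, as in `missing_env_vars(env={"OPENAI_API_KEY": "..."})`, makes the check easy to exercise without touching the real environment.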

Backend Setup

Navigate to the backend directory:

cd backend_service

Install dependencies with uv:

uv pip install -r requirements.txt

Deploy to Modal:

modal deploy src/modal_app/main.py

Frontend Setup

From the repository root, navigate to the frontend directory:

cd frontend_service

Install dependencies with Bun:

bun install

Start the development server:

bun dev

Usage

Scraping Discord Data
- Enter your Discord server ID in the scraper form
- Set the desired message limit
- Click "Scrape Server" to begin data collection

Querying Data
- Enter your question in natural language
- The system automatically determines whether to use RAG or SQL
- View the complete processing pipeline in the UI

Query Examples

RAG-based queries:
- "What are the main topics discussed in the server?"
- "Summarize recent conversations about React"

SQL-based queries:
- "How many messages were sent today?"
- "Who are the most active users?"
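
This README does not describe how the system decides between RAG and SQL, and the real backend may well ask the language model to make that call. Purely as an illustration of the idea, here is a keyword-based sketch that routes the four example questions above the same way. The cue list and the function name `choose_strategy` are assumptions, not the project's implementation.

```python
import re

# Hypothetical cue phrases: counting, ranking, and date-bound questions
# tend to map to aggregate SQL; anything else falls through to RAG.
SQL_CUES = re.compile(
    r"\b(how many|count|number of|most active|top \d+|average|today|yesterday)\b",
    re.IGNORECASE,
)


def choose_strategy(question: str) -> str:
    """Return "sql" for aggregate-style questions, otherwise "rag"."""
    return "sql" if SQL_CUES.search(question) else "rag"


if __name__ == "__main__":
    for q in (
        "What are the main topics discussed in the server?",
        "How many messages were sent today?",
    ):
        print(choose_strategy(q), "<-", q)
```

A heuristic like this is cheap and predictable but brittle; it is shown only to make the split between open-ended and aggregate questions concrete.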

Technical Details

Database Schema

CREATE TABLE discord_messages (
    id TEXT PRIMARY KEY,
    channel_id TEXT NOT NULL,
    author_id TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL
);

CREATE VIRTUAL TABLE vec_discord_messages USING vec0(
    id TEXT PRIMARY KEY,
    embedding FLOAT[1536]
);
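
To show how the SQL-style questions map onto this schema, the sketch below builds the discord_messages table in an in-memory database with Python's standard sqlite3 module and runs two aggregate queries. The sample rows and helper names are made up for illustration, and created_at is assumed to be stored as ISO-8601 text. The vec0 table is left out because it requires the sqlite-vec extension to be loaded.

```python
import sqlite3

SCHEMA = """
CREATE TABLE discord_messages (
    id TEXT PRIMARY KEY,
    channel_id TEXT NOT NULL,
    author_id TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up messages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        ("1", "c1", "alice", "Anyone tried React 18 yet?", "2024-01-01 09:00:00"),
        ("2", "c1", "bob", "Yes, the upgrade went fine.", "2024-01-01 09:05:00"),
        ("3", "c2", "alice", "Deploy finished.", "2024-01-02 14:30:00"),
    ]
    conn.executemany("INSERT INTO discord_messages VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def most_active_users(conn: sqlite3.Connection, limit: int = 5):
    """Answer 'Who are the most active users?' as (author_id, count) pairs."""
    return conn.execute(
        "SELECT author_id, COUNT(*) AS n FROM discord_messages "
        "GROUP BY author_id ORDER BY n DESC, author_id LIMIT ?",
        (limit,),
    ).fetchall()


def messages_on(conn: sqlite3.Connection, day: str) -> int:
    """Count messages whose created_at falls on `day` (YYYY-MM-DD)."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM discord_messages WHERE date(created_at) = ?",
        (day,),
    ).fetchone()
    return count
```

For example, `most_active_users(build_demo_db())` ranks the sample authors by message count, and `messages_on(conn, "2024-01-01")` counts the messages sent on that day.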

Key Features

Vector Search: Uses sqlite-vec for efficient similarity search
Hybrid Query Processing: Automatically chooses between RAG and SQL approaches
Real-time Processing: Processes Discord messages and generates embeddings on the fly
Interactive UI: Visualizes the complete query processing pipeline
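
For readers new to vector search, the sketch below shows what a similarity lookup does conceptually: score every stored embedding against the query embedding and keep the best matches. It is a brute-force illustration in plain Python with toy three-dimensional vectors. In the project the search runs inside SQLite through sqlite-vec on 1536-dimensional embeddings, and the distance metric it uses is not stated here, so cosine similarity is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (message_id, score) pairs most similar to `query`."""
    scored = [(mid, cosine_similarity(query, vec)) for mid, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy embeddings keyed by hypothetical message ids.
    store = {"m1": [1.0, 0.0, 0.0], "m2": [0.0, 1.0, 0.0], "m3": [0.9, 0.1, 0.0]}
    for message_id, score in top_k([1.0, 0.0, 0.0], store, k=2):
        print(message_id, round(score, 3))
```

In a RAG flow, the content of the top-ranked messages would then be passed to the completion model as context for answering the question.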

Development

Backend Development

The backend uses Modal for serverless deployment and includes:

FastAPI for API endpoints
SQLite with vector extension for data storage
OpenAI integration for embeddings and completions
uv for fast, reliable Python package management

Frontend Development

The frontend is built with:

React 18 with TypeScript
Vite for build tooling
shadcn/ui for components
Lucide icons
Bun as the JavaScript runtime and package manager

Troubleshooting

Discord Scraping Issues
- Ensure the bot has the necessary permissions in the target server
- Check that the server ID is correct
- Verify that the Discord token is valid

Query Processing Issues
- Confirm OpenAI API key is valid
- Check database contains scraped messages
- Verify embeddings are being generated correctly

Package Management Issues
- For Python: Try uv pip install --force-reinstall -r requirements.txt
- For JavaScript: Try bun install --force
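
For the server ID check listed under Discord Scraping Issues, a quick format test can rule out copy-paste mistakes. Discord IDs are "snowflakes": unsigned 64-bit integers whose upper bits hold a millisecond timestamp counted from 2015-01-01 UTC. The sketch below is not part of the repository and the function names are hypothetical. It rejects non-numeric input and decodes the creation time, which should be a plausible date for your server. It cannot confirm that the server exists or that the bot has joined it.

```python
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds


def snowflake_timestamp_ms(raw: str) -> int:
    """Return the Unix timestamp (ms) embedded in a Discord snowflake ID."""
    text = raw.strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a numeric Discord ID: {raw!r}")
    value = int(text)
    if value >= 2**64:
        raise ValueError(f"ID does not fit in 64 bits: {raw!r}")
    # The low 22 bits hold worker, process, and sequence fields.
    return (value >> 22) + DISCORD_EPOCH_MS


def snowflake_created_at(raw: str) -> datetime:
    """Return the creation time encoded in the ID as an aware UTC datetime."""
    return datetime.fromtimestamp(snowflake_timestamp_ms(raw) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    # Example ID from Discord's public documentation.
    print(snowflake_created_at("175928847299117063").isoformat())
```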

License

MIT

Contributing

Fork the repository
Create your feature branch
Commit your changes
Push to the branch
Create a new Pull Request


juliettech13/discord_rag_app
