This repository demonstrates how to integrate Retrieval-Augmented Generation (RAG) using Pinecone into a custom LLM for Agora's Conversational AI. The project provides a foundation for creating more contextually aware and knowledge-enhanced AI conversations by leveraging vector embeddings and semantic similarity search.
- Vector Database Integration: Store and retrieve vector embeddings with Pinecone
- RAG Implementation: Enhance LLM responses with relevant context from your knowledge base
- API Endpoints: Ready-to-use REST APIs for managing records and generating AI completions
- Streaming Support: Real-time streaming responses from LLM APIs
- Easy Setup: Simple configuration with environment variables
- Node.js (v18+)
- Pinecone account and API key
- OpenAI API key (or other LLM provider)
- Agora account (for Conversational AI integration)
```bash
git clone https://github.com/TJ-Agora/Convo-AI-Custom-LLM-Pinecone.git
cd Convo-AI-Custom-LLM-Pinecone
npm install
```

Create a `.env` file in the root directory:
```bash
# LLM API (OpenAI by default)
LLM_API_KEY=your_llm_api_key

# Pinecone Configuration
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=your_pinecone_index_name

# Server Configuration
PORT=3000
```
```bash
npm run dev
```

The server will be available at `http://localhost:3000` (or the port you specified).
```
.
├── libs/
│   └── pinecone/
│       ├── config.js            # Pinecone initialization
│       └── pineconeService.js   # Vector database operations
├── routes/
│   ├── chatCompletionRouter.js  # LLM integration with RAG
│   └── pineconeRouter.js        # CRUD operations for Pinecone
├── .env.example                 # Example environment variables
├── package.json                 # Dependencies and scripts
├── server.js                    # Express server setup
└── README.md                    # Project documentation
```
- `POST /rag/pinecone/store`: Store a new record with its vector embedding

  ```json
  {
    "text": "Your text to be embedded and stored",
    "id": "optional-custom-id"
  }
  ```

- `POST /rag/pinecone/query`: Search for records by semantic similarity

  ```json
  {
    "query": "Your search query",
    "options": { "limit": 5 }
  }
  ```

- `DELETE /rag/pinecone/:id`: Delete a specific record by ID

- `DELETE /rag/pinecone/clear/all`: Clear all records (use with caution)

- `POST /rag/pinecone/embed`: Generate an embedding for text without storing it

  ```json
  { "text": "Text to generate embedding for" }
  ```
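To illustrate the request shapes above, here is a minimal client sketch using `fetch`. The helper names (`storeRequest`, `queryRequest`) and the base URL are illustrative, not part of the repository:

```javascript
// Illustrative helpers that build fetch options for the record endpoints.
// BASE_URL assumes the default local setup from the Quick Start.
const BASE_URL = process.env.RAG_BASE_URL || "http://localhost:3000";

// Build fetch options for POST /rag/pinecone/store
function storeRequest(text, id) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(id ? { text, id } : { text }),
  };
}

// Build fetch options for POST /rag/pinecone/query
function queryRequest(query, limit = 5) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, options: { limit } }),
  };
}

// Usage (requires the server to be running):
// await fetch(`${BASE_URL}/rag/pinecone/store`, storeRequest("Pinecone is a vector DB"));
// const res = await fetch(`${BASE_URL}/rag/pinecone/query`, queryRequest("vector databases", 3));
// console.log(await res.json());
```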
- `POST /chat/completions`: Get an AI response with RAG enhancement

  ```json
  {
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Tell me about vector databases." }
    ],
    "model": "gpt-4o-mini",
    "stream": false,
    "queryRag": true
  }
  ```
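A small sketch of building that request body in code; the helper name and default option values are assumptions chosen to match the example payload above:

```javascript
// Build a /chat/completions request body with RAG enabled by default.
// The field names mirror the JSON example in this README.
function chatCompletionBody(
  userQuestion,
  { model = "gpt-4o-mini", stream = false, queryRag = true } = {}
) {
  return {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: userQuestion },
    ],
    model,
    stream,
    queryRag,
  };
}

// Usage (requires the server to be running):
// const res = await fetch("http://localhost:3000/chat/completions", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(chatCompletionBody("Tell me about vector databases.")),
// });
```

Set `stream: true` to receive the response as it is generated rather than in one payload.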
- The user query is received by the chat completion endpoint
- If `queryRag` is enabled, the system:
  - Converts the query to a vector embedding
  - Searches Pinecone for semantically similar records
  - Formats and injects relevant records as context
- The enhanced prompt is sent to the LLM (e.g., OpenAI)
- The LLM response, now informed by your knowledge base, is returned
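The context-injection step can be sketched as a pure function. This is an illustrative sketch, not the repository's actual implementation; the `matches` shape assumes Pinecone-style results where each match carries its source text in `metadata.text`:

```javascript
// Format retrieved records and prepend them as a system message,
// leaving the original conversation untouched when nothing was retrieved.
function injectRagContext(messages, matches) {
  if (!matches || matches.length === 0) return messages;

  // Number each retrieved snippet so the LLM can reference them.
  const context = matches
    .map((m, i) => `[${i + 1}] ${m.metadata.text}`)
    .join("\n");

  const contextMessage = {
    role: "system",
    content: `Use the following retrieved context when answering:\n${context}`,
  };
  return [contextMessage, ...messages];
}
```

In the real flow, `matches` would come from querying the Pinecone index with the embedded user query before calling the LLM.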
This project is designed to work with Agora's Conversational AI by:
- Providing enhanced context to improve responses
- Supporting streaming for real-time conversation
- Maintaining conversation history and context
For full integration with Agora's Convo AI Engine, refer to the official Agora documentation.
Follow these steps to deploy your application to Heroku:
```bash
heroku login
heroku create your-app-name-here
```

Replace `your-app-name-here` with your desired app name.
```bash
heroku config:set PINECONE_API_KEY=your_pinecone_api_key
heroku config:set PINECONE_INDEX_NAME=your_pinecone_index_name
heroku config:set LLM_API_KEY=your_llm_api_key
heroku git:remote -a your-app-name-here
git push heroku main
```

Or use `git push heroku master` if your default branch is `master`.
```bash
heroku open
heroku logs --tail
```

- Make sure your `package.json` has the correct Node.js version in the `engines` section
- Ensure you have a `Procfile` in your root directory (it should contain `web: node server.js`)
- Remember that Heroku's filesystem is ephemeral: any file changes will be lost on dyno restart
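For reference, a one-line `Procfile` and a matching `engines` entry in `package.json` could look like the following (the `18.x` range is an assumption based on the v18+ prerequisite above):

```
web: node server.js
```

```json
{
  "engines": {
    "node": "18.x"
  }
}
```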
Built for demonstration purposes - customize for your specific needs.