LLM Project

This repository contains code to load embeddings, create Qdrant collections, and perform similarity searches using LangChain and QdrantClient.

Overview

This project demonstrates how to:

Load embeddings using HuggingFaceBgeEmbeddings.
Create a Qdrant collection from documents.
Perform similarity searches with LangChain and QdrantClient.

Requirements

Python 3.8 or higher
langchain_community library
qdrant_client library
transformers library
pypdf library

Installation

Clone the repository:

```bash
git clone https://github.com/yourusername/LLM-Project.git
cd LLM-Project
```

Install the necessary packages:

```bash
pip install langchain-community qdrant-client transformers pypdf
```

Set up Qdrant:
- Qdrant is used as the vector database. You can run Qdrant locally using Docker or download and install it directly.
- To start Qdrant with Docker:
```
docker run -p 6333:6333 qdrant/qdrant
```
- Note: Ensure Qdrant is running on http://127.0.0.1:6333, or update the URL parameter in both ingest.py and app.py to reflect your Qdrant server's address.

Place the PDF document:
- Place the PDF document (DL.pdf) in the collections directory inside your project folder.
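
Before running either script, it can help to confirm that Qdrant is actually reachable at the configured address. The following is a minimal sketch that uses only the Python standard library and Qdrant's REST endpoint for listing collections. The function name `qdrant_is_reachable` and the timeout value are illustrative assumptions, not part of this repository.

```python
import http.client
import urllib.error
import urllib.request

# Default address taken from the note above; change it if Qdrant runs elsewhere.
QDRANT_URL = "http://127.0.0.1:6333"


def qdrant_is_reachable(url: str = QDRANT_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to <url>/collections answers with status 200."""
    try:
        with urllib.request.urlopen(f"{url}/collections", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        # (for example, when the server requires an API key).
        return False
```

Calling `qdrant_is_reachable()` should return `True` once the Docker container from the previous step is up, and `False` if nothing is listening on that port.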

Files

1. ingest.py

This script processes the PDF document and stores its embeddings in the Qdrant vector database.

Workflow

Load the PDF Document: Loaded using PyPDFLoader from LangChain.
Text Splitting: The document content is split into chunks (default: 1000 characters with a 50-character overlap) using RecursiveCharacterTextSplitter.
Generate Embeddings: Each text chunk is transformed into embeddings using the Hugging Face model (BAAI/bge-large-en).
Store in Qdrant: The embeddings are stored in a Qdrant collection (gpt_db).
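
The four steps above can be sketched roughly as follows. This is not the actual contents of ingest.py, only an illustration built from the values named in this section. The function name `ingest`, the `cpu` device setting, the PDF path `collections/DL.pdf`, and the `langchain_text_splitters` import path are assumptions. Note also that HuggingFaceBgeEmbeddings typically relies on the sentence-transformers package being installed.

```python
# Values below come from the workflow description; the PDF path assumes the
# file was placed as described in the installation steps.
PDF_PATH = "collections/DL.pdf"
QDRANT_URL = "http://127.0.0.1:6333"
COLLECTION_NAME = "gpt_db"
MODEL_NAME = "BAAI/bge-large-en"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 50


def ingest(pdf_path: str = PDF_PATH, url: str = QDRANT_URL,
           collection_name: str = COLLECTION_NAME):
    """Load a PDF, split it, embed the chunks, and store them in Qdrant."""
    # Imported inside the function so the heavy dependencies are only
    # needed when ingestion actually runs.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.embeddings import HuggingFaceBgeEmbeddings
    from langchain_community.vectorstores import Qdrant
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # 1. Load the PDF; each page becomes one Document.
    documents = PyPDFLoader(pdf_path).load()

    # 2. Split the pages into overlapping chunks.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
    )
    chunks = splitter.split_documents(documents)

    # 3. Set up the embedding model (device is an assumption; "cuda" also works
    #    when a GPU is available).
    embeddings = HuggingFaceBgeEmbeddings(
        model_name=MODEL_NAME,
        model_kwargs={"device": "cpu"},
    )

    # 4. Embed the chunks and write them to the Qdrant collection.
    return Qdrant.from_documents(
        chunks, embeddings, url=url, collection_name=collection_name
    )
```

Calling `ingest()` with no arguments would use the defaults listed at the top of the sketch; the first run may take a while because the embedding model has to be downloaded.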

2. app.py

This script provides an interface to query the database and retrieve the most relevant document chunks based on semantic similarity.
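
To give a feel for what "semantic similarity" means here: embedding vectors are commonly compared with cosine similarity, where a score near 1 means the vectors point in nearly the same direction. Which distance metric the gpt_db collection is configured with is not stated in this README, so cosine is an assumption in the sketch below, and the three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (invented values; real BGE vectors have far
# more dimensions).
query_vec = [1.0, 0.0, 1.0]
chunk_vecs = {
    "chunk_a": [2.0, 0.0, 2.0],  # same direction as the query
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "chunk_c": [1.0, 1.0, 0.0],  # partially aligned with the query
}

# Rank chunk names from most to least similar to the query.
ranked = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
    reverse=True,
)
```

Here `ranked` puts `chunk_a` first and `chunk_b` last, which mirrors how a vector database orders candidate chunks for a query.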

Workflow

Load Embeddings: Initializes the Hugging Face embeddings model.
Connect to Qdrant: Establishes a connection with the Qdrant database.
Search Query: Executes a similarity search based on the input query (example: "What is saliency maps?") and retrieves the top 5 most relevant chunks.
Display Results: Prints each retrieved document chunk along with its similarity score and metadata.
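
A rough sketch of this query flow is shown below. It is not the actual contents of app.py. The helper names `format_hit`, `search`, and `show_results`, the output format, and the `cpu` device setting are assumptions made for illustration; the collection name, model name, and top-5 limit come from this README.

```python
QDRANT_URL = "http://127.0.0.1:6333"
COLLECTION_NAME = "gpt_db"
MODEL_NAME = "BAAI/bge-large-en"
TOP_K = 5


def format_hit(content: str, score: float, metadata: dict) -> str:
    """Render one search hit as a single printable line."""
    return f"score={score:.4f} metadata={metadata} content={content!r}"


def search(query: str, k: int = TOP_K):
    """Return (Document, score) pairs for the k chunks closest to the query."""
    # Imported lazily so the heavy dependencies are only needed at query time.
    from langchain_community.embeddings import HuggingFaceBgeEmbeddings
    from langchain_community.vectorstores import Qdrant
    from qdrant_client import QdrantClient

    # The same embedding model must be used for ingestion and for querying.
    embeddings = HuggingFaceBgeEmbeddings(
        model_name=MODEL_NAME,
        model_kwargs={"device": "cpu"},
    )
    client = QdrantClient(url=QDRANT_URL)
    db = Qdrant(client=client, embeddings=embeddings,
                collection_name=COLLECTION_NAME)
    return db.similarity_search_with_score(query, k=k)


def show_results(query: str) -> None:
    """Print each retrieved chunk with its score and metadata."""
    for doc, score in search(query):
        print(format_hit(doc.page_content, score, doc.metadata))
```

With Qdrant running and the collection populated, `show_results("What is saliency maps?")` would print up to five lines, one per retrieved chunk.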

Usage

Running the Ingestion Script

To run the ingestion script:

```bash
python ingest.py
```

Running the Query Script

To run the query script:

```bash
python app.py
```

Modify the query variable in app.py to customize the query string as desired.
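
As an alternative to editing the script for every new question, the query could be read from the command line. This is a hypothetical variation, not something app.py is stated to support; `parse_query` and `DEFAULT_QUERY` are illustrative names.

```python
import argparse

# Example query taken from the app.py workflow description.
DEFAULT_QUERY = "What is saliency maps?"


def parse_query(argv=None) -> str:
    """Return the query given on the command line, or the default one."""
    parser = argparse.ArgumentParser(description="Query the Qdrant collection.")
    parser.add_argument(
        "query",
        nargs="?",              # the positional argument is optional
        default=DEFAULT_QUERY,
        help="question to search for in the ingested document",
    )
    # argv=None makes argparse read from sys.argv[1:].
    return parser.parse_args(argv).query
```

A script using this helper could then be invoked as `python app.py "your question here"`, falling back to the default query when no argument is given.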

Dependencies

Python (>=3.8)
LangChain-Community: Library for document loaders, text splitters, and Qdrant integrations.
Qdrant-Client: Python client for interacting with the Qdrant database.
HuggingFace Transformers: Embeddings model from Hugging Face to generate vector representations of text.
PyPDF: A Python library to load PDF documents.

Install all dependencies with:

```bash
pip install -r requirement.txt
```

License

This project is licensed under the MIT License. See the LICENSE file for details.
