This repository contains code to load embeddings, create Qdrant collections, and perform similarity searches using LangChain and QdrantClient.
This project demonstrates how to:
- Load embeddings using HuggingFaceBgeEmbeddings.
- Create a Qdrant collection from documents.
- Perform similarity searches with LangChain and QdrantClient.

Prerequisites:
- Python 3.8 or higher
- `langchain_community` library
- `qdrant_client` library
- `transformers` library
- `pypdf` library
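
Before running the scripts, it can help to confirm that the prerequisites are actually present. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `missing_modules` are hypothetical and not part of this repository.

```python
import importlib.util
import sys

# Import names taken from the prerequisites list.
REQUIRED_MODULES = ["langchain_community", "qdrant_client", "transformers", "pypdf"]
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Missing modules:", missing_modules())
```

An empty "Missing modules" list suggests the packages are installed in the active environment; it does not verify their versions.
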

Setup:
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/LLM-Project.git
  cd LLM-Project
  ```
- Install the necessary packages:
  ```bash
  pip install langchain-community qdrant-client transformers pypdf
  ```
- Set up Qdrant:
  - Qdrant is used as the vector database. You can run Qdrant locally using Docker or download and install it directly.
  - To start Qdrant with Docker:
    ```bash
    docker run -p 6333:6333 qdrant/qdrant
    ```
  - Note: Ensure Qdrant is running on http://127.0.0.1:6333, or update the URL parameter in both `ingest.py` and `app.py` to reflect your Qdrant server's address.
- Place the PDF document:
  - Place the PDF document (`DL.pdf`) in the `collections` directory inside your project folder.
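
If the scripts cannot connect, a quick way to rule out a wrong address is to test whether anything is listening at the configured URL. This is a rough sketch, not code from the repository: `host_and_port` and `qdrant_port_is_open` are hypothetical helpers, and falling back to port 6333 when the URL omits a port is an assumption based on the Docker command in the setup steps. It only checks that the TCP port accepts connections, not that the service behind it is Qdrant.

```python
import socket
from urllib.parse import urlparse

DEFAULT_QDRANT_URL = "http://127.0.0.1:6333"


def host_and_port(url=DEFAULT_QDRANT_URL):
    """Split a Qdrant URL into (host, port)."""
    parsed = urlparse(url)
    # Assumption: use 6333 (the port published by the Docker command) if none is given.
    return parsed.hostname, parsed.port or 6333


def qdrant_port_is_open(url=DEFAULT_QDRANT_URL, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


if __name__ == "__main__":
    print("Qdrant port reachable:", qdrant_port_is_open())
```
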

The ingestion script (`ingest.py`) processes the PDF document and stores its embeddings in the Qdrant vector database.
- Load the PDF Document: Loaded using `PyPDFLoader` from LangChain.
- Text Splitting: The document content is split into chunks (default: 1000 characters with a 50-character overlap) using `RecursiveCharacterTextSplitter`.
- Generate Embeddings: Each text chunk is transformed into an embedding using the Hugging Face model (`BAAI/bge-large-en`).
- Store in Qdrant: The embeddings are stored in a Qdrant collection (`gpt_db`).
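
The four ingestion steps can be sketched roughly as follows. This is an illustration under stated assumptions, not a copy of `ingest.py`: the `IngestSettings` class and `ingest` function are hypothetical names, the import path for `RecursiveCharacterTextSplitter` differs between LangChain releases, and `HuggingFaceBgeEmbeddings` may additionally require the `sentence-transformers` package at runtime. The default values mirror those stated in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestSettings:
    """Hypothetical container for the values named in this README."""
    pdf_path: str = "collections/DL.pdf"
    chunk_size: int = 1000
    chunk_overlap: int = 50
    model_name: str = "BAAI/bge-large-en"
    qdrant_url: str = "http://127.0.0.1:6333"
    collection_name: str = "gpt_db"


def ingest(settings: IngestSettings = IngestSettings()) -> int:
    """Load, split, embed, and store the PDF; return the number of chunks."""
    # Third-party imports are deferred so this module imports without them.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.embeddings import HuggingFaceBgeEmbeddings
    from langchain_community.vectorstores import Qdrant
    # Assumption: older releases expose this class under langchain.text_splitter.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Step 1: load the PDF as a list of LangChain documents.
    documents = PyPDFLoader(settings.pdf_path).load()

    # Step 2: split the documents into overlapping chunks.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=settings.chunk_size,
        chunk_overlap=settings.chunk_overlap,
    )
    chunks = splitter.split_documents(documents)

    # Step 3: create the embedding model wrapper.
    embeddings = HuggingFaceBgeEmbeddings(model_name=settings.model_name)

    # Step 4: embed the chunks and store them in the Qdrant collection.
    Qdrant.from_documents(
        chunks,
        embeddings,
        url=settings.qdrant_url,
        collection_name=settings.collection_name,
    )
    return len(chunks)
```

Calling `ingest()` requires a running Qdrant instance and downloads the embedding model on first use.
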

The query script (`app.py`) provides an interface to query the database and retrieve the most relevant document chunks based on semantic similarity.
- Load Embeddings: Initializes the Hugging Face embeddings model.
- Connect to Qdrant: Establishes a connection with the Qdrant database.
- Search Query: Executes a similarity search based on the input query (example: "What is saliency maps?") and retrieves the top 5 most relevant chunks.
- Display Results: Prints each retrieved document chunk along with its similarity score and metadata.
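
The query flow described above might look roughly like this. It is a sketch under assumptions rather than the contents of `app.py`: `format_hit` and `search` are hypothetical names, and the output format is illustrative only. The library calls (`QdrantClient`, the `Qdrant` vector store wrapper, and `similarity_search_with_score`) come from `qdrant-client` and `langchain-community`.

```python
def format_hit(content, score, metadata):
    """Render one retrieved chunk as a single printable line."""
    return f"score={score:.4f} | metadata={metadata} | {content}"


def search(query, k=5, url="http://127.0.0.1:6333", collection_name="gpt_db"):
    """Return the top-k chunks for the query as formatted strings."""
    # Third-party imports are deferred so this module imports without them.
    from langchain_community.embeddings import HuggingFaceBgeEmbeddings
    from langchain_community.vectorstores import Qdrant
    from qdrant_client import QdrantClient

    # Use the same embedding model as at ingestion time so vectors are comparable.
    embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-large-en")

    # Connect to the running Qdrant instance and wrap the existing collection.
    client = QdrantClient(url=url)
    store = Qdrant(client=client, collection_name=collection_name, embeddings=embeddings)

    # Each hit is a (Document, score) pair.
    hits = store.similarity_search_with_score(query=query, k=k)
    return [format_hit(doc.page_content, score, doc.metadata) for doc, score in hits]
```

For example, `for line in search("What is saliency maps?"): print(line)` would print one line per retrieved chunk, provided the collection has already been populated by the ingestion script.
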

To run the ingestion script:
```bash
python ingest.py
```
### Running the Query Script:
To run the query script:
```bash
python app.py
```
Modify the `query` variable in `app.py` to customize the query string as desired.
## Dependencies
- Python (>=3.8)
- LangChain-Community: Library for document loaders, text splitters, and Qdrant integrations.
- Qdrant-Client: Python client for interacting with the Qdrant database.
- HuggingFace Transformers: Embeddings model from Hugging Face to generate vector representations of text.
- PyPDF: A Python library to load PDF documents.
Install all dependencies with:
```bash
pip install -r requirements.txt
```
This project is licensed under the MIT License. See the LICENSE file for details.