Skip to content

rohitvish07/RAG_Pipeline_

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 

Repository files navigation

YouTube Transcript QA using RAG Pipeline

This project is a simple Retrieval-Augmented Generation (RAG) pipeline that answers questions based on YouTube video transcripts.

We use freely available models from Hugging Face and process the transcript data to make it searchable and answerable using natural language questions.

πŸ” What It Does

  • Takes a YouTube video transcript
  • Splits and indexes the transcript into chunks
  • Uses a retriever to fetch relevant chunks based on a user query
  • Generates a natural language answer using a Hugging Face language model

πŸ› οΈ Technologies Used

  • Python
  • Hugging Face Transformers
  • Hugging Face Datasets
  • FAISS (for vector search)
  • Sentence Transformers (for embeddings)
  • YouTube Transcript API

πŸ“¦ How It Works (Simple Steps)

  1. Get Transcript: Automatically fetch or manually load a transcript from a YouTube video.
  2. Preprocess: Clean and split the transcript into chunks.
  3. Embed: Convert chunks into vector embeddings using a Sentence Transformer model.
  4. Store: Save the embeddings in a FAISS vector database.
  5. Ask Questions: When a question is asked, the pipeline retrieves the most relevant chunks.
  6. Generate Answer: A Hugging Face model generates a response based on the retrieved information.

πŸš€ Getting Started

Prerequisites

Install the required packages:

pip install transformers datasets faiss-cpu sentence-transformers youtube-transcript-api

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages