kiet08hogit/mini-pdf-rag-processing

PDF RAG Processing Pipeline

This repository showcases a production-ready Retrieval-Augmented Generation (RAG) pipeline for PDF documents, built with FastAPI, LangChain, Azure AI Search, and Azure Blob Storage, and evaluated with DeepEval.

Features

  • Document Ingestion: Upload PDFs, chunk them dynamically, and generate HuggingFace embeddings.
  • Azure Integration: Raw documents are stored securely in Azure Blob Storage, while vector embeddings are synced with Azure AI Search for fast retrieval.
  • Dynamic Retrieval: Query the index with hybrid search, then pass the retrieved context to a Google Gemini model to generate the response.
  • DeepEval Testing: Includes an evaluation suite that scores the pipeline on Answer Relevancy and Faithfulness, checking that responses address the question and stay grounded in the retrieved context rather than hallucinating.
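
To make the chunking step in Document Ingestion concrete, here is a minimal sketch of overlapping fixed-size chunking in plain Python. It is an illustration only: the function name chunk_text and the chunk_size and overlap parameters are hypothetical, and the repository may well use a LangChain text splitter with different rules.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    (Hypothetical helper; not taken from the repository.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and indexed separately. The overlap trades a little index size for better recall on passages that straddle a boundary.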

Setup Instructions

  1. Install Dependencies:

    pip install -r requirements.txt
  2. Environment Variables: Create a .env file in the root directory and add the following:

    GOOGLE_API_KEY=your-google-api-key
    AZURE_SEARCH_ENDPOINT=your-azure-search-endpoint
    AZURE_SEARCH_KEY=your-azure-search-key
    AZURE_SEARCH_INDEX_NAME=langchain-vector-index
    AZURE_STORAGE_CONNECTION_STRING=your-azure-storage-connection
    AZURE_STORAGE_CONTAINER_NAME=your-container-name
  3. Run the Server:

    uvicorn main:app --reload

    Access the interactive Swagger UI at http://127.0.0.1:8000/docs.
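
A missing variable from step 2 typically surfaces only when the first request reaches Azure or Gemini. As a rough sketch, the settings can be checked up front. The helper missing_settings below is hypothetical and not part of the repository; it assumes the .env values are already in the process environment, for example after calling load_dotenv() from python-dotenv.

```python
import os
from typing import Mapping, Optional

# Names copied from the .env template in step 2.
REQUIRED_SETTINGS = (
    "GOOGLE_API_KEY",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_KEY",
    "AZURE_SEARCH_INDEX_NAME",
    "AZURE_STORAGE_CONNECTION_STRING",
    "AZURE_STORAGE_CONTAINER_NAME",
)


def missing_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required names that are unset or blank in env.

    Defaults to the process environment when env is not given.
    (Hypothetical helper; not taken from the repository.)
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling missing_settings() at startup and exiting when the returned list is non-empty gives a clear error message before uvicorn begins serving requests.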

Running Tests

To evaluate the RAG model's performance, use DeepEval:

    deepeval test run tests/test_rag.py
