Production-grade Multi-tenant Retrieval-Augmented Generation Platform
A full-stack, multi-tenant RAG platform that turns documents, URLs, and FAQs into a searchable knowledge base with AI-powered question answering. Built with Django + React, powered by OpenAI embeddings and Claude / GPT-4o LLMs.
Features · Architecture · Tech Stack · Quick Start · API · Testing
- Knowledge Management
- AI-Powered Retrieval
- Multi-tenant & Secure
- Developer Experience
```mermaid
graph TB
    subgraph Client["🖥️ Browser (React 19 + Vite)"]
        UI[React UI<br/>Tailwind CSS + Ant Design]
        RTK[RTK Query<br/>State & Cache]
        AUTH[JWT Auth<br/>Redux Slice]
    end
    subgraph Gateway["🌐 API Gateway"]
        VITE_PROXY[Vite Dev Proxy<br/>:5173 → :8000]
        NGINX[Nginx<br/>Production Reverse Proxy]
    end
    subgraph Backend["⚙️ Django 6 + DRF"]
        direction TB
        URLS[URL Router<br/>/api/v1/]
        MW[TenantMiddleware<br/>JWT → tenant_id]
        subgraph Apps["Django Apps"]
            AUTH_APP[accounts<br/>JWT + API Key Auth]
            KB[knowledge_bases<br/>CRUD]
            DOCS[documents<br/>Upload · URL · Chunks]
            FAQ_APP[faq<br/>Q&A Management]
            RETRIEVAL[retrieval<br/>Search · Prompt · Answer]
            AUDIT[audit<br/>Retrieval & Query Logs]
        end
        subgraph Pipeline["Ingestion Pipeline"]
            direction LR
            PARSE[parsers<br/>PDF · DOCX · XLSX · HTML]
            CHUNK[chunking<br/>LlamaIndex Splitter]
            EMBED_SVC[embeddings<br/>OpenAI API]
            VSTORE[vector_store<br/>Qdrant Client]
        end
    end
    subgraph Workers["🔁 Celery Workers"]
        INGEST[ingest_document<br/>parse→chunk→embed→index]
        INGEST_URL[ingest_url<br/>fetch→parse→chunk→embed]
        EMBED_FAQ[embed_faq_item<br/>question+answer→embed]
    end
    subgraph Storage["💾 Data Layer"]
        PG[(PostgreSQL<br/>schema: rag)]
        QDRANT[(Qdrant<br/>document_chunks)]
        MINIO[(MinIO<br/>raw files)]
        REDIS[(Redis<br/>Celery broker)]
    end
    subgraph LLM["🤖 AI Services"]
        OPENAI[OpenAI<br/>text-embedding-3-small<br/>GPT-4o / GPT-4o-mini]
        ANTHROPIC[Anthropic<br/>Claude Sonnet 4.6<br/>Claude Haiku 4.5]
    end
    UI --> RTK --> VITE_PROXY --> URLS
    URLS --> MW --> Apps
    AUTH_APP --> PG
    KB --> PG
    DOCS --> PG
    DOCS --> MINIO
    DOCS --> REDIS
    FAQ_APP --> PG
    FAQ_APP --> REDIS
    RETRIEVAL --> QDRANT
    RETRIEVAL --> LLM
    AUDIT --> PG
    REDIS --> Workers
    Workers --> PARSE --> CHUNK --> EMBED_SVC --> VSTORE
    EMBED_SVC --> OPENAI
    VSTORE --> QDRANT
    Workers --> MINIO
```
```mermaid
flowchart LR
    A([📄 File Upload\nor URL]) --> B
    subgraph B["① Parse"]
        B1[PDF → pypdf\nDOCX → python-docx\nXLSX → openpyxl\nHTML → BeautifulSoup4]
    end
    B --> C
    subgraph C["② Chunk"]
        C1[LlamaIndex\nSentenceSplitter\nchunk_size=512 tokens\noverlap=64 tokens]
    end
    C --> D
    subgraph D["③ Embed"]
        D1[OpenAI\ntext-embedding-3-small\nor 3-large per KB]
    end
    D --> E
    subgraph E["④ Index"]
        E1[Qdrant upsert\nper-KB collection\nor shared collection]
    end
    E --> F([✅ indexed\nDocument.status])
    style A fill:#4f46e5,color:#fff
    style F fill:#10b981,color:#fff
```
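The chunking step above can be sketched as a sliding window. This illustrative splitter uses words as a stand-in for tokens to stay dependency-free; the pipeline itself uses LlamaIndex's token-aware SentenceSplitter:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Sliding-window splitter sketch (words approximate tokens).

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear whole in at least one chunk.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail
    return chunks
```

With the defaults, each 512-word chunk repeats the final 64 words of its predecessor, mirroring the chunk_size=512 / overlap=64 settings shown in the diagram.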
```mermaid
sequenceDiagram
    actor User
    participant FE as React Frontend
    participant API as Django API
    participant QD as Qdrant
    participant LLM as Claude / GPT-4o
    participant DB as PostgreSQL
    User->>FE: Ask a question
    FE->>API: POST /api/v1/rag/answer/\n{query, knowledge_base_id, llm_model}
    API->>API: Embed query\n(OpenAI text-embedding-3-small)
    API->>QD: query_points(vector, filter={tenant_id, kb_id}, top_k=5)
    QD-->>API: Top-K chunks with scores
    API->>API: Build prompt\n(system + context + question)
    API->>LLM: Chat completion request
    LLM-->>API: Generated answer
    API->>DB: Save QueryLog\n(query, answer, tokens, latency_ms)
    API-->>FE: {answer, sources, usage, latency_ms}
    FE-->>User: Display answer + cited sources
```
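Conceptually, the answer flow above reduces to: embed the query, score chunks by cosine similarity, and assemble a prompt. A dependency-free sketch (the real service embeds via OpenAI and searches Qdrant with tenant filters; these function names are illustrative):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, the metric the Qdrant collection is configured with."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[tuple[list[float], str]], k: int = 5):
    """chunks: (vector, text) pairs, as if already filtered to one tenant/KB."""
    scored = [(cosine(query_vec, vec), text) for vec, text in chunks]
    return sorted(scored, reverse=True)[:k]


def build_prompt(question: str, hits) -> str:
    """Concatenate retrieved context under a grounding instruction."""
    context = "\n\n".join(text for _score, text in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The resulting string is what `POST /rag/prompt/` exposes without an LLM call, while `POST /rag/answer/` sends it on to Claude or GPT-4o.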
```mermaid
erDiagram
    TENANT {
        uuid id PK
        string name
        string slug
        string plan
    }
    USER {
        uuid id PK
        uuid tenant_id FK
        string email
        string role
    }
    KNOWLEDGE_BASE {
        uuid id PK
        uuid tenant_id FK
        string name
        int chunk_size
        int retrieval_top_k
    }
    DOCUMENT {
        uuid id PK
        uuid tenant_id FK
        uuid knowledge_base_id FK
        string status
        string file_path
    }
    DOCUMENT_CHUNK {
        uuid id PK
        uuid tenant_id FK
        uuid document_id FK
        text text
        int chunk_index
        bool is_embedded
    }
    FAQ_ITEM {
        uuid id PK
        uuid tenant_id FK
        uuid knowledge_base_id FK
        text question
        text answer
        bool is_embedded
    }
    TENANT ||--o{ USER : has
    TENANT ||--o{ KNOWLEDGE_BASE : owns
    KNOWLEDGE_BASE ||--o{ DOCUMENT : contains
    KNOWLEDGE_BASE ||--o{ FAQ_ITEM : contains
    DOCUMENT ||--o{ DOCUMENT_CHUNK : split_into
```
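Note that every entity above carries a denormalized `tenant_id`, so any query can be scoped without joins. A tiny sketch of that discipline, with an illustrative dataclass standing in for the Django model:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DocumentChunk:
    """Illustrative stand-in for the DOCUMENT_CHUNK row above."""
    tenant_id: str
    document_id: str
    text: str
    chunk_index: int
    is_embedded: bool = False
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def for_tenant(rows: list[DocumentChunk], tenant_id: str) -> list[DocumentChunk]:
    """Every read path filters on tenant_id; omitting this filter is the
    classic multi-tenant leak the schema is designed to prevent."""
    return [r for r in rows if r.tenant_id == tenant_id]
```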
| Layer | Technology | Purpose |
|---|---|---|
| Runtime | Python 3.13 + uv | Fast dependency management |
| Framework | Django 6 + DRF 3.16 | Web framework + REST APIs |
| Auth | simplejwt + API Key | JWT tokens + agent access |
| Task Queue | Celery 5 + Redis | Async ingestion pipeline |
| Vector DB | Qdrant 1.17 | Cosine similarity search |
| Object Storage | MinIO | Raw file storage (S3-compatible) |
| Database | PostgreSQL (schema: rag) | Structured data |
| Chunking | LlamaIndex SentenceSplitter / SemanticSplitter | Token-aware text splitting |
| Embedding | OpenAI text-embedding-3-small / 3-large | Per-KB configurable, 1536 / 3072-dim |
| LLM | Claude Sonnet 4.6 / GPT-4o | Answer generation |
| API Docs | drf-spectacular | OpenAPI 3.0 / Swagger |
| Document Parsing | pypdf · python-docx · openpyxl · python-pptx · BeautifulSoup4 | Multi-format support |
| Technology | Purpose |
|---|---|
| React 19 + TypeScript 5.9 | UI framework |
| Vite 7 | Build tool + dev proxy |
| Redux Toolkit + RTK Query | State management + data fetching |
| React Router 7 | Client-side routing |
| Ant Design 6 | UI components (tables, modals, forms) |
| Tailwind CSS 4 | Dark theme + layout + utilities |
```bash
# Start infrastructure services
docker run -d -p 6333:6333 qdrant/qdrant
docker run -d -p 19000:9000 -e MINIO_ROOT_USER=admin -e MINIO_ROOT_PASSWORD=admin123 \
  minio/minio server /data --console-address ":9001"
redis-server --requirepass yourpassword
```

```bash
cd backend

# Install dependencies (uv auto-creates virtualenv)
uv sync

# Configure environment
cp .env.example .env
# Edit .env: set DB credentials, OPENAI_KEY, ANTHROPIC_API_KEY, REDIS_URL

# Database setup
uv run python src/manage.py migrate
uv run python src/manage.py init_qdrant        # Create Qdrant collection
uv run python src/manage.py createsuperuser

# Run server
uv run python src/manage.py runserver          # http://localhost:8000

# Run Celery worker (separate terminal)
cd src && uv run celery -A config.celery worker --loglevel=info
```

```bash
cd frontend
npm install
npm run dev        # http://localhost:5173
```

Dev proxy: all `/api/*` requests from `:5173` are automatically forwarded to `:8000` by Vite, so no CORS configuration is needed.
```
rag/
├── backend/
│   ├── .env                     # All configuration
│   ├── pyproject.toml           # uv dependencies
│   └── src/
│       ├── config/
│       │   ├── settings/        # base / development / production
│       │   ├── api_router.py    # /api/v1/ route registration
│       │   └── celery.py        # Celery app
│       ├── apps/
│       │   ├── common/          # Base models, MinIO client, pagination
│       │   ├── tenants/         # Tenant model + TenantMiddleware
│       │   ├── accounts/        # User auth (JWT + API Key)
│       │   ├── knowledge_bases/ # KB CRUD
│       │   ├── documents/       # Upload, URL import, chunk preview
│       │   ├── faq/             # FAQ management + bulk import
│       │   ├── ingestion/       # Celery task orchestration
│       │   ├── parsers/         # PDF/DOCX/XLSX/PPTX/HTML parsers
│       │   ├── chunking/        # LlamaIndex SentenceSplitter / SemanticSplitter
│       │   ├── embeddings/      # OpenAI embedding service
│       │   ├── vector_store/    # Qdrant client wrapper
│       │   ├── retrieval/       # Search + prompt builder + RAG answer
│       │   └── audit/           # Search & query logs
│       └── tests/               # 27 pytest tests
│
├── frontend/
│   ├── .env                     # VITE_API_URL, VITE_APP_NAME
│   └── src/
│       ├── components/Layout/   # Collapsible sidebar + sticky header
│       ├── pages/               # Login, Dashboard, KB, Docs, FAQ,
│       │                        # Retrieval, Jobs, Logs
│       ├── store/
│       │   ├── api/             # RTK Query endpoints (4 APIs)
│       │   └── slices/authSlice # JWT token management
│       └── index.css            # Tailwind v4 + Ant Design dark theme
│
└── docs/
    └── README.md                # Full technical documentation (CN)
```
All endpoints are prefixed with /api/v1/. Interactive docs at /api/schema/swagger-ui/.
```
POST /auth/login/             # Returns access + refresh JWT
POST /auth/token/refresh/     # Refresh access token
GET  /auth/me/                # Current user info
```

```
GET   /tenants/settings/      # Get tenant-level defaults (embedding_model, llm_model)
PATCH /tenants/settings/      # Update tenant-level defaults
```

```
GET    /knowledge-bases/                # List (paginated)
POST   /knowledge-bases/                # Create
PATCH  /knowledge-bases/{id}/           # Update settings
DELETE /knowledge-bases/{id}/           # Delete
POST   /knowledge-bases/{id}/rebuild/   # Async reindex with new embedding model
```

```
POST /knowledge-bases/{kbId}/documents/upload/         # Upload file (multipart)
POST /knowledge-bases/{kbId}/documents/import-url/     # Import URL
GET  /knowledge-bases/{kbId}/documents/{id}/chunks/    # Preview chunks
POST /knowledge-bases/{kbId}/documents/{id}/reindex/   # Re-trigger pipeline
```

```
GET    /knowledge-bases/{kbId}/faq/                # List FAQ items
POST   /knowledge-bases/{kbId}/faq/                # Create (auto-embeds)
POST   /knowledge-bases/{kbId}/faq/bulk-import/    # Bulk import
PATCH  /knowledge-bases/{kbId}/faq/{id}/           # Update
DELETE /knowledge-bases/{kbId}/faq/{id}/           # Delete
```

```
POST /retrieval/search/   # Vector search → returns top-K chunks with scores
POST /rag/prompt/         # Build RAG prompt (no LLM call)
POST /rag/answer/         # Full RAG: retrieve + LLM → answer + sources
```

Search request:
```json
{
  "query": "What is RAG?",
  "knowledge_base_id": "uuid",
  "top_k": 5,
  "score_threshold": 0.0
}
```

Answer request:
```json
{
  "query": "What is RAG?",
  "knowledge_base_id": "uuid",
  "top_k": 5,
  "llm_model": "claude-sonnet-4-6"
}
```

Answer response:
```json
{
  "answer": "RAG (Retrieval-Augmented Generation) is...",
  "sources": [
    { "source": "intro.pdf · page 3", "score": 0.921 }
  ],
  "usage": { "prompt_tokens": 1240, "completion_tokens": 187, "total_tokens": 1427 },
  "latency_ms": 1267
}
```

Backend `.env`:

```bash
# Django
DJANGO_SETTINGS_MODULE=config.settings.development
SECRET_KEY=your-secret-key

# PostgreSQL
DB_HOST=localhost
DB_PORT=5432
DB_DATABASE=demo
DB_USER=postgres
DB_PASSWORD=yourpassword
DB_SCHEMA=rag

# MinIO
MINIO_URL=localhost:19000
MINIO_USER=admin
MINIO_PASSWORD=admin123
MINIO_BUCKET=rag-documents

# Qdrant
VDB_HOST=localhost
VDB_PORT=6333
QDRANT_COLLECTION=document_chunks

# Redis / Celery
REDIS_URL=redis://:yourpassword@localhost:6379/0

# AI Keys
OPENAI_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
DEFAULT_LLM_MODEL=claude-sonnet-4-6
```

Frontend `.env`:

```bash
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=RAG Platform
```
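Assuming the dev server is running and you have a JWT from `POST /auth/login/`, a minimal client for the `/rag/answer/` endpoint might look like this; the URL, knowledge-base id, and token are placeholders:

```python
import json
import urllib.request

API = "http://localhost:8000/api/v1/rag/answer/"   # local dev server

payload = {
    "query": "What is RAG?",
    "knowledge_base_id": "uuid",                   # a real KB id in practice
    "top_k": 5,
    "llm_model": "claude-sonnet-4-6",
}

req = urllib.request.Request(
    API,
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <access-token>",  # from /auth/login/
    },
    method="POST",
)

# Uncomment with the full stack running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["answer"])
```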
```bash
cd backend

# Run all 27 tests
uv run pytest

# With reuse-db (faster on re-runs)
uv run pytest --reuse-db

# Specific file
uv run pytest src/tests/test_retrieval.py -v

# With coverage
uv run pytest --cov=apps --cov-report=html
```

Test strategy:
- External services (MinIO, Qdrant, LLMs, background tasks) are all mocked, so the tests run without any infrastructure
- A real PostgreSQL database is used, with a `test_`-prefixed name; `conftest.py` auto-creates the `rag` schema if it is missing
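The mocked-externals strategy can be sketched like this; the service function and client here are illustrative stand-ins, not the repo's actual modules:

```python
from unittest.mock import MagicMock


def search_chunks(client, vector, tenant_id, top_k=5):
    """Hypothetical retrieval helper: delegate to a vector-store client,
    always passing the tenant filter."""
    hits = client.query_points(vector=vector, top_k=top_k,
                               query_filter={"tenant_id": tenant_id})
    return [h["text"] for h in hits]


def test_search_uses_tenant_filter():
    # A MagicMock stands in for the Qdrant client, so no server is needed.
    client = MagicMock()
    client.query_points.return_value = [{"text": "chunk-1"}, {"text": "chunk-2"}]

    out = search_chunks(client, [0.1, 0.2], tenant_id="t1", top_k=2)

    assert out == ["chunk-1", "chunk-2"]
    client.query_points.assert_called_once_with(
        vector=[0.1, 0.2], top_k=2, query_filter={"tenant_id": "t1"})
```

The same pattern (patch the boundary, assert on the call) covers MinIO uploads, LLM completions, and Celery task dispatch.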
```
✓ test_auth.py             → login, token refresh, protected endpoints
✓ test_knowledge_bases.py  → CRUD, tenant isolation
✓ test_documents.py        → upload, URL import, chunk preview
✓ test_faq.py              → create, list, delete, bulk import
✓ test_retrieval.py        → vector search, RAG answer, error handling

27 passed in 4.5s
```
```bash
# Build images
docker build -t rag-backend ./backend
docker build -t rag-frontend ./frontend

# Run backend (point to your infra)
docker run -d --env-file backend/.env -p 8000:8000 rag-backend

# Run frontend
docker run -d -p 80:80 rag-frontend
```

| Page | Route | Description |
|---|---|---|
| Login | /login | Email + password, JWT stored in localStorage |
| Dashboard | /dashboard | Stats overview, KB summary, quick actions |
| Knowledge Bases | /knowledge-bases | Create/edit KBs, configure chunk size & top-K, reindex with model switching |
| Documents | /knowledge-bases/:id/documents | Upload files, import URLs, preview chunks |
| FAQ | /knowledge-bases/:id/faq | Manage Q&A pairs, view embedding status |
| Retrieval Test | /retrieval | Test vector search and RAG answers interactively |
| Jobs | /jobs | Monitor ingestion pipeline progress per document |
| Logs | /logs | Retrieval logs and RAG query logs with latency |
MIT License. See LICENSE for details.

Built with Django · React · Qdrant · OpenAI · Anthropic · Tailwind CSS