A fully local Retrieval-Augmented Generation (RAG) system built using Wikipedia .zim dumps, Ollama, Gemma 3 4B, embeddings, and FAISS.
This project extracts Wikipedia data from a .zim archive, cleans and chunks the text, generates embeddings, stores them in a vector database, and allows semantic question-answering locally using Gemma.
- Local Wikipedia semantic search
- Fully offline RAG pipeline
- Ollama integration
- Gemma 3 4B support
- FAISS vector search
- Chunk metadata filtering
- CLI chatbot interface
- ASCII loading animation
- Wikipedia .zim extraction using libzim
| Component | Technology |
|---|---|
| LLM | Gemma 3 4B |
| Runtime | Ollama |
| Embeddings | nomic-embed-text |
| Vector DB | FAISS |
| Dataset | Wikipedia .zim |
| Language | Python |
| Extraction | libzim |
Wikipedia .zim
↓
Extract Articles
↓
Clean Text
↓
Chunk Text
↓
Generate Embeddings
↓
Store in FAISS
↓
User Query
↓
Retrieve Relevant Chunks
↓
Send Context to Gemma
↓
Generate Answer
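The retrieval step in the flow above amounts to a nearest-neighbour search over embedding vectors. The following is a minimal pure-Python sketch of that idea, not the project's FAISS code: it is a brute-force stand-in for what a flat inner-product index (such as FAISS's `IndexFlatIP`) computes on normalized vectors. The function names and the toy 3-dimensional vectors are illustrative assumptions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query_vec, chunk_vecs, k=3):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    q = normalize(query_vec)
    scores = []
    for i, vec in enumerate(chunk_vecs):
        v = normalize(vec)
        scores.append((i, sum(a * b for a, b in zip(q, v))))
    # Highest cosine similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy "embeddings"; real embedding vectors have hundreds of dimensions.
chunk_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
results = top_k([1.0, 0.0, 0.0], chunk_vectors, k=2)
print(results)
```

A vector index does the same ranking far faster at scale; the brute-force version is only useful for understanding the scores or for testing on a handful of chunks.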
git clone https://github.com/your-username/your-repo.git
cd your-repo
python -m venv .venv
Activate on Windows:
.venv\Scripts\activate
Activate on Linux/macOS:
source .venv/bin/activate
pip install libzim numpy requests tqdm notebook jupyter
Optional:
pip install sentence-transformers torch
Download Ollama: https://ollama.com/download
Pull models:
ollama pull gemma3:4b
ollama pull nomic-embed-text
Download a Wikipedia .zim file from:
Example:
- Wikipedia English
- Wikipedia Mini
- Custom datasets
The pipeline:
- Opens the .zim archive using libzim
- Extracts articles
- Cleans HTML/text
- Chunks text into overlapping segments
- Generates embeddings
- Stores embeddings in FAISS
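The "Cleans HTML/text" step can be sketched with the standard library alone. This is an illustrative version under assumptions, not the project's cleaner: the class and function names are hypothetical, and it only drops `<script>`/`<style>` content and collapses whitespace. It accepts a `memoryview` because that is what `item.content` is reported to return.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_html(raw):
    """Decode raw article content (bytes or memoryview) and strip the markup."""
    html = bytes(raw).decode("utf-8", errors="ignore")
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


sample = memoryview(
    b"<html><style>p{color:red}</style><p>Black  holes</p>\n<p>bend light.</p></html>"
)
print(clean_html(sample))
```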
chunk_size = 300
overlap = 50
Each chunk stores:
- text
- metadata
- chunk length
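A minimal sketch of overlapping chunking with these settings follows. It assumes `chunk_size` and `overlap` are counted in whitespace-separated words (the source does not say whether the unit is words, characters, or tokens), and the metadata fields `title` and `start_word` are hypothetical examples of what the metadata might hold.

```python
def chunk_text(text, title, chunk_size=300, overlap=50):
    """Split text into overlapping word windows, each stored as a dict."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "text": " ".join(window),
            "metadata": {"title": title, "start_word": start},
            "length": len(window),
        })
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


article = " ".join(f"w{i}" for i in range(700))
chunks = chunk_text(article, title="Example")
print([(c["metadata"]["start_word"], c["length"]) for c in chunks])
```

With a 700-word input, each window starts 250 words after the previous one, so the last 50 words of one chunk reappear as the first 50 of the next.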
The system:
- retrieves more chunks than required
- filters low-quality chunks
- limits context size before generation
This improves:
- answer quality
- retrieval relevance
- context efficiency
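The over-retrieve, filter, and limit strategy described above might look like the sketch below. The thresholds (`min_length`, `max_chars`), the function name, and the use of chunk length as the quality signal are assumptions for illustration; the project's actual filter criteria are not specified.

```python
def select_context(candidates, top_k=3, min_length=20, max_chars=1200):
    """Pick the best chunks from an over-retrieved candidate list.

    candidates: list of (score, chunk) pairs, where chunk is a dict with
    "text" and "length" keys and a higher score means more relevant.
    """
    selected, used = [], 0
    for score, chunk in sorted(candidates, key=lambda c: c[0], reverse=True):
        if chunk["length"] < min_length:
            continue  # too short to carry useful context
        if used + len(chunk["text"]) > max_chars:
            continue  # would overflow the context budget
        selected.append(chunk)
        used += len(chunk["text"])
        if len(selected) == top_k:
            break
    return selected


candidates = [
    (0.91, {"text": "A black hole is a region of spacetime with extreme gravity.", "length": 11}),
    (0.88, {"text": "See also", "length": 2}),
    (0.80, {"text": "x" * 2000, "length": 300}),
    (0.75, {"text": "Light cannot escape from inside the event horizon.", "length": 8}),
]
context = select_context(candidates, top_k=2, min_length=5, max_chars=500)
print([c["text"][:20] for c in context])
```

Retrieving more candidates than `top_k` matters here: if only two had been fetched, the stub "See also" chunk would have taken one of the slots.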
python rag_chat.py
Example:
Ask: What is a black hole?
Thinking... ⠸
Answer:
A black hole is a region of spacetime where gravity is so strong that nothing, including light, can escape.
Different versions of libzim exposed different APIs:
- missing iter_entries
- missing get_entry_by_id
- different entry handling
Used:
zim.get_random_entry()
with:
- deduplication
- redirect filtering
- namespace filtering
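The random-sampling workaround can be sketched as a loop that keeps drawing entries until enough unique articles are collected. This version is written against any object that offers `get_random_entry()` and entries with `path` and `is_redirect`, so it runs here with small stand-in classes instead of a real archive. `FakeEntry`, `FakeArchive`, the `path_prefix` parameter, and the `A/` namespace prefix are assumptions for illustration; whether paths carry a namespace prefix depends on the .zim file.

```python
import random


def sample_entries(archive, wanted, max_attempts=10_000, path_prefix=""):
    """Draw random entries until `wanted` unique articles are collected."""
    seen, entries = set(), []
    for _ in range(max_attempts):
        if len(entries) >= wanted:
            break
        entry = archive.get_random_entry()
        if entry.is_redirect:
            continue  # redirect filtering
        if not entry.path.startswith(path_prefix):
            continue  # namespace filtering
        if entry.path in seen:
            continue  # deduplication
        seen.add(entry.path)
        entries.append(entry)
    return entries


class FakeEntry:
    """Stand-in for an archive entry (illustration only)."""

    def __init__(self, path, is_redirect=False):
        self.path = path
        self.is_redirect = is_redirect


class FakeArchive:
    """Stand-in for an opened archive (illustration only)."""

    def __init__(self, entries, seed=0):
        self._entries = entries
        self._rng = random.Random(seed)

    def get_random_entry(self):
        return self._rng.choice(self._entries)


archive = FakeArchive([
    FakeEntry("A/Black_hole"),
    FakeEntry("A/Star"),
    FakeEntry("A/Sun", is_redirect=True),
    FakeEntry("I/logo.png"),
])
picked = sample_entries(archive, wanted=2, path_prefix="A/")
print(sorted(e.path for e in picked))
```

The `max_attempts` cap matters because random sampling cannot tell when the archive is exhausted; without it, asking for more articles than exist would loop forever.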
item.content returned memoryview instead of bytes.
bytes(item.content).decode("utf-8", errors="ignore")
Large chunks exceeded embedding model context limits.
- reduced chunk size
- added hard text trimming before embedding
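Hard trimming before embedding can be as simple as the sketch below. The function name and the `max_chars` default are hypothetical: the real limit is measured in tokens and depends on the embedding model, so a character cap is only a conservative approximation.

```python
def trim_for_embedding(text, max_chars=2000):
    """Cap text at max_chars, cutting at the last space when possible."""
    text = " ".join(text.split())  # normalize whitespace first
    if len(text) <= max_chars:
        return text
    # Look for a word boundary at or before the limit.
    cut = text.rfind(" ", 0, max_chars + 1)
    return text[:cut] if cut > 0 else text[:max_chars]


print(trim_for_embedding("aaa bbb ccc", max_chars=7))
```

Cutting at a word boundary avoids sending a half-word to the embedding model; if the text contains no space within the limit, the function falls back to a plain character cut.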
Sometimes Ollama returned:
{"error": "..."}
instead of:
{"response": "..."}
Added validation and debugging for API responses.
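One way to validate the response is to check the payload before reading the `response` field. The sketch below uses only the standard library (`urllib`) instead of `requests` so that it is self-contained. The helper names, the default model tag, and the timeout are assumptions; the URL is Ollama's default local generate endpoint.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def extract_answer(payload):
    """Return the generated text, or raise a descriptive error."""
    if not isinstance(payload, dict):
        raise ValueError(f"unexpected payload type: {type(payload).__name__}")
    if "error" in payload:
        raise RuntimeError(f"Ollama error: {payload['error']}")
    if "response" not in payload:
        raise KeyError(f"no 'response' field; keys were {sorted(payload)}")
    return payload["response"]


def generate(prompt, model="gemma3:4b"):
    """Send a non-streaming generate request and validate the reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=120) as reply:
            payload = json.load(reply)
    except urllib.error.HTTPError as err:
        # An error body may arrive with a non-2xx status; read it as well.
        payload = json.load(err)
    return extract_answer(payload)
```

Keeping the validation in `extract_answer` separates it from the network call, so it can be exercised with plain dictionaries while debugging.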
Interactive loops inside Jupyter notebooks behaved inconsistently.
Moved chatbot loop into standalone .py script.
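A standalone chat loop with a simple ASCII spinner could be structured as in the sketch below. It is not the contents of `rag_chat.py`: the function names, the `exit`/`quit` commands, and the injectable `read`/`write` parameters are assumptions that keep the loop easy to test outside a terminal.

```python
import itertools
import sys
import threading
import time


def with_spinner(task, label="Thinking...", stream=sys.stdout):
    """Run task() while a background thread animates an ASCII spinner."""
    done = threading.Event()

    def spin():
        for frame in itertools.cycle("|/-\\"):
            if done.is_set():
                break
            stream.write(f"\r{label} {frame}")
            stream.flush()
            time.sleep(0.1)
        # Erase the spinner line before the answer is printed.
        stream.write("\r" + " " * (len(label) + 2) + "\r")
        stream.flush()

    thread = threading.Thread(target=spin, daemon=True)
    thread.start()
    try:
        return task()
    finally:
        done.set()
        thread.join()


def chat_loop(answer_fn, read=input, write=print):
    """Prompt for questions until the user types exit/quit or sends EOF."""
    while True:
        try:
            question = read("Ask: ").strip()
        except (EOFError, KeyboardInterrupt):
            break
        if question.lower() in {"exit", "quit"}:
            break
        if not question:
            continue
        answer = with_spinner(lambda: answer_fn(question))
        write("Answer:")
        write(answer)
```

In a script, the loop would be started by calling `chat_loop` with whatever function runs retrieval and generation for a question.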
- FAISS installation issues on some Windows setups
- Ollama embeddings become slow at very large scales
- Current pipeline still loads large chunk lists into memory
- Retrieval quality can still be improved with reranking
- Switch embeddings to sentence-transformers
- Streaming dataset processing
- SQLite / JSONL chunk storage
- Better FAISS indexes (IVF / HNSW)
- Web UI
- Citation-aware answers
- Hybrid retrieval
- Multi-threaded embedding generation