Merged

Dev #1228
56 changes: 55 additions & 1 deletion docs/getting-started/essentials.mdx
Work through in order, or jump to the section you need:
4. [**Context management**](#context-management)
5. [**Basic RAG**](#basic-rag)
6. [**Open Terminal**](#open-terminal)
7. [**Troubleshooting**](#troubleshooting)

:::tip Deploying for a team?
If you are setting up Open WebUI for multiple users, also read the [**Scaling Open WebUI**](/getting-started/advanced-topics/scaling) guide. It covers infrastructure decisions (PostgreSQL, Redis, external vector databases, shared storage) that are separate from the feature-level essentials on this page. The two guides are additive — work through the essentials here for day-to-day usage, and the scaling guide for multi-user infrastructure.
:::

---

If you are running an older local model or a fine-tune that does not expose a fu

Many of the tools people look for are already built into Open WebUI and just need to be turned on: **web search**, **code execution**, **image generation**, **memory**, and **knowledge-base retrieval** are all available without installing any plugins. Once enabled, these appear automatically as system tools when using Native Mode.

Most of these need a small amount of setup (choosing a provider, adding an API key, or enabling a toggle). Setup guides for the most popular ones:

- [**Web Search**](/features/chat-conversations/web-search/) — connect a search provider (Google, Brave, DuckDuckGo, SearXNG, and many more) so the model can look things up
- [**Image Generation**](/features/chat-conversations/image-generation-and-editing/usage) — connect an image provider (OpenAI DALL-E, ComfyUI, Automatic1111, etc.) for in-chat image creation
- [**Code Execution**](/features/chat-conversations/chat-features/code-execution/) — run code blocks directly in chat (Pyodide runs in-browser by default, or connect Jupyter for server-side execution)
- [**Memory**](/features/chat-conversations/memory) — let the model remember facts about you across conversations

For anything not built in, the [**Open WebUI Community site**](https://openwebui.com/) is worth browsing. A few categories to give a sense of what is available:

- **Observability / cost tracking**: Langfuse, OpenLit, Portkey. Log every chat turn, token usage, and latency to your own stack.
The defaults are reasonable for getting started. When you outgrow them, there ar

- **Embedding engine.** The default (SentenceTransformers `all-MiniLM-L6-v2`) runs locally on CPU and consumes roughly 500 MB of RAM per worker. For any multi-user deployment, point at an external embeddings API (OpenAI, or Ollama with `nomic-embed-text`) via `RAG_EMBEDDING_ENGINE`.
- **Content extraction engine.** The default uses `pypdf`, which leaks memory during heavy ingestion. For anything beyond casual use, switch to **Tika** or **Docling** via `CONTENT_EXTRACTION_ENGINE`.
- **Vector database.** The default ChromaDB (local SQLite-backed) does not tolerate multi-worker deployments. At scale, switch to **PGVector** — it is the only vector database officially supported and maintained by the Open WebUI team. Milvus, Qdrant, and MariaDB Vector are also available but are community-maintained: they may break on upgrades, and fixes depend on community contributions. See the [env-configuration reference](/reference/env-configuration#vector_db) for setup and the community disclaimers on each provider.

:::note When to worry about this
None of these matter for "a single user with a handful of PDFs." All of them start mattering the moment you have 100 documents or 10 concurrent users.
:::
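As an illustrative sketch, the swaps above are driven by environment variables. The variable names below are believed correct but should be verified against the [env-configuration reference](/reference/env-configuration) before use — treat this as an assumption, not a verified config:

```shell
# Illustrative .env fragment for moving off the single-user defaults.
# Verify each variable name against the env-configuration reference.

# Content extraction: replace pypdf with a Tika server
CONTENT_EXTRACTION_ENGINE=tika
TIKA_SERVER_URL=http://tika:9998

# Embeddings: offload from in-process CPU to Ollama
RAG_EMBEDDING_ENGINE=ollama
RAG_EMBEDDING_MODEL=nomic-embed-text

# Vector database: switch from local ChromaDB to PGVector
VECTOR_DB=pgvector
PGVECTOR_DB_URL=postgresql://user:password@db:5432/openwebui
```

Hostnames and credentials here are placeholders for a Docker Compose setup; substitute your own.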

### Recommended starting config

If you just want RAG to work well out of the box, these settings are a solid general-purpose starting point. They are not fine-tuned for every use case, but they will produce noticeably better results than the defaults for most document types.

Set these in **Admin Panel > Settings > Documents**:

| Setting | Recommended value | Default | Why |
|---------|-------------------|---------|-----|
| **Text Splitter** | `token` | `character` | Token-based splitting produces more consistent chunk sizes across document types |
| **Markdown Header Splitting** | **On** | On | Respects document structure by splitting at headings, keeping sections coherent |
| **Chunk Size** | `2000` | `1000` | Larger chunks preserve more surrounding context per retrieval hit |
| **Chunk Overlap** | `200` | `100` | More overlap means less chance of cutting a key sentence in half |
| **Top K** | `15` | `3` | Retrieves more candidate chunks, giving the model a wider pool of relevant context. If you are working with local models that have constrained context sizes, lower this to `5` to avoid filling the context window with retrieved chunks |
| **Embedding Model** | External (OpenAI or Ollama) | `all-MiniLM-L6-v2` (local CPU) | The default works for a single user but consumes ~500 MB RAM per worker. For any multi-user setup, use an external embedding API instead |

:::tip Embedding model
The default SentenceTransformers model runs locally on CPU and is fine for a single user getting started. For anything beyond that, point at an external embeddings API: set `RAG_EMBEDDING_ENGINE=openai` with an OpenAI API key, or `RAG_EMBEDDING_ENGINE=ollama` with any Ollama embedding model (e.g., `nomic-embed-text`). This offloads the work and frees significant RAM.
:::
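To see why the chunk size and overlap settings interact, here is a minimal sketch of token-based splitting with overlap. This is illustrative only — not Open WebUI's actual splitter:

```python
def chunk_tokens(tokens, chunk_size=2000, overlap=200):
    """Split a token list into overlapping chunks.

    Each chunk starts `chunk_size - overlap` tokens after the previous
    one, so every boundary region appears in two chunks and a sentence
    cut at one chunk's edge survives intact at the start of the next.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# A 5,000-token document with the recommended 2000/200 settings
chunks = chunk_tokens(list(range(5000)))
```

With these settings, the last 200 tokens of each chunk are repeated as the first 200 tokens of the next — that duplication is the price of never cutting a key sentence in half.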

More detail:
- [RAG overview](/features/chat-conversations/rag/)
- [Knowledge workspace](/features/workspace/knowledge)
- [Performance tuning for RAG](/troubleshooting/performance#embedding-engine)
- [Scaling: external vector database](/getting-started/advanced-topics/scaling#step-4--switch-to-an-external-vector-database) — required for multi-worker and multi-replica deployments
- [Scaling: content extraction & embeddings](/getting-started/advanced-topics/scaling#step-6--fix-content-extraction--embeddings) — fixing memory leaks at scale

---

Everything else (enterprise SSO, multi-replica HA, Redis scaling, observability)

---

## Troubleshooting

When something goes wrong, start here:

| Having problems with... | Read this |
|---|---|
| Connection refused, 401 errors, CORS failures, WebSocket disconnects | [Connection Errors](/troubleshooting/connection-error) |
| "Prompt is too long" or context window exceeded | [Context Window / Prompt Too Long](/troubleshooting/context-window) |
| RAG not returning relevant results, uploads failing, knowledge base issues | [RAG Troubleshooting](/troubleshooting/rag) |
| Web search not working or returning poor results | [Web Search Troubleshooting](/troubleshooting/web-search) |
| Image generation errors or provider setup | [Image Generation Troubleshooting](/troubleshooting/image-generation) |
| Speech-to-text, text-to-speech, or audio playback | [Audio Troubleshooting](/troubleshooting/audio) |
| SSO, OAuth, or LDAP login issues | [SSO & OAuth Troubleshooting](/troubleshooting/sso) |
| High memory usage, slow responses, or worker crashes | [Performance & RAM](/troubleshooting/performance) · [Scaling Guide](/getting-started/advanced-topics/scaling) |
| Login loops, config drift, or database locks in multi-replica setups | [Scaling & HA Troubleshooting](/troubleshooting/multi-replica) · [Scaling Guide](/getting-started/advanced-topics/scaling) |
| Locked out of admin account | [Reset Admin Password](/troubleshooting/password-reset) |
| TLS certificate errors with custom/internal CAs | [Custom CA Store](/troubleshooting/custom-ca) |
| Alembic migration errors or manual schema fixes | [Database Migration](/troubleshooting/manual-database-migration) |

---

## Questions?

This page is the condensed version. The full docs go much deeper. If you did not find what you needed:
69 changes: 69 additions & 0 deletions docs/troubleshooting/rag.mdx
Retrieval-Augmented Generation (RAG) enables language models to reason over exte

This page covers the most common RAG problems and their solutions.

---

## Recommended Settings by Scenario

RAG performance depends heavily on your model setup. The defaults are conservative to work everywhere, but you will get much better results by tuning them to your specific deployment. All of these are configured in **Admin Panel > Settings > Documents**.

### Local models with constrained context (≤ 8K tokens)

Typical for Ollama models on consumer hardware (Llama 3.1 8B, Qwen 2.5 7B, Gemma 3 4B, Phi-3 Mini, etc.) where context windows are limited.

| Setting | Value | Why |
|---------|-------|-----|
| **Text Splitter** | `token` | Consistent chunk sizes aligned to actual token counts |
| **Markdown Header Splitting** | **On** | Preserves document structure |
| **Chunk Size** | `1000` | Smaller chunks so each retrieval result fits comfortably |
| **Chunk Overlap** | `100` | Standard overlap |
| **Top K** | `3–5` | Fewer chunks to avoid blowing out your limited context budget |
| **Full Context Mode** | Off | Cannot afford to send full documents |

:::tip Context budget math
With a chunk size of 1000 tokens and Top K of 5, RAG will inject roughly 5,000 tokens of retrieved content. Add your system prompt and conversation history — if your model has an 8K context window, that leaves about 2–3K tokens for the conversation itself. Adjust Top K downward if follow-up questions get cut off.
:::
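The budget arithmetic above can be written out explicitly. The system-prompt figure is an assumed typical value, and real token counts vary by tokenizer:

```python
# Rough context-budget check for the local-model settings above.
# All numbers are approximations; actual counts vary by tokenizer.
chunk_size = 1000       # tokens per retrieved chunk
top_k = 5               # chunks injected per query
system_prompt = 500     # assumed: system prompt + RAG template
context_window = 8192   # an 8K local model

retrieved = chunk_size * top_k                      # ~5,000 tokens of documents
remaining = context_window - retrieved - system_prompt
print(f"left for conversation: ~{remaining} tokens")  # ~2,700 tokens
```

If `remaining` gets close to zero after a few conversation turns, Top K is the first dial to turn down.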

### Cloud models with large context (32K+ tokens)

Typical for OpenAI (GPT-4o, GPT-5), Anthropic (Claude), Google (Gemini), DeepSeek, and other API providers where context windows are 32K–1M+ tokens.

| Setting | Value | Why |
|---------|-------|-----|
| **Text Splitter** | `token` | Consistent chunk sizes |
| **Markdown Header Splitting** | **On** | Preserves document structure |
| **Chunk Size** | `2000` | Larger chunks preserve more surrounding context per hit |
| **Chunk Overlap** | `200` | More overlap prevents sentence splits |
| **Top K** | `15–25` | Cast a wide net — you have the context budget for it |
| **Full Context Mode** | Consider for small docs | If a document is under ~20K tokens, full context can outperform retrieval |

:::tip When to use Full Context Mode
If you are working with a single small-to-medium document (under ~50 pages) and a large-context model, Full Context Mode often gives better results than chunked retrieval because the model sees everything with no retrieval gaps. Toggle it per-chat in the chat settings.
:::

### Mixed environment (local + cloud)

If you use both local and cloud models, configure for the lowest common denominator and override per-model or per-chat:

| Setting | Value | Why |
|---------|-------|-----|
| **Text Splitter** | `token` | Works for both |
| **Markdown Header Splitting** | **On** | Works for both |
| **Chunk Size** | `1500` | Compromise — not too large for local, not too small for cloud |
| **Chunk Overlap** | `200` | Safe default |
| **Top K** | `10` | Moderate — a good middle ground for mixed model usage |

---

## Embedding Model Recommendations

The embedding model determines retrieval quality. The right choice depends on your deployment:

| Scenario | Recommended | Why |
|----------|-------------|-----|
| **Single user, getting started** | `all-MiniLM-L6-v2` (default) | Runs locally, no setup needed |
| **Multi-user, Ollama available** | `nomic-embed-text` via Ollama | Offloads to Ollama, frees Open WebUI RAM |
| **Production, API budget available** | `text-embedding-3-small` via OpenAI | Best retrieval quality per dollar |
| **Air-gapped / self-hosted** | `nomic-embed-text` or `mxbai-embed-large` via Ollama | No external API calls needed |

After changing your embedding model, **reindex all existing documents** in **Admin Panel > Settings > Documents** for the new embeddings to take effect.
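Conceptually, retrieval scores the query vector against every stored chunk vector and keeps the Top K. The sketch below (not Open WebUI's implementation) shows why reindexing is mandatory after an embedding-model change — similarity scores are only meaningful when query and document vectors come from the same model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k stored chunks most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-dimensional "embeddings" of three chunks
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k([1.0, 0.0], docs, k=2))  # indices of the two most aligned chunks
```

Mixing vectors from two different embedding models in one index makes these scores incomparable, which is why stale documents silently stop being retrieved until you reindex.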

---

## Common RAG Issues and How to Fix Them

### 1. The Model "Can't See" Your Content