
docs: add Streamlit integration cookbook #2821

Open
umutdinceryananer wants to merge 50 commits into langfuse:main from umutdinceryananer:docs/streamlit-cookbook

Conversation

umutdinceryananer commented Apr 19, 2026

What

Adds a Streamlit integration cookbook covering end-to-end Langfuse
tracing in a streaming chat app — sessions, user identification,
user-feedback scores, and live routing between OpenAI and Anthropic
with provider-tagged traces.

  • Cookbook notebook: cookbook/integration_streamlit.ipynb
  • Rendered guide: content/integrations/other/streamlit.mdx
  • Route entry: cookbook/_routes.json (isGuide: false)
  • Sidebar entry: content/integrations/other/meta.json
  • Logo: public/images/integrations/streamlit_icon.svg
  • Runnable companion: examples/streamlit-demo/

Why

Streamlit is one of the most common ways Python devs ship LLM UIs, but
there was no first-party Langfuse walkthrough. The guide mirrors the
structure of the existing Gradio cookbook so readers can jump between
the two without re-learning the layout.

Covered features (sketched below)

  • @observe decorator wrapping a generator-based streaming handler
  • langfuse.openai drop-in + AnthropicInstrumentor for auto-tracing
  • propagate_attributes(session_id, user_id, tags=[provider])
  • Per-tab session scoping via st.session_state
  • 👍 / 👎 feedback wired to create_score with CATEGORICAL data type
  • Live OpenAI (gpt-4o-mini) / Anthropic (claude-sonnet-4-6) switch
    via sidebar selector, with the provider tagged on every trace
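For reviewers who want the shape of the integration without checking out the branch, here is a condensed sketch of how these pieces fit together. It is a sketch only: names like stream_reply and trace_holder come from this PR's description, and the real app.py may differ in detail.

```python
import uuid

import streamlit as st
from langfuse import get_client, observe, propagate_attributes
from langfuse.openai import OpenAI  # drop-in wrapper: completion calls are auto-traced

langfuse = get_client()
openai_client = OpenAI()

# One Langfuse session per browser tab; session_state survives Streamlit reruns.
if "session_id" not in st.session_state:
    st.session_state.session_id = str(uuid.uuid4())
if "messages" not in st.session_state:
    st.session_state.messages = []


@observe()
def stream_reply(history, trace_holder):
    # A generator cannot return a second value, so the trace_id is surfaced
    # through a mutable list supplied by the caller.
    trace_holder.append(langfuse.get_current_trace_id())
    stream = openai_client.chat.completions.create(
        model="gpt-4o-mini", messages=history, stream=True
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    trace_holder = []
    with propagate_attributes(
        session_id=st.session_state.session_id,
        user_id=st.session_state.get("user_id", "anonymous"),
        tags=["openai"],
    ):
        reply = st.write_stream(stream_reply(st.session_state.messages, trace_holder))
    trace_id = trace_holder[0] if trace_holder else None  # guard, see review below
    st.session_state.messages.append(
        {"role": "assistant", "content": reply, "trace_id": trace_id}
    )
```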

Notes for maintainers

  • Runnable companion. Streamlit apps can't render inside a notebook
    the way Gradio does, so the cookbook links out to examples/streamlit-demo/
    as a one-command run. If the convention here is docs-only, I can move
    the code inline and drop the folder.
  • Screenshots. "Explore data in Langfuse" reuses existing official
    docs assets (/images/docs/session.png, /images/docs/users-list.png)
    since the dashboard views are framework-agnostic. Happy to capture
    Streamlit-specific traces and host them under
    static.langfuse.com/cookbooks/streamlit/..., or link a public demo
    trace, if you'd prefer.

Test plan

  • pnpm link-check passes for content/integrations/other/streamlit.mdx
  • examples/streamlit-demo/app.py runs against Langfuse Cloud; traces, sessions, users, feedback scores, and provider tags all populate
  • pnpm dev renders the integration page with all assets resolving

Disclaimer: Experimental PR review

Greptile Summary

This PR adds a first-party Streamlit integration cookbook covering end-to-end Langfuse tracing (sessions, user identification, user-feedback scores, multi-provider routing) along with a runnable companion app in examples/streamlit-demo/.

  • P1 — IndexError risk: trace_holder[0] in app.py (line 161) and the matching MDX snippet are accessed unconditionally. If stream_reply fails before its generator body executes, trace_holder is empty and the next st.rerun() will crash with IndexError. A guard (trace_holder[0] if trace_holder else None) fixes this safely.

Confidence Score: 4/5

Safe to merge after guarding the trace_holder access; everything else is documentation/style.

One P1 defect: unchecked index access on trace_holder will crash the app on any streaming failure. The two P2 findings (missing else branch in setup snippet, mid-document import) are minor docs inconsistencies. Fixing the P1 guard is a one-liner.

Affected code: examples/streamlit-demo/app.py (line 161) and the matching block in content/integrations/other/streamlit.mdx

Important Files Changed

  • examples/streamlit-demo/app.py: Runnable Streamlit app with Langfuse tracing; potential IndexError on line 161 if trace_holder is empty after a streaming failure
  • content/integrations/other/streamlit.mdx: Main integration guide; setup snippet silently ignores auth failures and has a mid-document import that diverges from project conventions
  • cookbook/integration_streamlit.ipynb: Notebook version of the guide; code mirrors the MDX and composes cleanly into a single app.py
  • examples/streamlit-demo/requirements.txt: Dependencies match the install instructions in the guide; no pinned versions other than langfuse>=4.0.0
  • content/integrations/other/meta.json: Adds streamlit entry to the integrations sidebar in alphabetical order; looks correct
  • cookbook/_routes.json: Adds integration_streamlit.ipynb route with docsPath and isGuide:false; consistent with adjacent entries
  • examples/streamlit-demo/README.md: Clear setup and run instructions with links back to the integration guide
  • examples/streamlit-demo/.env.example: Template env file with placeholder values for all required keys; no secrets present
  • public/images/integrations/streamlit_icon.svg: SVG logo for the sidebar entry; no issues

Sequence Diagram

sequenceDiagram
    participant User as Browser Tab
    participant ST as Streamlit UI
    participant LF as Langfuse (@observe)
    participant OAI as OpenAI / Anthropic

    User->>ST: types prompt
    ST->>ST: append user message to session_state
    ST->>LF: propagate_attributes(session_id, user_id, tags)
    ST->>LF: stream_reply() [generator, @observe]
    LF->>LF: start trace, capture trace_id to trace_holder
    LF->>OAI: streaming API call (langfuse.openai or AnthropicInstrumentor)
    OAI-->>ST: stream chunks via st.write_stream
    ST->>ST: append assistant message + trace_id
    User->>ST: clicks thumbs up or thumbs down
    ST->>LF: create_score(trace_id, positive/negative, CATEGORICAL)
    LF-->>ST: flush and ack
Prompt To Fix All With AI
This is a comment left during a code review.
Path: examples/streamlit-demo/app.py
Line: 160-162

Comment:
**`trace_holder[0]` IndexError on empty list**

`trace_holder[0]` is accessed unconditionally after `st.write_stream`. If `stream_reply` raises before `trace_holder.append(...)` executes (e.g., a Langfuse initialisation error inside the `@observe` wrapper, or an exception thrown before the generator body is entered), `trace_holder` is empty and this line crashes the app. Even a safe path where `get_current_trace_id()` returns `None` would silently pass `None` to `create_score`, causing an API error on feedback.

```suggestion
        trace_id = trace_holder[0] if trace_holder else None
        st.session_state.messages.append(
            {"role": "assistant", "content": reply, "trace_id": trace_id}
        )
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: content/integrations/other/streamlit.mdx
Line: 71-77

Comment:
**Auth-failure branch missing in snippet**

The setup snippet's `init_langfuse()` only prints on success; it silently does nothing when `auth_check()` returns `False`. The full `app.py` (line 25) includes the `else` branch that tells users what to fix. The snippet is what most readers will copy first, so omitting the failure path makes misconfiguration invisible.

```suggestion
@st.cache_resource
def init_langfuse():
    client = get_client()
    if client.auth_check():
        print("Langfuse client authenticated")
    else:
        print("Langfuse authentication failed — check LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_BASE_URL")
    atexit.register(client.flush)
    return client
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: content/integrations/other/streamlit.mdx
Line: 133

Comment:
**Inline import mid-document**

`from langfuse import observe, propagate_attributes` appears in a mid-page code block rather than at the top of the file. Readers who stitch the snippets together into one script may end up with mid-module imports. The project rule is to move all imports to the top of the module. Consider moving this line into the initial "Setup" imports block alongside the other `langfuse` imports, and noting in the prose that both names are imported there.

**Rule Used:** Move imports to the top of the module instead of p... ([source](https://app.greptile.com/review/custom-context?memory=c960fc07-9928-409f-a18b-a780cbdded12))

**Learned From**
[langfuse/langfuse-python#1387](https://github.com/langfuse/langfuse-python/pull/1387)

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "chore: remove stray .claude/settings.loc..."

Greptile also left 2 inline comments on this PR.

Context used:

  • Rule used - Move imports to the top of the module instead of p... (source)

Learned From
langfuse/langfuse-python#1387

Commits

Anchors examples/streamlit-demo/ so subsequent commits can add the
chat app, notebook sources, and README. Ignores venv, .env,
__pycache__, and .ipynb_checkpoints for local dev.

Closes #12
Pins langfuse to the SDK v3 line (OpenTelemetry-based) so later
commits can rely on get_client() and @observe from langfuse
directly. streamlit, openai, and python-dotenv float to latest.

Closes #13
.env.example lists Langfuse credentials (public key, secret key,
base URL for EU vs US region selection) and the OpenAI API key
needed by the baseline chat app. Users copy to .env and fill in.

Closes #14
Baseline Streamlit app using st.chat_input and st.chat_message,
with a plain OpenAI call against gpt-4o-mini. Chat history
persists across reruns via st.session_state. No Langfuse wiring
yet — subsequent commits layer in tracing, sessions, feedback.

Closes #15
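As a rough sketch of the baseline this commit describes (the committed app.py may differ in detail):

```python
import streamlit as st
from openai import OpenAI  # plain client, no tracing yet

client = OpenAI()

if "messages" not in st.session_state:
    st.session_state.messages = []  # survives Streamlit reruns

# Replay chat history on every rerun.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=st.session_state.messages
    )
    st.session_state.messages.append(
        {"role": "assistant", "content": resp.choices[0].message.content}
    )
```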
Covers prerequisites, venv setup, dependency install, env file
creation, and the streamlit run command. Notes that Langfuse
credentials stay as placeholders until later commits wire up
tracing.

Closes #16
Wraps get_client() + auth_check() in @st.cache_resource so the
credential check runs once per session instead of on every
Streamlit rerun. No tracing yet — this commit only verifies the
client can authenticate against the configured Langfuse instance.

Closes #17
Swap `from openai import OpenAI` for `from langfuse.openai import OpenAI`.
Every chat.completions.create call now emits a trace + generation to
Langfuse without any further code changes.
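The swap is a single import line (sketch):

```python
# Before: from openai import OpenAI
# After: same API surface, but every call emits a Langfuse generation.
from langfuse.openai import OpenAI

client = OpenAI()  # used exactly as before
```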
Extract the OpenAI call into `generate_reply` and decorate it with
`@observe()`. The trace now has a `generate_reply` parent span with the
auto-instrumented OpenAI generation nested as a child, giving a clean
request-level grouping instead of a flat generation.
Register `client.flush` with `atexit` inside the cached initializer so
any pending spans in the OpenTelemetry buffer are exported before the
Streamlit process tears down. Prevents trace loss on Ctrl+C / hot-reload.
Generate a uuid4 on first page load and keep it in Streamlit's
session_state so it survives reruns within the same browser tab.
Sets up the identifier that will be attached to Langfuse traces in the
next commit.
Pass the session_state uuid into `generate_reply` and call
`langfuse.update_current_trace(session_id=...)` inside the @observe
span. All traces from one browser tab now group under a single session
in the Langfuse Sessions view.
Langfuse SDK v4 removed `update_current_trace` on the client. Trace-level
attributes (session_id, user_id, tags, metadata) are now set via the
`propagate_attributes` context manager, which must wrap span creation so
inheritance works correctly.

Wrap the `generate_reply` call in `propagate_attributes(session_id=...)`
so the @observe span inherits the session_id at creation time.
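A minimal before/after, assuming the v4 semantics described in this commit (generate_reply and session_id as defined in earlier commits):

```python
from langfuse import propagate_attributes  # SDK v4

# v3 (removed in v4): langfuse.update_current_trace(session_id=session_id)
# v4: attributes must be in scope *before* the @observe span is created,
# so children inherit them.
with propagate_attributes(session_id=session_id):
    reply = generate_reply(messages)
```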
Adds a sidebar reset button that clears the chat history and generates a
fresh session_id. Lets users start a new Langfuse session on demand
within the same browser tab without refreshing the page.
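Roughly (sketch, assuming the imports from the earlier snippets):

```python
if st.sidebar.button("New conversation"):
    st.session_state.messages = []
    st.session_state.session_id = str(uuid.uuid4())  # mints a fresh Langfuse session
    st.rerun()
```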
The app relies on v4-only APIs (`propagate_attributes`, the v4 `@observe`
and `get_client` pattern). The previous `>=3.0.0` pin would allow a
fresh install to resolve to v3 and break at runtime. Bumping the minimum
to v4 prevents that drift.
Return the Langfuse trace_id alongside the reply from `generate_reply`
and store it on the message dict in `session_state`. Sets up the hook
for the thumbs feedback UI, which needs a trace_id to attach scores to.
Render 👍 / 👎 buttons under each assistant message. Clicking either
posts a numeric `user-feedback` score (1 or -1) to the message's trace
via `langfuse.create_score` and marks the trace as scored in
session_state so the buttons don't re-appear on rerun.
…iately

The new assistant message was only rendered inline with `st.markdown`
and was not passing through the history loop where the thumbs buttons
are drawn. Users had to send a second message before the first one
showed feedback controls. Triggering `st.rerun()` after the append
forces the history loop to redraw the new message with its buttons.
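That is, roughly (sketch; reply and trace_id as produced by the streaming handler):

```python
st.session_state.messages.append(
    {"role": "assistant", "content": reply, "trace_id": trace_id}
)
st.rerun()  # redraw via the history loop so the thumbs buttons appear immediately
```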
Use CATEGORICAL data type with "positive" / "negative" values instead of
NUMERIC 1 / -1. Matches the common convention for binary feedback,
renders as a clean label in the Langfuse UI (no trailing zeros) and
groups naturally in score aggregations by category.
`create_score` buffers the event on the batch queue, so scores weren't
appearing in the Langfuse dashboard until the process exited (when the
atexit handler fired). Calling `flush` right after the score keeps the
UX immediate: click thumbs, refresh Langfuse, see the score.
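Putting the last two commits together, the feedback handler ends up roughly like this (hypothetical helper name; langfuse is the cached client from earlier):

```python
def record_feedback(trace_id: str, positive: bool) -> None:
    langfuse.create_score(
        trace_id=trace_id,
        name="user-feedback",
        value="positive" if positive else "negative",
        data_type="CATEGORICAL",
    )
    langfuse.flush()  # bypass the batch queue so the score is visible right away
```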
Convert `generate_reply` into a generator that yields delta tokens from
a streaming chat completion, then render it with `st.write_stream` so
users see the reply typed out rather than a sudden block. The @observe
wrapper keeps the span open until the generator is exhausted, so the
full streamed content is still captured as one Langfuse generation.

The trace_id capture moves to a small list passed in by the caller,
because a generator can't return a second value.
Pass `stream_options={"include_usage": True}` so OpenAI sends a final
chunk with exact prompt/completion/total token counts. The langfuse
wrapper reports OpenAI's numbers instead of falling back to a local
tiktoken estimate, which keeps cost attribution accurate.

Guard the loop against the terminal usage chunk (which has empty
`choices`) so iteration doesn't IndexError on the last frame.
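A sketch of the adjusted streaming handler under those two commits (assumed shape, mirroring the names used above):

```python
@observe()
def stream_reply(history, trace_holder):
    trace_holder.append(langfuse.get_current_trace_id())
    stream = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history,
        stream=True,
        stream_options={"include_usage": True},  # final chunk has exact token counts
    )
    for chunk in stream:
        if not chunk.choices:  # terminal usage-only chunk: no choices, skip it
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```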
Adds "Your name" text input in sidebar, stores in
st.session_state["user_id"]. Defaults to "anonymous" when blank.

Closes #28
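Roughly (sketch):

```python
name = st.sidebar.text_input("Your name")
st.session_state["user_id"] = name.strip() or "anonymous"  # default when blank
```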
Adds langfuse.update_current_trace(user_id=...) inside the @observe
handler so chat traces are tagged with the active user.

Closes #29
Adds anthropic and opentelemetry-instrumentation-anthropic to
requirements.txt for upcoming multi-provider support.

Closes #30
Wraps the Anthropic SDK with OpenTelemetry instrumentation once at app
startup, guarded against Streamlit reruns via @st.cache_resource. Sets
up the plumbing for Anthropic provider routing in #32.

Closes #31
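A guarded setup sketch, assuming the opentelemetry-instrumentation-anthropic package added in the previous commit (helper name is hypothetical):

```python
import streamlit as st
from opentelemetry.instrumentation.anthropic import AnthropicInstrumentor


@st.cache_resource  # instrument once per process, not on every Streamlit rerun
def init_anthropic_tracing() -> bool:
    AnthropicInstrumentor().instrument()
    return True


init_anthropic_tracing()
```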
Sidebar dropdown selects model; routes to OpenAI or Anthropic SDK
accordingly and tags traces with provider name.

Closes #32
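A routing sketch under the same assumptions (model names taken from this PR's description; anthropic_client is assumed to be a plain anthropic.Anthropic() instrumented at startup):

```python
provider = st.sidebar.selectbox("Provider", ["openai", "anthropic"])


def stream_from_provider(history):
    if provider == "openai":
        stream = openai_client.chat.completions.create(
            model="gpt-4o-mini", messages=history, stream=True
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content
    else:
        with anthropic_client.messages.stream(
            model="claude-sonnet-4-6", max_tokens=1024, messages=history
        ) as stream:
            yield from stream.text_stream


# The provider then lands on every trace via:
# with propagate_attributes(..., tags=[provider]): ...
```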
Adds the empty notebook shell with NOTEBOOK_METADATA, intro,
"What is Streamlit?"/"What is Langfuse?" blockquotes,
STEPS_START/STEPS_END markers, and LearnMore trailer. Also
registers the notebook in cookbook/_routes.json so the build
script processes it. Steps will be filled in subsequent commits.

Closes #33
Adds Step 1 markdown header and %pip install cell covering
streamlit, langfuse, openai, anthropic, the OTel Anthropic
instrumentor, and python-dotenv.

Closes #34
Adds env var setup (Langfuse + OpenAI + Anthropic keys) and
get_client() with auth_check() verification.

Closes #35
Adds the baseline chat app — OpenAI client, st.session_state
for message history, st.chat_input/chat_message UI. No Langfuse
instrumentation yet.

Closes #36
Layers Langfuse on the baseline: langfuse.openai drop-in for
automatic LLM tracing, @observe-wrapped handler for one trace
per turn, and @st.cache_resource init for the Langfuse client
to survive Streamlit reruns.

Closes #37
Generates a session_id with uuid, attaches it to traces via
propagate_attributes, and adds a "New conversation" button
that resets the message history and mints a fresh session.

Closes #38
Adds 👍/👎 buttons under each assistant message wired to
langfuse.create_score with data_type="CATEGORICAL". Captures
trace_id during the @observe call and stores it on the message
so each trace can be scored exactly once.

Closes #39
Switches the handler to a streaming generator with stream=True,
renders chunks via st.write_stream, and requests authoritative
token usage with stream_options={"include_usage": True}. Uses a
trace_holder list to surface the trace_id since generators can't
return a tuple.

Closes #40
Adds a "Your name" sidebar input that persists in
st.session_state.user_id (defaults to "anonymous") and is
attached to traces via propagate_attributes alongside the
session_id.

Closes #41
Adds Anthropic alongside OpenAI: AnthropicInstrumentor at
startup so OpenTelemetry spans land under the same trace, a
sidebar model selector, and provider-named tags via
propagate_attributes for filtering in the Langfuse UI.

Closes #42
Closes the cookbook with a tour of the Langfuse UI: traces,
sessions, users, scores, tags, and shareable trace links.
References screenshot assets at langfuse.com/images/cookbook/integration-streamlit/
to be uploaded with the upstream PR.

Closes #43
Official color mark SVG from Streamlit's brand kit
(https://streamlit.io/brand), ~1.7KB, aligned with the
pattern used for other integration icons in this directory.

Closes #46
Adds "streamlit" to content/integrations/other/meta.json so the
Streamlit page appears in the left-nav under Integrations → Other,
alphabetically between promptfoo and testable-minds.

Closes #47
Assistant messages carry a trace_id field in st.session_state for
feedback scoring. OpenAI silently ignores the extra key but Anthropic
rejects the request with:

  BadRequestError: messages.1.trace_id: Extra inputs are not permitted

Project the history down to {role, content} inside stream_reply before
handing it to either provider. Applied identically to the cookbook
notebook's Step 9 cell.

Closes #49
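The projection is a one-line comprehension (sketch):

```python
# Anthropic rejects unknown keys, so drop app-internal fields like trace_id.
provider_history = [
    {"role": m["role"], "content": m["content"]} for m in st.session_state.messages
]
```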
Captures referenced by integration_streamlit.ipynb Step 10:
- streamlit-trace.png: stream_reply trace detail with nested
  anthropic.chat generation, session/user/tag metadata, and
  negative user-feedback score
- streamlit-session.png: session view with 3 chronological
  generations for user umut, including a positive score
- streamlit-users.png: users listing with 7 users, event
  counts, tokens and cost columns

Closes #48
Generated by scripts/update_cookbook_docs.sh from
cookbook/integration_streamlit.ipynb (routed via _routes.json
with isGuide=false). NOTEBOOK_METADATA converted to YAML
frontmatter, STEPS markers converted to <Steps> component,
MARKDOWN_COMPONENT converted to <LearnMore /> import.

Closes #44
Switch from the "Step 1..10" walkthrough to a Gradio-style topical
layout (Introduction, Setup, Implementation of Chat functions, Run
Streamlit App, Explore data in Langfuse). Drops the repeated full-app
snippets in favor of one consolidated app.py reference, cutting the
rendered page from ~746 to ~415 lines.

Replace the three custom retina-limited screenshots under
public/images/cookbook/integration-streamlit/ with existing official
Langfuse UI assets (docs/session.png, docs/users-list.png) and link
to the Tracing docs + public demo project for the trace view.
The original README described the initial OpenAI-only baseline. The
app has since grown to cover Langfuse tracing, sessions, user
identification, streaming, user-feedback scores, and multi-provider
routing — update the description, prerequisites, and how-it-works
bullets to match the current state, and add a pointer back to the
Streamlit integration guide as the canonical reference.
The file is a local per-user Claude Code permission cache and was
tracked by accident in early branch commits. Untrack it and add the
path to .gitignore so it cannot be re-committed.
claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

vercel Bot commented Apr 19, 2026

@Automaticare is attempting to deploy a commit to the langfuse Team on Vercel.

A member of the Team first needs to authorize it.

dosubot Bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Apr 19, 2026

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



CLAassistant commented Apr 19, 2026

CLA assistant check
All committers have signed the CLA.

dosubot Bot added the documentation Improvements or additions to documentation label Apr 19, 2026
Comment threads: examples/streamlit-demo/app.py, content/integrations/other/streamlit.mdx
Addresses Greptile review feedback on langfuse#2821:

- P1: trace_holder[0] could IndexError if stream_reply raised before
  the append; fall back to trace_id = None so the UI stays alive and
  the scored-traces guard naturally hides the thumbs buttons on rows
  without a trace id.
- P2: the abbreviated init_langfuse snippet in the guide only printed
  the success path, making misconfigured credentials silent. Mirror
  the else branch from the full app.py so readers copying the snippet
  also see the auth-failure hint.
Consolidates observe and propagate_attributes into the Setup
imports rather than re-importing them mid-document. Adds a brief
prose note so readers know where the names came from. Resolves
Greptile P2 in both the notebook and the rendered mdx.
umutdinceryananer (Author) commented

Quick update: all three Greptile findings now resolved (P1 IndexError guard in d7b9bcf, P2 auth-failure branch in d7b9bcf, P2 mid-document imports in d00eff5). Ready for review when you have time — happy to capture Streamlit-specific screenshots or adjust scope if helpful.
