add error analysis guide #2892
@claude review
All prior feedback addressed (notebook source added, frontmatter complete, skill link fixed, illustrative-data caveat reworded).
Extended reasoning...
Overview
This documentation-only PR adds a new error-analysis cookbook guide. It includes a hand-written MDX, a corresponding source notebook in cookbook/, a _routes.json registration, a meta.json entry, and a small frameborder -> frameBorder JSX casing fix in a related blog post. No code paths, build scripts, or runtime behavior are changed.
Security risks
None. This is content-only — no auth, network, or data-handling code involved. External links go to langfuse.com pages (verified to exist) and to public references like hamel.dev.
Level of scrutiny
Light scrutiny appropriate for a docs-only change. The earlier higher-stakes risk (regen script silently deleting a hand-edited cookbook MDX with no notebook source) is now eliminated because a backing notebook was added and registered in cookbook/_routes.json.
Other factors
All four issues I raised on earlier revisions have been addressed in the latest commit: backing notebook present, frontmatter has category: Evaluation and sidebarTitle, the Claude Code skill link now points at a path that actually resolves (content/docs/api-and-data-platform/features/agent-skill.mdx), and Step 4.2's draft-style parenthetical has been reworded to frame the partial-sample numbers as illustrative. The Greptile bot's confidence is 4/5 and the only outstanding flag from external review is the same illustrative-data caveat that has now been reworded. Bug hunting system found no bugs.
Restores cookbook/error-analysis-llm-applications.ipynb and its _routes.json entry. Reverts lib/cookbook_route_mapping.ts to main — the showNotebookBanner field is no longer used. The standard "this is a Jupyter notebook" banner will show for this page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Disclaimer: Experimental PR review
Greptile Summary
Adds a new cookbook guide on error analysis for LLM applications, covering open coding, failure taxonomy clustering, labeling, and deciding when to build evaluators versus fix prompts. The guide also registers the new page in meta.json.
Confidence Score: 4/5
Safe to merge; the only flag is a draft note in the example data that should be cleaned up before the guide goes live.
The guide is well-structured and the process it describes is technically sound. The one thing worth fixing before publishing is the parenthetical in Step 4.2 that reveals the example bar chart is based on only 19 of 100 traces — readers may lose confidence in the example data if that note ships as-is.
content/guides/cookbook/error-analysis-llm-applications.mdx — specifically the Step 4.2 failure rates table and its incomplete-data caveat.
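To make the partial-sample concern concrete, here is a minimal sketch of how much wider the uncertainty around a failure rate is with 19 labeled traces than with 100. It uses the standard Wilson score interval; the failure counts passed in below are hypothetical placeholders, not numbers from the guide.

```python
import math


def wilson_interval(failures, n, z=1.96):
    """Return the (low, high) Wilson score interval for a failure rate.

    failures: number of labeled traces showing the failure
    n: number of labeled traces
    z: normal quantile (1.96 corresponds to roughly 95% coverage)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= failures <= n:
        raise ValueError("failures must be between 0 and n")
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical counts chosen only to compare interval widths at similar rates.
low_19, high_19 = wilson_interval(5, 19)
low_100, high_100 = wilson_interval(26, 100)
width_19 = high_19 - low_19
width_100 = high_100 - low_100
```

At a similar observed rate, the 19-trace interval is noticeably wider than the 100-trace one, which is why framing the partial-sample numbers as illustrative is the safer wording.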
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Choose what to annotate\nTrace vs. GENERATION observation] --> B[Select ~100 representative traces\nby latency, cost, tags, multi-turn]
    B --> C[Create annotation queue\nwith open_coding + pass_fail_assessment]
    C --> D[Open code first 30-50 traces\nFree-text observations, no pre-defined categories]
    D --> E{New failure types\nstill appearing?}
    E -- Yes --> D
    E -- No --> F[Cluster into 5-10 named failure categories\nSplit by root cause, merge by same root cause]
    F --> G[Create boolean score configs per category\nNew queue with all 10 score configs]
    G --> H[Label all 100 traces]
    H --> I[Compute failure rates\nLangfuse Dashboard - Scores widget]
    I --> J{For each category:\nCan we just fix it?}
    J -- Yes --> K[Prompt / tool / code fix]
    J -- No --> L{Worth building\nan evaluator?}
    L -- Yes --> M[LLM-as-judge or code-based check]
    L -- No --> N[Monitor / defer]
    K & M & N --> O[Re-run after next\nprompt rewrite, model switch, or incident]
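The "Compute failure rates" step in the flowchart comes down to a per-category average over boolean labels. A minimal sketch, assuming each labeled trace is represented as a mapping from category name to a boolean; the category names and sample labels below are hypothetical, and the code works on plain Python data rather than the Langfuse API or dashboard.

```python
from collections import defaultdict


def failure_rates(labels):
    """Compute the share of labeled traces that show each failure category.

    labels: list of dicts, one per trace, mapping category name -> bool
            (True means the failure was observed on that trace).
    Returns a dict of category -> rate, sorted from most to least frequent.
    """
    failures = defaultdict(int)
    totals = defaultdict(int)
    for trace_labels in labels:
        for category, failed in trace_labels.items():
            totals[category] += 1
            if failed:
                failures[category] += 1
    rates = {category: failures[category] / totals[category] for category in totals}
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))


# Hypothetical labels for four traces and two illustrative categories.
labels = [
    {"hallucinated_fact": True, "wrong_tool": False},
    {"hallucinated_fact": False, "wrong_tool": False},
    {"hallucinated_fact": True, "wrong_tool": True},
    {"hallucinated_fact": False, "wrong_tool": False},
]
rates = failure_rates(labels)
```

Sorting by rate puts the most frequent category first, which matches the order in which the flowchart's "Can we just fix it?" decision would usually be worked through.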
Reviews (1): Last reviewed commit: "add error analysis guide"