22 changes: 22 additions & 0 deletions examples/approval/bot/agent.config.ts
@@ -1,3 +1,25 @@
/**
* @agent Approval Agent
* @pattern Human-in-the-Loop Tool Execution
*
* WHY THIS AGENT EXISTS:
* This agent demonstrates how to add a user approval gate before executing any tool.
* In production AI systems, certain actions (e.g., making purchases, modifying data,
* sending emails) should NOT execute autonomously — they require explicit human consent.
* This pattern solves the "trust boundary" problem: the LLM decides WHAT to do, but the
* human confirms WHETHER it actually happens.
*
* ARCHITECTURE DECISIONS:
* - Minimal config: This agent deliberately uses the simplest possible configuration to
* isolate and showcase the approval pattern without distractions.
* - Single integration (webchat): Approval UX relies on interactive buttons for
* approve/reject, which webchat supports natively.
* - Cerebras model: Chosen for low latency on a pattern that involves multiple LLM
* round-trips (propose -> wait -> re-execute), where speed matters more than reasoning depth.
* - No bot/user state in config: State is handled inside the conversation via
* ToolWithApproval.ApprovalState, keeping the config clean and the approval logic
* self-contained within the conversation handler.
*/
import { defineConfig } from "@botpress/runtime";

export default defineConfig({
32 changes: 32 additions & 0 deletions examples/approval/bot/src/conversations/index.ts
@@ -1,3 +1,33 @@
/**
* @conversation Approval Agent - Webchat Conversation
*
* WHY IT'S BUILT THIS WAY:
* This conversation handler demonstrates the "approval gate" pattern for AI tool execution.
* The key insight is that ToolWithApproval is a drop-in replacement for Autonomous.Tool —
* you define tools exactly the same way, but they automatically require user approval before
* executing. This makes it trivial to add human oversight to any existing tool.
*
* HOW THE APPROVAL FLOW WORKS:
* 1. LLM decides to call "foo" with inputs {x: 3, y: 5}
* 2. ToolWithApproval.execute() is called — but there's no pending approval yet
* 3. It throws an error: "requires approval before execution"
* 4. The LLM catches this error and asks the user to approve (via buttons)
* 5. User clicks "Approve" — this creates a new user message in the transcript
* 6. LLM retries calling "foo" with the same inputs
* 7. ToolWithApproval.execute() sees a pending approval AND a new user message → executes
*
* WHY STATE EXTENSION (not separate state):
* The state uses z.object({}).extend(ToolWithApproval.ApprovalState) rather than defining
* pendingApprovals inline. This keeps the approval mechanism encapsulated — if you want to
* add approval to any conversation, you just .extend() the state. The conversation's own
* business state stays clean.
*
* WHY TOOLS ARE DEFINED INSIDE THE HANDLER:
* The FooTool is created inside the handler (not at module scope) because it needs access
* to the conversation `state` object. ToolWithApproval stores pending approvals in state,
* so it must receive the live state reference to read/write approval records. This is a
* deliberate ADK pattern: tools that need conversation context are defined inside handlers.
*/
import { Conversation, z } from "@botpress/runtime";

import { ToolWithApproval } from "./tool-with-approval";
@@ -11,6 +41,8 @@ export const Webchat = new Conversation({

handler: async ({ execute, state }) => {
// Define a tool that requires approval instead of "Autonomous.Tool"
// Created inside the handler because it needs the live `state` reference
// to track pending approvals across LLM iterations
const FooTool = new ToolWithApproval({
state,
name: "foo",
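The state-composition idea described above can be sketched in plain TypeScript. This is an illustrative stand-in, not the real ADK API: `PendingApproval`, `ApprovalState`, and the `greetedUser` business field are hypothetical names, and plain object spread stands in for `z.object({}).extend(ToolWithApproval.ApprovalState)`.

```typescript
// Hypothetical sketch: object spread standing in for Zod's .extend().
// None of these names come from the real @botpress/runtime API.

type PendingApproval = {
  toolName: string;
  input: unknown;
  lastUserMessageId: string;
};

// The approval mechanism's state contribution, owned by the tool module:
const ApprovalState = {
  pendingApprovals: [] as PendingApproval[],
};

// The conversation mixes the approval shape into its own business state,
// analogous to z.object({ ... }).extend(ToolWithApproval.ApprovalState).
// The business state stays clean; approval bookkeeping is self-contained.
const state = {
  greetedUser: false, // hypothetical business field
  ...ApprovalState,
};
```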
51 changes: 46 additions & 5 deletions examples/approval/bot/src/conversations/tool-with-approval.ts
@@ -1,12 +1,53 @@
import { Autonomous, context, z } from "@botpress/runtime";

/**
* A tool that requires user approval before execution.
* When executed, if there is no pending approval for the given input,
* it will throw an error indicating that approval is needed.
* @class ToolWithApproval
* @pattern Error-Driven Approval Gate (extends Autonomous.Tool)
*
* This tool keeps track of pending approvals in the conversation state.
* You need to extend the conversation state with `ToolWithApproval.ApprovalState` to use it.
* WHY THIS APPROACH (error-throwing instead of a separate approval tool):
* The approval mechanism works by exploiting the LLM's error-recovery loop. When the
* tool throws an error saying "requires approval", the LLM naturally responds by asking
* the user for confirmation. This is more elegant than a separate "request_approval" tool
* because:
* 1. No extra tool definition needed — any tool can become approval-gated
* 2. The LLM's built-in error handling drives the UX naturally
* 3. The approval state is invisible to the LLM — it just retries the same tool call
*
* HOW THE STATE MACHINE WORKS:
* State transitions for a single tool call:
*
* [No approval] --LLM calls tool--> [Pending approval created, error thrown]
* | |
* | [LLM asks user to approve]
* | |
* | [User sends new message]
* | |
* v [LLM retries same tool call]
* [Tool executes] <----match found + new user message----+
* |
* [Approval record cleaned up]
*
* WHY DEEP EQUALITY CHECK (not approval IDs):
* Approvals are matched by comparing the tool name + full input object via deep equality.
* This was chosen over unique approval IDs because the LLM doesn't need to track IDs —
* it simply retries the exact same tool call. This keeps the LLM prompt clean and reduces
* the chance of the LLM fabricating or misremembering an approval ID.
*
* WHY lastUserMessageId CHECK:
* When the LLM retries a tool call, we verify that the user has sent a NEW message since
* the approval was created. This prevents auto-approval: without this check, the LLM could
* call the tool, get the error, immediately retry, and auto-approve itself. The message ID
* check guarantees a real human interaction happened between attempts.
*
* WHY .slice(-10) ON PENDING APPROVALS:
* The pending approvals array is capped at 10 entries to prevent unbounded state growth.
* In practice, there should only be 1-2 pending approvals at a time, but the cap prevents
* edge cases where the LLM repeatedly proposes different inputs without user response.
*
* WHY structuredClone:
* The input object is deep-cloned before storing in state to prevent reference sharing
* between the approval record and the LLM's working memory. Without cloning, mutations
* to the input object could silently break the deep equality check on retry.
*/
export class ToolWithApproval extends Autonomous.Tool {
private state: {
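The gate logic the comments describe can be compressed into a plain-TypeScript sketch. Everything here is an assumption drawn from the prose above: the names (`executeGated`, `Approval`) and the JSON-based deep-equality stand-in are hypothetical, not the real ToolWithApproval implementation.

```typescript
// Illustrative sketch of the error-driven approval gate; not the real ADK code.

type Approval = { toolName: string; input: unknown; lastUserMessageId: string };
type ApprovalStateShape = { pendingApprovals: Approval[] };

// Stand-in for a proper deep-equality check (key-order sensitive; fine for a sketch).
const deepEqual = (a: unknown, b: unknown) => JSON.stringify(a) === JSON.stringify(b);

function executeGated(
  state: ApprovalStateShape,
  toolName: string,
  input: unknown,
  currentUserMessageId: string,
  run: (input: unknown) => unknown
): unknown {
  const match = state.pendingApprovals.find(
    (p) => p.toolName === toolName && deepEqual(p.input, input)
  );

  // Execute only if a matching approval exists AND the user has sent a new
  // message since it was recorded — this blocks LLM self-approval.
  if (match && match.lastUserMessageId !== currentUserMessageId) {
    state.pendingApprovals = state.pendingApprovals.filter((p) => p !== match);
    return run(input); // approval record cleaned up before execution
  }

  // Otherwise record the request (structuredClone prevents later mutations of
  // `input` from breaking the equality check on retry), cap growth at 10, and
  // throw so the LLM's error-recovery loop asks the human for confirmation.
  if (!match) {
    state.pendingApprovals = [
      ...state.pendingApprovals,
      { toolName, input: structuredClone(input), lastUserMessageId: currentUserMessageId },
    ].slice(-10);
  }
  throw new Error(`Tool "${toolName}" requires approval before execution`);
}
```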
23 changes: 23 additions & 0 deletions examples/brand-extractor/bot/agent.config.ts
@@ -1,3 +1,26 @@
/**
* @agent Brand Extractor Agent
* @pattern Conversation + Background Workflow with Real-Time Progress UI
*
* WHY THIS AGENT EXISTS:
* This agent extracts brand identity (colors, logo, themes) from any company website.
* It demonstrates the "conversation-starts-workflow" pattern where the LLM handles user
* interaction while a durable Workflow runs the heavy multi-step extraction in the background.
*
* ARCHITECTURE DECISIONS:
* - Conversation + Workflow split: The conversation handler manages chat UX (greeting,
* clarification, status) while the Workflow handles the multi-step extraction pipeline
* (search, screenshot, vision analysis). This separation exists because extraction takes
* 30-120 seconds — too long to block the conversation loop.
* - Browser integration: Required for web search (finding company URLs), screenshot capture
* (visual brand analysis), and logo extraction. The browser integration provides headless
* browser actions as integration-level tools usable from workflows.
* - Cerebras model for both autonomous and zai: Speed over depth. The conversational part
* is simple (ask for company, start extraction), and the heavy intelligence is in the
* workflow's vision analysis step which uses the cognitive API's "best" model directly.
* - Real-time progress: The workflow updates a custom message component in the chat UI as
* each step completes, giving users visual feedback without polling.
*/
import { defineConfig } from "@botpress/runtime";

export default defineConfig({
36 changes: 35 additions & 1 deletion examples/brand-extractor/bot/src/conversations/index.ts
@@ -1,3 +1,35 @@
/**
* @conversation Brand Extractor - Webchat Conversation
*
* WHY IT'S BUILT THIS WAY:
* This conversation handler implements the "workflow monitor" pattern. It has two concerns:
* 1. Chat with the user to understand what brand they want extracted
* 2. Monitor a background Workflow and update the UI when it finishes
*
* HOW THE CONVERSATION-WORKFLOW BRIDGE WORKS:
* - The conversation stores a Reference.Workflow in state, which is a live pointer to a
* running workflow instance. On every handler invocation (every new user message), the
* handler checks if the workflow has reached a terminal state (completed/failed/timedout).
* - If terminal: it updates the progress UI component and clears the reference from state.
 * - If still running: the handler does nothing — the workflow itself updates the progress UI via direct message updates.
*
* WHY Reference.Workflow (not a workflow ID string):
* Reference.Workflow provides typed access to the workflow's status, input, and output
* directly from conversation state. Without it, you'd need to manually call the API to
* fetch workflow status on every message — Reference.Workflow does this automatically.
*
* WHY DYNAMIC TOOLS (tools as a function, not an array):
* The `tools` parameter is a function `() => [...]` that returns different tools based on
* whether an extraction is active. This prevents the LLM from calling start_extraction
* while one is already running, or stop_extraction when nothing is running. Dynamic tools
* are more reliable than instruction-based constraints because the LLM physically cannot
* call a tool that isn't in its tool list.
*
* WHY messageId IS STORED IN STATE:
* The progress UI is a custom message component that gets updated in-place (not new messages).
* The messageId connects the conversation to the specific message being updated by the
* workflow, so the conversation can do a final status update when the workflow terminates.
*/
import {
adk,
Autonomous,
@@ -19,7 +51,9 @@ export const Webchat = new Conversation({
extraction: Reference.Workflow("brand_extraction").optional(),
}),
handler: async ({ execute, conversation, state }) => {
// Check workflow status on every handler call if we have an active extraction
// Check workflow status on every handler call if we have an active extraction.
// This is the "workflow monitor" pattern — on each user message, we check if the
// background workflow has finished and update the UI accordingly.
if (state.extraction && state.messageId) {
const workflowStatus = state.extraction.workflow.status;
const workflowInput = state.extraction.workflow.input;
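The dynamic-tools idea above can be sketched in a few lines of plain TypeScript. The `Tool` shape and the tool names are placeholders drawn from the comments, not the real `Autonomous.Tool` API.

```typescript
// Hedged sketch of the dynamic-tools pattern; shapes are hypothetical.

type Tool = { name: string };

const startExtraction: Tool = { name: "start_extraction" };
const stopExtraction: Tool = { name: "stop_extraction" };

// Because `tools` is a function, it is re-evaluated against live state on each
// iteration, so the LLM physically cannot call start_extraction while a run is
// active: that tool is simply absent from its list.
const tools = (state: { extractionActive: boolean }): Tool[] =>
  state.extractionActive ? [stopExtraction] : [startExtraction];
```

This enforces the constraint structurally, which is the point the comments make: a tool missing from the list cannot be called, no matter what the instructions say.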
41 changes: 40 additions & 1 deletion examples/brand-extractor/bot/src/workflows/index.ts
@@ -1,3 +1,41 @@
/**
* @workflow BrandExtractionWorkflow
* @pattern Durable Multi-Step Pipeline with Real-Time Progress Updates
*
* WHY THIS IS A WORKFLOW (not inline in the conversation):
* Brand extraction is a 6-step pipeline that takes 30-120 seconds. Workflows in ADK are
* durable — each `step()` is checkpointed, so if the process crashes mid-extraction, it
* resumes from the last completed step rather than restarting. This is critical for a
* pipeline that makes expensive API calls (web search, screenshots, vision analysis).
*
* THE 6-STEP PIPELINE AND WHY EACH STEP EXISTS:
* 1. find-website: Resolves company name -> URL via web search (skipped if user gave URL)
* 2. discover-pages: Finds important pages beyond homepage using site: search + zai.filter
* (WHY: A single homepage screenshot may not capture the full brand palette — product
* pages, about pages, etc. often use different brand colors)
* 3. extract-logo: Gets logo via domain-based logo API (non-critical — continues on failure)
* 4. screenshot: Captures screenshots of all discovered pages IN PARALLEL via step.map
* (WHY step.map: Each screenshot is independent and takes 2-5 seconds; parallelizing
* 3-5 screenshots cuts total time from 15s to 5s)
* 5. extract-brand: Vision analysis of ALL screenshots using the cognitive API's "best"
* model, then zai.extract to structure the natural language description into typed data
* (WHY two-phase: Vision model excels at describing what it sees in natural language;
* zai.extract excels at structuring text into Zod schemas. Combining them is more
* reliable than asking vision to directly output structured JSON)
* 6. finalize: Assembles all extracted data and marks the progress UI as complete
*
* WHY extractPaletteScript IS INJECTED INTO SCREENSHOTS:
* The browser integration's captureScreenshot accepts a JavaScript payload that runs on
* the page before capture. This script extracts CSS color values from the page's stylesheets
* and renders them as a color bar overlay at the top of the screenshot. This gives the
* vision model exact HEX values to read, rather than trying to eyeball colors from pixels
* (which is unreliable for subtle shades).
*
* WHY 10-MINUTE TIMEOUT:
* The pipeline involves network-dependent steps (web search, screenshots, logo fetch).
* Under normal conditions it completes in 30-90 seconds, but slow websites or retries
* can extend this. 10 minutes provides generous headroom without allowing runaway workflows.
*/
import { Workflow, z, actions, context, adk } from "@botpress/runtime";
import {
updateBrandProgressComponent,
@@ -9,7 +47,8 @@ } from
} from "../utils/progress-component";
import extractPaletteScript from "../utils/extract-palette-script";

// Schema for final brand theme extraction
// Schema for final brand theme extraction — Zod schema used by zai.extract to structure
// the vision model's natural language description into typed brand data
const BrandThemes = z.object({
lightTheme: ColorTheme,
darkTheme: ColorTheme,
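The checkpointing and `step.map` behavior described above can be approximated in plain TypeScript. This is a sketch under stated assumptions — a checkpoint store keyed by step name — and `step`/`stepMap` here are hypothetical stand-ins, not the real Workflow API.

```typescript
// Illustrative sketch of durable steps: completed work is memoized by step
// name, so a crashed run replays cheaply instead of redoing expensive calls.

type Checkpoints = Map<string, unknown>;

async function step<T>(cp: Checkpoints, name: string, fn: () => Promise<T>): Promise<T> {
  if (cp.has(name)) return cp.get(name) as T; // resume from checkpoint
  const result = await fn();
  cp.set(name, result); // checkpoint on completion
  return result;
}

// step.map analogue: independent items (e.g. one screenshot per page) run in
// parallel via Promise.all, each checkpointed under its own key.
async function stepMap<I, O>(
  cp: Checkpoints,
  name: string,
  items: I[],
  fn: (item: I) => Promise<O>
): Promise<O[]> {
  return Promise.all(items.map((item, i) => step(cp, `${name}[${i}]`, () => fn(item))));
}
```

Parallelizing the independent screenshot steps is what turns N sequential 2-5 second captures into one capture's worth of wall-clock time.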
26 changes: 26 additions & 0 deletions examples/clause-extraction/bot/agent.config.ts
@@ -1,3 +1,29 @@
/**
* @agent Clause Extraction Agent
* @pattern File Upload -> Workflow Pipeline -> Interactive Q&A
*
* WHY THIS AGENT EXISTS:
* This agent extracts, categorizes, and risk-assesses contractual clauses from uploaded
* legal documents. It demonstrates the most complex ADK pattern: file upload handling,
* background workflow processing, database persistence, and post-extraction Q&A — all
* with real-time progress tracking.
*
* ARCHITECTURE DECISIONS:
* - Claude Sonnet for autonomous (not Cerebras): Legal contract analysis requires deep
* reasoning — understanding clause implications, risk assessment relative to which party
* the user represents, and nuanced categorization. Cerebras is fast but less reliable for
* complex legal reasoning. Sonnet was chosen for accuracy over speed.
* - Cerebras for zai: The zai model handles simpler tasks (text extraction from passages,
* summarization) where speed matters and reasoning depth is less critical.
* - bot + user state in config: Both are z.object({}) — intentionally empty. The real state
* lives in conversation state (uploaded files, workflow references, party selection).
* The empty bot/user state declarations exist to enable future extension without config
* migration.
* - No browser integration: Unlike brand-extractor and deep-research, this agent works with
* uploaded files (not web content), so it only needs the webchat integration for file uploads.
* - Tables for persistence: Extracted clauses and contracts are stored in ADK Tables (not
* just in-memory), enabling structured querying, filtering, and full-text search post-extraction.
*/
import { z, defineConfig } from "@botpress/runtime";

export default defineConfig({
48 changes: 45 additions & 3 deletions examples/clause-extraction/bot/src/conversations/index.ts
@@ -1,12 +1,54 @@
/**
* @conversation Clause Extraction - Webchat Conversation
*
* WHY IT'S BUILT THIS WAY:
* This conversation handler manages the full lifecycle of contract analysis:
* 1. FILE UPLOAD HANDLING: Intercepts file/bloc messages before the LLM loop
* 2. GUIDED WORKFLOW: Ensures the user specifies which party they represent (critical for risk)
* 3. BACKGROUND EXTRACTION: Launches a durable workflow for the heavy processing
* 4. POST-EXTRACTION Q&A: Provides query + summarize tools for interactive analysis
*
* KEY DESIGN PATTERNS:
*
* File processing before execute():
* File uploads are handled BEFORE the LLM autonomous loop (execute()). This is because
* file messages arrive as raw binary data that needs to be processed and stored — the LLM
* can't directly handle file bytes. By processing files first, we store the fileId in state
* and then the LLM works with file references, not raw data.
*
* uploadedFiles array (not single file):
* Files are tracked as an array in state to support users uploading multiple documents across
* a conversation. The array persists across workflow failures — if extraction fails, the user
* doesn't need to re-upload. Only the most recent file (.at(-1)) is used for new extractions.
*
* Dynamic tools (function, not array):
* Tools change based on whether an extraction is running. During extraction, the
* analyze_contract tool is hidden to prevent starting a second extraction. After extraction,
* all tools are available. This is enforced structurally (tool not in list) rather than
* through instructions alone.
*
* Dynamic instructions (buildInstructions function):
* The system prompt changes based on state — whether a file has been uploaded, whether the
* user has selected a party, etc. This gives the LLM exactly the right context for its current
* situation without overloading it with irrelevant instructions.
*
* WHY userParty is a required tool parameter (not just state):
* The party selection (party_a vs party_b) is a required input to analyze_contract rather
* than being read from state silently. This forces the LLM to explicitly ask the user and
* pass their answer, creating an auditable decision point. Risk assessment is subjective —
* the same clause can be "high risk" for one party and "low risk" for the other.
*
* SECURITY: userId scoping:
* The query_clauses tool always includes userId in its database filter, injected via closure
* (not via LLM input). This prevents the LLM from being prompt-injected into querying
* another user's clauses — the userId filter is hardcoded at tool creation time.
*/
import { Conversation, z, Autonomous, Reference } from "@botpress/runtime";
import ExtractClausesWorkflow from "../workflows/extract-clauses";
import { createExtractionProgressComponent } from "../utils/progress-component";
import { processFileMessage } from "../utils/file-upload";
import { createQueryClausesTool, createSummarizeClausesTool } from "../tools/clause-tools";

/**
* Main conversation handler for clause extraction bot
*/
export default new Conversation({
channel: ["webchat.channel"],
state: z.object({
Expand Down
Loading