feat(tools): add human-in-the-loop tool type #38

Merged
mattapperson merged 5 commits into main from feat/hitl-tools
May 6, 2026
@mattapperson
Collaborator

Summary

  • Adds a fourth client tool kind — a HITL tool — that extends manual-tool semantics with two async hooks: onToolCalled (decides per-call whether to respond programmatically or pause for a human) and onResponseReceived (post-processes the caller-supplied result before the model sees it).
  • onToolCalled returning a value short-circuits the call like a regular execute; returning null pauses the loop like a manual tool, surfacing the function_call to the caller for manual resume.
  • onResponseReceived fires on a later turn when an incoming FunctionCallOutputItem corresponds (by callId → function_call.name) to a HITL tool. The returned value replaces the output sent to the model; throwing becomes {"error": ...}.
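
The value-or-null contract of onToolCalled can be sketched in isolation. This is a minimal illustration under stated assumptions, not the package's code: `HitlToolSketch`, `DispatchOutcome`, and `dispatchHitlCall` are hypothetical names, and only the hook name, the null pause sentinel, and the throw-becomes-error rule come from the description above.

```typescript
// Hypothetical, simplified shape of a HITL tool (not the real HITLTool type).
interface HitlToolSketch<I, O> {
  name: string;
  // Return a value to answer the call, or null to pause for a human.
  onToolCalled: (input: I) => Promise<O | null>;
}

// Hypothetical outcome type for one tool call.
type DispatchOutcome<O> =
  | { kind: 'resolved'; output: O }
  | { kind: 'paused'; toolName: string }
  | { kind: 'error'; output: { error: string } };

async function dispatchHitlCall<I, O>(
  tool: HitlToolSketch<I, O>,
  input: I,
): Promise<DispatchOutcome<O>> {
  try {
    const result = await tool.onToolCalled(input);
    if (result === null) {
      // Pause sentinel: surface the call to the caller, send nothing to the model.
      return { kind: 'paused', toolName: tool.name };
    }
    // A non-null value short-circuits the call like a regular execute.
    return { kind: 'resolved', output: result };
  } catch (err) {
    // A throwing hook is turned into an {error} output.
    const message = err instanceof Error ? err.message : String(err);
    return { kind: 'error', output: { error: message } };
  }
}

// Same policy as the approve_payment example: auto-approve under 100, else pause.
const approvePaymentSketch: HitlToolSketch<{ amount: number }, { ok: boolean }> = {
  name: 'approve_payment',
  onToolCalled: async (input) => (input.amount < 100 ? { ok: true } : null),
};
```

With this sketch, an amount of 50 resolves immediately and an amount of 500 comes back as paused, leaving the caller to supply the output on a later turn.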

Design decisions

Discriminator | Presence of onToolCalled on the config (no execute field at all)
Pause sentinel | onToolCalled returning null
onResponseReceived trigger | Walk input, map each function_call_output.callId to its originating function_call.name, dispatch to the matching tool
Error handling | Hook throws → {error} output sent to the model
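
The discriminator decision can be illustrated with a small guard sketch. The tool shape and the `*Sketch` functions below are hypothetical simplifications, not the definitions in tool-types.ts; the point is only that the HITL check has to run before the manual check, because a config with no execute field would otherwise look like a manual tool.

```typescript
// Hypothetical, simplified tool shape for illustration only.
interface ToolSketch {
  function: {
    name: string;
    execute?: ((input: unknown) => unknown) | false;
    onToolCalled?: (input: unknown) => Promise<unknown>;
  };
}

// HITL is identified by the presence of a callable onToolCalled.
function isHITLToolSketch(tool: ToolSketch): boolean {
  return 'onToolCalled' in tool.function && typeof tool.function.onToolCalled === 'function';
}

function hasExecuteFunctionSketch(tool: ToolSketch): boolean {
  return typeof tool.function.execute === 'function';
}

// Assumption: "manual" means neither HITL nor executable.
function isManualToolSketch(tool: ToolSketch): boolean {
  return !isHITLToolSketch(tool) && !hasExecuteFunctionSketch(tool);
}

// Assumption: auto-resolvable means the loop can try to produce an output itself.
function isAutoResolvableToolSketch(tool: ToolSketch): boolean {
  return isHITLToolSketch(tool) || hasExecuteFunctionSketch(tool);
}

// HITL is checked first so it never falls through to the manual branch.
function classifyToolSketch(tool: ToolSketch): 'hitl' | 'manual' | 'executable' {
  if (isHITLToolSketch(tool)) return 'hitl';
  if (isManualToolSketch(tool)) return 'manual';
  return 'executable';
}
```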

What changed

  • packages/agent/src/lib/tool-types.ts: HITLToolFunction, HITLTool, new guards (isHITLTool, isAutoResolvableTool); isManualTool tightened to exclude HITL; ClientTool union widened.
  • packages/agent/src/lib/tool.ts: new factory overload + runtime branch (ordered before the execute: false check).
  • packages/agent/src/lib/tool-executor.ts: executeHITLTool; executeTool dispatcher now returns ToolExecutionResult | null; new applyOnResponseReceivedHooks helper that walks input items and rewrites tool-output entries.
  • packages/agent/src/lib/tool-orchestrator.ts: hasExecuteFunction gates replaced with isAutoResolvableTool.
  • packages/agent/src/lib/model-result.ts: same gate replacement in three sites; HITL null handled as a new paused branch (no output, no broadcast); applyOnResponseReceivedHooks invoked on the initial-send input and on the resume-send input (but not on follow-up sends of our own tool outputs).
  • packages/agent/src/index.ts: exports HITLTool, HITLToolFunction, isHITLTool, isAutoResolvableTool, and isManualTool (previously unexported).
  • packages/agent/tests/unit/hitl-tool.test.ts: 16 tests covering factory, guards, dispatcher, short-circuit, pause, transform, throw, non-JSON, orphan output, and missing-hook passthrough.

Example

const approvePayment = tool({
  name: 'approve_payment',
  inputSchema: z.object({ amount: z.number() }),
  outputSchema: z.object({ ok: z.boolean() }),
  onToolCalled: async (input) => {
    if (input.amount < 100) return { ok: true };  // auto-approve small amounts
    return null;                                   // escalate to a human for larger ones
  },
  onResponseReceived: async (raw) => {
    return { ...(raw as object), reviewedAt: Date.now() };
  },
});

Test plan

  • pnpm --filter @openrouter/agent exec tsc --noEmit — no type errors
  • pnpm --filter @openrouter/agent run lint — no lint errors
  • pnpm --filter @openrouter/agent test — 264/264 tests pass (16 new HITL tests, no regressions in 248 pre-existing tests)
  • pnpm turbo run build --filter=@openrouter/agent — build succeeds

Adds a fourth client tool kind that extends manual-tool semantics with two
async hooks. `onToolCalled` runs when the model invokes the tool — returning
a value short-circuits like `execute`, returning `null` pauses the loop like
a manual tool. `onResponseReceived` runs on a later turn when an incoming
`FunctionCallOutputItem` matches (by callId → function_call.name) a HITL
tool, letting the tool post-process caller-supplied results before the
model sees them.

Keeps HITL control flow local to the tool definition instead of smeared
across the caller.
- Route auto-resolvable checks through isAutoResolvableTool so pure-HITL
  turns actually enter the execution loop and invoke onToolCalled.
- Propagate HITL pauses out of executeToolRound; the outer loop now
  persists pending calls under a new awaiting_hitl status and returns
  before issuing a follow-up request with missing outputs.
- Scope onResponseReceived hooks to freshly-supplied outputs on resume
  so caller-supplied outputs hooked at init aren't re-hooked.
- Preserve paused HITL calls in pendingToolCalls when they occur during
  approval resume instead of silently dropping them.
- Include originalOutput alongside error when onResponseReceived throws,
  so the model can distinguish hook failure from tool-reported error.
- Replace unsafe InputsUnion cast with a structurally-typed rewritten
  array.
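
A rough sketch of the output-rewriting pass these commits describe follows. It is an assumption-laden illustration rather than the real applyOnResponseReceivedHooks: the `InputItem` union, `ResponseHook`, and `applyHooksSketch` are hypothetical, and outputs are modeled as plain strings only.

```typescript
// Hypothetical, simplified input items; outputs are plain strings here.
type InputItem =
  | { type: 'function_call'; callId: string; name: string }
  | { type: 'function_call_output'; callId: string; output: string }
  | { type: 'message'; content: string };

type ResponseHook = (raw: unknown) => Promise<unknown>;

async function applyHooksSketch(
  input: InputItem[],
  hookByName: Map<string, ResponseHook>,
): Promise<InputItem[]> {
  if (hookByName.size === 0) return input; // no hooks registered: nothing to do

  // Resolve each callId to the name of the tool that was called.
  const nameByCallId = new Map<string, string>();
  for (const item of input) {
    if (item.type === 'function_call') nameByCallId.set(item.callId, item.name);
  }

  let changed = false;
  const rewritten: InputItem[] = [];
  for (const item of input) {
    if (item.type !== 'function_call_output') {
      rewritten.push(item);
      continue;
    }
    const name = nameByCallId.get(item.callId);
    const hook = name === undefined ? undefined : hookByName.get(name);
    if (hook === undefined) {
      rewritten.push(item); // orphan output or tool without a hook: pass through
      continue;
    }
    let raw: unknown = item.output;
    try {
      raw = JSON.parse(item.output);
    } catch {
      // Non-JSON output is handed to the hook as the raw string.
    }
    let next: unknown;
    try {
      next = await hook(raw);
    } catch (err) {
      // Keep the caller's output next to the error so the two can be told apart.
      next = { error: err instanceof Error ? err.message : String(err), originalOutput: raw };
    }
    rewritten.push({ ...item, output: JSON.stringify(next ?? null) });
    changed = true;
  }
  // Return the same array reference when nothing was rewritten.
  return changed ? rewritten : input;
}
```

Returning the original array when no item changed lets a caller detect a no-op with a reference comparison.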

@perry-the-pr-reviewer perry-the-pr-reviewer Bot left a comment

Posting as COMMENT: maintainer app returned 403 on APPROVE for this repo. Contents of the review are a positive (LGTM) verdict.

LGTM ✅

Clean implementation of HITL (human-in-the-loop) tools as a fourth client-tool kind. Design is coherent, execution semantics are well-specified, and the follow-up commit closes the exact gaps I'd have called out on the first commit.

What's shipping

  • New HITLTool / HITLToolFunction types; tool() discriminates on onToolCalled
  • onToolCalled returns value → short-circuit (execute-like); returns null → pause (manual-like)
  • Optional onResponseReceived post-processes caller-supplied FunctionCallOutputItem before the model sees it
  • ConversationStatus gains 'awaiting_hitl', parallel to 'awaiting_approval'
  • New guards: isHITLTool, isAutoResolvableTool, isManualTool (last was previously unexported)

What convinced me

  1. Guard ordering — in both tool() (L295) and executeTool (L401), HITL is checked before the execute === false / hasExecuteFunction branches. Essential since HITL configs have no execute field and would otherwise match manual-tool shape.
  2. Blast-radius migration is complete — every hasExecuteFunction call site in tool-orchestrator.ts (L81, L103) and model-result.ts (L611, L623, L639, L808, L1475) that gates "can this be auto-resolved" has been switched to isAutoResolvableTool. Remaining hasExecuteFunction uses are definitional or intentionally scoped (e.g. executeTool L405 post-HITL routing).
  3. applyOnResponseReceivedHooks defensive coding — non-array input returns unchanged; orphan outputs (no matching function_call) pass through; non-JSON raw output is fed to the hook as-is; a thrown hook preserves the caller's original output alongside the error so the model can distinguish hook failure from tool error; returns same array reference when nothing changed.
  4. Resume-side hook scoping (Fix #8): hookFreshToolOutputs runs onResponseReceived only on fresh outputs on resume, never on outputs already persisted in message history. Avoids double-hooking. Locked in by a dedicated test.
  5. Approved-HITL-that-pauses (Fix #9): processApprovalDecisions tracks hitlPausedIds; status precedence awaiting_approval > awaiting_hitl > in_progress is correct. Locked in by test.
  6. 16 tests cover factory guards, executeHITLTool (value/null/throw/schema), dispatcher routing, applyOnResponseReceivedHooks (transform / passthrough / throw-with-originalOutput / non-JSON / orphan / no-hook), integration through ModelResult (auto-resolve, pause, resume transform), and state-machine pins for fixes #1/#2/#8/#9.
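
The status precedence from item 5 can be written down as a tiny function. This is a sketch only: `resolveStatusSketch` and its parameters are hypothetical, and the real ConversationStatus presumably has more members than the three named in this review.

```typescript
// Only the three statuses named in the review; the real union may be wider.
type ConversationStatusSketch = 'awaiting_approval' | 'awaiting_hitl' | 'in_progress';

// Precedence: awaiting_approval > awaiting_hitl > in_progress.
function resolveStatusSketch(
  pendingApprovalIds: string[],
  hitlPausedIds: string[],
): ConversationStatusSketch {
  if (pendingApprovalIds.length > 0) return 'awaiting_approval';
  if (hitlPausedIds.length > 0) return 'awaiting_hitl';
  return 'in_progress';
}
```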

CI

lint, typecheck, unit-tests, e2e-tests, Prepare all green on HEAD dc2ba48. One Agent check is in_progress but non-blocking.

Minor observations (non-blocking, no changes requested)

  • isHITLTool uses 'onToolCalled' in tool.function && typeof … === 'function' — correct, matches the tool() runtime discriminator exactly.
  • applyOnResponseReceivedHooks short-circuits on hookByName.size === 0 (no hooks to apply) — keeps this hot path O(1) when no HITL tool has a hook.
  • The InputsArrayItem element type avoids an as cast on rewrite — nice.

Nothing blocking. Shipping it.

Copilot AI left a comment

Pull request overview

This PR introduces a new human-in-the-loop (HITL) client tool variant that can either auto-resolve tool calls via an onToolCalled hook or pause execution for a human, plus an onResponseReceived hook to post-process tool outputs before they’re shown to the model.

Changes:

  • Adds HITL tool types/guards (isHITLTool, isAutoResolvableTool) and updates manual-tool detection to exclude HITL tools.
  • Extends the tool execution pipeline to support HITL pause semantics (executeTool can return null) and applies onResponseReceived output rewriting during init/resume flows.
  • Adds extensive unit/integration coverage for HITL behavior, including pause/resume and state-machine transitions.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

File | Description
packages/agent/tests/unit/hitl-tool.test.ts | Adds comprehensive unit + integration tests covering HITL creation, dispatch, pause/resume, and hook behavior.
packages/agent/src/lib/tool.ts | Adds HITL tool() overload and runtime factory branch discriminated by onToolCalled.
packages/agent/src/lib/tool-types.ts | Defines HITL tool interfaces/types, updates guards, and adds awaiting_hitl conversation status.
packages/agent/src/lib/tool-orchestrator.ts | Updates tool-loop gating to use isAutoResolvableTool (now includes HITL).
packages/agent/src/lib/tool-executor.ts | Adds executeHITLTool, updates executeTool to return nullable result, and introduces applyOnResponseReceivedHooks.
packages/agent/src/lib/model-result.ts | Integrates HITL pausing into ModelResult's state machine and applies onResponseReceived hooks during init/resume.
packages/agent/src/index.ts | Exports new HITL types/guards and re-exports isManualTool.

Comment thread packages/agent/src/lib/tool-executor.ts Outdated
Comment thread packages/agent/src/lib/tool-orchestrator.ts
Comment thread packages/agent/src/lib/tool.ts
Comment thread packages/agent/src/lib/model-result.ts Outdated
Comment thread packages/agent/src/lib/model-result.ts Outdated
- Preserve content-array FunctionCallOutputItem shapes in the HITL
  onResponseReceived pipeline instead of JSON.stringify'ing them; export
  isContentArray from conversation-state for reuse.
- Require outputSchema on HITL tool configs and use it to validate
  caller-supplied responses. Validation failures (and thrown hooks)
  replace the output with {error, originalOutput}; executeHITLTool now
  validates onToolCalled's return unconditionally.
- Stop re-processing historical function_call_output items at initStream.
  When resuming from saved state, only freshly-supplied input items are
  hooked; history's function_call items still power callId->name
  resolution. Refactored hookFreshToolOutputs into applyHooksToFreshItems.
- Drop the onResponseReceived call on SDK-generated outputs in
  continueWithUnsentResults. The hook is now strictly for caller-supplied
  outputs, matching its documented semantics.
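
The fresh-items-only rule can be sketched as a selection step: history contributes function_call items for callId-to-name resolution, but only outputs in the freshly supplied input are eligible for hooking. `selectFreshHookTargets`, `HistoryItem`, and `HookTarget` are hypothetical names for illustration and do not mirror applyHooksToFreshItems.

```typescript
// Hypothetical, simplified item shapes.
type HistoryItem =
  | { type: 'function_call'; callId: string; name: string }
  | { type: 'function_call_output'; callId: string; output: string };

interface HookTarget {
  callId: string;
  toolName: string;
}

function selectFreshHookTargets(
  history: HistoryItem[],
  fresh: HistoryItem[],
  hitlToolNames: Set<string>,
): HookTarget[] {
  // function_call items from both history and fresh input resolve callId -> name.
  const nameByCallId = new Map<string, string>();
  for (const item of [...history, ...fresh]) {
    if (item.type === 'function_call') nameByCallId.set(item.callId, item.name);
  }

  // Only outputs in the fresh input are candidates; historical outputs are skipped.
  const targets: HookTarget[] = [];
  for (const item of fresh) {
    if (item.type !== 'function_call_output') continue;
    const toolName = nameByCallId.get(item.callId);
    if (toolName !== undefined && hitlToolNames.has(toolName)) {
      targets.push({ callId: item.callId, toolName });
    }
  }
  return targets;
}
```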
applyOnResponseReceivedHooks and executeToolRound grew cyclomatic
complexity above 15 when HITL logic landed, tripping sentrux gate
(complex functions 9 -> 11). Extract per-item helpers so the gate
matches baseline again:

- applyOnResponseReceivedHooks: move per-item hook/validate logic into
  computeHitlItemOutput + invokeOnResponseReceived; split map builders
  into buildHitlToolMap / buildCallIdToNameMap; parseRawFunctionCallOutput
  replaces the inline try/JSON.parse branch.
- executeToolRound: move the output-for-model branching into
  computeToolOutputForModel; describeNonRecord factors out the nested
  typeof/Array.isArray ternary.

Pure refactor — no behavior change. All 280 unit tests still pass;
sentrux gate reports "No degradation detected".
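
For readers unfamiliar with the extracted parsing step, a helper of this kind might look like the following. `parseRawOutputSketch` is a hypothetical stand-in, not the actual parseRawFunctionCallOutput, and it assumes string outputs only.

```typescript
// Hypothetical stand-in: parse JSON when possible, otherwise keep the raw string.
function parseRawOutputSketch(output: string): unknown {
  try {
    return JSON.parse(output);
  } catch {
    // Non-JSON output is passed along unchanged.
    return output;
  }
}
```

Pulling the try/JSON.parse branch into its own function removes one decision point from the per-item loop, which is the kind of change a cyclomatic-complexity gate rewards.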
@mattapperson mattapperson merged commit c649c9c into main May 6, 2026
5 checks passed
3 participants