feat(tools): add human-in-the-loop tool type #38
Adds a fourth client tool kind that extends manual-tool semantics with two async hooks. `onToolCalled` runs when the model invokes the tool — returning a value short-circuits like `execute`, returning `null` pauses the loop like a manual tool. `onResponseReceived` runs on a later turn when an incoming `FunctionCallOutputItem` matches (by callId → function_call.name) a HITL tool, letting the tool post-process caller-supplied results before the model sees them. Keeps HITL control flow local to the tool definition instead of smeared across the caller.
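The value-versus-`null` contract of `onToolCalled` can be sketched in isolation. This is a minimal illustration, not the PR's implementation: `OnToolCalled`, `CallOutcome`, `dispatchHitlCall`, and `refundTool` are hypothetical names, and the real `HITLTool` type carries more fields (schemas, `onResponseReceived`).

```typescript
// Hypothetical, simplified hook shape; not the real HITLTool type.
type OnToolCalled<In, Out> = (input: In) => Promise<Out | null>;

type CallOutcome<Out> =
  | { kind: 'resolved'; output: Out } // hook answered programmatically
  | { kind: 'paused' };               // hook returned null: wait for a human

// Run the hook once and translate its return value into an outcome.
async function dispatchHitlCall<In, Out>(
  onToolCalled: OnToolCalled<In, Out>,
  input: In,
): Promise<CallOutcome<Out>> {
  const result = await onToolCalled(input);
  return result === null
    ? { kind: 'paused' }
    : { kind: 'resolved', output: result };
}

// Illustrative hook: approve small refunds automatically, pause on large ones.
const refundTool: OnToolCalled<{ amount: number }, { approved: boolean }> =
  async ({ amount }) => (amount <= 50 ? { approved: true } : null);
```

A `resolved` outcome would feed the model like an `execute` result; a `paused` outcome would leave the call pending for the caller.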
- Route auto-resolvable checks through isAutoResolvableTool so pure-HITL turns actually enter the execution loop and invoke onToolCalled.
- Propagate HITL pauses out of executeToolRound; the outer loop now persists pending calls under a new awaiting_hitl status and returns before issuing a follow-up request with missing outputs.
- Scope onResponseReceived hooks to freshly-supplied outputs on resume so caller-supplied outputs hooked at init aren't re-hooked.
- Preserve paused HITL calls in pendingToolCalls when they occur during approval resume instead of silently dropping them.
- Include originalOutput alongside error when onResponseReceived throws, so the model can distinguish hook failure from tool-reported error.
- Replace unsafe InputsUnion cast with a structurally-typed rewritten array.
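One way to picture the pause propagation described above is a round that separates resolved outputs from paused calls and reports a status. This is a sketch under stated assumptions: `runToolRound`, `ToolCall`, `ToolOutput`, and `Handler` are hypothetical, handlers are synchronous for brevity, and the real `executeToolRound` does considerably more.

```typescript
// Hypothetical, simplified shapes for illustration only.
type ToolCall = { callId: string; name: string; args: unknown };
type ToolOutput = { callId: string; output: string };
// A handler returns a JSON-serializable value to resolve the call, or null to pause it.
type Handler = (args: unknown) => object | string | number | boolean | null;

type RoundResult = {
  outputs: ToolOutput[];
  pendingToolCalls: ToolCall[];
  status: 'in_progress' | 'awaiting_hitl';
};

function runToolRound(calls: ToolCall[], handlers: Map<string, Handler>): RoundResult {
  const outputs: ToolOutput[] = [];
  const pendingToolCalls: ToolCall[] = [];
  for (const call of calls) {
    const handler = handlers.get(call.name);
    // Assumption: a call with no handler is treated as paused, like a manual tool.
    const result = handler ? handler(call.args) : null;
    if (result === null) {
      pendingToolCalls.push(call);
    } else {
      outputs.push({ callId: call.callId, output: JSON.stringify(result) });
    }
  }
  // Any paused call means the caller must not send a follow-up request yet.
  const status = pendingToolCalls.length > 0 ? 'awaiting_hitl' : 'in_progress';
  return { outputs, pendingToolCalls, status };
}
```

Returning the pending calls alongside the status is what lets an outer loop persist them instead of issuing a request with missing outputs.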
Posting as COMMENT: maintainer app returned 403 on APPROVE for this repo. Contents of the review are a positive (LGTM) verdict.
LGTM ✅
Clean implementation of HITL (human-in-the-loop) tools as a fourth client-tool kind. Design is coherent, execution semantics are well-specified, and the follow-up commit closes the exact gaps I'd have called out on the first commit.
What's shipping
- New `HITLTool` / `HITLToolFunction` types; `tool()` discriminates on `onToolCalled`
- `onToolCalled` returns value → short-circuit (execute-like); returns `null` → pause (manual-like)
- Optional `onResponseReceived` post-processes caller-supplied `FunctionCallOutputItem` before the model sees it
- `ConversationStatus` gains `'awaiting_hitl'`, parallel to `'awaiting_approval'`
- New guards: `isHITLTool`, `isAutoResolvableTool`, `isManualTool` (last was previously unexported)
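The relationship between the three guards can be sketched as follows. The `sketch*` functions are hypothetical re-implementations on a reduced `ToolLike` shape, not the exported guards; in particular, "auto-resolvable means has an `execute` function or is HITL" is an assumption drawn from the descriptions in this PR.

```typescript
// Hypothetical minimal tool shape; the real ClientTool union is richer.
type ToolLike = {
  function: {
    name: string;
    execute?: unknown;
    onToolCalled?: unknown;
  };
};

// HITL is discriminated by the presence of an onToolCalled function.
const sketchIsHITLTool = (tool: ToolLike): boolean =>
  'onToolCalled' in tool.function &&
  typeof tool.function.onToolCalled === 'function';

const sketchHasExecuteFunction = (tool: ToolLike): boolean =>
  typeof tool.function.execute === 'function';

// Assumption: a tool can be auto-resolved if it has execute or is HITL.
const sketchIsAutoResolvable = (tool: ToolLike): boolean =>
  sketchHasExecuteFunction(tool) || sketchIsHITLTool(tool);

// Manual tools have no execute function and must not be HITL.
const sketchIsManualTool = (tool: ToolLike): boolean =>
  !sketchHasExecuteFunction(tool) && !sketchIsHITLTool(tool);
```

Checking HITL first matters because a HITL config has no `execute` field and would otherwise look like a manual tool.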
What convinced me
- Guard ordering: in both `tool()` (L295) and `executeTool` (L401), HITL is checked before the `execute === false` / `hasExecuteFunction` branches. Essential since HITL configs have no `execute` field and would otherwise match manual-tool shape.
- Blast-radius migration is complete: every `hasExecuteFunction` call site in `tool-orchestrator.ts` (L81, L103) and `model-result.ts` (L611, L623, L639, L808, L1475) that gates "can this be auto-resolved" has been switched to `isAutoResolvableTool`. Remaining `hasExecuteFunction` uses are definitional or intentionally scoped (e.g. `executeTool` L405 post-HITL routing).
- `applyOnResponseReceivedHooks` defensive coding: non-array input returns unchanged; orphan outputs (no matching `function_call`) pass through; non-JSON raw output is fed to the hook as-is; a thrown hook preserves the caller's original output alongside the error so the model can distinguish hook failure from tool error; returns same array reference when nothing changed.
- Resume-side hook scoping (Fix #8): `hookFreshToolOutputs` runs `onResponseReceived` only on fresh outputs on resume, never on outputs already persisted in message history. Avoids double-hooking. Locked in by a dedicated test.
- Approved-HITL-that-pauses (Fix #9): `processApprovalDecisions` tracks `hitlPausedIds`; status precedence `awaiting_approval` > `awaiting_hitl` > `in_progress` is correct. Locked in by test.
- 16 tests cover factory guards, `executeHITLTool` (value/null/throw/schema), dispatcher routing, `applyOnResponseReceivedHooks` (transform / passthrough / throw-with-originalOutput / non-JSON / orphan / no-hook), integration through ModelResult (auto-resolve, pause, resume transform), and state-machine pins for fixes #1/#2/#8/#9.
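The status precedence noted above can be written down as a small pure function. This is a sketch only: `resolveStatus` and `pendingApprovalIds` are hypothetical names (only `hitlPausedIds` appears in the review), and the real `ConversationStatus` type may include further states.

```typescript
// Hypothetical subset of the conversation statuses discussed in the review.
type StatusSketch = 'awaiting_approval' | 'awaiting_hitl' | 'in_progress';

// Precedence: awaiting_approval > awaiting_hitl > in_progress.
function resolveStatus(
  pendingApprovalIds: ReadonlySet<string>,
  hitlPausedIds: ReadonlySet<string>,
): StatusSketch {
  if (pendingApprovalIds.size > 0) return 'awaiting_approval';
  if (hitlPausedIds.size > 0) return 'awaiting_hitl';
  return 'in_progress';
}
```

With this ordering, an approved HITL call that then pauses still surfaces as `awaiting_hitl` once no approvals remain outstanding.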
CI
lint, typecheck, unit-tests, e2e-tests, Prepare all green on HEAD dc2ba48. One Agent check is in_progress but non-blocking.
Minor observations (non-blocking, no changes requested)
- `isHITLTool` uses `'onToolCalled' in tool.function && typeof … === 'function'`: correct, matches the `tool()` runtime discriminator exactly.
- `applyOnResponseReceivedHooks` short-circuits on `hookByName.size === 0` (no hooks to apply), which keeps this hot path O(1) when no HITL tool has a hook.
- The `InputsArrayItem` element type avoids an `as` cast on rewrite. Nice.
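The early return and the same-reference behavior mentioned in these observations might look roughly like this. It is a simplified sketch: `rewriteToolOutputs`, `InputItem`, and `OutputHook` are hypothetical, hooks are synchronous here for brevity, and the item shapes are reduced to the fields needed for `callId → name` resolution.

```typescript
// Hypothetical, simplified item shapes; real input items carry more fields.
type InputItem =
  | { type: 'function_call'; callId: string; name: string }
  | { type: 'function_call_output'; callId: string; output: string }
  | { type: 'message'; content: string };

// Synchronous hook for brevity; the hooks in the PR are async.
type OutputHook = (output: string) => string;

function rewriteToolOutputs(
  items: InputItem[],
  hookByName: Map<string, OutputHook>,
): InputItem[] {
  // No hooks registered: return immediately without scanning the items.
  if (hookByName.size === 0) return items;

  // Resolve callId -> tool name from the function_call items.
  const nameByCallId = new Map<string, string>();
  for (const item of items) {
    if (item.type === 'function_call') nameByCallId.set(item.callId, item.name);
  }

  const rewritten: InputItem[] = [];
  let changed = false;
  for (const item of items) {
    const name =
      item.type === 'function_call_output' ? nameByCallId.get(item.callId) : undefined;
    const hook = name === undefined ? undefined : hookByName.get(name);
    if (item.type !== 'function_call_output' || hook === undefined) {
      rewritten.push(item); // orphan output, unhooked tool, or non-output item
      continue;
    }
    rewritten.push({ ...item, output: hook(item.output) });
    changed = true;
  }
  // Hand back the original array when nothing was rewritten.
  return changed ? rewritten : items;
}
```

Returning the original reference lets callers detect "nothing changed" with a plain identity check.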
Nothing blocking. Shipping it.
Pull request overview
This PR introduces a new human-in-the-loop (HITL) client tool variant that can either auto-resolve tool calls via an onToolCalled hook or pause execution for a human, plus an onResponseReceived hook to post-process tool outputs before they’re shown to the model.
Changes:
- Adds HITL tool types/guards (`isHITLTool`, `isAutoResolvableTool`) and updates manual-tool detection to exclude HITL tools.
- Extends the tool execution pipeline to support HITL pause semantics (`executeTool` can return `null`) and applies `onResponseReceived` output rewriting during init/resume flows.
- Adds extensive unit/integration coverage for HITL behavior, including pause/resume and state-machine transitions.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/agent/tests/unit/hitl-tool.test.ts | Adds comprehensive unit + integration tests covering HITL creation, dispatch, pause/resume, and hook behavior. |
| packages/agent/src/lib/tool.ts | Adds HITL tool() overload and runtime factory branch discriminated by onToolCalled. |
| packages/agent/src/lib/tool-types.ts | Defines HITL tool interfaces/types, updates guards, and adds awaiting_hitl conversation status. |
| packages/agent/src/lib/tool-orchestrator.ts | Updates tool-loop gating to use isAutoResolvableTool (now includes HITL). |
| packages/agent/src/lib/tool-executor.ts | Adds executeHITLTool, updates executeTool to return nullable result, and introduces applyOnResponseReceivedHooks. |
| packages/agent/src/lib/model-result.ts | Integrates HITL pausing into ModelResult’s state machine and applies onResponseReceived hooks during init/resume. |
| packages/agent/src/index.ts | Exports new HITL types/guards and re-exports isManualTool. |
- Preserve content-array FunctionCallOutputItem shapes in the HITL
onResponseReceived pipeline instead of JSON.stringify'ing them; export
isContentArray from conversation-state for reuse.
- Require outputSchema on HITL tool configs and use it to validate
caller-supplied responses. Validation failures (and thrown hooks)
replace the output with {error, originalOutput}; executeHITLTool now
validates onToolCalled's return unconditionally.
- Stop re-processing historical function_call_output items at initStream.
When resuming from saved state, only freshly-supplied input items are
hooked; history's function_call items still power callId->name
resolution. Refactored hookFreshToolOutputs into applyHooksToFreshItems.
- Drop the onResponseReceived call on SDK-generated outputs in
continueWithUnsentResults. The hook is now strictly for caller-supplied
outputs, matching its documented semantics.
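The validate-then-wrap behavior described in this commit can be approximated as below. This is a sketch under several assumptions: `computeOutput`, `parseRaw`, `Validator`, and `Hook` are hypothetical names, a boolean predicate stands in for the `outputSchema` check, the hook is synchronous, the error message text is invented, and content-array outputs are not modeled.

```typescript
// Hypothetical stand-ins: a predicate instead of a real outputSchema, a sync hook.
type Validator = (value: unknown) => boolean;
type Hook = (value: unknown) => unknown;

// Parse JSON when possible; otherwise hand the raw string on unchanged.
function parseRaw(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return raw;
  }
}

function computeOutput(raw: string, validate: Validator, hook?: Hook): string {
  const original = parseRaw(raw);
  try {
    const value = hook ? hook(original) : original;
    if (!validate(value)) {
      // Validation failure: replace the output but keep what the caller sent.
      return JSON.stringify({
        error: 'output failed schema validation',
        originalOutput: original,
      });
    }
    // Valid outputs with no hook are forwarded verbatim.
    return hook ? JSON.stringify(value) : raw;
  } catch (err) {
    // Thrown hook: same {error, originalOutput} envelope.
    const message = err instanceof Error ? err.message : String(err);
    return JSON.stringify({ error: message, originalOutput: original });
  }
}
```

Keeping `originalOutput` next to `error` is what lets the model tell a hook or validation failure apart from an error reported by the tool itself.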
applyOnResponseReceivedHooks and executeToolRound grew cyclomatic complexity above 15 when HITL logic landed, tripping sentrux gate (complex functions 9 -> 11). Extract per-item helpers so the gate matches baseline again:
- applyOnResponseReceivedHooks: move per-item hook/validate logic into computeHitlItemOutput + invokeOnResponseReceived; split map builders into buildHitlToolMap / buildCallIdToNameMap; parseRawFunctionCallOutput replaces the inline try/JSON.parse branch.
- executeToolRound: move the output-for-model branching into computeToolOutputForModel; describeNonRecord factors out the nested typeof/Array.isArray ternary.
Pure refactor, no behavior change. All 280 unit tests still pass; sentrux gate reports "No degradation detected".
Summary
- Two async hooks: `onToolCalled` (decides per-call whether to respond programmatically or pause for a human) and `onResponseReceived` (post-processes the caller-supplied result before the model sees it).
- `onToolCalled` returning a value short-circuits the call like a regular `execute`; returning `null` pauses the loop like a manual tool, surfacing the function_call to the caller for manual resume.
- `onResponseReceived` fires on a later turn when an incoming `FunctionCallOutputItem` corresponds (by `callId → function_call.name`) to a HITL tool. The returned value replaces the `output` sent to the model; throwing becomes `{"error": ...}`.
Design decisions
- `onToolCalled` on the config (no `execute` field at all)
- `onToolCalled` returning `null`
- `onResponseReceived` trigger: `input`, map each `function_call_output.callId` to its originating `function_call.name`, dispatch to the matching tool
- `{error}` output sent to the model
What changed
- `packages/agent/src/lib/tool-types.ts`: `HITLToolFunction`, `HITLTool`, new guards (`isHITLTool`, `isAutoResolvableTool`); `isManualTool` tightened to exclude HITL; `ClientTool` union widened.
- `packages/agent/src/lib/tool.ts`: new factory overload + runtime branch (ordered before the `execute: false` check).
- `packages/agent/src/lib/tool-executor.ts`: `executeHITLTool`; `executeTool` dispatcher now returns `ToolExecutionResult | null`; new `applyOnResponseReceivedHooks` helper that walks input items and rewrites tool-output entries.
- `packages/agent/src/lib/tool-orchestrator.ts`: `hasExecuteFunction` gates replaced with `isAutoResolvableTool`.
- `packages/agent/src/lib/model-result.ts`: same gate replacement in three sites; HITL `null` handled as a new `paused` branch (no output, no broadcast); `applyOnResponseReceivedHooks` invoked on the initial-send input and on the resume-send input (but not on follow-up sends of our own tool outputs).
- `packages/agent/src/index.ts`: exports `HITLTool`, `HITLToolFunction`, `isHITLTool`, `isAutoResolvableTool`, and `isManualTool` (previously unexported).
- `packages/agent/tests/unit/hitl-tool.test.ts`: 16 tests covering factory, guards, dispatcher, short-circuit, pause, transform, throw, non-JSON, orphan output, and missing-hook passthrough.
Example
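A minimal, self-contained sketch of a pause followed by a resume, assuming a plain object in place of the real `tool()` factory. `confirmDeploy`, `DeployInput`, `DeployOutput`, and `demo` are hypothetical names chosen for illustration; only the two hook names come from this PR.

```typescript
// Hypothetical stand-in types; this does not use the real tool() factory.
type DeployInput = { env: string };
type DeployOutput = { approved: boolean; reviewer: string };

const confirmDeploy = {
  name: 'confirm_deploy',
  // Staging deploys are answered programmatically; anything else waits for a human.
  onToolCalled: async (input: DeployInput): Promise<DeployOutput | null> =>
    input.env === 'staging' ? { approved: true, reviewer: 'auto' } : null,
  // Normalize whatever the caller sends back before the model sees it.
  onResponseReceived: async (
    raw: { approved: boolean; reviewer?: string },
  ): Promise<DeployOutput> => ({
    approved: raw.approved,
    reviewer: raw.reviewer ?? 'unknown',
  }),
};

async function demo(): Promise<string[]> {
  const log: string[] = [];
  // Turn 1: the model calls the tool for production, so the hook pauses.
  const first = await confirmDeploy.onToolCalled({ env: 'production' });
  log.push(first === null ? 'paused' : 'resolved');
  // Turn 2: the caller supplies the human's answer; the hook post-processes it.
  const final = await confirmDeploy.onResponseReceived({ approved: true });
  log.push(JSON.stringify(final));
  return log;
}
```

In the real SDK the two turns would be driven by the model loop and the caller's resume input rather than by direct hook calls.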
Test plan
- `pnpm --filter @openrouter/agent exec tsc --noEmit`: no type errors
- `pnpm --filter @openrouter/agent run lint`: no lint errors
- `pnpm --filter @openrouter/agent test`: 264/264 tests pass (16 new HITL tests, no regressions in 248 pre-existing tests)
- `pnpm turbo run build --filter=@openrouter/agent`: build succeeds