feat: KG-driven model routing with provider probing #761
Conversation
Create 10 ADF routing rule markdown files with route/action/priority/synonyms directives for KG-based agent dispatch. Add action:: directive to RouteDirective for CLI command templates. Support multiple route/action pairs per file with backward-compatible route field. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
KgRouter loads routing rules from markdown taxonomy directory, builds thesaurus from synonyms, and uses terraphim_automata::find_matches for Aho-Corasick pattern matching against agent task descriptions. Returns KgRouteDecision with provider, model, action template, confidence, and ordered fallback routes. Supports health-aware fallback via first_healthy_route() and template rendering via render_action(). Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
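The health-aware fallback mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in kg_router.rs: the `Route` struct, the `unhealthy` set, and this signature of `first_healthy_route` are assumptions made for the example, and the provider and model strings used with it are placeholders.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified route. The real KgRouteDecision also carries
/// an action template and a confidence value.
#[derive(Debug, Clone, PartialEq)]
pub struct Route {
    pub provider: String,
    pub model: String,
}

/// Return the first route, in the given (priority) order, whose provider
/// is not in the unhealthy set. Returns None when every provider is down.
pub fn first_healthy_route<'a>(
    routes: &'a [Route],
    unhealthy: &HashSet<String>,
) -> Option<&'a Route> {
    routes.iter().find(|r| !unhealthy.contains(&r.provider))
}
```

With routes ordered primary first, marking the primary provider unhealthy makes the function return the next entry in the fallback chain.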
Add provider_probe.rs with ProviderHealthMap using CircuitBreaker from terraphim_spawner::health. Probes CLI tools via action:: templates from KG rules, measures latency, saves pi-benchmark compatible JSON results. Wire KG router into spawn_agent(): KG routing tried first (Aho-Corasick synonym match), with health-aware fallback skipping unhealthy providers. Falls back to existing keyword RoutingEngine when no KG match found. Add [routing] config section to OrchestratorConfig with taxonomy_path, probe_ttl_secs, probe_results_dir, and probe_on_startup fields. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
KgRouter now tracks the latest mtime of .md files in the taxonomy directory. reload_if_changed() compares current mtime against cached value and rebuilds the Aho-Corasick automaton if files have been modified. Called on the orchestrator's reconciliation tick for zero-restart routing updates. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
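A rough sketch of the mtime check, using only the standard library. `TaxonomyWatcher` and `latest_md_mtime` are hypothetical names for illustration; the real KgRouter would rebuild its automaton where this sketch merely returns `true`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Latest modification time among the `.md` files directly inside `dir`,
/// or None when the directory contains no markdown files.
pub fn latest_md_mtime(dir: &Path) -> io::Result<Option<SystemTime>> {
    let mut latest: Option<SystemTime> = None;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) == Some("md") {
            let mtime = fs::metadata(&path)?.modified()?;
            if latest.map_or(true, |l| mtime > l) {
                latest = Some(mtime);
            }
        }
    }
    Ok(latest)
}

/// Hypothetical stand-in for the mtime-tracking part of KgRouter.
pub struct TaxonomyWatcher {
    dir: PathBuf,
    cached: Option<SystemTime>,
}

impl TaxonomyWatcher {
    pub fn new(dir: PathBuf) -> io::Result<Self> {
        let cached = latest_md_mtime(&dir)?;
        Ok(Self { dir, cached })
    }

    /// Returns true when the latest mtime differs from the cached value.
    /// The caller would rebuild the matcher in that case.
    pub fn reload_if_changed(&mut self) -> io::Result<bool> {
        let current = latest_md_mtime(&self.dir)?;
        if current != self.cached {
            self.cached = current;
            return Ok(true);
        }
        Ok(false)
    }
}
```

Comparing only the newest mtime keeps the per-tick cost to one directory scan, at the price of missing a deletion of a file that was not the most recently modified one.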
Documentation PreviewYour documentation changes have been deployed to: This preview will be available until the PR is closed. |
Fix D-1: replace deprecated std::io::Error::new(ErrorKind::Other, e) with std::io::Error::other(e) in provider_probe.rs. Add verification and validation report from V-model right-side review. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
D-2: probe_all() called on startup when probe_on_startup=true, and re-probed in reconcile_tick when cached results expire (TTL-based). Saves JSON results to configured probe_results_dir. D-3: ExitClassifier ModelError/RateLimit feeds record_failure() into provider circuit breaker. Success/EmptySuccess feeds record_success(). D-4: reload_if_changed() called every reconcile_tick, checks mtime of markdown files and rebuilds Aho-Corasick automaton if changed. D-5: Use sh -c for action template execution instead of split_whitespace, matching CommandStep::Shell pattern in tinyclaw. Handles quoted arguments correctly. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
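To illustrate D-5: a small sketch of why `split_whitespace` mangles quoted arguments and how handing the whole string to `sh -c` avoids that. The helper names `naive_tokens` and `shell_command` are hypothetical, and the command string used with them is a placeholder rather than a real action template.

```rust
use std::process::Command;

/// Naive tokenization: splits inside quotes, so a quoted prompt such as
/// "say hello" ends up as two separate arguments with stray quote marks.
pub fn naive_tokens(rendered: &str) -> Vec<&str> {
    rendered.split_whitespace().collect()
}

/// Pass the whole rendered template to the shell as a single argument and
/// let the shell do the quote handling.
pub fn shell_command(rendered: &str) -> Command {
    let mut cmd = Command::new("sh");
    cmd.arg("-c").arg(rendered);
    cmd
}
```

For an input like `tool run "say hello"`, `naive_tokens` yields `"say` and `hello"` as separate items, whereas `shell_command` keeps the string intact for the shell to parse.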
The probe's sh -c doesn't have ~/.local/bin, ~/.bun/bin, ~/.cargo/bin on PATH where opencode and claude live. Use bash -lc (login shell) to source the user profile, matching the systemd ExecStart pattern. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
Replace bash -lc (which fails if .profile has errors) with bash -c plus explicit PATH prepend of ~/.local/bin, ~/.bun/bin, ~/bin, ~/.cargo/bin, ~/go/bin. Avoids broken .profile sourcing while ensuring CLI tools are discoverable. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
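A minimal sketch of the explicit PATH prepend, assuming the five directories listed above are resolved relative to the user's home directory. `prepend_user_bins` is a hypothetical helper, not the function in provider_probe.rs.

```rust
/// Directories under $HOME named in the commit message above.
const EXTRA_BIN_DIRS: [&str; 5] = [".local/bin", ".bun/bin", "bin", ".cargo/bin", "go/bin"];

/// Build a PATH value with the user bin directories placed in front of the
/// existing PATH. An empty `current_path` contributes no trailing segment.
pub fn prepend_user_bins(home: &str, current_path: &str) -> String {
    let mut parts: Vec<String> = EXTRA_BIN_DIRS
        .iter()
        .map(|d| format!("{home}/{d}"))
        .collect();
    if !current_path.is_empty() {
        parts.push(current_path.to_string());
    }
    parts.join(":")
}
```

The result could then be applied with `Command::new("bash").arg("-c").arg(cmd).env("PATH", path)`, which avoids sourcing any profile file.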
opencode requires 'run -m provider/model "prompt"' syntax.
All action templates now use {{ model }} placeholder from route
directive instead of hardcoding model names.
Refs #400
Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
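Placeholder substitution for an action template might look like the sketch below. It assumes the placeholders appear exactly as `{{ model }}` and `{{ prompt }}` (with single spaces) and performs no shell escaping; the signature is an illustration and may differ from the real render_action().

```rust
/// Substitute the `{{ model }}` and `{{ prompt }}` placeholders in an
/// action template. Only the exact spelling with single spaces is matched,
/// and the prompt is inserted verbatim (no shell escaping is applied).
pub fn render_action(template: &str, model: &str, prompt: &str) -> String {
    template
        .replace("{{ model }}", model)
        .replace("{{ prompt }}", prompt)
}
```

Because the prompt is inserted verbatim, a prompt containing quotes would need escaping before the rendered string reaches a shell.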
Use absolute paths for opencode (/home/alex/.bun/bin/opencode) and claude (/home/alex/.local/bin/claude). Add --format json to opencode. Replace pay-per-use opencode/ models with subscription providers: gpt-5-nano -> opencode-go/minimax-m2.5, minimax-m2.5-free -> minimax-coding-plan/MiniMax-M2.5. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
Validates 10 rules loaded, every route has action:: template, security_audit matches cargo audit/CVE, reasoning has priority 80, and multi-route fallback chains are present. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
Add e2e test verifying every ADF agent routes to expected provider+model via KG synonym matching. Fix multi-line synonyms: parser requires synonyms:: prefix on each line. All 12 agents route correctly. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
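The multi-line synonyms fix can be pictured with a small parser sketch. It assumes synonyms on a line are comma-separated, which is an assumption of this example; `parse_synonyms` is a hypothetical helper and not the parser in markdown_directives.rs.

```rust
/// Collect synonyms from every line that starts with `synonyms::`.
/// Continuation lines without the prefix are ignored, which is why each
/// line of a multi-line synonym list needs its own prefix.
pub fn parse_synonyms(markdown: &str) -> Vec<String> {
    markdown
        .lines()
        .filter_map(|line| line.trim().strip_prefix("synonyms::"))
        .flat_map(|rest| rest.split(','))
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}
```

A line that merely continues the previous list without repeating `synonyms::` contributes nothing, matching the behaviour the fix works around.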
Expand all 10 routing rules from 2 to 4 routes each:
- Coding tasks: +zai-coding-plan/glm-5-turbo +openai/gpt-5.3-codex
- Reasoning tasks: +zai-coding-plan/glm-5 +openai/gpt-5.4
- Documentation/cost: +zai-coding-plan/glm-5-turbo +openai/gpt-5.4-mini
All subscription providers only (no opencode/ pay-per-use prefix).
E2e test updated: 12/12 agents route correctly with 4 fallbacks.
Refs #400
Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
Probe timeout/error marks provider unhealthy immediately, not after 5 failures. Probe success is authoritative over circuit breaker state. Mixed results: if ANY model succeeds for a provider, provider is healthy. This fixes the bug where kimi timed out in probe (30s) but was still selected as primary because circuit breaker threshold wasn't reached. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
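The "any model succeeds" rule can be sketched as a simple aggregation over probe results. `ProbeOutcome`, `ProbeResult`, and `provider_health` are hypothetical names for this example; the real ProviderHealthMap also involves the circuit breaker, which is left out here.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProbeOutcome {
    Success,
    Timeout,
    Error,
}

/// One probe result per (provider, model) pair; the model is omitted
/// because only the provider matters for the aggregation.
pub struct ProbeResult {
    pub provider: String,
    pub outcome: ProbeOutcome,
}

/// A provider is healthy if at least one of its model probes succeeded.
/// A provider whose probes all timed out or errored is unhealthy at once.
pub fn provider_health(results: &[ProbeResult]) -> HashMap<String, bool> {
    let mut health: HashMap<String, bool> = HashMap::new();
    for r in results {
        let ok = r.outcome == ProbeOutcome::Success;
        let entry = health.entry(r.provider.clone()).or_insert(false);
        *entry = *entry || ok;
    }
    health
}
```

Under this rule a single timed-out probe is enough to mark a provider unhealthy when it has no other successful model, without waiting for repeated failures.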
Replace 10 category-based routing files with 3 tier files:
- planning_tier.md (pri=80): opus for strategic planning, architecture
- review_tier.md (pri=60): haiku for verification, validation, compliance
- implementation_tier.md (pri=50): sonnet for coding, testing, security
KG routing now takes priority over static model config in spawn_agent.
Phase keywords in task text determine tier, not agent name.
E2e test: 13/13 agents route to correct tier:
- 2 agents -> PLANNING (opus): meta-coordinator, product-development
- 5 agents -> REVIEW (haiku): spec-validator, quality-coord, compliance, drift-detector, merge-coordinator
- 6 agents -> IMPLEMENTATION (sonnet): security-sentinel, test-guardian, implementation-swarm, documentation-gen, browser-qa, log-analyst
Refs #400
Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
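Tier selection by phrase match and priority could be sketched as below. A plain substring scan stands in for the Aho-Corasick matcher, the `Tier` type and both functions are hypothetical, and the review and implementation phrases are illustrative guesses; only the priorities (80/60/50) come from the commit message.

```rust
/// Hypothetical tier description: a name, a priority, and trigger phrases.
pub struct Tier {
    pub name: &'static str,
    pub priority: u8,
    pub phrases: &'static [&'static str],
}

/// Illustrative tiers. The phrase lists for review and implementation are
/// assumptions made for this example.
pub fn example_tiers() -> Vec<Tier> {
    vec![
        Tier { name: "planning", priority: 80, phrases: &["create a plan", "strategic planning"] },
        Tier { name: "review", priority: 60, phrases: &["verification", "compliance"] },
        Tier { name: "implementation", priority: 50, phrases: &["implement", "write tests"] },
    ]
}

/// Pick the highest-priority tier with at least one phrase in the task text.
/// Matching is a case-insensitive substring scan (not Aho-Corasick).
pub fn select_tier<'a>(tiers: &'a [Tier], task: &str) -> Option<&'a Tier> {
    let task = task.to_lowercase();
    tiers
        .iter()
        .filter(|t| t.phrases.iter().any(|p| task.contains(*p)))
        .max_by_key(|t| t.priority)
}
```

When a task mentions phrases from several tiers, the priority decides, so a planning phrase outranks a review phrase in the same text.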
When KG tier routing selects a model that uses a different CLI than the agent's static cli_tool (e.g., claude instead of opencode), extract the CLI path from the action:: template and use it for the Provider construction. This enables seamless routing across CLI tools. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
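One simple way to extract the CLI path is to take the first whitespace-separated token of the template. This sketch assumes the executable comes first and its path contains no spaces; `cli_from_action` is a hypothetical helper and the template used with it is a placeholder.

```rust
/// Return the first whitespace-separated token of an action template,
/// which this sketch assumes to be the CLI executable path.
pub fn cli_from_action(action: &str) -> Option<&str> {
    action.split_whitespace().next()
}
```

An empty or whitespace-only template yields `None`, in which case a caller could keep the agent's static cli_tool.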
opencode run completes in ~11s but the full agent lifecycle (init, step_start, tool_use, step_finish, next_step, session_end) can take longer under load. 30s was too tight causing false-positive timeouts for kimi provider. Increase to 60s to match actual completion time. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
Remove ambiguous words (specification, research, design the, blueprint, triage, risk assessment) that appear in issue bodies and cause review agents to escalate to opus. Keep only unambiguous planning phrases like 'create a plan', 'architecture design', 'strategic planning'. Fixes quality-coordinator being routed to opus when reviewing an issue whose body contained planning language. Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
Each non-review agent gets its own git worktree in /tmp/adf-worktrees/ before spawning. Review-tier agents (haiku) skip isolation since they are read-only. Worktrees are cleaned up after agent exit. Flow: create_agent_worktree() -> spawn with worktree as working_dir -> try_commit_agent_work(worktree) -> remove_agent_worktree() Prevents concurrent agents from corrupting each other's working tree. Fail-open: if worktree creation fails, agent uses shared working_dir. Fixes #246 Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
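The fail-open part of this flow might be sketched as follows. The signatures are assumptions for illustration (the real create_agent_worktree() may differ), and `working_dir_for` is a hypothetical helper; the sketch shells out to `git worktree add --detach`.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Try to create a detached worktree at `worktree` for the repository at
/// `repo`. Returns Ok(true) on success, Ok(false) when git exits non-zero.
pub fn create_agent_worktree(repo: &Path, worktree: &Path) -> std::io::Result<bool> {
    let status = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["worktree", "add", "--detach"])
        .arg(worktree)
        .status()?;
    Ok(status.success())
}

/// Decide which directory an agent runs in. Review-tier agents skip
/// isolation; any worktree failure falls back to the shared directory.
pub fn working_dir_for(repo: &Path, worktree: &Path, is_review_tier: bool) -> PathBuf {
    if is_review_tier {
        return repo.to_path_buf();
    }
    match create_agent_worktree(repo, worktree) {
        Ok(true) => worktree.to_path_buf(),
        // Fail-open: git missing or worktree creation failed.
        _ => repo.to_path_buf(),
    }
}
```

After the agent exits, the matching cleanup would be `git worktree remove` on the same path.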
Refs #400 Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
Summary
- route:: + action:: pairs with synonyms:: for Aho-Corasick matching
- terraphim_spawner::health)
- action:: templates for full-stack availability testing
Changes
New files
- docs/taxonomy/routing_scenarios/adf/ -- 10 KG routing rule markdown files
- crates/terraphim_orchestrator/src/kg_router.rs -- KG routing engine
- crates/terraphim_orchestrator/src/provider_probe.rs -- Provider health probing
Modified files
- crates/terraphim_types/src/lib.rs -- RouteDirective.action, MarkdownDirectives.routes
- crates/terraphim_automata/src/markdown_directives.rs -- action:: directive parsing, multi-route support
- crates/terraphim_orchestrator/src/lib.rs -- KG routing in spawn_agent(), health gate
- crates/terraphim_orchestrator/src/config.rs -- [routing] config section
Design
- action:: template uses {{ model }} and {{ prompt }} placeholders
Test plan
- cargo test -p terraphim_automata -- 90 tests (2 new for action/multi-route)
- cargo test -p terraphim_orchestrator -- 374 tests (11 new for kg_router + provider_probe)
- cargo check --workspace -- full workspace compiles
- [routing] config pointing to taxonomy dir
Refs #400
Generated with Terraphim AI