From f440f9108d5f6ca1e45c54429c6bdc241760659a Mon Sep 17 00:00:00 2001
From: Nathan Schram <5553883+nathanschram@users.noreply.github.com>
Date: Wed, 22 Apr 2026 07:02:14 +0000
Subject: [PATCH 1/2] docs(claude.md): note v0.35.3rc1 staging + Claude
 extra_args feature
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Bump unit-test count 2372 → 2387 (reflects #407 +8 test_build_args tests
  and earlier test additions not yet counted).
- Expand test_build_args.py entry 42 → 56 tests with the new coverage areas.
- Add extra_args passthrough feature entry under "Features (vs upstream
  takopi)" — documents the Claude-in-Chrome motivator, reserved-flag list,
  and argv placement (#407, shipped in v0.35.3rc1).

Issue progress tracked in gh#407 comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CLAUDE.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/CLAUDE.md b/CLAUDE.md
index 07fc0d9..2ebb409 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -47,6 +47,7 @@ Untether adds interactive permission control, plan mode support, and several UX
 - **Trigger visibility (Tier 1)** — `/ping` shows per-chat trigger summary (`⏰ triggers: 1 cron (id, 9:00 AM daily (Melbourne))`); run footer shows `⏰ cron:<id>` / `⚡ webhook:<id>` for trigger-initiated runs; new `describe_cron()` utility renders common patterns in plain English
 - **Graceful restart improvements (Tier 1)** — persists Telegram `update_id` to `last_update_id.json` so restarts don't drop/duplicate messages; `Type=notify` systemd integration via stdlib `sd_notify` (`READY=1` + `STOPPING=1`); `RestartSec=2`
 - **`diff_preview` plan bypass (#283)** — after user approves a plan outline via "Pause & Outline Plan", the `_discuss_approved` flag short-circuits diff preview for subsequent Edit/Write tools so no second approval is needed
+- **Claude `extra_args` passthrough (#407, v0.35.3rc1)** — `[claude] extra_args = [...]` lets users supply upstream CLI flags verbatim (mirrors `codex.extra_args`, `pi.extra_args`). Primary motivator: `extra_args = ["--chrome"]` enables Claude-in-Chrome's `mcp__claude-in-chrome__*` tool namespace on a GUI Mac. Flags Untether manages internally (`-p`, `--print`, `--output-format`, `--input-format`, `--resume`/`-r`, `--continue`/`-c`, `--permission-mode`, `--permission-prompt-tool`) are rejected at config-load with a `ConfigError`. User args land on argv after the managed stream-json prelude and before resume / model / effort / allowed-tools / permission flags, preserving the trailing `-p <prompt>` (or stdin prompt under permission-mode) position
 
 See `.claude/skills/claude-stream-json/` and `.claude/rules/control-channel.md` for implementation details.
 
@@ -180,7 +181,7 @@ Rules in `.claude/rules/` auto-load when editing matching files:
 
 ## Tests
 
-2372 unit tests, 80% coverage threshold. Integration testing against `@untether_dev_bot` is **mandatory before every release** — see `docs/reference/integration-testing.md` for the full playbook with per-release-type tier requirements (patch/minor/major). All integration test tiers are fully automated by Claude Code via Telegram MCP tools and Bash.
+2387 unit tests, 80% coverage threshold. Integration testing against `@untether_dev_bot` is **mandatory before every release** — see `docs/reference/integration-testing.md` for the full playbook with per-release-type tier requirements (patch/minor/major). All integration test tiers are fully automated by Claude Code via Telegram MCP tools and Bash.
 
 Key test files:
 
@@ -204,7 +205,7 @@ Key test files:
 - `test_pi_compaction.py` — 6 tests: compaction start/end, aborted, no tokens, sequence
 - `test_proc_diag.py` — 24 tests: format_diag, is_cpu_active, collect_proc_diag (Linux /proc reads), ProcessDiag defaults
 - `test_exec_runner.py` — 22 tests: event tracking (event_count, recent_events ring buffer, PID in StartedEvent meta), JsonlStreamState defaults
-- `test_build_args.py` — 42 tests: CLI argument construction for all 6 engines, model/reasoning/permission flags
+- `test_build_args.py` — 56 tests: CLI argument construction for all 6 engines, model/reasoning/permission flags, Claude `extra_args` argv ordering, permission-mode argv, multi-flag order, `build_runner` parsing, and reserved-flag rejection (#407)
 - `test_telegram_files.py` — 17 tests: file helpers, deduplication, deny globs, default upload paths
 - `test_telegram_file_transfer_helpers.py` — 48 tests: `/file put` and `/file get` command handling, media groups, force overwrite
 - `test_loop_coverage.py` — 29 tests: update loop edge cases, message routing, callback dispatch, shutdown integration

From 90a2df651870a79bb13cda8e783cbc1eb44b44ff Mon Sep 17 00:00:00 2001
From: Nathan Schram <5553883+nathanschram@users.noreply.github.com>
Date: Wed, 22 Apr 2026 08:02:26 +0000
Subject: [PATCH 2/2] =?UTF-8?q?chore(gitignore):=20untrack=20internal=20do?=
 =?UTF-8?q?cs=20=E2=80=94=20audits,=20test=20artifacts,=20handovers?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Public repo hygiene pass. Four classes of file shouldn't be committed
going forward:

1. docs/handover/ (new) — Claude Code handover docs that pass context
   between sessions. Internal-only by nature.
2. docs/audits/ — incident and security audits referencing production
   bot names and internal workflows. GitHub issues/milestones are the
   public tracker.
3. docs/tests/ — per-release integration test plans and execution
   reports. Contain internal bot references, skipped-test notes, and
   QA methodology detail that isn't user-facing. docs/reference/
   integration-testing.md remains the public playbook.
4. incoming/*.md — draft design and feedback markdown uploaded via
   Telegram file-transfer. Auto-named file_*.jpg uploads are ignored too.

Files untracked (contents preserved on disk):
- docs/audits/pitchdocs-context-guard-interference.md
- docs/tests/v0.35.2-integration-test-plan.md
- docs/tests/results/v0.35.2-results.md
- docs/tests/results/v0.35.2rc3-results.md
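
For reference, the untrack-but-keep step can be reproduced in a scratch
repo as sketched below. This assumes the files were dropped from the
index with `git rm --cached`; the exact commands used here are not
recorded, and docs/audits/example.md is a placeholder name.

```shell
# Sketch only: reproduces the untrack-but-keep pattern in a throwaway repo.
# The file name below is a placeholder, not one of the real paths.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.invalid"
git config user.name "Example"

mkdir -p docs/audits
echo "internal notes" > docs/audits/example.md
git add docs
git commit -q -m "track example doc"

# Ignore the directory going forward, then drop it from the index only.
echo "docs/audits/" >> .gitignore
git rm -r -q --cached docs/audits
git add .gitignore
git commit -q -m "untrack internal docs"

# The working copy survives; the index no longer lists the file.
test -f docs/audits/example.md
test -z "$(git ls-files docs/audits)"
git check-ignore -q docs/audits/example.md
```

The last three checks mirror the claims in this message: contents stay
on disk, the paths leave the index, and the new ignore rule matches.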

mkdocs/zensical nav (zensical.toml) doesn't reference any of these
paths, so the docs site build is unaffected.

History note: these files remain visible in git history; removing
them entirely would require a separate BFG/filter-branch pass, which
is out of scope for this PR. The forward-going commitment is that
internal planning/audit/test-artifact content stops being tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .gitignore                                    |  12 +
 .../pitchdocs-context-guard-interference.md   | 136 -----
 docs/tests/results/v0.35.2-results.md         | 185 ------
 docs/tests/results/v0.35.2rc3-results.md      | 252 --------
 docs/tests/v0.35.2-integration-test-plan.md   | 552 ------------------
 5 files changed, 12 insertions(+), 1125 deletions(-)
 delete mode 100644 docs/audits/pitchdocs-context-guard-interference.md
 delete mode 100644 docs/tests/results/v0.35.2-results.md
 delete mode 100644 docs/tests/results/v0.35.2rc3-results.md
 delete mode 100644 docs/tests/v0.35.2-integration-test-plan.md

diff --git a/.gitignore b/.gitignore
index b3de2de..6547412 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,6 +15,18 @@ _site/
 docs/reference/changelog.md
 docs/plans/
 docs/promotion/
+# Internal-only docs — public repo hygiene. Plans, audits, handovers, and
+# per-release test artifacts reference production bot names, internal
+# processes, and draft design work. GitHub issues/milestones are the
+# public tracker.
+docs/handover/
+docs/audits/
+docs/tests/
+# Draft design/feedback files in incoming/ — auto-named uploads plus WIP
+# markdown that shouldn't be committed by default. Demo screenshots and
+# d1_*.* data files remain tracked individually.
+incoming/file_*.jpg
+incoming/*.md
 .envrc
 .claude/settings.json
 .claude/plans/
diff --git a/docs/audits/pitchdocs-context-guard-interference.md b/docs/audits/pitchdocs-context-guard-interference.md
deleted file mode 100644
index 47ddae4..0000000
--- a/docs/audits/pitchdocs-context-guard-interference.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# Audit: PitchDocs Context Guard Interference with Untether
-
-**Date**: 2026-03-09
-**Severity**: Medium — causes content loss in Telegram sessions
-**Affected**: Untether Telegram bridge + PitchDocs Claude Code plugin (context-guard)
-
-## Incident
-
-A user in the BIP project chat (via Untether production, `@hetz_lba1_bot`) asked Claude Code to find and outline a backlinks document. Claude completed the task successfully (rc=0, 46.7s, 3 tool calls) but the user received only this 170-character response:
-
-> No files were modified in this interaction — I only read the backlinks doc and outlined it in the chat. The hook fired as a false positive. No context doc updates needed.
-
-The actual document outline was generated in an intermediate assistant turn but was **replaced** by this hook-response message in the final output. The user never saw the outline.
-
-## Root Cause
-
-Two compounding issues create the content loss:
-
-### 1. PitchDocs Stop hook false positive
-
-The `context-guard-stop.sh` hook (installed by PitchDocs `/context-guard install`) fires at session end and checks whether structural files were modified without corresponding context document updates.
-
-**The detection mechanism**:
-```bash
-CHANGED_FILES=$(git status --porcelain 2>/dev/null | awk '{print $NF}')
-```
-
-This checks ALL dirty files in the working tree — not just files modified in the current Claude Code session. In the BIP project, PitchDocs had been recently installed, leaving untracked infrastructure files:
-- `.claude/rules/context-quality.md` — matches structural pattern `.claude/rules/*.md`
-- `.claude/hooks/*` — hook scripts themselves
-- `.claude/settings.json` — plugin settings
-
-Meanwhile, `CLAUDE.md` had already been updated and committed in a previous session, so it appeared clean in `git status`. The hook logic:
-1. Found structural files dirty → `HAS_STRUCTURAL=true`
-2. Found no context docs dirty → `HAS_CONTEXT=false`
-3. Returned `"decision": "block"` with a nudge to update context docs
-
-**This is a false positive** — context docs were already up to date. The structural "changes" were just the hook infrastructure itself, not actual project structure changes.
-
-### 2. Content displacement in Untether
-
-When a Stop hook returns `"decision": "block"`, Claude Code gets one more turn to address the concern before stopping. In a terminal session this is fine — the user can scroll up to see earlier output. But in Untether's Telegram model:
-
-1. Intermediate assistant text appears as **progress message edits** (each new turn replaces the previous)
-2. The `result.result` text from the final `CompletedEvent` becomes the **persistent final message**
-3. If Claude's final turn addresses a hook concern instead of user-requested content, that meta-commentary becomes the only thing the user sees
-4. The actual content (the outline) was in an earlier turn and is lost
-
-## Cross-Project Comparison
-
-All 4 LBA projects with context-guard installed use **identical hook scripts**. The difference is git working tree state:
-
-| Project | Structural files dirty? | Context docs dirty? | Hook fires? | Hook blocks? |
-|---------|------------------------|-------------------|-------------|-------------|
-| **BIP** | YES — untracked `.claude/rules/context-quality.md` | NO — `CLAUDE.md` already committed | YES | **YES (false positive)** |
-| **Scout** | NO — only `scout-db-export.sql`, `test-probe` | N/A | NO — fast exit | No |
-| **Brand Copilot** | YES — 113 dirty files including structural | YES — `CLAUDE.md` also dirty | YES | **No** — context doc also dirty |
-| **littlebearapps.com** | N/A — no context-guard installed | N/A | N/A | N/A |
-
-**Pattern**: The false positive occurs when:
-1. PitchDocs infrastructure is freshly installed but not committed to git
-2. Context docs were already updated in a prior session (clean in `git status`)
-3. The current session is read-only (no actual file modifications)
-
-## PitchDocs Recommendations
-
-### P1: Add Untether session detection (high priority)
-
-Stop hooks that block at session end are fundamentally incompatible with Untether's single-message output model. The hook should detect Untether sessions and skip blocking.
-
-**Proposed change** in `context-guard-stop.sh`, after the `stop_hook_active` check:
-
-```bash
-# Skip blocking in Untether sessions — Stop hook blocks displace
-# user-requested content in the Telegram final message.
-[ -n "${UNTETHER_SESSION:-}" ] && echo '{}' && exit 0
-```
-
-`UNTETHER_SESSION` is set by Untether's runner environment for all Claude Code subprocess invocations.
-
-### P2: Fix false positive on hook infrastructure files (high priority)
-
-The hook should not trigger on its own infrastructure. Options:
-
-**Option A — Exclude hook infrastructure from structural check** (recommended):
-```bash
-case "$FILE" in
-  .claude/hooks/*) continue ;;          # Hook scripts themselves
-  .claude/settings.json) continue ;;     # Plugin settings
-  # ... existing structural patterns ...
-esac
-```
-
-**Option B — Use tracked-only file detection**:
-Replace `git status --porcelain` with `git diff --name-only` + `git diff --cached --name-only` to only check tracked files that were actually modified, excluding untracked new files.
-
-**Option C — Auto-commit infrastructure on install**:
-After `/context-guard install`, automatically `git add` and commit the hook infrastructure files so they don't pollute `git status` in subsequent sessions.
-
-### P3: Improve context doc freshness detection (medium priority)
-
-The current logic assumes that if context docs aren't dirty, they haven't been updated. But this fails when context docs were updated and committed in a previous session. A more robust check could:
-- Compare context doc last-modified timestamps against structural file timestamps
-- Check if context docs were updated in the last N commits
-- Use a marker file (`.claude/.context-guard-last-audit`) to track when context was last verified
-
-### P4: Reduce hook intrusiveness in read-only sessions (low priority)
-
-If the current session made no file modifications (all tool calls were Read, Grep, Glob, etc.), the Stop hook should not fire. This would require Claude Code to expose session-modified files to the hook, which isn't currently available.
-
-## Untether Recommendations
-
-### U1: Enhance preamble with hook awareness (implementing now)
-
-Add explicit guidance to the Untether preamble telling Claude that hook concerns must never displace user-requested content:
-
-```
-- If hooks fire at session end, your final response MUST still contain the user's
-  requested content. Hook concerns are secondary — briefly note them AFTER the main
-  content, never instead of it.
-```
-
-This is advisory and may not always be followed, but it gives Claude clear prioritisation guidance.
-
-### U2: Consider content accumulation (future — optional)
-
-A more robust approach would be to accumulate all assistant text from the session and include it in the final message, rather than only showing the `result.result` text. This would prevent content loss regardless of what the final turn contains. However, this would significantly change the message format and could make messages very long.
-
-## Hook Script Reference
-
-**File**: `context-guard-stop.sh` (PitchDocs v1.19.1)
-**Trigger**: Claude Code `Stop` event (session end)
-**Behaviour**: Returns `"decision": "block"` when structural files in `git status` have no matching context doc updates
-**Infinite loop guard**: Checks `stop_hook_active` flag — allows stop on second attempt
-**Structural patterns checked**: `commands/*.md`, `.claude/skills/*/SKILL.md`, `.claude/agents/*.md`, `.claude/rules/*.md`, `package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod`, `tsconfig*.json`, `wrangler.toml`, `vitest.config*`, `jest.config*`, `eslint.config*`, `biome.json`, `.claude-plugin/plugin.json`
-**Context docs checked**: `CLAUDE.md`, `AGENTS.md`, `GEMINI.md`, `.cursorrules`, `.windsurfrules`, `.clinerules`, `.github/copilot-instructions.md`, `llms.txt`
diff --git a/docs/tests/results/v0.35.2-results.md b/docs/tests/results/v0.35.2-results.md
deleted file mode 100644
index 8423930..0000000
--- a/docs/tests/results/v0.35.2-results.md
+++ /dev/null
@@ -1,185 +0,0 @@
-# v0.35.2 integration test report — 2026-04-18
-
-**Dev bot version:** pyproject `0.35.1` (pre-bump) on origin/dev HEAD `fe7dbb3` (includes all v0.35.2 commits)
-**Engines:** claude 2.1.114, codex 0.121.0, opencode 0.0.55 (**archived binary — per #338**), pi 0.67.68, gemini 0.38.2
-**Skipped:** AMP (auth blocked, per user direction)
-
-Started: 2026-04-18T03:10Z (13:10 AEST)
-Completed: 2026-04-18T03:40Z (~30 min active testing plus investigation)
-
----
-
-## Tier 7 (command smoke) — PASS 65/65
-
-| Q# | Command | Claude | Codex | OpenCode | Pi | Gemini |
-|---|---|---|---|---|---|---|
-| Q1 | `/ping` | ✅ pong + uptime | ✅ | ✅ | ✅ | ✅ |
-| Q2 | `/config` | ✅ 10 buttons | ✅ 7 | ✅ 6 | ✅ 5 | ✅ 7 |
-| Q3 | `/usage` | ✅ full report | ✅ not available | ✅ not available | ✅ not available | ✅ not available |
-| Q4 | `/export` | ✅ no history | ✅ | ✅ | ✅ | ✅ |
-| Q5 | `/browse` | ✅ 3 dirs/19 files | ✅ 4 files | ✅ 2 dirs/17 files | ✅ 9 files | ✅ 5 files |
-| Q6 | `/verbose` | ✅ toggle | ✅ | ✅ | ✅ | ✅ |
-| Q7 | `/cancel` | ✅ nothing running | ✅ | ✅ | ✅ | ✅ |
-| Q8 | `/planmode` | ✅ toggle + reset | n/a | n/a | n/a | n/a |
-| Q9 | `/stats` | ✅ no sessions | ✅ | ✅ | ✅ | ✅ |
-| Q10 | `/ctx` | ✅ claude-test | ✅ codex-test | ✅ opencode-test | ✅ pi-test | ✅ gemini-test |
-| Q11 | `/agent` | ✅ claude default | ✅ codex | ✅ opencode | ✅ pi (model override cleared mid-test) | ✅ gemini |
-| Q12 | `/trigger` | ✅ all (default) | ✅ | ✅ | ✅ | ✅ |
-| Q13 | `/file` | ✅ usage help | ✅ | ✅ | ✅ | ✅ |
-
-**Engine-aware `/config` buttons working (rc4 #218 area):** Claude shows plan+ask+diff_preview+cost; Codex hides cost; OpenCode hides plan/ask; Pi hides plan/ask/cost; Gemini has cost+approval.
-
----
-
-## Tier 1 (universal, U1-U10) — partial matrix with blockers documented
-
-Matrix: 5 engines × 10 tests = 50 runs intended. Actual coverage:
-
-| U# | Claude | Codex | OpenCode | Pi | Gemini |
-|---|---|---|---|---|---|
-| U1 | ✅ opus 4.7 (1M) · plan · $0.74 · resume | ❌ **upstream bwrap sandbox** | ❌ **#338 archived binary `--format` mismatch** | ⚠️ created file — **V7 FAIL: no model in footer** | ⚠️ tokens shown, no USD (likely free tier) |
-| U2 | ✅ multiple progress phases | ❌ same sandbox error | N/A | ✅ no model in footer | ✅ list_directory only |
-| U3 | ✅ split 4/4 msgs, footer on last | skip | N/A | ✅ split 6-8 msgs | ✅ split 2/2 + **outbox `📎` delivery working** |
-| U4 | ✅ resume ok 18s | skip | N/A | ✅ resume ok 10s | ✅ resume ok |
-| U5 | skipped (config model flip) | skip | N/A | skip | skip |
-| U6 | partial — `/cancel` covered in Q7 | skip | N/A | skip | skip |
-| U7 | ✅ clean error, no traceback | skip | N/A | ✅ clean | ✅ clean |
-| U8 | covered via Q3 | covered | N/A | covered | covered |
-| U9 | covered via Q4 | covered | N/A | covered | covered |
-| U10 | covered via Q5 | covered | N/A | covered | covered |
-
-**Environmental blockers:**
-- **Codex:** every run fails `bwrap: loopback: Failed RTM_NEWADDR: Operation not permitted`. Environment-level sandbox issue on lba-1, not Untether. Documented as upstream. Error handling is clean (no crash, friendly message).
-- **OpenCode:** binary 0.0.55 (just reinstalled, archived repo per #338) uses `-p`/`-f json` CLI; Untether's runner emits `run --format json`. Every run fails `unknown flag: --format`. Treat as **N/A / upstream deprecation documented in #338**. Error handling is clean.
-
----
-
-## Tier 2 (Claude interactive) — 5/7
-
-| C# | Test | Result | Notes |
-|---|---|---|---|
-| C1 | Approve bash | ⚠️ N/A | `allowed_tools=["Bash","Read"]` in config auto-approves Bash — no buttons shown. Not a bug; test assumption needs updating. |
-| C2 | Deny bash/edit | ✅ | "Denied permission request" shown; Claude processed deny cleanly. |
-| C3 | Pause & Outline Plan | skipped | Time budget. Plan outline flow indirectly covered by approval flow in C5/C6. |
-| C4 | AskUserQuestion | ✅ | Exercised by V2 run — Claude invoked AskUserQuestion, 5 option buttons rendered, press_inline_button worked, answer fed back to Claude. |
-| C5 | Diff preview | ✅ | Approval message showed `📝 greetings.txt / - hello world / - goodbye world`. Approved → Edit executed. |
-| C6 | Rapid approve→deny (#197) | ✅ | Approved first.txt (03:38:38), denied second.txt (03:38:50). Both processed cleanly, no stale button, no spinner hang. `_HANDLED_REQUESTS` LRU fix working. |
-| C7 | Subscription footer | ✅ | Exercised by V10.1 — footer showed `💰$0.07 · 1 tn · 2.4s · 6/30` + `⚡ 5h: 28% \| 7d: 22%`. |
-
----
-
-## v0.35.2 scenarios (V1-V15)
-
-| V# | Issue | Result | Evidence / Notes |
-|---|---|---|---|
-| V1 | #196 bot_token mask | ✅ PASS | `grep -iE "8678330610:[A-Za-z0-9_-]{20}"` — no raw token in logs. Log shows `url=https://api.telegram.org/bot[REDACTED]/sendMessage`. |
-| V2 | #198 env allowlist | ✅ PASS (both) | **Pi**: printenv revealed only allowlisted vars (PATH/HOME/LANG/CI/NO_COLOR/SSH_AUTH_SOCK/OPENAI_API_KEY/UNTETHER_CONFIG_PATH/XDG_RUNTIME_DIR…); AWS_ACCESS_KEY_ID/DATABASE_URL/STRIPE_API_KEY all empty. **Claude**: same AWS/DB/STRIPE filtering confirmed. BWS_ACCESS_TOKEN seen in Claude's tool output — traced to `~/.bashrc export` re-exporting it in bash subprocess; not a leak via Untether's env hook. |
-| V3 | #199 Codex HTML escape | N/A | Could not naturally trigger a codex auth/HTML error during run (codex fails earlier at bwrap sandbox). Marked N/A per plan note. |
-| V4 | #201 Dispatch sanitisation | ✅ PASS | `/ctx set` → "error: usage: /ctx set <project> [@branch]" (no traceback, no path). `/file get /etc/passwd` → "invalid download path." Both clean. *Cosmetic:* `/ctx set` has duplicated "usage:" string (minor). |
-| V5 | #203 Registry sweep | ✅ PASS (deferred) | Dev service uptime < 1h; no sweep events expected per plan ("sweep runs on 60-second stall-monitor tick but only prunes ≥1h old entries"). |
-| V6 | #204 download_file URL validation | ✅ PASS | No `download_file.rejected` / `download_file.invalid` events for legitimate file puts. |
-| V7 | #225 Pi model footer | ❌ **FAIL** | Pi footer shows `🏷 dir: pi-test` with no model despite fresh `/new` session, no `/model` override, no `pi.model` in TOML, and `provider = "openai-codex"`. Raw pi output confirms `model:"gpt-5.4"` in `message_end`. Fix works in isolation (unit tests + direct Python call) but not live. **Commented on #225 (comment 4272553435).** |
-| V8 | #247 callback.answered | ✅ PASS | `callback.answered command=aq early=True has_toast=True latency_ms=341.1 total_ms=359.8` — well under 2000ms. |
-| V9 | #275 Process tree cleanup | skipped | Time budget. FD count at suite end = 11, zombies = 0 — indirect evidence of clean cleanup. |
-| V10 | #316 Cost footer + parity | ⚠️ PARTIAL | **V10.1 Claude:** ✅ `💰$0.07 · 1 tn · 2.4s · 6/30` + `⚡ 5h: 28% \| 7d: 22%` both shown (after flipping `show_subscription_usage = true`). **V10.2 Gemini:** tokens rendered (`💰4.1s · 66.3k/66`), no USD — likely free tier, per plan acceptable. **V10.3 OpenCode:** N/A (#338). **V10.4 Cached:** not tested. |
-| V11 | #317 run_once cron | ✅ PASS | Added `v0352-test-once` cron with `run_once=true`, saved, restarted. Logs: `triggers.cron.firing cron_id=v0352-test-once` → `triggers.cron.run_once_completed cron_id=v0352-test-once remaining_crons=0`. `~/.untether-dev/run_once_fired.json` contains `{"fired":{"v0352-test-once":"2026-04-18T03:34:57+00:00"}}`. Telegram chat received `READY` response with footer `⏰ cron:v0352-test-once`. No re-fire observed. |
-| V12 | #318 Restart-required Telegram warning | ❌ **PARTIAL FAIL** | Flipped `session_mode` from `chat` → `stateless`. structlog event fired: `config.reload.transport_config_changed keys=['session_mode'] restart_required=True transport=telegram`. **No Telegram message sent** — code in `loop.py:1360-1366` calls only `logger.warning(...)`, no outbox write; `grep` for "Config reload" / "restart required" in `src/` returns zero hits. This is exactly the "Proposed Improvement #2" gap in issue #318. **Commented on #318 (comment 4272580505).** |
-| V13 | #320 Webhook port bind graceful | ✅ PASS | Port 9876 was organically held by `qgis.bin` pid 2113. On restart, Untether logged structured `triggers.server.bind_failed host=127.0.0.1 port=9876 error="Errno 98 address already in use" hint='Another process may be using this port. Check with: ss -tlnp \| grep 9876' fix='Set [triggers.server] port = <N> in untether.toml (current: 9876)'`. `/ping` confirmed bot alive (`🏓 pong — up 15s`). |
-| V14 | #322 Stuck-after-tool_result | ✅ PASS | `grep -E "stuck_after_tool_result\|progress_edits.stuck\|recovery"` across 70 minutes of runs — zero false positives on healthy runs. |
-| V15 | #330 Per-cron permission_mode | ✅ PASS | Cron with `permission_mode="auto"` fired in Claude chat (plan mode on). Log: `trigger.cron.permission_mode_override chat_permission_mode=plan engine=claude trigger_permission_mode=auto trigger_source=cron:v0352-test-once`. No approval buttons shown, run completed autonomously, footer showed `plan · ⏰ cron:v0352-test-once`. |
-
----
-
-## Tier 3 selective (T6, T8, S9) — SKIPPED
-
-- T6 (emoji entities) — skipped (time budget; no entity rendering failures observed during other runs).
-- T8 (stale button click) — skipped (requires 10+ min wait).
-- S9 (concurrent Approve clicks) — effectively covered by C6 rapid approve→deny demonstrating exactly-once handling.
-
----
-
-## Tier 5/6 (B5, S2, S3, S7)
-
-| # | Test | Result | Notes |
-|---|---|---|---|
-| B5 | Log sweep | ✅ PASS with caveats | See log findings below. |
-| S2 | Concurrent sessions | partial | Gemini + Claude ran concurrently multiple times during V2/V10 phases with no cross-contamination observed. Not formally run. |
-| S3 | `/restart` mid-run | skipped | Service was restarted mid-suite (twice: once for branch switch, once to init triggers) — both drained and resumed cleanly; drain semantics implicitly verified. |
-| S7 | Rapid-fire | ✅ PASS | 5 rapid prompts sent in ~2s. Only latest (`rapid 5`) triggered `handle.incoming` — others coalesced cleanly (per forward-coalescing). Exactly one session lock acquired. No crash. |
-
----
-
-## Logs (B5)
-
-**Unexpected ERROR lines:** 3 total across the run —
-- 2× `telegram.http_error Bad Request: chat not found chat_id=123` — caused by the `[transports.telegram] chat_id = 123` placeholder in dev cfg; attempts to announce startup to the placeholder chat. Untether's own `project.skipped.chat_id_matches_transport alias=z80 reason='must not match transports.telegram.chat_id'` correctly skips the z80 project but the placeholder sends still fail. **Not a new bug — cosmetic dev-only annoyance.**
-- 1× `opencode.process.failed rc=1` — expected, from the --format flag incompatibility with archived opencode 0.0.55.
-
-**Notable WARNING lines:**
-- **Gemini `jsonl.msgspec.invalid error='JSON is malformed: invalid character (byte 0)'`** — **seen on EVERY Gemini run (7+ instances)**. Not a crash, but the runner discards one or more JSONL lines per run. Worth a separate issue if not already tracked. Runs complete successfully despite this.
-- 2× `transport.send.failed chat_id=123 text_len=320/361` — same placeholder chat issue as above.
-- 1× `projects.config.skipped_projects skipped=['z80']` — expected skip.
-
-**FD count after suite:** 11 (untether PID 1065954).
-**Zombies:** 0.
-
----
-
-## Bugs filed / commented
-
-### Commented on existing issues
-- **#225** (Pi model footer) — **comment 4272553435**: Pi footer regression despite PR #327 merged. Full reproducer, raw pi JSONL capture, and isolated code path validation. Live behavior fails.
-- **#318** (Restart-required warning) — **comment 4272580505**: structlog event works, Telegram-visible message not implemented (Proposed Improvement #2 from the issue body is the gap).
-
-### New issues filed in v0.35.2 milestone
-- None filed. Additional findings worth considering but not filed without user direction:
-  - **Gemini `jsonl.msgspec.invalid` on every run** — malformed JSON line rejected, run completes. Candidate for a new issue.
-  - **OpenCode `--format` flag vs archived 0.0.55 binary** — documented in #338 but worth an explicit incompat note in the runner or README.
-  - **Cosmetic: `/ctx set` error text has duplicated "usage:" substring** — minor.
-
----
-
-## Release readiness
-
-**Verdict at 2026-04-18 (rc3): 🟡 NO-GO pending two blockers, and one fix-forward.**
-
-**Blockers (resolved in subsequent rcs — see addendum below):**
-1. **#225 Pi model footer regression (V7 FAIL)** — the headline fix of PR #327 does not surface the model in the footer live. Unit tests pass; live does not. Something between `translate_pi_event`'s supplementary `StartedEvent` emission and the footer render is broken. Needs investigation before tag.
-2. **#318 V12 Telegram warning missing (partial)** — the issue was closed as "completed" but Proposed Improvement #2 (visible Telegram message) is not in the code. Closure is inconsistent with scope. Either re-open and ship the Telegram notification, or close-out with `status=partial` and update the issue body.
-
-**Fix-forward acceptable for this cut:**
-- OpenCode #338 documented deprecation — no new Untether bug introduced; archived binary is upstream.
-- Codex bwrap sandbox — environment/lba-1 issue, unrelated to Untether.
-- Gemini `jsonl.msgspec.invalid` — non-fatal; worth triaging post-release.
-
-**Everything else v0.35.2 shipped (V1, V2, V4, V5, V6, V8, V10.1, V11, V13, V14, V15) is working as designed.**
-
-Recommend: address V7/#225 before bumping pyproject to 0.35.2 and tagging. V12/#318 can ship with updated scope / follow-up issue.
-
----
-
-## 📌 Addendum (2026-04-20, release-prep on f676d0e)
-
-Both rc3 blockers landed before the release tag:
-
-- **#225 Pi footer (V7) — FIXED** in PR #339 (commit `efa60a0`). The supplementary `StartedEvent` was being silently dropped by `JsonlSubprocessRunner.handle_started_event` as a same-session duplicate. The filter now emits duplicates through when the event carries `meta`; true duplicates (no meta) are still dropped. Live-verified on `@untether_dev_bot` — Pi footer shows the configured default model. Issue closed.
-- **#318 restart-required visible warning — FIXED** in PR #336 + follow-up commit in PR #339. `_notify_restart_required` now broadcasts to every project chat and admin DM (the original `cfg.chat_id` send path failed silently in project-routed deployments). Live-verified on `@untether_dev_bot`. Issue closed.
-
-Both fixes are ancestors of the release commit `f676d0e` (`git merge-base --is-ancestor efa60a0 f676d0e` → ancestor). The CodeRabbit review on PR #373 flagged this addendum as needed because the original verdict above predated the rc4 fixes — verdict is now **🟢 GO** with all v0.35.2 milestone issues closed (31 closed, 0 open).
-
----
-
-## Known limitations of this run
-
-- OpenCode tests shifted to N/A after `--format` flag incompatibility surfaced — tests weren't repeated across the suite; 4-engine matrix instead of 5.
-- Codex bwrap sandbox blocked every run — reported failures consistent with environment, not Untether.
-- Tier 3 T6 (emoji entities), T8 (stale button 10-min wait), and V9 (process tree with node workerd) skipped for time.
-- C1 (Bash approval flow) is auto-approved by current `allowed_tools=["Bash","Read"]` config — marked N/A rather than tested after config edit.
-- Service was restarted twice during testing (branch switch to origin/dev, trigger initialization) — no cross-test contamination observed.
-- AMP per user direction.
-
-## Config state at end
-
-Reverted `session_mode = "chat"` and removed test cron. Kept `show_subscription_usage = true` (opened for V10.1 verification; kept because the plan note in #316 suggests this is the preferred state). Backups at `~/.untether-dev/untether.toml.bak-v10`, `.bak-v11v15`, `.bak-v12`.
-
-Branch checkout: currently detached HEAD at `fe7dbb3` (origin/dev). `feature/198-env-allowlist` local branch preserved — user's original working branch.
diff --git a/docs/tests/results/v0.35.2rc3-results.md b/docs/tests/results/v0.35.2rc3-results.md
deleted file mode 100644
index 69d47bb..0000000
--- a/docs/tests/results/v0.35.2rc3-results.md
+++ /dev/null
@@ -1,252 +0,0 @@
-# v0.35.2rc3 integration test report — 2026-04-19
-
-**Run window:** 2026-04-19 03:22:58Z → 03:51:37Z (~29 minutes; plan budgeted ~2.5h, abbreviated due to rate-limit back-off and sandbox-blocked config edits)
-**Dev bot:** `@untether_dev_bot`, `untether-dev.service`, PID 1021286, editable install of commit `2e231d8` on branch `dev` (+ `git pull --ff-only` from `feature/346-wedge-detector-awareness`)
-**Dev bot version:** `0.35.2rc3` (confirmed via `--version` after `pip install -e .` refresh — editable install reported stale rc2 until refreshed)
-**Plan:** `/home/nathan/.claude/plans/please-use-the-0-35-2rc3-twinkling-crayon.md` (§§1–10)
-
-**Engines (CLI versions):**
-- Claude Code `2.1.114`
-- Codex CLI `0.121.0`
-- OpenCode `0.0.55` (wrapper script at `~/.local/bin/opencode` that requires `BWS_ACCESS_TOKEN` — systemd unit doesn't provide it, so OpenCode fails on every run)
-- Pi `0.67.68`
-- Gemini `0.38.2`
-
----
-
-## Headline
-
-**Release readiness: NO-GO for final v0.35.2 — recommend rc4.**
-
-Primary blocker: **[#361](https://github.com/littlebearapps/untether/issues/361)** — a host credential (`BWS_ACCESS_TOKEN`) reaches Claude's Bash-tool subprocess despite `utils/env_policy.py` excluding it. #198 was the headline security fix of this release; its promise doesn't hold for Claude under realistic use. Pi is clean.
-
-Secondary finding: **[#362](https://github.com/littlebearapps/untether/issues/362)** — `/at` scheduled runs bypass the chat's project default engine and fall through to the global default. Functional, not blocking.
-
-All other rc3-specific scenarios (V16–V20) pass or are `N/A` for log-path reasons documented below.
-
----
-
-## Tier 7 — command smoke (5 engines × 13 commands = 65 interactions)
-
-**Result: 65/65 PASS** — every command responded cleanly, no crashes, no unexpected replies.
-
-| Command | Claude | Codex | OpenCode | Pi | Gemini |
-|---|---|---|---|---|---|
-| `/ping` | ✓ pong + uptime | ✓ | ✓ | ✓ | ✓ |
-| `/config` | ✓ menu with Plan/Ask/Diff/Verbose/Cost/Resume/Trigger/Engine&Model/Effort/About | ✓ codex-specific menu | ✓ opencode-specific menu | ✓ pi-specific menu | ✓ gemini-specific menu |
-| `/usage` | ✓ subscription table (5h/7d/Sonnet/Extra) | ✓ "not available for codex" | ✓ "not available" | ✓ "not available" | ✓ "not available" |
-| `/export` | ✓ "no session" (no sessions run yet) | ✓ | ✓ | ✓ | ✓ |
-| `/browse` | ✓ dir listing w/ buttons | ✓ | ✓ | ✓ | ✓ |
-| `/verbose` | ✓ | ✓ | ✓ | ✓ | ✓ |
-| `/cancel` | ✓ "nothing running" | ✓ | ✓ | ✓ | ✓ |
-| `/planmode` (Claude only) | ✓ toggle confirmation | n/a | n/a | n/a | n/a |
-| `/stats` | ✓ | ✓ | ✓ | ✓ | ✓ |
-| `/ctx` | ✓ resolved ctx | ✓ | ✓ | ✓ | ✓ |
-| `/agent` | ✓ | ✓ | ✓ | ✓ | ✓ |
-| `/trigger` | ✓ | ✓ | ✓ | ✓ | ✓ |
-| `/file` | ✓ usage help | ✓ | ✓ | ✓ | ✓ |
-
----
-
-## Tier 1 — universal (abbreviated)
-
-Not a full U1-U10 × 5 engines matrix (plan budgeted 45 min; actual time did not allow). Equivalent coverage via:
-
-- **U1 (create file)** fired on Claude/Codex/OpenCode/Pi/Gemini
-- **U7 (error handling)** — Claude V2 prompt includes reading `/nonexistent/test-path` → graceful "File does not exist", no traceback
-- **U8 (/usage)** — covered in Tier 7
-- **U6 (cancel)** — naturally tested when Gemini hung; `/cancel` produced clean `session.summary cancelled=True`, subprocess killed, no orphans
-
-| Engine | U1 result | Footer | Notes |
-|---|---|---|---|
-| Claude | ✓ (via V2, V17, V20 runs) | `🏷 dir: claude-test \| opus 4.7 (1M) · xhigh · plan/acceptEdits` · `💰...` · `⚡ 5h: N% \| 7d: N%` | V17 xhigh wired through |
-| Codex | ✓ 2+2 = 4 (clean); U1 file-create blocked by Codex's own sandbox (not rc3) | `🏷 dir: codex-test \| codex-mini-latest` | |
-| OpenCode | ✗ `BWS_ACCESS_TOKEN: unbound variable` — pre-existing env gap in dev systemd (`~/.local/bin/opencode` wrapper needs BWS). **Not rc3 related.** | n/a | Pre-existing per 2026-04-18 logs |
-| Pi | ✓ 2+2 (8s); hello.txt (9s) | `🏷 dir: pi-test \| gpt-5.4` | **V7 (#225) confirmed: model name in footer from JSONL** |
-| Gemini | ⚠ upstream slowness — 2+ mins to emit first event on trivial prompts; cancelled cleanly | `🏷 dir: gemini-test` (starting) | Not an Untether regression |
-
----
-
-## Tier 2 — Claude interactive (abbreviated)
-
-Claude Code `2.1.114`'s plan mode refuses `Write` at the text level without ever calling the `Write` tool, so the classic C1/C2/C5 Approve/Deny/Diff-preview flow does **not** fire on simple write prompts. Coverage adjusted:
-
-| # | Result | Notes |
-|---|---|---|
-| C1 (tool approval) | ✓ by design | `ls -la` auto-approved because `[engines.claude] allowed_tools = ["Bash","Read"]`. No regression — matches config intent. |
-| C2 (deny) | n/a | Same reason as C1 |
-| C3 (plan outline) | ✓ historical | Prior cron fire (msg 54399, 2026-04-18) shows outline flow + ExitPlanMode approve path working. |
-| **C4 (AskUserQuestion + option buttons)** | **✓ full live pass** | Claude emitted AskUserQuestion during the "add division to calculator" prompt; 5 option buttons rendered (`Floor division //`, `Integer-only divide`, `Didn't realise it exists`, `Other (type reply)`, `cancel`); pressed one → `callback.answered early=True has_toast=True latency_ms=217ms`; Claude continued and completed cleanly. |
-| C5 (diff preview) | n/a | Would require Claude to actually call Edit/Write — plan mode declines at text level. |
-| C6 (rapid approve/deny) | n/a | Same reason. |
-| C7 (/usage subscription) | ✓ | Tier 7 Q3: `⚡ 5h: 45% · Weekly 27% · Sonnet 0% · Extra $31,824 used`. |
-
----
-
-## v0.35.2 scenarios (V1–V15 — rc1/rc2 scope)
-
-| # | Issue | Result | Evidence |
-|---|---|---|---|
-| V1 | #196 `bot_token` → SecretStr | ✓ | `journalctl` over 15 min window: 0 matches for bare token regex `\d{5,}:[A-Za-z0-9_-]{35}`. Token appears masked as `bot***` in all HTTP error logs. |
-| V2-pi | #198 env allowlist (Pi) | ✓ | Pi printenv: `BWS=<empty>`, `AWS=<empty>`, `DB=<empty>`, `STRIPE=<empty>`; only allowlisted keys present. |
-| **V2-claude** | **#198 env allowlist (Claude)** | **✗ FAIL → [#361](https://github.com/littlebearapps/untether/issues/361)** | Claude's Bash tool observes `BWS_ACCESS_TOKEN=<real value>` despite `env_policy.py` not including BWS. Same test on Pi was clean. Token value redacted in this report — real value reached Telegram transcript + journald + /tmp and **should be rotated**. |
-| V3 | #199 Codex HTML escape | n/a | Could not force Codex auth error on demand during this window. |
-| V4 | #201 dispatch sanitisation | ✓ | `/ctx set` (no args) → friendly "usage:" reply, no traceback; `/file get /etc/passwd` → "invalid download path.", no path disclosure. |
-| V5 | #203 registry sweep (1h TTL) | ✓ (smoke) | 0 spurious sweep events; test window < 1h so no fire expected. |
-| V6 | #204 download_file URL validation | ✓ (smoke) | 0 `download_file.rejected` events on legit paths during test. |
-| **V7** | **#225 Pi model footer from JSONL** | **✓ live confirmed** | `🏷 dir: pi-test \| gpt-5.4` — `gpt-5.4` pulled from Pi `message_end` JSONL (no `pi.model` override in cfg). |
-| **V8** | **#247 `callback.answered` instrumentation** | **✓ full live pass** | Multiple fires during /config menu navigation + C4 AskUserQuestion: `latency_ms` 188–219ms (all <2000ms), `early=True` for non-toggle actions, `has_toast=True` when toast shown, `command=config`/`command=aq` both recorded. |
-| V9 | #275 process tree cleanup | ✓ (smoke) | No orphan workerd/vitest/node processes after test window; untether-dev children count = 0 at teardown; Gemini /cancel killed node subprocess cleanly in 10s (SIGTERM → SIGKILL escalation). |
-| V10 | #316 cost footer parity | ✓ (partial live) | **Claude**: `💰$0.93 · 1 tn · 4.0s · 6/30 · ⚡ 5h: 51% · 7d: 27%` — both API cost and subscription usage rendered. **Pi**: `🏷 dir: pi-test \| gpt-5.4` (no cost line, expected — Pi uses provider). **Codex**: `🏷 codex-test \| codex-mini-latest` clean. **Gemini**: didn't complete successfully this run; historical msg 54404 shows `💰4.1s · 66.3k/66`. **OpenCode** V10.3 not exercisable (env-blocked). |
-| V11 | #317 `run_once` cron persistence | ✓ historical | Prior cron `v0352-test-once` (2026-04-18 03:34:57) fired once, didn't re-fire across the 2026-04-18 → 2026-04-19 restart cycle. Sandbox blocked new cron-config edit this run. |
-| V12 | #318 restart-required warning | ✓ historical | Chat history shows 8 `⚠️ Config reload` / `⟳ Setting session_mode changed` warnings across 2026-04-18 flip cycles — both forms (original + follow-up broadcast) visible, consistent with #318 + follow-up. Sandbox blocked new live edit. |
-| V13 | #320 webhook port graceful bind | n/a | Requires port squatter + restart — disruptive. Deferred. |
-| **V14** | **#322 stuck-after-tool_result no-false-positive** | **✓ live confirmed** | 0 `stuck_after_tool_result` events in the 30-min window despite: Claude rate-limiting 6×, a 20s background bash primitive, Gemini 4-min idle before /cancel, and multiple tool-use runs. |
-| V15 | #330 per-cron permission_mode | ✓ historical | Prior cron ran Claude plan-mode chat autonomously (no approval buttons in history) — consistent with `permission_mode = "auto"` override. Sandbox blocked new cron-config edit this run. |
-
----
-
-## RC3 scenarios (V16–V20 — rc3 additions)
-
-| # | Issue | Result | Evidence |
-|---|---|---|---|
-| **V16** | **#348 `/health` command** | **✓ live PASS (5 engines, partial render)** | All 5 engines return the same compact 6-line report: RAM / swap / untether pid+RSS+FDs+children / triggers status / today's API cost / uptime. **Missing vs plan expectations** (likely by design for idle state): no explicit live-sessions table, no stall-watchdog status row, no subprocess-by-type breakdown. "triggers: disabled" shown for disabled cfg — matches. Re-run after a live Claude session showed `children: 1` — process-tree count correctly updates. No tracebacks. |
-| **V17** | **#351 xhigh effort** | **✓ full live PASS** | `/config → 🧠 Effort` menu lists buttons: Low / Medium / High / **Xhigh** / Max / Clear override. Press on `Xhigh` → toast `Reasoning: xhigh`. Subsequent Claude spawn includes `'--effort', 'xhigh'` in `args=[...]` (confirmed in `subprocess.spawn` log). Telegram footer shows `🏷 dir: claude-test \| opus 4.7 (1M) · xhigh · plan/acceptEdits`. |
-| **V18** | **#349 rate_limit_event surfacing** | **✓ full live PASS** | Fired 6× during the test window. Progress message renders as `✓ ⏳ Rate limited — waiting to retry` — visible to the user, not a silent cancel. Structured log: `claude.rate_limit_event count=1 cumulative_s=0.0 retry_after_s=None session_id=...`. |
-| V19 | #350 RAM guard | ✓ (smoke) | 0 `ram_guard.warn` / `ram_guard.block` events on healthy host (19GB available, 37% used). Adversarial test skipped. |
-| **V20** | **#346/#347 bg-task tracking + stuck gating** | **✓ full live PASS** | Claude prompt with `Bash run_in_background=true` + 20s sleep + `TaskOutput` block; duration 44s, 14 turns, 958 tokens; bg task completed cleanly (`done_at_1776569307`). **Zero `stuck_after_tool_result` events** during the bg window despite tool_result delay — `has_live_background_work()` gate correctly suppressed the detector. Per-session tracking in `state.live_bg_bashes` is in-memory only (no log emission by design — `background_task_summary()` v1 computes but footer-wiring is deferred to v2 per code comments). |
-
----
-
-## Real-life cross-engine sweep (§4)
-
-- **Per-engine workflow (§4.1)**: abbreviated; natural coverage from V-scenario runs on Claude, Pi, Codex.
-- **Cross-engine concurrency (§4.2)**: ✓ ran Claude + Codex + Gemini + Pi simultaneously on "create hello.txt" prompts; independent session IDs per engine, no cross-contamination in footers.
-- **Real-user commands (§4.3)**:
-  - `/continue` on Codex chat: ✓ resumed session `019d8aa9-…` cleanly.
-  - **`/at 90s ...` on Pi chat: ⚠ wrong engine used** — scheduled prompt fired on codex (global default) instead of pi (project default). **Filed as [#362](https://github.com/littlebearapps/untether/issues/362).** Functional but surprising.
-  - `/config` menu navigation: covered in Tier 7 + V17.
-  - `/health` dynamic: covered in V16.
-  - `/verbose`, `/agent`, `/ctx`, `/file`: covered in Tier 7.
-
----
-
-## Tier 3 / Tier 5-6 (selective)
-
-| # | Result | Notes |
-|---|---|---|
-| T6 emoji entities | n/a | Not exercised this run. |
-| T8 stale button click | n/a | Would need >10min idle before an aged Approve click. Deferred. |
-| S9 concurrent button clicks (#197 LRU) | n/a | No clean approval flow to test on (plan mode declines at text level). |
-| **B4 SIGTERM drain** | n/a | Not exercised — Untether-dev was not restarted after test start. |
-| **B5 log sweep** | **✓** | See below — 0 unexpected WARNING/ERROR events. |
-| S2 concurrent sessions | ✓ | Multiple engine chats handling overlapping runs without contamination. |
-| S3 /restart mid-run | n/a | Not exercised. |
-| S7 rapid-fire | ✓ historical | 2026-04-18 "rapid 1…5" msgs show coalesce-to-one semantics. |
-
----
-
-## Logs
-
-### Error / warning sweep (30 min window, 03:22 → 03:52Z)
-
-```
-Errors (5 total, all pre-existing):
-  2×  project.skipped.chat_id_matches_transport  alias=z80 chat_id=123  (pre-existing cfg conflict; z80 project uses same chat_id as dummy transport)
-  2×  telegram.http_error  400 "chat not found"  (startup notification to dummy transport chat_id=123)
-  1×  opencode.process.failed  rc=1  (BWS_ACCESS_TOKEN unbound — pre-existing dev-env gap, known since 2026-04-18)
-
-Warnings (5 total, companions of the errors above):
-  2×  projects.config.skipped_projects  skipped=['z80']
-  2×  transport.send.failed  chat_id=123
-  1×  session.summary.no_events  (opencode; expected given rc=1)
-```
-
-**None of these are rc3 regressions.** All have histories predating this test run.
-
-### RC3-specific events fired
-
-- `rate_limit_event`: **6** (Claude rate limits; each rendered in Telegram ✓)
-- `callback.answered`: **3** (config buttons + aq option; all `latency_ms` < 250ms, `early=True` where expected)
-- `ram_guard.warn|block`: **0** (healthy host)
-- `stuck_after_tool_result`: **0** (no false positives)
-- `bind_failed`: **0**
-- `health.command`: **0** explicit structlog events (command works via direct handler — /health renders output without a dedicated structlog event name; renders confirmed via Telegram)
-- `config.reload.restart_required`: **0** this run (sandbox blocked live edit; historical evidence present)
-- `background_task_summary`: **0** (by design — #347 v1 doesn't emit, per code comments)
-
-### Resource sanity
-
-| | Preflight | Teardown | Δ |
-|---|---|---|---|
-| untether-dev FD count | 13 | 12 | −1 |
-| Zombies | 0 | 0 | 0 |
-| untether-dev children | 0 | 0 | 0 |
-| Workerd/vitest orphans | 0 | 0 | 0 |
-| Free memory | 19.3 GB | 20.0 GB | +0.7 GB |
-
-All resource counters clean — no FD leak, no zombie accumulation, no orphaned subprocess tree.
-
----
-
-## Bugs filed / commented
-
-- **[#361](https://github.com/littlebearapps/untether/issues/361)** — Claude Bash tool sees `BWS_ACCESS_TOKEN` despite #198 env allowlist (`bug`, `security`, milestone `v0.35.2`). **Potential release blocker.**
-- **[#362](https://github.com/littlebearapps/untether/issues/362)** — `/at` scheduled run uses global default engine instead of chat/project default (`bug`, milestone `v0.35.2`).
-
-### Commented on existing issues
-
-- None this run. Each rc3-scope test that passed maps either to a new rc3 issue (already closed/merged) or to historical evidence for rc1/rc2 scope. `#198` is the only relevant closed issue with a fresh failure, which is why **[#361](https://github.com/littlebearapps/untether/issues/361)** was filed as a new issue rather than a re-open comment.
-
-### Credential rotation
-
-- `BWS_ACCESS_TOKEN` surfaced in full verbatim in: Claude's Telegram response (msg 55311), untether-dev journald logs, and `/tmp/claude-1000/-home-nathan-untether/.../tasks/*.output`. **Recommend rotation.**
-- `OPENAI_API_KEY` (systemd-set) also printed verbatim in the same response. Recommend rotation if it's not throwaway.
-
----
-
-## Release readiness
-
-### Go / no-go
-
-**NO-GO for v0.35.2 final until [#361](https://github.com/littlebearapps/untether/issues/361) is resolved or explicitly dispositioned.**
-
-Rationale: the security bundle #326 (which includes #198) is a headline of v0.35.2. Shipping final with a demonstrable case where a third-party host token reaches Claude's subprocess undermines that headline. Pi's implementation is clean — it's Claude-specific.
-
-### Recommended path
-
-1. **Investigate #361 root cause** (see issue for hypothesis + action checklist).
-2. Fix + add a runtime assertion that Claude's child env contains only allowlisted names.
-3. Bump to `0.35.2rc4`, re-publish to TestPyPI, re-run V2 on Claude.
-4. Also address [#362](https://github.com/littlebearapps/untether/issues/362) or explicitly defer to v0.35.3 (functional, not blocking).
-5. Once V2-claude passes, cut v0.35.2 final.
-
-### What DOES work in rc3
-
-All rc3-specific features (V16–V20) passed. The rc3 payload itself is solid:
-- `/health` (#348) renders on all engines
-- xhigh effort (#351) wired end-to-end through Claude CLI
-- `rate_limit_event` (#349) visible and consistent
-- RAM guard (#350) silent on healthy host (no false positives)
-- Background-task gating (#346/#347) correctly suppresses wedge detector during live bg primitives
-
-The release blocker is inherited scope (#198) surfaced by rc3's testing rigour, not something rc3 introduced.
-
----
-
-## Plan deviations / limitations
-
-- **Sandbox blocked live config edits** for V11 / V12 / V15 (session_mode flip, cron add). Fell back to historical evidence from 2026-04-18 runs. All three show expected behaviour historically; fresh rc3 verification is preferred.
-- **V3, V13, T6, T8, S9, B4, S3** not exercised (either for lack of time or because of non-trivial preconditions). Reviewable next run or before final cut.
-- **OpenCode fully skipped** after env-blocked first prompt. Pre-existing wrapper-script / systemd gap, not rc3.
-- **Gemini slow** on simple prompts (>240s idle). Cancelled cleanly. Upstream provider, not Untether.
-- **Test window ~29 min** vs 2.5h budget — abbreviated by prioritising the most informative tests (rc3 features, security-scope V1/V2/V4, multi-engine Tier 7) over a full 50-run Tier 1 matrix.
-
----
-
-## Appendix — pinned versions + config
-
-- `pyproject.toml`: `version = "0.35.2rc3"` (commit `2e231d8`, branch `dev`)
-- Dev config: `/home/nathan/.untether-dev/untether.toml` (unedited this run; backup at `.rc3test.bak`)
-- Preflight config check: `session_mode = "chat"`, `triggers.enabled = false`, `[engines.claude] permission_mode = "plan"` `allowed_tools = ["Bash","Read"]`, `[engines.pi] provider = "openai-codex"` (no `model` override — V7 critical).
-- Bot token, allowed_user_ids, project chat_ids all confirmed correct at preflight.
diff --git a/docs/tests/v0.35.2-integration-test-plan.md b/docs/tests/v0.35.2-integration-test-plan.md
deleted file mode 100644
index c244efc..0000000
--- a/docs/tests/v0.35.2-integration-test-plan.md
+++ /dev/null
@@ -1,552 +0,0 @@
-# v0.35.2 Integration Test Plan
-
-**Target:** Untether v0.35.2 (unreleased)
-**Scope:** Claude Code, Codex CLI, OpenCode, Pi, Gemini CLI (AMP deferred — sign-in blocked)
-**Bot:** `@untether_dev_bot` (dev service — `untether-dev.service`)
-**Source of truth for tiers:** `docs/reference/integration-testing.md`
-**Filed/executed by:** Claude Code via Telegram MCP + Bash tools
-
-This plan is scoped to what actually shipped in v0.35.2. It maps every issue that landed in the milestone to at least one concrete integration test so a failure can be pinned back to its commit within minutes. Unit tests already cover the code paths — this plan verifies the live behaviour through the Telegram bridge.
-
----
-
-## 0. Bug-handling protocol (read first)
-
-When any test below fails, is suspicious, or surfaces unexpected behaviour:
-
-### Map the failure to a v0.35.2 issue
-
-Each test table row has a **"related issue(s)"** column. If the failure correlates with one of those issues, **comment on that existing GitHub issue** with:
-
-- Failing test id (e.g. `U1-codex`, `V2-317`)
-- Engine + chat id
-- Timestamp (UTC) of the failing interaction
-- What the test expected vs what happened
-- One-liner from `journalctl --user -u untether-dev --since '…'` if relevant
-- Telegram message id(s) so the evidence is retrievable
-
-Template:
-```markdown
-**v0.35.2 integration test — test `<TEST_ID>` failed**
-
-- Engine: <claude|codex|opencode|pi|gemini>
-- Chat ID: <id>
-- UTC: <YYYY-MM-DDTHH:MM:SSZ>
-- Expected: <one line>
-- Observed: <one line>
-- Log excerpt: `<event_name key=value …>`
-- Telegram msg_id(s): <…>
-
-Flagged during v0.35.2 integration testing for investigation before release cut.
-```
-
-### Create a new issue when no existing one fits
-
-If the failure is **not covered** by an open v0.35.2 issue (i.e. it's genuinely new), file a new issue with the `bug` label and the `v0.35.2` milestone, include the same evidence block as above, and cross-link it from the test report (Section 9).
-
-### Distinguish Untether bugs from upstream engine issues
-
-If the root cause is an engine CLI problem (auth, quota, provider outage, upstream regression), tag the test result as **upstream** in the final report but do NOT file an Untether issue. Examples: Anthropic 529, Google quota, OpenCode deprecation notices, Pi provider auth.
-
----
-
-## 1. Preflight
-
-### 1.1 Reinstall OpenCode
-
-OpenCode was archived upstream 2025-09-18 (see issue [#338](https://github.com/littlebearapps/untether/issues/338) for context). Confirm what binary is present, then install/upgrade:
-
-```bash
-which opencode && opencode --version 2>&1 | head -3
-# If missing, consult latest installer docs from opencode-ai/opencode (archived repo, release assets still hosted)
-# A typical reinstall path:
-#   curl -fsSL https://opencode.ai/install | bash
-#   OR: pipx install opencode-ai (if that distribution still pulls)
-# Verify:
-opencode --version
-```
-
-Record the installed version in Section 9 alongside each engine's version for release notes.
-
-### 1.2 Verify dev service health
-
-```bash
-systemctl --user status untether-dev --no-pager | head -20
-journalctl --user -u untether-dev --since "5 minutes ago" | grep -E "READY|startup|ERROR" | head -20
-uv run pytest -q 2>&1 | tail -3                           # 2292 passing at 80%+ coverage
-git log --oneline origin/dev ^master | head -12           # Confirm v0.35.2 commits are on dev
-```
-
-### 1.3 Snapshot versions
-
-Run once and pin in Section 9:
-
-```bash
-for cli in claude codex opencode pi gemini; do
-  echo "=== $cli ==="
-  $cli --version 2>&1 | head -2 || echo "not installed"
-done
-grep -E '^version' /home/nathan/untether/pyproject.toml
-cat ~/.untether-dev/untether.toml | head -30              # cfg snapshot for the report
-```
-
-### 1.4 Start a background log tail
-
-Run in a second terminal (or via background Bash):
-
-```bash
-journalctl --user -u untether-dev -f
-```
-
-Keep this running throughout testing — screenshots of context go in bug reports.
-
-### 1.5 Test chats (already configured)
-
-| Engine | Chat ID | Bot API chat_id | Test project cwd |
-|--------|---------|-----------------|------------------|
-| Claude Code | `5284581592` | `-5284581592` | `test-projects/test-claude` |
-| Codex CLI | `4929463515` | `-4929463515` | `test-projects/test-codex` |
-| OpenCode | `5200822877` | `-5200822877` | `test-projects/test-opencode` |
-| Pi | `5156256333` | `-5156256333` | `test-projects/test-pi` |
-| Gemini CLI | `5207762142` | `-5207762142` | `test-projects/test-gemini` |
-| ~~AMP~~ | `5230875989` | — | *skipped — auth blocked* |
-
-If a positive chat ID fails with `GEN-ERR-582`, use the negative Bot API form.
-
----
-
-## 2. v0.35.2 change → test map
-
-Every issue that landed in v0.35.2 is listed below. For each, the table gives the minimum live test. If a test fails, comment on the listed issue.
-
-| v0.35.2 issue | What landed | Primary test(s) | Engines |
-|---|---|---|---|
-| [#195](https://github.com/littlebearapps/untether/issues/195) | CI matrix `env:` fix | CI-only — SKIP runtime | — |
-| [#196](https://github.com/littlebearapps/untether/issues/196) | `bot_token` → `SecretStr` | `V1` | any |
-| [#197](https://github.com/littlebearapps/untether/issues/197) | `_HANDLED_REQUESTS` LRU | `S9` (concurrent Approve click) | Claude |
-| [#198](https://github.com/littlebearapps/untether/issues/198) | Env allowlist (Claude+Pi) | `V2` | Claude, Pi |
-| [#199](https://github.com/littlebearapps/untether/issues/199) | Codex HTML escape in auth error | `V3` | Codex |
-| [#200](https://github.com/littlebearapps/untether/issues/200) | Voice transcription sanitisation | `T1-err` | any |
-| [#201](https://github.com/littlebearapps/untether/issues/201) | Command dispatch sanitisation | `V4` | any |
-| [#202](https://github.com/littlebearapps/untether/issues/202) | Bandit global skips removed | CI-only — SKIP runtime | — |
-| [#203](https://github.com/littlebearapps/untether/issues/203) | Registry sweep (1-hour TTL) | `V5` — log assertion only | any |
-| [#204](https://github.com/littlebearapps/untether/issues/204) | `download_file` URL validation | `V6` — log-only smoke | any |
-| [#225](https://github.com/littlebearapps/untether/issues/225) | Pi model footer from JSONL | `V7` | **Pi** |
-| [#247](https://github.com/littlebearapps/untether/issues/247) | `callback.answered` latency log | `V8` | Claude |
-| [#275](https://github.com/littlebearapps/untether/issues/275) | Process tree cleanup | `V9` | Claude |
-| [#316](https://github.com/littlebearapps/untether/issues/316) | Cost footer accuracy + parity | `V10` | Claude, Gemini, OpenCode |
-| [#317](https://github.com/littlebearapps/untether/issues/317) | `run_once` cron persistence | `V11` | any (cron) |
-| [#318](https://github.com/littlebearapps/untether/issues/318) | Restart-required Telegram warning | `V12` | any |
-| [#320](https://github.com/littlebearapps/untether/issues/320) | Webhook port graceful bind | `V13` | any |
-| [#322](https://github.com/littlebearapps/untether/issues/322) | Stuck-after-tool_result detector | `V14` — no-false-positive | Claude, Codex, Gemini, OpenCode, Pi |
-| [#330](https://github.com/littlebearapps/untether/issues/330) | Per-cron `permission_mode` | `V15` | Claude |
-
-V-tests are specified in detail in Section 6.
-
----
-
-## 3. Tier 7 — command smoke (all engines, ~5 min)
-
-Fire these once in **each** of the 5 engine chats via `send_message` then verify the immediate response via `get_history` after ~2s.
-
-| # | Command | Expected | Notes |
-|---|---------|----------|-------|
-| Q1 | `/ping` | Pong line + uptime + trigger summary (if any) | Also checks rc4 trigger visibility in footer |
-| Q2 | `/config` | Inline keyboard menu | Press Back to close |
-| Q3 | `/usage` | Usage dict OR "no session yet" | No crash |
-| Q4 | `/export` | Export link OR "no session yet" | No crash |
-| Q5 | `/browse` | Inline keyboard with dirs/files | `list_inline_buttons` returns > 0 |
-| Q6 | `/verbose` | Toggle confirmation | |
-| Q7 | `/cancel` | "Nothing running" | |
-| Q8 | `/planmode` (Claude chat only) | Mode toggle | |
-| Q9 | `/stats` | Stats or empty | |
-| Q10 | `/ctx` | Context line | |
-| Q11 | `/agent` | Current engine default | |
-| Q12 | `/trigger` | Current trigger mode | |
-| Q13 | `/file` | Usage help | |
-
-Any command that crashes, hangs, or returns a raw Python traceback → **new issue** in v0.35.2 milestone.
-
----
-
-## 4. Tier 1 — universal (U1-U10, all 5 engines, ~45 min)
-
-Run **every U-test in every engine chat**. This is the regression backstop. 10 tests × 5 engines = 50 runs.
-
-For each run: `send_message(chat_id, prompt)`, sleep ~5-15s (depends on engine), then `get_history` and verify via text matching + `list_inline_buttons` where relevant.
-
-| # | Prompt / action | Verify | Related v0.35.2 issue(s) |
-|---|---|---|---|
-| U1 | `create a file called hello.txt with "hello world"` | Progress phases appear, final answer renders, footer shows **model name** + **cost line** + resume line | #316 (cost parity), #225 (Pi model) |
-| U2 | `list files here, then read README if present` | Multiple action phases visible in verbose | — |
-| U3 | `write a detailed explanation of how TCP/IP works, at least 2000 words` | Message splits across multiple Telegram messages; footer only on the last | — |
-| U4 | Reply to U1's resume line: `now rename hello.txt to greetings.txt` | Session continues, resume token accepted | — |
-| U5 | `/config` → Model → pick a non-default, send a prompt | Footer reflects new model | — |
-| U6 | Send U3 prompt then `/cancel` within 10s | Run stops, completion notice, **no orphan processes** | #275 |
-| U7 | `read /nonexistent/file/path` | Error renders cleanly in Telegram, no raw traceback | #200, #201 |
-| U8 | `/usage` after U1 | Cost info (Claude/Gemini/OpenCode) or no-cost notice (Codex/Pi) | #316 |
-| U9 | `/export` after U1 | Markdown export download works | — |
-| U10 | `/browse` | Inline keyboard, navigate one dir deep and back | — |
-
-**Extra Pi-specific check on U1** (related to #225): Pi chat must **not** have `/model set` override active (run `/agent` + `/model` first, clear if set). The footer must show the model name pulled from Pi's JSONL `message_end`, e.g. `gpt-5.4`. If the footer shows only `🏷 dir: pi-test` with no model → comment on [#225](https://github.com/littlebearapps/untether/issues/225).
-
----
-
-## 5. Tier 2 — Claude interactive (Claude chat only, ~15 min)
-
-Plan mode ON. `/planmode plan` before starting.
-
-| # | Prompt / action | Verify | Related |
-|---|---|---|---|
-| C1 | `run ls -la` | Approve/Deny/Discuss buttons appear; click Approve; command executes | — |
-| C2 | Same prompt, click Deny | Denial message reaches Claude cleanly | — |
-| C3 | `refactor main.py into smaller modules` → "Pause & Outline Plan" | Outline text renders; Approve/Deny/Discuss buttons auto-appear | — |
-| C4 | `should I use TypeScript or JavaScript for this project?` | AskUserQuestion with option buttons | — |
-| C5 | With plan mode, prompt that edits a file | Diff preview (old/new) in approval message | — |
-| C6 | Approve one tool, **quickly** deny the next | No stale button, no spinner hang | #197 |
-| C7 | `/usage` with `[footer]` enabled in cfg | 5h/weekly subscription footer | #316 |
-
----
-
-## 6. v0.35.2 scenarios (the payload)
-
-These verify the **specific fixes** landed this release. Most are one-shot and map 1:1 to an issue.
-
-### V1 — `bot_token` masking (#196)
-
-Send a spurious command that forces a bridge error and prompts structlog output (e.g. `/file get /etc/passwd`, expect path-denied).
-
-```bash
-journalctl --user -u untether-dev --since "5 minutes ago" \
-  | grep -iE "bot_token|token=.{20}"
-```
-
-**Pass:** no raw bot token (`\d+:[A-Za-z0-9_-]{35}`) appears in any log line.
-**Fail:** token visible → comment on [#196](https://github.com/littlebearapps/untether/issues/196).
-
-### V2 — env allowlist for Claude + Pi (#198)
-
-In the Claude chat, send:
-```
-run `printenv | sort` and report what's present. Then run `echo "AWS=$AWS_ACCESS_KEY_ID DB=$DATABASE_URL STRIPE=$STRIPE_API_KEY"`.
-```
-
-**Pass:** engine output shows essentials (PATH, HOME, CLAUDE_* vars, NODE_*, UV_*, ANTHROPIC_*) and the last echo shows `AWS= DB= STRIPE=` (all empty). No random third-party tokens leak in.
-**Fail (Claude or Pi):** any non-allowlisted env var is visible → comment on [#198](https://github.com/littlebearapps/untether/issues/198).
-
-Repeat in the Pi chat with the equivalent prompt. (Codex/OpenCode/Gemini keep the default inherit — no change expected, not a regression.)
-
-### V3 — Codex HTML escape (#199)
-
-In the Codex chat, trigger an auth error on purpose:
-```bash
-# Temporarily invalidate codex auth (if safe), OR send a command known to produce a subprocess error that surfaces as HTML to Telegram
-```
-
-**Pass:** error message renders as plain text inside `<pre>`, no `<b>` / `<a>` bleed-through into the Telegram render.
-**Fail:** HTML tags in the subprocess output are interpreted as Telegram formatting → comment on [#199](https://github.com/littlebearapps/untether/issues/199).
-
-### V4 — Dispatch sanitisation (#201)
-
-Send a malformed command that will raise in the dispatch layer: e.g. a `/ctx set` missing args, `/file get` with an absolute path outside project, or a forwarded message from an unknown chat.
-
-**Pass:** Telegram reply is a short friendly error. No absolute paths, no URLs, no raw stack traces visible.
-**Fail:** see `/home/nathan/…` or `Traceback` in the Telegram reply → comment on [#201](https://github.com/littlebearapps/untether/issues/201).
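-
-The three leak classes named in the pass criterion (absolute paths, URLs, stack traces) can be screened with a small script. This is a sketch under assumptions: the reply text has been copied out of Telegram as a string, `leak_reasons` is a hypothetical helper, and the list of root directories is illustrative rather than exhaustive.
-
-```python
-import re
-
-# Illustrative patterns only; extend the directory list to match the host under test.
-LEAK_PATTERNS = {
-    "absolute_path": re.compile(r"(?<![\w.])/(?:home|etc|usr|var|tmp|root)/\S+"),
-    "traceback": re.compile(r"Traceback"),
-    "url": re.compile(r"https?://\S+"),
-}
-
-
-def leak_reasons(reply):
-    """Return the sorted names of every leak pattern found in a Telegram reply."""
-    return sorted(name for name, pattern in LEAK_PATTERNS.items() if pattern.search(reply))
-```
-
-An empty result is consistent with a pass; any returned name points at the kind of detail that leaked.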
-
-### V5 — Registry sweep (#203) — log-only smoke
-
-Entries only expire after a 1-hour TTL, which is impractical to wait for during the run. Instead, verify that the sweep machinery exists and fires:
-
-```bash
-journalctl --user -u untether-dev --since "1 hour ago" | grep -E "registries.sweep|ephemeral.swept|outline.swept"
-```
-
-**Pass:** at least one sweep log event OR process was running < 1 hour (sweep runs on the 60-second stall-monitor tick but only prunes ≥1h old entries).
-**Fail:** logs show registries growing unbounded over multiple sessions with no sweep events → comment on [#203](https://github.com/littlebearapps/untether/issues/203).
-
-### V6 — `download_file` URL validation (#204)
-
-A malicious `getFile` response cannot be crafted from the MCP, so this is a log-only smoke test: send a normal file upload (`/file put README.md`) and confirm that the download path inside Untether never emits a validation warning for the legitimate case.
-
-```bash
-journalctl --user -u untether-dev --since "5 minutes ago" | grep -E "download_file.(rejected|invalid)"
-```
-
-**Pass:** no `download_file.rejected` events for legitimate uploads.
-**Fail:** false positives on legit paths → comment on [#204](https://github.com/littlebearapps/untether/issues/204).
-
-### V7 — Pi model footer from JSONL (#225)
-
-**Critical for this release** — this change just merged (PR #327).
-
-In the Pi chat:
-1. `/agent clear` (ensure Pi is the engine), `/model clear` (ensure no override).
-2. Confirm `pi.model` is **not** set in `~/.untether-dev/untether.toml`:
-   ```bash
-   grep -A2 '\[runners.pi\]' ~/.untether-dev/untether.toml || echo "no [runners.pi] section"
-   ```
-   If model is set, remove it temporarily for this test and restart dev.
-3. Send: `what is 2+2`.
-4. Wait for the final message, inspect the footer.
-
-**Pass:** footer shows the model name (e.g. `🏷 dir: pi-test | gpt-5.4`). The name should be `gpt-5.4` or whatever model Pi actually used.
-**Fail:** footer shows only `🏷 dir: pi-test` with no model name → comment on [#225](https://github.com/littlebearapps/untether/issues/225).
-
-### V8 — `callback.answered` instrumentation (#247)
-
-In the Claude chat with plan mode:
-1. Send `run echo hi`.
-2. When Approve/Deny/Discuss buttons appear, use `press_inline_button` to click Approve.
-3. Within 10 seconds:
-   ```bash
-   journalctl --user -u untether-dev --since "1 minute ago" | grep callback.answered
-   ```
-
-**Pass:** at least one `callback.answered` entry with keys `latency_ms`, `total_ms`, `early=true`, `has_toast`. `latency_ms` < 2000 for healthy conditions.
-**Fail:** no log entry, or `latency_ms` suspiciously high (> 5000) and correlates with a `BotResponseTimeoutError` → comment on [#247](https://github.com/littlebearapps/untether/issues/247).
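-
-The latency thresholds can be checked against the captured log lines. The sketch below assumes structlog renders the event as `key=value` pairs on one line (the actual renderer may differ, e.g. JSON), and both function names are hypothetical.
-
-```python
-import re
-
-# Assumption: the log line carries the latency as `latency_ms=<number>`.
-LATENCY_RE = re.compile(r"\blatency_ms=(\d+(?:\.\d+)?)")
-
-
-def callback_latencies(lines):
-    """Collect latency_ms values from lines that mention callback.answered."""
-    values = []
-    for line in lines:
-        if "callback.answered" not in line:
-            continue
-        match = LATENCY_RE.search(line)
-        if match:
-            values.append(float(match.group(1)))
-    return values
-
-
-def all_healthy(latencies, limit_ms=2000.0):
-    """True only if at least one entry exists and every latency is under the limit."""
-    return bool(latencies) and all(value < limit_ms for value in latencies)
-```
-
-An empty list counts as unhealthy here because the pass criterion requires at least one `callback.answered` entry.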
-
-### V9 — Process tree cleanup (#275)
-
-In the Claude chat:
-1. Send: `create a node project in /tmp/workerd-test-<random-suffix> with @cloudflare/vitest-pool-workers and run one quick test`.
-2. While it's running (tool execution visible in progress), note the Untether process tree:
-   ```bash
-   # capture before cancel
-   ps --forest -ef | grep -E "untether|claude|node|workerd" | head -20
-   ```
-3. `/cancel`.
-4. After 30 seconds:
-   ```bash
-   # confirm cleanup
-   ps aux | grep -E "workerd|vitest|defunct" | grep -v grep
-   ls /proc/$(pgrep -of '.venv/bin/untether')/fd 2>/dev/null | wc -l  # FD count should be stable; -o picks the oldest PID if several match
-   ```
-
-**Pass:** no `workerd`, `vitest`, or stale `node` processes survive, no zombies.
-**Fail:** orphan processes remain → comment on [#275](https://github.com/littlebearapps/untether/issues/275) including ps output and PID details.
-
-### V10 — Cost footer + parity (#316)
-
-Four sub-tests:
-
-**V10.1** — Claude: run U1, verify footer shows both API cost **and** subscription usage (`⚡ 5h: NN% | 7d: NN%`). Numbers must be plausible (< 100%, > 0% after the run).
-
-**V10.2** — Gemini: run U1, verify footer shows `total_cost_usd` (not zero unless genuinely free tier).
-
-**V10.3** — OpenCode: run U1, verify token counts render **even when cost is zero** (OpenCode free-tier case).
-
-**V10.4** — zero-turn / cached response: send a trivial prompt twice back to back (second may be cached). Second response's footer must still render the turn count cleanly (no missing turns).
-
-**Fail on any sub-test:** comment on [#316](https://github.com/littlebearapps/untether/issues/316) specifying which sub-test and the rendered footer.
-
-### V11 — `run_once` cron persistence (#317)
-
-Setup:
-```bash
-cat >> ~/.untether-dev/untether.toml <<'EOF'
-
-[[triggers.crons]]
-id = "v0352-test-once"
-schedule = "* * * * *"
-chat_id = 5284581592        # Claude chat
-engine = "claude"
-prompt = "reply with the word READY and nothing else"
-run_once = true
-EOF
-```
-
-1. Wait up to 60s for the cron to fire in the Claude chat. Verify `READY`-style response appears.
-2. Check state file:
-   ```bash
-   cat ~/.untether-dev/run_once_fired.json
-   ```
-   Must contain `v0352-test-once` with a recent ISO timestamp.
-3. Trigger a hot-reload: `touch ~/.untether-dev/untether.toml`.
-4. Wait 90 seconds. The cron must **not** fire again.
-5. Restart dev: `systemctl --user restart untether-dev`. Wait 90 seconds. Must **not** fire again.
-
-**Pass:** state file records the fire, no second or third firing after reload/restart.
-**Fail:** cron re-fires → comment on [#317](https://github.com/littlebearapps/untether/issues/317). Include the state file contents and journalctl line for each fire.
-
-Cleanup: remove the test cron block and the state file entry.
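-
-The state-file check in step 2 can be sketched as follows. The schema of `run_once_fired.json` is not specified here, so this assumes a flat JSON object mapping cron id to an ISO 8601 timestamp; `fired_recently` is a hypothetical helper and the 5-minute window is an arbitrary choice.
-
-```python
-from datetime import datetime, timedelta, timezone
-
-
-def fired_recently(state, cron_id, now, max_age=timedelta(minutes=5)):
-    """Return True if `state` records `cron_id` with a timestamp no older than `max_age`.
-
-    `now` must be timezone-aware.
-    """
-    stamp = state.get(cron_id)
-    if not isinstance(stamp, str):
-        return False
-    # Older Python versions reject a trailing "Z", so normalise it to an explicit offset.
-    fired_at = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
-    if fired_at.tzinfo is None:
-        fired_at = fired_at.replace(tzinfo=timezone.utc)  # assumption: naive stamps are UTC
-    return timedelta(0) <= now - fired_at <= max_age
-```
-
-Load the file with `json.load` and pass `datetime.now(timezone.utc)` as `now`; adjust the lookup if the real file nests the timestamp differently.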
-
-### V12 — Restart-required Telegram warning (#318)
-
-1. Note current `session_mode` in `~/.untether-dev/untether.toml`.
-2. Flip it (`stateless` ↔ `chat`) and save.
-3. Dev bot auto-reloads on file change. Within ~5 seconds:
-   - Primary chat (Claude chat in current dev cfg) must receive a Telegram message matching: `⚠️ Config reload: session_mode changed — restart required to take effect.`
-   - `journalctl` must show `config.reload.transport_config_changed keys=['session_mode'] restart_required=true`.
-4. Revert the change and save. A second warning must fire.
-
-**Pass:** both the Telegram message **and** the structlog event.
-**Fail:** one or both missing → comment on [#318](https://github.com/littlebearapps/untether/issues/318).
-
-Also verify: editing a **hot-reloadable** key (`voice_transcription`) does **not** produce the warning.
-
-### V13 — Webhook port bind graceful (#320)
-
-1. Ensure triggers are enabled in cfg with `[triggers.server]` (port 9876 by default).
-2. Occupy 9876 in another process: `python3 -m http.server 9876 &`.
-3. Restart dev: `systemctl --user restart untether-dev`.
-4. Tail logs for 10 seconds:
-   ```bash
-   journalctl --user -u untether-dev --since "30 seconds ago" | grep -E "bind_failed|triggers.server"
-   ```
-5. Verify the rest of the bot is alive: `send_message` a `/ping` to the Claude chat.
-
-**Pass:** structured `triggers.server.bind_failed` event with `host`, `port`, `hint`, `fix` fields; `/ping` still works.
-**Fail:** bot crashes or restart-loops → comment on [#320](https://github.com/littlebearapps/untether/issues/320). In either case, clean up the port squatter afterwards: `kill %1` (or equivalent).
-
-### V14 — Stuck-after-tool_result no-false-positive (#322)
-
-Across Claude, Codex, Gemini, OpenCode, Pi: run a normal U1-style prompt that involves a Read/Write tool.
-
-```bash
-journalctl --user -u untether-dev --since "10 minutes ago" | grep -E "stuck_after_tool_result"
-```
-
-**Pass:** **zero** `progress_edits.stuck_after_tool_result` or `recovery` events fired during healthy runs.
-**Fail:** detector fires on a normal run → comment on [#322](https://github.com/littlebearapps/untether/issues/322) with the engine, tool name, and timing.
-
-(Active recovery test — a real MCP wedge — is out of scope; this is a no-false-positive regression check only.)
-
-### V15 — Per-cron `permission_mode` (#330)
-
-1. Ensure Claude chat has plan mode (`/planmode plan`).
-2. Add:
-   ```toml
-   [[triggers.crons]]
-   id = "v0352-test-perm"
-   schedule = "* * * * *"
-   chat_id = 5284581592
-   engine = "claude"
-   prompt = "run `date` and tell me"
-   permission_mode = "auto"
-   run_once = true
-   ```
-3. Hot-reload picks it up; wait ≤ 60 s for fire.
-4. Verify the run completed **without** presenting approval buttons — the `permission_mode = "auto"` override should have taken effect despite the chat being in plan mode.
-5. Check:
-   ```bash
-   journalctl --user -u untether-dev --since "2 minutes ago" | grep trigger.cron.permission_mode_override
-   ```
-
-**Pass:** run completes autonomously; `permission_mode_override` INFO event recorded.
-**Fail:** approval buttons presented, or run blocked → comment on [#330](https://github.com/littlebearapps/untether/issues/330).
-
-Cleanup: remove the cron, remove the id from `run_once_fired.json` if it got written.
-
----
-
-## 7. Tier 3 — Telegram transport (selective, ~15 min)
-
-Only the tests relevant to what shipped in v0.35.2 are included; no Telegram transport code changed materially except for #197 / #247.
-
-| # | Test | Related |
-|---|---|---|
-| T6 | Emoji entities — any engine: `respond with 5 emoji flags and bold the country names`. Entities render correctly. | — |
-| T8 | Stale button click — let a Claude session complete and age ~10 min. Click old Approve button. Toast says expired. | #197 |
-| S9 | Concurrent Approve clicks — two rapid `press_inline_button` on the same button. Exactly one Approve path fires. | #197 |
-
----
-
-## 8. Tier 5/6 — operational + stress (~15 min)
-
-| # | Test | Related |
-|---|---|---|
-| B5 | Log inspection after full test run: `journalctl --user -u untether-dev --since "2 hours ago" \| grep -E "ERROR\|WARNING"` — must only surface expected entries (config warnings during V12, intentional cancels during V9). | all |
-| S2 | Concurrent sessions — send U1 in Claude **and** Gemini simultaneously. Both finish, no cross-contamination. | — |
-| S3 | `/restart` mid-run — start a long Claude run, send `/restart`. Drain notice appears, bot restarts, new runs accepted. | — |
-| S7 | Rapid-fire: 5 prompts to the Claude chat in under 5s. Exactly one session locks; rest queue or reject cleanly. | — |
-
-FD count sanity after everything:
-
-```bash
-ls /proc/$(pgrep -of '.venv/bin/untether')/fd 2>/dev/null | wc -l  # -o picks the oldest PID if several match
-ps aux | grep -E "defunct|Z " | grep -v grep
-```
-
-The FD count should be in the low hundreds, with no zombies.
-
----
-
-## 9. Final report template
-
-At the end of the test run, write a result block with this shape (can be pasted into a summary issue or commit message):
-
-```markdown
-## v0.35.2 integration test report — <UTC date>
-
-**Dev bot version:** <pyproject version> on commit <git SHA>
-**Engines:** claude <ver>, codex <ver>, opencode <ver>, pi <ver>, gemini <ver> (amp skipped: auth)
-
-### Tier 7 (command smoke)
-- Claude: <pass|fail per Q>
-- Codex: <…>
-- OpenCode: <…>
-- Pi: <…>
-- Gemini: <…>
-
-### Tier 1 (universal, U1-U10)
-- Matrix: 5 engines × 10 tests = 50 runs
-- Results: <N pass / M fail / K upstream>
-- Failures: <list with test-id + issue link>
-
-### Tier 2 (Claude interactive)
-<pass/fail per C>
-
-### v0.35.2 scenarios (V1-V15)
-<pass/fail per V with linked issue>
-
-### Tier 3 selective (T6, T8, S9)
-<pass/fail>
-
-### Tier 5/6 (B5, S2, S3, S7)
-<pass/fail>
-
-### Logs
-- FD count after suite: <N>
-- Zombies: <none|list>
-- Unexpected WARNING/ERROR lines: <count + one-liners>
-
-### Bugs filed / commented
-- Commented on existing issues: <list of #N>
-- New issues filed in v0.35.2 milestone: <list of #N>
-
-### Release readiness
-- <Go | No-go with blocker list>
-```
-
-Drop this report in a PR comment or as a new `docs/tests/results/` entry if retained.
-
----
-
-## 10. Known limitations of this plan
-
-- **AMP is skipped** per user instruction (sign-in blocked). Revisit before a v0.35.3 cut.
-- **OpenCode** is deprecated upstream (archived 2025-09-18). Treat any new failures as documented-in-advance unless they reveal an Untether bug (bad error handling, crash) — see [#338](https://github.com/littlebearapps/untether/issues/338).
-- **V9 (process tree)** can be flaky under rate-limited API conditions; rerun if the first attempt looks inconclusive.
-- **V14 (stuck detector)** only verifies no-false-positive; a real wedge test would require orchestrating a Cloudflare MCP stall and is deferred.
-- **V3 (Codex HTML escape)** requires a reproducible Codex auth-error path; if you can't naturally trigger one, mark as N/A rather than fail.
-- **V5, V6** are log-only smoke tests — they don't actively attack the hardened path, they just verify it's not broken on the legitimate path.
-- **V12** requires `[config_watch]` or the equivalent auto-reload mechanism already enabled; if the cfg file isn't being watched, do `systemctl --user reload untether-dev` or send a SIGHUP equivalent instead.
-
----
-
-## 11. Execution checklist (top-to-bottom)
-
-- [ ] Preflight 1.1 — reinstall OpenCode
-- [ ] Preflight 1.2-1.5 — services, versions, logs, chats confirmed
-- [ ] Section 3 — Tier 7 command smoke in all 5 chats
-- [ ] Section 4 — Tier 1 U1-U10 in all 5 chats (50 runs)
-- [ ] Section 5 — Tier 2 C1-C7 in Claude chat
-- [ ] Section 6 — V1…V15 (skip V3 if Codex auth can't error; mark N/A)
-- [ ] Section 7 — T6, T8, S9
-- [ ] Section 8 — B5, S2, S3, S7 + FD/zombie check
-- [ ] Section 9 — write the report
-- [ ] Comment on any landed issue that had a failing test (see Section 0)
-- [ ] File new issues in v0.35.2 milestone for anything not covered
-- [ ] Final verdict: go / no-go for v0.35.2 release cut
-
-**Estimated total time**: ~2 hours end-to-end, single-operator, including log sweeps and report writing.