Skip to content

fix: tooling hardening — runs.json crash, image-copy abort, jargon, dead code#706

Merged
Muizzkolapo merged 3 commits into
mainfrom
fix/tooling-hardening
Jun 19, 2026
Merged

fix: tooling hardening — runs.json crash, image-copy abort, jargon, dead code#706
Muizzkolapo merged 3 commits into
mainfrom
fix/tooling-hardening

Conversation

@Muizzkolapo

Copy link
Copy Markdown
Owner

Summary

Six findings from hardening/reviews/tooling-review.md for agent_actions/tooling/.

# Severity File Fix
1 P0 — HIGH docs/run_tracker.py record_action_start, record_action_complete, finalize_workflow_run now catch (OSError, JSONDecodeError) on json.load. A corrupted runs.json from a previous interrupted write no longer crashes the live workflow; a warning is logged and the call returns. start_workflow_run already had this guard.
2 P1 — MEDIUM code_scanner.py scan_tool_functions now catches OSError alongside SyntaxError/UnicodeDecodeError. A broken symlink or file deleted between rglob and read_text no longer aborts catalog generation. Matches sibling pattern in docs/scanner/component_scanners.py.
3 P1 — MEDIUM docs/generator.py _copy_readme_images wraps the mkdir+copy2 pair in try/except OSError. A single bad image (disk full, permission denied) no longer aborts the entire catalog. Log message uses "stage" because the same block covers both mkdir and copy failures.
4 P1 — MEDIUM docs/generator.py _copy_readme_images switched from str.replace to span-based substitution with right-to-left splicing. The naive replace corrupted shared filename suffixes — logo.png would rewrite inside dark/logo.png on the first replace, double-prefixing the second.
5 P2 — LOW lsp/diagnostics.py, lsp/server.py Rewrote IDE diagnostic messages: dropped "context_scope reference" framework jargon in favour of action/field language users actually author. Signature-help documentation similarly clarified.
6 P3 — LOW lsp/models.py Removed unused ActionDefinition dataclass (and its DEFAULT_ACTION_KIND import). Superseded by ActionMetadata; no callers anywhere — grep -r "ActionDefinition" returns only the class definition.

Verification

  • ruff format --check agent_actions tests — clean on touched files
  • ruff check agent_actions/tooling tests/unit/tooling — clean
  • pytest tests/unit/tooling tests/tooling — 279 passed (152 unit + 127 integration)
  • pytest -q --ignore=tests/integrations — 7409 passed, 2 skipped
  • pytest tests/integrations -q — 96 passed
  • Pi consensus (5 reviewers): 2 confirmed YES with full positive reviews, 1 UNCLEAR due to verdict-line parse mismatch on a positive review, 1 Pi process error, 1 truncated. Reviewer 4's cosmetic note on the error message wording ("Failed to copy" when block also covers mkdir) addressed before commit by switching to "Failed to stage".

New tests

tests/unit/tooling/test_run_tracker.py — corrupted and empty runs.json no-crash cases for record_action_start, record_action_complete, finalize_workflow_run. Verifies warning is logged.

tests/unit/tooling/test_copy_readme_images.py (new file) — substring-collision regression (logo.png vs dark/logo.png), selective copy2 failure does not abort or propagate, external URLs (http://, https://, data:) pass through unchanged.

A _allow_log_propagation fixture re-enables propagation on the agent_actions logger for the duration of caplog-based assertions, since LoggerFactory sets propagate=False at the root.

…ead code

Six findings from hardening/reviews/tooling-review.md.

P0 — RunTracker json.load now wrapped in try/except (OSError,
JSONDecodeError) in record_action_start, record_action_complete and
finalize_workflow_run. A corrupted runs.json from a previous interrupted
write no longer crashes the live workflow thread; the call logs a warning
and returns. start_workflow_run already had this guard.

P1 — code_scanner.scan_tool_functions now catches OSError around
read_text, alongside the existing SyntaxError/UnicodeDecodeError. A
broken symlink or file deleted between rglob and read_text no longer
aborts catalog generation.

P1 — generator._copy_readme_images guards the mkdir+copy2 pair with
try/except OSError so a single bad image (disk full, permission denied)
no longer aborts the entire catalog. The log message uses "stage"
because the same block covers both mkdir and copy failures.

P1 — generator._copy_readme_images switched from str.replace to
span-based substitution with right-to-left splicing. The naive replace
corrupted shared filename suffixes — 'logo.png' would rewrite inside
'dark/logo.png' on the first replace, double-prefixing the second.

P2 — LSP diagnostics and signature help rewritten to drop the
"context_scope reference" phrasing in favour of action/field language
users actually author. server.py signature documentation similarly
clarified.

P3 — Removed the unused ActionDefinition dataclass (and its
DEFAULT_ACTION_KIND import) from lsp/models.py. Superseded by
ActionMetadata; no callers anywhere.

Tests: 7 new cases covering corrupted/empty runs.json across all three
methods, the substring-collision fix, the copy-failure fallback, and
URL passthrough. A `_allow_log_propagation` fixture restores log
propagation on the `agent_actions` logger for the duration of caplog-
based assertions (LoggerFactory sets propagate=False).

Verification: ruff format --check + ruff check clean on touched files;
pytest -q passes 7409 + 96 integration tests with the new ones included.
Pi consensus: 2/5 YES with full positive reviews, 1 UNCLEAR due to
verdict-parse mismatch on a YES review, 1 Pi process error, 1 review
truncated. Reviewer 4's cosmetic note on the error message wording
addressed before commit.
Eight findings from a recall-mode simplify review of PR #706.

Correctness:

- run_tracker.py now `.touch(exist_ok=True)` the runs.json file before
  acquiring the portalocker lock in record_action_start /
  record_action_complete / finalize_workflow_run. Without the touch,
  `portalocker.Lock(file, "r+", ...)` raises FileNotFoundError if the
  file is missing — and the @Retry decorator only catches
  LockException, so FNFE propagated to the workflow thread. This was
  the exact crash the previous commit's docstring claimed to prevent.
  start_workflow_run already had the touch; the other three did not.

- The three new try/except guards no longer just `return` on a
  corrupted file. They now repair runs.json to the
  `_empty_runs_data(extended=True)` shape before returning, matching
  the sibling pattern in _load_runs_data_from_file (line 125) and
  start_workflow_run (line 247). The old code's silent-fallback shape
  left the file permanently broken so every subsequent record/finalize
  call kept logging "cannot record action ..." warnings forever. The
  helper _load_or_repair was extracted to centralise this behaviour.

- code_scanner.scan_tool_functions now also catches OSError raised by
  `Path.rglob("*.py")` itself during iteration (it raises during the
  for-statement, not inside the body, so the inner try/except didn't
  cover it). A 0700-mode subdir under tools/ no longer aborts the
  whole scan.

- lsp/server.py SignatureHelp documentation now accurately describes
  the data: variables are aggregated across every action in the file,
  not "this action's context_scope.observe". The diagnostics layer
  uses a stricter per-action set for guard-condition validation, so
  the doc now points users at per-action diagnostics for strict scope.

Cleanup:

- Removed the dead `seen_spans: set[tuple[int, int]]` dedup in
  _copy_readme_images. The html and markdown URL-group spans live in
  disjoint surrounding chars (`src="..."` vs `](...)`), so their
  (start, end) tuples can never coincide; `finditer` already returns
  non-overlapping matches within a single pattern.

- Hoisted `images_dir.mkdir(parents=True, exist_ok=True)` out of the
  per-image try/except. A systemic mkdir failure (read-only artefact
  dir) now logs one "cannot create images directory" warning and
  aborts cleanly instead of producing N misleading per-image warnings.
  The per-image try/except is back to covering only copy2.

- Moved the _allow_log_propagation fixture into a new
  tests/unit/tooling/conftest.py (mirroring tests/unit/llm/conftest.py
  and tests/unit/unification/conftest.py) and marked it autouse. The
  two test files no longer carry duplicated fixture bodies or pass it
  as a method parameter.

Tests:

- Added TestCorruptedRunsFileRecovers covering all three methods plus
  a follow-up test proving the healed file is usable by a normal
  start_workflow_run + record_action_start sequence afterwards.
- Added TestMissingRunsFileNoCrash covering the missing-file case for
  all three methods.
- Added test_mkdir_failure_aborts_cleanly_with_one_warning covering
  the hoisted-mkdir behaviour.

Verification:

- ruff format --check / ruff check clean on touched files.
- pytest tests/unit/tooling tests/tooling: 284 passed.
- pytest -q --ignore=tests/integrations: 7414 passed, 2 skipped
  (up from 7409 due to the new heal/missing-file tests).
mypy --strict-equivalents flagged 'Returning Any from function declared
to return dict[str, Any]'. Annotate the loaded value explicitly so the
type checker sees the intended shape without changing runtime behavior.
@Muizzkolapo Muizzkolapo merged commit 55b3124 into main Jun 19, 2026
5 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant