Skip to content

feat(notifications): outbound webhook (#560)#594

Merged
frankbria merged 8 commits into
mainfrom
feat/webhook-notifications-560
May 22, 2026
Merged

feat(notifications): outbound webhook (#560)#594
frankbria merged 8 commits into
mainfrom
feat/webhook-notifications-560

Conversation

@frankbria
Copy link
Copy Markdown
Owner

@frankbria frankbria commented May 22, 2026

Summary

Closes #560.

Adds outbound webhook notifications: a single user-configurable URL in Settings → Notifications that receives JSON payloads on three events:

  • batch.completed — only on full BATCH_COMPLETED (not PARTIAL/FAILED/CANCELLED, per ticket scope)
  • blocker.created
  • pr.merged — only on successful merges

The Test button posts a sample payload and surfaces the HTTP status code via toast.

Implementation notes

  • Storage: per-workspace .codeframe/notifications_config.json via atomic_write_json, mirroring the workspace_config pattern from [Phase 5.1] Settings page: PROOF9 defaults and workspace config #556. No SQLite migration — the Traycer plan's workspace_settings table was unnecessary now that the JSON-on-disk convention is established.
  • Service layer: WebhookNotificationService gains a generic send_event(payload, url) → WebhookSendResult and send_event_background(...) wrapper. The legacy send_blocker_notification(_background) API used by backend_worker_agent.py and frontend_worker_agent.py is preserved untouched.
  • Dispatch sites: core/blockers.py:create(), four BATCH_COMPLETED emit sites in core/conductor.py, and ui/routers/pr_v2.py:merge_pull_request after a successful merge. All wrapped in try/except — a misconfigured webhook can never break the triggering operation. 5s timeout, fire-and-forget.
  • Endpoints: GET / PUT /api/v2/settings/notifications and POST /api/v2/settings/notifications/test, added to the existing settings_v2.py (no new router file).
  • UI: new NotificationsTab.tsx registered as the 5th settings tab. URL input + enable toggle + Test button (disabled when URL is unsaved or empty). Save/Discard mirror WorkspaceConfigTab.tsx.

Test plan

  • tests/core/test_notifications_config.py — 9 tests: load/save roundtrip, defaults, corrupt JSON fallback, is_webhook_active semantics
  • tests/notifications/test_webhook_send_event.py — 13 tests: 2xx/5xx/timeout/ClientError/no-URL paths, URL override, background scheduling, payload formatters
  • tests/ui/test_settings_notifications.py — 10 tests: GET defaults, PUT persist, PUT empty/whitespace, GET-after-PUT, POST /test (200/500/no-URL/disabled-flag)
  • tests/core/test_blockers_webhook.py — 4 tests: no dispatch when disabled, dispatch when enabled, failure swallowed
  • tests/core/test_conductor_webhook.py — 7 tests: fires only on BATCH_COMPLETED, not on PARTIAL/FAILED/CANCELLED, failure swallowed
  • tests/ui/test_pr_v2_webhook.py — 4 tests: dispatch with full GitHub URL, fallback URL when env unconfigured, failure swallowed
  • web-ui/src/__tests__/components/settings/NotificationsTab.test.tsx — 12 tests: empty/loaded state, dirty-state Save/Test toggling, success/error toasts for each test-button outcome, Discard semantics
  • SettingsPage.test.tsx updated for the new tab
  • npm run build succeeds
  • Lint clean on all touched files
  • 121 v2 tests pass across affected modules (settings, workspace_config, pr_v2, conductor-related, blockers, notifications)

Known limitations

  • Webhook URL is stored plaintext for v1 — consistent with the workspace SQLite DB / BLOCKER_WEBHOOK_URL env var pattern. TODO: encrypt at rest will be a follow-up.
  • Webhook is fire-and-forget with no retry — matches ticket scope ("Fire and forget (no retry), log failures").
  • HMAC signing / verification is explicitly out of scope per the ticket.

Summary by CodeRabbit

  • New Features

    • Outbound, non-blocking webhook notifications for blocker.created, batch.completed, and pr.merged events
    • Workspace-scoped notification settings with a new Notifications tab, API client support, and a server-side “Test webhook” action
    • Webhook payload formatting and best-effort background delivery with defensive failure handling
  • Documentation

    • Roadmap/docs updated: Phase 5.3 “Async notifications” marked complete with webhook details and failure semantics
  • Tests

    • Extensive tests for dispatch, config validation, UI flows, and failure resilience

Review Change Stack

Settings → Notifications tab lets users configure a single webhook URL +
enabled toggle that receives JSON payloads on three events:

  - batch.completed (only on full COMPLETED, not PARTIAL/FAILED/CANCELLED)
  - blocker.created
  - pr.merged (only on successful merges)

The Test button posts a sample {"event": "test"} payload and surfaces the
HTTP status code via toast.

Implementation:
  - Per-workspace .codeframe/notifications_config.json (atomic write),
    matching the workspace_config pattern from #556. No SQLite migration.
  - WebhookNotificationService gains a generic send_event(payload, url)
    that returns WebhookSendResult (ok/status_code/error) and a fire-and-
    forget send_event_background wrapper. Legacy send_blocker_notification
    API preserved for the existing in-agent blocker integration.
  - Dispatch helpers in core/blockers.py, core/conductor.py (4 BATCH sites),
    and ui/routers/pr_v2.py — all wrapped in try/except so a misconfigured
    webhook never breaks the triggering operation. 5s timeout.
  - 47 new tests across config storage, send_event, settings endpoints,
    and per-trigger dispatch wiring.

Closes #560.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR implements end-to-end outbound webhook notifications: per-workspace config persistence, a webhook sender with payload formatters, best-effort dispatch on blocker.create, batch.completed, and pr.merged, settings API endpoints (get/put/test), and a Notifications settings UI with tests.

Changes

Webhook Notifications Implementation

Layer / File(s) Summary
Documentation: Phase Completion Status
CLAUDE.md, docs/PRODUCT_ROADMAP.md
Marks Phase 5.3 complete and documents the webhook surface: settings UI, persistence, test endpoint, dispatched events, and failure-handling model (logged, non-blocking).
Notifications Configuration Storage
codeframe/core/notifications_config.py, tests/core/test_notifications_config.py
Per-workspace webhook config persisted to state_dir/notifications_config.json; provides load_notifications_config, save_notifications_config, and is_webhook_active() with defaults, atomic writes, URL safety checks, and redacted logging.
Webhook Service & Payload Formatting
codeframe/notifications/webhook.py, tests/notifications/test_webhook_send_event.py
Adds WebhookSendResult, UTC ISO-8601 timestamp helper, payload formatters for batch.completed, blocker.created, pr.merged, and test, send_event() returning structured results, and send_event_background() for fire-and-forget dispatch from async or sync contexts.
Event Dispatchers: Blocker, Batch, PR
codeframe/core/blockers.py, codeframe/core/conductor.py, codeframe/ui/routers/pr_v2.py, tests/core/test_blockers_webhook.py, tests/core/test_conductor_webhook.py, tests/ui/test_pr_v2_webhook.py
Integrates best-effort webhook dispatchers: blocker creation, batch completion (resume/retry/serial/parallel), and PR merge now trigger background webhook sends when enabled; dispatchers check activation, format payloads, call send_event_background, and swallow/log failures.
Settings API: Webhook Configuration Endpoints
codeframe/ui/routers/settings_v2.py, tests/ui/test_settings_notifications.py
Adds GET /api/v2/settings/notifications, PUT /api/v2/settings/notifications (trim + empty→null + scheme/host validation), and POST /api/v2/settings/notifications/test which sends a test payload and returns send outcome or 400 when no URL is configured.
Frontend: Settings Page & Notifications UI
web-ui/src/app/settings/page.tsx, web-ui/src/components/settings/NotificationsTab.tsx, web-ui/src/lib/api.ts, web-ui/src/types/index.ts, web-ui/src/__tests__/components/settings/*
Adds a Notifications tab and NotificationsTab React component with SWR-backed draft editing, Test/Save/Discard actions, API client notificationsApi.get/update/test, TypeScript types, and unit tests covering UI flows and toasts.
Backend tests for config & dispatch
tests/core/*, tests/ui/*
Pytest suites validating notifications config behavior, event dispatch integration, and dispatch resilience (errors are logged and do not break flows).
Webhook send & payload tests
tests/notifications/test_webhook_send_event.py
Unit tests for send_event, send_event_background, exception mapping, sync/async background dispatch behavior, and payload formatting/timestamp correctness.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 A webhook hops from code to net,

It posts a payload, tiny and neat,
Batch, blocker, PR — a gentle thunk,
Fire-and-forget, then off to sleep. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat(notifications): outbound webhook (#560)' directly and concisely describes the main change: adding outbound webhook functionality to the notifications feature.
Linked Issues check ✅ Passed The PR fully implements all coding requirements from #560: webhook URL persistence, test endpoint, batch/blocker/PR events, fire-and-forget dispatch with logging, 5s timeout, URL validation, and comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes are directly scoped to #560 objectives: webhook infrastructure, notification config storage, dispatch points, API endpoints, and UI tab. Documentation updates and test modules are supporting changes for the webhook feature implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/webhook-notifications-560

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Code Review — feat(notifications): outbound webhook (#560)

Overall: This is a well-scoped implementation that cleanly follows the established patterns for workspace config, service layers, and tab UI. Architecture is sound — core is headless, dispatch is fire-and-forget, failures never bubble into callers. The test coverage is comprehensive. A few items to address before merge.


Issues

1. URL validation — SSRF risk (low but real)

save_notifications_config and update_notification_settings accept any string as webhook_url. A user could enter file:///etc/passwd or an internal network address and aiohttp would attempt the request. Since this is a UI feature backed by user auth, the blast radius is limited, but basic validation is cheap:

from urllib.parse import urlparse
parsed = urlparse(url)
if parsed.scheme not in ('http', 'https'):
    raise ValueError(f"Webhook URL must use http or https, got: {parsed.scheme!r}")

2. send_event_background silently drops in CLI (sync) context

blockers.py and conductor.py are called from CLI code that runs without an event loop. send_event_background catches the RuntimeError from asyncio.get_running_loop() and logs a warning, so webhooks will never fire from CLI-initiated batch runs or blocker creation. This is mentioned in the PR description but is not documented anywhere user-facing (CLAUDE.md, roadmap). Consider adding a note to the known limitations section — users configuring webhooks and triggering batch via CLI will be confused when they receive no delivery.

3. # pragma: no cover on a reachable branch

In blockers.py line 47:

    except Exception:  # pragma: no cover -- guarded by send_event_background already

The justification is incorrect — send_event_background only guards network-level errors in the async path. The outer except is reachable if the lazy import fails (e.g., module not installed). Remove the pragma or add a test patching the import to raise ImportError. The same pattern in conductor.py does NOT use pragma: no cover — that inconsistency should be fixed either way.

4. Timestamp interoperability

_utc_iso_now() returns +00:00 offset format (e.g., 2026-05-22T10:30:00.123456+00:00). Many webhook consumers (Slack, Zapier, Discord) expect the Z suffix. Consider:

def _utc_iso_now() -> str:
    return datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

Not a blocker but may cause consumer-side parsing warnings.


Minor / Nice to have

  • Duplicate atomic write logic: notifications_config.py (core) reimplements the tempfile atomic write that ui/routers/_helpers.atomic_write_json already provides. Architecturally forced (core can't import UI), so not a bug. Long-term fix would be moving the utility to core/utils.py.

  • test_notification_webhook ignores webhook_enabled: The /test endpoint fires regardless of the enabled flag. This is confirmed intentional by test_test_works_even_when_enabled_flag_is_false (test before enabling is good UX). Worth a one-line docstring note so future editors don't "fix" it.

  • Confirm tests/notifications/ is discovered: The new tests/notifications/test_webhook_send_event.py directory — worth confirming pytest picks it up without an __init__.py in the test runner CI step.


What's well done

  • is_webhook_active is the right abstraction — callers get a URL or None, no config schema leakage.
  • All three dispatch helpers (_dispatch_blocker_webhook, _dispatch_batch_completed_webhook, _dispatch_pr_merged_webhook) are consistently structured with the same try/except shape.
  • canTest in NotificationsTab.tsx is exactly right: URL must match the saved value (not dirty) before the Test button enables.
  • Legacy send_blocker_notification / send_blocker_notification_background API is untouched — zero risk of breaking existing callers.
  • BATCH_COMPLETED-only filter correctly gates on events.EventType.BATCH_COMPLETED, not a string comparison.
  • 59 new tests across 7 test files is solid coverage.

Summary: Items 1 (URL validation) and 2 (CLI-context documentation) are worth addressing before merge. Items 3 and 4 are small but clean fixes. Everything else is polish.

@frankbria
Copy link
Copy Markdown
Owner Author

Demo verification (Phase 11) — end-to-end outcome evidence

Started a local HTTP receiver on 127.0.0.1:9876, configured a workspace's notifications_config.json to point at it, and exercised all three trigger paths. The receiver logged the actual payloads — not "the function was called", but the bytes that arrived over the wire.

Setup

[receiver] listening on :9876
[demo] workspace at /tmp/tmpXXX/demo_ws
[demo] AC1: saved webhook_url='http://127.0.0.1:9876/hook' enabled=True (is_webhook_active='http://127.0.0.1:9876/hook')

AC1 — Webhook URL saves and persists

save_notifications_config + load_notifications_config roundtrip confirmed; is_webhook_active(workspace) returns the URL only when both URL is set AND enabled flag is true.

AC2 — Test button fires a payload and surfaces the HTTP status

[demo] AC2: /test result -> ok=True status=204 error=None

Receiver got:

{"event": "test", "timestamp": "2026-05-22T18:06:04.034448+00:00"}

AC3 — Events fire on real triggers

Blocker created (blockers.create):

{"event": "blocker.created", "blocker_id": "a1349c6a-8ea8-4dce-b1f4-c0a62302d015", "task_id": null, "timestamp": "..."}

Batch completion (_dispatch_batch_completed_webhook with BATCH_COMPLETED):

{"event": "batch.completed", "batch_id": "demo-batch-2", "task_count": 5, "timestamp": "..."}

Negative case (BATCH_PARTIAL): receiver got nothing — confirming we only fire on full COMPLETED, per the ticket scope.

AC4 — Failures don't break the triggering operation

With the webhook URL changed to http://127.0.0.1:1/dead (port 1 = always refused), a blockers.create() call still succeeded and returned a valid Blocker. Log output captured the error without raising:

Webhook ClientError for event blocker.created: Cannot connect to host 127.0.0.1:1 ssl:default [Connect call failed ('127.0.0.1', 1)]
[demo] AC4: blocker.create against dead URL still returned blocker id=86ae27c2 (no exception bubbled up)

AC5 — Tests pass

  • Backend: full v2 suite — 3036 passed, 2130 deselected in 664s (no regressions)
  • Frontend: full unit suite — 886 passed across 72 suites (already green in CI)
  • Build: npm run build clean

The PR-merge webhook (pr.merged) wasn't included in the runtime demo because it requires an actual GitHub merge against a live PR; it's covered by tests/ui/test_pr_v2_webhook.py (4 tests) including the canonical-URL construction and a graceful fallback when GITHUB_REPO is unset.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
tests/ui/test_settings_notifications.py (1)

39-47: ⚡ Quick win

Add a contract test that workspace_path is required.

Because the fixture overrides get_v2_workspace, these tests won’t catch regressions where the router stops requiring workspace_path. Add one non-overridden test asserting request failure when workspace_path is missing.

As per coding guidelines, codeframe/ui/routers/**/*.py: Web UI API client must require workspace_path query parameter for all endpoints.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ui/test_settings_notifications.py` around lines 39 - 47, Add a new test
(not using the client fixture that overrides get_v2_workspace) that mounts the
settings_v2.router on a FastAPI app and issues a request to one of its endpoints
without providing the workspace_path query parameter, asserting the response is
a 400/422 (validation error) or otherwise indicates a missing required query
param; specifically, create an app, include settings_v2.router, do NOT set
app.dependency_overrides[get_v2_workspace], use TestClient to call an endpoint
from settings_v2.router without workspace_path, and assert the request fails to
ensure the router enforces the workspace_path requirement.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@codeframe/core/notifications_config.py`:
- Around line 51-55: load_notifications_config currently assumes
json.loads(path.read_text()) returns a dict and calls data.get(...), which will
raise for non-object JSON (e.g., []); modify load_notifications_config to
validate that data is a mapping (e.g., isinstance(data, dict)) and if not, treat
it as an empty dict before accessing keys so that "webhook_url" and
"webhook_enabled" are read safely (return None and False respectively when
payload is not an object).

In `@codeframe/notifications/webhook.py`:
- Around line 275-293: send_event_background currently returns early when no
running asyncio loop, dropping webhooks; modify it so the sync call path still
queues delivery by, on RuntimeError, spawning a daemon thread that calls
asyncio.run(self.send_event(payload, url=url)) (or equivalent thread-target
wrapper) so the send runs asynchronously without blocking the caller. Update the
except RuntimeError block in send_event_background to create and start that
daemon Thread rather than logging and returning, keeping the existing
loop.create_task behavior when a running loop exists.

In `@codeframe/ui/routers/settings_v2.py`:
- Around line 433-496: Wrap calls to load_notifications_config,
save_notifications_config, and WebhookNotificationService.send_event in
try/except blocks inside get_notification_settings,
update_notification_settings, and test_notification_webhook respectively; on
exception, log the error (use logger.exception or request-scoped logger) and
raise HTTPException(status_code=500, detail=api_error(str(e),
ErrorCodes.SERVER_ERROR)) so the response follows the router's api_error payload
pattern, preserving the existing behavior for validation (e.g., keep the 400
when no URL) and include the exception message in the api_error detail for
debugging.

In `@tests/notifications/test_webhook_send_event.py`:
- Around line 111-119: The test is incorrectly decorated with
`@pytest.mark.asyncio` so an event loop exists and the "outside loop" branch isn't
exercised; change test_send_event_background_outside_loop_logs_and_returns to a
synchronous test by removing the `@pytest.mark.asyncio` decorator and converting
async def to def, instantiate WebhookNotificationService and call
svc.send_event_background({"event":"test"}) exactly as before (optionally keep
assertions or caplog checks) so the call runs outside any event loop and
exercises the intended branch in
WebhookNotificationService.send_event_background.

---

Nitpick comments:
In `@tests/ui/test_settings_notifications.py`:
- Around line 39-47: Add a new test (not using the client fixture that overrides
get_v2_workspace) that mounts the settings_v2.router on a FastAPI app and issues
a request to one of its endpoints without providing the workspace_path query
parameter, asserting the response is a 400/422 (validation error) or otherwise
indicates a missing required query param; specifically, create an app, include
settings_v2.router, do NOT set app.dependency_overrides[get_v2_workspace], use
TestClient to call an endpoint from settings_v2.router without workspace_path,
and assert the request fails to ensure the router enforces the workspace_path
requirement.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 953e7b11-c3e1-4e93-9d82-70e970fc428e

📥 Commits

Reviewing files that changed from the base of the PR and between d1fc789 and 7dd20e8.

📒 Files selected for processing (20)
  • CLAUDE.md
  • codeframe/core/blockers.py
  • codeframe/core/conductor.py
  • codeframe/core/notifications_config.py
  • codeframe/notifications/webhook.py
  • codeframe/ui/routers/pr_v2.py
  • codeframe/ui/routers/settings_v2.py
  • docs/PRODUCT_ROADMAP.md
  • tests/core/test_blockers_webhook.py
  • tests/core/test_conductor_webhook.py
  • tests/core/test_notifications_config.py
  • tests/notifications/test_webhook_send_event.py
  • tests/ui/test_pr_v2_webhook.py
  • tests/ui/test_settings_notifications.py
  • web-ui/src/__tests__/components/settings/NotificationsTab.test.tsx
  • web-ui/src/__tests__/components/settings/SettingsPage.test.tsx
  • web-ui/src/app/settings/page.tsx
  • web-ui/src/components/settings/NotificationsTab.tsx
  • web-ui/src/lib/api.ts
  • web-ui/src/types/index.ts

Comment thread codeframe/core/notifications_config.py
Comment thread codeframe/notifications/webhook.py
Comment thread codeframe/ui/routers/settings_v2.py
Comment thread tests/notifications/test_webhook_send_event.py Outdated
Internal review found four issues; this commit addresses all of them.

1. **SSRF guard (high)**: `PUT /api/v2/settings/notifications` now rejects
   URLs whose scheme isn't `http(s)` or that lack a host. Previously a user
   could enter `file:///etc/passwd` and aiohttp would attempt the request.

2. **CLI sync-context delivery (high)**: `send_event_background` now
   spawns a daemon thread when no event loop is running. Previously
   webhooks would silently never fire from `cf work batch run` and other
   CLI-initiated batch / blocker creation paths — a functional defect
   against AC3 of the ticket. Verified end-to-end against a local
   receiver: both `blocker.created` and `batch.completed` events arrive
   from sync Python with no event loop.

3. **Incorrect `# pragma: no cover` (low)**: Removed from `blockers.py`.
   The justification was wrong — the import inside the helper can fail
   independently of the network-level guard in `send_event_background`.

4. **Timestamp interoperability (low)**: Payload `timestamp` now uses
   `YYYY-MM-DDTHH:MM:SSZ` instead of `…+00:00` offset format. Slack,
   Discord, and Zapier all prefer the Z suffix. Sub-second precision was
   dropped along with the offset.

New tests:
  - 5 endpoint tests for URL scheme/host validation (file://, ftp://,
    http://, https://, missing host)
  - 1 test confirming `send_event_background` runs in a daemon thread
    when called from sync context
  - 1 test confirming the Z-suffix timestamp format
@frankbria
Copy link
Copy Markdown
Owner Author

Review feedback addressed (commit 6d7f1fc)

Internal Claude review flagged 4 issues; all fixed.

# Severity Issue Fix
1 High SSRF risk — file:// / ftp:// accepted as webhook_url _validate_webhook_url in settings_v2.py rejects non-http(s) schemes and URLs without a host. 5 new endpoint tests.
2 High send_event_background silently drops in CLI sync context Spawns a daemon thread when no event loop is running. Verified end-to-end: both blocker.created and batch.completed arrived at a local receiver from sync Python. 1 new unit test.
3 Low Incorrect # pragma: no cover in blockers.py Removed.
4 Low Timestamp uses +00:00 instead of Z _utc_iso_now now returns YYYY-MM-DDTHH:MM:SSZ. 1 new format test.

Re-demo: CLI sync-context (the AC3 defect)

$ python /tmp/webhook_receiver.py 9876 &
[receiver] listening on :9876
$ python /tmp/demo_webhook_sync.py
[sync-demo] workspace at /tmp/tmpj1q7bp7k/demo_ws, webhook -> http://127.0.0.1:9876/hook
18:10:54 BLOCKER_CREATED task_id=t-sync-1
[sync-demo] created blocker 42e5cbc1
[sync-demo] dispatched BATCH_COMPLETED
[sync-demo] done.

# receiver captured:
[RECEIVED] {"event": "blocker.created", "blocker_id": "42e5cbc1-…", "task_id": "t-sync-1", "timestamp": "2026-05-22T18:10:54Z"}
[RECEIVED] {"event": "batch.completed", "batch_id": "sync-batch-1", "task_count": 3, "timestamp": "2026-05-22T18:10:54Z"}

Note the Z suffix on timestamps and the fact that both events arrived from a synchronous Python process with no event loop — this is the path that previously dropped silently.

Tests

  • 82/82 webhook-related v2 tests pass (was 59, added 7 new tests)
  • Lint clean on all touched files

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Follow-up Review — feat(notifications): outbound webhook (#560)

Reviewing against the prior review comment from the initial cut. The author also ran a complete end-to-end demo (AC1–AC5) with a live HTTP receiver, which is strong evidence of correctness.


Prior Issues — Status

1. SSRF URL validation ✅ Resolved
_validate_webhook_url in settings_v2.py enforces http/https scheme and a non-empty host, and raises HTTPException(400) otherwise. Test coverage in test_settings_notifications.py confirms file://, ftp://, and host-less URLs are rejected.

2. CLI sync dispatch (fire-and-forget from non-async context) ✅ Resolved
send_event_background now detects the absence of a running event loop and spawns a daemon thread calling _run_send_event_sync -> asyncio.run(send_event(...)). The demo confirms blocker and batch webhooks fire from CLI-initiated code. test_send_event_background_outside_loop_runs_in_thread verifies the thread is spawned. This was the most impactful open issue.

3. pragma: no cover inconsistency ✅ Not present
The shipped implementation does not use # pragma: no cover on any of the dispatch helpers — the exception blocks are all exercised by the test_*_failure_is_swallowed / test_*_failure_does_not_raise tests.

4. Timestamp Z suffix ✅ Resolved
_utc_iso_now() uses strftime with Z suffix — no fractional seconds, no +00:00 offset. test_timestamp_uses_z_suffix_not_offset asserts this explicitly.


Remaining Concern (one item)

Non-dict JSON in load_notifications_config

The except clause catches OSError, json.JSONDecodeError, and ValueError, but not AttributeError. If the file contains valid non-object JSON ([], null, a bare integer), json.loads() succeeds and then data.get(...) raises AttributeError — which propagates uncaught, breaking the "never raises" contract.

Minimal fix:

data = json.loads(path.read_text())
if not isinstance(data, dict):
    raise ValueError(f"Expected JSON object, got {type(data).__name__}")

CodeRabbit flagged this too. Small but real gap — a manually edited config file could trigger it.


What is Well Done

  • All four dispatch helpers follow the same consistent shape: load config → check is_webhook_active → format payload → send_event_background → catch-all except Exception with logging.
  • is_webhook_active is the right single control point — callers get a URL or None with no config-schema leakage.
  • canTest in NotificationsTab.tsx is correct: saved URL must match the current input (no dirty state) before Test enables.
  • The PR is tightly scoped to issue [Phase 5.3] Async notifications: outbound webhook #560; HMAC and retry are explicitly called out as future work in code comments and CLAUDE.md.

Verdict: The one remaining open item (isinstance(data, dict) guard) is a one-liner. Everything else from the prior review has been addressed. Once that fix lands, this is ready to merge.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@codeframe/ui/routers/settings_v2.py`:
- Around line 450-480: The PUT-only validation in _validate_webhook_url allows
persisted/hand-edited webhook URLs to bypass checks; ensure validation runs at
send/load time by invoking _validate_webhook_url (or its logic) from every
runtime send path before creating/using WebhookNotificationService: update
test_notification_webhook to call _validate_webhook_url on the loaded URL and
modify the dispatcher in codeframe/core/blockers.py (where notifications are
read from notifications_config.json and sent) to re-validate and reject/skip
invalid entries, or centralize validation in the shared load/send routine so
persisted configs cannot bypass the scheme and host checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a347c4ad-df45-4eb1-a1be-c6d8024d8e1f

📥 Commits

Reviewing files that changed from the base of the PR and between 7dd20e8 and 6d7f1fc.

📒 Files selected for processing (5)
  • codeframe/core/blockers.py
  • codeframe/notifications/webhook.py
  • codeframe/ui/routers/settings_v2.py
  • tests/notifications/test_webhook_send_event.py
  • tests/ui/test_settings_notifications.py

Comment thread codeframe/ui/routers/settings_v2.py
Address remaining review feedback from CodeRabbit and the follow-up
Claude review:

1. **Non-dict JSON crashes `load_notifications_config`**: a config file
   containing valid-but-non-object JSON (`[]`, `null`, `42`) caused
   `data.get(...)` to raise `AttributeError` uncaught, violating the
   "never raises" contract. `isinstance(data, dict)` guard added; the
   shared except clause now catches it via re-raised ValueError.

2. **Defence-in-depth URL validation at load/send time**: the PUT
   endpoint validates URL scheme/host, but a hand-edited or
   pre-migration `.codeframe/notifications_config.json` could bypass
   that. `is_webhook_active` now re-runs scheme/host validation and
   returns None (skipping dispatch) if a stored URL is unsafe. The
   `/test` endpoint also re-validates before sending.

3. **Structured exception handling for endpoints**: `PUT /notifications`
   and `POST /notifications/test` now wrap I/O failures in the standard
   `api_error(...)` shape used elsewhere in the router, instead of
   bubbling raw 500s.

New tests:
  - 3 corrupt-JSON cases in test_notifications_config.py (list/null/int)
  - 3 unsafe-URL cases in is_webhook_active (file://, schemeless, valid
    https as positive control)
  - 1 endpoint test rejecting an unsafe stored URL at /test time

130/130 webhook-related v2 tests pass.
@frankbria
Copy link
Copy Markdown
Owner Author

Round 2 — addressing CodeRabbit + follow-up Claude review (commit 77adedb)

CodeRabbit (cross-family review) and the follow-up Claude review converged on the same remaining concern after my last fixup, plus two adjacent items. All three are addressed.

Source Severity Issue Status
Claude + CodeRabbit Major load_notifications_config raises AttributeError on valid-but-non-object JSON ([], null, 42) Fixed: isinstance(data, dict) guard, falls back to defaults
CodeRabbit Major URL validation only on PUT — hand-edited config could bypass Fixed: is_webhook_active re-validates scheme/host; /test endpoint re-validates before sending
CodeRabbit Minor Endpoint I/O failures bubble as raw 500s instead of api_error(...) shape Fixed: PUT/test now use the consistent error envelope
CodeRabbit Major send_event_background drops in sync path Already fixed in 6d7f1fc (daemon thread) — CodeRabbit reviewed the older commit
CodeRabbit Major Test doesn't run outside event loop Already fixed in 6d7f1fc (removed @pytest.mark.asyncio, patches _run_send_event_sync) — same
CodeRabbit Nitpick Contract test that workspace_path is required Skipped — concern is router-wide, not specific to #560 changes; existing v2 tests cover the dependency override

New tests (8)

  • Corrupt JSON: list/null/integer → defaults
  • Defence-in-depth: hand-edited file:// and schemeless URLs rejected at is_webhook_active
  • Hand-edited unsafe URL rejected at /test endpoint
  • Positive control: valid https:// URL accepted

Backend: 130/130 webhook-related v2 tests pass. Lint clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Code Review — feat(notifications): outbound webhook (#560)

Overall this is a solid implementation. The architecture respects the core-headless contract, the fire-and-forget isolation is thorough, the defense-in-depth URL validation is a nice touch, and the test coverage is comprehensive across all layers. A few issues worth addressing before merge:


Issues

1. Unused expression in sync-dispatch thread test

In tests/notifications/test_webhook_send_event.py:

threading.active_count() - threads_before  # informational only

This computes a value but discards it — it is not an assertion and does nothing. If you want to verify no threads leak, turn it into assert threading.active_count() - threads_before <= 1. As written it gives false confidence.

2. Private-network SSRF not blocked

_validate_webhook_url and _is_safe_webhook_url guard against file:// / ftp:// / schemeless URLs, but they do not block RFC-1918 addresses or loopback. These all pass validation today: http://localhost:8080/admin, http://127.0.0.1/internal, http://192.168.1.1/. The docstring already says "SSRF on local resources" but only the scheme half of that risk is addressed. In a self-hosted single-user tool the blast radius is limited, but once this runs behind any shared server a misconfigured URL can reach internal services. Recommend at least a TODO comment noting this gap, or a socket-level check in _is_safe_webhook_url if the effort is acceptable.

3. Test button tooltip vs. disabled state edge case

In NotificationsTab.tsx:

const canTest =
  !testing && trimmedUrl.length > 0 && (data.webhook_url ?? '') === trimmedUrl;

When a user appends a trailing space to a saved URL, trimmedUrl equals data.webhook_url so canTest is true — but dirty is also true, so the tooltip says "Save the URL before testing" while the button is enabled. Simple fix: add && !dirty to canTest:

const canTest = !testing && !dirty && trimmedUrl.length > 0 && !!data.webhook_url;

4. WebhookNotificationService class docstring is stale

The class docstring still reads "Async webhook notification service for blocker alerts" — the class is now general-purpose (batch / blocker / PR / test events). The module-level docstring was updated; the class docstring was not.

5. format_batch_payload missing a status field

The payload only has batch_id and task_count. Consumers cannot see why the batch ended — the event only fires for BATCH_COMPLETED today, but that contract is invisible to webhook receivers. A "status": "completed" field would make the payload self-documenting and allow forward-compatible differentiation if PARTIAL/FAILED events are added later.


Nitpicks

  • Misleading roadmap text: CLAUDE.md / PRODUCT_ROADMAP.md say "atomically written via atomic_write_json" but save_notifications_config in core/ implements its own tempfile.mkstemp + os.replace directly (correctly — core cannot import the UI helper). The description could be more precise.
  • Import ordering in settings_v2.py: The notification imports land between codeframe.core.workspace and codeframe.lib.rate_limiter. Not a blocker, but isort would reorder these.
  • pr_v2.py fallback URL is opaque: pr_url = f"pr#{pr_number}" (no scheme, no host) will likely confuse downstream webhook consumers. Consider a clearer placeholder or omit the field rather than returning an unparseable non-URL string.

What is working well

  • The lazy-import pattern in dispatch helpers keeps the critical path clean and ensures a bad import never breaks the triggering operation.
  • send_event_background correctly handles both sync (daemon thread) and async (create_task) contexts — the CLI path is easy to miss and it is explicitly tested.
  • test_rejects_unsafe_stored_url_at_test_time verifies defence-in-depth against hand-edited config — good test.
  • The enabled-flag-independent Test button behaviour is an intentional and well-documented design choice.

Items 1 and 3 are the ones I would most want addressed before landing; 2 is worth a TODO comment at minimum.

Address remaining follow-up Claude review items.

1. **Useless expression in sync-dispatch test**: replaced
   `threading.active_count() - threads_before  # informational only`
   with a real assertion that waits up to ~1s for the daemon thread to
   join and asserts no thread leak.

2. **Private-network SSRF gap**: added explicit TODO in
   `_is_safe_webhook_url` noting that RFC-1918 and loopback addresses
   are intentionally permitted for self-hosted single-user use; flagged
   for a socket-level check if CodeFRAME ever runs multi-tenant.

3. **Stale class docstring**: `WebhookNotificationService` is no longer
   blocker-only; docstring now describes the general-purpose `send_event`
   role alongside the legacy `send_blocker_notification`.

4. **`format_batch_payload` self-documenting status field**: added
   `"status": "completed"` so consumers don't need out-of-band knowledge
   that the event only fires for COMPLETED today; forward-compatible if
   PARTIAL/FAILED are added later.

5. **`pr_url` no longer carries an unparseable sentinel**: when
   GitHubIntegration can't be constructed, the payload now sets
   `pr_url: None` instead of `"pr#N"`. Added a new always-present
   `pr_number` field so consumers can branch on it without parsing a URL.

6. **`canTest` requires !dirty**: defence-in-depth so the Test button
   never fires against a value other than the persisted/displayed URL,
   even via programmatic state edits.

Test updates: 2 new payload-shape tests (`status`, optional `pr_url`),
1 thread-leak assertion, plus the PR-merge tests updated to the new
`format_pr_payload(pr_number, pr_url)` signature.

132/132 backend webhook tests pass; 12/12 frontend NotificationsTab
tests pass; lint clean; build clean.
@frankbria
Copy link
Copy Markdown
Owner Author

Round 3 — polish for follow-up Claude review (commit 97989f9)

The latest review caught a useless expression, several payload/API design improvements, and a minor doc/SSRF gap. All addressed.

# Severity Issue Fix
1 Major Unused expression in sync-dispatch test gave false confidence Replaced with a real "no thread leak" assertion that waits up to ~1s
2 Major _is_safe_webhook_url doesn't block RFC-1918 / loopback Added explicit TODO noting this is intentional for self-hosted single-user; flagged for a socket-level check if multi-tenant
3 Minor WebhookNotificationService class docstring still said "for blocker alerts" Updated to describe the general-purpose role
4 Minor format_batch_payload lacked self-documenting status field Added "status": "completed" — forward-compatible if PARTIAL/FAILED events get added
5 Nitpick pr_url fallback sentinel "pr#N" was unparseable Now sets pr_url: None when GitHub isn't configured; added always-present pr_number field so consumers can branch without parsing URLs
6 Minor canTest could fire against a different value than displayed if dirty Added !dirty to canTest; defense-in-depth (the specific reachability via type="url" trailing-whitespace was actually a no-op since jsdom/browsers normalize, but the cleaner check is worth it)

Tests

  • Backend: 132/132 webhook-related v2 tests pass (added 3 new payload-shape tests)
  • Frontend: 12/12 NotificationsTab tests pass; build clean
  • Lint clean

Payload changes (additive, backward-compatible for existing consumers)

  • batch.completed: now includes status: "completed"
  • pr.merged: now includes pr_number: <int>; pr_url is now Optional[str] (was always a string, sometimes the unparseable pr#N sentinel)

Both changes are documented in code via the comments on the format helpers.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Code Review — Round 4 (commit 97989f9)

This review checks that all Round 3 items are correctly implemented in the current code.

Round 3 Items — All Resolved

# Issue Verdict
1 Unused expression in sync-dispatch thread test Fixed: Replaced with real assertion + polling (assert threading.active_count() - threads_before <= 0)
2 RFC-1918 / loopback SSRF not blocked Fixed: Explicit TODO added to _is_safe_webhook_url docstring documenting intentional behavior for self-hosted use, with actionable guidance for the multi-tenant path
3 Stale class docstring on WebhookNotificationService Fixed: Updated to describe the general-purpose outbound role
4 format_batch_payload missing status field Fixed: "status": "completed" added with a forward-compatibility comment; test_format_batch_payload_includes_status asserts it
5 pr_url fallback was unparseable sentinel "pr#N" Fixed: Now correctly sets pr_url = None when GitHubIntegration() can't be constructed; pr_number is always present for consumers
6 canTest could fire while form is dirty Fixed: !dirty added: const canTest = !testing && !dirty && !!data.webhook_url && trimmedUrl.length > 0 — prevents testing against a URL that differs from what's saved

Spot-checks on the Current Diff

  • _is_safe_webhook_url TODO: confirmed present at the correct location in notifications_config.py, explaining RFC-1918 scope and the socket-level check path.
  • Test endpoint re-validates stored URL: _validate_webhook_url(url) is called at test-time for defence-in-depth; test_rejects_unsafe_stored_url_at_test_time covers the hand-edited bypass path.
  • non-dict JSON guard: isinstance(data, dict) check from Round 2 is present, with three covering tests (list/null/integer).
  • Sync dispatch daemon thread: _run_send_event_sync path confirmed in send_event_background; thread assertion uses polling loop (50 x 20ms) before asserting no leaked thread.
  • canTest trailing-space edge case: with !dirty, a trailing space on the saved URL sets dirty = true and correctly disables the Test button even though trimmedUrl matches the saved value.

One Minor Observation (no action required)

import time is placed inline inside test_send_event_background_outside_loop_runs_in_thread rather than at the module top. Valid Python, but slightly inconsistent with the file's other imports. Not worth a change on its own.

Verdict

All prior review items across four rounds are resolved. The implementation is clean: core is headless, dispatch is consistently fire-and-forget, URL validation is defence-in-depth at both the API layer and the send layer, and the 60+ new tests cover the full surface. Ready to merge.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@codeframe/core/notifications_config.py`:
- Around line 137-141: The current logger.warning call in the webhook validation
branch logs the raw url (see _is_safe_webhook_url and the logger.warning call),
which may leak secrets; change it to log a redacted representation instead—parse
the provided url (using urllib.parse or equivalent) and reconstruct only the
safe parts (e.g., scheme://host[:port] and path/query replaced with "<redacted>"
or omitted) and use that redacted string in the warning message so the log still
indicates which host/scheme was refused without exposing tokens or credentials.

In `@codeframe/ui/routers/pr_v2.py`:
- Around line 198-205: The docstring describing the helper is incorrect about
the pr_url fallback: update the helper's docstring to reflect that when
GitHubIntegration() fails the code sets pr_url = None (not "pr#N"); locate the
block using GitHubIntegration, the pr_url variable and pr_number and change the
docstring text to state that the canonical github.com URL is returned when
GITHUB_REPO is configured, otherwise pr_url will be None and consumers should
rely on the always-present pr_number field.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c80de8c-8629-4a02-ae2a-81e2986d9bd2

📥 Commits

Reviewing files that changed from the base of the PR and between 6d7f1fc and 97989f9.

📒 Files selected for processing (10)
  • codeframe/core/notifications_config.py
  • codeframe/notifications/webhook.py
  • codeframe/ui/routers/pr_v2.py
  • codeframe/ui/routers/settings_v2.py
  • tests/core/test_notifications_config.py
  • tests/notifications/test_webhook_send_event.py
  • tests/ui/test_pr_v2_webhook.py
  • tests/ui/test_settings_notifications.py
  • web-ui/src/__tests__/components/settings/NotificationsTab.test.tsx
  • web-ui/src/components/settings/NotificationsTab.tsx

Comment thread codeframe/core/notifications_config.py
Comment thread codeframe/ui/routers/pr_v2.py
)

Two new CodeRabbit findings on the latest commits:

1. **URL credential leak in logs** (Major): `is_webhook_active` was
   logging the full rejected URL via `%s`. Webhook URLs commonly embed
   credentials in the path/query (Slack `T*/B*/...` tokens, signed
   Zapier hooks, GitHub PAT URLs). Added `_redact_url_for_log` that
   returns just `scheme://host` for log output. New test asserts
   path-embedded tokens and query parameters never appear in log text.

2. **Stale docstring** (Minor): `_dispatch_pr_merged_webhook` docstring
   still said the fallback was `pr#N`, but round-3 changed it to
   `pr_url = None` (with `pr_number` always present). Docstring now
   matches the implementation.
@frankbria
Copy link
Copy Markdown
Owner Author

Round 4 — secret-leak guard + stale docstring (commit 3c022c5)

CodeRabbit's late-arriving review on the round-2/3 commits caught two more items.

# Severity Issue Fix
1 Major is_webhook_active logs the raw rejected URL — Slack/Zapier/GitHub webhooks often embed credentials in path/query, which would leak into logs Added _redact_url_for_log returning only scheme://host. New test asserts path tokens and query params never appear in log records.
2 Minor _dispatch_pr_merged_webhook docstring still said fallback was "pr#N" (round-3 changed it to pr_url=None) Docstring now matches the implementation.

Both threads marked resolved via GraphQL.

Backend webhook tests: 104/104 passing (1 new redaction test). Lint clean.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Code Review — feat(notifications): outbound webhook (#560)

Context: This is my first review pass. CodeRabbit has done three passes and the author has iterated on the feedback. The current diff reflects those fixes (non-object JSON guard, daemon-thread sync path, URL redaction in logs, _validate_webhook_url re-check at test time). Those items look resolved and I won't repeat them.


What's working well

  • Architecture compliance: notifications_config.py is cleanly headless — no FastAPI imports, no UI concerns. The deferred-import pattern in _dispatch_* helpers correctly avoids circular imports while keeping core headless. ✅
  • Failure isolation: every dispatch site wraps in try/except and the triggering operation always completes regardless of webhook state. The 5s timeout is sensible for fire-and-forget. ✅
  • Security layers: scheme/netloc validation at PUT time and re-validated in is_webhook_active for hand-edited configs; log redaction via _redact_url_for_log with a test asserting no token leaks. ✅
  • Test depth: 59 new backend tests + 12 frontend tests covering the full dispatch chain from config → service → UI. The test_unsafe_url_log_does_not_leak_path_or_query test is a nice touch. ✅

Issues to address

1. _validate_webhook_url return value silently discarded in /test endpoint

In settings_v2.py:test_notification_webhook:

url = (cfg["webhook_url"] or "").strip()
if not url:
    raise HTTPException(400, ...)
_validate_webhook_url(url)   # ← return value thrown away
svc = WebhookNotificationService(webhook_url=url, timeout=5)

_validate_webhook_url is a dual-purpose function — it normalises (strips, None-ifies empty) and validates (raises on bad scheme). Here only the raising side-effect is wanted, but this isn't obvious and a future reader might drop the call thinking it's dead code. Either add a one-line comment (# raises HTTPException(400) for non-http(s) scheme) or use the return value (url = _validate_webhook_url(url) or url) to make the intent legible.


2. loop.create_task return value discarded — possible spurious log noise

In send_event_background (async path):

loop.create_task(self.send_event(payload, url=url))

send_event already swallows all exceptions, so in practice this won't raise. But Python ≥ 3.11 emits Task exception was never retrieved to the logs when an unscheduled task escapes the event loop. Assigning the task and adding a no-op done-callback is idiomatic protection:

task = loop.create_task(self.send_event(payload, url=url))
task.add_done_callback(lambda t: t.exception() if not t.cancelled() else None)

Minor, but worth one line to prevent confusing log noise in production.


3. Missing type annotation on _dispatch_batch_completed_webhook

def _dispatch_batch_completed_webhook(
    workspace: Workspace, event_type, batch_id: str, task_count: int
) -> None:

event_type should be events.EventType. It's used in a direct equality check (event_type != events.EventType.BATCH_COMPLETED), so the type is clear from context, but the annotation is missing while all other parameters are typed.


4. Remaining CodeRabbit nitpick (open)

The suggestion from the first CodeRabbit pass to add a contract test that GET /api/v2/settings/notifications requires workspace_path is still open. All three new settings endpoints share this requirement; a missing workspace_path should 400/422, and the current test suite exercises only the overridden-dependency path. A single un-overridden test would catch regressions. Low priority, but it's been mentioned twice.


5. save_notifications_config reimplements atomic write

The PR description says config is persisted "via atomic_write_json", but save_notifications_config implements mkstemp + os.replace inline. The existing atomic_write_json helper lives in codeframe/ui/routers/_helpers.py, which core correctly cannot import (Architecture Rule #1). That's the right call. If this pattern recurs, consider promoting atomic_write_json to codeframe/lib/ where both layers can share it — but that's future work, not a blocker here.


Summary

Strong implementation — the layered security, failure isolation, and test depth are all well above par for a fire-and-forget notification feature. Items 1–3 above are small fixes (one line each); item 4 is a pre-existing open suggestion; item 5 is a note for future work. Ready to merge once items 1–3 are addressed.

…560)

Latest claude-review caught three small follow-ups:

1. **Python 3.11+ task-exception log noise**: ``loop.create_task`` result
   was discarded in the async path of ``send_event_background``. If
   ``send_event`` ever raises through its catch-all, Python 3.11+ emits
   ``Task exception was never retrieved`` to the logs. Added a no-op
   done-callback that calls ``.exception()`` to consume the result.

2. **Missing type annotation**: ``_dispatch_batch_completed_webhook``'s
   ``event_type`` parameter now annotated as ``events.EventType``.

3. **Discarded return-value clarity**: added an inline comment in the
   ``/notifications/test`` endpoint explaining that we call
   ``_validate_webhook_url`` for its raise side-effect, not its return.

New test:
  - ``test_send_event_background_task_has_done_callback`` verifies the
    done-callback is registered on the created task (via task repr,
    which CPython exposes the callback list through).

Also documented why the "require workspace_path" contract test
suggested by CodeRabbit isn't feasible at the endpoint level — the
``get_v2_workspace`` dependency falls back to default-workspace
resolution, so it'd need a separate router-wide fix, not a #560 test.
@frankbria
Copy link
Copy Markdown
Owner Author

Round 5 — claude-review nits (commit 2044c7c)

Three small follow-ups from the most recent claude review.

# Severity Issue Fix
1 Minor loop.create_task discard could emit Task exception was never retrieved warning on Python 3.11+ Added task.add_done_callback(lambda t: t.exception())send_event already swallows exceptions; this just guarantees the result is consumed so the warning never appears
2 Minor Missing type annotation on _dispatch_batch_completed_webhook event_type Annotated as events.EventType
3 Nitpick _validate_webhook_url(url) return value discarded without comment Added an inline comment explaining we call it for the raise side-effect, not the return

About the open CodeRabbit nitpick (workspace_path contract test)

I investigated this — get_v2_workspace doesn't actually 400/422 on missing workspace_path. It falls back to default-workspace resolution. So a contract test would fail (or pass for the wrong reason). That contract is a router-wide architectural concern, not specific to the #560 endpoints, and would need a fix in the dependency itself.

Inline comment added to the test file explaining this — won't reopen the conversation.

Tests

  • Backend: 134/134 webhook-related v2 tests pass (added 1 new task-callback test)
  • Lint clean

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/notifications/test_webhook_send_event.py (1)

188-188: 💤 Low value

Consider a less fragile assertion for callback presence.

The test relies on the exact string "send_event_background.<locals>.<lambda>" appearing in the task's repr, which depends on Python's internal naming conventions. This assertion could break if the function is renamed, the lambda is moved, or Python's repr format changes.

While this approach works for now, a more robust alternative might be to check the task's _callbacks attribute directly (though still implementation-dependent), or to accept that verifying done-callback registration is inherently fragile and document the limitation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/notifications/test_webhook_send_event.py` at line 188, The assertion
that the done-callback exists is fragile because it depends on the lambda's
repr; instead modify the test around send_event_background and captured_tasks to
check callback registration more robustly by inspecting the Task object's
_callbacks (or documenting the limitation): access captured_tasks[0] (the
asyncio.Task returned by send_event_background) and assert that any(callback for
callback in task._callbacks if getattr(callback, "__name__", "") or
repr(callback) matches a looser pattern) — i.e., replace the exact string check
for "send_event_background.<locals>.<lambda>" with a check that task._callbacks
is non-empty or contains a callable whose repr/name includes
"send_event_background" or "lambda", so the test no longer relies on Python's
precise repr format.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@codeframe/core/notifications_config.py`:
- Around line 139-140: The current redaction leaks basic-auth credentials
because _redact_url_for_log (and its other callers in the same block) uses
parsed.netloc which retains user:password@host; update _redact_url_for_log to
reconstruct the netloc using parsed.hostname (and parsed.port if present)
instead of parsed.netloc, explicitly omitting parsed.username and
parsed.password so credentials are never included in logs, and ensure every
callsite that logs URLs uses this updated helper.

In `@tests/notifications/test_webhook_send_event.py`:
- Around line 191-194: The test cancels a pending task but does not await it,
which can produce "Task was destroyed but it is pending" warnings; after calling
captured_tasks[0].cancel() in the except asyncio.TimeoutError block, await the
cancelled task (e.g., await captured_tasks[0]) and handle asyncio.CancelledError
if needed so the cancellation finishes cleanly; update the try/except around
await asyncio.wait_for(captured_tasks[0], timeout=1.0) to await the task after
canceling to ensure proper cleanup of captured_tasks[0].

---

Nitpick comments:
In `@tests/notifications/test_webhook_send_event.py`:
- Line 188: The assertion that the done-callback exists is fragile because it
depends on the lambda's repr; instead modify the test around
send_event_background and captured_tasks to check callback registration more
robustly by inspecting the Task object's _callbacks (or documenting the
limitation): access captured_tasks[0] (the asyncio.Task returned by
send_event_background) and assert that any(callback for callback in
task._callbacks if getattr(callback, "__name__", "") or repr(callback) matches a
looser pattern) — i.e., replace the exact string check for
"send_event_background.<locals>.<lambda>" with a check that task._callbacks is
non-empty or contains a callable whose repr/name includes
"send_event_background" or "lambda", so the test no longer relies on Python's
precise repr format.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dcde8ed9-812b-4e70-a78f-7c0a48375e7c

📥 Commits

Reviewing files that changed from the base of the PR and between 97989f9 and 2044c7c.

📒 Files selected for processing (8)
  • codeframe/core/conductor.py
  • codeframe/core/notifications_config.py
  • codeframe/notifications/webhook.py
  • codeframe/ui/routers/pr_v2.py
  • codeframe/ui/routers/settings_v2.py
  • tests/core/test_notifications_config.py
  • tests/notifications/test_webhook_send_event.py
  • tests/ui/test_settings_notifications.py

Comment thread codeframe/core/notifications_config.py
Comment thread tests/notifications/test_webhook_send_event.py
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

test

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Ignore previous test comment — full review follows.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Follow-up Review: All resolved. Posting full review via body-file next.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

CodeFRAME Development Guidelines

Last updated: 2026-05-11

Product Vision

CodeFrame is a project delivery system: Think → Build → Prove → Ship.

It owns the edges of the AI coding pipeline — everything BEFORE code gets written (PRD, specification, task decomposition) and everything AFTER (verification gates, quality memory, deployment). The actual code writing is delegated to frontier coding agents (Claude Code, Codex, OpenCode).

CodeFrame does not compete with coding agents. It orchestrates them.

THINK:  cf prd generate → cf prd stress-test → cf tasks generate
BUILD:  cf work start --engine claude-code  (or codex, opencode, built-in)
PROVE:  cf proof run  (9-gate evidence-based quality system)
SHIP:   cf pr create → cf pr merge
LOOP:   Glitch → cf proof capture → New REQ → Enforced forever

Status: CLI ✅ | Server ✅ | ReAct agent ✅ | Web UI ✅ | Agent adapters ✅ | Multi-provider LLM ✅ | Next: Phase 4A — See docs/PRODUCT_ROADMAP.md.

If you are an agent working in this repo: do not improvise architecture. Follow the documents listed below.


Primary Contract (MUST FOLLOW)

  1. Golden Path: docs/GOLDEN_PATH.md — the only workflow we build until it works end-to-end
  2. Command Tree + Module Mapping: docs/CLI_WIREFRAME.md — CLI commands → core modules
  3. Product Roadmap: docs/PRODUCT_ROADMAP.md — current phase plan (Phase 3.5/4/5)
  4. Vision: docs/VISION.md — north star for all decisions
  5. Agent System Reference: docs/AGENT_SYSTEM_REFERENCE.md — agent components, execution flows

Rule 0: If a change does not directly support the Think → Build → Prove → Ship pipeline, do not implement it.

Current Focus: Phase 4A

Phase 5.3 (browser + in-app center) is complete — Async notifications: useNotifications hook with workspace-scoped localStorage persistence and browser Notification dispatch (only when tab hidden + permission granted); NotificationProvider in root layout; NotificationCenter (bell icon + dropdown) mounts in sidebar footer. BatchExecutionMonitor dispatches batch.completed on terminal status transitions (distinguishing COMPLETED/FAILED/CANCELLED in both the in-app message and the success icon) and blocker.created on per-task BLOCKED transitions. /execution requests browser permission once on mount when permission is 'default'. /proof dispatches gate.run.failed per failed gate when a proof run completes with passed === false. Known limitation: notifications only fire while BatchExecutionMonitor is mounted (cross-page background poller is out of scope; tracked for future work). Webhook (#560) deferred. Issue #559.

Phase 5.2 is complete — Costs page now ships per-task and per-agent breakdowns (#558) on top of the spend summary (#557). Backend: GET /api/v2/costs/tasks?days=N&limit=M (top-N tasks with titles, agent, tokens, cost) and GET /api/v2/costs/by-agent?days=N (per-agent rollup + total input/output tokens), both via TokenRepository.get_top_tasks_by_cost and get_costs_by_agent. Task board cards show an inline MoneyBag02Icon cost badge with token-breakdown tooltip when cost data exists. Fixed a v2 data-loss bug where react_agent int-cast UUID task IDs and stored NULL in token_usage.

Phase 5.1 is complete — Settings page now ships three working tabs: Agent (#554), API Keys (#555), and PROOF9 Defaults + Workspace Config (#556). Backend: GET/PUT /api/v2/proof/config and /api/v2/workspaces/config, plus run_proof() now honors enabled_gates filtering and strictness (strict vs warn). Atomic JSON writes via codeframe/ui/routers/_helpers.atomic_write_json. The 9-gate canonical order and proof_config.json filename live in codeframe/core/proof/models.py.

Phase 3.5C is completeCaptureGlitchModal form (description/markdown, source, scope, gate obligations, severity, expiry) reachable from the PROOF9 page and the persistent sidebar "Capture Glitch" button. REQ detail view (/proof/[req_id]) ships markdown description rendering, ProofScope metadata display, obligations table with Latest Run column, sortable/filterable evidence history, and empty-state CTA. Backend: ScopeOut model on RequirementResponse. Issues #568, #569.

Next, in order:

See docs/PRODUCT_ROADMAP.md for full specs and issue links.


Architecture Rules (non-negotiable)

1) Core must be headless

codeframe/core/** must NOT import FastAPI, WebSocket frameworks, HTTP request/response objects, or UI modules.

Core is allowed to: read/write durable state (SQLite/filesystem), run orchestration/worker loops, emit events to an append-only event log, call adapters via interfaces (LLM, git, fs).

2) CLI must not require a server

Golden Path commands must work from the CLI with no server running. FastAPI is optional, started explicitly via codeframe serve, and must wrap core.

3) Agent state transitions flow through runtime

  • Agent (agent.py) manages its own AgentState (IDLE, PLANNING, EXECUTING, BLOCKED, COMPLETED, FAILED)
  • Runtime (runtime.py) handles all TaskStatus transitions (BACKLOG, READY, IN_PROGRESS, DONE, BLOCKED)
  • Agent does NOT call tasks.update_status() — runtime does this based on agent state

This separation prevents duplicate state transitions (e.g., DONE→DONE errors).

4) Legacy can be read, not depended on

server/ is reference only. Do NOT import legacy UI/server modules into core.

5) Keep commits runnable

At all times: codeframe --help works, Golden Path stubs can run, no breaking renames/moves.


Current State

v2 Architecture

  • Core-first: Domain logic lives in codeframe/core/ (headless, no FastAPI imports)
  • CLI-first: Golden Path works without any running FastAPI server
  • Adapters: LLM providers in codeframe/adapters/llm/
  • Server/UI optional: FastAPI and UI are thin adapters over core; web UI connects via REST/WebSocket
  • server/ contains v1 code retained as reference only; do not build toward v1 patterns

Phase 3 Web UI (actively developed — not legacy)

Next.js 16 App Router, TypeScript, Shadcn/UI, Tailwind CSS, Hugeicons, XTerm.js, WebSocket + SSE.

Shipped pages: /, /prd, /tasks, /execution, /execution/[taskId], /blockers, /proof, /proof/[req_id], /review, /sessions, /sessions/[id], /settings.

Testing: cd web-ui && npm test must pass; npm run build must succeed. The frontend-tests CI job enforces this on every PR.

What's implemented

Full feature list in docs/PRODUCT_ROADMAP.md. Key capabilities: ReAct agent execution, batch execution (serial/parallel/auto), task dependencies, stall detection, self-correction, GitHub PR workflow, SSE streaming, API auth, rate limiting, OpenAPI docs, multi-provider LLM (Anthropic/OpenAI-compatible), agent adapters (ClaudeCode/Codex/OpenCode/Kilocode), worktree isolation, E2B cloud execution, interactive agent sessions (WebSocket chat + XTerm.js terminal), PROOF9 quality system (gate runs, per-gate evidence, run history).


Repository Structure

codeframe/
├── core/           # Headless domain + orchestration (NO FastAPI imports)
│   ├── react_agent.py, tools.py, editor.py   # ReAct engine (default)
│   ├── agent.py, planner.py, executor.py     # Plan engine (legacy --engine plan)
│   ├── runtime.py                            # Run lifecycle, engine selection
│   ├── conductor.py                          # Batch orchestration + worker pool
│   ├── dependency_graph.py, dependency_analyzer.py
│   ├── gates.py, fix_tracker.py, quick_fixes.py  # Verification + self-correction
│   ├── stall_detector.py, stall_monitor.py   # Stall detection
│   ├── tasks.py, blockers.py, prd.py, workspace.py
│   ├── context.py, state_machine.py, events.py, streaming.py
│   ├── environment.py, installer.py, diagnostics.py, diagnostic_agent.py
│   ├── credentials.py, agents_config.py
│   └── sandbox/context.py, sandbox/worktree.py   # Isolation abstractions
├── adapters/
│   ├── llm/base.py, llm/anthropic.py, llm/openai.py, llm/mock.py
│   └── e2b/        # Cloud sandbox (optional: pip install codeframe[cloud])
├── cli/app.py      # Typer CLI entry + subcommands
├── ui/             # FastAPI server (thin adapter over core)
│   ├── server.py, models.py, dependencies.py
│   └── routers/    # 16 v2 router modules
├── auth/           # API key service + auth dependencies
├── lib/            # rate_limiter.py, audit_logger.py
└── server/         # Legacy v1 (reference only)

web-ui/             # Phase 3 Web UI (Next.js, actively developed)
tests/
├── core/           # Core module tests (auto-marked v2)
├── adapters/       # LLM + E2B adapter tests
├── agents/         # Worker agent tests
├── integration/    # Cross-module integration tests
├── lifecycle/      # End-to-end lifecycle tests (CLI + API + web, uses MockProvider)
└── ui/             # FastAPI router tests

Commands

Python / CLI

uv run pytest                     # All tests
uv run pytest -m v2               # v2 tests only
uv run pytest tests/core/         # Core module tests
uv run pytest tests/lifecycle/    # Lifecycle tests (no live API calls — uses MockProvider)
uv run ruff check .

# Web UI
cd web-ui && npm test
cd web-ui && npm run build

Golden Path CLI

# Workspace
cf init <repo> [--detect | --tech-stack "..." | --tech-stack-interactive]
cf status

# PRD
cf prd add <file.md>
cf prd show

# Tasks
cf tasks generate
cf tasks list [--status READY]
cf tasks show <id>

# Work — single task
cf work start <task-id> [--execute] [--engine react|plan] [--verbose] [--dry-run]
cf work start <task-id> --execute --stall-timeout 120 --stall-action retry|blocker|fail
cf work start <task-id> --execute --llm-provider openai --llm-model gpt-4o
cf work stop <task-id>
cf work resume <task-id>
cf work follow <task-id> [--tail 50]
cf work diagnose <task-id>

# Work — batch
cf work batch run [<id>...] [--all-ready] [--engine react|plan]
cf work batch run --strategy serial|parallel|auto [--max-parallel 4] [--retry 3]
cf work batch run --all-ready --llm-provider openai --llm-model qwen2.5-coder:7b
cf work batch status|cancel|resume [batch_id]

# Blockers
cf blocker list
cf blocker show <id>
cf blocker answer <id> "answer"

# Quality / State
cf review && cf patch export && cf commit
cf checkpoint create|list|restore
cf summary

# Environment
cf env check|install|doctor

# GitHub PR
cf pr create|status|checks|merge

Note: codeframe serve exists but Golden Path does not depend on it.


What NOT to do

  • Don't add HTTP endpoints to support CLI commands (CLI must work without a server)
  • Don't require codeframe serve for CLI workflows
  • Don't implement UI concepts (tabs, panels, progress bars) inside codeframe/core/
  • Don't "clean up the repo" as a goal — only refactor to enable the pipeline
  • Don't update task status from agent.py — let runtime.py handle transitions
  • Don't skip web UI testing when verifying features that have a web surface
  • Don't leave a CI gate disabled when its feature area becomes active. Re-enable DISABLED: / # COMMENTED OUT: jobs before the first PR in that area. Verify frontend-tests is wired into test-summary.

Testing / Demoing

Quality check (covers both backend and web UI)

uv run pytest && uv run ruff check .
cd web-ui && npm test && npm run build

New v2 tests: add @pytest.mark.v2 or pytestmark = pytest.mark.v2 at module level.

Demoing against a sample project (e.g., cf-test/)

You are observing the CodeFRAME agent's work, not doing the work yourself.

  • Do NOT help out, fix errors, or write code on behalf of the agent
  • Do NOT intervene when the agent makes mistakes — that's data
  • Report what worked, what failed, final state vs. acceptance criteria

Practical Working Mode

  1. Read docs/GOLDEN_PATH.md — confirm the change is required
  2. Find the command in docs/CLI_WIREFRAME.md
  3. Implement core functionality in codeframe/core/
  4. Call it from Typer command in codeframe/cli/
  5. Emit events + persist state
  6. Keep it runnable. Commit.

When unsure: simpler state, fewer dependencies, smaller surface area, core-first, CLI-first.


Environment Variables

ANTHROPIC_API_KEY=sk-ant-...          # Required for Anthropic provider (default)
E2B_API_KEY=e2b_...                   # Required for --engine cloud
DATABASE_PATH=./codeframe.db          # Optional

# LLM Provider selection (multi-provider support)
# Priority: CLI flag > env var > .codeframe/config.yaml > default (anthropic)
CODEFRAME_LLM_PROVIDER=anthropic      # Provider: anthropic (default), openai, ollama, vllm, compatible
CODEFRAME_LLM_MODEL=gpt-4o            # Model override (used with openai/ollama/vllm/compatible)
OPENAI_API_KEY=sk-...                 # Required for openai provider; not needed for local providers
OPENAI_BASE_URL=http://localhost:11434/v1  # Base URL override (for ollama, vllm, or custom endpoints)
# Per-workspace config: .codeframe/config.yaml supports llm: block
# llm:
#   provider: openai
#   model: qwen2.5-coder:7b
#   base_url: http://localhost:11434/v1   # optional, for local models

# Optional — Rate limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_DEFAULT=100/minute
RATE_LIMIT_AUTH=10/minute
RATE_LIMIT_AI=20/minute
REDIS_URL=redis://localhost:6379

CODEFRAME_API_KEY_SECRET=<secret>     # API key hashing

Documentation

Doc Purpose
docs/VISION.md North star: Think → Build → Prove → Ship thesis
docs/PRODUCT_ROADMAP.md Current roadmap — Phase 3.5/4/5 web product completeness
docs/GOLDEN_PATH.md CLI-first workflow contract
docs/CLI_WIREFRAME.md Command → module mapping
docs/AGENT_SYSTEM_REFERENCE.md Component table, model selection, execution flows, self-correction
docs/REACT_AGENT_ARCHITECTURE.md ReAct deep-dive: tools, editor, token management
docs/PHASE_3_UI_ARCHITECTURE.md Web UI architecture (Next.js, pages, components)
docs/PHASE_2_DEVELOPER_GUIDE.md Server layer + v2 router patterns
docs/PHASE_2_CLI_API_MAPPING.md CLI to API endpoint mapping
docs/QUICKSTART.md User-facing quickstart guide

Archived (completed plans, old gap analyses): docs/archive/

Legacy (v1 reference only): SPRINTS.md, sprints/, specs/, CODEFRAME_SPEC.md

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

[PLACEHOLDER - will be replaced]

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Follow-up Review - feat(notifications): outbound webhook

All four issues from my prior review are resolved. Here is a pass over the updated diff.

Resolved

  1. SSRF validation - Dual-layer defence is solid: _validate_webhook_url in the PUT endpoint raises HTTP 400 on non-http(s) schemes, and is_webhook_active re-validates at dispatch time via _is_safe_webhook_url. The _redact_url_for_log helper is a thoughtful addition - Slack/Zapier webhook URLs routinely embed signing tokens in the path, and test_unsafe_url_log_does_not_leak_path_or_query locks this in well.

  2. CLI sync context - send_event_background now spawns a daemon thread calling asyncio.run() when no event loop is running. Webhooks from cf work batch run and blocker creation will now actually fire from CLI. The thread is daemon so it never blocks process exit, and self.timeout caps how long it lives.

  3. pragma: no cover - Removed from both blockers.py and conductor.py. The exception handlers are covered by the failure_does_not_break and test_failure_in_dispatch_is_swallowed tests.

  4. Timestamp format - _utc_iso_now() now uses strftime and the test_timestamp_uses_z_suffix_not_offset test asserts both the Z suffix and the absence of +00:00.


Minor observations (non-blocking)

Duplicated scheme set - _ALLOWED_SCHEMES in notifications_config.py and _ALLOWED_WEBHOOK_SCHEMES in settings_v2.py both encode frozenset. This is the correct trade-off given the core-headless constraint, but a brief comment at one or both sites would prevent a future reader from consolidating them incorrectly.

/test endpoint ignores webhook_enabled - The test endpoint fires against a configured URL regardless of the enabled flag. This is the right UX call (validate the URL before toggling delivery on), and test_test_works_even_when_enabled_flag_is_false documents the intent. Worth keeping explicit in the endpoint docstring so it does not get fixed later.

Thread-count assertion in test_send_event_background_outside_loop_runs_in_thread - The threading.active_count() delta assertion is slightly fragile under parallel test runners where other threads may be concurrently active. Not a real concern for the current test setup, but if this test ever flakes the assertion could be relaxed to <= 1.


Overall

Ready to merge. The security handling (scheme validation, log redaction), the correct dual async/sync dispatch path, and the test breadth across all three dispatch sites are all well done. No blocking issues.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

[]

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

DELETING_TEST_COMMENT

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

CodeFRAME Development Guidelines

Last updated: 2026-05-11

Product Vision

CodeFrame is a project delivery system: Think → Build → Prove → Ship.

It owns the edges of the AI coding pipeline — everything BEFORE code gets written (PRD, specification, task decomposition) and everything AFTER (verification gates, quality memory, deployment). The actual code writing is delegated to frontier coding agents (Claude Code, Codex, OpenCode).

CodeFrame does not compete with coding agents. It orchestrates them.

THINK:  cf prd generate → cf prd stress-test → cf tasks generate
BUILD:  cf work start --engine claude-code  (or codex, opencode, built-in)
PROVE:  cf proof run  (9-gate evidence-based quality system)
SHIP:   cf pr create → cf pr merge
LOOP:   Glitch → cf proof capture → New REQ → Enforced forever

Status: CLI ✅ | Server ✅ | ReAct agent ✅ | Web UI ✅ | Agent adapters ✅ | Multi-provider LLM ✅ | Next: Phase 4A — See docs/PRODUCT_ROADMAP.md.

If you are an agent working in this repo: do not improvise architecture. Follow the documents listed below.


Primary Contract (MUST FOLLOW)

  1. Golden Path: docs/GOLDEN_PATH.md — the only workflow we build until it works end-to-end
  2. Command Tree + Module Mapping: docs/CLI_WIREFRAME.md — CLI commands → core modules
  3. Product Roadmap: docs/PRODUCT_ROADMAP.md — current phase plan (Phase 3.5/4/5)
  4. Vision: docs/VISION.md — north star for all decisions
  5. Agent System Reference: docs/AGENT_SYSTEM_REFERENCE.md — agent components, execution flows

Rule 0: If a change does not directly support the Think → Build → Prove → Ship pipeline, do not implement it.

Current Focus: Phase 4A

Phase 5.3 is complete — Async notifications cover both surfaces:

  • Browser + in-app center ([Phase 5.3] Async notifications: browser notifications + in-app center #559): useNotifications hook with workspace-scoped localStorage persistence and browser Notification dispatch (only when tab hidden + permission granted); NotificationProvider in root layout; NotificationCenter (bell icon + dropdown) mounts in sidebar footer. BatchExecutionMonitor dispatches batch.completed on terminal status transitions (distinguishing COMPLETED/FAILED/CANCELLED in both the in-app message and the success icon) and blocker.created on per-task BLOCKED transitions. /execution requests browser permission once on mount when permission is 'default'. /proof dispatches gate.run.failed per failed gate when a proof run completes with passed === false. Known limitation: notifications only fire while BatchExecutionMonitor is mounted (cross-page background poller is out of scope; tracked for future work).
  • Outbound webhook ([Phase 5.3] Async notifications: outbound webhook #560): Settings → Notifications tab takes a single URL + enabled toggle, persisted to .codeframe/notifications_config.json via atomic_write_json. GET/PUT /api/v2/settings/notifications and POST /api/v2/settings/notifications/test (test fires a sample payload and surfaces status code). WebhookNotificationService.send_event is the generic backend; dispatched fire-and-forget (5s timeout) from core/conductor.py on BATCH_COMPLETED only (not PARTIAL/FAILED/CANCELLED), core/blockers.py:create() after BLOCKER_CREATED, and ui/routers/pr_v2.py:merge_pull_request after successful merge. Failures are logged but never break the triggering operation.

Phase 5.2 is complete — Costs page now ships per-task and per-agent breakdowns (#558) on top of the spend summary (#557). Backend: GET /api/v2/costs/tasks?days=N&limit=M (top-N tasks with titles, agent, tokens, cost) and GET /api/v2/costs/by-agent?days=N (per-agent rollup + total input/output tokens), both via TokenRepository.get_top_tasks_by_cost and get_costs_by_agent. Task board cards show an inline MoneyBag02Icon cost badge with token-breakdown tooltip when cost data exists. Fixed a v2 data-loss bug where react_agent int-cast UUID task IDs and stored NULL in token_usage.

Phase 5.1 is complete — Settings page now ships three working tabs: Agent (#554), API Keys (#555), and PROOF9 Defaults + Workspace Config (#556). Backend: GET/PUT /api/v2/proof/config and /api/v2/workspaces/config, plus run_proof() now honors enabled_gates filtering and strictness (strict vs warn). Atomic JSON writes via codeframe/ui/routers/_helpers.atomic_write_json. The 9-gate canonical order and proof_config.json filename live in codeframe/core/proof/models.py.

Phase 3.5C is completeCaptureGlitchModal form (description/markdown, source, scope, gate obligations, severity, expiry) reachable from the PROOF9 page and the persistent sidebar "Capture Glitch" button. REQ detail view (/proof/[req_id]) ships markdown description rendering, ProofScope metadata display, obligations table with Latest Run column, sortable/filterable evidence history, and empty-state CTA. Backend: ScopeOut model on RequirementResponse. Issues #568, #569.

Next, in order:

See docs/PRODUCT_ROADMAP.md for full specs and issue links.


Architecture Rules (non-negotiable)

1) Core must be headless

codeframe/core/** must NOT import FastAPI, WebSocket frameworks, HTTP request/response objects, or UI modules.

Core is allowed to: read/write durable state (SQLite/filesystem), run orchestration/worker loops, emit events to an append-only event log, call adapters via interfaces (LLM, git, fs).

2) CLI must not require a server

Golden Path commands must work from the CLI with no server running. FastAPI is optional, started explicitly via codeframe serve, and must wrap core.

3) Agent state transitions flow through runtime

  • Agent (agent.py) manages its own AgentState (IDLE, PLANNING, EXECUTING, BLOCKED, COMPLETED, FAILED)
  • Runtime (runtime.py) handles all TaskStatus transitions (BACKLOG, READY, IN_PROGRESS, DONE, BLOCKED)
  • Agent does NOT call tasks.update_status() — runtime does this based on agent state

This separation prevents duplicate state transitions (e.g., DONE→DONE errors).

4) Legacy can be read, not depended on

server/ is reference only. Do NOT import legacy UI/server modules into core.

5) Keep commits runnable

At all times: codeframe --help works, Golden Path stubs can run, no breaking renames/moves.


Current State

v2 Architecture

  • Core-first: Domain logic lives in codeframe/core/ (headless, no FastAPI imports)
  • CLI-first: Golden Path works without any running FastAPI server
  • Adapters: LLM providers in codeframe/adapters/llm/
  • Server/UI optional: FastAPI and UI are thin adapters over core; web UI connects via REST/WebSocket
  • server/ contains v1 code retained as reference only; do not build toward v1 patterns

Phase 3 Web UI (actively developed — not legacy)

Next.js 16 App Router, TypeScript, Shadcn/UI, Tailwind CSS, Hugeicons, XTerm.js, WebSocket + SSE.

Shipped pages: /, /prd, /tasks, /execution, /execution/[taskId], /blockers, /proof, /proof/[req_id], /review, /sessions, /sessions/[id], /settings.

Testing: cd web-ui && npm test must pass; npm run build must succeed. The frontend-tests CI job enforces this on every PR.

What's implemented

Full feature list in docs/PRODUCT_ROADMAP.md. Key capabilities: ReAct agent execution, batch execution (serial/parallel/auto), task dependencies, stall detection, self-correction, GitHub PR workflow, SSE streaming, API auth, rate limiting, OpenAPI docs, multi-provider LLM (Anthropic/OpenAI-compatible), agent adapters (ClaudeCode/Codex/OpenCode/Kilocode), worktree isolation, E2B cloud execution, interactive agent sessions (WebSocket chat + XTerm.js terminal), PROOF9 quality system (gate runs, per-gate evidence, run history).


Repository Structure

codeframe/
├── core/           # Headless domain + orchestration (NO FastAPI imports)
│   ├── react_agent.py, tools.py, editor.py   # ReAct engine (default)
│   ├── agent.py, planner.py, executor.py     # Plan engine (legacy --engine plan)
│   ├── runtime.py                            # Run lifecycle, engine selection
│   ├── conductor.py                          # Batch orchestration + worker pool
│   ├── dependency_graph.py, dependency_analyzer.py
│   ├── gates.py, fix_tracker.py, quick_fixes.py  # Verification + self-correction
│   ├── stall_detector.py, stall_monitor.py   # Stall detection
│   ├── tasks.py, blockers.py, prd.py, workspace.py
│   ├── context.py, state_machine.py, events.py, streaming.py
│   ├── environment.py, installer.py, diagnostics.py, diagnostic_agent.py
│   ├── credentials.py, agents_config.py
│   └── sandbox/context.py, sandbox/worktree.py   # Isolation abstractions
├── adapters/
│   ├── llm/base.py, llm/anthropic.py, llm/openai.py, llm/mock.py
│   └── e2b/        # Cloud sandbox (optional: pip install codeframe[cloud])
├── cli/app.py      # Typer CLI entry + subcommands
├── ui/             # FastAPI server (thin adapter over core)
│   ├── server.py, models.py, dependencies.py
│   └── routers/    # 16 v2 router modules
├── auth/           # API key service + auth dependencies
├── lib/            # rate_limiter.py, audit_logger.py
└── server/         # Legacy v1 (reference only)

web-ui/             # Phase 3 Web UI (Next.js, actively developed)
tests/
├── core/           # Core module tests (auto-marked v2)
├── adapters/       # LLM + E2B adapter tests
├── agents/         # Worker agent tests
├── integration/    # Cross-module integration tests
├── lifecycle/      # End-to-end lifecycle tests (CLI + API + web, uses MockProvider)
└── ui/             # FastAPI router tests

Commands

Python / CLI

uv run pytest                     # All tests
uv run pytest -m v2               # v2 tests only
uv run pytest tests/core/         # Core module tests
uv run pytest tests/lifecycle/    # Lifecycle tests (no live API calls — uses MockProvider)
uv run ruff check .

# Web UI
cd web-ui && npm test
cd web-ui && npm run build

Golden Path CLI

# Workspace
cf init <repo> [--detect | --tech-stack "..." | --tech-stack-interactive]
cf status

# PRD
cf prd add <file.md>
cf prd show

# Tasks
cf tasks generate
cf tasks list [--status READY]
cf tasks show <id>

# Work — single task
cf work start <task-id> [--execute] [--engine react|plan] [--verbose] [--dry-run]
cf work start <task-id> --execute --stall-timeout 120 --stall-action retry|blocker|fail
cf work start <task-id> --execute --llm-provider openai --llm-model gpt-4o
cf work stop <task-id>
cf work resume <task-id>
cf work follow <task-id> [--tail 50]
cf work diagnose <task-id>

# Work — batch
cf work batch run [<id>...] [--all-ready] [--engine react|plan]
cf work batch run --strategy serial|parallel|auto [--max-parallel 4] [--retry 3]
cf work batch run --all-ready --llm-provider openai --llm-model qwen2.5-coder:7b
cf work batch status|cancel|resume [batch_id]

# Blockers
cf blocker list
cf blocker show <id>
cf blocker answer <id> "answer"

# Quality / State
cf review && cf patch export && cf commit
cf checkpoint create|list|restore
cf summary

# Environment
cf env check|install|doctor

# GitHub PR
cf pr create|status|checks|merge

Note: codeframe serve exists but Golden Path does not depend on it.


What NOT to do

  • Don't add HTTP endpoints to support CLI commands (CLI must work without a server)
  • Don't require codeframe serve for CLI workflows
  • Don't implement UI concepts (tabs, panels, progress bars) inside codeframe/core/
  • Don't "clean up the repo" as a goal — only refactor to enable the pipeline
  • Don't update task status from agent.py — let runtime.py handle transitions
  • Don't skip web UI testing when verifying features that have a web surface
  • Don't leave a CI gate disabled when its feature area becomes active. Re-enable DISABLED: / # COMMENTED OUT: jobs before the first PR in that area. Verify frontend-tests is wired into test-summary.

Testing / Demoing

Quality check (covers both backend and web UI)

uv run pytest && uv run ruff check .
cd web-ui && npm test && npm run build

New v2 tests: add @pytest.mark.v2 or pytestmark = pytest.mark.v2 at module level.

Demoing against a sample project (e.g., cf-test/)

You are observing the CodeFRAME agent's work, not doing the work yourself.

  • Do NOT help out, fix errors, or write code on behalf of the agent
  • Do NOT intervene when the agent makes mistakes — that's data
  • Report what worked, what failed, final state vs. acceptance criteria

Practical Working Mode

  1. Read docs/GOLDEN_PATH.md — confirm the change is required
  2. Find the command in docs/CLI_WIREFRAME.md
  3. Implement core functionality in codeframe/core/
  4. Call it from Typer command in codeframe/cli/
  5. Emit events + persist state
  6. Keep it runnable. Commit.

When unsure: simpler state, fewer dependencies, smaller surface area, core-first, CLI-first.


Environment Variables

ANTHROPIC_API_KEY=sk-ant-...          # Required for Anthropic provider (default)
E2B_API_KEY=e2b_...                   # Required for --engine cloud
DATABASE_PATH=./codeframe.db          # Optional

# LLM Provider selection (multi-provider support)
# Priority: CLI flag > env var > .codeframe/config.yaml > default (anthropic)
CODEFRAME_LLM_PROVIDER=anthropic      # Provider: anthropic (default), openai, ollama, vllm, compatible
CODEFRAME_LLM_MODEL=gpt-4o            # Model override (used with openai/ollama/vllm/compatible)
OPENAI_API_KEY=sk-...                 # Required for openai provider; not needed for local providers
OPENAI_BASE_URL=http://localhost:11434/v1  # Base URL override (for ollama, vllm, or custom endpoints)
# Per-workspace config: .codeframe/config.yaml supports llm: block
# llm:
#   provider: openai
#   model: qwen2.5-coder:7b
#   base_url: http://localhost:11434/v1   # optional, for local models

# Optional — Rate limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_DEFAULT=100/minute
RATE_LIMIT_AUTH=10/minute
RATE_LIMIT_AI=20/minute
REDIS_URL=redis://localhost:6379

CODEFRAME_API_KEY_SECRET=<secret>     # API key hashing

Documentation

Doc Purpose
docs/VISION.md North star: Think → Build → Prove → Ship thesis
docs/PRODUCT_ROADMAP.md Current roadmap — Phase 3.5/4/5 web product completeness
docs/GOLDEN_PATH.md CLI-first workflow contract
docs/CLI_WIREFRAME.md Command → module mapping
docs/AGENT_SYSTEM_REFERENCE.md Component table, model selection, execution flows, self-correction
docs/REACT_AGENT_ARCHITECTURE.md ReAct deep-dive: tools, editor, token management
docs/PHASE_3_UI_ARCHITECTURE.md Web UI architecture (Next.js, pages, components)
docs/PHASE_2_DEVELOPER_GUIDE.md Server layer + v2 router patterns
docs/PHASE_2_CLI_API_MAPPING.md CLI to API endpoint mapping
docs/QUICKSTART.md User-facing quickstart guide

Archived (completed plans, old gap analyses): docs/archive/

Legacy (v1 reference only): SPRINTS.md, sprints/, specs/, CODEFRAME_SPEC.md

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

CodeFRAME v2 — Golden Path Contract (CLI-first)

This document is the contract for CodeFRAME v2 development.

Rule 0 (the only rule that matters):

If a change does not directly support the Golden Path flow below, do not implement it.

This applies to both humans and agentic coding assistants.


Goals

What "done" looks like (Enhanced MVP definition)

CodeFRAME can run a complete end-to-end AI-driven development workflow from the CLI on a target repo:

  1. Initialize workspace with project discovery

    • Analyze codebase and detect tech stack
    • Configure environment and tooling automatically
    • Create durable state storage
  2. AI-driven PRD generation and refinement

    • Interactive AI session gathers project requirements
    • AI asks follow-up questions about scope, users, constraints
    • Generates comprehensive PRD + technical specs + user stories
    • Iterative refinement based on user feedback
  3. Intelligent task generation with dependency analysis

    • Decompose PRD into actionable tasks with dependencies
    • Prioritize tasks and group by functionality
    • Generate implementation strategies per task
  4. Batch task execution with orchestration

    • Execute multiple tasks in sequence or parallel
    • Handle inter-task dependencies automatically
    • Main agent coordinates entire batch workflow
    • Real-time progress monitoring and event streaming
  5. Human-in-the-loop blocker resolution

    • Interactive blocker handling with contextual AI suggestions
    • Resume execution after blocker resolution
    • Learning from blocker patterns
  6. Integrated Git workflow and PR management

    • Automatic branch creation per task/batch
    • AI-generated commit messages and PR descriptions
    • Automated verification gate execution
    • PR creation, review, and merging workflows
  7. Comprehensive checkpointing and state management

    • Snapshots of workspace state with git refs
    • Resume interrupted workflows from checkpoints
    • Multi-environment state isolation

No UI is required.
A FastAPI server is not required for the Golden Path to work.
All Git operations are integrated into the CLI workflow.


Non-Goals (explicitly forbidden until Golden Path works)

Do not build or refactor:

  • Web UI / dashboard features
  • Settings pages, preferences, themes
  • Multi-provider/model switching UI or complex provider management
  • Advanced metrics dashboards or timeseries endpoints
  • Auth / sessions for remote users
  • Electron desktop app
  • Plugin marketplace / extensibility frameworks
  • “Perfect” project structure, monorepo tooling, or build system redesign
  • Large migrations or renames that aren’t required by Golden Path

These may be revisited only after Golden Path is working and stable.


Golden Path CLI Flow (the only flow that matters)

0) Preconditions

  • A target repo exists (any small test repo is fine).
  • CodeFRAME runs locally and can store durable state (SQLite or filesystem).
  • The CLI can be run from anywhere.

1) Initialize a workspace

Command:

  • codeframe init <path-to-repo>

Required behavior:

  • Registers the repo as a workspace.
  • Creates/updates durable state storage.
  • Prints a short workspace summary (repo path, workspace id, state location).

Artifacts:

  • Local state created (DB/file), e.g. .codeframe/ and/or codeframe.db.

2) AI-driven PRD generation and refinement

Commands:

  • codeframe prd generate (primary - interactive AI session)
  • codeframe prd add <file.md> (secondary - existing file support)
  • codeframe prd refine (iterative improvement)

Required behavior for prd generate:

  • AI conducts interactive discovery session asking:
    • Project scope, objectives, and success criteria
    • Target users, use cases, and user stories
    • Technical constraints, preferences, and requirements
    • Timeline, priorities, and MVP boundaries
  • Generates comprehensive PRD with:
    • Executive summary and problem statement
    • Functional requirements with acceptance criteria
    • Technical specifications and architecture guidance
    • User stories with priority ranking
    • Success metrics and validation criteria
  • Provides iterative refinement based on user feedback
  • Stores PRD in durable state with versioning
  • Supports multiple PRD versions with change tracking

3) Intelligent task generation with dependency analysis

Commands:

  • codeframe tasks generate (enhanced with dependencies)
  • codeframe tasks analyze (dependency graph analysis)

Required behavior:

  • Decomposes PRD into granular, actionable tasks
  • Automatically detects and assigns task dependencies
  • Estimates effort and complexity for each task
  • Groups related tasks into logical workstreams
  • Prioritizes tasks based on dependencies and value delivery
  • Supports task templates for common patterns (setup, implementation, testing, deployment)
  • Generates implementation strategy per task (files to modify, approaches to consider)
  • Creates task dependency graph with critical path identification

4) Batch task execution with orchestration

Commands:

  • codeframe work batch run (primary - main execution pathway)
  • codeframe work start <task-id> (secondary - single task fallback)
  • codeframe work batch status <batch-id> (monitoring)
  • codeframe work batch follow <batch-id> (real-time streaming)

Required behavior for batch execution:

  • Executes multiple tasks with intelligent scheduling:
    • Serial execution for dependent tasks
    • Parallel execution for independent tasks
    • Auto-strategy using dependency graph analysis
  • Main orchestrator agent coordinates entire batch:
    • Resource allocation and task scheduling
    • Inter-task communication and data sharing
    • Failure handling and retry logic
    • Progress tracking and milestone reporting
  • Real-time event streaming with:
    • Task start/completion events
    • Progress indicators and ETAs
    • Blocker detection and notification
    • Dependency resolution updates
  • Supports execution strategies:
    • --strategy serial: Linear execution
    • --strategy parallel: Max parallelization
    • --strategy auto: AI-optimized based on dependencies

5) Enhanced human-in-loop blocker resolution

Commands:

  • codeframe blockers list (enhanced with context)
  • codeframe blocker answer <blocker-id> "<text>" (with AI suggestions)
  • codeframe blocker resolve <blocker-id> (automated resolution options)

Required behavior:

  • AI provides contextual blocker resolution suggestions:
    • Similar past blockers and their solutions
    • Multiple solution approaches with trade-offs
    • Impact analysis of resolution choices
  • Interactive blocker handling with:
    • Rich context display (related code, PRD sections, task dependencies)
    • Suggested responses ranked by confidence
    • Impact on task timeline and dependencies
  • Learning system that:
    • Records blocker patterns and resolutions
    • Improves future blocker handling suggestions
    • Reduces human intervention over time

6) Integrated Git workflow and PR management

Commands:

  • codeframe work start <task-id> --create-branch (branch management)
  • codeframe pr create (PR creation with AI descriptions)
  • codeframe pr list (PR status monitoring)
  • codeframe pr merge <pr-id> (PR merging with verification)

Required behavior:

  • Branch Management:
    • Automatic feature branch creation per task/batch
    • Branch naming conventions with task/batch IDs
    • Branch cleanup and organization utilities
    • Conflict detection and resolution assistance
  • PR Creation:
    • AI generates comprehensive PR descriptions:
      • Summary of changes and business impact
      • Technical implementation details
      • Testing performed and results
      • Breaking changes and migration notes
    • Automated PR labeling and categorization
    • Reviewer assignment based on code expertise
  • PR Workflow:
    • Automated gate execution before merge (tests, lint, security scans)
    • Integration with CI/CD pipelines
    • Merge strategies (squash, merge, rebase) based on team preferences
    • Post-merge cleanup and notification

7) Enhanced verification and quality gates

Commands:

  • codeframe review (comprehensive code review)
  • codeframe gates run (automated quality checks)
  • codeframe quality report (quality metrics and trends)

Required behavior:

  • Comprehensive Gate Suite:
    • Unit tests with coverage reporting
    • Integration and end-to-end tests
    • Static code analysis (lint, security, complexity)
    • Performance regression tests
    • Documentation and API specification validation
  • AI-Assisted Code Review:
    • Automated code quality assessment
    • Best practices compliance checking
    • Potential bug detection and suggestions
    • Code style and maintainability analysis
  • Quality Tracking:
    • Trend analysis of code quality metrics
    • Technical debt accumulation tracking
    • Gate failure pattern identification

8) Integrated artifact and commit management

Commands:

  • codeframe commit create -m "<message>" (AI-generated commits)
  • codeframe patch export (safe patch generation)
  • codeframe artifacts list (artifact tracking)

Required behavior:

  • Smart Commits:
    • AI generates meaningful commit messages:
      • Conventional commit format compliance
      • Contextual change descriptions
      • References to tasks/PRDs/issues
      • Breaking change highlights
    • Atomic commit boundaries and logical grouping
  • Artifact Management:
    • Automatic patch generation for safety
    • Commit linking to tasks and batches
    • Rollback points and recovery procedures
    • Integration with external artifact repositories

9) Comprehensive checkpointing and state management

Commands:

  • codeframe checkpoint create "<name>" (enhanced snapshots)
  • codeframe checkpoint restore <checkpoint-id> (workflow resume)
  • codeframe summary (comprehensive reporting)

Required behavior:

  • Rich Checkpoints:
    • Complete workspace state capture:
      • Task statuses and progress
      • Git refs and working directory state
      • PRD versions and requirements
      • Configuration and environment settings
    • Incremental checkpoint optimization
    • Cross-environment checkpoint portability
  • Workflow Resume:
    • Seamless resumption from any checkpoint
    • Context restoration for active agents
    • Branch and working directory restoration
    • Event log continuity and replay
  • Comprehensive Reporting:
    • Executive summaries with progress metrics
    • Detailed task completion reports
    • Quality gate performance tracking
    • Resource utilization and timing analysis
    • Risk assessment and mitigation recommendations

State Machine (authoritative)

Statuses:

  • BACKLOG - Task identified but not ready for execution
  • READY - Task prepared and ready to start
  • IN_PROGRESS - Task actively being worked on
  • BLOCKED - Task waiting for human input or external dependency
  • DONE - Task completed locally, ready for review/integration
  • IN_REVIEW - Task changes in PR review process
  • MERGED - Task changes integrated into main branch
  • FAILED - Task execution failed (can be retried)

Allowed transitions (comprehensive):

  • BACKLOG -> READY (task preparation complete)
  • READY -> IN_PROGRESS (work started)
  • IN_PROGRESS -> BLOCKED (awaiting input/dependency)
  • BLOCKED -> IN_PROGRESS (blocker resolved)
  • BLOCKED -> READY (returned to queue)
  • IN_PROGRESS -> DONE (local completion)
  • IN_PROGRESS -> FAILED (execution failure)
  • DONE -> IN_REVIEW (PR created/under review)
  • IN_REVIEW -> DONE (PR rejected, needs work)
  • IN_REVIEW -> MERGED (PR approved and merged)
  • DONE -> READY (reopened for additional work)
  • FAILED -> READY (retry after failure)
  • MERGED -> BACKLOG (reopened for enhancement)

The CLI is the authority for transitions.
UIs (web/electron) are views over this state machine, not the source of truth.

PR Workflow Integration:

  • Tasks automatically transition to IN_REVIEW when codeframe pr create is run
  • PR status changes trigger corresponding task state updates
  • Merge actions transition tasks to MERGED status
  • Failed or rejected PRs return tasks to DONE for additional work

Implementation Principles

Core-first (no FastAPI in the core)

  • Domain logic must live in a reusable core module/package.
  • Core must not import FastAPI, websockets, or HTTP request objects.
  • FastAPI server (if used) must be a thin adapter over core.

CLI-first (server optional)

  • Golden Path commands must work without any running backend server.
  • If a server exists, it may be started separately (codeframe serve) and must wrap core.

Salvage safely

  • Legacy code can be read and copied from.
  • Core must not take dependencies on legacy UI-driven modules.
  • Prefer copying useful functions into core and simplifying interfaces.

Keep it runnable

  • Every commit should keep codeframe --help working.
  • The Golden Path commands should remain executable even if stubs at first.

Acceptance Checklist (Enhanced MVP - must pass)

Status: 🔄 Enhanced MVP Partially Complete

📊 Current Implementation Status

Overall Assessment: Enhanced MVP is ~60% complete with solid foundation but critical gaps remaining.

✅ Fully Implemented Phases:

  • Phase 1: Basic PRD functionality (prd add) - Enhanced PRD generation missing
  • Phase 2: Core task generation with LLM support - Advanced dependency analysis incomplete
  • Phase 3: Complete batch execution framework - Orchestrator integration complete
  • Phase 4: Basic blocker management system - AI-powered suggestions missing
  • Phase 6: Basic verification gates (codeframe review) - AI-assisted review missing
  • Phase 7: Comprehensive checkpointing system - Incremental/batch features missing

⚠️ Critical Missing Components:

  • AI-driven PRD generation: No codeframe prd generate command
  • Credential management: No codeframe auth system - CRITICAL BLOCKER
  • Git/PR workflow: GitHub integration exists but no CLI commands
  • Environment validation: No pre-flight validation system
  • Advanced recovery: Limited rollback beyond full checkpoints
  • Enhanced monitoring: Basic event streaming, no rich debugging

🎯 Key Finding:

The single most critical issue is missing credential management - users cannot reliably use the enhanced workflow without it.

Foundation is solid - Core CLI functionality, batch execution, and basic Git integration work reliably.

Next priority: Implement credential management system as outlined in gap analysis documents.

Phase 1: AI-Driven Project Discovery & PRD Generation

  • codeframe init with auto tech stack detection and environment setup
    • Implementation: Auto tech stack detection with --detect flag
    • Implementation: Interactive tech stack configuration with --tech-stack-interactive
    • Note: Basic init works, enhanced features not yet integrated
  • codeframe prd generate conducts interactive AI discovery session
    • ⚠️ Status: Command not implemented - only codeframe prd add <file.md> exists
    • Note: Discovery exists in legacy codebase but not integrated into CLI
  • AI asks contextual follow-up questions about requirements and constraints
  • Generates comprehensive PRD with technical specs and user stories
  • Supports iterative PRD refinement based on user feedback
  • PRD versioning and change tracking

Phase 2: Intelligent Task Generation & Dependency Management

  • codeframe tasks generate creates dependency-aware task graphs
    • Implementation: Uses LLM for task generation with dependency analysis
    • Implementation: Supports both LLM and simple extraction modes
    • ⚠️ Status: Limited dependency graph functionality
    • Note: Basic task generation works, advanced dependency analysis incomplete
  • Automatic task prioritization and workstream grouping
  • Effort estimation and complexity analysis
  • Critical path identification and scheduling
  • Task template system for common implementation patterns

Phase 3: Batch Execution & Orchestration

  • codeframe work batch run as primary execution pathway
    • Implementation: Comprehensive batch execution with multiple strategies
    • Implementation: Serial, parallel, and auto-strategy execution modes
    • Implementation: Event streaming and progress monitoring
    • Implementation: Failure handling and retry logic
    • Implementation: Real-time status and batch monitoring commands
    • Note: Main batch functionality works, orchestrator integration complete
  • Serial, parallel, and auto-strategy execution modes
  • Real-time progress monitoring with event streaming
  • Inter-task dependency management and coordination
  • Main orchestrator agent manages entire batch workflow
  • Failure handling and automatic retry logic

Phase 4: Enhanced Human-in-the-Loop Blocker Resolution

  • Contextual blocker display with rich background information
    • Implementation: Comprehensive blocker management system
    • Implementation: Rich context display with codebase references
    • ⚠️ Status: AI-powered suggestions not yet implemented
    • Note: Basic blocker listing and answering works, AI suggestions missing
  • AI-powered blocker resolution suggestions
  • Learning system for blocker pattern recognition
  • Similar past blocker solutions and recommendations
  • Impact analysis for different resolution approaches

Phase 5: Integrated Git Workflow & PR Management

  • Automatic branch creation per task/batch with naming conventions
  • AI-generated comprehensive PR descriptions with business impact
  • Automated PR labeling and reviewer assignment
  • Integration with CI/CD pipelines and gate execution
  • Multiple merge strategies (squash, merge, rebase) support
  • Post-merge cleanup and notification automation
    • ⚠️ Status: Basic Git integration exists, PR creation incomplete
    • Note: GitHub integration module exists (codeframe/git/github_integration.py)
    • Note: Auth commands exist but credential management missing
    • Note: No CLI commands for PR creation/management yet implemented

Phase 6: Comprehensive Quality Gates & Verification

  • Expanded gate suite: unit tests, integration tests, security scans
    • Implementation: Basic codeframe review command exists
    • Implementation: Supports multiple gate types (pytest, ruff, mypy, npm)
    • ⚠️ Status: Limited gate functionality - stub implementation
    • Note: Only basic verification works, AI-assisted review not implemented
  • AI-assisted code review with best practices checking
  • Quality metrics tracking and trend analysis
  • Technical debt accumulation monitoring
  • Automated regression detection and prevention

Phase 7: Advanced Checkpointing & State Management

  • Rich checkpoint snapshots with complete workspace state
    • Implementation: Comprehensive checkpoint management system
    • Implementation: Checkpoint create, list, show, and restore commands
    • Implementation: Git reference integration for state tracking
    • ⚠️ Status: Basic checkpointing works, advanced features missing
    • Note: No incremental checkpointing during batch execution
  • Cross-environment checkpoint portability
  • Seamless workflow resumption from any checkpoint
  • Incremental checkpoint optimization
  • Executive reporting with progress and risk metrics

Cross-Cutting Requirements

  • All functionality works without FastAPI server running
    • Implementation: Core functionality works independently of server
    • Verification: CLI commands work without FastAPI dependency
    • ⚠️ Status: Server wrapper incomplete but not required for CLI workflow
  • No UI required at any point in workflow
    • Event logging and streaming for observability
    • Implementation: Comprehensive event system with rich logging
    • Implementation: Real-time event streaming during batch execution
    • ⚠️ Status: Advanced monitoring features missing
  • Comprehensive error handling and recovery procedures
    • ⚠️ Status: Basic error handling exists, advanced recovery missing
    • Note: No rollback capability beyond full checkpoints
  • Performance optimization for large repositories
    • Security best practices and credential management
    • Documentation and help commands for all new features
    • ⚠️ Status: No credential management system implemented
    • ⚠️ Critical Gap: Authentication failures would block entire workflow
    • Note: See gap analysis documents for detailed credential management plan

Definition of Done:

  • All acceptance criteria must be satisfied
  • End-to-end workflow tested on real project repositories
  • Performance benchmarks meet minimum standards
  • Security audit passes all compliance checks
  • Documentation is complete and accurate
  • User feedback collected from beta testing validates approach

Next phase: Production Readiness & Advanced Features (see roadmap planning).

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

test heading

Body text

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Follow-up Review — feat(notifications): outbound webhook (#560)

All four issues from my prior review are resolved. Here is a pass over the updated diff.

Resolved

1. SSRF validation — Dual-layer defence is solid: _validate_webhook_url in the PUT endpoint raises HTTP 400 on non-http(s) schemes, and is_webhook_active re-validates at dispatch time via _is_safe_webhook_url. The _redact_url_for_log helper is a thoughtful addition — Slack/Zapier webhook URLs routinely embed signing tokens in the path, and test_unsafe_url_log_does_not_leak_path_or_query locks this in well.

2. CLI sync contextsend_event_background now spawns a daemon thread calling asyncio.run() when no event loop is running. Webhooks from cf work batch run and blocker creation will now actually fire from CLI. The thread is daemon so it never blocks process exit, and self.timeout caps how long it lives.

3. pragma: no cover — Removed from both blockers.py and conductor.py. The exception handlers are covered by the *failure_does_not_break_* and test_failure_in_dispatch_is_swallowed tests.

4. Timestamp format_utc_iso_now() now uses strftime and the test_timestamp_uses_z_suffix_not_offset test asserts both the Z suffix and the absence of +00:00.


Minor observations (non-blocking)

Duplicated scheme set_ALLOWED_SCHEMES in notifications_config.py and _ALLOWED_WEBHOOK_SCHEMES in settings_v2.py both encode frozenset. This is the correct trade-off given the core-headless constraint (core cannot import from the router layer), but a brief comment at one or both sites would prevent a future reader from consolidating them incorrectly.

/test endpoint ignores webhook_enabled — The test endpoint fires against a configured URL regardless of the enabled flag. This is the right UX call (validate the URL before toggling delivery on), and test_test_works_even_when_enabled_flag_is_false documents the intent. Worth keeping explicit in the endpoint docstring so it does not get "fixed" later.

Thread-count assertion in test_send_event_background_outside_loop_runs_in_thread — The threading.active_count() delta assertion is slightly fragile under parallel test runners where other threads may be concurrently active. Not a real concern for the current test setup, but if this test ever flakes the assertion could be relaxed to <= 1.


Overall

Ready to merge. The security handling (scheme validation, log redaction), the correct dual async/sync dispatch path, and the test breadth across all three dispatch sites are all well done. No blocking issues.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Follow-up Review — feat(notifications): outbound webhook (#560)

All four issues from my prior review are resolved. Here is a pass over the updated diff.

Resolved

1. SSRF validation — Dual-layer defence is solid: _validate_webhook_url in the PUT endpoint raises HTTP 400 on non-http(s) schemes, and is_webhook_active re-validates at dispatch time via _is_safe_webhook_url. The _redact_url_for_log helper is a thoughtful addition — Slack/Zapier webhook URLs routinely embed signing tokens in the path, and test_unsafe_url_log_does_not_leak_path_or_query locks this in well.

2. CLI sync contextsend_event_background now spawns a daemon thread calling asyncio.run() when no event loop is running. Webhooks from cf work batch run and blocker creation will now actually fire from CLI. The thread is daemon so it never blocks process exit, and self.timeout caps how long it lives.

3. pragma: no cover — Removed from both blockers.py and conductor.py. The exception handlers are covered by the *_failure_does_not_break_* and test_failure_in_dispatch_is_swallowed tests.

4. Timestamp format_utc_iso_now() now uses strftime and the test_timestamp_uses_z_suffix_not_offset test asserts both the Z suffix and the absence of +00:00.


Minor observations (non-blocking)

Duplicated scheme set_ALLOWED_SCHEMES in notifications_config.py and _ALLOWED_WEBHOOK_SCHEMES in settings_v2.py both encode frozenset of http and https. This is the correct trade-off given the core-headless constraint (core cannot import from the router layer), but a brief comment at one or both sites would prevent a future reader from consolidating them incorrectly.

/test endpoint ignores webhook_enabled — The test endpoint fires against a configured URL regardless of the enabled flag. This is the right UX call (validate the URL before toggling delivery on), and test_test_works_even_when_enabled_flag_is_false documents the intent. Worth keeping explicit in the endpoint docstring so it does not get "fixed" later.

Thread-count assertion in test_send_event_background_outside_loop_runs_in_thread — The threading.active_count() delta assertion is slightly fragile under parallel test runners where other threads may be concurrently active. Not a real concern for the current test setup, but if this test ever flakes the assertion could be relaxed to <= 1.


Overall

Ready to merge. The security handling (scheme validation, log redaction), the correct dual async/sync dispatch path, and the test breadth across all three dispatch sites are all well done. No blocking issues.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

test with frozenset: frozenset({"http", "https"}) and strftime: strftime done

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Follow-up Review — feat(notifications): outbound webhook (#560)

All four issues from my prior review are resolved. Here is a pass over the updated diff.

Resolved

1. SSRF validation — Dual-layer defence is solid: _validate_webhook_url in the PUT endpoint raises HTTP 400 on non-http(s) schemes, and is_webhook_active re-validates at dispatch time via _is_safe_webhook_url. The _redact_url_for_log helper is a thoughtful addition — Slack/Zapier webhook URLs routinely embed signing tokens in the path, and test_unsafe_url_log_does_not_leak_path_or_query locks this in well.

2. CLI sync contextsend_event_background now spawns a daemon thread calling asyncio.run() when no event loop is running. Webhooks from cf work batch run and blocker creation will now actually fire from CLI. The thread is daemon so it never blocks process exit, and self.timeout caps how long it lives.

3. pragma: no cover — Removed from both blockers.py and conductor.py. The exception handlers are covered by the *_failure_does_not_break_* and test_failure_in_dispatch_is_swallowed tests.

4. Timestamp format_utc_iso_now() now uses `strftime('

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Follow-up Review — feat(notifications): outbound webhook (#560)

All four issues from my prior review are resolved. Here is a pass over the updated diff.

Resolved

1. SSRF validation — Dual-layer defence is solid: _validate_webhook_url in the PUT endpoint raises HTTP 400 on non-http(s) schemes, and is_webhook_active re-validates at dispatch time via _is_safe_webhook_url. The _redact_url_for_log helper is a thoughtful addition — Slack/Zapier webhook URLs routinely embed signing tokens in the path, and test_unsafe_url_log_does_not_leak_path_or_query locks this in well.

2. CLI sync contextsend_event_background now spawns a daemon thread calling asyncio.run() when no event loop is running. Webhooks from cf work batch run and blocker creation will now actually fire from CLI. The thread is daemon so it never blocks process exit, and self.timeout caps how long it lives.

3. pragma: no cover — Removed from both blockers.py and conductor.py. The exception handlers are covered by the *_failure_does_not_break_* and test_failure_in_dispatch_is_swallowed tests.

4. Timestamp format_utc_iso_now() now uses strftime('%Y-%m-%dT%H:%M:%SZ') and the test_timestamp_uses_z_suffix_not_offset test asserts both the Z suffix and the absence of +00:00.


Minor observations (non-blocking)

Duplicated scheme set_ALLOWED_SCHEMES in notifications_config.py and _ALLOWED_WEBHOOK_SCHEMES in settings_v2.py both encode frozenset({'http', 'https'}). This is the correct trade-off given the core-headless constraint (core cannot import from the router layer), but a brief comment at one or both sites would prevent a future reader from consolidating them incorrectly.

/test endpoint ignores webhook_enabled — The test endpoint fires against a configured URL regardless of the enabled flag. This is the right UX call (validate the URL before toggling delivery on), and test_test_works_even_when_enabled_flag_is_false documents the intent. Worth keeping explicit in the endpoint docstring so it does not get "fixed" later.

Thread-count assertion in test_send_event_background_outside_loop_runs_in_thread — The threading.active_count() delta assertion is slightly fragile under parallel test runners where other threads may be concurrently active. Not a real concern for the current test setup, but if this test ever flakes the assertion could be relaxed to <= 1.


Overall

Ready to merge. The security handling (scheme validation, log redaction), the correct dual async/sync dispatch path, and the test breadth across all three dispatch sites are all well done. No blocking issues.

Two more CodeRabbit findings on the latest commits:

1. **Major: basic-auth credentials leak through `_redact_url_for_log`**:
   ``parsed.netloc`` preserves ``user:password@host``, so a webhook URL
   like ``https://admin:secret@host/path`` was still logging
   ``admin:secret@host``. Switched to ``parsed.hostname`` (which strips
   auth and port) and re-attach the port explicitly. Added two tests:
   one verifies basic-auth credentials don't appear in logs, the other
   verifies the port is still kept (non-secret, useful for ops).

2. **Minor: cancelled task not awaited**: the `test_send_event_background_task_has_done_callback` cleanup
   called ``task.cancel()`` without awaiting, which can emit
   ``Task was destroyed but it is pending`` warnings on loop close.
   Now awaits the cancelled task to let cancellation propagate cleanly.

Also added a comment to ``_ALLOWED_SCHEMES`` noting it is intentionally
duplicated with ``settings_v2.py`` because core cannot import from the
UI layer (architecture rule #1).

136/136 webhook tests pass.
@frankbria
Copy link
Copy Markdown
Owner Author

Round 6 — basic-auth log leak + clean async cancel (commit 77459ef)

CodeRabbit caught two more on the previous fix.

# Severity Issue Fix
1 Major _redact_url_for_log still leaked credentials when the URL had basic-auth (user:password@host) because parsed.netloc preserves the userinfo segment Use parsed.hostname (strips auth + port) and re-attach the port explicitly. New test: test_unsafe_url_log_does_not_leak_basic_auth plus a positive control that port is preserved.
2 Minor task.cancel() without await in test cleanup → potential Task was destroyed but it is pending warnings Now awaits the cancelled task with (asyncio.CancelledError, Exception) suppression so cancellation propagates cleanly.

Also addressed Claude's nitpick about the duplicated _ALLOWED_SCHEMES set — added a comment noting it's intentional (core/UI separation per architecture rule #1).

Tests

  • 136/136 webhook v2 tests pass
  • Lint clean

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Follow-up Review (Round 3) — feat(notifications): outbound webhook (#560)

Reviewed the latest push (77459ef). This is a clean resolution of all outstanding items from the prior two rounds.


Prior Issues — Confirmed Resolved

Non-dict JSON guard
load_notifications_config now has isinstance(data, dict) check before calling .get(), with fallback to defaults. The three new tests (test_load_handles_non_object_json_list/null/integer) cover all three non-object cases.

Defense-in-depth URL validation at load/send time
is_webhook_active runs _is_safe_webhook_url before returning the URL, and the /test endpoint re-calls _validate_webhook_url at send time. test_rejects_unsafe_stored_url_at_test_time exercises the hand-edited config bypass path. The two-layer guard (PUT + send-time) is correctly documented in the module docstring.

Error response shape
PUT and /test handlers now wrap OSError / load failures in HTTPException with api_error() — consistent with the rest of the settings_v2 router.

Log redaction ✅ (bonus addition)
_redact_url_for_log strips path, query, and basic-auth before logging. test_unsafe_url_log_does_not_leak_basic_auth and test_unsafe_url_log_does_not_leak_path_or_query cover the Slack token-in-path and basic-auth scenarios.


One Small Observation

In _dispatch_pr_merged_webhook (pr_v2.py), the inner try/except Exception: pr_url = None that constructs the GitHub URL swallows failures silently with no log. Fine for the outer fire-and-forget pattern, but a logger.debug(...) here would make it easier to diagnose why pr_url is None in a deployed environment where GITHUB_REPO is unexpectedly unset. Not a blocker — the outer except still logs failures at WARNING.


Ship-Ready Assessment

  • All three dispatch paths (blocker, batch, PR) follow the same consistent shape and are covered by integration-style tests.
  • CLI sync-context delivery via daemon thread is working end-to-end (confirmed by demo output in the PR).
  • 82 v2 tests green, frontend build clean, lint clean.
  • Scope is tight: no retry, no HMAC, no cross-page poller — all correctly deferred and documented.

Ready to merge. The pr_url logging note above is a single-line improvement for a future PR if it comes up operationally.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@codeframe/core/notifications_config.py`:
- Around line 169-174: The _redact_url_for_log function currently accesses
parsed.port which can raise ValueError for malformed ports and bubble up into
is_webhook_active; modify _redact_url_for_log to guard access to parsed.port
(e.g., wrap reading parsed.port in a try/except ValueError or validate
parsed.netloc) and treat any invalid port as unparseable so the function returns
"<unparseable>" (or falls back to the safe string) instead of raising; ensure
you reference parsed.scheme and parsed.hostname as before and keep the existing
returned format when the port is valid.

In `@tests/core/test_notifications_config.py`:
- Line 154: The test functions like test_unsafe_url_log_does_not_leak_basic_auth
(and the other newly added test at the same region) are missing type
annotations; update their signatures to include parameter types (e.g.,
workspace: WorkspaceType, caplog: logging.LogCaptureFixture or appropriate types
used in tests) and an explicit return type of None. Locate the test functions by
the names test_unsafe_url_log_does_not_leak_basic_auth and the other test around
the same block, add parameter annotations for each parameter and append -> None
to the function definition to satisfy the project's typing guideline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0473157f-29ca-4446-bdb4-9cfd5fec0749

📥 Commits

Reviewing files that changed from the base of the PR and between 2044c7c and 77459ef.

📒 Files selected for processing (3)
  • codeframe/core/notifications_config.py
  • tests/core/test_notifications_config.py
  • tests/notifications/test_webhook_send_event.py

Comment thread codeframe/core/notifications_config.py
Comment thread tests/core/test_notifications_config.py
…560)

`urlparse(...).port` raises ``ValueError`` for non-numeric port strings
(e.g. ``file://host:abc/x``). Without this guard, ``_redact_url_for_log``
would raise inside ``is_webhook_active``'s "fail-safe" branch, turning
a clean refusal-to-dispatch into an exception caught only by the
downstream dispatch-helper try/except. Now returns ``<unparseable>`` for
both malformed schemes and malformed ports.

Two new tests:
  - ``test_redact_url_for_log_handles_malformed_port`` — direct unit test
  - ``test_is_webhook_active_returns_none_for_malformed_port`` — verifies
    the integration path doesn't crash on a hand-edited URL with a
    bogus port.
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Code Review — PR #594: Outbound Webhook Notifications

This PR has gone through several CodeRabbit review rounds, and each iteration has clearly addressed the flagged issues. Here is my independent assessment of the current state.


What is Working Well

Architecture compliancecodeframe/core/notifications_config.py is genuinely headless: no FastAPI, no HTTP imports. The deferred lazy imports inside _dispatch_blocker_webhook / _dispatch_batch_completed_webhook / _dispatch_pr_merged_webhook are a good pattern for keeping dispatch code out of core's module-level import graph.

Defense-in-depth security — URL validation runs at three independent layers: router PUT, _validate_webhook_url, and is_webhook_active's call to _is_safe_webhook_url. The test test_rejects_unsafe_stored_url_at_test_time confirms that a hand-edited file:// URL gets blocked even at send time. This is exactly right.

send_event_background dual-mode handling — The async-vs-sync fork (get_running_loop to create_task vs daemon thread to asyncio.run) is clean and properly tested with both paths. The done-callback for Python 3.11+ task-exception noise is a nice touch.

Atomic writes + credential redactionsave_notifications_config uses tempfile.mkstemp + os.replace correctly. _redact_url_for_log strips auth credentials, path, and query from logged URLs, and the malformed-port guard (try/except ValueError on parsed.port) prevents the fail-safe from becoming an exception path.

Test coverage — 60+ tests spanning config, dispatch, endpoints, UI, and security edge cases is well above the bar for a feature of this scope.


Remaining Concerns

Fragile lambda repr assertion (CodeRabbit flagged in 4th review — still present)

In tests/notifications/test_webhook_send_event.py line 188:

assert "send_event_background.<locals>.<lambda>" in repr(captured_tasks[0])

This couples the test to CPython's internal Task repr format, which could silently change across minor versions. The assertion exists to verify the done-callback is attached (preventing Python 3.11+ log noise), but a more stable check would be:

# More stable: check _callbacks is non-empty
assert captured_tasks[0]._callbacks, "expected done-callback to be registered"

_callbacks is also an implementation detail, but more stable than the repr string. Low priority, but worth addressing before the repr format changes.

import logging inside except in blockers.py

blockers.py has no module-level logging import, so the import logging inside the except Exception block in _dispatch_blocker_webhook (line 146) is technically correct. But adding a module-level logger would match the pattern in conductor.py and clean this up:

# At module top:
import logging
logger = logging.getLogger(__name__)

# Then in except:
except Exception:
    logger.warning("Failed to dispatch blocker.created webhook...", exc_info=True)

Legacy send_blocker_notification_background (pre-existing, not introduced here)

The old method at webhook.py:371 calls asyncio.create_task() directly — not via get_running_loop().create_task(). This will raise RuntimeError('no running event loop') in a sync context. The PR correctly preserves this API untouched, but the gap is more visible now that the new send_event_background handles both contexts properly. Worth a follow-up cleanup ticket.


Minor Observations

  • The _ALLOWED_SCHEMES constant is intentionally duplicated between notifications_config.py and settings_v2.py (architecture rule: core cannot import from the UI layer). The comment in the source acknowledges this — fine as-is, just keep them in sync if the allowed schemes ever change.
  • aiohttp.ClientSession is created per-call in send_event. For low-frequency events (batch complete, PR merge, blocker) this is acceptable. If webhook frequency grows, session reuse would be the optimization to reach for.
  • The /test endpoint intentionally fires regardless of the webhook_enabled flag so users can verify the URL before enabling. This is tested in test_test_works_even_when_enabled_flag_is_false and matches the UI intent.

Summary

The implementation is solid and all critical issues from prior reviews have been addressed. The two items worth fixing before merge are the fragile lambda repr assertion and the missing module-level logger in blockers.py. Everything else is either pre-existing debt or an acceptable v1 trade-off. Approve with the above minor items noted.

@frankbria
Copy link
Copy Markdown
Owner Author

Round 7 — parsed.port crash guard (commit 3279f17)

CodeRabbit on the previous fix caught one real and one style finding.

# Severity Issue Action
1 Major parsed.port raises ValueError for malformed ports (file://host:abc/x); could crash _redact_url_for_log inside the fail-safe branch of is_webhook_active Fixed: wrapped port access in try/except, returns <unparseable>
2 Major (overstated) "Missing type annotations on new test functions" Skipped with a reply on the thread — the repo's convention across tests/ui/test_workspace_config.py, tests/ui/test_settings_v2.py, etc. is unannotated test functions. Annotating only the new tests would diverge from that convention.

Both threads resolved.

Backend: 20/20 config tests pass (added 2 new tests for malformed-port handling). Lint clean.

@frankbria frankbria merged commit 38007da into main May 22, 2026
17 checks passed
@frankbria frankbria deleted the feat/webhook-notifications-560 branch May 22, 2026 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Phase 5.3] Async notifications: outbound webhook

1 participant