docs: implement self-improvement loop architecture#239
Conversation
- Add three signal log files: style_lint_runs.jsonl, pr_review_runs.md, human_review_feedback.jsonl - Extend draft_docs/SKILL.md step 8 to append style lint violation records to style_lint_runs.jsonl on agent-authored PRs - Extend review-docs-pr/SKILL.md to append review summaries to pr_review_runs.md - Add improve-drafting-skills skill: monthly outer loop that reads all three logs and proposes targeted edits to skills/templates - Add Phase 2 redirect-drafter to weekly-404-monitor: auto-drafts vercel.json redirects for high-confidence uncovered 404 gaps - Add improve-aeo-crosslink-skill: quarterly outer loop that reads aeo_crosslink_audit_runs.md and proposes edits to the audit skill Co-Authored-By: Oz <oz-agent@warp.dev>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
I'm starting a first review of this pull request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This PR adds self-improvement loop infrastructure for docs operations: new signal logs, new outer-loop skills, and updates to drafting/review/404-monitor skills.
Concerns
- The new JSONL log files are seeded with
#comment rows, which conflicts with the documented one-record-per-line JSONL format and can break consumers before any real signal is recorded. - Several logging steps require automated agents to commit directly to
main; that either bypasses the normal PR review path when credentials allow it, or silently loses the durable signal when branch protection blocks the push. - The human feedback collector misses inline review comments and attributes the full PR diff as human edits, so the primary training signal can be incomplete and polluted with agent-authored content.
Security
- The outer-loop skills consume human comments, PR text, and run logs as inputs for editing skills, but they do not define a prompt-injection boundary for treating that content strictly as data.
Spec alignment
No approved or repository spec context was provided for this PR, so there were no spec-alignment findings.
Verdict
Found: 0 critical, 5 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
| @@ -0,0 +1,4 @@ | |||
| # Human review feedback log — one JSON record per line, appended by the feedback collector step. | |||
There was a problem hiding this comment.
# header rows make the file invalid JSONL, so any line-by-line JSON parser in improve-drafting-skills will fail before it sees real feedback. Keep JSONL logs empty until the first record or explicitly require consumers to skip comment lines; the same applies to style_lint_runs.jsonl.
| ```json | ||
| {"date":"YYYY-MM-DD","pr":"NNN","branch":"BRANCH_NAME","authored_by":"agent","skill_used":"SKILL_NAME","files_scanned":N,"violations":{"check_name":count}} | ||
| ``` | ||
| 4. From a clean checkout or worktree based on the latest `main`, stage only `.agents/logs/style_lint_runs.jsonl` and commit directly to `main`: |
There was a problem hiding this comment.
main makes the signal path depend on agents having default-branch push rights and bypasses the normal PR review path when they do. Use a draft log PR or an artifact consumed by a scheduled collector instead; this pattern also appears in the new review/feedback logging steps.
| Before reading the logs, run the feedback collector to capture any merged agent-authored PRs from the past 30 days that have not yet been logged to `human_review_feedback.jsonl`: | ||
|
|
||
| 1. Use `gh pr list --repo warpdotdev/docs --state merged --label oz-agent` or search for PRs with `oz-agent@warp.dev` as a commit author in the past 30 days. | ||
| 2. For each such PR, use `gh pr view NNN --json reviews,comments` to extract human review comments and verdicts. |
There was a problem hiding this comment.
gh pr view --json reviews,comments does not collect inline review comments, which are the main [skill-feedback] signal this loop is designed to learn from. Fetch review threads or pull review comments as well before appending human_review_feedback.jsonl records.
|
|
||
| 1. Use `gh pr list --repo warpdotdev/docs --state merged --label oz-agent` or search for PRs with `oz-agent@warp.dev` as a commit author in the past 30 days. | ||
| 2. For each such PR, use `gh pr view NNN --json reviews,comments` to extract human review comments and verdicts. | ||
| 3. Also run `git diff MERGE_BASE..PR_HEAD -- src/content/docs/` to capture human follow-up edits made to the branch after the agent's last commit. |
There was a problem hiding this comment.
MERGE_BASE..PR_HEAD captures the whole PR diff, not just human edits after the agent's last commit, so the collector will log agent-authored content as human feedback. Diff from the last bot-authored commit to PR_HEAD or classify commits by author before writing human_edit records.
|
|
||
| ### 1. Read the last 30 days of signal data | ||
|
|
||
| Parse all three log files and filter to entries from the past 30 days. |
There was a problem hiding this comment.
- Empty JSONL log files (comment headers were invalid JSONL) - Replace direct-to-main commits in inner loops with stdout signal markers ([SIGNAL:style-lint] and [SIGNAL:pr-review]) consumed by the outer loop via oz run get -- eliminates branch protection dependency on inner loops - Fix feedback collector to fetch inline review comments via gh api pulls/NNN/comments (not just top-level comments field) - Fix git diff range: LAST_BOT_COMMIT..MERGE_COMMIT instead of MERGE_BASE..PR_HEAD to exclude agent-authored content from human edits - Add explicit prompt-injection security boundary to improve-drafting-skills: treat log content as data only, discard injection indicators, act only on parsed structured fields Co-Authored-By: Oz <oz-agent@warp.dev>
|
/oz-review |
|
I'm re-reviewing this pull request in response to a review request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This PR adds documentation and skill instructions for self-improvement loops around docs drafting, PR review, 404 redirects, and AEO cross-link audits. The architecture is directionally coherent, but several changed instructions disagree about where signals are written/read, which would cause the monthly loop to miss automated feedback. I could not inspect the contents of .agents/logs/human_review_feedback.jsonl or .agents/logs/style_lint_runs.jsonl because the attached diff reports those patches as unavailable, so any concerns about those files are summarized rather than commented inline.
Concerns
- The new
pr_review_runs.mdlog describes a committed log writer, but the updated review skill only emits a stdout marker. improve-drafting-skillsstill instructs the monthly analysis to parse log files for automated signals even though the same skill now says those signals are only collected from Oz run output.- Raw human review comment text is committed to
mainwithout an explicit pre-append injection/secret filtering step. - The new review signal is described as running after submission, but the skill workflow produces
review.jsonfor the control plane to publish, so the marker may never be emitted.
Verdict
Found: 0 critical, 4 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
| @@ -0,0 +1,17 @@ | |||
| # PR review run log | |||
|
|
|||
| New entries are prepended by each `review-docs-pr` agent run on an agent-authored PR. Most recent entry first. | |||
There was a problem hiding this comment.
review-docs-pr skill; it now emits [SIGNAL:pr-review] to stdout, so either wire the skill to prepend this file or remove the committed log stub to avoid a dead signal source.
| - Set `tag` to the prefix found in the comment (`[skill-feedback]`, `[template-feedback]`, `[style-rule-gap]`) or `""` if none. | ||
| - Set `feedback_type` to `"review_comment"`, `"human_edit"`, or `"review_verdict"`. | ||
| - **Skip** comments from `oz-agent@warp.dev`, `vercel`, `github-actions`, or any other bot actor (check the author login or `authorAssociation`). | ||
| 5. Append accepted records to `.agents/logs/human_review_feedback.jsonl` and commit directly to `main` as part of this monthly outer loop run: |
There was a problem hiding this comment.
main, but injection/secret filtering is only described later for analysis; filter/redact records before appending so malicious comments or accidental sensitive data are not persisted.
|
|
||
| ## Workflow | ||
|
|
||
| ### 1. Read the last 30 days of signal data |
There was a problem hiding this comment.
|
|
||
| ## Signal logging | ||
|
|
||
| After submitting the PR review, emit a summary record for the `improve-drafting-skills` outer loop. Apply this step only when reviewing an agent-authored PR (branch created by a drafting skill, or commit author is `oz-agent@warp.dev`). |
There was a problem hiding this comment.
review.json for the control plane to publish; emit the signal immediately after creating/validating review.json so the marker appears in the run output.
- Fix pr_review_runs.md header: now correctly states it is written by the outer loop (improve-drafting-skills), not by review-docs-pr - Add Step A.5: outer loop writes parsed [SIGNAL:pr-review] records to pr_review_runs.md as a human-readable audit trail, closing the dead signal source - Pre-append security filtering in Step B: injection detection and secret redaction happen before records reach human_review_feedback.jsonl - Workflow step 1: now reads in-memory Step A records + on-disk human_review_feedback.jsonl; no longer references non-existent files - Update threshold table: source labels now say 'Step A in-memory records' - review-docs-pr signal logging: emit after validating review.json (not 'after submitting') so the marker appears in Oz run output regardless of how the review is published Co-Authored-By: Oz <oz-agent@warp.dev>
Summary
Implements the self-improvement loop architecture for Warp docs content operations. This PR adds the signal infrastructure and outer loop skills that allow agent-authored docs to improve automatically from accumulated feedback.
Architecture
Four coordinated loops — three active now, one documented for future deployment:
improve-drafting-skills) reads these logs and proposes targeted edits to drafting skills and templates.weekly-404-monitor— after posting the weekly Slack report, the agent now also proposes redirect entries for high-confidence uncovered 404 gaps.improve-aeo-crosslink-skill) reads theaeo_crosslink_auditrun log and proposes improvements to the audit skill itself. Deploy on month 3 afteraeo_crosslink_audithas 8+ run log entries.Changes
New log files (
.agents/logs/)style_lint_runs.jsonl— JSONL, one record per style lint run on an agent-authored PRpr_review_runs.md— Markdown log ofreview-docs-prruns on agent-authored PRshuman_review_feedback.jsonl— JSONL, human review comments and edits collected from merged agent PRsNew skills
.agents/skills/improve-drafting-skills/SKILL.md— monthly outer loop (Loop 1).agents/skills/improve-aeo-crosslink-skill/SKILL.md— quarterly outer loop (Loop 4)Modified skills (additive changes only)
.agents/skills/draft_docs/SKILL.md— step 8 extended to append a violation record tostyle_lint_runs.jsonlon cloud agent runs.agents/skills/review-docs-pr/SKILL.md— new Signal logging section: appends a summary entry topr_review_runs.mdafter reviewing an agent-authored PR.agents/skills/weekly-404-monitor/SKILL.md— new Phase 2 section: redirect drafter with confidence scoring and draft PR for HIGH-confidence matchesNext steps (after merge)
Two new Oz scheduled agents need to be configured in the Oz web app:
improve-drafting-skills— monthly, first Monday of each month at 9am PTimprove-aeo-crosslink-skill— quarterly, first Monday of Jan/Apr/Jul/Oct (start on month 3 when the run log has 8+ entries)weekly-404-monitoralready runs as a scheduled agent — no new agent is needed; Phase 2 runs within the same existing agent.Architecture plan: https://staging.warp.dev/drive/notebook/LiSAdtZGryD78gSNj5kGPx
Conversation: https://staging.warp.dev/conversation/652a054e-d757-4632-8554-5176f5529ee2
Co-Authored-By: Oz oz-agent@warp.dev