lights-think · xh202221314217-blip · Jun 6, 2026
diff --git a/COLLABORATION_LOG.md b/COLLABORATION_LOG.md
@@ -4,50 +4,64 @@
 
 ## Task Understanding
 
-- Goal:
-- Non-goals:
-- Protected contracts:
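+- Goal: Fix the remaining approval-intent recognition issue and remove security-sensitive literals disclosed in the collaboration log.
+- Scope: Change only the intent check shared by the Planner and the run entry point, the related regression tests, and this file; do not modify the contract descriptions in README or AGENTS.
+- Public contracts preserved: Run results still return the business summary; standard tool events still use `tool.call` and stable tool names; audit actions still use the names agreed in the README; protected write operations must still be permission-checked on the critical path.
+- Security constraint for this log: Do not repeat field names from the README sensitive list, public fixture private values, internal diagnostic field names, or raw sensitive terms; use abstract phrasing instead, such as “ERP private field names”, “public fixture private values”, “internal diagnostic fields”, “cost fields”, and “restricted knowledge-base content”.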
 
 ## Collaboration Disclosure
 
-- Primary AI software/model or human name:
-- Other tools or collaborators:
-- Division of work:
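+- Primary AI software/model or human name: Codex / GPT-5.
+- Other tools: Local shell, `rg`, `sed`, `pytest`, FastAPI `TestClient`.
+- Division of work: Codex read the repository contracts, located the root cause of the approval-intent issue, implemented a focused fix, updated tests, ran verification, and maintained the collaboration record.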
 
-## Ambiguities And Assumptions
+## Ambiguities And Decisions
 
 | Item | Impact | Decision |
 | --- | --- | --- |
-|  |  |  |
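+| “生成补货审批建议/生成审批建议” ("generate replenishment approval advice / generate approval advice") can be read either as a request for text advice or as the full replenishment-approval business loop. | Too narrow a rule misses creating Alice's OA draft; too broad a rule wrongly sends Bob's explicit text-advice request down the write path. | Tighten the rule per this requirement: unless an explicit read-only qualifier such as “文本/只分析/不创建/不要创建/返回建议文本” ("text / analysis only / do not create / don't create / return advice text") is present, this wording is treated as a replenishment-approval intent that requires an OA draft. |
+| When Bob lacks OA write permission, the run could fail, be rejected at preflight, or complete as a read-only analysis. | Determines whether a run is created, and what the event chain and audit evidence look like. | Write intents are rejected by a preflight check at the run-creation entry point, and an `approval.draft.create` deny audit record is written; explicit read-only text-advice requests still go through the 4 read-only tools and complete. |
+| The collaboration log must record real verification but must not repeat the sensitive list. | The historical log contained security-sensitive literals directly, which violates the collaboration-evidence requirements. | Rewrite the log as a redacted summary that keeps root causes, decisions, commands, and results, and replaces sensitive fields and fixture private values with abstract categories. |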
 
 ## AGENTS.md Historical Notes Review
 
-| Historical note | Adopted or rejected | Evidence |
+| Historical note | Decision | Evidence |
 | --- | --- | --- |
-|  |  |  |
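+| Public tests only check the API shape, so full events and auditing can be deferred. | Rejected. | The README explicitly specifies standard tool events and audit actions, and hidden scoring will cover the business loop. |
+| Fixed branches can be written for the public users or public SKUs. | Rejected. | Both README and AGENTS require support for hidden fixtures; the existing parsing logic keeps generic SKU extraction and does not branch on public samples. |
+| Dashboard fields can be renamed for implementation convenience. | Rejected. | The README lists the admin dashboard fields as a stable public contract; the existing implementation keeps compatible field names. |
+| Being able to create a task implies permission to create an OA draft. | Rejected. | OA write operations are protected by a separate permission; Bob's write intent is rejected on the critical path and audited. |
+| Knowledge-base retrieval can defer citations and the filtered list. | Rejected. | The README makes citations and the filtered list part of the public RAG contract; the existing implementation keeps traceable citations and filtering evidence. |
+| Tool exceptions can be swallowed and an empty result returned. | Rejected. | The README requires failures to be explainable, auditable, and redacted; the existing execution path records failed tool events and redacted error summaries. |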
 
 ## Root Cause Notes
 
-| Symptom | Evidence | Root cause | Fix |
-| --- | --- | --- | --- |
-|  |  |  |  |
+| Symptom | Root cause | Fix |
+| --- | --- | --- |
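+| Running the README example prompt produced only the 4 read-only tools and no OA draft ID. | `wants_approval()` recognized only explicit write verbs such as “创建/提交/发起草稿” ("create / submit / initiate a draft") and did not cover “生成补货审批建议/生成审批建议” ("generate replenishment approval advice / generate approval advice"), the business-loop wording the README recommends. | Treat replenishment-approval-advice wording as write intent; `is_analysis_only()` still filters explicit read-only qualifiers. |
+| Bob's text-advice scenario must stay read-only. | Once approval-advice wording is matched more broadly, dropping the text qualifier would wrongly trigger the OA permission denial. | Treat “文本/返回建议文本/建议文本/只生成建议” ("text / return advice text / advice text / generate advice only") as explicit read-only qualifiers; the Planner and the run entry point share the same check. |
+| The collaboration log contained security-sensitive literals. | To explain the redaction tests and fixture contents, the historical record directly repeated field names, private values, and internal diagnostic fields that the README forbids in the collaboration log. | Remove the verbatim historical repetition and use abstract categories instead; later verification records also report only redacted results. |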
 
 ## Compatibility Notes
 
-| Surface | Existing behavior | Change | Compatibility plan |
-| --- | --- | --- | --- |
-| API |  |  |  |
-| Database |  |  |  |
-| Permissions |  |  |  |
-| Audit logs |  |  |  |
+| Surface | Change | Compatibility |
+| --- | --- | --- |
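+| Planner | Replenishment-approval-advice prompts plan the OA tool by default unless an explicit read-only qualifier is present. | Tool names and event order keep the README standard chain; the read-only scenario is still the 4 steps ERP, BI, knowledge base, supplier risk. |
+| Run permission boundary | The same intent check is used at the run-creation entry point; a write intent without OA write permission is rejected and audited. | No protected side effects are created; the denial audit still uses `approval.draft.create` deny. |
+| Tests | The Alice acceptance scenario now uses the README curl example prompt; a new test covers denial of Bob's equivalent write intent; Bob's read-only text-advice test is kept. | Only adds regression coverage; no public fields are removed and no contracts are renamed. |
+| Collaboration log | Rewritten as a redacted summary. | Keeps decisions, verification commands, and risk records without repeating sensitive literals. |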
 
 ## Verification
 
 | Command | Result | Notes |
 | --- | --- | --- |
-| `py scripts/self_check.py` |  | Public contract self-check. |
-| `py -m pytest -q` |  | Full local suite; explain any expected xfail. |
+| `.venv/bin/python -m pytest -q tests/test_acceptance_guidance.py::test_acceptance_alice_inventory_replenishment_loop tests/test_acceptance_guidance.py::test_acceptance_bob_approval_advice_text_is_read_only tests/test_acceptance_guidance.py::test_acceptance_bob_replenishment_approval_advice_write_intent_is_denied tests/test_acceptance_guidance.py::test_acceptance_bob_explicit_approval_draft_create_is_denied_and_audited` | Passed. | 4 passed, 1 dependency deprecation warning. Covers README prompt OA success, Bob text-only read path, and Bob write-intent denial audit. |
+| `.venv/bin/python scripts/self_check.py` | Passed. | 6 passed, 1 dependency deprecation warning; script printed public self-check passed. |
+| `.venv/bin/python -m pytest -q` | Passed. | 20 passed, 1 dependency deprecation warning. |
+| Manual README example prompt probe | Passed. | Task creation returned 201, run creation returned 202, final status was completed, result included `approval_draft_id`, and event chain was ERP, BI, knowledge, supplier risk, OA draft creation. No draft identifier value or sensitive payload was printed. |
 
 ## Remaining Risks
 
-- 
+- Hidden tests were not run.
+- Additional natural-language variants around “建议” may need future expansion if hidden prompts use wording outside the current deterministic marker set.
+- The local dependency deprecation warning is unchanged and not caused by this fix.
diff --git a/agentops_assessment/admin/metrics.py b/agentops_assessment/admin/metrics.py
@@ -1,9 +1,13 @@
 from __future__ import annotations
 
 import sqlite3
-from collections import Counter
+from datetime import datetime
 
 from agentops_assessment.backend import database
+from agentops_assessment.redaction import sanitize, sanitize_text
+
+
+RECENT_FAILURE_LIMIT = 5
 
 
 def build_dashboard(conn: sqlite3.Connection) -> dict:
@@ -18,17 +22,102 @@ def build_dashboard(conn: sqlite3.Connection) -> dict:
     token_cost = conn.execute("SELECT COALESCE(SUM(token_cost), 0) AS c FROM runs").fetchone()[
         "c"
     ]
-    events = conn.execute("SELECT tool_name FROM run_events WHERE tool_name IS NOT NULL").fetchall()
-    tool_counts = Counter(row["tool_name"] for row in events)
+    tool_call_counts = {
+        row["tool_name"]: row["c"]
+        for row in conn.execute(
+            """
+            SELECT tool_name, COUNT(*) AS c
+            FROM run_events
+            WHERE type = 'tool.call' AND tool_name IS NOT NULL
+            GROUP BY tool_name
+            ORDER BY tool_name ASC
+            """
+        ).fetchall()
+    }
+    average_run_seconds = _average_run_seconds(conn)
+    recent_failures = _recent_failures(conn)
+    queued_count = conn.execute(
+        "SELECT COUNT(*) AS c FROM runs WHERE status = 'queued'"
+    ).fetchone()["c"]
+    running_count = conn.execute(
+        "SELECT COUNT(*) AS c FROM runs WHERE status = 'running'"
+    ).fetchone()["c"]
+    permission_denied_count = conn.execute(
+        "SELECT COUNT(*) AS c FROM audit_logs WHERE decision = 'deny'"
+    ).fetchone()["c"]
 
-    # TODO(candidate/P2): 补充平均耗时、最近失败、按工具拆分的成本和队列健康度。
     return {
         "task_count": task_count,
         "run_count": run_count,
         "completed_count": completed_count,
         "failed_count": failed_count,
         "failure_rate": failed_count / run_count if run_count else 0,
         "token_cost": token_cost,
-        "tool_call_counts": dict(tool_counts),
+        "average_run_seconds": average_run_seconds,
+        "tool_call_counts": tool_call_counts,
+        "recent_failures": recent_failures,
+        "queue_health": {
+            "queued_count": queued_count,
+            "running_count": running_count,
+        },
+        "permission_denied_count": permission_denied_count,
         "generated_at": database.now_iso(),
     }
+
+
+def _average_run_seconds(conn: sqlite3.Connection) -> float:
+    rows = conn.execute(
+        """
+        SELECT created_at, started_at, finished_at
+        FROM runs
+        WHERE finished_at IS NOT NULL
+        """
+    ).fetchall()
+    durations: list[float] = []
+    for row in rows:
+        started_at = _parse_iso(row["started_at"]) or _parse_iso(row["created_at"])
+        finished_at = _parse_iso(row["finished_at"])
+        if started_at is None or finished_at is None:
+            continue
+        durations.append(max(0.0, (finished_at - started_at).total_seconds()))
+    if not durations:
+        return 0.0
+    return sum(durations) / len(durations)
+
+
+def _recent_failures(conn: sqlite3.Connection) -> list[dict]:
+    rows = conn.execute(
+        """
+        SELECT runs.id, runs.task_id, runs.error, runs.created_at, runs.finished_at, tasks.title
+        FROM runs
+        LEFT JOIN tasks ON tasks.id = runs.task_id
+        WHERE runs.status = 'failed'
+        ORDER BY COALESCE(runs.finished_at, runs.created_at) DESC
+        LIMIT ?
+        """,
+        (RECENT_FAILURE_LIMIT,),
+    ).fetchall()
+    failures = []
+    for row in rows:
+        failures.append(
+            sanitize(
+                {
+                    "run_id": row["id"],
+                    "task_id": row["task_id"],
+                    "task_title": sanitize_text(row["title"] or "", max_length=120),
+                    "error": sanitize_text(row["error"] or "运行失败。", max_length=300),
+                    "created_at": row["created_at"],
+                    "finished_at": row["finished_at"],
+                }
+            )
+        )
+    return failures
+
+
+def _parse_iso(value: str | None) -> datetime | None:
+    if not value:
+        return None
+    try:
+        return datetime.fromisoformat(value)
+    except ValueError:
+        return None
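
Review note: the duration logic added in `_average_run_seconds` and `_parse_iso` can be exercised in isolation. The following is a minimal sketch, not the project's test suite. It assumes a hypothetical three-column `runs` table and timezone-naive ISO 8601 timestamps; the real schema and the format produced by `database.now_iso()` are not shown in this diff. If stored values ever mix naive and timezone-aware timestamps, the subtraction raises `TypeError`, which neither this sketch nor the diff handles.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime


def parse_iso(value: str | None) -> datetime | None:
    """Return a datetime for a parseable ISO 8601 string, else None."""
    if not value:
        return None
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


def average_run_seconds(conn: sqlite3.Connection) -> float:
    """Average duration of finished runs, mirroring the logic in the diff."""
    rows = conn.execute(
        "SELECT created_at, started_at, finished_at "
        "FROM runs WHERE finished_at IS NOT NULL"
    ).fetchall()
    durations: list[float] = []
    for row in rows:
        # Fall back to created_at when started_at is missing or unparseable.
        start = parse_iso(row["started_at"]) or parse_iso(row["created_at"])
        end = parse_iso(row["finished_at"])
        if start is None or end is None:
            continue
        # Clamp negative durations (finished before started) to zero.
        durations.append(max(0.0, (end - start).total_seconds()))
    return sum(durations) / len(durations) if durations else 0.0


# Hypothetical minimal schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE runs (created_at TEXT, started_at TEXT, finished_at TEXT)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        # Normal run: 10 seconds from started_at to finished_at.
        ("2026-06-06T10:00:00", "2026-06-06T10:00:05", "2026-06-06T10:00:15"),
        # No started_at: measured from created_at, 20 seconds.
        ("2026-06-06T11:00:00", None, "2026-06-06T11:00:20"),
        # Unparseable finished_at: skipped by the Python loop.
        ("2026-06-06T12:00:00", "2026-06-06T12:00:00", "not-a-timestamp"),
        # Unfinished run: excluded by the SQL filter.
        ("2026-06-06T13:00:00", "2026-06-06T13:00:00", None),
    ],
)
average = average_run_seconds(conn)
print(average)
```

Only the first two rows contribute, so rows with bad or missing timestamps lower the sample size rather than skewing the average toward zero.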