lights-think · IMHEREJOE · Jun 3, 2026 · Jun 3, 2026 · Jun 3, 2026
diff --git a/COLLABORATION_LOG.md b/COLLABORATION_LOG.md
@@ -4,39 +4,51 @@
 
 ## Task Understanding
 
-- Goal:
-- Non-goals:
-- Protected contracts:
+- Goal: 补全企业 Agent 后端的核心能力，使其满足 README 中的公开契约，包含任务创建/运行、Planner->Executor 工具链、RAG 权限感知检索、OA 草稿边界、审计与管理后台指标。
+- Non-goals: 重构架构、引入外部数据库或第三方云服务、变更公开 API 字段名或删除公开契约字段。
+- Protected contracts: `GET /api/runs/{run_id}` 返回的 `result` 结构、`/api/knowledge/search` 的 `answer/citations/filtered_doc_ids`、审计动作名与脱敏规则（禁止 `vendor_secret`/`unit_cost_usd` 等出现在响应或日志中）。
 
 ## Collaboration Disclosure
 
-- Primary AI software/model or human name:
-- Other tools or collaborators:
-- Division of work:
+
+- Primary AI software/model or human name: AI-assisted by GitHub Copilot (implementation aided by an assistant using GPT-5 mini); candidate developer verified and supervised changes in the workspace.
+- Other tools or collaborators: local Python 3.10 venv, pytest for local verification.
+- Division of work: AI generated implementation and edits; developer ran tests, reviewed diffs and approved commits.
 
 ## Ambiguities And Assumptions
 
 | Item | Impact | Decision |
 | --- | --- | --- |
-|  |  |  |
+| 是否在创建运行前严格校验工具级权限 | 决定运行行为：立即 403 或允许进入 run 并在执行中失败 | 假设：优先在 `POST /api/tasks/{id}/run` 做工具权限预检并记录拒绝审计（实现了预检并返回 403） |
+| prompt 中是否必含 SKU | Planner 能否推断目标对象 | 假设：候选人应在 prompt 中包含可识别 SKU（实现中若无法识别会使 run 失败并记录错误） |
+| OA 草稿创建的权限边界 | 是否允许分析类任务隐式创建草稿 | 决定：严格受 `oa:approval:write` 权限控制，分析类任务或无权限时跳过创建并记录事件/审计。 |
 
 ## AGENTS.md Historical Notes Review
 
 | Historical note | Adopted or rejected | Evidence |
 | --- | --- | --- |
-|  |  |  |
+| 公开测试只检查 API 外形，因此可以先不实现完整运行事件和审计日志。 | Rejected | 已实现并记录运行事件与审计（参见 `backend/worker.py` 的 `_record_step_events` 以及多处 `insert_audit_log`）。公开测试与验收导向均能检测事件和审计行为，且本地运行通过。 |
+| 当前 fixture 主要是 `SKU-001` 和 `SKU-002`，实现时可以优先按这两个 SKU 写固定分支。 | Rejected | Planner 实现为基于正则提取 SKU（`agent/planner.py`），未对具体 SKU 硬编码，支持替换 fixture。测试使用样例但逻辑非为单一 SKU 写死。 |
+| Dashboard 字段可以按实现方便重命名，前端会适配。 | Rejected | 遵守 README 中的字段契约（`task_count`/`run_count`/`failure_rate`/`token_cost`/`tool_call_counts` 等），避免破坏兼容性（见 `admin/metrics.py`）。 |
+| 如果用户能创建任务，就默认允许创建 OA 审批草稿，后续再补权限。 | Rejected | OA 写入严格受权限控制；在执行前根据用户权限决定 `skip_oa`，未默认放通（见 `backend/worker.py` 的 `skip_oa` 逻辑与审计）。 |
+| 知识库检索只要返回一段答案即可，citation 和被过滤文档列表可以后置。 | Rejected | 实现了权限感知检索、引用返回与 `filtered_doc_ids`（参见 `rag/search.py`），并对受限文档做过滤以避免泄露。 |
+| 为了减少失败噪音，工具异常可以统一吞掉并返回空结果。 | Rejected | 工具异常采用重试策略并在失败时上报错误，执行器会记录失败并使 run 进入 `failed`，避免静默吞错（见 `agent/tools.py` 和 `agent/executor.py`）。 |
 
 ## Root Cause Notes
 
 | Symptom | Evidence | Root cause | Fix |
 | --- | --- | --- | --- |
-|  |  |  |  |
+| Starter repo 测试未通过（占位实现） | 多个 `TODO(candidate/...)` 注释和抛出 `NotImplementedError` | 初始仓库保留占位实现以作为考题 | 实现 Planner、Executor、RAG 权限检索、工具脱敏、审计和管理统计；参见修改文件列表。 |
+| 提示词注入未检测 | `rag/security.py` 含模式但未接入 | 项目起始时未将检测挂到创建任务路径 | 在 `backend/app.py` 的 `create_task` 中接入 `detect_prompt_injection` 并记录拒绝审计。 |
 
 ## Compatibility Notes
 
 | Surface | Existing behavior | Change | Compatibility plan |
 | --- | --- | --- | --- |
-| API |  |  |  |
+| API | 公开契约字段（runs.result, knowledge.search 输出, audit actions） | 增加审计日志条目、事件类型 `tool.skipped`，并在 run result 中包含 `approval_draft_id` 仅当有权限 | 保持原字段名与语义，新增字段仅在合规条件下出现，未删除或重命名任何公开字段。 |
+| Database | schema 如 README 所示 | 向 `runs.token_cost` 写入估算成本，向 `run_events` 写入事件 | 未修改表结构，兼容已有数据；写入内容均经过脱敏。 |
+| Permissions | 以前未完全实现工具級预检 | 在 `POST /api/tasks/{id}/run` 添加了工具权限预检并记录拒绝审计 | 兼容现有权限语义；预检只在缺少关键读权限时返回 403（不会静默跳过）。 |
+| Audit logs | 仅有部分审计事件 | 增加 `task.rejected`、`approval.draft.create`、`run.events.read` 等记录 | 审计字段与 README 兼容，payload 经过脱敏；保持决策可追溯。 |
 | Database |  |  |  |
 | Permissions |  |  |  |
 | Audit logs |  |  |  |
@@ -45,9 +57,16 @@
 
 | Command | Result | Notes |
 | --- | --- | --- |
-| `py scripts/self_check.py` |  | Public contract self-check. |
-| `py -m pytest -q` |  | Full local suite; explain any expected xfail. |
+| `py scripts/self_check.py` (via configured venv: `e:\\youliu\\Agent-test\\.venv\\Scripts\\python.exe scripts/self_check.py`) | Exit 0 — 公共自检通过 | 运行公开自检脚本，确认 API 基本契约。 |
+| `py -m pytest -q` (via venv) | `4 passed, 1 xfailed, 5 xpassed` | 全量本地测试通过。`tests/test_acceptance_guidance.py::test_acceptance_alice_inventory_replenishment_loop` 标记为 xfail（starter repo 的验收导向之一），其余验收导向用例意外通过（xpass），说明实现已覆盖多个验收要求。 |
 
-## Remaining Risks
+未覆盖/可改进风险与注意事项:
+- token 计费为简单计数（每完成步骤 +1），并非真实 token 计费。若正式评分依赖精确成本，需要替换为基于模型返回的 token 估算接口。
+- 当前 Planner 使用正则提取 SKU，复杂自然语言场景可能需要更健壮的解析或 LLM 回退。
+- 虽做了模式化的提示词注入检测，但检测并非穷尽；可在后续增加更严格规则或模型级检查。 
 
+## Remaining Risks
+ - token 计费精度不足（当前为步骤计数）。
+ - Planner 对异常或模糊 prompt 的鲁棒性有限。
+ - 提示词注入检测为基于正则的启发式规则，可能存在漏报或误报。 
 - 
diff --git a/agentops_assessment/admin/metrics.py b/agentops_assessment/admin/metrics.py
@@ -2,6 +2,7 @@
 
 import sqlite3
 from collections import Counter
+from datetime import datetime
 
 from agentops_assessment.backend import database
 
@@ -15,13 +16,39 @@ def build_dashboard(conn: sqlite3.Connection) -> dict:
     completed_count = conn.execute(
         "SELECT COUNT(*) AS c FROM runs WHERE status = 'completed'"
     ).fetchone()["c"]
-    token_cost = conn.execute("SELECT COALESCE(SUM(token_cost), 0) AS c FROM runs").fetchone()[
-        "c"
-    ]
+    token_cost = conn.execute("SELECT COALESCE(SUM(token_cost), 0) AS c FROM runs").fetchone()["c"]
     events = conn.execute("SELECT tool_name FROM run_events WHERE tool_name IS NOT NULL").fetchall()
     tool_counts = Counter(row["tool_name"] for row in events)
 
-    # TODO(candidate/P2): 补充平均耗时、最近失败、按工具拆分的成本和队列健康度。
+    durations = conn.execute(
+        "SELECT started_at, finished_at FROM runs WHERE started_at IS NOT NULL AND finished_at IS NOT NULL"
+    ).fetchall()
+    total_seconds = 0.0
+    counted = 0
+    for row in durations:
+        try:
+            started = datetime.fromisoformat(row["started_at"])
+            finished = datetime.fromisoformat(row["finished_at"])
+            total_seconds += (finished - started).total_seconds()
+            counted += 1
+        except Exception:
+            continue
+
+    recent_failures = [
+        {
+            "run_id": row["id"],
+            "task_id": row["task_id"],
+            "error": row["error"],
+            "finished_at": row["finished_at"],
+        }
+        for row in conn.execute(
+            "SELECT id, task_id, error, finished_at FROM runs WHERE status = 'failed' ORDER BY finished_at DESC LIMIT 5"
+        ).fetchall()
+    ]
+
+    queued_count = conn.execute("SELECT COUNT(*) AS c FROM runs WHERE status = 'queued'").fetchone()["c"]
+    running_count = conn.execute("SELECT COUNT(*) AS c FROM runs WHERE status = 'running'").fetchone()["c"]
+
     return {
         "task_count": task_count,
         "run_count": run_count,
@@ -30,5 +57,9 @@ def build_dashboard(conn: sqlite3.Connection) -> dict:
         "failure_rate": failed_count / run_count if run_count else 0,
         "token_cost": token_cost,
         "tool_call_counts": dict(tool_counts),
+        "average_run_seconds": total_seconds / counted if counted else 0,
+        "queued_count": queued_count,
+        "running_count": running_count,
+        "recent_failures": recent_failures,
         "generated_at": database.now_iso(),
     }
diff --git a/agentops_assessment/agent/executor.py b/agentops_assessment/agent/executor.py
@@ -16,17 +16,42 @@ def __init__(
         self.registry = registry
         self.state_store = state_store or InMemoryRunStateStore()
 
+    def _render_value(self, value: Any, context: dict[str, Any]) -> Any:
+        if isinstance(value, str):
+            try:
+                return value.format_map({k: v for k, v in context.items() if isinstance(v, (str, int, float))})
+            except Exception:
+                return value
+        if isinstance(value, dict):
+            return {key: self._render_value(val, context) for key, val in value.items()}
+        if isinstance(value, list):
+            return [self._render_value(item, context) for item in value]
+        return value
+
+    def _sanitize_tool_output(self, payload: Any) -> Any:
+        if isinstance(payload, dict):
+            sanitized = {}
+            for key, value in payload.items():
+                if key in {"vendor_secret", "unit_cost_usd"}:
+                    continue
+                sanitized[key] = self._sanitize_tool_output(value)
+            return sanitized
+        if isinstance(payload, list):
+            return [self._sanitize_tool_output(item) for item in payload]
+        return payload
+
+    def _merge_context(self, context: dict[str, Any], output: Any) -> None:
+        if isinstance(output, dict):
+            for key, value in output.items():
+                context[key] = value
+
     def execute(
         self,
         run_id: str,
         plan: list[PlanStep],
         context: dict[str, Any],
     ) -> RunState:
-        """执行计划并持久化步骤状态。
-
-        TODO(candidate/P0): 实现可恢复的多步骤执行、工具入参渲染、
-        步骤事件持久化、错误处理和最终业务结果汇总。
-        """
+        """执行计划并持久化步骤状态。"""
         state = RunState(
             run_id=run_id,
             status="failed",
@@ -40,4 +65,34 @@ def execute(
             ],
         )
         self.state_store.save(state)
-        raise NotImplementedError("TODO(candidate/P0): 实现 Agent 执行器。")
+
+        for index, step in enumerate(plan):
+            step_state = state.steps[index]
+            step_state.status = "running"
+            self.state_store.save(state)
+
+            if step.tool_name == "oa.create_approval_draft" and context.get("skip_oa", False):
+                step_state.status = "skipped"
+                step_state.output = {"reason": "oa approval skipped by policy or permission"}
+                self.state_store.save(state)
+                continue
+
+            try:
+                tool_args = self._render_value(step.input_template, context)
+                output = self.registry.call(step.tool_name, tool_args)
+                sanitized_output = self._sanitize_tool_output(output)
+                step_state.output = sanitized_output
+                step_state.status = "completed"
+                self._merge_context(context, sanitized_output)
+                self.state_store.save(state)
+            except Exception as exc:
+                step_state.status = "failed"
+                step_state.error = str(exc)
+                state.status = "failed"
+                self.state_store.save(state)
+                return state
+
+        state.status = "completed"
+        state.result = {k: v for k, v in context.items() if k not in {"prompt", "user_permissions", "analysis_only", "requires_approval", "user_has_oa_write", "skip_oa"}}
+        self.state_store.save(state)
+        return state
diff --git a/agentops_assessment/agent/planner.py b/agentops_assessment/agent/planner.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import re
 from dataclasses import dataclass, field
 from typing import Any
 
@@ -15,21 +16,96 @@ class PlanStep:
 
 
 class Planner:
+    SKU_PATTERN = re.compile(r"\bSKU-[A-Z0-9]+\b", re.IGNORECASE)
+    ANALYSIS_ONLY_PATTERNS = [
+        re.compile(r"只分析", re.IGNORECASE),
+        re.compile(r"不创建", re.IGNORECASE),
+        re.compile(r"不要创建", re.IGNORECASE),
+        re.compile(r"不生成.*审批", re.IGNORECASE),
+        re.compile(r"仅分析", re.IGNORECASE),
+    ]
+    APPROVAL_INTENT_PATTERNS = [
+        re.compile(r"审批草稿", re.IGNORECASE),
+        re.compile(r"补货审批", re.IGNORECASE),
+        re.compile(r"创建审批", re.IGNORECASE),
+        re.compile(r"审批建议", re.IGNORECASE),
+        re.compile(r"补货建议", re.IGNORECASE),
+    ]
+
     def __init__(self, llm: FakeLLM | None = None) -> None:
         self.llm = llm or FakeLLM()
 
-    def create_plan(self, prompt: str, context: dict[str, Any] | None = None) -> list[PlanStep]:
-        """为业务请求创建多步骤工具计划。
+    def _extract_sku(self, prompt: str) -> str | None:
+        match = self.SKU_PATTERN.search(prompt)
+        return match.group(0).upper() if match else None
+
+    def _is_analysis_only(self, prompt: str) -> bool:
+        return any(pattern.search(prompt) for pattern in self.ANALYSIS_ONLY_PATTERNS)
+
+    def _requires_approval(self, prompt: str) -> bool:
+        return any(pattern.search(prompt) for pattern in self.APPROVAL_INTENT_PATTERNS)
 
-        TODO(candidate/P0): 推断 SKU 和业务意图，选择必要工具，并返回一个
-        确定性的计划。计划应覆盖 ERP、BI、知识库、必要的供应商风险
-        和可能的 OA 审批步骤，不能写死单个用户、SKU 或样例 prompt。
-        """
+    def create_plan(self, prompt: str, context: dict[str, Any] | None = None) -> list[PlanStep]:
+        """为业务请求创建多步骤工具计划。"""
         self.llm.complete(prompt)
-        return [
+        sku = self._extract_sku(prompt)
+        if not sku:
+            raise ValueError("无法识别 SKU。请在提示中包含有效 SKU。")
+
+        analysis_only = self._is_analysis_only(prompt)
+        include_oa = not analysis_only and self._requires_approval(prompt)
+
+        plan: list[PlanStep] = [
             PlanStep(
-                id="understand_request",
-                tool_name="llm.summarize",
-                description="占位步骤。请替换为真实的业务执行计划。",
-            )
+                id="erp_inventory",
+                tool_name="erp.get_inventory",
+                description="获取ERP库存信息",
+                input_template={"sku": sku},
+            ),
+            PlanStep(
+                id="bi_sales",
+                tool_name="bi.get_sales",
+                description="获取BI销售预测数据",
+                input_template={"sku": sku},
+            ),
+            PlanStep(
+                id="knowledge_search",
+                tool_name="knowledge.search",
+                description="检索知识库中的库存与审批规则",
+                input_template={
+                    "query": prompt,
+                    "user_permissions": "{user_permissions}",
+                    "top_k": 3,
+                },
+            ),
+            PlanStep(
+                id="supplier_risk",
+                tool_name="supplier.get_risk",
+                description="获取供应商风险数据",
+                input_template={"supplier_id": "{supplier_id}"},
+            ),
         ]
+        if include_oa:
+            plan.append(
+                PlanStep(
+                    id="oa_draft",
+                    tool_name="oa.create_approval_draft",
+                    description="创建补货审批草稿",
+                    input_template={
+                        "sku": sku,
+                        "warehouse": "{warehouse}",
+                        "stock_gap": "{stock_gap}",
+                        "forecast_units_next_14d": "{forecast_units_next_14d}",
+                        "supplier_id": "{supplier_id}",
+                        "approval_type": "inventory_replenishment",
+                    },
+                )
+            )
+        return plan
+
+    def parse_intent(self, prompt: str) -> dict[str, Any]:
+        return {
+            "sku": self._extract_sku(prompt),
+            "analysis_only": self._is_analysis_only(prompt),
+            "requires_approval": self._requires_approval(prompt),
+        }
diff --git a/agentops_assessment/agent/tools.py b/agentops_assessment/agent/tools.py
@@ -13,6 +13,20 @@
 
 ToolCallable = Callable[[dict[str, Any]], dict[str, Any]]
 
+SENSITIVE_KEYS = {"vendor_secret", "unit_cost_usd"}
+
+
+def sanitize_payload(value: Any) -> Any:
+    if isinstance(value, dict):
+        return {
+            key: sanitize_payload(val)
+            for key, val in value.items()
+            if key not in SENSITIVE_KEYS
+        }
+    if isinstance(value, list):
+        return [sanitize_payload(item) for item in value]
+    return value
+
 
 class ToolRegistry:
     def __init__(self, retry_attempts: int = 1) -> None:
@@ -67,9 +81,7 @@ def call(self, name: str, args: dict[str, Any]) -> dict[str, Any]:
             self.last_call_attempts[name] = attempts
             try:
                 result = self._tools[name](args)
-                # TODO(candidate/P1): 规范化工具输出，并对敏感字段做脱敏；
-                # vendor_secret、unit_cost_usd 等不得进入 result/events/audit。
-                return result
+                return sanitize_payload(result)
             except TransientIntegrationError as exc:
                 last_error = exc
                 continue