auto-recall can block gateway startup / first-turn path long enough to fail health checks #1452

@starm0on

Symptom

With auto-recall enabled and memories already stored, gateway startup or the first user turn can block long enough to fail health checks and trigger a chain of restarts.

Trigger

  • @memtensor/memos-local-openclaw-plugin@1.0.8
  • Linux + systemctl --user
  • memory records already exist in the local database
  • allowPromptInjection=true and auto-recall enabled
  • recall filtering path uses a slow/timeout-prone model

Minimal Reproduction

  1. Enable auto-recall.

  2. Have enough existing memories in the local database.

  3. Use a slow model on the recall/filter path.

  4. Restart the gateway, or send the first message after startup.

  5. On affected runs, the recall/filter work blocks the critical path for tens of seconds.

Actual Result

In the affected environment, this stalled the startup / first-turn path for roughly 30-40 seconds, long enough to fail health checks and contribute to restart loops.

Local Workaround

The local stopgap was to:

  • add a hard timeout around recall/filter LLM work
  • fail open on timeouts and errors
  • keep recall/filter exceptions from propagating to the gateway top level
  • keep startup readiness independent of slow recall work
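The timeout and fail-open parts of this workaround can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: `withTimeoutFailOpen`, `Memory`, and `slowFilter` are hypothetical names, and the real recall/filter call would take the place of `slowFilter`.

```typescript
// Hypothetical types and helpers; none of these names come from the plugin.
type Memory = { id: string; text: string };

// Run `work`, but give up after `timeoutMs` and return `fallback` instead.
// Any error from `work` (thrown or rejected) also yields `fallback`,
// so nothing propagates to the caller (fail-open).
async function withTimeoutFailOpen<T>(
  work: () => Promise<T>,
  timeoutMs: number,
  fallback: T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });
  try {
    // Whichever settles first wins: the real work or the timeout.
    return await Promise.race([work(), timeout]);
  } catch {
    // Fail open: treat any recall/filter error as "no result".
    return fallback;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Stand-in for a slow model call on the recall/filter path.
function slowFilter(memories: Memory[], delayMs: number): Promise<Memory[]> {
  return new Promise((resolve) => setTimeout(() => resolve(memories), delayMs));
}
```

With a wrapper like this, a call such as `withTimeoutFailOpen(() => slowFilter(memories, 30000), 2000, [])` would return an empty list after two seconds rather than holding the turn for thirty. Note that this sketch does not cancel the underlying request; it only stops waiting for it.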

Suggested Fix

Please move auto-recall filtering out of the startup-critical / first-turn-critical path, and enforce timeout + fail-open semantics for slow recall models. Recall enrichment should be optional context, not something that can delay readiness or destabilize the gateway.
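One possible shape for this, as a sketch under stated assumptions rather than a proposal for the plugin's internals: recall starts in the background, readiness never awaits it, and each turn takes whatever context is available within a small time budget. `BackgroundRecall` and `RecallContext` are hypothetical names, and the `recall` callback stands in for whatever the plugin does to load and filter memories.

```typescript
// Hypothetical names throughout; this is not the plugin's API.
type RecallContext = string[];

class BackgroundRecall {
  private result: RecallContext | null = null;
  private pending: Promise<void> | null = null;

  constructor(private readonly recall: () => Promise<RecallContext>) {}

  // Kick off recall without awaiting it, so startup is never blocked.
  start(): void {
    void this.ensureStarted();
  }

  // Start recall once. Errors (sync or async) are swallowed and
  // recorded as an empty context, so they never reach the gateway.
  private ensureStarted(): Promise<void> {
    if (this.pending === null) {
      this.pending = Promise.resolve()
        .then(() => this.recall())
        .then((ctx) => {
          this.result = ctx;
        })
        .catch(() => {
          this.result = [];
        });
    }
    return this.pending;
  }

  // Wait at most `budgetMs` for recall; return an empty context otherwise.
  async contextForTurn(budgetMs: number): Promise<RecallContext> {
    const pending = this.ensureStarted();
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<void>((resolve) => {
      timer = setTimeout(resolve, budgetMs);
    });
    await Promise.race([pending, deadline]);
    if (timer !== undefined) clearTimeout(timer);
    return this.result ?? [];
  }
}
```

In this arrangement the gateway would call `start()` during startup and report ready immediately. A first turn that arrives before recall finishes simply proceeds without recalled context, and later turns pick it up once it is available.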
