Symptom
With auto-recall enabled and existing memories present, gateway startup or the first user turn can block long enough to trigger health-check failures and restart loops.
Trigger
- `@memtensor/memos-local-openclaw-plugin@1.0.8`
- Linux + `systemctl --user`
- existing memory records already in the database
- `allowPromptInjection=true` and auto-recall enabled
- recall filtering path uses a slow or timeout-prone model
Minimal Reproduction
- Enable auto-recall.
- Have enough existing memories in the local database.
- Use a slow model on the recall/filter path.
- Restart the gateway, or send the first message after startup.
- On affected runs, the recall/filter work blocks the critical path for tens of seconds.
-
开启 auto-recall。
-
本地数据库里已有一定数量的记忆。
-
让 recall/filter 链路走一个慢模型。
-
重启 gateway,或在启动后发送第一条消息。
-
有问题的运行里,这条 recall/filter 工作会把关键路径阻塞几十秒。
Actual Result
In the affected environment, this could stall the startup / first-turn path for ~30-40s. That was enough to trip health checks and contribute to restart loops.
Local Workaround
The local mitigation was:
- add a hard timeout around the recall/filter LLM work
- fail open on timeouts and errors
- do not let recall/filter exceptions propagate to the gateway top level
- keep startup `ready` independent of slow recall work
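The mitigation above can be sketched in TypeScript, since the plugin ships as an npm package. This is a minimal illustration and not the plugin's code. `withTimeoutFailOpen`, `buildRecallContext`, `MemoryHit` and the 2000 ms budget are all hypothetical. The wrapper races the recall/filter call against a timer and resolves to a fallback value instead of throwing, so a slow or failing model cannot stall the turn or escape to the gateway top level.

```typescript
// Hypothetical shape of a recalled memory; the plugin's real types may differ.
interface MemoryHit {
  id: string;
  text: string;
}

// Run `work` under a hard timeout. On timeout or error, resolve to `fallback`
// instead of rejecting, so the caller never sees an exception (fail-open).
// Note: the underlying request is not cancelled; it just stops being awaited.
async function withTimeoutFailOpen<T>(
  work: () => Promise<T>,
  timeoutMs: number,
  fallback: T,
  onFailure: (reason: unknown) => void = () => {},
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => {
      onFailure(new Error(`recall/filter timed out after ${timeoutMs} ms`));
      resolve(fallback);
    }, timeoutMs);
  });
  // Promise.resolve().then(work) also turns a synchronous throw into a rejection;
  // .catch keeps late errors (even after the timeout won) from going unhandled.
  const guarded = Promise.resolve()
    .then(work)
    .catch((err: unknown) => {
      onFailure(err);
      return fallback;
    });
  try {
    return await Promise.race([guarded, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical call site: the LLM filter gets a 2 s budget (illustrative value).
async function buildRecallContext(
  hits: MemoryHit[],
  llmFilter: (hits: MemoryHit[]) => Promise<MemoryHit[]>,
): Promise<MemoryHit[]> {
  return withTimeoutFailOpen(() => llmFilter(hits), 2000, [], (reason) =>
    console.warn("auto-recall skipped:", reason),
  );
}
```

Here the fallback is an empty list, so the turn continues without recalled context. For a filter, passing the unfiltered `hits` as the fallback is the stricter reading of "fail open". If the model client accepts an `AbortSignal`, aborting it on timeout would also free the slow in-flight request. That depends on the client and is an assumption here.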
Suggested Fix
Please move auto-recall filtering out of the startup-critical / first-turn-critical path, and enforce timeout + fail-open semantics for slow recall models. Recall enrichment should be optional context, not something that can delay readiness or destabilize the gateway.
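Below is one way to take recall off the startup-critical path, sketched under stated assumptions. The gateway reports readiness before any recall work runs. Recall warm-up (for example, loading existing memories or priming the filter model) runs in the background without being awaited. A turn skips recall entirely until warm-up has succeeded, instead of waiting for it. `RecallBackend`, `GatewaySketch`, `warmUp` and `buildPrompt` are hypothetical names, because this report does not describe the plugin's actual hooks.

```typescript
// Hypothetical recall backend; names are illustrative only.
interface RecallBackend {
  warmUp(): Promise<void>; // e.g. load/index existing memories, prime the filter model
  recall(query: string): Promise<string[]>;
}

type RecallState = "warming" | "ready" | "failed";

class GatewaySketch {
  isReady = false;
  recallState: RecallState = "warming";
  private readonly backend: RecallBackend;

  constructor(backend: RecallBackend) {
    this.backend = backend;
  }

  // Readiness is reported first; warm-up is never awaited here, so it cannot
  // delay health checks. Both sync throws and rejections are contained.
  start(): void {
    this.isReady = true;
    Promise.resolve()
      .then(() => this.backend.warmUp())
      .then(
        () => {
          this.recallState = "ready";
        },
        (err: unknown) => {
          this.recallState = "failed"; // never reaches the gateway top level
          console.warn("auto-recall disabled for now:", err);
        },
      );
  }

  // Recall is optional context: until warm-up succeeds, the turn proceeds
  // without it rather than blocking on it.
  async buildPrompt(message: string): Promise<string> {
    if (this.recallState !== "ready") return message;
    try {
      const memories = await this.backend.recall(message);
      return memories.length > 0 ? `${memories.join("\n")}\n\n${message}` : message;
    } catch (err: unknown) {
      console.warn("recall failed, continuing without it:", err);
      return message;
    }
  }
}
```

After warm-up succeeds, the per-turn `recall` call is still awaited. In practice it would also need the hard timeout described in the local workaround.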