Summary
Sequential codex-companion task calls fail about 50% of the time with "codex app-server connection closed", in a near-perfect pass / fail / pass / fail alternation. This happens in a single session with no concurrency (one call at a time), and it is not project-specific: it reproduces from any cwd, including $HOME.
Auth and /codex:setup are healthy, and a directly spawned codex app-server (initialize + thread/start) is 100% reliable, so the app-server itself is fine. The failure is in broker reuse combined with the direct-retry fallback.
Env: plugin openai/codex-plugin-cc v1.0.5 (latest) · codex-cli 0.142.1 ·
Node v22.22.3 · macOS arm64.
Steps to reproduce
```sh
cd ~  # any directory; no project .codex needed
for i in $(seq 1 6); do
  node <plugin>/1.0.5/scripts/codex-companion.mjs task --fresh --effort low "Reply: $i"
done
# => ~3/6 fail with "codex app-server connection closed", alternating ok/FAIL/ok/FAIL/...
```
Forcing a fresh broker each run (delete state/<slug>-<hash>/broker.json before each
call) → 6/6 pass. That isolates the cause to broker reuse.
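The manual workaround can be scripted. The helper below is a minimal sketch, not part of the plugin: it assumes stateRoot points at the plugin's state/ directory (its exact location is not given in this report) and deletes every <slug>-<hash>/broker.json beneath it, so the next task call has to spawn a fresh broker. Calling it before each codex-companion invocation in the loop should mirror the manual deletion that produced 6/6 passes.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper (not part of the plugin). `stateRoot` is assumed to be the
// plugin's state/ directory; its exact location is not specified in this report.
// Deletes every <slug>-<hash>/broker.json so the next task call must spawn a fresh broker.
function clearBrokerState(stateRoot) {
  const removed = [];
  if (!fs.existsSync(stateRoot)) return removed; // nothing to clear
  for (const entry of fs.readdirSync(stateRoot, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const brokerFile = path.join(stateRoot, entry.name, "broker.json");
    if (fs.existsSync(brokerFile)) {
      fs.rmSync(brokerFile);
      removed.push(brokerFile);
    }
  }
  return removed;
}
```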
Root cause
task connects through the broker: CodexAppServerClient.connect() →
ensureBrokerSession(). The broker process exits after serving a single turn, but
its broker.json + unix socket linger briefly:
- Run N: no live broker, so a fresh broker is spawned and the turn succeeds. The broker then lingers.
- Run N+1: isBrokerEndpointReady() still connects to and initializes against that lingering broker, so it is reused. The broker drops the second turn, which surfaces through AppServerClientBase.handleExit(null) (scripts/lib/app-server.mjs:172) as Error("codex app-server connection closed.") with no rpcCode and no code. This failure tears down the broker.
- Run N+2: the broker is gone, so a fresh one is spawned and the turn succeeds. Hence the alternation.
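To make the alternation concrete, here is a toy model of the lifecycle described above. It is plain JavaScript, not plugin code, and never contacts codex: a single boolean stands for "a stale broker endpoint that still accepts initialize", and the clearBeforeEachRun option models deleting broker.json before every call.

```javascript
// Toy model of the reported lifecycle; this is not plugin code and does not contact codex.
function simulateSequentialRuns(runs, { clearBeforeEachRun = false } = {}) {
  // True while a stale broker.json + socket still accepts connect/initialize.
  let lingeringBroker = false;
  const results = [];
  for (let i = 0; i < runs; i++) {
    // Models deleting state/<slug>-<hash>/broker.json before the call.
    if (clearBeforeEachRun) lingeringBroker = false;
    if (lingeringBroker) {
      // Reused broker drops the turn -> "connection closed"; the failure tears it down.
      lingeringBroker = false;
      results.push("FAIL");
    } else {
      // Fresh broker spawned: the turn succeeds, then the exiting broker lingers.
      lingeringBroker = true;
      results.push("ok");
    }
  }
  return results;
}

console.log(simulateSequentialRuns(6).join("/")); // ok/FAIL/ok/FAIL/ok/FAIL
console.log(simulateSequentialRuns(6, { clearBeforeEachRun: true }).join("/")); // ok/ok/ok/ok/ok/ok
```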
The existing direct-mode fallback in withAppServer() (scripts/lib/codex.mjs:620-633) does not catch this. shouldRetryDirect only triggers on BROKER_BUSY_RPC_CODE, ENOENT, or ECONNREFUSED, never on a clean "connection closed" mid-turn, so the error is thrown to the caller instead of being retried in direct mode:
```javascript
const shouldRetryDirect =
  (client?.transport === "broker" && error?.rpcCode === BROKER_BUSY_RPC_CODE) ||
  (brokerRequested && (error?.code === "ENOENT" || error?.code === "ECONNREFUSED"));
```
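The gap is easiest to see by evaluating the condition against the error that handleExit(null) produces. The sketch copies the predicate into a pure function; BROKER_BUSY_RPC_CODE gets a placeholder value because the real constant is not shown in this report.

```javascript
// Placeholder value for this sketch only; the real constant lives in the plugin.
const BROKER_BUSY_RPC_CODE = -32001;

// Same condition as the current shouldRetryDirect, wrapped as a function for testing.
function shouldRetryDirectCurrent(client, error, brokerRequested) {
  return Boolean(
    (client?.transport === "broker" && error?.rpcCode === BROKER_BUSY_RPC_CODE) ||
      (brokerRequested && (error?.code === "ENOENT" || error?.code === "ECONNREFUSED"))
  );
}

// What handleExit(null) throws: a plain Error with no rpcCode and no code.
const dropped = new Error("codex app-server connection closed.");
console.log(shouldRetryDirectCurrent({ transport: "broker" }, dropped, true)); // false
```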
Suggested fix (verified locally)
Extend shouldRetryDirect to also retry direct when a broker connection drops:
```javascript
const brokerConnectionDropped =
  client?.transport === "broker" &&
  (error?.code === "ECONNRESET" ||
    error?.code === "EPIPE" ||
    /connection closed|exited unexpectedly/i.test(error?.message ?? ""));
const shouldRetryDirect =
  (client?.transport === "broker" && error?.rpcCode === BROKER_BUSY_RPC_CODE) ||
  brokerConnectionDropped ||
  (brokerRequested && (error?.code === "ENOENT" || error?.code === "ECONNREFUSED"));
```
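For context, this is roughly how the patched predicate slots into a broker-first, direct-fallback flow. It is a simplified sketch, not the actual withAppServer() body: connectBroker, connectDirect, and run are hypothetical stand-ins, brokerRequested is fixed to true, and BROKER_BUSY_RPC_CODE again uses a placeholder value.

```javascript
const BROKER_BUSY_RPC_CODE = -32001; // placeholder value for this sketch

// Patched predicate from above, wrapped as a function.
function shouldRetryDirect(client, error, brokerRequested) {
  const brokerConnectionDropped =
    client?.transport === "broker" &&
    (error?.code === "ECONNRESET" ||
      error?.code === "EPIPE" ||
      /connection closed|exited unexpectedly/i.test(error?.message ?? ""));
  return Boolean(
    (client?.transport === "broker" && error?.rpcCode === BROKER_BUSY_RPC_CODE) ||
      brokerConnectionDropped ||
      (brokerRequested && (error?.code === "ENOENT" || error?.code === "ECONNREFUSED"))
  );
}

// Hypothetical stand-in for withAppServer(): try the broker, then retry direct if allowed.
async function withAppServerSketch({ connectBroker, connectDirect, run }) {
  let client = null;
  try {
    client = await connectBroker();
    return await run(client);
  } catch (error) {
    if (!shouldRetryDirect(client, error, true)) throw error; // non-retryable: propagate
    const direct = await connectDirect(); // redo the turn on a directly spawned app-server
    return await run(direct);
  }
}

// Example: the broker drops the turn and the direct client completes it.
withAppServerSketch({
  connectBroker: async () => ({ transport: "broker" }),
  connectDirect: async () => ({ transport: "direct" }),
  run: async (client) => {
    if (client.transport === "broker") throw new Error("codex app-server connection closed.");
    return `turn completed via ${client.transport}`;
  },
}).then(console.log); // turn completed via direct
```

Any error that the predicate does not classify as retryable still reaches the caller unchanged.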
After this patch, and without clearing the broker, 6/6 and 5/5 consecutive runs pass in two different cwds. This simply restores the fallback's intended behavior. A more thorough fix could additionally avoid reusing a broker that is shutting down (e.g. a liveness/age check in isBrokerEndpointReady/ensureBrokerSession), but at reuse time the broker is still alive enough to accept and initialize, so the direct retry is what actually unblocks it.
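The "more thorough fix" could take the shape of a reuse guard consulted before a ready endpoint is trusted. This sketch assumes a broker record with pid and startedAt fields (hypothetical; the real broker.json schema is not shown in this report) and uses Node's process.kill(pid, 0), which tests for a process without sending a signal. Per the observations above, the lingering broker still accepts initialize at reuse time, so the liveness check alone would likely pass; whether an age cap helps depends on how long the broker lingers, and 30 s is an arbitrary value. That is why the direct retry remains the part that actually unblocks the failure.

```javascript
// Signal 0 performs only the existence/permission check; nothing is delivered.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (error) {
    return error.code === "EPERM"; // process exists but belongs to another user
  }
}

// Hypothetical guard: `record` is an assumed { pid, startedAt } shape, not the real schema.
function isBrokerReusable(record, { now = Date.now(), maxAgeMs = 30_000 } = {}) {
  // pid <= 0 would target process groups in process.kill, so reject it outright.
  if (!record || !Number.isInteger(record.pid) || record.pid <= 0) return false;
  if (!isProcessAlive(record.pid)) return false;
  const age = now - record.startedAt;
  return Number.isFinite(age) && age >= 0 && age <= maxAgeMs;
}
```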
Related but distinct
- … ensureBrokerSession/broker.json, but is about concurrent same-cwd invocations. This report is purely sequential; there is no race.
- … dropping the next turn during normal use.
- … jobs.json, "Task is still running"), a different failure path and error message.