Describe the bug
The copilot CLI process can silently hang for many minutes (potentially indefinitely) with data permanently backed up in its TCP send buffers to GitHub API endpoints, while producing zero output to either its process log or the session events.jsonl. The user-facing TUI shows no indication that anything is wrong (no spinner change, no error, no "reconnecting…" message); the process simply stops making progress.
The underlying network condition appears to be a stale TCP connection (most likely after a network/route change on a multi-homed host such as VPN/Tailscale on/off, Wi-Fi → Ethernet handoff, etc.). The kernel keeps the socket in ESTAB because there is no TCP keepalive and no application-level write deadline, so the bytes sit in Send-Q forever. The CLI never times out, never retries on a fresh connection, and never surfaces a diagnostic to the user.
Observed signature
$ ss -tnpi | grep "pid=<copilot-pid>"
ESTAB 0 1481886 10.83.27.238:36816 140.82.112.22:443 users:(("MainThread",pid=92622,fd=129))
ESTAB 0 3154 10.83.27.238:40390 140.82.114.21:443 users:(("MainThread",pid=92622,fd=28))
ESTAB 0 2510 10.83.27.238:40396 140.82.114.21:443 users:(("MainThread",pid=92622,fd=39))
ESTAB 0 3042 10.83.27.238:47374 140.82.113.21:443 users:(("MainThread",pid=92622,fd=45))
All peer IPs reverse-resolve to lb-140-82-*-*-iad.github.com (GitHub's IAD load balancers). Notice the non-zero Send-Q values that never drain — these connections are dead from GitHub's perspective but the local kernel still considers them established.
Meanwhile:
$ ls -la --time-style=full-iso ~/.copilot/logs/process-*-92622.log \
~/.copilot/session-state/<sid>/events.jsonl
-rw-r--r-- ... 21:00:43 ... process-1779115412438-92622.log
-rw------- ... 20:48:45 ... <sid>/events.jsonl
…no writes for ~5 and ~17 minutes respectively, while the user is sitting at the prompt with no feedback. The process state is S (sleeping); /proc/<pid>/wchan shows it parked in epoll_wait, presumably waiting for EPOLLOUT on the dead sockets.
Affected version
GitHub Copilot CLI 1.0.48
(@github/copilot npm package, Linux x64 binary.)
Steps to reproduce the behavior
Hard to trigger at an arbitrary moment, but the following recipe reproduces it reliably on a multi-homed Linux host:
- Start a copilot session over a network interface that has a default route to GitHub (e.g. a VPN such as Tailscale providing the 10.83.27.238 address above).
- Issue any request that causes the CLI to open HTTPS connections to api.github.com / *.github.com.
- Bring that interface down, change the default route, or otherwise break the path without sending RSTs back to the local kernel (interface down, NAT box reboot, ISP blip, VPN reconnect, suspend/resume).
- Issue another request, or simply wait until the CLI tries to use one of the pooled HTTPS connections.
- The CLI hangs. No log lines, no events.jsonl entries, no TUI feedback. ss -tnpi on the PID shows non-zero Send-Q to GitHub IPs that never decreases.
It also reproduces "naturally" during ordinary daily use on machines that toggle between VPN/no-VPN or Wi-Fi/Ethernet.
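Breaking a real network path is awkward to script, so the following is only a rough local stand-in, sketched in Python (not the CLI's own language) for brevity. The helper names stalled_peer and write_until_stalled are hypothetical. A loopback server accepts a connection and never reads from it, so the client's send buffer fills and writes stop completing. This is not the same kernel state as a black-holed path (the peer here still ACKs and advertises a zero window), but the application-visible symptom is the same: a write that never finishes unless the application sets its own deadline.

```python
import socket
import threading


def stalled_peer():
    """Start a loopback TCP server that accepts one connection and never reads from it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    held = []  # keep the accepted socket referenced so the connection stays established

    def accept_once():
        conn, _ = server.accept()
        held.append(conn)

    threading.Thread(target=accept_once, daemon=True).start()
    return server, held


def write_until_stalled(port, deadline_s=1.0, chunk=b"x" * 65536, max_bytes=1 << 30):
    """Write until send() blocks longer than deadline_s.

    Returns (bytes_accepted_by_kernel, stalled).
    """
    sent = 0
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.settimeout(deadline_s)  # the application-level write deadline
        try:
            while sent < max_bytes:
                sent += client.send(chunk)
        except socket.timeout:
            return sent, True  # buffers are full and the peer is not draining them
    return sent, False
```

Without the settimeout() call, the same loop would block in send() for as long as the peer stays silent, which is the behavior this issue describes.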
Expected behavior
The CLI should not hang silently on a dead socket. At minimum, one or more of:
- Per-request write/read timeouts on the HTTP client used for GitHub API traffic, so a stalled socket is detected within a bounded time (e.g. tens of seconds) instead of relying on the kernel's default TCP retransmission timeout (which can be 15+ minutes).
- TCP keepalive (SO_KEEPALIVE with sensible TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT) on long-lived sockets so the kernel itself tears down dead connections faster.
- Connection-pool health checks that proactively close and replace sockets that have been idle across a known network change (or just replace on any I/O error / timeout rather than retrying on the same dead FD).
- User-visible feedback in the TUI when an outbound request has been pending longer than a short threshold — even just a "still waiting on api.github.com…" line — so the user can distinguish "the model is thinking" from "we are wedged on a dead socket".
- Either an automatic retry on a fresh connection, or a clear error surfaced to the user with guidance (e.g. /reconnect or similar).
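To make the first three items concrete, here is a minimal sketch in Python rather than in the CLI's own runtime, so it shows the shape of the idea and not the CLI's actual HTTP client. The names harden_socket and send_with_fresh_retry, and all default values, are assumptions chosen for illustration. TCP_USER_TIMEOUT is an addition not mentioned above: keepalive probes are aimed at idle connections, whereas on Linux TCP_USER_TIMEOUT bounds how long already-queued data may stay unacknowledged, which is the situation shown in the Send-Q output.

```python
import socket


def harden_socket(sock, idle_s=30, interval_s=10, probes=3, user_timeout_ms=30_000):
    """Ask the kernel to give up on a dead peer in bounded time (values are illustrative)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_* constants are platform-dependent, so set only the ones this platform has.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),               # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval_s),          # seconds between probes
        ("TCP_KEEPCNT", probes),                # failed probes before the socket is reset
        ("TCP_USER_TIMEOUT", user_timeout_ms),  # ms that sent data may remain unacknowledged
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def send_with_fresh_retry(connect, payload, deadline_s=20.0, attempts=2):
    """Send payload and read one reply; on timeout or I/O error, retry on a new connection.

    `connect` is a caller-supplied factory returning a connected socket. Errors raised
    by `connect` itself are not retried here.
    """
    last_error = None
    for _ in range(attempts):
        sock = connect()  # a new connection each attempt, never the FD that just failed
        try:
            harden_socket(sock)
            sock.settimeout(deadline_s)  # bounds sendall() as a whole and each recv() call
            sock.sendall(payload)
            return sock.recv(65536)
        except OSError as exc:  # socket.timeout is a subclass of OSError
            last_error = exc
        finally:
            sock.close()
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The point of the design is that a timeout is treated like any other I/O error: the socket is closed and the next attempt starts from a new connection, so one dead pooled socket costs at most deadline_s rather than the kernel's retransmission timeout.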
Additional context
- OS: Linux (CachyOS, kernel 7.0.6-1-cachyos, x86_64)
- Node: v26.1.0
- Install: @github/copilot global npm install, binary at /usr/lib/node_modules/@github/copilot/node_modules/@github/copilot-linux-x64/copilot
- Terminal / shell: standard pty, bash
- Network: multi-homed, with local LAN (192.168.1.12) + Tailscale (10.83.27.238) + IPv6. The stuck sockets in this incident were all bound to the Tailscale address; concurrently-healthy sockets on the same PID were on IPv6 and on the LAN address.
Diagnostic commands that help confirm the bug
# 1. Stuck send queues on copilot's sockets
ss -tnpi | awk 'NR>1 && /pid=<COPILOT_PID>/ && $3+0>0'
# 2. Process is parked in epoll_wait, not crashed
cat /proc/<COPILOT_PID>/wchan; echo
grep -E '^(State|Threads)' /proc/<COPILOT_PID>/status
# 3. No log/event writes despite the process being "running"
stat -c '%y %n' ~/.copilot/logs/process-*-<COPILOT_PID>.log
stat -c '%y %n' ~/.copilot/session-state/<SID>/events.jsonl
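Command 1 gives a single snapshot, while the signature described here is a Send-Q that never drains. A small helper can compare two snapshots taken some seconds apart. This is a sketch in Python that assumes the column layout shown under "Observed signature" (State, Recv-Q, Send-Q, local address, peer address); the function names are hypothetical. A busy but healthy upload could also show a non-shrinking queue across two samples, so a match is a hint, not proof.

```python
def send_queues(ss_output):
    """Map (local, peer) -> Send-Q for each ESTAB line of `ss -tn`-style output."""
    queues = {}
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port [Process]
        if len(fields) >= 5 and fields[0] == "ESTAB" and fields[2].isdigit():
            queues[(fields[3], fields[4])] = int(fields[2])
    return queues


def stuck_sockets(earlier, later):
    """Return connections whose Send-Q was non-zero and has not shrunk between snapshots."""
    before, after = send_queues(earlier), send_queues(later)
    return sorted(
        key
        for key, queued in after.items()
        if queued > 0 and before.get(key, 0) > 0 and queued >= before[key]
    )
```

Typical use would be to capture the output of ss -tnp twice, filtered to the copilot PID, and pass both strings to stuck_sockets. Continuation lines printed by the -i flag are skipped because they do not start with ESTAB.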
Workaround
- Run sudo ss -K dst <github-ip> dport = 443 to forcibly destroy the stuck sockets (requires CAP_NET_ADMIN).
- Or kill the copilot process and restart the session.
Either way, no in-product recovery is offered.
Related (not duplicates)
- #3257: "HTTP MCP servers fail with TypeError: fetch failed after idle period — CLI reuses dead pooled TCP connection" (same class of bug, a stale pooled connection, but for MCP transports rather than the main GitHub API client)