
File descriptor leak in saveStateSnapshot #188

Closed
AnyCPU wants to merge 1 commit into basecamp:main from AnyCPU:fix/fd_leak_1

Conversation

AnyCPU commented Feb 13, 2026

Contributor

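saveStateSnapshot opens the snapshot file with os.Create but never closes it, so each call leaks one file descriptor. The function is called on every state-mutating RPC operation (deploy, remove, pause, resume, rollout). Under active redeployment, the leaked descriptors can exhaust the process file descriptor limit and cause subsequent file and socket operations to fail with EMFILE (too many open files).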

Root cause

The matching read function RestoreLastSavedState had the same bug. It was fixed in commit 1a660f7 (Sep 2024), but the write path was missed. The bug has been present since state persistence was introduced in commit 9451420 (Mar 2024).

Fix

Add defer f.Close() after the os.Create error check. Additionally, add slog.Error logging on all error paths, since every call site uses defer r.saveStateSnapshot() and discards the return value.


AnyCPU commented Feb 16, 2026

Contributor Author

The branch fix/fd_leak_1 is now fully superseded by the feature/atomic_state_1 work; see #195.

AnyCPU closed this Mar 24, 2026
AnyCPU deleted the fix/fd_leak_1 branch March 24, 2026 11:04
