fix(net): break tx loop on non-recoverable errors to prevent busy-loop #606

Open
areporeporepo wants to merge 1 commit into containers:main from areporeporepo:fix/tx-busy-loop-on-persistent-errors

@areporeporepo

Summary

Fixes #602.

When process_tx() returns a persistent error other than NothingWritten (e.g. Internal or ProcessNotRunning), the loop in process_tx_loop() continues spinning if the guest keeps supplying TX buffers, because tx_has_deferred_frame is false and has_new_entries is true:

process_tx() → Err(Internal) → tx_has_deferred_frame = false
enable_notification() → has_new_entries = true

Fix

Break immediately on non-recoverable errors. The worker will be re-entered on the next:

  • TX queue kick (guest adds new descriptors)
  • Backend socket writable event (edge-triggered epoll OUT)

This is safe because an immediate retry will not resolve an Internal or ProcessNotRunning error. Waiting for the next epoll event avoids the spin and gives the backend a chance to recover or the guest driver a chance to reset.
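
As a rough illustration of the reasoning above, the following self-contained sketch models the loop with a backend that always fails. Only the names process_tx, process_tx_loop, enable_notification, tx_has_deferred_frame, TxError and WriteError come from this PR; the Worker struct, the pending counter, the break_on_error switch, and the assumption that each failed call consumes one queued buffer are hypothetical stand-ins, not the actual worker code.

```rust
// Hypothetical, simplified model of the TX loop. The structure is an
// assumption made for illustration and is not the real worker code.

#[derive(Debug)]
#[allow(dead_code)]
enum WriteError {
    NothingWritten,
    Internal,
    ProcessNotRunning,
}

#[derive(Debug)]
enum TxError {
    Backend(WriteError),
}

struct Worker {
    /// Buffers the guest has queued (stand-in for the virtio TX queue).
    pending: usize,
    /// How many times process_tx() has been called.
    calls: usize,
    tx_has_deferred_frame: bool,
}

impl Worker {
    fn new(pending: usize) -> Self {
        Worker { pending, calls: 0, tx_has_deferred_frame: false }
    }

    /// Stand-in backend that always fails with a persistent error.
    /// Assumption: each failed call consumes one queued buffer.
    fn process_tx(&mut self) -> Result<(), TxError> {
        self.calls += 1;
        self.pending = self.pending.saturating_sub(1);
        Err(TxError::Backend(WriteError::Internal))
    }

    /// Stand-in for enable_notification(): true while buffers remain.
    fn enable_notification(&self) -> bool {
        self.pending > 0
    }

    fn process_tx_loop(&mut self, break_on_error: bool) {
        loop {
            self.tx_has_deferred_frame = match self.process_tx() {
                Err(TxError::Backend(WriteError::NothingWritten)) => true,
                Err(e) => {
                    eprintln!("tx error: {e:?}");
                    if break_on_error {
                        break; // behavior proposed in this PR
                    }
                    false // previous behavior
                }
                _ => false,
            };
            if self.tx_has_deferred_frame {
                break;
            }
            let has_new_entries = self.enable_notification();
            if !has_new_entries {
                break;
            }
        }
    }
}

fn main() {
    let mut old = Worker::new(5);
    old.process_tx_loop(false);
    println!("without break: {} calls", old.calls);

    let mut fixed = Worker::new(5);
    fixed.process_tx_loop(true);
    println!("with break: {} calls", fixed.calls);
}
```

In this model the old behavior calls process_tx() once per queued buffer even though every call fails, while the new behavior stops after the first failure and leaves the remaining buffers for the next wake-up.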

Change

 self.tx_has_deferred_frame = match self.process_tx() {
     Err(TxError::Backend(WriteError::NothingWritten)) => true,
     Err(e) => {
-        false
+        break;
     }
     _ => false,
 };
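
One detail of the change may be worth spelling out: `break` is an expression of the never type, so it type-checks as a match arm in a `bool` assignment, and when it fires the assignment itself is skipped. The sketch below shows this with a hypothetical TxOutcome enum and run function (neither is part of this PR); whether the retained flag value matters in the real worker depends on surrounding code not shown here.

```rust
// Hypothetical illustration of `break` inside a match that feeds an
// assignment. TxOutcome and run() are made up for this sketch.

enum TxOutcome {
    Sent,
    NothingWritten,
    Fatal,
}

/// Returns (number of attempts made, final value of the deferred flag).
fn run(outcomes: &[TxOutcome]) -> (usize, bool) {
    let mut attempts = 0;
    let mut deferred = false;
    for outcome in outcomes {
        attempts += 1;
        deferred = match outcome {
            TxOutcome::NothingWritten => true,
            // `break` has type `!`, which coerces to bool; the
            // assignment to `deferred` does not happen on this path.
            TxOutcome::Fatal => break,
            TxOutcome::Sent => false,
        };
    }
    (attempts, deferred)
}

fn main() {
    use TxOutcome::*;
    // The flag set by NothingWritten survives the break on Fatal,
    // and the trailing Sent is never attempted.
    println!("{:?}", run(&[NothingWritten, Fatal, Sent]));
}
```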

Testing

  • cargo clippy --features net -- -D warnings passes
  • cargo fmt -- --check passes
  • Unit tests are Linux-only and were not run locally; relying on the PR's CI checks

When process_tx() returns a persistent error other than NothingWritten
(e.g. Internal or ProcessNotRunning), the loop in process_tx_loop()
continues spinning if the guest keeps supplying TX buffers, because
tx_has_deferred_frame is false and has_new_entries is true.

Break immediately on non-recoverable errors instead of returning false.
The worker will be re-entered on the next TX queue kick or backend
socket writable event from epoll.

Fixes: containers#602

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: anh nguyen <anh.nqqq@icloud.com>
Signed-off-by: anh nguyen <29374105+areporeporepo@users.noreply.github.com>
@slp
Collaborator

slp commented Mar 27, 2026

/gemini review

@slp
Collaborator

slp commented Mar 27, 2026

@mtjhrc PTAL

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request modifies the virtio net worker to break the TX processing loop upon encountering non-recoverable errors. This change is intended to prevent potential busy-looping scenarios where a guest continues to provide TX buffers despite backend failures. I have no feedback to provide as there were no review comments to evaluate.

Development

Successfully merging this pull request may close these issues.

Possible busy-loop on process_tx errors

2 participants