Stabilize server integration test flakiness in CI websocket/DM paths #3

Merged: dlukt merged 3 commits into main from copilot/fix-cargo-test-failures on Mar 29, 2026

Copilot AI commented Mar 29, 2026

cargo test in the server CI job was intermittently failing (run 23707518059, job 69061787458), including websocket_authenticated_upgrade_emits_hello_envelope. Failures were timing- and scheduling-sensitive in high-concurrency integration runs.

  • WebSocket event wait logic hardening (test helper)

    • Updated websocket_read_json_with_op to consume the remaining deadline budget on each read instead of polling with fixed 250ms slices.
    • This removes under-wait behavior where loop overhead could exhaust effective wait time before the declared timeout.
  • Flaky hello handshake assertion

    • Increased the hello-event wait in websocket_authenticated_upgrade_emits_hello_envelope from 1_500 to 5_000 ms to absorb CI jitter during startup/handshake.
  • CI server test execution mode

    • Switched server workflow test command to single-threaded execution:
      • cargo test → RUST_TEST_THREADS=1 cargo test
    • This aligns CI with the integration suite’s non-thread-safe behavior and eliminates cross-test contention/race failures in websocket/DM cases.
async fn websocket_read_json_with_op(
    stream: &mut TcpStream,
    op: &str,
    total_timeout_ms: u64,
) -> Option<serde_json::Value> {
    let deadline = Instant::now() + Duration::from_millis(total_timeout_ms);
    while Instant::now() < deadline {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            break;
        }
        let remaining_ms = remaining.as_millis().min(u64::MAX as u128) as u64;
        if let Some(message) = websocket_read_text_frame(stream, remaining_ms).await {
            let Ok(value) = serde_json::from_str::<serde_json::Value>(&message) else {
                continue;
            };
            if value["op"] == op {
                return Some(value);
            }
        } else {
            break;
        }
    }
    None
}
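
The deadline-budget pattern used by the helper above also applies when one logical read is made of several blocking steps: each step gets only what is left of a single shared deadline, so the total wait cannot exceed the declared timeout. A minimal std-only sketch, assuming hypothetical names (`step_budget`, `read_all_steps`) that are not part of this PR, with a closure standing in for the real socket reads:

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: time left before `deadline`, or `None` once it has passed.
fn step_budget(deadline: Instant) -> Option<Duration> {
    let remaining = deadline.saturating_duration_since(Instant::now());
    if remaining.is_zero() {
        None
    } else {
        Some(remaining)
    }
}

/// Run `steps` sequential blocking reads against one shared deadline.
/// `read_step` receives the step index and the budget for that step,
/// and returns `false` if the step timed out or failed.
fn read_all_steps<F>(steps: usize, total: Duration, mut read_step: F) -> bool
where
    F: FnMut(usize, Duration) -> bool,
{
    let deadline = Instant::now() + total;
    for step in 0..steps {
        // Each step sees a shrinking budget instead of a fresh full timeout.
        let Some(budget) = step_budget(deadline) else {
            return false;
        };
        if !read_step(step, budget) {
            return false;
        }
    }
    true
}
```

With a fresh timeout per step, N steps could block for up to N times the timeout; with a shared deadline, the budget handed to each later step only shrinks.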

Warning

Firewall rules blocked me from connecting to one or more addresses:

  • stun.l.google.com
    • Triggering command: /home/REDACTED/work/discool/discool/server/target/debug/deps/discool_server-e99f972a8d89f01a (dns block)
    • Triggering command: /home/REDACTED/work/discool/discool/server/target/debug/discool-server (dns block)

Agent-Logs-Url: https://github.com/dlukt/discool/sessions/9d0293b0-c704-4f01-9570-3d4f3c427887

Co-authored-by: dlukt <201112286+dlukt@users.noreply.github.com>
dlukt marked this pull request as ready for review March 29, 2026 11:25
Copilot AI review requested due to automatic review settings March 29, 2026 11:25
Copilot AI left a comment

Pull request overview

This PR aims to stabilize failing Rust CI tests by adjusting WebSocket test helper timing behavior and aligning CI’s test execution with a single-threaded configuration.

Changes:

  • Updated websocket_read_json_with_op to compute a deadline-based timeout passed into websocket_read_text_frame.
  • Increased the “hello” WebSocket event wait time in one test.
  • Updated CI to run cargo test with RUST_TEST_THREADS=1.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| server/tests/server_binds_to_configured_port.rs | Adjusts WebSocket test read timing logic and increases a specific test timeout to reduce flakiness. |
| .github/workflows/ci.yml | Runs server tests single-threaded in CI to improve determinism. |

Comment on lines +309 to +318
        let remaining_ms = remaining.as_millis().min(u128::from(u64::MAX)) as u64;
        if let Some(message) = websocket_read_text_frame(stream, remaining_ms).await {
            let Ok(value) = serde_json::from_str::<serde_json::Value>(&message) else {
                continue;
            };
            if value["op"] == op {
                return Some(value);
            }
        } else {
            break;
Copilot AI Mar 29, 2026

websocket_read_text_frame applies timeout_ms separately to multiple read_exact calls (header/len/mask/payload). Passing the full remaining_ms here can make a single call block for a multiple of remaining_ms, exceeding total_timeout_ms and making test timeouts much longer than intended. Also, breaking on None changes behavior from “keep waiting until deadline” to “stop on first timeout/non-text frame”; since websocket_read_text_frame returns None for non-text opcodes (e.g., ping/pong), this can prematurely abort waiting for the desired op. Consider restoring a short poll interval (e.g., min(remaining, 250ms)) and only breaking when the deadline is reached, or change websocket_read_text_frame to distinguish timeout/close/non-text so websocket_read_json_with_op can continue appropriately.

Suggested change
-         let remaining_ms = remaining.as_millis().min(u128::from(u64::MAX)) as u64;
-         if let Some(message) = websocket_read_text_frame(stream, remaining_ms).await {
-             let Ok(value) = serde_json::from_str::<serde_json::Value>(&message) else {
-                 continue;
-             };
-             if value["op"] == op {
-                 return Some(value);
-             }
-         } else {
-             break;
+         // Use a capped poll interval to avoid a single call blocking for the full
+         // remaining timeout across multiple internal reads in websocket_read_text_frame.
+         let remaining_ms = remaining.as_millis().min(u128::from(u64::MAX)) as u64;
+         let poll_ms = remaining_ms.min(250);
+         if let Some(message) = websocket_read_text_frame(stream, poll_ms).await {
+             let Ok(value) = serde_json::from_str::<serde_json::Value>(&message) else {
+                 continue;
+             };
+             if value["op"] == op {
+                 return Some(value);
+             }
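
The other option raised in the comment, having the frame reader distinguish timeout, close, and non-text frames so the caller can keep waiting, might look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's code: `FrameRead` and `wait_for_text` are hypothetical names, the reader is a synchronous closure standing in for an async read over `TcpStream`, and the 250 ms cap simply mirrors the poll interval suggested above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical result type: one variant per reason a frame read can end.
#[derive(Debug, Clone, PartialEq)]
enum FrameRead {
    Text(String),
    NonText,  // e.g. ping, pong, or binary frames
    TimedOut, // nothing arrived within this poll
    Closed,   // peer closed the connection or the read failed
}

/// Wait until `read_frame` yields a text frame accepted by `is_wanted`,
/// or until `total` has elapsed. Non-text frames, unwanted text frames, and
/// per-poll timeouts keep the loop going; only `Closed` or the deadline end it.
fn wait_for_text<R, M>(mut read_frame: R, is_wanted: M, total: Duration) -> Option<String>
where
    R: FnMut(Duration) -> FrameRead,
    M: Fn(&str) -> bool,
{
    let deadline = Instant::now() + total;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return None;
        }
        // Cap each poll so a single read cannot consume the whole budget.
        let poll = remaining.min(Duration::from_millis(250));
        match read_frame(poll) {
            FrameRead::Text(text) if is_wanted(text.as_str()) => return Some(text),
            FrameRead::Text(_) | FrameRead::NonText | FrameRead::TimedOut => continue,
            FrameRead::Closed => return None,
        }
    }
}
```

The design point is that a ping or an unrelated envelope no longer looks the same as a dead connection, so the wait ends early only when further reads cannot succeed.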

        if remaining.is_zero() {
            break;
        }
        let remaining_ms = remaining.as_millis().min(u128::from(u64::MAX)) as u64;
Copilot AI Mar 29, 2026

remaining.as_millis() truncates sub-millisecond durations to 0; when remaining > 0 but < 1ms, this passes timeout_ms = 0 into websocket_read_text_frame, causing an immediate timeout and early break even though the overall deadline hasn’t been reached. Clamp remaining_ms to at least 1 when remaining is non-zero, or use a higher-resolution conversion (e.g., micros) before clamping.

Suggested change
-         let remaining_ms = remaining.as_millis().min(u128::from(u64::MAX)) as u64;
+         let remaining_ms =
+             std::cmp::max(1, remaining.as_millis().min(u128::from(u64::MAX)) as u64);
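
The truncation described in this comment is easy to reproduce with the standard library alone. The sketch below is illustrative: `remaining_to_timeout_ms` is a hypothetical helper name, not something in the PR, and it assumes the callee takes its timeout as whole milliseconds in a `u64`.

```rust
use std::time::Duration;

/// Hypothetical helper: convert a remaining duration into whole milliseconds,
/// rounding any non-zero remainder up to at least 1 ms.
fn remaining_to_timeout_ms(remaining: Duration) -> u64 {
    if remaining.is_zero() {
        return 0;
    }
    // `as_millis` truncates toward zero, so 999 µs becomes 0 ms.
    // Saturate at u64::MAX instead of casting, then clamp to a 1 ms minimum.
    let ms = u64::try_from(remaining.as_millis()).unwrap_or(u64::MAX);
    ms.max(1)
}
```

For example, `Duration::from_micros(500).as_millis()` is `0`, while the helper returns `1` for the same input, so a read issued in the last sub-millisecond of the budget still gets a non-zero timeout.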

Copilot AI changed the title from "[WIP] Fix failing cargo test cases" to "Stabilize server integration test flakiness in CI websocket/DM paths" Mar 29, 2026
Copilot AI requested a review from dlukt March 29, 2026 11:29
dlukt merged commit 8eca0cd into main Mar 29, 2026
2 checks passed