feat: honor a declared primary host NIC when selecting the boot device#2690
feat: honor a declared primary host NIC when selecting the boot device#2690chet wants to merge 1 commit into
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (10)
🚧 Files skipped from review as they are similar to previous changes (10)
Summary by CodeRabbit
WalkthroughA new ChangesDeclared Primary NIC propagation through DHCP and ingestion
Sequence Diagram(s)sequenceDiagram
rect rgba(173, 216, 230, 0.5)
Note over DHCPClient,machine_interfaces: DHCP Discovery Path
DHCPClient->>discover_dhcp: DHCP packet (mac_address)
discover_dhcp->>ExpectedMachineData: declared_primary_mac()
ExpectedMachineData-->>discover_dhcp: Option~MacAddress~
discover_dhcp->>find_or_create_machine_interface: is_primary = Some(mac == declared)
find_or_create_machine_interface->>set_primary_interface: reconcile if differs
find_or_create_machine_interface-->>discover_dhcp: machine_interface
end
rect rgba(144, 238, 144, 0.5)
Note over SiteExplorer,machine_interfaces: Zero-DPU Ingestion Path
SiteExplorer->>create_zero_dpu_machine: ExpectedMachineData + observed MACs
create_zero_dpu_machine->>ExpectedMachineData: declared_primary_mac()
ExpectedMachineData-->>create_zero_dpu_machine: Option~MacAddress~
create_zero_dpu_machine->>set_primary_interface: reconcile existing unowned interface
create_zero_dpu_machine->>predicted_machine_interface_create: primary_interface = is_declared_primary
predicted_machine_interface_create-->>create_zero_dpu_machine: PredictedMachineInterface
create_zero_dpu_machine->>move_predicted_machine_interface_to_machine: promote prediction
move_predicted_machine_interface_to_machine->>set_primary_interface: reconcile if current_primary differs
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai PTAL, thank you! |
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/api-core/src/handlers/bmc_endpoint_explorer.rs (1)
1320-1355:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate outdated comment about predictions and primary flag.
The comment at lines 1322-1323 states "Predictions hold no primary flag," which is no longer accurate after this PR introduces the
primary_interfacefield onPredictedMachineInterface. Predictions now carry the declared primary intent. The comment should be revised to reflect that these specific test predictions are non-primary (as created by thepredicted()helper withprimary_interface: false), not that predictions categorically lack a primary flag.📝 Suggested comment revision
- // Predictions hold no primary flag, so with several (a report listing - // SuperNICs alongside the boot NIC) the declared intent is unknowable: + // When no prediction is marked primary, with several predictions (a report listing + // SuperNICs alongside the boot NIC) the declared intent is unknowable:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/api-core/src/handlers/bmc_endpoint_explorer.rs` around lines 1320 - 1355, Update the outdated comment in the test function no_mac_multiple_predictions_refuse_to_guess_a_boot_device. The current comment incorrectly states that predictions hold no primary flag, but the PredictedMachineInterface struct now includes a primary_interface field. Revise the comment to clarify that predictions do support a primary flag, but the specific test predictions created by the predicted() helper function are marked as non-primary, and the test verifies that resolution refuses to guess the boot device when facing multiple non-primary predictions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/api-core/src/handlers/bmc_endpoint_explorer.rs`:
- Around line 1320-1355: Update the outdated comment in the test function
no_mac_multiple_predictions_refuse_to_guess_a_boot_device. The current comment
incorrectly states that predictions hold no primary flag, but the
PredictedMachineInterface struct now includes a primary_interface field. Revise
the comment to clarify that predictions do support a primary flag, but the
specific test predictions created by the predicted() helper function are marked
as non-primary, and the test verifies that resolution refuses to guess the boot
device when facing multiple non-primary predictions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 68671db3-5b89-481d-a88b-dd9d806611be
📒 Files selected for processing (10)
crates/api-core/src/dhcp/discover.rscrates/api-core/src/handlers/bmc_endpoint_explorer.rscrates/api-core/src/tests/expected_machine.rscrates/api-db/migrations/20260616215917_predicted_machine_interface_primary.sqlcrates/api-db/src/machine_interface.rscrates/api-db/src/predicted_machine_interface.rscrates/api-model/src/expected_machine.rscrates/api-model/src/predicted_machine_interface.rscrates/site-explorer/src/machine_creator.rscrates/site-explorer/tests/zero_dpu.rs
ExpectedHostNic.primary is meant to say "this NIC is the host's boot interface," but ingestion ignored it: a zero-DPU or NIC-mode host's predicted interfaces always promoted as non-primary, so the declared NIC never stuck and the host booted via the lowest-MAC fallback until an operator ran set-primary-interface. The declared primary is now authoritative across the writers that own a NIC -- recorded on the prediction so it survives promotion, and honored by the DHCP path. This also settles the pre-ingestion window for multi-NIC hosts: several NICs that lease before ingestion each default to primary, and adopting the second one tripped the one_primary_interface_per_machine index and failed the host's ingestion. Ingestion now reconciles the adopted rows so exactly one interface is primary. - Record the declared primary on predicted_machine_interfaces and resolve it through one ExpectedMachineData::declared_primary_mac() the writers share; promotion and the DHCP find-or-create path both honor it. - Reconcile the adopted interfaces at zero-DPU ingestion to exactly one primary, ending the OnePrimaryInterface adoption failure. - Nothing declared -> today's automation stands: DPU takeover during ingestion, else the pick_boot_interface lowest-MAC fallback. Tests cover the declared primary surviving DHCP arrival order, promotion landing it on the real row, multi-NIC adoption no longer colliding, and a resolver unit test. Part of NVIDIA#2657 (epic NVIDIA#2660). The genuine-DPU-mode case -- a declared integrated NIC while the DPU stays managed -- is tracked separately in Signed-off-by: Chet Nichols III <chetn@nvidia.com> NVIDIA#2668.
5c25a50 to
0d1d8e3
Compare
|
@coderabbitai Thanks for the eyes. Updated -- revised the stale "predictions hold no primary flag" note in three spots (the |
|
(ᵔᴥᵔ)🐇 ✅ Action performedReview finished.
|
What
Makes a host's declared
ExpectedHostNic.primaryauthoritative when selecting its boot device, for the paths that own a NIC today (zero-DPU / NIC-mode hosts).primaryis meant to declare the boot interface, but ingestion ignored it -- predicted interfaces always promoted as non-primary, so a declared NIC never stuck and the host booted via the lowest-MAC fallback until an operator ranset-primary-interface. This records the declared primary on the prediction so it survives promotion, resolves it through oneExpectedMachineData::declared_primary_mac()the writers share, and honors it in the DHCP find-or-create path.It also settles the multi-NIC pre-ingestion window: several NICs leasing before ingestion each defaulted to primary, and adopting the second one tripped the
one_primary_interface_per_machineindex and failed the host's ingestion (OnePrimaryInterface). Ingestion now reconciles the adopted rows to exactly one primary.Nothing declared -> today's automation stands (DPU takeover during ingestion, else the
pick_boot_interfacelowest-MAC fallback).Scope
Part of #2657 (epic #2660). The genuine-DPU-mode case -- booting a declared integrated NIC while the DPU stays managed -- is split into #2668: there, demoting the DPU host interface must come together with owning the integrated NIC as the replacement primary admin interface (coupled by the admin-primary invariant), so it can't ride along here.
Test plan
cargo clippy --all-features --testsclean over the touched crates;cargo +nightly fmt.declared_primary_macresolver unit test; zero-DPU integration tests for declared-primary-via-prediction and multi-NIC no-collision adoption; an api-core DHCP arrival-order test. Existing seed + DPU ingestion tests still pass.