Skip to content

feat(zero-dpu): Allow flat VPC's to not belong to a network segment#2666

Open
kensimon wants to merge 1 commit into
NVIDIA:mainfrom
kensimon:unlink-vpc-and-segment
Open

feat(zero-dpu): Allow flat VPC's to not belong to a network segment#2666
kensimon wants to merge 1 commit into
NVIDIA:mainfrom
kensimon:unlink-vpc-and-segment

Conversation

@kensimon

@kensimon kensimon commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

This supports #2647.

In an environment without DPU's, we don't get dynamic network isolation, and the expectation is that there is a single, flat network address space, modeled as a single network segment. In this case, we still want to model VPC's for things like NVLink partitions, where tenants will have the understanding that the network is not isolated between VPC's.

To make this work, a network segment can no longer be the "link" that links an instance to a VPC. Before, an instance interface maps to a machine interface, which maps to a network segment, which maps to a VPC. But if multiple VPC's are all sharing a network segment, this link doesn't exist (namely, network_segments.vpc_id will be null), and so we need the instance address itself to store the VPC ID on it.

This change adds a vpc_id column to instance_addresses, so that we know the VPC of any given address without having to consult the network_segments table. It also changes the AllocateInstance API so that you can pass a auto_config object which allows specifying a VPC ID as part of the allocation request. (This replaces the auto: bool field, such that auto is implied if auto_config is set.)

Related issues

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

The --zero-dpu flag on nico-admin-cli instance allocate is now --flat-vpc-id=<vpc_id>. The old flag was just merged last week so it's doubtful anyone's relying on it yet (especially because the functionality isn't done yet.)

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

In an environment without DPU's, we don't get dynamic network isolation,
and the expectation is that there is a single, flat network address
space, modeled as a single network segment. In this case, we still want
to model VPC's for things like NVLink partitions, where tenants will
have the understanding that the network is not isolated between VPC's.

To make this work, a network segment can no longer be the "link" that links an
instance to a VPC. Before, an instance interface maps to a machine
interface, which maps to a network segment, which maps to a VPC. But if
multiple VPC's are all sharing a network segment, this link doesn't
exist (namely, network_segments.vpc_id will be null), and so we need the
instance address itself to store the VPC ID on it.

This change adds a vpc_id column to instance_addresses, so that we know
the VPC of any given address without having to consult the
network_segments table. It also changes the AllocateInstance API so that
you can pass a `auto_config` object which allows specifying a VPC ID as
part of the allocation request. (This replaces the `auto: bool` field,
such that auto is implied if `auto_config` is set.)
@kensimon kensimon requested a review from a team as a code owner June 16, 2026 20:57
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

The PR replaces the zero_dpu: bool / auto: bool mechanism with a structured auto_config: Option<InstanceNetworkAutoConfig> carrying a vpc_id field. The change propagates from the protobuf wire contract through RPC conversions, internal model types, a new database migration adding instance_addresses.vpc_id, core allocation validation, and the admin CLI's --flat-vpc-id argument.

Changes

Flat VPC / auto_config for zero-DPU instance allocation

Layer / File(s) Summary
Proto contract and RPC model conversions
crates/rpc/proto/forge.proto, crates/rpc/build.rs, crates/rpc/src/model/instance/config/network.rs, crates/rpc/src/model/instance/status/network.rs
Deprecates InstanceNetworkConfig.auto (field 2) and adds auto_config (field 3) with InstanceNetworkAutoConfig { vpc_id }. Adds vpc_id field 8 to InstanceInterfaceStatus. Derives serde::Serialize for the new proto type. Adds TryFrom<forge::InstanceNetworkAutoConfig>, rewrites RPC↔model conversions to set and read auto_config, and derives the deprecated auto boolean from auto_config.is_some().
Internal model types
crates/api-model/src/instance/config/network.rs, crates/api-model/src/instance/status/network.rs, crates/api-model/src/instance_address.rs, crates/network/src/virtualization.rs
Replaces InstanceNetworkConfig.auto: bool with auto_config: Option<InstanceNetworkAutoConfig>. Adds InstanceInterfaceConfig.vpc_id: Option<VpcId> cleared during diff and preserved by copy_existing_resources. Updates into_external_view, for_segment_ids, and for_vpc_prefix_id. Extends InstanceInterfaceStatus and InstanceAddress with vpc_id and propagates it through all construction paths.
Database schema, queries, and address allocation
crates/api-db/migrations/...vpc_id.sql, crates/api-db/src/vpc.rs, crates/api-db/src/instance_address.rs, crates/api-db/src/instance.rs, crates/api-db/src/instance_network_config.rs, crates/api-db/src/network_security_group.rs, crates/api-db/src/dpa_interface.rs
Adds vpc_id FK column to instance_addresses with backfill and indexes. Changes find_by_segment to return Option<Vpc>. Adds instance_addresses-count guard in try_delete. Rewrites find_ids and NSG propagation query to use instance_addresses.vpc_id directly. Introduces interface_vpc_id helper; updates validate and allocate to derive and persist vpc_id per interface. Updates add_inband_interfaces_to_config to gate on auto_config.vpc_id.
Core allocation and validation logic
crates/api-core/src/instance/mod.rs, crates/api-core/src/handlers/instance.rs, crates/api-core/src/handlers/dpu.rs, crates/api-core/src/ethernet_virtualization.rs
Adds validate_zero_dpu_auto_vpc enforcing tenant ownership, Flat virtualization, and fabric_interface_type == Nic. Switches zero-DPU path from network.auto to auto_config. Updates HostInband binding checks to allow unbound segments. Switches DPU rejection to auto_config.is_some(). Adds three-case DPU segment VPC validation. Updates update_instance_network_config immutability guard to use auto_config/vpc_id. Adds None-to-error mapping where find_by_segment returns Option.
Admin CLI: --flat-vpc-id, machine selection, and request building
crates/admin-cli/src/instance/allocate/args.rs, crates/admin-cli/src/instance/allocate/cmd.rs, crates/admin-cli/src/machine/show/cmd.rs, crates/admin-cli/src/rpc.rs, crates/admin-cli/src/instance/show/cmd.rs
Replaces --zero-dpu with --flat-vpc-id: Option<VpcId> in CLI args. Updates get_next_free_machine signature and branching. Adds flat_vpc_id allocation branch in build_instance_request validating Flat VPC and assembling auto_config. Updates final InstanceConfig wiring. Refactors instance show interface rendering to batch VPC resolution via join_all and adds get_vpc_by_id helper.
Test infrastructure and test updates
crates/api-core/src/test_support/network_segment.rs, crates/api-core/src/tests/common/api_fixtures/test_managed_host.rs, crates/api-core/src/tests/common/api_fixtures/*, crates/api-core/src/tests/instance_allocate.rs, crates/api-core/src/tests/instance.rs, crates/api-core/src/tests/vpc_peering.rs, crates/api-core/src/tests/..., crates/agent/src/tests/full.rs, crates/test-harness/src/network/controller.rs, crates/machine-a-tron/src/api_client.rs
Introduces FIXTURE_TENANT_ORG_ID replacing all hardcoded tenant org UUIDs. Adds TestManagedHost::from_rpc_machine and release_instances_from_vpcs. Populates auto_config: None / #[allow(deprecated)] auto: false on InstanceInterfaceConfig literals across all test files. Adds vpc_id: None to status fixtures. Adds new negative-path tests for missing and non-Flat VPC ID. Strengthens allocation tests to assert VPC ID propagation through RPC wire, interface status, persisted model, and instance_addresses.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as admin-cli
  participant RPC as build_instance_request
  participant CoreAlloc as api-core allocate_instance
  participant ValidateVpc as validate_zero_dpu_auto_vpc
  participant DB as instance_address::allocate
  participant Resp as RPC response

  CLI->>RPC: --flat-vpc-id=<VpcId>
  RPC->>RPC: validate Flat virtualization, build auto_config { vpc_id }
  RPC->>CoreAlloc: InstanceConfig { auto_config: Some { vpc_id }, interfaces: [] }
  CoreAlloc->>ValidateVpc: validate_zero_dpu_auto_vpc(vpc_id, tenant_org)
  ValidateVpc-->>CoreAlloc: Ok (Flat, Nic, tenant-owned)
  CoreAlloc->>CoreAlloc: validate HostInband segment binding vs vpc_id
  CoreAlloc->>DB: allocate addresses with interface_vpc_id per interface
  DB->>DB: INSERT instance_addresses(vpc_id)
  DB-->>CoreAlloc: allocated addresses
  CoreAlloc->>Resp: Instance { status.network.interfaces[*].vpc_id, network.auto_config.vpc_id }
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
crates/api-db/src/instance_network_config.rs (1)

75-80: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the invalid-argument message to reference auto_config instead of deprecated auto.

Line 76 still reports InstanceNetworkConfig.auto, which no longer matches the request contract and can mislead operators during troubleshooting.

Suggested wording update
-            "InstanceNetworkConfig.auto reached the resolver with {} \
+            "InstanceNetworkConfig.auto_config reached the resolver with {} \
              pre-existing interfaces; auto requests must arrive with an \
              empty interfaces list",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-db/src/instance_network_config.rs` around lines 75 - 80, The error
message in the DatabaseError::InvalidArgument call references the deprecated
field name `auto`, but the field has been renamed to `auto_config`. Update the
error message string on line 76 to replace "InstanceNetworkConfig.auto" with
"InstanceNetworkConfig.auto_config" so the error message accurately reflects the
current field name and prevents misleading operators during troubleshooting.
crates/api-core/src/instance/mod.rs (1)

935-944: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale auto error text to auto_config.

These user-facing errors still tell operators to set InstanceNetworkConfig.auto, but this path now requires/rejects auto_config.

Proposed fix
-        // Zero-DPU hosts (no DPU, or DPU in NIC mode) MUST use `auto`, because
+        // Zero-DPU hosts (no DPU, or DPU in NIC mode) MUST use `auto_config`, because
         // their only valid attachments are HostInband segments, and NICo knows
         // which one(s) the host is on. Conversely, hosts with DPUs cannot use
-        // `auto`, and are expected to enumerate their interfaces explicitly.
+        // `auto_config`, and are expected to enumerate their interfaces explicitly.
         if !mh_snapshot.has_managed_dpus() {
             let Some(requested_auto_config) = request.config.network.auto_config else {
                 return Err(CarbideError::InvalidArgument(format!(
-                    "zero-DPU host {} requires `InstanceNetworkConfig.auto = true`; cannot allocate an instance with explicitly-listed interfaces or with `auto = false`",
+                    "zero-DPU host {} requires `InstanceNetworkConfig.auto_config.vpc_id`; cannot allocate an instance without auto_config",
                     mh_snapshot.host_snapshot.id,
                 )));
             };
-            // `auto` is only valid on zero-DPU hosts; DPU-managed hosts must
+            // `auto_config` is only valid on zero-DPU hosts; DPU-managed hosts must
             // list their interfaces explicitly.
             if request.config.network.auto_config.is_some() {
                 return Err(CarbideError::InvalidArgument(format!(
-                    "host {} has DPUs; `InstanceNetworkConfig.auto` is only valid on zero-DPU hosts",
+                    "host {} has DPUs; `InstanceNetworkConfig.auto_config` is only valid on zero-DPU hosts",
                     mh_snapshot.host_snapshot.id,
                 )));
             }

Also applies to: 1022-1028

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/instance/mod.rs` around lines 935 - 944, The error
message in the zero-DPU host validation block references the outdated field name
`InstanceNetworkConfig.auto` but the code is now checking for `auto_config`.
Update the error message text to correctly reference
`InstanceNetworkConfig.auto_config` instead of `InstanceNetworkConfig.auto` to
match the actual field being validated. Additionally, apply the same correction
to any other similar error messages at lines 1022-1028 that still reference the
old field name.
crates/api-db/src/vpc.rs (1)

240-255: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not return soft-deleted VPCs from find_by_segment.

fetch_optional now models “segment is unbound”, but a segment whose vpc_id points at a soft-deleted VPC still returns Some(vpc). That lets allocation/validation treat deleted VPC membership as active.

Proposed fix
     .filter_relation(
         &ObjectColumnFilter::One(network_segment::IdColumn, &segment_id),
         Some("s"),
     );
+    query.push(" AND v.deleted IS NULL");
     query.push(" LIMIT 1");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-db/src/vpc.rs` around lines 240 - 255, The find_by_segment
function in the VPC query builder returns soft-deleted VPCs because it lacks a
filter to exclude them, allowing allocation/validation logic to treat deleted
VPC membership as active. Add a filter to the FilterableQueryBuilder in
find_by_segment to exclude soft-deleted VPCs by filtering on the appropriate
soft-delete column (such as deleted_at or similar field) in the vpcs table,
similar to how the function already filters by segment_id using filter_relation.
crates/rpc/src/model/instance/config/network.rs (1)

1002-1029: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Test fixture does not exercise the intended rejection path.

The test test_auto_rejects_explicit_interfaces sets auto_config: None, which means auto = config.auto_config.is_some() evaluates to false. Consequently, the check at line 142 (if auto && !config.interfaces.is_empty()) never triggers. The test currently passes because the conversion succeeds (no rejection occurs), and then the expect_err call fails with the wrong reason.

To correctly test that auto_config combined with non-empty interfaces is rejected, supply auto_config: Some(...):

Proposed fix
     #[test]
     fn test_auto_rejects_explicit_interfaces() {
         // `auto: true` and populated interfaces are mutually exclusive,
         // so this should error.
+        let vpc_id = VpcId::new();
         let rpc_config = rpc::InstanceNetworkConfig {
             interfaces: vec![rpc::InstanceInterfaceConfig {
                 function_type: rpc::InterfaceFunctionType::Physical as i32,
                 network_segment_id: Some(NetworkSegmentId::new()),
                 network_details: None,
                 device: None,
                 device_instance: 0,
                 virtual_function_id: None,
                 ip_address: None,
                 ipv6_interface_config: None,
                 routing_profile: None,
             }],
             #[allow(deprecated)]
-            auto: false,
-            auto_config: None,
+            auto: true,
+            auto_config: Some(forge::InstanceNetworkAutoConfig {
+                vpc_id: Some(vpc_id),
+            }),
         };
         let result: Result<InstanceNetworkConfig, _> = rpc_config.try_into();
         let err = result.expect_err("auto + non-empty interfaces should be rejected");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/rpc/src/model/instance/config/network.rs` around lines 1002 - 1029, In
the test function `test_auto_rejects_explicit_interfaces`, the `auto_config`
field is currently set to `None`, which causes `auto` to be derived as `false`
and prevents the intended rejection check from triggering. Change `auto_config:
None` to `auto_config: Some(...)` (with any valid auto configuration structure)
so that the condition `if auto && !config.interfaces.is_empty()` at line 142 is
properly evaluated and the rejection behavior is actually tested.
crates/admin-cli/src/instance/allocate/args.rs (1)

25-54: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a flat-VPC allocation example.

The command help now exposes --flat-vpc-id, but the examples only cover subnet and VPC-prefix allocation, leaving the zero-DPU replacement path undiscoverable.

Proposed help addition
 Allocate one instance onto a subnet:
     $ nico-admin-cli instance allocate --prefix-name eth0 --subnet 192.0.2.0/24
 
+Allocate one instance into a flat VPC for a zero-DPU host:
+    $ nico-admin-cli instance allocate --prefix-name eth0 \
+    --flat-vpc-id 12345678-1234-5678-90ab-cdef01234567
+
 Allocate several instances at once:
     $ nico-admin-cli instance allocate --number 4 --prefix-name eth0 \

As per coding guidelines, admin CLI examples should cover each meaningful axis of variation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/instance/allocate/args.rs` around lines 25 - 54, The
help documentation for the instance allocate command is missing an example
demonstrating the --flat-vpc-id flag, which is a key allocation option. Add a
new example in the after_long_help string that shows how to allocate an instance
using --flat-vpc-id instead of subnet or vpc-prefix-id. Follow the existing
example format and include it alongside the other allocation examples to make
this allocation path discoverable for users. This example should demonstrate the
zero-DPU replacement path as mentioned in the comment.

Source: Coding guidelines

🧹 Nitpick comments (4)
crates/api-core/src/tests/instance_config_update.rs (1)

901-903: ⚡ Quick win

Consolidate deprecated-field suppression into a single test helper.

These repeated #[allow(deprecated)] auto: false initializers are functionally correct but create high churn and spread lint suppression across the file. Prefer one helper that sets auto_config: None and contains the deprecation handling in one place.

Proposed refactor
+#[allow(deprecated)]
+fn test_network_config(
+    interfaces: Vec<rpc::InstanceInterfaceConfig>,
+) -> rpc::InstanceNetworkConfig {
+    rpc::InstanceNetworkConfig {
+        interfaces,
+        auto: false,
+        auto_config: None,
+    }
+}
...
-    let network = rpc::InstanceNetworkConfig {
-        interfaces: vec![...],
-        #[allow(deprecated)]
-        auto: false,
-        auto_config: None,
-    };
+    let network = test_network_config(vec![...]);

Also applies to: 969-971, 1028-1030, 1115-1117, 1188-1190, 1276-1278, 1344-1346, 1475-1477, 1543-1545, 1660-1662, 1702-1704, 1746-1748, 1884-1886, 1944-1946, 2004-2006

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/tests/instance_config_update.rs` around lines 901 - 903,
Create a single test helper function that encapsulates the initialization of the
deprecated `auto` field and `auto_config` field. Instead of repeating the
`#[allow(deprecated)]` attribute and the pattern `auto: false, auto_config:
None` at multiple test locations (lines 901-903, 969-971, 1028-1030, 1115-1117,
1188-1190, 1276-1278, 1344-1346, 1475-1477, 1543-1545, 1660-1662, 1702-1704,
1746-1748, 1884-1886, 1944-1946, 2004-2006), define a helper function that
builds the configuration struct with the deprecated field handling contained in
one place, then replace all occurrences of the scattered `#[allow(deprecated)]
auto: false, auto_config: None` pattern with calls to this helper function.

Source: Coding guidelines

crates/api-core/src/tests/instance.rs (2)

1033-1035: ⚡ Quick win

Centralize the deprecated auto field handling.

The repeated local #[allow(deprecated)] blocks make future removal noisy. Prefer ..Default::default() for these explicit-interface configs if available, or move the deprecated-field initialization behind one small test helper with a single documented allow. As per coding guidelines, “Avoid using #[allow(...)] unless you have a strong reason to do so.”

Representative cleanup
     rpc::InstanceNetworkConfig {
         interfaces,
-        #[allow(deprecated)]
-        auto: false,
-        auto_config: None,
+        ..Default::default()
     }

If the generated type cannot use Default cleanly, centralize the suppression instead:

#[allow(deprecated)]
fn explicit_network_config(
    interfaces: Vec<rpc::InstanceInterfaceConfig>,
) -> rpc::InstanceNetworkConfig {
    rpc::InstanceNetworkConfig {
        interfaces,
        auto: false,
        auto_config: None,
    }
}

Also applies to: 1849-1851, 2442-2444, 3030-3032, 3416-3418, 3506-3508, 3593-3595, 3749-3751, 3827-3829, 3965-3967, 4143-4145, 4242-4244, 4384-4386, 4522-4524, 4589-4591, 4745-4747, 5333-5335

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/tests/instance.rs` around lines 1033 - 1035, Centralize
the deprecated field handling across all instances in the test file. Create a
single test helper function (e.g., explicit_network_config) that contains the
`#[allow(deprecated)]` annotation and constructs the rpc::InstanceNetworkConfig
with the deprecated auto and auto_config fields set to false and None
respectively. Then replace all instances of the scattered #[allow(deprecated)]
blocks at lines 1033-1035, 1849-1851, 2442-2444, 3030-3032, 3416-3418,
3506-3508, 3593-3595, 3749-3751, 3827-3829, 3965-3967, 4143-4145, 4242-4244,
4384-4386, 4522-4524, 4589-4591, 4745-4747, and 5333-5335 with calls to this
centralized helper function, removing the redundant local allow annotations
entirely.

Source: Coding guidelines


167-178: ⚡ Quick win

Assert persisted instance_addresses.vpc_id in the record checks.

These tests now compute expected VPC IDs, but the rows read from instance_addresses still only validate address/prefix. Add direct record.vpc_id assertions so a regression that leaves the new DB association unset/wrong is caught even if config/status propagation still works. Based on PR objectives, instance_addresses.vpc_id is now the direct VPC association used by lookup paths.

Proposed assertions
     // This should the first IP. Algo does not look into machine_interface_addresses
     // table for used addresses for instance.
     assert_eq!(record.address.to_string(), "192.0.4.3");
+    assert_eq!(record.vpc_id, Some(vpc_ids[0]));
     assert_eq!(
         &record.address,
         network_config.interfaces[0]
     // This should the first IP. Algo does not look into machine_interface_addresses
     // table for used addresses for instance.
     assert_eq!(record.address.to_string(), "192.0.4.3");
+    assert_eq!(record.vpc_id, Some(vpc_id));
     assert_eq!(
         &record.address,
         network_config.interfaces[0]
     // This should the first IP. Algo does not look into machine_interface_addresses
     // table for used addresses for instance.
     assert_eq!(record.address.to_string(), "10.217.5.225");
+    assert_eq!(record.vpc_id, Some(vpc.id));
     assert_eq!(
         &record.address,
         network_config.interfaces[0]

Also applies to: 273-286, 355-359, 513-524, 2641-2641, 2671-2682

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/tests/instance.rs` around lines 167 - 178, The test code
now computes expected VPC IDs from segments using the join_all and
db::vpc::find_by_segment pattern, but the subsequent record assertions for
instance_addresses rows only validate address and prefix fields. Add direct
assertions that check record.vpc_id against the expected VPC IDs that were
computed in this block to ensure the database associations are correctly set.
Apply this fix at all locations where instance_addresses records are being
validated after computing VPC IDs, including the primary location (167-178) and
the other affected test locations mentioned in the comment (273-286, 355-359,
513-524, 2641-2641, 2671-2682).
crates/rpc/proto/forge.proto (1)

2749-2755: ⚡ Quick win

Update the contract comment to make auto_config the sole authoritative path.

Line 2749-Line 2751 still describes mutual exclusivity using deprecated auto. This can mislead API consumers and generated documentation; the rule should be documented against auto_config instead.

Suggested doc fix
-  // Mutually exclusive with `auto`: when `auto` is true, this
-  // MUST be empty, When `auto` is false, this lists the explicit
-  // interface configuration the caller wants applied.
+  // Mutually exclusive with `auto_config`: when `auto_config` is set, this
+  // MUST be empty. When `auto_config` is unset, this lists the explicit
+  // interface configuration the caller wants applied.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/rpc/proto/forge.proto` around lines 2749 - 2755, Update the contract
comment for the `interfaces` field (lines 2749-2751) to remove references to the
deprecated `auto` field and instead describe the mutual exclusivity relationship
with `auto_config`. The comment should make clear that `auto_config` is the
authoritative mechanism for controlling whether interfaces are automatically
configured or explicitly specified, ensuring API consumers and generated
documentation reflect the current contract rather than referencing deprecated
fields.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/admin-cli/src/instance/show/cmd.rs`:
- Around line 201-230: The current implementation uses filter_map and flatten
operations that remove missing VPC lookup results, causing the remaining VPCs to
shift to earlier indices and misalign with their corresponding interfaces.
Replace the filter_map(s.vpc_id).map(...) and
filter_map(c.network_segment_id).map(...) patterns with approaches that preserve
an Option<Vpc> for each interface instead of discarding missing results.
Specifically, for the auto_network branch, map each interface status to an
Option containing either the looked-up VPC or None if vpc_id is missing; for the
else branch, map each interface config to an Option containing either the
looked-up VPC or None if segment_id is missing or the lookup fails. Remove the
flatten() call since you now have Vec<Option<Vpc>> with proper 1:1 alignment,
and adjust the vpcs.get(idx) lookups to handle the Option wrapper appropriately
when displaying results.

In `@crates/admin-cli/src/machine/show/cmd.rs`:
- Around line 441-443: The current condition checking if flat_vpc_id.is_some()
returns the first eligible machine without verifying it is suitable for zero-DPU
flat-VPC auto allocation. Add an additional eligibility check in the if
statement to ensure the machine is a static-membership machine before returning
Some(machine), since the server rejects auto_config on DPU-backed machines and
this prevents batch allocation from trying valid hosts.

In `@crates/admin-cli/src/rpc.rs`:
- Around line 1267-1274: The error checking in the flat-vpc-id validation block
(in the condition with vpc_prefix_id, vf_vpc_prefix_id, and vf_subnet) currently
only rejects VPC prefixes and VF subnets, but it must also reject subnet,
explicit IP addresses, and IPv6 selector fields to prevent them from being
silently ignored. Expand the if condition to include checks for the subnet
field, any explicit IP selector fields, and IPv6 selector fields in the
allocate_instance struct, adding each to the logical OR chain with the existing
checks.

In `@crates/api-core/src/handlers/dpu.rs`:
- Around line 258-263: The error handling in the vpc::find_by_segment lookup
block incorrectly uses CarbideError::InvalidArgument when the network segment is
not found to be a member of a VPC. This is a system-side database state issue,
not a validation failure on caller input. Change the error variant from
CarbideError::InvalidArgument to CarbideError::FailedPrecondition to properly
classify this as a system failure and ensure correct error ownership semantics
and retry/alert behavior according to the project's error handling guidelines.

In `@crates/api-core/src/instance/mod.rs`:
- Line 31: The VPC-attached address creation in instance_addresses inserts for
requested_auto_config.vpc_id lacks the necessary VpcRowLock::Mutation lock,
allowing a race condition where concurrent VPC deletions can soft-delete a VPC
while addresses are still being committed to it. Acquire the
VpcRowLock::Mutation lock before performing the instance_addresses inserts for
the requested VPC to ensure mutual exclusion with concurrent VPC deletion
operations and prevent dangling addresses from being created for deleted VPCs.

In `@crates/api-db/migrations/20260615120000_instance_addresses_vpc_id.sql`:
- Around line 1-8: The migration adds a nullable vpc_id column to
instance_addresses and only partially backfills it via an UPDATE statement that
joins with network_segments. This leaves room for NULL vpc_id values in existing
or newly-inserted rows during a mixed-version rollout, violating the intended
VPC-address association contract. After the UPDATE backfill statement completes,
add an ALTER TABLE statement to the migration to enforce the vpc_id column as
NOT NULL, ensuring data consistency and preventing future null values from being
inserted into this critical field.

In `@crates/api-db/src/vpc.rs`:
- Around line 286-293: The instance_address_count_query contains a redundant
EXISTS subquery that checks if the VPC still exists (is not deleted). Since
instance_addresses are hard-deleted during instance release, this defensive
check is unnecessary. Simplify the query string to remove the EXISTS clause and
only check WHERE vpc_id=$1, reducing the query to a single simple condition.
Update the instance_address_count_query variable to contain only the simplified
SQL without the subquery that validates VPC existence.

In `@crates/machine-a-tron/src/api_client.rs`:
- Around line 291-293: Remove the #[allow(deprecated)] annotation and refactor
the InstanceNetworkConfig construction to use Default::default() instead of
manually setting the deprecated fields auto and auto_config. Keep only the
non-deprecated fields that need to be explicitly set, allowing the deprecated
fields to use their default values without suppressing the deprecation warning.
This maintains clippy cleanliness and follows coding guidelines that avoid
unnecessary lint suppressions.

---

Outside diff comments:
In `@crates/admin-cli/src/instance/allocate/args.rs`:
- Around line 25-54: The help documentation for the instance allocate command is
missing an example demonstrating the --flat-vpc-id flag, which is a key
allocation option. Add a new example in the after_long_help string that shows
how to allocate an instance using --flat-vpc-id instead of subnet or
vpc-prefix-id. Follow the existing example format and include it alongside the
other allocation examples to make this allocation path discoverable for users.
This example should demonstrate the zero-DPU replacement path as mentioned in
the comment.

In `@crates/api-core/src/instance/mod.rs`:
- Around line 935-944: The error message in the zero-DPU host validation block
references the outdated field name `InstanceNetworkConfig.auto` but the code is
now checking for `auto_config`. Update the error message text to correctly
reference `InstanceNetworkConfig.auto_config` instead of
`InstanceNetworkConfig.auto` to match the actual field being validated.
Additionally, apply the same correction to any other similar error messages at
lines 1022-1028 that still reference the old field name.

In `@crates/api-db/src/instance_network_config.rs`:
- Around line 75-80: The error message in the DatabaseError::InvalidArgument
call references the deprecated field name `auto`, but the field has been renamed
to `auto_config`. Update the error message string on line 76 to replace
"InstanceNetworkConfig.auto" with "InstanceNetworkConfig.auto_config" so the
error message accurately reflects the current field name and prevents misleading
operators during troubleshooting.

In `@crates/api-db/src/vpc.rs`:
- Around line 240-255: The find_by_segment function in the VPC query builder
returns soft-deleted VPCs because it lacks a filter to exclude them, allowing
allocation/validation logic to treat deleted VPC membership as active. Add a
filter to the FilterableQueryBuilder in find_by_segment to exclude soft-deleted
VPCs by filtering on the appropriate soft-delete column (such as deleted_at or
similar field) in the vpcs table, similar to how the function already filters by
segment_id using filter_relation.

In `@crates/rpc/src/model/instance/config/network.rs`:
- Around line 1002-1029: In the test function
`test_auto_rejects_explicit_interfaces`, the `auto_config` field is currently
set to `None`, which causes `auto` to be derived as `false` and prevents the
intended rejection check from triggering. Change `auto_config: None` to
`auto_config: Some(...)` (with any valid auto configuration structure) so that
the condition `if auto && !config.interfaces.is_empty()` at line 142 is properly
evaluated and the rejection behavior is actually tested.

---

Nitpick comments:
In `@crates/api-core/src/tests/instance_config_update.rs`:
- Around line 901-903: Create a single test helper function that encapsulates
the initialization of the deprecated `auto` field and `auto_config` field.
Instead of repeating the `#[allow(deprecated)]` attribute and the pattern `auto:
false, auto_config: None` at multiple test locations (lines 901-903, 969-971,
1028-1030, 1115-1117, 1188-1190, 1276-1278, 1344-1346, 1475-1477, 1543-1545,
1660-1662, 1702-1704, 1746-1748, 1884-1886, 1944-1946, 2004-2006), define a
helper function that builds the configuration struct with the deprecated field
handling contained in one place, then replace all occurrences of the scattered
`#[allow(deprecated)] auto: false, auto_config: None` pattern with calls to this
helper function.

In `@crates/api-core/src/tests/instance.rs`:
- Around line 1033-1035: Centralize the deprecated field handling across all
instances in the test file. Create a single test helper function (e.g.,
explicit_network_config) that contains the `#[allow(deprecated)]` annotation and
constructs the rpc::InstanceNetworkConfig with the deprecated auto and
auto_config fields set to false and None respectively. Then replace all
instances of the scattered #[allow(deprecated)] blocks at lines 1033-1035,
1849-1851, 2442-2444, 3030-3032, 3416-3418, 3506-3508, 3593-3595, 3749-3751,
3827-3829, 3965-3967, 4143-4145, 4242-4244, 4384-4386, 4522-4524, 4589-4591,
4745-4747, and 5333-5335 with calls to this centralized helper function,
removing the redundant local allow annotations entirely.
- Around line 167-178: The test code now computes expected VPC IDs from segments
using the join_all and db::vpc::find_by_segment pattern, but the subsequent
record assertions for instance_addresses rows only validate address and prefix
fields. Add direct assertions that check record.vpc_id against the expected VPC
IDs that were computed in this block to ensure the database associations are
correctly set. Apply this fix at all locations where instance_addresses records
are being validated after computing VPC IDs, including the primary location
(167-178) and the other affected test locations mentioned in the comment
(273-286, 355-359, 513-524, 2641-2641, 2671-2682).

In `@crates/rpc/proto/forge.proto`:
- Around line 2749-2755: Update the contract comment for the `interfaces` field
(lines 2749-2751) to remove references to the deprecated `auto` field and
instead describe the mutual exclusivity relationship with `auto_config`. The
comment should make clear that `auto_config` is the authoritative mechanism for
controlling whether interfaces are automatically configured or explicitly
specified, ensuring API consumers and generated documentation reflect the
current contract rather than referencing deprecated fields.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ae5d57a1-9f74-4c1a-ba99-6b896f9b9dea

📥 Commits

Reviewing files that changed from the base of the PR and between 7eb1969 and 208de58.

📒 Files selected for processing (46)
  • crates/admin-cli/src/instance/allocate/args.rs
  • crates/admin-cli/src/instance/allocate/cmd.rs
  • crates/admin-cli/src/instance/show/cmd.rs
  • crates/admin-cli/src/machine/show/cmd.rs
  • crates/admin-cli/src/rpc.rs
  • crates/agent/src/tests/full.rs
  • crates/api-core/src/cfg/file.rs
  • crates/api-core/src/ethernet_virtualization.rs
  • crates/api-core/src/handlers/dpu.rs
  • crates/api-core/src/handlers/instance.rs
  • crates/api-core/src/instance/mod.rs
  • crates/api-core/src/setup.rs
  • crates/api-core/src/test_support/network_segment.rs
  • crates/api-core/src/tests/client_resolution.rs
  • crates/api-core/src/tests/common/api_fixtures/instance.rs
  • crates/api-core/src/tests/common/api_fixtures/mod.rs
  • crates/api-core/src/tests/common/api_fixtures/test_managed_host.rs
  • crates/api-core/src/tests/common/network_segment.rs
  • crates/api-core/src/tests/instance.rs
  • crates/api-core/src/tests/instance_allocate.rs
  • crates/api-core/src/tests/instance_config_update.rs
  • crates/api-core/src/tests/machine_dhcp.rs
  • crates/api-core/src/tests/machine_network.rs
  • crates/api-core/src/tests/network_security_group.rs
  • crates/api-core/src/tests/network_segment.rs
  • crates/api-core/src/tests/vpc.rs
  • crates/api-core/src/tests/vpc_find.rs
  • crates/api-core/src/tests/vpc_peering.rs
  • crates/api-core/src/tests/vpc_prefix.rs
  • crates/api-db/migrations/20260615120000_instance_addresses_vpc_id.sql
  • crates/api-db/src/dpa_interface.rs
  • crates/api-db/src/instance.rs
  • crates/api-db/src/instance_address.rs
  • crates/api-db/src/instance_network_config.rs
  • crates/api-db/src/network_security_group.rs
  • crates/api-db/src/vpc.rs
  • crates/api-model/src/instance/config/network.rs
  • crates/api-model/src/instance/status/network.rs
  • crates/api-model/src/instance_address.rs
  • crates/machine-a-tron/src/api_client.rs
  • crates/network/src/virtualization.rs
  • crates/rpc/build.rs
  • crates/rpc/proto/forge.proto
  • crates/rpc/src/model/instance/config/network.rs
  • crates/rpc/src/model/instance/status/network.rs
  • crates/test-harness/src/network/controller.rs

Comment on lines +201 to +230
let vpcs: Vec<Vpc> = if auto_network {
futures::future::join_all(
if_status
.iter()
.filter_map(|s| s.vpc_id)
.map(|vpc_id| get_vpc_by_id(api_client, vpc_id)),
)
.await
.into_iter()
.collect::<Result<Vec<_>, _>>()?
.into_iter()
} else {
futures::future::join_all(if_configs.iter().filter_map(|c| c.network_segment_id).map(
|segment_id| async move {
get_vpc_for_interface_network_segment(api_client, segment_id).await
},
))
.await
.into_iter()
.collect::<Result<Vec<_>, _>>()?
.into_iter()
}
.flatten()
.collect();
if !auto_network && if_configs.len() != if_status.len() {
writeln!(&mut lines, "\tLENGTH MISMATCH")?;
} else {
for (idx, status) in if_status.iter().enumerate() {
let vpc = vpcs.get(idx);
let if_config = if_configs.get(idx);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve interface-to-VPC alignment when lookups are missing.

filter_map(...).flatten() shrinks the lookup result list, so a missing vpc_id, missing segment ID, or not-found VPC shifts later VPCs onto the wrong interface rows. Collect one Option<Vpc> per rendered interface.

Proposed fix
-        let vpcs: Vec<Vpc> = if auto_network {
+        let vpcs: Vec<Option<Vpc>> = if auto_network {
             futures::future::join_all(
                 if_status
                     .iter()
-                    .filter_map(|s| s.vpc_id)
-                    .map(|vpc_id| get_vpc_by_id(api_client, vpc_id)),
+                    .map(|s| async move {
+                        if let Some(vpc_id) = s.vpc_id {
+                            get_vpc_by_id(api_client, vpc_id).await
+                        } else {
+                            Ok(None)
+                        }
+                    }),
             )
             .await
             .into_iter()
             .collect::<Result<Vec<_>, _>>()?
-            .into_iter()
         } else {
-            futures::future::join_all(if_configs.iter().filter_map(|c| c.network_segment_id).map(
-                |segment_id| async move {
+            futures::future::join_all(if_configs.iter().map(|c| async move {
+                if let Some(segment_id) = c.network_segment_id {
                     get_vpc_for_interface_network_segment(api_client, segment_id).await
-                },
-            ))
+                } else {
+                    Ok(None)
+                }
+            }))
             .await
             .into_iter()
             .collect::<Result<Vec<_>, _>>()?
-            .into_iter()
         }
-        .flatten()
-        .collect();
         if !auto_network && if_configs.len() != if_status.len() {
             writeln!(&mut lines, "\tLENGTH MISMATCH")?;
         } else {
             for (idx, status) in if_status.iter().enumerate() {
-                let vpc = vpcs.get(idx);
+                let vpc = vpcs.get(idx).and_then(Option::as_ref);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/instance/show/cmd.rs` around lines 201 - 230, The
current implementation uses filter_map and flatten operations that remove
missing VPC lookup results, causing the remaining VPCs to shift to earlier
indices and misalign with their corresponding interfaces. Replace the
filter_map(s.vpc_id).map(...) and filter_map(c.network_segment_id).map(...)
patterns with approaches that preserve an Option<Vpc> for each interface instead
of discarding missing results. Specifically, for the auto_network branch, map
each interface status to an Option containing either the looked-up VPC or None
if vpc_id is missing; for the else branch, map each interface config to an
Option containing either the looked-up VPC or None if segment_id is missing or
the lookup fails. Remove the flatten() call since you now have Vec<Option<Vpc>>
with proper 1:1 alignment, and adjust the vpcs.get(idx) lookups to handle the
Option wrapper appropriately when displaying results.

Comment on lines +441 to 443
if flat_vpc_id.is_some() {
return Some(machine);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Filter flat-VPC allocation to static-membership machines.

This returns the first Ready machine without checking whether it is eligible for zero-DPU flat-VPC auto allocation. Since the server rejects auto_config on DPU-backed machines, batch allocation can fail before trying a later valid host.

Proposed fix
-            if flat_vpc_id.is_some() {
-                return Some(machine);
-            }
+            if flat_vpc_id.is_some() {
+                let supports_static_membership = machine
+                    .instance_network_restrictions
+                    .as_ref()
+                    .is_some_and(|restrictions| {
+                        restrictions.network_segment_membership_type
+                            == forgerpc::InstanceNetworkSegmentMembershipType::Static as i32
+                    });
+
+                if supports_static_membership {
+                    return Some(machine);
+                }
+
+                tracing::debug!(%id, "machine does not support flat VPC auto allocation");
+                continue;
+            }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/machine/show/cmd.rs` around lines 441 - 443, The current
condition checking if flat_vpc_id.is_some() returns the first eligible machine
without verifying it is suitable for zero-DPU flat-VPC auto allocation. Add an
additional eligibility check in the if statement to ensure the machine is a
static-membership machine before returning Some(machine), since the server
rejects auto_config on DPU-backed machines and this prevents batch allocation
from trying valid hosts.

Comment on lines +1267 to +1274
if !allocate_instance.vpc_prefix_id.is_empty()
|| !allocate_instance.vf_vpc_prefix_id.is_empty()
|| !allocate_instance.vf_subnet.is_empty()
{
return Err(CarbideCliError::GenericError(
"--flat-vpc-id does not support explicit VPC prefixes or VF subnets"
.to_string(),
));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject all explicit interface selectors with --flat-vpc-id.

The flat branch currently rejects VPC prefixes and VF subnets, but subnet, explicit IPs, and IPv6 selector fields are still accepted and then ignored because this branch returns Vec::new() interfaces.

Proposed fix
-            if !allocate_instance.vpc_prefix_id.is_empty()
+            if !allocate_instance.subnet.is_empty()
+                || !allocate_instance.vpc_prefix_id.is_empty()
                 || !allocate_instance.vf_vpc_prefix_id.is_empty()
                 || !allocate_instance.vf_subnet.is_empty()
+                || !allocate_instance.ip_address.is_empty()
+                || !allocate_instance.vf_ip_address.is_empty()
+                || !allocate_instance.ipv6_vpc_prefix_id.is_empty()
+                || !allocate_instance.ipv6_vf_prefix_id.is_empty()
+                || !allocate_instance.ipv6_ip_address.is_empty()
+                || !allocate_instance.ipv6_vf_ip_address.is_empty()
             {
                 return Err(CarbideCliError::GenericError(
-                    "--flat-vpc-id does not support explicit VPC prefixes or VF subnets"
-                        .to_string(),
+                    "--flat-vpc-id cannot be combined with explicit interface selectors"
+                        .to_string(),
                 ));
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/rpc.rs` around lines 1267 - 1274, The error checking in
the flat-vpc-id validation block (in the condition with vpc_prefix_id,
vf_vpc_prefix_id, and vf_subnet) currently only rejects VPC prefixes and VF
subnets, but it must also reject subnet, explicit IP addresses, and IPv6
selector fields to prevent them from being silently ignored. Expand the if
condition to include checks for the subnet field, any explicit IP selector
fields, and IPv6 selector fields in the allocate_instance struct, adding each to
the logical OR chain with the existing checks.

Comment on lines +258 to +263
let Some(vpc) = db::vpc::find_by_segment(&mut txn, network_segment_id)
.await? else {
return Err(CarbideError::InvalidArgument(
"network segment is not a member of a VPC".to_string(),
).into())
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a system-side error variant for missing VPC membership

At Line 260, this path returns InvalidArgument, but the failure is derived from persisted DB state (network_segment_id -> VPC mapping), not a bad caller argument. This should be surfaced as FailedPrecondition (or Internal) so the error class matches ownership and retry/alert semantics.

Suggested fix
-            let Some(vpc) = db::vpc::find_by_segment(&mut txn, network_segment_id)
-                .await? else {
-                return Err(CarbideError::InvalidArgument(
-                    "network segment is not a member of a VPC".to_string(),
-                ).into())
-            };
+            let Some(vpc) = db::vpc::find_by_segment(&mut txn, network_segment_id)
+                .await? else {
+                return Err(CarbideError::FailedPrecondition(format!(
+                    "network segment {network_segment_id} is not a member of a VPC"
+                )).into());
+            };

As per coding guidelines, handler errors should distinguish invalid user arguments from system-side failures and map uniformly via project error types.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/handlers/dpu.rs` around lines 258 - 263, The error
handling in the vpc::find_by_segment lookup block incorrectly uses
CarbideError::InvalidArgument when the network segment is not found to be a
member of a VPC. This is a system-side database state issue, not a validation
failure on caller input. Change the error variant from
CarbideError::InvalidArgument to CarbideError::FailedPrecondition to properly
classify this as a system failure and ensure correct error ownership semantics
and retry/alert behavior according to the project's error handling guidelines.

Source: Coding guidelines

use carbide_uuid::vpc::VpcPrefixId;
use carbide_uuid::vpc::{VpcId, VpcPrefixId};
use config_version::ConfigVersion;
use db::db_read::DbReader;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hold the VPC mutation lock before creating VPC-attached addresses.

This validation is followed by instance_addresses inserts for requested_auto_config.vpc_id. Without VpcRowLock::Mutation, a concurrent VPC delete can count zero addresses, soft-delete the VPC, and still let this allocation commit addresses pointing at the deleted VPC.

Proposed fix
-use db::db_read::DbReader;
 use db::{
     self, ObjectColumnFilter, ObjectFilter, compute_allocation, extension_service, ib_partition,
     network_security_group,
 };
 async fn validate_zero_dpu_auto_vpc(
-    txn: impl DbReader<'_>,
+    txn: &mut PgConnection,
     vpc_id: VpcId,
     tenant_organization_id: &TenantOrganizationId,
 ) -> Result<model::vpc::Vpc, CarbideError> {
-    let vpc = db::vpc::find_by(txn, ObjectColumnFilter::One(db::vpc::IdColumn, &vpc_id))
+    let vpc = db::vpc::find_by_with_lock(
+        txn,
+        ObjectColumnFilter::One(db::vpc::IdColumn, &vpc_id),
+        db::vpc::VpcRowLock::Mutation,
+    )
         .await?
         .into_iter()
         .next()

Also applies to: 89-95

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/instance/mod.rs` at line 31, The VPC-attached address
creation in instance_addresses inserts for requested_auto_config.vpc_id lacks
the necessary VpcRowLock::Mutation lock, allowing a race condition where
concurrent VPC deletions can soft-delete a VPC while addresses are still being
committed to it. Acquire the VpcRowLock::Mutation lock before performing the
instance_addresses inserts for the requested VPC to ensure mutual exclusion with
concurrent VPC deletion operations and prevent dangling addresses from being
created for deleted VPCs.

Comment on lines +1 to +8
ALTER TABLE instance_addresses
ADD COLUMN vpc_id uuid REFERENCES vpcs(id);

UPDATE instance_addresses ia
SET vpc_id = ns.vpc_id
FROM network_segments ns
WHERE ia.segment_id = ns.id
AND ia.vpc_id IS NULL;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce instance_addresses.vpc_id as non-null after backfill.

Line 2 introduces a nullable column, and Lines 4–8 only opportunistically backfill. That leaves room for null vpc_id rows (e.g., mixed-version rollout writes), which can silently violate the new VPC-address association contract.

Suggested migration hardening
 ALTER TABLE instance_addresses
 ADD COLUMN vpc_id uuid REFERENCES vpcs(id);

 UPDATE instance_addresses ia
 SET vpc_id = ns.vpc_id
 FROM network_segments ns
 WHERE ia.segment_id = ns.id
 AND ia.vpc_id IS NULL;

+DO $$
+BEGIN
+    IF EXISTS (SELECT 1 FROM instance_addresses WHERE vpc_id IS NULL) THEN
+        RAISE EXCEPTION 'Backfill left NULL instance_addresses.vpc_id rows';
+    END IF;
+END $$;
+
+ALTER TABLE instance_addresses
+ALTER COLUMN vpc_id SET NOT NULL;
+
 CREATE INDEX instance_addresses_vpc_id_idx ON instance_addresses(vpc_id);
 CREATE INDEX instance_addresses_instance_id_vpc_id_idx ON instance_addresses(instance_id, vpc_id);

As per coding guidelines, crates/api-db/migrations/** changes should preserve compatibility/data consistency; leaving the new key nullable undermines that contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-db/migrations/20260615120000_instance_addresses_vpc_id.sql` around
lines 1 - 8, The migration adds a nullable vpc_id column to instance_addresses
and only partially backfills it via an UPDATE statement that joins with
network_segments. This leaves room for NULL vpc_id values in existing or
newly-inserted rows during a mixed-version rollout, violating the intended
VPC-address association contract. After the UPDATE backfill statement completes,
add an ALTER TABLE statement to the migration to enforce the vpc_id column as
NOT NULL, ensuring data consistency and preventing future null values from being
inserted into this critical field.

Source: Coding guidelines

Comment thread crates/api-db/src/vpc.rs
Comment on lines +286 to +293
let instance_address_count_query = "SELECT count(*) FROM instance_addresses
WHERE vpc_id=$1
AND EXISTS (SELECT 1 FROM vpcs WHERE id=$1 AND deleted IS NULL)";
let instance_address_count: i64 = sqlx::query_scalar(instance_address_count_query)
.bind(id)
.fetch_one(&mut *txn)
.await
.map_err(|e| DatabaseError::query(instance_address_count_query, e))?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Inspect instance_addresses lifecycle and whether VPC guards should ignore released rows.
# Expect: Either instance_addresses are hard-deleted, or VPC/address queries filter active rows consistently.

rg -n -C3 'CREATE TABLE instance_addresses|ALTER TABLE instance_addresses|instance_addresses.*deleted|deleted.*instance_addresses|pub async fn delete|fn delete' \
  --glob '!target/**'

rg -n -C4 'instance_addresses.*vpc_id|vpc_id.*instance_addresses|instance_address::delete|release.*instance' \
  --glob '!target/**'

Repository: NVIDIA/infra-controller

Length of output: 50381


🏁 Script executed:

# Find migration files and search for instance_addresses schema
find . -name "*.sql" -type f | head -20

Repository: NVIDIA/infra-controller

Length of output: 1397


🏁 Script executed:

# Search for instance_addresses in migrations/schema files
rg -l "instance_addresses" --type sql --type rs | head -20

Repository: NVIDIA/infra-controller

Length of output: 94


🏁 Script executed:

# Look for the table definition and deleted column
rg -A 10 "CREATE TABLE.*instance_addresses" --type sql

Repository: NVIDIA/infra-controller

Length of output: 1162


🏁 Script executed:

# Find instance_address deletion/release functions
rg -B 2 -A 10 "fn.*delete.*instance.*address|fn.*release.*address" \
  --type rs crates/api-db/

Repository: NVIDIA/infra-controller

Length of output: 94


🏁 Script executed:

# Check vpc.rs context around lines 280-300
sed -n '280,300p' crates/api-db/src/vpc.rs

Repository: NVIDIA/infra-controller

Length of output: 944


🏁 Script executed:

# Search for migrations that modify instance_addresses
rg "ALTER TABLE.*instance_addresses|vpc_id.*instance" \
  crates/api-db/migrations/ -A 5

Repository: NVIDIA/infra-controller

Length of output: 2461


🏁 Script executed:

# Check all migrations in order
ls -la crates/api-db/migrations/ | grep instance_addresses

Repository: NVIDIA/infra-controller

Length of output: 345


🏁 Script executed:

# Look at the full instance_addresses schema across all migrations
rg "instance_addresses" crates/api-db/migrations/ -A 15 | head -100

Repository: NVIDIA/infra-controller

Length of output: 10068


🏁 Script executed:

# Search for how instance_addresses are deleted or soft-deleted
rg -i "delete.*instance_address|instance_address.*delete|release.*instance" \
  crates/api-db/src/ -B 2 -A 8

Repository: NVIDIA/infra-controller

Length of output: 3147


🏁 Script executed:

# Check if vpc_id column is actually in instance_addresses by looking at all schema references
rg "instance_addresses.*vpc_id|vpc_id.*instance_addresses" crates/ --include='*.sql' --include='*.rs'

Repository: NVIDIA/infra-controller

Length of output: 509


🏁 Script executed:

# Search for release-related operations on instances or addresses
rg "release|Release" crates/api-db/src/instance.rs crates/api-db/src/instance_address.rs -B 3 -A 10

Repository: NVIDIA/infra-controller

Length of output: 49


🏁 Script executed:

# Check the full vpc.rs delete function context
sed -n '275,310p' crates/api-db/src/vpc.rs

Repository: NVIDIA/infra-controller

Length of output: 1600


🏁 Script executed:

# Look for any soft-delete patterns on instances
rg "deleted.*instances|instances.*deleted" crates/api-db/src/ -B 2 -A 5

Repository: NVIDIA/infra-controller

Length of output: 2768


Refactor the query to remove the redundant EXISTS subquery; instance_addresses are hard-deleted during instance release.

The instance_address::delete function executes a hard DELETE (DELETE FROM instance_addresses WHERE instance_id=$1), so orphaned or soft-deleted addresses do not retain the VPC reference. However, the EXISTS clause in lines 288–290 is redundant—it checks VPC existence within a deletion guard that already operates under that premise. Simplify to:

SELECT count(*) FROM instance_addresses WHERE vpc_id=$1

This eliminates unnecessary defensive logic and improves query clarity.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-db/src/vpc.rs` around lines 286 - 293, The
instance_address_count_query contains a redundant EXISTS subquery that checks if
the VPC still exists (is not deleted). Since instance_addresses are hard-deleted
during instance release, this defensive check is unnecessary. Simplify the query
string to remove the EXISTS clause and only check WHERE vpc_id=$1, reducing the
query to a single simple condition. Update the instance_address_count_query
variable to contain only the simplified SQL without the subquery that validates
VPC existence.

Comment on lines +291 to +293
#[allow(deprecated)]
auto: false,
auto_config: None,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Remove the deprecation suppression by constructing InstanceNetworkConfig without touching deprecated fields.

This local #[allow(deprecated)] is avoidable and weakens the repo’s lint discipline. Build the struct via Default and set only supported fields.

Suggested change
-            network: Some(rpc::InstanceNetworkConfig {
-                interfaces: vec![interface_config],
-                #[allow(deprecated)]
-                auto: false,
-                auto_config: None,
-            }),
+            network: Some({
+                let mut network = rpc::InstanceNetworkConfig::default();
+                network.interfaces = vec![interface_config];
+                network.auto_config = None;
+                network
+            }),

As per coding guidelines, Rust changes should remain clippy-clean and avoid #[allow(...)] unless there is a strong reason.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/machine-a-tron/src/api_client.rs` around lines 291 - 293, Remove the
#[allow(deprecated)] annotation and refactor the InstanceNetworkConfig
construction to use Default::default() instead of manually setting the
deprecated fields auto and auto_config. Keep only the non-deprecated fields that
need to be explicitly set, allowing the deprecated fields to use their default
values without suppressing the deprecation warning. This maintains clippy
cleanliness and follows coding guidelines that avoid unnecessary lint
suppressions.

Source: Coding guidelines

@github-actions

Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
boot-artifacts-aarch64 3 0 0 3 0 0
boot-artifacts-x86_64 3 0 0 3 0 0
forge-admin-cli-x86_64 244 4 22 103 6 109
machine-validation-runner 671 23 178 262 37 171
machine_validation 671 23 178 262 37 171
nvmetal-carbide 671 23 178 262 37 171
TOTAL 2263 73 556 895 117 622

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant