feat(machine-validation): M1: Add machine validation execution tracking foundation#2651
feat(machine-validation): M1: Add machine validation execution tracking foundation#2651sunilkumar-nvidia wants to merge 7 commits into
Conversation
# Conflicts: # crates/api-core/src/tests/machine_validation.rs
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Summary by CodeRabbit
WalkthroughThis PR introduces end-to-end execution tracking for machine validation runs. It adds two new typed UUID types, a DB migration creating ChangesMachine Validation Execution Tracking
Sequence Diagram(s)sequenceDiagram
participant Client
participant Api as Api (Forge impl)
participant Handler as handlers/machine_validation
participant DBExec as machine_validation_execution
participant DB as PostgreSQL
rect rgba(30, 100, 200, 0.5)
Note over Client,DB: update_machine_validation_run with selected_tests
Client->>Api: UpdateMachineValidationRun(selected_tests)
Api->>Handler: update_machine_validation_run
Handler->>DBExec: materialize_run_plan(run_id, context, selected_tests)
DBExec->>DB: upsert run_items + insert pending attempts
DB-->>DBExec: ok
DBExec-->>Handler: ()
Handler-->>Client: MachineValidationRun
end
rect rgba(20, 160, 80, 0.5)
Note over Client,DB: persist_validation_result write-ahead path
Client->>Api: PersistMachineValidationResult(result)
Api->>Handler: persist_validation_result
Handler->>DBExec: record_result(result)
DBExec->>DB: upsert run_item, update/insert attempt
DB-->>DBExec: terminal?
alt already terminal
DBExec-->>Handler: false
Handler-->>Client: early return (no legacy update)
else new terminal
DBExec-->>Handler: true
Handler->>DB: update legacy health-report / result projection
Handler-->>Client: ok
end
end
rect rgba(180, 60, 20, 0.5)
Note over Client,DB: ListMachineValidationRunItems / GetMachineValidationAttempt
Client->>Api: FindMachineValidationRunItemsByIds(run_id)
Api->>Handler: find_machine_validation_run_items_by_ids
Handler->>DBExec: find_run_items_by_ids
DBExec->>DB: SELECT with lateral join for latest attempt_id
DB-->>Handler: Vec<MachineValidationRunItem>
Handler-->>Client: MachineValidationRunItemList
Client->>Api: GetMachineValidationAttempt(attempt_id)
Api->>Handler: get_machine_validation_attempt
Handler->>DBExec: find_attempt_by_id
DBExec->>DB: SELECT from machine_validation_attempts
DB-->>Handler: MachineValidationAttempt
Handler-->>Client: MachineValidationAttempt
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (3)
crates/api-model/src/machine_validation.rs (1)
119-149: ⚡ Quick winAdd table-driven tests for the new run-item/attempt state enums.
The new enums introduce string parsing/formatting contracts, but only
MachineValidationStatecurrently has scenario-based coverage in this file.As per coding guidelines, “Prefer table-driven tests for any function that maps inputs to outputs, errors, or other observable results.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/api-model/src/machine_validation.rs` around lines 119 - 149, The new enums MachineValidationRunItemState and MachineValidationAttemptState introduce string parsing contracts through the EnumString derive macro and string formatting through the Display trait implementation, but lack test coverage. Add table-driven tests for both enums to verify that all variants can be correctly parsed from strings and formatted back to strings using the Display implementation. Follow the same table-driven test pattern used for MachineValidationState in the file to ensure comprehensive coverage of the string conversion contracts for each enum variant.Source: Coding guidelines
crates/rpc/src/model/machine_validation.rs (1)
168-223: ⚡ Quick winAdd table-driven tests for the new run-item and attempt conversion mappings.
Lines 168-223 add two new field-mapping conversions, but this file’s tests do not currently lock these mappings. Please add table-driven cases for populated and sparse inputs to prevent silent contract drift in RPC serialization.
As per coding guidelines: “Prefer table-driven tests for any function that maps inputs to outputs.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/rpc/src/model/machine_validation.rs` around lines 168 - 223, The new From trait implementations for MachineValidationRunItem and MachineValidationAttempt (lines 168-223) lack test coverage for their field mappings. Add table-driven tests that verify the conversions for both implementations by testing cases with populated and sparse input values, ensuring each field is correctly mapped to its corresponding RPC output field and that default values are applied appropriately where needed.Source: Coding guidelines
crates/machine-validation/src/lib.rs (1)
163-177: ⚡ Quick winAvoid cloning the full tests vector for selection pass.
Lines 163-177 currently iterate via
tests.clone(). Iterate by reference and clone only selected items to reduce per-run allocation/copy overhead.As per coding guidelines: avoid needless `.clone()` and prefer ownership/borrowing choices that minimize cloning.Suggested fix
- let mut selected_tests = Vec::new(); - for test in tests.clone() { + let mut selected_tests = Vec::new(); + for test in &tests { if !machine_validation_filter.allowed_tests.is_empty() && !machine_validation_filter .allowed_tests .iter() .any(|t| t.eq_ignore_ascii_case(&test.test_id)) { continue; } run_request.total += 1; expected_time_duration += test.timeout.unwrap_or(7200); - selected_tests.push(test); + selected_tests.push(test.clone()); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/machine-validation/src/lib.rs` around lines 163 - 177, The tests vector is being cloned entirely at the beginning of the iteration loop, which is wasteful. Instead of iterating over tests.clone(), iterate by reference using &tests and only clone individual test items when they are actually selected and added to the selected_tests vector. Change the for loop to iterate by reference and modify the selected_tests.push call to clone only the test items that pass the filter criteria.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/api-core/src/handlers/machine_validation.rs`:
- Around line 829-855: After collecting selected_tests into a vector from
req.selected_tests using ModelMachineValidationTest::try_from, add validation to
ensure that req.total matches the length of the selected_tests vector. If they
do not match, return a CarbideError with a descriptive message (for example,
indicating a mismatch between the total count and the actual number of tests).
This check must occur before calling db::machine_validation::update_run to
prevent inconsistent state between run totals and actual run-item rows.
In `@crates/api-model/src/machine_validation.rs`:
- Around line 226-227: In the state field assignment where
MachineValidationRunItemState is parsed from a database row, replace the
.unwrap_or_default() call with the ? operator to properly propagate parsing
errors instead of silently defaulting to Pending when unknown state values are
encountered. This change must be applied at two locations: lines 226-227 in the
main state assignment and lines 266-267 in the sibling location where the same
pattern appears. By removing the unwrap_or_default() and using ? instead, row
decoding will fail immediately when encountering invalid or drifted state values
rather than masking them.
- Line 220: The current_attempt_id field decoding uses .ok().flatten() which
silently converts SQL decode errors into None, hiding potential schema/query
regressions and data corruption issues. Replace the .ok().flatten() pattern on
the row.try_get("current_attempt_id") call with proper error handling that
doesn't swallow decode failures. This should use the appropriate error
propagation or optional handling method (such as .optional() if available in the
database driver, or .? operator) to allow legitimate None values while
explicitly handling actual decoding errors.
In `@crates/rpc/proto/forge.proto`:
- Around line 609-610: The ListMachineValidationRunItems RPC returns an
unbounded repeated payload which doesn't follow the paginated list pattern
required by the codebase and will not scale for large validation histories.
Replace this single RPC with two separate RPC calls:
FindMachineValidationRunItemIds which returns a paginated list of IDs with
standard pagination fields, and FindMachineValidationRunItemsByIds which takes a
list of IDs and returns the full MachineValidationRunItem entities. Keep the
entity details in the by-IDs call response while the IDs call focuses on
pagination-friendly ID retrieval.
---
Nitpick comments:
In `@crates/api-model/src/machine_validation.rs`:
- Around line 119-149: The new enums MachineValidationRunItemState and
MachineValidationAttemptState introduce string parsing contracts through the
EnumString derive macro and string formatting through the Display trait
implementation, but lack test coverage. Add table-driven tests for both enums to
verify that all variants can be correctly parsed from strings and formatted back
to strings using the Display implementation. Follow the same table-driven test
pattern used for MachineValidationState in the file to ensure comprehensive
coverage of the string conversion contracts for each enum variant.
In `@crates/machine-validation/src/lib.rs`:
- Around line 163-177: The tests vector is being cloned entirely at the
beginning of the iteration loop, which is wasteful. Instead of iterating over
tests.clone(), iterate by reference using &tests and only clone individual test
items when they are actually selected and added to the selected_tests vector.
Change the for loop to iterate by reference and modify the selected_tests.push
call to clone only the test items that pass the filter criteria.
In `@crates/rpc/src/model/machine_validation.rs`:
- Around line 168-223: The new From trait implementations for
MachineValidationRunItem and MachineValidationAttempt (lines 168-223) lack test
coverage for their field mappings. Add table-driven tests that verify the
conversions for both implementations by testing cases with populated and sparse
input values, ensuring each field is correctly mapped to its corresponding RPC
output field and that default values are applied appropriately where needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2649fea8-3a22-4434-863c-2a6cdd8f0461
📒 Files selected for processing (16)
crates/api-core/src/api.rscrates/api-core/src/auth/internal_rbac_rules.rscrates/api-core/src/handlers/machine_validation.rscrates/api-core/src/machine_validation/mod.rscrates/api-core/src/tests/common/api_fixtures/mod.rscrates/api-core/src/tests/machine_validation.rscrates/api-db/migrations/20260613120000_machine_validation_execution_foundation.sqlcrates/api-db/src/lib.rscrates/api-db/src/machine_validation_execution.rscrates/api-model/src/machine_validation.rscrates/machine-controller/src/handler/machine_validation.rscrates/machine-validation/src/lib.rscrates/rpc/build.rscrates/rpc/proto/forge.protocrates/rpc/src/model/machine_validation.rscrates/uuid/src/machine_validation/mod.rs
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/api-model/src/machine_validation.rs (1)
235-241: 💤 Low valueConsider simplifying
current_attempt_iddecoding if the column is always present.The
ColumnNotFoundbranch provides defensive handling for queries that might omitcurrent_attempt_id, but the DB module query (context snippet 1) always includes this column viaLEFT JOIN. If all queries consistently select this column, the match simplifies to:current_attempt_id: row.try_get("current_attempt_id")?,The existing defensive pattern is harmless but adds cognitive load. If queries vary and some omit the column, document that rationale with a comment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/api-model/src/machine_validation.rs` around lines 235 - 241, The current_attempt_id field in the match statement has defensive handling for ColumnNotFound errors, but the DB module query always includes this column via LEFT JOIN. Verify that all queries using this deserialization code consistently select the current_attempt_id column. If they do, simplify the match statement by removing the ColumnNotFound branch and replacing the entire match block with a simple row.try_get call using the ? operator. If queries vary and some omit the column, keep the existing pattern but add a comment explaining the defensive rationale.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/api-model/src/machine_validation.rs`:
- Around line 235-241: The current_attempt_id field in the match statement has
defensive handling for ColumnNotFound errors, but the DB module query always
includes this column via LEFT JOIN. Verify that all queries using this
deserialization code consistently select the current_attempt_id column. If they
do, simplify the match statement by removing the ColumnNotFound branch and
replacing the entire match block with a simple row.try_get call using the ?
operator. If queries vary and some omit the column, keep the existing pattern
but add a comment explaining the defensive rationale.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b51a2949-61ef-4b9b-81ae-ea8c0b7ae557
📒 Files selected for processing (12)
crates/api-core/src/api.rscrates/api-core/src/auth/internal_rbac_rules.rscrates/api-core/src/handlers/machine_validation.rscrates/api-core/src/tests/machine_validation.rscrates/api-db/src/lib.rscrates/api-db/src/machine_validation_execution.rscrates/api-model/src/machine_validation.rscrates/machine-validation/src/lib.rscrates/rpc/build.rscrates/rpc/proto/forge.protocrates/rpc/src/model/machine_validation.rscrates/uuid/src/machine_validation/mod.rs
🚧 Files skipped from review as they are similar to previous changes (9)
- crates/api-db/src/lib.rs
- crates/rpc/src/model/machine_validation.rs
- crates/machine-validation/src/lib.rs
- crates/rpc/build.rs
- crates/uuid/src/machine_validation/mod.rs
- crates/rpc/proto/forge.proto
- crates/api-core/src/handlers/machine_validation.rs
- crates/api-core/src/tests/machine_validation.rs
- crates/api-db/src/machine_validation_execution.rs
This PR implements the M1 deliverables from the Machine Validation 2.0 design: #2070.
M1 is focused on adding the execution-tracking foundation without changing how Scout runs validation tests today. Scout still runs tests sequentially, but the API now records which tests were selected for a run and tracks each test as a run item with an associated attempt.
Description
What This Adds
Why This Is Needed
Previously, machine validation only stored legacy result rows after Scout reported results. There was no durable per-test execution plan or attempt-level state to inspect while a validation run was active.
This PR adds that foundation so later Machine Validation 2.0 milestones can build on stable run-item and attempt records.
Compatibility
Existing machine validation behavior is preserved:
DB Changes
Adds migration:
20260613120000_machine_validation_execution_foundation.sqlThe migration creates two new tables:
machine_validation_run_itemsmachine_validation(id)withON DELETE CASCADE.(run_id, test_id).machine_validation_attemptsmachine_validation_run_items(id)withON DELETE CASCADE.(run_item_id, attempt_number).This is an additive schema change. No existing tables are modified or backfilled.
Deployment / Upgrade Notes
Deployments must run the DB migration before relying on the new M1 APIs.
For normal Helm/Kubernetes deployments, this should be handled by the existing
nico-apimigration job, which runscarbide-api migratebefore the API rollout. No manual DB steps are expected as long as the migration job runs successfully.For manual/local deployments, run the existing migration command before starting the updated API service, for example:
cargo run --package carbide-api --no-default-features -- migrate or the equivalent deployed binary command: carbide-api migrate --datastore="<postgres connection string>"The migration is safe for rolling upgrades because it only adds new tables. Older code ignores these tables, and newer code continues writing the legacy machine_validation_results rows.
Notes
Validation
Related issues
Related to #454
Type of Change
Breaking Changes
Testing
Additional Notes