Skip to content

test(e2e): migrate upgrade stale sandbox to vitest#5542

Merged
cv merged 12 commits into
mainfrom
e2e-migrate/test-upgrade-stale-sandbox
Jun 20, 2026
Merged

test(e2e): migrate upgrade stale sandbox to vitest#5542
cv merged 12 commits into
mainfrom
e2e-migrate/test-upgrade-stale-sandbox

Conversation

@cv

@cv cv commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Summary

Migrates test/e2e/test-upgrade-stale-sandbox.sh into a typed live Vitest scenario. The new coverage preserves the #1904 stale sandbox upgrade contract through real install.sh, Docker image build, OpenShell sandbox creation, upgrade-sandboxes --check, and rebuild boundaries.

Related Issue

Refs #5098

Changes

  • Add test/e2e-scenario/live/upgrade-stale-sandbox.test.ts with typed artifact capture, cleanup, and secret redaction.
  • Wire upgrade-stale-sandbox-vitest into .github/workflows/e2e-vitest-scenarios.yaml as a free-standing dispatchable Vitest job.
  • Preserve legacy shell deletion for Phase 11 cleanup per the migration governance in Epic: Migrate legacy bash E2E into the Vitest E2E system #5098.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • PR description includes the DCO sign-off declaration and every commit appears as Verified in GitHub
  • Git hooks passed during commit and push, or npx prek run --from-ref main --to-ref HEAD passes
  • Targeted tests pass for changed behavior
  • Full npm test passes (broad runtime changes only)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Targeted commands run:

  • npx biome check --write test/e2e-scenario/live/upgrade-stale-sandbox.test.ts
  • NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/upgrade-stale-sandbox.test.ts -t __compile_only_nomatch__ --silent=false --reporter=default --passWithNoTests
  • npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
  • npx tsx scripts/check-test-file-size-budget.ts test/e2e-scenario/live/upgrade-stale-sandbox.test.ts

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • Tests

    • Added comprehensive end-to-end test coverage for sandbox upgrade scenarios, including environment configuration, sandbox creation, and version verification workflows.
  • Chores

    • Enhanced CI/CD workflow with automated testing infrastructure for upgrade-stale-sandbox scenarios.
    • Updated workflow job dependencies to include new upgrade testing results in PR reporting.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv self-assigned this Jun 18, 2026
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new live Vitest E2E test (upgrade-stale-sandbox.test.ts) that provisions a sandbox with a pinned legacy OpenClaw base image, writes stale NemoClaw host metadata, verifies stale detection via upgrade-sandboxes --check, rebuilds the sandbox, and confirms it becomes up-to-date. Wires the test into a new upgrade-stale-sandbox-vitest GitHub Actions job with Docker auth isolation, and adds a workflow boundary validator enforcing all step-level constraints.

Changes

Stale Sandbox Upgrade E2E Test

Layer / File(s) Summary
Test constants, helpers, and stale-state setup
test/e2e-scenario/live/upgrade-stale-sandbox.test.ts
Defines imports, sandbox naming, legacy OpenClaw version constant, NemoClaw registry/session file paths, timing config, commandEnv helper, best-effort cleanup, JSON persistence, file snapshot/restore helpers, createOldBaseBuildContext (rewrites min_openclaw_version), and writeStaleRegistryEntry (mutates host registry/session metadata).
Live scenario test flow
test/e2e-scenario/live/upgrade-stale-sandbox.test.ts
Full live test: skip/fail gating on Docker availability, snapshot/restore hooks, best-effort pre-cleanup, NemoClaw install with retry, old-base Docker image build, old OpenShell sandbox creation with readiness polling and version assertion, stale metadata write, upgrade-sandboxes --check assertion, rebuild with readiness polling, post-rebuild version validation, and final up-to-date check.
GitHub Actions job and report-to-pr wiring
.github/workflows/e2e-vitest-scenarios.yaml
Adds upgrade-stale-sandbox-vitest job with job-scoped DOCKER_CONFIG isolation, Docker Hub login (3-retry/anonymous-skip), dependency install, CLI build, OpenShell install with secret unsets and OPENSHELL_BIN resolution, Vitest run, artifact upload, and Docker auth cleanup; adds job to report-to-pr needs.
Workflow boundary validator and dispatch selector tests
tools/e2e-scenarios/workflow-boundary.mts, test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
Introduces validateUpgradeStaleSandboxVitestJob enforcing runner, timeout, env, secret non-exposure, Docker auth steps, Vitest run, and artifact upload; wires it into validateE2eVitestScenariosWorkflowBoundary; extends dispatch selector and free-standing inventory tests for the new scenario/job.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(173, 216, 230, 0.5)
        Note over Test,Registry: Setup and stale state creation
    end
    participant Test as upgrade-stale-sandbox test
    participant Docker as Docker
    participant OpenShell as OpenShell
    participant NemoClaw as nemoclaw CLI
    participant Registry as NemoClaw Registry JSON

    Test->>NemoClaw: install.sh --non-interactive (retry on transient failure)
    Test->>Docker: docker build (OLD_OPENCLAW_VERSION base image)
    Test->>OpenShell: create sandbox from fixture Dockerfile
    OpenShell-->>Test: sandbox ready (readiness poll)
    Test->>OpenShell: assert old OpenClaw version inside sandbox
    Test->>Registry: writeStaleRegistryEntry (agentVersion = old version)

    rect rgba(144, 238, 144, 0.5)
        Note over Test,NemoClaw: Stale detection and rebuild
    end
    Test->>NemoClaw: upgrade-sandboxes --check
    NemoClaw-->>Test: sandbox flagged as stale
    Test->>NemoClaw: nemoclaw rebuild --yes
    NemoClaw-->>Test: rebuild complete
    Test->>OpenShell: poll for readiness
    Test->>OpenShell: assert new OpenClaw version (old version absent)
    Test->>NemoClaw: upgrade-sandboxes --check
    NemoClaw-->>Test: sandbox reported as up to date
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5150: Adds a free-standing *-vitest GitHub Actions job and extends the same workflow boundary/selector validation machinery in tools/e2e-scenarios/workflow-boundary.mts and e2e-scenarios-workflow.test.ts.
  • NVIDIA/NemoClaw#5333: Extends the same .github/workflows/e2e-vitest-scenarios.yaml and tools/e2e-scenarios/workflow-boundary.mts infrastructure with another free-standing Vitest job and report-to-pr wiring for a different sandbox rebuild scenario.
  • NVIDIA/NemoClaw#5346: Also adds a per-scenario job validator wired into validateE2eVitestScenariosWorkflowBoundary and updates the corresponding workflow boundary tests, mirroring this PR's structural pattern.

Suggested labels

area: e2e, chore

🐇 A sandbox grew old, its agent version stale,
So nemoclaw checked and found it beyond the pale.
It rebuilt anew with a fresh OpenClaw inside,
"Up to date!" said the check — the bunny beamed with pride.
From old Docker layers to green CI runs at last,
The upgrade test passes — the stale days are past! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: migrating the upgrade stale sandbox E2E test from bash to Vitest/TypeScript, which aligns with the primary objective of the pull request.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-migrate/test-upgrade-stale-sandbox

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-code-quality

github-code-quality Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the branch is 96%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File 70cfd1e +/-
nemoclaw/src/se...cret-scanner.ts 100%
nemoclaw/src/commands/slash.ts 100%
nemoclaw/src/li...bprocess-env.ts 100%
nemoclaw/src/bl...eprint/state.ts 98%
nemoclaw/src/onboard/config.ts 98%
nemoclaw/src/bl...int/snapshot.ts 97%
nemoclaw/src/bl...print/runner.ts 95%
nemoclaw/src/co...ration-state.ts 94%
nemoclaw/src/bl...ate-networks.ts 94%
nemoclaw/src/index.ts 94%

TypeScript / code-coverage/cli

The overall coverage in the branch is 46%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File 70cfd1e +/-
src/lib/state/o...oard-session.ts 90%
src/lib/inference/local.ts 76%
src/lib/sandbox/config.ts 72%
src/lib/actions...dbox/rebuild.ts 67%
src/lib/onboard/preflight.ts 64%
src/lib/actions...licy-channel.ts 56%
src/lib/state/sandbox.ts 55%
src/lib/policy/index.ts 49%
src/lib/onboard...er-gpu-patch.ts 44%
src/lib/onboard.ts 18%

Updated June 20, 2026 17:23 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: upgrade-stale-sandbox-vitest
Optional E2E: sandbox-rebuild-vitest, state-backup-restore-vitest

Dispatch hint: upgrade-stale-sandbox-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • upgrade-stale-sandbox-vitest (high): This PR introduces the upgrade-stale-sandbox live E2E job and test. It should run to validate the new workflow wiring, OpenShell/Docker setup, install/onboard phase, stale sandbox detection, rebuild path, and post-upgrade check.

Optional E2E

  • sandbox-rebuild-vitest (high): Adjacent confidence check for NemoClaw sandbox rebuild behavior, which is central to the new stale sandbox upgrade scenario but not directly changed in runtime code.
  • state-backup-restore-vitest (high): Useful adjacent lifecycle/state-registry coverage because the new helper snapshots and restores NemoClaw state files and manipulates sandbox registry metadata.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/e2e-vitest-scenarios.yaml
  • jobs input: upgrade-stale-sandbox-vitest

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: upgrade-stale-sandbox-vitest
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=upgrade-stale-sandbox-vitest

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • upgrade-stale-sandbox-vitest: Focused free-standing Vitest job wired for changed live test test/e2e-scenario/live/upgrade-stale-sandbox.test.ts.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=upgrade-stale-sandbox-vitest

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts
  • test/e2e-scenario/live/upgrade-stale-sandbox.test.ts
  • test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
  • tools/e2e-scenarios/workflow-boundary.mts

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 1 needs attention, 4 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 5 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • Assert the rebuilt sandbox runs the expected current OpenClaw version (test/e2e-scenario/live/upgrade-stale-sandbox.test.ts:147): The legacy migration contract includes “9. Verify the sandbox now runs the current OpenClaw version,” but the new Vitest test only proves the rebuilt sandbox no longer reports the old 2026.3.11 version. That can pass for any other non-current version and leaves the acceptance clause only partially covered.
    • Recommendation: Compare the rebuilt `openclaw --version` output against the current expected OpenClaw version source used by the build, such as the `ARG OPENCLAW_VERSION` value in `Dockerfile.base`, or explicitly narrow the migration contract if “not old” is the intended behavior.
    • Evidence: Legacy `test/e2e/test-upgrade-stale-sandbox.sh` lists clause 9 as “Verify the sandbox now runs the current OpenClaw version”; the new assertion is `expect(resultText(newVersion)).not.toContain(OLD_OPENCLAW_VERSION)`.

🔎 Worth checking

  • Avoid uploading the full local sandboxes registry (test/e2e-scenario/live/upgrade-stale-sandbox.test.ts:118): The test preserves any existing registry entries, adds the synthetic stale sandbox, and uploads the entire `~/.nemoclaw/sandboxes.json` file as `registered-stale-sandbox.json`. Clean CI runners are likely low-risk, but the live test supports local execution where pre-existing registry entries can contain legacy, corrupt, or credential-shaped fields. `ArtifactSink.writeText()` does not redact arbitrary text.
    • Recommendation: Write an artifact containing only the synthetic `SANDBOX_NAME` entry needed for evidence, or redact/strip credential-shaped fields before writing the artifact.
    • Evidence: `writeStaleRegistryEntry()` uses `registry.sandboxes = registry.sandboxes ?? {}` and inserts `registry.sandboxes[SANDBOX_NAME]`; `registeredStaleSandboxJson()` returns `fs.readFileSync(REGISTRY_FILE, "utf8")`; the test uploads that raw string via `artifacts.writeText("registered-stale-sandbox.json", registeredStaleSandboxJson())`.
  • Fail on malformed NemoClaw state JSON before seeding stale metadata (test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts:68): Missing state files are a valid fixture setup condition, but malformed or unreadable `sandboxes.json` / `onboard-session.json` is a different invalid state. The helper currently catches every read or parse failure and returns the fallback, which can silently replace corrupt local state and hide upstream install/onboard corruption.
    • Recommendation: Return the fallback only when the file does not exist. If the file exists but cannot be read or parsed, throw a diagnostic error naming the corrupt file, or document and test why corrupt local state is intentionally tolerated.
    • Evidence: `readJsonFile<T>()` catches all exceptions from `JSON.parse(fs.readFileSync(file, "utf8"))` and returns `fallback`; `writeStaleRegistryEntry()` uses this helper for both `SESSION_FILE` and `REGISTRY_FILE`.
  • Secret-bearing free-standing job expands the trusted-code boundary (.github/workflows/e2e-vitest-scenarios.yaml:1762): The new manually dispatched job runs branch-controlled installer, build, and Vitest code with Docker access, Docker Hub authentication, and `NVIDIA_INFERENCE_API_KEY` available to the test step. The patch adds useful mitigations, but this remains a high-risk trusted-code path.
    • Recommendation: Keep this workflow manual-only and restricted to trusted maintainer dispatch, preserve the boundary validator coverage for secret placement and installer env unsets, and avoid adding `pull_request` / `pull_request_target` triggers or direct dispatch-input interpolation to this job.
    • Evidence: `upgrade-stale-sandbox-vitest` authenticates to Docker Hub, builds the repo CLI, runs `scripts/install-openshell.sh`, and runs `npx vitest` with `NVIDIA_INFERENCE_API_KEY` in the environment. Mitigations include pinned actions, `persist-credentials: false`, `npm ci --ignore-scripts`, installer `env -u` unsets, isolated `DOCKER_CONFIG`, and cleanup.
  • Source-of-truth review needed for tolerant registry/session JSON parsing (test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts:68): The localized tolerant parsing behavior does not answer why malformed NemoClaw state should be handled here, where that invalid state is created, why the source cannot be fixed in this PR, what regression test proves the source cannot regress, or when the workaround can be removed.
    • Recommendation: Either make malformed existing state impossible/fail-fast at this boundary, or document the invalid state and add targeted tests proving the intended behavior for malformed `sandboxes.json` and `onboard-session.json`.
    • Evidence: `readJsonFile()` catches all exceptions and returns fallback; `writeStaleRegistryEntry()` uses it for both `SESSION_FILE` and `REGISTRY_FILE`.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — Rebuilt sandbox `openclaw --version` contains the current expected OpenClaw version parsed from `Dockerfile.base` or the same source used by the build.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
  • **Runtime validation** — Malformed `~/.nemoclaw/sandboxes.json` fails before seeding stale metadata with a diagnostic naming the corrupt file.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
  • **Runtime validation** — Malformed `~/.nemoclaw/onboard-session.json` fails before seeding stale metadata with a diagnostic naming the corrupt file.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
  • **Runtime validation** — `registered-stale-sandbox.json` contains only the synthetic `SANDBOX_NAME` entry or strips credential-shaped fields from pre-existing registry entries.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
  • **Runtime validation** — A live `upgrade-stale-sandbox` run validates stale detection, rebuild success, old `2026.3.11` absence, current version match, and final `upgrade-sandboxes --check` up-to-date output.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
  • **Source-of-truth review needed for tolerant registry/session JSON parsing** — Either make malformed existing state impossible/fail-fast at this boundary, or document the invalid state and add targeted tests proving the intended behavior for malformed `sandboxes.json` and `onboard-session.json`.
  • **Acceptance clause:** Linked issue clauses/comments — add test evidence or identify existing coverage. The deterministic validation context reported `linkedIssues: []`; no linked issue body clauses or issue comments were available to extract literally.
  • **Acceptance clause:** 9. Verify the sandbox now runs the current OpenClaw version — add test evidence or identify existing coverage. The test reads `openclaw --version` after rebuild and asserts only that the output does not contain `OLD_OPENCLAW_VERSION`; it does not assert against the current expected OpenClaw version source.
Since last review details

Current findings:

  • Assert the rebuilt sandbox runs the expected current OpenClaw version (test/e2e-scenario/live/upgrade-stale-sandbox.test.ts:147): The legacy migration contract includes “9. Verify the sandbox now runs the current OpenClaw version,” but the new Vitest test only proves the rebuilt sandbox no longer reports the old 2026.3.11 version. That can pass for any other non-current version and leaves the acceptance clause only partially covered.
    • Recommendation: Compare the rebuilt `openclaw --version` output against the current expected OpenClaw version source used by the build, such as the `ARG OPENCLAW_VERSION` value in `Dockerfile.base`, or explicitly narrow the migration contract if “not old” is the intended behavior.
    • Evidence: Legacy `test/e2e/test-upgrade-stale-sandbox.sh` lists clause 9 as “Verify the sandbox now runs the current OpenClaw version”; the new assertion is `expect(resultText(newVersion)).not.toContain(OLD_OPENCLAW_VERSION)`.
  • Avoid uploading the full local sandboxes registry (test/e2e-scenario/live/upgrade-stale-sandbox.test.ts:118): The test preserves any existing registry entries, adds the synthetic stale sandbox, and uploads the entire `~/.nemoclaw/sandboxes.json` file as `registered-stale-sandbox.json`. Clean CI runners are likely low-risk, but the live test supports local execution where pre-existing registry entries can contain legacy, corrupt, or credential-shaped fields. `ArtifactSink.writeText()` does not redact arbitrary text.
    • Recommendation: Write an artifact containing only the synthetic `SANDBOX_NAME` entry needed for evidence, or redact/strip credential-shaped fields before writing the artifact.
    • Evidence: `writeStaleRegistryEntry()` uses `registry.sandboxes = registry.sandboxes ?? {}` and inserts `registry.sandboxes[SANDBOX_NAME]`; `registeredStaleSandboxJson()` returns `fs.readFileSync(REGISTRY_FILE, "utf8")`; the test uploads that raw string via `artifacts.writeText("registered-stale-sandbox.json", registeredStaleSandboxJson())`.
  • Fail on malformed NemoClaw state JSON before seeding stale metadata (test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts:68): Missing state files are a valid fixture setup condition, but malformed or unreadable `sandboxes.json` / `onboard-session.json` is a different invalid state. The helper currently catches every read or parse failure and returns the fallback, which can silently replace corrupt local state and hide upstream install/onboard corruption.
    • Recommendation: Return the fallback only when the file does not exist. If the file exists but cannot be read or parsed, throw a diagnostic error naming the corrupt file, or document and test why corrupt local state is intentionally tolerated.
    • Evidence: `readJsonFile<T>()` catches all exceptions from `JSON.parse(fs.readFileSync(file, "utf8"))` and returns `fallback`; `writeStaleRegistryEntry()` uses this helper for both `SESSION_FILE` and `REGISTRY_FILE`.
  • Secret-bearing free-standing job expands the trusted-code boundary (.github/workflows/e2e-vitest-scenarios.yaml:1762): The new manually dispatched job runs branch-controlled installer, build, and Vitest code with Docker access, Docker Hub authentication, and `NVIDIA_INFERENCE_API_KEY` available to the test step. The patch adds useful mitigations, but this remains a high-risk trusted-code path.
    • Recommendation: Keep this workflow manual-only and restricted to trusted maintainer dispatch, preserve the boundary validator coverage for secret placement and installer env unsets, and avoid adding `pull_request` / `pull_request_target` triggers or direct dispatch-input interpolation to this job.
    • Evidence: `upgrade-stale-sandbox-vitest` authenticates to Docker Hub, builds the repo CLI, runs `scripts/install-openshell.sh`, and runs `npx vitest` with `NVIDIA_INFERENCE_API_KEY` in the environment. Mitigations include pinned actions, `persist-credentials: false`, `npm ci --ignore-scripts`, installer `env -u` unsets, isolated `DOCKER_CONFIG`, and cleanup.
  • Source-of-truth review needed for tolerant registry/session JSON parsing (test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts:68): The localized tolerant parsing behavior does not answer why malformed NemoClaw state should be handled here, where that invalid state is created, why the source cannot be fixed in this PR, what regression test proves the source cannot regress, or when the workaround can be removed.
    • Recommendation: Either make malformed existing state impossible/fail-fast at this boundary, or document the invalid state and add targeted tests proving the intended behavior for malformed `sandboxes.json` and `onboard-session.json`.
    • Evidence: `readJsonFile()` catches all exceptions and returns fallback; `writeStaleRegistryEntry()` uses it for both `SESSION_FILE` and `REGISTRY_FILE`.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e-scenario/live/upgrade-stale-sandbox.test.ts (1)

221-224: 💤 Low value

Consider validating the reason for exit code 1.

Accepting exit code 1 blindly could mask unexpected failures (e.g., permission errors, network issues). If exit code 1 specifically indicates "sandbox not found," consider asserting on the output to confirm.

-    expect(
-      deleteInstalledSandbox.exitCode === 0 || deleteInstalledSandbox.exitCode === 1,
-      resultText(deleteInstalledSandbox),
-    ).toBe(true);
+    if (deleteInstalledSandbox.exitCode !== 0) {
+      // Exit code 1 is expected when sandbox doesn't exist
+      expect(deleteInstalledSandbox.exitCode, resultText(deleteInstalledSandbox)).toBe(1);
+      expect(resultText(deleteInstalledSandbox)).toMatch(/not found|does not exist/i);
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/live/upgrade-stale-sandbox.test.ts` around lines 221 - 224,
The test for deleteInstalledSandbox accepts exit code 1 without validating the
actual reason for failure, which could mask unexpected errors like permission
issues or network problems. Modify the expectation to first check if exitCode is
0, or if it is 1, then validate the output or error message using resultText or
similar output inspection to confirm the failure reason is specifically "sandbox
not found" before allowing the test to pass. This ensures only expected failures
are accepted while catching other types of errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 1698-1799: The free-standing workflow inventory file is missing
three required entries that will cause CI failures during the generate-matrix
validation step. Open tools/e2e-scenarios/free-standing-workflow-inventory.mts
and add upgrade-stale-sandbox-vitest to the allowed_jobs array,
upgrade-stale-sandbox to the free_standing_scenarios_csv string, and create the
mapping entry in free_standing_scenario_jobs_csv to connect the scenario to its
job. Additionally, on line 1781 in the upgrade-stale-sandbox-vitest job, the npx
vitest run command has excessive inline whitespace; refactor it to use backslash
line continuation for better readability and consistency with other multi-line
commands in the workflow.

---

Nitpick comments:
In `@test/e2e-scenario/live/upgrade-stale-sandbox.test.ts`:
- Around line 221-224: The test for deleteInstalledSandbox accepts exit code 1
without validating the actual reason for failure, which could mask unexpected
errors like permission issues or network problems. Modify the expectation to
first check if exitCode is 0, or if it is 1, then validate the output or error
message using resultText or similar output inspection to confirm the failure
reason is specifically "sandbox not found" before allowing the test to pass.
This ensures only expected failures are accepted while catching other types of
errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c8259484-c53b-4400-bf1d-f1bc59e1c496

📥 Commits

Reviewing files that changed from the base of the PR and between c4bd014 and 8aefe40.

📒 Files selected for processing (2)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/upgrade-stale-sandbox.test.ts

Comment on lines +1698 to +1799
upgrade-stale-sandbox-vitest:
needs: generate-matrix
if: ${{ (inputs.jobs == '' && inputs.scenarios == '') || contains(format(',{0},', inputs.jobs), ',upgrade-stale-sandbox-vitest,') || contains(format(',{0},', inputs.scenarios), ',upgrade-stale-sandbox,') }}
runs-on: ubuntu-latest
timeout-minutes: 55
env:
FREE_STANDING_VITEST_JOB: "1"
FREE_STANDING_SCENARIO_ID: "upgrade-stale-sandbox"
E2E_ARTIFACT_DIR: ${{ github.workspace }}/e2e-artifacts/vitest/upgrade-stale-sandbox
NEMOCLAW_CLI_BIN: ${{ github.workspace }}/bin/nemoclaw.js
NEMOCLAW_RUN_E2E_SCENARIOS: "1"
NEMOCLAW_NON_INTERACTIVE: "1"
NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
NEMOCLAW_SANDBOX_NAME: "e2e-upgrade-stale"
OPENSHELL_GATEWAY: "nemoclaw"
steps:
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
with:
persist-credentials: false

- name: Configure isolated Docker auth directory
run: echo "DOCKER_CONFIG=${RUNNER_TEMP}/docker-config-upgrade-stale-sandbox" >> "$GITHUB_ENV"

- name: Authenticate to Docker Hub
env:
DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
shell: bash
run: |
set -euo pipefail
if [[ -z "${DOCKERHUB_USERNAME}" || -z "${DOCKERHUB_TOKEN}" ]]; then
echo "::notice::Docker Hub credentials not configured; continuing with anonymous pulls."
exit 0
fi
mkdir -p "${DOCKER_CONFIG}"
chmod 700 "${DOCKER_CONFIG}"
login_succeeded=0
for attempt in 1 2 3; do
if echo "${DOCKERHUB_TOKEN}" | timeout 30s docker login docker.io --username "${DOCKERHUB_USERNAME}" --password-stdin; then
login_succeeded=1
break
fi
if [[ "$attempt" -lt 3 ]]; then
echo "::warning::Docker Hub login attempt ${attempt} failed; retrying."
sleep 5
fi
done
if [[ "$login_succeeded" -ne 1 ]]; then
echo "::warning::Docker Hub login failed after 3 attempts; continuing with anonymous pulls."
fi

- name: Set up Node
uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.0.0
with:
node-version: 22
cache: npm

- name: Install root dependencies
run: npm ci --ignore-scripts

- name: Build CLI
run: npm run build:cli

- name: Install OpenShell CLI
run: bash scripts/install-openshell.sh

- name: Run upgrade stale sandbox live Vitest test
env:
NVIDIA_INFERENCE_API_KEY: ${{ secrets.NVIDIA_INFERENCE_API_KEY }}
run: |
set -euo pipefail
export PATH="$HOME/.local/bin:$HOME/.npm-global/bin:$PATH"
if command -v openshell >/dev/null 2>&1; then
OPENSHELL_BIN="$(command -v openshell)"
elif [ -x "$HOME/.local/bin/openshell" ]; then
OPENSHELL_BIN="$HOME/.local/bin/openshell"
else
echo "::error::OpenShell CLI not found after install"
ls -la /usr/local/bin/openshell "$HOME/.local/bin/openshell" 2>&1 || true
exit 1
fi
export OPENSHELL_BIN
"$OPENSHELL_BIN" --version
npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/upgrade-stale-sandbox.test.ts --silent=false --reporter=default

- name: Upload upgrade stale sandbox artifacts
if: always()
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: e2e-vitest-scenarios-upgrade-stale-sandbox
path: e2e-artifacts/vitest/upgrade-stale-sandbox/
include-hidden-files: false
if-no-files-found: ignore
retention-days: 14

- name: Clean up Docker auth
if: always()
run: |
set -euo pipefail
docker logout docker.io || true
rm -rf "${DOCKER_CONFIG}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the free-standing inventory includes the new job and scenario
echo "=== Checking free-standing inventory for upgrade-stale-sandbox ==="
if fd -t f 'free-standing-workflow-inventory' tools/; then
  inventory_file=$(fd -t f 'free-standing-workflow-inventory' tools/ | head -1)
  echo "Found inventory file: $inventory_file"
  echo ""
  echo "--- Checking for job id ---"
  rg -n 'upgrade-stale-sandbox-vitest' "$inventory_file" || echo "WARNING: Job not found in inventory"
  echo ""
  echo "--- Checking for scenario id ---"
  rg -n 'upgrade-stale-sandbox[^-]' "$inventory_file" || echo "WARNING: Scenario not found in inventory"
else
  echo "WARNING: Could not find free-standing-workflow-inventory file"
fi

Repository: NVIDIA/NemoClaw

Length of output: 402


The new job is well-structured, but the free-standing inventory file is incomplete.

The upgrade-stale-sandbox-vitest job correctly implements isolated Docker auth, retry logic, and artifact handling. However, verification confirms that tools/e2e-scenarios/free-standing-workflow-inventory.mts is missing the required entries:

  • upgrade-stale-sandbox-vitest not in allowed_jobs
  • upgrade-stale-sandbox not in free_standing_scenarios_csv
  • Mapping not in free_standing_scenario_jobs_csv

Since the workflow's generate-matrix step validates inventory coverage, these missing entries will cause CI failures. Add all three required entries to the inventory file before merging.

Minor style note: Line 1781 has excessive inline whitespace in the vitest command. Consider using backslash line continuation for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 1698 - 1799, The
free-standing workflow inventory file is missing three required entries that
will cause CI failures during the generate-matrix validation step. Open
tools/e2e-scenarios/free-standing-workflow-inventory.mts and add
upgrade-stale-sandbox-vitest to the allowed_jobs array, upgrade-stale-sandbox to
the free_standing_scenarios_csv string, and create the mapping entry in
free_standing_scenario_jobs_csv to connect the scenario to its job.
Additionally, on line 1781 in the upgrade-stale-sandbox-vitest job, the npx
vitest run command has excessive inline whitespace; refactor it to use backslash
line continuation for better readability and consistency with other multi-line
commands in the workflow.

@cv cv added the v0.0.66 Release target label Jun 18, 2026
cv added 11 commits June 19, 2026 11:18
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
…andbox' into e2e-migrate/test-upgrade-stale-sandbox
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
…andbox' into e2e-migrate/test-upgrade-stale-sandbox

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv merged commit c9953e9 into main Jun 20, 2026
40 checks passed
@cv cv deleted the e2e-migrate/test-upgrade-stale-sandbox branch June 20, 2026 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v0.0.66 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants