test(e2e): migrate upgrade stale sandbox to vitest by cv · Pull Request #5542 · NVIDIA/NemoClaw

cv · 2026-06-18T02:19:39Z

Summary

Migrates test/e2e/test-upgrade-stale-sandbox.sh into a typed live Vitest scenario. The new coverage preserves the #1904 stale sandbox upgrade contract through real install.sh, Docker image build, OpenShell sandbox creation, upgrade-sandboxes --check, and rebuild boundaries.

Related Issue

Refs #5098

Changes

Add test/e2e-scenario/live/upgrade-stale-sandbox.test.ts with typed artifact capture, cleanup, and secret redaction.
Wire upgrade-stale-sandbox-vitest into .github/workflows/e2e-vitest-scenarios.yaml as a free-standing dispatchable Vitest job.
Preserve legacy shell deletion for Phase 11 cleanup per the migration governance in Epic: Migrate legacy bash E2E into the Vitest E2E system #5098.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Targeted commands run:

npx biome check --write test/e2e-scenario/live/upgrade-stale-sandbox.test.ts
NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/upgrade-stale-sandbox.test.ts -t __compile_only_nomatch__ --silent=false --reporter=default --passWithNoTests
npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
npx tsx scripts/check-test-file-size-budget.ts test/e2e-scenario/live/upgrade-stale-sandbox.test.ts

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Tests
- Added comprehensive end-to-end test coverage for sandbox upgrade scenarios, including environment configuration, sandbox creation, and version verification workflows.
Chores
- Enhanced CI/CD workflow with automated testing infrastructure for upgrade-stale-sandbox scenarios.
- Updated workflow job dependencies to include new upgrade testing results in PR reporting.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai · 2026-06-18T02:19:53Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new live Vitest E2E test (upgrade-stale-sandbox.test.ts) that provisions a sandbox with a pinned legacy OpenClaw base image, writes stale NemoClaw host metadata, verifies stale detection via upgrade-sandboxes --check, rebuilds the sandbox, and confirms it becomes up-to-date. Wires the test into a new upgrade-stale-sandbox-vitest GitHub Actions job with Docker auth isolation, and adds a workflow boundary validator enforcing all step-level constraints.

Changes

Stale Sandbox Upgrade E2E Test

Layer / File(s)	Summary
Test constants, helpers, and stale-state setup `test/e2e-scenario/live/upgrade-stale-sandbox.test.ts`	Defines imports, sandbox naming, legacy OpenClaw version constant, NemoClaw registry/session file paths, timing config, `commandEnv` helper, best-effort cleanup, JSON persistence, file snapshot/restore helpers, `createOldBaseBuildContext` (rewrites `min_openclaw_version`), and `writeStaleRegistryEntry` (mutates host registry/session metadata).
Live scenario test flow `test/e2e-scenario/live/upgrade-stale-sandbox.test.ts`	Full live test: skip/fail gating on Docker availability, snapshot/restore hooks, best-effort pre-cleanup, NemoClaw install with retry, old-base Docker image build, old OpenShell sandbox creation with readiness polling and version assertion, stale metadata write, `upgrade-sandboxes --check` assertion, rebuild with readiness polling, post-rebuild version validation, and final up-to-date check.
GitHub Actions job and report-to-pr wiring `.github/workflows/e2e-vitest-scenarios.yaml`	Adds `upgrade-stale-sandbox-vitest` job with job-scoped `DOCKER_CONFIG` isolation, Docker Hub login (3-retry/anonymous-skip), dependency install, CLI build, OpenShell install with secret unsets and `OPENSHELL_BIN` resolution, Vitest run, artifact upload, and Docker auth cleanup; adds job to `report-to-pr` needs.
Workflow boundary validator and dispatch selector tests `tools/e2e-scenarios/workflow-boundary.mts`, `test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts`	Introduces `validateUpgradeStaleSandboxVitestJob` enforcing runner, timeout, env, secret non-exposure, Docker auth steps, Vitest run, and artifact upload; wires it into `validateE2eVitestScenariosWorkflowBoundary`; extends dispatch selector and free-standing inventory tests for the new scenario/job.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(173, 216, 230, 0.5)
        Note over Test,Registry: Setup and stale state creation
    end
    participant Test as upgrade-stale-sandbox test
    participant Docker as Docker
    participant OpenShell as OpenShell
    participant NemoClaw as nemoclaw CLI
    participant Registry as NemoClaw Registry JSON

    Test->>NemoClaw: install.sh --non-interactive (retry on transient failure)
    Test->>Docker: docker build (OLD_OPENCLAW_VERSION base image)
    Test->>OpenShell: create sandbox from fixture Dockerfile
    OpenShell-->>Test: sandbox ready (readiness poll)
    Test->>OpenShell: assert old OpenClaw version inside sandbox
    Test->>Registry: writeStaleRegistryEntry (agentVersion = old version)

    rect rgba(144, 238, 144, 0.5)
        Note over Test,NemoClaw: Stale detection and rebuild
    end
    Test->>NemoClaw: upgrade-sandboxes --check
    NemoClaw-->>Test: sandbox flagged as stale
    Test->>NemoClaw: nemoclaw rebuild --yes
    NemoClaw-->>Test: rebuild complete
    Test->>OpenShell: poll for readiness
    Test->>OpenShell: assert new OpenClaw version (old version absent)
    Test->>NemoClaw: upgrade-sandboxes --check
    NemoClaw-->>Test: sandbox reported as up to date

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/NemoClaw#5150: Adds a free-standing *-vitest GitHub Actions job and extends the same workflow boundary/selector validation machinery in tools/e2e-scenarios/workflow-boundary.mts and e2e-scenarios-workflow.test.ts.
NVIDIA/NemoClaw#5333: Extends the same .github/workflows/e2e-vitest-scenarios.yaml and tools/e2e-scenarios/workflow-boundary.mts infrastructure with another free-standing Vitest job and report-to-pr wiring for a different sandbox rebuild scenario.
NVIDIA/NemoClaw#5346: Also adds a per-scenario job validator wired into validateE2eVitestScenariosWorkflowBoundary and updates the corresponding workflow boundary tests, mirroring this PR's structural pattern.

Suggested labels

area: e2e, chore

🐇 A sandbox grew old, its agent version stale,
So nemoclaw checked and found it beyond the pale.
It rebuilt anew with a fresh OpenClaw inside,
"Up to date!" said the check — the bunny beamed with pride.
From old Docker layers to green CI runs at last,
The upgrade test passes — the stale days are past! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: migrating the upgrade stale sandbox E2E test from bash to Vitest/TypeScript, which aligns with the primary objective of the pull request.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch e2e-migrate/test-upgrade-stale-sandbox

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-code-quality · 2026-06-18T02:20:14Z

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the e2e-migrate/test-upg... branch is 96%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	e2e-migrate/test-upg... `70cfd1e`	+/-
`nemoclaw/src/se...cret-scanner.ts`	—	100%	—
`nemoclaw/src/commands/slash.ts`	—	100%	—
`nemoclaw/src/li...bprocess-env.ts`	—	100%	—
`nemoclaw/src/bl...eprint/state.ts`	—	98%	—
`nemoclaw/src/onboard/config.ts`	—	98%	—
`nemoclaw/src/bl...int/snapshot.ts`	—	97%	—
`nemoclaw/src/bl...print/runner.ts`	—	95%	—
`nemoclaw/src/co...ration-state.ts`	—	94%	—
`nemoclaw/src/bl...ate-networks.ts`	—	94%	—
`nemoclaw/src/index.ts`	—	94%	—

TypeScript / code-coverage/cli

The overall coverage in the e2e-migrate/test-upg... branch is 46%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	e2e-migrate/test-upg... `70cfd1e`	+/-
`src/lib/state/o...oard-session.ts`	—	90%	—
`src/lib/inference/local.ts`	—	76%	—
`src/lib/sandbox/config.ts`	—	72%	—
`src/lib/actions...dbox/rebuild.ts`	—	67%	—
`src/lib/onboard/preflight.ts`	—	64%	—
`src/lib/actions...licy-channel.ts`	—	56%	—
`src/lib/state/sandbox.ts`	—	55%	—
`src/lib/policy/index.ts`	—	49%	—
`src/lib/onboard...er-gpu-patch.ts`	—	44%	—
`src/lib/onboard.ts`	—	18%	—

_{Updated June 20, 2026 17:23 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.}

github-actions · 2026-06-18T02:21:03Z

E2E Advisor Recommendation

Required E2E: upgrade-stale-sandbox-vitest
Optional E2E: sandbox-rebuild-vitest, state-backup-restore-vitest

Dispatch hint: upgrade-stale-sandbox-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

upgrade-stale-sandbox-vitest (high): This PR introduces the upgrade-stale-sandbox live E2E job and test. It should run to validate the new workflow wiring, OpenShell/Docker setup, install/onboard phase, stale sandbox detection, rebuild path, and post-upgrade check.

Optional E2E

sandbox-rebuild-vitest (high): Adjacent confidence check for NemoClaw sandbox rebuild behavior, which is central to the new stale sandbox upgrade scenario but not directly changed in runtime code.
state-backup-restore-vitest (high): Useful adjacent lifecycle/state-registry coverage because the new helper snapshots and restores NemoClaw state files and manipulates sandbox registry metadata.

New E2E recommendations

None.

Dispatch hint

Workflow: .github/workflows/e2e-vitest-scenarios.yaml
jobs input: upgrade-stale-sandbox-vitest

github-actions · 2026-06-18T02:21:04Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: upgrade-stale-sandbox-vitest
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=upgrade-stale-sandbox-vitest

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

upgrade-stale-sandbox-vitest: Focused free-standing Vitest job wired for changed live test test/e2e-scenario/live/upgrade-stale-sandbox.test.ts.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=upgrade-stale-sandbox-vitest

Optional Vitest E2E scenarios

None.

Relevant changed files

.github/workflows/e2e-vitest-scenarios.yaml
test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts
test/e2e-scenario/live/upgrade-stale-sandbox.test.ts
test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
tools/e2e-scenarios/workflow-boundary.mts

github-actions · 2026-06-18T02:23:24Z

PR Review Advisor

Findings: 1 needs attention, 4 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 5 still apply, 0 new items found

Review findings

🛠️ Needs attention

Assert the rebuilt sandbox runs the expected current OpenClaw version (test/e2e-scenario/live/upgrade-stale-sandbox.test.ts:147): The legacy migration contract includes “9. Verify the sandbox now runs the current OpenClaw version,” but the new Vitest test only proves the rebuilt sandbox no longer reports the old 2026.3.11 version. That can pass for any other non-current version and leaves the acceptance clause only partially covered.
- Recommendation: Compare the rebuilt `openclaw --version` output against the current expected OpenClaw version source used by the build, such as the `ARG OPENCLAW_VERSION` value in `Dockerfile.base`, or explicitly narrow the migration contract if “not old” is the intended behavior.
- Evidence: Legacy `test/e2e/test-upgrade-stale-sandbox.sh` lists clause 9 as “Verify the sandbox now runs the current OpenClaw version”; the new assertion is `expect(resultText(newVersion)).not.toContain(OLD_OPENCLAW_VERSION)`.

🔎 Worth checking

Avoid uploading the full local sandboxes registry (test/e2e-scenario/live/upgrade-stale-sandbox.test.ts:118): The test preserves any existing registry entries, adds the synthetic stale sandbox, and uploads the entire `~/.nemoclaw/sandboxes.json` file as `registered-stale-sandbox.json`. Clean CI runners are likely low-risk, but the live test supports local execution where pre-existing registry entries can contain legacy, corrupt, or credential-shaped fields. `ArtifactSink.writeText()` does not redact arbitrary text.
- Recommendation: Write an artifact containing only the synthetic `SANDBOX_NAME` entry needed for evidence, or redact/strip credential-shaped fields before writing the artifact.
- Evidence: `writeStaleRegistryEntry()` uses `registry.sandboxes = registry.sandboxes ?? {}` and inserts `registry.sandboxes[SANDBOX_NAME]`; `registeredStaleSandboxJson()` returns `fs.readFileSync(REGISTRY_FILE, "utf8")`; the test uploads that raw string via `artifacts.writeText("registered-stale-sandbox.json", registeredStaleSandboxJson())`.
Fail on malformed NemoClaw state JSON before seeding stale metadata (test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts:68): Missing state files are a valid fixture setup condition, but malformed or unreadable `sandboxes.json` / `onboard-session.json` is a different invalid state. The helper currently catches every read or parse failure and returns the fallback, which can silently replace corrupt local state and hide upstream install/onboard corruption.
- Recommendation: Return the fallback only when the file does not exist. If the file exists but cannot be read or parsed, throw a diagnostic error naming the corrupt file, or document and test why corrupt local state is intentionally tolerated.
- Evidence: `readJsonFile<T>()` catches all exceptions from `JSON.parse(fs.readFileSync(file, "utf8"))` and returns `fallback`; `writeStaleRegistryEntry()` uses this helper for both `SESSION_FILE` and `REGISTRY_FILE`.
Secret-bearing free-standing job expands the trusted-code boundary (.github/workflows/e2e-vitest-scenarios.yaml:1762): The new manually dispatched job runs branch-controlled installer, build, and Vitest code with Docker access, Docker Hub authentication, and `NVIDIA_INFERENCE_API_KEY` available to the test step. The patch adds useful mitigations, but this remains a high-risk trusted-code path.
- Recommendation: Keep this workflow manual-only and restricted to trusted maintainer dispatch, preserve the boundary validator coverage for secret placement and installer env unsets, and avoid adding `pull_request` / `pull_request_target` triggers or direct dispatch-input interpolation to this job.
- Evidence: `upgrade-stale-sandbox-vitest` authenticates to Docker Hub, builds the repo CLI, runs `scripts/install-openshell.sh`, and runs `npx vitest` with `NVIDIA_INFERENCE_API_KEY` in the environment. Mitigations include pinned actions, `persist-credentials: false`, `npm ci --ignore-scripts`, installer `env -u` unsets, isolated `DOCKER_CONFIG`, and cleanup.
Source-of-truth review needed for tolerant registry/session JSON parsing (test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts:68): The localized tolerant parsing behavior does not answer why malformed NemoClaw state should be handled here, where that invalid state is created, why the source cannot be fixed in this PR, what regression test proves the source cannot regress, or when the workaround can be removed.
- Recommendation: Either make malformed existing state impossible/fail-fast at this boundary, or document the invalid state and add targeted tests proving the intended behavior for malformed `sandboxes.json` and `onboard-session.json`.
- Evidence: `readJsonFile()` catches all exceptions and returns fallback; `writeStaleRegistryEntry()` uses it for both `SESSION_FILE` and `REGISTRY_FILE`.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — Rebuilt sandbox `openclaw --version` contains the current expected OpenClaw version parsed from `Dockerfile.base` or the same source used by the build.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
**Runtime validation** — Malformed `~/.nemoclaw/sandboxes.json` fails before seeding stale metadata with a diagnostic naming the corrupt file.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
**Runtime validation** — Malformed `~/.nemoclaw/onboard-session.json` fails before seeding stale metadata with a diagnostic naming the corrupt file.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
**Runtime validation** — `registered-stale-sandbox.json` contains only the synthetic `SANDBOX_NAME` entry or strips credential-shaped fields from pre-existing registry entries.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
**Runtime validation** — A live `upgrade-stale-sandbox` run validates stale detection, rebuild success, old `2026.3.11` absence, current version match, and final `upgrade-sandboxes --check` up-to-date output.. The changed behavior crosses workflow dispatch, Docker Hub authentication, installer execution, Docker image building, OpenShell sandbox lifecycle, host NemoClaw metadata mutation, live rebuild behavior, and artifact upload. Support tests validate workflow wiring but not the runtime behavior.
**Source-of-truth review needed for tolerant registry/session JSON parsing** — Either make malformed existing state impossible/fail-fast at this boundary, or document the invalid state and add targeted tests proving the intended behavior for malformed `sandboxes.json` and `onboard-session.json`.
**Acceptance clause:** Linked issue clauses/comments — add test evidence or identify existing coverage. The deterministic validation context reported `linkedIssues: []`; no linked issue body clauses or issue comments were available to extract literally.
**Acceptance clause:** 9. Verify the sandbox now runs the current OpenClaw version — add test evidence or identify existing coverage. The test reads `openclaw --version` after rebuild and asserts only that the output does not contain `OLD_OPENCLAW_VERSION`; it does not assert against the current expected OpenClaw version source.

Since last review details

Current findings:

Assert the rebuilt sandbox runs the expected current OpenClaw version (test/e2e-scenario/live/upgrade-stale-sandbox.test.ts:147): The legacy migration contract includes “9. Verify the sandbox now runs the current OpenClaw version,” but the new Vitest test only proves the rebuilt sandbox no longer reports the old 2026.3.11 version. That can pass for any other non-current version and leaves the acceptance clause only partially covered.
- Recommendation: Compare the rebuilt `openclaw --version` output against the current expected OpenClaw version source used by the build, such as the `ARG OPENCLAW_VERSION` value in `Dockerfile.base`, or explicitly narrow the migration contract if “not old” is the intended behavior.
- Evidence: Legacy `test/e2e/test-upgrade-stale-sandbox.sh` lists clause 9 as “Verify the sandbox now runs the current OpenClaw version”; the new assertion is `expect(resultText(newVersion)).not.toContain(OLD_OPENCLAW_VERSION)`.
Avoid uploading the full local sandboxes registry (test/e2e-scenario/live/upgrade-stale-sandbox.test.ts:118): The test preserves any existing registry entries, adds the synthetic stale sandbox, and uploads the entire `~/.nemoclaw/sandboxes.json` file as `registered-stale-sandbox.json`. Clean CI runners are likely low-risk, but the live test supports local execution where pre-existing registry entries can contain legacy, corrupt, or credential-shaped fields. `ArtifactSink.writeText()` does not redact arbitrary text.
- Recommendation: Write an artifact containing only the synthetic `SANDBOX_NAME` entry needed for evidence, or redact/strip credential-shaped fields before writing the artifact.
- Evidence: `writeStaleRegistryEntry()` uses `registry.sandboxes = registry.sandboxes ?? {}` and inserts `registry.sandboxes[SANDBOX_NAME]`; `registeredStaleSandboxJson()` returns `fs.readFileSync(REGISTRY_FILE, "utf8")`; the test uploads that raw string via `artifacts.writeText("registered-stale-sandbox.json", registeredStaleSandboxJson())`.
Fail on malformed NemoClaw state JSON before seeding stale metadata (test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts:68): Missing state files are a valid fixture setup condition, but malformed or unreadable `sandboxes.json` / `onboard-session.json` is a different invalid state. The helper currently catches every read or parse failure and returns the fallback, which can silently replace corrupt local state and hide upstream install/onboard corruption.
- Recommendation: Return the fallback only when the file does not exist. If the file exists but cannot be read or parsed, throw a diagnostic error naming the corrupt file, or document and test why corrupt local state is intentionally tolerated.
- Evidence: `readJsonFile<T>()` catches all exceptions from `JSON.parse(fs.readFileSync(file, "utf8"))` and returns `fallback`; `writeStaleRegistryEntry()` uses this helper for both `SESSION_FILE` and `REGISTRY_FILE`.
Secret-bearing free-standing job expands the trusted-code boundary (.github/workflows/e2e-vitest-scenarios.yaml:1762): The new manually dispatched job runs branch-controlled installer, build, and Vitest code with Docker access, Docker Hub authentication, and `NVIDIA_INFERENCE_API_KEY` available to the test step. The patch adds useful mitigations, but this remains a high-risk trusted-code path.
- Recommendation: Keep this workflow manual-only and restricted to trusted maintainer dispatch, preserve the boundary validator coverage for secret placement and installer env unsets, and avoid adding `pull_request` / `pull_request_target` triggers or direct dispatch-input interpolation to this job.
- Evidence: `upgrade-stale-sandbox-vitest` authenticates to Docker Hub, builds the repo CLI, runs `scripts/install-openshell.sh`, and runs `npx vitest` with `NVIDIA_INFERENCE_API_KEY` in the environment. Mitigations include pinned actions, `persist-credentials: false`, `npm ci --ignore-scripts`, installer `env -u` unsets, isolated `DOCKER_CONFIG`, and cleanup.
Source-of-truth review needed for tolerant registry/session JSON parsing (test/e2e-scenario/live/upgrade-stale-sandbox-helpers.ts:68): The localized tolerant parsing behavior does not answer why malformed NemoClaw state should be handled here, where that invalid state is created, why the source cannot be fixed in this PR, what regression test proves the source cannot regress, or when the workaround can be removed.
- Recommendation: Either make malformed existing state impossible/fail-fast at this boundary, or document the invalid state and add targeted tests proving the intended behavior for malformed `sandboxes.json` and `onboard-session.json`.
- Evidence: `readJsonFile()` catches all exceptions and returns fallback; `writeStaleRegistryEntry()` uses it for both `SESSION_FILE` and `REGISTRY_FILE`.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/e2e-scenario/live/upgrade-stale-sandbox.test.ts (1)

221-224: 💤 Low value

Consider validating the reason for exit code 1.

Accepting exit code 1 blindly could mask unexpected failures (e.g., permission errors, network issues). If exit code 1 specifically indicates "sandbox not found," consider asserting on the output to confirm.

-    expect(
-      deleteInstalledSandbox.exitCode === 0 || deleteInstalledSandbox.exitCode === 1,
-      resultText(deleteInstalledSandbox),
-    ).toBe(true);
+    if (deleteInstalledSandbox.exitCode !== 0) {
+      // Exit code 1 is expected when sandbox doesn't exist
+      expect(deleteInstalledSandbox.exitCode, resultText(deleteInstalledSandbox)).toBe(1);
+      expect(resultText(deleteInstalledSandbox)).toMatch(/not found|does not exist/i);
+    }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/live/upgrade-stale-sandbox.test.ts` around lines 221 - 224,
The test for deleteInstalledSandbox accepts exit code 1 without validating the
actual reason for failure, which could mask unexpected errors like permission
issues or network problems. Modify the expectation to first check if exitCode is
0, or if it is 1, then validate the output or error message using resultText or
similar output inspection to confirm the failure reason is specifically "sandbox
not found" before allowing the test to pass. This ensures only expected failures
are accepted while catching other types of errors.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 1698-1799: The free-standing workflow inventory file is missing
three required entries that will cause CI failures during the generate-matrix
validation step. Open tools/e2e-scenarios/free-standing-workflow-inventory.mts
and add upgrade-stale-sandbox-vitest to the allowed_jobs array,
upgrade-stale-sandbox to the free_standing_scenarios_csv string, and create the
mapping entry in free_standing_scenario_jobs_csv to connect the scenario to its
job. Additionally, on line 1781 in the upgrade-stale-sandbox-vitest job, the npx
vitest run command has excessive inline whitespace; refactor it to use backslash
line continuation for better readability and consistency with other multi-line
commands in the workflow.

---

Nitpick comments:
In `@test/e2e-scenario/live/upgrade-stale-sandbox.test.ts`:
- Around line 221-224: The test for deleteInstalledSandbox accepts exit code 1
without validating the actual reason for failure, which could mask unexpected
errors like permission issues or network problems. Modify the expectation to
first check if exitCode is 0, or if it is 1, then validate the output or error
message using resultText or similar output inspection to confirm the failure
reason is specifically "sandbox not found" before allowing the test to pass.
This ensures only expected failures are accepted while catching other types of
errors.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c8259484-c53b-4400-bf1d-f1bc59e1c496

📥 Commits

Reviewing files that changed from the base of the PR and between c4bd014 and 8aefe40.

📒 Files selected for processing (2)

.github/workflows/e2e-vitest-scenarios.yaml
test/e2e-scenario/live/upgrade-stale-sandbox.test.ts

coderabbitai · 2026-06-18T02:26:04Z

+  upgrade-stale-sandbox-vitest:
+    needs: generate-matrix
+    if: ${{ (inputs.jobs == '' && inputs.scenarios == '') || contains(format(',{0},', inputs.jobs), ',upgrade-stale-sandbox-vitest,') || contains(format(',{0},', inputs.scenarios), ',upgrade-stale-sandbox,') }}
+    runs-on: ubuntu-latest
+    timeout-minutes: 55
+    env:
+      FREE_STANDING_VITEST_JOB: "1"
+      FREE_STANDING_SCENARIO_ID: "upgrade-stale-sandbox"
+      E2E_ARTIFACT_DIR: ${{ github.workspace }}/e2e-artifacts/vitest/upgrade-stale-sandbox
+      NEMOCLAW_CLI_BIN: ${{ github.workspace }}/bin/nemoclaw.js
+      NEMOCLAW_RUN_E2E_SCENARIOS: "1"
+      NEMOCLAW_NON_INTERACTIVE: "1"
+      NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
+      NEMOCLAW_SANDBOX_NAME: "e2e-upgrade-stale"
+      OPENSHELL_GATEWAY: "nemoclaw"
+    steps:
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+        with:
+          persist-credentials: false
+
+      - name: Configure isolated Docker auth directory
+        run: echo "DOCKER_CONFIG=${RUNNER_TEMP}/docker-config-upgrade-stale-sandbox" >> "$GITHUB_ENV"
+
+      - name: Authenticate to Docker Hub
+        env:
+          DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
+          DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
+        shell: bash
+        run: |
+          set -euo pipefail
+          if [[ -z "${DOCKERHUB_USERNAME}" || -z "${DOCKERHUB_TOKEN}" ]]; then
+            echo "::notice::Docker Hub credentials not configured; continuing with anonymous pulls."
+            exit 0
+          fi
+          mkdir -p "${DOCKER_CONFIG}"
+          chmod 700 "${DOCKER_CONFIG}"
+          login_succeeded=0
+          for attempt in 1 2 3; do
+            if echo "${DOCKERHUB_TOKEN}" | timeout 30s docker login docker.io --username "${DOCKERHUB_USERNAME}" --password-stdin; then
+              login_succeeded=1
+              break
+            fi
+            if [[ "$attempt" -lt 3 ]]; then
+              echo "::warning::Docker Hub login attempt ${attempt} failed; retrying."
+              sleep 5
+            fi
+          done
+          if [[ "$login_succeeded" -ne 1 ]]; then
+            echo "::warning::Docker Hub login failed after 3 attempts; continuing with anonymous pulls."
+          fi
+
+      - name: Set up Node
+        uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.0.0
+        with:
+          node-version: 22
+          cache: npm
+
+      - name: Install root dependencies
+        run: npm ci --ignore-scripts
+
+      - name: Build CLI
+        run: npm run build:cli
+
+      - name: Install OpenShell CLI
+        run: bash scripts/install-openshell.sh
+
+      - name: Run upgrade stale sandbox live Vitest test
+        env:
+          NVIDIA_INFERENCE_API_KEY: ${{ secrets.NVIDIA_INFERENCE_API_KEY }}
+        run: |
+          set -euo pipefail
+          export PATH="$HOME/.local/bin:$HOME/.npm-global/bin:$PATH"
+          if command -v openshell >/dev/null 2>&1; then
+            OPENSHELL_BIN="$(command -v openshell)"
+          elif [ -x "$HOME/.local/bin/openshell" ]; then
+            OPENSHELL_BIN="$HOME/.local/bin/openshell"
+          else
+            echo "::error::OpenShell CLI not found after install"
+            ls -la /usr/local/bin/openshell "$HOME/.local/bin/openshell" 2>&1 || true
+            exit 1
+          fi
+          export OPENSHELL_BIN
+          "$OPENSHELL_BIN" --version
+          npx vitest run --project e2e-scenarios-live             test/e2e-scenario/live/upgrade-stale-sandbox.test.ts             --silent=false --reporter=default
+
+      - name: Upload upgrade stale sandbox artifacts
+        if: always()
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        with:
+          name: e2e-vitest-scenarios-upgrade-stale-sandbox
+          path: e2e-artifacts/vitest/upgrade-stale-sandbox/
+          include-hidden-files: false
+          if-no-files-found: ignore
+          retention-days: 14
+
+      - name: Clean up Docker auth
+        if: always()
+        run: |
+          set -euo pipefail
+          docker logout docker.io || true
+          rm -rf "${DOCKER_CONFIG}"
+


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify the free-standing inventory includes the new job and scenario echo "=== Checking free-standing inventory for upgrade-stale-sandbox ===" if fd -t f 'free-standing-workflow-inventory' tools/; then inventory_file=$(fd -t f 'free-standing-workflow-inventory' tools/ | head -1) echo "Found inventory file: $inventory_file" echo "" echo "--- Checking for job id ---" rg -n 'upgrade-stale-sandbox-vitest' "$inventory_file" || echo "WARNING: Job not found in inventory" echo "" echo "--- Checking for scenario id ---" rg -n 'upgrade-stale-sandbox[^-]' "$inventory_file" || echo "WARNING: Scenario not found in inventory" else echo "WARNING: Could not find free-standing-workflow-inventory file" fi

Repository: NVIDIA/NemoClaw

Length of output: 402

The new job is well-structured, but the free-standing inventory file is incomplete.

The upgrade-stale-sandbox-vitest job correctly implements isolated Docker auth, retry logic, and artifact handling. However, verification confirms that tools/e2e-scenarios/free-standing-workflow-inventory.mts is missing the required entries:

upgrade-stale-sandbox-vitest not in allowed_jobs

upgrade-stale-sandbox not in free_standing_scenarios_csv

Mapping not in free_standing_scenario_jobs_csv

Since the workflow's generate-matrix step validates inventory coverage, these missing entries will cause CI failures. Add all three required entries to the inventory file before merging.

Minor style note: Line 1781 has excessive inline whitespace in the vitest command. Consider using backslash line continuation for consistency.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/e2e-vitest-scenarios.yaml around lines 1698 - 1799, The free-standing workflow inventory file is missing three required entries that will cause CI failures during the generate-matrix validation step. Open tools/e2e-scenarios/free-standing-workflow-inventory.mts and add upgrade-stale-sandbox-vitest to the allowed_jobs array, upgrade-stale-sandbox to the free_standing_scenarios_csv string, and create the mapping entry in free_standing_scenario_jobs_csv to connect the scenario to its job. Additionally, on line 1781 in the upgrade-stale-sandbox-vitest job, the npx vitest run command has excessive inline whitespace; refactor it to use backslash line continuation for better readability and consistency with other multi-line commands in the workflow.

…ade-stale-sandbox

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

…andbox' into e2e-migrate/test-upgrade-stale-sandbox

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

…andbox' into e2e-migrate/test-upgrade-stale-sandbox Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): migrate upgrade stale sandbox to vitest

8aefe40

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 18, 2026

coderabbitai Bot reviewed Jun 18, 2026

View reviewed changes

cv added the v0.0.66 Release target label Jun 18, 2026

cv added 11 commits June 19, 2026 11:18

Merge branch 'main' into e2e-migrate/test-upgrade-stale-sandbox

96ad75a

Merge branch 'main' into e2e-migrate/test-upgrade-stale-sandbox

39cf882

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-upgr…

7256827

…ade-stale-sandbox

test(e2e): refresh upgrade stale sandbox workflow

0f8ccbd

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge remote-tracking branch 'origin/e2e-migrate/test-upgrade-stale-s…

bce7dac

…andbox' into e2e-migrate/test-upgrade-stale-sandbox

test(e2e): relax workflow inventory timeout

2354010

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): address stale upgrade review feedback

553645c

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): move scenario logic out of test wrapper

7cb5562

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge remote-tracking branch 'origin/e2e-migrate/test-upgrade-stale-s…

31a00b5

…andbox' into e2e-migrate/test-upgrade-stale-sandbox Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): restore scenario logic to test file

97d8faa

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): move upgrade stale setup branches to helpers

70cfd1e

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv merged commit c9953e9 into main Jun 20, 2026
40 checks passed

cv deleted the e2e-migrate/test-upgrade-stale-sandbox branch June 20, 2026 18:17

cv mentioned this pull request Jun 21, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

Conversation

cv commented Jun 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

❌ Failed checks (1 warning)

Uh oh!

github-code-quality Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage Overview

TypeScript / code-coverage/plugin

TypeScript / code-coverage/cli

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Vitest E2E Scenario Recommendation

Vitest E2E Scenario Advisor

Required Vitest E2E scenarios

Optional Vitest E2E scenarios

Relevant changed files

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cv commented Jun 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 18, 2026 •

edited

Loading

github-code-quality Bot commented Jun 18, 2026 •

edited

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading