Add GPU-emulator (mirage/rocjitsu) environment axis for AORTA workloads #227
Open
vivekkhandelwal1 wants to merge 5 commits into
Run AORTA workloads/triage cells on a software-emulated GPU (mirage control plane + rocjitsu) with no physical GPU, selected via the environment axis, for hardware-free dev / CI / functional-correctness.

- `Environment` gains optional `emulator` / `mirage_profile` fields (peers of docker/venv/buck_target), threaded into `_aorta_environment` via the existing dispatcher asdict path. Allow-lists updated in the entry-point registry and JSON sidecar loader; built-in `emulated-rocjitsu` environment added.
- New `aorta.emulation.mirage_launch`: turns an emulated environment into `mirage run --profile <p> -- <argv>`; non-emulated argv is returned byte-for-byte unchanged. `$MIRAGE_BIN` resolution; loud errors.
- `SubprocessWorkload` (aorta probe) opt-in: wraps its argv through mirage when the cell's environment is emulated.
- New single-process `gpu_smoke` workload (trivial CUDA kernel + verify; min_world_size=1) + `recipes/gpu-smoke-emulated.yaml`: a hardware-free emulator/CI smoke test.
- docs + tests (no GPU required).

Validated end-to-end on an emulated MI350X (no physical GPU): `mirage run --profile rocjitsu-MI350X -- aorta triage run --recipe recipes/gpu-smoke-emulated.yaml` -> matrix.md, 0% failure.

Co-authored-by: Cursor <cursoragent@cursor.com>
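The argv wrapping described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's code: the function names `resolve_mirage_bin` and `wrap_argv`, the rule that a non-`None` profile means "emulated", and the choice to fail when `$MIRAGE_BIN` is unset are all assumptions made for the example.

```python
import os
from typing import List, Mapping, Optional, Sequence


def resolve_mirage_bin(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the mirage executable named by $MIRAGE_BIN (hypothetical helper).

    Assumption: an unset or empty variable is an error, so an emulated cell
    never falls through to running on real hardware.
    """
    env = os.environ if env is None else env
    mirage_bin = env.get("MIRAGE_BIN", "").strip()
    if not mirage_bin:
        raise RuntimeError(
            "MIRAGE_BIN is not set; refusing to launch an emulated cell without mirage"
        )
    return mirage_bin


def wrap_argv(
    argv: Sequence[str],
    mirage_profile: Optional[str],
    mirage_bin: str = "mirage",
) -> List[str]:
    """Wrap argv as `mirage run --profile <p> -- <argv>` when a profile is set."""
    if mirage_profile is None:
        # Non-emulated launch: hand the arguments back unchanged.
        return list(argv)
    return [mirage_bin, "run", "--profile", mirage_profile, "--", *argv]


wrapped = wrap_argv(
    ["aorta", "triage", "run", "--recipe", "recipes/gpu-smoke-emulated.yaml"],
    "rocjitsu-MI350X",
)
print(" ".join(wrapped))
```

Keeping the passthrough branch free of any mirage lookup means a non-emulated launch never depends on mirage being installed.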
Pull request overview
This PR adds a new GPU-emulator environment axis to AORTA so workloads (and probe-mode subprocess launches) can be run under the mirage + rocjitsu software GPU emulator, enabling hardware-free development/CI runs. It also introduces a minimal gpu_smoke workload and an emulation-focused recipe to validate the end-to-end triage path without a physical GPU.
Changes:
- Extend `Environment` (and environment registries/sidecars) with `emulator` and `mirage_profile`, plus a built-in `emulated-rocjitsu` environment.
- Add `aorta.emulation.mirage_launch` to detect emulated environments and wrap subprocess argv as `mirage run --profile … -- <argv>`, and integrate this into `SubprocessWorkload` setup.
- Add `gpu_smoke` workload + `recipes/gpu-smoke-emulated.yaml` and supporting documentation/tests for the new emulation path.
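As a rough illustration of what "allowing" the new keys in a JSON sidecar could look like, the sketch below checks a sidecar environment against an allow-list. The loader name `load_sidecar_environment`, the constant `ALLOWED_ENVIRONMENT_KEYS`, and the flat JSON shape are assumptions for the example; the actual sidecar schema in `src/aorta/registry/sidecar.py` is not shown in this PR page.

```python
import json
from typing import Any, Dict

# Hypothetical allow-list: the three existing peers named in the PR plus the two new keys.
ALLOWED_ENVIRONMENT_KEYS = frozenset(
    {"docker", "venv", "buck_target", "emulator", "mirage_profile"}
)


def load_sidecar_environment(text: str) -> Dict[str, Any]:
    """Parse a JSON environment object and reject keys outside the allow-list."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise TypeError("sidecar environment must be a JSON object")
    unknown = sorted(set(raw) - ALLOWED_ENVIRONMENT_KEYS)
    if unknown:
        raise ValueError(f"unknown environment keys: {unknown}")
    return raw


env = load_sidecar_environment(
    '{"emulator": "rocjitsu", "mirage_profile": "rocjitsu-MI350X"}'
)
print(env["mirage_profile"])
```

Rejecting unknown keys up front turns a typo such as `mirage_profil` into an immediate error rather than a silently non-emulated run.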
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/emulation/test_mirage_launch.py | New tests covering Environment round-trip, sidecar keys, emulation detection, mirage bin resolution, argv wrapping, and SubprocessWorkload opt-in behavior. |
| tests/emulation/__init__.py | Adds emulation test package marker. |
| src/aorta/workloads/gpu_smoke.py | New single-process GPU smoke workload for emulator/CI validation. |
| src/aorta/workloads/_subprocess.py | Wraps subprocess argv via mirage when the resolved environment is emulated. Also includes an unintended Tier-3 knob regression, noted in the review comments. |
| src/aorta/registry/types.py | Adds emulator and mirage_profile fields to Environment and documents the new axis. |
| src/aorta/registry/sidecar.py | Allows emulator / mirage_profile keys in JSON sidecar environments. |
| src/aorta/registry/environments.py | Allows new keys and adds built-in emulated-rocjitsu environment. |
| src/aorta/emulation/mirage_launch.py | New emulation launch helper module: environment detection, mirage binary resolution, argv wrapping. |
| src/aorta/emulation/__init__.py | Exposes emulation helpers at the package level. |
| recipes/gpu-smoke-emulated.yaml | New recipe demonstrating triage execution under emulation. |
| pyproject.toml | Registers gpu_smoke workload entry point. |
| docs/plans/mirage-aorta-integration.md | Design/usage documentation for emulated GPU execution. |
- `_subprocess.py`: the prior commit copied an older revision of this file, inadvertently reverting newer main features (e.g. `tier3_vram_growth` TrialContext wiring, `_terminate_process_tree`). Rebuild on main's current file and apply ONLY the emulation opt-in, now guarded so the non-emulated path is a true zero-cost no-op (no import / no extra work) and existing probes are byte-for-byte unchanged.
- `gpu_smoke.py`: use a tolerance-based comparison (`math.isclose`) instead of exact float equality, which is brittle for float16/bfloat16 / larger sizes.

Co-authored-by: Cursor <cursoragent@cursor.com>
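The move from exact equality to a tolerance check can be illustrated with a small standalone sketch. The helper name `result_matches` and the tolerance values are assumptions for the example, not the values used in `gpu_smoke.py`.

```python
import math


def result_matches(actual: float, expected: float,
                   rel_tol: float = 1e-3, abs_tol: float = 1e-6) -> bool:
    """Compare with a tolerance instead of `==` (tolerances are illustrative)."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)


# Exact equality already fails for ordinary double-precision arithmetic.
exact_equal = (0.1 + 0.2) == 0.3
tolerant_equal = result_matches(0.1 + 0.2, 0.3)

# A value slightly off from 0.1, as a reduced-precision dtype might produce.
low_precision = 0.0999755859375
low_precision_ok = result_matches(low_precision, 0.1)

# A genuinely wrong result is still rejected.
corrupted_ok = result_matches(0.2, 0.1)

print(exact_equal, tolerant_equal, low_precision_ok, corrupted_ok)
```

The tolerance has to be loose enough for the lowest-precision dtype under test while still rejecting corrupted output, which is why both cases appear in the sketch.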
- `gpu_smoke`: default `steps` via explicit `is None` so an intentional `steps: 0` is honored instead of being treated as missing (falsy-0).
- Add dependency-free `tests/workloads/test_gpu_smoke.py` (stubs a minimal fake torch): cuda-availability gate, steps defaulting incl. explicit 0, pass/fail tolerance, and corruption (out-of-tolerance) detection.

Co-authored-by: Cursor <cursoragent@cursor.com>
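The falsy-0 pitfall fixed here is easy to reproduce in isolation. In the sketch below, `DEFAULT_STEPS` and both helper names are hypothetical; only the `is None` pattern itself comes from the commit message.

```python
from typing import Any, Mapping

DEFAULT_STEPS = 10  # hypothetical default, not taken from gpu_smoke.py


def steps_with_or(config: Mapping[str, Any]) -> int:
    """Buggy pattern: `or` treats an explicit 0 as missing."""
    return config.get("steps") or DEFAULT_STEPS


def steps_with_is_none(config: Mapping[str, Any]) -> int:
    """Fixed pattern: only a missing (None) value falls back to the default."""
    steps = config.get("steps")
    return DEFAULT_STEPS if steps is None else steps


print(steps_with_or({"steps": 0}))       # explicit 0 is overridden by the default
print(steps_with_is_none({"steps": 0}))  # explicit 0 is honored
print(steps_with_is_none({}))            # missing key falls back to the default
```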
…ields

Adding `emulator`/`mirage_profile` to `Environment` adds two keys to `asdict(Environment(...))`. Update the existing assertions that pin the exact asdict shape so they include `emulator: None` / `mirage_profile: None`:

- tests/registry/test_environments.py (pure-buck asdict)
- tests/run/test_dispatcher.py (docker/buck round-trip + buck/image override preservation: the `_aorta_environment` payload).

Co-authored-by: Cursor <cursoragent@cursor.com>
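Why the existing assertions had to change can be seen with a stand-in dataclass. The `Environment` below is a simplified assumption that carries only the five fields named in this PR (the real class in `src/aorta/registry/types.py` may have more), and the buck target string is a placeholder.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    """Simplified stand-in for the registry type; field set is an assumption."""
    docker: Optional[str] = None
    venv: Optional[str] = None
    buck_target: Optional[str] = None
    # New optional fields: default to None so existing environments are unaffected.
    emulator: Optional[str] = None
    mirage_profile: Optional[str] = None


pure_buck = asdict(Environment(buck_target="//example:target"))

# A test that pins the exact dict shape must now list the two new keys as None.
expected = {
    "docker": None,
    "venv": None,
    "buck_target": "//example:target",
    "emulator": None,
    "mirage_profile": None,
}
print(pure_buck == expected)
```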
- `gpu_smoke`: validate `dtype` and raise on an unknown/typo value (listing allowed values) instead of silently defaulting to float32, which could mask a misconfigured run as green. Add a unit test for the raise.
- `types.py`: mention `emulator` (optional hint paired with `mirage_profile`) in the Environment docstring's first paragraph so it matches the schema.

Co-authored-by: Cursor <cursoragent@cursor.com>
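A minimal sketch of the "raise instead of silently defaulting" behavior follows. The allowed set and the function name `validate_dtype` are assumptions chosen from the dtypes mentioned in this PR; the real workload may accept a different list.

```python
from typing import Tuple

# Hypothetical allow-list built from the dtypes named in this PR.
ALLOWED_DTYPES: Tuple[str, ...] = ("float32", "float16", "bfloat16")


def validate_dtype(name: str) -> str:
    """Return the dtype name if known; raise with the allowed values otherwise."""
    if name not in ALLOWED_DTYPES:
        allowed = ", ".join(ALLOWED_DTYPES)
        raise ValueError(f"unknown dtype {name!r}; allowed values: {allowed}")
    return name


print(validate_dtype("bfloat16"))

try:
    validate_dtype("flaot16")  # typo: must not silently become float32
except ValueError as exc:
    print(exc)
```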
Summary
- New environment axis (`emulator`/`mirage_profile`) so AORTA workloads and triage cells can run on a software-emulated GPU (the mirage control plane + rocjitsu emulator) with no physical GPU, for hardware-free dev / CI / functional-correctness. Peers of `docker`/`venv`/`buck_target`; threaded into `_aorta_environment` via the existing dispatcher path (no dispatcher change). Registry + JSON sidecar loader accept the new keys; built-in `emulated-rocjitsu` environment added.
- `aorta.emulation.mirage_launch`: turns an emulated environment into `mirage run --profile <p> -- <argv>`; non-emulated launches are returned byte-for-byte unchanged. `$MIRAGE_BIN` resolution; fails loudly rather than silently running on real hardware.
- `aorta probe` (`SubprocessWorkload`) opt-in: wraps its argv through mirage when the cell's environment is emulated.
- `gpu_smoke` workload (trivial CUDA kernel + verify; `min_world_size=1`) + `recipes/gpu-smoke-emulated.yaml`: a hardware-free emulator/CI smoke test.

Test plan
- `tests/emulation/` (18 tests, no GPU required): `Environment` round-trips the new keys, built-in `emulated-rocjitsu` resolves, sidecars accept the keys, emulation detection, argv wrapping + passthrough, `$MIRAGE_BIN` resolution + error paths, `SubprocessWorkload` opt-in wrap.
- `mirage run --profile rocjitsu-MI350X -- aorta triage run --recipe recipes/gpu-smoke-emulated.yaml` → `matrix.md`, 0% failure.

Notes / limitations
Made with Cursor