[worktrial] Taste reward shaping #1618
Open
alliegu-fleet
wants to merge
131
commits into
NovaSky-AI:main
from
fleet-ai:taste-reward-shaping
671e50d
feat: add Fleet Task environment for skyrl-gym
35d9513
feat: add Fleet training integration with entrypoints, scripts, and c…
ae7934e
Add task generation environment for skyrl-gym
91776e1
Add hint augmentation support for Fleet task training
f1c3f1b
merge: resolve fleet_task + task_gen registration conflict
66bf318
Merge remote-tracking branch 'origin/fleet/training' into fleet/all
73251e1
merge: resolve hint-augmentation + task_gen config conflict
ff094d9
Add VL/CUA multimodal support ported from SkyRL PR #288
1af0a8e
Add GCP spot H200 option for VL YAML
9660f8c
Fix setup: use fsdp extra instead of non-existent vllm extra
ce0cef4
Fix causal-conv1d build: use pip instead of uv for CUDA extension
8b16d85
Fix cd path in run scripts for SkyRL-v2 repo layout
282f5fe
Add missing config fields for Fleet training overrides
d1342f4
Apply legacy config translation in Hydra entrypoints
1f196e2
Add inference_engine defaults to YAML for Hydra entrypoints
f223bba
Fix fleet_task double registration and task-gen data path
4b6c15e
Fix legacy config sync, registration, and task-gen data path
0dc3a17
Handle OmegaConf DictConfig in get_config_as_yaml_str
8393415
Replace @hydra.main with from_cli_overrides in Fleet entrypoints
061fcdc
Upgrade accelerate in extra-setup to fix _is_hf_initialized TypeError
d011232
Fix accelerate install: use --no-deps to avoid torch re-resolution
baa37a5
Patch Parameter.__new__ to fix _is_hf_initialized TypeError
acafea3
Fix config parsing and per-record env_class for fleet training
31e96c1
Fix task_gen rollout dir and OmegaConf struct flag for config overrides
e0bd62c
Export FLEET_API_KEY to Ray runtime env and improve task import error…
bfd547a
fix(task_gen): use data_source as fallback for env_key in extras
d5620dc
fix(35b): add --no-pytorch-alloc-conf to prevent vLLM CuMem crash
205252f
fix: sanitize multimodal content for text-only chat templates
3d4ad77
fix: update uids after hint augmentation extends trajectory_ids
0f921a7
fix: catch env.init failures in agent_loop to prevent training crash
35de0bb
fix: hardcode flash_attn=false in all fleet run scripts + add "$@" to…
763f056
fix: use [0.0] rollout_logprobs in env_init_error fallback (not None)
244f40d
fix: match reward format in env_init_error fallback
79700ee
add per-env task-gen launcher script
c1ceec0
fix: handle per-trajectory exceptions in generate() instead of crashi…
9d51e33
fix: multi-node FSDP2 stability + hint batch size for 35B training
09d2b43
docs: add CLAUDE.md and fleet changelog for multi-node fixes
b98b240
Add training trajectory logging + S3 upload
56cd76a
fix: add dump_training_trajectories to TrainerConfig dataclass
22d6ace
fix: match system prompt with old fork for VL parity
d40c5f7
fix: match system prompt with old fork for VL parity
0f023ed
Add data.env_filter for per-env dataset filtering at training time
9cf2d5d
fix: re-add --no-pytorch-alloc-conf for vLLM 0.18.0 CuMemAllocator co…
95450e2
docs: update changelog + CLAUDE.md for vLLM 0.18.0 CuMemAllocator fix
acf8783
merge fleet/training: re-add --no-pytorch-alloc-conf for vLLM 0.18.0
90af7d9
Pass data_key, data_version, env_version, env_variables through to pa…
5d7c878
chore: rename wandb project to fleet-tool-use-grpo
ee19cd9
chore: rename wandb project to fleet-tool-use-grpo
afb8875
merge fleet/training: rename wandb project
89e8799
fix: enable flash_attn for 35B training (OOM without it)
b115b31
fix: enable flash_attn for 35B training (OOM without it)
fd4be8e
merge fleet/training: enable flash_attn for 35B
2db6c69
Pass representative env_variables + env_variable_keys per-env to parquet
42fe8fd
feat(tinker): add stop-sequences, top-p, loss-fn args and fix avg_raw…
b0371f9
Merge branch 'fleet/training' into fleet/all
4a1b65d
fix: use async env methods to prevent event loop isolation
3a69f08
Port chunked lm_head forward + rewrite CHANGELOG as coherent document
3c7d2e1
Fix apply_overlong_filtering call signature
833a7ae
chore: reduce VL eval samples to 1 for faster iteration
3dca648
revert: restore eval_n_samples_per_prompt=3 for pass@3
786b2af
Switch to flash_attn=true + update docs with corrected diagnosis
a31d087
Fix logprobs/tokens shape mismatch and cap max_input_length
5e8ac67
fix: add retry logic to _execute_meta_tool for transient connection e…
3b2dc02
Pin vLLM 0.17.0 + re-enable expandable_segments + update docs
0f34391
chore: temporarily disable eval_before_train for training verification
9c92108
Revert vllm_engine.py to pre-0.18 for vLLM 0.17.0 compatibility
dd93554
Keep vLLM 0.18.0, reduce seq length to 72K, restore vllm_engine.py
5b7bb43
Update YAML MAX_INPUT_LENGTH to 72000 to match fleet-35b-run.sh
d7a6b48
chore: re-enable eval_before_train for production VL run
2d12453
chore: disable eval_before_train to verify backward pass
a004025
fix: parse all tool calls per turn + remove exploration gate
621ae49
Clarify --no-pytorch-alloc-conf mechanism in CHANGELOG
a50ca53
Switch to flash_attn=false — flash_attn=true causes Xid 31 with vLLM …
816064b
Re-enable eval_before_train for production VL run
32a8022
docs: update CHANGELOG fix #4 — flash_attn=false, verified working st…
54d74ad
docs: update CHANGELOG — 10 steps verified, checkpoint at step 10
ef7687f
fix: disable hints during training by default
36d7f32
feat: tool-call reward shaping + increase context to 65K
efe1fb0
feat: LLM classifier gate to filter broken tasks before harness
f653a30
fix: remove read-write mismatch check, use Sonnet 4.5 for classifier
3cfd44e
Merge pull request #6 from fleet-ai/fleet/all
dzorlu 4fa377c
chore: update workdir.ref to main for task-gen YAML
cf0098c
chore: update workdir.ref to main for VL and 35B YAMLs
9990be6
feat(task-gen): v4 reward hacking fixes — judge, exploration, schema …
dzorlu 690aad7
fix: update per-env launch script for v4
ffa5238
fix: use case statement for seed counts (bash compat)
b967f43
Reduce VL max_input_length from 128K to 96K to prevent OOM
519293d
Compact schema + remove describe_db tool
c6773a3
feat(task-gen): verifier hardening — exploration gate, anti-permissiv…
dzorlu 35efb8e
feat(task-gen): add 35B task-gen YAML and run script (#15)
dzorlu 086807e
feat: LLM-synthesized hints for failed trajectories
19ad98c
Enable partial_reward for VL training
b6174df
fix: use OpenRouter via litellm instead of direct Anthropic API
8cf2fd8
fix: use correct OpenRouter model ID for hint synthesis
c03cf79
Binary reward + truncate query_db responses
bb2e5de
Fix binary reward: restore base_quality + ablation config
1b08b60
Fix submission nudge: append to tool results, not dead branch
d89caf6
CLAUDE.md: report binary variance reward, not just pass@8
06cb395
v5.1: verifier dry-run, MCP tool prompt, earlier nudge, zero_variance…
8a8fbde
35b: baseline on v6, disable hints
d7d087e
VL training: add browser_use modality support, switch to v6 data
e9ebbef
fix: allow browser_use modality in fleet-common-setup.sh validation
1ff4065
fix: add 900s trajectory timeout to VL training
3109829
35b: use triton GDN prefill to avoid FlashInfer JIT hang
af81a43
Revert "35b: use triton GDN prefill to avoid FlashInfer JIT hang"
8a87a25
35b: enable eval_before_train for step 0 baseline
0dc943e
fix: CLAUDE.md primary branch is main, not fleet/all
9b7fc1f
fix: wire S3 upload for eval results after every eval
5a84fde
35b: eval_interval=10 (was 20)
e95a47b
feat: add checkpoint broadcast to workers for
sumi-fleet-hub d643e41
fix: gather checkpoint shards from workers before S3
sumi-fleet-hub 9f7137e
fix: increase broadcast timeout to 30min for large checkpoints
sumi-fleet-hub d1b9d87
fix: dynamic rsync timeout based on checkpoint size
sumi-fleet-hub bc184d7
Merge pull request #17 from fleet-ai/fix/multi-node-checkpoint
sumi-fleet-hub c7631c7
VL v1: lr 5e-7, max_turns 64, eval_before_train false
6a5a81a
VL v1.1: max_input_length 96K → 72K (fix NaN gradients)
179b23c
Revert "VL v1.1: max_input_length 96K → 72K (fix NaN gradients)"
3360624
VL v2: max_input_length 96K → 80K (fix NaN gradients)
2718251
VL v3: max_input_length 64K, zero_variance_filter=true
3b9a85e
feat: port HybridEnvSampler from SkyRL-archived
7fc32ab
VL: 2-node, batch_size=50, min_samples_per_env=2 (#18)
dzorlu 1ab8ef4
feat: add Fleet eval-only entrypoint with S3 checkpoint resume
sumi-fleet-hub 0865c8a
eval_before_train=true for checkpoint resume eval
62d4b73
Merge pull request #19 from fleet-ai/feat/eval-only-entrypoint
sumi-fleet-hub 5dfd198
Prioritize RunPod reserved H200s in SkyPilot task configs
e64bc7e
VL: increase max_input_length to 80K for longer browser trajectories
6ad8c76
VL: increase max_turns 64→80 for browser-use turn limit ablation
84b49de
feat: save screenshots in trajectory dumps for VL training
646d5a9
feat: save screenshots in eval trajectory dumps too
abf4008
VL: set eval_before_train=false to skip 10h eval overhead
9e9f648
Add taste-reward shaping on top of main (rebased)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| # SkyRL-v2 (fleet-ai/SkyRL-v2) | ||
|
|
||
| Fork of SkyRL with Fleet-specific optimizations for multi-node FSDP2 training at scale. | ||
|
|
||
| ## Fleet Integration | ||
|
|
||
| Fleet-specific changes, fixes, and context are documented in: | ||
| - **[integrations/fleet/CHANGELOG.md](integrations/fleet/CHANGELOG.md)** — detailed changelog with root causes and fixes | ||
|
|
||
| Always consult the changelog before modifying Fleet training paths (`fsdp_worker.py`, `worker.py`, `model_wrapper.py`, `dispatch.py`, `fleet-*.sh`). | ||
|
|
||
| ## Key Differences from Upstream SkyRL | ||
|
|
||
| 1. **Multi-node FSDP2 stability**: Synchronous ref model offload/backload with `torch.distributed.barrier()` in `fsdp_worker.py`. Required because cross-node colocated training has no shared CUDA context. | ||
|
|
||
| 2. **Chunked lm_head forward**: `model_wrapper.py` has `loss_chunk_size` support ported from the old fork. It avoids materializing the full `(B, S, vocab_size)` logits, which is critical for 35B with a 131K vocab at 97K sequence length. Without it, the training forward pass hits OOM or Xid 31. | ||
|
|
||
| 3. **CUDA memory management for 35B**: `torch.cuda.empty_cache()` before backward pass in `worker.py` (policy + critic). Prevents OOM from fragmentation. | ||
|
|
||
| 4. **Reduced sequence length (72K) for 35B**: `fleet-35b-run.sh` uses `MAX_INPUT_LENGTH=72000` (down from 96000) with `--no-pytorch-alloc-conf`, which disables `expandable_segments` because it conflicts with vLLM 0.18.0's `CuMemAllocator`. At 97K, SDPA ran out of memory and flash_attn hit Xid 31 in GatedDeltaNet. At 72K, flash_attn=true + chunked lm_head + empty_cache fits without expandable_segments. | ||
|
|
||
| 5. **`stage_chunks` pre-staging**: `dispatch.py` has a `stage_chunks` optimization (not in upstream) that pre-stages mini-batch chunks in Ray object store. Includes dynamic `mini_batch_size` adjustment for hint augmentation's variable batch sizes. | ||
|
|
||
| ## Training Scripts | ||
|
|
||
| - `scripts/fleet-common-run.sh` — shared infra (Ray, NCCL, gIB detection, deps). Used by all runs. | ||
| - `scripts/fleet-35b-run.sh` — Qwen3.5-35B config. Calls `fleet-common-run.sh`. | ||
| - `scripts/fleet-9b-run.sh` — Qwen3.5-9B config. Calls `fleet-common-run.sh`. | ||
|
|
||
| All training flags live in these scripts. Never duplicate flags in SkyPilot YAMLs or fleet-research scripts. | ||
|
|
||
| ## Task-Gen Metrics | ||
|
|
||
| When reporting task-gen training metrics, distinguish between: | ||
| - **pass@8 / avg_raw_reward**: includes `base_quality=0.1` for passing sandbox+judge. Misleading — inflated by gate-passing alone. | ||
| - **binary variance reward**: the actual learning signal. `1.0` when solver rollouts are mixed (at least 1 pass + 1 fail), `0.0` otherwise. This is what matters. | ||
|
|
||
| Report binary variance reward count (how many tasks got `reward >= 1.0`) separately from gate-pass count. Check `EVAL` log lines for `total=1.0000` vs `total=0.0000`. | ||
|
|
||
| ## Branch | ||
|
|
||
| Primary development branch: `main` |
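The binary variance reward described under Task-Gen Metrics can be sketched in a few lines. This is a minimal illustration of the stated rule (`1.0` when solver rollouts are mixed, `0.0` otherwise), not the repository's implementation; the function names `binary_variance_reward` and `summarize` and the input dictionaries are hypothetical.

```python
from typing import Dict, Sequence


def binary_variance_reward(solver_passes: Sequence[bool]) -> float:
    """Return 1.0 when rollouts are mixed (at least one pass and one fail), else 0.0."""
    has_pass = any(solver_passes)
    has_fail = not all(solver_passes)
    return 1.0 if has_pass and has_fail else 0.0


def summarize(
    rollouts: Dict[str, Sequence[bool]],
    gate_passed: Dict[str, bool],
) -> Dict[str, int]:
    """Count learning-signal tasks separately from gate-passing tasks.

    Both arguments are hypothetical inputs keyed by task id.
    """
    signal = sum(
        1 for passes in rollouts.values() if binary_variance_reward(passes) >= 1.0
    )
    gates = sum(1 for ok in gate_passed.values() if ok)
    return {"binary_variance_tasks": signal, "gate_pass_tasks": gates}


if __name__ == "__main__":
    rollouts = {
        "task_a": [True, False, True, False],  # mixed: learning signal
        "task_b": [True, True, True, True],    # all pass: no signal
        "task_c": [False, False, False, False],  # all fail: no signal
    }
    gate_passed = {"task_a": True, "task_b": True, "task_c": True}
    print(summarize(rollouts, gate_passed))
```

In this toy input all three tasks pass the gate but only one carries a learning signal, which is the gap the guidance above asks reports to make visible.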
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,147 @@ | ||
| # Taste-Judge GRPO Launch Recipe | ||
|
|
||
| Wires `research/judge/judge.py` into the SkyRL Fleet GRPO training loop. | ||
| Reward shape is **GATED TASTE**: | ||
|
|
||
| ``` | ||
| effective_taste = max(taste_floor, taste_score) # 1.0 if judge fails / None | ||
| reward = verifier_reward * effective_taste | ||
| ``` | ||
|
|
||
| The taste multiplier is applied only on the terminal step of each rollout, with a 10s judge timeout | ||
| and a verifier-only fallback (`effective_taste = 1.0`, so reward collapses to | ||
| `verifier_reward`) on timeout, exception, or a `None` score. | ||
|
|
||
| ### Why gated > additive | ||
|
|
||
| The previous additive shape `R = alpha * verifier + (1-alpha) * taste` | ||
| rewarded "pretty failures" — a trajectory that fails the verifier (v=0) | ||
| but narrates clean intent (t high) earned `(1-alpha) * t > 0`, which | ||
| incentivized the policy to learn good-looking failure modes. Gated taste | ||
| closes this hack: `verifier=0` forces `reward=0` regardless of taste, so | ||
| there is zero gradient toward pretty-failure mimicry. Among successes, | ||
| ugly successes still earn `floor * verifier` (default `floor=0.1`) so GRPO | ||
| sees within-group taste variance and can prefer pretty successes; setting | ||
| `floor=1.0` collapses the shape to pure verifier and serves as a clean | ||
| ablation baseline. **The floor is set to 0.1 (not 0.3) because offline | ||
| analysis showed mean rescaled taste of verifier=1 trajectories is ~0.13; | ||
| floor=0.3 would clip nearly all successes and kill within-group variance. | ||
| Re-tune floor after a 50-100 step pilot using the empirical effective_taste | ||
| P25 logged in WandB.** | ||
|
|
||
| ## One-block launch | ||
|
|
||
| ```bash | ||
| # 0. From your machine: | ||
| cd /tmp && rm -rf skyrl-fleet && git clone https://github.com/fleet-ai/skyrl-fleet.git | ||
| cd /tmp/skyrl-fleet | ||
|
|
||
| # 1. Apply the env patch (adds taste_floor config, _apply_taste_reward helper, | ||
| # and updates the three terminal returns + get_metrics). | ||
| git apply /Users/alliegu/Desktop/fleet/integration/env.py.diff | ||
|
|
||
| # 2. Vendor the taste-judge package into the workdir Python path. | ||
| cp -r /Users/alliegu/Desktop/fleet/integration/skyrl_taste skyrl-gym/skyrl_taste | ||
| cp -r /Users/alliegu/Desktop/fleet/research/judge research/judge | ||
|
|
||
| # 3. Drop the new YAML config into tasks/. | ||
| cp /Users/alliegu/Desktop/fleet/integration/configs/openenv-fleet-grpo-vl-taste.yaml \ | ||
| tasks/openenv-fleet-grpo-vl-taste.yaml | ||
|
|
||
| # 4. Sky launch with the new yaml + new env vars (judge keys are NEW; the rest | ||
| # are unchanged from the existing VL launch). | ||
| sky launch tasks/openenv-fleet-grpo-vl-taste.yaml \ | ||
| --env FLEET_API_KEY="$FLEET_API_KEY" \ | ||
| --env WANDB_API_KEY="$WANDB_API_KEY" \ | ||
| --env AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \ | ||
| --env AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \ | ||
| --env ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \ | ||
| --env OPENAI_API_KEY="$OPENAI_API_KEY" | ||
| ``` | ||
|
|
||
| ## Required env vars | ||
|
|
||
| - `ANTHROPIC_API_KEY` — **required**. Default judge backend (Claude via | ||
| `research/judge/judge.py`). Without it the judge import fails and the env | ||
| silently falls back to verifier-only reward (you'll see | ||
| `taste_judge_failed=True` in WandB). | ||
| - `OPENAI_API_KEY` — **only required if running inter-rater agreement | ||
| passes** (GPT-4o judge for cross-checking Claude scores during eval). Not | ||
| needed for the standard training run. | ||
| - `FLEET_API_KEY`, `WANDB_API_KEY`, `AWS_ACCESS_KEY_ID`, | ||
| `AWS_SECRET_ACCESS_KEY` — same as the upstream VL launch. | ||
|
|
||
| **Important:** Invoke `judge.py` with `blind_outcome=True` at training time | ||
| to suppress outcome bleed (Stream 4 finding — when the judge sees the | ||
| verifier outcome, taste scores correlate ~0.7 with verifier and the | ||
| shaping signal collapses to a noisy duplicate of the binary reward). The | ||
| async wrapper in `skyrl_taste/judge.py` handles this; double-check the | ||
| flag is forwarded if you swap the wrapper. | ||
|
|
||
| ## WandB metrics to watch | ||
|
|
||
| - `reward/train/mean` — gated reward; bounded above by verifier mean. | ||
| - `env/taste_reward` — judge's [0,1] raw score per trajectory. | ||
| - `env/effective_taste` — `max(floor, taste_reward)`; what actually | ||
| multiplies the verifier. | ||
| - `env/verifier_reward` — raw binary verifier per trajectory. | ||
| - `env/taste_floor` — the configured floor; sanity-check. | ||
| - `env/taste_judge_failed` — should stay near 0; spikes mean Anthropic | ||
| outage or judge parse failures (auto-fallback to pure verifier engaged). | ||
| - **Cross-check**: within each rollout group, plot Pearson(`taste_reward`, | ||
| `verifier_reward`). If the correlation falls below ~0.3, the judge is | ||
| scoring a different signal than the verifier. That is the expected case | ||
| and where the shaped-reward gradient comes from. If it climbs above | ||
| ~0.7, suspect outcome bleed (re-verify `blind_outcome=True`). | ||
| - `reward/train/variance_per_prompt` and `signal_ratio` (from | ||
| `integrations/fleet/reward_metrics.py`) should *increase* relative to a | ||
| verifier-only baseline on groups with mixed pretty/ugly successes. | ||
|
|
||
| ## Rollback | ||
|
|
||
| **Runtime kill switch (no redeploy):** | ||
| ```bash | ||
| sky exec <cluster> "echo SKYRL_TASTE_DISABLED=1 >> ~/.bashrc && pkill -HUP -f main_fleet" | ||
| # or update the SkyPilot env block and re-launch with --env SKYRL_TASTE_DISABLED=1 | ||
| ``` | ||
| This makes `score_trajectory_async` return `None`, the env's | ||
| `effective_taste` becomes `1.0`, and reward collapses to pure verifier. | ||
|
|
||
| **Full revert (reverse-apply the patch):** | ||
| ```bash | ||
| cd /tmp/skyrl-fleet | ||
| git apply -R /Users/alliegu/Desktop/fleet/integration/env.py.diff | ||
| rm -rf skyrl-gym/skyrl_taste research/judge | ||
| ``` | ||
|
|
||
| ## Two-knob ablation (floor x grpo_norm_by_std) | ||
|
|
||
| | floor \ grpo_norm_by_std | true (default) | false (recommended w/ gated taste) | | ||
| |---|---|---| | ||
| | 0.0 (pure multiplicative) | Ugly successes get R=0; group std collapses on all-ugly groups. Heavy gradient damping; expect slow learning. | Same dynamics, undamped; risk of policy ignoring ugly successes entirely. | | ||
| | 0.1 | Tiny within-success variance; std-norm wipes most of the gradient. | Tight bonus for pretty successes; conservative shaping. | | ||
| | 0.1 (default) | Tiny within-success variance from floor itself; std-norm still wipes most of the gradient. | **Headline candidate.** Multiplicative-with-cushion; closes hack and matches the empirical taste distribution. | | ||
| | 0.3 | Within-success std damped; offline data shows nearly all successes clip to floor — kills the signal. | Heavier shaping; only sensible if live taste distribution skews high. | | ||
| | 0.5 | Floor close to pretty-mid; less taste differentiation among successes. | Shallower shaping; useful as sensitivity check. | | ||
| | 1.0 (pure verifier) | **Identical to upstream baseline.** A/B control, no taste in std. | Identical to upstream too (no taste in std). | | ||
|
|
||
| Recommended order: run cell `(0.1, false)` first as the headline candidate, | ||
| then `(0.1, true)` to measure the std-norm effect, then `(1.0, true)` as | ||
| the upstream baseline. `(0.0, false)` is a diagnostic: confirms the gate | ||
| itself bites (ugly successes get zero) without floor compensation. | ||
|
|
||
| ## Risks / gotchas | ||
|
|
||
| - **Judge latency budget**: `n_samples_per_prompt=4` x `train_batch_size=50` | ||
| = ~200 concurrent judge calls per training step, each with a 10s timeout. | ||
| Anthropic rate limits will throttle you before the GPU does. Watch | ||
| `taste_judge_failed`: a sustained rate above 10% means raise the rate limit or batch the calls. | ||
| - **Reward range**: gated reward is in `[0, 1]` (same as verifier), so the | ||
| pass@n threshold (`reward >= 1.0` in `reward_metrics.py:79-82`) only | ||
| triggers when `verifier=1` and `effective_taste=1.0`. With `floor=0.1`, | ||
| that means `taste_score=1.0` or the judge-failure fallback. **Pass@n will look | ||
| worse than verifier-only**; report it alongside the new gated-reward | ||
| mean, and consider plotting `verifier_reward >= 1.0` as a separate | ||
| pass@n line for direct comparison to the baseline. | ||
| - **Outcome bleed**: confirmed Stream 4 risk if the judge ever sees the | ||
| verifier outcome. Keep `blind_outcome=True` in `score_trajectory_async`. | ||
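The gated shape and its verifier-only fallback, as described in this recipe, can be sketched as follows. This is an illustrative reading of the formulas in the document, not the code in the env patch: `gated_reward`, `additive_reward`, and `score_with_fallback` are hypothetical names, `alpha=0.5` is an arbitrary value chosen for the comparison, and the `judge` callable stands in for whatever async wrapper actually calls the judge.

```python
import asyncio
from typing import Awaitable, Callable, Optional


def gated_reward(
    verifier_reward: float,
    taste_score: Optional[float],
    taste_floor: float = 0.1,
) -> float:
    """reward = verifier_reward * max(taste_floor, taste_score).

    A taste_score of None (judge failure) gives effective_taste = 1.0,
    so the reward collapses to the verifier reward.
    """
    effective_taste = 1.0 if taste_score is None else max(taste_floor, taste_score)
    return verifier_reward * effective_taste


def additive_reward(verifier_reward: float, taste_score: float, alpha: float = 0.5) -> float:
    """Previous additive shape, kept only for comparison (alpha is illustrative)."""
    return alpha * verifier_reward + (1.0 - alpha) * taste_score


async def score_with_fallback(
    judge: Callable[[], Awaitable[Optional[float]]],
    timeout_s: float = 10.0,
) -> Optional[float]:
    """Return the judge score, or None on timeout or any judge exception."""
    try:
        return await asyncio.wait_for(judge(), timeout=timeout_s)
    except Exception:  # timeouts raised by wait_for are also caught here
        return None


if __name__ == "__main__":
    # A "pretty failure": verifier fails, judge likes the narration.
    print("additive:", additive_reward(0.0, 0.9))  # positive, rewards the failure
    print("gated:   ", gated_reward(0.0, 0.9))     # zero, gate closes the hack

    async def slow_judge() -> Optional[float]:
        await asyncio.sleep(1.0)  # stand-in for a judge call that exceeds the budget
        return 0.8

    score = asyncio.run(score_with_fallback(slow_judge, timeout_s=0.05))
    print("fallback reward:", gated_reward(1.0, score))
```

Setting `taste_floor=1.0` in this sketch makes `gated_reward` return the verifier reward unchanged, which mirrors the pure-verifier ablation cell in the table above.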
The launch and rollback instructions include hardcoded, user-specific absolute paths (e.g., `/Users/alliegu/...`). This prevents other developers from following these instructions directly. Please replace them with relative paths from the repository root, or use placeholders like `<path-to-repo>`, to make the documentation reproducible.