ci: add bench-baseline workflow_dispatch (C-g step 3 followup) #89
Merged
Closes the last open piece of Plan C-g: collecting native x86_64-linux / x86_64-windows / aarch64-darwin bench rows for merges where the user does not have measurement-grade local hardware. The OrbStack `my-ubuntu-amd64` VM in particular is Rosetta-translated, so its x86_64-linux rows in `bench/history.yaml` are schema-shakedown only, not a true native baseline.

`.github/workflows/bench-baseline.yml`: workflow_dispatch with `os` (choice of ubuntu-latest / macos-latest / windows-latest) and an optional `reason` override. Linux/macOS provision via nix devshell, so hyperfine / yq / zig come from the same pinned versions test-nix uses. Windows uses `install-tools.ps1 -OnlyTool zig` + `-OnlyTool hyperfine` plus a one-shot yq.exe download. yq is not on the install-tools.ps1 list of pinned tools; the only consumer outside this workflow is `bench/record.sh`, and that runs locally where the user already has yq through nix.

The workflow runs `scripts/record-merge-bench.sh` and commits the new row directly to main with subject `Record <arch_suffix> bench baseline for <subject> (workflow_dispatch)`. It retries once on push collision so that a concurrent local Mac per-merge-bench commit does not lose the new row.

`.claude/CLAUDE.md` Merge-Gate item 10 paragraph reworded: the on-PR ci_compare regression check is now spelled as 3-OS (not Ubuntu-only), and a forward pointer to the new workflow is added.

Why workflow_dispatch and not on-push: bench runs cost ~5–7 min and the Mac aarch64-darwin row is already recorded locally on every merge; this workflow is for the platforms the user can't record locally with confidence. A manual trigger lets the user pick which merge SHAs deserve a native baseline rather than recording every one.
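The one-retry-on-push-collision behaviour described above could look roughly like the following. This is a minimal sketch, not the workflow's actual step: the function name `push_with_one_retry` is hypothetical, and using `git pull --rebase` to pick up the concurrent commit is an assumption about how the collision is resolved.

```shell
#!/usr/bin/env bash
# Sketch only: push the freshly committed bench row, retrying exactly once
# if the first push is rejected (e.g. main moved because a local per-merge
# bench commit landed in the meantime).
# Assumption: rebasing onto the updated remote branch is an acceptable way
# to recover; the real workflow may do something different.
push_with_one_retry() {
    remote="${1:-origin}"
    branch="${2:-main}"

    # First attempt: the common case where nobody else pushed.
    if git push "$remote" "HEAD:$branch"; then
        return 0
    fi

    echo "push rejected; rebasing onto $remote/$branch and retrying once" >&2

    # Replay the new row on top of whatever landed concurrently.
    # If the rebase itself fails (e.g. a conflict), give up.
    git pull --rebase "$remote" "$branch" || return 1

    # Second and final attempt; its exit status is the function's status.
    git push "$remote" "HEAD:$branch"
}
```

Calling `push_with_one_retry origin main` after the commit step keeps the retry bounded: a second rejection fails the job instead of looping.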
chaploud added a commit that referenced this pull request on Apr 29, 2026
github-actions Bot added a commit that referenced this pull request on Apr 29, 2026
…ep 3 followup): add bench-baseline workflow_dispatch (#89) (workflow_dispatch)
Summary
- Earlier Plan C-g steps made `bench/history.yaml` natively multi-arch and made the on-PR regression check run on Linux/macOS/Windows simultaneously. The remaining gap was that the user does not have measurement-grade local hardware for native x86_64-linux or x86_64-windows; the OrbStack rows in `history.yaml` are Rosetta-translated, so they are useful for schema validation but not for absolute-time tracking.
- The new workflow runs `scripts/record-merge-bench.sh` on a GitHub-hosted runner of the requested OS and commits the resulting row directly to main, with the same naming convention the local Mac per-merge bench uses.

Usage
Trigger with `gh workflow run bench-baseline.yml`. `-f os=` is `ubuntu-latest` / `macos-latest` / `windows-latest`. Optional `-f reason="..."` overrides the bench-row reason; the default is the HEAD commit subject.

Implementation notes
- Linux/macOS provision via nix devshell (hyperfine / yq / zig come from the same pinned versions `test-nix` uses).
- Windows uses `install-tools.ps1 -OnlyTool zig` + `-OnlyTool hyperfine` (the two pieces the bench harness needs) plus a one-shot `yq_windows_amd64.exe` download. yq is not on `install-tools.ps1`'s pinned list, and the only downstream consumer is `bench/record.sh`, which runs locally on hosts where the user already has yq via nix.
- The workflow sets `permissions: contents: write` so the default `GITHUB_TOKEN` can push to main.

Test plan
- Run `gh workflow run bench-baseline.yml -f os=ubuntu-latest` and verify that the resulting `Record x86_64-linux bench baseline for <subject> (workflow_dispatch)` commit lands on main with one new `history.yaml` entry tagged `arch: x86_64-linux`.
- Repeat with `os=windows-latest` once an opportunity to verify presents itself.
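
To make the expected commit subject in the test plan concrete, here is a small sketch of how the `os` input might map to the arch suffix and the final subject line. Both function names are hypothetical, and the `os` to arch mapping is inferred from the platforms named in this PR (it assumes `macos-latest` is an Apple Silicon runner); the real logic lives in `scripts/record-merge-bench.sh` and may differ.

```shell
#!/usr/bin/env bash
# Sketch only: derive the arch suffix for a runner label.
# Assumption: these three pairings match what the workflow records.
arch_suffix_for_os() {
    case "$1" in
        ubuntu-latest)  echo "x86_64-linux" ;;
        macos-latest)   echo "aarch64-darwin" ;;
        windows-latest) echo "x86_64-windows" ;;
        *)
            echo "unknown os: $1" >&2
            return 1
            ;;
    esac
}

# Build the commit subject in the format quoted in this PR:
#   Record <arch_suffix> bench baseline for <subject> (workflow_dispatch)
# $1 = runner label, $2 = reason (the HEAD commit subject by default).
bench_commit_subject() {
    arch="$(arch_suffix_for_os "$1")" || return 1
    printf 'Record %s bench baseline for %s (workflow_dispatch)\n' "$arch" "$2"
}
```

After a dispatch, comparing `git log -1 --format=%s origin/main` against `bench_commit_subject ubuntu-latest "<subject>"` is one way to check the first test-plan item by hand.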