feat(w54): LoopInfo substrate (shared branch / loop / liveness analysis)#91
Merged
Both backends used to maintain their own copy of the same branch-target / loop-header / loop-body-extent pre-scan. Move the analysis into src/loop_info.zig so it is the single source of truth for the rest of the redesign (Phase 1+ enrichments, namely liveness and invariant-const classification, extend this struct rather than each backend's local fields).

Behaviour-neutral: the dump-jit output for tgo_string_ops func#24 (196 instrs, 784 bytes) is byte-for-byte identical to main.

Full Commit Gate green (tests 405/405, spec, e2e, real-world 50/50, FFI 80/80, minimal build).

Refs: .dev/w54-redesign-plan.md (Pillar 1).
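The shared pre-scan described above can be modelled in a few lines. This is a minimal Python sketch, not the Zig code in src/loop_info.zig: the `Instr` and `LoopShape` types, the `BR` opcode name, and the rule "a branch whose target is at or before its own pc is a back-edge, and loop_end records the last such branch" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical opcode set; only BR_IF / BR_IF_NOT are named in the PR text.
BRANCH_OPS = {"BR", "BR_IF", "BR_IF_NOT"}


@dataclass
class Instr:
    op: str
    target: Optional[int] = None  # branch destination as an instruction index


@dataclass
class LoopShape:
    branch_targets: List[bool]  # pc is the destination of some branch
    loop_headers: List[bool]    # pc is the destination of a backward branch
    loop_end: List[int]         # for a header: pc of its last back-edge, else -1


def analyse_shape(ir: List[Instr]) -> LoopShape:
    """Single forward pass over the IR; assumes every target is in range."""
    n = len(ir)
    shape = LoopShape([False] * n, [False] * n, [-1] * n)
    for pc, ins in enumerate(ir):
        if ins.op not in BRANCH_OPS or ins.target is None:
            continue
        t = ins.target
        shape.branch_targets[t] = True
        if t <= pc:  # assumed definition of a back-edge
            shape.loop_headers[t] = True
            shape.loop_end[t] = max(shape.loop_end[t], pc)
    return shape


ir = [
    Instr("CONST32"),
    Instr("ADD"),                   # pc 1: loop header
    Instr("BR_IF", target=1),       # pc 2: back-edge to pc 1
    Instr("BR_IF_NOT", target=5),   # pc 3: forward branch
    Instr("NOP"),
    Instr("RETURN"),                # pc 5: forward branch target
]
shape = analyse_shape(ir)
```

With one analysis function, each backend would read the same three arrays instead of recomputing them, which is the point of the refactor.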
Extends LoopInfo.analyse(reg_count) with vreg_first_def[] and vreg_last_use[] arrays. Computed in the same single forward sweep that produces branch_targets / loop_headers / loop_end.

Stores (0x36..0x3E), conditional branches (BR_IF / BR_IF_NOT) and RETURN correctly treat rd as a SOURCE rather than a destination; see opWritesRd / opUsesRdAsSource. The NEVER_DEFINED sentinel marks vregs that no instruction in the body ever writes (Phase 4 will treat v < local_count as "defined-before-loop" in the invariant check).

No JIT consumer yet; the JIT compile loop is unchanged, so the emitted machine code is byte-for-byte identical to Phase 0 (verified on tgo_string_ops func#24, 196 instrs / 784 bytes).

Conservative reads: rs1/rs2_field are treated as live-uses by every non-control / non-const opcode. Over-approximation only shrinks the Phase 5 coalescer's window, never breaks correctness. Multi-source ops (CALL, CALL_INDIRECT, RETURN_MULTI, memory.fill, memory.copy) that read additional vregs from following NOP slots are not modelled in Phase 1; Phase 5 won't coalesce around them anyway.

Full Commit Gate green: tests 409/409 (4 new liveness tests), spec, e2e, real-world 50/50, FFI, minimal build.

Refs: .dev/w54-redesign-plan.md (Pillar 1 / Phase 1).
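To make the liveness rules above concrete, here is a minimal Python sketch of a single forward sweep. It is a model under stated assumptions, not the Zig implementation: the `Instr` tuple, the opcode names other than BR_IF / BR_IF_NOT / RETURN, the value of NEVER_DEFINED, and the NEVER_USED sentinel are all hypothetical. The helper names mirror opWritesRd / opUsesRdAsSource only loosely.

```python
from typing import List, NamedTuple, Tuple

NEVER_DEFINED = -1  # assumed sentinel value; the PR only names the sentinel
NEVER_USED = -1     # hypothetical counterpart for vreg_last_use


class Instr(NamedTuple):
    op: str
    rd: int = 0
    rs1: int = 0
    rs2: int = 0


# Ops whose rd field is read rather than written (stores, conditional
# branches, RETURN), per the commit message. "STORE" and "BR" are stand-ins.
RD_AS_SOURCE = {"STORE", "BR_IF", "BR_IF_NOT", "RETURN"}
# Control / const ops: their rs1/rs2 fields are not counted as reads.
NO_RS_READS = {"CONST32", "BR", "BR_IF", "BR_IF_NOT", "RETURN", "NOP"}


def op_uses_rd_as_source(op: str) -> bool:
    return op in RD_AS_SOURCE


def op_writes_rd(op: str) -> bool:
    return op not in RD_AS_SOURCE and op not in {"BR", "NOP"}


def analyse_liveness(ir: List[Instr], reg_count: int) -> Tuple[List[int], List[int]]:
    first_def = [NEVER_DEFINED] * reg_count
    last_use = [NEVER_USED] * reg_count
    for pc, ins in enumerate(ir):
        reads = []
        if op_uses_rd_as_source(ins.op):
            reads.append(ins.rd)
        if ins.op not in NO_RS_READS:
            # Conservative: both fields count as reads even if the op
            # ignores one of them. This can only push last_use later.
            reads += [ins.rs1, ins.rs2]
        for v in reads:
            if v < reg_count:
                last_use[v] = pc
        if op_writes_rd(ins.op) and ins.rd < reg_count:
            if first_def[ins.rd] == NEVER_DEFINED:
                first_def[ins.rd] = pc
    return first_def, last_use


ir = [
    Instr("CONST32", rd=1),            # v1 = const
    Instr("ADD", rd=2, rs1=0, rs2=1),  # v2 = v0 + v1
    Instr("STORE", rd=2, rs1=0),       # rd is the stored value, i.e. a read
    Instr("BR_IF", rd=2),              # rd is the condition, i.e. a read
    Instr("RETURN", rd=2),             # rd is the returned value, i.e. a read
]
first_def, last_use = analyse_liveness(ir, reg_count=4)
```

In this example v0 is never written in the body, so it keeps the sentinel, which matches the "defined-before-loop" reading for locals. The STORE at pc 2 also shows the over-approximation: its unused rs2 field defaults to 0 and still counts as a read of v0.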
D138 captures the LoopInfo substrate as the shipped scope, with the magic hoist (Phase 3) and the liveness-driven coalescer (Phase 5) explicitly held back.

The coalescer was reverted from this PR after Linux x86_64 CI flagged a go_math_big BigInt divergence: Mac aarch64 passes 50/50 realworld with the new RegFunc layout, but x86_64 mis-emits. The bug is in src/x86.zig's interaction with fewer-MOV / shifted-PC IR. Tracked as W54-coalescer.

checklist.md closes W54 substrate and opens four follow-ups:
- W54-coalescer: diagnose the x86_64 go_math_big regression.
- W54-hoist-revisit: revive the magic hoist (pending W47 + W54-x86).
- W54-x86: x86_64 hoist parity.
- W54-libm: rw_c_math intrinsic recognition.

memo.md handover updated.

New: .dev/w54-redesign-postmortem.md captures the full session arc (magic-hoist-attempt → loop-pass-redesign → loop-info → revert) with branch names and cherry-pick instructions for the archived hoist + coalescer work.

Lessons recorded:
- Linux x86_64 CI is irreplaceable for arch-asymmetric regressions. Mac green + OrbStack-Rosetta green do not imply native x86_64 green. The earlier "OrbStack passes" reading was a stale Mach-O binary: OrbStack Linux can't run aarch64-darwin and falls through to wasmtime's output. Confirmed reproducible on a native OrbStack VM with a fresh build.
- Regalloc-stage IR changes are arch-agnostic, but JIT consumption isn't. A new RegFunc shape that's correct by construction can still expose backend bugs.
934d141 to
1607ba9
chaploud added a commit that referenced this pull request Apr 29, 2026
Per Merge Gate item 10 — aarch64-darwin baseline at the merge SHA. Behaviour-neutral merge (substrate-only refactor); the recorded numbers should match recent main entries within bench σ.
Summary
W54 substrate. Single structural change: src/loop_info.zig is the single source of truth for the function's control-flow shape and per-vreg liveness.

The two JIT backends used to maintain byte-for-byte identical scanBranchTargets implementations (~60 lines each); both now consume LoopInfo.analyse(allocator, ir, reg_count), which produces:

- branch_targets[], loop_headers[], loop_end[] (drives JIT cache eviction and the known_consts wipe at merge points).
- vreg_first_def[], vreg_last_use[] (one forward sweep, conservative reads: over-approximation extends last_use later than necessary, which only shrinks a future coalescer's window and never breaks correctness). Future consumers (W54-coalescer, W54-hoist-revisit) read these.

Behaviour is byte-for-byte identical to main on every benchmark we dump-jit'd (tgo_string_ops func#24, fib func#2). No performance change is expected or observed: Phase 0 + Phase 1 are pure refactoring; the new arrays are computed but no codegen consumer reads them yet.

What was held back

The original W54 plan included two further pieces of work, both built and bench-validated, both held back from this PR:

- Magic-constant loop-invariant hoist (digitCount JIT 196 → 192). Held back pending W47 (bench harness σ < 5%) and W54-x86 (parity). Cherry-pick path: 1600397 + c4b806e from archive/w54-magic-hoist-2026-04-30.
- Liveness-driven mov coalescing extension to regalloc.copyPropagate (digitCount JIT 196 → 189). Reverted from this PR after the first CI run flagged Linux x86_64 go_math_big: Mac aarch64 passes 50/50 realworld with the new RegFunc layout, but Linux x86_64 produces wrong BigInt subtraction (wasmtime: 864197532086419753208641975320, zwasm: 864197532160206729503480181784). The regalloc itself is arch-agnostic (the same RegFunc flows through both backends), so the divergence is in src/x86.zig's codegen interaction with the new IR layout. Reproducible on OrbStack my-ubuntu-amd64 with a fresh native x86_64 build. Tracked as W54-coalescer for diagnosis. Cherry-pick path: ec8182f from the archive branch.

Phase 4 (loop-invariant known_consts survival across loop headers) was dropped after RegIR inspection: digitCount emits CONST32 r? = 10 inside the loop body for every divisor site, so the optimisation never fires on the W54 target.

Surfaced lessons

- RegFunc shape that's correct by construction can still expose existing backend assumptions in one arch only.

Architecture and rejected alternatives: D138 in .dev/decisions.md. Full session arc + branch names: .dev/w54-redesign-postmortem.md.

Test plan

- bash scripts/gate-commit.sh: tests, spec, e2e, real-world 50/50, FFI 80/80, minimal build all green.
- bash scripts/record-merge-bench.sh on main after squash-merge.