feat(w54): LoopInfo substrate (shared branch / loop / liveness analysis) #91

Merged
chaploud merged 3 commits into main from develop/w54-loop-info
Apr 29, 2026

chaploud commented Apr 29, 2026

Summary

W54 substrate. One structural change: src/loop_info.zig is now the single source of truth for each function's control-flow shape and per-vreg liveness.

The two JIT backends used to maintain byte-for-byte identical scanBranchTargets implementations (~60 lines each); both now consume LoopInfo.analyse(allocator, ir, reg_count), which produces:

  • branch_targets[], loop_headers[], loop_end[] (drives JIT cache eviction and the known_consts wipe at merge points).
  • vreg_first_def[], vreg_last_use[] (one forward sweep with conservative reads: over-approximation can only push last_use later than necessary, which shrinks a future coalescer's window but never breaks correctness). Future consumers (W54-coalescer, W54-hoist-revisit) read these.
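
A minimal sketch of the control-flow half of such a pre-scan, written in Python for brevity rather than in the project's Zig. The `Instr` type, the opcode names, and the rule "a backward branch marks its target as a loop header" are assumptions made for illustration; the real logic lives in src/loop_info.zig and may differ.

```python
from dataclasses import dataclass

# Hypothetical opcode names for a toy register IR; not the project's encoding.
BRANCH_OPS = {"BR", "BR_IF", "BR_IF_NOT"}


@dataclass(frozen=True)
class Instr:
    op: str
    target: int = -1  # branch target pc; -1 when the op does not branch


def scan_control_flow(ir):
    """One forward sweep over `ir`, returning three per-pc arrays."""
    n = len(ir)
    branch_targets = [False] * n  # pc is the destination of some branch
    loop_headers = [False] * n    # pc is the destination of a backward branch
    loop_end = [0] * n            # for a header: last pc that branches back to it
    for pc, ins in enumerate(ir):
        if ins.op not in BRANCH_OPS:
            continue
        branch_targets[ins.target] = True
        if ins.target <= pc:  # backward edge: assumed here to mark a loop
            loop_headers[ins.target] = True
            loop_end[ins.target] = max(loop_end[ins.target], pc)
    return branch_targets, loop_headers, loop_end


toy_ir = [
    Instr("CONST32"),   # pc 0
    Instr("ADD"),       # pc 1: loop header
    Instr("BR_IF", 4),  # pc 2: forward exit
    Instr("BR", 1),     # pc 3: back edge
    Instr("RETURN"),    # pc 4
]
branch_targets, loop_headers, loop_end = scan_control_flow(toy_ir)
```

On this toy input, pcs 1 and 4 come out as branch targets, pc 1 is the only loop header, and its loop body ends at pc 3.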

Emitted code is byte-for-byte identical to main on every benchmark we dump-jit'd (tgo_string_ops func#24, fib func#2). No performance change is expected or observed: Phase 0 and Phase 1 are pure refactoring, and the new arrays are computed but no codegen consumer reads them yet.

What was held back

The original W54 plan included two further pieces of work, both built and bench-validated, both held back from this PR:

  • Magic-constant loop-invariant hoist (digitCount JIT 196 → 192). Held back pending W47 (bench harness σ < 5%) and W54-x86 (parity). Cherry-pick path: 1600397 + c4b806e from archive/w54-magic-hoist-2026-04-30.

  • Liveness-driven mov coalescing extension to regalloc.copyPropagate (digitCount JIT 196 → 189). Reverted from this PR after the first CI run flagged Linux x86_64 go_math_big: Mac aarch64 passes 50/50 realworld with the new RegFunc layout, but Linux x86_64 produces wrong BigInt subtraction (wasmtime: 864197532086419753208641975320, zwasm: 864197532160206729503480181784). The regalloc itself is arch-agnostic — the same RegFunc flows through both backends — so the divergence is in src/x86.zig's codegen interaction with the new IR layout. Reproducible on OrbStack my-ubuntu-amd64 with a fresh native x86_64 build. Tracked as W54-coalescer for diagnosis. Cherry-pick path: ec8182f from the archive branch.

Phase 4 (loop-invariant known_consts survival across loop headers) was dropped after RegIR inspection: digitCount emits CONST32 r? = 10 inside the loop body for every divisor site, so the optimisation never fires on the W54 target.

Surfaced lessons

  1. Linux x86_64 CI is irreplaceable for arch-asymmetric regressions. Green runs on Mac aarch64 and OrbStack-Rosetta did not imply a green run on native x86_64. The earlier "OrbStack passes" local reading came from a stale Mach-O binary: OrbStack Linux can't execute aarch64-darwin, so the test fell through to wasmtime's output. Confirmed reproducible on a native OrbStack VM with a fresh build.
  2. Regalloc-stage IR changes are arch-agnostic, but JIT consumption isn't. A new RegFunc shape that's correct by construction can still expose existing backend assumptions in one arch only.
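
One way to catch the stale-binary case from lesson 1 early is to inspect the executable's magic bytes before trusting a run. The sketch below is a hypothetical Python guard, not something in this repository; `binary_format` and `require_elf` are made-up names, and wiring it into the test scripts is left open.

```python
ELF_MAGIC = b"\x7fELF"
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit Mach-O, both byte orders
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit Mach-O, both byte orders
    b"\xca\xfe\xba\xbe",                        # universal (fat) binary
}


def binary_format(header: bytes) -> str:
    """Classify an executable by its first four bytes."""
    magic = header[:4]
    if magic == ELF_MAGIC:
        return "elf"
    if magic in MACHO_MAGICS:
        return "mach-o"
    return "unknown"


def require_elf(path: str) -> None:
    """Fail loudly instead of letting a test fall through to a reference runtime."""
    with open(path, "rb") as f:
        fmt = binary_format(f.read(4))
    if fmt != "elf":
        raise RuntimeError(f"{path}: expected an ELF binary, found {fmt}")
```

Calling `require_elf` on the binary under test before a Linux run would turn a silent fall-through into an immediate failure.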

Architecture and rejected alternatives: D138 in .dev/decisions.md. Full session arc + branch names: .dev/w54-redesign-postmortem.md.

Test plan

  • Mac aarch64: bash scripts/gate-commit.sh — tests, spec, e2e, real-world 50/50, FFI 80/80, minimal build all green.
  • JIT byte-identical for every dump-jit'd function (no behavioural codegen change in this PR).
  • CI green on all three OSes (Mac / Ubuntu / Windows) — re-run after the substrate-only force-push.
  • bash scripts/record-merge-bench.sh on main after squash-merge.

Both backends used to maintain their own copy of the same
branch-target / loop-header / loop-body-extent pre-scan. Move the
analysis into src/loop_info.zig so it is the single source of truth
for the rest of the redesign (Phase 1+ enrichments — liveness and
invariant-const classification — extend this struct rather than each
backend's local fields).

Behaviour-neutral: the dump-jit output for tgo_string_ops func#24
(196 instrs, 784 bytes) is byte-for-byte identical to main. Full
Commit Gate green (tests 405/405, spec, e2e, real-world 50/50, FFI
80/80, minimal build).

Refs: .dev/w54-redesign-plan.md (Pillar 1).

Extends LoopInfo.analyse(reg_count) with vreg_first_def[] and
vreg_last_use[] arrays. Computed in the same single forward sweep
that produces branch_targets / loop_headers / loop_end. Stores
(0x36..0x3E), conditional branches (BR_IF / BR_IF_NOT) and RETURN
correctly treat rd as a SOURCE rather than a destination — see
opWritesRd / opUsesRdAsSource. NEVER_DEFINED sentinel marks vregs
that no instruction in the body ever writes (Phase 4 will treat
v < local_count as "defined-before-loop" in the invariant check).
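
The rd-as-source rule can be sketched as follows, in Python rather than Zig. The tuple layout `(op, rd, rs1, rs2)`, the opcode names, and the sentinel value -1 are assumptions for illustration; only the names opWritesRd / opUsesRdAsSource and NEVER_DEFINED come from the commit.

```python
NEVER_DEFINED = -1  # assumed sentinel value: no instruction writes this vreg
NEVER_USED = -1     # assumed sentinel value: no instruction reads this vreg

# Toy opcodes whose rd field names a vreg that is read, not written.
RD_AS_SOURCE = {"STORE", "BR_IF", "BR_IF_NOT", "RETURN"}


def op_uses_rd_as_source(op):
    return op in RD_AS_SOURCE


def op_writes_rd(op):
    return op not in RD_AS_SOURCE and op != "BR"


def sweep_liveness(ir, reg_count):
    """Single forward pass; each instruction is a tuple (op, rd, rs1, rs2)."""
    first_def = [NEVER_DEFINED] * reg_count
    last_use = [NEVER_USED] * reg_count
    for pc, (op, rd, rs1, rs2) in enumerate(ir):
        sources = [rs1, rs2]
        if op_uses_rd_as_source(op):
            sources.append(rd)  # rd is read here, so it extends liveness
        for v in sources:
            if v is not None:
                last_use[v] = pc
        if op_writes_rd(op) and rd is not None and first_def[rd] == NEVER_DEFINED:
            first_def[rd] = pc
    return first_def, last_use


toy_ir = [
    ("CONST32", 1, None, None),  # pc 0: r1 = const
    ("ADD", 2, 0, 1),            # pc 1: r2 = r0 + r1
    ("STORE", 2, 0, None),       # pc 2: reads r2 (rd) and r0
    ("BR_IF", 2, None, None),    # pc 3: reads r2 (rd)
    ("RETURN", 0, None, None),   # pc 4: reads r0 (rd)
]
first_def, last_use = sweep_liveness(toy_ir, reg_count=4)
```

Here r0 stays NEVER_DEFINED even though it appears in the rd slot of RETURN, which is the case the rule exists to get right.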

No JIT consumer yet; the JIT compile loop is unchanged so the
emitted machine code is byte-for-byte identical to Phase 0 (verified
on tgo_string_ops func#24, 196 instrs / 784 bytes).

Conservative reads: rs1/rs2_field are treated as live-uses by every
non-control / non-const opcode. Over-approximation only shrinks the
Phase 5 coalescer's window, never breaks correctness. Multi-source
ops (CALL, CALL_INDIRECT, RETURN_MULTI, memory.fill, memory.copy)
that read additional vregs from following NOP slots are not modelled
in Phase 1 — Phase 5 won't coalesce around them anyway.
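
Why over-approximation is safe can be shown with a toy coalescing rule. The rule below (a MOV may reuse its source register only if the source is dead after the MOV) is an assumption for illustration, not the held-back coalescer's actual condition, and `can_coalesce` is a hypothetical name.

```python
def can_coalesce(mov_pc, src, last_use):
    """Allow `MOV dst <- src` at mov_pc to share a register only if src dies there."""
    return last_use[src] <= mov_pc


exact = {"r1": 3}         # r1 is really last read at pc 3
conservative = {"r1": 7}  # over-approximated: claimed live until pc 7

allowed_exact = can_coalesce(3, "r1", exact)                # True
allowed_conservative = can_coalesce(3, "r1", conservative)  # False

# Pushing last_use later can only turn "allowed" into "refused":
# every MOV accepted under the conservative bound is accepted under the exact one.
safe = all(
    can_coalesce(pc, "r1", exact)
    for pc in range(10)
    if can_coalesce(pc, "r1", conservative)
)
```

Under this rule a later last_use only removes candidates, so a conservative analysis costs missed optimisations rather than wrong code.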

Full Commit Gate green: tests 409/409 (4 new liveness tests), spec,
e2e, real-world 50/50, FFI, minimal build.

Refs: .dev/w54-redesign-plan.md (Pillar 1 / Phase 1).

D138 captures the LoopInfo substrate as the shipped scope, with
the magic hoist (Phase 3) and the liveness-driven coalescer (Phase
5) explicitly held back. The coalescer was reverted from this PR
after Linux x86_64 CI flagged a go_math_big BigInt divergence:
Mac aarch64 passes 50/50 realworld with the new RegFunc layout,
but x86_64 mis-emits — the bug is in src/x86.zig's interaction
with fewer-MOV / shifted-PC IR. Tracked as W54-coalescer.

checklist.md closes W54 substrate and opens four follow-ups:
- W54-coalescer: diagnose the x86_64 go_math_big regression.
- W54-hoist-revisit: revive the magic hoist (pending W47 + W54-x86).
- W54-x86: x86_64 hoist parity.
- W54-libm: rw_c_math intrinsic recognition.

memo.md handover updated. New: .dev/w54-redesign-postmortem.md
captures the full session arc — magic-hoist-attempt → loop-pass-
redesign → loop-info → revert — with branch names and
cherry-pick instructions for the archived hoist + coalescer work.

Lessons recorded:
- Linux x86_64 CI is irreplaceable for arch-asymmetric regressions.
  Mac green + OrbStack-Rosetta green do not imply native x86_64
  green. The earlier "OrbStack passes" reading was a stale Mach-O
  binary — OrbStack Linux can't run aarch64-darwin and falls through
  to wasmtime's output. Confirmed reproducible on a native OrbStack
  VM with a fresh build.
- Regalloc-stage IR changes are arch-agnostic, but JIT consumption
  isn't. A new RegFunc shape that's correct by construction can
  still expose backend bugs.

chaploud force-pushed the develop/w54-loop-info branch from 934d141 to 1607ba9 on April 29, 2026 16:09
chaploud changed the title from "feat(w54): LoopInfo substrate + branch-aware mov coalescing" to "feat(w54): LoopInfo substrate (shared branch / loop / liveness analysis)" on Apr 29, 2026
chaploud merged commit f4d061d into main on Apr 29, 2026
10 checks passed
chaploud deleted the develop/w54-loop-info branch on April 29, 2026 16:27
chaploud added a commit that referenced this pull request Apr 29, 2026
Per Merge Gate item 10 — aarch64-darwin baseline at the merge SHA.
Behaviour-neutral merge (substrate-only refactor); the recorded
numbers should match recent main entries within bench σ.