perf(layout): paragraph wrapping speedups (F1 running-width, F2 binary-search) + benchmark regression gate by DemchaAV · Pull Request #142 · DemchaAV/GraphCompose

DemchaAV · 2026-06-08T14:50:30Z

Summary

Consolidates the private performance cycle: two verified canonical-layout
speedups plus the benchmark regression system that proves them. Rendering is
byte-identical (1144 tests, no snapshot/baseline rewritten); ./mvnw verify -pl .
green.

Engine changes (`TextFlowSupport`)

F1 — wrapParagraph running-width. The greedy wrapper keeps a running line
width and measures each token once, instead of re-measuring the whole accumulated
line on every token. Deterministic counter probe: long-text measured characters
−89% (291,324 → 32,457). Removes O(line-length × tokens) measured-char work — and
the per-glyph sanitize/encode it triggered — from paragraph layout. Byte-identical.
F2 — fitCharacters binary search. Long unbreakable tokens (URLs/IDs/no-space
runs, narrow columns) break via binary search instead of re-measuring every growing
prefix one char at a time. Worst-case −44% wall-clock (same-session A/B on a new
long-token scenario) and −80% measured chars (652 → 97 width calls). The fit
predicate is monotonic, so the search returns the same break index → byte-identical.
F1b — StringBuilder line assembly. Byte-identical cleanup; assembles each
wrapped line in a reused StringBuilder rather than currentLine + token. Removes
a latent O(line²) char-copy on pathologically wide/unwrapped lines. No measurable
steady-state perf (warm A/B: 719.8 → 719.8 KB) — kept as a defensive cleanup, not
a perf claim.

Benchmark regression system (benchmarks module — not part of the published library)

BenchmarkVerdictTool (+test): compares a current-speed run to the committed
baseline (baselines/current-speed-full.json); classifies each scenario
improved/neutral/regressed. Hard gate = average latency only; peak heap is
advisory (GC-timing noisy). A single run is advisory; the hard gate needs a median.
CountingTextMeasurementSystem + MeasurementCountBenchmark (+test):
deterministic measurement-call counts and per-compile allocation bytes
(ThreadMXBean) — proof independent of wall-clock/GC noise. The probe warms up
before its allocation window so Alloc KB reflects steady state.
CurrentSpeedBenchmark: new long-token worst-case scenario.
scripts/run-benchmarks.ps1: 11-verdict-current-speed gate step
(skippable via -SkipVerdict).

Heap note (honest)

The audit flagged a "~40 MB to lay out one paragraph" headline. Investigation proved
this was a JVM cold-start artifact (class-load / JIT / static-init on the probe's
first compile), not a layout cost: warm steady-state for the same document is
~0.65 MB (≈56× less), and allocation scales sub-linearly. There is no heap bug; the
probe was fixed to warm up so its numbers are trustworthy.

Net

Change	Verified result
F1	measured chars −89% (exact) → CPU on text-heavy docs
F2	long-token −44% wall-clock, measured work −80%
F1b	byte-identical cleanup (no measurable perf)
Tooling	regression gate + deterministic probe (now warm/honest)

One optimization family per commit; each delta is attributable. Base: develop.

…probe BenchmarkVerdictTool classifies a current-speed run vs the committed baseline (improved/neutral/regressed) and exits non-zero on a regression beyond the noise band. MeasurementCountBenchmark + CountingTextMeasurementSystem capture deterministic textWidth call counts and per-compile allocation bytes (ThreadMXBean) for proving algorithmic/allocation changes. run-benchmarks.ps1 gains the 11-verdict-current-speed gate step (skippable via -SkipVerdict). Adds baselines/current-speed-full.json (full-profile median). Benchmark-module only; not part of the published library.

…line prefix The greedy line wrapper measured textWidth(currentLine + nextToken) on every token, re-measuring the whole accumulated line - O(line-length x tokens) measured characters plus the per-glyph sanitize/encode it triggers. Keep a running line width and measure each token once instead; line starts re-measure to pin FP drift. Glyph advances are additive (no kerning) and EPS=1e-6 absorbs FP, so break points are unchanged - rendering is byte-identical (1144 tests + all layout/visual snapshots pass). Probe: long-text measured characters 291,324 -> 32,457 (~9x fewer); same-session A/B (full, Repeat 7): proposal -57% time / +131% throughput. No API or behaviour change. Refs audit finding F1.

…advisory peakHeapMb is a Runtime used-heap delta - GC-timing dependent and very noisy (observed 48-170 MB across repeats of identical code), so it false-failed the gate on invoice-template (heap +18.7%) even though that run was -15% faster on time. BenchmarkVerdictTool now hard-gates on average latency only; peakHeapMb is reported as advisory (still shown, never fails the build). The deterministic heap signal stays in MeasurementCountBenchmark (per-compile allocation bytes). run-benchmarks.ps1: step 11 runs the verdict as advisory for single runs (Repeat 1) and hard-gates only for medians (-Repeat >= 2), since one run is too noisy to gate against a median baseline. Unit test + CHANGELOG updated.

…uadratic re-measure) fitCharacters re-measured text.substring(0,index) for every index when breaking a long unbreakable token - O(n) width calls and O(n^2) measured characters. The fit predicate width(prefix) <= maxWidth is monotonic in prefix length, so binary-search the break index instead: it returns the same lastFitting (byte-identical wrapping) in O(log n) width calls. Probe on a 600-char token: width calls 652 -> 97, measured chars 36,317 -> 7,114, alloc ~1.5MB -> ~0.8MB. long-text (F1 path) and tables untouched; 1144 tests pass with no snapshot drift. Refs audit finding F2.

40 paragraphs with ~520-char unbreakable URL/ID tokens that overflow the line and force splitLongToken/fitCharacters. Makes the F2 worst case visible (same-session A/B: -44% avg, 14.47 -> 8.06 ms on this scenario) and guards against re-introducing quadratic long-token wrapping.

…n string copy) wrapParagraph concatenated Strings token-by-token (currentLine + token), re-copying the whole growing line each token and producing a throwaway String per step. Accumulate in a reused StringBuilder instead; the character sequence is identical so wrapping stays byte-for-byte the same (1144 tests, snapshots clean). Measured effect is small on typical text (~1% less compile allocation on long-text, lines bounded by column width) but it removes a latent O(line-length^2) copy on very wide/unwrapped lines.

The probe measured each scenario once, and the first scenario (long-text) in a fresh JVM carried ~36 MB of one-time class-load/JIT/static-init allocation -- a JVM artifact, not a layout cost (verified: cold first compile 36.6 MB vs warm 0.65 MB for the same document; layout alloc scales sub-linearly). Warm up 5 iterations before the measured pass so Alloc KB reflects steady-state per-document allocation; measurement-count columns are exact regardless. Also drops the F1b CHANGELOG perf claim -- a warm A/B shows no measurable steady-state allocation change (719.8 = 719.8 KB), so F1b stays as a byte-identical latent-O(n^2) cleanup, not a perf win.

…+ dep bump)

DemchaAV added 8 commits June 8, 2026 12:24

Merge develop into perf/engine-pipeline before consolidation (RTL #141 …

5e9b716

…+ dep bump)

DemchaAV merged commit 646a4ac into develop Jun 8, 2026
11 checks passed

DemchaAV deleted the perf/engine-pipeline branch June 8, 2026 15:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(layout): paragraph wrapping speedups (F1 running-width, F2 binary-search) + benchmark regression gate#142

perf(layout): paragraph wrapping speedups (F1 running-width, F2 binary-search) + benchmark regression gate#142
DemchaAV merged 8 commits into
developfrom
perf/engine-pipeline

DemchaAV commented Jun 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DemchaAV commented Jun 8, 2026

Summary

Engine changes (TextFlowSupport)

Benchmark regression system (benchmarks module — not part of the published library)

Heap note (honest)

Net

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Engine changes (`TextFlowSupport`)