
🎯 ONE SHOT — Wave 26 · L-GPTQ-ON-GF16: replicate parameter-golf#2135 calibration lever on Trinity GF16 (CPU-only) #645



Anchor: phi^2 + phi^-2 = 3 · DOI 10.5281/zenodo.19227877
Author for ALL commits: Dmitrii Vasilev <admin@t27.ai>
Branch: feat/gptq-on-gf16
Base: main
External reference: openai/parameter-golf#2135 (GPTQ_CALIBRATION_BATCHES 16 → 32, paired-t one-tail p<0.25 — -0.00457 BPB / -0.01000 nats over PR #1855)

Why

PR #2135 establishes that doubling the GPTQ Hessian calibration set yields a downstream BPB improvement that clears the pre-registered one-tail threshold (paired-t p = 0.138 < 0.25 across seeds {0, 42, 314}) on top of an already-tuned int6/int7 stack with TTT.

The claim is about the algorithm, not the bit-format. GPTQ is Q(·)-agnostic — it minimises ‖W·X − Q(W)·X‖² by Hessian-corrected error redistribution across columns and admits any quantiser as a black-box Q. We currently quantise GF16 via single-pass max(|w|) scale fit (crates/trios-golden-float/src/lib.rs:136 quantize_matrix) — i.e. our equivalent of "GPTQ_CALIBRATION_BATCHES" today is 0.
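For intuition, here is a minimal sketch of what a single-pass max(|w|) scale fit looks like. This is illustrative Rust, not the actual crate code: the signed 4-bit-style grid (GRID = 7) and the function names are assumptions for the sketch; GF16's real representational grid differs.

```rust
// Illustrative single-pass absmax scale fit, in the spirit of the naive
// baseline described above. GRID and all names are assumptions, not crate API.
const GRID: f64 = 7.0;

fn naive_quantize(w: &[f64]) -> (f64, Vec<i8>) {
    // One scale per tensor, fitted from max(|w|) in a single pass.
    let amax = w.iter().fold(0.0_f64, |m, v| m.max(v.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / GRID };
    let q: Vec<i8> = w
        .iter()
        .map(|v| (v / scale).round().clamp(-GRID, GRID) as i8)
        .collect();
    (scale, q)
}

fn dequantize(scale: f64, q: &[i8]) -> Vec<f64> {
    q.iter().map(|&c| f64::from(c) * scale).collect()
}

fn main() {
    let w = [7.0, -3.5, 1.75, 0.0];
    let (scale, q) = naive_quantize(&w);
    println!("scale = {scale}, codes = {q:?}, dequant = {:?}", dequantize(scale, &q));
}
```

Note the defining property of the naive fit: per-element error is bounded by scale/2, but the fit is oblivious to the data distribution X, which is exactly the slack GPTQ exploits.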

This ONE SHOT verifies whether the same lever lifts our floor on CPU-only + GF16 by porting the GPTQ inner loop with our gf16_quantize_matrix plugged in as the quantiser.

Falsifier

H0: GPTQ-correction with N ∈ {16, 32} calibration batches gives no significant BPB improvement over naive single-pass GF16 quantisation (paired-t one-tail p ≥ 0.25 across canon seeds {47, 89, 144}).

If H0 cannot be rejected, the result is itself publishable: it says naive GF16 already sits on the Hessian-floor of its representational grid and PR #2135's lever is bit-format-specific.
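Since the canon seed set has exactly 3 elements, the paired-t has df = 2, where the Student-t CDF has the closed form F(t) = 1/2 + t / (2·sqrt(t² + 2)), so the verdict needs no stats crate. A hedged sketch (function name and example deltas are illustrative, not real results):

```rust
// Paired one-tailed t-test for 3 paired observations (df = n - 1 = 2).
// The closed-form CDF used below is valid ONLY for df = 2, i.e. 3 seeds.
// `deltas` are per-seed BPB differences (treatment minus baseline);
// the lower tail is the "improvement" direction.
fn paired_t_one_tail(deltas: &[f64]) -> (f64, f64) {
    let n = deltas.len() as f64;
    let mean = deltas.iter().sum::<f64>() / n;
    // Sample variance with Bessel's correction (n - 1 denominator).
    let var = deltas.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let t = mean / (var.sqrt() / n.sqrt());
    // Lower-tail p-value via the df = 2 Student-t CDF closed form.
    let p = 0.5 + t / (2.0 * (t * t + 2.0).sqrt());
    (t, p)
}

fn main() {
    // Illustrative per-seed BPB deltas (e.g. N=16 minus N=0); NOT real data.
    let deltas = [-0.004, -0.005, -0.006];
    let (t, p) = paired_t_one_tail(&deltas);
    println!("t = {t:.4}, one-tail p = {p:.4}, reject H0 at 0.25: {}", p < 0.25);
}
```

The G5 gate would apply this twice, once per comparison (N=0 vs N=16, N=16 vs N=32).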

Mission

Lane structure (3 lanes, sequential)

L-26-A · Coq invariant
  Deliverable: coq/Trios_GPTQ_GF16.v proving that gptq_correct ∘ Q_GF16 preserves the invariant ‖W·X − dequant(Q(W))·X‖² ≤ ‖W·X − Q_naive(W)·X‖² for all PSD H = 2·X·X^T
  Acceptance: coqc clean, witness JSON in assertions/coq_runtime_invariants.json

L-26-B · Rust implementation
  Deliverable: trios-golden-float: new fn gf16_quantize_matrix_gptq(W, X_batches, calibration_n) · Cholesky on H + λ·I, column-wise quant + error scatter via the H^{-1} row, with gf16_quantize_matrix plugged in as the inner Q
  Acceptance: cargo test -p trios-golden-float green; reconstruction MSE strictly ≤ baseline on ≥ 3 random PSD X

L-26-C · 3-seed ablation
  Deliverable: bin gptq_calibration_ablation runs the 3×3 grid (seeds {47, 89, 144} × N ∈ {0, 16, 32}) on the canonical IGLA replay; emits assertions/calibration_ablation.jsonl (one JSON object per row) plus a paired-t analysis
  Acceptance: rows present for all 9 cells; paired-t script reproduces the verdict on stdin replay

Files to create / modify

coq/Trios_GPTQ_GF16.v                                    NEW (~250 lines)
crates/trios-golden-float/src/gptq.rs                    NEW (~180 lines)
crates/trios-golden-float/src/lib.rs                     +pub use gptq::*
crates/trios-golden-float/tests/gptq_reconstruction.rs   NEW (~120 lines)
src/bin/gptq_calibration_ablation.rs                     NEW (~220 lines)
assertions/calibration_ablation.jsonl                    NEW (10 rows: 9 grid + 1 verdict)
assertions/coq_runtime_invariants.json                   APPEND 1 entry
docs/wave26_gptq_on_gf16.md                              NEW report
MIGRATION.md / CHANGELOG.md                              +1 line

Algorithm — exact GPTQ inner loop (port from PR #2135 lineage)

input:  W ∈ R^{rows × cols}
        X ∈ R^{cols × n_samples}  (concatenation of N calibration batches)
        Q : R^{rows} → GF16   (= gf16_quantize_matrix as black box)
        λ : f32                (dampening, default 1e-2 · trace(H)/cols)

H ← 2 · X · X^T  + λ·I
L ← Cholesky(H)              # lower-triangular
H_inv ← solve_triangular(L^T, I)·solve_triangular(L, I)   # = L^{-T}·L^{-1} = H^{-1}

for j = 0..cols-1:
    w_j ← W[:, j]
    q_j ← Q(w_j)
    err ← (w_j − dequant(q_j)) / H_inv[j, j]
    # scatter remaining error on yet-unquantised columns
    W[:, j+1..] -= err · H_inv[j, j+1..]
    Q_OUT[:, j] ← q_j

return Q_OUT

Calibration data X is sampled from training shards only (R-#1017-style) — never validation. With calibration_n = 0, the function MUST be byte-equivalent to gf16_quantize_matrix (i.e. naive scale fit with no error scatter) — this is the baseline run.
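The loop above can be sketched in self-contained Rust, using dense f64 matrices and a toy per-column absmax quantiser standing in for gf16_quantize_matrix. All names (quantize_column, gptq_quantize) are illustrative for this sketch, not the crate's API:

```rust
// Minimal sketch of the GPTQ inner loop with a pluggable black-box quantiser.
type Mat = Vec<Vec<f64>>; // row-major: m[r][c]

// Toy stand-in for Q: per-column absmax scale onto a signed 4-bit-style grid.
fn quantize_column(col: &[f64]) -> Vec<f64> {
    let amax = col.iter().fold(0.0_f64, |m, v| m.max(v.abs()));
    if amax == 0.0 {
        return col.to_vec();
    }
    let scale = amax / 7.0;
    col.iter()
        .map(|v| (v / scale).round().clamp(-7.0, 7.0) * scale)
        .collect()
}

// Cholesky H = L·L^T for symmetric positive-definite H.
fn cholesky(h: &Mat) -> Mat {
    let n = h.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let s: f64 = (0..j).map(|k| l[i][k] * l[j][k]).sum();
            l[i][j] = if i == j {
                (h[i][i] - s).sqrt()
            } else {
                (h[i][j] - s) / l[j][j]
            };
        }
    }
    l
}

// H^{-1} via two triangular solves per unit column: L·y = e_col, then L^T·x = y.
fn spd_inverse(h: &Mat) -> Mat {
    let n = h.len();
    let l = cholesky(h);
    let mut inv = vec![vec![0.0; n]; n];
    for col in 0..n {
        let mut y = vec![0.0; n];
        for i in 0..n {
            let s: f64 = (0..i).map(|k| l[i][k] * y[k]).sum();
            y[i] = ((if i == col { 1.0 } else { 0.0 }) - s) / l[i][i];
        }
        let mut x = vec![0.0; n];
        for i in (0..n).rev() {
            let s: f64 = (i + 1..n).map(|k| l[k][i] * x[k]).sum();
            x[i] = (y[i] - s) / l[i][i];
        }
        for i in 0..n {
            inv[i][col] = x[i];
        }
    }
    inv
}

// Column-wise GPTQ: quantise column j, then scatter its Hessian-weighted
// error onto the not-yet-quantised columns j+1.. via the H^{-1} row.
fn gptq_quantize(w0: &Mat, x: &Mat, lambda: f64) -> Mat {
    let (rows, cols) = (w0.len(), w0[0].len());
    let n_samples = x[0].len();
    // H = 2·X·X^T + λ·I  (X is cols × n_samples).
    let mut h = vec![vec![0.0; cols]; cols];
    for i in 0..cols {
        for j in 0..cols {
            h[i][j] = 2.0 * (0..n_samples).map(|s| x[i][s] * x[j][s]).sum::<f64>();
        }
        h[i][i] += lambda; // dampening keeps H positive definite
    }
    let h_inv = spd_inverse(&h);
    let mut w = w0.clone();
    let mut q_out = vec![vec![0.0; cols]; rows];
    for j in 0..cols {
        let col: Vec<f64> = (0..rows).map(|r| w[r][j]).collect();
        let q = quantize_column(&col);
        for r in 0..rows {
            let err = (col[r] - q[r]) / h_inv[j][j];
            for k in j + 1..cols {
                w[r][k] -= err * h_inv[j][k];
            }
            q_out[r][j] = q[r];
        }
    }
    q_out
}

fn main() {
    // Weights already on the toy grid: zero per-column error, so the loop
    // scatters nothing and returns W unchanged (the calibration_n = 0 sanity).
    let w = vec![vec![7.0, -7.0, 3.0], vec![1.0, 2.0, 7.0]];
    let x = vec![
        vec![1.0, 0.0, 1.0, 2.0],
        vec![0.0, 1.0, 1.0, 0.0],
        vec![1.0, 1.0, 0.0, 1.0],
    ];
    println!("{:?}", gptq_quantize(&w, &x, 0.01));
}
```

The real implementation would take X as a concatenation of calibration batches and route the calibration_n = 0 case straight through the naive quantiser, per the byte-equivalence requirement below.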

Acceptance gates

Gate Check
G1 cargo check --all-targets clean
G2 cargo test -p trios-golden-float green (reconstruction-MSE invariant)
G3 coqc coq/Trios_GPTQ_GF16.v clean, no Admitted.
G4 cargo run --release --bin gptq_calibration_ablation produces 9 rows + paired-t row in assertions/calibration_ablation.jsonl
G5 Paired-t analysis (one-tailed, df=2) printed to stdout: report t-stat, p, verdict at p<0.25 for both (N=0 vs N=16) and (N=16 vs N=32)
G6 All required CI checks green

PR mechanics

  • Title: feat(gf16): port GPTQ Hessian-correction with GF16 quantiser (replicates parameter-golf#2135 lever on CPU)
  • Branch: feat/gptq-on-gf16 off main
  • Body: Why · External reference (link to #2135) · Falsifier · Lane summary · Acceptance gates table · Anchor
  • Labels: enhancement, P1, experiment
  • Squash-merge, delete branch, no --admin

Anti-fakery rules

  • Calibration data MUST come from training shards only (no validation peek).
  • All 9 ablation cells must be independently replayable — emit seed, N, git_sha, wallclock_ms, reconstruction_mse, bpb_post_quant per row.
  • The N=0 row must be byte-identical reconstruction to current gf16_quantize_matrix (sanity assert in test).
  • Paired-t rows must include raw per-seed Δ values (no summary-only).
  • No claim of "lift confirmed" unless the paired-t reaches p<0.25 for BOTH steps, (0→16) AND (16→32).
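For concreteness, one hedged sketch of how a replayable ablation row could be serialised. Field names follow the per-row list above; the helper name, field order, and example values are illustrative, not the bin's actual output:

```rust
// Hypothetical serialiser for one row of assertions/calibration_ablation.jsonl.
// Fields mirror the anti-fakery requirements: seed, N, git_sha, wallclock_ms,
// reconstruction_mse, bpb_post_quant. Values below are placeholders.
fn ablation_row(
    seed: u64,
    calibration_n: u32,
    git_sha: &str,
    wallclock_ms: u64,
    reconstruction_mse: f64,
    bpb_post_quant: f64,
) -> String {
    format!(
        "{{\"seed\":{},\"calibration_n\":{},\"git_sha\":\"{}\",\"wallclock_ms\":{},\"reconstruction_mse\":{:.6e},\"bpb_post_quant\":{:.5}}}",
        seed, calibration_n, git_sha, wallclock_ms, reconstruction_mse, bpb_post_quant
    )
}

fn main() {
    // Placeholder sha and metrics; the real bin would pin the actual git sha.
    println!("{}", ablation_row(47, 16, "0000000", 1234, 1.5e-4, 0.91234));
}
```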

Forbidden

  • ❌ no [scrape] / [crawl] words anywhere
  • ❌ no --admin merge
  • ❌ no DROP / TRUNCATE in any migration
  • ❌ no NEON_* env primary
  • ❌ no commit of /tmp/*.sh
  • ❌ no calibration data sourced from validation chunks
  • ❌ no fake green: failing the falsifier IS a valid result, document it honestly per R5

R-discipline

R1 Rust-only (no Python in src) · R3 PR-only · R4 trace (every cell timestamped + sha-pinned) · R5 honest (falsifier explicitly stated) · R7 witness (assertions/jsonl) · R8 falsifier · R10 atomic (single-purpose PR per lane) · R12 reversible (calibration_n=0 path preserved as default).

Battle cry

phi^2 + phi^-2 = 3 · TRINITY · PORT THE LEVER · PROVE OR FALSIFY ON GF16
