🎯 ONE SHOT — Wave 26 · L-GPTQ-ON-GF16: replicate PR #2135 calibration lever on Trinity GF16
Anchor: phi^2 + phi^-2 = 3 · DOI 10.5281/zenodo.19227877
Author for ALL commits: Dmitrii Vasilev <admin@t27.ai>
Branch: feat/gptq-on-gf16
Base: main
External reference: openai/parameter-golf#2135 (GPTQ_CALIBRATION_BATCHES 16 → 32, paired-t one-tail p<0.25 — -0.00457 BPB / -0.01000 nats over PR #1855)
Why
PR #2135 establishes that doubling the GPTQ Hessian calibration set provides a statistically significant downstream BPB improvement (paired-t p=0.138 across seeds {0, 42, 314}) on top of an already-tuned int6/int7 stack with TTT.
The claim is about the algorithm, not the bit-format. GPTQ is Q(·)-agnostic — it minimises ‖W·X − Q(W)·X‖² by Hessian-corrected error redistribution across columns and admits any quantiser as a black-box Q. We currently quantise GF16 via single-pass max(|w|) scale fit (crates/trios-golden-float/src/lib.rs:136 quantize_matrix) — i.e. our equivalent of "GPTQ_CALIBRATION_BATCHES" today is 0.
This ONE SHOT verifies whether the same lever lifts our floor on CPU-only + GF16 by porting the GPTQ inner loop with our gf16_quantize_matrix plugged in as the quantiser.
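The baseline being challenged is the single-pass max(|w|) scale fit. A minimal sketch of that fit, using a generic signed integer grid as a stand-in for the real GF16 codes (the actual GF16 layout lives in trios-golden-float and is not reproduced here; names below are illustrative, not the crate API):

```rust
// Stand-in for the naive single-pass scale fit: per-row absmax scale
// onto a signed grid [-q_max, q_max]. The real quantize_matrix targets
// GF16 codes instead of plain integers.

/// Quantise one row with an absmax-fitted scale.
/// Returns (scale, codes); dequant is `code as f32 * scale`.
fn absmax_quantize_row(w: &[f32], q_max: i32) -> (f32, Vec<i32>) {
    let amax = w.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    if amax == 0.0 {
        return (0.0, vec![0; w.len()]);
    }
    let scale = amax / q_max as f32;
    let codes = w.iter().map(|&x| (x / scale).round() as i32).collect();
    (scale, codes)
}

fn dequantize_row(scale: f32, codes: &[i32]) -> Vec<f32> {
    codes.iter().map(|&c| c as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.0, 0.25];
    let (s, q) = absmax_quantize_row(&w, 127);
    println!("scale = {s}, codes = {q:?}, dequant = {:?}", dequantize_row(s, &q));
}
```

The point of the wave is that this fit looks only at each weight in isolation; GPTQ adds the cross-column Hessian correction on top of whatever Q does per column.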
Falsifier
H0: GPTQ-correction with N ∈ {16, 32} calibration batches gives no significant BPB improvement over naive single-pass GF16 quantisation (paired-t one-tail p ≥ 0.25 across canon seeds {47, 89, 144}).
If H0 cannot be rejected, the result is itself publishable: it says naive GF16 already sits on the Hessian-floor of its representational grid and PR #2135's lever is bit-format-specific.
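The verdict machinery is small enough to sketch. With 3 seeds the paired t-statistic has df = 2, for which the Student-t CDF has a closed form, so no stats crate is needed. A sketch assuming per-seed deltas are oriented so that positive means improvement (values below are hypothetical, not measured):

```rust
// Paired one-tailed t-test for 3 seeds (df = 2).
// For df = 2 the Student-t CDF is F(t) = 1/2 + t / (2*sqrt(2 + t^2)),
// so the one-tail p-value is 1 - F(t).

fn p_one_tail_df2(t: f64) -> f64 {
    1.0 - (0.5 + t / (2.0 * (2.0 + t * t).sqrt()))
}

/// Returns (t_stat, one_tail_p) for paired per-seed deltas.
fn paired_t(deltas: &[f64]) -> (f64, f64) {
    let n = deltas.len() as f64;
    let mean = deltas.iter().sum::<f64>() / n;
    // sample variance (n - 1 denominator), then standard error of the mean
    let var = deltas.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let t = mean / (var.sqrt() / n.sqrt());
    (t, p_one_tail_df2(t))
}

fn main() {
    // hypothetical BPB deltas for seeds {47, 89, 144}
    let deltas = [0.0041, 0.0052, 0.0037];
    let (t, p) = paired_t(&deltas);
    println!("t = {t:.3}, one-tail p = {p:.3}, reject H0 at 0.25: {}", p < 0.25);
}
```

H0 is rejected only when the computed p falls below 0.25 for the relevant comparison; the same routine serves both the (0 vs 16) and (16 vs 32) contrasts in G5.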
Mission
Lane structure (3 lanes, sequential)
| Lane | Deliverable | Acceptance |
| --- | --- | --- |
| L-26-A Coq invariant | coq/Trios_GPTQ_GF16.v proving gptq_correct ∘ Q_GF16 preserves the invariant ‖W·X − dequant(Q(W))·X‖² ≤ ‖W·X − Q_naive(W)·X‖² for all PSD H = 2·X·X^T | coqc clean, witness JSON in assertions/coq_runtime_invariants.json |
| L-26-B Rust impl | trios-golden-float: new fn gf16_quantize_matrix_gptq(W, X_batches, calibration_n) — Cholesky on H + λ·I, column-wise quant + error scatter via the H^{-1} row, with gf16_quantize_matrix plugged in as the inner Q | cargo test -p trios-golden-float green; reconstruction-MSE strictly ≤ baseline on ≥ 3 random PSD X |
| L-26-C 3-seed ablation | bin gptq_calibration_ablation runs the 3×3 grid (seeds {47, 89, 144} × N ∈ {0, 16, 32}) on canonical IGLA replay; emits assertions/calibration_ablation.jsonl (one JSON per row) and a paired-t analysis | rows present for all 9 cells; paired-t script reproduces the verdict on stdin replay |
Files to create / modify
coq/Trios_GPTQ_GF16.v NEW (~250 lines)
crates/trios-golden-float/src/gptq.rs NEW (~180 lines)
crates/trios-golden-float/src/lib.rs +pub use gptq::*
crates/trios-golden-float/tests/gptq_reconstruction.rs NEW (~120 lines)
src/bin/gptq_calibration_ablation.rs NEW (~220 lines)
assertions/calibration_ablation.jsonl NEW (10 rows: 9 grid + 1 verdict)
assertions/coq_runtime_invariants.json APPEND 1 entry
docs/wave26_gptq_on_gf16.md NEW report
MIGRATION.md / CHANGELOG.md +1 line
Algorithm — exact GPTQ inner loop (port from PR #2135 lineage)
```text
input:  W ∈ R^{rows × cols}
        X ∈ R^{cols × n_samples}   # concatenation of N calibration batches
        Q : R^{rows} → GF16        # = gf16_quantize_matrix as black box
        λ : f32                    # dampening, default 1e-2 · trace(H)/cols

H     ← 2 · X · X^T + λ·I
L     ← Cholesky(H)                # lower-triangular
H_inv ← solve_triangular(L^T, solve_triangular(L, I))   # = L^{-T} · L^{-1}

for j = 0..cols-1:
    w_j ← W[:, j]
    q_j ← Q(w_j)
    err ← (w_j − dequant(q_j)) / H_inv[j, j]
    # scatter remaining error onto yet-unquantised columns
    W[:, j+1..] -= err · H_inv[j, j+1..]
    Q_OUT[:, j] ← q_j

return Q_OUT
```
Calibration data X is sampled from training shards only (R-#1017-style) — never validation. With calibration_n = 0, the function MUST be byte-equivalent to gf16_quantize_matrix (i.e. naive scale fit with no error scatter) — this is the baseline run.
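The column loop above can be compressed into a few lines of Rust. A sketch that takes H_inv as precomputed (in the real lane it comes from the Cholesky solve on H + λ·I) and uses a plain round-to-grid quantiser as a stand-in for gf16_quantize_matrix; matrices are row-major slices, and all names here are illustrative:

```rust
// GPTQ error-scatter pass: quantise columns left-to-right, pushing each
// column's Hessian-weighted residual onto the not-yet-quantised columns.
// `w` is rows x cols row-major and is consumed (mutated) by the pass.

fn quantize_scalar(w: f64, step: f64) -> f64 {
    // stand-in Q: dequantised value on a uniform grid of width `step`
    (w / step).round() * step
}

fn gptq_columns(w: &mut [f64], rows: usize, cols: usize, h_inv: &[f64], step: f64) -> Vec<f64> {
    let mut q_out = vec![0.0; rows * cols];
    for j in 0..cols {
        let d_jj = h_inv[j * cols + j]; // diagonal entry H_inv[j, j]
        for r in 0..rows {
            let wj = w[r * cols + j];
            let qj = quantize_scalar(wj, step);
            q_out[r * cols + j] = qj;
            let err = (wj - qj) / d_jj;
            // scatter onto columns j+1.. proportionally to H_inv[j, k]
            for k in (j + 1)..cols {
                w[r * cols + k] -= err * h_inv[j * cols + k];
            }
        }
    }
    q_out
}

fn main() {
    let (rows, cols) = (1, 3);
    let mut w = vec![0.30, -0.70, 0.45];
    // identity H_inv => no cross-column correction (degenerate sanity case,
    // matching the calibration_n = 0 / naive baseline behaviour)
    let h_inv = vec![1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0];
    let q = gptq_columns(&mut w, rows, cols, &h_inv, 0.25);
    println!("{q:?}");
}
```

With an identity H_inv the scatter terms vanish and the pass degenerates to per-column quantisation, which is exactly the byte-equivalence property the calibration_n = 0 baseline must satisfy.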
Acceptance gates
| Gate | Check |
| --- | --- |
| G1 | cargo check --all-targets clean |
| G2 | cargo test -p trios-golden-float green (reconstruction-MSE invariant) |
| G3 | coqc coq/Trios_GPTQ_GF16.v clean, no Admitted. |
| G4 | cargo run --release --bin gptq_calibration_ablation produces 9 rows + paired-t row in assertions/calibration_ablation.jsonl |
| G5 | Paired-t analysis (one-tailed, df=2) printed to stdout: report t-stat, p, verdict at p<0.25 for both (N=0 vs N=16) and (N=16 vs N=32) |
| G6 | All required CI checks green |
PR mechanics
- Title: feat(gf16): port GPTQ Hessian-correction with GF16 quantiser (replicates parameter-golf#2135 lever on CPU)
- Branch: feat/gptq-on-gf16 off main
- Body: Why · External reference (link to #2135) · Falsifier · Lane summary · Acceptance gates table · Anchor
- Labels: enhancement, P1, experiment
- Squash-merge, delete branch, no --admin
Anti-fakery rules
- Calibration data MUST come from training shards only (no validation peek).
- All 9 ablation cells must be independently replayable — emit seed, N, git_sha, wallclock_ms, reconstruction_mse, bpb_post_quant per row.
- The N=0 row must reconstruct byte-identically to the current gf16_quantize_matrix (sanity assert in test).
- Paired-t rows must include the raw per-seed Δ values (no summary-only rows).
- No claim of "lift confirmed" unless the paired-t reaches p<0.25 for BOTH (0→16) AND (16→32).
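For concreteness, one ablation cell in calibration_ablation.jsonl could look like the row below (field names are the ones mandated above; every value shown is a made-up placeholder, not a measurement):

```json
{"seed": 47, "N": 16, "git_sha": "abc1234", "wallclock_ms": 41872, "reconstruction_mse": 3.2e-5, "bpb_post_quant": 1.0123}
```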
Forbidden
- ❌ no [scrape] / [crawl] words anywhere
- ❌ no --admin merge
- ❌ no DROP / TRUNCATE in any migration
- ❌ no NEON_* env primary
- ❌ no commit of /tmp/*.sh
- ❌ no calibration data sourced from validation chunks
- ❌ no fake green: failing the falsifier IS a valid result, document it honestly per R5
R-discipline
R1 Rust-only (no Python in src) · R3 PR-only · R4 trace (every cell timestamped + sha-pinned) · R5 honest (falsifier explicitly stated) · R7 witness (assertions/jsonl) · R8 falsifier · R10 atomic (single-purpose PR per lane) · R12 reversible (calibration_n=0 path preserved as default).
Battle cry
phi^2 + phi^-2 = 3 · TRINITY · PORT THE LEVER · PROVE OR FALSIFY ON GF16