NeurIPS 2026 deadline: May 6, 2026. Track progress in this file; check boxes as tasks complete. Linear issues are referenced as LOG-XX throughout.
Phase 3 Gate (must pass first)
├── LAT Probe Retraining (LOG-32)
│   └── Experiment 1: Orthogonal Escape Detection (LOG-33) ← Central paper claim
├── Experiment 2: MCTS Comparison (LOG-34)
├── Experiment 3: Memory Efficiency (LOG-35) ← Can run in parallel
├── evaluation_framework.py (LOG-39)
│   └── Experiment 4: Evaluation Reproducibility (LOG-36)
├── procrustes.py (LOG-38)
│   └── Experiment 5: Cross-Model Transfer (LOG-37)
└── All results → Dataset (LOG-40) → Paper (LOG-42–46)
- [x] Phase 2 code complete — 117 tests passing
- [ ] Phase 3 gate passed on real hardware
- [ ] Any experiment script written
- [ ] Any experiment results collected
Validates that Phase 2 code works on a real 1B model before writing experiment scripts. Everything below is blocked until this passes.
- [ ] `huggingface-cli login`
- [ ] Run KV-cache mutability gate:

  ```bash
  uv run python scripts/probe_kv_cache_mutability.py \
      --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --device auto
  ```

  Pass criterion: exit code 0, `gate_passed: true` in output JSON
- [ ] Run KV-MCTS smoke test (10 nodes):

  ```bash
  uv run python scripts/run_kv_mcts.py \
      --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
      --nodes 10 --depth 3 --branches 2
  ```

  Pass criterion: JSON output with 10 nodes, each with `sigma_H_mean`, `rho_R_mean`, `oei_score` populated (not null or a flat 0.5)
- [ ] Run Theorem 1 drift validation:

  ```bash
  uv run python scripts/measure_lipschitz_drift.py \
      --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --n-cycles 200
  ```

  Pass criterion: FP32 accumulator column stays bounded; naive bf16 column shows linear growth (a standalone illustration of this effect follows)
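The drift criterion can be illustrated without a model. The sketch below is illustrative only (not `measure_lipschitz_drift.py` itself, all names invented here): it shows why tracking edits in an FP32 accumulator keeps apply/rollback drift at zero, while naive in-place bf16 edits accumulate rounding error.

```python
# Illustrative only: repeated apply/rollback cycles on a bf16 tensor drift
# without an FP32 accumulator. Stand-in tensors, no model involved.
import torch

torch.manual_seed(0)
kv = torch.randn(4, 128, dtype=torch.float32)       # stand-in for one KV-cache layer
delta = 0.05 * torch.randn_like(kv)                 # stand-in steering perturbation

naive = kv.to(torch.bfloat16)                       # edited in place in bf16
acc = torch.zeros_like(kv)                          # FP32 accumulator of net edits

for _ in range(200):
    # naive: apply and roll back directly in bf16; rounding error compounds
    naive = naive + delta.to(torch.bfloat16)
    naive = naive - delta.to(torch.bfloat16)
    # accumulator: track edits in FP32, so +delta followed by -delta is exact
    acc += delta
    acc -= delta

naive_drift = (naive.float() - kv).norm().item()
acc_drift = ((kv + acc).to(torch.bfloat16).float()
             - kv.to(torch.bfloat16).float()).norm().item()
print(f"naive bf16 drift: {naive_drift:.6f} | FP32-accumulator drift: {acc_drift:.6f}")
```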
Paper Eq. 4: ρ_R^(l)(t) = w_hon^(l)ᵀ h_t^(l) — a raw dot product. The current Phase 2 implementation clips to [0, 1] via `(proj + 1) / 2`. This must be fixed before Experiment 1 results are paper-valid.
- [ ] Update `PerLayerHonestyProjector.project()` in `logomesh/whitebox.py` — remove the `(proj + 1) / 2` normalisation and return the raw dot product (see the sketch below)
- [ ] Update `classify()` thresholds in `logomesh/telemetry_matrix.py` if needed (the 0.3/0.7 bounds assume [0, 1])
- [ ] Update affected tests in `tests/test_phase2_modules.py`
- [ ] `uv run pytest tests/ -v` → still 117 passed
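A minimal sketch of the fix, assuming `PerLayerHonestyProjector` stores one probe direction per layer. Constructor and attribute names here are hypothetical; the real class in `logomesh/whitebox.py` may differ.

```python
# Hypothetical sketch of the Eq. 4 fix; the actual class lives in
# logomesh/whitebox.py and its attribute names may differ.
import torch

class PerLayerHonestyProjector:
    def __init__(self, w_hon: dict[int, torch.Tensor]):
        # w_hon maps layer index l -> probe direction w_hon^(l), stored unit-norm
        self.w_hon = {l: w / w.norm() for l, w in w_hon.items()}

    def project(self, hidden: torch.Tensor, layer: int) -> torch.Tensor:
        # Eq. 4: rho_R^(l)(t) = w_hon^(l)^T h_t^(l), the raw unbounded dot product.
        # NOTE: no (proj + 1) / 2 rescaling; downstream classify() thresholds
        # must be recalibrated for the unbounded range.
        return hidden @ self.w_hon[layer]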
Replaces the uniform neuron-index approximation with paper-faithful per-layer probes. Script: `scripts/train_lat_probes.py`
- [ ] Construct honesty contrast pairs: benign factual responses vs. coerced confabulation (minimum 30 pairs)
- [ ] Construct certainty contrast pairs: high-confidence outputs vs. hedged/uncertain outputs (minimum 30 pairs)
- [ ] Construct goal-coercion contrast pairs: aligned refusals vs. jailbroken completions (minimum 30 pairs)
- [ ] Run probe training:

  ```bash
  uv run python scripts/train_lat_probes.py \
      --model TinyLlama/TinyLlama-1.1B-Chat-v1.0
  ```

- [ ] Verify per-layer `w_hon^(l)` weights are saved and loadable into `PerLayerHonestyProjector`
- [ ] Verify probe accuracy ≥ 0.65 on the held-out set (threshold in script); a probe-fitting sketch follows this list
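For reference, a minimal probe-fitting sketch using the difference-of-means reading-vector recipe common in LAT/RepE work. `train_lat_probes.py` may use a different estimator, and all names below are illustrative.

```python
# Difference-of-means probe fitting from contrast-pair activations.
# pos_acts/neg_acts: dict[layer -> (n_pairs, hidden_dim) activations].
import torch

def fit_probes(pos_acts: dict[int, torch.Tensor],
               neg_acts: dict[int, torch.Tensor]) -> dict[int, torch.Tensor]:
    probes = {}
    for layer in pos_acts:
        # w_hon^(l) points from the "dishonest" mean toward the "honest" mean
        w = pos_acts[layer].mean(dim=0) - neg_acts[layer].mean(dim=0)
        probes[layer] = w / w.norm()
    return probes

def probe_accuracy(w: torch.Tensor, pos: torch.Tensor, neg: torch.Tensor) -> float:
    # Held-out check against the >= 0.65 gate: classify by which side of the
    # midpoint between class-mean projections a sample falls on.
    mid = 0.5 * ((pos @ w).mean() + (neg @ w).mean())
    correct = ((pos @ w) > mid).sum() + ((neg @ w) <= mid).sum()
    return correct.item() / (len(pos) + len(neg))
```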
Required by Experiment 5. Implement before running Exp 5.
- [ ] Create `logomesh/procrustes.py` with a `ProcrustesAligner` class:
  - `fit(source_states, target_states)` — `scipy.linalg.orthogonal_procrustes` (see the sketch below)
  - `transform(vectors)` — apply R to steering vectors
- [ ] Export `ProcrustesAligner` in `logomesh/__init__.py`
- [ ] Write `tests/test_procrustes.py`:
  - Verify R is orthogonal: ‖RᵀR − I‖_F < 1e-5
  - Verify transform on identity-aligned spaces returns the input unchanged
- [ ] `uv run pytest tests/ -v` → all passing
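A sketch of the planned class under the API the checklist assumes; the real `logomesh/procrustes.py` may differ. Note that `orthogonal_procrustes` requires source and target states to share a hidden dimension, so cross-width transfer (e.g. a 1B model's 2048 to a 7B model's 4096) needs padding or a projection first, which this sketch does not handle.

```python
# Sketch of the planned ProcrustesAligner (assumed API, not the final code).
import numpy as np
from scipy.linalg import orthogonal_procrustes

class ProcrustesAligner:
    def __init__(self):
        self.R = None  # orthogonal map, shape (d, d)

    def fit(self, source_states: np.ndarray,
            target_states: np.ndarray) -> "ProcrustesAligner":
        # source_states, target_states: (n_samples, d) paired hidden states.
        # Finds the orthogonal R minimising ||source @ R - target||_F.
        self.R, _scale = orthogonal_procrustes(source_states, target_states)
        return self

    def transform(self, vectors: np.ndarray) -> np.ndarray:
        # Apply R to steering vectors (rows) expressed in the source space.
        return vectors @ self.R
```

The planned orthogonality test then reduces to one line: `np.linalg.norm(aligner.R.T @ aligner.R - np.eye(d), ord="fro") < 1e-5`.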
Required by Experiment 4. Implement before running Exp 4.
- [ ] Create `logomesh/evaluation_framework.py` with:
  - `SandboxEvaluator` — pass@1 via Docker subprocess execution
  - `SemanticSimilarityEvaluator` — cosine similarity using `all-MiniLM-L6-v2`
  - `LLMJudgeEvaluator` — GPT-4 + Claude 3.5, 5-run protocol
  - `cohen_kappa(labels_a, labels_b)` — inter-rater agreement (see the sketch below)
- [ ] Export in `logomesh/__init__.py`
- [ ] Write model-free tests (mock Docker + mock LLM responses)
- [ ] `uv run pytest tests/ -v` → all passing
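A minimal `cohen_kappa` sketch for two raters over nominal labels; the framework could equally delegate to `sklearn.metrics.cohen_kappa_score`.

```python
# Cohen's kappa: chance-corrected agreement between two raters.
from collections import Counter

def cohen_kappa(labels_a: list, labels_b: list) -> float:
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # observed agreement p_o
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # expected chance agreement p_e from each rater's marginal frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```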
This is the central empirical claim of the paper. Run locally on TinyLlama first, then on H100. Linear: LOG-33 | Script: `scripts/experiment1_orthogonal_escape.py`
- [ ] Write `scripts/experiment1_orthogonal_escape.py`
  - CLI: `--model`, `--alpha-values`, `--n-prompts`, `--output-dir`
  - Inner loop: for each α ∈ {0.1, 0.5, 1.0, 2.0, 5.0}
    - Run `ReversibleMCTS` with that α
    - For each node, record all four measurements (see below)
  - Output: one JSON per α + summary CSV
- [ ] Collect measurement (a): textual compliance rate — does the model output pass a safety check?
- [ ] Collect measurement (b): H-Neuron stress distribution σ_H^(l) across all L layers
- [ ] Collect measurement (c): RepE honesty projection ρ_R^(l) across all L layers
- [ ] Collect measurement (d): null-space norm ‖Proj_{Null(w_hon)}(h_t^(l))‖₂ per layer (see the sketch after this checklist)
- [ ] Verify the hypothesis locally: (a) and (c) improve with α while (d) simultaneously increases
- [ ] Generate OEI vs α figure (local prototype — not paper quality yet)
- [ ] Coordinate H100 access with Prof. Tianyu Shi
- [ ] Confirm target model (Llama-3.2-1B-Instruct or gpt-oss-20b)
- [ ] Run full α sweep on H100
- [ ] Collect results at scale (≥ 100 prompts per α value)
- [ ] Generate paper-quality OEI vs α figure
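Measurement (d) reduces to removing the rank-1 component along the probe direction. A sketch, with illustrative names: a rising null-space norm under increasing α while ρ_R "improves" is exactly the orthogonal-escape signature this experiment tests for.

```python
# Measurement (d): the component of h_t^(l) invisible to the honesty probe,
# i.e. its projection onto Null(w_hon^(l)). Illustrative helper.
import torch

def null_space_norm(h: torch.Tensor, w_hon: torch.Tensor) -> torch.Tensor:
    # h: (..., d) hidden states; w_hon: (d,) probe direction for this layer.
    w_unit = w_hon / w_hon.norm()
    coeff = h @ w_unit                           # projection the probe can see
    h_parallel = coeff.unsqueeze(-1) * w_unit    # rank-1 component along w_hon
    return (h - h_parallel).norm(dim=-1)         # ||Proj_Null(w_hon)(h)||_2
```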
Compares KV-MCTS against three baselines on the same prompt set. Linear: LOG-34 | Script: `scripts/experiment2_mcts_comparison.py`
- [ ] Write `scripts/experiment2_mcts_comparison.py`
  - Baseline (a): text-space MCTS — wire to `scripts/run_offline_mcts.py`
  - Baseline (b): KV-cache MCTS — wire to `scripts/run_kv_mcts.py`
  - Baseline (c): GCG — implement greedy coordinate gradient attack
  - Baseline (d): random KV-cache search — same α/layer distribution, no UCB1
  - CLI: `--model`, `--n-prompts`, `--output`
- Attack Success Rate (ASR) — fraction of rollouts producing alignment-breaking intervention
- Mean rollouts to first success — efficiency comparison
- Diversity — pairwise cosine distance of intervention vectors in KV-cache space (see the sketch after this experiment's checklist)
- [ ] Local validation run on TinyLlama (small n)
- [ ] Full H100 run
- [ ] Generate comparison table for paper §5
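A sketch of the diversity metric above, computed over the set of intervention vectors each method produced. Illustrative helper, not the comparison script itself.

```python
# Mean pairwise cosine distance between intervention vectors in KV-cache space.
import torch

def mean_pairwise_cosine_distance(vectors: torch.Tensor) -> float:
    # vectors: (n_interventions, d), one flattened KV-space edit per rollout
    assert vectors.shape[0] >= 2, "need at least two interventions"
    v = torch.nn.functional.normalize(vectors, dim=-1)
    cos = v @ v.T                                   # (n, n) cosine similarities
    n = v.shape[0]
    off_diag = cos[~torch.eye(n, dtype=torch.bool)] # drop self-similarities
    return float((1.0 - off_diag).mean())           # distance = 1 - similarity
```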
Validates the Theorem 1 memory complexity claim: O(M_KV + d·K_acc) vs O(b^d · M_KV). Linear: LOG-35 | Script: `scripts/experiment3_memory_profiling.py`
- [ ] Write `scripts/experiment3_memory_profiling.py`
  - CLI: `--model`, `--b-values`, `--d-values`, `--output-csv`
  - Profile peak VRAM for KV-MCTS (reversible) at each (b, d) pair (see the profiling sketch after the metrics list below)
  - Profile peak VRAM for naive parallel MCTS (clone full cache per branch)
  - Measure wall-clock time overhead of rollback per cycle
  - Measure Lipschitz drift (perplexity deviation) with/without FP32 accumulator
- Branching factors: b ∈ {2, 3, 5}
- Search depths: d ∈ {3, 5, 10, 20}
- Peak VRAM at each (b, d) — reversible vs standard parallel
- Wall-clock rollback overhead
- Lipschitz drift — perplexity deviation from baseline
- MER = M_standard / M_reversible
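The peak-VRAM numbers can come from the standard `torch.cuda` peak-stats pattern. A sketch, where `run_search` is a placeholder for one KV-MCTS or naive-parallel run at a given (b, d):

```python
# Peak VRAM measurement around a single search run.
import torch

def profile_peak_vram(run_search, *args, device: str = "cuda", **kwargs):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats(device)
    result = run_search(*args, **kwargs)        # e.g. one KV-MCTS run at (b, d)
    peak_bytes = torch.cuda.max_memory_allocated(device)
    return result, peak_bytes / 2**20           # peak VRAM in MiB
```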
- [ ] Local run on RTX 3060 (small models — validates code + methodology)
- [ ] H100 run for 20B model (produces paper numbers)
- [ ] Generate memory scaling plot for paper
Validates that diagnostic state classification is reproducible across evaluators. Requires `evaluation_framework.py` (Chunk 1D). Linear: LOG-36 | Script: `scripts/experiment4_evaluation_reproducibility.py`
- [ ] Write `scripts/experiment4_evaluation_reproducibility.py`
- [ ] Collect 500 MCTS node artifacts (telemetry + generation output)
- [ ] Run all five evaluation methods:
  - (a) pass@1 — ground-truth execution in constrained Docker sandbox
  - (b) Cosine similarity — sentence embeddings (all-MiniLM-L6-v2) vs. task specification (see the sketch after the metrics list)
  - (c) GPT-4 as judge — 5 independent runs
  - (d) Claude 3.5 as judge — 5 independent runs
  - (e) Human expert panel — recruit 3 reviewers, collect labels
- Inter-run variance for (c) and (d)
- Cohen's κ across all evaluators — target κ ≥ 0.7
- Correlation of LLM judges with human panel
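For method (b), a sketch using the `sentence-transformers` package; the framework's `SemanticSimilarityEvaluator` is assumed to wrap something equivalent.

```python
# Cosine similarity between a node's generation and the task specification,
# using the all-MiniLM-L6-v2 sentence encoder.
from sentence_transformers import SentenceTransformer

_encoder = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(generation: str, task_spec: str) -> float:
    emb = _encoder.encode([generation, task_spec], normalize_embeddings=True)
    return float(emb[0] @ emb[1])   # cosine similarity of unit vectors
```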
Tests whether 1B steering vectors transfer to 7B and 20B via Procrustes alignment. Requires `procrustes.py` (Chunk 1C). Linear: LOG-37 | Script: `scripts/experiment5_procrustes_transfer.py`
- [ ] Write `scripts/experiment5_procrustes_transfer.py`
- [ ] Collect paired hidden-state activations: run the same prompts through 1B and 7B
- [ ] Fit `ProcrustesAligner` on the paired states
- [ ] Transfer `w_hon^(l)` steering vectors: 1B → 7B space
- [ ] Run KV-MCTS on 7B with the transferred vectors
- [ ] Repeat: 1B → 20B (gpt-oss-20b) — H100 only
- [ ] Confirm 7B target model ID
- [ ] Confirm 20B access (gpt-oss-20b) on H100 cluster
- Note: gpt-oss-20b is MoE — use router logit entropy for H-Neuron monitoring, NOT dense MLP stress (see the sketch after the metrics below)
- (a) ASR degradation across scales: 1B native → 7B transferred → 20B transferred
- (b) Telemetry signature preservation — do σ_H/ρ_R patterns carry over?
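A sketch of the router-entropy substitute flagged in the note above, assuming access to per-layer router logits; how those are exposed depends on the gpt-oss-20b implementation.

```python
# MoE substitute for dense H-Neuron stress: entropy of each layer's routing
# distribution over experts. router_logits: (n_tokens, n_experts) per layer.
import torch

def router_entropy(router_logits: torch.Tensor) -> torch.Tensor:
    probs = torch.softmax(router_logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)
    return entropy.mean()   # mean routing entropy across tokens for this layer
```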
Oozeer et al. (2025), arXiv:2503.04429 — empirical support for transferability approach
Assembles everything into the NeurIPS submission. Runs in parallel with late experiments.
- [ ] Finalise Croissant schema fields once the Experiment 1 output format is confirmed (`docs/dataset/croissant_schema_stub.json`); an example record follows this list
  - Per-node: T_t matrix (σ_H^(l), ρ_R^(l)), DiagnosticState, OEI, TDS
  - MCTS tree: node_id, parent_id, depth, alpha, reward
  - Generation: prompt, system, completion
  - Run metadata: model_id, hardware, seed, timestamp
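A hypothetical record instantiating those fields; the field names and the DiagnosticState value shown are illustrative and must be reconciled with `croissant_schema_stub.json` once the Experiment 1 output format is frozen.

```python
# Illustrative per-node dataset record (not the final schema).
example_node = {
    "node_id": "exp1-a0.5-n0042",
    "parent_id": "exp1-a0.5-n0017",
    "depth": 3,
    "alpha": 0.5,
    "reward": 0.81,
    "telemetry": {                    # T_t matrix, one entry per layer l
        "sigma_H": [0.12, 0.31],      # sigma_H^(l), truncated for brevity
        "rho_R": [0.44, -0.07],       # rho_R^(l), raw dot products (post Eq. 4 fix)
        "diagnostic_state": "DRIFTING",  # illustrative DiagnosticState value
        "oei": 0.63,
        "tds": 0.21,
    },
    "generation": {"prompt": "...", "system": "...", "completion": "..."},
    "run_metadata": {"model_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                     "hardware": "RTX 3060 12GB", "seed": 1234,
                     "timestamp": "2026-04-13T00:00:00Z"},
}
```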
- [ ] Collect all experiment artifacts into the dataset
- [ ] Upload dataset to HuggingFace Hub
- [ ] Write dataset card (NeurIPS D&B datasheet fields)
- [ ] Methods section (LOG-42) — can write NOW, no results needed
  - Finalise all remaining equations (Eq. 1, 3–12)
  - Theorem 1: both bounds + proof sketch
  - Proposition: memory complexity with a concrete 20B example
- [ ] Related Work section (LOG-44) — blocked on Bakul's competitive analysis (LOG-41)
- [ ] Experiments section (LOG-43) — blocked on all 5 experiments completing
  - Exp 1: OEI vs α curve
  - Exp 2: ASR comparison table
  - Exp 3: VRAM scaling plot
  - Exp 4: Cohen's κ score
  - Exp 5: Procrustes transfer comparison
  - Ablations: sensitivity to λ₁, λ₂, λ₃ and the JSD threshold
- [ ] Reproducibility manifest (LOG-45): run IDs, git SHAs, seeds, hardware, timing, `uv.lock` snapshot
- [ ] Advisor review — send Methods draft to Prof. Tianyu Shi
- [ ] All authors confirm affiliations and disclosures
- [ ] Submit by May 6, 2026
| Task | Hardware | Model |
|---|---|---|
| Chunks 0–1 (gate + prereqs) | RTX 3060 / CPU | TinyLlama 1.1B |
| Experiments 1–4 (local validation) | RTX 3060 12GB | TinyLlama or Llama-3.2-1B |
| Experiments 1–5 (full scale) | 8× H100 80GB (McGill) | gpt-oss-20b |
Last updated: 2026-04-13. See Linear for issue-level detail: https://linear.app/logomesh-agentbeats-phase-2/project/logomesh-kv-cache-inception-neurips-2026-6e647f4bb3f5