@@ -0,0 +1,13 @@
# Retained Local Artifacts

Large model artifacts are not committed in this non-record folder. They are
currently retained locally at:

| Artifact | Path | Bytes | SHA256 |
|---|---|---:|---|
| final full-precision model | `/home/simon/castorv2/runs/castor_l7grow_v4_12h_seed1337/final_model.pt` | 135431355 | `02959aa988dd1668ca696ce1a0058309ea4fe52d3505f2a560f5240d74f6bac9` |
| final full-precision snapshot | `/home/simon/castorv2/logs/castor_l7grow_v4_12h_seed1337.final_model_snapshot.pt` | 135431355 | `02959aa988dd1668ca696ce1a0058309ea4fe52d3505f2a560f5240d74f6bac9` |
| latest training checkpoint | `/home/simon/castorv2/runs/castor_l7grow_v4_12h_seed1337/checkpoints/latest.pt` | 286390027 | `734a0f69d377a96439ae3ba8a4814741c5b270f6ef921e912f10ed48f93e4466` |

The run used `SKIP_FINAL_PACKAGING=1`, so no final compressed int6 `.ptz`
artifact was produced for this archive.
@@ -0,0 +1,114 @@
# L7 growth v4 precursor to PR 2014, 12 hours on RTX 4090, val_bpb 0.9697 pre-quant

This is an archival **non-record** submission package for a 12-hour Castor
pretraining run based on the l7 growth v4 recipe.

I ran this for a personal project, but I think the result is interesting, so I decided to share it even though we are past the deadline.

## Main Differences From PR 2014

- Max context size was 8k instead of 3k.
- I didn't pre-compile for each context size, since the cost of compilation on a 12-hour run is not significant.
- I used a customized LR curve that I didn't include in PR 2014, since it doesn't quantize well.
- No EMA.
- The datasets used are different; they are detailed in the included .yaml file.

## Result

The exact logged final metrics are:

```text
pre-quantization post-ema val_loss:1.83671792 val_bpb:0.96976490 eval_time:184477ms
final_int6_roundtrip_exact val_loss:1.83671792 val_bpb:0.96976490 skipped_packaging:1
```

Notes:

- `EMA_ENABLED=0` in the config, despite the historical log string saying
`post-ema`.
- `SKIP_FINAL_PACKAGING=1`, so no final compressed 16MB package was produced.
- Because packaging was skipped, the `final_int6_roundtrip_exact` line should be
read as a no-packaging roundtrip/check value, not as a produced compressed
int6 submission artifact.
- The retained full-precision model is 135,431,355 bytes and is intentionally
not committed to this folder.
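
For reference, assuming `val_bpb` is the usual bits-per-byte metric (cross-entropy converted to bits per token, divided by the average bytes per token of the validation text), the two logged numbers imply roughly 2.73 bytes per token for the SP8192 tokenizer on this validation set. A minimal sketch of that arithmetic:

```python
import math

# Logged values; the bits-per-byte definition used here is the standard one
# and is assumed, not taken from the trainer code.
val_loss = 1.83671792                    # mean cross-entropy, nats per token
val_bpb = 0.96976490                     # bits per byte

bits_per_token = val_loss / math.log(2)              # ~2.650 bits/token
implied_bytes_per_token = bits_per_token / val_bpb   # ~2.73 bytes/token
print(round(bits_per_token, 3), round(implied_bytes_per_token, 3))
```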

## Model And Training Setup

- Parameters: `35,944,536` (a rough accounting sketch follows this list)
- Vocabulary: `8192`, tokenizer `fineweb_8192_bpe.model`
- Layers: `11`
- Model dim: `512`
- Heads: `8`, KV heads: `4`
- MLP multiplier: `4.0`
- Looping: enabled at `0.35`, `loop_start=3`, `loop_end=5`, `num_loops=2`
- Training wallclock cap: `43200s`
- Stopped at step `38707/100000`
- Training batch tokens: `262144`
- Validation batch tokens: `131072`
- Eval context: `8192`
- Eval stride: `4096`
- TTT: enabled, `8` epochs, `32768` chunk tokens, SGD LR `0.005`
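
As a rough sanity check on the parameter count, the back-of-the-envelope sketch below assumes tied input/output embeddings, standard bias-free GQA projection shapes, and 11 uniquely parameterized blocks (i.e. the looping reuses existing blocks rather than adding parameters). None of these assumptions are confirmed by the included code, but the total lands within about 0.1% of the logged figure, with the remainder plausibly norms and other small parameters.

```python
# Back-of-the-envelope parameter accounting (assumptions noted above; this is
# not the actual model code): vocab 8192, d_model 512, 11 blocks,
# 8 heads / 4 KV heads, MLP multiplier 4.0, tied embeddings.
vocab, d, layers, heads, kv_heads, mlp_mult = 8192, 512, 11, 8, 4, 4.0

head_dim = d // heads                 # 64
kv_dim = kv_heads * head_dim          # 256

embed = vocab * d                     # 4,194,304 (assumed tied with the LM head)
attn = d * d + d * kv_dim + d * kv_dim + d * d   # Wq + Wk + Wv + Wo = 786,432
mlp = 2 * d * int(mlp_mult * d)       # up + down projections = 2,097,152
total = embed + layers * (attn + mlp)  # 35,913,728 vs. logged 35,944,536

print(f"{total:,}")
```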

Progressive context schedule:

```text
1024@0.200,2048@0.750,4096@0.850,8192@1.000
```

Midrun LR cap schedule:

```text
1.000@0.000,1.000@0.400,0.500@0.400,0.300@0.500,0.180@0.600,0.110@0.700,0.090@0.800,0.070@1.000
```
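
Both schedules use the same `value@fraction` string format as the `TRAIN_SEQ_SCHEDULE` and `MIDRUN_CAP_SCHEDULE` keys in the env file. The trainer's exact semantics are not reproduced here; the sketch below assumes each entry means "use this value until the given fraction of the run" (the env sets `TRAIN_SEQ_SCHEDULE_MODE=wallclock`), and the LR cap may well be interpolated between points rather than stepped.

```python
# Minimal sketch of the assumed "value@fraction" schedule semantics; the real
# trainer may interpolate (e.g. for the LR cap) rather than step.
def parse_schedule(spec: str) -> list[tuple[float, float]]:
    """Parse 'v@f,v@f,...' into (value, fraction) pairs sorted by fraction."""
    pairs = []
    for item in spec.split(","):
        value, frac = item.split("@")
        pairs.append((float(value), float(frac)))
    return sorted(pairs, key=lambda p: p[1])


def value_at(pairs: list[tuple[float, float]], progress: float) -> float:
    """Step-wise lookup: first entry whose fraction covers the current progress."""
    for value, frac in pairs:
        if progress <= frac:
            return value
    return pairs[-1][0]


ctx = parse_schedule("1024@0.200,2048@0.750,4096@0.850,8192@1.000")
print(value_at(ctx, 0.5))   # -> 2048.0 under the assumed step-wise reading
```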

## Dataset

The run used a pretrain mixture described in
`castor_pretrain_mix_v0.yaml`:

- FineWeb English
- FineWeb2 French
- FineWeb-Edu English
- optional CommitPack code shards

The pretokenized output path in the original run was:

```text
./data/datasets/castor_pretrain_sp8192_v0
```

The tokenizer path was:

```text
./data/tokenizers/fineweb_8192_bpe.model
```

## Reproduction Command

From a workspace that contains the raw data and tokenizer:

```bash
CASTOR_TRAIN_ENV=./configs/train/l7grow_v4_castor_12h.env \
./scripts/train_l7grow_v4_castor_12h.sh
```

The wrapper prepares the pretokenized shards if needed, then launches:

```bash
SIMON_ENV_FILE=./configs/train/l7grow_v4_castor_12h.env \
./.venv/bin/python -u trainers/l7_grow/train_gpt.py
```
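
The included `env_utils.py` is the env-file loader the trainer uses. Exactly how `SIMON_ENV_FILE` is consumed is not shown in this archive, but a plausible (hypothetical) call site would pass its value, falling back to `.env`, to `load_env_file`:

```python
# Hypothetical call site (not copied from train_gpt_human.py): load the env
# file named by SIMON_ENV_FILE before reading any configuration values.
import os
from pathlib import Path

from env_utils import load_env_file

script_dir = Path(__file__).resolve().parent
load_env_file(script_dir, os.environ.get("SIMON_ENV_FILE", ".env"))
```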


## Included Files

- `train_seed1337.log`: exact historical trainer log
- `l7grow_v4_castor_12h.env`: exact run environment/config
- `castor_pretrain_mix_v0.yaml`: dataset mixture config
- `train_l7grow_v4_castor_12h.sh`: wrapper entrypoint
- `train_l7grow_v4_castor.sh`: underlying Castor launch script
- `train_gpt.py`: thin wrapper that just calls `main` from `train_gpt_human.py`
- `train_gpt_human.py`: the trainer implementation
- `env_utils.py`: env-file loader used by the trainer
- `ARTIFACTS.md`: local paths and hashes for retained uncommitted weights
- `submission.json`: metadata for this non-record archive
@@ -0,0 +1,34 @@
version: 1
name: castor_pretrain_mix_v0
description: |
  Conversion of the Castor JSONL sources to the .bin format expected by l7.grow.

tokenizer_path: data/tokenizers/fineweb_8192_bpe.model
vocab_size: 8192
output_dir: data/datasets/castor_pretrain_sp8192_v0
shard_size_tokens: 100000000
val_ratio: 0.005
append_eos: false
batch_size: 1024
seed: 1337

sources:
- name: fineweb_en_v0
glob: data/pretrain/raw/fineweb_en_v0/shards/*.jsonl
required: true
kind: text

- name: fineweb2_fr_v0
glob: data/pretrain/raw/fineweb2_fr_v0/shards/*.jsonl
required: true
kind: text

- name: fineweb_edu_en_v0
glob: data/pretrain/raw/fineweb_edu_en_v0/shards/*.jsonl
required: true
kind: text

- name: commitpack_code_v0
glob: data/pretrain/raw/commitpack_code_v0/**/*.jsonl
required: false
kind: code
@@ -0,0 +1,75 @@
from __future__ import annotations

import os
from pathlib import Path

PATH_LIKE_KEYS = frozenset(
{
"DATA_DIR",
"TOKENIZER_PATH",
"SAMPLE_CHECKPOINT",
"DATASETS_DIR",
"TRAIN_FILES",
"VAL_FILES",
"RUN_DIR",
"CHECKPOINT_DIR",
"RESUME_CHECKPOINT",
"INIT_MODEL_PATH",
"MODEL_PATH",
"QUANTIZED_MODEL_PATH",
"LOGFILE",
}
)


def resolve_path_value(script_dir: Path, raw_value: str) -> str:
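    """Return raw_value as an absolute path string: absolute inputs pass through;
    relative ones resolve against script_dir, then its parent, falling back to
    the script_dir candidate when neither exists."""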
path = Path(raw_value.strip())
if path.is_absolute():
return str(path)

candidates = [
(script_dir / path).resolve(),
(script_dir.parent / path).resolve(),
]
for candidate in candidates:
if candidate.exists():
return str(candidate)
return str(candidates[0])


def load_env_file(script_dir: Path, filename: str = ".env") -> None:
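    """Load KEY=VALUE pairs from an env file into os.environ without overriding
    variables that are already set; values of path-like keys are resolved to
    absolute paths via resolve_path_value."""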
env_path = Path(filename)
if not env_path.is_absolute():
candidates = [
(script_dir / env_path).resolve(),
(script_dir.parent / env_path).resolve(),
(Path.cwd() / env_path).resolve(),
]
for candidate in candidates:
if candidate.is_file():
env_path = candidate
break
else:
env_path = candidates[0]
if not env_path.is_file():
return

for raw_line in env_path.read_text(encoding="utf-8").splitlines():
line = raw_line.strip()
if not line or line.startswith("#"):
continue
if line.startswith("export "):
line = line[7:].lstrip()
if "=" not in line:
continue

key, value = line.split("=", 1)
key = key.strip()
value = value.strip()
if not key or key in os.environ:
continue
if len(value) >= 2 and value[0] == value[-1] and value[0] in {"'", '"'}:
value = value[1:-1]
if key in PATH_LIKE_KEYS and value:
value = resolve_path_value(script_dir, value)
os.environ[key] = value
@@ -0,0 +1,52 @@
# Castor v2 12-hour pretrain phase starting from Simon's l7.grow v4 snapshot
# on the Castor EN/FR + code pretrain mix.

RUN_ID=castor_l7grow_v4_12h_seed1337
SEED=1337

DATA_DIR=./data
DATASETS_DIR=./data/datasets/castor_pretrain_sp8192_v0
TOKENIZER_PATH=./data/tokenizers/fineweb_8192_bpe.model

RUN_DIR=./runs/castor_l7grow_v4_12h_seed1337
CHECKPOINT_DIR=./runs/castor_l7grow_v4_12h_seed1337/checkpoints
RESUME_CHECKPOINT=./runs/castor_l7grow_v4_12h_seed1337/checkpoints/latest.pt
INIT_MODEL_PATH=./checkpoints/bootstrap/l7grow_v4_seed1337_init.pt
MODEL_PATH=./runs/castor_l7grow_v4_12h_seed1337/final_model.pt
QUANTIZED_MODEL_PATH=./runs/castor_l7grow_v4_12h_seed1337/final_model.int6.ptz
LOGFILE=./logs/castor_l7grow_v4_12h_seed1337.txt

VOCAB_SIZE=8192
MAX_WALLCLOCK_SECONDS=43200
ITERATIONS=100000
SAVE_CHECKPOINT_EVERY=1000
KEEP_STEP_CHECKPOINTS=0

TRAIN_BATCH_TOKENS=262144
VAL_BATCH_TOKENS=131072
VAL_LOSS_EVERY=8000
TRAIN_LOG_EVERY=500

TRAIN_SEQ_LEN=8192
ROPE_TRAIN_SEQ_LEN=8192
TRAIN_SEQ_SCHEDULE=1024@0.200,2048@0.750,4096@0.850,8192@1.000
TRAIN_SEQ_SCHEDULE_MODE=wallclock
SEQ_CHANGE_WARMUP_STEPS=32

MIDRUN_CAP_SCHEDULE=1.000@0.000,1.000@0.400,0.500@0.400,0.300@0.500,0.180@0.600,0.110@0.700,0.090@0.800,0.070@1.000
WARMDOWN_ITERS=12800

EMA_ENABLED=0
COMPILE_MODEL=1
COMPILE_DYNAMIC=1
DYNAMO_CACHE_SIZE_LIMIT=64

EVAL_SEQ_LEN=8192
EVAL_STRIDE=4096
SLIDING_WINDOW_ENABLED=1
TTT_ENABLED=1
TTT_EPOCHS=8
TTT_CHUNK_TOKENS=32768

SKIP_FINAL_PACKAGING=1
SAVE_PRE_QUANT_SNAPSHOT=1
@@ -0,0 +1,18 @@
{
"author": "Simon Bissonnette",
"github_id": "simonbissonnette",
"name": "L7 growth v4 precursor to PR 2014, 12 hours on RTX 4090, 0.9697 val_bpb",
"track": "non-record-unlimited-compute-archival",
"date": "2026-04-15",
"val_loss": 1.83671792,
"val_bpb": 0.96976490,
"model_params": 35944536,
"step_stop": 38707,
"iterations": 100000,
"wallclock_seconds": 43189.612,
"max_wallclock_seconds": 43200,
"artifact_bytes_full_precision": 135431355,
"bytes_total": null,
"packaging_skipped": true,
"blurb": "Archival non-record experiment from a personal Castor run: l7 growth v4 recipe, 35.9M params, SP8192 tokenizer, progressive context growth 1k->2k->4k->8k, a midrun LR cap, no EMA, and legal TTT. Final logged val_bpb is 0.96976490 pre-quant/check-only; final packaging was skipped, so this is shared as an interesting long-compute reference rather than a main-track 16MB record package."
}
@@ -0,0 +1,5 @@
from train_gpt_human import main


if __name__ == "__main__":
main()