pyronear · Chouffe · Jun 16, 2026 · Jun 16, 2026 · Jun 16, 2026 · Jun 16, 2026
diff --git a/Makefile b/Makefile
@@ -1,4 +1,4 @@
-PACKAGES := core train eval api benchmark monitor
+PACKAGES := core train eval api benchmark monitor triage
 
 # Released model.zip version fetched from HuggingFace by `fetch-model`.
 # Pinned in api/MODEL_VERSION — the repo version and the model version are

diff --git a/README.md b/README.md
@@ -13,7 +13,7 @@ through the full pipeline, with figures generated from real sequences.
 
 ## Packages
 
-Six independent packages, each with its own `pyproject.toml` and `tests/`.
+Seven independent packages, each with its own `pyproject.toml` and `tests/`.
 
 | Path | Import | Purpose |
 |------|--------|---------|
@@ -23,6 +23,7 @@ Six independent packages, each with its own `pyproject.toml` and `tests/`.
 | `api/`   | `temporal_model.api`   | FastAPI serving layer, shipped as a Docker service. Depends on `core`. |
 | `benchmark/` | `temporal_model.benchmark` | Latency/throughput/resource benchmark with a per-stage `predict()` breakdown, runnable across VMs. Depends on `core`. |
 | `monitor/` | `temporal_model.monitor` | Production decision replay: import scored sequences from alert-api, re-run them through the pinned api+model release, view tubes in the eval viewer. |
+| `triage/` | `temporal_model.triage` | Annotation-backlog triage: pull the pyro-annotator unannotated queue (read-only), score it, split into an unlabel worklist + a local-review viewer set. Depends on `core`. |
 
 `train`/`eval`/`api`/`benchmark` depend on `core` via a `uv` path source
 (`temporal-model-core = { path = "../core", editable = true }`). `core` and
@@ -32,9 +33,9 @@ Six independent packages, each with its own `pyproject.toml` and `tests/`.
 
 ```bash
 make                # list all available targets (same as `make help`)
-make install        # uv sync across all six packages
-make test           # pytest across all six packages
-make lint           # ruff check across all six packages + docs/assets/scripts
+make install        # uv sync across all seven packages
+make test           # pytest across all seven packages
+make lint           # ruff check across all seven packages + docs/assets/scripts
 ```
 
 Per package, `cd <pkg> && make install|lint|format|test`.
@@ -67,6 +68,22 @@ the pyro-annotator sequence store and writes a self-describing report under
 VMs for comparison — see `benchmark/README.md` and the `scripts/` provision /
 push / pull helpers.
 
+### Triage the annotation backlog
+
+```bash
+cd triage && make install                          # uv sync (brings dvc[s3])
+AWS_PROFILE=pyronear uv run dvc pull                # fetch data/02_shards from S3
+uv run temporal-triage unpack                      # tars → loose store + report
+( cd ../viewer && npm install )                     # first time only
+make viewer                                         # browse at localhost:3000
+```
+
+`triage/` pulls the pyro-annotator's unannotated backlog (read-only), scores it
+with the model, and splits it into **To Review** (worth a human's eyes) and
+**Unlabel** (auto false-positive) buckets, browsable in the viewer. The commands
+above are the *consumer* flow (no annotator credentials, no model, no Docker);
+producing new data needs credentials — see `triage/README.md`.
+
 ## Origin
 
 Ported from the Pyronear [`vision-rd`](https://github.com/pyronear/vision-rd)

diff --git a/docs/specs/2026-06-16-triage-design.md b/docs/specs/2026-06-16-triage-design.md
@@ -0,0 +1,157 @@
+# triage/: Backlog Triage of the Annotation Queue
+
+**Date:** 2026-06-16
+**Status:** Draft
+
+## Motivation
+
+The pyro-annotator (https://annotator.pyronear.org) accumulates a large backlog
+of **unannotated** sequences — alerts imported from the source APIs that no
+human has labelled yet. Most are false positives, so a person grinding through
+the whole queue spends most of their time discarding obvious non-smoke.
+
+`triage/` shrinks that queue. It pulls the unannotated backlog (read-only),
+scores every sequence with the temporal smoke classifier in-process, and splits
+the backlog at a configurable probability threshold:
+
+- **low (`< threshold`)** — almost certainly not smoke. Emitted as a read-only
+  **worklist** the team can later apply to the annotator as the `unlabeled`
+  false-positive type (the annotator's own concept for "false positive
+  discarded by auto-annotation without category").
+- **high (`>= threshold`)** — worth a human's eyes. Written into the eval-viewer
+  contract so they are reviewed locally in `viewer/`.
+
+The default threshold is **0.35**.
+
+## Scope & constraints
+
+- **API host.** The REST API is `annotationapi.pyronear.org` (the live FastAPI
+  service); `annotator.pyronear.org` is the frontend SPA, not the API.
+- **Read-only against the pyro-annotator API.** `triage` only issues `GET`s
+  (plus the one `POST /api/v1/auth/login` to obtain a bearer token). It never
+  writes annotations to production. The low-score "assign to unlabeled" action is
+  emitted as an artifact (sequence ids + a ready-to-POST `bulk` payload) and
+  applied by a human in a separate, deliberate step — out of scope for this
+  package.
+- **In-process scoring via `core`, no Docker.** We score for triage, not to
+  audit production parity, so we call `BboxTubeTemporalModel.predict()`
+  directly (like `benchmark`) rather than replaying through the pinned API
+  image (like `monitor`).
+- **Test on small subsets first.** `pull` supports `--limit`, and the intended
+  workflow ramps `--limit 3` → `--limit ~50` → full pull.
+
+## Decisions (agreed in brainstorming)
+
+1. **New sibling package `triage/`** (`temporal_model.triage`, script
+   `temporal-triage`, distribution `temporal-model-triage`), depending on
+   `core` via the `uv` path source — same shape as `benchmark`/`monitor`.
+2. **Pull scope = `processing_stage=ready_to_annotate`.** The deployed
+   annotator pre-creates a `SequenceAnnotation` record at `ready_to_annotate`
+   for the whole human queue, so `has_annotation=false` is near-empty (2 live)
+   while `ready_to_annotate` is the real backlog (21,489 live, matching the
+   UI's "Ready to Annotate" count). The `GET /sequences/` listing is not
+   implicitly org-scoped, so an account token pages the global backlog across
+   all organizations. The stage is a `--stage` flag (default
+   `ready_to_annotate`) so other stages can be pulled later.
+3. **Read-only; low scorers become a worklist, not a write.** `triage` emits
+   `unlabeled.json` (ids + bulk-unlabel payload). Applying it is a separate
+   human step.
+4. **In-process `core` scoring.** Sequence score = **max `probability` over
+   kept tubes** from `predict()`'s `details`. Sequences with no kept tubes (or
+   an uncalibrated model) score `0.0` and fall in the low bucket.
+5. **Threshold is a DVC param.** `triage/params.yaml` holds
+   `triage.threshold: 0.35`, listed under the `score` stage's `params:` so a
+   change reruns the stage. A `--threshold` CLI flag overrides it for ad-hoc
+   runs.
+6. **Reuse the Next.js eval viewer, no new UI.** The high bucket is written in
+   the exact reporting contract `viewer/` already reads
+   (`results.json`, `details/<key>.json`, `sequences/<key>.json`,
+   `model_config.json` under `data/08_reporting/<source>/vit_dinov2_finetune/`),
+   browsed with `DATA_ROOT=../triage`.
+
+## Architecture
+
+Two CLI commands.
+
+### `temporal-triage pull` (read-only)
+
+1. `POST /api/v1/auth/login` → bearer token (creds via `.envrc`, like
+   `monitor`).
+2. Page `GET /api/v1/sequences/?processing_stage=ready_to_annotate` (newest
+   first), honouring `--stage` / `--limit` / `--page-size`.
+3. For each sequence not already on disk: `GET /api/v1/detections/?sequence_id=…`
+   for its frames, `GET /api/v1/detections/{id}/url` for each signed image URL,
+   download the image, and write the store entry.
+4. Incremental — sequences already present are skipped.
+
+Store layout (mirrors `monitor`/`benchmark`):
+
+```
+data/01_raw/sequences/<org>/<camera>/seq_<id>/
+    meta.json        # SequenceMeta: key, ids, org/camera, ordered frames
+    images/<frame>.jpg
+```
+
+`meta.json` carries the viewer join key (`pyro-annotator_<sequence_id>`), the
+ordered frame list (`file`, `detection_id`, `recorded_at`, `bucket_key`), and
+provenance. There is **no ground-truth label** (these are unannotated) — the
+store's `label` is `"unknown"`.
+
+Exposed as `make pull` → runs `pull`, then `dvc add data/01_raw/sequences` +
+`dvc push`, so the pulled store is shareable.
+
+### `temporal-triage score` (DVC stage)
+
+1. Load each stored sequence into `core` `Frame`s (reuse `benchmark`'s
+   `dataset.py` loader shape).
+2. `model = BboxTubeTemporalModel.from_package(model.zip, device=auto)`;
+   `output = model.predict(frames)`.
+3. Sequence score = max kept-tube `probability` (`0.0` if none).
+4. Split at `threshold` (param, default 0.35) into low / high.
+5. Write outputs (below).
+
+`model.zip` is read from `api/models/` (populated by `make fetch-model`, same
+as `benchmark`).
+
+### Module layout (one purpose each)
+
+| Module | Purpose |
+|--------|---------|
+| `annotator_api.py` | Read-only client: `login`, `iter_unannotated_sequences`, `list_detections`, `detection_image_url`. |
+| `pull.py` | Orchestrate the pull; build `SequenceMeta`, download frames into the store. |
+| `store.py` | `meta.json` schema (`SequenceMeta`/`FrameMeta`) + read/write/iter. |
+| `score.py` | Load store → `predict` → sequence score; classify low/high. |
+| `report.py` | Write eval-viewer contract for the high bucket + `unlabeled.json` / `review.json` worklists. |
+| `cli.py` | `pull` / `score` subcommands. |
+
+## Outputs
+
+```
+data/08_reporting/pyro-annotator/vit_dinov2_finetune/
+    results.json            # eval-viewer rows (HIGH bucket): key, score, decision
+    details/<key>.json      # tubes/boxes/crops for the viewer (HIGH bucket)
+    sequences/<key>.json    # frame refs for the viewer
+    model_config.json       # model provenance
+    unlabeled.json          # LOW (<threshold): sequence_ids, scores, ready-to-POST bulk-unlabel payload
+    review.json             # HIGH (>=threshold): sequence_ids, scores
+    dropped.json            # skip reasons (no_images, predict_failed, ...)
+```
+
+`unlabeled.json` records, per sequence: `sequence_id`, `key`, `score`, plus a
+top-level `bulk_payload` block shaped for `POST /api/v1/annotations/sequences/bulk`
+with `false_positive_type: "unlabeled"` — copy-pasteable for the separate apply
+step, never sent by this package.
+
+## Testing
+
+Offline, like `monitor`: mocked HTTP for the client, a tiny on-disk fake store
+for `score`/`report`, no network and no Docker. `score`'s model call is
+exercised against a minimal packaged model or stubbed `predict` in unit tests;
+a real end-to-end run is the `--limit 3` manual smoke test.
+
+## Out of scope
+
+- Writing annotations back to the annotator (the apply step).
+- Production-parity scoring (ROI envelope / windowing) — that is `monitor`.
+- Per-organization or per-camera filtering beyond what the listing offers
+  (can be added as `pull` flags later if needed).
diff --git a/docs/specs/2026-06-16-triage-sharding-design.md b/docs/specs/2026-06-16-triage-sharding-design.md
@@ -0,0 +1,148 @@
+# triage/: Tar-Sharded Frame Storage
+
+**Date:** 2026-06-16
+**Status:** Implemented (`shards.py`, `pack`/`unpack` CLI)
+
+## Implementation notes (deltas from the original draft)
+
+- **The report is sharded too.** At full scale `details/` + `sequences/` are
+  ~43k tiny files — not "small". `pack` bundles per-sequence `details.json` +
+  `sequence.json` into `report/` shards; only the ~6 aggregate files
+  (`results.json`/`.parquet`, worklists, `model_config`, `dropped`) stay loose.
+- **Frames and predictions are separate shard sets** (`frames/` append-only and
+  model-independent; `report/` rebuilt per run) — a re-score never re-packs the
+  26 GB of frames.
+- **Every prediction is tagged with `model_version`** (the release version from
+  the model.zip manifest, e.g. `0.2.0`), so a future re-score's predictions stay
+  distinguishable. Multi-version report namespacing is deferred until a second
+  model run actually happens.
+- **No `dvc.yaml` pipeline.** The 247k-file store is impractical as a stage dep
+  and the workflow is staged/manual, so the single DVC-tracked artifact is
+  `data/02_shards`, added with `dvc add`.
+
+## Motivation
+
+`triage` stores pulled frames as loose files under
+`data/01_raw/sequences/<org>/<camera>/seq_<id>/images/*.jpg` and DVC-tracks the
+directory. At the 500-sequence scale this is ~6,934 S3 objects — fine. At the
+full `ready_to_annotate` backlog (~21,489 sequences, ~301k frames) it becomes
+**~300k loose S3 objects / ~30 GB**: DVC content-addresses one object per file,
+so push/pull is dominated by per-object overhead, the single `.dir` index grows
+to ~300k entries (slow `status`/`checkout`), and per-request S3 cost climbs.
+
+The fix is to **pack frames into tar shards** so DVC tracks tens of objects
+instead of hundreds of thousands. The scored outputs (`results.parquet` +
+small JSON) are tiny and stay loose — only frames are sharded.
+
+## Decisions
+
+1. **Append-only sealed shards.** Each `pack` run writes the sequences not yet
+   in any shard into one or more **new, immutable** tar files
+   (`shard_0001.tar`, `shard_0002.tar`, …); existing shards are never rewritten.
+   A broad incremental pull therefore re-pushes only its new shard(s), not the
+   whole set. (Rejected: `sequence_id % N` bucketing — even and deterministic,
+   but adding any sequence rewrites a ~1 GB bucket tar, so a wide incremental
+   pull re-pushes ~everything. Rejected: keep loose — the ~300k-object problem.)
+2. **Only frames are sharded.** Each shard holds, per sequence, both the frames
+   and the `meta.json` (so a shard is self-contained). The scored report
+   (`results.parquet`, `results.json`, `details/`, `sequences/`,
+   `model_config.json`) stays loose under `data/08_reporting/` — it is small and
+   the viewer reads per-key files.
+3. **The loose store is a local, regenerable cache.** `data/01_raw/sequences/`
+   is gitignored and **not** DVC-tracked. It is produced either by `pull`
+   (producer) or by `unpack` (consumer). The DVC source of truth for frames is
+   `data/02_shards/`.
+4. **A manifest drives incrementality.** `data/02_shards/manifest.json` maps
+   `sequence_id → shard_name` (plus `next_index`). `pack` packs only sequences
+   absent from the manifest; `unpack`/tooling use it to know what exists.
+5. **Migration is a one-time, deliberate cleanup.** Cutting over from the loose
+   layout orphans the already-pushed loose frame objects. We wipe the
+   `triage/` remote prefix and re-push under the shard layout rather than rely
+   on `dvc gc` (see Migration).
+
+## Layout
+
+```
+data/
+├── 02_shards/                      # DVC-tracked (cache:true) → pushed. Source of truth for frames.
+│   ├── shard_0001.tar              # immutable; each holds seq_<id>/{meta.json, images/*.jpg}
+│   ├── shard_0002.tar
+│   └── manifest.json               # {next_index, sequences: {<id>: "shard_0001.tar", ...}}
+├── 01_raw/sequences/               # LOCAL ONLY (gitignored, not DVC-tracked).
+│   └── <org>/<camera>/seq_<id>/…   #   produced by pull OR unpack; consumed by score + viewer.
+└── 08_reporting/…                  # scored outputs (parquet + JSON), loose, unchanged.
+```
+
+Shard tar member paths mirror the store's per-sequence layout so `unpack`
+restores it byte-for-byte:
+`<org_slug>/<camera_slug>/seq_<id>/meta.json` and `.../images/detection_*.jpg`.
+Tars are **uncompressed** (JPEGs are already compressed). Each shard is sealed
+at a target size (`SHARD_TARGET_BYTES`, default ~1 GB ≈ ~700 sequences); a
+`pack` run that exceeds it rolls to the next index.
+
+## Commands (two new)
+
+- **`temporal-triage pack`** — read `data/01_raw/sequences`, select sequences
+  absent from `manifest.json`, append them into new sealed shard(s), update the
+  manifest. Idempotent: re-running with nothing new is a no-op. Does not delete
+  the loose copies (they stay as the local working set).
+- **`temporal-triage unpack`** — extract every shard in `data/02_shards` into
+  `data/01_raw/sequences` (skips sequences already materialized). Reconstitutes
+  the loose store for `score` and the viewer from `dvc pull`-ed shards alone.
+
+`score`, `pull`, `report`, and the viewer are **unchanged** — they keep reading
+the loose store by path.
+
+## Workflows
+
+**Producer (annotator creds):**
+```
+make pull ARGS="--limit N"     # loose store (incremental, parallel downloads)
+temporal-triage pack           # new sequences → new sealed shard(s) + manifest
+dvc add data/02_shards         # track shards
+dvc push                       # upload only the new shard(s)
+```
+
+**Consumer (no creds, no Docker):**
+```
+dvc pull                       # fetch shards (tens of objects)
+temporal-triage unpack         # shards → loose store
+dvc repro                      # score → report   (or: temporal-triage score)
+cd ../viewer && DATA_ROOT=../triage npm run dev
+```
+
+## DVC pipeline changes
+
+- Untrack the loose store: remove `data/01_raw/sequences.dvc`.
+- Track shards: `dvc add data/02_shards` (cache:true → pushed).
+- `score` stage keeps `data/01_raw/sequences` as a **path dependency** (DVC
+  hashes it); the loose store is materialized by `pull` or `unpack` before
+  `score` runs. (Optionally an `unpack` stage with a `cache:false` output, so a
+  pure `dvc repro` on a fresh checkout unpacks then scores — to be decided in
+  implementation.)
+
+## Object-count impact (full backlog, ~21,489 seq / ~301k frames)
+
+| Layout | Frame S3 objects | Incremental pull of +500 seq | `.dir` size |
+|--------|------------------|------------------------------|-------------|
+| Loose (today) | ~300k | cheap (only new frames push) | ~300k entries |
+| Append-only shards | **~30–40 tars** | **1 new ~1 GB tar** | tiny |
+
+## Migration (one-time)
+
+1. `temporal-triage pack` the current 500-sequence loose store → `shard_0001.tar` + manifest.
+2. `aws s3 rm --recursive s3://pyro-vision-rd/dvc/temporal-model/triage/` — wipe the prefix (orphaned loose frames + report; both are regenerable).
+3. Remove `data/01_raw/sequences.dvc`; `dvc add data/02_shards`; `dvc commit score` (report); `dvc push`.
+4. Amend/replace the loose-layout pointer commit (`4ee716d`) so **no retained
+   commit references the orphaned loose frame objects** — this avoids `dvc gc`
+   reachability footguns entirely on the shared remote.
+
+## Out of scope
+
+- Compression of tars (JPEGs are already compressed).
+- Deleting loose frames after `pack` (kept as the local working set; gitignored).
+- Streaming frames directly from tars into the model (would require changing
+  `core`'s file-path `Frame` contract; `unpack` to loose files is non-invasive).
+- Not storing frames at all (reference-only + on-demand refetch) — leaner, but
+  couples reuse to live annotator access; revisit separately if shard storage
+  proves heavy.
diff --git a/triage/.dvc/.gitignore b/triage/.dvc/.gitignore
@@ -0,0 +1 @@
+/config.local
diff --git a/triage/.dvc/config b/triage/.dvc/config
@@ -0,0 +1,5 @@
+[core]
+    remote = s3remote
+    analytics = false
+['remote "s3remote"']
+    url = s3://pyro-vision-rd/dvc/temporal-model/triage/
diff --git a/triage/.dvcignore b/triage/.dvcignore
@@ -0,0 +1,3 @@
+# Add patterns of files dvc should ignore, which could improve
+# the performance. Learn more at
+# https://dvc.org/doc/user-guide/dvcignore
diff --git a/triage/.envrc.example b/triage/.envrc.example
@@ -0,0 +1,8 @@
+# Copy to triage/.envrc (untracked) and fill in; direnv loads it automatically.
+# Read-only credentials for the pyro-annotator API.
+# NB: the API is served at annotationapi.pyronear.org — annotator.pyronear.org
+# is the frontend SPA, not the API. triage only ever issues GETs (plus the
+# login POST that mints the token).
+export ANNOTATOR_API_URL=https://annotationapi.pyronear.org
+export ANNOTATOR_API_LOGIN=changeme
+export ANNOTATOR_API_PASSWORD=changeme
diff --git a/triage/.gitignore b/triage/.gitignore
@@ -0,0 +1,4 @@
+.venv/
+__pycache__/
+.pytest_cache/
+.ruff_cache/