Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
7cfd1e2
docs(triage): design spec for annotation-backlog triage project
Chouffe Jun 16, 2026
a1041a8
docs(triage): implementation plan
Chouffe Jun 16, 2026
56cbf4c
feat(triage): scaffold package
Chouffe Jun 16, 2026
120639f
feat(triage): sequence store (meta.json + frame loading)
Chouffe Jun 16, 2026
b810e70
feat(triage): read-only annotator client (GET-only + login)
Chouffe Jun 16, 2026
f96a0f3
feat(triage): incremental pull of unannotated sequences
Chouffe Jun 16, 2026
78df393
feat(triage): score sequences and bucket by threshold
Chouffe Jun 16, 2026
819bdba
feat(triage): viewer contract + unlabel/review worklists
Chouffe Jun 16, 2026
435ceb0
feat(triage): pull/score CLI (+ model_config reader)
Chouffe Jun 16, 2026
b685c61
feat(triage): DVC pipeline, params, README, repo wiring
Chouffe Jun 16, 2026
94625ba
fix(triage): pull ready_to_annotate stage from annotationapi.pyronear…
Chouffe Jun 16, 2026
71c57d3
feat(triage): parallel frame downloads + progress logging
Chouffe Jun 16, 2026
4ee716d
feat(triage): write results.parquet; snapshot 500-seq scored set to DVC
Chouffe Jun 16, 2026
3b8cdf2
docs(triage): tar-sharding design (append-only sealed shards)
Chouffe Jun 16, 2026
77a0a13
feat(triage): across-sequence pull concurrency (--seq-workers)
Chouffe Jun 16, 2026
df44861
feat(viewer): triage mode (Review/Unlabel cards, per-org, score column)
Chouffe Jun 16, 2026
26a5ed1
feat(viewer): rename Review->To Review; hide correctness in triage de…
Chouffe Jun 16, 2026
c591b6e
feat(viewer): relabel detail-pane verdict as bucket (to review/unlabe…
Chouffe Jun 16, 2026
b27508f
docs(triage): rewrite README (dvc pull->visualize flow, host, concurr…
Chouffe Jun 16, 2026
20a064d
docs: surface triage in root README (quick-start subsection, fix pack…
Chouffe Jun 16, 2026
d167e97
feat(triage): score progress heartbeat (every 250 sequences)
Chouffe Jun 16, 2026
2cdb1d9
feat(viewer): triage threshold slider (live To Review/Unlabel re-split)
Chouffe Jun 16, 2026
c85c90a
feat(viewer): smooth triage slider (useDeferredValue) + threshold-swe…
Chouffe Jun 16, 2026
87ae878
perf(viewer): make threshold drag smooth on large stores
Chouffe Jun 16, 2026
cc9b806
feat(viewer): click a threshold-sweep row to set the triage threshold
Chouffe Jun 16, 2026
bfdfa2c
feat(triage): tar-shard pack/unpack + model_version (0.2.0) on predic…
Chouffe Jun 16, 2026
54f19b4
feat(triage): track packed shards via dvc add; drop dvc.yaml pipeline
Chouffe Jun 16, 2026
82a67a6
docs(triage): mark sharding spec implemented (+ deltas: report shards…
Chouffe Jun 16, 2026
5c2737c
docs(triage): consumer prerequisites (branch, AWS profile, ~55GB disk…
Chouffe Jun 16, 2026
486da1f
docs(triage): remove implementation plan (superseded by shipped code …
Chouffe Jun 16, 2026
1bbfc2b
fix(triage): review fixes — resilient pull, robust pagination, triage…
Chouffe Jun 16, 2026
1fa070c
style(viewer): prettier format TriageCards (CI format:check)
Chouffe Jun 16, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
PACKAGES := core train eval api benchmark monitor
PACKAGES := core train eval api benchmark monitor triage

# Released model.zip version fetched from HuggingFace by `fetch-model`.
# Pinned in api/MODEL_VERSION — the repo version and the model version are
Expand Down
25 changes: 21 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ through the full pipeline, with figures generated from real sequences.

## Packages

Six independent packages, each with its own `pyproject.toml` and `tests/`.
Seven independent packages, each with its own `pyproject.toml` and `tests/`.

| Path | Import | Purpose |
|------|--------|---------|
Expand All @@ -23,6 +23,7 @@ Six independent packages, each with its own `pyproject.toml` and `tests/`.
| `api/` | `temporal_model.api` | FastAPI serving layer, shipped as a Docker service. Depends on `core`. |
| `benchmark/` | `temporal_model.benchmark` | Latency/throughput/resource benchmark with a per-stage `predict()` breakdown, runnable across VMs. Depends on `core`. |
| `monitor/` | `temporal_model.monitor` | Production decision replay: import scored sequences from alert-api, re-run them through the pinned api+model release, view tubes in the eval viewer. |
| `triage/` | `temporal_model.triage` | Annotation-backlog triage: pull the pyro-annotator unannotated queue (read-only), score it, split into an unlabel worklist + a local-review viewer set. Depends on `core`. |

`train`/`eval`/`api`/`benchmark` depend on `core` via a `uv` path source
(`temporal-model-core = { path = "../core", editable = true }`). `core` and
Expand All @@ -32,9 +33,9 @@ Six independent packages, each with its own `pyproject.toml` and `tests/`.

```bash
make # list all available targets (same as `make help`)
make install # uv sync across all six packages
make test # pytest across all six packages
make lint # ruff check across all six packages + docs/assets/scripts
make install # uv sync across all seven packages
make test # pytest across all seven packages
make lint # ruff check across all seven packages + docs/assets/scripts
```

Per package, `cd <pkg> && make install|lint|format|test`.
Expand Down Expand Up @@ -67,6 +68,22 @@ the pyro-annotator sequence store and writes a self-describing report under
VMs for comparison — see `benchmark/README.md` and the `scripts/` provision /
push / pull helpers.

### Triage the annotation backlog

```bash
cd triage && make install # uv sync (brings dvc[s3])
AWS_PROFILE=pyronear uv run dvc pull # fetch data/02_shards from S3
uv run temporal-triage unpack # tars → loose store + report
( cd ../viewer && npm install ) # first time only
make viewer # browse at localhost:3000
```

`triage/` pulls the pyro-annotator's unannotated backlog (read-only), scores it
with the model, and splits it into **To Review** (worth a human's eyes) and
**Unlabel** (auto false-positive) buckets, browsable in the viewer. The commands
above are the *consumer* flow (no annotator credentials, no model, no Docker);
producing new data needs credentials — see `triage/README.md`.

## Origin

Ported from the Pyronear [`vision-rd`](https://github.com/pyronear/vision-rd)
Expand Down
157 changes: 157 additions & 0 deletions docs/specs/2026-06-16-triage-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
# triage/: Backlog Triage of the Annotation Queue

**Date:** 2026-06-16
**Status:** Draft

## Motivation

The pyro-annotator (https://annotator.pyronear.org) accumulates a large backlog
of **unannotated** sequences — alerts imported from the source APIs that no
human has labelled yet. Most are false positives, so a person grinding through
the whole queue spends most of their time discarding obvious non-smoke.

`triage/` shrinks that queue. It pulls the unannotated backlog (read-only),
scores every sequence with the temporal smoke classifier in-process, and splits
the backlog at a configurable probability threshold:

- **low (`< threshold`)** — almost certainly not smoke. Emitted as a read-only
**worklist** the team can later apply to the annotator as the `unlabeled`
false-positive type (the annotator's own concept for "false positive
discarded by auto-annotation without category").
- **high (`>= threshold`)** — worth a human's eyes. Written into the eval-viewer
contract so they are reviewed locally in `viewer/`.

The default threshold is **0.35**.

## Scope & constraints

- **API host.** The REST API is `annotationapi.pyronear.org` (the live FastAPI
service); `annotator.pyronear.org` is the frontend SPA, not the API.
- **Read-only against the pyro-annotator API.** `triage` only issues `GET`s
(plus the one `POST /api/v1/auth/login` to obtain a bearer token). It never
writes annotations to production. The low-score "assign to unlabeled" action is
emitted as an artifact (sequence ids + a ready-to-POST `bulk` payload) and
applied by a human in a separate, deliberate step — out of scope for this
package.
- **In-process scoring via `core`, no Docker.** We score for triage, not to
audit production parity, so we call `BboxTubeTemporalModel.predict()`
directly (like `benchmark`) rather than replaying through the pinned API
image (like `monitor`).
- **Test on small subsets first.** `pull` supports `--limit`, and the intended
workflow ramps `--limit 3` → `--limit ~50` → full pull.

## Decisions (agreed in brainstorming)

1. **New sibling package `triage/`** (`temporal_model.triage`, script
`temporal-triage`, distribution `temporal-model-triage`), depending on
`core` via the `uv` path source — same shape as `benchmark`/`monitor`.
2. **Pull scope = `processing_stage=ready_to_annotate`.** The deployed
annotator pre-creates a `SequenceAnnotation` record at `ready_to_annotate`
for the whole human queue, so `has_annotation=false` is near-empty (2 live)
while `ready_to_annotate` is the real backlog (21,489 live, matching the
UI's "Ready to Annotate" count). The `GET /sequences/` listing is not
implicitly org-scoped, so an account token pages the global backlog across
all organizations. The stage is a `--stage` flag (default
`ready_to_annotate`) so other stages can be pulled later.
3. **Read-only; low scorers become a worklist, not a write.** `triage` emits
`unlabeled.json` (ids + bulk-unlabel payload). Applying it is a separate
human step.
4. **In-process `core` scoring.** Sequence score = **max `probability` over
kept tubes** from `predict()`'s `details`. Sequences with no kept tubes (or
an uncalibrated model) score `0.0` and fall in the low bucket.
5. **Threshold is a DVC param.** `triage/params.yaml` holds
`triage.threshold: 0.35`, listed under the `score` stage's `params:` so a
change reruns the stage. A `--threshold` CLI flag overrides it for ad-hoc
runs.
6. **Reuse the Next.js eval viewer, no new UI.** The high bucket is written in
the exact reporting contract `viewer/` already reads
(`results.json`, `details/<key>.json`, `sequences/<key>.json`,
`model_config.json` under `data/08_reporting/<source>/vit_dinov2_finetune/`),
browsed with `DATA_ROOT=../triage`.

## Architecture

Two CLI commands.

### `temporal-triage pull` (read-only)

1. `POST /api/v1/auth/login` → bearer token (creds via `.envrc`, like
`monitor`).
2. Page `GET /api/v1/sequences/?processing_stage=ready_to_annotate` (newest
first), honouring `--stage` / `--limit` / `--page-size`.
3. For each sequence not already on disk: `GET /api/v1/detections/?sequence_id=…`
for its frames, `GET /api/v1/detections/{id}/url` for each signed image URL,
download the image, and write the store entry.
4. Incremental — sequences already present are skipped.

Store layout (mirrors `monitor`/`benchmark`):

```
data/01_raw/sequences/<org>/<camera>/seq_<id>/
meta.json # SequenceMeta: key, ids, org/camera, ordered frames
images/<frame>.jpg
```

`meta.json` carries the viewer join key (`pyro-annotator_<sequence_id>`), the
ordered frame list (`file`, `detection_id`, `recorded_at`, `bucket_key`), and
provenance. There is **no ground-truth label** (these are unannotated) — the
store's `label` is `"unknown"`.

Exposed as `make pull` → runs `pull`, then `dvc add data/01_raw/sequences` +
`dvc push`, so the pulled store is shareable.

### `temporal-triage score` (DVC stage)

1. Load each stored sequence into `core` `Frame`s (reuse `benchmark`'s
`dataset.py` loader shape).
2. `model = BboxTubeTemporalModel.from_package(model.zip, device=auto)`;
`output = model.predict(frames)`.
3. Sequence score = max kept-tube `probability` (`0.0` if none).
4. Split at `threshold` (param, default 0.35) into low / high.
5. Write outputs (below).

`model.zip` is read from `api/models/` (populated by `make fetch-model`, same
as `benchmark`).

### Module layout (one purpose each)

| Module | Purpose |
|--------|---------|
| `annotator_api.py` | Read-only client: `login`, `iter_unannotated_sequences`, `list_detections`, `detection_image_url`. |
| `pull.py` | Orchestrate the pull; build `SequenceMeta`, download frames into the store. |
| `store.py` | `meta.json` schema (`SequenceMeta`/`FrameMeta`) + read/write/iter. |
| `score.py` | Load store → `predict` → sequence score; classify low/high. |
| `report.py` | Write eval-viewer contract for the high bucket + `unlabeled.json` / `review.json` worklists. |
| `cli.py` | `pull` / `score` subcommands. |

## Outputs

```
data/08_reporting/pyro-annotator/vit_dinov2_finetune/
results.json # eval-viewer rows (HIGH bucket): key, score, decision
details/<key>.json # tubes/boxes/crops for the viewer (HIGH bucket)
sequences/<key>.json # frame refs for the viewer
model_config.json # model provenance
unlabeled.json # LOW (<threshold): sequence_ids, scores, ready-to-POST bulk-unlabel payload
review.json # HIGH (>=threshold): sequence_ids, scores
dropped.json # skip reasons (no_images, predict_failed, ...)
```

`unlabeled.json` records, per sequence: `sequence_id`, `key`, `score`, plus a
top-level `bulk_payload` block shaped for `POST /api/v1/annotations/sequences/bulk`
with `false_positive_type: "unlabeled"` — copy-pasteable for the separate apply
step, never sent by this package.

## Testing

Offline, like `monitor`: mocked HTTP for the client, a tiny on-disk fake store
for `score`/`report`, no network and no Docker. `score`'s model call is
exercised against a minimal packaged model or stubbed `predict` in unit tests;
a real end-to-end run is the `--limit 3` manual smoke test.

## Out of scope

- Writing annotations back to the annotator (the apply step).
- Production-parity scoring (ROI envelope / windowing) — that is `monitor`.
- Per-organization or per-camera filtering beyond what the listing offers
(can be added as `pull` flags later if needed).
148 changes: 148 additions & 0 deletions docs/specs/2026-06-16-triage-sharding-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
# triage/: Tar-Sharded Frame Storage

**Date:** 2026-06-16
**Status:** Implemented (`shards.py`, `pack`/`unpack` CLI)

## Implementation notes (deltas from the original draft)

- **The report is sharded too.** At full scale `details/` + `sequences/` are
~43k tiny files — not "small". `pack` bundles per-sequence `details.json` +
`sequence.json` into `report/` shards; only the ~6 aggregate files
(`results.json`/`.parquet`, worklists, `model_config`, `dropped`) stay loose.
- **Frames and predictions are separate shard sets** (`frames/` append-only and
model-independent; `report/` rebuilt per run) — a re-score never re-packs the
26 GB of frames.
- **Every prediction is tagged with `model_version`** (the release version from
the model.zip manifest, e.g. `0.2.0`), so a future re-score's predictions stay
distinguishable. Multi-version report namespacing is deferred until a second
model run actually happens.
- **No `dvc.yaml` pipeline.** The 247k-file store is impractical as a stage dep
and the workflow is staged/manual, so the single DVC-tracked artifact is
`data/02_shards`, added with `dvc add`.

## Motivation

`triage` stores pulled frames as loose files under
`data/01_raw/sequences/<org>/<camera>/seq_<id>/images/*.jpg` and DVC-tracks the
directory. At the 500-sequence scale this is ~6,934 S3 objects — fine. At the
full `ready_to_annotate` backlog (~21,489 sequences, ~301k frames) it becomes
**~300k loose S3 objects / ~30 GB**: DVC content-addresses one object per file,
so push/pull is dominated by per-object overhead, the single `.dir` index grows
to ~300k entries (slow `status`/`checkout`), and per-request S3 cost climbs.

The fix is to **pack frames into tar shards** so DVC tracks tens of objects
instead of hundreds of thousands. The scored outputs (`results.parquet` +
small JSON) are tiny and stay loose — only frames are sharded.

## Decisions

1. **Append-only sealed shards.** Each `pack` run writes the sequences not yet
in any shard into one or more **new, immutable** tar files
(`shard_0001.tar`, `shard_0002.tar`, …); existing shards are never rewritten.
A broad incremental pull therefore re-pushes only its new shard(s), not the
whole set. (Rejected: `sequence_id % N` bucketing — even and deterministic,
but adding any sequence rewrites a ~1 GB bucket tar, so a wide incremental
pull re-pushes ~everything. Rejected: keep loose — the ~300k-object problem.)
2. **Only frames are sharded.** Each shard holds, per sequence, both the frames
and the `meta.json` (so a shard is self-contained). The scored report
(`results.parquet`, `results.json`, `details/`, `sequences/`,
`model_config.json`) stays loose under `data/08_reporting/` — it is small and
the viewer reads per-key files.
3. **The loose store is a local, regenerable cache.** `data/01_raw/sequences/`
is gitignored and **not** DVC-tracked. It is produced either by `pull`
(producer) or by `unpack` (consumer). The DVC source of truth for frames is
`data/02_shards/`.
4. **A manifest drives incrementality.** `data/02_shards/manifest.json` maps
`sequence_id → shard_name` (plus `next_index`). `pack` packs only sequences
absent from the manifest; `unpack`/tooling use it to know what exists.
5. **Migration is a one-time, deliberate cleanup.** Cutting over from the loose
layout orphans the already-pushed loose frame objects. We wipe the
`triage/` remote prefix and re-push under the shard layout rather than rely
on `dvc gc` (see Migration).

## Layout

```
data/
├── 02_shards/ # DVC-tracked (cache:true) → pushed. Source of truth for frames.
│ ├── shard_0001.tar # immutable; each holds seq_<id>/{meta.json, images/*.jpg}
│ ├── shard_0002.tar
│ └── manifest.json # {next_index, sequences: {<id>: "shard_0001.tar", ...}}
├── 01_raw/sequences/ # LOCAL ONLY (gitignored, not DVC-tracked).
│ └── <org>/<camera>/seq_<id>/… # produced by pull OR unpack; consumed by score + viewer.
└── 08_reporting/… # scored outputs (parquet + JSON), loose, unchanged.
```

Shard tar member paths mirror the store's per-sequence layout so `unpack`
restores it byte-for-byte:
`<org_slug>/<camera_slug>/seq_<id>/meta.json` and `.../images/detection_*.jpg`.
Tars are **uncompressed** (JPEGs are already compressed). Each shard is sealed
at a target size (`SHARD_TARGET_BYTES`, default ~1 GB ≈ ~700 sequences); a
`pack` run that exceeds it rolls to the next index.

## Commands (two new)

- **`temporal-triage pack`** — read `data/01_raw/sequences`, select sequences
absent from `manifest.json`, append them into new sealed shard(s), update the
manifest. Idempotent: re-running with nothing new is a no-op. Does not delete
the loose copies (they stay as the local working set).
- **`temporal-triage unpack`** — extract every shard in `data/02_shards` into
`data/01_raw/sequences` (skips sequences already materialized). Reconstitutes
the loose store for `score` and the viewer from `dvc pull`-ed shards alone.

`score`, `pull`, `report`, and the viewer are **unchanged** — they keep reading
the loose store by path.

## Workflows

**Producer (annotator creds):**
```
make pull ARGS="--limit N" # loose store (incremental, parallel downloads)
temporal-triage pack # new sequences → new sealed shard(s) + manifest
dvc add data/02_shards # track shards
dvc push # upload only the new shard(s)
```

**Consumer (no creds, no Docker):**
```
dvc pull # fetch shards (tens of objects)
temporal-triage unpack # shards → loose store
dvc repro # score → report (or: temporal-triage score)
cd ../viewer && DATA_ROOT=../triage npm run dev
```

## DVC pipeline changes

- Untrack the loose store: remove `data/01_raw/sequences.dvc`.
- Track shards: `dvc add data/02_shards` (cache:true → pushed).
- `score` stage keeps `data/01_raw/sequences` as a **path dependency** (DVC
hashes it); the loose store is materialized by `pull` or `unpack` before
`score` runs. (Optionally an `unpack` stage with a `cache:false` output, so a
pure `dvc repro` on a fresh checkout unpacks then scores — to be decided in
implementation.)

## Object-count impact (full backlog, ~21,489 seq / ~301k frames)

| Layout | Frame S3 objects | Incremental pull of +500 seq | `.dir` size |
|--------|------------------|------------------------------|-------------|
| Loose (today) | ~300k | cheap (only new frames push) | ~300k entries |
| Append-only shards | **~30–40 tars** | **1 new ~1 GB tar** | tiny |

## Migration (one-time)

1. `temporal-triage pack` the current 500-sequence loose store → `shard_0001.tar` + manifest.
2. `aws s3 rm --recursive s3://pyro-vision-rd/dvc/temporal-model/triage/` — wipe the prefix (orphaned loose frames + report; both are regenerable).
3. Remove `data/01_raw/sequences.dvc`; `dvc add data/02_shards`; `dvc commit score` (report); `dvc push`.
4. Amend/replace the loose-layout pointer commit (`4ee716d`) so **no retained
commit references the orphaned loose frame objects** — this avoids `dvc gc`
reachability footguns entirely on the shared remote.

## Out of scope

- Compression of tars (JPEGs are already compressed).
- Deleting loose frames after `pack` (kept as the local working set; gitignored).
- Streaming frames directly from tars into the model (would require changing
`core`'s file-path `Frame` contract; `unpack` to loose files is non-invasive).
- Not storing frames at all (reference-only + on-demand refetch) — leaner, but
couples reuse to live annotator access; revisit separately if shard storage
proves heavy.
1 change: 1 addition & 0 deletions triage/.dvc/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
/config.local
5 changes: 5 additions & 0 deletions triage/.dvc/config
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
[core]
remote = s3remote
analytics = false
['remote "s3remote"']
url = s3://pyro-vision-rd/dvc/temporal-model/triage/
3 changes: 3 additions & 0 deletions triage/.dvcignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
8 changes: 8 additions & 0 deletions triage/.envrc.example
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Copy to triage/.envrc (untracked) and fill in; direnv loads it automatically.
# Read-only credentials for the pyro-annotator API.
# NB: the API is served at annotationapi.pyronear.org — annotator.pyronear.org
# is the frontend SPA, not the API. triage only ever issues GETs (plus the
# login POST that mints the token).
export ANNOTATOR_API_URL=https://annotationapi.pyronear.org
export ANNOTATOR_API_LOGIN=changeme
export ANNOTATOR_API_PASSWORD=changeme
4 changes: 4 additions & 0 deletions triage/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
.venv/
__pycache__/
.pytest_cache/
.ruff_cache/
Loading
Loading