feat(triage): annotation-backlog triage (pull → score → review/unlabel) + viewer mode + DVC sharding #63

Merged
Chouffe merged 32 commits into main from arthur/chore-pyro-annotator-temporal-model-predictions
Jun 16, 2026

@Chouffe Chouffe commented Jun 16, 2026

What

Adds triage/ — a new sibling package that shrinks the pyro-annotator's
annotation backlog. It pulls the unannotated queue (processing_stage=ready_to_annotate)
read-only, scores every sequence with the temporal smoke classifier in-process,
and splits it at a threshold (default 0.35) into:

  • To Review (score ≥ 0.35) — browsed locally in the viewer
  • Unlabel (score < 0.35) — a read-only worklist (ids + ready-to-send bulk
    payload) to later mark as the unlabeled false-positive type
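The split above can be sketched in a few lines. This is a minimal illustration, not the code in score.py: the function names, the dict-of-scores shape, and the 0.0 fallback for a sequence with no kept tubes are all assumptions; only the 0.35 default and the max-over-kept-tubes rule come from this PR.

```python
TRIAGE_THRESHOLD = 0.35  # default stated in the PR description


def sequence_score(tube_probabilities):
    """Max probability over kept tubes; 0.0 if none were kept (assumption)."""
    return max(tube_probabilities, default=0.0)


def split_at_threshold(scores, threshold=TRIAGE_THRESHOLD):
    """Map {sequence_id: score} to (to_review_ids, unlabel_ids)."""
    to_review = sorted(sid for sid, s in scores.items() if s >= threshold)
    unlabel = sorted(sid for sid, s in scores.items() if s < threshold)
    return to_review, unlabel


# Hypothetical per-sequence tube probabilities.
tubes = {101: [0.10, 0.90], 102: [0.35], 103: [0.20, 0.05], 104: []}
scores = {sid: sequence_score(p) for sid, p in tubes.items()}
to_review, unlabel = split_at_threshold(scores)
print(to_review)  # [101, 102]
print(unlabel)    # [103, 104]
```

A score exactly at the threshold lands in To Review, matching the ≥ / < wording above.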

triage never writes to the annotator (enforced by test).

Run against prod: 21,489 sequences scored → 16,348 To Review / 5,141 Unlabel,
shared on S3 and verified pullable from a clean clone.

Package (temporal_model.triage)

  • annotator_api.py — read-only HTTP client (GET-only + the login POST; no
    patch/put/delete method exists).
  • pull.py — incremental pull with parallel frame downloads (--workers) and
    across-sequence concurrency (--seq-workers).
  • score.py — in-process core.predict(), sequence score = max kept-tube
    probability, bucket at threshold; progress heartbeat.
  • report.py — eval-viewer contract + results.parquet + worklists; tags every
    prediction with the release model_version (e.g. 0.2.0).
  • shards.py: pack/unpack. The loose store (~247k files) + report (~43k
    files) would be a DVC object explosion, so per-sequence data is bundled into
    ~36 tar objects. Frames (immutable, append-only) and predictions (per-run)
    are separate shard sets, so re-scores never re-pack the 26 GB of frames.
  • cli.py: pull / score / pack / unpack.
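One way to get the "no write method exists" property is to expose only a GET helper and a single login call, then assert on the class surface in a test. The sketch below is hypothetical: the class name, the injected `send` transport, and the paths are illustrative assumptions, not the actual annotator_api.py or the annotator's real endpoints.

```python
class ReadOnlyClient:
    """Hypothetical client: exposes GET plus a single login POST, nothing else."""

    def __init__(self, send, login_path):
        # `send(method, path, payload)` is an injected transport (assumption),
        # so the sketch runs offline; a real client would wrap an HTTP session.
        self._send = send
        self._login_path = login_path

    def login(self, credentials):
        # The only POST this client can issue.
        return self._send("POST", self._login_path, credentials)

    def get(self, path, params=None):
        return self._send("GET", path, params)


def assert_read_only(client_cls):
    """Fail if the class ever grows a generic write method."""
    for verb in ("post", "put", "patch", "delete"):
        assert not hasattr(client_cls, verb), f"unexpected write method: {verb}"


calls = []
client = ReadOnlyClient(
    lambda method, path, payload: calls.append((method, path)) or {},
    "/hypothetical/login",
)
client.login({"username": "u", "password": "p"})
client.get("/hypothetical/sequences", {"page": 1})
assert_read_only(ReadOnlyClient)
print(calls)
```

Checking the class surface rather than network traffic keeps such a test offline and fast.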

Viewer (viewer/)

Triage mode (detected via triage_bucket): To Review / Unlabel cards, a
clickable per-organization breakdown, a threshold slider + clickable sweep
table (smooth on 21k rows via useDeferredValue + React.memo), a triage
score column, and the correctness column/slider hidden (no ground truth). Shared
monitor/eval paths untouched.

Data sharing (DVC)

Single tracked artifact data/02_shards (dvc add); no dvc.yaml pipeline (the
247k-file store is impractical as a stage dep). Consumer flow: dvc pull →
unpack → view, with no annotator creds, model, GPU, or Docker.

Docs

triage/README.md (producer + verified consumer flow), docs/specs/2026-06-16-triage-design.md,
docs/specs/2026-06-16-triage-sharding-design.md; root README + Makefile wired.

Testing

27 offline Python tests (mocked HTTP, fake store, stub model; no network/Docker) +
48 viewer tests; lint clean. The full consumer flow (clone → install → dvc pull
→ unpack → viewer serving 21k with frames) was verified end-to-end from a clean clone.

Chouffe added 30 commits June 16, 2026 11:28
….org

The deployed annotator pre-creates a SequenceAnnotation record at
ready_to_annotate for the whole human queue, so has_annotation=false is
near-empty (2) while ready_to_annotate is the real backlog (21,489, matching
the UI). Parameterize the pull filter (--stage, default ready_to_annotate) and
correct the API host (annotationapi.pyronear.org; annotator.* is the SPA).

Per-sequence frames are fetched+downloaded concurrently (--workers, default 16)
via a ThreadPoolExecutor; the client mounts a larger urllib3 pool so concurrent
signed-URL fetches don't exhaust connections. pull logs a progress line every
25 sequences. Verified no API/object-store rate limiting (conc=40 all 200s).
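A rough sketch of the per-sequence fan-out described here, using only the standard library. `download_frames` and `fake_fetch` are hypothetical names, and the fetch function is a stub so the example runs offline; the real pull.py is not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor


def download_frames(urls, fetch, workers=16):
    """Fetch every URL concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for an HTTP GET of a signed URL (assumption: returns bytes).
    return f"bytes-of:{url}".encode()


urls = [f"https://example.invalid/frame_{i}.jpg" for i in range(5)]
frames = download_frames(urls, fake_fetch, workers=4)
print(len(frames))  # 5
```

If the client is a `requests.Session` (an assumption), the larger pool would typically be set by mounting an `HTTPAdapter` whose `pool_maxsize` is at least the worker count, so threads do not queue on connections.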
Adds a columnar results.parquet twin of results.json (matches eval) for
analytical reuse. Tracks the 500-sequence ready_to_annotate store + scored
report in DVC (cache:true report, like monitor) — data lives in the S3 remote
s3://pyro-vision-rd/dvc/temporal-model/triage/, only pointers in git.

Process seq_workers sequences in parallel (ThreadPoolExecutor), each still
fetching its frames with the per-sequence workers pool. Total in-flight ~=
seq_workers*workers. Cuts the full backlog pull from ~7h to ~1h. Default stays
serial (seq_workers=1) for backward compatibility.
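The seq_workers × workers bound can be checked with a small nested-pool sketch. Everything here is illustrative (the `InFlight` counter, `pull_sequence`, and `pull_all` are hypothetical names, and the fetch is a no-op stand-in); it only demonstrates that the peak number of concurrent fetches never exceeds the product of the two pool sizes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InFlight:
    """Thread-safe counter that records the peak number of concurrent fetches."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1


def pull_sequence(frame_ids, workers, tracker):
    def fetch(frame_id):
        with tracker:        # counted while this fetch runs
            return frame_id  # stand-in for the real download
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, frame_ids))


def pull_all(sequences, seq_workers=1, workers=16):
    tracker = InFlight()
    with ThreadPoolExecutor(max_workers=seq_workers) as pool:
        results = list(
            pool.map(lambda frames: pull_sequence(frames, workers, tracker), sequences)
        )
    return results, tracker.peak


sequences = [list(range(i * 10, i * 10 + 10)) for i in range(6)]
results, peak = pull_all(sequences, seq_workers=3, workers=4)
print(peak <= 3 * 4)  # True
```

With the default seq_workers=1 the outer pool degenerates to serial processing, which is the backward-compatible behaviour mentioned above.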
Detect triage trees via triage_bucket and render TriageCards (Review/Unlabel
split + bars, clickable per-organization breakdown, triage explainer). Table
gains a bucket + triage_score column and drops the correctness column (triage
rows are unlabeled); the logistic-threshold slider and outcome/GT filters are
hidden; the verdict filter relabels to Review/Unlabel. 44 viewer tests pass.

…tail pane

Triage bucket label is now 'To Review' (card, table, filter, explainer). The
detail pane's correctness Stat is hidden for triage rows (unlabeled, no ground
truth), matching the dropped table column.

…ency, parquet)

Document the consumer flow (dvc pull + viewer, no creds/model/Docker) front and
center, plus the correct API host (annotationapi), ready_to_annotate scope,
--seq-workers/--workers concurrency, results.parquet, full CLI reference, and
data layout. Also add model_config.py to the score stage deps (was missing, so a
change there wouldn't trigger dvc repro).

Re-enable the threshold slider in triage mode so sliding re-buckets To Review
(>= t) vs Unlabel (< t) live via applyThreshold; default to the triage 0.35
(top-level cfg.threshold) not the model logistic_threshold, and relabel as
'triage threshold'. TriageCards header + counts track the live value.

…ep table

Defer the big table re-render so dragging the threshold stays smooth on large
stores (rail cards stay live; React skips intermediate drag values, no manual
debounce). Add a ThresholdSweep table under the slider showing To Review/Unlabel
counts at standard thresholds, highlighting the one nearest the slider.
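The sweep table's arithmetic is simple enough to sketch. The viewer itself is React, so this Python version only illustrates the counting and the nearest-row highlight; the function names and sample thresholds are assumptions, not the ThresholdSweep component.

```python
def sweep_counts(scores, thresholds):
    """For each threshold t, count To Review (score >= t) and Unlabel (score < t)."""
    rows = []
    for t in thresholds:
        to_review = sum(1 for s in scores if s >= t)
        rows.append((t, to_review, len(scores) - to_review))
    return rows


def nearest_threshold(thresholds, slider_value):
    """Threshold row to highlight: the one closest to the slider."""
    return min(thresholds, key=lambda t: abs(t - slider_value))


scores = [0.05, 0.20, 0.35, 0.50, 0.95]
thresholds = [0.10, 0.35, 0.50, 0.90]
for row in sweep_counts(scores, thresholds):
    print(row)
print(nearest_threshold(thresholds, 0.40))  # 0.35
```

Each row is (threshold, To Review count, Unlabel count); the two counts always sum to the number of scored sequences.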
Defer the threshold INPUT (not output) so applyThreshold throttles to settled
values; memoize TriageCards/ThresholdSweep computations; wrap SequenceTable in
React.memo with stable props (useCallback sort handler) so the 21k-row table is
skipped during a drag. Also widen the sweep table: 0.05 steps around the ~0.45
operating point, bigger gaps at the extremes.

…tions

Add shards.py (pack: per-sequence frame tars [append-only] + report tars [per
run] + loose aggregates; unpack: restore loose store+report) and pack/unpack CLI
subcommands. Tag every prediction with the release model_version from the
model.zip manifest (e.g. 0.2.0), in results rows + model_config.
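The pack/unpack idea, many per-sequence directories bundled into one tar object and restored later, might look roughly like this. It is a sketch under assumptions: the shard file name, the directory layout, and the choice of an uncompressed tar are illustrative and not taken from shards.py.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_shard(sequence_dirs, shard_path):
    """Bundle several per-sequence directories into one tar shard."""
    with tarfile.open(shard_path, "w") as tar:
        for seq_dir in sequence_dirs:
            tar.add(seq_dir, arcname=seq_dir.name)


def unpack_shard(shard_path, dest):
    """Restore the loose per-sequence directories from a shard."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(shard_path, "r") as tar:
        tar.extractall(dest)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    store = root / "store"
    for seq_id in ("seq_1", "seq_2"):
        (store / seq_id).mkdir(parents=True)
        (store / seq_id / "frame_0.jpg").write_bytes(b"fake-jpeg")

    shard = root / "frames-0000.tar"  # hypothetical shard name
    pack_shard(sorted(store.iterdir()), shard)
    unpack_shard(shard, root / "restored")

    restored = sorted(
        p.relative_to(root / "restored").as_posix()
        for p in (root / "restored").rglob("*.jpg")
    )
    print(restored)  # ['seq_1/frame_0.jpg', 'seq_2/frame_0.jpg']
```

Keeping frame shards and prediction shards in separate tar sets, as this commit does, means a re-score only rewrites the small per-run shards.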
Replace loose-store + report-output DVC tracking with a single dvc-add of
data/02_shards (the packed artifact, ~36 objects). Drop the dvc.yaml pipeline
(the 247k-file store is impractical as a stage dep; the workflow is staged/manual)
and the now-unused params.yaml. README + Makefile updated for the pull → score →
pack → dvc add → push producer flow and the dvc pull → unpack → view consumer
flow.

Chouffe added 2 commits June 16, 2026 18:32
… slider gating, root README unpack

- pull: _pull_one_safe catches any error (not just RequestException) so one bad
  detection/disk error can't abort a 20k-seq concurrent run; sort detections
  tolerantly (recorded_at may be null).
- annotator client: stop paging on a short page instead of relying on a
  possibly-absent 'pages' field (avoided silent under-pull).
- viewer: gate the triage slider on triageMode, not the eval-shaped
  decision.aggregation field.
- root README: triage consumer flow was missing the 'unpack' step (+ npm install).
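The paging fix in the second bullet can be illustrated with a short sketch: keep requesting pages until one comes back shorter than the page size, without consulting a 'pages' field. `fetch_all`, `get_page`, and the fake backend are hypothetical stand-ins, not the annotator client.

```python
def fetch_all(get_page, page_size=100):
    """Collect items page by page, stopping at the first short page."""
    items, page = [], 1
    while True:
        batch = get_page(page, page_size)
        items.extend(batch)
        if len(batch) < page_size:  # short (or empty) page: nothing left
            return items
        page += 1


# Fake backend with 250 items and no 'pages' field in its responses.
DATA = list(range(250))


def fake_get_page(page, size):
    start = (page - 1) * size
    return DATA[start:start + size]


print(len(fetch_all(fake_get_page)))  # 250
```

When the total is an exact multiple of the page size, this costs one extra request that returns an empty page, which is the price of not trusting a page count.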
@Chouffe Chouffe merged commit 0b9f2f2 into main Jun 16, 2026
8 checks passed