feat(triage): annotation-backlog triage (pull → score → review/unlabel) + viewer mode + DVC sharding #63
Merged
Chouffe merged 32 commits on Jun 16, 2026
….org The deployed annotator pre-creates a SequenceAnnotation record at ready_to_annotate for the whole human queue, so has_annotation=false is near-empty (2) while ready_to_annotate is the real backlog (21,489, matching the UI). Parameterize the pull filter (--stage, default ready_to_annotate) and correct the API host (annotationapi.pyronear.org; annotator.* is the SPA).
Per-sequence frames are fetched+downloaded concurrently (--workers, default 16) via a ThreadPoolExecutor; the client mounts a larger urllib3 pool so concurrent signed-URL fetches don't exhaust connections. pull logs a progress line every 25 sequences. Verified no API/object-store rate limiting (conc=40 all 200s).
Adds a columnar results.parquet twin of results.json (matches eval) for analytical reuse. Tracks the 500-sequence ready_to_annotate store + scored report in DVC (cache:true report, like monitor) — data lives in the S3 remote s3://pyro-vision-rd/dvc/temporal-model/triage/, only pointers in git.
Process seq_workers sequences in parallel (ThreadPoolExecutor), each still fetching its frames with the per-sequence workers pool. Total in-flight ~= seq_workers*workers. Cuts the full backlog pull from ~7h to ~1h. Default stays serial (seq_workers=1) for backward compatibility.
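The two-level concurrency described above can be sketched as nested thread pools. This is a minimal illustration, not the code in `pull.py`: `download_frame`, `pull_sequence`, `pull_all`, and `InFlightCounter` are hypothetical names, and the download is a stand-in with no network access. The counter only exists to show that the peak number of in-flight downloads stays at or below `seq_workers * workers`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InFlightCounter:
    """Tracks the peak number of downloads running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1
        return False


def download_frame(frame_id, counter):
    # Hypothetical stand-in for "fetch signed URL, then download the frame".
    with counter:
        return f"frame-{frame_id}"


def pull_sequence(seq_id, frame_ids, workers, counter):
    # Inner pool: one per sequence, sized by `workers`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        frames = list(pool.map(lambda f: download_frame(f, counter), frame_ids))
    return seq_id, frames


def pull_all(sequences, seq_workers=1, workers=16):
    # Outer pool: `seq_workers` sequences at a time (1 means serial).
    counter = InFlightCounter()
    with ThreadPoolExecutor(max_workers=seq_workers) as outer:
        futures = [
            outer.submit(pull_sequence, seq_id, frame_ids, workers, counter)
            for seq_id, frame_ids in sequences.items()
        ]
        results = dict(future.result() for future in futures)
    return results, counter.peak
```

With `seq_workers=1` the outer pool processes one sequence at a time, which matches the serial default mentioned in the commit message.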
Detect triage trees via triage_bucket and render TriageCards (Review/Unlabel split + bars, clickable per-organization breakdown, triage explainer). Table gains a bucket + triage_score column and drops the correctness column (triage rows are unlabeled); the logistic-threshold slider and outcome/GT filters are hidden; the verdict filter relabels to Review/Unlabel. 44 viewer tests pass.
…tail pane Triage bucket label is now 'To Review' (card, table, filter, explainer). The detail pane's correctness Stat is hidden for triage rows (unlabeled, no ground truth), matching the dropped table column.
…ency, parquet) Document the consumer flow (dvc pull + viewer, no creds/model/Docker) front and center, plus the correct API host (annotationapi), ready_to_annotate scope, --seq-workers/--workers concurrency, results.parquet, full CLI reference, and data layout. Also add model_config.py to the score stage deps (was missing, so a change there wouldn't trigger dvc repro).
Re-enable the threshold slider in triage mode so sliding re-buckets To Review (>= t) vs Unlabel (< t) live via applyThreshold; default to the triage 0.35 (top-level cfg.threshold) not the model logistic_threshold, and relabel as 'triage threshold'. TriageCards header + counts track the live value.
…ep table Defer the big table re-render so dragging the threshold stays smooth on large stores (rail cards stay live; React skips intermediate drag values, no manual debounce). Add a ThresholdSweep table under the slider showing To Review/Unlabel counts at standard thresholds, highlighting the one nearest the slider.
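As a rough illustration of what a threshold sweep computes, the sketch below counts To Review and Unlabel sequences at a list of thresholds and picks the threshold nearest the slider value. It is written in Python for consistency with the package, whereas the viewer implements this in React; `sweep_counts` and `nearest_threshold` are hypothetical names, and the `>= t` / `< t` split follows the re-bucketing rule stated in the earlier commit message.

```python
def sweep_counts(scores, thresholds):
    """Return one row per threshold with the To Review / Unlabel counts."""
    rows = []
    for t in thresholds:
        to_review = sum(1 for s in scores if s >= t)  # To Review: score >= t
        rows.append(
            {"threshold": t, "to_review": to_review, "unlabel": len(scores) - to_review}
        )
    return rows


def nearest_threshold(thresholds, slider_value):
    """Pick the sweep threshold closest to the current slider value."""
    return min(thresholds, key=lambda t: abs(t - slider_value))
```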
Defer the threshold INPUT (not output) so applyThreshold throttles to settled values; memoize TriageCards/ThresholdSweep computations; wrap SequenceTable in React.memo with stable props (useCallback sort handler) so the 21k-row table is skipped during a drag. Also widen the sweep table: 0.05 steps around the ~0.45 operating point, bigger gaps at the extremes.
…tions Add shards.py (pack: per-sequence frame tars [append-only] + report tars [per run] + loose aggregates; unpack: restore loose store+report) and pack/unpack CLI subcommands. Tag every prediction with the release model_version from the model.zip manifest (e.g. 0.2.0), in results rows + model_config.
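The append-only frame packing can be sketched with the standard `tarfile` module. This is an assumption-laden illustration, not the contents of `shards.py`: the shard naming (`frames-000.tar`), the CRC-based shard assignment, and the `n_shards` parameter are all hypothetical, and report tars and loose aggregates are left out.

```python
import tarfile
import zlib
from pathlib import Path


def shard_index(seq_id, n_shards):
    # crc32 is stable across runs, unlike the built-in hash() for strings.
    return zlib.crc32(str(seq_id).encode()) % n_shards


def pack(store_dir, shards_dir, n_shards=4):
    """Append each sequence directory to its shard tar, skipping packed ones."""
    store_dir, shards_dir = Path(store_dir), Path(shards_dir)
    shards_dir.mkdir(parents=True, exist_ok=True)
    for seq_dir in sorted(p for p in store_dir.iterdir() if p.is_dir()):
        shard = shards_dir / f"frames-{shard_index(seq_dir.name, n_shards):03d}.tar"
        # Mode "a" creates the tar if missing and otherwise appends to it.
        with tarfile.open(shard, "a") as tar:
            if seq_dir.name in tar.getnames():
                continue  # frames are immutable, so never re-pack a sequence
            tar.add(seq_dir, arcname=seq_dir.name)


def unpack(shards_dir, store_dir):
    """Restore the loose store by writing every regular file from every shard."""
    store_dir = Path(store_dir)
    for shard in sorted(Path(shards_dir).glob("frames-*.tar")):
        with tarfile.open(shard, "r") as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                target = store_dir / member.name
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(tar.extractfile(member).read())
```

Running `pack` a second time on an unchanged store adds nothing, which is the property that keeps re-scores from re-packing frames.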
Replace loose-store + report-output DVC tracking with a single dvc-add of data/02_shards (the packed artifact, ~36 objects). Drop the dvc.yaml pipeline (the 247k-file store is impractical as a stage dep; the workflow is staged/manual) and the now-unused params.yaml. README + Makefile updated for the pull → score → pack → dvc add → push producer flow and the dvc pull → unpack → view consumer flow.
…, model_version, no pipeline)
…) + verified note
… slider gating, root README unpack
- pull: _pull_one_safe catches any error (not just RequestException) so one bad detection/disk error can't abort a 20k-seq concurrent run; sort detections tolerantly (recorded_at may be null).
- annotator client: stop paging on a short page instead of relying on a possibly-absent 'pages' field (avoided silent under-pull).
- viewer: gate the triage slider on triageMode, not the eval-shaped decision.aggregation field.
- root README: triage consumer flow was missing the 'unpack' step (+ npm install).
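The "stop on a short page" rule from the annotator client fix can be sketched as follows. This is a minimal illustration under assumptions: `fetch_page` is a hypothetical callable taking a 1-based page number and a page size and returning a list of items; the real client's signature and paging parameters are not shown in this PR.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect items page by page, stopping at the first short page."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        items.extend(batch)
        # A page with fewer items than requested is the last one, so no
        # 'pages' field from the server is needed to know when to stop.
        if len(batch) < page_size:
            break
        page += 1
    return items
```

When the total is an exact multiple of the page size, this costs one extra request that returns an empty page, which is the price of not trusting a total-pages field.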
What

Adds `triage/`, a new sibling package that shrinks the pyro-annotator's annotation backlog. It pulls the unannotated queue (`processing_stage=ready_to_annotate`) read-only, scores every sequence with the temporal smoke classifier in-process, and splits it at a threshold (default 0.35) into:

- To Review (score ≥ 0.35): browsed locally in the viewer
- Unlabel (score < 0.35): a read-only worklist (ids + ready-to-send `bulk` payload) to later mark as the `unlabeled` false-positive type

triage never writes to the annotator (enforced by test).

Run against prod: 21,489 sequences scored → 16,348 To Review / 5,141 Unlabel, shared on S3 and verified pullable from a clean clone.
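The split described above can be sketched in a few lines. This is an illustration only: `sequence_score` and `triage` are hypothetical names, the max-over-tubes rule follows the `score.py` description in this PR, and treating a sequence with no kept tubes as score 0.0 is an assumption made here, not something the PR states.

```python
def sequence_score(tube_probabilities):
    """Sequence score = max kept-tube probability.

    Assumption: a sequence with no kept tubes scores 0.0.
    """
    return max(tube_probabilities, default=0.0)


def triage(tubes_by_sequence, threshold=0.35):
    """Split sequence ids into To Review (score >= threshold) and Unlabel."""
    to_review, unlabel = [], []
    for seq_id in sorted(tubes_by_sequence):
        score = sequence_score(tubes_by_sequence[seq_id])
        (to_review if score >= threshold else unlabel).append(seq_id)
    # Both lists are plain worklists; nothing is written to the annotator.
    return {"to_review": to_review, "unlabel": unlabel}
```

A score exactly at the threshold lands in To Review, matching the `≥ 0.35` rule.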
Package (`temporal_model.triage`)

- `annotator_api.py`: read-only HTTP client (GET-only + the login POST; no patch/put/delete method exists).
- `pull.py`: incremental pull with parallel frame downloads (`--workers`) and across-sequence concurrency (`--seq-workers`).
- `score.py`: in-process `core.predict()`, sequence score = max kept-tube probability, bucket at threshold; progress heartbeat.
- `report.py`: eval-viewer contract + `results.parquet` + worklists; tags every prediction with the release `model_version` (e.g. `0.2.0`).
- `shards.py`: `pack` / `unpack`. The loose store (~247k files) + report (~43k files) would be a DVC object explosion, so per-sequence data is bundled into ~36 tar objects. Frames (immutable, append-only) and predictions (per-run) are separate shard sets, so re-scores never re-pack the 26 GB of frames.
- `cli.py`: `pull` / `score` / `pack` / `unpack`.

Viewer (`viewer/`)

Triage mode (detected via `triage_bucket`): To Review / Unlabel cards, a clickable per-organization breakdown, a threshold slider + clickable sweep table (smooth on 21k rows via `useDeferredValue` + `React.memo`), a triage score column, and the correctness column/slider hidden (no ground truth). Shared `monitor` / `eval` paths untouched.

Data sharing (DVC)
Single tracked artifact `data/02_shards` (`dvc add`); no `dvc.yaml` pipeline (the 247k-file store is impractical as a stage dep). Consumer flow: `dvc pull` → `unpack` → view, with no annotator creds, model, GPU, or Docker.

Docs
`triage/README.md` (producer + verified consumer flow), `docs/specs/2026-06-16-triage-design.md`, `docs/specs/2026-06-16-triage-sharding-design.md`; root README + Makefile wired.

Testing
- 27 offline Python tests (mocked HTTP, fake store, stub model; no network/Docker)
- Consumer flow (`dvc pull` → `unpack` → viewer serving 21k with frames) verified end-to-end from a clean clone.