refactor: move minimizerReferences from files to buildFiles by ivan-aksamentov · Pull Request #442 · nextstrain/nextclade_data

ivan-aksamentov · 2026-04-01T13:31:18Z

Depends on: nextstrain/nextclade_data#404

Current Nextclade versions reject array values in the `files` section of `pathogen.json` with `invalid type: sequence, expected a string`. Datasets declaring `minimizerReferences` as an array inside `files` fail to load.

This PR (757b824) moves `minimizerReferences` from `.files` to a new top-level `.buildFiles` key:

```json
{
  "files": {
    "reference": "reference.fasta",
    "genomeAnnotation": "genome_annotation.gff3",
    "changelog": "CHANGELOG.md"
  },
  "buildFiles": {
    "minimizerReferences": [
      "minimizer_refs/additional_refs_B.fasta",
      "minimizer_refs/additional_refs_B1a.fasta"
    ]
  }
}
```

Unknown top-level keys are silently ignored by Nextclade, so the dataset remains loadable by both old and new versions. The companion script change reading from `buildFiles` is in nextstrain/nextclade_data#404 (9fc7f3e4).
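A minimal sketch of how a build script might read the new key, assuming the script loads `pathogen.json` into a Python dict. The helper names `get_minimizer_references` and `load_minimizer_references` are hypothetical and are not taken from #404. A dataset without a `buildFiles` section is treated as having no extra minimizer references.

```python
import json
from pathlib import Path
from typing import List


def get_minimizer_references(pathogen_json: dict) -> List[str]:
    """Return the paths listed under top-level `buildFiles.minimizerReferences`.

    Hypothetical helper: a dataset without `buildFiles` (or without the key)
    yields an empty list.
    """
    build_files = pathogen_json.get("buildFiles") or {}
    return list(build_files.get("minimizerReferences") or [])


def load_minimizer_references(dataset_dir: Path) -> List[Path]:
    """Read <dataset_dir>/pathogen.json and resolve each path relative to it."""
    pathogen = json.loads((dataset_dir / "pathogen.json").read_text())
    return [dataset_dir / p for p in get_minimizer_references(pathogen)]


if __name__ == "__main__":
    example = {
        "files": {"reference": "reference.fasta"},
        "buildFiles": {
            "minimizerReferences": [
                "minimizer_refs/additional_refs_B.fasta",
                "minimizer_refs/additional_refs_B1a.fasta",
            ]
        },
    }
    print(get_minimizer_references(example))
```

Because Nextclade ignores unknown top-level keys, only the build tooling needs to know about `buildFiles`.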

Work items

- [x] Move `minimizerReferences` from `files` to `buildFiles` in `pathogen.json`


Updated datasets:
- Influenza A H1N1pdm HA (nextstrain/flu/h1n1pdm/ha/MW626062)
- Influenza A H1N1pdm NA (nextstrain/flu/h1n1pdm/na/MW626056)
- Influenza A H3N2 HA (nextstrain/flu/h3n2/ha/EPI1857216)
- Influenza A H3N2 NA (nextstrain/flu/h3n2/na/EPI1857215)
- Influenza B (all) HA (nextstrain/flu/b/ha/KX058884)
- Influenza B (all) NA (nextstrain/flu/b/na/CY073894)
- Influenza A H3N2 PB2 (nextstrain/flu/h3n2/pb2)
- Influenza A H3N2 PB1 (nextstrain/flu/h3n2/pb1)
- Influenza A H3N2 PA (nextstrain/flu/h3n2/pa)
- Influenza A H3N2 HA (nextstrain/flu/h3n2/ha/CY163680)
- Influenza A H3N2 NP (nextstrain/flu/h3n2/np)
- Influenza A H3N2 MP (nextstrain/flu/h3n2/mp)
- Influenza A H3N2 NS (nextstrain/flu/h3n2/ns)
- Influenza A H1N1pdm PB2 (nextstrain/flu/h1n1pdm/pb2)
- Influenza A H1N1pdm PB1 (nextstrain/flu/h1n1pdm/pb1)
- Influenza A H1N1pdm PA (nextstrain/flu/h1n1pdm/pa)
- Influenza A H1N1pdm HA (nextstrain/flu/h1n1pdm/ha/CY121680)
- Influenza A H1N1pdm MP (nextstrain/flu/h1n1pdm/mp)
- Influenza A H1N1pdm NP (nextstrain/flu/h1n1pdm/np)
- Influenza A H1N1pdm NS (nextstrain/flu/h1n1pdm/ns)
- Influenza B (all) PB1 (nextstrain/flu/b/pb1)
- Influenza B (all) PB2 (nextstrain/flu/b/pb2)
- Influenza B (all) PA (nextstrain/flu/b/pa)
- Influenza B (all) NP (nextstrain/flu/b/np)
- Influenza B (all) MP (nextstrain/flu/b/mp)
- Influenza B (all) NS (nextstrain/flu/b/ns)
- Influenza A H1N1 PB2 (nextstrain/flu/h1n1/pb2)
- Influenza A H1N1 PB1 (nextstrain/flu/h1n1/pb1)
- Influenza A H1N1 PA (nextstrain/flu/h1n1/pa)
- Influenza A H1N1 HA (nextstrain/flu/h1n1/ha)
- Influenza A H1N1 NP (nextstrain/flu/h1n1/np)
- Influenza A H1N1 NA (nextstrain/flu/h1n1/na)
- Influenza A H1N1 MP (nextstrain/flu/h1n1/mp)
- Influenza A H1N1 NS (nextstrain/flu/h1n1/ns)
- Influenza A H2N2 PB2 (nextstrain/flu/h2n2/pb2)
- Influenza A H2N2 PB1 (nextstrain/flu/h2n2/pb1)
- Influenza A H2N2 PA (nextstrain/flu/h2n2/pa)
- Influenza A H2N2 NP (nextstrain/flu/h2n2/np)
- Influenza A H2N2 HA (nextstrain/flu/h2n2/ha)
- Influenza A H2N2 NA (nextstrain/flu/h2n2/na)
- Influenza A H2N2 MP (nextstrain/flu/h2n2/mp)
- Influenza A H2N2 NS (nextstrain/flu/h2n2/ns)


- Event-based reporter with severity levels, stage tracking, dataset lifecycle, and defect findings
- Terminal renderer with severity-routed output (warnings/errors to stderr)
- GitHub Actions renderer with annotation commands and step summary markdown
- JSONL renderer for machine-readable build reports

- Replace logging-based logger with thin reporter adapter
- Replace ad-hoc CI annotation emission with DefectFinding-based reporting
- Remove Defect, DefectReport, Severity, print_defect_summary, write_defect_summary_markdown from schema.py (moved to reporting modules)
- Add dataset start/finish lifecycle events and stage grouping to rebuild
- Add --report-jsonl flag for machine-readable build output
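The reporter structure described above can be sketched as one event type fanned out to several renderers. This is a minimal Python illustration, not the actual reporting modules: the names `Reporter`, `Event`, `TerminalRenderer`, `GitHubActionsRenderer`, `JsonlRenderer` and the three-level `Severity` enum are assumptions. The `::warning::` and `::error::` lines follow the GitHub Actions workflow-command syntax; step-summary markdown is left out for brevity.

```python
import json
import sys
from dataclasses import asdict, dataclass
from enum import IntEnum
from typing import Iterable, Optional, TextIO


class Severity(IntEnum):
    # Hypothetical three-level scale; higher means more severe
    INFO = 0
    WARNING = 1
    ERROR = 2


@dataclass
class Event:
    severity: Severity
    message: str
    dataset: Optional[str] = None  # dataset lifecycle context
    stage: Optional[str] = None    # build stage context


def _escape_workflow_command(text: str) -> str:
    # GitHub Actions workflow commands require %, CR and LF to be escaped
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


class TerminalRenderer:
    """Human-readable output: INFO to stdout, WARNING/ERROR to stderr."""

    def __init__(self, out: TextIO = sys.stdout, err: TextIO = sys.stderr):
        self.out, self.err = out, err

    def render(self, event: Event) -> None:
        stream = self.err if event.severity >= Severity.WARNING else self.out
        print(f"[{event.severity.name}] {event.message}", file=stream)


class GitHubActionsRenderer:
    """Turns warnings and errors into GitHub Actions annotations."""

    def __init__(self, out: TextIO = sys.stdout):
        self.out = out

    def render(self, event: Event) -> None:
        if event.severity >= Severity.WARNING:
            command = "error" if event.severity == Severity.ERROR else "warning"
            print(f"::{command}::{_escape_workflow_command(event.message)}", file=self.out)


class JsonlRenderer:
    """Machine-readable output: one JSON object per line."""

    def __init__(self, out: TextIO):
        self.out = out

    def render(self, event: Event) -> None:
        record = asdict(event)
        record["severity"] = event.severity.name
        self.out.write(json.dumps(record) + "\n")


class Reporter:
    """Fans every event out to all registered renderers."""

    def __init__(self, renderers: Iterable):
        self.renderers = list(renderers)

    def emit(self, severity: Severity, message: str, **context) -> None:
        event = Event(severity, message, **context)
        for renderer in self.renderers:
            renderer.render(event)
```

Adding another output format, such as step-summary markdown, then only requires one more class with a `render` method, so the build logic never formats output itself.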


Current Nextclade versions reject array values in the `files` section of `pathogen.json` with `invalid type: sequence, expected a string`. The `DatasetFiles` struct catches unknown keys with `rest_files: BTreeMap<String, String>` (https://github.com/nextstrain/nextclade/blob/f7db57f31/packages/nextclade/src/io/dataset.rs#L567-L569), which only accepts string values. An array like `minimizerReferences` fails deserialization before reaching the `other: serde_json::Value` catch-all. Move the lookup to `.buildFiles`, a top-level key absorbed by `VirusProperties`'s `other: serde_json::Value` (https://github.com/nextstrain/nextclade/blob/1400012f7/packages/nextclade/src/analyze/virus_properties.rs#L137-L138), which accepts any JSON type. This makes datasets with multiple minimizer reference files loadable by both old and new Nextclade versions.
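To make the failure mode concrete, here is a Python approximation of the check, under the assumption that every value in `files` must be a string (mirroring the `BTreeMap<String, String>` catch-all), together with the move this change applies. Both function names are hypothetical; this is not Nextclade's Rust deserializer.

```python
import copy
from typing import List


def non_string_file_entries(pathogen_json: dict) -> List[str]:
    """Keys under `files` whose values are not strings.

    Approximation of the constraint above: if `files` entries must map
    string -> string, any key returned here would fail to deserialize.
    """
    files = pathogen_json.get("files") or {}
    return sorted(key for key, value in files.items() if not isinstance(value, str))


def move_minimizer_references(pathogen_json: dict) -> dict:
    """Return a copy with `files.minimizerReferences` moved to `buildFiles`."""
    result = copy.deepcopy(pathogen_json)  # leave the caller's dict untouched
    refs = result.get("files", {}).pop("minimizerReferences", None)
    if refs is not None:
        result.setdefault("buildFiles", {})["minimizerReferences"] = refs
    return result
```

After the move, `non_string_file_entries` returns an empty list, while the array is still available under the top-level key that Nextclade accepts as an arbitrary JSON value.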


itpsgit and others added 30 commits December 5, 2025 14:40

add zikav dataset

3378eec

flu: add trees to internal segments, add old sequences to ha/na builds

c04e1b9

chore: rebuild [skip ci]

1a176d6

flu: set default CDS

4e9ebfd

chore: rebuild [skip ci]

4722b2d

flu: update internal gene trees

ed7d10c

flu: add all influenza b

f19b5db

flu: addition of internal b datasets

2021244

chore: rebuild [skip ci]

bde1b21

flu: update b trees

eaf60c1

flu: add h2n2 datasets

8cf92a6

flu: add h1n1 datasets

d757780

chore: rebuild [skip ci]

937cb7f

flu: fix h3n2 na tree

6df8f15

flu: fix h1n1pdm and h3n2 na trees

c1b8c0f

flu: restore vic datasets

d17ab51

chore: rebuild [skip ci]

1fdda23

reorder dataset in nextstrain collection

68c328a

chore: rebuild [skip ci]

25b24a5

flu: fix B dataset names

0f3ec96

flu: fix further b names and README

6f9f709

merge master

738e5d7

chore: rebuild [skip ci]

356acd9

flu: fix changelog entries

2584934

flu: fix changelog entries

af2faab

chore: rebuild [skip ci]

38b512a

Merge pull request #398 from nextstrain/flu-2025-12

73aa0ee

flu: add trees to internal segments, add old sequences to ha/na builds, add h2n2, h1n1, and all-b builds

docs: allow annotation curation script to take fasta as argument and skip pseudo genes

4957e69

flu: fix missing clade attributes

c98cbc6

ivan-aksamentov and others added 19 commits March 6, 2026 14:21

fix(community/itps/zikav): unify attributes per schema update

98e4066

- Remove deprecated/enabled from root (obsolete)
- Move experimental: true to attributes
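A hypothetical Python sketch of this migration applied to a parsed `pathogen.json` dict. The function name `unify_attributes` is illustrative, and the sketch assumes `experimental` is moved into `attributes` with its value unchanged.

```python
import copy


def unify_attributes(pathogen_json: dict) -> dict:
    """Hypothetical migration: drop root-level `deprecated`/`enabled` and
    move root-level `experimental` into `attributes` (value kept as-is)."""
    result = copy.deepcopy(pathogen_json)
    for obsolete_key in ("deprecated", "enabled"):
        result.pop(obsolete_key, None)  # no-op if the key is absent
    if "experimental" in result:
        result.setdefault("attributes", {})["experimental"] = result.pop("experimental")
    return result
```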

docs: add changelog [skip ci]

964d2a9

chore: trigger CI

76f8f2c

chore: rebuild [skip ci]

33bc066

fix(nextstrain/dengue): align dataset metadata with schema [skip ci]

d7ee84e

fix(nextstrain/herpes): align qc metadata with schema [skip ci]

0e72d07

fix(nextstrain/measles): align qc metadata with schema [skip ci]

9db6a42

fix(nextstrain/mpox): align dataset metadata with schema [skip ci]

cd88bd6

fix(nextstrain/mumps): align dataset metadata with schema [skip ci]

14c5a01

fix(nextstrain/orthoebolavirus): align dataset metadata with schema [skip ci]

1e6b0e3

fix(nextstrain/rubella): align qc metadata with schema [skip ci]

f61b017

fix(nextstrain/wnv): align dataset metadata with schema [skip ci]

3c588c3

chore: trigger CI

0236d4f

chore: rebuild [skip ci]

f616874

fix(scripts): add missing defaultdict import in schema validation

fe1d568

`_build_schema_index()` uses `defaultdict` but `collections.defaultdict` was not imported, causing `NameError` on every rebuild.
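A minimal illustration of the fix, assuming the index groups values by key. `build_index` is a hypothetical stand-in for `_build_schema_index()`, whose actual structure is not shown here. Without the import on the first line, Python raises `NameError: name 'defaultdict' is not defined` as soon as the function body runs, which is why every rebuild failed.

```python
from collections import defaultdict  # the import the fix adds


def build_index(pairs):
    """Group (key, value) pairs into {key: [values...]}.

    Hypothetical stand-in for an index builder such as `_build_schema_index()`.
    """
    index = defaultdict(list)  # NameError here if the import above is missing
    for key, value in pairs:
        index[key].append(value)  # missing keys start as an empty list
    return dict(index)
```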

ivan-aksamentov temporarily deployed to refs/pull/442/merge April 1, 2026 13:31 — with GitHub Actions Inactive

nextstrain-bot and others added 5 commits April 1, 2026 13:32

chore: rebuild [skip ci]

dddd44a

Revert "chore: rebuild [skip ci]" [skip ci]

b8d2fec

Merge remote-tracking branch 'origin/master' into feat/multiref

1e95c89

# Conflicts:
#	data_output/minimizer_index.json
#	scripts/rebuild

chore: rebuild [skip ci]

f40ac06

Merge branch 'feat/multiref' into fix/cva16-build-files

210ea34

ivan-aksamentov had a problem deploying to refs/heads/fix/cva16-build-files April 1, 2026 13:47 — with GitHub Actions Error

ivan-aksamentov deployed to refs/pull/442/merge April 1, 2026 13:47 — with GitHub Actions Active

chore: rebuild [skip ci]

d60f549

ivan-aksamentov merged commit d60f549 into cva16-testing-sample Apr 1, 2026

ivan-aksamentov deleted the fix/cva16-build-files branch April 1, 2026 13:51

ivan-aksamentov merged 135 commits into cva16-testing-sample from fix/cva16-build-files

ivan-aksamentov commented Apr 1, 2026


6 participants
