
refactor: move minimizerReferences from files to buildFiles #442

Merged: ivan-aksamentov merged 135 commits into cva16-testing-sample from fix/cva16-build-files on Apr 1, 2026

@ivan-aksamentov (Member)

Current Nextclade versions reject array values in the `files` section of `pathogen.json` with `invalid type: sequence, expected a string`. Datasets declaring `minimizerReferences` as an array inside `files` fail to load.

This PR (757b824) moves `minimizerReferences` from `.files` to a new top-level `.buildFiles` key:

```json
{
  "files": {
    "reference": "reference.fasta",
    "genomeAnnotation": "genome_annotation.gff3",
    "changelog": "CHANGELOG.md"
  },
  "buildFiles": {
    "minimizerReferences": [
      "minimizer_refs/additional_refs_B.fasta",
      "minimizer_refs/additional_refs_B1a.fasta"
    ]
  }
}
```

Unknown top-level keys are silently ignored by Nextclade, so the dataset remains loadable by both old and new versions. The companion script change that reads from `buildFiles` is in nextstrain/nextclade_data#404 (9fc7f3e4).
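A minimal Python sketch of the same migration, assuming the dataset tooling loads `pathogen.json` into a plain `dict`. The helper names `move_minimizer_refs_to_build_files` and `get_minimizer_references` are hypothetical and are not the code from #404; accepting a bare string in `get_minimizer_references` is also an assumption.

```python
import json
from typing import Any, Dict, List


def move_minimizer_refs_to_build_files(pathogen: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a pathogen.json dict with `files.minimizerReferences`
    moved to the top-level `buildFiles.minimizerReferences` (hypothetical helper)."""
    result = json.loads(json.dumps(pathogen))  # deep copy via a JSON round-trip
    refs = result.get("files", {}).pop("minimizerReferences", None)
    if refs is not None:
        result.setdefault("buildFiles", {})["minimizerReferences"] = refs
    return result


def get_minimizer_references(pathogen: Dict[str, Any]) -> List[str]:
    """Read minimizer reference paths from `buildFiles`; a missing key yields []."""
    refs = pathogen.get("buildFiles", {}).get("minimizerReferences", [])
    # Assumption: tolerate a single path string as well as a list of paths.
    return [refs] if isinstance(refs, str) else list(refs)
```

Because the migration works on a copy, the original and migrated dicts can be compared before the result is written back with `json.dump`.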

### Work items

- [x] Move `minimizerReferences` from `files` to `buildFiles` in `pathogen.json`

itpsgit and others added 30 commits December 5, 2025 14:40
flu: add trees to internal segments, add old sequences to ha/na builds, add h2n2, h1n1, and all-b builds
Updated datasets:

- Influenza A H1N1pdm HA (nextstrain/flu/h1n1pdm/ha/MW626062)
- Influenza A H1N1pdm NA (nextstrain/flu/h1n1pdm/na/MW626056)
- Influenza A H3N2 HA (nextstrain/flu/h3n2/ha/EPI1857216)
- Influenza A H3N2 NA (nextstrain/flu/h3n2/na/EPI1857215)
- Influenza B (all) HA (nextstrain/flu/b/ha/KX058884)
- Influenza B (all) NA (nextstrain/flu/b/na/CY073894)
- Influenza A H3N2 PB2 (nextstrain/flu/h3n2/pb2)
- Influenza A H3N2 PB1 (nextstrain/flu/h3n2/pb1)
- Influenza A H3N2 PA (nextstrain/flu/h3n2/pa)
- Influenza A H3N2 HA (nextstrain/flu/h3n2/ha/CY163680)
- Influenza A H3N2 NP (nextstrain/flu/h3n2/np)
- Influenza A H3N2 MP (nextstrain/flu/h3n2/mp)
- Influenza A H3N2 NS (nextstrain/flu/h3n2/ns)
- Influenza A H1N1pdm PB2 (nextstrain/flu/h1n1pdm/pb2)
- Influenza A H1N1pdm PB1 (nextstrain/flu/h1n1pdm/pb1)
- Influenza A H1N1pdm PA (nextstrain/flu/h1n1pdm/pa)
- Influenza A H1N1pdm HA (nextstrain/flu/h1n1pdm/ha/CY121680)
- Influenza A H1N1pdm MP (nextstrain/flu/h1n1pdm/mp)
- Influenza A H1N1pdm NP (nextstrain/flu/h1n1pdm/np)
- Influenza A H1N1pdm NS (nextstrain/flu/h1n1pdm/ns)
- Influenza B (all) PB1 (nextstrain/flu/b/pb1)
- Influenza B (all) PB2 (nextstrain/flu/b/pb2)
- Influenza B (all) PA (nextstrain/flu/b/pa)
- Influenza B (all) NP (nextstrain/flu/b/np)
- Influenza B (all) MP (nextstrain/flu/b/mp)
- Influenza B (all) NS (nextstrain/flu/b/ns)
- Influenza A H1N1 PB2 (nextstrain/flu/h1n1/pb2)
- Influenza A H1N1 PB1 (nextstrain/flu/h1n1/pb1)
- Influenza A H1N1 PA (nextstrain/flu/h1n1/pa)
- Influenza A H1N1 HA (nextstrain/flu/h1n1/ha)
- Influenza A H1N1 NP (nextstrain/flu/h1n1/np)
- Influenza A H1N1 NA (nextstrain/flu/h1n1/na)
- Influenza A H1N1 MP (nextstrain/flu/h1n1/mp)
- Influenza A H1N1 NS (nextstrain/flu/h1n1/ns)
- Influenza A H2N2 PB2 (nextstrain/flu/h2n2/pb2)
- Influenza A H2N2 PB1 (nextstrain/flu/h2n2/pb1)
- Influenza A H2N2 PA (nextstrain/flu/h2n2/pa)
- Influenza A H2N2 NP (nextstrain/flu/h2n2/np)
- Influenza A H2N2 HA (nextstrain/flu/h2n2/ha)
- Influenza A H2N2 NA (nextstrain/flu/h2n2/na)
- Influenza A H2N2 MP (nextstrain/flu/h2n2/mp)
- Influenza A H2N2 NS (nextstrain/flu/h2n2/ns)
ivan-aksamentov and others added 19 commits March 6, 2026 14:21
- Remove deprecated/enabled from root (obsolete)
- Move experimental: true to attributes
- Event-based reporter with severity levels, stage tracking, dataset lifecycle, and defect findings
- Terminal renderer with severity-routed output (warnings/errors to stderr)
- GitHub Actions renderer with annotation commands and step summary markdown
- JSONL renderer for machine-readable build reports
- Replace logging-based logger with thin reporter adapter
- Replace ad-hoc CI annotation emission with DefectFinding-based reporting
- Remove Defect, DefectReport, Severity, print_defect_summary, write_defect_summary_markdown from schema.py (moved to reporting modules)
- Add dataset start/finish lifecycle events and stage grouping to rebuild
- Add --report-jsonl flag for machine-readable build output
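The reporter design described in the list above can be sketched in Python roughly as follows. All class and field names here (`Event`, `Severity`, `Reporter`, the renderer classes) are hypothetical illustrations, not the actual reporting modules; only the `::warning::` / `::error::` workflow command syntax and its escaping of `%`, `\r` and `\n` come from GitHub Actions itself.

```python
import json
import sys
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional, TextIO


class Severity(str, Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


@dataclass
class Event:
    severity: Severity
    stage: str
    message: str
    dataset: Optional[str] = None


def _is_problem(event: Event) -> bool:
    return event.severity in (Severity.WARNING, Severity.ERROR)


class TerminalRenderer:
    """Human-readable lines; warnings and errors are routed to stderr."""

    def __init__(self, out: TextIO = sys.stdout, err: TextIO = sys.stderr):
        self.out, self.err = out, err

    def render(self, event: Event) -> None:
        stream = self.err if _is_problem(event) else self.out
        where = f" [{event.dataset}]" if event.dataset else ""
        print(f"{event.severity.value.upper()} {event.stage}{where}: {event.message}", file=stream)


class GitHubActionsRenderer:
    """Emits ::warning:: / ::error:: workflow commands for problems only."""

    def __init__(self, out: TextIO = sys.stdout):
        self.out = out

    def render(self, event: Event) -> None:
        if not _is_problem(event):
            return
        # Workflow command data must escape %, CR and LF.
        msg = event.message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
        print(f"::{event.severity.value}::{msg}", file=self.out)


class JsonlRenderer:
    """One JSON object per line, for machine-readable reports."""

    def __init__(self, out: TextIO):
        self.out = out

    def render(self, event: Event) -> None:
        record = asdict(event)
        record["severity"] = event.severity.value  # plain string in the output
        self.out.write(json.dumps(record) + "\n")


class Reporter:
    """Fans each event out to every configured renderer."""

    def __init__(self, *renderers):
        self.renderers = renderers

    def emit(self, event: Event) -> None:
        for renderer in self.renderers:
            renderer.render(event)
```

Keeping events separate from renderers means a new output format only needs another class with a `render(event)` method.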
`_build_schema_index()` uses `defaultdict`, but `collections.defaultdict`
was not imported, causing a `NameError` on every rebuild.
Current Nextclade versions reject array values in the `files` section of `pathogen.json` with `invalid type: sequence, expected a string`. The `DatasetFiles` struct catches unknown keys with `rest_files: BTreeMap<String, String>` (https://github.com/nextstrain/nextclade/blob/f7db57f31/packages/nextclade/src/io/dataset.rs#L567-L569), which only accepts string values. An array like `minimizerReferences` fails deserialization before reaching the `other: serde_json::Value` catch-all.

Move the lookup to `.buildFiles`, a top-level key absorbed by `VirusProperties`'s `other: serde_json::Value` (https://github.com/nextstrain/nextclade/blob/1400012f7/packages/nextclade/src/analyze/virus_properties.rs#L137-L138), which accepts any JSON type. This makes datasets with multiple minimizer reference files loadable by both old and new Nextclade versions.
- Depends on: #404
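The type rule described above can be approximated in a few lines of Python, assuming a simplified model in which every value under `files` must be a string (as with `rest_files: BTreeMap<String, String>`) while any unknown top-level key accepts arbitrary JSON (as with `other: serde_json::Value`). The function names are hypothetical, and the error text only imitates the serde message for the array case; this is not how Nextclade itself validates datasets.

```python
from typing import Any, Dict, List


def check_files_section(files: Dict[str, Any]) -> List[str]:
    """Approximate `BTreeMap<String, String>`: every value must be a string."""
    errors = []
    for key, value in files.items():
        if isinstance(value, str):
            continue
        # serde reports JSON arrays as "sequence"; other types get a generic label here.
        kind = "sequence" if isinstance(value, list) else "non-string value"
        errors.append(f"files.{key}: invalid type: {kind}, expected a string")
    return errors


def check_pathogen(pathogen: Dict[str, Any]) -> List[str]:
    """Only `files` is type-checked; unknown top-level keys such as `buildFiles`
    fall into a catch-all that accepts any JSON value, so they never error."""
    return check_files_section(pathogen.get("files", {}))


old = {"files": {"reference": "reference.fasta", "minimizerReferences": ["a.fasta"]}}
new = {"files": {"reference": "reference.fasta"}, "buildFiles": {"minimizerReferences": ["a.fasta"]}}
print(check_pathogen(old))  # ['files.minimizerReferences: invalid type: sequence, expected a string']
print(check_pathogen(new))  # []
```

Unlike serde, which aborts on the first mismatch, this sketch collects every offending key, which is more convenient for a pre-flight check in build scripts.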

@ivan-aksamentov temporarily deployed to refs/pull/442/merge on April 1, 2026 13:31 with GitHub Actions (inactive)
@ivan-aksamentov had a problem deploying to refs/heads/fix/cva16-build-files on April 1, 2026 13:47 with GitHub Actions (error)
@ivan-aksamentov deployed to refs/pull/442/merge on April 1, 2026 13:47 with GitHub Actions (active)
@ivan-aksamentov merged commit d60f549 into cva16-testing-sample on Apr 1, 2026
@ivan-aksamentov deleted the fix/cva16-build-files branch on April 1, 2026 13:51