Hi DeepVariant team and community,
Thanks for maintaining DeepVariant — it’s a very powerful and widely adopted variant caller.
We are working with human whole-genome sequencing (WGS) data generated on the BGI / DNBSEQ platform (PE150). Although DNBSEQ produces short reads similar to those from Illumina platforms, the underlying sequencing chemistry and error characteristics are distinct. As far as we can tell from existing publications and documentation, DeepVariant's standard WGS model is trained and benchmarked primarily on Illumina short-read data. At the same time, we have seen references to Complete Genomics / DNBSEQ-specific DeepVariant case studies (e.g., T7/G400), which suggests that platform-specific models may exist.
Our Questions
- For DNBSEQ PE150 human data, is the standard DeepVariant WGS model recommended, or should users prefer a Complete Genomics / DNBSEQ‑specific model if available?
- Are the T7/G400 case-study models mentioned in documentation or presentations publicly available for general use, or were they intended only for demonstration and internal benchmarking?
- For older DNBSEQ datasets, or ones generated by an external sequencing service, which QC metrics would you recommend for assessing whether the standard WGS model is acceptable? For example:
  - Ti:Tv ratio expectations
  - GQ calibration checks
  - Indel / SNP balance
  - Depth / mapping consistency
  - Platform-specific patterns to watch for
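To make the Ti:Tv item concrete, here is a minimal sketch of the kind of check we have in mind. It is not an official DeepVariant QC procedure. The function name `titv_from_vcf_lines` is hypothetical, and the sketch assumes a standard VCF in which called variants carry `PASS` (or `.`) in the FILTER column. It counts only biallelic-style single-base substitutions and ignores indels and symbolic alleles. For human WGS, a ratio of roughly 2.0 to 2.1 is the figure usually cited, but we would welcome guidance on what to expect for DNBSEQ data.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_from_vcf_lines(lines):
    """Return (transitions, transversions, ratio) for SNVs in VCF text lines.

    Assumption: records whose FILTER is not PASS or '.' are excluded.
    """
    ti = tv = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        ref, alts, filt = fields[3].upper(), fields[4].upper(), fields[6]
        if filt not in ("PASS", "."):
            continue
        for alt in alts.split(","):
            # Keep only single-base A/C/G/T substitutions (no indels, '*', '.').
            if len(ref) != 1 or len(alt) != 1:
                continue
            if ref not in "ACGT" or alt not in "ACGT" or ref == alt:
                continue
            if (ref, alt) in TRANSITIONS:
                ti += 1
            else:
                tv += 1
    ratio = ti / tv if tv else float("nan")
    return ti, tv, ratio
```

For a bgzipped VCF, the lines could be read with `gzip.open(path, "rt")` and passed straight to the function.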
We want to ensure that our germline variant callset is robust and that the model choice is appropriate for the sequencing technology.
Thanks very much for any guidance!
Best regards,
Yi Zhao