
Coverage: investigate and better handle tabix indexing of concatenated bgzipped files. #4

@grosscol

Description

Current State

During coverage aggregation, bgzipped results are appended to the result file as the process runs, so the final file is a concatenation of separately compressed chunks.

```shell
# Aggregate depths from depth file chunks
mlr -N --tsv 'nest' --ivar ";" -f 3 \${PIPES[@]} |\
sort --numeric-sort --key=2 |\
bgzip >> ${result_file}
```

Tabix occasionally fails to produce a usable index for the concatenated summary data. An index is written and contains the contig name, but region queries against it return nothing: `tabix file.tsv.gz chr22:10000100-10000200 | wc -l` prints 0.

The current workaround is to rewrite the entire bgzipped file.

Action items

  • Generate a small reproducible example of tabix producing an index that cannot be queried by region.
  • Find a solution that is more efficient than rewriting the entire bgzipped file.

Reference

Suspected to be related to:
