--sampling-frac issues in v0.6.1 #599

@SuhasSrinivasan

As discussed, creating an issue to track observed behavior and changes.

Documentation Issue

The error message below does not explain the link between --sampling-frac and --seed.

Expected Behavior
The error message and command-line help text should explicitly state the dependency between --seed and --sampling-frac. Specifically, the documentation should clarify that --seed is only valid when used in conjunction with --sampling-frac. If a user provides --seed without --sampling-frac, the error message should clearly explain this relationship rather than just stating that a required argument is missing.

error: the following required arguments were not provided:
  --sampling-frac <SAMPLING_FRAC>

Usage: modkit pileup --reference <REFERENCE_FASTA> --sampling-frac <SAMPLING_FRAC> --modified-bases <MODIFIED_BASES>... --threads <THREADS> --seed <SEED> --mod-threshold <MOD_THRESHOLDS> <IN_BAM> <OUT_BED>
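To illustrate the kind of message being requested, here is a minimal sketch of the check written as a shell pre-flight function. It is not modkit's argument parser; the function name `check_seed_requires_sampling_frac` and the wording of the message are assumptions made up for this example.

```shell
# Hypothetical pre-flight check (not part of modkit).
# Scans the arguments and fails with an explanatory message when
# --seed is given without --sampling-frac.
check_seed_requires_sampling_frac() {
    _has_seed=0
    _has_frac=0
    for _arg in "$@"; do
        case "${_arg}" in
            --seed|--seed=*) _has_seed=1 ;;
            --sampling-frac|--sampling-frac=*) _has_frac=1 ;;
        esac
    done
    if [ "${_has_seed}" -eq 1 ] && [ "${_has_frac}" -eq 0 ]; then
        # Assumed wording: states the dependency instead of only naming the missing flag.
        echo "error: --seed only affects read sampling and must be used together with --sampling-frac" >&2
        return 1
    fi
    return 0
}
```

Calling `check_seed_requires_sampling_frac "$@"` before invoking the real command would reject `--seed 42` on its own and accept `--seed 42 --sampling-frac 0.1`.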

Using sampling fraction stalls the process

After an hour, I can confirm that the command below did not work in v0.6.1: it stalled at the sampling stage.

THREADS=8

"${MODKIT_PATH}" pileup \
    --modified-bases C:m A:a A:17596 T:17802 \
    --threads "${THREADS}" \
    --reference "${REFERENCE_PATH}" \
    --sampling-frac 0.0005 \
    --seed 1234567 \
    --mod-threshold "a:${MOD_THRESHOLD}" \
    --mod-threshold "m:${MOD_THRESHOLD}" \
    --mod-threshold "17802:${MOD_THRESHOLD}" \
    --mod-threshold "17596:${MOD_THRESHOLD}" \
    "${bam_sorted}" "${pile_file}"
> parsed user-input threshold 0.5 for mod-code 17596
> parsed user-input threshold 0.5 for mod-code 17802
> parsed user-input threshold 0.5 for mod-code a
> parsed user-input threshold 0.5 for mod-code m
> discarded 223943 contigs with zero aligned reads
> parsed 4 base modification(s). Base modifications other than 'A:17596,A:a,C:m,T:17802' will be counted as 'N_other'.
> adding single-base motif: 'A 0'
> adding single-base motif: 'C 0'
> adding single-base motif: 'T 0'
> sampling 0.05% of reads
^C
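
For anyone trying to reproduce the stall without watching the terminal, a time limit can stand in for the manual ^C. This is only a sketch: it assumes GNU coreutils `timeout` is available, and `run_with_limit` is a hypothetical helper name, not something modkit provides.

```shell
# Hypothetical helper: run a command under a time limit and report
# when the limit was hit. Assumes GNU coreutils `timeout`, which
# exits with status 124 when it kills the command.
run_with_limit() {
    _limit="$1"
    shift
    _status=0
    timeout "${_limit}" "$@" || _status=$?
    if [ "${_status}" -eq 124 ]; then
        echo "command still running after ${_limit}; treating it as stalled" >&2
    fi
    return "${_status}"
}
```

For example, `run_with_limit 1h "${MODKIT_PATH}" pileup ...` with the same arguments as above would return 124 if the run is still at the sampling stage after an hour.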
