BIOIN-2656 add indel category to report#258

Open
rinashalibo wants to merge 10 commits into master from rinas/add_indel_category

Conversation

Collaborator

@rinashalibo rinashalibo commented Feb 22, 2026

Note

Low Risk
Primarily reporting/test-schema changes and a small import refactor; risk is limited to downstream consumers expecting the old CSV column layout or category list.

Overview
Extends concordance/report outputs to include an aucpr (area under PR curve) metric in the generated expected.out.stats.csv test fixture, shifting the CSV schema accordingly.
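For readers unfamiliar with the metric, here is a minimal sketch of what an area under the precision-recall curve measures. The PR does not show how `aucpr` is actually computed, so the trapezoidal rule, the function name `area_under_pr_curve`, and the sample points below are all assumptions made for illustration.

```python
def area_under_pr_curve(recall, precision):
    """Trapezoidal area under a precision-recall curve.

    Hypothetical helper for illustration only; the pipeline's own
    aucpr computation may use a different integration scheme.
    """
    # Sort the (recall, precision) pairs so we integrate left to right.
    points = sorted(zip(recall, precision))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        # Width of the recall interval times the mean precision over it.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Made-up operating points, not taken from the test fixture.
recall = [0.0, 0.5, 1.0]
precision = [1.0, 0.8, 0.6]
aucpr = area_under_pr_curve(recall, precision)
```

A value close to 1.0 means precision stays high across the whole recall range; the extra CSV column carries one such number per reported row.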

Adds a new reporting category, non-hmer Indel + hmer Indel <=12, and implements its filter logic in report_utils.__filter_by_category, enabling combined indel performance summaries in createVarReport.ipynb.

Refactors run_no_gt_report.py to import and call annotate_concordance directly from ugbio_comparison.vcf_comparison_utils (instead of via comparison_utils).

Written by Cursor Bugbot for commit e30240d. This will update automatically on new commits.

Copilot AI left a comment

Pull request overview

This PR adds a new indel category "non-hmer Indel + hmer Indel <=12" to the variant reporting system, combining non-homopolymer indels with short homopolymer indels (length ≤12).

Changes:

  • Added filtering logic for the new combined indel category in the report utilities
  • Extended the category list in the variant report notebook to include the new category
  • Updated the ugbio_utils subproject commit reference

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| ugvc/reports/report_utils.py | Implements filtering logic for the new combined indel category |
| ugvc/reports/createVarReport.ipynb | Adds the new category to the reporting categories list |
| ugbio_utils | Updates subproject commit reference |

Comments suppressed due to low confidence (1)

ugvc/reports/report_utils.py:533

  • This line exceeds 120 characters and contains complex nested boolean logic that is difficult to read. Consider breaking this into multiple lines or extracting the conditions into named variables for better readability.
            result = data[((data["indel"]) & (data["hmer_length"] == 0) & (data["indel_length"] > 0)) | ((data["indel"]) & (data["hmer_length"] > 0) & (data["hmer_length"] <= 12))]            
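
One possible way to act on this suggestion is sketched below. It assumes `data` is a pandas DataFrame with a boolean `indel` column and numeric `hmer_length` and `indel_length` columns, as the quoted line implies. The function name `filter_combined_indels`, the constant `MAX_HMER_LENGTH`, and the sample rows are hypothetical and not part of the PR.

```python
import pandas as pd

# Threshold taken from the category name "hmer Indel <=12".
MAX_HMER_LENGTH = 12


def filter_combined_indels(data: pd.DataFrame) -> pd.DataFrame:
    """Keep non-hmer indels plus hmer indels up to MAX_HMER_LENGTH.

    Hypothetical helper: same boolean logic as the quoted one-liner,
    split into named masks for readability.
    """
    is_indel = data["indel"]
    non_hmer_indel = (data["hmer_length"] == 0) & (data["indel_length"] > 0)
    short_hmer_indel = (data["hmer_length"] > 0) & (data["hmer_length"] <= MAX_HMER_LENGTH)
    # (A & B) | (A & C) is equivalent to A & (B | C), so `indel` is tested once.
    return data[is_indel & (non_hmer_indel | short_hmer_indel)]


# Made-up rows covering each branch of the condition.
variants = pd.DataFrame(
    {
        "indel": [True, True, True, False, True],
        "hmer_length": [0, 5, 13, 0, 12],
        "indel_length": [2, 1, 1, 0, 1],
    }
)
selected = filter_combined_indels(variants)
```

Factoring out the shared `indel` test keeps every line under 120 characters and makes the two sub-categories visible by name.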

Collaborator

@doron-st doron-st left a comment

suggested simplification

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

3 participants