Skip to content

Mitocheck analysis#79

Open
axiomcura wants to merge 29 commits intoWayScience:mainfrom
axiomcura:mitocheck-analysis
Open

Mitocheck analysis#79
axiomcura wants to merge 29 commits intoWayScience:mainfrom
axiomcura:mitocheck-analysis

Conversation

@axiomcura
Copy link
Member

@axiomcura axiomcura commented Mar 6, 2026

This PR adds the buscar results after applying the method to Mitocheck.

This will be merged once #75 is merged.

Note to reviewer

Most of the changes come from the analysis notebooks. The main parts that need review are:

  • metric.py
  • the 3.mitocheck-analysis folder

Please ignore the cfret analysis for now. I’m not sure why those results appeared after merging #75. I will submit another PR rerunning the cfret analysis to confirm whether any bugs or errors were introduced after the merge.

Also, please don’t focus too much on the 4.cpjump1-analysis folder, as that analysis is still in progress.

axiomcura and others added 27 commits February 12, 2026 11:27
…alysis.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…alysis.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…_base_on_moa.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…alysis.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…otypic-activity.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ores.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@axiomcura axiomcura requested a review from Copilot March 6, 2026 21:39
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds Mitocheck Buscar analysis outputs (notebooks/scripts + generated plots/tables) and updates core utility/metric code used by the analysis workflows.

Changes:

  • Extend shuffle_feature_profiles and add an ENSG→gene-symbol helper for Mitocheck preprocessing.
  • Refactor/extend buscar.metrics scoring + EMD threading controls used by downstream analyses.
  • Add Mitocheck analysis runner scripts plus converted notebooks and plot-generation R scripts/artifacts.

Reviewed changes

Copilot reviewed 30 out of 54 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
utils/data_utils.py Adds label shuffling option and ENSG→gene symbol API helper used in Mitocheck workflows.
r_buscar_env.yaml Expands environment with Python runtime + core Python deps used in Buscar workflows.
notebooks/4.cpjump1-analysis/nbconverted/5.cpjump_u2os_MoA_analysis.py Removes converted notebook script (analysis reshuffling/cleanup).
notebooks/4.cpjump1-analysis/nbconverted/3.calculate-on-off-scores.py Removes converted notebook script (analysis reshuffling/cleanup).
notebooks/4.cpjump1-analysis/nbconverted/2.run_buscar_rankings_base_on_moa.py Updates MoA ranking workflow to new shuffling + metrics API.
notebooks/4.cpjump1-analysis/nbconverted/2.assess-heterogeneity.py Removes converted notebook script (analysis reshuffling/cleanup).
notebooks/4.cpjump1-analysis/nbconverted/1.generate-on-off-signatures.py Minor notebook export tweaks and output changes.
notebooks/4.cpjump1-analysis/3.calculate-on-off-scores.ipynb Removes notebook.
notebooks/4.cpjump1-analysis/2.run_buscar_rankings_base_on_moa.ipynb Adds updated notebook version of MoA ranking run.
notebooks/4.cpjump1-analysis/2.assess-heterogeneity.ipynb Removes notebook.
notebooks/4.cpjump1-analysis/1.generate-on-off-signatures.ipynb Updates to use buscar.signatures.get_signatures and refreshes outputs.
notebooks/3.mitocheck-analysis/run-mitocheck-buscar.sh Adds a shell entrypoint to nbconvert + run Mitocheck Python analysis scripts.
notebooks/3.mitocheck-analysis/plots/run-mitocheck-plots.sh Adds a shell entrypoint to nbconvert + run Mitocheck R plotting scripts.
notebooks/3.mitocheck-analysis/plots/nbconverted/1.heat-map-on-off-scores.r Adds R script to render Mitocheck heatmaps.
notebooks/3.mitocheck-analysis/plots/nbconverted/2.gene-ranking-relationship.r Adds R script to quantify/correlate rank consistency across phenotypic states.
notebooks/3.mitocheck-analysis/plots/nbconverted/3.linear-modeling-ranking-and-proportion.r Adds R script for rank vs. phenotype-proportion modeling/plots.
notebooks/3.mitocheck-analysis/plots/all-plots/gene-ranking-relationships/profile_specificity_real.csv Adds generated summary table artifact.
notebooks/3.mitocheck-analysis/plots/all-plots/gene-ranking-relationships/profile_specificity_real_vs_shuffled.csv Adds generated comparison summary artifact.
notebooks/3.mitocheck-analysis/plots/all-plots/gene-ranking-relationships/kendall_w_real_permutation_summary.csv Adds generated Kendall’s W permutation summary artifact.
notebooks/3.mitocheck-analysis/plots/all-plots/gene-ranking-relationships/kendall_w_real_shuffled_permutation_summary.csv Adds generated real vs shuffled Kendall’s W summary artifact.
notebooks/3.mitocheck-analysis/nbconverted/0.explore-mitocheck-data.py Adds converted Mitocheck EDA script.
notebooks/3.mitocheck-analysis/nbconverted/1.buscar-analysis.py Adds converted Mitocheck Buscar analysis (including LOGO evaluation).
notebooks/3.mitocheck-analysis/1.buscar-analysis.ipynb Adds the Mitocheck analysis notebook source.
notebooks/2.cfret-analysis/4.assess-signature-significance.ipynb Notebook markdown/content adjustments.
notebooks/2.cfret-analysis/3.generate-pca-umap-components.ipynb Notebook output changes (cell counts).
notebooks/2.cfret-analysis/1.cfret-pilot-buscar-analysis.ipynb Notebook metadata version change.
notebooks/0.download-data/2.preprocessing.ipynb Uses ENSG→symbol conversion + stores mapping for Mitocheck.
buscar/metrics.py Refactors score computation and adds EMD threading + optional normalization behavior.
.pre-commit-config.yaml Updates Ruff pre-commit revision.
Comments suppressed due to low confidence (1)

notebooks/4.cpjump1-analysis/nbconverted/2.run_buscar_rankings_base_on_moa.py:377

  • measure_phenotypic_activity is invoked with emd_n_threads, but the function signature uses n_threads. As written, this will raise TypeError: unexpected keyword argument 'emd_n_threads'. Update the call site to use n_threads (or align the API if emd_n_threads is intended).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +266 to +270
else:
raise ValueError(
f"Invalid off_calculation '{off_calculation}'. Must be 'ratio_affected'"
" or 'emd'."
)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the signature_type == "off" branch, the else: raises for any off_calculation other than "ratio_affected", which makes off_calculation="emd" (allowed by the signature) fail at runtime. Add the EMD off-score branch or remove "emd" from the accepted values.

Copilot uses AI. Check for mistakes.
Comment on lines 408 to 410
# extract morphological features for reference condition (excluding metadata)
ref_profile = profiles.filter(pl.col(treatment_col) == ref_state).drop(
meta_cols
)
ref_profile = profiles.filter(pl.col(state_col) == ref_state).drop(meta_cols)

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state_col is allowed to be None, but it is used unconditionally in pl.col(state_col) when building ref_profile. This will fail at runtime for callers that don't pass state_col. Set state_col = treatment_col when state_col is None, and validate state_col exists / has no nulls similar to treatment_col.

Copilot uses AI. Check for mistakes.
signature: list[str],
signature_type: Literal["on", "off"],
on_calculation: Literal["emd"] = "emd",
off_calculation: Literal["ratio_affected", "emd"] = "ratio_affected",
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calculate_score (and measure_phenotypic_activity) accept off_calculation/off_method="emd" via the type hints and docs, but there is no implementation path for off-score EMD. Either implement the EMD off-score (mirroring the on-score branch) or remove "emd" from the accepted values to avoid a runtime error.

Suggested change
off_calculation: Literal["ratio_affected", "emd"] = "ratio_affected",
off_calculation: Literal["ratio_affected"] = "ratio_affected",

Copilot uses AI. Check for mistakes.
Comment on lines +736 to +740
r = requests.post(
f"{server}/lookup/id",
headers=headers,
data=json.dumps({"ids": ensg_ids}), # note: "ids" not "id"
)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requests.post(...) is made without a timeout. In analysis/utility code this can hang indefinitely on network issues. Consider adding a reasonable timeout (and optionally retries/backoff) so notebook runs fail fast and predictably.

Copilot uses AI. Check for mistakes.
Comment on lines +753 to +757
gene_info = decoded.get(eng_gene)
if gene_info is not None:
decoded_genes[eng_gene] = gene_info.get("display_name", eng_gene)
else:
decoded_genes[eng_gene] = eng_gene
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring claims unresolved ENSG IDs are mapped to "N/A", but the implementation maps missing IDs back to the original ENSG ID (decoded_genes[eng_gene] = eng_gene). Either return "N/A" here or update the docstring to match the behavior (and consider returning {} early when ensg_ids is an empty list).

Copilot uses AI. Check for mistakes.
Comment on lines 14 to 18
import numpy as np
import pandas as pd
import polars as pl
import requests
from pycytominer.cyto_utils import infer_cp_features
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module now depends on requests (imported here) for transform_ensg_to_gene_symbol. Ensure requests is declared as a direct project dependency (not just transitively via another package), otherwise fresh installs may fail when the transitive dependency set changes.

Copilot uses AI. Check for mistakes.
Comment on lines +742 to +745
if not r.ok:
r.raise_for_status()
sys.exit(1)

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling sys.exit(1) inside a utility function is a surprising side effect and prevents callers from handling API failures. Prefer raising an exception (after raise_for_status()), or returning an error value that the caller can act on.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants