Conversation
…alysis.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…alysis.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…_base_on_moa.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…alysis.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…otypic-activity.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ores.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Pull request overview
This PR adds Mitocheck Buscar analysis outputs (notebooks/scripts + generated plots/tables) and updates core utility/metric code used by the analysis workflows.
Changes:
- Extend
shuffle_feature_profilesand add an ENSG→gene-symbol helper for Mitocheck preprocessing. - Refactor/extend
buscar.metricsscoring + EMD threading controls used by downstream analyses. - Add Mitocheck analysis runner scripts plus converted notebooks and plot-generation R scripts/artifacts.
Reviewed changes
Copilot reviewed 30 out of 54 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
utils/data_utils.py |
Adds label shuffling option and ENSG→gene symbol API helper used in Mitocheck workflows. |
r_buscar_env.yaml |
Expands environment with Python runtime + core Python deps used in Buscar workflows. |
notebooks/4.cpjump1-analysis/nbconverted/5.cpjump_u2os_MoA_analysis.py |
Removes converted notebook script (analysis reshuffling/cleanup). |
notebooks/4.cpjump1-analysis/nbconverted/3.calculate-on-off-scores.py |
Removes converted notebook script (analysis reshuffling/cleanup). |
notebooks/4.cpjump1-analysis/nbconverted/2.run_buscar_rankings_base_on_moa.py |
Updates MoA ranking workflow to new shuffling + metrics API. |
notebooks/4.cpjump1-analysis/nbconverted/2.assess-heterogeneity.py |
Removes converted notebook script (analysis reshuffling/cleanup). |
notebooks/4.cpjump1-analysis/nbconverted/1.generate-on-off-signatures.py |
Minor notebook export tweaks and output changes. |
notebooks/4.cpjump1-analysis/3.calculate-on-off-scores.ipynb |
Removes notebook. |
notebooks/4.cpjump1-analysis/2.run_buscar_rankings_base_on_moa.ipynb |
Adds updated notebook version of MoA ranking run. |
notebooks/4.cpjump1-analysis/2.assess-heterogeneity.ipynb |
Removes notebook. |
notebooks/4.cpjump1-analysis/1.generate-on-off-signatures.ipynb |
Updates to use buscar.signatures.get_signatures and refreshes outputs. |
notebooks/3.mitocheck-analysis/run-mitocheck-buscar.sh |
Adds a shell entrypoint to nbconvert + run Mitocheck Python analysis scripts. |
notebooks/3.mitocheck-analysis/plots/run-mitocheck-plots.sh |
Adds a shell entrypoint to nbconvert + run Mitocheck R plotting scripts. |
notebooks/3.mitocheck-analysis/plots/nbconverted/1.heat-map-on-off-scores.r |
Adds R script to render Mitocheck heatmaps. |
notebooks/3.mitocheck-analysis/plots/nbconverted/2.gene-ranking-relationship.r |
Adds R script to quantify/correlate rank consistency across phenotypic states. |
notebooks/3.mitocheck-analysis/plots/nbconverted/3.linear-modeling-ranking-and-proportion.r |
Adds R script for rank vs. phenotype-proportion modeling/plots. |
notebooks/3.mitocheck-analysis/plots/all-plots/gene-ranking-relationships/profile_specificity_real.csv |
Adds generated summary table artifact. |
notebooks/3.mitocheck-analysis/plots/all-plots/gene-ranking-relationships/profile_specificity_real_vs_shuffled.csv |
Adds generated comparison summary artifact. |
notebooks/3.mitocheck-analysis/plots/all-plots/gene-ranking-relationships/kendall_w_real_permutation_summary.csv |
Adds generated Kendall’s W permutation summary artifact. |
notebooks/3.mitocheck-analysis/plots/all-plots/gene-ranking-relationships/kendall_w_real_shuffled_permutation_summary.csv |
Adds generated real vs shuffled Kendall’s W summary artifact. |
notebooks/3.mitocheck-analysis/nbconverted/0.explore-mitocheck-data.py |
Adds converted Mitocheck EDA script. |
notebooks/3.mitocheck-analysis/nbconverted/1.buscar-analysis.py |
Adds converted Mitocheck Buscar analysis (including LOGO evaluation). |
notebooks/3.mitocheck-analysis/1.buscar-analysis.ipynb |
Adds the Mitocheck analysis notebook source. |
notebooks/2.cfret-analysis/4.assess-signature-significance.ipynb |
Notebook markdown/content adjustments. |
notebooks/2.cfret-analysis/3.generate-pca-umap-components.ipynb |
Notebook output changes (cell counts). |
notebooks/2.cfret-analysis/1.cfret-pilot-buscar-analysis.ipynb |
Notebook metadata version change. |
notebooks/0.download-data/2.preprocessing.ipynb |
Uses ENSG→symbol conversion + stores mapping for Mitocheck. |
buscar/metrics.py |
Refactors score computation and adds EMD threading + optional normalization behavior. |
.pre-commit-config.yaml |
Updates Ruff pre-commit revision. |
Comments suppressed due to low confidence (1)
notebooks/4.cpjump1-analysis/nbconverted/2.run_buscar_rankings_base_on_moa.py:377
measure_phenotypic_activityis invoked withemd_n_threads, but the function signature usesn_threads. As written, this will raiseTypeError: unexpected keyword argument 'emd_n_threads'. Update the call site to usen_threads(or align the API ifemd_n_threadsis intended).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| else: | ||
| raise ValueError( | ||
| f"Invalid off_calculation '{off_calculation}'. Must be 'ratio_affected'" | ||
| " or 'emd'." | ||
| ) |
There was a problem hiding this comment.
In the signature_type == "off" branch, the else: raises for any off_calculation other than "ratio_affected", which makes off_calculation="emd" (allowed by the signature) fail at runtime. Add the EMD off-score branch or remove "emd" from the accepted values.
| # extract morphological features for reference condition (excluding metadata) | ||
| ref_profile = profiles.filter(pl.col(treatment_col) == ref_state).drop( | ||
| meta_cols | ||
| ) | ||
| ref_profile = profiles.filter(pl.col(state_col) == ref_state).drop(meta_cols) | ||
|
|
There was a problem hiding this comment.
state_col is allowed to be None, but it is used unconditionally in pl.col(state_col) when building ref_profile. This will fail at runtime for callers that don't pass state_col. Set state_col = treatment_col when state_col is None, and validate state_col exists / has no nulls similar to treatment_col.
| signature: list[str], | ||
| signature_type: Literal["on", "off"], | ||
| on_calculation: Literal["emd"] = "emd", | ||
| off_calculation: Literal["ratio_affected", "emd"] = "ratio_affected", |
There was a problem hiding this comment.
calculate_score (and measure_phenotypic_activity) accept off_calculation/off_method="emd" via the type hints and docs, but there is no implementation path for off-score EMD. Either implement the EMD off-score (mirroring the on-score branch) or remove "emd" from the accepted values to avoid a runtime error.
| off_calculation: Literal["ratio_affected", "emd"] = "ratio_affected", | |
| off_calculation: Literal["ratio_affected"] = "ratio_affected", |
| r = requests.post( | ||
| f"{server}/lookup/id", | ||
| headers=headers, | ||
| data=json.dumps({"ids": ensg_ids}), # note: "ids" not "id" | ||
| ) |
There was a problem hiding this comment.
requests.post(...) is made without a timeout. In analysis/utility code this can hang indefinitely on network issues. Consider adding a reasonable timeout (and optionally retries/backoff) so notebook runs fail fast and predictably.
| gene_info = decoded.get(eng_gene) | ||
| if gene_info is not None: | ||
| decoded_genes[eng_gene] = gene_info.get("display_name", eng_gene) | ||
| else: | ||
| decoded_genes[eng_gene] = eng_gene |
There was a problem hiding this comment.
Docstring claims unresolved ENSG IDs are mapped to "N/A", but the implementation maps missing IDs back to the original ENSG ID (decoded_genes[eng_gene] = eng_gene). Either return "N/A" here or update the docstring to match the behavior (and consider returning {} early when ensg_ids is an empty list).
| import numpy as np | ||
| import pandas as pd | ||
| import polars as pl | ||
| import requests | ||
| from pycytominer.cyto_utils import infer_cp_features |
There was a problem hiding this comment.
This module now depends on requests (imported here) for transform_ensg_to_gene_symbol. Ensure requests is declared as a direct project dependency (not just transitively via another package), otherwise fresh installs may fail when the transitive dependency set changes.
| if not r.ok: | ||
| r.raise_for_status() | ||
| sys.exit(1) | ||
|
|
There was a problem hiding this comment.
Calling sys.exit(1) inside a utility function is a surprising side effect and prevents callers from handling API failures. Prefer raising an exception (after raise_for_status()), or returning an error value that the caller can act on.
This PR adds the buscar results after applying the method to Mitocheck.
This will be merged once #75 is merged.
Note to reviewer
Most of the changes come from the analysis notebooks. The main parts that need review are:
Please ignore the cfret analysis for now. I’m not sure why those results appeared after merging #75. I will submit another PR rerunning the cfret analysis to confirm whether any bugs or errors were introduced after the merge.
Also, please don’t focus too much on the 4.cpjump1-analysis folder, as that analysis is still in progress.