
New version including changes in SDRF validation, remove validation of simple txt files #69

Merged
ypriverol merged 6 commits into main from dev
Mar 21, 2026

Conversation

@ypriverol (Member) commented Mar 21, 2026

Summary by CodeRabbit

  • New Features

    • Added support for DIA‑NN unified design format validation.
  • Improvements

    • Simplified SDRF validation CLI, consolidated options, and made design input required.
    • Streamlined SDRF validation paths including a minimal-validation mode; recommended minimal columns now warn instead of failing.
  • Tests

    • Added integration and unit tests covering unified/legacy design parsing and validation edge cases.
  • Chores

    • Loosened sdrf-pipelines dependency to >=0.1.1, added a development container config, and bumped package version.

@coderabbitai bot (Contributor) commented Mar 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a development Dockerfile, relaxes the sdrf-pipelines dependency, updates an import path for UnimodDatabase, refactors SDRF validation/CLI to use unified validator with a minimal mode, and extends tests for DIA‑NN unified design parsing and error cases.

Changes

Cohort / File(s) / Summary

  • Development & Packaging (Dockerfile.dev, pyproject.toml, recipe/meta.yaml): Adds Dockerfile.dev; relaxes the sdrf-pipelines constraint (==0.0.33 → >=0.1.1); bumps the package version and removes the strict setuptools <78 constraint in the recipe.
  • Import Path Alignment (quantmsutils/diann/dianncfg.py): Changes the import for UnimodDatabase from sdrf_pipelines.openms.unimod to sdrf_pipelines.converters.openms.unimod.
  • SDRF Validation & CLI (quantmsutils/sdrf/check_samplesheet.py): Replaces multi-stage SDRF/expdesign validation with a single read/validate flow plus a new --minimal column-only validation; removes many skip flags and expdesign helpers; makes --exp_design required and adds --template/-t.
  • Tests — SDRF & DIA‑NN (tests/test_commands.py): Reworks SDRF CLI tests for --minimal mode; adds TestDiannUnifiedDesign covering unified-format design parsing, legacy compatibility, and error conditions for missing/inconsistent columns.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as "quantmsutils CLI"
    participant FS as "Filesystem (SDRF / design files)"
    participant Validator as "sdrf_pipelines (read_sdrf / validate_sdrf)"
    participant Parser as "quantmsutils.get_exp_design_dfs"
    CLI->>FS: open --sdrf / --exp_design file
    CLI->>Validator: read_sdrf(path) / validate_sdrf(template or minimal)
    Validator-->>CLI: validation result (ok / errors)
    CLI->>Parser: get_exp_design_dfs(validated_path)
    Parser-->>CLI: DataFrames or raise ValueError
    CLI->>FS: report errors or continue processing

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • jpfeuffer
  • daichengxin

Poem

🐰 I hopped into Docker with a tiny Python nest,
loosened a pin so newer libs could rest,
I nudged an import, taught SDRF to validate anew,
tests lined up and DIA‑NN found its cue,
hop, build, and run — a small rabbit's cheer for you!

🚥 Pre-merge checks | ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title partially aligns with the changeset: it mentions SDRF validation changes and removing validation of simple txt files, which are reflected in the modifications to check_samplesheet.py. However, it does not highlight the primary change, the new 'minimal' validation mode that allows running workflows with minimal metadata while preserving SDRF structure, which is the main objective of this PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
tests/test_commands.py (1)

110-118: This only tests the bypass branch.

--skip_sdrf_validation exits before read_sdrf() / validate_sdrf() run, so the refactored validation path in this PR is still untested. Please add at least one non-skip case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_commands.py` around lines 110 - 118, The test only covers the
branch that bypasses SDRF validation; add a complementary test that exercises
the real validation path by calling run_cli_command("checksamplesheet", args)
without the "--skip_sdrf_validation" flag and using a valid or intentionally
invalid SDRF file from TEST_DATA_DIR (e.g., "PXD000001.sdrf.tsv" or another
fixture) so that read_sdrf() and validate_sdrf() are invoked; create a new test
function (e.g., test_check_samplesheet_sdrf_validate) or parametrize the
existing test to include both skip and non-skip cases and assert the appropriate
exit_code and/or error messages from validate_sdrf().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile.dev`:
- Around line 1-5: The Dockerfile leaves the container running as root; create
and switch to a non-root user: add user/group creation (e.g., addgroup/adduser
or groupadd/useradd), create the project directory if needed, chown /src to that
user, and add a USER <username> line before installing/running app commands so
subsequent steps run as the non-root user; update relevant RUN/COPY/WORKDIR
sequences in the Dockerfile to ensure ownership is set (references: WORKDIR
/src, COPY . ., RUN pip install --no-cache-dir .).

In `@pyproject.toml`:
- Line 34: The dependency specification for sdrf-pipelines is too open
(sdrf-pipelines = ">=0.1.1") and could allow breaking pre-1.0 API changes that
this PR relies on (UnimodDatabase, read_sdrf); update pyproject.toml to pin a
stable upper bound such as ">=0.1.1,<0.2.0" (or use a compatible spec like
"~=0.1.1") so installs remain reproducible and the expected APIs
(UnimodDatabase, read_sdrf) won't change unexpectedly.

In `@quantmsutils/sdrf/check_samplesheet.py`:
- Around line 1-5: The file references os and errno inside the make_dir() helper
but those modules are not imported, causing F821; either restore the missing
imports (add import os and import errno at the top) or remove/replace make_dir()
usage; specifically update check_samplesheet.py to import os and errno so
make_dir() can call os.path.exists, os.makedirs and errno.EEXIST, or
refactor/inline the functionality where make_dir() is used (look for the
make_dir function and call sites to ensure consistency).
- Around line 46-50: Change the CLI option that supplies the SDRF path to use
click.Path(exists=True, dir_okay=False, readable=True) so the shell/CLI
validates file existence for the parameter (the variable passed into read_sdrf
is input_sdrf / the --exp_design option). Wrap calls to read_sdrf(input_sdrf)
and df.validate_sdrf(...) in a try/except that catches exceptions from
parsing/validation and re-raises them as click.ClickException with a clear
message (include the original exception text) so failures surface as
CLI-friendly errors; apply the same pattern to the second occurrence covering
lines ~63–80.

---

Nitpick comments:
In `@tests/test_commands.py`:
- Around line 110-118: The test only covers the branch that bypasses SDRF
validation; add a complementary test that exercises the real validation path by
calling run_cli_command("checksamplesheet", args) without the
"--skip_sdrf_validation" flag and using a valid or intentionally invalid SDRF
file from TEST_DATA_DIR (e.g., "PXD000001.sdrf.tsv" or another fixture) so that
read_sdrf() and validate_sdrf() are invoked; create a new test function (e.g.,
test_check_samplesheet_sdrf_validate) or parametrize the existing test to
include both skip and non-skip cases and assert the appropriate exit_code and/or
error messages from validate_sdrf().

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9390029a-3072-4eff-b38f-0d00d472e928

📥 Commits

Reviewing files that changed from the base of the PR and between cd0b93f and 0a6c80e.

⛔ Files ignored due to path filters (1)
  • tests/test_data/diann2msstats/PXD026600_diann_design.tsv is excluded by !**/*.tsv
📒 Files selected for processing (5)
  • Dockerfile.dev
  • pyproject.toml
  • quantmsutils/diann/dianncfg.py
  • quantmsutils/sdrf/check_samplesheet.py
  • tests/test_commands.py

Comment on lines +1 to +5
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends git procps libglib2.0-0t64 && rm -rf /var/lib/apt/lists/*
WORKDIR /src
COPY . .
RUN pip install --no-cache-dir .

⚠️ Potential issue | 🟠 Major

Run the dev container as a non-root user.

The image never switches away from root. That keeps the whole dev session over-privileged and will also leave bind-mounted files owned by root on the host.

Suggested change
 FROM python:3.11-slim
-RUN apt-get update && apt-get install -y --no-install-recommends git procps libglib2.0-0t64 && rm -rf /var/lib/apt/lists/*
+RUN apt-get update \
+ && apt-get install -y --no-install-recommends git procps libglib2.0-0t64 \
+ && useradd --create-home --shell /bin/bash appuser \
+ && rm -rf /var/lib/apt/lists/*
 WORKDIR /src
 COPY . .
 RUN pip install --no-cache-dir .
+RUN chown -R appuser:appuser /src
+USER appuser
🧰 Tools
🪛 Trivy (0.69.3)

[error] 1-1: Image user should not be 'root'

Specify at least 1 USER command in Dockerfile with non-root user as argument

Rule: DS-0002

Learn more

(IaC/Dockerfile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.dev` around lines 1 - 5, The Dockerfile leaves the container
running as root; create and switch to a non-root user: add user/group creation
(e.g., addgroup/adduser or groupadd/useradd), create the project directory if
needed, chown /src to that user, and add a USER <username> line before
installing/running app commands so subsequent steps run as the non-root user;
update relevant RUN/COPY/WORKDIR sequences in the Dockerfile to ensure ownership
is set (references: WORKDIR /src, COPY . ., RUN pip install --no-cache-dir .).

python = "*"
click = "*"
sdrf-pipelines = "==0.0.33"
sdrf-pipelines = ">=0.1.1"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Which released sdrf-pipelines versions expose both sdrf_pipelines.converters.openms.unimod.UnimodDatabase and sdrf_pipelines.sdrf.sdrf.read_sdrf / validate_sdrf, and what version range should downstream projects pin to in order to preserve those APIs?

💡 Result:

No released versions of sdrf-pipelines explicitly documented as exposing both sdrf_pipelines.converters.openms.unimod.UnimodDatabase and sdrf_pipelines.sdrf.sdrf.read_sdrf / validate_sdrf APIs could be identified from available sources. The package has releases from 0.0.2 to 0.1.0 (latest as of 2026-02-04), primarily CLI-focused (parse_sdrf validate-sdrf, convert-openms). Repository structure confirms sdrf_pipelines/ directory with sdrf/ and converters/, suggesting Python APIs exist, but no public API docs, changelogs, or version-specific mentions found. Downstream projects should test specific versions (e.g., >=0.0.33 <0.1.0) or pin to latest stable 0.1.0 after verifying API presence via code inspection, as no breaking change evidence exists.



🏁 Script executed:

#!/bin/bash
# Check sdrf-pipelines versions available on PyPI and GitHub releases

# Try to fetch PyPI JSON API
echo "=== PyPI sdrf-pipelines releases ==="
curl -s "https://pypi.org/pypi/sdrf-pipelines/json" | jq -r '.releases | keys[]' | sort -V

echo -e "\n=== Latest version info ==="
curl -s "https://pypi.org/pypi/sdrf-pipelines/json" | jq '.info | {version, release_url}'

Repository: bigbio/quantms-utils

Length of output: 441


🏁 Script executed:

#!/bin/bash
# Verify the code uses the mentioned sdrf-pipelines APIs

echo "=== Searching for sdrf_pipelines imports in the codebase ==="
rg "from sdrf_pipelines|import sdrf_pipelines" -n

echo -e "\n=== Files that import sdrf_pipelines modules ==="
rg "sdrf_pipelines\." -l | head -20

Repository: bigbio/quantms-utils

Length of output: 433


Bound sdrf-pipelines to a stable version range.

This PR now depends on specific sdrf-pipelines APIs (UnimodDatabase, read_sdrf), but >=0.1.1 allows future pre-1.0 releases where these may change. Add an upper bound to keep installs reproducible.

Suggested change
-sdrf-pipelines = ">=0.1.1"
+sdrf-pipelines = ">=0.1.1,<0.2.0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 34, The dependency specification for sdrf-pipelines
is too open (sdrf-pipelines = ">=0.1.1") and could allow breaking pre-1.0 API
changes that this PR relies on (UnimodDatabase, read_sdrf); update
pyproject.toml to pin a stable upper bound such as ">=0.1.1,<0.2.0" (or use a
compatible spec like "~=0.1.1") so installs remain reproducible and the expected
APIs (UnimodDatabase, read_sdrf) won't change unexpectedly.
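The effect of the proposed upper bound can be checked with the `packaging` library (the same specifier machinery pip uses); the version numbers below are illustrative:

```python
# Illustrative check of what each specifier admits, using `packaging`.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

open_spec = SpecifierSet(">=0.1.1")       # current pyproject.toml constraint
bounded = SpecifierSet(">=0.1.1,<0.2.0")  # reviewer-suggested bound

print(Version("0.2.0") in open_spec)  # True: a potentially breaking minor release is allowed
print(Version("0.2.0") in bounded)    # False: blocked by the upper bound
print(Version("0.1.5") in bounded)    # True: patch releases within range still install
```

The `~=0.1.1` compatible-release form mentioned in the prompt is equivalent to `>=0.1.1,<0.2.0` for a three-part version.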

Comment on lines +46 to +50
df = read_sdrf(input_sdrf)
errors = df.validate_sdrf(
template="ms-proteomics",
use_ols_cache_only=use_ols_cache_only,
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Click, what is the recommended way to validate an input file option so missing paths fail as CLI usage errors rather than uncaught exceptions? Is click.Path(exists=True, dir_okay=False, readable=True) the right fit for this case?

💡 Result:

Yes, click.Path(exists=True, dir_okay=False, readable=True) is the recommended way to validate an input file option in Click. This ensures that if a missing or invalid path is provided (or if the option is required and omitted), Click raises a click.BadParameter exception during parameter validation, which is handled as a proper CLI usage error with formatted help text like "Error: Invalid value for '--input-file': Path 'missing.txt' does not exist." rather than propagating an uncaught exception later in the code. Key features from official docs: - exists=True: Requires the path to exist; missing paths fail validation immediately. - dir_okay=False: Rejects directories, accepting only files. - readable=True: Ensures the file is readable (default, but explicit for clarity). For an input file option, use it like: @click.option('--input-file', type=click.Path(exists=True, dir_okay=False, readable=True), required=True) If required=True is omitted, provide a path or it uses the default (if any); validation only applies to provided values. This catches issues early as CLI errors, preventing unhandled exceptions when accessing the path later.



🏁 Script executed:

# First, locate the file
find . -name "check_samplesheet.py" -type f

Repository: bigbio/quantms-utils

Length of output: 104


🏁 Script executed:

# Once found, examine the file around the target lines (40-90)
cat -n quantmsutils/sdrf/check_samplesheet.py | sed -n '35,85p'

Repository: bigbio/quantms-utils

Length of output: 1771


🏁 Script executed:

# Also check if there are any other similar option decorators in the file
rg -n "click\.option" quantmsutils/sdrf/check_samplesheet.py -A 2

Repository: bigbio/quantms-utils

Length of output: 369


🏁 Script executed:

# Check if there are other commands with click.option decorators for file paths
rg -B 5 "@click.command" quantmsutils/sdrf/check_samplesheet.py | head -50

Repository: bigbio/quantms-utils

Length of output: 110


🏁 Script executed:

# Get the complete file to understand all commands
wc -l quantmsutils/sdrf/check_samplesheet.py
cat -n quantmsutils/sdrf/check_samplesheet.py

Repository: bigbio/quantms-utils

Length of output: 2945


Use click.Path() to validate file existence at the CLI boundary, and wrap parser errors as ClickExceptions.

Currently, --exp_design accepts any string without validation, so missing files are only discovered inside read_sdrf() and bubble up as raw exceptions. Validate the path at the CLI boundary using click.Path(exists=True, dir_okay=False, readable=True) to catch missing files early. Additionally, wrap read_sdrf() and validate_sdrf() calls in a try-except block to translate parsing failures to ClickException, ensuring all errors surface as proper CLI usage messages rather than raw tracebacks.

Suggested changes
-@click.option("--exp_design", help="SDRF file to be validated", required=True)
+@click.option(
+    "--exp_design",
+    type=click.Path(exists=True, dir_okay=False, readable=True, path_type=str),
+    help="SDRF file to be validated",
+    required=True,
+)
 @click.option("--skip_sdrf_validation", help="Skip all SDRF validation", is_flag=True)
@@
-    df = read_sdrf(input_sdrf)
-    errors = df.validate_sdrf(
-        template="ms-proteomics",
-        use_ols_cache_only=use_ols_cache_only,
-    )
+    try:
+        df = read_sdrf(input_sdrf)
+        errors = df.validate_sdrf(
+            template="ms-proteomics",
+            use_ols_cache_only=use_ols_cache_only,
+        )
+    except Exception as exc:
+        raise click.ClickException(str(exc)) from exc

Also applies to: 63–80

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@quantmsutils/sdrf/check_samplesheet.py` around lines 46 - 50, Change the CLI
option that supplies the SDRF path to use click.Path(exists=True,
dir_okay=False, readable=True) so the shell/CLI validates file existence for the
parameter (the variable passed into read_sdrf is input_sdrf / the --exp_design
option). Wrap calls to read_sdrf(input_sdrf) and df.validate_sdrf(...) in a
try/except that catches exceptions from parsing/validation and re-raises them as
click.ClickException with a clear message (include the original exception text)
so failures surface as CLI-friendly errors; apply the same pattern to the second
occurrence covering lines ~63–80.
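The pattern the reviewer describes can be sketched as a self-contained toy command; the option name mirrors --exp_design, but the command body is a stand-in, not the actual quantmsutils code:

```python
# Sketch: click.Path(exists=True) rejects missing files at the CLI boundary,
# and ClickException turns parser failures into usage errors, not tracebacks.
import click
from click.testing import CliRunner

@click.command()
@click.option(
    "--exp_design",
    required=True,
    type=click.Path(exists=True, dir_okay=False, readable=True),
    help="SDRF file to be validated",
)
def check(exp_design):
    try:
        # stand-in for read_sdrf() / validate_sdrf(); any parse failure
        # surfaces as a CLI-friendly error instead of a raw traceback
        with open(exp_design) as handle:
            handle.read()
    except Exception as exc:
        raise click.ClickException(str(exc)) from exc
    click.echo("ok")

runner = CliRunner()
result = runner.invoke(check, ["--exp_design", "does_not_exist.tsv"])
print(result.exit_code)  # non-zero: rejected before the command body runs
print(result.output)     # formatted "does not exist" usage error
```

With this in place, a missing file never reaches read_sdrf() at all, and anything read_sdrf() does raise is reported in Click's standard error format.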

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@quantmsutils/sdrf/check_samplesheet.py`:
- Around line 5-6: Resolve the sdrf-pipelines version mismatch by choosing the
supported upstream version (either pin to 0.0.33 in pyproject.toml or update
requirements.txt to >=0.1.1), then update the code to match that version's API:
verify and, if necessary, change the import "from sdrf_pipelines.sdrf.sdrf
import read_sdrf" to the correct module path used by the selected version and
update calls to df.validate_sdrf(template=..., use_ols_cache_only=...) to match
the actual parameter names and signature in that version (rename or remove
use_ols_cache_only or template args as required); ensure both dependency files
reference the same version constraint so installs are consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8630374e-df9f-43fa-b364-685ef327004f

📥 Commits

Reviewing files that changed from the base of the PR and between 0a6c80e and 92c92ff.

📒 Files selected for processing (1)
  • quantmsutils/sdrf/check_samplesheet.py

Comment on lines +5 to +6

from sdrf_pipelines.sdrf.sdrf import read_sdrf

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

sdrf-pipelines 0.0.33 read_sdrf validate_sdrf API

💡 Result:

sdrf-pipelines 0.0.33 is a Python package (https://pypi.org/project/sdrf-pipelines/0.0.33/, released Nov 23, 2025) from the bigbio/sdrf-pipelines GitHub repo (https://github.com/bigbio/sdrf-pipelines). It is the official SDRF (Sample and Data Relationship Format) file validator and converter for proteomics/multiomics, translating SDRF files to pipeline configurations (e.g., OpenMS, MaxQuant, MSstats). Primary usage is via the CLI tool parse_sdrf: - validate-sdrf: Validates SDRF structure, formatting, uniqueness, and optionally ontology terms (EFO, CL, MS) using YAML schema templates (default: mass spectrometry). Example: parse_sdrf validate-sdrf --sdrf_file path/to/sdrf.tsv --template human --skip-ontology Simplified: parse_sdrf validate-sdrf-simple path/to/sdrf.tsv --template default No explicit Python API documentation for read_sdrf or validate_sdrf functions found in official sources (PyPI, GitHub README, COMMANDS.md referenced but not detailed). These may be internal functions called by the CLI (e.g., in sdrf_pipelines.sdrf.validators or parsers), accessible after pip install sdrf-pipelines by importing the module, but primarily designed for CLI use. Check GitHub source code (sdrf_pipelines/ directory) for programmatic access. Install with pip install sdrf-pipelines==0.0.33 or sdrf-pipelines[ontology] for full validation.



🏁 Script executed:

#!/bin/bash
# Verify sdrf-pipelines 0.0.33 API structure by checking GitHub source
# First, check if we can access the repo and look at the module structure at the tag for 0.0.33

# Try to fetch the raw content from GitHub at the specific version
# Check the structure of sdrf_pipelines.sdrf.sdrf module

curl -s "https://raw.githubusercontent.com/bigbio/sdrf-pipelines/0.0.33/sdrf_pipelines/sdrf/sdrf.py" | head -100

echo "---"

# Also check for SDRF class and validate_sdrf method
curl -s "https://raw.githubusercontent.com/bigbio/sdrf-pipelines/0.0.33/sdrf_pipelines/sdrf/sdrf.py" | grep -A 10 "def read_sdrf\|def validate_sdrf\|class.*SDRF" | head -50

Repository: bigbio/quantms-utils

Length of output: 81


🏁 Script executed:

#!/bin/bash
# Check for dependency specifications in the repository
find . -type f \( -name "setup.py" -o -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.cfg" \) | head -20

echo "---"

# Check the actual usage in the check_samplesheet.py file
cat -n quantmsutils/sdrf/check_samplesheet.py | head -50

Repository: bigbio/quantms-utils

Length of output: 1897


🏁 Script executed:

#!/bin/bash
# Check the requirements and pyproject.toml for sdrf-pipelines version
echo "=== requirements.txt ==="
cat requirements.txt | grep -i sdrf

echo ""
echo "=== pyproject.toml ==="
cat pyproject.toml | grep -A 5 -B 5 sdrf

Repository: bigbio/quantms-utils

Length of output: 285


🏁 Script executed:

#!/bin/bash
# Check the actual check_samplesheet.py to see how read_sdrf and validate_sdrf are being used
cat -n quantmsutils/sdrf/check_samplesheet.py

echo ""
echo "=== Checking if there are other files using these imports ==="
rg "read_sdrf|validate_sdrf" --type py

Repository: bigbio/quantms-utils

Length of output: 2914


Fix version conflict: requirements.txt (0.0.33) vs pyproject.toml (>=0.1.1).

The requirements.txt pins sdrf-pipelines==0.0.33 while pyproject.toml specifies sdrf-pipelines>=0.1.1. This creates an installation inconsistency. Verify whether the code targets 0.0.33 specifically or can work with 0.1.1+, then align both files. If the import path from sdrf_pipelines.sdrf.sdrf import read_sdrf and the method signature df.validate_sdrf(template=..., use_ols_cache_only=...) changed between versions, adjust the code accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@quantmsutils/sdrf/check_samplesheet.py` around lines 5 - 6, Resolve the
sdrf-pipelines version mismatch by choosing the supported upstream version
(either pin to 0.0.33 in pyproject.toml or update requirements.txt to >=0.1.1),
then update the code to match that version's API: verify and, if necessary,
change the import "from sdrf_pipelines.sdrf.sdrf import read_sdrf" to the
correct module path used by the selected version and update calls to
df.validate_sdrf(template=..., use_ols_cache_only=...) to match the actual
parameter names and signature in that version (rename or remove
use_ols_cache_only or template args as required); ensure both dependency files
reference the same version constraint so installs are consistent.

@ypriverol (Member, Author) commented:

We are tackling this issue in bigbio/quantms#646. We are adding a new option called minimal that allows running quantms and quantmsdiann with minimal metadata (no organism, no organism part, etc.): only the structure of the SDRF and the parameters needed for the workflow are required.
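A header-only check of that kind might look like the sketch below; the column names, helper name, and error format are illustrative assumptions, not the actual quantmsutils implementation:

```python
# Hypothetical sketch of a "minimal" SDRF check: verify only that the
# structural columns the workflow needs are present in the header.
import csv
import io

MINIMAL_COLUMNS = [
    "source name",
    "comment[data file]",
    "comment[cleavage agent details]",
]

def validate_minimal(sdrf_text: str) -> list[str]:
    """Return error messages for any missing minimal columns."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = [col.strip().lower() for col in next(reader)]
    return [
        f"Missing required column: {col}"
        for col in MINIMAL_COLUMNS
        if col not in header
    ]

sdrf = "source name\tcomment[data file]\nsample1\trun1.raw\n"
print(validate_minimal(sdrf))  # reports the missing cleavage-agent column
```

The real mode would presumably still run the column checks through sdrf-pipelines; this only illustrates the "structure without ontology metadata" idea.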

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
quantmsutils/sdrf/check_samplesheet.py (1)

54-61: ⚠️ Potential issue | 🟠 Major

Turn bad --exp_design inputs into CLI errors.

--exp_design is still a plain string, and failures from read_sdrf() / pd.read_csv() will currently bubble out as raw exceptions. Missing files and malformed SDRFs should fail as Click errors, not Python tracebacks.

Proposed fix
-@click.option("--exp_design", help="SDRF file to be validated", required=True)
+@click.option(
+    "--exp_design",
+    type=click.Path(exists=True, dir_okay=False, readable=True, path_type=str),
+    help="SDRF file to be validated",
+    required=True,
+)
@@
-    if minimal:
-        errors = _validate_minimal(input_sdrf)
-    else:
-        df = read_sdrf(input_sdrf)
-        errors = df.validate_sdrf(
-            template=template,
-            use_ols_cache_only=use_ols_cache_only,
-        )
+    try:
+        if minimal:
+            errors = _validate_minimal(input_sdrf)
+        else:
+            df = read_sdrf(input_sdrf)
+            errors = df.validate_sdrf(
+                template=template,
+                use_ols_cache_only=use_ols_cache_only,
+            )
+    except Exception as exc:
+        raise click.ClickException(str(exc)) from exc

Also applies to: 105-105, 121-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@quantmsutils/sdrf/check_samplesheet.py` around lines 54 - 61, Wrap SDRF
parsing/validation so filesystem and CSV parse errors become Click errors:
import click and catch exceptions raised by read_sdrf() / pandas (e.g.,
FileNotFoundError, pd.errors.ParserError, ValueError) and any exceptions from
df.validate_sdrf(...) and re-raise them as click.ClickException with a clear
message including input_sdrf and the original error; do the same for the
minimal-path call to _validate_minimal(input_sdrf) and for the other similar
blocks referenced (the code around the calls to read_sdrf, _validate_minimal,
and df.validate_sdrf at the other locations) so user-facing errors are emitted
instead of raw tracebacks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_commands.py`:
- Around line 141-148: The no-op expression f.name is unused after the
NamedTemporaryFile context; capture the temp file path into a variable and use
that in args. After writing in the with tempfile.NamedTemporaryFile(...) as f
block, assign the filename to a variable (e.g., temp_sdrf_path = f.name) and
then set args = ["--minimal", "--exp_design", temp_sdrf_path]; reference the
tempfile.NamedTemporaryFile usage and the args construction in
tests/test_commands.py to locate where to apply this change.

---

Duplicate comments:
In `@quantmsutils/sdrf/check_samplesheet.py`:
- Around line 54-61: Wrap SDRF parsing/validation so filesystem and CSV parse
errors become Click errors: import click and catch exceptions raised by
read_sdrf() / pandas (e.g., FileNotFoundError, pd.errors.ParserError,
ValueError) and any exceptions from df.validate_sdrf(...) and re-raise them as
click.ClickException with a clear message including input_sdrf and the original
error; do the same for the minimal-path call to _validate_minimal(input_sdrf)
and for the other similar blocks referenced (the code around the calls to
read_sdrf, _validate_minimal, and df.validate_sdrf at the other locations) so
user-facing errors are emitted instead of raw tracebacks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c7f062f0-4f30-4dcd-b69a-503e8fa74c3a

📥 Commits

Reviewing files that changed from the base of the PR and between 92c92ff and fdc415b.

📒 Files selected for processing (3)
  • quantmsutils/sdrf/check_samplesheet.py
  • recipe/meta.yaml
  • tests/test_commands.py

@coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
tests/test_commands.py (1)

126-149: Clean up temp SDRF files after assertions.

Both tests create files with delete=False and never remove them. This can leave residue across repeated runs.

Proposed cleanup pattern
         with tempfile.NamedTemporaryFile(mode="w", suffix=".sdrf.tsv", delete=False) as f:
             ...
             tmp_path = f.name
-        args = ["--minimal", "--exp_design", tmp_path]
-        result = run_cli_command("checksamplesheet", args)
-        assert result.exit_code == 0
+        try:
+            args = ["--minimal", "--exp_design", tmp_path]
+            result = run_cli_command("checksamplesheet", args)
+            assert result.exit_code == 0
+        finally:
+            if os.path.exists(tmp_path):
+                os.unlink(tmp_path)
         with tempfile.NamedTemporaryFile(mode="w", suffix=".sdrf.tsv", delete=False) as f:
             ...
             tmp_path = f.name
-        args = ["--minimal", "--exp_design", tmp_path]
-        result = run_cli_command("checksamplesheet", args)
-        assert result.exit_code != 0
-        assert "cleavage agent" in result.output.lower()
+        try:
+            args = ["--minimal", "--exp_design", tmp_path]
+            result = run_cli_command("checksamplesheet", args)
+            assert result.exit_code != 0
+            assert "cleavage agent" in result.output.lower()
+        finally:
+            if os.path.exists(tmp_path):
+                os.unlink(tmp_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_commands.py` around lines 126-149: the tests create temporary
SDRF files with delete=False (tmp_path) and never remove them. Update both
test_check_samplesheet_minimal and test_check_samplesheet_minimal_missing_column
to unlink the temp file after the test: wrap the run/assertions in a try/finally
(or use the pytest tmp_path fixture) and call os.unlink(tmp_path) in the finally
block so the file is always removed even if an assertion fails. The tmp_path
variable in the cleanup marks the file to remove.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c30345d-72b6-4d2f-b539-7662fdc6ae44

📥 Commits

Reviewing files that changed from the base of the PR and between fdc415b and 11ccc4c.

📒 Files selected for processing (1)
  • tests/test_commands.py

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
quantmsutils/sdrf/check_samplesheet.py (1)

75-78: Use the pandas idiom df.empty instead of len(df) == 0.

Minor style suggestion: df_rows.empty is the idiomatic pandas way to check for an empty DataFrame.

Suggested change
     df_rows = pd.read_csv(input_sdrf, sep="\t", nrows=1)
-    if len(df_rows) == 0:
+    if df_rows.empty:
         errors.append("ERROR: SDRF file contains a header but no data rows.")
         return errors
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@quantmsutils/sdrf/check_samplesheet.py` around lines 75-78: replace the
len(df_rows) == 0 check with the pandas-idiomatic df_rows.empty in the function
where df_rows is created (the block reading the SDRF with pd.read_csv in
check_samplesheet.py); specifically, update the conditional that appends "ERROR:
SDRF file contains a header but no data rows." to use df_rows.empty to detect an
empty DataFrame.
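A self-contained illustration of the idiom on a header-only SDRF; the in-memory StringIO buffer stands in for the real file path:

```python
import io

import pandas as pd

# A header-only TSV: pandas parses the column names but finds zero data rows.
header_only = io.StringIO("source name\tcomment[cleavage agent details]\n")
df_rows = pd.read_csv(header_only, sep="\t", nrows=1)

errors = []
if df_rows.empty:  # idiomatic; equivalent to len(df_rows) == 0 here
    errors.append("ERROR: SDRF file contains a header but no data rows.")
```

df.empty is True when the frame has no rows (or no columns), which matches the header-but-no-data case the check targets.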

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6cf28c03-b9f1-49e8-b87d-0a43edfdc43a

📥 Commits

Reviewing files that changed from the base of the PR and between 11ccc4c and 45f3cff.

📒 Files selected for processing (1)
  • quantmsutils/sdrf/check_samplesheet.py

@ypriverol ypriverol merged commit 3a1d020 into main Mar 21, 2026
7 checks passed