Conversation
13-task plan covering robustness fixes, DDA support, new DIA-NN params, InfinDIA groundwork, comprehensive documentation, and issue cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without pipefail, if the command before tee fails, tee returns 0 and the Nextflow task appears to succeed. This masked failures in generate_cfg, diann_msstats, samplesheet_check, and sdrf_parsing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
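The masking behavior described above can be reproduced with a minimal shell snippet (illustrative only, not taken from the pipeline modules):

```shell
# A pipeline's exit status is normally that of its LAST command,
# so a failing producer piped into `tee` looks like success.
bash -c 'false | tee /dev/null'
echo "without pipefail: $?"   # prints 0 -- the failure is masked

# With pipefail, the pipeline fails if ANY stage fails,
# which is what `set -o pipefail` restores in the Nextflow tasks.
bash -c 'set -o pipefail; false | tee /dev/null'
echo "with pipefail: $?"      # prints 1 -- the failure propagates
```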
These are the longest-running tasks and most susceptible to transient failures (OOM, I/O timeouts). The error_retry label enables automatic retry on signal exits (130-145, 104, 175). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Guard ch_searchdb and ch_experiment_meta with ifEmpty to fail fast with clear error messages instead of hanging indefinitely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds conf/diann_versions/v2_3_2.config with ghcr.io/bigbio/diann:2.3.2 container. Use -profile diann_v2_3_2 to opt in. Default stays 1.8.1. Enables DDA support and InfinDIA features. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New param diann_dda (boolean, default: false)
- Version guard: requires DIA-NN >= 2.3.2
- Passes --dda to all 5 DIA-NN modules when enabled
- Accepts DDA acquisition method in SDRF when diann_dda=true
- Added --dda to blocked lists in all modules

Closes #5 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_dda: BSA dataset with diann_dda=true on DIA-NN 2.3.2
- test_dia_skip_preanalysis: tests previously untested skip path

Both added to extended_ci.yml stage 2a. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- diann_light_models: 10x faster in-silico library generation
- diann_export_quant: fragment-level parquet export
- diann_site_ms1_quant: MS1 apex intensities for PTM quantification

All require DIA-NN >= 2.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Experimental support for InfinDIA (DIA-NN 2.3.0+). Passes --infin-dia to library generation when enabled. Version guard enforces >= 2.3.0. No test config — InfinDIA requires large databases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete reference for all ~70 pipeline parameters grouped by category with types, defaults, descriptions, and version requirements. Closes #1 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DDA mode documentation with limitations
- Missing param sections (preprocessing, extra_args scope, verbose output)
- DIA-NN version selection guide
- Parquet vs TSV output explanation
- MSstats format section
- pmultiqc citation added
- README updated with version table and parameter reference link

Closes #3, #9, #15 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add version guard for DIA-NN 2.0+ params (--light-models, --export-quant, --site-ms1-quant) to prevent crashes with 1.8.1
- Add *.site_report.parquet as optional output in FINAL_QUANTIFICATION for site-level PTM quantification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. test_dda.config: Add diann_version = '2.3.2' so the version guard
doesn't reject DDA mode (default is 1.8.1, guard requires >= 2.3.2)
2. quantmsdiann.nf: Update branch condition to also match "dda"
acquisition method. Previously "dda".contains("dia") was false,
causing all DDA files to be silently dropped from processing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These flags exist in DIA-NN 1.8.x but were removed in 2.3.x, causing 'unrecognised option' warnings. Only pass them for versions < 2.3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
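A shell-level sketch of this gating (the flag name below is a placeholder, since the commit does not list the removed options; `sort -V` stands in for the pipeline's version check):

```shell
diann_version="1.8.1"

# True when $1 sorts strictly before $2 under semantic version ordering.
version_lt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

legacy_flags=""
if version_lt "$diann_version" "2.3"; then
    # Placeholder for the 1.8.x-only options; DIA-NN 2.3.x would warn
    # "unrecognised option" if these were still passed.
    legacy_flags="--some-legacy-flag"
fi
echo "extra flags for $diann_version: $legacy_flags"
```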
Important: Review skipped (draft detected). Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration.
Configuration used: defaults. Review profile: CHILL. Plan: Pro. Run ID:
You can disable this status message in the settings. Use the checkbox below for a quick retry.
📝 Walkthrough
Added DDA support and related DIA-NN options across the pipeline, removed the tdf2mzml module and its manifest, updated documentation and citation (Zenodo DOI), introduced version guards and new test/config profiles, and added a VersionUtils helper for semantic version checks.

(Changes table and mermaid sequence diagram omitted.)

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed
Bruker .d to mzML conversion via tdf2mzml is no longer needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Merge dev branch (version bump, tdf2mzml removal, lint fixes, DOI update)
- Update test_dda.config to use PXD022287 HeLa DDA dataset with subset FASTA
- Add test_dda profile to CI matrix in ci.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test_dda profile uses ghcr.io/bigbio/diann:2.3.2 which is a private container requiring authentication. Add Docker login step (matching merge_ci.yml) conditioned on test_dda profile. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove implementation plan from repo, add docs/plans/ to .gitignore
- Add lib/VersionUtils.groovy for semantic version comparison (prevents string comparison bugs like '2.10.0' < '2.3')
- Update all version guards in dia.nf and module scripts to use VersionUtils.versionLessThan/versionAtLeast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
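The string-comparison pitfall the helper guards against can be demonstrated in a few lines of bash (`sort -V` is used here only to illustrate semantic ordering; the pipeline itself uses the Groovy helper):

```shell
# Lexicographic comparison walks character by character, so '1' < '3'
# makes "2.10.0" sort before "2.3" even though 2.10.0 is the newer release.
if [[ "2.10.0" < "2.3" ]]; then
    echo "string compare: 2.10.0 < 2.3 (wrong)"
fi

# Semantic ordering compares numeric components, getting it right.
lowest=$(printf '2.10.0\n2.3\n' | sort -V | head -n1)
echo "semantically lower version: $lowest"   # prints 2.3
```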
DDA analysis support is a major feature warranting a major version bump. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename output version→versions in sdrf_parsing/meta.yml
- Add ch_ prefix to input_file→ch_input_file in input_check/meta.yml
- Fix grammar in pmultiqc and diann_msstats meta.yml descriptions
- Fix glob pattern in decompress_dotd/meta.yml (double-dot expansion)
- Update CITATIONS.md to link published Nature Methods article
- Fix schema_input.json error messages (source name, whitespace)
- Standardize quantmsdiann keyword in utils meta.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: v1.0.0 — robustness, DDA support, new params, InfinDIA, docs
Support multiplexing
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)
38-74: ⚠️ Potential issue | 🟠 Major
Keep `test_dda` out of the lightweight `ci.yml` workflow.
Line 38 adds a private-container profile to the fast/public workflow, and lines 69-72 immediately have to skip it when GHCR credentials are absent. That means fork PRs can go green without ever exercising the DDA path. This profile belongs in `extended_ci.yml`/`merge_ci.yml`, while `ci.yml` should stay on the public 1.8.1 matrix.
Based on learnings: use the 3-tier CI/CD strategy. Every PR/push to dev tests the latest DIA-NN (2.2.0) + 1.8.1; merge dev→master runs the full version × feature matrix; `ci.yml` uses only the 1.8.1 public container. Tests using `ghcr.io/bigbio/diann:*` containers require GHCR authentication due to DIA-NN's academic-only license.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci.yml around lines 38 - 74, The CI workflow is including a private-container profile (test_dda) in the lightweight ci.yml and conditionally skipping GHCR login for forks; remove the test_dda profile and the "Log in to GitHub Container Registry" step from ci.yml (or move both into extended_ci.yml/merge_ci.yml) so ci.yml's matrix.test_profile and matrix.exec_profile only include the public/testable entries (e.g., keep test_dia, test_dia_dotd and exec_profile docker/public 1.8.1), and ensure references to GHCR_TOKEN, GHCR_USERNAME, and the "Log in to GitHub Container Registry" step exist only in the extended/merge workflows that run with GHCR credentials.
🧹 Nitpick comments (3)
workflows/quantmsdiann.nf (1)
61-65: Correct routing logic for DIA and DDA samples.
The branch condition correctly routes both DIA and DDA acquisition methods to the DIA subworkflow. Since `create_input_channel` normalizes `acquisition_method` to exactly `"dia"` or `"dda"` (lowercase), the logic is sound.
Consider renaming the branch from `dia` to something like `dia_dda` or `quantifiable` for clarity, since it now handles both acquisition methods. This is optional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@workflows/quantmsdiann.nf` around lines 61 - 65, The branch condition currently sends both DIA and DDA samples to the same path by checking item[0].acquisition_method (normalized by create_input_channel), which is correct; update the branch label/key from "dia" to a clearer name like "dia_dda" or "quantifiable" wherever used (the .branch closure producing the map key and downstream references e.g., FILE_PREPARATION.out.results .branch { ... } .set { ch_fileprep_result } and any consumers that read the branch map) so the intent is explicit while keeping the existing boolean condition unchanged.
modules/local/diann/insilico_library_generation/main.nf (1)
51-54: Add module-local guards for the new DIA-NN flags.
Lines 51-54 and 75-79 now emit `--dda`, `--light-models`, `--infin-dia`, and `--pre-select` directly from this process, but the minimum-version checks live only in `workflows/dia.nf`. Please keep the guard next to the flags here as well, so the module fails fast if it is reused outside this workflow.
Based on learnings: when adding a new DIA-NN feature, identify the minimum DIA-NN version required and add appropriate version checks in the module.
Also applies to: 75-79
docs/parameters.md (1)
61-64: Keep `--diann_dda` documented in one canonical spot.
It is described in both the general DIA-NN table and again in the dedicated DDA section. That duplication is easy to let drift; I'd keep the full parameter row once and let the other section be narrative guidance only.
Also applies to: 124-133
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/parameters.md` around lines 61 - 64, The parameter `--diann_dda` is duplicated: keep the full canonical table row for `--diann_dda` in the general DIA-NN parameters table and remove the duplicate table row in the dedicated DDA section; in that DDA section replace the duplicate row with a short narrative note referencing the `--diann_dda` flag (e.g., "use `--diann_dda` to explicitly enable DDA mode when SDRF lacks acquisition method") so the DDA section remains guidance-only and the parameter definition is maintained only once.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@conf/tests/test_dda.config`:
- Around line 3-10: The new DDA test profile in conf/tests/test_dda.config
breaks the repo convention that conf/tests/*.config are DIA-only; rename or
relocate the profile so it conforms: either rename the profile from test_dda to
test_dia_dda (and update any references to that profile) or move the file out of
conf/tests into the appropriate feature/profile directory for non-DIA tests;
update any references to the profile name (e.g., places that call --profile
test_dda) and apply the same change for the other occurrences mentioned (lines
~22-34).
In `@docs/usage.md`:
- Around line 98-134: The docs contain a duplicate "### Preprocessing Options"
section and document a parameter `--convert_dotd` that is not present in
nextflow_schema.json; remove the duplicated/preliminary block so only the
canonical Preprocessing Options remain (locate the repeated heading and its list
in docs/usage.md), and either add a matching parameter entry for `convert_dotd`
to nextflow_schema.json (using the exact parameter key convert_dotd) or remove
the `--convert_dotd` line from the docs; after updating parameters run `nf-core
pipelines schema build` to regenerate nextflow_schema.json and ensure the docs
and schema stay in sync.
In `@subworkflows/local/create_input_channel/main.nf`:
- Around line 63-64: The code currently uses rows[0] (base_row) after groupTuple
to pick per-file metadata (dissociation method, tolerances, variableMods, m/z
ranges, and FixedModifications via fixedMods[0]) which makes output
order-dependent and can drop row-specific values; update the logic in the block
that references base_row and the other similar blocks (the grouped handling
around groupTuple and the comparable logic in the 88-105 and 129-182 regions) to
collect all unique values across rows for each metadata field and emit them in
the scalar format (e.g., join / dedupe into a single string or array as required
by downstream workflows), and if multiple differing values are incompatible with
aggregation, fail fast with a clear error referencing the file/group;
specifically replace uses of rows[0], fixedMods[0], and any single-row lookups
with aggregation functions (unique/dedupe) or explicit validation that throws
when disagreements exist.
- Around line 33-40: The current logic that rewrites filestr when
params.local_input_type is set naively replaces everything after the last '.'
which breaks compressed filenames like sample.raw.gz (becoming sample.raw.mzML).
Update the filestr rewriting to detect common compression suffixes (e.g., .gz,
.bz2, .zip, .xz): if a compression suffix is present, split filestr into base
and compressionSuffix, replace the base's extension with params.local_input_type
and then reattach the compressionSuffix; otherwise perform the existing
single-extension replacement. Keep references to the same variables (filestr,
params.local_input_type, row.Filename) so the change integrates into the
existing block that builds filestr.
In `@workflows/dia.nf`:
- Around line 71-78: The ch_is_dda calculation mixes params.diann_dda with
SDRF-derived meta.acquisition_method causing downstream inconsistency: ENSURE
the final boolean is used consistently by either making params.diann_dda a
strict override or passing ch_is_dda into all DIA-NN modules; specifically,
update the logic around ch_is_dda (function/variable ch_is_dda) to treat
params.diann_dda as a definitive override when present, and propagate that same
boolean into the INSILICO_LIBRARY_GENERATION, ASSEMBLE_EMPIRICAL_LIBRARY, and
INDIVIDUAL_ANALYSIS module inputs so they all use the same DDA flag instead of
re-reading meta.acquisition_method.
---
Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 38-74: The CI workflow is including a private-container profile
(test_dda) in the lightweight ci.yml and conditionally skipping GHCR login for
forks; remove the test_dda profile and the "Log in to GitHub Container Registry"
step from ci.yml (or move both into extended_ci.yml/merge_ci.yml) so ci.yml's
matrix.test_profile and matrix.exec_profile only include the public/testable
entries (e.g., keep test_dia, test_dia_dotd and exec_profile docker/public
1.8.1), and ensure references to GHCR_TOKEN, GHCR_USERNAME, and the "Log in to
GitHub Container Registry" step exist only in the extended/merge workflows that
run with GHCR credentials.
---
Nitpick comments:
In `@docs/parameters.md`:
- Around line 61-64: The parameter `--diann_dda` is duplicated: keep the full
canonical table row for `--diann_dda` in the general DIA-NN parameters table and
remove the duplicate table row in the dedicated DDA section; in that DDA section
replace the duplicate row with a short narrative note referencing the
`--diann_dda` flag (e.g., "use `--diann_dda` to explicitly enable DDA mode when
SDRF lacks acquisition method") so the DDA section remains guidance-only and the
parameter definition is maintained only once.
In `@workflows/quantmsdiann.nf`:
- Around line 61-65: The branch condition currently sends both DIA and DDA
samples to the same path by checking item[0].acquisition_method (normalized by
create_input_channel), which is correct; update the branch label/key from "dia"
to a clearer name like "dia_dda" or "quantifiable" wherever used (the .branch
closure producing the map key and downstream references e.g.,
FILE_PREPARATION.out.results .branch { ... } .set { ch_fileprep_result } and any
consumers that read the branch map) so the intent is explicit while keeping the
existing boolean condition unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ed739da3-8eb4-4f5a-a26b-5b9a92709834
📒 Files selected for processing (32)
- .github/workflows/ci.yml
- .github/workflows/extended_ci.yml
- .gitignore
- CITATIONS.md
- assets/schema_input.json
- conf/diann_versions/v2_3_2.config
- conf/tests/test_dda.config
- conf/tests/test_dia_skip_preanalysis.config
- docs/parameters.md
- docs/usage.md
- lib/VersionUtils.groovy
- modules/local/diann/assemble_empirical_library/main.nf
- modules/local/diann/diann_msstats/main.nf
- modules/local/diann/diann_msstats/meta.yml
- modules/local/diann/final_quantification/main.nf
- modules/local/diann/generate_cfg/main.nf
- modules/local/diann/individual_analysis/main.nf
- modules/local/diann/insilico_library_generation/main.nf
- modules/local/diann/insilico_library_generation/meta.yml
- modules/local/diann/preliminary_analysis/main.nf
- modules/local/pmultiqc/meta.yml
- modules/local/samplesheet_check/main.nf
- modules/local/sdrf_parsing/main.nf
- modules/local/sdrf_parsing/meta.yml
- modules/local/utils/decompress_dotd/meta.yml
- nextflow.config
- nextflow_schema.json
- subworkflows/local/create_input_channel/main.nf
- subworkflows/local/input_check/meta.yml
- subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml
- workflows/dia.nf
- workflows/quantmsdiann.nf
✅ Files skipped from review due to trivial changes (12)
- .gitignore
- modules/local/utils/decompress_dotd/meta.yml
- modules/local/diann/diann_msstats/meta.yml
- subworkflows/local/input_check/meta.yml
- CITATIONS.md
- subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml
- modules/local/sdrf_parsing/meta.yml
- modules/local/diann/insilico_library_generation/meta.yml
- modules/local/pmultiqc/meta.yml
- assets/schema_input.json
- conf/diann_versions/v2_3_2.config
- conf/tests/test_dia_skip_preanalysis.config
🚧 Files skipped from review as they are similar to previous changes (1)
- nextflow.config
Nextflow config file for testing DDA analysis (requires DIA-NN >= 2.3.2)
Tests DDA mode using the PXD022287 HeLa dataset with --diann_dda flag.
Uses ghcr.io/bigbio/diann:2.3.2.

Use as follows:
    nextflow run bigbio/quantmsdiann -profile test_dda,docker [--outdir <OUTDIR>]
conf/tests/test_dda.config breaks the current test-profile contract.
This introduces a DDA profile under conf/tests and exposes it as test_dda, but this directory/naming scheme is currently reserved for DIA-only profiles. If DDA is now meant to be first-class here, the repository convention should be updated in the same PR; otherwise this profile should move or be renamed.
As per coding guidelines, conf/tests/*.config test profiles are DIA-only, and based on learnings new feature profiles should follow test_dia_<feature_name>.
Also applies to: 22-34
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@conf/tests/test_dda.config` around lines 3 - 10, The new DDA test profile in
conf/tests/test_dda.config breaks the repo convention that conf/tests/*.config
are DIA-only; rename or relocate the profile so it conforms: either rename the
profile from test_dda to test_dia_dda (and update any references to that
profile) or move the file out of conf/tests into the appropriate feature/profile
directory for non-DIA tests; update any references to the profile name (e.g.,
places that call --profile test_dda) and apply the same change for the other
occurrences mentioned (lines ~22-34).
### Preprocessing Options

- `--reindex_mzml` (default: true) — Re-index mzML files before processing. Disable with `--reindex_mzml false` if files are already indexed.
- `--mzml_statistics` (default: false) — Generate mzML statistics (parquet format) for QC.
- `--mzml_features` (default: false) — Enable feature detection in mzML statistics.
- `--convert_dotd` (default: false) — Convert Bruker .d files to mzML via tdf2mzml instead of passing natively to DIA-NN.

### Passing Extra Arguments to DIA-NN

Use `--diann_extra_args` to pass additional flags to all DIA-NN steps. The pipeline validates and strips flags it manages internally to prevent conflicts.

Managed flags (stripped with a warning if passed via extra_args): `--lib`, `--f`, `--fasta`, `--threads`, `--verbose`, `--temp`, `--out`, `--matrices`, `--use-quant`, `--gen-spec-lib`, `--mass-acc`, `--mass-acc-ms1`, `--window`, `--var-mod`, `--fixed-mod`, `--monitor-mod`, and others.

To enable this, add `includeConfig 'conf/modules/dia.config'` to your configuration (already included by default).

### DIA-NN Version Selection

The default DIA-NN version is 1.8.1. To use a different version:

| Version | Profile                 | Features                            |
| ------- | ----------------------- | ----------------------------------- |
| 1.8.1   | (default)               | Core DIA analysis                   |
| 2.1.0   | `-profile diann_v2_1_0` | Native .raw support, reduced memory |
| 2.2.0   | `-profile diann_v2_2_0` | Speed optimizations                 |
| 2.3.2   | `-profile diann_v2_3_2` | DDA support, InfinDIA               |

Example: `nextflow run bigbio/quantmsdiann -profile test_dia,docker,diann_v2_2_0`

### Verbose Module Output

Use `-profile verbose_modules` to publish intermediate files from all pipeline steps:

```bash
nextflow run bigbio/quantmsdiann -profile test_dia,docker,verbose_modules --outdir results
```

This publishes ThermoRawFileParser conversions, mzML indexing results, per-file DIA-NN logs, and spectral library intermediates.
Consolidate this duplicate section and fix the schema drift.
Line 98 repeats the existing ### Preprocessing Options heading, and the extra-args/version/verbose blocks here are second copies of the fuller sections later in the same file. Line 103 also documents --convert_dotd, but there is no matching property anywhere in the provided nextflow_schema.json, so schema-driven help and docs will disagree. Please keep a single section and either add the schema entry or remove the parameter mention.
Based on learnings: update `nextflow_schema.json` using `nf-core pipelines schema build` whenever parameters are added or modified.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 98-98: Multiple headings with the same content
(MD024, no-duplicate-heading)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/usage.md` around lines 98 - 134, The docs contain a duplicate "###
Preprocessing Options" section and document a parameter `--convert_dotd` that is
not present in nextflow_schema.json; remove the duplicated/preliminary block so
only the canonical Preprocessing Options remain (locate the repeated heading and
its list in docs/usage.md), and either add a matching parameter entry for
`convert_dotd` to nextflow_schema.json (using the exact parameter key
convert_dotd) or remove the `--convert_dotd` line from the docs; after updating
parameters run `nf-core pipelines schema build` to regenerate
nextflow_schema.json and ensure the docs and schema stay in sync.
if (!params.root_folder) {
    filestr = row.URI?.toString()?.trim() ? row.URI.toString() : row.Filename.toString()
} else {
    filestr = row.Filename.toString()
    filestr = params.root_folder + File.separator + filestr
    filestr = (params.local_input_type
        ? filestr.take(filestr.lastIndexOf('.')) + '.' + params.local_input_type
        : filestr)
local_input_type rewrites break compressed local copies.
Replacing only the suffix after the last . turns paths like sample.raw.gz into sample.raw.mzML, so the existence check fails for the compressed variants that --local_input_type is supposed to support.
Suggested fix
- filestr = row.Filename.toString()
+ filestr = row.Filename.toString().trim()
filestr = params.root_folder + File.separator + filestr
- filestr = (params.local_input_type
- ? filestr.take(filestr.lastIndexOf('.')) + '.' + params.local_input_type
- : filestr)
+ if (params.local_input_type) {
+ def compressionSuffix = ['.tar.gz', '.gz', '.tar', '.zip']
+ .find { suffix -> filestr.toLowerCase().endsWith(suffix) } ?: ''
+ def uncompressedPath = compressionSuffix
+ ? filestr[0..<(filestr.length() - compressionSuffix.length())]
+ : filestr
+ def extIndex = uncompressedPath.lastIndexOf('.')
+ def basePath = extIndex >= 0 ? uncompressedPath.take(extIndex) : uncompressedPath
+ filestr = basePath + '.' + params.local_input_type + compressionSuffix
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (!params.root_folder) {
    filestr = row.URI?.toString()?.trim() ? row.URI.toString() : row.Filename.toString()
} else {
    filestr = row.Filename.toString().trim()
    filestr = params.root_folder + File.separator + filestr
    if (params.local_input_type) {
        def compressionSuffix = ['.tar.gz', '.gz', '.tar', '.zip']
            .find { suffix -> filestr.toLowerCase().endsWith(suffix) } ?: ''
        def uncompressedPath = compressionSuffix
            ? filestr[0..<(filestr.length() - compressionSuffix.length())]
            : filestr
        def extIndex = uncompressedPath.lastIndexOf('.')
        def basePath = extIndex >= 0 ? uncompressedPath.take(extIndex) : uncompressedPath
        filestr = basePath + '.' + params.local_input_type + compressionSuffix
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@subworkflows/local/create_input_channel/main.nf` around lines 33 - 40, The
current logic that rewrites filestr when params.local_input_type is set naively
replaces everything after the last '.' which breaks compressed filenames like
sample.raw.gz (becoming sample.raw.mzML). Update the filestr rewriting to detect
common compression suffixes (e.g., .gz, .bz2, .zip, .xz): if a compression
suffix is present, split filestr into base and compressionSuffix, replace the
base's extension with params.local_input_type and then reattach the
compressionSuffix; otherwise perform the existing single-extension replacement.
Keep references to the same variables (filestr, params.local_input_type,
row.Filename) so the change integrates into the existing block that builds
filestr.
def base_row = rows[0]
Grouped SDRF rows still make per-file metadata order-dependent.
After groupTuple, rows[0] becomes the source of truth for dissociation method, tolerances, variable mods, and m/z ranges, and FixedModifications keeps only fixedMods[0]. For multiplexed files that can silently drop row-specific values before workflows/dia.nf forwards them downstream. Please aggregate every unique value into the emitted scalar format, or fail fast when grouped rows disagree.
Also applies to: 88-105, 129-182
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@subworkflows/local/create_input_channel/main.nf` around lines 63 - 64, The
code currently uses rows[0] (base_row) after groupTuple to pick per-file
metadata (dissociation method, tolerances, variableMods, m/z ranges, and
FixedModifications via fixedMods[0]) which makes output order-dependent and can
drop row-specific values; update the logic in the block that references base_row
and the other similar blocks (the grouped handling around groupTuple and the
comparable logic in the 88-105 and 129-182 regions) to collect all unique values
across rows for each metadata field and emit them in the scalar format (e.g.,
join / dedupe into a single string or array as required by downstream
workflows), and if multiple differing values are incompatible with aggregation,
fail fast with a clear error referencing the file/group; specifically replace
uses of rows[0], fixedMods[0], and any single-row lookups with aggregation
functions (unique/dedupe) or explicit validation that throws when disagreements
exist.
// Determine DDA mode: true if explicitly set via param OR auto-detected from SDRF
ch_is_dda = ch_experiment_meta.map { meta ->
    def dda = params.diann_dda || meta.acquisition_method == 'dda'
    if (dda && VersionUtils.versionLessThan(params.diann_version, '2.3.2')) {
        error("DDA mode (detected from SDRF) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2")
    }
    return dda
}
ch_is_dda can disagree with the downstream modules.
Line 73 re-applies params.diann_dda on top of the SDRF-derived acquisition method, but only INSILICO_LIBRARY_GENERATION receives that boolean at Line 90. ASSEMBLE_EMPIRICAL_LIBRARY and INDIVIDUAL_ANALYSIS still derive DDA from meta.acquisition_method, so --diann_dda true against an SDRF that explicitly says DIA will make the first step run with --dda while later steps stay in DIA mode. Either keep --diann_dda as a strict fallback here too, or thread the same boolean through every DIA-NN module.
Possible fix
- ch_is_dda = ch_experiment_meta.map { meta ->
- def dda = params.diann_dda || meta.acquisition_method == 'dda'
+ ch_is_dda = ch_experiment_meta.map { meta ->
+ if (params.diann_dda && meta.acquisition_method == 'dia') {
+ error('--diann_dda is only a fallback when the SDRF lacks an acquisition method; remove it or fix the SDRF value.')
+ }
+ def dda = meta.acquisition_method == 'dda'
if (dda && VersionUtils.versionLessThan(params.diann_version, '2.3.2')) {
error("DDA mode (detected from SDRF) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2")
}
return dda
}
Also applies to: 90-90
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@workflows/dia.nf` around lines 71-78, the ch_is_dda calculation mixes params.diann_dda with the SDRF-derived meta.acquisition_method, causing downstream inconsistency. Ensure the final boolean is used consistently by either making params.diann_dda a strict override or passing ch_is_dda into all DIA-NN modules. Specifically, update the logic around ch_is_dda to treat params.diann_dda as a definitive override when present, and propagate that same boolean into the INSILICO_LIBRARY_GENERATION, ASSEMBLE_EMPIRICAL_LIBRARY, and INDIVIDUAL_ANALYSIS module inputs so they all use the same DDA flag instead of re-reading meta.acquisition_method.
…N >= 2.x DIA-NN 2.3.2 no longer produces a *lib.log.txt file during in-silico library generation. The hard-coded `cp *lib.log.txt` caused a fatal error. Now gracefully handles the missing log file with a fallback message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…N >= 2.x DIA-NN 2.3.2 no longer produces a *lib.log.txt file during in-silico library generation. Instead of using a shell cp workaround, use the proper Nextflow approach: glob pattern with optional: true in the output declaration.
Replace shell `cp` log file renaming with native Nextflow output declarations using the actual DIA-NN filenames and optional: true. This handles DIA-NN >= 2.x, which may not produce log files. Modules updated:
- preliminary_analysis: report.log.txt (optional)
- assemble_empirical_library: report.log.txt (optional)
- final_quantification: diann_report.log.txt (optional)
- individual_analysis: report.log.txt (optional)

The downstream log parser in dia.nf now uses .ifEmpty() to fall back to user params when no log file is produced.
Restore the cp-based renaming (e.g., report.log.txt -> *_diann.log) to preserve sample-specific log names, but guard each cp with `if [ -f ... ]` so it doesn't fail when DIA-NN >= 2.x doesn't produce the log file. All log outputs are now marked optional: true.
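The guarded rename described above can be sketched in shell. The native-log creation and the per-sample output name (`sample1_diann.log`) are illustrative stand-ins for what the module script actually stages:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulate a DIA-NN run that did produce a native log (DIA-NN < 2.x case).
echo "DIA-NN native log" > report.log.txt

# Guarded rename: only copy when the native log exists, so the task does not
# fail under DIA-NN >= 2.x, which may not write report.log.txt at all.
if [ -f report.log.txt ]; then
    cp report.log.txt sample1_diann.log    # per-sample name is illustrative
else
    echo "No native DIA-NN log produced; skipping rename" >&2
fi
```

With `optional: true` on the output declaration, the missing-log branch simply emits nothing downstream instead of failing the task.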
DIA-NN names the log file based on the --out parameter (e.g., --out report.tsv -> report.log.txt). The in-silico library generation step was missing --out entirely, so DIA-NN 2.3.2 didn't produce any log file. Adding --out silicolibrarygeneration.tsv tells DIA-NN to write silicolibrarygeneration.log.txt. Still marked optional: true in case some DIA-NN versions skip the log for library-only runs.
All DIA-NN modules now capture stdout/stderr via tee to guarantee a log file exists regardless of DIA-NN version. If DIA-NN also produces its native .log.txt file, it overwrites the tee output (the native log may contain additional structured information). This two-layer approach ensures:
- Log files are always produced (tee fallback)
- Native DIA-NN logs are preferred when available (cp override)
- Meaningful per-step names are preserved
- No optional: true needed on log outputs
- The .ifEmpty() workaround in dia.nf is no longer needed
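A minimal sketch of the two-layer logging approach; a placeholder `echo` stands in for the real DIA-NN invocation, and the log name is illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail   # pipefail: a failure before tee must fail the task

# Layer 1: always capture stdout/stderr via tee so a log file exists
# even on DIA-NN versions that write no native .log.txt.
echo "step output from DIA-NN (simulated)" 2>&1 | tee preliminary_diann.log

# Layer 2: if DIA-NN wrote its native log, prefer it (overwrite tee capture).
if [ -f report.log.txt ]; then
    cp report.log.txt preliminary_diann.log
fi
```

Note that `set -o pipefail` matters here: without it, a non-zero exit from the command feeding tee would be masked by tee's own exit code 0.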
Nextflow warns that .first() is useless on a value channel. The ch_sdrf channel already contains a single value and is multicast in DSL2, so .first() was redundant. Removed it and map directly.
FINAL_QUANTIFICATION emits both diann_report.parquet and diann_report.tsv when both exist (DIA-NN 2.x). The --report flag only accepts one file. Now selects parquet preferentially over TSV when both are staged.
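The parquet-over-TSV preference can be sketched as a shell snippet; the simulated report files and the `selected_report.txt` bookkeeping file are illustrative, not part of the pipeline:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulate DIA-NN 2.x having staged both report formats in the work dir.
touch diann_report.parquet diann_report.tsv

# --report accepts a single file: prefer parquet over TSV when both exist.
if [ -f diann_report.parquet ]; then
    report=diann_report.parquet
else
    report=diann_report.tsv
fi
echo "${report}" > selected_report.txt
```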
DIA-NN < 1.9 produces TSV, >= 1.9 produces parquet (ignoring the .tsv extension in --out). The previous glob {tsv,parquet} matched both files when intermediate versions produced both, breaking DIANN_MSSTATS, which only accepts one file. Now emits separate channels (main_report_parquet, main_report_tsv), each marked optional, merged via .mix() in the workflow so only the version-appropriate format reaches downstream consumers.
- Remove the hardcoded DIA-NN 1.9.2 container from pride_codon_slurm.config (a cluster config should not dictate the DIA-NN version)
- Remove docker.enabled/singularity.enabled from version configs (the container engine is selected via -profile, not the version config)

Users select the DIA-NN version via -profile diann_v2_3_2 (or similar) and the container engine via -profile singularity/docker independently.
Rename protein_level_fdr_cutoff -> precursor_qvalue to accurately reflect that DIA-NN's --qvalue controls precursor-level filtering, not protein-level FDR. Expose two new matrix-level FDR parameters:
- matrix_qvalue (default 0.01): maps to --matrix-qvalue
- matrix_spec_q (default 0.05): maps to --matrix-spec-q

This enables proteogenomics workflows to independently control matrix filtering (e.g., matrix_spec_q=1.0 to retain variant proteins that lack sufficient unique peptides). Closes #42
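As a sketch, the new parameters map onto DIA-NN flags roughly like this. The variable names mirror the pipeline params and the flag names come from the commit above, but assembling them into a single string is illustrative, not the pipeline's actual command template:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pipeline defaults from the commit message above.
precursor_qvalue=0.01
matrix_qvalue=0.01
matrix_spec_q=0.05    # set to 1.0 to retain variant proteins in the matrices

# Build the q-value portion of the DIA-NN command line.
diann_args="--qvalue ${precursor_qvalue} --matrix-qvalue ${matrix_qvalue} --matrix-spec-q ${matrix_spec_q}"
echo "${diann_args}" > diann_args.txt
```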
Add diann_scoring_mode parameter with three options:
- 'generic' (default): standard scoring, maximizes protein IDs
- 'proteoforms': single-residue mutation decoys, recommended for proteogenomics/variant detection (requires DIA-NN >= 2.0)
- 'peptidoforms': extra peptidoform q-values for PTM analysis

The flag is passed to all 5 DIA-NN steps and added to each module's blocked-flags list. Version validation ensures proteoforms mode is only used with DIA-NN >= 2.0. Closes #43
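The proteoforms version guard can be sketched in shell using `sort -V`. The pipeline performs this check in Groovy via VersionUtils.versionLessThan; the shell form below is an illustrative analogue, and `version_check.txt` is just a marker file for the sketch:

```shell
#!/usr/bin/env bash
set -euo pipefail

# version_lt A B: true when version A sorts strictly before version B.
version_lt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

scoring_mode=proteoforms
diann_version=2.3.2

# proteoforms scoring requires DIA-NN >= 2.0; fail fast otherwise.
if [ "$scoring_mode" = "proteoforms" ] && version_lt "$diann_version" "2.0"; then
    echo "ERROR: proteoforms scoring requires DIA-NN >= 2.0 (got ${diann_version})" >&2
    exit 1
fi
echo "version check passed" > version_check.txt
```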
Update the quantms-utils and pmultiqc images
Fix/insilico log diann 2.3.2
Since quantmsdiann is a DIA-NN-only pipeline, the diann_ prefix on parameters is redundant. Renamed all user-facing params:
- diann_debug -> debug_level
- diann_speclib -> speclib
- diann_extra_args -> extra_args
- diann_dda -> dda
- diann_light_models -> light_models
- diann_export_quant -> export_quant
- diann_site_ms1_quant -> site_ms1_quant
- diann_pre_select -> pre_select
- diann_report_decoys -> report_decoys
- diann_export_xic -> export_xic
- diann_normalize -> normalize
- diann_use_quant -> use_quant
- diann_tims_sum -> tims_sum
- diann_im_window -> im_window
- diann_channel_run_norm -> channel_run_norm
- diann_channel_spec_norm -> channel_spec_norm

Removed diann_no_peptidoforms entirely; it is superseded by the new scoring_mode parameter (generic/proteoforms/peptidoforms). --no-peptidoforms remains in the blocked-flags lists.

Note: diann_version is NOT renamed (it is used in profile names).
feat: rename FDR params and expose matrix-level q-value controls
feat: add DIA-NN scoring mode parameter (Generic/Proteoforms/Peptidoforms)
PR checklist

- [ ] Make sure your code lints (`nf-core pipelines lint`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- [ ] Check for unexpected warnings in the debug profile (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] `docs/usage.md` is updated.
- [ ] `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).

Summary by CodeRabbit
- Documentation
- New Features
- Bug Fixes
- Chores