Skip to content

Conversation

@ypriverol
Copy link
Member

@ypriverol ypriverol commented Nov 27, 2025

PR Type

Bug fix, Enhancement


Description

  • Fixed DIA-NN PTM parsing bug with improved modification handling

  • Refactored PTM conversion logic into reusable get_mod() function

  • Added support for Met-loss modification with --met-excision flag

  • Merged same modifications across different sites for cleaner config

  • Updated sdrf-pipelines dependency to version 0.0.33

  • Added test case for dianncfg command


Diagram Walkthrough

flowchart LR
  A["PTM Parsing Input"] --> B["get_mod Function"]
  B --> C["Validate Unimod"]
  C --> D["Extract Site Info"]
  D --> E["Handle Special Cases"]
  E --> F["Merge Same Mods"]
  F --> G["DIA-NN Config Output"]
Loading

File Walkthrough

Relevant files
Configuration changes
__init__.py
Version bump to 0.0.24                                                                     

quantmsutils/init.py

  • Bumped version from 0.0.23 to 0.0.24
+1/-1     
Bug fix
dianncfg.py
Refactor PTM parsing and fix Met-loss handling                     

quantmsutils/diann/dianncfg.py

  • Extracted PTM parsing logic into new get_mod() function for
    reusability
  • Moved UnimodDatabase() initialization to module level
  • Added special handling for Met-loss modification with --met-excision
    flag
  • Implemented modification merging for same PTMs across different sites
    using defaultdict
  • Improved error handling with exit(1) for unsupported modifications
  • Changed logging from info/print to error for unsupported modifications
  • Added import for defaultdict from collections
+81/-67 
Tests
test_commands.py
Add dianncfg command test case                                                     

tests/test_commands.py

  • Added new test method test_dianncfg_example() to validate dianncfg
    command
  • Tests enzyme, fixed modifications, and variable modifications
    parameters
  • Verifies successful command execution with exit code 0
+14/-1   
Dependencies
environment.yml
Pin sdrf-pipelines to version 0.0.33                                         

environment.yml

  • Updated sdrf-pipelines dependency from >=0.0.31 to ==0.0.33
+1/-1     
pyproject.toml
Pin sdrf-pipelines to version 0.0.33                                         

pyproject.toml

  • Updated sdrf-pipelines dependency from >=0.0.32 to ==0.0.33
+1/-1     
requirements.txt
Pin sdrf-pipelines to version 0.0.33                                         

requirements.txt

  • Updated sdrf-pipelines dependency from >=0.0.31 to ==0.0.33
+1/-1     

Summary by CodeRabbit

  • Chores

    • Pinned sdrf-pipelines to v0.0.33 and tightened pyopenms constraints.
    • Bumped package version to 0.0.24.
    • Added .gitignore entries for generated files.
  • Improvements

    • Enhanced modification parsing for DIA‑NN: site/terminal normalization, merging, validation and MET‑excision support.
    • Added public APIs for unimod handling and modification retrieval.
  • Tests

    • Added a test covering the DIA‑NN configuration workflow.

✏️ Tip: You can customize this high-level summary in your review settings.

@ypriverol ypriverol requested a review from Copilot November 27, 2025 06:44
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 27, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Updates dependency pins for sdrf-pipelines to ==0.0.33 across manifests, reconciles mixed pyopenms bound edits, bumps package version to 0.0.24, refactors DIA‑NN modification handling with a lazy UnimodDatabase plus get_mod and MET_LOSS_MODIFICATION, adds a CLI test for dianncfg, and updates .gitignore.

Changes

Cohort / File(s) Summary
Dependency files
environment.yml, pyproject.toml, requirements.txt
Pinned sdrf-pipelines to ==0.0.33; pyopenms bounds were edited inconsistently across manifests (>=3.2.0 vs >=3.3.0).
Package version
quantmsutils/__init__.py
Bumped __version__ from 0.0.23 to 0.0.24.
DIA‑NN configuration refactor
quantmsutils/diann/dianncfg.py
Added lazy get_unimod_database() and public get_mod(mod, mod_type); introduced MET_LOSS_MODIFICATION; changed convert_mod signature to convert_mod(fix_mod: str, var_mod: str); implemented site/terminal normalization and merging via defaultdict; injects --met-excision when relevant.
Tests
tests/test_commands.py
Added test_dianncfg_example to exercise the dianncfg CLI with sample fixed/variable modifications and assert exit code 0.
Ignore rules
.gitignore
Added ignore entries for diann_config.cfg and *.parquet.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant dianncfg as dianncfg module
    participant get_mod as get_mod()
    participant unimod as get_unimod_database()

    rect rgba(0,128,96,0.06)
    Note over dianncfg,unimod: New flow — lazy Unimod DB + get_mod helper
    Caller->>dianncfg: dianncfg(..., fix_mod, var_mod)
    dianncfg->>dianncfg: convert_mod(fix_mod, var_mod)
    dianncfg->>get_mod: request mapping for each mod
    end

    rect rgba(0,96,160,0.05)
    get_mod->>unimod: ensure/init unimod DB & query entry
    unimod-->>get_mod: unimod entry/data
    get_mod-->>dianncfg: formatted DIA‑NN modifier + site
    end

    dianncfg->>dianncfg: merge modifiers, normalize terminals, detect MET_LOSS_MODIFICATION
    dianncfg-->>Caller: assembled DIA‑NN config (may include --met-excision)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus areas:
    • quantmsutils/diann/dianncfg.py: correctness of get_mod(), terminal/site normalization, merging logic, MET_LOSS_MODIFICATION detection, and lazy unimod DB lifecycle (thread-safety/test isolation).
    • Dependency manifests: reconcile pyopenms bounds across environment.yml / requirements.txt and confirm intended sdrf-pipelines pin.
    • tests/test_commands.py: ensure the added CLI test is deterministic and self-contained.

Possibly related PRs

Poem

🐇 A hop, a pin, a tiny tweak,
A cached unimod for code to seek.
Mods merged neat, a test that sings,
v0.0.24 — the rabbit brings. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Update changes in PTM diann handling' is vague and uses non-descriptive phrasing like 'Update changes' that doesn't clearly convey the main improvements or fixes to the changeset. Consider a more specific title like 'Refactor PTM handling for DIA-NN with Met-loss support and modification merging' to better reflect the key changes in lazy initialization, get_mod function extraction, and Met-loss modification support.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0fe77 and 4128e98.

📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.10)
  • GitHub Check: build

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Copy link

qodo-code-review bot commented Nov 27, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Hard Exit: The functions call exit(1) on unsupported or multiplex modifications, causing abrupt
termination without graceful error propagation or contextual messaging/handling for edge
cases.

Referred Code
if tag == 0:
    logging.error(
        "Currently only supported unimod modifications for DIA pipeline. Skipped: "
        + mod
    )
    exit(1)

# TODO support DIA multiplex
if (
        "TMT" in diann_mod_name
        or "Label:" in diann_mod_name
        or "iTRAQ" in diann_mod_name
        or "mTRAQ" in diann_mod_name
        or "Dimethyl:" in diann_mod_name
):
    logging.error(
        "quantms DIA-NN workflow only support LFQ now! Unsupported modifications: "
        + mod
    )
    exit(1)
elif diann_mod_accession is not None:


 ... (clipped 25 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Action Logging: The new code performs configuration generation and exits on unsupported modifications
without emitting structured audit logs identifying user/context, making it unclear who
triggered the action and what outcome occurred.

Referred Code
    with open("diann_config.cfg", "w") as file:
        file.write("--cut " + cut + diann_fix_ptm + diann_var_ptm)


def get_mod(mod, mod_type):
    pattern = re.compile(r"\((.*?)\)")
    tag = 0
    diann_mod_accession = None
    diann_mod_name = None
    for modification in unimod_database.modifications:
        if modification.get_name() == mod.split(" ")[0]:
            diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass)
            diann_mod_name = modification.get_name()
            tag = 1
            break

    if tag == 0:
        logging.error(
            "Currently only supported unimod modifications for DIA pipeline. Skipped: "
            + mod
        )


 ... (clipped 41 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Input Validation: CLI inputs for modifications are parsed and used without explicit validation/sanitization
beyond Unimod name matching, which may require additional checks for empty or malformed
values.

Referred Code
def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]:
    var_ptm = []
    fix_ptm = []
    if fix_mod != "":
        merged = defaultdict(list)
        for mod in fix_mod.split(","):
            diann_mod, site = get_mod(mod, "fixed_mod")
            merged[diann_mod].append(site)

        # merge same modification for different sites
        for name, site_list in merged.items():
            site_str = "".join(sorted(set(site_list)))
            fix_ptm.append(f"{name},{site_str}")

    if var_mod != "":
        merged = defaultdict(list)
        for mod in var_mod.split(","):
            diann_mod, site = get_mod(mod, "var_mod")
            merged[diann_mod].append(site)
        # merge same modification for different sites
        for name, site_list in merged.items():


 ... (clipped 4 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

qodo-code-review bot commented Nov 27, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Replace exit(1) calls with exceptions

Instead of calling exit(1) for unsupported modifications within the get_mod
function, raise custom exceptions. This allows the calling application to handle
errors gracefully rather than having the process terminate abruptly.

Examples:

quantmsutils/diann/dianncfg.py [67-71]
        logging.error(
            "Currently only supported unimod modifications for DIA pipeline. Skipped: "
            + mod
        )
        exit(1)
quantmsutils/diann/dianncfg.py [81-85]
        logging.error(
            "quantms DIA-NN workflow only support LFQ now! Unsupported modifications: "
            + mod
        )
        exit(1)

Solution Walkthrough:

Before:

def get_mod(mod, mod_type):
    # ...
    if modification_not_found:
        logging.error(
            "Currently only supported unimod modifications for DIA pipeline. Skipped: "
            + mod
        )
        exit(1)
    # ...
    if modification_is_for_labeling:
        logging.error(
            "quantms DIA-NN workflow only support LFQ now! Unsupported modifications: "
            + mod
        )
        exit(1)
    # ...
    return diann_mod_accession, site

After:

class UnsupportedModificationError(Exception):
    pass

def get_mod(mod, mod_type):
    # ...
    if modification_not_found:
        raise UnsupportedModificationError(f"Unsupported unimod modification: {mod}")
    # ...
    if modification_is_for_labeling:
        raise UnsupportedModificationError(f"Unsupported labeling modification for LFQ: {mod}")
    # ...
    return diann_mod_accession, site

# In the click command:
# try:
#   ... call get_mod ...
# except UnsupportedModificationError as e:
#   logging.error(e)
#   exit(1)
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that using exit(1) in the new get_mod helper function is poor design for a library, as it prevents graceful error handling by the calling code.

Medium
General
Raise exceptions instead of exiting

Replace exit(1) calls in the get_mod function with a raised ValueError. This
will improve reusability and allow the calling function to handle errors
gracefully.

quantmsutils/diann/dianncfg.py [66-85]

 if tag == 0:
-    logging.error(
+    raise ValueError(
         "Currently only supported unimod modifications for DIA pipeline. Skipped: "
         + mod
     )
-    exit(1)
 
 # TODO support DIA multiplex
 if (
         "TMT" in diann_mod_name
         or "Label:" in diann_mod_name
         or "iTRAQ" in diann_mod_name
         or "mTRAQ" in diann_mod_name
         or "Dimethyl:" in diann_mod_name
 ):
-    logging.error(
+    raise ValueError(
         "quantms DIA-NN workflow only support LFQ now! Unsupported modifications: "
         + mod
     )
-    exit(1)
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that library functions should not call exit(), and raising an exception is better for reusability and testing, which is a significant architectural improvement.

Medium
Improve performance with a lookup map

Improve performance by creating a module-level lookup map for
unimod_database.modifications. This will optimize the modification lookup in the
get_mod function from a loop to a direct O(1) access.

quantmsutils/diann/dianncfg.py [59-64]

-for modification in unimod_database.modifications:
-    if modification.get_name() == mod.split(" ")[0]:
-        diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass)
-        diann_mod_name = modification.get_name()
-        tag = 1
-        break
+mod_name = mod.split(" ")[0]
+if mod_name in unimod_name_map:
+    modification = unimod_name_map[mod_name]
+    diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass)
+    diann_mod_name = modification.get_name()
+    tag = 1
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion provides a valid performance optimization by proposing a lookup map, which is a standard and effective technique to improve efficiency from O(N) to O(1) for lookups.

Medium
Possible issue
Avoid using a magic string
Suggestion Impact:The commit removed the magic string comparison by introducing a named constant MET_LOSS_MODIFICATION and using it in the comparison, reducing brittleness. It did not implement the proposed structured return from get_mod, but it addressed the magic string issue.

code diff:

+# Met-loss modification constant (UniMod:765) with mass shift and site specification
+MET_LOSS_MODIFICATION = "UniMod:765,-131.040485,*nM"
 
 
 @click.command("dianncfg", short_help="Create DIA-NN config file with enzyme and PTMs")
@@ -42,7 +45,7 @@
     for mod in fix_ptm:
         diann_fix_ptm += fix_ptm_str + mod
     for mod in var_ptm:
-        if mod == "UniMod:765,-131.040485,*nM":
+        if mod == MET_LOSS_MODIFICATION:
             diann_var_ptm += " --met-excision "
         else:
             diann_var_ptm += var_ptm_str + mod

Refactor the code to avoid using a hardcoded "magic string" for identifying the
Met-loss modification. Instead, return a structured result from get_mod to
indicate special modification types.

quantmsutils/diann/dianncfg.py [44-48]

-for mod in var_ptm:
-    if mod == "UniMod:765,-131.040485,*nM":
+for mod, is_met_excision in var_ptm:
+    if is_met_excision:
         diann_var_ptm += " --met-excision "
     else:
         diann_var_ptm += var_ptm_str + mod

[Suggestion processed]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies the use of a "magic string" as a brittle design and proposes a more robust, maintainable solution, improving code quality.

Low
  • Update

Copilot finished reviewing on behalf of ypriverol November 27, 2025 06:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the PTM (Post-Translational Modification) handling in the DIA-NN configuration module. The changes refactor the modification conversion logic to better handle PTM merging across sites and add special support for Met-loss (methionine loss) modifications. This is part of the quantms-utils proteomics workflow, which converts SDRF parameters to DIA-NN parameters.

Key changes include:

  • Refactored PTM conversion logic to extract a new get_mod() helper function for cleaner code organization
  • Added special handling for Met-loss modifications using DIA-NN's --met-excision flag
  • Implemented PTM merging logic to consolidate the same modification across different sites
  • Updated sdrf-pipelines dependency from >=0.0.31 to ==0.0.33 across all dependency files

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
quantmsutils/diann/dianncfg.py Refactored modification conversion with new get_mod() function, added Met-loss special handling, and implemented modification merging across sites
tests/test_commands.py Added new test case for dianncfg command with complex PTM examples including Met-loss
requirements.txt Updated sdrf-pipelines to exact version 0.0.33
pyproject.toml Updated sdrf-pipelines to exact version 0.0.33
environment.yml Updated sdrf-pipelines to exact version 0.0.33
quantmsutils/init.py Bumped package version from 0.0.23 to 0.0.24

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"Currently only supported unimod modifications for DIA pipeline. Skipped: "
+ mod
)
exit(1)
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using exit(1) directly is not ideal in library code as it terminates the entire Python process. Consider raising a custom exception (e.g., ValueError or a custom ModificationNotSupportedError) instead, allowing callers to handle the error appropriately. This pattern appears in lines 71, 85, and 110.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

"quantms DIA-NN workflow only support LFQ now! Unsupported modifications: "
+ mod
)
exit(1)
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using exit(1) directly is not ideal in library code as it terminates the entire Python process. Consider raising a custom exception (e.g., ValueError or a custom ModificationNotSupportedError) instead, allowing callers to handle the error appropriately.

Copilot uses AI. Check for mistakes.
"Currently only supported unimod modifications for DIA pipeline. Skipped: "
+ mod
)
exit(1)
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using exit(1) directly is not ideal in library code as it terminates the entire Python process. Consider raising a custom exception (e.g., ValueError or a custom ModificationNotSupportedError) instead, allowing callers to handle the error appropriately.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Copilot AI commented Nov 27, 2025

@ypriverol I've opened a new pull request, #60, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
quantmsutils/diann/dianncfg.py (1)

54-111: Make get_mod error handling more defensive and reusable

Two potential sharp edges here:

  • site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0] assumes there is always at least one parenthesized site; malformed input would raise IndexError instead of a clear error.
  • Calling exit(1) from a helper makes the function hard to reuse programmatically and harder to test in isolation; it’s more idiomatic to raise a descriptive exception and let the CLI layer convert that to a non‑zero exit code.

Consider (a) guarding the findall result and failing with a clear message if no site is found, and (b) raising a custom exception (or ValueError) instead of exiting directly, with the dianncfg command catching and reporting it.

tests/test_commands.py (1)

88-100: Strengthen test_dianncfg_example by asserting on the generated config

This is a useful smoke test for the CLI, but it currently only checks exit_code == 0. To catch regressions in PTM/enzyme handling, consider also asserting that:

  • diann_config.cfg is created, and
  • its contents contain the expected --cut, --fixed-mod, --var-mod, and --met-excision flags for this input.

That would exercise the new get_mod / convert_mod logic more thoroughly.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e26ad9b and a88d479.

📒 Files selected for processing (6)
  • environment.yml (1 hunks)
  • pyproject.toml (1 hunks)
  • quantmsutils/__init__.py (1 hunks)
  • quantmsutils/diann/dianncfg.py (3 hunks)
  • requirements.txt (1 hunks)
  • tests/test_commands.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Agent
  • GitHub Check: build
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.9)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
🔇 Additional comments (4)
environment.yml (1)

9-9: Pinning sdrf-pipelines in conda env looks consistent

Pinning to ==0.0.33 improves reproducibility and matches the other manifests; just make sure this version is available on your configured bioconda/conda-forge channels in CI and production.

requirements.txt (1)

2-2: Requirements pin aligned with other manifests

Pinning sdrf-pipelines to ==0.0.33 here keeps pip installs consistent with the conda env and Poetry config; no issues from the code side.

quantmsutils/__init__.py (1)

1-1: Version bump to 0.0.24 is consistent

__version__ matches the version in pyproject.toml and the scope of changes in this PR; nothing else needed here.

pyproject.toml (1)

34-34: Poetry dependency pin consistent with other manifests

Pinning sdrf-pipelines to ==0.0.33 here keeps Poetry installs aligned with conda and requirements.txt, which should avoid subtle version skew.

Copy link
Contributor

Copilot AI commented Nov 27, 2025

@ypriverol I've opened a new pull request, #61, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
quantmsutils/diann/dianncfg.py (3)

36-48: Make the Met-loss → --met-excision mapping more robust than a full-string equality

The special handling for Met-loss currently depends on an exact match of the full formatted string:

if mod == "UniMod:765,-131.040485,*nM":
    diann_var_ptm += " --met-excision "

This is brittle against:

  • Changes in float formatting of the delta mass.
  • Potential future tweaks to how convert_mod formats PTMs.

Consider parsing mod into accession / mass / sites and keying the special case only on stable parts (e.g. accession UniMod:765 and sites *nM), e.g.:

parts = mod.split(",", 2)
accession, _, sites = parts if len(parts) == 3 else (None, None, None)
if accession == "UniMod:765" and sites == "*nM":
    diann_var_ptm += " --met-excision "
else:
    diann_var_ptm += var_ptm_str + mod

This preserves behavior while avoiding silent breakage from formatting changes.


54-125: get_mod is much clearer and safer; only remaining concern is error handling style

Positives in this refactor:

  • The new docstring and modification_found flag significantly improve readability and maintainability.
  • Guarding against missing site matches via if not sites: avoids the previous IndexError risk when parentheses are absent.
  • Terminal site normalization and the Met-loss *nM special case are now clearly expressed.

The only thing to consider in a future pass is replacing the internal exit(1) calls with raised exceptions so callers can decide how to handle failures (while the CLI can still convert them to process exits).


128-152: convert_mod correctly handles missing options and merges sites; tighten typing and empty-token handling

The updated guards:

if fix_mod:
    ...
if var_mod:
    ...

nicely avoid the previous None.split(",") crash when options are omitted and still handle empty strings correctly. The defaultdict-based merge to produce name,site_str also looks good.

Two small follow-ups you may want to tackle (can be deferred):

  1. Type hints to match runtime behavior

fix_mod and var_mod can be None as well as str, and the lists hold strings:

from typing import List, Tuple, Optional

def convert_mod(
    fix_mod: Optional[str], var_mod: Optional[str]
) -> Tuple[List[str], List[str]]:
    ...
  1. Skip empty tokens after splitting

If users accidentally pass a trailing comma, fix_mod.split(",") can yield "", which would then fail in get_mod. You can filter them cheaply:

for mod in (m for m in fix_mod.split(",") if m):
    ...
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 742695d and cd28956.

📒 Files selected for processing (2)
  • quantmsutils/diann/dianncfg.py (3 hunks)
  • requirements.txt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.10)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
quantmsutils/diann/dianncfg.py (1)

10-16: Module-level Unimod database and defaultdict usage look appropriate

Using a module-global UnimodDatabase() plus defaultdict for downstream merging is a good fit here: it removes repeated DB construction while keeping PTM merging local to convert_mod. No functional issues from this change.

Copy link
Contributor

Copilot AI commented Nov 27, 2025

@ypriverol I've opened a new pull request, #63, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Contributor

Copilot AI commented Nov 27, 2025

@ypriverol I've opened a new pull request, #64, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
quantmsutils/diann/dianncfg.py (1)

39-40: Met-loss detection still couples behavior to exact mass string formatting

Even though MET_LOSS_MODIFICATION is now a named constant, the equality check in the var-mod loop still depends on the exact mass string ("UniMod:765,-131.040485,*nM"). If the Unimod library changes the precision or formatting of _delta_mono_mass, Met-loss will silently stop triggering --met-excision and instead be treated as a normal --var-mod.

To decouple behavior from float formatting, match on stable fields (accession + site) instead of the full string. For example:

-# Met-loss modification constant (UniMod:765) with mass shift and site specification
-MET_LOSS_MODIFICATION = "UniMod:765,-131.040485,*nM"
+# Met-loss modification accession and site (UniMod:765, N-terminal Met-loss)
+MET_LOSS_ACCESSION = "UniMod:765"
+MET_LOSS_SITE = "*nM"
@@
-    for mod in var_ptm:
-        if mod == MET_LOSS_MODIFICATION:
-            diann_var_ptm += " --met-excision "
-        else:
-            diann_var_ptm += var_ptm_str + mod
+    for mod in var_ptm:
+        # Expect format: accession,delta_mass,sites
+        parts = mod.split(",", 2)
+        if len(parts) == 3:
+            accession, _, sites = parts
+        else:
+            accession, sites = None, None
+
+        if accession == MET_LOSS_ACCESSION and sites == MET_LOSS_SITE:
+            diann_var_ptm += " --met-excision "
+        else:
+            diann_var_ptm += var_ptm_str + mod

This keeps behavior stable even if the Unimod mass representation changes.

Also applies to: 68-71

🧹 Nitpick comments (1)
quantmsutils/diann/dianncfg.py (1)

17-37: Lazy UnimodDatabase initialization looks good; consider adding type annotations

The lazy _unimod_database + get_unimod_database() pattern is a nice improvement for testability and import-time cost. To make the intent clearer to readers and type-checkers, consider annotating the storage and accessor:

  • _unimod_database: UnimodDatabase | None = None
  • def get_unimod_database() -> UnimodDatabase: ...

This makes it obvious that callers always receive a fully initialized UnimodDatabase instance.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd28956 and a9251f7.

📒 Files selected for processing (3)
  • .gitignore (1 hunks)
  • environment.yml (1 hunks)
  • quantmsutils/diann/dianncfg.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • environment.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.9)
  • GitHub Check: build
🔇 Additional comments (1)
.gitignore (1)

172-177: Ignoring DIA-NN config and Parquet artifacts is appropriate

The new patterns (diann_config.cfg and *.parquet) are clearly documented as test-generated artifacts and are safe to keep out of version control. No issues here.

Comment on lines +145 to +167
def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]:
var_ptm = []
fix_ptm = []

if fix_mod != "":
if fix_mod:
merged = defaultdict(list)
for mod in fix_mod.split(","):
tag = 0
diann_mod = None
for modification in unimod_database.modifications:
if modification.get_name() == mod.split(" ")[0]:
diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass)
tag = 1
break
if tag == 0:
logging.info(
"Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
+ mod
)
continue
site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0]
if site == "Protein N-term":
site = "*n"
elif site == "N-term":
site = "n"

if (
"TMT" in diann_mod
or "Label" in diann_mod
or "iTRAQ" in diann_mod
or "mTRAQ" in diann_mod
):
fix_ptm.append(diann_mod + "," + site + "," + "label")
elif diann_mod is not None:
fix_ptm.append(diann_mod + "," + site)
else:
print(
"Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
+ mod
)

if var_mod != "":
diann_mod, site = get_mod(mod, "fixed_mod")
merged[diann_mod].append(site)

# merge same modification for different sites
for name, site_list in merged.items():
site_str = "".join(sorted(set(site_list)))
fix_ptm.append(f"{name},{site_str}")

if var_mod:
merged = defaultdict(list)
for mod in var_mod.split(","):
tag = 0
diann_mod = None
for modification in unimod_database.modifications:
if modification.get_name() == mod.split(" ")[0]:
diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass)
tag = 1
break
if tag == 0:
print(
"Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
+ mod
)
continue
site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0]
if site == "Protein N-term":
site = "*n"
elif site == "N-term":
site = "n"

if (
"TMT" in diann_mod
or "Label" in diann_mod
or "iTRAQ" in diann_mod
or "mTRAQ" in diann_mod
):
var_ptm.append(diann_mod + "," + site + "," + "label")
else:
var_ptm.append(diann_mod + "," + site)
diann_mod, site = get_mod(mod, "var_mod")
merged[diann_mod].append(site)
# merge same modification for different sites
for name, site_list in merged.items():
site_str = "".join(sorted(set(site_list)))
var_ptm.append(f"{name},{site_str}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Trim whitespace and align type hints in convert_mod to avoid subtle CLI failures

convert_mod now correctly uses truthiness checks, so None from click options is safe. However:

  • When users pass values like "Oxidation (M), Carbamidomethyl (C)" (note the space after the comma), fix_mod.split(",") produces entries with leading spaces. Because get_mod uses mod.split(" ") with an explicit separator, the leading space yields an empty first token, and the Unimod name lookup fails, causing an error/exit even though the modification name is valid.
  • The function signature still declares fix_mod: str, var_mod: str, but at runtime it accepts None (from click), so the annotation is misleading.

You can harden the parsing and align the annotations like this:

-def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]:
+def convert_mod(fix_mod: str | None, var_mod: str | None) -> Tuple[List[str], List[str]]:
@@
-    if fix_mod:
-        merged = defaultdict(list)
-        for mod in fix_mod.split(","):
-            diann_mod, site = get_mod(mod, "fixed_mod")
-            merged[diann_mod].append(site)
+    if fix_mod:
+        merged = defaultdict(list)
+        for raw_mod in fix_mod.split(","):
+            mod = raw_mod.strip()
+            if not mod:
+                continue
+            diann_mod, site = get_mod(mod, "fixed_mod")
+            merged[diann_mod].append(site)
@@
-    if var_mod:
-        merged = defaultdict(list)
-        for mod in var_mod.split(","):
-            diann_mod, site = get_mod(mod, "var_mod")
-            merged[diann_mod].append(site)
+    if var_mod:
+        merged = defaultdict(list)
+        for raw_mod in var_mod.split(","):
+            mod = raw_mod.strip()
+            if not mod:
+                continue
+            diann_mod, site = get_mod(mod, "var_mod")
+            merged[diann_mod].append(site)

And update the import to include the newer union syntax or Optional as appropriate for your minimum Python version (e.g., from typing import List, Tuple, Optional and Optional[str] if you need pre-3.10 support).

This keeps the CLI robust against common spacing patterns and documents the actual accepted types.

🤖 Prompt for AI Agents
In quantmsutils/diann/dianncfg.py around lines 145–167, tighten parsing and fix
annotations: change the signature to accept optional strings and precise return
types (e.g., def convert_mod(fix_mod: Optional[str], var_mod: Optional[str]) ->
Tuple[List[str], List[str]] and add Optional to imports), and when iterating
split values trim whitespace and ignore empty tokens (e.g., iterate over
(token.strip() for token in fix_mod.split(",")) and only call get_mod on
non-empty tokens) so entries like "Oxidation (M), Carbamidomethyl (C)" parse
correctly; keep the existing merged logic and identical fixes for var_mod.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
quantmsutils/diann/dianncfg.py (2)

150-158: LGTM! Consider whitespace handling for robustness.

The merging logic using defaultdict is clean and correctly consolidates identical modifications across different sites.

However, a past review comment noted that users might pass values with spaces after commas (e.g., "Oxidation (M), Carbamidomethyl (C)"), which would cause mod.split(" ")[0] in get_mod to fail due to leading whitespace. Consider adding .strip() when iterating:

for mod in fix_mod.split(","):
    mod = mod.strip()
    if not mod:
        continue
    diann_mod, site = get_mod(mod, "fixed_mod")

Based on learnings, past review comment already flagged this issue.


161-168: LGTM! Consider whitespace handling for robustness.

The variable modification merging logic mirrors the fixed modification approach and works correctly.

Apply the same whitespace trimming here as suggested for fixed modifications:

for mod in var_mod.split(","):
    mod = mod.strip()
    if not mod:
        continue
    diann_mod, site = get_mod(mod, "var_mod")

Based on learnings, past review comment already flagged this issue.

🧹 Nitpick comments (3)
quantmsutils/diann/dianncfg.py (3)

17-36: Lazy initialization improves testability.

The lazy initialization pattern addresses the previous concern about module-level database creation. The approach is sound for single-threaded CLI usage.

For future consideration: if this module is ever used in multi-threaded contexts, the global state could race. A thread-safe singleton pattern (with locks) or dependency injection would be more robust, but that's overkill for the current CLI use case.


94-94: Accessing private attribute _delta_mono_mass is fragile.

Direct access to the private _delta_mono_mass attribute couples this code to the internal implementation of the UnimodDatabase modification objects. If the upstream library refactors this attribute (renames it, removes it, or changes access patterns), this code will break.

Check if the modification object provides a public method like get_delta_mass() or similar. If not, consider requesting this as a feature in sdrf-pipelines or wrapping the access in a try-except block with a clear error message.

try:
    mass = modification._delta_mono_mass
except AttributeError:
    logging.error(f"Unable to retrieve mass for modification: {modification.get_name()}")
    exit(1)

146-146: Refine type hints for clarity.

The return type Tuple[List, List] is imprecise. Both lists contain strings, so the annotation should be Tuple[List[str], List[str]]. Additionally, since Click passes None when options are omitted (and the function now handles this with truthiness checks), the parameters should be typed as Optional[str].

Apply this diff to align the annotations with runtime behavior:

-from typing import List, Tuple
+from typing import List, Tuple, Optional

-def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]:
+def convert_mod(fix_mod: Optional[str], var_mod: Optional[str]) -> Tuple[List[str], List[str]]:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9251f7 and 8aea86b.

📒 Files selected for processing (1)
  • quantmsutils/diann/dianncfg.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.9)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (8)
quantmsutils/diann/dianncfg.py (8)

10-10: LGTM!

The defaultdict import is correctly added and used effectively in convert_mod to group modifications by accession.


39-40: LGTM!

The MET_LOSS_MODIFICATION constant eliminates the magic string and makes the Met-loss special case explicit and maintainable.


60-60: LGTM!

The updated convert_mod call correctly reflects the new signature that uses the lazy-initialized database internally.


69-72: LGTM!

The Met-loss special case handling is now clear and uses the named constant, improving maintainability.


78-87: LGTM!

The docstring clearly documents the function's purpose, parameters, return value, and potential exceptions.


89-97: LGTM!

Renaming tag to modification_found significantly improves code readability. The lookup logic using the lazy database instance is correct.


120-126: LGTM!

The error handling for missing site specifications is robust. Checking if not sites: before accessing sites[0] prevents the IndexError that could have occurred with the previous implementation.


127-143: LGTM!

The site normalization logic correctly handles various terminal modification scenarios including Protein N-term, N-term, and compound terminal cases. The special handling for Met-loss is appropriate.

@ypriverol ypriverol requested a review from Copilot November 27, 2025 09:20
Copilot finished reviewing on behalf of ypriverol November 27, 2025 09:25
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 7 changed files in this pull request and generated 16 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +17 to +20
# Lazy initialization of UnimodDatabase for improved testability.
# The database is created on first access rather than at module import time,
# which allows tests to mock or replace it more easily.
_unimod_database = None
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "The database is created on first access rather than at module import time, which allows tests to mock or replace it more easily." However, there are no tests in this PR that demonstrate mocking this database, and the implementation uses a module-level global variable which is still difficult to mock properly. If testability is the goal, consider using dependency injection (passing the database as a parameter) or a more robust singleton pattern. The current implementation only partially achieves the stated testability goal.

Copilot uses AI. Check for mistakes.
Comment on lines +89 to +99
modification_found = 0
diann_mod_accession = None
diann_mod_name = None
for modification in get_unimod_database().modifications:
if modification.get_name() == mod.split(" ")[0]:
diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass)
diann_mod_name = modification.get_name()
modification_found = 1
break

if modification_found == 0:
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using modification_found = 0 and modification_found = 1 as boolean flags is not idiomatic Python. Consider using modification_found = False and modification_found = True instead, or better yet, eliminate the flag by restructuring the logic (e.g., initialize diann_mod_accession = None and check if diann_mod_accession is None: at line 99).

Suggested change
modification_found = 0
diann_mod_accession = None
diann_mod_name = None
for modification in get_unimod_database().modifications:
if modification.get_name() == mod.split(" ")[0]:
diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass)
diann_mod_name = modification.get_name()
modification_found = 1
break
if modification_found == 0:
diann_mod_accession = None
diann_mod_name = None
for modification in get_unimod_database().modifications:
if modification.get_name() == mod.split(" ")[0]:
diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass)
diann_mod_name = modification.get_name()
break
if diann_mod_accession is None:

Copilot uses AI. Check for mistakes.
diann_mod_name = None
for modification in get_unimod_database().modifications:
if modification.get_name() == mod.split(" ")[0]:
diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass)
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Accessing the private attribute modification._delta_mono_mass (with leading underscore) breaks encapsulation and could be fragile if the underlying library changes. Consider using a public method if available, or document why direct access to this private attribute is necessary.

Suggested change
diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass)
# Prefer public accessor for mono mass if available; fallback to private attribute with documentation.
try:
# Try to use a public method/property if available
mono_mass = modification.get_delta_mono_mass()
except AttributeError:
# No public accessor; document why direct access is necessary
# Direct access to _delta_mono_mass is required as no public method is available.
# This may be fragile if the underlying library changes.
mono_mass = modification._delta_mono_mass
diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(mono_mass)

Copilot uses AI. Check for mistakes.
for DIA-NN and site is the modification site.
:raises SystemExit: If the modification is not found in the Unimod database, logs an error and exits.
"""
pattern = re.compile(r"\((.*?)\)")
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The regex pattern is compiled inside the get_mod() function, which means it will be recompiled for every modification processed. Since this function is called in a loop (lines 151, 162), consider moving the pattern compilation to module level (similar to how MET_LOSS_MODIFICATION is defined at line 40) to improve performance when processing multiple modifications.

Copilot uses AI. Check for mistakes.
Comment on lines +149 to 151
if fix_mod:
merged = defaultdict(list)
for mod in fix_mod.split(","):
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check if fix_mod: will evaluate to False for both None and empty string "", but if fix_mod is None, the call to fix_mod.split(",") at line 151 will raise an AttributeError. Consider explicitly checking for None or ensuring that the parameters always receive a string value (empty string for no modifications). The same issue exists for var_mod at line 160.

Copilot uses AI. Check for mistakes.

# merge same modification for different sites
for name, site_list in merged.items():
site_str = "".join(sorted(set(site_list)))
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The site merging logic concatenates sites without a delimiter (e.g., "STY" for phosphorylation on S, T, and Y). However, if a site is multi-character (like "*n" for Protein N-term), this concatenation could produce ambiguous results. For example, "*n" + "K" becomes "*nK" which could be interpreted as "Protein N-term K" (a two-part site from lines 131-138) rather than two separate sites. Consider documenting this behavior or adding validation to ensure sites don't create ambiguous concatenations.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
quantmsutils/diann/dianncfg.py (2)

60-72: MET-loss handling still coupled to full string (including mass) – consider parsing instead

The if mod == MET_LOSS_MODIFICATION check still ties the --met-excision mapping to the exact formatting of accession, mono mass, and site. Any future change in Unimod mass precision or str() formatting would silently drop the special case and emit a generic --var-mod instead.

You can decouple this by parsing the components and keying only on accession + site:

-    for mod in var_ptm:
-        if mod == MET_LOSS_MODIFICATION:
-            diann_var_ptm += " --met-excision "
-        else:
-            diann_var_ptm += var_ptm_str + mod
+    for mod in var_ptm:
+        # Expect format: accession,delta_mass,sites
+        parts = mod.split(",", 2)
+        accession, sites = (parts[0], parts[2]) if len(parts) == 3 else (None, None)
+
+        if accession == "UniMod:765" and sites == "*nM":
+            diann_var_ptm += " --met-excision "
+        else:
+            diann_var_ptm += var_ptm_str + mod

This preserves behavior while making it resilient to float-format tweaks.


146-170: Fix whitespace-sensitive parsing and align convert_mod type hints with runtime usage

Right now convert_mod will wrongly reject valid modifications when users include spaces after commas, e.g.:

dianncfg --fix_mod "Oxidation (M), Carbamidomethyl (C)"

fix_mod.split(",") yields " Carbamidomethyl (C)"; this is passed unchanged to get_mod, whose mod.split(" ")[0] becomes "" (leading space + explicit separator), so no Unimod match is found and the CLI exits with an “Unsupported modification” error.

Additionally, click passes None when options are omitted, but the type hints still say str, and the return type is Tuple[List, List] instead of Tuple[List[str], List[str]].

You can harden this and correct the annotations by stripping tokens and skipping empties:

-from typing import List, Tuple
+from typing import List, Tuple, Optional
@@
-def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]:
-    var_ptm = []
-    fix_ptm = []
-    if fix_mod:
-        merged = defaultdict(list)
-        for mod in fix_mod.split(","):
-            diann_mod, site = get_mod(mod, "fixed_mod")
-            merged[diann_mod].append(site)
+def convert_mod(fix_mod: Optional[str], var_mod: Optional[str]) -> Tuple[List[str], List[str]]:
+    var_ptm: List[str] = []
+    fix_ptm: List[str] = []
+    if fix_mod:
+        merged = defaultdict(list)
+        for raw_mod in fix_mod.split(","):
+            mod = raw_mod.strip()
+            if not mod:
+                continue
+            diann_mod, site = get_mod(mod, "fixed_mod")
+            merged[diann_mod].append(site)
@@
-    if var_mod:
-        merged = defaultdict(list)
-        for mod in var_mod.split(","):
-            diann_mod, site = get_mod(mod, "var_mod")
-            merged[diann_mod].append(site)
+    if var_mod:
+        merged = defaultdict(list)
+        for raw_mod in var_mod.split(","):
+            mod = raw_mod.strip()
+            if not mod:
+                continue
+            diann_mod, site = get_mod(mod, "var_mod")
+            merged[diann_mod].append(site)

(Plus, ensure Optional is imported from typing as shown.)

This keeps behavior identical for well‑formed inputs, but avoids surprising failures for the very common "A, B" style and documents the actual call contract.

🧹 Nitpick comments (2)
quantmsutils/diann/dianncfg.py (2)

17-40: Lazy Unimod initialization looks good; consider explicit testing hook

The lazy get_unimod_database() pattern and documentation are clear and avoid work at import time. If you expect frequent mocking in tests, you might optionally expose a tiny helper (e.g., set_unimod_database(db)) instead of having tests reach into _unimod_database directly, but the current approach is acceptable for this CLI‑oriented module.


78-143: get_mod improvements are solid; only minor style nits remain

The added docstring, Unimod‑only error message, multiplex guard, and especially the check for missing sites before indexing eliminate the earlier IndexError risk and make failures much clearer. The control flow is correct for both fixed and variable mods, including the Met-loss special case.

If you touch this again, you might consider making modification_found a proper boolean and/or normalizing mod once (e.g., mod_name = mod.split(" ", 1)[0].strip()) for readability, but there’s no functional issue here.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9fdb40c and 1b0fe77.

📒 Files selected for processing (1)
  • quantmsutils/diann/dianncfg.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.10)
  • GitHub Check: build

@ypriverol ypriverol merged commit 994602a into main Nov 27, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants