Skip to content

Conversation

@daichengxin
Copy link
Collaborator

@daichengxin daichengxin commented Nov 24, 2025

PR Type

Bug fix, Enhancement


Description

  • Fixed PTM parsing bug in DIA-NN configuration by merging duplicate modifications

  • Improved handling of multiple sites for same modification using defaultdict

  • Updated sdrf-pipelines dependency to version 0.0.33 for compatibility

  • Bumped package version to 0.0.24


Diagram Walkthrough

flowchart LR
  A["PTM Parse Input"] --> B["Collect Sites per Modification"]
  B --> C["Merge Duplicates with defaultdict"]
  C --> D["Format Output with Site List"]
  D --> E["Fixed PTM Output"]
Loading

File Walkthrough

Relevant files
Configuration changes
__init__.py
Version bump to 0.0.24                                                                     

quantmsutils/init.py

  • Updated version from 0.0.23 to 0.0.24
+1/-1     
pyproject.toml
Update version and dependency constraints                               

pyproject.toml

  • Updated package version from 0.0.23 to 0.0.24
  • Updated sdrf-pipelines dependency from >=0.0.32 to ==0.0.33
+2/-2     
Bug fix
dianncfg.py
Refactor PTM parsing to merge duplicate modifications       

quantmsutils/diann/dianncfg.py

  • Added defaultdict import for efficient modification grouping
  • Refactored convert_mod() function to merge duplicate modifications
    across different sites
  • Changed from appending individual entries to collecting sites in
    merged dictionary
  • Implemented merging logic that combines sites for same modification
    and formats as name (sites)
  • Applied same merging pattern to both fixed and variable PTM processing
+17/-5   
Dependencies
environment.yml
Pin sdrf-pipelines to version 0.0.33                                         

environment.yml

  • Updated sdrf-pipelines dependency from >=0.0.31 to ==0.0.33
+1/-1     
requirements.txt
Pin sdrf-pipelines to version 0.0.33                                         

requirements.txt

  • Updated sdrf-pipelines dependency from >=0.0.31 to ==0.0.33
+1/-1     

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 24, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Copy link

qodo-code-review bot commented Nov 24, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit logging: The new PTM merging logic adds processing paths without adding any audit or action logging
for critical operations or warnings.

Referred Code
if fix_mod != "":
    merged = defaultdict(list)
    for mod in fix_mod.split(","):
        tag = 0
        diann_mod = None
        for modification in unimod_database.modifications:
            if modification.get_name() == mod.split(" ")[0]:
                diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass)
                tag = 1
                break
        if tag == 0:
            logging.info(
                "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
                + mod
            )
            continue
        site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0]
        if site == "Protein N-term":
            site = "*n"
        elif site == "N-term":
            site = "n"


 ... (clipped 59 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Weak error handling: The code uses print for unsupported modifications and lacks structured error handling or
logging for edge cases like empty tokens or unknown sites.

Referred Code
        print(
            "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
            + mod
        )

# merge same modification for different sites

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Print over logging: The code prints warnings instead of using structured logging, reducing log usefulness and
consistency.

Referred Code
        print(
            "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
            + mod
        )

# merge same modification for different sites

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Missing validation: Newly added parsing relies on split(',') and regex without validating tokens or
site letters, which may cause unexpected behavior with malformed inputs.

Referred Code
if fix_mod != "":
    merged = defaultdict(list)
    for mod in fix_mod.split(","):
        tag = 0
        diann_mod = None
        for modification in unimod_database.modifications:
            if modification.get_name() == mod.split(" ")[0]:
                diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass)
                tag = 1
                break
        if tag == 0:
            logging.info(
                "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
                + mod
            )
            continue
        site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0]
        if site == "Protein N-term":
            site = "*n"
        elif site == "N-term":
            site = "n"


 ... (clipped 59 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

qodo-code-review bot commented Nov 24, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect list conversion for labels
Suggestion Impact:The commit removed the use of extend(list(...)) entirely and switched to appending the site string (and refactored label handling into get_mod). This prevents the erroneous character-wise splitting the suggestion highlighted.

code diff:

-                or "mTRAQ" in diann_mod
-            ):
-                merged[diann_mod].extend(list(site + "," + "label"))
-            elif diann_mod is not None:
-                merged[diann_mod].extend(list(site))
-            else:
-                print(
-                    "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
-                    + mod
-                )
+            diann_mod, site = get_mod(mod)
+            merged[diann_mod].append(site)
 
         # merge same modification for different sites
         for name, site_list in merged.items():
@@ -98,35 +105,8 @@
     if var_mod != "":
         merged = defaultdict(list)
         for mod in var_mod.split(","):
-            tag = 0
-            diann_mod = None
-            for modification in unimod_database.modifications:
-                if modification.get_name() == mod.split(" ")[0]:
-                    diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass)
-                    tag = 1
-                    break
-            if tag == 0:
-                print(
-                    "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
-                    + mod
-                )
-                continue
-            site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0]
-            if site == "Protein N-term":
-                site = "*n"
-            elif site == "N-term":
-                site = "n"
-
-            if (
-                "TMT" in diann_mod
-                or "Label" in diann_mod
-                or "iTRAQ" in diann_mod
-                or "mTRAQ" in diann_mod
-            ):
-                merged[diann_mod].extend(list(site + "," + "label"))
-            else:
-                merged[diann_mod].extend(list(site))
-
+            diann_mod, site = get_mod(mod)
+            merged[diann_mod].append(site)
         # merge same modification for different sites

Fix incorrect list conversion for label modifications. The expression list(site
+ "," + "label") should be merged[diann_mod].append(site + ",label") to prevent
splitting the string into a list of characters.

quantmsutils/diann/dianncfg.py [84]

-merged[diann_mod].extend(list(site + "," + "label"))
+merged[diann_mod].append(site + ",label")

[Suggestion processed]

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical bug where list(site + "," + "label") incorrectly splits the string into characters, leading to a malformed output. While the proposed fix is also flawed due to subsequent processing logic, identifying this bug is crucial for the function's correctness.

High
High-level
Pinning a dependency version is risky

The sdrf-pipelines dependency should not be pinned to an exact version. Instead,
use a compatible version range like ~=0.0.33 to avoid potential dependency
conflicts for users of this package.

Examples:

pyproject.toml [34]
sdrf-pipelines = "==0.0.33"
environment.yml [9]
  - sdrf-pipelines==0.0.33

Solution Walkthrough:

Before:

# pyproject.toml
[tool.poetry.dependencies]
sdrf-pipelines = "==0.0.33"

# requirements.txt
sdrf-pipelines==0.0.33

# environment.yml
dependencies:
  - sdrf-pipelines==0.0.33

After:

# pyproject.toml
[tool.poetry.dependencies]
sdrf-pipelines = "~=0.0.33" # or ">=0.0.33, <0.1.0"

# requirements.txt
sdrf-pipelines>=0.0.33

# environment.yml
dependencies:
  - sdrf-pipelines>=0.0.33
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a dependency-pinning issue across multiple files, which is a significant maintainability and usability concern for a library, and proposes a valid best-practice solution.

Medium
General
Simplify site list extension logic
Suggestion Impact:The commit replaced uses of extend(list(site)) with append(site) when building the merged dict for both fixed and variable mods.

code diff:

+            diann_mod, site = get_mod(mod)
+            merged[diann_mod].append(site)
 
         # merge same modification for different sites
         for name, site_list in merged.items():
@@ -98,35 +105,8 @@
     if var_mod != "":
         merged = defaultdict(list)
         for mod in var_mod.split(","):
-            tag = 0
-            diann_mod = None
-            for modification in unimod_database.modifications:
-                if modification.get_name() == mod.split(" ")[0]:
-                    diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass)
-                    tag = 1
-                    break
-            if tag == 0:
-                print(
-                    "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: "
-                    + mod
-                )
-                continue
-            site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0]
-            if site == "Protein N-term":
-                site = "*n"
-            elif site == "N-term":
-                site = "n"
-
-            if (
-                "TMT" in diann_mod
-                or "Label" in diann_mod
-                or "iTRAQ" in diann_mod
-                or "mTRAQ" in diann_mod
-            ):
-                merged[diann_mod].extend(list(site + "," + "label"))
-            else:
-                merged[diann_mod].extend(list(site))
-
+            diann_mod, site = get_mod(mod)
+            merged[diann_mod].append(site)
         # merge same modification for different sites

Simplify the site list extension logic by replacing
merged[diann_mod].extend(list(site)) with the more direct and readable
merged[diann_mod].append(site).

quantmsutils/diann/dianncfg.py [86]

-merged[diann_mod].extend(list(site))
+merged[diann_mod].append(site)

[Suggestion processed]

Suggestion importance[1-10]: 4

__

Why: The suggestion correctly points out that extend(list(site)) is equivalent to append(site) for single-character sites, and proposes a change that improves code clarity and simplicity.

Low
  • Update

@jpfeuffer
Copy link
Contributor

I think the handling of labelling modifications is wrong in general. In the docs it is recommended to add "label" only to mods that do not affect retention time

@daichengxin
Copy link
Collaborator Author

Yes, so quantms currently support DIA-LFQ. DIA-iso is not supported. If label support is required, other dia module will need to be coordinated. I'm not sure if it's necessary now.

@jpfeuffer
Copy link
Contributor

True. I agree it is not super important. But then we should probably fail instead of producing something wrong. Let's at least fix the aggregation. And add a TODO for the "label" issue.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to fix a PTM parsing bug in DIA-NN configuration by introducing a merging mechanism for duplicate modifications across different sites. However, the implementation contains critical bugs that need to be addressed.

Key Changes:

  • Introduced defaultdict to collect sites for each modification
  • Refactored PTM handling to merge modifications with different sites into a single entry
  • Updated sdrf-pipelines dependency to version 0.0.33 across all dependency files
  • Bumped package version to 0.0.24

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
quantmsutils/diann/dianncfg.py Introduces merging logic for PTMs but contains bugs in string-to-list conversion
pyproject.toml Updates package version to 0.0.24 and pins sdrf-pipelines to 0.0.33
requirements.txt Pins sdrf-pipelines dependency to version 0.0.33
environment.yml Pins sdrf-pipelines dependency to version 0.0.33
quantmsutils/init.py Updates package version string to 0.0.24

Critical Issues Found:

  1. String-to-list conversion bugs (lines 84, 86, 126, 128): The code uses list(site) and list(site + "," + "label") which converts strings into character arrays. For example, list("K,label") produces ['K', ',', 'l', 'a', 'b', 'e', 'l'] instead of keeping it as a single string or list of strings. This breaks the merging logic.

  2. Label modification format inconsistency (lines 95-96, 131-133): The merging approach is incompatible with label modifications that require a ,label suffix. The old format was "ModName,mass,K,label" but the new merging logic doesn't properly handle this case.

  3. Missing test coverage: The convert_mod function lacks functional tests despite being complex and bug-prone. Only a help message test exists for the dianncfg command.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

merged[diann_mod].extend(list(site + "," + "label"))
else:
var_ptm.append(diann_mod + "," + site)
merged[diann_mod].extend(list(site))
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Using list(site) converts the string into a list of individual characters. For example, if site="*n" (as set on line 116), this produces ['*', 'n'] instead of ['*n'].

This should be: merged[diann_mod].append(site) to add the site as a single string element.

Copilot uses AI. Check for mistakes.
or "mTRAQ" in diann_mod
):
fix_ptm.append(diann_mod + "," + site + "," + "label")
merged[diann_mod].extend(list(site + "," + "label"))
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Using list(site + "," + "label") converts the string into a list of individual characters rather than a list containing the string. For example, if site="K", this produces ['K', ',', 'l', 'a', 'b', 'e', 'l'] instead of ['K', ',', 'label'].

This should be: merged[diann_mod].extend([site, ",", "label"]) or merged[diann_mod].append(site + "," + "label")

Copilot uses AI. Check for mistakes.
@fabianegli
Copy link
Contributor

It would be nice to have a test for the PTM parsing e.g. here or because the parsing is a function, it could be a small unit test.

@ypriverol
Copy link
Member

It would be nice to have a test for the PTM parsing e.g. here or because the parsing is a function, it could be a small unit test.

This is true, @fabianegli feel free to help us here, we are now quite busy with other issues, including quantms PTMs handling in SDRF DIA and DDA.

@ypriverol ypriverol merged commit 08715f3 into bigbio:dev Nov 27, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants