fixed diann PTM parse bug #57

daichengxin · 2025-11-24T12:42:29Z

PR Type

Bug fix, Enhancement

Description

Fixed PTM parsing bug in DIA-NN configuration by merging duplicate modifications
Improved handling of multiple sites for same modification using defaultdict
Updated sdrf-pipelines dependency to version 0.0.33 for compatibility
Bumped package version to 0.0.24

Diagram Walkthrough

flowchart LR
  A["PTM Parse Input"] --> B["Collect Sites per Modification"]
  B --> C["Merge Duplicates with defaultdict"]
  C --> D["Format Output with Site List"]
  D --> E["Fixed PTM Output"]

File Walkthrough

Relevant files

Configuration changes

__init__.py `Version bump to 0.0.24` quantmsutils/init.py Updated version from 0.0.23 to 0.0.24	+1/-1
pyproject.toml `Update version and dependency constraints` pyproject.toml Updated package version from 0.0.23 to 0.0.24 Updated sdrf-pipelines dependency from >=0.0.32 to ==0.0.33	+2/-2

Bug fix

dianncfg.py `Refactor PTM parsing to merge duplicate modifications` quantmsutils/diann/dianncfg.py Added `defaultdict` import for efficient modification grouping Refactored `convert_mod()` function to merge duplicate modifications across different sites Changed from appending individual entries to collecting sites in `merged` dictionary Implemented merging logic that combines sites for same modification and formats as `name (sites)` Applied same merging pattern to both fixed and variable PTM processing	+17/-5

Dependencies

environment.yml `Pin sdrf-pipelines to version 0.0.33` environment.yml Updated sdrf-pipelines dependency from >=0.0.31 to ==0.0.33	+1/-1
requirements.txt `Pin sdrf-pipelines to version 0.0.33` requirements.txt Updated sdrf-pipelines dependency from >=0.0.31 to ==0.0.33	+1/-1

Dev

coderabbitai · 2025-11-24T12:43:05Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

qodo-code-review · 2025-11-24T12:43:11Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
⚪	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: No audit logging: The new PTM merging logic adds processing paths without adding any audit or action logging for critical operations or warnings. Referred Code if fix_mod != "": merged = defaultdict(list) for mod in fix_mod.split(","): tag = 0 diann_mod = None for modification in unimod_database.modifications: if modification.get_name() == mod.split(" ")[0]: diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass) tag = 1 break if tag == 0: logging.info( "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " + mod ) continue site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0] if site == "Protein N-term": site = "*n" elif site == "N-term": site = "n" ... (clipped 59 lines) Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Weak error handling: The code uses print for unsupported modifications and lacks structured error handling or logging for edge cases like empty tokens or unknown sites. Referred Code print( "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " + mod ) # merge same modification for different sites Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Print over logging: The code prints warnings instead of using structured logging, reducing log usefulness and consistency. Referred Code print( "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " + mod ) # merge same modification for different sites Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Missing validation: Newly added parsing relies on split(',') and regex without validating tokens or site letters, which may cause unexpected behavior with malformed inputs. Referred Code if fix_mod != "": merged = defaultdict(list) for mod in fix_mod.split(","): tag = 0 diann_mod = None for modification in unimod_database.modifications: if modification.get_name() == mod.split(" ")[0]: diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass) tag = 1 break if tag == 0: logging.info( "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " + mod ) continue site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0] if site == "Protein N-term": site = "*n" elif site == "N-term": site = "n" ... (clipped 59 lines) Learn more about managing compliance generic rules or creating your own custom rules
Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2025-11-24T12:44:04Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	✅ ~~Fix incorrect list conversion for labels~~ Suggestion Impact: The commit removed the use of extend(list(...)) entirely and switched to appending the site string (and refactored label handling into get_mod). This prevents the erroneous character-wise splitting the suggestion highlighted. code diff: - or "mTRAQ" in diann_mod - ): - merged[diann_mod].extend(list(site + "," + "label")) - elif diann_mod is not None: - merged[diann_mod].extend(list(site)) - else: - print( - "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " - + mod - ) + diann_mod, site = get_mod(mod) + merged[diann_mod].append(site) # merge same modification for different sites for name, site_list in merged.items(): @@ -98,35 +105,8 @@ if var_mod != "": merged = defaultdict(list) for mod in var_mod.split(","): - tag = 0 - diann_mod = None - for modification in unimod_database.modifications: - if modification.get_name() == mod.split(" ")[0]: - diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass) - tag = 1 - break - if tag == 0: - print( - "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " - + mod - ) - continue - site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0] - if site == "Protein N-term": - site = "*n" - elif site == "N-term": - site = "n" - - if ( - "TMT" in diann_mod - or "Label" in diann_mod - or "iTRAQ" in diann_mod - or "mTRAQ" in diann_mod - ): - merged[diann_mod].extend(list(site + "," + "label")) - else: - merged[diann_mod].extend(list(site)) - + diann_mod, site = get_mod(mod) + merged[diann_mod].append(site) # merge same modification for different sites Fix incorrect list conversion for label modifications. The expression `list(site` `+ "," + "label")` should be `merged[diann_mod].append(site + ",label")` to prevent splitting the string into a list of characters. quantmsutils/diann/dianncfg.py [84] -merged[diann_mod].extend(list(site + "," + "label")) +merged[diann_mod].append(site + ",label") `[Suggestion processed]` Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a critical bug where `list(site + "," + "label")` incorrectly splits the string into characters, leading to a malformed output. While the proposed fix is also flawed due to subsequent processing logic, identifying this bug is crucial for the function's correctness.	High
High-level	Pinning a dependency version is risky The `sdrf-pipelines` dependency should not be pinned to an exact version. Instead, use a compatible version range like `~=0.0.33` to avoid potential dependency conflicts for users of this package. Examples: pyproject.toml [34] sdrf-pipelines = "==0.0.33" environment.yml [9] - sdrf-pipelines==0.0.33 Solution Walkthrough: Before: # pyproject.toml [tool.poetry.dependencies] sdrf-pipelines = "==0.0.33" # requirements.txt sdrf-pipelines==0.0.33 # environment.yml dependencies: - sdrf-pipelines==0.0.33 After: # pyproject.toml [tool.poetry.dependencies] sdrf-pipelines = "~=0.0.33" # or ">=0.0.33, <0.1.0" # requirements.txt sdrf-pipelines>=0.0.33 # environment.yml dependencies: - sdrf-pipelines>=0.0.33 Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a dependency-pinning issue across multiple files, which is a significant maintainability and usability concern for a library, and proposes a valid best-practice solution.	Medium
General	✅ ~~Simplify site list extension logic~~ Suggestion Impact: The commit replaced uses of extend(list(site)) with append(site) when building the merged dict for both fixed and variable mods. code diff: + diann_mod, site = get_mod(mod) + merged[diann_mod].append(site) # merge same modification for different sites for name, site_list in merged.items(): @@ -98,35 +105,8 @@ if var_mod != "": merged = defaultdict(list) for mod in var_mod.split(","): - tag = 0 - diann_mod = None - for modification in unimod_database.modifications: - if modification.get_name() == mod.split(" ")[0]: - diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass) - tag = 1 - break - if tag == 0: - print( - "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " - + mod - ) - continue - site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0] - if site == "Protein N-term": - site = "*n" - elif site == "N-term": - site = "n" - - if ( - "TMT" in diann_mod - or "Label" in diann_mod - or "iTRAQ" in diann_mod - or "mTRAQ" in diann_mod - ): - merged[diann_mod].extend(list(site + "," + "label")) - else: - merged[diann_mod].extend(list(site)) - + diann_mod, site = get_mod(mod) + merged[diann_mod].append(site) # merge same modification for different sites Simplify the site list extension logic by replacing `merged[diann_mod].extend(list(site))` with the more direct and readable `merged[diann_mod].append(site)`. quantmsutils/diann/dianncfg.py [86] -merged[diann_mod].extend(list(site)) +merged[diann_mod].append(site) `[Suggestion processed]` Suggestion importance[1-10]: 4 __ Why: The suggestion correctly points out that `extend(list(site))` is equivalent to `append(site)` for single-character sites, and proposes a change that improves code clarity and simplicity.	Low
Update

quantmsutils/diann/dianncfg.py

jpfeuffer · 2025-11-24T13:18:22Z

I think the handling of labelling modifications is wrong in general. In the docs it is recommended to add "label" only to mods that do not affect retention time

daichengxin · 2025-11-24T13:30:50Z

Yes, so quantms currently support DIA-LFQ. DIA-iso is not supported. If label support is required, other dia module will need to be coordinated. I'm not sure if it's necessary now.

jpfeuffer · 2025-11-24T13:33:26Z

True. I agree it is not super important. But then we should probably fail instead of producing something wrong. Let's at least fix the aggregation. And add a TODO for the "label" issue.

Copilot

Pull request overview

This PR attempts to fix a PTM parsing bug in DIA-NN configuration by introducing a merging mechanism for duplicate modifications across different sites. However, the implementation contains critical bugs that need to be addressed.

Key Changes:

Introduced defaultdict to collect sites for each modification
Refactored PTM handling to merge modifications with different sites into a single entry
Updated sdrf-pipelines dependency to version 0.0.33 across all dependency files
Bumped package version to 0.0.24

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
quantmsutils/diann/dianncfg.py	Introduces merging logic for PTMs but contains bugs in string-to-list conversion
pyproject.toml	Updates package version to 0.0.24 and pins sdrf-pipelines to 0.0.33
requirements.txt	Pins sdrf-pipelines dependency to version 0.0.33
environment.yml	Pins sdrf-pipelines dependency to version 0.0.33
quantmsutils/init.py	Updates package version string to 0.0.24

Critical Issues Found:

String-to-list conversion bugs (lines 84, 86, 126, 128): The code uses list(site) and list(site + "," + "label") which converts strings into character arrays. For example, list("K,label") produces ['K', ',', 'l', 'a', 'b', 'e', 'l'] instead of keeping it as a single string or list of strings. This breaks the merging logic.
Label modification format inconsistency (lines 95-96, 131-133): The merging approach is incompatible with label modifications that require a ,label suffix. The old format was "ModName,mass,K,label" but the new merging logic doesn't properly handle this case.
Missing test coverage: The convert_mod function lacks functional tests despite being complex and bug-prone. Only a help message test exists for the dianncfg command.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

quantmsutils/diann/dianncfg.py

Copilot · 2025-11-24T13:46:45Z

quantmsutils/diann/dianncfg.py

+                merged[diann_mod].extend(list(site + "," + "label"))
            else:
-                var_ptm.append(diann_mod + "," + site)
+                merged[diann_mod].extend(list(site))


Bug: Using list(site) converts the string into a list of individual characters. For example, if site="*n" (as set on line 116), this produces ['*', 'n'] instead of ['*n'].

This should be: merged[diann_mod].append(site) to add the site as a single string element.

quantmsutils/diann/dianncfg.py

Copilot · 2025-11-24T13:46:45Z

quantmsutils/diann/dianncfg.py

                or "mTRAQ" in diann_mod
            ):
-                fix_ptm.append(diann_mod + "," + site + "," + "label")
+                merged[diann_mod].extend(list(site + "," + "label"))


Bug: Using list(site + "," + "label") converts the string into a list of individual characters rather than a list containing the string. For example, if site="K", this produces ['K', ',', 'l', 'a', 'b', 'e', 'l'] instead of ['K', ',', 'label'].

This should be: merged[diann_mod].extend([site, ",", "label"]) or merged[diann_mod].append(site + "," + "label")

quantmsutils/diann/dianncfg.py

fabianegli · 2025-11-24T16:47:43Z

It would be nice to have a test for the PTM parsing e.g. here or because the parsing is a function, it could be a small unit test.

ypriverol · 2025-11-24T16:49:42Z

It would be nice to have a test for the PTM parsing e.g. here or because the parsing is a function, it could be a small unit test.

This is true, @fabianegli feel free to help us here, we are now quite busy with other issues, including quantms PTMs handling in SDRF DIA and DDA.

tests/test_commands.py

daichengxin added 3 commits May 8, 2025 21:00

Merge pull request #7 from bigbio/dev

96d2fe1

Dev

Merge pull request #9 from bigbio/dev

232e9ec

Dev

fixed diann PTM parse bug

65fbeb8

daichengxin mentioned this pull request Nov 24, 2025

Specifying Phospho S T Y modifications caused bugs bigbio/quantms#608

Closed

qodo-code-review bot added the Review effort 2/5 label Nov 24, 2025

daichengxin requested a review from ypriverol November 24, 2025 13:13

jpfeuffer requested changes Nov 24, 2025

View reviewed changes

quantmsutils/diann/dianncfg.py Outdated Show resolved Hide resolved

ypriverol requested review from Copilot and jpfeuffer November 24, 2025 13:42

ypriverol approved these changes Nov 24, 2025

View reviewed changes

Copilot started reviewing on behalf of ypriverol November 24, 2025 13:42 View session

Copilot finished reviewing on behalf of ypriverol November 24, 2025 13:45

Copilot AI reviewed Nov 24, 2025

View reviewed changes

daichengxin added 5 commits November 25, 2025 19:37

fixed

1a7b394

Update dianncfg.py

c66ff69

Update test_commands.py

1b54714

Update dianncfg.py

4557f07

Update dianncfg.py

b7cdfe7

jpfeuffer reviewed Nov 25, 2025

View reviewed changes

tests/test_commands.py Show resolved Hide resolved

daichengxin added 3 commits November 27, 2025 09:29

Update dianncfg.py

5f04fe4

Update test_commands.py

d76f991

Update dianncfg.py

9feb130

ypriverol merged commit 08715f3 into bigbio:dev Nov 27, 2025
3 checks passed

fixed diann PTM parse bug #57

fixed diann PTM parse bug #57

Uh oh!

Conversation

daichengxin commented Nov 24, 2025 • edited by qodo-code-review bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

coderabbitai bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Other AI code review bot(s) detected

Uh oh!

qodo-code-review bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

Uh oh!

qodo-code-review bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

Uh oh!

jpfeuffer commented Nov 24, 2025

Uh oh!

daichengxin commented Nov 24, 2025

Uh oh!

jpfeuffer commented Nov 24, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fabianegli commented Nov 24, 2025

Uh oh!

ypriverol commented Nov 24, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

daichengxin commented Nov 24, 2025 •

edited by qodo-code-review bot

Loading

coderabbitai bot commented Nov 24, 2025 •

edited

Loading

qodo-code-review bot commented Nov 24, 2025 •

edited

Loading

qodo-code-review bot commented Nov 24, 2025 •

edited

Loading