-
Notifications
You must be signed in to change notification settings - Fork 4
Update changes in PTM diann handling #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fixed diann PTM parse bug
update dev
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughUpdates dependency pins for Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant dianncfg as dianncfg module
participant get_mod as get_mod()
participant unimod as get_unimod_database()
rect rgba(0,128,96,0.06)
Note over dianncfg,unimod: New flow — lazy Unimod DB + get_mod helper
Caller->>dianncfg: dianncfg(..., fix_mod, var_mod)
dianncfg->>dianncfg: convert_mod(fix_mod, var_mod)
dianncfg->>get_mod: request mapping for each mod
end
rect rgba(0,96,160,0.05)
get_mod->>unimod: ensure/init unimod DB & query entry
unimod-->>get_mod: unimod entry/data
get_mod-->>dianncfg: formatted DIA‑NN modifier + site
end
dianncfg->>dianncfg: merge modifiers, normalize terminals, detect MET_LOSS_MODIFICATION
dianncfg-->>Caller: assembled DIA‑NN config (may include --met-excision)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
|||||||||||||||||||||||||
PR Code Suggestions ✨Explore these optional code suggestions:
|
|||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR updates the PTM (Post-Translational Modification) handling in the DIA-NN configuration module. The changes refactor the modification conversion logic to better handle PTM merging across sites and add special support for Met-loss (methionine loss) modifications. This is part of the quantms-utils proteomics workflow, which converts SDRF parameters to DIA-NN parameters.
Key changes include:
- Refactored PTM conversion logic to extract a new
get_mod()helper function for cleaner code organization - Added special handling for Met-loss modifications using DIA-NN's
--met-excisionflag - Implemented PTM merging logic to consolidate the same modification across different sites
- Updated sdrf-pipelines dependency from
>=0.0.31to==0.0.33across all dependency files
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| quantmsutils/diann/dianncfg.py | Refactored modification conversion with new get_mod() function, added Met-loss special handling, and implemented modification merging across sites |
| tests/test_commands.py | Added new test case for dianncfg command with complex PTM examples including Met-loss |
| requirements.txt | Updated sdrf-pipelines to exact version 0.0.33 |
| pyproject.toml | Updated sdrf-pipelines to exact version 0.0.33 |
| environment.yml | Updated sdrf-pipelines to exact version 0.0.33 |
| quantmsutils/init.py | Bumped package version from 0.0.23 to 0.0.24 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "Currently only supported unimod modifications for DIA pipeline. Skipped: " | ||
| + mod | ||
| ) | ||
| exit(1) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using exit(1) directly is not ideal in library code as it terminates the entire Python process. Consider raising a custom exception (e.g., ValueError or a custom ModificationNotSupportedError) instead, allowing callers to handle the error appropriately. This pattern appears in lines 71, 85, and 110.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot open a new pull request to apply changes based on this feedback
| "quantms DIA-NN workflow only support LFQ now! Unsupported modifications: " | ||
| + mod | ||
| ) | ||
| exit(1) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using exit(1) directly is not ideal in library code as it terminates the entire Python process. Consider raising a custom exception (e.g., ValueError or a custom ModificationNotSupportedError) instead, allowing callers to handle the error appropriately.
| "Currently only supported unimod modifications for DIA pipeline. Skipped: " | ||
| + mod | ||
| ) | ||
| exit(1) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using exit(1) directly is not ideal in library code as it terminates the entire Python process. Consider raising a custom exception (e.g., ValueError or a custom ModificationNotSupportedError) instead, allowing callers to handle the error appropriately.
|
@ypriverol I've opened a new pull request, #60, to work on those changes. Once the pull request is ready, I'll request review from you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
quantmsutils/diann/dianncfg.py (1)
54-111: Makeget_moderror handling more defensive and reusableTwo potential sharp edges here:
site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0]assumes there is always at least one parenthesized site; malformed input would raiseIndexErrorinstead of a clear error.- Calling
exit(1)from a helper makes the function hard to reuse programmatically and harder to test in isolation; it’s more idiomatic to raise a descriptive exception and let the CLI layer convert that to a non‑zero exit code.Consider (a) guarding the
findallresult and failing with a clear message if no site is found, and (b) raising a custom exception (orValueError) instead of exiting directly, with thedianncfgcommand catching and reporting it.tests/test_commands.py (1)
88-100: Strengthentest_dianncfg_exampleby asserting on the generated configThis is a useful smoke test for the CLI, but it currently only checks
exit_code == 0. To catch regressions in PTM/enzyme handling, consider also asserting that:
diann_config.cfgis created, and- its contents contain the expected
--cut,--fixed-mod,--var-mod, and--met-excisionflags for this input.That would exercise the new
get_mod/convert_modlogic more thoroughly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
environment.yml(1 hunks)pyproject.toml(1 hunks)quantmsutils/__init__.py(1 hunks)quantmsutils/diann/dianncfg.py(3 hunks)requirements.txt(1 hunks)tests/test_commands.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Agent
- GitHub Check: build
- GitHub Check: build (3.10)
- GitHub Check: build (3.9)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
🔇 Additional comments (4)
environment.yml (1)
9-9: Pinningsdrf-pipelinesin conda env looks consistentPinning to
==0.0.33improves reproducibility and matches the other manifests; just make sure this version is available on your configuredbioconda/conda-forgechannels in CI and production.requirements.txt (1)
2-2: Requirements pin aligned with other manifestsPinning
sdrf-pipelinesto==0.0.33here keeps pip installs consistent with the conda env and Poetry config; no issues from the code side.quantmsutils/__init__.py (1)
1-1: Version bump to 0.0.24 is consistent
__version__matches the version inpyproject.tomland the scope of changes in this PR; nothing else needed here.pyproject.toml (1)
34-34: Poetry dependency pin consistent with other manifestsPinning
sdrf-pipelinesto==0.0.33here keeps Poetry installs aligned with conda and requirements.txt, which should avoid subtle version skew.
|
@ypriverol I've opened a new pull request, #61, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
quantmsutils/diann/dianncfg.py (3)
36-48: Make the Met-loss →--met-excisionmapping more robust than a full-string equalityThe special handling for Met-loss currently depends on an exact match of the full formatted string:
if mod == "UniMod:765,-131.040485,*nM": diann_var_ptm += " --met-excision "This is brittle against:
- Changes in float formatting of the delta mass.
- Potential future tweaks to how
convert_modformats PTMs.Consider parsing
modinto accession / mass / sites and keying the special case only on stable parts (e.g. accessionUniMod:765and sites*nM), e.g.:parts = mod.split(",", 2) accession, _, sites = parts if len(parts) == 3 else (None, None, None) if accession == "UniMod:765" and sites == "*nM": diann_var_ptm += " --met-excision " else: diann_var_ptm += var_ptm_str + modThis preserves behavior while avoiding silent breakage from formatting changes.
54-125:get_modis much clearer and safer; only remaining concern is error handling stylePositives in this refactor:
- The new docstring and
modification_foundflag significantly improve readability and maintainability.- Guarding against missing site matches via
if not sites:avoids the previousIndexErrorrisk when parentheses are absent.- Terminal site normalization and the Met-loss
*nMspecial case are now clearly expressed.The only thing to consider in a future pass is replacing the internal
exit(1)calls with raised exceptions so callers can decide how to handle failures (while the CLI can still convert them to process exits).
128-152:convert_modcorrectly handles missing options and merges sites; tighten typing and empty-token handlingThe updated guards:
if fix_mod: ... if var_mod: ...nicely avoid the previous
None.split(",")crash when options are omitted and still handle empty strings correctly. The defaultdict-based merge to producename,site_stralso looks good.Two small follow-ups you may want to tackle (can be deferred):
- Type hints to match runtime behavior
fix_modandvar_modcan beNoneas well asstr, and the lists hold strings:from typing import List, Tuple, Optional def convert_mod( fix_mod: Optional[str], var_mod: Optional[str] ) -> Tuple[List[str], List[str]]: ...
- Skip empty tokens after splitting
If users accidentally pass a trailing comma,
fix_mod.split(",")can yield"", which would then fail inget_mod. You can filter them cheaply:for mod in (m for m in fix_mod.split(",") if m): ...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
quantmsutils/diann/dianncfg.py(3 hunks)requirements.txt(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.9)
- GitHub Check: build (3.10)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
quantmsutils/diann/dianncfg.py (1)
10-16: Module-level Unimod database and defaultdict usage look appropriateUsing a module-global
UnimodDatabase()plusdefaultdictfor downstream merging is a good fit here: it removes repeated DB construction while keeping PTM merging local toconvert_mod. No functional issues from this change.
Co-authored-by: Copilot <[email protected]>
|
@ypriverol I've opened a new pull request, #63, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: ypriverol <[email protected]>
Co-authored-by: ypriverol <[email protected]>
Implement lazy initialization for UnimodDatabase
|
@ypriverol I've opened a new pull request, #64, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: ypriverol <[email protected]>
Co-authored-by: ypriverol <[email protected]>
Remove unreachable else block in get_mod function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
quantmsutils/diann/dianncfg.py (1)
39-40: Met-loss detection still couples behavior to exact mass string formattingEven though
MET_LOSS_MODIFICATIONis now a named constant, the equality check in the var-mod loop still depends on the exact mass string ("UniMod:765,-131.040485,*nM"). If the Unimod library changes the precision or formatting of_delta_mono_mass, Met-loss will silently stop triggering--met-excisionand instead be treated as a normal--var-mod.To decouple behavior from float formatting, match on stable fields (accession + site) instead of the full string. For example:
-# Met-loss modification constant (UniMod:765) with mass shift and site specification -MET_LOSS_MODIFICATION = "UniMod:765,-131.040485,*nM" +# Met-loss modification accession and site (UniMod:765, N-terminal Met-loss) +MET_LOSS_ACCESSION = "UniMod:765" +MET_LOSS_SITE = "*nM" @@ - for mod in var_ptm: - if mod == MET_LOSS_MODIFICATION: - diann_var_ptm += " --met-excision " - else: - diann_var_ptm += var_ptm_str + mod + for mod in var_ptm: + # Expect format: accession,delta_mass,sites + parts = mod.split(",", 2) + if len(parts) == 3: + accession, _, sites = parts + else: + accession, sites = None, None + + if accession == MET_LOSS_ACCESSION and sites == MET_LOSS_SITE: + diann_var_ptm += " --met-excision " + else: + diann_var_ptm += var_ptm_str + modThis keeps behavior stable even if the Unimod mass representation changes.
Also applies to: 68-71
🧹 Nitpick comments (1)
quantmsutils/diann/dianncfg.py (1)
17-37: Lazy UnimodDatabase initialization looks good; consider adding type annotationsThe lazy
_unimod_database+get_unimod_database()pattern is a nice improvement for testability and import-time cost. To make the intent clearer to readers and type-checkers, consider annotating the storage and accessor:
_unimod_database: UnimodDatabase | None = Nonedef get_unimod_database() -> UnimodDatabase: ...This makes it obvious that callers always receive a fully initialized
UnimodDatabaseinstance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.gitignore(1 hunks)environment.yml(1 hunks)quantmsutils/diann/dianncfg.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- environment.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.10)
- GitHub Check: build (3.9)
- GitHub Check: build
🔇 Additional comments (1)
.gitignore (1)
172-177: Ignoring DIA-NN config and Parquet artifacts is appropriateThe new patterns (
diann_config.cfgand*.parquet) are clearly documented as test-generated artifacts and are safe to keep out of version control. No issues here.
| def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]: | ||
| var_ptm = [] | ||
| fix_ptm = [] | ||
|
|
||
| if fix_mod != "": | ||
| if fix_mod: | ||
| merged = defaultdict(list) | ||
| for mod in fix_mod.split(","): | ||
| tag = 0 | ||
| diann_mod = None | ||
| for modification in unimod_database.modifications: | ||
| if modification.get_name() == mod.split(" ")[0]: | ||
| diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass) | ||
| tag = 1 | ||
| break | ||
| if tag == 0: | ||
| logging.info( | ||
| "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " | ||
| + mod | ||
| ) | ||
| continue | ||
| site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0] | ||
| if site == "Protein N-term": | ||
| site = "*n" | ||
| elif site == "N-term": | ||
| site = "n" | ||
|
|
||
| if ( | ||
| "TMT" in diann_mod | ||
| or "Label" in diann_mod | ||
| or "iTRAQ" in diann_mod | ||
| or "mTRAQ" in diann_mod | ||
| ): | ||
| fix_ptm.append(diann_mod + "," + site + "," + "label") | ||
| elif diann_mod is not None: | ||
| fix_ptm.append(diann_mod + "," + site) | ||
| else: | ||
| print( | ||
| "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " | ||
| + mod | ||
| ) | ||
|
|
||
| if var_mod != "": | ||
| diann_mod, site = get_mod(mod, "fixed_mod") | ||
| merged[diann_mod].append(site) | ||
|
|
||
| # merge same modification for different sites | ||
| for name, site_list in merged.items(): | ||
| site_str = "".join(sorted(set(site_list))) | ||
| fix_ptm.append(f"{name},{site_str}") | ||
|
|
||
| if var_mod: | ||
| merged = defaultdict(list) | ||
| for mod in var_mod.split(","): | ||
| tag = 0 | ||
| diann_mod = None | ||
| for modification in unimod_database.modifications: | ||
| if modification.get_name() == mod.split(" ")[0]: | ||
| diann_mod = modification.get_name() + "," + str(modification._delta_mono_mass) | ||
| tag = 1 | ||
| break | ||
| if tag == 0: | ||
| print( | ||
| "Warning: Currently only supported unimod modifications for DIA pipeline. Skipped: " | ||
| + mod | ||
| ) | ||
| continue | ||
| site = re.findall(pattern, " ".join(mod.split(" ")[1:]))[0] | ||
| if site == "Protein N-term": | ||
| site = "*n" | ||
| elif site == "N-term": | ||
| site = "n" | ||
|
|
||
| if ( | ||
| "TMT" in diann_mod | ||
| or "Label" in diann_mod | ||
| or "iTRAQ" in diann_mod | ||
| or "mTRAQ" in diann_mod | ||
| ): | ||
| var_ptm.append(diann_mod + "," + site + "," + "label") | ||
| else: | ||
| var_ptm.append(diann_mod + "," + site) | ||
| diann_mod, site = get_mod(mod, "var_mod") | ||
| merged[diann_mod].append(site) | ||
| # merge same modification for different sites | ||
| for name, site_list in merged.items(): | ||
| site_str = "".join(sorted(set(site_list))) | ||
| var_ptm.append(f"{name},{site_str}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trim whitespace and align type hints in convert_mod to avoid subtle CLI failures
convert_mod now correctly uses truthiness checks, so None from click options is safe. However:
- When users pass values like
"Oxidation (M), Carbamidomethyl (C)"(note the space after the comma),fix_mod.split(",")produces entries with leading spaces. Becauseget_modusesmod.split(" ")with an explicit separator, the leading space yields an empty first token, and the Unimod name lookup fails, causing an error/exit even though the modification name is valid. - The function signature still declares
fix_mod: str, var_mod: str, but at runtime it acceptsNone(from click), so the annotation is misleading.
You can harden the parsing and align the annotations like this:
-def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]:
+def convert_mod(fix_mod: str | None, var_mod: str | None) -> Tuple[List[str], List[str]]:
@@
- if fix_mod:
- merged = defaultdict(list)
- for mod in fix_mod.split(","):
- diann_mod, site = get_mod(mod, "fixed_mod")
- merged[diann_mod].append(site)
+ if fix_mod:
+ merged = defaultdict(list)
+ for raw_mod in fix_mod.split(","):
+ mod = raw_mod.strip()
+ if not mod:
+ continue
+ diann_mod, site = get_mod(mod, "fixed_mod")
+ merged[diann_mod].append(site)
@@
- if var_mod:
- merged = defaultdict(list)
- for mod in var_mod.split(","):
- diann_mod, site = get_mod(mod, "var_mod")
- merged[diann_mod].append(site)
+ if var_mod:
+ merged = defaultdict(list)
+ for raw_mod in var_mod.split(","):
+ mod = raw_mod.strip()
+ if not mod:
+ continue
+ diann_mod, site = get_mod(mod, "var_mod")
+ merged[diann_mod].append(site)And update the import to include the newer union syntax or Optional as appropriate for your minimum Python version (e.g., from typing import List, Tuple, Optional and Optional[str] if you need pre-3.10 support).
This keeps the CLI robust against common spacing patterns and documents the actual accepted types.
🤖 Prompt for AI Agents
In quantmsutils/diann/dianncfg.py around lines 145–167, tighten parsing and fix
annotations: change the signature to accept optional strings and precise return
types (e.g., def convert_mod(fix_mod: Optional[str], var_mod: Optional[str]) ->
Tuple[List[str], List[str]] and add Optional to imports), and when iterating
split values trim whitespace and ignore empty tokens (e.g., iterate over
(token.strip() for token in fix_mod.split(",")) and only call get_mod on
non-empty tokens) so entries like "Oxidation (M), Carbamidomethyl (C)" parse
correctly; keep the existing merged logic and identical fixes for var_mod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
quantmsutils/diann/dianncfg.py (2)
150-158: LGTM! Consider whitespace handling for robustness.The merging logic using
defaultdictis clean and correctly consolidates identical modifications across different sites.However, a past review comment noted that users might pass values with spaces after commas (e.g.,
"Oxidation (M), Carbamidomethyl (C)"), which would causemod.split(" ")[0]inget_modto fail due to leading whitespace. Consider adding.strip()when iterating:for mod in fix_mod.split(","): mod = mod.strip() if not mod: continue diann_mod, site = get_mod(mod, "fixed_mod")Based on learnings, past review comment already flagged this issue.
161-168: LGTM! Consider whitespace handling for robustness.The variable modification merging logic mirrors the fixed modification approach and works correctly.
Apply the same whitespace trimming here as suggested for fixed modifications:
for mod in var_mod.split(","): mod = mod.strip() if not mod: continue diann_mod, site = get_mod(mod, "var_mod")Based on learnings, past review comment already flagged this issue.
🧹 Nitpick comments (3)
quantmsutils/diann/dianncfg.py (3)
17-36: Lazy initialization improves testability.The lazy initialization pattern addresses the previous concern about module-level database creation. The approach is sound for single-threaded CLI usage.
For future consideration: if this module is ever used in multi-threaded contexts, the global state could race. A thread-safe singleton pattern (with locks) or dependency injection would be more robust, but that's overkill for the current CLI use case.
94-94: Accessing private attribute_delta_mono_massis fragile.Direct access to the private
_delta_mono_massattribute couples this code to the internal implementation of theUnimodDatabasemodification objects. If the upstream library refactors this attribute (renames it, removes it, or changes access patterns), this code will break.Check if the
modificationobject provides a public method likeget_delta_mass()or similar. If not, consider requesting this as a feature insdrf-pipelinesor wrapping the access in a try-except block with a clear error message.try: mass = modification._delta_mono_mass except AttributeError: logging.error(f"Unable to retrieve mass for modification: {modification.get_name()}") exit(1)
146-146: Refine type hints for clarity.The return type
Tuple[List, List]is imprecise. Both lists contain strings, so the annotation should beTuple[List[str], List[str]]. Additionally, since Click passesNonewhen options are omitted (and the function now handles this with truthiness checks), the parameters should be typed asOptional[str].Apply this diff to align the annotations with runtime behavior:
-from typing import List, Tuple +from typing import List, Tuple, Optional -def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]: +def convert_mod(fix_mod: Optional[str], var_mod: Optional[str]) -> Tuple[List[str], List[str]]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsutils/diann/dianncfg.py(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.10)
- GitHub Check: build (3.9)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (8)
quantmsutils/diann/dianncfg.py (8)
10-10: LGTM!The
defaultdictimport is correctly added and used effectively inconvert_modto group modifications by accession.
39-40: LGTM!The
MET_LOSS_MODIFICATIONconstant eliminates the magic string and makes the Met-loss special case explicit and maintainable.
60-60: LGTM!The updated
convert_modcall correctly reflects the new signature that uses the lazy-initialized database internally.
69-72: LGTM!The Met-loss special case handling is now clear and uses the named constant, improving maintainability.
78-87: LGTM!The docstring clearly documents the function's purpose, parameters, return value, and potential exceptions.
89-97: LGTM!Renaming
tagtomodification_foundsignificantly improves code readability. The lookup logic using the lazy database instance is correct.
120-126: LGTM!The error handling for missing site specifications is robust. Checking
if not sites:before accessingsites[0]prevents the IndexError that could have occurred with the previous implementation.
127-143: LGTM!The site normalization logic correctly handles various terminal modification scenarios including Protein N-term, N-term, and compound terminal cases. The special handling for Met-loss is appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 7 changed files in this pull request and generated 16 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Lazy initialization of UnimodDatabase for improved testability. | ||
| # The database is created on first access rather than at module import time, | ||
| # which allows tests to mock or replace it more easily. | ||
| _unimod_database = None |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states "The database is created on first access rather than at module import time, which allows tests to mock or replace it more easily." However, there are no tests in this PR that demonstrate mocking this database, and the implementation uses a module-level global variable which is still difficult to mock properly. If testability is the goal, consider using dependency injection (passing the database as a parameter) or a more robust singleton pattern. The current implementation only partially achieves the stated testability goal.
| modification_found = 0 | ||
| diann_mod_accession = None | ||
| diann_mod_name = None | ||
| for modification in get_unimod_database().modifications: | ||
| if modification.get_name() == mod.split(" ")[0]: | ||
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass) | ||
| diann_mod_name = modification.get_name() | ||
| modification_found = 1 | ||
| break | ||
|
|
||
| if modification_found == 0: |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using modification_found = 0 and modification_found = 1 as boolean flags is not idiomatic Python. Consider using modification_found = False and modification_found = True instead, or better yet, eliminate the flag by restructuring the logic (e.g., initialize diann_mod_accession = None and check if diann_mod_accession is None: at line 99).
| modification_found = 0 | |
| diann_mod_accession = None | |
| diann_mod_name = None | |
| for modification in get_unimod_database().modifications: | |
| if modification.get_name() == mod.split(" ")[0]: | |
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass) | |
| diann_mod_name = modification.get_name() | |
| modification_found = 1 | |
| break | |
| if modification_found == 0: | |
| diann_mod_accession = None | |
| diann_mod_name = None | |
| for modification in get_unimod_database().modifications: | |
| if modification.get_name() == mod.split(" ")[0]: | |
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass) | |
| diann_mod_name = modification.get_name() | |
| break | |
| if diann_mod_accession is None: |
| diann_mod_name = None | ||
| for modification in get_unimod_database().modifications: | ||
| if modification.get_name() == mod.split(" ")[0]: | ||
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Accessing the private attribute modification._delta_mono_mass (with leading underscore) breaks encapsulation and could be fragile if the underlying library changes. Consider using a public method if available, or document why direct access to this private attribute is necessary.
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass) | |
| # Prefer public accessor for mono mass if available; fallback to private attribute with documentation. | |
| try: | |
| # Try to use a public method/property if available | |
| mono_mass = modification.get_delta_mono_mass() | |
| except AttributeError: | |
| # No public accessor; document why direct access is necessary | |
| # Direct access to _delta_mono_mass is required as no public method is available. | |
| # This may be fragile if the underlying library changes. | |
| mono_mass = modification._delta_mono_mass | |
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(mono_mass) |
| for DIA-NN and site is the modification site. | ||
| :raises SystemExit: If the modification is not found in the Unimod database, logs an error and exits. | ||
| """ | ||
| pattern = re.compile(r"\((.*?)\)") |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The regex pattern is compiled inside the get_mod() function, which means it will be recompiled for every modification processed. Since this function is called in a loop (lines 151, 162), consider moving the pattern compilation to module level (similar to how MET_LOSS_MODIFICATION is defined at line 40) to improve performance when processing multiple modifications.
| if fix_mod: | ||
| merged = defaultdict(list) | ||
| for mod in fix_mod.split(","): |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check if fix_mod: will evaluate to False for both None and empty string "", but if fix_mod is None, the call to fix_mod.split(",") at line 151 will raise an AttributeError. Consider explicitly checking for None or ensuring that the parameters always receive a string value (empty string for no modifications). The same issue exists for var_mod at line 160.
|
|
||
| # merge same modification for different sites | ||
| for name, site_list in merged.items(): | ||
| site_str = "".join(sorted(set(site_list))) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The site merging logic concatenates sites without a delimiter (e.g., "STY" for phosphorylation on S, T, and Y). However, if a site is multi-character (like "*n" for Protein N-term), this concatenation could produce ambiguous results. For example, "*n" + "K" becomes "*nK" which could be interpreted as "Protein N-term K" (a two-part site from lines 131-138) rather than two separate sites. Consider documenting this behavior or adding validation to ensure sites don't create ambiguous concatenations.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
quantmsutils/diann/dianncfg.py (2)
60-72: MET-loss handling still coupled to full string (including mass) – consider parsing insteadThe
if mod == MET_LOSS_MODIFICATIONcheck still ties the--met-excisionmapping to the exact formatting of accession, mono mass, and site. Any future change in Unimod mass precision orstr()formatting would silently drop the special case and emit a generic--var-modinstead.You can decouple this by parsing the components and keying only on accession + site:
- for mod in var_ptm: - if mod == MET_LOSS_MODIFICATION: - diann_var_ptm += " --met-excision " - else: - diann_var_ptm += var_ptm_str + mod + for mod in var_ptm: + # Expect format: accession,delta_mass,sites + parts = mod.split(",", 2) + accession, sites = (parts[0], parts[2]) if len(parts) == 3 else (None, None) + + if accession == "UniMod:765" and sites == "*nM": + diann_var_ptm += " --met-excision " + else: + diann_var_ptm += var_ptm_str + modThis preserves behavior while making it resilient to float-format tweaks.
146-170: Fix whitespace-sensitive parsing and align convert_mod type hints with runtime usageRight now
convert_modwill wrongly reject valid modifications when users include spaces after commas, e.g.:dianncfg --fix_mod "Oxidation (M), Carbamidomethyl (C)"
fix_mod.split(",")yields" Carbamidomethyl (C)"; this is passed unchanged toget_mod, whosemod.split(" ")[0]becomes""(leading space + explicit separator), so no Unimod match is found and the CLI exits with an “Unsupported modification” error.Additionally, click passes
Nonewhen options are omitted, but the type hints still saystr, and the return type isTuple[List, List]instead ofTuple[List[str], List[str]].You can harden this and correct the annotations by stripping tokens and skipping empties:
-from typing import List, Tuple +from typing import List, Tuple, Optional @@ -def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]: - var_ptm = [] - fix_ptm = [] - if fix_mod: - merged = defaultdict(list) - for mod in fix_mod.split(","): - diann_mod, site = get_mod(mod, "fixed_mod") - merged[diann_mod].append(site) +def convert_mod(fix_mod: Optional[str], var_mod: Optional[str]) -> Tuple[List[str], List[str]]: + var_ptm: List[str] = [] + fix_ptm: List[str] = [] + if fix_mod: + merged = defaultdict(list) + for raw_mod in fix_mod.split(","): + mod = raw_mod.strip() + if not mod: + continue + diann_mod, site = get_mod(mod, "fixed_mod") + merged[diann_mod].append(site) @@ - if var_mod: - merged = defaultdict(list) - for mod in var_mod.split(","): - diann_mod, site = get_mod(mod, "var_mod") - merged[diann_mod].append(site) + if var_mod: + merged = defaultdict(list) + for raw_mod in var_mod.split(","): + mod = raw_mod.strip() + if not mod: + continue + diann_mod, site = get_mod(mod, "var_mod") + merged[diann_mod].append(site)(Plus, ensure
Optionalis imported fromtypingas shown.)This keeps behavior identical for well‑formed inputs, but avoids surprising failures for the very common
"A, B"style and documents the actual call contract.
🧹 Nitpick comments (2)
quantmsutils/diann/dianncfg.py (2)
17-40: Lazy Unimod initialization looks good; consider explicit testing hookThe lazy
get_unimod_database()pattern and documentation are clear and avoid work at import time. If you expect frequent mocking in tests, you might optionally expose a tiny helper (e.g.,set_unimod_database(db)) instead of having tests reach into_unimod_databasedirectly, but the current approach is acceptable for this CLI‑oriented module.
78-143: get_mod improvements are solid; only minor style nits remainThe added docstring, Unimod‑only error message, multiplex guard, and especially the check for missing
sitesbefore indexing eliminate the earlierIndexErrorrisk and make failures much clearer. The control flow is correct for both fixed and variable mods, including the Met-loss special case.If you touch this again, you might consider making
modification_founda proper boolean and/or normalizingmodonce (e.g.,mod_name = mod.split(" ", 1)[0].strip()) for readability, but there’s no functional issue here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsutils/diann/dianncfg.py(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build (3.9)
- GitHub Check: build (3.10)
- GitHub Check: build
PR Type
Bug fix, Enhancement
Description
Fixed DIA-NN PTM parsing bug with improved modification handling
Refactored PTM conversion logic into reusable
get_mod()functionAdded support for Met-loss modification with
--met-excisionflagMerged same modifications across different sites for cleaner config
Updated sdrf-pipelines dependency to version 0.0.33
Added test case for dianncfg command
Diagram Walkthrough
File Walkthrough
__init__.py
Version bump to 0.0.24quantmsutils/init.py
dianncfg.py
Refactor PTM parsing and fix Met-loss handlingquantmsutils/diann/dianncfg.py
get_mod()function forreusability
UnimodDatabase()initialization to module level--met-excisionflag
using
defaultdictdefaultdictfrom collectionstest_commands.py
Add dianncfg command test casetests/test_commands.py
test_dianncfg_example()to validate dianncfgcommand
parameters
environment.yml
Pin sdrf-pipelines to version 0.0.33environment.yml
pyproject.toml
Pin sdrf-pipelines to version 0.0.33pyproject.toml
requirements.txt
Pin sdrf-pipelines to version 0.0.33requirements.txt
Summary by CodeRabbit
Chores
Improvements
Tests
✏️ Tip: You can customize this high-level summary in your review settings.