-
Notifications
You must be signed in to change notification settings - Fork 4
Update changes in PTM diann handling #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 33 commits
96d2fe1
232e9ec
65fbeb8
1a7b394
c66ff69
1b54714
4557f07
b7cdfe7
5f04fe4
d76f991
9feb130
08715f3
a88d479
6e14a1b
5008129
51f8610
2a03fcc
4c436ad
17ee180
742695d
88989e2
af8aa49
cd28956
a1dc5ac
465f606
78c988d
2deda03
e5310ef
372dbeb
106abbf
00008df
7adceb5
6255ef9
a9251f7
4631f79
c82a93f
8aea86b
9fdb40c
1b0fe77
4128e98
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1 @@ | ||
| __version__ = "0.0.23" | ||
| __version__ = "0.0.24" |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -7,13 +7,37 @@ | |||||||||||||||||||||||||||||||||||||||||
| import logging | ||||||||||||||||||||||||||||||||||||||||||
| import re | ||||||||||||||||||||||||||||||||||||||||||
| from typing import List, Tuple | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| from collections import defaultdict | ||||||||||||||||||||||||||||||||||||||||||
| import click | ||||||||||||||||||||||||||||||||||||||||||
| from sdrf_pipelines.openms.unimod import UnimodDatabase | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| logging.basicConfig(format="%(asctime)s [%(funcName)s] - %(message)s", level=logging.DEBUG) | ||||||||||||||||||||||||||||||||||||||||||
| logger = logging.getLogger(__name__) | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| # Lazy initialization of UnimodDatabase for improved testability. | ||||||||||||||||||||||||||||||||||||||||||
| # The database is created on first access rather than at module import time, | ||||||||||||||||||||||||||||||||||||||||||
| # which allows tests to mock or replace it more easily. | ||||||||||||||||||||||||||||||||||||||||||
| _unimod_database = None | ||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+17
to
+20
|
||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| def get_unimod_database(): | ||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||
| Get the UnimodDatabase instance, creating it lazily on first access. | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| This pattern improves testability by avoiding database initialization at module | ||||||||||||||||||||||||||||||||||||||||||
| import time. For testing purposes, the internal _unimod_database variable can be | ||||||||||||||||||||||||||||||||||||||||||
| set to None to force re-initialization on the next call. | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| :return: The UnimodDatabase instance. | ||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||
| global _unimod_database | ||||||||||||||||||||||||||||||||||||||||||
| if _unimod_database is None: | ||||||||||||||||||||||||||||||||||||||||||
| _unimod_database = UnimodDatabase() | ||||||||||||||||||||||||||||||||||||||||||
| return _unimod_database | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| # Met-loss modification constant (UniMod:765) with mass shift and site specification | ||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||
| # Met-loss modification constant (UniMod:765) with mass shift and site specification | |
| # Met-loss modification constant for DIA-NN. | |
| # Format: "UniMod:765,-131.040485,*nM" | |
| # - UniMod:765: UniMod accession for N-terminal methionine excision (Met-loss) | |
| # - -131.040485: Mass shift corresponding to Met-loss | |
| # - *nM: Site specification (N-terminal Methionine) | |
| # This format matches DIA-NN's expected modification string. When this exact string is encountered | |
| # in the variable modifications list, the code triggers DIA-NN's special handling for Met-loss by | |
| # using the '--met-excision' flag instead of passing it as a generic modification. |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact string comparison with MET_LOSS_MODIFICATION is fragile. If multiple Met-loss modifications with the same site are specified in the input, or if the site ordering changes after sorting at line 167, this comparison may fail. For example, if the merged result has sites in a different order or duplicates, it won't match the constant. Consider comparing just the modification accession (e.g., checking if the mod string starts with "UniMod:765,") rather than requiring an exact match with the site.
| if mod == MET_LOSS_MODIFICATION: | |
| if mod.startswith("UniMod:765,"): |
daichengxin marked this conversation as resolved.
Show resolved
Hide resolved
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The regex pattern is compiled inside the get_mod() function, which means it will be recompiled for every modification processed. Since this function is called in a loop (lines 151, 162), consider moving the pattern compilation to module level (similar to how MET_LOSS_MODIFICATION is defined at line 40) to improve performance when processing multiple modifications.
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modification name comparison uses mod.split(" ")[0] which assumes the modification string format is "ModName (Site)". However, this is fragile - if the input string has leading/trailing whitespace or inconsistent formatting, the comparison will fail. Consider stripping whitespace and adding validation for the expected format, or document the expected input format more clearly in the docstring.
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Accessing the private attribute modification._delta_mono_mass (with leading underscore) breaks encapsulation and could be fragile if the underlying library changes. Consider using a public method if available, or document why direct access to this private attribute is necessary.
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass) | |
| # Prefer public accessor for mono mass if available; fallback to private attribute with documentation. | |
| try: | |
| # Try to use a public method/property if available | |
| mono_mass = modification.get_delta_mono_mass() | |
| except AttributeError: | |
| # No public accessor; document why direct access is necessary | |
| # Direct access to _delta_mono_mass is required as no public method is available. | |
| # This may be fragile if the underlying library changes. | |
| mono_mass = modification._delta_mono_mass | |
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(mono_mass) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using modification_found = 0 and modification_found = 1 as boolean flags is not idiomatic Python. Consider using modification_found = False and modification_found = True instead, or better yet, eliminate the flag by restructuring the logic (e.g., initialize diann_mod_accession = None and check if diann_mod_accession is None: at line 99).
| modification_found = 0 | |
| diann_mod_accession = None | |
| diann_mod_name = None | |
| for modification in get_unimod_database().modifications: | |
| if modification.get_name() == mod.split(" ")[0]: | |
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass) | |
| diann_mod_name = modification.get_name() | |
| modification_found = 1 | |
| break | |
| if modification_found == 0: | |
| diann_mod_accession = None | |
| diann_mod_name = None | |
| for modification in get_unimod_database().modifications: | |
| if modification.get_name() == mod.split(" ")[0]: | |
| diann_mod_accession = modification.get_accession().replace("UNIMOD:", "UniMod:") + "," + str(modification._delta_mono_mass) | |
| diann_mod_name = modification.get_name() | |
| break | |
| if diann_mod_accession is None: |
ypriverol marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using exit(1) directly is not ideal in library code as it terminates the entire Python process. Consider raising a custom exception (e.g., ValueError or a custom ModificationNotSupportedError) instead, allowing callers to handle the error appropriately. This pattern appears in lines 71, 85, and 110.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot open a new pull request to apply changes based on this feedback
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using exit(1) directly is not a best practice in library code, as it terminates the entire Python process and prevents calling code from handling the error. Consider raising an appropriate exception (e.g., ValueError or a custom exception) instead, allowing the CLI command handler to catch it and exit with the appropriate code. This pattern applies to all exit(1) calls in this file (lines 104, 118, 125).
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using exit(1) directly is not ideal in library code as it terminates the entire Python process. Consider raising a custom exception (e.g., ValueError or a custom ModificationNotSupportedError) instead, allowing callers to handle the error appropriately.
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using exit(1) directly is not ideal in library code as it terminates the entire Python process. Consider raising a custom exception (e.g., ValueError or a custom ModificationNotSupportedError) instead, allowing callers to handle the error appropriately.
ypriverol marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
ypriverol marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check if fix_mod: will evaluate to False for both None and empty string "", but if fix_mod is None, the call to fix_mod.split(",") at line 151 will raise an AttributeError. Consider explicitly checking for None or ensuring that the parameters always receive a string value (empty string for no modifications). The same issue exists for var_mod at line 160.
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The site merging logic concatenates sites without a delimiter (e.g., "STY" for phosphorylation on S, T, and Y). However, if a site is multi-character (like "*n" for Protein N-term), this concatenation could produce ambiguous results. For example, "*n" + "K" becomes "*nK" which could be interpreted as "Protein N-term K" (a two-part site from lines 131-138) rather than two separate sites. Consider documenting this behavior or adding validation to ensure sites don't create ambiguous concatenations.
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "merge same modification for different sites" could be more descriptive. Consider explaining the purpose and format of the merge. For example: "Merge same modification for different sites by concatenating sorted unique sites. E.g., 'Phospho,S' and 'Phospho,T' become 'Phospho,ST'". This helps developers understand the expected output format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trim whitespace and align type hints in convert_mod to avoid subtle CLI failures
convert_mod now correctly uses truthiness checks, so None from click options is safe. However:
- When users pass values like
"Oxidation (M), Carbamidomethyl (C)"(note the space after the comma),fix_mod.split(",")produces entries with leading spaces. Becauseget_modusesmod.split(" ")with an explicit separator, the leading space yields an empty first token, and the Unimod name lookup fails, causing an error/exit even though the modification name is valid. - The function signature still declares
fix_mod: str, var_mod: str, but at runtime it acceptsNone(from click), so the annotation is misleading.
You can harden the parsing and align the annotations like this:
-def convert_mod(fix_mod: str, var_mod: str) -> Tuple[List, List]:
+def convert_mod(fix_mod: str | None, var_mod: str | None) -> Tuple[List[str], List[str]]:
@@
- if fix_mod:
- merged = defaultdict(list)
- for mod in fix_mod.split(","):
- diann_mod, site = get_mod(mod, "fixed_mod")
- merged[diann_mod].append(site)
+ if fix_mod:
+ merged = defaultdict(list)
+ for raw_mod in fix_mod.split(","):
+ mod = raw_mod.strip()
+ if not mod:
+ continue
+ diann_mod, site = get_mod(mod, "fixed_mod")
+ merged[diann_mod].append(site)
@@
- if var_mod:
- merged = defaultdict(list)
- for mod in var_mod.split(","):
- diann_mod, site = get_mod(mod, "var_mod")
- merged[diann_mod].append(site)
+ if var_mod:
+ merged = defaultdict(list)
+ for raw_mod in var_mod.split(","):
+ mod = raw_mod.strip()
+ if not mod:
+ continue
+ diann_mod, site = get_mod(mod, "var_mod")
+ merged[diann_mod].append(site)And update the import to include the newer union syntax or Optional as appropriate for your minimum Python version (e.g., from typing import List, Tuple, Optional and Optional[str] if you need pre-3.10 support).
This keeps the CLI robust against common spacing patterns and documents the actual accepted types.
🤖 Prompt for AI Agents
In quantmsutils/diann/dianncfg.py around lines 145–167, tighten parsing and fix
annotations: change the signature to accept optional strings and precise return
types (e.g., def convert_mod(fix_mod: Optional[str], var_mod: Optional[str]) ->
Tuple[List[str], List[str]] and add Optional to imports), and when iterating
split values trim whitespace and ignore empty tokens (e.g., iterate over
(token.strip() for token in fix_mod.split(",")) and only call get_mod on
non-empty tokens) so entries like "Oxidation (M), Carbamidomethyl (C)" parse
correctly; keep the existing merged logic and identical fixes for var_mod.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| click | ||
| sdrf-pipelines>=0.0.31 | ||
| pyopenms>=3.2.0 | ||
| sdrf-pipelines==0.0.33 | ||
| pyopenms>=3.3.0 | ||
coderabbitai[bot] marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| pandas | ||
| pyarrow>=16.1.0 | ||
| scipy | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -84,7 +84,20 @@ def test_diann2mztab_example(self): | |
| ] | ||
| result = run_cli_command("diann2mztab", args) | ||
| assert result.exit_code == 0 | ||
| # Additional assertions could check for expected output files | ||
|
|
||
| def test_dianncfg_example(self): | ||
| """Test generating the DIA-NN config with example data""" | ||
| args = [ | ||
| "--enzyme", | ||
| "Trypsin", | ||
| "--fix_mod", | ||
| "Carbamidomethyl (C)", | ||
| "--var_mod", | ||
| "Oxidation (M),Phospho (S),Phospho (T),Phospho (Y),Acetyl (Protein N-term),Acetyl (K),Acetyl (R),Met-loss (Protein N-term M)", | ||
| ] | ||
| result = run_cli_command("dianncfg", args) | ||
|
|
||
| assert result.exit_code == 0 | ||
|
Comment on lines
+88
to
+100
|
||
|
|
||
|
|
||
| class TestSamplesheetCommands: | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.