-
-
Notifications
You must be signed in to change notification settings - Fork 55
bug: Fix field removal and blanking to clean up child UID references #293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Ensure child field UIDs are removed from internal lookup when parent sequence is deleted or blanked - Update tests in test_action_interaction.py to show this behavior for blanked and removed sequences - Prevent stale references and errors in subsequent recipe actions Before the fix: - If you removed or blanked a parent sequence tag, the internal field lookup (self.fields) still contained UIDs for child tags. - Subsequent recipe actions targeting child tags could cause errors (KeyError, IndexError) or operate on stale references. After the fix: - When you remove or blank a parent sequence tag, all child UIDs nested under that parent are also removed from self.fields. - Subsequent actions do not find stale child references, preventing errors and ensuring consistency.
|
@vsoch while using the tool on some data, I noticed that when one remove or blank a parent sequence tag (for example, a tag representing a DICOM sequence), the actual DICOM data structure deletes or blanks the parent tag and all its child elements, but the internal lookup (self.fields) still retained entries for the child tags that were previously nested under the parent sequence. This caused errors in the deidentification process if the child tag was also included in the recipe with another action. I fixed it in the fix proposed in this PR. |
|
@vsoch do you have any comments on this PR? |
vsoch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good - with one question.
Apologies for delay - I was traveling since last Saturday and got back last night!
deid/dicom/parser.py
Outdated
| child_uids = [ | ||
| uid for uid in self.fields if uid.startswith(field_uid_prefix) | ||
| ] | ||
| for child_uid in child_uids: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. When we blank a field, it technically is still there (just blank) so why would we remove it (and child fields) here? Also, if there is shared logic between these functions to derive and yield child fields, we might instead have a shared function:
for child_uid in self.yield_child_fields(field):
# do action(s)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. When we blank a field, it technically is still there (just blank) so why would we remove it (and child fields) here?
When a parent field is blanked that tag has zero length, hence there are no more child tags. The error raised because the uid lookup still listed those nested tags with other actions, even when they did not exist anymore in the dicom file. In those new lines we do not remove the tags from the dicom files, but just from the field lookup.
Also, if there is shared logic between these functions to derive and yield child fields, we might instead have a shared function:
for child_uid in self.yield_child_fields(field): # do action(s)
Thanks for catching that, I added the new method. I also bumped the version and changelog. Let me know if you have any other comments.
No problem at all, thanks for reviewing it! |
|
@vsoch would you have any other comment on the PR? |
| It is used to find and remove all child fields when blanking or deleting a parent field (e.g., a sequence). | ||
| """ | ||
| field_uid_prefix = field.uid + "__" | ||
| child_uids = [uid for uid in self.fields if uid.startswith(field_uid_prefix)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you want to do yield you would do like:
for uid in self.fields:
if uid.startswith(field_uid_prefix):
yield uidThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you have it now, that would be akin to get_child_fields. The reason for a yield is that you can imagine wanting to go through and stop early (and not need to continue yielding).
bug: Fix field removal and blanking to clean up child UID references
In the DICOM anonymization workflow, the parser maintains an internal lookup (self.fields) that maps unique identifiers (UIDs) to DICOM fields. These UIDs represent both top-level tags and nested tags (such as those inside sequences).
Before the fix:
After the fix:
Summary of PR:
Checklist