Skip to content

Conversation

@Simlomb
Copy link
Contributor

@Simlomb Simlomb commented Nov 17, 2025

bug: Fix field removal and blanking to clean up child UID references

In the DICOM anonymization workflow, the parser maintains an internal lookup (self.fields) that maps unique identifiers (UIDs) to DICOM fields. These UIDs represent both top-level tags and nested tags (such as those inside sequences).

Before the fix:

  • When you remove or blank a parent sequence tag (for example, a tag representing a DICOM sequence), the actual DICOM data structure deletes or blanks the parent tag and all its child elements.
  • However, the internal lookup (self.fields) still retained entries for the child tags that were previously nested under the parent sequence.
  • This means that, although the DICOM data no longer contained those child tags, the parser’s internal cache still referenced them.
  • If a subsequent recipe action (such as REPLACE, BLANK, REMOVE, etc.) tried to operate on one of these child tags, the parser would find the UID in its internal lookup and attempt to perform the action.
  • When you remove or blank a parent sequence tag (for example, a tag representing a DICOM sequence), the actual DICOM data structure deletes or blanks the parent tag and all its child elements.
  • Since the actual DICOM data no longer contained the child tag, this would result in error.

After the fix:

  • When you remove or blank a parent sequence tag, all child UIDs nested under that parent are also removed from self.fields.
  • Subsequent actions do not find stale child references, preventing errors and ensuring consistency.

Summary of PR:

  • Ensure child field UIDs are removed from internal lookup when parent sequence is deleted or blanked
  • Update tests in test_action_interaction.py to show this behavior for blanked and removed sequences
  • Prevent stale references and errors in subsequent recipe actions

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • My code follows the style guidelines of this project

- Ensure child field UIDs are removed from internal lookup when parent sequence is deleted or blanked
- Update tests in test_action_interaction.py to show this behavior for blanked and removed sequences
- Prevent stale references and errors in subsequent recipe actions

Before the fix:

- If you removed or blanked a parent sequence tag, the internal field lookup (self.fields) still contained UIDs for child tags.
- Subsequent recipe actions targeting child tags could cause errors (KeyError, IndexError) or operate on stale references.

After the fix:

- When you remove or blank a parent sequence tag, all child UIDs nested under that parent are also removed from self.fields.
- Subsequent actions do not find stale child references, preventing errors and ensuring consistency.
@Simlomb
Copy link
Contributor Author

Simlomb commented Nov 17, 2025

@vsoch while using the tool on some data, I noticed that when one remove or blank a parent sequence tag (for example, a tag representing a DICOM sequence), the actual DICOM data structure deletes or blanks the parent tag and all its child elements, but the internal lookup (self.fields) still retained entries for the child tags that were previously nested under the parent sequence. This caused errors in the deidentification process if the child tag was also included in the recipe with another action. I fixed it in the fix proposed in this PR.
Please let me know if you have any comment on it.

@Simlomb
Copy link
Contributor Author

Simlomb commented Nov 21, 2025

@vsoch do you have any comments on this PR?

Copy link
Member

@vsoch vsoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good - with one question.

Apologies for delay - I was traveling since last Saturday and got back last night!

child_uids = [
uid for uid in self.fields if uid.startswith(field_uid_prefix)
]
for child_uid in child_uids:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. When we blank a field, it technically is still there (just blank) so why would we remove it (and child fields) here? Also, if there is shared logic between these functions to derive and yield child fields, we might instead have a shared function:

for child_uid in self.yield_child_fields(field):
    # do action(s)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. When we blank a field, it technically is still there (just blank) so why would we remove it (and child fields) here?

When a parent field is blanked that tag has zero length, hence there are no more child tags. The error raised because the uid lookup still listed those nested tags with other actions, even when they did not exist anymore in the dicom file. In those new lines we do not remove the tags from the dicom files, but just from the field lookup.

Also, if there is shared logic between these functions to derive and yield child fields, we might instead have a shared function:

for child_uid in self.yield_child_fields(field):
    # do action(s)

Thanks for catching that, I added the new method. I also bumped the version and changelog. Let me know if you have any other comments.

@Simlomb
Copy link
Contributor Author

Simlomb commented Nov 24, 2025

This looks good - with one question.

Apologies for delay - I was traveling since last Saturday and got back last night!

No problem at all, thanks for reviewing it!

@Simlomb
Copy link
Contributor Author

Simlomb commented Nov 27, 2025

@vsoch would you have any other comment on the PR?

It is used to find and remove all child fields when blanking or deleting a parent field (e.g., a sequence).
"""
field_uid_prefix = field.uid + "__"
child_uids = [uid for uid in self.fields if uid.startswith(field_uid_prefix)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you want to do yield you would do like:

for uid in self.fields:
    if uid.startswith(field_uid_prefix):
        yield uid

Copy link
Member

@vsoch vsoch Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you have it now, that would be akin to get_child_fields. The reason for a yield is that you can imagine wanting to go through and stop early (and not need to continue yielding).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants