MLH-1151 | optimise lineage calculation #5386
Pull Request Overview
This PR optimizes lineage calculation performance in the DeleteHandlerV1 class by replacing a stream-based approach with a more efficient Gremlin traversal pattern. The optimization moves from materializing all edges and checking them in memory to using native graph database operations for filtering.
- Replaces the `.project()` and `.toStream().anyMatch()` pattern with direct Gremlin filtering
- Uses native graph traversal operations (`hasId`, `not`) instead of materializing results in memory
- Maintains the same functional behavior while improving performance for lineage calculations
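To make the contrast concrete, here is a plain-Java model of the check. It is not the TinkerPop or Atlas API: the `Edge` and `Result` records and both method names are hypothetical. `eagerCheck` copies every edge before searching, standing in for the old materialize-then-`anyMatch` shape; `lazyCheck` skips excluded IDs while iterating and returns at the first edge whose source vertex has lineage, standing in for the filtered traversal. The `edgesRead` counter shows how many edges each approach touches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java model of the has-lineage check; not the Atlas or TinkerPop API.
class LineageCheckSketch {
    // Hypothetical edge: its ID and whether its out-vertex has HAS_LINEAGE = true.
    record Edge(String id, boolean outVertexHasLineage) {}

    // Result of a check plus how many edges were read to reach it.
    record Result(boolean hasLineage, int edgesRead) {}

    // Old shape (modelled): read every edge into memory first, then search the copy.
    static Result eagerCheck(List<Edge> edges, Set<String> excludedIds) {
        List<Edge> materialized = new ArrayList<>(edges); // reads all edges up front
        boolean found = materialized.stream()
                .anyMatch(e -> !excludedIds.contains(e.id()) && e.outVertexHasLineage());
        return new Result(found, materialized.size());
    }

    // New shape (modelled): skip excluded edges while iterating and stop at the first hit,
    // which is the effect limit(1).hasNext() has inside the graph database.
    static Result lazyCheck(List<Edge> edges, Set<String> excludedIds) {
        int read = 0;
        for (Edge e : edges) {
            read++;
            if (excludedIds.contains(e.id())) {
                continue; // filtered out, analogous to excluding the ID in the traversal
            }
            if (e.outVertexHasLineage()) {
                return new Result(true, read); // short-circuit on the first match
            }
        }
        return new Result(false, read);
    }
}
```

With edges `e1` (excluded), `e2` (has lineage), `e3` and `e4`, both methods report lineage, but the lazy version stops after reading two edges while the eager version reads all four. Same answer, less work, which is the trade-off the overview describes.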
```java
// Filter out current edge using Gremlin
traversal = traversal.where(not(hasId(currentEdge.getIdForDisplay())));

for (String deletedEdgeId : RequestContext.get().getDeletedEdgesIdsForResetHasLineage()) {
    traversal = traversal.where(not(hasId(deletedEdgeId)));
}
```
Copilot (AI) · Sep 17, 2025
This loop appends a separate `where(not(hasId(...)))` step for every deleted edge ID, so the traversal grows with the number of deleted edges and could perform poorly when there are many of them. Consider building a single filter condition (batch filtering) instead of modifying the traversal once per ID.
Suggested change. Current:

```java
// Filter out current edge using Gremlin
traversal = traversal.where(not(hasId(currentEdge.getIdForDisplay())));
for (String deletedEdgeId : RequestContext.get().getDeletedEdgesIdsForResetHasLineage()) {
    traversal = traversal.where(not(hasId(deletedEdgeId)));
}
```

Suggested:

```java
// Filter out current edge and all deleted edges using a single Gremlin filter
Set<String> excludedEdgeIds = new HashSet<>();
excludedEdgeIds.add(currentEdge.getIdForDisplay());
excludedEdgeIds.addAll(RequestContext.get().getDeletedEdgesIdsForResetHasLineage());
traversal = traversal.where(not(hasId(excludedEdgeIds.toArray(new String[0]))));
```
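The reviewer's point can be modelled in plain Java, with `java.util.function.Predicate` standing in for a Gremlin `where(...)` step. This is an analogy, not how a graph engine evaluates traversals, and the class and method names are hypothetical. The chained version adds one predicate per deleted edge ID, as the loop does, so an edge that is kept is checked once per excluded ID; the single version collects all IDs into one `HashSet` first, as in the suggestion, so each edge costs one lookup. Whether a particular graph backend already optimizes chained steps is not modelled here.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Plain-Java model of the review suggestion; each Predicate stands in for a where(...) step.
class EdgeExclusionSketch {
    // Mirrors the loop: one where(not(hasId(id))) step per excluded ID, chained together.
    static Predicate<String> chainedFilter(String currentEdgeId, List<String> deletedIds, AtomicInteger checks) {
        // First step: exclude the current edge.
        Predicate<String> filter = edgeId -> {
            checks.incrementAndGet();
            return !edgeId.equals(currentEdgeId);
        };
        // One extra step per deleted edge ID.
        for (String deletedId : deletedIds) {
            filter = filter.and(edgeId -> {
                checks.incrementAndGet();
                return !edgeId.equals(deletedId);
            });
        }
        return filter;
    }

    // Mirrors the suggestion: gather every excluded ID once, then test membership in one step.
    static Predicate<String> singleFilter(String currentEdgeId, List<String> deletedIds, AtomicInteger checks) {
        Set<String> excluded = new HashSet<>(deletedIds);
        excluded.add(currentEdgeId);
        return edgeId -> {
            checks.incrementAndGet();
            return !excluded.contains(edgeId);
        };
    }
}
```

For one current edge and three deleted edges, an edge that survives the filter triggers four checks in the chained version and one in the single version; both keep and drop exactly the same edges.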
```java
// Complete the traversal with common operations
```
```java
// Filter out current edge using Gremlin
traversal = traversal.where(not(hasId(currentEdge.getIdForDisplay())));
```
I don't see this change behind a feature flag (FF), @aarshi0301.
Change description
This change is an alternative to the previous implementation, with improved performance.
Testing Notes

Initial Lineage Flows
- Table1 → ProcessA → Table2
  Status: Table1: true | Table2: true | ProcessA: true
- Table0, Table1 → ProcessA → Table2
  Status: Table0: true | Table1: true | Table2: true | ProcessA: true

Deletion Scenarios
- Delete Table2
  Status: Table0: false | Table1: false | Table2: true | ProcessA: false
- Table0 → Process0 → Table1 → ProcessA → Table2
  Status: Table0: true | Table1: true | Process0: true | ProcessA: true | Table2: true
  Delete Table2
  Status: Table0: true | Process0: true | Table1: true | ProcessA: false | Table2: true
- Table1 → ProcessA → Table2, delete ProcessA
  Status: Table1: false | Table2: false | ProcessA: true
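The testing notes can be turned into executable expectations. The sketch below is a toy model whose rules are inferred from the statuses above, not read from `DeleteHandlerV1`: a live process keeps its flag only while it has at least one live input and one live output, a live table keeps it while at least one such process touches it, and deleted entities are reported as `true`, as they are in the notes. `LineageScenarioModel` and `statusAfterDelete` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the testing notes; the rules are assumptions inferred from the listed statuses.
class LineageScenarioModel {
    // Hypothetical process node with the names of its input and output tables.
    record Process(String name, List<String> inputs, List<String> outputs) {}

    // Expected flag per entity after deleting the named entities.
    static Map<String, Boolean> statusAfterDelete(List<Process> processes, Set<String> deleted) {
        Map<String, Boolean> status = new HashMap<>();
        for (Process p : processes) {
            boolean processDeleted = deleted.contains(p.name());
            // A live process needs at least one live input and one live output.
            boolean active = !processDeleted
                    && p.inputs().stream().anyMatch(t -> !deleted.contains(t))
                    && p.outputs().stream().anyMatch(t -> !deleted.contains(t));
            // Deleted entities are reported as true, matching the testing notes.
            status.put(p.name(), processDeleted || active);
            for (List<String> side : List.of(p.inputs(), p.outputs())) {
                for (String table : side) {
                    // A table is true if deleted, or if any active process touches it.
                    boolean soFar = status.getOrDefault(table, false);
                    status.put(table, deleted.contains(table) || soFar || active);
                }
            }
        }
        return status;
    }
}
```

Feeding in the scenarios above, for example Table0 → Process0 → Table1 → ProcessA → Table2 with Table2 deleted, reproduces each listed status, so the same cases could serve as expected values in an integration test.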
Helm Config Changes for Running Tests (Staging PR)
Does this PR require Helm config changes for testing?
(You can proceed with the PR.) ✅
Note
Optimizes lineage recalculation using Gremlin traversal with edge object-ID filtering and adds RequestContext support for deleted edge object IDs; also enables CI for an additional branch.
- `DeleteHandlerV1.updateAssetHasLineageStatusV2(...)`: filters edges by object IDs and uses a Gremlin traversal (`hasId(P.without(...))`, `outV().has(HAS_LINEAGE, true).limit(1).hasNext()`) instead of streaming projections.
- `RequestContext`: adds `deletedEdgesObjectIdsForResetHasLineage` tracking with `addToDeletedEdgesObjectIdsForResetHasLineage(...)` and `getDeletedEdgesObjectIdsForResetHasLineage()`; clears it in `clearCache()`.
- CI: enables the `mindbodylineage` branch in `.github/workflows/maven.yml`.

Written by Cursor Bugbot for commit 865b2c7.
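The request-scoped tracking described in the note can be sketched as follows. This is a simplified, hypothetical stand-in, not the actual `RequestContext` class: it assumes one context per thread via `ThreadLocal` (mirroring the `RequestContext.get()` call in the diff) and reuses the method names from the note. The static `idsToExclude` helper is hypothetical; it shows how the tracked IDs plus the current edge's ID would form the set handed to an exclusion filter such as `hasId(P.without(...))`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for request-scoped tracking; not the actual RequestContext.
class RequestContextSketch {
    // Assumption: one context per thread, reached through a static get().
    private static final ThreadLocal<RequestContextSketch> CURRENT =
            ThreadLocal.withInitial(RequestContextSketch::new);

    private final Set<String> deletedEdgesObjectIdsForResetHasLineage = new HashSet<>();

    static RequestContextSketch get() {
        return CURRENT.get();
    }

    void addToDeletedEdgesObjectIdsForResetHasLineage(String edgeObjectId) {
        deletedEdgesObjectIdsForResetHasLineage.add(edgeObjectId);
    }

    // Read-only view so callers cannot change the tracked IDs behind the context's back.
    Set<String> getDeletedEdgesObjectIdsForResetHasLineage() {
        return Collections.unmodifiableSet(deletedEdgesObjectIdsForResetHasLineage);
    }

    // Assumed to run when the request finishes, so IDs do not leak into the next
    // request served by the same (pooled) thread.
    void clearCache() {
        deletedEdgesObjectIdsForResetHasLineage.clear();
    }

    // Hypothetical helper: the IDs an exclusion filter would skip for the current edge.
    static Set<String> idsToExclude(String currentEdgeObjectId) {
        Set<String> ids = new HashSet<>(get().getDeletedEdgesObjectIdsForResetHasLineage());
        ids.add(currentEdgeObjectId);
        return ids;
    }
}
```

Returning an unmodifiable view keeps all mutation inside the add and clear methods, which makes it easier to reason about when the set is reset.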