@emkornfield
Collaborator

@emkornfield emkornfield commented Oct 28, 2025

🥞 Stacked PR

Use this link to review incremental changes.


This PR adds the update_deletion_vectors API to the Transaction type, enabling engines to update deletion vectors for existing files without rewriting data. This completes the write-side support for Delta Lake's deletion vector feature, allowing efficient row-level deletes and updates.
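As a rough illustration of why a deletion vector avoids rewriting data, here is a minimal sketch. `SketchDeletionVector` and `visible_rows` are hypothetical stand-ins invented for this example, not the kernel's `KernelDeletionVector` or any reader API from this PR. The sketch assumes a deletion vector can be modeled as a set of deleted row indexes that a reader applies as a filter.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a deletion vector: the set of deleted row indexes.
/// This is NOT the kernel's `KernelDeletionVector`; it only models the idea.
#[derive(Debug, Default, Clone, PartialEq)]
struct SketchDeletionVector {
    deleted: BTreeSet<u64>,
}

impl SketchDeletionVector {
    /// Mark the given row indexes as deleted (duplicates are ignored).
    fn add_deleted_row_indexes(&mut self, indexes: impl IntoIterator<Item = u64>) {
        self.deleted.extend(indexes);
    }

    /// Report whether a row index is marked as deleted.
    fn is_deleted(&self, row: u64) -> bool {
        self.deleted.contains(&row)
    }
}

/// Return the rows a reader would see: the file's rows minus those in the DV.
/// The underlying `rows` slice (the "data file") is never modified.
fn visible_rows<'a>(rows: &'a [&'a str], dv: &SketchDeletionVector) -> Vec<&'a str> {
    rows.iter()
        .enumerate()
        .filter(|(i, _)| !dv.is_deleted(*i as u64))
        .map(|(_, r)| *r)
        .collect()
}

fn main() {
    // An in-memory stand-in for the rows of one data file.
    let rows = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"];

    // Mark rows 2, 5, and 7 as deleted without touching `rows`.
    let mut dv = SketchDeletionVector::default();
    dv.add_deleted_row_indexes([2, 5, 7]);

    let visible = visible_rows(&rows, &dv);
    assert_eq!(visible, vec!["r0", "r1", "r3", "r4", "r6"]);
    println!("{visible:?}");
}
```

In this model a delete only grows the index set; the data itself stays untouched.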

@github-actions github-actions bot added the breaking-change Change that require a major version bump label Oct 28, 2025
@emkornfield emkornfield changed the title add deletion vector APIs to transaction WIP NOT READY FOR REVIEW add deletion vector APIs to transaction Oct 28, 2025
@emkornfield emkornfield force-pushed the stack/dv_transaction branch 4 times, most recently from 6a65734 to 688fabc Compare November 4, 2025 01:28
@codecov

codecov bot commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 90.96774% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.13%. Comparing base (cfb3320) to head (33e1f56).
⚠️ Report is 7 commits behind head on main.

Files with missing lines                                Patch %   Lines
kernel/src/transaction/mod.rs                           90.80%    7 Missing and 25 partials ⚠️
test-utils/src/lib.rs                                   89.23%    1 Missing and 6 partials ⚠️
...src/engine/arrow_expression/evaluate_expression.rs   0.00%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1430      +/-   ##
==========================================
+ Coverage   84.99%   85.13%   +0.14%     
==========================================
  Files         120      120              
  Lines       31662    32259     +597     
  Branches    31662    32259     +597     
==========================================
+ Hits        26911    27465     +554     
- Misses       3441     3450       +9     
- Partials     1310     1344      +34     

.with_dropped_field("modificationTime"),
);
let expr = Expression::struct_from([transform]);
Ok(remove_files_metadata.map(
Collaborator Author

Note to reviewer: this is a formatting-only change, a side effect of no longer referencing self.remove_file_metadata.

)
.with_dropped_field(FILE_CONSTANT_VALUES_NAME)
.with_dropped_field("modificationTime");
for column_to_drop in columns_to_drop {
Collaborator Author

Note to reviewer: this is an actual change.


// Step 4: Create deletion vectors marking rows 2, 5, and 7 as deleted
let mut dv = KernelDeletionVector::new();
dv.add_deleted_row_indexes([2, 5, 7]);
Collaborator Author

@emkornfield emkornfield Nov 5, 2025

Updates for two files:

  1. Apply this to only the first file (with updated indices).
  2. Verify the deleted rows.
  3. Delete one more row from the same file, and delete some rows from the second file.
  4. Verify the deleted rows again.
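
The steps above can be sketched roughly as follows. This is a self-contained model, not the test from this PR: `DvTable`, `update_dv`, `deleted_rows`, and the file names are hypothetical, and the sketch assumes that each update is merged into the file's existing set of deleted rows.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model: file path -> set of deleted row indexes.
/// This is not a kernel type; it only mirrors the shape of the test plan.
type DvTable = BTreeMap<String, BTreeSet<u64>>;

/// Merge newly deleted row indexes into the file's existing set.
/// Assumption: updates are additive (a deleted row is never un-deleted here).
fn update_dv(table: &mut DvTable, file: &str, rows: impl IntoIterator<Item = u64>) {
    table.entry(file.to_string()).or_default().extend(rows);
}

/// Deleted rows for a file in ascending order (empty if the file has no DV).
fn deleted_rows(table: &DvTable, file: &str) -> Vec<u64> {
    table
        .get(file)
        .map(|s| s.iter().copied().collect::<Vec<u64>>())
        .unwrap_or_default()
}

fn main() {
    let mut table = DvTable::new();

    // 1. Apply a deletion vector to only the first file.
    update_dv(&mut table, "file1.parquet", [2, 5, 7]);

    // 2. Verify the deleted rows: the second file is untouched.
    assert_eq!(deleted_rows(&table, "file1.parquet"), vec![2, 5, 7]);
    assert!(deleted_rows(&table, "file2.parquet").is_empty());

    // 3. Delete one more row from the first file, and some rows from the second.
    update_dv(&mut table, "file1.parquet", [3]);
    update_dv(&mut table, "file2.parquet", [0, 4]);

    // 4. Verify the deleted rows again.
    assert_eq!(deleted_rows(&table, "file1.parquet"), vec![2, 3, 5, 7]);
    assert_eq!(deleted_rows(&table, "file2.parquet"), vec![0, 4]);
}
```

Whether the real API expects the caller to pass the full new deletion vector or only the newly deleted rows is not stated here; the sketch picks the additive interpretation for simplicity.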

.transaction(Box::new(FileSystemCommitter::new()))?
.with_dv_update();
let write_context = temp_txn.get_write_context();
let dv_path = write_context.new_deletion_vector_path(String::from(""));
Collaborator Author

Add a parameter comment noting that the prefix is empty.

@emkornfield emkornfield force-pushed the stack/dv_transaction branch 3 times, most recently from 49d2e7a to 7747581 Compare November 6, 2025 18:54
@emkornfield emkornfield changed the title WIP NOT READY FOR REVIEW add deletion vector APIs to transaction add deletion vector APIs to transaction Nov 6, 2025
@emkornfield emkornfield requested a review from nicklan November 6, 2025 18:58
@emkornfield emkornfield marked this pull request as ready for review November 6, 2025 18:58
Labels

breaking-change Change that require a major version bump

2 participants