-
Notifications
You must be signed in to change notification settings - Fork 120
feat!: Add optional stats field to remove action #1390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: Add optional stats field to remove action #1390
Conversation
08fdb5c to
d959169
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1390 +/- ##
==========================================
- Coverage 84.88% 84.85% -0.03%
==========================================
Files 119 119
Lines 30987 31031 +44
Branches 30987 31031 +44
==========================================
+ Hits 26303 26332 +29
- Misses 3415 3424 +9
- Partials 1269 1275 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
scovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Please also run+fix the examples (inspect-table is already broken from previous schema changes that weren't incorporated, #1372)
d959169 to
05176bf
Compare
05176bf to
7637fb9
Compare
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1281/files) to review incremental changes. - [**stack/with_data_change**](#1281) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1281/files)] - [stack/add_stats_to_remove](#1390) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1390/files/a9159f03f6260388d7bbf9f4fd2ca06f8db21f8f..7637fb90686733d81815f4fae5f9e47880f55a4f)] - [stack/remove_files](#1353) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1353/files/7637fb90686733d81815f4fae5f9e47880f55a4f..94e2108cbea608b98284c634807b4f265cddfa29)] --------- This PR adds a with_data_change method to apply whether a transaction represents a data change to all the file additions (and in the future removals). This helps prevents clients from making accidentally mixing files marked with dataChange = true or false, and is in the long term the direction we want to go in. In the future we will also like want to physically record dataChange at the commit level, and move away from per file denotation. Some implications of this change: * The schema of add_files is now dependent on whether this flag is sent (this is to allow connectors to maintain backwards compatibility). * Change the default engine to no longer pass through dataChange at the file level but instead require using the new setter. Testing: 1. Existing FFI tests verify backwards compatibility. 2. Write tests now call this method with the default engine and no change of output happens. 3. An additional test is added to verify setting the flag to false, writes the appropriate value on add actions. BREAKING CHANGE: Engines must use with_data_change on the transaction level instead of passing it to the method. `add_files_schema` is moved to be scoped on a the transaction.
7637fb9 to
9317082
Compare
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1281/files) to review incremental changes. - [**stack/with_data_change**](delta-io#1281) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1281/files)] - [stack/add_stats_to_remove](delta-io#1390) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1390/files/a9159f03f6260388d7bbf9f4fd2ca06f8db21f8f..7637fb90686733d81815f4fae5f9e47880f55a4f)] - [stack/remove_files](delta-io#1353) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1353/files/7637fb90686733d81815f4fae5f9e47880f55a4f..94e2108cbea608b98284c634807b4f265cddfa29)] --------- This PR adds a with_data_change method to apply whether a transaction represents a data change to all the file additions (and in the future removals). This helps prevents clients from making accidentally mixing files marked with dataChange = true or false, and is in the long term the direction we want to go in. In the future we will also like want to physically record dataChange at the commit level, and move away from per file denotation. Some implications of this change: * The schema of add_files is now dependent on whether this flag is sent (this is to allow connectors to maintain backwards compatibility). * Change the default engine to no longer pass through dataChange at the file level but instead require using the new setter. Testing: 1. Existing FFI tests verify backwards compatibility. 2. Write tests now call this method with the default engine and no change of output happens. 3. An additional test is added to verify setting the flag to false, writes the appropriate value on add actions. BREAKING CHANGE: Engines must use with_data_change on the transaction level instead of passing it to the method. `add_files_schema` is moved to be scoped on a the transaction.
nicklan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fine, but I think there's a small bug
| let base_row_id: Option<i64> = getters[12].get_opt(row_index, "remove.baseRowId")?; | ||
| let default_row_commit_version: Option<i64> = | ||
| getters[13].get_opt(row_index, "remove.defaultRowCommitVersion")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have to bump up too right?
| let base_row_id: Option<i64> = getters[13].get_opt(row_index, "remove.baseRowId")?; | |
| let default_row_commit_version: Option<i64> = | |
| getters[14].get_opt(row_index, "remove.defaultRowCommitVersion")?; |
Probably indicates we have a lack of test coverage for these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed and added full test that should catch this in the future.
9317082 to
c89bfcd
Compare
nicklan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for adding the test. lgtm!
|
@scovich at least with current CI coverage on examples this PR passes. Was there a particular command invocation that you are concerned about? If not would you mind stamping? |
There are no removes in the table CI ran over, so indexing errors in the remove visitor and any actions checked after that will not be exercised. But the example was already broken before this change, so we should prob just fix it separately. |
scovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
#1451 to track this. |
|
Thanks for the reviews merging. |
🥞 Stacked PR
Use this link to review incremental changes.
An optional stats field is part of the remove schema. This PR adds it here. While utility of stats can be limited it is important to propagate for remove_actions in transactions to appropriately record number of records removed on a remove action. It is likely also important to not drop this field when doing things like log_compaction.
BREAKING CHANGE: Adds a field in the middle of a the remove_file schema.