Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 36 additions & 0 deletions kernel/src/kernel_predicates/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -842,6 +842,33 @@ pub trait DataSkippingPredicateEvaluator {
inverted: bool,
) -> Option<Self::Output> {
let max = self.get_max_stat(col, &val.data_type())?;

// Delta Lake min/max stats are stored with millisecond precision (truncated, not rounded up), so we can't use direct comparison.
// For max stats comparison, we adjust timestamp values by subtracting 999 microseconds from the value to ensure
// that comparisons against max stats are correct. Any rows that pass this filter will be re-evaluated later for exact matches.
// See:
// - https://github.com/delta-io/delta-kernel-rs/pull/1003
if matches!(val, Scalar::Timestamp(_) | Scalar::TimestampNtz(_)) {
match (ord, inverted) {
// col > val => stats.max.col > val
// NOT(col < val) => NOT(stats.max.col < val)
(Ordering::Greater, false) | (Ordering::Less, true) => {
let max_ts_adjusted = timestamp_subtract(val, 999);
tracing::debug!(
"Adjusted timestamp value for col {col} for max stat comparison from {val:?} to {max_ts_adjusted:?}"
);
return self.eval_partial_cmp(ord, max, &max_ts_adjusted, inverted);
Comment on lines +856 to +860
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this always be subtracting the val?

Suppose the max is 2500, but was truncated to 2000.
Suppose that value is 2250. => adjusted to 1251 

for expression:     value < max
actual:             2250  < 2500 == true
adjusted/truncated: 1251  < 2000 == true.
=> correctly not filtered

for expression:     value > max
actual:             2250  > 2500 == false
adjusted/truncated: 1251  > 2000 == false
=> correctly filtered


now suppose value is 2750 => adjusted to 1751

for expression:     value < max:
actual:             2750  < 2500 == false
adjusted/truncated: 1751  < 2000 == true 
=> not filtered (safe)

for expression:     value > max:
actual:             2750  > 2500 == true
adjusted/truncated: 1751  > 2000 == false 
=> filtered (not safe)❗ ❗ 

Am I missing something here? Could you add some extra details/context in comments to serve as a proof?

Copy link
Collaborator

@OussamaSaoudi OussamaSaoudi Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that we're never doing a comparison value > max since we only ever want to check value <= max => ! (value < max).

EDIT: should be value <= max => ! (value > max).

Copy link
Author

@sgrebnov sgrebnov Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OussamaSaoudi - thank you for the deep dive and review! Yeah, I was thinking about the same when working on the fix - we only do max > value or !(max < value), decreasing val helps keep more records in both cases. Please let me know if you would like me to be more specific and add match for ord=Greater, inverted=false and ord=Less, inverted=true only

image

Other reasoning I used:

  1. If min is always truncated to lower value and we don't do special logic than the same should work with max increased (but we increase max by decreasing val used for comparison).
  2. if we can increase max, for example have (max + 999) and this is valid (keep more records): (max + 999) operator val then (max + 999-999) operator val-999 is valid as well so we have our current logic (max operator val-999 case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know if you would like me to be more specific and add match for ord=Greater, inverted=false and ord=Less, inverted=true only

Good idea! Let's handle by cases.

  1. max > value or max >= value. Represented by: (ord=Greater, inverted=false) and (ord=Less, inverted=true)
    • Proof: If max >= value + 999, it must be the case that max >= value
  2. max < value or max <= value. (ord=Less, inverted=false) and (ord=Greater, inverted=true)
    • Proof: if max <= value - 999 , then it must be the case that max <= value
    • We can adjust this to max + 999 <= value to avoid underflow

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @scovich Would love your eyes on this.

Copy link
Collaborator

@scovich scovich Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, we only have four rewrites for inequalities:

  • col < val => stats.min.col < val
  • NOT(col < val) => NOT(stats.max.col < val)
  • col > val => stats.max.col > val
  • NOT(col > val) => NOT(stats.min.col > val)

If I understand correctly, the two involving max stats would be changed to:

  • col > val => stats.max.col > val - 999
  • NOT(col < val) => NOT(stats.max.col < val - 999)

And because this is data skipping, we only care whether the new rewrite might produce a FALSE where the old rewrite did not produce FALSE. Because that corresponds to wrongly skipping a file.

For the first: if stats.max.col > val - 999 is FALSE, then the max-value is "too small" and stats.max.col > val must also return FALSE.

For the second, let's simplify a bit by pushing down the NOT:

  • NOT(stats.max.col < val - 999) => stats.max.col >= val - 999

If stats.max.col >= val - 999 is FALSE, then the max-value is again "too small" and stats.max.col >= val must also return FALSE.

AFAICT, the rewrite is sound, because any time it returns FALSE the original predicate also returned FALSE.

}
// // The following case is not currently used but included for completeness and to ensure correctness in the future if logic is changed to use it.
// !(max > val) or max < val
(Ordering::Greater, true) | (Ordering::Less, false) => {
return self.eval_partial_cmp(ord, max, &val, inverted);
}
// Equality comparison can't be applied as max stats is truncated to milliseconds, so actual microsecond value is unknown.
(Ordering::Equal, _) => return None,
}
}

self.eval_partial_cmp(ord, max, val, inverted)
}

Expand Down Expand Up @@ -986,3 +1013,12 @@ impl<T: DataSkippingPredicateEvaluator + ?Sized> KernelPredicateEvaluator for T
self.finish_eval_pred_junction(op, preds, inverted)
}
}

/// Adjust timestamp value by subtracting the given interval in microseconds.
fn timestamp_subtract(val: &Scalar, interval_micros: i64) -> Scalar {
match val {
Scalar::Timestamp(ts) => Scalar::Timestamp(*ts - interval_micros),
Scalar::TimestampNtz(ts) => Scalar::TimestampNtz(*ts - interval_micros),
_ => val.clone(),
}
}
10 changes: 2 additions & 8 deletions kernel/src/scan/data_skipping.rs
Original file line number Diff line number Diff line change
Expand Up @@ -218,14 +218,8 @@ impl DataSkippingPredicateEvaluator for DataSkippingPredicateCreator {
Some(joined_column_expr!("minValues", col))
}

/// Retrieves the maximum value of a column, if it exists and has the requested type.
// TODO(#1002): we currently don't support file skipping on timestamp columns' max stat since
// they are truncated to milliseconds in add.stats.
fn get_max_stat(&self, col: &ColumnName, data_type: &DataType) -> Option<Expr> {
match data_type {
&DataType::TIMESTAMP | &DataType::TIMESTAMP_NTZ => None,
_ => Some(joined_column_expr!("maxValues", col)),
}
fn get_max_stat(&self, col: &ColumnName, _data_type: &DataType) -> Option<Expr> {
Some(joined_column_expr!("maxValues", col))
}

/// Retrieves the null count of a column, if it exists.
Expand Down
45 changes: 7 additions & 38 deletions kernel/src/scan/data_skipping/tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -340,62 +340,31 @@ fn test_sql_where() {
do_test(ALL_NULL, pred, MISSING, None, None);
}

// TODO(#1002): we currently don't support file skipping on timestamp columns' max stat since they
// are truncated to milliseconds in add.stats.
#[test]
fn test_timestamp_skipping_disabled() {
let creator = DataSkippingPredicateCreator;
let col = &column_name!("timestamp_col");

assert!(
creator.get_min_stat(col, &DataType::TIMESTAMP).is_some(),
"get_min_stat should return Some: allow data skipping on timestamp minValues"
);
assert_eq!(
creator.get_max_stat(col, &DataType::TIMESTAMP),
None,
"get_max_stat should return None: no data skipping on timestamp maxValues"
);
assert!(
creator
.get_min_stat(col, &DataType::TIMESTAMP_NTZ)
.is_some(),
"get_min_stat should return Some: allow data skipping on timestamp_ntz minValues"
);
assert_eq!(
creator.get_max_stat(col, &DataType::TIMESTAMP_NTZ),
None,
"get_max_stat should return None: no data skipping on timestamp_ntz maxValues"
);
}

// TODO(#1002): we currently don't support file skipping on timestamp columns' max stat since they
// are truncated to milliseconds in add.stats.
#[test]
fn test_timestamp_predicates_dont_data_skip() {
fn test_timestamp_predicates_data_skip() {
let col = &column_expr!("ts_col");
for timestamp in [&Scalar::Timestamp(1000000), &Scalar::TimestampNtz(1000000)] {
// LT will do minValues -> OK
// LT will do minValues
let pred = Pred::lt(col.clone(), timestamp.clone());
let skipping_pred = as_data_skipping_predicate(&pred);
assert_eq!(
skipping_pred.unwrap().to_string(),
"Column(minValues.ts_col) < 1000000"
);

// GT will do maxValues -> BLOCKED
// GT will adjust predicate value for maxValues comparison
let pred = Pred::gt(col.clone(), timestamp.clone());
let skipping_pred = as_data_skipping_predicate(&pred);
assert!(
skipping_pred.is_none(),
"Expected no data skipping for timestamp predicate: {pred:#?}, got {skipping_pred:#?}"
assert_eq!(
skipping_pred.unwrap().to_string(),
"Column(maxValues.ts_col) > 999001"
);

let pred = Pred::eq(col.clone(), timestamp.clone());
let skipping_pred = as_data_skipping_predicate(&pred);
assert_eq!(
skipping_pred.unwrap().to_string(),
"AND(NOT(Column(minValues.ts_col) > 1000000), null)"
"AND(NOT(Column(minValues.ts_col) > 1000000), NOT(Column(maxValues.ts_col) < 999001))"
);

let pred = Pred::ne(col.clone(), timestamp.clone());
Expand Down