[datafusion-spark] Add Spark-compatible ceil function#20593
[datafusion-spark] Add Spark-compatible ceil function#20593shivbhatia10 wants to merge 23 commits intoapache:mainfrom
Conversation
comphead
left a comment
There was a problem hiding this comment.
Thanks @shivbhatia10 it would be nice having Spark .slt tests covering edge cases and datatypes as well as commenting in file body what is the difference between Spark and current DF implementation
|
Hey @comphead, I've added some |
|
Thanks @shivbhatia10 would you mind providing an example where DF ceil and Spark ceil return diff result? I'm still struggling to understand the difference between them |
|
Taking a look @shivbhatia10 |
| export_functions!(( | ||
| ceil, | ||
| "Returns the smallest integer not less than expr.", | ||
| arg1 |
There was a problem hiding this comment.
| arg1 | |
| @config arg1 |
| DataType::Float64 => Arc::new( | ||
| input | ||
| .as_primitive::<Float64Type>() | ||
| .unary::<_, Int64Type>(|x| x.ceil() as i64), |
There was a problem hiding this comment.
Could add an inline function .unary::<_, Int64Type>(|x| x.ceil() as i64) for both float inputs so that we dont repeat ourselves
There was a problem hiding this comment.
I tried this and couldn't figure out a way to make it cleaner than what we have now, I'd prefer to keep it as is unless there's a solution I'm missing which is very possible
| DataType::Float32 => Arc::new( | ||
| input | ||
| .as_primitive::<Float32Type>() | ||
| .unary::<_, Int64Type>(|x| x.ceil() as i64), |
There was a problem hiding this comment.
Could add an inline function .unary::<_, Int64Type>(|x| x.ceil() as i64) for both float inputs so that we dont repeat ourselves
| } | ||
| DataType::Decimal128(p, s) => Ok(DataType::Decimal128(*p, *s)), | ||
| _ => Ok(DataType::Int64), | ||
| } |
There was a problem hiding this comment.
Also recommend using return_field_from_args instead of return_type for better type control during planning phase
There was a problem hiding this comment.
In this UDF I don't think it's strictly necessary, I believe we can infer the output type from arg_types, unless it's good practice in general to prefer return_field_from_args?
|
@shivbhatia10 , thank you for the PR . I have left some comments on the PR to make implementation a little more safe and concretize edge case handling. Please take a look whenever you get a chance and I can review it again |
|
Hi @coderfender, thank you very much for the comprehensive review here, I've tried to address your comments, please let me know if there's anything else I should add. |
Which issue does this PR close?
Part of #15914 and apache/datafusion-comet#1704
I also just noticed #15916 butI believe work has been stale on this one.
Rationale for this change
Helping to continue adding Spark compatible expressions to datafusion-spark.
What changes are included in this PR?
Add new
ceilfunction.Are these changes tested?
Yes, unit tests.
Are there any user-facing changes?
No.