Skip to content

[datafusion-spark] Add Spark-compatible ceil function#20593

Open
shivbhatia10 wants to merge 23 commits intoapache:mainfrom
shivbhatia10:sb/datafusion-math-ceil
Open

[datafusion-spark] Add Spark-compatible ceil function#20593
shivbhatia10 wants to merge 23 commits intoapache:mainfrom
shivbhatia10:sb/datafusion-math-ceil

Conversation

@shivbhatia10
Copy link
Contributor

@shivbhatia10 shivbhatia10 commented Feb 27, 2026

Which issue does this PR close?

Part of #15914 and apache/datafusion-comet#1704

I also just noticed #15916 butI believe work has been stale on this one.

Rationale for this change

Helping to continue adding Spark compatible expressions to datafusion-spark.

What changes are included in this PR?

Add new ceil function.

Are these changes tested?

Yes, unit tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the spark label Feb 27, 2026
@shivbhatia10 shivbhatia10 changed the title Add ceil [datafusion-spark] Add Spark-compatible ceil function Feb 27, 2026
@shivbhatia10 shivbhatia10 marked this pull request as ready for review February 27, 2026 13:35
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shivbhatia10 it would be nice having Spark .slt tests covering edge cases and datatypes as well as commenting in file body what is the difference between Spark and current DF implementation

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 28, 2026
@shivbhatia10
Copy link
Contributor Author

Hey @comphead, I've added some .slt tests and uncommented a few existing ones that seem have been there as placeholders for when we got this implementation, and I expanded on the comment explaining the differences between this and the DataFusion ceil function.

@comphead
Copy link
Contributor

comphead commented Mar 2, 2026

Thanks @shivbhatia10 would you mind providing an example where DF ceil and Spark ceil return diff result? I'm still struggling to understand the difference between them

@coderfender
Copy link
Contributor

Taking a look @shivbhatia10

export_functions!((
ceil,
"Returns the smallest integer not less than expr.",
arg1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
arg1
@config arg1

DataType::Float64 => Arc::new(
input
.as_primitive::<Float64Type>()
.unary::<_, Int64Type>(|x| x.ceil() as i64),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add an inline function .unary::<_, Int64Type>(|x| x.ceil() as i64) for both float inputs so that we dont repeat ourselves

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this and couldn't figure out a way to make it cleaner than what we have now, I'd prefer to keep it as is unless there's a solution I'm missing which is very possible

DataType::Float32 => Arc::new(
input
.as_primitive::<Float32Type>()
.unary::<_, Int64Type>(|x| x.ceil() as i64),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add an inline function .unary::<_, Int64Type>(|x| x.ceil() as i64) for both float inputs so that we dont repeat ourselves

}
DataType::Decimal128(p, s) => Ok(DataType::Decimal128(*p, *s)),
_ => Ok(DataType::Int64),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also recommend using return_field_from_args instead of return_type for better type control during planning phase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this UDF I don't think it's strictly necessary, I believe we can infer the output type from arg_types, unless it's good practice in general to prefer return_field_from_args?

@coderfender
Copy link
Contributor

@shivbhatia10 , thank you for the PR . I have left some comments on the PR to make implementation a little more safe and concretize edge case handling. Please take a look whenever you get a chance and I can review it again

@shivbhatia10
Copy link
Contributor Author

Hi @coderfender, thank you very much for the comprehensive review here, I've tried to address your comments, please let me know if there's anything else I should add.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spark sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants