Skip to content

Conversation

huashi-st
Copy link
Contributor

@huashi-st huashi-st commented Jun 6, 2025

Summary

Fix binary reader String handling in Arrays and Map (e.g., Array(String), Map(String, String)).

image

@pan3793
Copy link
Collaborator

pan3793 commented Jun 6, 2025

possible to add test cases?

@huashi-st
Copy link
Contributor Author

I’m encountering Docker setup failures when running ./gradlew clean test. Also, I couldn’t find any specific tests for ClickHouseBinaryReader in the codebase. I tested locally with Spark 3.5 and ClickHouse 24.10.2.80 using the following ClickHouse column type query in Spark (see snapshot below).

Array(String)

image image

Array(Int)

image image

Array(Array(String))

image image

Map(String, String)

image image

Map(String, Array(String))

image image

val arrayValue = value.asInstanceOf[ClickHouseArraySequence]
val convertedArray = Array.tabulate(arrayValue.length) { i =>
decodeValue(
arrayValue.getValue(i, createClickHouseValue(null, _dataType)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createClickHouseValue(null, _dataType) for every element looks costly. does clickhouse-jave provide constant variables for default null values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getValue needs a corresponding ClickHouseValue container based on data type (e.g. ClickHouseIntegerValue, ClickHouseLongValue). Unfortunately, there's no factory method to create it based on input type. createClickHouseValue creates empty containers based on input data type.

case IntegerType => ClickHouseIntegerValue.of(if (isNull) 0 else rawValue.asInstanceOf[Int])
case LongType => ClickHouseLongValue.of(if (isNull) 0L else rawValue.asInstanceOf[Long])
case DoubleType => ClickHouseDoubleValue.of(if (isNull) 0.0 else rawValue.asInstanceOf[Double])
case FloatType => ClickHouseFloatValue.of(if (isNull) 0.0f else rawValue.asInstanceOf[Float])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, can we use the clickhouse-java defined default value variable instead of hardcoding it for each data type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one. will fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call ofNull/ofEmpty for each ClickHouse value type provided.

@pan3793
Copy link
Collaborator

pan3793 commented Jun 9, 2025

I’m encountering Docker setup failures ...

Not sure which failures you encountered, it works well on my local env, anyway, manually testing is also good.

@huashi-st huashi-st requested a review from pan3793 June 15, 2025 03:34
@huashi-st
Copy link
Contributor Author

@pan3793 would appreciate your review again.

@pan3793 pan3793 changed the title Binary Reader: Add String support for Array/Map reading Spark: Binary reader supports String for Array/Map reading Jun 19, 2025
@pan3793 pan3793 merged commit 72cf042 into ClickHouse:main Jun 19, 2025
29 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants