[feature][cp] feat: Add Spark to_json function #390
markjin1990 merged 2 commits into bytedance:main
Conversation
Cherry-picked from facebookincubator/velox@a871a75
Original-author: Wechar Yu <yuwq1996@gmail.com>
Cherry-picked-by: Zhongjun Jin <markjin1990@gmail.com>

Original Commit Message:

Summary: The `to_json` function converts a JSON-compatible object (ROW, ARRAY, or MAP) into a JSON string.

Spark's implementation:
https://github.com/apache/spark/blob/v3.5.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L672
https://docs.databricks.com/en/sql/language-manual/functions/to_json.html

Pull Request resolved: #11995
Reviewed By: xiaoxmeng
Differential Revision: D79266717
Pulled By: kKPulla
fbshipit-source-id: da5308e663f1149dbfa5a95f6b61ee1c4ab86d7c

Source: facebookincubator/velox@a871a75
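The semantics described in the commit message can be illustrated outside the C++ implementation: a ROW becomes a JSON object, a MAP becomes an object keyed by its entries, and an ARRAY becomes a JSON array, serialized compactly. This is an illustrative sketch using Python's standard `json` module, not the Bolt/Velox code in this PR.

```python
import json

def to_json_sketch(value):
    # Sketch of Spark to_json semantics: compact separators match the
    # output shown in the PR's test expectations (no spaces after , or :).
    return json.dumps(value, separators=(",", ":"))

# A MAP<string, ARRAY<int>> with a null element; SQL NULL maps to JSON null.
row = {"blue": [1, 2], "red": [None, 4]}
print(to_json_sketch(row))  # {"blue":[1,2],"red":[null,4]}
```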
Great work, I have some questions:
@wangxinshuo-bolt Excellent questions! @frankobe also has the same concerns.
…cpp: 1) support the varbinary type in Spark to_json; 2) fix the ARRAY-with-a-single-empty-ROW case in the Spark to_json function.
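On the varbinary point above: Spark's JSON writer emits binary values as base64-encoded strings (Jackson's default handling for byte arrays), so that is presumably the behavior the fix targets. A minimal Python sketch of that encoding, under that assumption:

```python
import base64
import json

def binary_to_json_sketch(raw: bytes) -> str:
    # Assumed varbinary behavior: base64-encode the bytes, then emit
    # the result as a quoted JSON string.
    return json.dumps(base64.b64encode(raw).decode("ascii"))

print(binary_to_json_sketch(b"abc"))  # "YWJj"
```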
```cpp
auto input = makeRowVector({mapVector, arrayVector});
auto expected = makeNullableFlatVector<std::string>(
    {R"({"c0":{"blue":[1,2],"red":[null,4]},"c1":[{"blue":1,"red":2},{"green":null}]})",
```
@wangxinshuo-bolt So far, the only difference between this cherry-picked Spark to_json function and the existing to_json is in nested JSON objects: the Spark to_json function adds quotes around nested keys, but the existing to_json does not.
I think `"c0":{"blue":[1,2],"red":[null,4]}` seems more reasonable, and we can use Spark to test the actual return value of the function.
@wangxinshuo-bolt Spark does have quotes around key strings.
Excellent!
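The quoting question in the thread above is settled by the JSON grammar itself: object member names must be double-quoted strings, so Spark's quoted nested keys are the standards-conforming output. A quick check with Python's strict parser (illustrative only, not part of the PR):

```python
import json

# Valid JSON: nested keys are quoted, as Spark's to_json emits them.
quoted = '{"c0":{"blue":[1,2],"red":[null,4]}}'
parsed = json.loads(quoted)
print(parsed["c0"]["blue"])  # [1, 2]

# Invalid JSON: an unquoted nested key is rejected by a strict parser.
unquoted = '{"c0":{blue:[1,2]}}'
try:
    json.loads(unquoted)
except json.JSONDecodeError:
    print("unquoted nested keys are rejected")
```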

What problem does this PR solve?
Issue Number: close #391
Type of Change
Description
- Add the Spark `to_json` function, cherry-picked from facebookincubator/velox@a871a75.
- Modify `ToJsonTest.longDecimal` to be consistent with Spark, which parses the decimal '0.0000000000' as '0E-10' using scientific notation, once Bolt supports Spark.

Performance Impact
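The `ToJsonTest.longDecimal` change can be sanity-checked with Python's `decimal` module, which follows the same General Decimal Arithmetic "to-scientific-string" rule and therefore renders the value exactly as described:

```python
from decimal import Decimal

# '0.0000000000' has coefficient 0 and exponent -10; since the adjusted
# exponent (-10) is below -6, the standard string form is scientific
# notation, matching Spark's '0E-10' in the updated test.
d = Decimal("0.0000000000")
print(str(d))  # 0E-10
```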
- No Impact: This change does not affect the critical path (e.g., build system, docs, error handling).
- Positive Impact: I have run benchmarks.
- Negative Impact: Explained below (e.g., a trade-off for correctness).
Release Note
Please describe the changes in this PR
Release Note:
Checklist (For Author)
Breaking Changes
- No
- Yes (Description: ...)