[feature][cp] feat: Add Spark to_json function by markjin1990 · Pull Request #390 · bytedance/bolt

markjin1990 · 2026-03-12T22:02:41Z

What problem does this PR solve?

Issue Number: close #391

Type of Change

🐛 Bug fix (non-breaking change which fixes an issue)
✨ New feature (non-breaking change which adds functionality)
🚀 Performance improvement (optimization)
⚠️ Breaking change (fix or feature that would cause existing functionality to change)
🔨 Refactoring (no logic changes)
🔧 Build/CI or Infrastructure changes
📝 Documentation only

Description

Cherry-picked the Spark to_json function implementation from facebookincubator/velox@a871a75.
Modified the ToJsonTest.longDecimal unit test to match Spark, which serializes the decimal '0.0000000000' as '0E-10' (scientific notation).
Added missing components needed by the cherry-pick.
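
For readers wondering where '0E-10' comes from: it matches the rule documented for Java's `BigDecimal.toString()`, which switches to scientific notation when the scale is negative or the adjusted exponent falls below -6. The sketch below illustrates that rule only, assuming the decimal arrives as an unscaled 64-bit integer plus a scale. The name `formatDecimalLikeJava` is hypothetical; this is not the code in Bolt or Velox.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Bolt/Velox API): formats unscaled * 10^(-scale)
// following the rule documented for java.math.BigDecimal.toString().
std::string formatDecimalLikeJava(int64_t unscaled, int32_t scale) {
  const bool negative = unscaled < 0;
  // Take the magnitude in unsigned arithmetic so INT64_MIN stays well-defined.
  const uint64_t magnitude = negative
      ? uint64_t{0} - static_cast<uint64_t>(unscaled)
      : static_cast<uint64_t>(unscaled);
  const std::string digits = std::to_string(magnitude);
  const int64_t numDigits = static_cast<int64_t>(digits.size());
  // Adjusted exponent: exponent of the first digit in scientific form.
  const int64_t adjusted = -static_cast<int64_t>(scale) + numDigits - 1;

  std::string out = negative ? "-" : "";
  if (scale >= 0 && adjusted >= -6) {
    // Plain notation.
    if (scale == 0) {
      out += digits;
    } else if (numDigits > scale) {
      const std::size_t split = static_cast<std::size_t>(numDigits - scale);
      out += digits.substr(0, split) + "." + digits.substr(split);
    } else {
      // Pad with zeros between the decimal point and the digits.
      out += "0." +
          std::string(static_cast<std::size_t>(scale - numDigits), '0') +
          digits;
    }
  } else {
    // Scientific notation: d.ddd followed by E and a signed exponent.
    out += digits.substr(0, 1);
    if (numDigits > 1) {
      out += "." + digits.substr(1);
    }
    if (adjusted != 0) {
      out += adjusted > 0 ? "E+" : "E-";
      out += std::to_string(adjusted > 0 ? adjusted : -adjusted);
    }
  }
  return out;
}
```

With unscaled value 0 and scale 10, the adjusted exponent is -10, so `formatDecimalLikeJava(0, 10)` takes the scientific branch and returns `0E-10`, whereas `formatDecimalLikeJava(0, 6)` stays in plain notation and returns `0.000000`.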

Performance Impact

No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

Positive Impact: I have run benchmarks.

Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Please describe the changes in this PR

Release Note:
- [cherry-pick] feat: Add Spark to_json function

Checklist (For Author)

I have added/updated unit tests (ctest).
I have verified the code with local build (Release/Debug).
I have run clang-format / linters.
(Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
No need to test or manual test.

Breaking Changes

No

Yes (Description: ...)

frankobe · 2026-03-13T01:45:37Z

bolt/functions/sparksql/tests/FromToJsonRoundTripTest.cpp

@@ -0,0 +1,82 @@
+/*


Incorrect license

Cherry-picked from facebookincubator/velox@a871a75
Original-author: Wechar Yu <yuwq1996@gmail.com>
Cherry-picked-by: Zhongjun Jin <markjin1990@gmail.com>

Original Commit Message:
------------------------------------------------------------
Summary:
The `to_json` function converts a Json object (ROW, ARRAY or MAP) into a JSON string.

Spark's implementation:
https://github.com/apache/spark/blob/v3.5.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L672
https://docs.databricks.com/en/sql/language-manual/functions/to_json.html

Pull Request resolved: #11995
Reviewed By: xiaoxmeng
Differential Revision: D79266717
Pulled By: kKPulla
fbshipit-source-id: da5308e663f1149dbfa5a95f6b61ee1c4ab86d7c
------------------------------------------------------------
Source: facebookincubator/velox@a871a75

wangxinshuo-bolt · 2026-03-16T06:08:25Z

Great work, I have some questions:

Before open-sourcing, we internally implemented our own to_json function. After open-sourcing, the to_json function support was removed. Why is this?
If the to_json function is cherry-picked, were the original UT cases retained? Those examples came from our online cases.

markjin1990 · 2026-03-16T06:38:20Z

@wangxinshuo-bolt Excellent questions! @frankobe has the same concerns.

Our internal (presto) to_json implementation only accepts one argument. After the rebase, Gluten requires two arguments for to_json (input + timeZone), and Bolt fails as described in [Bug] "to_json" fails on gluten with error "Scalar function to_json not registered with arguments: (ARRAY<VARCHAR>, VARCHAR)" #391. So we either need to cherry-pick the Spark to_json implementation to meet the new requirement, or rewrite the Gluten side (ToJsonTransformer).
I have already run tests on 150 internal tasks, and the results all match. I am now running the original unit tests to see whether we missed anything, and I will update you on the final result.
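
To illustrate why a one-argument registration produces the error quoted above, here is a toy sketch of exact-match signature lookup. `ToyRegistry`, its methods, and the one-argument type used in the usage note are all hypothetical and exist only for this illustration; the real function registry in Bolt is more involved, and only the error text is taken from issue #391.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A signature is just the ordered list of argument type names.
using Signature = std::vector<std::string>;

// Toy registry (hypothetical, not Bolt's): resolves a call by exact match
// on function name and argument types.
class ToyRegistry {
 public:
  void add(const std::string& name, Signature argTypes) {
    signatures_[name].push_back(std::move(argTypes));
  }

  // Returns std::nullopt when a matching signature exists, otherwise an
  // error message shaped like the one reported in the issue.
  std::optional<std::string> resolve(
      const std::string& name,
      const Signature& argTypes) const {
    const auto it = signatures_.find(name);
    if (it != signatures_.end()) {
      for (const auto& candidate : it->second) {
        if (candidate == argTypes) {
          return std::nullopt;
        }
      }
    }
    std::string joined;
    for (std::size_t i = 0; i < argTypes.size(); ++i) {
      if (i > 0) {
        joined += ", ";
      }
      joined += argTypes[i];
    }
    return "Scalar function " + name +
        " not registered with arguments: (" + joined + ")";
  }

 private:
  std::map<std::string, std::vector<Signature>> signatures_;
};
```

Registering only `to_json(ARRAY<VARCHAR>)` and then resolving `to_json(ARRAY<VARCHAR>, VARCHAR)` returns the error string; adding the two-argument signature makes the same lookup succeed. That mirrors the two options in the comment: provide the two-argument function, or change the caller so it no longer passes the time zone.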

markjin1990 · 2026-03-19T19:28:05Z

bolt/functions/sparksql/tests/ToJsonTest.cpp

+
+  auto input = makeRowVector({mapVector, arrayVector});
+  auto expected = makeNullableFlatVector<std::string>(
+      {R"({"c0":{"blue":[1,2],"red":[null,4]},"c1":[{"blue":1,"red":2},{"green":null}]})",


@wangxinshuo-bolt So far, the only difference between the cherry-picked Spark to_json function and the existing to_json is in nested JSON objects: the Spark to_json function adds quotes around nested keys, while the existing to_json does not.

wangxinshuo-bolt · 2026-03-20

I think "c0":{"blue":[1,2],"red":[null,4]} seems more reasonable, and we can use Spark to check the function's actual return value.

markjin1990 · 2026-03-21

@wangxinshuo-bolt Spark does put quotes around key strings.

wangxinshuo-bolt · 2026-03-23

Excellent!
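
The quoted form discussed in this thread is also what the JSON grammar requires, since object member names are strings at every nesting level. A minimal sketch of a recursive serializer that quotes keys uniformly is shown below. The `Json` struct and the `make*` helpers are toy types invented for this illustration, not Bolt or Velox vectors, and keys are assumed to need no escaping.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy JSON value used only for this illustration.
struct Json {
  enum class Kind { Null, Int, Array, Object };
  Kind kind = Kind::Null;
  int64_t number = 0;
  std::vector<Json> items;        // Array elements, or Object values.
  std::vector<std::string> keys;  // Object keys, parallel to `items`.
};

Json makeInt(int64_t value) {
  Json j;
  j.kind = Json::Kind::Int;
  j.number = value;
  return j;
}

Json makeArray(std::vector<Json> items) {
  Json j;
  j.kind = Json::Kind::Array;
  j.items = std::move(items);
  return j;
}

Json makeObject(std::vector<std::string> keys, std::vector<Json> values) {
  Json j;
  j.kind = Json::Kind::Object;
  j.keys = std::move(keys);
  j.items = std::move(values);
  return j;
}

// Serializes recursively; object keys are quoted at every nesting level.
std::string toJsonString(const Json& j) {
  switch (j.kind) {
    case Json::Kind::Null:
      return "null";
    case Json::Kind::Int:
      return std::to_string(j.number);
    case Json::Kind::Array: {
      std::string out = "[";
      for (std::size_t i = 0; i < j.items.size(); ++i) {
        if (i > 0) out += ",";
        out += toJsonString(j.items[i]);
      }
      return out + "]";
    }
    case Json::Kind::Object: {
      std::string out = "{";
      for (std::size_t i = 0; i < j.items.size(); ++i) {
        if (i > 0) out += ",";
        // Assumption: keys contain no characters that need escaping.
        out += "\"" + j.keys[i] + "\":" + toJsonString(j.items[i]);
      }
      return out + "}";
    }
  }
  return "null";
}
```

Building a row whose `c0` field is a map from `blue` to `[1,2]` and `red` to `[null,4]` and passing it to `toJsonString` yields `{"c0":{"blue":[1,2],"red":[null,4]}}`, the same shape as the first part of the expected string in the test above.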

wangxinshuo-bolt

LGTM

markjin1990 force-pushed the cherry-pick-spark-func-to-json branch from 9af9d0e to 26f39f0 Compare March 12, 2026 22:07

frankobe reviewed Mar 13, 2026

markjin1990 force-pushed the cherry-pick-spark-func-to-json branch 2 times, most recently from 65a3c5d to c3cb26c Compare March 13, 2026 16:56

markjin1990 force-pushed the cherry-pick-spark-func-to-json branch from c3cb26c to 9ce2c7f Compare March 13, 2026 17:38

markjin1990 changed the title ~~WIP: [Cherry-pick][Velox] feat: Add Spark to_json function~~ [Cherry-pick][Velox] feat: Add Spark to_json function Mar 13, 2026

markjin1990 requested review from fzhedu and guhaiyan0221 March 13, 2026 17:45

frankobe changed the title ~~[Cherry-pick][Velox] feat: Add Spark to_json function~~ [feature][cp] feat: Add Spark to_json function Mar 13, 2026

markjin1990 requested a review from wangxinshuo-bolt March 15, 2026 20:37

[fix] 0) port all tests in presto ToJsonTest.cpp to Spark ToJsonTest.cpp, 1) support varbinary type in Spark to_json, 2) fix Array with single empty ROW case in Spark to_json function.

d88dfd7

markjin1990 commented Mar 19, 2026

wangxinshuo-bolt approved these changes Mar 23, 2026

markjin1990 added this pull request to the merge queue Mar 23, 2026

markjin1990 removed this pull request from the merge queue due to a manual request Mar 23, 2026

markjin1990 added this pull request to the merge queue Mar 23, 2026

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 23, 2026

markjin1990 added this pull request to the merge queue Mar 23, 2026

Merged via the queue into bytedance:main with commit ab89ad8 Mar 23, 2026
7 checks passed

markjin1990 merged 2 commits into bytedance:main from markjin1990:cherry-pick-spark-func-to-json