Skip to content

Conversation

@Vishwanatha-HD
Copy link
Contributor

@Vishwanatha-HD Vishwanatha-HD commented Nov 19, 2025

Rationale for this change

This PR is intended to enable Parquet DB support on Big-endian (s390x) systems. The fix in this PR fixes "arrow-acero-asof-join-node-test" testcase failure.

The "arrow-acero-asof-join-node-test" testcase was Aborted/core dumped on Big-endian platforms.

$ ./arrow-acero-asof-join-node-test
[ RUN      ] AsofJoinNodeTest/AsofJoinBasicTest.TestBasic1Backward/3
[       OK ] AsofJoinNodeTest/AsofJoinBasicTest.TestBasic1Backward/3 (201 ms)
[ RUN      ] AsofJoinNodeTest/AsofJoinBasicTest.TestBasic1Backward/4
arrow/cpp/src/arrow/compute/util.cc:35:  Check failed: false 
Aborted (core dumped)

What changes are included in this PR?

The fix includes changes to "util.cc" file to address the Abort/Core dump issues.

Are these changes tested?

Yes. The changes are tested on s390x arch to make sure things are working fine. The fix is also tested on x86 arch, to make sure there is no new regression introduced.

Are there any user-facing changes?

No

@kou kou changed the title GH-48151: [C++][Parquet] Fix arrow-acero-asof-join-node-test failures… GH-48177: [C++][Parquet] Fix arrow-acero-asof-join-node-test failures on s390x Nov 19, 2025
@github-actions
Copy link

⚠️ GitHub issue #48177 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix lint failure?

https://github.com/apache/arrow/actions/runs/19504115732/job/55873075753?pr=48180#step:6:84

diff --git a/cpp/src/arrow/compute/util.cc b/cpp/src/arrow/compute/util.cc
index 66c48631dc..3b671db021 100644
--- a/cpp/src/arrow/compute/util.cc
+++ b/cpp/src/arrow/compute/util.cc
@@ -325,10 +325,11 @@ void bytes_to_bits(int64_t hardware_flags, const int num_bits, const uint8_t* by
     bytes_next = SafeLoadUpTo8Bytes(bytes + num_bits - tail, tail);
 #else
     if (tail == 8) {
-      bytes_next = util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes + num_bits - tail));
+      bytes_next =
+          util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes + num_bits - tail));
     } else {
-      // On Big-endian systems, for bytes_to_bits, load all tail bytes in little-endian order
-      // to ensure compatibility with subsequent bit operations
+      // On Big-endian systems, for bytes_to_bits, load all tail bytes in little-endian
+      // order to ensure compatibility with subsequent bit operations
       bytes_next = 0;
       for (int i = 0; i < tail; ++i) {
         bytes_next |= static_cast<uint64_t>((bytes + num_bits - tail)[i]) << (8 * i);

You can use nice pre-commit run --show-diff-on-failure --color=always --all-files cpp.

Copy link
Contributor Author

@Vishwanatha-HD Vishwanatha-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed all the review comments. Please re-review. Thanks.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work?

diff --git a/cpp/src/arrow/compute/util.cc b/cpp/src/arrow/compute/util.cc
index b90b3a6405..163a80d9d4 100644
--- a/cpp/src/arrow/compute/util.cc
+++ b/cpp/src/arrow/compute/util.cc
@@ -30,33 +30,41 @@ namespace util {
 namespace bit_util {
 
 inline uint64_t SafeLoadUpTo8Bytes(const uint8_t* bytes, int num_bytes) {
-  // This will not be correct on big-endian architectures.
-#if !ARROW_LITTLE_ENDIAN
-  ARROW_DCHECK(false);
-#endif
   ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
   if (num_bytes == 8) {
-    return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
+    auto word = util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
+#if !ARROW_LITTLE_ENDIAN
+    word = bit_util::ByteSwap(word);
+#endif
+    return word;
   } else {
     uint64_t word = 0;
     for (int i = 0; i < num_bytes; ++i) {
+#if ARROW_LITTLE_ENDIAN
       word |= static_cast<uint64_t>(bytes[i]) << (8 * i);
+#else
+      word |= static_cast<uint64_t>(bytes[num_bytes - 1 - i]) << (8 * i);
+#endif
     }
     return word;
   }
 }
 
 inline void SafeStoreUpTo8Bytes(uint8_t* bytes, int num_bytes, uint64_t value) {
-  // This will not be correct on big-endian architectures.
-#if !ARROW_LITTLE_ENDIAN
-  ARROW_DCHECK(false);
-#endif
   ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
   if (num_bytes == 8) {
+#if ARROW_LITTLE_ENDIAN
     util::SafeStore(reinterpret_cast<uint64_t*>(bytes), value);
+#else
+    util::SafeStore(reinterpret_cast<uint64_t*>(bytes), bit_util::ByteSwap(value));
+#endif
   } else {
     for (int i = 0; i < num_bytes; ++i) {
+#if ARROW_LITTLE_ENDIAN
       bytes[i] = static_cast<uint8_t>(value >> (8 * i));
+#else
+      bytes[i] = static_cast<uint8_t>(value >> (8 * (num_bytes - 1 - i)));
+#endif
     }
   }
 }

@k8ika0s
Copy link

k8ika0s commented Nov 23, 2025

Mostly looks good to me — just one thought after reading through the recent back-and-forth...

Given the updated handling of tail bytes and the SafeLoadUpTo8Bytes discussion, I think this PR’s direction still makes sense. I’d just double-check that the tail==8 path really can’t happen with the current unroll logic, since @kou kou raised that question.
Otherwise the fixes seem aligned with the latest comments.

Willing to help test once the approach is finalized.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Nov 24, 2025
@Vishwanatha-HD
Copy link
Contributor Author

Does this work?

diff --git a/cpp/src/arrow/compute/util.cc b/cpp/src/arrow/compute/util.cc
index b90b3a6405..163a80d9d4 100644
--- a/cpp/src/arrow/compute/util.cc
+++ b/cpp/src/arrow/compute/util.cc
@@ -30,33 +30,41 @@ namespace util {
 namespace bit_util {
 
 inline uint64_t SafeLoadUpTo8Bytes(const uint8_t* bytes, int num_bytes) {
-  // This will not be correct on big-endian architectures.
-#if !ARROW_LITTLE_ENDIAN
-  ARROW_DCHECK(false);
-#endif
   ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
   if (num_bytes == 8) {
-    return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
+    auto word = util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
+#if !ARROW_LITTLE_ENDIAN
+    word = bit_util::ByteSwap(word);
+#endif
+    return word;
   } else {
     uint64_t word = 0;
     for (int i = 0; i < num_bytes; ++i) {
+#if ARROW_LITTLE_ENDIAN
       word |= static_cast<uint64_t>(bytes[i]) << (8 * i);
+#else
+      word |= static_cast<uint64_t>(bytes[num_bytes - 1 - i]) << (8 * i);
+#endif
     }
     return word;
   }
 }
 
 inline void SafeStoreUpTo8Bytes(uint8_t* bytes, int num_bytes, uint64_t value) {
-  // This will not be correct on big-endian architectures.
-#if !ARROW_LITTLE_ENDIAN
-  ARROW_DCHECK(false);
-#endif
   ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
   if (num_bytes == 8) {
+#if ARROW_LITTLE_ENDIAN
     util::SafeStore(reinterpret_cast<uint64_t*>(bytes), value);
+#else
+    util::SafeStore(reinterpret_cast<uint64_t*>(bytes), bit_util::ByteSwap(value));
+#endif
   } else {
     for (int i = 0; i < num_bytes; ++i) {
+#if ARROW_LITTLE_ENDIAN
       bytes[i] = static_cast<uint8_t>(value >> (8 * i));
+#else
+      bytes[i] = static_cast<uint8_t>(value >> (8 * (num_bytes - 1 - i)));
+#endif
     }
   }
 }

Hi @kou ,
I have now reverted the changes done to "SafeLoadUpTo8Bytes()" function on s390x.. Its totally not required.. Thanks..

@Vishwanatha-HD
Copy link
Contributor Author

Mostly looks good to me — just one thought after reading through the recent back-and-forth...

Given the updated handling of tail bytes and the SafeLoadUpTo8Bytes discussion, I think this PR’s direction still makes sense. I’d just double-check that the tail==8 path really can’t happen with the current unroll logic, since @kou kou raised that question. Otherwise the fixes seem aligned with the latest comments.

Willing to help test once the approach is finalized.

Thanks @k8ika0s as well for your review comments.. I have checked the tail==8 code path, and its not required anymore. I have reverted the changes and pushed the code changes again..

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 24, 2025
@Vishwanatha-HD Vishwanatha-HD force-pushed the fixParqIssues2 branch 4 times, most recently from 471c817 to 4d4691a Compare November 24, 2025 11:09
Copy link
Contributor Author

@Vishwanatha-HD Vishwanatha-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed all the review comments.. Please re-review the changes.. Thanks..

@Vishwanatha-HD
Copy link
Contributor Author

Mostly looks good to me — just one thought after reading through the recent back-and-forth...

Given the updated handling of tail bytes and the SafeLoadUpTo8Bytes discussion, I think this PR’s direction still makes sense. I’d just double-check that the tail==8 path really can’t happen with the current unroll logic, since @kou kou raised that question. Otherwise the fixes seem aligned with the latest comments.

Willing to help test once the approach is finalized.

@k8ika0s.. Thanks very much for your review on this.. Yeah sure.. You please go ahead and cherry-pick my PR patches and run the tests from your end.. Please let me know the final status.. Thanks.. !!

Copy link
Contributor Author

@Vishwanatha-HD Vishwanatha-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved all the code review comments

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Nov 25, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants