
Conversation

@razdoburdin
Contributor

The current version of xgboost uses a fixed block_size = 256 for hist building.

This PR makes this value an adaptive function of the model parameters and the CPU cache size. The change matters mostly for ColsWiseBuildHistKernel and demonstrates up to a 2x speed-up on the epsilon dataset.

@razdoburdin razdoburdin marked this pull request as draft November 12, 2025 17:18
@trivialfis
Member

Thank you for the optimizations! The code looks reasonable, but please add comments when the PR is ready for review. (and ping me).

std::size_t occupied_space = (hist_fit_to_l1 ? hist_size : 0) + offsets_size + idx_bin_size;
space_in_l1_for_rows = usable_l1_size > occupied_space ? usable_l1_size - occupied_space : 0;
}
std::size_t block_size = std::max<std::size_t>(1, space_in_l1_for_rows / l1_row_foot_print);

Previously block_size was always 256 rows, which is quite large. Now it can be as small as 1 row when no more rows fit into L1. Won't this change hurt performance in the case where there is not enough space for rows in L1?
Should it be max(256, space_in_l1_for_rows / l1_row_foot_print)?
Or maybe the L2 size should be used to calculate block_size?

Contributor Author

I think cacheline_size / (2 * sizeof(float)) = 8 would be the best value in this case. Using L2 would result in a huge block_size (~1e4-1e5) and could underutilize the CPU cores (blocks are processed in parallel, and if blocks are very big, some cores would be left without work).

@razdoburdin razdoburdin marked this pull request as ready for review November 13, 2025 16:20
@razdoburdin
Contributor Author

Hi @trivialfis, this PR is ready for review.

Member

@trivialfis left a comment


Would you like to explain the cache info in code comments? Also, how and why the construction of the hist space depends on the cache size?

@razdoburdin
Contributor Author

Would you like to explain the cache info in code comments? Also, how and why the construction of the hist space depends on the cache size?

done

GetCacheInfo(cache_num++, &type, &level, &sets, &line_size, &partitions, &ways);
if (!trust_cpuid) return trust_cpuid;

if (type == kCpuidTypeNull) break;
Member

Is the cache_sizes[idx] valid if we break here?

Contributor Author

In this case we use default values from SetDefaultCaches

size_t hist_size = 2 * sizeof(double) * nbins;
const bool hist_fit_to_l2 = 0.8 * cache_manager_.L2Size() > hist_size;

bool read_by_column = !hist_fit_to_l2 && gidx.IsDense();
Member

Could you please add a comment on how this decision is made?

Contributor Author

done

std::size_t n_bins = gidx.cut.Ptrs().back();
std::size_t n_columns = gidx.cut.Ptrs().size() - 1;
bool any_missing = !gidx.IsDense();
std::size_t hist_size = 2 * sizeof(double) * n_bins;
Member

Consider using sizeof(GradientPair) and sizeof(GradientPairPrecise) instead of sizeof(float) * 2 (for all sizeof calls in this PR).

Contributor Author

done

*/

/* First step: determine whether one histogram column fits into L1.
* The maximum number of elements in a column is 2^8, 2^16, or 2^32,
Member

Could you please elaborate on what it means to be the maximum number of elements in a (histogram) column? I thought that's the number of histogram bins?

Contributor Author

You are right, "bins" is the correct term. I have fixed the description.

Member

Thank you for updating the comments. It's still not clear to me what it means to have "maximum number of bins" in a column. So, what happens if I specify the training parameter max_bin=53?
