| 1 | +################### |
| 2 | +3.1.0 (2025 Sep 22) |
| 3 | +################### |
| 4 | + |
| 5 | +We are delighted to share the latest 3.1.0 update for XGBoost. |
| 6 | + |
| 7 | +******************** |
| 8 | +Categorical Re-coder |
| 9 | +******************** |
| 10 | + |
| 11 | +This release features a major update to categorical data support by introducing a |
| 12 | +re-coder. This re-coder saves categories in the trained model and re-codes the data during |
| 13 | +inference, to keep the categorical encoding consistent. Aside from primitive types like |
| 14 | +integers, it also supports string-based categories. The implementation works with all |
| 15 | +supported Python DataFrame implementations. (:pr:`11609`, :pr:`11665`, :pr:`11605`, |
| 16 | +:pr:`11628`, :pr:`11598`, :pr:`11591`, :pr:`11568`, :pr:`11561`, :pr:`11650`, :pr:`11621`, |
| 17 | +:pr:`11611`, :pr:`11313`, :pr:`11311`, :pr:`11310`, :pr:`11315`, :pr:`11303`, :pr:`11612`, |
| 18 | +:pr:`11098`, :pr:`11347`) See :ref:`cat-recode` for more information. (:pr:`11297`) |
| 19 | + |
| 20 | +In addition, categorical support for Polars data frames is now available (:pr:`11565`). |
| 21 | + |
| 22 | +Lastly, we removed the experimental tag for categorical feature support in this |
| 23 | +release. (:pr:`11690`) |
| 24 | + |
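To make the idea behind the re-coder concrete, here is a minimal, library-independent sketch. It is not XGBoost's implementation. The helper names `fit_categories` and `recode` are hypothetical, and mapping unseen categories to `None` is an assumption made only for illustration.

```python
# Conceptual sketch only: this is not XGBoost's actual re-coder.

def fit_categories(values):
    """Record the categories seen during training, in first-seen order."""
    categories = []
    seen = set()
    for value in values:
        if value not in seen:
            seen.add(value)
            categories.append(value)
    return categories


def recode(values, train_categories):
    """Map inference-time values to the integer codes used during training.

    Values that were not seen in training map to None here; how a real
    library treats unseen categories is a separate design decision.
    """
    index = {cat: code for code, cat in enumerate(train_categories)}
    return [index.get(value) for value in values]


# Training time: the categories are saved alongside the model.
train_column = ["red", "green", "blue", "green"]
saved = fit_categories(train_column)

# Inference time: the same strings may arrive in a different order, so codes
# derived from the new data alone would not match the training codes.
test_column = ["blue", "red", "purple"]
codes = recode(test_column, saved)
```

Because the mapping is stored with the model, "blue" receives the same code at inference as it did in training, regardless of how the inference data frame happens to order its categories.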
| 25 | +*************** |
| 26 | +External Memory |
| 27 | +*************** |
| 28 | + |
| 29 | +We continue the work on external memory support in 3.1. In this release, XGBoost features |
| 30 | +an adaptive cache for CUDA external memory. The improved cache can split the data between |
| 31 | +CPU memory and GPU memory according to the underlying hardware and data |
| 32 | +size. (:pr:`11556`, :pr:`11465`, :pr:`11664`, :pr:`11594`, :pr:`11469`, :pr:`11547`, |
| 33 | +:pr:`11339`, :pr:`11477`, :pr:`11453`, :pr:`11446`, :pr:`11458`, :pr:`11426`, :pr:`11566`, |
| 34 | +:pr:`11497`) |
| 35 | + |
| 36 | +Also, there is opt-in support for using ``nvcomp`` and the GB200 decompression |
| 37 | +engine to handle sparse data, which requires nvcomp as a plugin. (:pr:`11451`, |
| 38 | +:pr:`11464`, :pr:`11460`, :pr:`11512`, :pr:`11520`) We reduced the memory usage of |
| 39 | +quantile sketching with external memory (:pr:`11641`) and optimized the predictor for |
| 40 | +training (:pr:`11548`). To help maintain training performance, the latest XGBoost |
| 41 | +detects NUMA (Non-Uniform Memory Access) nodes (:pr:`11538`, :pr:`11576`) to check for cross-socket data |
| 42 | +access. We are working on additional tooling to enhance NUMA node performance. Aside from |
| 43 | +features, we have also made various documentation improvements. (:pr:`11412`, |
| 44 | +:pr:`11631`) |
| 45 | + |
| 46 | +Lastly, external memory support with text file input has been removed |
| 47 | +(:pr:`11562`). Moving forward, we will focus on iterator inputs. |
| 48 | + |
| 49 | + |
| 50 | +**************************** |
| 51 | +Multi-Target/Class Intercept |
| 52 | +**************************** |
| 53 | + |
| 54 | +Starting with 3.1, the base-score (intercept) is estimated and stored as a vector when the |
| 55 | +model has multiple outputs, be it multi-target regression or multi-class |
| 56 | +classification. This change enhances the initial estimation for multi-output models and |
| 57 | +will be the starting point for future work on vector-leaf. (:pr:`11277`, :pr:`11651`, |
| 58 | +:pr:`11625`, :pr:`11649`, :pr:`11630`, :pr:`11647`, :pr:`11656`, :pr:`11663`) |
| 59 | + |
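The following sketch illustrates why a per-output intercept can be a better starting point than a single shared value. It assumes a squared-error objective, for which the label mean is the natural initial estimate. XGBoost's actual estimation depends on the objective and is not reproduced here, and the function name `vector_intercept` is hypothetical.

```python
# Illustration under a squared-error assumption; not XGBoost's internal code.

def vector_intercept(labels):
    """Return the per-target mean of a row-major label matrix."""
    n_rows = len(labels)
    n_targets = len(labels[0])
    return [sum(row[t] for row in labels) / n_rows for t in range(n_targets)]


# Two regression targets on very different scales.
labels = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 60.0],
]

intercept = vector_intercept(labels)

# A single shared scalar, shown only for comparison.
shared = sum(sum(row) for row in labels) / (len(labels) * len(labels[0]))
```

Here the vector intercept is `[2.0, 30.0]`, while the shared scalar is `16.0`, which is far from the center of either target.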
| 60 | +******** |
| 61 | +Features |
| 62 | +******** |
| 63 | + |
| 64 | +- Support leaf prediction with QDM on CPU. (:pr:`11620`) |
| 65 | +- Improve seed with mean sampling for the first iteration. (:pr:`11639`) |
| 66 | +- Optionally include git hash in CMake build. (:pr:`11587`) |
| 67 | + |
| 68 | +**************************** |
| 69 | +Removing Deprecated Features |
| 70 | +**************************** |
| 71 | + |
| 72 | +This version removes some deprecated features, notably, the binary IO format, along with |
| 73 | +features deprecated in 2.0. |
| 74 | + |
| 75 | +- The binary serialization format has been removed in 3.1. The format was formally |
| 76 | + deprecated in `1.6 <https://github.com/dmlc/xgboost/issues/7547>`__. (:pr:`11307`, |
| 77 | + :pr:`11553`, :pr:`11552`, :pr:`11602`) |
| 78 | + |
| 79 | +- Removed old GPU-related parameters including ``use_gpu`` (pyspark), ``gpu_id``, |
| 80 | + ``gpu_hist``, and ``gpu_coord_descent``. These parameters were deprecated in |
| 81 | + 2.0. Use the ``device`` parameter instead. (:pr:`11395`, :pr:`11554`, :pr:`11549`, |
| 82 | + :pr:`11543`, :pr:`11539`, :pr:`11402`) |
| 83 | + |
| 84 | +- Remove deprecated C functions: ``XGDMatrixCreateFromCSREx``, |
| 85 | + ``XGDMatrixCreateFromCSCEx``. (:pr:`11514`, :pr:`11513`) |
| 86 | + |
| 87 | +- XGBoost now emits a warning for text inputs. (:pr:`11590`) |
| 88 | + |
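For code that still passes the removed GPU parameters, a small migration helper along the following lines may be useful. This is a sketch, not part of XGBoost: the function `migrate_gpu_params` is hypothetical. It assumes the replacements documented when these parameters were deprecated in 2.0, namely ``tree_method="hist"`` or ``updater="coord_descent"`` combined with ``device="cuda"`` or ``device="cuda:<ordinal>"``.

```python
# Hypothetical helper; the mapping below is an assumption based on the 2.0
# deprecation guidance and should be checked against your own configuration.

def migrate_gpu_params(params):
    """Return a copy of params with removed GPU options rewritten to use device."""
    new = dict(params)
    ordinal = new.pop("gpu_id", None)
    wants_gpu = bool(new.pop("use_gpu", False))  # PySpark-style flag

    if new.get("tree_method") == "gpu_hist":
        new["tree_method"] = "hist"
        wants_gpu = True
    if new.get("updater") == "gpu_coord_descent":
        new["updater"] = "coord_descent"
        wants_gpu = True
    if ordinal is not None and ordinal >= 0:
        wants_gpu = True

    # Respect an explicit device setting if the caller already provided one.
    if wants_gpu and "device" not in new:
        if ordinal is not None and ordinal >= 0:
            new["device"] = f"cuda:{ordinal}"
        else:
            new["device"] = "cuda"
    return new


old_params = {"tree_method": "gpu_hist", "gpu_id": 1, "max_depth": 6}
new_params = migrate_gpu_params(old_params)
```

The helper returns a new dictionary and leaves the input untouched, so it can be applied at the boundary where legacy configuration is loaded.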
| 89 | + |
| 90 | +************* |
| 91 | +Optimizations |
| 92 | +************* |
| 93 | + |
| 94 | +- Optimize CPU inference with Array-Based Tree Traversal (:pr:`11519`) |
| 95 | +- Specialize for GPU dense histogram. (:pr:`11443`) |
| 96 | +- [sycl] Improve L1 cache locality for histogram building. (:pr:`11555`) |
| 97 | +- [sycl] Reduce predictor memory consumption and improve L2 locality (:pr:`11603`) |
| 98 | + |
| 99 | +***** |
| 100 | +Fixes |
| 101 | +***** |
| 102 | + |
| 103 | +- Fix static linking C++ libraries on macOS (:pr:`11522`) |
| 104 | +- Rename param.hh/cc to hist_param.hh/cc to fix the Xcode build (:pr:`11378`) |
| 105 | +- [sycl] Fix build with updated compiler (:pr:`11618`) |
| 106 | +- [sycl] Various fixes for fp32-only devices. (:pr:`11527`, :pr:`11524`) |
| 107 | +- Fix compilation on Android older than API 26 (:pr:`11366`) |
| 108 | +- Fix loading Gamma model from 1.3. (:pr:`11377`) |
| 109 | + |
| 110 | +************** |
| 111 | +Python Package |
| 112 | +************** |
| 113 | + |
| 114 | +- Support mixing Python metrics and built-in metrics for the skl interface. (:pr:`11536`) |
| 115 | +- CUDA 13 Support for PyPI with the new ``xgboost-cu13`` package. (:pr:`11677`, :pr:`11662`) |
| 116 | +- Remove wheels for manylinux2014. (:pr:`11673`) |
| 117 | +- Initial support for building variant wheels (:pr:`11531`, :pr:`11645`, :pr:`11294`) |
| 118 | +- Minimum PySpark version is now set to 3.4 (:pr:`11364`). In addition, the PySpark |
| 119 | + interface now checks the validation indicator column type and has a fix for None column |
| 120 | + input. (:pr:`11535`, :pr:`11523`) |
| 121 | +- [dask] Small cleanup for the predict function. (:pr:`11423`) |
| 122 | + |
| 123 | +********* |
| 124 | +R Package |
| 125 | +********* |
| 126 | + |
| 127 | +Now that most of the deprecated features have been removed in this release, we will try to |
| 128 | +bring the latest R package back to CRAN. |
| 129 | + |
| 130 | +- Implement Booster reset. (:pr:`11357`) |
| 131 | +- Improvements for documentation, including code examples on XGBoost's Sphinx |
| 132 | + documentation site and notes for the R-universe release. (:pr:`11369`, :pr:`11410`, |
| 133 | + :pr:`11685`, :pr:`11316`) |
| 134 | + |
| 135 | +************ |
| 136 | +JVM Packages |
| 137 | +************ |
| 138 | + |
| 139 | +- Support columnar inputs for the CPU pipeline (:pr:`11352`) |
| 140 | +- Rewrite the ``LabeledPoint`` as a Java class (:pr:`11545`) |
| 141 | +- Various fixes and document updates. (:pr:`11525`, :pr:`11508`, :pr:`11489`, :pr:`11682`) |
| 142 | + |
| 143 | +********* |
| 144 | +Documents |
| 145 | +********* |
| 146 | + |
| 147 | +Changes for general documentation: |
| 148 | + |
| 149 | +- Update notes about GPU memory usage. (:pr:`11375`) |
| 150 | +- Various fixes and updates. (:pr:`11503`, :pr:`11532`, :pr:`11328`, :pr:`11344`, :pr:`11626`) |
| 151 | + |
| 152 | + |
| 153 | +****************** |
| 154 | +CI and Maintenance |
| 155 | +****************** |
| 156 | + |
| 157 | +- Code cleanups. (:pr:`11367`, :pr:`11342`, :pr:`11658`, :pr:`11528`, :pr:`11585`, |
| 158 | + :pr:`11672`, :pr:`11642`, :pr:`11667`, :pr:`11495`, :pr:`11567`) |
| 159 | +- Various cleanup and fixes for tests. (:pr:`11405`, :pr:`11389`, :pr:`11396`, :pr:`11456`) |
| 160 | +- Support CMake 4.0 (:pr:`11382`) |
| 161 | +- Various CI updates and fixes (:pr:`11318`, :pr:`11349`, :pr:`11653`, :pr:`11637`, |
| 162 | + :pr:`11683`, :pr:`11638`, :pr:`11644`, :pr:`11306`, :pr:`11560`, :pr:`11323`, :pr:`11617`, |
| 163 | + :pr:`11341`, :pr:`11693`) |