From 54ce1693b89ba367bab0c0729cf7b5e5ef7d7c49 Mon Sep 17 00:00:00 2001
From: Dante Gama Dessavre
Date: Wed, 17 Sep 2025 09:45:30 -0500
Subject: [PATCH 1/2] DOC Update categorical docs

---
 doc/tutorials/categorical.rst | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/doc/tutorials/categorical.rst b/doc/tutorials/categorical.rst
index 76b88e67dece..d836d96f5e4f 100644
--- a/doc/tutorials/categorical.rst
+++ b/doc/tutorials/categorical.rst
@@ -2,12 +2,9 @@
 Categorical Data
 ################
 
-.. note::
 
-  As of XGBoost 1.6, the feature is experimental and has limited features
-
-Starting from version 1.5, XGBoost has experimental support for categorical data available
-for public testing. For numerical data, the split condition is defined as :math:`value <
+Since version 1.5, XGBoost has support for categorical data.
+For numerical data, the split condition is defined as :math:`value <
 threshold`, while for categorical data the split is defined depending on whether
 partitioning or onehot encoding is used. For partition-based splits, the splits are
 specified as :math:`value \in categories`, where ``categories`` is the set of categories

From a527db82d1d4617d19db8eb1df2bde7b89673d54 Mon Sep 17 00:00:00 2001
From: Jiaming Yuan
Date: Fri, 19 Sep 2025 10:49:14 +0800
Subject: [PATCH 2/2] Note for the R package.

---
 doc/tutorials/categorical.rst | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/doc/tutorials/categorical.rst b/doc/tutorials/categorical.rst
index 4d6edcc176b1..c3c7d173078b 100644
--- a/doc/tutorials/categorical.rst
+++ b/doc/tutorials/categorical.rst
@@ -2,15 +2,14 @@
 Categorical Data
 ################
 
-
-Since version 1.5, XGBoost has support for categorical data.
-For numerical data, the split condition is defined as :math:`value <
-threshold`, while for categorical data the split is defined depending on whether
-partitioning or onehot encoding is used. For partition-based splits, the splits are
-specified as :math:`value \in categories`, where ``categories`` is the set of categories
-in one feature. If onehot encoding is used instead, then the split is defined as
-:math:`value == category`. More advanced categorical split strategy is planned for future
-releases and this tutorial details how to inform XGBoost about the data type.
+Since version 1.5, XGBoost has support for categorical data. For numerical data, the
+split condition is defined as :math:`value < threshold`, while for categorical data the
+split is defined depending on whether partitioning or onehot encoding is used. For
+partition-based splits, the splits are specified as :math:`value \in categories`, where
+``categories`` is the set of categories in one feature. If onehot encoding is used
+instead, then the split is defined as :math:`value == category`. More advanced categorical
+split strategy is planned for future releases and this tutorial details how to inform
+XGBoost about the data type.
 
 
 ************************************
@@ -62,6 +61,9 @@ for a worked example of using categorical data with ``scikit-learn`` interface w
 one-hot encoding. A comparison between using one-hot encoded data and XGBoost's
 categorical data support can be found :ref:`sphx_glr_python_examples_cat_in_the_dat.py`.
 
+.. versionadded:: 3.0
+
+  Support for the R package using ``factor``.
 
 ********************
 Optimal Partitioning