@@ -2,23 +2,15 @@
 Categorical Data
 ################

-.. note::
+Since version 1.5, XGBoost has support for categorical data. For numerical data, the
+split condition is defined as :math:`value < threshold`, while for categorical data the
+split is defined depending on whether partitioning or onehot encoding is used. For
+partition-based splits, the splits are specified as :math:`value \in categories`, where
+``categories`` is the set of categories in one feature. If onehot encoding is used
+instead, then the split is defined as :math:`value == category`. More advanced categorical
+split strategy is planned for future releases and this tutorial details how to inform
+XGBoost about the data type.

- As of XGBoost 1.6, the feature is experimental and has limited features. Only the
- Python package is fully supported.
-
-.. versionadded:: 3.0
-
- Support for the R package using ``factor``.
-
-Starting from version 1.5, the XGBoost Python package has experimental support for
-categorical data available for public testing. For numerical data, the split condition is
-defined as :math:`value < threshold`, while for categorical data the split is defined
-depending on whether partitioning or onehot encoding is used. For partition-based splits,
-the splits are specified as :math:`value \in categories`, where ``categories`` is the set
-of categories in one feature. If onehot encoding is used instead, then the split is
-defined as :math:`value == category`. More advanced categorical split strategy is planned
-for future releases and this tutorial details how to inform XGBoost about the data type.

 ************************************
 Training with scikit-learn Interface
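The three split conditions described in the added paragraph can be illustrated with a small, dependency-free sketch. The helper names ``numerical_split``, ``partition_split`` and ``onehot_split`` are hypothetical and are not part of XGBoost's API; each one only evaluates whether a condition holds for a single feature value and says nothing about which child node XGBoost sends the row to.

```python
from typing import AbstractSet, Hashable


# Hypothetical helpers, not XGBoost functions: each returns whether one of the
# split conditions from the tutorial text holds for a single feature value.
def numerical_split(value: float, threshold: float) -> bool:
    """Numerical split condition: value < threshold."""
    return value < threshold


def partition_split(value: Hashable, categories: AbstractSet[Hashable]) -> bool:
    """Partition-based categorical split condition: value in categories."""
    return value in categories


def onehot_split(value: Hashable, category: Hashable) -> bool:
    """One-hot categorical split condition: value == category."""
    return value == category


rows = ["red", "green", "blue", "green"]

# Partitioning tests membership in a set of categories...
holds_partition = [partition_split(v, {"red", "blue"}) for v in rows]
# ...while one-hot encoding tests equality with exactly one category.
holds_onehot = [onehot_split(v, "green") for v in rows]

print(numerical_split(2.5, 3.0))  # True
print(holds_partition)  # [True, False, True, False]
print(holds_onehot)  # [False, True, False, True]
```

A one-hot split behaves like a partition split whose category set has a single element: ``partition_split(v, {"green"})`` gives the same result as ``onehot_split(v, "green")`` for every ``v``.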
@@ -69,6 +61,9 @@ for a worked example of using categorical data with ``scikit-learn`` interface w
 one-hot encoding. A comparison between using one-hot encoded data and XGBoost's
 categorical data support can be found :ref:`sphx_glr_python_examples_cat_in_the_dat.py`.

+.. versionadded:: 3.0
+
+ Support for the R package using ``factor``.

 ********************
 Optimal Partitioning