Feature proposal - groupSHAP #11493

@OlivierBinette

Problem

I’d like to be able to compute SHAP values “by group” for a tree ensemble, using an extension of the TreeSHAP algorithm.

For example, I might have a credit risk model with multiple features related to someone’s income. While I can add up the SHAP values of these features to get an overall “income” contribution, this is not the same as computing SHAP values with all income features included or excluded together as a single unit. The latter is called “groupSHAP”.

The distinction between summing up SHAP values by group and computing groupSHAP is very important, as summing up SHAP values by group can be misleading when features interact across groups.
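To make the distinction concrete, here is a minimal brute-force sketch (not TreeSHAP, and not xgboost code). It assumes a toy model `x0 * x1 * x2`, an all-zeros baseline, and two hypothetical groups, "income" (features 0 and 1) and "other" (feature 2). Absent features are simply replaced by their baseline value. All names are made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley(players, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for a coalition of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def model(x):
    # Toy model with a three-way interaction that spans both groups.
    return x[0] * x[1] * x[2]


x = (1.0, 1.0, 1.0)         # instance to explain
baseline = (0.0, 0.0, 0.0)  # assumed reference point for "absent" features
groups = {"income": (0, 1), "other": (2,)}  # hypothetical grouping


def feature_value(coalition):
    # Features outside the coalition take their baseline value.
    return model([x[i] if i in coalition else baseline[i] for i in range(3)])


def group_value(coalition):
    # A group is present only if all of its features are present together.
    present = {i for g in coalition for i in groups[g]}
    return feature_value(present)


# Per-feature SHAP values, then summed within each group.
feature_phi = shapley([0, 1, 2], feature_value)
summed = {g: sum(feature_phi[i] for i in idx) for g, idx in groups.items()}

# Group-level values: each group is a single player.
group_phi = shapley(list(groups), group_value)

print("summed per-feature:", summed)
print("group-level:       ", group_phi)
```

In this toy case, summing per-feature values attributes 2/3 of the prediction to "income" and 1/3 to "other", while treating each group as one player splits it 1/2 and 1/2. Both attributions add up to the same total; they differ only in how the cross-group interaction is shared.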

Proposed Solution

If there is interest in groupSHAP, I’d like to open a PR that adds a new, optional argument for computing groupSHAP in xgboost’s TreeSHAP implementation. The argument would be an optional mapping from feature name to feature group name.
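As a rough illustration of what such a mapping could look like, here is a plain-Python sketch. Nothing here is an existing xgboost parameter: `feature_to_group`, `invert_group_mapping`, and the feature names are hypothetical, and treating unmapped features as their own single-feature group is an assumption, not part of the proposal.

```python
from collections import defaultdict

# Hypothetical mapping from feature name to feature group name.
feature_to_group = {
    "salary": "income",
    "bonus": "income",
    "age": "demographics",
}


def invert_group_mapping(feature_names, feature_to_group):
    """Turn a feature -> group mapping into group -> list of column indices.

    Assumption: a feature missing from the mapping forms its own group,
    named after the feature itself.
    """
    groups = defaultdict(list)
    for index, name in enumerate(feature_names):
        groups[feature_to_group.get(name, name)].append(index)
    return dict(groups)


feature_names = ["salary", "age", "bonus", "tenure"]
group_indices = invert_group_mapping(feature_names, feature_to_group)
print(group_indices)
```

A name-keyed mapping is convenient for users, while an index-based grouping like `group_indices` is closer to what a tree traversal would need internally.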

It would be a rather simple modification and I can write up a technical report demonstrating the validity of the change.

Possible alternatives

There is a Python package that implements groupSHAP, but it is based on an R package and I’m not sure how robust it is. Given that groupSHAP can be added very easily to the TreeSHAP algorithm implementation, I think it would be a worthwhile addition to core xgboost.

If core xgboost is not the right place for this, I might write a small C extension that implements groupSHAP and wrap it in a separate Python package.

Additional context

If there is interest in this idea, I’m happy to say more about groupSHAP or to write a more detailed proposal.

But I’d like to hear thoughts from the contributors before I start work on a PR.
