2 changes: 1 addition & 1 deletion .gitignore
@@ -13,7 +13,7 @@ doc/_build/
doc/examples_classification/
doc/examples_regression/
doc/examples_calibration/
doc/examples_multilabel_classification/
doc/examples_risk_control/
doc/examples_mondrian/
doc/auto_examples/
doc/modules/generated/
1 change: 1 addition & 0 deletions HISTORY.rst
@@ -16,6 +16,7 @@ History
* MAPIE now supports Python versions up to the latest release (currently 3.13)
* Change `prefit` default value to `True` in split methods' docstrings to remain consistent with the implementation
* Fix issue 699 to replace `TimeSeriesRegressor.partial_fit` with `TimeSeriesRegressor.update`
* Revert incorrect renaming of calibration to conformalization in risk_control.py

1.0.1 (2025-05-22)
------------------
2 changes: 1 addition & 1 deletion doc/Makefile
@@ -50,7 +50,7 @@ clean:
-rm -rf $(BUILDDIR)/*
-rm -rf examples_regression/
-rm -rf examples_classification/
-rm -rf examples_multilabel_classification/
-rm -rf examples_risk_control/
-rm -rf examples_calibration/
-rm -rf examples_mondrian/
-rm -rf generated/*
4 changes: 2 additions & 2 deletions doc/conf.py
@@ -321,14 +321,14 @@
"examples_dirs": [
"../examples/regression",
"../examples/classification",
"../examples/multilabel_classification",
"../examples/risk_control",
"../examples/calibration",
"../examples/mondrian",
],
"gallery_dirs": [
"examples_regression",
"examples_classification",
"examples_multilabel_classification",
"examples_risk_control",
"examples_calibration",
"examples_mondrian",
],
3 changes: 2 additions & 1 deletion doc/index.rst
@@ -24,7 +24,8 @@
:caption: Control prediction errors

theoretical_description_risk_control
examples_multilabel_classification/1-quickstart/plot_tutorial_risk_control
examples_risk_control/1-quickstart/plot_risk_control_binary_classification
examples_risk_control/index
external_risk_control_package

.. toctree::
8 changes: 7 additions & 1 deletion doc/quick_start.rst
@@ -40,4 +40,10 @@ Here, we generate one-dimensional noisy data that we fit with a MLPRegressor: `U
3. Classification
=======================

Similarly, it's possible to do the same for a basic classification problem: `Use MAPIE to plot prediction sets <https://mapie.readthedocs.io/en/stable/examples_classification/1-quickstart/plot_quickstart_classification.html>`_
Similarly, it's possible to do the same for a basic classification problem: `Use MAPIE to plot prediction sets <https://mapie.readthedocs.io/en/stable/examples_classification/1-quickstart/plot_quickstart_classification.html>`_


4. Risk Control
=======================

MAPIE implements risk control methods for multilabel classification (in particular, image segmentation) and binary classification: `Use MAPIE to control risk for a binary classifier <https://mapie.readthedocs.io/en/stable/examples_risk_control/1-quickstart/plot_risk_control_binary_classification.html>`_
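
To give a feel for the API before opening that example, here is a condensed sketch distilled from it (same data, estimator and MAPIE calls as the linked binary classification example, with the plotting and inspection steps omitted):

.. code-block:: python

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    from mapie.risk_control import BinaryClassificationController, precision
    from mapie.utils import train_conformalize_test_split

    # Toy data and a base classifier fitted on the training split
    X, y = make_circles(n_samples=3000, noise=0.3, factor=0.3, random_state=1)
    (X_train, X_calib, X_test,
     y_train, y_calib, y_test) = train_conformalize_test_split(
        X, y, train_size=0.8, conformalize_size=0.1, test_size=0.1, random_state=1)
    clf = SVC(probability=True, random_state=1).fit(X_train, y_train)

    # Search for thresholds that keep precision above 0.8 with 90% confidence
    bcc = BinaryClassificationController(
        clf.predict_proba, precision, target_level=0.8, confidence_level=0.9)
    bcc.calibrate(X_calib, y_calib)

    # Predict on new data with the best statistically guaranteed threshold
    y_pred = bcc.predict(X_test)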
45 changes: 32 additions & 13 deletions doc/theoretical_description_risk_control.rst
@@ -13,26 +13,43 @@ Getting started with risk control in MAPIE
Overview
========

This section provides an overview of risk control in MAPIE. If you are unfamiliar with the concept of risk control, the next section introduces the topic.

Three methods of risk control have been implemented in MAPIE so far:
**Risk-Controlling Prediction Sets** (RCPS) [1], **Conformal Risk Control** (CRC) [2] and **Learn Then Test** (LTT) [3].
The difference between these methods is the way the conformity scores are computed.

As of now, MAPIE supports risk control for two machine learning tasks: **binary classification**, as well as **multi-label classification** (including applications like image segmentation).
As of now, MAPIE supports risk control for two machine learning tasks: **binary classification**, as well as **multi-label classification** (in particular applications like image segmentation).
The table below details the available methods for each task:

.. |br| raw:: html

<br />

.. list-table:: Available risk control methods in MAPIE for each ML task
:header-rows: 1

* - Risk control method
- Binary classification
- Multi-label classification (image segmentation)
* - Risk control |br| method
- Type of |br| control
- Assumption |br| on the data
- Non-monotonic |br| risks
- Binary |br| classification
- Multi-label |br| classification
* - RCPS
- Probability
- i.i.d.
- ❌
- ❌
- ✅
* - CRC
- Expectation
- Exchangeable
- ❌
- ❌
- ✅
* - LTT
- Probability
- i.i.d.
- ✅
- ✅
- ✅

@@ -41,7 +58,7 @@ In MAPIE for multi-label classification, CRC and RCPS are used for recall contro
1. What is risk control?
========================

Before diving into risk control, let's take the simple example of a binary classification model, which separates the incoming data into the two classes thanks to its threshold: predictions above it are classified as 1, and those below as 0. Suppose we want to find a threshold that guarantees that our model achieves a certain level of precision. A naive, yet straightforward approach to do this is to evaluate how precision varies with different threshold values on a validation dataset. By plotting this relationship (see plot below), we can identify the range of thresholds that meet our desired precision requirement (green zone on the graph).
Before diving into risk control, let's take the simple example of a binary classification model, which separates the incoming data into two classes. Predicted probabilities above a given threshold (e.g., 0.5) correspond to predicting the "positive" class and probabilities below correspond to the "negative" class. Suppose we want to find a threshold that guarantees that our model achieves a certain level of precision. A naive, yet straightforward approach to do this is to evaluate how precision varies with different threshold values on a validation dataset. By plotting this relationship (see plot below), we can identify the range of thresholds that meet our desired precision requirement (green zone on the graph).

.. image:: images/example_without_risk_control.png
:width: 600
@@ -54,7 +71,7 @@ So far, so good. But here is the catch: while the chosen threshold effectively k
Risk control is the science of adjusting a model's parameter, typically denoted :math:`\lambda`, so that a given risk stays below a desired level with high probability on unseen data.
Note that here, the term *risk* is used to describe an undesirable outcome of the model (e.g., type I error): therefore, it is a value we want to minimize, and in our case, keep under a certain level. Also note that risk control can easily be applied to metrics we want to maximize (e.g., precision), simply by controlling the complement (e.g., 1-precision).

The strength of risk control lies in the statistical guarantees it provides on unseen data. Unlike the naive method presented earlier, it determines a value of :math:`\lambda` that ensures the risk is controlled *beyond* the training data.
The strength of risk control lies in the statistical guarantees it provides on unseen data. Unlike the naive method presented earlier, it determines a value of :math:`\lambda` that ensures the risk is controlled *beyond* the validation data.

Applying risk control to the previous example would allow us to get a new — albeit narrower — range of thresholds (blue zone on the graph) that are **statistically guaranteed**.

@@ -66,7 +83,7 @@ This guarantee is critical in a wide range of use cases (especially in high-stak


To express risk control in mathematical terms, we denote by R the risk we want to control, and introduce the following two parameters:
To express risk control in mathematical terms, we denote by :math:`R` the risk we want to control, and introduce the following two parameters:

- :math:`\alpha`: the target level below which we want the risk to remain, as shown in the figure below;

@@ -76,13 +93,13 @@

- :math:`\delta`: the confidence level associated with the risk control.

In other words, the risk is said to be controlled if :math:`R \leq \alpha` with probability at least :math:`1 - \delta`.
In other words, the risk is said to be controlled if :math:`R \leq \alpha` with probability at least :math:`1 - \delta`, where the probability is over the randomness in the sampling of the dataset.

The three risk control methods implemented in MAPIE — RCPS, CRC and LTT — rely on different assumptions, and offer slightly different guarantees:

- **CRC** requires the data to be **exchangeable**, and gives a guarantee on the **expectation of the risk**: :math:`\mathbb{E}(R) \leq \alpha`;

- **RCPS** and **LTT** both impose stricter assumptions, requiring the data to be **independent and identically distributed** (i.i.d.), which implies exchangeability. The guarantee they provide is on the **probability that the risk does not exceed :math:`\alpha`**: :math:`\mathbb{P}(R \leq \alpha) \geq 1 - \delta`.
- **RCPS** and **LTT** both impose stricter assumptions, requiring the data to be **independent and identically distributed** (i.i.d.), which implies exchangeability. The guarantee they provide is on the **probability that the risk does not exceed** :math:`\boldsymbol{\alpha}`: :math:`\mathbb{P}(R \leq \alpha) \geq 1 - \delta`.

.. image:: images/risk_distribution.png
:width: 600
@@ -94,12 +111,14 @@ The plot above gives a visual representation of the difference between the two t

- The risk is controlled in probability (RCPS/LTT) if at least a fraction :math:`1 - \delta` of its distribution over unseen data is below :math:`\alpha`.

Note that at the opposite of the other two methods, LTT allows to control any non-monotonic risk.
Note that, contrary to the other two methods, LTT can control risks that are non-monotonic.

The following section provides a detailed overview of each method.

2. Theoretical description
==========================
Note that a notebook testing the theoretical guarantees of risk control in binary classification, using a random classifier and synthetic data, is available here: `theoretical_validity_tests.ipynb <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/risk_control/theoretical_validity_tests.ipynb>`__. A condensed sketch of this kind of empirical check is given below.
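
The sketch below gives a rough idea of such a check, on a smaller scale and with a plain scikit-learn classifier instead of the notebook's random classifier (an illustrative sketch only, not the notebook's code): the calibration set is resampled many times, and we measure how often the risk (here, 1 - precision) exceeds :math:`\alpha` on held-out data. With a probabilistic guarantee at confidence level :math:`1 - \delta`, this should happen in roughly at most a :math:`\delta` fraction of runs.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    from mapie.risk_control import BinaryClassificationController, precision

    alpha, delta = 0.2, 0.1  # i.e. target precision >= 0.8, confidence 90%
    X, y = make_classification(n_samples=20000, random_state=0)
    X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, train_size=0.3, random_state=0)
    clf = LogisticRegression().fit(X_fit, y_fit)

    n_runs, certified, violations = 100, 0, 0
    for seed in range(n_runs):
        # Resample a fresh calibration set: the guarantee is over this randomness.
        X_calib, X_test, y_calib, y_test = train_test_split(
            X_rest, y_rest, train_size=1000, random_state=seed)
        bcc = BinaryClassificationController(
            clf.predict_proba, precision,
            target_level=1 - alpha, confidence_level=1 - delta)
        bcc.calibrate(X_calib, y_calib)
        if len(bcc.valid_predict_params) == 0:
            continue  # no threshold could be certified for this calibration sample
        certified += 1
        y_pred = np.asarray(bcc.predict(X_test))
        # Convention: precision is counted as 1 when no positive prediction is made.
        test_precision = (y_test[y_pred == 1] == 1).mean() if y_pred.sum() else 1.0
        violations += test_precision < 1 - alpha

    print(f"Risk exceeded alpha in {violations}/{certified} certified runs "
          f"(expected at most ~{delta:.0%} of them).")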

2.1 Risk-Controlling Prediction Sets
------------------------------------
2.1.1 General settings
@@ -234,7 +253,7 @@ We are going to present the Learn Then Test framework that allows the user to co
This method has been introduced in article [3].
The settings here are the same as for RCPS and CRC; we just need to introduce some new parameters:

- Let :math:`\Lambda` be a discretized for our :math:`\lambda`, meaning that :math:`\Lambda = \{\lambda_1, ..., \lambda_n\}`.
- Let :math:`\Lambda` be a discretized set for our :math:`\lambda`, meaning that :math:`\Lambda = \{\lambda_1, ..., \lambda_n\}`.

- Let :math:`p_\lambda` be a valid p-value for the null hypothesis :math:`\mathbb{H}_j: R(\lambda_j)>\alpha`.

@@ -250,7 +269,7 @@ In order to find all the parameters :math:`\lambda` that satisfy the above condi
:math:`\{(x_1, y_1), \dots, (x_n, y_n)\}`.

- For each :math:`\lambda_j` in a discrete set :math:`\Lambda = \{\lambda_1, \lambda_2,\dots, \lambda_n\}`, we associate the null hypothesis
:math:`\mathcal{H}_j: R(\lambda_j) > \alpha`, as rejecting the hypothesis corresponds to selecting :math:`\lambda_j` as a point where risk the risk
:math:`\mathcal{H}_j: R(\lambda_j) > \alpha`, as rejecting the hypothesis corresponds to selecting :math:`\lambda_j` as a point where the risk
is controlled.

- For each null hypothesis, we compute a valid p-value :math:`p_{\lambda_j}` using a concentration inequality. Here we choose to compute the Hoeffding-Bentkus p-value
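
For intuition, the selection step described in the bullets above can be sketched as follows. This is an illustration of the procedure, not MAPIE's implementation (which may use a tighter bound and a different multiple-testing correction): a Hoeffding bound and the Bentkus bound are combined into a p-value for each hypothesis :math:`\mathcal{H}_j`, and a Bonferroni correction at level :math:`\delta` over the :math:`|\Lambda|` hypotheses keeps only the :math:`\lambda_j` whose risk is deemed controlled.

.. code-block:: python

    import numpy as np
    from scipy.stats import binom

    def hoeffding_bentkus_p_value(r_hat: float, n: int, alpha: float) -> float:
        """Valid p-value for H_j: R(lambda_j) > alpha, given the empirical risk
        r_hat of lambda_j on n calibration points (risk assumed bounded in [0, 1])."""
        p_hoeffding = np.exp(-2 * n * max(0.0, alpha - r_hat) ** 2)
        p_bentkus = np.e * binom.cdf(np.ceil(n * r_hat), n, alpha)
        return float(min(1.0, p_hoeffding, p_bentkus))

    def ltt_valid_lambdas(lambdas, empirical_risks, n, alpha, delta):
        """Return the lambdas whose null hypothesis is rejected, i.e. whose risk
        is controlled, using a Bonferroni correction over the |Lambda| tests."""
        p_values = [hoeffding_bentkus_p_value(r, n, alpha) for r in empirical_risks]
        return [lam for lam, p in zip(lambdas, p_values) if p <= delta / len(lambdas)]

    # Example: empirical risks of three candidate lambdas estimated on n = 1000 points
    print(ltt_valid_lambdas([0.3, 0.5, 0.7], [0.15, 0.08, 0.02], n=1000, alpha=0.1, delta=0.1))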
2 changes: 0 additions & 2 deletions doc/v1_release_notes.rst
@@ -263,8 +263,6 @@ Risk control

The ``MapieMultiLabelClassifier`` class has been renamed ``PrecisionRecallController``.

The parameter ``calib_size`` from the ``fit`` method has been renamed ``conformalize_size``.

Calibration
^^^^^^^^^^^^^

4 changes: 0 additions & 4 deletions examples/multilabel_classification/README.rst

This file was deleted.

@@ -1,6 +1,6 @@
.. _multilabel_classification_examples_1:
.. _risk_control_examples_1:

1. Quickstart examples
----------------------

The following examples present the main functionalities of MAPIE through basic quickstart regression problems.
The following examples present the main functionalities of MAPIE through basic quickstart risk control problems.
@@ -0,0 +1,126 @@
"""
=================================================
Use MAPIE to control risk for a binary classifier
=================================================

In this example, we explain how to do risk control for binary classification with MAPIE.

"""

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import FixedThresholdClassifier
from sklearn.metrics import precision_score
from sklearn.inspection import DecisionBoundaryDisplay

from mapie.risk_control import BinaryClassificationController, precision
from mapie.utils import train_conformalize_test_split

RANDOM_STATE = 1

##############################################################################
# Let us first load the dataset and fit an SVC on the training data.

X, y = make_circles(n_samples=3000, noise=0.3,
                    factor=0.3, random_state=RANDOM_STATE)
(X_train, X_calib, X_test,
 y_train, y_calib, y_test) = train_conformalize_test_split(
    X, y, train_size=0.8, conformalize_size=0.1, test_size=0.1,
    random_state=RANDOM_STATE)

clf = SVC(probability=True, random_state=RANDOM_STATE)
clf.fit(X_train, y_train)

##############################################################################
# Next, we initialize a :class:`~mapie.risk_control.BinaryClassificationController`
# using the probability estimation function from the fitted estimator:
# ``clf.predict_proba``, a risk function (here the precision), a target risk level, and
# a confidence level. Then we use the calibration data to compute statistically
# guaranteed thresholds using a risk control method.

target_precision = 0.8
bcc = BinaryClassificationController(
    clf.predict_proba, precision, target_level=target_precision, confidence_level=0.9)
bcc.calibrate(X_calib, y_calib)

print(f'{len(bcc.valid_predict_params)} valid thresholds found. '
      f'The best one is {bcc.best_predict_param:.3f}.')


##############################################################################
# In the plot below, we visualize how the threshold value impacts precision, and which
# thresholds have been identified as statistically guaranteed.

proba_positive_class = clf.predict_proba(X_calib)[:, 1]

tested_thresholds = bcc._predict_params
precisions = np.full(len(tested_thresholds), np.inf)
for i, threshold in enumerate(tested_thresholds):
    y_pred = (proba_positive_class >= threshold).astype(int)
    precisions[i] = precision_score(y_calib, y_pred)

valid_thresholds_indices = np.array(
    [t in bcc.valid_predict_params for t in tested_thresholds])
best_threshold_index = np.where(
    tested_thresholds == bcc.best_predict_param)[0][0]

plt.figure()
plt.scatter(tested_thresholds[valid_thresholds_indices],
            precisions[valid_thresholds_indices], c='tab:green',
            label='Valid thresholds')
plt.scatter(tested_thresholds[~valid_thresholds_indices],
            precisions[~valid_thresholds_indices], c='tab:red',
            label='Invalid thresholds')
plt.scatter(tested_thresholds[best_threshold_index], precisions[best_threshold_index],
            c='tab:green', label='Best threshold', marker='*', edgecolors='k', s=300)
plt.axhline(target_precision, color='tab:gray', linestyle='--')
plt.text(0, target_precision+0.02, 'Target precision',
         color='tab:gray', fontstyle='italic')
plt.xlabel('Threshold', labelpad=15)
plt.ylabel('Precision')
plt.legend()
plt.show()

##############################################################################
# Contrary to the naive way of computing a threshold to satisfy a precision target on
# calibration data, risk control provides statistical guarantees on unseen data.
# Besides computing a set of valid thresholds,
# :class:`~mapie.risk_control.BinaryClassificationController` also outputs the best
# one, which in the case of precision is the threshold that, among all valid ones,
# maximizes recall.
#
# In the figure above, the highest threshold values are considered invalid due to the
# small number of observations used to compute the precision, following the Learn then
# Test procedure. In the most extreme case, no observation is available, which causes
# the precision value to be ill-defined and set to 0.
#
# After obtaining the best threshold, we can use the ``predict`` function of
# :class:`~mapie.risk_control.BinaryClassificationController` for future predictions,
# or use scikit-learn's ``FixedThresholdClassifier`` as a wrapper to benefit
# from functionalities like easily plotting the decision boundary as seen below.

y_pred = bcc.predict(X_test)

clf_threshold = FixedThresholdClassifier(clf, threshold=bcc.best_predict_param)
# necessary for plotting, alternatively you can use sklearn.frozen.FrozenEstimator
clf_threshold.fit(X_train, y_train)

disp = DecisionBoundaryDisplay.from_estimator(
    clf_threshold, X_test, response_method="predict", cmap=plt.cm.coolwarm)

plt.scatter(X_test[y_test == 0, 0], X_test[y_test == 0, 1],
            edgecolors='k', c='tab:blue', alpha=0.5, label='"negative" class')
plt.scatter(X_test[y_test == 1, 0], X_test[y_test == 1, 1],
            edgecolors='k', c='tab:red', alpha=0.5, label='"positive" class')
plt.title("Decision Boundary of FixedThresholdClassifier")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()

##############################################################################
# Different risk functions have been implemented, such as precision and recall, but you
# can also implement your own custom function using
# :class:`~mapie.risk_control.BinaryClassificationRisk`.
6 changes: 6 additions & 0 deletions examples/risk_control/2-advanced-analysis/README.rst
@@ -0,0 +1,6 @@
.. _risk_control_examples_2:

2. Advanced analysis
--------------------

The following examples use MAPIE to explore more complex risk control problems.
6 changes: 6 additions & 0 deletions examples/risk_control/README.rst
@@ -0,0 +1,6 @@
.. _risk_control_examples:

All risk control examples
=========================

The following is a collection of notebooks demonstrating how to use MAPIE for risk control.