Fix typo and links

C-BdB · C-BdB · commit 6cf7058be3ef · 2025-03-20T14:21:46.000+01:00
diff --git a/examples/regression/1-quickstart/plot_ts-tutorial.py b/examples/regression/1-quickstart/plot_ts-tutorial.py
@@ -4,7 +4,7 @@
 ========================
 
 In this tutorial we describe how to use
-:class:`~mapie.time_series_regression.MapieTimeSeriesRegressor`
+:class:`~mapie.regression.MapieTimeSeriesRegressor`
 to estimate prediction intervals associated with time series forecast.
 
 Here, we use the Victoria electricity demand dataset used in the book
@@ -24,7 +24,8 @@
 the EnbPI method.
 
 As its parent class :class:`~MapieRegressor`,
-:class:`~MapieTimeSeriesRegressor` has two main arguments : "cv", and "method".
+:class:`~mapie.regression.MapieTimeSeriesRegressor` has two main arguments :
+"cv", and "method".
 In order to implement EnbPI, "method" must be set to "enbpi" (the default
 value) while "cv" must be set to the :class:`~mapie.subsample.BlockBootstrap`
 class that block bootstraps the training set.
@@ -34,8 +35,8 @@ class that block bootstraps the training set.
 The EnbPI method allows you update the residuals during the prediction,
 each time new observations are available so that the deterioration of
 predictions, or the increase of noise level, can be dynamically taken into
-account. It can be done with :class:`~MapieTimeSeriesRegressor` through
-the ``partial_fit`` class method called at every step.
+account. It can be done with :class:`~mapie.regression.MapieTimeSeriesRegressor`
+through the ``partial_fit`` class method called at every step.
 
 
 The ACI strategy allows you to adapt the conformal inference
diff --git a/examples/regression/2-advanced-analysis/plot-coverage-width-based-criterion.py b/examples/regression/2-advanced-analysis/plot-coverage-width-based-criterion.py
@@ -10,7 +10,7 @@
 :class:`~mapie.metrics` is used to estimate the coverage width
 based criterion of 1D homoscedastic data using different strategies.
 The coverage width based criterion is computed with the function
-:func:`~mapie.metrics.coverage_width_based()`
+:func:`~mapie.metrics.coverage_width_based`
 """
 
 import os
diff --git a/examples/regression/2-advanced-analysis/plot_conditional_coverage.py b/examples/regression/2-advanced-analysis/plot_conditional_coverage.py
@@ -186,34 +186,28 @@ def sin_with_controlled_noise(
     print(estimated_cond_cov)
 
 ##############################################################################
-# We can see here that the global coverage is approximately the same for
-# all methods. What we want to understand is : "Are these methods good
-# adaptive conformal methods ?". For this we have the two metrics
+# The global coverage is similar for all methods. To determine if these
+# methods are good adaptive conformal methods, we use two metrics:
 # :func:`~mapie.metrics.regression_ssc_score` and :func:`~mapie.metrics.hsic`.
-# - SSC (Size Stratified Coverage) is the maximum violation of the coverage :
-# the intervals are grouped by width and the coverage is computed for each
-# group. The lower coverage is the maximum coverage violation. An adaptive
-# method is one where this maximum violation is as close as possible to the
-# global coverage. If we interpret the result for the four methods here :
-# CV+ seems to be the better one.
-# - And with the hsic correlation coefficient, we have the
-# same interpretation : :func:`~mapie.metrics.hsic` computes the correlation
-# between the coverage indicator and the interval size, a value of 0
-# translates an independence between the two.
 #
-# We would like to highlight here the misinterpretation that can be made
-# with these metrics. In fact, here CV+ with the absolute residual score
-# calculates constant intervals which, by definition, are not adaptive.
-# Therefore, it is very important to check that the intervals widths are well
-# spread before drawing conclusions (with a plot of the distribution of
-# interval widths or a visualisation of the data for example).
+# - SSC (Size Stratified Coverage): This measures the maximum violation
+# of coverage by grouping intervals by width and computing coverage for
+# each group. An adaptive method has a maximum violation close to the global
+# coverage. Among the four methods, CV+ performs the best.
 #
-# In this example, with the hsic correlation coefficient, none of the methods
-# stand out from the others. However, the SSC score for the method using the
-# gamma score is significantly worse than for CQR and ResidualNormalisedScore,
-# even though their global coverage is similar. ResidualNormalisedScore and CQR
-# are very close here, with ResidualNormalisedScore being slightly more
-# conservative.
+# HSIC (Hilbert-Schmidt Independence Criterion): This computes the
+# correlation between coverage and interval size. A value of 0 indicates
+# independence between the two.
+#
+# It's important to note that CV+ with the absolute residual score
+# calculates constant intervals, which are not adaptive. Therefore,
+# checking the distribution of interval widths is crucial before drawing conclusions.
+#
+# In this example, none of the methods stand out with the HSIC correlation coefficient.
+# However, the SSC score for the gamma score method is significantly worse than
+# for CQR and ResidualNormalisedScore, despite similar global coverage.
+# ResidualNormalisedScore and CQR are very close, with ResidualNormalisedScore
+# being slightly more conservative.
 
 
 # Visualition of the data and predictions
@@ -336,21 +330,19 @@ def plot_coverage_by_width(y, intervals, num_bins, alpha, title="", ax=None):
 plt.show()
 
 ##############################################################################
-# With toy datasets like this, it is easy to compare visually the methods
-# with a plot of the data and predictions.
-# As mentionned above, a histogram of the ditribution of the interval widths is
-# important to accompany the metrics. It is clear from this histogram
-# that CV+ is not adaptive, the metrics presented here should not be used
-# to evaluate its adaptivity. A wider spread of intervals indicates a more
-# adaptive method.
-# Finally, with the plot of coverage by bins of intervals grouped by widths
-# (which is the output of :func:`~mapie.metrics.regression_ssc`), we want
-# the bins to be as constant as possible around the global coverage (here 0.9).
-
-# As the previous metrics show, gamma score does not perform well in terms of
-# size stratified coverage. It either over-covers or under-covers too much.
-# For ResidualNormalisedScore and CQR, while the first one has several bins
-# with over-coverage, the second one has more under-coverage. These results
-# are confirmed by the visualisation of the data: CQR is better when the data
-# are more spread out, whereas ResidualNormalisedScore is better with small
-# intervals.
+# With toy datasets, it's easy to visually compare methods using data and
+# prediction plots. A histogram of interval widths is crucial to accompany
+# the metrics. This histogram shows that CV+ is not adaptive, so the metrics
+# should not be used to evaluate its adaptivity. A wider spread of intervals
+# indicates a more adaptive method.
+#
+# The plot of coverage by bins of intervals grouped by widths
+# (output of :func:`~mapie.metrics.regression_ssc`) should
+# show bins as constant as possible around the global coverage (0.9).
+
+# The gamma score does not perform well in size stratified coverage,
+# often over-covering or under-covering. ResidualNormalisedScore has
+# several bins with over-coverage, while CQR has more under-coverage.
+# Visualizing the data confirms these results: CQR performs better
+# with spread-out data, whereas ResidualNormalisedScore is better
+# with small intervals.
diff --git a/examples/regression/2-advanced-analysis/plot_conformal_predictive_distribution.py b/examples/regression/2-advanced-analysis/plot_conformal_predictive_distribution.py
@@ -84,9 +84,9 @@ def get_cumulative_distribution_function(self, X):
 
 ##############################################################################
 # Now, we propose to use it with two different conformity scores -
-# :class:`~mapie.conformity_score.AbsoluteConformityScore` and
-# :class:`~mapie.conformity_score.ResidualNormalisedScore` - in split-conformal
-# inference.
+# :class:`~mapie.conformity_scores.AbsoluteConformityScore` and
+# :class:`~mapie.conformity_scores.ResidualNormalisedScore` -
+# in split-conformal inference.
 
 mapie_regressor_1 = MapieConformalPredictiveDistribution(
     estimator=LinearRegression(),
diff --git a/examples/regression/2-advanced-analysis/plot_coverage_validity.py b/examples/regression/2-advanced-analysis/plot_coverage_validity.py
@@ -172,8 +172,7 @@ def cumulative_average(arr):
 
 
 ##############################################################################
-# Experiment 1: Coverage Validity for given confidence_level (confidence level) and
-# n_conformalize (data points dedicated to conformalization)
+# Experiment 1: Coverage Validity for given confidence_level and n_conformalize
 # --------------------------------------------------------------------------------
 #
 # To begin, we propose to use ``confidence_level=0.8`` and
diff --git a/examples/regression/2-advanced-analysis/plot_cqr_symmetry_difference.py b/examples/regression/2-advanced-analysis/plot_cqr_symmetry_difference.py
@@ -5,7 +5,7 @@
 
 
 An example plot of :class:`~mapie_v1.regression.ConformalizedQuantileRegressor`
-illustrating the impact of the symmetry parameter.
+illustrating the impact of the ``symmetric_correction`` parameter.
 """
 import numpy as np
 from matplotlib import pyplot as plt
@@ -124,9 +124,9 @@
 plt.show()
 
 ##############################################################################
-# The symmetric intervals (`symmetry=True`) use a combined set of residuals
-# for both bounds, while the asymmetric intervals use distinct residuals for
-# each bound, allowing for more flexible and accurate intervals that reflect
-# the heteroscedastic nature of the data. The resulting effective coverages
-# demonstrate the theoretical guarantee of the target coverage level
-# ``confidence_level``.
+# The symmetric intervals (``symmetric_correction=True``) use a combined set of residuals
+# for both bounds, while the asymmetric intervals (``symmetric_correction=False``)
+# use distinct residuals for each bound, allowing for more flexible and
+# accurate intervals that reflect the heteroscedastic nature of the data.
+# The resulting effective coverages demonstrate the theoretical guarantee of
+# the target coverage level ``confidence_level``.
diff --git a/examples/regression/2-advanced-analysis/plot_main-tutorial-regression.py b/examples/regression/2-advanced-analysis/plot_main-tutorial-regression.py
@@ -1,7 +1,7 @@
 r"""
-==============================================================
-Tutorial for tabular regression
-==============================================================
+=======================================================================
+Comparison between conformalized quantile regressor and cross methods
+=======================================================================
 
 
 In this tutorial, we compare the prediction intervals estimated by MAPIE on a
diff --git a/examples/regression/2-advanced-analysis/plot_nested-cv.py b/examples/regression/2-advanced-analysis/plot_nested-cv.py
@@ -4,38 +4,30 @@
 ==========================================================================================
 
 
-This example compares non-nested and nested cross-validation strategies for
-estimating prediction intervals with
+This example compares non-nested and nested cross-validation strategies
+when using
 :class:`~mapie_v1.regression.CrossConformalRegressor`.
 
-In the regular sequential method, a cross-validation parameter search is
-carried out over the entire training set.
-The model with the set of parameters that gives the best score is then used in
-MAPIE to estimate the prediction intervals associated with the predictions.
-A limitation of this method is that residuals used by MAPIE are computed on
-the validation dataset, which can be subject to overfitting as far as
-hyperparameter tuning is concerned.
+In the regular sequential method, a cross-validation parameter search is performed
+on the entire training set. The best model is then used in MAPIE to estimate
+prediction intervals. However, as MAPIE computes residuals on
+the validation dataset used during hyperparameter tuning, it can lead to
+overfitting. This fools MAPIE into being slightly too optimistic with confidence
+intervals.
 
-This fools MAPIE into being slightly too optimistic with confidence intervals.
 To solve this problem, an alternative option is to perform a nested
 cross-validation parameter search directly within the MAPIE estimator on each
 *out-of-fold* dataset.
-For each testing fold used by MAPIE to store residuals, an internal
-cross-validation occurs on the training fold, optimizing hyperparameters.
 This ensures that residuals seen by MAPIE are never seen by the algorithm
 beforehand. However, this method is much heavier computationally since
 it results in ``N * P`` calculations, where *N* is the number of
 *out-of-fold* models and *P* the number of parameter search cross-validations,
 versus ``N + P`` for the non-nested approach.
 
-Here, we compare the two strategies on a toy dataset. We use the Random
-Forest Regressor as a base regressor for the CV+ strategy. For the sake of
-light computation, we adopt a RandomizedSearchCV parameter search strategy
-with a low number of iterations and with a reproducible random state.
+Here, we compare the two strategies on a toy dataset.
 
 The two approaches give slightly different predictions with the nested CV
-approach estimating slightly larger prediction interval widths by a
-few percents at most (apart from a handful of exceptions).
+approach estimating larger prediction interval in average.
 
 For this example, the two approaches result in identical scores and identical
 effective coverages.
diff --git a/examples/regression/2-advanced-analysis/plot_timeseries_enbpi.py b/examples/regression/2-advanced-analysis/plot_timeseries_enbpi.py
@@ -1,6 +1,6 @@
 """
 ==================================================================
-Estimating prediction intervals of time series forecast with EnbPI
+Time series: example of the EnbPI technique
 ==================================================================
 
 This example uses
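
Note on the SSC description added in plot_conditional_coverage.py: the "group intervals by width, compute coverage for each group" step can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not MAPIE's implementation of `regression_ssc_score`. The helper name `size_stratified_coverage`, the equal-count binning, and the toy arrays are all hypothetical.

```python
import numpy as np


def size_stratified_coverage(y, lower, upper, num_bins=3):
    """Hypothetical helper: coverage per group of intervals, grouped by width."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    widths = upper - lower
    # Boolean indicator: is the true value inside its interval?
    covered = (y >= lower) & (y <= upper)
    # Sort intervals by width and split them into groups of near-equal
    # size (assumption: equal-count bins rather than equal-width bins).
    order = np.argsort(widths, kind="stable")
    groups = np.array_split(order, num_bins)
    return np.array([covered[g].mean() for g in groups])


# Toy data (made up for illustration only).
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
lower = np.array([-0.5, 0.5, 1.0, 3.5, 2.0, 3.0])
upper = np.array([0.5, 1.5, 3.0, 4.5, 6.0, 7.0])

per_bin = size_stratified_coverage(y, lower, upper, num_bins=3)
global_coverage = ((y >= lower) & (y <= upper)).mean()
# The lowest per-group coverage is the quantity compared with the
# global coverage when judging adaptivity.
worst_bin_coverage = per_bin.min()

print(per_bin, worst_bin_coverage, global_coverage)
```

If all intervals have the same width, as with CV+ and the absolute residual score, the grouping is arbitrary and the per-group coverages say nothing about adaptivity, which is why the tutorial text recommends checking the width distribution first.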
