@@ -186,34 +186,28 @@ def sin_with_controlled_noise(
186186 print(estimated_cond_cov)
187187
188188##############################################################################
189- # We can see here that the global coverage is approximately the same for
190- # all methods. What we want to understand is : "Are these methods good
191- # adaptive conformal methods ?". For this we have the two metrics
189+ # The global coverage is similar for all methods. To determine if these
190+ # methods are good adaptive conformal methods, we use two metrics:
192191# :func:`~mapie.metrics.regression_ssc_score` and :func:`~mapie.metrics.hsic`.
193- # - SSC (Size Stratified Coverage) is the maximum violation of the coverage :
194- # the intervals are grouped by width and the coverage is computed for each
195- # group. The lower coverage is the maximum coverage violation. An adaptive
196- # method is one where this maximum violation is as close as possible to the
197- # global coverage. If we interpret the result for the four methods here :
198- # CV+ seems to be the better one.
199- # - And with the hsic correlation coefficient, we have the
200- # same interpretation : :func:`~mapie.metrics.hsic` computes the correlation
201- # between the coverage indicator and the interval size, a value of 0
202- # translates an independence between the two.
203192#
204- # We would like to highlight here the misinterpretation that can be made
205- # with these metrics. In fact, here CV+ with the absolute residual score
206- # calculates constant intervals which, by definition, are not adaptive.
207- # Therefore, it is very important to check that the intervals widths are well
208- # spread before drawing conclusions (with a plot of the distribution of
209- # interval widths or a visualisation of the data for example).
193+ # - SSC (Size Stratified Coverage): intervals are grouped by width and
194+ # coverage is computed for each group; the lowest group coverage is the
195+ # maximum coverage violation. An adaptive method keeps this value close to
196+ # the global coverage. Among the four methods, CV+ appears to perform best.
210197#
211- # In this example, with the hsic correlation coefficient, none of the methods
212- # stand out from the others. However, the SSC score for the method using the
213- # gamma score is significantly worse than for CQR and ResidualNormalisedScore,
214- # even though their global coverage is similar. ResidualNormalisedScore and CQR
215- # are very close here, with ResidualNormalisedScore being slightly more
216- # conservative.
198+ # - HSIC (Hilbert-Schmidt Independence Criterion): this computes the
199+ # correlation between the coverage indicator and the interval size. A value
200+ # of 0 indicates independence between the two.
201+ #
202+ # These metrics can be misleading: CV+ with the absolute residual score
203+ # produces constant-width intervals, which are by definition not adaptive.
204+ # Check that the interval widths are well spread before drawing conclusions.
205+ #
206+ # In this example, none of the methods stand out with the HSIC correlation coefficient.
207+ # However, the SSC score for the gamma score method is significantly worse than
208+ # for CQR and ResidualNormalisedScore, despite similar global coverage.
209+ # ResidualNormalisedScore and CQR are very close, with ResidualNormalisedScore
210+ # being slightly more conservative.
217211
218212
219213# Visualition of the data and predictions
@@ -336,21 +330,19 @@ def plot_coverage_by_width(y, intervals, num_bins, alpha, title="", ax=None):
336330 plt.show()
337331
338332##############################################################################
339- # With toy datasets like this, it is easy to compare visually the methods
340- # with a plot of the data and predictions.
341- # As mentionned above, a histogram of the ditribution of the interval widths is
342- # important to accompany the metrics. It is clear from this histogram
343- # that CV+ is not adaptive, the metrics presented here should not be used
344- # to evaluate its adaptivity. A wider spread of intervals indicates a more
345- # adaptive method.
346- # Finally, with the plot of coverage by bins of intervals grouped by widths
347- # (which is the output of :func:`~mapie.metrics.regression_ssc`), we want
348- # the bins to be as constant as possible around the global coverage (here 0.9).
349-
350- # As the previous metrics show, gamma score does not perform well in terms of
351- # size stratified coverage. It either over-covers or under-covers too much.
352- # For ResidualNormalisedScore and CQR, while the first one has several bins
353- # with over-coverage, the second one has more under-coverage. These results
354- # are confirmed by the visualisation of the data: CQR is better when the data
355- # are more spread out, whereas ResidualNormalisedScore is better with small
356- # intervals.
333+ # With toy datasets, it's easy to visually compare methods using data and
334+ # prediction plots. A histogram of interval widths is crucial to accompany
335+ # the metrics. This histogram shows that CV+ is not adaptive, so the metrics
336+ # should not be used to evaluate its adaptivity. A wider spread of intervals
337+ # indicates a more adaptive method.
338+ #
339+ # The plot of coverage by bins of intervals grouped by widths
340+ # (output of :func:`~mapie.metrics.regression_ssc`) should
341+ # show bins as constant as possible around the global coverage (0.9).
342+
343+ # The gamma score does not perform well in size stratified coverage:
344+ # its bins over-cover or under-cover too much. ResidualNormalisedScore
345+ # has several bins with over-coverage, while CQR has more under-coverage.
346+ # Visualizing the data confirms these results: CQR performs better
347+ # with spread-out data, whereas ResidualNormalisedScore is better
348+ # with small intervals.
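
The comments in this diff describe size stratified coverage only in words, so a small standalone sketch may help when reviewing the wording. It is written with plain NumPy and is not MAPIE's implementation: the function names `coverage_by_width_bins` and `width_spread`, and the choice of equal-sized groups of sorted widths, are assumptions made for illustration only.

```python
import numpy as np


def coverage_by_width_bins(y, lower, upper, num_bins=3):
    """Coverage inside groups of intervals sorted by width.

    Illustrative sketch only (hypothetical helper, not MAPIE's API).
    """
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    widths = upper - lower
    # Coverage indicator: True where the interval contains the target
    covered = (lower <= y) & (y <= upper)
    # Sort by width, then split the sorted indices into near-equal groups
    order = np.argsort(widths, kind="stable")
    groups = np.array_split(order, num_bins)
    return np.array([covered[g].mean() for g in groups])


def width_spread(lower, upper):
    """Range of interval widths; 0 means constant-width intervals."""
    widths = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    return float(widths.max() - widths.min())


# Toy data: two narrow intervals (one misses its target), two wide ones
y = [0.0, 5.0, 0.0, 0.0]
lower = [-1.0, -1.0, -2.0, -2.0]
upper = [1.0, 1.0, 2.0, 2.0]

per_bin = coverage_by_width_bins(y, lower, upper, num_bins=2)
worst_bin = per_bin.min()  # the "maximum violation" described above
spread = width_spread(lower, upper)
```

When `width_spread` returns 0, every interval has the same width, so the grouping in `coverage_by_width_bins` is arbitrary and the per-bin coverage says nothing about adaptivity. This is the CV+ situation the comments warn about.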