Centering and scaling #688

luisheb · 2025-05-14T06:17:00Z

Summary

This pull request introduces a complete framework for centering and scaling functional data in the scikit-fda library. These tools are particularly relevant in the context of vector-valued functional data and mixed datasets, where components may differ in units, scale, or variability.

Motivation

In classical functional data analysis, centering and scaling are often not required, since all functions are assumed to be sampled over the same domain and measured in the same units. However, when combining multiple functional components or integrating scalar and functional features, normalization becomes critical to ensure fair comparisons and effective learning.

This PR addresses this gap by providing:

Scikit-learn compatible transformers for centering and scaling.
Several statistical tools to compute meaningful scaling factors from the data.

Main Additions

Transformers

CenterScaler: Flexible transformer that applies user-defined or data-driven centering and scaling operations to FDataGrid or FDataBasis.
StandardScaler: Computes the mean and standard deviation of the dataset and uses them to standardize functional data, mimicking scikit-learn’s StandardScaler.

Summary Statistics (New Functions)

These utilities compute scalar summaries useful for centering and scaling:

individual_observation_mean: Integrated average of each function (vertical shift).
grand_mean: Global scalar mean of all functions.
root_integrated_sample_variance: Square root of the integrated sample variance, a measure of variability around the mean function (the mean function is subtracted before integration).
root_mean_square_l2: RMS of the L2 norm of each function (total magnitude without centering).
individual_root_mean_square_l2: Computes RMS individually per function.

These functions are available under the skfda.exploratory.stats module and complement the existing set of functional location and dispersion statistics.
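As a rough illustration of what these summaries measure, the sketch below computes four of them for functions sampled on a common grid, using the trapezoidal rule. The helper names, the division by the domain length, and the `correction` argument are assumptions made for this example only. They are not the signatures or exact definitions of the skfda functions, which are given in the PR's documentation.

```python
import math


def trapezoid(values, grid):
    """Integrate sampled values over the grid with the trapezoidal rule."""
    return sum(
        (grid[k + 1] - grid[k]) * (values[k] + values[k + 1]) / 2
        for k in range(len(grid) - 1)
    )


def observation_means(samples, grid):
    """Integrated average of each function (assumed: integral / domain length)."""
    length = grid[-1] - grid[0]
    return [trapezoid(x, grid) / length for x in samples]


def grand_mean_of(samples, grid):
    """Single scalar: average of the per-function integrated means."""
    means = observation_means(samples, grid)
    return sum(means) / len(means)


def root_integrated_variance(samples, grid, correction=1):
    """Square root of the integral of the pointwise sample variance."""
    n = len(samples)
    mean_fn = [sum(x[k] for x in samples) / n for k in range(len(grid))]
    var_fn = [
        sum((x[k] - mean_fn[k]) ** 2 for x in samples) / (n - correction)
        for k in range(len(grid))
    ]
    return math.sqrt(trapezoid(var_fn, grid))


def rms_l2(samples, grid):
    """Root mean square of the L2 norms, with no centering applied."""
    sq_norms = [trapezoid([v * v for v in x], grid) for x in samples]
    return math.sqrt(sum(sq_norms) / len(sq_norms))


# Two constant functions on [0, 1], sampled at three points.
grid = [0.0, 0.5, 1.0]
samples = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]
```

For these two constant functions the centered statistic depends only on their spread around the mean function, while the uncentered RMS also reflects their overall level.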

Documentation

Added a new documentation section: Scaling, under preprocessing/.
Describes when and why centering and scaling are relevant in functional and mixed settings.
Includes usage guidelines and mathematical definitions for each transformation and statistic.

Checklist before requesting a review

I have performed a self-review of my code
The code conforms to the style used in this package
The code is fully documented and typed (type-checked with Mypy)
I have added thorough tests for the new/changed functionality

vnmabus · 2025-07-03T07:41:41Z

docs/modules/preprocessing/scaling.rst

+References
+----------
+
+* J. Prothero, J. Hannig, and J. Marron. *New perspectives on centering*. The New England Journal of Statistics in Data Science, vol. 1, no. 2, 216–236, 2023.


We would rather use sphinxcontrib-bibtex so that all references are formatted in a consistent way.
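For context, a minimal sketch of how sphinxcontrib-bibtex is typically enabled in a Sphinx `conf.py`. The bibliography file name and the citation key are hypothetical, and the project's actual configuration may differ.

```python
# Sphinx conf.py fragment (illustrative only).
extensions = [
    "sphinxcontrib.bibtex",
]

# BibTeX files that hold the references; "refs.bib" is a hypothetical name.
bibtex_bibfiles = ["refs.bib"]

# In the .rst page, the hand-written entry would then be replaced by a
# citation role and a bibliography directive, for example:
#
#     ... as discussed in :footcite:`prothero_2023_new` ...
#
#     .. footbibliography::
#
# The key "prothero_2023_new" is a placeholder for whatever key the
# entry receives in the .bib file.
```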

vnmabus · 2025-07-05T10:44:18Z

skfda/preprocessing/scaler.py

+        self.with_std = with_std
+        self.correction_ = correction
+
+        self.mean_: FData | None = None


The parameters that end in underscore should be set only in fit.
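The scikit-learn convention behind this comment can be sketched as follows: the constructor stores hyperparameters unchanged and without a trailing underscore, and attributes ending in an underscore are created only in `fit`. The class below is a hypothetical pointwise standardizer on plain lists, written for illustration only. It is not the PR's `StandardScaler`, and it assumes no column has zero spread.

```python
import math


class PointwiseStandardizer:
    """Hypothetical example of the fit/transform attribute convention."""

    def __init__(self, with_std=True, correction=0):
        # Hyperparameters: stored as given, no trailing underscore.
        self.with_std = with_std
        self.correction = correction

    def fit(self, X, y=None):
        # Learned state: trailing underscore, created only here.
        n = len(X)
        columns = list(zip(*X))
        self.mean_ = [sum(col) / n for col in columns]
        self.scale_ = [
            math.sqrt(sum((v - m) ** 2 for v in col) / (n - self.correction))
            for col, m in zip(columns, self.mean_)
        ]
        return self

    def transform(self, X):
        # The absence of mean_ signals that fit has not been called.
        if not hasattr(self, "mean_"):
            raise RuntimeError("fit must be called before transform")
        centered = [[v - m for v, m in zip(row, self.mean_)] for row in X]
        if not self.with_std:
            return centered
        return [[v / s for v, s in zip(row, self.scale_)] for row in centered]


X = [[0.0, 2.0], [2.0, 6.0]]
standardized = PointwiseStandardizer().fit(X).transform(X)
```

With this layout an unfitted instance has no `mean_` attribute at all, so there is no need to initialize it to `None` in the constructor.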

vnmabus · 2025-07-06T11:02:37Z

skfda/preprocessing/scaler.py

+                msg = "Cannot center with more than one sample"
+                raise ValueError(msg)
+            result = result - center
+        else:


I do not understand the else cases. When are they applied?

vnmabus · 2025-07-06T11:03:35Z

skfda/preprocessing/scaler.py

+                msg = "Cannot center with more than one sample"
+                raise ValueError(msg)
+            result = result - center
+        else:


I also do not understand the else cases here.

vnmabus · 2025-07-06T11:04:42Z

skfda/tests/test_centering_scaling.py

+
+
+@pytest.fixture
+def sample_fdgrid() -> Generator[FDataGrid, None, None]:


Since when is this a Generator?
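One reading of this comment is that a fixture which simply returns its value should be annotated with that value's type, and that a generator annotation only fits a fixture that uses `yield` (for example, to run teardown code). The sketch below shows the distinction with plain functions. The `@pytest.fixture` decorators are omitted so that it runs with the standard library alone, and the function names are hypothetical.

```python
import inspect
from collections.abc import Iterator


def sample_values() -> list[float]:
    """Returns its value: annotate with the value type itself."""
    return [0.0, 0.5, 1.0]


def sample_values_with_teardown() -> Iterator[list[float]]:
    """Yields its value: a generator-style annotation is appropriate."""
    values = [0.0, 0.5, 1.0]
    yield values
    values.clear()  # teardown step, runs after the consumer is done


# Only the second function is actually a generator function.
returns_generator = inspect.isgeneratorfunction(sample_values)
yields_generator = inspect.isgeneratorfunction(sample_values_with_teardown)
```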

vnmabus · 2025-07-06T11:06:21Z

skfda/tests/test_centering_scaling.py

+    with pytest.raises(TypeError):
+        root_integrated_sample_variance(
+            "not an FData object", # type: ignore[arg-type]
+            )


I am missing tests for FDataBasis and for the centering and scaling methods.

Luis Hebrero added 7 commits April 23, 2025 03:59

scaler and centering functions

c26ce49

name change

fcb999e

Doc and testo for centering and scaling quantities

7bbb5ee

root mean square l2

006c55d

scaler doc and final design, more coherent

7713836

individual_root_mean_square_l2

0f4ad35

ruff corrections

7389303

luisheb changed the title ~~Feature/centering and scaling~~ Centering and scaling May 14, 2025

Luis Hebrero and others added 4 commits June 13, 2025 10:16

scaler correction

93b3506

Merge branch 'GAA-UAM:develop' into feature/centering_and_scaling

2c2388d

Scaling doc

4c51df5

reference set up

c643837

luisheb marked this pull request as ready for review June 18, 2025 15:40

vnmabus requested changes Jul 6, 2025

Apply suggestions from code review

20e2141

Co-authored-by: Carlos Ramos Carreño <[email protected]>
