diff --git a/intermediate_source/parametrizations.py b/intermediate_source/parametrizations.py
index 59cff1d241c..13fbf625d5c 100644
--- a/intermediate_source/parametrizations.py
+++ b/intermediate_source/parametrizations.py
@@ -4,6 +4,45 @@
 =========================
 **Author**: `Mario Lezcano <https://github.com/lezcano>`_
 
+What is a Parametrization?
+--------------------------
+
+In the context of neural networks, a **parametrization** is a transformation applied
+to the parameters (weights and biases) of a model before they are used in computations.
+Rather than using the parameters directly as they are learned, we transform them
+through a function that enforces certain mathematical properties or constraints.
+
+**Why do we need parametrizations?**
+
+When training neural networks, we often want parameters to satisfy specific constraints:
+
+- **Orthogonal matrices** for stable recurrent networks
+- **Positive definite matrices** for covariance estimation
+- **Normalized weights** for improved training dynamics
+- **Sparse or structured weights** for model compression
+
+Without parametrizations, enforcing these constraints is challenging. We could use
+penalty methods (adding constraint violations to the loss), but these often fail in
+deep learning due to the complexity of the optimization landscape.
+
+**What problems do parametrizations solve?**
+
+1. **Hard constraints**: Parametrizations guarantee that constraints are always satisfied,
+   not just approximately. If you parametrize a weight to be orthogonal, it will be
+   orthogonal at every step of training (see the sketches below).
+
+2. **Improved optimization**: By transforming the parameter space, we can make
+   optimization easier. For example, learning the norm and direction of a weight
+   separately (weight normalization) can accelerate training, as sketched below.
+
+3. **Regularization**: Constraining parameters can act as a powerful regularizer,
+   preventing overfitting and improving generalization.
+
+4. **Numerical stability**: Certain parametrizations (such as orthogonal matrices in RNNs)
+   prevent exploding and vanishing gradients, making training more stable.
+
+**Real-world motivation**
+
 Regularizing deep-learning models is a surprisingly challenging task.
 Classical techniques such as penalty methods often fall short when applied
 on deep models due to the complexity of the function being optimized.
@@ -37,6 +76,19 @@ In this tutorial, you will learn how to implement and use this pattern to put
 constraints on your model. Doing so is as easy as writing your own
 ``nn.Module``.
 
+**Further Reading**
+
+For those interested in diving deeper into parametrizations and their applications:
+
+- `Weight Normalization (Salimans & Kingma, 2016) <https://arxiv.org/abs/1602.07868>`_:
+  Introduces weight normalization for accelerating training.
+- `Spectral Normalization for GANs (Miyato et al., 2018) <https://arxiv.org/abs/1802.05957>`_:
+  Proposes spectral normalization to stabilize GAN training.
+- `Orthogonal RNNs (Helfrich et al., 2018) <https://arxiv.org/abs/1707.09520>`_:
+  Explores orthogonal parametrizations for recurrent networks.
+- `Riemannian Optimization (Absil et al., 2008) <https://press.princeton.edu/absil>`_:
+  Mathematical foundations for optimization on manifolds with constraints.
+
 Requirements: ``torch>=1.9.0``
 
 Implementing parametrizations by hand
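+
+To make the definition of a parametrization concrete, here is a minimal sketch
+using the ``torch.nn.utils.parametrize`` API that the rest of this tutorial
+develops. The ``Symmetric`` module and the ``3 x 3`` layer are illustrative
+choices, not part of the API:
+
+.. code-block:: python
+
+    import torch
+    import torch.nn as nn
+    import torch.nn.utils.parametrize as parametrize
+
+    class Symmetric(nn.Module):
+        def forward(self, X):
+            # Map an unconstrained square matrix to a symmetric one
+            return X.triu() + X.triu(1).transpose(-1, -2)
+
+    layer = nn.Linear(3, 3)
+    parametrize.register_parametrization(layer, "weight", Symmetric())
+    # layer.weight is now recomputed from an unconstrained tensor on every
+    # access, so it stays symmetric at every step of training
+    print(torch.allclose(layer.weight, layer.weight.T))  # True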
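+
+The hard-constraint guarantee from point 1 can be observed directly. As a
+sketch, newer PyTorch releases (1.10 and later, slightly above the
+``torch>=1.9.0`` required here) ship a built-in orthogonal parametrization;
+the ``5 x 5`` layer size is arbitrary:
+
+.. code-block:: python
+
+    import torch
+    import torch.nn as nn
+    from torch.nn.utils.parametrizations import orthogonal
+
+    # Constrain the weight of a linear layer to be an orthogonal matrix
+    layer = orthogonal(nn.Linear(5, 5))
+
+    # The constraint holds exactly (up to floating point) at every step,
+    # not just approximately as with a penalty term
+    Q = layer.weight
+    print(torch.dist(Q @ Q.T, torch.eye(5)))  # ~0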
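+
+Similarly, the weight normalization mentioned in point 2 can be written as a
+parametrization. The hand-rolled ``WeightNorm`` module below is only a sketch;
+PyTorch also provides this transform out of the box (``torch.nn.utils.weight_norm``
+and, in newer releases, a parametrization-based equivalent):
+
+.. code-block:: python
+
+    import torch
+    import torch.nn as nn
+    import torch.nn.utils.parametrize as parametrize
+
+    class WeightNorm(nn.Module):
+        """Reparametrize a weight as w = g * v / ||v||, row by row."""
+        def __init__(self, n_out):
+            super().__init__()
+            # The norm of each output row is learned on its own...
+            self.g = nn.Parameter(torch.ones(n_out, 1))
+
+        def forward(self, V):
+            # ...while the unconstrained tensor V contributes only its direction
+            return self.g * V / V.norm(dim=1, keepdim=True)
+
+    layer = nn.Linear(4, 4)
+    parametrize.register_parametrization(layer, "weight", WeightNorm(layer.out_features))
+    # The optimizer now updates the norm g and the unconstrained direction V
+    # separately, instead of the raw weight matrix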