HISTORY.md (3 additions, 0 deletions)
@@ -5,6 +5,9 @@
The default parameters for the parameter-free optimizers `DoG` and `DoWG` have been changed.
The new defaults should be more invariant to the problem dimension, so convergence on high-dimensional problems should be faster than before.
The default value of the `operator` keyword argument of `KLMinRepGradDescent` has been changed from `ClipScale` to `IdentityOperator`. This means that, for variational families `<:MvLocationScale`, optimization may fail since nothing enforces the scale matrix to remain positive definite.
Therefore, if a variational family `<:MvLocationScale` is used in combination with `IdentityOperator`, a warning message instructing the user to use `ClipScale` is displayed.
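For example, the previous behavior can be recovered by passing `ClipScale` explicitly. The snippet below is only a minimal sketch, assuming an `AutoForwardDiff` AD backend; adjust the backend and any other keyword arguments to your setup:

```julia
using ADTypes: AutoForwardDiff
using AdvancedVI: KLMinRepGradDescent, ClipScale

# Explicitly request the projection operator so that the scale matrix of a
# `<:MvLocationScale` family stays positive definite during optimization.
algorithm = KLMinRepGradDescent(AutoForwardDiff(); operator=ClipScale())
```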
## Interface Changes
An additional layer of indirection, `AbstractAlgorithms`, has been added.
This algorithm minimizes the exclusive/reverse KL divergence via stochastic gradient descent in the (Euclidean) space of the parameters of the variational approximation with the reparametrization gradient[^TL2014][^RMW2014][^KW2014].
This is also commonly referred to as automatic differentiation VI, black-box VI, stochastic gradient VI, and so on.
Also, projection or proximal operators can be used through the keyword argument `operator`.
For this example, we will use a Gaussian variational family, which is part of the broader location-scale family.
These families require the scale matrix to have strictly positive eigenvalues at all times.
Here, the projection operator `ClipScale` ensures this.
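For instance, a full-rank Gaussian approximation can be set up roughly as follows. This is only a sketch: it assumes AdvancedVI exposes a `FullRankGaussian` convenience constructor (a `<:MvLocationScale` family) taking a location vector and a lower-triangular scale factor.

```julia
using LinearAlgebra: I, LowerTriangular
using AdvancedVI: FullRankGaussian

d = 5                                      # dimensionality of the target (illustrative)
μ = zeros(d)                               # location
L = LowerTriangular(Matrix(1.0I, d, d))    # scale: lower triangular with positive diagonal
q_init = FullRankGaussian(μ, L)            # assumed constructor for the Gaussian family
```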
`KLMinRepGradDescent`, in particular, assumes that the target `LogDensityProblem` has gradients.
For this, it is straightforward to use `LogDensityProblemsAD`:
```julia
using DifferentiationInterface: DifferentiationInterface
using LogDensityProblemsAD: LogDensityProblemsAD
```
This algorithm minimizes the exclusive/reverse KL divergence via stochastic gradient descent in the (Euclidean) space of the parameters of the variational approximation with the reparametrization gradient[^TL2014][^RMW2014][^KW2014].
This is also commonly referred to as automatic differentiation VI, black-box VI, stochastic gradient VI, and so on.
For certain algorithms such as `KLMinRepGradDescent`, projection or proximal operators can be used through the keyword argument `operator`.
For this example, we will use a Gaussian variational family, which is part of the broader [location-scale family](@ref locscale).
Location-scale family distributions require the scale matrix to have strictly positive eigenvalues at all times.
Here, the projection operator `ClipScale` ensures this.
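To illustrate what the projection has to guarantee, the sketch below clamps the diagonal of a lower-triangular scale factor to a small positive floor, which keeps all of its eigenvalues strictly positive. This only conveys the idea behind a `ClipScale`-style projection and is not the package's implementation:

```julia
using LinearAlgebra: Diagonal, LowerTriangular, diag

# Illustrative projection: the eigenvalues of a lower-triangular scale factor
# are its diagonal entries, so clamping them to at least ϵ > 0 keeps the
# implied covariance L*L' positive definite.
function clip_scale_diagonal(L::LowerTriangular, ϵ::Real=1e-5)
    d = diag(L)
    return L + Diagonal(max.(d, ϵ) .- d)
end
```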
`KLMinRepGradDescent`, in particular, assumes that the target `LogDensityProblem` is differentiable.
If the `LogDensityProblem` has a differentiation [capability](https://www.tamaspapp.eu/LogDensityProblems.jl/dev/#LogDensityProblems.capabilities) of at least first-order, we can take advantage of this.
For this example, we will use `LogDensityProblemsAD` to equip our problem with a first-order capability:
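A minimal sketch of this step is shown below, where `prob` stands for a hypothetical user-defined `LogDensityProblem` and `AutoForwardDiff` is just one possible backend:

```julia
using ADTypes: AutoForwardDiff
using DifferentiationInterface: DifferentiationInterface
using ForwardDiff: ForwardDiff               # the chosen backend package must be loaded
using LogDensityProblemsAD: LogDensityProblemsAD

# `prob` is a user-defined LogDensityProblem with zeroth-order capability;
# wrapping it adds a first-order (gradient) capability via the AD backend.
prob_ad = LogDensityProblemsAD.ADgradient(AutoForwardDiff(), prob)
```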
[^TL2014]: Titsias, M., & Lázaro-Gredilla, M. (2014, June). Doubly stochastic variational Bayes for non-conjugate inference. In *International Conference on Machine Learning*. PMLR.
Now, `KLMinRepGradDescent` requires the variational approximation and the target log-density to have the same support.
Since `y` follows a log-normal prior, its support is restricted to the positive half-space ``\mathbb{R}_+``.
Thus, we will use [Bijectors](https://github.com/TuringLang/Bijectors.jl) to match the support of our target posterior and the variational approximation.
The bijector can now be applied to `q` to match the support of the target problem.
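As a rough sketch of this workflow, assuming a one-dimensional `LogNormal` prior for `y` and the variational approximation `q` from before: `bijector` maps the constrained support onto the whole real line, and its inverse transports `q` back onto the support of the target.

```julia
using Bijectors: Bijectors
using Distributions: LogNormal

prior_y = LogNormal()                            # hypothetical prior with positive support
b = Bijectors.bijector(prior_y)                  # positive half-line -> ℝ
binv = Bijectors.inverse(b)                      # ℝ -> positive half-line
q_transformed = Bijectors.transformed(q, binv)   # `q` is the unconstrained approximation
```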
"IdentityOperator is used with a variational family <:MvLocationScale. Optimization can easily fail under this combination due to singular scale matrices. Consider using the operator `ClipScale` in the algorithm instead.",
src/algorithms/paramspacesgd/constructors.jl (9 additions, 3 deletions)
@@ -4,6 +4,9 @@
KL divergence minimization by running stochastic gradient descent with the reparameterization gradient in the Euclidean space of variational parameters.
!!! note
    For a `<:MvLocationScale` variational family, `IdentityOperator` should be avoided for `operator` since optimization can result in a singular scale matrix. Instead, consider using [`ClipScale`](@ref).
KL divergence minimization by running stochastic gradient descent with the score gradient in the Euclidean space of variational parameters.
!!! note
    If a `<:MvLocationScale` variational family is used, `IdentityOperator` should be avoided as the `operator` since optimization can result in a singular scale matrix. Instead, consider using [`ClipScale`](@ref).
# Arguments
- `adtype`: Automatic differentiation backend.
@@ -111,7 +117,7 @@ function KLMinScoreGradDescent(
    if q_init isa AdvancedVI.MvLocationScale && operator isa AdvancedVI.IdentityOperator
        @warn(
            "IdentityOperator is used with a variational family <:MvLocationScale. Optimization can easily fail under this combination due to singular scale matrices. Consider using the operator `ClipScale` in the algorithm instead.",
        )
    end