`docs/src/tutorials/flows.md`: 2 additions & 8 deletions
@@ -188,14 +188,8 @@ nothing
For the variational inference algorithms, we will similarly minimize the KL divergence with stochastic gradient descent as originally proposed by Rezende and Mohamed[^RM2015].
For this, however, we need to be mindful of the requirements of the variational algorithm.
-The default objective of `KLMinRepGradDescent` essentially assumes a `MvLocationScale` family is being used:
-
-  - `entropy=RepGradELBO()`: The default `entropy` gradient estimator is `ClosedFormEntropy()`, which assumes that the entropy of the variational family `entropy(q)` is available. For flows, the entropy is (usually) not available.
-  - `operator=ClipScale()`: The `operator` applied after a gradient descent step is `ClipScale` by default. This operator only works on `MvLocationScale` and `MvLocationScaleLowRank`.
-Therefore, we have to customize the two keyword arguments above to make it work with flows.
-
-In particular, for the `operator`, we will use `IdentityOperator()`, which is a no-op.
-For `entropy`, we can use any gradient estimator that only relies on the log-density of the variational family `logpdf(q)`, `StickingTheLandingEntropy()` or `MonteCarloEntropy()`.
+The default `entropy` gradient estimator of `KLMinRepGradDescent` is `ClosedFormEntropy()`, which assumes that the entropy of the variational family `entropy(q)` is available. For flows, the entropy is (usually) not available.
+Instead, we can use any gradient estimator that only relies on the log-density of the variational family `logpdf(q)`, such as `StickingTheLandingEntropy()` or `MonteCarloEntropy()`.
Here, we will use `StickingTheLandingEntropy()`[^RWD2017].
When the variational family is "expressive," this gradient estimator has a variance reduction effect, resulting in faster convergence[^ASD2020].
Furthermore, Agrawal *et al.*[^AD2025] claim that using a larger number of Monte Carlo samples `n_samples` is beneficial.
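
As a rough illustration of the configuration this hunk describes, the sketch below passes a log-density-based entropy estimator to `KLMinRepGradDescent`. The keyword names `entropy` and `n_samples`, along with `StickingTheLandingEntropy()`, `IdentityOperator()`, and `operator` (the latter two from the removed paragraph), are taken from the tutorial text itself; the positional AD-backend argument and the exact constructor signature are assumptions, so consult the AdvancedVI documentation for the release you are using.

```julia
using AdvancedVI
using ADTypes: AutoForwardDiff

# Sketch only: keyword names come from the tutorial text above; the
# positional AD-backend argument is an assumption about the constructor.
alg = KLMinRepGradDescent(
    AutoForwardDiff();
    entropy   = StickingTheLandingEntropy(), # needs only logpdf(q), so it works for flows
    operator  = IdentityOperator(),          # no-op; ClipScale() assumes a location-scale family
    n_samples = 8,                           # more Monte Carlo samples can help for expressive families
)
```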