Commit 03338d6

add docs for ProjectScale
1 parent 238128e

File tree: 4 files changed, +27 −4 lines changed

docs/src/elbo/repgradelbo.md

Lines changed: 4 additions & 3 deletions
@@ -219,7 +219,7 @@ _, _, stats_cfe, _ = AdvancedVI.optimize(
     max_iter;
     show_progress = false,
     adtype = AutoForwardDiff(),
-    optimizer = Optimisers.Adam(3e-3),
+    optimizer = ProjectScale(Optimisers.Adam(3e-3)),
     callback = callback,
 );
@@ -230,7 +230,7 @@ _, _, stats_stl, _ = AdvancedVI.optimize(
     max_iter;
     show_progress = false,
     adtype = AutoForwardDiff(),
-    optimizer = Optimisers.Adam(3e-3),
+    optimizer = ProjectScale(Optimisers.Adam(3e-3)),
     callback = callback,
 );
@@ -265,6 +265,7 @@ Furthermore, in a lot of cases, a low-accuracy solution may be sufficient.
 
 [^RWD2017]: Roeder, G., Wu, Y., & Duvenaud, D. K. (2017). Sticking the landing: Simple, lower-variance gradient estimators for variational inference. Advances in Neural Information Processing Systems, 30.
 [^KMG2024]: Kim, K., Ma, Y., & Gardner, J. (2024). Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?. In International Conference on Artificial Intelligence and Statistics (pp. 235-243). PMLR.
+
 ## Advanced Usage
 
 There are two major ways to customize the behavior of `RepGradELBO`
@@ -317,7 +318,7 @@ _, _, stats_qmc, _ = AdvancedVI.optimize(
     max_iter;
     show_progress = false,
     adtype = AutoForwardDiff(),
-    optimizer = Optimisers.Adam(3e-3),
+    optimizer = ProjectScale(Optimisers.Adam(3e-3)),
     callback = callback,
 );

docs/src/examples.md

Lines changed: 4 additions & 1 deletion
@@ -118,11 +118,14 @@ q_avg_trans, q_trans, stats, _ = AdvancedVI.optimize(
     n_max_iter;
     show_progress=false,
     adtype=AutoForwardDiff(),
-    optimizer=Optimisers.Adam(1e-3),
+    optimizer=ProjectScale(Optimisers.Adam(1e-3)),
 );
 nothing
 ```
 
+`ProjectScale` is a wrapper around an optimization rule that keeps the variational approximation within a stable region of the variational family.
+For more information, see [this section](@ref projectscale).
+
 `q_avg_trans` is the final output of the optimization procedure.
 If a parameter averaging strategy is used through the keyword argument `averager`, `q_avg_trans` is the output of the averaging strategy, while `q_trans` is the last iterate.

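For illustration, a minimal sketch of the change this hunk makes: the base rule is wrapped with `ProjectScale` before being handed to `AdvancedVI.optimize`. The setup around the call (`n_max_iter`, the target problem, the initial approximation) is assumed from the surrounding example.

```julia
# Sketch of the wrapped rule introduced by this commit: ProjectScale adds a
# projection step on top of a base Optimisers.jl rule.
using Optimisers
using AdvancedVI

rule      = Optimisers.Adam(1e-3)  # base first-order update rule
optimizer = ProjectScale(rule)     # same updates, plus the scale projection

# The wrapped rule is then passed via the `optimizer` keyword, as in the diff:
#   AdvancedVI.optimize(...; optimizer=optimizer, ...)
```
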
docs/src/families.md

Lines changed: 10 additions & 0 deletions
@@ -56,6 +56,16 @@ FullRankGaussian
 MeanFieldGaussian
 ```
 
+### [Scale Projection Operator](@id projectscale)
+For the location-scale family, optimization is often stable only when the smallest eigenvalue of the scale matrix is strictly positive[^D2020].
+To ensure this, we provide the following wrapper around an optimization rule:
+
+```@docs
+ProjectScale
+```
+
+[^D2020]: Domke, J. (2020). Provable smoothness guarantees for black-box variational inference. In *International Conference on Machine Learning*.
+
 ### Gaussian Variational Families
 
 ```julia

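To make the projection concrete, the following is a conceptual sketch for the mean-field (diagonal scale) case. `project_scale_diagonal` is a hypothetical helper written for illustration; it is not the package's actual implementation.

```julia
# Conceptual sketch: for a diagonal scale matrix, the eigenvalues are the
# diagonal entries, so the projection amounts to clamping each entry from
# below by scale_eps. This keeps the smallest eigenvalue strictly positive.
using LinearAlgebra

function project_scale_diagonal(L::Diagonal, scale_eps::Real)
    return Diagonal(max.(diag(L), scale_eps))
end

L = Diagonal([0.5, 1e-9, 2.0])
project_scale_diagonal(L, 1e-5)  # Diagonal([0.5, 1.0e-5, 2.0])
```
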
src/families/location_scale.jl

Lines changed: 9 additions & 0 deletions
@@ -140,6 +140,15 @@ function MeanFieldGaussian(μ::AbstractVector{T}, L::Diagonal{T}) where {T<:Real
     return MvLocationScale(μ, L, Normal{T}(zero(T), one(T)))
 end
 
+"""
+    ProjectScale(rule, scale_eps)
+
+Compose an optimization `rule` with a projection, where the projection ensures that a `LocationScale` or `LocationScaleLowRank` has a scale with eigenvalues larger than `scale_eps`.
+
+# Arguments
+- `rule::Optimisers.AbstractRule`: Optimization rule to compose with the projection.
+- `scale_eps::Real`: Lower bound on the eigenvalues of the scale matrix enforced by the projection.
+"""
 struct ProjectScale{Rule<:Optimisers.AbstractRule,F<:Real} <: Optimisers.AbstractRule
     rule::Rule
     scale_eps::F

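A usage sketch based on the docstring above. The explicit `scale_eps` of `1e-5` is an illustrative value; the one-argument form used elsewhere in these docs suggests a default is provided when it is omitted.

```julia
# Compose Adam with the projection, bounding the eigenvalues of the scale
# matrix from below by 1e-5 (illustrative, not a recommended setting).
using Optimisers
using AdvancedVI

opt = ProjectScale(Optimisers.Adam(1e-3), 1e-5)
opt.rule       # the wrapped Optimisers.Adam rule
opt.scale_eps  # 1.0e-5
```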