Add Riemannian manifold HMC #439
Conversation
AdvancedHMC.jl documentation for PR #439 is available at:
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #439      +/-   ##
==========================================
+ Coverage   75.44%   78.31%   +2.86%
==========================================
  Files          21       22       +1
  Lines        1230     1185      -45
==========================================
  Hits          928      928
+ Misses        302      257      -45

☔ View full report in Codecov by Sentry.
I've had a quick look; this PR clearly still needs some work. The sampling of the random momentum seems to be numerically unstable and inefficient, see (in this PR) AdvancedHMC.jl/src/riemannian/metric.jl, lines 52 to 63 in c9e6b0a (lines 63 to 73 in 9cac513).
For the SoftAbs map, which internally does an eigendecomposition, see AdvancedHMC.jl/src/riemannian/metric.jl, lines 13 to 19 in c9e6b0a.
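For context, here is a rough sketch of what a SoftAbs-style map via an eigendecomposition looks like; the function name, the default α, and the return layout are illustrative only and not this PR's actual API.

```julia
using LinearAlgebra

# Illustrative SoftAbs-style map: replace each eigenvalue λ of H with λ * coth(α * λ),
# a smooth, strictly positive surrogate for |λ| (tending to 1/α as λ → 0), so the
# resulting matrix is SPD even when the Hessian H is indefinite.
function softabs_sketch(H::AbstractMatrix, α::Real=20.0)
    λ, Q = eigen(Symmetric(H))
    λ_soft = λ .* coth.(α .* λ)    # assumes no eigenvalue is exactly zero
    return Q * Diagonal(λ_soft) * Q', λ_soft, Q
end

H = randn(4, 4); H = (H + H') / 2    # a symmetric, possibly indefinite "Hessian"
G, λ_soft, Q = softabs_sketch(H)
isposdef(Symmetric(G))               # true: G is a valid metric even if H is not
```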
This is a good point. Though for code modularity it would be nice if we didn't have to redefine H and its derivatives specifically for the eigendecomposition case. I have two thoughts on this:
Yeah, we'd either have to find a wrapper for this somewhere in the wild (I'm sure it exists somewhere) or write our own.
What would you compute the Cholesky factor for? You can just compute the matrix's square root using the eigendecomposition, which you already have; that should be all you need.
Hypothetically, if you could convert the eigendecomposition to a Cholesky factorization for free, it would be more efficient to compute H and its derivatives using the lower-triangular L than the dense V from the eigendecomposition. But FLOP-wise, the overhead of doing the conversion only ends up being worth it if you are doing O(D) vector multiplies/solves. This doesn't apply to us in ∂H/∂p. It sort of does for the trace term in ∂H/∂θ, though.
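To make that concrete, here is a minimal sketch (G, z, and the dimension are stand-ins, not the PR's variables) of sampling the momentum directly from an eigendecomposition, without ever converting it to a Cholesky factor:

```julia
using LinearAlgebra

# Assume we already hold G = V * Diagonal(λ) * V' (e.g. from the SoftAbs map).
# Then r = V * Diagonal(sqrt.(λ)) * V' * z has covariance V * Diagonal(λ) * V' = G,
# so the symmetric square root from the eigendecomposition suffices to draw r ~ N(0, G).
G = let A = randn(5, 5); Symmetric(A * A' + 5I) end   # stand-in SPD metric
λ, V = eigen(G)

z = randn(5)                      # z ~ N(0, I)
r = V * (sqrt.(λ) .* (V' * z))    # r ~ N(0, G) via G^(1/2) = V * Diagonal(sqrt.(λ)) * V'
```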
The one in here? AdvancedHMC.jl/src/riemannian/hamiltonian.jl, lines 12 to 34 in c9e6b0a.
(Edited for in-place display of the code.)
The trace of the product of two (square) matrices can be computed in O(n^2) without computing the O(n^3) matrix product; see e.g. https://en.wikipedia.org/wiki/Trace_(linear_algebra)#Trace_of_a_product. Furthermore, do we think that Julia is clever enough to reorder this expression:
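As a concrete illustration of the O(n^2) identity (the helper name below is made up for the example, not part of the PR):

```julia
using LinearAlgebra

# tr(A * B) = Σᵢⱼ A[i, j] * B[j, i], so it can be accumulated elementwise in O(n²)
# instead of forming the O(n³) product A * B and only then taking the trace.
tr_prod(A::AbstractMatrix, B::AbstractMatrix) = sum(A .* transpose(B))

A, B = randn(6, 6), randn(6, 6)
tr_prod(A, B) ≈ tr(A * B)    # true, without ever materialising A * B
```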
    θ::AbstractVecOrMat,
) where {T}
    r = _randn(rng, T, size(metric)...)
    G⁻¹ = inv(metric.map(metric.G(θ)))
It's very inefficient and numerically unstable to compute a direct inverse. Worse, you're not taking advantage of the fact that G is PSD. Instead, you should compute the Cholesky decomposition first.
Let G = U'U, where U is the upper-triangular factor from the Cholesky decomposition.
You're computing the Cholesky of G⁻¹, which is (U⁻¹)(U⁻¹)'.
You then do an ldiv to compute r = (U'⁻¹)⁻¹ * r, but that's just U' * r.
Instead, you can just do lmul!(chol.L, r), where chol is the Cholesky factorization of G directly.
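A minimal sketch of that suggestion, with G and the dimension as stand-ins for the metric produced by this PR:

```julia
using LinearAlgebra

# To draw r ~ N(0, G) for an SPD metric G, factor G once and scale a standard normal:
# r = L * z has covariance L * L' = G, so neither inv(G) nor cholesky(inv(G)) is needed.
G = let A = randn(5, 5); A * A' + 5I end   # stand-in for metric.map(metric.G(θ))
chol = cholesky(Symmetric(G))

r = randn(5)          # z ~ N(0, I), standing in for _randn(rng, T, size(metric)...)
lmul!(chol.L, r)      # r ← L * z, i.e. the lmul!(chol.L, r) suggested above
```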
# Negative kinetic energy
#! Eq (13) of Girolami & Calderhead (2011)
function neg_energy(
Is this correct? Eq (13) of Girolami & Calderhead (2011) also contains the log-likelihood term, -L(θ). Why is this not included here?
Even if this is correct, we should clarify the naming conventions as it's quite hard to follow.
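For reference, the RMHMC Hamiltonian in Girolami & Calderhead (2011) has the form below (quoted from memory, so the exact equation numbering should be double-checked against the paper):

$$
H(\theta, p) = -\mathcal{L}(\theta) + \tfrac{1}{2} \log\!\left[(2\pi)^{D}\,\lvert G(\theta)\rvert\right] + \tfrac{1}{2}\, p^{\top} G(\theta)^{-1} p .
$$

If neg_energy is only meant to cover the momentum-dependent part (the log-determinant and quadratic terms), then -L(θ) would be handled by the potential energy elsewhere; either way, the naming could make that split explicit.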
    kinetic,
#! Eq (14) of Girolami & Calderhead (2011)
function ∂H∂r(
    h::Hamiltonian{<:DenseRiemannianMetric,<:GaussianKinetic},
AbstractRiemannianMetric?
This logic is not unique to the dense metric.
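A rough sketch of that relaxation, assuming the field layout used elsewhere in this PR (h.metric with metric.map and metric.G) and guessing the rest of the argument list; this is not the PR's actual code:

```julia
# Sketch only: dispatch on the abstract metric type, since ∂H/∂r = G(θ)⁻¹ * r for a
# Gaussian kinetic energy does not depend on the metric being dense.
function ∂H∂r(
    h::Hamiltonian{<:AbstractRiemannianMetric,<:GaussianKinetic},
    θ::AbstractVecOrMat,
    r::AbstractVecOrMat,
)
    G = h.metric.map(h.metric.G(θ))     # assumed field layout, mirroring the PR
    return cholesky(Symmetric(G)) \ r   # solve G \ r rather than forming inv(G)
end
```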
I made this review weeks ago, but I'm not sure it was actually showing up for others until I marked it as "complete".
No haha, I hadn't seen this before 😅
I've got another, maybe silly, conceptual question. If we're using the softabs map, then the Riemannian metric (right?) G is not the Fisher information metric, and it's also not the Hessian (which after all is not guaranteed to always be SPD and is thus not a metric), but the softabs-transformed Hessian. Shouldn't the Hamiltonian and its derivatives then be computed with respect to that transformed matrix rather than the raw Hessian? Edit: maybe all that modulo some inverses? I'm really not familiar with this algorithm.
That would make sense to me. All computations in the RHMC integrator are wrt G = softabs(H); H is never used directly. This is the point I was alluding to when talking about separating the logic for the DenseRiemannianMetric in particular (softabs/eigen stuff) from the AbstractRiemannianMetric stuff.
Add the Riemannian manifold HMC from "Riemann manifold Langevin and Hamiltonian Monte Carlo methods" (Girolami & Calderhead, 2011), which is implemented in the research directory but has not been merged into the package. Supersedes #432 and #437.