Conversation

@ErikQQY
Collaborator

@ErikQQY ErikQQY commented May 5, 2025

Add the Riemannian manifold HMC from "Riemann manifold Langevin and Hamiltonian Monte Carlo methods", which is implemented in the research directory but has not been merged into the package. Supersedes #432 and #437.

@ErikQQY ErikQQY requested a review from yebai May 9, 2025 10:03
@github-actions
Contributor

AdvancedHMC.jl documentation for PR #439 is available at:
https://TuringLang.github.io/AdvancedHMC.jl/previews/PR439/

@codecov

codecov bot commented Jun 29, 2025

Codecov Report

Attention: Patch coverage is 0% with 33 lines in your changes missing coverage. Please review.

Project coverage is 78.31%. Comparing base (5a562e0) to head (567e2a8).

Files with missing lines        Patch %   Lines
src/riemannian/metric.jl         0.00%    21 Missing ⚠️
src/riemannian/hamiltonian.jl    0.00%    12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #439      +/-   ##
==========================================
+ Coverage   75.44%   78.31%   +2.86%     
==========================================
  Files          21       22       +1     
  Lines        1230     1185      -45     
==========================================
  Hits          928      928              
+ Misses        302      257      -45     

☔ View full report in Codecov by Sentry.

@ErikQQY ErikQQY closed this Jun 29, 2025
@ErikQQY ErikQQY reopened this Jun 29, 2025
@nsiccha
Contributor

nsiccha commented Dec 4, 2025

I've had a quick look - this PR clearly still needs some work. The sampling of the random momentum seems to be numerically unstable and inefficient; see the implementation in this PR:

function rand_momentum(
    rng::Union{AbstractRNG,AbstractVector{<:AbstractRNG}},
    metric::DenseRiemannianMetric{T},
    kinetic,
    θ::AbstractVecOrMat,
) where {T}
    r = _randn(rng, T, size(metric)...)
    G⁻¹ = inv(metric.map(metric.G(θ)))  # explicit inverse of G(θ)
    chol = cholesky(Symmetric(G⁻¹))     # then a Cholesky of that inverse
    ldiv!(chol.U, r)                    # r ← chol.U \ r
    return r
end
and maybe also compare with the one in that PR:
function rand_momentum(
    rng::Union{AbstractRNG,AbstractVector{<:AbstractRNG}},
    metric::DenseRiemannianMetric{T},
    kinetic,
    θ::AbstractVecOrMat,
) where {T}
    r = _randn(rng, T, size(metric)...)
    chol = cholesky(Symmetric(metric.map(metric.G(θ))))  # factor G(θ) directly
    r = chol.L * r                                       # r ~ N(0, G)
    return r
end

For the SoftAbs map, which internally does an eigendecomposition (see

function softabs(X, α=20.0)
    F = eigen(X) # ReverseDiff cannot diff through `eigen`
    Q = hcat(F.vectors)
    λ = F.values
    softabsλ = λ .* coth.(α * λ)  # λ coth(αλ) smoothly approximates |λ|
    return Q * diagm(softabsλ) * Q', Q, λ, softabsλ
end

), not even the Cholesky decomposition will be required if we hold onto the eigendecomposition.
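For instance, if we held onto the eigendecomposition, the momentum draw could skip any factorization entirely. A minimal sketch (the function name is hypothetical; Q and softabsλ are the outputs of softabs above):

using LinearAlgebra, Random

# G = Q * diagm(softabsλ) * Q', so Q * Diagonal(sqrt.(softabsλ)) is a square
# root of G, and r = Q * (sqrt.(softabsλ) .* z) with z ~ N(0, I) gives r ~ N(0, G).
function rand_momentum_from_eigen(rng::AbstractRNG, Q::AbstractMatrix, softabsλ::AbstractVector)
    z = randn(rng, length(softabsλ))
    return Q * (sqrt.(softabsλ) .* z)
end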

@THargreaves

not even the Cholesky decomposition will be required if we hold onto the eigendecomposition.

This is a good point. Though for code modularity it would be nice if we didn't have to redefine H and its derivatives specifically for the eigendecomposition case.

I have two thoughts on this:

  1. We could write H and its derivatives in a generic way using solves — the only issue is that E = eigen(A) isn't treated as a proper factorisation in Julia, so things like E \ x don't work.
  2. You can compute the Cholesky of G from the decomposition by taking the QR decomposition of the eigenvector matrix (see the sketch below). I did a back-of-the-envelope FLOP calculation: the conversion ends up being slower if you only use it to perform vector solves, and a bit faster (though both are O(D^3)) when performing matrix solves.
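For reference, a sketch of the eigen-to-Cholesky conversion from point 2 (the helper name is hypothetical; assumes all eigenvalues are positive, as they are after softabs):

using LinearAlgebra

# G = V Λ V' = B B' with B = V * Diagonal(sqrt.(λ)). Writing B' = Q R gives
# G = R' Q' Q R = R' R, so R (with its diagonal signs flipped to be positive)
# is the upper Cholesky factor of G.
function eigen_to_cholesky(V::AbstractMatrix, λ::AbstractVector)
    B = V * Diagonal(sqrt.(λ))
    R = qr(Matrix(B')).R
    return UpperTriangular(Diagonal(sign.(diag(R))) * R)
end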

@nsiccha
Contributor

nsiccha commented Dec 4, 2025

We could write H and its derivatives in a generic way using solves — the only issue is that E = eigen(A) isn't treated as a proper factorisation in Julia so things like E \ x don't work.

Yeah, we'd either have to find a wrapper for this somewhere in the wild (I'm sure it exists somewhere) or write our own.
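Writing our own could be fairly small. A sketch (the wrapper type is hypothetical and assumes a symmetric matrix, so the eigenvector matrix is orthogonal; a separate type avoids pirating Base.\ for LinearAlgebra.Eigen):

using LinearAlgebra

# Wrap an eigendecomposition A = V Λ V' so it behaves like a factorization.
struct EigenFact{TV<:AbstractMatrix,Tλ<:AbstractVector}
    vectors::TV
    values::Tλ
end

# A \ b = V Λ⁻¹ V' b: reuse the decomposition instead of refactorizing.
Base.:\(F::EigenFact, b::AbstractVecOrMat) = F.vectors * ((F.vectors' * b) ./ F.values)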

You can compute the Cholesky of G from the decomposition by taking the QR decomposition of the eigenvector matrix.

What would you compute the Cholesky factor for? You can just compute the matrix's square root using the eigendecomposition, which you already have; that should be all you need.

@THargreaves

What would you compute the Cholesky factor for? You can just compute the matrix's square root using the eigendecomposition, which you already have; that should be all you need.

Hypothetically, if you could convert eigen to Cholesky for free, it would be more efficient to compute H and its derivatives using the lower triangular L than the dense V from the eigendecomposition. But FLOP-wise, the overhead of doing the conversion only ends up being worth it if you are doing O(D) vector multiplies/solves.

This doesn't apply to us in the dHdp. It sort of does for the trace term in dHdθ though.

@nsiccha
Contributor

nsiccha commented Dec 4, 2025

This doesn't apply to us in the dHdp. It sort of does for the trace term in dHdθ though.

The one in here?

function ∂H∂θ(
    h::Hamiltonian{<:DenseRiemannianMetric{T,<:IdentityMap},<:GaussianKinetic},
    θ::AbstractVecOrMat{T},
    r::AbstractVecOrMat{T},
) where {T}
    ℓπ, ∂ℓπ∂θ = h.∂ℓπ∂θ(θ)
    G = h.metric.map(h.metric.G(θ))
    invG = inv(G)
    ∂G∂θ = h.metric.∂G∂θ(θ)
    d = length(∂ℓπ∂θ)
    return DualValue(
        ℓπ,
        #! Eq (15) of Girolami & Calderhead (2011)
        -mapreduce(vcat, 1:d) do i
            ∂G∂θᵢ = ∂G∂θ[:, :, i]
            ∂ℓπ∂θ[i] - 1 / 2 * tr(invG * ∂G∂θᵢ) + 1 / 2 * r' * invG * ∂G∂θᵢ * invG * r
            # Gr = G \ r
            # ∂ℓπ∂θ[i] - 1 / 2 * tr(G \ ∂G∂θᵢ) + 1 / 2 * Gr' * ∂G∂θᵢ * Gr
            # 1 / 2 * tr(invG * ∂G∂θᵢ)
            # 1 / 2 * r' * invG * ∂G∂θᵢ * invG * r
        end,
    )
end

(Edited for in-place display of code.)

@nsiccha
Contributor

nsiccha commented Dec 4, 2025

The trace of the product of two (square) matrices can be computed in O(n^2) without computing the O(n^3) matrix product, see e.g. https://en.wikipedia.org/wiki/Trace_(linear_algebra)#Trace_of_a_product.
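In Julia that identity is a one-liner. A quick sketch:

using LinearAlgebra
A, B = randn(4, 4), randn(4, 4)
# tr(A * B) = Σᵢⱼ A[i, j] * B[j, i]: an O(n²) reduction, no O(n³) product needed.
tr(A * B) ≈ sum(A .* transpose(B))   # true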

Furthermore, do we think that Julia is clever enough to reorder this expression: r' * invG * ∂G∂θᵢ * invG * r? In any case, it's unnecessary to compute it as it's implemented there. Do tmp = invG * r once, then compute tmp' * ∂G∂θᵢ * tmp in one go - I'm sure there's a function for that.
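There is: LinearAlgebra's three-argument dot computes x' * A * y without forming intermediates (available since Julia 1.4). A sketch with random stand-ins for G, ∂G∂θᵢ, and r:

using LinearAlgebra
A = randn(4, 4); G = Symmetric(A' * A + I)   # random SPD stand-in for G(θ)
∂G∂θᵢ = Symmetric(randn(4, 4))               # random symmetric stand-in
r = randn(4)
tmp = G \ r                     # one solve instead of forming invG
quad = dot(tmp, ∂G∂θᵢ, tmp)     # tmp' * ∂G∂θᵢ * tmp in one pass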

    θ::AbstractVecOrMat,
) where {T}
    r = _randn(rng, T, size(metric)...)
    G⁻¹ = inv(metric.map(metric.G(θ)))


It's very inefficient and numerically unstable to compute a direct inverse. Worse, you're not taking advantage of the fact that G is PSD. Instead, you should compute the Cholesky decomposition first.

Let G = U'U, where U is the upper triangular factor from the Cholesky decomposition.

You're computing the Cholesky of G⁻¹, which is (U⁻¹)(U⁻¹)'.

You then do an ldiv to compute r = (U'⁻¹)⁻¹ * r, but that's just U' * r.

Instead, you can just do lmul!(chol.L, r), where chol is the Cholesky of G directly.
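Putting that together, the momentum draw would become (a sketch that combines the suggestion above with the PR's existing signature):

function rand_momentum(
    rng::Union{AbstractRNG,AbstractVector{<:AbstractRNG}},
    metric::DenseRiemannianMetric{T},
    kinetic,
    θ::AbstractVecOrMat,
) where {T}
    r = _randn(rng, T, size(metric)...)
    chol = cholesky(Symmetric(metric.map(metric.G(θ))))  # factor G, not inv(G)
    lmul!(chol.L, r)  # r ← L * r, so r ~ N(0, G) since G = L * L'
    return r
end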


# Negative kinetic energy
#! Eq (13) of Girolami & Calderhead (2011)
function neg_energy(


Is this correct? Eq (13) of Girolami & Calderhead (2011) also contains the log-likelihood term, -L(θ). Why is this not included here?

Even if this is correct, we should clarify the naming conventions, as they're quite hard to follow.
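For reference, my reading of the Hamiltonian in Girolami & Calderhead (2011) (equation numbering may differ between the preprint and the published version):

H(\theta, p) = -\mathcal{L}(\theta) + \frac{1}{2}\log\left\{(2\pi)^{D}\,|G(\theta)|\right\} + \frac{1}{2}\, p^{\top} G(\theta)^{-1} p

so a neg_energy without the -\mathcal{L}(\theta) term would cover only the log-determinant and kinetic parts.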

kinetic,
#! Eq (14) of Girolami & Calderhead (2011)
function ∂H∂r(
h::Hamiltonian{<:DenseRiemannianMetric,<:GaussianKinetic},


AbstractRiemannianMetric?

This logic is not unique to the dense metric.
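Something like the following, perhaps (a sketch only: the loosened dispatch is the reviewer's suggestion, and the body is my guess at Eq (14), G(θ)⁻¹ r, using the PR's metric fields):

# Dispatch on the abstract supertype so any Riemannian metric reuses this method.
function ∂H∂r(
    h::Hamiltonian{<:AbstractRiemannianMetric,<:GaussianKinetic},
    θ::AbstractVecOrMat,
    r::AbstractVecOrMat,
)
    # Eq (14) of Girolami & Calderhead (2011): ∂H/∂r = G(θ)⁻¹ r.
    return h.metric.map(h.metric.G(θ)) \ r
end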

@THargreaves

I made this review weeks ago, but I'm not sure it was actually showing up for others until I marked it as "complete".

@nsiccha
Contributor

nsiccha commented Dec 4, 2025

I made this review weeks ago, but I'm not sure it was actually showing up for others until I marked it as "complete".

No haha, I hadn't seen this before 😅

@nsiccha
Contributor

nsiccha commented Dec 4, 2025

I've got another, maybe silly, conceptual question.

If we're using the softabs map, then the Riemannian metric (right?) G is not the Fisher information metric, and it's also not the Hessian (which, after all, is not guaranteed to always be SPD and is thus not a valid metric), but G(θ) = softabs(Hessian(θ)).

Shouldn't the G field in the DenseRiemannianMetric type then already include the map? And shouldn't ∂G∂θ then compute the derivatives with respect to softabs(Hessian(θ))?

Edit: maybe all that modulo some inverses? I'm really not familiar with this algorithm.

@THargreaves

Shouldn't the G field in the DenseRiemannianMetric type then already include the map? And shouldn't ∂G∂θ then compute the derivatives with respect to softabs(Hessian(θ))?

That would make sense to me. All computations in the RHMC integrator are wrt G = softabs(H). H is never used directly.

This is the point I was alluding to when talking about separating the logic for the DenseRiemannianMetric in particular (softabs/eigen stuff) from the AbstractRiemannianMetric stuff.
