
Conversation

@Red-Portal
Member

This PR makes the default constructor of LogDensityFunction implicit. This is necessary to reconstruct a LogDensityFunction with modified fields (for example, via Accessors.@set) without DI.prepare being called each time.
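
For illustration, a minimal sketch of why the explicit constructor gets in the way, using a made-up struct rather than the actual LogDensityFunction code:

using Accessors

# Hypothetical struct standing in for LogDensityFunction: the inner
# constructor (which is where something like DI.prepare would run) replaces
# the implicit field-wise constructor that Accessors.@set relies on.
struct Explicit
    n::Int
    prep::Vector{Float64}
    Explicit(n::Int) = new(n, zeros(n))  # "preparation" happens here
end

e = Explicit(3)
# @set rebuilds via Explicit(new_n, e.prep), and no such method exists:
# @set e.n = 4   # MethodError

# With only the implicit default constructor, @set just copies the fields,
# and no preparation code is re-run on reconstruction.
struct Implicit
    n::Int
    prep::Vector{Float64}
end
i = Implicit(3, zeros(3))
i2 = @set i.n = 4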

@codecov

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 78.93%. Comparing base (052bc19) to head (ed1db96).

Files with missing lines     Patch %   Lines
src/logdensityfunction.jl    90.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1156      +/-   ##
==========================================
- Coverage   81.58%   78.93%   -2.66%     
==========================================
  Files          40       40              
  Lines        3845     3836       -9     
==========================================
- Hits         3137     3028     -109     
- Misses        708      808     +100     


@github-actions
Contributor

Benchmark Report

  • this PR's head: ed1db96795650d96b921a322266ba0f3cfd65f10
  • base branch: 052bc1950df3a42e14b56eae51af236881092f90

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬─────────────────────────────────┬────────────────────────────┐
│                       │       │             │                   │        │        t(eval) / t(ref)         │     t(grad) / t(eval)      │
│                       │       │             │                   │        │ ──────────┬───────────┬──────── │ ───────┬─────────┬──────── │
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │      base │   this PR │ speedup │   base │ this PR │ speedup │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼───────────┼───────────┼─────────┼────────┼─────────┼─────────┤
│               Dynamic │    10 │    mooncake │             typed │   true │    425.09 │    503.37 │    0.84 │  12.35 │    9.35 │    1.32 │
│                   LDA │    12 │ reversediff │             typed │   true │   2835.36 │   3092.92 │    0.92 │   2.20 │    2.13 │    1.03 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │ 170679.88 │ 166400.65 │    1.03 │   5.14 │    5.20 │    0.99 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │  14045.81 │  15052.69 │    0.93 │   5.67 │    5.90 │    0.96 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │  31489.80 │  35426.65 │    0.89 │   9.78 │    9.42 │    1.04 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │   3585.06 │   4058.67 │    0.88 │   8.91 │    8.59 │    1.04 │
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │     17.22 │     19.73 │    0.87 │   1.86 │    2.06 │    0.91 │
│           Smorgasbord │   201 │      enzyme │             typed │   true │   2890.37 │   2759.74 │    1.05 │   4.08 │    4.80 │    0.85 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │  29014.16 │  25094.92 │    1.16 │  19.66 │   24.84 │    0.79 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │   1035.49 │   1135.71 │    0.91 │ 170.10 │   80.49 │    2.11 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │   2754.01 │   2736.21 │    1.01 │  41.66 │   85.39 │    0.49 │
│           Smorgasbord │   201 │ forwarddiff │      typed_vector │   true │   2632.27 │   2805.23 │    0.94 │  43.92 │   43.16 │    1.02 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │   2374.02 │   2494.01 │    0.95 │  43.83 │   48.11 │    0.91 │
│           Smorgasbord │   201 │ forwarddiff │    untyped_vector │   true │   2324.80 │   2509.85 │    0.93 │  45.31 │   45.12 │    1.00 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │   2915.12 │   2780.13 │    1.05 │   5.04 │    5.63 │    0.90 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │   2699.00 │   2866.57 │    0.94 │  53.75 │   56.16 │    0.96 │
│              Submodel │     1 │    mooncake │             typed │   true │     25.89 │     27.89 │    0.93 │   5.22 │    5.25 │    0.99 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴───────────┴───────────┴─────────┴────────┴─────────┴─────────┘

@penelopeysm
Member

This is necessary to mutate-construct a LogDensityFunction (for example, by calling Accessors.@set) without worrying about DI.prepare being called each time.

What are you trying to accomplish with this?

@Red-Portal
Member Author

Hi! I would like to do @set logdensityprob.varinfo = varinfo_new. But the explicit constructor prevents the use of @set.

@penelopeysm
Member

Yes, but why are you trying to do that? Any meaningful change to the varinfo will require the prep to be recalculated anyway. In the next version of DynamicPPL, LogDensityFunction won't even carry a varinfo with it.

@Red-Portal
Member Author

Red-Portal commented Nov 26, 2025

Okay, to be more explicit, I am trying to implement subsampling, which requires replacing the LogLikelihoodAccumulator at each step of variational inference.

@penelopeysm
Member

LogDensityFunction is pretty much just a thing that takes a vector and returns a float, anything beyond that is best treated as an implementation detail that shouldn't be relied on.

If you need to manipulate a varinfo, I would suggest that you create your own struct that contains the model + varinfo, and then manipulate that varinfo as desired. There might be other alternatives too; I don't mind taking a look at what you're trying to do and seeing how it can be done, but I don't think that setting varinfo in a LDF is the right way.
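
For example, a rough sketch of what such a wrapper could look like (the struct name and the use of getlogjoint here are illustrative, not an existing API):

using DynamicPPL, LogDensityProblems

# Hypothetical wrapper: keep the model and varinfo yourself, manipulate them
# as needed, and expose only "vector in, float out" to the outside.
struct ModelWithVarInfo{M<:DynamicPPL.Model,V<:DynamicPPL.AbstractVarInfo}
    model::M
    varinfo::V
end

function LogDensityProblems.logdensity(p::ModelWithVarInfo, x::AbstractVector)
    vi = DynamicPPL.unflatten(p.varinfo, x)
    _, vi = DynamicPPL.evaluate!!(p.model, vi)
    return DynamicPPL.getlogjoint(vi)
end

LogDensityProblems.dimension(p::ModelWithVarInfo) = length(p.varinfo[:])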

@Red-Portal
Member Author

Red-Portal commented Nov 26, 2025

Basically we have to implement the subsample function explained in this tutorial. So the model underlying a LogDensityProblem has to be updated, and the log-likelihood needs to be adjusted by a constant (the long-gone MinibatchContext was supposed to do that). Of course, a LogDensityProblem is just an interface, so I am relying on DynamicPPL specifics here.

@penelopeysm
Member

penelopeysm commented Nov 26, 2025

Do you have a branch / PR that does this for DynamicPPL?

I am almost 100% certain that modifying the varinfo will give you lots of headaches. Please don't do that.

LDF has an argument called getlogdensity, which by default is getlogjoint_internal, which in turn is logprior + loglikelihood - logjac. To scale the likelihood, all you need to do is define a new function or callable struct:

using DynamicPPL

struct ScaledLogJoint{F<:Real}
    scale::F
end
(s::ScaledLogJoint)(vi::AbstractVarInfo) = getlogprior(vi) + (s.scale * getloglikelihood(vi)) - getlogjac(vi)

and then do

updated_model = # model conditioned on new subset of data...

ldf = LogDensityFunction(updated_model, ScaledLogJoint(scale); adtype=adtype)
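
For completeness, a rough sketch of how the resulting ldf would then be evaluated through the LogDensityProblems interface (x is just an illustrative parameter vector):

using LogDensityProblems

x = randn(LogDensityProblems.dimension(ldf))        # hypothetical parameter vector
lp = LogDensityProblems.logdensity(ldf, x)           # scaled log joint
lp, grad = LogDensityProblems.logdensity_and_gradient(ldf, x)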

But note that you are providing a different function to differentiate — so you can't reuse the same prep.

https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/api/#DifferentiationInterface.prepare_gradient

The preparation result prep is only reusable as long as the arguments to gradient do not change type or size, and the function and backend themselves are not modified.

That's what I meant when I said that any meaningful change to the varinfo will require the prep to be updated anyway. If you edited the varinfo without updating the prep I assume that you would potentially get correctness errors.

@Red-Portal
Member Author

Red-Portal commented Nov 27, 2025

Do you have a branch / PR that does this for DynamicPPL?

You can find my partial attempt here.

But note that you are providing a different function to differentiate — so you can't reuse the same prep.

https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/api/#DifferentiationInterface.prepare_gradient

Huh, I vaguely remember @gdalle saying that prep should be fine for this use case, where only the dimensions of some auxiliary variables would change. I guess the policy has changed since then?

@penelopeysm
Member

I mean, what you're doing is basically this:

using DifferentiationInterface
using ForwardDiff, ReverseDiff, Mooncake

adtypes = [
    AutoForwardDiff(),
    AutoReverseDiff(),
    AutoReverseDiff(; compile=true),
    AutoMooncake()
]
for adtype in adtypes
    @info "AD type: ", adtype
    x = [1.0, 2.0]
    f(x) = sum(x)
    prep = prepare_gradient(f, adtype, x)
    f(x) = 1.5 * sum(x)
    @info "actual:   ", gradient(f, prep, adtype, x)
    @info "expected: ", ForwardDiff.gradient(f, x)
    println()
end
[ Info: ("AD type: ", AutoForwardDiff())
[ Info: ("actual:   ", [1.5, 1.5])
[ Info: ("expected: ", [1.5, 1.5])

[ Info: ("AD type: ", AutoReverseDiff())
[ Info: ("actual:   ", [1.5, 1.5])
[ Info: ("expected: ", [1.5, 1.5])

[ Info: ("AD type: ", AutoReverseDiff(compile=true))
[ Info: ("actual:   ", [1.0, 1.0])
[ Info: ("expected: ", [1.5, 1.5])

[ Info: ("AD type: ", AutoMooncake())
[ Info: ("actual:   ", [1.0, 1.0])
[ Info: ("expected: ", [1.5, 1.5])

@gdalle
Contributor

gdalle commented Nov 27, 2025

No @Red-Portal, auxiliary variables are among the arguments in the docstring you mentioned, so they are not allowed to change type or size. The "operators" page of the DI docs gives more details on preparation and its semantics, but I'm happy to clarify further if necessary.

@Red-Portal
Member Author

Red-Portal commented Nov 27, 2025

Thank you for the clarification. I think we can keep the size from changing; the values will definitely have to change, though. According to the docstring, that should be fine, correct?

But I recall we had this discussion before. What I am trying to do is very standard for any statistical application involving minibatching. Surely there must be a way to use prep without issue, right?

@Red-Portal
Member Author

@penelopeysm I think what we're trying to do is more like:

using DifferentiationInterface
using LinearAlgebra
using ForwardDiff, ReverseDiff, Mooncake

adtypes = [
    AutoForwardDiff(),
    AutoReverseDiff(),
    AutoReverseDiff(; compile=true),
    AutoMooncake()
]
for adtype in adtypes
    @info "AD type: ", adtype
    x = [1.0, 2.0]
    
    y = randn(2)
    f(x) = dot(x,y)

    prep = prepare_gradient(f, adtype, x)

    g = gradient(f, prep, adtype, x)
    
    y[:] = randn(2)  
    
    g_new = gradient(f, prep, adtype, x)

    @info "not the same: ", norm(g - g_new) > 1e-10
end
[ Info: ("AD type: ", AutoForwardDiff())
[ Info: ("expected true: ", true)
[ Info: ("AD type: ", AutoReverseDiff())
[ Info: ("expected true: ", true)
[ Info: ("AD type: ", AutoReverseDiff(compile=true))
[ Info: ("expected true: ", false)
[ Info: ("AD type: ", AutoMooncake())
[ Info: ("expected true: ", true)

@penelopeysm
Member

penelopeysm commented Nov 27, 2025

Guillaume is the authority on this, but I feel like that just proves my point. You're relying on the fact that it happens to work for some backends, even though that isn't a guarantee provided by DI.

In any case, as far as this PR is concerned, I'm not willing to change the implementation of LDF. If you really want to go down this route, you should duplicate the code as necessary. This is especially so because LDF's implementation is very different in 0.39 and so your code will immediately break if you are relying on its internal structure.

I would very strongly suggest that you change the approach to do something like this instead:

using DifferentiationInterface
using ForwardDiff, ReverseDiff, Mooncake
using LinearAlgebra: dot

adtypes = [
   AutoForwardDiff(),
   AutoReverseDiff(),
   AutoReverseDiff(; compile=true),
   AutoMooncake()
]
for adtype in adtypes
   @info "AD type: ", adtype
   x = [1.0, 2.0]
   y = randn(2)
   prep = prepare_gradient(dot, adtype, x, Constant(y))
   # calculate with the old prep
   y = randn(2)
   grad = gradient(dot, prep, adtype, x, Constant(y))
   @info "actual:   ", grad
   # calculate without the prep
   @info "expected: ", gradient(dot, adtype, x, Constant(y))
   println()
end
[ Info: ("AD type: ", AutoForwardDiff())
[ Info: ("actual:   ", [-0.10652485041993587, -0.3575589022616598])
[ Info: ("expected: ", [-0.10652485041993587, -0.3575589022616598])

[ Info: ("AD type: ", AutoReverseDiff())
[ Info: ("actual:   ", [0.11226749448821424, -0.0700900922876089])
[ Info: ("expected: ", [0.11226749448821424, -0.0700900922876089])

[ Info: ("AD type: ", AutoReverseDiff(compile=true))
[ Info: ("actual:   ", [-0.6767470277992602, 1.6638846489301709])
[ Info: ("expected: ", [-0.6767470277992602, 1.6638846489301709])

[ Info: ("AD type: ", AutoMooncake())
[ Info: ("actual:   ", [2.437615526525269, -1.0598882129119007])
[ Info: ("expected: ", [2.437615526525269, -1.0598882129119007])

And for DynamicPPL, it's not all that difficult to adapt to this scheme:

using DynamicPPL

function evaluate_with_new_data(x, model, varinfo, loglike_scale, new_data)
    new_model = decondition(model, @varname(datapoints)) | (; datapoints = new_data)
    vi = DynamicPPL.unflatten(varinfo, x)
    _, vi = DynamicPPL.evaluate!!(new_model, vi)
    return getlogprior(vi) + loglike_scale*getloglikelihood(vi) - getlogjac(vi)
end

(I definitely recommend using condition / decondition because model.defaults is also an internal detail. Conditioning is a public API.)

Then you can build a single prep for that function and reuse it to your heart's content, as long as the size and type of new_data don't change.
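
For example, a rough sketch of that wiring, reusing evaluate_with_new_data from above (make_batch, scale, model, and varinfo are illustrative placeholders; any DI-supported backend would do):

using DifferentiationInterface
using ForwardDiff

adtype = AutoForwardDiff()
x = varinfo[:]                    # current parameter vector (assumed)
scale = n_total / batch_size      # likelihood scaling for the minibatch (assumed)
batch = make_batch()              # hypothetical: draws a minibatch of fixed size and type

# Prepare once, with everything except x wrapped in Constant.
prep = prepare_gradient(
    evaluate_with_new_data, adtype, x,
    Constant(model), Constant(varinfo), Constant(scale), Constant(batch),
)

# Later iterations: a fresh minibatch of the same size and type, same prep.
new_batch = make_batch()
val, grad = value_and_gradient(
    evaluate_with_new_data, prep, adtype, x,
    Constant(model), Constant(varinfo), Constant(scale), Constant(new_batch),
)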

@Red-Portal
Member Author

Okay, so what you're suggesting here is to have a downstream customized version of LogDensityFunction. I can definitely do that. However, in the long run, subsampling as a feature will have to be implemented elsewhere throughout the Turing ecosystem. For example, stochastic gradient MCMC algorithms have existed for a while but were basically dormant.

Given the previous agreement with Hong that subsampling-related features should be implemented and tested in AdvancedVI first and then moved upstream, we could punt this for now and later think about how to move all of this upstream. Shall we go this route?

By the way, @gdalle could you confirm that there is no valid way to use prep for this use case, given that AutoReverseDiff(; compile=true) doesn't return a correct result? (For reference, I remember that you mentioned this shouldn't happen, although that discussion is pretty old. Wow time flies!)

@penelopeysm
Member

Yeah, I think that makes sense. You could stick that evaluate_with_new_data function in the DynamicPPLExt that you're doing, and then make use of it in AdvancedVI.subsample, and when you feel happy with it, shift it to DynamicPPL itself.

@Red-Portal closed this on Nov 27, 2025
@gdalle
Contributor

gdalle commented Nov 28, 2025

@Red-Portal the right way to change the behavior of a function is to encapsulate the changing part inside a Constant, as Penelope demonstrated. That will work even with ReverseDiff, because we make sure not to bake that constant into the compiled tape. It won't work with the upcoming Reactant support, but let's not worry about that yet.

@Red-Portal
Member Author

Red-Portal commented Nov 28, 2025

@gdalle @penelopeysm Okay, then: since everything was already going through Const in the first place, there's nothing to worry about. I just need to make the batches equally sized.
