Make the constructor of LogDensityFunction implicit #1156
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##             main    #1156      +/-   ##
==========================================
- Coverage   81.58%   78.93%    -2.66%
==========================================
  Files          40       40
  Lines        3845     3836        -9
==========================================
- Hits         3137     3028      -109
- Misses        708      808      +100
```
Benchmark Report (Computer Information and Benchmark Results collapsed)
What are you trying to accomplish with this?
Hi! I would like to do
Yes, but why are you trying to do that? Any meaningful change to the varinfo will require the prep to be recalculated anyway. In the next version of DynamicPPL, LogDensityFunction won't even carry a varinfo with it.
Okay, to be more explicit, I am trying to implement subsampling, which requires replacing the
LogDensityFunction is pretty much just a thing that takes a vector and returns a float; anything beyond that is best treated as an implementation detail that shouldn't be relied on. If you need to manipulate a varinfo, I would suggest that you create your own struct that contains the model + varinfo, and then manipulate that varinfo as desired. There might be other alternatives too; I don't mind taking a look at what you're trying to do and seeing how it can be done, but I don't think that setting varinfo in a LDF is the right way.
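For illustration, a minimal sketch of that wrapper idea might look like this (SubsampledTarget and set_varinfo are hypothetical names; the DynamicPPL calls mirror the ones used later in this thread):

```julia
using DynamicPPL

# Hypothetical user-side wrapper: it owns the model and varinfo, so the varinfo
# can be swapped freely without touching LogDensityFunction internals.
struct SubsampledTarget{M,V}
    model::M
    varinfo::V
end

# Replace the varinfo on the wrapper rather than on a LogDensityFunction.
set_varinfo(t::SubsampledTarget, new_vi) = SubsampledTarget(t.model, new_vi)

# Evaluate the log density at a parameter vector x; the combination of
# log-density terms follows the one used elsewhere in this thread.
function (t::SubsampledTarget)(x::AbstractVector)
    vi = DynamicPPL.unflatten(t.varinfo, x)
    _, vi = DynamicPPL.evaluate!!(t.model, vi)
    return getlogprior(vi) + getloglikelihood(vi) - getlogjac(vi)
end
```

An AD prep could then be built against this callable and rebuilt whenever the wrapped varinfo changes in a way that matters.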
Basically we have to implement the
Do you have a branch / PR that does this for DynamicPPL? I am almost 100% certain that modifying the varinfo will give you lots of headaches. Please don't do that. LDF takes an argument specifying the function to evaluate, so you could define

```julia
struct ScaledLogJoint{F<:Real}
    scale::F
end

(s::ScaledLogJoint)(vi::AbstractVarInfo) = getlogprior(vi) + (s.scale * getloglikelihood(vi)) - getlogjac(vi)
```

and then do

```julia
updated_model = # model conditioned on new subset of data...
ldf = LogDensityFunction(updated_model, ScaledLogJoint(scale); adtype=adtype)
```

But note that you are providing a different function to differentiate, so you can't reuse the same prep. That's what I meant when I said that any meaningful change to the varinfo will require the prep to be updated anyway. If you edited the varinfo without updating the prep, I assume that you would potentially get correctness errors.
You can find my partial attempt here.
Huh, I vaguely remember @gdalle mentioning that
I mean, what you're doing is basically this:

```julia
using DifferentiationInterface
using ForwardDiff, ReverseDiff, Mooncake

adtypes = [
    AutoForwardDiff(),
    AutoReverseDiff(),
    AutoReverseDiff(; compile=true),
    AutoMooncake()
]

for adtype in adtypes
    @info "AD type: ", adtype
    x = [1.0, 2.0]
    f(x) = sum(x)
    prep = prepare_gradient(f, adtype, x)
    # redefine f after preparation, mimicking the function changing under an existing prep
    f(x) = 1.5 * sum(x)
    @info "actual: ", gradient(f, prep, adtype, x)
    @info "expected: ", ForwardDiff.gradient(f, x)
    println()
end
```
No @Red-Portal, auxiliary variables are among the arguments in the docstring you mentioned, so they are not allowed to change type or size. The "operators" page of the DI docs gives more details on preparation and its semantics, but happy to clarify further if necessary.
Thank you for the clarification. I think we can ensure the size doesn't change. The values will definitely have to change, though. According to the docstring, that should be fine, correct? But I recall we had this discussion before. What I am trying to do is very standard for any statistical application involving minibatching. Surely, there must be a way to use
@penelopeysm I think what we're trying to do is more like:

```julia
using DifferentiationInterface
using LinearAlgebra
using ForwardDiff, ReverseDiff, Mooncake

adtypes = [
    AutoForwardDiff(),
    AutoReverseDiff(),
    AutoReverseDiff(; compile=true),
    AutoMooncake()
]

for adtype in adtypes
    @info "AD type: ", adtype
    x = [1.0, 2.0]
    y = randn(2)
    f(x) = dot(x, y)
    prep = prepare_gradient(f, adtype, x)
    g = gradient(f, prep, adtype, x)
    # mutate the captured y in place, then reuse the same prep
    y[:] = randn(2)
    g_new = gradient(f, prep, adtype, x)
    @info "not the same: ", norm(g - g_new) > 1e-10
end
```

```
[ Info: ("AD type: ", AutoForwardDiff())
[ Info: ("expected true: ", true)
[ Info: ("AD type: ", AutoReverseDiff())
[ Info: ("expected true: ", true)
[ Info: ("AD type: ", AutoReverseDiff(compile=true))
[ Info: ("expected true: ", false)
[ Info: ("AD type: ", AutoMooncake())
[ Info: ("expected true: ", true)
```
Guillaume is the authority on this, but I feel like that just proves my point. You're relying on the fact that it happens to work for some backends when that isn't a guarantee that's provided by DI. In any case, as far as this PR is concerned, I'm not willing to change the implementation of LDF. If you really want to go down this route, you should duplicate the code as necessary. This is especially so because LDF's implementation is very different in 0.39 and so your code will immediately break if you are relying on its internal structure. I would very strongly suggest that you change the approach to do something like this instead:

```julia
using DifferentiationInterface
using ForwardDiff, ReverseDiff, Mooncake
using LinearAlgebra: dot

adtypes = [
    AutoForwardDiff(),
    AutoReverseDiff(),
    AutoReverseDiff(; compile=true),
    AutoMooncake()
]

for adtype in adtypes
    @info "AD type: ", adtype
    x = [1.0, 2.0]
    y = randn(2)
    prep = prepare_gradient(dot, adtype, x, Constant(y))
    # calculate with the old prep
    y = randn(2)
    grad = gradient(dot, prep, adtype, x, Constant(y))
    @info "actual: ", grad
    # calculate without the prep
    @info "expected: ", gradient(dot, adtype, x, Constant(y))
    println()
end
```

```
[ Info: ("AD type: ", AutoForwardDiff())
[ Info: ("actual: ", [-0.10652485041993587, -0.3575589022616598])
[ Info: ("expected: ", [-0.10652485041993587, -0.3575589022616598])
[ Info: ("AD type: ", AutoReverseDiff())
[ Info: ("actual: ", [0.11226749448821424, -0.0700900922876089])
[ Info: ("expected: ", [0.11226749448821424, -0.0700900922876089])
[ Info: ("AD type: ", AutoReverseDiff(compile=true))
[ Info: ("actual: ", [-0.6767470277992602, 1.6638846489301709])
[ Info: ("expected: ", [-0.6767470277992602, 1.6638846489301709])
[ Info: ("AD type: ", AutoMooncake())
[ Info: ("actual: ", [2.437615526525269, -1.0598882129119007])
[ Info: ("expected: ", [2.437615526525269, -1.0598882129119007])
```

And for DynamicPPL, it's not all that difficult to adapt to this scheme:

```julia
function evaluate_with_new_data(x, model, varinfo, loglike_scale, new_data)
    new_model = decondition(model, @varname(datapoints)) | (; datapoints = new_data)
    vi = DynamicPPL.unflatten(varinfo, x)
    _, vi = DynamicPPL.evaluate!!(new_model, vi)
    return getlogprior(vi) + loglike_scale * getloglikelihood(vi) - getlogjac(vi)
end
```

(I definitely recommend using condition / decondition because

Then you can build a single prep for that function and reuse it to your heart's content as long as the size and type of
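To spell out that last step, a rough sketch of building one prep and reusing it across minibatches (assuming evaluate_with_new_data as defined above, and that adtype, x, model, varinfo, loglike_scale and the batches already exist with fixed size and element type):

```julia
using DifferentiationInterface

# One-off preparation against a representative minibatch.
prep = prepare_gradient(
    evaluate_with_new_data, adtype, x,
    Constant(model), Constant(varinfo), Constant(loglike_scale), Constant(first_batch),
)

# Reuse the same prep for any later minibatch of the same size and element type.
grad = gradient(
    evaluate_with_new_data, prep, adtype, x,
    Constant(model), Constant(varinfo), Constant(loglike_scale), Constant(next_batch),
)
```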
Okay, so what you're suggesting here is to have a downstream customized version of [...]
Given the previous agreement with Hong that subsampling-related features should be implemented and tested in [...]
By the way, @gdalle could you confirm that there is no valid way to use
Yeah, I think that makes sense. You could stick that
@Red-Portal the right way to change the behavior of a function is to encapsulate the changing part inside a
This PR makes the default constructor of LogDensityFunction implicit. This is necessary so that a modified copy of a LogDensityFunction can be constructed (for example, via Accessors.@set) without DI.prepare being called each time.
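As a toy illustration of the motivation (the names below are made up and this is not DynamicPPL's actual code): by default, Accessors.@set rebuilds a struct by passing all of its field values to the type's constructor, so with an implicit (default) constructor the stored fields are simply copied over, whereas an explicit all-field constructor that ran the preparation would re-prepare on every @set.

```julia
using Accessors

# Made-up stand-in for the real struct: `prep` is expensive to compute.
struct ToyLDF{M,P}
    model::M
    prep::P
    # no explicit inner constructor, so the implicit one just stores the fields
end

expensive_prepare(model) = (println("preparing for $model"); :some_prep)

# Convenience outer constructor that runs the expensive preparation once.
ToyLDF(model) = ToyLDF(model, expensive_prepare(model))

ldf  = ToyLDF(:model_a)            # prints "preparing for model_a"
ldf2 = @set ldf.model = :model_b   # calls the implicit two-field constructor; no re-preparation
```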