Skip to content

Conversation

ChrisRackauckas
Copy link
Member

This is a pretty major performance boost

D(y₂) ~ k₁ * y₁ - k₂ * y₂^2 - k₃ * y₂ * y₃,
D(y₃) ~ k₂ * y₂^2]
@mtkbuild rober = ODESystem(eqs, t)
@mtkcompile rober = ODESystem(eqs, t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtkcompile rober = System(eqs, t)

@ChrisRackauckas
Copy link
Member Author

With this and the nonlinearsolve.jl PR, this is now not allocating in the loop locally... except I found something in FBDF 😅

terk_tmp = @.. broadcast=false fd_weights[k - 2, 1]*u

Copy link
Contributor

github-actions bot commented Jul 18, 2025

Benchmark Results (Julia v1)

Time benchmarks
master 9ae0860... master / 9ae0860...
construction/linear_N50 27.7 ± 0.92 μs 27.4 ± 14 μs 1.01 ± 0.53
construction/lotka_volterra 18 ± 0.28 μs 18 ± 0.2 μs 1 ± 0.019
construction/rober 17.9 ± 0.23 μs 17.9 ± 0.2 μs 1 ± 0.017
nonstiff/fitzhugh_nagumo/BS3 0.144 ± 0.01 ms 0.144 ± 0.01 ms 0.995 ± 0.099
nonstiff/fitzhugh_nagumo/DP5 0.0952 ± 0.0086 ms 0.0966 ± 0.0093 ms 0.985 ± 0.13
nonstiff/fitzhugh_nagumo/Tsit5 0.113 ± 0.0051 ms 0.114 ± 0.0093 ms 0.994 ± 0.093
nonstiff/fitzhugh_nagumo/Vern6 0.165 ± 0.0071 ms 0.168 ± 0.0084 ms 0.98 ± 0.065
nonstiff/fitzhugh_nagumo/Vern7 0.111 ± 0.0075 ms 0.111 ± 0.0081 ms 1 ± 0.1
nonstiff/lotka_volterra/BS3 0.292 ± 0.011 ms 0.293 ± 0.011 ms 0.995 ± 0.055
nonstiff/lotka_volterra/DP5 0.0445 ± 0.016 ms 0.0458 ± 0.018 ms 0.973 ± 0.52
nonstiff/lotka_volterra/Tsit5 0.0603 ± 0.023 ms 0.0617 ± 0.024 ms 0.977 ± 0.53
nonstiff/lotka_volterra/Vern6 0.0802 ± 0.01 ms 0.0814 ± 0.012 ms 0.985 ± 0.19
nonstiff/lotka_volterra/Vern7 0.0488 ± 0.022 ms 0.0485 ± 0.022 ms 1.01 ± 0.65
nonstiff/pleiades/BS3 0.0892 ± 0.013 s 0.0893 ± 0.0092 s 0.999 ± 0.17
nonstiff/pleiades/DP5 1.25 ± 0.093 ms 1.25 ± 0.095 ms 1 ± 0.11
nonstiff/pleiades/Tsit5 15.9 ± 5.9 ms 15.4 ± 7.3 ms 1.03 ± 0.62
nonstiff/pleiades/Vern6 6.75 s 7.55 s 0.893
nonstiff/pleiades/Vern7 8 s 8.02 s 0.998
scaling/brusselator_2d/16x16 0.306 ± 0.022 s 0.322 ± 0.035 s 0.951 ± 0.12
scaling/brusselator_2d/32x32 7.24 s 7.3 s 0.992
scaling/brusselator_2d/8x8 11.4 ± 0.47 ms 11.3 ± 0.32 ms 1 ± 0.05
scaling/linear/N10 0.0375 ± 0.018 ms 0.0382 ± 0.018 ms 0.984 ± 0.66
scaling/linear/N100 0.813 ± 0.027 ms 0.818 ± 0.021 ms 0.995 ± 0.042
scaling/linear/N50 0.217 ± 0.013 ms 0.221 ± 0.012 ms 0.983 ± 0.077
stiff/pollution/FBDF 0.598 ± 0.014 ms 0.598 ± 0.015 ms 1 ± 0.034
stiff/pollution/KenCarp4 0.55 ± 0.012 ms 0.549 ± 0.011 ms 1 ± 0.03
stiff/pollution/Rodas4 0.769 ± 0.014 ms 0.768 ± 0.016 ms 1 ± 0.027
stiff/pollution/Rosenbrock23 1.38 ± 0.024 ms 1.38 ± 0.032 ms 0.999 ± 0.029
stiff/pollution/TRBDF2 0.512 ± 0.011 ms 0.514 ± 0.012 ms 0.996 ± 0.032
stiff/rober/FBDF 0.603 ± 0.011 ms 0.604 ± 0.011 ms 0.998 ± 0.025
stiff/rober/KenCarp4 0.796 ± 0.02 ms 0.762 ± 0.02 ms 1.04 ± 0.038
stiff/rober/Rodas4 0.397 ± 0.0099 ms 0.399 ± 0.011 ms 0.996 ± 0.037
stiff/rober/Rosenbrock23 0.262 ± 0.01 ms 0.267 ± 0.0097 ms 0.98 ± 0.052
stiff/rober/TRBDF2 1.73 ± 0.018 ms 1.75 ± 0.018 ms 0.991 ± 0.015
stiff/van_der_pol/FBDF 9.81 ± 0.088 ms 9.82 ± 0.16 ms 0.999 ± 0.019
stiff/van_der_pol/KenCarp4 4.96 ± 0.067 ms 4.84 ± 0.063 ms 1.02 ± 0.019
stiff/van_der_pol/Rodas4 7.07 ± 0.086 ms 7.42 ± 0.08 ms 0.952 ± 0.015
stiff/van_der_pol/Rosenbrock23 20.5 ± 0.2 ms 20.5 ± 0.23 ms 0.998 ± 0.015
stiff/van_der_pol/TRBDF2 3.01 ± 0.082 ms 3.02 ± 0.079 ms 0.997 ± 0.038
time_to_load 3.17 ± 0.05 s 3.21 ± 0.034 s 0.986 ± 0.019
Memory benchmarks
master 9ae0860... master / 9ae0860...
construction/linear_N50 0.071 k allocs: 0.0411 MB 0.071 k allocs: 0.0411 MB 1
construction/lotka_volterra 0.065 k allocs: 2.45 kB 0.065 k allocs: 2.45 kB 1
construction/rober 0.065 k allocs: 2.42 kB 0.065 k allocs: 2.42 kB 1
nonstiff/fitzhugh_nagumo/BS3 3.67 k allocs: 0.163 MB 3.67 k allocs: 0.163 MB 1
nonstiff/fitzhugh_nagumo/DP5 2.62 k allocs: 0.126 MB 2.62 k allocs: 0.126 MB 1
nonstiff/fitzhugh_nagumo/Tsit5 3.98 k allocs: 0.181 MB 3.98 k allocs: 0.181 MB 1
nonstiff/fitzhugh_nagumo/Vern6 4.52 k allocs: 0.207 MB 4.52 k allocs: 0.207 MB 1
nonstiff/fitzhugh_nagumo/Vern7 3.89 k allocs: 0.165 MB 3.89 k allocs: 0.165 MB 1
nonstiff/lotka_volterra/BS3 7.86 k allocs: 0.365 MB 7.86 k allocs: 0.365 MB 1
nonstiff/lotka_volterra/DP5 1.2 k allocs: 0.0536 MB 1.2 k allocs: 0.0536 MB 1
nonstiff/lotka_volterra/Tsit5 2.16 k allocs: 0.0924 MB 2.16 k allocs: 0.0924 MB 1
nonstiff/lotka_volterra/Vern6 2.23 k allocs: 0.0979 MB 2.23 k allocs: 0.0979 MB 1
nonstiff/lotka_volterra/Vern7 1.64 k allocs: 0.073 MB 1.64 k allocs: 0.073 MB 1
nonstiff/pleiades/BS3 0.685 M allocs: 0.0675 GB 0.685 M allocs: 0.0675 GB 1
nonstiff/pleiades/DP5 6.63 k allocs: 0.51 MB 6.63 k allocs: 0.51 MB 1
nonstiff/pleiades/Tsit5 0.186 M allocs: 20.4 MB 0.186 M allocs: 20.4 MB 1
nonstiff/pleiades/Vern6 0.038 G allocs: 4.05 GB 0.038 G allocs: 4.05 GB 1
nonstiff/pleiades/Vern7 0.044 G allocs: 4.63 GB 0.044 G allocs: 4.63 GB 1
scaling/brusselator_2d/16x16 3.57 k allocs: 0.152 GB 3.57 k allocs: 0.152 GB 1
scaling/brusselator_2d/32x32 3.39 k allocs: 2.22 GB 3.39 k allocs: 2.22 GB 1
scaling/brusselator_2d/8x8 2.62 k allocs: 8.76 MB 2.62 k allocs: 8.76 MB 1
scaling/linear/N10 0.752 k allocs: 0.0514 MB 0.752 k allocs: 0.0514 MB 1
scaling/linear/N100 2.27 k allocs: 0.901 MB 2.27 k allocs: 0.901 MB 1
scaling/linear/N50 1.66 k allocs: 0.348 MB 1.66 k allocs: 0.348 MB 1
stiff/pollution/FBDF 1.5 k allocs: 0.288 MB 1.5 k allocs: 0.288 MB 1
stiff/pollution/KenCarp4 0.508 k allocs: 0.134 MB 0.508 k allocs: 0.134 MB 1
stiff/pollution/Rodas4 1.2 k allocs: 0.36 MB 1.2 k allocs: 0.36 MB 1
stiff/pollution/Rosenbrock23 2.58 k allocs: 0.814 MB 2.58 k allocs: 0.814 MB 1
stiff/pollution/TRBDF2 0.779 k allocs: 0.168 MB 0.779 k allocs: 0.168 MB 1
stiff/rober/FBDF 2.87 k allocs: 0.136 MB 2.87 k allocs: 0.136 MB 1
stiff/rober/KenCarp4 1.28 k allocs: 0.0565 MB 1.28 k allocs: 0.0565 MB 1
stiff/rober/Rodas4 1.95 k allocs: 0.0984 MB 1.95 k allocs: 0.0984 MB 1
stiff/rober/Rosenbrock23 2.23 k allocs: 0.108 MB 2.23 k allocs: 0.108 MB 1
stiff/rober/TRBDF2 7.7 k allocs: 0.359 MB 7.7 k allocs: 0.359 MB 1
stiff/van_der_pol/FBDF 0.0444 M allocs: 2.2 MB 0.0444 M allocs: 2.2 MB 1
stiff/van_der_pol/KenCarp4 3.91 k allocs: 0.173 MB 3.91 k allocs: 0.173 MB 1
stiff/van_der_pol/Rodas4 0.0335 M allocs: 1.73 MB 0.0335 M allocs: 1.73 MB 1
stiff/van_der_pol/Rosenbrock23 0.189 M allocs: 8.37 MB 0.189 M allocs: 8.37 MB 1
stiff/van_der_pol/TRBDF2 4.64 k allocs: 0.201 MB 4.64 k allocs: 0.201 MB 1
time_to_load 0.159 k allocs: 11.2 kB 0.159 k allocs: 11.2 kB 1

@ChrisRackauckas
Copy link
Member Author

Spot benchmarks with this:

using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using NonlinearSolve, OrdinaryDiffEqBDF, OrdinaryDiffEqSDIRK, DiffEqDevTools
using OrdinaryDiffEqNonlinearSolve: NonlinearSolveAlg
using Test
using BenchmarkTools

@parameters k₁ k₂ k₃
@variables y₁(t) y₂(t) y₃(t)

eqs = [D(y₁) ~ -k₁ * y₁ + k₃ * y₂ * y₃,
    D(y₂) ~ k₁ * y₁ - k₂ * y₂^2 - k₃ * y₂ * y₃,
    D(y₃) ~ k₂ * y₂^2]
@mtkcompile rober = ODESystem(eqs, t)
prob = ODEProblem(rober, [[y₁, y₂, y₃] .=> [1.0; 0.0; 0.0]; [k₁, k₂, k₃] .=> (0.04, 3e7, 1e4)], (0.0, 1e5), jac = true)
prob2 = ODEProblem(rober, [[y₁, y₂, y₃] .=> [1.0; 0.0; 0.0]; [k₁, k₂, k₃] .=> (0.04, 3e7, 1e4)], (0.0, 1e5), jac = true, nlstep = true)

nlalg = NonlinearSolveAlg(NewtonRaphson(autodiff = AutoFiniteDiff()));
nlalgrobust = NonlinearSolveAlg(TrustRegion(autodiff = AutoFiniteDiff()));
sol2 = solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg));

@profview for i in 1:10000 solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); end

sol = solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg));
using Plots
plot(sol)

@btime solve(prob, FBDF(autodiff=AutoFiniteDiff()); save_everystep=false); # 132.333 μs (211 allocations: 16.17 KiB)
@btime solve(prob, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 148.459 μs (254 allocations: 21.77 KiB)
@btime solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 143.791 μs (250 allocations: 23.58 KiB)

@btime solve(prob, KenCarp4(autodiff=AutoFiniteDiff()); save_everystep=false); # 170.334 μs (193 allocations: 14.81 KiB)
@btime solve(prob, KenCarp4(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 168.083 μs (230 allocations: 20.12 KiB)
@btime solve(prob2, KenCarp4(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 135.625 μs (238 allocations: 23.08 KiB)

using Sundials
@btime solve(prob, CVODE_BDF(); save_everystep=false); # 72.958 μs (1289 allocations: 45.50 KiB)

using OrdinaryDiffEqRosenbrock
@btime solve(prob2, Rosenbrock23(autodiff=AutoFiniteDiff()); save_everystep=false); # 37.041 μs (179 allocations: 17.44 KiB)
@btime solve(prob2, Rodas5P(autodiff=AutoFiniteDiff()); save_everystep=false); # 84.708 μs (195 allocations: 19.72 KiB)

The vast majority of what's left vs CVODE here is just BDF overhead on small equations. We've never really worried about that too much because it's the domain of the Rosenbrock methods anyways, but with this new feature it might matter more now and we should actually improve the FBDF non-nonlinearsolver parts 😅

@oscardssmith
Copy link
Member

This is now working locally:

using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using LinearAlgebra, SparseArrays

const N = 8
const xyd_brusselator = range(0, stop = 1, length = N)
brusselator_f(x, y, t) = (((x - 0.3)^2 + (y - 0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.0
limit(a, N) = a == N + 1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
    A, B, alpha, dx = p
    alpha = alpha / dx^2
    @inbounds for I in CartesianIndices((N, N))
        i, j = Tuple(I)
        x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
        ip1, im1, jp1, jm1 = limit(i + 1, N), limit(i - 1, N), limit(j + 1, N),
        limit(j - 1, N)
        du[i, j, 1] = alpha * (u[im1, j, 1] + u[ip1, j, 1] + u[i, jp1, 1] + u[i, jm1, 1] -
                       4u[i, j, 1]) +
                      B + u[i, j, 1]^2 * u[i, j, 2] - (A + 1) * u[i, j, 1] +
                      brusselator_f(x, y, t)
        du[i, j, 2] = alpha * (u[im1, j, 2] + u[ip1, j, 2] + u[i, jp1, 2] + u[i, jm1, 2] -
                       4u[i, j, 2]) +
                      A * u[i, j, 1] - u[i, j, 1]^2 * u[i, j, 2]
    end
end
p = (3.4, 1.0, 10.0, step(xyd_brusselator))

function init_brusselator_2d(xyd)
    N = length(xyd)
    u = zeros(N, N, 2)
    for I in CartesianIndices((N, N))
        x = xyd[I[1]]
        y = xyd[I[2]]
        u[I, 1] = 22 * (y * (1 - y))^(3 / 2)
        u[I, 2] = 27 * (x * (1 - x))^(3 / 2)
    end
    u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob = ODEProblem(brusselator_2d_loop, u0, (0.0, 11.5), p)

@mtkcompile sys = modelingtoolkitize(prob)
prob_mtk = ODEProblem(sys, [], (0.0, 11.5), nlstep=true, nlstep_compile=false)


nlalg = NonlinearSolveAlg(NewtonRaphson(autodiff = AutoFiniteDiff()))
using NonlinearSolve,OrdinaryDiffEqBDF, OrdinaryDiffEqNonlinearSolve
using OrdinaryDiffEqNonlinearSolve: NonlinearSolveAlg

nlalg = NonlinearSolveAlg(TrustRegion(autodiff = AutoFiniteDiff()))
@btime sol = solve(prob_mtk, FBDF(autodiff=false, nlsolve = nlalg)).stats

@ChrisRackauckas
Copy link
Member Author

nlstep tests are failing.

@ChrisRackauckas ChrisRackauckas mentioned this pull request Aug 10, 2025
11 tasks
@oscardssmith
Copy link
Member

Rebased with compat bump. Tests look good locally. Lets see if CI agrees.

@oscardssmith
Copy link
Member

CI looks good. @ChrisRackauckas I think this is ready.

@ChrisRackauckas ChrisRackauckas merged commit 08e5b34 into master Oct 3, 2025
267 of 357 checks passed
@ChrisRackauckas ChrisRackauckas deleted the reinit branch October 3, 2025 07:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants