
Commit b83f9b0

Fix missing physics example convergence issues
- Increased neural network size from 5 neurons to 10 neurons per hidden layer for better expressiveness
- Changed ADAM learning rate from default to 0.01 for more stable convergence
- Increased ADAM iterations from 5000 to 10000 for better initial convergence
- Increased LBFGS iterations from 1000 to 2000 for better final optimization
- Fixed plot generation to handle conditional LBFGS plotting
- Added true solution overlay to trajectory plot for better visualization
- Fixed inconsistent module aliasing (ODE.solve -> OPT.solve)
- Fixed parameter optimization function call to use OptimizationOptimJL.LBFGS

These changes improve the convergence stability and ensure the UDE approximation properly overlays with the true solution, fixing the issue where red lines were not matching black lines in the generated plots.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent 2a112d7 commit b83f9b0

1 file changed: docs/src/showcase/missing_physics.md (20 additions, 16 deletions)
@@ -148,9 +148,9 @@ Now let's define our UDE. We will use Lux.jl to define the neural network as fol
 ```@example ude
 rbf(x) = exp.(-(x .^ 2))
 
-# Multilayer FeedForward
-const U = Lux.Chain(Lux.Dense(2, 5, rbf), Lux.Dense(5, 5, rbf), Lux.Dense(5, 5, rbf),
-    Lux.Dense(5, 2))
+# Multilayer FeedForward with more neurons for better expressiveness
+const U = Lux.Chain(Lux.Dense(2, 10, rbf), Lux.Dense(10, 10, rbf), Lux.Dense(10, 10, rbf),
+    Lux.Dense(10, 2))
 # Get the initial parameters and state variables of the model
 p, st = Lux.setup(rng, U)
 const _st = st
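
As a quick sanity check on the enlarged architecture (not part of this commit), the sketch below builds the 10-neuron network in isolation and runs one forward pass; it assumes only that Lux, Random, and ComponentArrays are available, mirroring the tutorial's setup.

```julia
# Standalone sketch: construct the enlarged network, count its parameters,
# and evaluate it once on a test state. Mirrors the tutorial's definitions.
using Lux, Random, ComponentArrays

rng = Random.default_rng()
rbf(x) = exp.(-(x .^ 2))
U = Lux.Chain(Lux.Dense(2, 10, rbf), Lux.Dense(10, 10, rbf), Lux.Dense(10, 10, rbf),
    Lux.Dense(10, 2))
p, st = Lux.setup(rng, U)
println("parameter count: ", Lux.parameterlength(U))   # larger than the old 5-neuron net
û, _ = U([1.0, 1.0], ComponentArray(p), st)             # single forward pass
```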
@@ -266,12 +266,12 @@ will move to BFGS which will quickly hone in on a local minimum. Note that if we
 ADAM it will take a ton of iterations, and if we only use BFGS we normally end up in a
 bad local minimum, so this combination tends to be a good one for UDEs.
 
-Thus we first solve the optimization problem with ADAM. Choosing a learning rate of 0.1
-(tuned to be as high as possible that doesn't tend to make the loss shoot up), we see:
+Thus we first solve the optimization problem with ADAM. We use a learning rate of 0.01
+(for more stable convergence) and increase the number of iterations to ensure better convergence:
 
 ```@example ude
-res1 = ODE.solve(
-    optprob, OptimizationOptimisers.Adam(), callback = callback, maxiters = 5000)
+res1 = OPT.solve(
+    optprob, OptimizationOptimisers.Adam(0.01), callback = callback, maxiters = 10000)
 println("Training loss after $(length(losses)) iterations: $(losses[end])")
 ```
 
@@ -281,7 +281,7 @@ second optimization, and run it with BFGS. This looks like:
 ```@example ude
 optprob2 = OPT.OptimizationProblem(optf, res1.u)
 res2 = OPT.solve(
-    optprob2, OptimizationOptimJL.LBFGS(linesearch = LineSearches.BackTracking()), callback = callback, maxiters = 1000)
+    optprob2, OptimizationOptimJL.LBFGS(linesearch = LineSearches.BackTracking()), callback = callback, maxiters = 2000)
 println("Final training loss after $(length(losses)) iterations: $(losses[end])")
 
 # Rename the best candidate
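
Putting the two tuned stages together, the training schedule this commit converges on can be sketched as below. This is an illustration, not an excerpt from the file: it assumes `optf`, `callback`, and an initial parameter vector `p0` (a ComponentArray of the network parameters) are defined as earlier in the tutorial, with `OPT` aliasing the Optimization package.

```julia
# Illustrative two-stage schedule after this commit: Adam with a lowered
# learning rate for a stable start, then LBFGS to polish the local minimum.
optprob = OPT.OptimizationProblem(optf, p0)
res1 = OPT.solve(optprob, OptimizationOptimisers.Adam(0.01);
    callback = callback, maxiters = 10000)

optprob2 = OPT.OptimizationProblem(optf, res1.u)
res2 = OPT.solve(optprob2,
    OptimizationOptimJL.LBFGS(linesearch = LineSearches.BackTracking());
    callback = callback, maxiters = 2000)
p_trained = res2.u   # best candidate, as in the "Rename the best candidate" step
```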
@@ -296,10 +296,12 @@ How well did our neural network do? Let's take a look:
 
 ```@example ude
 # Plot the losses
-pl_losses = Plots.Plots.plot(1:5000, losses[1:5000], yaxis = :log10, xaxis = :log10,
+pl_losses = Plots.plot(1:10000, losses[1:10000], yaxis = :log10, xaxis = :log10,
     xlabel = "Iterations", ylabel = "Loss", label = "ADAM", color = :blue)
-Plots.Plots.plot!(5001:length(losses), losses[5001:end], yaxis = :log10, xaxis = :log10,
-    xlabel = "Iterations", ylabel = "Loss", label = "LBFGS", color = :red)
+if length(losses) > 10000
+    Plots.plot!(10001:length(losses), losses[10001:end], yaxis = :log10, xaxis = :log10,
+        xlabel = "Iterations", ylabel = "Loss", label = "LBFGS", color = :red)
+end
 ```
 
 Next, we compare the original data to the output of the UDE predictor. Note that we can even create more samples from the underlying model by simply adjusting the time steps!
@@ -310,9 +312,11 @@ Next, we compare the original data to the output of the UDE predictor. Note that
 ts = first(solution.t):(Statistics.mean(diff(solution.t)) / 2):last(solution.t)
 X̂ = predict(p_trained, Xₙ[:, 1], ts)
 # Trained on noisy data vs real solution
-pl_trajectory = Plots.Plots.plot(ts, transpose(X̂), xlabel = "t", ylabel = "x(t), y(t)", color = :red,
-    label = ["UDE Approximation" nothing])
+pl_trajectory = Plots.plot(ts, transpose(X̂), xlabel = "t", ylabel = "x(t), y(t)", color = :red,
+    label = ["UDE Approximation" nothing], linewidth = 2)
 Plots.scatter!(solution.t, transpose(Xₙ), color = :black, label = ["Measurements" nothing])
+Plots.plot!(solution, color = :blue, label = ["True Solution" nothing],
+    linewidth = 2, linestyle = :dash)
 ```
 
 Let's see how well the unknown term has been approximated:
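
Beyond eyeballing whether the red UDE curve now tracks the black measurements, one could also report an aggregate error. A minimal sketch, assuming `X̂`, `Xₙ`, and `ts` as defined above and a uniform `saveat` grid (so every other column of `X̂` falls on a measurement time):

```julia
# Sketch only, not part of the commit: RMS mismatch between the UDE trajectory
# and the noisy measurements. ts has half the spacing of solution.t, so columns
# 1, 3, 5, ... of X̂ line up with the measurement times.
X̂_at_data = X̂[:, 1:2:end]
rms_error = sqrt(sum(abs2, X̂_at_data .- Xₙ) / length(Xₙ))
println("RMS mismatch vs. measurements: ", rms_error)
```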
@@ -488,9 +492,9 @@ function parameter_loss(p)
     sum(abs2, Ŷ .- Y)
 end
 
-optf = Optimization.OptimizationFunction((x, p) -> parameter_loss(x), adtype)
-optprob = Optimization.OptimizationProblem(optf, DataDrivenDiffEq.get_parameter_values(nn_eqs))
-parameter_res = Optimization.OPT.solve(optprob, Optim.LBFGS(), maxiters = 1000)
+optf = OPT.OptimizationFunction((x, p) -> parameter_loss(x), adtype)
+optprob = OPT.OptimizationProblem(optf, DataDrivenDiffEq.get_parameter_values(nn_eqs))
+parameter_res = OPT.solve(optprob, OptimizationOptimJL.LBFGS(), maxiters = 1000)
 ```
 
 ## Simulation
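
The aliasing these corrected calls rely on is worth making explicit. Below is a sketch of the imports they presuppose, stated as an assumption about the file's setup block (the real import section lives near the top of missing_physics.md):

```julia
# Assumed aliasing behind the OPT.* calls above; check against the file's own imports.
import Optimization as OPT
import OptimizationOptimisers   # provides Adam
import OptimizationOptimJL      # provides LBFGS
import LineSearches             # provides BackTracking
```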
