Update the default parameter value for DoG and DoWG. (#199)
* update defaults of the optimizers to make them more invariant to dimension
* update history
* make notation more consistent in docs for parameter-free rules
* run formatter
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
src/optimization/rules.jl (20 additions & 13 deletions)
@@ -1,19 +1,24 @@
 
 """
-    DoWG(repsilon)
+    DoWG(alpha)
 
 Distance over weighted gradient (DoWG[^KMJ2024]) optimizer.
-It's only parameter is the initial guess of the Euclidean distance to the optimum repsilon.
+Its only parameter is the guess for the distance between the optimum and the initialization `alpha`, which shouldn't need much tuning.
+
+DoWG is a minor modification of DoG such that its step sizes are always provably larger than DoG's.
+Similarly to DoG, it works by starting from an AdaGrad-like update rule with a small step size, but then automatically increases the step size ("warming up") to be as large as possible.
+If `alpha` is too large, the optimizer can initially diverge, while if it is too small, the warm-up period can be too long.
+Depending on the problem, DoWG can be too aggressive and result in unstable behavior.
+If this is suspected, try using DoG instead.
 
 # Parameters
-- `repsilon`: Initial guess of the Euclidean distance between the initial point and
-  the optimum. (default value: `1e-6`)
+- `alpha`: Scaling factor for the initial guess (`repsilon` in the original paper) of the Euclidean distance between the initial point and the optimum. For the initial parameter `lambda0`, `repsilon` is calculated as `repsilon = alpha*(1 + norm(lambda0))`. (default value: `1e-6`)
 """
 Optimisers.@defstruct DoWG <:Optimisers.AbstractRule
-    repsilon=1e-6
+    alpha=1e-6
 end
 
-Optimisers.init(o::DoWG, x::AbstractArray{T}) where {T} = (copy(x), zero(T), T(o.repsilon))
+Optimisers.init(o::DoWG, x::AbstractArray{T}) where {T} = (copy(x), zero(T), T(o.alpha)*(1 + norm(x)))
 
 function Optimisers.apply!(::DoWG, state, x::AbstractArray{T}, dx) where {T}
     x0, v, r = state
@@ -27,20 +32,22 @@ function Optimisers.apply!(::DoWG, state, x::AbstractArray{T}, dx) where {T}
 end
 
 """
-    DoG(repsilon)
+    DoG(alpha)
 
 Distance over gradient (DoG[^IHC2023]) optimizer.
-It's only parameter is the initial guess of the Euclidean distance to the optimum repsilon.
-The original paper recommends \$ 10^{-4} ( 1 + \\lVert \\lambda_0 \\rVert ) \$, but the default value is \$ 10^{-6} \$.
+Its only parameter is the guess for the distance between the optimum and the initialization `alpha`, which shouldn't need much tuning.
+
+DoG works by starting from an AdaGrad-like update rule with a small step size, but then automatically increases the step size ("warming up") to be as large as possible.
+If `alpha` is too large, the optimizer can initially diverge, while if it is too small, the warm-up period can be too long.
 
 # Parameters
-- `repsilon`: Initial guess of the Euclidean distance between the initial point and the optimum. (default value: `1e-6`)
+- `alpha`: Scaling factor for the initial guess (`repsilon` in the original paper) of the Euclidean distance between the initial point and the optimum. For the initial parameter `lambda0`, `repsilon` is calculated as `repsilon = alpha*(1 + norm(lambda0))`. (default value: `1e-6`)
 """
 Optimisers.@defstruct DoG <:Optimisers.AbstractRule
-    repsilon=1e-6
+    alpha=1e-6
 end
 
-Optimisers.init(o::DoG, x::AbstractArray{T}) where {T} = (copy(x), zero(T), T(o.repsilon))
+Optimisers.init(o::DoG, x::AbstractArray{T}) where {T} = (copy(x), zero(T), T(o.alpha)*(1 + norm(x)))
 
 function Optimisers.apply!(::DoG, state, x::AbstractArray{T}, dx) where {T}
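For readers unfamiliar with these parameter-free rules, the warm-up behavior the docstrings describe can be sketched in plain Python. This is an illustrative translation, not the package's Julia implementation: the scalar specialization, the function names `dog`/`dowg`, the quadratic objective, and the step count are all invented for the example; only the update rules and the `alpha*(1 + norm(x0))` initialization mirror the diff above.

```python
import math

def dog(grad, x0, alpha=1e-6, steps=500):
    """Distance-over-gradient (DoG) sketch for a scalar problem."""
    x = x0
    r = alpha * (1.0 + abs(x0))  # initial distance guess, repsilon = alpha*(1 + norm(x0))
    sum_sq = 0.0                 # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        r = max(r, abs(x - x0))  # largest distance travelled from the initialization
        sum_sq += g * g
        x -= (r / math.sqrt(sum_sq)) * g  # step size "warms up" as r grows
    return x

def dowg(grad, x0, alpha=1e-6, steps=500):
    """DoWG sketch: weights each squared gradient by r^2, giving larger steps than DoG."""
    x = x0
    r = alpha * (1.0 + abs(x0))
    v = 0.0                      # weighted sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        r = max(r, abs(x - x0))
        v += r * r * g * g
        x -= (r * r / math.sqrt(v)) * g
    return x

# Despite the tiny initial step size, both rules recover the minimizer of
# f(x) = (x - 3)^2 by automatically growing the step size.
grad = lambda x: 2.0 * (x - 3.0)
x_dog, x_dowg = dog(grad, 0.0), dowg(grad, 0.0)
```

Note how a too-small `alpha` only lengthens the warm-up phase (more iterations spent growing `r`), whereas a too-large `alpha` inflates the very first steps, which is the divergence risk the docstrings warn about.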