docs/src/training/training.md: 5 additions & 5 deletions
@@ -117,13 +117,13 @@ fmap(model, grads[1]) do p, g
 end
 ```
 
-A slightly more refined version of this loop to update all the parameters is wrapped up as a function [`update!`](@ref Flux.Optimise.update!)`(opt_state, model, grads[1])`.
-And the learning rate is the only thing stored in the [`Descent`](@ref Flux.Optimise.Descent) struct.
+A slightly more refined version of this loop to update all the parameters is wrapped up as a function [`update!`](@ref)`(opt_state, model, grads[1])`.
+And the learning rate is the only thing stored in the [`Descent`](@ref) struct.
 
 However, there are many other optimisation rules, which adjust the step size and
 direction in various clever ways.
 Most require some memory of the gradients from earlier steps, rather than always
-walking straight downhill -- [`Momentum`](@ref Flux.Optimise.Momentum) is the simplest.
+walking straight downhill -- [`Momentum`](@ref) is the simplest.
 The function [`setup`](@ref Flux.Train.setup) creates the necessary storage for this, for a particular model.
 It should be called once, before training, and returns a tree-like object which is the
 first argument of `update!`. Like this:
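As a quick illustration of the `setup`/`update!` lines touched in this hunk, here is a minimal sketch of the explicit-style API; the model, data, and loss below are hypothetical placeholders, assuming a recent Flux where `Flux.setup` and `Flux.update!` are available:

```julia
using Flux

# Hypothetical toy model and batch, only to make the sketch runnable.
model = Dense(3 => 2)
x, y = randn(Float32, 3, 5), randn(Float32, 2, 5)

grads = Flux.gradient(m -> Flux.mse(m(x), y), model)

opt_state = Flux.setup(Momentum(0.01), model)  # called once; stores momentum per parameter
Flux.update!(opt_state, model, grads[1])       # one optimisation step, in place
```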
@@ -140,7 +140,7 @@ for data in train_set
 end
 ```
 
-Many commonly-used optimisation rules, such as [`Adam`](@ref Flux.Optimise.Adam), are built-in.
+Many commonly-used optimisation rules, such as [`Adam`](@ref), are built-in.
 These are listed on the [optimisers](@ref man-optimisers) page.
 
 !!! compat "Implicit-style optimiser state"
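For the training loop this hunk refers to, a sketch with a built-in rule such as `Adam` swapped in; the model and `train_set` are made-up stand-ins:

```julia
using Flux

model = Dense(2 => 1)  # hypothetical model
train_set = [(randn(Float32, 2, 8), randn(Float32, 1, 8)) for _ in 1:10]  # fake data

opt_state = Flux.setup(Adam(0.001), model)  # Adam keeps per-parameter moment estimates

for (x, y) in train_set
    grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
    Flux.update!(opt_state, model, grads[1])
end
```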
@@ -325,7 +325,7 @@ After that, in either case, [`Adam`](@ref Flux.Adam) computes the final update.
 The same trick works for *L₁ regularisation* (also called Lasso), where the penalty is
 `pen_l1(x::AbstractArray) = sum(abs, x)` instead. This is implemented by `SignDecay(0.42)`.
 
-The same `OptimiserChain` mechanism can be used for other purposes, such as gradient clipping with [`ClipGrad`](@ref Flux.Optimise.ClipValue) or [`ClipNorm`](@ref Flux.Optimise.ClipNorm).
+The same `OptimiserChain` mechanism can be used for other purposes, such as gradient clipping with [`ClipGrad`](@ref) or [`ClipNorm`](@ref).
 
 Besides L1 / L2 / weight decay, another common and quite different kind of regularisation is
 provided by the [`Dropout`](@ref Flux.Dropout) layer. This turns off some outputs of the
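To illustrate the `OptimiserChain` line changed in this hunk, one possible composition of gradient clipping or an L₁ penalty with `Adam`; the model is again a placeholder:

```julia
using Flux

model = Dense(4 => 2)  # hypothetical model

# Clip each gradient's norm to 1.0 before Adam applies its update.
opt_state = Flux.setup(OptimiserChain(ClipNorm(1.0), Adam(0.001)), model)

# Or add an L₁ penalty (Lasso) through the rule instead of the loss.
opt_l1 = Flux.setup(OptimiserChain(SignDecay(0.42), Adam(0.001)), model)
```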