Commit 8cf0475

cleanup
1 parent 1f32104 commit 8cf0475

5 files changed: +6 −2366 lines changed


docs/src/training/optimisers.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ opt = OptimiserChain(WeightDecay(1e-4), Descent())
 ```
 
 Here we apply the weight decay to the `Descent` optimiser.
-The resultin optimser `opt` can be used as any optimiser.
+The resulting optimiser `opt` can be used as any optimiser.
 
 ```julia
 w = [randn(10, 10), randn(10, 10)]
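
For context on the hunk above: a minimal sketch of how a chained optimiser like this is typically used with Flux's explicit-style API. The model, data, and loss below are illustrative, not part of the commit.

```julia
using Flux  # OptimiserChain, WeightDecay, Descent are re-exported from Optimisers.jl

# Weight decay composed with plain gradient descent, as in the docs snippet.
opt = OptimiserChain(WeightDecay(1e-4), Descent(0.1))

# Illustrative model and data.
model = Dense(10 => 1)
x, y = randn(Float32, 10, 32), randn(Float32, 1, 32)

opt_state = Flux.setup(opt, model)                    # per-parameter optimiser state
grads = Flux.gradient(m -> Flux.mse(m(x), y), model)  # explicit gradient w.r.t. the model
Flux.update!(opt_state, model, grads[1])              # one training step
```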

docs/src/training/reference.md

Lines changed: 0 additions & 2 deletions
@@ -60,8 +60,6 @@ See the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for det
 
 ```@docs
 Flux.params
-Flux.update!(opt::Flux.Optimise.AbstractOptimiser, xs::AbstractArray, gs)
-Flux.train!(loss, ps::Flux.Params, data, opt::Flux.Optimise.AbstractOptimiser; cb)
 ```
 
 ## Callbacks

docs/src/training/training.md

Lines changed: 5 additions & 5 deletions
@@ -117,13 +117,13 @@ fmap(model, grads[1]) do p, g
 end
 ```
 
-A slightly more refined version of this loop to update all the parameters is wrapped up as a function [`update!`](@ref Flux.Optimise.update!)`(opt_state, model, grads[1])`.
-And the learning rate is the only thing stored in the [`Descent`](@ref Flux.Optimise.Descent) struct.
+A slightly more refined version of this loop to update all the parameters is wrapped up as a function [`update!`](@ref)`(opt_state, model, grads[1])`.
+And the learning rate is the only thing stored in the [`Descent`](@ref) struct.
 
 However, there are many other optimisation rules, which adjust the step size and
 direction in various clever ways.
 Most require some memory of the gradients from earlier steps, rather than always
-walking straight downhill -- [`Momentum`](@ref Flux.Optimise.Momentum) is the simplest.
+walking straight downhill -- [`Momentum`](@ref) is the simplest.
 The function [`setup`](@ref Flux.Train.setup) creates the necessary storage for this, for a particular model.
 It should be called once, before training, and returns a tree-like object which is the
 first argument of `update!`. Like this:
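
Aside from the diff: a minimal sketch of the `setup`/`update!` pattern this hunk refers to, using an illustrative model and the `Momentum` rule. None of this code is part of the commit itself.

```julia
using Flux

model = Dense(2 => 1)                          # illustrative model
opt_state = Flux.setup(Momentum(0.01), model)  # called once, before training

x, y = rand(Float32, 2, 16), rand(Float32, 1, 16)
grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
Flux.update!(opt_state, model, grads[1])       # mutates model and opt_state
```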
@@ -140,7 +140,7 @@ for data in train_set
 end
 ```
 
-Many commonly-used optimisation rules, such as [`Adam`](@ref Flux.Optimise.Adam), are built-in.
+Many commonly-used optimisation rules, such as [`Adam`](@ref), are built-in.
 These are listed on the [optimisers](@ref man-optimisers) page.
 
 !!! compat "Implicit-style optimiser state"
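
Similarly illustrative, and not from the commit: the built-in rules such as `Adam` slot into the same explicit-style workflow, here driven by `Flux.train!` over a dataset of batches.

```julia
using Flux

model = Dense(2 => 1)
opt_state = Flux.setup(Adam(3e-4), model)

# train_set is assumed to be an iterator of (x, y) batches.
train_set = [(rand(Float32, 2, 16), rand(Float32, 1, 16)) for _ in 1:10]

Flux.train!(model, train_set, opt_state) do m, x, y
    Flux.mse(m(x), y)   # loss evaluated on each batch
end
```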
@@ -325,7 +325,7 @@ After that, in either case, [`Adam`](@ref Flux.Adam) computes the final update.
 The same trick works for *L₁ regularisation* (also called Lasso), where the penalty is
 `pen_l1(x::AbstractArray) = sum(abs, x)` instead. This is implemented by `SignDecay(0.42)`.
 
-The same `OptimiserChain` mechanism can be used for other purposes, such as gradient clipping with [`ClipGrad`](@ref Flux.Optimise.ClipValue) or [`ClipNorm`](@ref Flux.Optimise.ClipNorm).
+The same `OptimiserChain` mechanism can be used for other purposes, such as gradient clipping with [`ClipGrad`](@ref) or [`ClipNorm`](@ref).
 
 Besides L1 / L2 / weight decay, another common and quite different kind of regularisation is
 provided by the [`Dropout`](@ref Flux.Dropout) layer. This turns off some outputs of the
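
One more illustrative sketch, not part of the commit: combining L₁ decay and gradient clipping through the same `OptimiserChain` mechanism mentioned in the hunk above.

```julia
using Flux

# SignDecay adds an L1 penalty's gradient; ClipNorm rescales over-large gradients.
opt = OptimiserChain(SignDecay(0.42), ClipNorm(1.0), Adam(1e-3))

model = Dense(4 => 2)                  # illustrative model
opt_state = Flux.setup(opt, model)

x, y = rand(Float32, 4, 8), rand(Float32, 2, 8)
grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
Flux.update!(opt_state, model, grads[1])
```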

error.jl

Lines changed: 0 additions & 8 deletions
This file was deleted.
