
Commit 8167bf4

pkofod authored and avik-pal committed
Add docs for adam and adamax (JuliaNLSolvers#1072)
* Add docs for adam and adamax
* Update make.jl
1 parent: 60bfeea

3 files changed: +25 −6 lines


docs/make.jl

Lines changed: 1 addition & 0 deletions
```diff
@@ -40,6 +40,7 @@ makedocs(
             "Particle Swarm" => "algo/particle_swarm.md",
         ],
         "Gradient Required" => [
+            "Adam and AdaMax" => "algo/adam_adamax.md",
             "Conjugate Gradient" => "algo/cg.md",
             "Gradient Descent" => "algo/gradientdescent.md",
             "(L-)BFGS" => "algo/lbfgs.md",
```

docs/src/algo/adam_adamax.md

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
# Adam and AdaMax
2+
This page contains information about Adam and AdaMax.
3+
## Constructors
4+
```julia
5+
Adam(; alpha=0.0001,
6+
beta_mean=0.9,
7+
beta_var=0.999,
8+
epsilon=1e-8)
9+
```
10+
11+
where `alpha` is the step length or learning parameter. `beta_mean` and `beta_var` are exponential decay parameters for the first and second moments estimates. Setting these closer to 0 will cause past iterates to matter less for the current steps and setting them closer to 1 means emphasizing past iterates more. `epsilon` should rarely be changed, and just exists to avoid a division by 0.
12+
13+
14+
```julia
15+
AdaMax(; alpha=0.002,
16+
beta_mean=0.9,
17+
beta_var=0.999)
18+
```
19+
where `alpha` is the step length or learning parameter. `beta_mean` and `beta_var` are exponential decay parameters for the first and second moments estimates. Setting these closer to 0 will cause past iterates to matter less for the current steps and setting them closer to 1 means emphasizing past iterates more.
20+
21+
## References
22+
Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
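The new page documents only the constructors. As a usage sketch (not part of this commit), the solvers are passed to Optim.jl's `optimize` like any other first-order method; the Rosenbrock objective, starting point, and iteration budget below are assumptions for illustration:

```julia
using Optim

# Classic Rosenbrock test function (illustrative; not from the commit).
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

x0 = zeros(2)

# Adam and AdaMax use a fixed step length, so a generous iteration budget
# is assumed here; without an explicit gradient function, Optim falls
# back to finite-difference gradients.
res_adam   = optimize(rosenbrock, x0, Adam(),
                      Optim.Options(iterations = 100_000))
res_adamax = optimize(rosenbrock, x0, AdaMax(),
                      Optim.Options(iterations = 100_000))

Optim.minimizer(res_adam)   # should approach [1.0, 1.0]
```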

src/multivariate/solvers/first_order/adam.jl

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -73,12 +73,8 @@ function update_state!(d, state::AdamState{T}, method::Adam) where T
7373
# m̂ = m./(1-β₁^state.iter)
7474
# v̂ = v./(1-β₂^state.iter)
7575
#@. z = z - α*m̂/(sqrt(v̂+ϵ))
76-
@. z = z - α*m/(1-β₁^state.iter)/(sqrt(v./(1-β₂^state.iter)+ϵ))
77-
78-
# not quite the same because epsilon is in the sqrt
79-
# not sure where I got this from
80-
# αₜ = α * sqrt(1 - β₂^state.iter) / (1 - β₁^state.iter)
81-
# z .= z .- αₜ .* m ./ (sqrt.(v .+ ϵ) )
76+
αₜ = α * sqrt(1 - β₂^state.iter) / (1 - β₁^state.iter)
77+
@. z = z - αₜ * m / (sqrt(v) + ϵ)
8278

8379
for _i in eachindex(z)
8480
# since m and u start at 0, this can happen if the initial gradient is exactly 0
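The replacement folds the bias corrections `m̂ = m/(1-β₁^t)` and `v̂ = v/(1-β₂^t)` into a per-iteration step length, the "efficient version" noted at the end of Section 2 of Kingma & Ba's paper. The new code also applies `ϵ` outside the square root, matching the paper, whereas the removed line had it inside (the "not quite the same" of the old comment). With `ϵ = 0` the two forms agree exactly; a standalone numerical check (not from the commit; the scalar values are made up):

```julia
# Explicit bias-corrected update from the paper, with ϵ = 0:
#   m̂ = m/(1-β₁^t),  v̂ = v/(1-β₂^t),  step = α*m̂/√v̂
α, β₁, β₂, t = 0.001, 0.9, 0.999, 7
m, v = 0.3, 0.04

m̂ = m / (1 - β₁^t)
v̂ = v / (1 - β₂^t)
step_explicit = α * m̂ / sqrt(v̂)

# Folded form used in the new code:
#   αₜ = α*√(1-β₂^t)/(1-β₁^t),  step = αₜ*m/√v
αₜ = α * sqrt(1 - β₂^t) / (1 - β₁^t)
step_folded = αₜ * m / sqrt(v)

@assert step_explicit ≈ step_folded   # identical when ϵ = 0
```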
