Commit bc1d4bd: Improve (L-)BFGS docs (#1164)

* Initial sketch
* Fix reference

1 parent f818cb9

1 file changed (+39, −21 lines)

docs/src/algo/lbfgs.md

# (L-)BFGS

This page contains information about the
Broyden–Fletcher–Goldfarb–Shanno ([BFGS](https://en.wikipedia.org/wiki/Broyden–Fletcher–Goldfarb–Shanno_algorithm)) algorithm and its limited memory version [L-BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS).

## Constructors

```julia
BFGS(; alphaguess = LineSearches.InitialStatic(),
       linesearch = LineSearches.HagerZhang(),
       ...)

LBFGS(; m = 10,
       ...
       manifold = Flat(),
       scaleinvH0::Bool = P === nothing)
```

## Description

In both algorithms the aim is to compute a descent direction ``d_n``
by approximately solving the Newton equation

```math
H_n d_n = - ∇f(x_n),
```

where ``H_n`` is an approximation to the Hessian of ``f``. Instead of approximating
the Hessian, both BFGS and L-BFGS approximate its inverse ``B_n = H_n^{-1}``,
since that replaces solving the linear system of equations above with a matrix-vector multiplication.
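
The descent direction is then obtained directly from the inverse approximation:

```math
d_n = -B_n ∇f(x_n).
```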

Then

```math
x_{n+1} = x_n + \alpha_n d_n,
```

where ``\alpha_n`` is the step size resulting from the specified `linesearch` (``d_n`` as defined above is already a descent direction, so the step is taken along ``+d_n``).

In (L-)BFGS, the matrix is an approximation to the inverse of the Hessian, built using differences of the gradients and iterates during the iterations.
As long as the initial matrix is positive definite, it is possible to show that all the following matrices will be as well.
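
Concretely, with ``s_n = x_{n+1} - x_n`` and ``y_n = ∇f(x_{n+1}) - ∇f(x_n)``, the standard BFGS update of the inverse approximation (see the reference below) is

```math
B_{n+1} = \left(I - \rho_n s_n y_n^T\right) B_n \left(I - \rho_n y_n s_n^T\right) + \rho_n s_n s_n^T,
\qquad \rho_n = \frac{1}{y_n^T s_n}.
```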

For BFGS, the starting matrix could simply be the identity matrix, such that the first step is identical
to the Gradient Descent algorithm, or even the actual inverse of the initial Hessian.
BFGS stores the full matrix ``B_n`` and performs an update of that approximation in every step.
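
As a minimal illustration (a sketch, not Optim's internal implementation; `bfgs_update` and the small quadratic are our own names for this example), one inverse update on a 2×2 problem:

```julia
using LinearAlgebra

# One BFGS update of the inverse Hessian approximation B,
# given the step s = x₊ - x and gradient difference y = ∇f(x₊) - ∇f(x).
function bfgs_update(B, s, y)
    ρ = 1 / dot(y, s)
    V = I - ρ * y * s'
    return V' * B * V + ρ * s * s'
end

A  = [4.0 1.0; 1.0 3.0]            # SPD Hessian of f(x) = x'Ax/2
x  = [1.0, 1.0]
x₊ = [0.5, 0.8]
s  = x₊ - x
y  = A * (x₊ - x)                  # gradient difference of the quadratic
B  = bfgs_update(Matrix(1.0I, 2, 2), s, y)

B * y ≈ s                          # the update enforces the secant equation
isposdef(Symmetric(B))             # stays positive definite since y's > 0
```

The update is constructed so that the new ``B`` maps ``y_n`` to ``s_n`` (the secant equation), which is what lets gradient differences stand in for Hessian information.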

L-BFGS, on the other hand, only stores the last ``m`` differences of gradients and iterates
instead of a full matrix. This is more memory-efficient, especially for large-scale problems.
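
The direction can then be computed from the stored pairs with the classic two-loop recursion; the sketch below is illustrative (the names `two_loop`, `S`, `Y` are ours, not the package's):

```julia
using LinearAlgebra

# L-BFGS two-loop recursion: computes d = -B∇f(x) from the last m
# pairs (S[i], Y[i]) without ever forming the matrix B explicitly.
function two_loop(g, S, Y; γ = 1.0)
    m = length(S)
    α = zeros(m)
    q = copy(g)
    for i in m:-1:1                       # newest to oldest
        α[i] = dot(S[i], q) / dot(Y[i], S[i])
        q  .-= α[i] .* Y[i]
    end
    r = γ .* q                            # apply the initial matrix γI
    for i in 1:m                          # oldest to newest
        β  = dot(Y[i], r) / dot(Y[i], S[i])
        r .+= (α[i] - β) .* S[i]
    end
    return -r                             # descent direction
end

# With a single stored pair, the implied B satisfies the secant equation:
s = [0.5, -0.2]; y = [2.0, -0.4]
two_loop(y, [s], [y]) ≈ -s
```

Only `2m` vectors are kept, so memory grows linearly in the problem dimension rather than quadratically.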

For L-BFGS, the inverse of the Hessian can be preconditioned in two ways.

You can either set `scaleinvH0` to `true`, in which case the ``m`` steps of approximating
the inverse of the Hessian start from a scaled version of the identity.
If it is set to `false`, the approximation starts from the identity matrix.
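
A common choice for such a scaling (see the reference below; the exact factor used is an implementation detail) is

```math
\gamma_n = \frac{s_{n-1}^T y_{n-1}}{y_{n-1}^T y_{n-1}},
```

so that the recursion starts from ``\gamma_n I``.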

Alternatively, you can provide a preconditioning matrix `P`, which should be positive definite; the approximation then starts from ``P^{-1}``.
The preconditioner can be changed during the iterations by providing the `precondprep` keyword, which updates the preconditioner matrix based on `P` and the current iterate `x`.
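
A hypothetical configuration sketch, assuming the keywords shown in the constructor above (the objective `f`, in-place gradient `g!`, and fixed diagonal `P` are illustrative, not from the package docs):

```julia
using Optim, LinearAlgebra

f(x)     = sum(abs2, x) / 2          # simple convex objective (illustrative)
g!(G, x) = (G .= x)                  # its in-place gradient

# Fixed positive definite preconditioner; precondprep leaves it unchanged.
P = Diagonal(ones(2))
solver = LBFGS(P = P, precondprep = (P, x) -> nothing)

optimize(f, g!, [1.0, 2.0], solver)
```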

## References

```@bibliography
Pages = []
Canonical = false

nocedal2006
```
