This page contains information about
the Broyden–Fletcher–Goldfarb–Shanno ([BFGS](https://en.wikipedia.org/wiki/Broyden–Fletcher–Goldfarb–Shanno_algorithm)) algorithm and its limited memory version [L-BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS).
## Constructors
```julia
BFGS(; alphaguess = LineSearches.InitialStatic(),
       linesearch = LineSearches.HagerZhang(),
       initial_invH = nothing,
       initial_stepnorm = nothing,
       manifold = Flat())
```

```julia
LBFGS(; m = 10,
        alphaguess = LineSearches.InitialStatic(),
        linesearch = LineSearches.HagerZhang(),
        P = nothing,
        precondprep = (P, x) -> nothing,
        manifold = Flat(),
        scaleinvH0::Bool = P === nothing)
```
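As a usage sketch (the Rosenbrock objective and starting point below are illustrative choices, not part of the constructor definitions):

```julia
using Optim

# Rosenbrock test function as an example objective.
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

x0 = zeros(2)

# Without a user-supplied gradient, gradients are obtained by finite
# differences; both solvers accept the keyword arguments listed above.
res_bfgs  = optimize(f, x0, BFGS())
res_lbfgs = optimize(f, x0, LBFGS(m = 10))
```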
## Description
In both algorithms the aim is to compute a descent direction ``d_n``
by approximately solving the Newton equation
```math
H_n d_n = - ∇f(x_n),
```
where ``H_n`` is an approximation to the Hessian of ``f``. Instead of approximating
the Hessian, both BFGS and L-BFGS approximate the inverse ``B_n = H_n^{-1}`` of the Hessian,
since that yields a matrix multiplication instead of solving the linear system of equations above.
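Concretely, the descent direction is then obtained by a single matrix-vector product,

```math
d_n = -B_n \nabla f(x_n).
```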
Then
```math
x_{n+1} = x_n - \alpha_n d_n,
```
where ``\alpha_n`` is the step size resulting from the specified `linesearch`.

In (L-)BFGS, the matrix is an approximation to the inverse of the Hessian, built from differences of gradients and iterates across the iterations.
As long as the initial matrix is positive definite it is possible to show that all the following matrices will be as well.

For BFGS, the starting matrix could simply be the identity matrix, such that the first step is identical
to the Gradient Descent algorithm, or even the actual inverse of the initial Hessian.
BFGS stores the full matrix ``B_n`` and performs an update of that approximate inverse Hessian in every step.
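For reference, writing ``s_n = x_{n+1} - x_n``, ``y_n = \nabla f(x_{n+1}) - \nabla f(x_n)`` and ``\rho_n = 1/(y_n^\top s_n)``, the classical BFGS update of the inverse approximation (stated here as standard background, not quoted from this package's source) is

```math
B_{n+1} = (I - \rho_n s_n y_n^\top) \, B_n \, (I - \rho_n y_n s_n^\top) + \rho_n s_n s_n^\top.
```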
L-BFGS, on the other hand, only stores ``m`` differences of gradients and iterates
instead of a full matrix. This is more memory-efficient, especially for large-scale problems.
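From those stored pairs the descent direction can be recovered without ever forming ``B_n`` explicitly, via the standard two-loop recursion. The following is a minimal sketch of that recursion, not the package's internal implementation; `S` and `Y` are assumed to hold the last ``m`` iterate and gradient differences, newest last:

```julia
using LinearAlgebra

# Compute d = -B * g from the m most recent (s, y) pairs without
# ever materializing the inverse Hessian approximation B.
function twoloop(g::Vector, S::Vector{<:Vector}, Y::Vector{<:Vector})
    q = copy(g)
    m = length(S)
    α = zeros(m)
    ρ = [1 / dot(Y[i], S[i]) for i in 1:m]
    for i in m:-1:1                 # backward pass: newest to oldest
        α[i] = ρ[i] * dot(S[i], q)
        q .-= α[i] .* Y[i]
    end
    γ = m > 0 ? dot(S[m], Y[m]) / dot(Y[m], Y[m]) : 1.0
    r = γ .* q                      # scaled identity as initial matrix (cf. `scaleinvH0`)
    for i in 1:m                    # forward pass: oldest to newest
        β = ρ[i] * dot(Y[i], r)
        r .+= (α[i] - β) .* S[i]
    end
    return -r                       # descent direction d_n
end
```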
For L-BFGS, the inverse of the Hessian can be preconditioned in two ways.

You can either set `scaleinvH0` to `true`, in which case the `m` steps of approximating
the inverse of the Hessian start from a scaled version of the identity.
If it is set to `false`, the approximation starts from the identity matrix.
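A common choice for that scaling, and the ``γ`` computed in the two-loop sketch above (stated here as standard background, since this page does not spell out the exact factor), is ``\gamma_n I`` with

```math
\gamma_n = \frac{s_{n-1}^\top y_{n-1}}{y_{n-1}^\top y_{n-1}}.
```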
Alternatively, you can provide a preconditioning matrix `P`, which should be positive definite; the approximation then starts from ``P^{-1}``.
The preconditioner can be changed during the iterations by providing the `precondprep` keyword which, based on `P` and the current iterate `x`, updates the preconditioner.
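For illustration, a run with a fixed preconditioner might look like this (the objective, dimension, and the diagonal matrix are assumed examples, not recommendations):

```julia
using Optim, LinearAlgebra

# Example objective: a simple strictly convex quadratic.
f(x) = sum((1.0 .- x) .^ 2) + sum(abs2, diff(x))
x0 = zeros(10)

# A fixed positive definite preconditioner; the inverse Hessian
# approximation then starts from P^{-1} rather than the identity.
P = Diagonal(fill(2.0, 10))
res = optimize(f, x0, LBFGS(P = P))

# `precondprep` is called with `P` and the current iterate `x`; the
# default (P, x) -> nothing shown above leaves `P` unchanged.
res = optimize(f, x0, LBFGS(P = P, precondprep = (P, x) -> nothing))
```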