bfgsmin and fminunc

View: New views
10 Messages — Rating Filter:   Alert me  

bfgsmin and fminunc

by Michael Creel :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I hadn't been keeping up with the developments, but a post today brought fminunc to my attention. fminunc has more or less the same use as bfgsmin, so I did a quick comparison. The script attached to this message uses the two algorithms to minimize a function of 20 variables. If you run this on Octave 3.1.55, you find that bfgsmin is about 7X faster. The default tolerances for bfgsmin are 1e-12 on the function value, and 1e-6 for the variables. For fminunc, the default tolerances according to the help entry are 1e-7. In spite of this, a convergence check fails for fminunc (and inspection of the result shows that some variables are more than 1e-6 away from the solution. The output of the script is below.

This is just one examplecompare_bfgsmin_fminunc.m, I have no idea if these results are representative.

Michael



octave:12> compare_bfgsmin_fminunc
EXAMPLE 1: Ordinary BFGS, using analytic gradient
------------------------------------------------
bfgsmin final results: 65 iterations            

function value: 1.8134e-16

STRONG CONVERGENCE
Function conv 1  Param conv 1  Gradient conv 1

          param    gradient (n)          change
        0.05000        -0.00000         0.00000
        0.10000         0.00000        -0.00000
        0.15000         0.00000        -0.00000
        0.20000        -0.00000         0.00000
        0.25000        -0.00000         0.00000
        0.30000        -0.00000         0.00000
        0.35000        -0.00000         0.00000
        0.40000        -0.00000         0.00000
        0.45000         0.00000        -0.00000
        0.50000        -0.00000        -0.00000
        0.55000         0.00000        -0.00000
        0.60000        -0.00000        -0.00000
        0.65000        -0.00000         0.00000
        0.70000         0.00000        -0.00000
        0.75000         0.00000        -0.00000
        0.80000        -0.00000         0.00000
        0.85000         0.00000         0.00000
        0.90000         0.00000         0.00000
        0.95000        -0.00000         0.00000
        1.00000        -0.00000         0.00000
theta =

   0.050000
   0.100000
   0.150000
   0.200000
   0.250000
   0.300000
   0.350000
   0.400000
   0.450000
   0.500000
   0.550000
   0.600000
   0.650000
   0.700000
   0.750000
   0.799999
   0.849999
   0.899997
   0.949994
   0.999988

BFGSMIN: EXAMPLE 1: Ordinary BFGS, using analytic gradient
Success!! :-)
Elapsed time = 0.092005



FMINUNC: EXAMPLE 1: Ordinary BFGS, using analytic gradient
Failure?! :-(
Elapsed time = 0.724046

Re: bfgsmin and fminunc

by Michael Creel :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Oops, sorry. bfgsmin was using the analytic gradient, and fminunc was using numeric. If you edit the script to set the objective function to objective2, then both will use numeric gradients. The speed difference becomes less than 2X, but the convergence tolerance issue remains.
M.

Re: bfgsmin and fminunc

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, May 14, 2009 at 11:26 AM, Michael Creel <michael.creel@...> wrote:
>
> Oops, sorry. bfgsmin was using the analytic gradient, and fminunc was using
> numeric. If you edit the script to set the objective function to objective2,
> then both will use numeric gradients. The speed difference becomes less than
> 2X, but the convergence tolerance issue remains.
> M.
>

Hi Michael,

I'm glad to find you took the time to try some testing. The Octave's
new fminunc got very little testing so far, so I won't be surprised at
all if major bugs, robustness defects or other problems are
discovered. fminunc is intended to be Matlab-compatible and
essentially mimicks the algorithm in fsolve (which was much more
tested). The speed of a single iteration depends crucially on whether
you have Octave linked with libqrupdate (if not, it will be O(N^3)).
And of course the number of iterations is a major issue.
Regarding the tolerances issue: the default values for both TolFun and
TolX are sqrt(eps(class(x0))), which is about
1.5e-8 in the double case (fminunc will work in single precision if
given single prec inputs).
in the convergence tests (as well as in TR stepping), fminunc uses
automatic scaling by the predicted second derivatives (hessian
diagonal) to achieve scaling independency. So this does not translate
to absolute bounds.
It would be nice to provide a way to specify absolute tolerances, but
I'm not sure how to best do it w.r.t. Matlab compatibility.

Unfortunately, I can't really try the comparison because I have
trouble to get bfgsmin working at all using development octave, there
are some problems with feval that seem to be nonetheless on Octave's
side.
I'll investigate what's wrong.

Using analytic gradient, fminunc seems to produce a much better result
(try if you like), so maybe part of the problem is the numeric
gradient calculation. The big difference I see is that unlike fminunc,
bfgsmin uses centered differences, which are likely more accurate. I
also see it uses a slightly more sophisticated delta selection, so I
may copy that to __fdjac__, too. Allowing (optionally) central
differences within both fsolve and fminunc would indeed be a good
idea.

Still, even with the analytic gradient I'm not particularly pleased
with fminunc's performance (number of iterations), so I'll look for
possible improvements.

Anyway, further tests/suggestions from your part are welcome.

regards

--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: bfgsmin and fminunc

by Michael Creel :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I have had a look at fminunc.m, and it looks nice in general. I would suggest using central diff gradient. In my experience, the increased time per iteration is usually offset by fewer iterations needed for convergence, due to better accuracy. I see a "guarded_eval" - what is that? I would also suggest setting max iters to infinite, or a lot higher than 400. Some problems are just hard to solve.

I may try to take advantage of the fast  cholupdate code in bfgsmin, probably this summer when I have time.

Maybe it would be nice to have a set of test functions for optimization algorithms, to be able compare performance and verify the capability of finding correct solutions. I used a set when I first wrote bfgsmin, but I haven't used it in a while. I'll see if I can resurrect it and give it a friendly interface.

M.

Re: bfgsmin and fminunc

by Julian Schnidder-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



Michael Creel schrieb:
> [...]
>
> Maybe it would be nice to have a set of test functions for optimization
> algorithms, to be able compare performance and verify the capability of
> finding correct solutions. I used a set when I first wrote bfgsmin, but I
> haven't used it in a while. I'll see if I can resurrect it and give it a
> friendly interface.

CUTEr (http://cuter.rl.ac.uk/cuter-www/) may be worth looking into.

Kind regards,
Julian


Re: bfgsmin and fminunc

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, May 15, 2009 at 9:44 AM, Michael Creel <michael.creel@...> wrote:

>
> I have had a look at fminunc.m, and it looks nice in general. I would suggest
> using central diff gradient. In my experience, the increased time per
> iteration is usually offset by fewer iterations needed for convergence, due
> to better accuracy. I see a "guarded_eval" - what is that? I would also
> suggest setting max iters to infinite, or a lot higher than 400. Some
> problems are just hard to solve.
>
> I may try to take advantage of the fast  cholupdate code in bfgsmin,
> probably this summer when I have time.
>
> Maybe it would be nice to have a set of test functions for optimization
> algorithms, to be able compare performance and verify the capability of
> finding correct solutions. I used a set when I first wrote bfgsmin, but I
> haven't used it in a while. I'll see if I can resurrect it and give it a
> friendly interface.
>
> M.
>

Michael,

thanks for your comments. I have reworked the FD part of fsolve and
fminunc so that they now use central diffs by default, controllable
using the FinDiffType option.
While the superlinear/subquadratic termination of the BFGS seems to
work fine in fminunc (which is the primary aim), the global
convergence props for strongly nonquadratic functions (such as
Rosenbrock's) is something that depends, according to my observations,
crucially on the strategy of updating the trust region.
So far the strategy was very simple, based on the original MINPACK
code. I replaced it with something slightly more elaborate now, but I
think there's still a lot to improve. It seems most researchers
consider the trust region strategy either equally or more powerful
compared to line searches, but OTOH the line search strategies seem
far better investigated.
In the present state fminunc differs pretty much from bfgsmin - trust
region/line search, factorized full BFGS/direct inverse full BFGS or
LBFGS. I wanted fminunc to stay close to fsolve and exploit the
"duality" between solving nonlinear systems and minimizing
functionals, and to be suitable for small or moderate size problems,
just like fsolve is. I think that a basic, Matlab-compatible local
minimization tool should be part of Octave core, but an user seriously
concerned with optimization should undoubtedly get the optimization
toolbox.
The optim toolbox needs some clean-up and also should eventually use
the optimget/optimset mechanism now provided by core Octave, that
allows case-insensitive option setting and warns when options are
misspelled.
Any more testing/suggestions/code contributions etc. are surely welcome.

regards

--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


Re: bfgsmin and fminunc

by Michael Creel :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, May 18, 2009 at 10:08 AM, Jaroslav Hajek <highegg@...> wrote:

> On Fri, May 15, 2009 at 9:44 AM, Michael Creel <michael.creel@...> wrote:
>>
>> I have had a look at fminunc.m, and it looks nice in general. I would suggest
>> using central diff gradient. In my experience, the increased time per
>> iteration is usually offset by fewer iterations needed for convergence, due
>> to better accuracy. I see a "guarded_eval" - what is that? I would also
>> suggest setting max iters to infinite, or a lot higher than 400. Some
>> problems are just hard to solve.
>>
>> I may try to take advantage of the fast  cholupdate code in bfgsmin,
>> probably this summer when I have time.
>>
>> Maybe it would be nice to have a set of test functions for optimization
>> algorithms, to be able compare performance and verify the capability of
>> finding correct solutions. I used a set when I first wrote bfgsmin, but I
>> haven't used it in a while. I'll see if I can resurrect it and give it a
>> friendly interface.
>>
>> M.
>>
>
> Michael,
>
> thanks for your comments. I have reworked the FD part of fsolve and
> fminunc so that they now use central diffs by default, controllable
> using the FinDiffType option.
> While the superlinear/subquadratic termination of the BFGS seems to
> work fine in fminunc (which is the primary aim), the global
> convergence props for strongly nonquadratic functions (such as
> Rosenbrock's) is something that depends, according to my observations,
> crucially on the strategy of updating the trust region.
> So far the strategy was very simple, based on the original MINPACK
> code. I replaced it with something slightly more elaborate now, but I
> think there's still a lot to improve. It seems most researchers
> consider the trust region strategy either equally or more powerful
> compared to line searches, but OTOH the line search strategies seem
> far better investigated.
> In the present state fminunc differs pretty much from bfgsmin - trust
> region/line search, factorized full BFGS/direct inverse full BFGS or
> LBFGS. I wanted fminunc to stay close to fsolve and exploit the
> "duality" between solving nonlinear systems and minimizing
> functionals, and to be suitable for small or moderate size problems,
> just like fsolve is. I think that a basic, Matlab-compatible local
> minimization tool should be part of Octave core, but an user seriously
> concerned with optimization should undoubtedly get the optimization
> toolbox.
> The optim toolbox needs some clean-up and also should eventually use
> the optimget/optimset mechanism now provided by core Octave, that
> allows case-insensitive option setting and warns when options are
> misspelled.
> Any more testing/suggestions/code contributions etc. are surely welcome.
>
> regards
>
> --
> RNDr. Jaroslav Hajek
> computing expert & GNU Octave developer
> Aeronautical Research and Test Institute (VZLU)
> Prague, Czech Republic
> url: www.highegg.matfyz.cz
>

Hi Jaroslav,
Thanks for the explanation, and I'm glad to see optimization getting
some attention. I'm not too well informed, but what I have observed
suggests that the trust region approach is used more widely in more
recent work.  I don't know too much about it, though, and I can't
comment on the relative merits compared to line search. I would say
that there's no reason not to combine the two if it leads to a
flexible, robust, and reliable algorithm.  I agree that the optim
toolbox could use some work. My contributions there are bfgsmin and
samin. I would like to modify them to have an interface more in line
with the "industry standard". Right now, these algorithms allow for
multiple arguments to the function to be minimized, passed as a cell,
and the one to minimize with respect to can be specified. That works
well for me, but it is not standard. I need to study the optimset
mechanism, I still don't know how it works. I will try to spend some
time this summer on this. If you could have a look at __bfgsmin.cc for
improvements, I would appreciate it. With an eye to improvements, I
believe that there is probably some low hanging fruit in that code.
Cheers, Michael


Re: bfgsmin and fminunc

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, May 18, 2009 at 10:29 AM, Michael Creel <michael.creel@...> wrote:

>
> Hi Jaroslav,
> Thanks for the explanation, and I'm glad to see optimization getting
> some attention. I'm not too well informed, but what I have observed
> suggests that the trust region approach is used more widely in more
> recent work.  I don't know too much about it, though, and I can't
> comment on the relative merits compared to line search. I would say
> that there's no reason not to combine the two if it leads to a
> flexible, robust, and reliable algorithm.  I agree that the optim
> toolbox could use some work. My contributions there are bfgsmin and
> samin. I would like to modify them to have an interface more in line
> with the "industry standard". Right now, these algorithms allow for
> multiple arguments to the function to be minimized, passed as a cell,
> and the one to minimize with respect to can be specified. That works
> well for me, but it is not standard.

Yes, in some cases that approach is convenient, and in the past it was
even necessary to avoid global variables, but I now consider it
superseded by anonymous function handles, which provide a more general
and flexible convenient mechanism to accomplish the same:

fminunc (@(x) my_func(x, y, z), x0,....)

the *big* advantage here is that the argument selection and binding
part is kept out of the function; fminunc just deals with a single
function of a single argument and stays simple. Thus, this mechanism
works exactly the same for all optimization, integration and any other
functions. The disadvantage, at least for now, is somewhat lower
performance, but I have plans for improvements here.


> I need to study the optimset
> mechanism, I still don't know how it works. I will try to spend some
> time this summer on this.

Basically, Octave's optimset/optimget now maintain internally a list
of "registered options" (see __all_opts__).
When you set an option via optimset, it looks up the list and possibly
adds proper casing, otherwise it warns you if an unregistered option
is being set, such as when you type "GradientObj" instead of "GradObj"
or so. Options are registered at Octave's startup using a "default"
query for each registered function. In a package, newly registered
function are added using PKG_ADD. See Octave's fsolve, fminunc, fzero
and lsqnonneg. Other functions (sqp, qp, glpk) either do not yet
conform to this interface, or do not need it (qp).

> If you could have a look at __bfgsmin.cc for
> improvements, I would appreciate it. With an eye to improvements, I
> believe that there is probably some low hanging fruit in that code.
> Cheers, Michael
>

I plan to do some work here, but probably not until Octave 3.2 gets out.

cheers


--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


Re: bfgsmin and fminunc

by Michael Creel :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, May 18, 2009 at 10:47 AM, Jaroslav Hajek <highegg@...> wrote:

> On Mon, May 18, 2009 at 10:29 AM, Michael Creel <michael.creel@...> wrote:
>>
>> Hi Jaroslav,
>> Thanks for the explanation, and I'm glad to see optimization getting
>> some attention. I'm not too well informed, but what I have observed
>> suggests that the trust region approach is used more widely in more
>> recent work.  I don't know too much about it, though, and I can't
>> comment on the relative merits compared to line search. I would say
>> that there's no reason not to combine the two if it leads to a
>> flexible, robust, and reliable algorithm.  I agree that the optim
>> toolbox could use some work. My contributions there are bfgsmin and
>> samin. I would like to modify them to have an interface more in line
>> with the "industry standard". Right now, these algorithms allow for
>> multiple arguments to the function to be minimized, passed as a cell,
>> and the one to minimize with respect to can be specified. That works
>> well for me, but it is not standard.
>
> Yes, in some cases that approach is convenient, and in the past it was
> even necessary to avoid global variables, but I now consider it
> superseded by anonymous function handles, which provide a more general
> and flexible convenient mechanism to accomplish the same:
>
> fminunc (@(x) my_func(x, y, z), x0,....)
>
> the *big* advantage here is that the argument selection and binding
> part is kept out of the function; fminunc just deals with a single
> function of a single argument and stays simple. Thus, this mechanism
> works exactly the same for all optimization, integration and any other
> functions. The disadvantage, at least for now, is somewhat lower
> performance, but I have plans for improvements here.
>
>
>> I need to study the optimset
>> mechanism, I still don't know how it works. I will try to spend some
>> time this summer on this.
>
> Basically, Octave's optimset/optimget now maintain internally a list
> of "registered options" (see __all_opts__).
> When you set an option via optimset, it looks up the list and possibly
> adds proper casing, otherwise it warns you if an unregistered option
> is being set, such as when you type "GradientObj" instead of "GradObj"
> or so. Options are registered at Octave's startup using a "default"
> query for each registered function. In a package, newly registered
> function are added using PKG_ADD. See Octave's fsolve, fminunc, fzero
> and lsqnonneg. Other functions (sqp, qp, glpk) either do not yet
> conform to this interface, or do not need it (qp).
>
>> If you could have a look at __bfgsmin.cc for
>> improvements, I would appreciate it. With an eye to improvements, I
>> believe that there is probably some low hanging fruit in that code.
>> Cheers, Michael
>>
>
> I plan to do some work here, but probably not until Octave 3.2 gets out.
>
> cheers
>
>
> --
> RNDr. Jaroslav Hajek
> computing expert & GNU Octave developer
> Aeronautical Research and Test Institute (VZLU)
> Prague, Czech Republic
> url: www.highegg.matfyz.cz
>

Another suggestion would be to add an example script that shows how to
use anonymous functions, set options, etc. I find this stuff a little
confusing, though I haven't yet made much effort to understand it. For
me, a working example always helps a lot.
M.


Re: bfgsmin and fminunc

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, May 18, 2009 at 10:54 AM, Michael Creel <michael.creel@...> wrote:
>
> Another suggestion would be to add an example script that shows how to
> use anonymous functions, set options, etc. I find this stuff a little
> confusing, though I haven't yet made much effort to understand it. For
> me, a working example always helps a lot.
> M.
>

I believe the anonymous function handles are described in the manual,
even with examples, though the examples are about quadrature not
optimization, they seem OK to me. "More docs for optimset/optimget and
the optimization stuff" is on my TODO.

cheers

--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz