Uh-oh, it looks like your Internet Explorer is out of date.
For a better shopping experience, please upgrade now.
This outstanding text for graduate students and researchers proposes improvements to existing algorithms, extends their related mathematical theories, and offers details on new algorithms for approximating local and global minima. None of the algorithms requires an evaluation of derivatives; all depend entirely on sequential function evaluation, a highly practical scenario in the frequent event of difficult-to-evaluate derivatives.
Topics include the use of successive interpolation for finding simple zeros of a function and its derivatives; an algorithm with guaranteed convergence for finding a minimum of a function of one variation; global minimization given an upper bound on the second derivative; and a new algorithm for minimizing a function of several variables without calculating derivatives. Many numerical examples augment the text, along with a complete analysis of rate of convergence for most algorithms and error bounds that allow for the effect of rounding errors.
|Product dimensions:||5.30(w) x 8.40(h) x 0.60(d)|
Read an Excerpt
Algorithms for Minimization Without Derivatives
By RICHARD P. BRENT
Dover Publications, Inc.Copyright © 2002 Richard P. Brent
All rights reserved.
INTRODUCTION AND SUMMARY
Consider the problem of finding an approximate zero or minimum of a function of one real variable, using limited-precision arithmetic on a sequential digital computer. The function f may not be differentiable, or the derivative f' may be difficult to compute, so a method which uses only computed values of f is desirable. Since an evaluation of f may be very expensive in terms of computer time, a good method should guarantee to find a correct solution, to within some prescribed tolerance, using only a small number of function evaluations. Hence, we study algorithms which depend on evaluating f at a small number of points, and for which certain desirable properties are guaranteed, even in the presence of rounding errors.
Slow, safe algorithms are seldom preferred in practice to fast algorithms which may occasionally fail. Thus, we want algorithms which are guaranteed to succeed in a reasonable time even for the most "difficult" functions, yet are as fast as commonly used algorithms for "easy" functions. For example, bisection is a safe method for finding a zero of a function which changes sign in a given interval, but from our point of view it is not an acceptable method, because it is just as slow for any function, no matter how well behaved, as it is in the worst possible case (ignoring the possibility that an exact zero may occasionally be found by chance). As a contrasting example, consider the method of successive linear interpolation, which converges superlinearly to a simple zero of a C1 function, provided that the initial approximation is good and rounding errors are unimportant. This method is not acceptable either, for in practice there may be no way of knowing in advance if the zero is simple, if the initial approximation is sufficiently good to ensure convergence, or if the effect of rounding errors is important.
In Chapter 4 we describe an algorithm which, by combining some of the desirable features of bisection and successive linear interpolation, does come close to satisfying our requirements: it is guaranteed to converge (i.e., halt) after a reasonably small number of function evaluations, and the rate of convergence for well-behaved functions is so fast that a less reliable algorithm is unlikely to be preferred on grounds of speed.
An analogous algorithm, which finds a local minimum of a function of one variable by a combination of golden section search and successive parabolic interpolation, is described in Chapter 5. This algorithm fails to satisfy one of our requirements: in certain applications where repeated one-dimensional minimizations are required, and where accuracy is not very important, a faster (though less reliable) method is preferable. One such application, finding local minima of functions of several variables without calculating derivatives, is discussed in Chapter 7. (Note that wherever we consider minima we could equally well consider maxima.)
Most algorithms for minimizing a nonlinear function of one or more variables find, at best, a local minimum. For a function with several local minima, there is no guarantee that the local minimum found is the global (i.e., true or lowest) minimum. Since it is the global minimum which is of interest in most applications, this is a serious practical disadvantage of most minimization algorithms, and our algorithm given in Chapter 5 is no exception. The usual remedy is to try several different starting points and, perhaps, vary some of the parameters of the minimization procedure, in the hope that the lowest local minimum found is the global minimum. This approach is inefficient, as the same local minimum may be found several times. It is also unreliable, for, no matter how many starting points are tried, it is impossible to be quite sure that the global minimum has been found.
In Chapter 6 we discuss the problem of finding the global minimum to within a prescribed tolerance. It is possible to give an algorithm for solving this problem, provided that a little a priori information about the function to be minimized is known. We describe an efficient algorithm, applicable if an upper bound on f' is known, and we show how this algorithm can be used recursively to find the global minimum of a function of several variables. Unfortunately, because the amount of computation involved increases exponentially with the number of variables, the recursive method is practical only for functions of less than four variables. For functions of more variables, we still have to resort to the unreliable "trial and error" method, unless special information about the function to be minimized is available.
Thus, we are led to consider practical methods for finding local (unconstrained) minima of functions of several variables. As before, we consider methods which depend on evaluating the function at a small number of points. Unfortunately, without imposing very strict conditions on the functions to be minimized, it is not possible to guarantee that an n-dimensional minimization algorithm produces results which are correct to within some prescribed tolerance, or that the effect of rounding errors has been taken into account. We have to be satisfied with algorithms which nearly always give correct results for the functions likely to arise in practical applications.
As suggested by the length of our bibliography, there has recently been considerable interest in the unconstrained minimization problem. Thus, we can hardly expect to find a good method which is completely unrelated to the known ones. In Chapter 7 we take one of the better methods which does not use derivatives, that of Powell (1964), and modify it to try to overcome some of the difficulties observed in the literature. Numerical tests suggest that our proposed method is faster than Powell's original method, and just as reliable. It also compares quite well with a different method proposed by Stewart (1967), at least for functions of less than ten variables. (We have few numerical results for non-quadratic functions of ten or more variables.)
ALGOL implementations of all the above algorithms are given. Most testing was done with ALGOL W (Wirth and Hoare (1966)) on IBM 360/67 and 360/91 computers. As ALGOL W is not widely used, we give ALGOL 60 procedures (Naur (1963)), except for the n-dimensional minimization algorithm. FORTRAN subroutines for the one-dimensional zero-finding and minimization algorithms are given in the Appendix.
To recapitulate, we describe algorithms, and give ALGOL procedures, for solving the following problems efficiently, using only function (not derivative) evaluations:
1. Finding a zero of a function of one variable if an interval in which the function changes sign is given;
2. Finding a local minimum of a function of one variable, defined on a given interval;
3. Finding, to within a prescribed tolerance, the global minimum of a function of one or more variables, given upper bounds on the second derivatives;
4. Finding a local minimum of a function of several variables.
For the first three algorithms, rigorous bounds on the error and the number of function evaluations required are established, taking the effect of rounding errors into account. Some results concerning the order of convergence of the first two algorithms, and preliminary results on interpolation and divided differences, are also of interest.
In this section we summarize the main results of the following chapters. A more detailed discussion is given at the appropriate places in each chapter. This summary is intended to serve as a guide to the reader who is interested in some of our results, but not in others. To assist such a reader, an attempt has been made to keep each chapter as self-contained as possible.
In Chapter 2 we collect some results on Taylor series, Lagrange interpolation, and divided differences. Most of these results are needed in Chapter 3, and the casual reader might prefer to skip Chapter 2 and refer back to it when necessary. Some of the results are similar to classical ones, but instead of assuming that f has n + 1 continuous derivatives, we only assume that f(n) is Lipschitz continuous, and the term f(n + 1) ([xi]) in the classical results is replaced by a number which is bounded in absolute value by a Lipschitz constant. For example, Lemmas 2.3.1, 2.3.2, 2.4.1, and 2.5.1 are of this nature. Since a Lipschitz continuous function is differentiable almost everywhere, these results are not surprising, although they have not been found in the literature, except where references are given. (Sometimes Lipschitz conditions are imposed on the derivatives of functions of several variables: see, for example, Armijo (1966) and McCormick (1969).) The proofs are mostly similar to those for the classical results.
Theorem 2.6.1 is a slight generalization of some results of Ralston (1963, 1965) on differentiating the error in Lagrange interpolation. It is included both for its independent interest, and because it may be used to prove a slightly weaker form of Lemma 3.6.1 for the important case q = 2. (A proof along these lines is sketched in Kowalik and Osborne (1968).)
An interesting result of Chapter 2 is Theorem 2.6.2, which gives an expression for the derivative of the error in Lagrange interpolation at the points of interpolation. It is well known that the conclusion of Theorem 2.6.2 holds if f has n + 1 continuous derivatives, but Theorem 2.6.2 shows that it is sufficient for f to have n continuous derivatives.
Theorem 2.5.1, which gives an expansion of divided differences, may be regarded as a generalization of Taylor's theorem. It is used several times in Chapter 3: for example, see Theorem 3.4.1 and Lemma 3.6.1. Theorem 2.5.1 is useful for the analysis of interpolation processes whenever the coefficients of the interpolation polynomials can conveniently be expressed in terms of divided differences.
In Chapter 3 we prove some theorems which provide a theoretical foundation for the algorithms described in Chapters 4 and 5. In particular, we show when the algorithms will converge superlinearly, and what the order (i.e., rate) of convergence will be. For these results the effect of rounding errors is ignored. The reader who is mainly interested in the practical applications of our results might omit Chapter 3, except for the numerical examples (Section 3.9) and the summary (Section 3.10).
So that results concerning successive linear interpolation for finding zeros (used in Chapter 4), and successive parabolic interpolation for finding turning points (used in Chapter 5), can be given together, we consider a more general process for finding a zero η of f(q - 1), for any fixed q ≥ 1. Successive linear interpolation and successive parabolic interpolation are just the special cases q = 1 and q = 2. Another case which is of some practical interest is q = 3, for finding inflexion points. As the proofs for general q are essentially no more difficult than for q = 2, most of our results are given for general q.
For the applications in Chapters 4 and 5, the most important results are Theorem 3.4.1, which gives conditions under which convergence is superlinear, and Theorem 3.5.1, which shows when the order is at least 1.618 ... (for q = 1) or 1.324 ... (for q = 2). These numbers are well known, but our assumptions about the differentiability of f are weaker than those of previous authors, e.g., Ostrowski (1966) and Jarratt (1967, 1968).
From a mathematical point of view, the most interesting result of Chapter 3 is Theorem 3.7.1. The result for q = 1 is given in Ostrowski (1966), except for our slightly weaker assumption about the smoothness of f. For q = 2, our result that convergence to η with order at least 1.378 ... is possible, even if f(3)(η) ≠ 0, appears to be new. Jarratt (1967) and Kowalik and Osborne (1968) assume that
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.1)
and then, from Lemma 3.6.1, the order of convergence is 1.324.... However, even for such a simple function as
f(x) = 2x3 + x2, (2.2)
there are starting points x0, x1 and x2, arbitrarily close to η, such that (2.1) fails to hold, and then the order is at least 1.378.... We should point out that this exceptional case is unlikely to occur: an interesting conjecture is that the set of starting points for which it occurs has measure zero.
The practical conclusion to be drawn from Theorem 3.7.1 is that, if convergence is to be accelerated, then the result of Lemma 3.6.1 should be used in preference to a result like equation (3.2.1). In Section 3.8 we give one of the many ways in which this may be done. Finally, some numerical examples, illustrating both the accelerated and unaccelerated processes, are given in Section 3.9.
In Chapter 4 we describe an algorithm for finding a zero of a function which changes sign in a given interval. The algorithm is based on a combination of successive linear interpolation and bisection, in much the same way as "Dekker's algorithm" (van Wijngaarden, Zonneveld, and Dijkstra (1963); Wilkinson (1967); Peters and Wilkinson (1969); and Dekker (1969)). Our algorithm never converges much more slowly than bisection, whereas Dekker's algorithm may converge extremely slowly in certain cases. (Examples are given in Section 4.2.)
It is well known that bisection is the optimal algorithm, in a minimax sense, for finding zeros of functions which change sign in an interval. (We only consider sequential algorithms: see Robbins (1952), Wilde (1964), and Section 4.5.) The motivation for both our algorithm and Dekker's is that bisection is not optimal if the class of allowable functions is suitably restricted. For example, it is not optimal for convex functions (Bellman and Dreyfus (1962), Gross and Johnson (1959)), or for C1 functions with simple zeros.
Both our algorithm and Dekker's exhibit superlinear convergence to a simple zero of a C1 function, for eventually only linear interpolations are performed and the theorems of Chapter 3 are applicable. Thus, convergence is usually much faster than for bisection. Our algorithm incorporates inverse quadratic interpolation as well as linear interpolation, so it is often slightly faster than Dekker's algorithm on well-behaved functions: see Section 4.4.
An algorithm for finding a local minimum of a function of one variable is described in Chapter 5. The algorithm combines golden section search (Bellman (1957), Kiefer (1953), Wilde (1964), Witzgall (1969)) and successive parabolic interpolation, in the same way as bisection and successive linear interpolation are combined in the zero-finding algorithm of Chapter 4. Convergence in a reasonable number of function evaluations is guaranteed (Section 5.5). For a C2 function with positive curvature at the minimum, the results of Chapter 3 show that convergence is superlinear, provided that the minimum is at an interior point of the interval. Other algorithms given in the literature either fail to have these two desirable properties, or their order of convergence is less than that of our algorithm when convergence is strictly superlinear: see Sections 5.4 and 5.5.
In Sections 5.2 and 5.3 we consider the effect of rounding errors. Section 5.2 contains an analysis of the limitations on the accuracy of any algorithm which is based entirely on limited-precision function evaluations, and this section should be studied by the reader who intends to use the ALGOL procedure given in Section 5.8.
Excerpted from Algorithms for Minimization Without Derivatives by RICHARD P. BRENT. Copyright © 2002 Richard P. Brent. Excerpted by permission of Dover Publications, Inc..
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Table of ContentsPREFACE TO DOVER EDITION
1 INTRODUCTION AND SUMMARY
2 "SOME USEFUL RESULTS ON TAYLOR SERIES, DIVIDED DIFFERENCIES, AND LAGRANGE INTERPOLATION"
2.2 Notation and definitions
2.3 Truncated Taylor series
2.4 Lagrange interpolation
2.5 Divided differences
2.6 Differentiating the error
3 THE USE OF SUCCESSIVE INTERPOLATION FOR FINDING SIMPLE ZEROS OF A FUNCTION AND ITS DERIVATIVES
3.2 The definition of order
3.3 Convergence to a zero
3.4 Superlinear convergence
3.5 Strict superlinear convergence
3.6 The exact order of convergence
3.7 Stronger results for q = 1 and 2
3.8 Accelerating convergence
3.9 Some numerical examples
4 AN ALGORITHM WITH GUARANTEED CONVERGENCE FOR FINDING A ZERO OF A FUNCTION
4.2 The algorithm
4.3 Convergence properties
4.4 Practical tests
4.6 ALGOL 60 procedures
5 AN ALGORITHM WITH GUARANTEED CONVERGENCE FOR FINDING A MINIMUM OF A FUNCTION OF ONE VARIABLE
5.2 Fundamental limitations because of rounding errors
5.3 Unimodality and d-unimodality
5.4 An algorithm analogous to Dekker's algorithm
6 GLOBAL MINIMIZATION GIVEN AN UPPER BOUND ON THE SECOND DERIVATIVE
6.2 The basic theorems
6.3 An algorithm for global minimization
6.4 The rate of convergence in some special cases
6.5 A lower bound on the number of function evaluations required
6.6 Practical tests
6.7 Some extensions and generalizations
6.8 An algorithm for global minimization of a function of several variables
6.9 Summary and conclusions
6.10 ALGOL 60 procedures
7 A NEW ALGORITHM FOR MINIMIZING A FUNCTION OF SEVERAL VARIABLES WITHOUT CALCULATING DERIVATIVES
7.1 Introduction and survey of the literature
7.2 The effect of rounding errors
7.3 Powell's algorithm
7.4 The main modification
7.5 The resolution ridge problem
7.6 Some further details
7.7 Numerical results and comparison with other methods
7.9 An ALGOL W procedure and test program
APPENDIX: FORTRAN subroutines