Read an Excerpt
Analytical Methods of Optimization
By D. F. Lawden
Dover Publications, Inc.Copyright © 2003 D. F. Lawden
All rights reserved.
1.1 Statement of the problem
The fundamental problem which we shall study can be expressed in the following terms: Suppose the state of a given physical system can be changed by varying the values of N physical quantities denoted by u1, u2, ..., uN. This process whereby the system's state is influenced by an external agent or operator will be referred to as controlling the system and the quantities ur (r = 1,2, ..., N) will be called the control variables. The behaviour of the system or the configuration it adopts as a consequence of any particular choice of control variables, we shall designate the system's response. It will be desired to control the system in such a manner that its response is as close as possible to a standard the operator has his own reasons for striving to attain. It will always be assumed that the operator's success in achieving this object can be measured by giving the value of a quantity C called the performance criterion (or index). Thus, a response resulting in a large value of C may be judged superior to any other response which gives a smaller value. In these circumstances, our problem is to choose values for the control variables which maximize C. The system response which results from such a choice will be said to be optimal and our problem may accordingly be described as the optimization of systems.
In some cases, it is more natural to assess the performance of a system by reference to the cost to the operator of eliciting a particular response. The quantity C will then be termed the cost or penalty index and the optimization problem will be to minimize its value.
In this chapter, we shall investigate the optimization of systems whose responses to control are of a particularly simple character. Thus, having set the control variables to chosen values, it will be assumed that the system under consideration responds by adopting a corresponding fixed configuration and that the excellence or otherwise of this configuration for the operator's purpose can be assessed by calculation of an index C; i.e. C will be a function of the control variables. For example, if the system is a yacht, the angles made by the sails and boat axis with the wind could be taken as control variables; having set these, the sailing posture becomes determinate and the boat's velocity is calculable; by choosing the upwind component of this velocity as the performance index to be maximized, a simple optimization problem is formulated. Systems of this type will be called static systems.
In later chapters, a more complex situation will be analysed. It will be supposed that the control variables are permitted to vary with the time t in any manner and that this causes the system to move continuously through a succession of states. The quality of the overall response of the system during this period of motion will be taken to be measured by an index C, but C will no longer be a simple function of the ur, since it will depend upon the infinity of values taken by each control variable during its interval of variation. In these circumstances, C is said to be a functional of the functions ur(t). Systems of this type will be termed dynamic systems.
1.2 Unconstrained control
We are supposing that C is a known function of the control variables, so that we can write
C = C(u1, u2, ..., uN). (1.2.1)
This function will be assumed to possess continuous first and second order partial derivatives. The variables ur may be thought of as the components of a vector in an N-dimensional Euclidean space (the control space); this vector will be denoted by u and will be termed the control vector. We shall employ the usually accepted notation and exhibit the components of u as the elements of a column matrix, thus:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. (1.2.2)
If the column is not displayed, it will be convenient to write it in the form (u1, u2, ..., uN)T, the superscript T implying that the matrix to which it is attached is to be transposed. Equation (1.2.1) will frequently be abbreviated to read C = C(u).
We shall suppose that the values which can be taken by the ur are all real, but are not otherwise restricted in any way. The control vector u is then said to be unconstrained. In future, unless otherwise stated, u1, u2, ..., uN will denote the optimal values of the variables, i.e. the values which maximize or minimize C, depending upon the nature of the problem. To derive necessary conditions to be satisfied by these optimal values, it will be necessary to examine the values taken by C at neighbouring points in the control space; such points will be taken to have coordinates vr = ur + ε [xi]r. If, now, the [xi]r are given arbitrary fixed values and ε is regarded as a variable, C(u + ε [xi]) will be a function of ε having a minimum (or a maximum) at ε = 0. It follows from a well-known theorem from the theory of functions of a single variable that
dC/dε = 0, d2C/dε2 ≥ 0 (1.2.3)
at ε = 0. (N.B. For simplicity of statement it will be assumed in future, unless otherwise stated, that C is to be minimized; it will be left for the reader to reverse the appropriate inequalities in case C is to be maximized.) Conditions (1.2.3) are clearly valid for all sets of values of the [xi]r.
Assuming that the function C possesses its differentiability and continuity properties over a domain of the control space including the optimal point, it follows that, for sufficiently small,
dC/dε = [partial derivative]C/[partial derivative]vr dvr/dε = [partial derivative]C/[partial derivative]vr [xi]r; (1.2.4)
the summation with respect to r over its values 1, 2, ..., N is here indicated by repetition of the index (the repeated index summation convention will be operative throughout the remainder of this text unless otherwise stated). Putting ε = 0 in equation (1.2.4), since then vr = ur the first of the conditions (1.2.3) takes the form
[partial derivative]C/[partial derivative]ur [xi]r = 0. (1.2.5)
But this condition must be satisfied for all values of the [xi]r and it accordingly follows that
[partial derivative]C/[partial derivative]ur = 0. (1.2.6)
A second differentiation of C with respect to e and the setting of ε equal to zero yields the result
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (1.2.7)
Repetition of both indices r, s implies that a double summation must be carried out on the right-hand member of this equation, which is accordingly a quadratic form in the variables [xi]r. The second of the conditions (1.2.3) requires that this form must be positive semi-definite, i.e. may vanish for a non-null set of values of the [xi]r, but is negative for no set of values of these parameters. A necessary and sufficient condition for this to be true is that the eigenvalues of the symmetric N × N matrix A = (ars), where ars = [partial derivative]2C/[partial derivative]ur]partial derivative]us, should be positive or zero (the eigenvalues of A are defined to be the values of α for which the determinant of A - αI vanishes; I is the unit N × N matrix).
The point v = (v1, v2,..., vN) will be said to belong to a neigh-bourhood of the optimal point of radius δ if
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. (1.2.8)
Suppose the quadratic form (1.2.7) is positive definite, i.e. positive and non- vanishing for all non-null sets of values of the [xi]r. Then, the eigenvalues of the matrix A will all be positive and non-zero. Since we are assuming the partial derivatives [partial derivative]2C/[partial derivative]vr]partial derivative]vs to be continuous over a domain containing the optimal point, it will be possible to find a neighbourhood Δ of this point of radius δ within which the ε eigenvalues associated with the quadratic form ([partial derivative]2C/[partial derivative]vr [partial derivative]vs)[xi]r [xi]s are also positive and non-vanishing (these eigenvalues being continuous functions of the elements ars of A). Thus, within Δ, ([partial derivative]2C/[partial derivative]vr]partial derivative] vs)[xi]r[xi]s is positive definite. Let u + η be +any point of Δ; then, by Taylor's theorem and using equations (1.2.6),
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (1.2.9)
where, in the second derivatives, v = u + θη (0 < θ < 1). Since the quadratic form in the ηr, is positive definite, we can now conclude that
C(u + η) >C(u), u + η [member of] Δ, η ≠ 0, (1.2.10)
This is to say that C possesses a local minimum at the point u. Thus, we have proved that sufficient conditions for a local minimum are equations (1.2.6) and the positive definiteness of the form (1.2.7).
PROBLEM 1. Calculate the maxima and minima of the function
C(x, y) = x2 - 12xy + y3 - 63x - 63y.
Solution: At a stationary point
[partial derivative]C/[partial derivative]x = 3x2 - 12y - 63 = 0, (1.2.11)
[partial derivative]C/[partial derivative]y = -12x + 3y2 - 63 = 0. (1.2.12)
Hence, 3x2 - 12y = - 12x + 3y2, or (x - y)(x + y) = 4(y - x).
Thus, (i) x = y or (ii) x + y + 4 = 0. In the first case, substitution for y in equation (1.2.11) leads to x2 - 4x - 21 = 0; i.e. x = - 3, 7, yielding stationary points (- 3, - 3), (7, 7). In the second case, substitution for y gives x2 + 4x - 5 = 0; thus, x = 1, - 5 and the stationary points are (1, - 5), ( - 5, 1).
To determine the nature of the stationary points, we calculate the second derivatives:
[partial derivative]2C/[partial derivative]x2 = 6x, [partial derivative]2C/[partial derivative]x]partial derivative]y = -12, [partial derivative]2C/[partial derivative]y2 = 6y
Hence, at the stationary point (- 3, - 3)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
|A - αI| vanishes when α = - 6, - 30. Since both these eigenvalues are negative, (- 3, - 3) is a local maximum; we find Cmax = 216.
Similarly, at the point (7, 7), the eigenvalues are found to be 30, 54 and this point is therefore a local minimum (Cmin = - 784).
At both the remaining stationary points (1, - 5), (- 5, 1), the eigenvalues are the roots of the equation α2 + 24α - 324 = 0. Since the product of these roots is negative, the eigenvalues have opposite signs and the points are neither maxima nor minima.
PROBLEM 2. The force exerted by the wind on the sail of a yacht is proportional to the square of the wind velocity and to the sine of the angle made by the direction of the wind with the sail; its direction is normal to the sail. Assuming that the drag on the boat is proportional to the square of its velocity, calculate the course which must be steered and the setting of the sail if the component of the yacht's velocity in the upwind direction is to be a maximum.
Solution: In Fig. 1.1, θ is the angle made by the axis of the yacht and φ is the angle made by the plane of the sail with the upward wind direction. The force exerted by the wind on the sail will be taken to be F = cW2 sin φ (we neglect the effect the yacht's motion has on the velocity with which the wind strikes the sail). If D is the drag force opposing the yacht's motion and V is its velocity, then D = kV2. Assuming uniform motion, the force components along the boat axis must balance and, hence,
kV2 = cW2 sin φ sin(θ - φ).
It is required to choose θ and φ so that V cos θ is maximized. Since
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
it is more convenient to maximize
C = C(θ, φ) = sin φ sin(θ - φ)cos2 θ.
Necessary conditions for a maximum are:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].
Rejecting the possibility that cos θ vanishes, the second of these equations is equivalent to tan φ= tan(θ - φ); thus, φ = θ - φ, or φ = 1/2θ. Substituting for φ in the first equation, it is now easy to show that (i.e. θ = 48°, φ = 24° approximately).
Excerpted from Analytical Methods of Optimization by D. F. Lawden. Copyright © 2003 D. F. Lawden. Excerpted by permission of Dover Publications, Inc..
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.