CHAPTER 1
Index Notation
1.1 Introduction
It is a fact not widely acknowledged that, with appropriate choice of notation, many multivariate statistical calculations can be made simpler and more transparent than the corresponding univariate calculations. This simplicity is achieved through the systematic use of index notation and special arrays called tensors. For reasons that are given in the following sections, matrix notation, a reliable workhorse for many second-order calculations, is totally unsuitable for more complicated calculations involving either non-linear functions or higher-order moments. The aim of this book is to explain how index notation simplifies certain statistical calculations, particularly those involving moments or cumulants of nonlinear functions. Other applications where index notation greatly simplifies matters include k-statistics, Edgeworth and conditional Edgeworth approximations, saddlepoint and Laplace approximations, calculations involving conditional cumulants, moments of maximum likelihood estimators, likelihood ratio statistics and the construction of ancillary statistics. These topics are the subject matter of later chapters.
In some ways, the most obvious and, at least initially, one of the most disconcerting aspects of index notation is that the components of the vector of primary interest, usually a parameter $\theta$, or a random variable $X$, are indexed using superscripts. Thus, $\theta^2$ is the second component of the vector $\theta$, which is not to be confused with the square of any component. Likewise $X^3$ is the third component of $X$, and so on. For that reason, powers are best avoided unless the context leaves no room for ambiguity. In principle, $\theta^2\theta^3$ is the product of two components, which implies that the square of $\theta^2$ is written as $\theta^2\theta^2$. In view of the considerable advantages achieved, this is a modest premium to pay.
1.2 The summation convention
Index notation is a convention for the manipulation of multi-dimensional arrays. The values inserted in these arrays are real or complex numbers, called either components or coefficients depending on the context. Technically speaking, each vector in the original space has components with respect to the given basis; each vector in the dual space of linear functionals has coefficients with respect to the dual basis.
In the setting of parametric inference and in manipulations associated with likelihood functions, it is appropriate to take the unknown parameter as the vector of interest: see the first example in Section 1.4. Here, however, we take as our vector of interest the p-dimensional random variable $X$ with components $X^1, \ldots, X^p$. An array of constants $a_1, \ldots, a_p$ used in the formation of a linear combination $\sum a_i X^i$ is called the coefficient vector. The distinction between components and coefficients is merely a matter of convention, but it appears to be useful and the notation emphasizes it. Thus, for example, $\kappa^i = E(X^i)$ is a one-dimensional array whose components are the means of the components of $X$, and $\kappa^{ij} = E(X^i X^j)$ is a two-dimensional array whose components are functions of the joint distributions of pairs of variables.
Probably the most convenient aspect of index notation is the implied summation over any index repeated once as a superscript and once as a subscript. The range of summation is not stated explicitly but is implied by the positions of the repeated index and by conventions regarding the range of the index. Thus,
$$a_i X^i = a_1 X^1 + a_2 X^2 + \cdots + a_p X^p \qquad (1.1)$$
specifies a linear combination of the $X$s with coefficients $a_1, \ldots, a_p$. Quadratic and cubic forms in $X$ with coefficients $a_{ij}$ and $a_{ijk}$ are written in the form
$$a_{ij} X^i X^j \quad \text{and} \quad a_{ijk} X^i X^j X^k \qquad (1.2)$$
and the extension to homogeneous polynomials of arbitrary degree is immediate.
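As a concrete illustration (not from the text; the arrays and their values are hypothetical), the forms (1.1) and (1.2) can be evaluated with numpy's einsum, whose repeated letters implement the implied summation. Note that numpy does not distinguish superscripts from subscripts, so the coefficient/component distinction is carried only in the comments.

```python
# Minimal sketch of the summation convention via numpy.einsum.
import numpy as np

p = 3
rng = np.random.default_rng(0)
X = rng.normal(size=p)                  # components X^1, ..., X^p
a = rng.normal(size=p)                  # coefficients a_1, ..., a_p
A = rng.normal(size=(p, p))
A = (A + A.T) / 2                       # symmetric coefficients a_{ij}
A3 = rng.normal(size=(p, p, p))         # coefficients a_{ijk}

linear = np.einsum('i,i->', a, X)            # a_i X^i, expression (1.1)
quadratic = np.einsum('ij,i,j->', A, X, X)   # a_{ij} X^i X^j, expression (1.2)
cubic = np.einsum('ijk,i,j,k->', A3, X, X, X)  # a_{ijk} X^i X^j X^k
```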
For the sake of simplicity, and with no loss of generality, we take all multiply-indexed arrays to be symmetric under index permutation but, of course, subscripts may not be interchanged with superscripts. The value of this convention is clearly apparent when we deal with scalars such as $a_{ij}a_{kl}\omega^{ijkl}$, which, by convention only, is the same as $a_{ik}a_{jl}\omega^{ijkl}$ and $a_{il}a_{jk}\omega^{ijkl}$. For instance, if $p = 2$ and $a_{ij} = \delta_{ij}$ ($= 1$ if $i = j$ and $0$ otherwise), then, without the convention,
$$a_{ij}a_{kl}\omega^{ijkl} - a_{ik}a_{jl}\omega^{ijkl} = \omega^{1122} + \omega^{2211} - \omega^{1212} - \omega^{2121}$$
and this is not zero unless $\omega^{ijkl}$ is symmetric under index permutation.
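The role of the symmetry convention can be checked directly. The sketch below (with $p = 2$, $a_{ij} = \delta_{ij}$ and a randomly generated, deliberately non-symmetric $\omega$; not from the text) shows that the three contractions differ for a general array and agree once $\omega$ is symmetrized.

```python
# Check that the three contractions coincide only for symmetric omega.
import numpy as np
from itertools import permutations

p = 2
delta = np.eye(p)
rng = np.random.default_rng(1)
w = rng.normal(size=(p, p, p, p))               # generic, not symmetric

s1 = np.einsum('ij,kl,ijkl->', delta, delta, w)
s2 = np.einsum('ik,jl,ijkl->', delta, delta, w)
s3 = np.einsum('il,jk,ijkl->', delta, delta, w)
print(s1 - s2, s1 - s3)                         # generally nonzero

# Symmetrize over all 24 index permutations; the scalars then agree.
w_sym = sum(np.transpose(w, perm) for perm in permutations(range(4))) / 24
t1 = np.einsum('ij,kl,ijkl->', delta, delta, w_sym)
t2 = np.einsum('ik,jl,ijkl->', delta, delta, w_sym)
print(np.isclose(t1, t2))                       # True
```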
Expressions (1.1) and (1.2) produce one-dimensional or scalar quantities, in this case scalar random variables. Suppose instead that we wish to construct a vector random variable $Y$ with components $Y^1, \ldots, Y^q$, each of which is linear in $X$, i.e.,
$$Y^r = a^r_i X^i, \qquad (1.3)$$
where $r = 1, \ldots, q$ is known as a free index. Similarly, if the components of $Y$ are homogeneous quadratic forms in $X$, we may write
$$Y^r = a^r_{ij} X^i X^j. \qquad (1.4)$$
Non-homogeneous quadratic polynomials in $X$ may be written in the form
$$Y^r = a^r + a^r_i X^i + a^r_{ij} X^i X^j.$$
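The free-index expressions (1.3), (1.4) and the non-homogeneous polynomial above can be evaluated in the same way; in this sketch (hypothetical arrays, not from the text) the unsummed einsum letter r plays the role of the free index.

```python
# Free indices: Y^r = a^r_i X^i, Y^r = a^r_{ij} X^i X^j, and the
# non-homogeneous quadratic Y^r = a^r + a^r_i X^i + a^r_{ij} X^i X^j.
import numpy as np

p, q = 3, 2
rng = np.random.default_rng(2)
X = rng.normal(size=p)
a0 = rng.normal(size=q)                  # a^r
a1 = rng.normal(size=(q, p))             # a^r_i
a2 = rng.normal(size=(q, p, p))
a2 = (a2 + a2.transpose(0, 2, 1)) / 2    # a^r_{ij}, symmetric in i, j

Y_linear = np.einsum('ri,i->r', a1, X)        # (1.3)
Y_quad = np.einsum('rij,i,j->r', a2, X, X)    # (1.4)
Y_poly = a0 + Y_linear + Y_quad               # non-homogeneous form
```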
Where two sets of indices are required, as in (1.3) and (1.4), one referring to the components of $X$ and the other to the components of $Y$, we use the sets of indices $i, j, k, \ldots$ and $r, s, t, \ldots$. Occasionally it will be necessary to introduce a third set, $\alpha, \beta, \gamma, \ldots$, but this usage will be kept to a minimum.
All of the above expressions could, with varying degrees of difficulty, be written using matrix notation. For example, (1.1) is typically written as $a^T X$ where $a$ and $X$ are column vectors; the quadratic expression in (1.2) is written $X^T A X$ where $A$ is symmetric, and (1.3) becomes $Y = A^* X$ where $A^*$ is of order $q \times p$. From these examples, it is evident that there is a relationship of sorts between column vectors and the use of superscripts, but the notation $X^T A X$ for $a_{ij} X^i X^j$ violates the relationship. The most useful distinction is not in fact between rows and columns but between coefficients and components, and it is for that reason that index notation is preferred here.
1.3 Tensors
The term tensor is used in this book in a well-defined sense, similar in spirit to its meaning in differential geometry but with minor differences in detail. It is not used as a synonym for array, index notation or the summation convention. A cumulant tensor, for example, is a symmetric array whose components are functions of the joint distribution of the random variable of interest, X say. The values of these components in any one coordinate system are real numbers but, when we describe the array as a tensor, we mean that the values in one coordinate system, Y say, can be obtained from those in any other system, X say, by the application of a particular transformation formula. The nature of this transformation is the subject of Sections 3.4 and 4.5, and in fact, we consider not just changes of basis, but also non-invertible transformations.
When we use the adjectives covariant and contravariant in reference to tensors, we refer to the way in which the arrays transform under a change of variables from the original $x$ to new variables $y$. In statistical calculations connected with likelihood functions, $x$ and $y$ are typically parameter vectors, but in Chapters 2 and 3, $x$ and $y$ refer to random variables. To define the adjectives covariant and contravariant more precisely, we suppose that $\omega$ is a $d$-dimensional array whose elements are functions of the components of $x$, taken $d$ at a time. We write $\omega^{i_1 i_2 \cdots i_d}$, where the $d$ indices need not be distinct. Consider the transformation $y = g(x)$ from $x$ to new variables $y = (y^1, \ldots, y^p)$, and let $a^r_i = a^r_i(x) = \partial y^r / \partial x^i$ have full rank for all $x$. If $\bar\omega$, the value of $\omega$ for the transformed variables, satisfies
$$\bar\omega^{r_1 r_2 \cdots r_d} = \omega^{i_1 i_2 \cdots i_d}\, a^{r_1}_{i_1} a^{r_2}_{i_2} \cdots a^{r_d}_{i_d} \qquad (1.5)$$
then $\omega$ is said to be a contravariant tensor. On the other hand, if $\omega$ is a covariant tensor, we write $\omega_{i_1 i_2 \cdots i_d}$ and the transformation law for covariant tensors is
$$\bar\omega_{r_1 r_2 \cdots r_d} = \omega_{i_1 i_2 \cdots i_d}\, b^{i_1}_{r_1} b^{i_2}_{r_2} \cdots b^{i_d}_{r_d} \qquad (1.6)$$
where $b^i_r = \partial x^i / \partial y^r$, the matrix inverse of $a^r_i$, satisfies $a^r_i b^i_s = \delta^r_s$.
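For a linear change of variables the laws (1.5) and (1.6) are easy to verify numerically. The sketch below (hypothetical arrays, not from the text; numpy's einsum does not distinguish upper from lower indices, so index placement is tracked in comments) applies both laws and checks that transforming back with the inverse matrix recovers the original components.

```python
# Contravariant law (1.5) and covariant law (1.6) under y = A x,
# so a^r_i is the constant matrix A and b^i_r is its matrix inverse.
import numpy as np

p = 3
rng = np.random.default_rng(3)
a = rng.normal(size=(p, p))                 # a^r_i = dy^r/dx^i, assumed full rank
b = np.linalg.inv(a)                        # b^i_r = dx^i/dy^r

w_contra = rng.normal(size=(p, p))          # omega^{ij} in x-coordinates
w_co = rng.normal(size=(p, p, p))           # omega_{ijk} in x-coordinates

w_contra_bar = np.einsum('ij,ri,sj->rs', w_contra, a, a)       # law (1.5)
w_co_bar = np.einsum('ijk,ir,js,kt->rst', w_co, b, b, b)       # law (1.6)

# Round trip: apply the inverse transformation and recover the originals.
back_contra = np.einsum('rs,ir,js->ij', w_contra_bar, b, b)
back_co = np.einsum('rst,ri,sj,tk->ijk', w_co_bar, a, a, a)
print(np.allclose(back_contra, w_contra), np.allclose(back_co, w_co))  # True True
```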
The function g(.) is assumed to be an element of some group, either specified explicitly or, more commonly, to be inferred from the statistical context. For example, when dealing with transformations of random variables or their cumulants, we usually work with the general linear group (1.3) or the general affine group (1.8). Occasionally, we also work with the smaller orthogonal group, but when we do so, the group will be stated explicitly so that the conclusions can be contrasted with those for the general linear or affine groups. On the other hand, when dealing with possible transformations of a vector of parameters, it is natural to consider non-linear but invertible transformations and g(.) is then assumed to be a member of this much larger group. In other words, when we say that an array of functions is a tensor, the statement has a well defined meaning only when the group of transformations is specified or understood.
It is possible to define hybrid tensors having both subscripts and superscripts that transform in the covariant and contravariant manner respectively. For example, if $\omega^{ij}$ and $\omega_{ijk}$ are both tensors, then the product $\lambda^{ij}_{klm} = \omega^{ij}\omega_{klm}$ is a tensor of covariant order 3 and contravariant order 2. Furthermore, we may sum over pairs of indices, a process known as contraction, giving
$$\lambda^i_{kl} = \lambda^{ij}_{klj} = \omega^{ij}\omega_{klj}.$$
A straightforward calculation shows that $\lambda^i_{kl}$ is a tensor because, under transformation of variables, the transformed value is
$$\bar\lambda^{rs}_{tuv} = \lambda^{ij}_{klm}\, a^r_i a^s_j\, b^k_t b^l_u b^m_v$$
and hence, setting $v = s$ and summing gives
$$\bar\lambda^{rs}_{tus} = \lambda^{ij}_{klm}\, a^r_i b^k_t b^l_u\, (a^s_j b^m_s) = \lambda^{ij}_{klj}\, a^r_i b^k_t b^l_u = \bar\lambda^r_{tu},$$
since $a^s_j b^m_s = \delta^m_j$.
Thus, the tensor transformation property is preserved under multiplication and under contraction. An important consequence of this property is that scalars formed by contraction of tensors must be invariants. In effect, they must satisfy the transformation law of zero order tensors. See Section 2.6.
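These multiplication and contraction properties can be checked numerically. The following sketch (linear change of variables, randomly generated arrays, all names hypothetical) verifies that contracting $\omega^{ij}\omega_{klj}$ and then transforming agrees with transforming the two factors separately and contracting in the new coordinates.

```python
# Contraction preserves the tensor transformation property.
import numpy as np

p = 3
rng = np.random.default_rng(4)
a = rng.normal(size=(p, p))                 # a^r_i, full rank
b = np.linalg.inv(a)                        # b^i_r

w2 = rng.normal(size=(p, p))                # omega^{ij}
w3 = rng.normal(size=(p, p, p))             # omega_{klm}

lam = np.einsum('ij,klj->ikl', w2, w3)      # lambda^i_{kl} in x-coordinates

# Transform the factors separately, then contract in y-coordinates.
w2_bar = np.einsum('ij,ri,sj->rs', w2, a, a)
w3_bar = np.einsum('klm,kt,lu,mv->tuv', w3, b, b, b)
lam_bar = np.einsum('rs,tus->rtu', w2_bar, w3_bar)

# Same answer as transforming lambda^i_{kl} directly as a hybrid tensor.
lam_direct = np.einsum('ikl,ri,kt,lu->rtu', lam, a, b, b)
print(np.allclose(lam_bar, lam_direct))     # True
```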
One of the problems associated with tensor notation is that it is difficult to find a satisfactory notation for tensors of arbitrary order. The usual device is to use subscripted indices as in (1.5) and (1.6), but this notation is aesthetically unpleasant and is not particularly easy to read. For these reasons, subscripted indices will be avoided in the remainder of this book. Usually we give explicit expressions involving up to three or four indices. The reader is then expected to infer the necessary generalization, which is of the type (1.5), (1.6) if we work with tensors but is usually more complicated if we work with arbitrary arrays.
1.4 Examples
In this and in the following chapter, $X$ and $Y$ are random variables with components $X^i$ and $Y^i$. Transformations are linear or affine. When we work with log likelihood derivatives, it is more appropriate to contemplate transformation of the parameter vector, and the terms covariant and contravariant then refer to parameter transformations and not to data transformations. To take a simple example, relevant to statistical theory, let $l(\theta; Z) = \log f_Z(Z; \theta)$ be the log likelihood function for $\theta = (\theta^1, \ldots, \theta^p)$ based on observations $Z$. The partial derivatives of $l$ with respect to the components of $\theta$ may be written
$$U_r(\theta) = \partial l(\theta; Z)/\partial\theta^r, \qquad U_{rs}(\theta) = \partial^2 l(\theta; Z)/\partial\theta^r\partial\theta^s$$
and so on. The maximum likelihood estimate of $\theta$ satisfies $U_r(\hat\theta) = 0$, and the observed information for $\theta$ is $I_{rs} = -U_{rs}(\hat\theta)$, with matrix inverse $I^{rs}$. Suppose now that we were to re-parameterize in terms of $\phi = (\phi^1, \ldots, \phi^p)$. If we denote derivatives with respect to $\phi$ by an asterisk, we have
$$U^*_r = \theta^i_r U_i, \qquad U^*_{rs} = \theta^i_r \theta^j_s U_{ij} + \theta^i_{rs} U_i \qquad (1.7)$$
where
$$\theta^i_r = \partial\theta^i/\partial\phi^r, \qquad \theta^i_{rs} = \partial^2\theta^i/\partial\phi^r\partial\phi^s,$$
and the derivative matrix $\theta^i_r$ is assumed to have full rank, with inverse $\phi^r_i = \partial\phi^r/\partial\theta^i$. Arrays that transform like $U_r$, $I_{rs}$ and $I^{rs}$ are tensors, the first two being covariant of orders 1 and 2 respectively and the third being contravariant of order 2. The second derivative, $U_{rs}(\theta)$, is not a tensor on account of the presence of the second derivatives $\theta^i_{rs}$ in the above transformation law. Note also that the array $U^*_{rs}$ cannot be obtained by transforming the array $U_{rs}$ alone: it is necessary also to know the value of the array $U_r$. However $-E\{U_{rs}(\theta); \theta\}$, the Fisher information for $\theta$, is a tensor because the second term in $U^*_{rs}$ has mean zero at the true $\theta$.
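To see (1.7) in action, here is a hedged numerical sketch: the normal model, the data values and the reparameterization $\phi = (\mu, \log\sigma)$ are all chosen for the example and are not taken from the text. The score transforms covariantly, while $U_{rs}$ picks up the extra term $\theta^i_{rs} U_i$.

```python
# Finite-difference check of the transformation law (1.7).
import numpy as np

z = np.array([0.3, -1.2, 0.7, 2.1])                  # hypothetical data

def loglik_theta(theta):                              # theta = (mu, sigma)
    mu, sigma = theta
    return -len(z) * np.log(sigma) - np.sum((z - mu) ** 2) / (2 * sigma ** 2)

def theta_of_phi(phi):                                # phi = (mu, log sigma)
    return np.array([phi[0], np.exp(phi[1])])

def grad(f, x, h=1e-5):
    x = np.asarray(x, float)
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))])

def hess(f, x, h=1e-4):
    return np.array([grad(lambda y: grad(f, y, h)[i], x, h) for i in range(len(x))])

phi = np.array([0.5, -0.3])
theta = theta_of_phi(phi)

U = grad(loglik_theta, theta)                         # U_i
U2 = hess(loglik_theta, theta)                        # U_{ij}
A = np.array([grad(lambda q: theta_of_phi(q)[i], phi) for i in range(2)])   # theta^i_r
A2 = np.array([hess(lambda q: theta_of_phi(q)[i], phi) for i in range(2)])  # theta^i_{rs}

U_star = grad(lambda q: loglik_theta(theta_of_phi(q)), phi)
U2_star = hess(lambda q: loglik_theta(theta_of_phi(q)), phi)

print(np.allclose(U_star, np.einsum('ir,i->r', A, U), atol=1e-4))
print(np.allclose(U2_star,
                  np.einsum('ir,js,ij->rs', A, A, U2) + np.einsum('irs,i->rs', A2, U),
                  atol=1e-3))
```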
To take a second example, closer in spirit to the material in the following two chapters, let $X = (X^1, \ldots, X^p)$ have mean vector $\kappa^i = E(X^i)$ and covariance matrix
$$\kappa^{i,j} = \mathrm{cov}(X^i, X^j) = E(X^i X^j) - \kappa^i\kappa^j.$$
Suppose we make an affine transformation from $X$ to new variables $Y = (Y^1, \ldots, Y^q)$, where
$$Y^r = a^r + a^r_i X^i. \qquad (1.8)$$
The mean vector and covariance matrix of Y are easily seen to be
$$E(Y^r) = a^r + a^r_i \kappa^i, \qquad \mathrm{cov}(Y^r, Y^s) = a^r_i a^s_j \kappa^{i,j},$$
where $a^r_i = \partial Y^r/\partial X^i$. Thus, even though the transformation may not be invertible, the covariance array transforms like a contravariant tensor. Arrays that transform in this manner, but only under linear or affine transformation of $X$, are sometimes called Cartesian tensors (Jeffreys, 1952). Such transformations are of special interest because $a^r_i = \partial Y^r/\partial X^i$ does not depend on $X$. It will be shown that cumulants of order two or more are not tensors in the sense usually understood in differential geometry, but they do behave as tensors under the general affine group (1.8). Under non-linear transformation of $X$, the cumulants transform in a more complicated way, as discussed in Section 4.4.
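A brief sketch of the computation above (all arrays hypothetical, not from the text): with $q < p$ the matrix $a^r_i$ is not invertible, yet the covariance array still transforms by the contravariant law, which in matrix notation is $A\Sigma A^T$.

```python
# Mean and covariance of Y under the affine transformation (1.8).
import numpy as np

p, q = 4, 2
rng = np.random.default_rng(5)
kappa1 = rng.normal(size=p)                       # kappa^i = E(X^i)
L = rng.normal(size=(p, p))
kappa2 = L @ L.T                                  # kappa^{i,j} = cov(X^i, X^j)

a0 = rng.normal(size=q)                           # a^r
a = rng.normal(size=(q, p))                       # a^r_i, of order q x p

meanY = a0 + np.einsum('ri,i->r', a, kappa1)      # E(Y^r) = a^r + a^r_i kappa^i
covY = np.einsum('ri,sj,ij->rs', a, a, kappa2)    # cov(Y^r, Y^s) = a^r_i a^s_j kappa^{i,j}
print(np.allclose(covY, a @ kappa2 @ a.T))        # True: the matrix form A Sigma A^T
```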
A tensor whose components are unaffected by coordinate transformation is called isotropic. This terminology is most commonly used in applications to mechanics and the physics of fluids, where all three coordinate axes are measured in the same physical units. In these contexts, the two groups of most relevance are the orthogonal group $O$, and the orthogonal group with positive determinant, $O^+$. In either case $\delta^{ij}$, $\delta_{ij}$ and $\delta^i_j$ are isotropic tensors. There is exactly one isotropic third-order tensor under $O^+$ (Exercise 1.22). However, this tensor, called the alternating tensor, is anti-symmetrical and does not occur in the remainder of this book. All fourth-order isotropic tensors are functions of the three second-order isotropic tensors (Jeffreys, 1952, Chapter 7). The only symmetrical isotropic fourth-order tensors are scalar multiples of
$$\delta^{ij}\delta^{kl} + \delta^{ik}\delta^{jl} + \delta^{il}\delta^{jk}$$
(Thomas, 1965, Section 7). Isotropic tensors play an important role in physics (see Exercise 1.21) but only a minor role in statistics.
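As a quick numerical check (random orthogonal matrix, not from the text), the following sketch verifies that $\delta_{ij}$ and the symmetric fourth-order combination above are isotropic, i.e. their components are unchanged under the orthogonal group.

```python
# Isotropy of the Kronecker delta and the symmetric fourth-order combination.
import numpy as np

p = 3
rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))      # random orthogonal a^r_i

d = np.eye(p)
d4 = (np.einsum('ij,kl->ijkl', d, d) + np.einsum('ik,jl->ijkl', d, d)
      + np.einsum('il,jk->ijkl', d, d))

d_bar = np.einsum('ij,ri,sj->rs', d, Q, Q)
d4_bar = np.einsum('ijkl,ri,sj,tk,ul->rstu', d4, Q, Q, Q, Q)
print(np.allclose(d_bar, d), np.allclose(d4_bar, d4))   # True True
```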
Excerpted from "Tensor Methods in Statistics" by Peter McCullagh.
Copyright © 2018 Peter McCullagh. Excerpted by permission of Dover Publications, Inc.