                    Matrix Perturbation Theory
G. W. Stewart
Computer Science Department
Institute for Advanced Computer Studies
University of Maryland
College Park, Maryland
Ji-guang Sun
Computing Center of the Chinese Academy of Sciences
Beijing, China
This is a volume in
COMPUTER SCIENCE AND SCIENTIFIC COMPUTING
Werner Rheinboldt and Daniel Siewiorek, editors
ACADEMIC PRESS, INC.
Harcourt Brace Jovanovich, Publishers
Boston San Diego New York
London Sydney Tokyo Toronto


Contents

Preface

I Preliminaries
   1 Notation
      Notes and References
      Exercises
   2 The QR Decomposition - Projections
      2.1 The QR Decomposition
      2.2 Hadamard's Inequality
      2.3 Projections
      Notes and References
      Exercises
   3 Eigenvalues and Eigenvectors
      3.1 Definitions and Elementary Properties
      3.2 The Schur Decomposition
      3.3 The Jordan Canonical Form
      3.4 Invariant Subspaces
      3.5 The Field of Values
      3.6 Sums of Hermitian Matrices
      Notes and References
      Exercises
   4 The Singular Value Decomposition
      4.1 The Singular Value Decomposition
      4.2 Two Inequalities
      Notes and References
      Exercises
   5 Pairs of Subspaces
      5.1 The CS Decomposition
      5.2 Pairs of Subspaces
      5.3 Pairs of Projections
      Notes and References
      Exercises

II Norms and Metrics
   1 Vector Norms
      1.1 Definition
      1.2 Examples
      1.3 Equivalence and Limits
      1.4 Linear Functionals and Dual Norms
      Notes and References
      Exercises
   2 Matrix Norms
      2.1 Basic Concepts
      2.2 Operator Norms
      Notes and References
      Exercises
   3 Unitarily Invariant Norms
      3.1 Von Neumann's Theory
      3.2 Properties of Unitarily Invariant Norms
      3.3 Doubly Stochastic Matrices and Fan's Theorem
      Notes and References
      Exercises
   4 Metrics on Subspaces of C^n
      4.1 The Gap
      4.2 Unitarily Invariant Metrics
      Notes and References
      Exercises

III Linear Systems and Least Squares Problems
   1 The Pseudo-Inverse and Least Squares
      1.1 Generalized Inverses and the Pseudo-Inverse
      1.2 Projections and Least Squares
      Notes and References
      Exercises
   2 Inverses and Linear Systems
      2.1 Absolute and Relative Errors
      2.2 The Inverse Matrix
      2.3 Linear Systems
      2.4 Asymptotic Forms and Derivatives
      Notes and References
      Exercises
   3 The Pseudo-Inverse
      3.1 Projections and Acute Perturbations
      3.2 General Results
      3.3 Acute Perturbations
      3.4 Asymptotic Forms and Derivatives
      Notes and References
      Exercises
   4 Projections
      Notes and References
   5 The Linear Least Squares Problem
      5.1 Perturbation of the Coefficients
      5.2 The Residual
      5.3 Backward Perturbations
      5.4 Asymptotic Forms and Derivatives
      Notes and References
      Exercises

IV The Perturbation of Eigenvalues
   1 General Perturbation Theorems
      1.1 Continuity: Ostrowski-Elsner Theorems
      1.2 The Bauer-Fike and Henrici Theorems
      1.3 Residual Bounds
      Notes and References
      Exercises
   2 Gerschgorin Theory: Differentiability
      2.1 Gerschgorin's Theorem
      2.2 Diagonal Similarities
      Notes and References
      Exercises
   3 Normal and Diagonalizable Matrices
      3.1 The Hoffman-Wielandt Theorem
      3.2 Diagonalizable Matrices
      Notes and References
      Exercises
   4 Hermitian Matrices
      4.1 Inertia and Interlacing
      4.2 Wielandt's Theorem and Its Consequences
      4.3 Mirsky's Theorem
      4.4 Residual Bounds
      4.5 Approximation by a Low-Rank Matrix
      Notes and References
      Exercises
   5 Some Further Results
      5.1 Non-Hermitian Perturbations
      5.2 Similarity Bounds
      Notes and References
      Exercises

V Invariant Subspaces
   1 The Theory of Simple Invariant Subspaces
      1.1 Definition
      1.2 The Operator T = X -> AX - XB
      1.3 The Spectral Resolution
      Notes and References
      Exercises
   2 Perturbation of Invariant Subspaces
      2.1 The Approximation Problem
      2.2 Perturbation Theorems
      2.3 Eigenvectors
      2.4 Solution of a Nonlinear Equation
      Notes and References
      Exercises
   3 Hermitian Matrices
      3.1 The Approximation Theorem
      3.2 Generalized Rayleigh Quotients
      3.3 Direct Bounds
      3.4 Residual Bounds for Eigenvalues
      Notes and References
      Exercises
   4 The Singular Value Decomposition
      4.1 Two sin Θ Theorems
      4.2 A Perturbation Expansion
      Notes and References
      Exercises

VI Generalized Eigenvalue Problems
   1 Background
      1.1 Matrix Pairs
      1.2 Triangular and Weierstrass Forms
      1.3 Definite Pairs
      1.4 Metrics and Their Limitations
      Notes and References
      Exercises
   2 Regular Matrix Pairs
      2.1 Continuity, First Order Theory
      2.2 Gerschgorin Theory
      2.3 Diagonalizable Pairs
      2.4 Eigenspaces
      Notes and References
      Exercises
   3 Definite Matrix Pairs
      3.1 Eigenvalues of Definite Pairs
      3.2 Eigenspaces
      3.3 Direct Bounds
      Notes and References
      Exercises

References
Notation
Index

Preface

The central question of perturbation theory is: How does a function change when its argument is subject to a perturbation? The function may be almost anything - the modes of a vibrating system, the solution of an ordinary or partial differential equation, the states of an electron. This book is concerned with the perturbation of matrix functions, such as the solution of a linear system or the singular values of a matrix.

The result of a perturbation analysis may be a perturbation expansion or a perturbation bound. A perturbation expansion approximates the perturbation in the function in terms of a known perturbation in the argument. Perturbation expansions are widely used in the physical sciences. A perturbation bound starts with a bound on a perturbation in the argument - here the perturbations are often called errors - and uses it to bound the resulting error in the function.

Matrix perturbation theory has traditionally emphasized perturbation bounds. The reasons are varied, but two are paramount.

The first reason is the widespread use of backward rounding-error analysis, a technique in which errors made in executing an algorithm are thrown back on the original data. To complete the analysis, perturbation theory is used to assess the effects of these backward errors on the accuracy of the computed solution. Since only bounds on the error are known, only bounds on the error in the solution can be expected.

The second reason is that perturbation bounds often give insight into the behavior of a function of a matrix under perturbation. For
example, it may be possible to determine a multiplier, called a CONDITION NUMBER, that converts a bound on the error in the argument into a bound on the error induced in the function. The knowledge of a condition number enables one to say whether a function is sensitive or insensitive to perturbations of its argument. Although we shall give some perturbation expansions, we will be chiefly concerned with perturbation bounds.

Deriving perturbation bounds is like cutting a diamond. Tap a problem in just the right way and it decomposes into one or two informative expressions. Smash it with a hammer and it shatters into ugly, uninterpretable pieces. One of the purposes of this book is to introduce the reader to the art of deriving perturbation bounds.

Our book began life as a translation of a book in Chinese by Sun [229, 1987] on perturbation theory. As we progressed, however, it became clear that an expanded treatment was needed. The result is an entirely new book, in which the spirit of the old still lives.

Matrix perturbation theory is too large a field to fit between the covers of a single book, and we have had to be selective in our choice of topics. We have chosen to treat the solution of linear systems and least squares problems (Chapter III), the eigenvalue problem (Chapters IV and V), and the generalized eigenvalue problem (Chapter VI). Aesthetics has played a part in our choice of what results to present; we have generally preferred simple, informative bounds to more complicated, though technically sharper bounds. In particular, we have not been greatly concerned to seek out optimal bounds when nearly optimal ones are available for less effort.

The book is divided into chapters, sections, and subsections, with bibliographical notes and exercises at the end of each section. We have tried to keep the notation uniform. Our hero is the intrepid, yet sensitive matrix $A$. Our villain is $E$, who keeps perturbing $A$.
When $A$ is perturbed he puts on a crumpled hat: $\tilde{A} = A + E$. There are many parts for $A$ to play - he is variously square, rectangular, Hermitian, normal, and unitary. To avoid confusion, $A$'s current guise is posted at the beginning of each chapter or, if necessary, section.

The book is largely self-contained. We have made a deliberate effort to keep important material outside the proofs, which in most cases can be skipped without loss of continuity. When a proof illustrates a general technique, we point out the fact explicitly.

The notes and references emphasize original sources and historical development. They also point the reader to topics not treated, many of which are developed in the exercises. However, the bibliography, like the book, is not comprehensive, and we apologize in advance to all those whose work we may have slighted.

The assigning of names to theorems is complicated by the fact that some theorems have been discovered twice - as theorems in linear algebra and as theorems in functional analysis. We have adopted the practice of naming the first claimant, whatever his field, and adding names of people who made substantial generalizations. Thus, the theorem on low-rank approximation, known to many as the Eckart-Young theorem, is called the Schmidt-Mirsky theorem after Schmidt, who proved it for integral equations, and Mirsky, who showed it held for all unitarily invariant norms. Difficult cases are treated in the notes.

The exercises are intended to amplify the material in the text and to introduce new results. They range from the trivial to the very difficult. We have not graded them, but the reader will do well to assume that any exercise with a reference attached is a major undertaking, especially in the later chapters.

We debated whether to include numerical examples. Our decision not to was directed by the increasing availability of interactive systems that manipulate matrices.
With these systems it is a small matter to play with a perturbation bound, watching it perform under a variety of circumstances. Beside such lively exercises, a printed example is a thing of lead and stone.

We wish to thank Nick Higham, Xiaobai Sun, and Guodong Zhang for reading and commenting on parts of the text. Sven Hammarling and Vince Fernando furnished us with historical material on the singular value decomposition. We are particularly indebted to the staff of the library of the National Institute of Standards and Technology, who contributed much to the completeness of the bibliography. We are also indebted to the National Science Foundation of the People's Republic of China for its support.

College Park
Beijing
1990

Chapter I

Preliminaries

This chapter prepares the ground for the chapters that follow. To keep it reasonably short it contains only material that is used in two or more chapters. Background for the individual chapters is developed in initial introductory sections.

The first section introduces some notation. In Section 2 we introduce the QR decomposition. In Section 3 we review the elementary theory of eigenvalues, eigenvectors, and invariant subspaces. In Section 4 we introduce the singular value decomposition. The chapter concludes with a section on the geometry of pairs of subspaces.

1. Notation

In this section we will review our basic notation. Other notation will be introduced as needed. A summary of all notation is given in an appendix at the end of the book.

The real numbers will be denoted by $\mathbb{R}$. The space of all $n$-dimensional column vectors with components in $\mathbb{R}$ will be denoted by $\mathbb{R}^n$. The set of all $m \times n$ matrices with elements in $\mathbb{R}$ will be denoted by $\mathbb{R}^{m \times n}$. The complex numbers, their vectors, and matrices will be denoted by $\mathbb{C}$, $\mathbb{C}^n$, and $\mathbb{C}^{m \times n}$. To avoid clutter, we will use this notation sparingly, giving dimensions only when they are required and cannot be inferred from context.

Generally we will use lower-case Greek letters for scalars, lower-case Latin letters for vectors, and upper-case Latin and Greek letters for matrices. However, we will not make a fetish of this convention, especially for scalar-valued functions like the determinant and rank. Sets of all kinds will be denoted by calligraphic letters (but note $\mathbb{R}$ and $\mathbb{C}$ in the last paragraph).

We will maintain a loose association between the letter denoting a vector or matrix and the lower-case Greek letter denoting its elements. Thus $\alpha_{ij}$ will usually denote the $(i,j)$-element of a matrix $A$, and $\beta_i$ the $i$th component of $b$. The reader should keep in mind the association of $\xi$ with $x$ and $\eta$ with $y$.

The zero matrix, vector, or scalar will all be written $0$. The identity matrix will be written $I$, or $I_n$ when it is necessary to specify the order. The vector of all ones of any dimension will be written $\mathbf{1}$ (a boldface one). The $i$th unit vector, whose $i$th component is one and whose other components are zero, is $\mathbf{1}_i$ (this unusual notation is due to the fact that in matrix perturbation theory the more conventional symbols $e$ and $e_i$ are needed to represent errors).

The transpose of the matrix $A$ will be written $A^T$, and its conjugate transpose $A^H$; i.e., $A^H = \bar{A}^T$. As usual $A^{-1}$ will denote the inverse of $A$. The inverse of the transpose of $A$ will be written $A^{-T}$, and the inverse conjugate transpose $A^{-H}$.

Operations on or between sets will run through all combinations of the members of the sets. For example, if $\mathcal{X}, \mathcal{Y} \subset \mathbb{C}^n$, then $\mathcal{X} + \mathcal{Y} = \{x + y : x \in \mathcal{X},\ y \in \mathcal{Y}\}$ and $A\mathcal{X} = \{Ax : x \in \mathcal{X}\}$.

The COLUMN SPACE of $A \in \mathbb{C}^{m \times n}$ is $\mathcal{R}(A) = \{Ax : x \in \mathbb{C}^n\}$. Its NULL SPACE is $\mathcal{N}(A) = \{x : Ax = 0\}$. The RANK of $A$ is $\operatorname{rank}(A) = \dim[\mathcal{R}(A)]$, where $\dim(\mathcal{X})$ denotes the dimension of the subspace $\mathcal{X}$. The DETERMINANT of a square matrix $A$ will be written $\det(A)$, and its TRACE written $\operatorname{trace}(A)$.

We will write $\|x\|_2$ for the 2-norm of a vector, which is defined as the positive square root of $\sum_i |\xi_i|^2 = x^H x$. The 2-norm is also called the Euclidean norm, since in real two or three dimensional space it is the Euclidean length of its argument. The properties of this and other vector norms are treated in detail in Section II.1.

The matrix $|A|$ is the matrix whose elements are $|\alpha_{ij}|$. We write $A \ge B$ to mean that $\alpha_{ij} \ge \beta_{ij}$ for all $i, j$, with similar definitions for the relations $>$, $\le$, $<$. Note that this notation is inconsistent with the convention by which $A > B$ means that $A - B$ is positive definite.

The symbol "$\stackrel{\mathrm{def}}{=}$" is used in definitions to introduce new terminology. The relation "$\equiv$" is used for implicit definitions; for example, $(\alpha, \beta + \gamma) \equiv (\alpha, \delta)$ defines $\delta$ to be $\beta + \gamma$.

Some special types of matrices are collected in the following definition.

Definition 1.1. A matrix $A$ is

1. SYMMETRIC (HERMITIAN) if $A^T = A$ ($A^H = A$);
2. POSITIVE DEFINITE (POSITIVE SEMI-DEFINITE, NEGATIVE DEFINITE, NEGATIVE SEMI-DEFINITE) if it is Hermitian and $x^H A x > (\ge, <, \le)\ 0$ for all $x \ne 0$;
3. UNITARY, or ORTHOGONAL in the real case, if $A^H A = A A^H = I$;
4. NORMAL if $A^H A = A A^H$;
5. UPPER TRIANGULAR if it is square and $i > j \Rightarrow \alpha_{ij} = 0$; i.e., if it is zero below its diagonal;
6. LOWER TRIANGULAR if it is square and $i < j \Rightarrow \alpha_{ij} = 0$; i.e., if it is zero above its diagonal;
7. DIAGONAL if it is upper and lower triangular; i.e., its nonzero elements are on its diagonal;
8. a PERMUTATION MATRIX if it is obtained by permuting rows and columns of the identity matrix.

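Several of the classes in Definition 1.1 are easy to test numerically. The following is a minimal sketch, not part of the original text: the function names, the list-of-rows representation, and the tolerance `TOL` are all assumptions made for illustration.

```python
# Illustrative predicates for some of the classes in Definition 1.1.
# Assumptions (not from the text): a matrix is a list of rows of Python
# numbers, and TOL is an arbitrary tolerance for floating-point comparison.
TOL = 1e-12

def conj_transpose(a):
    """Return the conjugate transpose A^H of a rectangular matrix."""
    rows, cols = len(a), len(a[0])
    return [[complex(a[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]

def matmul(a, b):
    """Ordinary matrix product of conformable matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def close(a, b):
    """Elementwise comparison of two matrices of the same shape."""
    return all(abs(x - y) <= TOL
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def is_hermitian(a):
    return len(a) == len(a[0]) and close(a, conj_transpose(a))

def is_unitary(a):
    ah, n = conj_transpose(a), len(a)
    return (len(a[0]) == n and close(matmul(ah, a), identity(n))
            and close(matmul(a, ah), identity(n)))

def is_normal(a):
    ah = conj_transpose(a)
    return len(a) == len(a[0]) and close(matmul(ah, a), matmul(a, ah))

def is_upper_triangular(a):
    n = len(a)
    return len(a[0]) == n and all(abs(a[i][j]) <= TOL
                                  for i in range(n) for j in range(i))

rotation = [[0, -1], [1, 0]]        # unitary (hence normal), not Hermitian
hermitian = [[2, 1j], [-1j, 3]]     # Hermitian (hence normal)
shear = [[1, 1], [0, 1]]            # upper triangular, not normal

for name, m in [("rotation", rotation), ("hermitian", hermitian), ("shear", shear)]:
    print(name, is_hermitian(m), is_unitary(m), is_normal(m), is_upper_triangular(m))
```

The three sample matrices illustrate that the classes overlap only partially: every Hermitian or unitary matrix is normal, but a triangular matrix that is not diagonal is not.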
The notation $\operatorname{diag}(\delta_1, \delta_2, \ldots, \delta_n)$ will mean a DIAGONAL MATRIX whose diagonal elements are $\delta_1, \delta_2, \ldots, \delta_n$. The scalars $\delta_i$ may be replaced by square matrices, in which case the matrix will be said to be BLOCK DIAGONAL. BLOCK TRIANGULAR matrices are defined similarly.

Notes and References

This book presupposes a knowledge of basic matrix theory, for which there are any number of good introductory texts. For more advanced treatments see [98, 140]. The books by Gantmacher [81, 1959] and Bellman [21, 1970] and the little pamphlet by Marcus [151, 1960] are classics. The perturbation theorist will find the survey of matrix inequalities by Marcus and Minc [152, 1964] particularly useful. Other useful references on inequalities are [20, 160]. MacDuffee [150, 1946] and Wedderburn [257, 1934] contain extensive references to the older literature. We have drawn heavily on the former in preparing the notes for this book.

Much of matrix perturbation theory comes from numerical linear algebra. For an entry into this area see the books by Varga [252, 1962], Householder [121, 1964], Wilkinson [269, 1965], Stewart [203, 1974], and Golub and Van Loan [93, 1983]. Particular mention should be made of Parlett's excellent book on the symmetric eigenvalue problem [175, 1980], which contains a host of perturbation results.

Another source of matrix perturbation theory is the specialization of perturbation theorems for operators in infinite dimensional spaces - a vast area with an extensive literature. The definitive reference is by Kato [135, 1966]. A third, hybrid source is the approximation of linear operators by finite dimensional operators, for which see the book by Chatelin [43, 1983].

There is no standard notation for matrix theory. The one adopted here will not be unfamiliar to numerical analysts.
When $A \in \mathbb{C}^{m \times n}$ is regarded as an operator on $\mathbb{C}^n$, its range is the same as the space spanned by its columns - hence the use of $\mathcal{R}(A)$ to denote the column space of $A$.

Our definition of positive definite carries the implication that the matrix is symmetric. Some authors (e.g., see [276]) drop this requirement.

Exercises

1. Let $A$ be nonsingular. Show that if $T = I + V^H A^{-1} U$ is nonsingular, then $(A + UV^H)^{-1} = A^{-1} - A^{-1} U T^{-1} V^H A^{-1}$. For a history of this useful formula, which is sometimes called the Sherman-Morrison-Woodbury formula, see [108].

2. Show that $\mathcal{R}(A) \cap \mathcal{N}(A^H) = \{0\}$.

3. Show that $\det(I - uv^H) = 1 - v^H u$.

4. (Cauchy inequality). Show that $|x^H y| \le \|x\|_2 \|y\|_2$, with equality if and only if $x$ and $y$ are linearly dependent.

5. (Triangle inequality). Show that $\|x + y\|_2 \le \|x\|_2 + \|y\|_2$.

6. Show that $\operatorname{trace}(A^H A) = \sum_{i,j} |\alpha_{ij}|^2$.

7. Show that the diagonal elements of a Hermitian matrix are real.

8. A matrix $A$ is SKEW HERMITIAN if $A^H = -A$. Show that the diagonal elements of a skew Hermitian matrix are imaginary.

9. Show that any square matrix $A$ can be written uniquely in the form $A = A_1 + iA_2$, where $A_1$ and $A_2$ are Hermitian.

10. Show that if $A$ is positive definite, then $Ax = 0 \Rightarrow x = 0$.

11. Show that for any matrix $A$ the cross-product matrix $A^H A$ is positive semi-definite. Show that $A^H A$ is positive definite if and only if the columns of $A$ are linearly independent.

12. Let $\|u\|_2 = 1$. Show that $H = I - 2uu^H$ is Hermitian and unitary. The matrix $H$ is called a HOUSEHOLDER TRANSFORMATION.

13. Show that a triangular matrix is normal if and only if it is diagonal.

14. A matrix is STRICTLY UPPER TRIANGULAR if it is upper triangular with zero diagonal elements. Show that if $A$ is a strictly upper triangular matrix of order $n$ then $A^n = 0$.

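The formula of Exercise 1 can be checked numerically before attempting a proof. The sketch below is only a sanity check under simplifying assumptions that are not in the text: the matrices are real and $2 \times 2$, $U$ and $V$ are single columns `u` and `v` (so that $T$ is a scalar `t`), and all names and sample values are chosen for illustration.

```python
# Numerical check of (A + u v^T)^{-1} = A^{-1} - A^{-1} u t^{-1} v^T A^{-1},
# the rank-one, real 2x2 case of the formula in Exercise 1.
# All names and sample values here are illustrative assumptions.

def inv2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, x):
    """Product M x of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]

def vecmat(x, m):
    """Product x^T M of a 2-vector and a 2x2 matrix."""
    return [x[0] * m[0][0] + x[1] * m[1][0], x[0] * m[0][1] + x[1] * m[1][1]]

A = [[4.0, 1.0], [2.0, 3.0]]
u = [1.0, 2.0]
v = [0.5, -1.0]

A_inv = inv2(A)
Au = matvec(A_inv, u)                        # A^{-1} u
vA = vecmat(v, A_inv)                        # v^T A^{-1}
t = 1.0 + sum(vi * wi for vi, wi in zip(v, Au))   # T = 1 + v^T A^{-1} u

# Left-hand side: invert the updated matrix directly.
direct = inv2([[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)])
# Right-hand side: correct A^{-1} by a rank-one term.
formula = [[A_inv[i][j] - Au[i] * vA[j] / t for j in range(2)] for i in range(2)]

max_diff = max(abs(direct[i][j] - formula[i][j]) for i in range(2) for j in range(2))
print("largest discrepancy:", max_diff)
```

The point of the formula is visible in the code: once `A_inv` is known, the inverse of the updated matrix costs only matrix-vector products and a division by `t`.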
2. The QR Decomposition - Projections

Throughout this book we will be confronted with the following problem: Given a matrix $A$, find an orthonormal basis for $\mathcal{R}(A)$ and an orthonormal basis for the orthogonal complement of $\mathcal{R}(A)$. In this section we will take a constructive approach to the problem via the QR decomposition. We will then use the resulting bases to construct orthogonal projections onto a subspace and its complement.

2.1. The QR Decomposition

To establish the existence of the QR decomposition, we will use a lemma, which is useful in its own right. Its proof is purely computational and is left as an exercise.

Lemma 2.1 (Householder). Let $\|x\|_2 = 1$ and suppose that the first component of $x$ is real and nonnegative. Let
$$u = \frac{x + \mathbf{1}_1}{\|x + \mathbf{1}_1\|_2}.$$
Then the matrix $H = I - 2uu^H$ is Hermitian and unitary. Moreover,
$$Hx = -\mathbf{1}_1. \qquad (2.1)$$

Equation (2.1) can be written in the form $H\mathbf{1}_1 = -x$. In other words, there is a unitary matrix whose first column is $-x$. By scaling the first column, we may change it to any multiple of $x$. Hence, given any vector $x$ of 2-norm one, there is a unitary matrix whose first column is $x$. We may use this result to reduce a matrix to upper triangular form by a unitary transformation.

Theorem 2.2. Let $A$ be an $m \times n$ matrix with $m \ge n$. Then there is a unitary matrix $Q$ such that
$$Q^H A = \begin{pmatrix} R \\ 0 \end{pmatrix},$$
where $R$ is upper triangular with nonnegative diagonal elements.

Proof. The proof is by induction on the number of columns of $A$. Let $n = 1$, so that $A$ is a vector - call it $a$. Let $Q$ be a unitary matrix whose first column is $a/\|a\|_2$ (if $a = 0$ let $Q = I$). Then
$$Q^H a = \begin{pmatrix} \|a\|_2 \\ 0 \end{pmatrix},$$
which is in the required form.

Now let $A$ have $n > 1$ columns, and partition $A$ in the form $A = (a \ A_*)$. Let $H$ be a unitary matrix whose first column is $a/\|a\|_2$ (if $a = 0$ let $H = I$). Then $H^H A$ can be written
$$H^H A = \begin{pmatrix} \|a\|_2 & b^H \\ 0 & C \end{pmatrix}.$$
By hypothesis there is a unitary transformation $V$ such that
$$V^H C = \begin{pmatrix} S \\ 0 \end{pmatrix},$$
where $S$ is upper triangular with nonnegative diagonal elements. If we set $Q = H \operatorname{diag}(1, V)$, then
$$Q^H A = \begin{pmatrix} \|a\|_2 & b^H \\ 0 & S \\ 0 & 0 \end{pmatrix},$$
which is the required upper triangular form. ∎

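The inductive proof of Theorem 2.2 is in effect an algorithm: reduce the first column with a Householder transformation, then repeat on the trailing block. The following is a minimal sketch for real matrices, not the authors' code. It assumes the common sign choice $v = x + \operatorname{sign}(\xi_1)\|x\|_2 \mathbf{1}_1$ for the reflector, which differs from the normalization in Lemma 2.1, and then flips signs so that the diagonal of $R$ is nonnegative as the theorem requires. The function name `householder_qr` and the sample matrix are hypothetical.

```python
import math

def householder_qr(a):
    """Return (q, r) with q orthogonal (m x m), r upper triangular (m x n,
    zero below the diagonal, nonnegative diagonal) and a = q r.
    Sketch for real matrices with m >= n, stored as lists of rows."""
    m, n = len(a), len(a[0])
    r = [[float(t) for t in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(n):
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue                      # column already reduced
        # Reflector vector; adding sign(x_1)*||x|| avoids cancellation.
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vtv = sum(t * t for t in v)
        # r <- H r, where H = I - 2 v v^T / (v^T v) acts on rows k..m-1.
        for j in range(n):
            s = sum(v[i] * r[k + i][j] for i in range(len(v)))
            for i in range(len(v)):
                r[k + i][j] -= 2.0 * v[i] * s / vtv
        # q <- q H, so that the product q r is unchanged.
        for i in range(m):
            s = sum(q[i][k + j] * v[j] for j in range(len(v)))
            for j in range(len(v)):
                q[i][k + j] -= 2.0 * s * v[j] / vtv
        # Entries below the diagonal are zero up to rounding; make them exact.
        for i in range(k + 1, m):
            r[i][k] = 0.0
        # Make the diagonal element nonnegative by flipping a row of r
        # and the corresponding column of q.
        if r[k][k] < 0.0:
            r[k] = [-t for t in r[k]]
            for i in range(m):
                q[i][k] = -q[i][k]
    return q, r

A = [[3.0, 1.0], [4.0, 2.0], [0.0, 5.0]]
Q, R = householder_qr(A)
for row in R:
    print([round(t, 6) for t in row])
```

Because each step replaces `r` by $Hr$ and `q` by $qH$ with $H^2 = I$, the product `q r` equals the input throughout; the first $n$ columns of `q` then form an orthonormal basis for the column space when the input has full column rank.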
Let us now assume that $A$ has rank $n$. Since $R$ is of order $n$, it must be nonsingular with positive diagonal elements. Partition $Q = (Q_A \ Q_\perp)$, where $Q_A$ has $n$ columns. Then
$$A = (Q_A \ Q_\perp) \begin{pmatrix} R \\ 0 \end{pmatrix} = Q_A R.$$
(This decomposition is sometimes called the QR FACTORIZATION of $A$; it is essentially unique [Exercise 2.6].) It follows that $\mathcal{R}(Q_A) = \mathcal{R}(A)$; i.e., the columns of $Q_A$ form an orthonormal basis for $\mathcal{R}(A)$. Moreover, the columns of $Q_\perp$ form an orthonormal basis for the orthogonal complement $\mathcal{R}(A)^\perp$. Thus we have the following corollary.

Corollary 2.3. Let $\mathcal{X}$ be a subspace of $\mathbb{C}^n$. Then there is a unitary matrix $Q = (Q_\mathcal{X} \ Q_\perp)$ such that $\mathcal{R}(Q_\mathcal{X}) = \mathcal{X}$.

Proof. Let the columns of $A$ form a basis for $\mathcal{X}$. ∎

2.2. Hadamard's Inequality

The main use we will put the QR decomposition to in this section is to establish the properties of orthogonal projectors. But before we do, we shall digress to establish an important determinantal inequality due to Hadamard.

Theorem 2.4 (Hadamard). Let $A = (a_1 \ a_2 \ \cdots \ a_n)$ be of order $n$. Then
$$|\det(A)| \le \prod_{j=1}^{n} \|a_j\|_2.$$
Moreover, equality holds if and only if $A$ has a zero column or the columns of $A$ are mutually orthogonal (as they are, for example, when $A$ is unitary).

Proof. Let $Q^H A = R \equiv (r_1, \ldots, r_n)$ be the QR factorization of $A$. Since premultiplication by a unitary matrix does not change the magnitude of the determinant or the norms of the columns of $A$, we have
$$|\det(A)| = |\det(R)| = \prod_{i} |\rho_{ii}| \le \prod_{i} \|r_i\|_2 = \prod_{i} \|a_i\|_2. \qquad (2.2)$$
If equality holds, then either $\det(A) = 0$ and one of the $a_i$ is zero, or $\det(A) \ne 0$ and $\rho_{ii} = \|r_i\|_2$ $(i = 1, \ldots, n)$, which can only happen if $R$ is diagonal, that is, if the columns of $A$ are mutually orthogonal. ∎

2.3. Projections

Let $\mathcal{X}$ be a subspace of $\mathbb{C}^n$ and let the columns of $Q_\mathcal{X}$ form an orthonormal basis for $\mathcal{X}$. The matrix
$$P_\mathcal{X} = Q_\mathcal{X} Q_\mathcal{X}^H \qquad (2.3)$$
is called the ORTHOGONAL PROJECTION ONTO $\mathcal{X}$. Informally, $P_\mathcal{X}$ acts like the sun at high noon, projecting a vector $z$ into its shadow $P_\mathcal{X} z$ on the ground $\mathcal{X}$. In this subsection we will treat the elementary properties of projections.

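Definition (2.3) translates directly into a computation. The sketch below is an illustration only, restricted to real vectors: it orthonormalizes a basis of a two-dimensional subspace of $\mathbb{R}^3$ by the modified Gram-Schmidt procedure treated in the exercises to this section, forms $P = QQ^T$, and splits a vector $z$ into its shadow $Pz$ and the remainder. The function names and the sample data are assumptions made for the example.

```python
import math

def modified_gram_schmidt(cols):
    """Orthonormalize linearly independent real vectors (given as lists)."""
    qs = []
    for a in cols:
        q = list(a)
        for p in qs:
            rho = sum(pi * qi for pi, qi in zip(p, q))    # coefficient p^T q
            q = [qi - rho * pi for pi, qi in zip(p, q)]   # remove that component
        nrm = math.sqrt(sum(t * t for t in q))
        if nrm == 0.0:
            raise ValueError("vectors are linearly dependent")
        qs.append([t / nrm for t in q])
    return qs

def projector(qs):
    """Form P = Q Q^T from the orthonormal columns q_1, ..., q_k."""
    n = len(qs[0])
    return [[sum(q[i] * q[j] for q in qs) for j in range(n)] for i in range(n)]

def apply(p, z):
    return [sum(pij * zj for pij, zj in zip(row, z)) for row in p]

basis = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]   # spans a plane in R^3
P = projector(modified_gram_schmidt(basis))

z = [1.0, 2.0, 3.0]
x = apply(P, z)                              # component of z in the plane
y = [zi - xi for zi, xi in zip(z, x)]        # component orthogonal to the plane

print("P z       =", [round(t, 6) for t in x])
print("z - P z   =", [round(t, 6) for t in y])
print("y . basis =", [round(sum(yi * bi for yi, bi in zip(y, b)), 12) for b in basis])
```

For this data the remainder $z - Pz$ is orthogonal to both basis vectors up to rounding, which is the sense in which the projection is orthogonal.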
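For (2.3) to be a proper definition, it must be independent of the choice of $Q_\mathcal{X}$. In fact, if the columns of $\hat{Q}_\mathcal{X}$ also form an orthonormal basis for $\mathcal{X}$, then $\hat{Q}_\mathcal{X} = Q_\mathcal{X} U$ for some nonsingular $U$. Since
$$I = \hat{Q}_\mathcal{X}^H \hat{Q}_\mathcal{X} = U^H Q_\mathcal{X}^H Q_\mathcal{X} U = U^H U,$$
it follows that $U$ is unitary. Hence
$$\hat{Q}_\mathcal{X} \hat{Q}_\mathcal{X}^H = Q_\mathcal{X} U U^H Q_\mathcal{X}^H = Q_\mathcal{X} Q_\mathcal{X}^H = P_\mathcal{X}.$$

The matrix $P_\mathcal{X}$ is Hermitian ($P_\mathcal{X}^H = P_\mathcal{X}$) and idempotent ($P_\mathcal{X}^2 = P_\mathcal{X}$). The fact that $P_\mathcal{X}$ is idempotent implies that for any $x \in \mathcal{X}$ we have $P_\mathcal{X} x = x$. In fact, since $\mathcal{R}(P_\mathcal{X}) = \mathcal{X}$, we must have $x = P_\mathcal{X} w$ for some vector $w$. Hence $P_\mathcal{X} x = P_\mathcal{X}^2 w = P_\mathcal{X} w = x$. On the other hand, the fact that $P_\mathcal{X}$ is Hermitian implies that $P_\mathcal{X} y = 0$ for any $y \in \mathcal{X}^\perp$. In fact, $y$ must be orthogonal to the columns of $P_\mathcal{X}$; that is, $0 = (y^H P_\mathcal{X})^H = P_\mathcal{X}^H y = P_\mathcal{X} y$.

It follows that if we decompose any vector $z$ in the form
$$z = x + y, \qquad x \in \mathcal{X}, \quad y \in \mathcal{X}^\perp, \qquad (2.4)$$
then $x = P_\mathcal{X} z$ and $y = (I - P_\mathcal{X}) z$. Thus $P_\mathcal{X}$ projects $z$ orthogonally onto its component in $\mathcal{X}$, and $I - P_\mathcal{X} \equiv P_\mathcal{X}^\perp$ projects $z$ onto its component in $\mathcal{X}^\perp$. Moreover, if $P$ is any Hermitian, idempotent matrix with column space $\mathcal{X}$, then by the above argument $Pz = x$, where $x$ is defined by (2.4). It follows that $P_\mathcal{X} z = Pz$ for all $z$, and hence $P_\mathcal{X} = P$. In other words,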
any Hermitian, idempotent matrix is the orthogonal projection onto its column space.

By way of notation, we will write $P_A$ for the orthogonal projection onto $\mathcal{R}(A)$. We will write $P_\mathcal{X}^\perp$ and $P_A^\perp$ for the projections complementary to $P_\mathcal{X}$ or $P_A$. When $\mathcal{X}$ or $A$ can be inferred from context, we will simply write $P$ and $P_\perp$ for the complementary projections.

Since the vectors in the decomposition
$$z = P_\mathcal{X} z + P_\perp z$$
are orthogonal, we have
$$\|z\|_2^2 = (P_\mathcal{X} z + P_\perp z)^H (P_\mathcal{X} z + P_\perp z) = (P_\mathcal{X} z)^H (P_\mathcal{X} z) + (P_\perp z)^H (P_\perp z) = \|P_\mathcal{X} z\|_2^2 + \|P_\perp z\|_2^2,$$
a relation we shall call the PYTHAGOREAN EQUALITY. This equality is the basis of the following important theorem.

Theorem 2.5. If $\mathcal{X}$ is a subspace of $\mathbb{C}^n$, then $P_\mathcal{X} z$ is the unique vector satisfying
$$\|z - P_\mathcal{X} z\|_2 = \min_{x \in \mathcal{X}} \|z - x\|_2.$$

Proof. For any $x \in \mathcal{X}$,
$$\|z - x\|_2^2 = \|(P_\mathcal{X} z - x) + P_\perp z\|_2^2 = \|P_\mathcal{X} z - x\|_2^2 + \|P_\perp z\|_2^2.$$
The right-hand side of this relation is clearly minimized when its first term is zero; that is, it is minimized if and only if $x = P_\mathcal{X} z$. ∎

Thus $P_\mathcal{X} z$ is the vector in $\mathcal{X}$ that is nearest $z$ in the 2-norm. It is called the LEAST SQUARES APPROXIMATION to $z$, since the sum of squares of the absolute values of the components of $z - P_\mathcal{X} z$ is minimal.

Notes and References

Although Householder transformations are mentioned in passing by Feller and Forsythe [73, 1951], Householder [120, 1958] was the first to use them systematically to reduce a matrix to a simpler form, in this case upper triangular. The requirement that the first component of $x$ be nonnegative is more than a trick to avoid the degenerate case $x = -\mathbf{1}_1$; it is necessary for numerical stability.

In one sense the QR decomposition goes back to Gram [97, 1883], who introduced the idea of orthogonalizing a sequence of functions and gave a determinantal expression for the resulting sequence.(1) Schmidt [192, 1907] described the orthogonalization technique now known as the Gram-Schmidt algorithm and pointed out that the results are the same as Gram's. If Schmidt's formulas are applied to the columns of $A$, they compute the columns of $Q_A$ with the elements of $R$ appearing as the coefficients in the expansions (Exercise 2.4). A drawback of the Gram-Schmidt approach to the QR decomposition is that it does not provide an explicit basis for $\mathcal{R}(A)^\perp$.

(1) Precursors may be found in the works of Laplace [141, 1820] and Chebyshev [44, 1859]. But the former is not concerned with the decomposition qua decomposition (the formulas are used to determine the variance of a least-squares estimate), and the latter restricts himself to polynomials.

The name of the decomposition derives from Francis's QR algorithm for finding the eigenvalues of a matrix [75, 1961-2] and its precursor the LR algorithm [191, 1955]. The letter R comes from the German word recht, the equivalent of English upper in reference to triangular matrices. The letter Q was chosen "somewhat arbitrarily" by Francis.

Projections do not have to be orthogonal. In fact, any idempotent matrix $P$, Hermitian or not, can be regarded as an oblique projection onto $\mathcal{R}(P)$ along $\mathcal{N}(P)$ (see Exercise 2.10). We will use such oblique projections in the perturbation theory for invariant subspaces (Chapter V).

For more on least squares approximations, see the notes and references to Section III.1.

Exercises

1. (Householder [120]). Show that if $x \ne y$, $\|x\|_2 = \|y\|_2$, and $y^H x$ is real, then there is a Householder transformation $H$ such that $Hx = y$.

2. Let $m \ge n$ and $A \in \mathbb{C}^{m \times n}$ have rank $k$. Show that there is a permutation matrix $P$ and an orthogonal matrix $Q$ such that
$$Q^H A P = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix},$$
where $R_{11}$ is a nonsingular, upper triangular matrix of order $k$. Moreover,
the columns of the matrix
$$P \begin{pmatrix} R_{11}^{-1} R_{12} \\ -I \end{pmatrix}$$
form a basis for $\mathcal{N}(A)$.

3. Use Exercise 2.2 to show that any matrix $A$ can be written in the form $A = FC^H$, where $F$ and $C$ have full column rank. This factorization, which is not unique, is called a FULL RANK FACTORIZATION of $A$.

The next two exercises treat the Gram-Schmidt algorithm, an important alternative for computing the QR decomposition.

4. (The Gram-Schmidt algorithm). Let $A = (a_1 \ a_2 \ \cdots \ a_n)$ have rank $n$, and consider the following algorithm.

   for j := 1 to n
      q_j := a_j
      for i := 1 to j-1
         ρ_ij := q_i^H q_j
      end for
      for i := 1 to j-1
         q_j := q_j - ρ_ij q_i
      end for
      ρ_jj := sqrt(q_j^H q_j)
      q_j := q_j / ρ_jj
   end for

Show that the algorithm goes to completion. Moreover, if we set $Q_A = (q_1 \ q_2 \ \cdots \ q_n)$ and
$$R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1n} \\ 0 & \rho_{22} & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \rho_{nn} \end{pmatrix},$$
then $Q_A^H Q_A = I$ and $A = Q_A R$; i.e., the Gram-Schmidt algorithm computes the QR factorization of $A$.

5. (The modified Gram-Schmidt algorithm). Show that if the two inner loops in the Gram-Schmidt algorithm are replaced by the single loop

   for i := 1 to j-1
      ρ_ij := q_i^H q_j
      q_j := q_j - ρ_ij q_i
   end for

then the same decomposition is computed. This modified procedure has superior numerical properties [35].

6. Let $A = QR$ be a QR factorization of $A$. Show that if $A$ has full column rank, then any QR factorization of $A$ has the form $A = (QD)(D^H R)$, where $|D| = I$.

7. Let $R$ be the R-factor in the QR factorization of $A$. Show that $A^H A = R^H R$. The matrix $R$ is called the CHOLESKY FACTOR of $A^H A$.

8. (The partitioned QR factorization). Let $A \in \mathbb{C}^{m \times n}$, where $m \ge n$. Let $A$ be partitioned in the form $A = (A_1 \ A_2)$, where $A_1$ has full column rank. Let
$$(A_1 \ A_2)^H (A_1 \ A_2) = \begin{pmatrix} C_{11} & C_{12} \\ C_{12}^H & C_{22} \end{pmatrix}$$
be a conformal partitioning of the cross-product matrix $C = A^H A$, and let
$$(A_1 \ A_2) = (Q_1 \ Q_2) \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}$$
be a conformal partition of a QR factorization of $A$. Show that

   1. $A_1 = Q_1 R_{11}$                     1'. $C_{11} = R_{11}^H R_{11}$
   2. $P_{A_1} A_2 = Q_1 R_{12}$             2'. $C_{11}^{-1} C_{12} = R_{11}^{-1} R_{12}$
   3. $P_{A_1}^\perp A_2 = Q_2 R_{22}$       3'. $C_{22} - C_{12}^H C_{11}^{-1} C_{12} = R_{22}^H R_{22}$.
The matrix $C_{22} - C_{12}^H C_{11}^{-1} C_{12}$ is called the SCHUR COMPLEMENT of $C_{11}$ in $C$ [193, 1909] and appears in many applications. For more see [108, 170].

9. Let $A = (a_1 \ \cdots \ a_n)$ be of order $n$ and let $P_i^\perp$ be the projection onto the orthogonal complement of the space spanned by $a_1, \ldots, a_{i-1}$ (take $P_1^\perp = I$). Show that
$$|\det(A)| = \prod_{i=1}^{n} \|P_i^\perp a_i\|_2.$$
Hence deduce Hadamard's inequality.

10. Show that if $P$ is idempotent then $\mathcal{R}(P) \oplus \mathcal{N}(P) = \mathbb{C}^n$. Conclude that any idempotent matrix $P$ is the projection onto $\mathcal{R}(P)$ along $\mathcal{N}(P)$. Such projections are called OBLIQUE PROJECTIONS.

11. Show that if $P$ is any projection then $\mathrm{rank}(P) = \mathrm{trace}(P)$.

3. Eigenvalues and Eigenvectors

This section is devoted to the eigenvalue problem $Ax = \lambda x$. Throughout the section, $A$ will denote a matrix of order $n$.

3.1. Definitions and Elementary Properties

We begin with the usual definition of an eigenvalue and eigenvector of a matrix. Loosely speaking, an eigenvector is a vector that does not change direction when it is multiplied by $A$, and its eigenvalue is the amount by which it shrinks or expands in the process.

Definition 3.1. The pair $(x, \lambda)$ is called an EIGENPAIR of the matrix $A$ if $x \neq 0$ and $Ax = \lambda x$. The vector $x$ is called an EIGENVECTOR of $A$, and $\lambda$ is its associated EIGENVALUE. The set of all eigenvalues of $A$ is written $\mathcal{L}(A)$.

A word on nomenclature is appropriate here. The prefix eigen is German, and in this connection it means something like "characteristic." Purists originally objected to the hybrid translation "eigenvalue" for the German eigenwert, preferring one or another of a host of names (characteristic value, proper value, latent root, etc.). By now, however, "eigen" has become a living English prefix that means "pertaining to eigenvalues and eigenvectors," and we will use it with complete freedom, as we did in defining the term eigenpair above.

The equation $Ax = \lambda x$ may be written in the form $(\lambda I - A)x = 0$, from which it is seen that $\lambda$ is an eigenvalue of $A$ if and only if $\lambda I - A$ is singular or, equivalently, if and only if
\[ \phi_A(\lambda) \stackrel{\mathrm{def}}{=} \det(\lambda I - A) = 0. \]
The function $\phi_A(\lambda)$ is a polynomial of degree $n$ in $\lambda$ and is called the CHARACTERISTIC POLYNOMIAL of $A$.
Consequently, a matrix has exactly $n$ eigenvalues, each distinct eigenvalue being counted according to its multiplicity as a root of the CHARACTERISTIC EQUATION $\phi_A(\lambda) = 0$. An eigenvalue whose multiplicity is one is called a SIMPLE EIGENVALUE.

The characterization of eigenvalues in terms of the characteristic polynomial has some important consequences. First, since $\phi_A(\lambda) = 0$ if and only if $\phi_{A^H}(\bar\lambda) = 0$, each eigenvalue $\lambda$ of $A$ corresponds to an eigenvalue $\bar\lambda$ of $A^H$. Hence there is a vector $y$ such that $A^Hy = \bar\lambda y$ or, equivalently, $y^HA = \lambda y^H$. The vector $y$ is called a LEFT EIGENVECTOR of $A$ (and the original eigenvector is called a RIGHT EIGENVECTOR of $A$ when it must be distinguished from $y$).

Second, if $A$ is real, then its characteristic polynomial is real and its complex eigenvalues must occur in complex conjugate pairs.

Third, if $A$ is block triangular, say
\[ A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ 0 & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_{kk} \end{pmatrix}, \]
then
\[ \phi_A(\lambda) = \phi_{A_{11}}(\lambda)\,\phi_{A_{22}}(\lambda)\cdots\phi_{A_{kk}}(\lambda). \]
Hence the eigenvalues of a block triangular matrix are the eigenvalues of its diagonal blocks. In particular, the eigenvalues of a triangular matrix are its diagonal elements.

If $(x, \lambda)$ and $(y, \lambda)$ are eigenpairs of $A$, then $(\alpha x + \beta y, \lambda)$ is also an eigenpair, provided $\alpha x + \beta y \neq 0$. Thus the set of all eigenvectors corresponding to an eigenvalue $\lambda$ together with the zero vector form a subspace, which is equal to $\mathcal{N}(\lambda I - A)$. The dimension of this subspace,
\[ \dim[\mathcal{N}(\lambda I - A)] = n - \mathrm{rank}(\lambda I - A), \]
is called the GEOMETRIC MULTIPLICITY of $\lambda$. It is easy to see that the geometric multiplicity of an eigenvalue is not greater than its ALGEBRAIC MULTIPLICITY as a root of the characteristic equation. It can, however, be smaller, as the following example shows.
Example 3.2. Let
\[ A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. \]
Then $\phi_A(\lambda) = (\lambda - 1)^2$, so that 1 is an eigenvalue of algebraic multiplicity two. On the other hand, the only eigenvector of $A$ is a multiple of $\mathbf{1}_1$, so that the geometric multiplicity of the eigenvalue 1 is one.

An eigenvalue whose geometric multiplicity is less than its algebraic multiplicity is said to be DEFECTIVE. A DEFECTIVE MATRIX is one with at least one defective eigenvalue. Unfortunately defective eigenvalues and their matrices are the bane of matrix perturbation theory, as the following continuation of the above example shows.

Example 3.2 (continued). Let
\[ \tilde A = \begin{pmatrix} 1 & 1 \\ \epsilon & 1 \end{pmatrix} \]
be a perturbation of $A$. Then $\phi_{\tilde A}(\lambda) = (\lambda - 1)^2 - \epsilon$, so that the eigenvalues of $\tilde A$ are $1 \pm \sqrt{\epsilon}$. Thus the eigenvalues of $\tilde A$ are not differentiable at $\epsilon = 0$. Moreover, if $\epsilon = 10^{-10}$, then the eigenvalues of $A$ and $\tilde A$ differ by $10^{-5}$. Thus a perturbation of $10^{-10}$ in $A$ can cause a perturbation in its eigenvalues that is five orders of magnitude larger.

The use of a tilde (viz. $\tilde A$) to denote a perturbed quantity is the first occurrence of a convention that will be used throughout this book. See the introduction to Section III.2 for more details.

If $(x, \lambda)$ is an eigenpair of $A$ and $U$ is nonsingular, then
\[ (U^{-1}AU)\,U^{-1}x = \lambda U^{-1}x, \]
which shows that $(U^{-1}x, \lambda)$ is an eigenpair of $U^{-1}AU$. Thus the SIMILARITY TRANSFORMATION $A \to U^{-1}AU$ preserves the eigenvalues of $A$ and transforms its eigenvectors by $U^{-1}$. Since $\mathrm{rank}(\lambda I - A) = \mathrm{rank}(\lambda I - U^{-1}AU)$, the geometric multiplicities of the eigenvalues are invariant under similarity transformations. Since $\phi_A(\lambda) = \phi_{U^{-1}AU}(\lambda)$, the algebraic multiplicities are also preserved. In the next two subsections we will consider how far a matrix may be simplified by similarity transformations.

3.2. The Schur Decomposition

A major theme of matrix theory is the reduction of matrices to a simple form by similarity transformations.
For example, it will follow from the results of the next subsection that if a matrix $A$ is not defective then there is a nonsingular matrix $X$ such that
\[ X^{-1}AX = \Lambda \equiv \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), \]
where the $\lambda_i$ are the eigenvalues of $A$. The advantages of being able to work with a diagonal matrix instead of a full one are obvious.

Unfortunately, similarity transformations can introduce problems of their own. Suppose, for example, that the matrix of the last paragraph is perturbed by an error $E$. Then the diagonalized problem is $\Lambda + X^{-1}EX$. If $X$ and its inverse are large (or if $X$ is ILL CONDITIONED, as we shall learn to say in Chapter III), then the effect of the similarity transformation is to magnify the error. In this connection, unitary similarity transformations of the form $A \to U^HAU$ are particularly desirable, since neither $U$ nor its inverse $U^H$ can be large. Specifically, from the fact that $U^HU = I$, it follows that the $j$th column of $U$ satisfies $\|u_j\|_2 = 1$. Hence no element of $U$ or its inverse can be greater than one in absolute value.

In general we cannot reduce a matrix to diagonal form by unitary similarities (informally, the relation $U^HU = I$ implies that $U$ has roughly $n^2/2$ degrees of freedom, too few to satisfy the roughly $n^2$ conditions that the off-diagonal elements of $U^HAU$ be zero). However, the following theorem shows that we can reduce an arbitrary matrix to triangular form by a unitary similarity.

Theorem 3.3 (Schur). There is a unitary matrix $U$ such that
\[ T = U^HAU \]
is upper triangular. The matrix $U$ may be chosen so that the eigenvalues of $A$ appear on the diagonal of $T$ in any order.

Proof. The proof is by induction. The theorem is trivial for a matrix of order one. Assume it is true for all matrices of order less than $n > 1$. Let an ordering of the eigenvalues of $A$ be given, and let $\lambda$ be the first eigenvalue of the ordering. Let $Ax = \lambda x$, where $\|x\|_2 = 1$, and let $H = (x\ X)$ be a unitary matrix.
The matrix $H^HAH$ has the form
\[ H^HAH = \begin{pmatrix} x^HAx & x^HAX \\ X^HAx & X^HAX \end{pmatrix}. \]
Since $Ax = \lambda x$ and $x^Hx = 1$, we have $x^HAx = \lambda x^Hx = \lambda$. Since $X^Hx = 0$, we have $X^HAx = \lambda X^Hx = 0$. Thus
\[ H^HAH = \begin{pmatrix} \lambda & x^HAX \\ 0 & X^HAX \end{pmatrix} \equiv \begin{pmatrix} \lambda & b^H \\ 0 & M \end{pmatrix}. \]
Since $H^HAH$ is block triangular, the eigenvalues of the matrix $M$ are the eigenvalues of $A$ other than $\lambda$. By the induction hypothesis, there is a unitary matrix $V$ such that $V^HMV$ is upper triangular with its eigenvalues in the correct order. If we set $U = (x\ XV)$, then
\[ T = U^HAU = \begin{pmatrix} \lambda & b^HV \\ 0 & V^HMV \end{pmatrix} \]
is the required decomposition. ∎

Although the proof of Schur's theorem is not difficult, the resulting decomposition is one of the most important tools in theoretical and computational linear algebra. Computationally, it is the target of the QR algorithm, the single most successful algorithm for computing eigensystems of general matrices. We will encounter its theoretical uses throughout this book, beginning with some elementary consequences in this chapter.

Recall (Definition 1.1) that a matrix $A$ is normal if $A^HA = AA^H$. This seemingly innocuous equation has important consequences.

Theorem 3.4. If $A$ is normal then any Schur decomposition of $A$ is diagonal.

Proof. The theorem is trivial for a matrix of order one. Assume it is true for all matrices of order less than $n > 1$. Let $T = U^HAU$ be a Schur decomposition of $A$ and partition $T$ in the form
\[ T = \begin{pmatrix} \tau & t^H \\ 0 & T_* \end{pmatrix}. \]
Now if $A$ is normal, any matrix unitarily similar to $A$ is normal. Hence $T^HT = TT^H$. Comparing the diagonal blocks of the two products gives
\[ |\tau|^2 = |\tau|^2 + t^Ht \quad\text{and}\quad tt^H + T_*^HT_* = T_*T_*^H. \]
The first of the equations implies that $t = 0$. The second then says that $T_*$ is normal, and hence by the induction hypothesis it is diagonal (since it is triangular). Thus $T$ is diagonal. ∎

If we write $T = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ and $U = (u_1\ u_2\ \cdots\ u_n)$, then $AU = U\Lambda$ or
\[ Au_j = \lambda_ju_j, \qquad j = 1, \ldots, n. \]
In other words the $u_j$ are eigenvectors of $A$. Since the eigenvectors are pairwise orthogonal, we have the following corollary.
Corollary 3.5. A normal matrix of order $n$ has a system of orthonormal eigenvectors that span $\mathbb{C}^n$.

A matrix chosen at random is unlikely to be normal. However, there are two frequently occurring classes of normal matrices: unitary matrices and Hermitian matrices. They are distinguished by the situation of their eigenvalues.

Corollary 3.6. A unitary matrix is a normal matrix with eigenvalues on the unit circle. A Hermitian matrix is a normal matrix with real eigenvalues.

The proof of this corollary is left as an exercise. It immediately implies that any Hermitian matrix can be written in the form
\[ A = U\Lambda U^H, \]
where $U$ is unitary and $\Lambda$ is diagonal and real. We will call this the SPECTRAL DECOMPOSITION of $A$. We will sometimes write it in the form
\[ A = \sum_i \lambda_iu_iu_i^H, \qquad (3.1) \]
where $u_i$ is the $i$th column of $U$. This form allows us to extend scalar valued functions to Hermitian matrices: namely,
\[ \varphi(A) \stackrel{\mathrm{def}}{=} \sum_i \varphi(\lambda_i)\,u_iu_i^H. \]
In particular if $A$ is positive semi-definite, its eigenvalues are nonnegative and its SQUARE ROOT
\[ A^{1/2} \stackrel{\mathrm{def}}{=} \sum_i \lambda_i^{1/2}\,u_iu_i^H \]
is well defined.

3.3. The Jordan Canonical Form

Although the Schur decomposition is useful in many applications, in others it does not go far enough in reducing its matrix. The question thus arises of how much we can simplify a matrix using similarity transformations. Example 3.2 shows that in general we cannot hope to reduce a matrix to diagonal form. However, we can reduce it to a simple block diagonal form, as the following classical theorem shows.

Theorem 3.7 (Jordan). For any $A$ there is a nonsingular $X$ such that
\[ X^{-1}AX = \mathrm{diag}\bigl(J_{k_1}(\lambda_1), J_{k_2}(\lambda_2), \ldots, J_{k_l}(\lambda_l)\bigr), \]
where $J_k(\lambda) \in \mathbb{C}^{k\times k}$ is a JORDAN BLOCK of the form
\[ J_k(\lambda) \stackrel{\mathrm{def}}{=} \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}. \qquad (3.2) \]
The right-hand side of the decomposition is unique up to the ordering of the blocks.

For a proof of this theorem (which is not easy) see any good linear algebra text.

The eigenvalues of a matrix corresponding to Jordan blocks of order greater than one are defective (n.b., the same eigenvalue can occur in different blocks). Thus a nondefective matrix has only $1 \times 1$ Jordan blocks, or equivalently every nondefective matrix can be diagonalized by a similarity transformation. For this reason a nondefective matrix is also called a DIAGONALIZABLE matrix.

If we partition $X = (x_1, \ldots, x_n)$ by columns, then the first $k_1$ vectors associated with the Jordan block $J_{k_1}(\lambda_1)$ satisfy the equations
\[ Ax_1 = \lambda_1x_1, \qquad Ax_j = \lambda_1x_j + x_{j-1}, \quad j = 2, \ldots, k_1. \]
Such a sequence is called a chain of PRINCIPAL VECTORS of $A$. The columns of $X$ corresponding to the other Jordan blocks also form chains of principal vectors.
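The chain relations for principal vectors can be checked on a small case. The following is only an illustrative sketch: the matrix `X`, the value `lam`, and the helper names `matmul`, `inverse`, and `chain_holds` are assumptions introduced here and do not come from the text. It builds $A = XJ_3(2)X^{-1}$ in exact rational arithmetic and tests whether the columns of $X$ satisfy $Ax_1 = \lambda x_1$ and $Ax_j = \lambda x_j + x_{j-1}$.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def inverse(M):
    """Exact Gauss-Jordan inverse of a nonsingular matrix of rationals."""
    n = len(M)
    # Augment M with the identity and reduce the left half to the identity.
    aug = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def chain_holds(A, X, lam):
    """Check A x_1 = lam x_1 and A x_j = lam x_j + x_{j-1} for the columns of X."""
    k = len(X[0])
    cols = [[row[j] for row in X] for j in range(k)]
    images = [[sum(a * c for a, c in zip(row, col)) for row in A] for col in cols]
    if images[0] != [lam * c for c in cols[0]]:
        return False
    return all(images[j] == [lam * c + d for c, d in zip(cols[j], cols[j - 1])]
               for j in range(1, k))

lam = F(2)                                     # illustrative eigenvalue
J = [[lam, 1, 0], [0, lam, 1], [0, 0, lam]]    # the Jordan block J_3(2)
X = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]          # an arbitrary nonsingular matrix
A = matmul(matmul(X, J), inverse(X))           # chosen so that X^{-1} A X = J_3(2)

chain_ok = chain_holds(A, X, lam)
print(chain_ok)
```

Rational arithmetic is used so that the equalities are tested exactly; in floating point the same check would need a tolerance.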
In spite of its solid appearance, the Jordan form is a fragile thing. The continuation of Example 3.2 shows that the slightest perturbation can shatter it to bits. Moreover, the ones on the superdiagonal of a Jordan block represent an arbitrary normalization. For example the similarity
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & \delta & 0 \\ 0 & 0 & \delta^2 \end{pmatrix}^{-1} \begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \delta & 0 \\ 0 & 0 & \delta^2 \end{pmatrix} = \begin{pmatrix} \lambda & \delta & 0 \\ 0 & \lambda & \delta \\ 0 & 0 & \lambda \end{pmatrix} \qquad (3.3) \]
puts $\delta$'s on the superdiagonal of $J_3(\lambda)$. (Note that as $\delta$ approaches zero, the transformation becomes increasingly ill conditioned.) For these reasons, there has been a tendency to shy away from the Jordan form, at least in applications.

3.4. Invariant Subspaces

One of the ways of handling instabilities in the eigensystem of a matrix is to decompose the matrix into smaller matrices acting on subspaces. Although the matrices may be ill-behaved within their subspaces, the subspaces themselves often turn out to be insensitive to perturbations.

For example, if in (3.2) we partition $X = (X_1\ \cdots\ X_l)$ conformally with the Jordan blocks and similarly partition $X^{-H} = (Y_1\ \cdots\ Y_l)$, then
\[ A = X_1J_{k_1}(\lambda_1)Y_1^H + X_2J_{k_2}(\lambda_2)Y_2^H + \cdots + X_lJ_{k_l}(\lambda_l)Y_l^H. \qquad (3.4) \]
Since $Y^H = X^{-1}$, it follows that
\[ AX_i = X_iJ_{k_i}(\lambda_i), \qquad i = 1, \ldots, l. \qquad (3.5) \]
In other words, $A$ can be regarded as an operator mapping $\mathcal{R}(X_i)$ into itself. If $u = X_iv$ is a representation of $u$ in the basis formed by the columns of $X_i$, then by (3.5) $AX_iv = X_iJ_{k_i}(\lambda_i)v$; i.e., $v$ maps into $J_{k_i}(\lambda_i)v$. This means that $J_{k_i}(\lambda_i)$ is the representation of $A$ on $\mathcal{R}(X_i)$ with respect to the basis formed by the columns of $X_i$. These considerations suggest the following definition.

Definition 3.8. The subspace $\mathcal{X}$ is an INVARIANT SUBSPACE of $A$ if
\[ A\mathcal{X} \subset \mathcal{X}. \]

Some of the facts about invariant subspaces are contained in the following theorem.

Theorem 3.9. Let $\mathcal{X}$ be an invariant subspace of $A$, and let the columns of $X$ form a basis for $\mathcal{X}$. Then there is a unique matrix $L$ such that
\[ AX = XL. \qquad (3.6) \]
The matrix $L$ is the representation of $A$ on $\mathcal{X}$ with respect to the basis $X$. In particular $(v, \lambda)$ is an eigenpair of $L$ if and only if $(Xv, \lambda)$ is an eigenpair of $A$.

Proof. Let $X = (x_1\ \cdots\ x_k)$. Since $Ax_i \in \mathcal{X}$, it can be expressed as a unique linear combination of the columns of $X$; that is, $Ax_i = Xl_i$ for some unique vector $l_i$. The matrix $L = (l_1\ \cdots\ l_k)$ is the required matrix. The fact that $L$ is the representation of $A$ on $\mathcal{X}$ follows as above for the Jordan canonical form. In view of (3.6), the statement about eigenpairs, which amounts to saying that $Lv = \lambda v$ if and only if $AXv = \lambda Xv$, is a triviality. ∎

This is not the whole story, only enough to get us through Chapter IV. We will return to invariant subspaces in Chapter V.

3.5. The Field of Values

The quadratic form $x^HAx$ plays an important role in many applications. This subsection is devoted to the values that such a form can attain for a given matrix. We begin with a definition.

Definition 3.10. Let $A \in \mathbb{C}^{n\times n}$. The set
\[ \mathcal{F}(A) \stackrel{\mathrm{def}}{=} \{x^HAx : \|x\|_2 = 1\} \]
is called the FIELD OF VALUES of $A$.

The set $\mathcal{F}(A)$ is bounded and closed in $\mathbb{C}$. From Definition 3.10, it is easy to verify the following properties of $\mathcal{F}(A)$:

1. $\mathcal{F}(\alpha A + \beta I) = \alpha\mathcal{F}(A) + \beta$, $\alpha, \beta \in \mathbb{C}$;
2. $\mathcal{L}(A) \subset \mathcal{F}(A)$;
3. If $U$ is unitary, $\mathcal{F}(U^HAU) = \mathcal{F}(A)$;
4. $\mathcal{F}(A + B) \subset \mathcal{F}(A) + \mathcal{F}(B)$.

An important and far-reaching result is that the field of values of a matrix is convex; that is, it contains any line segment whose endpoints lie in it. Nothing in the proof of the following theorem is used later, and it may be skipped without loss.

Theorem 3.11 (Toeplitz-Hausdorff). The field of values $\mathcal{F}(A)$ is a convex set.

Proof. Let $\rho, \sigma \in \mathcal{F}(A)$ be distinct. Since $\mathcal{F}(\alpha A + \beta I) = \alpha\mathcal{F}(A) + \beta$ and, for $\alpha \neq 0$, $\alpha\mathcal{F}(A) + \beta$ is convex if and only if $\mathcal{F}(A)$ is convex, we may assume without loss of generality that $\rho = 0$ and $\sigma = 1$. Thus there are vectors $x_0$ and $x_1$ of 2-norm one such that $x_0^HAx_0 = 0$ and $x_1^HAx_1 = 1$, which implies that $x_0$ and $x_1$ are linearly independent. By multiplying $x_0$ by a scalar of absolute value one we may further assume that $x_0^HAx_1 + x_1^HAx_0$ is real.
We must show that any $\tau \in [0,1]$ is in the field of values of $A$. By the linear independence of $x_0$ and $x_1$, the vector $(1-\lambda)x_0 + \lambda x_1$ is nonzero. Consequently the function
\[ \varphi(\lambda) = \frac{[(1-\lambda)x_0 + \lambda x_1]^HA[(1-\lambda)x_0 + \lambda x_1]}{\|(1-\lambda)x_0 + \lambda x_1\|_2^2} \]
is continuous and real when $\lambda$ is real. Moreover, $\varphi(\lambda) \in \mathcal{F}(A)$ for all $\lambda$. Since $\varphi(0) = 0$ and $\varphi(1) = 1$, by the intermediate value theorem there is a $\lambda \in [0,1]$ such that $\tau = \varphi(\lambda) \in \mathcal{F}(A)$. ∎

The field of values of a matrix $A$ is closely related to the eigenvalues of $A$. In particular if $Ax = \lambda x$ with $\|x\|_2 = 1$, then $\lambda = x^HAx \in \mathcal{F}(A)$. Since the field of values is convex, $\mathcal{F}(A)$ must contain the smallest convex set containing all the eigenvalues of $A$, that is, the set
\[ \mathcal{H}[\mathcal{L}(A)] = \Bigl\{ \sum_{\lambda_i \in \mathcal{L}(A)} \theta_i\lambda_i : \theta_i \ge 0,\ \sum_i \theta_i = 1 \Bigr\} \qquad (3.7) \]
($\mathcal{H}[\mathcal{L}(A)]$ is called the CONVEX HULL of $\mathcal{L}(A)$). Unfortunately, the field of values can be bigger than the convex hull of $\mathcal{L}(A)$, as the following example shows.

Example 3.12. Let
\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]
Then $\mathcal{H}[\mathcal{L}(A)] = \{0\}$. But $\mathcal{F}(A) = \{z \in \mathbb{C} : |z| \le \tfrac12\}$.

There is one important class of matrices for which the field of values coincides with the convex hull of its eigenvalues.

Theorem 3.13. If $A$ is normal, then
\[ \mathcal{F}(A) = \mathcal{H}[\mathcal{L}(A)]. \qquad (3.8) \]

Proof. Since the normal matrix $A$ has a decomposition $A = U^H\Lambda U$, where $U$ is unitary and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, we have
\[ \mathcal{F}(A) = \{(Ux)^H\Lambda(Ux) : x \in \mathbb{C}^n,\ \|x\|_2 = 1\} = \{y^H\Lambda y : y \in \mathbb{C}^n,\ \|y\|_2 = 1\} = \Bigl\{\sum_{i=1}^n |y_i|^2\lambda_i : \sum_i |y_i|^2 = 1\Bigr\}. \]
Since the $|y_i|^2$ are nonnegative and sum to one, this last set is clearly the set (3.7). ∎

3.6. Sums of Hermitian Matrices

The main result of this subsection will be found in more general form in Section IV.4, where it is a consequence of a very powerful characterization of the eigenvalues of a Hermitian matrix. Because the result is needed in Chapters II and III, we will give an elementary proof here.

Theorem 3.14. Let $A$ and $B$ be Hermitian and $\tilde A = A + B$. Let the eigenvalues of $A$ be $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, and let the eigenvalues of $\tilde A$ be $\tilde\lambda_1 \ge \tilde\lambda_2 \ge \cdots \ge \tilde\lambda_n$. If $\mu_n$ is the smallest eigenvalue of $B$, then
\[ \tilde\lambda_i \ge \lambda_i + \mu_n, \qquad i = 1, \ldots, n. \]

Proof. We begin by simplifying the problem.

First, since the order of the eigenvalues is not affected by a shift of the origin, we may replace $B$ by $B - \mu_nI$, which is positive semi-definite, so that the theorem says that the eigenvalues of $\tilde A$ are pairwise not less than the eigenvalues of $A$.

Second, if $X^HAX = \Lambda$ is the spectral decomposition of $A$ then we may replace $\tilde A$ with $X^H\tilde AX$, $A$ with $\Lambda$, and $B$ with $X^HBX$. Hence we may assume that $A = \Lambda$ is diagonal.

Third, if $B = \sum_i \mu_iy_iy_i^H$ is the spectral decomposition of $B$, then it is sufficient to prove the theorem in stages: first for $\Lambda + \mu_1y_1y_1^H$, then for $(\Lambda + \mu_1y_1y_1^H) + \mu_2y_2y_2^H$, and so on. Thus it is sufficient to establish the theorem for the case $B = yy^H$.

Fourth, if the $i$th component $\eta_i$ of $y$ is zero, then $\lambda_i$ is an eigenvalue of $\tilde A$. Moreover, the remaining eigenvalues of $\tilde A$ are those of the matrix obtained by deleting the $i$th row and column of $\tilde A$. Thus we may assume that $|y| > 0$ (componentwise).

Fifth, the characteristic polynomial of $\tilde A$ is
\[ \phi_{\tilde A}(\lambda) = \det[(\lambda I - \Lambda) - yy^H] = \det(\lambda I - \Lambda)\,\det[I - (\lambda I - \Lambda)^{-1}yy^H]. \]
From the identity $\det(I - uv^H) = 1 - v^Hu$, we have
\[ \phi_{\tilde A}(\lambda) = \phi_\Lambda(\lambda) - \sum_i |\eta_i|^2\phi_\Lambda^{(i)}(\lambda), \]
where $\phi_\Lambda^{(i)}(\lambda) = \phi_\Lambda(\lambda)/(\lambda - \lambda_i)$. Now if $\lambda_i$ has multiplicity $m$ then $(\lambda - \lambda_i)^{m-1}$ factors out of $\phi_{\tilde A}(\lambda)$. This implies that $m - 1$ copies of $\lambda_i$ stay fixed and we need only be concerned with the change in one of them. Hence we may assume that the $\lambda_i$'s are distinct.

The result now follows easily from the intermediate value theorem. Since $\phi_{\tilde A}(\lambda_i) = -|\eta_i|^2\phi_\Lambda^{(i)}(\lambda_i)$ and $\phi_\Lambda^{(i)}(\lambda_i) = \prod_{j\neq i}(\lambda_i - \lambda_j) = \phi_\Lambda'(\lambda_i)$ alternates in sign with $i$, we have $\phi_{\tilde A}(\lambda_1) < 0$, $\phi_{\tilde A}(\lambda_2) > 0$, $\phi_{\tilde A}(\lambda_3) < 0$, and so on. It follows that $\tilde A$ has $n - 1$ eigenvalues interlaced with the $\lambda_i$. Since $\phi_{\tilde A}(\lambda_1) < 0$ and $\lim_{\lambda\to\infty}\phi_{\tilde A}(\lambda) = +\infty$, the remaining eigenvalue of $\tilde A$ must be greater than $\lambda_1$. ∎

Theorem 3.14 is our first perturbation theorem, in that it restricts the location of the eigenvalues of the perturbed matrix $\tilde A$. It is noteworthy that there is no restriction on the size of the perturbation $B$.

Notes and References

The prefix "eigen" has triumphed because of its brevity and utility. Other coinages are EIGENSYSTEM for the system of eigenvalues and associated vectors and EIGENPROBLEM for the eigenvalue problem. The term "eigenproblem" is actually more precise, since one is seldom concerned with eigenvalues alone; however, the sense does not survive translation back into German, which is further evidence of the thorough Anglicization of the prefix. Nonetheless, our free use of the prefix "eigen" does not extend to revising established nomenclature: there will be no eigenpolynomials in this book ("A foolish consistency is the hobgoblin of little minds ...").

The notation for the set of eigenvalues of a matrix or operator varies. Numerical analysts and some matrix theorists write $\lambda(A)$; functional analysts, who call eigenvalues and their generalizations the spectrum of an operator, write $\sigma(A)$. Unfortunately, the former group uses $\sigma(A)$ to denote the set of singular values of $A$. We have punted by writing $\mathcal{L}(A)$ for the set of eigenvalues of $A$.
The proof of the existence of the Schur decomposition is the same as Schur's [193, 1909]. The decomposition is essentially unique, up to the ordering of the eigenvalues; that is, once an ordering has been fixed, the copies of each multiple eigenvalue being placed together, the columns of $U$ corresponding to simple eigenvalues are unique. The columns corresponding to multiple eigenvalues are not unique, but the space spanned by the columns is. This kind of "essential uniqueness" is typical of most decompositions involving eigenvectors and the like.

Since a real matrix can have complex eigenvalues, the Schur decomposition of a real matrix can be complex, something that is undesirable in numerical applications. Reality can be restored by allowing the final form to be block triangular, with $2 \times 2$ blocks representing the complex eigenvalues. The details will be found in Exercises 3.22, 3.23, and 3.24.

The Jordan canonical form [124, 1870] represents an extreme reduction of its matrix, which is achieved at the expense of stability. For more on the computation of the Jordan form and intermediaries see [273, 18, 128].

The term "field of values" is a reasonable translation of Wertvorrat, which seems to be due to Toeplitz [241, 1918]. It is also called the numerical range. Toeplitz proved that the boundary of the field of values is a convex curve. Hausdorff [105, 1919] showed that the set itself is convex. The proof given here, which is really a souped-up version of Hausdorff's, was adapted from [122].

The result on sums of Hermitian matrices is an easy consequence of Fischer's characterization of the eigenvalues of a Hermitian matrix (see Corollary IV.4.9). The proof given here owes much to Wilkinson [269, 1965; pp. 94-98].

Exercises

1. (Cayley-Hamilton). Prove that $\phi_A(A) = 0$.

2. Prove that if $A$ is real and $\lambda \in \mathcal{L}(A)$ then $\bar\lambda \in \mathcal{L}(A)$.

3. Let $A$ be $m \times n$ and $B$ be $n \times m$.
Show that the matrices
\[ \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix} \]
are similar. Conclude that the nonzero eigenvalues of $AB$ are the same as those of $BA$. [Note: This elegant proof is due to Kahan [130].]

4. Show that the geometric multiplicity of an eigenvalue is not greater than its algebraic multiplicity.

5. Show that if $A$ is SKEW-HERMITIAN (i.e., $A^H = -A$), then all its eigenvalues lie on the imaginary axis.

6. (Loewy [148]). Let $K$ be skew Hermitian. Show that the matrix
\[ U = (I + K)^{-1}(I - K) \qquad (3.9) \]
is unitary. Moreover, if $U$ is unitary and $-1 \notin \mathcal{L}(U)$, then $U$ can be represented in the form (3.9) for some skew-Hermitian matrix $K$.

7. Suppose that $H_1$ and $H_2$ are Hermitian with $H_1$ positive definite. Show that $H_1 + H_2$ is positive definite if and only if all the eigenvalues of $H_1^{-1}H_2$ are greater than $-1$.
8. Let $A$ be idempotent (i.e., $A^2 = A$). Show that $A$ is nondefective with eigenvalues zero and one.

9. Let $A$ be nondefective. Show (without appealing to the Jordan canonical form) that if the columns of $X$ form a set of linearly independent eigenvectors of $A$, then $\Lambda = X^{-1}AX$ is diagonal. What are the diagonal elements of $\Lambda$?

10. Let $A$ be diagonalizable and let $X^{-1}AX = \Lambda$, where $\Lambda$ is diagonal. Let $X = QR$ be the QR factorization of $X$. Show that $Q^HAQ$ is upper triangular; i.e., it is a Schur decomposition of $A$.

11. Let
\[ \phi(\xi) = \alpha_0 + \alpha_1\xi + \cdots + \alpha_{n-1}\xi^{n-1} + \xi^n, \]
and let
\[ C_\phi = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ -\alpha_0 & -\alpha_1 & -\alpha_2 & \cdots & -\alpha_{n-2} & -\alpha_{n-1} \end{pmatrix}. \]
Show that the characteristic polynomial of $C_\phi$ is $\phi$. The matrix $C_\phi$ is called the COMPANION MATRIX of $\phi$.

12. Let $\phi(\xi) = (\xi - \lambda)^n$. Show that the companion matrix $C_\phi$ is similar to the Jordan block $J_n(\lambda)$.

13. Show that $J_k(0)^k = 0$. What do the matrices $J_k(0)^i$ ($i = 1, \ldots, k-1$) look like?

14. Show that
\[ J_k(\lambda)^n = \lambda^nI + \lambda^{n-1}\binom{n}{1}J + \lambda^{n-2}\binom{n}{2}J^2 + \cdots + \lambda^{n-k+1}\binom{n}{k-1}J^{k-1}, \]
where $J = J_k(0)$.

15. Prove that the field of values of a matrix $A$ is real if and only if $A$ is Hermitian. What if the field of values is imaginary?

16. Prove that the field of values of the matrix in Example 3.12 is $\{z \in \mathbb{C} : |z| \le \tfrac12\}$.

17. Let $A$ and $B$ be diagonalizable matrices. Show that the following statements are equivalent.

    1. $AB = BA$.
    2. There is a nonsingular matrix $X$ such that $X^{-1}AX$ and $X^{-1}BX$ are diagonal.
    3. There are polynomials $p$ and $q$ and a diagonalizable matrix $C$ such that $A = p(C)$ and $B = q(C)$.

18. Let $\lambda$ be a simple eigenvalue of $A$ and let $x$ and $y$ be its right and left eigenvectors. Show that $y^Hx \neq 0$.

19. Let $\phi$ be a polynomial and $J_k(\lambda)$ be a Jordan block of order $k$. Show that
\[ \phi[J_k(\lambda)] = \begin{pmatrix} \phi(\lambda) & \phi'(\lambda) & \phi''(\lambda)/2 & \phi^{(3)}(\lambda)/3! & \cdots \\ 0 & \phi(\lambda) & \phi'(\lambda) & \phi''(\lambda)/2 & \cdots \\ 0 & 0 & \phi(\lambda) & \phi'(\lambda) & \cdots \\ \vdots & \vdots & \vdots & \ddots & \ddots \end{pmatrix}. \]

20. Let $r(\lambda) = p(\lambda)/q(\lambda)$ be a rational function, and let $A$ be such that no zero of $q$ is in $\mathcal{L}(A)$. Show that $q(A)$ is nonsingular and hence $r(A) = p(A)q(A)^{-1}$ is well defined. Prove that $\mathcal{L}[r(A)] = \{r(\lambda) : \lambda \in \mathcal{L}(A)\}$. What are the corresponding eigenvectors?

21. Let $B$ be nilpotent (i.e., let there be an integer $k > 0$ such that $B^k = 0$). Show that if $AB = BA$ then $\det(A + B) = \det(A)$.

THE FOLLOWING THREE EXERCISES DEVELOP AN ANALOGUE OF THE SCHUR FORM FOR REAL MATRICES.

22. Let the columns of $X$ form an orthonormal basis for an invariant subspace of $A$. Let $AX = XL$, and let $(X\ Y)$ be unitary. Show that
\[ \begin{pmatrix} X^H \\ Y^H \end{pmatrix} A\,(X\ Y) = \begin{pmatrix} L & X^HAY \\ 0 & Y^HAY \end{pmatrix}. \]

23. Let $A$ be real, and let $\lambda$ be a complex eigenvalue of $A$ with eigenvector $x + iy$. Show that the space spanned by $x$ and $y$ is an invariant subspace of $A$.
24. (Real Schur decomposition). Show that if $A$ is real there is an orthogonal matrix $U$ such that $U^TAU$ is block triangular with $1 \times 1$ and $2 \times 2$ blocks on its diagonal. The $1 \times 1$ blocks contain the real eigenvalues of $A$, and the eigenvalues of the $2 \times 2$ blocks are the complex eigenvalues of $A$.

25. (Bendixson-Hirsch-Toeplitz theorem). Let $A = B + iC$, where $B = (A + A^H)/2$ and $C = (A - A^H)/2i$ are Hermitian. Let the eigenvalues of $B$ be $\beta_1 \ge \cdots \ge \beta_n$ and those of $C$ be $\gamma_1 \ge \cdots \ge \gamma_n$. Show that $\mathcal{L}(A)$ lies in the rectangle $[\beta_n, \beta_1] \times [\gamma_n, \gamma_1]$. [Note: Bendixson [25, 1902] proved the result for real matrices, giving a weaker bound than the above for the imaginary part. Hirsch [115, 1902] pointed out that the result holds for complex matrices and gave a sharper bound for the imaginary part. In the form stated here, the theorem is due to Toeplitz [241, 1918].]

26. The KRONECKER PRODUCT or TENSOR PRODUCT of two matrices $A$ and $B$ is
\[ A \otimes B = \begin{pmatrix} \alpha_{11}B & \alpha_{12}B & \alpha_{13}B & \cdots \\ \alpha_{21}B & \alpha_{22}B & \alpha_{23}B & \cdots \\ \alpha_{31}B & \alpha_{32}B & \alpha_{33}B & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}. \]
Show that if $A$ and $B$ are square, then $\mathcal{L}(A \otimes B) = \mathcal{L}(A)\mathcal{L}(B)$.

4. The Singular Value Decomposition

4.1. The Singular Value Decomposition

The QR decomposition has the advantage that it does not mix up the columns of a matrix, since it involves only premultiplication by a unitary matrix. The price to be paid is that the resulting matrix is only upper triangular. If we are also willing to postmultiply by a unitary matrix, then we can reduce an arbitrary matrix to a diagonal form. The result is the SINGULAR VALUE DECOMPOSITION.

Theorem 4.1. Let $A \in \mathbb{C}^{m\times n}$ have rank $r$. Then there are unitary matrices $U$ and $V$ such that
\[ U^HAV = \begin{pmatrix} \Sigma_+ & 0 \\ 0 & 0 \end{pmatrix}, \qquad (4.1) \]
where $\Sigma_+ = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ with $\sigma_1 \ge \cdots \ge \sigma_r > 0$.

Proof. Let the eigenvalues of $A^HA$ be $\sigma_1^2 \ge \cdots \ge \sigma_r^2 > 0 = \sigma_{r+1}^2 = \cdots = \sigma_n^2$.
If $V = (V_1\ V_2)$ ($V_1 \in \mathbb{C}^{n\times r}$) is a unitary matrix formed from the corresponding eigenvectors of $A^HA$, then
\[ V^HA^HAV = \begin{pmatrix} \Sigma_+^2 & 0 \\ 0 & 0 \end{pmatrix}, \]
where $\Sigma_+$ is defined as above. Thus we have
\[ V_1^HA^HAV_1 = \Sigma_+^2, \qquad (4.2) \]
\[ V_2^HA^HAV_2 = 0, \]
and from the second of these relations we conclude that
\[ AV_2 = 0. \qquad (4.3) \]
Now let
\[ U_1 = AV_1\Sigma_+^{-1}. \qquad (4.4) \]
Then from (4.2) we have $U_1^HU_1 = I$. Choose $U_2$ so that $(U_1\ U_2)$ is unitary. Then from (4.2)-(4.4) we get
\[ U^HAV = \begin{pmatrix} U_1^HAV_1 & U_1^HAV_2 \\ U_2^HAV_1 & U_2^HAV_2 \end{pmatrix} = \begin{pmatrix} \Sigma_+ & 0 \\ 0 & 0 \end{pmatrix}. \ ∎ \]

The diagonal elements of $\Sigma_+$ are called SINGULAR VALUES of $A$. Conventions differ on how to handle zero singular values. The choices are to say that $A$ has $\min\{m, n\} - \mathrm{rank}(A)$ zero singular values or that $A$ has $\max\{m, n\} - \mathrm{rank}(A)$ zero singular values. Whenever the choice makes a difference, it will be clear from context what is meant, and we will therefore use either convention at our convenience. We will denote the set of singular values of $A$ by $S(A)$. It is easy to see that $S(A)$ consists of the nonnegative square roots of the eigenvalues of $A^HA$ or of $AA^H$, depending on how we count zero singular values.

The columns of $U$ and $V$ are called LEFT and RIGHT SINGULAR VECTORS. The columns of $U$ are eigenvectors of $AA^H$, while those of $V$ are eigenvectors of $A^HA$. The singular vectors are not unique, but they are by no means arbitrary. The columns of the matrix $U_1$ must form an
orthonormal basis for the column space of $A$, while the columns of $V_1$ must form an orthonormal basis for the column space of $A^H$; and these bases are related by (4.4). Moreover, if $\tilde U_1 = U_1W_U$ and $\tilde V_1 = V_1W_V$ also consist of singular vectors of $A$, then $W_U$ and $W_V$ are unitary and $W_U\Sigma_+W_V^H = \Sigma_+$. The columns of the matrices $U_2$ and $V_2$ may be arbitrary orthonormal bases for the orthogonal complements of $\mathcal{R}(A)$ and $\mathcal{R}(A^H)$. Thus the singular value decomposition, like the QR decomposition, can be used to compute the projection onto the column space of $A$. However, it can also be used to compute the projection onto the row space of $A$.

The singular value decomposition (4.1) can be written in the form
\[ A = U_1\Sigma_+V_1^H, \qquad (4.5) \]
which is called the SINGULAR VALUE FACTORIZATION of $A$. If we write $F = U_1$ and $C = V_1\Sigma_+$, then (4.5) is an example of a FULL RANK FACTORIZATION of $A$ into the product $FC^H$ of matrices whose rank is the same as $A$.

The relation between the singular value decomposition of $A$ and the spectral decomposition of $A^HA$ allows one to obtain results on the singular value decomposition from results for Hermitian matrices. There is another way.

Theorem 4.2 (Jordan-Wielandt). Let $A \in \mathbb{C}^{m\times n}$ ($m \ge n$) have the singular value decomposition
\[ U^HAV = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix}, \]
where $\Sigma$ is of order $n$. Then the matrix
\[ C = \begin{pmatrix} 0 & A \\ A^H & 0 \end{pmatrix} \]
has eigenvalues $\pm\sigma_1, \ldots, \pm\sigma_n$, corresponding to the eigenvectors
\[ \begin{pmatrix} u_i \\ \pm v_i \end{pmatrix}, \qquad i = 1, \ldots, n, \]
where $u_i$ is the $i$th column of $U$ and $v_i$ is the $i$th column of $V$. In addition $C$ has $m - n$ zero eigenvalues whose eigenvectors are $(u_i^H\ 0)^H$ ($i = n+1, \ldots, m$).

The proof is purely computational and is left as an exercise.

An important consequence of Theorem 3.13 is the following characterization of the largest and smallest singular values of a matrix. In stating it we take the opportunity to introduce some useful notation.

Theorem 4.3. For any matrix $A \in \mathbb{C}^{m\times n}$
\[ \|A\|_2 \stackrel{\mathrm{def}}{=} \max_{\|x\|_2 = 1} \|Ax\|_2 = \max S(A). \]
If $m \ge n$, then
\[ \inf\nolimits_2(A) \stackrel{\mathrm{def}}{=} \min_{\|x\|_2 = 1} \|Ax\|_2 = \min S(A). \]

Proof. We have
\[ \|A\|_2^2 = \max_{\|x\|_2 = 1} \|Ax\|_2^2 = \max_{\|x\|_2 = 1} x^H(A^HA)x = \max \mathcal{F}(A^HA). \]
Since $A^HA$ is Hermitian, the largest member of its field of values is its largest eigenvalue, which is the square of the largest singular value of $A$. A similar argument shows that $\inf_2(A)$ is the smallest singular value of $A$. ∎

As the notation suggests, the function $\|\cdot\|_2$ is actually a matrix norm. It will be treated in detail in the next chapter.

4.2. Two Inequalities

In this subsection we will prove two useful inequalities. They are consequences of Theorem 3.14.

Theorem 4.4. Let $A \in \mathbb{C}^{m\times n}$ be partitioned in the form
\[ A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}. \]
Let the singular values of $A$ be $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and those of $A_1$ be $\tau_1 \ge \tau_2 \ge \cdots \ge \tau_n$. Then
\[ \sigma_i \ge \tau_i, \qquad i = 1, \ldots, n. \]
Proof. The squares of the singular values of $A$ are the eigenvalues of $A^H A$, and the squares of the singular values of $A_1$ are the eigenvalues of $A_1^H A_1$. But $A^H A = A_1^H A_1 + A_2^H A_2$. Since $A_2^H A_2$ is positive semi-definite, the eigenvalues of $A^H A$ taken in descending order are greater than or equal to the corresponding eigenvalues of $A_1^H A_1$. ∎

The second inequality concerns the product of two matrices.

Theorem 4.5. Let $B \in \mathbf{C}^{m \times n}$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and let $C = AB$ have singular values $\tau_1 \ge \tau_2 \ge \cdots \ge \tau_n$. Then

\[ \tau_i \le \sigma_i \|A\|_2, \qquad i = 1, \ldots, n. \]

Proof. We will establish the inequality by comparing the eigenvalues of $C^H C$ with those of $\|A\|_2^2 B^H B$. Let $A^H A = Q \Lambda^2 Q^H$ be the spectral decomposition of $A^H A$. Let $D = Q(\|A\|_2^2 I - \Lambda^2) Q^H$. Then $D$ is positive semi-definite and $A^H A + D = \|A\|_2^2 I$. Now

\[ \|A\|_2^2 B^H B = B^H (A^H A + D) B = C^H C + B^H D B. \]

Since $B^H D B$ is positive semi-definite, the eigenvalues of $\|A\|_2^2 B^H B$ are not less than the corresponding eigenvalues of $C^H C$. ∎

Both of the above theorems have trivial variants and corollaries, which we will use freely in the sequel.

Notes and References

The singular value decomposition was discovered independently by Beltrami [23, 1873] and Jordan [125, 1874]. Both cast their derivations in terms of simplifying a real bilinear form $\phi(x, y) = x^H A y$ by orthogonal transformations of the variables $x$ and $y$. Beltrami's derivation is close to the one in the text. Jordan's is something else again. He begins by asking for the largest value of $\phi$ when $\|x\|_2 = \|y\|_2 = 1$ and shows that the vector $(x^H \; y^H)^H$ is an eigenvector of the matrix

\[ \begin{pmatrix} 0 & A \\ A^H & 0 \end{pmatrix} \]

whose eigenvalue is our $\sigma_1$. He then transforms $A$ to the form

\[ \begin{pmatrix} \sigma_1 & 0 \\ 0 & \hat A \end{pmatrix} \]

and proceeds by induction à la Schur. Thus, Jordan can claim precedence for both variational characterizations in matrix analysis and the recursive definition of matrix decompositions.

The decomposition has been frequently rediscovered, first by Sylvester [235, 236, 1889, 1890]. Autonne [8, 1913] generalized it to complex matrices, and Eckart and Young [63, 1936] to rectangular matrices, where they used it to approximate a matrix by another of lower rank (see Theorem III.4.18).

The use of the word "singular" in this connection apparently comes from the literature of integral equations. The story begins with Schmidt [192, 1907], who expanded the kernel of an integral operator in the form

\[ K(x, y) = \sum_{i=1}^{\infty} \frac{1}{\lambda_i} u_i(x) v_i(y), \]

where the $u_i$ and $v_i$ are eigenfunctions of the iterated kernels $\int K(x,t)K(y,t)\,dt$ and $\int K(t,x)K(t,y)\,dt$. This is equivalent to the matrix representation

\[ K = U \Sigma V^T = \sum_{i=1}^{n} \sigma_i u_i v_i^T. \]

A little later Bateman [12, 1908] refers to numbers that are essentially the reciprocals of the eigenvalues of $K$ as singular values, but does not relate them to the numbers introduced by Schmidt (he will continue this usage through 1922). Picard [181, 1910] notes that for symmetric kernels Schmidt's $\lambda_i$ are real, and in this case (but not the general case) he calls them singular values. By 1937, Smithies [198] was calling Schmidt's numbers singular values. Exactly when and how the usage changed remains to be determined.

Theorem 4.2 is implicit in Jordan's derivation of the singular value decomposition; however, Wielandt seems to be responsible for its widespread use today (e.g., see [71, p.113]).

The singular value decomposition is closely related to the analysis, due to Hotelling [119, 1933], of a multivariate random variable into principal components. Specifically if the rows of $A$ form a centered sample of a normally distributed random vector $a$, then $V$ estimates an orthogonal transformation of $a$ into a vector whose components are uncorrelated.
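The Jordan-Wielandt construction of Theorem 4.2 can be checked numerically on a small example. The following is a minimal sketch, assuming a real $2 \times 2$ matrix assembled from a known singular value decomposition (two plane rotations and prescribed singular values); the helper names `rotation`, `jordan_wielandt`, and `residual` are illustrative and do not come from the text.

```python
import math

def rotation(theta):
    """2 x 2 plane rotation; its columns are orthonormal."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def jordan_wielandt(A):
    """Form the block matrix [[0, A], [A^T, 0]] for a real m x n matrix A."""
    m, n = len(A), len(A[0])
    At = transpose(A)
    top = [[0.0] * m + list(A[i]) for i in range(m)]
    bottom = [list(At[j]) + [0.0] * n for j in range(n)]
    return top + bottom

def residual(C, vec, lam):
    """Largest component of C*vec - lam*vec in absolute value."""
    Cv = [sum(c * v for c, v in zip(row, vec)) for row in C]
    return max(abs(a - lam * b) for a, b in zip(Cv, vec))

# Assumed test data: A = U diag(3, 1) V^T with U and V rotations.
U, V, sigma = rotation(0.7), rotation(-0.3), [3.0, 1.0]
A = matmul(matmul(U, [[sigma[0], 0.0], [0.0, sigma[1]]]), transpose(V))
C = jordan_wielandt(A)

# Check C (u_i; +-v_i) = +-sigma_i (u_i; +-v_i) for every i and both signs.
residuals = []
for i in range(2):
    u = [U[0][i], U[1][i]]
    v = [V[0][i], V[1][i]]
    for sign in (1.0, -1.0):
        vec = u + [sign * t for t in v]
        residuals.append(residual(C, vec, sign * sigma[i]))
max_residual = max(residuals)
```

In this sketch `max_residual` should be at the level of rounding error, which is consistent with the eigenpairs $\pm\sigma_i$, $(u_i^H \; \pm v_i^H)^H$ stated in the theorem.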
Exercises

Throughout these exercises $A$ will be an $m \times n$ ($m \ge n$) matrix with the singular value decomposition (4.1).

1. Show that if $A$ is square, $|\det(A)| = \prod_{i=1}^{n} \sigma_i$.

2. (Autonne [7, 1902]). Prove that any square matrix $A$ has a POLAR DECOMPOSITION $A = HQ$, where $Q$ is unitary and $H$ is positive semi-definite. Moreover, if $A$ is nonsingular, then $H$ is positive definite and the polar decomposition of $A$ is unique.

3. What is the singular value decomposition of a Hermitian matrix? Of a unitary matrix?

4. Let $A$ be nonsingular. Show that $\inf_2(A) = \|A^{-1}\|_2^{-1}$.

5. Show that

\[ \|A\|_2 = \max_{\|y\|_2 = 1} \max_{\|x\|_2 = 1} |y^H A x|. \]

6. Let $B \in \mathbf{C}^{m \times n}$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and let $C = AB$ have singular values $\tau_1 \ge \tau_2 \ge \cdots \ge \tau_n$. Then

\[ \tau_i \ge \sigma_i \inf\nolimits_2(A), \qquad i = 1, \ldots, n. \]

[Hint: Assume without loss of generality that $A$ is square and nonsingular and apply Theorem 4.5 to $A^{-1}C$.]

7. Let the eigenvalues of a square matrix $A$ be ordered so that $|\lambda_1| \ge \cdots \ge |\lambda_n|$. Show that $|\lambda_1| \le \sigma_1$.

8. The matrix $W_n$ is illustrated below for $n = 5$:

\[ W_5 = \begin{pmatrix} 1 & -1 & -1 & -1 & -1 \\ 0 & 1 & -1 & -1 & -1 \\ 0 & 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}. \]

Show that $\inf_2(W_n) = O(2^{-n})$. What are the eigenvalues of $W_n$? What is its determinant?

5. Pairs of Subspaces

The problem to be treated in this section is that of comparing two subspaces of $\mathbf{C}^n$. Our tool will be the CS DECOMPOSITION (cosine-sine decomposition) of a partitioned unitary matrix. This decomposition allows us to define canonical angles for pairs of subspaces in such a way that as the largest canonical angle approaches zero the subspaces approach one another. This in turn leads to some useful results on the singular values of products and differences of projections.

5.1. The CS Decomposition

We begin by establishing the existence of the CS decomposition.

Theorem 5.1.
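Let the unitary matrix $W \in \mathbf{C}^{n \times n}$ be partitioned in the form

\[ W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}, \]

where $W_{11}$ is $l \times l$, $W_{22}$ is $(n-l) \times (n-l)$, and $2l \le n$. Then there are unitary matrices $U = \mathrm{diag}(U_{11}, U_{22})$ and $V = \mathrm{diag}(V_{11}, V_{22})$ with $U_{11}, V_{11} \in \mathbf{C}^{l \times l}$ such that

\[ U^H W V = \begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{pmatrix}, \tag{5.1} \]

where the row and column blocks have orders $l$, $l$, and $n - 2l$, $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_l) \ge 0$, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_l) \ge 0$, and $\Gamma^2 + \Sigma^2 = I$.

Proof. The proof, which is long and tedious, is included mainly for completeness. Let $U_{11}^H W_{11} V_{11} = \Gamma$ be the singular value decomposition of $W_{11}$, and suppose that

\[ \Gamma = \mathrm{diag}(\Gamma_1, I_{l-k}), \]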
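where the diagonal elements of $\Gamma_1$ satisfy

\[ 0 \le \gamma_1 \le \cdots \le \gamma_k < 1 \]

(since $W$ is unitary the singular values of $W_{11}$ cannot be greater than one). Clearly the matrix

\[ \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_{11} \]

has orthonormal columns. Hence

\[ I = \left[ \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_{11} \right]^H \left[ \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_{11} \right] = \Gamma^2 + (W_{21} V_{11})^H (W_{21} V_{11}); \]

that is,

\[ (W_{21} V_{11})^H (W_{21} V_{11}) = \mathrm{diag}(I - \Gamma_1^2, 0_{l-k}). \]

This means that the columns of $W_{21} V_{11}$ are orthogonal with its last $l - k$ columns being zero. Thus there is a unitary matrix $\hat U_{22} \in \mathbf{C}^{(n-l) \times (n-l)}$ such that

\[ \hat U_{22}^H W_{21} V_{11} = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix}, \]

where the row blocks have orders $l$ and $n - 2l$,

\[ \Sigma = \mathrm{diag}(\Sigma_1, 0_{l-k}), \tag{5.2} \]

and $\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ has positive diagonal elements. Since

\[ \mathrm{diag}(U_{11}, \hat U_{22})^H \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_{11} = \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix} \]

has orthonormal columns, we must have

\[ \gamma_i^2 + \sigma_i^2 = 1, \qquad i = 1, \ldots, l. \tag{5.3} \]

In a similar manner we may determine a unitary matrix $V_{22} \in \mathbf{C}^{(n-l) \times (n-l)}$ such that

\[ U_{11}^H W_{12} V_{22} = (T \;\; 0), \]

where $T = \mathrm{diag}(\tau_1, \ldots, \tau_l)$ with $\tau_i \le 0$ ($i = 1, \ldots, l$). Since

\[ U_{11}^H (W_{11} \;\; W_{12})\, \mathrm{diag}(V_{11}, V_{22}) = (\Gamma \;\; T \;\; 0) \]

has orthonormal rows, we must have $\gamma_i^2 + \tau_i^2 = 1$ ($i = 1, \ldots, l$), and it follows from (5.2) and (5.3) that $T = -\Sigma$.

Set $\hat U = \mathrm{diag}(U_{11}, \hat U_{22})$ and $V = \mathrm{diag}(V_{11}, V_{22})$. Then the foregoing shows that the matrix $X = \hat U^H W V$ can be partitioned in the form

\[ X = \begin{pmatrix} \Gamma_1 & 0 & -\Sigma_1 & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \Sigma_1 & 0 & X_{33} & X_{34} & X_{35} \\ 0 & 0 & X_{43} & X_{44} & X_{45} \\ 0 & 0 & X_{53} & X_{54} & X_{55} \end{pmatrix}, \]

where the row and column blocks have orders $k$, $l-k$, $k$, $l-k$, and $n-2l$. Since $X$ is unitary and $\Sigma_1$ has positive diagonal elements, we have $X_{33} = \Gamma_1$. Moreover, $X_{34}$, $X_{35}$, $X_{43}$, and $X_{53}$ are zero. Therefore

\[ X = \begin{pmatrix} \Gamma_1 & 0 & -\Sigma_1 & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \Sigma_1 & 0 & \Gamma_1 & 0 & 0 \\ 0 & 0 & 0 & X_{44} & X_{45} \\ 0 & 0 & 0 & X_{54} & X_{55} \end{pmatrix}, \]

where

\[ \begin{pmatrix} X_{44} & X_{45} \\ X_{54} & X_{55} \end{pmatrix} \]

is unitary. Set

\[ \hat U_{33} = \begin{pmatrix} X_{44} & X_{45} \\ X_{54} & X_{55} \end{pmatrix} \in \mathbf{C}^{(n-l-k) \times (n-l-k)}; \]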
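then we have

\[ \mathrm{diag}(I_{l+k}, \hat U_{33}^H) X = \begin{pmatrix} \Gamma_1 & 0 & -\Sigma_1 & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \Sigma_1 & 0 & \Gamma_1 & 0 & 0 \\ 0 & 0 & 0 & I & 0 \\ 0 & 0 & 0 & 0 & I \end{pmatrix} = \begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{pmatrix}, \]

where the blocks of the last matrix have orders $l$, $l$, and $n - 2l$. Observe that

\[ \mathrm{diag}(I_{l+k}, \hat U_{33}^H) X = \mathrm{diag}(I_{l+k}, \hat U_{33}^H)\, \hat U^H W V. \]

Hence, if we set

\[ \begin{aligned} U &= \hat U\, \mathrm{diag}(I_{l+k}, \hat U_{33}) \\ &= \mathrm{diag}(U_{11}, \hat U_{22})\, \mathrm{diag}(I_l, \mathrm{diag}(I_k, \hat U_{33})) \\ &= \mathrm{diag}(U_{11}, \hat U_{22}\, \mathrm{diag}(I_k, \hat U_{33})) \\ &\equiv \mathrm{diag}(U_{11}, U_{22}), \end{aligned} \]

then $U^H W V$ has the form (5.1), where $U$ and $V$ are block diagonal unitary matrices. ∎

5.2. Pairs of Subspaces

Armed with the CS decomposition, we may now attack the problem of determining how two subspaces are situated with respect to one another. The following theorem shows that there are natural bases for two subspaces that exhibit their relation.

Theorem 5.2. Let $X_1, Y_1 \in \mathbf{C}^{n \times l}$ with $X_1^H X_1 = I$ and $Y_1^H Y_1 = I$. If $2l \le n$, there are unitary matrices $Q$, $U_{11}$ and $V_{11}$ such that

\[ Q X_1 U_{11} = \begin{pmatrix} I \\ 0 \\ 0 \end{pmatrix}, \tag{5.4} \]

\[ Q Y_1 V_{11} = \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix}, \tag{5.5} \]

where the row blocks have orders $l$, $l$, and $n - 2l$, and $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_l)$ and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_l)$ satisfy

\[ 0 \le \gamma_1 \le \cdots \le \gamma_l, \qquad \sigma_1 \ge \cdots \ge \sigma_l \ge 0, \qquad \gamma_i^2 + \sigma_i^2 = 1, \quad i = 1, \ldots, l. \tag{5.6} \]

On the other hand, if $2l > n$ then $Q$, $U_{11}$, and $V_{11}$ may be chosen so that

\[ Q X_1 U_{11} = \begin{pmatrix} I & 0 \\ 0 & I \\ 0 & 0 \end{pmatrix}, \tag{5.7} \]

\[ Q Y_1 V_{11} = \begin{pmatrix} \Gamma & 0 \\ 0 & I \\ \Sigma & 0 \end{pmatrix}, \tag{5.8} \]

where the row blocks have orders $n-l$, $2l-n$, and $n-l$, the column blocks have orders $n-l$ and $2l-n$, and $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_{n-l})$ and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_{n-l})$ satisfy

\[ 0 \le \gamma_1 \le \cdots \le \gamma_{n-l}, \qquad \sigma_1 \ge \cdots \ge \sigma_{n-l} \ge 0, \qquad \gamma_i^2 + \sigma_i^2 = 1, \quad i = 1, \ldots, n-l. \]

Proof. We will prove the theorem for $2l \le n$, leaving the other case as an exercise. Let $X_2$ and $Y_2$ be chosen so that $X = (X_1 \; X_2)$ and $Y = (Y_1 \; Y_2)$ are unitary. Let

\[ W = X^H Y = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}, \]

where $W_{11}$ is $l \times l$. By Theorem 5.1 there are unitary matrices $U_{11}$, $U_{22}$, $V_{11}$ and $V_{22}$ such that the relation (5.1) holds. Therefore, if we set

\[ \hat X_i = X_i U_{ii}, \qquad \hat Y_i = Y_i V_{ii}, \qquad i = 1, 2, \]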
and $\hat X = (\hat X_1 \; \hat X_2)$, $\hat Y = (\hat Y_1 \; \hat Y_2)$, then

\[ \hat X^H \hat Y = \begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{pmatrix}, \]

where the blocks have orders $l$, $l$, and $n - 2l$. Moreover, by permuting the columns of $U_{ii}$ and $V_{ii}$ we can insure that (5.6) holds. Equations (5.4) and (5.5) now follow on setting $Q = \hat X^H$. ∎

Theorem 5.2 has the following geometric significance. Let $\mathcal{X}$ and $\mathcal{Y}$ be $l$-dimensional subspaces of $\mathbf{C}^n$. If $2l \le n$ we can transform $\mathbf{C}^n$ by a unitary transformation $Q$ so that the columns of the matrices

\[ \begin{pmatrix} I \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix} \]

form orthogonal bases of $Q\mathcal{X}$ and $Q\mathcal{Y}$. The space spanned by the columns for which $\gamma_i = 0$ is the subspace of $Q\mathcal{Y}$ that is orthogonal to $Q\mathcal{X}$.

When $2l > n$, the columns of the matrices

\[ \begin{pmatrix} I & 0 \\ 0 & I \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \Gamma & 0 \\ 0 & I \\ \Sigma & 0 \end{pmatrix} \]

form orthogonal bases for $Q\mathcal{X}$ and $Q\mathcal{Y}$. As above, the space spanned by the columns for which $\gamma_i = 0$ is the subspace of $Q\mathcal{Y}$ that is orthogonal to $Q\mathcal{X}$. The last $2l - n$ columns represent the smallest possible intersection of $Q\mathcal{X}$ and $Q\mathcal{Y}$.

Since the numbers $\sigma_i$ and $\gamma_i$ satisfy $\sigma_i^2 + \gamma_i^2 = 1$, they can be regarded as sines and cosines of angles between the bases. Moreover, $\mathcal{X} = \mathcal{Y}$ if and only if $\Sigma = 0$. This means that the size of $\Sigma$ is a measure of how $\mathcal{X}$ and $\mathcal{Y}$ differ. Since $\Sigma$ depends only on the subspaces $\mathcal{X}$ and $\mathcal{Y}$, we may make the following definition.

Definition 5.3. Let $\mathcal{X}$ and $\mathcal{Y}$ be subspaces of the same dimension. The CANONICAL ANGLES between $\mathcal{X}$ and $\mathcal{Y}$ are the diagonals of the matrix

\[ \Theta(\mathcal{X}, \mathcal{Y}) \stackrel{\mathrm{def}}{=} \sin^{-1} \Sigma, \]

where $\Sigma$ is the matrix of Theorem 5.2.

The following corollary of Theorem 5.2 shows how to compute the canonical angles. Its proof is left as an exercise.

Corollary 5.4. Let $\mathcal{X}$ and $\mathcal{Y}$ be subspaces of $\mathbf{C}^n$ of the same dimension. Let the columns of $X_\perp$ form an orthonormal basis for $\mathcal{X}^\perp$ and the columns of $Y$ form an orthonormal basis for $\mathcal{Y}$. Then the nonzero singular values of $X_\perp^H Y$ are the sines of the nonzero canonical angles between $\mathcal{X}$ and $\mathcal{Y}$.

5.3. Pairs of Projections

A difficulty with the results of the last subsection is that they are cast in terms of explicit bases for the subspaces involved. In many instances it is more convenient to represent subspaces by their orthogonal projections. As one would expect, pairs of projections are closely related to the canonical angles between their subspaces.

Theorem 5.5. Let $\mathcal{X}$ and $\mathcal{Y}$ be $l$-dimensional subspaces of $\mathbf{C}^n$. Let

\[ k = \begin{cases} l, & \text{if } 2l \le n, \\ n - l, & \text{if } 2l > n. \end{cases} \]

Let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$ be the sines of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$. Then:

1. The singular values of $P_{\mathcal{X}}(I - P_{\mathcal{Y}})$ are $\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0$.

2. The singular values of $P_{\mathcal{X}} - P_{\mathcal{Y}}$ are $\sigma_1, \sigma_1, \sigma_2, \sigma_2, \ldots, \sigma_k, \sigma_k, 0, \ldots, 0$.
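Both assertions of Theorem 5.5 can be sanity-checked in the simplest case $l = 1$, where the subspaces are lines in $\mathbf{R}^3$ and the single canonical angle is the ordinary angle between them. The sketch below is only an illustration under those assumptions, and the helper names are hypothetical. It avoids a full singular value routine by using two structural facts: $P_{\mathcal{X}}(I - P_{\mathcal{Y}})$ has rank one, so its Frobenius norm equals its one nonzero singular value; and $P_{\mathcal{X}} - P_{\mathcal{Y}}$ is symmetric with zero trace and rank at most two, so its nonzero singular values are equal and each is the Frobenius norm divided by $\sqrt{2}$.

```python
import math

def normalize(v):
    r = math.sqrt(sum(t * t for t in v))
    return [t / r for t in v]

def projector(v):
    """Orthogonal projector v v^T onto the line spanned by the unit vector v."""
    return [[a * b for b in v] for a in v]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def frobenius(M):
    return math.sqrt(sum(t * t for row in M for t in row))

# Assumed example data: two lines in R^3.
x = normalize([1.0, 2.0, 2.0])
y = normalize([2.0, -1.0, 2.0])

# Cosine and sine of the single canonical angle between the lines.
cos_theta = abs(sum(a * b for a, b in zip(x, y)))
sin_theta = math.sqrt(1.0 - cos_theta ** 2)

n = 3
Px, Py = projector(x), projector(y)
I_minus_Py = [[(1.0 if i == j else 0.0) - Py[i][j] for j in range(n)]
              for i in range(n)]

# Rank-one product: the Frobenius norm is its only nonzero singular value.
product = matmul(Px, I_minus_Py)
sigma_from_product = frobenius(product)

# Symmetric, trace zero, rank two: singular values s, s, 0 with 2 s^2 = ||.||_F^2.
difference = [[Px[i][j] - Py[i][j] for j in range(n)] for i in range(n)]
sigma_from_difference = frobenius(difference) / math.sqrt(2.0)
```

Both `sigma_from_product` and `sigma_from_difference` should agree with `sin_theta` up to rounding, matching the patterns $\sigma_1, 0, 0$ and $\sigma_1, \sigma_1, 0$ in the theorem.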
Proof. We will prove the theorem for the case $2l \le n$, leaving the other case as an exercise. If in Theorem 5.2 we let $X_1$ and $Y_1$ be bases for $\mathcal{X}$ and $\mathcal{Y}$, then $P_{\mathcal{X}} = X_1 X_1^H$ and $P_{\mathcal{Y}} = Y_1 Y_1^H$. It follows from (5.4) and (5.5) that

\[ \begin{aligned} Q P_{\mathcal{X}} (I - P_{\mathcal{Y}}) Q^H &= \begin{pmatrix} I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} I - \Gamma^2 & -\Gamma\Sigma & 0 \\ -\Gamma\Sigma & I - \Sigma^2 & 0 \\ 0 & 0 & I \end{pmatrix} \\ &= \begin{pmatrix} \Sigma^2 & -\Gamma\Sigma & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} \Sigma \\ 0 \\ 0 \end{pmatrix} (\Sigma \;\; {-\Gamma} \;\; 0). \end{aligned} \]

Since the rows of $(\Sigma \;\; {-\Gamma} \;\; 0)$ are orthonormal, the singular values of $Q P_{\mathcal{X}} (I - P_{\mathcal{Y}}) Q^H$ are $\sigma_1, \sigma_2, \ldots, \sigma_l, 0, \ldots, 0$. This proves the first assertion.

In a similar manner we find that

\[ Q (P_{\mathcal{X}} - P_{\mathcal{Y}}) Q^H = \begin{pmatrix} \Sigma^2 & -\Gamma\Sigma & 0 \\ -\Gamma\Sigma & -\Sigma^2 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

Since $\Sigma$ and $\Gamma$ are diagonal, the nonzero singular values of $P_{\mathcal{X}} - P_{\mathcal{Y}}$ are just the singular values of the $2 \times 2$ matrices

\[ S_i = \begin{pmatrix} \sigma_i^2 & -\gamma_i \sigma_i \\ -\gamma_i \sigma_i & -\sigma_i^2 \end{pmatrix}, \qquad i = 1, \ldots, l. \]

But the singular values of $S_i$ are just $\sigma_i$ twice over. ∎

Notes and References

The notion of canonical angles between subspaces goes back to Jordan [126, 1875], and has been frequently rediscovered; e.g., in the very readable papers by Afriat [2, 3]. Davis and Kahan [53, 1970] give a unified treatment of the subject in Hilbert space and a bibliographical note with further references. Wedin [261] gives a lucid survey with an emphasis on geometry.

The CS decomposition was introduced by Stewart [205, 1977], although it is implicit in the paper of Davis and Kahan just cited. Paige and Saunders [174, 1981] consider the case where the diagonal blocks are not square.

It is not a trivial matter to compute canonical angles or the CS decomposition. Algorithms are given in [37, 209].

Exercises

IN THE FOLLOWING EXERCISES $\mathcal{X}$ AND $\mathcal{Y}$ ARE SUBSPACES OF $\mathbf{C}^n$ OF DIMENSION $l$, WHERE $2l \le n$. THE CANONICAL ANGLES OF $\mathcal{X}$ AND $\mathcal{Y}$ ARE $\theta_1 \ge \cdots \ge \theta_l$. BY THE ANGLE BETWEEN TWO NONZERO VECTORS $x$ AND $y$ WE MEAN

\[ \angle(x, y) \stackrel{\mathrm{def}}{=} \cos^{-1} \frac{|y^H x|}{\|x\|_2 \|y\|_2}. \]

1. Let $\|x\|_2 = \|y\|_2 = 1$ and $y^H x \ge 0$. Show that

\[ \|y - x\|_2 = 2 \sin \frac{\angle(x, y)}{2}. \]

2. Let the columns of $X$ and $Y$ form orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$. Show that the singular values of $Y^H X$ are the cosines of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$.

3. Show that

\[ \cos\theta_1 = \min_{\substack{x \in \mathcal{X} \\ x \ne 0}} \max_{\substack{y \in \mathcal{Y} \\ y \ne 0}} \cos \angle(x, y), \]

or equivalently

\[ \theta_1 = \max_{\substack{x \in \mathcal{X} \\ x \ne 0}} \min_{\substack{y \in \mathcal{Y} \\ y \ne 0}} \angle(x, y). \]

4. Let $v^H u = 1$. Show that there are matrices $U$ and $V$ such that $U^H U = \mathrm{diag}(\|u\|_2^2 \|v\|_2^2, I)$, $V^H V = I$, and

\[ (u \;\; U)^{-1} = \begin{pmatrix} v^H \\ V^H \end{pmatrix}. \]

5. (Halmos [101]). A matrix $A$ of order $n$ is a contraction in the 2-norm if $\|A\|_2 \le 1$. Show that if $A$ is a contraction, there exist matrices $B$, $C$, and $D$ of order $n$ such that

\[ \begin{pmatrix} A & B \\ C & D \end{pmatrix} \]

is unitary.

THE FOLLOWING TWO EXERCISES DEFINE A UNITARY TRANSFORMATION, THE DIRECT ROTATION, THAT MAPS $\mathcal{X}$ ONTO $\mathcal{Y}$, AND THEY SHOW THAT IT IS IN SOME SENSE OPTIMAL. FOR FURTHER DETAILS SEE THE PAPER BY DAVIS AND KAHAN [53].

6. In Theorem 5.2 let the columns of $X$ and $Y$ form canonical bases for $\mathcal{X}$ and $\mathcal{Y}$, so that $U_{11}$ and $V_{11}$ are the identity. Let

\[ U = Q^H \begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{pmatrix} Q. \]

Show that $U$ is a unitary matrix such that $U\mathcal{X} = \mathcal{Y}$. Moreover,

\[ \|I - U\|_2 = 2 \sin \frac{\theta_1}{2}. \]

7. Let $V$ be an orthogonal matrix such that $V\mathcal{X} = \mathcal{Y}$. Show that

\[ \|I - V\|_2 \ge 2 \sin \frac{\theta_1}{2}. \]

THE FOLLOWING EXERCISE DERIVES A SPECIAL CASE OF THE GENERALIZED SINGULAR VALUE DECOMPOSITION INTRODUCED BY VAN LOAN [247, 248]. THE FORM GIVEN HERE IS DUE TO PAIGE AND SAUNDERS [174].

8. Let

\[ X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \]

where $X_1$ and $X_2$ are square and $X$ has full rank. Show that there are unitary matrices $U_1$ and $U_2$ and a nonsingular matrix $S$ such that

\[ \mathrm{diag}(U_1, U_2)^H X = \begin{pmatrix} \Gamma \\ \Sigma \end{pmatrix} S, \]

where $S$ is nonsingular and $\Gamma$ and $\Sigma$ are diagonal with $\Gamma^2 + \Sigma^2 = I$. [Hint: Consider the CS decomposition of the orthogonal part of the QR factorization of $X$.]
Chapter II

Norms and Metrics

The goal of matrix perturbation analysis is to predict or bound the changes in objects associated with a matrix when the elements of the matrix change. For example we might ask how far the eigenvalues and eigenvectors of a matrix $A$ will change when $A$ is replaced by a nearby matrix $A + E$. A prerequisite for answering such questions is to make precise terms like "how far" and "nearby."

For eigenvalues, which are complex numbers, the absolute value function $|\cdot| : \mathbf{C} \to \mathbf{R}$ provides a natural notion of size and distance. The usefulness of this function in analysis depends on three properties:

1. $\zeta \ne 0 \implies |\zeta| > 0$ (definiteness),
2. $|\alpha\zeta| = |\alpha||\zeta|$ (homogeneity),
3. $|\zeta + \eta| \le |\zeta| + |\eta|$ (the triangle inequality).

These three properties make the function $\rho(\zeta, \eta) = |\zeta - \eta|$ a metric over $\mathbf{C}$, which endows it with a topology, so that we may speak of limits and continuity.

A vector norm is a generalization of the absolute value; i.e., a definite, homogeneous function on $\mathbf{C}^n$ that satisfies the triangle inequality. The first section of this chapter is devoted to the study of the elementary properties of vector norms. It is also possible to define a matrix norm to be a definite, homogeneous function on $\mathbf{C}^{m \times n}$ that satisfies
the triangle inequality. However, such a definition ignores the fact that matrices can be multiplied, and it is usually augmented to take this operation into account. Section 2 is devoted to the study of matrix norms, and Section 3 is devoted to the study of a particular class of matrix norms, the UNITARILY INVARIANT NORMS, which interact nicely with the Euclidean geometry of $\mathbf{C}^n$. In some applications, the object that is perturbed is not a vector or a matrix, but a subspace. Accordingly, the chapter concludes with a discussion of measures of distance or metrics for subspaces.

This chapter, like the first, is preliminary. Unlike the first, it contains material requiring lengthy, deep proofs. The following comments are for the reader who wants to get through the chapter quickly and move on to the perturbation theory itself. The first two sections are an elementary introduction to vector and matrix norms and may be skimmed by anyone familiar with the subject. The third section contains the most challenging material in the chapter. Theorem 3.6, which characterizes unitarily invariant norms, is the key result of the first subsection. However, its supporting lemmas and proof are not required elsewhere and may be skipped. Of the material in the last subsection only Birkhoff's theorem (Theorem 3.16) and Fan's theorem (Theorem 3.17) will be used in the sequel. The material in Section 4 is used only in Chapters V and VI. The reader may find the summary (4.11) useful in sorting out the various metrics introduced in this subsection.

1. Vector Norms

1.1. Definition

As we mentioned in the introduction, a norm on $\mathbf{C}^n$ is a generalization of the absolute value of a complex number.

Definition 1.1. A function $\nu : \mathbf{C}^n \to \mathbf{R}$ is said to be a NORM on $\mathbf{C}^n$ (or a VECTOR NORM) if $\nu$ satisfies the following conditions:

1. $x \ne 0 \implies \nu(x) > 0$,
2. $\nu(\alpha x) = |\alpha|\nu(x)$,
3. $\nu(x + y) \le \nu(x) + \nu(y)$.

Three important properties follow immediately from Definition 1.1. For any norm $\nu$,

1. $\nu(0) = 0$,
2. $\nu(-x) = \nu(x)$,
3. $|\nu(x) - \nu(y)| \le \nu(x - y)$.

1.2. Examples

There are an infinite number of norms on $\mathbf{C}^n$. However, three of these, the 1, 2, and $\infty$-norms, are most commonly used in practice. The 1-norm is defined by

\[ \|x\|_1 \stackrel{\mathrm{def}}{=} \sum_{i=1}^{n} |\xi_i|; \]

the 2-norm by

\[ \|x\|_2 \stackrel{\mathrm{def}}{=} \sqrt{\sum_{i=1}^{n} |\xi_i|^2}; \]

and the $\infty$-norm by

\[ \|x\|_\infty \stackrel{\mathrm{def}}{=} \max_{i} |\xi_i|. \]

The norms $\|\cdot\|_1$ and $\|\cdot\|_2$ are special cases of the HÖLDER NORMS (or $p$-NORMS) defined by

\[ \|x\|_p = \Bigl( \sum_{i=1}^{n} |\xi_i|^p \Bigr)^{1/p}, \qquad 1 \le p < \infty. \tag{1.1} \]

(For a proof that $\|\cdot\|_p$ is indeed a norm, see Exercises 1.6-1.8.) Since

\[ \|x\|_\infty = \lim_{p \to \infty} \|x\|_p, \tag{1.2} \]

the norm $\|\cdot\|_\infty$ is also regarded as a Hölder norm.

The 2-norm has the useful characterization

\[ \|x\|_2^2 = x^H x, \]

from which it immediately follows that
the 2-norm is UNITARILY INVARIANT; i.e.,

\[ U \text{ unitary} \implies \|Ux\|_2 = \|x\|_2. \]

The Hölder norms have the property that $\|\,|x|\,\|_p = \|x\|_p$. They also have the property that $\|x\|_p \le \|y\|_p$ whenever $|x| \le |y|$. These properties will play an important role in the theory of unitarily invariant norms as well as in the structured perturbation theory of linear systems and least squares problems, and it is worth establishing that they are equivalent. We begin by introducing some terminology that we will use later.

Definition 1.2. A vector norm $\nu$ on $\mathbf{C}^n$ is ABSOLUTE if $\nu(|x|) = \nu(x)$ for all $x \in \mathbf{C}^n$.

Theorem 1.3. A vector norm $\nu$ is absolute if and only if

\[ |x| \le |y| \implies \nu(x) \le \nu(y). \tag{1.3} \]

Proof. Suppose that $\nu$ is absolute. Note that this implies that if $|D| = I$ then $\nu(Dx) = \nu(x)$. Now let $|x| \le |y|$. It suffices to prove (1.3) for the case where the first component of $|x|$ is less than or equal to the first component of $|y|$ and the other components of $x$ and $y$ are equal in absolute value. By multiplying $x$ and $y$ by suitable diagonal matrices, we may assume without loss of generality that the first components of $x$ and $y$ are nonnegative and the others are equal. Let $x = (\rho\eta_1, \eta_2, \ldots, \eta_n)^T$, where $0 \le \rho < 1$ and the $\eta_i$ are the components of $y$. If we set $\tilde y = (-\eta_1, \eta_2, \ldots, \eta_n)^T$, then

\[ x = \frac{1+\rho}{2}\, y + \frac{1-\rho}{2}\, \tilde y. \]

Since $1 - \rho \ge 0$,

\[ \begin{aligned} \nu(x) &= \nu\Bigl( \frac{1+\rho}{2}\, y + \frac{1-\rho}{2}\, \tilde y \Bigr) \\ &\le \frac{1+\rho}{2}\, \nu(y) + \frac{1-\rho}{2}\, \nu(\tilde y) \\ &= \frac{1+\rho}{2}\, \nu(|y|) + \frac{1-\rho}{2}\, \nu(|\tilde y|) \\ &= \nu(y). \end{aligned} \]

The converse is left as an exercise. ∎

The following theorem exhibits a technique for constructing new norms from old. Its proof is left as an exercise.

Theorem 1.4. Let $\mu$ be a norm on $\mathbf{C}^m$, and let $A \in \mathbf{C}^{m \times n}$ have rank $n$. Then the function $\nu$ defined by

\[ \nu(x) = \mu(Ax) \]

is a norm on $\mathbf{C}^n$.

Corollary 1.5. Let $A$ be positive definite. Then the function $\nu$ defined by

\[ \nu(x) = \sqrt{x^H A x} \]

is a norm.

Proof. We have

\[ \nu(x) = \|A^{1/2} x\|_2. \]

Hence by Theorem 1.4, $\nu$ is a norm. ∎

Norms generated by a positive definite matrix $A$ are called ELLIPTIC NORMS and are often written $\|\cdot\|_A$. It is worth noting that these norms bear the same relation to the "inner product" $y^H A x$ that the 2-norm bears to the usual inner product $y^H x$ (see Exercises 1.15-1.17).

1.3. Equivalence and Limits

In $\mathbf{R}^2$ and $\mathbf{R}^3$, the norm $\|y - x\|_2$ is the ordinary Euclidean distance between $x$ and $y$. For this reason the 2-norm on $\mathbf{C}^n$ is also called the Euclidean norm. Moreover, the function

\[ \rho_2(x, y) = \|y - x\|_2 \]

is a METRIC on $\mathbf{C}^n$; that is, it satisfies the conditions

1. $\rho_2(x, y) \ge 0$,
2. $\rho_2(x, y) = 0 \iff x = y$,
3. $\rho_2(x, y) = \rho_2(y, x)$,
4. $\rho_2(x, z) \le \rho_2(x, y) + \rho_2(y, z)$.

The metric $\rho_2$ defines a topology on $\mathbf{C}^n$; that is, it provides $\mathbf{C}^n$ with a collection of open sets from which the notions of closed sets, compactness, limits, and continuity may be defined. If $\nu$ is a norm on $\mathbf{C}^n$, the function

\[ \rho_\nu(x, y) = \nu(y - x) \]

is also a metric and hence defines a topology for $\mathbf{C}^n$. It turns out that this topology is the same as the topology generated by the Euclidean norm. This is a consequence of the equivalence of norms on finite dimensional spaces. We will prove this result in two steps, the first of which is of independent interest.

Lemma 1.6. Let $\nu$ be a norm on $\mathbf{C}^n$. Then $\nu$ is continuous in the Euclidean metric.

Proof. We need only prove that for any $\epsilon > 0$ there is a $\delta > 0$ such that $|\nu(y) - \nu(x)| < \epsilon$ whenever $\|y - x\|_2 < \delta$. From Definition 1.1,

\[ \begin{aligned} |\nu(y) - \nu(x)| &\le \nu(y - x) = \nu\Bigl[ \sum_{i=1}^{n} (\eta_i - \xi_i) 1_i \Bigr] \\ &\le \sum_{i=1}^{n} |\eta_i - \xi_i|\, \nu(1_i) \le \gamma \|y - x\|_2, \end{aligned} \tag{1.4} \]

where $\gamma = \sqrt{\sum_{i=1}^{n} \nu^2(1_i)} > 0$ is independent of $x$ and $y$. If we take $\delta = \epsilon/\gamma$, then $|\nu(y) - \nu(x)| < \epsilon$ provided that $\|y - x\|_2 < \delta$. ∎

This lemma allows us to prove the equivalence of norms.

Theorem 1.7. Let $\nu$ and $\mu$ be norms on $\mathbf{C}^n$. Then there are positive numbers $\pi_1$ and $\pi_2$ depending only on $\nu$ and $\mu$ such that

\[ \pi_1 \nu(x) \le \mu(x) \le \pi_2 \nu(x), \qquad \forall x \in \mathbf{C}^n. \]

Proof. Without loss of generality, we may take $x \ne 0$. The first step is to prove the theorem for the case $\nu(\cdot) = \|\cdot\|_2$. In this case, it follows from (1.4) with $y = 0$ that we may take $\pi_2 = \sqrt{\sum_{i=1}^{n} \mu^2(1_i)}$. Thus we need only determine $\pi_1$.

The unit sphere $S_2 = \{x : \|x\|_2 = 1\}$ is closed and bounded. Since $\mu$ is continuous (Lemma 1.6), $\mu$ achieves a minimal value $\pi_1$ on $S_2$. Now let $y = x/\|x\|_2$. Then $y \in S_2$, and

\[ \mu(x) = \mu(\|x\|_2\, y) = \|x\|_2\, \mu(y) \ge \pi_1 \|x\|_2. \tag{1.5} \]

Now let $\mu$ and $\nu$ be arbitrary norms. From the foregoing, there are positive numbers $\sigma_1$, $\sigma_2$, $\tau_1$, and $\tau_2$ such that

\[ \sigma_1 \|x\|_2 \le \mu(x) \le \sigma_2 \|x\|_2, \tag{1.6} \]

and

\[ \tau_1 \|x\|_2 \le \nu(x) \le \tau_2 \|x\|_2. \tag{1.7} \]

From (1.6) and (1.7),

\[ 0 < \pi_1 \equiv \frac{\sigma_1}{\tau_2} \le \frac{\mu(x)}{\nu(x)} \le \frac{\sigma_2}{\tau_1} \equiv \pi_2. \]

∎

The following example shows the relations between the 1, 2, and $\infty$-norms.

Example 1.8. For all $x$

1. $\|x\|_2 \le \|x\|_1 \le \sqrt{n}\, \|x\|_2$,
2. $\frac{1}{\sqrt{n}} \|x\|_2 \le \|x\|_\infty \le \|x\|_2$,
3. $\|x\|_\infty \le \|x\|_1 \le n \|x\|_\infty$.

In all cases, equality can be attained.

Theorem 1.7 shows that all norms generate the same topology on $\mathbf{C}^n$, in the sense that for any sequence $x_1, x_2, \ldots$ we have $\lim_{k \to \infty} \mu(x - x_k) = 0$ if and only if $\lim_{k \to \infty} \nu(x - x_k) = 0$. Thus we can use any norm to define the notion of a limit of a sequence of vectors.

However, it is possible to define limits without using norms. A very natural definition is the following. Let $x_k = (\xi_1^{(k)}, \ldots, \xi_n^{(k)})^T$ ($k = 1, 2, \ldots$). If

\[ \lim_{k \to \infty} \xi_i^{(k)} = \xi_i, \qquad i = 1, \ldots, n, \]
then we say that the sequence of vectors $\{x_k\}$ has the limit $x = (\xi_1, \ldots, \xi_n)^T$, or that $x_k$ converges to $x$, and we write

\[ \lim_{k \to \infty} x_k = x. \]

The following theorem shows that this component-wise convergence is the same as convergence in any norm.

Theorem 1.9. For any vector norm $\nu$,

\[ \lim_{k \to \infty} x_k = x \iff \lim_{k \to \infty} \nu(x_k - x) = 0. \tag{1.8} \]

Proof. The result is trivial when $\nu(\cdot) = \|\cdot\|_\infty$. Hence by the equivalence of norms, the result holds for any norm. ∎

1.4. Linear Functionals and Dual Norms

A LINEAR FUNCTIONAL on $\mathbf{C}^n$ is a continuous function $\varphi : \mathbf{C}^n \to \mathbf{C}$ that is linear. The matrix representing such a function has dimensions $1 \times n$; i.e., it is a row vector. Thus to each linear functional $\varphi$ on $\mathbf{C}^n$ there corresponds a unique vector $y$ such that

\[ \varphi(x) = y^H x \tag{1.9} \]

for all $x \in \mathbf{C}^n$.

There is a rather natural way in which a linear functional can be given a norm. In ordinary conversation we would say that a linear functional was big if it mapped vectors of ordinary size into large ones; and conversely we would say it was small if it mapped ordinary vectors into small ones. If we make the notions of "big," "small," and "ordinary" precise by choosing a specific vector norm $\nu$, then we can define the "size" of the functional $\varphi$ by

\[ \nu^*(\varphi) = \max_{\nu(x) = 1} |\varphi(x)|. \tag{1.10} \]

Note that $\nu^*$ is well defined, since $|\varphi(x)|$ is continuous, and by the equivalence of norms the $\nu$-sphere $S_\nu = \{x : \nu(x) = 1\}$ is closed and bounded. It is easy to verify that $\nu^*$ is indeed a norm.

According to (1.9) every linear functional $\varphi$ on $\mathbf{C}^n$ can be identified with a vector $y \in \mathbf{C}^n$. Hence (1.10) defines a new norm on $\mathbf{C}^n$. This justifies the following definition.

Definition 1.10. Let $\nu$ be a norm on $\mathbf{C}^n$. Then the function $\nu^*$ defined by

\[ \nu^*(y) = \max_{\nu(x) = 1} |y^H x| \]

is the DUAL NORM of $\nu$ on $\mathbf{C}^n$.

The dual of the 2-norm is easily seen to be itself. The dual of the 1-norm is the $\infty$-norm, and vice versa. More generally if $p, q > 1$ satisfy

\[ \frac{1}{p} + \frac{1}{q} = 1, \]

then the Hölder norms $\|\cdot\|_p$ and $\|\cdot\|_q$ are dual.

These examples suggest that we cannot generate new norms by taking the dual of a dual norm; we simply get back the original norm. We are going to prove that this is indeed true, but to do so we must first establish an important result on the extension of linear functionals.

A linear functional $\varphi : \mathcal{X} \to \mathbf{C}$ defined on a proper subspace $\mathcal{X}$ of $\mathbf{C}^n$ has a representation in the form (1.9). However, the vector $y$ is not unique; for example, it can be replaced by $y + z$, where $z$ is any vector in $\mathcal{X}^\perp$. Since (1.9) defines $\varphi$ on all of $\mathbf{C}^n$, another way of expressing the nonuniqueness of $y$ is to say that there are many ways of extending the functional $\varphi$ from $\mathcal{X}$ to $\mathbf{C}^n$. The following theorem shows that among these extensions is one that does not increase the norm of the functional.

Theorem 1.11 (Hahn, Banach). Let $\nu$ be a norm on $\mathbf{C}^n$. Let $\mathcal{X}$ be a subspace of $\mathbf{C}^n$, and let $\varphi : \mathcal{X} \to \mathbf{C}$ be a linear functional satisfying

\[ \max_{\substack{x \in \mathcal{X} \\ \nu(x) = 1}} |\varphi(x)| = \rho. \]

Then $\varphi$ can be extended to a linear functional on $\mathbf{C}^n$ that satisfies

\[ \max_{\substack{x \in \mathbf{C}^n \\ \nu(x) = 1}} |\varphi(x)| = \rho. \]

Proof. Without loss of generality, we may assume that $\rho = 1$. If $\mathcal{X} = \mathbf{C}^n$, then we are through. Otherwise, there is a vector $u \ne 0$ that
does not lie in $\mathcal{X}$. We shall show how to extend $\varphi$ to the space $\mathcal{X}'$ spanned by $\mathcal{X}$ and $u$ in such a way that

\[ \max_{\substack{x \in \mathcal{X}' \\ \nu(x) = 1}} |\varphi(x)| = 1. \tag{1.11} \]

Let $x_1$ and $x_2$ be two vectors in $\mathcal{X}$. Then

\[ \varphi(x_1) - \varphi(x_2) \le \nu(x_1 - x_2) \le \nu(x_1 + u) + \nu(x_2 + u). \]

Hence

\[ \varphi(x_1) - \nu(x_1 + u) \le \varphi(x_2) + \nu(x_2 + u). \]

This inequality implies that any element in the set $\{\varphi(x) - \nu(x + u) : x \in \mathcal{X}\}$ is less than or equal to any element in the set $\{\varphi(x) + \nu(x + u) : x \in \mathcal{X}\}$. If we set $\varphi(-u)$ equal to any value lying between these two sets (or equal to their one common value if they intersect), then

\[ \varphi(x) - \nu(x + u) \le \varphi(-u) \le \varphi(x) + \nu(x + u). \]

In other words

\[ |\varphi(x + u)| \le \nu(x + u). \]

Now extend $\varphi$ to $\mathcal{X}'$ by linearity; i.e., for $x \in \mathcal{X}$ set $\varphi(x + \alpha u) = \varphi(x) + \alpha\varphi(u)$. If $\alpha \ne 0$, we have

\[ |\varphi(x + \alpha u)| \le |\alpha|\,|\varphi(\alpha^{-1} x + u)| \le |\alpha|\,\nu(\alpha^{-1} x + u) = \nu(x + \alpha u). \]

Hence (1.11) holds.

If $\mathcal{X}' \ne \mathbf{C}^n$, we may select another vector $u'$ not in $\mathcal{X}'$ and extend $\varphi$ to the space spanned by $u'$ and $\mathcal{X}'$ in such a way that its norm is not increased. After a finite number of such extensions, we shall have extended $\varphi$ to all of $\mathbf{C}^n$. ∎

We are now in a position to prove the duality theorem.

Theorem 1.12. Let $\nu$ be a norm on $\mathbf{C}^n$. Then

\[ \nu^{**} = \nu. \]

Proof. From the definition of dual norm we have

\[ |y^H x| \le \nu^*(y)\,\nu(x). \tag{1.12} \]

Consequently

\[ \nu^{**}(x) = \sup_{\nu^*(y) = 1} |y^H x| \le \nu(x). \]

To show equality define a linear functional $\varphi$ on the space spanned by $x$ by $\varphi(\alpha x) = \alpha\nu(x)$. Then (with an abuse of notation) $\nu^*(\varphi) = 1$. By the Hahn-Banach theorem, there is an extension of $\varphi$ to $\mathbf{C}^n$ with $\nu^*(\varphi) = 1$. Let $z$ be the vector representing $\varphi$. Then $\nu^*(z) = 1$ and $|z^H x| = \nu(x)$. Hence

\[ \nu^{**}(x) = \sup_{\nu^*(y) = 1} |y^H x| \ge \nu(x), \]

which establishes the theorem. ∎

We note for later reference that (1.12) is a generalization of the CAUCHY INEQUALITY,

\[ |y^H x| \le \|y\|_2 \|x\|_2. \]

For this reason it is sometimes called the GENERALIZED CAUCHY INEQUALITY.
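The dual-norm definition and the generalized Cauchy inequality (1.12) lend themselves to a small numerical illustration. The sketch below is restricted to real vectors, which is an assumption made only to keep the code short, and its function names are hypothetical. It evaluates the dual of the 1-norm and of the $\infty$-norm by maximizing $|y^T x|$ over the extreme points of the corresponding unit balls (the maximum of a convex function over a polytope is attained at a vertex), and it spot-checks Hölder's inequality for the dual pair $p = 3$, $q = 3/2$ on random vectors.

```python
import itertools
import random

def dot(y, x):
    return sum(a * b for a, b in zip(y, x))

def norm1(x):
    return sum(abs(t) for t in x)

def norm_inf(x):
    return max(abs(t) for t in x)

def norm_p(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def dual_of_1_norm(y):
    """max |y^T x| over ||x||_1 = 1, attained at a signed unit vector."""
    n = len(y)
    best = 0.0
    for i in range(n):
        e = [0.0] * n
        e[i] = 1.0
        best = max(best, abs(dot(y, e)))
    return best

def dual_of_inf_norm(y):
    """max |y^T x| over ||x||_inf = 1, attained at a vertex of the cube."""
    signs = itertools.product((-1.0, 1.0), repeat=len(y))
    return max(abs(dot(y, s)) for s in signs)

# Assumed example vector.
y = [3.0, -1.0, 2.0]
d1 = dual_of_1_norm(y)        # compare with norm_inf(y)
dinf = dual_of_inf_norm(y)    # compare with norm1(y)

# Spot check of |z^T x| <= ||x||_p ||z||_q with 1/p + 1/q = 1.
rng = random.Random(0)
p, q = 3.0, 1.5
holder_ok = True
for _ in range(1000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    z = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    if abs(dot(z, x)) > norm_p(x, p) * norm_p(z, q) + 1e-12:
        holder_ok = False
```

For this `y`, `d1` coincides with `norm_inf(y)` and `dinf` with `norm1(y)`, in line with the statement that the 1-norm and the $\infty$-norm are dual to each other. The random test is only a spot check and not a proof.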
Notes and References

The quantitative notion of distance or size is as old as the measuring stick. Our 2-norm in 3-space is just the Euclidean notion of size, the basis of Greek geometry, and the triangle inequality says that the length of one side of a triangle is less than the sum of the lengths of the other two sides.

There are two ways of generalizing Euclidean length to a vector space. The first, due to Minkowski [157, 1911, v.2, pp.131-229], uses convex bodies to define norms. Specifically if $K$ is a compact convex set containing the origin, we may define a norm $\nu_K(x)$ as the reciprocal of the number $\alpha$ such that $\alpha x$ lies on the boundary of $K$ (in this approach we must add $\alpha \ge 0$ to the homogeneity condition). Minkowski established the equivalence of his norms to the 2-norm, introduced the dual norm (he called it the polar norm), and showed that the dual of the dual was the original. Although Minkowski worked only in 3-space, it is obvious (as it must have been to him) that the approach generalizes. For a modern treatment along these lines see [121, Ch.2].

The second approach is to add the axioms for a norm to those for a vector space, which was done independently by Banach [9, 1922] and Wiener [268, 1922]. For Wiener the matter seems to have been little more than a mathematical exercise. Banach, on the other hand, developed the notion extensively and went on to apply it. In any event, normed linear spaces are known today as BANACH SPACES. It is of interest that both Banach and Wiener use the modern notation $\|\cdot\|$ to denote a norm.

Norms of the form $\sqrt{x^H A x}$ arise frequently in the analysis of iterative methods for linear systems [276]. In spite of the equivalence of norms, a statement made in one elliptic norm may in practice mean something entirely different from the same statement made in the Euclidean norm. We will return to this point in Section III.2, where we discuss the limitations of absolute and relative errors defined in terms of norms.

We have already pointed out that Minkowski had the concept of the dual norm. The approach taken here is due to Hahn [99, 1927] and Banach [10, 1929]. The proof given here is adapted from the one in the elegant book by Riesz and Sz.-Nagy [184, 1955].

The inequality $y^T x \le \|x\|_2 \|y\|_2$ for real vectors is due to Cauchy [41, 1821, Note II, Theorem XVI]. It is also associated with the names Schwarz and Bunyakovski.

Exercises

1. Let $\nu$ be a vector norm. Show that $|\nu(x) - \nu(y)| \le \nu(x - y)$.

2. Verify directly that the functions $\|\cdot\|_p$ ($p = 1, 2, \infty$) are norms.

3. (Cauchy inequality). Show that

\[ |x^H y| \le \|x\|_2 \|y\|_2 \]

with equality if and only if $x$ and $y$ are linearly dependent.

4. Show that up to a constant multiple the 2-norm is the only unitarily invariant norm on $\mathbf{C}^n$.

5. Show that $|y^H x| \le \|x\|_1 \|y\|_\infty$. Conclude that $\|x\|_2^2 \le \|x\|_1 \|x\|_\infty$.

THE FOLLOWING EXERCISES SHOW THAT THE HÖLDER NORMS ARE TRULY NORMS. THEY FOLLOW BECKENBACH AND BELLMAN [20].

6. (Arithmetic-geometric mean inequality). Let $\xi > 0$. Show that

\[ \xi^\alpha - \alpha\xi + \alpha - 1 \;\begin{cases} \ge 0 & \text{if } \alpha > 1 \text{ or } \alpha < 0, \\ \le 0 & \text{if } 0 < \alpha < 1. \end{cases} \]

Conclude that for $0 < \alpha < 1$ and $\xi_1, \xi_2 \ge 0$

\[ \xi_1^\alpha \xi_2^{1-\alpha} \le \alpha\xi_1 + (1 - \alpha)\xi_2. \]

Equivalently if $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, then

\[ \alpha^{1/p} \beta^{1/q} \le \frac{\alpha}{p} + \frac{\beta}{q} \tag{1.13} \]

for all nonnegative $\alpha$ and $\beta$.

7. (Hölder's inequality [118, 1889]). Show that if $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, then

\[ |y^H x| \le \|x\|_p \|y\|_q. \]

[Hint: In (1.13) take $\alpha = |\xi_i|^p / \|x\|_p^p$ and $\beta = |\eta_i|^q / \|y\|_q^q$, and sum the resulting inequalities.]

8. (Minkowski's inequality [156, 1896]). Show that for $p > 1$

\[ \|x + y\|_p \le \|x\|_p + \|y\|_p. \]

[Hint: Assume without loss of generality that $x, y > 0$. Write

\[ \sum_i (\xi_i + \eta_i)^p = \sum_i \xi_i (\xi_i + \eta_i)^{p-1} + \sum_i \eta_i (\xi_i + \eta_i)^{p-1}, \]

and apply Hölder's inequality twice.]

9. Show that if $\frac{1}{p} + \frac{1}{q} = 1$, then the Hölder norms $\|\cdot\|_p$ and $\|\cdot\|_q$ are dual.

10. Let $\mu$ be a norm on $\mathbf{C}^m$ and $\nu$ be a norm on $\mathbf{C}^n$. Partition any vector $x \in \mathbf{C}^{m+n}$ in the form $x = (x_1^H \; x_2^H)^H$, where $x_1 \in \mathbf{C}^m$. Show that the following functions are norms on $\mathbf{C}^{m+n}$:

1. $\mu(x_1) + \nu(x_2)$,
2. $\sqrt{\mu(x_1)^2 + \nu(x_2)^2}$,
3. $\max\{\mu(x_1), \nu(x_2)\}$.
11. Let $\nu$ be a norm on $\mathbf{C}^n$. Show that the function $\rho_\nu(x, y) = \nu(y - x)$ is a metric; i.e.,
   1. $\rho_\nu(x, y) \ge 0$,
   2. $\rho_\nu(x, y) = 0 \iff x = y$,
   3. $\rho_\nu(x, y) = \rho_\nu(y, x)$,
   4. $\rho_\nu(x, z) \le \rho_\nu(x, y) + \rho_\nu(y, z)$.

12. A real valued function $\rho$ on $\mathbf{C}^n \times \mathbf{C}^n$ is a PSEUDO-METRIC if it satisfies all the defining conditions for a metric except
$$\rho(x, y) = 0 \iff x = y.$$
Show that the relation $\rho(x, y) = 0$ is an equivalence relation on $\mathbf{C}^n$. Denote the corresponding equivalence classes by $\langle x \rangle$. Show that the function $\rho(\langle x \rangle, \langle y \rangle) = \rho(x, y)$ is well defined and is a metric over the space of equivalence classes $\langle x \rangle$.

13. Verify the inequalities in Example 1.8.

14. Prove Theorem 1.4.

THE FOLLOWING EXERCISES CONCERN NORMS GENERATED BY POSITIVE DEFINITE MATRICES.

15. An inner product on $\mathbf{C}^n$ is a continuous mapping $(\cdot, \cdot) : \mathbf{C}^n \oplus \mathbf{C}^n \to \mathbf{C}$ that satisfies
   1. $x \ne 0 \iff (x, x) > 0$,
   2. $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z)$,
   3. $(y, x) = \overline{(x, y)}$.
Show that any inner product has a unique representation of the form $(x, y) = y^H A x$, where $A$ is positive definite. Conclude that the function $\nu$ defined by $\nu^2(x) = (x, x)$ is a norm. It is the NORM GENERATED BY THE INNER PRODUCT $(\cdot, \cdot)$.

16. Let $\nu$ be a norm generated by an inner product. Show that the unit ball $\{x : \nu(x) \le 1\}$ is an ellipsoid (hence the alternative name ELLIPTIC NORM for these norms). Describe the lengths and situations of the axes.

17. (Jordan and von Neumann [127, 1935]). Let $\nu$ be a norm on $\mathbf{C}^n$. Show that a necessary and sufficient condition for $\nu$ to be generated by an inner product is that it satisfy the RHOMBUS IDENTITY
$$\nu^2(x + y) + \nu^2(x - y) = 2[\nu^2(x) + \nu^2(y)].$$

-<>-

18. Let $\nu$ be a norm on $\mathbf{C}^n$. The sequence $x_1, x_2, \ldots$ is a CAUCHY SEQUENCE if for every $\epsilon > 0$ there is an integer $N$ such that $\nu(x_i - x_j) \le \epsilon$ whenever $i, j \ge N$.
   1. Show that this definition is independent of the choice of norm.
   2. Show that $\mathbf{C}^n$ is COMPLETE; that is, every Cauchy sequence in $\mathbf{C}^n$ converges in $\mathbf{C}^n$. [Hint: use the corresponding fact about complex numbers.]

THE FOLLOWING EXERCISES EXPLORE SOME OF THE RELATIONS OF NORMS AND CONVEXITY.

19. Let $\nu$ be a vector norm and let
$$B_\nu = \{x : \nu(x) \le 1\}$$
be the unit $\nu$-ball. Show that $B_\nu$ is closed, bounded, convex, and equilibrated ($x \in B_\nu$ and $|\alpha| \le 1 \implies \alpha x \in B_\nu$). Show further that $B_\nu$ contains the origin in its interior. Conversely if $B$ is a closed, bounded, convex, equilibrated set containing the origin in its interior, then the function $\nu_B$ defined by
$$\nu_B(x) = \inf\{\alpha^{-1} > 0 : \alpha x \in B\} \tag{1.14}$$
is a vector norm.

20. Let $B \subset \mathbf{R}^n$ be a closed, bounded, convex set containing the origin. Show that the function $\nu_B$ defined by (1.14) is a norm in which the homogeneity condition is replaced by
$$\alpha \ge 0 \implies \nu_B(\alpha x) = \alpha \nu_B(x).$$

21. Show that the Hahn-Banach theorem is equivalent to the following statement. If $x$ is a point outside a closed, bounded, convex set containing the origin in its interior, then there is a hyperplane that separates $x$ from the set.

-<>-
22. Verify that the function defined by (1.10) is a norm.

THE PURPOSE OF THE FOLLOWING EXERCISES IS TO SHOW THAT THE EQUIVALENCE OF NORMS FAILS IN INFINITE DIMENSIONAL SPACES.

23. For $1 \le p \le \infty$ let $\ell_p$ be the set of all infinite sequences $x = (\xi_1, \xi_2, \ldots)$ such that $\sum_{i=1}^\infty |\xi_i|^p < \infty$ (for $p = \infty$, take the limit). Show that if $p_1 < p_2$ then $\ell_{p_1}$ is properly contained in $\ell_{p_2}$.

24. Show that the function $\|\cdot\|_p$ defined by
$$\|x\|_p = \Big(\sum_{i=1}^\infty |\xi_i|^p\Big)^{1/p}$$
is a norm on $\ell_p$. Prove that $\ell_p$ is complete; that is, Cauchy sequences in the $p$-norm converge in the $p$-norm. (Note: $\ell_2$ is called a HILBERT SPACE.)

25. Show that although all the Hölder norms are defined on $\ell_1$, they are not equivalent. In particular if $p_1 < p_2$ then there is a sequence $x_k$ in $\ell_1$ such that $\|x_k\|_{p_1} \to \infty$ while $\|x_k\|_{p_2} \to 0$.

-<>-

2. Matrix Norms

2.1. Basic Concepts

Since the space of $m \times n$ matrices is a vector space of dimension $mn$, it is natural to define a matrix norm in the same way as a vector norm. In the following definition, we will do so; however, as we shall see later, a matrix norm needs an additional property to be really useful.

Definition 2.1. A function $\nu : \mathbf{C}^{m \times n} \to \mathbf{R}$ is a NORM on $\mathbf{C}^{m \times n}$ (or a MATRIX NORM) if it satisfies the following conditions.
   1. $A \ne 0 \implies \nu(A) > 0$,
   2. $\nu(\alpha A) = |\alpha| \nu(A)$,
   3. $\nu(A + B) \le \nu(A) + \nu(B)$.

This definition has the consequence that all the properties of vector norms developed in the previous section remain true of matrix norms. For example, all matrix norms are equivalent and generate the same topology, in which they are all continuous functions. It makes no theoretical difference whether we define convergence of matrices elementwise or as convergence in any matrix norm.

A natural generalization of the Euclidean norm is given in the following definition.

Definition 2.2. Let $A \in \mathbf{C}^{m \times n}$. The FROBENIUS NORM of $A$ is the number
$$\|A\|_F \stackrel{\rm def}{=} \sqrt{\sum_{i=1}^m \sum_{j=1}^n |\alpha_{ij}|^2} = \sqrt{\mathrm{trace}(A^H A)}. \tag{2.1}$$

Note that when $A \in \mathbf{C}^{n \times 1}$, i.e., when $A$ is a vector, the Frobenius norm reduces to the 2-norm.

Our Definition 2.1 of matrix norm has one important defect: it makes no concession to the fact that matrices can be multiplied. What we would like is an analogue of the triangle inequality for matrix multiplication. In fact the Frobenius norm satisfies such an inequality: namely,
$$\|AB\|_F \le \|A\|_F \|B\|_F$$
whenever the product $AB$ is defined (Exercise 2.1). However, not every matrix norm satisfies this kind of inequality, as the following example shows.

Example 2.3. Let us attempt to generalize the $\infty$-norm as we did the Euclidean norm by defining
$$\nu_\infty(A) = \max_{i,j} |\alpha_{ij}|.$$
Clearly this is a matrix norm. However, if
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},$$
then $\nu_\infty(A \cdot A) = 2 > 1 = \nu_\infty(A)\nu_\infty(A)$.
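The contrast drawn in Example 2.3 can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matmul`, `frobenius`, and `max_abs` are illustrative choices and do not come from the text.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def frobenius(A):
    """Frobenius norm (2.1): square root of the sum of squared moduli."""
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

def max_abs(A):
    """The function nu_inf of Example 2.3: largest modulus of an element."""
    return max(abs(a) for row in A for a in row)

A = [[1, 1], [1, 1]]
AA = matmul(A, A)

# nu_inf is a norm, but it fails the product inequality on this A.
print(max_abs(AA), max_abs(A) * max_abs(A))        # 2 1
# The Frobenius norm satisfies ||AA||_F <= ||A||_F ||A||_F (with equality here).
print(frobenius(AA), frobenius(A) * frobenius(A))  # 4.0 4.0
```

A single matrix is enough to show that $\nu_\infty$ violates the product inequality, whereas the Frobenius inequality still has to be proved in general (Exercise 2.1); the code only illustrates it for this one case.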
The submultiplicative inequality satisfied by the Frobenius norm allows us to obtain bounds on the products of matrices in terms of the individual matrices. So important is this for matrix analysis that norms with this property are given a special name.

Definition 2.4. Let $\mu$, $\nu$, and $\rho$ be norms on $\mathbf{C}^{m \times n}$, $\mathbf{C}^{n \times k}$, and $\mathbf{C}^{m \times k}$. Then $\mu$, $\nu$, and $\rho$ are CONSISTENT if
$$\rho(AB) \le \mu(A)\nu(B)$$
whenever $A \in \mathbf{C}^{m \times n}$ and $B \in \mathbf{C}^{n \times k}$. In particular, a matrix norm $\nu$ on $\mathbf{C}^{n \times n}$ is consistent if
$$\nu(AB) \le \nu(A)\nu(B)$$
for all $A, B \in \mathbf{C}^{n \times n}$.

Since vector norms can be identified with matrix norms, Definition 2.4 includes the notion of consistency of a vector norm and a matrix norm. For example, the Frobenius norm and the vector 2-norm are consistent (that is, $\|Ax\|_2 \le \|A\|_F \|x\|_2$) because $\|x\|_2 = \|x\|_F$. The following theorem shows that for any consistent matrix norm there is a consistent vector norm.

Theorem 2.5. Let $\|\cdot\|$ be a consistent matrix norm on $\mathbf{C}^{n \times n}$. Then there is a norm $\nu$ on $\mathbf{C}^n$ that is consistent with $\|\cdot\|$.

Proof. Choose a nonzero vector $a \in \mathbf{C}^n$ and define
$$\nu(x) = \|x a^T\|.$$
It is easy to verify that $\nu$ is a vector norm. Moreover, since
$$\nu(Ax) = \|A x a^T\| \le \|A\| \|x a^T\| = \|A\| \nu(x),$$
$\nu$ is consistent with $\|\cdot\|$. ∎

Consistent matrix norms have an important relation to the eigenvalues of a matrix. Let us define the SPECTRAL RADIUS of a matrix $A$ to be the number
$$\rho(A) \stackrel{\rm def}{=} \max\{|\lambda| : \lambda \in \mathcal{L}(A)\}.$$
Then we have the following theorem.

Theorem 2.6. Let $\|\cdot\|$ be a consistent matrix norm. Then for any matrix $A$
$$\rho(A) \le \|A\|. \tag{2.2}$$

Proof. By Theorem 2.5 there is a vector norm $\nu$ that is consistent with $\|\cdot\|$. Let $x$ be an eigenvector corresponding to an eigenvalue $\lambda$ of $A$; i.e.,
$$Ax = \lambda x.$$
Taking norms we get
$$|\lambda| \nu(x) = \nu(\lambda x) = \nu(Ax) \le \|A\| \nu(x).$$
Since $\nu(x) > 0$ we may divide by $\nu(x)$ to get $|\lambda| \le \|A\|$. The result (2.2) follows from the fact that $\lambda \in \mathcal{L}(A)$ is arbitrary. ∎

2.2. Operator Norms

Recall that in Section 1.4 we defined the dual to the norm $\nu$ by
$$\nu^*(y) = \max_{\nu(x) = 1} |y^H x|. \tag{2.3}$$
An immediate consequence of this was the generalized Cauchy inequality
$$|y^H x| \le \nu^*(y)\nu(x),$$
which may be interpreted as saying that the norms $|\cdot|$, $\nu^*$, and $\nu$ on $\mathbf{C}$, $\mathbf{C}^{1 \times n}$, and $\mathbf{C}^{n \times 1}$ are consistent. It turns out that (2.3) represents a general technique for generating a class of consistent norms, which we will call OPERATOR NORMS.

Let $\mu$ be a norm on $\mathbf{C}^m$ and $\nu$ be a norm on $\mathbf{C}^n$. By the equivalence of norms, the function $\mu$ is continuous and the $\nu$-sphere $S_\nu = \{x : \nu(x) = 1\}$ is closed and bounded. Hence for any $m \times n$ matrix $A$, we may define the number $\|A\|_{\mu,\nu}$ by
$$\|A\|_{\mu,\nu} = \max_{\nu(x) = 1} \mu(Ax). \tag{2.4}$$
As the notation $\|A\|_{\mu,\nu}$ suggests, the function $\|\cdot\|_{\mu,\nu}$ is actually a norm.

Theorem 2.7. Let $\mu$ and $\nu$ be as above and let $\|\cdot\|_{\mu,\nu}$ be defined by (2.4). Then $\|\cdot\|_{\mu,\nu}$ is a norm on $\mathbf{C}^{m \times n}$, which is consistent with $\mu$ and $\nu$.
Proof. We first prove consistency. From (2.4),
$$\|A\|_{\mu,\nu} = \max_{x \ne 0} \frac{\mu(Ax)}{\nu(x)}.$$
Therefore,
$$\mu(Ax) \le \|A\|_{\mu,\nu}\, \nu(x). \tag{2.5}$$
Now we prove that $\|\cdot\|_{\mu,\nu}$ is a matrix norm by showing that it satisfies the conditions of Definition 2.1.

1. Positive definiteness. If $A \ne 0$, there is an index $i$ such that $A 1_i \ne 0$. Then from (2.5), $0 < \mu(A 1_i) \le \|A\|_{\mu,\nu}\, \nu(1_i)$, which implies that $\|A\|_{\mu,\nu} > 0$.

2. Homogeneity. For any $\alpha$ we have
$$\|\alpha A\|_{\mu,\nu} = \max_{\nu(x) = 1} \mu(\alpha A x) = \max_{\nu(x) = 1} |\alpha| \mu(Ax) = |\alpha| \|A\|_{\mu,\nu}.$$

3. Triangle inequality. Let $A, B \in \mathbf{C}^{m \times n}$. Suppose that $x$ satisfies $\nu(x) = 1$ and $\mu[(A + B)x] = \|A + B\|_{\mu,\nu}$. Then from (2.5),
$$\|A + B\|_{\mu,\nu} = \mu[(A + B)x] \le \mu(Ax) + \mu(Bx) \le \|A\|_{\mu,\nu}\, \nu(x) + \|B\|_{\mu,\nu}\, \nu(x) = \|A\|_{\mu,\nu} + \|B\|_{\mu,\nu}. \;∎$$

Theorem 2.7 justifies the following definition.

Definition 2.8. Let $\mu$ and $\nu$ be norms on $\mathbf{C}^m$ and $\mathbf{C}^n$. The norm $\|\cdot\|_{\mu,\nu}$ defined by (2.4) is called an OPERATOR NORM on $\mathbf{C}^{m \times n}$. It is also said to be the norm SUBORDINATE to the vector norms $\mu$ and $\nu$.

Equation (2.5) shows that an operator norm subordinate to two vector norms is consistent with them. The following theorem shows that under appropriate conditions, operator norms are consistent with themselves. Its proof is left as an exercise.

Theorem 2.9. Let $\mu$, $\nu$, and $\rho$ be norms on $\mathbf{C}^m$, $\mathbf{C}^n$, and $\mathbf{C}^k$, and let $\|\cdot\|_{\mu,\nu}$, $\|\cdot\|_{\nu,\rho}$, and $\|\cdot\|_{\mu,\rho}$ be the subordinate operator norms. Then
$$\|AB\|_{\mu,\rho} \le \|A\|_{\mu,\nu} \|B\|_{\nu,\rho}.$$

We saw in Example 2.3 that generalizing the vector $\infty$-norm by extending its algebraic definition to matrices failed to produce a consistent matrix norm. The notion of an operator norm provides a means of extending the definition of the Hölder norms to all matrices. Specifically, if we define
$$\|A\|_p = \max_{\|x\|_p = 1} \|Ax\|_p, \tag{2.6}$$
then by Theorem 2.9
$$\|AB\|_p \le \|A\|_p \|B\|_p \tag{2.7}$$
whenever the product $AB$ is defined. We say that these Hölder matrix norms form a CONSISTENT FAMILY OF MATRIX NORMS. Another consistent family is the Frobenius norm defined by (2.1).

For $p = 1, 2, \infty$, the Hölder matrix norms have explicit characterizations.

Theorem 2.10. Let $A = (\alpha_{ij}) \in \mathbf{C}^{m \times n}$. Then
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |\alpha_{ij}|, \tag{2.8}$$
$$\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |\alpha_{ij}|, \tag{2.9}$$
and
$$\|A\|_2 = \sqrt{\lambda_{\max}(A^H A)} = \sigma_{\max}(A), \tag{2.10}$$
where $\lambda_{\max}(A^H A)$ is the largest eigenvalue of $A^H A$ and $\sigma_{\max}(A)$ is the largest singular value of $A$.

Proof. To prove (2.8) let $A = (a_1, \ldots, a_n)$ be partitioned by columns. For any $x \ne 0$ we have
$$\|Ax\|_1 = \Big\| \sum_{j=1}^n \xi_j a_j \Big\|_1 \le \sum_{j=1}^n |\xi_j| \|a_j\|_1 \le \max_{1 \le j \le n} \|a_j\|_1 \|x\|_1.$$
Hence
$$\|A\|_1 \le \max_{1 \le j \le n} \|a_j\|_1. \tag{2.11}$$
On the other hand, if $\max_{1 \le j \le n} \|a_j\|_1 = \|a_k\|_1$, then
$$\frac{\|A 1_k\|_1}{\|1_k\|_1} = \|a_k\|_1 = \max_{1 \le j \le n} \|a_j\|_1.$$
Hence
$$\|A\|_1 \ge \max_{1 \le j \le n} \|a_j\|_1. \tag{2.12}$$
The two inequalities (2.11) and (2.12) together imply (2.8).

The characterization (2.9) is proved similarly. Equation (2.10) follows from Theorem I.4.3. ∎

In view of (2.8)-(2.10), the norm $\|\cdot\|_1$ is sometimes called the COLUMN SUM NORM; the norm $\|\cdot\|_\infty$, the ROW SUM NORM; and the norm $\|\cdot\|_2$, the SPECTRAL NORM.

There is no computationally convenient characterization of the spectral norm, but it does have a number of nice properties that make it useful for theoretical purposes. Some of these properties are contained in the following theorem.

Theorem 2.11. For any matrix $A$
   1. $\|A\|_2 = \max_{\|x\|_2 = 1,\ \|y\|_2 = 1} |y^H A x|$,
   2. $\|A^H\|_2 = \|A^T\|_2 = \|A\|_2$,
   3. $\|A^H A\|_2 = \|A\|_2^2$,
   4. If $U$ and $V$ are unitary, $\|U^H A V\|_2 = \|A\|_2$.
   5. $\|A\|_2^2 \le \|A\|_1 \|A\|_\infty$.

Proof. The first four items follow directly from the singular value decomposition, and their proofs are left as an exercise. For the last item, note that from (2.7) and Theorem 2.6
$$\|A\|_2^2 = \lambda_{\max}(A^H A) \le \|A^H A\|_1 \le \|A^H\|_1 \|A\|_1 = \|A\|_\infty \|A\|_1. \;∎$$

Notes and References

The spectral norm was introduced by Peano [177, 1888], who established its basic properties. The Frobenius norm appears briefly in the paper of Peano just cited, but only as a bound for the spectral norm. Schur [193, 1909] uses it in an important bound on the eigenvalues of a matrix. Frobenius [76, 77, 1911] appears to be the first to regard the function $\|\cdot\|_F$ as what we would call a matrix norm. Actually he worked with the quantity $\|\cdot\|_F^2$, which he called the Spannung of a matrix, and established the equivalent of the triangle inequality, consistency, and invariance under orthogonal transformations.
Although norms of operators in infinite dimensional spaces were a staple of functional analysis almost from the beginning, they seem to have percolated more slowly into matrix theory. Applications in numerical analysis played a large role in the process, thanks in large part to a series of conferences in Gatlinburg, Tennessee, hosted by A. S. Householder, which combined both the theoretical and the numerical aspects of matrix theory. Any list of the more influential works would include von Neumann [254, 1937], [256, 1947], Faddeeva [68, 1959], Mirsky [158, 1960], Householder [121, 1964], and Wilkinson [269, 1965].

Exercises

1. Show that $\|AB\|_F \le \|A\|_F \|B\|_F$ whenever the product $AB$ is defined.

2. Prove Theorem 2.9.

3. Establish the first four items in Theorem 2.11.

4. Show that $\|A\|_2^2 \le \|A\|_1 \|A\|_\infty$.

5. Let $A = (a_1 \cdots a_n)$, where $\|a_i\|_2 = 1$ ($i = 1, \ldots, n$). Show that $\|A\|_2 \le \sqrt{n}$.

6. Let $\nu$ be defined on $\mathbf{C}^{n \times n}$ by $\nu(A) = n \max_{i,j} |\alpha_{ij}|$. Show that $\nu$ is a consistent matrix norm.

7. (Gastinel [121, p.61]). Show that if $\nu$ is a matrix norm on $\mathbf{C}^{n \times n}$, there is a constant $\tau$ such that the function $A \mapsto \tau\nu(A)$ is a consistent matrix norm.

8. Let $\nu$ be a consistent matrix norm on $\mathbf{C}^{n \times n}$ and let $X$ be nonsingular. Show that the function $\nu_X$ defined by
$$\nu_X(A) = \nu(X^{-1} A X)$$
is a consistent matrix norm.

9. Let $\nu$ be a norm on $\mathbf{C}^n$ and let $B$ be nonsingular. Let the norm $\mu$ be defined by $\mu(x) = \nu(Bx)$. Show that the operator norms $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$ subordinate to $\mu$ and $\nu$ are related by the equation $\|A\|_\mu = \|BAB^{-1}\|_\nu$.

10. Let $\|\cdot\|$ be the operator norm subordinate to an absolute vector norm. Show that
$$\|\mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_n)\| = \max_i |\delta_i|. \tag{2.13}$$
Conversely, if (2.13) is satisfied for all diagonal matrices by an operator norm, then it is generated by an absolute norm.

11. (Mirsky [121, p.61]). Show that
$$\inf_{X \text{ nonsingular}} \|X^{-1} A X\|_F^2 = \sum_{\lambda \in \mathcal{L}(A)} |\lambda|^2,$$
with equality for some particular $X$ if and only if $A$ is diagonalizable.

12. Show that the numbers $\rho_{pq}$ in the following table satisfy
$$\|A\|_p \le \rho_{pq} \|A\|_q,$$
where $A \in \mathbf{C}^{n \times n}$. Show that equality can be attained.

                 q
     p      1      2      ∞      F
     1      1      √n     n      √n
     2      √n     1      √n     1
     ∞      n      √n     1      √n
     F      √n     √n     √n     1

13. Build the table in Exercise 2.12 for the same norms over $\mathbf{C}^{m \times n}$.

14. Let $A \in \mathbf{C}^{n \times n}$ and let $\epsilon > 0$. Show that there is a consistent matrix norm $\|\cdot\|_{A,\epsilon}$ such that
$$\|A\|_{A,\epsilon} \le \rho(A) + \epsilon.$$
[Hint: Reduce $A$ to Schur form. Use a diagonal similarity transformation to reduce the off-diagonal elements so that the infinity norm is less than $\rho(A) + \epsilon$. Finally, use Exercise 2.8 to undo the transformations.]

15. Show that the spectral norm and the Frobenius norm are UNITARILY INVARIANT; that is,
$$U, V \text{ unitary} \implies \|U^H A V\|_p = \|A\|_p, \qquad p = 2, F.$$

16. Let $A$ be square. Show that there is a consistent norm $\|\cdot\|$ such that $\|A\| = \rho(A)$ if and only if every eigenvalue $\lambda \in \mathcal{L}(A)$ with $|\lambda| = \rho(A)$ is nondefective.

THE FOLLOWING EXERCISES INVESTIGATE THE PROPERTIES OF THE POWERS OF A MATRIX $A \in \mathbf{C}^{n \times n}$.

17. Show that $\lim_{k \to \infty} A^k = 0$ if and only if $\rho(A) < 1$.

18. (Neumann series). Let $\rho(A) < 1$. Show that $I - A$ is nonsingular and
$$\sum_{k=0}^\infty A^k = (I - A)^{-1}.$$

19. Show that if $\nu$ is a consistent matrix norm then
$$\lim_{k \to \infty} \nu(A^k)^{1/k} = \rho(A).$$

20. Let the infinite series $\phi(\zeta) = \sum_{k=0}^\infty \gamma_k \zeta^k$ have radius of convergence $\sigma$. Show that if $\rho(A) < \sigma$ then $\sum_{k=0}^\infty \gamma_k A^k$ converges. We write $\phi(A)$ for its limit.

21. In the last exercise let $|\lambda| < \sigma$. Show that
$$\phi[J_k(\lambda)] = \begin{pmatrix} \phi(\lambda) & \phi'(\lambda) & \phi''(\lambda)/2 & \phi^{(3)}(\lambda)/3! & \cdots \\ 0 & \phi(\lambda) & \phi'(\lambda) & \phi''(\lambda)/2 & \cdots \\ 0 & 0 & \phi(\lambda) & \phi'(\lambda) & \cdots \\ \vdots & & & \ddots & \vdots \end{pmatrix}.$$

22. For any $A \in \mathbf{C}^{n \times n}$ define
$$e^A = \sum_{k=0}^\infty \frac{A^k}{k!}.$$
Show the following.
   1. If $AB = BA$ then $e^{A+B} = e^A e^B$.
   2. $\det(e^A) = e^{\mathrm{trace}(A)}$.
   3. $d e^{\tau A} / d\tau = A e^{\tau A}$.

23. Show that $\rho(A) < 1$ if and only if there is a positive definite matrix $Q$ such that $Q - A Q A^H$ is positive definite.
24. A matrix is STABLE if all of its eigenvalues have negative real parts. Show that $A$ is stable if and only if $\lim_{t \to +\infty} e^{tA} = 0$.

25. Show that $-A$ is stable if and only if there is a positive definite matrix $M$ such that $AM + MA^H$ is positive definite. [Hint: Use Exercise 2.23.]

3. Unitarily Invariant Norms

An important property of Euclidean space is that shapes and distances do not change under rotations. In particular for any vector $x$ and for any unitary matrix $U$ we have
$$\|Ux\|_2 = \|x\|_2.$$
An analogous property is shared by the spectral and Frobenius norms: namely, for any unitary matrices $U$ and $V$ for which the product $U^H A V$ is defined,
$$\|U^H A V\|_p = \|A\|_p, \qquad p = 2, F.$$
These examples suggest the following definition.

Definition 3.1. A norm $\|\cdot\|$ on $\mathbf{C}^{m \times n}$ is UNITARILY INVARIANT if it satisfies
$$\|U^H A V\| = \|A\|$$
for all unitary $U$ and $V$. It is NORMALIZED if
$$\|A\| = \|A\|_2 \tag{3.1}$$
whenever $A$ is of rank one.

We have observed that the spectral and Frobenius norms are unitarily invariant. However, not all norms are unitarily invariant, as the following example shows.

Example 3.2. Let
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},$$
so that $\|A\|_\infty = 2$. Let
$$U = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\ -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{pmatrix}.$$
Then
$$\|UA\|_\infty = \left\| \begin{pmatrix} \frac{2}{\sqrt2} & \frac{2}{\sqrt2} \\ 0 & 0 \end{pmatrix} \right\|_\infty = \frac{4}{\sqrt2}.$$

3.1. Von Neumann's Theory

Since not all matrix norms are unitarily invariant, we may ask which ones are. One purpose of this section is to establish a characterization, due to von Neumann, in terms of certain vector norms, which, in this connection, are called SYMMETRIC GAUGE FUNCTIONS.

In one direction the connection is easy to establish. Let $A$ be of order $n$, and let $U^H A V = \Sigma$ be the singular value decomposition of $A$. Let $\|\cdot\|$ be a unitarily invariant norm. Since $U$ and $V$ are unitary,
$$\|A\| = \|\Sigma\|.$$
Thus $\|A\|$ is a function $\Phi$ of the singular values of $A$. Since $\|\cdot\|$ is a norm, the function $\Phi$, regarded as a mapping from $\mathbf{R}^n$ to $\mathbf{R}$, is also a norm. Since by interchanging columns of $U$ and $V$ we can make the singular values of $A$ appear in any order, $\Phi$ must be symmetric in its arguments. Since by multiplying a column of $U$ by $-1$ we can change the sign of the corresponding singular value, $\Phi$ must depend only on the absolute values of its argument. Moreover, if $\|\cdot\|$ is normalized and $A$ has rank one, then $\|A\| = \Phi(\sigma_1 1_1) = \sigma_1$. All this suggests the following definition.

Definition 3.3. A function $\Phi : \mathbf{R}^n \to \mathbf{R}$ is a SYMMETRIC GAUGE FUNCTION if it satisfies the following conditions.
   1. $x \ne 0 \implies \Phi(x) > 0$.
   2. $\Phi(\rho x) = |\rho| \Phi(x)$.
   3. $\Phi(x + y) \le \Phi(x) + \Phi(y)$.
   4. For any permutation matrix $P$ we have $\Phi(Px) = \Phi(x)$.
   5. $\Phi(|x|) = \Phi(x)$.

The function $\Phi$ is NORMALIZED if
$$\Phi(1_1) = 1.$$

In the language of norms, a symmetric gauge function is an absolute norm that is invariant under permutation transformations. If $x = (\xi_1, \ldots, \xi_n)^T$, we will often write $\Phi(\xi_1, \ldots, \xi_n)$ for $\Phi(x)$.

We have just sketched a proof that every unitarily invariant norm is a symmetric gauge function of the singular values of its argument. The converse is also true: if $\Phi$ is a symmetric gauge function and if $\|\cdot\|_\Phi$ is defined by
$$\|A\|_\Phi = \Phi(\sigma_1, \ldots, \sigma_n), \tag{3.2}$$
where $\sigma_1, \ldots, \sigma_n$ are the singular values of $A$, then $\|\cdot\|_\Phi$ is a unitarily invariant norm. It is easy to show that $\|A\|_\Phi$ is a definite, homogeneous function. However, to show that it satisfies the triangle inequality requires more work. We begin with an inequality, due to von Neumann, relating the singular values of two matrices with the trace of their product.

Lemma 3.4. Let $A$ and $B$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and $\tau_1 \ge \tau_2 \ge \cdots \ge \tau_n$. Then
$$\max_{U, V \text{ unitary}} \Re\, \mathrm{trace}(A U B V^H) = \sum_{i=1}^n \sigma_i \tau_i. \tag{3.3}$$

Proof. It is sufficient to prove the theorem for the case where the $\sigma_i$ are positive and distinct (otherwise perturb the $\sigma_i$ so they are positive and distinct and take the limit in (3.3) as the perturbation approaches zero).

Let $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ and $T = \mathrm{diag}(\tau_1, \ldots, \tau_n)$. By passing to the singular value decompositions of $A$ and $B$ we see that (3.3) is equivalent to
$$\sup_{U, V \text{ unitary}} \Re\, \mathrm{trace}(\Sigma U T V^H) \le \sum_{i=1}^n \sigma_i \tau_i. \tag{3.4}$$

Since the set of unitary matrices is closed and bounded, there are unitary matrices $U_0$ and $V_0$ for which the supremum in (3.4) is attained. Let
$$C = U_0 T V_0^H.$$
We claim that $C$ is a diagonal matrix with nonnegative diagonal elements.

To see that the diagonals of $C$ are nonnegative, let us suppose, say, that $\gamma_{11} \ne 0$ is not positive. Then by multiplying the first row of $C$ by $\bar\gamma_{11} / |\gamma_{11}|$ (n.b., this is a unitary transformation) we increase $\Re\, \mathrm{trace}(\Sigma C)$, contrary to the optimality of $C$.

To show that the off-diagonal elements of $C$ are zero, let us suppose, say, that $\gamma_{12} \ne 0$. By multiplying the first row of $C$ by $\bar\gamma_{12} / |\gamma_{12}|$ and dividing the first column by the same number (a unitary transformation that does not change the trace) we may take $\gamma_{12} > 0$. Let
$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
and let $C_\theta = C\, \mathrm{diag}(R_\theta, I_{n-2})$. Then
$$\Re\, \mathrm{trace}(\Sigma C_\theta) = \sigma_1(\gamma_{11} \cos\theta + \gamma_{12} \sin\theta) + \sigma_2(\gamma_{22} \cos\theta - \Re\gamma_{21} \sin\theta) + \sum_{i=3}^n \sigma_i \gamma_{ii},$$
and
$$\left. \frac{d\, \Re\, \mathrm{trace}(\Sigma C_\theta)}{d\theta} \right|_{\theta = 0} = \sigma_1 \gamma_{12} - \sigma_2 \Re\gamma_{21}. \tag{3.5}$$
If this derivative is nonzero, then $\Re\, \mathrm{trace}(\Sigma C_\theta) > \Re\, \mathrm{trace}(\Sigma C)$ for sufficiently small $\theta$ (positive or negative depending on the sign of the derivative). Otherwise, let $C_\theta = \mathrm{diag}(R_\theta, I_{n-2}) C$. Then
$$\left. \frac{d\, \Re\, \mathrm{trace}(\Sigma C_\theta)}{d\theta} \right|_{\theta = 0} = \sigma_2 \gamma_{12} - \sigma_1 \Re\gamma_{21}.$$
Since $\sigma_1 > \sigma_2$, $\gamma_{12} > 0$, and the derivative (3.5) is zero, this latter derivative cannot be zero, and a small change in $\theta$ will increase $\Re\, \mathrm{trace}(\Sigma C_\theta)$. Either case is a contradiction of the optimality of $C$.

Since $C$ is nonnegative and diagonal, its diagonal elements must be the singular values of $B$; i.e., $C = \mathrm{diag}(\tau_{\pi(1)}, \ldots, \tau_{\pi(n)})$ for some permutation $\pi$ of $\{1, 2, \ldots, n\}$. Hence
$$\sup_{U, V \text{ unitary}} \Re\, \mathrm{trace}(\Sigma U T V^H) = \sum_{i=1}^n \sigma_i \tau_{\pi(i)} \le \sum_{i=1}^n \sigma_i \tau_i,$$
the last inequality following from the fact that the $\sigma_i$ and the $\tau_i$ are nonincreasing. ∎

Since $\Phi$ is itself a norm, it has a dual norm $\Phi^*$ whose dual is $\Phi$ (Theorem 1.12). It turns out that we can characterize $\|A\|_\Phi$ in terms of the dual norm, and this characterization is just what we need to establish the triangle inequality for $\|\cdot\|_\Phi$.

Lemma 3.5. Let $\Phi$ be a symmetric gauge function and let $\Phi^*$ be its dual. Let the function $\|\cdot\|_\Phi$ be defined by (3.2). Then
$$\|A\|_\Phi = \max_{\|X\|_{\Phi^*} = 1} \Re\, \mathrm{trace}(X^H A).$$

Proof. Let the singular values of $A$ be $\sigma_1, \ldots, \sigma_n$, and for any $X$ let its singular values be $\xi_1, \ldots, \xi_n$. Then
$$\begin{aligned} \max_{\|X\|_{\Phi^*} = 1} \Re\, \mathrm{trace}(X^H A) &= \max_{\substack{\|X\|_{\Phi^*} = 1 \\ U, V \text{ unitary}}} \Re\, \mathrm{trace}(U X^H V A) \\ &= \max_{\Phi^*(\xi_1, \ldots, \xi_n) = 1} \sum_{i=1}^n \xi_i \sigma_i && \text{(Lemma 3.4)} \\ &= \Phi(\sigma_1, \ldots, \sigma_n) && \text{(by duality)} \\ &= \|A\|_\Phi && \text{(by (3.2)).} \;∎ \end{aligned}$$

We now use Lemma 3.5 to prove that every symmetric gauge function generates a unitarily invariant norm.

Theorem 3.6 (von Neumann). Let $\Phi$ be a symmetric gauge function on $\mathbf{R}^n$ and let $\|\cdot\|_\Phi$ be defined by (3.2). Then $\|\cdot\|_\Phi$ is a unitarily invariant norm on $\mathbf{C}^{n \times n}$. Conversely, if $\|\cdot\|$ is a unitarily invariant norm on $\mathbf{C}^{n \times n}$, then there is a symmetric gauge function $\Phi$ on $\mathbf{R}^n$ such that $\|A\| = \|A\|_\Phi$.

Proof. We need only prove that $\|\cdot\|_\Phi$ satisfies the triangle inequality. From Lemma 3.5,
$$\begin{aligned} \|A + B\|_\Phi &= \max_{\|X\|_{\Phi^*} = 1} \Re\, \mathrm{trace}[X^H (A + B)] \\ &\le \max_{\|X\|_{\Phi^*} = 1} \Re\, \mathrm{trace}(X^H A) + \max_{\|X\|_{\Phi^*} = 1} \Re\, \mathrm{trace}(X^H B) \\ &= \|A\|_\Phi + \|B\|_\Phi. \;∎ \end{aligned}$$

3.2. Properties of Unitarily Invariant Norms

The correspondence between symmetric gauge functions and unitarily invariant norms allows us to transfer results about the former to the latter. This subsection is devoted to establishing the basic properties of unitarily invariant norms.

The first item of business is to extend the definition of unitarily invariant norms to rectangular matrices. Suppose that $\Phi$ is a symmetric gauge function on $\mathbf{R}^n$. Then if $m = \min\{k, l\} \le n$, a unitarily invariant norm on $\mathbf{C}^{k \times l}$ may be defined by
$$\|A\|_\Phi = \Phi(\sigma_1, \sigma_2, \ldots, \sigma_m, 0, \ldots, 0),$$
where $\sigma_1, \sigma_2, \ldots, \sigma_m$ are the singular values of $A$. We shall call these norms THE FAMILY OF NORMS GENERATED BY $\Phi$. This convention allows us to use what is essentially the same norm on matrices of varying dimensions. In the most important cases, the gauge function $\Phi$ can be regarded as defined for all infinite sequences $\xi_1, \xi_2, \ldots$ with only a finite number of nonzero elements, in which case the corresponding norm is defined for matrices of all dimensions. For example, the function
$$\Phi_2(\xi_1, \xi_2, \ldots) = \max_i |\xi_i|$$
generates the spectral norm, and
$$\Phi_F(\xi_1, \xi_2, \ldots) = \sqrt{\sum_i |\xi_i|^2}$$
generates the Frobenius norm.

The fact that a symmetric gauge function is an absolute norm has the following important consequences for unitarily invariant norms.

Theorem 3.7. Let $A$ and $B$ have singular values $\sigma_1 \ge \cdots \ge \sigma_n$ and $\tau_1 \ge \cdots \ge \tau_n$. If $\sigma_i \le \tau_i$ ($i = 1, \ldots, n$), then for every unitarily invariant norm $\|\cdot\|$, we have $\|A\| \le \|B\|$.

Proof. Let $\Phi$ be the symmetric gauge function that generates $\|\cdot\|$. Then since $\Phi$ is absolute,
$$\|A\| = \Phi(\sigma_1, \ldots, \sigma_n) \le \Phi(\tau_1, \ldots, \tau_n) = \|B\|. \;∎$$
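The two gauge functions named above, and the monotonicity step used in the proof of Theorem 3.7, can be illustrated on vectors of singular values. This is only a sketch: the function names `phi_2` and `phi_F` and the sample values `sigma` and `tau` are hypothetical, and no singular value decomposition is computed.

```python
import math
from itertools import permutations

def phi_2(x):
    """Gauge function generating the spectral norm: max |xi_i|."""
    return max(abs(t) for t in x)

def phi_F(x):
    """Gauge function generating the Frobenius norm: sqrt(sum |xi_i|^2)."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))

sigma = [3.0, 1.0, 0.5]   # hypothetical singular values of A
tau = [4.0, 2.0, 0.5]     # hypothetical singular values of B, sigma_i <= tau_i

for phi in (phi_2, phi_F):
    base = phi(sigma)
    # Symmetry: the value does not depend on the order of the arguments.
    assert all(math.isclose(phi(p), base) for p in permutations(sigma))
    # Absoluteness: the value depends only on the moduli of the arguments.
    assert math.isclose(phi([-t for t in sigma]), base)
    # Monotonicity used in Theorem 3.7: sigma_i <= tau_i gives phi(sigma) <= phi(tau).
    assert phi(sigma) <= phi(tau)
```

The checks cover one pair of vectors only; the general statement rests on the fact that an absolute norm is monotone in the moduli of its arguments.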
The following useful result is a corollary of this theorem and Theorem I.4.4.

Corollary 3.8. Let $\|\cdot\|$ be a unitarily invariant norm, and let the matrix $A$ be partitioned in the form
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$
Then $\|A_{11}\| \le \|A\|$.

The 2-norm plays a special role in the theory of unitarily invariant norms, as the following theorem shows.

Theorem 3.9. Let $\|\cdot\|$ be a family of unitarily invariant norms. Then
$$\|AB\| \le \|A\| \|B\|_2 \tag{3.6}$$
and
$$\|AB\| \le \|A\|_2 \|B\|. \tag{3.7}$$
Also
$$\|AB\| \ge \|A\| \inf\nolimits_2(B) \tag{3.8}$$
and
$$\|AB\| \ge \inf\nolimits_2(A) \|B\|. \tag{3.9}$$
Moreover, if $\|\cdot\|$ is normalized, then
$$\|A\|_2 \le \|A\|. \tag{3.10}$$

Proof. The inequalities (3.6) and (3.7) are immediate corollaries of Theorem 3.7 and Theorem I.4.5. The inequalities (3.8) and (3.9) are corollaries of Exercise I.4.6.

To establish (3.10), let $\sigma_1 \ge \cdots \ge \sigma_n$ be the singular values of $A$. Then since $\Phi$ is absolute,
$$\|A\| = \Phi(\sigma_1, \sigma_2, \ldots, \sigma_n) \ge \Phi(\sigma_1, 0, \ldots, 0) = \sigma_1 = \|A\|_2. \;∎$$

Combining (3.6), (3.7), and (3.10) we have the following corollary.

Corollary 3.10. A family of normalized, unitarily invariant norms is consistent.

3.3. Doubly Stochastic Matrices and Fan's Theorem

As far as unitarily invariant norms are concerned, the principal result of this subsection is a theorem of Ky Fan, which gives conditions under which one matrix dominates another in any unitarily invariant norm. However, to establish it we must first prove some important theorems on doubly stochastic matrices, one of which will be used later in this book.

Definition 3.11. Let $A$ be a matrix with nonnegative elements. Then $A$ is STOCHASTIC if $A\mathbf{1} = \mathbf{1}$. A stochastic matrix $A$ is DOUBLY STOCHASTIC if $A^T$ is also stochastic.

Since the elements of a row of a stochastic matrix sum to one, they may be regarded as probabilities, hence the name stochastic. A doubly stochastic matrix is one whose rows and columns sum to one.
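Definition 3.11 translates directly into a test on row and column sums. The sketch below is one way to write it in plain Python; the function names, the tolerance `tol`, and the sample matrices `S` and `P` are assumptions made for illustration.

```python
def is_stochastic(A, tol=1e-12):
    """Nonnegative elements and every row summing to one (A1 = 1)."""
    nonnegative = all(a >= 0 for row in A for a in row)
    rows_sum_to_one = all(abs(sum(row) - 1.0) <= tol for row in A)
    return nonnegative and rows_sum_to_one

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def is_doubly_stochastic(A, tol=1e-12):
    """A is stochastic and its transpose is stochastic as well."""
    return is_stochastic(A, tol) and is_stochastic(transpose(A), tol)

S = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.25, 0.25, 0.5]]
P = [[0.9, 0.1],
     [0.5, 0.5]]

print(is_doubly_stochastic(S))                    # True
print(is_stochastic(P), is_doubly_stochastic(P))  # True False
```

The matrix `P` has unit row sums but column sums 1.4 and 0.6, so it is stochastic without being doubly stochastic.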
The first theorem gives necessary and sufficient conditions for two vectors to be related by a doubly stochastic matrix. A little notation will be helpful in stating and proving it. Let $x = (\xi_1, \ldots, \xi_n)^T$ and $y = (\eta_1, \ldots, \eta_n)^T$ be real vectors with
$$\xi_1 \ge \cdots \ge \xi_n, \qquad \eta_1 \ge \cdots \ge \eta_n. \tag{3.11}$$
We shall write $x \succ y$ if
$$\xi_1 + \cdots + \xi_k \ge \eta_1 + \cdots + \eta_k, \qquad k = 1, \ldots, n - 1,$$
and
$$\xi_1 + \cdots + \xi_n = \eta_1 + \cdots + \eta_n.$$
We say that $x$ MAJORIZES $y$.

Theorem 3.12 (Hardy-Littlewood-Pólya). Let the vectors $x$ and $y$ satisfy (3.11). Then a necessary and sufficient condition for there to exist a doubly stochastic matrix $S$ with
$$y = Sx \tag{3.12}$$
is that
$$x \succ y. \tag{3.13}$$
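The majorization relation and the necessity half of Theorem 3.12 can be tried on a small case. The following sketch assumes plain Python lists; `majorizes` and `matvec` are illustrative names, and the matrix `S` and vector `x` are made-up data.

```python
def majorizes(x, y, tol=1e-12):
    """True if x majorizes y: partial sums dominate and total sums agree."""
    xs = sorted(x, reverse=True)   # enforce the ordering (3.11)
    ys = sorted(y, reverse=True)
    if len(xs) != len(ys):
        return False
    sx = sy = 0.0
    for k in range(len(xs) - 1):   # partial sums for k = 1, ..., n-1
        sx += xs[k]
        sy += ys[k]
        if sx < sy - tol:
            return False
    return abs(sum(xs) - sum(ys)) <= tol

def matvec(S, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(s * t for s, t in zip(row, x)) for row in S]

# A doubly stochastic matrix: all row sums and column sums equal one.
S = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.25, 0.25, 0.5]]
x = [3.0, 1.0, -4.0]
y = matvec(S, x)

print(y)                # [2.0, -1.0, -1.0]
print(majorizes(x, y))  # True
print(majorizes(y, x))  # False
```

The example exercises only the direction "y = Sx implies x majorizes y"; constructing S from a given pair x, y is the harder direction and is what the sufficiency proof carries out.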
Proof. The necessity of the condition is left as an exercise.

For the sufficiency, note that (3.13) implies that if all the $\xi_i$'s are equal to some number, then the $\eta_i$'s are equal to the same number, and we may take $S = I$. Moreover, if $x \succ y$ and we add a constant to the elements of $x$ and $y$ then we still have $x \succ y$. Consequently we may assume without loss of generality that
$$\xi_1 > 0 > \xi_n$$
and
$$\sum_{i=1}^n \xi_i = 0.$$

The proof is by induction. The result is trivial for $n = 1$. For $n = 2$, the most general form of a doubly stochastic matrix is
$$\begin{pmatrix} \alpha & 1 - \alpha \\ 1 - \alpha & \alpha \end{pmatrix},$$
where $0 \le \alpha \le 1$. Thus (3.12) requires that
$$\eta_1 = \alpha\xi_1 + (1 - \alpha)\xi_2$$
and
$$\eta_2 = (1 - \alpha)\xi_1 + \alpha\xi_2.$$
But the hypotheses of the theorem imply that $\xi_1 \ge \eta_1 \ge \xi_2$, which in turn implies that the first of these equalities can be satisfied for a unique $\alpha \in [0, 1]$. Summing the two equalities, we see that the second is equivalent to $\xi_1 + \xi_2 = \eta_1 + \eta_2$, which is the equality in (3.13).

For the general case, let us first note that if equality occurs in any of the inequalities (3.13), then $x$ and $y$ can be partitioned in the form
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$
where the pairs $x_1, y_1$ and $x_2, y_2$ satisfy the hypotheses of the theorem. Thus there are doubly stochastic matrices $S_1$ and $S_2$ such that $y_i = S_i x_i$ ($i = 1, 2$). It follows that $S = \mathrm{diag}(S_1, S_2)$ is the matrix required by the theorem.

Let $k$ be the index of the smallest positive $\xi_i$ and $l$ be the index of the largest negative $\xi_i$; i.e.,
$$\xi_k > 0 = \xi_{k+1} = \cdots = \xi_{l-1} > \xi_l.$$
Let $x'$ be the vector obtained by replacing $\xi_k$ by $\xi_k - \alpha$ and $\xi_l$ by $\xi_l + \alpha$. For sufficiently small $\alpha$, we have $x \succ x' \succ y$. Now consider the following three cases.

$k > 1$: Here we have equality in the first of the relations $x \succ x'$.

$l < n$: Here we have equality in the next to last of the relations $x \succ x'$.

$k = 1$, $l = n$: This is equivalent to the $2 \times 2$ case.

In all three cases, there is a doubly stochastic matrix $S'$ such that $x' = S' x$.
Now let us increase $\alpha$ from zero until one of two things happens.

1. We obtain an equality in the relations $x' \succ y$.
2. The $k$th or the $l$th component of $x'$ becomes zero.

In the first case, there is a doubly stochastic matrix $S_y$ such that $y = S_y x'$, in which case $S = S_y S'$ is the matrix required by the theorem. In the second case, $x'$ has one more zero component than $x$. We may then repeat the above construction to obtain a new vector $x''$ with $x' \succ x''$ and a doubly stochastic matrix $S''$ satisfying $x'' = S''x'$. Again we either have equality in the relation $x'' \succ y$, in which case the theorem is proved, or $x''$ has one more zero element than $x'$. Ultimately this reduction must furnish the required doubly stochastic matrix or produce a zero vector $x^{(m)}$. This latter implies that $y = 0$, and the matrix $S = S^{(m)} \cdots S''S'$ is the required doubly stochastic matrix. ∎

The second theorem is a characterization of doubly stochastic matrices, which says that they are the convex hull of all permutation matrices. The proof requires a theorem of independent interest. We begin with a definition.

Definition 3.13. Let $T \in \mathbb{C}^{m\times n}$, where $m \le n$. If $\pi$ is any permutation of $\{1, \ldots, m\}$ and $1 \le j_1 < \cdots < j_m \le n$, then the vector

$$(\tau_{\pi(1),j_1}, \ldots, \tau_{\pi(m),j_m})^T$$

is called a PERMUTATION VECTOR of $T$.
Theorem 3.14 (Hall). Let $T \in \mathbb{C}^{m\times n}$ ($m \le n$). Then there are permutation matrices $P$ and $Q$ such that $PTQ$ has a $p \times q$ zero submatrix with $p + q > n$ if and only if every permutation vector of $T$ contains a zero component.

Proof. We will first show that if $T$ can be permuted to the form

$$T = \begin{pmatrix} 0 & T_{12} \\ T_{21} & T_{22} \end{pmatrix}, \tag{3.14}$$

in which the block rows have $p$ and $m-p$ rows, the block columns have $q$ and $n-q$ columns, and $p + q > n$, then any permutation vector must have zero components. Suppose to the contrary that $(\tau_{i_1 j_1}, \ldots, \tau_{i_m j_m})$ is a permutation vector with no zero elements. Then from (3.14), the first $m-(n-q)$ integers $i_1, \ldots, i_{m-(n-q)}$ must be distinct and lie between $p+1$ and $m$ (inclusive). It follows that $m - p \ge m - (n - q)$, or $n \ge p + q$, a contradiction.

The proof of the converse is by induction. It is trivial for $m = 1$ or $n = 1$. Therefore assume that $m, n > 1$ and that every permutation vector of $T$ has a zero component. Without loss of generality we may suppose that $\tau_{mn} \ne 0$. Then every permutation vector of the $(m-1)\times(n-1)$ leading principal submatrix must have a zero component; for any permutation vector not having a zero component could be combined with $\tau_{mn}$ to give a permutation vector of $T$ that does not have a zero component, a contradiction. By the induction hypothesis, $T$ can be permuted to the form (3.14), where now $p + q = n$. It follows that $T_{12}$ is square and $T_{21}$ has at least as many columns as rows.

Now at least one of the matrices $T_{21}$ or $T_{12}$ must have all its permutation vectors with zero components; for otherwise we could piece together permutation vectors from $T_{21}$ and $T_{12}$ having nonzero components to form a permutation vector for $T$ whose components are nonzero.

Assume that all the permutation vectors of $T_{21}$ have zero components. By the induction hypothesis, we can permute the rows and columns of $T$ so that it has the form

$$\begin{pmatrix} 0 & 0 & T_{12} \\ 0 & \ast & \ast \\ \ast & \ast & \ast \end{pmatrix}, \tag{3.15}$$

in which the block rows have $p$, $r$, and $m-p-r$ rows, the block columns have $s$, $q-s$, and $n-q$ columns, the symbol $\ast$ stands for a block that plays no role in the argument, and $r+s > q$.
Then $(p+r)+s > p+q = n$, which shows that (3.15) is the required matrix. ∎

From Theorem 3.14 we get the following corollary.

Corollary 3.15. If $T \in \mathbb{R}^{n\times n}$ is a nonzero multiple of a doubly stochastic matrix, then $T$ has a permutation vector consisting of positive elements.

Proof. The proof is by contradiction. Without loss of generality, we may assume that $T$ is doubly stochastic. If every permutation vector of $T$ contains a zero element, then from Theorem 3.14 the matrix $T$ can be permuted to have a zero $p \times q$ submatrix with $p + q = n + 1$. Since the property of being doubly stochastic is invariant under permutations, we may assume that the zero submatrix is located in the upper left corner of $T$; i.e.,

$$T = \begin{pmatrix} 0 & T_{12} \\ T_{21} & T_{22} \end{pmatrix},$$

where the block rows have $p$ and $n-p$ rows and the block columns have $q$ and $n-q$ columns. Now the sum of all elements of $T_{12}$ is $p$ and the sum of all elements of $T_{21}$ is $q$. But $p + q > n$, and $n$ is the sum of all elements of $T$. The contradiction establishes the corollary. ∎

We may now establish our characterization of doubly stochastic matrices.

Theorem 3.16 (Birkhoff). The set of all doubly stochastic matrices of order $n$ is the convex hull of all permutation matrices of order $n$; that is, any doubly stochastic matrix $S$ can be expressed as a convex combination of the permutation matrices $P_i$ ($i = 1, \ldots, n!$):

$$S = \sum_{i=1}^{n!} \sigma_i P_i, \qquad \sum_{i=1}^{n!} \sigma_i = 1, \qquad \sigma_i \ge 0, \quad i = 1, \ldots, n!. \tag{3.16}$$

Proof. Any matrix of the form (3.16) is clearly doubly stochastic. It therefore remains to show that any doubly stochastic matrix $S = (\sigma_{ij})$ of order $n$ has the form (3.16). By Corollary 3.15, $S$ has a regular set (that is, a permutation vector) $\sigma_{1 i_1}, \ldots, \sigma_{n i_n}$ in which $\sigma_{k i_k} > 0$. Let $\sigma_1 = \min_{1\le k\le n}\{\sigma_{k i_k}\}$, and let $P_1$ be the permutation matrix with 1 in its $(1, i_1), \ldots, (n, i_n)$ elements. Let $S_1 = S - \sigma_1 P_1$. Clearly the matrix $S_1$ has the following properties:
1. $S_1$ is nonnegative;
2. The sum of all elements in each row and the sum of all elements in each column are equal to $1 - \sigma_1 \ge 0$;
3. The number of zero elements of $S_1$ is greater than that of $S$ by at least one.

If $1 - \sigma_1 = 0$, then $S_1 = 0$; i.e., $S = \sigma_1 P_1$ and the theorem is proved. If $1 - \sigma_1 > 0$, then by Corollary 3.15 the matrix $S_1$ has a regular set (permutation vector) consisting of positive elements. Repeating the above argument, we get a matrix $S_2 = S - \sigma_1 P_1 - \sigma_2 P_2$ in which $\sigma_2 > 0$ and $P_2$ is a permutation matrix. As above, the matrix $S_2$ is nonnegative. The sum of the elements in each row and the sum of the elements in each column are equal to $1 - \sigma_1 - \sigma_2 \ge 0$. Finally the number of zero elements of $S_2$ is greater than that of $S$ by at least two. Continuing in this way we may produce a sequence $P_1, P_2, \ldots$ of permutations and multipliers $\sigma_1, \sigma_2, \ldots$, with $0 < \sigma_i \le 1$. The sequence terminates when $1 - \sigma_1 - \cdots - \sigma_m = 0$, at which point

$$S = \sum_{i=1}^m \sigma_i P_i$$

is the required convex combination. ∎

We may now prove Fan's theorem.

Theorem 3.17 (Fan). Let $x, y \in \mathbb{R}^n$ satisfy

$$\xi_1 \ge \cdots \ge \xi_n \ge 0, \qquad \eta_1 \ge \cdots \ge \eta_n \ge 0. \tag{3.17}$$

Then

$$\xi_1 + \cdots + \xi_k \ge \eta_1 + \cdots + \eta_k, \qquad k = 1, \ldots, n, \tag{3.18}$$

is a necessary and sufficient condition for $\Phi(x) \ge \Phi(y)$ to hold for all symmetric gauge functions $\Phi$.

Proof. For the necessity consider the symmetric gauge functions

$$\Phi_k(x) \stackrel{\mathrm{def}}{=} \max_{1 \le i_1 < \cdots < i_k \le n} \{|\xi_{i_1}| + \cdots + |\xi_{i_k}|\}. \tag{3.19}$$

If $x$ and $y$ satisfy (3.17), then $\Phi_k(x) \ge \Phi_k(y)$ ($k = 1, \ldots, n$) is equivalent to (3.18).

For sufficiency, note that by successively reducing $\xi_n$, then $\xi_{n-1}$, and so on, we can obtain a vector $\tilde{x} \le x$ such that $\tilde{x} \succ y$. By the Hardy-Littlewood-Pólya theorem, there is a doubly stochastic matrix $S$ such that $y = S\tilde{x} \le Sx$. By Birkhoff's theorem we can write $S$ as a convex combination of permutation matrices:

$$S = \sum_{i=1}^{n!} \sigma_i P_i, \qquad \sum_{i=1}^{n!} \sigma_i = 1, \qquad \sigma_i \ge 0, \quad i = 1, \ldots, n!.$$

It then follows that for any symmetric gauge function $\Phi$,

$$\Phi(y) \le \Phi(Sx) = \Phi\Bigl(\sum_i \sigma_i P_i x\Bigr) \le \sum_i \sigma_i \Phi(P_i x) = \Phi(x). \qquad ∎$$

The symmetric gauge functions $\Phi_k$ defined by (3.19) have associated unitarily invariant norms $\|\cdot\|_{\Phi_k}$. When Fan's theorem is recast in terms of these norms it takes the following form.

Corollary 3.18. In order for $\|A\| \le \|B\|$ for every unitarily invariant norm it is necessary and sufficient that

$$\|A\|_{\Phi_k} \le \|B\|_{\Phi_k}, \qquad k = 1, \ldots, n.$$

Notes and References

The characterization of unitarily invariant norms as symmetric gauge functions of singular values is due to von Neumann [254, 1932]. Lemma 3.4 is of independent interest, since it links the eigenvalues and the singular values of a matrix (see Exercise 3.2). The proof given here is new.
For surveys with extensive bibliographies of the subjects of doubly stochastic matrices and majorization see [159, 5]. For the Hardy-Littlewood-Pólya theorem see [103, 1934]. The proof given here is due to Ostrowski [168, 1952]. For Birkhoff's theorem see [33, 1946], and for Fan's see [70, 1951]. Fan's paper is the culmination of a flurry of results, initiated by Weyl [266, 1949], on inequalities bounding eigenvalues in terms of singular values. As Fan points out, these results can be used to establish von Neumann's characterization of unitarily invariant norms in terms of singular values.

Theorem 3.14, which we have called Hall's theorem, is associated with the names König and Frobenius. Hall [100, 1935] actually proved a set theoretic version of the theorem given here and noted that it was a generalization of a graph theoretic theorem of König [136, 1916], which van der Waerden [246, 1927] had recast in a set theoretic form. The association with Frobenius appears to be spurious. The earliest reference we know that mentions him in this connection is by Dulmage and Halperin [62, 1955], from which the proof given here is adapted. This paper cites Frobenius's famous paper on nonnegative matrices [78, 1912]. However, the theorem does not appear there, at least not in an obvious form.

Exercises

1. Show that for any unitarily invariant norm $\|\cdot\|$,

$$\left\| \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix} \right\| \le \left\| \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \right\|.$$

2. Let $L(A) = \{\lambda_1, \ldots, \lambda_n\}$ and $S(A) = \{\sigma_1, \ldots, \sigma_n\}$. Show that

$$|\lambda_1| + \cdots + |\lambda_n| \le \sigma_1 + \cdots + \sigma_n.$$

3. (Fan and Hoffman [71]). Let $\|\cdot\|$ be a unitarily invariant norm.

   1. Show that if $H$ is Hermitian positive semidefinite and $U$ is unitary then $\|H - I\| \le \|H - U\| \le \|H + I\|$.
   2. Show that for any Hermitian matrix $H$, $\left\|A - \frac{A + A^H}{2}\right\| \le \|A - H\|$.

4. For any real vector $x$ let $x_+ = (x + |x|)/2$ be the result of setting the negative components of $x$ to zero.
Show that $x \succ y$ if and only if $\mathbf{1}^T(x - \tau\mathbf{1})_+ \ge \mathbf{1}^T(y - \tau\mathbf{1})_+$ for all real $\tau$.

5. Show that $x \succ y$ if and only if $y$ is a convex combination of all vectors of the form $Px$ where $P$ is a permutation matrix.

6. (Hall's theorem, the original version [100]). Let $\mathcal{A}$ be a set consisting of $n$ elements. For $m \le n$ suppose that $\mathcal{A} = \bigcup_{i=1}^m \mathcal{H}_i$. Show that there exist $m$ distinct elements $a_1, \ldots, a_m$ of $\mathcal{A}$ such that $a_i \in \mathcal{H}_i$ ($i = 1, \ldots, m$) if and only if every union of $k$ of the sets $\mathcal{H}_i$ contains at least $k$ distinct elements of $\mathcal{A}$.

4. Metrics on Subspaces of $\mathbb{C}^n$

A difficulty in framing a workable perturbation theory for eigenvectors is that there may be no unique eigenvector corresponding to a multiple eigenvalue. For example, any nonzero vector in the space spanned by the unit vectors $\mathbf{1}_1$ and $\mathbf{1}_2$ is an eigenvector of the matrix A 0  n When $A$ is perturbed, two distinct eigenvectors can precipitate from this subspace, and for different perturbations these eigenvectors can be quite different, even when the perturbations are small. For example, when $\epsilon > 0$ the matrix u+n has eigenvectors $\mathbf{1}_1$ and $\mathbf{1}_2$, whereas the matrix (  n
has the very different eigenvectors $(1\ \ 1\ \ 0)^T$ and $(1\ \ {-1}\ \ 0)^T$. In spite of such differences, the two eigenvectors will always span a space that is very near to the space spanned by $\mathbf{1}_1$ and $\mathbf{1}_2$. Thus it makes more sense to derive perturbation bounds for the subspace, which is stable, rather than for the eigenvectors, which are not.

In order to derive such a theory, we must first specify what we mean by the distance between subspaces. Since we will be comparing only subspaces of the same dimensions, we can restrict ourselves to the problem of introducing a notion of distance on the set of all $l$-dimensional subspaces of $\mathbb{C}^n$, a set that we will denote by $\mathbb{C}^n_l$ (or $\mathbb{R}^n_l$ in the real case). Unfortunately, this can be done in a number of ways, not all of which turn out as we might hope.

Example 4.1. Consider the space $\mathbb{R}^2_1$ of all infinitely extended lines in the plane that pass through the origin. Given any subspace $\mathcal{X} \in \mathbb{R}^2_1$ there is a unique vector $x(\mathcal{X}) = (\xi_1\ \ \xi_2)^T$ in $\mathcal{X}$ that lies in the half open semicircle

$$\{x : \|x\|_2 = 1,\ 0 \le \xi_1,\ -1 < \xi_2 \le 1\}.$$

It is easy to verify that the function

$$\rho(\mathcal{X}, \mathcal{Y}) = \angle[x(\mathcal{X}), x(\mathcal{Y})]$$

is a metric on $\mathbb{R}^2_1$. However, the lines along the directions $(\epsilon, -1)$ and $(\epsilon, +1)$ are nearly $\pi$ apart in this metric, even though the lines themselves approach one another as $\epsilon \to 0$.

In the above example we gave $\mathbb{R}^2_1$ the topology of a semicircle open at one end, which allows lines that are near in the usual sense of the word to be far apart in the sense of the $\rho$ metric. This shows that we have to take some care in defining distances and metrics on $\mathbb{C}^n_l$. In the first subsection we will introduce one widely used distance function, the gap, and derive its properties. In the second subsection we will consider metrics that are unitarily invariant.

4.1. The Gap

In defining a notion of distance between subspaces, it is natural to begin with the distance between a point and a subspace.

Definition 4.2. Let $\mathcal{X}$ be a subspace of $\mathbb{C}^n$ and let $y \in \mathbb{C}^n$. If $\nu$ is a norm on $\mathbb{C}^n$, then the $\nu$-DISTANCE BETWEEN $y$ AND $\mathcal{X}$ is the function

$$\delta_\nu(y, \mathcal{X}) \stackrel{\mathrm{def}}{=} \min_{x \in \mathcal{X}} \nu(y - x). \tag{4.1}$$

An elementary compactness argument shows that $\delta_\nu$ is well defined. When $\nu$ is the 2-norm, it follows from Theorem I.2.5 that

$$\delta_2(y, \mathcal{X}) = \|(I - P_{\mathcal{X}})y\|_2. \tag{4.2}$$

In other words, the 2-distance between $y$ and $\mathcal{X}$ is the distance between $y$ and its projection onto $\mathcal{X}$. For this reason, a minimizing vector $x$ in (4.1) is sometimes called a $\nu$-projection of $y$ onto $\mathcal{X}$.

We are now in a position to define a distance between subspaces.

Definition 4.3. Let $\mathcal{X}, \mathcal{Y} \in \mathbb{C}^n_l$ and let $\nu$ be a norm on $\mathbb{C}^n$. Then the $\nu$-GAP BETWEEN $\mathcal{X}$ AND $\mathcal{Y}$ is the number

$$\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) \stackrel{\mathrm{def}}{=} \max\Bigl\{ \max_{\substack{x \in \mathcal{X} \\ \nu(x) = 1}} \delta_\nu(x, \mathcal{Y}),\ \max_{\substack{y \in \mathcal{Y} \\ \nu(y) = 1}} \delta_\nu(y, \mathcal{X}) \Bigr\}.$$

Thus the gap is the largest distance from $\mathcal{Y}$ of a vector of length one lying in $\mathcal{X}$, or vice versa, whichever is greater.

The definition of the gap function satisfies our intuitive notions of what a distance between subspaces should be. In Example 4.1 it gives $\mathbb{R}^2_1$ a topology in which lines that intersect at small angles have a small distance from one another. However, the gap function need not be a metric, and this means that we must establish from first principles that a gap function actually generates a topology. We shall do this in two stages. First we shall prove that all gap functions are equivalent, in the same sense that all norms are equivalent. We will then show that one gap function, $\rho_{g,2}$, is a metric. It will then follow from their equivalence that all gap functions generate the same topology.

Theorem 4.4. Let $\mu$ and $\nu$ be norms on $\mathbb{C}^n$, and let

$$\alpha\mu(x) \le \nu(x) \le \beta\mu(x), \qquad \alpha, \beta > 0.$$

Then

$$\frac{\alpha}{\beta}\,\rho_{g,\mu}(\mathcal{X}, \mathcal{Y}) \le \rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) \le \frac{\beta}{\alpha}\,\rho_{g,\mu}(\mathcal{X}, \mathcal{Y}). \tag{4.3}$$
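Proof. First we will establish the second inequality in (4.3). Let $x \in \mathcal{X}$ with $\nu(x) = 1$ and let $y \in \mathcal{Y}$. Let $x' = x/\mu(x)$, so that $\mu(x') = 1$, and let $y' = y/\mu(x)$. Then

$$\nu(x - y) = \mu(x)\,\nu(x' - y') \le \frac{1}{\alpha}\,\nu(x' - y') \le \frac{\beta}{\alpha}\,\mu(x' - y').$$

From this it follows that

$$\delta_\nu(x, \mathcal{Y}) \le \frac{\beta}{\alpha}\,\delta_\mu(x', \mathcal{Y}).$$

As $x$ ranges over all vectors in $\mathcal{X}$ with $\nu(x) = 1$, $x'$ ranges over all vectors in $\mathcal{X}$ with $\mu(x') = 1$. Hence

$$\max_{\substack{x \in \mathcal{X} \\ \nu(x) = 1}} \delta_\nu(x, \mathcal{Y}) \le \frac{\beta}{\alpha} \max_{\substack{x' \in \mathcal{X} \\ \mu(x') = 1}} \delta_\mu(x', \mathcal{Y}).$$

Similarly,

$$\max_{\substack{y \in \mathcal{Y} \\ \nu(y) = 1}} \delta_\nu(y, \mathcal{X}) \le \frac{\beta}{\alpha} \max_{\substack{y' \in \mathcal{Y} \\ \mu(y') = 1}} \delta_\mu(y', \mathcal{X}),$$

and the second inequality follows from the definition of the gap.

The first inequality in (4.3) follows from the second and from the fact that $\beta^{-1}\nu(x) \le \mu(x) \le \alpha^{-1}\nu(x)$. ∎

We shall now prove that $\rho_{g,2}$ is a metric. The proof is based on the following characterization of $\rho_{g,2}(\mathcal{X}, \mathcal{Y})$ in terms of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$.

Theorem 4.5. Let $\mathcal{X}, \mathcal{Y} \in \mathbb{C}^n_l$, and let $\Theta = \mathrm{diag}(\theta_1, \ldots, \theta_l)$, where $\theta_1 \ge \cdots \ge \theta_l$ are the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$. Then

$$\rho_{g,2}(\mathcal{X}, \mathcal{Y}) = \sin\theta_1 = \|\sin\Theta\|_2. \tag{4.4}$$

Proof. By (4.2), $\delta_2(x, \mathcal{Y}) = \|(I - P_{\mathcal{Y}})x\|_2$. Hence

$$\max_{\substack{x \in \mathcal{X} \\ \|x\|_2 = 1}} \delta_2(x, \mathcal{Y}) = \max_{\substack{x \in \mathcal{X} \\ \|x\|_2 = 1}} \|(I - P_{\mathcal{Y}})x\|_2 = \max_{\substack{x \in \mathcal{X} \\ \|x\|_2 \le 1}} \|(I - P_{\mathcal{Y}})x\|_2 = \max_{\|x\|_2 = 1} \|(I - P_{\mathcal{Y}})P_{\mathcal{X}}x\|_2 = \|(I - P_{\mathcal{Y}})P_{\mathcal{X}}\|_2.$$

Thus by Theorem I.5.5,

$$\max_{\substack{x \in \mathcal{X} \\ \|x\|_2 = 1}} \delta_2(x, \mathcal{Y}) = \sin\theta_1.$$

Similarly,

$$\max_{\substack{y \in \mathcal{Y} \\ \|y\|_2 = 1}} \delta_2(y, \mathcal{X}) = \sin\theta_1,$$

and the result follows from the definition of the gap. ∎

Since by Theorem I.5.5 the singular values of $P_{\mathcal{X}} - P_{\mathcal{Y}}$ are the sines of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$, we have the following corollary.

Corollary 4.6. In the 2-norm,

$$\rho_{g,2}(\mathcal{X}, \mathcal{Y}) = \|P_{\mathcal{X}} - P_{\mathcal{Y}}\|_2. \tag{4.5}$$

It immediately follows from Corollary 4.6 that $\rho_{g,2}$ is a metric on $\mathbb{C}^n_l$ [e.g., to get the triangle inequality write $\rho_{g,2}(\mathcal{X}, \mathcal{Z}) = \|P_{\mathcal{X}} - P_{\mathcal{Z}}\|_2 \le \|P_{\mathcal{X}} - P_{\mathcal{Y}}\|_2 + \|P_{\mathcal{Y}} - P_{\mathcal{Z}}\|_2 = \rho_{g,2}(\mathcal{X}, \mathcal{Y}) + \rho_{g,2}(\mathcal{Y}, \mathcal{Z})$].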
Consequently, $\rho_{g,2}$ induces a topology on $\mathbb{C}^n_l$, and by Theorem 4.4 all gap functions induce the same topology, which we will call the GAP TOPOLOGY.

Equation (4.5) does not hold in general; if we replace the 2-norm by another norm, the equality may fail. However, the right-hand side, regarded as a function of $\mathcal{X}$ and $\mathcal{Y}$, remains a metric on $\mathbb{C}^n_l$. We leave the proof of the following theorem as an exercise (the subscript $p$ in (4.6) stands for projection).

Theorem 4.7. Let $\nu$ be a matrix norm on $\mathbb{C}^{n\times n}$. Then the function

$$\rho_{p,\nu}(\mathcal{X}, \mathcal{Y}) \stackrel{\mathrm{def}}{=} \nu(P_{\mathcal{X}} - P_{\mathcal{Y}}) \tag{4.6}$$

is a metric on $\mathbb{C}^n_l$, which generates the gap topology.
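The identities (4.4) and (4.5) are easy to check numerically. The following is a minimal sketch, assuming NumPy and randomly generated real subspaces; the helper names `orth_basis`, `canonical_angles`, and `projector_gap` are illustrative and do not come from the text. It computes the canonical angles from the singular values of $X^H Y$ for orthonormal bases $X$ and $Y$, and compares $\sin\theta_1$ with $\|P_{\mathcal{X}} - P_{\mathcal{Y}}\|_2$.

```python
import numpy as np

def orth_basis(A):
    """Orthonormal basis for the column space of a full-column-rank A (thin QR)."""
    Q, _ = np.linalg.qr(A)
    return Q

def canonical_angles(X, Y):
    """Canonical angles, largest first, between span(X) and span(Y).

    X and Y are assumed to have orthonormal columns, so the singular
    values of X^H Y are the cosines of the canonical angles.
    """
    cosines = np.linalg.svd(X.conj().T @ Y, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return np.sort(angles)[::-1]

def projector_gap(X, Y):
    """The 2-norm of P_X - P_Y for orthonormal bases X and Y."""
    PX = X @ X.conj().T
    PY = Y @ Y.conj().T
    return np.linalg.norm(PX - PY, 2)

rng = np.random.default_rng(0)
n, l = 6, 2
X = orth_basis(rng.standard_normal((n, l)))
Y = orth_basis(rng.standard_normal((n, l)))
Z = orth_basis(rng.standard_normal((n, l)))

theta = canonical_angles(X, Y)
sin_theta_1 = np.sin(theta[0])   # left-hand side of (4.4)
gap_xy = projector_gap(X, Y)     # right-hand side of (4.5)

print("sin(theta_1)      =", sin_theta_1)
print("||P_X - P_Y||_2   =", gap_xy)
print("triangle holds    =",
      projector_gap(X, Z) <= gap_xy + projector_gap(Y, Z) + 1e-12)
```

Working with orthonormal bases keeps the projectors $XX^H$ and $YY^H$ cheap to form; for large $n$ one would avoid forming the $n \times n$ projectors at all and use the angles directly.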
4.2. Unitarily Invariant Metrics

A metric $\rho$ on $\mathbb{C}^n_l$ is UNITARILY INVARIANT if $\rho(\mathcal{X}, \mathcal{Y}) = \rho(U\mathcal{X}, U\mathcal{Y})$ for all unitary matrices $U$. In this subsection we will be interested in unitarily invariant metrics that generate the gap topology. Now the metric of Example 4.1 is not unitarily invariant and does not generate the gap topology. It is therefore appealing to conjecture that any unitarily invariant metric must generate the gap topology. Unfortunately, the conjecture is not true, as the following example shows.

Example 4.8. Let $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^2_1$ and let $\theta(\mathcal{X}, \mathcal{Y})$ be the canonical angle between $\mathcal{X}$ and $\mathcal{Y}$. Define $\rho(\mathcal{X}, \mathcal{Y})$ by

$$\rho(\mathcal{X}, \mathcal{Y}) = \begin{cases} \theta(\mathcal{X}, \mathcal{Y}), & \text{if } \theta(\mathcal{X}, \mathcal{Y}) \text{ is rational}, \\ 1, & \text{if } \theta(\mathcal{X}, \mathcal{Y}) \text{ is irrational}. \end{cases}$$

Then it is easily verified that $\rho$ is a unitarily invariant metric. But two subspaces with an irrational canonical angle are at a distance of unity, no matter how near they are in the gap topology.

Fortunately, there are many unitarily invariant metrics that generate the gap topology. One of them is the gap function $\rho_{g,2}$, since $\|\cdot\|_2$ is unitarily invariant. However, this metric effectively exhausts the class of unitarily invariant metrics that can be generated by gap functions, since up to a constant multiple the 2-norm is the only unitarily invariant vector norm on $\mathbb{C}^n$. But there are many unitarily invariant matrix norms, which can be used in the definition (4.6) to give unitarily invariant metrics generating the gap topology.

Theorem 4.9. If $\nu$ is a unitarily invariant matrix norm, then $\rho_{p,\nu}$ is a unitarily invariant metric on $\mathbb{C}^n_l$.

By (4.4), $\rho_{g,2}(\mathcal{X}, \mathcal{Y}) = \|\sin\Theta(\mathcal{X}, \mathcal{Y})\|_2$, and it is natural to ask if we can find new unitarily invariant metrics of the form $\nu[\sin\Theta(\mathcal{X}, \mathcal{Y})]$, where $\nu$ is a unitarily invariant matrix norm. Unfortunately, these metrics are just the $\rho_{p,\nu}$ metrics in disguise.

Theorem 4.10. Let $\nu$ be a unitarily invariant matrix norm on $\mathbb{C}^{l\times l}$. Then there is a unitarily invariant matrix norm $\nu'$ such that

$$\nu[\sin\Theta(\mathcal{X}, \mathcal{Y})] = \rho_{p,\nu'}(\mathcal{X}, \mathcal{Y})$$

for all $\mathcal{X}, \mathcal{Y} \in \mathbb{C}^n_l$.
Proof. We will treat only the case $2l \le n$, leaving the case $2l > n$ as an exercise. Let $\sigma_1 \ge \cdots \ge \sigma_l$ be the sines of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$. By Theorem I.5.5, we know that the singular values of $P_{\mathcal{X}} - P_{\mathcal{Y}}$ are

$$\sigma_1, \sigma_1, \ldots, \sigma_l, \sigma_l, 0, \ldots, 0.$$

Let $\Phi$ be the symmetric gauge function that generates $\nu$, and define $\Phi' : \mathbb{R}^n \to \mathbb{R}$ as follows. For any vector $x \in \mathbb{R}^n$ with $|\xi_1| \ge |\xi_2| \ge \cdots \ge |\xi_n|$ let

$$\Phi'(x) = \Phi\left( \frac{|\xi_1| + |\xi_2|}{2}, \frac{|\xi_3| + |\xi_4|}{2}, \ldots, \frac{|\xi_{2l-1}| + |\xi_{2l}|}{2} \right).$$

It is easily verified that $\Phi'$ is a symmetric gauge function. Let $\nu'$ be the unitarily invariant matrix norm generated by $\Phi'$. Then

$$\nu'(P_{\mathcal{X}} - P_{\mathcal{Y}}) = \Phi'(\sigma_1, \sigma_1, \ldots, \sigma_l, \sigma_l, 0, \ldots, 0) = \Phi(\sigma_1, \ldots, \sigma_l) = \nu[\sin\Theta(\mathcal{X}, \mathcal{Y})]. \qquad ∎$$

We now turn to two different metrics on $\mathbb{C}^n_l$. To motivate the first, let $\mathcal{X}, \mathcal{Y} \in \mathbb{C}^n_l$ and let the columns of $X$ and $Y$ form orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$. If $\mathcal{X} = \mathcal{Y}$, then there is a unitary matrix $Q = Y^H X$ such that $X = YQ$, or equivalently $\|X - YQ\|_F = 0$. This suggests that we use the number

$$\rho_b(\mathcal{X}, \mathcal{Y}) \stackrel{\mathrm{def}}{=} \min_{Q\ \mathrm{unitary}} \|X - YQ\|_F$$

as a measure of the distance between $\mathcal{X}$ and $\mathcal{Y}$ (the subscript $b$ stands for basis).

Theorem 4.11. The function $\rho_b$ is a unitarily invariant metric on $\mathbb{C}^n_l$. If $\gamma_i$ is the cosine of the $i$th canonical angle between $\mathcal{X}$ and $\mathcal{Y}$, then

$$\rho_b(\mathcal{X}, \mathcal{Y}) = \sqrt{2\sum_i (1 - \gamma_i)}. \tag{4.7}$$

Proof. It is easy to see that $\rho_b$ is a unitarily invariant, nonnegative function that is zero if and only if its arguments are equal. We will now show that it satisfies the triangle inequality.
Let the columns of $X$, $Y$, $Z$ form orthonormal bases for $\mathcal{X}, \mathcal{Y}, \mathcal{Z} \in \mathbb{C}^n_l$. Let

$$\rho_b(\mathcal{X}, \mathcal{Y}) = \|X - YQ_{X,Y}\|_F, \quad \rho_b(\mathcal{Y}, \mathcal{Z}) = \|Y - ZQ_{Y,Z}\|_F, \quad \rho_b(\mathcal{X}, \mathcal{Z}) = \|X - ZQ_{X,Z}\|_F.$$

Then

$$\begin{aligned} \rho_b(\mathcal{X}, \mathcal{Z}) &\le \|X - ZQ_{Y,Z}Q_{X,Y}\|_F \\ &= \|X - YQ_{X,Y} + YQ_{X,Y} - ZQ_{Y,Z}Q_{X,Y}\|_F \\ &\le \|X - YQ_{X,Y}\|_F + \|Y - ZQ_{Y,Z}\|_F \\ &= \rho_b(\mathcal{X}, \mathcal{Y}) + \rho_b(\mathcal{Y}, \mathcal{Z}). \end{aligned}$$

We will establish (4.7) for the case $2l \le n$. Without loss of generality, we may assume that $X$ and $Y$ are canonical bases for $\mathcal{X}$ and $\mathcal{Y}$. Thus we must find a unitary matrix $Q$ that minimizes

$$\left\| \begin{pmatrix} I \\ 0 \end{pmatrix} - \begin{pmatrix} \Gamma \\ \Sigma \end{pmatrix} Q \right\|_F^2 = \|I - \Gamma Q\|_F^2 + \|\Sigma\|_F^2.$$

The second term on the right-hand side of the above equation is independent of $Q$. Hence $Q$ must minimize

$$\|I - \Gamma Q\|_F^2 = \mathrm{trace}(I - Q^H\Gamma - \Gamma Q + \Gamma^2).$$

This quantity is minimized when the diagonals of $Q$ are one, and since $Q$ is unitary, $Q = I$. Hence

$$\|I - \Gamma Q\|_F^2 + \|\Sigma\|_F^2 = \mathrm{trace}(I - 2\Gamma + \Gamma^2 + \Sigma^2) = 2\,\mathrm{trace}(I - \Gamma) = 2\sum_i (1 - \gamma_i). \qquad ∎$$

The second metric is defined by the formula

$$\rho_\theta(\mathcal{X}, \mathcal{Y}) = \arccos \frac{|\det(X^H Y)|}{\sqrt{\det(X^H X)\det(Y^H Y)}},$$

where the columns of $X$ and $Y$ form bases for $\mathcal{X}$ and $\mathcal{Y}$. We will also consider the closely related metric

$$\rho_g(\mathcal{X}, \mathcal{Y}) = \sin\rho_\theta(\mathcal{X}, \mathcal{Y}).$$

It is easily seen that these functions are unitarily invariant and independent of the choice of $X$ and $Y$. By choosing canonical bases we see that

$$\rho_\theta(\mathcal{X}, \mathcal{Y}) = \arccos \prod_i \gamma_i, \tag{4.8}$$

where, as usual, the $\gamma_i$ are the cosines of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$. The proof of the following theorem shows that they are metrics on $\mathbb{C}^n_l$.

Theorem 4.12. The functions $\rho_\theta$ and $\rho_g$ are unitarily invariant metrics on $\mathbb{C}^n_l$.

Proof. The fact that $\rho_g$ is a metric follows immediately from the fact that $\rho_\theta$ is a metric, which we will now establish. From (4.8) we have

$$\rho_\theta(\mathcal{X}, \mathcal{Y}) = 0 \iff \prod_i \gamma_i = 1 \iff \gamma_1 = \gamma_2 = \cdots = \gamma_l = 1 \iff \mathcal{X} = \mathcal{Y}.$$

Thus it remains only to show that $\rho_\theta$ satisfies the triangle inequality.

Let the columns of $X$, $Y$, $Z$ form orthonormal bases for $\mathcal{X}, \mathcal{Y}, \mathcal{Z} \in \mathbb{C}^n_l$. Then we must show that

$$\arccos|\det(X^H Z)| \le \arccos|\det(X^H Y)| + \arccos|\det(Y^H Z)|.$$
For $S = X, Y, Z$, let $\delta^{(S)}_{i_1, \ldots, i_l}$ denote the determinant of the matrix formed from the $i_1$th, ..., $i_l$th rows of $S$. By the Binet-Cauchy formula (see, e.g., [81, V.I, p.9]) we have

$$\det(X^H Y) = \sum_{1 \le i_1 < \cdots < i_l \le n} \bar{\delta}^{(X)}_{i_1, \ldots, i_l}\, \delta^{(Y)}_{i_1, \ldots, i_l}, \tag{4.9}$$

with similar formulas for $\det(X^H Z)$ and $\det(Y^H Z)$. For $S = X, Y, Z$ let the components of $v_S$ be the numbers $\delta^{(S)}_{i_1, \ldots, i_l}$ taken in some fixed order. Then

$$\det(X^H Y) = v_X^H v_Y,$$

with similar formulas for $\det(X^H Z)$ and $\det(Y^H Z)$. Thus our problem reduces to showing that

$$\arccos|v_X^H v_Z| \le \arccos|v_X^H v_Y| + \arccos|v_Y^H v_Z|. \tag{4.10}$$
The QR decomposition of the matrix $(v_Y\ \ v_X\ \ v_Z)$ gives a unitary matrix $U$ such that

$$v_Y = U(1, 0, \ldots, 0)^T, \quad v_X = U(\alpha_1, \alpha_2, 0, \ldots, 0)^T, \quad v_Z = U(\beta_1, \beta_2, \beta_3, 0, \ldots, 0)^T,$$

with

$$|\alpha_1|^2 + |\alpha_2|^2 = 1, \qquad |\beta_1|^2 + |\beta_2|^2 + |\beta_3|^2 = 1.$$

Since

$$\begin{aligned} |\bar\alpha_1\beta_1 + \bar\alpha_2\beta_2| &\ge |\alpha_1||\beta_1| - |\alpha_2||\beta_2| \\ &= |\alpha_1||\beta_1| - \sqrt{1 - |\alpha_1|^2}\,\sqrt{1 - |\beta_1|^2 - |\beta_3|^2} \\ &\ge |\alpha_1||\beta_1| - \sqrt{1 - |\alpha_1|^2}\,\sqrt{1 - |\beta_1|^2} \\ &= \cos(\arccos|\alpha_1| + \arccos|\beta_1|), \end{aligned}$$

we have

$$\arccos|\bar\alpha_1\beta_1 + \bar\alpha_2\beta_2| \le \arccos|\alpha_1| + \arccos|\beta_1|,$$

which is the required inequality (4.10). ∎

Since most perturbation bounds deal with small quantities, it is instructive to examine the asymptotic behavior of the metrics introduced above when the canonical angles $\theta_i$ are small. Specifically, we have

1. $\rho_{g,2} = \sin\theta_1 \cong \|\Theta\|_2$,
2. $\rho_{p,F} = \sqrt{2\sum_i \sin^2\theta_i} \cong \sqrt{2}\,\|\Theta\|_F$,
3. $\rho_b = \sqrt{2\sum_i (1 - \cos\theta_i)} \cong \|\Theta\|_F$,
4. $\rho_\theta = \arccos\bigl(\prod_i \cos\theta_i\bigr) \cong \|\Theta\|_F$. (4.11)

In particular, comparing (4.11.3) and (4.11.4) with (4.11.2) we see that the metrics $\rho_b$ and $\rho_\theta$ also generate the gap topology.

Notes and References

The original impetus for comparing subspaces came from functional analysis. According to Berkson [27], the gap or "opening" $\rho_{g,2}$ was first defined on Hilbert space by Krein and Krasnoselski [137, 1947]. Krein, Krasnoselski, and Milman later extended the notion to an arbitrary Banach space [138, 1948]. For more on the gap and its applications see Kato [135].

The metric $\rho_b$ was introduced by Paige [173, 1984] and the metric $\rho_\theta$ by Lu [149, 1963]. At this writing there is no simple characterization of unitarily invariant metrics for subspaces. One reason is that these metrics, unlike norms, which are defined for one object on a space with a linear structure, are relations between two objects in a space with a complicated structure. However, the survey in this section suggests that any reasonable approach will yield something that can be expressed in terms of canonical angles, at least asymptotically [cf.
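(4.11)].

Exercises

IN THE FOLLOWING EXERCISES $\mathcal{X}$ AND $\mathcal{Y}$ ARE SUBSPACES OF $\mathbb{C}^n$ AND $\nu$ IS A NORM ON $\mathbb{C}^n$. UNLESS OTHERWISE STATED $\mathcal{X}$ AND $\mathcal{Y}$ HAVE THE SAME DIMENSION.

1. Show that if $\dim(\mathcal{X}) > \dim(\mathcal{Y})$ then there is a point $x \in \mathcal{X}$ such that $\delta_\nu(x, \mathcal{Y}) = \nu(x)$. [Note: This theorem is nontrivial. For a proof and further references see [135, Ch.IV, Sec.2], from which some of the following exercises were excavated.]

2. Show that $\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) \le 1$, with equality if $\dim(\mathcal{X}) \ne \dim(\mathcal{Y})$.

3. Let

$$\hat\delta_\nu(y, \mathcal{X}) = \min_{\substack{x \in \mathcal{X} \\ \nu(x) = 1}} \nu(y - x)$$

and

$$\hat\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) = \max\Bigl\{ \max_{\substack{x \in \mathcal{X} \\ \nu(x) = 1}} \hat\delta_\nu(x, \mathcal{Y}),\ \max_{\substack{y \in \mathcal{Y} \\ \nu(y) = 1}} \hat\delta_\nu(y, \mathcal{X}) \Bigr\}.$$

Show that $\hat\rho_{g,\nu}$ is a metric.

4. Show that

$$\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) \le \hat\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) \le 2\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}).$$

5. Show that if $\{\mathcal{X}_k\}$ is a Cauchy sequence in $\rho_{g,\nu}$ then there is a subspace $\mathcal{Y}$ such that $\mathcal{X}_k \to \mathcal{Y}$ in $\rho_{g,\nu}$ (i.e., the space of all subspaces is complete).

6. (Schäffer [153]). Let

$$\pi(\mathcal{X}, \mathcal{Y}) = \max\{\min\{\nu(I - C) : C\mathcal{X} = \mathcal{Y}\},\ \min\{\nu(I - C) : C\mathcal{Y} = \mathcal{X}\}\}$$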
and

$$\rho_s(\mathcal{X}, \mathcal{Y}) = \log[1 + \pi(\mathcal{X}, \mathcal{Y})].$$

Show that $\rho_s$ is a metric.

Chapter III

Linear Systems and Least Squares Problems

In this chapter we will be concerned with the solution of the linear system

$$Ax = b,$$

where $A$ is a nonsingular matrix, and with the closely related least squares problem

$$\text{minimize } \|b - Ax\|_2, \tag{1}$$

where $A$ is a general $m \times n$ matrix. A solution of the latter problem is given by $A^\dagger b$, where $A^\dagger$ is the pseudo-inverse of $A$ to be defined below. When $A$ is nonsingular, $A^\dagger = A^{-1}$, so that the pseudo-inverse solves the first problem as well.

This chapter begins with an introductory section, after which we will treat the perturbation of matrix inverses and pseudo-inverses and in consequence the solution of linear systems and linear least squares problems. As we have noted, the former is a special case of the latter, and in principle we could approach the subject by developing the perturbation theory of pseudo-inverses and least squares problems and note what happens when $m = n$. However, the perturbation of pseudo-inverses is complicated by the fact that the pseudo-inverse need not be a continuous function of its elements. We will therefore begin with the
simpler case of matrix inverses and linear systems. This approach also has the advantage of presenting some of the key ideas of perturbation theory in a comparatively simple setting.

1. The Pseudo-Inverse and Least Squares

1.1. Generalized Inverses and the Pseudo-Inverse

Let $A \in \mathbb{C}^{n\times n}$. It is well known that if $A$ is nonsingular then there is a unique matrix $X$ such that

$$AX = XA = I. \tag{1.1}$$

The matrix $X$ is called the inverse of $A$ and is denoted by $A^{-1}$. In this case the linear system $Ax = b$ has a unique solution $x = A^{-1}b$.

It is natural to attempt to generalize the idea of an inverse to the case where $A$ is singular or even fails to be square. This can be done by requiring $X$ to satisfy conditions that are less restrictive than (1.1). By varying the conditions, we can obtain many different "generalized inverses," each suited to its own application. In this book we will be particularly concerned with the geometry of $\mathbb{C}^n$, and the following PENROSE CONDITIONS are the most appropriate:

1. $AXA = A$,
2. $XAX = X$,
3. $(AX)^H = AX$,
4. $(XA)^H = XA$. (1.2)

Note that the first condition alone implies that $X = A^{-1}$ when $A$ is nonsingular. However, it does not define $X$ uniquely when $A$ is singular. We are therefore free to impose additional conditions. The conditions 2-4 have geometric implications, which we will explore in the next section.

It is customary to denote an "inverse" satisfying a subset of the Penrose conditions, say conditions $i$, $j$, and $k$, by writing $A^{(i,j,k)}$. Thus $A^{(1)}$ satisfies only the first condition. The matrix $A^{(1,2,3,4)}$, which satisfies all four, is written $A^\dagger$, and is called the MOORE-PENROSE GENERALIZED INVERSE or the PSEUDO-INVERSE of $A$.

There are explicit formulas for some of the generalized inverses generated from the Penrose conditions. Let

$$A = U \begin{pmatrix} \Sigma_+ & 0 \\ 0 & 0 \end{pmatrix} V^H \tag{1.3}$$

be the singular value decomposition of $A$.
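Let us seek $A^{(1)}$ in the form

$$A^{(1)} = V \begin{pmatrix} T & K \\ L & M \end{pmatrix} U^H.$$

By direct multiplication

$$AA^{(1)}A = U \begin{pmatrix} \Sigma_+ T \Sigma_+ & 0 \\ 0 & 0 \end{pmatrix} V^H,$$

and it follows from the first Penrose condition that $T = \Sigma_+^{-1}$. Thus any (1)-inverse has the form

$$A^{(1)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ L & M \end{pmatrix} U^H,$$

where $K$, $L$, and $M$ are arbitrary.

If we now seek a (1,2)-inverse in the form

$$A^{(1,2)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ L & M \end{pmatrix} U^H,$$

then by the second Penrose condition

$$A^{(1,2)} = A^{(1,2)}AA^{(1,2)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ L & L\Sigma_+K \end{pmatrix} U^H.$$

Thus any (1,2)-inverse has the form

$$A^{(1,2)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ L & M \end{pmatrix} U^H,$$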
where $K$ and $L$ are arbitrary and $L\Sigma_+K = M$.

In the same way we can prove that any (1,2,3)-inverse has the form

$$A^{(1,2,3)} = V \begin{pmatrix} \Sigma_+^{-1} & 0 \\ L & 0 \end{pmatrix} U^H,$$

with $L$ arbitrary; and any (1,2,4)-inverse has the form

$$A^{(1,2,4)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ 0 & 0 \end{pmatrix} U^H,$$

with $K$ arbitrary. Finally the pseudo-inverse has the form

$$A^\dagger = A^{(1,2,3,4)} = V \begin{pmatrix} \Sigma_+^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H. \tag{1.4}$$

These representations are independent of the choice of singular vectors (see the comments following Theorem I.4.1). In particular, since there is nothing arbitrary about (1.4), we have established the existence and uniqueness of the pseudo-inverse.

Theorem 1.1. Let $A \in \mathbb{C}^{m\times n}$. Then there is a unique matrix $X \in \mathbb{C}^{n\times m}$ that satisfies the Penrose conditions (1.2).

The following properties of the pseudo-inverse are easily established from (1.2) or (1.4).

Theorem 1.2. For any matrix $A$ the following hold.

1. $(A^\dagger)^\dagger = A$.
2. $(\bar{A})^\dagger = \overline{A^\dagger}$.
3. $(A^T)^\dagger = (A^\dagger)^T$.
4. $\mathrm{rank}(A) = \mathrm{rank}(A^\dagger) = \mathrm{rank}(AA^\dagger) = \mathrm{rank}(A^\dagger A)$.
5. $(AA^H)^\dagger = (A^H)^\dagger A^\dagger$, $(A^HA)^\dagger = A^\dagger (A^H)^\dagger$.
6. $(AA^H)^\dagger AA^H = AA^\dagger$, $(A^HA)^\dagger A^HA = A^\dagger A$.
7. If $A \in \mathbb{C}^{m\times n}$ has rank $n$, then $A^\dagger = (A^HA)^{-1}A^H$ and $A^\dagger A = I^{(n)}$.
8. If $A \in \mathbb{C}^{m\times n}$ has rank $m$, then $A^\dagger = A^H(AA^H)^{-1}$ and $AA^\dagger = I^{(m)}$.
9. If $A$ has the full rank factorization $A = PC^H$, where $\mathrm{rank}(P) = \mathrm{rank}(C) = \mathrm{rank}(A)$, then $A^\dagger = C(P^HAC)^{-1}P^H$ and $A^\dagger = (C^\dagger)^H P^\dagger$. In particular, for the singular value factorization $A = U_1\Sigma_+V_1^H$, $A^\dagger = V_1\Sigma_+^{-1}U_1^H$.
10. If $U$, $V$ are unitary matrices, then $(UAV)^\dagger = V^HA^\dagger U^H$.
11. If $A = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{C}^{m\times n}$, with $D = \mathrm{diag}(\delta_1, \ldots, \delta_r)$ and $\delta_i \ne 0$ for $i = 1, \ldots, r$, then $A^\dagger = \begin{pmatrix} D^{-1} & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{C}^{n\times m}$.

Theorem 1.2 shows that the pseudo-inverse has many properties in common with the ordinary inverse. However, it fails to share other properties. It is left as an exercise to construct examples to show that

1. $(AB)^\dagger$ is not necessarily the same as $B^\dagger A^\dagger$,
2. $AA^\dagger$ is not necessarily the same as $A^\dagger A$,
3. $(A^k)^\dagger$ is not necessarily the same as $(A^\dagger)^k$,
4. The nonzero eigenvalues of $A^\dagger$ are not necessarily the reciprocals of the nonzero eigenvalues of $A$.
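The construction (1.4) and the four negative statements above can be tried out on small matrices. The sketch below assumes NumPy; the helper names `pinv_from_svd` and `penrose_residuals`, the relative tolerance used to decide which singular values count as positive, and the particular $2 \times 2$ matrices are illustrative choices, not taken from the text. It is one possible set of examples for the exercise, not the only one.

```python
import numpy as np

def pinv_from_svd(A, rel_tol=1e-12):
    """Pseudo-inverse assembled as in (1.4): invert only the positive singular values.

    rel_tol is an assumed threshold for treating a computed singular
    value as zero; it is not part of the mathematical definition.
    """
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0])) if s.size and s[0] > 0 else 0
    return Vh[:r].conj().T @ np.diag(1.0 / s[:r]) @ U[:, :r].conj().T

def penrose_residuals(A, X):
    """Frobenius-norm residuals of the four Penrose conditions (1.2)."""
    AX, XA = A @ X, X @ A
    return (
        np.linalg.norm(AX @ A - A),
        np.linalg.norm(XA @ X - X),
        np.linalg.norm(AX.conj().T - AX),
        np.linalg.norm(XA.conj().T - XA),
    )

# A singular 2 x 2 example (A happens to be idempotent: A @ A == A).
A = np.array([[1.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0]])
Ap, Bp = pinv_from_svd(A), pinv_from_svd(B)

residuals = penrose_residuals(A, Ap)

# The four properties that fail for the pseudo-inverse.
product_rule_fails = not np.allclose(pinv_from_svd(A @ B), Bp @ Ap)
projectors_differ = not np.allclose(A @ Ap, Ap @ A)
power_rule_fails = not np.allclose(pinv_from_svd(A @ A), Ap @ Ap)
largest_eig_A = max(abs(np.linalg.eigvals(A)))
largest_eig_Ap = max(abs(np.linalg.eigvals(Ap)))
eig_rule_fails = not np.isclose(largest_eig_Ap, 1.0 / largest_eig_A)

print("Penrose residuals:", residuals)
print(product_rule_fails, projectors_differ, power_rule_fails, eig_rule_fails)
```

For this $A$ the pseudo-inverse has columns $(1/2, 1/2)^T$ and $0$, which makes each failure easy to confirm by hand.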
1.2. Projections and Least Squares

As we saw in Section I.2, any solution of the problem (1) of minimizing $\|b - Ax\|_2$ must satisfy $Ax = P_A b$, where $P_A$ is the orthogonal projection onto the column space of $A$. It turns out that this projection can be expressed in terms of the pseudo-inverse of $A$.

Theorem 1.3. For any matrix $A$,

1. $P_A = AA^\dagger$ is the orthogonal projector onto $\mathcal{R}(A)$,
2. $P_{A^H} = A^\dagger A$ is the orthogonal projector onto $\mathcal{R}(A^H)$,
3. $I - P_{A^H}$ is the orthogonal projector onto $\mathcal{N}(A)$.

Proof. From the Penrose conditions, $AA^\dagger$ is a Hermitian idempotent and hence is the orthogonal projector onto $\mathcal{R}(AA^\dagger) \subset \mathcal{R}(A)$. Since $A = (AA^\dagger)A$, we have $\mathrm{rank}(AA^\dagger) \ge \mathrm{rank}(A)$. Hence $\mathcal{R}(AA^\dagger) = \mathcal{R}(A)$, which establishes Part 1.

Part 2 follows from Part 1 on observing

$$P_{A^H} = A^H(A^H)^\dagger = [A^H(A^H)^\dagger]^H = A^\dagger A.$$

To establish Part 3, note that by Part 2

$$A(I - P_{A^H}) = A - AA^\dagger A = A - A = 0,$$

so that $I - P_{A^H}$ is an orthogonal projector into $\mathcal{N}(A)$. But if $Ax = 0$, then

$$(I - P_{A^H})x = x - A^\dagger Ax = x,$$

so that $I - P_{A^H}$ projects onto $\mathcal{N}(A)$. ∎

The characterization $Ax = P_A b$ of the solution of the least squares problem (1) is not sufficient to determine $x$ when $A$ is not of full column rank. The following theorem gives a complete characterization.

Theorem 1.4 (Penrose). The solutions of the problem (1) have the general form

$$x = A^\dagger b + (I - P_{A^H})z, \tag{1.5}$$

where $z$ is arbitrary. Of all the solutions, $A^\dagger b$ has the smallest 2-norm.

Proof. By Theorem I.2.5, any minimizing vector $x$ must satisfy $Ax = P_A b$. Since $A(A^\dagger b) = P_A b$, the vector $x = A^\dagger b$ is one solution. Let us now seek a general solution in the form

$$x = A^\dagger b + y. \tag{1.6}$$

Since

$$Ay = A(x - A^\dagger b) = Ax - P_A b = 0,$$

we have $y \in \mathcal{N}(A)$. By Theorem 1.3, $y = (I - P_{A^H})z$, where $z$ is arbitrary. This establishes (1.5).

Since $A^\dagger b$ is orthogonal to $(I - P_{A^H})z$, from

$$\|x\|_2^2 = \|A^\dagger b\|_2^2 + \|(I - P_{A^H})z\|_2^2$$

we see that $\|x\|_2$ is minimal when $z = 0$. ∎
When $A$ has full column rank, then $P_{A^H} = I$, and the solution of (1) is unique.

We conclude this subsection with two sets of equations that are always satisfied by least squares solutions. The first are the classic NORMAL EQUATIONS. The second, which involves both the solution and the residual, are called the EXPANDED EQUATIONS and are useful in a number of applications. The proof of the following theorem is left as an exercise.

Theorem 1.5. Let $x$ be a solution of the least squares problem (1), and let $r = b - Ax$ be the associated RESIDUAL VECTOR. Then

$$A^H A x = A^H b$$

and

$$\begin{pmatrix} I & A \\ A^H & 0 \end{pmatrix} \begin{pmatrix} r \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}. \tag{1.7}$$
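Both sets of equations in Theorem 1.5 can be verified in exact rational arithmetic on a small example. The sketch below is an illustration under stated assumptions: the $3 \times 2$ real matrix $A$ and the vector $b$ are made up for the purpose, the normal equations are solved by Cramer's rule (adequate only for this $2 \times 2$ case), and the block matrix `K` is the coefficient matrix of the expanded equations in the form $\begin{pmatrix} I & A \\ A^H & 0 \end{pmatrix}$.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(M, N):
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

# Illustrative full-column-rank data (assumptions, not from the text).
A = [[Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(2)]]
b = [Fraction(1), Fraction(2), Fraction(2)]

At = transpose(A)          # A is real, so A^H = A^T
G = matmul(At, A)          # A^H A
c = matvec(At, b)          # A^H b

# Solve the 2x2 normal equations G x = c by Cramer's rule.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
x = [(c[0] * G[1][1] - G[0][1] * c[1]) / det,
     (G[0][0] * c[1] - G[1][0] * c[0]) / det]

# Residual vector r = b - A x.
r = [bi - ai for bi, ai in zip(b, matvec(A, x))]

# Expanded equations: [[I, A], [A^H, 0]] (r; x) = (b; 0).
m, n = len(A), len(A[0])
K = [[Fraction(int(i == j)) for j in range(m)] + A[i] for i in range(m)] \
    + [At[j] + [Fraction(0)] * n for j in range(n)]
lhs = matvec(K, r + x)
rhs = b + [Fraction(0)] * n

print("x =", x)
print("r =", r)
print("expanded equations hold:", lhs == rhs)
```

The second block row of the expanded equations is simply the statement $A^H r = 0$, which is the normal equations rewritten in terms of the residual.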
Notes and References

Although Gauss [84, 1821] exhibited the rows of the pseudo-inverse to prove his celebrated minimum variance theorem (see Exercise 1.9), he had no concept of the pseudo-inverse as a matrix or an operator, and it would be mistaken to impute the notion to him. The true fathers of the pseudo-inverse are Moore [162, 1920], Bjerhammar [34, 1951] (the full rank case), Penrose [178, 179, 1955, 1956], and to a lesser extent Bergman et al. [26, 1950], who introduced the (1,2)-inverse of a symmetric matrix and specialized it to the pseudo-inverse.

Penrose, perhaps because of his elegant algebraic characterization of the pseudo-inverse, touched off a vogue in the subject of generalized inverses. By 1976, Nashed and Rall [163] were able to compile a bicentennial bibliography running to 1776 entries, in which they all but say that a number of these papers contributed more to the promotion of their authors than to the promotion of science. Things have settled down since then, and now is a good time to sift the residuum.

Any attempt to use a generalized inverse in the case where the matrix is not of full rank must come to grips with the fact that under such circumstances the generalized inverse is not a continuous function of its elements. This observation was first made by Penrose [178] for the pseudo-inverse, but it is true of any (1)-inverse.* Thus, to use a pseudo-inverse in practice, one must determine the rank of the matrix and project the errors appropriately. Unfortunately, determining the rank of a matrix in the presence of errors is a very difficult problem (e.g., see [211]). For this reason, papers about the applications of generalized inverses have a certain theoretical air about them: they leave you all dressed up with nowhere to go.

The clear winner in the generalized inverse sweepstakes is the pseudo-inverse applied to full rank problems.
It is unique, continuous, and computable (although one seldom has need of an explicit pseudo-inverse). Moreover, its connection with orthogonal projections makes it useful in discussing the geometry of $n$-space. A distant second is the Drazin generalized inverse [61, 1958] (see Exercise 1.23), which enters into the perturbation theory of eigenvalues and eigenvectors, one of the few cases in which we know the rank a priori (see Example V.2.10).

* This follows from the fact that $\mathrm{rank}(A) = \mathrm{trace}(A^{(1)}A)$ and that the trace of a matrix, being an integer, is discontinuous unless the matrix is of full rank.

The principle of least squares was used by Gauss in astronomical calculations in the 1790s. However, Legendre [143, 1805] first published the method, and Gauss's subsequent claim to it [82, 83, 1809] sparked a famous priority dispute (for a discussion and further references see [219]).

The applications of least squares are far too many to survey here. For the statistician's point of view, see [196]. [Warning: the notation is quite different. A statistician does a "regression analysis," which is usually written in the form $y = X\beta + e$, where $X$ has $n$ rows and typically $p$ columns.] For the numerical analyst's point of view, see [38, 142].

The least squares problem can be generalized in a number of ways. One is to replace the 2-norm by an elliptic norm (see Corollary II.1.5 and Exercise II.1.16). Actually this can be done in two ways, since we can generalize Theorem 1.4 by requiring that $x$ be the solution of minimal $T$-norm that minimizes $\|b - Ax\|_S$. The theory of these problems can be approached through elliptic pseudo-inverses (see Exercises 1.13-1.15). However, the numerical solution of such problems requires a different approach (see [171, 172]).

Another generalization is to require that the solution satisfy a linear equality constraint.
This problem can also be approached via generalized inverses, but the returns on this approach are not yet in. See [64] for details and further references.

Exercises

1. (Moore's characterization [162]). Show that if

   1. $AXA = A$,
   2. $XAX = X$,
   3. $\mathcal{R}(A) = \mathcal{R}(X^H)$,
   4. $\mathcal{R}(A^H) = \mathcal{R}(X)$,

   then $X = A^\dagger$.

2. (Bjerhammar's characterization [34]). Let $A$ have full column rank and let $B$ be any matrix such that $A^H B = 0$ and $(A \;\; B)$ is nonsingular. If

   $$\begin{pmatrix} X^H \\ Y^H \end{pmatrix} = (A \;\; B)^{-1},$$

   then $X^H = A^\dagger$.
3. Show that if $A^{(1)}$ is a 1-inverse, then $\mathrm{rank}(A) \le \mathrm{rank}(A^{(1)})$. Moreover $AA^{(1)}$ is an (oblique) projection onto the column space of $A$. If in addition $A^{(1)}$ is a 3-inverse, then the projection is orthogonal.

4. Let $A$ have the singular value decomposition (1.3). Determine conditions on $T$, $K$, $L$, and $M$ such that

   $$X = V \begin{pmatrix} T & K \\ L & M \end{pmatrix} U^H$$

   is a (1,3)-inverse.

5. Establish the existence and uniqueness of the pseudo-inverse directly from Penrose's conditions.

6. Show that if $A$ has full column rank and $A = QR$ is the QR factorization of $A$, then $A^\dagger = R^{-1}Q^H$.

7. Show that if $A$ is of full rank then $\inf_2(A) = \|A^\dagger\|_2^{-1}$. What is $\|A^\dagger\|_2^{-1}$ when $A$ is not of full rank?

8. Prove Theorem 1.2.

9. (Gauss [84, 1821]). Let $A$ have full rank, and let $a_i^{(\dagger)H}$ denote the $i$th row of $A^\dagger$. Show that

   $$\|a_i^{(\dagger)}\|_2 = \min_{z^H A = \mathbf{1}_i^T} \|z\|_2.$$

   Conclude that among all matrices $Z$ satisfying $Z^H A = I$ (i.e., among all LEFT INVERSES $Z^H$ of $A$) the pseudo-inverse has minimal Frobenius norm. [Note: Gauss's application was the following. If $b = Ax + e$, where $e$ is a vector of uncorrelated random variables with mean zero and variance $\sigma^2$, then any vector satisfying $z^H A = \mathbf{1}_1^T$ yields an unbiased estimate $z^H b$ of the first component of $x$. Since the variance of $z^H b$ is $\sigma^2\|z\|_2^2$, the least squares estimate is the minimum variance estimate. This result is sometimes called the Gauss-Markov theorem, although the attribution to Markov is spurious.]

10. (Penrose [178]). Show that the equation $AXB = C$ has a solution if and only if $AA^{(1)}CB^{(1)}B = C$, in which case the most general form of the solution is

    $$X = A^{(1)}CB^{(1)} + Y - A^{(1)}AYBB^{(1)},$$

    where $Y$ is arbitrary.

11. Show that if $A$ is normal, then $(A^n)^\dagger = (A^\dagger)^n$.

12. Show that

    $$A^\dagger = \lim_{\tau \to 0} (\tau I + A^H A)^{-1} A^H = \lim_{\tau \to 0} A^H (\tau I + AA^H)^{-1}.$$

THE FOLLOWING EXERCISES SHOW HOW TO GENERALIZE THE NOTION OF PSEUDO-INVERSE TO THE CASE OF ELLIPTIC NORMS.

13.
Let $S$ and $T$ be positive definite matrices of orders $m$ and $n$, and let the last two Penrose conditions be replaced by

    3'. $SAX$ is Hermitian,
    4'. $XAT$ is Hermitian.

    Show that $AX$ is the projection onto $\mathcal{R}(A)$ that is orthogonal with respect to the inner product $(x, y)_S = y^H S x$. What is $XA$? The matrix $X \equiv A^{(S,T)}$ is called an ELLIPTIC or WEIGHTED PSEUDO-INVERSE.

14. Show that the most general solution of the problem of minimizing $\|b - Ax\|_S$ is

    $$x = A^{(S,T)}b + (I - A^{(S,T)}A)z,$$

    where $z$ is arbitrary. In particular $A^{(S,T)}b$ has minimal $T$-norm.

15. Show that if $A$ is of full column rank, then

    $$A^{(S,T)} = (A^H S A)^{-1} A^H S.$$

    In this case we write $A^{(S,T)} = A^{(S)}$.

16. (Paige [171]). Let $W$ be positive definite and let $W = LL^H$. Show that the problem of minimizing $\|b - Ax\|_{W^{-1}}$ is equivalent to the problem

    minimize $\|v\|_2$ subject to $b = Ax + Lv$.

    The importance of this result is that it works when $W$ is singular: simply let $W = LL^H$ be a full rank factorization of $W$.

-o-

WHEN $S$ AND $T$ ARE DIAGONAL THE ELLIPTIC PSEUDO-INVERSES ARE CALLED SCALED PSEUDO-INVERSES. IN THE FOLLOWING EXERCISES WE WILL ASSUME THAT $A$ IS OF FULL RANK AND $D \in \mathcal{D}_+$, THE SET OF ALL DIAGONAL MATRICES WITH POSITIVE DIAGONAL ELEMENTS.
17. (Stewart [216]). Let $\mathcal{X} = \{x \in \mathcal{R}(A) : \|x\| = 1\}$ and $\mathcal{Y} = \{y : \exists D \in \mathcal{D}_+ \text{ such that } A^T D y = 0\}$. Let

    $$\rho = \inf_{y \in \mathcal{Y},\; x \in \mathcal{X}} \|y - x\|.$$

    Show that

    $$\sup_{D \in \mathcal{D}_+} \|AA^{(D)}\|_2 \le \rho^{-1}$$

    and

    $$\sup_{D \in \mathcal{D}_+} \|A^{(D)}\| \le \rho^{-1}\|A^\dagger\|.$$

18. (Stewart [216]). For any matrix $X$ let $\inf_+(X)$ be the smallest nonzero singular value of $X$. Let the columns of $U$ form an orthonormal basis for $\mathcal{R}(A)$. Let $\bar\rho = \min \inf_+(U_J)$, where $U_J$ denotes any submatrix formed from a set of rows of $U$. Show that $\rho \ge \bar\rho$.

19. (O'Leary [165]). Show that $\rho \le \bar\rho$.

20. Devise an example to show that the above results do not hold when $D$ ranges over the space of positive definite matrices.

-o-

THE FOLLOWING EXERCISES TREAT THE LEAST SQUARES PROBLEM WITH LINEAR EQUALITY CONSTRAINTS:

$$\text{minimize } \|b_2 - A_2 x\|_2 \quad \text{subject to } A_1 x = b_1. \tag{1.8}$$

21. Let $A_1$ have full row rank and $A_2$ have full column rank. Let

    $$A_\tau = \begin{pmatrix} \tau A_1 \\ A_2 \end{pmatrix} \quad \text{and} \quad b_\tau = \begin{pmatrix} \tau b_1 \\ b_2 \end{pmatrix}.$$

    Show that

    $$x = \lim_{\tau \to \infty} A_\tau^\dagger b_\tau$$

    is the solution of (1.8). [Note: This result suggests that we can solve a constrained least squares problem by taking $\tau$ large enough and solving an ordinary least squares problem. In essence this is true, but precautions must be taken against the effect of rounding errors. See [249] for more.]

22. (Wedin [262]). Show that if $A_1$ has full row rank, then the constrained problem (1.8) has a unique solution which satisfies ( A\' o I AI : )( ) (:J where I! is a vector and ( r) (::) - ( : ) x.

-o-

THE FOLLOWING EXERCISES DEVELOP THE DRAZIN GENERALIZED INVERSE [61]. DRAZIN ORIGINALLY DEFINED HIS INVERSE FOR ELEMENTS OF RINGS AND SEMIGROUPS. HERE WE APPROACH THE DRAZIN INVERSE THROUGH THE JORDAN CANONICAL FORM.

23. Let the Jordan form of $A$ be written

    $$A = X_1 J_{k_1}(\lambda_1) Y_1^H + X_2 J_{k_2}(\lambda_2) Y_2^H + \cdots + X_l J_{k_l}(\lambda_l) Y_l^H \tag{1.9}$$

    (see the subsection on invariant subspaces in Section I.3). Define

    $$J_k(\lambda)^\# = \begin{cases} J_k(\lambda)^{-1} & \text{if } \lambda \ne 0, \\ 0 & \text{if } \lambda = 0, \end{cases}$$

    and

    $$A^\# = X_1 J_{k_1}(\lambda_1)^\# Y_1^H + X_2 J_{k_2}(\lambda_2)^\# Y_2^H + \cdots + X_l J_{k_l}(\lambda_l)^\# Y_l^H.$$

    Show that $A^\#$ is the unique matrix satisfying
    1. $A^\# A = AA^\#$,
    2. $A^\# = (A^\#)^2 A$,
    3. there is an integer $m$ such that $A^m = A^{m+1}A^\#$.

    The matrix $A^\#$ is called the DRAZIN GENERALIZED INVERSE of $A$.

24. Let $A$ have the Jordan form (1.9) and let $\lambda$ be an eigenvalue of $A$. The invariant subspace corresponding to $\lambda$ is the space

    $$\bigoplus_{\lambda_j = \lambda} \mathcal{R}(X_j).$$

    Show that

    $$P_\lambda = I - (\lambda I - A)(\lambda I - A)^\#$$

    is a (generally oblique) projection onto the invariant subspace associated with $\lambda$. What subspace does it project along? The matrix $P_\lambda$ is called the SPECTRAL PROJECTION of $A$.

-o-

2. Inverses and Linear Systems

In this section we will be concerned with two related problems. Let $A \in \mathbb{C}^{n \times n}$ be nonsingular and let $\tilde A = A + E$ be a perturbation of $A$. The first problem is to determine under what conditions $\tilde A$ is nonsingular and to bound a norm of $\tilde A^{-1} - A^{-1}$. The second problem is to derive bounds on some norm of $\tilde x - x$, where $x$ is a solution of the linear system $Ax = b$ and $\tilde x$ is the solution of $\tilde A \tilde x = b$. The close relation between the two problems is due to the fact that $\tilde x - x = (\tilde A^{-1} - A^{-1})b$.

In the statement of these problems, note the use of a tilde to denote a perturbed quantity. We will use this notational convention throughout the book:

    A symbol with a tilde over it always denotes a perturbed quantity. The unperturbed quantity is denoted by the same symbol without a tilde.

Here we must distinguish between primary perturbations and derived perturbations. An example of the former is $\tilde A$, which is obtained from $A$ by the addition of an explicit perturbation $E$. The latter is represented by $\tilde x$, which is not explicitly perturbed but depends on the primary perturbation $E$. In perturbation analysis our goal is usually to obtain bounds on derived perturbations in terms of an explicit perturbation; e.g., to bound some norm of $\tilde x - x$ in terms of some norm of $E$.
Generally, we will represent explicit perturbations by the letters $E$, $F$, $G$, $H$ or their lower-case and Greek analogues. Usually, but not always, these letters will carry the implication that the perturbation is in some sense small.

2.1. Absolute and Relative Errors

Before we can proceed to the perturbation theory of linear systems, we must first discuss how errors are to be represented. For scalars there are two representations in general use.

Definition 2.1. Let $\alpha, \tilde\alpha \in \mathbb{C}$. The ABSOLUTE ERROR or simply the ERROR in $\tilde\alpha$ regarded as an approximation to $\alpha$ is the number

$$\mathrm{ae}(\alpha, \tilde\alpha) = |\tilde\alpha - \alpha|.$$

If $\alpha \ne 0$, then the RELATIVE ERROR in $\tilde\alpha$ is

$$\mathrm{re}(\alpha, \tilde\alpha) = \frac{|\tilde\alpha - \alpha|}{|\alpha|}.$$

The absolute and relative errors have a number of elementary properties, which we list here without proof.

Theorem 2.2. In the notation of Definition 2.1:

1. There is a number $\epsilon$ with $|\epsilon| = \mathrm{ae}(\alpha, \tilde\alpha)$ such that $\tilde\alpha = \alpha + \epsilon$.
2. There is a number $\rho$ with $|\rho| = \mathrm{re}(\alpha, \tilde\alpha)$ such that $\tilde\alpha = \alpha(1 + \rho)$.
3. If $\mathrm{re}(\alpha, \tilde\alpha) < 1$, then

   $$\frac{\mathrm{re}(\alpha, \tilde\alpha)}{1 + \mathrm{re}(\alpha, \tilde\alpha)} \le \mathrm{re}(\tilde\alpha, \alpha) \le \frac{\mathrm{re}(\alpha, \tilde\alpha)}{1 - \mathrm{re}(\alpha, \tilde\alpha)}.$$
4. If $\mathrm{re}(\alpha, \tilde\alpha) = 10^{-t}$, then $\alpha$ and $\tilde\alpha$ agree to approximately $t$ decimal digits.

The first item in the theorem says that if the absolute error is small, then $\tilde\alpha$ may be obtained from $\alpha$ by adding a number near zero. Similarly, if the relative error is small, then $\tilde\alpha$ may be obtained from $\alpha$ by multiplying by a number near one. The third item says that if the relative error is small, then for practical purposes $\mathrm{re}(\alpha, \tilde\alpha)$ and $\mathrm{re}(\tilde\alpha, \alpha)$ are the same. The last item gives a quick way of estimating relative error; e.g., we see from it that 3.14159, regarded as an approximation to $\pi$, has a relative error of about $10^{-6}$.

It is natural to attempt to generalize absolute and relative error to vectors and matrices by replacing the absolute value by a norm. The result is the following definition, which, however, is not without its difficulties.

Definition 2.3. Let $A, \tilde A \in \mathbb{C}^{m \times n}$ and let $\|\cdot\|$ be a norm on $\mathbb{C}^{m \times n}$. The ABSOLUTE ERROR or simply the ERROR in $\tilde A$ regarded as an approximation to $A$ is the number

$$\mathrm{ae}(A, \tilde A) = \|\tilde A - A\|.$$

If $A \ne 0$, then the RELATIVE ERROR in $\tilde A$ is

$$\mathrm{re}(A, \tilde A) = \frac{\|\tilde A - A\|}{\|A\|}.$$

The following theorem lists the analogues of the items in Theorem 2.2.

Theorem 2.4. In the notation of Definition 2.3:

1. There is a matrix $E$ with $\|E\| = \mathrm{ae}(A, \tilde A)$ such that $\tilde A = A + E$.
2. If $\|\cdot\|$ is consistent and there is a matrix $R$ such that $\tilde A = A(I + R)$, then $\mathrm{re}(A, \tilde A) \le \|R\|$.
3. If $\mathrm{re}(A, \tilde A) < 1$, then

   $$\frac{\mathrm{re}(A, \tilde A)}{1 + \mathrm{re}(A, \tilde A)} \le \mathrm{re}(\tilde A, A) \le \frac{\mathrm{re}(A, \tilde A)}{1 - \mathrm{re}(A, \tilde A)}.$$

The difficulty with Definition 2.3 is that it attempts to characterize the errors in the $mn$ elements of $A$ by a single number. Some information has to be lost in the process, and that information may be important. This is illustrated by comparing the second items in Theorems 2.2 and 2.4.
For scalars, the statement $\rho = \mathrm{re}(\alpha, \tilde\alpha)$ means that $\tilde\alpha$ is obtained from $\alpha$ by multiplying by a number within $\rho$ of one. For matrices, we would like to say that $\rho = \mathrm{re}(A, \tilde A)$ means that $\tilde A = A(I + R)$, where $\|R\| \le \rho$. However, if $A$ is singular such an $R$ may not exist, and if $A$ is nonsingular the most we can say about $R$ is that $\|R\| \le \rho\|A\|\|A^{-1}\|$ (see Exercise 2.1). Thus to report the fact that $\tilde A = A(I + R)$ by saying that the relative error in $\tilde A$ is $\|R\|$ is to give away information.

The absence of a fourth item from Theorem 2.4 illustrates another loss of information. For example, suppose that with respect to the $\infty$-norm we have $\mathrm{re}_\infty(x, \tilde x) = 10^{-t}$. Then we know that the largest components of $x$ and $\tilde x$ agree to roughly $t$ decimal digits. But the best thing we can say about the smaller components is that if $|\xi_i| = 10^{-k}\|x\|_\infty$, then $\xi_i$ and $\tilde\xi_i$ agree to about $t - k$ significant figures. In particular, if $k > t$ then the relative error says nothing at all about the accuracy of $\tilde\xi_i$.

As we shall see, structured perturbation theorems, such as Theorem 2.14, provide partial relief from these problems, as do componentwise bounds (Theorem 2.12). Nonetheless, we will cast many of our results in terms of absolute and relative errors of vectors and matrices. In the first place, bounds of this form do say something about the larger components. Moreover, the use of absolute and relative errors gives perturbation bounds a simplicity that makes them easier to interpret. Finally, in applications we can often scale the problem so that the components of the quantity being bounded are approximately equal, in which case a bound on the relative error is as good as bounds on the individual components.

2.2. The Inverse Matrix

The fundamental results on matrix inverses are contained in the following theorem.
Theorem 2.5. Let $A \in \mathbb{C}^{n \times n}$ be nonsingular and let $\tilde A = A + E$ be a perturbation of $A$. Let $\|\cdot\|$ be a consistent matrix norm. If $\tilde A$ is nonsingular, then

$$\frac{\|\tilde A^{-1} - A^{-1}\|}{\|\tilde A^{-1}\|} \le \|A^{-1}E\|. \tag{2.1}$$

Alternatively, if $\|A^{-1}E\| < 1$, then $\tilde A$ is perforce nonsingular and

$$\|\tilde A^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}E\|}. \tag{2.2}$$

Moreover

$$\frac{\|\tilde A^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}. \tag{2.3}$$

Remark 2.6. The theorem remains valid if $A^{-1}E$ is everywhere replaced by $EA^{-1}$.

Proof. Since $\tilde A$ is nonsingular, $\tilde A \tilde A^{-1} = (A + E)\tilde A^{-1} = I$, or $(I + A^{-1}E)\tilde A^{-1} = A^{-1}$. Hence

$$\tilde A^{-1} - A^{-1} = -A^{-1}E\tilde A^{-1}, \tag{2.4}$$

and (2.1) follows on taking norms in (2.4).

If $\|A^{-1}E\| < 1$, the spectral radius of $A^{-1}E$ is less than one and $I + A^{-1}E$ is nonsingular. From $\tilde A = A(I + A^{-1}E)$ it follows that $\tilde A$ is nonsingular. Moreover, from (2.4) we also have

$$\|\tilde A^{-1}\| \le \|A^{-1}\| + \|A^{-1}E\|\|\tilde A^{-1}\|,$$

and (2.2) follows on solving this inequality for $\|\tilde A^{-1}\|$. Finally, (2.3) follows from (2.1) and the third item in Theorem 2.4. ∎

This theorem establishes four things:

1. a bound on $\mathrm{re}(\tilde A^{-1}, A^{-1})$ which holds whenever $\tilde A$ is nonsingular;
2. a condition under which $\tilde A$ is nonsingular;
3. a bound on $\|\tilde A^{-1}\|$;
4. a bound on $\mathrm{re}(A^{-1}, \tilde A^{-1})$.

All these are cast in terms of the number $\|A^{-1}E\|$ (or, by Remark 2.6, the number $\|EA^{-1}\|$). However, in many applications we will not know $E$ explicitly, only an estimate of $\|E\|$. The following corollary, which also defines the condition number of a matrix, answers to these applications. It is proved by replacing $\|A^{-1}E\|$ by the upper bound $\|A^{-1}\|\|E\|$ in the conclusions of Theorem 2.5.

Corollary 2.7. Let $\kappa(A) = \|A\|\|A^{-1}\|$ be the CONDITION NUMBER of $A$. If $\tilde A$ is nonsingular, then

$$\frac{\|\tilde A^{-1} - A^{-1}\|}{\|\tilde A^{-1}\|} \le \kappa(A)\frac{\|E\|}{\|A\|}.$$

If in addition

$$\kappa(A)\frac{\|E\|}{\|A\|} < 1,$$

then $\tilde A$ is perforce nonsingular and

$$\|\tilde A^{-1}\| \le \frac{\|A^{-1}\|}{1 - \kappa(A)\dfrac{\|E\|}{\|A\|}}. \tag{2.5}$$

Moreover

$$\frac{\|\tilde A^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \frac{\kappa(A)\dfrac{\|E\|}{\|A\|}}{1 - \kappa(A)\dfrac{\|E\|}{\|A\|}}. \tag{2.6}$$

It is instructive to look at the inequality (2.6) in greater detail. The left-hand side is the relative error (with respect to the norm $\|\cdot\|$) of $\tilde A^{-1}$ regarded as an approximation to $A^{-1}$. Its bound on the right-hand side consists of two factors. The first is the relative error $\|E\|/\|A\|$ in $\tilde A$ regarded as an approximation to $A$. The second is the quotient

$$\frac{\kappa(A)}{1 - \kappa(A)\dfrac{\|E\|}{\|A\|}}.$$

If $\kappa(A)\|E\|/\|A\|$ is much less than one, as it will be when the error $E$ is small enough, the denominator has negligible effect, and the second factor is essentially $\kappa(A)$. Thus, the inequality (2.6) says that a relative error in the matrix $A$ may be magnified by as much as $\kappa(A)$ in its inverse.
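To see the magnification in (2.6) at work, one can compare the two sides of the inequality on a small ill-conditioned matrix. The following is a minimal sketch under stated assumptions: the $2 \times 2$ matrix `A`, the perturbation `E`, and the helpers `inv2`, `norm_inf`, `add`, and `sub` are illustrative choices, and the $\infty$-norm (maximum absolute row sum) is used because it is a consistent norm that is easy to evaluate by hand.

```python
def inv2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(M):
    """Infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def add(M, N):
    return [[m + n for m, n in zip(rm, rn)] for rm, rn in zip(M, N)]

def sub(M, N):
    return [[m - n for m, n in zip(rm, rn)] for rm, rn in zip(M, N)]

# Illustrative nearly singular matrix and small perturbation (assumptions).
A = [[1.0, 0.99], [0.99, 0.98]]
E = [[1e-6, -1e-6], [-1e-6, 1e-6]]
A_tilde = add(A, E)

A_inv = inv2(A)
A_tilde_inv = inv2(A_tilde)

kappa = norm_inf(A) * norm_inf(A_inv)          # condition number kappa(A)
rel_E = norm_inf(E) / norm_inf(A)              # relative error in A
rel_inv = norm_inf(sub(A_tilde_inv, A_inv)) / norm_inf(A_inv)
bound = kappa * rel_E / (1.0 - kappa * rel_E)  # right-hand side of (2.6)

print("kappa(A)                 =", kappa)
print("relative error in A      =", rel_E)
print("relative error in A^-1   =", rel_inv)
print("bound (2.6)              =", bound)
```

For this particular perturbation the relative error in the inverse exceeds the relative error in the matrix by a factor close to the condition number, so the bound is nearly attained; other perturbations of the same size may be magnified much less.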
The word "magnify" is appropriate here because

$$1 \le \|I\| = \|AA^{-1}\| \le \|A\|\|A^{-1}\| = \kappa(A).$$

It is difficult to overstate the insight that one gets from this bound. Its precursor, the inequality (2.3), suggests that the inverse of a matrix will be WELL CONDITIONED, that is, insensitive to perturbations, provided its inverse is sufficiently small. But it does not say what "sufficiently small" is. The bound (2.6), on the other hand, makes it clear that a well conditioned matrix is one with a small condition number. And small is well defined, since the condition number is bounded below by unity. For example, to the extent that Item 4 in Theorem 2.2 holds for relative errors in matrices, we have the following rule of thumb:

    If a matrix has a condition number of $10^k$ and its elements are perturbed in their $t$-th digits, then the elements of its inverse will be perturbed in their $(t - k)$-th digits.

People often say that ill-conditioned matrices are nearly singular. The following theorem gives substance to this way of speaking.

Theorem 2.8. Let $A$ be a nonsingular matrix. Let $\kappa(A) = \|A\|\|A^{-1}\|$ be the condition number of $A$ with respect to a consistent matrix norm $\|\cdot\|$. Then for any matrix $E$,

$$A + E \text{ is singular} \implies \frac{\|E\|}{\|A\|} \ge \kappa^{-1}(A). \tag{2.7}$$

Moreover, if the norm $\|\cdot\|$ is subordinate to a vector norm (also written $\|\cdot\|$), then there is a matrix $E$ with $\|E\|/\|A\| = \kappa^{-1}(A)$ such that $A + E$ is singular.

Proof. If $A + E$ is singular, then by Theorem 2.5

$$1 \le \|A^{-1}E\| \le \|A^{-1}\|\|E\|,$$

which with a little manipulation is seen to be equivalent to the implicand in (2.7).

Now suppose that $\|\cdot\|$ is subordinate to the vector norm $\|\cdot\|$, and let $x$ be a vector of norm one such that $\|A^{-1}x\| = \|A^{-1}\|$. Let $y = A^{-1}x/\|A^{-1}\|$, so that

$$Ay = \frac{x}{\|A^{-1}\|}.$$

Let $\|\cdot\|_*$ be the norm dual to the vector norm $\|\cdot\|$, and let $z$ be a vector such that $\|z\|_* = 1$ and

$$z^H y = \max_{\|u\|_* = 1} |u^H y|.$$

It follows from Theorem II.1.12 that $z^H y = \|y\| = 1$. Let

$$E = -\frac{xz^H}{\|A^{-1}\|}.$$

Then

$$(A + E)y = Ay - \frac{xz^H y}{\|A^{-1}\|} = \frac{x}{\|A^{-1}\|} - \frac{x}{\|A^{-1}\|} = 0,$$

so that $A + E$ is singular. The theorem will be proved if we can show that $\|E\| = \|A^{-1}\|^{-1}$. But

$$\|E\| = \max_{\|v\|=1} \frac{\|(xz^H)v\|}{\|A^{-1}\|} = \frac{\|x\|}{\|A^{-1}\|} \max_{\|v\|=1} |z^H v| = \|x\|\|z\|_*\|A^{-1}\|^{-1},$$

and the result follows from the fact that $x$ and $z$ have norm one. ∎

The first part of this theorem states that to make a matrix $A$ singular we must introduce a relative perturbation of at least $\kappa^{-1}(A)$. Thus, well-conditioned matrices are not nearly singular. The second part says that for a broad class of norms, which includes the Hölder norms, we can make $A$ singular by introducing a relative perturbation of $\kappa^{-1}(A)$. In these norms, the larger the condition number, the nearer to singularity is the matrix.

In general, the condition number is not easy to characterize. However, for the spectral norm, the condition number can be expressed in terms of the singular values. The proof of the following theorem is left as an exercise.

Theorem 2.9. Let $A$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n > 0$. Then

$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n}.$$
As modern numerical linear algebra began to develop in the late 1940s, positive definite matrices were very much to the fore. Since the singular values and the eigenvalues of a positive definite matrix are the same, the condition number was sometimes defined as the ratio of the largest to the smallest eigenvalue. Certainly if this ratio is large, the matrix must be ill conditioned (see Exercise 2.4). However, the following example shows that it can fail completely to indicate ill-conditioning.

Example 2.10. Let $A_n$ be the matrix illustrated below for $n = 5$:

$$A_5 = \begin{pmatrix} 1 & -1 & -1 & -1 & -1 \\ 0 & 1 & -1 & -1 & -1 \\ 0 & 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$

The eigenvalues of $A_n$ are all one; hence the ratio of the largest to the smallest eigenvalue is one. On the other hand it is easily seen from the equation

$$\begin{pmatrix} 1 & -1 & -1 & -1 & -1 \\ 0 & 1 & -1 & -1 & -1 \\ 0 & 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 1 & -1 \\ -2^{-3} & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \tfrac12 \\ \tfrac14 \\ \tfrac18 \\ \tfrac18 \end{pmatrix} = 0$$

that replacing the $(n,1)$-element of $A_n$ by $-2^{2-n}$ makes $A_n$ exactly singular. Since $\|A_n\|_2 \ge 1$, it follows from Theorem 2.8 that $\kappa_2(A_n) \ge 2^{n-2}$.

This example also shows that the determinant is not a good measure of singularity, since the determinant of $A_n$ is always one.

Corollary 2.7 shows that the condition number will never underestimate the sensitivity of a matrix to perturbations. Theorem 2.8 shows that for subordinate norms the condition number is sharp in that it truly estimates the distance to singularity. Nonetheless, when the condition number is used in practice, it often overestimates the actual error. The phenomenon is known as ARTIFICIAL ILL-CONDITIONING, and it is instructive to see how it comes about.

Consider the matrix

$$A_\eta = \begin{pmatrix} 1 & \eta \\ 1 & -\eta \end{pmatrix}, \qquad \eta > 0.$$

Its inverse is

$$A_\eta^{-1} = \frac12 \begin{pmatrix} 1 & 1 \\ \eta^{-1} & -\eta^{-1} \end{pmatrix}.$$

From this it is seen that if $\eta < 1$

$$\kappa_\infty(A_\eta) = 1 + \frac{1}{\eta},$$

and $A_\eta$ becomes increasingly ill-conditioned as $\eta$ approaches zero.
Now in the sense of being nearly singular, $A_\eta$ is truly ill conditioned when $\eta$ is small, for we can construct a small perturbation that will render $A_\eta$ singular. In fact if

$$E_\eta = \begin{pmatrix} 0 & -\eta \\ 0 & \eta \end{pmatrix}, \tag{2.8}$$

then $\|E_\eta\|_\infty = \eta$ and $A_\eta + E_\eta$ is singular.

However, in applications we do not construct the error in $A$; rather it is given as part of the problem. For example, suppose that $\eta$ represents a column scaling of an original matrix $A_1$ (extreme scaling can result from changes of units; e.g., years to seconds). This matrix is presumed to have an error $E$ in it, say its elements are bounded by $\epsilon$, where $\epsilon$ is small. Now when $A_1$ is scaled so that it becomes $A_\eta$, $E$ inherits that scaling, so that its elements are bounded by the elements of the matrix

$$\begin{pmatrix} \epsilon & \eta\epsilon \\ \epsilon & \eta\epsilon \end{pmatrix}.$$

This says that a perturbation like $E_\eta$ in (2.8) cannot occur. In other words, although $A_\eta$ becomes increasingly ill conditioned, the nature of the underlying problem insures that perturbations exhibiting this ill-conditioning are forbidden. It is this restriction on the range of the perturbations that makes the ill-conditioning artificial.
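The gap between the condition-number prediction and what a scaled perturbation actually does can be observed numerically. The sketch below is an illustration under stated assumptions: it takes $A_\eta$ to be the matrix with rows $(1, \eta)$ and $(1, -\eta)$, uses a made-up perturbation whose elements are bounded in magnitude by $\epsilon$ in the first column and $\eta\epsilon$ in the second, and measures everything in the $\infty$-norm. The helper names and the value `eps = 1e-8` are not from the text.

```python
def inv2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(M):
    """Infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def add(M, N):
    return [[m + n for m, n in zip(rm, rn)] for rm, rn in zip(M, N)]

def sub(M, N):
    return [[m - n for m, n in zip(rm, rn)] for rm, rn in zip(M, N)]

eps = 1e-8  # assumed size of the error in the unscaled matrix

def experiment(eta):
    """Return (kappa, predicted, actual) for the scaled matrix A_eta."""
    A = [[1.0, eta], [1.0, -eta]]
    # Perturbation that inherits the column scaling: second column is O(eta * eps).
    E = [[eps, -eps * eta], [eps, eps * eta]]
    A_inv = inv2(A)
    A_tilde_inv = inv2(add(A, E))
    kappa = norm_inf(A) * norm_inf(A_inv)
    predicted = kappa * norm_inf(E) / norm_inf(A)   # first-order form of (2.6)
    actual = norm_inf(sub(A_tilde_inv, A_inv)) / norm_inf(A_inv)
    return kappa, predicted, actual

results = {}
for eta in (1.0, 1e-2, 1e-4, 1e-6):
    results[eta] = experiment(eta)
    kappa, predicted, actual = results[eta]
    print("eta =", eta, " kappa =", kappa,
          " predicted =", predicted, " actual =", actual)
```

As $\eta$ decreases the predicted relative error grows like $\epsilon/\eta$, while the observed relative error in the inverse stays near $\epsilon$, because the perturbations that would realize the worst case are excluded by the scaling.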
It should be stressed that artificial ill-conditioning in no way represents a failure of our perturbation theorems: they were designed to predict the behavior of the inverse matrix under arbitrary perturbations, and they handle the worst cases well. It does, however, represent a failure of our perturbation theory, which gives no way of incorporating the structure of an error into its bounds. In the next subsection we will see that componentwise bounds can alleviate the situation. But they are not a cure-all, and at the present state of our knowledge we must cope by using ad-hoc methods, usually some form of rescaling. The practitioner who encounters a large condition number should examine his data for artificial ill-conditioning before concluding that his results are inaccurate.

2.3. Linear Systems

In this subsection we will be concerned with perturbations of the system $Ax = b$. Since the solution of the perturbed system $\tilde A\tilde x = b$ satisfies $\tilde x - x = (\tilde A^{-1} - A^{-1})b$, we immediately obtain from (2.3)

$$\|\tilde x - x\| \le \frac{\|A^{-1}E\|\|A^{-1}\|\|b\|}{1 - \|A^{-1}E\|}. \tag{2.9}$$

However, this bound illustrates one of the ironies of matrix perturbation theory: a general result often does not give the best bound when applied in a special case. In particular the factor $\|A^{-1}\|\|b\|$ may be replaced by $\|x\|$, as the following theorem shows.

Theorem 2.11. Let $A$ be nonsingular and let $\tilde A = A + E$ be a perturbation of $A$. For $b \in \mathbb{C}^n$ let $Ax = b$. Let $\|\cdot\|$ be a consistent matrix norm that is also consistent with the vector norm $\|\cdot\|$. If there is a vector $\tilde x$ such that

$$\tilde A\tilde x = b, \tag{2.10}$$

then

$$\frac{\|\tilde x - x\|}{\|\tilde x\|} \le \|A^{-1}E\|.$$

If in addition $\|A^{-1}E\| < 1$, then (2.10) always has a unique solution, which satisfies

$$\frac{\|\tilde x - x\|}{\|x\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}.$$

If $\kappa(A) = \|A\|\|A^{-1}\|$ is the condition number of $A$ and

$$\kappa(A)\frac{\|E\|}{\|A\|} < 1,$$

then

$$\frac{\|\tilde x - x\|}{\|x\|} \le \frac{\kappa(A)\dfrac{\|E\|}{\|A\|}}{1 - \kappa(A)\dfrac{\|E\|}{\|A\|}}.$$

Proof. The proof is mutatis mutandis the same as that of Theorem 2.5 and its corollary. The key is the relation $\tilde x + A^{-1}E\tilde x = x$. ∎

Most of the observations to be made about this theorem have already been made in the preceding subsection. Here, as there, the condition number determines the relative perturbation of the solution. Artificial ill-conditioning can occur with linear systems, just as with inverses. Perhaps the most interesting feature of the bounds is their independence of the right-hand side $b$ of the linear system.

One way of dealing with artificial ill-conditioning is to examine the effects of special perturbations on individual components. We will illustrate the technique for perturbations in a single column of the matrix of a linear system.

Theorem 2.12. Let $A$ be nonsingular, and let $\tilde A = A + E$, where

$$E = e\mathbf{1}_j^T. \tag{2.11}$$

Then

$$|\tilde\xi_i - \xi_i| \le \|a_i^{(-1)}\|_*\|e\||\tilde\xi_j|, \tag{2.12}$$

where $a_i^{(-1)H}$ is the $i$th row of $A^{-1}$ and $\|\cdot\|$ and $\|\cdot\|_*$ are dual.
Proof. From the relation $\tilde x - x = -A^{-1}E\tilde x$, we have upon multiplying by $\mathbf{1}_i^T$ and substituting the right-hand side of (2.11) for $E$

$$\tilde\xi_i - \xi_i = -a_i^{(-1)H}e\,\tilde\xi_j.$$

The theorem now follows on taking norms. ∎

The advantage of this result over more general bounds is that it is invariant under column scaling. For example, if we replace $A$ by $AD$, where $D = \mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_n)$ is nonsingular, then we must also make the following substitutions in (2.12):

1. $\xi_i \leftarrow \delta_i^{-1}\xi_i$ and $\tilde\xi_i \leftarrow \delta_i^{-1}\tilde\xi_i$,
2. $\tilde\xi_j \leftarrow \delta_j^{-1}\tilde\xi_j$,
3. $e \leftarrow \delta_j e$,
4. $a_i^{(-1)H} \leftarrow \delta_i^{-1}a_i^{(-1)H}$.

The effect of these substitutions is simply to multiply both sides of (2.12) by $|\delta_i^{-1}|$, which does not represent a change in the bound.

Let us now turn to the problem of perturbations in the right-hand side of the system $Ax = b$. In order to describe what is actually going on, it is necessary to introduce some additional notation. Specifically, let

$$\eta = \frac{\|A\|\|x\|}{\|b\|}. \tag{2.13}$$

Since $\|b\| \le \|A\|\|x\|$, we have $\eta \ge 1$. On the other hand, since $\|x\| \le \|A^{-1}\|\|b\|$, we have $\eta \le \kappa(A)$. When $\eta$ is near $\kappa(A)$, we say that the solution of the system $Ax = b$ REFLECTS THE CONDITION of $A$. In other words, when $\eta$ is near $\kappa(A)$, the size of the solution $x$ is nearly as big as we might predict on the basis of the condition number alone. With these preliminaries, we can state the following theorem.

Theorem 2.13. Let $A \in \mathbb{C}^{n \times n}$ be nonsingular. For $b \ne 0$, let $Ax = b$, and let $A\tilde x = b + e$. Then

$$\frac{\|\tilde x - x\|}{\|x\|} \le \frac{\kappa(A)}{\eta}\frac{\|e\|}{\|b\|}. \tag{2.14}$$

Proof. We have $\tilde x - x = A^{-1}e$, from which $\|\tilde x - x\| \le \|A^{-1}\|\|e\|$. The result now follows on dividing by $\|x\| = \eta\|b\|/\|A\|$. ∎

The left-hand side of the bound (2.14) is the relative error in $\tilde x$. The factor $\|e\|/\|b\|$ is the relative error in $b$. The factor $\kappa(A)/\eta$ is the condition number of the problem. Whatever the condition of the matrix $A$, if $x$ reflects that condition, then $x$ is insensitive to perturbations in $b$, whatever they may be.
This is in contrast to the sensitivity of $x$ to perturbations in $A$, where it is $\kappa(A)$ alone that predicts the worst case.

Theorem 2.12 shows that by manipulating the relation $\tilde x - x = -A^{-1}E\tilde x$ we can obtain bounds that to some extent circumvent the problems of artificial ill-conditioning. The focus there was the components of $x$. The following STRUCTURED PERTURBATION THEOREM homes in on the components of $A$.

Theorem 2.14 (Bauer, Skeel). Let $A$ be nonsingular. Let $Ax = b$, and $(A + E)\tilde x = b + e$. Let $\|\cdot\|$ be an absolute vector norm, and let $\|\cdot\|$ also denote a consistent matrix norm (e.g., the subordinate operator norm). If for some nonnegative $S$, $s$, and $\epsilon$

$$|E| \le \epsilon S \quad \text{and} \quad |e| \le \epsilon s$$

and in addition

$$\epsilon\,\||A^{-1}|S\| < 1,$$

then

$$\|\tilde x - x\| \le \frac{\epsilon\,\||A^{-1}|(S|x| + s)\|}{1 - \epsilon\,\||A^{-1}|S\|}.$$

Proof. From the identity

$$\tilde x - x = -A^{-1}Ex + A^{-1}e - A^{-1}E(\tilde x - x),$$

which is obtained by subtracting $Ax = b$ from $(A + E)\tilde x = b + e$, it follows that

$$|\tilde x - x| \le \epsilon|A^{-1}|S|x| + \epsilon|A^{-1}|s + \epsilon|A^{-1}|S|\tilde x - x|.$$

The theorem now follows on taking norms and solving for $\|\tilde x - x\|$. ∎
128 III. LINEAR SYSTEMS AND LEAST SQUARES The point of this theorem is that that it gives us control over the form of the errors. By choosing the STRUCTURE MATRICES Sand s appropriately we may model the behavior of the error. For example, if we wish to consider only errors in the matrix A, we may set s = O. Again, if we wish to consider only relative errors in the elements of A, we may take'S = IAI. Tlwse substitutions and a little manipulation yield the following corollary. eorollary 2.15. Let e = 0 and lEI  EIAI. Set KBS(A) = IIIAIIIAIII. If EKBs(A) < 1, then IIx - xii < EKBs(A) Ilxll - 1 - EKBs(A)' (2.15) The number Kns(A) is the BAUERnSKEEL CONDITION NUMBER of A. It has the property that it is independent of row scaling, which therefore cannot be a source of artificial ill-conditioning in (2.15). Unfortunately, the bound can be quite sensitive to column scaling. We conclude this subsection with the topic of RESIDUAL BOUNDS. . Generally speaking, most problems in matrix computations can be cast in the form of solving an equation r(x) = O. For example, r(x) = b-Ax for the linear system Ax = b. The result of the computation will not be the exact solution but an approximate solution X, usually one for which the RESIDUAL r(x) is small. The problem of residual bounds is to construct a bound on x - x in terms of r(x). There are many possible solutions to this problem; but one- the METHOD OF BACKWARD PERTURBATIONS - is particularly fruitful. The technique is to show that the computed solution is the exact solution of a problem with slightly perturbed data and to bound the perturbation. Conventional perturbation theory can be then used to determine the accuracy of the purported solution X. This technique is illustrated by the following theorem and its proof. Theorem 2.16 (Rigal-Gaches). Let A E e nxn . Let II . II denote a vector norm and its subordinate matrix norm. For any x "I 0, let r = b - Ax. Then there is a matrix E satisfying 11EII = tl lIill 2. 
such that $(A + E)\tilde x = b$, and $E$ is the smallest matrix with this property. Hence, if $A$ is nonsingular and $Ax = b$, we have
$$\frac{\|\tilde x - x\|}{\|\tilde x\|} \le \kappa(A)\frac{\|r\|}{\|A\|\|\tilde x\|}. \tag{2.16}$$

Proof. Let $\|\cdot\|_*$ be the norm dual to $\|\cdot\|$. Let $y$ be a vector satisfying $\|y\|_* = 1$ and $y^{H}\tilde x = \|\tilde x\|$. Set
$$E = \frac{r y^{H}}{\|\tilde x\|}. \tag{2.17}$$
Then
$$b - (A + E)\tilde x = r - E\tilde x = r - \frac{r y^{H}\tilde x}{\|\tilde x\|} = r - r = 0.$$
The norm of $E$ is
$$\|E\| = \max_{\|z\|=1}\frac{\|r y^{H}z\|}{\|\tilde x\|} = \frac{\|r\|}{\|\tilde x\|}\max_{\|z\|=1}|y^{H}z| = \frac{\|r\|}{\|\tilde x\|},$$
the last equality following from the fact that $\|y\|_* = 1$. Moreover, if $E$ is any matrix satisfying $(A + E)\tilde x = b$, then $E\tilde x = r$. Hence $\|E\|\|\tilde x\| \ge \|r\|$, or $\|E\| \ge \|r\|/\|\tilde x\|$.

The bound (2.16) follows directly from Theorem 2.11. $\blacksquare$

Provided we are willing to sacrifice optimality, there is nothing sacred about the choice (2.17) of $E$. For example, if the $i$th component $\tilde\xi_i$ of $\tilde x$ is nonzero, the choice
$$E = \frac{r\,\mathbf{1}_i^{T}}{\tilde\xi_i}$$
is also a backward perturbation for the problem. It has the property that it concentrates the error in the $i$th column of $A$. Other choices might place the error entirely on nonzero elements of $A$, an important consideration in dealing with sparse matrices.

There is a structured backward perturbation theorem, the analogue of Theorem 2.16 in the spirit of the Bauer-Skeel theorem.
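The construction in the proof of Theorem 2.16 can be sketched numerically. The example below assumes the real case and the 2-norm, for which the dual vector is $y = \tilde x/\|\tilde x\|_2$, so that (2.17) becomes $E = r\tilde x^{T}/\|\tilde x\|_2^2$. The data and the helper names are illustrative assumptions.

```python
# Sketch only: the backward perturbation (2.17) for the 2-norm, real case.
import math

def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
x_approx = [0.11, 0.59]          # the exact solution of Ax = b is (0.1, 0.6)

Ax = matvec(A, x_approx)
r = [b[i] - Ax[i] for i in range(2)]          # residual r = b - A x_approx

# For the 2-norm, y = x_approx / ||x_approx||_2, hence E = r x^T / ||x||_2^2.
nx2 = sum(t * t for t in x_approx)
E = [[r[i] * x_approx[j] / nx2 for j in range(2)] for i in range(2)]

# x_approx solves the perturbed system (A + E) x = b exactly (up to rounding).
A_pert = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
back = matvec(A_pert, x_approx)
defect = norm2([back[i] - b[i] for i in range(2)])

# E has rank one, so its spectral norm equals its Frobenius norm.
norm_E = math.sqrt(sum(t * t for row in E for t in row))
eta = norm2(r) / norm2(x_approx)              # the optimal size ||r|| / ||x||

print("defect:", defect, " ||E||_2:", norm_E, " ||r||/||x||:", eta)
```

The quantity `eta` is the normwise backward error of `x_approx`; dividing it by $\|A\|_2$ would give the relative form that appears on the right-hand side of (2.16).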
Theorem 2.17 (Oettli-Prager). Let $r = b - A\tilde x$, with components $\rho_i$. Let $S$ and $s$ be nonnegative and set
$$\epsilon = \max_i \frac{|\rho_i|}{(S|\tilde x| + s)_i} \tag{2.18}$$
(here $0/0 = 0$ and otherwise $\rho/0 = \infty$). If $\epsilon \ne \infty$, there is a matrix $E$ and a vector $e$ with
$$|E| \le \epsilon S \quad\text{and}\quad |e| \le \epsilon s \tag{2.19}$$
such that
$$(A + E)\tilde x = b + e. \tag{2.20}$$
Moreover, $\epsilon$ is the smallest number for which such matrices exist.

Proof. From (2.18) we have
$$|\rho_i| \le \epsilon(S|\tilde x| + s)_i.$$
This in turn implies that $r = D(S|\tilde x| + s)$, where $|D| \le \epsilon I$. It is then easily verified that
$$E = DS\operatorname{diag}(\operatorname{sign}(\tilde\xi_1), \ldots, \operatorname{sign}(\tilde\xi_n)) \quad\text{and}\quad e = -Ds$$
are the required backward perturbations.

On the other hand, given perturbations $E$ and $e$ satisfying (2.19) and (2.20) for some $\epsilon$, we have
$$|r| = |b - A\tilde x| = |E\tilde x - e| \le \epsilon(S|\tilde x| + s).$$
Hence $\epsilon \ge |\rho_i|/(S|\tilde x| + s)_i$, which shows that the $\epsilon$ defined by (2.18) is optimal. $\blacksquare$

2.4. Asymptotic Forms and Derivatives

Throughout this section we have made use of the formula
$$\tilde A^{-1} = A^{-1} - A^{-1}E\tilde A^{-1}.$$
The main drawback to this formula is the presence of $\tilde A^{-1}$ on both sides of the equality sign. There is little we can do about this without passing to infinite series, e.g.,
$$\tilde A^{-1} = A^{-1} - (A^{-1}E)A^{-1} + (A^{-1}E)^2A^{-1} - \cdots,$$
or inverses, e.g.,
$$\tilde A^{-1} = (I + A^{-1}E)^{-1}A^{-1},$$
either of which destroys the simplicity of the relation. However, if we are willing to make do with an approximation, we can write
$$\tilde A^{-1} \cong A^{-1} - A^{-1}EA^{-1}.$$
Since $\tilde A^{-1}$ is a differentiable function of its elements, this FIRST ORDER APPROXIMATION is accurate up to terms of order $\|E\|^2$.

First order approximations occupy an important place in perturbation theory. They furnish computable approximations to the perturbed objects; in fact, in many applications the term "perturbation theory" amounts to little more than the construction of first and higher order approximations. Moreover, first order approximations are often easier to work with than their exact equivalents.
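A small experiment can make the claim about second order accuracy concrete. The sketch below assumes an arbitrary $2\times 2$ matrix `A` and perturbation direction `E0` (both illustrative) and measures the Frobenius norm of $\tilde A^{-1} - (A^{-1} - A^{-1}EA^{-1})$ for $E = tE_0$ at two values of $t$. If the linearization error is of order $\|E\|^2$, shrinking $t$ by a factor of 10 should shrink the error by a factor of about 100.

```python
# Sketch only: the error of the first order approximation to the inverse
# should scale like ||E||^2. A and E0 are illustrative assumptions.
import math

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def frob(M):
    return math.sqrt(sum(t * t for row in M for t in row))

A = [[2.0, 1.0], [1.0, 3.0]]
E0 = [[1.0, -1.0], [0.5, 2.0]]
A_inv = inv2(A)

def linearization_error(t):
    """Frobenius norm of inv(A + t*E0) - (inv(A) - inv(A) (t*E0) inv(A))."""
    E = [[t * v for v in row] for row in E0]
    A_pert = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    exact = inv2(A_pert)
    first = matmul(matmul(A_inv, E), A_inv)
    approx = [[A_inv[i][j] - first[i][j] for j in range(2)] for i in range(2)]
    return frob([[exact[i][j] - approx[i][j] for j in range(2)]
                 for i in range(2)])

err_coarse = linearization_error(1e-2)
err_fine = linearization_error(1e-3)
ratio = err_coarse / err_fine     # expected to be close to 100

print("errors:", err_coarse, err_fine, " ratio:", ratio)
```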
The following example, which requires a smattering of probability theory, illustrates this point.

Example 2.18. Let us suppose that the elements of $E$ are independent random variables with mean zero and standard deviation $\sigma$. Then $\|\tilde A^{-1} - A^{-1}\|_F$ is a random variable, whose distribution gives information about the sensitivity of $A^{-1}$ to perturbations in $A$. Unfortunately, its distribution is not tractable. However, we can easily compute the number
$$\left[\mathrm{E}\left(\|A^{-1}EA^{-1}\|_F^2\right)\right]^{1/2} = \sigma\|A^{-1}\|_F^2,$$
which is the root mean square of the Frobenius norm of the linearized error. This number is analogous to our error bounds, being proportional to an error term $\sigma$ and the square of the inverse. However, unlike our bounds, it is an exact equality, so that if we can ignore second order terms and higher, it is a better estimate of the actual error.

An important application of first order approximations is to calculate derivatives. For example, to compute $\partial A^{-1}/\partial\alpha_{ij}$ note that when $E = \epsilon\mathbf{1}_i\mathbf{1}_j^{T}$ the matrix $\tilde A$ is just $A$ with its $(i,j)$-element perturbed by $\epsilon$. Hence for this choice of $E$,
$$\frac{\partial A^{-1}}{\partial\alpha_{ij}} = \lim_{\epsilon\to 0}\frac{\tilde A^{-1} - A^{-1}}{\epsilon}.$$
Since $\tilde A^{-1} - A^{-1} = -A^{-1}EA^{-1} + O(\epsilon^2)$, we have
$$\frac{\partial A^{-1}}{\partial\alpha_{ij}} = \lim_{\epsilon\to 0}\frac{-\epsilon A^{-1}\mathbf{1}_i\mathbf{1}_j^{T}A^{-1}}{\epsilon} = -a_i^{(-1)}a_j^{(-1)H},$$
where $a_i^{(-1)}$ is the $i$th column of $A^{-1}$ and $a_j^{(-1)H}$ is its $j$th row.

Similar results hold for the solution of the linear system $Ax = b$. In the notation of the last subsection,
$$\tilde x \cong x - A^{-1}Ex.$$
Moreover
$$\frac{\partial x}{\partial\alpha_{ij}} = -\xi_j a_i^{(-1)};$$
that is, the derivative is proportional to the $i$th column of $A^{-1}$.

Notes and References

The perturbation of matrix inverses and linear systems finds applications in many areas. Since the theory is simple, expositors tend to develop as much or as little as they need in the notation of their fields. We cannot begin to survey this body of literature, but to see how the theory appears to at least two numerical optimizers see [166, Section 2.3].

The theory is presented here as it is found in the numerical analysis literature, where it has become highly polished. The reason is a style of rounding-error analysis, BACKWARD ROUNDING-ERROR ANALYSIS, in which the effects of rounding error are thrown back on the original data. To cite the most famous example, if Gaussian elimination with partial pivoting is used to solve the linear system $Ax = b$, then under certain conditions, which need not concern us here, the computed solution $\tilde x$ satisfies $(A + E)\tilde x = b$, where $\|E\|/\|A\|$ is a modest multiple of the rounding unit. It remains to evaluate the effect of this error on the solution, and this necessity has made numerical analysts keen perturbation theorists. An unfortunate side effect is that the outsider must search a desert of rounding-error analysis to find the nuggets of perturbation theory. For historical comments and references on the subject of rounding error, see [270, 113].

Actually, a bound in the classic style
was developed earlier by Wittmeyer [274, 1936], who showed (in our notation) that
$$\|\tilde x - x\|_2 \le \frac{\|A^{-1}\|_2\|e\|_2 + \|b\|_2\|A^{-1}\|_2^2\|E\|_2}{1 - \|A^{-1}\|_2\|E\|_2},$$
which is equivalent to (2.9) extended to account for errors in the right-hand side. It is not clear what influence this paper exerted, since it is only sporadically referenced (e.g., in [240, 121] but not in [269]).

The notion of condition number was introduced by Turing [242, 1948] to quantify "the expression 'ill-condition' [which] is sometimes used merely as a term of abuse applicable to matrices or equations, ...." The fact that the condition number appears in the error bounds led to attempts to find the row and column scaling of $A$ that gives the smallest condition number, of which the most penetrating investigation was given by Bauer [13, 1963]. Not much is heard about the topic now, partly because of a better understanding of what the condition number actually means and partly because of a remarkable theorem of van der Sluis [244, 1969], which says that the condition number of a matrix is nearly optimal in the spectral norm if its columns have the same 2-norm. The present discussion of artificial ill-conditioning owes a great deal to Wilkinson's common-sense approach to the subject [269, Ch.2, pp.192-193].

Kahan [129] attributes Theorem 2.8 to Gastinel but cites no reference. The connection between singularity and condition is not an accident. Kahan [131, 1972] shows that for a number of problems the condition is related to the distance from degeneracy. Demmel [55, 1987] gives a uniform treatment of this phenomenon via differential inequalities.

The condition number $\kappa(A)$ has the drawback that it depends on $A^{-1}$, which seldom needs to be calculated in applications. This has given rise to CONDITION ESTIMATORS, which attempt to approximate some norm of $A^{-1}$ from a factorization of $A$.
The first of these was suggested by Gragg and Stewart [96, 1976], and an improved version [45, 1979] was incorporated into the LINPACK codes [59, 1979]. Since then there have been many variations and improvements in the technique. See [111] for a survey.

The approach to perturbation theory through structured errors, as in Theorem 2.14, is due to Bauer [14, 1966] in the forward sense and Oettli and Prager [164, 1964] in the backward sense. Skeel [197, 1979] combined this approach with rounding-error analysis to arrive at important and original observations on the numerical solution of linear systems. The number $\kappa_{BS}$ is sometimes called the Skeel condition number, but it was introduced by Bauer. A variant of Theorem 2.14 for matrix inverses was first established by Bauer [14]. The specialization to linear systems is due to Skeel [197], and the present statement is a variant of one by Higham [113].

Backward perturbation theory in the style of Rigal and Gaches [186, 1967]
has important applications to quasi-Newton methods for the numerical solution of nonlinear equations and optimization problems, where the perturbation is used to update approximate Jacobians and Hessians (e.g., see [57]). For the case of Hessians, where the update must be symmetric, see Exercise 2.10.

The use of first order expansions to determine the properties of functions of random variables goes back at least to Gauss [84, 1821, Section 16]. For a systematic application of the idea to matrix perturbation theory, see [215].

Exercises

UNLESS OTHERWISE STATED, $A$ IS A NONSINGULAR MATRIX OF ORDER $n$. THE SYMBOL $\|\cdot\|$ DENOTES A VECTOR NORM AND ITS SUBORDINATE MATRIX NORM.

1. Show that if $\mathrm{re}(\tilde A, A) = \rho$, then there exists a matrix $R$ satisfying $\|R\| \le \rho\kappa(A)$ such that $\tilde A = A(I + R)$. Show that this is the best possible result.

2. Give an example of a matrix $X$ such that $\|I - XA\|/\|A\|\|X\|$ is small while $\|I - AX\|/\|A\|\|X\|$ is large (i.e., a matrix that is a good left inverse but a poor right inverse).

3. Show that for any Hölder norm, if $\mathrm{re}(\tilde x, x) \le \rho$ then the relative error in any component $\xi_i \ne 0$ of $x$ is bounded by $\rho\|x\|/|\xi_i|$.

4. Show that for any consistent norm
$$\kappa(A) \ge \frac{|\lambda_{\max}(A)|}{|\lambda_{\min}(A)|},$$
where $\lambda_{\max}(A)$ is the eigenvalue of $A$ of greatest absolute value and $\lambda_{\min}(A)$ is the eigenvalue of least absolute value. Show that for the spectral norm equality is attained whenever $A$ is normal.

5. What is $\kappa_2(0.1I_n)$? What is $\det(0.1I_n)$?

6. Show that the inverse of the matrix $A_n$ in Example 2.10 has the form illustrated below for $n = 5$:
$$\begin{pmatrix} 1 & 1 & 2 & 4 & 8\\ 0 & 1 & 1 & 2 & 4\\ 0 & 0 & 1 & 1 & 2\\ 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$
Generalize and derive an upper bound for $\kappa(A_n)$ in your favorite norm.

7. Let $B_n$ be the matrix illustrated below for $n = 5$:
$$B_5 = \begin{pmatrix} 1 & 1 & 1 & 1 & 1\\ 0 & 1 & 1 & 1 & 1\\ 0 & 0 & 1 & 1 & 1\\ 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$
What is $B_n^{-1}$?

8. It is sometimes objected that the matrix in Example 2.10 is not scaled properly.
If we normalize the columns so that they have 2-norm one, to give a matrix $\bar A_n$, then $\det(\bar A_n) = 1/\sqrt{n!}$, which reveals the ill-conditioning of $A_n$. Comment.

9. (van der Sluis [244]). Let $\mathcal{D}_+$ be the set of all diagonal matrices with positive diagonal elements. Let
$$\kappa_{2,\mathrm{opt}}(A) = \inf_{D\in\mathcal{D}_+}\kappa_2(AD),$$
and let $\bar A = A\operatorname{diag}(\|a_1\|_2, \ldots, \|a_n\|_2)^{-1}$ (i.e., $A$ scaled so that its columns have 2-norm one). Show that
$$\kappa_2(\bar A) \le \sqrt{n}\,\kappa_{2,\mathrm{opt}}(A).$$

10. (Dennis and Moré [56]). Let $A$ be symmetric and let $r = b - A\tilde x$. Show that
$$E = \frac{r\tilde x^{T} + \tilde x r^{T}}{\|\tilde x\|_2^2} - \frac{\tilde x^{T}r}{\|\tilde x\|_2^4}\,\tilde x\tilde x^{T}$$
is the smallest symmetric matrix in the Frobenius norm for which the vector $\tilde x$ satisfies $(A + E)\tilde x = b$.
11. (Bunch, Demmel, and Van Loan [40]). In the last exercise, show that if $F$ is any matrix for which $(A + F)\tilde x = b$, then $\|E\|_p \le 3\|F\|_p$ $(p = 2, F)$.

3. The Pseudo-Inverse

In this section we will derive perturbation bounds for the pseudo-inverse. The task is complicated by the fact that the pseudo-inverse, unlike the inverse, is not a continuous function of its elements. The qualitative result is that
$$\lim_{\tilde A\to A}\tilde A^{\dagger} = A^{\dagger} \iff \lim_{\tilde A\to A}\operatorname{rank}(\tilde A) = \operatorname{rank}(A). \tag{3.1}$$
In spite of this discontinuity, we can obtain informative bounds on $\tilde A^{\dagger}$, even when $\operatorname{rank}(\tilde A) \ne \operatorname{rank}(A)$.

A second complicating feature is that the bounds depend on where the perturbations lie. For example, if $A$ is of full column rank, then a perturbation in the orthogonal complement of the column space of $A$, no matter how large, can have only bounded effect. Accounting for this makes our bounds more intricate than the corresponding bounds for inverses.

We will begin in the first subsection by establishing a uniform notation and nomenclature. Here we will also introduce the notion of an acute perturbation, which will play a central role in the theory. The second subsection is devoted to general results, and the third to acute perturbations. In the last subsection we give derivatives and asymptotic forms.

The complexity of our results makes it imperative to have a consistent notation. Throughout the next three sections $A$ will denote an $m\times n$ matrix with $m \ge n$. The projection onto the column space of $A$ will be denoted by $P$, and the projection onto the row space $\mathcal{R}(A^{H})$ by $R$. The complementary projectors will be denoted by $P_\perp$ and $R_\perp$. As in the last section we will let $\tilde A = A + E$ denote a perturbation of $A$. The associated projectors will be denoted by $\tilde P$, $\tilde R$, $\tilde P_\perp$, and $\tilde R_\perp$.

3.1. Projections and Acute Perturbations

Given the relation of the pseudo-inverse to the geometry of $\mathbb{C}^n$, it is natural to cast our results in terms of unitarily invariant norms. We will let
$\|\cdot\|$ denote a family of unitarily invariant norms generated by the symmetric gauge function $\Phi$.

Since we are working with unitarily invariant norms, we may freely rotate our problem to simplify it. In particular, let $V = (V_1\ V_2)$ be a unitary matrix with $\mathcal{R}(V_1) = \mathcal{R}(A^{H})$, and let $U = (U_1\ U_2)$ be a unitary matrix with $\mathcal{R}(U_1) = \mathcal{R}(A)$. Then
$$U^{H}AV = \begin{pmatrix} U_1^{H}AV_1 & U_1^{H}AV_2\\ U_2^{H}AV_1 & U_2^{H}AV_2 \end{pmatrix} = \begin{pmatrix} A_{11} & 0\\ 0 & 0 \end{pmatrix},$$
where $A_{11}$ is square and nonsingular. If we set
$$U^{H}EV = \begin{pmatrix} U_1^{H}EV_1 & U_1^{H}EV_2\\ U_2^{H}EV_1 & U_2^{H}EV_2 \end{pmatrix} \equiv \begin{pmatrix} E_{11} & E_{12}\\ E_{21} & E_{22} \end{pmatrix},$$
then
$$U^{H}\tilde AV = \begin{pmatrix} A_{11} + E_{11} & E_{12}\\ E_{21} & E_{22} \end{pmatrix} \equiv \begin{pmatrix} \tilde A_{11} & E_{12}\\ E_{21} & E_{22} \end{pmatrix}. \tag{3.2}$$
We will call these transformed, partitioned matrices the REDUCED FORM of the problem. Many statements about the original problem have revealing analogues in the reduced form. For example, in reduced form
$$P\tilde A = \begin{pmatrix} \tilde A_{11} & E_{12}\\ 0 & 0 \end{pmatrix}. \tag{3.3}$$

The final item in this subsection is to define the kind of perturbation under which pseudo-inverses behave well. Essentially these are perturbations under which the column and row spaces do not alter catastrophically. These perturbations are characterized in the following theorem.

Theorem 3.1. The following statements are equivalent.

1. $\|P - \tilde P\|_2 < 1$.
2. There is no nonzero vector in $\mathcal{R}(A)$ that is orthogonal to $\mathcal{R}(\tilde A)$ and vice versa.

3. $\operatorname{rank}(A) = \operatorname{rank}(\tilde A) = \operatorname{rank}(P\tilde A)$.

Corresponding statements hold for the row spaces of $A$ and $\tilde A$.

Proof. $1 \Rightarrow 2$: Suppose there were a nonzero vector $x \in \mathcal{R}(A)$ that is orthogonal to $\mathcal{R}(\tilde A)$. This is equivalent to saying that $Px = x$ and $\tilde Px = 0$. It follows that $(P - \tilde P)x = x$, which implies that $\|P - \tilde P\|_2 \ge 1$, a contradiction. The reciprocal relation follows by interchanging $A$ and $\tilde A$ in the preceding argument.

$2 \Rightarrow 3$: If, say, $\operatorname{rank}(A) > \operatorname{rank}(\tilde A)$, then the dimension of the column space of $A$ is greater than the dimension of the column space of $\tilde A$. Hence there is a nonzero vector in $\mathcal{R}(A)$ that is orthogonal to $\mathcal{R}(\tilde A)$. Thus it remains only to show that $\operatorname{rank}(P\tilde A) = \operatorname{rank}(A)$. In reduced form this amounts to saying that the matrix $(\tilde A_{11}\ E_{12})$ has full row rank [cf. (3.3)]. Suppose to the contrary that $x^{H}(\tilde A_{11}\ E_{12}) = 0$ for some nonzero vector $x$. Then $(x^{H}\ 0)^{H} \in \mathcal{R}(A)$. But by (3.3), $(x^{H}\ 0)\tilde A = 0$, i.e., $(x^{H}\ 0)^{H} \in \mathcal{R}(\tilde A)^{\perp}$.

$3 \Rightarrow 1$: By Corollary I.5.4 and Theorem I.5.5 the 2-norm of $P - \tilde P$ is the 2-norm of $X_\perp^{H}Y$, where the columns of $X_\perp$ form an orthonormal basis for $\mathcal{R}(A)^{\perp}$, and those of $Y$ an orthonormal basis for $\mathcal{R}(\tilde A)$. In reduced form we may take $X_\perp = (0\ I)^{H}$. To get $Y$, note that since $(\tilde A_{11}\ E_{12})$ has full row rank, we may permute the columns of $\tilde A$ so that the reduced form is
$$\begin{pmatrix} B_{11} & B_{12}\\ B_{21} & B_{22} \end{pmatrix},$$
where $B_{11}$ is nonsingular. If we set $F = B_{21}B_{11}^{-1}$, then the columns of
$$Y = \begin{pmatrix} I\\ F \end{pmatrix}(I + F^{H}F)^{-1/2}$$
form an orthonormal basis for $\mathcal{R}(\tilde A)$. It follows that $X_\perp^{H}Y = F(I + F^{H}F)^{-1/2}$. It is easy to see that if $\sigma$ is a singular value of $F$ then $\sigma(1 + \sigma^2)^{-1/2} < 1$ is a singular value of $X_\perp^{H}Y$, and conversely. Hence $\|P - \tilde P\|_2 = \|X_\perp^{H}Y\|_2 < 1$. $\blacksquare$

This theorem suggests the following definition.

Definition 3.2. The matrix $\tilde A$ is an ACUTE PERTURBATION of $A$ if
$$\|P - \tilde P\|_2 < 1 \quad\text{and}\quad \|R - \tilde R\|_2 < 1.$$
We also say that $A$ and $\tilde A$ are acute.
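The simplest instance of Definition 3.2 may help fix ideas. The sketch below assumes that $A$ and $\tilde A$ are nonzero $2\times 1$ real matrices, i.e., single columns spanning lines in the plane. Their row spaces are both all of $\mathbb{R}^1$, so $R = \tilde R$ and only the condition on $P$ and $\tilde P$ is in play. The function names are illustrative assumptions. For two lines meeting at an angle $\theta$ the computed quantity $\|P - \tilde P\|_2$ equals $\sin\theta$, so the perturbation is acute unless the lines are perpendicular.

```python
# Sketch only: the acuteness test ||P - P~||_2 < 1 for two lines in the plane.
import math

def projector(a):
    """Orthogonal projector onto span{a} for a nonzero vector a in R^2."""
    d = a[0] * a[0] + a[1] * a[1]
    return [[a[i] * a[j] / d for j in range(2)] for i in range(2)]

def sym_norm2(M):
    """Spectral norm of a symmetric 2x2 matrix, from its two eigenvalues."""
    mean = 0.5 * (M[0][0] + M[1][1])
    rad = math.hypot(0.5 * (M[0][0] - M[1][1]), M[0][1])
    return max(abs(mean + rad), abs(mean - rad))

def projector_gap(a, a_pert):
    """The quantity ||P - P~||_2 for the column spaces of a and a_pert."""
    P, Q = projector(a), projector(a_pert)
    return sym_norm2([[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)])

a = [1.0, 0.0]
theta = math.pi / 3
gap_60 = projector_gap(a, [math.cos(theta), math.sin(theta)])   # sin(60 degrees)
gap_90 = projector_gap(a, [0.0, 1.0])                           # perpendicular lines

print("60 degrees:", gap_60, "acute:", gap_60 < 1.0)
print("90 degrees:", gap_90, "acute:", gap_90 < 1.0)
```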
Thus the column spaces of acute perturbations are of the same dimensions and the canonical angles between them are all less than $\pi/2$, whence the name acute perturbation. The same is true of the row spaces. The following theorem characterizes acute perturbations in the reduced form.

Theorem 3.3. In the reduced form the matrices $A$ and $\tilde A$ are acute if and only if $\tilde A_{11}$ is nonsingular and
$$E_{22} = E_{21}\tilde A_{11}^{-1}E_{12}. \tag{3.4}$$
In this case, if
$$F_{21} = E_{21}\tilde A_{11}^{-1}, \qquad F_{12} = \tilde A_{11}^{-1}E_{12},$$
then
$$\tilde A = \begin{pmatrix} I\\ F_{21} \end{pmatrix}\tilde A_{11}\,(I\ \ F_{12}) \tag{3.5}$$
and
$$\tilde A^{\dagger} = (I\ \ F_{12})^{\dagger}\,\tilde A_{11}^{-1}\begin{pmatrix} I\\ F_{21} \end{pmatrix}^{\dagger}. \tag{3.6}$$

Proof. Assume that $A$ and $\tilde A$ are acute, and suppose that $\tilde A_{11}$ is singular. Specifically, let $\tilde A_{11}x = 0$ for some $x \ne 0$, and consider the vector $y = E_{21}x$. If $y = 0$, then $(x^{H}\ 0)^{H}$ is a vector in the row space of $A$ that is orthogonal to the row space of $\tilde A$. If $y \ne 0$, then $(0\ y^{H})^{H}$ is a vector in the column space of $\tilde A$ that is orthogonal to the column space of $A$. Either case is a contradiction.

Equation (3.4) simply says that the last columns (rows) of the partition (3.2) are linear combinations of the initial columns (rows), which is necessary since $\tilde A_{11}$ is nonsingular and $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$.

Conversely, if $\tilde A_{11}$ is nonsingular and (3.4) holds, then it is easily verified that $\operatorname{rank}(\tilde A) = \operatorname{rank}(A) = \operatorname{rank}(P\tilde A) = \operatorname{rank}(\tilde AR)$, which is sufficient for acuteness.

The formula (3.5) is an immediate consequence of (3.4), and (3.6) follows from Penrose's conditions. $\blacksquare$
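The last step of the proof can be checked by hand in the smallest nontrivial case. The sketch below assumes a reduced form in which every block is a real scalar, builds $\tilde A$ so that (3.4) holds, forms the candidate pseudo-inverse from (3.6), and evaluates the four Penrose conditions. The numerical values and helper names are illustrative assumptions.

```python
# Sketch only: formula (3.6) with scalar blocks, verified through the
# four Penrose conditions. All numbers are illustrative.

a11 = 2.0
e11, e12, e21 = 0.1, 0.3, -0.2
a11_pert = a11 + e11                  # the (1,1) block of the perturbed matrix
e22 = e21 * e12 / a11_pert            # forced by condition (3.4)
f21 = e21 / a11_pert
f12 = e12 / a11_pert

A_pert = [[a11_pert, e12], [e21, e22]]

# (I F12)^dagger is the column (1, f12)^T / (1 + f12^2), and
# (I; F21)^dagger is the row (1, f21) / (1 + f21^2).
col = [1.0 / (1.0 + f12 ** 2), f12 / (1.0 + f12 ** 2)]
row = [1.0 / (1.0 + f21 ** 2), f21 / (1.0 + f21 ** 2)]
X = [[col[i] * row[j] / a11_pert for j in range(2)] for i in range(2)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def max_abs_diff(M, N):
    return max(abs(M[i][j] - N[i][j]) for i in range(2) for j in range(2))

AX = matmul(A_pert, X)
XA = matmul(X, A_pert)
penrose = [
    max_abs_diff(matmul(AX, A_pert), A_pert),   # A X A = A
    max_abs_diff(matmul(XA, X), X),             # X A X = X
    abs(AX[0][1] - AX[1][0]),                   # A X is symmetric
    abs(XA[0][1] - XA[1][0]),                   # X A is symmetric
]

print("Penrose residuals:", penrose)
```

All four residuals should be at the level of rounding error, which is what identifies `X` as the pseudo-inverse of `A_pert`.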
From the above characterization, it is easy to verify that
$$\lim_{\tilde A\to A}\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$$
if and only if $A$ and $\tilde A$ are ultimately acute. Comparing this with (3.1), we see that the set on which the pseudo-inverse is continuous about $A$ is the set of acute perturbations of $A$.

3.2. General Results

In this subsection we shall establish results, due to Wedin, that do not require acuteness. The first result shows that nonacute perturbations are not only discontinuous but in some sense behave like poles.

Theorem 3.4. If $A$ and $\tilde A$ are not acute, then
$$\|\tilde A^{\dagger} - A^{\dagger}\|_2 \ge \frac{1}{\|E\|_2}. \tag{3.7}$$
Moreover, if $\operatorname{rank}(\tilde A) \ge \operatorname{rank}(A)$, then
$$\|\tilde A^{\dagger}\|_2 \ge \frac{1}{\|E\|_2}. \tag{3.8}$$

Proof. Suppose that $\operatorname{rank}(\tilde A) \ge \operatorname{rank}(A)$. Since $\tilde A$ is not an acute perturbation of $A$, there is a nonzero vector $x \in \mathcal{R}(\tilde A)$ that is orthogonal to $\mathcal{R}(A)$ or a nonzero vector $x \in \mathcal{R}(\tilde A^{H})$ that is orthogonal to $\mathcal{R}(A^{H})$. Assume without loss of generality that the former is true and that $\|x\|_2 = 1$. Then
$$1 = x^{H}x = x^{H}\tilde Px = x^{H}\tilde A\tilde A^{\dagger}x = x^{H}(A + E)\tilde A^{\dagger}x = x^{H}E\tilde A^{\dagger}x \le \|E\|_2\|\tilde A^{\dagger}x\|_2.$$
Hence $\|\tilde A^{\dagger}\|_2 \ge \|\tilde A^{\dagger}x\|_2 \ge 1/\|E\|_2$, which establishes (3.8). To establish (3.7), note that $A^{\dagger}x = 0$. Hence
$$\|\tilde A^{\dagger} - A^{\dagger}\|_2 \ge \|(\tilde A^{\dagger} - A^{\dagger})x\|_2 = \|\tilde A^{\dagger}x\|_2 \ge \frac{1}{\|E\|_2}.$$
This inequality for $\operatorname{rank}(\tilde A) \le \operatorname{rank}(A)$ is valid by symmetry. $\blacksquare$

For small perturbations the case of interest is $r = \operatorname{rank}(\tilde A) > \operatorname{rank}(A)$, since $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$ implies that the perturbation is acute when $E$ is sufficiently small. In this case it is easy to understand what is going on. The matrix $\tilde A$ must have $r$ nonzero singular values. Since $A$ has fewer than $r$ nonzero singular values, at least one of the singular values of $\tilde A$ must approach zero as $\tilde A$ approaches $A$. This means that $\tilde A^{\dagger}$, whose spectral norm is equal to the reciprocal of the smallest nonzero singular value of $\tilde A$, must diverge.

We now turn to our general perturbation bounds.
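The pole-like behavior described by Theorem 3.4 is easy to reproduce with diagonal matrices. The sketch below assumes $A = \operatorname{diag}(1, 0)$ and $\tilde A = \operatorname{diag}(1, \epsilon)$, so that the rank increases and the perturbation is not acute. The helper `pinv_diag` is an illustrative assumption that exploits the fact that the pseudo-inverse of a diagonal matrix inverts only its nonzero entries. In this example the lower bound (3.7) holds with equality.

```python
# Sketch only: a rank-increasing perturbation of diag(1, 0).
# Diagonal matrices are stored as lists of their diagonal entries.

def pinv_diag(d):
    """Pseudo-inverse of diag(d): invert the nonzero entries only."""
    return [1.0 / t if t != 0.0 else 0.0 for t in d]

A_diag = [1.0, 0.0]                  # rank 1
A_pinv = pinv_diag(A_diag)

results = []
for eps in (1e-1, 1e-3, 1e-6):
    A_pert = [1.0, eps]              # rank 2, so the perturbation is not acute
    norm_E = eps                     # spectral norm of E = diag(0, eps)
    diff = [p - q for p, q in zip(pinv_diag(A_pert), A_pinv)]
    norm_diff = max(abs(t) for t in diff)    # spectral norm of a diagonal matrix
    results.append((eps, norm_diff, 1.0 / norm_E))

for eps, norm_diff, lower in results:
    print("eps =", eps, " ||A~+ - A+||_2 =", norm_diff, " 1/||E||_2 =", lower)
```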
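The theorems are based on two lemmas: one containing perturbation bounds on products of projections, and the other an explicit formula for $\tilde A^{\dagger} - A^{\dagger}$. First the bounds.

Lemma 3.5. The projections $P_\perp$ and $\tilde P$ satisfy
$$\tilde PP_\perp = (\tilde A^{\dagger})^{H}E^{H}P_\perp, \tag{3.9}$$
whence
$$\|\tilde PP_\perp\| \le \|\tilde A^{\dagger}\|_2\|E\|. \tag{3.10}$$
If $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$, then
$$\|\tilde PP_\perp\| = \|\tilde P_\perp P\|. \tag{3.11}$$
If $\operatorname{rank}(\tilde A) \ge \operatorname{rank}(A)$, then
$$\|\tilde PP_\perp\| \ge \|\tilde P_\perp P\|. \tag{3.12}$$

Remark 3.6. Similar results hold for the product $P\tilde P_\perp$ as well as the projections $R$ and $\tilde R$. We will reference this lemma for any of these results.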
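Proof. We have
$$\tilde PP_\perp = \tilde P^{H}P_\perp = (\tilde A^{\dagger})^{H}\tilde A^{H}P_\perp = (\tilde A^{\dagger})^{H}(A + E)^{H}P_\perp = (\tilde A^{\dagger})^{H}E^{H}P_\perp.$$
The inequality (3.10) follows on taking norms in (3.9). The equality (3.11) follows from Theorem I.5.5, which shows that if $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$ then $\tilde PP_\perp$ and $\tilde P_\perp P$ have the same singular values.

To establish (3.12), let $\tilde P = \tilde P_1 + \tilde P_2$, where $\operatorname{rank}(\tilde P_1) = \operatorname{rank}(A)$ and $\mathcal{R}(\tilde P_2) \perp \mathcal{R}(A)$. Then
$$\|P\tilde P_\perp\| = \|P(I - \tilde P_1 - \tilde P_2)\| = \|P(I - \tilde P_1)\| = \|\tilde P_1P_\perp\|,$$
the last equality following from (3.11). Now for any $x$ we have $\|\tilde P_1P_\perp x\|_2 \le \|\tilde PP_\perp x\|_2$. Hence by Theorem I.5.5, $\|\tilde P_1P_\perp\| \le \|\tilde PP_\perp\|$. $\blacksquare$

The second ingredient in our bounds is an explicit expression for $\tilde A^{\dagger} - A^{\dagger}$. Actually, we will use three closely related expressions.

Lemma 3.7. The difference $\tilde A^{\dagger} - A^{\dagger}$ is given by the expressions
$$\tilde A^{\dagger} - A^{\dagger} = -\tilde A^{\dagger}EA^{\dagger} + \tilde A^{\dagger}P_\perp - \tilde R_\perp A^{\dagger}, \tag{3.13}$$
$$\tilde A^{\dagger} - A^{\dagger} = -\tilde A^{\dagger}\tilde PERA^{\dagger} + \tilde A^{\dagger}\tilde PP_\perp - \tilde R_\perp RA^{\dagger}, \tag{3.14}$$
and
$$\tilde A^{\dagger} - A^{\dagger} = -\tilde A^{\dagger}\tilde PERA^{\dagger} + (\tilde A^{H}\tilde A)^{\dagger}\tilde RE^{H}P_\perp + \tilde R_\perp E^{H}P(AA^{H})^{\dagger}. \tag{3.15}$$

Proof. These expressions may be verified by replacing all quantities by their definitions in terms of $A$, $\tilde A$, $A^{\dagger}$, and $\tilde A^{\dagger}$ (e.g., $E = \tilde A - A$) and simplifying. $\blacksquare$

We are now in a position to establish a general bound on $\|\tilde A^{\dagger} - A^{\dagger}\|$. The exact form of the bound depends on the norm $\|\cdot\|$.

Theorem 3.8 (Wedin). The error $\tilde A^{\dagger} - A^{\dagger}$ has the following bound:
$$\|\tilde A^{\dagger} - A^{\dagger}\| \le \mu\max\{\|A^{\dagger}\|_2^2, \|\tilde A^{\dagger}\|_2^2\}\|E\|, \tag{3.16}$$
where $\mu$ is given in the following table:
$$\begin{array}{c|ccc} \|\cdot\| & \text{arbitrary} & \text{spectral} & \text{Frobenius}\\ \hline \mu & 3 & \dfrac{1+\sqrt5}{2} & \sqrt2 \end{array}$$

Proof. The inequality for an arbitrary norm follows immediately on taking norms in (3.15). The results for the spectral and Frobenius norms require further argument.

For the spectral norm we have from (3.13) that for any unit vector $u \in \mathbb{C}^m$
$$\begin{aligned} \|(\tilde A^{\dagger} - A^{\dagger})u\|_2^2 &= \|-\tilde A^{\dagger}EA^{\dagger}u + \tilde A^{\dagger}P_\perp u\|_2^2 + \|\tilde R_\perp A^{\dagger}u\|_2^2\\ &= \|-\tilde A^{\dagger}EA^{\dagger}Pu + \tilde A^{\dagger}P_\perp P_\perp u\|_2^2 + \|\tilde R_\perp A^{\dagger}Pu\|_2^2\\ &\le (\|\tilde A^{\dagger}EA^{\dagger}\|_2\|Pu\|_2 + \|\tilde A^{\dagger}P_\perp\|_2\|P_\perp u\|_2)^2 + \|\tilde R_\perp A^{\dagger}\|_2^2\|Pu\|_2^2. \end{aligned} \tag{3.17}$$
Let
$$\alpha_1 = \|\tilde A^{\dagger}EA^{\dagger}\|_2, \quad \alpha_2 = \|\tilde A^{\dagger}P_\perp\|_2, \quad \alpha_3 = \|\tilde R_\perp A^{\dagger}\|_2,$$
and
$$\cos\theta = \|Pu\|_2 \ge 0, \qquad \sin\theta = \|P_\perp u\|_2 \ge 0.$$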
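Substituting these values into (3.17), we get
$$\begin{aligned} \|(\tilde A^{\dagger} - A^{\dagger})u\|_2^2 &\le (\alpha_1\cos\theta + \alpha_2\sin\theta)^2 + \alpha_3^2\cos^2\theta\\ &\le \max_{0\le\theta\le\pi/2}\left[(\alpha_1\cos\theta + \alpha_2\sin\theta)^2 + \alpha_3^2\cos^2\theta\right]\\ &= \tfrac12\left\{\alpha_1^2 + \alpha_2^2 + \alpha_3^2 + \sqrt{(\alpha_1^2 + \alpha_2^2 + \alpha_3^2)^2 - 4\alpha_2^2\alpha_3^2}\right\}\\ &\le \frac{3+\sqrt5}{2}\max\{\alpha_1^2, \alpha_2^2, \alpha_3^2\}\\ &= \left(\frac{1+\sqrt5}{2}\right)^2\max\{\alpha_1^2, \alpha_2^2, \alpha_3^2\}. \end{aligned} \tag{3.18}$$
Now
$$\alpha_1 \le \|\tilde A^{\dagger}\|_2\|A^{\dagger}\|_2\|E\|_2.$$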
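By Lemma 3.5,
$$\alpha_2 = \|\tilde A^{\dagger}\tilde PP_\perp\|_2 \le \|\tilde A^{\dagger}\|_2\|\tilde PP_\perp\|_2 \le \|\tilde A^{\dagger}\|_2^2\|E\|_2,$$
and similarly $\alpha_3 \le \|A^{\dagger}\|_2^2\|E\|_2$. Hence from (3.18) we obtain
$$\|\tilde A^{\dagger} - A^{\dagger}\|_2 \le \frac{1+\sqrt5}{2}\max\{\|A^{\dagger}\|_2^2, \|\tilde A^{\dagger}\|_2^2\}\|E\|_2. \tag{3.19}$$

For the Frobenius norm, we first consider the case where $\operatorname{rank}(\tilde A) \le \operatorname{rank}(A)$. Let
$$F_1 = -\tilde A^{\dagger}\tilde PERA^{\dagger}, \quad F_2 = \tilde A^{\dagger}\tilde PP_\perp, \quad F_3 = -\tilde R_\perp RA^{\dagger}$$
be the terms in (3.14). Then
$$\|\tilde A^{\dagger} - A^{\dagger}\|_F^2 = \|F_1 + F_2\|_F^2 + \|F_3\|_F^2. \tag{3.20}$$
Now $F_1 + F_2 = \tilde A^{\dagger}(-\tilde PEA^{\dagger}P + \tilde PP_\perp)$. Hence
$$\|F_1 + F_2\|_F^2 \le \|\tilde A^{\dagger}\|_2^2\left(\|\tilde PEA^{\dagger}P\|_F^2 + \|\tilde PP_\perp\|_F^2\right).$$
It follows from Lemma 3.5 that
$$\begin{aligned} \|\tilde PEA^{\dagger}P\|_F^2 + \|\tilde PP_\perp\|_F^2 &\le \|\tilde PEA^{\dagger}\|_F^2 + \|\tilde P_\perp P\|_F^2\\ &= \|\tilde PEA^{\dagger}\|_F^2 + \|\tilde P_\perp EA^{\dagger}\|_F^2\\ &= \|EA^{\dagger}\|_F^2 \le \|A^{\dagger}\|_2^2\|E\|_F^2. \end{aligned}$$
Consequently,
$$\|F_1 + F_2\|_F \le \|\tilde A^{\dagger}\|_2\|A^{\dagger}\|_2\|E\|_F.$$
Moreover, we have
$$\|F_3\|_F \le \|A^{\dagger}\|_2\|\tilde R_\perp R\|_F = \|A^{\dagger}\|_2\|\tilde R_\perp E^{H}(A^{\dagger})^{H}\|_F \le \|A^{\dagger}\|_2^2\|E\|_F.$$
Hence from (3.20) we get
$$\|\tilde A^{\dagger} - A^{\dagger}\|_F \le \sqrt2\,\|A^{\dagger}\|_2\max\{\|A^{\dagger}\|_2, \|\tilde A^{\dagger}\|_2\}\|E\|_F \le \sqrt2\max\{\|A^{\dagger}\|_2^2, \|\tilde A^{\dagger}\|_2^2\}\|E\|_F. \tag{3.21}$$
Since the bound is symmetric in $A$ and $\tilde A$, it also holds for the case $\operatorname{rank}(\tilde A) \ge \operatorname{rank}(A)$. $\blacksquare$

Although the perturbation bounds bear a family resemblance to the bounds for matrix inverses, they cannot by themselves ensure the convergence of $\tilde A^{\dagger}$ to $A^{\dagger}$ as $E \to 0$, since $\tilde A^{\dagger}$ may grow unboundedly. What is needed is the additional hypothesis that $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$, which ensures that $A$ and $\tilde A$ will become acute as $E \to 0$. The following theorem gives the perturbation bounds that hold for this case. It is established by essentially the same arguments as Theorem 3.8, with the difference that some of the terms vanish in the expressions for $\tilde A^{\dagger} - A^{\dagger}$. The details are left as an exercise.

Theorem 3.9 (Wedin). Let $A \in \mathbb{C}^{m\times n}$, where $m \ge n$. If $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$, then
$$\|\tilde A^{\dagger} - A^{\dagger}\| \le \mu\|A^{\dagger}\|_2\|\tilde A^{\dagger}\|_2\|E\|, \tag{3.22}$$
where $\mu$ is given by the following table:
$$\begin{array}{l|ccc} \text{norm} & \text{arbitrary} & \text{spectral} & \text{Frobenius}\\ \hline \operatorname{rank}(A) < \min\{m, n\} & 3 & \dfrac{1+\sqrt5}{2} & \sqrt2\\ \operatorname{rank}(A) = n < m & 2 & \sqrt2 & 1\\ \operatorname{rank}(A) = m = n & 1 & 1 & 1 \end{array}$$

A trivial rearrangement of (3.22) gives a familiar looking corollary.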
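Corollary 3.10. Let $\kappa_2(A) = \|A\|_2\|A^{\dagger}\|_2$. Then
$$\frac{\|\tilde A^{\dagger} - A^{\dagger}\|}{\|\tilde A^{\dagger}\|_2} \le \mu\kappa_2(A)\frac{\|E\|}{\|A\|_2}. \tag{3.23}$$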
There are two points to make about this corollary. First, although the number $\kappa_2(A)$ is usually called the condition number of $A$, the theorem shows that the "real" condition number is $\mu\kappa_2(A)$, at least to the extent that the bounds really describe the behavior of $\tilde A^{\dagger}$. However, the usage is sanctioned by custom; and if we take the view that a condition number is any number whose size gives a rough estimate of the sensitivity of the problem, then the usage is even correct.

The second point is that as $E \to 0$ the right-hand side of (3.23) approaches zero. This means that the relative error in $\tilde A^{\dagger}$ approaches zero; i.e., $\tilde A^{\dagger} \to A^{\dagger}$, all under the hypothesis that $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$. On the other hand if $\operatorname{rank}(\tilde A) \ne \operatorname{rank}(A)$, then by Theorem 3.4 the matrix $\tilde A^{\dagger}$ cannot converge. Thus we have established the statement (3.1) with which we opened this section: a necessary and sufficient condition for $\tilde A^{\dagger} \to A^{\dagger}$ as $\tilde A \to A$ is that $\operatorname{rank}(\tilde A) \to \operatorname{rank}(A)$.

3.3. Acute Perturbations

It is evident from the proof of Theorem 3.8 that we have given away much in deriving the bounds. In particular, if $\tilde A$ is a small acute perturbation of $A$ then $P$ and $\tilde P$ are nearly equal, and the same is true of $R$ and $\tilde R$. Thus it follows from (3.15) that $\tilde A^{\dagger} - A^{\dagger}$ can be decomposed into three terms: one essentially depending on $PER$, one on $PER_\perp$, and one on $P_\perp ER$. However, this does not tell the whole story; for we shall show that the dependency of $\tilde A^{\dagger} - A^{\dagger}$ on $PER_\perp$ and $P_\perp ER$ is bounded, no matter how large these projections may be.

In order to state our results concisely, we must introduce some additional notation. Let $\|\cdot\|$ be generated by the symmetric gauge function $\Phi$, and for any $F \in \mathbb{C}^{k\times r}$ $(k \ge r)$ define
$$\Psi_\Phi(F) \stackrel{\mathrm{def}}{=} \Phi\left(\frac{\sigma_1(F)}{[1 + \sigma_1^2(F)]^{1/2}}, \ldots, \frac{\sigma_r(F)}{[1 + \sigma_r^2(F)]^{1/2}}\right) \le \|F\|. \tag{3.24}$$
The function $\Psi_\Phi$ is not a norm; however, it has some useful properties. First, from Theorems I.4.5 and II.3.9 and the monotonicity of $\Phi$,
$$\Psi_\Phi(GF) \le \Psi_\Phi(\|G\|_2F) \le \Psi_\Phi(\|G\|F).$$
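Second, since for $\alpha \ge 1$
$$\frac{\alpha\sigma}{(1 + \alpha^2\sigma^2)^{1/2}} \le \frac{\alpha\sigma}{(1 + \sigma^2)^{1/2}},$$
we have
$$\alpha \ge 1 \implies \Psi_\Phi(\alpha F) \le \alpha\Psi_\Phi(F).$$
For small $F$, the function $\Psi_\Phi$ is asymptotic to $\|F\|$, and for all $F$ it is bounded: viz., $\Psi_\Phi(F) \le \|I_r\|$. Finally, for the spectral norm
$$\Psi_2(F) = \frac{\|F\|_2}{(1 + \|F\|_2^2)^{1/2}}.$$

Our first result concerns a rather special matrix.

Lemma 3.11. The matrix $\begin{pmatrix} I\\ F \end{pmatrix}$ satisfies
$$\left\|\begin{pmatrix} I\\ F \end{pmatrix}^{\dagger}\right\|_2 \le 1 \tag{3.25}$$
and
$$\left\|\begin{pmatrix} I\\ F \end{pmatrix}^{\dagger} - (I\ \ 0)\right\| = \Psi_\Phi(F). \tag{3.26}$$

Proof. Let $\sigma_j(F)$ be the singular values of $F$. It is easily verified that
$$\begin{pmatrix} I\\ F \end{pmatrix}^{\dagger} = (I + F^{H}F)^{-1}(I\ \ F^{H}), \tag{3.27}$$
which has the singular values
$$\frac{1}{[1 + \sigma_j^2(F)]^{1/2}} \le 1,$$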
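from which (3.25) follows. Also if
$$G = \begin{pmatrix} I\\ F \end{pmatrix}^{\dagger} - (I\ \ 0),$$
then $GG^{H} = I - (I + F^{H}F)^{-1}$. It follows that the singular values of $G$ are given by
$$\frac{\sigma_j(F)}{[1 + \sigma_j^2(F)]^{1/2}},$$
which establishes (3.26). $\blacksquare$

We turn now to the perturbation theorem.

Theorem 3.12. Let $\tilde A$ be an acute perturbation of $A$, and let
$$\tilde\kappa = \|A\|\|\tilde A_{11}^{-1}\|_2.$$
Then
$$\frac{\|\tilde A^{\dagger} - A^{\dagger}\|}{\|A^{\dagger}\|} \le \tilde\kappa\frac{\|E_{11}\|}{\|A\|} + \Psi_\Phi\left(\tilde\kappa\frac{E_{12}}{\|A\|}\right) + \Psi_\Phi\left(\tilde\kappa\frac{E_{21}}{\|A\|}\right), \tag{3.28}$$
where $\Psi_\Phi$ is defined by (3.24).

Proof. Let $F_{ij}$ be defined as in Theorem 3.3. Let
$$I_{21} = \begin{pmatrix} I\\ 0 \end{pmatrix}, \quad I_{12} = (I\ \ 0), \quad J_{21} = \begin{pmatrix} I\\ F_{21} \end{pmatrix}, \quad J_{12} = (I\ \ F_{12}).$$
From (3.6), $\tilde A^{\dagger} = J_{12}^{\dagger}\tilde A_{11}^{-1}J_{21}^{\dagger}$; hence
$$\tilde A^{\dagger} - A^{\dagger} = (J_{12}^{\dagger} - I_{12}^{\dagger})A_{11}^{-1}I_{21}^{\dagger} + J_{12}^{\dagger}A_{11}^{-1}(J_{21}^{\dagger} - I_{21}^{\dagger}) + J_{12}^{\dagger}(\tilde A_{11}^{-1} - A_{11}^{-1})J_{21}^{\dagger}. \tag{3.29}$$
From Corollary 2.7 we have the following bound:
$$\|J_{12}^{\dagger}(\tilde A_{11}^{-1} - A_{11}^{-1})J_{21}^{\dagger}\| \le \|A_{11}^{-1}\|\,\tilde\kappa\frac{\|E_{11}\|}{\|A\|}. \tag{3.30}$$
By Lemma 3.11
$$\begin{aligned} \|(J_{12}^{\dagger} - I_{12}^{\dagger})A_{11}^{-1}I_{21}^{\dagger}\| &\le \|A_{11}^{-1}\|\|J_{12}^{\dagger} - I_{12}^{\dagger}\| = \|A_{11}^{-1}\|\Psi_\Phi(F_{12})\\ &= \|A_{11}^{-1}\|\Psi_\Phi(\tilde A_{11}^{-1}E_{12}) \le \|A_{11}^{-1}\|\Psi_\Phi\left(\tilde\kappa\frac{E_{12}}{\|A\|}\right), \end{aligned} \tag{3.31}$$
and likewise
$$\|J_{12}^{\dagger}A_{11}^{-1}(J_{21}^{\dagger} - I_{21}^{\dagger})\| \le \|A_{11}^{-1}\|\Psi_\Phi\left(\tilde\kappa\frac{E_{21}}{\|A\|}\right). \tag{3.32}$$
The bound (3.28) follows on combining (3.29), (3.30), (3.31), and (3.32) and recalling that $\|A_{11}^{-1}\| = \|A^{\dagger}\|$. $\blacksquare$

The bound (3.28) gives a rather nice dissection of $\|\tilde A^{\dagger} - A^{\dagger}\|$. Asymptotically, it is no better than the bound that would be obtained by taking norms in (3.15); i.e.,
$$\frac{\|\tilde A^{\dagger} - A^{\dagger}\|}{\|A^{\dagger}\|} \lesssim \tilde\kappa\frac{\|E_{11}\| + \|E_{12}\| + \|E_{21}\|}{\|A\|}$$
(the two are asymptotically equal for $E_{12}$ and $E_{21}$ small). However, the bound also shows that $E_{12}$ and $E_{21}$ can have at most a bounded effect on $\|\tilde A^{\dagger} - A^{\dagger}\|$. When $A$ is square and nonsingular, $E_{12}$ and $E_{21}$ are void, and the bound reduces to that of Corollary 2.7.

If $E_{11}$ is sufficiently small, we can estimate $\|\tilde A_{11}^{-1}\|_2$ in terms of $\|A_{11}^{-1}\|_2$ and $\|E_{11}\|$. This gives the following corollary.

Corollary 3.13. In Theorem 3.12, let $\kappa = \|A\|\|A^{\dagger}\|_2$, and suppose that $\|A^{\dagger}\|_2\|E_{11}\|_2 < 1$,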
so that
$$\gamma \equiv 1 - \kappa\frac{\|E_{11}\|_2}{\|A\|} > 0.$$
Then
$$\|\tilde A^{\dagger}\| \le \frac{\|A^{\dagger}\|}{\gamma}, \tag{3.33}$$
and
$$\frac{\|\tilde A^{\dagger} - A^{\dagger}\|}{\|A^{\dagger}\|} \le \frac{\kappa}{\gamma}\frac{\|E_{11}\|}{\|A\|} + \Psi_\Phi\left(\frac{\kappa}{\gamma}\frac{E_{21}}{\|A\|}\right) + \Psi_\Phi\left(\frac{\kappa}{\gamma}\frac{E_{12}}{\|A\|}\right). \tag{3.34}$$

Proof. From the equation $\tilde A^{\dagger} = J_{12}^{\dagger}\tilde A_{11}^{-1}J_{21}^{\dagger}$, we have
$$\|\tilde A^{\dagger}\| \le \|J_{12}^{\dagger}\|_2\|\tilde A_{11}^{-1}\|\|J_{21}^{\dagger}\|_2 \le \|\tilde A_{11}^{-1}\|.$$
By Corollary 2.7,
$$\|\tilde A_{11}^{-1}\| \le \frac{\|A_{11}^{-1}\|}{\gamma} = \frac{\|A^{\dagger}\|}{\gamma},$$
which establishes (3.33). Also $\tilde\kappa \le \kappa/\gamma$, and the inequality (3.34) follows from (3.28). $\blacksquare$

3.4. Asymptotic Forms and Derivatives

Asymptotic forms for $\tilde A^{\dagger}$ may be obtained from (3.15). Of course for $\tilde A^{\dagger}$ to approach $A^{\dagger}$ we must have $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$; and since we are assuming that $E$ is arbitrarily small, $\tilde A$ may be assumed to be an acute perturbation of $A$. In this case $\tilde A^{\dagger} = A^{\dagger} + O(\|E\|)$, and
$$\tilde P = \tilde A\tilde A^{\dagger} = (A + E)[A^{\dagger} + O(\|E\|)] = P + O(\|E\|)$$
with similar expressions for the other projections. Hence from (3.15)
$$\tilde A^{\dagger} = A^{\dagger} - A^{\dagger}PERA^{\dagger} + (A^{H}A)^{\dagger}RE^{H}P_\perp + R_\perp E^{H}P(AA^{H})^{\dagger} + O(\|E\|^2). \tag{3.35}$$

We could apply this formula, as we did the corresponding formula for the inverse, to calculate $\partial A^{\dagger}/\partial\alpha_{ij}$; however, the results are complicated. Instead let us assume that $A(\tau)$ is a differentiable function of $\tau$ with
$$\operatorname{rank}[A(\tau)] = \operatorname{rank}[A(\tau')]$$
for all $\tau$ and $\tau'$. Then $A(\tau)^{\dagger}$ is a differentiable function of $\tau$ and
$$\frac{dA^{\dagger}}{d\tau} = -A^{\dagger}P\frac{dA}{d\tau}RA^{\dagger} + (A^{H}A)^{\dagger}R\frac{dA^{H}}{d\tau}P_\perp + R_\perp\frac{dA^{H}}{d\tau}P(AA^{H})^{\dagger}. \tag{3.36}$$

The reduced form of (3.35) can be computationally useful. From the results of the last section we have
$$\tilde A_{11}^{-1} = A_{11}^{-1} - A_{11}^{-1}E_{11}A_{11}^{-1} + O(\|E_{11}\|^2).$$
From (3.27) in the proof of Lemma 3.11 we have
$$\begin{pmatrix} I\\ F_{21} \end{pmatrix}^{\dagger} = (I\ \ A_{11}^{-H}E_{21}^{H}) + O(\|E_{11}\|\|E_{21}\|)$$
and
$$(I\ \ F_{12})^{\dagger} = \begin{pmatrix} I\\ E_{12}^{H}A_{11}^{-H} \end{pmatrix} + O(\|E_{11}\|\|E_{12}\|).$$
Hence from (3.6)
$$\tilde A^{\dagger} = \begin{pmatrix} A_{11}^{-1} - A_{11}^{-1}E_{11}A_{11}^{-1} + O(\|E_{11}\|^2) & (A_{11}^{H}A_{11})^{-1}E_{21}^{H} + O(\|E_{11}\|\|E_{21}\|)\\[4pt] E_{12}^{H}(A_{11}A_{11}^{H})^{-1} + O(\|E_{11}\|\|E_{12}\|) & E_{12}^{H}(A_{11}^{H}A_{11}A_{11}^{H})^{-1}E_{21}^{H} + O(\|E_{11}\|\|E_{12}\|\|E_{21}\|) \end{pmatrix}.$$

Notes and References

Much of the material in this and the next two sections has been taken, sometimes word for word, from a survey article by Stewart [205, 1977].

The notion of acute subspaces is due to Davis and Kahan [53, 1970], who used the second condition of Theorem 3.1 as a characterization. The notion of an acute perturbation of a matrix is due to Wedin [260, 1973].

Penrose [178, 1955] established that the pseudo-inverse is continuous only if the rank is unchanged. However, he used techniques that do not give explicit perturbation bounds. The subject was revived by Golub and Wilkinson [94, 1966], whose interest in stable algorithms for solving least squares problems
(see [88]) led them to derive first-order perturbation bounds for least squares solutions. The first perturbation bounds for the pseudo-inverse itself were given by Ben-Israel [24, 1966], who restricts his class of perturbations so that (in reduced form) only $E_{11}$ is nonzero. More general bounds for acute perturbations were established by Hanson and Lawson [102, 1969], Pereyra [180, 1969], and Stewart [199, 1969]. Theorem 3.12 extends Stewart's bound to unitarily invariant norms. An identity in terms of projections related to (3.6) is given by Wedin [260, 1973], who uses it to derive bounds for acute perturbations.

The general results in the second subsection are essentially due to Wedin [260, 1973]. Theorem 3.4 is an extension by Wedin of a theorem of Stewart [199, 1969]. In an earlier report Wedin [258, 1969] considers the sharpness of the constants $\mu$ in Theorem 3.9 and shows that for the spectral norm the constant $\mu$ cannot be made smaller.

Early differentiability results have been given by Pavel-Parvu and Korganoff [176, 1969], Hearon and Evans [107, 1968], and Decell [54, 1972]. Wedin [258, 1969] derived the formula (3.36), as we do, from (3.15). The same results for functions of several variables were derived independently by Golub and Pereyra [90, 1973] in connection with separable nonlinear least squares problems. For more, see [91].

Exercises

1. Show that if $X$ has linearly independent columns and $B$ is positive definite, then $\mathcal{R}(X)$ and $\mathcal{R}(BX)$ are acute.

2. Let $X_1$ and $Y_1$ have full column rank and suppose that $\mathcal{R}(X_1)$ and $\mathcal{R}(Y_1)$ are acute. Show that if the columns of $X_2$ span $\mathcal{R}(Y_1)^\perp$ then $(X_1\ X_2)$ is nonsingular. [Hint: Use canonical bases.]

3. Show that $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$ is not sufficient for $\tilde A$ to be an acute perturbation of $A$.

4. Give an example of matrices $A$ and $\tilde A$ such that $\|P_{\tilde A} - P_A\|_2 < 1$, while $\|R_{\tilde A} - R_A\|_2 = 1$.

5. Let $\kappa_2(A) = \|A\|_2\|A^\dagger\|_2$.
Show that if rank(A) < rank(A) then IIA - AI12 > K ( A ) IIAII2 - , and the bound can be attained. 4. PROJECTIONS 153 6. (edin [258]): Show that the constants for the spectral norm in The- orem 3..9. re optunaL [Note: this is a hard problem, Wedin solves it, not by elllbltmg IIlt.rices for which the bounds are attained, but by exhibiting matrIces for wIndr the bounds are asymptotically sharp.] 7. Let \]11> be defined by (3.24). Show that lal :::; 1  Inl\]l1>(F):::; \]I1>(aF) and IIFII < [1 + aax(F)] - \]I1>(F) :::; IIFII. 8. Let A be of full rank. Calculate a At I Oaij. 4. Projections I this section we shall consider how the projection P varies with A. Sll1ce P = AA t, we can obtain perturbation bo.unds for P from the theory develope in he last section. However we can derive sharper bouds by workll1g dIrectly with Olle of the decompositiolls of At. In partIcular we shall work :vith the decomposition (3.6) based on the reducd forms of A and A. The use of this form presupposes that A and A are acute. This is no loss, since, as with the pseudo-inverse we ust reuire rank(A) = rank(A) to ensure the continuity of PA, wllich 111 turn Implies that the perturbation is ultimately acute. Theorem 4.1. Let A be an acute perturbation of A, and let k, be defined as in Theorem 3.12. Then liP - PII2 ::; k,II E 21lldIiAIl 1 [1 + (k,II E 21112/I1AIIPJ> < 1. Proof. With F 21 defined as in Theorem 3.3, we have (4.1 ) R(A) = R ( I ) . F 21 The matrix ( l ) (I + F2F21)-I(I FD 
154 III. LINEAR SYSTEMS AND LEAST SQUARES is a Hermitian, idempotent matrix whose column space is R(il); hcncc it is P. It follows that - ( (I + FJF2Itl - I (I + FJF2Itl FJi ) ( 4.2) P - P = II -I H ) I FH ' F 21 (I + F 21 F 21 ) F 21 (I + F 21 F 21 - 21 from which it is easily verified that - 2 ( f'1{F21(I+FJ{F21)-1 a ) . (4.3) (P - P) = a F 21 (I + F2 F 2 J)-1 FJi N ow the nonzero singular values of the diagonal blocks in (4.3) are a;(F 2 d/[1 + a;(F 2 dJ, where the ai(F 21 ) are the nonzero singular values of F 21 . The ound follows from the fact that the largest singular value al of F 21 satIsfies A IIE 21 1b . al(F 21 ) = IIF 21 1b ::; '" IIAII . In terms of projections, t.he bound (4.1) can be written in the form kIlP.L ER II2/IiAII 1 < 1. liP - PI12 ::; [1 + (kIIP.L ER I12/IIAIIF]"2 The bound is interesting in several ways. First, it is independent of E I2 d E S econd its de p endence on Ell is only through the constant an 22., . . ll' t zero k. Third, the bound is always less than Ulllty. Fma y, It goes 0 along with E21" . h If the hypotheses of Carollary 3.13 are satisfied. (that IS, w en IIA]}112I1Ellll < 1), then we may replace k by "'h 1Il (4.1). Thus, '" serves as a condition number for P A. Asymptotic forms may be obtained in the usual way from (4.2). Indeed, - ( 0(IIE21112) FJ{ + 0(IIE 21 11 3 ) p - P = F 21 + O(lIE21113) 0(IIE 21 11 2 ) . 5. LEAST SQUARES PROBLEMS 155 In terms of projections P = P + P.LERAt + AtIlREHp.L + 0(II P .L E RII2). It follows that if A( T) is differentiable and varies without changing rank, then P( T) is differentiable and dP = P.L dA RAt + Atlf R dAH Pi." dT dT dT (4.4) Notes and References Theorem 4.1 is due to Stewart [205, 1977]. The expression (4.4) for the derivative is due to Golub and Pereyra [90, 1973]. 5. The Linear Least Squares Problem In this section we will derive perturbation bounds for the solution of the least squares problem of minimizing lib - Axlb and bounds for the resulting residual vector. 
Although the solution of minimum norm is given by $x = A^\dagger b$, the perturbation theory of Section 3 again does not give the best possible results.

We shall assume throughout this section that $\tilde A$ is an acute perturbation of $A$, and we shall work with the reduced form of the problem. In this form $x$ is replaced by $V^H x$ and $b$ is replaced by $U^H b$. If $x$ and $b$ are partitioned in the forms
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix},$$
then
$$x_1 = A_{11}^{-1} b_1, \qquad x_2 = 0.$$
Moreover, the norm of the residual vector $r = b - Ax$
15G III. LINEAR SYSTEMS AND LEAST SQUARES is gi ven by 111'112 = Ilb 2 112" In the theorems to follow we shall freely use the definitions made in the previolls sections (e.g., k, K" and!,). As in Sections 3 and 4, the nlllnber K may be replaced by K,h whenever IIAtlbllEllll < 1. OnE' additional piece of notation will be lleeded" In analogy with ('L I:n, ddil1(, IIAllllxlb 1]= IIb l 1l 2 . Since b] = AllXI, we have 1] 2': 1. Also Ilxll ::; IIAtlillblll, which shows that 1/ ::; K,. When A is ill conditioned, that is, when At is large, the vector :r may be either large or small. In the first case 1] is near K" and we shall say that x reflects the ill-conditioning of A. 5.1. Perturbation of the eoefficients We begin by bounding the effects of perturbations in b" Theorem 5.1. Let b = b + e, x = Atb, and x = Atb. Then IIx - Xll2 K, II Pe l12 <--. IIxlb - 1} IIPbll 2 Proof. With the obvious partitioning of e we have i-x = Ailel , so that IIi - xl12 S; IIAill112l1ellb" (5.2) But 11.1:112 = 1/lIb l lb/IiAII, which combined with (5.2) yields (5.1). . Theorem 5.1 shows that. the perturbation in x is determined by the projectioll of e onto R(A). However, Pe is normalized by IIPbIl2, and if this latt.er quantity is small, the perturbation may be large. Since IIbll = IIPbll + IIrll, this observation may be summarized by saying that large residuals are' troublesome, a statment that will be amply confirmed later. Since 1/ can be as large as K" the number K, cannot be taken as a' conditioll llumber for perturbations in b without further qualification. (5.1) 11(112 - 112)Ai,/b I 1l 2 < W 2 ( k El2 ) I I x il - IIAII 2. 5. LEAST SQUARES PROBLEMS 157 If x does not reflect the ill-conditioning of 4 tl' . K. is a condition number As' . '. lell 1} IS near Ul1lty and . . t ' : 1} glowS the solutlOll becomes increasin g l y msensl Ive to perturbatIOns ill b. We next turn the effects of a perturbation in A on x. Theorem 5.2. .Let x = Atb and i: = .1tb, where A acute perturbatIOn of A. 
Then = A + E is all li-xI2 <k"Ellib ( ,EI2 ) IIxll2 - IfAII + W K, IIAII + k2 11Edl2 ( 1-l lIb2112 IIE21112 ) IIAII } IIb l ll 2 + IfAII . (5.3) X-x = J{2(.1i/ - Ai/)b l + (J{2 - I{2)Ai/bl + .J{2 .1 1'/(141 - 141)b. (5.4) Then 1/J12(.1il - Ail )bI//2 < k IIEllll2 11 x li - IIAII 2, J12 .1 i/(111 - IJI)b = J12 .1 1'/[(I + F2F2d-1 - I]b l *-,' + JL.1I'/(I + F2F2d-IF2b2.  bound the first term in (5.7), note that (I + FiIF 2 d- 1 - I = _ ( I + F.Hp. ) -I p, 1I T:' 21 21 21r21. II t - I J I2 A il [(I + F2 F2d - I]bdl2 ::; lIi/1I211(I + F2F2d-1112I1FJIII2I1F21bdb s; IIAlllIIIE2I.1l/blIl2 S lI.1l/IIIIE2111lIx"2 _ [ ,  ] 2 II II - K, IIAII x 2. (5.5) (5.6) (5.7) (5.8) 
158 111. LINEAR SYSTEMS AND LEAST SQUARES For the second term in (5.7) we have IIJJIA;/ (I + FJ[F2d- 1 F 21 b 2 112 :S IIA11111IIE2111b211 = 11/1 1 / 1IIIE21112 :::i 1]-llIxlbIIAIl < -1'2 I1E2111>Ji  11 II _ 1] K IIAII 11"111 x 2. The bound (5.3) follows on combining (5.4)-(5.9). . The first two terms in (5.3) are unexceptionable. The first term corresponcls to the classical result for linear systems and is the only nonzero term when A is square and nonsingular. The second term depends on P ERJ. and vanishes when A is of full column rank, as it is in many applications" The third term J'('quires more explanation. If terms of second order in IIE2111 are ignored, it is essentially (5.9) '2 11b 2 11211E21112 = t e 211E2d12 K Ilbllh IIAlj - an 1] IIAlj , where e is the angle subtended by band R(A). The number 1]-1 tan e can vary from a to (x). It is small when e is small (i.e., when the residual vector is small). It is also reduced in size when IIEulb is small and x reflects the ill-conditioning of A so that 1]  K  k. When x does not reflect the ill-conditioning of A and e is significant, it is of order k 2 , thus making the third term in (5.3) the dominant one. We have bounded the third term in the decomposition (5.4) in such a way as to reflect its behavior when E 21 is small. In fact it is bounded for all values of E 21 , and the third term in (5.3) may be replaced by ,lIbll 2 ( , E21 ) K1] llb l l1 2 W 2 K IIAII . For the full rank case there is a structured perturbation theorem for least squares solutions. Theorem 5.3 (Bjorck). Let A be offull column rank and let x = Ab. Let Sand 8 /J(' llOnncgati ve, ilnd assume that lEI :S eS and lel:S es. (5.10) , 5. LEAST SQUARES PROBLEMS 159  _i is=an solution f .the least squares problem of minimizing lib _ Xll2 - II r ll2 (n.b., A IS not assumed to be an acute perturbation of A), then for any absolute norm II . II IIi - xII :S e[IIIAtl(s + Slxl)II + III(AHA)-'ISTIFIII]. (5.11) Proof. 
By Theorem 1.5, we have Hence CA : E)H A  E )(; ) (  ) ( I A ) ( F ) ( b + e - Ei ) A" 0 j. -E"1" (: 11 )' (: -(::t')' (5.12) x - Atb = Ate - AtEx + (AIIA)-I ETF. Since x = Atb, we have on taking absolute values and using (5"10) '. Ix - xl :S e[lAtl(s + Six!) + I(AIIA)-IISTIFI]" ;,The inequality (5.11) now follows on taking norms in (5.13). . '" '..v (5.13) .Remark 5.4. The proof of the theorem works if the matrix (A + E)H in (5.12) is replaced by (A + Fl', where IFI S eS. ( . As it stands, the theorem is unsatisfactory for applications in which 1!i" IS not kno.wn explicitly, for in that case it is impossible to compute . However, If we compute f = b - Ax, then 1 = 1 - Ex. It follows that IFI :S If I + eSlxl, d we may use this upper bound in place of In in (5.11). Note that adjustment is only of order e 2 and will usually be negligible. 
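The componentwise bound of Theorem 5.3 can be checked numerically on a small real problem. The sketch below is only an illustration under stated assumptions: it reads the bound as $|\tilde x - x| \le \epsilon\,[\,|A^\dagger|(s + S|\tilde x|) + |(A^TA)^{-1}|\,S^T|\tilde r|\,]$, with $\tilde x$ and $\tilde r$ the solution and residual of the perturbed problem, and the matrix, right-hand side, sign patterns, and helper names are all hypothetical choices that do not come from the text.

```python
# Sketch: componentwise check of a Bjorck-type bound on a 3 x 2 real example.
# All data and helper names are illustrative assumptions.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def inv2(M):
    # Inverse of a 2 x 2 matrix.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def absmat(M):
    return [[abs(a) for a in row] for row in M]

def least_squares(A, b):
    # Returns x = pinv(A) b, pinv(A), and inv(A^T A) for a full-rank A with two columns.
    At = transpose(A)
    G_inv = inv2(matmul(At, A))
    pinv = matmul(G_inv, At)
    return matvec(pinv, b), pinv, G_inv

A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0]
eps = 1e-2
S = absmat(A)                      # tolerance matrix: |E| <= eps * S
s = [abs(v) for v in b]            # tolerance vector: |e| <= eps * s
signs_E = [[1, -1], [-1, 1], [1, 1]]
signs_e = [-1, 1, 1]
E = [[eps * sg * Sij for sg, Sij in zip(sr, Sr)] for sr, Sr in zip(signs_E, S)]
e = [eps * sg * si for sg, si in zip(signs_e, s)]

x, pinv, G_inv = least_squares(A, b)
A_t = [[a + d for a, d in zip(ra, rd)] for ra, rd in zip(A, E)]
b_t = [bi + ei for bi, ei in zip(b, e)]
x_t, _, _ = least_squares(A_t, b_t)                     # perturbed solution
r_t = [bi - ai for bi, ai in zip(b_t, matvec(A_t, x_t))]  # perturbed residual

abs_x_t = [abs(v) for v in x_t]
abs_r_t = [abs(v) for v in r_t]
term1 = matvec(absmat(pinv), [si + t for si, t in zip(s, matvec(S, abs_x_t))])
term2 = matvec(absmat(G_inv), matvec(transpose(S), abs_r_t))
bound = [eps * (t1 + t2) for t1, t2 in zip(term1, term2)]
error = [abs(a - c) for a, c in zip(x_t, x)]

for err, bd in zip(error, bound):
    print(f"error {err:.3e} <= bound {bd:.3e}")
```

The second term of the bound carries the factor $|(A^TA)^{-1}|$ and the residual, which is how the squared conditioning enters when the residual is not small.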
5.2. The Residual

Since the residual vector is given by $r = P_\perp b$, the theory of Section 4 may be applied to give perturbation bounds for the residual. Specifically, if $\tilde x = \tilde A^\dagger b$ and $\tilde r = b - \tilde A\tilde x = \tilde P_\perp b$, then
$$\|\tilde r - r\|_2 \le \|\tilde P - P\|_2\,\|b\|_2,$$
and $\|\tilde P - P\|_2$ can be bounded by (4.1).

5.3. Backward Perturbations

The problem of backward perturbations for least squares problems is far more difficult than the corresponding problem for linear systems. To see why, let $\tilde x$ be a purported solution of the problem of minimizing $\|b - Ax\|_2$ and let $\tilde r = b - A\tilde x$. What we would like is to find a perturbation $E$ such that $\tilde x$ is an exact solution of the problem of minimizing $\|b - (A+E)x\|_2$; let $\hat r = b - (A+E)\tilde x$ denote the corresponding residual. For linear systems all we need do is produce an $E$ for which $\hat r$ is zero. For the least squares problem, however, we must choose $E$ so that the residual is orthogonal to the column space of $A + E$; i.e., so that $(A+E)^H\hat r = 0$. Since $\hat r$ is defined in terms of $E$, this equation is nonlinear in $E$, and at present we know only special solutions.

Theorem 5.5. Let $\tilde x$ be given. Let $x = A^\dagger b$, $r = b - Ax$, and $\tilde r = b - A\tilde x$. Then $\tilde x = (A + E_i)^\dagger b$ $(i = 1, 2)$ for
$$E_1 = -\frac{\tilde r\tilde r^H A}{\|\tilde r\|_2^2},$$
in which case
$$\|E_1\|_2 = \frac{\|A^H\tilde r\|_2}{\|\tilde r\|_2},$$
and for
$$E_2 = \frac{(\tilde r - r)\tilde x^H}{\|\tilde x\|_2^2},$$
in which case
$$\|E_2\|_2 = \frac{\|\tilde r - r\|_2}{\|\tilde x\|_2} = \frac{(\|\tilde r\|_2^2 - \|r\|_2^2)^{1/2}}{\|\tilde x\|_2}. \tag{5.14}$$

Proof. The proof for $E_1$ is a straightforward, if tedious, verification that $(A + E_1)^H\hat r = 0$. For $E_2$, note that $\tilde r - r = A(x - \tilde x) \in \mathcal{R}(A)$. Hence $\mathcal{R}(A + E_2) \subset \mathcal{R}(A)$. But it is easy to verify that $b - (A + E_2)\tilde x = r$. Hence
$$b - (A + E_2)\tilde x \in \mathcal{R}(A)^\perp \subset \mathcal{R}(A + E_2)^\perp,$$
which is sufficient for $\tilde x$ to be a solution of the perturbed least squares problem. The first equality in (5.14) follows on taking norms. The second follows from the Pythagorean equality and the observation that since $\tilde r - r \in \mathcal{R}(A)$, we have $\tilde r - r \perp r$.
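The two backward perturbations of Theorem 5.5 can be verified directly on a small real example. The sketch below assumes the forms $E_1 = -\tilde r\tilde r^TA/\|\tilde r\|_2^2$ and $E_2 = (\tilde r - r)\tilde x^T/\|\tilde x\|_2^2$; the matrix, right-hand side, approximate solution, and helper names are hypothetical. Since both perturbations have rank one, their Frobenius norms equal their spectral norms, which lets the norm formulas be checked without a singular value routine.

```python
import math

# Sketch: check the backward perturbations E1 and E2 on a 3 x 2 real example.
# Data and helper names are illustrative assumptions.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def outer(u, v, scale):
    return [[scale * ui * vj for vj in v] for ui in u]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def fro(M):
    return math.sqrt(sum(a * a for row in M for a in row))

A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0]
At = transpose(A)

# True least squares solution and residual via the normal equations.
x = matvec(inv2(matmul(At, A)), matvec(At, b))
r = [bi - ai for bi, ai in zip(b, matvec(A, x))]

# A purported (inexact) solution and its residual.
x_t = [0.7, 0.45]
r_t = [bi - ai for bi, ai in zip(b, matvec(A, x_t))]

# E1 = -r_t r_t^T A / ||r_t||^2, written as a scaled outer product r_t (A^T r_t)^T.
Atr = matvec(At, r_t)
E1 = outer(r_t, Atr, -1.0 / dot(r_t, r_t))
A1 = add(A, E1)
res1 = [bi - ai for bi, ai in zip(b, matvec(A1, x_t))]
g1 = matvec(transpose(A1), res1)      # should vanish: x_t solves the E1 problem

# E2 = (r_t - r) x_t^T / ||x_t||^2.
diff = [a - c for a, c in zip(r_t, r)]
E2 = outer(diff, x_t, 1.0 / dot(x_t, x_t))
A2 = add(A, E2)
res2 = [bi - ai for bi, ai in zip(b, matvec(A2, x_t))]   # should equal r
g2 = matvec(transpose(A2), res2)      # should vanish: x_t solves the E2 problem

print("||E1|| =", fro(E1), " formula:", norm(Atr) / norm(r_t))
print("||E2|| =", fro(E2), " formula:",
      math.sqrt(dot(r_t, r_t) - dot(r, r)) / norm(x_t))
```

Only $E_1$ is computable in practice, since $E_2$ needs the true residual $r$; the example computes $r$ solely to confirm the identities.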
The perturbation $E_1$ and its norm can be computed if we are given $\tilde x$. It is small when the residual $\tilde r$ is nearly orthogonal to the column space of $A$. The perturbation $E_2$ cannot be computed, since it involves the true residual $r$, which is not known. However, it has the theoretical consequence that there is little use hunting for the exact minimizing $x$. Provided the residual is nearly minimal, the approximate solution $\tilde x$, however inaccurate, is the exact solution of a slightly perturbed problem.

The result for $E_2$ is only one of a class of backward perturbation theorems. For example, if it is desirable that some of the columns of $A$ not be altered by the perturbation, we can proceed as follows. Let $\hat x$ be the vector obtained from $\tilde x$ by setting to zero the components corresponding to the columns that are not to be disturbed. Then
$$E = \frac{(\tilde r - r)\hat x^H}{\|\hat x\|_2^2}$$
is the required matrix. Of course $\|\hat x\|_2 \le \|\tilde x\|_2$, so that $\|E\|_2 \ge \|E_2\|_2$; however $\|E\|_2$ may still be small enough for practical purposes.

The attempt to formulate a structured backward perturbation theorem for the least squares problem leads to an intractable optimization problem. However we may apply the Oettli-Prager theorem (Theorem 2.17) to the expanded equations to get the following useful result. The proof is left as an exercise.
162 III. LINEAR SYSTEMS AND LEAST SQUARES ( k) Let ' 0 E R rnxn and s E R rn be nonnegative. Theorem 5.6 Bjorc. ,.J For b, 1 E ern and x E en let { 11 + A.i: - bl i IAlIfli } f = max IIlflx (Slxl + s); , mfx (STlfl)i . ( here % = 0 and otherwise p/O = 00). If f "# 00, then there are . lEI I F I < fS and a vector e satisfying matrices E and F satisfYlllg , le\ ::; fS such that CA : F)H A  E )( :) (b  c ) This theorem does not say that f and x are solutions of a slghty perturbed least square problem, since the perturbation E in A IS chf- ferent from the perturbation F H in A H . Nonetheless, by Remark 5.4, tl bound on E and F can be used in Theorem 5.3 to assess Ie common the accuracy of X. 5.4. Asymptotic Forms and Derivatives An asymptotic form of the perturbed least squares solution x can be obtained from (3.15): .i = x - At P ER.T + RJ.E II P(AII)t x + (A H A)t RE H PJ.b + 0(IIE\12). The corresponding derivative formula is dA H dA H 2 d.T = _Atp dA Rx + RJ. _P(AH)t x + (AIIA)t R-PJ.b + O(IIEII ). &    In reduced form ( XI ) = ( XI ) + ( -Al/ EllXI  (tIAll)-1 E 21 b 2 ) + 0(IIEI1 2 ). X2 X2 EI2 A ll XI 5. LEAST SQUARES PROBLEMS IG3 Notes and References The first perturbation analysis of the least squares problem is due to Golub and Wilkinson [94, 1966], who gave first order bounds. They were the fin;t to note the dependence of the solution on ",2. Rigorous upper bounds were derived by Hanson and Lawson [102, 1969], Pereyra [180, 19 6 9], Stewart [199, 1969], and Wedin [258, 1969]. More recent treatments have been given by Lawson and Hanson [142, 1974] and Adbelmalek [1, 1974]. Van der Sluis [245, 1975] gives an especially detailed treatment. He was the fin;t to point out the mitigating effect of 71 in (5.3). Strictly speaking, K, is not a condition number for the least squares prob- lem- at least not in the simple sense we have been using the term. Nonethe- less, it is called the conditiou Il1lmber of A everywhere iu the literat.ure. 
Statisticians are concerned with the effects of errors in the matrix $A$, a problem they treat under the names "errors in the variables" or "measurement error models." One approach to the problem is to pose a probabilistic model of the error and investigate its effects [116, 49, 80]. Another approach is to compute "regression diagnostics" to tell when the error is having harmful effects [22, 213]. Yet another approach is given in Exercise 5.8.

The structured perturbation theorem (Theorem 5.3) is due to Björck [36, 1989], who also noted that it remains valid when the perturbations of $A$ in the augmented equations are different (Remark 5.4). Arioli, Duff, and de Rijk [6, 1989] have used this fact to analyze the errors in algorithms based on the expanded equations (1.7).

Theorem 5.5 is due to Stewart [206, 1977], and Theorem 5.6 to Björck [36, 1989]. The problem of obtaining optimal backward perturbation bounds, structured or not, is unsolved. See [112] for further details.

Exercises

1. Show that $\kappa_2(A^HA) = \kappa_2^2(A)$.

THE FOLLOWING EXERCISES USE FIRST ORDER PERTURBATION THEORY TO EXPLORE THE SENSITIVITY OF LEAST SQUARES SOLUTIONS TO ERRORS IN INDIVIDUAL COLUMNS.

2. Let $A$ be of full column rank and $x = A^\dagger b$. Let $E = e\mathbf{1}_i^H$, where $\mathbf{1}_i$ is the $i$th unit vector, and let $\xi_i$ be the $i$th component of $x$. Show that
$$\tilde x - x = -\xi_i A^\dagger e + c_i^{(-1)} e^H r + O(\|e\|^2),$$
lG/t III. LINEAR SYSTEMS AND LEAST SQUARES where c(-I) is the ith column of the cross-product matrix 0 = AliA and '" r = Ii - Al: is the residual vector. 3. Show that 11:1: - .7:112 :S IIell2 (liIIlAtI12)2 + (11c;-I)112I1rI12)2 + O(lIell). 4. Show that Ii - il :S lIel12 Ila)t)II + (h'S-I)lll r I12)2 + O(llell), where ajt) is the jth row of At and IS-I) is the (i,j)-element of 0- 1 . -0- 5. (A quick and dirty bound). Let A and A have full rank. Starting from the normal equations All Ax = Allb, use the perturbation theory for linear systems to show that for any consistent norm II . II IIxlllIxll :S K,(A) \\:: (1 + I,':'D + K,2(A) \\:: ( 1 + \\D . G. (Higham and Stpwart [114, JU87]) . Let A be of full colulIln rauk and let o = All A. Show that if F is sufIiciently small then 0 + F can be written in the form 0+ F = (A + E)II(A + E), where 1 IIEIIF;S 2I1AtIl2IJFIIF' Show that this bound is realistic. 7. The vector l' in Theorem 5.6 can be regarded as an arbitrary parameter. Write down t.he bounds obtained when f = Ii - Ai: and f = O. [Note: the problem of determining the optimal f is open.] 8. (Stewart [217, 1989]). Let A have full rank. Show that x = At(1i + Ex) + 0(11E1I2). Give an expression and a bound for the 0(IIEI1 2 ) term. [Note: To the statistician, this result says that if E is small enough, the least squares solution x behaves as if it came from an unperturbed problem in which the error in t.he right-hand side has been inflated.] Chapter IV The Perturbation of Eigenvalues Of al the problems in matrix perturbation theory the perturbation of the eIgenvalues of a matrix presents the most varied technical difficul- ties. The problem itself is simply stated" Given a matrix A E e nxn and a pertrbtion. E. of A, how are the spectra £(A) and £(A + E) related? But thIs sImplIcIty is elusive. In the first place, the term "related" has more than one natural sense, as we shall see in the first section of this chapter. 
More important, different classes of matrices, even matrices having the same eigenvalues, behave differently under perturbation.

Example 1. Let $A_0 = 0$. Then the eigenvalues of $A_0$ are all zero. Let $E$ be a perturbation of $A_0$. Since $A_0 + E = E$, it follows from Theorem II.2.6 that
$$\tilde\lambda \in \mathcal{L}(A_0 + E) \implies |\tilde\lambda| \le \|E\|$$
for any consistent matrix norm $\|\cdot\|$. On the other hand, let
$$A_1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
Then the eigenvalues of $A_1$ are also zero. However, if $\epsilon > 0$, the eigenvalues of
$$\tilde A_1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \epsilon & 0 & 0 & 0 \end{pmatrix}$$
are $\epsilon^{1/4}\omega_i$, where the $\omega_i$ are the 4th roots of unity (i.e., $1, i, -1, -i$). Thus while a perturbation of order, say, $10^{-8}$ will induce a perturbation of order only $10^{-8}$ in the eigenvalues of $A_0$, it can induce a perturbation as large as $10^{-2}$ in the eigenvalues of $A_1$.

This example shows that a general perturbation theory for eigenvalues has to be pessimistic, since it must account for the ill-conditioned behavior of the eigenvalues of matrices like $A_1$. The cure for this problem is to develop individual perturbation theories for different classes of matrices, which is what we will do in this chapter. We begin in Section 1 with the general case. In Section 2 we introduce the very useful Gerschgorin theorem and use it to compute the derivative of a simple eigenvalue. In Section 3 we treat normal and diagonalizable matrices, and in Section 4 Hermitian matrices. The chapter concludes with a section on special topics.

As in the last chapter we will use the tilde conventions to denote perturbations. Specifically, $A$ will denote a (complex) matrix of order $n$, and $\tilde A = A + E$ will denote a perturbation of $A$. The eigenvalues of $A$ are $\mathcal{L}(A) = \{\lambda_1, \ldots, \lambda_n\}$ and those of $\tilde A$ are $\mathcal{L}(\tilde A) = \{\tilde\lambda_1, \ldots, \tilde\lambda_n\}$. As usual the characteristic polynomials of $A$ and $\tilde A$ will be written $\phi_A(\lambda)$ and $\phi_{\tilde A}(\lambda)$.

1. General Perturbation Theorems

1.1. Continuity: Ostrowski-Elsner Theorems

The first thing that we will establish about eigenvalues is that they are continuous, which follows from a fact and a theorem. The fact is that the characteristic polynomial of a matrix, being itself a polynomial in the elements of the matrix, is a continuous function of the matrix. The theorem is Rouché's theorem.
In the form we will use here, it states that if $\phi$ and $\eta$ are analytic in a simply connected region $D$ and $\mathcal{D} \subset D$ is a disk for which
$$|\eta(\zeta)| < |\phi(\zeta)|, \qquad \zeta \in \partial\mathcal{D},$$
where $\partial\mathcal{D}$ is the boundary of $\mathcal{D}$, then $\phi(\zeta)$ and $\phi(\zeta) + \eta(\zeta)$ have the same number of zeros in $\mathcal{D}$.

Theorem 1.1. Let $\lambda$ be an eigenvalue of $A$ of algebraic multiplicity $m$. Then for any norm $\|\cdot\|$ and all sufficiently small $\epsilon > 0$ there is a $\delta > 0$ such that if $\|E\| < \delta$, the disk $\mathcal{D}(\lambda, \epsilon) = \{\zeta \in \mathbb{C} : |\zeta - \lambda| < \epsilon\}$ contains exactly $m$ eigenvalues of $\tilde A$.

Proof. Let $\epsilon$ be so small that $\mathcal{D}(\lambda, \epsilon)$ contains only the eigenvalue $\lambda$ of $A$. Let $\eta(\zeta) = \phi_{\tilde A}(\zeta) - \phi_A(\zeta)$. By the continuity of the characteristic polynomial, as $\tilde A \to A$ the function $\eta(\zeta)$ converges to zero on the compact set $\partial\mathcal{D}$. Since $\phi_A(\zeta)$ is nonzero on $\partial\mathcal{D}$, there is a $\delta > 0$ such that $|\eta(\zeta)| < |\phi_A(\zeta)|$ on $\partial\mathcal{D}$ whenever $\|E\| < \delta$. By Rouché's theorem $\phi_A$ and $\phi_{\tilde A} = \phi_A + \eta$ have the same number of zeros in $\mathcal{D}$.

Theorem 1.1 is an example of a qualitative perturbation theorem: it states that a perturbation must be small without providing a bound on the size of the perturbation. We now turn to a theorem of Elsner, which provides explicit bounds. However, we first need to introduce some notation to describe how the eigenvalues of two matrices are situated with respect to one another.

Definition 1.2. Let $A$ have eigenvalues $\lambda_1, \ldots, \lambda_n$ and $\tilde A$ have eigenvalues $\tilde\lambda_1, \ldots, \tilde\lambda_n$. Then the SPECTRAL VARIATION OF $\tilde A$ WITH RESPECT TO $A$ is
$$\mathrm{sv}_A(\tilde A) \overset{\mathrm{def}}{=} \max_j \min_i |\lambda_i - \tilde\lambda_j|. \tag{1.1}$$
The HAUSDORFF DISTANCE between the eigenvalues of $A$ and $\tilde A$ is
$$\mathrm{hd}(A, \tilde A) \overset{\mathrm{def}}{=} \max\{\mathrm{sv}_A(\tilde A), \mathrm{sv}_{\tilde A}(A)\}. \tag{1.2}$$
The MATCHING DISTANCE between the eigenvalues of $A$ and $\tilde A$ is
$$\mathrm{md}(A, \tilde A) \overset{\mathrm{def}}{=} \min_\pi \{\max_i |\tilde\lambda_{\pi(i)} - \lambda_i|\}, \tag{1.3}$$
where $\pi$ is taken over all permutations of $\{1, 2, \ldots, n\}$.
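The three quantities of Definition 1.2 are easy to compute for small spectra, which can help in getting a feel for how they differ. The following is a minimal sketch, assuming the eigenvalues are already available as lists; the function names and the sample spectra are hypothetical, and the matching distance is found by brute force over all permutations, so it is only suitable for small $n$.

```python
from itertools import permutations

# Sketch: spectral variation, Hausdorff distance, and matching distance
# computed from two lists of eigenvalues (real or complex).

def spectral_variation(lam, lam_t):
    # sv_A(A~): the farthest any eigenvalue of A~ lies from the spectrum of A.
    return max(min(abs(l - lt) for l in lam) for lt in lam_t)

def hausdorff_distance(lam, lam_t):
    # hd(A, A~): the larger of the two one-sided spectral variations.
    return max(spectral_variation(lam, lam_t), spectral_variation(lam_t, lam))

def matching_distance(lam, lam_t):
    # md(A, A~): best pairing of eigenvalues, by brute force over permutations.
    n = len(lam)
    return min(
        max(abs(lam_t[p[i]] - lam[i]) for i in range(n))
        for p in permutations(range(n))
    )

lam = [1.0, 2.0, 3.0]        # hypothetical eigenvalues of A
lam_t = [1.1, 2.05, 2.9]     # hypothetical eigenvalues of the perturbed matrix

sv = spectral_variation(lam, lam_t)
hd = hausdorff_distance(lam, lam_t)
md = matching_distance(lam, lam_t)
print("sv =", sv, " hd =", hd, " md =", md)
```

The brute-force search costs $n!$ pairings; for larger $n$ a bottleneck assignment algorithm would be the natural replacement, which is a design choice outside the scope of the text.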
The function $\mathrm{sv}_A(\tilde A)$ is not a metric: it may be zero, even when the eigenvalues of $A$ and $\tilde A$ are different (e.g., when $n = 2$ and $\lambda_1 = \tilde\lambda_1 = \tilde\lambda_2 = 0$ while $\lambda_2 = 1$). Geometrically, the function sv has the following interpretation. If
$$\mathcal{D}_i = \{\zeta : |\zeta - \lambda_i| \le \mathrm{sv}_A(\tilde A)\}, \qquad i = 1, \ldots, n,$$
then
$$\mathcal{L}(\tilde A) \subset \bigcup_{i=1}^n \mathcal{D}_i.$$
In other words, the eigenvalues of $\tilde A$ lie in the union of disks of radius $\mathrm{sv}_A(\tilde A)$ centered at the eigenvalues of $A$.

The Hausdorff distance bounds the spectral variation and is actually a metric. The matching distance bounds the Hausdorff distance and is also a metric. To say that the matching distance is small is one of the nicest things that can be said of the eigenvalues of a matrix and its perturbation. It means that they can be grouped into nearby pairs.

We are now in a position to bound the Hausdorff distance between the eigenvalues of two matrices.

Theorem 1.3 (Elsner). For any $A$ and $\tilde A$,
$$\mathrm{hd}(A, \tilde A) \le (\|A\|_2 + \|\tilde A\|_2)^{1 - \frac{1}{n}}\,\|E\|_2^{\frac{1}{n}}. \tag{1.4}$$

Proof. Since the right-hand side of (1.4) is symmetric in $A$ and $\tilde A$, it is sufficient to prove that it bounds $\mathrm{sv}_A(\tilde A)$. Assume the maximum in (1.1) is attained for the eigenvalue $\tilde\lambda$ of $\tilde A$, and let $x_1, \ldots, x_n$ be orthonormal vectors with $\tilde A x_1 = \tilde\lambda x_1$. Then
$$\begin{aligned}
\mathrm{sv}_A(\tilde A)^n &\le \prod_{i=1}^n |\lambda_i - \tilde\lambda| = |\det(A - \tilde\lambda I)| \\
&\le \prod_{i=1}^n \|(A - \tilde\lambda I)x_i\|_2 \qquad \text{[Hadamard inequality (I.2.2)]} \\
&= \|(A - \tilde A)x_1\|_2 \prod_{i>1} \|(A - \tilde\lambda I)x_i\|_2 \\
&\le \|E\|_2\,(\|A\|_2 + \|\tilde A\|_2)^{n-1}.
\end{aligned}$$
The result follows on taking $n$th roots in the above inequality and from the symmetry of the resulting bound.

As we have mentioned above, the most desirable bound is one on the matching distance. In some cases bounds on the spectral variation or the Hausdorff distance can be converted into such a bound. Since the technique, with appropriate variations, can be applied to other problems, we will develop it informally and then summarize the results.

Let us begin by relaxing our bound a little and writing
$$\mathrm{sv}_A(\tilde A) \le \mu\,\|E\|_2^{\frac{1}{n}} \equiv \epsilon,$$
where
$$\mu = (\max\{2\|A + \tau E\|_2 : 0 \le \tau \le 1\})^{1 - \frac{1}{n}}.$$
As above, set
$$\mathcal{D}_i = \{\zeta : |\zeta - \lambda_i| \le \epsilon\}, \qquad i = 1, \ldots, n.$$
The purpose of this adjustment is to make the bound monotone in $\tau E$.

We claim that if any $m$ of the disks $\mathcal{D}_i$ are isolated from the others, then their union contains exactly $m$ eigenvalues of $\tilde A$. To see this, assume without loss of generality that the $m$ disks isolated from the others are $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_m$. For $0 \le \tau \le 1$, let
$$A_\tau = \tau\tilde A + (1 - \tau)A = A + \tau E,$$
and let
$$\mathcal{D}_i^{(\tau)} = \{\zeta : |\zeta - \lambda_i| \le \mu\,\|\tau E\|_2^{\frac{1}{n}}\}.$$
Since $\|A\|_2 + \|A_\tau\|_2 \le \mu^{\frac{n}{n-1}}$, by Theorem 1.3 we have $\mathrm{sv}_A(A_\tau) \le \mu\,\|\tau E\|_2^{\frac{1}{n}} = \tau^{\frac{1}{n}}\epsilon$, and the eigenvalues of $A_\tau$ lie in the union of the disks $\mathcal{D}_i^{(\tau)}$.

Now $\bigcup_{i=1}^m \mathcal{D}_i^{(0)}$ contains exactly $m$ eigenvalues of $A_0 = A$, namely $\lambda_1(A), \lambda_2(A), \ldots, \lambda_m(A)$. Since $\tau^{\frac{1}{n}}\epsilon$ is an increasing function of $\tau$, as $\tau$ varies from zero to one the region $\bigcup_{i=1}^m \mathcal{D}_i^{(\tau)}$ remains disjoint from the other disks. Since by Theorem 1.1 the eigenvalues of $A_\tau$ are continuous in $\tau$, they cannot jump from one disjoint region to another. Hence $\bigcup_{i=1}^m \mathcal{D}_i^{(1)}$ must contain exactly $m$ eigenvalues of $A_1 = \tilde A$.
170 IV. THE PERTURBATION OF EIGENVALUES It is now easy to obtain a bound on mcl(A, A} Let C 1 , C 2 , . . . , C k be the connected components of Ui'=l Vi" If C 1 is the union of ml of the disks Vi, then it contains exactly m{ eigenvalues of A and m{ eigenvalues of X Choose the' permutation 11' to associate the eigenvalues of A in each C 1 with tlJ(' corresponding eige'nvalues of A" Since each eigenvalue of A in C{ is within (2ml - 1)6(A, A) of any point in C{, each eigenvalue of A in C 1 is within (2m/ - 1)6(A, A) :s: (2n - 1)6(A, A) of the corresponding eigenvalues of A. We have just established the following theorem. Theorem 1.4 (Ostrowski, Elsner). - - III md(A, A) :s: (2n - 1)(IIAII2 + IIAlb) -;;-IIEII2'. Actually, we have only used the fact that Theorem 1.3 gives a bound on sv II (A} Elsner, by an application of Hall's thearem (The- orem 11.3.14), has shown that the factor 2n - 1 can be replaced by 2ln/2J. To summarize: Theorem 1.5. LetT 2 O. Ift3(T) is a nondecreasing bound onsvA(A+ T E), then md(A, A) :s: (2n - 1)13(1). If t3(T) is a nondecreasing bound on hd(A, A + TE), then md(A, A) :s: 2ln/2Jt3(I). (1.5) 1.2. The Bauer-Fike and Henrici Theorems As was pointed out in t.he introduction to this chapter, any general perturbation bound on the eigenvalues of a matrix will have to be pes- simistic. In Theorems 1.3 and 1.4, this shows itself by the fact that 6(A, A) is proportional to the nth root of the error v(A, A). Exam- ple 1- or rather a trivial extension of it - shows that this nth root is necessary. However, in most ca.<;es it is unrealistic. To see one way in which this can come about, let us return to Example 1 and set A" = 17AI, where 17 is presumed small. Let E be given with IIEII2 = E. If E is much smaller than 17, Elsner's bound will gp]J('rally be' of the' right ordpr, unlpss E has special structure. On the othe'r hand, if E 2 1/, then IIA" + Elb :s: 1/ + E :s: 2E, and no eigenvalue of 1. 
$A_\eta$ can change by more than $2\epsilon$. The reason that the fourth root of the error is unrealistic in the second case is that $A_\eta + E$ can be regarded as a perturbation of the zero matrix, which is well behaved. In this subsection we will derive a bound, due to Henrici, that takes this phenomenon into account. It is based on a general theorem of Bauer and Fike.

Theorem 1.6 (Bauer-Fike). Let $Q$ be nonsingular, and let $\|\cdot\|$ be consistent. If $\tilde\lambda \in \mathcal{L}(\tilde A)$ is not an eigenvalue of $A$, then
$$\|Q^{-1}(A - \tilde\lambda I)^{-1}Q\|^{-1} \le \|Q^{-1}EQ\|. \tag{1.6}$$

Proof. We have
$$\begin{aligned}
Q^{-1}(\tilde A - \tilde\lambda I)Q &= Q^{-1}[(A - \tilde\lambda I) + E]Q \\
&= Q^{-1}(A - \tilde\lambda I)Q\,\{I + [Q^{-1}(A - \tilde\lambda I)^{-1}Q][Q^{-1}EQ]\}.
\end{aligned}$$
Since $\tilde A - \tilde\lambda I$ is singular,
$$1 \le \|[Q^{-1}(A - \tilde\lambda I)^{-1}Q][Q^{-1}EQ]\| \le \|Q^{-1}(A - \tilde\lambda I)^{-1}Q\|\,\|Q^{-1}EQ\|, \tag{1.7}$$
and this last inequality is equivalent to (1.6).

Note that if the left-hand side of (1.6) is regarded as zero when $\tilde\lambda \in \mathcal{L}(A)$, then the inequality holds for all eigenvalues of $\tilde A$. In the sequel we will not be over fussy in dealing with trivial singularities of this kind.

Our first application of the Bauer-Fike theorem is to prove Henrici's perturbation theorem. It is phrased in terms of a deviation from normality. Recall that if a matrix is normal, its Schur form is diagonal. Consequently the size of the off-diagonal terms in the Schur form can be used to measure the departure of a matrix from normality.

Definition 1.7. Let $\nu$ be a norm on $\mathbb{C}^{n \times n}$. Let $\mathcal{U}$ be the set of unitary $U$ such that $U^HAU$ is upper triangular. For each $U \in \mathcal{U}$ write $U^HAU = \Lambda_U + R_U$, where $R_U$ is strictly upper triangular. Then the $\nu$-DEPARTURE FROM NORMALITY of $A$ is the number
$$\delta_\nu(A) \overset{\mathrm{def}}{=} \min_{U \in \mathcal{U}} \nu(R_U).$$
The departure from normality is not easy to calculate, since the Schur form is not unique. However, if $A$ has eigenvalues $\lambda_i$, then by the unitary invariance of the Frobenius norm we have for any Schur form
$$\|A\|_F^2 = \sum_i |\lambda_i|^2 + \|R\|_F^2.$$
Thus we have the following theorem.

Theorem 1.8. For any matrix $A$ with eigenvalues $\lambda_i$,
$$\delta_F(A) = \sqrt{\|A\|_F^2 - \sum_i |\lambda_i|^2}.$$

We are now in a position to state and prove Henrici's theorem.

Theorem 1.9 (Henrici). Let $\nu$ be a norm on $\mathbb{C}^{n \times n}$ such that $\nu(C) \ge \|C\|_2$ for all $C \in \mathbb{C}^{n \times n}$. Then for every eigenvalue $\tilde\lambda$ of $\tilde A$ there is an eigenvalue $\lambda$ of $A$ such that
$$\frac{\left(\dfrac{|\tilde\lambda - \lambda|}{\delta_\nu(A)}\right)^{n}}{1 + \dfrac{|\tilde\lambda - \lambda|}{\delta_\nu(A)} + \cdots + \left(\dfrac{|\tilde\lambda - \lambda|}{\delta_\nu(A)}\right)^{n-1}} \le \frac{\|E\|_2}{\delta_\nu(A)}. \tag{1.8}$$

Proof. Let $\tilde\lambda$ be an eigenvalue of $\tilde A$, and let $U^HAU = \Lambda + R$ be a Schur form of $A$. Then by (1.6),
$$\|(\Lambda - \tilde\lambda I + R)^{-1}\|_2^{-1} \le \|E\|_2. \tag{1.9}$$
Since $R$ is strictly upper triangular,
$$(\Lambda - \tilde\lambda I + R)^{-1} = \{I - (\Lambda - \tilde\lambda I)^{-1}R + \cdots + (-1)^{n-1}[(\Lambda - \tilde\lambda I)^{-1}R]^{n-1}\}(\Lambda - \tilde\lambda I)^{-1}.$$
Thus if $\delta = \min\{|\lambda - \tilde\lambda| : \lambda \in \mathcal{L}(A)\}$,
$$\|(\Lambda - \tilde\lambda I + R)^{-1}\|_2 \le \delta^{-1}\{1 + \delta^{-1}\delta_\nu(A) + \cdots + [\delta^{-1}\delta_\nu(A)]^{n-1}\}.$$
Hence
$$\|(\Lambda - \tilde\lambda I + R)^{-1}\|_2^{-1} \ge \frac{\delta}{1 + \delta_\nu(A)/\delta + \cdots + [\delta_\nu(A)/\delta]^{n-1}}. \tag{1.10}$$
The theorem follows on combining (1.9) and (1.10) and dividing by $\delta_\nu(A)$.

The remarkable thing about Henrici's theorem is that it provides a continuous transition between the two cases mentioned at the beginning of this subsection: namely, the case in which the perturbation bound is proportional to the $n$th root of the error and the case in which it is proportional to the error itself. To see this, let
$$\psi(\eta) = \frac{\eta^n}{1 + \eta + \cdots + \eta^{n-1}}, \tag{1.11}$$
so that the left-hand side of the bound (1.8) has the form $\psi[|\tilde\lambda - \lambda|/\delta_\nu(A)]$. For $\eta$ small, $\psi(\eta) \cong \eta^n$, and the bound takes the asymptotic form
$$\frac{\mathrm{sv}_A(\tilde A)}{\delta_\nu(A)} \lesssim \left(\frac{\|E\|_2}{\delta_\nu(A)}\right)^{\frac{1}{n}}.$$
When $\eta$ is large, $\psi(\eta) \cong \eta$, and the bound takes the asymptotic form
$$\mathrm{sv}_A(\tilde A) \lesssim \|E\|_2.$$
Specifically, we have the following corollary.

Corollary 1.10.
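If $\|E\|_2/\delta_\nu(A) < n^{-1}$, then
\[ \frac{\operatorname{sv}_A(\tilde A)}{\delta_\nu(A)} \le n^{1/n}\left(\frac{\|E\|_2}{\delta_\nu(A)}\right)^{1/n}. \tag{1.12} \]
If $\|E\|_2/\delta_\nu(A) > 1$, then
\[ \operatorname{sv}_A(\tilde A) \le \|E\|_2 + \delta_\nu(A). \tag{1.13} \]

Proof. If $\psi(\eta) < 1/n$, then $\eta < 1$, so that the denominator in (1.11) is less than $n$. Hence if $\|E\|_2/\delta_\nu(A) < n^{-1}$,
\[ n^{-1}\left(\frac{\operatorname{sv}_A(\tilde A)}{\delta_\nu(A)}\right)^{n} \le \psi\left(\frac{\operatorname{sv}_A(\tilde A)}{\delta_\nu(A)}\right) \le \frac{\|E\|_2}{\delta_\nu(A)}, \]
from which (1.12) follows. On the other hand, if $\psi(\eta) > 1$, then $\eta > 1$ and
\[ \psi(\eta) = \frac{\eta}{1 + \eta^{-1} + \cdots + \eta^{-(n-1)}} \ge \eta(1 - \eta^{-1}) = \eta - 1. \]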
Hence if $\|E\|_2/\delta_\nu(A) > 1$,
\[ \frac{\operatorname{sv}_A(\tilde A)}{\delta_\nu(A)} \le \psi^{-1}\left(\frac{\|E\|_2}{\delta_\nu(A)}\right) \le \frac{\|E\|_2}{\delta_\nu(A)} + 1, \]
which is equivalent to (1.13). ∎

Using Theorem 1.5 and the monotonicity of $\psi$, we have the following bound on $\operatorname{md}(A, \tilde A)$.

Corollary 1.11. Let $\psi$ be defined by (1.11). Then
\[ \operatorname{md}(A, \tilde A) \le (2n - 1)\,\delta_\nu(A)\,\psi^{-1}\left(\frac{\|E\|_2}{\delta_\nu(A)}\right). \]

An unsavory aspect of Henrici's theorem, one that it shares with Theorems 1.3 and 1.4, is the $n$th root of the error in its bounds. The examples that show that its presence is necessary all depend on the matrix having a Jordan block equal to its order. The following theorem shows that for a matrix with smaller Jordan blocks the root is smaller. Its proof is similar to that of Henrici's theorem and is left as an exercise.

Theorem 1.12. Let $Q^{-1}AQ = J$ be the Jordan canonical form of $A$. Let $m$ be the size of the largest Jordan block in $J$. Then for any eigenvalue $\tilde\lambda \in \mathcal{L}(\tilde A)$ there is an eigenvalue $\lambda$ of $A$ such that
\[ \frac{|\tilde\lambda - \lambda|^m}{1 + |\tilde\lambda - \lambda| + \cdots + |\tilde\lambda - \lambda|^{m-1}} \le \|Q^{-1}EQ\|_2. \tag{1.14} \]

1.3. Residual Bounds

Let the columns of $X$ form a basis for an invariant subspace of $A$. From Theorem I.3.9, we know that there is a unique matrix $M$ (which is now easily seen to be $X^{\dagger}AX$) such that
\[ AX - XM = 0. \]
The matrix $M$ is the representation of $A$ on $\mathcal{R}(X)$ with respect to the basis $X$, and hence the eigenstructure of $M$ is a substructure of the eigenstructure of $A$.

Now suppose that the columns of $X$ span a subspace that is only approximately invariant. For example, $X$ may come from a numerical algorithm for approximating invariant subspaces. Then for any $M$ the residual
\[ R = AX - XM \tag{1.15} \]
is nonzero, although presumably with a proper choice of $M$ it can be made small. An important problem in perturbation theory is: Given some norm of $R$, determine how near $\mathcal{R}(X)$ is to an invariant subspace of $A$ and how the eigenvalues of $M$ relate to those of $A$. We will consider the invariant subspace problem in the next chapter.
Here we will focus on the eigenvalue problem.

The key tool in our investigation is the following backward perturbation theorem. Its proof, which is purely computational, is left as an exercise.

Theorem 1.13. Let $A \in \mathbf{C}^{n\times n}$, $X \in \mathbf{C}^{n\times p}$, and $M \in \mathbf{C}^{p\times p}$. Let $R$ be defined by (1.15). If $Y^H$ is any matrix satisfying $Y^H X = I$ and
\[ \tilde A = A - RY^H, \tag{1.16} \]
then $\tilde A X - XM = 0$.

The theorem says that if $R$ is small then $\mathcal{R}(X)$ is an exact invariant subspace of a matrix $\tilde A$ that is near $A$; in fact, $\tilde A$ is within $\|RY^H\|$ of $A$ in any norm $\|\cdot\|$. Moreover, $M$ is the representation of $\tilde A$ on $\mathcal{R}(X)$, and its eigenvalues are therefore eigenvalues of $\tilde A$. Since we know $\|E\| = \|RY^H\|$, we may use any appropriate perturbation theorem for eigenvalues to assess the accuracy of the eigenvalues of $M$. For example, from Corollary 1.11 we have the following corollary.

Corollary 1.14. Let $\mu_1, \ldots, \mu_p$ be the eigenvalues of $M$. Then there are eigenvalues $\lambda_{j_1}, \ldots, \lambda_{j_p}$ of $A$ such that
\[ |\mu_i - \lambda_{j_i}| \le (2n - 1)\,\delta_\nu(A)\,\psi^{-1}\left(\frac{\|RY^H\|_2}{\delta_\nu(A)}\right). \]

The problem of choosing $M$ and $Y$ to minimize $\|RY^H\|$ still remains. In general, the problem is intractable; however, for unitarily invariant norms it has an elegant solution.
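The computation behind Theorem 1.13 is a single line: since $Y^H X = I$, we have $\tilde A X = AX - RY^H X = AX - R = XM$. The sketch below checks this numerically for real matrices; the matrices $A$, $X$, $Y$, $M$ and the helper names are illustrative assumptions, not data from the text. Note that $M$ is deliberately not $X^H A X$ and $Y \ne X$, since the theorem holds for any $M$ and any $Y$ with $Y^H X = I$.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def sub(P, Q):
    """Elementwise difference P - Q."""
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def transpose(P):
    return [list(col) for col in zip(*P)]

# Illustrative real data (assumed for this sketch).
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]     # basis of the trial subspace
Y = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.25]]   # satisfies Y^T X = I
M = [[2.0, 1.0], [1.0, 2.5]]                 # arbitrary 2x2 matrix

R = sub(matmul(A, X), matmul(X, M))          # residual (1.15)
A_t = sub(A, matmul(R, transpose(Y)))        # perturbed matrix (1.16)
check = sub(matmul(A_t, X), matmul(X, M))    # should be the zero matrix

print(check)
```

The size of the backward perturbation $RY^H$ depends on both choices, which is the minimization problem taken up next.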
Theorem 1.15. In the notation of Theorem 1.13, assume that $X^H X = I$. Let $\|\cdot\|$ be a unitarily invariant norm. Then $\|R\|$ is minimized for $M = X^H A X$, and $\|RY^H\|$ is minimized for $M = X^H A X$ and $Y = X$.

Proof. Let $(X\ X_\perp)$ be unitary. Then from (1.15),
\[ \|R\| = \|(X\ X_\perp)^H R\| = \left\| \begin{pmatrix} X^H A X - M \\ X_\perp^H A X \end{pmatrix} \right\|. \]
It follows from Corollary 3.8 that $\|R\|$ is minimized when $X^H A X - M = 0$.

To minimize $\|RY^H\|$, note that $Y^H X = I$ implies that $Y = X + X_\perp S$ for some $S$. Then
\[ \|RY^H\| = \|(X\ X_\perp)^H R Y^H (X\ X_\perp)\| = \left\| \begin{pmatrix} X^H A X - M & (X^H A X - M)S^H \\ X_\perp^H A X & X_\perp^H A X S^H \end{pmatrix} \right\|. \]
Again by Corollary 3.8, the norm of $RY^H$ is minimized when $X^H A X - M = 0$ and $S = 0$. ∎

Notes and References

Perturbation theory for eigenvalues comes in two flavors. In this book we consider comparatively unstructured errors and attempt to bound the perturbations in terms of some norm of the errors. Other approaches impose some structure on the errors; for example, they may be analytic functions of a complex variable. The problem is then to determine how this structure affects the perturbed eigenvalues: e.g., when are they analytic functions of the variable, what kind of paths do they follow in the complex plane? For more results of this kind see the books by Kato [135] and Baumgärtel [17]. The approach taken here is the one generally followed by numerical analysts; for example, Householder [121, 1964] and Wilkinson [269, 1965]. Particular mention should be made of the elegant little book by Bhatia [28, 1987], which is a required supplement to this chapter.

Rouché's theorem may be found in most texts on complex analysis. As Example 1 shows, we cannot expect much more than continuity in the eigenvalues, at least for defective eigenvalues. However, the reader should not conclude from this example that all perturbations of defective eigenvalues are multiples of primitive roots of unity. A counterexample is given in the exercises.
The term "spectral variation" is found in Henrici [109, 1962] but may be of earlier vintage. The Hausdorff distance between two sets may be found in the second edition of Hausdorff's famous book on set theory [104, 1914]. In general, the Hausdorff distance is a metric only over the class of closed bounded sets, which is just what the set of eigenvalues of a matrix is. The term "optimal matching distance" seems to be due to Bhatia [28], although the concept has been around for some time (e.g., Henrici calls it the eigenvalue distance).

The first general perturbation bounds for eigenvalues were given by Ostrowski [169, 1957]. Theorem 1.3 is due to Elsner [66, 1985], who also shows that the bounds are in some sense the best possible (Exercise 1.4). It is perhaps significant that both Ostrowski and Elsner use Hadamard's inequality in deriving their bounds.

The fact that one can count the eigenvalues in the connected components of inclusion regions provided by the bound on the spectral variation was first noted by Gerschgorin [85, 1931] and also by Ostrowski [169, 1957], who used it to establish the "$2n - 1$" bound on the matching distance. The "$2\lfloor n/2 \rfloor$" bound is due to Elsner [65, 1982].

Bauer and Fike [15, 1960] have not been treated fairly in the literature. Their names have become associated with a weak corollary of Theorem 1.6 (Theorem 3.3), which is frequently trotted out as a straw man by people who have not read the original paper. The generality of their technique makes it applicable in a variety of situations (e.g., Henrici's theorem, Gerschgorin's theorem in the next section, and also a useful theorem of Demmel [55]).

Henrici's theorem [109, 1962] is but one of many results that Henrici casts in terms of the departure from normality. The observation (Corollary 1.10) that the theorem provides a smooth transition from nonlinear to linear behavior in the bounds is new.

For a single eigenvalue, Theorem 1.13 may be found in Wilkinson's book [269, 1965].
The optimality conditions of Theorem 1.15 are part of the folklore, at least for the Frobenius norm. The observation that the conditions are optimal for all unitarily invariant norms appears to be new.

In some applications we may have, in addition to a residual for an approximate invariant subspace, a residual for the corresponding left invariant subspace. Kahan, Parlett, and Jiang [134, 1982] show how to use this information to derive a backward perturbation theorem (see Exercise 1.12).

Exercises

1. Show that the eigenvalues of the matrix $J_n(0) + \epsilon \mathbf{1}_i \mathbf{1}_1^T$ $(i = 1, \ldots, n)$ are zero with multiplicity $n - i$ and $\epsilon^{1/i}$ times the primitive $i$th roots of unity.

2. (Wilkinson [269, p.80]). Let $A = \operatorname{diag}[J_3(0), J_2(0)]$. Show that there is a perturbation of $A$ of order $\epsilon$ for which all the eigenvalues of the perturbed matrix are of order $\epsilon^{2/5}$.

3. Let $[A]$ be the equivalence class of matrices having the same eigenvalues as $A$. Show that the Hausdorff distance and the matching distance are metrics over the space of such equivalence classes.

4. (Elsner [66]). Show that equality holds in the bound (1.4) if and only if $A = \omega\|A\|_2 I$ $(|\omega| = 1)$ and $\tilde A$ has $-\omega\|\tilde A\|_2$ as an eigenvalue.

5. (Kato [135, p.109]). Let $\mathcal{M}(\tau)$ be an unordered $n$-tuple of $n$ numbers that depend continuously on the parameter $\tau$ in an interval $I$. Show that there are functions $\mu_i(\tau)$ $(i = 1, \ldots, n)$, continuous on $I$, such that $\mathcal{M}(\tau)$ consists of $\mu_1(\tau), \ldots, \mu_n(\tau)$.

6. Establish the bound (1.5). [Note: This is a difficult problem. The idea is to declare eigenvalues $\lambda$ and $\tilde\lambda$ related if they can be connected by a suitably short chain of disks. One then applies Hall's theorem (Theorem 3.14) to the 0-1 matrix of this relation. See [65, 28] for details.]

7. Show that if $\|(\tilde\lambda I - A)^{-1}\|_2 \ge \eta$, then there is an eigenvalue $\lambda$ of $A$ satisfying
\[ |\tilde\lambda - \lambda| \le 2(\|A\|_2 + \eta^{-1})^{1 - 1/n}\,\eta^{-1/n}. \]

8. (Henrici [109]). For any $A$
\[ \delta_F(A) \le \left(\frac{n^3 - n}{12}\right)^{1/4} \sqrt{\|A^H A - AA^H\|_F}. \]
Characterize the matrices for which equality holds. [Note: Recall that $A$ is normal if and only if $\|A^H A - AA^H\|_F = 0$. It is therefore not surprising that the size of $\|A^H A - AA^H\|_F$ is related to the departure from normality. This problem is not an easy one.]

9. (A structured backward perturbation theorem for eigenvectors). Let $r = A\tilde x - \tilde\lambda \tilde x$, with components $\rho_i$, and let $S$ be nonnegative. Set
\[ \epsilon = \max_i \frac{|\rho_i|}{(S|\tilde x|)_i}. \]
(Here $0/0 = 0$ and otherwise $\rho/0 = \infty$.) If $\epsilon \ne \infty$, there is a matrix $E$ satisfying $|E| \le \epsilon S$ such that $(\tilde\lambda, \tilde x)$ is an eigenpair of $A + E$.

THE FOLLOWING EXERCISES CONCERN BACKWARD PERTURBATIONS WHEN RESIDUALS FOR LEFT AND RIGHT INVARIANT SUBSPACES ARE KNOWN. THEY ARE BASED ON A GENERAL THEOREM OF DAVIS, KAHAN, AND WEINBERGER ON DILATIONS [52, 1982], WHICH IS ESTABLISHED IN THE NEXT TWO EXERCISES. THE DILATION PROBLEM MAY BE STATED AS FOLLOWS. GIVEN THE PARTITIONED MATRIX
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \]
DETERMINE $A_{22}$ SO THAT $\|A\|_2$ IS MINIMIZED. FOR A HISTORY OF THE PROBLEM AND APPLICATIONS, SEE THE ARTICLE JUST CITED.

10. Let $\|A_{11}\|_2 \le \nu$. Show that
\[ \left\| \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} \right\|_2 \le \nu \]
if and only if $A_{21} = K_{21}(\nu^2 I - A_{11}^H A_{11})^{1/2}$, where $\|K_{21}\|_2 \le 1$. In particular we may take $K_{21} = A_{21}(\nu^2 I - A_{11}^H A_{11})^{\dagger/2}$.

11. Let $A$ be as above and let
\[ \max\left\{ \left\| \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} \right\|_2,\ \|(A_{11}\ A_{12})\|_2 \right\} \le \nu. \]
Then $A_{22}$ may be chosen so that $\|A\|_2 \le \nu$. In particular if $K_{21} = A_{21}(\nu^2 I - A_{11}^H A_{11})^{\dagger/2}$ and $K_{12} = (\nu^2 I - A_{11}A_{11}^H)^{\dagger/2}A_{12}$, then the most general form of $A_{22}$ is
\[ A_{22} = -K_{21}A_{11}^H K_{12} + \nu (I - K_{21}K_{21}^H)^{1/2}\, C\, (I - K_{12}^H K_{12})^{1/2}, \]
where $C$ is an arbitrary matrix satisfying $\|C\|_2 \le 1$. [Hint: Apply the previous exercise three times: twice to define $K_{21}$ and $K_{12}$ and once to a suitable partition of $A$. N.B., the last step is nontrivial.]

12. (Kahan, Parlett, and Jiang [134]). Let $A \in \mathbf{C}^{n\times n}$. Let $X, Y \in \mathbf{C}^{n\times p}$ have orthonormal columns, and assume that $Y^H X$ is nonsingular. For any $M \in \mathbf{C}^{p\times p}$ let $N = (Y^H X)M(Y^H X)^{-1}$, so that $N Y^H X = Y^H X M$. Set $R = AX - XM$ and $S^H = Y^H A - NY^H$. Then there is at least one matrix $E$ such that
\[ (A + E)X = XM \quad\text{and}\quad Y^H(A + E) = NY^H. \]
Moreover the smallest solution in the Frobenius norm satisfies
\[ \|E\|_F = \sqrt{\|R\|_F^2 + \|S\|_F^2 - \|F_{11}\|_F^2}, \]
where $F_{11}$ satisfies $F_{11} = -Y^H R = -S^H X$. The smallest solution in the spectral norm satisfies
\[ \|E\|_2 = \max\{\|R\|_2, \|S\|_2\}. \]
[Hint: Let $(X\ X_\perp)$ and $(Y\ Y_\perp)$ be orthogonal and set
\[ \begin{pmatrix} Y^H \\ Y_\perp^H \end{pmatrix} E\,(X\ X_\perp) = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}. \]
Then show that only $F_{22}$ is free.]

2. Gerschgorin Theory: Differentiability

The results of the last section do not suggest a way to assign a condition number to an eigenvalue. The problem is that eigenvalues associated with a nontrivial Jordan block are not differentiable functions of the elements of the matrix. However, this does not mean that individual eigenvalues cannot behave in a locally linear fashion and hence have condition numbers. This section is devoted to one of the most powerful tools for probing the sensitivity of a single eigenvalue: the Gerschgorin theorem.

2.1. Gerschgorin's Theorem

Strictly speaking, Gerschgorin's theorem is not a perturbation theorem; it states that the eigenvalues of a matrix lie in the union of certain disks in the complex plane. However, as we shall see in the next subsection, it can be used to establish extremely accurate perturbation bounds.

There are several ways of establishing Gerschgorin's theorem. Here we will approach it through the Bauer-Fike theorem of the last section.

Theorem 2.1 (Gerschgorin).
For $A \in \mathbf{C}^{n\times n}$ let
\[ \alpha_i = \sum_{j \ne i} |\alpha_{ij}| \]
and
\[ \mathcal{G}_i(A) = \{z \in \mathbf{C} : |z - \alpha_{ii}| \le \alpha_i\}. \tag{2.1} \]
Then
\[ \mathcal{L}(A) \subset \bigcup_{i=1}^{n} \mathcal{G}_i(A). \tag{2.2} \]
Moreover, if $m$ of the GERSCHGORIN DISKS $\mathcal{G}_i(A)$ are isolated from the other $n - m$ disks, then there are precisely $m$ eigenvalues of $A$ in their union.

Proof. Let $D = \operatorname{diag}(\alpha_{11}, \alpha_{22}, \ldots, \alpha_{nn})$. In the Bauer-Fike theorem make the following substitutions:
\[ Q \leftarrow I, \qquad A \leftarrow D, \qquad \tilde A \leftarrow A, \qquad \|\cdot\| \leftarrow \|\cdot\|_\infty. \]
Then it is easy to verify that the first inequality in (1.7) is equivalent to saying that each eigenvalue of $A$ lies in a Gerschgorin disk.

The proof of the second part of the theorem uses the techniques developed in the previous subsection and is left as an exercise. ∎

The following example illustrates how much of an improvement Gerschgorin's theorem can be over Elsner's theorem. It also illustrates a deficiency in the straightforward use of Gerschgorin's theorem.
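The disks (2.1) are cheap to compute: each needs one diagonal element and one row sum. The sketch below computes them and confirms the inclusion (2.2) for a $2\times 2$ matrix whose eigenvalues are available in closed form. The matrix and the helper names `gerschgorin_disks`, `in_union`, and `eig2` are illustrative assumptions, not part of the text.

```python
import cmath

def gerschgorin_disks(A):
    """Return (center, radius) for each row: center a_ii, radius sum_{j != i} |a_ij|."""
    disks = []
    for i, row in enumerate(A):
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        disks.append((row[i], radius))
    return disks

def in_union(z, disks, tol=1e-12):
    """True if z lies in at least one of the closed disks."""
    return any(abs(z - c) <= r + tol for c, r in disks)

def eig2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr - root) / 2, (tr + root) / 2

A = [[4.0, 1.0], [2.0, 1.0]]          # illustrative matrix
disks = gerschgorin_disks(A)           # [(4.0, 1.0), (1.0, 2.0)]
eigenvalues = eig2(A)

for lam in eigenvalues:
    print(lam, in_union(lam, disks))
```

Applying the same function to the transpose gives disks built from column sums, which also contain the eigenvalues, since $A$ and $A^T$ have the same spectrum.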
Example 2.2. Consider the matrix
\[ A = \begin{pmatrix} 1 & 10^{-4} \\ 10^{-4} & 2 \end{pmatrix}. \tag{2.3} \]
Regarding $A$ as a perturbation of the matrix $\operatorname{diag}(1, 2)$, we find from Theorem 1.3 that one eigenvalue must lie in the interval $[1 - 0.021, 1 + 0.021]$ and the other in the interval $[2 - 0.021, 2 + 0.021]$ (actually the theorem yields intervals that are just barely greater than $0.04$ in length). On the other hand, by Gerschgorin's theorem each of the intervals $[1 - 10^{-4}, 1 + 10^{-4}]$ and $[2 - 10^{-4}, 2 + 10^{-4}]$ must contain an eigenvalue of $A$. Thus Gerschgorin's theorem is better than Elsner's by more than two orders of magnitude. However, the eigenvalues of $A$ are approximately $1 - 10^{-8}$ and $2 + 10^{-8}$. Thus Gerschgorin's theorem is still off by four orders of magnitude.

It is worth noting that in the above example we have replaced disks in the complex plane with intervals on the real axis. The rationale for this is the following. The two disks, whether the ones provided by Elsner's theorem or by Gerschgorin's theorem, contain only one eigenvalue each. Since $A$ is real, its complex eigenvalues must occur in complex conjugate pairs. Hence the eigenvalues in the disks must be real and are contained in the intersection of the disks with the real line.

2.2. Diagonal Similarities

Example 2.2 shows that the bounds provided by Gerschgorin's theorem need not be very sharp. Now in principle there is no reason why Gerschgorin's theorem should provide sharp bounds. However, the matrix $A$ of (2.3) has a special structure; it is almost diagonal, and it turns out that we can exploit this structure to obtain sharper bounds.

The general technique is seen at its simplest with the matrix of Example 2.2. Let $D_\alpha = \operatorname{diag}(\alpha, 1)$, and let
\[ A_\alpha = D_\alpha A D_\alpha^{-1} = \begin{pmatrix} 1 & 10^{-4}\alpha \\ 10^{-4}\alpha^{-1} & 2 \end{pmatrix}. \]
Since $A_\alpha$ is similar to $A$ it has the same eigenvalues; however, $A$ and $A_\alpha$ have different Gerschgorin disks. As $\alpha$ becomes small the first disk shrinks, while the other grows.
Eventually, the second disk expands to engulf the first, but until it does, the first provides an ever-improving bound on the eigenvalue. In particular, as long as
\[ 10^{-4}\alpha + 10^{-4}\alpha^{-1} < 1, \]
the two Gerschgorin disks will remain isolated. It is easy to see that this will be true as long as $\alpha$ is just a little greater than $10^{-4}$, say $\alpha = 1.01 \cdot 10^{-4}$. This isolates an eigenvalue of $A$ in the interval $[1 - 1.01 \cdot 10^{-8}, 1 + 1.01 \cdot 10^{-8}]$, which is a very sharp bound.

Of course this is a trivial example. However, the technique it illustrates, that of reducing one Gerschgorin disk until the others overwhelm it, is widely applicable. We will give another example in the proof of the following theorem.

Theorem 2.3. Let $\lambda$ be a simple eigenvalue of the matrix $A$, with right and left eigenvectors $x$ and $y$, and let $\tilde A = A + E$ be a perturbation of $A$. Then there is a unique eigenvalue $\tilde\lambda$ of $\tilde A$ such that
\[ \tilde\lambda = \lambda + \frac{y^H E x}{y^H x} + O(\|E\|^2). \tag{2.4} \]

Proof. Let $\delta > 0$ be the distance between $\lambda$ and the other eigenvalues of $A$. Let $J = Y^H A X$ be the Jordan canonical form of $A$, in which the superdiagonals are equal to $\delta/3$ or zero [see (I.3.3)]. Note that the first columns $x$ and $y$ of $X$ and $Y$ are the right and left eigenvectors corresponding to $\lambda$, and since $Y^H = X^{-1}$ we have $y^H x = 1$; i.e., the denominator in (2.4) is nonzero.

Now consider the matrix $\tilde J = Y^H(A + E)X$. This matrix has the form illustrated below for $n = 5$:
\[ \tilde J = \begin{pmatrix}
\lambda + y^H E x & \epsilon & \epsilon & \epsilon & \epsilon \\
\epsilon & \mu & \tau & \epsilon & \epsilon \\
\epsilon & \epsilon & \mu & \tau & \epsilon \\
\epsilon & \epsilon & \epsilon & \mu & \tau \\
\epsilon & \epsilon & \epsilon & \epsilon & \mu
\end{pmatrix}. \]
Here we have used $\epsilon$ to stand generically for a quantity bounded by $\|Y\|\,\|E\|\,\|X\|$; $\mu$ for an eigenvalue of $A$ other than $\lambda$ plus $\epsilon$; and $\tau$ for a quantity bounded by $\epsilon + \delta/3$. By a diagonal similarity transformation, we may replace $\tilde J$ by a matrix of the form
\[ \tilde J_\alpha = \begin{pmatrix}
\lambda + y^H E x & \alpha\epsilon & \alpha\epsilon & \alpha\epsilon & \alpha\epsilon \\
\alpha^{-1}\epsilon & \mu & \tau & \epsilon & \epsilon \\
\alpha^{-1}\epsilon & \epsilon & \mu & \tau & \epsilon \\
\alpha^{-1}\epsilon & \epsilon & \epsilon & \mu & \tau \\
\alpha^{-1}\epsilon & \epsilon & \epsilon & \epsilon & \mu
\end{pmatrix}. \]
Now the first Gerschgorin disk of $\tilde J_\alpha$ has center $\lambda + y^H E x$ and radius bounded by $(n - 1)\alpha\epsilon$. The other disks have center $\mu$ and radii bounded by $\alpha^{-1}\epsilon + \tau + (n - 3)\epsilon$. Hence if
\[ \alpha^{-1}\epsilon + \delta/3 + n\epsilon + (n - 1)\alpha\epsilon < \delta, \tag{2.5} \]
the first Gerschgorin disk will be disjoint from the others.

Now let $\epsilon$ be small enough so that
\[ \tfrac{2}{3}\delta - n\epsilon > \tfrac{1}{2}\delta. \]
Then if
\[ n\epsilon\alpha^2 - \tfrac{1}{2}\delta\alpha + \epsilon < 0, \]
the inequality (2.5) is satisfied. This latter condition will be satisfied if
\[ \alpha = \frac{4\epsilon}{\delta}, \]
provided we require $\epsilon$ to be so small that $16n\epsilon^2/\delta^2 < 1$. In this case, the radius of the first Gerschgorin disk is bounded by $4n\epsilon^2/\delta = O(\epsilon^2)$. Since this disk is centered at $\lambda + y^H E x$, the unique eigenvalue it contains is our $\tilde\lambda$. ∎

An immediate consequence of Theorem 2.3 is that the simple eigenvalues of a matrix are differentiable functions of the elements of the matrix.

Corollary 2.4. Under the hypotheses of Theorem 2.3 the eigenvalue $\lambda$ is a differentiable function of $A$. Moreover,
\[ \frac{\partial\lambda}{\partial\alpha_{ij}} = \frac{\bar\eta_i \xi_j}{y^H x}, \tag{2.6} \]
where $\eta_i$ and $\xi_j$ denote the components of $y$ and $x$.

Proof. By definition a function $f(A)$ is differentiable if there is a linear operator $f'$ such that
\[ f(A + E) = f(A) + f'(E) + o(\|E\|). \]
Equation (2.4) exhibits such an operator for the eigenvalue $\lambda$: namely,
\[ E \mapsto \frac{y^H E x}{y^H x}. \]
To establish (2.6), note that
\[ \frac{\partial\lambda}{\partial\alpha_{ij}} = \lim_{\tau \to 0} \frac{\lambda(A + \tau \mathbf{1}_i \mathbf{1}_j^T) - \lambda(A)}{\tau}. \]
But by (2.4)
\[ \lambda(A + \tau \mathbf{1}_i \mathbf{1}_j^T) - \lambda(A) = \tau\,\frac{\bar\eta_i \xi_j}{y^H x} + O(\tau^2), \]
from which the result follows immediately. ∎

The proof of Theorem 2.3 is almost as interesting as the theorem itself, since it gives us insight into the factors that make the higher order terms important. Specifically, we require that terms involving $\epsilon/\delta$ be sufficiently small. The denominator $\delta$ shows that if a simple eigenvalue is near its neighbors, the range of perturbations for which the derivative provides an adequate approximation will be restricted. The size of the numerator depends not only on $E$, but on the sizes of the reducing transformations $X$ and $Y$. If these are large, we again can expect higher order terms to become significant. It is worth noting that according to the remark following (I.3.3), a small value of $\delta$ will tend to aggravate this effect.

Equation (2.4) can be written
\[ \tilde\lambda = \frac{y^H(A + E)x}{y^H x} + O(\|E\|^2). \tag{2.7} \]
The quantity $y^H(A + E)x / y^H x$ is called a RAYLEIGH QUOTIENT, and one way of stating the theorem is to say that the Rayleigh quotient provides a first-order approximation to the perturbed eigenvalue. We
will generalize the notion of a Rayleigh quotient in Section V.2, where we will give explicit bounds for the second order terms.

The theorem also provides us with a condition number for a simple eigenvalue. We see from (2.4) that
\[ |\tilde\lambda - \lambda| \lesssim \frac{\|y\|\,\|x\|}{|y^H x|}\,\|E\| \]
for any consistent pair of matrix and vector norms. Thus the quantity
\[ \nu = \frac{\|y\|\,\|x\|}{|y^H x|} \tag{2.8} \]
is a condition number for $\lambda$.

When $\|\cdot\| = \|\cdot\|_2$, the number $\nu$ is the secant of the angle between $x$ and $y$. It is one when $x$ and $y$ lie in the same direction, and grows unboundedly as $x$ and $y$ approach orthogonality. Note that if $\lambda$ is simple its left and right eigenvectors cannot be orthogonal, although it is easy to construct examples where they are as close to orthogonality as we like. Also note that the left and right eigenvectors corresponding to a nontrivial Jordan block have to be orthogonal.

Notes and References

Gerschgorin [85, 1931] established his theorem as a corollary to the theorem that a diagonally dominant matrix is nonsingular. In particular, the union of the Gerschgorin disks of $A$ is the complement of the set of all $\zeta$ for which $\zeta I - A$ is diagonally dominant. In a restricted form the diagonal dominance theorem is due to Levy [146, 1881], and in a general form to Desplanques [58, 1887]. The theorem kept getting itself rediscovered until Olga Taussky put a stop to it with a paper appropriately entitled A Recurring Theorem on Determinants [239, 1949]. Rohrbach [187, 1931] used the technique to establish eigenvalue bounds but did not define the regions now called Gerschgorin disks. Actually, the theorem stated by Gerschgorin is not true unless the matrix is irreducible (see Exercise 2.7).

More generally, if $\pi$ is any proposition such that $\pi(A)$ is true if and only if $A$ is nonsingular, then the complement of the set $\{\zeta : \pi(\zeta I - A) \text{ is true}\}$ contains all the eigenvalues of $A$.
By varying $\pi$ one can get different regions, some of which are treated in the exercises.

In his paper, Gerschgorin noted that the union of $k$ isolated disks contains exactly $k$ eigenvalues.

Although the idea of using diagonal similarity to reduce the radius of an isolated disk is due to Gerschgorin (and in a different sense to Rohrbach), it was Wilkinson [269, 1965] who refined the technique and applied it to a variety of problems. Although ad hoc techniques for reducing the diameter of an isolated disk suffice for most applications, there are algorithms for determining the optimal disk [253, 154].

Exercises

1. A matrix $A$ is strictly diagonally dominant if
\[ |\alpha_{ii}| > \sum_{j \ne i} |\alpha_{ij}|, \qquad i = 1, \ldots, n. \]
Show that a strictly diagonally dominant matrix is nonsingular and use this fact to prove Gerschgorin's theorem.

2. Let $Ax = \lambda x$, and suppose $|\xi_j| \ge |\xi_i|$ $(i = 1, \ldots, n)$, where $\xi_i$ denotes the $i$th component of $x$. Show that $\lambda$ lies in the Gerschgorin disk centered at $\alpha_{jj}$.

3. (Ostrowski [167]). Let $\rho_i = \sum_{j \ne i} |\alpha_{ij}|$ and $\gamma_i = \sum_{j \ne i} |\alpha_{ji}|$. Show that if for some $\tau \in [0, 1]$
\[ |\alpha_{ii}| > \gamma_i^{\tau}\rho_i^{1-\tau}, \qquad i = 1, \ldots, n, \]
then $A$ is nonsingular.

4. (Ostrowski [167]). In the notation of the last exercise suppose that
\[ |\alpha_{ii}|\,|\alpha_{jj}| > \gamma_i^{\tau}\rho_i^{1-\tau}\gamma_j^{\tau}\rho_j^{1-\tau}, \qquad i, j = 1, \ldots, n, \quad i \ne j. \]
Show that $A$ is nonsingular.

5. (Qi [182]). Let $A$ be of order $n$. Let
\[ \beta_i = \max\Big\{ \sum_{j \ne i} |\alpha_{ij}|,\ \sum_{j \ne i} |\alpha_{ji}| \Big\}, \qquad i = 1, \ldots, n. \]
Show that the singular values of $A$ lie in the union of the intervals $[|\alpha_{ii}| - \beta_i, |\alpha_{ii}| + \beta_i]$ $(i = 1, \ldots, n)$.
6. (Feingold and Varga [72]). Let $A$ be partitioned in the form
\[ A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1k} \\
A_{21} & A_{22} & \cdots & A_{2k} \\
\vdots & \vdots & & \vdots \\
A_{k1} & A_{k2} & \cdots & A_{kk}
\end{pmatrix}, \]
and let $\|\cdot\|$ be a consistent norm. Show that if $\lambda$ is an eigenvalue of $A$, then for some $i$
\[ \|(\lambda I - A_{ii})^{-1}\|^{-1} \le \sum_{j \ne i} \|A_{ij}\|. \]

7. A square matrix $A$ is REDUCIBLE if there is a permutation matrix $P$ such that
\[ P^T A P = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \]
where $A_{11}$ and $A_{22}$ are square. Show that an irreducible diagonally dominant matrix for which at least one of the diagonals is strictly dominant is nonsingular.

8. (Taussky [238]). Let $A$ be irreducible. Show that if $\lambda \in \mathcal{L}(A)$ lies on the boundary of one Gerschgorin disk, then it lies on the boundaries of all the Gerschgorin disks.

9. Let $A = X_1 J_{m_1}(\lambda_1) Y_1^H + \cdots + X_k J_{m_k}(\lambda_k) Y_k^H$ be the Jordan decomposition of $A$, and assume that $\lambda_1$ has multiplicity $m_1$. Show that if $E$ is sufficiently small there are exactly $m_1$ eigenvalues of $\tilde A$ that are in $\mathcal{L}[J_{m_1}(\lambda_1) + Y_1^H E X_1 + O(\|E\|^2)]$.

10. (Wilkinson [271]). Let $x$ and $y$ be left and right eigenvectors corresponding to the simple eigenvalue $\lambda$. Let $\theta = \angle(x, y)$. Show that there is a matrix $E$ satisfying
\[ \frac{\|E\|_2}{\|A\|_2} \le \cot\theta \]
such that $\lambda$ is a multiple eigenvalue of $A + E$. Otherwise put, if a simple eigenvalue of a matrix has a large condition number, then the matrix is near one with a multiple eigenvalue.

3. Normal and Diagonalizable Matrices

A normal matrix is any matrix satisfying $A^H A = AA^H$. From this it follows that Hermitian matrices, skew Hermitian matrices, and unitary matrices are all normal. Given the importance of this class of matrices, it is natural to seek a special perturbation theory for its eigenvalues. The main complicating factor here is that normal matrices, unlike Hermitian matrices, can have complex eigenvalues, which cannot be ordered by size.
Nonetheless, normal matrices have enough structure to enable us to prove the striking Hoffman-Wielandt theorem.

Since any normal matrix can be diagonalized by a unitary transformation, the normal matrices are special cases of diagonalizable matrices; that is, matrices that can be diagonalized by similarity transformations (these matrices are sometimes called normalizable). In the second subsection we will treat the perturbation of eigenvalues of diagonalizable matrices.

3.1. The Hoffman-Wielandt Theorem

In Section 1 we saw that it is relatively easy to obtain bounds on the spectral variation $\operatorname{sv}_A(\tilde A)$ of a matrix $\tilde A$ with respect to $A$. Although it is usually possible to escalate such a bound into a bound on $\operatorname{md}(A, \tilde A)$, we pay the price of a factor of $2n - 1$ in the bound. The essence of the Hoffman-Wielandt theorem is that when $A$ and $\tilde A$ are normal we do not have to pay such a price to get a bound on
\[ \operatorname{md}_2(A, \tilde A) \overset{\mathrm{def}}{=} \min_\pi \sqrt{\sum_i |\tilde\lambda_{\pi(i)} - \lambda_i|^2}, \tag{3.1} \]
where $\pi$ ranges over all permutations of the integers $1, 2, \ldots, n$. (The subscript 2 refers to the 2-norm. In this notation the usual matching distance is $\operatorname{md}_\infty$.)

Theorem 3.1 (Hoffman-Wielandt). Let $A$ and $\tilde A$ be normal. Then
\[ \operatorname{md}_2(A, \tilde A) \le \|A - \tilde A\|_F, \tag{3.2} \]
where $\operatorname{md}_2(A, \tilde A)$ is defined by (3.1).
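The definition (3.1) and the bound (3.2) can be tried out directly on a small pair of normal matrices. The sketch below is only an illustration under stated assumptions: the two real symmetric (hence normal) $2\times 2$ matrices are chosen for convenience, eigenvalues come from the quadratic formula, and the minimum in (3.1) is found by brute force over permutations. The function names are hypothetical.

```python
import cmath
from itertools import permutations

def eig2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr - root) / 2, (tr + root) / 2]

def md2_squared(eigs, eigs_t):
    """Square of md_2 in (3.1): best pairing of the two eigenvalue lists."""
    n = len(eigs)
    return min(sum(abs(eigs_t[p[i]] - eigs[i]) ** 2 for i in range(n))
               for p in permutations(range(n)))

def fro_squared(A, B):
    """Square of the Frobenius norm of A - B."""
    return sum(abs(a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Illustrative normal (real symmetric) pair, assumed for this sketch.
A = [[1.0, 0.0], [0.0, 2.0]]
A_t = [[1.0, 0.5], [0.5, 2.0]]

lhs = md2_squared(eig2(A), eig2(A_t))   # md_2(A, A~)^2
rhs = fro_squared(A, A_t)               # ||A - A~||_F^2
print(lhs, "<=", rhs)
```

Brute force over permutations costs $n!$ evaluations and is only sensible for tiny $n$; for larger problems the minimization in (3.1) is a linear assignment problem.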
Proof. Since $\|\cdot\|_F$ is unitarily invariant, we may assume that $A = \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Let $\tilde A = W\tilde\Lambda W^H$, where $W$ is unitary and $\tilde\Lambda = \operatorname{diag}(\tilde\lambda_1, \ldots, \tilde\lambda_n)$. We will have established the theorem if we can show that $\|\Lambda - V\tilde\Lambda V^H\|_F$, regarded as a function of the unitary matrix $V$, is minimized when $V = P_\pi$ is a permutation matrix corresponding to some permutation $\pi$. For in that case,
\[ \operatorname{md}_2(A, \tilde A) = \operatorname{md}_2(\Lambda, \tilde\Lambda) \le \sqrt{\sum_i |\lambda_i - \tilde\lambda_{\pi(i)}|^2} \le \|\Lambda - W\tilde\Lambda W^H\|_F = \|A - \tilde A\|_F. \]

Denoting the elements of $V$ by $v_{ij}$, we have by direct calculation
\[ \|\Lambda - V\tilde\Lambda V^H\|_F^2 = \sum_i |\lambda_i|^2 + \sum_i |\tilde\lambda_i|^2 - \varphi(V), \]
where
\[ \varphi(V) = \sum_{i,j} (\bar\lambda_i \tilde\lambda_j + \lambda_i \bar{\tilde\lambda}_j)\,|v_{ij}|^2. \]
Thus our problem reduces to showing that $\varphi(V)$ is maximized when $V$ is some permutation matrix.

Since $V$ is unitary, the matrix whose elements are $|v_{ij}|^2$ is doubly stochastic. For any doubly stochastic matrix $S = (\sigma_{ij})$ define
\[ \psi(S) = \sum_{i,j} (\bar\lambda_i \tilde\lambda_j + \lambda_i \bar{\tilde\lambda}_j)\,\sigma_{ij}. \]
It is clear that $\max_V \varphi(V) \le \max_S \psi(S)$, since not every doubly stochastic matrix has elements of the form $|v_{ij}|^2$, where $V = (v_{ij})$ is unitary. Therefore, if we can show that $\psi$ is maximized when $S$ is a permutation matrix $P_\pi$, then since $P_\pi$ is unitary, it also maximizes $\varphi$.

By Birkhoff's theorem (Theorem II.3.16), any doubly stochastic matrix $S$ can be written as a convex combination of the permutation matrices $P_\pi$: namely,
\[ S = \sum_\pi \alpha_\pi P_\pi, \]
where the $\alpha_\pi$ are nonnegative and sum to one. Since $\psi$ is linear in $S$,
\[ \psi(S) = \sum_\pi \alpha_\pi \psi(P_\pi). \]
It follows that if $\pi$ is the permutation for which $\psi(P_\pi)$ is maximal, then $\psi(S) \le \psi(P_\pi)$. Hence $\varphi(P_\pi)$ is also maximal, and $\pi$ is the permutation required by the theorem. ∎

The hypothesis that both $A$ and $\tilde A$ be normal is necessary. For example, let
\[ A = \begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix} \quad\text{and}\quad \tilde A = \begin{pmatrix} -1 & -1 \\ 1 & 1 \end{pmatrix}, \]
so that $A$ is normal but $\tilde A$ is not. The eigenvalues of $A$ are 0 and 4 while those of $\tilde A$ are both zero. Hence
\[ \operatorname{md}_2(A, \tilde A)^2 = 16 > 12 = \|A - \tilde A\|_F^2. \]

This fact complicates the practical application of the Hoffman-Wielandt theorem, since the sum of two normal matrices may not be normal. Even the sum of a normal matrix and a Hermitian matrix may fail to be normal. Thus the class of perturbations that the theorem can handle is strictly limited.

A case in point is the attempt to derive residual bounds from a backward perturbation result like Theorem 1.13. The difficulty is that the matrix $\tilde A$, defined by (1.16), need not be normal. However, the following result gives a residual bound for a single eigenvalue.

Theorem 3.2. Let $A$ be normal. If $\|x\|_2 = 1$, then
\[ \min_i |\lambda_i - x^H A x| \le \|Ax - (x^H A x)x\|_2. \tag{3.3} \]

Proof. Since $A$ is normal, there is a unitary matrix $U$ such that $A = U\Lambda U^H$, where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Hence
\[ \|[A - (x^H A x)I]x\|_2 = \|U(\Lambda - (x^H A x)I)U^H x\|_2 \ge \min_{1 \le i \le n} |\lambda_i - x^H A x|, \]
from which (3.3) follows. ∎
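The residual bound (3.3) is easy to evaluate in practice, since the right-hand side needs only a matrix-vector product. The sketch below checks it for a real symmetric (hence normal) $2\times 2$ matrix and a unit vector; both are illustrative assumptions, and the function names are hypothetical.

```python
import math

def rayleigh_and_residual(A, x):
    """Return (x^T A x, ||A x - (x^T A x) x||_2) for real A and a unit vector x."""
    Ax = [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]
    rho = sum(xi * axi for xi, axi in zip(x, Ax))
    res = math.sqrt(sum((axi - rho * xi) ** 2 for xi, axi in zip(x, Ax)))
    return rho, res

def eig_sym2(A):
    """Eigenvalues of a real symmetric 2x2 matrix."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mean, half = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean - half, mean + half

A = [[2.0, 1.0], [1.0, 3.0]]   # illustrative symmetric matrix
x = [0.6, 0.8]                 # unit vector: 0.36 + 0.64 = 1

rho, res = rayleigh_and_residual(A, x)
gap = min(abs(lam - rho) for lam in eig_sym2(A))
print(f"min |lambda_i - rho| = {gap:.5f} <= residual = {res:.5f}")
```

Here the Rayleigh quotient is much closer to an eigenvalue than the residual norm suggests, which is typical for Hermitian matrices, where the error is second order in the residual.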
3.2. Diagonalizable Matrices

The chief general result for diagonalizable matrices follows from the Bauer-Fike theorem (Theorem 1.6).

Theorem 3.3. Suppose that $A$ is diagonalizable; i.e., $X^{-1}AX = \Lambda$, where $\Lambda$ is diagonal. Let $\|\cdot\|$ be a consistent matrix norm such that $\|\operatorname{diag}(\delta_1, \ldots, \delta_n)\| = \max_i |\delta_i|$. Then
\[ \operatorname{sv}_A(\tilde A) \le \|X^{-1}EX\| \tag{3.4} \]
and
\[ \operatorname{sv}_A(\tilde A) \le \kappa(X)\|E\|, \tag{3.5} \]
where as usual $\kappa(X) = \|X\|\,\|X^{-1}\|$. Moreover,
\[ \operatorname{md}(A, \tilde A) \le (2n - 1)\|X^{-1}EX\| \le (2n - 1)\kappa(X)\|E\|. \tag{3.6} \]

Proof. Let $\tilde\lambda$ be an eigenvalue of $\tilde A$. Under the hypotheses of the theorem, the inequality (1.6) in the Bauer-Fike theorem assumes the form
\[ \|(\Lambda - \tilde\lambda I)^{-1}\|^{-1} \le \|X^{-1}EX\|, \]
from which (3.4) follows immediately. The inequality (3.5) follows from consistency. Finally, (3.6) follows from Theorem 1.5. ∎

The bounds (3.4) and (3.5) hold for the widely used norms $\|\cdot\|_p$ $(p = 1, 2, \infty)$ (and in fact for all the Hölder norms). They hold trivially for all normalized unitarily invariant norms, since these norms dominate the spectral norm.

Corollary 3.4. If $A$ is normal, then
\[ \operatorname{sv}_A(\tilde A) \le \|E\|_2. \]

Although (3.4) is stronger than (3.5), we will usually have no more than an estimate of $\|E\|$, in which case we are forced to use the weaker bound. Here the condition number of the matrix of eigenvectors serves as an overall condition number for the eigenvalue problem of $A$. Unfortunately, if we replace $X$ by $XD$, where $D$ is diagonal, $\kappa(X)$ changes, even though the scaled matrix continues to diagonalize $A$. Moreover, by making one column of $X$ very large or very small we can make $\kappa(X)$ arbitrarily large, a situation we called artificial ill-conditioning in the last chapter.

These considerations lead us to ask: What is the optimal scaling of $X$? In general this is a very difficult question; however, for the Frobenius norm we can give an answer.

Theorem 3.5. Let $X \in \mathbf{C}^{n\times n}$ be nonsingular, and let $Y^H X = I$.
Then
\[ \kappa_F(X) \ge \sum_i \|y_i\|_2 \|x_i\|_2, \]
with equality if and only if there is an $\alpha \ne 0$ such that
\[ \|y_i\|_2 = \alpha\|x_i\|_2, \qquad i = 1, \ldots, n. \tag{3.7} \]

Proof. By the Cauchy inequality
\[ \kappa_F^2(X) = (\|x_1\|_2^2 + \cdots + \|x_n\|_2^2)(\|y_1\|_2^2 + \cdots + \|y_n\|_2^2) \ge (\|x_1\|_2\|y_1\|_2 + \cdots + \|x_n\|_2\|y_n\|_2)^2. \]
Equality holds if and only if $(\|x_1\|_2, \ldots, \|x_n\|_2)$ and $(\|y_1\|_2, \ldots, \|y_n\|_2)$ are proportional, which is equivalent to (3.7). ∎

There are two observations to be made about this theorem. First, the proportional scaling (3.7) is probably not a bad strategy for other balanced norms like $\|\cdot\|_p$ $(p = 1, 2, \infty)$. Second, if the eigenvalues of $A$ are simple, the optimal $\kappa_F(X)$ is the sum of the individual condition numbers of the eigenvalues [cf. (2.8)]. This shows that the bounds in Theorem 3.3 are realistic in the sense that if the optimal $\kappa_F(X)$ is large, then there must exist at least one ill-conditioned eigenvalue.

Notes and References

For the Hoffman-Wielandt theorem, see [117, 1953]. Wilkinson [269, 1965] gives an elementary proof that does not use Birkhoff's theorem.

The Hoffman-Wielandt theorem can be rewritten in a suggestive manner. Let $\Phi$ be a symmetric gauge function and let $\|\cdot\|_\Phi$ be the associated unitarily invariant norm. Set
\[ \operatorname{md}_\Phi(A, \tilde A) = \min_\pi \Phi(|\tilde\lambda_{\pi(1)} - \lambda_1|, \ldots, |\tilde\lambda_{\pi(n)} - \lambda_n|), \]
where as usual $\pi$ ranges over all permutations of the integers $1, \ldots, n$. Then for $\Phi(x) = \|x\|_2$, the Hoffman-Wielandt theorem states that
\[ \mathrm{md}_\Phi(A, \tilde A) \le \|\tilde A - A\|_\Phi. \tag{3.8} \]
It is natural to conjecture that (3.8) remains true for normal matrices and arbitrary unitarily invariant norms. The conjecture is untrue, even for orthogonal matrices; however, many partial results are known. The following survey is largely based on the book by Bhatia [28], which contains proofs and further references.

Mirsky [158, 1960] showed that the conjecture is true for Hermitian matrices. See Section 3 for a proof and applications.

Wittmeyer [274, 1936] claims that the theorem is true for normal matrices and the 2-norm, but he refers the reader to his Ph.D. thesis for the proof. Since others have tried and failed to establish this result, it must remain open until Wittmeyer's proof can be examined.

Bhatia and Davis [29, 1984] have shown that the conjecture is true for orthogonal matrices and the 2-norm. Another proof was given by Bhatia and Holbrook [32, 1985].

Other partial results are obtained by relaxing the bound. Bhatia, Davis, and McIntosh [31, 1983] have shown that for unitary matrices
\[ \mathrm{md}_\Phi(A, \tilde A) \le \frac{\pi}{2}\|\tilde A - A\|_\Phi, \]
and they give an example to show that $\pi/2$ is the best possible constant (Exercise 3.4). They also show that for normal matrices
\[ \mathrm{md}(A, \tilde A) \le \gamma\|\tilde A - A\|_2, \]
where $\gamma \le 2.91$ [30, 1987]; i.e., the conjecture is true for normal matrices and the 2-norm, provided we multiply the right-hand side by a factor of about three. For most practical applications this is good enough.

The inequality (3.5) is due to Bauer and Fike [15], but as we have pointed out it is a weak corollary of their more general results.

Exercises

1. Let $A$ and $\tilde A$ be normal of order $n$. Show that
\[ \|\tilde A - A\|_F \le \max_\pi \sqrt{\sum_i |\tilde\lambda_{\pi(i)} - \lambda_i|^2}, \]
where $\pi$ ranges over all permutations of the integers 1, 2,
..., n.

2. Let $A$ and $\tilde A$ be normal. If there are convex sets $\mathcal{A}$ and $\tilde{\mathcal{A}}$ such that
   1. $\mathcal{A}$ contains $k$ eigenvalues of $A$,
   2. $\tilde{\mathcal{A}}$ contains at least $n - k + 1$ eigenvalues of $\tilde A$,
   3. the distance from $\mathcal{A}$ to $\tilde{\mathcal{A}}$ is $\delta$,
then $\delta \le \|\tilde A - A\|_2$.

3. (Bhatia and Davis [29]). Let $A$ and $\tilde A$ be orthogonal matrices with their eigenvalues lying in a semicircle of the unit circle. Order the eigenvalues by the order in which they appear on the semicircle, say counterclockwise. Show that
\[ \max_i |\tilde\lambda_i - \lambda_i| \le \|\tilde A - A\|_2. \]

4. Let $\Phi$ be the symmetric gauge function defined by $\Phi(x) = \|x\|_1$. Let
\[ A_\pm = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \pm 1 & 0 & 0 & \cdots & 0 \end{pmatrix}. \]
Show that $\|A_+ - A_-\|_\Phi = 2$, whereas $\lim_{n \to \infty} \mathrm{md}_\Phi(A_+, A_-) = \pi$.

5. Give an example of a doubly stochastic matrix $S$ whose elements are not of the form $|u_{ij}|^2$, where $U$ is unitary.

6. (Bauer-Householder [16]). Let $\Lambda = X^{-1}AX$ be diagonal. Let $\alpha$ and $\beta$ be polynomials and $w$ a vector with $\beta(A)w \ne 0$. Show that there is an eigenvalue of $A$ in the region
\[ \left\{ \zeta : \left|\frac{\alpha(\zeta)}{\beta(\zeta)}\right| \le \kappa(X)\,\frac{\|\alpha(A)w\|_2}{\|\beta(A)w\|_2} \right\}. \]

7. Let $X \in \mathbb{C}^{n \times n}$ and let $\kappa_{\mathrm{opt}}(X)$ be the smallest value of $\kappa_F(XD)$, where $D$ is nonsingular and diagonal (see Theorem 3.5). Show that if $\|x_i\|_2 = 1$ ($i = 1, \ldots, n$), then $\kappa_F(X) \le \sqrt{n}\,\kappa_{\mathrm{opt}}(X)$.
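The scaling result of Theorem 3.5 can be checked numerically. The following is a minimal sketch in plain Python, restricted to real $2 \times 2$ matrices so that the inverse is available in closed form. The matrix `X` and the helper names (`inv2`, `kappa_F`, `lower_bound`, `balance`) are illustrative assumptions, not notation from the text. The scaling $d_i = \sqrt{\|y_i\|_2 / \|x_i\|_2}$ is one choice that enforces (3.7) with $\alpha = 1$.

```python
import math

def inv2(X):
    """Inverse of a real 2x2 matrix given as nested lists."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def fro(X):
    """Frobenius norm."""
    return math.sqrt(sum(v * v for row in X for v in row))

def kappa_F(X):
    """Frobenius condition number ||X||_F ||X^{-1}||_F."""
    return fro(X) * fro(inv2(X))

def col_norms(X):
    """2-norms of the columns x_i of X."""
    return [math.hypot(X[0][j], X[1][j]) for j in range(2)]

def row_norms(X):
    """2-norms of the rows of X."""
    return [math.hypot(row[0], row[1]) for row in X]

def lower_bound(X):
    """sum_i ||y_i||_2 ||x_i||_2, where y_i^H is row i of X^{-1}, so Y^H X = I."""
    return sum(y * x for y, x in zip(row_norms(inv2(X)), col_norms(X)))

def balance(X):
    """Scale column i by d_i = sqrt(||y_i||_2 / ||x_i||_2), giving (3.7) with alpha = 1."""
    d = [math.sqrt(y / x) for x, y in zip(col_norms(X), row_norms(inv2(X)))]
    return [[X[i][j] * d[j] for j in range(2)] for i in range(2)]

# Hypothetical, badly scaled matrix of eigenvectors.
X = [[1.0, 1000.0], [0.0, 1.0]]
print("kappa_F(X)           =", kappa_F(X))
print("sum ||y_i|| ||x_i||  =", lower_bound(X))
print("kappa_F(X D)         =", kappa_F(balance(X)))
```

In this sketch the lower bound is unchanged by column scaling, because $\|y_i\|_2\|x_i\|_2$ is invariant when $x_i$ is multiplied by $d_i$ and $y_i$ is divided by it; only $\kappa_F$ itself moves, and the balanced matrix attains the bound up to rounding.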
4. Hermitian Matrices

In this section we will treat the perturbation of eigenvalues of Hermitian matrices. This is an area rich in results, and we will only be able to sample some of the more important. We will begin with two classical results: Sylvester's inertia theorem and Cauchy's interlacing theorem. We will then establish Wielandt's elegant generalization of Fischer's characterization of the eigenvalues of a Hermitian matrix. This result in turn yields a host of powerful perturbation bounds.

Throughout this section $A$ will denote a Hermitian matrix with eigenvalues
\[ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n, \]
and $\tilde A = A + E$ will denote a Hermitian perturbation of $A$ with eigenvalues
\[ \tilde\lambda_1 \ge \tilde\lambda_2 \ge \cdots \ge \tilde\lambda_n. \]

4.1. Inertia and Interlacing

A fundamental problem of matrix theory is to determine what remains invariant under some class of transformations. For example, the eigenvalues and Jordan structure of a matrix are not altered by similarity transformations. For Hermitian matrices it is natural to consider transformations that leave the matrix Hermitian, which leads us to the class of CONGRUENCE TRANSFORMATIONS; that is, transformations of the form $X^H A X$, where $X$ is nonsingular. Unless $X$ is unitary, the eigenvalues of $A$ need not remain invariant under this transformation. However, the number of positive, negative, and zero eigenvalues does not change.

Theorem 4.1 (Sylvester, Jacobi). Let $A$ be Hermitian, and define the INERTIA of $A$ to be the ordered triplet
\[ \mathrm{inertia}(A) = [\pi(A), \nu(A), \zeta(A)], \]
where $\nu(A)$, $\zeta(A)$, and $\pi(A)$ are respectively the number of negative, zero, and positive eigenvalues of $A$. Then for any nonsingular $X$,
\[ \mathrm{inertia}(X^H A X) = \mathrm{inertia}(A). \]

Proof. The proof is by contradiction. Suppose, for example, that $A$ has more positive eigenvalues than $X^H A X$. Let $\mathcal{Y}$ be the space spanned by the eigenvectors corresponding to positive eigenvalues of $A$. Then
\[ y \in \mathcal{Y} \implies y^H A y > 0. \]
Let $\mathcal{Z}$ be the space spanned by all vectors of the form $Xz$, where $z$ is an eigenvector corresponding to a negative or zero eigenvalue of $X^H A X$. Then
\[ z \in \mathcal{Z} \implies z^H A z \le 0. \]
Hence $\mathcal{Y} \cap \mathcal{Z} = \{0\}$. But by hypothesis, $\dim(\mathcal{Y}) + \dim(\mathcal{Z}) > n$, where $n$ is the order of $A$. Hence $\mathcal{Y}$ and $\mathcal{Z}$ have a nonzero vector in common, a contradiction. ∎

An important consequence of the inertia theorem is Cauchy's beautiful theorem relating the eigenvalues of a principal submatrix to the eigenvalues of the original matrix.

Theorem 4.2 (Cauchy). Let $B$ be a principal submatrix of $A$ of order $n - 1$ with eigenvalues $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_{n-1}$. Then
\[ \lambda_1 \ge \mu_1 \ge \lambda_2 \ge \mu_2 \ge \cdots \ge \mu_{n-1} \ge \lambda_n. \]

Proof. Without loss of generality assume that $B$ is the leading principal submatrix of $A$, so that we may write
\[ A = \begin{pmatrix} B & a \\ a^H & \alpha \end{pmatrix}. \]
Assume that the theorem is false. Then for some $i$ either $\mu_i > \lambda_i$ or $\lambda_{i+1} > \mu_i$. Let $i$ be the first such index.

We will treat the case $\mu_i > \lambda_i$, the other case being similar. Let $\mu_i > \tau > \lambda_i$. Then $B - \tau I$ is nonsingular, and the matrix
\[ H = \begin{pmatrix} B - \tau I & 0 \\ 0 & \alpha - \tau - a^H (B - \tau I)^{-1} a \end{pmatrix}
= \begin{pmatrix} I & 0 \\ -a^H (B - \tau I)^{-1} & 1 \end{pmatrix}
\begin{pmatrix} B - \tau I & a \\ a^H & \alpha - \tau \end{pmatrix}
\begin{pmatrix} I & -(B - \tau I)^{-1} a \\ 0 & 1 \end{pmatrix} \]
is congruent to $A - \tau I$. Hence by the inertia theorem, $H$ has the same number of positive eigenvalues as $A - \tau I$, namely $i - 1$. But $H$ has at least as many positive eigenvalues as $B - \tau I$, namely $i$. The contradiction establishes the theorem. ∎

If in the theorem $C$ is a principal submatrix of $A$ of order $n - 2$, then the eigenvalues $\nu_1 \ge \nu_2 \ge \cdots \ge \nu_{n-2}$ of $C$ satisfy $\mu_1 \ge \nu_1 \ge \mu_2 \ge \nu_2 \ge \cdots \ge \nu_{n-2} \ge \mu_{n-1}$. Hence
\[ \lambda_i \ge \nu_i \ge \lambda_{i+2}, \qquad i = 1, 2, \ldots, n - 2. \]
Continuing through submatrices in this manner, we have the following corollary.

Corollary 4.3. Let $B$ be a principal submatrix of order $n - k$ of $A$ with eigenvalues $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_{n-k}$. Then
\[ \lambda_i \ge \mu_i \ge \lambda_{i+k}, \qquad i = 1, 2, \ldots, n - k. \]

Finally we observe that the interlacing theorem holds for more than just principal submatrices. Let $U \in \mathbb{C}^{n \times (n-k)}$ have orthonormal columns and let $V$ be chosen so that $(U\ V)$ is unitary. Then applying Corollary 4.3 to the matrix $(U\ V)^H A (U\ V)$, we have the following corollary.

Corollary 4.4. Let $U \in \mathbb{C}^{n \times (n-k)}$ have orthonormal columns. Let the eigenvalues of $U^H A U$ be $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_{n-k}$. Then
\[ \lambda_i \ge \mu_i \ge \lambda_{i+k}, \qquad i = 1, 2, \ldots, n - k. \]

4.2. Wielandt's Theorem and Its Consequences

It is a consequence of Theorem I.3.13 that
\[ \lambda_1 = \max_{x^H x = 1} x^H A x. \]
An important generalization of this fact is Fischer's theorem, which states that
\[ \lambda_i = \max_{\dim(\mathcal{X}) = i}\ \min_{\substack{x \in \mathcal{X} \\ x^H x = 1}} x^H A x. \]
In this subsection we will establish a further generalization, due to Wielandt, which has far-ranging implications. The proof, which has been adapted directly from Wielandt's paper, is complicated and may be omitted without loss of continuity.

Theorem 4.5 (Wielandt). Let $1 \le i_1 < i_2 < \cdots < i_k \le n$. Then
\[ \lambda_{i_1} + \lambda_{i_2} + \cdots + \lambda_{i_k}
= \max_{\substack{\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k} \\ \dim(\mathcal{X}_{i_j}) = i_j}}
\ \min_{\substack{X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k}),\ x_{i_j} \in \mathcal{X}_{i_j} \\ X^H X = I}}
\mathrm{trace}(X^H A X) \tag{4.1} \]
and
\[ \lambda_{i_1} + \lambda_{i_2} + \cdots + \lambda_{i_k}
= \min_{\substack{\mathcal{X}_{i_1} \supset \mathcal{X}_{i_2} \supset \cdots \supset \mathcal{X}_{i_k} \\ \dim(\mathcal{X}_{i_j}) = n - i_j + 1}}
\ \max_{\substack{X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k}),\ x_{i_j} \in \mathcal{X}_{i_j} \\ X^H X = I}}
\mathrm{trace}(X^H A X). \tag{4.2} \]

Remark 4.6. Note that the words max and min (instead of sup and inf) imply that the maximizing or minimizing objects actually exist.

Proof. We will establish (4.1), from which it is an easy exercise to establish (4.2).

We begin by showing that there is a particular sequence $\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k}$ of subspaces with $\dim(\mathcal{X}_{i_j}) = i_j$ such that if $X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k})$ ($x_{i_j} \in \mathcal{X}_{i_j}$) has orthonormal columns, then $\mathrm{trace}(X^H A X) \ge \sum_j \lambda_{i_j}$. In fact, let $\mathcal{X}_{i_j}$ be the space spanned by the eigenvectors of $A$ corresponding to $\lambda_1, \lambda_2, \ldots, \lambda_{i_j}$. Then $x_{i_j}$ is a linear combination of these eigenvectors, and since $x_{i_j}^H x_{i_j} = 1$, we have $x_{i_j}^H A x_{i_j} \ge \lambda_{i_j}$. Hence
\[ \mathrm{trace}(X^H A X) = \sum_j x_{i_j}^H A x_{i_j} \ge \sum_j \lambda_{i_j}. \]

In view of the result of the last paragraph, it will be sufficient to establish that
\[ \max_{\substack{\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k} \\ \dim(\mathcal{X}_{i_j}) = i_j}}
\ \min_{\substack{X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k}),\ x_{i_j} \in \mathcal{X}_{i_j} \\ X^H X = I}}
\mathrm{trace}(X^H A X) \le \lambda_{i_1} + \lambda_{i_2} + \cdots + \lambda_{i_k}. \]
The proof will be by induction on $n$. Note that the theorem is trivially true when $k = n$, since in this case $X^H A X$ is similar to $A$. Hence the theorem is true for $n = 1$.
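Let us therefore assume that $n > 1$ and $k < n$. Let $\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k}$ with $\dim(\mathcal{X}_{i_j}) = i_j$ be given. We must show that there is a matrix $X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k})$, $x_{i_j} \in \mathcal{X}_{i_j}$, with orthonormal columns such that $\mathrm{trace}(X^H A X) \le \lambda_{i_1} + \lambda_{i_2} + \cdots + \lambda_{i_k}$.

First assume that $i_k < n$. Let $\mathcal{X}_{n-1}$ be an $(n-1)$-dimensional subspace containing $\mathcal{X}_{i_k}$. Let $Z = (z_1\ z_2\ \cdots\ z_{n-1})$ be a matrix with orthonormal columns such that $\mathcal{R}[(z_1\ \cdots\ z_{i_j})] = \mathcal{X}_{i_j}$ and $\mathcal{R}(Z) = \mathcal{X}_{n-1}$. Let $B = Z^H A Z$. Then by Corollary 4.4 the eigenvalues $\mu_i$ of $B$ satisfy
\[ \mu_i \le \lambda_i, \qquad i = 1, \ldots, n - 1. \tag{4.3} \]
Now let
\[ \mathcal{Y}_{i_j} = \{Z^H x : x \in \mathcal{X}_{i_j}\}. \]
Observe that since $\mathcal{X}_{i_j} \subset \mathcal{R}(Z)$, if $y \in \mathcal{Y}_{i_j}$ then $x = Zy \in \mathcal{X}_{i_j}$. Moreover, $y^H B y = x^H A x$. By the induction hypothesis there are orthonormal vectors $y_{i_j} \in \mathcal{Y}_{i_j}$ such that
\[ \sum_j y_{i_j}^H B y_{i_j} \le \sum_j \mu_{i_j}. \tag{4.4} \]
Hence if $x_{i_j} = Z y_{i_j}$, then $x_{i_j} \in \mathcal{X}_{i_j}$ and $\sum_j y_{i_j}^H B y_{i_j} = \sum_j x_{i_j}^H A x_{i_j}$. Hence by (4.3) and (4.4)
\[ \mathrm{trace}(X^H A X) = \sum_j x_{i_j}^H A x_{i_j} \le \sum_j \lambda_{i_j}, \]
which is what we were to establish.

Now assume that $i_k = n$. Let $l$ be the largest index such that $i_l + 1 < i_{l+1}$. For notational convenience let $i_l = p$ and $i_{l+1} = q$. Let $\hat{\mathcal{X}}_{n-1}$ be an $(n-1)$-dimensional subspace that contains $\mathcal{X}_p$ and the eigenvectors corresponding to $\lambda_q, \ldots, \lambda_n$. Since $q, q+1, \ldots, n-1$ are among the indices $i_j$, we have
\[ \mathcal{X}_p \subset \mathcal{X}_q \cap \hat{\mathcal{X}}_{n-1} \subset \cdots \subset \mathcal{X}_{n-1} \cap \hat{\mathcal{X}}_{n-1} \subset \hat{\mathcal{X}}_{n-1}. \]
Since for $i = q, \ldots, n-1$, $\dim(\mathcal{X}_i \cap \hat{\mathcal{X}}_{n-1}) \ge i - 1$, we can find subspaces $\hat{\mathcal{X}}_{q-1}, \ldots, \hat{\mathcal{X}}_{n-2}$ such that
\[ \hat{\mathcal{X}}_{q-1} \subset \mathcal{X}_q,\ \ldots,\ \hat{\mathcal{X}}_{n-2} \subset \mathcal{X}_{n-1} \]
and
\[ \mathcal{X}_{i_1} \subset \cdots \subset \mathcal{X}_p \subset \hat{\mathcal{X}}_{q-1} \subset \cdots \subset \hat{\mathcal{X}}_{n-1}. \]
Now apply the construction of the previous case to give a matrix $B$ with eigenvalues $\mu_1 \ge \cdots \ge \mu_{n-1}$ satisfying (4.3) and a matrix
\[ X = (x_{i_1}\ \cdots\ x_{i_l}\ \hat x_{q-1}\ \cdots\ \hat x_{n-1}) \]
with orthonormal columns such that
\[ x_{i_j} \in \mathcal{X}_{i_j}, \qquad j = 1, \ldots, l, \]
\[ \hat x_i \in \hat{\mathcal{X}}_i \subset \mathcal{X}_{i+1}, \qquad i = q - 1, \ldots, n - 1, \]
and
\[ \mathrm{trace}(X^H A X) \le \sum_{j=1}^{l} \mu_{i_j} + \sum_{i=q-1}^{n-1} \mu_i \le \sum_{j=1}^{l} \lambda_{i_j} + \sum_{i=q-1}^{n-1} \mu_i. \tag{4.5} \]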
By construction the $(n-1)$-dimensional subspace $\hat{\mathcal{X}}_{n-1}$ chosen above contains the eigenvectors of $A$ corresponding to $\lambda_q, \ldots, \lambda_n$. Hence these are also eigenvalues of $B$. Since $\mu_{q-1}, \ldots, \mu_{n-1}$ are the smallest eigenvalues of $B$, we have
\[ \sum_{i=q-1}^{n-1} \mu_i \le \sum_{i=q}^{n} \lambda_i, \]
and the result follows upon substituting this inequality in (4.5). ∎

When $k = 1$, Wielandt's theorem gives Fischer's characterization of the eigenvalues of a Hermitian matrix.

Corollary 4.7 (Fischer). The eigenvalues of $A$ are given by
\[ \lambda_i = \max_{\dim(\mathcal{X}) = i}\ \min_{\substack{x \in \mathcal{X} \\ x^H x = 1}} x^H A x \]
and
\[ \lambda_i = \min_{\dim(\mathcal{X}) = n - i + 1}\ \max_{\substack{x \in \mathcal{X} \\ x^H x = 1}} x^H A x. \]

For $i = 1$ the second of the above characterizations reduces to
\[ \lambda_1 = \max_{x^H x = 1} x^H A x, \]
as was pointed out at the beginning of this section. This latter characterization has important implications for perturbation theory. For suppose, as usual, that $\tilde A = A + E$, where $E$ is also Hermitian. Then denoting the largest eigenvalues of $\tilde A$ and $E$ by $\tilde\lambda_1$ and $\epsilon_1$, we have
\[ \tilde\lambda_1 = \max_{x^H x = 1} x^H \tilde A x \le \max_{x^H x = 1} x^H A x + \max_{x^H x = 1} x^H E x \le \lambda_1 + \epsilon_1. \]
In other words, since $|\epsilon_1| \le \|E\|_2$, the perturbation $E$ can increase the largest eigenvalue of $A$ by no more than $\|E\|_2$.

We will now proceed to generalize this result. As we did earlier, we first establish a result for sums of eigenvalues and then specialize it to a single eigenvalue.

Theorem 4.8. Let the eigenvalues of $E$ be $\epsilon_1 \ge \epsilon_2 \ge \cdots \ge \epsilon_n$, and let $i_1, \ldots, i_k$ be distinct integers between one and $n$ inclusive. Then
\[ \lambda_{i_1} + \cdots + \lambda_{i_k} + \epsilon_{n-k+1} + \cdots + \epsilon_n \le \tilde\lambda_{i_1} + \cdots + \tilde\lambda_{i_k} \le \lambda_{i_1} + \cdots + \lambda_{i_k} + \epsilon_1 + \cdots + \epsilon_k. \]

Proof. Without loss of generality, we may assume that $i_1 < \cdots < i_k$. We will first establish the second inequality.

By Remark 4.6 following Theorem 4.5, there are subspaces $\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k}$ such that
\[ \tilde\lambda_{i_1} + \cdots + \tilde\lambda_{i_k} = \min_{\substack{X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k}),\ x_{i_j} \in \mathcal{X}_{i_j} \\ X^H X = I}} \mathrm{trace}(X^H \tilde A X). \]
Moreover, there are vectors $x_{i_j} \in \mathcal{X}_{i_j}$ such that $X = (x_{i_1}\ \cdots\ x_{i_k})$ has orthonormal columns and
\[ \lambda_{i_1} + \cdots + \lambda_{i_k} \ge \mathrm{trace}(X^H A X). \]
It follows that
\[ \tilde\lambda_{i_1} + \cdots + \tilde\lambda_{i_k} \le \mathrm{trace}[X^H (A + E) X] \le \lambda_{i_1} + \cdots + \lambda_{i_k} + \mathrm{trace}(X^H E X). \]
But by Corollary 4.4,
\[ \mathrm{trace}(X^H E X) \le \epsilon_1 + \cdots + \epsilon_k, \]
which establishes the second inequality.

The first inequality may be obtained from the second by writing $A = \tilde A - E$, from which it follows that
\[ \lambda_{i_1} + \cdots + \lambda_{i_k} \le \tilde\lambda_{i_1} + \cdots + \tilde\lambda_{i_k} - \epsilon_{n-k+1} - \cdots - \epsilon_n. \ ∎ \]

When $k = 1$, the theorem provides a perturbation bound.

Corollary 4.9 (Weyl). For $i = 1, \ldots, n$
\[ \tilde\lambda_i \in [\lambda_i + \epsilon_n,\ \lambda_i + \epsilon_1]. \]

There are three things to note about this corollary.
First, the corollary is similar to the Gerschgorin theorem in that it provides a set of $n$ intervals (disks) whose union includes the eigenvalues of $\tilde A$. However, we know just which eigenvalue to look for in each interval. Moreover, it is impossible for an eigenvalue corresponding to one of a cluster of overlapping intervals to migrate outside its own interval.

Second, the intervals are not symmetric about the eigenvalues $\lambda_i$. In fact if $\epsilon_n$ is positive, the $i$th interval will not contain $\lambda_i$. This occurs when $E$ is positive definite. In other words, if a Hermitian matrix is perturbed by a positive definite matrix, its eigenvalues must increase.

Third, there is a weaker, more conventional form of the theorem which is stated in the following corollary.

Corollary 4.10.
\[ \max_i \{|\tilde\lambda_i - \lambda_i|\} \le \|E\|_2. \tag{4.6} \]

This result follows directly from the preceding corollary and the observation that $\|E\|_2 = \max\{|\epsilon_1|, |\epsilon_n|\}$. In the next subsection we will generalize this corollary.

4.3. Mirsky's Theorem

Equation (4.6) can be rewritten in a more symmetric form: namely,
\[ \|\mathrm{diag}(\tilde\lambda_i - \lambda_i)\|_2 \le \|E\|_2. \tag{4.7} \]
This suggests that we attempt to replace $\|\cdot\|_2$ with other norms to obtain new perturbation bounds. In fact, (4.7) is valid for any unitarily invariant norm; however, to prove it we first establish an analogous result for singular values.

Theorem 4.11 (Mirsky). Let $X$ and $\tilde X$ be matrices of the same dimensions with singular values
\[ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p, \qquad \tilde\sigma_1 \ge \tilde\sigma_2 \ge \cdots \ge \tilde\sigma_p. \]
Then for any unitarily invariant norm $\|\cdot\|$,
\[ \|\mathrm{diag}(\tilde\sigma_i - \sigma_i)\| \le \|\tilde X - X\|. \]

Proof. Without loss of generality we may assume that $X$ and $\tilde X$ are square (otherwise pad them out with zero rows or columns to make them so). Now by Theorem I.4.2 the eigenvalues of the matrix
\[ \begin{pmatrix} 0 & X \\ X^H & 0 \end{pmatrix} \]
are $\pm\sigma_1, \ldots, \pm\sigma_p$, and similarly for $\tilde X$. Finally if $\epsilon_1 \ge \cdots \ge \epsilon_p$ are the singular values of $\tilde X - X$, then the eigenvalues of
\[ \begin{pmatrix} 0 & \tilde X - X \\ (\tilde X - X)^H & 0 \end{pmatrix} \]
are $\pm\epsilon_1, \ldots, \pm\epsilon_p$. In Theorem 4.8 let
\[ i_k = \begin{cases} k & \text{if } \tilde\sigma_k \ge \sigma_k, \\ n + k & \text{if } \tilde\sigma_k < \sigma_k. \end{cases} \]
It then follows that
\[ |\tilde\sigma_1 - \sigma_1| + \cdots + |\tilde\sigma_k - \sigma_k| \le \epsilon_1 + \cdots + \epsilon_k, \qquad k = 1, \ldots, n. \]
Therefore by Theorem II.3.17 the inequality
\[ \Phi(\tilde\sigma_1 - \sigma_1, \ldots, \tilde\sigma_p - \sigma_p) \le \Phi(\epsilon_1, \ldots, \epsilon_p) \]
holds for any symmetric gauge function $\Phi$. The result now follows from von Neumann's characterization of unitarily invariant norms (Theorem II.3.6). ∎

An immediate consequence of Mirsky's theorem is the generalization of (4.7). Specifically, we have the following corollary.

Corollary 4.12. Let $\Phi$ be a symmetric gauge function and $\|\cdot\|_\Phi$ its corresponding unitarily invariant norm. Then
\[ \|\mathrm{diag}(\tilde\lambda_i - \lambda_i)\|_\Phi \le \|E\|_\Phi. \tag{4.8} \]

Proof. Let $\rho = \min\{\lambda_n, \tilde\lambda_n\}$. Then the eigenvalues of the matrices $A - \rho I$ and $\tilde A - \rho I$ are nonnegative; i.e., their singular values and their eigenvalues are the same. Mirsky's theorem now applies to give (4.8). ∎

When $\Phi$ generates the Frobenius norm, we obtain a Hermitian analogue of the Hoffman-Wielandt theorem.

Corollary 4.13.
\[ \sqrt{\sum_{i=1}^{n} (\tilde\lambda_i - \lambda_i)^2} \le \|E\|_F. \]

Note that this result is stronger than the Hoffman-Wielandt theorem, since it specifies an ordering of the eigenvalues that satisfy the inequality, whereas the Hoffman-Wielandt theorem merely asserts that such an ordering exists.

4.4. Residual Bounds

In this and the next subsections we will consider applications of the Mirsky theorem. The subject of this subsection is residual bounds.

As in Section 1 we are given a matrix $A$ (now Hermitian) of order $n$ and a matrix $X$ whose column space approximates an invariant subspace of $A$. This means that for some choice of $M$, the residual
\[ R = AX - XM \]
will be small. In particular, if $X$ has orthonormal columns, then by Theorem 1.15 any unitarily invariant norm of $R$ is minimized when
\[ M = X^H A X. \]
Moreover, since $A$ is Hermitian, we can use Mirsky's theorem to get a bound on the eigenvalues of $M$ as an approximation to those of $A$.

Theorem 4.14. Let $X \in \mathbb{C}^{n \times k}$ have orthonormal columns. Let $M = X^H A X$ and let $R = AX - XM$. Let $\Phi$ be a symmetric gauge function and let $\|\cdot\|_\Phi$ denote the corresponding family of unitarily invariant norms. If the eigenvalues of $A$ are $\lambda_1 \ge \cdots \ge \lambda_n$ and the eigenvalues of $M$ are $\mu_1 \ge \cdots \ge \mu_k$, then there are integers $i_1 < i_2 < \cdots < i_k$ such that
\[ \|\mathrm{diag}(\mu_j - \lambda_{i_j})\|_\Phi \le \|X R^H + R X^H\|_\Phi = \Phi(\rho_1, \rho_1, \rho_2, \rho_2, \ldots), \tag{4.9} \]
where $\rho_1 \ge \rho_2 \ge \cdots$ are the singular values of $R$.

Proof. We will establish (4.9) for the case $2k \le n$, leaving the other case as an exercise. For $M = X^H A X$ let
\[ E = -(X R^H + R X^H). \]
Then $E$ is Hermitian, and it is readily verified that
\[ (A + E)X = XM. \tag{4.10} \]
Thus $\mathcal{R}(X)$ is an invariant subspace of $A + E$, and to each eigenvalue $\mu_j$ of $M$ there corresponds an eigenvalue $\tilde\lambda_{i_j}$ of $A + E$.

By Mirsky's theorem $\|\mathrm{diag}(\tilde\lambda_i - \lambda_i)\|_\Phi \le \|E\|_\Phi$. Hence $\|\mathrm{diag}(\mu_j - \lambda_{i_j})\|_\Phi \le \|E\|_\Phi = \|X R^H + R X^H\|_\Phi$, and it remains only to establish the equality in (4.9), or equivalently that the singular values of $E$ are $\rho_1, \rho_1, \rho_2, \rho_2, \ldots$. But if $X_\perp$ is chosen so that $(X\ X_\perp)$ is unitary, then
\[ (X\ X_\perp)^H E (X\ X_\perp) = -\begin{pmatrix} 0 & R^H X_\perp \\ X_\perp^H R & 0 \end{pmatrix}. \]
Since $\mathcal{R}(R) \subset \mathcal{R}(X_\perp)$ and the columns of $X_\perp$ are orthonormal, the singular values of $X_\perp^H R$ are the same as those of $R$, and hence those of $E$ are those of $R$ repeated. ∎

It is worthwhile to list the bounds for the spectral and Frobenius norms.

Corollary 4.15. For the spectral norm we have
\[ \max_j \{|\mu_j - \lambda_{i_j}|\} \le \|R\|_2, \]
and for the Frobenius norm
\[ \sqrt{\sum_j (\mu_j - \lambda_{i_j})^2} \le \sqrt{2}\,\|R\|_F. \tag{4.11} \]

Remark 4.16.
By an application of the argument leading to the Hoffman-Wielandt theorem, Kahan has been able to remove the factor $\sqrt{2}$ in (4.11). See Exercise 4.8.

The residual bounds derived above can be very good or very bad, as the following example shows.

Example 4.17. If
\[ A = \begin{pmatrix} 0 & \epsilon \\ \epsilon & 0 \end{pmatrix} \]
and $X = \mathbf{1}_1$ (the first column of the identity matrix), then $M = 0$ and $\|R\|_2 = \epsilon$, so that the bound is attained (the eigenvalues of $A$ are $\pm\epsilon$). On the other hand, if
\[ A = \begin{pmatrix} 0 & \epsilon \\ \epsilon & 1 \end{pmatrix}, \]
then the residual bound for $M = 0$ is the same, but the smallest eigenvalue of $A$ is approximately $-\epsilon^2$!

The distinction between the two examples is that in the second the unwanted part of the spectrum is well removed from the part we are attempting to bound. In Section 3 we will show how to use such information to get a better bound.

Two more comments. First, the eigenvalues of $M = X^H A X$ are sometimes called the RAYLEIGH-RITZ APPROXIMATIONS to the eigenvalues of $A$. Second, although we motivated this subsection by taking $X$ to be a matrix of approximate eigenvectors, all that is required to get accurate eigenvalues is that $R$ be small. Indeed, the part of the proof where we show that the eigenvalues of $M$ are the same as those of $A + E$ can be turned into an algorithm for getting approximate eigenvectors from $X$, a procedure that is sometimes called Rayleigh-Ritz improvement.
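The two cases of Example 4.17 can be reproduced with a few lines of plain Python. This is a minimal sketch for real symmetric $2 \times 2$ matrices and a single unit vector $x$ (so $k = 1$ in Corollary 4.15). The helper names `eig_sym2`, `rayleigh_residual`, and `nearest_gap`, and the value `eps = 1e-3`, are illustrative assumptions.

```python
import math

def eig_sym2(A):
    """Eigenvalues (descending) of a real symmetric 2x2 matrix, in closed form."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return [mean + rad, mean - rad]

def rayleigh_residual(A, x):
    """For a unit vector x return (M, ||R||_2) with M = x^T A x and R = A x - x M."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    M = x[0] * Ax[0] + x[1] * Ax[1]
    R = [Ax[0] - M * x[0], Ax[1] - M * x[1]]
    return M, math.hypot(R[0], R[1])

def nearest_gap(A, x):
    """Return (M, ||R||_2, distance from M to the nearest eigenvalue of A)."""
    M, res = rayleigh_residual(A, x)
    gap = min(abs(M - lam) for lam in eig_sym2(A))
    return M, res, gap

eps = 1e-3          # assumed size of the off-diagonal coupling
x = [1.0, 0.0]      # first column of the identity
first = nearest_gap([[0.0, eps], [eps, 0.0]], x)
second = nearest_gap([[0.0, eps], [eps, 1.0]], x)

for name, (M, res, gap) in (("first", first), ("second", second)):
    print(f"{name}: M = {M:.1e}, ||R||_2 = {res:.1e}, |M - nearest eigenvalue| = {gap:.3e}")
```

In both cases the residual norm is `eps`, so the bound $|\mu - \lambda| \le \|R\|_2$ is the same; the first matrix attains it, while for the second the actual error is of order `eps**2`.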
4.5. Approximation by a Low-Rank Matrix

The second application of Mirsky's theorem is to the determination of low-rank approximations to a fixed matrix. As above, let $\Phi$ be a symmetric gauge function and $\|\cdot\|_\Phi$ be the corresponding unitarily invariant norm. Let $X \in \mathbb{C}^{m \times n}$ have the singular value decomposition
\[ X = U \Sigma V^H, \tag{4.12} \]
where $\sigma_1 \ge \cdots \ge \sigma_m \ge 0$. We wish to find a matrix $Y$ of rank not greater than $k$ that is as near as possible to $X$ in the $\Phi$-norm.

First let $Y$ be any matrix of rank not greater than $k$. Then the singular values of $Y$ are $\tau_1 \ge \cdots \ge \tau_k \ge 0 = \cdots = 0$; i.e., the last $m - k$ singular values are zero. It follows from Mirsky's theorem that
\[ \|Y - X\|_\Phi \ge \Phi(\tau_1 - \sigma_1, \ldots, \tau_k - \sigma_k, \sigma_{k+1}, \ldots, \sigma_m) \ge \Phi(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_m). \]
In other words, any approximation of rank not greater than $k$ must be at least $\Phi(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_m)$ removed from $X$ in the $\Phi$-norm.

Now let
\[ \Sigma_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0) \tag{4.13} \]
and
\[ X_k = U \Sigma_k V^H. \tag{4.14} \]
Then it is easily verified that $X_k$ has rank not greater than $k$ and $\|X_k - X\|_\Phi = \Phi(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_m)$. Thus we have proved the following approximation theorem.

Theorem 4.18 (Schmidt, Mirsky). Let $X$ have the singular value decomposition (4.12), where $\sigma_1 \ge \cdots \ge \sigma_m \ge 0$. Let $\Phi$ be a symmetric gauge function and $\|\cdot\|_\Phi$ be the corresponding unitarily invariant norm. If $Y$ is a matrix of rank less than or equal to $k$, then
\[ \|Y - X\|_\Phi \ge \Phi(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_m). \]
Moreover, equality is attained for the matrix $X_k$ defined by (4.13) and (4.14).

Notes and References

Although Sylvester published the inertia theorem in 1852 [233] (also see [234, 1853]), the theorem was found in Jacobi's papers and published posthumously [123, 1857] by Borchart, who gives 1847 as the date of discovery [39]. Hermite, who published his own proof [110, 1857], also names Jacobi as having discovered the principle.
According to one biographer [195], at the time Jacobi was suffering from diabetes and from personal reverses stemming from the revolutions of 1848, which probably accounts for his failure to publish.

The interlacing theorem (Theorem 4.2) is due to Cauchy [42, 1829].

Wielandt [267, 1955] proved his theorem because he was unable "to succeed in completing the interesting sketch of a proof given by Lidskii [147, 1950]" of Theorem 4.8 (see [28, p. 50] for more details and further references). Amir-Moez [4, 1956] generalized Wielandt's characterization by replacing the sums and traces by any function of the eigenvalues in question that is nondecreasing in its arguments. The special, but very important case in Corollary 4.7 is due to Fischer [74, 1905], who actually established it for matrix pencils (see Corollary VI.1.16). Courant [46, 1920] extended the result to differential operators, and the theorem is frequently called the Courant-Fischer theorem.

Weyl [265, 1912] proved more than is stated in Corollary 4.9 (see Exercises 4.3 and 4.4). He also claims the analogous results for singular values à la Schmidt [192].

For Mirsky's theorem see [159, 1963], which in addition contains an admirable survey of unitarily invariant norms and related topics.

For the spectral norm and arbitrary $M$, the residual bound of Corollary 4.15 is due to Kahan [130, 1967] (finally published as a part of [52, 1982]), who uses the dilation theorem (Exercise 1.10) specialized to Hermitian matrices. The generalization in Theorem 4.14 to unitarily invariant norms is new. The proof given here is closely related to a proof for the spectral norm given by Parlett [175, pp. 219-220]. It should be noted that this result is but one, and one of the simplest, of a host of useful residual bounds. See [144, 145, 264] and especially the book by Parlett [175], which contains a unified treatment of many of these topics.
As was noted in the text, the eigenvalues of $M$ in Theorem 4.14 are frequently called Rayleigh-Ritz approximations to the eigenvalues of $A$. Both Rayleigh and Ritz were concerned with approximating the eigenvalues of an
infinite operator by replacing it with a matrix eigenvalue problem. Rayleigh [183, 1899] found the natural frequencies of vibrating systems by restricting their degrees of freedom to a finite number of modes, which were to be chosen to accentuate the fundamental frequency. Ritz [186, 1909] approximated the eigenvalues of the vibrating string by minimizing the variational equation over a finite dimensional subspace. Neither gives a formal justification for his method. A curious custom has grown up of calling eigenvector approximations obtained from $M$ and $X$ "Ritz vectors," although Ritz himself merely said that he was unable to establish their convergence using the techniques he had developed earlier in his paper.

The Schmidt-Mirsky theorem (Theorem 4.18) is commonly attributed to Eckart and Young [63, 1936], who established it for the Frobenius norm. But Schmidt [192, 1907] proved it for integral operators and the Hilbert-Schmidt norm, the natural extension of the Frobenius norm. Mirsky [159, 1963] generalized it to unitarily invariant norms.

When a Hermitian matrix is perturbed at random, a multiple eigenvalue will tend to break up into simple eigenvalues, and the perturbations in these eigenvalues will all be of a size. When the perturbation is not random, however, the perturbations can be quite disparate. Sun [231, 1989] has investigated the case where the elements of $A$ depend analytically on several parameters.

Exercises

1. Let
\[ A = \begin{pmatrix} B & c \\ c^H & \delta \end{pmatrix}. \]
Show that there is an eigenvalue $\lambda$ of $A$ satisfying $|\lambda - \delta| \le \|c\|_2$.

2. (Lidskii [147]). In the notation of Theorem 4.8, let $e = (\epsilon_1, \ldots, \epsilon_n)^T$. Show that $(\tilde\lambda_1 - \lambda_1, \ldots, \tilde\lambda_n - \lambda_n)$ lies in the convex hull of the set $\{Pe : P \text{ a permutation}\}$.

THE FOLLOWING TWO EXERCISES SHOW IN MODERN NOTATION WHAT WEYL [265] ACTUALLY PROVED.

3. Let $A$ and $B$ be Hermitian with $B$ of rank $k$. Then the largest eigenvalue of $A - B$ is not less than the $(k+1)$th largest eigenvalue of $A$.
4. Let $A$, $B$, and $C = A + B$ have eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n$, $\beta_1 \ge \cdots \ge \beta_n$, and $\gamma_1 \ge \cdots \ge \gamma_n$. Then
\[ \gamma_{i+j+1} \le \alpha_{i+1} + \beta_{j+1}. \]

THE FOLLOWING EXERCISES DEVELOP THE KATO-TEMPLE RESIDUAL BOUND [43, SECTION 6.5] FOR AN ISOLATED EIGENVALUE AND ITS EIGENVECTOR. IN WHAT FOLLOWS $\|x\|_2 = 1$, $\mu = x^H A x$, AND $r = Ax - \mu x$.

5. Let $\mu \in (\alpha, \beta)$, where $(\alpha, \beta)$ contains no eigenvalues of $A$. Then
\[ (\beta - \mu)(\mu - \alpha) \le \|r\|_2^2. \]

6. Let $\underline{\alpha} < \mu < \bar{\alpha}$, where $(\underline{\alpha}, \bar{\alpha})$ contains exactly one eigenvalue $\lambda$ of $A$. Then
\[ \lambda \in \left[ \mu - \frac{\|r\|_2^2}{\bar{\alpha} - \mu},\ \mu + \frac{\|r\|_2^2}{\mu - \underline{\alpha}} \right]. \]

7. (Kahan [129]). Show that for the 2-norm, the hypothesis $M = X^H A X$ can be removed from Theorem 4.14. Specifically, for arbitrary Hermitian $M$, the inequality (4.9) can be replaced by
\[ \|\mathrm{diag}(\mu_j - \lambda_{i_j})\|_2 \le \|R\|_2. \]
[Hint: Use the dilation theorem (Exercise 1.11).]

8. (Kahan [129]). Show that the factor $\sqrt{2}$ can be removed from (4.11). [Hint: Assume without loss of generality that $A$ and $M$ are diagonal, and regard $R$ as a function of $X$, or more generally of $U = (X\ X_\perp)$, where $U$ is unitary. Let $W = (|u_{ij}|^2)$, and let $\delta_{ij} = (\lambda_i - \mu_j)^2$ when $j \le k$ and otherwise be zero. Show that $\|R\|_F^2 = \sum_i \sum_j w_{ij}\delta_{ij}$. Conclude from Birkhoff's theorem that $\|R\|_F$ is minimized when $U$ is a permutation matrix.]

5. Some Further Results

This section is devoted to some useful results that could not be made to fit comfortably into the preceding sections. In the first subsection we treat the problem of non-Hermitian perturbations of Hermitian matrices; and in the second, the perturbation of eigenvalues of matrices that are similar to Hermitian or normal matrices.
5.1. Non-Hermitian Perturbations

The results of this subsection concern non-Hermitian perturbations of Hermitian matrices and except as noted are due to Kahan. Throughout we will assume that $A$ is a Hermitian matrix with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$. We will further assume that $\tilde A$ is a non-Hermitian perturbation of $A$; that is, $E = \tilde A - A$ is not Hermitian. The eigenvalues of $\tilde A$, which may be complex, will be written $\mu_k + i\nu_k$, where $\mu_1 \ge \cdots \ge \mu_n$. Finally we will write
\[ E_\Re = \frac{E + E^H}{2} \]
and
\[ E_\Im = \frac{E - E^H}{2i} = \frac{\tilde A - \tilde A^H}{2i} \]
for the "real" and "imaginary" parts of $E$. It can be verified by direct computation that
\[ \|E\|_F^2 = \|E_\Re\|_F^2 + \|E_\Im\|_F^2. \tag{5.1} \]

Theorem 5.1. Let
\[ D_k = \{\mu + i\nu : |\mu + i\nu - \lambda_k| \le \|E\|_2 \text{ and } |\nu| \le \|E_\Im\|_2\}. \]
Then
\[ \mathcal{L}(\tilde A) \subset \bigcup_{k=1}^{n} D_k. \]

Proof. By Corollary 3.4, for any $\mu + i\nu \in \mathcal{L}(\tilde A)$ there is an eigenvalue $\lambda_k$ of $A$ such that
\[ |\mu + i\nu - \lambda_k| \le \|E\|_2. \]
It remains only to show that $|\nu| \le \|E_\Im\|_2$.

Let $x$ be a normalized eigenvector of $\tilde A$ corresponding to $\mu + i\nu$; i.e.,
\[ \tilde A x = (\mu + i\nu)x. \]
Then
\[ x^H \tilde A x = \mu + i\nu \quad\text{and}\quad x^H \tilde A^H x = \mu - i\nu. \]
Hence
\[ \nu = \frac{x^H(\tilde A - \tilde A^H)x}{2i} = x^H E_\Im x, \]
from which it follows that $|\nu| \le \|E_\Im\|_2$. ∎

If one of the regions $D_k$ is isolated from the others, it contains only one eigenvalue, namely $\mu_k + i\nu_k$, which is perforce real. Thus the theorem says something new only for clusters of eigenvalues whose regions overlap. Specifically, if the $m$ regions $D_k, \ldots, D_{k+m-1}$ overlap, then they contain precisely $m$ eigenvalues of $\tilde A$, namely $\mu_k + i\nu_k, \ldots, \mu_{k+m-1} + i\nu_{k+m-1}$. The regions themselves are disks trimmed at the top and bottom by horizontal lines $\Im(z) = \pm\|E_\Im\|_2$. As the perturbation becomes increasingly Hermitian, these lines approach one another, restricting the sizes of the imaginary parts of the eigenvalues of $\tilde A$.

There is another version of the theorem that is reminiscent of the Hoffman-Wielandt theorem.

Theorem 5.2. In the notation above
\[ \sqrt{\sum_{k=1}^{n} \nu_k^2} \le \|E_\Im\|_F \tag{5.2} \]
and
\[ \sqrt{\sum_{k=1}^{n} (\mu_k - \lambda_k)^2} \le \|E_\Re\|_F + \sqrt{\|E_\Im\|_F^2 - \sum_{k=1}^{n} \nu_k^2}. \tag{5.3} \]
From this it follows that
\[ \sqrt{\sum_{k=1}^{n} |(\mu_k + i\nu_k) - \lambda_k|^2} \le \sqrt{2}\,\|E\|_F. \tag{5.4} \]

Proof. By passing to the Schur form of $\tilde A$, we may assume that
\[ \tilde A = M + iN + R, \tag{5.5} \]
where $M = \mathrm{diag}(\mu_1, \ldots, \mu_n)$, $N = \mathrm{diag}(\nu_1, \ldots, \nu_n)$, and $R$ is strictly upper triangular. Then
\[ A + E_\Re = M + \frac{R + R^H}{2} \]
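and
\[ E_\Im = N + \frac{R - R^H}{2i}. \]
Now since $N$ and $(R - R^H)/2i$ have disjoint sets of nonzero elements,
\[ \|E_\Im\|_F^2 = \|N\|_F^2 + \left\|\frac{R - R^H}{2i}\right\|_F^2 = \|N\|_F^2 + \tfrac{1}{2}\|R\|_F^2 \ge \|N\|_F^2 = \sum_{k=1}^{n} \nu_k^2, \]
which establishes (5.2). On the other hand, since $A$ and $M$ are Hermitian,
\[ \begin{aligned}
\sqrt{\sum_{k=1}^{n} (\mu_k - \lambda_k)^2} &\le \|M - A\|_F \\
&= \left\|E_\Re - \frac{R + R^H}{2}\right\|_F \\
&\le \|E_\Re\|_F + \left\|\frac{R + R^H}{2}\right\|_F \\
&= \|E_\Re\|_F + \tfrac{1}{\sqrt{2}}\|R\|_F \\
&= \|E_\Re\|_F + \sqrt{\|E_\Im\|_F^2 - \|N\|_F^2} \\
&= \|E_\Re\|_F + \sqrt{\|E_\Im\|_F^2 - \sum_{k=1}^{n} \nu_k^2},
\end{aligned} \]
which establishes (5.3).

To establish the combined bound (5.4), write
\[ \begin{aligned}
\sum_{k=1}^{n} |(\mu_k + i\nu_k) - \lambda_k|^2 &= \sum_{k=1}^{n} (\mu_k - \lambda_k)^2 + \sum_{k=1}^{n} \nu_k^2 \\
&\le \left(\|E_\Re\|_F + \sqrt{\|E_\Im\|_F^2 - \sum_{k=1}^{n} \nu_k^2}\right)^2 + \sum_{k=1}^{n} \nu_k^2 \\
&= \|E_\Re\|_F^2 + 2\|E_\Re\|_F \sqrt{\|E_\Im\|_F^2 - \sum_{k=1}^{n} \nu_k^2} + \|E_\Im\|_F^2 \\
&\le (\|E_\Re\|_F + \|E_\Im\|_F)^2 \\
&\le 2(\|E_\Re\|_F^2 + \|E_\Im\|_F^2) = 2\|E\|_F^2. \ ∎
\end{aligned} \]

In Theorem 3.3 we assumed that $A$ was diagonalizable and derived a bound on $\mathrm{sv}_A(\tilde A)$ that depended on the condition number of the diagonalizing transformation. In this subsection we will assume that both $A$ and $\tilde A$ can be reduced to either Hermitian or normal matrices by similarity transformations, and obtain perturbation bounds on their eigenvalues.

We begin with the Hermitian case. The principal result is based on the following lemma.

Lemma 5.3. Let $H$ and $K$ be $n \times n$ Hermitian matrices, and let $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ with $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$. Then
\[ \|H\Sigma - \Sigma K\|_2 \ge \sigma_n \|H - K\|_2. \]

Proof. Let $\lambda$ be the eigenvalue of $H - K$ of largest absolute value, so that $\|H - K\|_2 = |\lambda|$. Let $x$ be the corresponding normalized eigenvector. Then
\[ x^H(H\Sigma - \Sigma K)x = x^H(H - K)\Sigma x + x^H(K\Sigma - \Sigma K)x = \lambda x^H \Sigma x + i\tau, \tag{5.6} \]
where $\tau$ is real (here we have used the fact that the matrix $K\Sigma - \Sigma K$ is skew-Hermitian). Hence
\[ \begin{aligned}
\|H\Sigma - \Sigma K\|_2 &= \max_{\|u\|_2 = 1,\ \|v\|_2 = 1} |u^H(H\Sigma - \Sigma K)v| \\
&\ge \max_{\|u\|_2 = 1} |u^H(H\Sigma - \Sigma K)u| \\
&\ge |x^H(H\Sigma - \Sigma K)x| \\
&\ge |\lambda|\,|x^H \Sigma x| \qquad [\text{by (5.6)}] \\
&= \|H - K\|_2\,|x^H \Sigma x| \\
&\ge \sigma_n \|H - K\|_2. \ ∎
\end{aligned} \]

Lemma 5.3 allows us to establish the following theorem.

Theorem 5.4. Let $A, \tilde A \in \mathbb{C}^{n \times n}$, and suppose that there are two nonsingular matrices $P$ and $Q$ such that $P^{-1}AP$ and $Q^{-1}\tilde A Q$ are Hermitian. Let the eigenvalues of $A$ and $\tilde A$ (which are necessarily real) be $\lambda_1 \ge \cdots \ge \lambda_n$ and $\tilde\lambda_1 \ge \cdots \ge \tilde\lambda_n$. Then
\[ |\tilde\lambda_i - \lambda_i| \le \kappa_2(P)\kappa_2(Q)\|\tilde A - A\|_2, \qquad i = 1, \ldots, n. \tag{5.7} \]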
Proof. Let

\[ \Lambda = P^{-1}AP \quad\text{and}\quad \tilde\Lambda = Q^{-1}\tilde A Q. \]

Then

\[
\begin{aligned}
\|\tilde A - A\|_2 &= \|Q\tilde\Lambda Q^{-1} - P\Lambda P^{-1}\|_2 \\
&= \|Q(\tilde\Lambda Q^{-1}P - Q^{-1}P\Lambda)P^{-1}\|_2 \\
&\ge \|Q^{-1}\|_2^{-1}\|P\|_2^{-1}\, \|\tilde\Lambda(Q^{-1}P) - (Q^{-1}P)\Lambda\|_2.
\end{aligned}
\tag{5.8}
\]

Let $U\Sigma V^H$ be the singular value decomposition of $Q^{-1}P$. Then with $\sigma_n$ denoting the smallest diagonal element of $\Sigma$, we have from Lemma 5.3

\[
\begin{aligned}
\|\tilde\Lambda(Q^{-1}P) - (Q^{-1}P)\Lambda\|_2 &= \|(U^H\tilde\Lambda U)\Sigma - \Sigma(V^H\Lambda V)\|_2 \\
&\ge \sigma_n \|U^H\tilde\Lambda U - V^H\Lambda V\|_2 \\
&\ge \sigma_n |\tilde\lambda_i - \lambda_i|, \qquad i = 1, \ldots, n.
\end{aligned}
\]

Thus from (5.8) we get

\[ |\tilde\lambda_i - \lambda_i| \le \sigma_n^{-1}\|P\|_2\|Q^{-1}\|_2\|\tilde A - A\|_2, \qquad i = 1, \ldots, n. \tag{5.9} \]

Now $\sigma_n^{-1} = \|(Q^{-1}P)^{-1}\|_2 \le \|P^{-1}\|_2\|Q\|_2$. Combining this with (5.9) we get (5.7). ∎

There is an analogue of Theorem 5.4, due to Sun and Zhang, for matrices that can be transformed into normal matrices.

Theorem 5.5. Let $A, \tilde A \in \mathbb{C}^{n\times n}$. Assume that there are nonsingular matrices $P$ and $Q$ such that $P^{-1}AP$ and $Q^{-1}\tilde A Q$ are normal. Then

\[ \operatorname{md}_2(A, \tilde A) \le \kappa_2(P)\kappa_2(Q)\|\tilde A - A\|_F. \]

Proof. If we can establish the analogue of Lemma 5.3 for normal matrices, then the proof of Theorem 5.4 goes through mutatis mutandis. Specifically we must show that if $M$ and $N$ are normal matrices and $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$ with $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$, then

\[ \|M\Sigma - \Sigma N\|_F \ge \sigma_n \|M - N\|_F. \]

To show this, set

\[ \delta = \|M\Sigma - \Sigma N\|_F^2 - \sigma_n^2\|M - N\|_F^2 \]

and

\[ D = \Sigma - \sigma_n I. \]

Obviously, the diagonal elements of $D$ are nonnegative. Moreover,

\[
\begin{aligned}
\delta &= \|M(D + \sigma_n I) - (D + \sigma_n I)N\|_F^2 - \sigma_n^2\|M - N\|_F^2 \\
&= \|MD - DN + \sigma_n(M - N)\|_F^2 - \sigma_n^2\|M - N\|_F^2 \\
&= \|MD - DN\|_F^2 + 2\sigma_n \operatorname{Re}\{\operatorname{trace}[(MD - DN)^H(M - N)]\} \\
&= \|MD - DN\|_F^2 + \sigma_n \operatorname{trace}\{D[(M - N)^H(M - N) + (M - N)(M - N)^H]\} \\
&\ge 0,
\end{aligned}
\]

which is the required inequality. ∎

Notes and References

With the exception of Theorem 5.5, which is due to Sun [227, 1984] and Zhang [277, 1986], the results of this section are taken from a paper by Kahan [133, 1975]. Kahan writes as if Theorem 5.1 were due to Wilkinson [269, 1965], but although Wilkinson gives a brief discussion of nonsymmetric perturbations, he does not bound the imaginary parts.

In addition to the results of this section, Kahan shows that the matching distance of a non-Hermitian perturbation is proportional to $\log n\,\|E\|_2$ (Exercise 5.2).

Exercises

1. Let $A$ be Hermitian and $\tilde A$ be normal. In the notation of Theorem 5.1 show that

\[ \sqrt{\sum_{k=1}^n \nu_k^2} \le \|E_{\Im}\|_F \quad\text{and}\quad \sqrt{\sum_{k=1}^n (\mu_k - \lambda_k)^2} \le \|E_{\Re}\|_F. \]

Conclude that

\[ \sqrt{\sum_{k=1}^n |(\mu_k + i\nu_k) - \lambda_k|^2} \le \|E\|_F. \]

[Note: This is the bound that the Hoffman-Wielandt theorem would provide, except that the pairing of the eigenvalues of $A$ and $\tilde A$ is explicit.]

2. (Kahan [133]). Use the fact [132] that if $\mathcal{L}(Z)$ is real then

\[ \|Z - Z^H\|_2 \le (0.038 + \log_2 n)\,\|Z + Z^H\|_2 \]

to show that if $A$ is Hermitian then

\[ \operatorname{md}(A, \tilde A) \le \|E_{\Re}\|_2 + (0.038 + \log_2 n)\,\|E_{\Im}\|_2. \]

Chapter V
Invariant Subspaces

We have already observed in Section 11.4 that the problem of establishing perturbation bounds for eigenvectors is complicated by the fact that eigenvectors corresponding to multiple eigenvalues are not unique. This has the consequence that the eigenvectors corresponding to a tight cluster of eigenvalues will be ill conditioned. However, the space spanned by these eigenvectors is an invariant subspace, which need not be sensitive to perturbations in the matrix. It therefore makes sense to derive perturbation bounds for invariant subspaces, from which bounds for eigenvectors follow as a special case.

The first section of this chapter may be regarded as a continuation of the subsection on invariant subspaces in Chapter I. Here we introduce the notion of a simple invariant subspace (the analogue of a simple eigenpair) and establish its properties. In the next section we derive error and perturbation bounds for a simple invariant subspace of a general matrix. In the third section we present the Davis-Kahan theory for invariant subspaces of Hermitian matrices. The chapter concludes with a section on the singular value decomposition.

Throughout this chapter, $A$ will denote a matrix of order $n$, except in the last section, where it will denote an $m \times n$ matrix.
1. The Theory of Simple Invariant Subspaces

1.1. Definition

Let $\mathcal{X}$ be an invariant subspace of $A$, and let the columns of $X$ form a basis for $\mathcal{X}$. In Section I.3 we showed that there was a unique matrix $L$ such that

\[ AX = XL. \]

The matrix $L$ is the representation of $A$ on $\mathcal{X}$ with respect to the basis $X$, and the eigenvalues of $L$ are eigenvalues of $A$.

Unfortunately, the matrix $L$ need not characterize the invariant subspace $\mathcal{X}$. For example, if $A = I_n$, then any matrix $X \in \mathbb{C}^{n\times 2}$ with orthonormal columns spans an invariant subspace whose representation with respect to $X$ is $I_2$. This shows that we cannot circumvent the problem of nonuniqueness of eigenvectors by passing to invariant subspaces: we need additional conditions to insure that the invariant subspaces are themselves unique.

The key is provided by the observation that if $\lambda$ is a simple eigenvalue of $A$, then its eigenvector is unique up to a scalar multiple. The analogous requirement for an invariant subspace is that the eigenvalues of its representation $L$ be distinct from the other eigenvalues of $A$. We will say that such an invariant subspace is simple. However, before making a formal definition, it will be convenient to establish some preliminary results.

We begin with a useful characterization of invariant subspaces.

Theorem 1.1. Let the columns of $X$ be linearly independent and let the columns of $Y$ span $\mathcal{R}(X)^{\perp}$. Then $\mathcal{R}(X)$ is an invariant subspace of $A$ if and only if

\[ Y^H A X = 0. \tag{1.1} \]

In this case $\mathcal{R}(Y)$ is an invariant subspace of $A^H$.

Proof. Let $\mathcal{X} = \mathcal{R}(X)$. Then by definition $\mathcal{X}$ is an invariant subspace of $A$ if and only if $A\mathcal{X} \subset \mathcal{X}$. But

\[ A\mathcal{X} \subset \mathcal{X} \iff A\mathcal{X} \perp \mathcal{X}^{\perp} \iff \mathcal{R}(AX) \perp \mathcal{R}(Y) \iff Y^H A X = 0, \]

which establishes (1.1). Writing (1.1) in the form $X^H A^H Y = 0$, we see that $\mathcal{R}(Y)$ must be an invariant subspace of $A^H$. ∎

Just as the invariant subspace $\mathcal{R}(X)$ is a generalization of the notion of a right eigenvector, so $\mathcal{R}(Y)$ is a generalization of a left eigenvector. Consequently we shall call $\mathcal{R}(Y)$ a LEFT INVARIANT SUBSPACE of $A$.

The condition (1.1) can be regarded as saying that $A$ can be reduced to a block triangular form by a unitary similarity. To see this, let $\mathcal{X}_1$ be an invariant subspace of $A$, and let the columns of $X_1$ form an orthonormal basis for $\mathcal{X}_1$. Let $(X_1\ Y_2)$ be unitary. Then

\[ (X_1\ Y_2)^H A (X_1\ Y_2) = \begin{pmatrix} X_1^H A X_1 & X_1^H A Y_2 \\ Y_2^H A X_1 & Y_2^H A Y_2 \end{pmatrix}. \]

By (1.1) the matrix $Y_2^H A X_1$ is zero. Consequently, if we set $L_1 = X_1^H A X_1$, $L_2 = Y_2^H A Y_2$, and $H = X_1^H A Y_2$, then

\[ (X_1\ Y_2)^H A (X_1\ Y_2) = \begin{pmatrix} L_1 & H \\ 0 & L_2 \end{pmatrix}. \tag{1.2} \]

It is easy to see that

\[ A X_1 = X_1 L_1, \]

so that $L_1$ is the representation of $A$ on $\mathcal{X}_1$ with respect to $X_1$. Thus the eigenvalues of $L_1$ are the eigenvalues of $A$ associated with $\mathcal{X}_1$. The complementary set of eigenvalues are those of $L_2$. All this suggests the following definition.

Definition 1.2. Let $\mathcal{X}_1$ be an invariant subspace of $A$, and let (1.2) be its REDUCED FORM with respect to the unitary matrix $(X_1\ Y_2)$. Then $\mathcal{X}_1$ is a SIMPLE INVARIANT SUBSPACE if

\[ \mathcal{L}(L_1) \cap \mathcal{L}(L_2) = \emptyset. \]
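The reduced form (1.2) and the simplicity test of Definition 1.2 can be illustrated numerically. The following is only a sketch under stated assumptions: the matrix `V`, the eigenvalues 1, 2, 5, 6, and the use of a complete QR factorization to obtain `X1` and `Y2` are illustrative choices that are not taken from the text, and everything is real so that the conjugate transpose reduces to the transpose.

```python
import numpy as np

# Hypothetical test matrix with known eigenvalues: A = V diag(1, 2, 5, 6) V^{-1}.
V = np.array([[1.0, 0.2, 0.0, 0.1],
              [0.0, 1.0, 0.3, 0.0],
              [0.1, 0.0, 1.0, 0.2],
              [0.0, 0.1, 0.0, 1.0]])
A = V @ np.diag([1.0, 2.0, 5.0, 6.0]) @ np.linalg.inv(V)

# The first two eigenvectors span an invariant subspace: A X = X diag(1, 2).
X = V[:, :2]

# A complete QR factorization gives an orthonormal basis X1 of R(X) and an
# orthonormal basis Y2 of its orthogonal complement, so (X1 Y2) is orthogonal.
Qfull, _ = np.linalg.qr(X, mode="complete")
X1, Y2 = Qfull[:, :2], Qfull[:, 2:]

# Blocks of the reduced form (1.2).
L1 = X1.T @ A @ X1
H = X1.T @ A @ Y2
G = Y2.T @ A @ X1          # vanishes (up to rounding) by Theorem 1.1
L2 = Y2.T @ A @ Y2

eig_L1 = np.sort(np.linalg.eigvals(L1).real)
eig_L2 = np.sort(np.linalg.eigvals(L2).real)

# The subspace is simple when no eigenvalue of L1 coincides with one of L2.
gap = np.min(np.abs(eig_L1[:, None] - eig_L2[None, :]))

print("||Y2^T A X1||_F =", np.linalg.norm(G))
print("eigenvalues of L1:", eig_L1)
print("eigenvalues of L2:", eig_L2)
print("smallest eigenvalue gap:", gap)
```

A strictly positive gap is what Definition 1.2 asks for; in floating point one would compare it against a tolerance rather than against zero.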
A key fact about simple invariant subspaces is that they have complements in $\mathbb{C}^n$ that are also invariant subspaces (to see that this is not true in general, let

\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \]

and consider the invariant subspace spanned by $\mathbf{1}_1$, the first column of the identity). However, before we can prove this fact we must digress to establish the properties of a certain linear operator.

1.2. The Operator $\mathbf{T} = X \mapsto AX - XB$

In the sequel we shall have to solve SYLVESTER'S EQUATION, which is of the form

\[ AX - XB = C, \tag{1.3} \]

where $A$ and $B$ are square matrices of orders $n$ and $m$, so that $X$ and $C$ are $n \times m$ matrices. We will be concerned with conditions under which (1.3) has a unique solution. Equivalently, if we define the linear operator $\mathbf{T} : \mathbb{C}^{n\times m} \to \mathbb{C}^{n\times m}$ by

\[ \mathbf{T} = X \mapsto AX - XB, \]

then the problem becomes one of determining when $\mathbf{T}$ is nonsingular.

Theorem 1.3. The linear operator $\mathbf{T} = X \mapsto AX - XB$ is nonsingular if and only if

\[ \mathcal{L}(A) \cap \mathcal{L}(B) = \emptyset. \]

Proof. First suppose that $\lambda \in \mathcal{L}(A) \cap \mathcal{L}(B)$. Let $Ap = \lambda p$ and $q^H B = \lambda q^H$ ($p, q \ne 0$). Let $X = pq^H$. Then

\[ \mathbf{T}(X) = Apq^H - pq^H B = \lambda pq^H - \lambda pq^H = 0. \]

Thus $\mathbf{T}$ annihilates the nonzero matrix $X$ and must be singular.

Conversely, assume that $\mathcal{L}(A) \cap \mathcal{L}(B) = \emptyset$. We must show that the system $AX - XB = C$ has a unique solution. Let the Schur decomposition of $B$ be $T = V^H B V$, where $T$ is upper triangular with elements $\tau_{ik}$. Then with $Y = XV$ and $D = CV$, the equation $AX - XB = C$ is equivalent to

\[ AY - YT = D. \tag{1.4} \]

We will show that this equation has a unique solution. Partition $Y = (y_1 \ \cdots \ y_m)$ and $D = (d_1 \ \cdots \ d_m)$ by columns. Since $T$ is upper triangular, the first column in the relation (1.4) is

\[ Ay_1 - \tau_{11}y_1 = d_1 \]

or

\[ (A - \tau_{11}I)y_1 = d_1. \tag{1.5} \]

Since $\tau_{11} \in \mathcal{L}(B)$, it is not an eigenvalue of $A$, and the matrix $A - \tau_{11}I$ is nonsingular. Hence $y_1$ is the unique solution of (1.5).

Now suppose that $y_1, \ldots, y_{k-1}$ are uniquely determined. The $k$th column of (1.4) is

\[ Ay_k - \sum_{i=1}^{k} \tau_{ik}y_i = d_k \]

or

\[ (A - \tau_{kk}I)y_k = d_k + \sum_{i=1}^{k-1} \tau_{ik}y_i. \tag{1.6} \]

Since $\tau_{kk} \in \mathcal{L}(B)$, the matrix $A - \tau_{kk}I$ is nonsingular. Hence $y_k$ is the unique solution of (1.6). ∎

A corollary of this result is a characterization of the eigenvalues of $\mathbf{T}$.

Corollary 1.4.

\[ \mathcal{L}(\mathbf{T}) = \mathcal{L}(A) - \mathcal{L}(B). \]

Proof. If $\nu \in \mathcal{L}(\mathbf{T})$, then there is a nonzero $X$ such that $AX - XB = \nu X$, or $(A - \nu I)X - XB = 0$; i.e., the operator $X \mapsto (A - \nu I)X - XB$ is singular. It follows that $\mathcal{L}(A - \nu I)$ and $\mathcal{L}(B)$ have a common element, which is to say that $\nu = \lambda - \mu$ for some $\lambda \in \mathcal{L}(A)$ and $\mu \in \mathcal{L}(B)$. Hence $\mathcal{L}(\mathbf{T}) \subset \mathcal{L}(A) - \mathcal{L}(B)$. The inclusion in the other direction follows by reversing the above argument. ∎

1.3. The Spectral Resolution

We are now in a position to show that a simple invariant subspace has a complementary subspace. The following theorem exhibits the complement as the column space of a matrix constructed from a reduced form of the invariant subspace.
Theorem 1.5. Let the simple invariant subspace $\mathcal{X}_1$ have the reduced form (1.2) with respect to the unitary matrix $(X_1\ Y_2)$. Then there are matrices $X_2$ and $Y_1$ such that $(X_1\ X_2)^{-1} = (Y_1\ Y_2)^H$, and

\[ A = X_1 L_1 Y_1^H + X_2 L_2 Y_2^H, \tag{1.7} \]

where

\[ L_i = Y_i^H A X_i, \qquad i = 1, 2. \]

Proof. We begin by reducing the matrix

\[ \begin{pmatrix} L_1 & H \\ 0 & L_2 \end{pmatrix} \]

from (1.2) to block diagonal form by a similarity transformation. Specifically, we will show that there is a matrix $Q$ such that

\[ \begin{pmatrix} I & -Q \\ 0 & I \end{pmatrix} \begin{pmatrix} L_1 & H \\ 0 & L_2 \end{pmatrix} \begin{pmatrix} I & Q \\ 0 & I \end{pmatrix} = \begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix}. \tag{1.8} \]

This is equivalent to showing that there is a matrix $Q$ such that

\[ L_1 Q - Q L_2 = -H. \]

Since $\mathcal{X}_1$ is simple, $\mathcal{L}(L_1) \cap \mathcal{L}(L_2) = \emptyset$. Hence by Theorem 1.3, $Q$ exists (and is unique).

It follows from (1.2) and (1.8) that

\[
\begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix}
= \begin{pmatrix} I & -Q \\ 0 & I \end{pmatrix} \begin{pmatrix} X_1^H \\ Y_2^H \end{pmatrix} A\, (X_1\ Y_2) \begin{pmatrix} I & Q \\ 0 & I \end{pmatrix}
= \begin{pmatrix} Y_1^H \\ Y_2^H \end{pmatrix} A\, (X_1\ X_2),
\tag{1.9}
\]

where

\[ X_2 = Y_2 + X_1 Q \]

and

\[ Y_1 = X_1 - Y_2 Q^H. \]

The fact that $(X_1\ X_2)^{-1} = (Y_1\ Y_2)^H$ follows from the fact that (1.9) is a similarity transformation. Hence we may write (1.9) in the form

\[ A = (X_1\ X_2) \begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix} \begin{pmatrix} Y_1^H \\ Y_2^H \end{pmatrix}, \]

from which (1.7) follows directly. ∎

From (1.7) we see that $AX_1 = X_1 L_1$. More important, $AX_2 = X_2 L_2$, which implies that $\mathcal{X}_2 = \mathcal{R}(X_2)$ is an invariant subspace of $A$. Since $(X_1\ X_2)$ is nonsingular, together $\mathcal{X}_1$ and $\mathcal{X}_2$ span $\mathbb{C}^n$. Thus we have the following corollary.

Corollary 1.6. Let $\mathcal{X}_1$ be a simple invariant subspace of $A$. Then $A$ has a complementary invariant subspace $\mathcal{X}_2$. Moreover, the spaces $\mathcal{Y}_1 = \mathcal{X}_2^{\perp}$ and $\mathcal{Y}_2 = \mathcal{X}_1^{\perp}$ are the corresponding complementary pair of left invariant subspaces.

We will call (1.7) the SPECTRAL RESOLUTION of $A$ along $\mathcal{X}_1$ and $\mathcal{X}_2$. It is instructive to write the spectral resolution in a different way. Let

\[ P_i = X_i Y_i^H, \qquad i = 1, 2. \]

Then it is easily verified that

1. $P_i^2 = P_i$ ($i = 1, 2$),
2. $P_1 P_2 = P_2 P_1 = 0$,
3. $A = P_1 A P_1 + P_2 A P_2$. (1.10)

As we saw in Section I.2, the first condition says that $P_i \mathcal{X}_i = \mathcal{X}_i$ ($i = 1, 2$). The second condition says that $P_1 \mathcal{X}_2 = 0$. Hence if we decompose any vector $z$ into the sum

\[ z = x_1 + x_2, \qquad x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2, \]

then $x_1 = P_1 z$ and $x_2 = (I - P_1)z = P_2 z$. For this reason we say that $P_1$ is the projection onto $\mathcal{X}_1$ along $\mathcal{X}_2$. We will call it the SPECTRAL PROJECTION of the simple invariant subspace $\mathcal{X}_1$.
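The construction in the proof of Theorem 1.5 can be traced numerically. The sketch below is an illustration only: the blocks `L1`, `L2`, `H` are hypothetical, the unitary matrix $(X_1\ Y_2)$ is taken to be the identity for simplicity, and the Sylvester equation $L_1 Q - Q L_2 = -H$ is solved through a Kronecker-product linear system (a convenient choice for small matrices, not the column-by-column method of the proof of Theorem 1.3).

```python
import numpy as np

# Hypothetical reduced form (1.2); here (X1 Y2) is simply the identity.
L1 = np.array([[1.0, 0.5], [0.0, 2.0]])
L2 = np.array([[5.0, 1.0], [0.0, 6.0]])
H = np.array([[0.3, -0.2], [0.1, 0.4]])
l, m = L1.shape[0], L2.shape[0]
A = np.block([[L1, H], [np.zeros((m, l)), L2]])
I = np.eye(l + m)
X1, Y2 = I[:, :l], I[:, l:]

# Solve L1 Q - Q L2 = -H.  With vec() stacking columns,
# vec(L1 Q - Q L2) = (I_m kron L1 - L2^T kron I_l) vec(Q).
K = np.kron(np.eye(m), L1) - np.kron(L2.T, np.eye(l))
Q = np.linalg.solve(K, (-H).flatten(order="F")).reshape((l, m), order="F")

# Bases of the complementary right and left invariant subspaces
# (real data, so the conjugate transpose is the transpose).
X2 = Y2 + X1 @ Q
Y1 = X1 - Y2 @ Q.T

# Spectral projections P_i = X_i Y_i^H.
P1 = X1 @ Y1.T
P2 = X2 @ Y2.T

print("A X2 - X2 L2:", np.linalg.norm(A @ X2 - X2 @ L2))
print("P1^2 - P1:", np.linalg.norm(P1 @ P1 - P1))
print("P1 P2:", np.linalg.norm(P1 @ P2))
print("A - (P1 A P1 + P2 A P2):", np.linalg.norm(A - P1 @ A @ P1 - P2 @ A @ P2))
print("||P1||_2:", np.linalg.norm(P1, 2))
```

In this example $\|P_1\|_2$ equals $\sqrt{1 + \|Q\|_2^2}$, so a large solution $Q$ of the Sylvester equation signals an ill-conditioned spectral projection.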
When $\dim(\mathcal{X}_1) = 1$, that is when $X_1 = x_1$ is an eigenvector, the spectral projection is $P_1 = x_1 y_1^H$ and $\|P_1\|_2 = \|y_1\|_2$. We have already seen that this quantity is a condition number for the eigenvalue $\lambda_1$ [(IV.2.8)]. The quantity $\|P_1\|$ will play an analogous role for $L_1$. Hence it is of interest to know the singular values of $P_1$.

Theorem 1.7. Let $\mathcal{X}$ be a simple invariant subspace of $A$ and let $P$ be its spectral projection. Let $\mathcal{Y}$ be the corresponding left invariant subspace, and let $\theta_1 \ge \theta_2 \ge \cdots \ge 0$ be the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$. Then $\theta_1 < \pi/2$ and

\[ S(P) = \{\sec\theta_1, \sec\theta_2, \ldots\}. \tag{1.11} \]

Proof. We will adopt the notation of Theorem 1.5, with $\mathcal{X}_1 = \mathcal{X}$, etc. Since $Y_1 = X_1 - Y_2 Q^H$, we have $Y_1^H Y_1 = I + QQ^H$. It follows that if $\rho_1, \rho_2, \ldots$ are the singular values of $Q$, then

\[ S(Y_1) = \left\{ \sqrt{1 + \rho_1^2}, \sqrt{1 + \rho_2^2}, \ldots \right\}. \tag{1.12} \]

Clearly the columns of $Y_1(I + QQ^H)^{-1/2}$ form an orthonormal basis for $\mathcal{Y}_1$. Since the columns of $Y_2$ form an orthonormal basis for $\mathcal{X}_1^{\perp}$, it follows from Corollary I.5.4 that the sines of the canonical angles of $\mathcal{X}_1$ and $\mathcal{Y}_1$ are the singular values of $Y_2^H Y_1 (I + QQ^H)^{-1/2} = -Q^H(I + QQ^H)^{-1/2}$. Hence

\[ \sin\theta_i = \frac{\rho_i}{\sqrt{1 + \rho_i^2}} < 1. \tag{1.13} \]

It follows that the canonical angles must be less than $\pi/2$. Finally, (1.11) follows from (1.12) and (1.13) and the fact that $X_1$ in the expression $P = X_1 Y_1^H$ has orthonormal columns. ∎

Although we shall not use the fact in the sequel, it is worth noting that a spectral resolution can be defined for more than two complementary subspaces. An extreme example is given by the spectral decomposition (I.3.1), in which each invariant subspace is spanned by a single eigenvector. This example also shows that although the simplicity of an invariant subspace is sufficient for a spectral resolution, it is not necessary.

Notes and References

The theory developed in this section is constructive, in the sense that given a basis for an invariant subspace, one can construct the associated spectral resolution. From a pedagogical point of view, the approach has the advantage that one can develop the theory for a simple eigenpair (something students grasp readily) and then generalize it by replacing lower case letters with capital letters [214].

The disadvantage of the approach is that it deals only with simple invariant subspaces, whereas the set of all invariant subspaces of a matrix has a far richer structure. For a detailed exposition see the book by Gohberg, Lancaster, and Rodman [87, 1986].

In spite of its simplicity, Theorem 1.1 is the key to perturbation theory for invariant subspaces, since it furnishes an equation that an invariant subspace must satisfy. To obtain perturbation bounds all one has to do is solve the equation.

Equation (1.3) is known variously as Sylvester's equation and Rosenblum's equation [188, 1956]. The proof of the existence of a solution (Theorem 1.3) is constructive and can serve as a basis for efficient algorithms for solving the equation [11, 89]. Integral representations of the solution have been given by Rosenblum (Exercise 1.4) and Bhatia, Davis, and McIntosh [31] (Exercise 1.5).

The possibility of spectral resolutions into more than two blocks is treated in the exercises below. The ultimate spectral resolution is the Jordan canonical form (cf. I.3.4), in which the blocks are as small and simple as possible. However, the transformations which produce the Jordan form may be too ill conditioned to make it usable. This has led some algorithmists to seek resolutions in which the blocks are nearly as small as possible, given a bound on the condition of the transformations (e.g., see [95, 189, 18, 128]).

Exercises

1. Given a (not necessarily orthonormal) basis for an invariant subspace of $A$, describe in detail how to compute its spectral resolution.

THE FOLLOWING EXERCISES CONCERN THE SOLUTION OF SYLVESTER'S EQUATION $AX - XB = C$ AND THE ASSOCIATED OPERATOR $\mathbf{T} = X \mapsto AX - XB$.

2. Assuming $A$ and $B$ are diagonalizable, show that $\mathbf{T}$ has a complete system of eigenvectors. Use this fact to give an alternative proof of Theorem 1.3.

3. (Matrix representation of $\mathbf{T}$). Let $X = (x_1 \ \cdots \ x_m)$ and define

\[ \operatorname{vec}(X) = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}. \]

Show that

\[ \operatorname{vec}[\mathbf{T}(X)] = (I_m \otimes A - B^{\mathrm{T}} \otimes I_n)\operatorname{vec}(X), \]

where $\otimes$ is the Kronecker product defined in Exercise I.3.26.

4. (Rosenblum [188]). Let $\Gamma$ be a simple closed curve containing $\mathcal{L}(B)$ and excluding $\mathcal{L}(A)$. Show that

\[ X = -\frac{1}{2\pi i} \oint_{\Gamma} (A - \zeta I)^{-1} C (B - \zeta I)^{-1}\, d\zeta. \]

5. (Bhatia, Davis, and McIntosh [31]). Let $A$ and $B$ be normal with $\mathcal{L}(A) \cap \mathcal{L}(B) = \emptyset$. Let $A = A_{\Re} + iA_{\Im}$, where $A_{\Re}$ and $A_{\Im}$ are Hermitian, and similarly for $B$. For $t = (\tau_1\ \tau_2)^{\mathrm{T}} \in \mathbb{R}^2$, let

\[ U(t) = e^{i(\tau_1 A_{\Re} + \tau_2 A_{\Im})} \quad\text{and}\quad V(t) = e^{i(\tau_1 B_{\Re} + \tau_2 B_{\Im})}. \]

Show that if $\phi$ is any function integrable on $\mathbb{R}^2$ satisfying

\[ \int_{\mathbb{R}^2} e^{-i t^{\mathrm{T}} x}\, \phi(x)\, dx = \frac{1}{\tau_1 + i\tau_2}, \qquad \|t\|_2 \ge \delta, \]

then

\[ X = \int_{\mathbb{R}^2} U(-t)\, C\, V(t)\, \phi(t)\, dt. \]

THESE EXERCISES DEVELOP THE PROPERTIES OF SPECTRAL RESOLUTIONS WITH MORE THAN TWO BLOCKS.

6. Let the distinct eigenvalues of $A$ be $\lambda_1, \ldots, \lambda_k$. Show that there are matrices $X = (X_1 \ \cdots \ X_k)$ and $Y = (Y_1 \ \cdots \ Y_k)$ such that $X^{-1} = Y^H$ and

\[ Y^H A X = \operatorname{diag}(L_1, \ldots, L_k), \]

where $\mathcal{L}(L_i) = \{\lambda_i\}$. Conclude that

\[ A = X_1 L_1 Y_1^H + \cdots + X_k L_k Y_k^H. \]

7. Let $P_i = X_i Y_i^H$ ($i = 1, \ldots, k$). Show that the $P_i$ are (oblique) projections satisfying $P_i P_j = 0$ ($i \ne j$) and

\[ A = P_1 A P_1 + \cdots + P_k A P_k. \]

8. Let $\phi(A)$ be defined as in Exercise II.2.20. Show that

\[ \phi(A) = X_1 \phi(L_1) Y_1^H + \cdots + X_k \phi(L_k) Y_k^H. \]

[Note: This exercise shows that the evaluation of $\phi(A)$ may be reduced to the evaluation of $\phi(L_i)$ ($i = 1, \ldots, k$). Since the orders of the $L_i$ may be much less than the order of $A$, this reduction may save a great deal of work.]

2. Perturbation of Invariant Subspaces

In this section we will treat two closely related problems. The first is the problem of assessing the accuracy of an approximate invariant subspace in terms of a residual. Specifically, let the columns of $X_1$ form an orthonormal basis for an approximate invariant subspace of $A$. Let $L_1 = X_1^H A X_1$, and let $R = AX_1 - X_1 L_1$ be the associated residual. If $R = 0$, then $\mathcal{R}(X_1)$ is an invariant subspace of $A$. This suggests that if $R$ is sufficiently small there will be an invariant subspace $\tilde{\mathcal{X}}_1$ of $A$ that approaches $\mathcal{R}(X_1)$ as $R$ approaches zero. The problem is to bound the difference in terms of $R$.

The second problem is our usual perturbation problem. Let $\mathcal{X}_1$ be a simple invariant subspace of $A$ and let $\tilde A = A + E$. Show that for sufficiently small $E$ there is an invariant subspace $\tilde{\mathcal{X}}_1$ of $\tilde A$ that approaches $\mathcal{X}_1$ as $E$ approaches zero, and bound their difference in terms of $E$.

The two problems are closely related. For example, if the orthonormal columns of $X_1$ span an invariant subspace of $A$ and we set $M = X_1^H A X_1$, then $R = \tilde A X_1 - X_1 M = EX_1$. Thus for any unitarily invariant norm, $\|R\| \le \|E\|$, and we may use any residual bound to determine how near $\mathcal{R}(X_1)$ is to an invariant subspace of $\tilde A$. In fact, this is the general approach we will take in this section: first establish a residual bound, then derive a perturbation bound from it.
2.1. The Approximation Problem

Let $(X_1\ Y_2)$ be a unitary matrix and let

\[ (X_1\ Y_2)^H A (X_1\ Y_2) = \begin{pmatrix} L_1 & H \\ G & L_2 \end{pmatrix}. \tag{2.1} \]

By Theorem 1.1 the space $\mathcal{R}(X_1)$ is an invariant subspace if and only if $G = Y_2^H A X_1$ is zero. The problem treated in this section is to determine how near $\mathcal{R}(X_1)$ is to an invariant subspace of $A$, when $G$ is small.

The solution is conceptually simple, although fussy to realize. Let

\[ \tilde X_1 = (X_1 + Y_2 P)(I + P^H P)^{-1/2} \tag{2.2} \]

and

\[ \tilde Y_2 = (Y_2 - X_1 P^H)(I + PP^H)^{-1/2}, \tag{2.3} \]

where $P$ is a matrix to be determined so that $\mathcal{R}(\tilde X_1)$ is an invariant subspace of $A$. It is easy to see that $(\tilde X_1\ \tilde Y_2)$ is unitary. Hence by Theorem 1.1, $\mathcal{R}(\tilde X_1)$ is an invariant subspace of $A$ if and only if $\tilde Y_2^H A \tilde X_1 = 0$. Writing this condition out in terms of (2.2) and (2.3) yields the following nonlinear equation for $P$:

\[ PL_1 - L_2 P = G - PHP. \tag{2.4} \]

If we define

\[ \mathbf{T} = P \mapsto PL_1 - L_2 P, \tag{2.5} \]

then this equation can be written

\[ \mathbf{T}(P) = G - PHP. \tag{2.6} \]

The conditions under which this equation has a solution are given in the following theorem, in which $\|\cdot\|$ represents a consistent family of norms.

Theorem 2.1. Let $(X_1\ Y_2)$ be unitary. Let $L_1$, $L_2$, $G$, and $H$ be defined by (2.1) and set

\[ \gamma = \|G\|, \qquad \eta = \|H\|. \]

Assume that $\mathcal{L}(L_1) \cap \mathcal{L}(L_2) = \emptyset$, so that the operator $\mathbf{T}$ defined by (2.5) is nonsingular, and set

\[ \delta = \operatorname{sep}(L_1, L_2) \stackrel{\mathrm{def}}{=} \inf_{\|P\| = 1} \|\mathbf{T}(P)\| > 0. \]

Then if

\[ \frac{\gamma\eta}{\delta^2} < \frac14, \tag{2.7} \]

there is a unique solution $P$ of (2.6) satisfying

\[ \|P\| \le \frac{2\gamma}{\delta + \sqrt{\delta^2 - 4\gamma\eta}} < \frac{2\gamma}{\delta}. \tag{2.8} \]

If $\tilde X_1$ and $\tilde Y_2$ are defined by (2.2) and (2.3), then $\mathcal{R}(\tilde X_1)$ and $\mathcal{R}(\tilde Y_2)$ are simple right and left invariant subspaces of $A$. The representation of $A$ with respect to $\tilde X_1$ is

\[ \tilde L_1 = (I + P^H P)^{1/2}(L_1 + HP)(I + P^H P)^{-1/2}. \tag{2.9} \]

The representation of $A$ with respect to $\tilde Y_2$ is

\[ \tilde L_2 = (I + PP^H)^{-1/2}(L_2 - PH)(I + PP^H)^{1/2}. \tag{2.10} \]

Proof. The existence of a $P$ satisfying (2.8) is a consequence of Theorem 2.11 at the end of this section. By construction $\mathcal{R}(\tilde X_1)$ and $\mathcal{R}(\tilde Y_2)$ are invariant subspaces. We will establish their simplicity later. It remains only to establish the representations (2.9) and (2.10).

The representation of $A$ with respect to $\tilde X_1$ is $\tilde X_1^H A \tilde X_1$. From (2.2)

\[ \tilde X_1^H A \tilde X_1 = (I + P^H P)^{-1/2}(L_1 + HP + P^H G + P^H L_2 P)(I + P^H P)^{-1/2}. \tag{2.11} \]

From (2.4),

\[ L_2 P = PL_1 + PHP - G. \]

If this value is substituted into (2.11), the result, after some simplification, is (2.9). The representation (2.10) follows similarly. ∎

We now turn to an extended discussion of the theorem. The first point to consider is the interpretation of $P$ and its norm. By Corollary I.5.4 the singular values of the matrix

\[ Y_2^H \tilde X_1 = P(I + P^H P)^{-1/2} \]

are the sines of the canonical angles $\theta_1, \theta_2, \ldots$ between $\mathcal{R}(X_1)$ and the invariant subspace $\mathcal{R}(\tilde X_1)$. If $\pi_1, \pi_2, \ldots$ are the singular values of $P$, then

\[ \frac{\pi_i}{\sqrt{1 + \pi_i^2}} = \sin\theta_i. \]

Hence

\[ \pi_i = \tan\theta_i. \]

Thus, loosely speaking, the theorem bounds the tangents of the canonical angles between $\mathcal{R}(X_1)$ and an invariant subspace of $A$. In particular, since $\sin\theta \le \tan\theta$ ($0 \le \theta < \pi/2$), if $\|\cdot\|$ is the spectral norm, then

\[ \|\tilde P_1 - P_1\|_2 \le \frac{2\gamma_2}{\delta_2}, \]

where $P_1$ and $\tilde P_1$ are the orthogonal projections onto $\mathcal{R}(X_1)$ and $\mathcal{R}(\tilde X_1)$. In terms of the Frobenius norm

\[ \|\tilde P_1 - P_1\|_F \le \frac{2\sqrt2\,\gamma_F}{\delta_F} \]

(see Theorem I.5.5).

There are three numbers ($\gamma$, $\eta$, and $\delta$) that determine the existence and size of $P$. The number $\gamma$ is closely related to the residual $R = AX_1 - X_1 L_1$. In fact, since $L_1 = X_1^H A X_1$, it follows that

\[ (X_1\ Y_2)^H R = \begin{pmatrix} 0 \\ G \end{pmatrix}. \]

Hence if $\|\cdot\|$ is unitarily invariant, $\gamma = \|R\|$. Moreover, by Theorem IV.1.15, $R$ is optimal in the sense that $\|R\|$ is minimal among all residuals of the form $AX_1 - X_1 L$.

The number $\eta = \|H\|$ is of only secondary importance in the bound (2.8). However, it features prominently in the condition $\gamma\eta/\delta^2 < 1/4$, which insures the existence of $\tilde X_1$. Here it plays a role analogous to the quantity $\eta$ introduced at the beginning of the second subsection of Section IV.1. If $\eta$ is small, the eigenvalues of $L_1$ and $L_2$ are effectively uncoupled. On the other hand, if $\eta$ is large, the eigenvalues of $L_1$ and $L_2$ look like a single cluster compared to $\|A\|$, and the residual must be very small for our theorem to hold.

Before we go on to discuss the meaning of the number $\delta$, it will be worth our while to recast part of the theorem in the less cluttered form of a residual bound. The key is to note that if we set $S = X_1^H A - L_1 X_1^H$, then $\|S\| = \|H\|$ for any unitarily invariant norm. In the following corollary we change our notation slightly.

Corollary 2.2. Let $\|\cdot\|$ be a unitarily invariant norm. Let $(X\ Y)$ be unitary and $\mathcal{X} = \mathcal{R}(X)$. Let $L = X^H A X$ and $M = Y^H A Y$. Finally, let

\[ R = AX - XL \quad\text{and}\quad S = X^H A - LX^H. \]

If

\[ \frac{\|R\|\,\|S\|}{\operatorname{sep}(L, M)^2} < \frac14, \]

then there is a simple invariant subspace $\tilde{\mathcal{X}}$ of $A$ such that

\[ \|\tan\Theta(\mathcal{X}, \tilde{\mathcal{X}})\| < \frac{2\|R\|}{\operatorname{sep}(L, M)}. \]

We now turn to the number $\delta = \operatorname{sep}(L_1, L_2)$, which is the thing that makes the whole theory work. As the name "sep" indicates, it is related to the separation of $\mathcal{L}(L_1)$ and $\mathcal{L}(L_2)$.

Theorem 2.3. For any square matrices $L$ and $M$,

\[ \operatorname{sep}(L, M) \le \min|\mathcal{L}(L) - \mathcal{L}(M)|. \tag{2.12} \]

Proof. As above, set $\mathbf{T} = P \mapsto PL - MP$. If $\operatorname{sep}(L, M) = 0$, the inequality (2.12) holds trivially. Otherwise, $\mathbf{T}$ is nonsingular, and

\[ \operatorname{sep}^{-1}(L, M) = \sup_{\|Q\| = 1} \|\mathbf{T}^{-1}(Q)\| = \|\mathbf{T}^{-1}\|. \]

Now by Theorem II.2.6 the spectral radius of $\mathbf{T}^{-1}$ is bounded by $\|\mathbf{T}^{-1}\|$. By Corollary 1.4, $\mathcal{L}(\mathbf{T}) = \mathcal{L}(L) - \mathcal{L}(M)$. Hence

\[ \operatorname{sep}^{-1}(L, M) \ge \rho(\mathbf{T}^{-1}) = \max|\mathcal{L}(L) - \mathcal{L}(M)|^{-1}, \]

which is equivalent to (2.12). ∎

Theorem 2.3 and the bound (2.8) imply that if some of the eigenvalues of $L_1$ and $L_2$ are close then the invariant subspace $\mathcal{R}(\tilde X_1)$ may be distant from $\mathcal{R}(X_1)$. However, the converse need not be true, since $\operatorname{sep}(L_1, L_2)$ can be small when the eigenvalues of $L_1$ and $L_2$ are well separated, as the following example shows.

Example 2.4. Let

\[ L = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad M = \begin{pmatrix} 0 & 1 \\ \epsilon & 0 \end{pmatrix}. \]

Then $\mathcal{L}(L) = \{0\}$ and $\mathcal{L}(M) = \{\pm\sqrt{\epsilon}\}$, so that $\delta_\lambda = \min|\mathcal{L}(L) - \mathcal{L}(M)| = \sqrt{\epsilon}$. On the other hand, $\operatorname{sep}(L, M) \le \epsilon$. Thus $\operatorname{sep}(L, M)$ can be arbitrarily smaller than the distance between the eigenvalues of $L$ and those of $M$, in the sense that $\lim_{\epsilon \to 0} \operatorname{sep}(L, M)/\delta_\lambda = 0$.

The distance $\delta_\lambda$ between the eigenvalues of $L$ and those of $M$ is necessarily a continuous function of the elements of $L$ and $M$; however, it need not be analytic. For example, perturbing $M$ in the above example so that it becomes equal to $L$ changes the (2,1)-element of $M$ by $\epsilon$. However it changes $\delta_\lambda$ by $\epsilon^{1/2}$. The function $\operatorname{sep}(L, M)$ is better behaved.

Theorem 2.5.

\[ \operatorname{sep}(L, M) - \|E\| - \|F\| \le \operatorname{sep}(L + E, M + F) \le \operatorname{sep}(L, M) + \|E\| + \|F\|. \]

Remark 2.6. If the norm $\|\cdot\|$ is unitarily invariant, then we may replace $\|E\|$ and $\|F\|$ by $\|E\|_2$ and $\|F\|_2$.

Proof. For the first inequality,

\[
\begin{aligned}
\operatorname{sep}(L + E, M + F) &= \inf_{\|P\| = 1} \|P(L + E) - (M + F)P\| \\
&\ge \inf_{\|P\| = 1} \{\|PL - MP\| - \|PE\| - \|FP\|\} \\
&\ge \inf_{\|P\| = 1} \{\|PL - MP\|\} - \|E\| - \|F\| \\
&= \operatorname{sep}(L, M) - \|E\| - \|F\|.
\end{aligned}
\]

The second inequality is established similarly. ∎

Thus a perturbation in $L$ or $M$ cannot induce a larger perturbation in $\operatorname{sep}(L, M)$. This stability of the function sep will be important in establishing the perturbation bounds of the next subsection.
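For the Frobenius norm, $\operatorname{sep}(L, M)$ can be computed directly for small matrices, which makes Theorem 2.3, Example 2.4, and Theorem 2.5 easy to check by experiment. The sketch below rests on assumptions that are not in the text: the helper names `sep_F` and `eig_gap` are hypothetical, the norm is fixed to be the Frobenius norm (so sep is the smallest singular value of the Kronecker-product matrix of $P \mapsto PL - MP$), and the test matrices are the pair used in Example 2.4 with an arbitrary value of $\epsilon$.

```python
import numpy as np

def sep_F(L, M):
    """Frobenius-norm sep(L, M) = inf over ||P||_F = 1 of ||P L - M P||_F."""
    l, m = L.shape[0], M.shape[0]
    # Matrix of P -> P L - M P acting on vec(P) (columns stacked), P is m-by-l:
    # vec(P L) = (L^T kron I_m) vec(P),  vec(M P) = (I_l kron M) vec(P).
    K = np.kron(L.T, np.eye(m)) - np.kron(np.eye(l), M)
    # The infimum of ||K v||_2 over unit vectors v is the smallest singular value.
    return np.linalg.svd(K, compute_uv=False)[-1]

def eig_gap(L, M):
    """Smallest distance between an eigenvalue of L and an eigenvalue of M."""
    lam, mu = np.linalg.eigvals(L), np.linalg.eigvals(M)
    return np.min(np.abs(lam[:, None] - mu[None, :]))

eps = 1e-4
L = np.array([[0.0, 1.0], [0.0, 0.0]])
M = np.array([[0.0, 1.0], [eps, 0.0]])

s = sep_F(L, M)
g = eig_gap(L, M)
print("sep_F(L, M)    =", s)
print("eigenvalue gap =", g)

# Stability of sep under small perturbations (Theorem 2.5 with Remark 2.6).
E = 1e-5 * np.array([[1.0, 0.0], [0.0, -1.0]])
F = 1e-5 * np.array([[0.0, 1.0], [1.0, 0.0]])
s_pert = sep_F(L + E, M + F)
budget = np.linalg.norm(E, 2) + np.linalg.norm(F, 2)
print("|change in sep| =", abs(s_pert - s), " allowed:", budget)
```

The Kronecker matrix has order $lm$, so this direct approach is only sensible for small blocks; it is meant as a check on the theory rather than as a practical estimator of sep.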
The representations (2.9) and (2.10) imply that

\[ \mathcal{L}(A) = \mathcal{L}(L_1 + HP) \cup \mathcal{L}(L_2 - PH). \]

By (2.8)

\[ \|PH\|, \|HP\| < \frac{2\gamma\eta}{\delta}, \tag{2.13} \]

and it follows from Theorem 2.5 that

\[ \operatorname{sep}(L_1 + HP, L_2 - PH) \ge \delta - \frac{4\gamma\eta}{\delta} > 0, \]

the last inequality following from (2.7). Since $\tilde L_1$ and $\tilde L_2$ are similar to $L_1 + HP$ and $L_2 - PH$, this implies that $\mathcal{L}(\tilde L_1) \cap \mathcal{L}(\tilde L_2) = \emptyset$, which shows that $\mathcal{R}(\tilde X_1)$ and $\mathcal{R}(\tilde Y_2)$ are simple invariant subspaces.

An important consequence of the approximation theorem is a new class of residual bounds for the eigenvalues of $A$. Specifically, we have

\[ \|(L_1 + HP) - L_1\| = \|HP\| \le \frac{2\gamma\eta}{\delta}. \]

Since $L_1$ is known and $\mathcal{L}(L_1 + HP) \subset \mathcal{L}(A)$, we may use the perturbation techniques of the last chapter to bound the distance between a subset of the eigenvalues of $A$ and the eigenvalues of $L_1$.

Comparing these bounds with the bounds from Theorem IV.1.13, we see that the approaches are different and give different results. In the case of residual bounds, we apply perturbation theory to a manufactured perturbation $A + E$ to relate the eigenvalues of $L_1$ ($M$ in Theorem IV.1.13) to those of $A$. In the approximation theorem, we apply perturbation theory to $L_1$ to relate its eigenvalues to those of $L_1 + HP$, which are a subset of the eigenvalues of $A$. Moreover the perturbations are of different sizes: $\gamma$ in the case of the residual bounds and bounded by $2\gamma\eta/\delta$ in the approximation theorem. Which is better will depend on the application.

2.2. Perturbation Theorems

The key to obtaining perturbation theorems for invariant subspaces is to combine the approximation theorem with continuity of the measure sep. Specifically, we have the following theorem, whose proof is left as an exercise.

Theorem 2.7. Let $(X_1\ Y_2)$ be unitary and suppose that $\mathcal{R}(X_1)$ is a simple invariant subspace of $A$, so that

\[ (X_1\ Y_2)^H A (X_1\ Y_2) = \begin{pmatrix} L_1 & H \\ 0 & L_2 \end{pmatrix}. \]

Given a perturbation $E$, let

\[ (X_1\ Y_2)^H E (X_1\ Y_2) = \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}. \]

Let $\|\cdot\|$ represent a consistent family of norms and set

\[ \tilde\gamma = \|E_{21}\|, \qquad \tilde\eta = \|H\| + \|E_{12}\|, \qquad \tilde\delta = \operatorname{sep}(L_1, L_2) - \|E_{11}\| - \|E_{22}\|. \]

If $\tilde\delta > 0$ and

\[ \frac{\tilde\gamma\tilde\eta}{\tilde\delta^2} < \frac14, \]

there is a unique matrix $P$ satisfying

\[ \|P\| \le \frac{2\tilde\gamma}{\tilde\delta + \sqrt{\tilde\delta^2 - 4\tilde\gamma\tilde\eta}} < \frac{2\tilde\gamma}{\tilde\delta} \]

such that the columns of

\[ \tilde X_1 = (X_1 + Y_2 P)(I + P^H P)^{-1/2} \]

and

\[ \tilde Y_2 = (Y_2 - X_1 P^H)(I + PP^H)^{-1/2} \]

form orthonormal bases for simple right and left invariant subspaces of $\tilde A = A + E$. The representation of $\tilde A$ with respect to $\tilde X_1$ is

\[ \tilde L_1 = (I + P^H P)^{1/2}[L_1 + E_{11} + (H + E_{12})P](I + P^H P)^{-1/2}, \]

and the representation of $\tilde A$ with respect to $\tilde Y_2$ is

\[ \tilde L_2 = (I + PP^H)^{-1/2}[L_2 + E_{22} - P(H + E_{12})](I + PP^H)^{1/2}. \]

The comments following Theorem 2.1 apply here. In particular the singular values of $P$ are the tangents of the canonical angles between $\mathcal{R}(X_1)$ and $\mathcal{R}(\tilde X_1)$.

The expressions for $\tilde L_1$ and $\tilde L_2$ are somewhat awkward to interpret. This is because of the way we have chosen to express $\tilde X_1$ and $\tilde Y_2$. If we choose different expressions, we will obtain different bases for the perturbed invariant subspaces and hence different representations for $\tilde A$ on those subspaces.
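The hypotheses and conclusions of Theorem 2.7 can be checked on a small example. The sketch below makes several assumptions that are not in the text: the blocks and the perturbation `E` are hypothetical, $(X_1\ Y_2)$ is the identity, the Frobenius norm is used throughout, and the matrix $P$ is computed by the simple fixed-point iteration $P_{k+1} = \tilde{\mathbf{T}}^{-1}(E_{21} - P_k(H + E_{12})P_k)$, which is one natural way to solve the quadratic equation and is not claimed to be the method the text has in mind.

```python
import numpy as np

# Hypothetical block triangular A, with (X1 Y2) equal to the identity.
L1 = np.array([[1.0, 0.5], [0.0, 2.0]])
L2 = np.array([[6.0, 1.0], [0.0, 7.0]])
H = np.array([[0.3, -0.2], [0.1, 0.4]])
l, m = 2, 2
A = np.block([[L1, H], [np.zeros((m, l)), L2]])

# Hypothetical perturbation E and the blocks of A + E.
E = 1e-2 * np.array([[1.0, -2.0, 0.5, 1.0],
                     [0.0, 1.0, 1.0, -1.0],
                     [2.0, 0.5, -1.0, 1.0],
                     [1.0, 1.0, 0.0, -2.0]])
At = A + E
B11, B12 = At[:l, :l], At[:l, l:]   # L1 + E11,  H + E12
B21, B22 = At[l:, :l], At[l:, l:]   # E21,       L2 + E22

def op_matrix(L, M):
    """Matrix of P -> P L - M P acting on vec(P) (columns stacked)."""
    return np.kron(L.T, np.eye(M.shape[0])) - np.kron(np.eye(L.shape[0]), M)

# Quantities of Theorem 2.7 in the Frobenius norm.
sep = np.linalg.svd(op_matrix(L1, L2), compute_uv=False)[-1]
gamma = np.linalg.norm(B21)
eta = np.linalg.norm(H) + np.linalg.norm(E[:l, l:])
delta = sep - np.linalg.norm(E[:l, :l]) - np.linalg.norm(E[l:, l:])
print("gamma*eta/delta^2 =", gamma * eta / delta**2, "(must be < 1/4)")

# Fixed-point iteration for P (L1+E11) - (L2+E22) P = E21 - P (H+E12) P.
Kt = op_matrix(B11, B22)
P = np.zeros((m, l))
for _ in range(50):
    rhs = B21 - P @ B12 @ P
    P = np.linalg.solve(Kt, rhs.flatten(order="F")).reshape((m, l), order="F")

# Orthonormal basis of the perturbed subspace: same column space as X1 + Y2 P.
Xt, _ = np.linalg.qr(np.vstack([np.eye(l), P]))
resid = np.linalg.norm(At @ Xt - Xt @ (Xt.T @ At @ Xt))

print("||P||_F =", np.linalg.norm(P), " bound 2*gamma/delta =", 2 * gamma / delta)
print("invariance residual =", resid)
```

The singular values of the computed `P` are the tangents of the canonical angles between the original and perturbed subspaces, so `np.linalg.svd(P, compute_uv=False)` gives the rotation of the subspace directly.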
A good choice, it turns out, is to express XI in terms of the spectral resolution of A. Specifically, let (XI X 2 t l = (11 y 2 )H and ( H ( LI 0 ) Y I Y 2 ) A(X I X 2 ) = , o L 2 (2.14) as in (1.9). If we seek XI and }";.2 in the forms XI = XI + X 2 P and Y 2 = Y 2 - ylpH, then we have the following theorem. 
238 V. INVAnIANT SUI3SPACES Theorem 2.8. Let A have the spectral resolution (2.14) and set " . II - ( Fu F12 ) (11 Y2) E('{I X 2 ) = F 21 Fn . Let II . II represent a consistent family of norms, and set .:y = IIF 21 11, 11 = IIFnll, 8 = sep(L I , L 2 ) - 11F1l1l -IIFnll. If 6 > 0 and - - 1 ,T] < _ 6 2 4' there is a unique matrix P satisying 2- - IIPII ::; _ _' < 2::r, {j + J {j2 - 4.:yil {j such that the columns of XI = XI + X 2 P and - II Y 2 = Y 2 - YIP form bases for simple right and left invar!ant subspace of A = A + E. The representation of A with respect to XI is £1 = LI + Fu + FI2Pf and the representation of A with respect to )/2 is £2 = L 2 + F 22 - PF I2 . Proof. Apply the approximation theorem to eliminate F 21 E in the matrix ( LI + Fll F I2 ) . (Y I y 2 )H(A + E)(X 1 X 2 ) = F 21 L 2 + F 22 . 2. PERTURI3ATION OF INVARIANT SUI3SPACES 239 Remark 2.9. Since XI and Y2 are the same in Theorems 2.7 and 2.8, we have E 2 ] = F 2 ]. Hence if p] denotes the matrix P produced by Theorem 2.7 and P 2 denotes the matrix P produced by Theorem 2.8, then PI(L I + Ell) - (L 2 + En)PI = E 2 ] - PI(H + E I2 )P I , and P 2 (L I + F Il ) - (L 2 + F 22 )P 2 = E2l - P 2 (F 12 )P 2 . Since these two equations differ in terms of order 0(IIEII2), we have the remarkable result: PI = P2 + 0(IIEI12). It is instructive to consider the difference between Theorem 2.7 and Theorem 2.8 from_a different point of view. There is no unique basis for the subspace Xl' However if we chose a matrix Z whose columns span a subspace that is acute to XI, the normalization H - Z Xl = I, along with R(X I ) = :f\, uniquely determines "Y"I' In Theorem 2.7 we require the normalization XII XI = 1+ 0(IIEII2), (2.15) whereas in Theorem 2.8 we require y l H XI = I + 0(IIEII2). (2.16) Which is the better theorem? It depends on the application. 
If the angles between the invariant subspaces themselves are the chief concern, it does not make a great deal of differcnce, since the matrices P produced by the two theorems are the same up to second order terms; the difference is in the way they are used to get a basis for R( X I)' On the other hand, if the representations of A on the perturbed subspaces are of interest, then Theorem 2.8 is the more natural. In the expression £1 = Ll + Fu + F 12 P, the matrix F 12 P is of order IIEII2. Since Fu = y/I EX l , we have £1 = LI + y/IEX I + 0(IIEII2). 
Since $Y_1^H X_1 = I$, this equation is completely analogous to the relation
\[ \tilde\lambda = \lambda + y^H E x + O(\|E\|^2), \]
which we derived in Theorem IV.2.3. Moreover, if we write
\[ \tilde L_1 = Y_1^H (A + E) X_1 + O(\|E\|^2) \]
and compare this expression with (IV.2.7), we see that the matrix $Y_1^H (A + E) X_1$ is a generalization of the Rayleigh quotient, which we will call the GENERALIZED RAYLEIGH QUOTIENT.

Furthermore, since $X_1^H X_1 = I$, for any unitarily invariant norm
\[ \|\tilde L_1 - L_1\| \lesssim \|Y_1\|_2 \|E\|. \]
Thus $\|Y_1\|_2$ is a condition number for $L_1$. This number is also the norm of the spectral projection associated with the invariant subspace $\mathcal R(X_1)$, which justifies the statement that the condition number of an eigenvalue is the norm of its spectral projection. Moreover, by Theorem 1.7, $\|Y_1\|_2 = \sec\theta_1$, where $\theta_1$ is the largest canonical angle between $\mathcal R(X_1)$ and $\mathcal R(Y_1)$. Hence, the condition number of $L_1$ (with respect to the normalization (2.16)) is the secant of the largest canonical angle between $\mathcal R(X_1)$ and $\mathcal R(Y_1)$.

2.3. Eigenvectors

When the concern is with eigenvectors, the preceding results simplify considerably, since the operator $T$, in an obvious specialization of the above notation, becomes the matrix $\lambda_1 I - L_2$. Hence for both theorems,
\[ P \cong (\lambda_1 I - L_2)^{-1} Y_2^H E x_1. \]
For Theorem 2.7,
\[ \tilde x_1 \cong x_1 + Y_2 (\lambda_1 I - L_2)^{-1} Y_2^H E x_1, \]
while for Theorem 2.8
\[ \tilde x_1 \cong x_1 + X_2 (\lambda_1 I - L_2)^{-1} Y_2^H E x_1. \]
The matrix
\[ (\lambda_1 I - A)^{\#} = X_2 (\lambda_1 I - L_2)^{-1} Y_2^H \]
is called the GROUP INVERSE or DRAZIN GENERALIZED INVERSE of $\lambda_1 I - A$ (see Exercises III.1.23 and III.1.24). Since for Theorem 2.8
\[ \|\tilde x_1 - x_1\| \lesssim \|(\lambda_1 I - A)^{\#}\| \, \|E\|, \]
the number $\|(\lambda_1 I - A)^{\#}\|$ is sometimes said to be a condition number for the eigenvector $x_1$; but as we have seen above this depends on the application. If the angles between the eigenvectors are the concern, then $\delta^{-1} = \|(\lambda_1 I - L_2)^{-1}\|$ is the condition number of the problem.
But if we are interested in the difference between the eigenvectors when $y_1^H \tilde x_1 = 1$, then $\|(\lambda_1 I - A)^{\#}\|$ is the condition number. Here is an example of the latter application.

Example 2.10. A square, nonnegative matrix $A$ is said to be STOCHASTIC if $A\mathbf 1 = \mathbf 1$; i.e., its rows sum to one. Clearly, $\mathbf 1$ is a right eigenvector of $A$ corresponding to the eigenvalue one. Moreover, if $A$ is irreducible (see Example 2.7 for the definition), then one is a simple eigenvalue and hence has a unique, positive left eigenvector $p$ that satisfies $p^T \mathbf 1 = 1$. With this normalization, the components of $p$ can be regarded as probabilities. Now if we perturb $A$ in such a way that $\tilde A$ remains an irreducible stochastic matrix, then we will want to keep the normalization $\tilde y^H \mathbf 1 = 1$, so that we can continue to regard the components of $\tilde y$ as probabilities. In this case the Drazin generalized inverse provides the condition number for the problem.

The two theorems give us two bounds for eigenvalues: the first a bound for the eigenvalue itself, the second for the Rayleigh quotient. For the eigenvalue we have
\[ |\tilde\lambda - \lambda| \le \left(1 + \frac{2\gamma}{\delta}\right) \|E\|. \]
For the Rayleigh quotient we have
\[ |\tilde\lambda - (\lambda + y_1^H E x_1)| \le \frac{2\tilde\gamma\tilde\eta}{\tilde\delta}. \]
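Example 2.10 can be made concrete with a two-state chain. The sketch below is illustrative only: it assumes the closed form $(I - A)^{\#} = (I - A)/(a+b)^2$, which holds for the 2 x 2 stochastic matrix with off-diagonal entries $a$ and $b$ but is not taken from the text, and it checks numerically the identity $\tilde y^T - y^T = \tilde y^T E (I - A)^{\#}$ stated in Exercise 7 of this section. The numerical values and helper names are hypothetical.

```python
# Two-state Markov chain: check  y~^T - y^T = y~^T E (I - A)^#  numerically.
# Assumption of this sketch: for A = [[1-a, a], [b, 1-b]] with 0 < a, b < 1,
# the group inverse has the closed form (I - A)^# = (I - A) / (a + b)**2.

def stochastic(a, b):
    """Row-stochastic 2x2 matrix with off-diagonal entries a and b."""
    return [[1.0 - a, a], [b, 1.0 - b]]

def stationary(a, b):
    """Left eigenvector y with y^T A = y^T, normalized so that y^T 1 = 1."""
    return [b / (a + b), a / (a + b)]

def group_inverse(a, b):
    """(I - A)^# for the two-state chain (closed form assumed above)."""
    s = (a + b) ** 2
    return [[a / s, -a / s], [-b / s, b / s]]

def vec_mat(v, m):
    """Row vector times a 2x2 matrix."""
    return [v[0] * m[0][j] + v[1] * m[1][j] for j in range(2)]

a, b = 0.3, 0.1        # original chain
at, bt = 0.32, 0.09    # perturbed chain, still stochastic and irreducible

A, At = stochastic(a, b), stochastic(at, bt)
E = [[At[i][j] - A[i][j] for j in range(2)] for i in range(2)]
y, yt = stationary(a, b), stationary(at, bt)

lhs = [yt[i] - y[i] for i in range(2)]               # change in the probabilities
rhs = vec_mat(vec_mat(yt, E), group_inverse(a, b))   # y~^T E (I - A)^#
print(lhs, rhs)
```

The two printed vectors agree to rounding error, so the size of $(I - A)^{\#}$ governs how far the stationary probabilities can move under the normalization $\tilde y^T \mathbf 1 = 1$.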
2.4. Solution of a Nonlinear Equation

In this subsection we will prove a general theorem that can be used to establish the existence of $P$ in Theorem 2.1. We state and prove it for a Banach space, which the reader may take to be $\mathbf C^{m\times n}$.

Theorem 2.11. Let $T$ be a bounded linear operator on a Banach space $\mathcal B$. Assume that $T$ has a bounded inverse, and set $\delta = \|T^{-1}\|^{-1}$. Let $\varphi : \mathcal B \to \mathcal B$ be a function that satisfies
\[ \|\varphi(x)\| \le \eta\|x\|^2 \]
and
\[ \|\varphi(x) - \varphi(y)\| \le 2\eta \max\{\|x\|, \|y\|\}\,\|x - y\| \]
for some $\eta \ge 0$. For any $g \in \mathcal B$, let $\gamma = \|g\|$. If
\[ \rho = \frac{4\gamma\eta}{\delta^2} < 1, \]
then the sequence defined by $x_0 = 0$ and
\[ x_{k+1} = T^{-1}[g + \varphi(x_k)], \qquad k = 0, 1, \ldots \qquad (2.17) \]
converges to the unique solution of
\[ Tx = g + \varphi(x) \]
that satisfies
\[ \|x\| \le \frac{2\gamma}{\delta + \sqrt{\delta^2 - 4\gamma\eta}} < \frac{2\gamma}{\delta}. \]
Moreover,
\[ \|x_k - x\| \le \frac{\rho^k}{1 - \rho}\,\frac{\gamma}{\delta}. \]

Proof. We first construct an upper bound on $\|x_k\|$. From (2.17),
\[ \|x_{k+1}\| \le \|T^{-1}\|(\|g\| + \|\varphi(x_k)\|) \le \frac{\gamma}{\delta} + \frac{\eta}{\delta}\|x_k\|^2. \]
Consequently if we set $\xi_0 = 0$ and
\[ \xi_{k+1} = \frac{\gamma}{\delta} + \frac{\eta}{\delta}\xi_k^2, \qquad k = 0, 1, \ldots, \]
then $\|x_k\| \le \xi_k$.

Now the sequence $\xi_0, \xi_1, \ldots$ is clearly increasing. Moreover, if $\rho < 1$, the function
\[ \phi(\xi) = \frac{\gamma}{\delta} + \frac{\eta}{\delta}\xi^2 \]
has a smallest fixed point
\[ \xi_* = \frac{2\gamma}{\delta + \sqrt{\delta^2 - 4\gamma\eta}}. \]
If $\xi_k \le \xi_*$, then
\[ \xi_{k+1} = \phi(\xi_k) \le \phi(\xi_*) = \xi_*. \]
Hence all the $\xi_k$ are bounded by $\xi_*$, and the sequence $\{\xi_k\}$ must converge to $\xi_*$. Thus
\[ \|x_k\| \le \xi_* < \frac{2\gamma}{\delta}. \]

We next show that the sequence $\{x_k\}$ converges. We have
\[ \|x_{k+1} - x_k\| \le \|T^{-1}\|\,\|\varphi(x_k) - \varphi(x_{k-1})\| \le 2\delta^{-1}\eta \max\{\|x_k\|, \|x_{k-1}\|\}\,\|x_k - x_{k-1}\| \le \rho\|x_k - x_{k-1}\|. \]
Hence
\[ \|x_{k+1} - x_k\| \le \rho^k\|x_1 - x_0\| \le \rho^k\frac{\gamma}{\delta}. \]
It follows that $\{x_k\}$ is a Cauchy sequence and must have a limit $x$. Moreover,
\[ \|x - x_k\| \le \sum_{i=k}^{\infty} \|x_{i+1} - x_i\| \le \sum_{i=k}^{\infty} \rho^i\frac{\gamma}{\delta} = \frac{\rho^k}{1 - \rho}\,\frac{\gamma}{\delta}. \]
∎
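The iteration (2.17) is easy to watch in the simplest Banach space, the real line. The following minimal sketch assumes the scalar instance $Tx = \delta x$, $\varphi(x) = \eta x^2$, $g = \gamma$ (so both hypotheses on $\varphi$ hold with absolute values as norms); the numerical values are arbitrary choices for illustration, not taken from the text. It compares each iterate with the smallest root $2\gamma/(\delta + \sqrt{\delta^2 - 4\gamma\eta})$ and with the error bound of Theorem 2.11.

```python
import math

def iterate(delta, gamma, eta, steps):
    """Scalar instance of (2.17): T x = delta*x, phi(x) = eta*x**2, g = gamma."""
    xs = [0.0]                                   # x_0 = 0
    for _ in range(steps):
        xs.append((gamma + eta * xs[-1] ** 2) / delta)
    return xs

delta, gamma, eta = 1.0, 0.2, 1.0                # hypothetical data
rho = 4.0 * gamma * eta / delta ** 2             # the theorem requires rho < 1
x_star = 2.0 * gamma / (delta + math.sqrt(delta ** 2 - 4.0 * gamma * eta))

xs = iterate(delta, gamma, eta, 30)
for k, xk in enumerate(xs):
    # Error bound of Theorem 2.11: rho^k / (1 - rho) * gamma / delta.
    bound = rho ** k / (1.0 - rho) * gamma / delta
    assert abs(xk - x_star) <= bound

print(rho, x_star, xs[-1])
```

In this scalar case the equation $\delta x = \gamma + \eta x^2$ has two roots; the iteration started from zero picks out the smaller one, which is the solution singled out by the norm bound in the theorem.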
Notes and References

Although various results for the perturbation of eigenvectors are found scattered in the literature [274, 69, 269], the modern approach via invariant subspaces crystallized in the sixties and early seventies. For Hermitian matrices, a little note by Swanson [232, 1961] contains in embryonic form much of what was to follow. The first clear statement that the problem really concerned invariant subspaces is due to Davis [50, 51, 1963, 1965]. The importance of Sylvester's equation emerged in the famous paper of Davis and Kahan [53, 1970], whose content more than justifies its impenetrability.

For nonnormal matrices the problem is complicated by two facts. First, there is no orthonormal system of eigenvectors. Second, the differences among the eigenvalues of a nonnormal matrix are not Lipschitz continuous. In his thesis, Varah [250, 1967] (see also [251]) ameliorated the first difficulty by working with spectral resolutions, whose transformations are in general better conditioned than the matrix of eigenvectors; however, his bounds have the distance between the spectra raised to a power in the denominator. Ruhe [190, 1970] proposed replacing this difference with the smallest singular value of a power of $A - \lambda I$ for suitable $\lambda$. The use of an orthogonal reduction to block triangular form to circumvent the first difficulty and the introduction of the function sep to circumvent the second is due to Stewart [200, 1971], who proved his theorems for closed operators in a Hilbert space. The exposition in this chapter is based on a later survey paper [202, 1973]. Lower bounds on the function sep have been given by Sun [226, 1984]. Yamamoto [275, 1980] exploits a different nonlinear equation and eigenpair to get component-wise bounds (Exercise 2.6).
The fact that the generalized Rayleigh quotient $Y_1^H \tilde A X_1$ provides a first-order approximation to the representation $\tilde L_1$ appears to have first been noted by Stewart [202, 1973], although it is readily derivable from standard perturbation expansions, such as are found in Kato [135, Ch. II]. The observation that a change in normalization [e.g., from (2.15) to (2.16)] leaves the multiplier $P$ essentially unaffected may be found in [210]. For eigenvectors, nonlinear normalizations are common; for example, one frequently requires that $\tilde x^H \tilde x = 1$. This complicates the asymptotic theory for complex matrices, since the normalization may not be analytic. Meyer and Stewart [155, 1988] have treated this problem in detail.

Owing to their structure, the perturbation theory for stochastic matrices can be developed independently of the theory here [194, 106, 47, 155] (see Exercise 2.7).

Exercises

THE FOLLOWING EXERCISES DEVELOP SOME OF THE ELEMENTARY PROPERTIES OF THE FUNCTION sep, WHICH WE SUPPOSE TO BE DEFINED WITH RESPECT TO A CONSISTENT FAMILY OF NORMS $\|\cdot\|$.

1. Show that if $X$ and $Y$ are nonsingular, then
\[ \frac{\mathrm{sep}(A, B)}{\kappa(X)\kappa(Y)} \le \mathrm{sep}(XAX^{-1}, YBY^{-1}) \le \kappa(X)\kappa(Y)\,\mathrm{sep}(A, B). \]

2. Show that if $X$ and $Y$ are unitary and $\|\cdot\|$ is unitarily invariant, then
\[ \mathrm{sep}(X^H A X, Y^H B Y) = \mathrm{sep}(A, B). \]

3. Show that if $A = \mathrm{diag}(A_1, \ldots, A_k)$ and $B = \mathrm{diag}(B_1, \ldots, B_l)$, then
\[ \mathrm{sep}_F(A, B) = \min\{\mathrm{sep}_F(A_i, B_j) : i = 1, \ldots, k;\ j = 1, \ldots, l\}. \]

4. Show that if $A$ and $B$ are diagonalizable, then
\[ \mathrm{sep}_F(A, B) \ge \frac{\min|\mathcal L(A) - \mathcal L(B)|}{\kappa_2(X)\kappa_2(Y)}, \]
where $X$ and $Y$ are matrices of the eigenvectors of $A$ and $B$.

5. Let $\|\cdot\|_r$ and $\|\cdot\|_s$ be consistent norms satisfying
\[ \sigma\|P\|_r \le \|P\|_s \le \tau\|P\|_r \]
for all $P \in \mathbf C^{n\times n}$. Show that
\[ \mathrm{sep}_r(A, B) \ge \frac{\sigma}{\tau}\,\mathrm{sep}_s(A, B). \]
Use this fact to bound $\mathrm{sep}_2$ in terms of $\mathrm{sep}_F$.
6. (Yamamoto [275]). Let $(\tilde x, \tilde\lambda)$ be a simple eigenpair of $A$, with $\|\tilde x\|_2 = 1$. Let $(x, \lambda) = (\tilde x - h, \tilde\lambda - \mu)$ be an approximate eigenpair. Set $\epsilon = \max\{\|h\|, |\mu|\}$. Show that if $\epsilon$ is sufficiently small then the matrix
\[ \begin{pmatrix} A - \lambda I & -x \\ x^H & 0 \end{pmatrix} \]
is nonsingular and
\[ \begin{pmatrix} A - \lambda I & -x \\ x^H & 0 \end{pmatrix} \begin{pmatrix} h \\ \mu \end{pmatrix} = \begin{pmatrix} -(Ax - \lambda x) \\ \tfrac12(1 - \|x\|_2^2) \end{pmatrix} + O(\epsilon^2). \]
Analyze this equation to obtain an approximation theorem for eigenvectors.

7. Let $A$ and $\tilde A$ be stochastic matrices, each having one as a simple eigenvalue. Let $y^T$ and $\tilde y^T$ be the corresponding left eigenvectors, normalized so that $y^T\mathbf 1 = \tilde y^T\mathbf 1 = 1$. Show that
\[ \tilde y^T - y^T = \tilde y^T E (I - A)^{\#}. \]

3. Hermitian Matrices

We now turn to the perturbation of invariant subspaces of Hermitian matrices. In the next two subsections, we will apply the theory of the last section to Hermitian matrices: first the approximation theorem and then the perturbation theorem. In the third subsection we will develop part of the elegant Davis-Kahan theory of invariant subspaces. Finally, we will develop two residual bounds that in some cases are improvements on the bound of Theorem IV.4.14.

Throughout this section $A$ will be a Hermitian matrix of order $n$, as will the error matrix $E$.

3.1. The Approximation Theorem

When $A$ is Hermitian, several things simplify in the approximation theorem. In the first place, $H = G^H$. It follows that any unitary similarity that reduces $G$ to zero also reduces $H$ to zero, and hence that $\mathcal R(\tilde Y_2)$ is the invariant subspace complementary to $\mathcal R(\tilde X_1)$. If $\|\cdot\|$ is unitarily invariant, then
\[ \gamma \equiv \|G\| = \|G^H\| = \|H\| \equiv \eta, \]
and the condition $\gamma\eta/\delta^2 < 1/4$ becomes
\[ \frac{\gamma}{\delta} < \frac12. \]

The most striking simplification occurs when we take $\|\cdot\|$ to be the Frobenius norm.

Theorem 3.1. If $L$ and $M$ are Hermitian, then
\[ \mathrm{sep}_F(L, M) = \min|\mathcal L(L) - \mathcal L(M)|. \]

Proof. As usual, let $T = P \mapsto PL - MP$.
For any matrix $P = (p_1\ p_2\ \cdots\ p_l)$ with $m$ rows, let
\[ \mathrm{vec}(P) = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_l \end{pmatrix}. \]
Then $\mathrm{vec}(PL) = \mathbf L\,\mathrm{vec}(P)$, where
\[ \mathbf L = \begin{pmatrix} \lambda_{11}I_m & \lambda_{21}I_m & \cdots & \lambda_{l1}I_m \\ \lambda_{12}I_m & \lambda_{22}I_m & \cdots & \lambda_{l2}I_m \\ \vdots & \vdots & & \vdots \\ \lambda_{1l}I_m & \lambda_{2l}I_m & \cdots & \lambda_{ll}I_m \end{pmatrix}. \]
Similarly, $\mathrm{vec}(MP) = \mathbf M\,\mathrm{vec}(P)$, where
\[ \mathbf M = \mathrm{diag}(M, M, \ldots, M). \]
Hence
\[ \mathrm{vec}[T(P)] = (\mathbf L - \mathbf M)\,\mathrm{vec}(P). \]
Since $L$ and $M$ are Hermitian, the linear operator $T$ is Hermitian. Since $\|P\|_F = \|\mathrm{vec}(P)\|_2$,
\[ \mathrm{sep}_F(L, M) = \inf_{\|P\|_F = 1} \|T(P)\|_F = \min|\mathcal L(T)| = \min|\mathcal L(L) - \mathcal L(M)|, \]
the last equality following from Corollary 1.4. ∎

Thus for Hermitian $A$, the number $\delta_F$ in the approximation theorem truly measures the distance between the eigenvalues of $L_1$ and those of $L_2$.

Since $L_1$ and $\tilde L_1 = (I + P^HP)^{1/2}(L_1 + HP)(I + P^HP)^{-1/2}$ are Hermitian, we may use Theorem IV.5.5 to bound the eigenvalues of $\tilde L_1$.

Theorem 3.2. In the notation of the approximation theorem, let $\|\cdot\|$ be the Frobenius norm, so that $\delta = \min|\mathcal L(L_1) - \mathcal L(L_2)|$. Let the eigenvalues of $L_1$ be $\lambda_1 \ge \cdots \ge \lambda_k$ and those of $\tilde L_1$ be $\tilde\lambda_1 \ge \cdots \ge \tilde\lambda_k$. Then
\[ \sqrt{\sum_{i=1}^{k} (\tilde\lambda_i - \lambda_i)^2} \le 2\left(1 + \frac{2\gamma^2}{\delta^2}\right)\frac{\gamma^2}{\delta}. \qquad (3.1) \]

Proof. By Theorem IV.5.5 we have
\[ \sqrt{\sum_{i=1}^{k} (\tilde\lambda_i - \lambda_i)^2} \le \kappa_2[(I + P^HP)^{1/2}]\,\|H\|_F\|P\|_F. \qquad (3.2) \]
Now $\|H\|_F\|P\|_F \le 2\gamma^2/\delta$. Moreover, $\|(I + P^HP)^{-1/2}\|_2 \le 1$. Finally,
\[ \|(I + P^HP)^{1/2}\|_2 \le 1 + \tfrac12\|P\|_2^2 \le 1 + \tfrac12\|P\|_F^2 \le 1 + \frac{2\gamma^2}{\delta^2}. \]
Combining these inequalities yields (3.1). ∎

Note that as $\gamma \to 0$, the constants 2 in the bound (3.1) can be replaced by functions that approach 1 [cf. (2.8)]. Consequently, the right-hand side of (3.1) is bounded by a quantity that is asymptotic to $\gamma^2/\delta$.

3.2. Generalized Rayleigh Quotients

So far as invariant subspaces are concerned, the comments of the last subsection apply to the perturbation theorems. However, the representations of $A$ on the perturbed subspaces provide new perturbation theorems for eigenvalues. Since $A$ is Hermitian, it is most natural to work with Theorem 2.8, taking $X_i = Y_i$ $(i = 1, 2)$. We will also take $\|\cdot\|$ to be the Frobenius norm, so that $\tilde\gamma = \tilde\eta$ and $\delta_F$ is the distance between the spectra of $L_1$ and $L_2$. The proof of the following theorem is similar to that of Theorem 3.2 and is left as an exercise.

Theorem 3.3. Let the Hermitian matrix $A$ have the spectral resolution
\[ (X_1\ X_2)^H A (X_1\ X_2) = \begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix}, \]
where $L_1$ is $k \times k$, and set
\[ (Y_1\ Y_2)^H E (X_1\ X_2) = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}. \]
Let $\tilde\gamma = \|F_{21}\|_F$ and $\tilde\delta = \mathrm{sep}_F(L_1, L_2) - \|F_{11}\|_2 - \|F_{22}\|_2$. Let $\lambda_1 \ge \cdots \ge \lambda_k$ be the eigenvalues of $L_1$ and $\hat\lambda_1 \ge \cdots \ge \hat\lambda_k$ be the eigenvalues of the generalized Rayleigh quotient $\hat L_1 = X_1^H(A + E)X_1$. Then if $\tilde\delta > 0$ and $\tilde\gamma/\tilde\delta < 1/2$, there are eigenvalues $\tilde\lambda_{j_1}, \ldots, \tilde\lambda_{j_k}$ of $\tilde A$ satisfying
\[ \sqrt{\sum_{i=1}^{k} (\tilde\lambda_{j_i} - \hat\lambda_i)^2} \le 2\left(1 + \frac{2\tilde\gamma^2}{\tilde\delta^2}\right)\frac{\tilde\gamma^2}{\tilde\delta}. \]

This theorem shows that the eigenvalues of the Rayleigh quotient are, up to terms in $\|E\|^2$, eigenvalues of $\tilde A$.

3.3. Direct Bounds

A great deal of the complexity of the preceding theory is due to the necessity of establishing the existence of the perturbed invariant subspace. For Hermitian matrices the existence is often obvious. For example, in the usual notation, suppose that $\min|\mathcal L(L_1) - \mathcal L(L_2)| = \delta$ and $\|E\|_2 < \delta/2$. Then by Corollary IV.4.6, $\min|\mathcal L(\tilde L_1) - \mathcal L(\tilde L_2)| > 0$, and
there are unique complementary invariant subspaces associated with $\tilde L_1$ and $\tilde L_2$. Thus we may assume the existence of the perturbed invariant subspace and proceed directly to bounds on the canonical angles between the original and its perturbation. This general approach is due to Davis and Kahan.

The first "sin Θ" theorem is so called because it bounds the sum of squares of the sines of the canonical angles between an invariant subspace of $A$ and an approximation.

Theorem 3.4. Let $A$ have the spectral resolution
\[ \begin{pmatrix} X_1^H \\ X_2^H \end{pmatrix} A (X_1\ X_2) = \mathrm{diag}(L_1, L_2), \]
where $(X_1\ X_2)$ is unitary with $X_1 \in \mathbf C^{n\times k}$. Let $Z \in \mathbf C^{n\times k}$ have orthonormal columns, and for any Hermitian $M$ of order $k$, let
\[ R = AZ - ZM. \]
If
\[ \delta = \min|\mathcal L(L_2) - \mathcal L(M)| > 0, \qquad (3.3) \]
then
\[ \|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|_F \le \frac{\|R\|_F}{\delta}. \]

Proof. From the definition of $R$ and the fact that $X_2^H A = L_2 X_2^H$, we have
\[ X_2^H R = L_2 X_2^H Z - X_2^H Z M. \qquad (3.4) \]
By Theorem 3.1,
\[ \|X_2^H Z\|_F \le \frac{\|R\|_F}{\delta}. \]
By Theorem I.5.5, $\|X_2^H Z\|_F = \|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|_F$. ∎

Thus the theorem bounds the error in the approximate subspace in terms of the residual $R$. Since $Z$ and $M$ are arbitrary, we may use it to assess the accuracy of the vector from any approximate eigenpair $(z, \mu)$, provided we can find a lower bound on the distance from $\mu$ to $n - 1$ eigenvalues of $A$.

As $k$ becomes large, the conclusion of the theorem becomes less and less satisfactory because $\|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|_F$ can be large even when the individual canonical angles are small. What we need in this case is a bound on $\|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|_2$, which we can obtain if we are willing to place further restrictions on the spectra of $L_2$ and $M$. We begin with a lemma.

Lemma 3.5. Let $\|\cdot\|$ be a consistent norm. Let $A$ and $B$ be square with $\|A\| \le \alpha$ and $\|B^{-1}\|^{-1} \ge \alpha + \delta$, where $\delta > 0$. If
\[ AX - XB = C, \]
then
\[ \|X\| \le \frac{\|C\|}{\delta}. \]

Proof. By consistency $\|AX\| \le \alpha\|X\|$ and $\|XB\| \ge (\alpha + \delta)\|X\|$.
Hence
\[ \|C\| \ge \|XB\| - \|AX\| \ge (\alpha + \delta)\|X\| - \alpha\|X\| = \delta\|X\|. \]
∎

We are now in position to establish a second sin Θ theorem.

Theorem 3.6. In the notation of Theorem 3.4, suppose that
\[ \mathcal L(M) \subset [\alpha, \beta] \qquad (3.5) \]
and that for some $\delta > 0$,
\[ \mathcal L(L_2) \subset \mathbf R \setminus [\alpha - \delta, \beta + \delta]. \qquad (3.6) \]
Then for any unitarily invariant norm
\[ \|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\| \le \frac{\|R\|}{\delta}. \]

Remark 3.7. The matrices $L_2$ and $M$ may be switched in (3.5) and (3.6).

Proof. By translating the spectra of $A$ and $M$, we may assume without loss of generality that $\alpha = -\beta$. The result now follows on applying Lemma 3.5 to (3.4). ∎

In some applications, the columns of the matrix $Z$ may not be orthonormal. The following theorem shows that with an appropriate correction factor, the above bounds continue to hold.
Theorem 3.8. In Theorems 3.4 and 3.6, let $\inf_2(Z)$ (i.e., the smallest singular value of $Z$) be positive. Then
\[ \|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\| \le \frac{\|R\|}{\delta\,\inf_2(Z)}. \]

Proof. Let $Z = QS$ be the QR factorization of $Z$, with $S$ the triangular factor. Then the proofs of the sin Θ theorems show that
\[ \|X_2^H Z\| = \|X_2^H Q S\| \le \frac{\|R\|}{\delta}. \]
But
\[ \|X_2^H Q S\| \ge \inf{}_2(Z)\,\|X_2^H Q\| = \inf{}_2(Z)\,\|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|. \]
∎

We conclude this subsection with a bound on the tangents of the canonical angles. Here we must restrict $M$ to be equal to $Z^H A Z$ and impose further restrictions on the disposition of the spectrum. The idea of the proof is to prove the theorem for the norms $\|\cdot\|_{\Phi_j}$ associated with Fan's symmetric gauge functions
\[ \Phi_j(x) = \max_{1 \le i_1 < \cdots < i_j \le n} \{|\xi_{i_1}| + \cdots + |\xi_{i_j}|\}. \]
We begin with a lemma.

Lemma 3.9. Let $R$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$. If $R_j$ is any leading principal submatrix of $R$, then
\[ \mathrm{trace}(R_j) \le \sum_{i=1}^{j} \sigma_i. \]

Proof. By Theorem I.4.4 the sum of the singular values of $R_j$ is less than or equal to $\sum_{i=1}^{j} \sigma_i$. By Lemma II.3.4 the trace of $R_j$ is less than or equal to the sum of its singular values. ∎

We may now prove the tan Θ theorem.

Theorem 3.10. In the notation of the sin Θ theorems, let $M = Z^H A Z$, and assume that
\[ \mathcal L(M) \subset [\alpha, \beta] \]
while for some $\delta > 0$
\[ \mathcal L(L_2) \subset (-\infty, \alpha - \delta] \quad (\text{or } \mathcal L(L_2) \subset [\beta + \delta, \infty)). \]
Then
\[ \|\tan\Theta[\mathcal R(X_1), \mathcal R(Z)]\| \le \frac{\|R\|}{\delta}. \]

Proof. Some preliminary transformations will make the proof easier. First, we may assume without loss of generality that $L_1$ and $L_2$ are of the same order. For if the order of $L_1$ is less than that of $L_2$, we may augment the reduced form of $A$ to $\mathrm{diag}(L_1, \nu I, L_2)$, where $\nu \in [\alpha, \beta]$. This will make no difference in the final bounds.

Next, by passing to canonical bases, we may assume that
\[ Z = \begin{pmatrix} I \\ 0 \end{pmatrix} \quad\text{and}\quad X = (X_1\ X_2) = \begin{pmatrix} \Gamma & -\Sigma \\ \Sigma & \Gamma \end{pmatrix}, \]
where $\Gamma = \mathrm{diag}(\cos\theta_i)$ and $\Sigma = \mathrm{diag}(\sin\theta_i)$ consist of the cosines (in ascending order) and the sines (in descending order) of the canonical angles between $\mathcal R(Z)$ and $\mathcal R(X_1)$.
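In this coordinate system partition $A$ in the form
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}. \]
Since $M = A_{11}$, we have
\[ R = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} I \\ 0 \end{pmatrix} - \begin{pmatrix} I \\ 0 \end{pmatrix} A_{11} = \begin{pmatrix} 0 \\ A_{21} \end{pmatrix}. \qquad (3.7) \]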
Note that by the simplification of the preceding paragraph all the above submatrices are square and of the same order.

Since $(-\Sigma\ \ \Gamma)A = L_2(-\Sigma\ \ \Gamma)$, on multiplying (3.7) by $(-\Sigma\ \ \Gamma)$ we have
\[ \Gamma A_{21} = \Sigma A_{11} - L_2\Sigma. \]
The $i$th diagonal element of this relation is
\[ \cos\theta_i\,\alpha_{ii}^{(21)} = \sin\theta_i\,(\alpha_{ii}^{(11)} - \lambda_{ii}^{(22)}) \ge \delta\sin\theta_i, \]
the last inequality following from the fact that $\alpha_{ii}^{(11)} \in [\alpha, \beta]$ and $\lambda_{ii}^{(22)} \in (-\infty, \alpha - \delta]$. Since $\delta > 0$, the cosine of $\theta_i$ cannot be zero. Hence
\[ \alpha_{ii}^{(21)} \ge \delta\tan\theta_i. \]
It follows from Lemma 3.9 that
\[ \|R\|_{\Phi_j} \ge \sum_{i=1}^{j} \alpha_{ii}^{(21)} \ge \delta\sum_{i=1}^{j} \tan\theta_i = \delta\,\|\tan\Theta\|_{\Phi_j}, \]
where $\Theta = \mathrm{diag}(\theta_i)$. The theorem now follows from Fan's theorem (Theorem II.3.17). ∎

Although the hypothesis on the situation of the spectra of $M$ and $L_2$ may appear unnecessarily restrictive, it is necessary (Exercise 3.3). However, it answers to the frequently occurring case where $Z$ is an approximation to an invariant subspace corresponding to the largest (or smallest) eigenvalues of $A$.

3.4. Residual Bounds for Eigenvalues

In Example IV.4.17 we saw that the residual bound for eigenvalues provided by Theorem IV.4.14 could be off by orders of magnitude. The problem is that the bound does not take into account the situation of the spectrum. In this subsection we will apply perturbation theory for invariant subspaces to derive new bounds that do just that.

Throughout this subsection $X$ will be an $n \times k$ matrix with orthonormal columns. We will set
\[ M = X^H A X \]
and
\[ R = AX - XM. \]
We wish to show that there are $k$ eigenvalues of $A$ near the eigenvalues $\mu_1 \ge \cdots \ge \mu_k$ of $M$, and further show that the difference is proportional to $\|R\|^2$.

The basic idea is simple. We use one of the direct bounds from the last section to give us a matrix $\hat X = (X + YP)(I + P^HP)^{-1/2}$ whose columns span an invariant subspace. We then know from the approximation theorem that $\hat M = \hat X^H A \hat X$ is similar to $M + G^H P$.
But $G$ and $P$ are both of order $\|R\|$, so that an application of perturbation theory for eigenvalues will give a bound of order $\|R\|^2$.

Since the direct bounds do not give us explicit relations between the subspaces, we must begin with a lemma that allows us to deduce the existence of $P$. Its proof, which uses the canonical bases of Theorem I.5.2, is left as an exercise.

Lemma 3.11. Let $(X\ Y)$ be unitary. If $\mathcal R(X)$ and $\hat{\mathcal X}$ are acute, there is a matrix $P$ such that
\[ \mathcal R(X + YP) = \hat{\mathcal X}. \]
The singular values of $P(I + P^HP)^{-1/2}$ are the sines of the canonical angles between $\mathcal R(X)$ and $\hat{\mathcal X}$.

The first residual bound is based on the second sin Θ theorem.

Theorem 3.12. Suppose that there is a number $\delta > 0$ such that exactly $n - k$ of the eigenvalues of $A$ lie outside the interval
\[ [\mu_k - \delta, \mu_1 + \delta] \qquad (3.8) \]
and
\[ \rho \equiv \frac{\|R\|_2}{\delta} < 1. \]
Then there is an index $j$ such that $\lambda_j, \ldots, \lambda_{j+k-1} \in (\mu_k - \delta, \mu_1 + \delta)$ and
\[ |\mu_i - \lambda_{j+i-1}| \le \frac{1}{1 - \rho^2}\,\frac{\|R\|_2^2}{\delta}, \qquad i = 1, \ldots, k. \qquad (3.9) \]

Proof. Let $(X\ Y)$ be unitary. Then
\[ \begin{pmatrix} X^H \\ Y^H \end{pmatrix} A (X\ Y) = \begin{pmatrix} M & G^H \\ G & N \end{pmatrix}, \]
where $\|G\|_2 = \|R\|_2$. By Theorem 3.6 and Lemma 3.11, there is a matrix $P$ satisfying
\[ \|P(I + P^HP)^{-1/2}\|_2 \le \rho \qquad (3.10) \]
such that the columns of
\[ \hat X = (X + YP)(I + P^HP)^{-1/2} \]
span an invariant subspace of $A$. From (3.10) it follows that
\[ \frac{\|P\|_2}{\sqrt{1 + \|P\|_2^2}} \le \rho, \]
and since $\rho < 1$,
\[ \|P\|_2 \le \frac{\rho}{\sqrt{1 - \rho^2}}. \qquad (3.11) \]
Let $\hat Y = (Y - XP^H)(I + PP^H)^{-1/2}$. Then $(\hat X\ \hat Y)$ is unitary. Since the columns of $\hat X$ span an invariant subspace of $A$, we have $\hat Y^H A \hat X = 0$. Hence
\[ \begin{pmatrix} \hat X^H \\ \hat Y^H \end{pmatrix} A (\hat X\ \hat Y) = \begin{pmatrix} \hat M & 0 \\ 0 & \hat N \end{pmatrix}. \]
As in the proof of Theorem 2.1, it can be shown that
\[ \hat M = (I + P^HP)^{1/2}(M + G^HP)(I + P^HP)^{-1/2}. \]
The eigenvalues of $\hat M$ are eigenvalues of $A$. Since $\rho < 1$ it follows from the residual bound of Chapter IV (Theorem IV.4.14) that they lie in the interval $(\mu_k - \delta, \mu_1 + \delta)$, and hence are $\lambda_j, \ldots, \lambda_{j+k-1}$ for some index $j$. By the similarity bound of Theorem IV.5.3,
\[ |\mu_i - \lambda_{j+i-1}| \le \|(I + P^HP)^{1/2}\|_2\,\|(I + P^HP)^{-1/2}\|_2\,\|G\|_2\|P\|_2, \qquad i = 1, \ldots, k. \]
The theorem now follows on noting that $\|(I + P^HP)^{-1/2}\|_2 \le 1$ and inserting the bound (3.11) for $\|P\|_2$. ∎

In the bound (3.9) the factor $(1 - \rho^2)^{-1}$ is insignificant when $\rho$ is even a little less than one. The factor $\|R\|_2^2/\delta$ is quadratic in $\|R\|_2$; however, as $\delta$ decreases the bound deteriorates. For $\rho < \frac12$ the bound is less than the bound $\|R\|_2$ provided by Theorem IV.4.14. Moreover, the bound is asymptotically sharp, as the matrix $A$ of Example IV.4.17 shows.

The requirement (3.8) unfortunately does not allow the eigenvalues of $M$ to be scattered through the spectrum of $A$. However, if we pass to the Frobenius norm, then we can obtain a Hoffman-Wielandt type residual bound.
Specifically, if there is a set $\mathcal L_2$ consisting of $n - k$ eigenvalues of $A$ (counting multiplicities) such that
\[ \delta = \min|\mathcal L(M) - \mathcal L_2| > 0, \qquad (3.12) \]
then Theorem 3.4 shows that there is a matrix $P$ satisfying
\[ \|P(I + P^HP)^{-1/2}\|_2 \le \|P(I + P^HP)^{-1/2}\|_F \le \frac{\|R\|_F}{\delta} \]
such that the columns of $\hat X = (X + YP)(I + P^HP)^{-1/2}$ span an invariant subspace of $A$. By the similarity bound of Theorem IV.5.5, the eigenvalues $\lambda_{j_1}, \ldots, \lambda_{j_k}$ of $\hat M$ may be ordered so that
\[ \sqrt{\sum_{i=1}^{k} (\mu_i - \lambda_{j_i})^2} \le \|(I + P^HP)^{1/2}\|_2\,\|(I + P^HP)^{-1/2}\|_2\,\|G\|_F\|P\|_2. \]
Hence we have the following theorem.

Theorem 3.13. With the above definitions, assume that $A$ and $M$ satisfy (3.12). If
\[ \rho_F \equiv \frac{\|R\|_F}{\delta} < 1, \]
then there are eigenvalues $\lambda_{j_1}, \ldots, \lambda_{j_k}$ of $A$ such that
\[ \sqrt{\sum_{i=1}^{k} (\mu_i - \lambda_{j_i})^2} \le \frac{1}{1 - \rho_F^2}\,\frac{\|R\|_F^2}{\delta}. \]
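For $k = 1$ the direct bound of Theorem 3.4 and the residual bound of Theorem 3.12 can be checked by hand on a diagonal matrix, where the exact eigenvector is a coordinate vector. The sketch below is only an illustration under that assumption: the function name, the test matrix, the trial vector, and the factor 0.99 used to keep the remaining eigenvalues strictly outside the interval (3.8) are all hypothetical choices.

```python
import math

def bounds_k1(eigs, z):
    """A = diag(eigs), exact eigenvector e_1 for eigs[0]; z approximates e_1.

    Returns (sin_theta, sin_bound, eig_error, eig_bound) for k = 1, with
    mu the Rayleigh quotient z^T A z and residual r = A z - mu z.
    """
    nz = math.sqrt(sum(c * c for c in z))
    z = [c / nz for c in z]                         # orthonormal "column"
    mu = sum(lam * c * c for lam, c in zip(eigs, z))
    r_norm = math.sqrt(sum(((lam - mu) * c) ** 2 for lam, c in zip(eigs, z)))

    # Theorem 3.4: delta is the gap between mu and the spectrum of L_2.
    gap = min(abs(lam - mu) for lam in eigs[1:])
    sin_theta = math.sqrt(sum(c * c for c in z[1:]))
    sin_bound = r_norm / gap

    # Theorem 3.12: shrink delta slightly so the other n - 1 eigenvalues
    # lie strictly outside [mu - delta, mu + delta].
    delta = 0.99 * gap
    rho = r_norm / delta
    if rho >= 1.0:
        raise ValueError("residual too large for the bound to apply")
    eig_bound = r_norm ** 2 / ((1.0 - rho ** 2) * delta)
    return sin_theta, sin_bound, abs(mu - eigs[0]), eig_bound

s, sb, e, eb = bounds_k1([1.0, 3.0, 4.0], [1.0, 0.1, -0.05])
print(s, sb, e, eb)
```

In this example the eigenvalue error is roughly the square of the angle error, which is the quadratic behavior in $\|R\|$ that the residual bounds are designed to capture.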
Notes and References

The observation that for Hermitian matrices the function sep in the Frobenius norm reduces to the minimum difference between the eigenvalues was made by Stewart [200, 1971]. The knowledgeable reader will have noted that we have surreptitiously introduced the Kronecker or tensor product in the proof of Theorem 3.1.

The sin Θ and tan Θ theorems are due to Davis and Kahan [53, 1970]. Earlier Davis [50, 51, 1963, 1965] established bounds on sin 2Θ and tan 2Θ, which are also presented in this ground-breaking paper, along with much, much more, including Theorem 3.4. It should be noted that Davis and Kahan work with bounded operators in a Hilbert space, and some of their results extend to unbounded operators.

The residual bounds for eigenvalues are due to Stewart [215, 1989].

Exercises

THE FOLLOWING TWO EXERCISES SPECIALIZE THE RESULTS OF THIS SECTION TO EIGENVALUES AND EIGENVECTORS. NOTE THAT THE EIGENVALUE BOUNDS ARE A LITTLE SHARPER THAN THE ONES ONE WOULD GET FROM THE THEOREMS IN THE TEXT.

1. Let $(z, \mu)$ be an approximate eigenpair of $A$ with $\|z\|_2 = 1$. Let $r = Az - \mu z$. Suppose that there is a set $\mathcal L$ of $n - 1$ eigenvalues of $A$ such that
\[ \delta = \min|\mathcal L - \{\mu\}| > 0. \]
Show that there is an eigenpair $(x, \lambda)$ of $A$ satisfying
\[ \sin\angle(x, z) \le \frac{\|r\|_2}{\delta} \]
and
\[ |\mu - \lambda| \le \min\left\{\|r\|_2, \frac{\|r\|_2^2}{\delta}\right\}. \]

2. Let $(x, \lambda)$ be a simple eigenpair of $A$ with $\|x\|_2 = 1$. Let $\tilde A = A + E$ and set $\epsilon = \|E\|_2$. Let $\delta$ be the distance from $\lambda$ to $\mathcal L(A) \setminus \{\lambda\}$. Show that if $\epsilon < \delta$ and
\[ \frac{\epsilon}{\delta - \epsilon} < \frac12, \]
then there is an eigenpair $(\tilde x, \tilde\lambda)$ of $\tilde A$ satisfying
\[ \tan\angle(x, \tilde x) < \frac{2\epsilon}{\delta - \epsilon} \]
and
\[ |\tilde\lambda - x^H \tilde A x| < \frac{2\epsilon^2}{\delta - \epsilon}. \]

3. (Davis and Kahan [53]). By considering the matrices
\[ A = (\;\cdots\;) \quad\text{and}\quad \tilde A = (\;\cdots\;), \]
show that the hypotheses on the situation of the eigenvalues in the tan Θ theorem are necessary.

4. The Singular Value Decomposition

The perturbation theory for singular values and vectors is complicated by two troublesome facts. The first is that we must deal with both right and left singular vectors. The second is that the singular values of a matrix are not differentiable functions of the matrix. For example, if $A = \alpha$ is a $1 \times 1$ complex matrix, then its singular value is $|\alpha|$, which is not an analytic function of $\alpha$. In particular, if we seek a perturbation expansion for $\tilde\alpha = \alpha + \epsilon$, we cannot simply write
\[ \tilde\sigma = \sqrt{(\alpha + \epsilon)^H(\alpha + \epsilon)} \cong |\alpha| + \frac{\bar\epsilon\alpha + \bar\alpha\epsilon}{2|\alpha|}, \]
since the right-hand side of this expression may not be nonnegative. For the larger singular values this example presents no problem; but it shows that we must take care in dealing with singular values near zero.

In the next subsection we will consider a generalization of the sin Θ theorem for subspaces spanned by singular vectors, which are sometimes called SINGULAR SUBSPACES. Here we circumvent the problem by working with the Jordan-Wielandt matrix to get simultaneous bounds for spaces spanned by right and left singular vectors. In the following subsection, we derive a perturbation expansion based on the cross-product matrix.
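The failure of the naive expansion for a $1 \times 1$ matrix is easy to see numerically. The short sketch below assumes the first-order formula $|\alpha| + \mathrm{Re}(\bar\alpha\epsilon)/|\alpha|$ for $|\alpha + \epsilon|$ and uses arbitrary illustrative values; it compares a singular value that is large relative to the perturbation with one that is smaller than it.

```python
def first_order(alpha, eps):
    """Naive first-order expansion of |alpha + eps| about eps = 0."""
    return abs(alpha) + (alpha.conjugate() * eps).real / abs(alpha)

def exact(alpha, eps):
    """The true singular value of the 1x1 matrix alpha + eps."""
    return abs(alpha + eps)

eps = -3e-3 + 0.0j                     # hypothetical perturbation
for alpha in (1.0 + 0.0j, 1e-3 + 0.0j):
    print(alpha, first_order(alpha, eps), exact(alpha, eps))
```

For $\alpha = 1$ the two values agree, while for $\alpha = 10^{-3}$ the expansion returns a negative number although the true singular value is positive.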
Throughout this section $A$ will be an $m \times n$ matrix with $m \ge n$.

4.1. Two sin Θ Theorems

In this subsection we will establish sin Θ theorems, due to Wedin, for spaces spanned by the singular vectors of $A$. To fix the notation, let
\[ (U_1\ U_2\ U_3)^H A (V_1\ V_2) = \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \\ 0 & 0 \end{pmatrix} \]
be a partitioned singular value decomposition of $A$ (here we do not place any constraints on the order in which the singular values appear), and let
\[ (\tilde U_1\ \tilde U_2\ \tilde U_3)^H \tilde A (\tilde V_1\ \tilde V_2) = \begin{pmatrix} \tilde\Sigma_1 & 0 \\ 0 & \tilde\Sigma_2 \\ 0 & 0 \end{pmatrix} \]
be a conformal partition of $\tilde A = A + E$. Let $\Phi$ be the matrix of canonical angles between $\mathcal R(U_1)$ and $\mathcal R(\tilde U_1)$, and let $\Theta$ be the matrix of canonical angles between $\mathcal R(V_1)$ and $\mathcal R(\tilde V_1)$. Finally, let
\[ R = A\tilde V_1 - \tilde U_1\tilde\Sigma_1 \quad\text{and}\quad S = A^H\tilde U_1 - \tilde V_1\tilde\Sigma_1. \qquad (4.1) \]

The following theorem bounds the angles $\Phi$ and $\Theta$ in terms of the residuals $R$ and $S$.

Theorem 4.1 (Wedin). Suppose that there is a number $\delta > 0$ such that
\[ \min|\sigma(\tilde\Sigma_1) - \sigma(\Sigma_2)| \ge \delta \quad\text{and}\quad \min\sigma(\tilde\Sigma_1) \ge \delta. \qquad (4.2) \]
Then
\[ \sqrt{\|\sin\Phi\|_F^2 + \|\sin\Theta\|_F^2} \le \frac{\sqrt{\|R\|_F^2 + \|S\|_F^2}}{\delta}. \]

Remark 4.2. The matrices $U_i$, $V_i$, $\tilde U_i$, and $\tilde V_i$ may be replaced by any matrices with orthonormal columns spanning the appropriate subspaces.

Proof. Consider the Jordan-Wielandt matrix
\[ C = \begin{pmatrix} 0 & A \\ A^H & 0 \end{pmatrix}, \]
whose eigenvalues are $\pm\sigma_1, \ldots, \pm\sigma_n$ with $m - n$ additional zero eigenvalues. Let $\tilde C$ be the Jordan-Wielandt matrix for $\tilde A$. It is easy to show that if
\[ X = \frac{1}{\sqrt2}\begin{pmatrix} U_1 & U_1 \\ V_1 & -V_1 \end{pmatrix} \equiv (X_1\ X_2), \]
then $\mathcal R(X)$ is an invariant subspace of $C$. The representation of $C$ on this subspace is $\mathrm{diag}(\Sigma_1, -\Sigma_1)$. Similarly, if
\[ \tilde X = \frac{1}{\sqrt2}\begin{pmatrix} \tilde U_1 & \tilde U_1 \\ \tilde V_1 & -\tilde V_1 \end{pmatrix} \equiv (\tilde X_1\ \tilde X_2), \]
then $\mathcal R(\tilde X)$ is an invariant subspace of $\tilde C$. The representation of $\tilde C$ on this subspace is $\mathrm{diag}(\tilde\Sigma_1, -\tilde\Sigma_1)$. Hence by Theorem 3.4, if we set
\[ T = C\tilde X - \tilde X(\tilde X^H \tilde C \tilde X), \]
then
\[ \|\sin\Theta[\mathcal R(X), \mathcal R(\tilde X)]\|_F \le \frac{\|T\|_F}{\delta}. \qquad (4.3) \]

To arrive at the conclusion of our theorem, we must compute the left- and right-hand sides of (4.3). For the left-hand side, note that
\[ P_X = XX^H = \mathrm{diag}(U_1U_1^H, V_1V_1^H) = \mathrm{diag}(P_U, P_V), \]
and similarly for $P_{\tilde X}$. Hence by Theorem I.5.5
\[ \|\sin\Theta[\mathcal R(X), \mathcal R(\tilde X)]\|_F^2 = \|P_X^{\perp}P_{\tilde X}\|_F^2 = \|\mathrm{diag}(P_U^{\perp}P_{\tilde U}, P_V^{\perp}P_{\tilde V})\|_F^2 = \|\sin\Phi\|_F^2 + \|\sin\Theta\|_F^2. \qquad (4.4) \]
For the right-hand side, a straightforward computation shows that
\[ T = \frac{1}{\sqrt2}\begin{pmatrix} R & -R \\ S & S \end{pmatrix}. \]
Hence
\[ \|T\|_F^2 = \|R\|_F^2 + \|S\|_F^2. \qquad (4.5) \]
The theorem follows on combining (4.3), (4.4), and (4.5). ∎

The appearance of the condition $\min\sigma(\tilde\Sigma_1) \ge \delta$ seems strange at first, but it is necessary as the following example shows.

Example 4.3. Let
\[ A = (\;\cdots\;) \quad\text{and}\quad \tilde A = (\;\cdots\;). \]
Then $\tilde u_1$ makes an angle of 45 degrees with $u_1$, even though the singular value $\epsilon$ of $A$ is well separated from the other singular value.

This example also points to a fundamental defect in Theorem 4.1. Although the vector $v_1$ is insensitive to perturbations in $A$, its bound is governed by the ill-conditioning of $u_1$. However, the problem can be circumvented by using the theorem to bound the perturbation in $\mathcal R(V_2)$. Since $\mathcal R(V_1)$ and $\mathcal R(V_2)$ are complementary subspaces, the same bound will serve for $\mathcal R(V_1)$.

By imposing further restrictions on the singular values, we may establish a bound on the 2-norm (actually on any unitarily invariant norm). The proof of the following theorem is a variant of the proof of Theorem 4.1 and is left as an exercise.

Theorem 4.4 (Wedin). Suppose that there are numbers $\alpha, \delta > 0$ such that
\[ \min\sigma(\tilde\Sigma_1) \ge \alpha + \delta \quad\text{and}\quad \max\sigma(\Sigma_2) \le \alpha. \qquad (4.6) \]
Then
\[ \max\{\|\sin\Phi\|_2, \|\sin\Theta\|_2\} \le \frac{\max\{\|R\|_2, \|S\|_2\}}{\delta}. \]

The condition (4.6) restricts the bounds to subspaces associated with a group of the largest singular values. However, by the trick described in connection with Example 4.3, we can use the theorem indirectly to get bounds on the perturbation in $\mathcal R(V_2)$.

4.2. A Perturbation Expansion

In the last subsection we saw that the perturbation theory for singular vectors associated with small singular values presented some difficulties. Actually, small singular values themselves exhibit curious behavior: they tend to get larger (after all, they have nowhere to go but up). Since this fact has important
consequences for applications to least squares problems and linear regression, we will develop a perturbation expansion that shows what is going on. The key is to smooth out the behavior of the small singular value by working with its square, or equivalently with the cross-product matrix $A^HA$.

We begin with a lemma that follows directly from Theorem 2.7 applied to Hermitian matrices.

Lemma 4.5. In the Hermitian matrix
\[ \begin{pmatrix} \alpha & h^H \\ h & C \end{pmatrix} \]
let $\alpha$ and $C$ be constant and let $h$ depend on a parameter $\epsilon$ in such a way that $\|h\|_2 = O(\epsilon)$ as $\epsilon \to 0$. Let the quantities $\tilde\alpha$, $\tilde C$, and $\tilde h$ satisfy $|\tilde\alpha - \alpha|, \|\tilde C - C\|_2 = O(\epsilon)$ and $\|\tilde h - h\|_2 = O(\epsilon^2)$. If $\alpha I - C$ is nonsingular, then for all sufficiently small $\epsilon$ the matrix
\[ \begin{pmatrix} \tilde\alpha & \tilde h^H \\ \tilde h & \tilde C \end{pmatrix} \qquad (4.7) \]
has an eigenvector
\[ \begin{pmatrix} 1 \\ (\alpha I - C)^{-1}h \end{pmatrix} + O(\epsilon^2) \qquad (4.8) \]
corresponding to the eigenvalue
\[ \tilde\alpha + h^H(\alpha I - C)^{-1}h + O(\epsilon^3). \qquad (4.9) \]
To apply the lemma, let $A$ have the singular value decomposition
\[ U^H A V = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} \]
(here, as above, we do not assume that the singular values appear on the diagonal of $\Sigma$ in descending order). Partition
\[ U = (u_1\ U_2\ U_3) \quad\text{and}\quad V = (v_1\ V_2), \]
where $u_1$ and $v_1$ are vectors, $U_2$ and $V_2$ have $n - 1$ columns, and $U_3$ has $m - n$ columns. Partition
\[ U^H A V = \begin{pmatrix} \sigma_1 & 0 \\ 0 & \Sigma_2 \\ 0 & 0 \end{pmatrix} \]
conformally. Finally, given a perturbation $\tilde A = A + E$ of $A$, let
\[ U^H E V = \begin{pmatrix} \gamma_{11} & g_{12}^H \\ g_{21} & G_{22} \\ g_{31} & G_{32} \end{pmatrix}, \]
so that
\[ U^H \tilde A V = \begin{pmatrix} \sigma_1 + \gamma_{11} & g_{12}^H \\ g_{21} & \Sigma_2 + G_{22} \\ g_{31} & G_{32} \end{pmatrix}. \qquad (4.10) \]

The following theorem contains the chief result of this subsection.

Theorem 4.6. Let
\[ h = \sigma_1 g_{12} + \Sigma_2 g_{21}. \qquad (4.11) \]
If $\sigma_1^2 I - \Sigma_2^2$ is nonsingular (i.e., if $\sigma_1$ is a simple singular value of $A$), then as $E \to 0$ the matrix (4.10) has a right singular vector of the form
\[ \begin{pmatrix} 1 \\ (\sigma_1^2 I - \Sigma_2^2)^{-1}h \end{pmatrix} + O(\|E\|^2) \qquad (4.12) \]
corresponding to a singular value $\tilde\sigma_1$ satisfying
\[ \tilde\sigma_1^2 = (\sigma_1 + \gamma_{11})^2 + \|g_{21}\|_2^2 + \|g_{31}\|_2^2 + h^H(\sigma_1^2 I - \Sigma_2^2)^{-1}h + O(\|E\|^3). \qquad (4.13) \]

Proof. In Lemma 4.5 let $\alpha = \sigma_1^2$ and $C = \Sigma_2^2$, and let $h$ be defined as in (4.11). Identify the elements of the matrix (4.7) with the elements of the partitioned cross-product matrix $(U^H\tilde AV)^H(U^H\tilde AV)$, so that
\[ \tilde\alpha = (\sigma_1 + \gamma_{11})^2 + \|g_{21}\|_2^2 + \|g_{31}\|_2^2, \]
\[ \tilde h = h + \gamma_{11}g_{12} + G_{22}^H g_{21} + G_{32}^H g_{31}, \]
\[ \tilde C = (\Sigma_2 + G_{22})^H(\Sigma_2 + G_{22}) + g_{12}g_{12}^H + G_{32}^H G_{32}. \]
Then the conditions of the lemma are satisfied and (4.12) and (4.13) follow by making appropriate substitutions in (4.8) and (4.9). ∎

We have expanded the perturbed singular vector (4.12) in terms of the transformed matrix $U^H\tilde AV$; in terms of the original matrix we get the expression
\[ \tilde v_1 = v_1 + V_2(\sigma_1^2 I - \Sigma_2^2)^{-1}h + O(\|E\|^2). \]

The results can also be stated in terms of projections. Let $P_1$, $P_2$, and $P_3$ be the orthogonal projections onto the column spaces of $u_1$, $U_2$, and $U_3$, and let $Q_1$ and $Q_2$ be the orthogonal projections onto the column spaces of $v_1$ and $V_2$.
Then $\|U_i^HEV_j\|_2 = \|P_iEQ_j\|_2$, so that an expression like $(\sigma_1 + \gamma_{11})^2 + \|g_{21}\|_2^2 + \|g_{31}\|_2^2$ can be written
\[
\|P_1\tilde AQ_1\|_2^2 + \|P_2\tilde AQ_1\|_2^2 + \|P_3\tilde AQ_1\|_2^2 = \|\tilde AQ_1\|_2^2.
\]
In particular, if $\sigma_1$ is large compared with $E$, then the second order terms in (4.13) are negligible compared to the first order terms, and we have
\[
\begin{aligned}
\tilde\sigma_1 &= \sigma_1 + \gamma_{11} + O(\|E\|_2^2) \\
&= \sigma_1 + u_1^HEv_1 + O(\|E\|_2^2) \\
&= \|P_1\tilde AQ_1\|_2 + O(\|E\|_2^2).
\end{aligned}
\]
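As a small numerical illustration of (4.13) and of the first order formula, the sketch below assumes a real $2\times 2$ matrix $A = \mathrm{diag}(3, 1)$, so that $U = V = I$ and the blocks $g_{31}$, $G_{32}$ are absent. The perturbation entries are arbitrary illustrative values, and the largest singular value is computed from the cross-product matrix in closed form.

```python
import math

def largest_singular_value_2x2(m):
    """Largest singular value of a real 2x2 matrix, via eigenvalues of M^T M."""
    a = m[0][0] ** 2 + m[1][0] ** 2               # (M^T M)[0][0]
    c = m[0][1] ** 2 + m[1][1] ** 2               # (M^T M)[1][1]
    b = m[0][0] * m[0][1] + m[1][0] * m[1][1]     # (M^T M)[0][1]
    return math.sqrt(0.5 * (a + c) + math.hypot(0.5 * (a - c), b))

sigma1, sigma2 = 3.0, 1.0                  # A = diag(sigma1, sigma2)
e = [[0.003, -0.005], [0.007, 0.002]]      # illustrative perturbation E
perturbed = [[sigma1 + e[0][0], e[0][1]],
             [e[1][0], sigma2 + e[1][1]]]

gamma11, g12, g21 = e[0][0], e[0][1], e[1][0]
h = sigma1 * g12 + sigma2 * g21            # (4.11) in the scalar case

# Right-hand side of (4.13) without the remainder term.
second_order = (sigma1 + gamma11) ** 2 + g21 ** 2 + h * h / (sigma1 ** 2 - sigma2 ** 2)

exact = largest_singular_value_2x2(perturbed)
first_order_error = abs(exact - (sigma1 + gamma11))
second_order_error = abs(exact ** 2 - second_order)
print(first_order_error, second_order_error)
```

With $\|E\|_2$ of order $10^{-2}$, the first order formula is off by roughly $\|E\|_2^2$, while the discrepancy in (4.13) is smaller by several orders of magnitude, as the remainder terms suggest.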
Our expansion quantifies the observation, made at the beginning of this subsection, that small singular values tend to increase. For if $\sigma_1 = 0$, then $h = \Sigma_2g_{21}$, and $h^H(\sigma_1^2I - \Sigma_2^2)^{-1}h = -\|g_{21}\|_2^2$. It follows that
\[
\begin{aligned}
\tilde\sigma_1^2 &= \gamma_{11}^2 + \|g_{31}\|_2^2 + O(\|E\|_2^3) \\
&= (u_1^HEv_1)^2 + \|U_3^HEv_1\|_2^2 + O(\|E\|_2^3) \\
&= \|(P_1 + P_3)EQ_1\|_2^2 + O(\|E\|_2^3).
\end{aligned}
\]

Notes and References

The $\Phi$, $\Theta$ theorem for the 2-norm (Theorem 4.4) was proven by Wedin [259, 1972], who established the results for arbitrary unitarily invariant norms. The $\Phi$, $\Theta$ theorem for the Frobenius norm (Theorem 4.1) is technically new, but the proof is a modification of a comment by Wedin on another way of proving the $\Phi$, $\Theta$ theorem for the 2-norm.

Although we have stressed direct bounds in this section, the approach taken in Section 2 for invariant subspaces can be adapted to the singular value decomposition. Briefly, let $U = (U_1\ U_2)$ and $V = (V_1\ V_2)$ be unitary, and let
\[
U^HAV = \begin{pmatrix} S_1 & H^H \\ G & S_2 \end{pmatrix}.
\]
We seek
\[
\tilde U = (U_1\ U_2)\begin{pmatrix} I & -P^H \\ P & I \end{pmatrix}
\begin{pmatrix} (I + P^HP)^{-\frac12} & 0 \\ 0 & (I + PP^H)^{-\frac12} \end{pmatrix}
\]
and
\[
\tilde V = (V_1\ V_2)\begin{pmatrix} I & -Q^H \\ Q & I \end{pmatrix}
\begin{pmatrix} (I + Q^HQ)^{-\frac12} & 0 \\ 0 & (I + QQ^H)^{-\frac12} \end{pmatrix}
\]
such that $\tilde U^HA\tilde V$ is block diagonal. This requirement leads to the equation
\[
T(Q, P) = (G, H) - \varphi(Q, P),
\]
where
\[
T = (Q, P) \mapsto (PS_1 - S_2Q,\ QS_1^H - S_2^HP)
\]
and
\[
\varphi = (Q, P) \mapsto (PH^HQ,\ QG^HP).
\]
If we set $\|(Q, P)\|_F^2 = \|Q\|_F^2 + \|P\|_F^2$ and let $\|T\|$ be the subordinate operator norm, then
\[
\|T^{-1}\|^{-1} = \min|\sigma(S_1) - \sigma(S_2)|.
\]
Theorem 2.11 now applies to give conditions for the existence of $P$ and $Q$ and bounds on their norms. This development is due to Stewart [202, 1973].

The material on perturbation expansions in the second subsection is taken with small changes from a paper by Stewart [212, 1984].
In least squares problems with errors in the least squares matrix (errors in the variables, as they are known to the statistical community), the increase of small singular values manifests itself in a downward bias of the least squares solution (e.g., see [19, 213]). This has led to the development of techniques to remove the bias [80, 92]. It should be noted that the solutions produced by these techniques differ from a least squares solution only in second order terms and higher [210].

Closely related to these perturbation expansions are characterizations by Sun [230, 1988] of the behavior of a simple singular value when the elements of its matrix are analytic functions of several complex variables.

Exercises

1. Verify Remark 4.2.

2. Let $\sigma = \inf_2(A)$ be the smallest singular value of $A$ with right singular vector $v$, and similarly for $\tilde\sigma$ and $\tilde v$. Let $\delta$ be the distance between $\sigma$ and the next largest singular value of $A$. Show that if $\|E\|_2 < \delta$ then
\[
\sin\angle(v, \tilde v) \le \frac{\|E\|_2}{\delta - \|E\|_2}.
\]
[Hint: Work with the complementary spaces and regard $A$ as a perturbation of $\tilde A$.]

THE FOLLOWING EXERCISES PRESENT WEDIN'S PROOF [259] OF THEOREM 4.4, WHICH IS VALID FOR ANY UNITARILY INVARIANT NORM $\|\cdot\|$. HERE, IN THE NOTATION OF THE FIRST SUBSECTION, WE SET $A_i = U_i\Sigma_iV_i^H$ $(i = 1, 2)$ AND $\tilde A_i = \tilde U_i\tilde\Sigma_i\tilde V_i^H$ $(i = 1, 2)$.
268 V. INVARIANT SUBSPACES :t Show t.hat. I'J,p Ji , = (l'J, E1';\1I + A2P'IIPJiIl)Al , I' and PAHP A \ = Al1(P A - EP A \ + P A Px , Ad. I 1 1 1 1 .1. Lpt. nand S lH' dditwd by (/1.1)" Show that IL == max{llPt EPA"II, P A EP A \}  max{IIRII, JlSJI}. 1 1 1 5. Let /I, be defined as in the last exercise. Show that uwler the hypotheses of Theorem 4.4 II sin <1>JI  IL + nil sin 811 0'+8 and II ' E -' II IL+O'Jlsin<1>JI sm 7 < . - (10 + 8 llence {II ' "" II II . E " II} rnax{JlRII,IISII} max sm 'i' , sm 7 < . . . - 8 -0- 6. Fill in the details of the sketch given in the notes and references for an approximation theorem for the singular value decomposition. I3e sure to pxhibit. mat.rices whose singular vaineR arc those of A. Derive a perturbation t.hporpll1 from t.!w approxilnatiOI1 t.!worclIl. 7. (A norm version of Theorem 4.6 [208]). Let A have singular values (JI  . . .  (In and A have singnlar values 0"1  . . .  0"11' Show that - 2 ( ) 2 2 (Ji = (Ji + Ii + 17i , i=l,...,p, where hil  IIPAEI12 and inf 2 (PX E)  1/i  IW.t EJl2 8. (Scaled null vedors [210]). Let A E C"'Xl1 have rank n, and let b = Ax. Let. A = A + E and b = B + c. For l' nonsingular, let (JT be the smallest singular value of (AT Ii) and ( :rT ) -1 4. THE SINGULAR VALUE DECOMPOSITION 2G9 be the corresponding singnlar vector. Show that. T-IXT = At/; + O(II(E e)Jl2). 
Chapter VI

Generalized Eigenvalue Problems

A MATRIX PENCIL is a family of matrices $A - \lambda B$, parameterized by a complex number $\lambda$. When $A$ is square and $B = I$, the zeros of the function $\det(A - \lambda B)$ are the eigenvalues of $A$. Consequently, the problem of finding the nontrivial solutions of the equation
\[
Ax = \lambda Bx
\]
is called the GENERALIZED EIGENVALUE PROBLEM.

Although the generalized eigenvalue problem looks like a simple generalization of the usual eigenvalue problem, it exhibits some important differences. In the first place, it is possible for $\det(A - \lambda B)$ to be identically zero, independent of $\lambda$. For such SINGULAR PENCILS every scalar can be regarded as an eigenvalue.

Second, it is possible for $B$ to be singular, in which case the problem has infinite eigenvalues. To see this, write the generalized eigenvalue problem in the reciprocal form
\[
Bx = \lambda^{-1}Ax.
\]
If $B$ is singular with a null vector $x$, then $Bx = 0Ax$, so that $x$ is an eigenvector of the reciprocal problem corresponding to eigenvalue
$\lambda^{-1} = 0$; i.e., $\lambda = \infty$.

It might be thought that infinite eigenvalues are special, unhappy cases to be ignored in our perturbation theory, but that is a misconception. If we write the eigenvalue problem in the cross-product form
\[
\beta Ax = \alpha Bx,
\tag{1}
\]
then we see that infinite eigenvalues correspond to nonzero pairs $(\alpha, \beta)$ for which $\beta = 0$, a case that is not essentially different from the case $\alpha = 0$ (i.e., $\lambda = 0$). In this chapter we will deal with the problem of infinite eigenvalues by treating generalized eigenvalue problems in their cross-product forms.

Finally, there are difficult and unresolved problems connected with the scaling of generalized eigenvalue problems. In the ordinary eigenvalue problem, the fact that $B = I$ provides a natural scale: namely the size of $A$. For the generalized eigenvalue problem, we may scale both $A$ and $B$, and the perturbation bounds we derive will be essentially different for different scalings. This is an open research problem, which will keep returning to haunt us.

In spite of the differences between the generalized and the ordinary eigenvalue problems, they have striking similarities, similarities we will stress as much as the differences. In fact, this chapter is a copy en miniature of the part of the book that concerns eigenvalue problems. The first section is devoted to the background, an algebraic introduction to the subject. We then turn to perturbation bounds for the eigenvalues of regular matrix pencils (the natural generalization of the ordinary eigenvalue problem) and then for their eigenspaces (the natural generalization of their eigenvectors). Finally, we consider both the eigenvalues and eigenspaces of definite matrix pencils, which generalize the Hermitian eigenvalue problem.
Although rectangular matrix pencils (matrix pencils $A - \lambda B$ with $A$ and $B$ rectangular) occur and have important applications, we shall consider only square matrix pencils, for which the perturbation theory is less immature. Unless otherwise stated, $A$ and $B$ will be square matrices of order $n$ throughout this chapter.

1. Background

1.1. Matrix Pairs

We have seen in the introduction to this chapter that the presence of infinite eigenvalues results from the asymmetrical treatment of $A$ and $B$ in the definition of a matrix pencil and its generalized eigenvalue problem. The solution to the problem is to recast it in the form (1), in which $A$ and $B$ play equivalent roles.

However, there is a technical problem here. If the pair $(\alpha, \beta)$ satisfies (1) then so does $\tau(\alpha, \beta)$ for any scalar $\tau$. Consequently, if we are to regard $(\alpha, \beta)$ as a generalized eigenvalue, we must so regard its nonzero scalar multiples. This suggests that it is the subspace spanned by the vector $(\alpha, \beta)$ that should be regarded as the generalized eigenvalue. To distinguish between the subspace and the pair we make the following definition.

Definition 1.1. Let $(\alpha, \beta) \ne (0, 0)$. Then
\[
\langle\alpha, \beta\rangle \stackrel{\mathrm{def}}{=} \{\tau(\alpha, \beta)^T : \tau \in \mathbf{C}\}.
\]

In order to preserve the connection of the generalized eigenvalue problem with the ordinary eigenvalue problem, we will occasionally abuse the notation introduced in the above definition and write $\langle\lambda\rangle$ for $\langle\lambda, 1\rangle$. For infinite $\lambda$, we define $\langle\infty\rangle = \langle 1, 0\rangle$.

We are now in a position to define the generalized eigenvalue problem. Since the definition of matrix pencil treats $A$ and $B$ differently, we will drop the term and refer simply to pairs of matrices.

Definition 1.2. A MATRIX PAIR $(A, B)$ is SINGULAR if for all $(\alpha, \beta)$
\[
\det(\beta A - \alpha B) = 0.
\]
Otherwise the pair $(A, B)$ is said to be REGULAR. If $(A, B)$ is regular and
\[
\beta Ax = \alpha Bx
\tag{1.1}
\]
for $(\alpha, \beta) \ne (0, 0)$ and $x \ne 0$, then $\langle\alpha, \beta\rangle$ is an EIGENVALUE of $(A, B)$ with (RIGHT) EIGENVECTOR $x$.
The corresponding solution $y \ne 0$ of the equation
\[
\beta y^HA = \alpha y^HB
\]
is called a LEFT EIGENVECTOR.
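For $2\times 2$ pairs the definition can be made concrete, because $\det(\beta A - \alpha B)$ is a homogeneous quadratic in $(\alpha, \beta)$. The sketch below is an illustration only: the helper names and the tolerance are assumptions, and the approach does not extend to larger matrices. It returns one representative $(\alpha, \beta)$ for each eigenvalue $\langle\alpha, \beta\rangle$, or `None` when the pair is singular.

```python
import cmath

def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def pencil_det(a, b, alpha, beta):
    """det(beta*A - alpha*B) for 2x2 matrices A and B."""
    m = [[beta * a[i][j] - alpha * b[i][j] for j in range(2)] for i in range(2)]
    return det2(m)

def pair_eigenvalues(a, b, tol=1e-12):
    """Representatives (alpha, beta) of the eigenvalues of a 2x2 pair (A, B).

    det(beta*A - alpha*B) = c0*beta^2 - c1*alpha*beta + c2*alpha^2.
    Returns None if all coefficients vanish, i.e. the pair is singular.
    """
    c0 = det2(a)
    c1 = (a[0][0] * b[1][1] + a[1][1] * b[0][0]
          - a[0][1] * b[1][0] - a[1][0] * b[0][1])
    c2 = det2(b)
    if max(abs(c0), abs(c1), abs(c2)) <= tol:
        return None
    if abs(c2) <= tol:
        # The polynomial is beta*(c0*beta - c1*alpha): one root has beta = 0,
        # which is the infinite eigenvalue <1, 0>.
        return [(1.0, 0.0), (c0, c1)]
    root = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return [((c1 + root) / (2 * c2), 1.0), ((c1 - root) / (2 * c2), 1.0)]

identity = [[1.0, 0.0], [0.0, 1.0]]
singular_b = [[1.0, 0.0], [0.0, 0.0]]

# B singular but the pair regular: one finite and one infinite eigenvalue.
regular = pair_eigenvalues(identity, singular_b)
# A and B share the null vector (0, 1)^T: the pair is singular.
degenerate = pair_eigenvalues([[1.0, 0.0], [0.0, 0.0]], [[2.0, 0.0], [0.0, 0.0]])
print(regular)
print(degenerate)
```

Working with the pair $(\alpha, \beta)$ rather than the quotient $\alpha/\beta$ is what lets the infinite eigenvalue appear as an ordinary root with $\beta = 0$.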
Some examples may make these definitions clearer.

Example 1.3. Suppose that the null spaces of $A$ and $B$ intersect, and let $x \ne 0$ belong to the intersection. Then for any $(\alpha, \beta)$, we have $(\beta A - \alpha B)x = 0$, so that the pair $(A, B)$ is singular.

Example 1.4. Let $B$ be nonsingular. Then with $(\alpha, \beta) = (1, 0)$, we have
\[
\det(\beta A - \alpha B) = \det(-B) \ne 0.
\]
Consequently the pair $(A, B)$ is regular.

In fact the eigenvalue problem for the pair in this example is equivalent to an ordinary eigenvalue problem. To see this, note that if $\langle\alpha, \beta\rangle$ is an eigenvalue of $(A, B)$, then $\beta \ne 0$. It follows that (1.1) can be rewritten in the form
\[
B^{-1}Ax = \lambda x,
\]
where $\lambda = \alpha/\beta$. Conversely if $\lambda \in \mathcal{L}(B^{-1}A)$, then $\langle\lambda\rangle$ is an eigenvalue of $(A, B)$. This observation, that the generalized eigenvalue problem with nonsingular $B$ can be converted to an ordinary eigenvalue problem, is the basis of many numerical methods, which, however, can fail in the presence of rounding error when $B$ is ill conditioned.

Example 1.5. The pair
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\]
is obviously regular. Its eigenvalues are $\langle 1\rangle$ and $\langle\infty\rangle$, and the corresponding eigenvectors are $\mathbf{1}_1$ and $\mathbf{1}_2$. We shall see that in spite of the infinite eigenvalue, the pair behaves well under perturbations, provided we make the proper definition of "well behaved."

When $B$ is nonsingular, the eigenvalues of the pair $(A, B)$ satisfy the characteristic equation
\[
\det(A - \lambda B) = 0.
\]
When $B$ is singular, the characteristic equation will have degree less than $n$. For example, the pair in Example 1.5 has the characteristic polynomial $\det(A - \lambda B) \equiv 1 - \lambda$. The missing eigenvalue is the infinite one. By transforming the problem we can make the infinite eigenvalues finite and restore the lost degrees in the characteristic equation. The proof of the following theorem is purely computational and is left as an exercise.

Theorem 1.6. Let $W$ be a $2 \times 2$ nonsingular matrix.
Given the pair $(A, B)$, set
\[
(C\ \ D) = (A\ \ B)\begin{pmatrix} w_{11}I & w_{12}I \\ w_{21}I & w_{22}I \end{pmatrix} \equiv (A\ \ B)(W \otimes I).
\]
Given the pair $(\alpha, \beta) \ne (0, 0)$, define $(\gamma, \delta)$ by
\[
\begin{pmatrix} \gamma \\ \delta \end{pmatrix} = W^T\begin{pmatrix} \alpha \\ \beta \end{pmatrix}.
\]
Then $\langle\alpha, \beta\rangle$ is an eigenvalue of $(A, B)$ if and only if $\langle\gamma, \delta\rangle$ is an eigenvalue of $(C, D)$.

If $(A, B)$ is a regular pair, there are constants $\sigma$ and $\tau$ such that $\tau A - \sigma B$ is nonsingular. If we set
\[
W = \begin{pmatrix} \bar\sigma & \tau \\ \bar\tau & -\sigma \end{pmatrix},
\]
then $W$ is nonsingular. If $C$ and $D$ are defined as in Theorem 1.6, then the eigenvalues of $(A, B)$ are in one-to-one correspondence with those of $(C, D)$. But $D = \tau A - \sigma B$ is nonsingular, and hence by Example 1.4 the eigenvalues of the pair $(C, D)$ are the eigenvalues of $D^{-1}C$. Thus we have established that a regular matrix pencil of order $n$ has $n$ eigenvalues. As with the ordinary eigenvalue problem, we will denote the set of eigenvalues of the pair $(A, B)$ by $\mathcal{L}[(A, B)]$.
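The transformation of Theorem 1.6 is easy to try on a small case. The sketch below assumes the $2\times 2$ pair $A = I$, $B = \mathrm{diag}(1, 0)$, which has one finite and one infinite eigenvalue, and the real choice $\sigma = 1$, $\tau = 2$. It takes the eigenvalue map to be $(\gamma, \delta)^T = W^T(\alpha, \beta)^T$, which is what direct substitution into $\beta Ax = \alpha Bx$ gives. The helper names are illustrative.

```python
def combine(w, a, b):
    """(C D) = (A B)(W kron I): C = w11*A + w21*B and D = w12*A + w22*B."""
    c = [[w[0][0] * a[i][j] + w[1][0] * b[i][j] for j in range(2)] for i in range(2)]
    d = [[w[0][1] * a[i][j] + w[1][1] * b[i][j] for j in range(2)] for i in range(2)]
    return c, d

def pencil_det(a, b, alpha, beta):
    """det(beta*A - alpha*B) for 2x2 matrices."""
    m = [[beta * a[i][j] - alpha * b[i][j] for j in range(2)] for i in range(2)]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def transform_eigenvalue(w, alpha, beta):
    """(gamma, delta)^T = W^T (alpha, beta)^T."""
    return (w[0][0] * alpha + w[1][0] * beta, w[0][1] * alpha + w[1][1] * beta)

a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [0.0, 0.0]]
sigma, tau = 1.0, 2.0                      # tau*A - sigma*B = diag(1, 2) is nonsingular
w = [[sigma, tau], [tau, -sigma]]          # real case, so conjugation plays no role
c, d = combine(w, a, b)

original = [(1.0, 1.0), (1.0, 0.0)]        # <1> and <infinity> for the pair (A, B)
mapped = [transform_eigenvalue(w, al, be) for al, be in original]
residuals = [pencil_det(c, d, ga, de) for ga, de in mapped]
print(mapped, residuals)
```

Here $D = \mathrm{diag}(1, 2)$ is nonsingular, and the infinite eigenvalue $\langle 1, 0\rangle$ of $(A, B)$ is carried to the finite eigenvalue $\langle 1, 2\rangle = \langle\tfrac12\rangle$ of $(C, D)$.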
1.2. Triangular and Weierstrass Forms

As we saw in Chapter I, an important theme of matrix algebra is the reduction of matrices to simpler forms by means of appropriate transformations. The key word here is "appropriate." For the computation of projections, the appropriate transformation is premultiplication by a unitary matrix. For the eigenvalue problem it is similarity transformations. For the generalized eigenvalue problem it is equivalence transformations.

Definition 1.7. If $X$ and $Y$ are nonsingular, then the pairs $(A, B)$ and $(Y^HAX, Y^HBX)$ are EQUIVALENT.

Equivalence, like similarity, preserves eigenvalues while transforming eigenvectors in a simple manner. The proof of the following theorem is left as an exercise.

Theorem 1.8. Let $\langle\alpha, \beta\rangle$ be an eigenvalue of the pair $(A, B)$ with eigenvector $x$. Then $\langle\alpha, \beta\rangle$ is an eigenvalue of the equivalent pair $(Y^HAX, Y^HBX)$ with eigenvector $X^{-1}x$.

The first application of this observation is a reduction to the equivalent of the Schur form.

Theorem 1.9. Let $(A, B)$ be a regular pair. Then there are unitary matrices $U$ and $V$ such that the components of the equivalent pair
\[
(S, T) = (V^HAU, V^HBU)
\]
are upper triangular. The quantities $\langle\sigma_{ii}, \tau_{ii}\rangle$ $(i = 1, \ldots, n)$ are the eigenvalues of $(A, B)$ and may be made to appear in any order on the diagonals of $S$ and $T$.

Proof. Let $\langle\sigma, \tau\rangle$ be the first eigenvalue in some prespecified order of the eigenvalues of $(A, B)$, and let $\tau Ax = \sigma Bx$ $(x \ne 0)$. Since $(A, B)$ is regular, not both $Ax$ and $Bx$ can be zero; say $Ax \ne 0$. Let $U = (u\ U_*)$ be a unitary matrix with $u$ proportional to $x$, and let $V = (v\ V_*)$ be a unitary matrix with $v$ proportional to $Ax$. Then
\[
V^HAU = \begin{pmatrix} v^HAu & v^HAU_* \\ V_*^HAu & V_*^HAU_* \end{pmatrix}
= \begin{pmatrix} \sigma_{11} & s_{12}^H \\ 0 & S_* \end{pmatrix}
\]
is block triangular. Since $\tau Au = \sigma Bu$, we must have $V_*^HBu = 0$. Hence
\[
V^HBU = \begin{pmatrix} \tau_{11} & t_{12}^H \\ 0 & T_* \end{pmatrix}
\]
is also block triangular. This completes one step of the reduction. The reduction continues inductively à la Schur (cf. Theorem I.3.3). $\blacksquare$
We note in passing that singular pairs can also be reduced to triangular form by unitary equivalences. The proof is a minor variant of the above proof.

The computational consequences of this theorem are the same as for the Schur theorem: it provides a target for iterative generalized eigenvalue algorithms to aim for. However, it does not have the broad theoretical implications of Schur's theorem. This is because the transformations involved do not preserve Hermitian matrices. Consequently, we cannot read off the theory of Hermitian pairs from the theorem; instead we must develop it directly, as we will do in the next subsection. However, the triangular reduction has one important consequence for simple eigenvalues.

Corollary 1.10. Let $\langle\alpha, \beta\rangle$ be a simple eigenvalue of the regular pair $(A, B)$ with right eigenvector $x$ and left eigenvector $y$. Then
\[
\langle\alpha, \beta\rangle = \langle y^HAx, y^HBx\rangle.
\]

Proof. It is sufficient to consider $(A, B)$ in the triangular form
\[
\left[\begin{pmatrix} \alpha & a^H \\ 0 & A_* \end{pmatrix},
\begin{pmatrix} \beta & b^H \\ 0 & B_* \end{pmatrix}\right].
\]
In this case $x$ is a multiple of $\mathbf{1}_1$. Moreover, the first component of $y$ is nonzero; otherwise, $y$ would be a left eigenvector of $(A_*, B_*)$, contradicting the simplicity of $\langle\alpha, \beta\rangle$. Hence $y^Hx \ne 0$ and
\[
\langle y^HAx, y^HBx\rangle = \langle\alpha y^Hx, \beta y^Hx\rangle = \langle\alpha, \beta\rangle. \qquad\blacksquare
\]

We now turn to the further reduction of the Schur form to block diagonal form. Let $(A, B)$ be a pair in triangular form and partition
\[
A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \qquad
B = \begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix}.
\]
We wish to find matrices $P$ and $Q$ such that
\[
\begin{pmatrix} I & Q \\ 0 & I \end{pmatrix}
\begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}
\begin{pmatrix} I & P \\ 0 & I \end{pmatrix}
= \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}
\]
and
\[
\begin{pmatrix} I & Q \\ 0 & I \end{pmatrix}
\begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix}
\begin{pmatrix} I & P \\ 0 & I \end{pmatrix}
= \begin{pmatrix} B_{11} & 0 \\ 0 & B_{22} \end{pmatrix}.
\]
With a little manipulation, this requirement yields the pair of equations
\[
\begin{aligned}
A_{11}P + QA_{22} &= -A_{12}, \\
B_{11}P + QB_{22} &= -B_{12},
\end{aligned}
\]
which may be called the GENERALIZED SYLVESTER EQUATIONS. If we set
\[
T = (P, Q) \mapsto (A_{11}P + QA_{22},\ B_{11}P + QB_{22}),
\tag{1.2}
\]
then our problem becomes one of determining when the linear operator $T$ is nonsingular. It turns out that a separation condition, analogous to the condition of Theorem V.1.3 for the ordinary eigenvalue problem, is necessary and sufficient for $T$ to be nonsingular.

Theorem 1.11. Let $(A_{11}, B_{11})$ and $(A_{22}, B_{22})$ be regular pairs, and let $T$ be defined by (1.2). Then $T$ is nonsingular if and only if
\[
\mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})] = \emptyset.
\tag{1.3}
\]

Proof. Suppose that $\mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})] = \emptyset$. We will show that for any $(R, S)$, the equation $T(P, Q) = (R, S)$ has a solution, which implies that $T$ is nonsingular.

We may assume without loss of generality that $A_{11}$, $A_{22}$, $B_{11}$, and $B_{22}$ are upper triangular. For by Theorem 1.9, there are unitary matrices $U_i$, $V_i$ $(i = 1, 2)$ such that the pairs $(V_i^HA_{ii}U_i, V_i^HB_{ii}U_i)$ are upper triangular. Then the equation $T(P, Q) = (R, S)$ is equivalent to
\[
\begin{aligned}
(V_1^HA_{11}U_1)(U_1^HPU_2) + (V_1^HQV_2)(V_2^HA_{22}U_2) &= V_1^HRU_2, \\
(V_1^HB_{11}U_1)(U_1^HPU_2) + (V_1^HQV_2)(V_2^HB_{22}U_2) &= V_1^HSU_2.
\end{aligned}
\]
Hence with the substitutions $A_{11} \leftarrow V_1^HA_{11}U_1$, $P \leftarrow U_1^HPU_2$, etc., the problem reduces to one in which the pairs $(A_{ii}, B_{ii})$ $(i = 1, 2)$ are triangular.

We shall now show how to solve the equations
\[
A_{11}P + QA_{22} = R, \qquad B_{11}P + QB_{22} = S
\tag{1.4}
\]
column by column beginning with the first columns of $P$ and $Q$. Suppose that the columns $p_1, p_2, \ldots, p_{k-1}$ and $q_1, q_2, \ldots, q_{k-1}$ have already been computed (n.b., $k$ may be equal to one).
From (1.4) and the upper triangularity of $A_{ii}$ and $B_{ii}$, it follows that the $k$th columns of $P$ and $Q$ must satisfy
\[
\begin{aligned}
A_{11}p_k + \alpha_{kk}^{(22)}q_k &= r_k - \sum_{i=1}^{k-1}\alpha_{ik}^{(22)}q_i, \\
B_{11}p_k + \beta_{kk}^{(22)}q_k &= s_k - \sum_{i=1}^{k-1}\beta_{ik}^{(22)}q_i.
\end{aligned}
\tag{1.5}
\]
Multiply the first equation by $\beta_{kk}^{(22)}$, the second by $\alpha_{kk}^{(22)}$, and subtract to get
\[
\bigl(\beta_{kk}^{(22)}A_{11} - \alpha_{kk}^{(22)}B_{11}\bigr)p_k
= \beta_{kk}^{(22)}\Bigl(r_k - \sum_{i=1}^{k-1}\alpha_{ik}^{(22)}q_i\Bigr)
- \alpha_{kk}^{(22)}\Bigl(s_k - \sum_{i=1}^{k-1}\beta_{ik}^{(22)}q_i\Bigr).
\tag{1.6}
\]
Since $\mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})] = \emptyset$, the matrix $\beta_{kk}^{(22)}A_{11} - \alpha_{kk}^{(22)}B_{11}$ is nonsingular. Hence equation (1.6) may be solved for $p_k$. Since $(A_{22}, B_{22})$ is a regular pair, not both $\alpha_{kk}^{(22)}$ and $\beta_{kk}^{(22)}$ are zero. Hence one of the equations (1.5) may be solved for $q_k$, and it is easily verified that this solution is consistent with the other equation. This completes the computation of $p_k$ and $q_k$ from $p_1, p_2, \ldots, p_{k-1}$ and $q_1, q_2, \ldots, q_{k-1}$.

For the converse, suppose that $\langle\lambda\rangle \in \mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})]$, and assume without loss of generality that $\lambda \ne \infty$ (otherwise reverse the roles of $A$ and $B$ in what follows). Then there are nonzero vectors $x$ and $y$ such that
\[
A_{11}x = \lambda B_{11}x, \qquad y^HA_{22} = \lambda y^HB_{22}.
\]
Let
\[
P = xy^HB_{22}, \qquad Q = -B_{11}xy^H.
\]
Then
\[
T(P, Q) = \bigl[(\lambda - \lambda)B_{11}xy^HB_{22},\ (1 - 1)B_{11}xy^HB_{22}\bigr] = (0, 0),
\]
which shows that $T$ is singular. $\blacksquare$

One consequence of this theorem is that we can reduce any regular pair to any block diagonal form, in which the diagonal block-pairs do not have common eigenvalues. In particular, we have the following corollary.

Corollary 1.12. Let the regular pair $(A, B)$ have distinct eigenvalues. Then there are nonsingular matrices $X$ and $Y$ such that the pair $(Y^HAX, Y^HBX)$ is diagonal. The columns of $X$ are the right eigenvectors of $(A, B)$, and the columns of $Y$ are its left eigenvectors.

A more interesting consequence is WEIERSTRASS'S CANONICAL FORM.

Theorem 1.13 (Weierstrass). Any regular pair is equivalent to a form
\[
[\mathrm{diag}(J, I),\ \mathrm{diag}(I, N)],
\tag{1.7}
\]
where $J$ and $N$ are in Jordan canonical form and $N$ is nilpotent (i.e., has only zero eigenvalues).

Proof. Assume that the eigenvalues of the pair have been ordered so that its triangular form can be partitioned
\[
\left[\begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix},
\begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix}\right],
\]
where the diagonals of $B_{11}$ are nonzero and the diagonals of $B_{22}$ are zero. Since $\mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})] = \emptyset$, we may further reduce the pair to the form
\[
[\mathrm{diag}(A_{11}, A_{22}),\ \mathrm{diag}(B_{11}, B_{22})].
\]
Since the pair is regular and the diagonals of $B_{22}$ are zero, the diagonals of $A_{22}$ are nonzero; i.e., $A_{22}$ is nonsingular. Hence we may further reduce the pair to the form
\[
[\mathrm{diag}(A_{11}B_{11}^{-1}, I),\ \mathrm{diag}(I, B_{22}A_{22}^{-1})].
\]
The reduction is completed by reducing $A_{11}B_{11}^{-1}$ and $B_{22}A_{22}^{-1}$ to their Jordan canonical forms. $\blacksquare$

1.3. Definite Pairs

The natural generalization of the Hermitian eigenvalue problem is to pairs of Hermitian matrices. Unfortunately, the property of being a Hermitian pair is not in itself enough to guarantee that the pair has nice properties, as the following example shows.

Example 1.14. Let
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
Then the pair $(A, B)$ is Hermitian, but the eigenvalues of the pair are clearly $\langle\pm i\rangle$.

Even more pathological cases can occur. For example, any matrix can be written in the form $B^{-1}A$, where $A$ and $B$ are Hermitian. Clearly, we must impose additional conditions if we are to have a workable theory. One possibility, which accounts for many applications, is to require that $B$ be positive definite. This condition is justified by the following theorem.

Theorem 1.15. In the Hermitian pair $(A, B)$ let $B$ be positive definite. Then there is a nonsingular matrix $X$ satisfying $X^HBX = I$ such that
\[
X^HAX = \Lambda,
\]
where $\Lambda$ is real and diagonal.

Proof. Since $B$ is positive definite, it has a positive definite square root $B^{\frac12}$. Then the pair $(A, B)$ is equivalent to the pair $(B^{-\frac12}AB^{-\frac12}, I)$. Let $B^{-\frac12}AB^{-\frac12} = U\Lambda U^H$ be the spectral decomposition of $B^{-\frac12}AB^{-\frac12}$. Then $X = B^{-\frac12}U$ is easily seen to be the required matrix. $\blacksquare$

The construction used in the proof of this theorem can be used to establish a min-max characterization of the eigenvalues of $(A, B)$ in the spirit of Fischer's theorem (Corollary IV.4.7). In fact, the following corollary, whose proof is left as an exercise, is what Fischer originally established.

Corollary 1.16 (Fischer). Let the eigenvalues of $(A, B)$ be ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Then
\[
\lambda_i = \max_{\dim(\mathcal{X}) = i}\ \min_{\substack{x \in \mathcal{X} \\ x \ne 0}} \frac{x^HAx}{x^HBx}
\]
and
\[
\lambda_i = \min_{\dim(\mathcal{X}) = n-i+1}\ \max_{\substack{x \in \mathcal{X} \\ x \ne 0}} \frac{x^HAx}{x^HBx}.
\]

Although the condition that $B$ be positive definite covers many cases occurring in practice, we can make do with an even weaker condition, which includes cases in which neither $A$ nor $B$ is positive definite.

Definition 1.17. The Hermitian pair $(A, B)$ is a DEFINITE PAIR if
\[
\gamma(A, B) \stackrel{\mathrm{def}}{=} \min_{\substack{x \in \mathbf{C}^n \\ \|x\|_2 = 1}} |x^H(A + iB)x|
\equiv \min_{\substack{x \in \mathbf{C}^n \\ \|x\|_2 = 1}} \sqrt{(x^HAx)^2 + (x^HBx)^2} > 0.
\tag{1.8}
\]

The basic fact about definite pairs is that they can be transformed into a pair in which $B$ is positive definite. Specifically, we have the following result.

Theorem 1.18. Let $(A, B)$ be a definite pair, and for $\phi \in \mathbf{R}$ let
\[
\begin{aligned}
A_\phi &= A\cos\phi - B\sin\phi, \\
B_\phi &= A\sin\phi + B\cos\phi.
\end{aligned}
\tag{1.9}
\]
Then there is a $\phi \in [0, 2\pi)$ such that $B_\phi$ is positive definite and
\[
\gamma(A, B) = \lambda_{\min}(B_\phi),
\]
where $\lambda_{\min}(B_\phi)$ is the smallest eigenvalue of $B_\phi$.

Proof. Let $\mathcal{F}$ be the field of values of $A + iB$ (see Definition 3.10). Then $\gamma(A, B) = \min_{h \in \mathcal{F}} |h|$. Let the minimum be attained at the point $h = x_0^H(A + iB)x_0$. Since $\mathcal{F}$ is a bounded, convex set (see Theorem 3.11), it is contained in the half plane $\mathcal{H}$ whose boundary passes perpendicularly through $h$.

Let $\mathcal{F}_\phi$, $\mathcal{H}_\phi$, and $h_\phi$ be the quantities corresponding to the pair $(A_\phi, B_\phi)$. Since $A_\phi + iB_\phi = e^{i\phi}(A + iB)$, these quantities are just the original quantities rotated through the angle $\phi$. Choose $\phi$ so that $\mathcal{H}_\phi$ lies in the upper half plane; i.e., so that $h_\phi$ lies along the imaginary axis. Then $x_0^HA_\phi x_0 = \mathrm{Re}\,h_\phi = 0$. Moreover, since no point of $\mathcal{F}_\phi$ lies below $\mathcal{H}_\phi$, we must have
\[
0 < \gamma(A, B) = x_0^HB_\phi x_0 = \min_{\|x\|_2 = 1} x^HB_\phi x = \lambda_{\min}(B_\phi),
\]
which proves that $B_\phi$ is positive definite. $\blacksquare$

If we now combine Theorems 1.15 and 1.18, we have the following corollary.

Corollary 1.19. Let $(A, B)$ be a definite pair. Then $(A, B)$ is regular. Moreover, there is a nonsingular matrix $X$ such that $X^HAX$ and $X^HBX$ are diagonal.

1.4. Metrics and Their Limitations

A novel feature of the perturbation theory for the generalized eigenvalue problem is that two matrices, $A$ and $B$, vary instead of one. Moreover, when we consider the perturbation of eigenvalues, we must introduce a distance between pairs $\langle\alpha, \beta\rangle$ and $\langle\tilde\alpha, \tilde\beta\rangle$. This subsection is devoted to defining the metrics that will be used in the remainder of the chapter.

We will first consider metrics for eigenvalues. Since we have chosen to regard eigenvalues of matrix pairs as one-dimensional subspaces of $\mathbf{C}^2$, it is natural to use the metrics of Section II.4. Of these metrics, one, the chordal metric, has an especially natural geometric interpretation.

Definition 1.20. The CHORDAL DISTANCE between $\langle\alpha, \beta\rangle$ and $\langle\gamma, \delta\rangle$ is the number
\[
\chi(\langle\alpha, \beta\rangle, \langle\gamma, \delta\rangle) \stackrel{\mathrm{def}}{=} \rho_{g,2}(\langle\alpha, \beta\rangle, \langle\gamma, \delta\rangle),
\]
where $\rho_{g,2}$ is the gap metric in the 2-norm (see Definition II.4.3).

By definition the chordal distance is a metric. It is also easily computed. In terms of $\alpha$, $\beta$, $\gamma$, and $\delta$ it has the form
\[
\chi(\langle\alpha, \beta\rangle, \langle\gamma, \delta\rangle) = \frac{|\alpha\delta - \beta\gamma|}{\sqrt{|\alpha|^2 + |\beta|^2}\,\sqrt{|\gamma|^2 + |\delta|^2}}.
\]
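The explicit formula translates directly into a few lines of code. The following sketch (the function name `chordal` is an illustrative choice) evaluates the chordal distance for pairs given as tuples $(\alpha, \beta)$, and checks two properties implied by the definition: the value does not depend on which representative of the subspace is used, and the infinite eigenvalue $\langle 1, 0\rangle$ is handled like any other.

```python
import math

def chordal(p, q):
    """Chordal distance between <alpha, beta> = p and <gamma, delta> = q.

    Both pairs must be nonzero; entries may be real or complex.
    """
    alpha, beta = p
    gamma, delta = q
    numerator = abs(alpha * delta - beta * gamma)
    return numerator / (math.hypot(abs(alpha), abs(beta))
                        * math.hypot(abs(gamma), abs(delta)))

# Scaling either pair by a nonzero scalar leaves the distance unchanged.
d1 = chordal((2.0, 1.0), (3.0, 4.0))
d2 = chordal((2.0j * 2.0, 2.0j * 1.0), (-3.0, -4.0))

# Distance from the finite eigenvalue <2> = <2, 1> to <infinity> = <1, 0>.
d_inf = chordal((2.0, 1.0), (1.0, 0.0))
print(d1, d2, d_inf)
```

For $\lambda = 2$ the last value equals $1/\sqrt{1 + |\lambda|^2} = 1/\sqrt5$, so the point at infinity sits at a finite distance from every finite eigenvalue.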
If we set $\lambda = \alpha/\beta$ and $\mu = \gamma/\delta$, then we have
\[
\chi(\langle\lambda\rangle, \langle\mu\rangle) = \frac{|\lambda - \mu|}{\sqrt{1 + |\lambda|^2}\,\sqrt{1 + |\mu|^2}}.
\]
From the latter form it is seen that
\[
\chi(\langle\lambda\rangle, \langle\infty\rangle) = \frac{1}{\sqrt{1 + |\lambda|^2}} \le 1.
\]
Thus, the chordal metric regularizes the point at infinity by making it no more than unit distance from any other point.

The name "chordal metric" comes from the following considerations. In $\mathbf{R}^3$ let the $x$-$y$ plane represent the complex numbers. For any complex number $\lambda$ draw a line between $\lambda$ and the point $(0, 0, 1)$ and let $s(\lambda)$ be the intersection, other than $(0, 0, 1)$, of the line with the unit sphere centered at the origin (the Riemann sphere). Then it can be shown that
\[
\chi(\lambda, \mu) = \tfrac12\|s(\lambda) - s(\mu)\|_2.
\tag{1.10}
\]
In other words, the chordal distance between $\lambda$ and $\mu$ is one half the length of the chord joining the projections of $\lambda$ and $\mu$ onto the Riemann sphere.

For numbers less than one in magnitude, the chordal metric behaves essentially like the ordinary Euclidean metric. In particular
\[
|\lambda|, |\mu| \le 1 \implies \chi(\langle\lambda\rangle, \langle\mu\rangle) \le |\lambda - \mu| \le 2\chi(\langle\lambda\rangle, \langle\mu\rangle).
\]
Moreover, as $\lambda, \mu \to 0$, we have
\[
\chi(\langle\lambda\rangle, \langle\mu\rangle) \cong |\mu - \lambda|.
\]
On the other hand, for large numbers the chordal metric behaves counter-intuitively. For example, as $\lambda \to \infty$ we have
\[
\chi(\langle\lambda\rangle, \langle 2\lambda\rangle) \cong \frac{1}{2|\lambda|}.
\]
Thus large numbers can have very small chordal differences, even when they have large relative errors.

Let us now consider metrics for matrix pairs. Let $(A, B)$ be a matrix pair and let $(\tilde A, \tilde B) = (A + E, B + F)$. A natural way to define the distance between the pairs is to apply a norm to the difference $(\tilde A\ \tilde B) - (A\ B)$, i.e., to the matrix $(E\ F)$. For example we might say that the distance between $(A, B)$ and $(\tilde A, \tilde B)$ is $\|(E\ F)\|_2$ or $\sqrt{\|E\|^2 + \|F\|^2}$ or, depending on the application, some other combination. In many respects this is the most natural approach; however, it has an important drawback.
Since the generalized eigenvalue problem is homogeneous in $A$ and $B$, we do not feel there is any substantial difference between the pair $(A, B)$ and any nonzero multiple $(\tau A, \tau B)$. But with the above approach, these two pairs have positive distance, unless $\tau = 1$. Consequently, we will also use other, less discriminating metrics.

Let us start with definite pairs. We will take the same approach as we did with eigenvalues and define our metric over equivalence classes of pairs, which in some sense represent the same problem.

Definition 1.21. Let $(A, B)$ be a definite pair. Then $\langle A, B\rangle_D$ ($D$ for definite) is the set of pairs $(C, D)$ such that there exists a real multiplier $\mu$ for which one of the following conditions holds:

1. $C = \mu A$ and $D = \mu B$ $(\mu \ne 0)$,
2. $A = \mu B$ and $C = \mu D$,
3. $B = \mu A$ and $D = \mu C$.

The first case in the above definition corresponds to the case where the pair $(C, D)$ is proportional to the pair $(A, B)$, and hence the pairs have the same eigenvalues. In the second case both pairs have the single eigenvalue $\langle\mu\rangle$, and in the third they have the eigenvalue $\langle\mu^{-1}\rangle$. It is easily verified that the operator $\langle\cdot, \cdot\rangle_D$ divides the set of definite pairs into equivalence classes.

We now define
\[
\rho(\langle A, B\rangle_D, \langle C, D\rangle_D) \stackrel{\mathrm{def}}{=} \max_{\|x\|_2 = 1} \chi(\langle x^HAx, x^HBx\rangle, \langle x^HCx, x^HDx\rangle).
\tag{1.11}
\]

Theorem 1.22. The function $\rho$ defined by (1.11) is a metric on the space of definite pairs $\langle A, B\rangle_D$.
Proof. The only property of a metric that is difficult to verify is that
\[
\rho(\langle A, B\rangle_D, \langle C, D\rangle_D) = 0 \implies \langle A, B\rangle_D = \langle C, D\rangle_D.
\]
We will establish this implication, leaving the other properties as an exercise.

Suppose that $\rho(\langle A, B\rangle_D, \langle C, D\rangle_D) = 0$. Then for all $x$
\[
x^HAx\,x^HDx = x^HBx\,x^HCx.
\tag{1.12}
\]
Let us first dispose of cases two and three in Definition 1.21. Suppose that $A = \mu B$. Since $(A, B)$ is definite, it follows that $x^HBx \ne 0$ for all $x \ne 0$. Hence from (1.12), we have $x^HCx = \mu x^HDx$ for all $x$. Equivalently $C = \mu D$, which shows that $\langle A, B\rangle_D = \langle C, D\rangle_D$. The third case is treated similarly.

Turning to the first case, it is easy to see that the first relation in Definition 1.21 and the equation (1.12) remain invariant under substitutions of the form
\[
\begin{aligned}
A &\leftarrow A\cos\phi - B\sin\phi, & B &\leftarrow A\sin\phi + B\cos\phi, \\
C &\leftarrow C\cos\phi - D\sin\phi, & D &\leftarrow C\sin\phi + D\cos\phi.
\end{aligned}
\]
Hence by Theorem 1.18, we may assume $B$ is positive definite. The same relations are also invariant under congruence transformations. Hence by Theorem 1.15 we may assume that $B = I$ and that $A = \mathrm{diag}(\alpha_1I, \ldots, \alpha_mI)$, where the $\alpha_i$ are distinct numbers. Since $A$ is not a multiple of $B$, we must have $m > 1$.

The general proof is sufficiently well illustrated by the case $m = 2$. Let
\[
C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}, \qquad
D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}
\]
be conformal partitions of $C$ and $D$. Consider the vector
\[
x = (x_1^H\ \ \tau x_2^H)^H,
\]
where $\tau$ is real. From (1.12)
\[
\begin{aligned}
&(\alpha_1x_1^Hx_1 + \alpha_2x_2^Hx_2\tau^2)\bigl[x_1^HD_{11}x_1 + (x_1^HD_{12}x_2 + x_2^HD_{21}x_1)\tau + x_2^HD_{22}x_2\tau^2\bigr] \\
&\qquad = (x_1^Hx_1 + x_2^Hx_2\tau^2)\bigl[x_1^HC_{11}x_1 + (x_1^HC_{12}x_2 + x_2^HC_{21}x_1)\tau + x_2^HC_{22}x_2\tau^2\bigr].
\end{aligned}
\]
Equating powers of $\tau$, we get
\[
\alpha_jx_j^HD_{jj}x_j = x_j^HC_{jj}x_j, \qquad j = 1, 2,
\tag{1.13}
\]
\[
x_1^Hx_1\,x_2^H(\alpha_1D_{22} - C_{22})x_2 = x_2^Hx_2\,x_1^H(C_{11} - \alpha_2D_{11})x_1,
\tag{1.14}
\]
and
\[
\alpha_j(x_1^HD_{12}x_2 + x_2^HD_{21}x_1) = x_1^HC_{12}x_2 + x_2^HC_{21}x_1, \qquad j = 1, 2.
\tag{1.15}
\]
From (1.13)
\[
C_{11} = \alpha_1D_{11}, \qquad C_{22} = \alpha_2D_{22}.
\]
Hence from (1.14) we obtain
\[
\frac{x_2^HD_{22}x_2}{x_2^Hx_2} = \frac{x_1^HD_{11}x_1}{x_1^Hx_1} = \mu
\]
for some real $\mu$. Since this relation holds generally in $x_1$ and $x_2$, we must have
\[
D_{11} = \mu I, \qquad D_{22} = \mu I.
\]
Since $\alpha_1 \ne \alpha_2$, we have from (1.15),
\[
x_1^HD_{12}x_2 + x_2^HD_{21}x_1 = 0.
\tag{1.16}
\]
Taking first $x_2 = D_{21}x_1$ and then $x_1 = D_{12}x_2$ in (1.16), we obtain
\[
D_{12} = 0, \qquad D_{21} = 0.
\]
Hence
\[
C = \mu A \quad\text{and}\quad D = \mu I = \mu B. \qquad\blacksquare
\]
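The behavior of the function (1.11) under scaling can be explored numerically. The sketch below is only a rough illustration under several assumptions: the matrices are real symmetric $2\times 2$, the maximum is approximated by sampling real unit vectors $x = (\cos t, \sin t)^T$ (so in general the result is a lower bound on $\rho$), and the function names are illustrative. It compares the definite pair $(A, I)$, $A = \mathrm{diag}(1, -1)$, with the proportional pair $(5A, 5I)$ and with the rescaled pair $(2A, I)$.

```python
import math

def quad(m, x):
    """Quadratic form x^T M x for a real 2x2 matrix M."""
    return sum(x[i] * m[i][j] * x[j] for i in range(2) for j in range(2))

def chordal(p, q):
    """Chordal distance between the real pairs p = (a, b) and q = (c, d)."""
    (a, b), (c, d) = p, q
    return abs(a * d - b * c) / (math.hypot(a, b) * math.hypot(c, d))

def rho_d_lower_bound(a, b, c, d, samples=720):
    """Sampled lower bound on the metric (1.11) using real unit vectors only."""
    best = 0.0
    for k in range(samples):
        t = math.pi * k / samples
        x = (math.cos(t), math.sin(t))
        best = max(best, chordal((quad(a, x), quad(b, x)),
                                 (quad(c, x), quad(d, x))))
    return best

def scale(m, s):
    return [[s * entry for entry in row] for row in m]

a = [[1.0, 0.0], [0.0, -1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

same_class = rho_d_lower_bound(a, identity, scale(a, 5.0), scale(identity, 5.0))
rescaled_a = rho_d_lower_bound(a, identity, scale(a, 2.0), identity)
print(same_class, rescaled_a)
```

The proportional pair is at distance zero (up to rounding), as the first case of Definition 1.21 requires, while rescaling $A$ alone produces a pair at a clearly positive distance.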
The notation $\rho(\langle A, B\rangle_D, \langle C, D\rangle_D)$ is a little clumsy, and in the sequel we will write $\rho_D[(A, B), (C, D)]$. However, the reader should keep in mind that the function $\rho_D$ so defined is a pseudo-metric, not a metric.

Turning now to general, regular matrix pairs, we will say that the pair $(A, B)$ is LEFT EQUIVALENT to $(C, D)$ if there is a nonsingular matrix $Y$ such that $Y^H(A, B) = (C, D)$. This is clearly an equivalence relation and we shall denote the equivalence classes by $\langle A, B\rangle_L$.

We wish to define a metric over the equivalence classes $\langle A, B\rangle_L$. The key observation is that $(A, B)$ is left equivalent to $(C, D)$ if and only if the row spaces of the matrices $(A\ B)$ and $(C\ D)$ are the same. Consequently we can define a metric by using any one of the metrics in Section II.4. For definiteness we will choose the gap metric in the 2-norm, or equivalently we will make the following definition.

Definition 1.23. Let $(A, B)$ and $(C, D)$ be regular matrix pairs. Then
\[
\rho(\langle A, B\rangle_L, \langle C, D\rangle_L) = \sin\theta_1,
\]
where $\theta_1$ is the largest canonical angle between the row spaces of $(A\ B)$ and $(C\ D)$.

Again, we will usually write $\rho_L[(A, B), (C, D)]$ for $\rho(\langle A, B\rangle_L, \langle C, D\rangle_L)$. Note that there is a natural function $\rho_R$ obtained by considering the pair $(A^H, B^H)$.

Let us now step back from the technical details and take a broader view. The justification for the metrics we have introduced is convenience and elegance. The metrics are convenient because they regularize singular cases. For example, all eigenvalues, finite and infinite, are treated uniformly. The elegance can only be judged by the results, but the reader is invited to look ahead to the statement of Theorem 3.2.

On the other hand, convenience and elegance exact a toll. In Chapter III we discussed the loss of information entailed in using norms, and the same caveats apply here. But there is more.

In the next section we will show that if $x$ is an eigenvector of the definite pair $(A, B)$ corresponding to the simple eigenvalue $\langle x^HAx, x^HBx\rangle$, then $\langle x^H\tilde Ax, x^H\tilde Bx\rangle$ is a first order approximation of the eigenvalue of the perturbed pair $(\tilde A, \tilde B)$. This implies that up to first order terms, the errors in, say, $A$ do not cross over to affect the second component $\beta$ of the eigenvalue. But the metric $\rho_D$ confounds errors in $A$ and $B$, while the chordal metric confounds the resulting perturbations in $\alpha$ and $\beta$.

Moreover, the resulting bounds are not scale invariant. Replacing the pair $(A, B)$ with $(\tau A, B)$ will give essentially different bounds, bounds that change nonlinearly with $\tau$. For example, our theorems will not reduce to the usual perturbation theorems when $B = I$; to recover them we must let $\tau \to 0$.

However, the situation is not entirely bleak. For nicely scaled problems the perturbation bounds we derive may be quite satisfactory. Moreover, some of our theorems will give bounds on perturbations of the components of $\langle\alpha, \beta\rangle$, bounds that the analyst may use in any way he sees fit.

When it comes to general matrix pairs, the situation is even less satisfactory. The metric defined above has all the drawbacks discussed above. Moreover, it is asymmetric; we obtain essentially different theorems for the pairs $(A, B)$ and $(A^H, B^H)$. However, when $A$ and $B$ are Hermitian this last objection does not apply.

Notes and References

Matrix pairs arise naturally in the study of systems of ordinary differential equations of the form
\[
A\frac{dx}{dt} = Bx,
\]
where the simultaneous diagonalization of $A$ and $B$ by an equivalence represents a transformation which uncouples the system. Generalizations to higher order systems lead to $\lambda$-MATRICES of the form $A_0 + A_1\lambda + \cdots + A_k\lambda^k$, for treatments of which see [81, 86].

Weierstrass [263, 1867] established his canonical form (Theorem 1.13) by working with a pair of bilinear forms, as was customary at the time.
Jordan [125, 1874] gave another proof, which included singular pencils. Later Kro- necker [139, 1890] extended these results to rectangular pencils (for details see [81]). For modern computational treatments see [60, 272]. The generalized Schur form of Theorem 1.9 is due to Stewart [201, 197 2 ], as is the condition for the generalized Sylvester equations to be nonsingular (Theorem 1.11). 
Definite pairs in which one or the other of the components is positive definite constitute the majority of applications. It is not generally appreciated that Fischer [74, 1905] proved his min-max characterization (Corollary 1.16) for such pairs, not simply for eigenvalues of Hermitian matrices.

Theorem 1.18 characterizing definite pairs is one of a number of interrelated theorems, whose history and interconnections have been admirably surveyed by Uhlig [243]. The particular theorem given here is due to Crawford [48, 1976]. It should be noted that in the definition (1.8) of $\gamma(A,B)$ the minimum is taken over all vectors $x\in\mathbb{C}^n$. It might be hoped that for symmetric pairs one could let $x$ range over $\mathbb{R}^n$. Although one can when $n > 2$, one cannot when $n = 2$, as the pair of Example 1.14 shows.

The chordal metric was first used in the perturbation theory for matrix pairs by Stewart [204, 1975]. The metric $\rho_D$ was introduced by Sun [222, 1982], as were the metrics $\rho_L$ and $\rho_R$ [220, 1979] (see also [67]).

Exercises

1. Show that if $A$ and $B$ are nonsingular and $\langle\lambda\rangle$ is an eigenvalue of $(A,B)$ then $\langle\lambda^{-1}\rangle$ is an eigenvalue of $(A^{-1},B^{-1})$.

2. Let $A = I$ and B=(I;£ 1£), where $\epsilon$ is small. Then the pair $(A,B)$ has eigenvalues $\lambda_1\cong\langle 1, 1+\tfrac12\epsilon^2\rangle$ and $\lambda_2\cong\langle 1, -\tfrac12\epsilon^2\rangle$. Show that $\lambda_1$ is insensitive to perturbations in $B$, but is an ill-conditioned eigenvalue of $B^{-1}A$.

3. (Moler and Stewart [161]). Let $(A,B)$ be a real, regular matrix pair. Show that $(A,B)$ is orthogonally equivalent to $(S,T)$, where $T$ is upper triangular and $S$ is block upper triangular with $1\times 1$ or $2\times 2$ blocks on its diagonal.

4. Show that the Weierstrass canonical form (1.7) is essentially unique.

5. Let $A$ and $B$ be positive definite. Show that if $x^HAx\le x^HBx$ for all $x\ne 0$ then $x^HA^{-1}x\ge x^HB^{-1}x$.

6. Verify that the chordal metric may be defined by (1.10).

7. Show that
\[ |\lambda|,|\mu|\le 1 \implies \chi(\langle\lambda\rangle,\langle\mu\rangle)\le|\lambda-\mu|\le 2\chi(\langle\lambda\rangle,\langle\mu\rangle). \]

8. Let $(A,B)$ be as in Example 1.14. Show that
\[ \min_{x\in\mathbb{R}^n,\ \|x\|_2=1}\sqrt{(x^TAx)^2+(x^TBx)^2} = 1, \]
even though $(A,B)$ is not definite.

2. Regular Matrix Pairs

In this section we will treat the perturbation of the eigenvalues of regular matrix pairs. We begin with first order perturbation theory, which exhibits the typical behavior of a simple generalized eigenvalue. We then turn to a generalization of the Gerschgorin theorem, which is the most useful tool for bounding perturbations of generalized eigenvalues. Next comes a generalization of Theorem IV.3.3, which bounds the spectral variation in terms of the condition of the eigenvalues. Finally, we develop the perturbation theory of eigenspaces.

Throughout this section, $(A,B)$ will be a regular matrix pair of order $n$ and $(\tilde A,\tilde B) = (A+E, B+F)$ will be a perturbation of $(A,B)$.

2.1. Continuity, First Order Theory

The first thing that must be established is the continuity of the eigenvalues of matrix pairs. We will use Theorem 1.6 to reduce the continuity of the generalized eigenvalues to that of the ordinary eigenvalue problem. Here it is not critical how we measure the size of the perturbation in $(A,B)$, and to fix on a single measure we will set
\[ \epsilon = \sqrt{\|E\|^2+\|F\|^2}. \]

Theorem 2.1. Let $(A,B)$ be a regular pair, and let its eigenvalues be $\langle\lambda_1\rangle,\ldots,\langle\lambda_n\rangle$. Then there is an ordering $\langle\tilde\lambda_1\rangle,\ldots,\langle\tilde\lambda_n\rangle$ of the eigenvalues of $(\tilde A,\tilde B)$ such that
\[ \lim_{\epsilon\to 0}\chi(\langle\tilde\lambda_i\rangle,\langle\lambda_i\rangle) = 0,\qquad i = 1,\ldots,n. \]
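The chordal metric $\chi$ in which Theorem 2.1 is stated can be made concrete with a few lines of code. The sketch below assumes the usual form $\chi(\langle\alpha,\beta\rangle,\langle\gamma,\delta\rangle) = |\alpha\delta-\beta\gamma| / (\sqrt{|\alpha|^2+|\beta|^2}\sqrt{|\gamma|^2+|\delta|^2})$, which is the form used in the formulas of this section; the function name `chordal` is chosen here for illustration. It shows that the distance ignores the scaling of a pair and treats the infinite eigenvalue $\langle 1,0\rangle$ like any other point.

```python
import math

def chordal(p, q):
    """Chordal distance between two eigenvalues given as pairs (alpha, beta)."""
    (a, b), (c, d) = p, q
    num = abs(a * d - b * c)
    den = math.hypot(abs(a), abs(b)) * math.hypot(abs(c), abs(d))
    return num / den

# A pair and a nonzero multiple of it represent the same eigenvalue.
print(chordal((2, 1), (4, 2)))        # 0.0

# The infinite eigenvalue <1, 0> is at distance 1 from the zero eigenvalue <0, 1>.
print(chordal((1, 0), (0, 1)))        # 1.0

# A very large finite eigenvalue is close to the infinite one.
print(chordal((1e8, 1), (1, 0)))
```

For $|\lambda|,|\mu|\le 1$ the value returned for `(lam, 1)` and `(mu, 1)` lies between $|\lambda-\mu|/2$ and $|\lambda-\mu|$, in agreement with Exercise 7 of the preceding section.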
Proof. By Theorem 1.6 there is a $2\times 2$ matrix $W$ such that the matrix $D$ in the pair $(C\ D) = (A\ B)(W\otimes I)$ is nonsingular. Let $\mu_1,\ldots,\mu_n$ be the eigenvalues of $D^{-1}C$. Let $(\tilde C\ \tilde D) = (\tilde A\ \tilde B)(W\otimes I)$. For $\epsilon$ sufficiently small, $\tilde D$ is nonsingular. By the continuity of the ordinary eigenvalue problem, we know there is an ordering of the eigenvalues $\tilde\mu_1,\ldots,\tilde\mu_n$ of $\tilde D^{-1}\tilde C$ such that $\lim_{\epsilon\to 0}\tilde\mu_i = \mu_i$. Let $(\beta_i\ \ {-\alpha_i}) = (1\ \ {-\mu_i})W^T$. Then by Theorem 1.6, $\langle\lambda_i\rangle = \langle\alpha_i,\beta_i\rangle$ is an eigenvalue of $(A,B)$. If we set $(\tilde\beta_i\ \ {-\tilde\alpha_i}) = (1\ \ {-\tilde\mu_i})W^T$ and $\langle\tilde\lambda_i\rangle = \langle\tilde\alpha_i,\tilde\beta_i\rangle$, then $\langle\tilde\lambda_i\rangle$ is an eigenvalue of $(\tilde A,\tilde B)$. Since $(\tilde\alpha_i,\tilde\beta_i)$ converges to $(\alpha_i,\beta_i)$, it follows that $\langle\tilde\lambda_i\rangle$ converges to $\langle\lambda_i\rangle$ in the chordal metric. ∎

Throughout this book, we have presented first order perturbation expansions whenever they exist. Although these expansions are usually corollaries of more general results, in many cases the general results themselves were conjectured by looking at first order expansions. The reason is that the expansions often tell ninety percent of the story and yet are free of the clutter that accompanies rigorous upper bounds. Since research into the perturbation of matrix pairs is still in a state of flux, it is appropriate to begin with first order perturbation theory.

Let $\langle\alpha,\beta\rangle$ be a simple eigenvalue of $(A,B)$. In order to derive a first order perturbation expansion, we must first show that one exists. In one sense this is trivial. By Theorem 1.6, we may assume that $B$ is nonsingular. Hence for $\epsilon$ sufficiently small $\tilde B$ is nonsingular, and to the eigenvalue $\lambda = \alpha/\beta$ of $B^{-1}A$ there corresponds an eigenvalue $\tilde\lambda = \lambda + O(\epsilon)$ of $\tilde B^{-1}\tilde A$, which is differentiable in the elements of $\tilde A$ and $\tilde B$. It follows that $\langle\tilde\lambda\rangle$ is the required expansion.

However, when we look at the individual components of $\langle\alpha,\beta\rangle$ we find that their perturbations are not unique.
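For if $(\tilde\alpha,\tilde\beta)$ is an $O(\epsilon)$ perturbation of $(\alpha,\beta)$ in the chordal metric and if $\varphi(\epsilon) = O(\epsilon)$, then $(\tilde\alpha+\alpha\varphi(\epsilon),\tilde\beta+\beta\varphi(\epsilon))$ differs from $(\tilde\alpha,\tilde\beta)$ by $O(\epsilon^2)$. This follows directly from the formula
\[ \chi\bigl(\langle\tilde\alpha,\tilde\beta\rangle,\langle\tilde\alpha+\alpha\varphi(\epsilon),\tilde\beta+\beta\varphi(\epsilon)\rangle\bigr) = \frac{|\varphi(\epsilon)(\tilde\alpha\beta-\tilde\beta\alpha)|}{\sqrt{|\tilde\alpha|^2+|\tilde\beta|^2}\,\sqrt{|\tilde\alpha+\alpha\varphi(\epsilon)|^2+|\tilde\beta+\beta\varphi(\epsilon)|^2}}, \]
in which the numerator is $O(\epsilon^2)$. Fortunately, Corollary 1.10 provides a canonical choice for $(\tilde\alpha,\tilde\beta)$.

Theorem 2.2. Let $\langle\alpha,\beta\rangle$ be a simple eigenvalue of the regular pair $(A,B)$ with right and left eigenvectors $x$ and $y$. Let $\langle\tilde\alpha,\tilde\beta\rangle$ be the corresponding eigenvalue of the $O(\epsilon)$ perturbation $(\tilde A,\tilde B)$. Then
\[ \langle\tilde\alpha,\tilde\beta\rangle = \langle y^H\tilde Ax, y^H\tilde Bx\rangle + O(\epsilon^2). \tag{2.1} \]

Proof. Applying the perturbation theory for the ordinary eigenvalue problem first to $B^{-1}A$ and then to $AB^{-1}$ (after a transformation, if necessary, to make $B$ nonsingular), we find that we may take for the eigenvectors corresponding to $\langle\tilde\alpha,\tilde\beta\rangle$ the vectors $\tilde x = x+u$ and $\tilde y = y+v$, where $u, v = O(\epsilon)$. By Corollary 1.10,
\[ \langle\tilde\alpha,\tilde\beta\rangle = \langle\tilde y^H\tilde A\tilde x,\tilde y^H\tilde B\tilde x\rangle = \langle y^H\tilde Ax + y^HAu + v^HAx + O(\epsilon^2),\ y^H\tilde Bx + y^HBu + v^HBx + O(\epsilon^2)\rangle. \]
Since $(A,B)$ is regular, at least one of $\alpha$ or $\beta$ must be nonzero, say $\beta\ne 0$. Then
\[ y^HAu + v^HAx = \alpha\,\frac{y^HBu+v^HBx}{\beta} \quad\text{and}\quad y^HBu + v^HBx = \beta\,\frac{y^HBu+v^HBx}{\beta}. \]
Thus $(y^HAu+v^HAx,\ y^HBu+v^HBx)$ is an order $\epsilon$ perturbation of $(y^H\tilde Ax, y^H\tilde Bx)$ that lies along $(\alpha,\beta)$. By the observation made just before the theorem, deleting these terms introduces an $O(\epsilon^2)$ error. ∎

From this theorem we may derive approximate error bounds for the perturbation of a simple eigenvalue. Specifically, if we set $\alpha = y^HAx$ and $\beta = y^HBx$, then
\[ \chi(\langle\alpha,\beta\rangle,\langle\tilde\alpha,\tilde\beta\rangle) \cong \frac{|\alpha y^HFx-\beta y^HEx|}{\sqrt{|\alpha|^2+|\beta|^2}\,\sqrt{|\alpha+y^HEx|^2+|\beta+y^HFx|^2}} \cong \frac{|\alpha y^HFx-\beta y^HEx|}{|\alpha|^2+|\beta|^2}. \]
To turn this approximation into a bound, note that
\[ |\alpha y^HFx-\beta y^HEx| = \left|y^H(E\ F)\begin{pmatrix}-\beta x\\ \alpha x\end{pmatrix}\right| \le \sqrt{|\alpha|^2+|\beta|^2}\,\|x\|_2\|y\|_2\|(E\ F)\|_2. \]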
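The second-order accuracy claimed in (2.1) is easy to observe numerically. The following sketch is only an illustration under stated assumptions: the $2\times 2$ test pair $A = \mathrm{diag}(1,3)$, $B = I$, the perturbation directions `E0` and `F0`, and the helper names are all made up for the example, and the exact eigenvalues are obtained from the quadratic $\det(\tilde A-\lambda\tilde B) = 0$, which requires $\tilde B$ to be nonsingular. With $x = y = e_1$ the approximation (2.1) is just the pair of $(1,1)$ entries of $\tilde A$ and $\tilde B$.

```python
import cmath
import math

def chordal(p, q):
    """Chordal distance between eigenvalues given as pairs (alpha, beta)."""
    (a, b), (c, d) = p, q
    return abs(a * d - b * c) / (math.hypot(abs(a), abs(b)) * math.hypot(abs(c), abs(d)))

def eig_2x2(A, B):
    """Eigenvalues (lam, 1) of a 2x2 pair with nonsingular B, from det(A - lam*B) = 0."""
    c2 = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    c1 = -(A[0][0] * B[1][1] + A[1][1] * B[0][0]) + (A[0][1] * B[1][0] + A[1][0] * B[0][1])
    c0 = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return [((-c1 + root) / (2 * c2), 1.0), ((-c1 - root) / (2 * c2), 1.0)]

def first_order_error(eps):
    """Chordal distance between the exact perturbed eigenvalue and the estimate (2.1)."""
    A = [[1.0, 0.0], [0.0, 3.0]]      # hypothetical test pair: simple eigenvalue <1, 1>
    B = [[1.0, 0.0], [0.0, 1.0]]
    E0 = [[0.3, -0.2], [0.5, 0.1]]    # hypothetical perturbation directions
    F0 = [[-0.4, 0.6], [0.2, 0.7]]
    At = [[A[i][j] + eps * E0[i][j] for j in range(2)] for i in range(2)]
    Bt = [[B[i][j] + eps * F0[i][j] for j in range(2)] for i in range(2)]
    approx = (At[0][0], Bt[0][0])     # (y^H At x, y^H Bt x) with x = y = e_1
    exact = min(eig_2x2(At, Bt), key=lambda lam: chordal(lam, (1.0, 1.0)))
    return chordal(exact, approx)

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, first_order_error(eps))
```

Reducing `eps` by a factor of ten should reduce the printed error by roughly a factor of one hundred, which is the $O(\epsilon^2)$ behavior asserted by Theorem 2.2.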
Hence if we set
\[ \nu = \frac{\|x\|_2\|y\|_2}{\sqrt{|\alpha|^2+|\beta|^2}}, \tag{2.2} \]
then
\[ \chi(\langle\alpha,\beta\rangle,\langle\tilde\alpha,\tilde\beta\rangle) \lesssim \nu\|(E\ F)\|_2. \tag{2.3} \]

The number $\nu$ defined by (2.2) is completely analogous to the number $\nu$ defined by (IV.2.8) for the ordinary eigenvalue problem; that is, it serves the role of a condition number for its eigenvalue. Unfortunately we cannot obtain the usual bound for the ordinary eigenvalue problem $Ax = \lambda x$ by replacing $B$ by $I$ and $F$ by $0$. However, if we replace $A$ by $\tau A$, $\tilde A$ by $\tau\tilde A$, and $E$ by $\tau E$, then the bound (2.3) becomes
\[ \chi(\langle\tau\lambda\rangle,\langle\tau\tilde\lambda\rangle) \lesssim \nu_\tau\|\tau E\|_2, \]
where
\[ \nu_\tau = \frac{\|x\|_2\|y\|_2}{\sqrt{|\tau y^HAx|^2+|y^Hx|^2}}. \]
But as $\tau\to 0+$, we have $\chi(\langle\tau\lambda\rangle,\langle\tau\tilde\lambda\rangle)\cong\tau|\tilde\lambda-\lambda|$. Moreover, the condition number $\nu_\tau$ approaches $\|x\|_2\|y\|_2/|y^Hx|$. Hence we have
\[ |\tilde\lambda-\lambda| \lesssim \frac{\|x\|_2\|y\|_2}{|y^Hx|}\|E\|_2, \]
which is the usual bound.

The approximate bound (2.3) is about as good as we will see for the generalized eigenvalue problem. However, the trouble we had to take to retrieve the Rayleigh quotient bound is a reminder that it suffers from the limitations noted at the end of the last section. When in doubt, one should return to explicit forms, like the approximation (2.1) provided by Theorem 2.2.

2.2. Gerschgorin Theory

In this subsection we will generalize Gerschgorin's theorem and apply it to the perturbation of multiple eigenvalues. As in Section IV.2, we approach the theorem through a generalization of the Bauer-Fike theorem.

Theorem 2.3. Let $(A,B)$ and $(\tilde A,\tilde B)$ be regular pairs, and let $\|\cdot\|$ be a consistent matrix norm. If $\langle\tilde\alpha,\tilde\beta\rangle\in\mathcal{L}[(\tilde A,\tilde B)]$ is not an eigenvalue of $(A,B)$, then
\[ \|(\tilde\beta A-\tilde\alpha B)^{-1}(\tilde\beta E-\tilde\alpha F)\| \ge 1. \tag{2.4} \]

Proof. Since $\langle\tilde\alpha,\tilde\beta\rangle\notin\mathcal{L}[(A,B)]$, the matrix $\tilde\beta A-\tilde\alpha B$ is nonsingular. Let $\tilde x$ be an eigenvector of $(\tilde A,\tilde B)$ corresponding to $\langle\tilde\alpha,\tilde\beta\rangle$. Since
\[ 0 = (\tilde\beta\tilde A-\tilde\alpha\tilde B)\tilde x = (\tilde\beta A-\tilde\alpha B)\tilde x + (\tilde\beta E-\tilde\alpha F)\tilde x, \]
we have
\[ (\tilde\beta A-\tilde\alpha B)^{-1}(\tilde\beta E-\tilde\alpha F)\tilde x = -\tilde x, \]
from which the theorem follows on taking norms. ∎

We may now state and prove the generalization of the Gerschgorin theorem.

Theorem 2.4. Let $(A,B)$ be a regular pair. Let
\[ \mathcal{D}_i = \Bigl\{\langle\alpha,\beta\rangle : |\beta\alpha_{ii}-\alpha\beta_{ii}| \le \sum_{j\ne i}|\beta\alpha_{ij}-\alpha\beta_{ij}|\Bigr\},\qquad i = 1,\ldots,n. \]
Then
\[ \mathcal{L}[(A,B)] \subset \bigcup_{i=1}^n\mathcal{D}_i. \]
Moreover, if the union of $k$ of the regions $\mathcal{D}_i$ is disjoint from the others and is not equal to the space of all $\langle\alpha,\beta\rangle$, then the union contains exactly $k$ eigenvalues of $(A,B)$.

Proof. Let $D_A = \mathrm{diag}(\alpha_{11},\ldots,\alpha_{nn})$ and $D_B = \mathrm{diag}(\beta_{11},\ldots,\beta_{nn})$. In Theorem 2.3 make the substitutions
\[ A\leftarrow D_A,\quad \tilde A\leftarrow A,\quad B\leftarrow D_B,\quad \tilde B\leftarrow B,\quad\text{and}\quad \|\cdot\|\leftarrow\|\cdot\|_\infty. \]
Then it is easily verified that the inequality (2.4) is equivalent to saying that each eigenvalue of $(A,B)$ is in some $\mathcal{D}_i$.

The statement about isolated disks follows from the continuity of the eigenvalues as in the ordinary Gerschgorin theorem; namely, if we introduce the pairs
\[ (A_\tau,B_\tau) = [D_A+\tau(A-D_A),\ D_B+\tau(B-D_B)], \]
then the corresponding regions $\mathcal{D}_i^{(\tau)}$ increase with $\tau$. The only tricky point is to insure that the pair $(A_\tau,B_\tau)$ is regular for $0\le\tau\le 1$. We argue as follows. Assume without loss of generality that the disjoint disks are $\mathcal{D}_1^{(\tau)},\ldots,\mathcal{D}_k^{(\tau)}$. Then $\bigcup_{i=1}^k\mathcal{D}_i^{(\tau)}$ and $\bigcup_{i=k+1}^n\mathcal{D}_i^{(\tau)}$ are disjoint closed sets. Since the space of all $\langle\alpha,\beta\rangle$ is connected, there must be a point
\[ \langle\alpha,\beta\rangle \notin \bigcup_{i=1}^k\mathcal{D}_i^{(\tau)}\cup\bigcup_{i=k+1}^n\mathcal{D}_i^{(\tau)}. \]
Then $\langle\alpha,\beta\rangle$ is not an eigenvalue of $(A_\tau,B_\tau)$, which is therefore regular. ∎

The first comment to be made about this theorem is that it becomes uninteresting when some $(\alpha_{ii},\beta_{ii}) = (0,0)$, since in this case $\mathcal{D}_i$ includes all pairs $\langle\alpha,\beta\rangle$. In the sequel we will tacitly exclude this case (see Exercise 2.1).

The regions $\mathcal{D}_i$ are difficult to compute, since $\langle\alpha,\beta\rangle$ appears on both sides of the bound. However, by expanding the regions, we may remove this dependence.

Corollary 2.5. Let
\[ a_i = (\alpha_{i1},\ldots,\alpha_{i,i-1},0,\alpha_{i,i+1},\ldots,\alpha_{in})^T \]
and
\[ b_i = (\beta_{i1},\ldots,\beta_{i,i-1},0,\beta_{i,i+1},\ldots,\beta_{in})^T \]
be the rows of $A-D_A$ and $B-D_B$. Let
\[ \rho_i = \sqrt{\frac{\|a_i\|_1^2+\|b_i\|_1^2}{|\alpha_{ii}|^2+|\beta_{ii}|^2}}, \]
and let
\[ \mathcal{G}_i = \{\langle\alpha,\beta\rangle : \chi(\langle\alpha,\beta\rangle,\langle\alpha_{ii},\beta_{ii}\rangle)\le\rho_i\}. \]
Then
\[ \mathcal{D}_i\subset\mathcal{G}_i,\qquad i = 1,\ldots,n. \]

Proof. We have
\[ \sum_{j\ne i}|\beta\alpha_{ij}-\alpha\beta_{ij}| = \|\beta a_i-\alpha b_i\|_1 \le (|\beta|\ \ |\alpha|)\begin{pmatrix}\|a_i\|_1\\ \|b_i\|_1\end{pmatrix} \le \sqrt{|\alpha|^2+|\beta|^2}\,\sqrt{\|a_i\|_1^2+\|b_i\|_1^2}, \]
the last inequality following from the Cauchy inequality. Thus the inequality
\[ |\beta\alpha_{ii}-\alpha\beta_{ii}| \le \sum_{j\ne i}|\beta\alpha_{ij}-\alpha\beta_{ij}| \]
implies the inequality
\[ |\beta\alpha_{ii}-\alpha\beta_{ii}| \le \sqrt{|\alpha|^2+|\beta|^2}\,\sqrt{\|a_i\|_1^2+\|b_i\|_1^2}. \]
The corollary now follows on dividing by $\sqrt{|\alpha_{ii}|^2+|\beta_{ii}|^2}\,\sqrt{|\alpha|^2+|\beta|^2}$. ∎

This corollary is actually our principal Gerschgorin theorem. It should be noted that by the same kind of limit argument we used in the last subsection, we can recover the usual Gerschgorin theorem for the ordinary eigenvalue problem.

The technique of diagonal similarities, which we used so successfully in Section IV.2, can be applied to the generalization of Gerschgorin's theorem. To vary the application, we will show how to apply the technique to a multiple eigenvalue of a diagonalizable pair. Since we have already shown in Section IV.2 how to take into account terms of order higher than the first, we will not bound their contribution here.

Let $(A,B)$ be a diagonalizable pair; that is, suppose there exist nonsingular matrices $X = (x_1\ \cdots\ x_n)$ and $Y = (y_1\ \cdots\ y_n)$ such that
\[ (Y^HAX, Y^HBX) = [\mathrm{diag}(\alpha_1,\ldots,\alpha_n),\ \mathrm{diag}(\beta_1,\ldots,\beta_n)]. \]
Assume that $\langle\alpha_1,\beta_1\rangle = \cdots = \langle\alpha_k,\beta_k\rangle$, and that these eigenvalues are distinct from the others, so that
\[ \delta = \min_{k<i\le n}\chi(\langle\alpha_1,\beta_1\rangle,\langle\alpha_i,\beta_i\rangle) \ne 0. \]
Set
\[ \nu_i = \frac{\|x_i\|_2\|y_i\|_2}{\sqrt{|\alpha_i|^2+|\beta_i|^2}}, \]
and let
\[ \nu = \max_{1\le i\le k}\nu_i \quad\text{and}\quad \nu' = \max_{k<i\le n}\nu_i. \]
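The regions $\mathcal{G}_i$ of Corollary 2.5 are simple to compute, since each is a chordal disk with center $\langle\alpha_{ii},\beta_{ii}\rangle$ and radius $\rho_i$. The sketch below is a small illustration, not part of the original development: the $2\times 2$ test pair is made up, the helper names are hypothetical, and the eigenvalues are computed from the quadratic $\det(A-\lambda B) = 0$, which assumes $B$ is nonsingular. It checks that every eigenvalue lies in some $\mathcal{G}_i$.

```python
import cmath
import math

def chordal(p, q):
    """Chordal distance between eigenvalues given as pairs (alpha, beta)."""
    (a, b), (c, d) = p, q
    return abs(a * d - b * c) / (math.hypot(abs(a), abs(b)) * math.hypot(abs(c), abs(d)))

def eig_2x2(A, B):
    """Eigenvalues (lam, 1) of a 2x2 pair with nonsingular B, from det(A - lam*B) = 0."""
    c2 = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    c1 = -(A[0][0] * B[1][1] + A[1][1] * B[0][0]) + (A[0][1] * B[1][0] + A[1][0] * B[0][1])
    c0 = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return [((-c1 + root) / (2 * c2), 1.0), ((-c1 - root) / (2 * c2), 1.0)]

def gerschgorin_regions(A, B):
    """Return [(center, rho_i)] for the chordal disks G_i of Corollary 2.5."""
    n = len(A)
    regions = []
    for i in range(n):
        a1 = sum(abs(A[i][j]) for j in range(n) if j != i)   # ||a_i||_1
        b1 = sum(abs(B[i][j]) for j in range(n) if j != i)   # ||b_i||_1
        rho = math.sqrt((a1 ** 2 + b1 ** 2) / (abs(A[i][i]) ** 2 + abs(B[i][i]) ** 2))
        regions.append(((A[i][i], B[i][i]), rho))
    return regions

# Hypothetical nearly diagonal pair, chosen only for illustration.
A = [[1.0, 0.10], [0.05, 3.0]]
B = [[1.0, 0.02], [0.03, 1.0]]

regions = gerschgorin_regions(A, B)
for lam in eig_2x2(A, B):
    inside = [i for i, (center, rho) in enumerate(regions) if chordal(lam, center) <= rho]
    print(lam[0], "lies in regions", inside)
```

For this pair the two disks are far apart in the chordal metric, so each contains exactly one eigenvalue, as the second part of Theorem 2.4 predicts for isolated regions.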
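Now consider the pair $(\tilde A,\tilde B) = (A+E, B+F)$. Write (for $n = 6$)
\[ Y^H(A+E)X = \begin{pmatrix}
\alpha_{11}+\epsilon_{11} & \epsilon_{12} & \epsilon_{13} & \epsilon_{14} & \epsilon_{15} & \epsilon_{16}\\
\epsilon_{21} & \alpha_{22}+\epsilon_{22} & \epsilon_{23} & \epsilon_{24} & \epsilon_{25} & \epsilon_{26}\\
\epsilon_{31} & \epsilon_{32} & \alpha_{33}+\epsilon_{33} & \epsilon_{34} & \epsilon_{35} & \epsilon_{36}\\
\epsilon_{41} & \epsilon_{42} & \epsilon_{43} & \alpha_{44}+\epsilon_{44} & \epsilon_{45} & \epsilon_{46}\\
\epsilon_{51} & \epsilon_{52} & \epsilon_{53} & \epsilon_{54} & \alpha_{55}+\epsilon_{55} & \epsilon_{56}\\
\epsilon_{61} & \epsilon_{62} & \epsilon_{63} & \epsilon_{64} & \epsilon_{65} & \alpha_{66}+\epsilon_{66}
\end{pmatrix} \]
and
\[ Y^H(B+F)X = \begin{pmatrix}
\beta_{11}+\eta_{11} & \eta_{12} & \eta_{13} & \eta_{14} & \eta_{15} & \eta_{16}\\
\eta_{21} & \beta_{22}+\eta_{22} & \eta_{23} & \eta_{24} & \eta_{25} & \eta_{26}\\
\eta_{31} & \eta_{32} & \beta_{33}+\eta_{33} & \eta_{34} & \eta_{35} & \eta_{36}\\
\eta_{41} & \eta_{42} & \eta_{43} & \beta_{44}+\eta_{44} & \eta_{45} & \eta_{46}\\
\eta_{51} & \eta_{52} & \eta_{53} & \eta_{54} & \beta_{55}+\eta_{55} & \eta_{56}\\
\eta_{61} & \eta_{62} & \eta_{63} & \eta_{64} & \eta_{65} & \beta_{66}+\eta_{66}
\end{pmatrix}. \]
Let $\epsilon = \max\{\|E\|_2,\|F\|_2\}$, so that
\[ \epsilon\|x_j\|_2\|y_i\|_2 \ge |\epsilon_{ij}|, |\eta_{ij}|. \]
For definiteness let us suppose $k = 3$ and $n = 6$. Let
\[ T = \mathrm{diag}(\tau,\tau,\tau,1,1,1). \]
Then
\[ TY^H(A+E)XT^{-1} = \begin{pmatrix}
\alpha_{11}+\epsilon_{11} & \epsilon_{12} & \epsilon_{13} & \tau\epsilon_{14} & \tau\epsilon_{15} & \tau\epsilon_{16}\\
\epsilon_{21} & \alpha_{22}+\epsilon_{22} & \epsilon_{23} & \tau\epsilon_{24} & \tau\epsilon_{25} & \tau\epsilon_{26}\\
\epsilon_{31} & \epsilon_{32} & \alpha_{33}+\epsilon_{33} & \tau\epsilon_{34} & \tau\epsilon_{35} & \tau\epsilon_{36}\\
\tau^{-1}\epsilon_{41} & \tau^{-1}\epsilon_{42} & \tau^{-1}\epsilon_{43} & \alpha_{44}+\epsilon_{44} & \epsilon_{45} & \epsilon_{46}\\
\tau^{-1}\epsilon_{51} & \tau^{-1}\epsilon_{52} & \tau^{-1}\epsilon_{53} & \epsilon_{54} & \alpha_{55}+\epsilon_{55} & \epsilon_{56}\\
\tau^{-1}\epsilon_{61} & \tau^{-1}\epsilon_{62} & \tau^{-1}\epsilon_{63} & \epsilon_{64} & \epsilon_{65} & \alpha_{66}+\epsilon_{66}
\end{pmatrix} \]
and
\[ TY^H(B+F)XT^{-1} = \begin{pmatrix}
\beta_{11}+\eta_{11} & \eta_{12} & \eta_{13} & \tau\eta_{14} & \tau\eta_{15} & \tau\eta_{16}\\
\eta_{21} & \beta_{22}+\eta_{22} & \eta_{23} & \tau\eta_{24} & \tau\eta_{25} & \tau\eta_{26}\\
\eta_{31} & \eta_{32} & \beta_{33}+\eta_{33} & \tau\eta_{34} & \tau\eta_{35} & \tau\eta_{36}\\
\tau^{-1}\eta_{41} & \tau^{-1}\eta_{42} & \tau^{-1}\eta_{43} & \beta_{44}+\eta_{44} & \eta_{45} & \eta_{46}\\
\tau^{-1}\eta_{51} & \tau^{-1}\eta_{52} & \tau^{-1}\eta_{53} & \eta_{54} & \beta_{55}+\eta_{55} & \eta_{56}\\
\tau^{-1}\eta_{61} & \tau^{-1}\eta_{62} & \tau^{-1}\eta_{63} & \eta_{64} & \eta_{65} & \beta_{66}+\eta_{66}
\end{pmatrix}. \]
As $\epsilon\to 0$, the first three of the regions $\mathcal{G}_i$ have radii that approach zero. The last three have radii that are bounded by $\sqrt2\,k\epsilon\nu'/\tau$, up to terms of order $\epsilon^2$. Consequently, if we take
\[ \tau = \frac{2\sqrt2\,k\nu'\epsilon}{\delta}, \]
then for $\epsilon$ small enough the first three regions will be disjoint from the last three and hence will contain exactly three eigenvalues. The radius of these disks is bounded by $\sqrt2\,(k-1)\nu\epsilon$ up to terms of order $\epsilon^2$. Since it is easily seen that
\[ \chi(\langle\alpha_{ii},\beta_{ii}\rangle,\langle\alpha_{ii}+\epsilon_{ii},\beta_{ii}+\eta_{ii}\rangle) \le \sqrt2\,\nu\epsilon + O(\epsilon^2), \]
we have shown that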
there are exactly $k$ eigenvalues $\langle\tilde\alpha_i,\tilde\beta_i\rangle$ $(i = 1,\ldots,k)$ of $(\tilde A,\tilde B)$ satisfying
\[ \chi(\langle\alpha_{11},\beta_{11}\rangle,\langle\tilde\alpha_i,\tilde\beta_i\rangle) \le \sqrt2\,k\nu\epsilon + O(\epsilon^2). \]

There are four points to be made about this result. First, the assumption that the pair $(A,B)$ is diagonalizable is not necessary. What is required is that the multiple eigenvalue have $k$ linearly independent eigenvectors. Other multiple eigenvalues corresponding to nontrivial Jordan blocks can be handled as in Section IV.2.

Second, the above development shows that when $k = 1$ there is an eigenvalue of $(\tilde A,\tilde B)$ in the region with center $\langle\alpha+y_1^HEx_1,\ \beta+y_1^HFx_1\rangle$ and a radius $O(\epsilon^2)$. Thus, the Gerschgorin theorem gives an independent proof of Theorem 2.2. However, unlike our first proof, this one offers the possibility of computing the bound.

Third, the number $\nu$ is a condition number for the multiple eigenvalue in the sense that the bound on the error in the perturbed eigenvalues is proportional to the error times $\nu$. However, the constant of proportionality grows linearly with the multiplicity of the eigenvalue.

Finally, the theory developed here is a worst-case theory depending on the largest of the numbers $\nu_i$. In practice, the perturbed eigenvalues will tend to have condition inversely proportional to a value of $\nu_i$ particular to itself. For example the pair [U 2,OO)( [,Oon] has a double eigenvalue $\langle 2,1\rangle$. But one of these eigenvalues is very sensitive to perturbations of order 0.1, whereas the other is not. The question of how to make this observation precise is an open research problem.

2.3. Diagonalizable Pairs

In this subsection we will consider the eigenvalues of diagonalizable pairs, and in particular we will generalize Theorem IV.3.3, the well-known corollary of the Bauer-Fike theorem.

Theorem 2.6. Let $(A,B)$ be a regular pair, and suppose that for some nonsingular $X$ and $Y$ we have
\[ (Y^HAX, Y^HBX) = (D_A,D_B), \]
where
\[ D_A = \mathrm{diag}(\alpha_1,\ldots,\alpha_n),\qquad D_B = \mathrm{diag}(\beta_1,\ldots,\beta_n). \]
Let $(\tilde A,\tilde B)$ be regular. Then for every eigenvalue $\langle\tilde\alpha,\tilde\beta\rangle\in\mathcal{L}[(\tilde A,\tilde B)]$ there is an eigenvalue $\langle\alpha,\beta\rangle\in\mathcal{L}[(A,B)]$ that satisfies
\[ \chi(\langle\alpha,\beta\rangle,\langle\tilde\alpha,\tilde\beta\rangle) \le \kappa_2(X)\,\rho_L[(A,B),(\tilde A,\tilde B)]. \tag{2.5} \]

Proof. Since both the eigenvalues of $(A,B)$ and the equivalence class $(A,B)_L$ are invariant when $A$ and $B$ are premultiplied by a nonsingular matrix, we may assume that $U^H = (A\ B)$ and $\tilde U^H = (\tilde A\ \tilde B)$ have orthonormal rows. Moreover, since $(\tilde A,\tilde B)$ is regular, we may assume that $|\tilde\alpha|^2+|\tilde\beta|^2 = 1$. Let $\tilde x$ be the right eigenvector corresponding to $\langle\tilde\alpha,\tilde\beta\rangle$, normalized so that $\|\tilde x\|_2 = 1$. Then
\begin{align*}
\tilde\beta A\tilde x-\tilde\alpha B\tilde x &= \tilde\beta(A-U^H\tilde U\tilde A)\tilde x-\tilde\alpha(B-U^H\tilde U\tilde B)\tilde x\\
&= (A-U^H\tilde U\tilde A\ \ \ B-U^H\tilde U\tilde B)\begin{pmatrix}\tilde\beta\tilde x\\ -\tilde\alpha\tilde x\end{pmatrix}\\
&= (U^H-U^H\tilde U\tilde U^H)\begin{pmatrix}\tilde\beta\tilde x\\ -\tilde\alpha\tilde x\end{pmatrix}\\
&= U^H(UU^H-\tilde U\tilde U^H)\begin{pmatrix}\tilde\beta\tilde x\\ -\tilde\alpha\tilde x\end{pmatrix}\\
&= U^H(P_U-P_{\tilde U})\begin{pmatrix}\tilde\beta\tilde x\\ -\tilde\alpha\tilde x\end{pmatrix}.
\end{align*}
By Theorem I.5.5 the singular values of $P_U-P_{\tilde U}$ are the sines of the canonical angles between the column spaces of $U$ and $\tilde U$. Hence by Definition 1.23,
\[ \|\tilde\beta A\tilde x-\tilde\alpha B\tilde x\|_2 \le \rho_L[(A,B),(\tilde A,\tilde B)]. \tag{2.6} \]
Let $P^H = X^{-1}$ and $Q^H = Y^{-1}$. Then $(A,B) = (QD_AP^H, QD_BP^H)$. Now
\[ I = U^HU = Q(D_AP^HPD_A^H+D_BP^HPD_B^H)Q^H. \]
Hence for any $w$,
\[ w^H(Q^HQ)^{-1}w = w^H(D_AP^HPD_A^H+D_BP^HPD_B^H)w \le \|P^HP\|_2\,w^H(|D_A|^2+|D_B|^2)w, \]
and therefore (Exercise 1.5)
\[ w^H(Q^HQ)w \ge \|P^HP\|_2^{-1}\,w^H(|D_A|^2+|D_B|^2)^{-1}w. \]
Thus
\begin{align*}
\|\tilde\beta A\tilde x-\tilde\alpha B\tilde x\|_2 &= \|Q(\tilde\beta D_A-\tilde\alpha D_B)P^H\tilde x\|_2\\
&= [\tilde x^HP(\tilde\beta D_A-\tilde\alpha D_B)^HQ^HQ(\tilde\beta D_A-\tilde\alpha D_B)P^H\tilde x]^{1/2}\\
&\ge \|P\|_2^{-1}[\tilde x^HP(\tilde\beta D_A-\tilde\alpha D_B)^H(|D_A|^2+|D_B|^2)^{-1}(\tilde\beta D_A-\tilde\alpha D_B)P^H\tilde x]^{1/2}\\
&\ge \|P\|_2^{-1}(\tilde x^HPP^H\tilde x)^{1/2}\min_i\frac{|\tilde\beta\alpha_i-\tilde\alpha\beta_i|}{\sqrt{|\alpha_i|^2+|\beta_i|^2}}\\
&\ge \kappa_2(X)^{-1}\min_i\chi(\langle\alpha_i,\beta_i\rangle,\langle\tilde\alpha,\tilde\beta\rangle). \tag{2.7}
\end{align*}
The theorem now follows on combining (2.6) and (2.7). ∎

Recall that we defined the spectral variation $\mathrm{sv}_A(\tilde A)$ of $\tilde A$ with respect to $A$ as the largest distance of an eigenvalue of $\tilde A$ from the nearest eigenvalue of $A$ [see (IV.1.1)]. If we define $\mathrm{sv}_{(A,B)}[(\tilde A,\tilde B)]$ analogously, then the conclusion of Theorem 2.6 can be written
\[ \mathrm{sv}_{(A,B)}[(\tilde A,\tilde B)] \le \kappa_2(X)\,\rho_L[(A,B),(\tilde A,\tilde B)]. \]

Although the bound (2.5) is satisfactory in many ways, it is difficult to use when we only know bounds on the perturbations $E$ and $F$. One approach is to use Theorem III.4.1 to bound $\rho_L[(A,B),(\tilde A,\tilde B)]$. However, another approach is to adapt the proof of the theorem to give a direct bound.

Theorem 2.7. In addition to the hypotheses of Theorem 2.6, suppose that the columns of $X$ and $Y$ are normalized so that $|D_A|^2+|D_B|^2 = I$. Then
\[ \mathrm{sv}_{(A,B)}[(\tilde A,\tilde B)] \le \|X\|_2\|Y\|_2\|(E\ F)\|_2. \tag{2.8} \]

Proof. Consider the equivalent pairs $(D_A,D_B)$ and $(D_A+Y^HEX, D_B+Y^HFX)$. We have
\[ \tilde\beta D_A\tilde x-\tilde\alpha D_B\tilde x = -(\tilde\beta Y^HEX\tilde x-\tilde\alpha Y^HFX\tilde x) = -Y^H(E\ F)\begin{pmatrix}\tilde\beta X\tilde x\\ -\tilde\alpha X\tilde x\end{pmatrix}. \]
Hence
\[ \|\tilde\beta D_A\tilde x-\tilde\alpha D_B\tilde x\|_2 \le \|X\|_2\|Y\|_2\|(E\ F)\|_2. \]
Since $D_A$ and $D_B$ are diagonal and $|D_A|^2+|D_B|^2 = I$, it is trivial to verify that
\[ \|\tilde\beta D_A\tilde x-\tilde\alpha D_B\tilde x\|_2 \ge \min_i\chi(\langle\alpha_i,\beta_i\rangle,\langle\tilde\alpha,\tilde\beta\rangle). \ ∎ \]

If in the above theorem we assume that $\|y_i\|_2 = 1$, then $\|x_i\|_2$ is the condition number $\nu_i$. Moreover, $\|Y\|_2\le\sqrt n$ and $\|X\|_2\le\sqrt n\max_i\nu_i$.
This gives the following corollary.

Corollary 2.8. Let
\[ \nu_i = \frac{\|x_i\|_2\|y_i\|_2}{\sqrt{|\alpha_i|^2+|\beta_i|^2}}. \]
Then
\[ \mathrm{sv}_{(A,B)}[(\tilde A,\tilde B)] \le n\max_i\nu_i\,\|(E\ F)\|_2. \]

2.4. Eigenspaces

In this section we will treat the perturbation of eigenspaces, which are the natural generalization of invariant subspaces. The theory largely parallels the theory of invariant subspaces developed in Chapter V, and the exposition here will be a little terser than usual.

Definition 2.9. Let $(A,B)$ be a regular matrix pair. The subspace $\mathcal{X}$ is an EIGENSPACE if
\[ \dim(A\mathcal{X}+B\mathcal{X}) \le \dim(\mathcal{X}). \tag{2.9} \]
If $\dim(\mathcal{X}) = l$, then (2.9) implies that both $A\mathcal{X}$ and $B\mathcal{X}$ are contained in a subspace $\mathcal{Y}$ of dimension $l$. In other words, $A$ and $B$ have essentially the same effect on $\mathcal{X}$. Eigenspaces have the following characterizations.

Theorem 2.10. Let $(A,B)$ be a regular matrix pair and let $\mathcal{X}$ be a subspace of dimension $l$. Then the following statements are equivalent.

1. $\mathcal{X}$ is an eigenspace of $(A,B)$.

2. There are nonsingular matrices $(X_1\ U_2)$ and $(V_1\ Y_2)$ such that $\mathcal{R}(X_1) = \mathcal{X}$ and
\[ \begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}A(X_1\ U_2) = \begin{pmatrix}A_1 & H_A\\ 0 & A_2\end{pmatrix},\qquad \begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}B(X_1\ U_2) = \begin{pmatrix}B_1 & H_B\\ 0 & B_2\end{pmatrix}. \tag{2.10} \]
Moreover the pairs $(A_1,B_1)$ and $(A_2,B_2)$ are regular.

3. If the columns of $X_1$ form a basis for $\mathcal{X}$, then there is a regular pair $(A_1,B_1)$ such that
\[ AX_1B_1 = BX_1A_1. \tag{2.11} \]

Remark 2.11. The proof will show that we may take $(X_1\ U_2)$ and $(V_1\ Y_2)$ to be unitary.

Proof. $1\Rightarrow 2$: Let $(X_1\ U_2)$ be a unitary matrix with $\mathcal{R}(X_1) = \mathcal{X}$. Since $\mathcal{X}$ is an eigenspace, both $A\mathcal{X}$ and $B\mathcal{X}$ lie in a subspace $\mathcal{Y}$ of dimension $l$. If we let $(V_1\ Y_2)$ be a unitary matrix with $\mathcal{R}(V_1) = \mathcal{Y}$, then $(X_1\ U_2)$ and $(V_1\ Y_2)$ are the required nonsingular matrices.

$2\Rightarrow 3$: From (2.10) we have $AX_1 = V_1A_1$ and $BX_1 = V_1B_1$. Since $(A_1,B_1)$ is regular, by Theorem 1.13 there are nonsingular matrices $R$ and $S$ such that $\hat A_1 = R^HA_1S$ and $\hat B_1 = R^HB_1S$ commute. Let $\hat X_1 = X_1S$ and $\hat V_1 = V_1R^{-H}$, so that $\mathcal{R}(\hat X_1) = \mathcal{X}$, $A\hat X_1 = \hat V_1\hat A_1$, and $B\hat X_1 = \hat V_1\hat B_1$. It follows that $A\hat X_1\hat B_1 = B\hat X_1\hat A_1$. If the columns of $X_1$ form a basis for $\mathcal{X}$ then $X_1 = \hat X_1T$ for some nonsingular matrix $T$. The conclusion follows on setting $B_1 = T^{-1}\hat B_1$ and $A_1 = T^{-1}\hat A_1$.

$3\Rightarrow 1$: By Theorem 1.13 we may assume that $A_1 = \mathrm{diag}(J,I)$ and $B_1 = \mathrm{diag}(I,N)$. Let $P = AX_1$ and $Q = BX_1$. Then with the natural partitioning, (2.11) reads
\[ (P_1\ P_2)\begin{pmatrix}I & 0\\ 0 & N\end{pmatrix} = (Q_1\ Q_2)\begin{pmatrix}J & 0\\ 0 & I\end{pmatrix}. \]
It follows that $\mathcal{R}(P_1)\subset\mathcal{R}(Q_1)$ and $\mathcal{R}(Q_2)\subset\mathcal{R}(P_2)$. Hence
\[ \dim[\mathcal{R}(P)+\mathcal{R}(Q)] = \dim[\mathcal{R}(P_1)+\mathcal{R}(P_2)+\mathcal{R}(Q_1)+\mathcal{R}(Q_2)] = \dim[\mathcal{R}(Q_1)+\mathcal{R}(P_2)] \le l. \ ∎ \]
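Equation (2.10) shows that in some sense the pair $(A_1,B_1)$ is a representation of the part of $(A,B)$ associated with $\mathcal{X}$. In particular, if $\langle\alpha,\beta\rangle$ is an eigenvalue of $(A_1,B_1)$, then it is an eigenvalue of $(A,B)$. Moreover, if $B$ is nonsingular, then $\mathcal{X}$ is an invariant subspace of $B^{-1}A$. See Exercises 2.3 and 2.2.

Equation (2.10) implies that $\mathcal{R}(Y_2)$ is a left eigenspace of $(A,B)$ with representation $(A_2,B_2)$. We will say that $\mathcal{X}$ is a SIMPLE EIGENSPACE if
\[ \mathcal{L}[(A_1,B_1)]\cap\mathcal{L}[(A_2,B_2)] = \emptyset. \]
By Theorem 1.11 this is sufficient for the existence of matrices $P$ and $Q$ such that
\[ \begin{pmatrix}I & Q\\ 0 & I\end{pmatrix}\begin{pmatrix}A_1 & H_A\\ 0 & A_2\end{pmatrix}\begin{pmatrix}I & P\\ 0 & I\end{pmatrix} = \begin{pmatrix}A_1 & 0\\ 0 & A_2\end{pmatrix} \]
and
\[ \begin{pmatrix}I & Q\\ 0 & I\end{pmatrix}\begin{pmatrix}B_1 & H_B\\ 0 & B_2\end{pmatrix}\begin{pmatrix}I & P\\ 0 & I\end{pmatrix} = \begin{pmatrix}B_1 & 0\\ 0 & B_2\end{pmatrix}. \]
If we set $X_2 = U_2+X_1P$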
and $Y_1 = V_1+Y_2Q^H$, then we have proved the following spectral resolution theorem.

Theorem 2.12. Let $\mathcal{X}$ be a simple eigenspace of the regular pair $(A,B)$. Then there are nonsingular matrices $(X_1\ X_2)$ and $(Y_1\ Y_2)$ such that
\[ \begin{pmatrix}Y_1^H\\ Y_2^H\end{pmatrix}A(X_1\ X_2) = \begin{pmatrix}A_1 & 0\\ 0 & A_2\end{pmatrix} \tag{2.12} \]
and
\[ \begin{pmatrix}Y_1^H\\ Y_2^H\end{pmatrix}B(X_1\ X_2) = \begin{pmatrix}B_1 & 0\\ 0 & B_2\end{pmatrix}. \tag{2.13} \]

In analogy with the terminology for invariant subspaces, we call $\mathcal{R}(X_2)$ the COMPLEMENTARY EIGENSPACE. The spaces $\mathcal{R}(Y_1)$ and $\mathcal{R}(Y_2)$ are the corresponding left eigenspaces.

Turning now to the perturbation of eigenspaces, we begin with an approximation theorem. Let $(X_1\ U_2)$ and $(V_1\ Y_2)$ be nonsingular and set
\[ \begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}A(X_1\ U_2) = \begin{pmatrix}A_1 & H_A\\ G_A & A_2\end{pmatrix},\qquad \begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}B(X_1\ U_2) = \begin{pmatrix}B_1 & H_B\\ G_B & B_2\end{pmatrix}. \tag{2.14} \]
If $G_A = G_B = 0$, then $\mathcal{R}(X_1)$ is an eigenspace of $(A,B)$. We now suppose that $G_A$ and $G_B$ are small, and ask how near $\mathcal{R}(X_1)$ is to an eigenspace.

In analogy with the ordinary eigenvalue problem, we introduce perturbations
\[ \tilde X_1 = X_1+U_2P \quad\text{and}\quad \tilde Y_2 = Y_2+V_1Q^H \]
and attempt to determine $P$ and $Q$ so that
\[ \tilde Y_2^HA\tilde X_1 = \tilde Y_2^HB\tilde X_1 = 0. \tag{2.15} \]
This leads directly to the system of equations
\[ QA_1+A_2P = -G_A-QH_AP,\qquad QB_1+B_2P = -G_B-QH_BP. \tag{2.16} \]
If we set
\[ \mathbf{T} = (P,Q)\mapsto(QA_1+A_2P,\ QB_1+B_2P), \]
then (2.16) can be written
\[ \mathbf{T}(P,Q) = -(G_A+QH_AP,\ G_B+QH_BP). \tag{2.17} \]

To establish a perturbation bound we must introduce a norm on the space of pairs $(P,Q)$. Here we will work with the norm $\|\cdot\|_F$ defined by
\[ \|(P,Q)\|_F \stackrel{\mathrm{def}}{=} \max\{\|P\|_F,\|Q\|_F\}. \]
If we define
\[ \mathrm{dif}[(A_1,B_1),(A_2,B_2)] \stackrel{\mathrm{def}}{=} \inf_{\|(P,Q)\|_F=1}\|\mathbf{T}(P,Q)\|_F, \tag{2.18} \]
then by Theorem 1.11, $\mathrm{dif}[(A_1,B_1),(A_2,B_2)] > 0$ if and only if the spectra of $(A_1,B_1)$ and $(A_2,B_2)$ are disjoint. For later use note that
\[ \mathrm{dif}[(A_1+E_1,B_1+F_1),(A_2+E_2,B_2+F_2)] \ge \mathrm{dif}[(A_1,B_1),(A_2,B_2)] - \max\{\|E_1\|_2+\|E_2\|_2,\ \|F_1\|_2+\|F_2\|_2\}. \tag{2.19} \]

With these preliminaries, we may now turn to the approximation theorem.

Theorem 2.13. Let the regular pair $(A,B)$ be as in (2.14). Set
\[ \gamma = \|(G_A,G_B)\|_F,\qquad \eta = \|(H_A,H_B)\|_F. \]
Assume that $\mathcal{L}[(A_1,B_1)]\cap\mathcal{L}[(A_2,B_2)] = \emptyset$ so that
\[ \delta = \mathrm{dif}[(A_1,B_1),(A_2,B_2)] > 0. \]
Then if
\[ \frac{\eta\gamma}{\delta^2} < \frac14, \]
there is a unique solution $(P,Q)$ of (2.17) satisfying
\[ \|(P,Q)\|_F \le \frac{2\gamma}{\delta+\sqrt{\delta^2-4\eta\gamma}} < \frac{2\gamma}{\delta}. \tag{2.20} \]
The column spaces of $\tilde X_1$ and $\tilde Y_2$ defined by (2.15) are complementary right and left eigenspaces of $(A,B)$ corresponding to the regular pairs
\[ (A_1+H_AP,\ B_1+H_BP) \quad\text{and}\quad (A_2+QH_A,\ B_2+QH_B), \]
whose spectra are disjoint.
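For reference, the step from (2.15) to (2.16) is a direct expansion using the blocks defined in (2.14); it is spelled out here as a worked check and uses only the symbols already introduced. Since $\tilde Y_2^H = Y_2^H+QV_1^H$,

```latex
\begin{align*}
\tilde Y_2^H A \tilde X_1
  &= (Y_2^H + Q V_1^H)\,A\,(X_1 + U_2 P) \\
  &= Y_2^H A X_1 + Y_2^H A U_2\,P + Q\,V_1^H A X_1 + Q\,V_1^H A U_2\,P \\
  &= G_A + A_2 P + Q A_1 + Q H_A P .
\end{align*}
```

Setting this expression to zero gives the first equation of (2.16); the same expansion with $B$, $G_B$, $H_B$, $B_1$, and $B_2$ in place of $A$, $G_A$, $H_A$, $A_1$, and $A_2$ gives the second.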
Proof. Let $\varphi[(P,Q)] = (QH_AP, QH_BP)$. Then it is easy to see that the conditions of Theorem V.2.11 are satisfied, which establishes the existence of $(P,Q)$ satisfying (2.17) and (2.20).

To prove the statements about $\tilde X_1$ and $\tilde Y_2$, consider the equivalences
\[ \begin{pmatrix}I & 0\\ Q & I\end{pmatrix}\begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}A(X_1\ U_2)\begin{pmatrix}I & 0\\ P & I\end{pmatrix} = \begin{pmatrix}V_1^H\\ \tilde Y_2^H\end{pmatrix}A(\tilde X_1\ U_2) = \begin{pmatrix}A_1+H_AP & H_A\\ 0 & A_2+QH_A\end{pmatrix} \]
and
\[ \begin{pmatrix}I & 0\\ Q & I\end{pmatrix}\begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}B(X_1\ U_2)\begin{pmatrix}I & 0\\ P & I\end{pmatrix} = \begin{pmatrix}V_1^H\\ \tilde Y_2^H\end{pmatrix}B(\tilde X_1\ U_2) = \begin{pmatrix}B_1+H_BP & H_B\\ 0 & B_2+QH_B\end{pmatrix}. \]
This shows that $\tilde X_1$ and $\tilde Y_2$ are complementary right and left eigenspaces.

To prove the statement about the spectra, note that by (2.19)
\begin{align*}
\mathrm{dif}[(A_1+H_AP,\ B_1+H_BP),\ (A_2+QH_A,\ B_2+QH_B)] &\ge \delta-\max\{\|H_A\|_F(\|P\|_F+\|Q\|_F),\ \|H_B\|_F(\|P\|_F+\|Q\|_F)\}\\
&\ge \delta-2\|(H_A,H_B)\|_F\|(P,Q)\|_F\\
&> \delta-\frac{4\eta\gamma}{\delta} > 0. \ ∎
\end{align*}

If $(X_1\ U_2)$ and $(V_1\ Y_2)$ are unitary (see Remark 2.11), then (2.20) bounds the tangents of the canonical angles between $\mathcal{R}(X_1)$ and $\mathcal{R}(\tilde X_1)$ or $\mathcal{R}(Y_2)$ and $\mathcal{R}(\tilde Y_2)$, just as in the ordinary eigenvalue problem. Unfortunately, the theorem provides only a single bound for both $P$ and $Q$. This problem is characteristic of the perturbation theory of matrix pairs.

There is nothing sacred about the norm $\|\cdot\|_F$. Any norm that allows the conditions of Theorem V.2.11 to be verified will do.

Although the function dif is nonzero if and only if the spectra of its arguments are disjoint, its size is not directly related to the distance between the spectra, either in the complex plane or on the Riemann sphere. In fact, multiplying the arguments of dif by a common scalar increases dif by the absolute value of the scalar without changing the spectra of the arguments.

From (2.14), we see that if $(X_1\ U_2)$ and $(V_1\ Y_2)$ are unitary, then $\gamma$ is the F-norm of a perturbation $(E\ F)$ such that $\mathcal{R}(X_1)$ is an eigenspace of $(A+E, B+F)$. Namely, take
\[ E = (V_1\ Y_2)\begin{pmatrix}0 & 0\\ -G_A & 0\end{pmatrix}\begin{pmatrix}X_1^H\\ U_2^H\end{pmatrix} = -Y_2G_AX_1^H \]
and
\[ F = (V_1\ Y_2)\begin{pmatrix}0 & 0\\ -G_B & 0\end{pmatrix}\begin{pmatrix}X_1^H\\ U_2^H\end{pmatrix} = -Y_2G_BX_1^H. \]
However, this backward perturbation is not necessarily the smallest one with this property.

There is a perturbation theorem corresponding to Theorem 2.13. Its proof is left as an exercise.

Theorem 2.14. Let $\mathcal{R}(X_1)$ be an eigenspace of the regular pair $(A,B)$, and let the pair have the decomposition (2.10). Given the perturbation $(E,F)$, let
\[ \begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}E(X_1\ U_2) = \begin{pmatrix}E_{11} & E_{12}\\ E_{21} & E_{22}\end{pmatrix},\qquad \begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}F(X_1\ U_2) = \begin{pmatrix}F_{11} & F_{12}\\ F_{21} & F_{22}\end{pmatrix}. \]
Set
\begin{align*}
\tilde\gamma &= \|(E_{21},F_{21})\|_F,\\
\tilde\eta &= \|(H_A+E_{12},\ H_B+F_{12})\|_F,\\
\tilde\delta &= \mathrm{dif}[(A_1,B_1),(A_2,B_2)]-\max\{\|E_{11}\|_F+\|E_{22}\|_F,\ \|F_{11}\|_F+\|F_{22}\|_F\}.
\end{align*}
If $\tilde\delta > 0$ and
\[ \frac{\tilde\eta\tilde\gamma}{\tilde\delta^2} < \frac14, \]
then there are matrices $P$ and $Q$ satisfying
\[ \|(P,Q)\|_F \le \frac{2\tilde\gamma}{\tilde\delta+\sqrt{\tilde\delta^2-4\tilde\gamma\tilde\eta}} < \frac{2\tilde\gamma}{\tilde\delta} \]
such that the columns of $\tilde X_1 = X_1+U_2P$ and $\tilde Y_2 = Y_2+V_1Q^H$ span complementary right and left eigenspaces of $(A+E, B+F)$ corresponding to the regular pairs
\[ [A_1+E_{11}+(H_A+E_{12})P,\ B_1+F_{11}+(H_B+F_{12})P] \]
and
\[ [A_2+E_{22}+Q(H_A+E_{12}),\ B_2+F_{22}+Q(H_B+F_{12})]. \]
The spectra of these pairs are disjoint.

If instead of starting with the block triangularization (2.10), we start with a spectral resolution (that is, with a block diagonal form), then $H_A$ and $H_B$ vanish and we obtain a sharper bound on the spectra.

Theorem 2.15. Let $\mathcal{R}(X_1)$ be an eigenspace of the regular pair $(A,B)$, and let the pair have the spectral resolution (2.12) and (2.13). Given the perturbation $(E,F)$, let
\[ \begin{pmatrix}Y_1^H\\ Y_2^H\end{pmatrix}E(X_1\ X_2) = \begin{pmatrix}E_{11} & E_{12}\\ E_{21} & E_{22}\end{pmatrix},\qquad \begin{pmatrix}Y_1^H\\ Y_2^H\end{pmatrix}F(X_1\ X_2) = \begin{pmatrix}F_{11} & F_{12}\\ F_{21} & F_{22}\end{pmatrix}. \]
Set
\begin{align*}
\tilde\gamma &= \|(E_{21},F_{21})\|_F,\\
\tilde\eta &= \|(E_{12},F_{12})\|_F,\\
\tilde\delta &= \mathrm{dif}[(A_1,B_1),(A_2,B_2)]-\max\{\|E_{11}\|_F+\|E_{22}\|_F,\ \|F_{11}\|_F+\|F_{22}\|_F\}.
\end{align*}
If $\tilde\delta > 0$ and
\[ \frac{\tilde\eta\tilde\gamma}{\tilde\delta^2} < \frac14, \]
then there are matrices $P$ and $Q$ satisfying
\[ \|(P,Q)\|_F \le \frac{2\tilde\gamma}{\tilde\delta+\sqrt{\tilde\delta^2-4\tilde\gamma\tilde\eta}} < \frac{2\tilde\gamma}{\tilde\delta} \]
such that the columns of $\tilde X_1 = X_1+X_2P$ and $\tilde Y_2 = Y_2+Y_1Q^H$ span complementary right and left eigenspaces of $(A+E, B+F)$ corresponding to the regular pairs
\[ (A_1+E_{11}+E_{12}P,\ B_1+F_{11}+F_{12}P) \]
and
\[ (A_2+E_{22}+QE_{12},\ B_2+F_{22}+QF_{12}). \]
The spectra of these pairs are disjoint.

When $X_1$ has only one column, i.e., when it is an eigenvector, Theorem 2.15 shows that the approximation
\[ \langle\tilde\alpha_1,\tilde\beta_1\rangle \cong \langle\alpha_1+y_1^HEx_1,\ \beta_1+y_1^HFx_1\rangle \]
is accurate up to terms of second order in the error. Thus the theorem gives another proof of Theorem 2.2.

Notes and References

The first order perturbation analysis is new, as is the systematic use of the condition number $\nu$ defined by (2.2).

The Gerschgorin theory and its application to multiple eigenvalues is from a paper by Stewart [204, 1975]. The simplified bounds are due to Sun [228, 1985].

The generalization of the Bauer-Fike theorem is due to Elsner and Sun [67, 1982]. This paper also contains generalizations of Henrici's theorem and of the Hoffman-Wielandt theorem for "normal" pairs, that is, pairs for which $B^{-1}A$ is normal (in the case where $B$ is nonsingular).

The perturbation theory for eigenspaces is taken from a paper by Stewart [201, 1972], where eigenspaces were called deflating subspaces. Although this theory is asymptotically sharp, it is complicated by the fact that the function dif is not easy to interpret. When the concern is with eigenvectors, it is possible to write out explicit perturbation expansions (Exercise 2.6).
Exercises

1. Let $(A, B)$ be a regular matrix pair. Then there is a permutation matrix $P$ such that no diagonal of $(AP, BP)$ is $(0, 0)$. [Hint: Use Theorem II.3.14.]

2. Let $\mathcal{X}$ be an eigenspace of the regular pair $(A, B)$ and let $AX_1B_1 = BX_1A_1$ as in (2.11). Show that if $z$ is an eigenvector of $(A_1, B_1)$ then $X_1B_1z$ (or $X_1A_1z$ if $B_1z = 0$) is an eigenvector of $(A, B)$. Conversely, if $x \in \mathcal{X}$ is an eigenvector of $(A, B)$, then there is an eigenvector $z$ of $(A_1, B_1)$ such that $x = X_1B_1z$ (or $x = X_1A_1z$ if $B_1z = 0$).

3. Let $\mathcal{X}$ be an eigenspace of the regular pair $(A, B)$. Show that if $B$ is nonsingular then $\mathcal{X}$ is an invariant subspace of $B^{-1}A$.

4. Show that

$$\operatorname{dif}[(A_1 + E_1, B_1 + F_1), (A_2 + E_2, B_2 + F_2)] \ge \operatorname{dif}[(A_1, B_1), (A_2, B_2)] - \max\{\|E_1\|_2 + \|E_2\|_2,\ \|F_1\|_2 + \|F_2\|_2\}.$$

5. Show that $\operatorname{dif}[(A_1, I), (A_2, I)] \le \operatorname{sep}_F(A_1, A_2)$. Moreover, if $\|A_1\|_2, \|A_2\|_2 \le 1$, then

$$\operatorname{dif}[(A_1, I), (A_2, I)] \ge \tfrac{1}{2}\operatorname{sep}_F(A_1, A_2).$$

6. Under the hypotheses of Theorem 2.13, show that when $A_1 = \alpha_1$ is a scalar,

$$\tilde x_1 = x_1 - U_2(\beta_1 A_2 - \alpha_1 B_2)^{-1}(\beta_1 g_A - \alpha_1 g_B) + O(\|(g_A, g_B)\|_F^2).$$

3. Definite Matrix Pairs

In this the concluding section we will treat the perturbation of eigenvalues and eigenspaces of definite matrix pairs. We begin with the analogue of Corollary IV.4.6, which gives a uniform bound for all the eigenvalues of a definite pair. We then look at the specialization to definite pairs of our general theory of eigenspaces developed in the last section. Finally we consider some direct bounds for eigenspaces.

Throughout this section $(A, B)$ will denote a definite matrix pair of order $n$.

3.1. Eigenvalues of Definite Pairs

Let us begin with some general observations on the condition of eigenvalues of a definite matrix pair.
If $x$ is an eigenvector of $(A, B)$ corresponding to the eigenvalue $\langle\alpha, \beta\rangle = \langle x^HAx, x^HBx\rangle$, then the number

$$\nu = \frac{\|x\|_2^2}{\sqrt{(x^HAx)^2 + (x^HBx)^2}}$$

is a condition number for $\langle\alpha, \beta\rangle$ in the chordal metric. This fact has two consequences.

First, the eigenvalues of a definite pair, unlike the eigenvalues of a Hermitian matrix, are not automatically well conditioned. As in the Hermitian case, small eigenvalues can be ill conditioned in a relative sense; but eigenvalues of ordinary size can be ill conditioned in an absolute sense. For example, the eigenvalue $\langle 1\rangle$ of the pair

$$\left[\begin{pmatrix} 1 & 0 \\ 0 & .002 \end{pmatrix},\ \begin{pmatrix} 1 & 0 \\ 0 & .001 \end{pmatrix}\right]$$

is insensitive to perturbations of magnitude $10^{-4}$, but the eigenvalue $\langle 2\rangle$ is quite sensitive.

Second, we defined $(A, B)$ to be definite if the number

$$\gamma(A, B) = \min_{\|x\|_2 = 1} \sqrt{(x^HAx)^2 + (x^HBx)^2}$$

is nonzero. We now see that $\gamma^{-1}(A, B)$ is an upper bound on the condition of the eigenvalues. Thus although the eigenvalues of a definite pair can be ill conditioned, the degree of ill conditioning is bounded.

The motor that drives the perturbation theory of Hermitian matrices is the natural ordering of the real line, which defines an association between the eigenvalues of a Hermitian matrix and its perturbation. Eigenvalues of definite pairs also have an ordering, although it is not as natural. To define it, let $(A, B)$ and $(\tilde A, \tilde B)$ be definite pairs. By Theorem 1.18 the field of values $\mathcal{F}(A + iB)$ lies in a half plane that does not contain the origin and $\mathcal{F}(\tilde A + i\tilde B)$ lies in another such half plane. Therefore, there is a ray $\mathcal{O}$, emanating from the origin, that lies in neither half plane. Given any real pair $(\alpha, \beta) \ne (0, 0)$, define $\theta(\alpha, \beta)$ to be the angle the line from the origin to $(\alpha, \beta)$ makes with $\mathcal{O}$, measured clockwise.

This construction allows us to associate angles with the eigenvalues of a definite pair and a perturbation of the pair. Specifically, we will suppose the pair $(A, B)$ has eigenvalues $\langle\alpha_i, \beta_i\rangle$ $(i = 1, \ldots, n)$ and set

$$\theta_i = \theta(\alpha_i, \beta_i).$$

The numbers $\theta_i$ are called the EIGENANGLES of $(A, B)$. We will assume that the eigenangles are ordered so that

$$0 \le \theta_1 \le \theta_2 \le \cdots \le \theta_n < \pi. \qquad (3.1)$$

The eigenangles of the pair $(\tilde A, \tilde B)$ are defined similarly.

Eigenangles have a variational characterization.

Lemma 3.1. With the ordering (3.1), the eigenangles of the definite pair $(A, B)$ satisfy

$$\theta_i = \min_{\dim(\mathcal{X}) = i}\ \max_{x \in \mathcal{X},\ x \ne 0} \theta(x^HAx, x^HBx) \qquad (3.2)$$

and

$$\theta_i = \max_{\dim(\mathcal{X}) = n-i+1}\ \min_{x \in \mathcal{X},\ x \ne 0} \theta(x^HAx, x^HBx). \qquad (3.3)$$

Proof. By Theorem 1.18, we may assume that $B$ is positive definite. Then for some fixed angle $\theta_0$,

$$\theta(x^HAx, x^HBx) = \theta_0 + \cot^{-1}\left(\frac{x^HAx}{x^HBx}\right).$$

The lemma now follows from Fischer's min-max characterization (Corollary 1.16). ∎

The main theorem of this section bounds the chordal metric of the perturbation of the eigenvalues in terms of the metric $\rho_D$ introduced in Section 1.

Theorem 3.2. Let $(A, B)$ be a definite pair and let $(\tilde A, \tilde B) = (A + E, B + F)$. If

$$\zeta \equiv \max_{\|x\|_2 = 1} \sqrt{\frac{(x^HEx)^2 + (x^HFx)^2}{(x^HAx)^2 + (x^HBx)^2}} < 1,$$

then $(\tilde A, \tilde B)$ is definite. Moreover, if the eigenvalues $\langle\alpha_i, \beta_i\rangle$ are ordered so that their eigenangles $\theta_i$ are nondecreasing and the eigenvalues $\langle\tilde\alpha_i, \tilde\beta_i\rangle$ $(i = 1, \ldots, n)$ are ordered similarly, then $|\tilde\theta_i - \theta_i| < \pi/2$ and

$$\chi(\langle\alpha_i, \beta_i\rangle, \langle\tilde\alpha_i, \tilde\beta_i\rangle) \le \rho_D[(A, B), (\tilde A, \tilde B)], \qquad i = 1, \ldots, n. \qquad (3.4)$$

Proof. Recalling that $(\tilde A, \tilde B)$ is definite if and only if $\gamma(\tilde A, \tilde B) > 0$, we have

$$\gamma(\tilde A, \tilde B) \ge \min_{\|x\|_2 = 1}\left\{\sqrt{(x^HAx)^2 + (x^HBx)^2} - \sqrt{(x^HEx)^2 + (x^HFx)^2}\right\} \ge (1 - \zeta)\gamma(A, B) > 0.$$

Hence $(\tilde A, \tilde B)$ is definite.

Now suppose that $\tilde\theta_i \ge \theta_i$. Let $\mathcal{X}$ be a subspace for which the minimum is attained in (3.2). Then

$$\tilde\theta_i \le \max_{x \in \mathcal{X},\ x \ne 0} \theta(x^H\tilde Ax, x^H\tilde Bx).$$

Let $x$ be a vector for which the above maximum is attained. Then

$$\theta(x^HAx, x^HBx) \le \theta_i \le \tilde\theta_i \le \theta(x^H\tilde Ax, x^H\tilde Bx).$$

These inequalities are pictured in Figure 3.1, in which $(x^HAx, x^HBx)$ is denoted by $Z$ and $(x^H\tilde Ax, x^H\tilde Bx)$ by $\tilde Z$.

[Figure 3.1: Eigenangles and Their Bounds]

Since $\zeta < 1$, we have $|Z\tilde Z| < |OZ|$. Hence the angle $ZO\tilde Z$ is less than $\pi/2$, which implies that $\tilde\theta_i - \theta_i < \pi/2$. Moreover, if we let $\tilde ZP$ be the line from $\tilde Z$ perpendicular to $OZ$, then elementary geometry gives

$$\begin{aligned}
\tilde\theta_i - \theta_i \le \angle ZO\tilde Z &= \sin^{-1}\frac{|\tilde ZP|}{|O\tilde Z|} \\
&= \sin^{-1}\sqrt{1 - \frac{(x^HAx\,x^H\tilde Ax + x^HBx\,x^H\tilde Bx)^2}{[(x^HAx)^2 + (x^HBx)^2][(x^H\tilde Ax)^2 + (x^H\tilde Bx)^2]}} \\
&= \sin^{-1}\chi[\langle x^HAx, x^HBx\rangle, \langle x^H\tilde Ax, x^H\tilde Bx\rangle] \\
&\le \sin^{-1}\rho_D[(A, B), (\tilde A, \tilde B)].
\end{aligned} \qquad (3.5)$$

But it is easily verified that $\chi[\langle\alpha_i, \beta_i\rangle, \langle\tilde\alpha_i, \tilde\beta_i\rangle] = \sin(\tilde\theta_i - \theta_i)$. Hence (3.4) holds when $\theta_i \le \tilde\theta_i$. The case $\theta_i \ge \tilde\theta_i$ is established in a similar manner, beginning with the characterization (3.3). ∎

We may obtain a bound in terms of $\|E\|$ and $\|F\|$ by observing that

$$\rho_D[(A, B), (\tilde A, \tilde B)] \le \zeta \le \frac{\sqrt{\|E\|_2^2 + \|F\|_2^2}}{\gamma(A, B)}.$$

Thus, we have the following corollary.

Corollary 3.3. If

$$\frac{\sqrt{\|E\|_2^2 + \|F\|_2^2}}{\gamma(A, B)} < 1,$$

the pair $(\tilde A, \tilde B)$ is definite and

$$\chi(\langle\alpha_i, \beta_i\rangle, \langle\tilde\alpha_i, \tilde\beta_i\rangle) \le \frac{\sqrt{\|E\|_2^2 + \|F\|_2^2}}{\gamma(A, B)}, \qquad i = 1, \ldots, n. \qquad (3.6)$$

Theorem 3.2 and its Corollary 3.3 have pretty forms, but their content is less than satisfactory. The bound (3.6), for example, depends on $\gamma(A, B)^{-1}$, which is greater than the largest individual condition number of the eigenvalues. Of course it is to be expected that a bound for all the eigenvalues would depend on $\max_i \nu_i$, since it must take into account the worst case. The trouble is that $\gamma(A, B)^{-1}$ can be arbitrarily larger than $\max_i \nu_i$, as the following example shows.

Example 3.4. Let

$$A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

and for $\eta > 0$ let

$$B = \frac{1}{2}\begin{pmatrix} 1 + \eta & \eta - 1 \\ \eta - 1 & 1 + \eta \end{pmatrix}.$$

The eigenvalues of $B$ are $1$ and $\eta$, the latter corresponding to the eigenvector $\mathbf{1}$. Since $\mathbf{1}^HA\mathbf{1} = 0$,

$$\gamma(A, B)^{-1} = \eta^{-1}.$$

Since $\operatorname{trace}(A^{-1}B) = 0$ and $\det(A^{-1}B) = -\eta$, the eigenvalues of the pair $(A, B)$ are $\langle\pm 1/\sqrt{\eta}\rangle$. Now the corresponding eigenvectors are of the form $x = (1\ \ \xi)^T$, where

$$\xi = \frac{1 - \eta}{1 \pm 2\sqrt{\eta} + \eta} \cong 1 \mp 2\sqrt{\eta}.$$

It follows that

$$\max_i \nu_i \le \frac{x^Tx}{|x^TAx|} \cong \frac{1}{2\sqrt{\eta}}.$$

Thus $\gamma(A, B)^{-1} = O(\eta^{-1})$ while $\max_i \nu_i = O(\eta^{-1/2})$.

Even this example would not be damning if $\gamma(A, B)$ really reflected the effects of perturbations when $\zeta$ is near one. However, even in this case the condition numbers will give a more realistic estimate. The reason is that for the second inequality in (3.5) to be realistic, the values of $(x^HAx)^2 + (x^HBx)^2$ and $(x^H\tilde Ax)^2 + (x^H\tilde Bx)^2$ must be nearly minimal, which is unlikely.

3.2. Eigenspaces

The theory of eigenspaces for definite pairs, like its counterpart for Hermitian matrices, is both simpler and more complex than the general theory. On the one hand, the assumption of definiteness simplifies the general theory; on the other hand the same assumption gives us more structure to exploit in extending the theory.

The basic fact of eigenspaces of definite pairs is that a right eigenspace is also a left eigenspace.
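Before the eigenspace theory is developed, the observations of Section 3.1 on condition numbers can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the diagonal pair uses the entries 1, 0.002 and 1, 0.001 as assumed illustrative values, and $\gamma(A, B)$ is merely estimated by sampling real unit vectors, which is adequate for the real 2-by-2 pair of Example 3.4 but is not a general method.

```python
import math

def quad(M, x):
    """Quadratic form x^T M x for a real 2x2 matrix M."""
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

def chordal(a, b, c, d):
    """Chordal distance between the eigenvalue pairs <a, b> and <c, d>."""
    return abs(a * d - b * c) / (math.hypot(a, b) * math.hypot(c, d))

def cond(A, B, x):
    """nu = ||x||^2 / sqrt((x^T A x)^2 + (x^T B x)^2) for an eigenvector x."""
    return sum(t * t for t in x) / math.hypot(quad(A, x), quad(B, x))

def gamma_sampled(A, B, samples=4000):
    """Estimate gamma(A, B) over real unit vectors (cos t, sin t) only."""
    best = math.inf
    for k in range(samples):
        t = math.pi * k / samples
        x = (math.cos(t), math.sin(t))
        best = min(best, math.hypot(quad(A, x), quad(B, x)))
    return best

# Diagonal pair (assumed illustrative entries): eigenvectors are unit vectors.
A0 = [[1.0, 0.0], [0.0, 0.002]]
B0 = [[1.0, 0.0], [0.0, 0.001]]
nu_one = cond(A0, B0, (1.0, 0.0))   # eigenvalue <1>
nu_two = cond(A0, B0, (0.0, 1.0))   # eigenvalue <2>

# Pair of Example 3.4 with eta = 1e-4.
eta = 1e-4
s = math.sqrt(eta)
A = [[1.0, 0.0], [0.0, -1.0]]
B = [[(1 + eta) / 2, (eta - 1) / 2], [(eta - 1) / 2, (1 + eta) / 2]]

# Eigenvectors (1, xi) with xi = (1 - eta) / (1 +- 2 sqrt(eta) + eta),
# written in the equivalent factored form; eigenvalues are +-1/sqrt(eta).
xis = [(1 - s) / (1 + s), (1 + s) / (1 - s)]
lams = [1 / s, -1 / s]

# Chordal distance between <x^T A x, x^T B x> and <lambda, 1>: should vanish.
res = [chordal(quad(A, (1.0, xi)), quad(B, (1.0, xi)), lam, 1.0)
       for xi, lam in zip(xis, lams)]
nus = [cond(A, B, (1.0, xi)) for xi in xis]
gamma = gamma_sampled(A, B)

print("diagonal pair condition numbers:", nu_one, nu_two)
print("Example 3.4 condition numbers:", nus)
print("1/gamma estimate:", 1 / gamma, " 1/(2 sqrt(eta)):", 1 / (2 * s))
```

In this sketch the second eigenvalue of the diagonal pair has a condition number several hundred times larger than the first, and for the pair of Example 3.4 the estimate of $\gamma(A, B)^{-1}$ is of order $\eta^{-1}$ while both condition numbers are close to $1/(2\sqrt{\eta})$, in line with the orders of magnitude stated in the example.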
Theorem 3.5. Let $(A, B)$ be definite. Let the columns of $X_1$ span an eigenspace of $(A, B)$. Then there is a matrix $X_2$ such that $(X_1\ X_2)$ is nonsingular and the pair $(A, B)$ has the spectral resolution

$$\begin{pmatrix} X_1^H \\ X_2^H \end{pmatrix} A\,(X_1\ X_2) = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \qquad (3.7)$$

and

$$\begin{pmatrix} X_1^H \\ X_2^H \end{pmatrix} B\,(X_1\ X_2) = \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}. \qquad (3.8)$$

Moreover, $X_1$ and $X_2$ may be chosen so that $A_1$, $A_2$, $B_1$, $B_2$ are diagonal (i.e., the columns of $(X_1\ X_2)$ are eigenvectors).

Proof. It is easily verified that $\mathcal{R}(X_1)$ is an eigenspace of the pair $(A\cos\varphi - B\sin\varphi,\ A\sin\varphi + B\cos\varphi)$. Hence by Theorem 1.18 we may assume that $B$ is positive definite. It follows that $X_1$ and $BX_1$ are acute (Definition III.3.2). Hence if the columns of $X_2$ form a basis for $\mathcal{R}(BX_1)^\perp$, the matrix $(X_1\ X_2)$ is nonsingular (Exercise III.3.2) and $X_2^HBX_1 = 0$. Since $\mathcal{R}(AX_1) \subset \mathcal{R}(BX_1)$ it follows that $X_2^HAX_1 = 0$, which establishes (3.7) and (3.8).

Since the pairs $(A_i, B_i)$ $(i = 1, 2)$ are definite, by Corollary 1.19 there are nonsingular matrices $U_i$ such that $(U_i^HA_iU_i,\ U_i^HB_iU_i)$ are diagonal. If we make the substitutions $X_i \leftarrow X_iU_i$ $(i = 1, 2)$, then $(X_1\ X_2)$ diagonalizes $(A, B)$. ∎

The second part of the theorem allows us to assume that the matrices $A_1$, $A_2$, $B_1$, and $B_2$ of the spectral resolution (3.7)-(3.8) have the form

$$\begin{aligned}
A_1 &= \operatorname{diag}(\alpha_1, \ldots, \alpha_k), & B_1 &= \operatorname{diag}(\beta_1, \ldots, \beta_k), \\
A_2 &= \operatorname{diag}(\alpha_{k+1}, \ldots, \alpha_n), & B_2 &= \operatorname{diag}(\beta_{k+1}, \ldots, \beta_n),
\end{aligned}
\qquad \alpha_i^2 + \beta_i^2 = 1, \quad i = 1, \ldots, n. \qquad (3.9)$$

In this case we will say that the spectral resolution (3.7)-(3.8) is NORMALIZED. Among other things, normalization implies that the columns of $X = (X_1\ X_2)$ satisfy

$$\|x_i\|_2^2 = \nu_i, \qquad i = 1, \ldots, n, \qquad (3.10)$$

and

$$\max_i \|x_i\|_2^2 \le \gamma^{-1}(A, B).$$

The normalization of the resolution allows us to give an explicit bound for the function dif in terms of the eigenvalues of the pair $(A, B)$.

Theorem 3.6. Let $A_1$, $A_2$, $B_1$, and $B_2$ satisfy (3.9), and let

$$\delta = \min_{\substack{1 \le i \le k \\ k+1 \le j \le n}} \chi[\langle\alpha_i, \beta_i\rangle, \langle\alpha_j, \beta_j\rangle].$$

Then

$$\frac{\delta}{\sqrt{2}} \le \operatorname{dif}[(A_1, B_1), (A_2, B_2)] \le \delta.$$

Proof. To establish the lower bound we must show that for all $R$ and $S$ the solution of the system

$$\begin{aligned}
QA_1 + A_2P &= R, \\
QB_1 + B_2P &= S
\end{aligned} \qquad (3.11)$$

satisfies

$$\|(P, Q)\|_F \le \frac{\sqrt{2}\,\|(R, S)\|_F}{\delta}. \qquad (3.12)$$

If we postmultiply the first of the equations (3.11) by $B_1$ and the second by $A_1$ and subtract, we get (remember that since $A_1$ and $B_1$ are diagonal, they commute)

$$A_2PB_1 - B_2PA_1 = RB_1 - SA_1.$$

Hence the $(i, j)$-element of $P$ is given by

$$\pi_{ij} = \frac{\rho_{ij}\beta_j - \sigma_{ij}\alpha_j}{\alpha_{i+k}\beta_j - \beta_{i+k}\alpha_j},$$

where $\rho_{ij}$ and $\sigma_{ij}$ denote the $(i, j)$-elements of $R$ and $S$. Since the diagonal pairs $(A_1, B_1)$ and $(A_2, B_2)$ are normalized,

$$|\pi_{ij}|^2 \le \frac{|\rho_{ij}|^2 + |\sigma_{ij}|^2}{\delta^2}. \qquad (3.13)$$

Hence

$$\|P\|_F \le \frac{\sqrt{\|R\|_F^2 + \|S\|_F^2}}{\delta} \le \frac{\sqrt{2}\,\|(R, S)\|_F}{\delta}.$$
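By a similar argument,

$$\|Q\|_F \le \frac{\sqrt{2}\,\|(R, S)\|_F}{\delta},$$

and (3.12) follows.

To establish the upper bound, we must show that there are matrices $R$ and $S$ such that the solution of the system (3.11) satisfies

$$\|(P, Q)\|_F \ge \frac{\|(R, S)\|_F}{\delta}.$$

Let the minimum in the definition of $\delta$ occur for the pairs $\langle\alpha_{k+i}, \beta_{k+i}\rangle$ and $\langle\alpha_j, \beta_j\rangle$. Let $R = \operatorname{sign}(\beta_j)\mathbf{1}_i\mathbf{1}_j^T$ and $S = -\operatorname{sign}(\alpha_j)\mathbf{1}_i\mathbf{1}_j^T$, so that $\|(R, S)\|_F = 1$. Then from (3.13),

$$\|(P, Q)\|_F \ge |\pi_{ij}| = \frac{|\alpha_j| + |\beta_j|}{\delta} \ge \frac{1}{\delta} = \frac{\|(R, S)\|_F}{\delta}. \qquad ∎$$

We may now combine all these facts into a perturbation theorem which is essentially a corollary of Theorem 2.14.

Theorem 3.7. Let the definite pair $(A, B)$ have the spectral resolution (3.7)-(3.8) satisfying (3.9). Let $\nu_i$ $(i = 1, \ldots, n)$ be the condition numbers of the eigenvalues of $(A, B)$, and set

$$\nu = \max_i \nu_i.$$

Given the Hermitian perturbations $\tilde A = A + E$ and $\tilde B = B + F$, set

$$\begin{pmatrix} X_1^H \\ X_2^H \end{pmatrix} E\,(X_1\ X_2) = \begin{pmatrix} E_{11} & E_{21}^H \\ E_{21} & E_{22} \end{pmatrix}, \qquad \begin{pmatrix} X_1^H \\ X_2^H \end{pmatrix} F\,(X_1\ X_2) = \begin{pmatrix} F_{11} & F_{21}^H \\ F_{21} & F_{22} \end{pmatrix},$$

and

$$\tilde\gamma = \|(E_{21}, F_{21})\|_F.$$

Let

$$\tilde\delta = \frac{1}{\sqrt{2}} \min_{\substack{1 \le i \le k \\ k+1 \le j \le n}} \chi[\langle\alpha_i, \beta_i\rangle, \langle\alpha_j, \beta_j\rangle] - n\nu\max\{\|E\|_2, \|F\|_2\}.$$

If

$$\frac{\tilde\gamma}{\tilde\delta} < \frac{1}{2},$$

then there are matrices $P$ and $Q$ satisfying

$$\|(P, Q)\|_F \le \frac{2\tilde\gamma}{\tilde\delta + \sqrt{\tilde\delta^2 - 4\tilde\gamma^2}} < \frac{2\tilde\gamma}{\tilde\delta} \qquad (3.14)$$

such that the columns of $\tilde X_1 = X_1 + X_2P$ and $\tilde X_2 = X_2 + X_1Q^H$ span complementary eigenspaces of $(\tilde A, \tilde B)$ corresponding to the pairs

$$(A_1 + E_{11} + E_{21}^HP + P^HE_{21} + P^H(A_2 + E_{22})P,\ \ B_1 + F_{11} + F_{21}^HP + P^HF_{21} + P^H(B_2 + F_{22})P) \qquad (3.15)$$

and

$$(A_2 + E_{22} + E_{21}Q^H + QE_{21}^H + Q(A_1 + E_{11})Q^H,\ \ B_2 + F_{22} + F_{21}Q^H + QF_{21}^H + Q(B_1 + F_{11})Q^H). \qquad (3.16)$$

Proof. We have

$$\|E_{11}\|_F \le \|X_1\|_F^2\|E\|_2 \le k\nu\|E\|_2,$$

the last inequality following from (3.10). Similarly $\|E_{22}\|_F \le (n - k)\nu\|E\|_2$, $\|F_{11}\|_F \le k\nu\|F\|_2$, and $\|F_{22}\|_F \le (n - k)\nu\|F\|_2$, so that $\tilde\delta$ is a lower bound on $\operatorname{dif}[(A_1 + E_{11}, B_1 + F_{11}), (A_2 + E_{22}, B_2 + F_{22})]$. The inequality (3.14) now follows from Theorem 2.14. The pairs (3.15) and (3.16) are obtained by considering the diagonalizing congruences

$$\begin{pmatrix} I & P^H \\ Q & I \end{pmatrix}\begin{pmatrix} A_1 + E_{11} & E_{21}^H \\ E_{21} & A_2 + E_{22} \end{pmatrix}\begin{pmatrix} I & Q^H \\ P & I \end{pmatrix}$$

and

$$\begin{pmatrix} I & P^H \\ Q & I \end{pmatrix}\begin{pmatrix} B_1 + F_{11} & F_{21}^H \\ F_{21} & B_2 + F_{22} \end{pmatrix}\begin{pmatrix} I & Q^H \\ P & I \end{pmatrix}. \qquad ∎$$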
When $E$ and $F$ are sufficiently small, the bound (3.14) assumes the asymptotic form

$$\|(P, Q)\|_F \lesssim \frac{\|(E_{21}, F_{21})\|_F}{\delta},$$

where

$$\delta = \frac{1}{\sqrt{2}} \min_{\substack{1 \le i \le k \\ k+1 \le j \le n}} \chi[\langle\alpha_i, \beta_i\rangle, \langle\alpha_j, \beta_j\rangle].$$

Since $\|E_{21}\|_F \le \|X_1\|_F\|X_2\|_F\|E\|_2$ and similarly for $\|F_{21}\|_F$, we have

$$\|(P, Q)\|_F \lesssim \frac{\|X_1\|_F\|X_2\|_F\max\{\|E\|_2, \|F\|_2\}}{\delta}.$$

But $\tilde X_1 - X_1 = X_2P$. Hence

$$\frac{\|\tilde X_1 - X_1\|_F}{\|X_1\|_F} \lesssim \frac{\|X_2\|_F^2\max\{\|E\|_2, \|F\|_2\}}{\delta} \le \frac{(n - k)\nu\max\{\|E\|_2, \|F\|_2\}}{\delta}. \qquad (3.17)$$

Thus the ratio of the overall condition of the eigenvalues to their separation is a condition number for the problem.

3.3. Direct Bounds

We conclude this chapter with three direct bounds for eigenspaces, which we state without proof. The first bound is for an eigenvector.

Theorem 3.8. Let $x_1$ be an eigenvector of the definite pair $(A, B)$ with eigenvalue $\langle\lambda_1\rangle$. Suppose that the definite pair $(\tilde A, \tilde B)$ has $n - 1$ eigenvalues $\langle\tilde\lambda_i\rangle$ $(i = 2, \ldots, n)$ such that

$$\delta \equiv \min_{i > 1} \chi(\langle\lambda_1\rangle, \langle\tilde\lambda_i\rangle) > 0.$$

Let

$$\epsilon = \sqrt{\|E\|_2^2 + \|F\|_2^2}.$$

If $\epsilon/\delta < \gamma(\tilde A, \tilde B)$, then there is a vector $p_1$ satisfying

$$\frac{\|p_1\|_2}{\|x_1\|_2} \le \frac{\epsilon}{\delta\,\gamma(\tilde A, \tilde B)} < 1 \qquad (3.18)$$

such that $x_1 + p_1$ is an eigenvector of $(\tilde A, \tilde B)$ corresponding to the remaining eigenvalue $\langle\tilde\lambda_1\rangle$.

Note that this bound is closely related to the asymptotic bound (3.17). The factor $(n - k)\nu/\delta$, which multiplies the error in (3.17), corresponds to the factor $\gamma(\tilde A, \tilde B)^{-1}/\delta$ in (3.18).

We turn now to theorems that bound the sine of the canonical angles between eigenspaces. As usual they come in two varieties: one in the Frobenius norm that requires no restrictions on the situation of the eigenvalues and one in any unitarily invariant norm that requires the eigenvalues to be suitably clustered.

Theorem 3.9. Let the definite pair $(A, B)$ be decomposed as in (3.7) and (3.8), where $X_1$ and $X_2$ have orthonormal columns. Let the analogous decomposition be given for the pair $(\tilde A, \tilde B) = (A + E, B + F)$. If

$$\delta \equiv \min\{\chi(\langle\lambda\rangle, \langle\tilde\lambda\rangle) : \langle\lambda\rangle \in \mathcal{L}[(A_1, B_1)],\ \langle\tilde\lambda\rangle \in \mathcal{L}[(\tilde A_2, \tilde B_2)]\} > 0,$$

then

$$\|\sin\Theta[\mathcal{R}(X_1), \mathcal{R}(\tilde X_1)]\|_F \le \frac{\sqrt{\|A^2 + B^2\|_2}}{\gamma(A, B)\gamma(\tilde A, \tilde B)} \cdot \frac{\sqrt{\|EX_1\|_F^2 + \|FX_1\|_F^2}}{\delta}.$$

Finally, we state a $\sin\Theta$ theorem that is valid for all unitarily invariant norms.

Theorem 3.10. Let the definite pair $(A, B)$ be decomposed as in (3.7) and (3.8), where $X_1$ and $X_2$ have orthonormal columns. Let the analogous decomposition be given for the pair $(\tilde A, \tilde B) = (A + E, B + F)$. Suppose that there are numbers $\alpha \ge 0$ and $\delta > 0$ with $\alpha + \delta \le 1$ such that for some real number $\gamma$,

$$\mathcal{L}[(A_1, B_1)] \subset \{\langle\lambda\rangle : \chi(\langle\lambda\rangle, \langle\gamma\rangle) \le \alpha\}$$

and

$$\mathcal{L}[(\tilde A_2, \tilde B_2)] \subset \{\langle\lambda\rangle : \chi(\langle\lambda\rangle, \langle\gamma\rangle) \ge \alpha + \delta\}.$$

Then, for any unitarily invariant norm $\|\cdot\|$,

$$\|\sin\Theta[\mathcal{R}(X_1), \mathcal{R}(\tilde X_1)]\| \le \pi(\alpha, \delta; \gamma)\,\frac{\sqrt{\|A^2 + B^2\|_2}}{\gamma(A, B)\gamma(\tilde A, \tilde B)} \cdot \frac{\sqrt{\|EX_1\|^2 + \|FX_1\|^2}}{\delta},$$

where

$$\pi(\alpha, \delta; \gamma) = \begin{cases}
\sqrt{2}\,\dfrac{(\alpha + \delta) + \alpha\sqrt{1 - (\alpha + \delta)^2}}{2\alpha + \delta} & \text{if } \gamma \ne 0, \\[2ex]
\dfrac{(\alpha + \delta) + \alpha\sqrt{1 - (\alpha + \delta)^2}}{2\alpha + \delta} & \text{if } \gamma = 0.
\end{cases}$$

Notes and References

The connection of the number $\gamma(A, B)$ with perturbation theory for definite pairs was first noted by Crawford [48, 1976], who used it to derive bounds on the spectral variation of matrix pencils. Stewart [207, 1979] introduced the angles associated with the eigenvalues and used the induced ordering to bound the matching distance. The sharper form of the theorem given here is due to Sun [222, 1982].

Theorem 3.8 is a special case of a theorem of Stewart [207, 1979]. The $\sin\Theta$ theorems are due to Sun [223, 225, 1983]. The second paper also contains $\sin 2\Theta$ theorems. A troublesome feature of the $\sin\Theta$ theorems is the appearance of two infima in the denominator of the bounds. Whether both of them should be there is an open question.

Just as the singular value decomposition is related to an associated semidefinite matrix, the generalized singular value decomposition (Exercise I.5.8) is related to an associated definite pair. Sun [223, 224, 1983] gives perturbation bounds for the generalized singular value decomposition. Paige [173, 1984] gives a different derivation and bounds on the CS decomposition.

Exercises

1. Show that the ill-conditioning of one eigenvalue of a definite pair can infect the others by considering the matrices

$$A = \operatorname{diag}(1, 10^{-8}), \qquad B = \operatorname{diag}(1, 2 \cdot 10^{-8}),$$

$$\tilde A = \begin{pmatrix} 1 & \sqrt{2} \cdot 10^{-8} \\ \sqrt{2} \cdot 10^{-8} & 2 \cdot 10^{-8} \end{pmatrix},$$

and $\tilde B = B$.

2. Show that the eigenangles $\theta(\alpha, \beta)$ and $\theta(\tilde\alpha, \tilde\beta)$ satisfy

$$\sin|\theta(\tilde\alpha, \tilde\beta) - \theta(\alpha, \beta)| = \chi(\langle\alpha, \beta\rangle, \langle\tilde\alpha, \tilde\beta\rangle).$$

References

[1] N. N. Abdelmalek (1974). "On the Solution of Least Squares Problems and Pseudo-Inverses." Computing 13, 215-228.
[2] S. N. Afriat (1956). "On the Latent Vectors and Characteristic Values of Products of Pairs of Symmetric Idempotents." Quarterly Journal of Mathematics 7, 76-78.
[3] S. N. Afriat (1957). "Orthogonal and Oblique Projectors and the Characteristics of Pairs of Vector Spaces." Proceedings of the Cambridge Philosophical Society 53, 800-816.
[4] A. R. Amir-Moez (1956). "Extremal Properties of Eigenvalues of a Hermitian Transformation and the Singular Values of the Sum and Product of Linear Transformations." Duke Mathematical Journal 23, 463-476.
[5] T. Ando (1989). "Majorization, Doubly Stochastic Matrices, and Comparison of Eigenvalues." Linear Algebra and Its Applications 118, 163-248.
[6] M. Arioli, I. S. Duff, and P. P. M. van Rijk (1989). "On the Augmented System Approach to Sparse Least-Squares Problems." Numerische Mathematik 55, 667-685.
[7] L. Autonne (1902). "Sur les groupes lineaires, reels et orthogonaux." Bulletin de la Societe Mathematique de France 30, 121-134.
[8] L. Autonne (1913). "Sur les matrices hypohermitiennes et les unitaires." Comptes Rendus de l'Academie des Sciences, Paris 156, 858-860.
[9] S. Banach (1922). "Sur les operations dans les ensembles abstraits et leur application aux equations integrales." Fundamenta Mathematicae 3, 133-181.
[10] S. Banach (1929). "Sur les functionnelles lineaires II." Studia Mathematica 1, 223-239.
[11] R. H. Bartels and G. W. Stewart (1972). "Algorithm 432: The Solution of the Matrix Equation AX - BX = C." Communications of the ACM 8, 820-826.
[12] H. Bateman (1908). "A Formula for the Solving Function of a Certain Integral Equation of the Second Kind." Cambridge Philosophical Transactions 20, 179-187.
[13] F. L. Bauer (1963). "Optimally Scaled Matrices." Numerische Mathematik 5, 73-87.
[14] F. L. Bauer (1966). "Genauigkeitsfragen bei der Lösung linearer Gleichungssysteme." Zeitschrift für angewandte Mathematik und Mechanik 46, 409-421.
[15] F. L. Bauer and C. T. Fike (1960). "Norms and Exclusion Theorems." Numerische Mathematik 2, 137-141.
[16] F. L. Bauer and A. S. Householder (1960). "Moments and Characteristic Roots." Numerische Mathematik 2, 42-43.
[17] H. Baumgärtel (1972). Endlichdimensionale Analytische Störungstheorie. Akademie-Verlag, Berlin. Cited in [28].
[18] C. Bavely and G. W. Stewart (1979). "An Algorithm for Computing Reducing Subspaces by Block Diagonalization." SIAM Journal on Numerical Analysis 16, 359-367.
[19] A. E. Beaton, D. B. Rubin, and J. L. Barone (1976). "The Acceptability of Regression Solutions: Another Look at Computational Accuracy." Journal of the American Statistical Association 71, 158-168.
[20] E. F. Beckenbach and R. Bellman (1971). Inequalities. Springer, New York.
[21] R. Bellman (1970). Introduction to Matrix Analysis. McGraw-Hill, New York.
[22] D. A. Belsley, A. E. Kuh, and R. E. Welsch (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley and Sons, New York.
[23] E. Beltrami (1873). "Sulle Funzioni Bilineari." Giornale di Matematiche ad Uso Degli Studenti Delle Universita 11, 98-106.
[24] A. Ben-Israel (1966). "On Error Bounds for Generalized Inverses." SIAM Journal on Numerical Analysis 3, 585-592.
[25] I. Bendixson (1902). "Sur les racines d'une equation fondemental." Acta Mathematica 3, 359-366.
[26] P. G. Bergman, R. Penfield, R. Schiller, and H. Zatkis (1950). "The Hamiltonian of the General Theory of Relativity with Electromagnetic Field." Physical Review 52, 1950.
[27] E. Berkson (1963). "Some Metrics on the Subspaces of a Banach Space." Pacific Journal of Mathematics 13, 7-22.
[28] R. Bhatia (1987). Perturbation Bounds for Matrix Eigenvalues. Pitman Research Notes in Mathematics. Longman Scientific & Technical, Harlow, Essex. Published in the USA by John Wiley.
[29] R. Bhatia and C. Davis (1984). "A Bound for the Spectral Variation of a Unitary Operator." Linear and Multilinear Algebra 15, 71-76.
[30] R. Bhatia, C. Davis, and P. Koosis (1987). "An Extremal Problem in Fourier Analysis with Applications to Operator Theorem." Preprint cited in [28].
[31] R. Bhatia, C. Davis, and A. McIntosh (1983). "Perturbation of Spectral Subspaces and Solution of Linear Operator Equations." Linear Algebra and Its Applications 52-53, 45-67.
[32] R. Bhatia and J. A. R. Holbrook (1985). "Short Normal Paths and Spectral Variation." Proceedings of the American Mathematical Society 94, 377-382.
[33] G. D. Birkhoff (1946). "Tres Observaciones Sobre el Algebra Lineal." Universidad Nacional de Tucuman Revista, Serie A 5, 147-151.
[34] A. Bjerhammer (1951). "Rectangular Reciprocal Matrices with Special Reference to Geodetic Calculations." Bulletin Geodesique 52, 118-220.
[35] A. Björck (1967). "Solving Linear Least Squares Problems by Gram-Schmidt Orthogonalization." BIT 7, 1-21.
[36] A. Björck (1989). "Componentwise Backward Errors and Condition Estimates for Linear Least Square Problems." Technical Report LiTH-MATH-R-1989-13, Department of Mathematics, Linköping University.
[37] A. Björck and G. H. Golub (1973). "Numerical Methods for Computing Angles between Linear Subspaces." Mathematics of Computation 27, 579-594.
[38] Åke Björck (1987). "Least Squares Methods." Working paper, Department of Mathematics, Linköping University. To appear in Handbook of Numerical Analysis, V.1: Solution of Equations in R^n, P. G. Ciarlet and J. L. Lions editors, Elsevier, North Holland.
[39] C. W. Borchart (1857). "Bemerkung über die beiden vorstehenden Aufsätze." Journal für die reine und angewandte Mathematik 53, 281-283.
[40] J. R. Bunch, J. W. Demmel, and C. F. Van Loan (1989). "The Strong Stability of Algorithms for Solving Symmetric Linear Systems." SIAM Journal on Matrix Analysis and Applications 10, 494-499.
[41] A. L. Cauchy (1821). "Cours d'analyse de l'Ecole Royale Polytechnique." In Oeuvres Completes (IIe Serie), volume 3.
[42] A. L. Cauchy (1829). "Sur l'equation à l'aide de laquelle on determine les inegalites seculaires des mouvements des planetes." In Oeuvres Completes (IIe Serie), volume 9.
[43] F. Chatelin (1983). Spectral Approximation of Linear Operators. Academic Press, New York.
[44] P. L. Chebyshev (1859). "Sur l'interpolation par la methode des moindres carres." Mémoires de l'Academie Imperiale des sciences de St.-Petersbourg, VIIe serie 15, 1-24.
[45] A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson (1979). "An Estimate for the Condition Number of a Matrix." SIAM Journal on Numerical Analysis 16, 368-375.
[46] R. Courant (1920). "Ueber die Eigenwert bei den Differentialgleichungen der Mathematischen Physik." Mathematische Zeitschrift 7, 1-57.
[47] P. J. Courtois and P. Semal (1984). "Error Bounds for the Analysis by Decomposition of Non-Negative Matrices." In G. Iazeolla, P. J. Courtois, and A. Hordijk, editors, Mathematical Computer Performance and Reliability, pages 287-302. Elsevier, North Holland.
[48] C. R. Crawford (1976). "A Stable Generalized Eigenvalue Problem." SIAM Journal on Numerical Analysis 13, 854-860.
[49] R. B. Davies and B. Hutton (1975). "The Effects of Errors in the Independent Variables in Linear Regression." Biometrika 62, 383-391.
[50] C. Davis (1963). "The Rotation of Eigenvectors by a Perturbation." Journal of Mathematical Analysis and Applications 6, 159-173.
[51] C. Davis (1965). "The Rotation of Eigenvectors by a Perturbation. II." Journal of Mathematical Analysis and Applications 11, 20-27.
[52] C. Davis, W. Kahan, and H. Weinberger (1982). "Norm-Preserving Dilations and Their Applications to Optimal Error Bounds." SIAM Journal on Numerical Analysis 19, 445-469.
[53] C. Davis and W. M. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation. III." SIAM Journal on Numerical Analysis 7, 1-46.
[54] H. P. Decell (1972). "On the Derivative of the Generalized Inverse of a Matrix." Linear and Multilinear Algebra 1, 357-359.
[55] J. W. Demmel (1987). "On Condition Numbers and the Distance to the Nearest Ill-Posed Problem." Numerische Mathematik 51, 251-290.
[56] J. E. Dennis and J. J. More (1977). "Quasi-Newton Methods, Motivations and Theory." SIAM Review 19, 46-89.
[57] J. E. Dennis and R. B. Schnabel (1979). "Least Change Secant Updates for Quasi-Newton Methods." SIAM Review 21, 443-459.
[58] J. Desplanques (1887). "Théorème d'algèbre." J. de Math. Spec. 9, 12-13. Cited in [152].
[59] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart (1979). LINPACK User's Guide. SIAM, Philadelphia.
[60] P. Van Dooren (1979). "The Computation of Kronecker's Canonical Form." Linear Algebra and Its Applications 1979, 103-140.
[61] M. P. Drazin (1958). "Pseudo-Inverses in Associative Rings and Semigroups." American Mathematical Monthly 65, 506-514.
[62] L. Dulmage and I. Halperin (1955). "On a Theorem of Frobenius-Konig and J. von Neuman's Game of Hide and Seek." Transactions of the Royal Society of Canada, Section 3, Third Series 49, 23-25.
[63] C. Eckart and G. Young (1936). "The Approximation of One Matrix by Another of Lower Rank." Psychometrika 1, 211-218.
[64] L. Eldén (1983). "A Weighted Pseudoinverse, Generalized Singular Values, and Constrained Least Squares Problems." BIT 22, 487-502.
[65] L. Elsner (1982). "On the Variation of the Spectra of Matrices." Linear Algebra and Its Applications 47, 127-138.
[66] L. Elsner (1985). "An Optimal Bound for the Spectral Variation of Two Matrices." Linear Algebra and Its Applications 71, 77-80.
[67] L. Elsner and J.-G. Sun (1982). "Perturbation Theorems for the Generalized Eigenvalue Problem." Linear Algebra and Its Applications 48, 341-357.
[68] V. N. Faddeeva (1959). Computational Methods of Linear Algebra. Dover, New York. Translated from the Russian by C. D. Benster.
[69] S. Falk (1965). "Einschliessungssätze für die Eigenvektoren normaler Matrizenpaare." Zeitschrift für Angewandte Mathematik und Mechanik 45, 47-56.
[70] K. Fan (1951). "Maximum Properties and Inequalities for the Eigenvalues of Completely Continuous Operators." Proceedings of the National Academy of Sciences 37, 760-766.
[71] K. Fan and A. J. Hoffman (1955). "Some Metric Inequalities in the Space of Matrices." Proceedings of the American Mathematical Society 6, 111-116.
[72] D. B. Feingold and R. S. Varga (1962). "Block Diagonally Dominant Matrices and Generalizations of the Gerschgorin Circle Theorem." Pacific Journal of Mathematics 12, 1241-1250.
[73] W. Feller and G. E. Forsythe (1951). "New Matrix Transformations for Obtaining Characteristic Vectors." Quarterly of Applied Mathematics 8, 325-331.
[74] E. Fischer (1905). "Über quadratische Formen mit reelen Koffizienten." Monatshefte für Mathematik und Physik 16, 234-249.
[75] J. G. F. Francis (1961, 1962). "The QR Transformation, Parts I and II." Computer Journal 4, 265-271, 332-345.
[76] F. G. Frobenius (1911). "Über den von L. Bieberbach gefundenen Beweis eines Satzes von C. Jordan." Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 492-501. In [79, v. 3, pp. 492-501].
[77] F. G. Frobenius (1911). "Über die unzerlegbaren diskreten Bewegungsgruppen." Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 507-518. In [79, v. 3, pp. 507-518].
[78] F. G. Frobenius (1912). "Über Matrizen aus nicht negativen Elementen." Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 456-477. In [79, v. 3, pp. 546-567].
[79] F. G. Frobenius (1968). Ferdinand Georg Frobenius. Gesammelte Abhandlungen (J.-P. Serre editor). Springer Verlag, Berlin.
[80] W. A. Fuller (1987). Measurement Error Models. John Wiley, New York.
[81] F. R. Gantmacher (1959). The Theory of Matrices, Vols. I, II. Chelsea Publishing Company, New York.
[82] C. F. Gauss (1809). Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium. Perthes and Besser, Hamburg.
[83] C. F. Gauss (1809). Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections. Dover, New York (1963). C. H. Davis, Trans.
[84] C. F. Gauss (1821). "Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Pars Prior." In Werke, IV, pages 1-26. Königlichen Gesellschaft der Wissenschaften zu Göttingen (1880).
[85] S. A. Gerschgorin (1931). "Über die Abgrenzung der Eigenwerte einer Matrix." Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat. 6, 749-754.
[86] I. Gohberg, P. Lancaster, and L. Rodman (1982). Matrix Polynomials. Academic Press, New York.
[87] I. Gohberg, P. Lancaster, and L. Rodman (1986). Invariant Subspaces of Matrices with Applications. John Wiley, New York.
[88] G. H. Golub (1965). "Numerical Methods for Solving Least Squares Problems." Numerische Mathematik 7, 206-216.
[89] G. H. Golub, S. Nash, and C. Van Loan (1979). "Hessenberg-Schur Method for the Problem AX + XB = C." IEEE Transactions on Automatic Control AC-24, 909-913.
[90] G. H. Golub and V. Pereyra (1973). "The Differentiation of Pseudoinverses and Nonlinear Least Squares Problems Whose Variables Separate." SIAM Journal on Numerical Analysis 10, 413-432.
[91] G. H. Golub and V. Pereyra (1976). "Differentiation of Pseudoinverses, Separable Nonlinear Least Squares Problems and Other Tales." In M. Z. Nashed, editor, Generalized Inverses and Applications, pages 303-324. Academic Press, New York.
[92] G. H. Golub and C. F. Van Loan (1980). "An Analysis of the Total Least Squares Problem." SIAM Journal on Numerical Analysis 17, 883-893.
[93] G. H. Golub and C. F. Van Loan (1983). Matrix Computations. Johns Hopkins University Press, Baltimore, Maryland.
[94] G. H. Golub and J. H. Wilkinson (1966). "Note on the Iterative Refinement of Least Squares Solution." Numerische Mathematik 9, 139-148.
[95] G. H. Golub and J. H. Wilkinson (1976). "Ill-Conditioned Eigensystems and the Computation of the Jordan Canonical Form." SIAM Review 18, 578-619.
[96] W. B. Gragg and G. W. Stewart (1976). "A Stable Variant of the Secant Method for Solving Nonlinear Equations." SIAM Journal on Numerical Analysis 14, 880-903.
[97] J. P. Gram (1883). "Über die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate." Journal für die reine und angewandte Mathematik 94, 41-73.
[98] W. H. Greub (1967). Linear Algebra. Springer-Verlag, New York.
[99] H. Hahn (1927). "Über lineare Gleichungssysteme in linearen Räumen." Journal für die reine und angewandte Mathematik 157, 214-229.
[100] P. Hall (1935). "On Representation of Subsets." Journal of the London Mathematical Society 10, 26-30.
[101] P. R. Halmos (1950). "Normal Dilations and Extensions of Operators." Summa Brasiliensis Math. 2, 125-134. Cited in [237].
[102] R. J. Hanson and C. L. Lawson (1969). "Extensions and Applications of the Householder Algorithm for Solving Linear Least Squares Problems." Mathematics of Computation 23, 787-812.
[103] G. H. Hardy, J. E. Littlewood, and G. Pólya (1934). Inequalities. Cambridge University Press, Cambridge, England.
[104] F. Hausdorff (1914). Grundzüge der Mengenlehre. Chelsea, New York. Reprinted by Chelsea, 1949.
[105] F. Hausdorff (1919). "Der Wertvorrat einer Bilinearform." Mathematische Zeitschrift 3, 314-316.
[106] M. Haviv and L. van der Heyden (1984). "Perturbation Bounds for the Stationary Probabilities of a Finite Markov Chain." Advances in Applied Probability 16, 804-818.
[107] J. Z. Hearon and J. W. Evans (1968). "Differentiable Generalized Inverses." Journal of Research of the National Bureau of Standards, Series B 72, 109-113.
[108] H. V. Henderson and S. R. Searle (1981). "On Deriving the Inverse of a Sum of Matrices." SIAM Review 23, 53-60.
[109] P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation and Fields of Values of Nonnormal Matrices." Numerische Mathematik 4, 24-39.
[110] C. Hermite (1857). "Extrait d'une lettre de M. C. Hermite à M. Borchardt sur l'invariabilité des carrés positifs et des carrés négatifs dans la transformation des polynômes homogènes du second degré." Journal für die reine und angewandte Mathematik 53, 271-274.
[111] N. J. Higham (1987). "A Survey of Condition Number Estimation for Triangular Matrices." SIAM Review 29, 575-596.
[112] N. J. Higham (1989). "Computing Error Bounds for Regression Problems." Manuscript to appear in the Proceedings of the AMS Conference on Measurement Error Models, Humboldt, CA.
[113] N. J. Higham (1989). "How Accurate is Gaussian Elimination?" Technical Report TR 89-1024, Department of Computer Science, Cornell University. To appear in the proceedings of the 13th Dundee Biennial Conference on Numerical Analysis.
[114] N. J. Higham and G. W. Stewart (1987). "Numerical Linear Algebra in Statistical Computing." In A. Iserles and M. J. D. Powell, editors, The State of the Art in Numerical Analysis, pages 41-57. Clarendon Press, Oxford.
[115] A. Hirsch (1902). "Sur les racines d'une équation fondamentale (Extrait d'une lettre de M. A. Hirsch à M. I. Bendixson)." Acta Mathematica 25, 367-370.
[116] S. D. Hodges and P. G. Moore (1972). "Data Uncertainties and Least Squares Regression." Applied Statistics 21, 185-195.
[117] A. J. Hoffman and H. W. Wielandt (1953). "The Variation of the Spectrum of a Normal Matrix." Duke Mathematical Journal 20, 37-39.
[118] O. Hölder (1899). "Über einen Mittelwertsatz." Götting. Nachr., pages 38-47. Cited in [20].
[119] H. Hotelling (1933). "Analysis of a Complex of Statistical Variables into Principal Components." Journal of Educational Psychology 24, 417-441 and 498-520.
[120] A. S. Householder (1958). "Unitary Triangularization of a Nonsymmetric Matrix." Journal of the Association for Computing Machinery 5, 339-342.
[121] A. S. Householder (1964). The Theory of Matrices in Numerical Analysis. Dover Publishing, New York. Originally published by Ginn Blaisdell.
[122] Vasile I. Istratescu (1981). Introduction to Linear Operator Theory. Marcel Dekker, New York.
[123] C. G. J. Jacobi (1857, posthumous). "Über eine elementare Transformation eines in Bezug auf jedes von zwei Variablen-Systemen linearen und homogenen Ausdrucks." Journal für die reine und angewandte Mathematik 53, 265-270.
[124] C. Jordan (1870). Traité des Substitutions et des Équations Algébriques. Paris. Cited in [150].
[125] C. Jordan (1874). "Mémoire sur les formes bilinéaires." Journal de Mathématiques Pures et Appliquées, Deuxième Série 19, 35-54.
[126] C. Jordan (1875). "Essai sur la géométrie à n dimensions." Bulletin de la Société Mathématique 3, 103-174.
[127] P. Jordan and J. von Neumann (1935). "On Inner Products in Linear Metric Spaces." Annals of Mathematics 36, 719-723.
[128] B. Kågström and A. Ruhe (1980). "An Algorithm for Numerical Computation of the Jordan Normal Form of a Complex Matrix." Transactions on Mathematical Software 6, 398-419.
[129] W. Kahan (1966). "Numerical Linear Algebra." Canadian Mathematical Bulletin 9, 757-801.
[130] W. Kahan (1967). "Inclusion Theorems for Clusters of Eigenvalues of Hermitian Matrices." Technical report, Computer Science Department, University of Toronto.
[131] W. Kahan (1972). "Conserving Confluence Curbs Ill-Conditioning." Technical Report 6, Computer Science Department, University of California, Berkeley.
[132] W. Kahan (1973). "Every n x n Matrix Z with Real Spectra Satisfies ||Z - Z*|| ≤ ||Z + Z*||(log n + 0.038)." Proceedings of the American Mathematical Society 39, 235-241.
[133] W. Kahan (1975). "Spectra of Nearly Hermitian Matrices." Proceedings of the American Mathematical Society 48, 11-17.
[134] W. Kahan, B. N. Parlett, and E. Jiang (1982). "Residual Bounds on Approximate Eigensystems of Nonnormal Matrices." SIAM Journal on Numerical Analysis 19, 470-484.
[135] T. Kato (1966). Perturbation Theory for Linear Operators. Springer-Verlag, New York.
[136] D. König (1916). "Über Graphen und ihre Anwendung auf Determinantentheorie und Mengenlehre." Mathematische Annalen 77, 453-465.
[137] M. G. Krein and M. A. Krasnoselski (1947). "Fundamental Theorems Concerning the Extension of Hermitian Operators and Some of Their Applications to the Theory of Orthogonal Polynomials and the Moment Problem." Uspekhi Mat. Nauk. 2. In Russian. Cited in [27].
[138] M. G. Krein, M. A. Krasnoselski, and D. P. Milman (1948). "Concerning the Deficiency Numbers of Linear Operators in Banach Space and Some Geometric Questions." Sbornik Trudov Inst. A. N. Ukr. S. S. R. 11. In Russian. Cited in [27].
[139] L. Kronecker (1890). "Algebraische Reduction der Schaaren bilinearer Formen." Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, pages 1225-1237.
[140] Peter Lancaster and Miron Tismenetsky (1985). The Theory of Matrices. Academic Press, New York.
[141] P. S. Laplace (1820). Théorie analytique des probabilités (3rd ed.), premier supplément: Sur l'application du calcul des probabilités à la philosophie naturelle. Oeuvres, v. 7. Gauthier-Villars. Supplement published before 1820.
[142] C. L. Lawson and R. J. Hanson (1974). Solving Least Squares Problems. Prentice Hall, Englewood Cliffs, New Jersey.
[143] A. M. Legendre (1805). Nouvelles méthodes pour la détermination des orbites des comètes. Courcier, Paris. Cited in [219].
[144] N. J. Lehmann (1963). "Optimale Eigenwerteinschließungen." Numerische Mathematik 5, 246-272.
[145] N. J. Lehmann (1966). "Zur Verwendung optimaler Eigenwerteingrenzungen bei der Lösung symmetrischer Matrizenaufgaben." Numerische Mathematik 8, 42-55.
[146] L. Lévy (1881). "Sur la possibilité du l'équilibre électrique." Comptes Rendus de l'Académie des Sciences, Paris 93, 706-708.
[147] V. B. Lidskii (1950). "The Proper Values of the Sum and Product of Symmetric Matrices." Doklady Akademii Nauk SSSR 75, 769-772. In Russian. Translation by C. Benster available from the National Translation Center of the Library of Congress.
[148] A. Loewy (1898). "Sur les formes quadratiques définies à indéterminées conjuguées de M. Hermite." C. R. Acad. Sci. Paris 123, 168-171. Cited in [150, p. 79].
[149] Qi-keng Lu (1963). "The Elliptic Geometry of Extended Space." Chinese Mathematics 4, 54-69. Translation of an article appearing in Acta Mathematica Sinica, 13 (1963).
[150] C. C. MacDuffee (1946). The Theory of Matrices. Chelsea, New York.
[151] M. Marcus (1960). Basic Theorems in Matrix Theory. Applied Mathematics Series #57. National Bureau of Standards, Washington, D.C.
[152] M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities. Allyn and Bacon, Boston.
[153] J. L. Massera and J. J. Schäffer (1958). "Linear differential equations and functional analysis I." Annals of Math. 67, 517-573. Cited in [27].
[154] H. I. Medley and R. S. Varga (1968). "On Smallest Isolated Gerschgorin Disks for Eigenvalues. III." Numerische Mathematik 11, 361-369.
[155] C. Meyer and G. W. Stewart (1988). "Derivatives and Perturbations of Eigenvectors." SIAM Journal on Numerical Analysis 25, 679-691.
[156] H. Minkowski (1896). Geometrie der Zahlen. I. B. G. Teubner, Leipzig. Cited in [20].
[157] H. Minkowski (1911, posthumous). "Theorie der konvexen Körper, insbesondere Begründung ihres Oberflächenbegriffs." In David Hilbert, editor, Minkowski Abhandlung. Teubner Verlag.
[158] L. Mirsky (1960). "Symmetric Gauge Functions and Unitarily Invariant Norms." Quarterly Journal of Mathematics 11, 50-59.
[159] L. Mirsky (1963). "Results and Problems in the Theory of Doubly Stochastic Matrices." Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 1, 319-334.
[160] D. S. Mitrinović (1970). Analytic Inequalities. Springer, New York.
[161] C. Moler and G. W. Stewart (1973). "An Algorithm for Generalized Matrix Eigenvalue Problems." SIAM Journal on Numerical Analysis 10, 241-256.
[162] E. H. Moore (1920). "On the Reciprocal of the General Algebraic Matrix." Bulletin of the American Mathematical Society 26, 394-395. Abstract.
[163] M. Z. Nashed and L. B. Rall (1976). "Annotated Bibliography on Generalized Inverses and Applications." In M. Z. Nashed, editor, Generalized Inverses and Applications, pages 771-1041. Academic Press, New York.
[164] W. Oettli and W. Prager (1964). "Compatibility of Approximate Solution of Linear Equations with Given Error Bounds for Coefficients and Right-Hand Sides." Numerische Mathematik 6, 405-409.
[165] D. P. O'Leary (1989). "On Bounds for Scaled Projections and Pseudo-Inverses." To appear in Linear Algebra and Its Applications.
[166] J. M. Ortega and W. C. Rheinboldt (1970). Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York.
[167] A. Ostrowski (1951). "Ueber das Nichtverschwinden einer Klasse von Determinanten und die Lokalisierung der charakteristischen Wurzeln von Matrizen." Compositio Mathematica 9, 209-226.
[168] A. Ostrowski (1952). "Sur quelques applications des fonctions convexes et concaves au sens de I. Schur." Journal de Mathématiques Pures et Appliquées 31, 253-292.
[169] A. Ostrowski (1957). "Über die Stetigkeit von charakteristischen Wurzeln in Abhängigkeit von den Matrizenelementen." Jahresberichte der Deutsche Mathematische Verein 60, 40-42.
[170] D. V. Ouellette (1981). "Schur Complement and Statistics." Linear Algebra and Its Applications 36, 187-295.
[171] C. C. Paige (1979). "Computer Solution and Perturbation Analysis of Generalized Linear Least Squares Problems." Mathematics of Computation 33, 171-184.
[172] C. C. Paige (1979). "Fast Numerically Stable Computations for Generalized Least Squares Problems." SIAM J. on Numerical Analysis 16, 165-171.
[173] C. C. Paige (1984). "A Note on a Result of Sun Ji-guang: Sensitivity of the CS and GSV Decomposition." SIAM Journal on Numerical Analysis 21, 186-191.
[174] C. C. Paige and M. A. Saunders (1981). "Toward a Generalized Singular Value Decomposition." SIAM Journal on Numerical Analysis 18, 398-405.
[175] B. N. Parlett (1980). The Symmetric Eigenvalue Problem. Prentice-Hall, Englewood Cliffs, New Jersey.
[176] M. Pavel-Parvu and A. Korganoff (1969). "Iteration Functions for Solving Polynomial Equations." In B. Dejon and P. Henrici, editors, Constructive Aspects of the Fundamental Theorem of Algebra. John Wiley, New York.
[177] G. Peano (1888). "Intégration par séries des équations différentielles linéaires." Mathematische Annalen 32, 450-456.
[178] R. Penrose (1955). "A Generalized Inverse for Matrices." Proceedings of the Cambridge Philosophical Society 51, 406-413.
[179] R. Penrose (1956). "On Best Approximate Solutions of Linear Matrix Equations." Proceedings of the Cambridge Philosophical Society 52, 17-19.
[180] V. Pereyra (1969). "Stability of General Systems of Linear Equations." Aequationes Mathematicae 2, 194-206.
[181] E. Picard (1910). "Sur un théorème général relatif aux équations intégrales de première espèce et sur quelques problèmes de physique mathématique." Rendiconti del Circolo Matematico di Palermo 25, 79-97.
[182] L. Qi (1984). "Some Simple Estimates for Singular Values of a Matrix." Linear Algebra and Its Applications 56, 105-119.
[183] Lord Rayleigh (J. W. Strutt) (1899). "On the Calculation of the Frequency of Vibration of a System in its Gravest Mode, with an Example from Hydrodynamics." The Philosophical Magazine 47, 556-572. Cited in [175].
[184] F. Riesz and B. Sz.-Nagy (1955). Functional Analysis. Ungar, New York. L. F. Boron, Translator.
[185] J. L. Rigal and J. Gaches (1967). "On the Compatibility of a Given Solution with the Data of a Linear System." Journal of the Association for Computing Machinery 14, 543-548.
[186] W. Ritz (1909). "Über eine neue Methode zur Lösung gewisser Variationsprobleme der mathematischen Physik." Journal für die reine und angewandte Mathematik 135, 1-61.
[187] H. Rohrbach (1931). "Bemerkungen zu einem Determinantensatz von Minkowski." Jahresbericht der Deutschen Mathematiker Vereinigung 40, 49-53.
[188] M. Rosenblum (1956). "On the Operator Equation BX - XA = Q." Duke Mathematical Journal 23, 263-269.
[189] A. Ruhe (1970). "An Algorithm for Numerical Determination of the Structure of a General Matrix." BIT 10, 196-216.
[190] A. Ruhe (1970). "Perturbation Bounds for Means of Eigenvalues and Invariant Subspaces." BIT 10, 343-354.
[191] H. Rutishauser (1955). "Une méthode pour la détermination des valeurs propres d'une matrice." Comptes Rendus de l'Académie des Sciences, Paris 240, 34-36.
[192] E. Schmidt (1907). "Zur Theorie der linearen und nichtlinearen Integralgleichungen. I. Teil. Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener." Mathematische Annalen 63, 433-476.
[193] I. Schur (1909). "Über die charakteristischen Wurzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen." Mathematische Annalen 66, 448-510.
[194] P. J. Schweitzer (1968). "Perturbation Theory and Finite Markov Chains." Journal of Applied Probability 5, 401-413.
[195] C. J. Scriba (1973). "Carl Gustav Jacob Jacobi." In C. C. Gillispie, editor, Dictionary of Scientific Biography, VII. Charles Scribner's Sons, New York.
[196] G. A. F. Seber (1977). Linear Regression Analysis. John Wiley, New York.
[197] R. D. Skeel (1979). "Scaling for Numerical Stability in Gaussian Elimination." Journal of the Association for Computing Machinery 26, 494-526.
[198] F. Smithies (1937). "The Eigen-values and Singular Values of Integral Equations." Proceedings of the London Mathematical Society 43, 255-279.
[199] G. W. Stewart (1969). "On the Continuity of the Generalized Inverse." SIAM Journal on Applied Mathematics 17, 33-45.
[200] G. W. Stewart (1971). "Error Bounds for Approximate Invariant Subspaces of Closed Linear Operators." SIAM Journal on Numerical Analysis 8, 796-808.
[201] G. W. Stewart (1972). "On the Sensitivity of the Eigenvalue Problem Ax = λBx." SIAM Journal on Numerical Analysis 4, 669-686.
[202] G. W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigenvalue Problems." SIAM Review 15, 727-764.
[203] G. W. Stewart (1974). Introduction to Matrix Computations. Academic Press, New York.
[204] G. W. Stewart (1975). "Gerschgorin Theory for the Generalized Eigenvalue Problem Ax = λBx." Mathematics of Computation 29, 600-606.
[205] G. W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections, and Linear Least Squares Problems." SIAM Review 19, 634-662.
[206] G. W. Stewart (1977). "Research Development and LINPACK." In J. R. Rice, editor, Mathematical Software III, pages 1-14. Academic Press, New York.
[207] G. W. Stewart (1979). "The Effects of Rounding Error on an Algorithm for Downdating a Cholesky Factorization." Journal of the Institute for Mathematics and Applications 23, 203-213.
[208] G. W. Stewart (1979). "A Note on the Perturbation of Singular Values." Linear Algebra and Its Applications 28, 213-216.
[209] G. W. Stewart (1982). "Computing the CS Decomposition of a Partitioned Orthogonal Matrix." Numerische Mathematik 40, 297-306.
[210] G. W. Stewart (1984). "On the Invariance of Perturbed Null Vectors under Column Scaling." Numerische Mathematik 44, 61-65.
[211] G. W. Stewart (1984). "Rank Degeneracy." SIAM Journal on Scientific and Statistical Computing 5, 403-413.
[212] G. W. Stewart (1984). "A Second Order Perturbation Expansion for Small Singular Values." Linear Algebra and Its Applications 56, 231-235.
[213] G. W. Stewart (1987). "Collinearity and Least Squares Regression." Statistical Science 2, 68-100.
[214] G. W. Stewart (1987). "Invariant Subspaces and Capital Punishment." Technical Report TR-1923, Department of Computer Science, University of Maryland.
[215] G. W. Stewart (1988). "Stochastic Perturbation Theory." Technical Report CS-TR 2129, Department of Computer Science, University of Maryland. To appear in SIAM Review.
[216] G. W. Stewart (1989). "On Scaled Projections and Pseudo-Inverses." Linear Algebra and Its Applications 112, 189-194.
[217] G. W. Stewart (1989). "Perturbation Theory and Least Squares with Errors in the Variables." Technical Report UMIACS-TR-89-97, CS-TR 2326, Department of Computer Science, University of Maryland. To appear in the Proceedings of the AMS Conference on Measurement Error Models, Humboldt, California.
[218] G. W. Stewart (1989). "Two Simple Residual Bounds for the Eigenvalues of Hermitian Matrices." Technical Report CS-TR 2364, Department of Computer Science, University of Maryland.
[219] S. M. Stigler (1986). The History of Statistics. Harvard University Press, Cambridge, Massachusetts.
[220] J.-G. Sun (1979). "A Theorem on the Perturbation of Generalized Eigenvalues." Report on the Conference Numerical Mathematics, Guangzhou, China.
[221] J.-G. Sun (1980). "Invariant Subspaces and Generalized Invariant Subspaces (II)." Math. Numer. Sinica 2, 113-123. Cited in [67].
[222] J.-G. Sun (1982). "A Note on Stewart's Theorem for Definite Matrix Pairs." Linear Algebra and Its Applications 48, 331-339.
[223] J.-G. Sun (1983). "Perturbation Analysis for the Generalized Eigenvalue Problem and the Generalized Singular Value Problem." In B. Kågström and A. Ruhe, editors, Matrix Pencils, pages 221-244. Springer-Verlag, New York.
[224] J.-G. Sun (1983). "Perturbation Analysis for the Generalized Singular Value Decomposition." SIAM Journal on Numerical Analysis 20, 611-625.
[225] J.-G. Sun (1983). "Perturbation Bounds for Eigenspaces of a Definite Matrix Pair." Numerische Mathematik 41, 321-343.
[226] J.-G. Sun (1984). "Estimation of the Separation of Two Matrices." Journal of Computational Mathematics 2, 189-200.
[227] J.-G. Sun (1984). "On the Perturbation of the Eigenvalues of a Normal Matrix." Math. Numer. Sinica 6, 334-336.
[228] J.-G. Sun (1985). "Gerschgorin Type Theorem and the Perturbation of the Eigenvalues of a Singular Pencil." Math. Numer. Sinica 7, 253-264. In Chinese. Translation in Chinese Journal of Numerical Mathematics and Applications 10 (1988) 1-13.
[229] J.-G. Sun (1987). Matrix Perturbation Analysis. Academic Press, Beijing. In Chinese.
[230] J.-G. Sun (1988). "A Note on Simple Non-Zero Singular Values." Journal of Computational Mathematics 6, 258-266.
[231] J.-G. Sun (1989). "A Note on Local Behavior of Multiple Eigenvalues." SIAM Journal on Matrix Analysis and Applications 10, 533-541.
[232] C. A. Swanson (1961). "An Inequality for Linear Transformations with Eigenvalues." Bulletin of the American Mathematical Society 67, 607-608.
[233] J. J. Sylvester (1852). "A Demonstration of the Theorem that Every Homogeneous Quadratic Polynomial is Reducible by Real Orthogonal Substitutions to the Form of a Sum of Positive and Negative Squares." Philosophical Magazine 2, 138-142.
[234] J. J. Sylvester (1853). "On a Theory of the Syzygetic Relations etc." Phil. Trans., pages 481-84. Cited in [39].
[235] J. J. Sylvester (1889). "Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique." Comptes Rendus de l'Académie des Sciences, Paris 108, 651-653.
[236] J. J. Sylvester (1890). "On the Reduction of a Bilinear Quantic of the nth Order to the Form of a Sum of n Products by a Double Orthogonal Substitution." Messenger of Mathematics 19, 42-46.
[237] Béla Sz.-Nagy (1960). Extensions of Linear Transformations in Hilbert Space which Extend beyond This Space. Ungar, New York. An appendix to Riesz and Sz.-Nagy [184] issued as a separate pamphlet.
[238] O. Taussky (1948). "Bounds for Characteristic Roots of Matrices." Duke Mathematical Journal 15, 1043-1044.
[239] O. Taussky (1949). "A Recurring Theorem on Determinants." American Mathematical Monthly 46, 672-675.
[240] J. Todd (1950). "The Condition of a Certain Matrix." Proceedings of the Cambridge Philosophical Society 46, 116-118.
[241] O. Toeplitz (1918). "Das algebraische Analogon zu einem Satze von Fejér." Mathematische Zeitschrift 2, 187-197.
[242] A. M. Turing (1948). "Rounding-off Errors in Matrix Processes." The Quarterly Journal of Mechanics and Applied Mathematics 1, 287-308.
[243] F. Uhlig (1979). "A Recurring Theorem about Pairs of Quadratic Forms and Extensions: A Survey." Linear Algebra and Its Applications 25, 219-237.
[244] A. van der Sluis (1969). "Condition Numbers and Equilibration of Matrices." Numerische Mathematik 14, 14-23.
[245] A. van der Sluis (1975). "Stability of the Solutions of Linear Least Squares Problems." Numerische Mathematik 23, 241-254.
[246] B. L. van der Waerden (1927). "Ein Satz über Klasseneinteilungen von endlichen Mengen." Abhandlungen aus dem Mathematischen Seminar der Hamburgischen Universität 5, 185-188.
[247] C. F. Van Loan (1975). "A General Matrix Eigenvalue Algorithm." SIAM Journal on Numerical Analysis 12, 819-834.
[248] C. F. Van Loan (1976). "Generalizing the Singular Value Decomposition." SIAM Journal on Numerical Analysis 13, 76-83.
[249] C. F. Van Loan (1985). "On the Method of Weighting for Equality Constrained Least Squares." SIAM Journal on Numerical Analysis 22, 851-864.
[250] J. M. Varah (1967). "The Computation of Bounds for the Invariant Subspaces of a General Matrix Operator." Technical Report CS 66, Computer Science Department, Stanford University.
[251] J. M. Varah (1970). "Computing Invariant Subspaces of a General Matrix When the Eigensystem Is Poorly Determined." Mathematics of Computation 24, 137-149.
[252] R. S. Varga (1962). Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, New Jersey.
[253] R. S. Varga (1964). "On Smallest Isolated Gerschgorin Disks for Eigenvalues." Numerische Mathematik 6, 366-376.
[254] J. von Neumann (1937). "Some Matrix-Inequalities and Metrization of Matrix-Space." Tomsk. Univ. Rev. 1, 286-300. In [255, v. 4, pp. 205-219].
[255] J. von Neumann (1962). Collected Works (A. H. Taub, editor). Pergamon, New York.
[256] J. von Neumann and H. H. Goldstine (1947). "Numerical Inverting of Matrices of High Order." Bulletin of the American Mathematical Society 53, 1021-1099.
[257] J. H. M. Wedderburn (1934). Lectures on Matrices. American Mathematical Society Colloquium Publications, V. XVII. American Mathematical Society, New York.
[258] P.-A. Wedin (1969). "On Pseudoinverses of Perturbed Matrices." Technical report, Department of Computer Science, Lund University.
[259] P.-A. Wedin (1972). "Perturbation Bounds in Connection with Singular Value Decomposition." BIT 12, 99-111.
[260] P.-A. Wedin (1973). "Perturbation Theory for Pseudo-Inverses." BIT 13, 217-232.
[261] P.-A. Wedin (1983). "On Angles Between Subspaces." In B. Kågström and A. Ruhe, editors, Matrix Pencils, pages 263-285. Springer, New York.
[262] P.-A. Wedin (1987). "Perturbation Theory and Condition Numbers for Generalized and Constrained Least Squares Problems." Technical Report S-901-87, Institute of Information Processing, University of Umeå.
[263] K. Weierstrass (1867). "Zur Theorie der bilinearen und quadratischen Formen." Monatsh. Akad. Wiss. Berlin, pages 310-38. Cited in [81].
[264] H. F. Weinberger (1974). Variational Methods for Eigenvalue Approximation. Society for Industrial and Applied Mathematics, Philadelphia. Cited in [175].
[265] H. Weyl (1912). "Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung)." Mathematische Annalen 71, 441-479.
[266] H. Weyl (1949). "Inequalities between the Two Kinds of Eigenvalues of a Linear Transformation." Proceedings of the National Academy of Sciences 35, 408-411.
[267] H. W. Wielandt (1955). "An Extremum Property of Sums of Eigenvalues." Proceedings of the American Mathematical Society 6, 106-110.
[268] N. Wiener (1922). "Limit in Terms of Continuous Transformations." Bulletin de la Société Mathématique de France 50, 119-134.
[269] J. H. Wilkinson (1965). The Algebraic Eigenvalue Problem. Clarendon Press, Oxford, England.
[270] J. H. Wilkinson (1971). "Modern Error Analysis." SIAM Review 13, 548-568.
[271] J. H. Wilkinson (1972). "Note on Matrices with a Very Ill-Conditioned Eigenproblem." Numerische Mathematik 19, 176-178.
[272] J. H. Wilkinson (1979). "Kronecker's Canonical Form and the QZ Algorithm." Linear Algebra and Its Applications 28, 285-303.
[273] J. H. Wilkinson and G. H. Golub (1976). "Ill-Conditioned Eigensystems and the Computation of the Jordan Canonical Form." SIAM Review 18, 578-619.
[274] H. Wittmeyer (1936). "Einfluß der Änderung einer Matrix auf die Lösung des zugehörigen Gleichungssystems, sowie auf die charakteristischen Zahlen und die Eigenvektoren." Zeitschrift für angewandte Mathematik und Mechanik 16, 287-300.
[275] T. Yamamoto (1980). "Error Bounds for Computed Eigenvalues and Eigenvectors." Numerische Mathematik 34, 189-199.
[276] D. M. Young (1971). Iterative Solution of Large Linear Systems. Academic Press, New York.
[277] Z. Zhang (198). "On the Perturbation of the Eigenvalues of a Non-Defective Matrix." Math. Numer. Sinica 6, 106-108. In Chinese.
Notation

R    The set of real numbers    1
R^n    The set of real n-dimensional vectors    1
R^(m x n)    The set of real m x n matrices    1
C    The set of complex numbers    1
C^n    The set of complex n-dimensional vectors    1
C^(m x n)    The set of complex m x n matrices    1
I (I_n)    The identity matrix (of order n)    2
1    The vector (1, ..., 1)^T    2
1_i    The ith unit vector    2
A^T    The transpose of A    2
A^H    The conjugate transpose of A    2
A^(-1)    The inverse of A    2
A^(-T), A^(-H)    The inverse transpose and conjugate transpose of A    2
X + Y    The set {x + y : x ∈ X, y ∈ Y}. Other operations on sets are defined similarly    2
R(A)    The column space of A    2
N(A)    The null space of A    2
rank(A)    The rank of A    2
dim(X)    The dimension of the subspace X    2
det(A)    The determinant of A    2
trace(A)    The trace of A    2
||x||_2    The 2-norm of x    3
|A|    The matrix whose elements are the absolute values of the elements of A    3
A > B    A is component-wise greater than B. Other relations are defined similarly    3
dd    Formal definition    3
    Implicit definition    3
diag(δ_1, ..., δ_n)    A diagonal matrix    4
R(A)^⊥, X^⊥    The orthogonal complement of R(A), X    8
P_A, P_X    The orthogonal projection onto R(A), X    10
P_A^⊥, P_X^⊥    The complementary projectors I - P_A, I - P_X    10
L(A)    The set of eigenvalues of A    14
φ_A(λ)    The characteristic polynomial of A    15
A^(1/2), A^(-1/2)    The square root of A and its inverse    20
J_k(λ)    A Jordan block of order k    20
F(A)    The field of values of A    23
H(L)    The convex hull of L    24
A ⊗ B    The Kronecker or tensor product of A and B    30
S(A)    The set of singular values of A    31
||A||_2    The spectral norm of A    33
inf_2(A)    The smallest singular value of A    33
Θ(X, Y)    The matrix of canonical angles between X and Y    43
∠(x, y)    The angle between x and y    45
||x||_1    The 1-norm of x    51
||x||_∞    The ∞-norm of x    51
||x||_p    The Hölder p-norm of x    51
||A||_F    The Frobenius norm of A    65
ρ(A)    The spectral radius of A    66
||A||_p    The Hölder p-norm of A    69
||A||_1    The 1-norm or column-sum norm of A    69
||A||_∞    The ∞-norm or row-sum norm of A    69
||·||_Φ    The unitarily invariant norm defined by the symmetric gauge function Φ    76
x ≻ y    x majorizes y    81
Φ_k    Fan's symmetric gauge functions    87
C^n_i (R^n_i)    The set of i-dimensional subspaces of C^n (R^n)    90
ρ_(g,ν)(X, Y)    The ν-gap between X and Y    91
A^(i,j,k)    A generalized inverse satisfying Penrose's conditions i, j, and k    102
A^†    The pseudo-inverse of A    102
ae(α, α~), re(α, α~)    The absolute and relative errors in α~    115
ae(A, A~), re(A, A~)    The absolute and relative errors in A~    116
κ(A)    The condition number of A: ||A|| ||A^†||    119
κ_BS(A)    The Bauer-Skeel condition number of A    128
sv_A(A~)    The spectral variation of A~ with respect to A    167
hd(A, A~)    The Hausdorff distance between the eigenvalues of A and A~    167
md(A, A~)    The matching distance between the eigenvalues of A and A~    167
δ_ν(A)    The ν-departure of A from normality    171
md_2(A, A~)    The 2-norm matching distance between the eigenvalues of A and A~    189
inertia(A)    The inertia of A    196
sep(L, M)    The separation of the spectra of L and M    231
⟨α, β⟩, (λ)    Generalized eigenvalues    273
γ(A, B)    Crawford's number for the definite pair (A, B)    282
χ(⟨α, β⟩, ⟨γ, δ⟩)    The chordal distance between ⟨α, β⟩ and ⟨γ, δ⟩    283
ρ_D    The chordal metric for definite pairs    288
ρ_L, ρ_R    The left and right equivalence metrics for regular pairs    288
||(P, Q)||_F    The combined Frobenius norm    307
dif[(A, B), (C, D)]    The difference between the spectra of (A, B) and (C, D)    307
Index

2-norm 51, 59, 71, 72
  as largest singular value 33, 69
  consistency with Frobenius norm 66
  matrix 2-norm 69
  properties 36, 51, 70
  relation to the 1-norm and ∞-norm 55
  symmetric gauge function 79
  unitary invariance 52, 60, 72, 74
  vector 2-norm 3

Abdelmalek, N. N. 163
absolute and relative error
  in individual elements 128
  matrix 116, 117, 119, 134
    limitations 117
    properties 116
  scalar 115
    properties 115
absolute value 49
acute perturbation 136-140, 151, 152
  continuity of pseudo-inverse 140
  definition 139
  in reduced form 139
acute subspaces 151, 152, 255
Afriat, S. N. 45
Amir-Moéz, A. R. 209
approximation by matrix of lower rank (see Schmidt-Mirsky theorem) 208
Arioli, M. 163
arithmetic-geometric mean inequality 61
artificial ill-conditioning 125, 193
Autonne, L. 35, 36

backward perturbation (see under linear system, least squares, etc.) 128
Banach space 60, 98
Banach, S. 60
Bateman, H. 35
Bauer-Fike theorem 171, 181, 192, 294, 300
Bauer-Skeel theorem 127
Bauer, F. L. 133, 177, 194, 195
Baumgärtel, H. 176
Beckenbach, E. F. 60
Bellman, R. 4, 60
Beltrami, E. 34
Ben-Israel, A. 152
Bendixson-Hirsch-Toeplitz theorem 30
Bendixson, I. 30
Bergman, P. G. 108
Berkson, E. 98
Bhatia, R. 176, 194, 227, 228
Birkhoff's theorem 50, 85, 87, 190, 193, 211
Birkhoff, G. D. 88
Bjerhammar, A. 108, 109
Björck, A. 163
Borchardt, C. W. 209
Bunch, J. R. 136
Bunyakovski 60

canonical angle 43, 45, 94, 98, 99, 226, 232, 240, 250, 255, 260, 323
  basis metric 95
  computation 43, 45
  gap function 92
  pairs of projections 43
  variational characterization 45
canonical bases 40
Cauchy's interlacing theorem 196-198, 209
Cauchy inequality 5, 60
  generalized 67
Cauchy sequence 63, 64, 99
Cauchy, A. L. 60, 209
Cayley-Hamilton theorem 27
characteristic equation 15
characteristic polynomial 15
Chatelin, F. 4
Chebyshev, P. L. 11
Cholesky factor 13
chordal metric 283, 284, 290, 314
column space 2, 4
column sum norm (see norm, matrix 1-norm) 70
companion matrix 28
complete space 63, 64, 99
condition estimation 133
condition number (see under matrix inverse, eigenvalue, etc.) 118
congruence transformation 196
consistency between norms (see norm, consistency) 66
  foolish 26
contraction matrix 40
Courant-Fischer theorem (see Fischer's theorem) 209
Courant, R. 209
Crawford, C. R. 290, 324
CS decomposition 37-40, 45
  computation 45
  existence 37
  generalized singular value decomposition 47
  perturbation theory 324

Davis, C. 45, 46, 151, 194, 227, 228, 244, 258, 259
Decell, H. P. 152
defective matrix 16
  Jordan form 21
definite matrix pair (see matrix pair (definite)) 281
Demmel, J. 133, 136, 177
Dennis, J. E. 135
departure from normality 171, 172, 177, 178
Desplanques, J. 186
determinant
  as poor measure of condition 122
diagonalizable matrix 21, 28
  perturbation of eigenvalues 192, 215-217
diagonally dominant matrix 186-188
diagonal matrix 3, 5
  block 4
dif 307, 309, 311, 312
  bounds for definite matrix pair 319
dilation 209, 211
direct rotation 46
doubly stochastic matrix 83, 85, 88, 195
  Birkhoff's theorem 85
  definition 81
Drazin, M. P. 108, 113
Duff, I. S. 163
Dulmage, L. 88

Eckart-Young theorem (see Schmidt-Mirsky theorem) 210
Eckart, C. 35, 210
eigenpair 14
eigenproblem 26
eigenspace (see under matrix pair) 303
eigensystem 20
eigenvalue 14
  algebraic multiplicity 15, 16
  backward perturbation 175
    optimal 176
  Cauchy's interlacing theorem (q.v.) 196
  complex 15
  condition number 186, 188, 192, 226, 240
  continuity 166, 167, 176, 178, 244
  defective 16, 176
  Fischer's theorem (q.v.) 27
  geometric multiplicity 15, 16
  Gerschgorin's theorem (q.v.) 181
  Gerschgorin disks (q.v.) 181
  Hausdorff distance 167-169, 177, 178
  inclusion region 195, 210
  matching distance 167-169, 174, 177, 178, 217
  md_2 189
  matrix pair (q.v.) 271
  multiple 15, 26, 27
  nomenclature 14, 26
  of matrix functions 29
  perturbation theory 165, 176, 203, 241
    Bauer-Fike theorem (q.v.) 171
    diagonalizable matrix 192
    Elsner's theorem (q.v.) 168
    generalized Rayleigh quotient 249
    Henrici's theorem (q.v.) 172
    Hermitian matrix 258, 263
    Hoffman-Wielandt theorem (q.v.) 189
    matrices similar to a Hermitian matrix 215, 216
    matrices similar to a normal matrix 216
    Mirsky's theorem (q.v.) 205
    non-Hermitian perturbations of Hermitian matrices 212-214, 217
    normal matrix 192, 195
    orthogonal matrix 195
    Ostrowski-Elsner theorem (q.v.) 170
    simple eigenvalue 183
    Weyl's theorem (q.v.) 203
  Rayleigh quotient (q.v.) 185
  residual bound 171, 176, 191, 205-207, 209, 211, 235, 248, 254-257
  set of eigenvalues 26
  simple 15, 29, 183
    differentiability 185
  spectral variation 167-169, 177
eigenvector 14
  backward perturbation 179
  condition 219
  condition number 241
  left 15
  nonuniqueness 89, 219, 220
  perturbation theory 240, 241, 244
    Hermitian matrix 258, 263
    stochastic matrix 241, 244, 246
  residual bound 211
  right 15
  simple eigenvalue 29
elliptic norm 109
Elsner's theorem 167, 168, 170, 181
Elsner, L. 177, 178, 311
error (see absolute and relative error) 115
Euclidean norm (also see 2-norm) 3, 53
Evans, J. W. 152
exponential of a matrix 73

Faddeeva, V. N. 71
Fan's theorem 50, 81, 86, 254
Fan, K. 88
Feingold, D. G. 188
Feller, W. 10
field of values 23, 24, 27
  convex hull of eigenvalues 24
  convexity of 23
  definition 23
  of a Hermitian matrix 28
  of a normal matrix 24
Fike, C. T. 177, 194
first order approximation (also see under least squares, eigenvalue, etc.) 131, 134, 292
Fischer's theorem 27, 196, 198, 201, 209, 281, 289
Fischer, E. 209, 289
Forsythe, G. E. 10
Francis, J. G. F. 11
Frobenius norm 65, 71, 110, 131, 135, 172, 177, 180, 247, 258
  consistency 65, 69, 71
  consistency with 2-norm 66
  relation to eigenvalues 72
  symmetric gauge function 79
  unitary invariance 71, 72, 74
Frobenius, F. G. 71, 88
full rank factorization 12, 32, 105

Gaches, J. 134
Gantmacher, F. R. 4
gap function (see under metrics for subspaces) 90
Gastinel, N. 71, 133
Gatlinburg Conferences 71
Gaussian elimination 132
Gauss, C. F. 108-110, 134
generalized eigenvalue problem (see matrix pair) 271
generalized inverse 102
  (i,j,k)-inverse 102
  discontinuity 108
  Drazin inverse 108, 113, 241
  from singular value decomposition 103, 104, 110
  group inverse 241
  limitations 108
  projections 110
generalized singular value decomposition 46
  perturbation theory 324
Gerschgorin's theorem 177, 181, 186, 187, 203
  block variant 188
  compared with Elsner's theorem 181
  generalized (see under matrix pair (regular)) 291
Gerschgorin disks 181, 186, 187
  irreducible matrix 188
  isolated 187
  reduction by diagonal similarity 182-187
Gerschgorin, S. A. 177, 186
Gohberg, I. 227
Golub, G. H. 4, 151, 152, 155, 163
Gragg, W. B. 133
Gram-Schmidt algorithm 11, 12
  modified 12
Gram, J. P. 11

Hadamard's inequality 8, 14, 168, 177
Hahn-Banach theorem 57, 63
Hahn, H. 60
Hall's theorem 84, 88, 89, 170, 178
Hall, P. 88, 89
Halmos, P. R. 46
Halperin, I. 88
Hanson, R. J. 152, 163
Hardy-Littlewood-Pólya theorem 81, 87
Hardy, G. H. 88
Hausdorff distance (see under eigenvalue) 167
Hausdorff, F. 27, 177
Hearon, J. Z. 152
Henrici's theorem 172, 174, 177
Henrici, P. 177, 178
Hermite, C. 209
Hermitian matrix 3, 5
  Cauchy's interlacing theorem (q.v.) 196
  eigenvalues 19
    bounds 30
    of sums 25
  field of values 28
  Fischer's theorem (q.v.) 27
  perturbation of eigenvalues 203, 258, 263
    generalized Rayleigh quotient 249
    matrices similar to a Hermitian matrix 215, 216
    Mirsky's theorem (q.v.) 204
    non-Hermitian perturbations 212-214, 217
    positive definite perturbation 203
    Weyl's theorem (q.v.) 203
  perturbation of eigenvectors 258, 263
  perturbation of invariant subspaces 244
  residual bound for eigenvalues 205-207
  residual bounds for eigenvalues 248, 254-257
  residual bounds for invariant subspaces 249-254
    first sin Θ theorem 250, 258
    nonorthonormal basis 251
    second sin Θ theorem 251, 255, 258
    tan Θ theorem 253, 258, 259
  skew Hermitian matrix 5
  spectral decomposition 19, 226
  sums of 27
Hessian matrix 134
Higham, N. J. 133, 164
Hilbert space 64, 98
Hirsch, A. 30
Hoffman-Wielandt theorem 189, 193, 205, 213, 218, 257
  generalizations 193, 194
  limitations 191
Hoffman, A. J. 88, 193
Holbrook, J. R. A. 194
Hölder's inequality 61
Hölder norms 192
Hölder, O. 61
Hotelling, H. 35
Householder transformation 5, 6, 10
Householder, A. S. 4, 6, 10, 11, 71, 176, 195

idempotent matrix 28
inertia of a matrix 196
inf_2
  in terms of the inverse matrix 36
  in terms of the pseudo-inverse 110
inner product 53, 62
invariant subspace 21, 22
  approximate (see invariant subspace, residual bound) 230
  backward perturbation 175, 178-180
    optimal 176
  characterization 220, 227
  complementary 222, 225
  complex conjugate eigenvalues 29
  definition 22
  left 221, 225
  normalization 239, 244
  perturbation theory 229, 236-240, 244, 254
    canonical angles 237
    first sin Θ theorem 250, 258
    Hermitian matrix 244
    second sin Θ theorem 251, 255, 258
    tan Θ theorem 253, 258, 259
  reduced form of matrix 221
  representation of matrix 22, 220, 231, 235, 237
    condition number 240
  residual bound 174, 206, 229-236, 246, 249-254
    canonical angles 232
    nonorthonormal basis 251
  sensitivity 21, 90, 219
  simple 220, 221, 226, 227, 231, 235, 238
    existence of complementary invariant subspace 225
    reduction to block diagonal form 224
  spectral projection 114, 225, 240
    canonical angles 226
  spectral resolution 223-228, 237, 244
  Sylvester's equation (q.v.) 222
inverse matrix 102
  asymptotic forms and derivatives 130-132
  condition 17
  condition number 119, 127, 133
    artificial ill-conditioning 122-124
    distance from singularity 120, 121, 133
    in the 2-norm 121
    optimal 133, 135, 193
    relation to determinant 134
    relation to eigenvalues 122, 134, 135
    significant digits in inverse 120
  left and right inverses 134
  perturbation theory 117-124
    linear system 124
    random perturbations 131
  well-conditioned 120
irreducible matrix 186, 188

Jacobian matrix 134
Jacobi, C. G. J. 209
Jiang, E. 178, 180
Jordan-Wielandt matrix 32, 34, 35, 259
Jordan block 20, 28, 174, 180, 186, 300
  function of 73
  powers of 28
Jordan canonical form 20, 21, 26, 174, 227, 280
  associated invariant subspaces 21
  computation 27
  Drazin generalized inverse 113
  limitations 21, 227
  principal vector 21
Jordan, C. 26, 34, 35, 45, 289
Jordan, P. 63

Kahan, W. 27, 45, 46, 133, 151, 178, 180, 209, 211, 217, 218, 244, 258, 259
Kato, T. 4, 176, 178, 244
König, D. 88
Korganoff, A. 152
Krasnoselski, M. A. 98
Krein, M. G. 98
Kronecker product 30, 228, 258
Kronecker, L. 289

Lancaster, P. 227
Laplace, P. S. 11
Lawson, C. L. 152, 163
least squares 10, 11, 101
  asymptotic forms and derivatives 162
  backward perturbation 160-163
    Björck's theorem 162
  condition
    condition number 156, 163
    reflected by solution 156-158, 163
    square of κ 158, 163
  constrained 109, 112
  cross-product matrix
    backward perturbation 164
  elliptic norm 109
  errors in the variables 163
    bias 267
  expanded equations 107, 161, 163
  Gauss-Markov theorem 110
  Gauss, C. F. 108
  measurement error models 163
  normal equations 107
  numerical methods 152, 163
  perturbation theory 156-160
    Björck's theorem 158
    errors in a column 163
    perturbation in A 157
    perturbation in b 156
    perturbation of the residual vector 160
    structured perturbation 158, 163
  priority dispute between Gauss and Legendre 109
  reduced form 155
  regression diagnostics 163
  residual vector 107
  solution by pseudo-inverse 107
  statisticians' notation 109
Legendre, A. M. 109
Lévy, L. 186
Lidskii, V. B. 209, 210
linear functional 56
linear system 101, 114
  artificial ill-conditioning 125
  asymptotic forms and derivatives 130-132
  backward perturbation 128-130, 133, 135, 136
    Oettli-Prager theorem 130
    Rigal-Gaches theorem 128
    structured 129
  condition number 125, 127
    artificial ill-conditioning 128
    Bauer-Skeel 128, 133
    reflected by solution 126
  perturbation theory 124-128, 132
    Bauer-Skeel theorem 127
    component-wise bounds 125, 126
    from inverse matrix 124
    perturbation in matrix 124
    perturbation in the right-hand side 126, 127
    structured perturbation 127, 128
  residual bound 128-130
  structured perturbation 133
LINPACK 133
Littlewood, J. E. 88
Loewy, A. 27
LR algorithm 11
Lu, Q.-k. 99

Mac Duffee, C. C. 4
majorization 81, 88, 89
Marcus, M. 4
matching distance (see under eigenvalue) 167
matrix function 20
  exponential 73
  Jordan block 73
  Neumann series 73
  power series 73
  rational 29
  spectral resolution 229
matrix norm (see under norm) 64
matrix pair
  characteristic equation 274
  eigenvalue 273
    ⟨α, β⟩ notation 273
  eigenvector 273
  equivalence 276
  generalized Schur form 277
  Hermitian 281
  infinite eigenvalues 271, 273
  linear transformations of matrix pairs 275
  metrics 284-288
    limitations 288
  nonsingular B 274
  numerical methods 274
  reduction to ordinary eigenvalue problem 274
  scaling problems 272
  singular 273, 274, 289
  systems of differential equations 289
matrix pair (definite) 281-283, 289
  bounds for dif 319
  definition 282
  diagonalizability 283
  eigenangle 314
    perturbation theory 324
    variational characterization 314
  eigenangles 324
  eigenvector 318
  failure of definition in real case 290
  matching distance 324
  perturbation of eigenspaces 317-322
    condition number 322
  perturbation of eigenvalues 314-317
    condition number 313, 316, 324
    limitations of the theory 316
  perturbation of eigenvectors 322
  positive definite B 281, 282, 289
    Fischer's theorem 281
  projective metric 285-287, 290, 314
  right and left eigenspaces 317
  sin Θ theorems 323, 324
  spectral resolution 318
    normalized 318
matrix pair (regular) 273, 274
  approximate eigenspace 306-309
  continuity of eigenvalues 291
  deflating subspace 311
  diagonalizable pair 280, 281, 297
  eigenspace 312
    backward perturbation 309
    characterizations 304
    complementary 306
    definition 303
    eigenvectors 312
    simple 305, 306
  eigenvalue
    simple 277
  generalized Bauer-Fike theorem 294, 301, 311
  generalized Henrici theorem 311
  generalized Hoffman-Wielandt theorem 311
  generalized Schur form 276, 289
    real pair 290
  Gerschgorin theory 294-300
    diagonal similarity 297
    generalized Gerschgorin theorem 295
    simplification of bounds 296
  infinite eigenvalues 274
  left projective metric 288, 290
  normal pair 311
  number of eigenvalues 275
  perturbation of eigenspaces 309-311
  perturbation of eigenvalues 311
    condition number 294, 300, 303
    diagonalizable pair 300-303
    first order approximation 292
    first order error bounds 293
    generalized Bauer-Fike theorem 301
    multiple eigenvalue 297-300
    spectral variation 302
  perturbation of eigenvectors 312
  recovery of bounds for ordinary eigenvalue problem 294, 297
  right projective metric 288, 290
  spectral resolution 306, 310
  Weierstrass canonical form 280, 290
matrix pencil (also see matrix pair) 209, 271
  rectangular 272
  singular 271
McIntosh, A. 194, 227, 228
metric 53
  from a norm 62
  pseudo-metric 62
metrics for subspaces 50, 90
  H-metric 96, 98, 99
  basis metric 95, 98, 99
  completeness 99
  distance from a vector to a subspace 90, 99
  gap function 90-94, 98, 99
    canonical angles 92
    definition 91
    equivalence of gap functions 91
    projections 93
  gap topology 91, 93, 94
  projection metrics 93, 94
  Schäffer's metric 99
  unitarily invariant metrics 94-98
    failure to generate the gap topology 94
  unitary invariance 99
Meyer, C. 244
Milman, D. P. 98
Minc, H. 4
Minkowski's inequality 61
Minkowski, H. 59, 61
Mirsky's theorem 194, 204, 206, 208, 209
Mirsky, L. 71, 72, 194, 209, 210
Moler, C. B. 290
Moore-Penrose generalized inverse (see pseudo-inverse) 102
Moore, E. H. 108, 109
More, G. 135

Nashed, M. Z. 108
nearly singular matrix 120
Neumann series 73
nilpotent matrix 29
nondefective matrix (also see diagonalizable matrix) 21, 28
norm
  2-norm (q.v.) 3
  absolute norm 52, 72, 76
  and spectral radius 73
  combining norms 61
  consistency 65, 67, 71
    definition 66
    failure 65
    family of consistent norms 69
    unitarily invariant norms 80
    vector norm consistent with a matrix norm 66
  convexity and norms 59, 63
  dual 57, 59, 67
    dual of dual 58
  elementary properties 51
  elliptic 53, 62, 111
  equivalence of norms 54, 55, 59, 65, 72
    failure 64
  Frobenius norm (q.v.) 65
  generated by an inner product 62
  generated by linear transformation 53
  generated by positive definite matrix 53, 62
  Hilbert-Schmidt norm 210
  Hölder norms 51, 57, 60, 61, 64, 69, 121, 134
  infinite dimensional spaces 71
  limits and norms 55, 65
  matrix ∞-norm 69, 71, 72
  matrix 1-norm 69, 71, 72
  matrix norm 49, 64
  of a linear functional 56
  operator norm 67, 68, 72
    consistency 67, 68
  polar 59
  spectral radius and norms 67, 72, 73
  subordinate norm (see norm, operator norm) 68
  unitarily invariant (q.v.) 50
  vector ∞-norm 51, 55, 69
  vector 1-norm 51, 55
  vector norm 49, 50
normalizable matrix (see diagonalizable matrix) 189
normal matrix 3, 5, 171, 191, 194
  condition number 134
  departure from normality (q.v.) 171
  eigenvectors 19
  field of values 24
  perturbation of eigenvalues 192, 195
    Hoffman-Wielandt theorem (q.v.) 189
    matrices similar to a normal matrix 216
  residual bound 191
  Schur decomposition 18
null space 2

O'Leary, D. P. 112
Oettli-Prager theorem 130, 161
Oettli, W. 133
orthogonal matrix 3, 195
  perturbation of eigenvalues 195
orthonormal basis 8
Ostrowski-Elsner theorem 170
Ostrowski, A. 88, 177, 187

Paige, C. C. 45, 46, 99, 111, 324
Parlett, B. N. 4, 178, 180, 209
Pavel-Parvu, M. 152
Peano, G. 71
Penrose's conditions 102, 110
Penrose, R. 108, 109, 151
Pereyra, V. 152, 155, 163
permutation matrix 3, 83, 85
permutation vector 83-85
Picard, E. 35
polar decomposition 36
Pólya, G. 88
positive definite matrix 3, 5, 27, 73, 74
  condition number 122
  norm generated by 53
positive semi-definite matrix 3, 5
  square root 20
powers of a matrix 73
Prager, W. 133
principal vector 21
projection (oblique) 11, 14, 152
  generalized inverse 110
  spectral projection 114
  with respect to an inner product 111
projection (orthogonal) 9
  acute perturbation 137-140, 153
  as Hermitian idempotent 10
  asymptotic forms and derivatives 154
  canonical angles 43
  complementary 10
  condition number 154
  continuity 153
  generalized inverse 110
  least squares 10
  perturbation of products 141
  perturbation theory 153, 154
  pseudo-inverse 106
  reduced form 153
projection (with respect to a norm) 91
pseudo-inverse 101, 102
  application to least squares 107
  asymptotic forms and derivatives 150-152
  Bjerhammar's characterization 109
  condition number 146, 149, 163
    distance from matrix of lower rank 152
  continuity 136, 140, 146, 151
  counterexamples 105
  elementary properties 104
  elliptic 109, 111
  existence and uniqueness 104, 110
  expressions for perturbed pseudo-inverse 142
  full rank case 108
  Gauss, C. F. 108
  minimality 110
  Moore's characterization 109
  nonacute perturbations 140
  orthogonal projections 106
  perturbation theory 140-151
    acute perturbations 146-150
    general results 140-146
    Wedin's bounds 142-146
  QR decomposition 110
  reduced form 137
  scaled 111
  weighted 111
Pythagorean equality 10

Qi, L. 187
QR algorithm 11, 18
QR decomposition 6-8, 11, 30
  existence 7
  pseudo-inverse 110
  uniqueness 13
  with pivoting 11
QR factorization 8, 13
  generalized singular value decomposition 47
  partitioned 13
quasi-Newton method 134

Rall, L. B. 108
random perturbation 131, [illegible], 163
Rayleigh-Ritz approximation 207, 209
Rayleigh quotient 185, 241
  generalized 240, 244, 248, 249
Rayleigh, Lord (J. W. Strutt) 210
residual bounds (see under linear system, eigenvalue, etc.) 128
Riesz, F. 60
Rigal-Gaches theorem 128
Rigal, J. L. 134
right inverse 110
Ritz vectors 210
Ritz, W. 210
Rodman, L. 227
Rohrbach, H. 186
Rosenblum, M. 227, 228
Rouché's theorem 167, 176
rounding-error analysis 132, 133
rounding error 274
row sum norm (see norm, matrix ∞-norm) 70
Ruhe, A. 244

Saunders, M. A. 45, 46
Schäffer, J. J. 99
Schmidt-Mirsky theorem 208, 210
Schmidt, E. 11, 35, 209, 210
Schur complement 13
Schur decomposition 17-20, 26, 28, 171, 222
  existence 17
  of a real matrix 26, 29
  of normal matrix 18
  uniqueness 26
Schur, I. 13, 26, 71
Schwarz 60
sep 244
  continuity 234, 236
  definition 231
  Hermitian matrices 247, 258
  properties 245
  relation to separation of eigenvalues 233, 247, 258
set operations 2
Sherman-Morrison-Woodbury formula 5
similarity transformation 16
  ill conditioned 17, 21
  unitary 17
singular subspace 259
  residual bound 260-262, 266, 267
  Wedin's sin Θ theorems 260, 262, 267
singular value 31