Matrix Perturbation Theory
G. W. Stewart
Computer Science Department
Institute for Advanced Computer Studies
University of Maryland
College Park, Maryland
Ji-guang Sun
Computing Center of the Chinese Academy of Sciences
Beijing, China
This is a volume in
COMPUTER SCIENCE AND SCIENTIFIC COMPUTING
Werner Rheinboldt and Daniel Siewiorek, editors
ACADEMIC PRESS, INC.
Harcourt Brace Jovanovich, Publishers
Boston San Diego New York
London Sydney Tokyo Toronto
Contents
Preface
I Preliminaries
1 Notation
Notes and References
Exercises
2 The QR Decomposition - Projections
2.1 The QR Decomposition
2.2 Hadamard's Inequality
2.3 Projections
Notes and References
Exercises
3 Eigenvalues and Eigenvectors
3.1 Definitions and Elementary Properties
3.2 The Schur Decomposition
3.3 The Jordan Canonical Form
3.4 Invariant Subspaces
3.5 The Field of Values
3.6 Sums of Hermitian Matrices
Notes and References
Exercises
4 The Singular Value Decomposition
4.1 The Singular Value Decomposition
4.2 Two Inequalities
Notes and References
Exercises
5 Pairs of Subspaces
5.1 The CS Decomposition
5.2 Pairs of Subspaces
5.3 Pairs of Projections
Notes and References
Exercises
II Norms and Metrics
1 Vector Norms
1.1 Definition
1.2 Examples
1.3 Equivalence and Limits
1.4 Linear Functionals and Dual Norms
Notes and References
Exercises
2 Matrix Norms
2.1 Basic Concepts
2.2 Operator Norms
Notes and References
Exercises
3 Unitarily Invariant Norms
3.1 Von Neumann's Theory
3.2 Properties of Unitarily Invariant Norms
3.3 Doubly Stochastic Matrices and Fan's Theorem
Notes and References
Exercises
4 Metrics on Subspaces of $\mathbb{C}^n$
4.1 The Gap
4.2 Unitarily Invariant Metrics
Notes and References
Exercises
III Linear Systems and Least Squares Problems
1 The Pseudo-Inverse and Least Squares
1.1 Generalized Inverses and the Pseudo-Inverse
1.2 Projections and Least Squares
Notes and References
Exercises
2 Inverses and Linear Systems
2.1 Absolute and Relative Errors
2.2 The Inverse Matrix
2.3 Linear Systems
2.4 Asymptotic Forms and Derivatives
Notes and References
Exercises
3 The Pseudo-Inverse
3.1 Projections and Acute Perturbations
3.2 General Results
3.3 Acute Perturbations
3.4 Asymptotic Forms and Derivatives
Notes and References
Exercises
4 Projections
Notes and References
5 The Linear Least Squares Problem
5.1 Perturbation of the Coefficients
5.2 The Residual
5.3 Backward Perturbations
5.4 Asymptotic Forms and Derivatives
Notes and References
Exercises
IV The Perturbation of Eigenvalues
1 General Perturbation Theorems
1.1 Continuity: Ostrowski-Elsner Theorems
1.2 The Bauer-Fike and Henrici Theorems
1.3 Residual Bounds
Notes and References
Exercises
2 Gerschgorin Theory: Differentiability
2.1 Gerschgorin's Theorem
2.2 Diagonal Similarities
Notes and References
Exercises
3 Normal and Diagonalizable Matrices
3.1 The Hoffman-Wielandt Theorem
3.2 Diagonalizable Matrices
Notes and References
Exercises
4 Hermitian Matrices
4.1 Inertia and Interlacing
4.2 Wielandt's Theorem and Its Consequences
4.3 Mirsky's Theorem
4.4 Residual Bounds
4.5 Approximation by a Low-Rank Matrix
Notes and References
Exercises
5 Some Further Results
5.1 Non-Hermitian Perturbations
5.2 Similarity Bounds
Notes and References
Exercises
V Invariant Subspaces
1 The Theory of Simple Invariant Subspaces
1.1 Definition
1.2 The Operator $T\colon X \mapsto AX - XB$
1.3 The Spectral Resolution
Notes and References
Exercises
2 Perturbation of Invariant Subspaces
2.1 The Approximation Problem
2.2 Perturbation Theorems
2.3 Eigenvectors
2.4 Solution of a Nonlinear Equation
Notes and References
Exercises
3 Hermitian Matrices
3.1 The Approximation Theorem
3.2 Generalized Rayleigh Quotients
3.3 Direct Bounds
3.4 Residual Bounds for Eigenvalues
Notes and References
Exercises
4 The Singular Value Decomposition
4.1 Two $\sin\Theta$ Theorems
4.2 A Perturbation Expansion
Notes and References
Exercises
VI Generalized Eigenvalue Problems
1 Background
1.1 Matrix Pairs
1.2 Triangular and Weierstrass Forms
1.3 Definite Pairs
1.4 Metrics and Their Limitations
Notes and References
Exercises
2 Regular Matrix Pairs
2.1 Continuity, First Order Theory
2.2 Gerschgorin Theory
2.3 Diagonalizable Pairs
2.4 Eigenspaces
Notes and References
Exercises
3 Definite Matrix Pairs
3.1 Eigenvalues of Definite Pairs
3.2 Eigenspaces
3.3 Direct Bounds
Notes and References
Exercises
References
Notation
Index
Preface
The central question of perturbation theory is: How does a function
change when its argument is subject to a perturbation? The function
may be almost anything - the modes of a vibrating system, the solution
of an ordinary or partial differential equation, the states of an electron.
This book is concerned with the perturbation of matrix functions, such
as the solution of a linear system or the singular values of a matrix.
The result of a perturbation analysis may be a perturbation expan-
sion or a perturbation bound. A perturbation expansion approximates
the perturbation in the function in terms of a known perturbation in
the argument. Perturbation expansions are widely used in the physical
sciences.
A perturbation bound starts with a bound on a perturbation in the
argument - here the perturbations are often called errors - and uses
it to bound the resulting error in the function. Matrix perturbation
theory has traditionally emphasized perturbation bounds. The reasons
are varied, but two are paramount.
The first reason is the widespread use of backward rounding-error
analysis, a technique in which errors made in executing an algorithm
are thrown back on the original data. To complete the analysis, pertur-
bation theory is used to assess the effects of these backward errors on
the accuracy of the computed solution. Since only bounds on the error
are known, only bounds on the error in the solution can be expected.
The second reason is that perturbation bounds often give insight
into the behavior of a function of a matrix under perturbation. For
example, it may be possible to determine a multiplier, called a CON-
DITION NUMBER, that converts a bound on the error in the argument
into a bound on the error induced in the function. The knowledge of a
condition number enables one to say whether a function is sensitive or
insensitive to perturbations of its argument.
Although we shall give some perturbation expansions, we will be
chiefly concerned with perturbation bounds. Deriving perturbation
bounds is like cutting a diamond. Tap a problem in just the right
way and it decomposes into one or two informative expressions. Smash
it with a hammer and it shatters into ugly, uninterpretable pieces. One
of the purposes of this book is to introduce the reader to the art of
deriving perturbation bounds.
Our book began life as a translation of a book in Chinese of Sun [229,
1987] on perturbation theory. As we progressed, however, it became
clear that an expanded treatment was needed. The result is an entirely
new book, in which the spirit of the old still lives.
Matrix perturbation theory is too large a field to fit between the
covers of a single book, and we have had to be selective in our choice
of topics. We have chosen to treat the solution of linear systems and
least squares problems (Chapter III), the eigenvalue problem (Chapters IV and V), and the generalized eigenvalue problem (Chapter VI).
Aesthetics has played a part in our choice of what results to present;
we have generally preferred simple, informative bounds to more com-
plicated, though technically sharper bounds. In particular, we have
not been greatly concerned to seek out optimal bounds when nearly
optimal ones are available for less effort.
The book is divided into chapters, sections, and subsections, with
bibliographical notes and exercises at the end of each section. We
have tried to keep the notation uniform. Our hero is the intrepid, yet
sensitive matrix A. Our villain is E, who keeps perturbing A. When A
is perturbed he puts on a crumpled hat: $\tilde{A} = A + E$. There are many
parts for A to play: he is variously square, rectangular, Hermitian,
normal, and unitary. To avoid confusion, A's current guise is posted at
the beginning of each chapter or, if necessary, section.
The book is largely self-contained. We have made a deliberate effort
to keep important material outside the proofs, which in most cases can
be skipped without loss of continuity. When a proof illustrates a general
technique, we point out the fact explicitly.
The notes and references emphasize original sources and historical
development. They also point the reader to topics not treated, many
of which are developed in the exercises. However, the bibliography,
like the book, is not comprehensive, and we apologize in advance to all
those whose work we may have slighted.
The assigning of names to theorems is complicated by the fact that
some theorems have been discovered twice - as theorems in linear al-
gebra and as theorems in functional analysis. We have adopted the
practice of naming the first claimant, whatever his field, and adding
names of people who made substantial generalizations. Thus, the theorem
on low-rank approximation, known to many as the Eckart-Young
theorem, is called the Schmidt-Mirsky theorem after Schmidt, who
proved it for integral equations, and Mirsky, who showed it held for all
unitarily invariant norms. Difficult cases are treated in the notes.
The exercises are intended to amplify the material in the text and to
introduce new results. They range from the trivial to the very difficult.
We have not graded them, but the reader will do well to assume that
any exercise with a reference attached is a major undertaking, especially
in the later chapters.
We debated whether to include numerical examples. Our decision
not to was directed by the increasing availability of interactive systems
that manipulate matrices. With these systems it is a small matter to
play with a perturbation bound, watching it perform under a variety
of circumstances. Beside such lively exercises, a printed example is a
thing of lead and stone.
We wish to thank Nick Higham, Xiaobai Sun, and Guodong Zhang
for reading and commenting on parts of the text. Sven Hammarling and
Vince Fernando furnished us with historical material on the singular
value decomposition. We are particularly indebted to the staff of the
library of the National Institute of Standards and Technology, who
contributed much to the completeness of the bibliography. We are also
indebted to the National Science Foundation of the People's Republic
of China for its support.
College Park
Beijing
1990
Chapter I
Preliminaries
This chapter prepares the ground for the chapters that follow. To keep
it reasonably short it contains only material that is used in two or more
chapters. Background for the individual chapters is developed in initial
introductory sections.
The first section introduces some notation. In Section 2 we introduce
the QR decomposition. In Section 3 we review the elementary
theory of eigenvalues, eigenvectors, and invariant subspaces. In Sec-
tion 4 we introduce the singular value decomposition. The chapter
concludes with a section on the geometry of pairs of subspaces.
1. Notation
In this section we will review our basic notation. Other notation will
be introduced as needed. A summary of all notation is given in an
appendix at the end of the book.
The real numbers will be denoted by $\mathbb{R}$. The space of all n-dimensional
column vectors with components in $\mathbb{R}$ will be denoted by $\mathbb{R}^n$.
The set of all m × n matrices with elements in $\mathbb{R}$ will be denoted
by $\mathbb{R}^{m\times n}$. The complex numbers, their vectors, and matrices will be
denoted by $\mathbb{C}$, $\mathbb{C}^n$, and $\mathbb{C}^{m\times n}$. To avoid clutter, we will use this notation
sparingly, giving dimensions only when they are required and cannot
be inferred from context.
Generally we will use lower-case Greek letters for scalars, lower-
case Latin letters for vectors, and upper-case Latin and Greek letters
for matrices. However, we will not make a fetish of this convention,
especially for scalar-valued functions like the determinant and rank.
Sets of all kinds will be denoted by calligraphic letters (but note $\mathbb{R}$ and
$\mathbb{C}$ in the last paragraph).
We will maintain a loose association between the letter denoting a
vector or matrix and the lower-case Greek letter denoting its elements.
Thus $\alpha_{ij}$ will usually denote the (i, j)-element of a matrix A, and $\beta_i$ the
ith component of b. The reader should keep in mind the association of
$\xi$ with x and $\eta$ with y.
The zero matrix, vector, or scalar will all be written 0. The identity
matrix will be written I, or $I_n$ when it is necessary to specify the
order. The vector of all ones of any dimension will be written $\mathbf{1}$ (a
boldface one). The ith unit vector, whose ith component is one and
whose other components are zero, is $\mathbf{1}_i$ (this unusual notation is due
to the fact that in matrix perturbation theory the more conventional
symbols e and $e_i$ are needed to represent errors).
The transpose of the matrix A will be written $A^T$, and its conjugate
transpose $A^H$; i.e., $A^H = \bar{A}^T$. As usual $A^{-1}$ will denote the inverse of A.
The inverse of the transpose of A will be written $A^{-T}$, and the inverse
conjugate transpose $A^{-H}$.
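Operations on or between sets will run through all combinations of
the members of the sets. For example, if $\mathcal{X}, \mathcal{Y} \subset \mathbb{C}^n$, then
$$\mathcal{X} + \mathcal{Y} = \{x + y : x \in \mathcal{X},\ y \in \mathcal{Y}\}$$
and
$$A\mathcal{X} = \{Ax : x \in \mathcal{X}\}.$$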
The COLUMN SPACE of $A \in \mathbb{C}^{m\times n}$ is
$$\mathcal{R}(A) = \{Ax : x \in \mathbb{C}^n\}.$$
Its NULL SPACE is
$$\mathcal{N}(A) = \{x : Ax = 0\}.$$
The RANK of A is $\mathrm{rank}(A) = \dim[\mathcal{R}(A)]$, where $\dim(\mathcal{X})$ denotes the
dimension of the subspace $\mathcal{X}$. The DETERMINANT of a square matrix A
will be written det(A), and its TRACE written trace(A).
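These definitions can be explored numerically. The following is a minimal sketch, assuming NumPy. It reads orthonormal bases for $\mathcal{R}(A)$ and $\mathcal{N}(A)$ off the singular value decomposition, which is introduced in Section 4. The helper name `range_and_null_space` is hypothetical. So is the tolerance below which a singular value is treated as zero; that is an assumption of the sketch and not part of the definitions above.

```python
import numpy as np

def range_and_null_space(A, tol=None):
    """Return (rank, basis of R(A), basis of N(A)); both bases have orthonormal columns."""
    A = np.asarray(A)
    U, s, Vh = np.linalg.svd(A)                  # A = U diag(s) Vh with full U and Vh
    if tol is None:
        # Singular values below this relative threshold count as zero (an assumption).
        tol = max(A.shape) * np.finfo(float).eps * (s.max() if s.size else 0.0)
    r = int(np.sum(s > tol))                     # numerical rank = dim R(A)
    range_basis = U[:, :r]                       # leading left singular vectors span R(A)
    null_basis = Vh[r:, :].conj().T              # trailing right singular vectors span N(A)
    return r, range_basis, null_basis

A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # second column is twice the first
r, RA, NA = range_and_null_space(A)
print(r, RA.shape, NA.shape)    # 1 (3, 1) (2, 1)
```

Note that dim R(A) + dim N(A) equals the number of columns, as the shapes show.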
We will write $\|x\|_2$ for the 2-norm of a vector, which is defined as
the positive square root of $\sum_i |\xi_i|^2 = x^H x$. The 2-norm is also called the
Euclidean norm, since in real two- or three-dimensional space it is the
Euclidean length of its argument. The properties of this and other
vector norms are treated in detail in Section II.1.
The matrix |A| is the matrix whose elements are $|\alpha_{ij}|$. We write
$A \ge B$ to mean that $\alpha_{ij} \ge \beta_{ij}$ for all i, j, with similar definitions for
the relations >, ≤, <. Note that this notation is inconsistent with the
convention by which A > B means that A - B is positive definite.
The symbol "(,, is used in definitions to introduce new terminology.
The relation "≡" is used for implicit definitions; for example, $(\alpha, \beta + \gamma) \equiv (\alpha, \delta)$ defines $\delta$ to be $\beta + \gamma$.
Some special types of matrices are collected in the following definition.
Definition 1.1. A matrix A is
1. SYMMETRIC (HERMITIAN) if $A^T = A$ ($A^H = A$);
2. POSITIVE DEFINITE (POSITIVE SEMI-DEFINITE, NEGATIVE DEFINITE,
NEGATIVE SEMI-DEFINITE) if it is Hermitian and $x^H A x > (\ge, <, \le)\ 0$ for all $x \ne 0$;
3. UNITARY, or ORTHOGONAL in the real case, if $A^H A = AA^H = I$;
4. NORMAL if $A^H A = AA^H$;
5. UPPER TRIANGULAR if it is square and $i > j \Rightarrow \alpha_{ij} = 0$; i.e., if it
is zero below its diagonal;
6. LOWER TRIANGULAR if it is square and $i < j \Rightarrow \alpha_{ij} = 0$; i.e., if it
is zero above its diagonal;
7. DIAGONAL if it is both upper and lower triangular; i.e., if its nonzero
elements lie on its diagonal;
8. a PERMUTATION MATRIX if it is obtained by permuting rows and
columns of the identity matrix.
The notation $\mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_n)$ will mean a DIAGONAL MATRIX whose
diagonal elements are $\delta_1, \delta_2, \ldots, \delta_n$. The scalars $\delta_i$ may be replaced by
square matrices, in which case the matrix will be said to be BLOCK
DIAGONAL. BLOCK TRIANGULAR matrices are defined similarly.
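The classes of Definition 1.1 can be checked mechanically. Here is a minimal sketch, assuming NumPy. The function name `matrix_kinds` and the tolerance `tol` are hypothetical: the definitions are exact, but floating-point tests need some slack. For definiteness it uses the standard fact that a Hermitian matrix satisfies $x^H A x > 0$ for all $x \ne 0$ exactly when all its eigenvalues are positive. Eigenvalues are treated in Section 3.

```python
import numpy as np

def matrix_kinds(A, tol=1e-12):
    """Names from Definition 1.1 that apply to the square matrix A, up to tolerance tol."""
    A = np.asarray(A, dtype=complex)
    m, n = A.shape
    kinds = set()
    if m != n:
        return kinds                 # every class in Definition 1.1 consists of square matrices

    def is_zero(X):
        return np.allclose(X, 0, atol=tol, rtol=0)

    AH = A.conj().T
    I = np.eye(n)
    if is_zero(A.T - A):
        kinds.add("symmetric")
    if is_zero(AH - A):
        kinds.add("Hermitian")
        w = np.linalg.eigvalsh(A)    # real eigenvalues of the Hermitian matrix A
        if np.all(w > tol):
            kinds.add("positive definite")
        if np.all(w >= -tol):
            kinds.add("positive semi-definite")
        if np.all(w < -tol):
            kinds.add("negative definite")
        if np.all(w <= tol):
            kinds.add("negative semi-definite")
    if is_zero(AH @ A - I) and is_zero(A @ AH - I):
        kinds.add("unitary")
    if is_zero(AH @ A - A @ AH):
        kinds.add("normal")
    upper, lower = is_zero(np.tril(A, -1)), is_zero(np.triu(A, 1))
    if upper:
        kinds.add("upper triangular")
    if lower:
        kinds.add("lower triangular")
    if upper and lower:
        kinds.add("diagonal")
    # All entries 0 or 1, with exactly one 1 in each row and each column.
    if is_zero(A * (A - 1)) and np.allclose(A.sum(axis=0), 1) and np.allclose(A.sum(axis=1), 1):
        kinds.add("permutation matrix")
    return kinds

print(sorted(matrix_kinds([[0, 1], [1, 0]])))
# ['Hermitian', 'normal', 'permutation matrix', 'symmetric', 'unitary']
```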
Notes and References
This book presupposes a knowledge of basic matrix theory, for which there
are any number of good introductory texts. For more advanced treatments
see [98, 140]. The books by Gantmacher [81, 1959] and Bellman [21, 1970]
and the little pamphlet by Marcus [151, 1960] are classics. The perturbation
theorist will find the survey of matrix inequalities by Marcus and Minc [152,
1964] particularly useful. Other useful references on inequalities are [20,
160]. MacDuffee [150, 1946] and Wedderburn [257, 1934] contain extensive
references to the older literature. We have drawn heavily on the former in
preparing the notes for this book.
Much of matrix perturbation theory comes from numerical linear algebra.
For an entry into this area see the books by Varga [252, 1962], Householder
[121, 1964], Wilkinson [269, 1965], Stewart [203, 1974], and Golub and Van
Loan [93, 1983]. Particular mention should be made of Parlett's excellent
book on the symmetric eigenvalue problem [175, 1980], which contains a host
of perturbation results. Another source of matrix perturbation theory is the
specialization of perturbation theorems for operators in infinite dimensional
spaces - a vast area with an extensive literature. The definitive reference is
by Kato [135, 1966]. A third, hybrid source is the approximation of linear
operators by finite dimensional operators, for which see the book by Chatelin
[43, 1983].
There is no standard notation for matrix theory. The one adopted here will
not be unfamiliar to numerical analysts.
When $A \in \mathbb{C}^{m\times n}$ is regarded as an operator on $\mathbb{C}^n$, its range is the same
as the space spanned by its columns; hence the use of $\mathcal{R}(A)$ to denote the
column space of A.
Our definition of positive definite carries the implication that the matrix is
Hermitian. Some authors (e.g., see [276]) drop this requirement.
Exercises
1. Let A be nonsingular. Show that if $T = I + V^H A^{-1} U$ is nonsingular, then
$$(A + UV^H)^{-1} = A^{-1} - A^{-1} U T^{-1} V^H A^{-1}.$$
For a history of this useful formula, which is sometimes called the Sherman-Morrison-Woodbury formula, see [108].
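A numerical check of the formula in Exercise 1 is a useful sanity test. This is a minimal sketch assuming NumPy. The name `smw_inverse` and the randomly generated test matrices are hypothetical choices made only for the illustration.

```python
import numpy as np

def smw_inverse(A_inv, U, V):
    """(A + U V^H)^{-1} computed from A^{-1} by the formula of Exercise 1.

    Assumes T = I + V^H A^{-1} U is nonsingular; T is k x k when U and V are n x k."""
    k = U.shape[1]
    T = np.eye(k) + V.conj().T @ A_inv @ U
    # A^{-1} - A^{-1} U T^{-1} V^H A^{-1}, solving with T rather than inverting it
    return A_inv - A_inv @ U @ np.linalg.solve(T, V.conj().T @ A_inv)

rng = np.random.default_rng(0)
n, k = 5, 2
A = 10.0 * np.eye(n) + rng.standard_normal((n, n))
U = 0.5 * rng.standard_normal((n, k))
V = 0.5 * rng.standard_normal((n, k))
B = smw_inverse(np.linalg.inv(A), U, V)    # compare with np.linalg.inv(A + U @ V.conj().T)
```

When $A^{-1}$ is already available and k is much smaller than n, the update needs only a k × k solve. That is the usual reason for preferring the formula to refactoring $A + UV^H$.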
2. Show that $\mathcal{R}(A) \cap \mathcal{N}(A^H) = \{0\}$.
3. Show that $\det(I - uv^H) = 1 - v^H u$.
4. (Cauchy inequality). Show that $|x^H y| \le \|x\|_2 \|y\|_2$, with equality if and
only if x and y are linearly dependent.
5. (Triangle inequality). Show that $\|x + y\|_2 \le \|x\|_2 + \|y\|_2$.
6. Show that $\mathrm{trace}(A^H A) = \sum_{i,j} |\alpha_{ij}|^2$.
7. Show that the diagonal elements of a Hermitian matrix are real.
8. A matrix A is SKEW HERMITIAN if $A^H = -A$. Show that the diagonal
elements of a skew Hermitian matrix are purely imaginary.
9. Show that any square matrix A can be written uniquely in the form
$$A = A_1 + iA_2,$$
where $A_1$ and $A_2$ are Hermitian.
10. Show that if A is positive definite, then $Ax = 0 \Rightarrow x = 0$.
11. Show that for any matrix A the cross-product matrix $A^H A$ is positive
semi-definite. Show that $A^H A$ is positive definite if and only if the columns
of A are linearly independent.
12. Let $\|u\|_2 = 1$. Show that $H = I - 2uu^H$ is Hermitian and unitary. The
matrix H is called a HOUSEHOLDER TRANSFORMATION.
13. Show that a triangular matrix is normal if and only if it is diagonal.
14. A matrix is STRICTLY UPPER TRIANGULAR if it is upper triangular with
zero diagonal elements. Show that if A is a strictly upper triangular matrix
of order n then $A^n = 0$.
2. The QR Decomposition - Projections
Throughout this book we will be confronted with the following problem:
Given a matrix A, find an orthonormal basis for $\mathcal{R}(A)$ and an
orthonormal basis for the orthogonal complement of $\mathcal{R}(A)$. In this
section we will take a constructive approach to the problem via the
QR decomposition. We will then use the resulting bases to construct
orthogonal projections onto a subspace and its complement.
2.1. The QR Decomposition
To establish the existence of the QR decomposition, we will use a
lemma, which is useful in its own right. Its proof is purely compu-
tational and is left as an exercise.
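Lemma 2.1 (Householder). Let $\|x\|_2 = 1$ and suppose that the first
component of x is real and nonnegative. Let
$$u = \frac{x + \mathbf{1}_1}{\|x + \mathbf{1}_1\|_2}.$$
Then the matrix
$$H = I - 2uu^H$$
is Hermitian and unitary. Moreover,
$$Hx = -\mathbf{1}_1. \qquad (2.1)$$
Since H is Hermitian and unitary, $H^{-1} = H$, and equation (2.1) can be written in the form
$$H\mathbf{1}_1 = -x.$$
In other words, there is a unitary matrix whose first column is -x. Scaling a column of a unitary matrix by a scalar of modulus one leaves it unitary, so we may change the first column to any such multiple of x. In particular, applying the lemma to $\sigma x$, where $|\sigma| = 1$ is chosen to make the first component real and nonnegative, removes that restriction. Hence,
given any vector x of 2-norm one, there is a unitary matrix
whose first column is x.
We may use this result to reduce a matrix to upper triangular form by
a unitary transformation.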
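A direct transcription of the formulas of Lemma 2.1 makes the claims easy to check numerically. The sketch below assumes NumPy; the function name `householder_from_lemma` is hypothetical. It does not reduce a general vector to the lemma's hypotheses, so the caller must supply a unit vector whose first component is real and nonnegative.

```python
import numpy as np

def householder_from_lemma(x):
    """H = I - 2 u u^H with u = (x + 1_1)/||x + 1_1||_2, as in Lemma 2.1.

    Assumes ||x||_2 = 1 and that x[0] is real and nonnegative."""
    x = np.asarray(x, dtype=complex)
    e1 = np.zeros_like(x)
    e1[0] = 1.0                      # the unit vector 1_1
    v = x + e1                       # ||v||_2^2 = 2(1 + x[0]) >= 2, so v is never tiny
    u = v / np.linalg.norm(v)
    return np.eye(x.size) - 2.0 * np.outer(u, u.conj())   # outer(u, conj(u)) = u u^H

x = np.array([0.6, 0.8j])            # ||x||_2 = 1, first component real and >= 0
H = householder_from_lemma(x)
# Up to rounding, H x = -1_1, H is Hermitian and unitary, and H 1_1 = -x.
```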
Theorem 2.2. Let A be an m × n matrix with $m \ge n$. Then there is
a unitary matrix Q such that
$$Q^H A = \begin{pmatrix} R \\ 0 \end{pmatrix},$$
where R is upper triangular with nonnegative diagonal elements.
Proof. The proof is by induction on the number of columns of A. Let
n = 1, so that A is a vector; call it a. Let Q be a unitary matrix
whose first column is $a/\|a\|_2$ (if a = 0 let Q = I). Then
$$Q^H a = \begin{pmatrix} \|a\|_2 \\ 0 \end{pmatrix},$$
which is in the required form.
Now let A have n > 1 columns, and partition A in the form $A = (a\ A_*)$. Let H be a unitary matrix whose first column is $a/\|a\|_2$ (if
a = 0 let H = I). Then $H^H A$ can be written
$$H^H A = \begin{pmatrix} \|a\|_2 & b^H \\ 0 & B \end{pmatrix}.$$
By the induction hypothesis there is a unitary matrix V such that
$$V^H B = \begin{pmatrix} S \\ 0 \end{pmatrix},$$
where S is upper triangular with nonnegative diagonal elements. If we
set $Q = H\,\mathrm{diag}(1, V)$, then
$$Q^H A = \begin{pmatrix} \|a\|_2 & b^H \\ 0 & S \\ 0 & 0 \end{pmatrix},$$
which is the required upper triangular form. ∎
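The inductive proof translates into a loop. At step k, a unitary matrix with a prescribed first column (Lemma 2.1) is embedded as $\mathrm{diag}(I, H)$. The following is a minimal, unoptimized sketch assuming NumPy; the names `unitary_with_first_column` and `householder_qr` are hypothetical. It forms full m × m matrices for clarity. Production codes such as LAPACK apply the reflections without forming them.

```python
import numpy as np

def unitary_with_first_column(a):
    """A unitary matrix whose first column is a/||a||_2 (the identity if a = 0)."""
    a = np.asarray(a, dtype=complex)
    n = a.size
    nrm = np.linalg.norm(a)
    if nrm == 0.0:
        return np.eye(n, dtype=complex)
    x = a / nrm
    # Unimodular sigma making sigma * x[0] real and nonnegative, as Lemma 2.1 requires.
    sigma = np.conj(x[0]) / abs(x[0]) if x[0] != 0 else 1.0
    y = sigma * x
    e1 = np.zeros(n, dtype=complex)
    e1[0] = 1.0
    u = (y + e1) / np.linalg.norm(y + e1)
    H = np.eye(n, dtype=complex) - 2.0 * np.outer(u, u.conj())   # H e1 = -y
    H[:, 0] *= -1.0 / sigma          # rescale the first column from -y to x; H stays unitary
    return H

def householder_qr(A):
    """Unitary Q and upper triangular R with nonnegative diagonal, Q^H A = [R; 0] (m >= n)."""
    A = np.asarray(A, dtype=complex)
    m, n = A.shape
    if m < n:
        raise ValueError("Theorem 2.2 assumes m >= n")
    Q = np.eye(m, dtype=complex)
    B = A.copy()                     # invariant: B = Q^H A
    for k in range(n):
        Hk = np.eye(m, dtype=complex)
        Hk[k:, k:] = unitary_with_first_column(B[k:, k])   # the diag(I, H) step of the proof
        B = Hk.conj().T @ B          # column k below the diagonal becomes zero; B[k, k] = ||B[k:, k]||_2
        Q = Q @ Hk
    return Q, np.triu(B[:n, :])
```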
Let us now assume that A has rank n. Since R is of order n, it
must be nonsingular with positive diagonal elements. Partition $Q =
(Q_A\ Q_\perp)$, where $Q_A$ has n columns. Then
$$A = (Q_A\ Q_\perp)\begin{pmatrix} R \\ 0 \end{pmatrix} = Q_A R.$$
(This decomposition is sometimes called the QR FACTORIZATION of A;
it is essentially unique [Exercise 2.6].) It follows that $\mathcal{R}(Q_A) = \mathcal{R}(A)$;
i.e., the columns of $Q_A$ form an orthonormal basis for $\mathcal{R}(A)$. Moreover, the columns of $Q_\perp$ form an orthonormal basis for the orthogonal
complement $\mathcal{R}(A)^\perp$. Thus we have the following corollary.
Corollary 2.3. Let $\mathcal{X}$ be a subspace of $\mathbb{C}^n$. Then there is a unitary
matrix $Q = (Q_{\mathcal{X}}\ Q_\perp)$ such that $\mathcal{R}(Q_{\mathcal{X}}) = \mathcal{X}$.
Proof. Let the columns of A form a basis for $\mathcal{X}$, and take Q from the QR decomposition of A. ∎
2.2. Hadamard's Inequality
The main use we will put the QR decomposition to in this section is to
establish the properties of orthogonal projectors. But before we do, we
shall digress to establish an important determinantal inequality due to
Hadamard.
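Theorem 2.4 (Hadamard). Let $A = (a_1\ a_2\ \cdots\ a_n)$ be of order n.
Then
$$|\det(A)| \le \prod_{j=1}^{n} \|a_j\|_2.$$
Moreover, equality holds if and only if A has a zero column or the columns of A
are mutually orthogonal.
Proof. Let $Q^H A = R = (r_1 \cdots r_n)$ be the QR factorization of A.
Since premultiplication by a unitary matrix does not change the
magnitude of the determinant or the norms of the columns of A, we
have
$$|\det(A)| = |\det(R)| = \prod_{i=1}^{n} |\rho_{ii}| \le \prod_{i=1}^{n} \|r_i\|_2 = \prod_{i=1}^{n} \|a_i\|_2. \qquad (2.2)$$
If equality holds, then either det(A) = 0 and one of the $a_i$ is zero, or
$\det(A) \ne 0$ and $|\rho_{ii}| = \|r_i\|_2$ $(i = 1, \ldots, n)$, which can only happen if R
is diagonal, that is, if the columns of A are mutually orthogonal. Conversely, a zero column makes both sides zero, and orthogonal columns give $A^H A = \mathrm{diag}(\|a_1\|_2^2, \ldots, \|a_n\|_2^2)$, so that $|\det(A)|^2 = \det(A^H A) = \prod_i \|a_i\|_2^2$. ∎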
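A quick numerical look at Theorem 2.4 follows. It is a minimal sketch assuming NumPy; `hadamard_sides` is a hypothetical helper. For a random matrix the inequality is usually far from sharp. For a matrix with orthogonal columns it holds with equality.

```python
import numpy as np

def hadamard_sides(A):
    """(|det A|, product of the column 2-norms) for a square matrix A."""
    A = np.asarray(A)
    return abs(np.linalg.det(A)), float(np.prod(np.linalg.norm(A, axis=0)))

rng = np.random.default_rng(2)
G = rng.standard_normal((4, 4))
lhs, rhs = hadamard_sides(G)                    # lhs <= rhs, typically with room to spare
Q, _ = np.linalg.qr(G)                          # Q has orthonormal columns
lhs2, rhs2 = hadamard_sides(Q @ np.diag([1.0, 2.0, 3.0, 4.0]))   # orthogonal columns: equality
```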
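2.3. Projections
Let $\mathcal{X}$ be a subspace of $\mathbb{C}^n$ and let the columns of $Q_{\mathcal{X}}$ form an orthonormal basis for $\mathcal{X}$. The matrix
$$P_{\mathcal{X}} = Q_{\mathcal{X}} Q_{\mathcal{X}}^H \qquad (2.3)$$
is called the ORTHOGONAL PROJECTION ONTO $\mathcal{X}$. Informally, $P_{\mathcal{X}}$ acts like
the sun at high noon, projecting a vector z into its shadow $P_{\mathcal{X}} z$ on the
ground $\mathcal{X}$. In this subsection we will treat the elementary properties of
projections.
For (2.3) to be a proper definition, it must be independent of the
choice of $Q_{\mathcal{X}}$. In fact, if the columns of $\tilde{Q}_{\mathcal{X}}$ also form an orthonormal
basis for $\mathcal{X}$, then $\tilde{Q}_{\mathcal{X}} = Q_{\mathcal{X}} U$ for some nonsingular U. Since
$$I = \tilde{Q}_{\mathcal{X}}^H \tilde{Q}_{\mathcal{X}} = U^H Q_{\mathcal{X}}^H Q_{\mathcal{X}} U = U^H U,$$
it follows that U is unitary. Hence
$$\tilde{Q}_{\mathcal{X}} \tilde{Q}_{\mathcal{X}}^H = Q_{\mathcal{X}} U U^H Q_{\mathcal{X}}^H = Q_{\mathcal{X}} Q_{\mathcal{X}}^H = P_{\mathcal{X}}.$$
The matrix $P_{\mathcal{X}}$ is Hermitian ($P_{\mathcal{X}}^H = P_{\mathcal{X}}$) and idempotent ($P_{\mathcal{X}}^2 = P_{\mathcal{X}}$).
The fact that $P_{\mathcal{X}}$ is idempotent implies that for any $x \in \mathcal{X}$ we have
$P_{\mathcal{X}} x = x$. In fact, since $\mathcal{R}(P_{\mathcal{X}}) = \mathcal{X}$, we must have $x = P_{\mathcal{X}} w$ for some
vector w. Hence $P_{\mathcal{X}} x = P_{\mathcal{X}}^2 w = P_{\mathcal{X}} w = x$. On the other hand, the fact
that $P_{\mathcal{X}}$ is Hermitian implies that $P_{\mathcal{X}} y = 0$ for any $y \in \mathcal{X}^\perp$. In fact,
y must be orthogonal to the columns of $P_{\mathcal{X}}$; that is, $0 = (y^H P_{\mathcal{X}})^H = P_{\mathcal{X}}^H y = P_{\mathcal{X}} y$.
It follows that if we decompose any vector z in the form
$$z = x + y, \qquad x \in \mathcal{X},\ y \in \mathcal{X}^\perp, \qquad (2.4)$$
then $x = P_{\mathcal{X}} z$ and $y = (I - P_{\mathcal{X}})z$. Thus $P_{\mathcal{X}}$ projects z orthogonally onto
its component in $\mathcal{X}$, and $I - P_{\mathcal{X}} = P_{\mathcal{X}}^\perp$ projects z onto its component
in $\mathcal{X}^\perp$.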
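The properties just derived can be observed numerically by building $P_{\mathcal{X}}$ as in (2.3) from a QR-computed orthonormal basis. This is a minimal sketch assuming NumPy, whose `np.linalg.qr` returns the reduced factorization by default. The helper name `orthogonal_projection` is hypothetical, and the input is assumed to have full column rank. The final comparison in the usage lines anticipates the nearest-point property proved in Theorem 2.5.

```python
import numpy as np

def orthogonal_projection(X):
    """P = Q Q^H as in (2.3), with Q an orthonormal basis for R(X) from a reduced QR.

    Assumes X has full column rank, so that the columns of Q span R(X)."""
    Q, _ = np.linalg.qr(X)
    return Q @ Q.conj().T

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 2))
P = orthogonal_projection(X)
z = rng.standard_normal(5)
x, y = P @ z, z - P @ z          # the decomposition (2.4): x in R(X), y orthogonal to R(X)
c = rng.standard_normal(2)       # any other point X @ c of R(X) is no closer to z than x
```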
Moreover, if P is any Hermitian, idempotent matrix with column
space $\mathcal{X}$, then by the above argument $Pz = x$, where x is defined by
(2.4). It follows that $P_{\mathcal{X}} z = Pz$ for all z, and hence $P_{\mathcal{X}} = P$. In other
words, any Hermitian, idempotent matrix is the orthogonal projection onto its column space.
By way of notation, we will write $P_A$ for the orthogonal projection
onto $\mathcal{R}(A)$. We will write $P_{\mathcal{X}}^\perp$ and $P_A^\perp$ for the projections complementary
to $P_{\mathcal{X}}$ or $P_A$. When $\mathcal{X}$ or A can be inferred from context, we will
simply write P and $P^\perp$ for the complementary projections.
Since the vectors in the decomposition
$$z = P_{\mathcal{X}} z + P_{\mathcal{X}}^\perp z$$
are orthogonal, we have
$$\|z\|_2^2 = (P_{\mathcal{X}} z + P_{\mathcal{X}}^\perp z)^H (P_{\mathcal{X}} z + P_{\mathcal{X}}^\perp z) = (P_{\mathcal{X}} z)^H (P_{\mathcal{X}} z) + (P_{\mathcal{X}}^\perp z)^H (P_{\mathcal{X}}^\perp z) = \|P_{\mathcal{X}} z\|_2^2 + \|P_{\mathcal{X}}^\perp z\|_2^2,$$
a relation we shall call the PYTHAGOREAN EQUALITY. This equality is
the basis of the following important theorem.
Theorem 2.5. If $\mathcal{X}$ is a subspace of $\mathbb{C}^n$, then $P_{\mathcal{X}} z$ is the unique vector
satisfying
$$\|z - P_{\mathcal{X}} z\|_2 = \min_{x \in \mathcal{X}} \|z - x\|_2.$$
Proof. For any $x \in \mathcal{X}$
$$\|z - x\|_2^2 = \|(P_{\mathcal{X}} z - x) + P_{\mathcal{X}}^\perp z\|_2^2 = \|P_{\mathcal{X}} z - x\|_2^2 + \|P_{\mathcal{X}}^\perp z\|_2^2.$$
The right-hand side of this relation is clearly minimized when its first
term is zero; that is, it is minimized if and only if $x = P_{\mathcal{X}} z$. ∎
Thus $P_{\mathcal{X}} z$ is the vector in $\mathcal{X}$ that is nearest z in the 2-norm. It is
called the LEAST SQUARES APPROXIMATION to z, since the sum of squares
of the absolute values of the components of $z - P_{\mathcal{X}} z$ is minimal.
Notes and References
Although Householder transformations are mentioned in passing by Feller
and Forsythe [73, 1951], Householder [120, 1958] was the first to use them
systematically to reduce a matrix to a simpler form, in this case upper
triangular. The requirement that the first component of x be nonnegative is
more than a trick to avoid the degenerate case $x = -\mathbf{1}_1$; it is necessary for
numerical stability.
In one sense the QR decomposition goes back to Gram [97, 1883], who introduced the idea of orthogonalizing a sequence of functions and gave a determinantal expression for the resulting sequence.¹ Schmidt [192, 1907] described
the orthogonalization technique now known as the Gram-Schmidt algorithm
and pointed out that the results are the same as Gram's. If Schmidt's formulas are applied to the columns of A, they compute the columns of $Q_A$ with the
elements of R appearing as the coefficients in the expansions (Exercise 2.4).
A drawback of the Gram-Schmidt approach to the QR decomposition is that
it does not provide an explicit basis for $\mathcal{R}(A)^\perp$.
The name of the decomposition derives from Francis's QR algorithm for
finding the eigenvalues of a matrix [75, 1961-1962] and its precursor the LR algorithm [191, 1955]. The letter R comes from the German word recht (right), the
equivalent of English upper in reference to triangular matrices. The letter
Q was chosen "somewhat arbitrarily" by Francis.
Projections do not have to be orthogonal. In fact, any idempotent matrix
P, Hermitian or not, can be regarded as an oblique projection onto $\mathcal{R}(P)$
along $\mathcal{N}(P)$ (see Exercise 2.10). We will use such oblique projections in
the perturbation theory for invariant subspaces (Chapter V).
For more on least squares approximations, see the notes and references to
Section III.1.
¹Precursors may be found in the works of Laplace [141, 1820] and Chebyshev [44, 1859]. But
the former is not concerned with the decomposition qua decomposition (the formulas are used to
determine the variance of a least-squares estimate), and the latter restricts himself to polynomials.
Exercises
1. (Householder [120]). Show that if $x \ne y$, $\|x\|_2 = \|y\|_2$, and $y^H x$ is real,
then there is a Householder transformation H such that $Hx = y$.
2. Let $m \ge n$ and let $A \in \mathbb{C}^{m\times n}$ have rank k. Show that there is a permutation
matrix P and a unitary matrix Q such that
$$Q^H A P = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix},$$
where $R_{11}$ is a nonsingular, upper triangular matrix of order k. Moreover,
the columns of the matrix
$$P \begin{pmatrix} -R_{11}^{-1} R_{12} \\ I \end{pmatrix}$$
form a basis for $\mathcal{N}(A)$.
3. Use Exercise 2.2 to show that any matrix A can be written in the form
$A = FC^H$, where F and C have full column rank. This factorization, which
is not unique, is called a FULL RANK FACTORIZATION of A.
THE NEXT TWO EXERCISES TREAT THE GRAM-SCHMIDT ALGORITHM, AN IMPORTANT ALTERNATIVE FOR COMPUTING THE
QR DECOMPOSITION.
4. (The Gram-Schmidt algorithm). Let $A = (a_1\ a_2\ \cdots\ a_n)$ have rank n,
and consider the following algorithm.
    for j := 1 to n
       q_j := a_j
       for i := 1 to j-1
          ρ_ij := q_i^H q_j
       end for
       for i := 1 to j-1
          q_j := q_j - ρ_ij q_i
       end for
       ρ_jj := sqrt(q_j^H q_j)
       q_j := ρ_jj^{-1} q_j
    end for
Show that the algorithm goes to completion. Moreover, if we set $Q_A =
(q_1\ q_2\ \cdots\ q_n)$ and
$$R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1n} \\ 0 & \rho_{22} & \cdots & \rho_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \rho_{nn} \end{pmatrix},$$
then $Q_A^H Q_A = I$ and $A = Q_A R$; i.e., the Gram-Schmidt algorithm computes
the QR factorization of A.
5. (The modified Gram-Schmidt algorithm). Show that if the two inner
loops in the Gram-Schmidt algorithm are replaced by the single loop
    for i := 1 to j-1
       ρ_ij := q_i^H q_j
       q_j := q_j - ρ_ij q_i
    end for
then the same decomposition is computed. This modified procedure has
superior numerical properties [35].
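The two loop orderings of Exercises 4 and 5 can be compared directly. Below is a minimal sketch assuming NumPy. The names `gram_schmidt` and `loss_of_orthogonality` are hypothetical. The test matrix is a Läuchli-type example with an arbitrarily chosen small parameter `eps`, picked for illustration. Both variants reproduce A = QR. On this nearly rank-deficient matrix, however, the classical version loses orthogonality almost completely, while the modified version keeps it to roughly the size of `eps`.

```python
import numpy as np

def gram_schmidt(A, modified=False):
    """A = Q R by the loops of Exercise 4 (classical) or Exercise 5 (modified).

    Assumes A has full column rank, so every rho_jj is nonzero."""
    A = np.asarray(A)
    dtype = np.result_type(A.dtype, np.float64)
    m, n = A.shape
    Q = np.zeros((m, n), dtype=dtype)
    R = np.zeros((n, n), dtype=dtype)
    for j in range(n):
        q = A[:, j].astype(dtype)                  # q_j := a_j (a copy)
        if modified:
            for i in range(j):                     # the single loop of Exercise 5
                R[i, j] = np.vdot(Q[:, i], q)      # rho_ij := q_i^H q_j with the current q_j
                q = q - R[i, j] * Q[:, i]
        else:
            for i in range(j):                     # first inner loop of Exercise 4
                R[i, j] = np.vdot(Q[:, i], q)      # q is still a_j here
            for i in range(j):                     # second inner loop
                q = q - R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(q)                # rho_jj := sqrt(q_j^H q_j)
        Q[:, j] = q / R[j, j]
    return Q, R

def loss_of_orthogonality(Q):
    """Largest entry of |Q^H Q - I|."""
    return np.max(np.abs(Q.conj().T @ Q - np.eye(Q.shape[1])))

eps = 1e-8                                         # illustrative choice
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])
Qc, Rc = gram_schmidt(A)
Qm, Rm = gram_schmidt(A, modified=True)
print(loss_of_orthogonality(Qc), loss_of_orthogonality(Qm))   # roughly 0.5 versus roughly 1e-8
```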
6. Let A = QR be a QR factorization of A. Show that if A has full column
rank, then any QR factorization of A has the form $A = (QD)(\bar{D}R)$, where
D is diagonal and $|D| = I$.
7. Let R be the R-factor in the QR factorization of A. Show that $A^H A =
R^H R$. The matrix R is called the CHOLESKY FACTOR of $A^H A$.
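Exercises 6 and 7 can be illustrated together. NumPy's `np.linalg.qr` does not promise a nonnegative diagonal in R. Rescaling by a diagonal D with |D| = I, as in Exercise 6, produces the factor with positive diagonal, which should then agree with the transposed Cholesky factor returned by `np.linalg.cholesky`. This is a minimal sketch assuming NumPy; `qr_positive_diagonal` is a hypothetical name, and A is assumed to have full column rank.

```python
import numpy as np

def qr_positive_diagonal(A):
    """Reduced QR factorization A = Q R in which R has a real, positive diagonal.

    As in Exercise 6, rescale by a diagonal D with |D| = I: A = (Q D)(conj(D) R).
    Assumes A has full column rank, so no diagonal entry of R is zero."""
    Q, R = np.linalg.qr(A)
    d = np.diag(R)
    D = np.diag(d / np.abs(d))                 # unimodular diagonal matrix
    return Q @ D, D.conj() @ R

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
Q, R = qr_positive_diagonal(A)
L = np.linalg.cholesky(A.conj().T @ A)         # A^H A = L L^H with L lower triangular
# Exercise 7: R^H R = A^H A, so by uniqueness of the Cholesky factor, R = L^H.
```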
8. (The partitioned QR factorization). Let $A \in \mathbb{C}^{m\times n}$, where $m \ge n$. Let
A be partitioned in the form $A = (A_1\ A_2)$, where $A_1$ has full column rank.
Let
$$(A_1\ A_2)^H (A_1\ A_2) = \begin{pmatrix} C_{11} & C_{12} \\ C_{12}^H & C_{22} \end{pmatrix}$$
be a conformal partitioning of the cross-product matrix $C = A^H A$, and let
$$(A_1\ A_2) = (Q_1\ Q_2) \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}$$
be a conformal partition of a QR factorization of A. Show that
1. $A_1 = Q_1 R_{11}$;   1'. $C_{11} = R_{11}^H R_{11}$;
2. $P_{A_1} A_2 = Q_1 R_{12}$;   2'. $C_{11}^{-1} C_{12} = R_{11}^{-1} R_{12}$;
3. $P_{A_1}^\perp A_2 = Q_2 R_{22}$;   3'. $C_{22} - C_{12}^H C_{11}^{-1} C_{12} = R_{22}^H R_{22}$.
The matrix $C_{22} - C_{12}^H C_{11}^{-1} C_{12}$ is called the SCHUR COMPLEMENT of $C_{11}$ in C
[193, 1909] and appears in many applications. For more see [108, 170].
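The six identities of Exercise 8 can be spot-checked numerically. The sketch below assumes NumPy and, for simplicity, a real matrix, so that $^H$ becomes $^T$. The sizes `m`, `k`, `n` are arbitrary choices for the illustration. NumPy's QR may return negative diagonal entries in R, but each identity holds for any QR factorization.

```python
import numpy as np

rng = np.random.default_rng(5)
m, k, n = 7, 2, 5                            # A1 has k columns; sizes are arbitrary
A = rng.standard_normal((m, n))              # real case, so ^H becomes ^T
A1, A2 = A[:, :k], A[:, k:]
C = A.T @ A                                  # cross-product matrix
C11, C12, C22 = C[:k, :k], C[:k, k:], C[k:, k:]
Q, R = np.linalg.qr(A)                       # reduced QR, partitioned conformally below
Q1, Q2 = Q[:, :k], Q[:, k:]
R11, R12, R22 = R[:k, :k], R[:k, k:], R[k:, k:]
P1 = Q1 @ Q1.T                               # orthogonal projection onto R(A1) = R(Q1)
S = C22 - C12.T @ np.linalg.solve(C11, C12)  # Schur complement of C11 in C
```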
9. Let $A = (a_1 \cdots a_n)$ be of order n and let $P_i^\perp$ be the projection onto the
orthogonal complement of the space spanned by $a_1, \ldots, a_{i-1}$ (take $P_1^\perp = I$).
Show that
$$|\det(A)| = \prod_{i=1}^{n} \|P_i^\perp a_i\|_2.$$
Hence deduce Hadamard's inequality.
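The identity of Exercise 9 can be confirmed numerically. This is a minimal sketch assuming NumPy; the name `det_by_projections` is hypothetical. It forms the component of $a_i$ orthogonal to the earlier columns with the pseudo-inverse, since $BB^{+}$ is the orthogonal projection onto $\mathcal{R}(B)$ for any B.

```python
import numpy as np

def det_by_projections(A):
    """Product of ||P_i^perp a_i||_2 over the columns of A, as in Exercise 9."""
    A = np.asarray(A)
    result = 1.0
    for i in range(A.shape[1]):
        a = A[:, i]
        if i == 0:
            r = a                                  # P_1^perp = I
        else:
            B = A[:, :i]                           # a_1, ..., a_{i-1}
            r = a - B @ (np.linalg.pinv(B) @ a)    # remove the projection of a onto R(B)
        result *= np.linalg.norm(r)
    return result

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
# det_by_projections(A) should agree with abs(np.linalg.det(A)) up to rounding.
```

Since each factor satisfies $\|P_i^\perp a_i\|_2 \le \|a_i\|_2$, the identity gives Hadamard's inequality at once.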
10. Show that if $P$ is idempotent then $\mathcal{R}(P) \oplus \mathcal{N}(P) = \mathbb{C}^n$. Conclude that
any idempotent matrix $P$ is the projection onto $\mathcal{R}(P)$ along $\mathcal{N}(P)$. Such
projections are called OBLIQUE PROJECTIONS.
11. Show that if P is any projection then rank(P) = trace(P).
3. Eigenvalues and Eigenvectors
This section is devoted to the eigenvalue problem $Ax = \lambda x$.
Throughout the section, A will denote a matrix of order
n.
3.1. Definitions and Elementary Properties
We begin with the usual definition of an eigenvalue and eigenvector of
a matrix. Loosely speaking, an eigenvector is a vector that does not
change direction when it is multiplied by A, and its eigenvalue is the
amount by which it shrinks or expands in the process.
Definition 3.1. The pair $(x, \lambda)$ is called an EIGENPAIR of the matrix
$A$ if $x \neq 0$ and
$$Ax = \lambda x.$$
The vector $x$ is called an EIGENVECTOR of $A$, and $\lambda$ is its associated
EIGENVALUE. The set of all eigenvalues of $A$ is written $\mathcal{L}(A)$.
A word on nomenclature is appropriate here. The prefix eigen is
German, and in this connection it means something like "characteris-
tic." Purists originally objected to the hybrid translation "eigenvalue"
for the German eigenwert, preferring one or another of a host of names
(characteristic value, proper value, latent root, etc.). By now, however,
"eigen" has become a living English prefix that means "pertaining to
eigenvalues and eigenvectors," and we will use it with complete freedom,
as we did in defining the term eigenpair above.
The equation $Ax = \lambda x$ may be written in the form $(\lambda I - A)x = 0$,
from which it is seen that $\lambda$ is an eigenvalue of $A$ if and only if $\lambda I - A$
is singular or, equivalently, if and only if
$$\phi_A(\lambda) \stackrel{\text{def}}{=} \det(\lambda I - A) = 0.$$
The function $\phi_A(\lambda)$ is a polynomial of degree $n$ in $\lambda$ and is called the
CHARACTERISTIC POLYNOMIAL of $A$. Consequently, a matrix has exactly
$n$ eigenvalues, each distinct eigenvalue being counted according to its
multiplicity as a root of the CHARACTERISTIC EQUATION $\phi_A(\lambda) = 0$. An
eigenvalue whose multiplicity is one is called a SIMPLE EIGENVALUE.
The characterization of eigenvalues in terms of the characteristic
polynomial has some important consequences. First, since $\phi_A(\lambda) = 0$
if and only if $\phi_{A^H}(\bar\lambda) = 0$, each eigenvalue $\lambda$ of $A$ corresponds to an
eigenvalue $\bar\lambda$ of $A^H$. Hence there is a vector $y \neq 0$ such that $A^H y = \bar\lambda y$ or,
equivalently, $y^H A = \lambda y^H$. The vector $y$ is called a LEFT EIGENVECTOR
of $A$ (and the original eigenvector is called a RIGHT EIGENVECTOR of $A$
when it must be distinguished from $y$).
Second, if $A$ is real, then its characteristic polynomial is real and
its complex eigenvalues must occur in complex conjugate pairs.
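Third, if $A$ is block triangular, say
$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ 0 & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_{kk} \end{pmatrix},$$
then $\phi_A(\lambda) = \phi_{A_{11}}(\lambda)\phi_{A_{22}}(\lambda)\cdots\phi_{A_{kk}}(\lambda)$. Hence
      the eigenvalues of a block triangular matrix are the eigenvalues
      of its diagonal blocks.
In particular, the eigenvalues of a triangular matrix are its diagonal
elements.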
If $(x, \lambda)$ and $(y, \lambda)$ are eigenpairs of $A$, then $(\alpha x + \beta y, \lambda)$ is also
an eigenpair, provided $\alpha x + \beta y \neq 0$. Thus the set of all eigenvectors
corresponding to an eigenvalue $\lambda$ together with the zero vector forms a
subspace, which is equal to $\mathcal{N}(\lambda I - A)$. The dimension of this subspace,
$\dim[\mathcal{N}(\lambda I - A)] = n - \operatorname{rank}(\lambda I - A)$, is called the GEOMETRIC
MULTIPLICITY of $\lambda$. It is easy to see that the geometric multiplicity of an
eigenvalue is not greater than its ALGEBRAIC MULTIPLICITY as a root of
the characteristic equation. It can, however, be smaller, as the following
example shows.
Example 3.2. Let
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$
Then $\phi_A(\lambda) = (\lambda - 1)^2$, so that 1 is an eigenvalue of algebraic multiplicity
two. On the other hand, the only eigenvectors of $A$ are multiples of
$e_1 = (1\ 0)^T$, so that the geometric multiplicity of the eigenvalue 1 is one.
An eigenvalue whose geometric multiplicity is less than its algebraic
multiplicity is said to be DEFECTIVE. A DEFECTIVE MATRIX is one with
at least one defective eigenvalue. Unfortunately, defective eigenvalues
and their matrices are the bane of matrix perturbation theory, as the
following continuation of the above example shows.
Example 3.2 (continued). Let
$$\tilde A = \begin{pmatrix} 1 & 1 \\ \epsilon & 1 \end{pmatrix}$$
be a perturbation of $A$. Then $\phi_{\tilde A}(\lambda) = (\lambda - 1)^2 - \epsilon$, so that the
eigenvalues of $\tilde A$ are $1 \pm \sqrt{\epsilon}$. Thus the eigenvalues of $\tilde A$ are not
differentiable at $\epsilon = 0$. Moreover, if $\epsilon = 10^{-10}$, then the eigenvalues of $A$ and $\tilde A$
differ by $10^{-5}$. Thus a perturbation of $10^{-10}$ in $A$ can cause a perturbation
in its eigenvalues that is five orders of magnitude larger.
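The $\sqrt{\epsilon}$ behavior is easy to see numerically. The sketch below computes both eigenvalues of $\begin{pmatrix} 1 & 1 \\ \epsilon & 1 \end{pmatrix}$ from the characteristic polynomial using the quadratic formula, and prints their distance from 1. The helper `eig2` is our own and is only suitable for $2\times 2$ matrices.

```python
import cmath


def eig2(a, b, c, d):
    """Both eigenvalues of [[a, b], [c, d]], as the roots of
    lambda**2 - (a + d) * lambda + (a * d - b * c)."""
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


# Perturb the (2,1) element of A = [[1, 1], [0, 1]] by eps.
for eps in (1e-2, 1e-6, 1e-10):
    lam_plus, lam_minus = eig2(1.0, 1.0, eps, 1.0)
    print(f"eps = {eps:.0e}   |lambda - 1| = {abs(lam_plus - 1):.3e}")
```

Up to rounding, each printed distance equals $\sqrt{\epsilon}$. In particular, $\epsilon = 10^{-10}$ moves the eigenvalues by about $10^{-5}$.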
The use of a tilde (viz. $\tilde A$) to denote a perturbed quantity is the
first occurrence of a convention that will be used throughout this book.
See the introduction to Section III.2 for more details.
If $(x, \lambda)$ is an eigenpair of $A$ and $U$ is nonsingular, then
$$(U^{-1}AU)U^{-1}x = \lambda U^{-1}x,$$
which shows that $(U^{-1}x, \lambda)$ is an eigenpair of $U^{-1}AU$. Thus the SIMILARITY
TRANSFORMATION $A \to U^{-1}AU$ preserves the eigenvalues of
$A$ and transforms its eigenvectors by $U^{-1}$. Since
$\operatorname{rank}(\lambda I - A) = \operatorname{rank}(\lambda I - U^{-1}AU)$, the geometric multiplicities of the eigenvalues are
invariant under similarity transformations. Since $\phi_A(\lambda) = \phi_{U^{-1}AU}(\lambda)$,
the algebraic multiplicities are also preserved. In the next two subsections
we will consider how far a matrix may be simplified by similarity
transformations.
3.2. The Schur Decomposition
A major theme of matrix theory is the reduction of matrices to a
simple form by similarity transformations. For example, it will follow
from the results of the next subsection that if a matrix $A$ is not
defective then there is a nonsingular matrix $X$ such that
$X^{-1}AX = \Lambda \equiv \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, where the $\lambda_i$ are the eigenvalues of $A$. The
advantages of being able to work with a diagonal matrix instead of a
full one are obvious.
Unfortunately, similarity transformations can introduce problems of
their own. Suppose, for example, that the matrix of the last paragraph
is perturbed by an error $E$. Then the diagonalized problem is $\Lambda + X^{-1}EX$.
If $X$ and its inverse are large (or if $X$ is ILL CONDITIONED, as
we shall learn to say in Chapter III), then the effect of the similarity
transformation is to magnify the error.
In this connection, unitary similarity transformations of the form
$A \to U^H AU$ are particularly desirable, since neither $U$ nor its inverse
$U^H$ can be large. Specifically, from the fact that $U^H U = I$, it follows
that the $j$th column of $U$ satisfies $\|u_j\|_2 = 1$. Hence no element of $U$
or its inverse can be greater than one in absolute value.
In general we cannot reduce a matrix to diagonal form by unitary
similarities (informally, the relation $U^H U = I$ implies that $U$ has
roughly $n^2/2$ degrees of freedom, too few to satisfy the roughly $n^2$
conditions that the off-diagonal elements of $U^H AU$ be zero). However,
the following theorem shows that we can reduce an arbitrary matrix to
triangular form by a unitary similarity.
Theorem 3.3 (Schur). There is a unitary matrix $U$ such that
$T = U^H AU$ is upper triangular. The matrix $U$ may be chosen so that the
eigenvalues of $A$ appear on the diagonal of $T$ in any order.
Proof. The proof is by induction. The theorem is trivial for a matrix
of order one. Assume it is true for all matrices of order less than $n > 1$.
Let an ordering of the eigenvalues of $A$ be given, and let $\lambda$ be the first
eigenvalue of the ordering. Let $Ax = \lambda x$, where $\|x\|_2 = 1$, and let
$H = (x\ X)$ be a unitary matrix. The matrix $H^H AH$ has the form
$$H^H AH = \begin{pmatrix} x^H Ax & x^H AX \\ X^H Ax & X^H AX \end{pmatrix}.$$
Since $Ax = \lambda x$ and $x^H x = 1$, we have $x^H Ax = \lambda x^H x = \lambda$. Since
$X^H x = 0$, we have $X^H Ax = \lambda X^H x = 0$. Thus
$$H^H AH = \begin{pmatrix} \lambda & x^H AX \\ 0 & X^H AX \end{pmatrix} \equiv \begin{pmatrix} \lambda & b^H \\ 0 & M \end{pmatrix}.$$
Since $H^H AH$ is block triangular, the eigenvalues of the matrix $M$ are
the eigenvalues of $A$ with one copy of $\lambda$ removed. By hypothesis, there is a unitary
matrix $V$ such that $V^H MV$ is upper triangular with its eigenvalues in
the correct order. If we set $U = (x\ XV)$, then
$$T = U^H AU = \begin{pmatrix} \lambda & b^H V \\ 0 & V^H MV \end{pmatrix}$$
is the required decomposition. ∎
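For a $2\times 2$ matrix the induction in the proof stops after one step, because $M$ is $1\times 1$. The sketch below follows that step. It takes a unit eigenvector $x$, completes it to a unitary $H = (x\ X)$ using the vector $(-\bar\xi_2, \bar\xi_1)^T$, and forms $H^H AH$. The helper names are our own. Taking the eigenvalue from the quadratic formula is only a convenience for the $2\times 2$ case; it is not how the QR algorithm computes Schur forms.

```python
import cmath


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def ctranspose(X):
    """Conjugate transpose X^H."""
    return [[X[j][i].conjugate() for j in range(len(X))] for i in range(len(X[0]))]


def schur_2x2(A):
    """Return (U, T) with U unitary and T = U^H A U upper triangular, for a
    2x2 matrix A = [[a, b], [c, d]], built as in the proof of Theorem 3.3."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    lam = (tr + cmath.sqrt(tr * tr - 4 * det)) / 2        # an eigenvalue of A
    # Two candidate solutions of (A - lam I) x = 0, one from each row.
    cand1, cand2 = (b, lam - a), (lam - d, c)
    x = max(cand1, cand2, key=lambda v: abs(v[0]) ** 2 + abs(v[1]) ** 2)
    nrm = (abs(x[0]) ** 2 + abs(x[1]) ** 2) ** 0.5
    if nrm == 0:                  # A = lam I: every unit vector is an eigenvector
        x, nrm = (1, 0), 1.0
    x = (x[0] / nrm, x[1] / nrm)
    y = (-x[1].conjugate(), x[0].conjugate())              # unit vector orthogonal to x
    U = [[x[0], y[0]], [x[1], y[1]]]                       # H = (x X)
    T = matmul(ctranspose(U), matmul(A, U))
    return U, T


U, T = schur_2x2([[0.0, -1.0], [1.0, 0.0]])   # a real matrix with eigenvalues +-i
```

Even when $A$ is real, the result is complex if $A$ has complex eigenvalues, as the rotation example shows. The notes at the end of this section return to this point.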
Although the proof of Schur's theorem is not difficult, the result-
ing decomposition is one of the most important tools in theoretical
and computational linear algebra. Computationally, it is the target of
the QR algorithm, the single most successful algorithm for computing
eigensystems of general matrices. We will encounter its theoretical uses
throughout this book, beginning with some elementary consequences in
this chapter.
Recall (Definition 1.1) that a matrix $A$ is normal if $A^H A = AA^H$.
This seemingly innocuous equation has important consequences.
Theorem 3.4. If A is normal then any Schur decomposition of A is
diagonal.
Proof. The theorem is trivial for a matrix of order one. Assume it is
true for all matrices of order less than $n > 1$. Let $T = U^H AU$ be a
Schur decomposition of $A$ and partition $T$ in the form
$$T = \begin{pmatrix} \tau & t^H \\ 0 & T_* \end{pmatrix}.$$
Now if $A$ is normal, any matrix unitarily similar to $A$ is normal. Hence
$T^H T = TT^H$. Comparing the (1,1) elements and the trailing blocks of the
two sides, we get
$$|\tau|^2 = |\tau|^2 + t^H t$$
and
$$tt^H + T_*^H T_* = T_* T_*^H.$$
The first of the equations implies that $t = 0$. The second then says that $T_*$ is
normal, and hence by hypothesis it is diagonal (since it is triangular).
Thus $T$ is diagonal. ∎
If we write $T = \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ and $U = (u_1\ u_2\ \cdots\ u_n)$,
then $AU = U\Lambda$ or
$$Au_j = \lambda_j u_j, \quad j = 1, \ldots, n.$$
In other words the $u_j$ are eigenvectors of $A$. Since the eigenvectors are
pairwise orthogonal, we have the following corollary.
Corollary 3.5. A normal matrix of order $n$ has a system of orthonormal
eigenvectors that span $\mathbb{C}^n$.
A matrix chosen at random is unlikely to be normal. However, there
are two frequently occurring classes of normal matrices: unitary matri-
ces and Hermitian matrices. They are distinguished by the situation of
their eigenvalues.
Corollary 3.6. A unitary matrix is a normal matrix with eigenvalues
on the unit circle. A Hermitian matrix is a normal matrix with real
eigenvalues.
The proof of this corollary is left as an exercise. It immediately implies
that any Hermitian matrix can be written in the form
$$A = U\Lambda U^H,$$
where $U$ is unitary and $\Lambda$ is diagonal and real. We will call this the
SPECTRAL DECOMPOSITION of $A$. We will sometimes write it in the form
$$A = \sum_i \lambda_i u_i u_i^H, \qquad (3.1)$$
where $u_i$ is the $i$th column of $U$. This form allows us to extend scalar
valued functions to Hermitian matrices: namely,
$$\varphi(A) \stackrel{\text{def}}{=} \sum_i \varphi(\lambda_i) u_i u_i^H.$$
In particular if $A$ is positive semi-definite, its eigenvalues are nonnegative
and its SQUARE ROOT
$$A^{1/2} \stackrel{\text{def}}{=} \sum_i \lambda_i^{1/2} u_i u_i^H$$
is well defined.
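For a real symmetric $2\times 2$ matrix the spectral decomposition has a closed form, which makes $\varphi(A) = \sum_i \varphi(\lambda_i) u_i u_i^H$ easy to sketch. In the code below the eigenvectors come from the rotation angle $\theta = \tfrac12\operatorname{atan2}(2q, p - r)$, which zeros the off-diagonal element. The eigenvalues are then recovered as Rayleigh quotients $u_i^T A u_i$. The helper names are our own. With $\varphi(t) = \sqrt{t}$ the result is $A^{1/2}$.

```python
import math


def sym2_eig(A):
    """Spectral decomposition of a real symmetric 2x2 matrix [[p, q], [q, r]].

    Returns [(lambda_1, u_1), (lambda_2, u_2)] with orthonormal u_i.
    """
    (p, q), (_, r) = A
    theta = 0.5 * math.atan2(2 * q, p - r)
    c, s = math.cos(theta), math.sin(theta)
    pairs = []
    for u in ((c, s), (-s, c)):
        Au = (p * u[0] + q * u[1], q * u[0] + r * u[1])
        pairs.append((u[0] * Au[0] + u[1] * Au[1], u))     # Rayleigh quotient
    return pairs


def matrix_function(A, phi):
    """phi(A) = sum_i phi(lambda_i) u_i u_i^T for a real symmetric 2x2 A."""
    F = [[0.0, 0.0], [0.0, 0.0]]
    for lam, u in sym2_eig(A):
        for i in range(2):
            for j in range(2):
                F[i][j] += phi(lam) * u[i] * u[j]
    return F


A = [[2.0, 1.0], [1.0, 2.0]]                                # eigenvalues 3 and 1
S = matrix_function(A, lambda t: math.sqrt(max(t, 0.0)))   # A^{1/2}
```

The `max(t, 0.0)` guards against a tiny negative eigenvalue produced by rounding when $A$ is singular or nearly so.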
3.3. The Jordan Canonical Form
Although the Schur decomposition is useful in many applications, in
others it does not go far enough in reducing its matrix. The ques-
tion thus arises of how much we can simplify a matrix using similarity
transformations. Example 3.2 shows that in general we cannot hope
to reduce a matrix to diagonal form. However, we can reduce it to a
simple block diagonal form, as the following classical theorem shows.
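Theorem 3.7 (Jordan). For any $A$ there is a nonsingular $X$ such
that
$$X^{-1}AX = \operatorname{diag}(J_{k_1}(\lambda_1), J_{k_2}(\lambda_2), \ldots, J_{k_l}(\lambda_l)),$$
where $J_k(\lambda) \in \mathbb{C}^{k\times k}$ is a JORDAN BLOCK of the form
$$J_k(\lambda) \stackrel{\text{def}}{=} \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}. \qquad (3.2)$$
The right-hand side of the decomposition is unique up to the ordering
of the blocks.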
For a proof of this theorem (which is not easy) see any good linear
algebra text.
The eigenvalues of a matrix corresponding to Jordan blocks of order
greater than one are defective (n.b., the same eigenvalue can occur in
different blocks). Thus a nondefective matrix has only $1\times 1$ Jordan
blocks, or equivalently every nondefective matrix can be diagonalized
by a similarity transformation. For this reason a nondefective matrix
is also called a DIAGONALIZABLE matrix.
If we partition $X = (x_1, \ldots, x_n)$ by columns, then the first $k_1$ vectors
associated with the Jordan block $J_{k_1}(\lambda_1)$ satisfy the equations
$$Ax_1 = \lambda_1 x_1, \qquad Ax_j = \lambda_1 x_j + x_{j-1}, \quad j = 2, \ldots, k_1.$$
Such a sequence is called a chain of PRINCIPAL VECTORS of A. The
columns of X corresponding to the other Jordan blocks also form chains
of principal vectors.
In spite of its solid appearance, the Jordan form is a fragile thing.
The continuation of Example 3.2 shows that the slightest perturbation
can shatter it to bits. Moreover, the ones on the superdiagonal of a
Jordan block represent an arbitrary normalization. For example the
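similarity
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \delta^{-1} & 0 \\ 0 & 0 & \delta^{-2} \end{pmatrix} \begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \delta & 0 \\ 0 & 0 & \delta^2 \end{pmatrix} = \begin{pmatrix} \lambda & \delta & 0 \\ 0 & \lambda & \delta \\ 0 & 0 & \lambda \end{pmatrix} \qquad (3.3)$$
puts $\delta$'s on the superdiagonal of $J_3(\lambda)$. (Note that as $\delta$ approaches zero,
the transformation becomes increasingly ill conditioned.) For these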
reasons, there has been a tendency to shy away from the Jordan form,
at least in applications.
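The effect of the diagonal similarity with $D = \operatorname{diag}(1, \delta, \delta^2)$, i.e., of forming $D^{-1}J_3(\lambda)D$, can be checked directly. In the sketch below, which uses hypothetical helpers `matmul` and `jordan_block` on plain lists, the superdiagonal ones become $\delta$. Meanwhile $\|D\|_2\|D^{-1}\|_2 = \delta^{-2}$ grows without bound as $\delta \to 0$.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def jordan_block(lam, k):
    """J_k(lam): lam on the diagonal, ones on the superdiagonal."""
    return [[lam if i == j else (1.0 if j == i + 1 else 0.0) for j in range(k)]
            for i in range(k)]


lam, delta = 2.0, 1e-3
D = [[1.0, 0.0, 0.0], [0.0, delta, 0.0], [0.0, 0.0, delta ** 2]]
D_inv = [[1.0, 0.0, 0.0], [0.0, 1 / delta, 0.0], [0.0, 0.0, 1 / delta ** 2]]
J_scaled = matmul(D_inv, matmul(jordan_block(lam, 3), D))

# For a diagonal matrix, ||D||_2 ||D^{-1}||_2 = max|d_i| / min|d_i|.
cond_D = max(abs(D[i][i]) for i in range(3)) / min(abs(D[i][i]) for i in range(3))
```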
3.4. Invariant Subspaces
One of the ways of handling instabilities in the eigensystem of a matrix
is to decompose the matrix into smaller matrices acting on subspaces.
Although the matrices may be ill-behaved within their subspaces, the
subspaces themselves often turn out to be insensitive to perturbations.
For example, if in (3.2) we partition $X = (X_1\ \cdots\ X_l)$ conformally
with the Jordan blocks and similarly partition $X^{-H} = (Y_1\ \cdots\ Y_l)$, then
$$A = X_1 J_{k_1}(\lambda_1) Y_1^H + X_2 J_{k_2}(\lambda_2) Y_2^H + \cdots + X_l J_{k_l}(\lambda_l) Y_l^H. \qquad (3.4)$$
Since $Y^H = X^{-1}$, it follows that
$$AX_i = X_i J_{k_i}(\lambda_i), \quad i = 1, \ldots, l. \qquad (3.5)$$
In other words, $A$ can be regarded as an operator mapping $\mathcal{R}(X_i)$ into
itself. If $u = X_i v$ is a representation of $u$ in the basis formed by the
columns of $X_i$, then by (3.5)
$$AX_i v = X_i J_{k_i}(\lambda_i) v;$$
i.e., $v$ maps into $J_{k_i}(\lambda_i)v$. This means that $J_{k_i}(\lambda_i)$ is the representation
of $A$ on $\mathcal{R}(X_i)$ with respect to the basis formed by the columns of $X_i$.
These considerations suggest the following definition.
Definition 3.8. The subspace $\mathcal{X}$ is an INVARIANT SUBSPACE of $A$ if
$A\mathcal{X} \subset \mathcal{X}$.
Some of the facts about invariant subspaces are contained in the
following theorem.
Theorem 3.9. Let $\mathcal{X}$ be an invariant subspace of $A$, and let the columns
of $X$ form a basis for $\mathcal{X}$. Then there is a unique matrix $L$ such
that
$$AX = XL. \qquad (3.6)$$
The matrix $L$ is the representation of $A$ on $\mathcal{X}$ with respect to the basis
$X$. In particular $(v, \lambda)$ is an eigenpair of $L$ if and only if $(Xv, \lambda)$ is an
eigenpair of $A$.
Proof. Let $X = (x_1\ \cdots\ x_k)$. Since $Ax_i \in \mathcal{X}$, it can be expressed as
a unique linear combination of the columns of $X$; that is, $Ax_i = X\ell_i$
for some unique vector $\ell_i$. The matrix $L = (\ell_1\ \cdots\ \ell_k)$ is the required
matrix. The fact that $L$ is the representation of $A$ on $\mathcal{X}$ follows as
above for the Jordan canonical form. In view of (3.6), the statement
about eigenpairs, which amounts to saying that $Lv = \lambda v$ if and only if
$AXv = \lambda Xv$, is a triviality, since $X$ has full column rank. ∎
This is not the whole story, only enough to get us through Chapter IV.
We will return to invariant subspaces in Chapter V.
3.5. The Field of Values
The quadratic form $x^H Ax$ plays an important role in many applications.
This subsection is devoted to the values that such a form can attain for
a given matrix. We begin with a definition.
Definition 3.10. Let $A \in \mathbb{C}^{n\times n}$. The set
$$F(A) \stackrel{\text{def}}{=} \{x^H Ax : \|x\|_2 = 1\}$$
is called the FIELD OF VALUES of $A$.
The set $F(A)$ is bounded and closed in $\mathbb{C}$. From Definition 3.10, it
is easy to verify the following properties of $F(A)$:
1. $F(\alpha A + \beta I) = \alpha F(A) + \beta$, $\alpha, \beta \in \mathbb{C}$;
2. $\mathcal{L}(A) \subset F(A)$;
3. If $U$ is unitary, $F(U^H AU) = F(A)$;
4. $F(A + B) \subset F(A) + F(B)$.
An important and far-reaching result is that the field of values of a
matrix is convex; that is, it contains any line segment whose endpoints lie in it.
Nothing in the proof of the following theorem is used later, and it may
be skipped without loss.
Theorem 3.11 (Toeplitz-Hausdorff). The field of values $F(A)$ is
a convex set.
Proof. Let $\mu, \nu \in F(A)$ with $\mu \neq \nu$ (if $\mu = \nu$ there is nothing to prove).
Since $F(\alpha A + \beta I) = \alpha F(A) + \beta$ and, for $\alpha \neq 0$, $\alpha F(A) + \beta$ is convex if and
only if $F(A)$ is convex, we may assume without loss of generality that $\mu = 0$
and $\nu = 1$. Thus there are vectors $x_0$ and $x_1$ of 2-norm one such that
$$x_0^H Ax_0 = 0 \quad\text{and}\quad x_1^H Ax_1 = 1,$$
which implies that $x_0$ and $x_1$ are linearly independent. By multiplying
$x_0$ by a scalar of absolute value one we may further assume that
$x_0^H Ax_1 + x_1^H Ax_0$ is real.
We must show that any $\tau \in [0, 1]$ is in the field of values of $A$. By
the linear independence of $x_0$ and $x_1$, the vector $(1 - \lambda)x_0 + \lambda x_1$ is
nonzero for real $\lambda$. Consequently the function
$$\varphi(\lambda) = \frac{[(1 - \lambda)x_0 + \lambda x_1]^H A [(1 - \lambda)x_0 + \lambda x_1]}{\|(1 - \lambda)x_0 + \lambda x_1\|_2^2}$$
is continuous and real when $\lambda$ is real. Moreover, $\varphi(\lambda) \in F(A)$ for all real $\lambda$.
Since $\varphi(0) = 0$ and $\varphi(1) = 1$, by the intermediate value theorem there
is a $\lambda \in [0, 1]$ such that $\tau = \varphi(\lambda) \in F(A)$. ∎
The field of values of a matrix $A$ is closely related to the eigenvalues
of $A$. In particular if $Ax = \lambda x$ with $\|x\|_2 = 1$, then $\lambda = x^H Ax \in F(A)$.
Since the field of values is convex, $F(A)$ must contain the smallest
convex set containing all the eigenvalues of $A$, that is, the set
$$\mathcal{H}[\mathcal{L}(A)] = \Big\{ \sum_{\lambda_i \in \mathcal{L}(A)} \theta_i \lambda_i : \theta_i \geq 0,\ \sum \theta_i = 1 \Big\} \qquad (3.7)$$
($\mathcal{H}[\mathcal{L}(A)]$ is called the CONVEX HULL of $\mathcal{L}(A)$). Unfortunately, the field
of values can be bigger than the convex hull of $\mathcal{L}(A)$, as the following
example shows.
Example 3.12. Let
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
Then $\mathcal{H}[\mathcal{L}(A)] = \{0\}$. But $F(A) = \{z \in \mathbb{C} : |z| \leq \tfrac{1}{2}\}$.
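The disk in Example 3.12 can be explored by sampling. For $x = (\xi_1, \xi_2)^T$ with $\|x\|_2 = 1$ we have $x^H Ax = \bar\xi_1\xi_2$. Its modulus is at most $(|\xi_1|^2 + |\xi_2|^2)/2 = 1/2$, and the vectors $(1, e^{it})^T/\sqrt{2}$ attain every point $e^{it}/2$ of the boundary circle. The sketch below checks both facts numerically, using random sampling with a fixed seed and helper names of our own. It illustrates the example and proves nothing.

```python
import cmath
import math
import random


def quad_form(A, x):
    """x^H A x for a 2x2 matrix A (list of rows) and a 2-vector x."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return x[0].conjugate() * Ax[0] + x[1].conjugate() * Ax[1]


def random_unit_vector(rng):
    """A random complex 2-vector scaled to have 2-norm one."""
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    nrm = math.sqrt(abs(x[0]) ** 2 + abs(x[1]) ** 2)
    return [xi / nrm for xi in x]


A = [[0.0, 1.0], [0.0, 0.0]]          # the matrix of Example 3.12
rng = random.Random(0)
samples = [quad_form(A, random_unit_vector(rng)) for _ in range(10000)]
# Boundary points: x = (1, e^{it}) / sqrt(2) gives x^H A x = e^{it} / 2.
boundary = [quad_form(A, [1 / math.sqrt(2), cmath.exp(1j * t) / math.sqrt(2)])
            for t in (0.0, 1.0, 2.0, 3.0)]
```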
There is one important class of matrices for which the field of values
coincides with the convex hull of its eigenvalues.
Theorem 3.13. If $A$ is normal, then
$$F(A) = \mathcal{H}[\mathcal{L}(A)]. \qquad (3.8)$$
Proof. Since the normal matrix $A$ has a decomposition $A = U^H\Lambda U$,
where $U$ is unitary and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, we have
$$\begin{aligned} F(A) &= \{(Ux)^H\Lambda(Ux) : x \in \mathbb{C}^n,\ \|x\|_2 = 1\} \\ &= \{y^H\Lambda y : y \in \mathbb{C}^n,\ \|y\|_2 = 1\} \\ &= \Big\{\sum_{i=1}^n |y_i|^2\lambda_i : \sum_i |y_i|^2 = 1\Big\}. \end{aligned}$$
Since $\sum_i |y_i|^2 = 1$, this last equation is clearly equivalent to (3.7). ∎
3.6. Sums of Hermitian Matrices
The main result of this subsection will be found in more general form
in Section IV.4, where it is a consequence of a very powerful character-
ization of the eigenvalues of a Hermitian matrix. Because the result is
needed in Chapters II and III, we will give an elementary proof here.
Theorem 3.14. Let $A$ and $B$ be Hermitian and $\tilde A = A + B$. Let the
eigenvalues of $A$ be $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, and let the eigenvalues of $\tilde A$
be $\tilde\lambda_1 \geq \tilde\lambda_2 \geq \cdots \geq \tilde\lambda_n$. If $\mu_n$ is the smallest eigenvalue of $B$, then
$$\tilde\lambda_i \geq \lambda_i + \mu_n, \quad i = 1, \ldots, n.$$
Proof. We begin by simplifying the problem. First, since the order of
the eigenvalues is not affected by a shift of the origin, we may replace
$B$ by $B - \mu_n I$. Then $B$ is positive semi-definite, and the theorem says that the
eigenvalues of $\tilde A$ are pairwise not less than the eigenvalues of $A$.
Second, if $X^H AX = \Lambda$ is the spectral decomposition of $A$, then we
may replace $\tilde A$ with $X^H\tilde AX$, $A$ with $\Lambda$, and $B$ with $X^H BX$. Hence we
may assume that $A = \Lambda$ is diagonal.
Third, if $B = \sum_i \mu_i y_i y_i^H$ is the spectral decomposition of $B$, then it
is sufficient to prove the theorem in stages: first for $A + \mu_1 y_1 y_1^H$, then
for $(A + \mu_1 y_1 y_1^H) + \mu_2 y_2 y_2^H$, and so on. Since each $\mu_i \geq 0$, the factor $\mu_i$
can be absorbed into $y_i$. Thus it is sufficient to establish the theorem for
the case $B = yy^H$.
Fourth, if the $i$th component of $y$ is zero, then $\lambda_i$ is an eigenvalue
of $\tilde A$. Moreover, the remaining eigenvalues of $\tilde A$ are those of the matrix
obtained by deleting the $i$th row and column of $\tilde A$. Thus we may assume
that $|y| > 0$, i.e., that every component of $y$ is nonzero.
Fifth, write $y = (\eta_1\ \cdots\ \eta_n)^T$. The characteristic polynomial of $\tilde A$ is
$$\phi_{\tilde A}(\lambda) = \det[(\lambda I - A) - yy^H] = \det(\lambda I - A)\det[I - (\lambda I - A)^{-1}yy^H].$$
From the identity $\det(I - uv^H) = 1 - v^H u$, we have
$$\phi_{\tilde A}(\lambda) = \phi_A(\lambda) - \sum_i |\eta_i|^2 \phi_A^{(i)}(\lambda),$$
where $\phi_A^{(i)}(\lambda) = \phi_A(\lambda)/(\lambda - \lambda_i)$. Now if $\lambda_i$ has multiplicity $m$ then
$(\lambda - \lambda_i)^{m-1}$ factors out of $\phi_{\tilde A}(\lambda)$. This implies that $m - 1$ copies of $\lambda_i$
stay fixed and we need only be concerned with the change in one of
them. Hence we may assume that the $\lambda_i$'s are distinct.
The result now follows easily from the intermediate value theorem.
Since $\phi_{\tilde A}(\lambda_i) = -|\eta_i|^2\phi_A^{(i)}(\lambda_i)$, we have $\phi_{\tilde A}(\lambda_1) < 0$, $\phi_{\tilde A}(\lambda_2) > 0$,
$\phi_{\tilde A}(\lambda_3) < 0$, and so on. It follows that $\tilde A$ has $n - 1$ eigenvalues interlaced
with the $\lambda_i$. Since $\phi_{\tilde A}(\lambda_1) < 0$ and $\lim_{\lambda\to\infty}\phi_{\tilde A}(\lambda) = +\infty$, the remaining
eigenvalue of $\tilde A$ must be greater than $\lambda_1$. ∎
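Theorem 3.14 can be spot-checked on random real symmetric $2\times 2$ matrices. Their eigenvalues have the closed form $(p + r)/2 \pm \sqrt{((p - r)/2)^2 + q^2}$. The sketch below uses a fixed seed and helper names of our own. It records the smallest value of $\tilde\lambda_i - (\lambda_i + \mu_n)$ that it encounters; the theorem says this value is never negative. A check of this kind illustrates the theorem and does not replace the proof.

```python
import math
import random


def sym2_eigvals(p, q, r):
    """Eigenvalues, in descending order, of the real symmetric [[p, q], [q, r]]."""
    mean = (p + r) / 2
    rad = math.hypot((p - r) / 2, q)
    return mean + rad, mean - rad


rng = random.Random(1)
worst = math.inf          # smallest observed value of lam_tilde_i - (lam_i + mu_n)
for _ in range(10000):
    a = [rng.uniform(-5, 5) for _ in range(3)]     # A = [[a0, a1], [a1, a2]]
    b = [rng.uniform(-5, 5) for _ in range(3)]     # B = [[b0, b1], [b1, b2]]
    lam = sym2_eigvals(*a)
    mu_n = sym2_eigvals(*b)[1]                     # smallest eigenvalue of B
    lam_t = sym2_eigvals(*(x + y for x, y in zip(a, b)))   # eigenvalues of A + B
    for i in range(2):
        worst = min(worst, lam_t[i] - (lam[i] + mu_n))
print(f"smallest margin observed: {worst:.2e}")
```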
Theorem 3.14 is our first perturbation theorem, in that it restricts
the location of the eigenvalues of the perturbed matrix $\tilde A$. It is
noteworthy that there is no restriction on the size of the perturbation $B$.
Notes and References
The prefix "eigen" has triumphed because of its brevity and utility. Other
coinages are EIGENSYSTEM for the system of eigenvalues and associated vectors
and EIGENPROBLEM for the eigenvalue problem. The term "eigenproblem" is
actually more precise, since one is seldom concerned with eigenvalues alone;
however, the sense does not survive translation back into German, which is further
evidence of the thorough Anglicization of the prefix. Nonetheless, our free use of the
prefix "eigen" does not extend to revising established nomenclature: there will be no
eigenpolynomials in this book ("A foolish consistency is the hobgoblin of little minds
...").
The notation for the set of eigenvalues of a matrix or operator varies. Numerical
analysts and some matrix theorists write $\lambda(A)$; functional analysts, who call
eigenvalues and their generalizations the spectrum of an operator, write $\sigma(A)$.
Unfortunately, the former group uses $\sigma(A)$ to denote the set of singular values of $A$.
We have punted by writing $\mathcal{L}(A)$ for the set of eigenvalues of $A$.
The proof of the existence of the Schur decomposition is the same as Schur's [193,
1909]. The decomposition is essentially unique, up to the ordering of the eigenvalues;
that is, once an ordering has been fixed, the copies of each multiple eigenvalue being
placed together, the columns of U corresponding to simple eigenvalues are unique.
The columns corresponding to multiple eigenvalues are not unique, but the space
spanned by the columns is. This kind of "essential uniqueness" is typical of most
decompositions involving eigenvectors and the like.
Since a real matrix can have complex eigenvalues, the Schur decomposition of a real
matrix can be complex, something that is undesirable in numerical applications.
Reality can be restored by allowing the final form to be block triangular, with
2 x 2 blocks representing the complex eigenvalues. The details will be found in
Exercises 3.22, 3.23, and 3.24.
The Jordan canonical form [124, 1870] represents an extreme reduction of its matrix,
which is achieved at the expense of stability. For more on the computation of the
Jordan form and intermediaries see [273, 18, 128].
The term "field of values" is a reasonable translation of Wertvorrat, which seems
to be due to Toeplitz [241, 1918]. It is also called the numerical range. Toeplitz
proved that the boundary of the field of values is a convex curve. Hausdorff [105,
1919] showed that the set itself is convex. The proof given here, which is really a
souped-up version of Hausdorff's, was adapted from [122].
The result on sums of Hermitian matrices is an easy consequence of Fischer's
characterization of the eigenvalues of a Hermitian matrix (see Corollary IV.4.9). The
proof given here owes much to Wilkinson [269, 1965, pp. 94-98].
Exercises
1. (Cayley-Hamilton). Prove that $\phi_A(A) = 0$.
2. Prove that if $A$ is real and $\lambda \in \mathcal{L}(A)$ then $\bar\lambda \in \mathcal{L}(A)$.
3. Let $A$ be $m\times n$ and $B$ be $n\times m$. Show that the matrices
$$\begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}$$
are similar. Conclude that the nonzero eigenvalues of $AB$ are the same as
those of $BA$. [Note: This elegant proof is due to Kahan [130].]
4. Show that the geometric multiplicity of an eigenvalue is not greater than
its algebraic multiplicity.
5. Show that if $A$ is SKEW-HERMITIAN (i.e., $A^H = -A$), then all its
eigenvalues lie on the imaginary axis.
6. (Loewy [148]). Let $K$ be skew-Hermitian. Show that the matrix
$$U = (I + K)^{-1}(I - K) \qquad (3.9)$$
is unitary. Moreover, if $U$ is unitary and $-1 \notin \mathcal{L}(U)$, then $U$ can be represented
in the form (3.9) for some skew-Hermitian matrix $K$.
7. Suppose that $H_1$ and $H_2$ are Hermitian with $H_1$ positive definite. Show
that $H_1 + H_2$ is positive definite if and only if all the eigenvalues of $H_1^{-1}H_2$
are greater than $-1$.
8. Let $A$ be idempotent (i.e., $A^2 = A$). Show that $A$ is nondefective with
eigenvalues zero and one.
9. Let $A$ be nondefective. Show (without appealing to the Jordan canonical
form) that if the columns of $X$ form a set of linearly independent eigenvectors
of $A$, then $\Lambda = X^{-1}AX$ is diagonal. What are the diagonal elements of $\Lambda$?
10. Let $A$ be diagonalizable and let $X^{-1}AX = \Lambda$, where $\Lambda$ is diagonal.
Let $X = QR$ be the QR factorization of $X$. Show that $Q^H AQ$ is upper
triangular; i.e., it is a Schur decomposition of $A$.
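11. Let
$$\phi(\xi) = \alpha_0 + \alpha_1\xi + \cdots + \alpha_{n-1}\xi^{n-1} + \xi^n,$$
and let
$$C_\phi = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ -\alpha_0 & -\alpha_1 & -\alpha_2 & \cdots & -\alpha_{n-2} & -\alpha_{n-1} \end{pmatrix}.$$
Show that the characteristic polynomial of $C_\phi$ is $\phi$. The matrix $C_\phi$ is called
the COMPANION MATRIX OF $\phi$.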
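The claim in Exercise 11 can be checked numerically at a few points by comparing $\det(tI - C_\phi)$ with $\phi(t)$. The sketch below builds $C_\phi$ for $\phi(t) = t^3 - 6t^2 + 11t - 6 = (t-1)(t-2)(t-3)$, an example of our own choosing. It evaluates the determinant by Gaussian elimination with partial pivoting, which is our choice of method. The helper names are hypothetical, and the check is not a proof.

```python
def companion(alpha):
    """Companion matrix of phi(t) = alpha[0] + alpha[1] t + ... + alpha[n-1] t^(n-1) + t^n."""
    n = len(alpha)
    C = [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n - 1)]
    C.append([-a for a in alpha])
    return C


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(row) for row in M]
    n, result = len(M), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))     # pivot row
        if M[p][k] == 0:
            return 0.0
        if p != k:
            M[k], M[p] = M[p], M[k]
            result = -result
        result *= M[k][k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= f * M[k][j]
    return result


def char_poly(C, t):
    """phi_C(t) = det(t I - C)."""
    n = len(C)
    return det([[(t if i == j else 0.0) - C[i][j] for j in range(n)] for i in range(n)])


alpha = [-6.0, 11.0, -6.0]     # phi(t) = t^3 - 6t^2 + 11t - 6 = (t-1)(t-2)(t-3)
C = companion(alpha)


def phi(t):
    return alpha[0] + alpha[1] * t + alpha[2] * t ** 2 + t ** 3
```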
12. Let $\phi(\xi) = (\xi - \lambda)^n$. Show that the companion matrix $C_\phi$ is similar to
the Jordan block $J_n(\lambda)$.
13. Show that $J_k(0)^k = 0$. What do the matrices $J_k(0)^i$ ($i = 1, \ldots, k - 1$)
look like?
14. Show that
$$J_k(\lambda)^m = \lambda^m I + \lambda^{m-1}\binom{m}{1}J + \lambda^{m-2}\binom{m}{2}J^2 + \cdots + \lambda^{m-k+1}\binom{m}{k-1}J^{k-1},$$
where $J = J_k(0)$.
15. Prove that the field of values of a matrix $A$ is real if and only if $A$ is
Hermitian. What if the field of values is imaginary?
16. Prove that the field of values of the matrix in Example 3.12 is
$\{z \in \mathbb{C} : |z| \leq \tfrac{1}{2}\}$.
17. Let $A$ and $B$ be diagonalizable matrices. Show that the following statements
are equivalent.
1. $AB = BA$.
2. There is a nonsingular matrix $X$ such that $X^{-1}AX$ and $X^{-1}BX$ are
diagonal.
3. There are polynomials $p$ and $q$ and a diagonalizable matrix $C$ such that
$A = p(C)$ and $B = q(C)$.
18. Let $\lambda$ be a simple eigenvalue of $A$ and let $x$ and $y$ be its right and left
eigenvectors. Show that $y^H x \neq 0$.
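19. Let $\phi$ be a polynomial and $J_k(\lambda)$ be a Jordan block of order $k$. Show
that
$$\phi[J_k(\lambda)] = \begin{pmatrix} \phi(\lambda) & \phi'(\lambda) & \phi''(\lambda)/2 & \phi^{(3)}(\lambda)/3! & \cdots \\ 0 & \phi(\lambda) & \phi'(\lambda) & \phi''(\lambda)/2 & \cdots \\ 0 & 0 & \phi(\lambda) & \phi'(\lambda) & \cdots \\ 0 & 0 & 0 & \phi(\lambda) & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$
20. Let $r(\lambda) = p(\lambda)/q(\lambda)$ be a rational function, and let $A$ be such that
no zero of $q$ is in $\mathcal{L}(A)$. Show that $q(A)$ is nonsingular and hence
$r(A) = p(A)q(A)^{-1}$ is well defined. Prove that
$$\mathcal{L}[r(A)] = \{r(\lambda) : \lambda \in \mathcal{L}(A)\}.$$
What are the corresponding eigenvectors?
21. Let $B$ be nilpotent (i.e., let there be an integer $k > 0$ such that $B^k = 0$).
Show that if $AB = BA$ then $\det(A + B) = \det(A)$.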
THE FOLLOWING THREE EXERCISES DEVELOP AN ANALOGUE OF
THE SCHUR FORM FOR REAL MATRICES.
22. Let the columns of $X$ form an orthonormal basis for an invariant subspace
of $A$. Let $AX = XL$, and let $(X\ Y)$ be unitary. Show that
$$\begin{pmatrix} X^H \\ Y^H \end{pmatrix} A (X\ Y) = \begin{pmatrix} L & X^H AY \\ 0 & Y^H AY \end{pmatrix}.$$
23. Let $A$ be real, and let $\lambda$ be a complex eigenvalue of $A$ with eigenvector
$x + iy$. Show that the space spanned by $x$ and $y$ is an invariant subspace of
$A$.
24. (Real Schur decomposition). Show that if $A$ is real there is an orthogonal
matrix $U$ such that $U^T AU$ is block triangular with $1\times 1$ and $2\times 2$ blocks
on its diagonal. The $1\times 1$ blocks contain the real eigenvalues of $A$, and the
eigenvalues of the $2\times 2$ blocks are the complex eigenvalues of $A$.
25. (Bendixson-Hirsch-Toeplitz theorem). Let $A = B + iC$, where
$B = (A + A^H)/2$ and $C = (A - A^H)/2i$ are Hermitian. Let the eigenvalues of $B$
be $\beta_1 \geq \cdots \geq \beta_n$ and those of $C$ be $\gamma_1 \geq \cdots \geq \gamma_n$. Show that $\mathcal{L}(A)$ lies
in the rectangle $[\beta_n, \beta_1] \times [\gamma_n, \gamma_1]$. [Note: Bendixson [25, 1902] proved the
result for real matrices, giving a weaker bound than the above for the imaginary
part. Hirsch [115, 1902] pointed out that the result holds for complex matrices and
gave a sharper bound for the imaginary part. In the form stated here, the
theorem is due to Toeplitz [241, 1918].]
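26. The KRONECKER PRODUCT or TENSOR PRODUCT of two matrices $A$ and
$B$ is
$$A \otimes B = \begin{pmatrix} \alpha_{11}B & \alpha_{12}B & \alpha_{13}B & \cdots \\ \alpha_{21}B & \alpha_{22}B & \alpha_{23}B & \cdots \\ \alpha_{31}B & \alpha_{32}B & \alpha_{33}B & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}.$$
Show that if $A$ and $B$ are square, then
$$\mathcal{L}(A \otimes B) = \mathcal{L}(A)\mathcal{L}(B).$$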
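The sketch below computes $A \otimes B$ for matrices stored as lists of rows; the helper `kron` is our own. When $A$ and $B$ are upper triangular, $A \otimes B$ is again upper triangular. By the remark on block triangular matrices in Section 3.1, its eigenvalues are then its diagonal elements, which are the products of the diagonal elements of $A$ and $B$. This is the statement of Exercise 26 in that special case.

```python
def kron(A, B):
    """Kronecker product A (x) B of matrices stored as lists of rows:
    block (i, j) of the result is A[i][j] * B."""
    return [[a * b for a in rowA for b in rowB]
            for rowA in A for rowB in B]


A = [[1, 2], [0, 3]]          # upper triangular, eigenvalues 1 and 3
B = [[4, 5], [0, 6]]          # upper triangular, eigenvalues 4 and 6
K = kron(A, B)
diagonal = [K[i][i] for i in range(len(K))]
```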
4. The Singular Value Decomposition
4.1. The Singular Value Decomposition
The QR decomposition has the advantage that it does not mix up
the columns of a matrix, since it involves only premultiplication by a
unitary matrix. The price to be paid is that the resulting matrix is
merely triangular. If we are also willing to postmultiply by a
unitary matrix, we can reduce an arbitrary matrix to a diagonal form.
The result is the SINGULAR VALUE DECOMPOSITION.
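Theorem 4.1. Let $A \in \mathbb{C}^{m\times n}$ have rank $r$. Then there are unitary
matrices $U$ and $V$ such that
$$U^H AV = \begin{pmatrix} \Sigma_+ & 0 \\ 0 & 0 \end{pmatrix}, \qquad (4.1)$$
where $\Sigma_+ = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$ with $\sigma_1 \geq \cdots \geq \sigma_r > 0$.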
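Proof. Let the eigenvalues of $A^H A$ be $\sigma_1^2 \geq \cdots \geq \sigma_r^2 > 0 = \sigma_{r+1}^2 = \cdots = \sigma_n^2$.
If $V = (V_1\ V_2)$ ($V_1 \in \mathbb{C}^{n\times r}$) is a unitary matrix formed from
the corresponding eigenvectors of $A^H A$, then
$$V^H A^H AV = \begin{pmatrix} \Sigma_+^2 & 0 \\ 0 & 0 \end{pmatrix},$$
where $\Sigma_+$ is defined as above. Thus we have
$$V_1^H A^H AV_1 = \Sigma_+^2, \qquad V_2^H A^H AV_2 = 0, \qquad (4.2)$$
and from the second of these relations, which says that $(AV_2)^H(AV_2) = 0$, we
conclude that
$$AV_2 = 0. \qquad (4.3)$$
Now let
$$U_1 = AV_1\Sigma_+^{-1}. \qquad (4.4)$$
Then from (4.2) we have $U_1^H U_1 = I$. Choose $U_2$ so that $(U_1\ U_2)$ is
unitary. Then from (4.2)-(4.4) we get
$$U^H AV = \begin{pmatrix} U_1^H AV_1 & U_1^H AV_2 \\ U_2^H AV_1 & U_2^H AV_2 \end{pmatrix} = \begin{pmatrix} \Sigma_+ & 0 \\ 0 & 0 \end{pmatrix}.$$
∎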
The diagonal elements of $\Sigma_+$ are called SINGULAR VALUES of $A$. Conventions
differ on how to handle zero singular values. The choices are
to say that A has min{ m, n} - rank(A) zero singular values or that A
has max{ m, n} - rank(A) zero singular values. Whenever the choice
makes a difference, it will be clear from context what is meant, and we
will therefore use either convention at our convenience.
We will denote the set of singular values of $A$ by $\mathcal{S}(A)$. It is easy to
see that $\mathcal{S}(A)$ consists of the nonnegative square roots of the eigenvalues
of $A^H A$ or of $AA^H$, depending on how we count zero singular values.
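As a small illustration of the relation between $\mathcal{S}(A)$ and the eigenvalues of $A^H A$, the sketch below computes the singular values of a real $m\times 2$ matrix. It works from the $2\times 2$ matrix $A^T A$ in closed form; the helper name is our own. Forming $A^T A$ is convenient here, but it is generally avoided in numerical work because small singular values lose accuracy in the process.

```python
import math


def singular_values_two_columns(A):
    """Singular values (descending) of a real m x 2 matrix A, given as a list
    of rows, computed as square roots of the eigenvalues of A^T A."""
    p = sum(row[0] * row[0] for row in A)        # A^T A = [[p, q], [q, r]]
    q = sum(row[0] * row[1] for row in A)
    r = sum(row[1] * row[1] for row in A)
    mean, rad = (p + r) / 2, math.hypot((p - r) / 2, q)
    # max(..., 0.0) guards against a tiny negative value caused by rounding.
    return [math.sqrt(max(mean + rad, 0.0)), math.sqrt(max(mean - rad, 0.0))]


A = [[3.0, 0.0], [4.0, 5.0]]     # A^T A = [[25, 20], [20, 25]], eigenvalues 45 and 5
sigma = singular_values_two_columns(A)
```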
The columns of $U$ and $V$ are called LEFT and RIGHT SINGULAR VECTORS.
The columns of $U$ are eigenvectors of $AA^H$, while those of $V$ are
eigenvectors of $A^H A$. The singular vectors are not unique, but they are
by no means arbitrary. The columns of the matrix $U_1$ must form an
orthonormal basis for the column space of $A$, while the columns of $V_1$
must form an orthonormal basis for the column space of $A^H$; and these
bases are related by (4.4). Moreover, if $\tilde U_1 = U_1 W_U$ and $\tilde V_1 = V_1 W_V$
also consist of singular vectors of $A$, then $W_U$ and $W_V$ are unitary and
$W_U\Sigma_+ W_V^H = \Sigma_+$. The columns of the matrices $U_2$ and $V_2$ may be arbitrary
orthonormal bases for the orthogonal complements of $\mathcal{R}(A)$ and
$\mathcal{R}(A^H)$. Thus the singular value decomposition, like the QR decomposition,
can be used to compute the projection onto the column space
of $A$. However, it can also be used to compute the projection onto the
row space of $A$.
The singular value decomposition (4.1) can be written in the form
$$A = U_1\Sigma_+V_1^H, \qquad (4.5)$$
which is called the SINGULAR VALUE FACTORIZATION of $A$. If we write
$F = U_1$ and $G = V_1\Sigma_+$, then (4.5) is an example of a FULL RANK
FACTORIZATION of $A$ into the product $FG^H$ of matrices whose rank is
the same as that of $A$.
The relation between the singular value decomposition of $A$ and the
spectral decomposition of $A^H A$ allows one to obtain results on the singular
value decomposition from results for Hermitian matrices. There
is another way.
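Theorem 4.2 (Jordan-Wielandt). Let $A \in \mathbb{C}^{m\times n}$ ($m \geq n$) have
the singular value decomposition
$$U^H AV = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix},$$
where $\Sigma$ is of order $n$. Then the matrix
$$C = \begin{pmatrix} 0 & A \\ A^H & 0 \end{pmatrix}$$
has eigenvalues $\pm\sigma_1, \ldots, \pm\sigma_n$, corresponding to the eigenvectors
$$\begin{pmatrix} u_i \\ \pm v_i \end{pmatrix}, \quad i = 1, \ldots, n,$$
where $u_i$ is the $i$th column of $U$ and $v_i$ is the $i$th column of $V$. In
addition $C$ has $m - n$ zero eigenvalues whose eigenvectors are $\begin{pmatrix} u_i \\ 0 \end{pmatrix}$
($i = n + 1, \ldots, m$).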
The proof is purely computational and is left as an exercise.
An important consequence of Theorem 3.13 is the following char-
acterization of the largest and smallest singular values of a matrix. In
stating it we take the opportunity to introduce some useful notation.
Theorem 4.3. For any matrix $A \in \mathbb{C}^{m\times n}$
$$\|A\|_2 \stackrel{\text{def}}{=} \max_{\|x\|_2 = 1}\|Ax\|_2 = \max\mathcal{S}(A).$$
If $m \geq n$, then
$$\inf\nolimits_2(A) \stackrel{\text{def}}{=} \min_{\|x\|_2 = 1}\|Ax\|_2 = \min\mathcal{S}(A).$$
Proof. We have
$$\|A\|_2^2 = \max_{\|x\|_2 = 1}\|Ax\|_2^2 = \max_{\|x\|_2 = 1} x^H(A^H A)x = \max F(A^H A).$$
Since $A^H A$ is Hermitian, the largest member of its field of values is its
largest eigenvalue, which is the square of the largest singular value of
$A$. A similar argument shows that $\inf_2(A)$ is the smallest singular value
of $A$. ∎
As the notation suggests, the function II . 112 is actually a matrix
norm. It will be treated in detail in the next chapter.
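The following is a quick numerical illustration of Theorem 4.3, sketched in Python with NumPy (an assumption of this illustration). Sampled values of $\|Ax\|_2$ on the unit sphere lie between the extreme singular values. The bounds are attained at the corresponding right singular vectors. NumPy's `numpy.linalg.norm(A, 2)` and `numpy.linalg.norm(A, -2)` return the same two numbers.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))              # m >= n, so inf_2(A) is the smallest singular value
U, s, Vh = np.linalg.svd(A)
norm2, inf2 = s[0], s[-1]                    # max S(A) and min S(A)

# ||A x||_2 over many random unit vectors x stays inside [inf_2(A), ||A||_2] ...
X = rng.standard_normal((4, 2000))
X /= np.linalg.norm(X, axis=0)
vals = np.linalg.norm(A @ X, axis=0)
print(inf2 - 1e-12 <= vals.min() and vals.max() <= norm2 + 1e-12)

# ... and both bounds are attained at right singular vectors (rows of Vh).
print(np.isclose(np.linalg.norm(A @ Vh[0]), norm2),
      np.isclose(np.linalg.norm(A @ Vh[-1]), inf2))
```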
4.2. Two Inequalities
In this subsection we will prove two useful inequalities. They are con-
sequences of Theorem 3.14.
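Theorem 4.4. Let $A \in \mathbb{C}^{m\times n}$ be partitioned in the form
$$A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}.$$
Let the singular values of $A$ be $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and those of $A_1$ be
$\tau_1 \ge \tau_2 \ge \cdots \ge \tau_n$. Then
$$\sigma_i \ge \tau_i, \qquad i = 1, \ldots, n.$$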
I. PRELIMINARIES
34
Proof. The squares of the singular values of $A$ are the eigenvalues of
$A^HA$, and the squares of the singular values of $A_1$ are the eigenvalues of
$A_1^HA_1$. But $A^HA = A_1^HA_1 + A_2^HA_2$. Since $A_2^HA_2$ is positive semi-definite,
the eigenvalues of $A^HA$ taken in descending order are greater than or
equal to the corresponding eigenvalues of $A_1^HA_1$. ■
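Theorem 4.4 is easy to check numerically. The sketch below uses Python with NumPy (an assumption of this illustration); `singular_values_padded` is a hypothetical helper. When $A_1$ has fewer than $n$ rows, the singular values of $A_1$ are padded with zeros to length $n$. This matches the proof's use of the $n$ eigenvalues of $A_1^HA_1$.

```python
import numpy as np

def singular_values_padded(M, n):
    """Singular values of M in descending order, padded with zeros to length n."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.concatenate([s, np.zeros(n - s.size)])

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 4))
sigma = singular_values_padded(A, 4)
for rows in (5, 2):                          # A_1 = the leading `rows` rows of A
    tau = singular_values_padded(A[:rows, :], 4)
    print(rows, np.all(sigma >= tau - 1e-12))
```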
The second inequality concerns the product of two matrices.
Theorem 4.5. Let $B \in \mathbb{C}^{m\times n}$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$
and let $C = AB$ have singular values $\tau_1 \ge \tau_2 \ge \cdots \ge \tau_n$. Then
$$\tau_i \le \sigma_i\|A\|_2, \qquad i = 1, \ldots, n.$$
Proof. We will establish the inequality by comparing the eigenvalues
of $C^HC$ with those of $\|A\|_2^2B^HB$. Let $A^HA = Q\Lambda^2Q^H$ be the spectral
decomposition of $A^HA$. Let $D = Q(\|A\|_2^2I - \Lambda^2)Q^H$. Then $D$ is positive
semi-definite and $A^HA + D = \|A\|_2^2I$. Now
$$\|A\|_2^2B^HB = B^H(A^HA + D)B = C^HC + B^HDB.$$
Since $B^HDB$ is positive semi-definite, the eigenvalues of $\|A\|_2^2B^HB$ are
not less than the corresponding eigenvalues of $C^HC$. ■
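A corresponding sketch for Theorem 4.5, again in Python with NumPy (assumed), computes $\|A\|_2$ as the largest singular value of $A$. The tiny relative slack in the comparison only absorbs rounding error.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 6))
B = rng.standard_normal((6, 4))
C = A @ B
sigma = np.linalg.svd(B, compute_uv=False)   # sigma_1 >= ... >= sigma_4
tau = np.linalg.svd(C, compute_uv=False)     # tau_1 >= ... >= tau_4
normA = np.linalg.norm(A, 2)                 # ||A||_2, the largest singular value of A
print(np.all(tau <= sigma * normA * (1 + 1e-12)))
```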
Both of the above theorems have trivial variants and corollaries,
which we will use freely in the sequel.
Notes and References
The singular value decomposition was discovered independently by Beltrami
[23, 1873] and Jordan [125, 1874]. Both cast their derivations in terms of
simplifying a real bilinear form $\phi(x, y) = y^HAx$ by orthogonal transformations of the variables $x$ and $y$. Beltrami's derivation is close to the one in the
text. Jordan's is something else again. He begins by asking for the largest
value of $\phi$ when $\|x\|_2 = \|y\|_2 = 1$ and shows that the vector $(x^H\ \ y^H)^H$ is an
eigenvector of the matrix
$$\begin{pmatrix} 0 & A^H \\ A & 0 \end{pmatrix}$$
whose eigenvalue is our $\sigma_1$. He then transforms $A$ to the form
$$\begin{pmatrix} \sigma_1 & 0 \\ 0 & \hat A \end{pmatrix}$$
and proceeds by induction a la Schur. Thus, Jordan can claim precedence
for both variational characterizations in matrix analysis and the recursive
definition of matrix decompositions.
The decomposition has been frequently rediscovered, first by Sylvester [235,
236, 1889, 1890]. Autonne [8, 1913] generalized it to complex matrices, and
Eckart and Young [63, 1936] to rectangular matrices, where they used it to
approximate a matrix by another of lower rank (see Theorem III.4.18).
The use of the word "singular" in this connection apparently comes from the
literature of integral equations. The story begins with Schmidt [192, 1907],
who expanded the kernel of an integral operator in the form
$$K(x, y) = \sum_{i=1}^{\infty}\frac{1}{\lambda_i}u_i(x)v_i(y),$$
where the $u_i$ and $v_i$ are eigenfunctions of the iterated kernels
$$\int K(x, t)K(y, t)\,dt \quad\text{and}\quad \int K(t, x)K(t, y)\,dt.$$
This is equivalent to the matrix representation
$$K = U\Sigma V^T = \sum_{i=1}^{n}\sigma_iu_iv_i^T.$$
A little later Bateman [12, 1908] refers to numbers that are essentially the
reciprocals of the eigenvalues of $K$ as singular values, but does not relate
them to the numbers introduced by Schmidt (he will continue this usage
through 1922). Picard [181, 1910] notes that for symmetric kernels Schmidt's
$\lambda_i$ are real, and in this case (but not the general case) he calls them singular values. By 1937, Smithies [198] was calling Schmidt's numbers singular
values. Exactly when and how the usage changed remains to be determined.
Theorem 4.2 is implicit in Jordan's derivation of the singular value decom-
position; however, Wielandt seems to be responsible for its widespread use
today (e.g., see [71, p.113]).
The singular value decomposition is closely related to the analysis, due to
Hotelling [119, 1933], of a multivariate random variable into principal components. Specifically, if the rows of $A$ form a centered sample of a normally
distributed random vector $a$, then $V$ estimates an orthogonal transformation
of $a$ into a vector whose components are uncorrelated.
Exercises
Throughout these exercises $A$ will be an $m \times n$ $(m \ge n)$ matrix
with the singular value decomposition (4.1).
1. Show that if $A$ is square, $|\det(A)| = \prod_{i=1}^n\sigma_i$.
2. (Autonne [7, 1902]). Prove that any square matrix $A$ has a POLAR DECOMPOSITION $A = HQ$, where $Q$ is unitary and $H$ is positive semi-definite.
Moreover, if $A$ is nonsingular, then $H$ is positive definite and the polar decomposition of $A$ is unique.
3. What is the singular value decomposition of a Hermitian matrix? Of a
unitary matrix?
4. Let $A$ be nonsingular. Show that $\inf_2(A) = \|A^{-1}\|_2^{-1}$.
5. Show that
$$\|A\|_2 = \max_{\|y\|_2=1}\max_{\|x\|_2=1}|y^HAx|.$$
6. Let $B \in \mathbb{C}^{m\times n}$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and let $C = AB$
have singular values $\tau_1 \ge \tau_2 \ge \cdots \ge \tau_n$. Show that
$$\tau_i \ge \sigma_i\inf\nolimits_2(A), \qquad i = 1, \ldots, n.$$
[Hint: Assume without loss of generality that $A$ is square and nonsingular
and apply Theorem 4.5 to $A^{-1}C$.]
7. Let the eigenvalues of a square matrix $A$ be ordered so that $|\lambda_1| \ge \cdots \ge |\lambda_n|$. Show that $|\lambda_1| \le \sigma_1$.
8. The matrix $W_n$ is illustrated below for $n = 5$:
$$W_5 = \begin{pmatrix} 1 & -1 & -1 & -1 & -1 \\ 0 & 1 & -1 & -1 & -1 \\ 0 & 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
Show that $\inf_2(W_n) = O(2^{-n})$. What are the eigenvalues of $W_n$? What is
its determinant?
5. PAIRS OF SUBSPACES
37
5. Pairs of Subspaces
The problem to be treated in this section is that of comparing two
subspaces of $\mathbb{C}^n$. Our tool will be the CS DECOMPOSITION (cosine-sine
decomposition) of a partitioned unitary matrix. This decomposition
allows us to define canonical angles for pairs of subspaces in such a
way that as the largest canonical angle approaches zero the subspaces
approach one another. This in turn leads to some useful results on the
singular values of products and differences of projections.
5.1. The CS Decomposition
We begin by establishing the existence of the CS decomposition.
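Theorem 5.1. Let the unitary matrix $W \in \mathbb{C}^{n\times n}$ be partitioned in
the form
$$W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix},$$
in which the block rows and columns have orders $l$ and $n - l$, and
where $2l \le n$. Then there are unitary matrices $U = \operatorname{diag}(U_{11}, U_{22})$ and
$V = \operatorname{diag}(V_{11}, V_{22})$ with $U_{11}, V_{11} \in \mathbb{C}^{l\times l}$ such that
$$U^HWV = \begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I_{n-2l} \end{pmatrix}, \qquad (5.1)$$
in which the block rows and columns have orders $l$, $l$, and $n - 2l$, where
$$\Gamma = \operatorname{diag}(\gamma_1, \ldots, \gamma_l) \ge 0, \qquad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_l) \ge 0,$$
and
$$\Gamma^2 + \Sigma^2 = I.$$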
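Proof. The proof, which is long and tedious, is included mainly for
completeness. Let
$$U_{11}^HW_{11}V_{11} = \Gamma$$
be the singular value decomposition of $W_{11}$, and suppose that
$$\Gamma = \operatorname{diag}(\Gamma_1, I_{l-k}),$$
where the diagonal elements of $\Gamma_1$ satisfy
$$0 \le \gamma_1 \le \cdots \le \gamma_k < 1.$$
(Since $W$ is unitary, the singular values of $W_{11}$ cannot be greater
than one.) Clearly the matrix
$$\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix}V_{11}$$
has orthonormal columns. Hence
$$I = \left[\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix}V_{11}\right]^H\left[\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix}V_{11}\right] = \Gamma^2 + (W_{21}V_{11})^H(W_{21}V_{11}),$$
that is,
$$(W_{21}V_{11})^H(W_{21}V_{11}) = \operatorname{diag}(I - \Gamma_1^2, 0_{l-k}).$$
This means that the columns of $W_{21}V_{11}$ are orthogonal, with its last $l - k$
columns being zero. Thus there is a unitary matrix $U_{22} \in \mathbb{C}^{(n-l)\times(n-l)}$
such that
$$U_{22}^HW_{21}V_{11} = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix}, \qquad (5.2)$$
in which the block rows have orders $l$ and $n - 2l$, where
$$\Sigma = \operatorname{diag}(\Sigma_1, 0_{l-k}),$$
and $\Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_k)$ has positive diagonal elements.
Since
$$\operatorname{diag}(U_{11}, U_{22})^H\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix}V_{11} = \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix}$$
has orthonormal columns, we must have
$$\gamma_i^2 + \sigma_i^2 = 1, \qquad i = 1, \ldots, l. \qquad (5.3)$$
In a similar manner we may determine a unitary matrix $V_{22} \in \mathbb{C}^{(n-l)\times(n-l)}$ such that
$$U_{11}^HW_{12}V_{22} = (T\ \ 0),$$
where $T = \operatorname{diag}(\tau_1, \ldots, \tau_l)$ with $\tau_i \le 0$ $(i = 1, \ldots, l)$. Since
$$U_{11}^H(W_{11}\ \ W_{12})\operatorname{diag}(V_{11}, V_{22}) = (\Gamma\ \ T\ \ 0)$$
has orthonormal rows, we must have $\gamma_i^2 + \tau_i^2 = 1$ $(i = 1, \ldots, l)$, and it
follows from (5.2) and (5.3) that $T = -\Sigma$.
Set $\hat U = \operatorname{diag}(U_{11}, U_{22})$ and $\hat V = \operatorname{diag}(V_{11}, V_{22})$. Then the foregoing
shows that the matrix $X = \hat U^HW\hat V$ can be partitioned in the form
$$X = \begin{pmatrix} \Gamma_1 & 0 & -\Sigma_1 & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \Sigma_1 & 0 & X_{33} & X_{34} & X_{35} \\ 0 & 0 & X_{43} & X_{44} & X_{45} \\ 0 & 0 & X_{53} & X_{54} & X_{55} \end{pmatrix},$$
in which the block rows and columns have orders $k$, $l - k$, $k$, $l - k$, and $n - 2l$.
Since $X$ is unitary and $\Sigma_1$ has positive diagonal elements, we have
$X_{33} = \Gamma_1$. Moreover, $X_{34}$, $X_{35}$, $X_{43}$, and $X_{53}$ are zero. Therefore
$$X = \begin{pmatrix} \Gamma_1 & 0 & -\Sigma_1 & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \Sigma_1 & 0 & \Gamma_1 & 0 & 0 \\ 0 & 0 & 0 & X_{44} & X_{45} \\ 0 & 0 & 0 & X_{54} & X_{55} \end{pmatrix},$$
where
$$\begin{pmatrix} X_{44} & X_{45} \\ X_{54} & X_{55} \end{pmatrix}$$
is unitary.
Set
$$U_{33} = \begin{pmatrix} X_{44} & X_{45} \\ X_{54} & X_{55} \end{pmatrix} \in \mathbb{C}^{(n-l-k)\times(n-l-k)};$$
then we have
$$\operatorname{diag}(I_{l+k}, U_{33}^H)X = \begin{pmatrix} \Gamma_1 & 0 & -\Sigma_1 & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \Sigma_1 & 0 & \Gamma_1 & 0 & 0 \\ 0 & 0 & 0 & I & 0 \\ 0 & 0 & 0 & 0 & I \end{pmatrix} = \begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{pmatrix}.$$
Observe that
$$\operatorname{diag}(I_{l+k}, U_{33}^H)X = \operatorname{diag}(I_{l+k}, U_{33}^H)\hat U^HW\hat V.$$
Hence, if we set
$$\begin{aligned} U &= \hat U\operatorname{diag}(I_{l+k}, U_{33}) = \operatorname{diag}(U_{11}, U_{22})\operatorname{diag}(I_l, \operatorname{diag}(I_k, U_{33})) = \operatorname{diag}(U_{11}, U_{22}\operatorname{diag}(I_k, U_{33})), \\ V &= \hat V = \operatorname{diag}(V_{11}, V_{22}), \end{aligned}$$
then $U^HWV$ has the form (5.1), where $U$ and $V$ are block diagonal
unitary matrices. ■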
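Computing the full CS decomposition takes some care (see the Notes and References). SciPy, for example, provides `scipy.linalg.cossin` for this purpose. The sketch below, in Python with NumPy (assumed), only checks the relations the proof establishes; `random_unitary` is a hypothetical helper. The relations checked are these:

- the cosines $\gamma_i$ are the singular values of $W_{11}$;
- the sines $\sigma_i$ are the singular values of $W_{21}$ and, since $T = -\Sigma$, also of $W_{12}$;
- $\gamma_i^2 + \sigma_i^2 = 1$ when the $\gamma_i$ are taken in ascending order and the $\sigma_i$ in descending order.

```python
import numpy as np

def random_unitary(n, rng):
    """Random unitary matrix from the QR factorization of a complex Gaussian matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))               # rescale columns by unit-modulus phases

rng = np.random.default_rng(5)
n, l = 7, 3                                  # 2l <= n, as in Theorem 5.1
W = random_unitary(n, rng)
W11, W21 = W[:l, :l], W[l:, :l]
gamma = np.sort(np.linalg.svd(W11, compute_uv=False))          # cosines, ascending
sigma = np.sort(np.linalg.svd(W21, compute_uv=False))[::-1]    # sines, descending
print(np.allclose(gamma**2 + sigma**2, 1.0))
```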
5.2. Pairs of Subspaces
Armed with the CS decomposition, we may now attack the problem
of determining how two subspaces are situated with respect to one
another. The following theorem shows that there are natural bases for
two subspaces that exhibit their relation.
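Theorem 5.2. Let $X_1, Y_1 \in \mathbb{C}^{n\times l}$ with $X_1^HX_1 = I$ and $Y_1^HY_1 = I$. If
$2l \le n$, there are unitary matrices $Q$, $U_{11}$, and $V_{11}$ such that
$$QX_1U_{11} = \begin{pmatrix} I \\ 0 \\ 0 \end{pmatrix}, \qquad (5.4)$$
$$QY_1V_{11} = \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix}, \qquad (5.5)$$
in which the block rows have orders $l$, $l$, and $n - 2l$, where
$$\Gamma = \operatorname{diag}(\gamma_1, \ldots, \gamma_l) \quad\text{and}\quad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_l)$$
satisfy
$$0 \le \gamma_1 \le \cdots \le \gamma_l, \qquad \sigma_1 \ge \cdots \ge \sigma_l \ge 0, \qquad \gamma_i^2 + \sigma_i^2 = 1, \quad i = 1, \ldots, l. \qquad (5.6)$$
On the other hand, if $2l > n$, then $Q$, $U_{11}$, and $V_{11}$ may be chosen so
that
$$QX_1U_{11} = \begin{pmatrix} I & 0 \\ 0 & I \\ 0 & 0 \end{pmatrix}, \qquad (5.7)$$
$$QY_1V_{11} = \begin{pmatrix} \Gamma & 0 \\ 0 & I \\ \Sigma & 0 \end{pmatrix}, \qquad (5.8)$$
in which the block rows have orders $n - l$, $2l - n$, and $n - l$ and the block columns have orders $n - l$ and $2l - n$, where
$$\Gamma = \operatorname{diag}(\gamma_1, \ldots, \gamma_{n-l}) \quad\text{and}\quad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_{n-l})$$
satisfy
$$0 \le \gamma_1 \le \cdots \le \gamma_{n-l}, \qquad \sigma_1 \ge \cdots \ge \sigma_{n-l} \ge 0, \qquad \gamma_i^2 + \sigma_i^2 = 1, \quad i = 1, \ldots, n - l.$$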
Proof. We will prove the theorem for $2l \le n$, leaving the other case
as an exercise. Let $X_2$ and $Y_2$ be chosen so that $X = (X_1\ \ X_2)$ and
$Y = (Y_1\ \ Y_2)$ are unitary. Let
$$W = X^HY = \begin{pmatrix} X_1^HY_1 & X_1^HY_2 \\ X_2^HY_1 & X_2^HY_2 \end{pmatrix},$$
in which the block rows and columns have orders $l$ and $n - l$. By Theorem 5.1, there are unitary matrices $U_{11}$, $U_{22}$, $V_{11}$, and $V_{22}$ such
that the relation (5.1) holds. Therefore, if we set
$$\tilde X_i = X_iU_{ii}, \qquad \tilde Y_i = Y_iV_{ii}, \qquad i = 1, 2,$$
and
$$\tilde X = (\tilde X_1\ \ \tilde X_2), \qquad \tilde Y = (\tilde Y_1\ \ \tilde Y_2),$$
then
$$\tilde X^H\tilde Y = \begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{pmatrix}.$$
Moreover, by permuting the columns of $U_{11}$ and $V_{11}$ we can ensure that
(5.6) holds. Equations (5.4) and (5.5) now follow on setting $Q = \tilde X^H$. ■
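Theorem 5.2 has the following geometric significance. Let $\mathcal{X}_1$ and
$\mathcal{Y}_1$ be $l$-dimensional subspaces of $\mathbb{C}^n$. If $2l \le n$ we can transform $\mathbb{C}^n$
by a unitary transformation $Q$ so that the columns of the matrices
$$\begin{pmatrix} I \\ 0 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix}$$
form orthonormal bases of $Q\mathcal{X}_1$ and $Q\mathcal{Y}_1$. The space spanned by the
columns for which $\gamma_i = 0$ is the subspace of $Q\mathcal{Y}_1$ that is orthogonal to
$Q\mathcal{X}_1$.
When $2l > n$, the columns of the matrices
$$\begin{pmatrix} I & 0 \\ 0 & I \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \Gamma & 0 \\ 0 & I \\ \Sigma & 0 \end{pmatrix}$$
form orthonormal bases for $Q\mathcal{X}_1$ and $Q\mathcal{Y}_1$. As above, the space spanned
by the columns for which $\gamma_i = 0$ is the subspace of $Q\mathcal{Y}_1$ that is orthogonal to $Q\mathcal{X}_1$. The last $2l - n$ columns represent the smallest possible
intersection of $Q\mathcal{X}_1$ and $Q\mathcal{Y}_1$.
Since the numbers $\sigma_i$ and $\gamma_i$ satisfy $\sigma_i^2 + \gamma_i^2 = 1$, they can be regarded
as sines and cosines of angles between the bases. Moreover, $\mathcal{X} = \mathcal{Y}$ if
and only if $\Sigma = 0$. This means that the size of $\Sigma$ is a measure of how
$\mathcal{X}$ and $\mathcal{Y}$ differ. Since $\Sigma$ depends only on the subspaces $\mathcal{X}$ and $\mathcal{Y}$, we
may make the following definition.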
Definition 5.3. Let $\mathcal{X}$ and $\mathcal{Y}$ be subspaces of the same dimension.
The CANONICAL ANGLES between $\mathcal{X}$ and $\mathcal{Y}$ are the diagonals of the
matrix
$$\Theta(\mathcal{X}, \mathcal{Y}) \stackrel{\mathrm{def}}{=} \sin^{-1}\Sigma,$$
where $\Sigma$ is the matrix of Theorem 5.2.
The following corollary of Theorem 5.2 shows how to compute the
canonical angles. Its proof is left as an exercise.
Corollary 5.4. Let $\mathcal{X}$ and $\mathcal{Y}$ be subspaces of $\mathbb{C}^n$ of the same dimension. Let the columns of $X_\perp$ form an orthonormal basis for $\mathcal{X}^\perp$ and
the columns of $Y$ form an orthonormal basis for $\mathcal{Y}$. Then the nonzero
singular values of $X_\perp^HY$ are the sines of the nonzero canonical angles
between $\mathcal{X}$ and $\mathcal{Y}$.
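Corollary 5.4 translates directly into a procedure. The sketch below uses Python with NumPy (assumed), and its function names are hypothetical. It computes the canonical angles from the sines, as in the corollary. For comparison it also computes them from the cosines, that is, from the singular values of $Y^HX$ as in Exercise 2 below. When $2l > n$, the matrix $X_\perp^HY$ has only $n - l$ singular values, and the remaining angles are zero, as Theorem 5.2 shows.

```python
import numpy as np

def canonical_angles(Xb, Yb):
    """Canonical angles (descending) between span(Xb) and span(Yb) via Corollary 5.4.

    Xb, Yb: n-by-l matrices of full column rank (hypothetical helper).
    """
    _, l = Xb.shape
    X, _ = np.linalg.qr(Xb)                          # orthonormal basis for span(Xb)
    Y, _ = np.linalg.qr(Yb)                          # orthonormal basis for span(Yb)
    Qc, _ = np.linalg.qr(X, mode="complete")
    X_perp = Qc[:, l:]                               # orthonormal basis for the complement
    s = np.linalg.svd(X_perp.conj().T @ Y, compute_uv=False)
    s = np.concatenate([s, np.zeros(l - s.size)])    # when 2l > n the rest are zero
    return np.arcsin(np.clip(s, 0.0, 1.0))

def canonical_angles_cos(Xb, Yb):
    """The same angles from the cosines, the singular values of Y^H X (Exercise 2)."""
    X, _ = np.linalg.qr(Xb)
    Y, _ = np.linalg.qr(Yb)
    c = np.linalg.svd(Y.conj().T @ X, compute_uv=False)   # descending cosines
    return np.arccos(np.clip(c, 0.0, 1.0))[::-1]          # descending angles

rng = np.random.default_rng(6)
Xb = rng.standard_normal((8, 3))
Yb = rng.standard_normal((8, 3))
print(canonical_angles(Xb, Yb))
print(np.allclose(canonical_angles(Xb, Yb), canonical_angles_cos(Xb, Yb)))
```

The sine-based formula is preferable for small angles. Taking $\cos^{-1}$ of a cosine near one loses about half the available digits, whereas taking $\sin^{-1}$ of a small sine does not.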
5.3. Pairs of Projections
A difficulty with the results of the last subsection is that they are cast
in terms of explicit bases for the subspaces involved. In many instances
it is more convenient to represent subspaces by their orthogonal projections. As one would expect, pairs of projections are closely related
to the canonical angles between their subspaces.
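Theorem 5.5. Let $\mathcal{X}$ and $\mathcal{Y}$ be $l$-dimensional subspaces of $\mathbb{C}^n$. Let
$$k = \begin{cases} l, & \text{if } 2l \le n, \\ n - l, & \text{if } 2l > n. \end{cases}$$
Let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$ be the sines of the canonical angles between $\mathcal{X}$
and $\mathcal{Y}$. Then:
1. The singular values of $P_{\mathcal{X}}(I - P_{\mathcal{Y}})$ are
$$\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0.$$
2. The singular values of $P_{\mathcal{X}} - P_{\mathcal{Y}}$ are
$$\sigma_1, \sigma_1, \sigma_2, \sigma_2, \ldots, \sigma_k, \sigma_k, 0, \ldots, 0.$$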
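Proof. We will prove the theorem for the case $2l \le n$, leaving the
other case as an exercise. If in Theorem 5.2 we let $X_1$ and $Y_1$ be bases
for $\mathcal{X}$ and $\mathcal{Y}$, then $P_{\mathcal{X}} = X_1X_1^H$ and $P_{\mathcal{Y}} = Y_1Y_1^H$. It follows from (5.4)
and (5.5) that
$$\begin{aligned} QP_{\mathcal{X}}(I - P_{\mathcal{Y}})Q^H &= \begin{pmatrix} I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} I - \Gamma^2 & -\Gamma\Sigma & 0 \\ -\Sigma\Gamma & I - \Sigma^2 & 0 \\ 0 & 0 & I \end{pmatrix} \\ &= \begin{pmatrix} \Sigma^2 & -\Gamma\Sigma & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} \Sigma \\ 0 \\ 0 \end{pmatrix}\begin{pmatrix} \Sigma & -\Gamma & 0 \end{pmatrix}. \end{aligned}$$
Since the rows of $(\Sigma\ \ -\Gamma\ \ 0)$ are orthonormal, the singular values
of $QP_{\mathcal{X}}(I - P_{\mathcal{Y}})Q^H$ are $\sigma_1, \sigma_2, \ldots, \sigma_l, 0, \ldots, 0$. This proves the first
assertion.
In a similar manner we find that
$$Q(P_{\mathcal{X}} - P_{\mathcal{Y}})Q^H = \begin{pmatrix} \Sigma^2 & -\Gamma\Sigma & 0 \\ -\Sigma\Gamma & -\Sigma^2 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
Since $\Sigma$ and $\Gamma$ are diagonal, the nonzero singular values of $P_{\mathcal{X}} - P_{\mathcal{Y}}$ are
just the singular values of the $2 \times 2$ matrices
$$S_i = \begin{pmatrix} \sigma_i^2 & -\gamma_i\sigma_i \\ -\gamma_i\sigma_i & -\sigma_i^2 \end{pmatrix}, \qquad i = 1, \ldots, l.$$
But $S_i$ is symmetric with $S_i^2 = \sigma_i^2(\sigma_i^2 + \gamma_i^2)I = \sigma_i^2I$, so the singular values of $S_i$ are just $\sigma_i$ twice over. ■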
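Theorem 5.5 can likewise be checked numerically. The sketch below uses Python with NumPy (assumed) and takes $2l \le n$. It obtains the sines from Corollary 5.4 and compares them with the singular values of $P_{\mathcal X}(I - P_{\mathcal Y})$ and of $P_{\mathcal X} - P_{\mathcal Y}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, l = 8, 3
X, _ = np.linalg.qr(rng.standard_normal((n, l)))   # orthonormal basis of the first subspace
Y, _ = np.linalg.qr(rng.standard_normal((n, l)))   # orthonormal basis of the second subspace
PX, PY = X @ X.T, Y @ Y.T                          # orthogonal projections (real case)

# Sines of the canonical angles via Corollary 5.4 (descending, length l since 2l <= n).
Qfull, _ = np.linalg.qr(X, mode="complete")
sines = np.linalg.svd(Qfull[:, l:].T @ Y, compute_uv=False)

sv1 = np.linalg.svd(PX @ (np.eye(n) - PY), compute_uv=False)
sv2 = np.linalg.svd(PX - PY, compute_uv=False)
expected1 = np.concatenate([sines, np.zeros(n - l)])
expected2 = np.concatenate([np.repeat(sines, 2), np.zeros(n - 2 * l)])
print(np.allclose(sv1, expected1), np.allclose(sv2, expected2))
```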
Notes and References
The notion of canonical angles between subspaces goes back to Jordan [126,
1875], and has been frequently rediscovered; e.g., in the very readable papers
by Afriat [2, 3]. Davis and Kahan [53, 1970] give a unified treatment of the
subject in Hilbert space and a bibliographical note with further references.
Wedin [261] gives a lucid survey with an emphasis on geometry.
The CS decomposition was introduced by Stewart [205, 1977], although it
is implicit in the paper of Davis and Kahan just cited. Paige and Saunders
[174, 1981] consider the case where the diagonal blocks are not square.
It is not a trivial matter to compute canonical angles or the CS decomposi-
tion. Algorithms are given in [37, 209].
Exercises
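IN THE FOLLOWING EXERCISES $\mathcal{X}$ AND $\mathcal{Y}$ ARE SUBSPACES OF $\mathbb{C}^n$
OF DIMENSION $l$, WHERE $2l \le n$. THE CANONICAL ANGLES OF
$\mathcal{X}$ AND $\mathcal{Y}$ ARE $\theta_1 \ge \cdots \ge \theta_l$. BY THE ANGLE BETWEEN TWO
NONZERO VECTORS $x$ AND $y$ WE MEAN
$$\angle(x, y) \stackrel{\mathrm{def}}{=} \cos^{-1}\frac{|y^Hx|}{\|x\|_2\|y\|_2}.$$
1. Let $\|x\|_2 = \|y\|_2 = 1$ and $y^Hx \ge 0$. Show that
$$\|y - x\|_2 = 2\sin\frac{\angle(x, y)}{2}.$$
2. Let the columns of $X$ and $Y$ form orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$. Show
that the singular values of $Y^HX$ are the cosines of the canonical angles
between $\mathcal{X}$ and $\mathcal{Y}$.
3. Show that
$$\cos\theta_1 = \min_{\substack{x\in\mathcal{X}\\ x\ne0}}\max_{\substack{y\in\mathcal{Y}\\ y\ne0}}\cos\angle(x, y)$$
and
$$\theta_1 = \max_{\substack{x\in\mathcal{X}\\ x\ne0}}\min_{\substack{y\in\mathcal{Y}\\ y\ne0}}\angle(x, y).$$
4. Let $v^Hu = 1$. Show that there are matrices $U$ and $V$ such that $U^HU =
\operatorname{diag}(\|u\|_2^2\|v\|_2^2, I)$, $V^HV = I$, and
$$(u\ \ U)^{-1} = \begin{pmatrix} v^H \\ V^H \end{pmatrix}.$$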
5. (Halmos [101]). A matrix $A$ of order $n$ is a contraction in the 2-norm if
$\|A\|_2 \le 1$. Show that if $A$ is a contraction, there exist matrices $B$, $C$, and
$D$ of order $n$ such that
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}$$
is unitary.
THE FOLLOWING TWO EXERCISES DEFINE A UNITARY TRANSFORMATION (THE DIRECT ROTATION) THAT MAPS $\mathcal{X}$ ONTO
$\mathcal{Y}$, AND THEY SHOW THAT IT IS IN SOME SENSE OPTIMAL. FOR
FURTHER DETAILS SEE THE PAPER BY DAVIS AND KAHAN [53].
6. In Theorem 5.2 let the columns of $X$ and $Y$ form canonical bases for $\mathcal{X}$
and $\mathcal{Y}$, so that $U_{11}$ and $V_{11}$ are the identity. Let
$$U = Q^H\begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{pmatrix}Q.$$
Show that $U$ is a unitary matrix such that $U\mathcal{X} = \mathcal{Y}$. Moreover, $\|I - U\|_2 =
2\sin\frac{\theta_1}{2}$.
7. Let $V$ be an orthogonal matrix such that $V\mathcal{X} = \mathcal{Y}$. Show that $\|I - V\|_2 \ge
2\sin\frac{\theta_1}{2}$.
THE FOLLOWING EXERCISE DERIVES A SPECIAL CASE OF THE
GENERALIZED SINGULAR VALUE DECOMPOSITION INTRODUCED BY
VAN LOAN [247, 248]. THE FORM GIVEN HERE IS DUE TO
PAIGE AND SAUNDERS [174].
8. Let
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix},$$
where $X_1$ and $X_2$ are square and $X$ has full rank. Show that there are
unitary matrices $U_1$ and $U_2$ and a nonsingular matrix $S$ such that
$$\operatorname{diag}(U_1, U_2)^HX = \begin{pmatrix} \Gamma \\ \Sigma \end{pmatrix}S,$$
where $\Gamma$ and $\Sigma$ are diagonal with $\Gamma^2 + \Sigma^2 = I$. [Hint:
Consider the CS decomposition of the orthogonal part of the QR factorization of $X$.]
Chapter II
Norms and Metrics
The goal of matrix perturbation analysis is to predict or bound the
changes in objects associated with a matrix when the elements of the
matrix change. For example we might ask how far the eigenvalues and
eigenvectors of a matrix A will change when A is replaced by a nearby
matrix A + E. A prerequisite for answering such questions is to make
precise terms like "how far" and "nearby."
For eigenvalues, which are complex numbers, the absolute value
function $|\cdot| : \mathbb{C} \to \mathbb{R}$ provides a natural notion of size and distance.
The usefulness of this function in analysis depends on three properties:
1. $\zeta \ne 0 \implies |\zeta| > 0$ (definiteness),
2. $|\alpha\zeta| = |\alpha||\zeta|$ (homogeneity),
3. $|\zeta + \eta| \le |\zeta| + |\eta|$ (the triangle inequality).
These three properties make the function $\rho(\zeta, \eta) = |\zeta - \eta|$ a metric over
$\mathbb{C}$, which endows it with a topology, so that we may speak of limits
and continuity.
A vector norm is a generalization of the absolute value; i.e., a definite, homogeneous function on $\mathbb{C}^n$ that satisfies the triangle inequality.
The first section of this chapter is devoted to the study of the elementary properties of vector norms. It is also possible to define a matrix
norm to be a definite, homogeneous function on $\mathbb{C}^{m\times n}$ that satisfies
the triangle inequality. However, such a definition ignores the fact that
matrices can be multiplied, and it is usually augmented to take this operation into account. Section 2 is devoted to the study of matrix norms,
and Section 3 is devoted to the study of a particular class of matrix
norms (the UNITARILY INVARIANT NORMS) which interact nicely with
the Euclidean geometry of $\mathbb{C}^n$. In some applications, the object that is
perturbed is not a vector or a matrix, but a subspace. Accordingly, the
chapter concludes with a discussion of measures of distance or metrics
for subspaces.
This chapter, like the first, is preliminary. Unlike the first, it con-
tains material requiring lengthy, deep proofs. The following comments
are for the reader who wants to get through the chapter quickly and
move on to the perturbation theory itself.
The first two sections are an elementary introduction to vector and
matrix norms and may be skimmed by anyone familiar with the subject.
The third section contains the most challenging material in the chapter.
Theorem 3.6, which characterizes unitarily invariant norms, is the key
result of the first subsection. However, its supporting lemmas and proof
are not required elsewhere and may be skipped. Of the material in
the last subsection only Birkhoff's theorem (Theorem 3.16) and Fan's
theorem (Theorem 3.17) will be used in the sequel. The material in
Section 4 is used only in Chapters V and VI. The reader may find the
summary (4.11) useful in sorting out the various metrics introduced in
this section.
1. Vector Norms
1.1. Definition
As we mentioned in the introduction, a norm on $\mathbb{C}^n$ is a generalization
of the absolute value of a complex number.
Definition 1.1. A function $\nu : \mathbb{C}^n \to \mathbb{R}$ is said to be a NORM on $\mathbb{C}^n$
(or a VECTOR NORM) if $\nu$ satisfies the following conditions:
1. $x \ne 0 \implies \nu(x) > 0$,
2. $\nu(\alpha x) = |\alpha|\,\nu(x)$,
3. $\nu(x + y) \le \nu(x) + \nu(y)$.
Three important properties follow immediately from Definition 1.1.
For any norm $\nu$,
1. $\nu(0) = 0$,
2. $\nu(-x) = \nu(x)$,
3. $|\nu(x) - \nu(y)| \le \nu(x - y)$.
1.2. Examples
There are an infinite number of norms on $\mathbb{C}^n$. However, three of these,
the 1-, 2-, and $\infty$-norms, are most commonly used in practice. The
1-norm is defined by
$$\|x\|_1 \stackrel{\mathrm{def}}{=} \sum_{i=1}^n|\xi_i|;$$
the 2-norm by
$$\|x\|_2 \stackrel{\mathrm{def}}{=} \sqrt{\sum_{i=1}^n|\xi_i|^2};$$
and the $\infty$-norm by
$$\|x\|_\infty \stackrel{\mathrm{def}}{=} \max_i|\xi_i|.$$
The norms $\|\cdot\|_1$ and $\|\cdot\|_2$ are special cases of the HÖLDER NORMS (or
$p$-NORMS) defined by
$$\|x\|_p = \Bigl(\sum_{i=1}^n|\xi_i|^p\Bigr)^{1/p}, \qquad 1 \le p < \infty. \qquad (1.1)$$
(For a proof that $\|\cdot\|_p$ is indeed a norm, see Exercises 1.6-1.8.) Since
$$\|x\|_\infty = \lim_{p\to\infty}\|x\|_p, \qquad (1.2)$$
the norm $\|\cdot\|_\infty$ is also regarded as a Hölder norm.
The 2-norm has the useful characterization
$$\|x\|_2^2 = x^Hx,$$
from which it immediately follows that the 2-norm is UNITARILY INVARIANT; i.e.,
$$U \text{ unitary} \implies \|Ux\|_2 = \|x\|_2.$$
The Hölder norms have the property that $\|\,|x|\,\|_p = \|x\|_p$. They
also have the property that $\|x\|_p \le \|y\|_p$ whenever $|x| \le |y|$. These
properties will play an important role in the theory of unitarily invariant
norms as well as in the structured perturbation theory of linear systems
and least squares problems, and it is worth establishing that they are
equivalent. We begin by introducing some terminology that we will use
later.
Definition 1.2. A vector norm $\nu$ on $\mathbb{C}^n$ is ABSOLUTE if $\nu(|x|) = \nu(x)$
for all $x \in \mathbb{C}^n$.
Theorem 1.3. A vector norm $\nu$ is absolute if and only if
$$|x| \le |y| \implies \nu(x) \le \nu(y). \qquad (1.3)$$
Proof. Suppose that $\nu$ is absolute. Note that this implies that if
$|D| = I$ then $\nu(Dx) = \nu(x)$.
Now let $|x| \le |y|$. It suffices to prove (1.3) for the case where the
first component of $|x|$ is less than or equal to the first component of
$|y|$ and the other components of $x$ and $y$ are equal in absolute value.
By multiplying $x$ and $y$ by suitable diagonal matrices, we may assume
without loss of generality that the first components of $x$ and $y$ are
nonnegative and the others are equal.
Let $y = (\eta_1, \eta_2, \ldots, \eta_n)^T$ and $x = (\rho\eta_1, \eta_2, \ldots, \eta_n)^T$, where $0 \le \rho < 1$. If we set $\tilde y =
(-\eta_1, \eta_2, \ldots, \eta_n)^T$, then
$$x = \frac{1+\rho}{2}y + \frac{1-\rho}{2}\tilde y.$$
Since $1 - \rho \ge 0$,
$$\begin{aligned} \nu(x) &= \nu\Bigl(\frac{1+\rho}{2}y + \frac{1-\rho}{2}\tilde y\Bigr) \\ &\le \frac{1+\rho}{2}\nu(y) + \frac{1-\rho}{2}\nu(\tilde y) \\ &= \frac{1+\rho}{2}\nu(|y|) + \frac{1-\rho}{2}\nu(|\tilde y|) \\ &= \nu(y). \end{aligned}$$
The converse is left as an exercise. ■
The following theorem exhibits a technique for constructing new
norms from old. Its proof is left as an exercise.
Theorem 1.4. Let $\mu$ be a norm on $\mathbb{C}^m$, and let $A \in \mathbb{C}^{m\times n}$ have rank
$n$. Then the function $\nu$ defined by
$$\nu(x) = \mu(Ax)$$
is a norm on $\mathbb{C}^n$.
Corollary 1.5. Let $A$ be positive definite. Then the function $\nu$ defined
by
$$\nu(x) = \sqrt{x^HAx}$$
is a norm.
Proof. We have
$$\nu(x) = \|A^{1/2}x\|_2.$$
Hence by Theorem 1.4, $\nu$ is a norm. ■
Norms generated by a positive definite matrix $A$ are called ELLIPTIC
NORMS and are often written $\|\cdot\|_A$. It is worth noting that these norms
bear the same relation to the "inner product" $y^HAx$ that the 2-norm
bears to the usual inner product $y^Hx$ (see Exercises 1.15-1.17).
1.3. Equivalence and Limits
In $\mathbb{R}^2$ and $\mathbb{R}^3$, the norm $\|y - x\|_2$ is the ordinary Euclidean distance
between $x$ and $y$. For this reason the 2-norm on $\mathbb{C}^n$ is also called the
Euclidean norm. Moreover, the function
$$\rho_2(x, y) = \|y - x\|_2$$
is a METRIC on $\mathbb{C}^n$; that is, it satisfies the conditions
1. $\rho_2(x, y) \ge 0$,
2. $\rho_2(x, y) = 0 \iff x = y$,
3. $\rho_2(x, y) = \rho_2(y, x)$,
4. $\rho_2(x, z) \le \rho_2(x, y) + \rho_2(y, z)$.
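The same metric axioms hold for $\rho_\nu(x, y) = \nu(y - x)$ when $\nu$ is any norm. The sketch below, in Python with NumPy (assumed), checks symmetry and the triangle inequality numerically. It does so for the 1-, 2-, 3-, and $\infty$-norms and for an elliptic norm. It also illustrates the limit (1.2). The names `elliptic_norm` and `metric` are hypothetical helpers. The elliptic norm is evaluated as $\|Rx\|_2$, where $A = R^HR$ is the Cholesky factorization, in place of $\|A^{1/2}x\|_2$. This substitution is legitimate by Theorem 1.4, since $R$ has rank $n$.

```python
import numpy as np

def elliptic_norm(x, A):
    """||x||_A = sqrt(x^H A x) for Hermitian positive definite A (Corollary 1.5)."""
    R = np.linalg.cholesky(A).conj().T       # numpy returns lower L with A = L L^H; R = L^H
    return np.linalg.norm(R @ x)             # ||R x||_2 = sqrt(x^H A x)

def metric(nu, x, y):
    """rho_nu(x, y) = nu(y - x), the metric induced by the norm nu."""
    return nu(y - x)

rng = np.random.default_rng(9)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # symmetric positive definite
norms = {
    "1": lambda v: np.linalg.norm(v, 1),
    "2": lambda v: np.linalg.norm(v, 2),
    "3": lambda v: np.linalg.norm(v, 3),     # a Hoelder norm, (1.1) with p = 3
    "inf": lambda v: np.linalg.norm(v, np.inf),
    "A": lambda v: elliptic_norm(v, A),
}
x, y, z = rng.standard_normal((3, n))
for name, nu in norms.items():
    symmetric = np.isclose(metric(nu, x, y), metric(nu, y, x))
    triangle = metric(nu, x, z) <= metric(nu, x, y) + metric(nu, y, z) + 1e-12
    print(name, symmetric, triangle)

# (1.2): the p-norms decrease toward the infinity-norm as p grows.
print([np.linalg.norm(x, p) for p in (1, 2, 8, 64)], np.linalg.norm(x, np.inf))
```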
The metric $\rho_2$ defines a topology on $\mathbb{C}^n$; that is, it provides $\mathbb{C}^n$ with a
collection of open sets from which the notions of closed sets, compactness, limits, and continuity may be defined.
If $\nu$ is a norm on $\mathbb{C}^n$, the function
$$\rho_\nu(x, y) = \nu(y - x)$$
is also a metric and hence defines a topology for $\mathbb{C}^n$. It turns out that
this topology is the same as the topology generated by the Euclidean
norm. This is a consequence of the equivalence of norms on finite
dimensional spaces. We will prove this result in two steps, the first of
which is of independent interest.
Lemma 1.6. Let $\nu$ be a norm on $\mathbb{C}^n$. Then $\nu$ is continuous in the
Euclidean metric.
Proof. We need only prove that for any $\epsilon > 0$ there is a $\delta > 0$ such
that $|\nu(y) - \nu(x)| < \epsilon$ whenever $\|y - x\|_2 < \delta$. Write $x = (\xi_1, \ldots, \xi_n)^T$ and $y = (\eta_1, \ldots, \eta_n)^T$, and let $e_i$ denote the $i$th unit vector. From Definition 1.1,
$$\begin{aligned} |\nu(y) - \nu(x)| &\le \nu(y - x) \\ &= \nu\Bigl[\sum_{i=1}^n(\eta_i - \xi_i)e_i\Bigr] \\ &\le \sum_{i=1}^n|\eta_i - \xi_i|\,\nu(e_i) \\ &\le \gamma\|y - x\|_2, \end{aligned} \qquad (1.4)$$
where $\gamma = \sqrt{\sum_{i=1}^n\nu^2(e_i)} > 0$ is independent of $x$ and $y$ (the last step is the Cauchy inequality). If we take
$\delta = \epsilon/\gamma$, then $|\nu(y) - \nu(x)| < \epsilon$ provided that $\|y - x\|_2 < \delta$. ■
This lemma allows us to prove the equivalence of norms.
Theorem 1.7. Let $\nu$ and $\mu$ be norms on $\mathbb{C}^n$. Then there are positive
numbers $\pi_1$ and $\pi_2$ depending only on $\nu$ and $\mu$ such that
$$\pi_1\nu(x) \le \mu(x) \le \pi_2\nu(x), \qquad \forall x \in \mathbb{C}^n.$$
Proof. Without loss of generality, we may take $x \ne 0$. The first step
is to prove the theorem for the case $\nu(\cdot) = \|\cdot\|_2$. In this case, it follows
from (1.4), applied to $\mu$ with $y = 0$, that we may take $\pi_2 = \sqrt{\sum_{i=1}^n\mu^2(e_i)}$. Thus we
need only determine $\pi_1$.
The unit sphere $S_2 = \{x : \|x\|_2 = 1\}$ is closed and bounded. Since $\mu$
is continuous (Lemma 1.6), $\mu$ achieves a minimal value $\pi_1$ on $S_2$, and $\pi_1 > 0$ because $\mu$ is positive on nonzero vectors. Now
let $y = x/\|x\|_2$. Then $y \in S_2$, and
$$\mu(x) = \mu(\|x\|_2y) = \|x\|_2\mu(y) \ge \pi_1\|x\|_2. \qquad (1.5)$$
Now let $\mu$ and $\nu$ be arbitrary norms. From the foregoing, there are
positive numbers $\alpha_1$, $\alpha_2$, $\tau_1$, and $\tau_2$ such that
$$\alpha_1\|x\|_2 \le \mu(x) \le \alpha_2\|x\|_2 \qquad (1.6)$$
and
$$\tau_1\|x\|_2 \le \nu(x) \le \tau_2\|x\|_2. \qquad (1.7)$$
From (1.6) and (1.7),
$$0 < \pi_1 \equiv \frac{\alpha_1}{\tau_2} \le \frac{\mu(x)}{\nu(x)} \le \frac{\alpha_2}{\tau_1} \equiv \pi_2.$$
■
The following example shows the relations between the 1-, 2-, and
$\infty$-norms.
Example 1.8. For all $x \in \mathbb{C}^n$,
1. $\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2$,
2. $\frac{1}{\sqrt{n}}\|x\|_2 \le \|x\|_\infty \le \|x\|_2$,
3. $\|x\|_\infty \le \|x\|_1 \le n\|x\|_\infty$.
In all cases, equality can be attained.
Theorem 1.7 shows that all norms generate the same topology on
$\mathbb{C}^n$, in the sense that for any sequence $x_1, x_2, \ldots$ we have $\lim_{k\to\infty}\mu(x - x_k) = 0$ if and only if $\lim_{k\to\infty}\nu(x - x_k) = 0$. Thus we can use any norm
to define the notion of a limit of a sequence of vectors.
However, it is possible to define limits without using norms. A
very natural definition is the following. Let $x_k = (\xi_1^{(k)}, \ldots, \xi_n^{(k)})^T$ $(k =
1, 2, \ldots)$. If
$$\lim_{k\to\infty}\xi_i^{(k)} = \xi_i, \qquad i = 1, \ldots, n,$$
then we say that the sequence of vectors $\{x_k\}$ has the limit $x =
(\xi_1, \ldots, \xi_n)^T$, or that $x_k$ converges to $x$, and we write
$$\lim_{k\to\infty}x_k = x.$$
The following theorem shows that this component-wise convergence is
the same as convergence in any norm.
Theorem 1.9. For any vector norm $\nu$,
$$\lim_{k\to\infty}x_k = x \iff \lim_{k\to\infty}\nu(x_k - x) = 0. \qquad (1.8)$$
Proof. The result is trivial when $\nu(\cdot) = \|\cdot\|_\infty$. Hence by the equivalence of norms, the result holds for any norm. ■
1.4. Linear Functionals and Dual Norms
A LINEAR FUNCTIONAL on $\mathbb{C}^n$ is a continuous function $\varphi : \mathbb{C}^n \to \mathbb{C}$ that
is linear. The matrix representing such a function has dimensions $1 \times n$;
i.e., it is a row vector. Thus to each linear functional $\varphi$ on $\mathbb{C}^n$ there
corresponds a unique vector $y$ such that
$$\varphi(x) = y^Hx \qquad (1.9)$$
for all $x \in \mathbb{C}^n$.
There is a rather natural way in which a linear functional can be
given a norm. In ordinary conversation we would say that a linear functional was big if it mapped vectors of ordinary size into large ones; and
conversely we would say it was small if it mapped ordinary vectors into
small ones. If we make the notions of "big," "small," and "ordinary"
precise by choosing a specific vector norm $\nu$, then we can define the
"size" of the functional $\varphi$ by
$$\nu^*(\varphi) = \max_{\nu(x)=1}|\varphi(x)|. \qquad (1.10)$$
Note that $\nu^*$ is well defined, since $|\varphi(x)|$ is continuous, and by the
equivalence of norms the $\nu$-sphere $S_\nu = \{x : \nu(x) = 1\}$ is closed and
bounded. It is easy to verify that $\nu^*$ is indeed a norm.
According to (1.9) every linear functional $\varphi$ on $\mathbb{C}^n$ can be identified
with a vector $y \in \mathbb{C}^n$. Hence (1.10) defines a new norm on $\mathbb{C}^n$. This
justifies the following definition.
Definition 1.10. Let $\nu$ be a norm on $\mathbb{C}^n$. Then the function $\nu^*$ defined
by
$$\nu^*(y) = \max_{\nu(x)=1}|y^Hx|$$
is the DUAL NORM of $\nu$ on $\mathbb{C}^n$.
The dual of the 2-norm is easily seen to be itself. The dual of the
1-norm is the $\infty$-norm, and vice versa. More generally if $p, q > 1$ satisfy
$$\frac{1}{p} + \frac{1}{q} = 1,$$
then the Hölder norms $\|\cdot\|_p$ and $\|\cdot\|_q$ are dual.
These examples suggest that we cannot generate new norms by
taking the dual of a dual norm: we simply get back the original norm.
We are going to prove that this is indeed true, but to do so we must first
establish an important result on the extension of linear functionals.
A linear functional $\varphi : \mathcal{X} \to \mathbb{C}$ defined on a proper subspace $\mathcal{X}$ of
$\mathbb{C}^n$ has a representation in the form (1.9). However, the vector $y$ is not
unique; for example, it can be replaced by $y + z$, where $z$ is any vector
in $\mathcal{X}^\perp$. Since (1.9) defines $\varphi$ on all of $\mathbb{C}^n$, another way of expressing
the nonuniqueness of $y$ is to say that there are many ways of extending
the functional $\varphi$ from $\mathcal{X}$ to $\mathbb{C}^n$. The following theorem shows that
among these extensions is one that does not increase the norm of the
functional.
Theorem 1.11 (Hahn, Banach). Let $\nu$ be a norm on $\mathbb{C}^n$. Let $\mathcal{X}$ be
a subspace of $\mathbb{C}^n$, and let $\varphi : \mathcal{X} \to \mathbb{C}$ be a linear functional satisfying
$$\max_{\substack{x\in\mathcal{X}\\ \nu(x)=1}}|\varphi(x)| = \mu.$$
Then $\varphi$ can be extended to a linear functional on $\mathbb{C}^n$ that satisfies
$$\max_{\substack{x\in\mathbb{C}^n\\ \nu(x)=1}}|\varphi(x)| = \mu.$$
Proof. Without loss of generality, we may assume that $\mu = 1$. If
$\mathcal{X} = \mathbb{C}^n$, then we are through. Otherwise, there is a vector $u \ne 0$ that
does not lie in $\mathcal{X}$. We shall show how to extend $\varphi$ to the space $\mathcal{X}'$
spanned by $\mathcal{X}$ and $u$ in such a way that
$$\max_{\substack{x\in\mathcal{X}'\\ \nu(x)=1}}|\varphi(x)| = 1. \qquad (1.11)$$
Let $x_1$ and $x_2$ be two vectors in $\mathcal{X}$. Then
$$\varphi(x_1) - \varphi(x_2) \le \nu(x_1 - x_2) \le \nu(x_1 + u) + \nu(x_2 + u).$$
Hence
$$\varphi(x_1) - \nu(x_1 + u) \le \varphi(x_2) + \nu(x_2 + u).$$
This inequality implies that any element in the set $\{\varphi(x) - \nu(x + u) :
x \in \mathcal{X}\}$ is less than or equal to any element in the set $\{\varphi(x) + \nu(x + u) :
x \in \mathcal{X}\}$. If we set $\varphi(-u)$ equal to any value lying between these two
sets (or equal to their one common value if they intersect), then
$$\varphi(x) - \nu(x + u) \le \varphi(-u) \le \varphi(x) + \nu(x + u).$$
In other words
$$|\varphi(x + u)| \le \nu(x + u).$$
Now extend $\varphi$ to $\mathcal{X}'$ by linearity; i.e., for $x \in \mathcal{X}$ set
$$\varphi(x + \alpha u) = \varphi(x) + \alpha\varphi(u).$$
If $\alpha \ne 0$, we have
$$|\varphi(x + \alpha u)| = |\alpha||\varphi(\alpha^{-1}x + u)| \le |\alpha|\,\nu(\alpha^{-1}x + u) = \nu(x + \alpha u).$$
Hence (1.11) holds.
If $\mathcal{X}' \ne \mathbb{C}^n$, we may select another vector $u$ not in $\mathcal{X}'$ and extend
$\varphi$ to the space spanned by $u$ and $\mathcal{X}'$ in such a way that its norm is
not increased. After a finite number of such extensions, we shall have
extended $\varphi$ to all of $\mathbb{C}^n$. ■
We are now in a position to prove the duality theorem.
Theorem 1.12. Let $\nu$ be a norm on $\mathbb{C}^n$. Then $\nu^{**} = \nu$.
Proof. From the definition of dual norm we have
$$|y^Hx| \le \nu^*(y)\nu(x). \qquad (1.12)$$
Consequently
$$\nu^{**}(x) = \sup_{\nu^*(y)=1}|y^Hx| \le \nu(x).$$
To show equality define a linear functional $\varphi$ on the space spanned
by $x$ by $\varphi(\alpha x) = \alpha\nu(x)$. Then (with an abuse of notation) $\nu^*(\varphi) = 1$.
By the Hahn-Banach theorem, there is an extension of $\varphi$ to $\mathbb{C}^n$ with
$\nu^*(\varphi) = 1$. Let $z$ be the vector representing $\varphi$. Then $\nu^*(z) = 1$ and
$|z^Hx| = \nu(x)$. Hence
$$\nu^{**}(x) = \sup_{\nu^*(y)=1}|y^Hx| \ge \nu(x),$$
which establishes the theorem. ■
We note for later reference that (1.12) is a generalization of the
CAUCHY INEQUALITY,
$$|y^Hx| \le \|y\|_2\|x\|_2.$$
For this reason it is sometimes called the GENERALIZED CAUCHY INEQUALITY.
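For the Hölder norms the dual can be exhibited explicitly. The sketch below uses Python with NumPy (assumed); `hoelder_dual_exponent` and `dual_maximizer` are hypothetical names. For $1 \le p \le \infty$ it constructs a vector $x$ with $\|x\|_p = 1$ and $y^Hx = \|y\|_q$. It also checks the generalized Cauchy inequality (1.12) in the form $|y^Hx| \le \|y\|_q\|x\|_p$ on random samples. These checks are consistent with $\|\cdot\|_q$ being the dual of $\|\cdot\|_p$; they illustrate the claim rather than prove it.

```python
import numpy as np

def hoelder_dual_exponent(p):
    """Conjugate exponent q with 1/p + 1/q = 1 (q = inf when p = 1, q = 1 when p = inf)."""
    if p == 1:
        return np.inf
    return 1.0 if np.isinf(p) else p / (p - 1.0)

def dual_maximizer(y, p):
    """A vector x with ||x||_p = 1 and y^H x = ||y||_q, so the dual norm is attained."""
    q = hoelder_dual_exponent(p)
    phase = np.exp(1j * np.angle(y))           # makes every term conj(y_i) x_i nonnegative
    if np.isinf(p):
        x = phase.copy()                       # the dual of the inf-norm is the 1-norm
    elif p == 1:
        x = np.zeros_like(y, dtype=complex)    # the dual of the 1-norm is the inf-norm
        i = np.argmax(np.abs(y))
        x[i] = phase[i]
    else:
        x = phase * np.abs(y) ** (q - 1)       # equality case of Hoelder's inequality
    return x / np.linalg.norm(x, p)

rng = np.random.default_rng(8)
y = rng.standard_normal(6) + 1j * rng.standard_normal(6)
for p in [1, 1.5, 2, 3, np.inf]:
    q = hoelder_dual_exponent(p)
    x = dual_maximizer(y, p)
    attained = abs(np.vdot(y, x))              # np.vdot conjugates its first argument: y^H x
    print(p, np.isclose(attained, np.linalg.norm(y, q)))
```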
Notes and References
The quantitative notion of distance or size is as old as the measuring stick.
Our 2-norm in 3-space is just the Euclidean notion of size (the basis of
Greek geometry), and the triangle inequality says that the length of one
side of a triangle is less than the sum of the lengths of the other two sides.
There are two ways of generalizing Euclidean length to a vector space. The
first, due to Minkowski [157, 1911, v.2, pp. 131-229], uses convex bodies to
define norms. Specifically, if $K$ is a compact convex set containing the origin,
we may define a norm $\nu_K(x)$ as the reciprocal of the number $\alpha$ such that
$\alpha x$ lies on the boundary of $K$ (in this approach we must add $\alpha \ge 0$ to the
homogeneity condition). Minkowski established the equivalence of his norms
to the 2-norm, introduced the dual norm (he called it the polar norm), and
showed that the dual of the dual was the original. Although Minkowski
worked only in 3-space, it is obvious (as it must have been to him) that the
approach generalizes. For a modern treatment along these lines see [121,
Ch. 2].
The second approach is to add the axioms for a norm to those for a vector space, which was done independently by Banach [9, 1922] and Wiener
[268, 1922]. For Wiener the matter seems to have been little more than a
mathematical exercise. Banach, on the other hand, developed the notion
extensively and went on to apply it. In any event, normed linear spaces
are known today as BANACH SPACES. It is of interest that both Banach and
Wiener use the modern notation $\|\cdot\|$ to denote a norm.
Norms of the form $\sqrt{x^HAx}$ arise frequently in the analysis of iterative methods for linear systems [276]. In spite of the equivalence of norms, a statement
made in one elliptic norm may in practice mean something entirely different
from the same statement made in the Euclidean norm. We will return to
this point in Section III.2, where we discuss the limitations of absolute and
relative errors defined in terms of norms.
6. (Arithmetic-geometric mean inequality). Let $\zeta > 0$. Show that
$$\zeta^\alpha - \alpha\zeta + \alpha - 1\ \begin{cases} \ge 0 & \text{if } \alpha > 1 \text{ or } \alpha < 0, \\ \le 0 & \text{if } 0 < \alpha < 1. \end{cases}$$
Conclude that for $0 < \alpha < 1$ and $\zeta_1, \zeta_2 > 0$,
$$\zeta_1^\alpha\zeta_2^{1-\alpha} \le \alpha\zeta_1 + (1 - \alpha)\zeta_2.$$
Equivalently, if $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, then
$$\alpha^{1/p}\beta^{1/q} \le \frac{\alpha}{p} + \frac{\beta}{q} \qquad (1.13)$$
for all nonnegative $\alpha$ and $\beta$.
7. (Hölder's inequality [118, 1889]).
Show that if $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, then
We have already pointed out that Minkowski had the concept of the dual
norm. The approach taken here is due to Hahn [99, 19 2 7] and Banach [10,
19 2 9]. The proof given here is adapted from the one in the elegant book
by H.ipsz and Sz.-Nagy [184, 195fj]. The inequality yT x :::; \1xlbllylb for
real vectors is due t.o Cauchy [41, 1821, Note II, Theorcm XVI]. It, is also
associated with t.he names Schwarz and Bunyakovski.
lyIlxl :::; Ilxlll'lIyllq.
!Hint: (1.13) take 0' = Ullxlll' and {3 = 1/i/l l x ll
mequahtles.] q, and sum the resulting
8. (Minkuwski's inequality [156, 1896]). Show t.hat. for p > 1
1. Let v be a vector nonn. Show that Iv(x) - v(y)\ :::; v(x - y).
2. Verify directly that the functions \1 . 111' (p = 1,2,00) are norms.
3. (Cauchy inequality). Show that
IIx + ylll' :::; II X 111' + lIylll"
[Hint: Assume without loss of generality that x, y > O. Write
(i + 1/i)1' = L i(i + 17;)1'-1 + L 1/i(i + 1/iyl,
. '
.
Exercises
I:rlly\ :::; IIxlb\1yl12
and apply milder's inequality twice.]
9. Shuw that if 1 + 1 - 1 th 1.1 H U ld
l' q - , en Ie 0 er norms II . 11 1 > and \1 . IIq are dual.
-0-
10. Let {I be a norm on em and v be n
x E em+n in the f _ ( II II norm on e . Partition any vector
. . OrIn x - xI x2) , where XI E em. Show that tl
followmg functIOns are norms on e m + n : Ie
with equality if and only if x and yare linearly dependent.
4. Show that up to a constant multiple the 2-nonn is the only unitarily
invariant norm on en.
5. Show that Iyll:rl :::; \1x\1l\1y\1oo. Conclude that Ilxll :::; \1xl\r\1x\1oo.
TilE FOLLOWING EXERCISES SHOW THAT THE HOLDER NORMS
ARE TRULY NORMS. THEY FOLLOW BECKEN BACH AND BELL-
MAN [20].
1. (1 (Xl) + V(X2),
2. y' {1(xJ) 2 + V(X2 ) 2 ,
3. max{{1(xJ), V(X2)}'
62
II. NORMS AND METRICS
1. VECTOR NORMS
63
11. Let v be it norm on C n . Show that the function
P'J(x, y) = v(y - x)
17. (Jordan and von Neumann [127, 1935]). Let v be a norm on C n . Show
that a necessary and sufficient condition for v to be generated by an inner
product is that it satisfy the RHOMBUS IDENTITY
is a metric; i.e..
1/ 2 (X + y) + v 2 (x - y) = 2[1}(x) + v 2 (y)].
L 1',,(:1', y) 0,
2" 1',,(:1', y) = 0 <==> :r = ;II,
3" plJ(:r,1/) = Pv(Y, :1:),
4. Pv(x, z) :::; Pv(x, y) + Pv(y, z).
I I 1 f t . I C n X C n is a PSEUDO-METRIC if it satisfies all
12. A rea va ue< unc Ion 0 I
the defining conditions for a metric except
-<>-
l8. Let 1/ be a norm on en The sequcnce T], :1:2, . . " is a CAUCHY SEqUENCF;
if for every f > 0 there is an integer N such that 1/(.7:; - x j) :::; f whenever
i,j N.
1'2 (x, y) = 0 {=} x = y.
Show that the relation 1'( x, y) = 0 is an equivalence relation un en. ?e-
. . I - -I b y ( x ) Show that the functIOn
note the correspondlllg eqlllva ence c asses . . k .
p( (x), (y)) = p( X, y) is well defined and is a metnc over the space of eqUIva-
lence classes (x).
13. Verify the inequalities in Example 1.8.
14. Prove Theurem 1.4.
THE FOLLOWING EXERCISES CONCERN NORMS GENERATED BY
POSITIVE DEFINITE MATRICES.
. . ( ) . en EB C n ---> R
15. An inner product on en is a contllluous mapplllg .,. .
that satisfies
1. Show that this definition is independent of the choice of norm.
2. Show that en is COMPLETE; that is, every Cauchy sequence in en
converges in en. [Hint: use the corresponding fact about complex
numbers. ]
TilE FOLLOWING EXERCISES EXPLORE SOME OF THE RELATIONS
OF NORMS AND CONVEXITY.
19. Let v be a vector norm and let
Bv = {x : v( x) :::; I}
be the unit v-ball. Show that Bv is closed, bounded, convex, and equilli-
brated (x E Bv and 10'1 :::; 1'* ax E Bv). Shuw further that B IJ contains the
origin in its interior. Conversely if B is a closed, bounded, convex, equilli-
brated set containing the origin in its interior, then the function VB defined
by
VB(X) = inf{a- I > 0 : ax E B}
(1.14)
is a vector norm.
L :r # ° {=} (:r, :r) > 0,
2. (0':1: + riy, z) = n(:r, z) + (3(y, z),
3. (y, :1:) = (:1:, y) .
Shuw that any inner product has a unique representation of the form (x, y) =
H Ax where A is positive definite. Conclude that the function v defined by
v2(x)"= (x, x) is a norm. It is the NORM GENERATED BY THE INNER PRODUCT
(-, ").
16. Let v be a norm generated by an inner product. Show that the unit ball
{:r : lJ(:1:) < I} is an ellipsoid (hence the alternative name ELLIPTIC NORM for
these nors)" Dcscribe the lcngths and situations of the axes.
20. Let BeRn bc a cluscd, boundcd, convcx set containing the origin. Show
that the function VB defined by (1.14) is a norm in which the homogeneity
condition is replaced by
0'2:0 => VB(X) = O.
21. Show that the Hahn-Banach theorem is equivalent to the following state-
ment. If x is a point outside a closed, bounded, convex set cuntaining the
origin in its interior, then there is a hyperplane that separates x from the
set.
-<>-
II. NORMS AND METRICS
2. MATRIX NORMS
65
64
. 1 fi 1 1 Y ( 110 ) is a norm.
22. Verify that the function (e 11('( >. .
WING EXERCISES IS TO SHOW THAT
THE PURPOSE OF THE FOLLO " IN INFINITE DIMENSIONAL
THE EQUIVALENCE OF NORMS FAILS
SPACES"
t of all infinite sequelle es x = (6,6,...)
23. For 1 :e:; ]I :e:; 00 let PI' be the se t k the limit). Show that if PI < ]12
1 tl t ,,",00 I C I I' < 00 (for P = 00 ,a e
suc I la L.,,=1 c" .'
tl n is P ro p erly contamed m £1)2'
len I'l '
24. Show that the function $\|\cdot\|_p$ defined by
$$\|x\|_p = \Big(\sum_{i=1}^{\infty} |\xi_i|^p\Big)^{1/p}$$
is a norm on $\ell_p$. Prove that $\ell_p$ is complete; that is, Cauchy sequences in the $p$-norm converge in the $p$-norm. (Note: $\ell_2$ is called a HILBERT SPACE.)
25. Show that although all the Hölder norms are defined on $\ell_1$, they are not equivalent. In particular, if $p_1 < p_2$, then there is a sequence $x_k$ in $\ell_1$ such that $\|x_k\|_{p_1} \to \infty$ while $\|x_k\|_{p_2} \to 0$.
◇
2. Matrix Norms
2.1. Basic Concepts
Since the space of $m\times n$ matrices is a vector space of dimension $mn$, it is natural to define a matrix norm in the same way as a vector norm, as in the following definition. However, as we shall see later, a matrix norm needs an additional property to be really useful.
Definition 2.1. A function $\nu: \mathbb{C}^{m\times n} \to \mathbb{R}$ is a NORM on $\mathbb{C}^{m\times n}$ (or a MATRIX NORM) if it satisfies the following conditions.
1. $A \ne 0 \implies \nu(A) > 0$,
2. $\nu(\alpha A) = |\alpha|\nu(A)$,
3. $\nu(A + B) \le \nu(A) + \nu(B)$.
This definition has the consequence that all the properties of vector norms developed in the previous section remain true of matrix norms. For example, all matrix norms are equivalent and generate the same topology, in which they are all continuous functions. It makes no theoretical difference whether we define convergence of matrices elementwise or as convergence in any matrix norm.
A natural generalization of the Euclidean norm is given in the following definition.
Definition 2.2. Let $A \in \mathbb{C}^{m\times n}$. The FROBENIUS NORM of $A$ is the number
$$\|A\|_F \overset{\text{def}}{=} \Big(\sum_{i=1}^m\sum_{j=1}^n |\alpha_{ij}|^2\Big)^{1/2} = \operatorname{trace}(A^HA)^{1/2}. \qquad (2.1)$$
Note that when $A \in \mathbb{C}^{n\times 1}$, i.e., when $A$ is a vector, the Frobenius norm reduces to the 2-norm.
Our Definition 2.1 of matrix norm has one important defect: it makes no concession to the fact that matrices can be multiplied. What we would like is an analogue of the triangle inequality for matrix multiplication. In fact the Frobenius norm satisfies such an inequality: namely,
$$\|AB\|_F \le \|A\|_F\|B\|_F$$
whenever the product $AB$ is defined (Exercise 2.1). However, not every matrix norm satisfies this kind of inequality, as the following example shows.
Example 2.3. Let us attempt to generalize the $\infty$-norm as we did the Euclidean norm by defining
$$\nu_\infty(A) = \max_{i,j} |\alpha_{ij}|.$$
Clearly this is a matrix norm. However, if
$$A = \begin{pmatrix} 1 & 1\\ 1 & 1\end{pmatrix},$$
then $\nu_\infty(AA) = 2 > 1 = \nu_\infty(A)\nu_\infty(A)$.
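Example 2.3 is easy to reproduce numerically. The sketch below stores matrices as lists of rows (an illustrative choice) and compares the elementwise maximum $\nu_\infty$ with the Frobenius norm (2.1) on the matrix of the example. For this rank-one $A$ the Frobenius inequality happens to hold with equality.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def nu_max(A):
    """nu_infinity(A) = max_{i,j} |a_ij| from Example 2.3."""
    return max(abs(a) for row in A for a in row)


def frobenius(A):
    """Frobenius norm (2.1)."""
    return sum(abs(a) ** 2 for row in A for a in row) ** 0.5


A = [[1, 1], [1, 1]]
AA = matmul(A, A)                                  # [[2, 2], [2, 2]]
print(nu_max(AA), nu_max(A) * nu_max(A))           # 2 1
print(frobenius(AA), frobenius(A) * frobenius(A))  # 4.0 4.0
```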
The submultiplicative inequality satisfied by the Frobenius norm allows us to obtain bounds on the products of matrices in terms of the individual matrices. So important is this for matrix analysis that norms with this property are given a special name.
Definition 2.4. Let $\mu$, $\nu$, and $\rho$ be norms on $\mathbb{C}^{m\times n}$, $\mathbb{C}^{n\times k}$, and $\mathbb{C}^{m\times k}$. Then $\mu$, $\nu$, and $\rho$ are CONSISTENT if
$$\rho(AB) \le \mu(A)\nu(B)$$
whenever $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times k}$. In particular, a matrix norm $\nu$ on $\mathbb{C}^{n\times n}$ is consistent if $\nu(AB) \le \nu(A)\nu(B)$ for all $A, B \in \mathbb{C}^{n\times n}$.
Since vector norms can be identified with matrix norms, Definition 2.4 includes the notion of consistency of a vector norm and a matrix norm. For example, the Frobenius norm and the vector 2-norm are consistent (that is, $\|Ax\|_F \le \|A\|_F\|x\|_2$) because $\|x\|_2 = \|x\|_F$.
The following theorem shows that for any consistent matrix norm there is a consistent vector norm.
Theorem 2.5. Let $\|\cdot\|$ be a consistent matrix norm on $\mathbb{C}^{n\times n}$. Then there is a norm $\nu$ on $\mathbb{C}^n$ that is consistent with $\|\cdot\|$.
Proof. Choose a nonzero vector $a \in \mathbb{C}^n$ and define
$$\nu(x) = \|xa^T\|.$$
It is easy to verify that $\nu$ is a vector norm. Moreover, since
$$\nu(Ax) = \|Axa^T\| \le \|A\|\,\|xa^T\| = \|A\|\nu(x),$$
$\nu$ is consistent with $\|\cdot\|$. ∎
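The construction in the proof of Theorem 2.5 can be checked numerically. The sketch below assumes the Frobenius norm as the consistent matrix norm (its consistency is Exercise 2.1) and an arbitrary fixed vector `a`; both choices are illustrative. It then tests $\nu(Ax) \le \|A\|_F\,\nu(x)$ on random real matrices. For this particular choice $\nu(x) = \|x\|_2\|a\|_2$, a scaled 2-norm.

```python
import random


def frobenius(M):
    """Frobenius norm (2.1) of a matrix stored as a list of rows."""
    return sum(abs(m) ** 2 for row in M for m in row) ** 0.5


def outer(x, a):
    """The matrix x a^T used in the proof of Theorem 2.5."""
    return [[xi * aj for aj in a] for xi in x]


def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


a = [1.0, -2.0, 0.5]  # arbitrary fixed nonzero vector (illustrative choice)


def nu(x):
    """Vector norm nu(x) = ||x a^T||_F induced by the matrix norm."""
    return frobenius(outer(x, a))


random.seed(1)
n = len(a)
for _ in range(500):
    A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    x = [random.uniform(-1, 1) for _ in range(n)]
    # nu(Ax) = ||A x a^T||_F <= ||A||_F ||x a^T||_F = ||A||_F nu(x)
    assert nu(matvec(A, x)) <= frobenius(A) * nu(x) + 1e-12
```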
Consistent matrix norms have an important relation to the eigenvalues of a matrix. Let us define the SPECTRAL RADIUS of a matrix $A$ to be the number
$$\rho(A) \overset{\text{def}}{=} \max\{|\lambda| : \lambda \in \mathcal{L}(A)\}.$$
Then we have the following theorem.
Theorem 2.6. Let $\|\cdot\|$ be a consistent matrix norm. Then for any matrix $A$
$$\rho(A) \le \|A\|. \qquad (2.2)$$
Proof. By Theorem 2.5 there is a vector norm $\nu$ that is consistent with $\|\cdot\|$. Let $x$ be an eigenvector of $A$ corresponding to an eigenvalue $\lambda$; i.e.,
$$Ax = \lambda x.$$
Taking norms we get
$$|\lambda|\nu(x) = \nu(\lambda x) = \nu(Ax) \le \|A\|\nu(x).$$
Since $\nu(x) > 0$ we may divide by $\nu(x)$ to get $|\lambda| \le \|A\|$. The result (2.2) follows from the fact that $\lambda \in \mathcal{L}(A)$ is arbitrary. ∎
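Inequality (2.2) can be seen on a small example. The sketch below is restricted to $2\times 2$ matrices so that the eigenvalues come directly from the characteristic polynomial; it compares $\rho(A)$ with three consistent norms: the Frobenius norm and the column-sum and row-sum norms characterized later in Theorem 2.10. The triangular test matrix is an arbitrary choice meant to show how far a norm can overestimate the spectral radius.

```python
import cmath


def eig2(A):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def spectral_radius(A):
    return max(abs(lam) for lam in eig2(A))


def col_sum_norm(A):   # ||A||_1, see (2.8)
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def row_sum_norm(A):   # ||A||_inf, see (2.9)
    return max(sum(abs(v) for v in row) for row in A)


def frobenius(A):      # (2.1)
    return sum(abs(v) ** 2 for row in A for v in row) ** 0.5


A = [[0.5, 10.0], [0.0, -0.9]]   # triangular: eigenvalues 0.5 and -0.9
rho = spectral_radius(A)
for norm in (col_sum_norm, row_sum_norm, frobenius):
    assert rho <= norm(A)        # (2.2) for three consistent norms
```

All three norms exceed 10 here while $\rho(A) = 0.9$, which is why (2.2) is only an upper bound.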
2.2. Operator Norms
Recall that in Section 1.4 we defined the dual of the norm $\nu$ by
$$\nu^*(y) = \max_{\nu(x)=1} |y^Hx|. \qquad (2.3)$$
An immediate consequence of this was the generalized Cauchy inequality,
$$|y^Hx| \le \nu^*(y)\nu(x),$$
which may be interpreted as saying that the norms $|\cdot|$, $\nu^*$, and $\nu$ on $\mathbb{C}$, $\mathbb{C}^{1\times n}$, and $\mathbb{C}^{n\times 1}$ are consistent. It turns out that (2.3) represents a general technique for generating a class of consistent norms, which we will call OPERATOR NORMS.
Let $\mu$ be a norm on $\mathbb{C}^m$ and $\nu$ be a norm on $\mathbb{C}^n$. By the equivalence of norms, the function $\nu$ is continuous and the $\nu$-sphere $\{x : \nu(x) = 1\}$ is closed and bounded. Hence for any $m\times n$ matrix $A$, we may define the number $\|A\|_{\mu,\nu}$ by
$$\|A\|_{\mu,\nu} = \max_{\nu(x)=1} \mu(Ax). \qquad (2.4)$$
As the notation $\|A\|_{\mu,\nu}$ suggests, the function $\|\cdot\|_{\mu,\nu}$ is actually a norm.
Theorem 2.7. Let $\mu$ and $\nu$ be as above and let $\|\cdot\|_{\mu,\nu}$ be defined by (2.4). Then $\|\cdot\|_{\mu,\nu}$ is a norm on $\mathbb{C}^{m\times n}$, which is consistent with $\mu$ and $\nu$.
Proof. We first prove consistency. From (2.4),
$$\|A\|_{\mu,\nu} = \max_{x\ne 0} \frac{\mu(Ax)}{\nu(x)}.$$
Therefore
$$\mu(Ax) \le \|A\|_{\mu,\nu}\,\nu(x). \qquad (2.5)$$
Now we prove that $\|\cdot\|_{\mu,\nu}$ is a matrix norm by showing that it satisfies the conditions of Definition 2.1.
1. Positive definiteness. If $A \ne 0$, there is an index $i$ such that $Ae_i \ne 0$. Then from (2.5), $0 < \mu(Ae_i) \le \|A\|_{\mu,\nu}\nu(e_i)$, which implies that $\|A\|_{\mu,\nu} > 0$.
2. Homogeneity. For any $\alpha$ we have
$$\|\alpha A\|_{\mu,\nu} = \max_{\nu(x)=1} \mu(\alpha Ax) = \max_{\nu(x)=1} |\alpha|\mu(Ax) = |\alpha|\,\|A\|_{\mu,\nu}.$$
3. Triangle inequality. Let $A, B \in \mathbb{C}^{m\times n}$. Suppose that $x$ satisfies $\nu(x) = 1$ and $\mu[(A + B)x] = \|A + B\|_{\mu,\nu}$. Then from (2.5),
$$\begin{aligned}\|A + B\|_{\mu,\nu} &= \mu[(A + B)x]\\ &\le \mu(Ax) + \mu(Bx)\\ &\le \|A\|_{\mu,\nu}\nu(x) + \|B\|_{\mu,\nu}\nu(x)\\ &= \|A\|_{\mu,\nu} + \|B\|_{\mu,\nu}. \qquad ∎\end{aligned}$$
Theorem 2.7 justifies the following definition.
Definition 2.8. Let $\mu$ and $\nu$ be norms on $\mathbb{C}^m$ and $\mathbb{C}^n$. The norm $\|\cdot\|_{\mu,\nu}$ defined by (2.4) is called an OPERATOR NORM on $\mathbb{C}^{m\times n}$. It is also said to be the norm SUBORDINATE to the vector norms $\mu$ and $\nu$.
Equation (2.5) shows that an operator norm subordinate to two vector norms is consistent with them. The following theorem shows that under appropriate conditions, operator norms are consistent with themselves. Its proof is left as an exercise.
Theorem 2.9. Let $\mu$, $\nu$, and $\rho$ be norms on $\mathbb{C}^m$, $\mathbb{C}^n$, and $\mathbb{C}^k$, and let $\|\cdot\|_{\mu,\nu}$, $\|\cdot\|_{\nu,\rho}$, and $\|\cdot\|_{\mu,\rho}$ be the subordinate operator norms. Then
$$\|AB\|_{\mu,\rho} \le \|A\|_{\mu,\nu}\|B\|_{\nu,\rho}.$$
We saw in Example 2.3 that generalizing the vector $\infty$-norm by extending its algebraic definition to matrices failed to produce a consistent matrix norm. The notion of an operator norm provides a means of extending the definition of the Hölder norms to all matrices. Specifically, if we define
$$\|A\|_p = \max_{\|x\|_p=1} \|Ax\|_p, \qquad (2.6)$$
then by Theorem 2.9
$$\|AB\|_p \le \|A\|_p\|B\|_p \qquad (2.7)$$
whenever the product $AB$ is defined. We say that these Hölder matrix norms form a CONSISTENT FAMILY OF MATRIX NORMS. Another consistent family is the Frobenius norm defined by (2.1).
For $p = 1, 2, \infty$, the Hölder matrix norms have explicit characterizations.
Theorem 2.10. Let $A = (\alpha_{ij}) \in \mathbb{C}^{m\times n}$. Then
$$\|A\|_1 = \max_{1\le j\le n} \sum_{i=1}^m |\alpha_{ij}|, \qquad (2.8)$$
$$\|A\|_\infty = \max_{1\le i\le m} \sum_{j=1}^n |\alpha_{ij}|, \qquad (2.9)$$
and
$$\|A\|_2 = \sqrt{\lambda_{\max}(A^HA)} = \sigma_{\max}(A), \qquad (2.10)$$
where $\lambda_{\max}(A^HA)$ is the largest eigenvalue of $A^HA$ and $\sigma_{\max}(A)$ is the largest singular value of $A$.
Proof. To prove (2.8) let $A = (a_1 \cdots a_n)$ be partitioned by columns. For any $x \ne 0$ we have
$$\|Ax\|_1 = \Big\|\sum_{j=1}^n \xi_ja_j\Big\|_1 \le \sum_{j=1}^n |\xi_j|\,\|a_j\|_1 \le \max_{1\le j\le n}\|a_j\|_1\,\|x\|_1.$$
Hence
$$\|A\|_1 \le \max_{1\le j\le n} \|a_j\|_1. \qquad (2.11)$$
On the other hand, if $\max_{1\le j\le n}\|a_j\|_1 = \|a_k\|_1$, then
$$\frac{\|Ae_k\|_1}{\|e_k\|_1} = \|a_k\|_1 = \max_{1\le j\le n}\|a_j\|_1.$$
Hence
$$\|A\|_1 \ge \max_{1\le j\le n} \|a_j\|_1. \qquad (2.12)$$
The two inequalities (2.11) and (2.12) together imply (2.8). The characterization (2.9) is proved similarly. Equation (2.10) follows from Theorem I.4.3. ∎
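The characterizations (2.8) and (2.9) translate directly into code. The sketch below, for a real matrix stored as a list of rows (an illustrative choice), computes the column-sum and row-sum norms, checks that the unit vector $e_k$ from the proof attains $\|A\|_1$, and checks that a sign vector attains $\|A\|_\infty$. The sign-vector choice is the usual argument behind "proved similarly" for real matrices and is an assumption here, not taken from the text. Random vectors are then checked against the consistency inequality (2.5).

```python
import random


def col_sum_norm(A):
    """||A||_1 = largest column sum of |a_ij|, as in (2.8)."""
    m, n = len(A), len(A[0])
    return max(sum(abs(A[i][j]) for i in range(m)) for j in range(n))


def row_sum_norm(A):
    """||A||_inf = largest row sum of |a_ij|, as in (2.9)."""
    return max(sum(abs(a) for a in row) for row in A)


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def vnorm1(x):
    return sum(abs(t) for t in x)


def vnorm_inf(x):
    return max(abs(t) for t in x)


A = [[1.0, -2.0, 0.0], [3.0, 0.5, -1.0]]   # a 2x3 real example
m, n = len(A), len(A[0])

# The maximum in (2.8) is attained at e_k, k the column with largest sum.
k = max(range(n), key=lambda j: sum(abs(A[i][j]) for i in range(m)))
e_k = [1.0 if j == k else 0.0 for j in range(n)]
assert vnorm1(matvec(A, e_k)) == col_sum_norm(A)

# For real A the maximum in (2.9) is attained at x_j = sign(a_ij),
# where i is the row with the largest sum.
i = max(range(m), key=lambda r: sum(abs(a) for a in A[r]))
x = [1.0 if a >= 0 else -1.0 for a in A[i]]
assert vnorm_inf(matvec(A, x)) == row_sum_norm(A)

random.seed(2)
for _ in range(1000):
    y = [random.uniform(-1, 1) for _ in range(n)]
    assert vnorm1(matvec(A, y)) <= col_sum_norm(A) * vnorm1(y) + 1e-12
    assert vnorm_inf(matvec(A, y)) <= row_sum_norm(A) * vnorm_inf(y) + 1e-12
```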
In view of (2.8)–(2.10), the norm $\|\cdot\|_1$ is sometimes called the COLUMN SUM NORM; the norm $\|\cdot\|_\infty$, the ROW SUM NORM; and the norm $\|\cdot\|_2$, the SPECTRAL NORM.
There is no computationally convenient characterization of the spectral norm, but it does have a number of nice properties that make it useful for theoretical purposes. Some of these properties are contained in the following theorem.
Theorem 2.11. For any matrix $A$
1. $\displaystyle\|A\|_2 = \max_{\|x\|_2=1,\ \|y\|_2=1} |y^HAx|$,
2. $\|A^H\|_2 = \|A^T\|_2 = \|A\|_2$,
3. $\|A^HA\|_2 = \|A\|_2^2$,
4. If $U$ and $V$ are unitary, $\|U^HAV\|_2 = \|A\|_2$,
5. $\|A\|_2^2 \le \|A\|_1\|A\|_\infty$.
Proof. The first four items follow directly from the singular value decomposition, and their proofs are left as an exercise. For the fifth item, note that from (2.7) and Theorem 2.6
$$\|A\|_2^2 = \lambda_{\max}(A^HA) \le \|A^HA\|_1 \le \|A^H\|_1\|A\|_1 = \|A\|_\infty\|A\|_1. \qquad ∎$$
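Although the spectral norm has no closed form like (2.8) and (2.9), it can be estimated iteratively. The sketch below uses power iteration on $A^TA$, a standard technique that the text does not discuss, for real matrices only. It assumes the random start vector has a component along the dominant right singular vector and that $A \ne 0$. It then checks items 2 and 5 of Theorem 2.11 on a small example; the iteration count and seed are arbitrary choices.

```python
import math
import random


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def spectral_norm(A, iters=500, seed=0):
    """Estimate ||A||_2 = sqrt(lambda_max(A^T A)) for a real matrix A.

    Power iteration on A^T A; assumes A != 0 and a generic start vector.
    """
    rng = random.Random(seed)
    At = transpose(A)
    x = [rng.uniform(-1, 1) for _ in range(len(A[0]))]
    for _ in range(iters):
        y = matvec(At, matvec(A, x))            # y = A^T A x
        ny = math.sqrt(sum(t * t for t in y))
        x = [t / ny for t in y]                 # keep ||x||_2 = 1
    return math.sqrt(sum(t * t for t in matvec(A, x)))


def col_sum_norm(A):
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def row_sum_norm(A):
    return max(sum(abs(a) for a in row) for row in A)


A = [[2.0, 0.0], [0.0, 1.0]]   # diagonal, so ||A||_2 = 2
B = [[1.0, 2.0], [3.0, 4.0]]
s = spectral_norm(B)
assert s * s <= col_sum_norm(B) * row_sum_norm(B)     # item 5
assert abs(spectral_norm(transpose(B)) - s) < 1e-9    # item 2 (real case)
```

For $B$ the exact value is $\sqrt{15 + \sqrt{221}} \approx 5.46$, while $\|B\|_1\|B\|_\infty = 42$, so the bound in item 5 can be loose.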
Notes and References
The spectral norm was introduced by Peano [177, 1888], who established its basic properties. The Frobenius norm appears briefly in the paper of Peano just cited, but only as a bound for the spectral norm. Schur [193, 1909] uses it in an important bound on the eigenvalues of a matrix. Frobenius [76, 77, 1911] appears to be the first to regard the function $\|\cdot\|_F$ as what we would call a matrix norm. Actually he worked with the quantity $\|\cdot\|_F^2$, which he called the Spannung of a matrix, and established the equivalent of the triangle inequality, consistency, and invariance under orthogonal transformations.
Although norms of operators in infinite dimensional spaces were a staple of functional analysis almost from the beginning, they seem to have percolated more slowly into matrix theory. Applications in numerical analysis played a large role in the process, thanks in large part to a series of conferences in Gatlinburg, Tennessee, hosted by A. S. Householder, which combined both the theoretical and the numerical aspects of matrix theory. Any list of the more influential works would include von Neumann [254, 1937], [256, 1947], Faddeeva [68, 1959], Mirsky [158, 1960], Householder [121, 1964], and Wilkinson [269, 1965].
Exercises
1. Show that $\|AB\|_F \le \|A\|_F\|B\|_F$ whenever the product $AB$ is defined.
2. Prove Theorem 2.9.
3. Establish the first four items in Theorem 2.11.
4. Show that $\|A\|_2^2 \le \|A\|_1\|A\|_\infty$.
5. Let $A = (a_1 \cdots a_n)$, where $\|a_i\|_2 = 1$ $(i = 1, \ldots, n)$. Show that $\|A\|_2 \le \sqrt{n}$.
6. Let $\nu$ be defined on $\mathbb{C}^{n\times n}$ by $\nu(A) = n\max_{i,j}|\alpha_{ij}|$. Show that $\nu$ is a consistent matrix norm.
7. (Gastinel [121, p.61]). Show that if $\nu$ is a matrix norm on $\mathbb{C}^{n\times n}$, there is a constant $\tau$ such that the function $A \mapsto \tau\nu(A)$ is a consistent matrix norm.
8. Let $\nu$ be a consistent matrix norm on $\mathbb{C}^{n\times n}$ and let $X$ be nonsingular. Show that the function $\nu_X$ defined by
$$\nu_X(A) = \nu(X^{-1}AX)$$
is a consistent matrix norm.
9. Let $\nu$ be a norm on $\mathbb{C}^n$ and let $B$ be nonsingular. Let the norm $\mu$ be defined by $\mu(x) = \nu(Bx)$. Show that the operator norms $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$ subordinate to $\mu$ and $\nu$ are related by the equation $\|A\|_\mu = \|BAB^{-1}\|_\nu$.
10. Let $\|\cdot\|$ be the operator norm subordinate to an absolute vector norm. Show that
$$\|\operatorname{diag}(\delta_1, \delta_2, \ldots, \delta_n)\| = \max_i |\delta_i|. \qquad (2.13)$$
Conversely, if (2.13) is satisfied for all diagonal matrices by an operator norm, then it is generated by an absolute norm.
11. (Mirsky [121, p.61]). Show that
$$\inf_{X\ \text{nonsingular}} \|X^{-1}AX\|_F^2 = \sum_{\lambda\in\mathcal{L}(A)} |\lambda|^2,$$
with equality for some particular $X$ if and only if $A$ is diagonalizable.
12. Show that the numbers $\rho_{pq}$ in the following table satisfy $\|A\|_p \le \rho_{pq}\|A\|_q$, where $A \in \mathbb{C}^{n\times n}$. Show that equality can be attained.
$$\begin{array}{c|cccc} p \backslash q & 1 & 2 & \infty & F\\ \hline 1 & 1 & \sqrt{n} & n & \sqrt{n}\\ 2 & \sqrt{n} & 1 & \sqrt{n} & 1\\ \infty & n & \sqrt{n} & 1 & \sqrt{n}\\ F & \sqrt{n} & \sqrt{n} & \sqrt{n} & 1 \end{array}$$
13. Build the table in Exercise 2.12 for the same norms over $\mathbb{C}^{m\times n}$.
14. Let $A \in \mathbb{C}^{n\times n}$ and let $\epsilon > 0$. Show that there is a consistent matrix norm $\|\cdot\|_{A,\epsilon}$ such that
$$\|A\|_{A,\epsilon} = \rho(A) + \epsilon.$$
[Hint: Reduce $A$ to Schur form. Use a diagonal similarity transformation to reduce the off-diagonal elements so that the infinity norm is less than $\rho(A) + \epsilon$. Finally, use Exercise 2.8 to undo the transformations.]
15. Show that the spectral norm and the Frobenius norm are UNITARILY INVARIANT; that is,
$$U, V \text{ unitary} \implies \|U^HAV\|_p = \|A\|_p, \qquad p = 2, F.$$
16. Let $A$ be square. Show that there is a consistent norm $\|\cdot\|$ such that $\|A\| = \rho(A)$ if and only if every eigenvalue $\lambda \in \mathcal{L}(A)$ with $|\lambda| = \rho(A)$ is nondefective.
THE FOLLOWING EXERCISES INVESTIGATE THE PROPERTIES OF THE POWERS OF A MATRIX $A \in \mathbb{C}^{n\times n}$.
17. Show that $\lim_{k\to\infty} A^k = 0$ if and only if $\rho(A) < 1$.
18. (Neumann series). Let $\rho(A) < 1$. Show that $I - A$ is nonsingular and
$$\sum_{k=0}^\infty A^k = (I - A)^{-1}.$$
19. Show that if $\nu$ is a consistent matrix norm, then $\lim_{k\to\infty} \nu(A^k)^{1/k} = \rho(A)$.
20. Let the infinite series $\varphi(\zeta) = \sum_{k=0}^\infty \gamma_k\zeta^k$ have radius of convergence $\sigma$. Show that if $\rho(A) < \sigma$, then $\sum_{k=0}^\infty \gamma_kA^k$ converges. We write $\varphi(A)$ for its limit.
21. In the last exercise let $|\lambda| < \sigma$. Show that
$$\varphi[J_4(\lambda)] = \begin{pmatrix} \varphi(\lambda) & \varphi'(\lambda) & \varphi''(\lambda)/2 & \varphi^{(3)}(\lambda)/3!\\ 0 & \varphi(\lambda) & \varphi'(\lambda) & \varphi''(\lambda)/2\\ 0 & 0 & \varphi(\lambda) & \varphi'(\lambda)\\ 0 & 0 & 0 & \varphi(\lambda)\end{pmatrix}.$$
22. For any $A \in \mathbb{C}^{n\times n}$ define
$$e^A = \sum_{k=0}^\infty \frac{A^k}{k!}.$$
Show the following.
   1. If $AB = BA$, then $e^{A+B} = e^Ae^B$.
   2. $\det(e^A) = e^{\operatorname{trace}(A)}$.
   3. $de^{\tau A}/d\tau = Ae^{\tau A}$.
23. Show that $\rho(A) < 1$ if and only if there is a positive definite matrix $Q$ such that $Q - AQA^H$ is positive definite.
24. A matrix is STABLE if all of its eigenvalues have negative real parts. Show that $A$ is stable if and only if $\lim_{t\to+\infty} e^{tA} = 0$.
25. Show that $-A$ is stable if and only if there is a positive definite matrix $M$ such that $A^HM + MA$ is positive definite. [Hint: Use Exercise 2.23.]
◇
3. Unitarily Invariant Norms
3.1. Von Neumann's Theory
An important property of Euclidean space is that shapes and distances do not change under rotations. In particular, for any vector $x$ and any unitary matrix $U$ we have
$$\|Ux\|_2 = \|x\|_2.$$
An analogous property is shared by the spectral and Frobenius norms: namely, for any unitary matrices $U$ and $V$ for which the product $U^HAV$ is defined,
$$\|U^HAV\|_p = \|A\|_p, \qquad p = 2, F.$$
These examples suggest the following definition.
Definition 3.1. A norm $\|\cdot\|$ on $\mathbb{C}^{m\times n}$ is UNITARILY INVARIANT if it satisfies
$$\|U^HAV\| = \|A\|$$
for all unitary $U$ and $V$. It is NORMALIZED if
$$\|A\| = \|A\|_2 \qquad (3.1)$$
whenever $A$ is of rank one.
We have observed that the spectral and Frobenius norms are unitarily invariant. However, not all norms are unitarily invariant, as the following example shows.
Example 3.2. Let
$$A = \begin{pmatrix} 1 & 1\\ 1 & 1\end{pmatrix},$$
so that $\|A\|_\infty = 2$. Let
$$U = \begin{pmatrix} \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2}\\ -\tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2}\end{pmatrix}.$$
Then
$$UA = \begin{pmatrix} \tfrac{2}{\sqrt2} & \tfrac{2}{\sqrt2}\\ 0 & 0\end{pmatrix} \quad\text{and}\quad \|UA\|_\infty = \frac{4}{\sqrt2} > \|A\|_\infty.$$
Since not all matrix norms are unitarily invariant, we may ask which ones are. One purpose of this section is to establish a characterization, due to von Neumann, in terms of certain vector norms, which, in this connection, are called SYMMETRIC GAUGE FUNCTIONS.
In one direction the connection is easy to establish. Let $A$ be of order $n$, and let $U^HAV = \Sigma$ be the singular value decomposition of $A$. Let $\|\cdot\|$ be a unitarily invariant norm. Since $U$ and $V$ are unitary,
$$\|A\| = \|\Sigma\|.$$
Thus $\|A\|$ is a function $\Phi$ of the singular values of $A$.
Since $\|\cdot\|$ is a norm, the function $\Phi$, regarded as a mapping from $\mathbb{R}^n$ to $\mathbb{R}$, is also a norm. Since by interchanging columns of $U$ and $V$ we can make the singular values of $A$ appear in any order, $\Phi$ must be symmetric in its arguments. Since by multiplying a column of $U$ by $-1$ we can change the sign of the corresponding singular value, $\Phi$ must depend only on the absolute values of its arguments. Moreover, if $\|\cdot\|$ is normalized and $A$ has rank one, then $\|A\| = \Phi(\sigma_1, 0, \ldots, 0) = \sigma_1$. All this suggests the following definition.
Definition 3.3. A function $\Phi: \mathbb{R}^n \to \mathbb{R}$ is a SYMMETRIC GAUGE FUNCTION if it satisfies the following conditions.
1. $x \ne 0 \implies \Phi(x) > 0$.
2. $\Phi(\rho x) = |\rho|\Phi(x)$.
3. $\Phi(x + y) \le \Phi(x) + \Phi(y)$.
4. For any permutation matrix $P$ we have $\Phi(Px) = \Phi(x)$.
5. $\Phi(|x|) = \Phi(x)$.
The function $\Phi$ is NORMALIZED if
$$\Phi(1, 0, \ldots, 0) = 1.$$
In the language of norms, a symmetric gauge function is an absolute norm that is invariant under permutation transformations. If $x = (\xi_1, \ldots, \xi_n)^T$, we will often write $\Phi(\xi_1, \ldots, \xi_n)$ for $\Phi(x)$.
We have just sketched a proof that every unitarily invariant norm is a symmetric gauge function of the singular values of its argument. The converse is also true: if $\Phi$ is a symmetric gauge function and if $\|\cdot\|_\Phi$ is defined by
$$\|A\|_\Phi = \Phi(\sigma_1, \ldots, \sigma_n), \qquad (3.2)$$
where $\sigma_1, \ldots, \sigma_n$ are the singular values of $A$, then $\|\cdot\|_\Phi$ is a unitarily invariant norm. It is easy to show that $\|\cdot\|_\Phi$ is a definite, homogeneous function. However, to show that it satisfies the triangle inequality requires more work. We begin with an inequality, due to von Neumann, relating the singular values of two matrices with the trace of their product.
Lemma 3.4. Let $A$ and $B$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and $\tau_1 \ge \tau_2 \ge \cdots \ge \tau_n$. Then
$$\max_{U,V\ \text{unitary}} \operatorname{Re}\operatorname{trace}(AUBV^H) = \sum_{i=1}^n \sigma_i\tau_i. \qquad (3.3)$$
Proof. It is sufficient to prove the lemma for the case where the $\sigma_i$ are positive and distinct (otherwise perturb the $\sigma_i$ so that they are positive and distinct and take the limit in (3.3) as the perturbation approaches zero).
Let $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$ and $T = \operatorname{diag}(\tau_1, \ldots, \tau_n)$. By passing to the singular value decompositions of $A$ and $B$ we see that (3.3) is equivalent to
$$\sup_{U,V\ \text{unitary}} \operatorname{Re}\operatorname{trace}(\Sigma UTV^H) \le \sum_{i=1}^n \sigma_i\tau_i. \qquad (3.4)$$
Since the set of unitary matrices is closed and bounded, there are unitary matrices $U_0$ and $V_0$ for which the supremum in (3.4) is attained. Let $C = U_0TV_0^H$.
We claim that $C$ is a diagonal matrix with nonnegative diagonal elements. To see that the diagonals of $C$ are nonnegative, let us suppose, say, that $\gamma_{11} \ne 0$ is not positive. Then by multiplying the first row of $C$ by $\bar\gamma_{11}/|\gamma_{11}|$ (n.b., this is a unitary transformation) we increase $\operatorname{Re}\operatorname{trace}(\Sigma C)$, contrary to the optimality of $C$.
To show that the off-diagonal elements of $C$ are zero, let us suppose, say, that $\gamma_{12} \ne 0$. By multiplying the first row of $C$ by $\bar\gamma_{12}/|\gamma_{12}|$ and dividing the first column by the same number (a unitary transformation that does not change the trace), we may take $\gamma_{12} > 0$. Let
$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$$
and let $C_\theta = C\operatorname{diag}(R_\theta, I_{n-2})$. Then
$$\operatorname{Re}\operatorname{trace}(\Sigma C_\theta) = \sigma_1\operatorname{Re}(\gamma_{11}\cos\theta + \gamma_{12}\sin\theta) + \sigma_2\operatorname{Re}(\gamma_{22}\cos\theta - \gamma_{21}\sin\theta) + \sum_{i=3}^n \sigma_i\operatorname{Re}\gamma_{ii},$$
and
$$\left.\frac{d\operatorname{Re}\operatorname{trace}(\Sigma C_\theta)}{d\theta}\right|_{\theta=0} = \sigma_1\gamma_{12} - \sigma_2\operatorname{Re}\gamma_{21}. \qquad (3.5)$$
If this derivative is nonzero, then $\operatorname{Re}\operatorname{trace}(\Sigma C_\theta) > \operatorname{Re}\operatorname{trace}(\Sigma C)$ for sufficiently small $\theta$ (positive or negative depending on the sign of the derivative). Otherwise, let $C_\theta = \operatorname{diag}(R_\theta, I_{n-2})C$. Then
$$\left.\frac{d\operatorname{Re}\operatorname{trace}(\Sigma C_\theta)}{d\theta}\right|_{\theta=0} = \sigma_2\gamma_{12} - \sigma_1\operatorname{Re}\gamma_{21}.$$
Since $\sigma_1 > \sigma_2$, $\gamma_{12} > 0$, and the derivative (3.5) is zero, this latter derivative cannot be zero, and a small change in $\theta$ will increase $\operatorname{Re}\operatorname{trace}(\Sigma C_\theta)$. Either case contradicts the optimality of $C$.
Since $C$ is nonnegative and diagonal, its diagonal elements must be the singular values of $B$; i.e., $C = \operatorname{diag}(\tau_{\pi(1)}, \ldots, \tau_{\pi(n)})$ for some permutation $\pi$ of $\{1, 2, \ldots, n\}$. Hence
$$\sup_{U,V\ \text{unitary}} \operatorname{Re}\operatorname{trace}(\Sigma UTV^H) = \sum_{i=1}^n \sigma_i\tau_{\pi(i)} \le \sum_{i=1}^n \sigma_i\tau_i,$$
the last inequality following from the fact that the $\sigma_i$ and the $\tau_i$ are nonincreasing. ∎
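Von Neumann's inequality (3.3) can be probed numerically in the smallest nontrivial case. The sketch below fixes $\Sigma = \operatorname{diag}(3, 1)$ and $T = \operatorname{diag}(2, 0.5)$ (arbitrary choices) and sweeps $U$ and $V$ over a grid of real $2\times 2$ rotations. Rotations are only a small subset of the unitary matrices, so this checks the inequality without proving it. The largest value found should equal $\sigma_1\tau_1 + \sigma_2\tau_2 = 6.5$, attained at $U = V = I$.

```python
import math


def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


sigma = [3.0, 1.0]   # singular values of A, nonincreasing
tau = [2.0, 0.5]     # singular values of B, nonincreasing
Sigma = [[sigma[0], 0.0], [0.0, sigma[1]]]
T = [[tau[0], 0.0], [0.0, tau[1]]]
bound = sum(s * t for s, t in zip(sigma, tau))   # right side of (3.3)

best = -math.inf
steps = 90
for a in range(steps):
    for b in range(steps):
        U, V = rot(2 * math.pi * a / steps), rot(2 * math.pi * b / steps)
        # trace(Sigma U T V^T): the real, rotation-only case of (3.4)
        value = trace(matmul(matmul(Sigma, U), matmul(T, transpose(V))))
        assert value <= bound + 1e-12
        best = max(best, value)
```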
Since $\Phi$ is itself a norm, it has a dual norm $\Phi^*$ whose dual is $\Phi$ (Theorem 1.12). It turns out that we can characterize $\|A\|_\Phi$ in terms of the dual norm, and this characterization is just what we need to establish the triangle inequality for $\|\cdot\|_\Phi$.
Lemma 3.5. Let $\Phi$ be a symmetric gauge function and let $\Phi^*$ be its dual. Let the function $\|\cdot\|_\Phi$ be defined by (3.2). Then
$$\|A\|_\Phi = \max_{\|X\|_{\Phi^*}=1} \operatorname{Re}\operatorname{trace}(X^HA).$$
Proof. Let $\sigma(A) = \{\sigma_1, \ldots, \sigma_n\}$, and let $\tau(X) = \{\tau_1, \ldots, \tau_n\}$ for any $X$. Then
$$\begin{aligned}\max_{\|X\|_{\Phi^*}=1} \operatorname{Re}\operatorname{trace}(X^HA) &= \max_{\|X\|_{\Phi^*}=1}\ \max_{U,V\ \text{unitary}} \operatorname{Re}\operatorname{trace}(VX^HUA)\\ &= \max_{\Phi^*(\tau_1,\ldots,\tau_n)=1}\ \sum_{i=1}^n \sigma_i\tau_i && \text{(Lemma 3.4)}\\ &= \Phi(\sigma_1, \ldots, \sigma_n) && \text{(by duality)}\\ &= \|A\|_\Phi && \text{(by (3.2))}. \qquad ∎\end{aligned}$$
We now use Lemma 3.5 to prove that every symmetric gauge function generates a unitarily invariant norm.
Theorem 3.6 (von Neumann). Let $\Phi$ be a symmetric gauge function on $\mathbb{R}^n$ and let $\|\cdot\|_\Phi$ be defined by (3.2). Then $\|\cdot\|_\Phi$ is a unitarily invariant norm on $\mathbb{C}^{n\times n}$. Conversely, if $\|\cdot\|$ is a unitarily invariant norm on $\mathbb{C}^{n\times n}$, then there is a symmetric gauge function $\Phi$ on $\mathbb{R}^n$ such that $\|A\| = \|A\|_\Phi$.
Proof. We need only prove that $\|\cdot\|_\Phi$ satisfies the triangle inequality. From Lemma 3.5,
$$\begin{aligned}\|A + B\|_\Phi &= \max_{\|X\|_{\Phi^*}=1} \operatorname{Re}\operatorname{trace}[X^H(A + B)]\\ &\le \max_{\|X\|_{\Phi^*}=1} \operatorname{Re}\operatorname{trace}(X^HA) + \max_{\|X\|_{\Phi^*}=1} \operatorname{Re}\operatorname{trace}(X^HB)\\ &= \|A\|_\Phi + \|B\|_\Phi. \qquad ∎\end{aligned}$$
3.2. Properties of Unitarily Invariant Norms
The correspondence between symmetric gauge functions and unitarily invariant norms allows us to transfer results about the former to the latter. This subsection is devoted to establishing the basic properties of unitarily invariant norms.
The first item of business is to extend the definition of unitarily invariant norms to rectangular matrices. Suppose that $\Phi$ is a symmetric gauge function on $\mathbb{R}^n$. Then if $m = \min\{k, l\} \le n$, a unitarily invariant norm on $\mathbb{C}^{k\times l}$ may be defined by
$$\|A\|_\Phi = \Phi(\sigma_1, \sigma_2, \ldots, \sigma_m, 0, \ldots, 0),$$
where $\sigma_1, \sigma_2, \ldots, \sigma_m$ are the singular values of $A$. We shall call these norms THE FAMILY OF NORMS GENERATED BY $\Phi$. This convention allows us to use what is essentially the same norm on matrices of varying dimensions.
In the most important cases, the gauge function $\Phi$ can be regarded as defined for all infinite sequences $\xi_1, \xi_2, \ldots$ with only a finite number of nonzero elements, in which case the corresponding norm is defined for matrices of all dimensions. For example, the function
$$\Phi_2(\xi_1, \xi_2, \ldots) = \max_i |\xi_i|$$
generates the spectral norm, and
$$\Phi_F(\xi_1, \xi_2, \ldots) = \Big(\sum_i |\xi_i|^2\Big)^{1/2}$$
generates the Frobenius norm.
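The recipe (3.2) can be tried on real $2\times 2$ matrices, where the singular values have a closed form (the eigenvalues of $A^TA$ follow from its trace $\|A\|_F^2$ and determinant $(\det A)^2$). The sketch below applies $\Phi_2$, $\Phi_F$, and, as an extra illustration not discussed in the text, the sum of absolute values (which is plainly a symmetric gauge function). It uses the matrices of Example 3.2 to show that the generated norms are unchanged by the rotation $U$ while the row-sum norm is not. All function names are illustrative.

```python
import math


def singular_values_2x2(A):
    """Singular values (largest first) of a real 2x2 matrix.

    sigma_2 is computed as |det A| / sigma_1 to avoid cancellation.
    """
    (a, b), (c, d) = A
    t = a * a + b * b + c * c + d * d          # sigma_1^2 + sigma_2^2
    det = a * d - b * c                        # +-(sigma_1 * sigma_2)
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    s1 = math.sqrt((t + disc) / 2)
    s2 = abs(det) / s1 if s1 > 0 else 0.0
    return s1, s2


def phi_2(xi):
    return max(abs(v) for v in xi)             # generates the spectral norm


def phi_F(xi):
    return math.sqrt(sum(v * v for v in xi))   # generates the Frobenius norm


def phi_1(xi):
    return sum(abs(v) for v in xi)             # sum of the singular values


def gauge_norm(phi, A):
    """||A||_phi = phi(sigma_1, sigma_2), as in (3.2)."""
    return phi(singular_values_2x2(A))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def row_sum_norm(A):
    return max(sum(abs(v) for v in row) for row in A)


A = [[1.0, 1.0], [1.0, 1.0]]
r = 1 / math.sqrt(2)
U = [[r, r], [-r, r]]          # the rotation of Example 3.2
UA = matmul(U, A)

# Norms generated by gauge functions do not change under U ...
for phi in (phi_2, phi_F, phi_1):
    assert abs(gauge_norm(phi, UA) - gauge_norm(phi, A)) < 1e-12
# ... but the row-sum norm does (Example 3.2).
print(row_sum_norm(A), row_sum_norm(UA))
```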
The fact that a symmetric gauge function is an absolute norm has
the following important consequences for unitarily invariant norms.
Theorem 3.7. Let $A$ and $B$ have singular values $\sigma_1 \ge \cdots \ge \sigma_n$ and $\tau_1 \ge \cdots \ge \tau_n$. If $\sigma_i \le \tau_i$ $(i = 1, \ldots, n)$, then for every unitarily invariant norm $\|\cdot\|$ we have $\|A\| \le \|B\|$.
Proof. Let $\Phi$ be the symmetric gauge function that generates $\|\cdot\|$. Then since $\Phi$ is absolute,
$$\|A\| = \Phi(\sigma_1, \ldots, \sigma_n) \le \Phi(\tau_1, \ldots, \tau_n) = \|B\|. \qquad ∎$$
The following useful result is a corollary of this theorem and Theorem I.4.4.
Corollary 3.8. Let $\|\cdot\|$ be a unitarily invariant norm, and let the matrix $A$ be partitioned in the form
$$A = \begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}.$$
Then $\|A_{11}\| \le \|A\|$.
The 2-norm plays a special role in the theory of unitarily invariant norms, as the following theorem shows.
Theorem 3.9. Let $\|\cdot\|$ be a family of unitarily invariant norms. Then
$$\|AB\| \le \|A\|\,\|B\|_2 \qquad (3.6)$$
and
$$\|AB\| \le \|A\|_2\|B\|. \qquad (3.7)$$
Also
$$\|AB\| \ge \|A\|\inf_2(B) \qquad (3.8)$$
and
$$\|AB\| \ge \inf_2(A)\|B\|. \qquad (3.9)$$
Moreover, if $\|\cdot\|$ is normalized, then
$$\|A\|_2 \le \|A\|. \qquad (3.10)$$
Proof. The inequalities (3.6) and (3.7) are immediate corollaries of Theorem 3.7 and Theorem I.4.5. The inequalities (3.8) and (3.9) are corollaries of Exercise I.4.6.
To establish (3.10), let $\sigma_1 \ge \cdots \ge \sigma_n$ be the singular values of $A$. Then since $\Phi$ is absolute,
$$\|A\| = \Phi(\sigma_1, \sigma_2, \ldots, \sigma_n) \ge \Phi(\sigma_1, 0, \ldots, 0) = \sigma_1 = \|A\|_2. \qquad ∎$$
Combining (3.6), (3.7), and (3.10), we have the following corollary.
Corollary 3.10. A family of normalized unitarily invariant norms is consistent.
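The inequalities of Theorem 3.9 can be spot-checked for the Frobenius norm, which is generated by $\Phi_F$ and is normalized since $\Phi_F(1, 0, \ldots) = 1$. The sketch below works with random real $2\times 2$ matrices, computes singular values in closed form, and reads $\inf_2(B)$ as the smallest singular value of $B$ (our reading of the notation from Chapter I). It checks (3.6), (3.8), (3.10), and the consistency asserted by Corollary 3.10, each up to a small tolerance.

```python
import math
import random


def singular_values_2x2(A):
    """Singular values (largest first) of a real 2x2 matrix."""
    (a, b), (c, d) = A
    t = a * a + b * b + c * c + d * d        # ||A||_F^2 = sigma_1^2 + sigma_2^2
    det = a * d - b * c                      # |det A| = sigma_1 * sigma_2
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    s1 = math.sqrt((t + disc) / 2)
    s2 = abs(det) / s1 if s1 > 0 else 0.0
    return s1, s2


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def fro(A):
    return math.sqrt(sum(v * v for row in A for v in row))


random.seed(3)
tol = 1e-9
for _ in range(2000):
    A = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    B = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    sA, sB = singular_values_2x2(A), singular_values_2x2(B)
    AB = matmul(A, B)
    assert fro(AB) <= fro(A) * sB[0] + tol    # (3.6): ||AB|| <= ||A|| ||B||_2
    assert fro(AB) >= fro(A) * sB[1] - tol    # (3.8): ||AB|| >= ||A|| inf_2(B)
    assert sA[0] <= fro(A) + tol              # (3.10): ||A||_2 <= ||A||
    assert fro(AB) <= fro(A) * fro(B) + tol   # Corollary 3.10 (consistency)
```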
3.3. Doubly Stochastic Matrices and Fan's Theorem
As far as unitarily invariant norms are concerned, the principal result of this subsection is a theorem of Ky Fan, which gives conditions under which one matrix dominates another in any unitarily invariant norm. However, to establish it we must first prove some important theorems on doubly stochastic matrices, one of which will be used later in this book.
Definition 3.11. Let $A$ be a matrix with nonnegative elements. Then $A$ is STOCHASTIC if $A\mathbf{1} = \mathbf{1}$, where $\mathbf{1} = (1, 1, \ldots, 1)^T$. A stochastic matrix $A$ is DOUBLY STOCHASTIC if $A^T$ is also stochastic.
Since the elements of a row of a stochastic matrix sum to one, they may be regarded as probabilities; hence the name stochastic. A doubly stochastic matrix is one whose rows and columns sum to one.
The first theorem gives necessary and sufficient conditions for two vectors to be related by a doubly stochastic matrix. A little notation will be helpful in stating and proving it. Let $x = (\xi_1, \ldots, \xi_n)^T$ and $y = (\eta_1, \ldots, \eta_n)^T$ be real vectors with
$$\xi_1 \ge \cdots \ge \xi_n, \qquad \eta_1 \ge \cdots \ge \eta_n. \tag{3.11}$$
We shall write
$$x \succ y$$
if
$$\xi_1 + \cdots + \xi_k \ge \eta_1 + \cdots + \eta_k, \qquad k = 1, \ldots, n-1,$$
and
$$\xi_1 + \cdots + \xi_n = \eta_1 + \cdots + \eta_n.$$
We say that $x$ MAJORIZES $y$.
Theorem 3.12 (Hardy-Littlewood-Pólya). Let the vectors $x$ and $y$ satisfy (3.11). Then a necessary and sufficient condition for there to exist a doubly stochastic matrix $S$ with
$$y = Sx \tag{3.12}$$
is that
$$x \succ y. \tag{3.13}$$
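The majorization relation can be tested by comparing partial sums after sorting. The sketch below also illustrates the easy half of Theorem 3.12: applying a doubly stochastic matrix to $x$ produces a vector that $x$ majorizes. The function name `majorizes` and the tolerance are our own choices.

```python
import numpy as np

def majorizes(x, y, tol=1e-12):
    """True if x majorizes y: after sorting both in decreasing order, every
    partial sum of x is >= the matching partial sum of y, and the full sums agree."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    cx, cy = np.cumsum(xs), np.cumsum(ys)
    return bool(np.all(cx[:-1] >= cy[:-1] - tol) and abs(cx[-1] - cy[-1]) <= tol)

# Necessity direction of Theorem 3.12 on an example: y = S x with S doubly stochastic.
x = np.array([4.0, 2.0, 0.0])
S = np.array([[0.75, 0.0, 0.25],
              [0.0,  1.0, 0.0],
              [0.25, 0.0, 0.75]])
y = S @ x                                  # (3, 2, 1)
print(majorizes(x, y), majorizes(y, x))    # True False
```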
Proof. The necessity of the condition is left as an exercise.
For the sufficiency, note that (3.13) implies that if all the $\xi_i$'s are equal to some number, then the $\eta_i$'s are equal to the same number, and we may take $S = I$. Moreover, if $x \succ y$ and we add a constant to the elements of $x$ and $y$, then we still have $x \succ y$. Consequently we may assume without loss of generality that
$$\xi_1 > 0 > \xi_n \quad\text{and}\quad \sum_{i=1}^{n} \xi_i = 0.$$
The proof is by induction. The result is trivial for $n = 1$. For $n = 2$, the most general form of a doubly stochastic matrix is
$$\begin{pmatrix} \alpha & 1-\alpha \\ 1-\alpha & \alpha \end{pmatrix},$$
where $0 \le \alpha \le 1$. Thus (3.12) requires that
$$\eta_1 = \alpha\xi_1 + (1-\alpha)\xi_2$$
and
$$\eta_2 = (1-\alpha)\xi_1 + \alpha\xi_2.$$
But the hypotheses of the theorem imply that $\xi_1 \ge \eta_1 \ge \xi_2$, which in turn implies that the first of these equalities can be satisfied for a unique $\alpha \in [0, 1]$. Summing the two equalities, we see that the second is equivalent to $\xi_1 + \xi_2 = \eta_1 + \eta_2$, which is the equality in (3.13).
For the general case, let us first note that if equality occurs in any of the inequalities (3.13), then $x$ and $y$ can be partitioned in the form
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$
where the pairs $x_1, y_1$ and $x_2, y_2$ satisfy the hypotheses of the theorem. Thus, by the induction hypothesis, there are doubly stochastic matrices $S_1$ and $S_2$ such that $y_i = S_i x_i$ $(i = 1, 2)$. It follows that $S = \operatorname{diag}(S_1, S_2)$ is the matrix required by the theorem.
Hence we may suppose that all the inequalities in (3.13) are strict. Let $k$ be the index of the smallest positive $\xi_i$ and $l$ be the index of the largest negative $\xi_i$; i.e.,
$$\xi_k > 0 = \xi_{k+1} = \cdots = \xi_{l-1} > \xi_l.$$
Let $x'$ be the vector obtained by replacing $\xi_k$ by $\xi_k - \alpha$ and $\xi_l$ by $\xi_l + \alpha$. For sufficiently small $\alpha > 0$, we have $x \succ x' \succ y$.
Now consider the following three cases.
$k > 1$: Here we have equality in the first of the relations $x \succ x'$.
$l < n$: Here we have equality in the next to last of the relations $x \succ x'$.
$k = 1$, $l = n$: This is equivalent to the $2 \times 2$ case.
In all three cases, there is a doubly stochastic matrix $S'$ such that $x' = S'x$. Now let us increase $\alpha$ from zero until one of two things happens.
1. We obtain an equality in the relations $x' \succ y$.
2. $\xi'_k$ or $\xi'_l$ becomes zero.
In the first case, there is a doubly stochastic matrix $S_y$ such that $y = S_y x'$, in which case $S = S_y S'$ is the matrix required by the theorem. In the second case, $x'$ has one more zero component than $x$. We may then repeat the above construction to obtain a new vector $x''$ with $x' \succ x''$ and a doubly stochastic matrix $S''$ satisfying $x'' = S''x'$. Again we either have equality in the relation $x'' \succ y$, in which case the theorem is proved, or $x''$ has one more zero element than $x'$. Ultimately this reduction must furnish the required doubly stochastic matrix or produce a zero vector $x^{(m)}$. The latter implies that $y = 0$, and the matrix $S = S^{(m)} \cdots S'' S'$ is the required doubly stochastic matrix. ∎
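The proof above is constructive but requires careful bookkeeping. A compact way to compute a suitable $S$ follows a different but standard route. It applies a sequence of two-coordinate averaging steps, often called T-transforms, each of which makes at least one more component of $x$ agree with $y$. This is not the reduction used in the proof. The sketch below assumes $x \succ y$ with both vectors sorted in decreasing order. `hlp_matrix` is a hypothetical name.

```python
import numpy as np

def hlp_matrix(x, y, tol=1e-12):
    """Return a doubly stochastic S with S @ x == y (up to roundoff).

    Assumes x majorizes y and both are sorted in decreasing order.
    Each step averages two coordinates j < k of the current vector."""
    x = np.asarray(x, dtype=float).copy()
    y = np.asarray(y, dtype=float)
    n = x.size
    S = np.eye(n)
    for _ in range(n):                        # at most n - 1 steps are needed
        if np.allclose(x, y, rtol=0.0, atol=tol):
            break
        j = max(i for i in range(n) if x[i] > y[i] + tol)         # last excess
        k = min(i for i in range(j + 1, n) if x[i] < y[i] - tol)  # first deficit after j
        delta = min(x[j] - y[j], y[k] - x[k])
        t = delta / (x[j] - x[k])             # fraction exchanged between j and k
        T = np.eye(n)
        T[j, j] = T[k, k] = 1.0 - t
        T[j, k] = T[k, j] = t
        x = T @ x                             # moves delta from coordinate j to k
        S = T @ S
    return S

S = hlp_matrix([5.0, 3.0, 1.0, -1.0], [3.0, 3.0, 1.0, 1.0])
print(np.round(S, 4))
```

Each $T$ is symmetric with rows summing to one, so the product $S$ is doubly stochastic. This matches the sufficiency statement of Theorem 3.12.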
The second theorem is a characterization of doubly stochastic matrices. It says that the set of doubly stochastic matrices is the convex hull of all permutation matrices. The proof requires a theorem of independent interest. We begin with a definition.
Definition 3.13. Let $T = (\tau_{ij}) \in \mathbb{C}^{m\times n}$, where $m \le n$. If $\pi$ is any permutation of $\{1, \ldots, m\}$ and $1 \le j_1 < \cdots < j_m \le n$, then the vector
$$(\tau_{\pi(1) j_1}, \ldots, \tau_{\pi(m) j_m})^T$$
is called a PERMUTATION VECTOR of $T$.
Theorem 3.14 (Hall). Let $T \in \mathbb{C}^{m\times n}$ $(m \le n)$. Then there are permutation matrices $P$ and $Q$ such that $PTQ$ has a $p \times q$ zero submatrix with $p + q > n$ if and only if every permutation vector of $T$ contains a zero component.
Proof. We will first show that if $T$ can be permuted to the form
$$T = \begin{pmatrix} 0 & T_{12} \\ T_{21} & T_{22} \end{pmatrix}, \tag{3.14}$$
where the zero block is $p \times q$ (so that $T_{12}$ is $p \times (n-q)$ and $T_{21}$ is $(m-p) \times q$) and $p + q > n$, then any permutation vector must have zero components. Suppose to the contrary that $(\tau_{i_1 j_1}, \ldots, \tau_{i_m j_m})$ is a permutation vector with no zero elements. At most $n - q$ of the column indices exceed $q$. Therefore $j_1, \ldots, j_{m-(n-q)}$ all lie in the first $q$ columns. Then from (3.14), the first $m-(n-q)$ integers $i_1, \ldots, i_{m-(n-q)}$ must be distinct and lie between $p+1$ and $m$ (inclusive). It follows that $m - p \ge m - (n - q)$, or $n \ge p + q$, a contradiction.
The proof of the converse is by induction. It is trivial for $m = 1$ or $n = 1$. Therefore assume that $m, n > 1$ and that every permutation vector of $T$ has a zero component. If $T = 0$ there is nothing to prove, so without loss of generality we may suppose that $\tau_{mn} \ne 0$. Then every permutation vector of the $(m-1)\times(n-1)$ leading principal submatrix must have a zero component. Otherwise, a permutation vector of that submatrix with no zero component could be combined with $\tau_{mn}$ to give a permutation vector of $T$ with no zero component, a contradiction.
By the induction hypothesis, $T$ can be permuted to the form (3.14), where now $p + q \ge n$. If $p + q > n$ we are done, so suppose $p + q = n$. It follows that $T_{12}$ is square and $T_{21}$ has at least as many columns as rows. Now at least one of the matrices $T_{21}$ or $T_{12}$ must have all its permutation vectors with zero components. Otherwise we could piece together permutation vectors from $T_{21}$ and $T_{12}$ having nonzero components to form a permutation vector for $T$ whose components are nonzero.
Assume that all the permutation vectors of $T_{21}$ have zero components. By the induction hypothesis, we can permute the rows and columns of $T$ so that it has the form
$$T = \begin{pmatrix} 0 & T_{12} \\ 0 & T_{22} \\ T_{31} & T_{32} \end{pmatrix}, \tag{3.15}$$
where the row blocks have $p$, $r$, and $m-p-r$ rows, the first column block has $s$ columns, and $r + s > q$. Then $(p+r) + s > p + q = n$, which shows that (3.15) is the required matrix. The case in which all the permutation vectors of $T_{12}$ have zero components is handled in the same way. ∎
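For small matrices, Theorem 3.14 can be checked exhaustively. The sketch below enumerates permutation vectors directly from Definition 3.13. Separately, it searches for a $p \times q$ zero submatrix with $p + q > n$. Choosing arbitrary row and column subsets is equivalent to permuting by $P$ and $Q$. The brute force is exponential and is meant only as an illustration. The function names are hypothetical.

```python
import itertools
import numpy as np

def every_permutation_vector_has_zero(T):
    """Brute force over Definition 3.13: rows permuted by pi, columns j_1 < ... < j_m."""
    m, n = T.shape
    for cols in itertools.combinations(range(n), m):
        for rows in itertools.permutations(range(m)):
            if all(T[r, c] != 0 for r, c in zip(rows, cols)):
                return False              # found a permutation vector with no zero
    return True

def has_zero_block(T):
    """Is there a p x q zero submatrix (any rows, any columns) with p + q > n?"""
    m, n = T.shape
    for p in range(1, m + 1):
        for rows in itertools.combinations(range(m), p):
            q = int(np.sum(np.all(T[list(rows), :] == 0, axis=0)))
            if p + q > n:
                return True
    return False

# Exhaustive check of Theorem 3.14 over all 0/1 patterns of a few small shapes (m <= n).
for m, n in [(1, 3), (2, 2), (2, 3), (3, 3), (3, 4)]:
    for bits in itertools.product([0, 1], repeat=m * n):
        T = np.array(bits).reshape(m, n)
        assert every_permutation_vector_has_zero(T) == has_zero_block(T)
print("Theorem 3.14 holds on all tested 0/1 patterns")
```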
From Theorem 3.14 we get the following corollary.
Corollary 3.15. If $T \in \mathbb{R}^{n\times n}$ is a nonzero multiple of a doubly stochastic matrix, then $T$ has a permutation vector consisting of positive elements.
Proof. The proof is by contradiction. Without loss of generality, we may assume that $T$ is doubly stochastic. If every permutation vector of $T$ contains a zero element, then from Theorem 3.14 the matrix $T$ can be permuted to have a zero $p \times q$ submatrix with $p + q = n + 1$. Since the property of being doubly stochastic is invariant under permutations, we may assume that the zero submatrix is located in the upper left corner of $T$; i.e.,
$$T = \begin{pmatrix} 0 & T_{12} \\ T_{21} & T_{22} \end{pmatrix},$$
with a $p \times q$ zero block. Now the sum of all elements of $T_{12}$ is $p$ and the sum of all elements of $T_{21}$ is $q$. Hence the sum of all elements of $T$ is at least $p + q > n$, whereas it is exactly $n$. The contradiction establishes the corollary. ∎
We may now establish our characterization of doubly stochastic matrices.
Theorem 3.16 (Birkhoff). The set of all doubly stochastic matrices of order $n$ is the convex hull of all permutation matrices of order $n$. That is, any doubly stochastic matrix $S$ can be expressed as a convex combination of the permutation matrices $P_i$ $(i = 1, \ldots, n!)$:
$$S = \sum_{i=1}^{n!} \sigma_i P_i, \qquad \sum_{i=1}^{n!} \sigma_i = 1, \quad \sigma_i \ge 0, \quad i = 1, \ldots, n!. \tag{3.16}$$
Proof. Any matrix of the form (3.16) is clearly doubly stochastic. It therefore remains to show that any doubly stochastic matrix $S = (\sigma_{ij})$ of order $n$ has the form (3.16).
By Corollary 3.15, $S$ has a regular set (that is, a permutation vector) $\sigma_{1 i_1}, \ldots, \sigma_{n i_n}$ in which $\sigma_{k i_k} > 0$. Let $\sigma_1 = \min_{1 \le k \le n} \{\sigma_{k i_k}\}$, and let $P_1$ be the permutation matrix with ones in its $(1, i_1), \ldots, (n, i_n)$ elements. Let $S_1 = S - \sigma_1 P_1$. Clearly the matrix $S_1$ has the following properties:
1. $S_1$ is nonnegative;
2. The sum of all elements in each row and the sum of all elements in each column are equal to $1 - \sigma_1 \ge 0$;
3. The number of zero elements of $S_1$ is greater than that of $S$ by at least one.
If $1 - \sigma_1 = 0$, then $S_1 = 0$; i.e., $S = \sigma_1 P_1$ and the theorem is proved.
If $1 - \sigma_1 > 0$, then by Corollary 3.15 the matrix $S_1$ has a regular set consisting of positive elements. Repeating the above argument, we get a matrix $S_2 = S - \sigma_1 P_1 - \sigma_2 P_2$ in which $\sigma_2 > 0$ and $P_2$ is a permutation matrix. As above, the matrix $S_2$ is nonnegative. The sum of the elements in each row and the sum of the elements in each column are equal to $1 - \sigma_1 - \sigma_2 \ge 0$. Finally, the number of zero elements of $S_2$ is greater than that of $S$ by at least two.
Continuing in this way we may produce a sequence $P_1, P_2, \ldots$ of permutations and multipliers $\sigma_1, \sigma_2, \ldots$, with $0 < \sigma_i \le 1$. Each step creates at least one new zero element, so the sequence must terminate. It terminates when $1 - \sigma_1 - \cdots - \sigma_m = 0$, at which point
$$S = \sum_{i=1}^{m} \sigma_i P_i$$
is the required convex combination. ∎
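The proof of Theorem 3.16 is effectively an algorithm. Find a positive permutation vector, subtract the largest possible multiple of the corresponding permutation matrix, and repeat. The sketch below follows that loop directly. To find the positive permutation vector it uses augmenting-path bipartite matching (Kuhn's algorithm), which is our choice and not anything in the text. The tolerance `tol` and the function names are assumptions.

```python
import numpy as np

def positive_permutation(R, tol):
    """Return cols with R[k, cols[k]] > tol for every row k (a permutation vector of
    positive elements), found by augmenting-path bipartite matching (Kuhn's algorithm).
    Corollary 3.15 guarantees one exists when R is a nonzero multiple of a doubly
    stochastic matrix."""
    n = R.shape[0]
    row_of_col = [-1] * n                  # row currently matched to each column

    def augment(i, seen):
        for j in range(n):
            if R[i, j] > tol and not seen[j]:
                seen[j] = True
                if row_of_col[j] < 0 or augment(row_of_col[j], seen):
                    row_of_col[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, [False] * n):
            raise ValueError("no positive permutation vector found")
    cols = [0] * n
    for j, i in enumerate(row_of_col):
        cols[i] = j
    return cols

def birkhoff(S, tol=1e-12):
    """Return pairs (sigma_i, P_i) with S = sum_i sigma_i P_i, as in the proof of Theorem 3.16."""
    R = np.array(S, dtype=float)
    n = R.shape[0]
    terms = []
    for _ in range(n * n):                 # each step zeros at least one more element
        if R.max() <= tol:
            break
        cols = positive_permutation(R, tol)
        sigma = float(min(R[k, cols[k]] for k in range(n)))
        P = np.zeros((n, n))
        P[np.arange(n), cols] = 1.0
        R = R - sigma * P
        R[np.abs(R) <= tol] = 0.0          # make roundoff-level entries exact zeros
        terms.append((sigma, P))
    return terms

S = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
terms = birkhoff(S)
print(len(terms), [w for w, _ in terms])   # 4 [0.25, 0.25, 0.25, 0.25]
```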
We may now prove Fan's theorem.
Theorem 3.17 (Fan). Let $x, y \in \mathbb{R}^n$ satisfy
$$\xi_1 \ge \cdots \ge \xi_n \ge 0, \qquad \eta_1 \ge \cdots \ge \eta_n \ge 0. \tag{3.17}$$
Then
$$\xi_1 + \cdots + \xi_k \ge \eta_1 + \cdots + \eta_k, \qquad k = 1, \ldots, n, \tag{3.18}$$
is a necessary and sufficient condition for
$$\Phi(x) \ge \Phi(y)$$
to hold for all symmetric gauge functions $\Phi$.
Proof. For the necessity consider the symmetric gauge functions
$$\Phi_k(x) = \max_{1 \le i_1 < \cdots < i_k \le n} \{|\xi_{i_1}| + \cdots + |\xi_{i_k}|\}, \tag{3.19}$$
that is, the sum of the $k$ largest absolute values of the components of $x$. If $x$ and $y$ satisfy (3.17), then $\Phi_k(x) \ge \Phi_k(y)$ $(k = 1, \ldots, n)$ is equivalent to (3.18).
For sufficiency, note that by successively reducing $\xi_n$, then $\xi_{n-1}$, and so on, we can obtain a vector $\tilde{x} \le x$ such that $\tilde{x} \succ y$. By the Hardy-Littlewood-Pólya theorem, there is a doubly stochastic matrix $S$ such that $y = S\tilde{x} \le Sx$. By Birkhoff's theorem we can write $S$ as a convex combination of permutation matrices:
$$S = \sum_{i=1}^{n!} \sigma_i P_i, \qquad \sum_{i=1}^{n!} \sigma_i = 1, \quad \sigma_i \ge 0, \quad i = 1, \ldots, n!.$$
It then follows that for any symmetric gauge function $\Phi$, which is monotone and invariant under permutations of its argument,
$$\Phi(y) \le \Phi(Sx) = \Phi\Bigl(\sum_i \sigma_i P_i x\Bigr) \le \sum_i \sigma_i \Phi(P_i x) = \Phi(x). \qquad\blacksquare$$
The symmetric gauge functions $\Phi_k$ defined by (3.19) have associated unitarily invariant norms $\|\cdot\|_{\Phi_k}$. When Fan's theorem is recast in terms of these norms it takes the following form.
Corollary 3.18. In order for
$$\|A\| \le \|B\|$$
for every unitarily invariant norm, it is necessary and sufficient that
$$\|A\|_{\Phi_k} \le \|B\|_{\Phi_k}, \qquad k = 1, \ldots, n.$$
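Corollary 3.18 reduces a statement about all unitarily invariant norms to $n$ checks. Since $\Phi_k$ in (3.19) sums the $k$ largest absolute components, $\|A\|_{\Phi_k}$ is the sum of the $k$ largest singular values of $A$ (the Ky Fan $k$-norm). A minimal sketch follows; the function names are our own.

```python
import numpy as np

def ky_fan(A):
    """||A||_{Phi_k} for k = 1, ..., n: sums of the k largest singular values."""
    return np.cumsum(np.linalg.svd(A, compute_uv=False))

def dominated_in_every_ui_norm(A, B, tol=1e-12):
    """Corollary 3.18 for A, B of the same shape: ||A|| <= ||B|| in every unitarily
    invariant norm iff ||A||_{Phi_k} <= ||B||_{Phi_k} for every k."""
    return bool(np.all(ky_fan(A) <= ky_fan(B) + tol))

A = np.diag([2.0, 2.0])      # singular values (2, 2): Ky Fan norms (2, 4)
B = np.diag([3.0, 1.0])      # singular values (3, 1): Ky Fan norms (3, 4)
print(dominated_in_every_ui_norm(A, B), dominated_in_every_ui_norm(B, A))  # True False
```

For example, $\|A\|_F = \sqrt{8} \le \sqrt{10} = \|B\|_F$, as the corollary predicts. $B$ is not dominated by $A$ because its largest singular value is bigger.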
Notes and References
The characterization of unitarily invariant norms as symmetric gauge func-
tions of singular values is due to von Neumann [254, 1932]. Lemma 3.4 is of
independent interest, since it links the eigenvalues and the singular values of
a matrix (see Exercise 3.2). The proof given here is new.
For surveys with extensive bibliographies of the subjects of doubly stochastic matrices and majorization see [159, 5]. For the Hardy-Littlewood-Pólya theorem see [103, 1934]. The proof given here is due to Ostrowski [168, 1952]. For Birkhoff's theorem see [33, 1946], and for Fan's see [70, 1951]. Fan's paper is the culmination of a flurry of results on inequalities bounding eigenvalues in terms of singular values, a line of work initiated by Weyl [266, 1949]. As Fan points out, these results can be used to establish von Neumann's characterization of unitarily invariant norms in terms of singular values.
Theorem 3.14, which we have called Hall's theorem, is associated with the names König and Frobenius. Hall [100, 1935] actually proved a set theoretic version of the theorem given here. He noted that it was a generalization of a graph theoretic theorem of König [136, 1916], which van der Waerden [246, 1927] had recast in a set theoretic form. The association with Frobenius appears to be spurious. The earliest reference we know that mentions him in this connection is by Dulmage and Halperin [62, 1955], from which the proof given here is adapted. This paper cites Frobenius's famous paper on nonnegative matrices [78, 1912]. However, the theorem does not appear there, at least not in an obvious form.
Exercises
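1. Show that for any unitarily invariant norm $\|\cdot\|$,
$$\left\|\begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}\right\| \le \left\|\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\right\|.$$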
2. Let $\mathcal{L}(A) = \{\lambda_1, \ldots, \lambda_n\}$ and $\mathcal{S}(A) = \{\sigma_1, \ldots, \sigma_n\}$. Show that
$$|\lambda_1| + \cdots + |\lambda_n| \le \sigma_1 + \cdots + \sigma_n.$$
3. (Fan and Hoffman [71]). Let $\|\cdot\|$ be a unitarily invariant norm.
   1. Show that if $H$ is Hermitian positive semidefinite and $U$ is unitary, then
   $$\|H - I\| \le \|H - U\| \le \|H + I\|.$$
   2. Show that for any Hermitian matrix $H$,
   $$\Bigl\|A - \frac{A + A^H}{2}\Bigr\| \le \|A - H\|.$$
4. For any real vector $x$ let $x^+ = (x + |x|)/2$ be the result of setting the negative components of $x$ to zero. Show that $x \succ y$ if and only if $\mathbf{1}^T(x - \tau\mathbf{1})^+ \ge \mathbf{1}^T(y - \tau\mathbf{1})^+$ for all real $\tau$.
5. Show that $x \succ y$ if and only if $y$ is a convex combination of all vectors of the form $Px$, where $P$ is a permutation matrix.
6. (Hall's theorem, the original version [100]). Let $\mathcal{A}$ be a set consisting of $n$ elements. For $m \le n$ suppose that $\mathcal{A} = \bigcup_{i=1}^{m} \mathcal{H}_i$. Show that there exist $m$ distinct elements $a_1, \ldots, a_m$ of $\mathcal{A}$ such that $a_i \in \mathcal{H}_i$ $(i = 1, \ldots, m)$ if and only if every union of $k$ of the sets $\mathcal{H}_i$ contains at least $k$ distinct elements of $\mathcal{A}$.
4. Metrics on Subspaces of $\mathbb{C}^n$
A difficulty in framing a workable perturbation theory for eigenvectors is that there may be no unique eigenvector corresponding to a multiple eigenvalue. For example, any nonzero vector in the space spanned by the unit vectors $\mathbf{1}_1$ and $\mathbf{1}_2$ is an eigenvector of the matrix
$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$
When $A$ is perturbed, two distinct eigenvectors can precipitate from this subspace. For different perturbations these eigenvectors can be quite different, even when the perturbations are small. For example, when $\epsilon > 0$ the matrix
$$\begin{pmatrix} 1+\epsilon & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
has eigenvectors $\mathbf{1}_1$ and $\mathbf{1}_2$, whereas the matrix
$$\begin{pmatrix} 1 & \epsilon & 0 \\ \epsilon & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
has the very different eigenvectors $(1\ \ 1\ \ 0)^T$ and $(1\ \ {-1}\ \ 0)^T$. In spite of such differences, the two eigenvectors will always span a space that is very near to the space spanned by $\mathbf{1}_1$ and $\mathbf{1}_2$. Thus it makes more sense to derive perturbation bounds for the subspace, which is stable, rather than for the eigenvectors, which are not.
In order to derive such a theory, we must first specify what we mean by the distance between subspaces. We will compare only subspaces of the same dimension. We can therefore restrict ourselves to introducing a notion of distance on the set of all $l$-dimensional subspaces of $\mathbb{C}^n$, which we will denote by $\mathcal{C}_l^n$ (or $\mathcal{R}_l^n$ in the real case). Unfortunately, this can be done in a number of ways, not all of which turn out as we might hope.
Example 4.1. Consider the space $\mathcal{R}_1^2$ of all infinitely extended lines in the plane that pass through the origin. Given any subspace $\mathcal{X} \in \mathcal{R}_1^2$, there is a unique vector $x(\mathcal{X}) = (\xi_1\ \ \xi_2)^T$ in $\mathcal{X}$ that lies in the half-open semicircle $\{x : \|x\|_2 = 1,\ 0 \le \xi_1,\ -1 \le \xi_2 < 1\}$. It is easy to verify that the function
$$\rho(\mathcal{X}, \mathcal{Y}) = \angle[x(\mathcal{X}), x(\mathcal{Y})],$$
the angle between the representative vectors, is a metric on $\mathcal{R}_1^2$. However, the lines along the directions $(0, -1)$ and $(\epsilon, +1)$ are nearly $\pi$ apart in this metric, even though the lines themselves approach one another as $\epsilon \to 0$.
In the above example we gave $\mathcal{R}_1^2$ the topology of a semicircle open at one end. This topology allows lines that are near in the usual sense of the word to be far apart in the sense of the $\rho$ metric. It shows that we have to take some care in defining distances and metrics on $\mathcal{C}_l^n$. In the first subsection we will introduce one widely used distance function, the gap, and derive its properties. In the second subsection we will consider metrics that are unitarily invariant.
4.1. The Gap
In defining a notion of distance between subspaces, it is natural to begin with the distance between a point and a subspace.
Definition 4.2. Let $\mathcal{X}$ be a subspace of $\mathbb{C}^n$ and let $y \in \mathbb{C}^n$. If $\nu$ is a norm on $\mathbb{C}^n$, then the $\nu$-DISTANCE BETWEEN $y$ AND $\mathcal{X}$ is the function
$$\delta_\nu(y, \mathcal{X}) = \min_{x \in \mathcal{X}} \nu(y - x). \tag{4.1}$$
An elementary compactness argument shows that $\delta_\nu$ is well defined. When $\nu$ is the 2-norm, it follows from Theorem I.2.5 that
$$\delta_2(y, \mathcal{X}) = \|(I - P_{\mathcal{X}})y\|_2. \tag{4.2}$$
In other words, the 2-distance between $y$ and $\mathcal{X}$ is the distance between $y$ and its projection onto $\mathcal{X}$. For this reason, a minimizing vector $x$ in (4.1) is sometimes called a $\nu$-projection of $y$ onto $\mathcal{X}$.
We are now in a position to define a distance between subspaces.
Definition 4.3. Let $\mathcal{X}, \mathcal{Y} \in \mathcal{C}_l^n$ and let $\nu$ be a norm on $\mathbb{C}^n$. Then the $\nu$-GAP BETWEEN $\mathcal{X}$ AND $\mathcal{Y}$ is the number
$$\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) = \max\Bigl\{ \max_{x \in \mathcal{X},\, \nu(x) = 1} \delta_\nu(x, \mathcal{Y}),\ \max_{y \in \mathcal{Y},\, \nu(y) = 1} \delta_\nu(y, \mathcal{X}) \Bigr\}.$$
Thus the gap is the largest distance from $\mathcal{Y}$ of a vector of length one lying in $\mathcal{X}$, or vice versa, whichever is greater.
The definition of the gap function satisfies our intuitive notions of what a distance between subspaces should be. In the setting of Example 4.1, it gives $\mathcal{R}_1^2$ a topology in which lines that intersect at small angles have a small distance from one another. However, the gap function need not be a metric, and this means that we must establish from first principles that a gap function actually generates a topology. We shall do this in two stages. First we shall prove that all gap functions are equivalent, in the same sense that all norms are equivalent. We will then show that one gap function, $\rho_{g,2}$, is a metric. It will then follow from their equivalence that all gap functions generate the same topology.
Theorem 4.4. Let $\mu$ and $\nu$ be norms on $\mathbb{C}^n$, and let
$$\alpha\mu(x) \le \nu(x) \le \beta\mu(x), \qquad \alpha, \beta > 0.$$
Then
$$\frac{\alpha}{\beta}\rho_{g,\mu}(\mathcal{X}, \mathcal{Y}) \le \rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) \le \frac{\beta}{\alpha}\rho_{g,\mu}(\mathcal{X}, \mathcal{Y}). \tag{4.3}$$
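Proof. First we will establish the second inequality in (4.3). Let $x \in \mathcal{X}$ with $\nu(x) = 1$ and let $y \in \mathcal{Y}$. Let $x' = x/\mu(x)$, so that $\mu(x') = 1$, and let $y' = y/\mu(x)$. Then
$$\nu(x - y) = \mu(x)\,\nu(x' - y') \le \frac{1}{\alpha}\nu(x' - y') \le \frac{\beta}{\alpha}\mu(x' - y').$$
Since $y'$ ranges over $\mathcal{Y}$ as $y$ does, it follows that
$$\delta_\nu(x, \mathcal{Y}) \le \frac{\beta}{\alpha}\delta_\mu(x', \mathcal{Y}).$$
As $x$ ranges over all vectors in $\mathcal{X}$ with $\nu(x) = 1$, $x'$ ranges over all vectors in $\mathcal{X}$ with $\mu(x') = 1$. Hence
$$\max_{x \in \mathcal{X},\, \nu(x) = 1} \delta_\nu(x, \mathcal{Y}) \le \frac{\beta}{\alpha} \max_{x' \in \mathcal{X},\, \mu(x') = 1} \delta_\mu(x', \mathcal{Y}).$$
Similarly,
$$\max_{y \in \mathcal{Y},\, \nu(y) = 1} \delta_\nu(y, \mathcal{X}) \le \frac{\beta}{\alpha} \max_{y' \in \mathcal{Y},\, \mu(y') = 1} \delta_\mu(y', \mathcal{X}),$$
and the second inequality follows from the definition of the gap.
The first inequality in (4.3) follows from the second and from the fact that
$$\beta^{-1}\nu(x) \le \mu(x) \le \alpha^{-1}\nu(x). \qquad\blacksquare$$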
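We shall now prove that $\rho_{g,2}$ is a metric. The proof is based on the following characterization of $\rho_{g,2}(\mathcal{X}, \mathcal{Y})$ in terms of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$.
Theorem 4.5. Let $\mathcal{X}, \mathcal{Y} \in \mathcal{C}_l^n$, and let $\Theta = \operatorname{diag}(\theta_1, \ldots, \theta_l)$, where $\theta_1 \ge \cdots \ge \theta_l$ are the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$. Then
$$\rho_{g,2}(\mathcal{X}, \mathcal{Y}) = \sin\theta_1 = \|\sin\Theta\|_2. \tag{4.4}$$
Proof. By (4.2),
$$\delta_2(x, \mathcal{Y}) = \|(I - P_{\mathcal{Y}})x\|_2.$$
Hence
$$\begin{aligned}
\max_{x \in \mathcal{X},\, \|x\|_2 = 1} \delta_2(x, \mathcal{Y}) &= \max_{x \in \mathcal{X},\, \|x\|_2 = 1} \|(I - P_{\mathcal{Y}})x\|_2 \\
&= \max_{x \in \mathcal{X},\, \|x\|_2 \le 1} \|(I - P_{\mathcal{Y}})x\|_2 \\
&= \max_{\|x\|_2 \le 1} \|(I - P_{\mathcal{Y}})P_{\mathcal{X}}x\|_2 \\
&= \|(I - P_{\mathcal{Y}})P_{\mathcal{X}}\|_2.
\end{aligned}$$
Thus by Theorem I.5.5,
$$\max_{x \in \mathcal{X},\, \|x\|_2 = 1} \delta_2(x, \mathcal{Y}) = \sin\theta_1.$$
Similarly,
$$\max_{y \in \mathcal{Y},\, \|y\|_2 = 1} \delta_2(y, \mathcal{X}) = \sin\theta_1,$$
and the result follows from the definition of the gap. ∎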
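Theorem 4.5 and Corollary 4.6 are straightforward to check numerically. In the sketch below, the cosines of the canonical angles are taken to be the singular values of $X^HY$ for orthonormal bases $X$ and $Y$. The gap is evaluated directly from Definition 4.3 and (4.2): for $x = Xa$ with $\|a\|_2 = 1$, the largest $\delta_2(x, \mathcal{Y})$ is $\|(I - P_\mathcal{Y})X\|_2$. The helper names are ours. Computing angles through $\arccos$ loses accuracy for very small angles, which does not matter for this random example.

```python
import numpy as np

def orth(M):
    """Orthonormal basis for the column space of M (full column rank assumed)."""
    return np.linalg.qr(M)[0]

def canonical_angles(X, Y):
    """Canonical angles theta_1 >= ... >= theta_l between the spans of orthonormal
    X and Y; their cosines are the singular values of X^H Y."""
    c = np.linalg.svd(X.conj().T @ Y, compute_uv=False)
    return np.sort(np.arccos(np.clip(c, 0.0, 1.0)))[::-1]

def gap2(X, Y):
    """rho_{g,2} from Definition 4.3, using (4.2) for both one-sided maxima."""
    n = X.shape[0]
    PX, PY = X @ X.conj().T, Y @ Y.conj().T
    return max(np.linalg.norm((np.eye(n) - PY) @ X, 2),
               np.linalg.norm((np.eye(n) - PX) @ Y, 2))

rng = np.random.default_rng(0)
X, Y = orth(rng.standard_normal((6, 2))), orth(rng.standard_normal((6, 2)))
theta = canonical_angles(X, Y)
# (4.4) and (4.5): the three numbers agree up to roundoff.
print(gap2(X, Y), np.sin(theta[0]), np.linalg.norm(X @ X.T - Y @ Y.T, 2))
```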
By Theorem I.5.5, the singular values of $P_{\mathcal{X}} - P_{\mathcal{Y}}$ are the sines of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$, possibly repeated and padded with zeros. We therefore have the following corollary.
Corollary 4.6. In the 2-norm,
$$\rho_{g,2}(\mathcal{X}, \mathcal{Y}) = \|P_{\mathcal{X}} - P_{\mathcal{Y}}\|_2. \tag{4.5}$$
It immediately follows from Corollary 4.6 that $\rho_{g,2}$ is a metric on $\mathcal{C}_l^n$ [e.g., to get the triangle inequality write $\rho_{g,2}(\mathcal{X}, \mathcal{Z}) = \|P_{\mathcal{X}} - P_{\mathcal{Z}}\|_2 \le \|P_{\mathcal{X}} - P_{\mathcal{Y}}\|_2 + \|P_{\mathcal{Y}} - P_{\mathcal{Z}}\|_2 = \rho_{g,2}(\mathcal{X}, \mathcal{Y}) + \rho_{g,2}(\mathcal{Y}, \mathcal{Z})$]. Consequently, $\rho_{g,2}$ induces a topology on $\mathcal{C}_l^n$. By Theorem 4.4 all gap functions induce the same topology, which we will call the GAP TOPOLOGY.
Equation (4.5) does not hold in general: if we replace the 2-norm by another norm, the equality may fail. However, the right-hand side, regarded as a function of $\mathcal{X}$ and $\mathcal{Y}$, remains a metric on $\mathcal{C}_l^n$. We leave the proof of the following theorem as an exercise (the subscript $p$ in (4.6) stands for projection).
Theorem 4.7. Let $\nu$ be a matrix norm on $\mathbb{C}^{n\times n}$. Then the function
$$\rho_{p,\nu}(\mathcal{X}, \mathcal{Y}) = \nu(P_{\mathcal{X}} - P_{\mathcal{Y}}) \tag{4.6}$$
is a metric on $\mathcal{C}_l^n$, which generates the gap topology.
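The metrics (4.6) can be computed directly from projectors. The sketch below uses function names of our own. For one random configuration, it checks the singular-value structure of $P_\mathcal{X} - P_\mathcal{Y}$ quoted from Theorem I.5.5: when $2l \le n$, each sine of a canonical angle appears twice. It then spot-checks the triangle inequality for $\rho_{p,\nu}$ with $\nu$ the Frobenius, trace, and spectral norms.

```python
import numpy as np

def orth(M):
    return np.linalg.qr(M)[0]

def projector(X):
    """Orthogonal projector onto span(X) for orthonormal X."""
    return X @ X.conj().T

def rho_p(X, Y, ord="fro"):
    """rho_{p,nu}(X, Y) = nu(P_X - P_Y) for a matrix norm accepted by np.linalg.norm."""
    return np.linalg.norm(projector(X) - projector(Y), ord)

rng = np.random.default_rng(1)
n, l = 7, 2
X, Y, Z = (orth(rng.standard_normal((n, l))) for _ in range(3))

# Singular values of P_X - P_Y: each sine of a canonical angle twice, then zeros.
sines = np.sqrt(np.clip(1.0 - np.linalg.svd(X.T @ Y, compute_uv=False) ** 2, 0.0, None))
expected = np.sort(np.concatenate([sines, sines, np.zeros(n - 2 * l)]))[::-1]
actual = np.linalg.svd(projector(X) - projector(Y), compute_uv=False)
print(np.allclose(actual, expected, atol=1e-8))

for ord_ in ("fro", "nuc", 2):
    assert rho_p(X, Z, ord_) <= rho_p(X, Y, ord_) + rho_p(Y, Z, ord_) + 1e-12
```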
4.2. Unitarily Invariant Metrics
A metric $\rho$ on $\mathcal{C}_l^n$ is UNITARILY INVARIANT if $\rho(\mathcal{X}, \mathcal{Y}) = \rho(U\mathcal{X}, U\mathcal{Y})$ for all unitary matrices $U$. In this subsection we will be interested in unitarily invariant metrics that generate the gap topology. Now the metric of Example 4.1 is not unitarily invariant and does not generate the gap topology. It is therefore appealing to conjecture that any unitarily invariant metric must generate the gap topology. Unfortunately, the conjecture is not true, as the following example shows.
Example 4.8. Let $\mathcal{X}, \mathcal{Y} \in \mathcal{R}_1^2$ and let $\theta(\mathcal{X}, \mathcal{Y})$ be the canonical angle between $\mathcal{X}$ and $\mathcal{Y}$. Define $\rho(\mathcal{X}, \mathcal{Y})$ by
$$\rho(\mathcal{X}, \mathcal{Y}) = \begin{cases} \theta(\mathcal{X}, \mathcal{Y}), & \text{if } \theta(\mathcal{X}, \mathcal{Y}) \text{ is rational}, \\ 1, & \text{if } \theta(\mathcal{X}, \mathcal{Y}) \text{ is irrational}. \end{cases}$$
Then it is easily verified that $\rho$ is a unitarily invariant metric. But two subspaces with an irrational canonical angle are at a distance of unity, no matter how near they are in the gap topology.
Fortunately, there are many unitarily invariant metrics that generate the gap topology. One of them is the gap function $\rho_{g,2}$, since $\|\cdot\|_2$ is unitarily invariant. However, this metric effectively exhausts the class of unitarily invariant metrics that can be generated by gap functions. The reason is that, up to a constant multiple, the 2-norm is the only unitarily invariant vector norm on $\mathbb{C}^n$. But there are many unitarily invariant matrix norms, which can be used in the definition (4.6) to give unitarily invariant metrics generating the gap topology.
Theorem 4.9. If $\nu$ is a unitarily invariant matrix norm, then $\rho_{p,\nu}$ is a unitarily invariant metric on $\mathcal{C}_l^n$.
By (4.4), $\rho_{g,2}(\mathcal{X}, \mathcal{Y}) = \|\sin\Theta(\mathcal{X}, \mathcal{Y})\|_2$, and it is natural to ask if we can find new unitarily invariant metrics of the form $\nu[\sin\Theta(\mathcal{X}, \mathcal{Y})]$, where $\nu$ is a unitarily invariant matrix norm. Unfortunately, these metrics are just the $\rho_{p,\nu}$ metrics in disguise.
Theorem 4.10. Let $\nu$ be a unitarily invariant matrix norm on $\mathbb{C}^{l\times l}$. Then there is a unitarily invariant matrix norm $\nu'$ such that
$$\nu[\sin\Theta(\mathcal{X}, \mathcal{Y})] = \rho_{p,\nu'}(\mathcal{X}, \mathcal{Y})$$
for all $\mathcal{X}, \mathcal{Y} \in \mathcal{C}_l^n$.
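Proof. We will treat only the case $2l \le n$, leaving the case $2l > n$ as an exercise. Let $\sigma_1 \ge \cdots \ge \sigma_l$ be the sines of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$. By Theorem I.5.5, we know that the singular values of $P_{\mathcal{X}} - P_{\mathcal{Y}}$ are $\sigma_1, \sigma_1, \ldots, \sigma_l, \sigma_l, 0, \ldots, 0$.
Let $\Phi$ be the symmetric gauge function that generates $\nu$, and define $\Phi' : \mathbb{R}^n \to \mathbb{R}$ as follows. For any vector $x \in \mathbb{R}^n$ with $|\xi_1| \ge |\xi_2| \ge \cdots \ge |\xi_n|$ let
$$\Phi'(x) = \Phi\Bigl(\frac{|\xi_1| + |\xi_2|}{2}, \frac{|\xi_3| + |\xi_4|}{2}, \ldots, \frac{|\xi_{2l-1}| + |\xi_{2l}|}{2}\Bigr).$$
It is easily verified that $\Phi'$ is a symmetric gauge function. Let $\nu'$ be the unitarily invariant matrix norm generated by $\Phi'$. Then
$$\nu'(P_{\mathcal{X}} - P_{\mathcal{Y}}) = \Phi'(\sigma_1, \sigma_1, \ldots, \sigma_l, \sigma_l, 0, \ldots, 0) = \Phi(\sigma_1, \ldots, \sigma_l) = \nu[\sin\Theta(\mathcal{X}, \mathcal{Y})]. \qquad\blacksquare$$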
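The construction of $\Phi'$ in the proof of Theorem 4.10 can be checked for a few familiar symmetric gauge functions. The max, Euclidean, and sum gauges generate the spectral, Frobenius, and trace norms. The sketch below is an illustration under the proof's assumption $2l \le n$. `phi_prime` and the dictionary of gauges are our own names.

```python
import numpy as np

def phi_prime(phi, x, l):
    """Phi'(x) from the proof of Theorem 4.10: sort |x| decreasingly, average
    consecutive pairs among the first 2l entries, and apply Phi to the l averages."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    return phi((a[0:2 * l:2] + a[1:2 * l:2]) / 2.0)

gauges = {                                   # symmetric gauge functions
    "spectral":  lambda v: np.max(np.abs(v)),
    "frobenius": lambda v: np.sqrt(np.sum(np.abs(v) ** 2)),
    "trace":     lambda v: np.sum(np.abs(v)),
}

rng = np.random.default_rng(2)
n, l = 8, 3                                  # the proof assumes 2l <= n
X = np.linalg.qr(rng.standard_normal((n, l)))[0]
Y = np.linalg.qr(rng.standard_normal((n, l)))[0]
cos = np.linalg.svd(X.T @ Y, compute_uv=False)
sines = np.sqrt(np.clip(1.0 - cos ** 2, 0.0, None))     # diagonal of sin Theta
sv = np.linalg.svd(X @ X.T - Y @ Y.T, compute_uv=False)  # singular values of P_X - P_Y
for name, phi in gauges.items():
    # nu[sin Theta] versus nu'(P_X - P_Y) = Phi'(singular values of P_X - P_Y)
    print(name, phi(sines), phi_prime(phi, sv, l))
```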
We now turn to two different metrics on $\mathcal{C}_l^n$. To motivate the first, let $\mathcal{X}, \mathcal{Y} \in \mathcal{C}_l^n$ and let the columns of $X$ and $Y$ form orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$. If $\mathcal{X} = \mathcal{Y}$, then there is a unitary matrix $Q = Y^H X$ such that $X = YQ$, or equivalently $\|X - YQ\|_F = 0$. This suggests that we use the number
$$\rho_b(\mathcal{X}, \mathcal{Y}) = \min_{Q \text{ unitary}} \|X - YQ\|_F$$
as a measure of the distance between $\mathcal{X}$ and $\mathcal{Y}$ (the subscript $b$ stands for basis).
Theorem 4.11. The function $\rho_b$ is a unitarily invariant metric on $\mathcal{C}_l^n$. If $\gamma_i$ is the cosine of the $i$th canonical angle between $\mathcal{X}$ and $\mathcal{Y}$, then
$$\rho_b(\mathcal{X}, \mathcal{Y}) = \sqrt{2\sum_{i=1}^{l}(1 - \gamma_i)}. \tag{4.7}$$
Proof. It is easy to see that $\rho_b$ is a unitarily invariant, nonnegative function that is zero if and only if its arguments are equal. We will now show that it satisfies the triangle inequality.
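Let the columns of $X$, $Y$, $Z$ form orthonormal bases for $\mathcal{X}, \mathcal{Y}, \mathcal{Z} \in \mathcal{C}_l^n$. Let
$$\rho_b(\mathcal{X}, \mathcal{Y}) = \|X - YQ_{x,y}\|_F, \quad \rho_b(\mathcal{Y}, \mathcal{Z}) = \|Y - ZQ_{y,z}\|_F, \quad \rho_b(\mathcal{X}, \mathcal{Z}) = \|X - ZQ_{x,z}\|_F.$$
Then
$$\begin{aligned}
\rho_b(\mathcal{X}, \mathcal{Z}) &\le \|X - ZQ_{y,z}Q_{x,y}\|_F \\
&= \|X - YQ_{x,y} + YQ_{x,y} - ZQ_{y,z}Q_{x,y}\|_F \\
&\le \|X - YQ_{x,y}\|_F + \|Y - ZQ_{y,z}\|_F \\
&= \rho_b(\mathcal{X}, \mathcal{Y}) + \rho_b(\mathcal{Y}, \mathcal{Z}),
\end{aligned}$$
where the second inequality uses $\|(Y - ZQ_{y,z})Q_{x,y}\|_F = \|Y - ZQ_{y,z}\|_F$.
We will establish (4.7) for the case $2l \le n$. Without loss of generality, we may assume that $X$ and $Y$ are canonical bases for $\mathcal{X}$ and $\mathcal{Y}$; that is,
$$X = \begin{pmatrix} I \\ 0 \\ 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix},$$
where $\Gamma = \operatorname{diag}(\gamma_1, \ldots, \gamma_l)$ and $\Sigma$ is the diagonal matrix of the sines of the canonical angles, so that $\Gamma^2 + \Sigma^2 = I$. Thus we must find a unitary matrix $Q$ that minimizes
$$\left\| \begin{pmatrix} I \\ 0 \\ 0 \end{pmatrix} - \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix} Q \right\|_F^2 = \|I - \Gamma Q\|_F^2 + \|\Sigma\|_F^2.$$
The second term on the right-hand side of the above equation is independent of $Q$. Hence $Q$ must minimize
$$\|I - \Gamma Q\|_F^2 = \operatorname{trace}(I - Q^H\Gamma - \Gamma Q + \Gamma^2).$$
This quantity is minimized when the diagonal elements of $Q$ are one, and since $Q$ is unitary, $Q = I$. Hence
$$\min_Q \|I - \Gamma Q\|_F^2 + \|\Sigma\|_F^2 = \operatorname{trace}(I - 2\Gamma + \Gamma^2 + \Sigma^2) = 2\operatorname{trace}(I - \Gamma) = 2\sum_i (1 - \gamma_i). \qquad\blacksquare$$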
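The minimization defining $\rho_b$ is an orthogonal Procrustes problem. We solve it here through an SVD, a standard method that is our assumption and is not taken from the text: if $X^HY = W\,\mathrm{diag}(\gamma_i)\,V^H$, then $Q = VW^H$ is a minimizer. The sketch below compares the result with (4.7) and spot-checks optimality, the triangle inequality, and basis independence. The function names are hypothetical.

```python
import numpy as np

def orth(M):
    return np.linalg.qr(M)[0]

def rho_b(X, Y):
    """min over unitary Q of ||X - Y Q||_F for orthonormal X and Y.
    With X^H Y = W diag(c) V^H (an SVD), the minimizer is Q = V W^H."""
    W, c, Vh = np.linalg.svd(X.conj().T @ Y)
    Q = Vh.conj().T @ W.conj().T
    return np.linalg.norm(X - Y @ Q, "fro"), Q

rng = np.random.default_rng(4)
X, Y, Z = (orth(rng.standard_normal((6, 2))) for _ in range(3))
dist, Q = rho_b(X, Y)
gammas = np.linalg.svd(X.T @ Y, compute_uv=False)       # cosines of canonical angles
print(dist, np.sqrt(2.0 * np.sum(1.0 - gammas)))        # (4.7): the two values agree
```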
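The second metric is defined by the formula
$$\rho_\theta(\mathcal{X}, \mathcal{Y}) = \arccos \frac{|\det(X^H Y)|}{\sqrt{\det(X^H X)\det(Y^H Y)}},$$
where the columns of $X$ and $Y$ form bases for $\mathcal{X}$ and $\mathcal{Y}$. We will also consider the closely related metric
$$\rho_g(\mathcal{X}, \mathcal{Y}) = \sin\rho_\theta(\mathcal{X}, \mathcal{Y}).$$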
It is easily seen that these functions are unitarily invariant and independent of the choice of $X$ and $Y$. By choosing canonical bases we see that
$$\rho_\theta(\mathcal{X}, \mathcal{Y}) = \arccos \prod_{i=1}^{l} \gamma_i, \tag{4.8}$$
where, as usual, the $\gamma_i$'s are the cosines of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$. The proof of the following theorem shows that they are metrics on $\mathcal{C}_l^n$.
Theorem 4.12. The functions $\rho_\theta$ and $\rho_g$ are unitarily invariant metrics on $\mathcal{C}_l^n$.
Proof. The fact that $\rho_g$ is a metric follows immediately from the fact that $\rho_\theta$ is a metric, because $\rho_\theta$ takes values in $[0, \pi/2]$, where $\sin$ is increasing and subadditive. We now establish that $\rho_\theta$ is a metric. From (4.8) we have
$$\rho_\theta(\mathcal{X}, \mathcal{Y}) = 0 \iff \prod_i \gamma_i = 1 \iff \gamma_1 = \gamma_2 = \cdots = \gamma_l = 1 \iff \mathcal{X} = \mathcal{Y}.$$
Thus it remains only to show that $\rho_\theta$ satisfies the triangle inequality.
Let the columns of $X$, $Y$, $Z$ form orthonormal bases for $\mathcal{X}, \mathcal{Y}, \mathcal{Z} \in \mathcal{C}_l^n$. Then we must show that
$$\arccos|\det(X^H Z)| \le \arccos|\det(X^H Y)| + \arccos|\det(Y^H Z)|.$$
For $S = X, Y, Z$, let $\delta^{(S)}_{i_1 \ldots i_l}$ denote the determinant of the matrix formed from the $i_1$th, ..., $i_l$th rows of $S$. By the Binet-Cauchy formula (see, e.g., [81, Vol. I, p. 9]) we have
$$\det(X^H Y) = \sum_{1 \le i_1 < \cdots < i_l \le n} \overline{\delta^{(X)}_{i_1 \ldots i_l}}\, \delta^{(Y)}_{i_1 \ldots i_l}, \tag{4.9}$$
with similar formulas for $\det(X^H Z)$ and $\det(Y^H Z)$. For $S = X, Y, Z$ let the components of $v_S$ be the numbers $\delta^{(S)}_{i_1 \ldots i_l}$, taken in some fixed order. Then $\|v_S\|_2 = 1$ (apply (4.9) to $\det(S^H S) = 1$) and
$$\det(X^H Y) = v_X^H v_Y,$$
with similar formulas for $\det(X^H Z)$ and $\det(Y^H Z)$. Thus our problem reduces to showing that
$$\arccos|v_X^H v_Z| \le \arccos|v_X^H v_Y| + \arccos|v_Y^H v_Z|. \tag{4.10}$$
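The QR decomposition of the matrix $(v_Y\ \ v_X\ \ v_Z)$ gives a unitary matrix $U$ such that
$$v_Y = U(1, 0, \ldots, 0)^T, \quad v_X = U(\alpha_1, \alpha_2, 0, \ldots, 0)^T, \quad v_Z = U(\beta_1, \beta_2, \beta_3, 0, \ldots, 0)^T,$$
$$|\alpha_1|^2 + |\alpha_2|^2 = 1, \qquad |\beta_1|^2 + |\beta_2|^2 + |\beta_3|^2 = 1.$$
Since
$$\begin{aligned}
|\bar\alpha_1\beta_1 + \bar\alpha_2\beta_2| &\ge |\alpha_1||\beta_1| - |\alpha_2||\beta_2| \\
&= |\alpha_1||\beta_1| - \sqrt{1 - |\alpha_1|^2}\,\sqrt{1 - |\beta_1|^2 - |\beta_3|^2} \\
&\ge |\alpha_1||\beta_1| - \sqrt{1 - |\alpha_1|^2}\,\sqrt{1 - |\beta_1|^2} \\
&= \cos(\arccos|\alpha_1| + \arccos|\beta_1|),
\end{aligned}$$
we have
$$\arccos|\bar\alpha_1\beta_1 + \bar\alpha_2\beta_2| \le \arccos|\alpha_1| + \arccos|\beta_1|.$$
Since $v_X^H v_Z = \bar\alpha_1\beta_1 + \bar\alpha_2\beta_2$, $|v_X^H v_Y| = |\alpha_1|$, and $|v_Y^H v_Z| = |\beta_1|$, this is the required inequality (4.10). ∎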
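Because $\rho_\theta$ is defined through determinants, it can be evaluated with bases that are not orthonormal. The sketch below compares that definition with (4.8), computed from orthonormalized bases. It also spot-checks the triangle inequality of Theorem 4.12 for both $\rho_\theta$ and $\rho_g$ on random real subspaces. The names are ours.

```python
import numpy as np

def rho_theta(X, Y):
    """rho_theta for arbitrary (not necessarily orthonormal) full-rank bases X and Y."""
    num = abs(np.linalg.det(X.conj().T @ Y))
    den = np.sqrt(abs(np.linalg.det(X.conj().T @ X) * np.linalg.det(Y.conj().T @ Y)))
    return np.arccos(min(1.0, num / den))

def rho_g(X, Y):
    """The related metric sin(rho_theta)."""
    return np.sin(rho_theta(X, Y))

rng = np.random.default_rng(5)
X, Y = rng.standard_normal((6, 2)), rng.standard_normal((6, 2))  # non-orthonormal bases
Qx, Qy = np.linalg.qr(X)[0], np.linalg.qr(Y)[0]
gammas = np.linalg.svd(Qx.T @ Qy, compute_uv=False)             # cosines of canonical angles
print(rho_theta(X, Y), np.arccos(np.prod(gammas)))              # (4.8): these agree
```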
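Since most perturbation bounds deal with small quantities, it is instructive to examine the asymptotic behavior of the metrics introduced above when the canonical angles $\theta_i$ are small. Specifically, to first order in $\Theta$ we have
$$\begin{aligned}
&1.\quad \rho_{g,2} = \sin\theta_1 \approx \|\Theta\|_2, \\
&2.\quad \rho_{p,F} = \sqrt{2\textstyle\sum_i \sin^2\theta_i} \approx \sqrt{2}\,\|\Theta\|_F, \\
&3.\quad \rho_b = \sqrt{2\textstyle\sum_i (1 - \cos\theta_i)} \approx \|\Theta\|_F, \\
&4.\quad \rho_\theta = \arccos\bigl(\textstyle\prod_i \cos\theta_i\bigr) \approx \|\Theta\|_F.
\end{aligned} \tag{4.11}$$
In particular, comparing (4.11.3) and (4.11.4) with (4.11.2), we see that the metrics $\rho_b$ and $\rho_\theta$ also generate the gap topology.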
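The asymptotic relations (4.11) can be observed by building subspaces with prescribed small canonical angles. The sketch assumes the canonical bases $X = (I\ \ 0\ \ 0)^T$ and $Y = (\cos\Theta\ \ \sin\Theta\ \ 0)^T$, which requires $2l \le n$. It evaluates each metric from its definition rather than from the closed forms. It then prints the ratios to the first-order approximations, which should approach one as the angles shrink. The helper names are ours.

```python
import numpy as np

def subspaces_with_angles(theta, n):
    """Orthonormal X = (I; 0; 0) and Y = (cos Theta; sin Theta; 0) whose canonical
    angles are the entries of theta (requires 2l <= n)."""
    l = len(theta)
    X = np.zeros((n, l)); X[:l, :l] = np.eye(l)
    Y = np.zeros((n, l))
    Y[:l, :l] = np.diag(np.cos(theta)); Y[l:2 * l, :l] = np.diag(np.sin(theta))
    return X, Y

def four_metrics(X, Y):
    """rho_{g,2}, rho_{p,F}, rho_b, rho_theta for orthonormal real bases X and Y."""
    D = X @ X.T - Y @ Y.T
    W, c, Vh = np.linalg.svd(X.T @ Y)
    rho_b = np.linalg.norm(X - Y @ (Vh.T @ W.T), "fro")      # Procrustes minimizer
    rho_theta = np.arccos(min(1.0, abs(np.linalg.det(X.T @ Y))))
    return np.linalg.norm(D, 2), np.linalg.norm(D, "fro"), rho_b, rho_theta

for t in (1e-1, 1e-2, 1e-3):
    theta = t * np.array([1.0, 0.5])
    X, Y = subspaces_with_angles(theta, n=5)
    g2, pF, b, th = four_metrics(X, Y)
    nF = np.linalg.norm(theta)
    print(t, g2 / theta.max(), pF / (np.sqrt(2) * nF), b / nF, th / nF)  # ratios -> 1
```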
Notes and References
The original impetus for comparing subspaces came from functional analysis. According to Berkson [27], the gap or "opening" $\rho_{g,2}$ was first defined on Hilbert space by Krein and Krasnoselski [137, 1947]. Krein, Krasnoselski, and Milman later extended the notion to an arbitrary Banach space [138, 1948]. For more on the gap and its applications see Kato [135].
The metric $\rho_b$ was introduced by Paige [173, 1984] and the metric $\rho_\theta$ by Lu [149, 1963]. At this writing there is no simple characterization of unitarily invariant metrics for subspaces. One reason is that norms are defined for one object on a space with a linear structure, whereas these metrics are relations between two objects in a space with a complicated structure. However, the survey in this section suggests that any reasonable approach will yield something that can be expressed in terms of canonical angles, at least asymptotically [cf. (4.11)].
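Exercises
IN THE FOLLOWING EXERCISES $\mathcal{X}$ AND $\mathcal{Y}$ ARE SUBSPACES OF $\mathbb{C}^n$ AND $\nu$ IS A NORM ON $\mathbb{C}^n$. UNLESS OTHERWISE STATED, $\mathcal{X}$ AND $\mathcal{Y}$ HAVE THE SAME DIMENSION.
1. Show that if $\dim(\mathcal{X}) > \dim(\mathcal{Y})$ then there is a point $x \in \mathcal{X}$ such that $\delta_\nu(x, \mathcal{Y}) = \nu(x)$. [Note: This theorem is nontrivial. For a proof and further references see [135, Ch. IV, Sec. 2], from which some of the following exercises were excavated.]
2. Show that $\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) \le 1$, with equality if $\dim(\mathcal{X}) \ne \dim(\mathcal{Y})$.
3. Let
$$\hat\delta_\nu(y, \mathcal{X}) = \min_{x \in \mathcal{X},\, \nu(x) = 1} \nu(y - x)$$
and
$$\hat\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) = \max\Bigl\{ \max_{x \in \mathcal{X},\, \nu(x) = 1} \hat\delta_\nu(x, \mathcal{Y}),\ \max_{y \in \mathcal{Y},\, \nu(y) = 1} \hat\delta_\nu(y, \mathcal{X}) \Bigr\}.$$
Show that $\hat\rho_{g,\nu}$ is a metric.
4. Show that
$$\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) \le \hat\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}) \le 2\rho_{g,\nu}(\mathcal{X}, \mathcal{Y}).$$
5. Show that if $\{\mathcal{X}_k\}$ is a Cauchy sequence in $\rho_{g,\nu}$, then there is a subspace $\mathcal{Y}$ such that $\mathcal{X}_k \to \mathcal{Y}$ in $\rho_{g,\nu}$ (i.e., the space of all subspaces is complete).
6. (Schäffer [153]). Let
$$\pi(\mathcal{X}, \mathcal{Y}) = \max\bigl\{ \min\{\nu(I - C) : C\mathcal{X} = \mathcal{Y}\},\ \min\{\nu(I - C) : C\mathcal{Y} = \mathcal{X}\} \bigr\}$$
and
$$\rho_s(\mathcal{X}, \mathcal{Y}) = \log[1 + \pi(\mathcal{X}, \mathcal{Y})].$$
Show that $\rho_s$ is a metric.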
Chapter III
Linear Systems and Least Squares Problems
In this chapter we will be concerned with the solution of the linear system
$$Ax = b,$$
where $A$ is a nonsingular matrix, and with the closely related least squares problem
$$\text{minimize } \|b - Ax\|_2, \tag{1}$$
where $A$ is a general $m \times n$ matrix. A solution of the latter problem is given by $A^\dagger b$, where $A^\dagger$ is the pseudo-inverse of $A$ to be defined below. When $A$ is nonsingular, $A^\dagger = A^{-1}$, so that the pseudo-inverse solves the first problem as well.
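In NumPy, `numpy.linalg.pinv` computes the pseudo-inverse from an SVD, and `numpy.linalg.lstsq` solves least squares problems directly. For a full-rank overdetermined system, the two approaches should give the same solution. For a nonsingular square matrix, the pseudo-inverse should coincide with the inverse. This is only a numerical illustration of the statements above, using random data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Full-rank overdetermined least squares problem: minimize ||b - A x||_2.
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x_pinv = np.linalg.pinv(A) @ b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_pinv, x_lstsq))      # the same solution
r = b - A @ x_pinv
print(np.allclose(A.T @ r, 0.0))         # residual is orthogonal to the range of A

# Nonsingular square case: the pseudo-inverse is the inverse.
M = rng.standard_normal((4, 4))
print(np.allclose(np.linalg.pinv(M), np.linalg.inv(M)))
```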
This chapter begins with an introductory section. We then treat the perturbation of matrix inverses and pseudo-inverses and, in consequence, the solution of linear systems and linear least squares problems. As we have noted, the former is a special case of the latter. In principle, we could approach the subject by developing the perturbation theory of pseudo-inverses and least squares problems and noting what happens when $m = n$. However, the perturbation of pseudo-inverses is complicated by the fact that the pseudo-inverse need not be a continuous function of the elements of $A$. We will therefore begin with the simpler case of matrix inverses and linear systems. This approach also has the advantage of presenting some of the key ideas of perturbation theory in a comparatively simple setting.
1. The Pseudo-Inverse and Least Squares
1.1. Generalized Inverses and the Pseudo-Inverse
Let $A \in \mathbb{C}^{n\times n}$. It is well known that if $A$ is nonsingular then there is a unique matrix $X$ such that
$$AX = XA = I. \tag{1.1}$$
The matrix $X$ is called the inverse of $A$ and is denoted by $A^{-1}$. In this case the linear system $Ax = b$ has a unique solution $x = A^{-1}b$.
It is natural to attempt to generalize the idea of an inverse to the case where $A$ is singular or even fails to be square. This can be done by requiring $X$ to satisfy conditions that are less restrictive than (1.1). By varying the conditions, we can obtain many different "generalized inverses," each suited to its own application. In this book we will be particularly concerned with the geometry of $\mathbb{C}^n$, and the following PENROSE CONDITIONS are the most appropriate:
1. $AXA = A$,
2. $XAX = X$,
3. $(AX)^H = AX$,
4. $(XA)^H = XA$. (1.2)
Note that the first condition alone implies that $X = A^{-1}$ when $A$ is nonsingular. However, it does not define $X$ uniquely when $A$ is singular. We are therefore free to impose additional conditions. The conditions 2-4 have geometric implications, which we will explore in the next section.
It is customary to denote an "inverse" satisfying a subset of the Penrose conditions, say conditions $i$, $j$, and $k$, by writing $A^{(i,j,k)}$. Thus $A^{(1)}$ is required to satisfy only the first condition. The matrix $A^{(1,2,3,4)}$, which satisfies all four, is written $A^\dagger$ and is called the MOORE-PENROSE GENERALIZED INVERSE or the PSEUDO-INVERSE of $A$.
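The four Penrose conditions (1.2) can be verified numerically for NumPy's pseudo-inverse. The sketch below uses a rank-deficient matrix, since that is the case where the conditions matter. The cutoff `1e-10` passed positionally to `numpy.linalg.pinv` (its relative singular-value cutoff) is our choice. It ensures that roundoff-level singular values are treated as zero. The helper name `penrose_residuals` is hypothetical.

```python
import numpy as np

def penrose_residuals(A, X):
    """Frobenius-norm residuals of the four Penrose conditions (1.2)."""
    AX, XA = A @ X, X @ A
    return (np.linalg.norm(A @ X @ A - A),
            np.linalg.norm(X @ A @ X - X),
            np.linalg.norm(AX.conj().T - AX),
            np.linalg.norm(XA.conj().T - XA))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5 x 4 of rank 2
X = np.linalg.pinv(A, 1e-10)                # singular values below 1e-10 * max are dropped
print(penrose_residuals(A, X))              # all four at roundoff level
```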
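There are explicit formulas for some of the generalized inverses generated
from the Penrose conditions. Let
$$A = U \begin{pmatrix} \Sigma_+ & 0 \\ 0 & 0 \end{pmatrix} V^H \qquad (1.3)$$
be the singular value decomposition of A, where $\Sigma_+$ is the nonsingular
diagonal matrix of nonzero singular values. Let us seek $A^{(1)}$ in the form
$$A^{(1)} = V \begin{pmatrix} T & K \\ L & M \end{pmatrix} U^H.$$
By direct multiplication
$$AA^{(1)}A = U \begin{pmatrix} \Sigma_+ T \Sigma_+ & 0 \\ 0 & 0 \end{pmatrix} V^H,$$
and it follows from the first Penrose condition that $T = \Sigma_+^{-1}$. Thus any
(1)-inverse has the form
$$A^{(1)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ L & M \end{pmatrix} U^H,$$
where K, L, and M are arbitrary.
If we now seek a (1,2)-inverse in the form
$$A^{(1,2)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ L & M \end{pmatrix} U^H,$$
then by the second Penrose condition
$$A^{(1,2)} = A^{(1,2)}AA^{(1,2)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ L & L\Sigma_+K \end{pmatrix} U^H.$$
Thus any (1,2)-inverse has the form
$$A^{(1,2)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ L & M \end{pmatrix} U^H,$$
where K and L are arbitrary and $L\Sigma_+K = M$.
In the same way we can prove that any (1,2,3)-inverse has the form
$$A^{(1,2,3)} = V \begin{pmatrix} \Sigma_+^{-1} & 0 \\ L & 0 \end{pmatrix} U^H,$$
with L arbitrary; and any (1,2,4)-inverse has the form
$$A^{(1,2,4)} = V \begin{pmatrix} \Sigma_+^{-1} & K \\ 0 & 0 \end{pmatrix} U^H,$$
with K arbitrary. Finally the pseudo-inverse has the form
$$A^\dagger = A^{(1,2,3,4)} = V \begin{pmatrix} \Sigma_+^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H. \qquad (1.4)$$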
These representations are independent of the choice of singular vectors
(see the comments following Theorem I.4.1). In particular, since there
is nothing arbitrary about (1.4), we have established the existence and
uniqueness of the pseudo-inverse.
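As a numerical illustration of (1.4) and of the freedom left in a (1)-inverse, the following sketch (Python with NumPy) builds $V \begin{pmatrix} \Sigma_+^{-1} & K \\ L & M \end{pmatrix} U^H$ from a computed SVD and measures how well each Penrose condition holds. The helper names and the rank cutoff `max(m, n) * eps * sigma_1` are our own assumptions, not taken from the text; with zero blocks the construction reproduces the pseudo-inverse, while random blocks give a (1)-inverse that generally violates conditions 2-4.

```python
import numpy as np

def numerical_svd(A):
    """Full SVD plus a numerical rank (assumed cutoff: max(m, n) * eps * sigma_1)."""
    U, s, Vh = np.linalg.svd(A)                  # A = U [diag(s) 0; 0 0] Vh
    r = int(np.sum(s > max(A.shape) * np.finfo(float).eps * s[0]))
    return U, s, Vh, r

def generalized_inverse(A, K=None, L=None, M=None):
    """X = V [Sigma_+^{-1} K; L M] U^H; zero blocks give the pseudo-inverse (1.4)."""
    m, n = A.shape
    U, s, Vh, r = numerical_svd(A)
    G = np.zeros((n, m), dtype=np.result_type(A, float))
    G[:r, :r] = np.diag(1.0 / s[:r])             # the Sigma_+^{-1} block
    if K is not None:
        G[:r, r:] = K                            # K is r x (m - r)
    if L is not None:
        G[r:, :r] = L                            # L is (n - r) x r
    if M is not None:
        G[r:, r:] = M                            # M is (n - r) x (m - r)
    return Vh.conj().T @ G @ U.conj().T

def penrose_residuals(A, X):
    """Frobenius norms of the residuals of the four Penrose conditions (1.2)."""
    AX, XA = A @ X, X @ A
    return np.array([np.linalg.norm(A @ X @ A - A),
                     np.linalg.norm(X @ A @ X - X),
                     np.linalg.norm(AX - AX.conj().T),
                     np.linalg.norm(XA - XA.conj().T)])

A = np.arange(1.0, 13.0).reshape(4, 3)           # 4 x 3 matrix of rank 2
X = generalized_inverse(A)                       # the pseudo-inverse
print(penrose_residuals(A, X))                   # all four at rounding level
print(np.allclose(X, np.linalg.pinv(A)))         # agrees with NumPy's pinv

rng = np.random.default_rng(0)
X1 = generalized_inverse(A, K=rng.standard_normal((2, 2)),
                         L=rng.standard_normal((1, 2)),
                         M=rng.standard_normal((1, 2)))
print(penrose_residuals(A, X1))                  # first tiny, the others typically not
```

Only the $\Sigma_+^{-1}$ block enters $AXA$, which is why the first residual stays small for any choice of K, L, and M.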
Theorem 1.1. Let $A \in \mathbb{C}^{m\times n}$. Then there is a unique matrix
$X \in \mathbb{C}^{n\times m}$ that satisfies the Penrose conditions (1.2).
The following properties of the pseudo-inverse are easily established
from (1.2) or (1.4):
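Theorem 1.2. For any matrix A the following hold.
1. $(A^\dagger)^\dagger = A$.
2. $(\bar A)^\dagger = \overline{A^\dagger}$.
3. $(A^T)^\dagger = (A^\dagger)^T$.
4. $\operatorname{rank}(A) = \operatorname{rank}(A^\dagger) = \operatorname{rank}(AA^\dagger) = \operatorname{rank}(A^\dagger A)$.
5. $(AA^H)^\dagger = (A^\dagger)^H A^\dagger$, $(A^HA)^\dagger = A^\dagger (A^\dagger)^H$.
6. $(AA^H)^\dagger AA^H = AA^\dagger$, $(A^HA)^\dagger A^HA = A^\dagger A$.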
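7. If $A \in \mathbb{C}^{m\times n}$ has rank n, then $A^\dagger = (A^HA)^{-1}A^H$ and $A^\dagger A = I^{(n)}$.
8. If $A \in \mathbb{C}^{m\times n}$ has rank m, then $A^\dagger = A^H(AA^H)^{-1}$ and $AA^\dagger = I^{(m)}$.
9. If A has the full rank factorization $A = PC^H$, where $\operatorname{rank}(P) = \operatorname{rank}(C) = \operatorname{rank}(A)$, then
$$A^\dagger = C(P^HAC)^{-1}P^H$$
and
$$A^\dagger = (C^\dagger)^H P^\dagger.$$
In particular, for the singular value factorization $A = U_1\Sigma_+V_1^H$,
$$A^\dagger = V_1\Sigma_+^{-1}U_1^H.$$
10. If U, V are unitary matrices, then
$$(UAV)^\dagger = V^HA^\dagger U^H.$$
11. If
$$A = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{C}^{m\times n},$$
with $D = \operatorname{diag}(\delta_1, \ldots, \delta_r)$ and $\delta_i \ne 0$ for $i = 1, \ldots, r$, then
$$A^\dagger = \begin{pmatrix} D^{-1} & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{C}^{n\times m}.$$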
Theorem 1.2 shows that the pseudo-inverse has many properties in
common with the ordinary inverse. However, it fails to share other
properties. It is left as an exercise to construct examples to show that
1. $(AB)^\dagger$ is not necessarily the same as $B^\dagger A^\dagger$,
2. $AA^\dagger$ is not necessarily the same as $A^\dagger A$,
3. $(A^k)^\dagger$ is not necessarily the same as $(A^\dagger)^k$,
4. the nonzero eigenvalues of $A^\dagger$ are not necessarily the reciprocals
of the nonzero eigenvalues of A.
1.2. Projections and Least Squares
As we saw in Section I.2, any solution of the problem (1) of minimizing
$\|b - Ax\|_2$ must satisfy $Ax = P_Ab$, where $P_A$ is the orthogonal projection
onto the column space of A. It turns out that this projection can be
expressed in terms of the pseudo-inverse of A.
Theorem 1.3. For any matrix A,
1. $P_A = AA^\dagger$ is the orthogonal projector onto $\mathcal{R}(A)$,
2. $P_{A^H} = A^\dagger A$ is the orthogonal projector onto $\mathcal{R}(A^H)$,
3. $I - P_{A^H}$ is the orthogonal projector onto $\mathcal{N}(A)$.
Proof. From the Penrose conditions, $AA^\dagger$ is a Hermitian idempotent
and hence is the orthogonal projector onto
$$\mathcal{R}(AA^\dagger) \subset \mathcal{R}(A).$$
Since $A = (AA^\dagger)A$, we have $\operatorname{rank}(AA^\dagger) \ge \operatorname{rank}(A)$. Hence $\mathcal{R}(AA^\dagger) =
\mathcal{R}(A)$, which establishes Part 1.
Part 2 follows from Part 1 on observing
$$P_{A^H} = A^H(A^H)^\dagger = [A^H(A^H)^\dagger]^H = A^\dagger A.$$
To establish Part 3, note that by Part 2
$$A(I - P_{A^H}) = A - AA^\dagger A = A - A = 0,$$
so that $I - P_{A^H}$ is an orthogonal projector into $\mathcal{N}(A)$. But if $Ax = 0$,
then
$$(I - P_{A^H})x = x - A^\dagger Ax = x,$$
so that $I - P_{A^H}$ projects onto $\mathcal{N}(A)$. ∎
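Theorem 1.3 is easy to check numerically. The sketch below (Python with NumPy, using `np.linalg.pinv` for $A^\dagger$; the test matrix and the vector z are arbitrary illustrative choices) confirms that $AA^\dagger$ and $A^\dagger A$ are Hermitian idempotents and that $I - A^\dagger A$ maps into $\mathcal{N}(A)$.

```python
import numpy as np

A = np.arange(1.0, 13.0).reshape(4, 3)     # rank 2, so N(A) is one-dimensional
Ap = np.linalg.pinv(A)
P_A, P_AH = A @ Ap, Ap @ A                 # Parts 1 and 2 of Theorem 1.3

for P in (P_A, P_AH):
    # An orthogonal projector is a Hermitian idempotent.
    print(np.allclose(P, P.conj().T), np.allclose(P @ P, P))

print(np.allclose(P_A @ A, A))             # P_A leaves every column of A unchanged
z = np.array([1.0, 0.0, 0.0])              # an arbitrary vector (illustrative choice)
w = (np.eye(3) - P_AH) @ z
print(np.allclose(A @ w, 0.0))             # Part 3: (I - P_{A^H}) z lies in N(A)
print(np.allclose(np.cross(w, [1.0, -2.0, 1.0]), 0.0))  # N(A) is spanned by (1, -2, 1)
```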
The characterization Ax = PAb of the solution of the least squares
problem (1) is not sufficient to determine x when A is not of full column
rank. The following theorem gives a complete characterization.
Theorem 1.4 (Penrose). The solutions of the problem (1) have the
general form
$$x = A^\dagger b + (I - P_{A^H})z, \qquad (1.5)$$
where z is arbitrary. Of all the solutions, $A^\dagger b$ has the smallest 2-norm.
Proof. By Theorem I.2.5, any minimizing vector x must satisfy $Ax =
P_Ab$. Since $A(A^\dagger b) = P_Ab$, the vector $x = A^\dagger b$ is one solution.
Let us now seek a general solution in the form
$$x = A^\dagger b + y. \qquad (1.6)$$
Since
$$Ay = A(x - A^\dagger b) = Ax - P_Ab = 0,$$
we have $y \in \mathcal{N}(A)$. By Theorem 1.3,
$$y = (I - P_{A^H})z,$$
where z is arbitrary. This establishes (1.5).
Since $A^\dagger b = A^\dagger AA^\dagger b$ lies in $\mathcal{R}(A^H)$, it is orthogonal to $(I - P_{A^H})z$, and from
$$\|x\|_2^2 = \|A^\dagger b\|_2^2 + \|(I - P_{A^H})z\|_2^2$$
we see that $\|x\|_2$ is minimal when z = 0. ∎
When A has full column rank, $P_{A^H} = I$, and the solution of
(1) is unique.
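Theorem 1.4 can be seen at work on a rank-deficient problem. In the sketch below (Python with NumPy; the data and the random vector z are arbitrary illustrative choices) every vector of the form (1.5) attains the same residual, while $A^\dagger b$ is the shortest, and the Pythagorean identity used in the proof holds. NumPy's `lstsq`, which is SVD based, returns the same minimum-norm solution here.

```python
import numpy as np

A = np.arange(1.0, 13.0).reshape(4, 3)       # rank 2: the solution of (1) is not unique
b = np.array([1.0, 0.0, 2.0, 1.0])           # arbitrary right-hand side

Ap = np.linalg.pinv(A)
x_min = Ap @ b                               # A^+ b, i.e. z = 0 in (1.5)
P_AH = Ap @ A                                # orthogonal projector onto R(A^H)

rng = np.random.default_rng(0)
z = rng.standard_normal(3)
x_other = x_min + (np.eye(3) - P_AH) @ z     # another member of the family (1.5)

def residual(x):
    """Norm of the least squares residual b - Ax."""
    return np.linalg.norm(b - A @ x)

print(residual(x_min), residual(x_other))                # equal: both are minimizers
print(np.linalg.norm(x_min), np.linalg.norm(x_other))    # A^+ b is the shorter one

# NumPy's SVD-based lstsq also returns the minimum 2-norm solution here.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_ls, x_min))
```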
We conclude this subsection with two sets of equations that are
always satisfied by least squares solutions. The first are the classic
NORMAL EQUATIONS. The second, which involves both the solution and
the residual, are called the EXPANDED EQUATIONS and are useful in a
number of applications. The proof of the following theorem is left as
an exercise.
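Theorem 1.5. Let x be a solution of the least squares problem (1),
and let $r = b - Ax$ be the associated RESIDUAL VECTOR. Then
$$A^HAx = A^Hb$$
and
$$\begin{pmatrix} I & A \\ A^H & 0 \end{pmatrix} \begin{pmatrix} r \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}. \qquad (1.7)$$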
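Both sets of equations in Theorem 1.5 can be checked against a least squares solution computed independently. The sketch below (Python with NumPy; real random data, so $A^H = A^T$, and the sizes are arbitrary assumptions) verifies the normal equations and then solves the expanded system (1.7) directly for r and x.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3
A = rng.standard_normal((m, n))        # full column rank with probability one
b = rng.standard_normal(m)

x = np.linalg.lstsq(A, b, rcond=None)[0]
r = b - A @ x                          # the residual vector

# Normal equations: A^H A x = A^H b.
print(np.allclose(A.T @ A @ x, A.T @ b))

# Expanded equations (1.7): [[I, A], [A^H, 0]] [r; x] = [b; 0].
K = np.block([[np.eye(m), A],
              [A.T, np.zeros((n, n))]])
sol = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
r_aug, x_aug = sol[:m], sol[m:]
print(np.allclose(x_aug, x), np.allclose(r_aug, r))
```

The second block row of (1.7) is just $A^Hr = 0$, the statement that the residual is orthogonal to $\mathcal{R}(A)$.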
Notes and References
Although Gauss [84, 1821] exhibited the rows of the pseudo-inverse to prove
his celebrated minimum variance theorem (see Exercise 1.9), he had no con-
cept of the pseudo-inverse as a matrix or an operator, and it would be mis-
taken to impute the notion to him. The true fathers of the pseudo-inverse
are Moore [162, 1920], Bjerhammer [34, 1951] (the full rank case), Penrose
[178, 179, 1955, 1956], and to a lesser extent Bergman et al. [26, 1950], who
introduced the (1,2)-inverse of a symmetric matrix and specialized it to the
pseudo-inverse. Penrose, perhaps because of his elegant algebraic charac-
terization of the pseudo-inverse, touched off a vogue in the subject of
generalized inverses. By 1976, Nashed and Rall [163] were able to compile a
bicentennial bibliography running to 1776 entries, in which they all but say
that a number of these papers contributed more to the promotion of their
authors than to the promotion of science.
Things have settled down since then, and now is a good time to sift the
residuum.
Any attempt to use a generalized inverse in the case where the matrix is not
of full rank must come to grips with the fact that under such circumstances
the generalized inverse is not a continuous function of its elements. This
observation was first made by Penrose [178] for the pseudo-inverse, but it
is true of any (1)-inverse.¹ Thus, to use a pseudo-inverse in practice, one
must determine the rank of the matrix and project the errors appropriately.
Unfortunately, determining the rank of a matrix in the presence of errors
is a very difficult problem (e.g., see [211]). For this reason, papers about
the applications of generalized inverses have a certain theoretical air about
them: they leave you all dressed up with nowhere to go.
¹This follows from the fact that $\operatorname{rank}(A) = \operatorname{trace}(A^{(1)}A)$ and that the rank, being an
integer, is discontinuous at any matrix that is not of full rank.
The clear winner in the generalized inverse sweepstakes is the pseudo-inverse
applied to full rank problems. It is unique, continuous, and computable
(although one seldom has need of an explicit pseudo-inverse). Moreover,
its connection with orthogonal projections makes it useful in discussing the
geometry of n-space. A distant second is the Drazin generalized inverse
[61, 1958] (see Exercise 1.23), which enters into the perturbation theory of
eigenvalues and eigenvectors, one of the few cases in which we know the rank
a priori (see Example V.2.10).
The principle of least squares was used by Gauss in astronomical calculations
in the 1790s. However, Legendre [143, 1805] first published the method,
and Gauss's subsequent claim to it [82, 83, 1809] sparked a famous priority
dispute (for a discussion and further references see [219]). The applications
of least squares are far too many to survey here. For the statistician's point of
view, see [196]. [Warning: the notation is quite different. A statistician does
a "regression analysis," which is usually written in the form $y = X\beta + e$,
where X has n rows and typically p columns.] For the numerical analyst's
point of view, see [38, 142].
The least squares problem can be generalized in a number of ways. One is
to replace the 2-norm by an elliptic norm (see Corollary II.1.5 and Exer-
cise II.1.16). Actually this can be done in two ways, since we can generalize
Theorem 1.4 by requiring that x be the solution of minimal T-norm that min-
imizes $\|b - Ax\|_S$. The theory of these problems can be approached through
elliptic pseudo-inverses (see Exercises 1.13-1.15). However, the numerical
solution of such problems requires a different approach (see [171, 172]).
Another generalization is to require that the solution satisfy a linear equality
constraint. This problem can also be approached via generalized inverses, but
the returns on this approach are not yet in. See [64] for details and further
references.
Exercises
1. (Moore's characterization [162]). Show that if
1. $AXA = A$,
2. $XAX = X$,
3. $\mathcal{R}(A) = \mathcal{R}(X^H)$,
4. $\mathcal{R}(A^H) = \mathcal{R}(X)$,
then $X = A^\dagger$.
2. (Bjerhammer's characterization [34]). Let A have full column rank and
let B be any matrix such that $A^HB = 0$ and $(A\ B)$ is nonsingular. If
$$\begin{pmatrix} X^H \\ Y^H \end{pmatrix} = (A\ B)^{-1},$$
then $X^H = A^\dagger$.
3. Show that if $A^{(1)}$ is a 1-inverse, then $\operatorname{rank}(A) \le \operatorname{rank}(A^{(1)})$. Moreover
$AA^{(1)}$ is an (oblique) projection onto the column space of A. If in addition
$A^{(1)}$ is a 3-inverse, then the projection is orthogonal.
4. Let A have the singular value decomposition (1.3). Determine conditions
on T, K, L, and M such that
$$X = V \begin{pmatrix} T & K \\ L & M \end{pmatrix} U^H$$
is a (1,3)-inverse.
5. Establish the existence and uniqueness of the pseudo-inverse directly from
Penrose's conditions.
6. Show that if A has full column rank and A = QR is the QR factorization
of A, then $A^\dagger = R^{-1}Q^H$.
7. Show that if A is of full rank then $\inf_2(A) = \|A^\dagger\|_2^{-1}$. What is $\|A^\dagger\|_2^{-1}$
when A is not of full rank?
8. Prove Theorem 1.2.
9. (Gauss [84, 1821]). Let A have full rank, and let $a_i^{(\dagger)H}$ denote the ith row
of $A^\dagger$. Show that
$$\|a_i^{(\dagger)}\|_2 = \min_{z^HA = e_i^T} \|z\|_2.$$
Conclude that among all matrices Z satisfying $Z^HA = I$ (i.e., among all
LEFT INVERSES $Z^H$ of A) the pseudo-inverse has minimal Frobenius norm. [Note:
Gauss's application was the following. If $b = Ax + e$, where e is a vector
of uncorrelated random variables with mean zero and variance $\sigma^2$, then any
vector satisfying $z^HA = e_1^T$ yields an unbiased estimate $z^Hb$ of the first
component of x. Since the variance of $z^Hb$ is $\sigma^2\|z\|_2^2$, the least squares
estimate is the minimum variance estimate. This result is sometimes called
the Gauss-Markov theorem, although the attribution to Markov is spurious.]
10. (Penrose [178]). Show that the equation AXB = C has a solution if
and only if $AA^{(1)}CB^{(1)}B = C$, in which case the most general form of the
solution is
$$X = A^{(1)}CB^{(1)} + Y - A^{(1)}AYBB^{(1)},$$
where Y is arbitrary.
11. Show that if A is normal, then $(A^n)^\dagger = (A^\dagger)^n$.
12. Show that
$$A^\dagger = \lim_{\tau\to 0}(\tau I + A^HA)^{-1}A^H = \lim_{\tau\to 0}A^H(\tau I + AA^H)^{-1}.$$
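The limit in Exercise 12 can be watched numerically. The following sketch (Python with NumPy; the matrix and the sequence of τ values are arbitrary assumptions) compares $(\tau I + A^HA)^{-1}A^H$ with $A^\dagger$ for decreasing τ on a rank-deficient matrix. It illustrates the exercise rather than proving it.

```python
import numpy as np

A = np.arange(1.0, 13.0).reshape(4, 3)          # rank-deficient 4 x 3 matrix
Ap = np.linalg.pinv(A)

errors = []
for tau in (1e-1, 1e-3, 1e-5):                  # illustrative values of tau
    # (tau I + A^H A)^{-1} A^H, computed by a linear solve instead of an explicit inverse
    X_tau = np.linalg.solve(tau * np.eye(3) + A.conj().T @ A, A.conj().T)
    errors.append(np.linalg.norm(X_tau - Ap))
    print(f"tau = {tau:.0e}: ||X_tau - A^+||_F = {errors[-1]:.2e}")
```

The error shrinks roughly in proportion to τ, as one expects from the factor $\sigma/(\tau + \sigma^2)$ that replaces $1/\sigma$ for each nonzero singular value σ.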
THE FOLLOWING EXERCISES SHOW HOW TO GENERALIZE THE
NOTION OF PSEUDO-INVERSE TO THE CASE OF ELLIPTIC NORMS.
13. Let S and T be positive definite matrices of orders m and n, and let the
last two Penrose conditions be replaced by
3'. SAX is Hermitian,
4'. XAT is Hermitian.
Show that AX is the projection onto $\mathcal{R}(A)$ that is orthogonal with respect
to the inner product $(x, y)_S = y^HSx$. What is XA? The matrix $X \equiv A^{(S,T)}$
is called an ELLIPTIC or WEIGHTED PSEUDO-INVERSE.
14. Show that the most general solution of the problem of minimizing
$\|b - Ax\|_S$ is $x = A^{(S,T)}b + (I - A^{(S,T)}A)z$, where z is arbitrary. In particular
$A^{(S,T)}b$ has minimal T-norm.
15. Show that if A is of full column rank, then
$$A^{(S,T)} = (A^HSA)^{-1}A^HS.$$
In this case we write $A^{(S,T)} = A^{(S)}$.
16. (Paige [171]). Let W be positive definite and let $W = LL^H$. Show that
the problem of minimizing $\|b - Ax\|_{W^{-1}}$ is equivalent to the problem
$$\text{minimize } \|v\|_2 \quad \text{subject to } b = Ax + Lv.$$
The importance of this result is that it works when W is singular: simply
let $W = LL^H$ be a full rank factorization of W.
-0-
WHEN S AND T ARE DIAGONAL THE ELLIPTIC PSEUDO-INVERSES
ARE CALLED SCALED PSEUDO-INVERSES. IN THE FOLLOWING
EXERCISES WE WILL ASSUME THAT A IS OF FULL RANK AND
$D \in \mathcal{D}_+$, THE SET OF ALL DIAGONAL MATRICES WITH POSITIVE
DIAGONAL ELEMENTS.
17. (Stewart [216]). Let
$$\mathcal{X} = \{x \in \mathcal{R}(A) : \|x\| = 1\}$$
and
$$\mathcal{Y} = \{y : \exists D \in \mathcal{D}_+ \text{ such that } A^TDy = 0\}.$$
Let
$$\rho = \inf_{x \in \mathcal{X},\ y \in \mathcal{Y}} \|y - x\|.$$
Show that
$$\sup_{D \in \mathcal{D}_+} \|AA^{(D)}\|_2 \le \rho^{-1}$$
and
$$\sup_{D \in \mathcal{D}_+} \|A^{(D)}\| \le \rho^{-1}\|A^\dagger\|.$$
18. (Stewart [216]). For any matrix X let $\inf_+(X)$ be the smallest nonzero
singular value of X. Let the columns of U form an orthonormal basis for
$\mathcal{R}(A)$. Let
$$\hat\rho = \min \inf_+(U_I),$$
where $U_I$ denotes any submatrix formed from a set of rows of U. Show that
$\hat\rho \ge \rho$.
19. (O'Leary [165]). Show that
$\hat\rho \le \rho$.
20. Devise an example to show that the above results do not hold when D
ranges over the space of positive definite matrices.
-0-
THE FOLLOWING EXERCISES TREAT THE LEAST SQUARES
PROBLEM WITH LINEAR EQUALITY CONSTRAINTS:
$$\text{minimize } \|b_2 - A_2x\|_2 \quad \text{subject to } A_1x = b_1. \qquad (1.8)$$
21. Let $A_1$ have full row rank and $A_2$ have full column rank. Let
$$A_\tau = \begin{pmatrix} \tau A_1 \\ A_2 \end{pmatrix} \quad \text{and} \quad b_\tau = \begin{pmatrix} \tau b_1 \\ b_2 \end{pmatrix}.$$
Show that
$$x = \lim_{\tau\to\infty} A_\tau^\dagger b_\tau$$
is the solution of (1.8). [Note: This result suggests that we can solve a
constrained least squares problem by taking τ large enough and solving an
ordinary least squares problem. In essence this is true, but precautions must
be taken against the effect of rounding errors. See [249] for more.]
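Exercise 21 suggests a simple computational scheme, often called the weighting method. The sketch below (Python with NumPy; the helper names and the random test problem are our own assumptions) first solves (1.8) by a null-space method, used here only as an independent reference and not taken from the text, and then solves the stacked problem by ordinary least squares for increasing τ. Moderate values of τ behave as the exercise predicts; as the note warns, very large τ calls for more care with rounding errors than this sketch takes.

```python
import numpy as np

def constrained_ls_nullspace(A1, b1, A2, b2):
    """Reference solution of (1.8): x = x0 + N y with A1 x0 = b1 and N spanning N(A1)."""
    x0 = np.linalg.pinv(A1) @ b1                 # particular solution (A1 has full row rank)
    _, s, Vh = np.linalg.svd(A1)                 # full SVD: trailing rows of Vh span N(A1)
    N = Vh[len(s):].conj().T
    y = np.linalg.lstsq(A2 @ N, b2 - A2 @ x0, rcond=None)[0]
    return x0 + N @ y

def constrained_ls_weighted(A1, b1, A2, b2, tau):
    """Ordinary least squares for A_tau = [tau A1; A2], b_tau = [tau b1; b2]."""
    A_tau = np.vstack([tau * A1, A2])
    b_tau = np.concatenate([tau * b1, b2])
    return np.linalg.lstsq(A_tau, b_tau, rcond=None)[0]

rng = np.random.default_rng(2)
A1, b1 = rng.standard_normal((2, 5)), rng.standard_normal(2)   # 2 constraints
A2, b2 = rng.standard_normal((8, 5)), rng.standard_normal(8)   # 8 observations

x_ref = constrained_ls_nullspace(A1, b1, A2, b2)
errs = []
for tau in (1e1, 1e2, 1e4):                      # illustrative weights
    x_tau = constrained_ls_weighted(A1, b1, A2, b2, tau)
    errs.append(np.linalg.norm(x_tau - x_ref))
    print(f"tau = {tau:.0e}: ||x_tau - x|| = {errs[-1]:.1e}")
```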
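22. (Wedin [262]). Show that if $A_1$ has full row rank, then the constrained
problem (1.8) has a unique solution which satisfies
$$\begin{pmatrix} 0 & 0 & A_1 \\ 0 & I & A_2 \\ A_1^H & A_2^H & 0 \end{pmatrix} \begin{pmatrix} \mu \\ r \\ x \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ 0 \end{pmatrix},$$
where μ is a vector and
$$\begin{pmatrix} 0 \\ r \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} - \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} x.$$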
-0-
THE FOLLOWING EXERCISES DEVELOP THE DRAZIN GENERAL-
IZED INVERSE [61]. DRAZIN ORIGINALLY DEFINED HIS INVERSE
FOR ELEMENTS OF RINGS AND SEMIGROUPS. HERE WE AP-
PROACH THE DRAZIN INVERSE THROUGH THE JORDAN CANON-
ICAL FORM.
23. Let the Jordan form of A be written
$$A = X_1J_{k_1}(\lambda_1)Y_1^H + X_2J_{k_2}(\lambda_2)Y_2^H + \cdots + X_lJ_{k_l}(\lambda_l)Y_l^H \qquad (1.9)$$
(see the subsection on invariant subspaces in Section I.3). Define
$$J_k(\lambda)^\# = \begin{cases} J_k(\lambda)^{-1} & \text{if } \lambda \ne 0, \\ 0 & \text{if } \lambda = 0, \end{cases}$$
and
$$A^\# = X_1J_{k_1}(\lambda_1)^\#Y_1^H + X_2J_{k_2}(\lambda_2)^\#Y_2^H + \cdots + X_lJ_{k_l}(\lambda_l)^\#Y_l^H.$$
Show that $A^\#$ is the unique matrix satisfying
1. $A^\#A = AA^\#$,
2. $A^\# = (A^\#)^2A$,
3. there is an integer m such that $A^m = A^{m+1}A^\#$.
The matrix $A^\#$ is called the DRAZIN GENERALIZED INVERSE of A.
24. Let A have the Jordan form (1.9) and let λ be an eigenvalue of A. The
invariant subspace corresponding to λ is the space
$$\bigoplus_{\lambda_i = \lambda} \mathcal{R}(X_i).$$
Show that $P_\lambda = I - (\lambda I - A)(\lambda I - A)^\#$ is a (generally oblique) projection
onto the invariant subspace associated with λ. What subspace does it project
along? The matrix $P_\lambda$ is called the SPECTRAL PROJECTION of A.
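For small examples, the Drazin inverse of Exercise 23 can be checked numerically without computing a Jordan form, which is numerically unreliable. The sketch below (Python with NumPy) swaps in a different characterization that we take as an assumption here rather than from the text: $A^\# = A^l(A^{2l+1})^\dagger A^l$ for any l at least the index of A, so l = n always suffices. The helper `pinv_truncated` and its relative cutoff `1e-10` are illustrative choices. The test matrix is built from a known Jordan structure so the result can be compared with the definition in Exercise 23.

```python
import numpy as np

def pinv_truncated(M, rtol=1e-10):
    """Pseudo-inverse treating singular values below rtol * sigma_1 as zero (assumed cutoff)."""
    U, s, Vh = np.linalg.svd(M)
    r = int(np.sum(s > rtol * s[0]))
    return (Vh[:r].conj().T / s[:r]) @ U[:, :r].conj().T

def drazin(A):
    """Drazin inverse via A^# = A^l (A^(2l+1))^+ A^l with l = n (at least the index of A)."""
    n = A.shape[0]
    Al = np.linalg.matrix_power(A, n)
    return Al @ pinv_truncated(np.linalg.matrix_power(A, 2 * n + 1)) @ Al

# A = S B S^{-1}, where B = diag(J_1(2), J_2(0)) contains a nilpotent Jordan block.
S = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
B = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
A = S @ B @ np.linalg.inv(S)
AD = drazin(A)

# The definition in Exercise 23 gives S diag(1/2, 0, 0) S^{-1}.
expected = S @ np.diag([0.5, 0.0, 0.0]) @ np.linalg.inv(S)
print(np.allclose(AD, expected))
# The three conditions of Exercise 23, with m = 3.
print(np.allclose(AD @ A, A @ AD),
      np.allclose(AD, AD @ AD @ A),
      np.allclose(np.linalg.matrix_power(A, 3), np.linalg.matrix_power(A, 4) @ AD))
```

Powers of A grow quickly, so this formula is only a convenient check for small, well-scaled examples.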
-0-
2. Inverses and Linear Systems
In this section we will be concerned with two related problems. Let
$A \in \mathbb{C}^{n\times n}$ be nonsingular and let $\tilde A = A + E$ be a perturbation of A. The
first problem is to determine under what conditions $\tilde A$ is nonsingular
and to bound a norm of $\tilde A^{-1} - A^{-1}$. The second problem is to derive
bounds on some norm of $\tilde x - x$, where x is a solution of the linear system
$$Ax = b$$
and $\tilde x$ is the solution of
$$\tilde A\tilde x = b.$$
The close relation between the two problems is due to the fact that
$$\tilde x - x = (\tilde A^{-1} - A^{-1})b.$$
In the statement of these problems, note the use of a tilde to denote a
perturbed quantity. We will use this notational convention throughout
the book:
A symbol with a tilde over it always denotes a perturbed
quantity. The unperturbed quantity is denoted by the
same symbol without a tilde.
Here we must distinguish between primary perturbations and derived
perturbations. An example of the former is $\tilde A$, which is obtained
from A by the addition of an explicit perturbation E. The latter is
represented by $\tilde x$, which is not explicitly perturbed but depends on the
primary perturbation E. In perturbation analysis our goal is usually
to obtain bounds on derived perturbations in terms of an explicit
perturbation; e.g., to bound some norm of $\tilde x - x$ in terms of some norm
of E. Generally, we will represent explicit perturbations by the letters
E, F, G, H or their lower-case and Greek analogues. Usually, but not
always, these letters will carry the implication that the perturbation is
in some sense small.
2.1. Absolute and Relative Errors
Before we can proceed to the perturbation theory of linear systems, we
must first discuss how errors are to be represented. For scalars there
are two representations in general use.
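Definition 2.1. Let $\alpha, \tilde\alpha \in \mathbb{C}$. The ABSOLUTE ERROR or simply the
ERROR in $\tilde\alpha$ regarded as an approximation to α is the number
$$\operatorname{ae}(\alpha, \tilde\alpha) = |\tilde\alpha - \alpha|.$$
If $\alpha \ne 0$, then the RELATIVE ERROR in $\tilde\alpha$ is
$$\operatorname{re}(\alpha, \tilde\alpha) = \frac{|\tilde\alpha - \alpha|}{|\alpha|}.$$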
The absolute and relative errors have a number of elementary prop-
erties, which we list here without proof.
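Theorem 2.2. In the notation of Definition 2.1:
1. There is a number ε with $|\epsilon| = \operatorname{ae}(\alpha, \tilde\alpha)$ such that $\tilde\alpha = \alpha + \epsilon$.
2. There is a number ρ with $|\rho| = \operatorname{re}(\alpha, \tilde\alpha)$ such that $\tilde\alpha = \alpha(1 + \rho)$.
3. If $\operatorname{re}(\alpha, \tilde\alpha) < 1$, then
$$\frac{\operatorname{re}(\alpha, \tilde\alpha)}{1 + \operatorname{re}(\alpha, \tilde\alpha)} \le \operatorname{re}(\tilde\alpha, \alpha) \le \frac{\operatorname{re}(\alpha, \tilde\alpha)}{1 - \operatorname{re}(\alpha, \tilde\alpha)}.$$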
4. If $\operatorname{re}(\alpha, \tilde\alpha) = 10^{-t}$, then α and $\tilde\alpha$ agree to approximately t decimal
digits.
The first item in the theorem says that if the absolute error is small,
then $\tilde\alpha$ may be obtained from α by adding a number near zero. Sim-
ilarly, if the relative error is small, then $\tilde\alpha$ may be obtained from α
by multiplying by a number near one. The third item says that if the
relative error is small, then for practical purposes $\operatorname{re}(\alpha, \tilde\alpha)$ and $\operatorname{re}(\tilde\alpha, \alpha)$
are the same. The last item gives a quick way of estimating relative
error; e.g., we see from it that 3.14159, regarded as an approximation
to π, has a relative error of about $10^{-6}$.
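The π example and the third item of Theorem 2.2 can be reproduced with a few lines of plain Python. The helper names `ae` and `re` below simply mirror the notation of Definition 2.1 and are our own illustrative choice.

```python
import math

def ae(alpha, alpha_t):
    """Absolute error of alpha_t as an approximation to alpha (Definition 2.1)."""
    return abs(alpha_t - alpha)

def re(alpha, alpha_t):
    """Relative error of alpha_t as an approximation to alpha; alpha must be nonzero."""
    return abs(alpha_t - alpha) / abs(alpha)

rho = re(math.pi, 3.14159)
print(f"ae = {ae(math.pi, 3.14159):.2e}, re = {rho:.2e}")   # re is roughly 10^-6
print(f"digits of agreement ~ {-math.log10(rho):.1f}")      # item 4: about 6

# Item 3: re(alpha~, alpha) lies between rho/(1 + rho) and rho/(1 - rho).
rho_rev = re(3.14159, math.pi)
print(rho / (1 + rho), rho_rev, rho / (1 - rho))
```

Because 3.14159 is smaller than π, we have $\tilde\alpha = \alpha(1 - \rho)$ exactly, so here $\operatorname{re}(\tilde\alpha, \alpha)$ coincides with the upper bound of item 3 (up to rounding).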
It is natural to attempt to generalize absolute and relative error to
vectors and matrices by replacing the absolute value by a norm. The
result is the following definition, which, however, is not without its
difficulties.
Definition 2.3. Let $A, \tilde A \in \mathbb{C}^{m\times n}$ and let $\|\cdot\|$ be a norm on $\mathbb{C}^{m\times n}$.
The ABSOLUTE ERROR or simply the ERROR in $\tilde A$ regarded as an approx-
imation to A is the number
$$\operatorname{ae}(A, \tilde A) = \|\tilde A - A\|.$$
If $A \ne 0$, then the RELATIVE ERROR in $\tilde A$ is
$$\operatorname{re}(A, \tilde A) = \frac{\|\tilde A - A\|}{\|A\|}.$$
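The following theorem lists the analogues of the items in Theorem 2.2.
Theorem 2.4. In the notation of Definition 2.3:
1. There is a matrix E with $\|E\| = \operatorname{ae}(A, \tilde A)$ such that $\tilde A = A + E$.
2. If $\|\cdot\|$ is consistent and there is a matrix R such that $\tilde A =
A(I + R)$, then $\operatorname{re}(A, \tilde A) \le \|R\|$.
3. If $\operatorname{re}(A, \tilde A) < 1$, then
$$\frac{\operatorname{re}(A, \tilde A)}{1 + \operatorname{re}(A, \tilde A)} \le \operatorname{re}(\tilde A, A) \le \frac{\operatorname{re}(A, \tilde A)}{1 - \operatorname{re}(A, \tilde A)}.$$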
The difficulty with Definition 2.3 is that it attempts to char-
acterize the errors in the mn elements of A by a single number. Some
information has to be lost in the process, and that information may
be important. This is illustrated by comparing the second items in
Theorems 2.2 and 2.4. For scalars, the statement $\rho = \operatorname{re}(\alpha, \tilde\alpha)$ means
that $\tilde\alpha$ is obtained from α by multiplying by a number within ρ of
one. For matrices, we would like to say that $\rho = \operatorname{re}(A, \tilde A)$ means that
$\tilde A = A(I + R)$, where $\|R\| \le \rho$. However, if A is singular such an R
may not exist, and if A is nonsingular the most we can say about R
is that $\|R\| \le \rho\|A\|\|A^{-1}\|$ (see Exercise 2.1). Thus to report the fact
that $\tilde A = A(I + R)$ by saying that the relative error in $\tilde A$ is $\|R\|$ is to
give away information.
The absence of a fourth item from Theorem 2.4 illustrates another
loss of information. For example, suppose that with respect to the
∞-norm we have
$$\operatorname{re}_\infty(x, \tilde x) = 10^{-t}.$$
Then we know that the largest components of x and $\tilde x$ agree to roughly
t decimal digits. But the best thing we can say about the smaller
components is that if a component $\xi_i$ of x satisfies $|\xi_i| = 10^{-k}\|x\|_\infty$, then $\xi_i$ and
the corresponding component $\tilde\xi_i$ of $\tilde x$ agree to about t - k significant figures. In particular, if k > t then the relative error
says nothing at all about the accuracy of $\tilde\xi_i$.
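A small numerical experiment makes this loss of information concrete. In the sketch below (Python with NumPy; the vector and the perturbation pattern are arbitrary illustrative choices) every component of x is perturbed by the same amount $10^{-t}\|x\|_\infty$ with t = 4. The normwise relative error is $10^{-4}$, yet the component of size $10^{-3}$ keeps only about one correct digit and the component of size $10^{-6}$ keeps none.

```python
import numpy as np

x = np.array([1.0, 1.0e-3, 1.0e-6])      # components of very different magnitudes
t = 4
# Perturb every component by the same amount, 10^-t * ||x||_inf (an arbitrary choice).
x_t = x + 10.0**-t * np.linalg.norm(x, np.inf) * np.array([1.0, -1.0, 1.0])

re_inf = np.linalg.norm(x_t - x, np.inf) / np.linalg.norm(x, np.inf)
comp_re = np.abs(x_t - x) / np.abs(x)     # relative error of each component
print(re_inf)     # 1e-4 up to rounding: the normwise relative error looks small
print(comp_re)    # about [1e-4, 1e-1, 1e2]: k = 3 leaves t - k = 1 digit, k = 6 > t leaves none
```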
As we shall see, structured perturbation theorems, such as Theo-
rem 2.14, provide partial relief from these problems, as do component-
wise bounds (Theorem 2.12). Nonetheless, we will cast many of our
results in terms of absolute and relative errors of vectors and matri-
ces. In the first place, bounds of this form do say something about the
larger components. Moreover, the use of absolute and relative errors
gives perturbation bounds a simplicity that makes them easier to inter-
pret. Finally, in applications we can often scale the problem so that the
components of the quantity being bounded are approximately equal, in
which case a bound on the relative error is as good as bounds on the
individual components.
2.2. The Inverse Matrix
The fundamental results on matrix inverses are contained in the follow-
ing theorem.
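Theorem 2.5. Let $A \in \mathbb{C}^{n\times n}$ be nonsingular and let $\tilde A = A + E$ be
a perturbation of A. Let $\|\cdot\|$ be a consistent matrix norm. If $\tilde A$ is
nonsingular, then
$$\frac{\|\tilde A^{-1} - A^{-1}\|}{\|\tilde A^{-1}\|} \le \|A^{-1}E\|. \qquad (2.1)$$
Alternatively, if
$$\|A^{-1}E\| < 1,$$
then $\tilde A$ is perforce nonsingular and
$$\|\tilde A^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}E\|}. \qquad (2.2)$$
Moreover
$$\frac{\|\tilde A^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}. \qquad (2.3)$$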
Remark 2.6. The theorem remains valid if $A^{-1}E$ is everywhere re-
placed by $EA^{-1}$.
Proof. Since $\tilde A$ is nonsingular, $\tilde A\tilde A^{-1} = (A + E)\tilde A^{-1} = I$, or $(I +
A^{-1}E)\tilde A^{-1} = A^{-1}$. Hence
$$\tilde A^{-1} - A^{-1} = -A^{-1}E\tilde A^{-1}, \qquad (2.4)$$
and (2.1) follows on taking norms in (2.4).
If $\|A^{-1}E\| < 1$, the spectral radius of $A^{-1}E$ is less than one and
$I + A^{-1}E$ is nonsingular. From $\tilde A = A(I + A^{-1}E)$ it follows that $\tilde A$
is nonsingular. Moreover, from (2.4) we also have $\|\tilde A^{-1}\| \le \|A^{-1}\| +
\|A^{-1}E\|\|\tilde A^{-1}\|$, and (2.2) follows on solving this inequality for $\|\tilde A^{-1}\|$.
Finally, (2.3) follows from (2.1) and the third item in Theorem 2.4. ∎
This theorem establishes four things: 1. a bound on $\operatorname{re}(\tilde A^{-1}, A^{-1})$
which holds whenever $\tilde A$ is nonsingular; 2. a condition under which $\tilde A$
is nonsingular; 3. a bound on $\|\tilde A^{-1}\|$; 4. a bound on $\operatorname{re}(A^{-1}, \tilde A^{-1})$. All
these are cast in terms of the number $\|A^{-1}E\|$ (or, by Remark 2.6, the
number $\|EA^{-1}\|$). However, in many applications we will not know E
explicitly, only an estimate of $\|E\|$. The following corollary, which also
defines the condition number of a matrix, addresses these applications.
It is proved by replacing $\|A^{-1}E\|$ by the upper bound $\|A^{-1}\|\|E\|$ in the
conclusions of Theorem 2.5.
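The bounds of Theorem 2.5 and Remark 2.6 can be checked on a random example. The sketch below (Python with NumPy) uses the spectral norm, which is consistent; the matrix size, the random seed, and the perturbation scale `1e-6` are arbitrary assumptions.

```python
import numpy as np

def norm2(M):
    """Spectral norm, a consistent matrix norm."""
    return np.linalg.norm(M, 2)

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
E = 1e-6 * rng.standard_normal((n, n))           # illustrative perturbation size
A_t = A + E                                      # the perturbed matrix A~

Ainv, Atinv = np.linalg.inv(A), np.linalg.inv(A_t)
g = norm2(Ainv @ E)                              # ||A^{-1} E||
rel_t = norm2(Atinv - Ainv) / norm2(Atinv)       # re(A~^{-1}, A^{-1})
rel = norm2(Atinv - Ainv) / norm2(Ainv)          # re(A^{-1}, A~^{-1})

print(rel_t <= g)                                # (2.1)
print(g < 1 and norm2(Atinv) <= norm2(Ainv) / (1 - g))   # (2.2)
print(g < 1 and rel <= g / (1 - g))              # (2.3)
print(rel_t <= norm2(E @ Ainv))                  # (2.1) with E A^{-1}, as in Remark 2.6
```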
Corollary 2.7. Let
$$\kappa(A) = \|A\|\|A^{-1}\|$$
be the CONDITION NUMBER of A. If $\tilde A$ is nonsingular, then
$$\frac{\|\tilde A^{-1} - A^{-1}\|}{\|\tilde A^{-1}\|} \le \kappa(A)\frac{\|E\|}{\|A\|}.$$
If in addition
$$\kappa(A)\frac{\|E\|}{\|A\|} < 1,$$
then $\tilde A$ is perforce nonsingular and
$$\|\tilde A^{-1}\| \le \frac{\|A^{-1}\|}{1 - \kappa(A)\frac{\|E\|}{\|A\|}}. \qquad (2.5)$$
Moreover
$$\frac{\|\tilde A^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \frac{\kappa(A)\frac{\|E\|}{\|A\|}}{1 - \kappa(A)\frac{\|E\|}{\|A\|}}. \qquad (2.6)$$
It is instructive to look at the inequality (2.6) in greater detail. The
left-hand side is the relative error (with respect to the norm $\|\cdot\|$) of $\tilde A^{-1}$
regarded as an approximation to $A^{-1}$. Its bound on the right-hand
side consists of two factors. The first is the relative error $\|E\|/\|A\|$ in
$\tilde A$ regarded as an approximation to A. The second is the quotient
$$\frac{\kappa(A)}{1 - \kappa(A)\frac{\|E\|}{\|A\|}}.$$
If $\kappa(A)\|E\|/\|A\|$ is much less than one, as it will be when the error E is small
enough, the denominator has negligible effect, and the second factor is
essentially κ(A). Thus, the inequality (2.6) says that
a relative error in the matrix A may be magnified by as
much as κ(A) in its inverse.
The word "magnify" is appropriate here because
$$1 \le \|I\| = \|AA^{-1}\| \le \|A\|\|A^{-1}\| = \kappa(A).$$
It is difficult to overstate the insight that one gets from this bound.
Its precursor, the inequality (2.3), suggests that the inverse of a ma-
trix will be WELL CONDITIONED, that is, insensitive to perturbations,
provided the inverse is sufficiently small. But it does not say what "suf-
ficiently small" is. The bound (2.6), on the other hand, makes it clear
that a well-conditioned matrix is one with a small condition number.
And small is well defined, since the condition number is bounded below
by unity. For example, to the extent that Item 4 in Theorem 2.2 holds
for relative errors in matrices, we have the following rule of thumb:
If a matrix has a condition number of $10^k$ and its elements
are perturbed in their t-th digits, then the elements of its
inverse will be perturbed in their (t - k)-th digits.
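The rule of thumb and the bound (2.6) can be tried on the Hilbert matrix, a standard ill-conditioned test case that we choose here for illustration (it is not used in the text). In the sketch below (Python with NumPy; n = 6, t = 10, and the random perturbation are arbitrary assumptions) $\kappa_2 \approx 10^7$, so perturbations in the tenth digit of the elements should disturb the inverse at roughly the $10^{-3}$ level, and the observed relative error must stay below the bound (2.6).

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix H[i, j] = 1 / (i + j + 1), a classic ill-conditioned test matrix."""
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)

n, t = 6, 10                                       # illustrative size and digit position
A = hilbert(n)
kappa = np.linalg.cond(A, 2)                       # roughly 1.5e7, i.e. k is about 7
rng = np.random.default_rng(4)
E = 10.0**-t * np.abs(A) * rng.uniform(-1.0, 1.0, (n, n))   # perturb each element in digit t
Ainv, Atinv = np.linalg.inv(A), np.linalg.inv(A + E)

rel_A = np.linalg.norm(E, 2) / np.linalg.norm(A, 2)
rel_inv = np.linalg.norm(Atinv - Ainv, 2) / np.linalg.norm(Ainv, 2)
q = kappa * rel_A
print(f"kappa = {kappa:.1e}, rel. error in A = {rel_A:.1e}, in inverse = {rel_inv:.1e}")
print("bound (2.6):", q / (1 - q), rel_inv <= q / (1 - q))
```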
People often say that ill-conditioned matrices are nearly singular.
The following theorem gives substance to this way of speaking.
Theorem 2.8. Let A be a nonsingular matrix. Let $\kappa(A) = \|A\|\|A^{-1}\|$
be the condition number of A with respect to a consistent matrix norm
$\|\cdot\|$. Then for any matrix E,
$$A + E \text{ is singular} \implies \frac{\|E\|}{\|A\|} \ge \kappa^{-1}(A). \qquad (2.7)$$
Moreover, if the norm $\|\cdot\|$ is subordinate to a vector norm (also written
$\|\cdot\|$), then there is a matrix E with $\|E\|/\|A\| = \kappa^{-1}(A)$ such that A + E
is singular.
Proof. If A + E is singular, then by Theorem 2.5
$$1 \le \|A^{-1}E\| \le \|A^{-1}\|\|E\|,$$
which with a little manipulation is seen to be equivalent to the impli-
cand in (2.7).
Now suppose that $\|\cdot\|$ is subordinate to the vector norm $\|\cdot\|$,
and let x be a vector of norm one such that $\|A^{-1}x\| = \|A^{-1}\|$. Let
$y = A^{-1}x/\|A^{-1}\|$, so that
$$Ay = \frac{x}{\|A^{-1}\|}.$$
Let $\|\cdot\|_*$ be the norm dual to the vector norm $\|\cdot\|$, and let z be a
vector such that $\|z\|_* = 1$ and
$$z^Hy = \max_{\|u\|_* = 1} |u^Hy|.$$
It follows from Theorem II.1.12 that
$$z^Hy = \|y\| = 1.$$
Let
$$E = -\frac{xz^H}{\|A^{-1}\|}.$$
Then
$$(A + E)y = Ay - \frac{x(z^Hy)}{\|A^{-1}\|} = \frac{x}{\|A^{-1}\|} - \frac{x}{\|A^{-1}\|} = 0,$$
so that A + E is singular. The theorem will be proved if we can show
that $\|E\| = \|A^{-1}\|^{-1}$. But
$$\|E\| = \max_{\|v\| = 1}\frac{\|(xz^H)v\|}{\|A^{-1}\|} = \frac{\|x\|}{\|A^{-1}\|}\max_{\|v\| = 1}|z^Hv| = \|x\|\,\|z\|_*\,\|A^{-1}\|^{-1},$$
and the result follows from the fact that x and z have norm one. ∎
The first part of this theorem states that to make a matrix A sin-
gular we must introduce a relative perturbation of at least $\kappa^{-1}(A)$.
Thus, well-conditioned matrices are not nearly singular. The second
part says that for a broad class of norms, which includes the Hölder
norms, we can make A singular by introducing a relative perturbation
of $\kappa^{-1}(A)$. In these norms, the larger the condition number, the nearer
to singularity is the matrix.
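The construction in the proof of Theorem 2.8 can be carried out directly for the spectral norm, which is its own dual. In the sketch below (Python with NumPy; the function name and the random test matrix are our own illustrative choices) x is taken as the leading right singular vector of $A^{-1}$, so that $\|A^{-1}x\|_2 = \|A^{-1}\|_2$, and the resulting rank-one E makes A + E singular with $\|E\|_2/\|A\|_2 = \kappa_2^{-1}(A)$.

```python
import numpy as np

def singularizing_perturbation(A):
    """Proof of Theorem 2.8 specialized to the 2-norm (self-dual).

    Returns E with ||E||_2 = 1 / ||A^{-1}||_2 and a unit vector y with (A + E) y = 0.
    """
    Ainv = np.linalg.inv(A)
    _, s, Vh = np.linalg.svd(Ainv)
    x = Vh[0].conj()                    # unit vector with ||A^{-1} x||_2 = ||A^{-1}||_2 = s[0]
    y = Ainv @ x / s[0]                 # so that A y = x / ||A^{-1}||
    z = y / np.linalg.norm(y)           # dual vector: ||z||_2 = 1 and z^H y = ||y||_2 = 1
    E = -np.outer(x, z.conj()) / s[0]
    return E, y

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))         # illustrative test matrix
E, y = singularizing_perturbation(A)
print(np.linalg.norm(E, 2) / np.linalg.norm(A, 2), 1.0 / np.linalg.cond(A, 2))  # equal
print(np.linalg.norm((A + E) @ y))                                            # ~ 0
```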
In general, the condition number is not easy to characterize. How-
ever, for the spectral norm, the condition number can be expressed in
terms of the singular values. The proof of the following theorem is left
as an exercise.
Theorem 2.9. Let A have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n > 0$.
Then
$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n}.$$
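Theorem 2.9 is easy to confirm numerically. The sketch below (Python with NumPy; the random 4 × 4 matrix is an arbitrary choice) computes $\kappa_2(A)$ three ways: from the singular values, from the definition $\|A\|_2\|A^{-1}\|_2$, and with `np.linalg.cond`.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))                    # illustrative nonsingular matrix
s = np.linalg.svd(A, compute_uv=False)             # sigma_1 >= ... >= sigma_n
kappa_svd = s[0] / s[-1]                           # Theorem 2.9
kappa_def = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa_svd, kappa_def, np.linalg.cond(A, 2))  # the same value up to rounding
```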
As modern numerical linear algebra began to develop in the late
1940s, positive definite matrices were very much to the fore. Since the
singular values and the eigenvalues of a positive definite matrix are the
same, the condition number was sometimes defined as the ratio of the
largest to the smallest eigenvalue. Certainly if this ratio is large, the
matrix must be ill conditioned (see Exercise 2.4). However, the following
example shows that it can fail completely to indicate ill-conditioning.
Example 2.10. Let $A_n$ be the matrix illustrated below for n = 5:
$$A_5 = \begin{pmatrix} 1 & -1 & -1 & -1 & -1 \\ 0 & 1 & -1 & -1 & -1 \\ 0 & 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$
The eigenvalues of $A_n$ are all one; hence the ratio of the largest to the
smallest eigenvalue is one. On the other hand it is easily seen from the
equation
$$\begin{pmatrix} 1 & -1 & -1 & -1 & -1 \\ 0 & 1 & -1 & -1 & -1 \\ 0 & 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 1 & -1 \\ -2^{-3} & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1/2 \\ 1/4 \\ 1/8 \\ 1/8 \end{pmatrix} = 0$$
that replacing the (n,1)-element of $A_n$ by $-2^{2-n}$ makes $A_n$ exactly singu-
lar. This perturbation has 2-norm $2^{2-n}$, and since $\|A_n\|_2 \ge 1$, it follows from
Theorem 2.8 that $\kappa_2(A_n) \ge 2^{n-2}$.
This example also shows that the determinant is not a good measure
of singularity, since the determinant of $A_n$ is always one.
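Example 2.10 can be reproduced in a few lines. The sketch below (Python with NumPy; the function name and the chosen values of n are illustrative) reads the eigenvalues off the diagonal of the triangular matrix $A_n$, computes its determinant and its spectral condition number, and verifies the singular perturbation for n = 5.

```python
import numpy as np

def example_2_10(n):
    """A_n from Example 2.10: ones on the diagonal and -1 everywhere above it."""
    return np.eye(n) - np.triu(np.ones((n, n)), k=1)

for n in (5, 10, 20):
    A = example_2_10(n)
    # A_n is triangular, so its eigenvalues are its diagonal entries (all one).
    print(n, np.all(np.diag(A) == 1.0), np.linalg.det(A),
          np.linalg.cond(A, 2), 2.0**(n - 2))       # kappa_2 exceeds 2^(n-2)

# The singular perturbation from the text, for n = 5.
B = example_2_10(5)
B[4, 0] = -2.0**(2 - 5)                             # (n, 1) element set to -2^(2-n)
v = np.array([1.0, 1/2, 1/4, 1/8, 1/8])
print(np.allclose(B @ v, 0.0))                      # B is exactly singular
```

The determinant stays at one while the condition number grows exponentially with n.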
Corollary 2.7 shows that the condition number will never underes-
timate the sensitivity of a matrix to perturbations. Theorem 2.8 shows
that for subordinate norms the condition number is sharp in that it
truly estimates the distance to singularity. Nonetheless, when the con-
dition number is used in practice, it often overestimates the actual
error. The phenomenon is known as ARTIFICIAL ILL-CONDITIONING, and
it is instructive to see how it comes about.
Consider the matrix
$$A_\eta = \begin{pmatrix} 1 & \eta \\ 1 & -\eta \end{pmatrix}, \qquad \eta > 0.$$
Its inverse is
$$A_\eta^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ \eta^{-1} & -\eta^{-1} \end{pmatrix}.$$
From this it is seen that if η < 1,
$$\kappa_\infty(A_\eta) = 1 + \frac{1}{\eta},$$
and $A_\eta$ becomes increasingly ill conditioned as η approaches zero.
Now in the sense of being nearly singular, $A_\eta$ is truly ill conditioned
when η is small, for we can construct a small perturbation that will
render $A_\eta$ singular. In fact if
$$E_\eta = \begin{pmatrix} 0 & -\eta \\ 0 & \eta \end{pmatrix}, \qquad (2.8)$$
then $\|E_\eta\|_\infty = \eta$ and $A_\eta + E_\eta$ is singular. However, in applications we do
not construct the error in A; rather it is given as part of the problem.
For example, suppose that η represents a column scaling of an original
matrix $A_1$ (extreme scaling can result from changes of units; e.g., years
to seconds). This matrix is presumed to have an error E in it; say its
elements are bounded by ε, where ε is small. Now when $A_1$ is scaled
so that it becomes $A_\eta$, E inherits that scaling, so that its elements are
bounded by the elements of the matrix
$$\begin{pmatrix} \epsilon & \eta\epsilon \\ \epsilon & \eta\epsilon \end{pmatrix}.$$
This says that a perturbation like $E_\eta$ in (2.8) cannot occur: its second
column would require ε ≥ 1. In other
words, although $A_\eta$ becomes increasingly ill conditioned, the nature of
the underlying problem ensures that perturbations exhibiting this ill-
conditioning are forbidden. It is this restriction on the range of the
perturbations that makes the ill-conditioning artificial.
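The following sketch (Python with NumPy) contrasts the two kinds of perturbation for $A_\eta$. The values η = 1e-6 and ε = 1e-8, the random sampling of admissible errors, and the number of trials are illustrative assumptions. The unstructured perturbation $E_\eta$ makes $A_\eta$ singular, but perturbations that respect the column scaling disturb the inverse by a relative amount that is a small multiple of ε, nowhere near the factor $\kappa_\infty(A_\eta) \approx 10^6$.

```python
import numpy as np

eta, eps = 1e-6, 1e-8                       # column scaling and data-error level (illustrative)
A = np.array([[1.0, eta], [1.0, -eta]])     # A_eta
Ainv = np.linalg.inv(A)
kappa = np.linalg.norm(A, np.inf) * np.linalg.norm(Ainv, np.inf)
print(kappa, 1 + 1 / eta)                   # kappa_inf(A_eta) = 1 + 1/eta

# The unstructured perturbation E_eta of (2.8), of size eta, makes A_eta singular ...
E_eta = np.array([[0.0, -eta], [0.0, eta]])
print(np.linalg.matrix_rank(A + E_eta))     # 1

# ... but errors inherited through the scaling are bounded elementwise by this pattern.
pattern = np.array([[eps, eta * eps], [eps, eta * eps]])
rng = np.random.default_rng(7)
worst = 0.0
for _ in range(1000):                       # random admissible errors (illustrative sampling)
    E = pattern * rng.uniform(-1.0, 1.0, (2, 2))
    rel_err = (np.linalg.norm(np.linalg.inv(A + E) - Ainv, np.inf)
               / np.linalg.norm(Ainv, np.inf))
    worst = max(worst, rel_err / eps)       # magnification of the data-error level eps
print(worst)                                # a small number, nowhere near kappa
```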
It should be stressed that artificial ill-conditioning in no way rep-
resents a failure of our perturbation theorems: they were designed to
predict the behavior of the inverse matrix under arbitrary perturba-
tions, and they handle the worst cases well. It does, however, represent
a failure of our perturbation theory, which gives no way of incorporat-
ing the structure of an error into its bounds. In the next subsection
we will see that componentwise bounds can alleviate the situation. But
they are not a cure-all, and at the present state of our knowledge we
must cope by using ad-hoc methods, usually some form of rescaling.
The practitioner who encounters a large condition number should ex-
amine his data for artificial ill-conditioning before concluding that his
results are inaccurate.
2.3. Linear Systems
In this subsection we will be concerned with perturbations of the system
Ax = b. Since the solution of the perturbed system $\tilde A\tilde x = b$ satisfies
$\tilde x - x = (\tilde A^{-1} - A^{-1})b$, we immediately obtain from (2.3)
$$\|\tilde x - x\| \le \frac{\|A^{-1}E\|\,\|A^{-1}\|\,\|b\|}{1 - \|A^{-1}E\|}. \qquad (2.9)$$
However, this bound illustrates one of the ironies of matrix perturbation
theory: a general result often does not give the best bound when applied
in a special case. In particular the factor $\|A^{-1}\|\|b\|$ may be replaced by
$\|x\|$, as the following theorem shows.
Theorem 2.11. Let A be nonsingular and let $\tilde A = A + E$ be a pertur-
bation of A. For $b \in \mathbb{C}^n$ let
$$Ax = b.$$
Let $\|\cdot\|$ be a consistent matrix norm that is also consistent with the
vector norm $\|\cdot\|$. If there is a vector $\tilde x$ such that
$$\tilde A\tilde x = b, \qquad (2.10)$$
then
$$\frac{\|\tilde x - x\|}{\|\tilde x\|} \le \|A^{-1}E\|.$$
If in addition
$$\|A^{-1}E\| < 1,$$
then (2.10) always has a unique solution, which satisfies
$$\frac{\|\tilde x - x\|}{\|x\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}.$$
If $\kappa(A) = \|A\|\|A^{-1}\|$ is the condition number of A and
$$\kappa(A)\frac{\|E\|}{\|A\|} < 1,$$
then
$$\frac{\|\tilde x - x\|}{\|x\|} \le \frac{\kappa(A)\frac{\|E\|}{\|A\|}}{1 - \kappa(A)\frac{\|E\|}{\|A\|}}.$$
Proof. The proof is mutatis mutandis the same as that of Theorem 2.5
and its corollary. The key is the relation $\tilde x + A^{-1}E\tilde x = x$. ∎
Most of the observations to be made about this theorem have already been made in the preceding subsection. Here, as there, the condition number determines the relative perturbation of the solution. Artificial ill-conditioning can occur with linear systems, just as with inverses. Perhaps the most interesting feature of the bounds is their independence of the right-hand side $b$ of the linear system.
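A quick numerical check can make Theorem 2.11 concrete. The following is a minimal sketch, assuming NumPy and the spectral norm, which is consistent with the Euclidean vector norm. The helper name `thm_2_11_bounds` and the random test data are hypothetical. It solves both systems directly and compares each relative error with the corresponding bound.

```python
import numpy as np

def thm_2_11_bounds(A, E, b):
    """Return (error, bound) pairs for the first two bounds of Theorem 2.11 in the 2-norm."""
    x = np.linalg.solve(A, b)                       # Ax = b
    xt = np.linalg.solve(A + E, b)                  # (A + E) xt = b
    g = np.linalg.norm(np.linalg.solve(A, E), 2)    # ||A^{-1} E||_2
    err = np.linalg.norm(xt - x)
    first = (err / np.linalg.norm(xt), g)           # error relative to ||xt||
    second = (err / np.linalg.norm(x), g / (1 - g)) # error relative to ||x||; needs g < 1
    return first, second

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)
E = 1e-3 * rng.standard_normal((5, 5))
b = rng.standard_normal(5)
first, second = thm_2_11_bounds(A, E, b)
print(first, second)
```

Neither bound involves $b$. Changing the right-hand side moves the actual errors but leaves the bounds fixed.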
One way of dealing with artificial ill-conditioning is to examine the effects of special perturbations on individual components. We will illustrate the technique for perturbations in a single column of the matrix of a linear system.
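Theorem 2.12. Let $A$ be nonsingular, and let $\tilde A = A + E$, where
$$E = e\,e_j^{\mathrm T}. \qquad (2.11)$$
Let $Ax = b$ and $\tilde A\tilde x = b$, and write $\xi_i$ and $\tilde\xi_i$ for the components of $x$ and $\tilde x$. Then
$$|\tilde\xi_i - \xi_i| \le \|a_i^{(-1)\mathrm H}\|_*\,\|e\|\,|\tilde\xi_j|, \qquad (2.12)$$
where $a_i^{(-1)\mathrm H}$ is the $i$th row of $A^{-1}$ and $\|\cdot\|$ and $\|\cdot\|_*$ are dual norms.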
Proof. From the relation
$$\tilde x - x = -A^{-1}E\tilde x,$$
we have, upon multiplying by $e_i^{\mathrm T}$ and substituting the right-hand side of (2.11) for $E$,
$$\tilde\xi_i - \xi_i = -a_i^{(-1)\mathrm H}e\,\tilde\xi_j.$$
The theorem now follows on taking norms. ■
The advantage of this result over more general bounds is that it is invariant under column scaling. For example, if we replace $A$ by $AD$, where $D = \operatorname{diag}(\delta_1, \delta_2, \ldots, \delta_n)$ is nonsingular, then we must also make the following substitutions in (2.12):
1. $\tilde\xi_i - \xi_i \leftarrow \delta_i^{-1}(\tilde\xi_i - \xi_i)$,
2. $\tilde\xi_j \leftarrow \delta_j^{-1}\tilde\xi_j$,
3. $e \leftarrow \delta_j e$,
4. $a_i^{(-1)\mathrm H} \leftarrow \delta_i^{-1}a_i^{(-1)\mathrm H}$.
The effect of these substitutions is simply to multiply both sides of (2.12) by $|\delta_i|^{-1}$, which does not represent a change in the bound.
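The column-scaling invariance of (2.12) is easy to observe numerically. This is a minimal sketch, assuming NumPy. It measures $e$ in the $\infty$-norm, so the rows of $A^{-1}$ are measured in the dual 1-norm. The helper `column_bound`, the scaling vector `d`, and the data are hypothetical. The sketch evaluates both sides of (2.12) for every $i$, before and after replacing $A$ by $AD$ and $e$ by $\delta_j e$.

```python
import numpy as np

def column_bound(A, b, e, j):
    """Both sides of (2.12), for every i, when column j of A is perturbed by e.

    The infinity norm is used for e, so rows of A^{-1} are measured in the dual 1-norm.
    """
    n = A.shape[0]
    E = np.outer(e, np.eye(n)[j])                      # E = e e_j^T
    x = np.linalg.solve(A, b)
    xt = np.linalg.solve(A + E, b)
    row_norms = np.abs(np.linalg.inv(A)).sum(axis=1)   # 1-norm of each row of A^{-1}
    lhs = np.abs(xt - x)
    rhs = row_norms * np.abs(e).max() * abs(xt[j])
    return lhs, rhs

rng = np.random.default_rng(1)
n, j = 4, 2
A = rng.standard_normal((n, n)) + 4 * np.eye(n)
b = rng.standard_normal(n)
e = 1e-2 * rng.standard_normal(n)
lhs, rhs = column_bound(A, b, e, j)

d = np.array([1e-3, 1.0, 1e2, 1e4])                   # D = diag(d)
lhs_s, rhs_s = column_bound(A * d, b, d[j] * e, j)    # A -> AD, e -> delta_j e
# Both sides are divided by delta_i, so each ratio lhs/rhs is unchanged.
print(lhs / rhs)
print(lhs_s / rhs_s)
```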
Let us now turn to the problem of perturbations in the right-hand side of the system $Ax = b$. In order to describe what is actually going on, it is necessary to introduce some additional notation. Specifically, let
$$\eta = \frac{\|A\|\,\|x\|}{\|b\|}. \qquad (2.13)$$
Since $\|b\| \le \|A\|\,\|x\|$, we have $\eta \ge 1$. On the other hand, since $\|x\| \le \|A^{-1}\|\,\|b\|$, we have $\eta \le \kappa(A)$. When $\eta$ is near $\kappa(A)$, we say that the solution of the system $Ax = b$ REFLECTS THE CONDITION of $A$. In other words, when $\eta$ is near $\kappa(A)$, the size of the solution $x$ is nearly as big as we might predict on the basis of the condition number alone.
With these preliminaries, we can state the following theorem.
Theorem 2.13. Let $A \in \mathbb{C}^{n\times n}$ be nonsingular. For $b \ne 0$, let $Ax = b$, and let $A\tilde x = b + e$. Then
$$\frac{\|\tilde x - x\|}{\|x\|} \le \frac{\kappa(A)}{\eta}\,\frac{\|e\|}{\|b\|}. \qquad (2.14)$$
Proof. We have $\tilde x - x = A^{-1}e$, from which
$$\|\tilde x - x\| \le \|A^{-1}\|\,\|e\|.$$
The result now follows on dividing by $\|x\| = \eta\|b\|/\|A\|$. ■
The left-hand side of the bound (2.14) is the relative error in $x$. The factor $\|e\|/\|b\|$ is the relative error in $b$. The factor $\kappa(A)/\eta$ is the condition number of the problem. Whatever the condition of the matrix $A$, if $x$ reflects that condition, then $x$ is insensitive to perturbations in $b$, whatever they may be. This is in contrast to the sensitivity of $x$ to perturbations in $A$, where it is $\kappa(A)$ alone that predicts the worst case.
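The role of $\eta$ can be seen on a two-by-two example. The sketch below assumes NumPy and the 2-norm, and uses a diagonal matrix chosen only for illustration; the helper `rhs_perturbation` is hypothetical. The same perturbation $e$ is applied for two right-hand sides: one whose solution reflects the condition of $A$, and one whose solution does not.

```python
import numpy as np

def rhs_perturbation(A, b, e):
    """Relative error in x, bound (2.14), eta from (2.13), and kappa(A), all in the 2-norm."""
    x = np.linalg.solve(A, b)
    xt = np.linalg.solve(A, b + e)
    kappa = np.linalg.cond(A, 2)
    eta = np.linalg.norm(A, 2) * np.linalg.norm(x) / np.linalg.norm(b)
    rel_err = np.linalg.norm(xt - x) / np.linalg.norm(x)
    bound = kappa / eta * np.linalg.norm(e) / np.linalg.norm(b)
    return rel_err, bound, eta, kappa

A = np.diag([1.0, 1e-6])                           # kappa_2(A) = 1e6
e = np.array([1e-8, 1e-8])
results = {}
for label, b in (("x reflects the condition", np.array([0.0, 1.0])),
                 ("x does not", np.array([1.0, 0.0]))):
    results[label] = rhs_perturbation(A, b, e)
    rel_err, bound, eta, kappa = results[label]
    print(f"{label}: eta = {eta:.1e}, kappa/eta = {kappa / eta:.1e}, "
          f"error = {rel_err:.1e}, bound = {bound:.1e}")
```

In the first case $\kappa(A)/\eta \approx 1$ and the solution is insensitive to $e$. In the second case, the full factor $\kappa(A)$ shows up in the error.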
Theorem 2.12 shows that by manipulating the relation $\tilde x - x = -A^{-1}E\tilde x$ we can obtain bounds that to some extent circumvent the problems of artificial ill-conditioning. The focus there was the components of $x$. The following STRUCTURED PERTURBATION THEOREM homes in on the components of $A$.
Theorem 2.14 (Bauer, Skeel). Let $A$ be nonsingular. Let $Ax = b$ and $(A + E)\tilde x = b + e$. Let $\|\cdot\|$ be an absolute vector norm, and let $\|\cdot\|$ also denote a consistent matrix norm (e.g., the subordinate operator norm). If for some nonnegative $S$, $s$, and $\epsilon$
$$|E| \le \epsilon S \quad\text{and}\quad |e| \le \epsilon s,$$
and in addition
$$\epsilon\,\|\,|A^{-1}|\,S\,\| < 1,$$
then
$$\|\tilde x - x\| \le \frac{\epsilon\,\|\,|A^{-1}|\,(S|x| + s)\,\|}{1 - \epsilon\,\|\,|A^{-1}|\,S\,\|}.$$
Proof. From the identity
$$\tilde x - x = -A^{-1}Ex + A^{-1}e - A^{-1}E(\tilde x - x),$$
it follows that
$$|\tilde x - x| \le \epsilon|A^{-1}|S|x| + \epsilon|A^{-1}|s + \epsilon|A^{-1}|S|\tilde x - x|.$$
The theorem now follows on taking norms and solving for $\|\tilde x - x\|$. ■
The point of this theorem is that it gives us control over the form of the errors. By choosing the STRUCTURE MATRICES $S$ and $s$ appropriately we may model the behavior of the error. For example, if we wish to consider only errors in the matrix $A$, we may set $s = 0$. Again, if we wish to consider only relative errors in the elements of $A$, we may take $S = |A|$. These substitutions and a little manipulation yield the following corollary.
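Corollary 2.15. Let $e = 0$ and $|E| \le \epsilon|A|$. Set
$$\kappa_{BS}(A) = \|\,|A^{-1}|\,|A|\,\|.$$
If $\epsilon\kappa_{BS}(A) < 1$, then
$$\frac{\|\tilde x - x\|}{\|x\|} \le \frac{\epsilon\kappa_{BS}(A)}{1 - \epsilon\kappa_{BS}(A)}. \qquad (2.15)$$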
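The number $\kappa_{BS}(A)$ is the BAUER-SKEEL CONDITION NUMBER of $A$. It has the property that it is independent of row scaling: if $D$ is a nonsingular diagonal matrix, then $|(DA)^{-1}|\,|DA| = |A^{-1}|\,|D^{-1}|\,|D|\,|A| = |A^{-1}|\,|A|$. Row scaling therefore cannot be a source of artificial ill-conditioning in (2.15). Unfortunately, the bound can be quite sensitive to column scaling.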
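The row-scaling invariance of $\kappa_{BS}$, and the bound (2.15), can be checked directly. This is a minimal sketch, assuming NumPy and the $\infty$-norm, which is absolute and has a consistent subordinate matrix norm. The helper `kappa_bs` and the particular scaling matrix `D` are illustrative choices, not part of the text.

```python
import numpy as np

def kappa_bs(A, ord=np.inf):
    """Bauer-Skeel condition number || |A^{-1}| |A| || (infinity norm by default)."""
    return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), ord)

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 4 * np.eye(n)
D = np.diag([1e-2, 1.0, 1e2, 1e3])
print("kappa_BS:", kappa_bs(A), kappa_bs(D @ A), kappa_bs(A @ D))     # row vs column scaling
print("kappa_inf:", np.linalg.cond(A, np.inf), np.linalg.cond(D @ A, np.inf))

# Corollary 2.15: a perturbation with |E| <= eps |A| and e = 0
eps = 1e-6
E = eps * np.abs(A) * rng.uniform(-1.0, 1.0, (n, n))
b = rng.standard_normal(n)
x = np.linalg.solve(A, b)
xt = np.linalg.solve(A + E, b)
rel = np.linalg.norm(xt - x, np.inf) / np.linalg.norm(x, np.inf)
k = kappa_bs(A)
bound = eps * k / (1 - eps * k)
print(rel, bound)
```

Row scaling by `D` leaves $\kappa_{BS}$ unchanged, while the ordinary condition number grows by several orders of magnitude. Column scaling, in general, changes both.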
We conclude this subsection with the topic of RESIDUAL BOUNDS. Generally speaking, most problems in matrix computations can be cast in the form of solving an equation $r(x) = 0$. For example, $r(x) = b - Ax$ for the linear system $Ax = b$. The result of the computation will not be the exact solution but an approximate solution $\tilde x$, usually one for which the RESIDUAL $r(\tilde x)$ is small. The problem of residual bounds is to construct a bound on $\tilde x - x$ in terms of $r(\tilde x)$.
There are many possible solutions to this problem, but one, the METHOD OF BACKWARD PERTURBATIONS, is particularly fruitful. The technique is to show that the computed solution is the exact solution of a problem with slightly perturbed data and to bound the perturbation. Conventional perturbation theory can then be used to determine the accuracy of the purported solution $\tilde x$.
This technique is illustrated by the following theorem and its proof.
Theorem 2.16 (Rigal-Gaches). Let $A \in \mathbb{C}^{n\times n}$. Let $\|\cdot\|$ denote a vector norm and its subordinate matrix norm. For any $\tilde x \ne 0$, let $r = b - A\tilde x$. Then there is a matrix $E$ satisfying
$$\|E\| = \frac{\|r\|}{\|\tilde x\|}$$
such that
$$(A + E)\tilde x = b,$$
and $E$ is the smallest matrix with this property. Hence, if $A$ is nonsingular and $Ax = b$, we have
$$\frac{\|\tilde x - x\|}{\|\tilde x\|} \le \kappa(A)\,\frac{\|r\|}{\|A\|\,\|\tilde x\|}. \qquad (2.16)$$
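Proof. Let $\|\cdot\|_*$ be the norm dual to $\|\cdot\|$. Let $y$ be a vector satisfying $\|y\|_* = 1$ and $y^{\mathrm H}\tilde x = \|\tilde x\|$. Set
$$E = \frac{r\,y^{\mathrm H}}{\|\tilde x\|}. \qquad (2.17)$$
Then
$$b - (A + E)\tilde x = r - E\tilde x = r - \frac{r\,y^{\mathrm H}\tilde x}{\|\tilde x\|} = r - r = 0.$$
The norm of $E$ is
$$\|E\| = \max_{\|z\|=1}\frac{\|r\,y^{\mathrm H}z\|}{\|\tilde x\|} = \frac{\|r\|}{\|\tilde x\|}\max_{\|z\|=1}|y^{\mathrm H}z| = \frac{\|r\|}{\|\tilde x\|},$$
the last equality following from the fact that $\|y\|_* = 1$. Moreover, if $E$ is any matrix satisfying $(A + E)\tilde x = b$, then $E\tilde x = r$. Hence $\|E\|\,\|\tilde x\| \ge \|r\|$, or $\|E\| \ge \|r\|/\|\tilde x\|$. The bound (2.16) follows directly from Theorem 2.11. ■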
Provided we are willing to sacrifice optimality, there is nothing sacred about the choice (2.17) of $E$. For example, if $\tilde\xi_i \ne 0$, the choice
$$E = \frac{r\,e_i^{\mathrm T}}{\tilde\xi_i}$$
is also a backward perturbation for the problem. It has the property that it concentrates the error in the $i$th column of $A$. Other choices might place the error entirely on nonzero elements of $A$, an important consideration in dealing with sparse matrices.
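For the 2-norm the dual norm is again the 2-norm, so the vector $y$ in the proof of Theorem 2.16 can be taken as $\tilde x/\|\tilde x\|_2$. The sketch below assumes NumPy and real data; the helper `rigal_gaches_E` is hypothetical. It builds the optimal perturbation (2.17) and the column-concentrated alternative, then checks the properties claimed above.

```python
import numpy as np

def rigal_gaches_E(A, b, xt):
    """Optimal backward perturbation (2.17) in the 2-norm, with y = xt / ||xt||_2."""
    r = b - A @ xt
    nx = np.linalg.norm(xt)
    y = xt / nx                              # ||y||_2 = 1 and y^H xt = ||xt||_2
    return np.outer(r, y.conj()) / nx, r

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
xt = np.linalg.solve(A, b) + 1e-6 * rng.standard_normal(n)   # an inexact solution
E, r = rigal_gaches_E(A, b, xt)

# Column-concentrated alternative r e_i^T / xt_i; the largest |xt_i| gives the smallest one.
i = int(np.argmax(np.abs(xt)))
Ei = np.outer(r, np.eye(n)[i]) / xt[i]
print(np.linalg.norm(E, 2), np.linalg.norm(r) / np.linalg.norm(xt), np.linalg.norm(Ei, 2))
```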
There is a structured backward perturbation theorem, the analogue of Theorem 2.16 in the spirit of the Bauer-Skeel theorem.
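Theorem 2.17 (Oettli-Prager). Let $r = b - A\tilde x$, with components $\rho_i$. Let $S$ and $s$ be nonnegative and set
$$\epsilon = \max_i\frac{|\rho_i|}{(S|\tilde x| + s)_i} \qquad (2.18)$$
(here $0/0 = 0$ and otherwise $\rho/0 = \infty$). If $\epsilon \ne \infty$, there is a matrix $E$ and a vector $e$ with
$$|E| \le \epsilon S \quad\text{and}\quad |e| \le \epsilon s \qquad (2.19)$$
such that
$$(A + E)\tilde x = b + e. \qquad (2.20)$$
Moreover, $\epsilon$ is the smallest number for which such matrices exist.
Proof. From (2.18) we have
$$|\rho_i| \le \epsilon(S|\tilde x| + s)_i.$$
This in turn implies that $r = D(S|\tilde x| + s)$, where $D$ is diagonal and $|D| \le \epsilon I$. It is then easily verified that $E = DS\operatorname{diag}(\operatorname{sign}(\tilde\xi_1), \ldots, \operatorname{sign}(\tilde\xi_n))$ and $e = -Ds$ are the required backward perturbations.
On the other hand, given perturbations $E$ and $e$ satisfying (2.19) and (2.20) for some $\epsilon$, we have
$$|r| = |b - A\tilde x| = |E\tilde x - e| \le \epsilon(S|\tilde x| + s).$$
Hence $\epsilon \ge |\rho_i|/(S|\tilde x| + s)_i$ for every $i$, which shows that the $\epsilon$ defined by (2.18) is optimal. ■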
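The proof of Theorem 2.17 is constructive, so it translates directly into code. The following is a minimal sketch, assuming NumPy and real data, where $\operatorname{sign}$ is the ordinary real sign. The helper `oettli_prager` is hypothetical. The choice $S = |A|$, $s = |b|$ (relative perturbations of $A$ and $b$) is just one illustrative structure.

```python
import numpy as np

def oettli_prager(A, b, xt, S, s):
    """Componentwise backward error (2.18) and the perturbations E, e from the proof."""
    r = b - A @ xt
    denom = S @ np.abs(xt) + s
    ratios = np.divide(np.abs(r), denom, out=np.zeros_like(r), where=denom > 0)
    ratios[(denom == 0) & (r != 0)] = np.inf            # rho/0 = inf, while 0/0 = 0
    eps = ratios.max()
    d = np.divide(r, denom, out=np.zeros_like(r), where=denom > 0)   # r = D (S|xt| + s)
    E = d[:, None] * S * np.sign(xt)[None, :]           # E = D S diag(sign(xt))
    e = -d * s                                          # e = -D s
    return eps, E, e

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
xt = np.linalg.solve(A, b) + 1e-8 * rng.standard_normal(n)
S, s = np.abs(A), np.abs(b)
eps, E, e = oettli_prager(A, b, xt, S, s)
print(eps)
```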
2.4. Asymptotic Forms and Derivatives
Throughout this section we have made use of the formula
$$\tilde A^{-1} = A^{-1} - A^{-1}E\tilde A^{-1}.$$
The main drawback to this formula is the presence of $\tilde A^{-1}$ on both sides of the equality sign. There is little we can do about this without passing to infinite series, e.g.,
$$\tilde A^{-1} = A^{-1} - (A^{-1}E)A^{-1} + (A^{-1}E)^2A^{-1} - \cdots,$$
or inverses, e.g.,
$$\tilde A^{-1} = (I + A^{-1}E)^{-1}A^{-1},$$
either of which destroys the simplicity of the relation. However, if we are willing to make do with an approximation, we can write
$$\tilde A^{-1} \cong A^{-1} - A^{-1}EA^{-1}.$$
Since $A^{-1}$ is a differentiable function of the elements of $A$, this FIRST ORDER APPROXIMATION is accurate up to terms of order $\|E\|^2$.
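The second-order accuracy of the approximation is easy to observe. This is a minimal sketch, assuming NumPy, with a fixed random direction `E0` scaled by `t` chosen only for illustration. If the neglected terms are of order $\|E\|^2$, the error divided by $t^2$ should settle to a constant.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n)) + 4 * np.eye(n)
Ainv = np.linalg.inv(A)
E0 = rng.standard_normal((n, n))
errors = []
for t in (1e-2, 1e-3, 1e-4):
    E = t * E0
    exact = np.linalg.inv(A + E)
    first_order = Ainv - Ainv @ E @ Ainv        # A^{-1} - A^{-1} E A^{-1}
    errors.append(np.linalg.norm(exact - first_order, 2))
    print(f"t = {t:.0e}: error = {errors[-1]:.2e}, error / t^2 = {errors[-1] / t**2:.3f}")
```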
First order approximations occupy an important place in perturba-
tion theory. They furnish computable approximations to the perturbed
objects; in fact, in many applications the term "perturbation theory"
amounts to little more than the construction of first and higher order
approximations. Moreover, first order approximations are often eas-
ier to work with than their exact equivalents. The following example,
which requires a smattering of probability theory, illustrates this point.
Example 2.18. Let us suppose that the elements of $E$ are independent random variables with mean zero and standard deviation $\sigma$. Then $\|\tilde A^{-1} - A^{-1}\|_F$ is a random variable, whose distribution gives information about the sensitivity of $A^{-1}$ to perturbations in $A$. Unfortunately, its distribution is not tractable. However, we can easily compute the number
$$\Big[\mathcal{E}\big(\|A^{-1}EA^{-1}\|_F^2\big)\Big]^{1/2} = \sigma\|A^{-1}\|_F^2,$$
which is the root mean square of the Frobenius norm of the linearized error. This number is analogous to our error bounds, being proportional to an error term $\sigma$ and the square of the inverse. However, unlike our bounds, it is an exact equality, so that if we can ignore second order terms and higher, it is a better estimate of the actual error.
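The formula of Example 2.18 can be checked by simulation. The following is a minimal Monte Carlo sketch, assuming NumPy and normally distributed entries (the example only requires mean zero and standard deviation $\sigma$). The sample size and matrix are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma, trials = 4, 1e-3, 20000
A = rng.standard_normal((n, n)) + 4 * np.eye(n)
Ainv = np.linalg.inv(A)
Es = sigma * rng.standard_normal((trials, n, n))   # independent entries, mean 0, std sigma
lin = Ainv @ Es @ Ainv                             # linearized errors A^{-1} E A^{-1}
rms_mc = np.sqrt(np.mean(np.sum(lin**2, axis=(1, 2))))
rms_formula = sigma * np.linalg.norm(Ainv, "fro") ** 2
print(rms_mc, rms_formula)
```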
An important application of first order approximations is to calculate derivatives. For example, to compute $\partial A^{-1}/\partial\alpha_{ij}$, note that when $E = \epsilon e_ie_j^{\mathrm T}$ the matrix $\tilde A$ is just $A$ with its $(i,j)$-element perturbed by $\epsilon$. Hence for this choice of $E$,
$$\frac{\partial A^{-1}}{\partial\alpha_{ij}} = \lim_{\epsilon\to 0}\frac{\tilde A^{-1} - A^{-1}}{\epsilon}.$$
Since $\tilde A^{-1} - A^{-1} = -A^{-1}EA^{-1} + O(\epsilon^2)$, we have
$$\frac{\partial A^{-1}}{\partial\alpha_{ij}} = \lim_{\epsilon\to 0}\frac{-\epsilon A^{-1}e_ie_j^{\mathrm T}A^{-1}}{\epsilon} = -a_i^{(-1)}a_j^{(-1)\mathrm H},$$
where $a_i^{(-1)}$ is the $i$th column of $A^{-1}$ and $a_j^{(-1)\mathrm H}$ is its $j$th row.
Similar results hold for the solution of the linear system $\tilde A\tilde x = b$. In the notation of the last subsection,
$$\tilde x \cong x - A^{-1}Ex.$$
Taking $E = \epsilon e_ie_j^{\mathrm T}$ as above gives
$$\frac{\partial x}{\partial\alpha_{ij}} = -\xi_j\,a_i^{(-1)};$$
that is, the derivative is proportional to the $i$th column of $A^{-1}$.
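Both derivative formulas can be compared with central finite differences. This is a minimal sketch, assuming NumPy; the indices `i`, `j`, the step `h`, and the data are arbitrary illustrative choices. In code, the column $a_i^{(-1)}$ is `Ainv[:, i]` and the row $a_j^{(-1)\mathrm H}$ is `Ainv[j, :]`.

```python
import numpy as np

rng = np.random.default_rng(7)
n, i, j, h = 4, 1, 3, 1e-6
A = rng.standard_normal((n, n)) + 4 * np.eye(n)
b = rng.standard_normal(n)
Ainv = np.linalg.inv(A)
x = Ainv @ b
Eij = np.zeros((n, n))
Eij[i, j] = 1.0                                      # e_i e_j^T

# Central differences in the (i, j) element of A
dAinv_fd = (np.linalg.inv(A + h * Eij) - np.linalg.inv(A - h * Eij)) / (2 * h)
dx_fd = (np.linalg.solve(A + h * Eij, b) - np.linalg.solve(A - h * Eij, b)) / (2 * h)

dAinv = -np.outer(Ainv[:, i], Ainv[j, :])            # -(column i of A^{-1})(row j of A^{-1})
dx = -x[j] * Ainv[:, i]                              # -xi_j times column i of A^{-1}
print(np.abs(dAinv_fd - dAinv).max(), np.abs(dx_fd - dx).max())
```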
Notes and References
The perturbation of matrix inverses and linear systems finds applications in many areas. Since the theory is simple, expositors tend to develop as much or as little as they need in the notation of their fields. We cannot begin to survey this body of literature, but to see how the theory appears to two numerical optimizers, see [166, Section 2.3].
The theory is presented here as it is found in the numerical analysis literature, where it has become highly polished. The reason is the style of rounding-error analysis, BACKWARD ROUNDING-ERROR ANALYSIS, in which the effects of rounding error are thrown back on the original data. To cite the most famous example, if Gaussian elimination with partial pivoting is used to solve the linear system $Ax = b$, then under certain conditions, which need not concern us here, the computed solution $\tilde x$ satisfies $(A + E)\tilde x = b$, where $\|E\|/\|A\|$ is a modest multiple of the rounding unit. It then remains to evaluate the effect of this error on the solution, and this necessity has made numerical analysts keen perturbation theorists. An unfortunate side effect is that the outsider must search a desert of rounding-error analysis to find nuggets of perturbation theory. For historical comments and references on the subject of rounding error, see [270, 113].
Actually, a bound in the classic style was developed earlier by Wittmeyer [274, 1936], who showed (in our notation) that
$$\|\tilde x - x\|_2 \le \frac{\|A^{-1}\|_2\|e\|_2 + \|b\|_2\|A^{-1}\|_2^2\|E\|_2}{1 - \|A^{-1}\|_2\|E\|_2},$$
which is equivalent to (2.9) extended to account for errors in the right-hand side. It is not clear what influence this paper exerted, since it is only sporadically referenced (e.g., in [240, 121] but not in [269]).
The notion of condition number was introduced by Turing [242, 1948] to quantify "the expression 'ill-condition' [which] is sometimes used merely as a term of abuse applicable to matrices or equations, ...." The fact that the condition number appears in the error bounds led to attempts to find the row and column scaling of $A$ that gives the smallest condition number, of which the most penetrating investigation was given by Bauer [13, 1963]. Not much is heard about the topic now, partly because of a better understanding of what the condition number actually means and partly because of a remarkable theorem of van der Sluis [244, 1969], which says that the condition number of a matrix is nearly optimal in the spectral norm if its columns have the same 2-norm. The present discussion of artificial ill-conditioning owes a great deal to Wilkinson's common-sense approach to the subject [269, Ch. 2, pp. 192-193].
Kahan [129] attributes Theorem 2.8 to Gastinel but cites no reference. The connection between singularity and condition is not an accident. Kahan [131, 1972] shows that for a number of problems the condition is related to the distance from degeneracy. Demmel [55, 1987] gives a uniform treatment of this phenomenon via differential inequalities.
The condition number $\kappa(A)$ has the drawback that it depends on $A^{-1}$, which seldom needs to be calculated in applications. This has given rise to CONDITION ESTIMATORS, which attempt to approximate some norm of $A^{-1}$ from a factorization of $A$. The first of these was suggested by Gragg and Stewart [96, 1976], and an improved version [45, 1979] was incorporated into the LINPACK codes [59, 1979]. Since then there have been many variations and improvements in the technique. See [111] for a survey.
The approach to perturbation theory through structured errors, as in Theorem 2.14, is due to Bauer [14, 1966] in the forward sense and Oettli and Prager [164, 1964] in the backward sense. Skeel [197, 1979] combined this approach with rounding-error analysis to arrive at important and original observations on the numerical solution of linear systems. The number $\kappa_{BS}$ is sometimes called the Skeel condition number, but it was introduced by Bauer. A variant of Theorem 2.14 for matrix inverses was first established by Bauer [14]. The specialization to linear systems is due to Skeel [197], and the present statement is a variant of one by Higham [113].
Backward perturbation theory in the style of Rigal and Gaches [185, 1967] has important applications to quasi-Newton methods for the numerical solution of nonlinear equations and optimization problems, where the perturbation is used to update approximate Jacobians and Hessians (e.g., see [57]). For the case of Hessians, where the update must be symmetric, see Exercise 2.10.
The use of first order expansions to determine the properties of functions of
random variables goes back at least to Gauss [84, 1821, Section 16]. For a
systematic application of the idea to matrix perturbation theory, see [215].
Exercises
UNLESS OTHERWISE STATED, $A$ IS A NONSINGULAR MATRIX OF ORDER $n$. THE SYMBOL $\|\cdot\|$ DENOTES A VECTOR NORM AND ITS SUBORDINATE MATRIX NORM.
1. Show that if $\operatorname{re}(\tilde A, A) = \rho$, then there exists a matrix $R$ satisfying $\|R\| \le \rho\kappa(A)$ such that $\tilde A = A(I + R)$. Show that this is the best possible result.
2. Give an example of a matrix $X$ such that $\|I - XA\|/(\|A\|\,\|X\|)$ is small while $\|I - AX\|/(\|A\|\,\|X\|)$ is large (i.e., a matrix that is a good left inverse but a poor right inverse).
3. Show that for any Hölder norm, if $\operatorname{re}(\tilde x, x) \le \rho$ then the relative error in any component $\xi_i \ne 0$ of $x$ is bounded by $\rho\|x\|/|\xi_i|$.
4. Show that for any consistent norm
$$\kappa(A) \ge \frac{|\lambda_{\max}(A)|}{|\lambda_{\min}(A)|},$$
where $\lambda_{\max}(A)$ is the eigenvalue of $A$ of greatest absolute value and $\lambda_{\min}(A)$ is the eigenvalue of least absolute value. Show that for the spectral norm equality is attained whenever $A$ is normal.
5. What is $\kappa_2(0.1I_n)$? What is $\det(0.1I_n)$?
6. Show that the inverse of the matrix $A_n$ in Example 2.10 has the form illustrated below for $n = 5$:
$$\begin{pmatrix} 1 & 1 & 2 & 4 & 8\\ 0 & 1 & 1 & 2 & 4\\ 0 & 0 & 1 & 1 & 2\\ 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
Generalize and derive an upper bound for $\kappa(A_n)$ in your favorite norm.
7. Let $B_n$ be the matrix illustrated below for $n = 5$:
$$B_5 = \begin{pmatrix} 1 & 1 & 1 & 1 & 1\\ 0 & 1 & 1 & 1 & 1\\ 0 & 0 & 1 & 1 & 1\\ 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$
What is $B_n^{-1}$?
8. It is sometimes objected that the matrix in Example 2.10 is not scaled properly. If we normalize the columns so that they have 2-norm one, to give a matrix $\hat A_n$, then $\det(\hat A_n) = 1/\sqrt{n!}$, which reveals the ill-conditioning of $A_n$. Comment.
9. (van der Sluis [244]) Let $\mathcal{D}_+$ be the set of all diagonal matrices with positive diagonal elements. Let
$$\kappa_{2,\mathrm{opt}}(A) = \inf_{D\in\mathcal{D}_+}\kappa_2(AD),$$
and let
$$\hat A = A\operatorname{diag}(\|a_1\|_2, \ldots, \|a_n\|_2)^{-1}$$
(i.e., $A$ scaled so that its columns have 2-norm one). Show that
$$\kappa_2(\hat A) \le \sqrt{n}\,\kappa_{2,\mathrm{opt}}(A).$$
10. (Dennis and Moré [56]) Let $A$ be symmetric and let $r = b - A\tilde x$. Show that
$$E = \frac{r\tilde x^{\mathrm T} + \tilde x r^{\mathrm T}}{\|\tilde x\|_2^2} - \frac{\tilde x^{\mathrm T}r}{\|\tilde x\|_2^4}\,\tilde x\tilde x^{\mathrm T}$$
is the smallest symmetric matrix in the Frobenius norm for which $(A + E)\tilde x = b$.
11. (Bunch, Demmel, and Van Loan [40]) In the last exercise, show that if $F$ is any matrix for which $(A + F)\tilde x = b$, then $\|E\|_p \le 3\|F\|_p$ $(p = 2, F)$.
3. The Pseudo-Inverse
In this section we will derive perturbation bounds for the pseudo-inverse. The task is complicated by the fact that the pseudo-inverse, unlike the inverse, is not a continuous function of its elements. The qualitative result is that
$$\lim_{\tilde A\to A}\tilde A^\dagger = A^\dagger \iff \lim_{\tilde A\to A}\operatorname{rank}(\tilde A) = \operatorname{rank}(A). \qquad (3.1)$$
In spite of this discontinuity, we can obtain informative bounds on $\tilde A^\dagger$, even when $\operatorname{rank}(\tilde A) \ne \operatorname{rank}(A)$.
A second complicating feature is that the bounds depend on where
the perturbations lie. For example, if A is of full column rank, then
a perturbation in the orthogonal complement of the column space of
A, no matter how large, can have only bounded effect. Accounting for
this makes our bounds more intricate than the corresponding bounds
for inverses.
We will begin in the first subsection by establishing a uniform no-
tation and nomenclature. Here we will also introduce the notion of an
acute perturbation which will play a central role in the theory. The
second subsection is devoted to general results, and the third to acute
perturbations. In the last subsection we give derivatives and asymp-
totic forms.
The complexity of our results makes it imperative to have a consis-
tent notation.
Throughout the next three sections $A$ will denote an $m\times n$ matrix with $m \ge n$. The projection onto the column space of $A$ will be denoted by $P$, and the projection onto the row space $\mathcal{R}(A^{\mathrm H})$ by $R$. The complementary projectors will be denoted by $P_\perp$ and $R_\perp$. As in the last section we will let $\tilde A = A + E$ denote a perturbation of $A$. The associated projectors will be denoted by $\tilde P$, $\tilde R$, $\tilde P_\perp$, and $\tilde R_\perp$.
3.1. Projections and Acute Perturbations
Given the relation of the pseudo-inverse to the geometry of $\mathbb{C}^n$, it is natural to cast our results in terms of unitarily invariant norms. We will let $\|\cdot\|$ denote a family of unitarily invariant norms generated by the symmetric gauge function $\Phi$.
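Since we are working with unitarily invariant norms, we may freely rotate our problem to simplify it. In particular, let $V = (V_1\ V_2)$ be a unitary matrix with $\mathcal{R}(V_1) = \mathcal{R}(A^{\mathrm H})$, and let $U = (U_1\ U_2)$ be a unitary matrix with $\mathcal{R}(U_1) = \mathcal{R}(A)$. Then
$$U^{\mathrm H}AV = \begin{pmatrix} U_1^{\mathrm H}AV_1 & U_1^{\mathrm H}AV_2\\ U_2^{\mathrm H}AV_1 & U_2^{\mathrm H}AV_2 \end{pmatrix} = \begin{pmatrix} A_{11} & 0\\ 0 & 0 \end{pmatrix},$$
where $A_{11}$ is square and nonsingular. If we set
$$U^{\mathrm H}EV = \begin{pmatrix} U_1^{\mathrm H}EV_1 & U_1^{\mathrm H}EV_2\\ U_2^{\mathrm H}EV_1 & U_2^{\mathrm H}EV_2 \end{pmatrix} \equiv \begin{pmatrix} E_{11} & E_{12}\\ E_{21} & E_{22} \end{pmatrix},$$
then
$$U^{\mathrm H}\tilde AV = \begin{pmatrix} A_{11} + E_{11} & E_{12}\\ E_{21} & E_{22} \end{pmatrix} \equiv \begin{pmatrix} \tilde A_{11} & E_{12}\\ E_{21} & E_{22} \end{pmatrix}. \qquad (3.2)$$
We will call these transformed, partitioned matrices the REDUCED FORM of the problem. Many statements about the original problem have revealing analogues in the reduced form. For example, in reduced form
$$P\tilde A = \begin{pmatrix} \tilde A_{11} & E_{12}\\ 0 & 0 \end{pmatrix}. \qquad (3.3)$$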
The final item in this subsection is to define the kind of perturbation under which pseudo-inverses behave well. Essentially these are perturbations under which the column and row spaces do not alter catastrophically. These perturbations are characterized in the following theorem.
Theorem 3.1. The following statements are equivalent.
1. $\|\tilde P - P\|_2 < 1$.
2. There is no vector in $\mathcal{R}(\tilde A)$ that is orthogonal to $\mathcal{R}(A)$ and vice versa.
3. $\operatorname{rank}(A) = \operatorname{rank}(\tilde A) = \operatorname{rank}(P\tilde A)$.
Corresponding statements hold for the row spaces of $A$ and $\tilde A$.
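Proof. 1 $\Rightarrow$ 2: Suppose there were a nonzero vector $x \in \mathcal{R}(\tilde A)$ that is orthogonal to $\mathcal{R}(A)$. This is equivalent to saying that $\tilde Px = x$ and $Px = 0$. It follows that $(\tilde P - P)x = x$, which implies that $\|\tilde P - P\|_2 \ge 1$, a contradiction. The reciprocal relation follows by interchanging $A$ and $\tilde A$ in the preceding argument.
2 $\Rightarrow$ 3: If, say, $\operatorname{rank}(\tilde A) > \operatorname{rank}(A)$, then the dimension of the column space of $\tilde A$ is greater than the dimension of the column space of $A$. Hence there is a vector in $\mathcal{R}(\tilde A)$ that is orthogonal to $\mathcal{R}(A)$. Thus it remains only to show that $\operatorname{rank}(P\tilde A) = \operatorname{rank}(\tilde A)$. In reduced form this amounts to saying that the matrix $(\tilde A_{11}\ E_{12})$ has full row rank [cf. (3.3)]. Suppose to the contrary that $x^{\mathrm H}(\tilde A_{11}\ E_{12}) = 0$ for some nonzero vector $x$. Then $(x^{\mathrm H}\ 0)^{\mathrm H} \in \mathcal{R}(A)$. But by (3.3), $(x^{\mathrm H}\ 0)\tilde A = 0$, i.e., $(x^{\mathrm H}\ 0)^{\mathrm H} \in \mathcal{R}(\tilde A)^\perp$.
3 $\Rightarrow$ 1: By Corollary I.5.4 and Theorem I.5.5 the 2-norm of $\tilde P - P$ is the 2-norm of $X_\perp^{\mathrm H}\tilde Y$, where the columns of $X_\perp$ form an orthonormal basis for $\mathcal{R}(A)^\perp$, and those of $\tilde Y$ an orthonormal basis for $\mathcal{R}(\tilde A)$. In reduced form we may take $X_\perp = (0\ I)^{\mathrm H}$. To get $\tilde Y$, note that since $(\tilde A_{11}\ E_{12})$ has full row rank, we may permute the columns of $\tilde A$ so that the reduced form is
$$\begin{pmatrix} B_{11} & B_{12}\\ B_{21} & B_{22} \end{pmatrix},$$
where $B_{11}$ is nonsingular. If we set $F = B_{21}B_{11}^{-1}$, then the columns of
$$\tilde Y = \begin{pmatrix} I\\ F \end{pmatrix}(I + F^{\mathrm H}F)^{-1/2}$$
form an orthonormal basis for $\mathcal{R}(\tilde A)$. It follows that $X_\perp^{\mathrm H}\tilde Y = F(I + F^{\mathrm H}F)^{-1/2}$. It is easy to see that if $\sigma$ is a singular value of $F$ then $\sigma(1 + \sigma^2)^{-1/2} < 1$ is a singular value of $X_\perp^{\mathrm H}\tilde Y$, and conversely. Hence $\|\tilde P - P\|_2 = \|X_\perp^{\mathrm H}\tilde Y\|_2 < 1$. ■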
This theorem suggests the following definition.
Definition 3.2. The matrix $\tilde A$ is an ACUTE PERTURBATION of $A$ if $\|\tilde P - P\|_2 < 1$ and $\|\tilde R - R\|_2 < 1$. We also say that $A$ and $\tilde A$ are acute.
Thus the column spaces of acute perturbations are of the same dimension, and the canonical angles between them are all less than $\pi/2$, whence the name acute perturbation. The same is true of the row spaces. The following theorem characterizes acute perturbations in the reduced form.
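Definition 3.2 can be tested numerically by forming the four projectors from singular value decompositions. This is a minimal sketch, assuming NumPy. The helpers `range_projector` and `acuteness`, the rank tolerance `1e-10`, and the margin `1 - 1e-8` used to decide "$< 1$" in floating point are all hypothetical choices. The three test perturbations show a small rank-preserving perturbation, a small rank-raising one, and a rank-preserving one that turns the column space through a right angle.

```python
import numpy as np

def range_projector(M, tol=1e-10):
    """Orthogonal projector onto R(M); numerical rank taken from the SVD with a relative tolerance."""
    U, sv, _ = np.linalg.svd(M)
    k = int(np.sum(sv > tol * sv[0])) if sv.size else 0
    return U[:, :k] @ U[:, :k].conj().T

def acuteness(A, At):
    """Return (||P~ - P||_2, ||R~ - R||_2); Definition 3.2 asks for both to be < 1."""
    dP = np.linalg.norm(range_projector(At) - range_projector(A), 2)
    dR = np.linalg.norm(range_projector(At.conj().T) - range_projector(A.conj().T), 2)
    return dP, dR

A = np.diag([1.0, 1.0, 0.0])                                   # rank 2
cases = {
    "small, rank kept": A + 1e-3 * np.array([[1.0, 2.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]),
    "small, rank raised": A + 1e-3 * np.eye(3),
    "rank kept, column space turned": np.diag([1.0, 0.0, 1.0]),
}
for name, At in cases.items():
    dP, dR = acuteness(A, At)
    print(f"{name}: ||P~-P||_2 = {dP:.3f}, ||R~-R||_2 = {dR:.3f}, acute = {max(dP, dR) < 1 - 1e-8}")
```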
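Theorem 3.3. In the reduced form the matrices $A$ and $\tilde A$ are acute if and only if $\tilde A_{11}$ is nonsingular and
$$E_{22} = E_{21}\tilde A_{11}^{-1}E_{12}. \qquad (3.4)$$
In this case, if we set
$$F_{21} = E_{21}\tilde A_{11}^{-1}, \qquad F_{12} = \tilde A_{11}^{-1}E_{12},$$
then
$$\tilde A = \begin{pmatrix} I\\ F_{21} \end{pmatrix}\tilde A_{11}\begin{pmatrix} I & F_{12} \end{pmatrix} \qquad (3.5)$$
and
$$\tilde A^\dagger = \begin{pmatrix} I & F_{12} \end{pmatrix}^\dagger\tilde A_{11}^{-1}\begin{pmatrix} I\\ F_{21} \end{pmatrix}^\dagger. \qquad (3.6)$$
Proof. Assume that $A$ and $\tilde A$ are acute, and suppose that $\tilde A_{11}$ is singular. Specifically, let $\tilde A_{11}x = 0$ for some $x \ne 0$, and consider the vector $y = E_{21}x$. If $y = 0$, then $(x^{\mathrm H}\ 0)^{\mathrm H}$ is a vector in the row space of $A$ that is orthogonal to the row space of $\tilde A$. If $y \ne 0$, then $(0\ y^{\mathrm H})^{\mathrm H}$ is a vector in the column space of $\tilde A$ that is orthogonal to the column space of $A$. Either case is a contradiction.
Equation (3.4) simply says that the last columns (rows) of the partition (3.2) are linear combinations of the initial columns (rows), which is necessary since $\tilde A_{11}$ is nonsingular and $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$.
Conversely, if $\tilde A_{11}$ is nonsingular and (3.4) holds, then it is easily verified that $\operatorname{rank}(\tilde A) = \operatorname{rank}(A) = \operatorname{rank}(P\tilde A) = \operatorname{rank}(\tilde AR)$, which is sufficient for acuteness.
The formula (3.5) is an immediate consequence of (3.4), and (3.6) follows from Penrose's conditions. ■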
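A reduced-form example makes (3.4) through (3.6) concrete. The following minimal sketch assumes NumPy. It builds $\tilde A$ directly in the reduced form (3.2), with $E_{22}$ chosen to satisfy (3.4), from hypothetical random blocks. It then checks the factorization (3.5). Rather than calling a pseudo-inverse routine on $\tilde A$, which would involve a numerical rank decision, it verifies that the right-hand side of (3.6) satisfies the four Penrose conditions.

```python
import numpy as np

rng = np.random.default_rng(8)
r, m, n = 2, 4, 3                                     # reduced form: A = diag(A11, 0), A11 of order r
A11t = rng.standard_normal((r, r)) + 3 * np.eye(r)    # the block Ã11, assumed nonsingular
E12 = 0.1 * rng.standard_normal((r, n - r))
E21 = 0.1 * rng.standard_normal((m - r, r))
E22 = E21 @ np.linalg.solve(A11t, E12)                # condition (3.4)
At = np.block([[A11t, E12], [E21, E22]])              # Ã in the reduced form (3.2)

F21 = E21 @ np.linalg.inv(A11t)
F12 = np.linalg.solve(A11t, E12)
left_factor = np.vstack([np.eye(r), F21])             # (I; F21)
right_factor = np.hstack([np.eye(r), F12])            # (I  F12)
X = np.linalg.pinv(right_factor) @ np.linalg.inv(A11t) @ np.linalg.pinv(left_factor)  # (3.6)

print(np.allclose(At, left_factor @ A11t @ right_factor))      # (3.5)
# The four Penrose conditions, which characterize X as the pseudo-inverse of Ã
print(np.allclose(At @ X @ At, At), np.allclose(X @ At @ X, X),
      np.allclose((At @ X).T, At @ X), np.allclose((X @ At).T, X @ At))
```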
From the above characterization, it is easy to verify that
$$\lim_{\tilde A\to A}\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$$
if and only if $A$ and $\tilde A$ are ultimately acute. Comparing this with (3.1), we see that
the set on which the pseudo-inverse is continuous about $A$ is the set of acute perturbations of $A$.
3.2. General Results
In this subsection we shall establish results, due to Wedin, that do not require acuteness. The first result shows that nonacute perturbations are not only discontinuous but in some sense behave like poles.
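Theorem 3.4. If $A$ and $\tilde A$ are not acute, then
$$\|\tilde A^\dagger - A^\dagger\|_2 \ge \frac{1}{\|E\|_2}. \qquad (3.7)$$
Moreover, if $\operatorname{rank}(\tilde A) \ge \operatorname{rank}(A)$, then
$$\|\tilde A^\dagger\|_2 \ge \frac{1}{\|E\|_2}. \qquad (3.8)$$
Proof. Suppose that $\operatorname{rank}(\tilde A) \ge \operatorname{rank}(A)$. Since $\tilde A$ is not an acute perturbation of $A$, there is a nonzero vector $x \in \mathcal{R}(\tilde A)$ that is orthogonal to $\mathcal{R}(A)$ or a nonzero vector $x \in \mathcal{R}(\tilde A^{\mathrm H})$ that is orthogonal to $\mathcal{R}(A^{\mathrm H})$. Assume without loss of generality that the former is true and that $\|x\|_2 = 1$. Then, since $x^{\mathrm H}A = 0$,
$$\begin{aligned} 1 &= x^{\mathrm H}x = x^{\mathrm H}\tilde Px = x^{\mathrm H}\tilde A\tilde A^\dagger x\\ &= x^{\mathrm H}(A + E)\tilde A^\dagger x = x^{\mathrm H}E\tilde A^\dagger x\\ &\le \|E\|_2\,\|\tilde A^\dagger x\|_2. \end{aligned}$$
Hence $\|\tilde A^\dagger\|_2 \ge \|\tilde A^\dagger x\|_2 \ge 1/\|E\|_2$, which establishes (3.8). To establish (3.7), note that $A^\dagger x = 0$. Hence
$$\|\tilde A^\dagger - A^\dagger\|_2 \ge \|(\tilde A^\dagger - A^\dagger)x\|_2 = \|\tilde A^\dagger x\|_2 \ge \frac{1}{\|E\|_2}.$$
This inequality for $\operatorname{rank}(\tilde A) \le \operatorname{rank}(A)$ is valid by symmetry. ■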
For small perturbations the case of interest is $r = \operatorname{rank}(\tilde A) > \operatorname{rank}(A)$, since $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$ implies that the perturbation is acute when $E$ is sufficiently small. In this case it is easy to understand what is going on. The matrix $\tilde A$ must have $r$ nonzero singular values. Since $A$ has fewer than $r$ nonzero singular values, at least one of the nonzero singular values of $\tilde A$ must approach zero as $\tilde A$ approaches $A$. This means that $\tilde A^\dagger$, whose spectral norm is equal to the reciprocal of the smallest nonzero singular value of $\tilde A$, must diverge.
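The pole-like behavior can be observed by shrinking a generic perturbation that raises the rank. This is a minimal sketch, assuming NumPy. The diagonal $A$ (whose pseudo-inverse is written down exactly) and the direction `E0` are illustrative choices. As `t` decreases, $\|\tilde A^\dagger\|_2$ grows like $1/t$, and both quantities in Theorem 3.4 stay above $1/\|E\|_2$.

```python
import numpy as np

A = np.diag([2.0, 1.0, 0.0])                 # rank 2
A_pinv = np.diag([0.5, 1.0, 0.0])            # A^dagger for this diagonal A
rng = np.random.default_rng(11)
E0 = rng.standard_normal((3, 3))
rows = []
for t in (1e-1, 1e-3, 1e-5):
    E = t * E0                               # generic E, so rank(A + E) = 3 > rank(A)
    At_pinv = np.linalg.pinv(A + E)
    rows.append((t, np.linalg.norm(At_pinv, 2), np.linalg.norm(At_pinv - A_pinv, 2),
                 1 / np.linalg.norm(E, 2)))
    print("t = %.0e  ||At^+||_2 = %.2e  ||At^+ - A^+||_2 = %.2e  1/||E||_2 = %.2e" % rows[-1])
```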
We now turn to our general perturbation bounds. The theorems are based on two lemmas: one containing perturbation bounds on products of projections, and the other an explicit formula for $\tilde A^\dagger$. First the bounds.
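Lemma 3.5. The projections $P_\perp$ and $\tilde P$ satisfy
$$\tilde PP_\perp = (\tilde A^\dagger)^{\mathrm H}E^{\mathrm H}P_\perp, \qquad (3.9)$$
and hence
$$\|\tilde PP_\perp\| \le \|\tilde A^\dagger\|_2\,\|E\|. \qquad (3.10)$$
If $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$, then
$$\|\tilde PP_\perp\| = \|\tilde P_\perp P\|. \qquad (3.11)$$
If $\operatorname{rank}(\tilde A) \ge \operatorname{rank}(A)$, then
$$\|P\tilde P_\perp\| \le \|\tilde PP_\perp\|. \qquad (3.12)$$
Remark 3.6. Similar results hold for the product $P\tilde P_\perp$ as well as for the projections $R$ and $\tilde R$. We will reference this lemma for any of these results.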
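Proof. We have
$$\tilde PP_\perp = \tilde P^{\mathrm H}P_\perp = (\tilde A^\dagger)^{\mathrm H}\tilde A^{\mathrm H}P_\perp = (\tilde A^\dagger)^{\mathrm H}(A + E)^{\mathrm H}P_\perp = (\tilde A^\dagger)^{\mathrm H}E^{\mathrm H}P_\perp.$$
The inequality (3.10) follows on taking norms in (3.9).
The equality (3.11) follows from Theorem I.5.5, which shows that if $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$ then $\tilde PP_\perp$ and $\tilde P_\perp P$ have the same singular values.
To establish (3.12), let $\tilde P = \tilde P_1 + \tilde P_2$, where $\tilde P_1$ and $\tilde P_2$ are orthogonal projections with $\operatorname{rank}(\tilde P_1) = \operatorname{rank}(A)$ and $\mathcal{R}(\tilde P_2) \perp \mathcal{R}(A)$. Then
$$\|P\tilde P_\perp\| = \|P(I - \tilde P_1 - \tilde P_2)\| = \|P(I - \tilde P_1)\| = \|\tilde P_1P_\perp\|,$$
the last equality following from (3.11). Now for any $x$ we have
$$\|\tilde P_1P_\perp x\|_2 \le \|\tilde PP_\perp x\|_2.$$
Hence by Theorem I.5.5, $\|\tilde P_1P_\perp\| \le \|\tilde PP_\perp\|$. ■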
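The second ingredient in our bounds is an explicit expression for $\tilde A^\dagger$. Actually, we will use three closely related expressions.
Lemma 3.7. The difference $\tilde A^\dagger - A^\dagger$ is given by the expressions
$$\tilde A^\dagger - A^\dagger = -\tilde A^\dagger EA^\dagger + \tilde A^\dagger P_\perp - \tilde R_\perp A^\dagger, \qquad (3.13)$$
$$\tilde A^\dagger - A^\dagger = -\tilde A^\dagger\tilde PERA^\dagger + \tilde A^\dagger\tilde PP_\perp - \tilde R_\perp RA^\dagger, \qquad (3.14)$$
and
$$\tilde A^\dagger - A^\dagger = -\tilde A^\dagger\tilde PERA^\dagger + (\tilde A^{\mathrm H}\tilde A)^\dagger\tilde RE^{\mathrm H}P_\perp + \tilde R_\perp E^{\mathrm H}P(AA^{\mathrm H})^\dagger. \qquad (3.15)$$
Proof. These expressions may be verified by replacing all quantities by their definitions in terms of $A$, $\tilde A$, $A^\dagger$, and $\tilde A^\dagger$ (e.g., $E = \tilde A - A$) and simplifying. ■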
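Since Lemma 3.7 consists of algebraic identities, a numerical spot check is a useful guard against sign slips. The sketch below assumes NumPy. It uses a pair in which $\operatorname{rank}(\tilde A) > \operatorname{rank}(A)$ but $\tilde A$ is still rank-deficient, so that $\tilde R_\perp \ne 0$. To avoid numerical rank decisions, both pseudo-inverses come from full-rank factorizations via $(BC)^\dagger = C^\dagger B^\dagger$, and $(\tilde A^{\mathrm H}\tilde A)^\dagger$ and $(AA^{\mathrm H})^\dagger$ are written as $\tilde A^\dagger(\tilde A^\dagger)^{\mathrm H}$ and $(A^\dagger)^{\mathrm H}A^\dagger$. All data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)
m, n, k = 5, 4, 2
B, C = rng.standard_normal((m, k)), rng.standard_normal((k, n))
A = B @ C                                              # rank k
Ap = np.linalg.pinv(C) @ np.linalg.pinv(B)             # A^dagger = C^dagger B^dagger
Bt = np.hstack([B + 1e-2 * rng.standard_normal((m, k)), rng.standard_normal((m, 1))])
Ct = np.vstack([C + 1e-2 * rng.standard_normal((k, n)), 1e-2 * rng.standard_normal((1, n))])
At = Bt @ Ct                                           # rank k + 1 < n
Atp = np.linalg.pinv(Ct) @ np.linalg.pinv(Bt)
E = At - A
H = lambda M: M.conj().T

P, Pt = A @ Ap, At @ Atp                               # column-space projectors
R, Rt = Ap @ A, Atp @ At                               # row-space projectors
P_perp, Rt_perp = np.eye(m) - P, np.eye(n) - Rt

diff = Atp - Ap
f13 = -Atp @ E @ Ap + Atp @ P_perp - Rt_perp @ Ap
f14 = -Atp @ Pt @ E @ R @ Ap + Atp @ Pt @ P_perp - Rt_perp @ R @ Ap
f15 = (-Atp @ Pt @ E @ R @ Ap
       + Atp @ H(Atp) @ Rt @ H(E) @ P_perp             # (Ã^H Ã)^dagger = Ã^dagger (Ã^dagger)^H
       + Rt_perp @ H(E) @ P @ H(Ap) @ Ap)              # (A A^H)^dagger = (A^dagger)^H A^dagger
print([np.allclose(f, diff) for f in (f13, f14, f15)])
```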
We are now in a position to establish a general bound on $\|\tilde A^\dagger - A^\dagger\|$. The exact form of the bound depends on the norm $\|\cdot\|$.
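Theorem 3.8 (Wedin). The error $\tilde A^\dagger - A^\dagger$ has the following bound:
$$\|\tilde A^\dagger - A^\dagger\| \le \mu\max\{\|A^\dagger\|_2^2, \|\tilde A^\dagger\|_2^2\}\,\|E\|, \qquad (3.16)$$
where $\mu$ is given in the following table:
$$\begin{array}{l|ccc} \|\cdot\| & \text{arbitrary} & \text{spectral} & \text{Frobenius}\\ \hline \mu & 3 & \tfrac{1+\sqrt5}{2} & \sqrt2 \end{array}$$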
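Proof. The inequality for an arbitrary norm follows immediately on taking norms in (3.15). The results for the spectral and Frobenius norms require further argument.
For the spectral norm we have from (3.13) that for any unit vector $u \in \mathbb{C}^m$
$$\begin{aligned} \|(\tilde A^\dagger - A^\dagger)u\|_2^2 &= \|-\tilde A^\dagger EA^\dagger u + \tilde A^\dagger P_\perp u\|_2^2 + \|\tilde R_\perp A^\dagger u\|_2^2\\ &= \|-\tilde A^\dagger EA^\dagger Pu + \tilde A^\dagger P_\perp P_\perp u\|_2^2 + \|\tilde R_\perp A^\dagger Pu\|_2^2\\ &\le \big(\|\tilde A^\dagger EA^\dagger\|_2\|Pu\|_2 + \|\tilde A^\dagger P_\perp\|_2\|P_\perp u\|_2\big)^2 + \|\tilde R_\perp A^\dagger\|_2^2\|Pu\|_2^2, \end{aligned} \qquad (3.17)$$
the first equality holding because the first two terms of (3.13) lie in $\mathcal{R}(\tilde R)$ and the third in its orthogonal complement. Let
$$\sigma_1 = \|\tilde A^\dagger EA^\dagger\|_2, \quad \sigma_2 = \|\tilde A^\dagger P_\perp\|_2, \quad \sigma_3 = \|\tilde R_\perp A^\dagger\|_2,$$
$$\cos\theta = \|Pu\|_2 \ge 0, \quad \sin\theta = \|P_\perp u\|_2 \ge 0.$$
Substituting these values into (3.17), we get
$$\begin{aligned} \|(\tilde A^\dagger - A^\dagger)u\|_2^2 &\le (\sigma_1\cos\theta + \sigma_2\sin\theta)^2 + \sigma_3^2\cos^2\theta\\ &\le \max_{0\le\theta\le\pi/2}\big[(\sigma_1\cos\theta + \sigma_2\sin\theta)^2 + \sigma_3^2\cos^2\theta\big]\\ &= \tfrac12\Big\{\sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \big[(\sigma_1^2 + \sigma_2^2 + \sigma_3^2)^2 - 4\sigma_2^2\sigma_3^2\big]^{1/2}\Big\}\\ &\le \frac{3 + \sqrt5}{2}\max\{\sigma_1^2, \sigma_2^2, \sigma_3^2\}\\ &= \Big(\frac{1 + \sqrt5}{2}\Big)^2\max\{\sigma_1^2, \sigma_2^2, \sigma_3^2\}. \end{aligned} \qquad (3.18)$$
Now
$$\sigma_1 \le \|\tilde A^\dagger\|_2\|A^\dagger\|_2\|E\|_2.$$
By Lemma 3.5,
$$\sigma_2 = \|\tilde A^\dagger\tilde PP_\perp\|_2 \le \|\tilde A^\dagger\|_2\|\tilde PP_\perp\|_2 \le \|\tilde A^\dagger\|_2^2\|E\|_2,$$
and similarly
$$\sigma_3 \le \|A^\dagger\|_2^2\|E\|_2.$$
Hence from (3.18) we obtain
$$\|\tilde A^\dagger - A^\dagger\|_2 \le \frac{1 + \sqrt5}{2}\max\{\|A^\dagger\|_2^2, \|\tilde A^\dagger\|_2^2\}\,\|E\|_2. \qquad (3.19)$$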
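For the Frobenius norm, we first consider the case where $\operatorname{rank}(\tilde A) \le \operatorname{rank}(A)$. Let
$$F_1 = -\tilde A^\dagger\tilde PERA^\dagger, \qquad F_2 = \tilde A^\dagger\tilde PP_\perp, \qquad F_3 = -\tilde R_\perp RA^\dagger$$
be the terms in (3.14). Since the columns of $F_1 + F_2$ lie in $\mathcal{R}(\tilde R)$ and those of $F_3$ in $\mathcal{R}(\tilde R_\perp)$,
$$\|\tilde A^\dagger - A^\dagger\|_F^2 = \|F_1 + F_2\|_F^2 + \|F_3\|_F^2. \qquad (3.20)$$
Now $F_1 + F_2 = \tilde A^\dagger(-\tilde PEA^\dagger P + \tilde PP_\perp)$. Hence
$$\|F_1 + F_2\|_F^2 \le \|\tilde A^\dagger\|_2^2\big(\|\tilde PEA^\dagger P\|_F^2 + \|\tilde PP_\perp\|_F^2\big).$$
It follows from Lemma 3.5 (applied with the roles of $A$ and $\tilde A$ interchanged, which is legitimate since $\operatorname{rank}(\tilde A) \le \operatorname{rank}(A)$) that
$$\begin{aligned} \|\tilde PEA^\dagger P\|_F^2 + \|\tilde PP_\perp\|_F^2 &\le \|\tilde PEA^\dagger\|_F^2 + \|\tilde P_\perp P\|_F^2\\ &= \|\tilde PEA^\dagger\|_F^2 + \|\tilde P_\perp EA^\dagger\|_F^2\\ &= \|EA^\dagger\|_F^2\\ &\le \|A^\dagger\|_2^2\|E\|_F^2. \end{aligned}$$
Consequently,
$$\|F_1 + F_2\|_F \le \|\tilde A^\dagger\|_2\|A^\dagger\|_2\|E\|_F.$$
Moreover, we have
$$\|F_3\|_F \le \|A^\dagger\|_2\|\tilde R_\perp R\|_F = \|A^\dagger\|_2\|\tilde R_\perp E^{\mathrm H}(A^\dagger)^{\mathrm H}\|_F \le \|A^\dagger\|_2^2\|E\|_F.$$
Hence from (3.20) we get
$$\begin{aligned} \|\tilde A^\dagger - A^\dagger\|_F &\le \sqrt2\,\|A^\dagger\|_2\max\{\|A^\dagger\|_2, \|\tilde A^\dagger\|_2\}\,\|E\|_F\\ &\le \sqrt2\max\{\|A^\dagger\|_2^2, \|\tilde A^\dagger\|_2^2\}\,\|E\|_F. \end{aligned} \qquad (3.21)$$
Since the bound is symmetric in $A$ and $\tilde A$, it also holds for the case $\operatorname{rank}(\tilde A) \ge \operatorname{rank}(A)$. ■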
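Theorem 3.8 can be spot-checked for a perturbation that raises the rank, which is the case where no bound of inverse type is available. This is a minimal sketch, assuming NumPy. The nuclear norm stands in for one "arbitrary" unitarily invariant norm. $A^\dagger$ is computed from a full-rank factorization so that no numerical rank decision is needed. The data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(10)
m, n, k = 6, 4, 2
B, C = rng.standard_normal((m, k)), rng.standard_normal((k, n))
A = B @ C                                            # rank 2
Ap = np.linalg.pinv(C) @ np.linalg.pinv(B)           # A^dagger via a full-rank factorization
mu = {"nuc": 3.0, 2: (1 + np.sqrt(5)) / 2, "fro": np.sqrt(2)}   # nuclear norm as an "arbitrary" UI norm
checks = []
for scale in (1e-1, 1e-3):
    E = scale * rng.standard_normal((m, n))          # generic E: rank(A + E) = 4 > rank(A)
    Atp = np.linalg.pinv(A + E)
    M2 = max(np.linalg.norm(Ap, 2), np.linalg.norm(Atp, 2)) ** 2
    for ord_, mu_ord in mu.items():
        lhs = np.linalg.norm(Atp - Ap, ord_)
        rhs = mu_ord * M2 * np.linalg.norm(E, ord_)
        checks.append(lhs <= rhs)
        print(f"scale={scale:.0e} norm={ord_}: {lhs:.3e} <= {rhs:.3e}")
```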
Although the perturbation bounds bear a family resemblance to the bounds for matrix inverses, they cannot by themselves ensure the convergence of $\tilde A^\dagger$ to $A^\dagger$ as $E \to 0$, since $\tilde A^\dagger$ may grow unboundedly. What is needed is the additional hypothesis that $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$, which ensures that $A$ and $\tilde A$ will become acute as $E \to 0$. The following theorem gives the perturbation bounds that hold for this case. It is established by essentially the same arguments as Theorem 3.8, with the difference that some of the terms vanish in the expressions for $\tilde A^\dagger - A^\dagger$. The details are left as an exercise.
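Theorem 3.9 (Wedin). Let $A \in \mathbb{C}^{m\times n}$, where $m \ge n$. If $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$, then
$$\|\tilde A^\dagger - A^\dagger\| \le \mu\|A^\dagger\|_2\|\tilde A^\dagger\|_2\|E\|, \qquad (3.22)$$
where $\mu$ is given by the following table (columns: the norm $\|\cdot\|$):
$$\begin{array}{l|ccc} & \text{arbitrary} & \text{spectral} & \text{Frobenius}\\ \hline \operatorname{rank}(A) < \min(m, n) & 3 & \tfrac{1+\sqrt5}{2} & \sqrt2\\ \operatorname{rank}(A) = n < m & 2 & \sqrt2 & 1\\ \operatorname{rank}(A) = m = n & 1 & 1 & 1 \end{array}$$
A trivial rearrangement of (3.22) gives a familiar looking corollary. Let
$$\kappa_2(A) = \|A\|_2\|A^\dagger\|_2.$$
Then
$$\frac{\|\tilde A^\dagger - A^\dagger\|}{\|\tilde A^\dagger\|_2} \le \mu\kappa_2(A)\frac{\|E\|}{\|A\|_2}. \qquad (3.23)$$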
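The rank-preserving bound (3.22) can be checked row by row against the table. This is a minimal sketch, assuming NumPy. The helper `wedin_holds` is hypothetical. Rank preservation in the first case is enforced by perturbing the factors of a full-rank factorization, which also gives the pseudo-inverses without numerical rank decisions. All data are random illustrative choices.

```python
import numpy as np

def wedin_holds(Ap, Atp, E, mu2, muF):
    """Check (3.22) in the spectral and Frobenius norms."""
    c = np.linalg.norm(Ap, 2) * np.linalg.norm(Atp, 2)
    D = Atp - Ap
    return (np.linalg.norm(D, 2) <= mu2 * c * np.linalg.norm(E, 2),
            np.linalg.norm(D, "fro") <= muF * c * np.linalg.norm(E, "fro"))

rng = np.random.default_rng(12)
m, n, k = 6, 4, 2
# rank(A) = rank(Ã) = k < min(m, n)
B, C = rng.standard_normal((m, k)), rng.standard_normal((k, n))
Bt = B + 1e-2 * rng.standard_normal((m, k))
Ct = C + 1e-2 * rng.standard_normal((k, n))
Ap = np.linalg.pinv(C) @ np.linalg.pinv(B)
Atp = np.linalg.pinv(Ct) @ np.linalg.pinv(Bt)
case1 = wedin_holds(Ap, Atp, Bt @ Ct - B @ C, (1 + np.sqrt(5)) / 2, np.sqrt(2))

# rank(A) = rank(Ã) = n < m
A2 = rng.standard_normal((m, n))
E2 = 1e-2 * rng.standard_normal((m, n))
case2 = wedin_holds(np.linalg.pinv(A2), np.linalg.pinv(A2 + E2), E2, np.sqrt(2), 1.0)

# square and nonsingular
A3 = rng.standard_normal((n, n))
E3 = 1e-2 * rng.standard_normal((n, n))
case3 = wedin_holds(np.linalg.inv(A3), np.linalg.inv(A3 + E3), E3, 1.0, 1.0)
print(case1, case2, case3)
```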
There are two points to make about this corollary. First, although the number $\kappa_2(A)$ is usually called the condition number of $A$, the theorem shows that the "real" condition number is $\mu\kappa_2(A)$, at least to the extent that the bounds really describe the behavior of $\tilde A^\dagger$. However, the usage is sanctioned by custom; and if we take the view that a condition number is any number whose size gives a rough estimate of the sensitivity of the problem, then the usage is even correct.
The second point is that as $E \to 0$ the right-hand side of (3.23) approaches zero. This means that the relative error in $\tilde A^\dagger$ approaches zero; i.e., $\tilde A^\dagger \to A^\dagger$, all under the hypothesis that $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$. On the other hand, if $\operatorname{rank}(\tilde A) \ne \operatorname{rank}(A)$, then by Theorem 3.4 the matrix $\tilde A^\dagger$ cannot converge. Thus we have established the statement (3.1) with which we opened this section:
a necessary and sufficient condition for $\tilde A^\dagger \to A^\dagger$ as $\tilde A \to A$ is that $\operatorname{rank}(\tilde A) \to \operatorname{rank}(A)$.
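A two-by-two example makes statement (3.1) concrete. In the minimal sketch below (NumPy assumed; the function name `pinv_gap` is ours), perturbing $A = \operatorname{diag}(1, 0)$ by $\epsilon$ in a position that raises the rank gives $\|\tilde A^\dagger - A^\dagger\|_2 = 1/\epsilon$, while a perturbation of the same size that keeps the rank at one moves the pseudo-inverse by only $\epsilon/(1+\epsilon^2)^{1/2}$.

```python
import numpy as np

def pinv_gap(eps):
    """||pinv(At) - pinv(A)||_2 for a rank-raising and a rank-preserving perturbation."""
    A = np.array([[1.0, 0.0], [0.0, 0.0]])          # rank 1
    At_raise = np.array([[1.0, 0.0], [0.0, eps]])   # rank 2: the rank jumps
    At_keep = np.array([[1.0, eps], [0.0, 0.0]])    # rank 1: the rank is preserved
    Ap = np.linalg.pinv(A)
    jump = np.linalg.norm(np.linalg.pinv(At_raise) - Ap, 2)
    keep = np.linalg.norm(np.linalg.pinv(At_keep) - Ap, 2)
    return jump, keep

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, pinv_gap(eps))   # jump grows like 1/eps, keep shrinks like eps
```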
3.3. Acute Perturbations
It is evident from the proof of Theorem 3.8 that we have given away much in deriving the bounds. In particular, if $\tilde A$ is a small acute perturbation of $A$, then $P$ and $\tilde P$ are nearly equal, and the same is true of $R$ and $\tilde R$. Thus it follows from (3.15) that $\tilde A^\dagger - A^\dagger$ can be decomposed into three terms: one essentially depending on $PER$, one on $PER^\perp$, and one on $P^\perp ER$. However, this does not tell the whole story; for we shall show that the dependency of $\tilde A^\dagger - A^\dagger$ on $PER^\perp$ and $P^\perp ER$ is bounded, no matter how large these projections may be.
In order to state our results concisely, we must introduce some additional notation. Let $\|\cdot\|$ be generated by the symmetric gauge function $\Phi$, and for any $F \in \mathbb{C}^{k\times r}$ $(k \ge r)$ define
$$\Psi_\Phi(F) \overset{\text{def}}{=} \Phi\left(\frac{\sigma_1(F)}{[1 + \sigma_1^2(F)]^{1/2}}, \ldots, \frac{\sigma_r(F)}{[1 + \sigma_r^2(F)]^{1/2}}\right). \tag{3.24}$$
The function $\Psi_\Phi$ is not a norm; however, it has some useful properties. First, from Theorems I.4.5 and II.3.9 and the monotonicity of $\Phi$,
$$\Psi_\Phi(GF) \le \Psi_\Phi(\|G\|_2F).$$
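Second, since for $\alpha \ge 1$
$$\frac{\alpha\sigma}{(1 + \alpha^2\sigma^2)^{1/2}} \le \frac{\alpha\sigma}{(1 + \sigma^2)^{1/2}},$$
we have
$$\alpha \ge 1 \implies \Psi_\Phi(\alpha F) \le \alpha\,\Psi_\Phi(F).$$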
For small $F$, the function $\Psi_\Phi$ is asymptotic to $\|F\|$, and for all $F$ it is bounded: viz.,
$$\Psi_\Phi(F) < \|I_r\|.$$
Finally, for the spectral norm
$$\Psi_2(F) = \frac{\|F\|_2}{(1 + \|F\|_2^2)^{1/2}}.$$
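Since $\Psi_\Phi$ depends only on the singular values of $F$, it is straightforward to evaluate. The minimal sketch below (NumPy assumed; `psi` is our own name) implements it for the two symmetric gauge functions used most often, the maximum (spectral norm) and the Euclidean length (Frobenius norm), so that the properties just listed can be checked on examples.

```python
import numpy as np

def psi(F, norm="2"):
    """Psi_Phi(F) of (3.24) for the spectral ("2") or Frobenius ("fro") norm."""
    s = np.linalg.svd(F, compute_uv=False)
    t = s / np.sqrt(1.0 + s**2)              # each entry lies in [0, 1)
    if norm == "2":
        return float(t.max())                # Phi = max
    if norm == "fro":
        return float(np.sqrt(np.sum(t**2)))  # Phi = Euclidean length
    raise ValueError("norm must be '2' or 'fro'")

rng = np.random.default_rng(0)
F = rng.standard_normal((5, 3))
n2 = np.linalg.norm(F, 2)
print(psi(F), n2 / np.sqrt(1 + n2**2))       # the spectral-norm formula
print(psi(1e3 * F, "fro"), np.sqrt(3))       # bounded by ||I_r||_F = sqrt(r)
```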
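Our first result concerns a rather special matrix.
Lemma 3.11. The matrix $\begin{pmatrix} I\\ F\end{pmatrix}$ satisfies
$$\left\|\begin{pmatrix} I\\ F\end{pmatrix}^\dagger\right\|_2 \le 1 \tag{3.25}$$
and
$$\left\|\begin{pmatrix} I\\ F\end{pmatrix}^\dagger - (I\ \ 0)\right\| = \Psi_\Phi(F). \tag{3.26}$$
Proof. Let $\sigma_i(F)$ be the singular values of $F$. It is easily verified that
$$\begin{pmatrix} I\\ F\end{pmatrix}^\dagger = (I + F^HF)^{-1}(I\ \ F^H), \tag{3.27}$$
whose singular values are
$$\frac{1}{[1 + \sigma_i^2(F)]^{1/2}} \le 1,$$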
from which (3.25) follows. Also if
$$G = \begin{pmatrix} I\\ F\end{pmatrix}^\dagger - (I\ \ 0),$$
then
$$GG^H = I - (I + F^HF)^{-1}.$$
It follows that the singular values of $G$ are given by
$$\frac{\sigma_i(F)}{[1 + \sigma_i^2(F)]^{1/2}},$$
which establishes (3.26).
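Lemma 3.11 can also be verified directly. The minimal sketch below (NumPy and real data assumed; the function name is ours) forms $(I;\,F)$, compares its pseudo-inverse with formula (3.27), and compares the spectral and Frobenius norms of $G$ with $\Psi_2(F)$ and $\Psi_F(F)$ computed from the singular values of $F$.

```python
import numpy as np

def lemma_3_11_quantities(F):
    """Return pinv((I; F)), formula (3.27), and G = pinv((I; F)) - (I 0)."""
    k, r = F.shape
    I = np.eye(r)
    J = np.vstack([I, F])                                   # (r + k) x r
    Jp = np.linalg.pinv(J)                                  # r x (r + k)
    formula = np.linalg.solve(I + F.conj().T @ F, np.hstack([I, F.conj().T]))  # (3.27)
    G = Jp - np.hstack([I, np.zeros((r, k))])
    return Jp, formula, G

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 3))
Jp, formula, G = lemma_3_11_quantities(F)
s = np.linalg.svd(F, compute_uv=False)
t = s / np.sqrt(1 + s**2)
print(np.linalg.norm(Jp, 2) <= 1)                          # (3.25)
print(np.linalg.norm(G, 2), t.max())                       # (3.26), spectral norm
print(np.linalg.norm(G, "fro"), np.sqrt(np.sum(t**2)))     # (3.26), Frobenius norm
```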
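We turn now to the perturbation theorem.
Theorem 3.12. Let $\tilde A$ be an acute perturbation of $A$, and let
$$\tilde\kappa = \|A\|\,\|\tilde A_{11}^{-1}\|_2.$$
Then
$$\frac{\|\tilde A^\dagger - A^\dagger\|}{\|A^\dagger\|_2} \le \tilde\kappa\,\frac{\|E_{11}\|}{\|A\|} + \Psi_\Phi\!\left(\tilde\kappa\,\frac{E_{12}}{\|A\|}\right) + \Psi_\Phi\!\left(\tilde\kappa\,\frac{E_{21}}{\|A\|}\right), \tag{3.28}$$
where $\Psi_\Phi$ is defined by (3.24).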
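Proof. Let $F_{ij}$ be defined as in Theorem 3.3. Let
$$I_{21} = \begin{pmatrix} I\\ 0\end{pmatrix},\qquad I_{12} = (I\ \ 0),\qquad J_{21} = \begin{pmatrix} I\\ F_{21}\end{pmatrix},\qquad J_{12} = (I\ \ F_{12}).$$
From (3.6), $\tilde A^\dagger = J_{12}^\dagger\tilde A_{11}^{-1}J_{21}^\dagger$; hence
$$\tilde A^\dagger - A^\dagger = (J_{12}^\dagger - I_{12}^\dagger)A_{11}^{-1}J_{21}^\dagger + I_{12}^\dagger A_{11}^{-1}(J_{21}^\dagger - I_{21}^\dagger) + J_{12}^\dagger(\tilde A_{11}^{-1} - A_{11}^{-1})J_{21}^\dagger. \tag{3.29}$$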
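From Corollary 2.7 we have the following bound:
$$\|J_{12}^\dagger(\tilde A_{11}^{-1} - A_{11}^{-1})J_{21}^\dagger\| \le \|A_{11}^{-1}\|_2\,\tilde\kappa\,\frac{\|E_{11}\|}{\|A\|}. \tag{3.30}$$
By Lemma 3.11
$$\begin{aligned}
\|(J_{12}^\dagger - I_{12}^\dagger)A_{11}^{-1}J_{21}^\dagger\| &\le \|A_{11}^{-1}\|_2\,\|J_{12}^\dagger - I_{12}^\dagger\|\\
&= \|A_{11}^{-1}\|_2\,\Psi_\Phi(F_{12})\\
&= \|A_{11}^{-1}\|_2\,\Psi_\Phi(\tilde A_{11}^{-1}E_{12})\\
&\le \|A_{11}^{-1}\|_2\,\Psi_\Phi\!\left(\tilde\kappa\,\frac{E_{12}}{\|A\|}\right),
\end{aligned}\tag{3.31}$$
and likewise
$$\|I_{12}^\dagger A_{11}^{-1}(J_{21}^\dagger - I_{21}^\dagger)\| \le \|A_{11}^{-1}\|_2\,\Psi_\Phi\!\left(\tilde\kappa\,\frac{E_{21}}{\|A\|}\right). \tag{3.32}$$
The bound (3.28) follows on combining (3.29), (3.30), (3.31), and (3.32) and recalling that $\|A_{11}^{-1}\|_2 = \|A^\dagger\|_2$.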
The bound (3.28) gives a rather nice dissection of $\|\tilde A^\dagger - A^\dagger\|$. Asymptotically, it is better than the bound that would be obtained by taking norms in (3.15); i.e.,
$$\frac{\|\tilde A^\dagger - A^\dagger\|}{\|A^\dagger\|_2} \le \tilde\kappa\,\frac{\|E_{11}\| + \|E_{12}\| + \|E_{21}\|}{\|A\|}$$
(the two are asymptotically equal for $E_{12}$ and $E_{21}$ small). However, the bound also shows that $E_{12}$ and $E_{21}$ can have at most a bounded effect on $\|\tilde A^\dagger - A^\dagger\|$.
When $A$ is square and nonsingular, $E_{12}$ and $E_{21}$ are void, and the bound reduces to that of Corollary 2.7.
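The bound (3.28) can be checked in reduced coordinates ($U = V = I$, which loses nothing because the norms involved are unitarily invariant). The minimal sketch below assumes NumPy and real data and uses the spectral norm, so $\Psi_\Phi = \Psi_2$. To obtain an acute perturbation it chooses $E_{22} = E_{21}\tilde A_{11}^{-1}E_{12}$, which makes the Schur complement vanish and hence $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$; this construction and the helper names `psi2`, `pinv_rank`, and `theorem_3_12_check` are our own.

```python
import numpy as np

def psi2(F):
    """Psi_2(F) = ||F||_2 / sqrt(1 + ||F||_2^2)."""
    s = np.linalg.norm(F, 2)
    return s / np.sqrt(1.0 + s**2)

def pinv_rank(M, r):
    """Pseudo-inverse from the r largest singular triplets of M."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return (Vh[:r].conj().T / s[:r]) @ U[:, :r].conj().T

def theorem_3_12_check(m=7, n=5, r=3, eps=1e-2, seed=1):
    rng = np.random.default_rng(seed)
    A11 = rng.standard_normal((r, r)) + 3 * np.eye(r)
    A = np.zeros((m, n))
    A[:r, :r] = A11                                        # reduced form of A
    E11 = eps * rng.standard_normal((r, r))
    E12 = eps * rng.standard_normal((r, n - r))
    E21 = eps * rng.standard_normal((m - r, r))
    At11 = A11 + E11
    # Zero Schur complement: E22 = E21 At11^{-1} E12 keeps rank(At) = r.
    At = np.block([[At11, E12], [E21, E21 @ np.linalg.solve(At11, E12)]])
    nA = np.linalg.norm(A, 2)
    kt = nA * np.linalg.norm(np.linalg.inv(At11), 2)       # kappa-tilde
    Ap = pinv_rank(A, r)
    lhs = np.linalg.norm(pinv_rank(At, r) - Ap, 2) / np.linalg.norm(Ap, 2)
    rhs = kt * np.linalg.norm(E11, 2) / nA + psi2(kt * E12 / nA) + psi2(kt * E21 / nA)
    return lhs, rhs

print(theorem_3_12_check())
```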
If $E_{11}$ is sufficiently small, we can estimate $\|\tilde A_{11}^{-1}\|_2$ in terms of $\|A_{11}^{-1}\|_2$ and $\|E_{11}\|_2$. This gives the following corollary.
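Corollary 3.13. In Theorem 3.12, let
$$\kappa = \|A\|\,\|A^\dagger\|_2,$$
and assume that
$$\|A^\dagger\|_2\,\|E_{11}\|_2 < 1,$$
so that
$$\gamma \equiv 1 - \kappa\,\frac{\|E_{11}\|_2}{\|A\|} > 0.$$
Then
$$\|\tilde A^\dagger\|_2 \le \frac{\|A^\dagger\|_2}{\gamma} \tag{3.33}$$
and
$$\frac{\|\tilde A^\dagger - A^\dagger\|}{\|A^\dagger\|_2} \le \frac{\kappa}{\gamma}\,\frac{\|E_{11}\|}{\|A\|} + \Psi_\Phi\!\left(\frac{\kappa}{\gamma}\,\frac{E_{21}}{\|A\|}\right) + \Psi_\Phi\!\left(\frac{\kappa}{\gamma}\,\frac{E_{12}}{\|A\|}\right). \tag{3.34}$$
Proof. From the equation $\tilde A^\dagger = J_{12}^\dagger\tilde A_{11}^{-1}J_{21}^\dagger$, we have
$$\|\tilde A^\dagger\|_2 \le \|J_{12}^\dagger\|_2\,\|\tilde A_{11}^{-1}\|_2\,\|J_{21}^\dagger\|_2 \le \|\tilde A_{11}^{-1}\|_2.$$
By Corollary 2.7,
$$\|\tilde A_{11}^{-1}\|_2 \le \frac{\|A_{11}^{-1}\|_2}{\gamma} = \frac{\|A^\dagger\|_2}{\gamma},$$
which establishes (3.33). Also $\tilde\kappa \le \kappa/\gamma$, and the inequality (3.34) follows from (3.28).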
3.4. Asymptotic Forms and Derivatives
Asymptotic forms for $\tilde A^\dagger$ may be obtained from (3.15). Of course, for $\tilde A^\dagger$ to approach $A^\dagger$ we must have $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$; and since we are assuming that $E$ is arbitrarily small, $\tilde A$ may be assumed to be an acute perturbation of $A$. In this case
$$\tilde A^\dagger = A^\dagger + O(\|E\|),$$
and
$$\tilde P = \tilde A\tilde A^\dagger = (A + E)[A^\dagger + O(\|E\|)] = P + O(\|E\|),$$
with similar expressions for the other projections. Hence from (3.15)
$$\tilde A^\dagger = A^\dagger - A^\dagger PERA^\dagger + (A^HA)^\dagger RE^HP^\perp + R^\perp E^HP(AA^H)^\dagger + O(\|E\|^2). \tag{3.35}$$
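A first-order expansion such as (3.35) can be tested by checking that the remainder shrinks quadratically. The minimal sketch below assumes NumPy and real data (so $^H$ becomes $^T$), builds a rank-preserving perturbation by perturbing the factors of $A = XY^T$, and measures $\|\tilde A^\dagger - A^\dagger - (\text{first-order term})\|_2$ for two values of $\epsilon$. The ratio should be close to 100 when $\epsilon$ drops by a factor of 10. The last term, $R^\perp E^HP(AA^H)^\dagger$, enters with a plus sign: for $A = (1\ \ 0)$ and $\tilde A = (1\ \ \epsilon)$ one has $\tilde A^\dagger - A^\dagger \approx (0,\ \epsilon)^T$, which is exactly that term. The helper names are ours.

```python
import numpy as np

def pinv_rank(M, r):
    """Pseudo-inverse from the r largest singular triplets (real data assumed)."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return (Vh[:r].T / s[:r]) @ U[:, :r].T

def first_order_remainder(eps, m=6, n=5, r=3, seed=2):
    rng = np.random.default_rng(seed)
    X, Y = rng.standard_normal((m, r)), rng.standard_normal((n, r))
    dX, dY = rng.standard_normal((m, r)), rng.standard_normal((n, r))
    A = X @ Y.T
    E = (X + eps * dX) @ (Y + eps * dY).T - A      # rank-preserving perturbation
    Ap = pinv_rank(A, r)
    P, R = A @ Ap, Ap @ A                          # projections onto R(A), R(A^T)
    Pperp, Rperp = np.eye(m) - P, np.eye(n) - R
    AtA_p, AAt_p = Ap @ Ap.T, Ap.T @ Ap            # (A^T A)^+ and (A A^T)^+
    first = (-Ap @ P @ E @ R @ Ap
             + AtA_p @ R @ E.T @ Pperp
             + Rperp @ E.T @ P @ AAt_p)
    return np.linalg.norm(pinv_rank(A + E, r) - Ap - first, 2)

ratio = first_order_remainder(1e-3) / first_order_remainder(1e-4)
print(ratio)   # close to 100 for an O(||E||^2) remainder
```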
We could apply this formula, as we did the corresponding formula for the inverse, to calculate $\partial A^\dagger/\partial a_{ij}$; however, the results are complicated. Instead, let us assume that $A(\tau)$ is a differentiable function of $\tau$ with
$$\operatorname{rank}[A(\tau)] = \operatorname{rank}[A(\tau')]$$
for all $\tau$ and $\tau'$. Then $A(\tau)^\dagger$ is a differentiable function of $\tau$ and
$$\frac{dA^\dagger}{d\tau} = -A^\dagger P\frac{dA}{d\tau}RA^\dagger + (A^HA)^\dagger R\frac{dA^H}{d\tau}P^\perp + R^\perp\frac{dA^H}{d\tau}P(AA^H)^\dagger. \tag{3.36}$$
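The reduced form of (3.35) can be computationally useful. From the results of the last section we have
$$\tilde A_{11}^{-1} = A_{11}^{-1} - A_{11}^{-1}E_{11}A_{11}^{-1} + O(\|E_{11}\|^2).$$
From (3.27) in the proof of Lemma 3.11 we have
$$\begin{pmatrix} I\\ F_{21}\end{pmatrix}^\dagger = (I\ \ A_{11}^{-H}E_{21}^H) + O(\|E_{11}\|\,\|E_{21}\|)$$
and
$$(I\ \ F_{12})^\dagger = \begin{pmatrix} I\\ E_{12}^HA_{11}^{-H}\end{pmatrix} + O(\|E_{11}\|\,\|E_{12}\|).$$
Hence from (3.6)
$$\tilde A^\dagger = \begin{pmatrix}
A_{11}^{-1} - A_{11}^{-1}E_{11}A_{11}^{-1} + O(\|E_{11}\|^2) & (A_{11}^HA_{11})^{-1}E_{21}^H + O(\|E_{11}\|\,\|E_{21}\|)\\
E_{12}^H(A_{11}A_{11}^H)^{-1} + O(\|E_{11}\|\,\|E_{12}\|) & E_{12}^H(A_{11}^HA_{11}A_{11}^H)^{-1}E_{21}^H + O(\|E_{11}\|\,\|E_{12}\|\,\|E_{21}\|)
\end{pmatrix}.$$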
Notes and References
Much of the material in this and the next two sections has been taken, sometimes word for word, from a survey article by Stewart [205, 1977]. The notion of acute subspaces is due to Davis and Kahan [53, 1970], who used the second condition of Theorem 3.1 as a characterization. The notion of an acute perturbation of a matrix is due to Wedin [260, 1973].
Penrose [178, 1955] established that the pseudo-inverse is continuous only if the rank is unchanged. However, he used techniques that do not give explicit perturbation bounds. The subject was revived by Golub and Wilkinson [94, 1966], whose interest in stable algorithms for solving least squares problems (see [88]) led them to derive first-order perturbation bounds for least squares solutions. The first perturbation bounds for the pseudo-inverse itself were given by Ben-Israel [24, 1966], who restricts his class of perturbations so that (in reduced form) only $E_{11}$ is nonzero. More general bounds for acute perturbations were established by Hanson and Lawson [102, 1969], Pereyra [180, 1969], and Stewart [199, 1969]. Theorem 3.12 extends Stewart's bound to unitarily invariant norms. An identity in terms of projections related to (3.6) is given by Wedin [260, 1973], who uses it to derive bounds for acute perturbations.
The general results in the second subsection are essentially due to Wedin [260, 1973]. Theorem 3.4 is an extension by Wedin of a theorem of Stewart [199, 1969]. In an earlier report Wedin [258, 1969] considers the sharpness of the constants $\mu$ in Theorem 3.9 and shows that for the spectral norm the constant $\mu$ cannot be made smaller.
Early differentiability results have been given by Pavel-Parvu and Korganoff [176, 1969], Hearon and Evans [107, 1968], and Decell [54, 1972]. Wedin [258, 1969] derived the formula (3.36), as we do, from (3.15). The same results for functions of several variables were derived independently by Golub and Pereyra [90, 1973] in connection with separable nonlinear least squares problems. For more, see [91].
Exercises
1. Show that if $X$ has linearly independent columns and $B$ is positive definite, then $\mathcal{R}(X)$ and $\mathcal{R}(BX)$ are acute.
2. Let $X_1$ and $Y_1$ have full column rank and suppose that $\mathcal{R}(X_1)$ and $\mathcal{R}(Y_1)$ are acute. Show that if the columns of $X_2$ span $\mathcal{R}(Y_1)^\perp$, then $(X_1\ \ X_2)$ is nonsingular. [Hint: Use canonical bases.]
3. Show that $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$ is not sufficient for $\tilde A$ to be an acute perturbation of $A$.
4. Give an example of matrices $A$ and $\tilde A$ such that $\|P_{\tilde A} - P_A\|_2 < 1$, while $\|R_{\tilde A} - R_A\|_2 = 1$.
5. Let $\kappa_2(A) = \|A\|_2\|A^\dagger\|_2$. Show that if $\operatorname{rank}(\tilde A) < \operatorname{rank}(A)$ then
$$\frac{\|\tilde A - A\|_2}{\|A\|_2} \ge \kappa_2^{-1}(A),$$
and the bound can be attained.
6. (Wedin [258].) Show that the constants for the spectral norm in Theorem 3.9 are optimal. [Note: this is a hard problem. Wedin solves it, not by exhibiting matrices for which the bounds are attained, but by exhibiting matrices for which the bounds are asymptotically sharp.]
7. Let $\Psi_\Phi$ be defined by (3.24). Show that
$$|\alpha| \le 1 \implies |\alpha|\,\Psi_\Phi(F) \le \Psi_\Phi(\alpha F)$$
and
$$\frac{\|F\|}{[1 + \sigma_{\max}^2(F)]^{1/2}} \le \Psi_\Phi(F) \le \|F\|.$$
8. Let $A$ be of full rank. Calculate $\partial A^\dagger/\partial a_{ij}$.
4. Projections
In this section we shall consider how the projection $P$ varies with $A$. Since $P = AA^\dagger$, we can obtain perturbation bounds for $P$ from the theory developed in the last section. However, we can derive sharper bounds by working directly with one of the decompositions of $\tilde A^\dagger$. In particular, we shall work with the decomposition (3.6) based on the reduced forms of $A$ and $\tilde A$. The use of this form presupposes that $A$ and $\tilde A$ are acute. This is no loss, since, as with the pseudo-inverse, we must require $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$ to ensure the continuity of $P_A$, which in turn implies that the perturbation is ultimately acute.
Theorem 4.1. Let $\tilde A$ be an acute perturbation of $A$, and let $\tilde\kappa$ be defined as in Theorem 3.12. Then
$$\|\tilde P - P\|_2 \le \frac{\tilde\kappa\,\|E_{21}\|_2/\|A\|}{[1 + (\tilde\kappa\,\|E_{21}\|_2/\|A\|)^2]^{1/2}} < 1. \tag{4.1}$$
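Proof. With $F_{21}$ defined as in Theorem 3.3, we have
$$\mathcal{R}(\tilde A) = \mathcal{R}\begin{pmatrix} I\\ F_{21}\end{pmatrix}.$$
The matrix
$$\begin{pmatrix} I\\ F_{21}\end{pmatrix}(I + F_{21}^HF_{21})^{-1}(I\ \ F_{21}^H)$$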
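is a Hermitian, idempotent matrix whose column space is $\mathcal{R}(\tilde A)$; hence it is $\tilde P$. It follows that
$$\tilde P - P = \begin{pmatrix} (I + F_{21}^HF_{21})^{-1} - I & (I + F_{21}^HF_{21})^{-1}F_{21}^H\\ F_{21}(I + F_{21}^HF_{21})^{-1} & F_{21}(I + F_{21}^HF_{21})^{-1}F_{21}^H\end{pmatrix}, \tag{4.2}$$
from which it is easily verified that
$$(\tilde P - P)^2 = \begin{pmatrix} F_{21}^HF_{21}(I + F_{21}^HF_{21})^{-1} & 0\\ 0 & F_{21}(I + F_{21}^HF_{21})^{-1}F_{21}^H\end{pmatrix}. \tag{4.3}$$
Now the nonzero singular values of the diagonal blocks in (4.3) are
$$\sigma_i^2(F_{21})/[1 + \sigma_i^2(F_{21})],$$
where the $\sigma_i(F_{21})$ are the nonzero singular values of $F_{21}$. Since $\tilde P - P$ is Hermitian, $\|\tilde P - P\|_2^2 = \|(\tilde P - P)^2\|_2$. The bound follows from the fact that the largest singular value $\sigma_1$ of $F_{21}$ satisfies
$$\sigma_1(F_{21}) = \|F_{21}\|_2 \le \tilde\kappa\,\frac{\|E_{21}\|_2}{\|A\|}.$$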
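The proof actually gives more than (4.1): it shows that $\|\tilde P - P\|_2 = \sigma_1(F_{21})/[1 + \sigma_1^2(F_{21})]^{1/2}$ exactly, where $F_{21} = E_{21}\tilde A_{11}^{-1}$. The minimal sketch below checks both this identity and the bound, assuming NumPy, real data, and reduced coordinates. As an illustrative construction of our own, it sets $E_{22} = E_{21}\tilde A_{11}^{-1}E_{12}$ so that $\operatorname{rank}(\tilde A) = \operatorname{rank}(A)$; the function name is hypothetical.

```python
import numpy as np

def theorem_4_1_check(m=7, n=5, r=3, eps=0.3, seed=3):
    rng = np.random.default_rng(seed)
    A11 = rng.standard_normal((r, r)) + 3 * np.eye(r)
    A = np.zeros((m, n))
    A[:r, :r] = A11
    E11 = eps * rng.standard_normal((r, r))
    E12 = eps * rng.standard_normal((r, n - r))
    E21 = eps * rng.standard_normal((m - r, r))
    At11 = A11 + E11
    At = np.block([[At11, E12], [E21, E21 @ np.linalg.solve(At11, E12)]])

    def proj(M):
        """Orthogonal projection onto the span of the r leading left singular vectors."""
        U = np.linalg.svd(M)[0][:, :r]
        return U @ U.T

    diff = np.linalg.norm(proj(At) - proj(A), 2)
    s1 = np.linalg.norm(E21 @ np.linalg.inv(At11), 2)      # sigma_1(F21)
    exact = s1 / np.sqrt(1 + s1**2)
    # kappa-tilde * ||E21||_2 / ||A|| equals ||At11^{-1}||_2 * ||E21||_2.
    t = np.linalg.norm(np.linalg.inv(At11), 2) * np.linalg.norm(E21, 2)
    bound = t / np.sqrt(1 + t**2)                          # right-hand side of (4.1)
    return diff, exact, bound

print(theorem_4_1_check())
```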
In terms of projections, the bound (4.1) can be written in the form
$$\|\tilde P - P\|_2 \le \frac{\tilde\kappa\,\|P^\perp ER\|_2/\|A\|}{[1 + (\tilde\kappa\,\|P^\perp ER\|_2/\|A\|)^2]^{1/2}} < 1.$$
The bound is interesting in several ways. First, it is independent of $E_{12}$ and $E_{22}$. Second, its dependence on $E_{11}$ is only through the constant $\tilde\kappa$. Third, the bound is always less than unity. Finally, it goes to zero along with $E_{21}$.
If the hypotheses of Corollary 3.13 are satisfied (that is, when $\|A^\dagger\|_2\|E_{11}\|_2 < 1$), then we may replace $\tilde\kappa$ by $\kappa/\gamma$ in (4.1). Thus, $\kappa$ serves as a condition number for $P_A$.
Asymptotic forms may be obtained in the usual way from (4.2). Indeed,
$$\tilde P - P = \begin{pmatrix} O(\|E_{21}\|^2) & F_{21}^H + O(\|E_{21}\|^3)\\ F_{21} + O(\|E_{21}\|^3) & O(\|E_{21}\|^2)\end{pmatrix}.$$
In terms of projections
$$\tilde P = P + P^\perp ERA^\dagger + A^{\dagger H}RE^HP^\perp + O(\|P^\perp ER\|^2).$$
It follows that if $A(\tau)$ is differentiable and varies without changing rank, then $P(\tau)$ is differentiable and
$$\frac{dP}{d\tau} = P^\perp\frac{dA}{d\tau}RA^\dagger + A^{\dagger H}R\frac{dA^H}{d\tau}P^\perp. \tag{4.4}$$
Notes and References
Theorem 4.1 is due to Stewart [205, 1977]. The expression (4.4) for the
derivative is due to Golub and Pereyra [90, 1973].
5. The Linear Least Squares Problem
In this section we will derive perturbation bounds for the solution of the least squares problem of minimizing $\|b - Ax\|_2$ and bounds for the resulting residual vector. Although the solution of minimum norm is given by $x = A^\dagger b$, the perturbation theory of Section 3 again does not give the best possible results.
We shall assume throughout this section that $\tilde A$ is an acute perturbation of $A$, and we shall work with the reduced form of the problem. In this form $x$ is replaced by $V^Hx$ and $b$ is replaced by $U^Hb$. If $x$ and $b$ are partitioned in the forms
$$x = \begin{pmatrix} x_1\\ x_2\end{pmatrix}, \qquad b = \begin{pmatrix} b_1\\ b_2\end{pmatrix},$$
then
$$x_1 = A_{11}^{-1}b_1, \qquad x_2 = 0.$$
Moreover, the norm of the residual vector
$$r = b - Ax$$
is given by
$$\|r\|_2 = \|b_2\|_2.$$
In the theorems to follow we shall freely use the definitions made in the previous sections (e.g., $\tilde\kappa$, $\kappa$, and $\gamma$). As in Sections 3 and 4, the number $\tilde\kappa$ may be replaced by $\kappa/\gamma$ whenever $\|A^\dagger\|_2\|E_{11}\|_2 < 1$.
One additional piece of notation will be needed. By analogy with the case of linear systems, define
$$\eta = \frac{\|A\|\,\|x\|_2}{\|b_1\|_2}.$$
Since $b_1 = A_{11}x_1$, we have $\eta \ge 1$. Also $\|x\|_2 \le \|A^\dagger\|_2\|b_1\|_2$, which shows that $\eta \le \kappa$. When $A$ is ill conditioned, that is, when $A^\dagger$ is large, the vector $x$ may be either large or small. In the first case $\eta$ is near $\kappa$, and we shall say that $x$ reflects the ill-conditioning of $A$.
5.1. Perturbation of the Coefficients
We begin by bounding the effects of perturbations in $b$.
Theorem 5.1. Let $\tilde b = b + e$, $x = A^\dagger b$, and $\tilde x = A^\dagger\tilde b$. Then
$$\frac{\|\tilde x - x\|_2}{\|x\|_2} \le \frac{\kappa}{\eta}\,\frac{\|Pe\|_2}{\|Pb\|_2}. \tag{5.1}$$
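Proof. With the obvious partitioning of $e$ we have $\tilde x - x = \begin{pmatrix} A_{11}^{-1}e_1\\ 0\end{pmatrix}$, so that
$$\|\tilde x - x\|_2 \le \|A_{11}^{-1}\|_2\,\|e_1\|_2. \tag{5.2}$$
But $\|x\|_2 = \eta\|b_1\|_2/\|A\|$, which combined with (5.2) and the relations $\|A_{11}^{-1}\|_2 = \|A^\dagger\|_2$, $\|e_1\|_2 = \|Pe\|_2$, and $\|b_1\|_2 = \|Pb\|_2$ yields (5.1).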
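Every quantity in (5.1) can be computed from $A$, $b$, and $e$. The minimal sketch below assumes NumPy and a random full-rank test problem (not an example from the text). It evaluates $\kappa$, $\eta$, and both sides of (5.1) with $\|A\|$ taken to be the spectral norm, in which case the right-hand side reduces to $\|A^\dagger\|_2\|Pe\|_2/\|x\|_2$. The function name is ours.

```python
import numpy as np

def theorem_5_1_check(m=8, n=4, seed=4):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    e = 1e-3 * rng.standard_normal(m)
    Ap = np.linalg.pinv(A)
    x, xt = Ap @ b, Ap @ (b + e)
    P = A @ Ap                                           # orthogonal projection onto R(A)
    nA = np.linalg.norm(A, 2)
    kappa = nA * np.linalg.norm(Ap, 2)
    eta = nA * np.linalg.norm(x) / np.linalg.norm(P @ b)  # ||P b||_2 = ||b_1||_2
    lhs = np.linalg.norm(xt - x) / np.linalg.norm(x)
    rhs = (kappa / eta) * np.linalg.norm(P @ e) / np.linalg.norm(P @ b)
    return lhs, rhs, eta, kappa

lhs, rhs, eta, kappa = theorem_5_1_check()
print(f"{lhs:.2e} <= {rhs:.2e}, 1 <= eta = {eta:.2f} <= kappa = {kappa:.2f}")
```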
Theorem 5.1 shows that the perturbation in $x$ is determined by the projection of $e$ onto $\mathcal{R}(A)$. However, $Pe$ is normalized by $\|Pb\|_2$, and if this latter quantity is small, the perturbation may be large. Since
$$\|b\|_2^2 = \|Pb\|_2^2 + \|r\|_2^2,$$
this observation may be summarized by saying that large residuals are troublesome, a statement that will be amply confirmed later.
Since $\eta$ can be as large as $\kappa$, the number $\kappa$ cannot be taken as a condition number for perturbations in $b$ without further qualification.
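If $x$ does not reflect the ill-conditioning of $A$, then $\eta$ is near unity and $\kappa$ is a condition number. As $\eta$ grows, the solution becomes increasingly insensitive to perturbations in $b$.
We next turn to the effects of a perturbation in $A$ on $x$.
Theorem 5.2. Let $x = A^\dagger b$ and $\tilde x = \tilde A^\dagger b$, where $\tilde A = A + E$ is an acute perturbation of $A$. Then
$$\frac{\|\tilde x - x\|_2}{\|x\|_2} \le \tilde\kappa\,\frac{\|E_{11}\|_2}{\|A\|} + \Psi_2\!\left(\tilde\kappa\,\frac{E_{12}}{\|A\|}\right) + \tilde\kappa^2\,\frac{\|E_{21}\|_2}{\|A\|}\left(\frac{1}{\eta}\,\frac{\|b_2\|_2}{\|b_1\|_2} + \frac{\|E_{21}\|_2}{\|A\|}\right). \tag{5.3}$$
Proof. In the notation of the proof of Theorem 3.12,
$$\tilde x - x = J_{12}^\dagger(\tilde A_{11}^{-1} - A_{11}^{-1})b_1 + (J_{12}^\dagger - I_{12}^\dagger)A_{11}^{-1}b_1 + J_{12}^\dagger\tilde A_{11}^{-1}(J_{21}^\dagger - I_{21}^\dagger)b. \tag{5.4}$$
Then
$$\|J_{12}^\dagger(\tilde A_{11}^{-1} - A_{11}^{-1})b_1\|_2 \le \tilde\kappa\,\frac{\|E_{11}\|_2}{\|A\|}\,\|x\|_2, \tag{5.5}$$
and, by Lemma 3.11,
$$\|(J_{12}^\dagger - I_{12}^\dagger)A_{11}^{-1}b_1\|_2 \le \Psi_2\!\left(\tilde\kappa\,\frac{E_{12}}{\|A\|}\right)\|x\|_2. \tag{5.6}$$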
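For the third term in (5.4) we have
$$J_{12}^\dagger\tilde A_{11}^{-1}(J_{21}^\dagger - I_{21}^\dagger)b = J_{12}^\dagger\tilde A_{11}^{-1}[(I + F_{21}^HF_{21})^{-1} - I]b_1 + J_{12}^\dagger\tilde A_{11}^{-1}(I + F_{21}^HF_{21})^{-1}F_{21}^Hb_2. \tag{5.7}$$
To bound the first term in (5.7), note that
$$(I + F_{21}^HF_{21})^{-1} - I = -(I + F_{21}^HF_{21})^{-1}F_{21}^HF_{21}.$$
Hence
$$\begin{aligned}
\|J_{12}^\dagger\tilde A_{11}^{-1}[(I + F_{21}^HF_{21})^{-1} - I]b_1\|_2
&\le \|\tilde A_{11}^{-1}\|_2\,\|(I + F_{21}^HF_{21})^{-1}\|_2\,\|F_{21}^H\|_2\,\|F_{21}b_1\|_2\\
&\le \|\tilde A_{11}^{-1}\|_2^2\,\|E_{21}\|_2\,\|E_{21}\tilde A_{11}^{-1}b_1\|_2\\
&\le \|\tilde A_{11}^{-1}\|_2^2\,\|E_{21}\|_2^2\,\|x\|_2\\
&= \left[\tilde\kappa\,\frac{\|E_{21}\|_2}{\|A\|}\right]^2\|x\|_2.
\end{aligned}\tag{5.8}$$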
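For the second term in (5.7) we have
$$\begin{aligned}
\|J_{12}^\dagger\tilde A_{11}^{-1}(I + F_{21}^HF_{21})^{-1}F_{21}^Hb_2\|_2
&\le \|\tilde A_{11}^{-1}\|_2^2\,\|E_{21}\|_2\,\|b_2\|_2\\
&= \|\tilde A_{11}^{-1}\|_2^2\,\|E_{21}\|_2\,\frac{\|b_2\|_2}{\|b_1\|_2}\,\eta^{-1}\|x\|_2\|A\|\\
&= \eta^{-1}\tilde\kappa^2\,\frac{\|E_{21}\|_2}{\|A\|}\,\frac{\|b_2\|_2}{\|b_1\|_2}\,\|x\|_2.
\end{aligned}\tag{5.9}$$
The bound (5.3) follows on combining (5.4)-(5.9).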
The first two terms in (5.3) are unexceptionable. The first term corresponds to the classical result for linear systems and is the only nonzero term when $A$ is square and nonsingular. The second term depends on $PER^\perp$ and vanishes when $A$ is of full column rank, as it is in many applications.
The third term requires more explanation. If terms of second order in $\|E_{21}\|$ are ignored, it is essentially
$$\frac{\tilde\kappa^2}{\eta}\,\frac{\|b_2\|_2}{\|b_1\|_2}\,\frac{\|E_{21}\|_2}{\|A\|} = \frac{\tilde\kappa^2\tan\theta}{\eta}\,\frac{\|E_{21}\|_2}{\|A\|},$$
where $\theta$ is the angle subtended by $b$ and $\mathcal{R}(A)$. The number $\eta^{-1}\tan\theta$ can vary from $0$ to $\infty$. It is small when $\theta$ is small (i.e., when the residual vector is small). It is also reduced in size when $\|E_{11}\|_2$ is small and $x$ reflects the ill-conditioning of $A$, so that $\eta \approx \tilde\kappa$. When $x$ does not reflect the ill-conditioning of $A$ and $\theta$ is significant, the third term is of order $\tilde\kappa^2$, thus making it the dominant term in (5.3).
We have bounded the third term in the decomposition (5.4) in such a way as to reflect its behavior when $E_{21}$ is small. In fact it is bounded for all values of $E_{21}$, and the third term in (5.3) may be replaced by
$$\frac{\tilde\kappa}{\eta}\,\frac{\|b\|_2}{\|b_1\|_2}\,\Psi_2\!\left(\tilde\kappa\,\frac{E_{21}}{\|A\|}\right).$$
For the full rank case there is a structured perturbation theorem for
least squares solutions.
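Theorem 5.3 (Björck). Let $A$ be of full column rank and let $x = A^\dagger b$. Let $S$ and $s$ be nonnegative, and assume that
$$|E| \le \epsilon S \quad\text{and}\quad |e| \le \epsilon s. \tag{5.10}$$
If $\tilde x$ is the solution of the least squares problem of minimizing $\|(b + e) - (A + E)x\|_2$ and $\tilde r = (b + e) - (A + E)\tilde x$ is the corresponding residual (n.b., $\tilde A$ is not assumed to be an acute perturbation of $A$), then for any absolute norm $\|\cdot\|$
$$\|\tilde x - x\| \le \epsilon\left[\,\bigl\|\,|A^\dagger|(s + S|\tilde x|)\bigr\| + \bigl\|\,|(A^HA)^{-1}|\,S^T|\tilde r|\,\bigr\|\,\right]. \tag{5.11}$$
Proof. By Theorem 1.5, we have
$$\begin{pmatrix} I & A + E\\ (A + E)^H & 0\end{pmatrix}\begin{pmatrix}\tilde r\\ \tilde x\end{pmatrix} = \begin{pmatrix} b + e\\ 0\end{pmatrix}. \tag{5.12}$$
Hence
$$\begin{pmatrix} I & A\\ A^H & 0\end{pmatrix}\begin{pmatrix}\tilde r\\ \tilde x\end{pmatrix} = \begin{pmatrix} b + e - E\tilde x\\ -E^H\tilde r\end{pmatrix}.$$
Since
$$\begin{pmatrix} I & A\\ A^H & 0\end{pmatrix}^{-1} = \begin{pmatrix} I - AA^\dagger & (A^\dagger)^H\\ A^\dagger & -(A^HA)^{-1}\end{pmatrix},$$
it follows that
$$\tilde x - A^\dagger b = A^\dagger e - A^\dagger E\tilde x + (A^HA)^{-1}E^H\tilde r.$$
Since $x = A^\dagger b$, we have on taking absolute values and using (5.10)
$$|\tilde x - x| \le \epsilon\left[|A^\dagger|(s + S|\tilde x|) + |(A^HA)^{-1}|\,S^T|\tilde r|\right]. \tag{5.13}$$
The inequality (5.11) now follows on taking norms in (5.13).
Remark 5.4. The proof of the theorem works if the matrix $(A + E)^H$ in (5.12) is replaced by $(A + F)^H$, where $|F| \le \epsilon S$.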
As it stands, the theorem is unsatisfactory for applications in which $\tilde r$ is not known explicitly, for in that case it is impossible to compute $|\tilde r|$. However, if we compute $\hat r = (b + e) - A\tilde x$, then $\tilde r = \hat r - E\tilde x$. It follows that
$$|\tilde r| \le |\hat r| + \epsilon S|\tilde x|,$$
and we may use this upper bound in place of $|\tilde r|$ in (5.11). Note that this adjustment is only of order $\epsilon^2$ and will usually be negligible.
5.2. The Residual
Since the residual vector is given by $r = P^\perp b$, the theory of Section 4 may be applied to give perturbation bounds for the residual. Specifically, if
$$\tilde x = \tilde A^\dagger b$$
and
$$\tilde r = b - \tilde A\tilde x = \tilde P^\perp b,$$
then
$$\|\tilde r - r\|_2 \le \|\tilde P - P\|_2\,\|b\|_2,$$
and $\|\tilde P - P\|_2$ can be bounded by (4.1).
5.3. Backward Perturbations
The problem of backward perturbations for least squares problems is far more difficult than the corresponding problem for linear systems. To see why, let $\tilde x$ be a purported solution of the problem of minimizing $\|b - Ax\|_2$, and let $\tilde r = b - A\tilde x$. What we would like is to find a perturbation $E$ such that $\tilde x$ is an exact solution of the problem of minimizing $\|b - (A + E)x\|_2$. For linear systems all we need do is produce an $E$ for which the residual $b - (A + E)\tilde x$ is zero. For the least squares problem, however, we must choose $E$ so that the residual is orthogonal to the column space of $A + E$; i.e., so that $(A + E)^H[b - (A + E)\tilde x] = 0$. Since the residual is defined in terms of $E$, this equation is nonlinear in $E$, and at present we know only special solutions.
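Theorem 5.5. Let $\tilde x$ be given. Let $x = A^\dagger b$, $r = b - Ax$, and $\tilde r = b - A\tilde x$. Then $\tilde x = (A + E_i)^\dagger b$ $(i = 1, 2)$ for
$$E_1 = -\frac{\tilde r\tilde r^HA}{\|\tilde r\|_2^2},$$
in which case
$$\|E_1\|_2 = \frac{\|A^H\tilde r\|_2}{\|\tilde r\|_2},$$
and for
$$E_2 = \frac{(\tilde r - r)\tilde x^H}{\|\tilde x\|_2^2},$$
in which case
$$\|E_2\|_2 = \frac{\|\tilde r - r\|_2}{\|\tilde x\|_2} = \frac{(\|\tilde r\|_2^2 - \|r\|_2^2)^{1/2}}{\|\tilde x\|_2}. \tag{5.14}$$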
Proof. The proof for $E_1$ is a straightforward, if tedious, verification that $(A + E_1)^H[b - (A + E_1)\tilde x] = 0$.
For $E_2$, note that $\tilde r - r = A(x - \tilde x) \in \mathcal{R}(A)$. Hence $\mathcal{R}(A + E_2) \subset \mathcal{R}(A)$. But it is easy to verify that $b - (A + E_2)\tilde x = r$. Hence
$$b - (A + E_2)\tilde x \in \mathcal{R}(A)^\perp \subset \mathcal{R}(A + E_2)^\perp,$$
which is sufficient for $\tilde x$ to be a solution of the perturbed least squares problem.
The first equality in (5.14) follows on taking norms. The second follows from the Pythagorean equality and the observation that since $\tilde r - r \in \mathcal{R}(A)$, we have $r \perp \tilde r - r$.
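Both backward perturbations of Theorem 5.5 are rank-one matrices and cheap to form. The minimal sketch below assumes NumPy and real data (so $^H$ becomes $^T$). It builds $E_1$ and $E_2$ for a deliberately inaccurate $\tilde x$, then checks that $\tilde x$ satisfies the normal equations of each perturbed problem and that the norms agree with the formulas in the theorem. The function name is ours.

```python
import numpy as np

def backward_perturbations(A, b, xt):
    """E1 and E2 of Theorem 5.5 for a purported solution xt (real data)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    r, rt = b - A @ x, b - A @ xt
    E1 = -np.outer(rt, rt @ A) / (rt @ rt)        # -rt rt^T A / ||rt||^2
    E2 = np.outer(rt - r, xt) / (xt @ xt)         # (rt - r) xt^T / ||xt||^2
    return E1, E2, r, rt

rng = np.random.default_rng(5)
A, b = rng.standard_normal((8, 3)), rng.standard_normal(8)
xt = np.linalg.lstsq(A, b, rcond=None)[0] + 1e-3 * rng.standard_normal(3)
E1, E2, r, rt = backward_perturbations(A, b, xt)
for E in (E1, E2):
    # Normal equations of the perturbed problem: (A+E)^T (b - (A+E) xt) = 0.
    print(np.linalg.norm((A + E).T @ (b - (A + E) @ xt)))
```

Only $E_1$ is computable in practice, since $E_2$ needs the true residual $r$; the sketch forms $r$ with `np.linalg.lstsq` purely for the sake of the check.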
The perturbation $E_1$ and its norm can be computed if we are given $\tilde x$. It is small when the residual $\tilde r$ is nearly orthogonal to the column space of $A$. The perturbation $E_2$ cannot be computed, since it involves the true residual $r$, which is not known. However, it has the theoretical consequence that there is little use hunting for the exact minimizing $x$. Provided the residual is nearly minimal, the approximate solution $\tilde x$, however inaccurate, is the exact solution of a slightly perturbed problem.
The matrix $E_2$ is only one of a class of backward perturbations. For example, if it is desirable that some of the columns of $A$ not be altered by the perturbation, we can proceed as follows. Let $\hat x$ be the vector obtained from $\tilde x$ by setting to zero the components corresponding to the columns that are not to be disturbed. Then
$$\hat E = \frac{(\tilde r - r)\hat x^H}{\|\hat x\|_2^2}$$
is the required matrix. Of course $\|\hat x\|_2 \le \|\tilde x\|_2$, so that $\|\hat E\|_2 \ge \|E_2\|_2$; however, $\|\hat E\|_2$ may still be small enough for practical purposes.
The attempt to formulate a structured backward perturbation theorem for the least squares problem leads to an intractable optimization problem. However, we may apply the Oettli-Prager theorem (Theorem 2.17) to the expanded equations to get the following useful result. The proof is left as an exercise.
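Theorem 5.6 (Björck). Let $S \in \mathbb{R}^{m\times n}$ and $s \in \mathbb{R}^m$ be nonnegative. For $b, \tilde r \in \mathbb{C}^m$ and $\tilde x \in \mathbb{C}^n$ let
$$\epsilon = \max\left\{\max_i\frac{|\tilde r + A\tilde x - b|_i}{(S|\tilde x| + s)_i},\ \max_j\frac{|A^H\tilde r|_j}{(S^T|\tilde r|)_j}\right\}$$
(here $0/0 = 0$ and otherwise $\rho/0 = \infty$). If $\epsilon \ne \infty$, then there are matrices $E$ and $F$ satisfying $|E|, |F| \le \epsilon S$ and a vector $e$ satisfying $|e| \le \epsilon s$ such that
$$\begin{pmatrix} I & A + E\\ (A + F)^H & 0\end{pmatrix}\begin{pmatrix}\tilde r\\ \tilde x\end{pmatrix} = \begin{pmatrix} b + e\\ 0\end{pmatrix}.$$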
This theorem does not say that $\tilde r$ and $\tilde x$ are solutions of a slightly perturbed least squares problem, since the perturbation $E$ in $A$ is different from the perturbation $F^H$ in $A^H$. Nonetheless, by Remark 5.4, the common bound on $E$ and $F$ can be used in Theorem 5.3 to assess the accuracy of $\tilde x$.
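The quantity $\epsilon$ of Theorem 5.6 is a componentwise backward error in the style of Oettli and Prager, and the bounds can be met constructively by spreading each residual component over its row (or column) in proportion to $S$. The minimal sketch below assumes NumPy and real data. It computes $\epsilon$ together with one admissible choice of $E$, $e$, and $F$; this particular construction is our own illustration of how the bounds can be attained, not necessarily the one intended in the text, and the function names are hypothetical.

```python
import numpy as np

def componentwise_ratio(num, den):
    """|num|/den elementwise, with 0/0 = 0 and p/0 = inf for p != 0."""
    out = np.zeros(num.shape)
    pos = den > 0
    out[pos] = np.abs(num[pos]) / den[pos]
    out[~pos & (num != 0)] = np.inf
    return out

def bjorck_backward(A, b, xt, rt, S, s):
    """epsilon of Theorem 5.6 and one choice of E, e, F (real data)."""
    rho = b - rt - A @ xt          # must equal E xt - e
    tau = -A.T @ rt                # must equal F^T rt
    d = S @ np.abs(xt) + s
    c = S.T @ np.abs(rt)
    eps = max(componentwise_ratio(rho, d).max(), componentwise_ratio(tau, c).max())
    g = np.divide(rho, d, out=np.zeros_like(rho), where=d > 0)
    h = np.divide(tau, c, out=np.zeros_like(tau), where=c > 0)
    E = g[:, None] * S * np.sign(xt)[None, :]     # |E| <= eps S
    e = -g * s                                    # |e| <= eps s
    F = S * np.sign(rt)[:, None] * h[None, :]     # |F| <= eps S
    return eps, E, e, F

rng = np.random.default_rng(6)
A, b = rng.standard_normal((6, 3)), rng.standard_normal(6)
xt = np.linalg.lstsq(A, b, rcond=None)[0] + 1e-6 * rng.standard_normal(3)
rt = b - A @ xt + 1e-6 * rng.standard_normal(6)
eps, E, e, F = bjorck_backward(A, b, xt, rt, np.abs(A), np.abs(b))
print(eps)
```

Choosing $S = |A|$ and $s = |b|$, as in the usage lines, gives relative componentwise perturbations; other nonnegative weights are equally admissible.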
5.4. Asymptotic Forms and Derivatives
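An asymptotic form of the perturbed least squares solution $\tilde x$ can be obtained from (3.15):
$$\tilde x = x - A^\dagger PERx + R^\perp E^HP(A^H)^\dagger x + (A^HA)^\dagger RE^HP^\perp b + O(\|E\|^2).$$
The corresponding derivative formula is
$$\frac{dx}{d\tau} = -A^\dagger P\frac{dA}{d\tau}Rx + R^\perp\frac{dA^H}{d\tau}P(A^H)^\dagger x + (A^HA)^\dagger R\frac{dA^H}{d\tau}P^\perp b.$$
In reduced form
$$\begin{pmatrix}\tilde x_1\\ \tilde x_2\end{pmatrix} = \begin{pmatrix} x_1\\ x_2\end{pmatrix} + \begin{pmatrix} -A_{11}^{-1}E_{11}x_1 + (A_{11}^HA_{11})^{-1}E_{21}^Hb_2\\ E_{12}^HA_{11}^{-H}x_1\end{pmatrix} + O(\|E\|^2).$$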
Notes and References
The first perturbation analysis of the least squares problem is due to Golub and Wilkinson [94, 1966], who gave first-order bounds. They were the first to note the dependence of the solution on $\kappa^2$. Rigorous upper bounds were derived by Hanson and Lawson [102, 1969], Pereyra [180, 1969], Stewart [199, 1969], and Wedin [258, 1969]. More recent treatments have been given by Lawson and Hanson [142, 1974] and Abdelmalek [1, 1974]. Van der Sluis [245, 1975] gives an especially detailed treatment. He was the first to point out the mitigating effect of $\eta$ in (5.3).
Strictly speaking, $\kappa$ is not a condition number for the least squares problem, at least not in the simple sense in which we have been using the term. Nonetheless, it is called the condition number of $A$ everywhere in the literature.
Statisticians are concerned with the effects of errors in the matrix $A$, a problem they treat under the names "errors in the variables" or "measurement error models." One approach to the problem is to pose a probabilistic model of the error and investigate its effects [116, 49, 80]. Another approach is to compute "regression diagnostics" to tell when the error is having harmful effects [22, 213]. Yet another approach is given in Exercise 5.8.
The structured perturbation theorem (Theorem 5.3) is due to Björck [36, 1989], who also noted that it remains valid when the perturbations of $A$ in the augmented equations are different (Remark 5.4). Arioli, Duff, and de Rijk [6, 1989] have used this fact to analyze the errors in algorithms based on the expanded equations (1.7).
Theorem 5.5 is due to Stewart [206, 1977], and Theorem 5.6 to Björck [36, 1989]. The problem of obtaining optimal backward perturbation bounds, structured or not, is unsolved. See [112] for further details.
Exercises
1. Show that $\kappa_2(A^HA) = \kappa_2^2(A)$.
THE FOLLOWING EXERCISES USE FIRST ORDER PERTURBATION THEORY TO EXPLORE THE SENSITIVITY OF LEAST SQUARES SOLUTIONS TO ERRORS IN INDIVIDUAL COLUMNS.
2. Let $A$ be of full column rank and $x = A^\dagger b$. Let $E = ee_i^T$. Show that
$$\tilde x - x = -\xi_iA^\dagger e + c_i^{(-1)}e^Hr + O(\|e\|^2),$$
where $\xi_i$ is the $i$th component of $x$, $c_i^{(-1)}$ is the $i$th column of $C^{-1}$, the inverse of the cross-product matrix $C = A^HA$, and $r = b - Ax$ is the residual vector.
3. Show that
$$\|\tilde x - x\|_2 \le \|e\|_2\sqrt{(|\xi_i|\,\|A^\dagger\|_2)^2 + (\|c_i^{(-1)}\|_2\,\|r\|_2)^2} + O(\|e\|^2).$$
4. Show that
$$|\tilde\xi_j - \xi_j| \le \|e\|_2\sqrt{(|\xi_i|\,\|a_j^{(\dagger)}\|_2)^2 + (|\gamma_{ij}^{(-1)}|\,\|r\|_2)^2} + O(\|e\|^2),$$
where $a_j^{(\dagger)}$ is the $j$th row of $A^\dagger$ and $\gamma_{ij}^{(-1)}$ is the $(i,j)$-element of $C^{-1}$.
5. (A quick and dirty bound.) Let $A$ and $\tilde A$ have full rank. Starting from the normal equations
$$A^HAx = A^Hb,$$
use the perturbation theory for linear systems to show that for any consistent norm $\|\cdot\|$
IIxlllIxll :S K,(A) \\:: (1 + I,':'D + K,2(A) \\:: ( 1 + \\D .
6. (Higham and Stewart [114, 1987].) Let $A$ be of full column rank and let $C = A^HA$. Show that if $F$ is sufficiently small then $C + F$ can be written in the form
$$C + F = (A + E)^H(A + E),$$
where
$$\|E\|_F \lesssim \tfrac{1}{2}\,\|A^\dagger\|_2\,\|F\|_F.$$
Show that this bound is realistic.
7. The vector $\tilde r$ in Theorem 5.6 can be regarded as an arbitrary parameter. Write down the bounds obtained when $\tilde r = b - A\tilde x$ and $\tilde r = 0$. [Note: the problem of determining the optimal $\tilde r$ is open.]
8. (Stewart [217, 1989]). Let A have full rank. Show that
x = At(1i + Ex) + 0(11E1I2).
Give an expression and a bound for the $O(\|E\|^2)$ term. [Note: To the statistician, this result says that if $E$ is small enough, the least squares solution $x$ behaves as if it came from an unperturbed problem in which the error in the right-hand side has been inflated.]
Chapter IV
The Perturbation of Eigenvalues
Of all the problems in matrix perturbation theory, the perturbation of the eigenvalues of a matrix presents the most varied technical difficulties. The problem itself is simply stated: Given a matrix $A \in \mathbb{C}^{n\times n}$ and a perturbation $E$ of $A$, how are the spectra $\mathscr{L}(A)$ and $\mathscr{L}(A + E)$ related? But this simplicity is elusive. In the first place, the term "related" has more than one natural sense, as we shall see in the first section of this chapter. More important, different classes of matrices, even matrices having the same eigenvalues, behave differently under perturbation.
Example 1. Let $A_0 = 0$. Then the eigenvalues of $A_0$ are all zero. Let $E$ be a perturbation of $A_0$. Since $A_0 + E = E$, it follows from Theorem II.2.6 that
$$\lambda \in \mathscr{L}(A_0 + E) \implies |\lambda| \le \|E\|$$
for any consistent matrix norm $\|\cdot\|$. On the other hand, let
$$A_1 = \begin{pmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0\end{pmatrix}.$$
Then the eigenvalues of $A_1$ are also zero. However, if $\epsilon > 0$, the eigenvalues of
$$\begin{pmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ \epsilon & 0 & 0 & 0\end{pmatrix}$$
are $\epsilon^{1/4}\omega_i$, where the $\omega_i$ are the fourth roots of unity (i.e., $1, i, -1, -i$). Thus while a perturbation of order, say, $10^{-8}$ will induce a perturbation of order only $10^{-8}$ in the eigenvalues of $A_0$, it can induce a perturbation as large as $10^{-2}$ in the eigenvalues of $A_1$.
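Example 1 is easy to reproduce. The minimal sketch below, assuming NumPy, perturbs the zero matrix by a random matrix of spectral norm $10^{-8}$ and perturbs the $(4,1)$ element of the $4\times 4$ nilpotent Jordan block by $10^{-8}$. The first perturbation moves the eigenvalues by at most $10^{-8}$, while the second moves them to modulus $(10^{-8})^{1/4} = 10^{-2}$.

```python
import numpy as np

eps = 1e-8
rng = np.random.default_rng(0)

# A0 = 0: the eigenvalues of A0 + E are bounded by ||E||_2 = eps.
E = rng.standard_normal((4, 4))
E *= eps / np.linalg.norm(E, 2)
lam0 = np.linalg.eigvals(E)

# A1 = nilpotent Jordan block; perturb its (4,1) element by eps.
A1 = np.diag(np.ones(3), k=1)
A1_pert = A1.copy()
A1_pert[3, 0] = eps
lam1 = np.linalg.eigvals(A1_pert)      # the roots of lambda^4 = eps

print(np.abs(lam0).max())              # at most 1e-8
print(np.abs(lam1))                    # all approximately 1e-2
```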
This example shows that a general perturbation theory for eigenvalues has to be pessimistic, since it must account for the ill-conditioned behavior of the eigenvalues of matrices like $A_1$. The cure for this problem is to develop individual perturbation theories for different classes of matrices, which is what we will do in this chapter. We begin in Section 1 with the general case. In Section 2 we introduce the very useful Gerschgorin theorem and use it to compute the derivative of a simple eigenvalue. In Section 3 we treat normal and diagonalizable matrices, and in Section 4 Hermitian matrices. The chapter concludes with a section on special topics.
As in the last chapter we will use the tilde conventions to denote perturbations. Specifically, $A$ will denote a (complex) matrix of order $n$, and $\tilde A = A + E$ will denote a perturbation of $A$. The eigenvalues of $A$ are $\mathscr{L}(A) = \{\lambda_1, \ldots, \lambda_n\}$ and those of $\tilde A$ are $\mathscr{L}(\tilde A) = \{\tilde\lambda_1, \ldots, \tilde\lambda_n\}$. As usual, the characteristic polynomials of $A$ and $\tilde A$ will be written $\phi_A(\lambda)$ and $\phi_{\tilde A}(\lambda)$.
1. General Perturbation Theorems
1.1. Continuity: Ostrowski-Elsner Theorems
The first thing that we will establish about eigenvalues is that they are continuous, which follows from a fact and a theorem. The fact is that the characteristic polynomial of a matrix, being itself a polynomial in the elements of the matrix, is a continuous function of the matrix. The theorem is Rouché's theorem. In the form we will use here, it states that if $\phi$ and $\eta$ are analytic in a simply connected region $\Omega$ and $\mathcal{D} \subset \Omega$ is a disk for which
$$|\eta(\zeta)| < |\phi(\zeta)|, \qquad \zeta \in \partial\mathcal{D},$$
where $\partial\mathcal{D}$ is the boundary of $\mathcal{D}$, then $\phi(\zeta)$ and $\phi(\zeta) + \eta(\zeta)$ have the same number of zeros in $\mathcal{D}$.
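Theorem 1.1. Let $\lambda$ be an eigenvalue of $A$ of algebraic multiplicity $m$. Then for any norm $\|\cdot\|$ and all sufficiently small $\epsilon > 0$ there is a $\delta > 0$ such that if $\|E\| < \delta$, the disk $\mathcal{D}(\lambda, \epsilon) = \{\zeta \in \mathbb{C} : |\zeta - \lambda| < \epsilon\}$ contains exactly $m$ eigenvalues of $\tilde A$.
Proof. Let $\epsilon$ be so small that $\lambda$ is the only eigenvalue of $A$ in the closed disk $\{\zeta : |\zeta - \lambda| \le \epsilon\}$, and write $\mathcal{D} = \mathcal{D}(\lambda, \epsilon)$. Let $\eta(\zeta) = \phi_{\tilde A}(\zeta) - \phi_A(\zeta)$. By the continuity of the characteristic polynomial, as $\tilde A \to A$ the function $\eta(\zeta)$ converges to zero uniformly on the compact set $\partial\mathcal{D}$. Since $\phi_A(\zeta)$ is nonzero on $\partial\mathcal{D}$, there is a $\delta > 0$ such that $|\eta(\zeta)| < |\phi_A(\zeta)|$ on $\partial\mathcal{D}$ whenever $\|E\| < \delta$. By Rouché's theorem $\phi_A$ and $\phi_{\tilde A} = \phi_A + \eta$ have the same number of zeros in $\mathcal{D}$.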
Theorem 1.1 is an example of a qualitative perturbation theorem: it states that the eigenvalues change only a little under a sufficiently small perturbation, without providing a bound on the size of the change. We now turn to a theorem of Elsner, which provides explicit bounds. However, we first need to introduce some notation to describe how the eigenvalues of two matrices are situated with respect to one another.
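Definition 1.2. Let $A$ have eigenvalues $\lambda_1, \ldots, \lambda_n$ and $\tilde A$ have eigenvalues $\tilde\lambda_1, \ldots, \tilde\lambda_n$. Then the SPECTRAL VARIATION OF $\tilde A$ WITH RESPECT TO $A$ is
$$\operatorname{sv}_A(\tilde A) \overset{\text{def}}{=} \max_i\min_j|\tilde\lambda_i - \lambda_j|. \tag{1.1}$$
The HAUSDORFF DISTANCE between the eigenvalues of $A$ and $\tilde A$ is
$$\operatorname{hd}(A, \tilde A) \overset{\text{def}}{=} \max\{\operatorname{sv}_A(\tilde A), \operatorname{sv}_{\tilde A}(A)\}. \tag{1.2}$$
The MATCHING DISTANCE between the eigenvalues of $A$ and $\tilde A$ is
$$\operatorname{md}(A, \tilde A) \overset{\text{def}}{=} \min_\pi\left\{\max_i|\tilde\lambda_{\pi(i)} - \lambda_i|\right\}, \tag{1.3}$$
where $\pi$ is taken over all permutations of $\{1, 2, \ldots, n\}$.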
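For small $n$, the three quantities of Definition 1.2 can be computed directly from two lists of eigenvalues. The minimal sketch below is plain Python; the function names are ours. It evaluates the matching distance by brute force over all $n!$ permutations, which is practical only for tiny $n$ (larger problems would call for a bottleneck-assignment algorithm instead). The usage line takes the example given next in the text, $n = 2$ with $\lambda_1 = \tilde\lambda_1 = \tilde\lambda_2 = 0$ and $\lambda_2 = 1$.

```python
import itertools

def sv(lam, lam_t):
    """Spectral variation sv_A(At) of (1.1); lam, lam_t are eigenvalue lists of A, At."""
    return max(min(abs(mu - l) for l in lam) for mu in lam_t)

def hd(lam, lam_t):
    """Hausdorff distance (1.2)."""
    return max(sv(lam, lam_t), sv(lam_t, lam))

def md(lam, lam_t):
    """Matching distance (1.3), by brute force over all permutations (small n only)."""
    n = len(lam)
    return min(max(abs(lam_t[p[i]] - lam[i]) for i in range(n))
               for p in itertools.permutations(range(n)))

lam, lam_t = [0, 1], [0, 0]      # lambda_1 = 0, lambda_2 = 1; both eigenvalues of At are 0
print(sv(lam, lam_t), hd(lam, lam_t), md(lam, lam_t))   # 0 1 1
```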
The function $\operatorname{sv}_A(\tilde A)$ is not a metric: it may be zero even when the eigenvalues of $A$ and $\tilde A$ are different (e.g., when $n = 2$ and $\lambda_1 = \tilde\lambda_1 = \tilde\lambda_2 = 0$ while $\lambda_2 = 1$). Geometrically, the function sv has the following interpretation. If
$$\mathcal{D}_i = \{\zeta : |\zeta - \lambda_i| \le \operatorname{sv}_A(\tilde A)\}, \qquad i = 1, \ldots, n,$$
then
$$\mathscr{L}(\tilde A) \subset \bigcup_{i=1}^n\mathcal{D}_i.$$
In other words, the eigenvalues of $\tilde A$ lie in the union of disks of radius $\operatorname{sv}_A(\tilde A)$ centered at the eigenvalues of $A$.
The Hausdorff distance bounds the spectral variation and is actually
a metric. The matching distance bounds the Hausdorff distance and
is also a metric. To say that the matching distance is small is one of
the nicest things that can be said of the eigenvalues of a matrix and its
perturbation. It means that they can be grouped into nearby pairs.
We are now in a position to bound the Hausdorff distance between
two matrices.
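Theorem 1.3 (Elsner). For any $A$ and $\tilde A$,
$$\operatorname{hd}(A, \tilde A) \le (\|A\|_2 + \|\tilde A\|_2)^{1 - \frac{1}{n}}\,\|E\|_2^{\frac{1}{n}}. \tag{1.4}$$
Proof. Since the right-hand side of (1.4) is symmetric in $A$ and $\tilde A$, it is sufficient to prove that it bounds $\operatorname{sv}_A(\tilde A)$. Assume the maximum in (1.1) is attained for the eigenvalue $\tilde\lambda$ of $\tilde A$, and let $x_1, \ldots, x_n$ be orthonormal vectors with $\tilde Ax_1 = \tilde\lambda x_1$. Then
$$\begin{aligned}
[\operatorname{sv}_A(\tilde A)]^n &\le \prod_i|\lambda_i - \tilde\lambda|\\
&= |\det(A - \tilde\lambda I)|\\
&\le \prod_i\|(A - \tilde\lambda I)x_i\|_2 \qquad [\text{Hadamard inequality (I.2.2)}]\\
&= \|(A - \tilde A)x_1\|_2\prod_{i>1}\|(A - \tilde\lambda I)x_i\|_2\\
&\le \|E\|_2\,(\|A\|_2 + \|\tilde A\|_2)^{n-1}.
\end{aligned}$$
The result follows on taking $n$th roots in the above inequality and from the symmetry of the resulting bound.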
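Elsner's bound (1.4) can be compared with the actual Hausdorff distance on examples. The minimal sketch below, assuming NumPy, does this for a random $5\times 5$ matrix and for the perturbed Jordan block of Example 1. In the second case the Hausdorff distance is $\epsilon^{1/4}$, so the exponent $1/n$ in (1.4) cannot be improved in general. The function names are ours.

```python
import numpy as np

def hausdorff_eig(A, At):
    """hd(A, At) of (1.2), computed from the eigenvalues."""
    lam, lam_t = np.linalg.eigvals(A), np.linalg.eigvals(At)
    D = np.abs(lam_t[:, None] - lam[None, :])      # D[i, j] = |lam_t_i - lam_j|
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def elsner_bound(A, At):
    """Right-hand side of (1.4) in the spectral norm."""
    n = A.shape[0]
    nE = np.linalg.norm(At - A, 2)
    return (np.linalg.norm(A, 2) + np.linalg.norm(At, 2)) ** (1 - 1 / n) * nE ** (1 / n)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
At = A + 1e-6 * rng.standard_normal((5, 5))
print(hausdorff_eig(A, At), elsner_bound(A, At))

J = np.diag(np.ones(3), k=1)                       # the Jordan block of Example 1, n = 4
Jt = J.copy()
Jt[3, 0] = 1e-8
print(hausdorff_eig(J, Jt), elsner_bound(J, Jt))   # about 1e-2 and 2**0.75 * 1e-2
```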
As we have mentioned above, the most desirable bound is one on the matching distance. In some cases bounds on the spectral variation or the Hausdorff distance can be converted into such a bound. Since the technique, with appropriate variations, can be applied to other problems, we will develop it informally and then summarize the results.
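Let us begin by relaxing our bound a little and writing
\[ \mathrm{sv}_A(\tilde A) \le \mu\,\|E\|_2^{1/n} \equiv \epsilon, \]
where
\[ \mu = \bigl(\max\{2\|A + \tau E\|_2 : 0 \le \tau \le 1\}\bigr)^{1-1/n}. \]
As above, set
\[ \mathcal{D}_i = \{\zeta : |\zeta - \lambda_i| \le \epsilon\}, \qquad i = 1, \ldots, n. \]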
The purpose of this adjustment is to make the bound nondecreasing in $\tau$ when $E$ is replaced by $\tau E$.
We claim that
if any $m$ of the disks $\mathcal{D}_i$ are isolated from the others, then their union contains exactly $m$ eigenvalues of $\tilde A$.
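To see this, assume without loss of generality that the $m$ disks isolated from the others are $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_m$. For $0 \le \tau \le 1$, let $A_\tau = \tau\tilde A + (1 - \tau)A = A + \tau E$, and let
\[ \mathcal{D}_i^{(\tau)} = \{\zeta : |\zeta - \lambda_i| \le \mu\|\tau E\|_2^{1/n}\}. \]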
Since
\[ (\|A\|_2 + \|A_\tau\|_2)^{1-1/n} \le \mu, \]
by Theorem 1.3 we have $\mathrm{sv}_A(A_\tau) \le \mu\|\tau E\|_2^{1/n} = \tau^{1/n}\epsilon$, and the eigenvalues of $A_\tau$ lie in the union of the disks $\mathcal{D}_i^{(\tau)}$.
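Now $\bigcup_{i=1}^{m} \mathcal{D}_i^{(0)}$ contains exactly $m$ eigenvalues of $A_0 = A$, namely $\lambda_1(A), \lambda_2(A), \ldots, \lambda_m(A)$. Since $\tau^{1/n}\epsilon$ is an increasing function of $\tau$ and never exceeds $\epsilon$, each disk $\mathcal{D}_i^{(\tau)}$ is contained in $\mathcal{D}_i$; hence as $\tau$ varies from zero to one the region $\bigcup_{i=1}^{m} \mathcal{D}_i^{(\tau)}$ remains disjoint from the other disks. Since by Theorem 1.1 the eigenvalues of $A_\tau$ are continuous in $\tau$, they cannot jump from one disjoint region to another. Hence $\bigcup_{i=1}^{m} \mathcal{D}_i^{(1)}$ must contain exactly $m$ eigenvalues of $A_1 = \tilde A$.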
It is now easy to obtain a bound on $\mathrm{md}(A, \tilde A)$. Let $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_k$ be the connected components of $\bigcup_{i=1}^{n} \mathcal{D}_i$. If $\mathcal{C}_l$ is the union of $m_l$ of the disks $\mathcal{D}_i$, then it contains exactly $m_l$ eigenvalues of $A$ and $m_l$ eigenvalues of $\tilde A$. Choose the permutation $\pi$ to associate the eigenvalues of $A$ in each $\mathcal{C}_l$ with the corresponding eigenvalues of $\tilde A$. Since each eigenvalue of $A$ in $\mathcal{C}_l$ is within $(2m_l - 1)\epsilon$ of any point in $\mathcal{C}_l$, each eigenvalue of $A$ in $\mathcal{C}_l$ is within $(2m_l - 1)\epsilon \le (2n - 1)\epsilon$ of the corresponding eigenvalue of $\tilde A$.
We have just established the following theorem.
Theorem 1.4 (Ostrowski, Elsner).
\[ \mathrm{md}(A, \tilde A) \le (2n - 1)(\|A\|_2 + \|\tilde A\|_2)^{1-1/n}\, \|E\|_2^{1/n}. \]
Actually, we have only used the fact that Theorem 1.3 gives a bound on $\mathrm{sv}_A(\tilde A)$. Elsner, by an application of Hall's theorem (Theorem II.3.14), has shown that when the bound is on the Hausdorff distance the factor $2n - 1$ can be replaced by $2\lfloor n/2 \rfloor$. To summarize:
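Theorem 1.5. Let $\tau \ge 0$. If $\beta(\tau)$ is a nondecreasing bound on $\mathrm{sv}_A(A + \tau E)$, then
\[ \mathrm{md}(A, \tilde A) \le (2n - 1)\,\beta(1). \]
If $\beta(\tau)$ is a nondecreasing bound on $\mathrm{hd}(A, A + \tau E)$, then
\[ \mathrm{md}(A, \tilde A) \le 2\lfloor n/2 \rfloor\, \beta(1). \tag{1.5} \]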
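The counting argument behind Theorems 1.4 and 1.5 can be tried numerically. The Python/NumPy sketch below is our own illustration (helper names and test matrix are hypothetical): it computes the radius $\epsilon = \mu\|E\|_2^{1/n}$, groups the disks $\mathcal{D}_i$ into connected components, and counts how many eigenvalues of $\tilde A$ fall in each component. Because $\|A + \tau E\|_2$ is a convex function of $\tau$, the maximum defining $\mu$ is attained at $\tau = 0$ or $\tau = 1$, which is what the code uses. It illustrates the claim; it does not prove it.

```python
import numpy as np

def radius_eps(A, A_t):
    """epsilon = mu ||E||_2^(1/n), mu = (max_{0<=tau<=1} 2||A + tau E||_2)^(1-1/n).
    ||A + tau E||_2 is convex in tau, so the max is attained at tau = 0 or 1."""
    n = A.shape[0]
    mu = (2 * max(np.linalg.norm(A, 2), np.linalg.norm(A_t, 2))) ** (1 - 1 / n)
    return mu * np.linalg.norm(A_t - A, 2) ** (1 / n)

def disk_components(centers, radius):
    """Connected components of equal-radius disks (two disks meet iff centers are <= 2*radius apart)."""
    n = len(centers)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if abs(centers[i] - centers[j]) <= 2 * radius:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.001, 1.0, 0.0],
              [0.0, 0.0, 5.0, 1.0],
              [0.0, 0.0, 0.0, 9.0]])
lam = np.diag(A)                                # A is triangular
rng = np.random.default_rng(1)
A_t = A + 1e-12 * rng.standard_normal((4, 4))
lam_t = np.linalg.eigvals(A_t)
eps = radius_eps(A, A_t)
counts = []
for comp in disk_components(lam, eps):
    # eigenvalues of A~ lying in the union of the disks of this component
    inside = sum(1 for z in lam_t if np.min(np.abs(z - lam[comp])) <= eps)
    counts.append((len(comp), inside))
print(eps, counts)   # each component holds as many eigenvalues of A~ as it has disks
```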
1.2. The Bauer-Fike and Henrici Theorems
As was pointed out in the introduction to this chapter, any general perturbation bound on the eigenvalues of a matrix will have to be pessimistic. In Theorems 1.3 and 1.4, this shows itself in the fact that the bounds are proportional to the $n$th root of the error $\|E\|_2$. Example 1 (or rather a trivial extension of it) shows that this $n$th root is necessary. However, in most cases it is unrealistic.
To see one way in which this can come about, let us return to Example 1 and set $A_\eta = \eta A$, where $\eta$ is presumed small. Let $E$ be given with $\|E\|_2 = \epsilon$. If $\epsilon$ is much smaller than $\eta$, Elsner's bound will generally be of the right order, unless $E$ has special structure. On the other hand, if $\epsilon \ge \eta$, then $\|A_\eta + E\|_2 \le \eta + \epsilon \le 2\epsilon$, and no eigenvalue of $A_\eta$ can change by more than $2\epsilon$. The reason that the fourth root of the error is unrealistic in the second case is that $A_\eta + E$ can be regarded as a perturbation of the zero matrix, which is well behaved.
In this subsection we will derive a bound, due to Henrici, that takes this phenomenon into account. It is based on a general theorem of Bauer and Fike.
Theorem 1.6 (Bauer–Fike). Let $Q$ be nonsingular, and let $\|\cdot\|$ be consistent. If $\tilde\lambda \in \mathcal{L}(\tilde A)$ is not an eigenvalue of $A$, then
\[ \|Q^{-1}(A - \tilde\lambda I)^{-1}Q\|^{-1} \le \|Q^{-1}EQ\|. \tag{1.6} \]
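Proof. We have
\[
\begin{aligned}
Q^{-1}(\tilde A - \tilde\lambda I)Q &= Q^{-1}[(A - \tilde\lambda I) + E]Q \\
&= Q^{-1}(A - \tilde\lambda I)Q\,\{I + [Q^{-1}(A - \tilde\lambda I)^{-1}Q][Q^{-1}EQ]\}.
\end{aligned}
\]
Since $\tilde A - \tilde\lambda I$ is singular, so is the matrix in braces; hence $-1$ is an eigenvalue of $[Q^{-1}(A - \tilde\lambda I)^{-1}Q][Q^{-1}EQ]$, and since a consistent norm bounds the spectral radius,
\[
\begin{aligned}
1 &\le \|[Q^{-1}(A - \tilde\lambda I)^{-1}Q][Q^{-1}EQ]\| \\
&\le \|Q^{-1}(A - \tilde\lambda I)^{-1}Q\|\,\|Q^{-1}EQ\|,
\end{aligned}
\tag{1.7}
\]
and this last inequality is equivalent to (1.6). ■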
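The Bauer–Fike inequality (1.6) can be checked numerically. In the Python/NumPy sketch below (our own construction, with hypothetical test data) $A$ is built as $Q\,\mathrm{diag}(1, \ldots, 5)\,Q^{-1}$ with a well-conditioned $Q$, so that $Q^{-1}(A - \tilde\lambda I)^{-1}Q$ is diagonal and the left-hand side of (1.6) is simply the distance from $\tilde\lambda$ to the nearest eigenvalue of $A$. The code prints both numbers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
lam = np.arange(1.0, n + 1)                        # eigenvalues of A: 1, 2, ..., 5
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q = U @ np.diag([1.0, 1.5, 2.0, 2.5, 3.0]) @ V    # condition number 3
Qinv = np.linalg.inv(Q)
A = Q @ np.diag(lam) @ Qinv
E = 1e-3 * rng.standard_normal((n, n))
rhs = np.linalg.norm(Qinv @ E @ Q, 2)              # right-hand side of (1.6)
checks = []
for mu in np.linalg.eigvals(A + E):
    # left-hand side of (1.6): ||Q^{-1}(A - mu I)^{-1} Q||_2^{-1}
    lhs = 1.0 / np.linalg.norm(Qinv @ np.linalg.inv(A - mu * np.eye(n)) @ Q, 2)
    dist = np.min(np.abs(lam - mu))
    checks.append((lhs, rhs, dist))
    print(lhs <= rhs, lhs, dist)
```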
Note that if the left-hand side of (1.6) is regarded as zero when $\tilde\lambda \in \mathcal{L}(A)$, then the inequality holds for all eigenvalues of $\tilde A$. In the sequel we will not be overly fussy in dealing with trivial singularities of this kind.
Our first application of the Bauer–Fike theorem is to prove Henrici's
perturbation theorem. It is phrased in terms of a deviation from nor-
mality. Recall that if a matrix is normal, its Schur form is diagonal.
Consequently the size of the off-diagonal terms in the Schur form can
be used to measure the departure of a matrix from normality.
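Definition 1.7. Let $\nu$ be a norm on $\mathbb{C}^{n\times n}$. Let $\mathcal{U}$ be the set of unitary $U$ such that $U^{\mathrm H}AU$ is upper triangular. For each $U \in \mathcal{U}$ write $U^{\mathrm H}AU = \Lambda_U + R_U$, where $\Lambda_U$ is diagonal and $R_U$ is strictly upper triangular. Then the $\nu$-DEPARTURE FROM NORMALITY of $A$ is the number
\[ \delta_\nu(A) \stackrel{\mathrm{def}}{=} \min_{U \in \mathcal{U}} \nu(R_U). \]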
The departure from normality is not easy to calculate in general, since the Schur form is not unique. However, if $A$ has eigenvalues $\lambda_i$, then by the unitary invariance of the Frobenius norm we have for any Schur form
\[ \|A\|_F^2 = \sum_i |\lambda_i|^2 + \|R\|_F^2 . \]
Thus we have the following theorem.
Theorem 1.8. For any matrix $A$ with eigenvalues $\lambda_i$,
\[ \delta_F(A) = \sqrt{\|A\|_F^2 - \sum_i |\lambda_i|^2}. \]
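Theorem 1.8 makes $\delta_F(A)$ computable from the eigenvalues alone. The Python/NumPy sketch below (our own, with a hypothetical helper `departure_F`) builds $A = UTU^{\mathrm H}$ from a random upper triangular $T$ and a random unitary $U$, so that $T$ is a Schur form of $A$, and compares the formula with $\|R\|_F$; for a Hermitian matrix the result is zero up to rounding error.

```python
import numpy as np

def departure_F(A):
    """delta_F(A) = sqrt(||A||_F^2 - sum |lambda_i|^2), Theorem 1.8."""
    lam = np.linalg.eigvals(A)
    d2 = np.linalg.norm(A, 'fro') ** 2 - np.sum(np.abs(lam) ** 2)
    return float(np.sqrt(max(d2, 0.0)))          # clip tiny negative rounding errors

rng = np.random.default_rng(3)
n = 5
T = np.triu(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))   # a Schur form
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
A = U @ T @ U.conj().T                           # so U^H A U = T
R = np.triu(T, 1)                                # strictly upper triangular part
print(departure_F(A), np.linalg.norm(R, 'fro'))  # the two numbers agree
H = A + A.conj().T                               # Hermitian, hence normal
print(departure_F(H))                            # zero up to rounding error
```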
We are now in a position to state and prove Henrici's theorem.
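Theorem 1.9 (Henrici). Let $\nu$ be a norm on $\mathbb{C}^{n\times n}$ such that $\nu(G) \ge \|G\|_2$ for all $G \in \mathbb{C}^{n\times n}$. Then for every eigenvalue $\tilde\lambda$ of $\tilde A$ there is an eigenvalue $\lambda$ of $A$ such that
\[ \frac{\left(\dfrac{|\tilde\lambda - \lambda|}{\delta_\nu(A)}\right)^{n}}{1 + \dfrac{|\tilde\lambda - \lambda|}{\delta_\nu(A)} + \cdots + \left(\dfrac{|\tilde\lambda - \lambda|}{\delta_\nu(A)}\right)^{n-1}} \le \frac{\|E\|_2}{\delta_\nu(A)}. \tag{1.8} \]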
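Proof. Let $\tilde\lambda$ be an eigenvalue of $\tilde A$, and let $U^{\mathrm H}AU = \Lambda + R$ be a Schur form of $A$ with $\nu(R) = \delta_\nu(A)$. Then by (1.6), with $Q = U$ and the 2-norm,
\[ \|(\Lambda - \tilde\lambda I + R)^{-1}\|_2^{-1} \le \|E\|_2. \tag{1.9} \]
Since $R$ is strictly upper triangular, $[(\Lambda - \tilde\lambda I)^{-1}R]^n = 0$, and
\[ (\Lambda - \tilde\lambda I + R)^{-1} = \{I - (\Lambda - \tilde\lambda I)^{-1}R + \cdots + (-1)^{n-1}[(\Lambda - \tilde\lambda I)^{-1}R]^{n-1}\}(\Lambda - \tilde\lambda I)^{-1}. \]
Thus if $\delta = \min\{|\lambda - \tilde\lambda| : \lambda \in \mathcal{L}(A)\}$, then, since $\|R\|_2 \le \nu(R) = \delta_\nu(A)$,
\[ \|(\Lambda - \tilde\lambda I + R)^{-1}\|_2 \le \delta^{-1}\{1 + \delta^{-1}\delta_\nu(A) + \cdots + [\delta^{-1}\delta_\nu(A)]^{n-1}\}. \]
Hence
\[ \|(\Lambda - \tilde\lambda I + R)^{-1}\|_2^{-1} \ge \frac{\delta}{1 + \delta_\nu(A)/\delta + \cdots + [\delta_\nu(A)/\delta]^{n-1}}. \tag{1.10} \]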
The theorem follows on combining (1.9) and (1.10) and dividing by $\delta_\nu(A)$. ■
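The following Python/NumPy sketch (our own test case, not from the book) checks Henrici's bound (1.8) with $\nu = \|\cdot\|_F$, which satisfies $\nu(G) \ge \|G\|_2$. Taking $A$ upper triangular means that $A$ is already in Schur form, so $\delta_F(A)$ is the Frobenius norm of its strictly upper triangular part. For each eigenvalue of $\tilde A$ the code uses the nearest eigenvalue of $A$, which minimizes the left-hand side of (1.8) because $\psi$ is increasing.

```python
import numpy as np

def psi(eta, n):
    """psi(eta) = eta^n / (1 + eta + ... + eta^(n-1)), as in (1.11)."""
    return eta ** n / sum(eta ** k for k in range(n))

rng = np.random.default_rng(4)
n = 4
A = np.triu(rng.standard_normal((n, n)))        # nonnormal; eigenvalues on the diagonal
lam = np.diag(A)
delta = np.linalg.norm(np.triu(A, 1), 'fro')     # delta_F(A), since A is its own Schur form
E = 1e-2 * rng.standard_normal((n, n))
normE = np.linalg.norm(E, 2)
results = []
for mu in np.linalg.eigvals(A + E):
    eta = np.min(np.abs(lam - mu)) / delta       # nearest eigenvalue of A
    results.append((psi(eta, n), normE / delta))
    print(psi(eta, n) <= normE / delta)
```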
The remarkable thing about Henrici's theorem is that it provides a continuous transition between the two cases mentioned at the beginning of this subsection: namely, the case in which the perturbation bound is proportional to the $n$th root of the error and the case in which it is proportional to the error itself. To see this let
\[ \psi(\eta) = \frac{\eta^n}{1 + \eta + \cdots + \eta^{n-1}}, \tag{1.11} \]
so that the left-hand side of the bound (1.8) has the form $\psi[|\tilde\lambda - \lambda|/\delta_\nu(A)]$. For $\eta$ small, $\psi(\eta) \cong \eta^n$, and the bound takes the asymptotic form
\[ \frac{\mathrm{sv}_A(\tilde A)}{\delta_\nu(A)} \lesssim \left(\frac{\|E\|_2}{\delta_\nu(A)}\right)^{1/n}. \]
When $\eta$ is large, $\psi(\eta) \cong \eta$, and the bound takes the asymptotic form
\[ \mathrm{sv}_A(\tilde A) \lesssim \|E\|_2 . \]
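Specifically, we have the following corollary.
Corollary 1.10. If $\|E\|_2/\delta_\nu(A) < n^{-1}$, then
\[ \frac{\mathrm{sv}_A(\tilde A)}{\delta_\nu(A)} \le n^{1/n}\left(\frac{\|E\|_2}{\delta_\nu(A)}\right)^{1/n}. \tag{1.12} \]
If $\|E\|_2/\delta_\nu(A) > 1$, then
\[ \mathrm{sv}_A(\tilde A) \le \|E\|_2 + \delta_\nu(A). \tag{1.13} \]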
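Proof. Since $\psi$ is increasing and $\psi(1) = 1/n$, if $\psi(\eta) < 1/n$, then $\eta < 1$, and in that case the denominator of $\psi$ is less than $n$, so that $\psi(\eta) \ge \eta^n/n$. Hence if $\|E\|_2/\delta_\nu(A) < n^{-1}$,
\[ n^{-1}\left(\frac{\mathrm{sv}_A(\tilde A)}{\delta_\nu(A)}\right)^{n} \le \psi\left(\frac{\mathrm{sv}_A(\tilde A)}{\delta_\nu(A)}\right) \le \frac{\|E\|_2}{\delta_\nu(A)}, \]
from which (1.12) follows. On the other hand, if $\psi(\eta) > 1$, then $\eta > 1$ and
\[ \psi(\eta) = \frac{\eta}{1 + \eta^{-1} + \cdots + \eta^{-(n-1)}} \ge \eta(1 - \eta^{-1}) = \eta - 1. \]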
Hence if $\|E\|_2/\delta_\nu(A) > 1$,
\[ \frac{\mathrm{sv}_A(\tilde A)}{\delta_\nu(A)} \le \psi^{-1}\left(\frac{\|E\|_2}{\delta_\nu(A)}\right) \le \frac{\|E\|_2}{\delta_\nu(A)} + 1, \]
which is equivalent to (1.13). ■
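Applying Corollaries 1.10 and 1.11 requires $\psi^{-1}$, which has no simple closed form. Since $\psi$ is continuous and increasing on $[0, \infty)$, bisection is a natural choice; the Python sketch below is one such implementation (the name `psi_inv` and the tolerance are our assumptions). The bracket $[0, t + 1]$ is valid because, as shown in the proof, $\psi(\eta) \ge \eta - 1$ for $\eta > 1$. The loops compare $\psi^{-1}$ with the two regimes (1.12) and (1.13).

```python
def psi(eta, n):
    """psi(eta) = eta^n / (1 + eta + ... + eta^(n-1)), equation (1.11)."""
    return eta ** n / sum(eta ** k for k in range(n))

def psi_inv(t, n, tol=1e-14):
    """Invert the increasing function psi on [0, inf) by bisection (t > 0)."""
    lo, hi = 0.0, t + 1.0          # psi(t + 1) >= t, so the root lies in [0, t + 1]
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if psi(mid, n) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 6
for t in [1e-6, 1e-3, 0.1]:        # t < 1/n: the n-th root regime of (1.12)
    print(t, psi_inv(t, n), (n * t) ** (1 / n))
for t in [2.0, 10.0, 1e3]:         # t > 1: the linear regime of (1.13)
    print(t, psi_inv(t, n), t + 1)
```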
Using Theorem 1.5 and the monotonicity of $\psi$, we have the following bound on $\mathrm{md}(A, \tilde A)$.
Corollary 1.11. Let $\psi$ be defined by (1.11). Then
\[ \mathrm{md}(A, \tilde A) \le (2n - 1)\,\delta_\nu(A)\,\psi^{-1}\!\left(\frac{\|E\|_2}{\delta_\nu(A)}\right). \]
An unsavory aspect of Henrici's theorem, one that it shares with Theorems 1.3 and 1.4, is the $n$th root of the error in its bounds. The examples that show that its presence is necessary all depend on the matrix having a Jordan block whose order equals that of the matrix. The following theorem shows that for a matrix with smaller Jordan blocks the root is of lower order: it is the $m$th root, where $m$ is the size of the largest Jordan block. Its proof is similar to that of Henrici's theorem and is left as an exercise.
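Theorem 1.12. Let $Q^{-1}AQ = J$ be the Jordan canonical form of $A$. Let $m$ be the size of the largest Jordan block in $J$. Then for any eigenvalue $\tilde\lambda \in \mathcal{L}(\tilde A)$ there is an eigenvalue $\lambda$ of $A$ such that
\[ \frac{|\tilde\lambda - \lambda|^m}{1 + |\tilde\lambda - \lambda| + \cdots + |\tilde\lambda - \lambda|^{m-1}} \le \|Q^{-1}EQ\|_2 . \tag{1.14} \]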
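A numerical illustration of Theorem 1.12 in Python/NumPy: here $A$ is given a Jordan form with two $2 \times 2$ blocks, so $m = 2$, and the perturbed eigenvalues move by roughly the square root of $\|Q^{-1}EQ\|_2$ rather than by its fourth root. The matrices are hypothetical test data; $Q$ is taken to be an orthogonal matrix times $\mathrm{diag}(1, 2, 3, 4)$ so that it is well conditioned.

```python
import numpy as np

def jordan_block(lam, k):
    """k x k Jordan block with eigenvalue lam (ones on the superdiagonal)."""
    return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

rng = np.random.default_rng(5)
J = np.zeros((4, 4))
J[:2, :2] = jordan_block(1.0, 2)
J[2:, 2:] = jordan_block(3.0, 2)
m = 2                                               # size of the largest Jordan block
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Q = U @ np.diag([1.0, 2.0, 3.0, 4.0])              # well conditioned (condition number 4)
A = Q @ J @ np.linalg.inv(Q)                        # Q^{-1} A Q = J
E = 1e-6 * rng.standard_normal((4, 4))
rhs = np.linalg.norm(np.linalg.solve(Q, E @ Q), 2)  # ||Q^{-1} E Q||_2
results = []
for mu in np.linalg.eigvals(A + E):
    d = np.min(np.abs(np.array([1.0, 3.0]) - mu))
    lhs = d ** m / sum(d ** k for k in range(m))    # left-hand side of (1.14)
    results.append((lhs, rhs, d))
    print(lhs <= rhs, d, np.sqrt(rhs))              # d is of the order of sqrt(rhs)
```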
1.3. Residual Bounds
Let the columns of $X$ form a basis for an invariant subspace of $A$. From Theorem I.3.9, we know that there is a unique matrix $M$ (which is now easily seen to be $X^{\dagger}AX$) such that
\[ AX - XM = 0. \]
The matrix $M$ is the representation of $A$ on $\mathcal{R}(X)$ with respect to the basis $X$, and hence the eigenstructure of $M$ is a substructure of the eigenstructure of $A$.
Now suppose that the columns of $X$ span a subspace that is only approximately invariant. For example, $X$ may come from a numerical algorithm for approximating invariant subspaces. Then for any $M$ the residual
\[ R = AX - XM \tag{1.15} \]
is nonzero, although presumably with a proper choice of $M$ it can be made small. An important problem in perturbation theory is: Given some norm of $R$, determine how near $\mathcal{R}(X)$ is to an invariant subspace of $A$ and how the eigenvalues of $M$ relate to those of $A$. We will consider the invariant subspace problem in the next chapter. Here we will focus on the eigenvalue problem.
The key tool in our investigation is the following backward pertur-
bation theorem. Its proof, which is purely computational, is left as an
exercise.
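Theorem 1.13. Let $A \in \mathbb{C}^{n\times n}$, $X \in \mathbb{C}^{n\times p}$, and $M \in \mathbb{C}^{p\times p}$. Let $R$ be defined by (1.15). If $Y^{\mathrm H}$ is any matrix satisfying $Y^{\mathrm H}X = I$ and
\[ \tilde A = A - RY^{\mathrm H}, \tag{1.16} \]
then
\[ \tilde AX - XM = 0. \]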
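Theorem 1.13 is easy to confirm numerically. In the Python/NumPy sketch below (test data ours), $X$ and $M$ are arbitrary, and $Y^{\mathrm H}$ is taken to be the Moore–Penrose pseudoinverse of $X$, which is one of many matrices satisfying $Y^{\mathrm H}X = I$ when $X$ has full column rank.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 6, 2
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, p))          # any full-column-rank X
M = rng.standard_normal((p, p))          # any M
R = A @ X - X @ M                        # the residual (1.15)
YH = np.linalg.pinv(X)                   # one choice of Y^H with Y^H X = I
A_t = A - R @ YH                         # the matrix (1.16)
print(np.linalg.norm(A_t @ X - X @ M))   # zero up to rounding: R(X) is invariant for A~
```

Consequently the eigenvalues of $M$ are eigenvalues of $\tilde A$, which the accompanying checks confirm.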
The theorem says that if $R$ is small then $\mathcal{R}(X)$ is an exact invariant subspace of a matrix $\tilde A$ that is near $A$; in fact $\tilde A$ is within $\|RY^{\mathrm H}\|$ of $A$ in any norm $\|\cdot\|$. Moreover, $M$ is the representation of $\tilde A$ on $\mathcal{R}(X)$, and its eigenvalues are therefore eigenvalues of $\tilde A$. Since we know $\|E\| = \|RY^{\mathrm H}\|$, we may use any appropriate perturbation theorem for eigenvalues to assess the accuracy of the eigenvalues of $M$. For example, from Corollary 1.11 we have the following corollary.
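Corollary 1.14. Let $\mu_1, \ldots, \mu_p$ be the eigenvalues of $M$. Then there are eigenvalues $\lambda_{j_1}, \ldots, \lambda_{j_p}$ of $A$ such that
\[ |\mu_i - \lambda_{j_i}| \le (2n - 1)\,\delta_\nu(A)\,\psi^{-1}\!\left(\frac{\|RY^{\mathrm H}\|_2}{\delta_\nu(A)}\right). \]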
The problem of choosing $M$ and $Y$ to minimize $\|RY^{\mathrm H}\|$ still remains.
In general, the problem is intractable; however, for unitarily invariant
norms it has an elegant solution.
Theorem 1.15. In the notation of Theorem 1.13, assume that $X^{\mathrm H}X = I$. Let $\|\cdot\|$ be a unitarily invariant norm. Then $\|R\|$ is minimized for $M = X^{\mathrm H}AX$, and $\|RY^{\mathrm H}\|$ is minimized for $M = X^{\mathrm H}AX$ and $Y = X$.
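Proof. Let $(X\ X_\perp)$ be unitary. Then from (1.15),
\[ \|R\| = \|(X\ X_\perp)^{\mathrm H}R\| = \left\|\begin{pmatrix} X^{\mathrm H}AX - M \\ X_\perp^{\mathrm H}AX \end{pmatrix}\right\|. \]
It follows from Corollary 3.8 that $\|R\|$ is minimized when $X^{\mathrm H}AX - M = 0$.
To minimize $\|RY^{\mathrm H}\|$, note that $Y^{\mathrm H}X = I$ implies that $Y = X + X_\perp S$ for some $S$. Then
\[ \|RY^{\mathrm H}\| = \|(X\ X_\perp)^{\mathrm H}RY^{\mathrm H}(X\ X_\perp)\| = \left\|\begin{pmatrix} X^{\mathrm H}AX - M & (X^{\mathrm H}AX - M)S^{\mathrm H} \\ X_\perp^{\mathrm H}AX & X_\perp^{\mathrm H}AXS^{\mathrm H} \end{pmatrix}\right\|. \]
Again by Corollary 3.8, the norm of $RY^{\mathrm H}$ is minimized when $X^{\mathrm H}AX - M = 0$ and $S = 0$. ■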
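The optimality statement of Theorem 1.15 can be probed by random search. The Python/NumPy sketch below (our own, with hypothetical names) parametrizes $Y = X + X_\perp S$ as in the proof and compares $\|RY^{\mathrm H}\|$ at $M = X^{\mathrm H}AX$, $S = 0$ with randomly perturbed choices, in both the Frobenius and the spectral norm. A random search proves nothing; it only illustrates the theorem.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 6, 2
A = rng.standard_normal((n, n))
X, _ = np.linalg.qr(rng.standard_normal((n, p)))        # orthonormal columns: X^H X = I
Qfull, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, n - p))]))
X_perp = Qfull[:, p:]                                   # orthonormal basis of the complement
M_opt = X.T @ A @ X                                     # M = X^H A X (real data: ^H is ^T)

def backward_error(M, S, norm_ord):
    """||R Y^H|| with R = A X - X M and Y = X + X_perp S, so that Y^H X = I."""
    Y = X + X_perp @ S
    return np.linalg.norm((A @ X - X @ M) @ Y.T, norm_ord)

results = {}
for norm_ord in ['fro', 2]:
    best = backward_error(M_opt, np.zeros((n - p, p)), norm_ord)
    trials = [backward_error(M_opt + 0.1 * rng.standard_normal((p, p)),
                             0.1 * rng.standard_normal((n - p, p)), norm_ord)
              for _ in range(200)]
    results[norm_ord] = (best, min(trials))
    print(norm_ord, best, min(trials))   # best never exceeds the best random trial
```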
Notes and References
Perturbation theory for eigenvalues comes in two flavors. In this book we consider comparatively unstructured errors and attempt to bound the perturbations in terms of some norm of the errors. Other approaches impose some structure on the errors; for example, they may be analytic functions of a complex variable. The problem is then to determine how this structure affects the perturbed eigenvalues: e.g., when are they analytic functions of the variable, and what kind of paths do they follow in the complex plane? For more results of this kind see the books by Kato [135] and Baumgärtel [17].
The approach taken here is the one generally followed by numerical analysts; for example, Householder [121, 1964] and Wilkinson [269, 1965]. Particular mention should be made of the elegant little book by Bhatia [28, 1987], which is a required supplement to this chapter.
Rouché's theorem may be found in most texts on complex analysis. As Example 1 shows, we cannot expect much more than continuity in the eigenvalues, at least for defective eigenvalues. However, the reader should not conclude from this example that all perturbations of defective eigenvalues are multiples of primitive roots of unity. A counterexample is given in the exercises.
The term "spectral variation" is found in Henrici [109, 1962] but may be
of earlier vintage. The Hausdorff distance between two sets may be found
in the second edition of Hausdorff's famous book on set theory [104, 19 1 4].
In general, the Hausdorff distance is a metric only over the class of closed
bounded sets, which is just what the set of eigenvalues of a matrix is. The
term "optimal matching distance" seems to be due to Bhatia [28], although
thc concept. has been around for some time (e.g., Henrici calls it. t.he eigen-
value distance).
The first general perturbation bounds for eigenvalues were given by Ostrowski [169, 1957]. Theorem 1.3 is due to Elsner [66, 1985], who also shows that the bounds are in some sense the best possible (Exercise 1.4). It is perhaps significant that both Ostrowski and Elsner use Hadamard's inequality in deriving their bounds.
The fact that one can count the eigenvalues in the connected components of inclusion regions provided by the bound on the spectral variation was first noted by Gerschgorin [85, 1931] and also by Ostrowski [169, 1957], who used it to establish the "$2n - 1$" bound on the matching distance. The "$2\lfloor n/2 \rfloor$" bound is due to Elsner [65, 1982].
Bauer and Fike [15, 1960] have not been treated fairly in the literature. Their names have become associated with a weak corollary of Theorem 1.6 (Theorem 3.3), which is frequently trotted out as a straw man by people who have not read the original paper. The generality of their technique makes it applicable in a variety of situations (e.g., Henrici's theorem, Gerschgorin's theorem in the next section, and also a useful theorem of Demmel [55]).
Henrici's theorem [109, 1962] is but one of many results that Henrici casts in terms of the departure from normality. The observation (Corollary 1.10) that the theorem provides a smooth transition from nonlinear to linear behavior in the bounds is new.
For a single eigenvalue, Theorem 1.13 may be found in Wilkinson's book [269, 1965]. The optimality conditions of Theorem 1.15 are part of the folklore, at least for the Frobenius norm. The observation that the conditions are optimal for all unitarily invariant norms appears to be new.
In some applications we may have, in addition to a residual for an approximate invariant subspace, a residual for the corresponding left invariant subspace. Kahan, Parlett, and Jiang [134, 1982] show how to use this information to derive a backward perturbation theorem (see Exercise 1.12).
Exercises
1. Show that the eigenvalues of the matrix $J_n(0) + \epsilon e_i e_1^{\mathrm T}$ ($i = 1, \ldots, n$) are zero with multiplicity $n - i$ and $\epsilon^{1/i}$ times the primitive $i$th roots of unity.
2. (Wilkinson [269, p. 80]). Let $A = \mathrm{diag}[J_2(0), J_3(0)]$. Show that there is a perturbation of $A$ of order $\epsilon$ for which all the eigenvalues of the perturbed matrix are of order $\epsilon^{2/5}$.
3. Let $[A]$ be the equivalence class of matrices having the same eigenvalues as $A$. Show that the Hausdorff distance and the matching distance are metrics over the space of such equivalence classes.
4. (Elsner [66]). Show that equality holds in the bound (1.4) if and only if $\tilde A = \omega\|A\|_2 I$ ($|\omega| = 1$) and $A$ has $-\omega\|A\|_2$ as an eigenvalue.
5. (Kato [135, p. 109]). Let $M(\tau)$ be an unordered $n$-tuple of numbers that depend continuously on the parameter $\tau$ in an interval $I$. Show that there are functions $\mu_i(\tau)$ ($i = 1, \ldots, n$), continuous on $I$, such that $M(\tau)$ consists of $\mu_1(\tau), \ldots, \mu_n(\tau)$.
6. Establish the bound (1.5). [Note: This is a difficult problem. The idea is to declare eigenvalues $\lambda$ and $\tilde\lambda$ related if they can be connected by a suitably short chain of disks. One then applies Hall's theorem (Theorem II.3.14) to the 0-1 matrix of this relation. See [65, 28] for details.]
7. Show that if $\|(\tilde\lambda I - A)^{-1}\|_2 \ge \eta$, then there is an eigenvalue $\lambda$ of $A$ satisfying
\[ |\tilde\lambda - \lambda| \le (2\|A\|_2 + \eta^{-1})^{1-1/n}\,\eta^{-1/n}. \]
8. (Henrici [109]). For any $A$,
\[ \delta_F(A) \le \sqrt[4]{\frac{n^3 - n}{12}}\,\sqrt{\|A^{\mathrm H}A - AA^{\mathrm H}\|_F}. \]
Characterize the matrices for which equality holds. [Note: Recall that $A$ is normal if and only if $\|A^{\mathrm H}A - AA^{\mathrm H}\|_F = 0$. It is therefore not surprising that the size of $\|A^{\mathrm H}A - AA^{\mathrm H}\|_F$ is related to the departure from normality. This problem is not an easy one.]
9. (A structured backward perturbation theorem for eigenvectors). Let $r = (\tilde\lambda I - A)\tilde x$ and let $S$ be nonnegative. Set
\[ \epsilon = \max_i \frac{|\rho_i|}{(S|\tilde x|)_i}, \]
where $\rho_i$ is the $i$th component of $r$. (Here $0/0 = 0$ and otherwise $\rho/0 = \infty$.) If $\epsilon \ne \infty$, there is a matrix $E$ satisfying $|E| \le \epsilon S$ such that $(\tilde\lambda, \tilde x)$ is an eigenpair of $A + E$.
The following exercises concern backward perturbations when residuals for left and right invariant subspaces are known. They are based on a general theorem of Davis, Kahan, and Weinberger on dilations [52, 1982], which is established in the next two exercises. The dilation problem may be stated as follows. Given the partitioned matrix
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \]
determine $A_{22}$ so that $\|A\|_2$ is minimized. For a history of the problem and applications, see the article just cited.
10. Let $\|A_{11}\|_2 \le \nu$. Show that
\[ \left\|\begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}\right\|_2 \le \nu \]
if and only if $A_{21} = K_{21}(\nu^2 I - A_{11}^{\mathrm H}A_{11})^{1/2}$, where $\|K_{21}\|_2 \le 1$. In particular we may take $K_{21} = A_{21}[(\nu^2 I - A_{11}^{\mathrm H}A_{11})^{1/2}]^{\dagger}$.
11. Let $A$ be as above and let
\[ \max\left\{\left\|\begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}\right\|_2,\ \|(A_{11}\ A_{12})\|_2\right\} \le \nu. \]
Then $A_{22}$ may be chosen so that $\|A\|_2 \le \nu$. In particular if $K_{21} = A_{21}[(\nu^2 I - A_{11}^{\mathrm H}A_{11})^{1/2}]^{\dagger}$ and $K_{12} = [(\nu^2 I - A_{11}A_{11}^{\mathrm H})^{1/2}]^{\dagger}A_{12}$, then the most general form of $A_{22}$ is
\[ A_{22} = -K_{21}A_{11}^{\mathrm H}K_{12} + \nu(I - K_{21}K_{21}^{\mathrm H})^{1/2}C(I - K_{12}^{\mathrm H}K_{12})^{1/2}, \]
where $C$ is an arbitrary matrix satisfying $\|C\|_2 \le 1$. [Hint: Apply the previous exercise three times: twice to define $K_{21}$ and $K_{12}$ and once to a partition of $A$. Note that the last step is nontrivial.]
12. (Kahan, Parlett, and Jiang [134]). Let $A \in \mathbb{C}^{n\times n}$. Let $X, Y \in \mathbb{C}^{n\times p}$ have orthonormal columns, and assume that $Y^{\mathrm H}X$ is nonsingular. For any $M \in \mathbb{C}^{p\times p}$ let $N = (Y^{\mathrm H}X)M(Y^{\mathrm H}X)^{-1}$. Set
\[ R = AX - XM \quad\text{and}\quad S^{\mathrm H} = Y^{\mathrm H}A - NY^{\mathrm H}. \]
Then there is at least one matrix $E$ such that
\[ (A + E)X = XM \quad\text{and}\quad Y^{\mathrm H}(A + E) = NY^{\mathrm H}. \]
Moreover the smallest solution in the Frobenius norm satisfies
\[ \|E\|_F = \sqrt{\|R\|_F^2 + \|S\|_F^2 - \|F_{11}\|_F^2}, \]
where $F_{11} = -Y^{\mathrm H}R = -S^{\mathrm H}X$. The smallest solution in the spectral norm satisfies
\[ \|E\|_2 = \max\{\|R\|_2, \|S\|_2\}. \]
[Hint: Let $(X\ X_\perp)$ and $(Y\ Y_\perp)$ be orthogonal and set
\[ \begin{pmatrix} Y^{\mathrm H} \\ Y_\perp^{\mathrm H} \end{pmatrix} E\,(X\ X_\perp) = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}. \]
Then show that only $F_{22}$ is free.]
2. Gerschgorin Theory: Differentiability
The results of the last section do not suggest a way to assign a condition
number to an eigenvalue. The problem is that eigenvalues associated
with a nontrivial Jordan block are not differentiable functions of the
elements of the matrix. However, this does not mean that individual
eigenvalues cannot behave in a locally linear fashion and hence have
condition numbers. This section is devoted to one of the most powerful tools for probing the sensitivity of a single eigenvalue: the Gerschgorin theorem.
2.1. Gerschgorin's Theorem
Strictly speaking, Gerschgorin's theorem is not a perturbation theorem; it states that the eigenvalues of a matrix lie in the union of certain disks in the complex plane. However, as we shall see in the next subsection, it can be used to establish extremely accurate perturbation bounds.
There are several ways of establishing Gerschgorin's theorem. Here we will approach it through the Bauer–Fike theorem of the last section.
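Theorem 2.1 (Gerschgorin). For $A \in \mathbb{C}^{n\times n}$ let
\[ \rho_i = \sum_{j \ne i} |\alpha_{ij}| \quad\text{and}\quad \mathcal{G}_i(A) = \{z \in \mathbb{C} : |z - \alpha_{ii}| \le \rho_i\}. \tag{2.1} \]
Then
\[ \mathcal{L}(A) \subset \bigcup_{i=1}^{n} \mathcal{G}_i(A). \tag{2.2} \]
Moreover, if $m$ of the GERSCHGORIN DISKS $\mathcal{G}_i(A)$ are isolated from the other $n - m$ disks, then there are precisely $m$ eigenvalues of $A$ in their union.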
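Proof. Let $D = \mathrm{diag}(\alpha_{11}, \alpha_{22}, \ldots, \alpha_{nn})$. In the Bauer–Fike theorem make the following substitutions:
\[ Q \leftarrow I, \qquad A \leftarrow D, \qquad \tilde A \leftarrow A, \qquad \|\cdot\| \leftarrow \|\cdot\|_\infty . \]
Then it is easy to verify that the first inequality in (1.7) is equivalent to saying that each eigenvalue $\lambda$ of $A$ lies in a Gerschgorin disk: it reads $1 \le \|(D - \lambda I)^{-1}(A - D)\|_\infty = \max_i \rho_i/|\alpha_{ii} - \lambda|$.
The proof of the second part of the theorem uses the techniques developed in the previous section and is left as an exercise. ■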
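The disks of Theorem 2.1 take one line each to compute. The Python/NumPy sketch below (the helper name and the test matrix are our own) checks (2.2) and, since the three disks of this matrix are isolated, also the counting statement: each disk contains exactly one eigenvalue.

```python
import numpy as np

def gerschgorin_disks(A):
    """Centers alpha_ii and radii rho_i = sum_{j != i} |alpha_ij| of the disks (2.1)."""
    A = np.asarray(A)
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return centers, radii

A = np.array([[4.0, 0.5, 0.1],
              [0.2, -1.0, 0.3],
              [0.1, 0.1, 10.0]])
c, r = gerschgorin_disks(A)        # disks: 4 +/- 0.6, -1 +/- 0.5, 10 +/- 0.2
ev = np.linalg.eigvals(A)
for lam in ev:
    # (2.2): every eigenvalue lies in at least one disk
    print(lam, np.any(np.abs(lam - c) <= r))
```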
The following example illustrates how much of an improvement Gerschgorin's theorem can be over Elsner's theorem. It also illustrates a deficiency in the straightforward use of Gerschgorin's theorem.
Example 2.2. Consider the matrix
\[ A = \begin{pmatrix} 1 & 10^{-4} \\ 10^{-4} & 2 \end{pmatrix}. \tag{2.3} \]
Regarding $A$ as a perturbation of the matrix $\mathrm{diag}(1, 2)$, we find from Theorem 1.3 that one eigenvalue must lie in the interval $[1 - 0.021, 1 + 0.021]$ and the other in the interval $[2 - 0.021, 2 + 0.021]$ (actually the theorem yields intervals that are just barely greater than $0.04$ in length). On the other hand, by Gerschgorin's theorem each of the intervals $[1 - 10^{-4}, 1 + 10^{-4}]$ and $[2 - 10^{-4}, 2 + 10^{-4}]$ must contain an eigenvalue of $A$. Thus Gerschgorin's theorem is better than Elsner's by more than two orders of magnitude.
However, the eigenvalues of $A$ are approximately $1 - 10^{-8}$ and $2 + 10^{-8}$. Thus Gerschgorin's theorem is still off by four orders of magnitude.
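The numbers quoted in Example 2.2 can be reproduced with a few lines of Python/NumPy (a sketch; the variable names are ours).

```python
import numpy as np

A = np.array([[1.0, 1e-4],
              [1e-4, 2.0]])                   # the matrix (2.3)
A0 = np.diag([1.0, 2.0])
lam = np.linalg.eigvalsh(A)                   # A is symmetric, so its eigenvalues are real
# Elsner's bound (1.4) with n = 2
elsner = np.sqrt(np.linalg.norm(A0, 2) + np.linalg.norm(A, 2)) * np.sqrt(np.linalg.norm(A - A0, 2))
gersch = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))   # Gerschgorin radii
print(elsner)             # slightly more than 0.02
print(gersch)             # [1e-4, 1e-4]
print(lam - [1.0, 2.0])   # roughly [-1e-8, 1e-8]
```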
It is worth noting that in the above example we have replaced disks in the complex plane with intervals on the real axis. The rationale for this is the following. The two disks (either the ones provided by Elsner's theorem or by Gerschgorin's theorem) contain only one eigenvalue each. Since $A$ is real, its complex eigenvalues must occur in complex conjugate pairs. Hence the eigenvalues in the disks must be real and are contained in the intersection of the disks with the real line.
2.2. Diagonal Similarities
Example 2.2 shows that the bounds provided by Gerschgorin's theorem need not be very sharp. Now in principle there is no reason why Gerschgorin's theorem should provide sharp bounds. However, the matrix $A$ of (2.3) has a special structure; it is almost diagonal, and it turns out that we can exploit this structure to obtain sharper bounds.
The general technique is seen at its simplest with the matrix of Example 2.2. Let $D_\alpha = \mathrm{diag}(\alpha, 1)$, and let
\[ A_\alpha = D_\alpha A D_\alpha^{-1} = \begin{pmatrix} 1 & 10^{-4}\alpha \\ 10^{-4}\alpha^{-1} & 2 \end{pmatrix}. \]
Since $A_\alpha$ is similar to $A$ it has the same eigenvalues; however, $A$ and $A_\alpha$ have different Gerschgorin disks. As $\alpha$ becomes small the first disk shrinks, while the other grows. Eventually, the second disk expands to engulf the first, but until it does, the first provides an ever-improving bound on the eigenvalue. In particular, as long as
\[ 10^{-4}\alpha + 10^{-4}\alpha^{-1} < 1, \]
the two Gerschgorin disks will remain isolated. It is easy to see that this will be true as long as $\alpha$ is just a little greater than $10^{-4}$, say $\alpha = 1.01 \cdot 10^{-4}$. This isolates an eigenvalue of $A$ in the interval $[1 - 1.01 \cdot 10^{-8}, 1 + 1.01 \cdot 10^{-8}]$, which is a very sharp bound.
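The diagonal similarity can be carried out explicitly. In the Python/NumPy sketch below (our own), with $\alpha = 1.01 \cdot 10^{-4}$ as in the text, the first disk of $A_\alpha$ has radius about $1.01 \cdot 10^{-8}$, the two disks are still isolated, and the first disk does contain the eigenvalue $1 - 10^{-8}$.

```python
import numpy as np

A = np.array([[1.0, 1e-4],
              [1e-4, 2.0]])
alpha = 1.01e-4
D = np.diag([alpha, 1.0])
A_alpha = D @ A @ np.linalg.inv(D)          # [[1, 1e-4*alpha], [1e-4/alpha, 2]]
centers = np.diag(A_alpha)
radii = np.sum(np.abs(A_alpha), axis=1) - np.abs(centers)
print(radii)                                          # about [1.01e-08, 0.990]
print(radii.sum() < abs(centers[1] - centers[0]))     # True: the disks are isolated
lam_small = np.linalg.eigvalsh(A)[0]
print(abs(lam_small - 1.0) <= radii[0])               # True: the small disk holds an eigenvalue
```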
Of course this is a trivial example. However, the technique it illustrates, that of reducing one Gerschgorin disk until the others overwhelm it, is widely applicable. We will give another example in the proof of the following theorem.
Theorem 2.3. Let $\lambda$ be a simple eigenvalue of the matrix $A$, with right and left eigenvectors $x$ and $y$, and let $\tilde A = A + E$ be a perturbation of $A$. Then there is a unique eigenvalue $\tilde\lambda$ of $\tilde A$ such that
\[ \tilde\lambda = \lambda + \frac{y^{\mathrm H}Ex}{y^{\mathrm H}x} + O(\|E\|^2). \tag{2.4} \]
Proof. Let $\delta > 0$ be the distance between $\lambda$ and the other eigenvalues of $A$. Let $J = Y^{\mathrm H}AX$ be the Jordan canonical form of $A$, in which the superdiagonals are equal to $\delta/3$ or zero [see (I.3.3)]. Note that the first columns $x$ and $y$ of $X$ and $Y$ are the right and left eigenvectors corresponding to $\lambda$, and since $Y^{\mathrm H} = X^{-1}$ we have $y^{\mathrm H}x = 1$; i.e., the denominator in (2.4) is nonzero.
Now consider the matrix $\tilde J = Y^{\mathrm H}(A + E)X$. This matrix has the form illustrated below for $n = 5$:
\[ \tilde J = \begin{pmatrix}
\lambda + y^{\mathrm H}Ex & \epsilon & \epsilon & \epsilon & \epsilon \\
\epsilon & \mu & \tau & \epsilon & \epsilon \\
\epsilon & \epsilon & \mu & \tau & \epsilon \\
\epsilon & \epsilon & \epsilon & \mu & \tau \\
\epsilon & \epsilon & \epsilon & \epsilon & \mu
\end{pmatrix}. \]
Here we have used $\epsilon$ to stand generically for a quantity bounded by $\|Y\|\|E\|\|X\|$; $\mu$ for an eigenvalue of $A$ other than $\lambda$ plus $\epsilon$; and $\tau$ for a quantity bounded by $\epsilon + \delta/3$. By a diagonal similarity transformation with $\mathrm{diag}(\alpha, 1, \ldots, 1)$, we may replace $\tilde J$ by a matrix of the form
\[ \tilde J' = \begin{pmatrix}
\lambda + y^{\mathrm H}Ex & \alpha\epsilon & \alpha\epsilon & \alpha\epsilon & \alpha\epsilon \\
\alpha^{-1}\epsilon & \mu & \tau & \epsilon & \epsilon \\
\alpha^{-1}\epsilon & \epsilon & \mu & \tau & \epsilon \\
\alpha^{-1}\epsilon & \epsilon & \epsilon & \mu & \tau \\
\alpha^{-1}\epsilon & \epsilon & \epsilon & \epsilon & \mu
\end{pmatrix}. \]
Now the first Gerschgorin disk of $\tilde J'$ has center $\lambda + y^{\mathrm H}Ex$ and radius bounded by $(n - 1)\alpha\epsilon$. The other disks have centers $\mu$ and radii bounded by $\alpha^{-1}\epsilon + \tau + (n - 3)\epsilon$. Hence if
\[ \alpha^{-1}\epsilon + \delta/3 + n\epsilon + (n - 1)\alpha\epsilon < \delta, \tag{2.5} \]
the first Gerschgorin disk will be disjoint from the others.
Now let $\epsilon$ be small enough so that
\[ \tfrac{2}{3}\delta - n\epsilon > \tfrac{1}{2}\delta . \]
Then if
\[ n\epsilon\alpha^2 - \tfrac{1}{2}\delta\alpha + \epsilon < 0, \]
the inequality (2.5) is satisfied. This latter condition will be satisfied if
\[ \alpha = \frac{4\epsilon}{\delta}, \]
provided we require $\epsilon$ to be so small that
\[ \frac{16n\epsilon^2}{\delta^2} < 1. \]
In this case, the radius of the first Gerschgorin disk is bounded by $4n\epsilon^2/\delta = O(\epsilon^2)$. Since this disk is centered at $\lambda + y^{\mathrm H}Ex$, the unique eigenvalue $\tilde\lambda$ it contains satisfies (2.4). ■
An immediate consequence of Theorem 2.3 is that the simple eigenvalues of a matrix are differentiable functions of the elements of the matrix.
Corollary 2.4. Under the hypotheses of Theorem 2.3 the eigenvalue $\lambda$ is a differentiable function of $A$. Moreover, if $\xi_j$ and $\eta_i$ denote the components of $x$ and $y$,
\[ \frac{\partial\lambda}{\partial\alpha_{ij}} = \frac{\bar\eta_i\,\xi_j}{y^{\mathrm H}x}. \tag{2.6} \]
Proof. By definition a function $f(A)$ is differentiable if there is a linear operator $f'$ such that $f(A + E) = f(A) + f'(E) + o(\|E\|)$. Equation (2.4) exhibits such an operator for the eigenvalue $\lambda$: namely, $E \mapsto y^{\mathrm H}Ex/y^{\mathrm H}x$.
To establish (2.6), note that
\[ \frac{\partial\lambda}{\partial\alpha_{ij}} = \lim_{\tau\to 0} \frac{\lambda(A + \tau e_ie_j^{\mathrm T}) - \lambda(A)}{\tau}. \]
But by (2.4)
\[ \lambda(A + \tau e_ie_j^{\mathrm T}) - \lambda(A) = \tau\,\frac{\bar\eta_i\,\xi_j}{y^{\mathrm H}x} + O(\tau^2), \]
from which the result follows immediately. ■
The proof of Theorem 2.3 is almost as interesting as the theorem itself, since it gives us insight into the factors that make the higher order terms important. Specifically, we require that terms involving $\epsilon/\delta$ be sufficiently small. The denominator $\delta$ shows that if a simple eigenvalue is near its neighbors, the range of perturbations for which the derivative provides an adequate approximation will be restricted. The size of the numerator depends not only on $E$, but on the sizes of the reducing transformations $X$ and $Y$. If these are large, we again can expect higher order terms to become significant. It is worth noting that according to the remark following (I.3.3), a small value of $\delta$ will tend to aggravate this effect.
Equation (2.4) can be written
\[ \tilde\lambda = \frac{y^{\mathrm H}(A + E)x}{y^{\mathrm H}x} + O(\|E\|^2). \tag{2.7} \]
The quantity $y^{\mathrm H}(A + E)x/y^{\mathrm H}x$ is called a RAYLEIGH QUOTIENT, and one way of stating the theorem is to say that the Rayleigh quotient provides a first-order approximation to the perturbed eigenvalue. We will generalize the notion of a Rayleigh quotient in Section V.2, where we will give explicit bounds for the second order terms.
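The first-order formula (2.4) and the derivative (2.6) can be checked numerically. The Python/NumPy sketch below uses a random matrix (hypothetical test data) and assumes that the eigenvalue it studies is simple, which holds with probability one. Left eigenvectors are taken from the rows of $V^{-1}$, where $V$ holds the right eigenvectors, so that $y^{\mathrm H}x = 1$ as in the proof of Theorem 2.3.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
A = rng.standard_normal((n, n))
lam_all, V = np.linalg.eig(A)
k = 0                                        # study the first computed eigenvalue
lam = lam_all[k]
x = V[:, k]                                  # right eigenvector
y = np.linalg.inv(V).conj().T[:, k]          # left eigenvector: rows of V^{-1} are y^H
E = 1e-7 * rng.standard_normal((n, n))

def nearest(values, target):
    """Element of values closest to target."""
    return values[np.argmin(np.abs(values - target))]

lam_t = nearest(np.linalg.eigvals(A + E), lam)
first_order = lam + (y.conj() @ E @ x) / (y.conj() @ x)   # the approximation (2.4)
print(abs(lam_t - lam), abs(lam_t - first_order))          # O(||E||) versus O(||E||^2)

# Finite-difference check of the derivative formula (2.6) for one entry (i, j).
i, j, tau = 1, 3, 1e-7
Eij = np.zeros((n, n))
Eij[i, j] = tau
fd = (nearest(np.linalg.eigvals(A + Eij), lam) - lam) / tau
deriv = y.conj()[i] * x[j] / (y.conj() @ x)
print(fd, deriv)
```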
The theorem also provides us with a condition number for a simple eigenvalue. We see from (2.4) that
\[ |\tilde\lambda - \lambda| \lesssim \frac{\|y\|\,\|x\|}{|y^{\mathrm H}x|}\,\|E\| \]
for any consistent pair of matrix and vector norms. Thus the quantity
\[ \nu = \frac{\|y\|\,\|x\|}{|y^{\mathrm H}x|} \tag{2.8} \]
is a condition number for $\lambda$.
When $\|\cdot\| = \|\cdot\|_2$, the number $\nu$ is the secant of the angle between $x$ and $y$. It is one when $x$ and $y$ lie in the same direction, and grows unboundedly as $x$ and $y$ approach orthogonality. Note that if $\lambda$ is simple its left and right eigenvectors cannot be orthogonal, although it is easy to construct examples where they are as close to orthogonality as we like. Also note that the left and right eigenvectors corresponding to a nontrivial Jordan block have to be orthogonal.
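For the $2 \times 2$ matrix $\begin{pmatrix} 1 & t \\ 0 & 2 \end{pmatrix}$ the eigenvalue $1$ has right eigenvector $x = e_1$ and left eigenvector $y = (1, -t)^{\mathrm T}$, so $\nu = \sqrt{1 + t^2}$ grows without bound with $t$. The Python/NumPy sketch below (our own example) computes $\nu$ from (2.8) and applies the rank-one perturbation $E = \epsilon\,yx^{\mathrm H}/(\|y\|_2\|x\|_2)$, for which $\|E\|_2 = \epsilon$ and the first-order change $y^{\mathrm H}Ex/y^{\mathrm H}x$ equals $\nu\epsilon$; the observed shift divided by $\epsilon$ should therefore be close to $\nu$.

```python
import numpy as np

def eig_condition(x, y):
    """nu = ||y||_2 ||x||_2 / |y^H x| from (2.8): the secant of the angle between x and y."""
    return np.linalg.norm(y) * np.linalg.norm(x) / abs(np.vdot(y, x))

eps = 1e-8
results = []
for t in [0.0, 1.0, 100.0]:
    A = np.array([[1.0, t], [0.0, 2.0]])
    x = np.array([1.0, 0.0])        # right eigenvector for the eigenvalue 1
    y = np.array([1.0, -t])         # left eigenvector: y^H A = y^H
    nu = eig_condition(x, y)
    # worst-case direction for the first-order term: ||E||_2 = eps
    E = eps * np.outer(y, x) / (np.linalg.norm(y) * np.linalg.norm(x))
    shift = np.min(np.abs(np.linalg.eigvals(A + E) - 1.0))
    results.append((t, nu, shift / eps))
    print(t, nu, shift / eps)       # shift / eps is close to nu
```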
Notes and References
Gerschgorin [85, 1931] established his theorem as a corollary to the theorem that a diagonally dominant matrix is nonsingular. In particular, the union of the Gerschgorin disks of $A$ is the complement of the set of all $\zeta$ for which $\zeta I - A$ is diagonally dominant. In a restricted form the diagonal dominance theorem is due to Lévy [146, 1881], and in a general form to Desplanques [58, 1887]. The theorem kept getting itself rediscovered until Olga Taussky put a stop to it with a paper appropriately entitled A Recurring Theorem on Determinants [239, 1949]. Rohrbach [187, 1931] used the technique to establish eigenvalue bounds but did not define the regions now called Gerschgorin disks.
Actually, the theorem stated by Gerschgorin is not true unless the matrix is
irreducible (see Exercise 2.7).
More generally, if $\pi$ is any proposition such that $\pi(A)$ is true only if $A$ is nonsingular, then the complement of the set $\{\zeta : \pi(\zeta I - A) \text{ is true}\}$ contains all the eigenvalues of $A$. By varying $\pi$ one can get different regions, some of which are treated in the exercises.
In his paper, Gerschgorin noted that the union of $k$ isolated disks contains exactly $k$ eigenvalues. Although the idea of using diagonal similarity to
reduce the radius of an isolated disk is due to Gerschgorin (and in a different
sense to Rohrbach), it was Wilkinson [269, 1965] who refined the technique
and applied it to a variety of problems. Although ad hoc techniques for
reducing the diameter of an isolated disk suffice for most applications, there
are algorithms for determining the optimal disk [253, 154].
Exercises
1. A matrix $A$ is strictly diagonally dominant if
\[ |\alpha_{ii}| > \sum_{j \ne i} |\alpha_{ij}|, \qquad i = 1, \ldots, n. \]
Show that a strictly diagonally dominant matrix is nonsingular and use this fact to prove Gerschgorin's theorem.
2. Let $Ax = \lambda x$, and suppose $|\xi_j| \ge |\xi_i|$ ($i = 1, \ldots, n$). Show that $\lambda$ lies in the Gerschgorin disk centered at $\alpha_{jj}$.
3. (Ostrowski [167]). Let $\rho_i = \sum_{j \ne i}|\alpha_{ij}|$ and $\gamma_i = \sum_{j \ne i}|\alpha_{ji}|$. Show that if for some $\tau \in [0, 1]$
\[ |\alpha_{ii}| > \gamma_i^{\tau}\rho_i^{1-\tau}, \qquad i = 1, \ldots, n, \]
then $A$ is nonsingular.
4. (Ostrowski [167]). In the notation of the last exercise suppose that
\[ |\alpha_{ii}|\,|\alpha_{jj}| > \gamma_i^{\tau}\rho_i^{1-\tau}\gamma_j^{\tau}\rho_j^{1-\tau}, \qquad i, j = 1, \ldots, n,\ i \ne j. \]
Show that $A$ is nonsingular.
5. (Qi [182]). Let $A$ be of order $n$. Let
\[ \beta_i = \max\Bigl\{\sum_{j \ne i}|\alpha_{ij}|,\ \sum_{j \ne i}|\alpha_{ji}|\Bigr\}, \qquad i = 1, \ldots, n. \]
Show that the singular values of $A$ lie in the union of the intervals $[|\alpha_{ii}| - \beta_i, |\alpha_{ii}| + \beta_i]$ ($i = 1, \ldots, n$).
6. (Feingold and Varga [72]). Let $A$ be partitioned in the form
\[ A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ A_{21} & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{kk} \end{pmatrix}, \]
where the diagonal blocks $A_{ii}$ are square, and let $\|\cdot\|$ be a consistent norm. Show that if $\lambda$ is an eigenvalue of $A$, then for some $i$
\[ \|(\lambda I - A_{ii})^{-1}\|^{-1} \le \sum_{j \ne i} \|A_{ij}\|. \]
7. A square matrix $A$ is REDUCIBLE if there is a permutation matrix $P$ such that
\[ P^{\mathrm T}AP = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \]
where $A_{11}$ and $A_{22}$ are square. Show that an irreducible diagonally dominant matrix for which at least one of the diagonals is strictly dominant is nonsingular.
8. (Taussky [238]). Let $A$ be irreducible. Show that if $\lambda \in \mathcal{L}(A)$ lies on the boundary of one Gerschgorin disk, then it lies on the boundaries of all the Gerschgorin disks.
9. Let $A = X_1J_{m_1}(\lambda_1)Y_1^{\mathrm H} + \cdots + X_kJ_{m_k}(\lambda_k)Y_k^{\mathrm H}$ be the Jordan decomposition of $A$, and assume that $\lambda_1$ has multiplicity $m_1$. Show that if $E$ is sufficiently small there are exactly $m_1$ eigenvalues of $\tilde A$ that are in $\mathcal{L}[J_{m_1}(\lambda_1) + Y_1^{\mathrm H}EX_1 + O(\|E\|^2)]$.
10. (Wilkinson [271]). Let $x$ and $y$ be right and left eigenvectors corresponding to the simple eigenvalue $\lambda$. Let $\theta = \angle(x, y)$. Show that there is a matrix $E$ satisfying
\[ \frac{\|E\|_2}{\|A\|_2} \le \cot\theta \]
such that $\lambda$ is a multiple eigenvalue of $A + E$. Otherwise put, if a simple eigenvalue of a matrix has a large condition number, then the matrix is near one with a multiple eigenvalue.
3. Normal and Diagonalizable Matrices
A normal matrix is any matrix satisfying $A^{\mathrm H}A = AA^{\mathrm H}$. From this it follows that Hermitian matrices, skew Hermitian matrices, and unitary matrices are all normal. Given the importance of this class of matrices, it is natural to seek a special perturbation theory for its eigenvalues. The main complicating factor here is that normal matrices, unlike Hermitian matrices, can have complex eigenvalues, which cannot be ordered by size. Nonetheless, normal matrices have enough structure to enable us to prove the striking Hoffman–Wielandt theorem.
Since any normal matrix can be diagonalized by a unitary trans-
formation, the normal matrices are special cases of diagonalizable ma-
trices; that is, matrices that can be diagonalized by similarity trans-
formations (these matrices are sometimes called normalizable). In the
second subsection we will treat the perturbation of eigenvalues of diag-
onalizable matrices.
3.1. The Hoffman-Wielandt Theorem
In Section 1 we saw that it is relatively easy to obtain bounds on the spectral variation $\mathrm{sv}_A(\tilde A)$ of a matrix $\tilde A$ with respect to $A$. Although it is usually possible to escalate such a bound into a bound on $\mathrm{md}(A, \tilde A)$, we pay the price of a factor of $2n - 1$ in the bound. The essence of the Hoffman-Wielandt theorem is that when $A$ and $\tilde A$ are normal we do not have to pay such a price to get a bound on
$$\mathrm{md}_2(A, \tilde A) \stackrel{\mathrm{def}}{=} \min_\pi \sqrt{\sum_i |\tilde\lambda_{\pi(i)} - \lambda_i|^2}, \tag{3.1}$$
where $\pi$ ranges over all permutations of the integers $1, 2, \ldots, n$. (The subscript 2 refers to the 2-norm. In this notation the usual matching distance is $\mathrm{md}_\infty$.)
Theorem 3.1 (Hoffman-Wielandt). Let $A$ and $\tilde A$ be normal. Then
$$\mathrm{md}_2(A, \tilde A) \le \|A - \tilde A\|_F, \tag{3.2}$$
where $\mathrm{md}_2(A, \tilde A)$ is defined by (3.1).
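The theorem is easy to test numerically. The following minimal NumPy sketch is our illustration, not part of the text; the helper names `md2` and `random_normal` are hypothetical. It computes $\mathrm{md}_2$ of (3.1) by brute force over all permutations, which is practical only for small $n$. It then checks (3.2) on random normal matrices built as $Q\,\mathrm{diag}(d)\,Q^H$. Finally it evaluates the $2\times 2$ example given after the proof, where $\tilde A$ is not normal and the inequality fails.

```python
import itertools

import numpy as np


def md2(lam, lam_t):
    """Matching distance md_2 of (3.1): min over permutations pi of
    sqrt(sum_i |lam_t[pi(i)] - lam[i]|^2).  Brute force, so small n only."""
    lam = np.asarray(lam, dtype=complex)
    lam_t = np.asarray(lam_t, dtype=complex)
    return min(np.linalg.norm(lam_t[list(p)] - lam)
               for p in itertools.permutations(range(len(lam))))


def random_normal(n, rng):
    """Hypothetical test matrix Q diag(d) Q^H with Q unitary and d complex."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return Q @ np.diag(d) @ Q.conj().T, d


rng = np.random.default_rng(0)
for _ in range(200):
    A, lam = random_normal(5, rng)
    At, lam_t = random_normal(5, rng)
    # Theorem 3.1: md_2(A, At) <= ||A - At||_F for normal A and At.
    assert md2(lam, lam_t) <= np.linalg.norm(A - At, "fro") + 1e-10

# The 2 x 2 example after the proof: A normal, At nilpotent (not normal).
A = np.diag([0.0, 4.0])
At = np.array([[-1.0, -1.0], [1.0, 1.0]])
print(md2([0, 4], [0, 0]) ** 2, np.sum(np.abs(A - At) ** 2))  # 16.0 12.0
```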
Proof. Since $\|\cdot\|_F$ is unitarily invariant, we may assume that $A = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Let $\tilde A = W\tilde\Lambda W^H$, where $W$ is unitary and $\tilde\Lambda = \mathrm{diag}(\tilde\lambda_1, \ldots, \tilde\lambda_n)$. We will have established the theorem if we can show that $\|\Lambda - V\tilde\Lambda V^H\|_F$, regarded as a function of the unitary matrix $V$, is minimized when $V = P_\pi$ is a permutation matrix corresponding to some permutation $\pi$. For in that case,
$$\mathrm{md}_2^2(A, \tilde A) = \mathrm{md}_2^2(\Lambda, \tilde\Lambda) \le \sum_i |\lambda_i - \tilde\lambda_{\pi(i)}|^2 \le \|\Lambda - W\tilde\Lambda W^H\|_F^2 = \|A - \tilde A\|_F^2.$$
Denoting the elements of $V$ by $v_{ij}$, we have by direct calculation
$$\|\Lambda - V\tilde\Lambda V^H\|_F^2 = \sum_i |\lambda_i|^2 + \sum_i |\tilde\lambda_i|^2 - \varphi(V),$$
where
$$\varphi(V) = \sum_{i,j} (\bar\lambda_i\tilde\lambda_j + \lambda_i\bar{\tilde\lambda}_j)|v_{ij}|^2.$$
Thus our problem reduces to showing that $\varphi(V)$ is maximized when $V$ is some permutation matrix.
Since $V$ is unitary, the matrix whose elements are $|v_{ij}|^2$ is doubly stochastic. For any doubly stochastic matrix $S = (\sigma_{ij})$ define
$$\psi(S) = \sum_{i,j} (\bar\lambda_i\tilde\lambda_j + \lambda_i\bar{\tilde\lambda}_j)\sigma_{ij}.$$
It is clear that $\max_V \varphi(V) \le \max_S \psi(S)$, since not every doubly stochastic matrix has elements of the form $|v_{ij}|^2$, where $V = (v_{ij})$ is unitary. Therefore, if we can show that $\psi$ is maximized when $S$ is a permutation matrix $P_\pi$, then, since $P_\pi$ is unitary, it also maximizes $\varphi$.
By Birkhoff's theorem (Theorem II.3.16), any doubly stochastic matrix $S$ can be written as a convex combination of the permutation matrices $P_\pi$: namely,
$$S = \sum_\pi \alpha_\pi P_\pi,$$
where the $\alpha_\pi$ are nonnegative and sum to one. Since $\psi$ is linear in $S$,
$$\psi(S) = \sum_\pi \alpha_\pi \psi(P_\pi).$$
It follows that if $\pi$ is the permutation for which $\psi(P_\pi)$ is maximal, then $\psi(S) \le \psi(P_\pi)$. Hence $\varphi(P_\pi)$ is also maximal, and $\pi$ is the permutation required by the theorem. ■
The hypothesis that both $A$ and $\tilde A$ be normal is necessary. For example, let
$$A = \begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix} \quad\text{and}\quad \tilde A = \begin{pmatrix} -1 & -1 \\ 1 & 1 \end{pmatrix},$$
so that $A$ is normal but $\tilde A$ is not. The eigenvalues of $A$ are 0 and 4, while those of $\tilde A$ are both zero. Hence
$$\mathrm{md}_2^2(A, \tilde A) = 16 > 12 = \|A - \tilde A\|_F^2.$$
This fact complicates the practical application of the Hoffman-Wielandt theorem, since the sum of two normal matrices may not be normal. Even the sum of a normal matrix and a Hermitian matrix may fail to be normal. Thus the class of perturbations that the theorem can handle is strictly limited.
A case in point is the attempt to derive residual bounds from a backward perturbation result like Theorem 1.13. The difficulty is that the matrix $\tilde A$, defined by (1.16), need not be normal. However, the following result gives a residual bound for a single eigenvalue.
Theorem 3.2. Let $A$ be normal. If $\|x\|_2 = 1$, then
$$\min_i |\lambda_i - x^HAx| \le \|Ax - (x^HAx)x\|_2. \tag{3.3}$$
Proof. Since $A$ is normal, there is a unitary matrix $U$ such that $A = U\Lambda U^H$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Hence
$$\|[A - (x^HAx)I]x\|_2 = \|U(\Lambda - (x^HAx)I)U^Hx\|_2 \ge \min_i |\lambda_i - x^HAx|,$$
from which (3.3) follows. ■
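A short NumPy check of (3.3). It assumes a random normal test matrix with known eigenvalues; `rayleigh_residual` is an illustrative helper name, not from the text. Because the bound is deterministic, it should hold for every unit vector tried, up to rounding.

```python
import numpy as np


def rayleigh_residual(A, x):
    """Return (x^H A x, ||A x - (x^H A x) x||_2) for x scaled to unit length."""
    x = x / np.linalg.norm(x)
    rq = np.vdot(x, A @ x)          # np.vdot conjugates its first argument
    return rq, np.linalg.norm(A @ x - rq * x)


rng = np.random.default_rng(1)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
lam = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = Q @ np.diag(lam) @ Q.conj().T   # normal, with known eigenvalues lam
for _ in range(1000):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    rq, res = rayleigh_residual(A, x)
    # (3.3): some eigenvalue lies within res of the Rayleigh quotient
    assert np.min(np.abs(lam - rq)) <= res + 1e-12
```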
3.2. Diagonalizable Matrices
The chief general result for diagonalizable matrices follows from the
Bauer-Fike theorem (Theorem 1.6).
Theorem 3.3. Suppose that $A$ is diagonalizable; i.e., $X^{-1}AX = \Lambda$, where $\Lambda$ is diagonal. Let $\|\cdot\|$ be a consistent matrix norm such that $\|\mathrm{diag}(\delta_1, \ldots, \delta_n)\| = \max_i |\delta_i|$. Then
$$\mathrm{sv}_A(\tilde A) \le \|X^{-1}EX\| \tag{3.4}$$
and
$$\mathrm{sv}_A(\tilde A) \le \kappa(X)\|E\|, \tag{3.5}$$
where as usual $\kappa(X) = \|X\|\|X^{-1}\|$. Moreover,
$$\mathrm{md}(A, \tilde A) \le (2n - 1)\|X^{-1}EX\| \le (2n - 1)\kappa(X)\|E\|. \tag{3.6}$$
Proof. Let $\tilde\lambda$ be an eigenvalue of $\tilde A$. Under the hypotheses of the theorem, the inequality (1.6) in the Bauer-Fike theorem assumes the form
$$\|(\Lambda - \tilde\lambda I)^{-1}\|^{-1} \le \|X^{-1}EX\|,$$
from which (3.4) follows immediately. The inequality (3.5) follows from consistency. Finally, (3.6) follows from Theorem 1.5. ■
The bounds (3.4) and (3.5) hold for the widely used norms $\|\cdot\|_p$ $(p = 1, 2, \infty)$ (and in fact for all the Hölder norms). They hold trivially for all normalized unitarily invariant norms, since these norms dominate the spectral norm.
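Here is a sketch of (3.4) and (3.5) in the spectral norm, which satisfies the hypothesis $\|\mathrm{diag}(\delta_i)\|_2 = \max_i|\delta_i|$. The diagonalizable test matrix $A = X\,\mathrm{diag}(\lambda)X^{-1}$, the perturbation size, and the helper `spectral_variation` are our assumptions. `np.linalg.cond(X, 2)` supplies $\kappa_2(X)$.

```python
import numpy as np


def spectral_variation(lam, lam_t):
    """sv_A(At) from the eigenvalues: max over mu in lam_t of min_i |lam_i - mu|."""
    lam = np.asarray(lam)
    return max(np.min(np.abs(lam - mu)) for mu in lam_t)


rng = np.random.default_rng(2)
n = 5
X = rng.standard_normal((n, n))          # hypothetical eigenvector matrix
lam = rng.standard_normal(n)
A = X @ np.diag(lam) @ np.linalg.inv(X)  # X^{-1} A X = diag(lam)
E = 1e-3 * rng.standard_normal((n, n))
sv = spectral_variation(lam, np.linalg.eigvals(A + E))
bound_34 = np.linalg.norm(np.linalg.solve(X, E @ X), 2)   # ||X^{-1} E X||_2
bound_35 = np.linalg.cond(X, 2) * np.linalg.norm(E, 2)     # kappa_2(X) ||E||_2
assert sv <= bound_34 + 1e-10            # (3.4)
assert bound_34 <= bound_35 + 1e-10      # (3.4) is the sharper bound
```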
Corollary 3.4. If $A$ is normal, then
$$\mathrm{sv}_A(\tilde A) \le \|E\|_2.$$
Although (3.4) is stronger than (3.5), we will usually have no more than an estimate of $\|E\|$, in which case we are forced to use the weaker bound. Here the condition number of the matrix of eigenvectors serves as an overall condition number for the eigenvalue problem of $A$. Unfortunately, if we replace $X$ by $XD$, where $D$ is diagonal, $\kappa(X)$ changes, even though $XD$ continues to diagonalize $A$. Moreover, by making one column of $X$ very large or very small we can make $\kappa(X)$ arbitrarily large, a situation we called artificial ill-conditioning in the last chapter.
These considerations lead us to ask: What is the optimal scaling
of X? In general this is a very difficult question; however, for the
Frobenius norm we can give an answer.
Theorem 3.5. Let $X \in \mathbb{C}^{n\times n}$ be nonsingular, and let $Y^HX = I$. Then
$$\kappa_F(X) \ge \sum_i \|y_i\|_2\|x_i\|_2,$$
with equality if and only if there is an $\alpha > 0$ such that
$$\|y_i\|_2 = \alpha\|x_i\|_2, \quad i = 1, \ldots, n. \tag{3.7}$$
Proof. By the Cauchy inequality
$$\kappa_F^2(X) = (\|x_1\|_2^2 + \cdots + \|x_n\|_2^2)(\|y_1\|_2^2 + \cdots + \|y_n\|_2^2) \ge (\|x_1\|_2\|y_1\|_2 + \cdots + \|x_n\|_2\|y_n\|_2)^2.$$
Equality holds if and only if $(\|x_1\|_2, \ldots, \|x_n\|_2)$ and $(\|y_1\|_2, \ldots, \|y_n\|_2)$ are proportional, which is equivalent to (3.7). ■
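The scaling that attains equality in Theorem 3.5 can be written down explicitly. Replacing $x_i$ by $d_ix_i$ (with $d_i > 0$) replaces $y_i$ by $y_i/d_i$. The choice $d_i = \sqrt{\|y_i\|_2/\|x_i\|_2}$ therefore makes $\|x_i\|_2 = \|y_i\|_2$, which is (3.7) with $\alpha = 1$. The following is a minimal NumPy sketch under that reading; the function names and the badly scaled test matrix are ours.

```python
import numpy as np


def kappa_F(X):
    """Frobenius-norm condition number ||X||_F ||X^{-1}||_F."""
    return np.linalg.norm(X, "fro") * np.linalg.norm(np.linalg.inv(X), "fro")


def optimally_scaled(X):
    """Return X D with D = diag(sqrt(||y_i||_2 / ||x_i||_2)), where Y^H = X^{-1}.
    Afterwards ||x_i||_2 = ||y_i||_2 for all i, i.e. (3.7) holds with alpha = 1."""
    Yh = np.linalg.inv(X)                 # row i of Yh is y_i^H
    xn = np.linalg.norm(X, axis=0)        # column norms ||x_i||_2
    yn = np.linalg.norm(Yh, axis=1)       # row norms ||y_i||_2
    return X * np.sqrt(yn / xn)           # broadcasting scales column i by d_i


rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4)) @ np.diag([1e3, 1.0, 1e-2, 5.0])  # badly scaled columns
lower = np.sum(np.linalg.norm(X, axis=0) * np.linalg.norm(np.linalg.inv(X), axis=1))
assert kappa_F(X) >= lower * (1 - 1e-12)                 # Theorem 3.5
assert np.isclose(kappa_F(optimally_scaled(X)), lower)   # equality after scaling
```

The products $\|x_i\|_2\|y_i\|_2$ do not change under the scaling, so the lower bound is the same before and after.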
There are two observations to be made about this theorem. First, the proportional scaling (3.7) is probably not a bad strategy for other balanced norms like $\|\cdot\|_p$ $(p = 1, 2, \infty)$. Second, if the eigenvalues of $A$ are simple, the optimal $\kappa_F(X)$ is the sum of the individual condition numbers of the eigenvalues [cf. (2.8)]. This shows that the bounds in Theorem 3.3 are realistic in the sense that if the optimal $\kappa_F(X)$ is large, then there must exist at least one ill-conditioned eigenvalue.
Notes and References
For the Hoffman-Wielandt theorem, see [117, 1953]. Wilkinson [269, 1965] gives an elementary proof that does not use Birkhoff's theorem.
The Hoffman-Wielandt theorem can be rewritten in a suggestive manner. Let $\Phi$ be a symmetric gauge function and let $\|\cdot\|_\Phi$ be the associated unitarily invariant norm. Set
$$\mathrm{md}_\Phi(A, \tilde A) = \min_\pi \Phi(|\tilde\lambda_{\pi(1)} - \lambda_1|, \ldots, |\tilde\lambda_{\pi(n)} - \lambda_n|),$$
where as usual $\pi$ ranges over all permutations of the integers $1, \ldots, n$. Then for $\Phi(x) = \|x\|_2$, the Hoffman-Wielandt theorem states that
$$\mathrm{md}_\Phi(A, \tilde A) \le \|A - \tilde A\|_\Phi. \tag{3.8}$$
It is natural to conjecture that (3.8) remains true for normal matrices and arbitrary unitarily invariant norms. The conjecture is untrue, even for orthogonal matrices; however, many partial results are known. The following survey is largely based on the book by Bhatia [28], which contains proofs and further references.
Mirsky [158, 1960] showed that the conjecture is true for Hermitian matrices. See Section 4 for a proof and applications.
Wittmeyer [274, 1936] claims that the theorem is true for normal matrices and the 2-norm, but he refers the reader to his Ph.D. thesis for the proof. Since others have tried and failed to establish this result, it must remain open until Wittmeyer's proof can be examined.
Bhatia and Davis [29, 1984] have shown that the conjecture is true for orthogonal matrices and the 2-norm. Another proof was given by Bhatia and Holbrook [32, 1985].
Other partial results are obtained by relaxing the bound. Bhatia, Davis, and McIntosh [31, 1983] have shown that for unitary matrices
$$\mathrm{md}_\Phi(A, \tilde A) \le \frac{\pi}{2}\|A - \tilde A\|_\Phi,$$
and they give an example to show that $\pi/2$ is the best possible constant (Exercise 3.4). They also show that for normal matrices
$$\mathrm{md}(A, \tilde A) \le c\,\|A - \tilde A\|_2,$$
where $c \le 2.91$ [30, 1987]; i.e., the conjecture is true for normal matrices and the 2-norm, provided we multiply the right-hand side by a factor of about three. For most practical applications this is good enough.
The inequality (3.5) is due to Bauer and Fike [15], but as we have pointed
out it is a weak corollary of their more general results.
Exercises
1. Let $A$ and $\tilde A$ be normal of order $n$. Show that
$$\|A - \tilde A\|_F \le \max_\pi \sqrt{\sum_i |\tilde\lambda_{\pi(i)} - \lambda_i|^2},$$
where $\pi$ ranges over all permutations of the integers $1, 2, \ldots, n$.
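2. Let $A$ and $\tilde A$ be normal. If there are convex sets $\mathcal{A}$ and $\tilde{\mathcal{A}}$ such that
1. $\mathcal{A}$ contains $k$ eigenvalues of $A$,
2. $\tilde{\mathcal{A}}$ contains at least $n - k + 1$ eigenvalues of $\tilde A$,
3. the distance from $\mathcal{A}$ to $\tilde{\mathcal{A}}$ is $\delta$,
then
$$\delta \le \|A - \tilde A\|_2.$$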
3. (Bhatia and Davis [29]). Let $A$ and $\tilde A$ be orthogonal matrices with their eigenvalues lying in a semicircle of the unit circle. Order the eigenvalues by the order in which they appear on the semicircle, say counterclockwise. Show that
$$\max_i |\tilde\lambda_i - \lambda_i| \le \|A - \tilde A\|_2.$$
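4. Let $\Phi$ be the symmetric gauge function defined by $\Phi(x) = \|x\|_1$. Let
$$A_\pm = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \pm 1 & 0 & 0 & \cdots & 0 \end{pmatrix}.$$
Show that $\|A_+ - A_-\|_\Phi = 2$, whereas $\lim_{n\to\infty} \mathrm{md}_\Phi(A_+, A_-) = \pi$.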
5. Give an example of a doubly stochastic matrix $S$ whose elements are not of the form $|v_{ij}|^2$, where $V = (v_{ij})$ is unitary.
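6. (Bauer-Householder [16]). Let $\Lambda = X^{-1}AX$ be diagonal. Let $\alpha$ and $\beta$ be polynomials and $w$ a vector with $\beta(A)w \ne 0$. Show that there is an eigenvalue of $A$ in the region
$$\left\{\zeta : \left|\frac{\alpha(\zeta)}{\beta(\zeta)}\right| \le \kappa(X)\frac{\|\alpha(A)w\|_2}{\|\beta(A)w\|_2}\right\}.$$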
7. Let $X \in \mathbb{C}^{n\times n}$ and let $\kappa_{\mathrm{opt}}(X)$ be the smallest value of $\kappa_F(XD)$, where $D$ is nonsingular and diagonal (see Theorem 3.5). Show that if $\|x_i\|_2 = 1$ $(i = 1, \ldots, n)$, then
$$\kappa_F(X) \le \sqrt{n}\,\kappa_{\mathrm{opt}}(X).$$
4. Hermitian Matrices
In this section we will treat the perturbation of eigenvalues of Hermitian
matrices. This is an area rich in results, and we will only be able to
sample some of the more important.
We will begin with two classical results: Sylvester's inertia theorem
and Cauchy's interlacing theorem. We will then establish Wielandt's
elegant generalization of Fischer's characterization of the eigenvalues
of a Hermitian matrix. This result in turn yields a host of powerful
perturbation bounds.
Throughout this section $A$ will denote a Hermitian matrix with eigenvalues
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n,$$
and $\tilde A = A + E$ will denote a Hermitian perturbation of $A$ with eigenvalues
$$\tilde\lambda_1 \ge \tilde\lambda_2 \ge \cdots \ge \tilde\lambda_n.$$
4.1. Inertia and Interlacing
A fundamental problem of matrix theory is to determine what remains
invariant under some class of transformations. For example, the eigen-
values and Jordan structure of a matrix are not altered by similarity
transformations. For Hermitian matrices it is natural to consider trans-
formations that leave the matrix Hermitian, which leads us to the class
of CONGRUENCE TRANSFORMATIONS; that is, transformations of the form
$$X^HAX,$$
where X is nonsingular. Unless X is unitary, the eigenvalues of A need
not remain invariant under this transformation. However, the number
of positive, negative, and zero eigenvalues does not change.
Theorem 4.1 (Sylvester, Jacobi). Let $A$ be Hermitian, and define the INERTIA of $A$ to be the ordered triplet
$$\mathrm{inertia}(A) = [\pi(A), \nu(A), \zeta(A)],$$
where $\nu(A)$, $\zeta(A)$, and $\pi(A)$ are respectively the number of negative, zero, and positive eigenvalues of $A$. Then for any nonsingular $X$,
$$\mathrm{inertia}(X^HAX) = \mathrm{inertia}(A).$$
Proof. The proof is by contradiction. Suppose, for example, that $A$ has more positive eigenvalues than $X^HAX$. Let $\mathcal{Y}$ be the space spanned by the eigenvectors corresponding to positive eigenvalues of $A$. Then
$$0 \ne y \in \mathcal{Y} \implies y^HAy > 0.$$
Let $\mathcal{Z}$ be the space spanned by all vectors of the form $Xz$, where $z$ is an eigenvector corresponding to a negative or zero eigenvalue of $X^HAX$. Then
$$z \in \mathcal{Z} \implies z^HAz \le 0.$$
Hence $\mathcal{Y} \cap \mathcal{Z} = \{0\}$. But by hypothesis, $\dim(\mathcal{Y}) + \dim(\mathcal{Z}) > n$, where $n$ is the order of $A$. Hence $\mathcal{Y}$ and $\mathcal{Z}$ have a nonzero vector in common, a contradiction. ■
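A small NumPy illustration of Theorem 4.1. Floating-point eigenvalues are rarely exactly zero, so the cutoff that decides when a computed eigenvalue counts as zero (`tol`, relative to the largest eigenvalue magnitude) is our assumption. The congruence matrix $X$ is an arbitrary nonsingular example, also our choice.

```python
import numpy as np


def inertia(A, tol=1e-10):
    """[pi, nu, zeta] = numbers of positive, negative and zero eigenvalues of the
    Hermitian A; 'zero' means |lambda| <= tol * max|lambda| (an assumed cutoff)."""
    w = np.linalg.eigvalsh(A)
    t = tol * np.max(np.abs(w))
    return [int(np.sum(w > t)), int(np.sum(w < -t)), int(np.sum(np.abs(w) <= t))]


A = np.diag([3.0, -1.0, 0.0, 2.0])
X = np.array([[1.0, 2.0, 0.0, 0.0],      # upper triangular with nonzero
              [0.0, 1.0, 3.0, 0.0],      # diagonal, hence nonsingular
              [0.0, 0.0, 1.0, -1.0],
              [0.0, 0.0, 0.0, 2.0]])
print(inertia(A), inertia(X.T @ A @ X))  # [2, 1, 1] [2, 1, 1]
```

The eigenvalues of $X^TAX$ differ from those of $A$. Here they are roughly $14.5, 8, 0, -9.5$. The counts of positive, negative, and zero eigenvalues agree, as the theorem asserts.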
An important consequence of the inertia theorem is Cauchy's beautiful theorem relating the eigenvalues of a principal submatrix to the eigenvalues of the original matrix.
Theorem 4.2 (Cauchy). Let $B$ be a principal submatrix of $A$ of order $n - 1$ with eigenvalues $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_{n-1}$. Then
$$\lambda_1 \ge \mu_1 \ge \lambda_2 \ge \mu_2 \ge \cdots \ge \mu_{n-1} \ge \lambda_n.$$
Proof. Without loss of generality assume that $B$ is the leading principal submatrix of $A$, so that we may write
$$A = \begin{pmatrix} B & a \\ a^H & \alpha \end{pmatrix}.$$
Assume that the theorem is false. Then for some $i$ either $\mu_i > \lambda_i$ or $\lambda_{i+1} > \mu_i$. Let $i$ be the first such index. We will treat the case $\mu_i > \lambda_i$, the other case being similar. Let $\mu_i > \tau > \lambda_i$. Then $B - \tau I$ is nonsingular, and the matrix
$$H = \begin{pmatrix} B - \tau I & 0 \\ 0 & \alpha - \tau - a^H(B - \tau I)^{-1}a \end{pmatrix} = \begin{pmatrix} I & 0 \\ -a^H(B - \tau I)^{-1} & 1 \end{pmatrix}\begin{pmatrix} B - \tau I & a \\ a^H & \alpha - \tau \end{pmatrix}\begin{pmatrix} I & -(B - \tau I)^{-1}a \\ 0 & 1 \end{pmatrix}$$
is congruent to $A - \tau I$. Hence by the inertia theorem, $H$ has the same number of positive eigenvalues as $A - \tau I$, namely $i - 1$. But $H$ has at least as many positive eigenvalues as $B - \tau I$, namely $i$. This contradiction establishes the theorem. ■
If in the theorem $C$ is a principal submatrix of $B$ of order $n - 2$, then the eigenvalues $\nu_1 \ge \nu_2 \ge \cdots \ge \nu_{n-2}$ of $C$ satisfy $\mu_1 \ge \nu_1 \ge \mu_2 \ge \nu_2 \ge \cdots \ge \nu_{n-2} \ge \mu_{n-1}$. Hence
$$\lambda_i \ge \nu_i \ge \lambda_{i+2}, \quad i = 1, 2, \ldots, n - 2.$$
Continuing through submatrices in this manner, we have the following corollary.
Corollary 4.3. Let $B$ be a principal submatrix of order $n - k$ of $A$ with eigenvalues $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_{n-k}$. Then
$$\lambda_i \ge \mu_i \ge \lambda_{i+k}, \quad i = 1, 2, \ldots, n - k.$$
Finally we observe that the interlacing theorem holds for more than just principal submatrices. Let $U \in \mathbb{C}^{n\times(n-k)}$ have orthonormal columns and let $V$ be chosen so that $(U\ V)$ is unitary. Then applying Corollary 4.3 to the matrix $(U\ V)^HA(U\ V)$, we have the following corollary.
Corollary 4.4. Let $U \in \mathbb{C}^{n\times(n-k)}$ have orthonormal columns. Let the eigenvalues of $U^HAU$ be $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_{n-k}$. Then
$$\lambda_i \ge \mu_i \ge \lambda_{i+k}, \quad i = 1, 2, \ldots, n - k.$$
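Corollaries 4.3 and 4.4 can be spot-checked numerically. In this sketch the random Hermitian test matrix and the helper names are ours. Eigenvalues are sorted decreasingly to match the convention $\lambda_1 \ge \cdots \ge \lambda_n$, and a small tolerance absorbs rounding.

```python
import numpy as np


def desc_eigvalsh(H):
    """Eigenvalues of a Hermitian matrix in decreasing order."""
    return np.linalg.eigvalsh(H)[::-1]


def interlaces(lam, mu, k, tol=1e-10):
    """Check lam_i >= mu_i >= lam_{i+k} for i = 1, ..., n - k (0-based here)."""
    return all(lam[i] + tol >= mu[i] >= lam[i + k] - tol for i in range(len(mu)))


rng = np.random.default_rng(4)
n, k = 7, 3
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (G + G.conj().T) / 2                          # random Hermitian test matrix
lam = desc_eigvalsh(A)

keep = [0, 2, 3, 5]                               # delete k = 3 rows and columns
assert interlaces(lam, desc_eigvalsh(A[np.ix_(keep, keep)]), k)   # Corollary 4.3

U, _ = np.linalg.qr(rng.standard_normal((n, n - k)) + 1j * rng.standard_normal((n, n - k)))
assert interlaces(lam, desc_eigvalsh(U.conj().T @ A @ U), k)       # Corollary 4.4
```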
4.2. Wielandt's Theorem and Its Consequences
It is a consequence of Theorem I.3.13 that
$$\lambda_1 = \max_{x^Hx = 1} x^HAx.$$
An important generalization of this fact is Fischer's theorem, which states that
$$\lambda_i = \max_{\dim(\mathcal{X}) = i}\ \min_{\substack{x \in \mathcal{X} \\ x^Hx = 1}} x^HAx.$$
In this subsection we will establish a further generalization, due to
Wielandt, which has far-ranging implications. The proof, which has
been adapted directly from Wielandt's paper, is complicated and may
be omitted without loss of continuity.
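Theorem 4.5 (Wielandt). Let $1 \le i_1 < i_2 < \cdots < i_k \le n$. Then
$$\lambda_{i_1} + \lambda_{i_2} + \cdots + \lambda_{i_k} = \max_{\substack{\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k} \\ \dim(\mathcal{X}_{i_j}) = i_j}}\ \min_{\substack{X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k}),\ x_{i_j} \in \mathcal{X}_{i_j} \\ X^HX = I}} \mathrm{trace}(X^HAX) \tag{4.1}$$
and
$$\lambda_{i_1} + \lambda_{i_2} + \cdots + \lambda_{i_k} = \min_{\substack{\mathcal{X}_{i_1} \supset \mathcal{X}_{i_2} \supset \cdots \supset \mathcal{X}_{i_k} \\ \dim(\mathcal{X}_{i_j}) = n - i_j + 1}}\ \max_{\substack{X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k}),\ x_{i_j} \in \mathcal{X}_{i_j} \\ X^HX = I}} \mathrm{trace}(X^HAX). \tag{4.2}$$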
Remark 4.6. Note that the words max and min (instead of sup and
inf) imply that the maximizing or minimizing objects actually exist.
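Proof. We will establish (4.1), from which it is an easy exercise to establish (4.2). We begin by showing that there is a particular sequence $\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k}$ of subspaces with $\dim(\mathcal{X}_{i_j}) = i_j$ such that if $X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k})$ $(x_{i_j} \in \mathcal{X}_{i_j})$ has orthonormal columns, then $\mathrm{trace}(X^HAX) \ge \sum_j \lambda_{i_j}$. In fact, let $\mathcal{X}_{i_j}$ be the space spanned by the eigenvectors of $A$ corresponding to $\lambda_1, \lambda_2, \ldots, \lambda_{i_j}$. Then $x_{i_j}$ is a linear combination of these eigenvectors, and since $x_{i_j}^Hx_{i_j} = 1$, we have $x_{i_j}^HAx_{i_j} \ge \lambda_{i_j}$. Hence
$$\mathrm{trace}(X^HAX) = \sum_j x_{i_j}^HAx_{i_j} \ge \sum_j \lambda_{i_j}.$$
In view of the result of the last paragraph, it will be sufficient to establish that
$$\max_{\substack{\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k} \\ \dim(\mathcal{X}_{i_j}) = i_j}}\ \min_{\substack{X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k}),\ x_{i_j} \in \mathcal{X}_{i_j} \\ X^HX = I}} \mathrm{trace}(X^HAX) \le \lambda_{i_1} + \lambda_{i_2} + \cdots + \lambda_{i_k}.$$
The proof will be by induction on $n$. Note that the theorem is trivially true when $k = n$, since in this case $X^HAX$ is similar to $A$. Hence the theorem is true for $n = 1$.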
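Let us therefore assume that $n > 1$ and $k < n$. Let $\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k}$ with $\dim(\mathcal{X}_{i_j}) = i_j$ be given. We must show that there is a matrix $X = (x_{i_1}\ x_{i_2}\ \cdots\ x_{i_k})$, $x_{i_j} \in \mathcal{X}_{i_j}$, with orthonormal columns such that $\mathrm{trace}(X^HAX) \le \lambda_{i_1} + \lambda_{i_2} + \cdots + \lambda_{i_k}$.
First assume that $i_k < n$. Let $\mathcal{X}_{n-1}$ be an $(n-1)$-dimensional subspace containing $\mathcal{X}_{i_k}$. Let $Z = (z_1\ z_2\ \cdots\ z_{n-1})$ be a matrix with orthonormal columns such that $\mathcal{R}[(z_1\ \cdots\ z_{i_j})] = \mathcal{X}_{i_j}$ and $\mathcal{R}(Z) = \mathcal{X}_{n-1}$. Let $B = Z^HAZ$. Then by Corollary 4.4 the eigenvalues $\mu_i$ of $B$ satisfy
$$\mu_i \le \lambda_i, \quad i = 1, \ldots, n - 1. \tag{4.3}$$
Now let
$$\mathcal{Y}_{i_j} = \{Z^Hx : x \in \mathcal{X}_{i_j}\}.$$
Observe that since $\mathcal{X}_{i_j} \subset \mathcal{R}(Z)$, if $y \in \mathcal{Y}_{i_j}$ then $x = Zy \in \mathcal{X}_{i_j}$. Moreover, $y^HBy = x^HAx$. By the induction hypothesis there are orthonormal vectors $y_{i_j} \in \mathcal{Y}_{i_j}$ such that
$$\sum_j y_{i_j}^HBy_{i_j} \le \sum_j \mu_{i_j}. \tag{4.4}$$
Hence if $x_{i_j} = Zy_{i_j}$, then $x_{i_j} \in \mathcal{X}_{i_j}$ and $\sum_j y_{i_j}^HBy_{i_j} = \sum_j x_{i_j}^HAx_{i_j}$. Hence by (4.3) and (4.4)
$$\mathrm{trace}(X^HAX) = \sum_j x_{i_j}^HAx_{i_j} \le \sum_j \lambda_{i_j},$$
which is what we were to establish.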
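Now assume that $i_k = n$. Let $l$ be the largest index such that $i_l + 1 < i_{l+1}$. For notational convenience let $i_l = p$ and $i_{l+1} = q$. Let $\hat{\mathcal{X}}_{n-1}$ be an $(n-1)$-dimensional subspace that contains $\mathcal{X}_p$ and the eigenvectors corresponding to $\lambda_q, \ldots, \lambda_n$. Since $q, q + 1, \ldots, n - 1$ are among the indices $i_j$, we have
$$\mathcal{X}_p \subset \mathcal{X}_q \cap \hat{\mathcal{X}}_{n-1} \subset \cdots \subset \mathcal{X}_{n-1} \cap \hat{\mathcal{X}}_{n-1} \subset \hat{\mathcal{X}}_{n-1}.$$
Since for $i = q, \ldots, n - 1$, $\dim(\mathcal{X}_i \cap \hat{\mathcal{X}}_{n-1}) \ge i - 1$, we can find subspaces $\hat{\mathcal{X}}_{q-1}, \ldots, \hat{\mathcal{X}}_{n-2}$ such that
$$\hat{\mathcal{X}}_{q-1} \subset \mathcal{X}_q, \ \ldots, \ \hat{\mathcal{X}}_{n-2} \subset \mathcal{X}_{n-1}$$
and
$$\mathcal{X}_{i_1} \subset \cdots \subset \mathcal{X}_p \subset \hat{\mathcal{X}}_{q-1} \subset \cdots \subset \hat{\mathcal{X}}_{n-1}.$$
Now apply the construction of the previous case to give a matrix $B$ with eigenvalues $\mu_1 \ge \cdots \ge \mu_{n-1}$ satisfying (4.3) and a matrix with orthonormal columns
$$X = (x_{i_1}\ \cdots\ x_p\ x_{q-1}\ \cdots\ x_{n-1})$$
such that
$$x_{i_j} \in \mathcal{X}_{i_j}, \quad j = 1, \ldots, l,$$
$$x_i \in \hat{\mathcal{X}}_i \subset \mathcal{X}_{i+1}, \quad i = q - 1, \ldots, n - 1,$$
and
$$\mathrm{trace}(X^HAX) \le \sum_{j=1}^{l} \mu_{i_j} + \sum_{i=q-1}^{n-1} \mu_i \le \sum_{j=1}^{l} \lambda_{i_j} + \sum_{i=q-1}^{n-1} \mu_i. \tag{4.5}$$
By construction $\hat{\mathcal{X}}_{n-1}$ contains the eigenvectors of $A$ corresponding to $\lambda_q, \ldots, \lambda_n$. Hence these are also eigenvalues of $B$. Since $\mu_{q-1}, \ldots, \mu_{n-1}$ are the smallest eigenvalues of $B$, we have
$$\sum_{i=q-1}^{n-1} \mu_i \le \sum_{i=q}^{n} \lambda_i,$$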
and the result follows upon substituting this inequality in (4.5). ■
When $k = 1$, Wielandt's theorem gives Fischer's characterization of the eigenvalues of a Hermitian matrix.
Corollary 4.7 (Fischer). The eigenvalues of $A$ are given by
$$\lambda_i = \max_{\dim(\mathcal{X}) = i}\ \min_{\substack{x \in \mathcal{X} \\ x^Hx = 1}} x^HAx$$
and
$$\lambda_i = \min_{\dim(\mathcal{X}) = n - i + 1}\ \max_{\substack{x \in \mathcal{X} \\ x^Hx = 1}} x^HAx.$$
For $i = 1$ the second of the above characterizations reduces to
$$\lambda_1 = \max_{x^Hx = 1} x^HAx,$$
as was pointed out at the beginning of this section. This latter characterization has important implications for perturbation theory. For suppose, as usual, that $\tilde A = A + E$, where $E$ is also Hermitian. Then denoting the largest eigenvalues of $\tilde A$ and $E$ by $\tilde\lambda_1$ and $\epsilon_1$, we have
$$\tilde\lambda_1 = \max_{x^Hx = 1} x^H\tilde Ax \le \max_{x^Hx = 1} x^HAx + \max_{x^Hx = 1} x^HEx = \lambda_1 + \epsilon_1.$$
In other words, since $|\epsilon_1| \le \|E\|_2$, the perturbation $E$ can increase the largest eigenvalue of $A$ by no more than $\|E\|_2$.
We will now proceed to generalize this result. As we did earlier, we
first establish a result for sums of eigenvalues and then specialize it to
a single eigenvalue.
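Theorem 4.8. Let the eigenvalues of $E$ be
$$\epsilon_1 \ge \epsilon_2 \ge \cdots \ge \epsilon_n,$$
and let $i_1, \ldots, i_k$ be distinct integers between one and $n$ inclusive. Then
$$\lambda_{i_1} + \cdots + \lambda_{i_k} + \epsilon_{n-k+1} + \cdots + \epsilon_n \le \tilde\lambda_{i_1} + \cdots + \tilde\lambda_{i_k} \le \lambda_{i_1} + \cdots + \lambda_{i_k} + \epsilon_1 + \cdots + \epsilon_k.$$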
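Proof. Without loss of generality, we may assume that $i_1 < \cdots < i_k$. We will first establish the second inequality. By Remark 4.6 following Theorem 4.5, there are subspaces $\mathcal{X}_{i_1} \subset \mathcal{X}_{i_2} \subset \cdots \subset \mathcal{X}_{i_k}$ such that
$$\tilde\lambda_{i_1} + \cdots + \tilde\lambda_{i_k} = \min_{\substack{X = (x_{i_1}\ \cdots\ x_{i_k}),\ x_{i_j} \in \mathcal{X}_{i_j} \\ X^HX = I}} \mathrm{trace}(X^H\tilde AX).$$
Moreover, by (4.1) applied to $A$, there are vectors $x_{i_j} \in \mathcal{X}_{i_j}$ such that $X = (x_{i_1}\ \cdots\ x_{i_k})$ has orthonormal columns and
$$\lambda_{i_1} + \cdots + \lambda_{i_k} \ge \mathrm{trace}(X^HAX).$$
It follows that
$$\tilde\lambda_{i_1} + \cdots + \tilde\lambda_{i_k} \le \mathrm{trace}[X^H(A + E)X] \le \lambda_{i_1} + \cdots + \lambda_{i_k} + \mathrm{trace}(X^HEX).$$
But by Corollary 4.4,
$$\mathrm{trace}(X^HEX) \le \epsilon_1 + \cdots + \epsilon_k,$$
which establishes the second inequality.
The first inequality may be obtained from the second by writing $A = \tilde A - E$, from which it follows that
$$\lambda_{i_1} + \cdots + \lambda_{i_k} \le \tilde\lambda_{i_1} + \cdots + \tilde\lambda_{i_k} - \epsilon_{n-k+1} - \cdots - \epsilon_n. \quad ■$$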
When $k = 1$, the theorem provides a perturbation bound.
Corollary 4.9 (Weyl). For $i = 1, \ldots, n$,
$$\tilde\lambda_i \in [\lambda_i + \epsilon_n, \lambda_i + \epsilon_1].$$
There are three things to note about this corollary.
First, the corollary is similar to the Gerschgorin theorem in that it provides a set of $n$ intervals (disks) whose union includes the eigenvalues of $\tilde A$. However, we know just which eigenvalue to look for in each interval. Moreover, it is impossible for an eigenvalue corresponding to one of a cluster of overlapping intervals to migrate outside its own interval.
Second, the intervals are not symmetric about the eigenvalues $\lambda_i$. In fact if $\epsilon_n$ is positive, the $i$th interval will not contain $\lambda_i$. This occurs when $E$ is positive definite. In other words,
if a Hermitian matrix is perturbed by a positive definite
matrix, its eigenvalues must increase.
Third, there is a weaker, more conventional form of the theorem which is stated in the following corollary.
Corollary 4.10.
$$\max_i |\tilde\lambda_i - \lambda_i| \le \|E\|_2. \tag{4.6}$$
This result follows directly from the preceding corollary and the observation that $\|E\|_2 = \max\{|\epsilon_1|, |\epsilon_n|\}$. In the next subsection we will generalize this corollary.
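When $n$ is small, Theorem 4.8 and its corollaries can be checked by brute force over all index sets. The sketch below is our illustration; `lidskii_wielandt_holds` and the random test matrices are hypothetical. It first tests both inequalities of Theorem 4.8. It then tests the remark that a positive definite perturbation raises every eigenvalue, and finally (4.6).

```python
import itertools

import numpy as np


def desc_eigvalsh(H):
    """Eigenvalues of a Hermitian matrix in decreasing order."""
    return np.linalg.eigvalsh(H)[::-1]


def lidskii_wielandt_holds(A, E, tol=1e-10):
    """Check both inequalities of Theorem 4.8 for every index set i_1 < ... < i_k."""
    n = A.shape[0]
    lam, lam_t, eps = desc_eigvalsh(A), desc_eigvalsh(A + E), desc_eigvalsh(E)
    for k in range(1, n + 1):
        for idx in itertools.combinations(range(n), k):
            s, s_t = lam[list(idx)].sum(), lam_t[list(idx)].sum()
            if not (s + eps[n - k:].sum() - tol <= s_t <= s + eps[:k].sum() + tol):
                return False
    return True


rng = np.random.default_rng(5)


def rand_herm(n):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2


A, E = rand_herm(6), 0.1 * rand_herm(6)
assert lidskii_wielandt_holds(A, E)

# A positive definite perturbation raises every eigenvalue (Corollary 4.9, eps_n > 0).
B = rng.standard_normal((6, 6))
P = B @ B.T + np.eye(6)
assert np.all(np.linalg.eigvalsh(A + P) > np.linalg.eigvalsh(A))

# (4.6): max_i |lam~_i - lam_i| <= ||E||_2 (both lists sorted the same way)
assert np.max(np.abs(np.linalg.eigvalsh(A + E) - np.linalg.eigvalsh(A))) <= np.linalg.norm(E, 2) + 1e-10
```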
4.3. Mirsky's Theorem
Equation (4.6) can be rewritten in a more symmetric form: namely,
$$\|\mathrm{diag}(\tilde\lambda_i - \lambda_i)\|_2 \le \|E\|_2. \tag{4.7}$$
This suggests that we attempt to replace $\|\cdot\|_2$ with other norms to obtain new perturbation bounds. In fact, (4.7) is valid for any unitarily invariant norm; however, to prove it we first establish an analogous result for singular values.
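Theorem 4.11 (Mirsky). Let $X$ and $\tilde X$ be matrices of the same dimensions with singular values
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p, \qquad \tilde\sigma_1 \ge \tilde\sigma_2 \ge \cdots \ge \tilde\sigma_p.$$
Then for any unitarily invariant norm $\|\cdot\|$,
$$\|\mathrm{diag}(\tilde\sigma_i - \sigma_i)\| \le \|\tilde X - X\|.$$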
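Proof. Without loss of generality we may assume that $X$ and $\tilde X$ are square, say of order $p$ (otherwise pad them out with zero rows or columns to make them so; this merely appends zero singular values to both lists). Now by Theorem I.4.2 the eigenvalues of the matrix
$$\begin{pmatrix} 0 & X \\ X^H & 0 \end{pmatrix}$$
are $\pm\sigma_1, \ldots, \pm\sigma_p$, and similarly for $\tilde X$. Finally, if $\epsilon_1 \ge \cdots \ge \epsilon_p$ are the singular values of $\tilde X - X$, then the eigenvalues of
$$\begin{pmatrix} 0 & \tilde X - X \\ (\tilde X - X)^H & 0 \end{pmatrix}$$
are $\pm\epsilon_1, \ldots, \pm\epsilon_p$. For a fixed $k$, apply Theorem 4.8 to these $2p \times 2p$ matrices with
$$i_j = \begin{cases} j & \text{if } \tilde\sigma_j \ge \sigma_j, \\ 2p - j + 1 & \text{if } \tilde\sigma_j < \sigma_j, \end{cases} \qquad j = 1, \ldots, k.$$
It then follows that
$$|\tilde\sigma_1 - \sigma_1| + \cdots + |\tilde\sigma_k - \sigma_k| \le \epsilon_1 + \cdots + \epsilon_k, \quad k = 1, \ldots, p,$$
and, since Theorem 4.8 holds for any distinct indices, the same bound holds for the sum of any $k$ of the numbers $|\tilde\sigma_j - \sigma_j|$, in particular the $k$ largest. Therefore by Theorem II.3.17 the inequality
$$\Phi(|\tilde\sigma_1 - \sigma_1|, \ldots, |\tilde\sigma_p - \sigma_p|) \le \Phi(\epsilon_1, \ldots, \epsilon_p)$$
holds for any symmetric gauge function $\Phi$. The result now follows from von Neumann's characterization of unitarily invariant norms (Theorem II.3.6). ■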
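The key step of the proof is a family of partial-sum inequalities. The $k$ largest of the numbers $|\tilde\sigma_j - \sigma_j|$ sum to at most $\epsilon_1 + \cdots + \epsilon_k$. These sums of the $k$ largest entries are the Ky Fan $k$-norms. Below is a NumPy sketch that checks these inequalities, along with the spectral and Frobenius cases of Theorem 4.11. The random test matrices and the function name are our choices.

```python
import numpy as np


def mirsky_sums_hold(X, Xt, tol=1e-12):
    """For every k, the k largest of |s~_j - s_j| sum to at most eps_1 + ... + eps_k,
    where eps_j are the singular values of Xt - X (all sorted decreasingly)."""
    s = np.linalg.svd(X, compute_uv=False)          # NumPy returns these descending
    s_t = np.linalg.svd(Xt, compute_uv=False)
    eps = np.linalg.svd(Xt - X, compute_uv=False)
    d = np.sort(np.abs(s_t - s))[::-1]
    return all(d[:k].sum() <= eps[:k].sum() + tol for k in range(1, len(d) + 1))


rng = np.random.default_rng(6)
X = rng.standard_normal((6, 4))
Xt = X + 0.3 * rng.standard_normal((6, 4))
assert mirsky_sums_hold(X, Xt)

s, s_t = np.linalg.svd(X, compute_uv=False), np.linalg.svd(Xt, compute_uv=False)
# Two particular unitarily invariant norms: spectral and Frobenius.
assert np.max(np.abs(s_t - s)) <= np.linalg.norm(Xt - X, 2) + 1e-12
assert np.linalg.norm(s_t - s) <= np.linalg.norm(Xt - X, "fro") + 1e-12
```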
An immediate consequence of Mirsky's theorem is the generalization of (4.7). Specifically, we have the following corollary.
Corollary 4.12. Let $\Phi$ be a symmetric gauge function and $\|\cdot\|_\Phi$ its corresponding unitarily invariant norm. Then
$$\|\mathrm{diag}(\tilde\lambda_i - \lambda_i)\|_\Phi \le \|E\|_\Phi. \tag{4.8}$$
Proof. Let $\rho = \min\{\lambda_n, \tilde\lambda_n\}$. Then the eigenvalues of the matrices $A - \rho I$ and $\tilde A - \rho I$ are nonnegative; i.e., their singular values and their eigenvalues are the same. Mirsky's theorem now applies to give (4.8). ■
When $\Phi$ generates the Frobenius norm, we obtain a Hermitian analogue of the Hoffman-Wielandt theorem.
Corollary 4.13.
$$\sqrt{\sum_{i=1}^n (\tilde\lambda_i - \lambda_i)^2} \le \|E\|_F.$$
Note that this result is stronger than the Hoffman-Wielandt theorem, since it specifies an ordering of the eigenvalues that satisfy the inequality, whereas the Hoffman-Wielandt theorem merely asserts that such an ordering exists.
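Here is a brief check of Corollary 4.13, assuming real symmetric test matrices and a helper name of our own. Pairing the eigenvalues in decreasing order satisfies the bound. For real numbers this sorted pairing also minimizes the sum of squared differences over all permutations, so it attains $\mathrm{md}_2$.

```python
import itertools

import numpy as np


def sorted_vs_best(A, E):
    """Return (distance with eigenvalues paired in sorted order,
    md_2 over all pairings, ||E||_F) for Hermitian A and E."""
    lam = np.linalg.eigvalsh(A)[::-1]
    lam_t = np.linalg.eigvalsh(A + E)[::-1]
    sorted_dist = np.linalg.norm(lam_t - lam)
    best = min(np.linalg.norm(lam_t[list(p)] - lam)
               for p in itertools.permutations(range(len(lam))))
    return sorted_dist, best, np.linalg.norm(E, "fro")


rng = np.random.default_rng(7)
G, F = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
sorted_dist, best, fro = sorted_vs_best((G + G.T) / 2, 0.5 * (F + F.T) / 2)
assert sorted_dist <= fro + 1e-12          # Corollary 4.13
assert abs(sorted_dist - best) <= 1e-12    # the sorted pairing attains md_2
```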
4.4. Residual Bounds
In this and the next subsection we will consider applications of the Mirsky theorem. The subject of this subsection is residual bounds.
As in Section 1 we are given a matrix $A$ (now Hermitian) of order $n$ and a matrix $X$ whose column space approximates an invariant subspace of $A$. This means that for some choice of $M$, the residual
$$R = AX - XM$$
will be small. In particular, if $X$ has orthonormal columns, then by Theorem 1.15 any unitarily invariant norm of $R$ is minimized when
$$M = X^HAX.$$
Moreover, since $A$ is Hermitian, we can use Mirsky's theorem to get a bound on the eigenvalues of $M$ as approximations to those of $A$.
Theorem 4.14. Let $X \in \mathbb{C}^{n\times k}$ have orthonormal columns. Let $M = X^HAX$, and let $R = AX - XM$. Let $\Phi$ be a symmetric gauge function and let $\|\cdot\|_\Phi$ denote the corresponding unitarily invariant norm. If the eigenvalues of $A$ are $\lambda_1 \ge \cdots \ge \lambda_n$ and the eigenvalues of $M$ are $\mu_1 \ge \cdots \ge \mu_k$, then there are integers $i_1 < i_2 < \cdots < i_k$ such that
$$\|\mathrm{diag}(\mu_j - \lambda_{i_j})\|_\Phi \le \|XR^H + RX^H\|_\Phi = \Phi(\rho_1, \rho_1, \rho_2, \rho_2, \ldots), \tag{4.9}$$
where $\rho_1 \ge \rho_2 \ge \cdots$ are the singular values of $R$.
Proof. We will establish (4.9) for the case $2k \le n$, leaving the other case as an exercise. For $M = X^HAX$, let
$$E = -(XR^H + RX^H).$$
Then $E$ is Hermitian, and it is readily verified that
$$(A + E)X = XM. \tag{4.10}$$
Thus $\mathcal{R}(X)$ is an invariant subspace of $A + E$, and to each eigenvalue $\mu_j$ of $M$ there corresponds an eigenvalue $\tilde\lambda_{i_j}$ of $A + E$. By Mirsky's theorem $\|\mathrm{diag}(\tilde\lambda_i - \lambda_i)\|_\Phi \le \|E\|_\Phi$. Hence $\|\mathrm{diag}(\mu_j - \lambda_{i_j})\|_\Phi \le \|E\|_\Phi = \|XR^H + RX^H\|_\Phi$, and it remains only to establish the equality in (4.9), or equivalently that the singular values of $E$ are $\rho_1, \rho_1, \rho_2, \rho_2, \ldots$. But if $X_\perp$ is chosen so that $(X\ X_\perp)$ is unitary, then
$$(X\ X_\perp)^HE(X\ X_\perp) = -\begin{pmatrix} 0 & R^HX_\perp \\ X_\perp^HR & 0 \end{pmatrix}.$$
Since $\mathcal{R}(R) \subset \mathcal{R}(X_\perp)$ and the columns of $X_\perp$ are orthonormal, the singular values of $X_\perp^HR$ are the same as those of $R$, and hence those of $E$ are those of $R$ repeated. ■
It is worthwhile to list the bounds for the spectral and Frobenius norms.
Corollary 4.15. For the spectral norm we have
$$\max_j |\mu_j - \lambda_{i_j}| \le \|R\|_2,$$
and for the Frobenius norm
$$\sqrt{\sum_j (\mu_j - \lambda_{i_j})^2} \le \sqrt{2}\,\|R\|_F. \tag{4.11}$$
Remark 4.16. By an application of the argument leading to the Hoffman-Wielandt theorem, Kahan has been able to remove the factor $\sqrt{2}$ in (4.11). See Exercise 4.8.
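The bounds of Corollary 4.15 can be checked for an arbitrary $X$ with orthonormal columns. This sketch forms $M = X^HAX$ and $R = AX - XM$. It then minimizes the matching error over all index sets $i_1 < \cdots < i_k$ by brute force. The helper names and random data are assumptions. The theorem guarantees one index set meeting both bounds, so the separately minimized errors must satisfy them too.

```python
import itertools

import numpy as np


def rayleigh_ritz(A, X):
    """Eigenvalues (decreasing) of M = X^H A X and the residual R = A X - X M,
    for Hermitian A and X with orthonormal columns."""
    M = X.conj().T @ A @ X
    R = A @ X - X @ M
    return np.linalg.eigvalsh(M)[::-1], R


def matched_errors(lam, mu):
    """Smallest max-error and smallest 2-norm error between mu_j and lam_{i_j}
    over all index sets i_1 < ... < i_k (brute force over subsets)."""
    best_max = best_two = np.inf
    for idx in itertools.combinations(range(len(lam)), len(mu)):
        diff = mu - lam[list(idx)]
        best_max = min(best_max, np.max(np.abs(diff)))
        best_two = min(best_two, np.linalg.norm(diff))
    return best_max, best_two


rng = np.random.default_rng(9)
n, k = 6, 2
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (G + G.conj().T) / 2
lam = np.linalg.eigvalsh(A)[::-1]
X, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
mu, R = rayleigh_ritz(A, X)
best_max, best_two = matched_errors(lam, mu)
assert best_max <= np.linalg.norm(R, 2) + 1e-10                   # spectral bound
assert best_two <= np.sqrt(2) * np.linalg.norm(R, "fro") + 1e-10   # (4.11)
```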
The residual bounds derived above can be very good or very bad, as the following example shows.
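Example 4.17. If
$$A = \begin{pmatrix} 0 & \epsilon \\ \epsilon & 0 \end{pmatrix}$$
and $X = (1\ 0)^T$, then $M = 0$ and $\|R\|_2 = \epsilon$, so that the bound is attained (the eigenvalues of $A$ are $\pm\epsilon$). On the other hand, if
$$A = \begin{pmatrix} 0 & \epsilon \\ \epsilon & 1 \end{pmatrix},$$
then the residual bound for $M = 0$ is the same, but the smallest eigenvalue of $A$ is approximately $-\epsilon^2$!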
The distinction between the two examples is that in the second the
unwanted part of the spectrum is well removed from the part we are
attempting to bound. In Section 3 we will show how to use such infor-
mation to get a better bound.
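Example 4.17 takes only a few lines of NumPy. Here $\epsilon = 10^{-3}$ (our choice) and $X = e_1$, and `residual_and_error` is an illustrative name. The residual is identical for the two matrices. The actual error, however, shrinks from $\epsilon$ to roughly $\epsilon^2$ once the other eigenvalue is well separated.

```python
import numpy as np


def residual_and_error(A):
    """With X = e_1 (so M = x^T A x), return ||R||_2 and the distance from M
    to the nearest eigenvalue of the symmetric 2 x 2 matrix A."""
    x = np.array([[1.0], [0.0]])
    M = x.T @ A @ x
    R = A @ x - x @ M
    err = np.min(np.abs(np.linalg.eigvalsh(A) - M.item()))
    return np.linalg.norm(R, 2), err


eps = 1e-3
for A in (np.array([[0.0, eps], [eps, 0.0]]), np.array([[0.0, eps], [eps, 1.0]])):
    res, err = residual_and_error(A)
    print(f"||R||_2 = {res:.2e}, error = {err:.2e}")
# ||R||_2 = 1.00e-03, error = 1.00e-03
# ||R||_2 = 1.00e-03, error = 1.00e-06
```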
Two more comments. First, the eigenvalues of $M = X^HAX$ are sometimes called the RAYLEIGH-RITZ APPROXIMATIONS to the eigenvalues of $A$. Second, although we motivated this subsection by taking $X$ to be a matrix of approximate eigenvectors, all that is required to get accurate eigenvalues is that $R$ be small. Indeed, the part of the proof where we show that the eigenvalues of $M$ are the same as those of $A + E$ can be turned into an algorithm for getting approximate eigenvectors from $X$, a procedure that is sometimes called Rayleigh-Ritz improvement.
4.5. Approximation by a Low-Rank Matrix
The second application of Mirsky's theorem is to the determination of low-rank approximations to a fixed matrix. As above, let $\Phi$ be a symmetric gauge function and $\|\cdot\|_\Phi$ be the corresponding unitarily invariant norm. Let $X \in \mathbb{C}^{m\times n}$ have the singular value decomposition
$$X = U\Sigma V^H, \tag{4.12}$$
where $\sigma_1 \ge \cdots \ge \sigma_m \ge 0$. We wish to find a matrix $Y$ of rank not greater than $k$ that is as near as possible to $X$ in the $\Phi$-norm.
First let $Y$ be any matrix of rank not greater than $k$. Then the singular values of $Y$ are $\tau_1 \ge \cdots \ge \tau_k \ge 0 = \cdots = 0$; i.e., the last $m - k$ singular values are zero. It follows from Mirsky's theorem that
$$\|Y - X\|_\Phi \ge \Phi(\tau_1 - \sigma_1, \ldots, \tau_k - \sigma_k, \sigma_{k+1}, \ldots, \sigma_m) \ge \Phi(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_m).$$
In other words, any approximation of rank not greater than $k$ must be at least $\Phi(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_m)$ removed from $X$ in the $\Phi$-norm.
Now let
$$\Sigma_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0) \tag{4.13}$$
and
$$X_k = U\Sigma_kV^H. \tag{4.14}$$
Then it is easily verified that $X_k$ has rank not greater than $k$ and $\|X_k - X\|_\Phi = \Phi(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_m)$. Thus we have proved the following approximation theorem.
Theorem 4.18 (Schmidt, Mirsky). Let $X$ have the singular value decomposition (4.12), where $\sigma_1 \ge \cdots \ge \sigma_m \ge 0$. Let $\Phi$ be a symmetric gauge function and $\|\cdot\|_\Phi$ be the corresponding unitarily invariant norm. If $Y$ is a matrix of rank less than or equal to $k$, then
$$\|Y - X\|_\Phi \ge \Phi(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_m).$$
Moreover, equality is attained for the matrix $X_k$ defined by (4.13) and (4.14).
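The construction (4.13)-(4.14) is the truncated SVD. Below is a NumPy sketch, assuming a real random test matrix with $m \le n$ so that there are $m$ singular values as in (4.12). It checks the spectral-norm and Frobenius-norm instances of Theorem 4.18. It also checks that random rank-$k$ competitors are no closer in the spectral norm. The function name `truncated_svd` is ours.

```python
import numpy as np


def truncated_svd(X, k):
    """X_k = U Sigma_k V^H of (4.13)-(4.14); also returns the singular values."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k, :], s


rng = np.random.default_rng(8)
X = rng.standard_normal((6, 8))
k = 2
Xk, s = truncated_svd(X, k)
# Phi(0, ..., 0, s_{k+1}, ...) is s_{k+1} for the spectral norm and
# sqrt(s_{k+1}^2 + ...) for the Frobenius norm (s is 0-based here).
assert np.isclose(np.linalg.norm(X - Xk, 2), s[k])
assert np.isclose(np.linalg.norm(X - Xk, "fro"), np.sqrt(np.sum(s[k:] ** 2)))
for _ in range(100):   # random rank-k competitors are never closer
    Y = rng.standard_normal((6, k)) @ rng.standard_normal((k, 8))
    assert np.linalg.norm(X - Y, 2) >= s[k] - 1e-12
```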
Notes and References
Although Sylvester published the inertia theorem in 1852 [233] (also see [234, 1853]), the theorem was found in Jacobi's papers and published posthumously [123, 1857] by Borchardt, who gives 1847 as the date of discovery [39]. Hermite, who published his own proof [110, 1857], also names Jacobi as having discovered the principle. According to one biographer [195], at the time Jacobi was suffering from diabetes and from personal reverses stemming from the revolutions of 1848, which probably accounts for his failure to publish.
The interlacing theorem (Theorem 4.2) is due to Cauchy [42, 1829].
Wielandt [267, 1955] proved his theorem because he was unable "to succeed in completing the interesting sketch of a proof given by Lidskii [147, 1950]" of Theorem 4.8 (see [28, p. 50] for more details and further references). Amir-Moez [4, 1956] generalized Wielandt's characterization by replacing the sums and traces by any function of the eigenvalues in question that is nondecreasing in its arguments. The special, but very important, case in Corollary 4.7 is due to Fischer [74, 1905], who actually established it for matrix pencils (see Corollary VI.1.16). Courant [46, 1920] extended the result to differential operators, and the theorem is frequently called the Courant-Fischer theorem.
Weyl [265, 1912] proved more than is stated in Corollary 4.9 (see Exercises 4.3-4.4). He also claims the analogous results for singular values à la Schmidt [192].
For Mirsky's theorem see [159, 1963], which in addition contains an ad-
mirable survey of unitarily invariant norms and related topics.
For the spectral norm and arbitrary $M$, the residual bound of Corollary 4.15 is due to Kahan [130, 1967] (finally published as a part of [52, 1982]), who uses the dilation theorem (Exercise 1.10) specialized to Hermitian matrices. The generalization in Theorem 4.14 to unitarily invariant norms is new.
The proof given here is closely related to a proof for the spectral norm given
by Parlett [175, pp. 219-220]. It should bc noted that this result is but
one -- and one of the simplest - of a host of useful residual bounds. Sce
[144, 145, 264] and especially the book by Parlett [175] , which contains a
unified treatment of many of these topics.
As was noted in the text, the eigenvalues of M in Theorem 4.14 are frequently called Rayleigh-Ritz approximations to the eigenvalues of A. Both Rayleigh and Ritz were concerned with approximating the eigenvalues of an infinite operator by replacing it with a matrix eigenvalue problem. Rayleigh [183, 1899] found the natural frequencies of vibrating systems by restricting their degrees of freedom to a finite number of modes, which were to be chosen to accentuate the fundamental frequency. Ritz [186, 1909] approximated the eigenvalues of the vibrating string by minimizing the variational equation over a finite-dimensional subspace. Neither gives a formal justification for his method. A curious custom has grown up of calling eigenvector approximations obtained from M and X "Ritz vectors," although Ritz himself merely said that he was unable to establish their convergence using the techniques he had developed earlier in his paper.
The Schmidt-Mirsky theorem (Theorem 4.18) is commonly attributed to Eckart and Young [63, 1936], who established it for the Frobenius norm. But Schmidt [192, 1907] proved it for integral operators and the Hilbert-Schmidt norm, the natural extension of the Frobenius norm. Mirsky [159, 1963] generalized it to unitarily invariant norms.
When a Hermitian matrix is perturbed at random, a multiple eigenvalue will tend to break up into simple eigenvalues, and the perturbations in these eigenvalues will all be of a size. When the perturbation is not random, however, the perturbations can be quite disparate. Sun [231, 1989] has investigated the case where the elements of A depend analytically on several parameters.
Exercises
1. Let
$$A = \begin{pmatrix} B & c \\ c^H & \delta \end{pmatrix}.$$
Show that there is an eigenvalue λ of A satisfying $|\lambda - \delta| \le \|c\|_2$.
2. (Lidskii [147]). In the notation of Theorem 4.8, let $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$. Show that $(\tilde\lambda_1 - \lambda_1, \ldots, \tilde\lambda_n - \lambda_n)^T$ lies in the convex hull of the set $\{P\epsilon : P \text{ a permutation}\}$.
THE FOLLOWING TWO EXERCISES SHOW IN MODERN NOTATION
WHAT WEYL [265] ACTUALLY PROVED.
3. Let A and B be Hermitian with B of rank k. Then the largest eigenvalue of A - B is not less than the (k + 1)th largest eigenvalue of A.
4. Let A, B, and C = A + B have eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n$, $\beta_1 \ge \cdots \ge \beta_n$, and $\gamma_1 \ge \cdots \ge \gamma_n$. Then
$$\gamma_{i+j+1} \le \alpha_{i+1} + \beta_{j+1}.$$
-0-
THE FOLLOWING EXERCISES DEVELOP THE KATO-TEMPLE RESIDUAL BOUND [43, SECTION 6.5] FOR AN ISOLATED EIGENVALUE AND ITS EIGENVECTOR. IN WHAT FOLLOWS $\|x\|_2 = 1$, $\mu = x^HAx$, AND $r = Ax - \mu x$.
5. Let $\mu \in (\alpha, \beta)$, where $(\alpha, \beta)$ contains no eigenvalues of A. Then
$$(\beta - \mu)(\mu - \alpha) \le \|r\|_2^2.$$
6. Let $\underline{\lambda} < \mu < \overline{\lambda}$, where $(\underline{\lambda}, \overline{\lambda})$ contains exactly one eigenvalue λ of A. Then
$$\lambda \in \left[\mu - \frac{\|r\|_2^2}{\overline{\lambda} - \mu},\ \mu + \frac{\|r\|_2^2}{\mu - \underline{\lambda}}\right].$$
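A quick numerical illustration of the residual bound (some eigenvalue lies within $\|r\|_2$ of μ) and of the Kato-Temple interval of Exercise 6 is sketched below in Python. It takes A diagonal, so that its eigenvalues are known exactly; the spectrum {1, 2, 5}, the vector x, and the interval (0, 2) are hypothetical test data, and the function names are illustrative only.

```python
import math


def rayleigh_residual(diag_a, x):
    """For the diagonal Hermitian matrix A = diag(diag_a) and a unit vector x,
    return mu = x^H A x and the residual norm ||A x - mu x||_2."""
    mu = sum(a * abs(xi) ** 2 for a, xi in zip(diag_a, x))
    r = [(a - mu) * xi for a, xi in zip(diag_a, x)]
    return mu, math.sqrt(sum(abs(ri) ** 2 for ri in r))


def kato_temple_interval(mu, rnorm, lo, hi):
    """Interval of Exercise 6: valid when lo < mu < hi and (lo, hi)
    contains exactly one eigenvalue of A."""
    if not lo < mu < hi:
        raise ValueError("mu must lie strictly inside (lo, hi)")
    return mu - rnorm ** 2 / (hi - mu), mu + rnorm ** 2 / (mu - lo)


eigs = [1.0, 2.0, 5.0]              # hypothetical spectrum of A
v = [1.0, 0.1, 0.05]                # rough approximation to the eigenvector e_1
nv = math.sqrt(sum(t * t for t in v))
x = [t / nv for t in v]

mu, rnorm = rayleigh_residual(eigs, x)
# Residual bound: some eigenvalue lies within ||r||_2 of mu.
print(min(abs(lam - mu) for lam in eigs) <= rnorm)   # True
# Kato-Temple: the interval (0, 2) contains only the eigenvalue 1.
low, high = kato_temple_interval(mu, rnorm, 0.0, 2.0)
print(low <= 1.0 <= high, (low, high))
```

With these data the Kato-Temple interval has width about 0.1, while the plain residual bound only places an eigenvalue within about 0.22 of μ; the quadratic dependence on $\|r\|_2$ is what makes the former sharper when the eigenvalue is well isolated.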
-0-
7. (Kahan [129]). Show that for the 2-norm, the hypothesis $M = X^HAX$ can be removed from Theorem 4.14. Specifically, for arbitrary Hermitian M, the inequality (4.9) can be replaced by
$$\|\operatorname{diag}(\mu_j - \lambda_{i_j})\|_2 \le \|R\|_2.$$
[Hint: Use the dilation theorem (Exercise 1.11).]
8. (Kahan [129]). Show that the factor $\sqrt{2}$ can be removed from (4.11). [Hint: Assume without loss of generality that A and M are diagonal, and regard R as a function of X, or more generally of $U = (X\ X_\perp)$, where U is unitary. Let $W = (|u_{ij}|^2)$, and let $\delta_{ij} = (\lambda_i - \mu_j)^2$ when $j \le k$ and otherwise be zero. Show that $\|R\|_F^2 = \sum_i\sum_j w_{ij}\delta_{ij}$. Conclude from Birkhoff's theorem that $\|R\|_F$ is minimized when U is a permutation matrix.]
5. Some Further Results
This section is devoted to some useful results that could not be made to fit comfortably into the preceding sections. In the first subsection we treat the problem of non-Hermitian perturbations of Hermitian matrices; in the second, the perturbation of eigenvalues of matrices that are similar to Hermitian or normal matrices.
5.1. Non-Hermitian Perturbations

The results of this subsection concern non-Hermitian perturbations of Hermitian matrices and, except as noted, are due to Kahan. Throughout we will assume that A is a Hermitian matrix with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$. We will further assume that $\tilde A$ is a non-Hermitian perturbation of A; that is, $E = \tilde A - A$ is not Hermitian. The eigenvalues of $\tilde A$, which may be complex, will be written $\mu_k + i\nu_k$, where $\mu_1 \ge \cdots \ge \mu_n$. Finally, we will write
$$E_{\Re} = \frac{E + E^H}{2} \quad\text{and}\quad E_{\Im} = \frac{E - E^H}{2i} = \frac{\tilde A - \tilde A^H}{2i}$$
for the "real" and "imaginary" parts of E. It can be verified by direct computation that
$$\|E\|_F^2 = \|E_{\Re}\|_F^2 + \|E_{\Im}\|_F^2. \tag{5.1}$$

Theorem 5.1. Let
$$D_k = \{\mu + i\nu : |\mu + i\nu - \lambda_k| \le \|E\|_2 \text{ and } |\nu| \le \|E_{\Im}\|_2\}.$$
Then
$$\mathcal{L}(\tilde A) \subset \bigcup_{k=1}^n D_k.$$

Proof. By Corollary 3.4, for any $\mu + i\nu \in \mathcal{L}(\tilde A)$ there is an eigenvalue $\lambda_k$ of A such that
$$|\mu + i\nu - \lambda_k| \le \|E\|_2.$$
It remains only to show that $|\nu| \le \|E_{\Im}\|_2$. Let x be a normalized eigenvector of $\tilde A$ corresponding to $\mu + i\nu$; i.e., $\tilde Ax = (\mu + i\nu)x$. Then
$$x^H\tilde Ax = \mu + i\nu \quad\text{and}\quad x^H\tilde A^Hx = \mu - i\nu.$$
Hence
$$x^HE_{\Im}x = \frac{x^H(\tilde A - \tilde A^H)x}{2i} = \nu,$$
from which it follows that $|\nu| \le \|E_{\Im}\|_2$. ∎

If one of the regions $D_k$ is isolated from the others, it contains only one eigenvalue, namely $\mu_k + i\nu_k$, which is perforce real. Thus the theorem says something new only for clusters of eigenvalues whose regions overlap. Specifically, if the m regions $D_k, \ldots, D_{k+m-1}$ overlap, then they contain precisely m eigenvalues of $\tilde A$, namely $\mu_k + i\nu_k, \ldots, \mu_{k+m-1} + i\nu_{k+m-1}$. The regions themselves are disks trimmed at the top and bottom by the horizontal lines $\operatorname{Im}(z) = \pm\|E_{\Im}\|_2$. As the perturbation becomes increasingly Hermitian, these lines approach one another, restricting the sizes of the imaginary parts of the eigenvalues of $\tilde A$.

There is another version of the theorem that is reminiscent of the Hoffman-Wielandt theorem.

Theorem 5.2. In the notation above,
$$\sum_{k=1}^n \nu_k^2 \le \|E_{\Im}\|_F^2, \tag{5.2}$$
$$\Bigl[\sum_{k=1}^n (\mu_k - \lambda_k)^2\Bigr]^{1/2} \le \|E_{\Re}\|_F + \Bigl[\|E_{\Im}\|_F^2 - \sum_{k=1}^n \nu_k^2\Bigr]^{1/2}, \tag{5.3}$$
and
$$\Bigl[\sum_{k=1}^n |(\mu_k + i\nu_k) - \lambda_k|^2\Bigr]^{1/2} \le \sqrt{2}\,\|E\|_F. \tag{5.4}$$

Proof. By passing to the Schur form of $\tilde A$, we may assume that
$$\tilde A = M + iN + R, \tag{5.5}$$
where $M = \operatorname{diag}(\mu_1, \ldots, \mu_n)$, $N = \operatorname{diag}(\nu_1, \ldots, \nu_n)$, and R is strictly upper triangular. Then
$$A + E_{\Re} = \frac{\tilde A + \tilde A^H}{2} = M + \frac{R + R^H}{2}$$
and
$$E_{\Im} = \frac{\tilde A - \tilde A^H}{2i} = N + \frac{R - R^H}{2i}.$$
Now since N and $(R - R^H)/2i$ have disjoint sets of nonzero elements,
$$\|E_{\Im}\|_F^2 = \|N\|_F^2 + \Bigl\|\frac{R - R^H}{2i}\Bigr\|_F^2 = \|N\|_F^2 + \tfrac12\|R\|_F^2.$$
Since $\|N\|_F^2 = \sum_{k=1}^n \nu_k^2$, this establishes (5.2). On the other hand, since A and M are Hermitian,
$$\begin{aligned}
\Bigl[\sum_{k=1}^n (\mu_k - \lambda_k)^2\Bigr]^{1/2} &\le \|M - A\|_F = \Bigl\|E_{\Re} - \frac{R + R^H}{2}\Bigr\|_F \\
&\le \|E_{\Re}\|_F + \Bigl\|\frac{R + R^H}{2}\Bigr\|_F \\
&= \|E_{\Re}\|_F + \tfrac{1}{\sqrt{2}}\|R\|_F \\
&= \|E_{\Re}\|_F + \Bigl[\|E_{\Im}\|_F^2 - \sum_{k=1}^n \nu_k^2\Bigr]^{1/2},
\end{aligned}$$
which establishes (5.3).

To establish the combined bound (5.4), write
$$\begin{aligned}
\sum_{k=1}^n |(\mu_k + i\nu_k) - \lambda_k|^2 &= \sum_{k=1}^n (\mu_k - \lambda_k)^2 + \sum_{k=1}^n \nu_k^2 \\
&\le \Bigl(\|E_{\Re}\|_F + \Bigl[\|E_{\Im}\|_F^2 - \sum_{k=1}^n \nu_k^2\Bigr]^{1/2}\Bigr)^2 + \sum_{k=1}^n \nu_k^2 \\
&= \|E_{\Re}\|_F^2 + 2\|E_{\Re}\|_F\Bigl[\|E_{\Im}\|_F^2 - \sum_{k=1}^n \nu_k^2\Bigr]^{1/2} + \|E_{\Im}\|_F^2 \\
&\le (\|E_{\Re}\|_F + \|E_{\Im}\|_F)^2 \\
&\le 2(\|E_{\Re}\|_F^2 + \|E_{\Im}\|_F^2) \\
&= 2\|E\|_F^2. \qquad ∎
\end{aligned}$$
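Theorems 5.1 and 5.2 are easy to check numerically on a 2×2 example. The Python sketch below assumes a hypothetical Hermitian A = diag(1, 1.1) and a hypothetical non-Hermitian E; the eigenvalues of $\tilde A$ come from the characteristic polynomial, and spectral norms from the Gram matrix $M^HM$. It is a sanity check under these assumptions, not a proof.

```python
import cmath
import math


def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def conj_t(X):
    """Conjugate transpose X^H of a 2x2 matrix."""
    return [[complex(X[j][i]).conjugate() for j in range(2)] for i in range(2)]


def eig2(M):
    """Eigenvalues of a 2x2 complex matrix from its characteristic polynomial."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + disc) / 2, (tr - disc) / 2]


def norm2(M):
    """Spectral norm: square root of the largest eigenvalue of M^H M."""
    G = matmul(conj_t(M), M)
    tr = (G[0][0] + G[1][1]).real
    det = (G[0][0] * G[1][1] - G[0][1] * G[1][0]).real
    return math.sqrt((tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2)


def fro2(M):
    """Squared Frobenius norm."""
    return sum(abs(v) ** 2 for row in M for v in row)


lam = [1.1, 1.0]                                    # eigenvalues of A, descending
A = [[1.0, 0.0], [0.0, 1.1]]                        # Hermitian (real diagonal)
E = [[0.05 + 0.02j, 0.2], [-0.15, -0.03 + 0.01j]]   # hypothetical non-Hermitian E
At = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
EH = conj_t(E)
E_re = [[(E[i][j] + EH[i][j]) / 2 for j in range(2)] for i in range(2)]
E_im = [[(E[i][j] - EH[i][j]) / 2j for j in range(2)] for i in range(2)]

# Theorem 5.1: each eigenvalue lies in a disk about some lambda_k, trimmed at |Im z| <= ||E_im||_2.
z = eig2(At)
in_regions = all(
    min(abs(w - l) for l in lam) <= norm2(E) + 1e-12 and abs(w.imag) <= norm2(E_im) + 1e-12
    for w in z
)

# Theorem 5.2: order the eigenvalues of At by decreasing real part and pair them with lam.
z.sort(key=lambda w: w.real, reverse=True)
sum_nu2 = sum(w.imag ** 2 for w in z)
lhs_53 = math.sqrt(sum((w.real - l) ** 2 for w, l in zip(z, lam)))
rhs_53 = math.sqrt(fro2(E_re)) + math.sqrt(max(fro2(E_im) - sum_nu2, 0.0))
lhs_54 = math.sqrt(sum(abs(w - l) ** 2 for w, l in zip(z, lam)))
print(in_regions, sum_nu2 <= fro2(E_im), lhs_53 <= rhs_53, lhs_54 <= math.sqrt(2 * fro2(E)))
```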
In Theorem 3.3 we assumed that A was diagonalizable and derived a bound on $\mathrm{sv}_A(\tilde A)$ that depended on the condition number of the diagonalizing transformation. In this subsection we will assume that A and $\tilde A$ can be reduced to either Hermitian or normal matrices by similarity transformations, and obtain perturbation bounds on their eigenvalues.

We begin with the Hermitian case. The principal result is based on the following lemma.

Lemma 5.3. Let H and K be $n\times n$ Hermitian matrices, and let $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$ with $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$. Then
$$\|H\Sigma - \Sigma K\|_2 \ge \sigma_n\|H - K\|_2.$$

Proof. Let λ be the eigenvalue of H - K of largest absolute value, so that $\|H - K\|_2 = |\lambda|$. Let x be the corresponding normalized eigenvector. Then
$$x^H(H\Sigma - \Sigma K)x = x^H(H - K)\Sigma x + x^H(K\Sigma - \Sigma K)x = \lambda x^H\Sigma x + i\tau, \tag{5.6}$$
where τ is real (here we have used the fact that the matrix $K\Sigma - \Sigma K$ is skew-Hermitian). Hence
$$\begin{aligned}
\|H\Sigma - \Sigma K\|_2 &= \max_{\|u\|_2 = \|v\|_2 = 1}|u^H(H\Sigma - \Sigma K)v| \\
&\ge \max_{\|u\|_2 = 1}|u^H(H\Sigma - \Sigma K)u| \\
&\ge |x^H(H\Sigma - \Sigma K)x| \\
&\ge |\lambda|\,x^H\Sigma x \qquad [\text{by } (5.6)] \\
&\ge \sigma_n\|H - K\|_2. \qquad ∎
\end{aligned}$$
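Lemma 5.3 can be probed numerically. The Python sketch below (hypothetical helper names) samples random real symmetric 2×2 matrices H and K and weights $\sigma_1 \ge \sigma_2 \ge 0$, and records the smallest value of $\|H\Sigma - \Sigma K\|_2 - \sigma_2\|H - K\|_2$; it also shows that the inequality can fail when H and K are not Hermitian. The 2×2 spectral norm uses the closed form $\sigma_{\max} = q + r$ obtained by splitting the matrix into a scaled rotation and a scaled reflection, which avoids cancellation.

```python
import math
import random


def norm2(M):
    """Largest singular value of a real 2x2 matrix (rotation/reflection splitting)."""
    (a, b), (c, d) = M
    q = math.hypot((a + d) / 2, (c - b) / 2)
    r = math.hypot((a - d) / 2, (c + b) / 2)
    return q + r


def lemma_gap(H, K, s1, s2):
    """Return ||H S - S K||_2 - s2 * ||H - K||_2 for S = diag(s1, s2)."""
    s = (s1, s2)
    hs_sk = [[H[i][j] * s[j] - s[i] * K[i][j] for j in range(2)] for i in range(2)]
    diff = [[H[i][j] - K[i][j] for j in range(2)] for i in range(2)]
    return norm2(hs_sk) - s2 * norm2(diff)


def random_symmetric(rng):
    a, b, c = (rng.uniform(-1, 1) for _ in range(3))
    return [[a, b], [b, c]]


rng = random.Random(0)
worst = min(
    lemma_gap(random_symmetric(rng), random_symmetric(rng), s1, rng.uniform(0, s1))
    for s1 in (rng.uniform(0, 2) for _ in range(2000))
)
print(worst >= -1e-12)

# Without symmetry the inequality can fail: here H S = S K although H != K.
H_ns, K_ns = [[0.0, 2.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]
print(lemma_gap(H_ns, K_ns, 2.0, 1.0))   # -1.0
```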
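Lemma 5.3 allows us to establish the following theorem.

Theorem 5.4. Let $A, \tilde A \in \mathbb{C}^{n\times n}$, and suppose that there are two nonsingular matrices P and Q such that $P^{-1}AP$ and $Q^{-1}\tilde AQ$ are Hermitian. Let the eigenvalues of A and $\tilde A$ (which are necessarily real) be $\lambda_1 \ge \cdots \ge \lambda_n$ and $\tilde\lambda_1 \ge \cdots \ge \tilde\lambda_n$. Then
$$|\tilde\lambda_i - \lambda_i| \le \kappa_2(P)\kappa_2(Q)\|\tilde A - A\|_2, \qquad i = 1, \ldots, n. \tag{5.7}$$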
Proof. Let
$$\hat A = P^{-1}AP \quad\text{and}\quad \hat{\tilde A} = Q^{-1}\tilde AQ.$$
Then
$$\begin{aligned}
\|\tilde A - A\|_2 &= \|Q\hat{\tilde A}Q^{-1} - P\hat AP^{-1}\|_2 \\
&= \|Q[\hat{\tilde A}(Q^{-1}P) - (Q^{-1}P)\hat A]P^{-1}\|_2 \\
&\ge \|Q^{-1}\|_2^{-1}\|P\|_2^{-1}\|\hat{\tilde A}(Q^{-1}P) - (Q^{-1}P)\hat A\|_2.
\end{aligned} \tag{5.8}$$
Let $U\Sigma V^H$ be the singular value decomposition of $Q^{-1}P$. Then with $\sigma_n$ denoting the smallest diagonal element of Σ, we have from Lemma 5.3
$$\begin{aligned}
\|\hat{\tilde A}(Q^{-1}P) - (Q^{-1}P)\hat A\|_2 &= \|(U^H\hat{\tilde A}U)\Sigma - \Sigma(V^H\hat AV)\|_2 \\
&\ge \sigma_n\|U^H\hat{\tilde A}U - V^H\hat AV\|_2 \\
&\ge \sigma_n|\tilde\lambda_i - \lambda_i|, \qquad i = 1, \ldots, n,
\end{aligned}$$
the last inequality following from Weyl's theorem applied to the Hermitian matrices $U^H\hat{\tilde A}U$ and $V^H\hat AV$. Thus from (5.8) we get
$$|\tilde\lambda_i - \lambda_i| \le \sigma_n^{-1}\|P\|_2\|Q^{-1}\|_2\|\tilde A - A\|_2, \qquad i = 1, \ldots, n. \tag{5.9}$$
Now
$$\sigma_n^{-1} = \|(Q^{-1}P)^{-1}\|_2 \le \|P^{-1}\|_2\|Q\|_2.$$
Combining this with (5.9) we get (5.7). ∎

There is an analogue of Theorem 5.4, due to Sun and Zhang, for matrices that can be transformed into normal matrices.

Theorem 5.5. Let $A, \tilde A \in \mathbb{C}^{n\times n}$. Assume that there are nonsingular matrices P and Q such that $P^{-1}AP$ and $Q^{-1}\tilde AQ$ are normal. Then
$$\mathrm{md}_2(A, \tilde A) \le \kappa_2(P)\kappa_2(Q)\|\tilde A - A\|_F.$$

Proof. If we can establish the analogue of Lemma 5.3 for normal matrices, then the proof of Theorem 5.4 goes through mutatis mutandis. Specifically, we must show that if M and N are normal matrices and $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$ with $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$, then
$$\|M\Sigma - \Sigma N\|_F \ge \sigma_n\|M - N\|_F.$$
To show this, set
$$\delta = \|M\Sigma - \Sigma N\|_F^2 - \sigma_n^2\|M - N\|_F^2$$
and
$$D = \Sigma - \sigma_nI.$$
Obviously, the diagonal elements of D are nonnegative. Moreover,
$$\begin{aligned}
\delta &= \|M(D + \sigma_nI) - (D + \sigma_nI)N\|_F^2 - \sigma_n^2\|M - N\|_F^2 \\
&= \|MD - DN + \sigma_n(M - N)\|_F^2 - \sigma_n^2\|M - N\|_F^2 \\
&= \|MD - DN\|_F^2 + 2\sigma_n\operatorname{Re}\operatorname{trace}[(MD - DN)^H(M - N)] \\
&= \|MD - DN\|_F^2 + \sigma_n\operatorname{trace}\{D[(M - N)^H(M - N) + (M - N)(M - N)^H]\} \\
&\ge 0,
\end{aligned}$$
which is the required inequality. (The fourth equality uses the normality of M and N; the final inequality holds because D is a nonnegative diagonal matrix and the bracketed matrix is positive semidefinite.) ∎

Notes and References

With the exception of Theorem 5.5, which is due to Sun [227, 1984] and Zhang [277, 1986], the results of this section are taken from a paper by Kahan [133, 1975]. Kahan writes as if Theorem 5.1 were due to Wilkinson [269, 1965], but although Wilkinson gives a brief discussion of nonsymmetric perturbations, he does not bound the imaginary parts.

In addition to the results of this section, Kahan shows that the matching distance of a non-Hermitian perturbation is proportional to $\log n\,\|E\|_2$ (Exercise 5.2).

Exercises

1. Let A be Hermitian and $\tilde A$ be normal. In the notation of Theorem 5.1 show that
$$\sum_{k=1}^n \nu_k^2 \le \|E_{\Im}\|_F^2 \quad\text{and}\quad \sum_{k=1}^n (\mu_k - \lambda_k)^2 \le \|E_{\Re}\|_F^2.$$
Conclude that
$$\sum_{k=1}^n |(\mu_k + i\nu_k) - \lambda_k|^2 \le \|E\|_F^2.$$
[Note: This is the bound that the Hoffman-Wielandt theorem would provide, except that the pairing of the eigenvalues of A and $\tilde A$ is explicit.]
2. (Kahan [133]). Use the fact [132] that if $\mathcal{L}(Z)$ is real then $\|Z - Z^H\|_2 \le (0.038 + \log_2 n)\|Z + Z^H\|_2$ to show that if A is Hermitian then
$$\mathrm{md}(A, \tilde A) \le \|E_{\Re}\|_2 + (0.038 + \log_2 n)\|E_{\Im}\|_2.$$
Chapter V
Invariant Subspaces
We have already observed in Section II.4 that the problem of establishing perturbation bounds for eigenvectors is complicated by the fact that
eigenvectors corresponding to multiple eigenvalues are not unique. This
has the consequence that the eigenvectors corresponding to a tight clus-
ter of eigenvalues will be ill conditioned. However, the space spanned
by these eigenvectors is an invariant subspace, which need not be sensi-
tive to perturbations in the matrix. It therefore makes sense to derive
perturbation bounds for invariant subspaces, from which bounds for
eigenvectors follow as a special case.
The first section of this chapter may be regarded as a continuation of the subsection on invariant subspaces in Chapter I. Here we introduce the notion of a simple invariant subspace (the analogue of a simple eigenpair) and establish its properties. In the next section we derive error and perturbation bounds for a simple invariant subspace of a general matrix. In the third section we present the Davis-Kahan theory for invariant subspaces of Hermitian matrices. The chapter concludes with a section on the singular value decomposition.
Throughout this chapter, A will denote a matrix of order n, except in the last section, where it will denote an $m \times n$ matrix.
1. The Theory of Simple Invariant Subspaces

1.1. Definition

Let $\mathcal X$ be an invariant subspace of A, and let the columns of X form a basis for $\mathcal X$. In Section I.3 we showed that there was a unique matrix L such that
$$AX = XL.$$
The matrix L is the representation of A on $\mathcal X$ with respect to the basis X, and the eigenvalues of L are eigenvalues of A.

Unfortunately, the matrix L need not characterize the invariant subspace $\mathcal X$. For example, if $A = I_n$, then any matrix $X \in \mathbb{C}^{n\times 2}$ with orthonormal columns spans an invariant subspace whose representation with respect to X is $I_2$. This shows that we cannot circumvent the problem of nonuniqueness of eigenvectors by passing to invariant subspaces: we need additional conditions to insure that the invariant subspaces are themselves unique.

The key is provided by the observation that if λ is a simple eigenvalue of A, then its eigenvector is unique up to a scalar multiple. The analogous requirement for an invariant subspace is that the eigenvalues of its representation L be distinct from the other eigenvalues of A. We will say that such an invariant subspace is simple. However, before making a formal definition, it will be convenient to establish some preliminary results.

We begin with a useful characterization of invariant subspaces.

Theorem 1.1. Let the columns of X be linearly independent and let the columns of Y span $\mathcal R(X)^\perp$. Then $\mathcal R(X)$ is an invariant subspace of A if and only if
$$Y^HAX = 0. \tag{1.1}$$
In this case $\mathcal R(Y)$ is an invariant subspace of $A^H$.

Proof. Let $\mathcal X = \mathcal R(X)$. Then by definition $\mathcal X$ is an invariant subspace of A if and only if $A\mathcal X \subset \mathcal X$. But
$$A\mathcal X \subset \mathcal X \iff A\mathcal X \perp \mathcal X^\perp \iff \mathcal R(AX) \perp \mathcal R(Y) \iff Y^HAX = 0,$$
which establishes (1.1). Writing (1.1) in the form $X^HA^HY = 0$, we see that $\mathcal R(Y)$ must be an invariant subspace of $A^H$. ∎

Just as the invariant subspace $\mathcal R(X)$ is a generalization of the notion of a right eigenvector, so $\mathcal R(Y)$ is a generalization of a left eigenvector. Consequently we shall call $\mathcal R(Y)$ a LEFT INVARIANT SUBSPACE of A.

The condition (1.1) can be regarded as saying that A can be reduced to a block triangular form by a unitary similarity. To see this, let $\mathcal X_1$ be an invariant subspace of A, and let the columns of $X_1$ form an orthonormal basis for $\mathcal X_1$. Let $(X_1\ Y_2)$ be unitary. Then
$$(X_1\ Y_2)^HA(X_1\ Y_2) = \begin{pmatrix} X_1^HAX_1 & X_1^HAY_2 \\ Y_2^HAX_1 & Y_2^HAY_2 \end{pmatrix}.$$
By (1.1) the matrix $Y_2^HAX_1$ is zero. Consequently, if we set
$$L_1 = X_1^HAX_1, \qquad L_2 = Y_2^HAY_2, \qquad\text{and}\qquad H = X_1^HAY_2,$$
then
$$(X_1\ Y_2)^HA(X_1\ Y_2) = \begin{pmatrix} L_1 & H \\ 0 & L_2 \end{pmatrix}. \tag{1.2}$$
It is easy to see that
$$AX_1 = X_1L_1,$$
so that $L_1$ is the representation of A on $\mathcal X_1$ with respect to $X_1$. Thus the eigenvalues of $L_1$ are the eigenvalues of A associated with $\mathcal X_1$. The complementary set of eigenvalues are those of $L_2$. All this suggests the following definition.

Definition 1.2. Let $\mathcal X_1$ be an invariant subspace of A, and let (1.2) be its REDUCED FORM with respect to the unitary matrix $(X_1\ Y_2)$. Then $\mathcal X_1$ is a SIMPLE INVARIANT SUBSPACE if
$$\mathcal L(L_1) \cap \mathcal L(L_2) = \emptyset.$$

A key fact about simple invariant subspaces is that they have complements in $\mathbb{C}^n$ that are also invariant subspaces. (To see that this is not true in general, let
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$
and consider the invariant subspace spanned by $e_1$.) However, before we can prove this fact we must digress to establish the properties of a certain linear operator.

1.2. The Operator $\mathbf T: X \mapsto AX - XB$

In the sequel we shall have to solve SYLVESTER'S EQUATION, which is of the form
$$AX - XB = C, \tag{1.3}$$
where A and B are square matrices of orders n and m, so that X and C are $n\times m$ matrices. We will be concerned with conditions under which (1.3) has a unique solution. Equivalently, if we define the linear operator $\mathbf T: \mathbb{C}^{n\times m} \to \mathbb{C}^{n\times m}$ by
$$\mathbf T: X \mapsto AX - XB,$$
then the problem becomes one of determining when $\mathbf T$ is nonsingular.

Theorem 1.3. The linear operator $\mathbf T: X \mapsto AX - XB$ is nonsingular if and only if
$$\mathcal L(A) \cap \mathcal L(B) = \emptyset.$$

Proof. First suppose that $\lambda \in \mathcal L(A) \cap \mathcal L(B)$. Let $Ap = \lambda p$ and $q^HB = \lambda q^H$ $(p, q \ne 0)$. Let $X = pq^H$. Then
$$\mathbf T(X) = Apq^H - pq^HB = \lambda pq^H - \lambda pq^H = 0.$$
Thus $\mathbf T$ annihilates the nonzero matrix X and must be singular.

Conversely, assume that $\mathcal L(A) \cap \mathcal L(B) = \emptyset$. We must show that the system $AX - XB = C$ has a unique solution. Let the Schur decomposition of B be $T = V^HBV$. Then with $Y = XV$ and $D = CV$, the equation $AX - XB = C$ is equivalent to
$$AY - YT = D. \tag{1.4}$$
We will show that this equation has a unique solution.

Partition $Y = (y_1\ \cdots\ y_m)$ and $D = (d_1\ \cdots\ d_m)$ by columns. Since T is upper triangular, the first column in the relation (1.4) is $Ay_1 - t_{11}y_1 = d_1$, or
$$(A - t_{11}I)y_1 = d_1. \tag{1.5}$$
Since $t_{11} \in \mathcal L(B)$ and $\mathcal L(A) \cap \mathcal L(B) = \emptyset$, the matrix $A - t_{11}I$ is nonsingular. Hence $y_1$ is the unique solution of (1.5).

Now suppose that $y_1, \ldots, y_{k-1}$ are uniquely determined. The kth column of (1.4) is
$$Ay_k - \sum_{i=1}^k t_{ik}y_i = d_k$$
or
$$(A - t_{kk}I)y_k = d_k + \sum_{i=1}^{k-1} t_{ik}y_i. \tag{1.6}$$
Since $t_{kk} \in \mathcal L(B)$, the matrix $A - t_{kk}I$ is nonsingular. Hence $y_k$ is the unique solution of (1.6). ∎

A corollary of this result is a characterization of the eigenvalues of $\mathbf T$.

Corollary 1.4.
$$\mathcal L(\mathbf T) = \mathcal L(A) - \mathcal L(B).$$

Proof. If $\nu \in \mathcal L(\mathbf T)$, then there is a nonzero X such that $AX - XB = \nu X$, or $(A - \nu I)X - XB = 0$; i.e., the operator $X \mapsto (A - \nu I)X - XB$ is singular. It follows that $\mathcal L(A - \nu I)$ and $\mathcal L(B)$ have a common element, which is to say that $\nu = \lambda - \mu$ for some $\lambda \in \mathcal L(A)$ and $\mu \in \mathcal L(B)$. Hence $\mathcal L(\mathbf T) \subset \mathcal L(A) - \mathcal L(B)$. The inclusion in the other direction follows by reversing the above argument. ∎

1.3. The Spectral Resolution
We are now in a position to show that a simple invariant subspace
has a complementary subspace. The following theorem exhibits the
complement as the column space of a matrix constructed from a reduced
form of the invariant subspace.
Theorem 1.5. Let the simple invariant subspace $\mathcal X_1$ have the reduced form (1.2) with respect to the unitary matrix $(X_1\ Y_2)$. Then there are matrices $X_2$ and $Y_1$ such that
$$(X_1\ X_2)^{-1} = (Y_1\ Y_2)^H$$
and
$$A = X_1L_1Y_1^H + X_2L_2Y_2^H, \tag{1.7}$$
where
$$L_i = Y_i^HAX_i, \qquad i = 1, 2.$$
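Proof. We begin by reducing the matrix
$$\begin{pmatrix} L_1 & H \\ 0 & L_2 \end{pmatrix}$$
from (1.2) to block diagonal form by a similarity transformation. Specifically, we will show that there is a matrix Q such that
$$\begin{pmatrix} I & -Q \\ 0 & I \end{pmatrix}\begin{pmatrix} L_1 & H \\ 0 & L_2 \end{pmatrix}\begin{pmatrix} I & Q \\ 0 & I \end{pmatrix} = \begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix}. \tag{1.8}$$
This is equivalent to showing that there is a matrix Q such that
$$L_1Q - QL_2 = -H.$$
Since $\mathcal X_1$ is simple, $\mathcal L(L_1) \cap \mathcal L(L_2) = \emptyset$. Hence by Theorem 1.3, Q exists (and is unique).

It follows from (1.2) and (1.8) that
$$\begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix} = \begin{pmatrix} I & -Q \\ 0 & I \end{pmatrix}(X_1\ Y_2)^HA(X_1\ Y_2)\begin{pmatrix} I & Q \\ 0 & I \end{pmatrix} = \begin{pmatrix} Y_1^H \\ Y_2^H \end{pmatrix}A(X_1\ X_2), \tag{1.9}$$
where
$$X_2 = Y_2 + X_1Q \quad\text{and}\quad Y_1 = X_1 - Y_2Q^H.$$
The fact that $(X_1\ X_2)^{-1} = (Y_1\ Y_2)^H$ follows from the fact that (1.9) is a similarity transformation. Hence we may write (1.9) in the form
$$A = (X_1\ X_2)\begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix}\begin{pmatrix} Y_1^H \\ Y_2^H \end{pmatrix},$$
from which (1.7) follows directly. ∎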
From (1.7) we see that $AX_1 = X_1L_1$. More important, $AX_2 = X_2L_2$, which implies that $\mathcal X_2 = \mathcal R(X_2)$ is an invariant subspace of A. Since $(X_1\ X_2)$ is nonsingular, together $\mathcal X_1$ and $\mathcal X_2$ span $\mathbb{C}^n$. Thus we have the following corollary.

Corollary 1.6. Let $\mathcal X_1$ be a simple invariant subspace of A. Then A has a complementary invariant subspace $\mathcal X_2$. Moreover, the spaces $\mathcal Y_1 = \mathcal X_2^\perp$ and $\mathcal Y_2 = \mathcal X_1^\perp$ are the corresponding complementary pair of left invariant subspaces.

We will call (1.7) the SPECTRAL RESOLUTION of A along $\mathcal X_1$ and $\mathcal X_2$.
It is instructive to write the spectral resolution in a different way. Let
$$P_i = X_iY_i^H, \qquad i = 1, 2.$$
Then it is easily verified that
1. $P_i^2 = P_i$ $(i = 1, 2)$,
2. $P_1P_2 = P_2P_1 = 0$,
3. $A = P_1AP_1 + P_2AP_2$. \qquad (1.10)
As we saw in Section I.2, the first condition says that $P_i\mathcal X_i = \mathcal X_i$ $(i = 1, 2)$. The second condition says that $P_1\mathcal X_2 = 0$. Hence if we decompose any vector z into the sum
$$z = x_1 + x_2, \qquad x_1 \in \mathcal X_1,\ x_2 \in \mathcal X_2,$$
then $x_1 = P_1z$ and $x_2 = (I - P_1)z = P_2z$. For this reason we say that $P_1$ is the projection onto $\mathcal X_1$ along $\mathcal X_2$. We will call it the SPECTRAL PROJECTION of the simple invariant subspace $\mathcal X_1$.
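For a 2×2 upper triangular A the construction in the proof of Theorem 1.5 can be carried out by hand. The Python sketch below assumes hypothetical values $L_1 = 1$, $H = 4$, $L_2 = 3$, takes $X_1 = e_1$ and $Y_2 = e_2$ (so that (1.2) holds with $(X_1\ Y_2) = I$), forms Q, $X_2$, $Y_1$, and the projections $P_i = X_iY_i^H$, and checks the three properties (1.10).

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def outer(u, v):
    """u v^H for real 2-vectors u, v (so v^H = v^T here)."""
    return [[u[i] * v[j] for j in range(2)] for i in range(2)]


def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) <= tol for i in range(2) for j in range(2))


l1, l2, h = 1.0, 3.0, 4.0            # hypothetical data; l1 != l2 makes R(e_1) simple
A = [[l1, h], [0.0, l2]]
# With (X1 Y2) = I the reduced form (1.2) has L1 = l1, H = h, L2 = l2.
Q = -h / (l1 - l2)                   # solves L1*Q - Q*L2 = -H
X1, Y2 = (1.0, 0.0), (0.0, 1.0)
X2 = (Y2[0] + X1[0] * Q, Y2[1] + X1[1] * Q)    # X2 = Y2 + X1 Q
Y1 = (X1[0] - Y2[0] * Q, X1[1] - Y2[1] * Q)    # Y1 = X1 - Y2 Q^H (Q is real here)

P1, P2 = outer(X1, Y1), outer(X2, Y2)
Z = [[0.0, 0.0], [0.0, 0.0]]
print(close(matmul(P1, P1), P1), close(matmul(P2, P2), P2))   # property 1
print(close(matmul(P1, P2), Z), close(matmul(P2, P1), Z))     # property 2
PAP = [[a + b for a, b in zip(r1, r2)]
       for r1, r2 in zip(matmul(matmul(P1, A), P1), matmul(matmul(P2, A), P2))]
print(close(PAP, A))                                            # property 3
```

In this example $X_2 = (2, 1)^T$, and one can confirm by hand that $AX_2 = 3X_2$, so $\mathcal R(X_2)$ is the complementary invariant subspace promised by Corollary 1.6.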
When $\dim(\mathcal X_1) = 1$, that is, when $X_1 = x_1$ is an eigenvector, the spectral projection is $P_1 = x_1y_1^H$ and $\|P_1\|_2 = \|y_1\|_2$. We have already seen that this quantity is a condition number for the eigenvalue $\lambda_1$ [(IV.2.8)]. The quantity $\|P_1\|$ will play an analogous role for $L_1$. Hence it is of interest to know the singular values of $P_1$.
Theorem 1.7. Let $\mathcal X$ be a simple invariant subspace of A and let P be its spectral projection. Let $\mathcal Y$ be the corresponding left invariant subspace, and let $\theta_1 \ge \theta_2 \ge \cdots \ge 0$ be the canonical angles between $\mathcal X$ and $\mathcal Y$. Then $\theta_1 < \pi/2$ and
$$S(P) = \{\sec\theta_1, \sec\theta_2, \ldots\}. \tag{1.11}$$
Proof. We will adopt the notation of Theorem 1.5, with $\mathcal X_1 = \mathcal X$, etc. Since $Y_1 = X_1 - Y_2Q^H$, we have
$$Y_1^HY_1 = I + QQ^H.$$
It follows that if $\rho_1, \rho_2, \ldots$ are the singular values of Q, then
$$S(Y_1) = \{\sqrt{1 + \rho_1^2}, \sqrt{1 + \rho_2^2}, \ldots\}. \tag{1.12}$$
Clearly the columns of $Y_1(I + QQ^H)^{-1/2}$ form an orthonormal basis for $\mathcal Y_1$. Since the columns of $Y_2$ form an orthonormal basis for $\mathcal X_1^\perp$, it follows from Corollary I.5.4 that the sines of the canonical angles between $\mathcal X_1$ and $\mathcal Y_1$ are the singular values of
$$Y_2^HY_1(I + QQ^H)^{-1/2} = -Q^H(I + QQ^H)^{-1/2}.$$
Hence
$$\sin\theta_i = \frac{\rho_i}{\sqrt{1 + \rho_i^2}} < 1. \tag{1.13}$$
It follows that the canonical angles must be less than π/2. Finally, (1.11) follows from (1.12) and (1.13) and the fact that $X_1$ in the expression $P = X_1Y_1^H$ has orthonormal columns. ∎

Although we shall not use the fact in the sequel, it is worth noting that a spectral resolution can be defined for more than two complementary subspaces. An extreme example is given by the spectral decomposition (I.3.1), in which each invariant subspace is spanned by a single eigenvector. This example also shows that although the simplicity of an invariant subspace is sufficient for a spectral resolution, it is not necessary.
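When $\mathcal X$ is spanned by a single eigenvector x, Theorem 1.7 reduces to $\|P\|_2 = \|x\|_2\|y\|_2 = \sec\theta$, where y is the left eigenvector scaled so that $y^Hx = 1$. The Python sketch below checks this for a hypothetical 2×2 upper triangular matrix with a complex off-diagonal entry; the norm is computed from the Gram matrix $P^HP$ rather than from the rank-one formula, so the two sides are obtained independently.

```python
import math

l1, l2, h = 2.0, 2.5, 1.5 + 0.5j        # hypothetical upper triangular A
A = [[l1, h], [0.0, l2]]
x = [1.0, 0.0]                           # right eigenvector: A e_1 = l1 e_1
yH = [1.0, h / (l1 - l2)]                # left eigenvector as a row y^H, scaled so y^H x = 1

# Confirm that y^H A = l1 y^H.
yHA = [sum(yH[k] * A[k][j] for k in range(2)) for j in range(2)]
assert all(abs(yHA[j] - l1 * yH[j]) < 1e-12 for j in range(2))

P = [[x[i] * yH[j] for j in range(2)] for i in range(2)]   # spectral projection x y^H


def norm2(M):
    """Spectral norm of a complex 2x2 matrix via the Hermitian Gram matrix M^H M."""
    g00 = sum(abs(M[k][0]) ** 2 for k in range(2))
    g11 = sum(abs(M[k][1]) ** 2 for k in range(2))
    g01 = sum(complex(M[k][0]).conjugate() * M[k][1] for k in range(2))
    lam_max = (g00 + g11 + math.sqrt((g00 - g11) ** 2 + 4 * abs(g01) ** 2)) / 2
    return math.sqrt(lam_max)


norm_y = math.sqrt(sum(abs(v) ** 2 for v in yH))
cos_theta = abs(sum(yH[k] * x[k] for k in range(2))) / norm_y   # ||x||_2 = 1
print(norm2(P), 1 / cos_theta, math.sqrt(11))
```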
Notes and References
The theory developed in this section is constructive, in the sense that given a basis for an invariant subspace, one can construct the associated spectral resolution. From a pedagogical point of view, the approach has the advantage that one can develop the theory for a simple eigenpair (something students grasp readily) and then generalize it by replacing lower case letters with capital letters [214].
The disadvantage of the approach is that it deals only with simple invariant subspaces, whereas the set of all invariant subspaces of a matrix has a far richer structure. For a detailed exposition see the book by Gohberg, Lancaster, and Rodman [87, 1986].
In spite of its simplicity, Theorem 1.1 is the key to perturbation theory for invariant subspaces, since it furnishes an equation that an invariant subspace must satisfy. To obtain perturbation bounds all one has to do is solve the equation.
Equation (1.3) is known variously as Sylvester's equation and Rosenblum's
equation [188, 1956]. The proof of the existence of a solution (Theorem 1.3)
is constructive and can serve as a basis for efficient algorithms for solving
the equation [11, Sg]. Integral representations of the solution have been
given by Rosenblum (Exercise 1.4) and Bhatia, Davis, and McIntosh [31]
(Exercise 1.5).
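The constructive proof of Theorem 1.3 translates directly into a column-by-column solver. The Python sketch below assumes that B has already been reduced to upper triangular form T (that is, it takes V = I and skips the Schur decomposition, which a practical code would compute first), and solves each shifted system (1.5)-(1.6) by Gaussian elimination with partial pivoting. The function names are hypothetical, and the code only mirrors the proof; it is not meant as a production algorithm.

```python
def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[p][c] == 0:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = s / aug[i][i]
    return y


def sylvester_upper(A, T, C):
    """Solve A Y - Y T = C for Y (n x m), assuming T (m x m) is upper triangular
    and no diagonal entry of T is an eigenvalue of A; follows (1.5)-(1.6)."""
    n, m = len(A), len(T)
    cols = []                                   # cols[k] holds the column y_k
    for k in range(m):
        # right-hand side d_k + sum_{i<k} t_ik y_i
        rhs = [C[r][k] + sum(T[i][k] * cols[i][r] for i in range(k)) for r in range(n)]
        shifted = [[A[r][s] - (T[k][k] if r == s else 0.0) for s in range(n)] for r in range(n)]
        cols.append(solve(shifted, rhs))
    return [[cols[k][r] for k in range(m)] for r in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 6.0]]   # eigenvalues in [2, 8] (Gershgorin)
T = [[1.0, 2.0], [0.0, -1.0]]                               # upper triangular, eigenvalues 1, -1
X_true = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
AX, XT = matmul(A, X_true), matmul(X_true, T)
C = [[AX[i][j] - XT[i][j] for j in range(2)] for i in range(3)]
Y = sylvester_upper(A, T, C)
print(max(abs(Y[i][j] - X_true[i][j]) for i in range(3) for j in range(2)))
```

Here C is manufactured from a known X, so the printed maximum error should be at roundoff level.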
The possibility of spectral resolutions into more than two blocks is treated in the exercises below. The ultimate spectral resolution is the Jordan canonical form (cf. I.3.4), in which the blocks are as small and simple as possible. However, the transformations which produce the Jordan form may be too ill conditioned to make it usable. This has led some algorithmists to seek resolutions in which the blocks are nearly as small as possible, given a bound on the condition of the transformations (e.g., see [95, 189, 18, 128]).
Exercises
1. Given a (not necessarily orthonormal) basis for an invariant subspace of
A, describe in detail how to compute its spectral resolution.
THE FOLLOWING EXERCISES CONCERN THE SOLUTION OF SYLVESTER'S EQUATION $AX - XB = C$ AND THE ASSOCIATED OPERATOR $\mathbf T: X \mapsto AX - XB$.
2. Assuming A and B are diagonalizable, show that $\mathbf T$ has a complete system of eigenvectors. Use this fact to give an alternative proof of Theorem 1.3.
3. (Matrix representation of $\mathbf T$.) Let $X = (x_1\ \cdots\ x_m)$ and define
$$\operatorname{vec}(X) = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}.$$
Show that
$$\operatorname{vec}[\mathbf T(X)] = (I_m \otimes A - B^T \otimes I_n)\operatorname{vec}(X),$$
where ⊗ is the Kronecker product defined in Exercise I.3.26.
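The identity of Exercise 3 can be checked on small integer matrices, where the arithmetic is exact. In the Python sketch below, vec stacks columns, kron is a plain Kronecker product, and A, B, X are arbitrary hypothetical data; note that it is $B^T$, not B, that appears in the second Kronecker factor.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def kron(X, Y):
    """Kronecker product: block (i, j) of the result is X[i][j] * Y."""
    p, q = len(Y), len(Y[0])
    return [[X[i // p][j // q] * Y[i % p][j % q] for j in range(len(X[0]) * q)]
            for i in range(len(X) * p)]


def vec(X):
    """Stack the columns of X into one column (returned as a flat list)."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def eye(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[1, 2], [3, 4]]                    # n = 2
B = [[0, 1, 0], [2, 0, 1], [1, 1, 1]]   # m = 3
X = [[1, -1, 2], [0, 3, 5]]             # n x m

lhs = vec([[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(matmul(A, X), matmul(X, B))])
K = [[a - b for a, b in zip(r1, r2)]
     for r1, r2 in zip(kron(eye(3), A), kron(transpose(B), eye(2)))]
rhs = [sum(K[i][k] * v for k, v in enumerate(vec(X))) for i in range(len(K))]
print(lhs == rhs)   # True
```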
4. (Rosenblum [188]). Let Γ be a simple closed curve containing $\mathcal L(B)$ and excluding $\mathcal L(A)$. Show that
$$X = -\frac{1}{2\pi i}\oint_\Gamma (A - \zeta I)^{-1}C(B - \zeta I)^{-1}\,d\zeta.$$
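5. (Bhatia, Davis, and McIntosh [31]). Let A and B be normal with $\mathcal L(A) \cap \mathcal L(B) = \emptyset$, and let $\delta = \min|\mathcal L(A) - \mathcal L(B)|$. Let $A = A_{\Re} + iA_{\Im}$, where $A_{\Re}$ and $A_{\Im}$ are Hermitian, and similarly for B. For $t = (\tau_1\ \tau_2)^T \in \mathbb{R}^2$, let
$$U(t) = e^{i(\tau_1A_{\Re} + \tau_2A_{\Im})} \quad\text{and}\quad V(t) = e^{i(\tau_1B_{\Re} + \tau_2B_{\Im})}.$$
Show that if φ is any function integrable on $\mathbb{R}^2$ satisfying
$$\int_{\mathbb{R}^2} e^{-it^Tx}\phi(x)\,dx = \frac{1}{\tau_1 + i\tau_2}, \qquad \|t\|_2 \ge \delta,$$
then
$$X = \int_{\mathbb{R}^2} U(-t)\,C\,V(t)\,\phi(t)\,dt.$$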
-0-
THESE EXERCISES DEVELOP THE PROPERTIES OF SPECTRAL RES-
OLUTIONS WITH MORE THAN TWO BLOCKS.
6. Let the distinct eigenvalues of A be $\lambda_1, \ldots, \lambda_k$. Show that there are matrices $X = (X_1\ \cdots\ X_k)$ and $Y = (Y_1\ \cdots\ Y_k)$ such that $X^{-1} = Y^H$ and
$$Y^HAX = \operatorname{diag}(L_1, \ldots, L_k),$$
where $\mathcal L(L_i) = \{\lambda_i\}$. Conclude that
$$A = X_1L_1Y_1^H + \cdots + X_kL_kY_k^H.$$
7. Let $P_i = X_iY_i^H$ $(i = 1, \ldots, k)$. Show that the $P_i$ are (oblique) projections satisfying $P_iP_j = 0$ $(i \ne j)$ and
$$A = P_1AP_1 + \cdots + P_kAP_k.$$
8. Let $\phi(A)$ be defined as in Exercise II.2.20. Show that
$$\phi(A) = X_1\phi(L_1)Y_1^H + \cdots + X_k\phi(L_k)Y_k^H.$$
[Note: This exercise shows that the evaluation of $\phi(A)$ may be reduced to the evaluation of $\phi(L_i)$ $(i = 1, \ldots, k)$. Since the orders of the $L_i$ may be much less than the order of A, this reduction may save a great deal of work.]
-0-
2. Perturbation of Invariant Subspaces
In this section we will treat two closely related problems. The first is the problem of assessing the accuracy of an approximate invariant subspace in terms of a residual. Specifically, let the columns of $X_1$ form an orthonormal basis for an approximate invariant subspace of A. Let $L_1 = X_1^HAX_1$, and let $R = AX_1 - X_1L_1$ be the associated residual. If R = 0, then $\mathcal R(X_1)$ is an invariant subspace of A. This suggests that if R is sufficiently small there will be an invariant subspace $\hat{\mathcal X}_1$ of A that approaches $\mathcal R(X_1)$ as R approaches zero. The problem is to bound the difference in terms of R.

The second problem is our usual perturbation problem. Let $\mathcal X_1$ be a simple invariant subspace of A and let $\tilde A = A + E$. Show that for sufficiently small E there is an invariant subspace $\tilde{\mathcal X}_1$ of $\tilde A$ that approaches $\mathcal X_1$ as E approaches zero, and bound their difference in terms of E.

The two problems are closely related. For example, if the orthonormal columns of $X_1$ span an invariant subspace of A and we set $M = X_1^HAX_1$, then $R = \tilde AX_1 - X_1M = EX_1$. Thus for any unitarily invariant norm, $\|R\| \le \|E\|$, and we may use any residual bound to determine how near $\mathcal R(X_1)$ is to an invariant subspace of $\tilde A$. In fact, this is the general approach we will take in this section: first establish a residual bound, then derive a perturbation bound from it.
2.1. The Approximation Problem

Let $(X_1\ Y_2)$ be a unitary matrix and let
$$(X_1\ Y_2)^HA(X_1\ Y_2) = \begin{pmatrix} L_1 & H \\ G & L_2 \end{pmatrix}. \tag{2.1}$$
By Theorem 1.1 the space $\mathcal R(X_1)$ is an invariant subspace if and only if $G = Y_2^HAX_1$ is zero. The problem treated in this section is to determine how near $\mathcal R(X_1)$ is to an invariant subspace of A when G is small.

The solution is conceptually simple, although fussy to realize. Let
$$\hat X_1 = (X_1 + Y_2P)(I + P^HP)^{-1/2} \tag{2.2}$$
and
$$\hat Y_2 = (Y_2 - X_1P^H)(I + PP^H)^{-1/2}, \tag{2.3}$$
where P is a matrix to be determined so that $\mathcal R(\hat X_1)$ is an invariant subspace of A. It is easy to see that $(\hat X_1\ \hat Y_2)$ is unitary. Hence by Theorem 1.1, $\mathcal R(\hat X_1)$ is an invariant subspace of A if and only if $\hat Y_2^HA\hat X_1 = 0$. Writing this condition out in terms of (2.2) and (2.3) yields the following nonlinear equation for P:
$$PL_1 - L_2P = G - PHP. \tag{2.4}$$
If we define
$$\mathbf T: P \mapsto PL_1 - L_2P, \tag{2.5}$$
then this equation can be written
$$\mathbf T(P) = G - PHP. \tag{2.6}$$
The conditions under which this equation has a solution are given in the following theorem, in which ‖·‖ represents a consistent family of norms.

Theorem 2.1. Let $(X_1\ Y_2)$ be unitary. Let $L_1$, $L_2$, G, and H be defined by (2.1), and set
$$\gamma = \|G\|, \qquad \eta = \|H\|.$$
Assume that $\mathcal L(L_1) \cap \mathcal L(L_2) = \emptyset$, so that the operator $\mathbf T$ defined by (2.5) is nonsingular, and set
$$\delta = \operatorname{sep}(L_1, L_2) \equiv \inf_{\|P\| = 1}\|\mathbf T(P)\| > 0.$$
Then if
$$\frac{\gamma\eta}{\delta^2} < \frac14, \tag{2.7}$$
there is a unique solution P of (2.6) satisfying
$$\|P\| \le \frac{2\gamma}{\delta + \sqrt{\delta^2 - 4\gamma\eta}} < \frac{2\gamma}{\delta}. \tag{2.8}$$
If $\hat X_1$ and $\hat Y_2$ are defined by (2.2) and (2.3), then $\mathcal R(\hat X_1)$ and $\mathcal R(\hat Y_2)$ are simple right and left invariant subspaces of A. The representation of A with respect to $\hat X_1$ is
$$\hat L_1 = (I + P^HP)^{1/2}(L_1 + HP)(I + P^HP)^{-1/2}. \tag{2.9}$$
The representation of A with respect to $\hat Y_2$ is
$$\hat L_2 = (I + PP^H)^{-1/2}(L_2 - PH)(I + PP^H)^{1/2}. \tag{2.10}$$

Proof. The existence of a P satisfying (2.8) is a consequence of Theorem 2.11 at the end of this section. By construction $\mathcal R(\hat X_1)$ and $\mathcal R(\hat Y_2)$ are invariant subspaces. We will establish their simplicity later. It remains only to establish the representations (2.9) and (2.10).

The representation of A with respect to $\hat X_1$ is $\hat X_1^HA\hat X_1$. From (2.2),
$$\hat X_1^HA\hat X_1 = (I + P^HP)^{-1/2}(L_1 + HP + P^HG + P^HL_2P)(I + P^HP)^{-1/2}. \tag{2.11}$$
From (2.4),
$$L_2P = PL_1 + PHP - G.$$
If this value is substituted into (2.11), the result, after some simplification, is (2.9). The representation (2.10) follows similarly. ∎

We now turn to an extended discussion of the theorem. The first point to consider is the interpretation of P and its norm. By Corollary I.5.4 the singular values of the matrix
$$Y_2^H\hat X_1 = P(I + P^HP)^{-1/2}$$
are the sines of the canonical angles $\theta_1, \theta_2, \ldots$ between $\mathcal R(X_1)$ and the invariant subspace $\mathcal R(\hat X_1)$. If $\pi_1, \pi_2, \ldots$ are the singular values of P, then
$$\frac{\pi_i}{\sqrt{1 + \pi_i^2}} = \sin\theta_i.$$
Hence
$$\pi_i = \tan\theta_i.$$
Thus, loosely speaking, the theorem bounds the tangents of the canonical angles between $\mathcal R(X_1)$ and an invariant subspace of A. In particular, since $\sin\theta \le \tan\theta$ $(0 \le \theta < \pi/2)$, if ‖·‖ is the spectral norm, then
$$\|P_1 - \hat P_1\|_2 \le \frac{2\gamma_2}{\delta_2},$$
where $P_1$ and $\hat P_1$ are the orthogonal projections onto $\mathcal R(X_1)$ and $\mathcal R(\hat X_1)$. In terms of the Frobenius norm
$$\|P_1 - \hat P_1\|_F \le \frac{2\sqrt{2}\,\gamma_F}{\delta_F}$$
(see Theorem I.5.5).
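When $L_1$, $L_2$, G, and H are all 1×1 (A is 2×2, $X_1 = e_1$, $Y_2 = e_2$), $\operatorname{sep}(L_1, L_2) = |L_1 - L_2|$ and (2.4) is a scalar quadratic, so Theorem 2.1 can be checked directly. The Python sketch below uses hypothetical entries, takes the root of smaller magnitude (computed in a cancellation-free form), verifies that $(1, p)^T$ is an eigenvector, and compares $\tan\theta = |p|$ with the bound (2.8).

```python
import math


def small_root(l1, l2, g, h):
    """Root of smaller magnitude of h p^2 + (l1 - l2) p - g = 0, which is (2.4),
    P L1 - L2 P = G - P H P, when every block is 1 x 1."""
    b = l1 - l2
    disc = b * b + 4 * h * g
    if disc < 0:
        raise ValueError("no real solution")
    return 2 * g / (b + math.copysign(math.sqrt(disc), b))


l1, l2, g, h = 1.0, 3.0, 0.3, 2.0                  # hypothetical A = [[l1, h], [g, l2]]
gamma, eta, delta = abs(g), abs(h), abs(l1 - l2)   # sep(L1, L2) = |l1 - l2| for 1 x 1 blocks
assert gamma * eta / delta ** 2 < 0.25             # hypothesis (2.7)

p = small_root(l1, l2, g, h)
bound = 2 * gamma / (delta + math.sqrt(delta ** 2 - 4 * gamma * eta))   # bound (2.8)

# (1, p)^T spans an invariant subspace; the representation (2.9) reduces to l1 + h p.
lam = l1 + h * p
Av = (l1 + h * p, g + l2 * p)
print(abs(Av[1] - lam * p) < 1e-12)                # A (1, p)^T = lam (1, p)^T
print(abs(p), bound, 2 * gamma / delta)            # tan(theta) = |p| <= bound < 2 gamma / delta
```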
There are three numbers, γ, η, and δ, that determine the existence and size of P. The number γ is closely related to the residual
$$R = AX_1 - X_1L_1.$$
In fact, since $L_1 = X_1^HAX_1$, it follows that
$$(X_1\ Y_2)^HR = \begin{pmatrix} 0 \\ G \end{pmatrix}.$$
Hence if ‖·‖ is unitarily invariant, $\gamma = \|R\|$. Moreover, by Theorem IV.1.15, R is optimal in the sense that ‖R‖ is minimal among all residuals of the form $AX_1 - X_1L$.

The number $\eta = \|H\|$ is of only secondary importance in the bound (2.8). However, it features prominently in the condition $\gamma\eta/\delta^2 < 1/4$, which insures the existence of $\hat X_1$. Here it plays a role analogous to the quantity ν introduced at the beginning of the second subsection of Section IV.1. If η is small, the eigenvalues of $L_1$ and $L_2$ are effectively uncoupled. On the other hand, if η is large, the eigenvalues of $L_1$ and $L_2$ look like a single cluster compared to ‖A‖, and the residual must be very small for our theorem to hold.
Before we go on to discuss the meaning of the number $\delta$, it will be worth our while to recast part of the theorem in the less cluttered form of a residual bound. The key is to note that if we set $S = X_1^H A - L_1 X_1^H$, then $\|S\| = \|H\|$ for any unitarily invariant norm. In the following corollary we change our notation slightly.

Corollary 2.2. Let $\|\cdot\|$ be a unitarily invariant norm. Let $(X\ Y)$ be unitary and $\mathcal X = \mathcal R(X)$. Let

$$L = X^H A X \quad\text{and}\quad M = Y^H A Y.$$

Finally, let

$$R = AX - XL \quad\text{and}\quad S = X^H A - L X^H.$$

If

$$\frac{\|R\|\,\|S\|}{\operatorname{sep}(L, M)^2} < \frac14,$$

then there is a simple invariant subspace $\hat{\mathcal X}$ of $A$ such that

$$\|\tan\Theta(\mathcal X, \hat{\mathcal X})\| < \frac{2\|R\|}{\operatorname{sep}(L, M)}.$$
We now turn to the number $\delta = \operatorname{sep}(L_1, L_2)$, which is the thing that makes the whole theory work. As the name "sep" indicates, it is related to the separation of $\mathcal L(L_1)$ and $\mathcal L(L_2)$.

Theorem 2.3. For any square matrices $L$ and $M$,

$$\operatorname{sep}(L, M) \le \min|\mathcal L(L) - \mathcal L(M)|. \tag{2.12}$$

Proof. As above, set $T = P \mapsto PL - MP$. If $\operatorname{sep}(L, M) = 0$, the inequality (2.12) holds trivially. Otherwise, $T$ is nonsingular, and

$$\operatorname{sep}^{-1}(L, M) = \sup_{\|Q\| = 1}\|T^{-1}(Q)\| = \|T^{-1}\|.$$

Now by Theorem II.2.6 the spectral radius of $T^{-1}$ is bounded by $\|T^{-1}\|$. By Corollary 1.4, $\mathcal L(T) = \mathcal L(L) - \mathcal L(M)$. Hence

$$\operatorname{sep}^{-1}(L, M) \ge \rho(T^{-1}) = \max|\mathcal L(L) - \mathcal L(M)|^{-1},$$

which is equivalent to (2.12). ∎
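Theorem 2.3 can be illustrated in the Frobenius norm, where sep has a concrete matrix form: with column-major stacking, $\operatorname{vec}(PL - MP) = (L^T \otimes I_m - I_l \otimes M)\operatorname{vec}(P)$, so $\operatorname{sep}_F(L, M)$ is the smallest singular value of that Kronecker matrix. The following Python sketch (NumPy assumed; the helper name `sep_F` and the random test matrices are ours, not the text's) computes it and compares it with $\min|\mathcal L(L) - \mathcal L(M)|$.

```python
import numpy as np

def sep_F(L, M):
    """sep(L, M) in the Frobenius norm: min over ||P||_F = 1 of ||P L - M P||_F.

    With column-major vec, vec(P L - M P) = (L^T kron I_m - I_l kron M) vec(P),
    so sep_F is the smallest singular value of that Kronecker matrix.
    """
    l, m = L.shape[0], M.shape[0]
    K = np.kron(L.T, np.eye(m)) - np.kron(np.eye(l), M)
    return np.linalg.svd(K, compute_uv=False)[-1]

rng = np.random.default_rng(2)
L = rng.standard_normal((3, 3))
M = rng.standard_normal((4, 4))

# Smallest distance between the spectra, min |L(L) - L(M)|.
gap = np.min(np.abs(np.subtract.outer(np.linalg.eigvals(L), np.linalg.eigvals(M))))
print(sep_F(L, M), gap)   # Theorem 2.3: the first number does not exceed the second
```

Forming the Kronecker matrix costs $O((lm)^3)$ operations, so this is only practical for small blocks; it is meant to make the definition concrete, not to serve as an efficient estimator of sep.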
Theorem 2.3 and the bound (2.8) imply that if some of the eigenvalues of $L_1$ and $L_2$ are close, then the invariant subspace $\mathcal R(\hat X_1)$ may be distant from $\mathcal R(X_1)$. However, the converse need not be true, since $\operatorname{sep}(L_1, L_2)$ can be small when the eigenvalues of $L_1$ and $L_2$ are well separated, as the following example shows.

Example 2.4. Let

$$L = (\quad\quad)$$

and

$$M = (\quad\quad).$$

Then $\mathcal L(L) = \{0\}$ and $\mathcal L(M) = \{\pm d\}$, so that $\Delta\lambda = \min|\mathcal L(L) - \mathcal L(M)| = d$. On the other hand, $\operatorname{sep}(L, M) = \epsilon$. Thus $\operatorname{sep}(L, M)$ can be arbitrarily smaller than the distance between the eigenvalues of $L$ and those of $M$, in the sense that $\lim_{\epsilon \to 0}\operatorname{sep}(L, M)/\Delta\lambda = 0$.

The distance $\Delta\lambda$ between the eigenvalues of $L$ and those of $M$ is necessarily a continuous function of the elements of $L$ and $M$; however, it need not be analytic. For example, perturbing $M$ in the above example so that it becomes equal to $L$ changes the (2,2)-element of $M$ by $\epsilon$. However, it changes $\Delta\lambda$ by $\epsilon^{1/2}$. The function $\operatorname{sep}(L, M)$ is better behaved.
Theorem 2.5.

$$\operatorname{sep}(L, M) - \|E\| - \|F\| \le \operatorname{sep}(L + E, M + F) \le \operatorname{sep}(L, M) + \|E\| + \|F\|.$$

Remark 2.6. If the norm $\|\cdot\|$ is unitarily invariant, then we may replace $\|E\|$ and $\|F\|$ by $\|E\|_2$ and $\|F\|_2$.

Proof. For the first inequality,

$$\begin{aligned}
\operatorname{sep}(L + E, M + F) &= \inf_{\|P\| = 1}\|P(L + E) - (M + F)P\| \\
&\ge \inf_{\|P\| = 1}\{\|PL - MP\| - \|PE\| - \|FP\|\} \\
&\ge \inf_{\|P\| = 1}\{\|PL - MP\|\} - \|E\| - \|F\| \\
&= \operatorname{sep}(L, M) - \|E\| - \|F\|.
\end{aligned}$$

The second inequality is established similarly. ∎
Thus a perturbation in L or M cannot induce a larger perturbation
in sep(L, M). This stability of the function sep will be important in
establishing the perturbation bounds of the next subsection.
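The stability asserted by Theorem 2.5 and Remark 2.6 can be observed numerically. This sketch (NumPy assumed) computes $\operatorname{sep}_F$ through the Kronecker form of $T$ and checks that random perturbations $E$ and $F$ change $\operatorname{sep}_F(L, M)$ by no more than $\|E\|_2 + \|F\|_2$; the matrices and perturbation sizes are arbitrary test data.

```python
import numpy as np

def sep_F(L, M):
    """Frobenius-norm sep(L, M): smallest singular value of L^T kron I - I kron M."""
    l, m = L.shape[0], M.shape[0]
    K = np.kron(L.T, np.eye(m)) - np.kron(np.eye(l), M)
    return np.linalg.svd(K, compute_uv=False)[-1]

rng = np.random.default_rng(3)
L, M = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
base = sep_F(L, M)

checks = []
for scale in (1e-1, 1e-3, 1e-6):
    E = scale * rng.standard_normal((3, 3))
    F = scale * rng.standard_normal((4, 4))
    change = abs(sep_F(L + E, M + F) - base)
    bound = np.linalg.norm(E, 2) + np.linalg.norm(F, 2)   # Remark 2.6: spectral norms
    checks.append(change <= bound)
    print(f"scale={scale:.0e}  |change|={change:.2e}  bound={bound:.2e}")
```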
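The representations (2.9) and (2.10) imply that

$$\mathcal L(A) = \mathcal L(L_1 + HP) \cup \mathcal L(L_2 - PH).$$

By (2.8)

$$\|PH\|,\ \|HP\| < \frac{2\gamma\eta}{\delta},$$

and it follows from Theorem 2.5 that

$$\operatorname{sep}(L_1 + HP,\ L_2 - PH) \ge \delta - \frac{4\gamma\eta}{\delta} > 0, \tag{2.13}$$

the last inequality following from (2.7). By Theorem 2.3 and the similarities (2.9) and (2.10), this implies that $\mathcal L(\hat L_1) \cap \mathcal L(\hat L_2) = \emptyset$, which shows that $\mathcal R(\hat X_1)$ and $\mathcal R(\hat Y_2)$ are simple invariant subspaces.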
An important consequence of the approximation theorem is a new class of residual bounds for the eigenvalues of $A$. Specifically, we have

$$\|(L_1 + HP) - L_1\| = \|HP\| \le \frac{2\gamma\eta}{\delta}.$$

Since $L_1$ is known and $\mathcal L(L_1 + HP) \subset \mathcal L(A)$, we may use the perturbation techniques of the last chapter to bound the distance between a subset of the eigenvalues of $A$ and the eigenvalues of $L_1$.
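To see the inclusion $\mathcal L(L_1 + HP) \subset \mathcal L(A)$ and the bound $\|HP\| \le 2\gamma\eta/\delta$ at work, the sketch below (NumPy assumed) builds a hypothetical partitioned matrix $A$ with $(X_1\ Y_2) = I$ and a small block $G$, obtains $P$ from the computed eigenvectors belonging to the eigenvalues near $\mathcal L(L_1)$, and measures everything in the Frobenius norm. Computing $P$ from an eigendecomposition is a shortcut for the illustration, not the method of the text.

```python
import numpy as np

def sep_F(L1, L2):
    """Frobenius-norm sep(L1, L2) = min ||P L1 - L2 P||_F over ||P||_F = 1."""
    k, m = L1.shape[0], L2.shape[0]
    K = np.kron(L1.T, np.eye(m)) - np.kron(np.eye(k), L2)
    return np.linalg.svd(K, compute_uv=False)[-1]

rng = np.random.default_rng(4)
k, m = 2, 3
L1 = np.array([[5.0, 1.0], [0.0, 6.0]])
L2 = 0.5 * rng.standard_normal((m, m))
H = rng.standard_normal((k, m))
G = 1e-3 * rng.standard_normal((m, k))
A = np.block([[L1, H], [G, L2]])              # already partitioned: (X1 Y2) = I

gamma, eta, delta = np.linalg.norm(G), np.linalg.norm(H), sep_F(L1, L2)
assert gamma * eta / delta**2 < 0.25          # condition (2.7)

# P from the exact invariant subspace belonging to the eigenvalues near L(L1) = {5, 6}.
lam, V = np.linalg.eig(A)
idx = np.argsort(lam.real)[-k:]
P = V[k:, idx] @ np.linalg.inv(V[:k, idx])

print("eig(L1 + HP):", np.sort_complex(np.linalg.eigvals(L1 + H @ P)))
print("selected eig(A):", np.sort_complex(lam[idx]))
print("||HP||_F =", np.linalg.norm(H @ P), "<= 2*gamma*eta/delta =", 2 * gamma * eta / delta)
```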
Comparing these bounds with the bounds from Theorem IV.1.13, we see that the approaches are different and give different results. In the case of residual bounds, we apply perturbation theory to a manufactured perturbation $A + E$ to relate the eigenvalues of $L_1$ ($M$ in Theorem IV.1.13) to those of $A$. In the approximation theorem, we apply perturbation theory to $L_1$ to relate its eigenvalues to those of $L_1 + HP$, which are a subset of the eigenvalues of $A$. Moreover, the perturbations are of different sizes: $\gamma$ in the case of the residual bounds, and at most $2\gamma\eta/\delta$ in the approximation theorem. Which is better will depend on the application.
2.2. Perturbation Theorems
The key to obtaining perturbation theorems for invariant subspaces is to combine the approximation theorem with continuity of the measure sep. Specifically, we have the following theorem, whose proof is left as an exercise.
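Theorem 2.7. Let $(X_1\ Y_2)$ be unitary and suppose that $\mathcal R(X_1)$ is a simple invariant subspace of $A$, so that

$$(X_1\ Y_2)^H A (X_1\ Y_2) = \begin{pmatrix} L_1 & H \\ 0 & L_2 \end{pmatrix}.$$

Given a perturbation $E$, let

$$(X_1\ Y_2)^H E (X_1\ Y_2) = \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}.$$

Let $\|\cdot\|$ represent a consistent family of norms and set

$$\tilde\gamma = \|E_{21}\|, \qquad \tilde\eta = \|H\| + \|E_{12}\|, \qquad \tilde\delta = \operatorname{sep}(L_1, L_2) - \|E_{11}\| - \|E_{22}\|.$$

If $\tilde\delta > 0$ and

$$\frac{\tilde\gamma\tilde\eta}{\tilde\delta^2} < \frac14,$$

there is a unique matrix $P$ satisfying

$$\|P\| \le \frac{2\tilde\gamma}{\tilde\delta + \sqrt{\tilde\delta^2 - 4\tilde\gamma\tilde\eta}} < \frac{2\tilde\gamma}{\tilde\delta}$$

such that the columns of

$$\tilde X_1 = (X_1 + Y_2 P)(I + P^H P)^{-1/2}$$

and

$$\tilde Y_2 = (Y_2 - X_1 P^H)(I + P P^H)^{-1/2}$$

form orthonormal bases for simple right and left invariant subspaces of $\tilde A = A + E$. The representation of $\tilde A$ with respect to $\tilde X_1$ is

$$\tilde L_1 = (I + P^H P)^{1/2}[L_1 + E_{11} + (H + E_{12})P](I + P^H P)^{-1/2},$$

and the representation of $\tilde A$ with respect to $\tilde Y_2$ is

$$\tilde L_2 = (I + P P^H)^{-1/2}[L_2 + E_{22} - P(H + E_{12})](I + P P^H)^{1/2}.$$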
The comments following Theorem 2.1 apply here. In particular, the singular values of $P$ are the tangents of the canonical angles between $\mathcal R(X_1)$ and $\mathcal R(\tilde X_1)$.

The expressions for $\tilde L_1$ and $\tilde L_2$ are somewhat awkward to interpret. This is because of the way we have chosen to express $\tilde X_1$ and $\tilde Y_2$. If we choose different expressions, we will obtain different bases for the perturbed invariant subspaces and hence different representations for $\tilde A$ on those subspaces.
A good choice, it turns out, is to express $\tilde X_1$ in terms of the spectral resolution of $A$. Specifically, let $(X_1\ X_2)^{-1} = (Y_1\ Y_2)^H$ and

$$(Y_1\ Y_2)^H A (X_1\ X_2) = \begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix}, \tag{2.14}$$

as in (1.9). If we seek $\tilde X_1$ and $\tilde Y_2$ in the forms

$$\tilde X_1 = X_1 + X_2 P \quad\text{and}\quad \tilde Y_2 = Y_2 - Y_1 P^H,$$

then we have the following theorem.
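Theorem 2.8. Let $A$ have the spectral resolution (2.14) and set

$$(Y_1\ Y_2)^H E (X_1\ X_2) = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}.$$

Let $\|\cdot\|$ represent a consistent family of norms, and set

$$\tilde\gamma = \|F_{21}\|, \qquad \tilde\eta = \|F_{12}\|, \qquad \tilde\delta = \operatorname{sep}(L_1, L_2) - \|F_{11}\| - \|F_{22}\|.$$

If $\tilde\delta > 0$ and

$$\frac{\tilde\gamma\tilde\eta}{\tilde\delta^2} < \frac14,$$

there is a unique matrix $P$ satisfying

$$\|P\| \le \frac{2\tilde\gamma}{\tilde\delta + \sqrt{\tilde\delta^2 - 4\tilde\gamma\tilde\eta}} < \frac{2\tilde\gamma}{\tilde\delta},$$

such that the columns of

$$\tilde X_1 = X_1 + X_2 P \quad\text{and}\quad \tilde Y_2 = Y_2 - Y_1 P^H$$

form bases for simple right and left invariant subspaces of $\tilde A = A + E$. The representation of $\tilde A$ with respect to $\tilde X_1$ is

$$\tilde L_1 = L_1 + F_{11} + F_{12}P,$$

and the representation of $\tilde A$ with respect to $\tilde Y_2$ is

$$\tilde L_2 = L_2 + F_{22} - PF_{12}.$$

Proof. Apply the approximation theorem to eliminate $F_{21}$ in the matrix

$$(Y_1\ Y_2)^H (A + E)(X_1\ X_2) = \begin{pmatrix} L_1 + F_{11} & F_{12} \\ F_{21} & L_2 + F_{22} \end{pmatrix}.$$

∎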
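The identities of Theorem 2.8 can be verified on a small example. The following sketch (NumPy assumed; the spectral resolution, the perturbation, and the rule for picking the perturbed eigenvalues are illustrative choices) computes $\tilde X_1$ from an eigendecomposition of $\tilde A$, normalizes it so that $Y_1^H\tilde X_1 = I$, sets $P = Y_2^H\tilde X_1$, and checks $\tilde X_1 = X_1 + X_2P$, $\tilde A\tilde X_1 = \tilde X_1\tilde L_1$, and $\tilde Y_2^H\tilde A = \tilde L_2\tilde Y_2^H$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 5, 2
L1 = np.array([[5.0, 1.0], [0.0, 6.0]])
L2 = 0.5 * rng.standard_normal((n - k, n - k))
X = np.eye(n) + 0.3 * rng.standard_normal((n, n))
Yh = np.linalg.inv(X)                         # (Y1 Y2)^H = (X1 X2)^{-1}
D = np.zeros((n, n))
D[:k, :k], D[k:, k:] = L1, L2
A = X @ D @ Yh                                # spectral resolution (2.14)
X1, X2, Y1h, Y2h = X[:, :k], X[:, k:], Yh[:k], Yh[k:]

E = 1e-3 * rng.standard_normal((n, n))
At = A + E
F = Yh @ E @ X
F11, F12, F22 = F[:k, :k], F[:k, k:], F[k:, k:]

# Basis of the perturbed right invariant subspace, normalized so that Y1^H Xt1 = I.
lam, V = np.linalg.eig(At)
Vs = V[:, np.argsort(lam.real)[-k:]]          # eigenvalues of At near L(L1) = {5, 6}
Xt1 = Vs @ np.linalg.inv(Y1h @ Vs)
P = Y2h @ Xt1                                 # then Xt1 = X1 + X2 P

Lt1 = L1 + F11 + F12 @ P                      # representation on Xt1
Lt2 = L2 + F22 - P @ F12                      # representation on Yt2
Yt2h = Y2h - P @ Y1h                          # Yt2^H for Yt2 = Y2 - Y1 P^H

checks = [np.allclose(Xt1, X1 + X2 @ P),
          np.allclose(At @ Xt1, Xt1 @ Lt1),
          np.allclose(Yt2h @ At, Lt2 @ Yt2h)]
print(checks)
```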
Remark 2.9. Since $X_1$ and $Y_2$ are the same in Theorems 2.7 and 2.8, we have $E_{21} = F_{21}$. Hence if $P_1$ denotes the matrix $P$ produced by Theorem 2.7 and $P_2$ denotes the matrix $P$ produced by Theorem 2.8, then

$$P_1(L_1 + E_{11}) - (L_2 + E_{22})P_1 = E_{21} - P_1(H + E_{12})P_1$$

and

$$P_2(L_1 + F_{11}) - (L_2 + F_{22})P_2 = F_{21} - P_2F_{12}P_2.$$

Since these two equations differ in terms of order $O(\|E\|^2)$, we have the remarkable result:

$$P_1 = P_2 + O(\|E\|^2).$$
It is instructive to consider the difference between Theorem 2.7 and Theorem 2.8 from a different point of view. There is no unique basis for the subspace $\tilde{\mathcal X}_1$. However, if we choose a matrix $Z$ whose columns span a subspace that is acute to $\tilde{\mathcal X}_1$, the normalization

$$Z^H \tilde X_1 = I,$$

along with $\mathcal R(\tilde X_1) = \tilde{\mathcal X}_1$, uniquely determines $\tilde X_1$. In Theorem 2.7 we require the normalization

$$X_1^H \tilde X_1 = I + O(\|E\|^2), \tag{2.15}$$

whereas in Theorem 2.8 we require

$$Y_1^H \tilde X_1 = I + O(\|E\|^2). \tag{2.16}$$
Which is the better theorem? It depends on the application. If the angles between the invariant subspaces themselves are the chief concern, it does not make a great deal of difference, since the matrices $P$ produced by the two theorems are the same up to second-order terms; the difference is in the way they are used to get a basis for $\mathcal R(\tilde X_1)$.

On the other hand, if the representations of $\tilde A$ on the perturbed subspaces are of interest, then Theorem 2.8 is the more natural. In the expression $\tilde L_1 = L_1 + F_{11} + F_{12}P$, the matrix $F_{12}P$ is of order $\|E\|^2$. Since $F_{11} = Y_1^H E X_1$, we have

$$\tilde L_1 = L_1 + Y_1^H E X_1 + O(\|E\|^2).$$
Since $Y_1^H X_1 = I$, this equation is completely analogous to the relation

$$\tilde\lambda = \lambda + y^H E x + O(\|E\|^2),$$

which we derived in Theorem IV.2.3. Moreover, if we write $\tilde L_1 = Y_1^H(A + E)X_1 + O(\|E\|^2)$ and compare this expression with (IV.2.7), we see that the matrix $Y_1^H(A + E)X_1$ is a generalization of the Rayleigh quotient, which we will call the GENERALIZED RAYLEIGH QUOTIENT.

Furthermore, since $X_1^H X_1 = I$, for any unitarily invariant norm

$$\|\tilde L_1 - L_1\| \lesssim \|Y_1\|_2\,\|E\|.$$

Thus $\|Y_1\|_2$ is a condition number for $L_1$. This number is also the norm of the spectral projection associated with the invariant subspace $\mathcal R(X_1)$, which justifies the statement that the condition number of an eigenvalue is the norm of its spectral projection. Moreover, by Theorem I.5.7, $\|Y_1\|_2 = \sec\theta_1$, where $\theta_1$ is the largest canonical angle between $\mathcal R(X_1)$ and $\mathcal R(Y_1)$. Hence,

    the condition number of $L_1$ (with respect to the normalization (2.16)) is the secant of the largest canonical angle between $\mathcal R(X_1)$ and $\mathcal R(Y_1)$.
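The claim that the generalized Rayleigh quotient $Y_1^H\tilde A X_1$ agrees with $\tilde L_1$ up to $O(\|E\|^2)$ can be observed by shrinking a fixed perturbation. In this sketch (NumPy assumed; the test matrices are arbitrary) the difference is exactly $F_{12}P$, so it should drop by roughly a factor of 100 each time $\|E\|$ drops by a factor of 10.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 5, 2
L1 = np.array([[5.0, 1.0], [0.0, 6.0]])
L2 = 0.5 * rng.standard_normal((n - k, n - k))
X = np.eye(n) + 0.3 * rng.standard_normal((n, n))
Yh = np.linalg.inv(X)                        # (Y1 Y2)^H = (X1 X2)^{-1}
D = np.zeros((n, n))
D[:k, :k], D[k:, k:] = L1, L2
A = X @ D @ Yh                               # spectral resolution (2.14)
X1, Y1h = X[:, :k], Yh[:k]
E0 = rng.standard_normal((n, n))

errors = []
for t in (1e-2, 1e-3, 1e-4):
    At = A + t * E0
    lam, V = np.linalg.eig(At)
    Vs = V[:, np.argsort(lam.real)[-k:]]     # eigenvalues of At near L(L1) = {5, 6}
    Xt1 = Vs @ np.linalg.inv(Y1h @ Vs)       # normalization Y1^H Xt1 = I
    Lt1 = Y1h @ At @ Xt1                     # representation of At on R(Xt1)
    grq = Y1h @ At @ X1                      # generalized Rayleigh quotient
    errors.append(np.linalg.norm(Lt1 - grq))
    print(f"t = {t:.0e}:  ||Lt1 - Y1^H At X1|| = {errors[-1]:.2e}")
```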
2.3. Eigenvectors
When the concern is with eigenvectors, the preceding results simplify considerably, since the operator $T$ (in an obvious specialization of the above notation) becomes the matrix $\lambda_1 I - L_2$. Hence for both theorems,

$$p \cong (\lambda_1 I - L_2)^{-1} Y_2^H E x_1.$$

For Theorem 2.7,

$$\tilde x_1 \cong x_1 + Y_2(\lambda_1 I - L_2)^{-1} Y_2^H E x_1,$$

while for Theorem 2.8

$$\tilde x_1 \cong x_1 + X_2(\lambda_1 I - L_2)^{-1} Y_2^H E x_1.$$
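Here is a small numerical check, assuming NumPy, of the first-order formula for Theorem 2.8. A test matrix with the simple eigenvalue $\lambda_1 = 6$ is built from an arbitrary nonsingular $(X_1\ X_2)$; the computed perturbed eigenvector is scaled so that $y_1^H\tilde x_1 = 1$, and its distance from $x_1 + X_2(\lambda_1 I - L_2)^{-1}Y_2^H E x_1$ should be of second order in $\|E\|$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
lam1 = 6.0                                   # simple eigenvalue of interest
L2 = np.diag([0.0, 1.0, 2.0, 3.0])           # remaining block (diagonal for simplicity)
X = np.eye(n) + 0.3 * rng.standard_normal((n, n))
Yh = np.linalg.inv(X)                        # rows are y1^H, then Y2^H
A = X @ np.diag([lam1, 0.0, 1.0, 2.0, 3.0]) @ Yh
x1, X2 = X[:, 0], X[:, 1:]
y1h, Y2h = Yh[0], Yh[1:]

# First-order correction for Theorem 2.8: x1 + X2 (lam1 I - L2)^{-1} Y2^H E x1.
S = X2 @ np.linalg.inv(lam1 * np.eye(n - 1) - L2) @ Y2h

E = 1e-6 * rng.standard_normal((n, n))
lam, V = np.linalg.eig(A + E)
v = V[:, np.argmin(np.abs(lam - lam1))]
xt1 = v / (y1h @ v)                          # normalization y1^H xt1 = 1

err_change = np.linalg.norm(xt1 - x1)                 # first order in ||E||
err_model = np.linalg.norm(xt1 - (x1 + S @ E @ x1))   # second order in ||E||
print(err_change, err_model)
```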
The matrix

$$(\lambda_1 I - A)^\# = X_2(\lambda_1 I - L_2)^{-1} Y_2^H$$

is called the GROUP INVERSE or DRAZIN GENERALIZED INVERSE of $\lambda_1 I - A$ (see Exercises III.1.23 and III.1.24). Since for Theorem 2.8

$$\|\tilde x_1 - x_1\| \lesssim \|(\lambda_1 I - A)^\#\|\,\|E\|,$$

the number $\|(\lambda_1 I - A)^\#\|$ is sometimes said to be a condition number for the eigenvector $x_1$; but, as we have seen above, this depends on the application. If the angles between the eigenvectors are the concern, then $\delta^{-1} = \|(\lambda_1 I - L_2)^{-1}\|$ is the condition number of the problem. But if we are interested in the difference between the eigenvectors when $y_1^H \tilde x_1 = 1$, then $\|(\lambda_1 I - A)^\#\|$ is the condition number. Here is an example of the latter application.
Example 2.10. A square, nonnegative matrix $A$ is said to be STOCHASTIC if $A\mathbf 1 = \mathbf 1$; i.e., its rows sum to one. Clearly, $\mathbf 1$ is a right eigenvector of $A$ corresponding to the eigenvalue one. Moreover, if $A$ is irreducible (see Example 2.7 for the definition), then one is a simple eigenvalue and hence has a unique, positive left eigenvector $p$ that satisfies $p^T\mathbf 1 = 1$. With this normalization, the components of $p$ can be regarded as probabilities.

Now if we perturb $A$ in such a way that $\tilde A$ remains an irreducible stochastic matrix, then we will want to keep the normalization $\tilde y^H\mathbf 1 = 1$, so that we can continue to regard the components of $\tilde y$ as probabilities. In this case the Drazin generalized inverse provides the condition number for the problem.
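Example 2.10 and Exercise 2.7 can be tried out on a random stochastic matrix. The sketch below assumes NumPy and computes the group inverse of $I - A$ by the formula $(I - A + \mathbf 1 y^T)^{-1} - \mathbf 1 y^T$, which is standard for irreducible stochastic matrices but is not derived in this text; the test below therefore also checks the defining properties of a group inverse. The perturbation $E$ is chosen with zero row sums so that $\tilde A = A + E$ stays stochastic.

```python
import numpy as np

def stationary(S):
    """Left eigenvector of S for the eigenvalue one, scaled so that p^T 1 = 1."""
    lam, W = np.linalg.eig(S.T)
    p = np.real(W[:, np.argmin(np.abs(lam - 1.0))])
    return p / p.sum()

rng = np.random.default_rng(8)
n = 5
A = rng.random((n, n)) + 0.1
A /= A.sum(axis=1, keepdims=True)        # positive with unit row sums: irreducible stochastic
one = np.ones(n)
y = stationary(A)

# Assumed formula for the group inverse of I - A (irreducible stochastic A):
# (I - A)^# = (I - A + 1 y^T)^{-1} - 1 y^T.
W1 = np.outer(one, y)
IA = np.eye(n) - A
IA_sharp = np.linalg.inv(IA + W1) - W1

# A perturbation that keeps the matrix stochastic (E has zero row sums).
B = rng.random((n, n))
B /= B.sum(axis=1, keepdims=True)
E = 0.05 * (B - A)
yt = stationary(A + E)

print(np.allclose(yt - y, yt @ E @ IA_sharp))   # the identity of Exercise 2.7
```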
The two theorems give us two bounds for eigenvalues: the first a bound for the eigenvalue itself, the second for the Rayleigh quotient. For the eigenvalue we have

$$|\tilde\lambda - \lambda| \le \left(1 + \frac{2\tilde\eta}{\tilde\delta}\right)\|E\|.$$

For the Rayleigh quotient we have

$$|\tilde\lambda - (\lambda + y_1^H E x_1)| \le \frac{2\tilde\gamma\tilde\eta}{\tilde\delta}.$$
2.4. Solution of a Nonlinear Equation

In this subsection we will prove a general theorem that can be used to establish the existence of $P$ in Theorem 2.1. We state and prove it for a Banach space, which the reader may take to be $\mathbb C^{m\times n}$.

Theorem 2.11. Let $T$ be a bounded linear operator on a Banach space $\mathcal B$. Assume that $T$ has a bounded inverse, and set

$$\delta = \|T^{-1}\|^{-1}.$$

Let $\varphi : \mathcal B \to \mathcal B$ be a function that satisfies

$$\|\varphi(x)\| \le \eta\|x\|^2$$

and

$$\|\varphi(x) - \varphi(y)\| \le 2\eta\max\{\|x\|, \|y\|\}\,\|x - y\|$$

for some $\eta \ge 0$. For any $g \in \mathcal B$, let

$$\gamma = \|g\|.$$

If

$$\rho = \frac{4\gamma\eta}{\delta^2} < 1,$$

then the sequence defined by $x_0 = 0$ and

$$x_{k+1} = T^{-1}[g + \varphi(x_k)], \qquad k = 0, 1, \ldots, \tag{2.17}$$

converges to the unique solution of

$$Tx = g + \varphi(x)$$

that satisfies

$$\|x\| \le \frac{2\gamma}{\delta + \sqrt{\delta^2 - 4\gamma\eta}} < \frac{2\gamma}{\delta}.$$

Moreover,

$$\|x_k - x\| \le \frac{\rho^k}{1 - \rho}\,\frac{\gamma}{\delta}.$$
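Proof. We first construct an upper bound on $\|x_k\|$. From (2.17),

$$\|x_{k+1}\| \le \|T^{-1}\|(\|g\| + \|\varphi(x_k)\|) \le \frac{\gamma}{\delta} + \frac{\eta}{\delta}\|x_k\|^2.$$

Consequently, if we set $\xi_0 = 0$ and

$$\xi_{k+1} = \frac{\gamma}{\delta} + \frac{\eta}{\delta}\xi_k^2, \qquad k = 0, 1, \ldots,$$

then $\|x_k\| \le \xi_k$.

Now the sequence $\xi_0, \xi_1, \ldots$ is clearly increasing. Moreover, if $\rho < 1$, the function

$$\psi(\xi) = \frac{\gamma}{\delta} + \frac{\eta}{\delta}\xi^2$$

has a smallest fixed point

$$\xi_* = \frac{2\gamma}{\delta + \sqrt{\delta^2 - 4\gamma\eta}}.$$

If $\xi_k \le \xi_*$, then $\xi_{k+1} = \psi(\xi_k) \le \psi(\xi_*) = \xi_*$. Hence all the $\xi_k$ are bounded by $\xi_*$, and the sequence $\{\xi_k\}$ must converge to $\xi_*$. Thus

$$\|x_k\| \le \xi_* < \frac{2\gamma}{\delta}.$$

We next show that the sequence $\{x_k\}$ converges. We have

$$\begin{aligned}
\|x_{k+1} - x_k\| &\le \|T^{-1}\|\,\|\varphi(x_k) - \varphi(x_{k-1})\| \\
&\le 2\delta^{-1}\eta\max\{\|x_k\|, \|x_{k-1}\|\}\,\|x_k - x_{k-1}\| \\
&\le \rho\,\|x_k - x_{k-1}\|.
\end{aligned}$$

Hence

$$\|x_{k+1} - x_k\| \le \rho^k\|x_1 - x_0\| \le \rho^k\frac{\gamma}{\delta}.$$

It follows that $\{x_k\}$ is a Cauchy sequence and must have a limit $x$. Moreover,

$$\|x - x_k\| \le \sum_{i=k}^\infty\|x_{i+1} - x_i\| \le \sum_{i=k}^\infty\rho^i\frac{\gamma}{\delta} = \frac{\rho^k}{1 - \rho}\,\frac{\gamma}{\delta}.$$

∎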
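The iteration (2.17) is easy to run for the Riccati equation of Theorem 2.1, where $T(P) = PL_1 - L_2P$, $g = G$, and $\varphi(P) = -PHP$, so that $\|\varphi(P)\| \le \eta\|P\|^2$ with $\eta = \|H\|$ in any consistent norm. The Python sketch below assumes NumPy and SciPy, measures everything in the Frobenius norm, and applies $T^{-1}$ with `scipy.linalg.solve_sylvester`, which solves $AX + XB = Q$; the test matrices are illustrative only.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def sep_F(L1, L2):
    """Frobenius-norm sep(L1, L2) = min ||P L1 - L2 P||_F over ||P||_F = 1."""
    k, m = L1.shape[0], L2.shape[0]
    K = np.kron(L1.T, np.eye(m)) - np.kron(np.eye(k), L2)
    return np.linalg.svd(K, compute_uv=False)[-1]

rng = np.random.default_rng(10)
k, m = 2, 3
L1 = np.array([[5.0, 1.0], [0.0, 6.0]])
L2 = 0.5 * rng.standard_normal((m, m))
H = rng.standard_normal((k, m))
G = 0.02 * rng.standard_normal((m, k))

# Theorem 2.11 with T(P) = P L1 - L2 P, phi(P) = -P H P, g = G, all in the Frobenius norm.
delta, gamma, eta = sep_F(L1, L2), np.linalg.norm(G), np.linalg.norm(H)
rho = 4 * gamma * eta / delta**2
assert rho < 1, "hypothesis rho < 1 of Theorem 2.11 fails for this data"

def T_inv(C):
    """Solve P L1 - L2 P = C, written as (-L2) P + P L1 = C for solve_sylvester."""
    return solve_sylvester(-L2, L1, C)

iterates = [np.zeros((m, k))]                  # x_0 = 0
for _ in range(60):
    Pk = iterates[-1]
    iterates.append(T_inv(G - Pk @ H @ Pk))    # iteration (2.17)
P = iterates[-1]

xi_star = 2 * gamma / (delta + np.sqrt(delta**2 - 4 * gamma * eta))
residual = np.linalg.norm(P @ L1 - L2 @ P - (G - P @ H @ P))
print(f"rho = {rho:.3f}, residual = {residual:.1e}, ||P||_F = {np.linalg.norm(P):.3e} <= {xi_star:.3e}")
```

The columns of $\begin{pmatrix} I \\ P\end{pmatrix}$ then span an invariant subspace of $\begin{pmatrix} L_1 & H \\ G & L_2\end{pmatrix}$, which the test below confirms.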
Notes and References

Although various results for the perturbation of eigenvectors are found scattered in the literature [274, 69, 269], the modern approach via invariant subspaces crystallized in the sixties and early seventies.

For Hermitian matrices, a little note by Swanson [232, 1961] contains in embryonic form much of what was to follow. The first clear statement that the problem really concerned invariant subspaces is due to Davis [50, 51, 1963, 1965]. The importance of Sylvester's equation emerged in the famous paper of Davis and Kahan [53, 1970], whose content more than justifies its impenetrability.

For nonnormal matrices the problem is complicated by two facts. First, there is no orthonormal system of eigenvectors. Second, the differences among the eigenvalues of a nonnormal matrix are not Lipschitz continuous. In his thesis, Varah [250, 1967] (see also [251]) ameliorated the first difficulty by working with spectral resolutions, whose transformations are in general better conditioned than the matrix of eigenvectors; however, his bounds have the distance between the spectra raised to a power in the denominator. Ruhe [190, 1970] proposed replacing this difference with the smallest singular value of a power of $A - \lambda I$ for suitable $\lambda$. The use of an orthogonal reduction to block triangular form to circumvent the first difficulty and the introduction of the function sep to circumvent the second is due to Stewart [200, 1971], who proved his theorems for closed operators in a Hilbert space. The exposition in this chapter is based on a later survey paper [202, 1973]. Lower bounds on the function sep have been given by Sun [226, 1984].

Yamamoto [275, 1980] exploits a different nonlinear equation and eigenpair to get component-wise bounds (Exercise 2.6).

The fact that the generalized Rayleigh quotient $Y_1^H \tilde A X_1$ provides a first-order approximation to the representation $\tilde L_1$ appears to have first been noted by Stewart [202, 1973], although it is readily derivable from standard perturbation expansions, such as are found in Kato [135, Ch. II]. The observation that a change in normalization [e.g., from (2.15) to (2.16)] leaves the multiplier $P$ essentially unaffected may be found in [210]. For eigenvectors, nonlinear normalizations are common; for example, one frequently requires that $\tilde x^H \tilde x = 1$. This complicates the asymptotic theory for complex matrices, since the normalization may not be analytic. Meyer and Stewart [155, 1988] have treated this problem in detail.

Owing to their structure, the perturbation theory for stochastic matrices can be developed independently of the theory here [194, 106, 47, 155] (see Exercise 2.7).
Exercises

THE FOLLOWING EXERCISES DEVELOP SOME OF THE ELEMENTARY PROPERTIES OF THE FUNCTION sep, WHICH WE SUPPOSE TO BE DEFINED WITH RESPECT TO A CONSISTENT FAMILY OF NORMS $\|\cdot\|$.

1. Show that if $X$ and $Y$ are nonsingular, then

$$\frac{\operatorname{sep}(A, B)}{\kappa(X)\kappa(Y)} \le \operatorname{sep}(XAX^{-1}, YBY^{-1}) \le \kappa(X)\kappa(Y)\operatorname{sep}(A, B).$$

2. Show that if $X$ and $Y$ are unitary and $\|\cdot\|$ is unitarily invariant, then

$$\operatorname{sep}(X^H A X, Y^H B Y) = \operatorname{sep}(A, B).$$

3. Show that if $A = \operatorname{diag}(A_1, \ldots, A_k)$ and $B = \operatorname{diag}(B_1, \ldots, B_l)$, then

$$\operatorname{sep}_F(A, B) = \min\{\operatorname{sep}_F(A_i, B_j) : i = 1, \ldots, k;\ j = 1, \ldots, l\}.$$

4. Show that if $A$ and $B$ are diagonalizable, then

$$\operatorname{sep}_F(A, B) \ge \frac{\min|\mathcal L(A) - \mathcal L(B)|}{\kappa_2(X)\kappa_2(Y)},$$

where $X$ and $Y$ are matrices of the eigenvectors of $A$ and $B$.

5. Let $\|\cdot\|_r$ and $\|\cdot\|_s$ be consistent norms satisfying

$$\sigma\|P\|_r \le \|P\|_s \le \tau\|P\|_r$$

for all $P \in \mathbb C^{n\times n}$. Show that

$$\operatorname{sep}_r(A, B) \ge \frac{\sigma}{\tau}\operatorname{sep}_s(A, B).$$

Use this fact to bound $\operatorname{sep}_2$ in terms of $\operatorname{sep}_F$.
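6. (Yamamoto [275].) Let $(x, \lambda)$ be a simple eigenpair of $A$, with $\|x\|_2 = 1$. Let $(\tilde x, \tilde\lambda) = (x - h, \lambda - \nu)$ be an approximate eigenpair. Set $\epsilon = \max\{\|h\|, |\nu|\}$. Show that if $\epsilon$ is sufficiently small then the matrix

$$\begin{pmatrix} A - \tilde\lambda I & -\tilde x \\ \tilde x^T & 0 \end{pmatrix}$$

is nonsingular and

$$\begin{pmatrix} A - \tilde\lambda I & -\tilde x \\ \tilde x^T & 0 \end{pmatrix}\begin{pmatrix} h \\ \nu \end{pmatrix} = \begin{pmatrix} -(A\tilde x - \tilde\lambda\tilde x) \\ \tfrac12(1 - \|\tilde x\|_2^2) \end{pmatrix} + O(\epsilon^2).$$

Analyze this equation to obtain an approximation theorem for eigenvectors.

7. Let $A$ and $\tilde A$ be stochastic matrices, each having one as a simple eigenvalue. Let $y^T$ and $\tilde y^T$ be the corresponding left eigenvectors, normalized so that $y^T\mathbf 1 = \tilde y^T\mathbf 1 = 1$. Show that, with $E = \tilde A - A$,

$$\tilde y^T - y^T = \tilde y^T E (I - A)^\#.$$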
3. Hermitian Matrices

We now turn to the perturbation of invariant subspaces of Hermitian matrices. In the next two subsections, we will apply the theory of the last section to Hermitian matrices: first the approximation theorem and then the perturbation theorem. In the third subsection we will develop part of the elegant Davis–Kahan theory of invariant subspaces. Finally, we will develop two residual bounds that in some cases are improvements on the bound of Theorem IV.4.14.

Throughout this section $A$ will be a Hermitian matrix of order $n$, as will the error matrix $E$.
3.1. The Approximation Theorem

When $A$ is Hermitian, several things simplify in the approximation theorem. In the first place, $H = G^H$. It follows that any unitary similarity that reduces $G$ to zero also reduces $H$ to zero, and hence that $\mathcal R(\hat Y_2)$ is the invariant subspace complementary to $\mathcal R(\hat X_1)$.

If $\|\cdot\|$ is unitarily invariant, then

$$\gamma = \|G\| = \|G^H\| = \|H\| = \eta,$$

and the condition $\gamma\eta/\delta^2 < 1/4$ becomes

$$\frac{\gamma}{\delta} < \frac12.$$

The most striking simplification occurs when we take $\|\cdot\|$ to be the Frobenius norm.
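Theorem 3.1. If $L$ and $M$ are Hermitian, then

$$\operatorname{sep}_F(L, M) = \min|\mathcal L(L) - \mathcal L(M)|.$$

Proof. As usual, let $T = P \mapsto PL - MP$. For any matrix $P = (p_1\ p_2\ \cdots\ p_l)$ let

$$\operatorname{vec}(P) = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_l \end{pmatrix}.$$

Then

$$\operatorname{vec}(PL) = \mathbf L\operatorname{vec}(P),$$

where, with $L = (\lambda_{ij})$,

$$\mathbf L = \begin{pmatrix} \lambda_{11}I_m & \lambda_{21}I_m & \cdots & \lambda_{l1}I_m \\ \lambda_{12}I_m & \lambda_{22}I_m & \cdots & \lambda_{l2}I_m \\ \vdots & \vdots & & \vdots \\ \lambda_{1l}I_m & \lambda_{2l}I_m & \cdots & \lambda_{ll}I_m \end{pmatrix}.$$

Similarly,

$$\operatorname{vec}(MP) = \mathbf M\operatorname{vec}(P),$$

where

$$\mathbf M = \operatorname{diag}(M, M, \ldots, M).$$

Hence

$$\operatorname{vec}[T(P)] = (\mathbf L - \mathbf M)\operatorname{vec}(P).$$

Since $L$ and $M$ are Hermitian, so is $\mathbf L - \mathbf M$; that is, the linear operator $T$ is Hermitian. Since $\|P\|_F = \|\operatorname{vec}(P)\|_2$,

$$\operatorname{sep}_F(L, M) = \inf_{\|P\|_F = 1}\|T(P)\|_F = \min|\mathcal L(T)| = \min|\mathcal L(L) - \mathcal L(M)|,$$

the last equality following from Corollary 1.4. ∎

Thus for Hermitian $A$, the number $\delta_F$ in the approximation theorem truly measures the distance between the eigenvalues of $L_1$ and those of $L_2$.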
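Theorem 3.1 is easy to confirm numerically with the same Kronecker representation of $T$ used in its proof. In this sketch (NumPy assumed; random complex Hermitian test matrices, and the helper names are ours) the smallest singular value of $\mathbf L - \mathbf M$ should coincide with the smallest distance between the eigenvalues of $L$ and $M$.

```python
import numpy as np

def sep_F(L, M):
    """Frobenius-norm sep(L, M) via the matrix of T(P) = P L - M P acting on vec(P)."""
    l, m = L.shape[0], M.shape[0]
    K = np.kron(L.T, np.eye(m)) - np.kron(np.eye(l), M)
    return np.linalg.svd(K, compute_uv=False)[-1]

def random_hermitian(rng, n):
    """A random complex Hermitian test matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (Z + Z.conj().T) / 2

rng = np.random.default_rng(11)
L, M = random_hermitian(rng, 3), random_hermitian(rng, 4)
gap = np.min(np.abs(np.subtract.outer(np.linalg.eigvalsh(L), np.linalg.eigvalsh(M))))
print(sep_F(L, M), gap)   # Theorem 3.1: the two numbers agree
```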
Since $L_1$ and $\hat L_1 = (I + P^H P)^{1/2}(L_1 + HP)(I + P^H P)^{-1/2}$ are Hermitian, we may use Theorem IV.5.5 to bound the eigenvalues of $\hat L_1$.

Theorem 3.2. In the notation of the approximation theorem, let $\|\cdot\|$ be the Frobenius norm, so that

$$\delta = \min|\mathcal L(L_1) - \mathcal L(L_2)|.$$

Let the eigenvalues of $L_1$ be $\lambda_1 \ge \cdots \ge \lambda_k$ and those of $\hat L_1$ be $\hat\lambda_1 \ge \cdots \ge \hat\lambda_k$. Then

$$\sqrt{\sum_{i=1}^k(\hat\lambda_i - \lambda_i)^2} \le 2\left(1 + 2\frac{\gamma^2}{\delta^2}\right)\frac{\gamma^2}{\delta}. \tag{3.1}$$

Proof. By Theorem IV.5.5 we have

$$\sqrt{\sum_{i=1}^k(\lambda_i - \hat\lambda_i)^2} \le \kappa_2[(I + P^H P)^{1/2}]\,\|H\|_F\,\|P\|_F. \tag{3.2}$$

Now $\|H\|_F\|P\|_F \le 2\gamma^2/\delta$. Moreover, $\|(I + P^H P)^{-1/2}\|_2 \le 1$. Finally,

$$\|(I + P^H P)^{1/2}\|_2 \le (1 + \|P\|_2^2)^{1/2} \le 1 + \tfrac12\|P\|_2^2 \le 1 + 2\frac{\gamma^2}{\delta^2}.$$

Combining these inequalities with (3.2) yields (3.1). ∎

Note that as $\gamma \to 0$, the constants 2 in the bound (3.1) can be replaced by functions that approach 1 [cf. (2.8)]. Consequently, the right-hand side of (3.1) is bounded by a quantity that is asymptotic to $\gamma^2/\delta$.
3.2. Generalized Rayleigh Quotients

So far as invariant subspaces are concerned, the comments of the last subsection apply to the perturbation theorems. However, the representations of $\tilde A$ on the perturbed subspaces provide new perturbation theorems for eigenvalues. Since $A$ is Hermitian, it is most natural to work with Theorem 2.8, taking $X_i = Y_i$ ($i = 1, 2$). We will also take $\|\cdot\|$ to be the Frobenius norm, so that $\tilde\gamma = \tilde\eta$ and $\delta_F$ is the distance between the spectra of $L_1$ and $L_2$.

The proof of the following theorem is similar to that of Theorem 3.2 and is left as an exercise.

Theorem 3.3. Let the Hermitian matrix $A$ have the spectral resolution

$$(X_1\ X_2)^H A (X_1\ X_2) = \begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix},$$

where $L_1$ is $k\times k$, and set

$$(X_1\ X_2)^H E (X_1\ X_2) = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}.$$

Let

$$\tilde\gamma = \|F_{21}\|_F$$

and

$$\tilde\delta = \operatorname{sep}_F(L_1, L_2) - \|F_{11}\|_2 - \|F_{22}\|_2.$$

Let $\lambda_1 \ge \cdots \ge \lambda_k$ be the eigenvalues of $L_1$ and $\tilde\lambda_1 \ge \cdots \ge \tilde\lambda_k$ be the eigenvalues of the generalized Rayleigh quotient $X_1^H(A + E)X_1$. Then if $\tilde\delta > 0$ and $\tilde\gamma/\tilde\delta < 1/2$, there are eigenvalues $\hat\lambda_{j_1}, \ldots, \hat\lambda_{j_k}$ of $\tilde A$ satisfying

$$\sqrt{\sum_{i=1}^k(\hat\lambda_{j_i} - \tilde\lambda_i)^2} \le 2\left(1 + 2\frac{\tilde\gamma^2}{\tilde\delta^2}\right)\frac{\tilde\gamma^2}{\tilde\delta}.$$

This theorem shows that the eigenvalues of the generalized Rayleigh quotient are, up to terms in $\|E\|^2$, eigenvalues of $\tilde A$.
3.3. Direct Bounds

A great deal of the complexity of the preceding theory is due to the necessity of establishing the existence of the perturbed invariant subspace. For Hermitian matrices the existence is often obvious. For example, in the usual notation, suppose that $\min|\mathcal L(L_1) - \mathcal L(L_2)| = \delta$ and $\|E\|_2 < \delta/2$. Then by Corollary IV.4.6, $\min|\mathcal L(\tilde L_1) - \mathcal L(\tilde L_2)| > 0$, and there are unique complementary invariant subspaces associated with $\tilde L_1$ and $\tilde L_2$. Thus we may assume the existence of the perturbed invariant subspace and proceed directly to bounds on the canonical angles between the original and its perturbation. This general approach is due to Davis and Kahan.

The first "sin Θ" theorem is so called because it bounds the sum of squares of the sines of the canonical angles between an invariant subspace of $A$ and an approximation.
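Theorem 3.4. Let $A$ have the spectral resolution

$$\begin{pmatrix} X_1^H \\ X_2^H \end{pmatrix} A (X_1\ X_2) = \operatorname{diag}(L_1, L_2),$$

where $(X_1\ X_2)$ is unitary with $X_1 \in \mathbb C^{n\times k}$. Let $Z \in \mathbb C^{n\times k}$ have orthonormal columns, and for any Hermitian $M$ of order $k$, let

$$R = AZ - ZM.$$

If

$$\delta = \min|\mathcal L(L_2) - \mathcal L(M)| > 0, \tag{3.3}$$

then

$$\|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|_F \le \frac{\|R\|_F}{\delta}.$$

Proof. From the definition of $R$ and the fact that $X_2^H A = L_2 X_2^H$, we have

$$X_2^H R = L_2 X_2^H Z - X_2^H Z M. \tag{3.4}$$

By Theorem 3.1 and the fact that $\|X_2^H R\|_F \le \|R\|_F$,

$$\|X_2^H Z\|_F \le \frac{\|R\|_F}{\delta}.$$

By Theorem I.5.5, $\|X_2^H Z\|_F = \|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|_F$. ∎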
Thus the theorem bounds the error in the approximate subspace in terms of the residual $R$. Since $Z$ and $M$ are arbitrary, we may use it to assess the accuracy of the vector from any approximate eigenpair $(z, \mu)$, provided we can find a lower bound on the distance from $\mu$ to $n - 1$ eigenvalues of $A$.
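A numerical check of the first sin Θ theorem, assuming NumPy and SciPy: a symmetric test matrix with a chosen spectrum, an approximate basis $Z$ obtained by perturbing $X_1$ and re-orthonormalizing, and $M = Z^H A Z$ as one admissible Hermitian choice. The test data are arbitrary; the test also confirms that $\|X_2^H Z\|_F$ equals $\|\sin\Theta\|_F$.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(12)
n, k = 8, 2
evals = np.array([10.0, 9.0, 3.0, 2.5, 2.0, 1.0, 0.5, 0.0])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(evals) @ Q.T                  # spectral resolution with (X1 X2) = Q
X1, X2 = Q[:, :k], Q[:, k:]

Z, _ = np.linalg.qr(X1 + 0.05 * rng.standard_normal((n, k)))
M = Z.T @ A @ Z                               # any Hermitian M is allowed; use Z^H A Z
R = A @ Z - Z @ M

delta = np.min(np.abs(np.subtract.outer(evals[k:], np.linalg.eigvalsh(M))))  # (3.3)
sin_F = np.linalg.norm(np.sin(subspace_angles(X1, Z)))
print(sin_F, "<=", np.linalg.norm(R) / delta)
```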
As $k$ becomes large, the conclusion of the theorem becomes less and less satisfactory because $\|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|_F$ can be large even when the individual canonical angles are small. What we need in this case is a bound on $\|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|_2$, which we can obtain if we are willing to place further restrictions on the spectra of $L_2$ and $M$. We begin with a lemma.
Lemma 3.5. Let $\|\cdot\|$ be a consistent norm. Let $A$ and $B$ be square with $\|A\| \le \alpha$ and $\|B^{-1}\|^{-1} \ge \alpha + \delta$, where $\delta > 0$. If

$$AX - XB = C,$$

then

$$\|X\| \le \frac{\|C\|}{\delta}.$$

Proof. By consistency $\|AX\| \le \alpha\|X\|$ and $\|XB\| \ge (\alpha + \delta)\|X\|$. Hence

$$\|C\| \ge \|XB\| - \|AX\| \ge (\alpha + \delta)\|X\| - \alpha\|X\| = \delta\|X\|.$$

∎
We are now in a position to establish a second sin Θ theorem.

Theorem 3.6. In the notation of Theorem 3.4, suppose that

$$\mathcal L(M) \subset [\alpha, \beta] \tag{3.5}$$

and that for some $\delta > 0$,

$$\mathcal L(L_2) \subset \mathbb R \setminus [\alpha - \delta, \beta + \delta]. \tag{3.6}$$

Then for any unitarily invariant norm

$$\|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\| \le \frac{\|R\|}{\delta}.$$

Remark 3.7. The matrices $L_2$ and $M$ may be switched in (3.5) and (3.6).

Proof. By translating the spectra of $A$ and $M$, we may assume without loss of generality that $\alpha = -\beta$. The result now follows on applying Lemma 3.5 to (3.4). ∎
In some applications, the columns of the matrix $Z$ may not be orthonormal. The following theorem shows that with an appropriate correction factor, the above bounds continue to hold.

Theorem 3.8. In Theorems 3.4 and 3.6, let $\inf_2(Z)$ (i.e., the smallest singular value of $Z$) be positive. Then

$$\|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\| \le \frac{\|R\|}{\delta\,\inf_2(Z)}.$$

Proof. Let $Z = QS$, with $S$ upper triangular, be the QR factorization of $Z$. Then the proofs of the sin Θ theorems show that

$$\|X_2^H Z\| = \|X_2^H Q S\| \le \frac{\|R\|}{\delta}.$$

But

$$\|X_2^H Q S\| \ge \inf_2(Z)\,\|X_2^H Q\| = \inf_2(Z)\,\|\sin\Theta[\mathcal R(X_1), \mathcal R(Z)]\|.$$

∎
We conclude this subsection with a bound on the tangents of the canonical angles. Here we must restrict $M$ to be equal to $Z^H A Z$ and impose further restrictions on the disposition of the spectrum. The idea of the proof is to prove the theorem for the norms $\|\cdot\|_{\Phi_j}$ associated with Fan's symmetric gauge functions

$$\Phi_j(x) = \max_{1 \le i_1 < \cdots < i_j \le n}\{|x_{i_1}| + \cdots + |x_{i_j}|\}.$$

We begin with a lemma.

Lemma 3.9. Let $R$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$. If $R_j$ is any leading principal submatrix of $R$ of order $j$, then

$$\operatorname{trace}(R_j) \le \sum_{i=1}^j\sigma_i.$$

Proof. By Theorem I.4.4 the sum of the singular values of $R_j$ is less than or equal to $\sum_{i=1}^j\sigma_i$. By Lemma II.3.4 the trace of $R_j$ is less than or equal to the sum of its singular values. ∎

We may now prove the tan Θ theorem.
Theorem 3.10. In the notation of the sin Θ theorems, let

$$M = Z^H A Z,$$

and assume that

$$\mathcal L(M) \subset [\alpha, \beta]$$

while for some $\delta > 0$

$$\mathcal L(L_2) \subset (-\infty, \alpha - \delta]$$

(or $\mathcal L(L_2) \subset [\beta + \delta, \infty)$). Then

$$\|\tan\Theta[\mathcal R(X_1), \mathcal R(Z)]\| \le \frac{\|R\|}{\delta}.$$

Proof. Some preliminary transformations will make the proof easier. First, we may assume without loss of generality that $L_1$ and $L_2$ are of the same order. For if the order of $L_1$ is less than that of $L_2$, we may augment the reduced form of $A$ to $\operatorname{diag}(L_1, \nu I, L_2)$, where $\nu \in [\alpha, \beta]$. This will make no difference in the final bounds.
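Next, by passing to canonical bases, we may assume that

$$Z = \begin{pmatrix} I \\ 0 \end{pmatrix}$$

and

$$X = (X_1\ X_2) = \begin{pmatrix} \Gamma & -\Sigma \\ \Sigma & \Gamma \end{pmatrix},$$

where $\Gamma = \operatorname{diag}(\cos\theta_i)$ and $\Sigma = \operatorname{diag}(\sin\theta_i)$ consist of the cosines (in ascending order) and the sines (in descending order) of the canonical angles between $\mathcal R(Z)$ and $\mathcal R(X_1)$. In this coordinate system partition $A$ in the form

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

Since $M = A_{11}$, we have

$$R = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\begin{pmatrix} I \\ 0 \end{pmatrix} - \begin{pmatrix} I \\ 0 \end{pmatrix}A_{11} = \begin{pmatrix} 0 \\ A_{21} \end{pmatrix}. \tag{3.7}$$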
Note that by the simplification of the preceding paragraph all the above submatrices are square and of the same order.

Since

$$(-\Sigma\ \ \Gamma)A = L_2(-\Sigma\ \ \Gamma),$$

on multiplying (3.7) by $(-\Sigma\ \ \Gamma)$ we have

$$\Gamma A_{21} = \Sigma A_{11} - L_2\Sigma.$$

Let $\alpha_{ii}^{(kl)}$ denote the $(i,i)$-element of $A_{kl}$ and $\lambda_i^{(2)}$ the $i$th diagonal element of $L_2$. The $i$th diagonal element of this relation is

$$\cos\theta_i\,\alpha_{ii}^{(21)} = \sin\theta_i\,(\alpha_{ii}^{(11)} - \lambda_i^{(2)}) \ge \delta\sin\theta_i,$$

the last inequality following from the fact that $\alpha_{ii}^{(11)} \in [\alpha, \beta]$ and $\lambda_i^{(2)} \in (-\infty, \alpha - \delta]$. Since $\delta > 0$, the cosine of $\theta_i$ cannot be zero. Hence $\alpha_{ii}^{(21)} \ge \delta\tan\theta_i$. It follows from Lemma 3.9 that

$$\|R\|_{\Phi_j} \ge \sum_{i=1}^j\alpha_{ii}^{(21)} \ge \delta\sum_{i=1}^j\tan\theta_i = \delta\,\|\tan\Theta\|_{\Phi_j},$$

where $\Theta = \operatorname{diag}(\theta_i)$. The theorem now follows from Fan's theorem (Theorem II.3.17). ∎

Although the hypothesis on the situation of the spectra of $M$ and $L_2$ may appear unnecessarily restrictive, it is necessary (Exercise 3.3). However, it answers to the frequently occurring case where $Z$ is an approximation to an invariant subspace corresponding to the largest (or smallest) eigenvalues of $A$.
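The tan Θ theorem can be checked in the same way. This sketch (NumPy and SciPy assumed; the spectrum and perturbation are arbitrary test data) takes $M = Z^H A Z$, reads off $\alpha = \min\mathcal L(M)$, sets $\delta = \alpha - \max\mathcal L(L_2)$, and compares $\|\tan\Theta\|$ with $\|R\|/\delta$ in the spectral and Frobenius norms.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(13)
n, k = 8, 2
evals = np.array([10.0, 9.0, 3.0, 2.5, 2.0, 1.0, 0.5, 0.0])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(evals) @ Q.T
X1 = Q[:, :k]                                  # invariant subspace for {10, 9}

Z, _ = np.linalg.qr(X1 + 0.05 * rng.standard_normal((n, k)))
M = Z.T @ A @ Z                                # Theorem 3.10 requires M = Z^H A Z
R = A @ Z - Z @ M

alpha = np.linalg.eigvalsh(M).min()            # L(M) lies in [alpha, beta]
delta = alpha - evals[k:].max()                # L(L2) lies in (-inf, alpha - delta]
assert delta > 0

tan_theta = np.tan(subspace_angles(X1, Z))
results = {}
for norm_ord in (2, "fro"):
    lhs = np.linalg.norm(np.diag(tan_theta), norm_ord)
    rhs = np.linalg.norm(R, norm_ord) / delta
    results[norm_ord] = (lhs, rhs)
    print(norm_ord, lhs, "<=", rhs)
```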
3.4. Residual Bounds for Eigenvalues

In Example IV.4.17 we saw that the residual bound for eigenvalues provided by Theorem IV.4.14 could be off by orders of magnitude. The problem is that the bound does not take into account the situation of the spectrum. In this subsection we will apply perturbation theory for invariant subspaces to derive new bounds that do just that.

Throughout this subsection $X$ will be an $n\times k$ matrix with orthonormal columns. We will set

$$M = X^H A X$$

and

$$R = AX - XM.$$

We wish to show that there are $k$ eigenvalues of $A$ near the eigenvalues $\mu_1 \ge \cdots \ge \mu_k$ of $M$, and further show that the difference is proportional to $\|R\|_2^2$.

The basic idea is simple. We use one of the direct bounds from the preceding subsection to give us a matrix $\hat X = (X + YP)(I + P^H P)^{-1/2}$ whose columns span an invariant subspace. We then know from the approximation theorem that $\hat M = \hat X^H A \hat X$ is similar to $M + G^H P$. But $G$ and $P$ are both of order $\|R\|$, so that an application of perturbation theory for eigenvalues will give a bound of order $\|R\|^2$.

Since the direct bounds do not give us explicit relations between the subspaces, we must begin with a lemma that allows us to deduce the existence of $P$. Its proof, which uses the canonical bases of Theorem I.5.2, is left as an exercise.
Lemma 3.11. Let $(X\ Y)$ be unitary. If $\mathcal R(X)$ and $\hat{\mathcal X}$ are acute, there is a matrix $P$ such that $\mathcal R(X + YP) = \hat{\mathcal X}$. The singular values of $P(I + P^H P)^{-1/2}$ are the sines of the canonical angles between $\mathcal R(X)$ and $\hat{\mathcal X}$.

The first residual bound is based on the second sin Θ theorem.

Theorem 3.12. Suppose that

$$\text{there is a number } \delta > 0 \text{ such that exactly } n - k \text{ of the eigenvalues of } A \text{ lie outside the interval } [\mu_k - \delta, \mu_1 + \delta] \tag{3.8}$$

and

$$\rho \equiv \frac{\|R\|_2}{\delta} < 1.$$

Then there is an index $j$ such that $\lambda_j, \ldots, \lambda_{j+k-1} \in (\mu_k - \delta, \mu_1 + \delta)$ and

$$|\mu_i - \lambda_{j+i-1}| \le \frac{\|R\|_2^2}{(1 - \rho^2)\delta}, \qquad i = 1, \ldots, k. \tag{3.9}$$

Proof. Let $(X\ Y)$ be unitary. Then

$$\begin{pmatrix} X^H \\ Y^H \end{pmatrix} A (X\ Y) = \begin{pmatrix} M & G^H \\ G & N \end{pmatrix},$$

where $\|G\|_2 = \|R\|_2$. By Theorem 3.6 and Lemma 3.11, there is a matrix $P$ satisfying

$$\|P(I + P^H P)^{-1/2}\|_2 \le \rho \tag{3.10}$$

such that the columns of

$$\hat X = (X + YP)(I + P^H P)^{-1/2}$$

span an invariant subspace of $A$. From (3.10) it follows that

$$\frac{\|P\|_2}{\sqrt{1 + \|P\|_2^2}} \le \rho,$$

and since $\rho < 1$,

$$\|P\|_2 \le \frac{\rho}{\sqrt{1 - \rho^2}}. \tag{3.11}$$

Let $\hat Y = (Y - XP^H)(I + PP^H)^{-1/2}$. Then $(\hat X\ \hat Y)$ is unitary. Since the columns of $\hat X$ span an invariant subspace of $A$, we have $\hat Y^H A \hat X = 0$. Hence

$$\begin{pmatrix} \hat X^H \\ \hat Y^H \end{pmatrix} A (\hat X\ \hat Y) = \begin{pmatrix} \hat M & 0 \\ 0 & \hat N \end{pmatrix}.$$

As in the proof of Theorem 2.1, it can be shown that

$$\hat M = (I + P^H P)^{1/2}(M + G^H P)(I + P^H P)^{-1/2}.$$

The eigenvalues of $\hat M$ are eigenvalues of $A$. Since $\rho < 1$ it follows from the residual bound of Chapter IV (Theorem IV.4.14) that they lie in the interval $(\mu_k - \delta, \mu_1 + \delta)$, and hence are $\lambda_j, \ldots, \lambda_{j+k-1}$ for some index $j$. By the similarity bound of Theorem IV.5.3,

$$|\mu_i - \lambda_{j+i-1}| \le \|(I + P^H P)^{1/2}\|_2\,\|(I + P^H P)^{-1/2}\|_2\,\|G\|_2\,\|P\|_2, \qquad i = 1, \ldots, k.$$

The theorem now follows on noting that $\|(I + P^H P)^{-1/2}\|_2 \le 1$ and $\|(I + P^H P)^{1/2}\|_2 = (1 + \|P\|_2^2)^{1/2} \le (1 - \rho^2)^{-1/2}$, and inserting the bound (3.11) for $\|P\|_2$. ∎
In the bound (3.9) the factor $(1 - \rho^2)^{-1}$ is insignificant unless $\rho$ is close to one. The factor $\|R\|_2^2/\delta$ is quadratic in $\|R\|_2$; however, as $\delta$ decreases the bound deteriorates. For $\rho < (\sqrt5 - 1)/2$ the bound is less than the bound $\|R\|_2$ provided by Theorem IV.4.14. Moreover, the bound is asymptotically sharp, as the matrix $A$ from Example IV.4.17 shows.
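The bound (3.9) and its comparison with the residual bound $\|R\|_2$ can be seen on a small example. The sketch below (NumPy assumed) uses a symmetric matrix with a chosen spectrum, takes $X$ to be a perturbed basis for the eigenvectors of the two largest eigenvalues, and picks $\delta$ as a hypothetical 90% of the gap between $\mu_k$ and the rest of the spectrum, so that exactly $n - k$ eigenvalues lie outside $[\mu_k - \delta, \mu_1 + \delta]$.

```python
import numpy as np

rng = np.random.default_rng(14)
n, k = 8, 2
evals = np.array([10.0, 9.0, 3.0, 2.5, 2.0, 1.0, 0.5, 0.0])   # lambda_1 >= ... >= lambda_n
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(evals) @ Q.T

X, _ = np.linalg.qr(Q[:, :k] + 0.05 * rng.standard_normal((n, k)))
M = X.T @ A @ X
R = A @ X - X @ M
mu = np.sort(np.linalg.eigvalsh(M))[::-1]                      # mu_1 >= ... >= mu_k

# Pick delta so that exactly n - k eigenvalues lie outside [mu_k - delta, mu_1 + delta].
delta = 0.9 * (mu[-1] - evals[k:].max())
outside = np.sum((evals < mu[-1] - delta) | (evals > mu[0] + delta))
rho = np.linalg.norm(R, 2) / delta
assert outside == n - k and rho < 1

errors = np.abs(mu - evals[:k])                                # here j = 1
bound_39 = np.linalg.norm(R, 2) ** 2 / ((1 - rho**2) * delta)  # bound (3.9)
print("errors:", errors)
print("bound (3.9):", bound_39, " residual bound ||R||_2:", np.linalg.norm(R, 2))
```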
The requirement (3.8) unfortunately does not allow the eigenvalues of $M$ to be scattered through the spectrum of $A$. However, if we pass to the Frobenius norm, then we can obtain a Hoffman–Wielandt type residual bound. Specifically, if there is a set $\mathcal{L}_2$ consisting of $n - k$ eigenvalues of $A$ (counting multiplicities) such that
$$\delta = \min |\mathcal{L}(M) - \mathcal{L}_2| > 0, \tag{3.12}$$
then Theorem 3.4 shows that there is a matrix $P$ satisfying
$$\|P(I + P^H P)^{-1/2}\|_2 \le \|P(I + P^H P)^{-1/2}\|_F \le \frac{\|R\|_F}{\delta}$$
such that the columns of
$$\hat X = (X + YP)(I + P^H P)^{-1/2}$$
span an invariant subspace of $A$. By the similarity bound of Theorem IV.5.5, the eigenvalues $\lambda_{j_1}, \ldots, \lambda_{j_k}$ of $\hat M$ may be ordered so that
$$\Bigl(\sum_{i=1}^k (\mu_i - \lambda_{j_i})^2\Bigr)^{1/2} \le \|(I + P^H P)^{1/2}\|_2\, \|(I + P^H P)^{-1/2}\|_2\, \|G\|_F\, \|P\|_2.$$
Hence we have the following theorem.
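Theorem 3.13. With the above definitions, assume that $A$ and $M$ satisfy (3.12). If
$$\rho_F \equiv \frac{\|R\|_F}{\delta} < 1,$$
then there are eigenvalues $\lambda_{j_1}, \ldots, \lambda_{j_k}$ of $A$ such that
$$\Bigl(\sum_{i=1}^k (\mu_i - \lambda_{j_i})^2\Bigr)^{1/2} \le \frac{\|R\|_F^2}{(1 - \rho_F^2)\,\delta}.$$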
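A small numerical experiment makes the quadratic character of Theorem 3.13 concrete. The NumPy sketch below is illustrative only: the helper name `frobenius_residual_check`, the synthetic test matrix, and the choice of $\mathcal{L}_2$ as the eigenvalues $k+1, \ldots, n$ are our assumptions, not part of the text. Since the theorem only guarantees that some $k$ eigenvalues of $A$ satisfy the bound, the code measures the best matching of the $\mu_i$ with $k$ distinct eigenvalues of $A$.

```python
import itertools

import numpy as np


def frobenius_residual_check(n=6, k=2, noise=1e-2, seed=0):
    """Compare both sides of the bound in Theorem 3.13 on a synthetic example.

    Hypothetical setup: A = Q diag(1, ..., n) Q^T is real symmetric, X is an
    orthonormal basis for a slightly perturbed copy of the first k eigenvectors,
    and L_2 is taken to be the n - k eigenvalues k+1, ..., n.
    """
    rng = np.random.default_rng(seed)
    lam = np.arange(1.0, n + 1.0)                 # eigenvalues of A, ascending
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = Q @ np.diag(lam) @ Q.T
    A = (A + A.T) / 2                             # remove rounding asymmetry
    X, _ = np.linalg.qr(Q[:, :k] + noise * rng.standard_normal((n, k)))
    M = X.T @ A @ X                               # Rayleigh quotient
    R = A @ X - X @ M                             # residual
    mu = np.linalg.eigvalsh(M)
    delta = np.min(np.abs(mu[:, None] - lam[None, k:]))   # delta of (3.12)
    r_F = np.linalg.norm(R, "fro")
    rho_F = r_F / delta
    bound = r_F**2 / ((1.0 - rho_F**2) * delta)
    # The theorem promises some k eigenvalues; take the best injective matching.
    best = min(
        np.sqrt(np.sum((mu - lam[list(js)]) ** 2))
        for js in itertools.permutations(range(n), k)
    )
    return rho_F, best, bound, r_F
```

For small $\rho_F$ the right-hand side of the theorem is smaller than $\|R\|_F$ itself, which is what makes a quadratic residual bound attractive.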
Notes and References
The observation that for Hermitian matrices the function sep in the Frobenius norm reduces to the minimum difference between the eigenvalues was made by Stewart [200, 1971]. The knowledgeable reader will have noted that we have surreptitiously introduced the Kronecker or tensor product in the proof of Theorem 3.1.
The $\sin\Theta$ and $\tan\Theta$ theorems are due to Davis and Kahan [53, 1970]. Earlier Davis [50, 51, 1963, 1965] established bounds on $\sin 2\Theta$ and $\tan 2\Theta$, which are also presented in this ground-breaking paper, along with much, much more, including Theorem 3.4. It should be noted that Davis and Kahan work with bounded operators in a Hilbert space, and some of their results extend to unbounded operators.
The residual bounds for eigenvalues are due to Stewart [218, 1989].
Exercises
THE FOLLOWING TWO EXERCISES SPECIALIZE THE RESULTS OF THIS SECTION TO EIGENVALUES AND EIGENVECTORS. NOTE THAT THE EIGENVALUE BOUNDS ARE A LITTLE SHARPER THAN THE ONES ONE WOULD GET FROM THE THEOREMS IN THE TEXT.
1. Let $(z, \mu)$ be an approximate eigenpair of $A$ with $\|z\|_2 = 1$ and $\mu = z^H A z$. Let $r = Az - \mu z$. Suppose that there is a set $\mathcal{L}$ of $n - 1$ eigenvalues of $A$ such that
$$\delta = \min |\mathcal{L} - \{\mu\}| > 0.$$
Show that there is an eigenpair $(x, \lambda)$ of $A$ satisfying
$$\sin\angle(x, z) \le \frac{\|r\|_2}{\delta}$$
and
$$|\mu - \lambda| \le \min\Bigl\{\|r\|_2, \frac{\|r\|_2^2}{\delta}\Bigr\}.$$
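2. Let $(x, \lambda)$ be a simple eigenpair of $A$ with $\|x\|_2 = 1$. Let $\tilde A = A + E$ and set $\epsilon = \|E\|_2$. Let $\delta$ be the distance from $\lambda$ to $\mathcal{L}(A) \setminus \{\lambda\}$. Show that if $\epsilon < \delta$ and
$$\frac{\epsilon}{\delta - \epsilon} < \frac12,$$
then there is an eigenpair $(\tilde x, \tilde\lambda)$ of $\tilde A$ satisfying
$$\tan\angle(x, \tilde x) \le \frac{2\epsilon}{\delta}$$
and
$$|\tilde\lambda - x^H \tilde A x| \le \frac{2\epsilon^2}{\delta - \epsilon}.$$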
3. (Davis and Kahan [53]). By considering suitable example matrices, show that the hypotheses on the situation of the eigenvalues in the $\tan\Theta$ theorem are necessary.
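The bounds of Exercise 1 can be checked numerically. In this sketch (ours, with the hypothetical helper `exercise1_bounds`) we take $A$ real symmetric, take $\mu$ to be the Rayleigh quotient $z^T A z$, and let $\mathcal{L}$ consist of all eigenvalues except the one nearest $\mu$; these choices are assumptions made for the illustration.

```python
import numpy as np


def exercise1_bounds(A, z):
    """Quantities in Exercise 1 for a real symmetric A and a nonzero vector z.

    Assumptions (ours): mu is the Rayleigh quotient z^T A z, and the set L
    consists of all eigenvalues of A except the one closest to mu.
    """
    z = z / np.linalg.norm(z)
    mu = z @ A @ z
    r = A @ z - mu * z                            # residual
    lam, V = np.linalg.eigh(A)
    j = int(np.argmin(np.abs(lam - mu)))          # eigenvalue paired with mu
    delta = np.min(np.abs(np.delete(lam, j) - mu))  # delta = min |L - {mu}|
    r_norm = np.linalg.norm(r)
    cos_angle = min(1.0, abs(V[:, j] @ z))
    return {
        "sin_angle": np.sqrt(1.0 - cos_angle**2),
        "sin_bound": r_norm / delta,
        "eig_error": abs(mu - lam[j]),
        "eig_bound": min(r_norm, r_norm**2 / delta),
    }
```

For $z$ close to an eigenvector, `eig_error` behaves like $\|r\|_2^2$ while `sin_angle` behaves like $\|r\|_2$, which is the usual contrast between eigenvalue and eigenvector accuracy.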
4. The Singular Value Decomposition
The perturbation theory for singular values and vectors is complicated by two troublesome facts. The first is that we must deal with both right and left singular vectors. The second is that the singular values of a matrix are not differentiable functions of the matrix. For example, if $A = \alpha$ is a $1 \times 1$ complex matrix, then its singular value is $|\alpha|$, which is not an analytic function of $\alpha$. In particular, if we seek a perturbation expansion for $\tilde\alpha = \alpha + \epsilon$, we cannot simply write
$$|\tilde\alpha| = \sqrt{(\alpha + \epsilon)^H(\alpha + \epsilon)} \cong |\alpha| + \frac{\bar\epsilon\alpha + \bar\alpha\epsilon}{2|\alpha|},$$
since the right-hand side of this expression may not be nonnegative. For the larger singular values this example presents no problem; but it shows that we must take care in dealing with singular values near zero.
In the next subsection we will consider a generalization of the $\sin\Theta$ theorem for subspaces spanned by singular vectors, which are sometimes called SINGULAR SUBSPACES. Here we circumvent the problem by working with the Jordan–Wielandt matrix to get simultaneous bounds for spaces spanned by right and left singular vectors. In the following subsection, we derive a perturbation expansion based on the cross-product matrix.
Throughout this section $A$ will be an $m \times n$ matrix with $m \ge n$.
4.1. Two $\sin\Theta$ Theorems
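In this subsection we will establish $\sin\Theta$ theorems, due to Wedin, for spaces spanned by the singular vectors of $A$. To fix the notation, let
$$(U_1\ U_2\ U_3)^H A\, (V_1\ V_2) = \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \\ 0 & 0 \end{pmatrix}$$
be a partitioned singular value decomposition of $A$ (here we do not place any constraints on the order in which the singular values appear), and let
$$(\tilde U_1\ \tilde U_2\ \tilde U_3)^H \tilde A\, (\tilde V_1\ \tilde V_2) = \begin{pmatrix} \tilde\Sigma_1 & 0 \\ 0 & \tilde\Sigma_2 \\ 0 & 0 \end{pmatrix}$$
be a conformal partition of $\tilde A = A + E$. Let $\Phi$ be the matrix of canonical angles between $\mathcal{R}(U_1)$ and $\mathcal{R}(\tilde U_1)$, and let $\Theta$ be the matrix of canonical angles between $\mathcal{R}(V_1)$ and $\mathcal{R}(\tilde V_1)$. Finally, let
$$R = A\tilde V_1 - \tilde U_1 \tilde\Sigma_1 \quad\text{and}\quad S = A^H \tilde U_1 - \tilde V_1 \tilde\Sigma_1. \tag{4.1}$$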
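The following theorem bounds the angles $\Phi$ and $\Theta$ in terms of the residuals $R$ and $S$.
Theorem 4.1 (Wedin). Suppose that there is a number $\delta > 0$ such that
$$\min |\sigma(\tilde\Sigma_1) - \sigma(\Sigma_2)| \ge \delta \quad\text{and}\quad \min \sigma(\tilde\Sigma_1) \ge \delta. \tag{4.2}$$
Then
$$\sqrt{\|\sin\Phi\|_F^2 + \|\sin\Theta\|_F^2} \le \frac{\sqrt{\|R\|_F^2 + \|S\|_F^2}}{\delta}.$$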
Remark 4.2. The matrices $U_1$, $V_1$, $\tilde U_1$, and $\tilde V_1$ may be replaced by any matrices with orthonormal columns spanning the appropriate subspaces.
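Proof. Consider the Jordan–Wielandt matrix
$$C = \begin{pmatrix} 0 & A \\ A^H & 0 \end{pmatrix},$$
whose eigenvalues are $\pm\sigma_1, \ldots, \pm\sigma_n$ with $m - n$ additional zero eigenvalues. Let $\tilde C$ be the Jordan–Wielandt matrix for $\tilde A$. It is easy to show that if
$$X = \frac{1}{\sqrt2}\begin{pmatrix} U_1 & U_1 \\ V_1 & -V_1 \end{pmatrix} \equiv (X_1\ X_2),$$
then $\mathcal{R}(X)$ is an invariant subspace of $C$. The representation of $C$ on this subspace is $\operatorname{diag}(\Sigma_1, -\Sigma_1)$. Similarly, if
$$\tilde X = \frac{1}{\sqrt2}\begin{pmatrix} \tilde U_1 & \tilde U_1 \\ \tilde V_1 & -\tilde V_1 \end{pmatrix} \equiv (\tilde X_1\ \tilde X_2),$$
then $\mathcal{R}(\tilde X)$ is an invariant subspace of $\tilde C$. The representation of $\tilde C$ on this subspace is $\operatorname{diag}(\tilde\Sigma_1, -\tilde\Sigma_1)$. Condition (4.2) guarantees that the eigenvalues $\pm\sigma(\tilde\Sigma_1)$ of this representation are separated by at least $\delta$ from the eigenvalues of $C$ belonging to the orthogonal complement of $\mathcal{R}(X)$, namely $\pm\sigma(\Sigma_2)$ and $m - n$ zeros. Hence by Theorem 3.1, if we set
$$T = C\tilde X - \tilde X(\tilde X^H \tilde C \tilde X),$$
then
$$\|\sin\Theta[\mathcal{R}(X), \mathcal{R}(\tilde X)]\|_F \le \frac{\|T\|_F}{\delta}. \tag{4.3}$$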
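To arrive at the conclusion of our theorem, we must compute the left- and right-hand sides of (4.3). For the left-hand side, note that
$$P_X = XX^H = \operatorname{diag}(U_1U_1^H, V_1V_1^H) = \operatorname{diag}(P_U, P_V),$$
and similarly for $P_{\tilde X}$. Hence by Theorem I.5.5
$$\begin{aligned}
\|\sin\Theta[\mathcal{R}(X), \mathcal{R}(\tilde X)]\|_F^2 &= \|P_{\tilde X}^\perp P_X\|_F^2 \\
&= \|\operatorname{diag}(P_{\tilde U}^\perp P_U,\ P_{\tilde V}^\perp P_V)\|_F^2 \\
&= \|\sin\Phi\|_F^2 + \|\sin\Theta\|_F^2.
\end{aligned} \tag{4.4}$$
For the right-hand side, a straightforward computation shows that
$$T = \frac{1}{\sqrt2}\begin{pmatrix} R & -R \\ S & S \end{pmatrix}.$$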
Hence
$$\|T\|_F^2 = \|R\|_F^2 + \|S\|_F^2. \tag{4.5}$$
The theorem follows on combining (4.3), (4.4), and (4.5). ∎
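A quick numerical sanity check of Theorem 4.1 follows. It is a sketch under our own assumptions: a random real test matrix, a small random perturbation, and the partition in which $\Sigma_1$ and $\tilde\Sigma_1$ hold the $k$ largest singular values. The sines of the canonical angles are obtained from $\|(I - \tilde U_1\tilde U_1^T)U_1\|_F$, which equals $\|\sin\Phi\|_F$ for subspaces of equal dimension. The helper name is hypothetical.

```python
import numpy as np


def wedin_frobenius_check(m=7, n=4, k=2, eps=1e-3, seed=1):
    """Return both sides of the inequality in Theorem 4.1 for a random example."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    E = eps * rng.standard_normal((m, n))
    U, s, Vh = np.linalg.svd(A)                   # singular values in descending order
    Ut, st, Vth = np.linalg.svd(A + E)
    U1, V1 = U[:, :k], Vh[:k].T                   # Sigma_1: the k largest
    Ut1, Vt1, St1 = Ut[:, :k], Vth[:k].T, np.diag(st[:k])
    R = A @ Vt1 - Ut1 @ St1                       # residuals (4.1)
    S = A.T @ Ut1 - Vt1 @ St1
    # ||sin Phi||_F = ||(I - Ut1 Ut1^T) U1||_F, and similarly for Theta
    sin_phi = np.linalg.norm(U1 - Ut1 @ (Ut1.T @ U1), "fro")
    sin_theta = np.linalg.norm(V1 - Vt1 @ (Vt1.T @ V1), "fro")
    # delta of (4.2): gap to sigma(Sigma_2) and distance of sigma(St1) from zero
    delta = min(np.min(np.abs(st[:k, None] - s[None, k:])), np.min(st[:k]))
    lhs = np.hypot(sin_phi, sin_theta)
    rhs = np.hypot(np.linalg.norm(R, "fro"), np.linalg.norm(S, "fro")) / delta
    return lhs, rhs
```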
The appearance of the condition $\min\sigma(\tilde\Sigma_1) \ge \delta$ seems strange at first, but it is necessary, as the following example shows.
Example 4.3. Let
$$A = (\ \cdots\ ) \quad\text{and}\quad \tilde A = (\ \cdots\ ).$$
Then $\tilde U_1$ makes an angle of 45 degrees with $U_1$, even though the corresponding singular value of $A$ is well separated from the other singular value.
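The phenomenon of Example 4.3 is easy to reproduce with a small example of our own (a hypothetical $3 \times 2$ pair, not necessarily the book's matrices). Here $A$ has singular values $1$ and $0$, and we take $\Sigma_1 = (0)$ with $u_1 = v_1 = e_2$. The perturbation creates a singular value $\sqrt2\,\epsilon$ whose left singular vector makes a 45 degree angle with $u_1$, while the right singular vector does not move. With only the first condition in (4.2) the bound would be about $2\epsilon$, which is false; the second condition forces $\delta = \sqrt2\,\epsilon$ and a harmless bound of $\sqrt2$.

```python
import numpy as np


def small_singular_value_example(eps=1e-3):
    """Hypothetical 3 x 2 example in the spirit of Example 4.3.

    A has singular values 1 and 0; we take Sigma_1 = (0) with u1 = v1 = e2.
    A + E has orthogonal columns, so its singular vectors can be written down.
    """
    A = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
    E = eps * np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    At = A + E
    u1, v1 = np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0])
    st1 = np.sqrt(2.0) * eps                      # perturbed singular value
    ut1 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
    vt1 = np.array([0.0, 1.0])
    R = A @ vt1 - st1 * ut1                       # residuals (4.1)
    S = A.T @ ut1 - st1 * vt1
    res = np.hypot(np.linalg.norm(R), np.linalg.norm(S))
    gap = abs(st1 - 1.0)                          # first condition in (4.2) only
    delta = min(gap, st1)                         # full condition (4.2)
    return {
        "At": At, "st1": st1, "ut1": ut1, "vt1": vt1,
        "sin_phi": np.sqrt(1.0 - (u1 @ ut1) ** 2),
        "sin_theta": np.sqrt(1.0 - (v1 @ vt1) ** 2),
        "bound_gap_only": res / gap,
        "bound_theorem": res / delta,
    }
```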
This example also points to a fundamental defect in Theorem 4.1. Although the vector $V_1$ is insensitive to perturbations in $A$, its bound is governed by the ill-conditioning of $U_1$. However, the problem can be circumvented by using the theorem to bound the perturbation in $\mathcal{R}(V_2)$. Since $\mathcal{R}(V_1)$ and $\mathcal{R}(V_2)$ are orthogonal complements of each other, the same bound will serve for $\mathcal{R}(V_1)$.
By imposing further restrictions on the singular values, we may establish a bound in the 2-norm (actually in any unitarily invariant norm). The proof of the following theorem is a variant of the proof of Theorem 4.1 and is left as an exercise.
Theorem 4.4 (Wedin). Suppose that there are numbers $\alpha, \delta > 0$ such that
$$\min \sigma(\tilde\Sigma_1) \ge \alpha + \delta \quad\text{and}\quad \max \sigma(\Sigma_2) \le \alpha. \tag{4.6}$$
Then
$$\max\{\|\sin\Phi\|_2, \|\sin\Theta\|_2\} \le \frac{\max\{\|R\|_2, \|S\|_2\}}{\delta}.$$
The condition (4.6) restricts the bounds to subspaces associated with a group of the largest singular values. However, by the trick described in connection with Example 4.3, we can use the theorem indirectly to get bounds on the perturbation in $\mathcal{R}(V_2)$.
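Theorem 4.4 can be checked the same way. The sketch below is our construction (the singular values 5, 4, 1, 0.5 and the helper name are hypothetical): it builds $A$ so that (4.6) holds with $\Sigma_1$ the two largest singular values, takes $\alpha = \max\sigma(\Sigma_2)$, and sets $\delta = \min\sigma(\tilde\Sigma_1) - \alpha$.

```python
import numpy as np


def wedin_two_norm_check(eps=1e-2, seed=2):
    """Both sides of Theorem 4.4 for a hypothetical 6 x 4 example.

    A is built with singular values 5, 4, 1, 0.5, so that with Sigma_1 = the two
    largest, alpha = max sigma(Sigma_2) and delta = min sigma(St1) - alpha > 0.
    """
    m, n, k = 6, 4, 2
    rng = np.random.default_rng(seed)
    Uq, _ = np.linalg.qr(rng.standard_normal((m, m)))
    Vq, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = Uq[:, :n] @ np.diag([5.0, 4.0, 1.0, 0.5]) @ Vq.T
    E = eps * rng.standard_normal((m, n))
    U, s, Vh = np.linalg.svd(A)
    Ut, st, Vth = np.linalg.svd(A + E)
    U1, V1 = U[:, :k], Vh[:k].T
    Ut1, Vt1, St1 = Ut[:, :k], Vth[:k].T, np.diag(st[:k])
    R = A @ Vt1 - Ut1 @ St1                       # residuals (4.1)
    S = A.T @ Ut1 - Vt1 @ St1
    alpha = np.max(s[k:])                         # max sigma(Sigma_2)
    delta = np.min(st[:k]) - alpha                # so min sigma(St1) = alpha + delta
    sin_phi = np.linalg.norm(U1 - Ut1 @ (Ut1.T @ U1), 2)     # spectral norms
    sin_theta = np.linalg.norm(V1 - Vt1 @ (Vt1.T @ V1), 2)
    lhs = max(sin_phi, sin_theta)
    rhs = max(np.linalg.norm(R, 2), np.linalg.norm(S, 2)) / delta
    return alpha, delta, lhs, rhs
```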
4.2. A Perturbation Expansion
In the last subsection we saw that the perturbation theory for singular vectors associated with small singular values presented some difficulties. Actually, small singular values themselves exhibit curious behavior: they tend to get larger (after all, they have nowhere to go but up). Since this fact has important consequences for applications to least squares problems and linear regression, we will develop a perturbation expansion that shows what is going on. The key is to smooth out the behavior of the small singular value by working with its square, or equivalently with the cross-product matrix $A^H A$.
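We begin with a lemma that follows directly from Theorem 2.7 applied to Hermitian matrices.
Lemma 4.5. In the Hermitian matrix
$$\begin{pmatrix} \alpha & h^H \\ h & C \end{pmatrix}$$
let $\alpha$ and $C$ be constant and let $h$ depend on a parameter $\gamma$ in such a way that
$$\|h\|_2 = O(\gamma)$$
as $\gamma \to 0$. Let the quantities $\tilde\alpha$, $\tilde C$, and $\tilde h$ satisfy
$$|\tilde\alpha - \alpha|,\ \|\tilde C - C\|_2 = O(\gamma)$$
and
$$\|\tilde h - h\|_2 = O(\gamma^2).$$
If $\alpha I - C$ is nonsingular, then for all sufficiently small $\gamma$ the matrix
$$\begin{pmatrix} \tilde\alpha & \tilde h^H \\ \tilde h & \tilde C \end{pmatrix} \tag{4.7}$$
has an eigenvector
$$\begin{pmatrix} 1 \\ (\alpha I - C)^{-1} h \end{pmatrix} + O(\gamma^2) \tag{4.8}$$
corresponding to the eigenvalue
$$\tilde\alpha + h^H(\alpha I - C)^{-1} h + O(\gamma^3). \tag{4.9}$$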
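To apply the lemma, let $A$ have the singular value decomposition
$$U^H A V = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix}$$
(here, as above, we do not assume that the singular values appear on the diagonal of $\Sigma$ in descending order). Partition
$$U = (u_1\ U_2\ U_3) \qquad (\text{blocks of } 1,\ n - 1,\ m - n \text{ columns})$$
and
$$V = (v_1\ V_2) \qquad (\text{blocks of } 1,\ n - 1 \text{ columns}).$$
Partition
$$U^H A V = \begin{pmatrix} \sigma_1 & 0 \\ 0 & \Sigma_2 \\ 0 & 0 \end{pmatrix}$$
conformally. Finally, given a perturbation $\tilde A = A + E$ of $A$, let
$$U^H E V = \begin{pmatrix} \epsilon_{11} & g_{12} \\ g_{21} & G_{22} \\ g_{31} & G_{32} \end{pmatrix},$$
so that
$$U^H \tilde A V = \begin{pmatrix} \sigma_1 + \epsilon_{11} & g_{12} \\ g_{21} & \Sigma_2 + G_{22} \\ g_{31} & G_{32} \end{pmatrix}. \tag{4.10}$$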
The following theorem contains the chief result of this subsection.
Theorem 4.6. Let
$$h = \sigma_1 g_{12}^H + \Sigma_2 g_{21}. \tag{4.11}$$
If $\sigma_1^2 I - \Sigma_2^2$ is nonsingular (i.e., if $\sigma_1$ is a simple singular value of $A$), then as $E \to 0$ the matrix (4.10) has a right singular vector of the form
$$\begin{pmatrix} 1 \\ (\sigma_1^2 I - \Sigma_2^2)^{-1} h \end{pmatrix} + O(\|E\|^2) \tag{4.12}$$
corresponding to a singular value $\tilde\sigma_1$ satisfying
$$\tilde\sigma_1^2 = |\sigma_1 + \epsilon_{11}|^2 + \|g_{21}\|^2 + \|g_{31}\|^2 + h^H(\sigma_1^2 I - \Sigma_2^2)^{-1} h + O(\|E\|^3). \tag{4.13}$$
Proof. In Lemma 4.5 let $\alpha = \sigma_1^2$ and $C = \Sigma_2^2$, and let $h$ be defined as in (4.11). Identify the elements of the matrix (4.7) with the elements of the partitioned cross-product matrix $(U^H\tilde AV)^H(U^H\tilde AV)$, so that
$$\begin{aligned}
\tilde\alpha &= |\sigma_1 + \epsilon_{11}|^2 + \|g_{21}\|^2 + \|g_{31}\|^2, \\
\tilde h &= h + \epsilon_{11} g_{12}^H + G_{22}^H g_{21} + G_{32}^H g_{31}, \\
\tilde C &= (\Sigma_2 + G_{22})^H(\Sigma_2 + G_{22}) + g_{12}^H g_{12} + G_{32}^H G_{32}.
\end{aligned}$$
Then the conditions of the lemma are satisfied, and (4.12) and (4.13) follow by making appropriate substitutions in (4.8) and (4.9). ∎
We have expanded the perturbed singular vector (4.12) in terms of the transformed matrix $U^H\tilde AV$; in terms of the original matrix we get the expression
$$\tilde v_1 = v_1 + V_2(\sigma_1^2 I - \Sigma_2^2)^{-1} h + O(\|E\|^2).$$
The results can also be stated in terms of projections. Let $P_1$, $P_2$, and $P_3$ be the orthogonal projections onto the column spaces of $u_1$, $U_2$, and $U_3$, and let $Q_1$ and $Q_2$ be the orthogonal projections onto the column spaces of $v_1$ and $V_2$. Then (with $U_1 = u_1$ and $V_1 = v_1$)
$$\|U_i^H E V_j\|_2 = \|P_i E Q_j\|_2,$$
so that an expression like $|\sigma_1 + \epsilon_{11}|^2 + \|g_{21}\|^2 + \|g_{31}\|^2$ can be written
$$\|P_1\tilde AQ_1\|_2^2 + \|P_2\tilde AQ_1\|_2^2 + \|P_3\tilde AQ_1\|_2^2 = \|\tilde AQ_1\|_2^2.$$
In particular, if $\sigma_1$ is large compared with $E$, then the second-order terms in (4.13) are negligible compared to the first-order terms, and we have
$$\begin{aligned}
\tilde\sigma_1 &= \sigma_1 + \epsilon_{11} + O(\|E\|^2) \\
&= \sigma_1 + u_1^H E v_1 + O(\|E\|^2) \\
&= \|P_1 \tilde A Q_1\|_2 + O(\|E\|^2).
\end{aligned}$$
Our expansion quantifies the observation, made at the beginning of this subsection, that small singular values tend to increase. For if $\sigma_1 = 0$, then $h = \Sigma_2 g_{21}$, and
$$h^H(\sigma_1^2 I - \Sigma_2^2)^{-1} h = -\|g_{21}\|^2.$$
It follows that
$$\begin{aligned}
\tilde\sigma_1^2 &= |\epsilon_{11}|^2 + \|g_{31}\|^2 + O(\|E\|^3) \\
&= |u_1^H E v_1|^2 + \|U_3^H E v_1\|^2 + O(\|E\|^3) \\
&= \|(P_1 + P_3) E Q_1\|_2^2 + O(\|E\|^3).
\end{aligned}$$
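The final expansion is easy to observe numerically. In the following sketch (our own test matrix and the hypothetical helper `small_sigma_expansion`), $A$ is $5 \times 3$ with singular values $0, 1, 2$, the perturbation has $\|E\|_2 = t$, and we compare $\tilde\sigma_1^2$ with $|u_1^HEv_1|^2 + \|U_3^HEv_1\|^2$. The error should be of order $t^3$; the constant used below is deliberately loose.

```python
import numpy as np


def small_sigma_expansion(t, seed=3):
    """Return (actual, predicted) values of sigma_1(A + E)^2 when sigma_1(A) = 0.

    Hypothetical test matrix: A = U Sigma V^T is 5 x 3 with singular values 0, 1, 2,
    the zero one in the first position; E is random with ||E||_2 = t.
    """
    m, n = 5, 3
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    Sigma = np.zeros((m, n))
    Sigma[1, 1], Sigma[2, 2] = 1.0, 2.0           # sigma_1 = 0 sits at (0, 0)
    A = U @ Sigma @ V.T
    E = rng.standard_normal((m, n))
    E *= t / np.linalg.norm(E, 2)                 # scale so that ||E||_2 = t
    u1, v1, U3 = U[:, 0], V[:, 0], U[:, n:]
    Ev1 = E @ v1
    predicted = (u1 @ Ev1) ** 2 + np.sum((U3.T @ Ev1) ** 2)   # |eps11|^2 + ||g31||^2
    actual = np.linalg.svd(A + E, compute_uv=False)[-1] ** 2  # smallest singular value
    return actual, predicted
```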
Notes and References
The $\sin\Phi$, $\sin\Theta$ theorem for the 2-norm (Theorem 4.4) was proven by Wedin [259, 1972], who established the results for arbitrary unitarily invariant norms. The $\sin\Phi$, $\sin\Theta$ theorem for the Frobenius norm (Theorem 4.1) is technically new, but the proof is a modification of a comment by Wedin on another way of proving the theorem for the 2-norm.
Although we have stressed direct bounds in this section, the approach taken in Section 2 for invariant subspaces can be adapted to the singular value decomposition. Briefly, let $U = (U_1\ U_2)$ and $V = (V_1\ V_2)$ be unitary, and let
$$U^H A V = \begin{pmatrix} \Sigma_1 & H^H \\ G & \Sigma_2 \end{pmatrix}.$$
We seek
$$\hat U = (U_1\ U_2)\begin{pmatrix} I & -P^H \\ P & I \end{pmatrix}\begin{pmatrix} (I + P^HP)^{-1/2} & 0 \\ 0 & (I + PP^H)^{-1/2} \end{pmatrix}$$
and
$$\hat V = (V_1\ V_2)\begin{pmatrix} I & -Q^H \\ Q & I \end{pmatrix}\begin{pmatrix} (I + Q^HQ)^{-1/2} & 0 \\ 0 & (I + QQ^H)^{-1/2} \end{pmatrix}$$
such that $\hat U^H A \hat V$ is block diagonal. This requirement leads to the equation
$$T(Q, P) = (G, H) - \phi(Q, P),$$
where
$$T\colon (Q, P) \mapsto (P\Sigma_1 - \Sigma_2 Q,\ Q\Sigma_1^H - \Sigma_2^H P)$$
and
$$\phi\colon (Q, P) \mapsto (PH^HQ,\ QG^HP).$$
If we set $\|(Q, P)\|^2 = \|Q\|_F^2 + \|P\|_F^2$ and let $\|T\|$ be the subordinate operator norm, then $\|T^{-1}\|^{-1} = \min|\sigma(\Sigma_1) - \sigma(\Sigma_2)|$. Theorem 2.11 now applies to give conditions for the existence of $P$ and $Q$ and bounds on their norms. This development is due to Stewart [202, 1973].
The material on perturbation expansions in the second subsection is taken with small changes from a paper by Stewart [212, 1984]. In least squares problems with errors in the least squares matrix (errors in the variables, as they are known to the statistical community), the increase of small singular values manifests itself in a downward bias of the least squares solution (e.g., see [19, 213]). This has led to the development of techniques to remove the bias [80, 92]. It should be noted that the solutions produced by these techniques differ from a least squares solution only in second-order terms and higher [210].
Closely related to these perturbation expansions is a characterization by Sun [230, 1988] of the behavior of a simple singular value when the elements of its matrix are analytic functions of several complex variables.
Exercises
1. Verify Remark 4.2.
2. Let $\sigma = \inf_2(A)$ be the smallest singular value of $A$ with right singular vector $v$, and similarly for $\tilde\sigma$ and $\tilde v$. Let $\delta$ be the distance between $\sigma$ and the next largest singular value of $A$. Show that if $\|E\|_2 < \delta$, then
$$\sin\angle(\tilde v, v) \le \frac{\|E\|_2}{\delta - \|E\|_2}.$$
[Hint: Work with the complementary spaces and regard $A$ as a perturbation of $\tilde A$.]
THE FOLLOWING EXERCISES PRESENT WEDIN'S PROOF [259] OF THEOREM 4.4, WHICH IS VALID FOR ANY UNITARILY INVARIANT NORM $\|\cdot\|$. HERE, IN THE NOTATION OF THE FIRST SUBSECTION, WE SET $A_i = U_i\Sigma_iV_i^H$ $(i = 1, 2)$ AND $\tilde A_i = \tilde U_i\tilde\Sigma_i\tilde V_i^H$ $(i = 1, 2)$.
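3. Show that
$$P_{A_1}^\perp P_{\tilde A_1} = (P_{A_1}^\perp E P_{\tilde A_1^H} + A_2 P_{A_1^H}^\perp P_{\tilde A_1^H})\tilde A_1^+$$
and
$$P_{A_1^H}^\perp P_{\tilde A_1^H} = (P_{A_1^H}^\perp E^H P_{\tilde A_1} + A_2^H P_{A_1}^\perp P_{\tilde A_1})(\tilde A_1^H)^+.$$
4. Let $R$ and $S$ be defined by (4.1). Show that
$$\mu \equiv \max\{\|P_{A_1}^\perp E P_{\tilde A_1^H}\|, \|P_{A_1^H}^\perp E^H P_{\tilde A_1}\|\} \le \max\{\|R\|, \|S\|\}.$$
5. Let $\mu$ be defined as in the last exercise. Show that under the hypotheses of Theorem 4.4
$$\|\sin\Phi\| \le \frac{\mu + \alpha\|\sin\Theta\|}{\alpha + \delta}$$
and
$$\|\sin\Theta\| \le \frac{\mu + \alpha\|\sin\Phi\|}{\alpha + \delta}.$$
Hence
$$\max\{\|\sin\Phi\|, \|\sin\Theta\|\} \le \frac{\max\{\|R\|, \|S\|\}}{\delta}.$$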
6. Fill in the details of the sketch given in the notes and references for an approximation theorem for the singular value decomposition. Be sure to exhibit matrices whose singular values are those of $A$. Derive a perturbation theorem from the approximation theorem.
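7. (A norm version of Theorem 4.6 [208]). Let $A$ have singular values $\sigma_1 \ge \cdots \ge \sigma_n$ and $\tilde A$ have singular values $\tilde\sigma_1 \ge \cdots \ge \tilde\sigma_n$. Show that
$$\tilde\sigma_i^2 = (\sigma_i + \gamma_i)^2 + \eta_i^2, \qquad i = 1, \ldots, p,$$
where
$$|\gamma_i| \le \|P_A E\|_2$$
and
$$\inf_2(P_A^\perp E) \le \eta_i \le \|P_A^\perp E\|_2.$$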
8. (Scaled null vectors [210]). Let $A \in \mathbb{C}^{m\times n}$ have rank $n$, and let $b = Ax$. Let $\tilde A = A + E$ and $\tilde b = b + e$. For $T$ nonsingular, let $\sigma_T$ be the smallest singular value of $(\tilde AT\ \ \tilde b)$ and let
$$\begin{pmatrix} x_T \\ -1 \end{pmatrix}$$
be the corresponding singular vector. Show that
$$Tx_T = \tilde A^+\tilde b + O(\|(E\ \ e)\|_2^2).$$
Chapter VI
Generalized Eigenvalue Problems
A MATRIX PENCIL is a family of matrices $A - \lambda B$, parameterized by a complex number $\lambda$. When $A$ is square and $B = I$, the zeros of the function $\det(A - \lambda B)$ are the eigenvalues of $A$. Consequently, the problem of finding the nontrivial solutions of the equation
$$Ax = \lambda Bx$$
is called the GENERALIZED EIGENVALUE PROBLEM.
Although the generalized eigenvalue problem looks like a simple generalization of the usual eigenvalue problem, it exhibits some important differences. In the first place, it is possible for $\det(A - \lambda B)$ to be identically zero, independent of $\lambda$. For such SINGULAR PENCILS every scalar can be regarded as an eigenvalue.
Second, it is possible for $B$ to be singular, in which case the problem has infinite eigenvalues. To see this, write the generalized eigenvalue problem in the reciprocal form
$$Bx = \lambda^{-1}Ax.$$
If $B$ is singular with a null vector $x$, then $Bx = 0\cdot Ax$, so that $x$ is an eigenvector of the reciprocal problem corresponding to the eigenvalue $\lambda^{-1} = 0$; i.e., $\lambda = \infty$. It might be thought that infinite eigenvalues
are special, unhappy cases to be ignored in our perturbation theory, but that is a misconception. If we write the eigenvalue problem in the cross-product form
$$\beta Ax = \alpha Bx, \tag{1}$$
then we see that infinite eigenvalues correspond to nonzero pairs $(\alpha, \beta)$ for which $\beta = 0$, a case that is not essentially different from the case $\alpha = 0$ (i.e., $\lambda = 0$). In this chapter we will deal with the problem of infinite eigenvalues by treating generalized eigenvalue problems in their cross-product forms.
Finally, there are difficult and unresolved problems connected with the scaling of generalized eigenvalue problems. In the ordinary eigenvalue problem, the fact that $B = I$ provides a natural scale: namely the size of $A$. For the generalized eigenvalue problem, we may scale both $A$ and $B$, and the perturbation bounds we derive will be essentially different for different scalings. This is an open research problem, which will keep returning to haunt us.
In spite of the differences between the generalized and the ordinary eigenvalue problems, they have striking similarities, similarities we will stress as much as the differences. In fact, this chapter is a copy en miniature of the part of the book that concerns eigenvalue problems. The first section is devoted to the background: an algebraic introduction to the subject. We then turn to perturbation bounds for the eigenvalues of regular matrix pencils (the natural generalization of the ordinary eigenvalue problem) and then for their eigenspaces (the natural generalization of their eigenvectors). Finally, we consider both the eigenvalues and eigenspaces of definite matrix pencils, which generalize the Hermitian eigenvalue problem.
Although rectangular matrix pencils (matrix pencils $A - \lambda B$ with $A$ and $B$ rectangular) occur and have important applications, we shall consider only square matrix pencils, for which the perturbation theory is less immature.
Unless otherwise stated, $A$ and $B$ will be square matrices of order $n$ throughout this chapter.
1. Background
1.1. Matrix Pairs
We have seen in the introduction to this chapter that the presence of infinite eigenvalues results from the asymmetrical treatment of $A$ and $B$ in the definition of a matrix pencil and its generalized eigenvalue problem. The solution to the problem is to recast it in the form (1), in which $A$ and $B$ play equivalent roles.
However, there is a technical problem here. If the pair $(\alpha, \beta)$ satisfies (1), then so does $\tau(\alpha, \beta)$ for any scalar $\tau$. Consequently, if we are to regard $(\alpha, \beta)$ as a generalized eigenvalue, we must so regard its nonzero scalar multiples. This suggests that it is the subspace spanned by the vector $(\alpha, \beta)^T$ that should be regarded as the generalized eigenvalue. To distinguish between the subspace and the pair, we make the following definition.
Definition 1.1. Let $(\alpha, \beta) \ne (0, 0)$. Then
$$\langle\alpha, \beta\rangle \equiv \{\tau(\alpha, \beta)^T : \tau \in \mathbb{C}\}.$$
In order to preserve the connection of the generalized eigenvalue problem with the ordinary eigenvalue problem, we will occasionally abuse the notation introduced in the above definition and write $\langle\lambda\rangle$ for $\langle\lambda, 1\rangle$. For infinite $\lambda$, we define $\langle\infty\rangle = \langle 1, 0\rangle$.
We are now in a position to define the generalized eigenvalue problem. Since the definition of matrix pencil treats $A$ and $B$ differently, we will drop the term and refer simply to pairs of matrices.
Definition 1.2. A MATRIX PAIR $(A, B)$ is SINGULAR if for all $(\alpha, \beta)$
$$\det(\beta A - \alpha B) = 0.$$
Otherwise the pair $(A, B)$ is said to be REGULAR. If $(A, B)$ is regular and
$$\beta Ax = \alpha Bx \tag{1.1}$$
for $(\alpha, \beta) \ne (0, 0)$ and $x \ne 0$, then $\langle\alpha, \beta\rangle$ is an EIGENVALUE of $(A, B)$ with (RIGHT) EIGENVECTOR $x$. The corresponding solution $y \ne 0$ of the equation
$$\beta y^H A = \alpha y^H B$$
is called a LEFT EIGENVECTOR.
SOll1e examples may make these definitions clearer.
Example 1.3. Suppose that the null spaces of $A$ and $B$ intersect, and let $x \ne 0$ belong to the intersection. Then for any $(\alpha, \beta)$, we have $(\beta A - \alpha B)x = 0$, so that the pair $(A, B)$ is singular.
Example 1.4. Let $B$ be nonsingular. Then with $(\alpha, \beta) = (1, 0)$, we have
$$\det(\beta A - \alpha B) = \det(-B) = (-1)^n\det(B) \ne 0.$$
Consequently the pair $(A, B)$ is regular.
In fact the eigenvalue problem for the pair in this example is equivalent to an ordinary eigenvalue problem. To see this, note that if $\langle\alpha, \beta\rangle$ is an eigenvalue of $(A, B)$, then $\beta \ne 0$ (otherwise $\alpha Bx = 0$ with $\alpha \ne 0$, which is impossible for nonsingular $B$). It follows that (1.1) can be rewritten in the form $B^{-1}Ax = \lambda x$, where $\lambda = \alpha/\beta$. Conversely, if $\lambda \in \mathcal{L}(B^{-1}A)$, then $\langle\lambda\rangle$ is an eigenvalue of $(A, B)$. This observation (that the generalized eigenvalue problem with nonsingular $B$ can be converted to an ordinary eigenvalue problem) is the basis of many numerical methods, which, however, can fail in the presence of rounding error when $B$ is ill conditioned.
Example 1.5. The pair
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$
is obviously regular. Its eigenvalues are $\langle 1\rangle$ and $\langle\infty\rangle$, and the corresponding eigenvectors are $e_1$ and $e_2$. We shall see that in spite of the infinite eigenvalue, the pair behaves well under perturbations, provided we make the proper definition of "well behaved."
When $B$ is nonsingular, the eigenvalues of the pair $(A, B)$ satisfy the characteristic equation
$$\det(A - \lambda B) = 0.$$
When $B$ is singular, the characteristic equation will have degree less than $n$. For example, the pair in Example 1.5 has a characteristic polynomial $\det(A - \lambda B)$ of degree one. The missing eigenvalue is the infinite one. By transforming the problem we can make the infinite eigenvalues finite and restore the lost degrees in the characteristic equation. The proof of the following theorem is purely computational and is left as an exercise.
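Theorem 1.6. Let $W$ be a $2 \times 2$ nonsingular matrix. Given the pair $(A, B)$, set
$$(C\ \ D) = (A\ \ B)\begin{pmatrix} w_{11}I & w_{12}I \\ w_{21}I & w_{22}I \end{pmatrix} \equiv (A\ \ B)(W \otimes I).$$
Given the pair $(\alpha, \beta) \ne (0, 0)$, define $(\gamma, \delta)$ by
$$\begin{pmatrix} \gamma \\ \delta \end{pmatrix} = W^T\begin{pmatrix} \alpha \\ \beta \end{pmatrix}.$$
Then $\langle\alpha, \beta\rangle$ is an eigenvalue of $(A, B)$ if and only if $\langle\gamma, \delta\rangle$ is an eigenvalue of $(C, D)$.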
If $(A, B)$ is a regular pair, there are constants $\sigma$ and $\tau$ such that $\tau A - \sigma B$ is nonsingular. If we set
$$W = \begin{pmatrix} \sigma & \tau \\ \tau & -\sigma \end{pmatrix},$$
then $W$ is nonsingular. If $C$ and $D$ are defined as in Theorem 1.6, then the eigenvalues of $(A, B)$ are in one-to-one correspondence with those of $(C, D)$. But $D = \tau A - \sigma B$ is nonsingular, and hence by Example 1.4 the eigenvalues of the pair $(C, D)$ are the eigenvalues of $D^{-1}C$. Thus we have established that
a regular matrix pencil of order $n$ has $n$ eigenvalues.
As with the ordinary eigenvalue problem, we will denote the set of eigenvalues of the pair $(A, B)$ by $\mathcal{L}[(A, B)]$.
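The argument above is constructive, and a NumPy sketch of it follows. It assumes (our choice) $\sigma = 1$, $\tau = 2$, and that $D = \tau A - \sigma B$ is then nonsingular; it computes the eigenvalues of $D^{-1}C$ and maps each $\langle\gamma, 1\rangle$ back through $(\gamma, \delta)^T = W^T(\alpha, \beta)^T$ from Theorem 1.6. Eigenvalues are returned as unit-norm pairs $(\alpha, \beta)$, so an infinite eigenvalue shows up as $\beta \approx 0$ rather than as an overflow. The function name `pair_eigenvalues` is hypothetical.

```python
import numpy as np


def pair_eigenvalues(A, B, sigma=1.0, tau=2.0):
    """Eigenvalues <alpha, beta> of a regular pair (A, B) via Theorem 1.6.

    Assumes (our choice) that D = tau*A - sigma*B is nonsingular for the given
    sigma, tau. Rows of the result are unit-norm representatives (alpha, beta).
    """
    W = np.array([[sigma, tau], [tau, -sigma]])
    C = W[0, 0] * A + W[1, 0] * B                 # (C D) = (A B)(W kron I)
    D = W[0, 1] * A + W[1, 1] * B                 # D = tau*A - sigma*B
    gammas = np.linalg.eigvals(np.linalg.solve(D, C))   # eigenvalues <gamma, 1> of (C, D)
    pairs = []
    for g in gammas:
        # (gamma, delta)^T = W^T (alpha, beta)^T with delta = 1
        ab = np.linalg.solve(W.T, np.array([g, 1.0]))
        pairs.append(ab / np.linalg.norm(ab))
    return np.array(pairs)
```

For example, for $A = I$ and $B = \operatorname{diag}(1, 0)$ the sketch returns pairs proportional to $(1, 1)$ and $(1, 0)$, that is, the eigenvalues $\langle 1\rangle$ and $\langle\infty\rangle$.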
1.2. Triangular and Weierstrass Forms
As we saw in Chapter I, an important theme of matrix algebra is the reduction of matrices to simpler forms by means of appropriate transformations. The key word here is "appropriate." For the computation of projections, the appropriate transformation is premultiplication by a unitary matrix. For the eigenvalue problem it is similarity transformations. For the generalized eigenvalue problem it is equivalence transformations.
Definition 1.7. If $X$ and $Y$ are nonsingular, then the pairs $(A, B)$ and $(Y^HAX, Y^HBX)$ are EQUIVALENT.
Equivalence, like similarity, preserves eigenvalues while transforming eigenvectors in a simple manner. The proof of the following theorem is left as an exercise.
Theorem 1.8. Let $\langle\alpha, \beta\rangle$ be an eigenvalue of the pair $(A, B)$ with eigenvector $x$. Then $\langle\alpha, \beta\rangle$ is an eigenvalue of the equivalent pair $(Y^HAX, Y^HBX)$ with eigenvector $X^{-1}x$.
The first application of this observation is a reduction to the equivalent of the Schur form.
Theorem 1.9. Let $(A, B)$ be a regular pair. Then there are unitary matrices $U$ and $V$ such that the components of the equivalent pair $(S, T) = (V^HAU, V^HBU)$ are upper triangular. The quantities $\langle\sigma_{ii}, \tau_{ii}\rangle$ $(i = 1, \ldots, n)$ are the eigenvalues of $(A, B)$ and may be made to appear in any order on the diagonals of $S$ and $T$.
Proof. Let $\langle\sigma, \tau\rangle$ be the first eigenvalue in some prespecified order of the eigenvalues of $(A, B)$, and let $\tau Ax = \sigma Bx$ $(x \ne 0)$. Since $(A, B)$ is regular, not both $Ax$ and $Bx$ can be zero; say $Ax \ne 0$. Let $U = (u\ U_*)$ be a unitary matrix with $u$ proportional to $x$, and let $V = (v\ V_*)$ be a unitary matrix with $v$ proportional to $Ax$. Then
$$V^HAU = \begin{pmatrix} v^HAu & v^HAU_* \\ V_*^HAu & V_*^HAU_* \end{pmatrix} = \begin{pmatrix} v^HAu & v^HAU_* \\ 0 & V_*^HAU_* \end{pmatrix}$$
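is block triangular. Since $\tau Au = \sigma Bu$ and $Au \ne 0$, we have $\sigma \ne 0$, so that $Bu = (\tau/\sigma)Au$ is proportional to $v$; hence $V_*^HBu = 0$, and
$$V^HBU = \begin{pmatrix} v^HBu & v^HBU_* \\ 0 & V_*^HBU_* \end{pmatrix}$$
is also block triangular.
This completes one step of the reduction. The reduction continues inductively à la Schur (cf. Theorem I.3.3). ∎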
We note in passing that singular pairs can also be reduced to triangular form by unitary equivalences. The proof is a minor variant of the above proof.
The computational consequences of this theorem are the same as for the Schur theorem: it provides a target for iterative generalized eigenvalue algorithms to aim for. However, it does not have the broad theoretical implications of Schur's theorem. This is because the transformations involved do not preserve Hermitian matrices. Consequently, we cannot read off the theory of Hermitian pairs from the theorem; instead we must develop it directly, as we will do in the next subsection.
However, the triangular reduction has one important consequence for simple eigenvalues.
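Corollary 1.10. Let $\langle\alpha, \beta\rangle$ be a simple eigenvalue of the regular pair $(A, B)$ with right eigenvector $x$ and left eigenvector $y$. Then
$$\langle\alpha, \beta\rangle = \langle y^HAx, y^HBx\rangle.$$
Proof. It is sufficient to consider $(A, B)$ in the triangular form
$$\left[\begin{pmatrix} \alpha & a^H \\ 0 & A_* \end{pmatrix}, \begin{pmatrix} \beta & b^H \\ 0 & B_* \end{pmatrix}\right].$$
In this case $x$ is a multiple of $e_1$. Moreover, the first component of $y$ is nonzero; otherwise, $y$ would be a left eigenvector of $(A_*, B_*)$, contradicting the simplicity of $\langle\alpha, \beta\rangle$. Hence $y^Hx \ne 0$ and
$$\langle y^HAx, y^HBx\rangle = \langle \alpha y^Hx, \beta y^Hx\rangle = \langle\alpha, \beta\rangle.$$
∎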
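We now turn to the further reduction of the Schur form to block diagonal form. Let $(A, B)$ be a pair in triangular form and partition
$$A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix}.$$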
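We wish to find matrices $P$ and $Q$ such that
$$\begin{pmatrix} I & Q \\ 0 & I \end{pmatrix}\begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}\begin{pmatrix} I & P \\ 0 & I \end{pmatrix} = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}$$
and
$$\begin{pmatrix} I & Q \\ 0 & I \end{pmatrix}\begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix}\begin{pmatrix} I & P \\ 0 & I \end{pmatrix} = \begin{pmatrix} B_{11} & 0 \\ 0 & B_{22} \end{pmatrix}.$$
With a little manipulation, this requirement yields the pair of equations
$$\begin{aligned} A_{11}P + QA_{22} &= -A_{12}, \\ B_{11}P + QB_{22} &= -B_{12}, \end{aligned}$$
which may be called the GENERALIZED SYLVESTER EQUATIONS. If we set
$$T\colon (P, Q) \mapsto (A_{11}P + QA_{22},\ B_{11}P + QB_{22}), \tag{1.2}$$
then our problem becomes one of determining when the linear operator $T$ is nonsingular. It turns out that a separation condition, analogous to the condition of Theorem V.1.3 for the ordinary eigenvalue problem, is necessary and sufficient for $T$ to be nonsingular.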
Theorem 1.11. Let $(A_{11}, B_{11})$ and $(A_{22}, B_{22})$ be regular pairs, and let $T$ be defined by (1.2). Then $T$ is nonsingular if and only if
$$\mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})] = \emptyset. \tag{1.3}$$
Proof. Suppose that $\mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})] = \emptyset$. We will show that for any $(R, S)$, the equation $T(P, Q) = (R, S)$ has a solution, which implies that $T$ is nonsingular.
We may assume without loss of generality that $A_{11}$, $A_{22}$, $B_{11}$, and $B_{22}$ are upper triangular. For by Theorem 1.9, there are unitary matrices $U_i$, $V_i$ $(i = 1, 2)$ such that the pairs $(V_i^HA_{ii}U_i, V_i^HB_{ii}U_i)$ are upper triangular. Then the equation $T(P, Q) = (R, S)$ is equivalent to
$$\begin{aligned}
(V_1^HA_{11}U_1)(U_1^HPU_2) + (V_1^HQV_2)(V_2^HA_{22}U_2) &= V_1^HRU_2, \\
(V_1^HB_{11}U_1)(U_1^HPU_2) + (V_1^HQV_2)(V_2^HB_{22}U_2) &= V_1^HSU_2.
\end{aligned}$$
Hence with the substitutions $A_{11} \leftarrow V_1^HA_{11}U_1$, $P \leftarrow U_1^HPU_2$, etc., the problem reduces to one in which the pairs $(A_{ii}, B_{ii})$ $(i = 1, 2)$ are triangular.
We shall now show how to solve the equations
$$\begin{aligned} A_{11}P + QA_{22} &= R, \\ B_{11}P + QB_{22} &= S \end{aligned} \tag{1.4}$$
column by column, beginning with the first columns of $P$ and $Q$. Suppose that the columns $p_1, p_2, \ldots, p_{k-1}$ and $q_1, q_2, \ldots, q_{k-1}$ have already been computed (n.b., $k$ may be equal to one). From (1.4) and the upper triangularity of $A_{ii}$ and $B_{ii}$, it follows that the $k$th columns of $P$ and $Q$ must satisfy
$$\begin{aligned}
A_{11}p_k + \alpha_{kk}^{(22)}q_k &= r_k - \sum_{i=1}^{k-1}\alpha_{ik}^{(22)}q_i, \\
B_{11}p_k + \beta_{kk}^{(22)}q_k &= s_k - \sum_{i=1}^{k-1}\beta_{ik}^{(22)}q_i.
\end{aligned} \tag{1.5}$$
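Multiply the first equation by $\beta_{kk}^{(22)}$, the second by $\alpha_{kk}^{(22)}$, and subtract to get
$$(\beta_{kk}^{(22)}A_{11} - \alpha_{kk}^{(22)}B_{11})p_k = \beta_{kk}^{(22)}\Bigl(r_k - \sum_{i=1}^{k-1}\alpha_{ik}^{(22)}q_i\Bigr) - \alpha_{kk}^{(22)}\Bigl(s_k - \sum_{i=1}^{k-1}\beta_{ik}^{(22)}q_i\Bigr). \tag{1.6}$$
Since $\mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})] = \emptyset$, the matrix $\beta_{kk}^{(22)}A_{11} - \alpha_{kk}^{(22)}B_{11}$ is nonsingular: it is upper triangular, and its $j$th diagonal element $\beta_{kk}^{(22)}\alpha_{jj}^{(11)} - \alpha_{kk}^{(22)}\beta_{jj}^{(11)}$ can vanish only if $\langle\alpha_{jj}^{(11)}, \beta_{jj}^{(11)}\rangle = \langle\alpha_{kk}^{(22)}, \beta_{kk}^{(22)}\rangle$. Hence equation (1.6) may be solved for $p_k$. Since $(A_{22}, B_{22})$ is a regular pair, not both $\alpha_{kk}^{(22)}$ and $\beta_{kk}^{(22)}$ are zero. Hence one of the equations (1.5) may be solved for $q_k$, and it is easily verified that this solution is consistent with the other equation. This completes the computation of $p_k$ and $q_k$ from $p_1, p_2, \ldots, p_{k-1}$ and $q_1, q_2, \ldots, q_{k-1}$.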
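The column-by-column procedure just described translates directly into code. The following NumPy sketch (the name `solve_generalized_sylvester` is ours) assumes that both pairs are already upper triangular, regular, and without common eigenvalues; at step $k$ it forms the right-hand sides of (1.5), solves (1.6) for $p_k$, and recovers $q_k$ from whichever equation of (1.5) has the larger pivot $|\alpha_{kk}^{(22)}|$ or $|\beta_{kk}^{(22)}|$.

```python
import numpy as np


def solve_generalized_sylvester(A11, B11, A22, B22, R, S):
    """Solve A11 P + Q A22 = R, B11 P + Q B22 = S column by column.

    Assumes (A11, B11) and (A22, B22) are upper triangular regular pairs with no
    common eigenvalue, as in the proof of Theorem 1.11.
    """
    n1, n2 = R.shape
    P = np.zeros((n1, n2), dtype=np.result_type(A11, B11, A22, B22, R, S))
    Q = np.zeros_like(P)
    for k in range(n2):
        a, b = A22[k, k], B22[k, k]
        rk = R[:, k] - Q[:, :k] @ A22[:k, k]      # right-hand sides of (1.5)
        sk = S[:, k] - Q[:, :k] @ B22[:k, k]
        # (1.6): eliminate q_k and solve the triangular system for p_k
        P[:, k] = np.linalg.solve(b * A11 - a * B11, b * rk - a * sk)
        # recover q_k from the equation of (1.5) with the larger pivot
        if abs(a) >= abs(b):
            Q[:, k] = (rk - A11 @ P[:, k]) / a
        else:
            Q[:, k] = (sk - B11 @ P[:, k]) / b
    return P, Q
```

Choosing the larger pivot is our numerical choice; in exact arithmetic either nonzero pivot gives the same $q_k$.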
For the converse, suppose that $\langle\lambda\rangle \in \mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})]$, and assume without loss of generality that $\lambda \ne \infty$ (otherwise reverse the roles of $A$ and $B$ in what follows). Then there are nonzero vectors $x$ and $y$ such that
$$A_{11}x = \lambda B_{11}x, \qquad y^HA_{22} = \lambda y^HB_{22}.$$
Let
$$P = xy^HB_{22}, \qquad Q = -B_{11}xy^H.$$
Then
$$T(P, Q) = [(\lambda - \lambda)B_{11}xy^HB_{22},\ (1 - 1)B_{11}xy^HB_{22}] = (0, 0),$$
which shows that $T$ is singular (note that $P \ne 0$, since $y^HB_{22} = 0$ would imply $y^HA_{22} = 0$, contradicting the regularity of $(A_{22}, B_{22})$). ∎
One consequence of this theorem is that we can reduce any regular pair to any block diagonal form in which the diagonal block-pairs do not have common eigenvalues. In particular, we have the following corollary.
Corollary 1.12. Let the regular pair $(A, B)$ have distinct eigenvalues. Then there are nonsingular matrices $X$ and $Y$ such that the pair $(Y^HAX, Y^HBX)$ is diagonal. The columns of $X$ are the right eigenvectors of $(A, B)$, and the columns of $Y$ are its left eigenvectors.
A more interesting consequence is WEIERSTRASS'S CANONICAL FORM.
Theorem 1.13 (Weierstrass). Any regular pair is equivalent to a form
$$[\operatorname{diag}(J, I),\ \operatorname{diag}(I, N)], \tag{1.7}$$
where $J$ and $N$ are in Jordan canonical form and $N$ is nilpotent (i.e., has only zero eigenvalues).
Proof. Assume that the eigenvalues of the pair have been ordered so that its triangular form can be partitioned
$$\left[\begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix}\right],$$
where the diagonals of $B_{11}$ are nonzero and the diagonals of $B_{22}$ are zero. Since $\mathcal{L}[(A_{11}, B_{11})] \cap \mathcal{L}[(A_{22}, B_{22})] = \emptyset$, we may further reduce the pair to the form
$$[\operatorname{diag}(A_{11}, A_{22}),\ \operatorname{diag}(B_{11}, B_{22})].$$
Since the pair is regular and the diagonals of $B_{22}$ are zero, the diagonals of $A_{22}$ are nonzero; i.e., $A_{22}$ is nonsingular. Hence we may further reduce the pair to the form
$$[\operatorname{diag}(A_{11}B_{11}^{-1}, I),\ \operatorname{diag}(I, B_{22}A_{22}^{-1})].$$
The reduction is completed by reducing $A_{11}B_{11}^{-1}$ and $B_{22}A_{22}^{-1}$ to their Jordan canonical forms. ∎
1.3. Definite Pairs
The natural generalization of the Hermitian eigenvalue problem is to pairs of Hermitian matrices. Unfortunately, the property of being a Hermitian pair is not in itself enough to guarantee that the pair has nice properties, as the following example shows.
Example 1.14. Let
$$A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Then the pair $(A, B)$ is Hermitian, but the eigenvalues of the pair are clearly $\langle\pm i\rangle$.
Even more pathological cases can occur. For example, any matrix can be written in the form $B^{-1}A$, where $A$ and $B$ are Hermitian. Clearly, we must impose additional conditions if we are to have a workable theory. One possibility, which accounts for many applications, is to require that $B$ be positive definite. This condition is justified by the following theorem.
Theorem 1.15. In the Hermitian pair $(A, B)$ let $B$ be positive definite. Then there is a nonsingular matrix $X$ satisfying $X^HBX = I$ such that $X^HAX = \Lambda$, where $\Lambda$ is real and diagonal.
Proof. Since $B$ is positive definite, it has a positive definite square root $B^{1/2}$. Then the pair $(A, B)$ is equivalent to the pair $(B^{-1/2}AB^{-1/2}, I)$. Let $B^{-1/2}AB^{-1/2} = U\Lambda U^H$ be the spectral decomposition of $B^{-1/2}AB^{-1/2}$. Then $X = B^{-1/2}U$ is easily seen to be the required matrix. ∎
The construction used in the proof of this theorem can be used to establish a min-max characterization of the eigenvalues of $(A, B)$ in the spirit of Fischer's theorem (Corollary IV.4.7). In fact, the following corollary, whose proof is left as an exercise, is what Fischer originally established.
Corollary 1.16 (Fischer). Let the eigenvalues of $(A, B)$ be ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Then
$$\lambda_i = \max_{\dim(\mathcal{X}) = i}\ \min_{\substack{x \in \mathcal{X} \\ x \ne 0}} \frac{x^HAx}{x^HBx}$$
and
$$\lambda_i = \min_{\dim(\mathcal{X}) = n-i+1}\ \max_{\substack{x \in \mathcal{X} \\ x \ne 0}} \frac{x^HAx}{x^HBx}.$$
Although the condition that B be positive definite covers many cases
occurring in practice, we can make do with an even weaker condition,
which includes cases in which neither A nor B is positive definite.
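The construction in the proof of Theorem 1.15 is easy to carry out numerically, and it shows concretely where the positive definiteness of B is used. The NumPy sketch below is illustrative only. The function name `simultaneous_diagonalize` is ours, and the code assumes that A is Hermitian and B is Hermitian positive definite. It forms $B^{-1/2}$ from a spectral decomposition of B, diagonalizes $B^{-1/2} A B^{-1/2}$, and returns $X = B^{-1/2} U$.

```python
import numpy as np

def simultaneous_diagonalize(A, B):
    """Sketch of the construction in the proof of Theorem 1.15.

    Assumes A is Hermitian and B is Hermitian positive definite.
    Returns (X, lam) with X^H B X = I and X^H A X = diag(lam), lam ascending.
    """
    # B = V diag(d) V^H with d > 0, so B^{-1/2} = V diag(d^{-1/2}) V^H.
    d, V = np.linalg.eigh(B)
    B_inv_half = (V / np.sqrt(d)) @ V.conj().T
    # Spectral decomposition of the Hermitian matrix B^{-1/2} A B^{-1/2}.
    C = B_inv_half @ A @ B_inv_half
    C = (C + C.conj().T) / 2          # remove rounding-level asymmetry
    lam, U = np.linalg.eigh(C)
    return B_inv_half @ U, lam

# Example: a random symmetric A and a positive definite B.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
N = rng.standard_normal((4, 4))
A = M + M.T
B = N @ N.T + 4 * np.eye(4)
X, lam = simultaneous_diagonalize(A, B)
print(np.allclose(X.T @ B @ X, np.eye(4)))      # True
print(np.allclose(X.T @ A @ X, np.diag(lam)))   # True
```

`np.linalg.eigh` returns eigenvalues in ascending order. That is the reverse of the ordering $\lambda_1 \ge \cdots \ge \lambda_n$ used in Corollary 1.16.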
Definition 1.17. The Hermitian pair (A, B) is a DEFINITE PAIR if
$$\gamma(A, B) \stackrel{\mathrm{def}}{=} \min_{x \in \mathbb{C}^n,\ \|x\|_2 = 1} |x^H (A + iB) x| = \min_{x \in \mathbb{C}^n,\ \|x\|_2 = 1} \sqrt{(x^H A x)^2 + (x^H B x)^2} > 0. \qquad (1.8)$$
The basic fact about definite pairs is that they can be transformed
into a pair in which B is positive definite. Specifically, we have the
following result.
Theorem 1.18. Let (A, B) be a definite pair, and for $\varphi \in \mathbb{R}$ let
$$A_\varphi = A\cos\varphi - B\sin\varphi, \qquad B_\varphi = A\sin\varphi + B\cos\varphi. \qquad (1.9)$$
Then there is a $\varphi \in [0, 2\pi)$ such that $B_\varphi$ is positive definite and
$$\gamma(A, B) = \lambda_{\min}(B_\varphi),$$
where $\lambda_{\min}(B_\varphi)$ is the smallest eigenvalue of $B_\varphi$.
Proof. Let $\mathcal{F}$ be the field of values of A + iB (see Definition 3.10). Then $\gamma(A, B) = \min_{h \in \mathcal{F}} |h|$. Let the minimum be attained at the point $h = x_0^H(A + iB)x_0$, where $\|x_0\|_2 = 1$. Since $\mathcal{F}$ is a bounded, convex set (see Theorem 3.11), it is contained in the half plane $\mathcal{H}$ whose boundary passes through h perpendicular to the segment joining 0 and h.
Let $\mathcal{F}_\varphi$, $\mathcal{H}_\varphi$, and $h_\varphi$ be the quantities corresponding to the pair $(A_\varphi, B_\varphi)$. Since $A_\varphi + iB_\varphi = e^{i\varphi}(A + iB)$, these quantities are just the original quantities rotated through the angle φ. Choose φ so that $\mathcal{H}_\varphi$ lies in the upper half plane; i.e., so that $h_\varphi$ lies along the positive imaginary axis. Then $x_0^H A_\varphi x_0 = \operatorname{Re} h_\varphi = 0$. Moreover, since no point of $\mathcal{F}_\varphi$ lies below the boundary of $\mathcal{H}_\varphi$, we must have
$$0 < \gamma(A, B) = x_0^H B_\varphi x_0 = \min_{\|x\|_2 = 1} x^H B_\varphi x = \lambda_{\min}(B_\varphi),$$
which proves that $B_\varphi$ is positive definite. ■
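The proof suggests a simple, if crude, way to find the rotation numerically. For every φ we have $\lambda_{\min}(B_\varphi) = \min_{\|x\|_2=1} \operatorname{Im}\big(e^{i\varphi} x^H(A+iB)x\big) \le \gamma(A, B)$, and Theorem 1.18 says that equality holds for some φ when the pair is definite. The NumPy sketch below therefore scans a grid of angles and keeps the largest $\lambda_{\min}(B_\varphi)$. The grid search and the name `rotate_to_positive_definite` are our own illustrative choices. This is not an algorithm from the text, and its accuracy is limited by the grid spacing.

```python
import numpy as np

def rotate_to_positive_definite(A, B, num_angles=3600):
    """Grid search over phi for the rotation (1.9) of Theorem 1.18.

    For every phi, lambda_min(B_phi) <= gamma(A, B); for a definite pair the
    theorem guarantees equality at some phi. The best grid value is therefore
    a lower bound on gamma(A, B), accurate up to the grid spacing.
    """
    best_phi, best_val = 0.0, -np.inf
    for phi in np.linspace(0.0, 2 * np.pi, num_angles, endpoint=False):
        B_phi = A * np.sin(phi) + B * np.cos(phi)
        val = np.linalg.eigvalsh(B_phi)[0]        # smallest eigenvalue
        if val > best_val:
            best_phi, best_val = phi, val
    return best_phi, best_val

# A definite pair in which neither A nor B is positive definite:
# the field of values of A + iB is the segment from 2 - i to -1 + 2i.
A = np.diag([2.0, -1.0])
B = np.diag([-1.0, 2.0])
phi, gamma = rotate_to_positive_definite(A, B)
print(round(float(phi), 4), round(float(gamma), 4))   # 0.7854 0.7071
```

Here the segment is closest to the origin at 0.5 + 0.5i, so $\gamma(A, B) = 1/\sqrt{2}$, and the rotation angle is π/4.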
If we now combine Theorems 1.15 and 1.18, we have the following
corollary.
Corollary 1.19. Let (A, B) be a definite pair. Then (A, B) is regular. Moreover, there is a nonsingular matrix X such that $X^H A X$ and $X^H B X$ are diagonal.
1.4. Metrics and Their Limitations
A novel feature of the perturbation theory for the generalized eigenvalue problem is that two matrices, A and B, vary instead of one.
Moreover, when we consider the perturbation of eigenvalues, we must
introduce a distance between pairs (α, β) and (γ, δ). This subsection
is devoted to defining the metrics that will be used in the remainder of
the chapter.
We will first consider metrics for eigenvalues. Since we have chosen
to regard eigenvalues of matrix pairs as one-dimensional subspaces of $\mathbb{C}^2$, it
is natural to use the metrics of Section II.4. Of these metrics, one, the
chordal metric, has an especially natural geometric interpretation.
Definition 1.20. The CHORDAL DISTANCE between (α, β) and (γ, δ) is the number
$$\chi((\alpha, \beta), (\gamma, \delta)) \stackrel{\mathrm{def}}{=} \rho_{g,2}((\alpha, \beta), (\gamma, \delta)),$$
where $\rho_{g,2}$ is the gap metric in the 2-norm (see Definition II.4.3).
By definition the chordal distance is a metric. It is also easily computed. In terms of α, β, γ, and δ it has the form
$$\chi((\alpha, \beta), (\gamma, \delta)) = \frac{|\alpha\delta - \beta\gamma|}{\sqrt{|\alpha|^2 + |\beta|^2}\,\sqrt{|\gamma|^2 + |\delta|^2}}.$$
If we set λ = α/β and μ = γ/δ, then we have
$$\chi((\lambda), (\mu)) = \frac{|\lambda - \mu|}{\sqrt{1 + |\lambda|^2}\,\sqrt{1 + |\mu|^2}}.$$
From the latter form it is seen that
$$\chi((\lambda), (\infty)) = \frac{1}{\sqrt{1 + |\lambda|^2}} \le 1.$$
Thus, the chordal metric regularizes the point at infinity by making it
no more than unit distance from any other point.
The name "chordal metric" comes from the following considerations. In $\mathbb{R}^3$ let the x-y plane represent the complex numbers. For any complex number λ draw a line between λ and the point (0, 0, 1), and let s(λ) be the intersection, other than (0, 0, 1), of the line with the unit sphere centered at the origin (the Riemann sphere). Then it can be shown that
$$\chi(\lambda, \mu) = \frac{1}{2}\,\|s(\lambda) - s(\mu)\|_2. \qquad (1.10)$$
In other words, the chordal distance between λ and μ is one half the length of the chord joining the projections of λ and μ onto the Riemann sphere.
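As a quick numerical check of the closed form for χ and of (1.10), the NumPy sketch below computes χ directly from its formula and compares it with half the chord between two points on the sphere. The helper names `chordal` and `riemann` are ours. The projection formula $s(\lambda) = (2\operatorname{Re}\lambda,\ 2\operatorname{Im}\lambda,\ |\lambda|^2 - 1)/(|\lambda|^2 + 1)$ is the standard stereographic projection onto the unit sphere from the point (0, 0, 1).

```python
import numpy as np

def chordal(alpha, beta, gamma, delta):
    """Chordal distance between (alpha, beta) and (gamma, delta)."""
    num = abs(alpha * delta - beta * gamma)
    return num / (np.hypot(abs(alpha), abs(beta)) * np.hypot(abs(gamma), abs(delta)))

def riemann(lam):
    """Stereographic projection s(lam) onto the unit sphere from (0, 0, 1)."""
    lam = complex(lam)
    r2 = abs(lam) ** 2
    return np.array([2 * lam.real, 2 * lam.imag, r2 - 1]) / (r2 + 1)

lam, mu = 0.3 + 2.0j, -1.5 + 0.25j
lhs = chordal(lam, 1, mu, 1)                              # chi((lam), (mu))
rhs = 0.5 * np.linalg.norm(riemann(lam) - riemann(mu))    # right side of (1.10)
print(np.isclose(lhs, rhs))                               # True
# The point at infinity is represented by the pair (1, 0).
print(np.isclose(chordal(1, 0, lam, 1), 1 / np.sqrt(1 + abs(lam) ** 2)))  # True
```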
For numbers less than one in magnitude, the chordal metric behaves essentially like the ordinary Euclidean metric. In particular,
$$|\lambda|, |\mu| \le 1 \implies \chi((\lambda), (\mu)) \le |\lambda - \mu| \le 2\chi((\lambda), (\mu)).$$
Moreover, as λ, μ → 0, we have $\chi((\lambda), (\mu)) \cong |\mu - \lambda|$.
On the other hand, for large numbers the chordal metric behaves counter-intuitively. For example, as λ → ∞ we have
$$\chi((\lambda), (2\lambda)) \cong \frac{1}{2|\lambda|}.$$
Thus large numbers can have very small chordal differences, even when they have large relative errors.
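Both regimes can be observed numerically. The sketch below is ours and uses NumPy. It samples pairs of points inside the unit disk and checks the two-sided comparison with $|\lambda - \mu|$. It then evaluates $2|\lambda|\,\chi((\lambda), (2\lambda))$ for increasingly large λ, and that product should approach 1.

```python
import numpy as np

def chi(lam, mu):
    """Chordal distance between the finite eigenvalues (lam) and (mu)."""
    return abs(lam - mu) / (np.sqrt(1 + abs(lam) ** 2) * np.sqrt(1 + abs(mu) ** 2))

rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    # random points with |lam|, |mu| <= 0.7 * sqrt(2) < 1
    lam, mu = rng.uniform(-0.7, 0.7, 2) + 1j * rng.uniform(-0.7, 0.7, 2)
    d = chi(lam, mu)
    ok = ok and d <= abs(lam - mu) <= 2 * d
print(ok)  # True

# For large lam, chi((lam), (2 lam)) behaves like 1 / (2 |lam|).
ratios = [chi(lam, 2 * lam) * 2 * lam for lam in (1e2, 1e4, 1e6)]
print(ratios)
```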
Let us now consider metrics for matrix pairs. Let (A, B) be a matrix pair and let $(\tilde A, \tilde B) = (A + E, B + F)$. A natural way to define the distance between the pairs is to apply a norm to the difference $(\tilde A\ \tilde B) - (A\ B)$, i.e., to the matrix (E F). For example, we might say that the distance between (A, B) and $(\tilde A, \tilde B)$ is $\|(E\ F)\|_2$ or $\sqrt{\|E\|^2 + \|F\|^2}$ or, depending on the application, some other combination.
In many respects this is the most natural approach; however, it has an important drawback. Since the generalized eigenvalue problem is homogeneous in A and B, we do not feel there is any substantial difference between the pair (A, B) and any nonzero multiple (τA, τB). But with the above approach, these two pairs have positive distance unless τ = 1. Consequently, we will also use other, less discriminating metrics.
Let us start with definite pairs. We will take the same approach as
we did with eigenvalues and define our metric over equivalence classes
of pairs, which in some sense represent the same problem.
Definition 1.21. Let (A, B) be a definite pair. Then $(A, B)_D$ (D for definite) is the set of pairs (C, D) such that there exists a real multiplier μ for which one of the following conditions holds:
1. C = μA and D = μB (μ ≠ 0),
2. A = μB and C = μD,
3. B = μA and D = μC.
The first case in the above definition corresponds to the case where the pair (C, D) is proportional to the pair (A, B), and hence the pairs have the same eigenvalues. In the second case both pairs have the single eigenvalue (μ), and in the third they have the eigenvalue $(\mu^{-1})$. It is easily verified that the operator $(\cdot, \cdot)_D$ divides the set of definite pairs into equivalence classes.
We now define
$$\rho((A, B)_D, (C, D)_D) \stackrel{\mathrm{def}}{=} \max_{x \ne 0} \chi\big((x^H A x, x^H B x), (x^H C x, x^H D x)\big). \qquad (1.11)$$
Theorem 1.22. The function ρ defined by (1.11) is a metric on the space of equivalence classes $(A, B)_D$ of definite pairs.
Proof. The only property of a metric that is difficult to verify is that
$$\rho((A, B)_D, (C, D)_D) = 0 \implies (A, B)_D = (C, D)_D.$$
We will establish this implication, leaving the other properties as an exercise.
Suppose that $\rho((A, B)_D, (C, D)_D) = 0$. Then for all x
$$x^H A x \; x^H D x = x^H B x \; x^H C x. \qquad (1.12)$$
Let us first dispose of cases two and three in Definition 1.21. Suppose that $A = \mu B$. Since (A, B) is definite, it follows that $x^H B x \ne 0$ for all $x \ne 0$. Hence from (1.12), we have $x^H C x = \mu\, x^H D x$ for all x. Equivalently $C = \mu D$, which shows that $(A, B)_D = (C, D)_D$. The third case is treated similarly.
Turning to the first case, it is easy to see that the first relation in Definition 1.21 and the equation (1.12) remain invariant under substitutions of the form
$$A \leftarrow A\cos\varphi - B\sin\varphi, \quad B \leftarrow A\sin\varphi + B\cos\varphi, \quad C \leftarrow C\cos\varphi - D\sin\varphi, \quad D \leftarrow C\sin\varphi + D\cos\varphi.$$
Hence by Theorem 1.18, we may assume B is positive definite. The same relations are also invariant under congruence transformations. Hence by Theorem 1.15 we may assume that B = I and that $A = \operatorname{diag}(\alpha_1 I, \ldots, \alpha_m I)$, where the $\alpha_i$ are distinct numbers. Since A is not a multiple of B, we must have m > 1. The general proof is sufficiently well illustrated by the case m = 2.
Let
$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}, \qquad D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}$$
be conformal partitions of C and D. Consider the vector
$$x = \begin{pmatrix} x_1 \\ \tau x_2 \end{pmatrix},$$
where τ is real. From (1.12)
$$(\alpha_1 x_1^H x_1 + \alpha_2 x_2^H x_2\,\tau^2)\,[x_1^H D_{11} x_1 + (x_1^H D_{12} x_2 + x_2^H D_{21} x_1)\tau + x_2^H D_{22} x_2\,\tau^2]$$
$$= (x_1^H x_1 + x_2^H x_2\,\tau^2)\,[x_1^H C_{11} x_1 + (x_1^H C_{12} x_2 + x_2^H C_{21} x_1)\tau + x_2^H C_{22} x_2\,\tau^2].$$
Equating powers of τ, we get
$$\alpha_j\, x_j^H D_{jj} x_j = x_j^H C_{jj} x_j, \qquad j = 1, 2, \qquad (1.13)$$
$$x_1^H x_1\; x_2^H(\alpha_1 D_{22} - C_{22})x_2 = x_2^H x_2\; x_1^H(C_{11} - \alpha_2 D_{11})x_1, \qquad (1.14)$$
and
$$\alpha_j(x_1^H D_{12} x_2 + x_2^H D_{21} x_1) = x_1^H C_{12} x_2 + x_2^H C_{21} x_1, \qquad j = 1, 2. \qquad (1.15)$$
From (1.13)
$$C_{11} = \alpha_1 D_{11}, \qquad C_{22} = \alpha_2 D_{22}.$$
Hence from (1.14), since $\alpha_1 \ne \alpha_2$, we obtain
$$\frac{x_2^H D_{22} x_2}{x_2^H x_2} = \frac{x_1^H D_{11} x_1}{x_1^H x_1} = \mu$$
for some real μ. Since this relation holds generally in $x_1$ and $x_2$, we must have
$$D_{11} = D_{22} = \mu I.$$
Since $\alpha_1 \ne \alpha_2$, we have from (1.15)
$$x_1^H D_{12} x_2 + x_2^H D_{21} x_1 = 0. \qquad (1.16)$$
Taking first $x_2 = D_{21} x_1$ and then $x_1 = D_{12} x_2$ in (1.16), we obtain
$$D_{12} = 0, \qquad D_{21} = 0.$$
The right-hand side of (1.15) then vanishes, and the same argument gives $C_{12} = 0$ and $C_{21} = 0$. Hence
$$C = \mu A \quad\text{and}\quad D = \mu I = \mu B.$$
Since (C, D) is definite, μ ≠ 0, and therefore $(C, D)_D = (A, B)_D$. ■
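Computing $\rho_D$ exactly requires maximizing over all x in (1.11). A cheap way to get a feel for it is to sample. The NumPy sketch below is ours and is not a method from the text. It evaluates the chordal distance in (1.11) at many random complex vectors and returns the largest value found, which can only underestimate $\rho_D$. It assumes both pairs are definite, so that no denominator vanishes. Because χ is unchanged when x is scaled, the samples need not be normalized.

```python
import numpy as np

def rho_D_lower_bound(A, B, C, D, samples=20000, seed=0):
    """Monte Carlo lower bound on rho_D[(A, B), (C, D)] of (1.11).

    Takes the largest chordal distance between (x^H A x, x^H B x) and
    (x^H C x, x^H D x) over random complex x. Sampling can only
    underestimate the maximum, so this is not a certified value.
    Assumes both pairs are definite (so no denominator vanishes).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = rng.standard_normal((samples, n)) + 1j * rng.standard_normal((samples, n))

    def forms(M):
        # x^H M x for every sample x (row of X); real because M is Hermitian
        return np.einsum("ij,jk,ik->i", X.conj(), M, X).real

    a, b, c, d = forms(A), forms(B), forms(C), forms(D)
    # chi is scale invariant, so x need not be normalized
    return float(np.max(np.abs(a * d - b * c) / (np.hypot(a, b) * np.hypot(c, d))))

A = np.diag([1.0, 2.0, 3.0])
B = np.eye(3)
Dm = np.diag([1.0, 2.0, 5.0])
print(rho_D_lower_bound(A, B, 2 * A, 2 * B))                 # case 1: essentially zero
print(rho_D_lower_bound(3 * B, B, 3 * Dm, Dm))               # case 2: essentially zero
print(rho_D_lower_bound(A, B, np.diag([3.0, 2.0, 1.0]), B))  # different classes: positive
```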
The notation $\rho((A, B)_D, (C, D)_D)$ is a little clumsy, and in the sequel we will write
$$\rho_D[(A, B), (C, D)].$$
However, the reader should keep in mind that the function $\rho_D$ so defined is a pseudo-metric, not a metric.
Turning now to general, regular matrix pairs, we will say that the pair (A, B) is LEFT EQUIVALENT to (C, D) if there is a nonsingular matrix Y such that $Y^H(A, B) = (C, D)$. This is clearly an equivalence relation, and we shall denote the equivalence classes by $(A, B)_L$.
We wish to define a metric over the equivalence classes $(A, B)_L$. The key observation is that (A, B) is left equivalent to (C, D) if and only if the row spaces of the matrices (A B) and (C D) are the same. Consequently we can define a metric by using any one of the metrics in Section II.4. For definiteness we will choose the gap metric in the 2-norm, or equivalently we will make the following definition.
Definition 1.23. Let (A, B) and (C, D) be regular matrix pairs. Then
$$\rho((A, B)_L, (C, D)_L) = \sin\theta_1,$$
where $\theta_1$ is the largest canonical angle between the row spaces of (A B) and (C D).
Again, we will usually write $\rho_L[(A, B), (C, D)]$ for $\rho((A, B)_L, (C, D)_L)$. Note that there is a natural function $\rho_R$ obtained by considering the pair $(A^H, B^H)$.
Let us now step back from the technical details and take a broader view. The justification for the metrics we have introduced is convenience and elegance. The metrics are convenient because they regularize singular cases. For example, all eigenvalues, finite and infinite, are treated uniformly. The elegance can only be judged by the results, but the reader is invited to look ahead to the statement of Theorem 3.2.
On the other hand, convenience and elegance exact a toll. In Chapter III we discussed the loss of information entailed in using norms, and the same caveats apply here. But there is more.
In the next section we will show that if x is an eigenvector of the definite pair (A, B) corresponding to the simple eigenvalue $(x^H A x, x^H B x)$, then $(x^H \tilde A x, x^H \tilde B x)$ is a first order approximation of the eigenvalue of the perturbed pair $(\tilde A, \tilde B)$. This implies that, up to first order terms, the errors in, say, A do not cross over to affect the second component β of the eigenvalue. But the metric $\rho_D$ confounds errors in A and B, while the chordal metric confounds the resulting perturbations in α and β.
Moreover, the resulting bounds are not scale invariant. Replacing the pair (A, B) with (τA, B) will give essentially different bounds, and these bounds change nonlinearly with τ. For example, our theorems will not reduce to the usual perturbation theorems when B = I; to recover them we must let τ → 0.
However, the situation is not entirely bleak. For nicely scaled problems the perturbation bounds we derive may be quite satisfactory. Moreover, some of our theorems will give bounds on perturbations of the components of (α, β), bounds that the analyst may use in any way he sees fit.
When it comes to general matrix pairs, the situation is even less satisfactory. The metric $\rho_L$ defined above has all the drawbacks discussed above. Moreover, it is asymmetric; we obtain essentially different theorems for the pairs (A, B) and $(A^H, B^H)$. However, when A and B are Hermitian this last objection does not apply.
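Definition 1.23 translates directly into a computation. The NumPy sketch below is ours. It takes orthonormal bases of the two row spaces from QR factorizations of $(A\ B)^H$ and $(C\ D)^H$, and it uses the standard fact that, for subspaces of equal dimension with orthonormal bases U and V, the sine of the largest canonical angle is $\|(I - UU^H)V\|_2$. It assumes that both $n \times 2n$ matrices have full row rank, which is the case for regular pairs.

```python
import numpy as np

def rho_L(A, B, C, D):
    """rho_L[(A, B), (C, D)] of Definition 1.23: sin of the largest canonical
    angle between the row spaces of (A B) and (C D).

    Assumes both n x 2n matrices have full row rank n, which holds for
    regular pairs.
    """
    # Orthonormal bases for the row spaces (column spaces of the conjugate transposes).
    U, _ = np.linalg.qr(np.hstack([A, B]).conj().T)
    V, _ = np.linalg.qr(np.hstack([C, D]).conj().T)
    # For subspaces of equal dimension, sin(theta_max) = ||(I - U U^H) V||_2.
    return np.linalg.norm(V - U @ (U.conj().T @ V), 2)

rng = np.random.default_rng(4)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
W = rng.standard_normal((n, n)) + 3 * np.eye(n)   # nonsingular (almost surely)
print(rho_L(A, B, W @ A, W @ B))   # left-equivalent pairs: rounding-level value
E = 1e-3 * rng.standard_normal((n, n))
d = rho_L(A, B, A + E, B)
print(d)                           # small but nonzero
```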
Notes and References
Matrix pairs arise naturally in the study of systems of ordinary differential equations of the form
$$A\,\frac{dx}{dt} = Bx,$$
where the simultaneous diagonalization of A and B by an equivalence represents a transformation which uncouples the system. Generalizations to higher order systems lead to λ-MATRICES of the form $A_0 + A_1\lambda + \cdots + A_k\lambda^k$, for treatments of which see [81, 86].
Weierstrass [263, 1867] established his canonical form (Theorem 1.13) by working with a pair of bilinear forms, as was customary at the time. Jordan [125, 1874] gave another proof, which included singular pencils. Later Kronecker [139, 1890] extended these results to rectangular pencils (for details see [81]). For modern computational treatments see [60, 272].
The generalized Schur form of Theorem 1.9 is due to Stewart [201, 1972], as is the condition for the generalized Sylvester equations to be nonsingular (Theorem 1.11).
Definite pairs in which one or the other of the components is positive definite constitute the majority of applications. It is not generally appreciated that Fischer [74, 1905] proved his min-max characterization (Corollary 1.16) for such pairs, not simply for eigenvalues of Hermitian matrices.
Theorem 1.18, which characterizes definite pairs, is one of a number of interrelated theorems whose history and interconnections have been admirably surveyed by Uhlig [243]. The particular theorem given here is due to Crawford [48, 1976]. It should be noted that in the definition (1.8) of γ(A, B) the minimum is taken over all vectors $x \in \mathbb{C}^n$. It might be hoped that for symmetric pairs one could let x range over $\mathbb{R}^n$. Although one can when n > 2, one cannot when n = 2, as the pair of Example 1.14 shows.
The chordal metric was first used in the perturbation theory for matrix pairs by Stewart [204, 1975]. The metric $\rho_D$ was introduced by Sun [222, 1982], as were the metrics $\rho_L$ and $\rho_R$ [220, 1979] (see also [67]).
Exercises
1. Show that if A and B are nonsingular and (λ) is an eigenvalue of (A, B), then $(\lambda^{-1})$ is an eigenvalue of $(A^{-1}, B^{-1})$.
2. Let A = I and
B=(I;£ 1£)'
where ε is small. Then the pair (A, B) has eigenvalues $\lambda_1 \cong (1, 1 + \tfrac12\epsilon^2)$ and $\lambda_2 \cong (1, -\tfrac12\epsilon^2)$. Show that $\lambda_1$ is insensitive to perturbations in B, but is an ill-conditioned eigenvalue of $B^{-1}A$.
3. (Moler and Stewart [161]). Let (A, B) be a real, regular matrix pair. Show that (A, B) is orthogonally equivalent to (S, T), where T is upper triangular and S is block upper triangular with 1 × 1 or 2 × 2 blocks on its diagonal.
4. Show that the Weierstrass canonical form (1.7) is essentially unique.
5. Let A and B be positive definite. Show that if $x^H A x \le x^H B x$ for all $x \ne 0$, then $x^H A^{-1} x \ge x^H B^{-1} x$.
6. Verify that the chordal metric may be defined by (1.10).
7. Show that
$$|\lambda|, |\mu| \le 1 \implies \chi((\lambda), (\mu)) \le |\lambda - \mu| \le 2\chi((\lambda), (\mu)).$$
8. Let (A, B) be as in Example 1.14. Show that
$$\min_{x \in \mathbb{R}^n,\ \|x\|_2 = 1} \sqrt{(x^T A x)^2 + (x^T B x)^2} = 1,$$
even though (A, B) is not definite.
2. Regular Matrix Pairs
In this section we will treat the perturbation of the eigenvalues of regu-
lar matrix pairs. We begin with first order perturbation theory, which
exhibits the typical behavior of a simple generalized eigenvalue. We
then turn to a generalization of the Gerschgorin theorem, which is the
most useful tool for bounding perturbations of generalized eigenvalues.
Next comes a generalization of Theorem IV.3.3, which bounds the spectral variation in terms of the condition of the eigenvalues. Finally, we
develop the perturbation theory of eigenspaces.
Throughout this section, (A, B) will be a regular matrix pair of order n and
$$(\tilde A, \tilde B) = (A + E, B + F)$$
will be a perturbation of (A, B).
2.1. Continuity, First Order Theory
The first thing that must be established is the continuity of the eigenvalues of matrix pairs. We will use Theorem 1.6 to reduce the continuity of the generalized eigenvalues to that of the ordinary eigenvalue problem. Here it is not critical how we measure the size of the perturbation in $(\tilde A, \tilde B)$, and to fix on a single measure we will set
$$\epsilon = \sqrt{\|E\|^2 + \|F\|^2}.$$
Theorem 2.1. Let (A, B) be a regular pair, and let its eigenvalues be $(\lambda_1), \ldots, (\lambda_n)$. Then there is an ordering $(\tilde\lambda_1), \ldots, (\tilde\lambda_n)$ of the eigenvalues of $(\tilde A, \tilde B)$ such that
$$\lim_{\epsilon \to 0} \chi((\tilde\lambda_i), (\lambda_i)) = 0, \qquad i = 1, \ldots, n.$$
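Proof. By Theorem 1.6 there is a 2 × 2 matrix W such that the matrix D in the pair $(C\ D) = (A\ B)(W \otimes I)$ is nonsingular. Let $\mu_1, \ldots, \mu_n$ be the eigenvalues of $D^{-1}C$. Let $(\tilde C\ \tilde D) = (\tilde A\ \tilde B)(W \otimes I)$. For ε sufficiently small, $\tilde D$ is nonsingular. By the continuity of the ordinary eigenvalue problem, we know there is an ordering of the eigenvalues $\tilde\mu_1, \ldots, \tilde\mu_n$ of $\tilde D^{-1}\tilde C$ such that $\lim_{\epsilon \to 0} \tilde\mu_i = \mu_i$. Let $(\beta_i\ \ {-\alpha_i}) = (\mu_i\ \ {-1})W^T$. Then by Theorem 1.6, $(\lambda_i) = (\alpha_i, \beta_i)$ is an eigenvalue of (A, B). If we set $(\tilde\beta_i\ \ {-\tilde\alpha_i}) = (\tilde\mu_i\ \ {-1})W^T$ and $(\tilde\lambda_i) = (\tilde\alpha_i, \tilde\beta_i)$, then $(\tilde\lambda_i)$ is an eigenvalue of $(\tilde A, \tilde B)$. Since $(\tilde\alpha_i, \tilde\beta_i)$ converges to $(\alpha_i, \beta_i)$, it follows that $(\tilde\lambda_i)$ converges to $(\lambda_i)$ in the chordal metric. ■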
Throughout this book, we have presented first order perturbation
expansions whenever they exist. Although these expansions are usually
corollaries of more general results, in many cases the general results
themselves were conjectured by looking at first order expansions. The
reason is that the expansions often tell ninety percent of the story and
yet are free of the clutter that accompanies rigorous upper bounds.
Since research into the perturbation of matrix pairs is still in a state of
flux, it is appropriate to begin with first order perturbation theory.
Let (α, β) be a simple eigenvalue of (A, B). In order to derive a first order perturbation expansion, we must first show that one exists. In one sense this is trivial. By Theorem 1.6, we may assume that B is nonsingular. Hence for ε sufficiently small $\tilde B$ is nonsingular, and to the eigenvalue λ = α/β of $B^{-1}A$ there corresponds an eigenvalue $\tilde\lambda = \lambda + O(\epsilon)$ of $\tilde B^{-1}\tilde A$, which is differentiable in the elements of $\tilde A$ and $\tilde B$. It follows that $(\tilde\lambda)$ is the required expansion.
However, when we look at the individual components of (α, β) we find that their perturbations are not unique. For if $(\tilde\alpha, \tilde\beta)$ is an O(ε) perturbation of (α, β) in the chordal metric and if φ(ε) = O(ε), then $(\tilde\alpha + \alpha\varphi(\epsilon), \tilde\beta + \beta\varphi(\epsilon))$ differs from $(\tilde\alpha, \tilde\beta)$ by $O(\epsilon^2)$ in the chordal metric. This follows directly from the formula
$$\chi\big((\tilde\alpha, \tilde\beta), (\tilde\alpha + \alpha\varphi(\epsilon), \tilde\beta + \beta\varphi(\epsilon))\big) = \frac{|\varphi(\epsilon)(\tilde\alpha\beta - \tilde\beta\alpha)|}{\sqrt{|\tilde\alpha|^2 + |\tilde\beta|^2}\,\sqrt{|\tilde\alpha + \alpha\varphi(\epsilon)|^2 + |\tilde\beta + \beta\varphi(\epsilon)|^2}},$$
in which the numerator is $O(\epsilon^2)$. Fortunately, Corollary 1.10 provides a canonical choice for $(\tilde\alpha, \tilde\beta)$.
Theorem 2.2. Let (α, β) be a simple eigenvalue of the regular pair (A, B) with right and left eigenvectors x and y. Let $(\tilde\alpha, \tilde\beta)$ be the corresponding eigenvalue of the O(ε) perturbation $(\tilde A, \tilde B)$. Then
$$(\tilde\alpha, \tilde\beta) = (y^H \tilde A x, y^H \tilde B x) + O(\epsilon^2). \qquad (2.1)$$
Proof. Applying the perturbation theory for the ordinary eigenvalue problem first to $B^{-1}A$ and then to $AB^{-1}$ (after a transformation, if necessary, to make B nonsingular), we find that we may take for the eigenvectors corresponding to $(\tilde\alpha, \tilde\beta)$ the vectors $\tilde x = x + u$ and $\tilde y = y + v$, where u, v = O(ε). By Corollary 1.10,
$$\begin{aligned}(\tilde\alpha, \tilde\beta) &= (\tilde y^H \tilde A \tilde x,\ \tilde y^H \tilde B \tilde x)\\ &= (y^H \tilde A x + y^H A u + v^H A x + O(\epsilon^2),\ y^H \tilde B x + y^H B u + v^H B x + O(\epsilon^2)).\end{aligned}$$
Since (A, B) is regular, at least one of α or β must be nonzero, say β ≠ 0. Then, since $y^H A = (\alpha/\beta)\,y^H B$ and $Ax = (\alpha/\beta)\,Bx$,
$$y^H A u + v^H A x = \alpha\,\frac{y^H B u + v^H B x}{\beta}$$
and
$$y^H B u + v^H B x = \beta\,\frac{y^H B u + v^H B x}{\beta}.$$
Thus $(y^H A u + v^H A x,\ y^H B u + v^H B x)$ is an order ε perturbation of $(y^H \tilde A x, y^H \tilde B x)$ that lies along (α, β). By the observation made just before the theorem, deleting these terms introduces an $O(\epsilon^2)$ error. ■
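Theorem 2.2 can be checked numerically by comparing $(y^H \tilde A x, y^H \tilde B x)$ with the exact perturbed eigenvalue for two perturbation sizes. If the error is $O(\epsilon^2)$, reducing ε by a factor of 10 should reduce the chordal error by about a factor of 100. The NumPy sketch below is ours. It obtains the right eigenvectors from $B^{-1}A = X\Lambda X^{-1}$, and it takes left eigenvectors from the rows of $X^{-1}B^{-1}$, which makes $(y_i^H A x_i, y_i^H B x_i) = (\lambda_i, 1)$. The test matrices are arbitrary, well-separated choices.

```python
import numpy as np

def chordal(a, b, c, d):
    """Chordal distance between (a, b) and (c, d)."""
    return abs(a * d - b * c) / (np.hypot(abs(a), abs(b)) * np.hypot(abs(c), abs(d)))

rng = np.random.default_rng(2)
n = 3
A = np.diag([1.0, 2.0, 3.0]) + 0.1 * rng.standard_normal((n, n))
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))
E0, F0 = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# Right eigenvectors from B^{-1}A = X diag(lam) X^{-1}; the rows of
# Y^H = X^{-1} B^{-1} are left eigenvectors with y_i^H B x_i = 1.
lam, X = np.linalg.eig(np.linalg.solve(B, A))
YH = np.linalg.solve(X, np.linalg.inv(B))

def first_order_error(eps, i=0):
    """Chordal distance between (y^H A~ x, y^H B~ x) and the exact perturbed eigenvalue."""
    At, Bt = A + eps * E0, B + eps * F0
    lam_t = np.linalg.eigvals(np.linalg.solve(Bt, At))
    exact = lam_t[np.argmin(np.abs(lam_t - lam[i]))]   # eigenvalue that moved from lam[i]
    x, yh = X[:, i], YH[i, :]
    return chordal(yh @ At @ x, yh @ Bt @ x, exact, 1.0)

errs = [first_order_error(eps) for eps in (1e-3, 1e-4)]
print(errs[0] / errs[1])   # expected to be close to 100 (error scales like eps**2)
```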
From this theorem we may derive approximate error bounds for the perturbation of a simple eigenvalue. Specifically, if we set $\alpha = y^H A x$ and $\beta = y^H B x$, then
$$\chi((\alpha, \beta), (\tilde\alpha, \tilde\beta)) \cong \frac{|\alpha y^H F x - \beta y^H E x|}{\sqrt{|\alpha|^2 + |\beta|^2}\,\sqrt{|\alpha + y^H E x|^2 + |\beta + y^H F x|^2}} \cong \frac{|\alpha y^H F x - \beta y^H E x|}{|\alpha|^2 + |\beta|^2}.$$
To turn this approximation into a bound, note that
$$|\alpha y^H F x - \beta y^H E x| = \left| y^H (E\ F) \begin{pmatrix} -\beta x \\ \alpha x \end{pmatrix} \right| \le \sqrt{|\alpha|^2 + |\beta|^2}\,\|x\|_2\,\|y\|_2\,\|(E\ F)\|_2.$$
Hence if we set
$$\nu = \frac{\|x\|_2\,\|y\|_2}{\sqrt{|\alpha|^2 + |\beta|^2}}, \qquad (2.2)$$
then
$$\chi((\alpha, \beta), (\tilde\alpha, \tilde\beta)) \lesssim \nu\,\|(E\ F)\|_2. \qquad (2.3)$$
The number ν defined by (2.2) is completely analogous to the number ν defined by (IV.2.8) for the ordinary eigenvalue problem; that is, it serves the role of a condition number for its eigenvalue. Unfortunately we cannot obtain the usual bound for the ordinary eigenvalue problem $Ax = \lambda x$ by replacing B by I and F by 0. However, if we replace A by τA, $\tilde A$ by $\tau\tilde A$, and E by τE, then the bound (2.3) becomes
$$\chi((\tau\tilde\lambda), (\tau\lambda)) \lesssim \nu_\tau\,\|\tau E\|_2,$$
where
$$\nu_\tau = \frac{\|x\|_2\,\|y\|_2}{\sqrt{|\tau y^H A x|^2 + |y^H x|^2}}.$$
But as τ → 0+, we have $\chi((\tau\tilde\lambda), (\tau\lambda)) \cong \tau|\tilde\lambda - \lambda|$. Moreover, the condition number $\nu_\tau$ approaches $\|x\|_2\|y\|_2/|y^H x|$. Hence we have
$$|\tilde\lambda - \lambda| \lesssim \frac{\|x\|_2\,\|y\|_2}{|y^H x|}\,\|E\|_2,$$
which is the usual bound.
The approximate bound (2.3) is about as good as we will see for the generalized eigenvalue problem. However, the trouble we had to take to retrieve the Rayleigh quotient bound is a reminder that it suffers from the limitations noted at the end of the last section. When in doubt, one should return to explicit forms, like the approximation (2.1) provided by Theorem 2.2.
2.2. Gerschgorin Theory
In this subsection we will generalize Gerschgorin's theorem and apply it to the perturbation of multiple eigenvalues. As in Section IV.2, we approach the theorem through a generalization of the Bauer-Fike theorem.
Theorem 2.3. Let (A, B) and $(\tilde A, \tilde B)$ be regular pairs, and let ‖·‖ be a consistent matrix norm. If $(\tilde\alpha, \tilde\beta) \in \mathcal{L}[(\tilde A, \tilde B)]$ is not an eigenvalue of (A, B), then
$$\|(\tilde\beta A - \tilde\alpha B)^{-1}(\tilde\beta E - \tilde\alpha F)\| \ge 1. \qquad (2.4)$$
Proof. Since $(\tilde\alpha, \tilde\beta) \notin \mathcal{L}[(A, B)]$, the matrix $\tilde\beta A - \tilde\alpha B$ is nonsingular. Let $\tilde x$ be an eigenvector of $(\tilde A, \tilde B)$ corresponding to $(\tilde\alpha, \tilde\beta)$. Since
$$0 = (\tilde\beta\tilde A - \tilde\alpha\tilde B)\tilde x = (\tilde\beta A - \tilde\alpha B)\tilde x + (\tilde\beta E - \tilde\alpha F)\tilde x,$$
we have
$$(\tilde\beta A - \tilde\alpha B)^{-1}(\tilde\beta E - \tilde\alpha F)\tilde x = -\tilde x,$$
from which the theorem follows on taking norms. ■
We may now state and prove the generalization of the Gerschgorin theorem.
Theorem 2.4. Let (A, B) be a regular pair with elements $\alpha_{ij}$ and $\beta_{ij}$. Let
$$\mathcal{D}_i = \Big\{(\alpha, \beta) : |\beta\alpha_{ii} - \alpha\beta_{ii}| \le \sum_{j \ne i} |\beta\alpha_{ij} - \alpha\beta_{ij}|\Big\}, \qquad i = 1, \ldots, n.$$
Then
$$\mathcal{L}[(A, B)] \subset \bigcup_{i=1}^n \mathcal{D}_i.$$
Moreover, if the union of k of the regions $\mathcal{D}_i$ is disjoint from the others and is not equal to the space of all (α, β), then the union contains exactly k eigenvalues of (A, B).
Proof. Let $D_A = \operatorname{diag}(\alpha_{11}, \ldots, \alpha_{nn})$ and $D_B = \operatorname{diag}(\beta_{11}, \ldots, \beta_{nn})$. In Theorem 2.3 make the substitutions
$$A \leftarrow D_A, \quad \tilde A \leftarrow A, \quad B \leftarrow D_B, \quad \tilde B \leftarrow B,$$
and $\|\cdot\| \leftarrow \|\cdot\|_\infty$. Then it is easily verified that the inequality (2.4) is equivalent to saying that each eigenvalue of (A, B) is in some $\mathcal{D}_i$.
The statement about isolated regions follows from the continuity of the eigenvalues as in the ordinary Gerschgorin theorem: if we introduce the pairs
$$(A_\tau, B_\tau) = [D_A + \tau(A - D_A),\ D_B + \tau(B - D_B)],$$
then the corresponding regions $\mathcal{D}_i^{(\tau)}$ increase with τ. The only tricky point is to ensure that the pair $(A_\tau, B_\tau)$ is regular for $0 \le \tau \le 1$.
We argue as follows. Assume without loss of generality that the isolated regions are $\mathcal{D}_1^{(\tau)}, \ldots, \mathcal{D}_k^{(\tau)}$. Then $\bigcup_{i=1}^k \mathcal{D}_i^{(\tau)}$ and $\bigcup_{i=k+1}^n \mathcal{D}_i^{(\tau)}$ are disjoint closed sets. Since the space of all (α, β) is connected, there must be a point $(\alpha, \beta) \notin \bigcup_{i=1}^k \mathcal{D}_i^{(\tau)} \cup \bigcup_{i=k+1}^n \mathcal{D}_i^{(\tau)}$. Then (α, β) is not an eigenvalue of $(A_\tau, B_\tau)$, which is therefore regular. ■
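Membership in the regions $\mathcal{D}_i$ of Theorem 2.4 is straightforward to test for a given (α, β). The NumPy sketch below is ours and is illustrative only. It returns the indices of the regions that contain (α, β), and then it checks on a small random pair that every computed eigenvalue, represented as (λ, 1), lies in at least one region, as the theorem guarantees.

```python
import numpy as np

def gerschgorin_regions_containing(A, B, alpha, beta):
    """Indices i with (alpha, beta) in the region D_i of Theorem 2.4, i.e.
    |beta*a_ii - alpha*b_ii| <= sum_{j != i} |beta*a_ij - alpha*b_ij|."""
    M = np.abs(beta * A - alpha * B)      # M[i, j] = |beta*a_ij - alpha*b_ij|
    diag = np.diag(M)
    off = M.sum(axis=1) - diag
    return np.flatnonzero(diag <= off)

rng = np.random.default_rng(3)
A = np.diag([1.0, 4.0, 9.0]) + 0.3 * rng.standard_normal((3, 3))
B = np.diag([1.0, 1.5, 2.0]) + 0.2 * rng.standard_normal((3, 3))
# Every eigenvalue (lam, 1) of the pair must lie in at least one region.
for lam in np.linalg.eigvals(np.linalg.solve(B, A)):
    print(lam, gerschgorin_regions_containing(A, B, lam, 1.0))
```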
The first comment to be made about this theorem is that it becomes uninteresting when some $(\alpha_{ii}, \beta_{ii}) = (0, 0)$, since in this case $\mathcal{D}_i$ includes all pairs (α, β). In the sequel we will tacitly exclude this case (see Exercise 2.1).
The regions $\mathcal{D}_i$ are difficult to compute, since (α, β) appears on both sides of the bound. However, by expanding the regions, we may remove this dependence.
Corollary 2.5. Let
$$a_i = (\alpha_{i1}, \ldots, \alpha_{i,i-1}, 0, \alpha_{i,i+1}, \ldots, \alpha_{in})^T$$
and
$$b_i = (\beta_{i1}, \ldots, \beta_{i,i-1}, 0, \beta_{i,i+1}, \ldots, \beta_{in})^T$$
be the rows of $A - D_A$ and $B - D_B$. Let
$$\rho_i = \frac{\sqrt{\|a_i\|_1^2 + \|b_i\|_1^2}}{\sqrt{|\alpha_{ii}|^2 + |\beta_{ii}|^2}},$$
and let
$$\mathcal{G}_i = \{(\alpha, \beta) : \chi((\alpha, \beta), (\alpha_{ii}, \beta_{ii})) \le \rho_i\}.$$
Then
$$\mathcal{D}_i \subset \mathcal{G}_i, \qquad i = 1, \ldots, n.$$
Proof. We have
$$\sum_{j \ne i} |\beta\alpha_{ij} - \alpha\beta_{ij}| = \|\beta a_i - \alpha b_i\|_1 \le (|\beta|\ \ |\alpha|)\begin{pmatrix} \|a_i\|_1 \\ \|b_i\|_1 \end{pmatrix} \le \sqrt{|\alpha|^2 + |\beta|^2}\,\sqrt{\|a_i\|_1^2 + \|b_i\|_1^2},$$
the last inequality following from the Cauchy inequality. Thus the inequality
$$|\beta\alpha_{ii} - \alpha\beta_{ii}| \le \sum_{j \ne i} |\beta\alpha_{ij} - \alpha\beta_{ij}|$$
implies the inequality
$$|\beta\alpha_{ii} - \alpha\beta_{ii}| \le \sqrt{|\alpha|^2 + |\beta|^2}\,\sqrt{\|a_i\|_1^2 + \|b_i\|_1^2}.$$
The corollary now follows on dividing by $\sqrt{|\alpha_{ii}|^2 + |\beta_{ii}|^2}\,\sqrt{|\alpha|^2 + |\beta|^2}$. ■
This corollary is actually our principal Gerschgorin theorem. It should be noted that by the same kind of limit argument we used in the last subsection, we can recover the usual Gerschgorin theorem for the ordinary eigenvalue problem.
The technique of diagonal similarities, which we used so successfully in Section IV.2, can be applied to the generalization of Gerschgorin's theorem. To vary the application, we will show how to apply the technique to a multiple eigenvalue of a diagonalizable pair. Since we have already shown in Section IV.2 how to take into account terms of order higher than the first, we will not bound their contribution here.
Let (A, B) be a diagonalizable pair; that is, suppose there exist nonsingular matrices $X = (x_1 \cdots x_n)$ and $Y = (y_1 \cdots y_n)$ such that
$$(Y^H A X, Y^H B X) = [\operatorname{diag}(\alpha_1, \ldots, \alpha_n), \operatorname{diag}(\beta_1, \ldots, \beta_n)].$$
Assume that $(\alpha_1, \beta_1) = \cdots = (\alpha_k, \beta_k)$, and that these eigenvalues are distinct from the others, so that
$$\delta = \min_{k < l \le n} \chi((\alpha_1, \beta_1), (\alpha_l, \beta_l)) \ne 0.$$
Set
$$\nu_i = \frac{\|x_i\|_2\,\|y_i\|_2}{\sqrt{|\alpha_i|^2 + |\beta_i|^2}},$$
and let
$$\nu = \max_{1 \le i \le k} \nu_i \quad\text{and}\quad \nu' = \max_{k < i \le n} \nu_i.$$
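Now consider the pair $(\tilde A, \tilde B) = (A + E, B + F)$. Write (for n = 6)
$$Y^H(A + E)X = \begin{pmatrix}
\alpha_{11}+\gamma_{11} & \gamma_{12} & \gamma_{13} & \gamma_{14} & \gamma_{15} & \gamma_{16}\\
\gamma_{21} & \alpha_{22}+\gamma_{22} & \gamma_{23} & \gamma_{24} & \gamma_{25} & \gamma_{26}\\
\gamma_{31} & \gamma_{32} & \alpha_{33}+\gamma_{33} & \gamma_{34} & \gamma_{35} & \gamma_{36}\\
\gamma_{41} & \gamma_{42} & \gamma_{43} & \alpha_{44}+\gamma_{44} & \gamma_{45} & \gamma_{46}\\
\gamma_{51} & \gamma_{52} & \gamma_{53} & \gamma_{54} & \alpha_{55}+\gamma_{55} & \gamma_{56}\\
\gamma_{61} & \gamma_{62} & \gamma_{63} & \gamma_{64} & \gamma_{65} & \alpha_{66}+\gamma_{66}
\end{pmatrix}$$
and
$$Y^H(B + F)X = \begin{pmatrix}
\beta_{11}+\eta_{11} & \eta_{12} & \eta_{13} & \eta_{14} & \eta_{15} & \eta_{16}\\
\eta_{21} & \beta_{22}+\eta_{22} & \eta_{23} & \eta_{24} & \eta_{25} & \eta_{26}\\
\eta_{31} & \eta_{32} & \beta_{33}+\eta_{33} & \eta_{34} & \eta_{35} & \eta_{36}\\
\eta_{41} & \eta_{42} & \eta_{43} & \beta_{44}+\eta_{44} & \eta_{45} & \eta_{46}\\
\eta_{51} & \eta_{52} & \eta_{53} & \eta_{54} & \beta_{55}+\eta_{55} & \eta_{56}\\
\eta_{61} & \eta_{62} & \eta_{63} & \eta_{64} & \eta_{65} & \beta_{66}+\eta_{66}
\end{pmatrix}.$$
Let $\epsilon = \max\{\|E\|_2, \|F\|_2\}$, so that
$$|\gamma_{ij}|,\ |\eta_{ij}| \le \epsilon\,\|x_j\|_2\,\|y_i\|_2.$$
For definiteness let us suppose k = 3 and n = 6. Let
$$T = \operatorname{diag}(\tau, \tau, \tau, 1, 1, 1).$$
Then
$$TY^H(A + E)XT^{-1} = \begin{pmatrix}
\alpha_{11}+\gamma_{11} & \gamma_{12} & \gamma_{13} & \tau\gamma_{14} & \tau\gamma_{15} & \tau\gamma_{16}\\
\gamma_{21} & \alpha_{22}+\gamma_{22} & \gamma_{23} & \tau\gamma_{24} & \tau\gamma_{25} & \tau\gamma_{26}\\
\gamma_{31} & \gamma_{32} & \alpha_{33}+\gamma_{33} & \tau\gamma_{34} & \tau\gamma_{35} & \tau\gamma_{36}\\
\tau^{-1}\gamma_{41} & \tau^{-1}\gamma_{42} & \tau^{-1}\gamma_{43} & \alpha_{44}+\gamma_{44} & \gamma_{45} & \gamma_{46}\\
\tau^{-1}\gamma_{51} & \tau^{-1}\gamma_{52} & \tau^{-1}\gamma_{53} & \gamma_{54} & \alpha_{55}+\gamma_{55} & \gamma_{56}\\
\tau^{-1}\gamma_{61} & \tau^{-1}\gamma_{62} & \tau^{-1}\gamma_{63} & \gamma_{64} & \gamma_{65} & \alpha_{66}+\gamma_{66}
\end{pmatrix}$$
and
$$TY^H(B + F)XT^{-1} = \begin{pmatrix}
\beta_{11}+\eta_{11} & \eta_{12} & \eta_{13} & \tau\eta_{14} & \tau\eta_{15} & \tau\eta_{16}\\
\eta_{21} & \beta_{22}+\eta_{22} & \eta_{23} & \tau\eta_{24} & \tau\eta_{25} & \tau\eta_{26}\\
\eta_{31} & \eta_{32} & \beta_{33}+\eta_{33} & \tau\eta_{34} & \tau\eta_{35} & \tau\eta_{36}\\
\tau^{-1}\eta_{41} & \tau^{-1}\eta_{42} & \tau^{-1}\eta_{43} & \beta_{44}+\eta_{44} & \eta_{45} & \eta_{46}\\
\tau^{-1}\eta_{51} & \tau^{-1}\eta_{52} & \tau^{-1}\eta_{53} & \eta_{54} & \beta_{55}+\eta_{55} & \eta_{56}\\
\tau^{-1}\eta_{61} & \tau^{-1}\eta_{62} & \tau^{-1}\eta_{63} & \eta_{64} & \eta_{65} & \beta_{66}+\eta_{66}
\end{pmatrix}.$$
As ε → 0, the first three of the regions $\mathcal{G}_i$ have radii that approach zero. The last three have radii that are bounded by $\sqrt{2}\,k\nu'\epsilon/\tau$, up to terms of order $\epsilon^2$. Consequently, if we take
$$\tau = \frac{2\sqrt{2}\,k\nu'\epsilon}{\delta},$$
then for ε small enough the first three regions will be disjoint from the last three and hence will contain exactly three eigenvalues. The radius of these regions is bounded by $\sqrt{2}(k-1)\nu\epsilon$ up to terms of order $\epsilon^2$. Since it is easily seen that
$$\chi((\alpha_{ii}, \beta_{ii}), (\alpha_{ii} + \gamma_{ii}, \beta_{ii} + \eta_{ii})) \le \sqrt{2}\,\nu\epsilon + O(\epsilon^2), \qquad i = 1, \ldots, k,$$
we have shown that there are exactly k eigenvalues $(\tilde\alpha_i, \tilde\beta_i)$ (i = 1, ..., k) of $(\tilde A, \tilde B)$ satisfying
$$\chi((\alpha_{11}, \beta_{11}), (\tilde\alpha_i, \tilde\beta_i)) \le \sqrt{2}\,k\nu\epsilon + O(\epsilon^2).$$
There are four points to be made about this result. First, the assumption that the pair (A, B) is diagonalizable is not necessary. What is required is that the multiple eigenvalue have k linearly independent eigenvectors. Other multiple eigenvalues corresponding to nontrivial Jordan blocks can be handled as in Section IV.2.
Second, the above development shows that when k = 1 there is an eigenvalue of $(\tilde A, \tilde B)$ in the region with center $(\alpha + y_1^H E x_1, \beta + y_1^H F x_1)$ and radius $O(\epsilon^2)$. Thus, the Gerschgorin theorem gives an independent proof of Theorem 2.2. However, unlike our first proof, this one offers the possibility of computing the bound.
Third, the number ν is a condition number for the multiple eigenvalue in the sense that the bound on the error in the perturbed eigenvalues is proportional to the error times ν. However, the constant of proportionality grows linearly with the multiplicity of the eigenvalue.
Finally, the theory developed here is a worst-case theory depending on the largest of the numbers $\nu_i$. In practice, each perturbed eigenvalue will tend to have a condition governed by a value of $\nu_i$ particular to itself. For example, the pair
has a double eigenvalue (2, 1). But one of these eigenvalues is very sensitive to perturbations of order 0.1, whereas the other is not. The question of how to make this observation precise is an open research problem.
2.3. Diagonalizable Pairs
In this subsection we will consider the eigenvalues of diagonalizable pairs, and in particular we will generalize Theorem IV.3.3, the well-known corollary of the Bauer-Fike theorem.
Theorem 2.6. Let (A, B) be a regular pair, and suppose that for some nonsingular X and Y we have
$$(Y^H A X, Y^H B X) = (D_A, D_B),$$
where
$$D_A = \operatorname{diag}(\alpha_1, \ldots, \alpha_n), \qquad D_B = \operatorname{diag}(\beta_1, \ldots, \beta_n).$$
Let $(\tilde A, \tilde B)$ be regular. Then for every eigenvalue $(\tilde\alpha, \tilde\beta) \in \mathcal{L}[(\tilde A, \tilde B)]$ there is an eigenvalue $(\alpha, \beta) \in \mathcal{L}[(A, B)]$ that satisfies
$$\chi((\alpha, \beta), (\tilde\alpha, \tilde\beta)) \le \kappa_2(X)\,\rho_L[(A, B), (\tilde A, \tilde B)]. \qquad (2.5)$$
Proof. Since both the eigenvalues of (A, B) and the equivalence class $(A, B)_L$ are invariant when A and B are premultiplied by a nonsingular matrix, we may assume that $U^H = (A\ B)$ and $\tilde U^H = (\tilde A\ \tilde B)$ have orthonormal rows. Moreover, since $(\tilde A, \tilde B)$ is regular, we may assume that $|\tilde\alpha|^2 + |\tilde\beta|^2 = 1$.
Let $\tilde x$ be the right eigenvector corresponding to $(\tilde\alpha, \tilde\beta)$, normalized so that $\|\tilde x\|_2 = 1$. Then, because $\tilde\beta\tilde A\tilde x - \tilde\alpha\tilde B\tilde x = 0$,
$$\begin{aligned}
\tilde\beta A\tilde x - \tilde\alpha B\tilde x &= \tilde\beta(A - U^H\tilde U\tilde A)\tilde x - \tilde\alpha(B - U^H\tilde U\tilde B)\tilde x\\
&= (A - U^H\tilde U\tilde A\ \ \ B - U^H\tilde U\tilde B)\begin{pmatrix}\tilde\beta\tilde x\\ -\tilde\alpha\tilde x\end{pmatrix}\\
&= (U^H - U^H\tilde U\tilde U^H)\begin{pmatrix}\tilde\beta\tilde x\\ -\tilde\alpha\tilde x\end{pmatrix}\\
&= U^H(UU^H - \tilde U\tilde U^H)\begin{pmatrix}\tilde\beta\tilde x\\ -\tilde\alpha\tilde x\end{pmatrix}\\
&= U^H(P_U - P_{\tilde U})\begin{pmatrix}\tilde\beta\tilde x\\ -\tilde\alpha\tilde x\end{pmatrix}.
\end{aligned}$$
By Theorem I.5.5 the singular values of $P_U - P_{\tilde U}$ are the sines of the canonical angles between the column spaces of U and $\tilde U$. Hence by Definition 1.23,
$$\|\tilde\beta A\tilde x - \tilde\alpha B\tilde x\|_2 \le \rho_L[(A, B), (\tilde A, \tilde B)]. \qquad (2.6)$$
Let $P^H = X^{-1}$ and $Q^H = Y^{-1}$. Then $(A, B) = (QD_AP^H, QD_BP^H)$. Now
$$I = U^HU = Q(D_AP^HPD_A^H + D_BP^HPD_B^H)Q^H.$$
Hence for any w,
$$w^H(Q^HQ)^{-1}w = w^H(D_AP^HPD_A^H + D_BP^HPD_B^H)w \le \|P^HP\|_2\,w^H(|D_A|^2 + |D_B|^2)w,$$
and therefore (Exercise 1.5)
$$w^H(Q^HQ)w \ge \|P^HP\|_2^{-1}\,w^H(|D_A|^2 + |D_B|^2)^{-1}w.$$
Thus
$$\begin{aligned}
\|\tilde\beta A\tilde x - \tilde\alpha B\tilde x\|_2 &= \|Q(\tilde\beta D_A - \tilde\alpha D_B)P^H\tilde x\|_2\\
&= [\tilde x^HP(\tilde\beta D_A - \tilde\alpha D_B)^HQ^HQ(\tilde\beta D_A - \tilde\alpha D_B)P^H\tilde x]^{1/2}\\
&\ge \|P\|_2^{-1}[\tilde x^HP(\tilde\beta D_A - \tilde\alpha D_B)^H(|D_A|^2 + |D_B|^2)^{-1}(\tilde\beta D_A - \tilde\alpha D_B)P^H\tilde x]^{1/2}\\
&\ge \|P\|_2^{-1}(\tilde x^HPP^H\tilde x)^{1/2}\min_i\frac{|\tilde\beta\alpha_i - \tilde\alpha\beta_i|}{\sqrt{|\alpha_i|^2 + |\beta_i|^2}}\\
&\ge \kappa_2(X)^{-1}\min_i\chi((\alpha_i, \beta_i), (\tilde\alpha, \tilde\beta)).
\end{aligned} \qquad (2.7)$$
The theorem now follows on combining (2.6) and (2.7). ■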
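The bound (2.5) can be checked on a small example. The NumPy sketch below is ours and is not a method from the text. It diagonalizes a random pair through $B^{-1}A = X\Lambda X^{-1}$: with $Y^H = X^{-1}B^{-1}$ this gives $(Y^HAX, Y^HBX) = (\Lambda, I)$. It then compares the chordal spectral variation of a perturbed pair with $\kappa_2(X)\,\rho_L$. The helper `rho_L` repeats the canonical-angle computation of Definition 1.23. The example assumes B and B + F are nonsingular, so that all eigenvalues are finite.

```python
import numpy as np

def chordal_to_nearest(lam_t, lam):
    """Max over perturbed eigenvalues of the chordal distance to the nearest
    unperturbed one (all eigenvalues finite, represented as (lam, 1))."""
    lam_t, lam = np.asarray(lam_t)[:, None], np.asarray(lam)[None, :]
    chi = np.abs(lam_t - lam) / (np.sqrt(1 + np.abs(lam_t) ** 2) * np.sqrt(1 + np.abs(lam) ** 2))
    return chi.min(axis=1).max()

def rho_L(A, B, C, D):
    """sin of the largest canonical angle between the row spaces of (A B) and (C D)."""
    U, _ = np.linalg.qr(np.hstack([A, B]).conj().T)
    V, _ = np.linalg.qr(np.hstack([C, D]).conj().T)
    return np.linalg.norm(V - U @ (U.conj().T @ V), 2)

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 3 * np.eye(n)
lam, X = np.linalg.eig(np.linalg.solve(B, A))   # X diagonalizes the pair (A, B)
E, F = 1e-2 * rng.standard_normal((n, n)), 1e-2 * rng.standard_normal((n, n))
lam_t = np.linalg.eigvals(np.linalg.solve(B + F, A + E))

sv = chordal_to_nearest(lam_t, lam)
bound = np.linalg.cond(X) * rho_L(A, B, A + E, B + F)   # right-hand side of (2.5)
print(sv <= bound)   # True
```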
Recall that we defined the spectral variation $\mathrm{sv}_A(\tilde A)$ of $\tilde A$ with respect to $A$ as the largest distance of an eigenvalue of $\tilde A$ from the nearest eigenvalue of $A$ [see (IV.1.1)]. If we define $\mathrm{sv}_{(A,B)}[(\tilde A,\tilde B)]$ analogously, then the conclusion of Theorem 2.6 can be written
$$\mathrm{sv}_{(A,B)}[(\tilde A,\tilde B)] \le \kappa_2(X)\,d_2[(A,B),(\tilde A,\tilde B)].$$
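The chordal metric and the spectral variation appearing in Theorem 2.6 can be evaluated directly once the eigenvalues are known. The following Python sketch is illustrative only: the helper names `chordal` and `spectral_variation` are hypothetical, eigenvalues are passed in as pairs $(\alpha,\beta)$ rather than computed (no eigensolver is included), and the perturbed eigenvalues in the usage line are made-up numbers.

```python
import math

def chordal(p, q):
    """Chordal distance between two eigenvalues written as pairs.

    p = (alpha, beta) and q = (gamma, delta) may be real or complex; the value
    |alpha*delta - beta*gamma| / (||(alpha, beta)||_2 * ||(gamma, delta)||_2)
    does not change if either pair is multiplied by a nonzero scalar.
    """
    a, b = p
    c, d = q
    num = abs(a * d - b * c)
    den = math.hypot(abs(a), abs(b)) * math.hypot(abs(c), abs(d))
    return num / den

def spectral_variation(eigs, eigs_tilde):
    """Largest chordal distance from an eigenvalue of the perturbed pair
    (eigs_tilde) to the nearest eigenvalue of the original pair (eigs)."""
    return max(min(chordal(lam, mu) for lam in eigs) for mu in eigs_tilde)

# (A, B) = (diag(1, 0), I) has eigenvalues (1, 1) and (0, 1); suppose, purely
# for illustration, that a perturbed pair has eigenvalues (1.001, 1) and (0, 1).
sv = spectral_variation([(1, 1), (0, 1)], [(1.001, 1), (0, 1)])
```

Because the metric is defined on pairs, infinite eigenvalues such as $(1,0)$ need no special treatment.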
Although the bound (2.5) is satisfactory in many ways, it is difficult to use when we only know bounds on the perturbations $E$ and $F$. One approach is to use Theorem III.4.1 to bound $d_2[(A,B),(\tilde A,\tilde B)]$. However, another approach is to adapt the proof of the theorem to give a direct bound.
Theorem 2.7. In addition to the hypotheses of Theorem 2.6, suppose that the columns of $X$ and $Y$ are normalized so that $|D_A|^2 + |D_B|^2 = I$. Then
$$\mathrm{sv}_{(A,B)}[(\tilde A,\tilde B)] \le \|X\|_2\|Y\|_2\|(E\ F)\|_2. \qquad (2.8)$$
2. REGULAR MATRIX PAIRS
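Proof. Consider the equivalent pairs $(D_A, D_B)$ and $(D_A + Y^HEX,\ D_B + Y^HFX)$. Let $(\tilde\alpha,\tilde\beta)$ be an eigenvalue of the second pair, normalized so that $|\tilde\alpha|^2 + |\tilde\beta|^2 = 1$, and let $x$ be a corresponding eigenvector with $\|x\|_2 = 1$. We have
$$\tilde\beta D_Ax - \tilde\alpha D_Bx = -(\tilde\beta Y^HEXx - \tilde\alpha Y^HFXx) = -Y^H(E\ F)\begin{pmatrix}\tilde\beta Xx\\ -\tilde\alpha Xx\end{pmatrix}.$$
Hence
$$\|\tilde\beta D_Ax - \tilde\alpha D_Bx\|_2 \le \|X\|_2\|Y\|_2\|(E\ F)\|_2.$$
Since $D_A$ and $D_B$ are diagonal and $|D_A|^2 + |D_B|^2 = I$, it is trivial to verify that
$$\|\tilde\beta D_Ax - \tilde\alpha D_Bx\|_2 \ge \min_i\chi((\alpha_i,\beta_i),(\tilde\alpha,\tilde\beta)).$$
∎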
If in the above theorem we assume that $\|y_i\|_2 = 1$, then $\|x_i\|_2$ is the condition number $\nu_i$. Moreover, $\|Y\|_2 \le \sqrt n$ and $\|X\|_2 \le \sqrt n\max_i\nu_i$. This gives the following corollary.
Corollary 2.8. Let
$$\nu_i = \frac{\|x_i\|_2\|y_i\|_2}{\sqrt{|\alpha_i|^2 + |\beta_i|^2}}.$$
Then
$$\mathrm{sv}_{(A,B)}[(\tilde A,\tilde B)] \le n\max_i\nu_i\,\|(E\ F)\|_2.$$
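A minimal sketch of the condition numbers $\nu_i$ and the right-hand side of Corollary 2.8 follows. It assumes the right and left eigenvectors $x_i$, $y_i$ and the pairs $(\alpha_i,\beta_i)$ are already available; the function names, the diagonal example, and the value $\|(E\ F)\|_2 = 10^{-3}$ are hypothetical.

```python
import math

def vnorm(v):
    """Euclidean norm of a real or complex vector given as a list."""
    return math.sqrt(sum(abs(t) ** 2 for t in v))

def eig_condition(x, y, alpha, beta):
    """nu_i = ||x_i||_2 ||y_i||_2 / sqrt(|alpha_i|^2 + |beta_i|^2)."""
    return vnorm(x) * vnorm(y) / math.hypot(abs(alpha), abs(beta))

def corollary_2_8_bound(nus, norm_EF):
    """n * max_i nu_i * ||(E F)||_2, the right-hand side of Corollary 2.8."""
    return len(nus) * max(nus) * norm_EF

# Illustration: (A, B) = (diag(1, 2), I) with X = Y = I, so that
# Y^H A X = diag(1, 2) and Y^H B X = I; the eigenvalues are (1, 1) and (2, 1).
X_cols = [[1.0, 0.0], [0.0, 1.0]]
Y_cols = [[1.0, 0.0], [0.0, 1.0]]
pairs = [(1.0, 1.0), (2.0, 1.0)]
nus = [eig_condition(x, y, a, b) for x, y, (a, b) in zip(X_cols, Y_cols, pairs)]
bound = corollary_2_8_bound(nus, 1e-3)   # assumes ||(E F)||_2 = 1e-3
```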
2.4. Eigenspaces
In this section we will treat the perturbation of eigenspaces, which are
the natural generalization of invariant subspaces. The theory largely
parallels the theory of invariant subspaces developed in Chapter V, and the
exposition here will be a little terser than usual.
Definition 2.9. Let $(A,B)$ be a regular matrix pair. The subspace $\mathcal X$ is an EIGENSPACE if
$$\dim(A\mathcal X + B\mathcal X) \le \dim(\mathcal X). \qquad (2.9)$$
If $\dim(\mathcal X) = l$, then (2.9) implies that both $A\mathcal X$ and $B\mathcal X$ are contained in a subspace $\mathcal Y$ of dimension $l$. In other words, $A$ and $B$ have essentially the same effect on $\mathcal X$.
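Definition 2.9 can be tested mechanically for small examples with exact arithmetic. The sketch below compares $\dim(A\mathcal X + B\mathcal X)$ with $\dim(\mathcal X)$ using rational Gaussian elimination; the helper names and the upper triangular example pair are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def mat_vec(M, v):
    """Product of a matrix (list of rows) with a vector, in exact arithmetic."""
    return [sum(Fraction(M[i][j]) * Fraction(v[j]) for j in range(len(v)))
            for i in range(len(M))]

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(t) for t in v] for v in vectors]
    if not rows:
        return 0
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def is_eigenspace(A, B, basis):
    """Check (2.9): dim(AX + BX) <= dim(X), with X spanned by `basis`."""
    images = [mat_vec(A, x) for x in basis] + [mat_vec(B, x) for x in basis]
    return rank(images) <= rank(basis)

# A regular upper triangular pair: span(e1) is an eigenspace, span(e2) is not.
A = [[1, 2], [0, 3]]
B = [[1, 1], [0, 1]]
```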
Eigenspaces have the following characterizations.
Theorem 2.10. Let $(A,B)$ be a regular matrix pair and let $\mathcal X$ be a subspace of dimension $l$. Then the following statements are equivalent.
1. $\mathcal X$ is an eigenspace of $(A,B)$.
2. There are nonsingular matrices $(X_1\ U_2)$ and $(V_1\ Y_2)$ such that $\mathcal R(X_1) = \mathcal X$ and
$$\begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}A(X_1\ U_2) = \begin{pmatrix}A_1 & H_A\\ 0 & A_2\end{pmatrix},\qquad \begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}B(X_1\ U_2) = \begin{pmatrix}B_1 & H_B\\ 0 & B_2\end{pmatrix}. \qquad (2.10)$$
Moreover the pairs $(A_1,B_1)$ and $(A_2,B_2)$ are regular.
3. If the columns of $X_1$ form a basis for $\mathcal X$, then there is a regular pair $(\bar A_1,\bar B_1)$ such that
$$AX_1\bar B_1 = BX_1\bar A_1. \qquad (2.11)$$
Remark 2.11. The proof will show that we may take $(X_1\ U_2)$ and $(V_1\ Y_2)$ to be unitary.
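Proof. 1 ⇒ 2: Let $(X_1\ U_2)$ be a unitary matrix with $\mathcal R(X_1) = \mathcal X$. Since $\mathcal X$ is an eigenspace, both $A\mathcal X$ and $B\mathcal X$ lie in a subspace $\mathcal Y$ of dimension $l$. If we let $(V_1\ Y_2)$ be a unitary matrix with $\mathcal R(V_1) = \mathcal Y$, then $(X_1\ U_2)$ and $(V_1\ Y_2)$ are the required nonsingular matrices.
2 ⇒ 3: From (2.10) we have
$$AX_1 = V_1A_1 \quad\text{and}\quad BX_1 = V_1B_1.$$
Since $(A_1,B_1)$ is regular, by Theorem 1.13 there are nonsingular matrices $R$ and $S$ such that $\hat A_1 = R^HA_1S$ and $\hat B_1 = R^HB_1S$ commute. Let $\hat X_1 = X_1S$ and $\hat V_1 = V_1R^{-H}$, so that $\mathcal R(\hat X_1) = \mathcal X$, $A\hat X_1 = \hat V_1\hat A_1$, and $B\hat X_1 = \hat V_1\hat B_1$. It follows that $A\hat X_1\hat B_1 = B\hat X_1\hat A_1$.
If the columns of $X_1$ form a basis for $\mathcal X$, then $X_1 = \hat X_1T$ for some nonsingular matrix $T$. The conclusion follows on setting $\bar B_1 = T^{-1}\hat B_1$ and $\bar A_1 = T^{-1}\hat A_1$.
3 ⇒ 1: By Theorem 1.13 we may assume that $\bar A_1 = \mathrm{diag}(J, I)$ and $\bar B_1 = \mathrm{diag}(I, N)$. Let $P = AX_1$ and $Q = BX_1$. Then with the natural partitioning,
$$(P_1\ P_2)\begin{pmatrix}I & 0\\ 0 & N\end{pmatrix} = (Q_1\ Q_2)\begin{pmatrix}J & 0\\ 0 & I\end{pmatrix}.$$
It follows that $\mathcal R(P_1) \subset \mathcal R(Q_1)$ and $\mathcal R(Q_2) \subset \mathcal R(P_2)$. Hence
$$\dim[\mathcal R(P) + \mathcal R(Q)] = \dim[\mathcal R(P_1) + \mathcal R(P_2) + \mathcal R(Q_1) + \mathcal R(Q_2)] = \dim[\mathcal R(Q_1) + \mathcal R(P_2)] \le l.$$
∎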
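Equation (2.10) shows that in some sense the pair $(A_1,B_1)$ is a representation of the part of $(A,B)$ associated with $\mathcal X$. In particular, if $(\alpha,\beta)$ is an eigenvalue of $(A_1,B_1)$, then it is an eigenvalue of $(A,B)$. Moreover, if $B$ is nonsingular, then $\mathcal X$ is an invariant subspace of $B^{-1}A$. See Exercises 2.2 and 2.3.
Equation (2.10) implies that $\mathcal R(Y_2)$ is a left eigenspace of $(A,B)$ with representation $(A_2,B_2)$. We will say that $\mathcal X$ is a SIMPLE EIGENSPACE if
$$\mathcal L[(A_1,B_1)] \cap \mathcal L[(A_2,B_2)] = \emptyset.$$
By Theorem 1.11 this is sufficient for the existence of matrices $P$ and $Q$ such that
$$\begin{pmatrix}I & Q\\ 0 & I\end{pmatrix}\begin{pmatrix}A_1 & H_A\\ 0 & A_2\end{pmatrix}\begin{pmatrix}I & P\\ 0 & I\end{pmatrix} = \begin{pmatrix}A_1 & 0\\ 0 & A_2\end{pmatrix}$$
and
$$\begin{pmatrix}I & Q\\ 0 & I\end{pmatrix}\begin{pmatrix}B_1 & H_B\\ 0 & B_2\end{pmatrix}\begin{pmatrix}I & P\\ 0 & I\end{pmatrix} = \begin{pmatrix}B_1 & 0\\ 0 & B_2\end{pmatrix}.$$
If we set
$$X_2 = U_2 + X_1P \quad\text{and}\quad Y_1 = V_1 + Y_2Q^H,$$
then we have proved the following spectral resolution theorem.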
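Theorem 2.12. Let $\mathcal X$ be a simple eigenspace of the regular pair $(A,B)$. Then there are nonsingular matrices $(X_1\ X_2)$ and $(Y_1\ Y_2)$ such that
$$\begin{pmatrix}Y_1^H\\ Y_2^H\end{pmatrix}A(X_1\ X_2) = \begin{pmatrix}A_1 & 0\\ 0 & A_2\end{pmatrix} \qquad (2.12)$$
and
$$\begin{pmatrix}Y_1^H\\ Y_2^H\end{pmatrix}B(X_1\ X_2) = \begin{pmatrix}B_1 & 0\\ 0 & B_2\end{pmatrix}. \qquad (2.13)$$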
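For $1\times1$ blocks, the construction preceding Theorem 2.12 reduces to a $2\times2$ linear system for scalars $p$ and $q$, which has a unique solution exactly when the two eigenvalues $(a_1,b_1)$ and $(a_2,b_2)$ differ (the scalar case of the condition in Theorem 1.11). The following Python sketch works through that special case for an illustrative upper triangular pair. The function `deflate` and the example matrices are hypothetical, and $(X_1\ U_2)$ and $(V_1\ Y_2)$ are taken to be the identity.

```python
def matmul(M, N):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def deflate(A, B):
    """Block-diagonalize an upper triangular 2x2 pair with scalar blocks.

    Solves a1*p + a2*q = -hA and b1*p + b2*q = -hB, the scalar form of the
    conditions that remove H_A and H_B, and returns p, q and the transformed
    matrices [[1, q], [0, 1]] M [[1, p], [0, 1]] for M = A and M = B.
    """
    (a1, hA), (_, a2) = A
    (b1, hB), (_, b2) = B
    det = a1 * b2 - a2 * b1   # zero iff (a1, b1) and (a2, b2) are the same eigenvalue
    if det == 0:
        raise ValueError("spectra of the diagonal blocks are not disjoint")
    p = (-hA * b2 + a2 * hB) / det
    q = (-a1 * hB + b1 * hA) / det
    L = [[1, q], [0, 1]]
    R = [[1, p], [0, 1]]
    return p, q, matmul(matmul(L, A), R), matmul(matmul(L, B), R)

A = [[1.0, 2.0], [0.0, 3.0]]
B = [[1.0, 1.0], [0.0, 1.0]]
p, q, A_hat, B_hat = deflate(A, B)

# With X1 = e1 and U2 = e2, the new column X2 = U2 + X1*p is x2 = (p, 1), which
# should satisfy b2*A x2 = a2*B x2 for the eigenvalue (a2, b2) = (3, 1).
x2 = [p, 1.0]
Ax2 = [A[0][0] * x2[0] + A[0][1] * x2[1], A[1][0] * x2[0] + A[1][1] * x2[1]]
Bx2 = [B[0][0] * x2[0] + B[0][1] * x2[1], B[1][0] * x2[0] + B[1][1] * x2[1]]
residual = max(abs(B[1][1] * Ax2[i] - A[1][1] * Bx2[i]) for i in range(2))
```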
In analogy with the terminology for invariant subspaces, we call $\mathcal R(X_2)$ the COMPLEMENTARY EIGENSPACE. The spaces $\mathcal R(Y_1)$ and $\mathcal R(Y_2)$ are the corresponding left eigenspaces.
Turning now to the perturbation of eigenspaces, we begin with an approximation theorem. Let $(X_1\ U_2)$ and $(V_1\ Y_2)$ be nonsingular and set
$$\begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}A(X_1\ U_2) = \begin{pmatrix}A_1 & H_A\\ G_A & A_2\end{pmatrix},\qquad \begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}B(X_1\ U_2) = \begin{pmatrix}B_1 & H_B\\ G_B & B_2\end{pmatrix}. \qquad (2.14)$$
If $G_A = G_B = 0$, then $\mathcal R(X_1)$ is an eigenspace of $(A,B)$. We now suppose that $G_A$ and $G_B$ are small, and ask how near $\mathcal R(X_1)$ is to an eigenspace.
In analogy with the ordinary eigenvalue problem, we introduce perturbations
$$\tilde X_1 = X_1 + U_2P \quad\text{and}\quad \tilde Y_2 = Y_2 + V_1Q^H \qquad (2.15)$$
and attempt to determine $P$ and $Q$ so that
$$\tilde Y_2^HA\tilde X_1 = \tilde Y_2^HB\tilde X_1 = 0.$$
This leads directly to the system of equations
$$
\begin{aligned}
QA_1 + A_2P &= -G_A - QH_AP,\\
QB_1 + B_2P &= -G_B - QH_BP.
\end{aligned}
\qquad (2.16)
$$
If we set
$$\mathbf T: (P,Q) \mapsto (QA_1 + A_2P,\ QB_1 + B_2P),$$
then (2.16) can be written
$$\mathbf T(P,Q) = -(G_A + QH_AP,\ G_B + QH_BP). \qquad (2.17)$$
To establish a perturbation bound we must introduce a norm on the space of pairs $(P,Q)$. Here we will work with the norm $\|\cdot\|_F$ defined by
$$\|(P,Q)\|_F = \max\{\|P\|_F, \|Q\|_F\}.$$
If we define
$$\mathrm{dif}[(A_1,B_1),(A_2,B_2)] = \inf_{\|(P,Q)\|_F = 1}\|\mathbf T(P,Q)\|_F, \qquad (2.18)$$
then by Theorem 1.11, $\mathrm{dif}[(A_1,B_1),(A_2,B_2)] > 0$ if and only if the spectra of $(A_1,B_1)$ and $(A_2,B_2)$ are disjoint. For later use note that
$$\mathrm{dif}[(A_1 + E_1, B_1 + F_1),(A_2 + E_2, B_2 + F_2)] \ge \mathrm{dif}[(A_1,B_1),(A_2,B_2)] - \max\{\|E_1\|_2 + \|E_2\|_2,\ \|F_1\|_2 + \|F_2\|_2\}. \qquad (2.19)$$
With these preliminaries, we may now turn to the approximation theorem.
Theorem 2.13. Let the regular pair $(A,B)$ be as in (2.14). Set
$$\gamma = \|(G_A, G_B)\|_F, \qquad \eta = \|(H_A, H_B)\|_F.$$
Assume that $\mathcal L[(A_1,B_1)] \cap \mathcal L[(A_2,B_2)] = \emptyset$, so that
$$\delta = \mathrm{dif}[(A_1,B_1),(A_2,B_2)] > 0.$$
Then if
$$\frac{\gamma\eta}{\delta^2} < \frac14,$$
there is a unique solution $(P,Q)$ of (2.17) satisfying
$$\|(P,Q)\|_F \le \frac{2\gamma}{\delta + \sqrt{\delta^2 - 4\gamma\eta}} < \frac{2\gamma}{\delta}. \qquad (2.20)$$
The column spaces of $\tilde X_1$ and $\tilde Y_2$ defined by (2.15) are complementary right and left eigenspaces of $(A,B)$ corresponding to the regular pairs $(A_1 + H_AP,\ B_1 + H_BP)$ and $(A_2 + QH_A,\ B_2 + QH_B)$, whose spectra are disjoint.
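The bound (2.20) is straightforward to evaluate. The sketch below uses a hypothetical function name and made-up sample values of $\gamma$, $\eta$, and $\delta$. It also checks that the bound is the smaller root of $\eta x^2 - \delta x + \gamma = 0$, which is why it always lies between $\gamma/\delta$ and $2\gamma/\delta$.

```python
import math

def eigenspace_bound(gamma, eta, delta):
    """Right-hand side of (2.20), or None if gamma*eta/delta**2 < 1/4 fails."""
    if delta <= 0 or gamma * eta >= delta ** 2 / 4:
        return None
    return 2 * gamma / (delta + math.sqrt(delta ** 2 - 4 * gamma * eta))

g, e, d = 1e-3, 1.0, 0.5   # hypothetical values of gamma, eta, delta
b = eigenspace_bound(g, e, d)
```

For small $\gamma$ the bound behaves like $\gamma/\delta$, so the size of the residual blocks $G_A$, $G_B$ relative to dif controls the perturbation.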
Proof. Let $\varphi[(P,Q)] = (QH_AP,\ QH_BP)$. Then it is easy to see that the conditions of Theorem V.2.11 are satisfied, which establishes the existence of $(P,Q)$ satisfying (2.17) and (2.20).
To prove the statements about $\tilde X_1$ and $\tilde Y_2$, consider the equivalences
$$\begin{pmatrix}I & 0\\ Q & I\end{pmatrix}\begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}A(X_1\ U_2)\begin{pmatrix}I & 0\\ P & I\end{pmatrix} = \begin{pmatrix}V_1^H\\ \tilde Y_2^H\end{pmatrix}A(\tilde X_1\ U_2) = \begin{pmatrix}A_1 + H_AP & H_A\\ 0 & A_2 + QH_A\end{pmatrix}$$
and
$$\begin{pmatrix}I & 0\\ Q & I\end{pmatrix}\begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}B(X_1\ U_2)\begin{pmatrix}I & 0\\ P & I\end{pmatrix} = \begin{pmatrix}V_1^H\\ \tilde Y_2^H\end{pmatrix}B(\tilde X_1\ U_2) = \begin{pmatrix}B_1 + H_BP & H_B\\ 0 & B_2 + QH_B\end{pmatrix}.$$
This shows that $\tilde X_1$ and $\tilde Y_2$ span complementary right and left eigenspaces.
To prove the statement about the spectra, note that by (2.19)
$$
\begin{aligned}
\mathrm{dif}[(A_1 + H_AP,\ B_1 + H_BP),(A_2 + QH_A,\ B_2 + QH_B)]
&\ge \delta - \max\{\|H_A\|_F(\|P\|_F + \|Q\|_F),\ \|H_B\|_F(\|P\|_F + \|Q\|_F)\}\\
&\ge \delta - 2\|(H_A,H_B)\|_F\|(P,Q)\|_F\\
&> \delta - \frac{4\gamma\eta}{\delta} > 0.
\end{aligned}
$$
∎
If $(X_1\ U_2)$ and $(V_1\ Y_2)$ are unitary (see Remark 2.11), then (2.20) bounds the tangents of the canonical angles between $\mathcal R(X_1)$ and $\mathcal R(\tilde X_1)$ or $\mathcal R(Y_2)$ and $\mathcal R(\tilde Y_2)$, just as in the ordinary eigenvalue problem. Unfortunately, the theorem provides only a single bound for both $P$ and $Q$. This problem is characteristic of the perturbation theory of matrix pairs.
There is nothing sacred about the norm $\|\cdot\|_F$. Any norm that allows the conditions of Theorem V.2.11 to be verified will do.
Although the function dif is nonzero if and only if the spectra of its arguments are disjoint, its size is not directly related to the distance between the spectra, either in the complex plane or on the Riemann sphere. In fact, multiplying the arguments of dif by a common scalar multiplies dif by the absolute value of the scalar without changing the spectra of the arguments.
From (2.14), we see that if $(X_1\ U_2)$ and $(V_1\ Y_2)$ are unitary, then $\gamma$ is the F-norm of a perturbation $(E\ F)$ such that $\mathcal R(X_1)$ is an eigenspace of $(A + E, B + F)$. Namely, take
$$E = (V_1\ Y_2)\begin{pmatrix}0 & 0\\ -G_A & 0\end{pmatrix}\begin{pmatrix}X_1^H\\ U_2^H\end{pmatrix} = -Y_2G_AX_1^H$$
and
$$F = (V_1\ Y_2)\begin{pmatrix}0 & 0\\ -G_B & 0\end{pmatrix}\begin{pmatrix}X_1^H\\ U_2^H\end{pmatrix} = -Y_2G_BX_1^H.$$
However, this backward perturbation is not necessarily the smallest one with this property.
There is a perturbation theorem corresponding to Theorem 2.13. Its proof is left as an exercise.
Theorem 2.14. Let $\mathcal R(X_1)$ be an eigenspace of the regular pair $(A,B)$, and let the pair have the decomposition (2.10). Given the perturbation $(E,F)$, let
$$\begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}E(X_1\ U_2) = \begin{pmatrix}E_{11} & E_{12}\\ E_{21} & E_{22}\end{pmatrix},\qquad \begin{pmatrix}V_1^H\\ Y_2^H\end{pmatrix}F(X_1\ U_2) = \begin{pmatrix}F_{11} & F_{12}\\ F_{21} & F_{22}\end{pmatrix}.$$
Set
$$
\begin{aligned}
\gamma &= \|(E_{21}, F_{21})\|_F,\\
\eta &= \|(H_A + E_{12},\ H_B + F_{12})\|_F,\\
\delta &= \mathrm{dif}[(A_1,B_1),(A_2,B_2)] - \max\{\|E_{11}\|_F + \|E_{22}\|_F,\ \|F_{11}\|_F + \|F_{22}\|_F\}.
\end{aligned}
$$
If $\delta > 0$ and
$$\frac{\gamma\eta}{\delta^2} < \frac14,$$
then there are matrices $P$ and $Q$ satisfying
$$\|(P,Q)\|_F \le \frac{2\gamma}{\delta + \sqrt{\delta^2 - 4\gamma\eta}} < \frac{2\gamma}{\delta}$$
such that the columns of
$$\tilde X_1 = X_1 + U_2P \quad\text{and}\quad \tilde Y_2 = Y_2 + V_1Q^H$$
span right and left complementary eigenspaces of $(A + E, B + F)$ corresponding to the regular pairs
$$[A_1 + E_{11} + (H_A + E_{12})P,\ B_1 + F_{11} + (H_B + F_{12})P]$$
and
$$[A_2 + E_{22} + Q(H_A + E_{12}),\ B_2 + F_{22} + Q(H_B + F_{12})].$$
The spectra of these pairs are disjoint.
If instead of starting with the block triangularization (2.10), we start with a spectral resolution, that is, with a block diagonal form, then $H_A$ and $H_B$ vanish and we obtain a sharper bound on the spectra.
Theorem 2.15. Let $\mathcal R(X_1)$ be an eigenspace of the regular pair $(A,B)$, and let the pair have the spectral resolution (2.12) and (2.13). Given the perturbation $(E,F)$, let
$$\begin{pmatrix}Y_1^H\\ Y_2^H\end{pmatrix}E(X_1\ X_2) = \begin{pmatrix}E_{11} & E_{12}\\ E_{21} & E_{22}\end{pmatrix},\qquad \begin{pmatrix}Y_1^H\\ Y_2^H\end{pmatrix}F(X_1\ X_2) = \begin{pmatrix}F_{11} & F_{12}\\ F_{21} & F_{22}\end{pmatrix}.$$
Set
$$
\begin{aligned}
\gamma &= \|(E_{21}, F_{21})\|_F,\\
\eta &= \|(E_{12}, F_{12})\|_F,\\
\delta &= \mathrm{dif}[(A_1,B_1),(A_2,B_2)] - \max\{\|E_{11}\|_F + \|E_{22}\|_F,\ \|F_{11}\|_F + \|F_{22}\|_F\}.
\end{aligned}
$$
If $\delta > 0$ and
$$\frac{\gamma\eta}{\delta^2} < \frac14,$$
then there are matrices $P$ and $Q$ satisfying
$$\|(P,Q)\|_F \le \frac{2\gamma}{\delta + \sqrt{\delta^2 - 4\gamma\eta}} < \frac{2\gamma}{\delta}$$
such that the columns of
$$\tilde X_1 = X_1 + X_2P \quad\text{and}\quad \tilde Y_2 = Y_2 + Y_1Q^H$$
span right and left complementary eigenspaces of $(A + E, B + F)$ corresponding to the regular pairs
$$(A_1 + E_{11} + E_{12}P,\ B_1 + F_{11} + F_{12}P)$$
and
$$(A_2 + E_{22} + QE_{12},\ B_2 + F_{22} + QF_{12}).$$
The spectra of these pairs are disjoint.
When $X_1$ has only one column, i.e., when it is an eigenvector, Theorem 2.15 shows that the approximation
$$(\tilde\alpha_1, \tilde\beta_1) \cong (\alpha_1 + y_1^HEx_1,\ \beta_1 + y_1^HFx_1)$$
is accurate up to terms of second order in the error. Thus the theorem gives another proof of Theorem 2.2.
Notes and References
The first order perturbation analysis is new, as is the systematic use of the condition number $\nu$ defined by (2.2).
The Gerschgorin theory and its application to multiple eigenvalues is from a paper by Stewart [204, 1975]. The simplified bounds are due to Sun [228, 1985].
The generalization of the Bauer-Fike theorem is due to Elsner and Sun [67, 1982]. This paper also contains generalizations of Henrici's theorem and of the Hoffman-Wielandt theorem for "normal" pairs, that is, pairs for which $B^{-1}A$ is normal (in the case where $B$ is nonsingular).
The perturbation theory for eigenspaces is taken from a paper by Stewart [201, 1972], where eigenspaces were called deflating subspaces. Although this theory is asymptotically sharp, it is complicated by the fact that the function dif is not easy to interpret. When the concern is with eigenvectors, it is possible to write out explicit perturbation expansions (Exercise 2.6).
Exercises
1. Let $(A,B)$ be a regular matrix pair. Then there is a permutation matrix $P$ such that no pair of corresponding diagonal elements of $AP$ and $BP$ is $(0,0)$. [Hint: Use Theorem II.3.14.]
2. Let $\mathcal X$ be an eigenspace of the regular pair $(A,B)$ and let $AX_1\bar B_1 = BX_1\bar A_1$ as in (2.11). Show that if $z$ is an eigenvector of $(\bar A_1,\bar B_1)$ then $X_1\bar B_1z$ (or $X_1\bar A_1z$ if $\bar B_1z = 0$) is an eigenvector of $(A,B)$. Conversely, if $x \in \mathcal X$ is an eigenvector of $(A,B)$, then there is an eigenvector $z$ of $(\bar A_1,\bar B_1)$ such that $x = X_1\bar B_1z$ (or $x = X_1\bar A_1z$ if $\bar B_1z = 0$).
3. Let $\mathcal X$ be an eigenspace of the regular pair $(A,B)$. Show that if $B$ is nonsingular then $\mathcal X$ is an invariant subspace of $B^{-1}A$.
4. Show that
$$\mathrm{dif}[(A_1 + E_1, B_1 + F_1),(A_2 + E_2, B_2 + F_2)] \ge \mathrm{dif}[(A_1,B_1),(A_2,B_2)] - \max\{\|E_1\|_2 + \|E_2\|_2,\ \|F_1\|_2 + \|F_2\|_2\}.$$
5. Show that
$$\mathrm{dif}[(A_1,I),(A_2,I)] \le \mathrm{sep}_F(A_1,A_2).$$
Moreover, if $\|A_1\|_2, \|A_2\|_2 \le 1$, then
$$\mathrm{dif}[(A_1,I),(A_2,I)] \ge \tfrac12\,\mathrm{sep}_F(A_1,A_2).$$
6. Under the hypotheses of Theorem 2.13, show that when $(A_1,B_1) = (\alpha_1,\beta_1)$ is a scalar pair (so that $G_A = g_A$ and $G_B = g_B$ are vectors),
$$\tilde x_1 = x_1 - U_2(\beta_1A_2 - \alpha_1B_2)^{-1}(\beta_1g_A - \alpha_1g_B) + O(\|(g_A,g_B)\|^2).$$
3. Definite Matrix Pairs
In this, the concluding section, we will treat the perturbation of eigenvalues and eigenspaces of definite matrix pairs. We begin with the analogue of Corollary IV.4.6, which gives a uniform bound for all the eigenvalues of a definite pair. We then look at the specialization to definite pairs of our general theory of eigenspaces developed in the last section. Finally we consider some direct bounds for eigenspaces.
Throughout this section $(A,B)$ will denote a definite matrix pair of order $n$.
3. DEFINITE MATRIX PAIRS
3.1. Eigenvalues of Definite Pairs
Let us begin with some general observations on the condition of eigenvalues of a definite matrix pair. If $x$ is an eigenvector of $(A,B)$ corresponding to the eigenvalue $(\alpha,\beta) = (x^HAx, x^HBx)$, then the number
$$\nu = \frac{\|x\|_2^2}{\sqrt{(x^HAx)^2 + (x^HBx)^2}}$$
is a condition number for $(\alpha,\beta)$ in the chordal metric. This fact has two consequences.
First, the eigenvalues of a definite pair, unlike the eigenvalues of a Hermitian matrix, are not automatically well conditioned. As in the Hermitian case, small eigenvalues can be ill conditioned in a relative sense; but eigenvalues of ordinary size can be ill conditioned in an absolute sense. For example, the eigenvalue (1) of the pair
[ ( O02)' ( OOl)]
is insensitive to perturbations of magnitude $10^{-4}$, but the eigenvalue (2) is quite sensitive.
Second, we defined $(A,B)$ to be definite if the number
$$\gamma(A,B) = \min_{\|x\|_2 = 1}\sqrt{(x^HAx)^2 + (x^HBx)^2}$$
is nonzero. We now see that $\gamma^{-1}(A,B)$ is an upper bound on the condition of the eigenvalues. Thus although the eigenvalues of a definite pair can be ill conditioned, the degree of ill conditioning is bounded.
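For small real symmetric examples, $\gamma(A,B)$ can be estimated numerically. The sketch below does not follow the definition above literally. Instead it assumes the standard characterization of $\gamma(A,B)$ as the distance from the origin to the convex field of values of $A + iB$, which gives $\gamma(A,B) = \max\{0, \max_t\lambda_{\min}(A\cos t + B\sin t)\}$, and it approximates the maximum on a grid of angles. The function names are hypothetical, and only $2\times2$ real symmetric matrices are handled.

```python
import math

def lambda_min_sym2(a, b, c):
    """Smallest eigenvalue of the real symmetric matrix [[a, b], [b, c]]."""
    return (a + c) / 2 - math.hypot((a - c) / 2, b)

def crawford_2x2(A, B, samples=3600):
    """Grid estimate of gamma(A, B) for real symmetric 2x2 matrices A and B.

    Assumes gamma(A, B) = max(0, max_t lambda_min(A cos t + B sin t)); the
    maximum over t is taken over a uniform grid, so the value is a lower
    estimate (essentially exact when the maximizing angle lies on the grid).
    """
    best = -math.inf
    for k in range(samples):
        t = 2 * math.pi * k / samples
        c, s = math.cos(t), math.sin(t)
        best = max(best, lambda_min_sym2(c * A[0][0] + s * B[0][0],
                                         c * A[0][1] + s * B[0][1],
                                         c * A[1][1] + s * B[1][1]))
    return max(best, 0.0)

# (diag(2, 1), I) is definite; the eigenvector e1 has eigenvalue (2, 1),
# whose condition number 1/sqrt(5) should not exceed 1/gamma.
gamma = crawford_2x2([[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
nu_e1 = 1.0 / math.hypot(2.0, 1.0)
```

A result of zero signals that the pair is not definite. For example, $(\mathrm{diag}(1,-1), 0)$ gives zero because $x = (1\ 1)^T/\sqrt2$ makes both quadratic forms vanish.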
The motor that drives the perturbation theory of Hermitian matrices is the natural ordering of the real line, which defines an association between the eigenvalues of a Hermitian matrix and its perturbation. Eigenvalues of definite pairs also have an ordering, although it is not as natural. To define it, let $(A,B)$ and $(\tilde A,\tilde B)$ be definite pairs. By Theorem 1.18 the field of values $F(A + iB)$ lies in a half plane that does not contain the origin and $F(\tilde A + i\tilde B)$ lies in another such half plane. Therefore, there is a ray $\omega$, emanating from the origin, that lies in neither half plane. Given any real pair $(\alpha,\beta) \ne (0,0)$, define $\theta(\alpha,\beta)$ to
be the angle the line from the origin to $(\alpha,\beta)$ makes with $\omega$, measured clockwise.
This construction allows us to associate angles with the eigenvalues of a definite pair and a perturbation of the pair. Specifically, we will suppose the pair $(A,B)$ has eigenvalues $(\alpha_i,\beta_i)$ $(i = 1,\ldots,n)$ and set
$$\theta_i = \theta(\alpha_i,\beta_i).$$
The numbers $\theta_i$ are called the EIGENANGLES of $(A,B)$. We will assume that the eigenangles are ordered so that
$$0 \le \theta_1 \le \cdots \le \theta_n < \pi. \qquad (3.1)$$
The eigenangles of the pair $(\tilde A,\tilde B)$ are defined similarly.
Figure 3.1: Eigenangles and Their Bounds
Eigenangles have a variational characterization.
Lemma 3.1. With the ordering (3.1), the eigenangles of the definite pair $(A,B)$ satisfy
$$\theta_i = \min_{\dim(\mathcal X) = i}\ \max_{x\in\mathcal X,\ x\ne 0}\theta(x^HAx, x^HBx) \qquad (3.2)$$
and
$$\theta_i = \max_{\dim(\mathcal X) = n-i+1}\ \min_{x\in\mathcal X,\ x\ne 0}\theta(x^HAx, x^HBx). \qquad (3.3)$$
Proof. By Theorem 1.18, we may assume that $B$ is positive definite. Then for some fixed angle $\theta_0$,
$$\theta(x^HAx, x^HBx) = \theta_0 + \cot^{-1}\frac{x^HAx}{x^HBx}.$$
The lemma now follows from Fischer's min-max characterization (Corollary 1.16). ∎
The main theorem of this section bounds the chordal metric of the perturbation of the eigenvalues in terms of the metric $\rho_D$ introduced in Section 1.
Theorem 3.2. Let $(A,B)$ be a definite pair and let $(\tilde A,\tilde B) = (A + E, B + F)$. If
$$\epsilon^2 \equiv \max_{\|x\|_2 = 1}\frac{(x^HEx)^2 + (x^HFx)^2}{(x^HAx)^2 + (x^HBx)^2} < 1,$$
then $(\tilde A,\tilde B)$ is definite. Moreover, if the eigenvalues $(\alpha_i,\beta_i)$ are ordered so that their eigenangles $\theta_i$ are nondecreasing and the eigenvalues $(\tilde\alpha_i,\tilde\beta_i)$ $(i = 1,\ldots,n)$ are ordered similarly, then $|\tilde\theta_i - \theta_i| < \pi/2$ and
$$\chi((\alpha_i,\beta_i),(\tilde\alpha_i,\tilde\beta_i)) \le \rho_D[(A,B),(\tilde A,\tilde B)], \qquad i = 1,\ldots,n. \qquad (3.4)$$
Proof. Recalling that $(\tilde A,\tilde B)$ is definite if and only if $\gamma(\tilde A,\tilde B) > 0$, we have
$$\gamma(\tilde A,\tilde B) \ge \min_{\|x\|_2 = 1}\Bigl\{\sqrt{(x^HAx)^2 + (x^HBx)^2} - \sqrt{(x^HEx)^2 + (x^HFx)^2}\Bigr\} \ge (1 - \epsilon)\gamma(A,B) > 0.$$
Hence $(\tilde A,\tilde B)$ is definite.
Now suppose that $\tilde\theta_i \ge \theta_i$. Let $\mathcal X$ be a subspace for which the minimum is attained in (3.2). Then
$$\tilde\theta_i \le \max_{x\in\mathcal X,\ x\ne 0}\theta(x^H\tilde Ax, x^H\tilde Bx).$$
Let $x$ be a vector for which the above maximum is attained. Then
$$\theta(x^HAx, x^HBx) \le \theta_i \le \tilde\theta_i \le \theta(x^H\tilde Ax, x^H\tilde Bx).$$
These inequalities are pictured in Figure 3.1, in which $(x^HAx, x^HBx)$ is denoted by $Z$ and $(x^H\tilde Ax, x^H\tilde Bx)$ by $\tilde Z$.
Since $\epsilon < 1$, we have $|Z\tilde Z| < |OZ|$. Hence the angle $ZO\tilde Z$ is less than $\pi/2$, which implies that $\tilde\theta_i - \theta_i < \pi/2$. Moreover, if we let $\tilde ZP$ be the line from $\tilde Z$ perpendicular to $OZ$, then elementary geometry gives
$$
\begin{aligned}
\tilde\theta_i - \theta_i \le \angle ZO\tilde Z
&= \sin^{-1}\sqrt{1 - \Bigl(\frac{|OP|}{|O\tilde Z|}\Bigr)^2}\\
&= \sin^{-1}\sqrt{1 - \frac{(x^HAx\,x^H\tilde Ax + x^HBx\,x^H\tilde Bx)^2}{[(x^HAx)^2 + (x^HBx)^2][(x^H\tilde Ax)^2 + (x^H\tilde Bx)^2]}}\\
&= \sin^{-1}\chi[(x^HAx, x^HBx),(x^H\tilde Ax, x^H\tilde Bx)]\\
&\le \sin^{-1}\rho_D[(A,B),(\tilde A,\tilde B)].
\end{aligned}
$$
But it is easily verified that $\chi[(\alpha_i,\beta_i),(\tilde\alpha_i,\tilde\beta_i)] = \sin(\tilde\theta_i - \theta_i)$. Hence (3.4) holds when $\tilde\theta_i \ge \theta_i$. The case $\tilde\theta_i \le \theta_i$ is established in a similar manner, beginning with the characterization (3.3). ∎
We may obtain a bound in terms of $\|E\|$ and $\|F\|$ by observing that
$$\rho_D[(A,B),(\tilde A,\tilde B)] \le \epsilon \le \frac{\sqrt{\|E\|_2^2 + \|F\|_2^2}}{\gamma(A,B)}. \qquad (3.5)$$
Thus, we have the following corollary.
Corollary 3.3. If
$$\frac{\sqrt{\|E\|_2^2 + \|F\|_2^2}}{\gamma(A,B)} < 1,$$
then the pair $(\tilde A,\tilde B)$ is definite and
$$\chi((\alpha_i,\beta_i),(\tilde\alpha_i,\tilde\beta_i)) \le \frac{\sqrt{\|E\|_2^2 + \|F\|_2^2}}{\gamma(A,B)}, \qquad i = 1,\ldots,n. \qquad (3.6)$$
Theorem 3.2 and its Corollary 3.3 have pretty forms, but their content is less than satisfactory. The bound (3.6), for example, depends on $\gamma(A,B)^{-1}$, which is greater than the largest individual condition number of the eigenvalues. Of course it is to be expected that a bound for all the eigenvalues would depend on $\max_i\nu_i$, since it must take into account the worst case. The trouble is that $\gamma(A,B)^{-1}$ can be arbitrarily larger than $\max_i\nu_i$, as the following example shows.
Example 3.4. Let
$$A = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix},$$
and for $\eta > 0$ let
$$B = \frac12\begin{pmatrix}1 + \eta & \eta - 1\\ \eta - 1 & 1 + \eta\end{pmatrix}.$$
The eigenvalues of $B$ are 1 and $\eta$, the latter corresponding to the eigenvector $\mathbf 1 = (1\ 1)^T$. Since $\mathbf 1^HA\mathbf 1 = 0$,
$$\gamma(A,B) = \eta.$$
Since $\operatorname{trace}(A^{-1}B) = 0$ and $\det(A^{-1}B) = -\eta$, the eigenvalues of the pair $(A,B)$ are $(\pm 1/\sqrt\eta)$. Now the corresponding eigenvectors are of the form $x = (1\ \xi)^T$, where
$$\xi = \frac{1 - \eta}{1 \pm 2\sqrt\eta + \eta} \cong 1 \mp 2\sqrt\eta.$$
It follows that
$$\max_i\nu_i \le \frac{x^Tx}{|x^TAx|} \cong \frac{1}{2\sqrt\eta}.$$
Thus $\gamma(A,B)^{-1} = O(\eta^{-1})$ while $\max_i\nu_i = O(\eta^{-1/2})$.
Even this example would not be damning if $\gamma(A,B)$ really reflected the effects of perturbations when $\epsilon$ is near one. However, even in this case the condition numbers will give a more realistic estimate. The reason is that for the second inequality in (3.5) to be realistic, the values of $(x^HAx)^2 + (x^HBx)^2$ and $(x^H\tilde Ax)^2 + (x^H\tilde Bx)^2$ must be nearly minimal, which is unlikely.
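The asymptotics of Example 3.4 can be checked numerically. The sketch below uses a hypothetical helper name and chooses $\eta = 10^{-4}$ for illustration. It builds the two vectors $x = (1\ \xi)^T$ from the formula for $\xi$, confirms that they are eigenvectors (the residual $\beta Ax - \alpha Bx$ vanishes for $(\alpha,\beta) = (x^TAx, x^TBx)$), and evaluates their condition numbers. A short calculation from the example's formulas gives $\nu = \sqrt{1+\eta}/(2\sqrt\eta)$ for both eigenvectors, compared with $\gamma(A,B)^{-1} = \eta^{-1}$.

```python
import math

def example_3_4(eta):
    """Eigenvalues, condition numbers, and residuals for Example 3.4."""
    A = [[1.0, 0.0], [0.0, -1.0]]
    B = [[(1 + eta) / 2, (eta - 1) / 2], [(eta - 1) / 2, (1 + eta) / 2]]

    def quad(M, x):
        return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

    def apply(M, x):
        return [M[i][0] * x[0] + M[i][1] * x[1] for i in range(2)]

    results = []
    for sign in (1, -1):
        xi = (1 - eta) / (1 + sign * 2 * math.sqrt(eta) + eta)
        x = [1.0, xi]
        alpha, beta = quad(A, x), quad(B, x)     # eigenvalue (x^T A x, x^T B x)
        Ax, Bx = apply(A, x), apply(B, x)
        residual = max(abs(beta * Ax[i] - alpha * Bx[i]) for i in range(2))
        nu = (x[0] ** 2 + x[1] ** 2) / math.hypot(alpha, beta)
        results.append((alpha / beta, nu, residual))
    return results

eta = 1e-4
for lam, nu, res in example_3_4(eta):
    # lam is about +-1/sqrt(eta); nu is about 1/(2 sqrt(eta)), far below 1/eta
    print(lam, nu, 1 / eta, res)
```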
3.2. Eigenspaces
The theory of eigenspaces for definite pairs, like its counterpart for
Hermitian matrices, is both simpler and more complex than the general
theory. On the one hand, the assumption of definiteness simplifies the
general theory; on the other hand the same assumption gives us more structure to exploit in extending the theory.
The basic fact of eigenspaces of definite pairs is that a right eigenspace is also a left eigenspace.
Theorem 3.5. Let $(A,B)$ be definite. Let the columns of $X_1$ span an eigenspace of $(A,B)$. Then there is a matrix $X_2$ such that $(X_1\ X_2)$ is nonsingular and the pair $(A,B)$ has the spectral resolution
$$\begin{pmatrix}X_1^H\\ X_2^H\end{pmatrix}A(X_1\ X_2) = \begin{pmatrix}A_1 & 0\\ 0 & A_2\end{pmatrix} \qquad (3.7)$$
and
$$\begin{pmatrix}X_1^H\\ X_2^H\end{pmatrix}B(X_1\ X_2) = \begin{pmatrix}B_1 & 0\\ 0 & B_2\end{pmatrix}. \qquad (3.8)$$
Moreover, $X_1$ and $X_2$ may be chosen so that $A_1$, $A_2$, $B_1$, $B_2$ are diagonal (i.e., the columns of $(X_1\ X_2)$ are eigenvectors).
Proof. It is easily verified that $\mathcal R(X_1)$ is an eigenspace of the pair $(A\cos\phi - B\sin\phi,\ A\sin\phi + B\cos\phi)$. Hence by Theorem 1.18 we may assume that $B$ is positive definite. It follows that $\mathcal R(X_1)$ and $\mathcal R(BX_1)$ are acute (Definition III.3.2). Hence if the columns of $X_2$ form a basis for $\mathcal R(BX_1)^\perp$, the matrix $(X_1\ X_2)$ is nonsingular (Exercise III.3.2) and $X_2^HBX_1 = 0$. Since $\mathcal R(AX_1) \subset \mathcal R(BX_1)$, it follows that $X_2^HAX_1 = 0$, which establishes (3.7) and (3.8).
Since the pairs $(A_i,B_i)$ $(i = 1,2)$ are definite, by Corollary 1.19 there are nonsingular matrices $U_i$ such that $(U_i^HA_iU_i,\ U_i^HB_iU_i)$ are diagonal. If we make the substitutions $X_i \leftarrow X_iU_i$ $(i = 1,2)$, then $(X_1\ X_2)$ diagonalizes $(A,B)$. ∎
The second part of the theorem allows us to assume that the matrices $A_1$, $A_2$, $B_1$, and $B_2$ of the spectral resolution (3.7)-(3.8) have the form
$$
\begin{aligned}
A_1 &= \mathrm{diag}(\alpha_1,\ldots,\alpha_k), & B_1 &= \mathrm{diag}(\beta_1,\ldots,\beta_k),\\
A_2 &= \mathrm{diag}(\alpha_{k+1},\ldots,\alpha_n), & B_2 &= \mathrm{diag}(\beta_{k+1},\ldots,\beta_n),
\end{aligned}
\qquad \alpha_i^2 + \beta_i^2 = 1,\quad i = 1,\ldots,n. \qquad (3.9)
$$
In this case we will say that the spectral resolution (3.7)-(3.8) is NORMALIZED. Among other things, normalization implies that the columns of $X = (X_1\ X_2)$ satisfy
$$\|x_i\|_2^2 = \nu_i, \qquad i = 1,\ldots,n, \qquad (3.10)$$
and hence
$$\max_i\|x_i\|_2^2 \le \gamma^{-1}(A,B).$$
The normalization of the resolution allows us to give an explicit bound for the function dif in terms of the eigenvalues of the pair $(A,B)$.
Theorem 3.6. Let $A_1$, $A_2$, $B_1$, and $B_2$ satisfy (3.9), and let
$$\delta = \min_{\substack{1\le i\le k\\ k+1\le j\le n}}\chi[(\alpha_i,\beta_i),(\alpha_j,\beta_j)].$$
Then
$$\frac{\delta}{\sqrt2} \le \mathrm{dif}[(A_1,B_1),(A_2,B_2)] \le \delta.$$
Proof. To establish the lower bound we must show that for all $R$ and $S$ the solution of the system
$$
\begin{aligned}
QA_1 + A_2P &= R,\\
QB_1 + B_2P &= S
\end{aligned}
\qquad (3.11)
$$
satisfies
$$\|(P,Q)\|_F \le \frac{\sqrt2\,\|(R,S)\|_F}{\delta}. \qquad (3.12)$$
If we postmultiply the first of the equations (3.11) by $B_1$ and the second by $A_1$ and subtract, we get (remember that since $A_1$ and $B_1$ are diagonal, they commute)
$$A_2PB_1 - B_2PA_1 = RB_1 - SA_1.$$
Hence, writing $\rho_{ij}$ and $\sigma_{ij}$ for the elements of $R$ and $S$, the $(i,j)$-element of $P$ is given by
$$\pi_{ij} = \frac{\rho_{ij}\beta_j - \sigma_{ij}\alpha_j}{\alpha_{i+k}\beta_j - \beta_{i+k}\alpha_j}.$$
Since the diagonal pairs $(A_1,B_1)$ and $(A_2,B_2)$ are normalized,
$$|\pi_{ij}|^2 \le \frac{|\rho_{ij}|^2 + |\sigma_{ij}|^2}{\delta^2}. \qquad (3.13)$$
Hence
$$\|P\|_F \le \frac{\sqrt{\|R\|_F^2 + \|S\|_F^2}}{\delta} \le \frac{\sqrt2\,\|(R,S)\|_F}{\delta}.$$
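By a similar argument,
$$\|Q\|_F \le \frac{\sqrt2\,\|(R,S)\|_F}{\delta},$$
and (3.12) follows.
To establish the upper bound, we must show that there are matrices $R$ and $S$ such that the solution of the system (3.11) satisfies
$$\|(P,Q)\|_F \ge \frac{\|(R,S)\|_F}{\delta}.$$
Let the minimum in the definition of $\delta$ occur for the pairs $(\alpha_{k+i},\beta_{k+i})$ and $(\alpha_j,\beta_j)$. Let $R = \mathrm{sign}(\beta_j)\,e_ie_j^T$ and $S = -\mathrm{sign}(\alpha_j)\,e_ie_j^T$, so that $\|(R,S)\|_F = 1$. Then from the expression for $\pi_{ij}$,
$$\|(P,Q)\|_F \ge |\pi_{ij}| = \frac{|\alpha_j| + |\beta_j|}{\delta} \ge \frac1\delta = \frac{\|(R,S)\|_F}{\delta}.$$
∎
We may now combine all these facts into a perturbation theorem which is essentially a corollary of Theorem 2.14.
Theorem 3.7. Let the definite pair $(A,B)$ have the spectral resolution (3.7)-(3.8) satisfying (3.9). Let $\nu_i$ $(i = 1,\ldots,n)$ be the condition numbers of the eigenvalues of $(A,B)$, and set
$$\nu = \max_i\nu_i.$$
Given the Hermitian perturbations $\tilde A = A + E$ and $\tilde B = B + F$, set
$$\begin{pmatrix}X_1^H\\ X_2^H\end{pmatrix}E(X_1\ X_2) = \begin{pmatrix}E_{11} & E_{21}^H\\ E_{21} & E_{22}\end{pmatrix},\qquad \begin{pmatrix}X_1^H\\ X_2^H\end{pmatrix}F(X_1\ X_2) = \begin{pmatrix}F_{11} & F_{21}^H\\ F_{21} & F_{22}\end{pmatrix},$$
$$\gamma = \|(E_{21}, F_{21})\|_F,$$
and let
$$\tilde\delta = \frac{1}{\sqrt2}\min_{\substack{1\le i\le k\\ k+1\le j\le n}}\chi[(\alpha_i,\beta_i),(\alpha_j,\beta_j)] - n\nu\max\{\|E\|_2, \|F\|_2\}.$$
If
$$\frac{\gamma}{\tilde\delta} < \frac12,$$
then there are matrices $P$ and $Q$ satisfying
$$\|(P,Q)\|_F \le \frac{2\gamma}{\tilde\delta + \sqrt{\tilde\delta^2 - 4\gamma^2}} < \frac{2\gamma}{\tilde\delta} \qquad (3.14)$$
such that the columns of
$$\tilde X_1 = X_1 + X_2P \quad\text{and}\quad \tilde X_2 = X_2 + X_1Q^H$$
span complementary eigenspaces of $(\tilde A,\tilde B)$ corresponding to the pairs
$$(A_1 + E_{11} + E_{21}^HP + P^HE_{21} + P^H(A_2 + E_{22})P,\ \ B_1 + F_{11} + F_{21}^HP + P^HF_{21} + P^H(B_2 + F_{22})P) \qquad (3.15)$$
and
$$(A_2 + E_{22} + E_{21}Q^H + QE_{21}^H + Q(A_1 + E_{11})Q^H,\ \ B_2 + F_{22} + F_{21}Q^H + QF_{21}^H + Q(B_1 + F_{11})Q^H). \qquad (3.16)$$
Proof. We have
$$\|E_{11}\|_F \le \|X_1\|_F^2\|E\|_2 \le k\nu\|E\|_2,$$
the last inequality following from (3.10). Similarly $\|E_{22}\|_F \le (n-k)\nu\|E\|_2$, $\|F_{11}\|_F \le k\nu\|F\|_2$, and $\|F_{22}\|_F \le (n-k)\nu\|F\|_2$, so that $\tilde\delta$ is a lower bound on $\mathrm{dif}[(A_1 + E_{11}, B_1 + F_{11}),(A_2 + E_{22}, B_2 + F_{22})]$. The inequality (3.14) now follows from Theorem 2.14. The pairs (3.15) and (3.16) are obtained by considering the diagonalizing congruences
$$\begin{pmatrix}I & P^H\\ Q & I\end{pmatrix}\begin{pmatrix}A_1 + E_{11} & E_{21}^H\\ E_{21} & A_2 + E_{22}\end{pmatrix}\begin{pmatrix}I & Q^H\\ P & I\end{pmatrix}$$
and
$$\begin{pmatrix}I & P^H\\ Q & I\end{pmatrix}\begin{pmatrix}B_1 + F_{11} & F_{21}^H\\ F_{21} & B_2 + F_{22}\end{pmatrix}\begin{pmatrix}I & Q^H\\ P & I\end{pmatrix}.$$
∎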
When $E$ and $F$ are sufficiently small, the bound (3.14) assumes the asymptotic form
$$\|(P,Q)\|_F \lesssim \frac{\|(E_{21},F_{21})\|_F}{\delta},$$
where
$$\delta = \frac{1}{\sqrt2}\min_{\substack{1\le i\le k\\ k+1\le j\le n}}\chi[(\alpha_i,\beta_i),(\alpha_j,\beta_j)].$$
Since $\|E_{21}\|_F \le \|X_1\|_F\|X_2\|_F\|E\|_2$, and similarly for $\|F_{21}\|_F$, we have
$$\|(P,Q)\|_F \lesssim \frac{\|X_1\|_F\|X_2\|_F\max\{\|E\|_2, \|F\|_2\}}{\delta}.$$
But $\tilde X_1 - X_1 = X_2P$. Hence
$$\frac{\|\tilde X_1 - X_1\|_F}{\|X_1\|_F} \lesssim \frac{\|X_2\|_F^2\max\{\|E\|_2, \|F\|_2\}}{\delta} \le \frac{(n-k)\nu\max\{\|E\|_2, \|F\|_2\}}{\delta}. \qquad (3.17)$$
Thus the ratio of the overall condition of the eigenvalues to their separation is a condition number for the problem.
3.3. Direct Bounds
We conclude this chapter with three direct bounds for eigenspaces, which we state without proof.
The first bound is for an eigenvector.
Theorem 3.8. Let $x_1$ be an eigenvector of the definite pair $(A,B)$ with eigenvalue $(\lambda_1)$. Suppose that the definite pair $(\tilde A,\tilde B)$ has $n-1$ eigenvalues $(\tilde\lambda_i)$ $(i = 2,\ldots,n)$ such that
$$\delta \equiv \min_{i>1}\chi((\lambda_1),(\tilde\lambda_i)) > 0.$$
Let
$$\epsilon = \sqrt{\|E\|_2^2 + \|F\|_2^2}.$$
If $\epsilon/\delta < \gamma(\tilde A,\tilde B)$, then there is a vector $p_1$ satisfying
$$\frac{\|p_1\|_2}{\|x_1\|_2} \le \frac{\epsilon}{\delta\,\gamma(\tilde A,\tilde B)} < 1 \qquad (3.18)$$
such that $x_1 + p_1$ is an eigenvector of $(\tilde A,\tilde B)$ corresponding to the remaining eigenvalue $(\tilde\lambda_1)$.
Note that this bound is closely related to the asymptotic bound (3.17). The factor $(n-k)\nu/\delta$, which multiplies the error in (3.17), corresponds to the factor $\gamma(\tilde A,\tilde B)^{-1}/\delta$ in (3.18).
We turn now to theorems that bound the sines of the canonical angles between eigenspaces. As usual they come in two varieties: one in the Frobenius norm that requires no restrictions on the situation of the eigenvalues and one in any unitarily invariant norm that requires the eigenvalues to be suitably clustered.
Theorem 3.9. Let the definite pair $(A,B)$ be decomposed as in (3.7) and (3.8), where $X_1$ and $X_2$ have orthonormal columns. Let the analogous decomposition be given for the pair $(\tilde A,\tilde B) = (A + E, B + F)$. If
$$\delta \equiv \min\{\chi((\lambda),(\tilde\lambda)) : (\lambda) \in \mathcal L[(A_1,B_1)],\ (\tilde\lambda) \in \mathcal L[(\tilde A_2,\tilde B_2)]\} > 0,$$
then
$$\|\sin\Theta[\mathcal R(X_1),\mathcal R(\tilde X_1)]\|_F \le \frac{\sqrt{\|A^2 + B^2\|_2}}{\gamma(A,B)\,\gamma(\tilde A,\tilde B)}\cdot\frac{\sqrt{\|EX_1\|_F^2 + \|FX_1\|_F^2}}{\delta}.$$
Finally, we state a $\sin\Theta$ theorem that is valid for all unitarily invariant norms.
Theorem 3.10. Let the definite pair $(A,B)$ be decomposed as in (3.7) and (3.8), where $X_1$ and $X_2$ have orthonormal columns. Let the analogous decomposition be given for the pair $(\tilde A,\tilde B) = (A + E, B + F)$. Suppose that there are numbers $\alpha \ge 0$ and $\delta > 0$ with $\alpha + \delta \le 1$ such that for some real number $r$,
$$\mathcal L[(A_1,B_1)] \subset \{(\lambda) : \chi((\lambda),(r)) \le \alpha\}$$
and
$$\mathcal L[(\tilde A_2,\tilde B_2)] \subset \{(\lambda) : \chi((\lambda),(r)) \ge \alpha + \delta\}.$$
Then
$$\|\sin\Theta[\mathcal R(X_1),\mathcal R(\tilde X_1)]\| \le \pi(\alpha,\delta;r)\,\frac{\sqrt{\|A^2 + B^2\|_2}}{\gamma(A,B)\,\gamma(\tilde A,\tilde B)}\cdot\frac{\sqrt{\|EX_1\|^2 + \|FX_1\|^2}}{\delta},$$
where
$$\pi(\alpha,\delta;r) = \begin{cases}\sqrt2\,\dfrac{(\alpha + \delta) + \alpha\sqrt{1 - (\alpha + \delta)^2}}{2\alpha + \delta} & \text{if } r \ne 0,\\[2ex] \dfrac{(\alpha + \delta) + \alpha\sqrt{1 - (\alpha + \delta)^2}}{2\alpha + \delta} & \text{if } r = 0.\end{cases}$$
Notes and References
The connection of the number $\gamma(A,B)$ with perturbation theory for definite pairs was first noted by Crawford [48, 1976], who used it to derive bounds on the spectral variation of matrix pencils. Stewart [207, 1979] introduced the angles associated with the eigenvalues and used the induced ordering to bound the matching distance. The sharper form of the theorem given here is due to Sun [222, 1982].
Theorem 3.8 is a special case of a theorem of Stewart [207, 1979]. The $\sin\Theta$ theorems are due to Sun [223, 225, 1983]. The second paper also contains $\sin 2\Theta$ theorems. A troublesome feature of the $\sin\Theta$ theorems is the appearance of two infima in the denominator of the bounds. Whether both of them should be there is an open question.
Just as the singular value decomposition is related to an associated semidefinite matrix, the generalized singular value decomposition (Exercise I.5.8) is related to an associated definite pair. Sun [223, 224, 1983] gives perturbation bounds for the generalized singular value decomposition. Paige [173, 1984] gives a different derivation and bounds on the CS decomposition.
Exercises
1. Show that the ill conditioning of one eigenvalue of a definite pair can infect the others by considering the matrices $A = \mathrm{diag}(1, 10^{-8})$, $B = \mathrm{diag}(1, 2\cdot10^{-8})$,
$$\tilde A = \begin{pmatrix}1 & \sqrt2\cdot10^{-8}\\ \sqrt2\cdot10^{-8} & 2\cdot10^{-8}\end{pmatrix},$$
and $\tilde B = B$.
2. Show that the eigenangles $\theta(\alpha,\beta)$ and $\theta(\tilde\alpha,\tilde\beta)$ satisfy
$$\sin|\theta(\tilde\alpha,\tilde\beta) - \theta(\alpha,\beta)| = \chi((\alpha,\beta),(\tilde\alpha,\tilde\beta)).$$
References
[1] N. N. Abdelmalek (1974). "On the Solution of Least Squares Problems and Pseudo-Inverses." Computing 13, 215-228.
[2] S. N. Afriat (1956). "On the Latent Vectors and Characteristic Values of Products of Pairs of Symmetric Idempotents." Quarterly Journal of Mathematics 7, 76-78.
[3] S. N. Afriat (1957). "Orthogonal and Oblique Projectors and the Characteristics of Pairs of Vector Spaces." Proceedings of the Cambridge Philosophical Society 53, 800-816.
[4] A. R. Amir-Moez (1956). "Extremal Properties of Eigenvalues of a Hermitian Transformation and the Singular Values of the Sum and Product of Linear Transformations." Duke Mathematical Journal 23, 463-476.
[5] T. Ando (1989). "Majorization, Doubly Stochastic Matrices, and Comparison of Eigenvalues." Linear Algebra and Its Applications 118, 163-248.
[6] M. Arioli, I. S. Duff, and P. P. M. van Rijk (1989). "On the Augmented System Approach to Sparse Least-Squares Problems." Numerische Mathematik 55, 667-685.
[7] L. Autonne (1902). "Sur les groupes linéaires, réels et orthogonaux." Bulletin de la Société Mathématique de France 30, 121-134.
[8] L. Autonne (1913). "Sur les matrices hypohermitiennes et les unitaires." Comptes Rendus de l'Académie des Sciences, Paris 156, 858-860.
[g] S. Banach (1922)" "Sur les operations dans !fs ensembles abstraits et
lcur application aux equations integrales." Fundementa Mathematicae
3, 133 181.
[10] S" Banach (1929). "Sur les functionnelles lineaires II." Studia Mathe-
1/wfica 1, 223 2:39.
[11] It IL Bartels and G. W" Stewart (1972). "Algorithm 432: The Solution
oft.lH' Matrix Equation AX -IJX = C"" C01/l,1/I,unication8 of the AC1H
8, 820 826.
[12] H. Bateman (1908). "A Formula for the Solving Functionof a Cer-
tain Integral Equation of the Second Kind." Cambridge Phzlosophzcal
Transactions 20, 179- 187.
[13] F. L. Bauer (1963). "Optimally Scaled Matrices." Numerische Math-
ematik 5, 73-87.
[14] F. L. Bauer (1966). "Genauigkeitsfragen bei der Losung linear Gleich-
ungssysteme." Zeitsehrift Fir' angewandte Mathematzk und Mechanzk
46, 409421.
[15] F. L. Bauer and C. T. Fike (1960). "Norms and Exclusion Theorems."
Numerische Mathematik 2, 137-141.
[16] F. L. Bauer and A. S. Householder (1960). "Moments and Character-
istic Roots." Numerische Mathematik 2, 42-43.
[17] H. Baumgartel (1972). Endlichdimensionale Analytische Storungs-
theorie. Akademie-Verlag, Berlin. Cited in [28].
[18] C. Bavely and G. W. Stewart (1979). "An Algorithm for Computing
Reducing Subspaces by Block Diagonalization." SIAM Journal on
Numerical Analysis 16, 359-367.
[19] A. E. Beaton, D. B. Rubin, and J. L. Barone (1976). "The Accept-
ability of Regression Solutions: Another Look at Computational Ac-
curacy." Journal of the American Statistical Association 71, 158-168.
[20] E. F. Beckenbach and R. Bellman (1971). Inequalities. Springer, New
York.
[21] R. Bellman (1970). Introduction to Matrix Analysis. McGraw-Hill,
New York.
[22] D. A. Belsley, A. E. Kuh, and R. E. Welsch (1980). Regression Diag-
nostics: Identifying Influential Data and Sources of Collinearity. John
Wiley and Sons, New York.
[23] E. Beltrami (1873). "Sulle Funzioni Bilineari." Giornale di Matem-
atiche ad Uso degli Studenti delle Universita 11, 98-106.
[24] A. Ben-Israel (1966). "On Error Bounds for Generalized Inverses."
SIAM Journal on Numerical Analysis 3, 585-592.
[25] I. Bendixson (1902). "Sur les racines d'une equation fondamentale."
Acta Mathematica 3, 359-366.
[26] P. G. Bergman, R. Penfield, R. Schiller, and H. Zatkis (1950). "The
Hamiltonian of the General Theory of Relativity with Electromagnetic
Field." Physical Review 52, 1950.
[27] E. Berkson (1963). "Some Metrics on the Subspaces of a Banach
Space." Pacific Journal of Mathematics 13, 7-22.
[28] R. Bhatia (1987). Perturbation Bounds for Matrix Eigenvalues. Pit-
man Research Notes in Mathematics. Longman Scientific & Techni-
cal, Harlow, Essex. Published in the USA by John Wiley.
[29] R. Bhatia and C. Davis (1984). "A Bound for the Spectral Variation
of a Unitary Operator." Linear and Multilinear Algebra 15, 71-76.
[30] R. Bhatia, C. Davis, and P. Koosis (1987). "An Extremal Problem in
Fourier Analysis with Applications to Operator Theory." Preprint
cited in [28].
[31] R. Bhatia, C. Davis, and A. McIntosh (1983). "Perturbation of Spec-
tral Subspaces and Solution of Linear Operator Equations." Linear
Algebra and Its Applications 52-53, 45-67.
[32] R. Bhatia and J. A. R. Holbrook (1985). "Short Normal Paths and
Spectral Variation." Proceedings of the American Mathematical Soci-
ety 94, 377-382.
[33] G. D. Birkhoff (1946). "Tres Observaciones Sobre el Algebra Lineal."
Universidad Nacional de Tucuman Revista, Serie A 5, 147-151.
[34] A. Bjerhammer (1951). "Rectangular Reciprocal Matrices with Special
Reference to Geodetic Calculations." Bulletin Geodesique 52, 118-220.
[35] A. Bjorck (1967). "Solving Linear Least Squares Problems by Gram-
Schmidt Orthogonalization." BIT 7, 1-21.
[36] A. Bjorck (1989). "Componentwise Backward Errors and Condition
Estimates for Linear Least Squares Problems." Technical Report LiTH-
MATH-R-1989-13, Department of Mathematics, Linkoping University.
[37] A. Bjorck and G. H. Golub (1973). "Numerical Methods for Comput-
ing Angles between Linear Subspaces." Mathematics of Computation
27, 579-594.
[38] Ake Bjorck (1987). "Least Squares Methods." Working paper, Depart-
ment of Mathematics, Linkoping University. To appear in Handbook of
Numerical Analysis, Vol. I: Solution of Equations in R^n, P. G. Ciarlet
and J. L. Lions editors, Elsevier, North Holland.
[39] C. W. Borchardt (1857). "Bemerkung uber die beiden vorstehenden
Aufsatze." Journal fur die reine und angewandte Mathematik 53, 281-
283.
[40] J. R. Bunch, J. W. Demmel, and C. F. Van Loan (1989). "The Strong
Stability of Algorithms for Solving Symmetric Linear Systems." SIAM
Journal on Matrix Analysis and Applications 10, 494-499.
[41] A. L. Cauchy (1821). "Cours d'analyse de l'Ecole Royale Polytech-
nique." In Oeuvres Completes (IIe Serie), volume 3.
[42] A. L. Cauchy (1829). "Sur l'equation a l'aide de laquelle on determine
les inegalites seculaires des mouvements des planetes." In Oeuvres
Completes (IIe Serie), volume 9.
[43] F. Chatelin (1983). Spectral Approximation of Linear Operators. Aca-
demic Press, New York.
[44] P. L. Chebyshev (1859). "Sur l'interpolation par la methode des moin-
dres carres." Memoires de l'Academie Imperiale des Sciences de St.-
Petersbourg, VIIe serie 15, 1-24.
[45] A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson (1979).
"An Estimate for the Condition Number of a Matrix." SIAM Journal
on Numerical Analysis 16, 368-375.
[46] R. Courant (1920). "Ueber die Eigenwerte bei den Differentialgleichun-
gen der mathematischen Physik." Mathematische Zeitschrift 7, 1-57.
[47] P. J. Courtois and P. Semal (1984). "Error Bounds for the Analysis by
Decomposition of Non-Negative Matrices." In G. Iazeolla, P. J. Cour-
tois, and A. Hordijk, editors, Mathematical Computer Performance
and Reliability, pages 287-302. Elsevier, North Holland.
[48] C. R. Crawford (1976). "A Stable Generalized Eigenvalue Problem."
SIAM Journal on Numerical Analysis 13, 854-860.
[49] R. B. Davies and B. Hutton (1975). "The Effects of Errors in the
Independent Variables in Linear Regression." Biometrika 62, 383-
391.
[50] C. Davis (1963). "The Rotation of Eigenvectors by a Perturbation."
Journal of Mathematical Analysis and Applications 6, 159-173.
[51] C. Davis (1965). "The Rotation of Eigenvectors by a Perturbation.
II." Journal of Mathematical Analysis and Applications 11, 20-27.
[52] C. Davis, W. Kahan, and H. Weinberger (1982). "Norm-Preserving
Dilations and Their Applications to Optimal Error Bounds." SIAM
Journal on Numerical Analysis 19, 445-469.
[53] C. Davis and W. M. Kahan (1970). "The Rotation of Eigenvectors by
a Perturbation. III." SIAM Journal on Numerical Analysis 7, 1-46.
[54] H. P. Decell (1972). "On the Derivative of the Generalized Inverse of
a Matrix." Linear and Multilinear Algebra 1, 357-359.
[55] J. W. Demmel (1987). "On Condition Numbers and the Distance to
the Nearest Ill-Posed Problem." Numerische Mathematik 51, 251-290.
[56] J. E. Dennis and J. J. More (1977). "Quasi-Newton Methods, Moti-
vations and Theory." SIAM Review 19, 46-89.
[57] J. E. Dennis and R. B. Schnabel (1979). "Least Change Secant Updates
for Quasi-Newton Methods." SIAM Review 21, 443-459.
[58] J. Desplanques (1887). "Theoreme d'algebre." J. de Math. Spec. 9,
12-13. Cited in [152].
[59] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart (1979).
LINPACK User's Guide. SIAM, Philadelphia.
[60] P. Van Dooren (1979). "The Computation of Kronecker's Canonical
Form." Linear Algebra and Its Applications 1979, 103-140.
[61] M. P. Drazin (1958). "Pseudo-Inverses in Associative Rings and Semi-
groups." American Mathematical Monthly 65, 506-514.
[62] L. Dulmage and I. Halperin (1955). "On a Theorem of Frobenius-
Konig and J. von Neumann's Game of Hide and Seek." Transactions of
the Royal Society of Canada, Section 3, Third Series 49, 23-25.
[63] C. Eckart and G. Young (1936). "The Approximation of One Matrix
by Another of Lower Rank." Psychometrika 1, 211-218.
[64] L. Elden (1983). "A Weighted Pseudoinverse, Generalized Singular
Values, and Constrained Least Squares Problems." BIT 22, 487-502.
[65] L. Elsner (1982). "On the Variation of the Spectra of Matrices." Linear
Algebra and Its Applications 47, 127-138.
[66] L. Elsner (1985). "An Optimal Bound for the Spectral Variation of
Two Matrices." Linear Algebra and Its Applications 71, 77-80.
[67] L. Elsner and J.-G. Sun (1982). "Perturbation Theorems for the Gen-
eralized Eigenvalue Problem." Linear Algebra and Its Applications 48,
341-357.
[68] V. N. Faddeeva (1959). Computational Methods of Linear Algebra.
Dover, New York. Translated from the Russian by C. D. Benster.
[69] S. Falk (1965). "Einschliessungssatze fur die Eigenvektoren nor-
maler Matrizenpaare." Zeitschrift fur Angewandte Mathematik und
Mechanik 45, 47-56.
[70] K. Fan (1951). "Maximum Properties and Inequalities for the Eigenval-
ues of Completely Continuous Operators." Proceedings of the National
Academy of Sciences 37, 760-766.
[71] K. Fan and A. J. Hoffman (1955). "Some Metric Inequalities in the
Space of Matrices." Proceedings of the American Mathematical Society
6, 111-116.
[72] D. B. Feingold and R. S. Varga (1962). "Block Diagonally Dominant
Matrices and Generalizations of the Gerschgorin Circle Theorem."
Pacific Journal of Mathematics 12, 1241-1250.
[73] W. Feller and G. E. Forsythe (1951). "New Matrix Transformations for
Obtaining Characteristic Vectors." Quarterly of Applied Mathematics
8, 325-331.
[74] E. Fischer (1905). "Uber quadratische Formen mit reellen Koeffizien-
ten." Monatshefte fur Mathematik und Physik 16, 234-249.
[75] J. G. F. Francis (1961, 1962). "The QR Transformation, Parts I and
II." Computer Journal 4, 265-271, 332-345.
[76] F. G. Frobenius (1911). "Uber den von L. Bieberbach gefundenen
Beweis eines Satzes von C. Jordan." Sitzungsberichte der Koniglich
Preussischen Akademie der Wissenschaften zu Berlin, 492-501. In [79,
v. 3, pp. 492-501].
[77] F. G. Frobenius (1911). "Uber die unzerlegbaren diskreten Bewegungs-
gruppen." Sitzungsberichte der Koniglich Preussischen Akademie
der Wissenschaften zu Berlin, 507-518. In [79, v. 3, pp. 507-518].
[78] F. G. Frobenius (1912). "Uber Matrizen aus nicht negativen Ele-
menten." Sitzungsberichte der Koniglich Preussischen Akademie der
Wissenschaften zu Berlin, 456-477. In [79, v. 3, pp. 546-567].
[79] F. G. Frobenius (1968). Ferdinand Georg Frobenius. Gesammelte Ab-
handlungen (J.-P. Serre editor). Springer Verlag, Berlin.
[80] W. A. Fuller (1987). Measurement Error Models. John Wiley, New
York.
[81] F. R. Gantmacher (1959). The Theory of Matrices, Vols. I, II. Chelsea
Publishing Company, New York.
[82] C. F. Gauss (1809). Theoria Motus Corporum Coelestium in Section-
ibus Conicis Solem Ambientium. Perthes and Besser, Hamburg.
[83] C. F. Gauss (1809). Theory of the Motion of the Heavenly Bodies
Moving about the Sun in Conic Sections. Dover, New York (1963). C.
H. Davis, Trans.
[84] C. F. Gauss (1821). "Theoria Combinationis Observationum Erroribus
Minimis Obnoxiae, Pars Prior." In Werke, IV, pages 1-26. Koniglichen
Gesellschaft der Wissenschaften zu Gottingen (1880).
[85] S. A. Gerschgorin (1931). "Uber die Abgrenzung der Eigenwerte einer
Matrix." Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat. 6, 749-754.
[86] I. Gohberg, P. Lancaster, and L. Rodman (1982). Matrix Polynomials.
Academic Press, New York.
[87] I. Gohberg, P. Lancaster, and L. Rodman (1986). Invariant Subspaces
of Matrices with Applications. John Wiley, New York.
[88] G. H. Golub (1965). "Numerical Methods for Solving Least Squares
Problems." Numerische Mathematik 7, 206-216.
[89] G. H. Golub, S. Nash, and C. Van Loan (1979). "Hessenberg-Schur
Method for the Problem AX + XB = C." IEEE Transactions on
Automatic Control AC-24, 909-913.
[90] G. H. Golub and V. Pereyra (1973). "The Differentiation of Pseu-
doinverses and Nonlinear Least Squares Problems Whose Variables
Separate." SIAM Journal on Numerical Analysis 10, 413-432.
[91] G. H. Golub and V. Pereyra (1976). "Differentiation of Pseudoinverses,
Separable Nonlinear Least Squares Problems and Other Tales." In
M. Z. Nashed, editor, Generalized Inverses and Applications, pages
303-324. Academic Press, New York.
[92] G. H. Golub and C. F. Van Loan (1980). "An Analysis of the Total
Least Squares Problem." SIAM Journal on Numerical Analysis 17,
883-893.
[93] G. H. Golub and C. F. Van Loan (1983). Matrix Computations. Johns
Hopkins University Press, Baltimore, Maryland.
[94] G. H. Golub and J. H. Wilkinson (1966). "Note on the Iterative Refine-
ment of Least Squares Solution." Numerische Mathematik 9, 139-148.
[95] G. H. Golub and J. H. Wilkinson (1976). "Ill-Conditioned Eigensys-
tems and the Computation of the Jordan Canonical Form." SIAM
Review 18, 578-619.
[96] W. B. Gragg and G. W. Stewart (1976). "A Stable Variant of the
Secant Method for Solving Nonlinear Equations." SIAM Journal on
Numerical Analysis 14, 880-903.
[97] J. P. Gram (1883). "Uber die Entwickelung reeller Functionen in Rei-
hen mittelst der Methode der kleinsten Quadrate." Journal fur die
reine und angewandte Mathematik 94, 41-73.
[98] W. H. Greub (1967). Linear Algebra. Springer-Verlag, New York.
[99] H. Hahn (1927). "Uber lineare Gleichungssysteme in linearen
Raumen." Journal fur die reine und angewandte Mathematik 157,
214-229.
[100] P. Hall (1935). "On Representation of Subsets." Journal of the London
Mathematical Society 10, 26-30.
[101] P. R. Halmos (1950). "Normal Dilations and Extensions of Operators."
Summa Brasiliensis Math. 2, 125-134. Cited in [237].
[102] R. J. Hanson and C. L. Lawson (1969). "Extensions and Applica-
tions of the Householder Algorithm for Solving Linear Least Squares
Problems." Mathematics of Computation 23, 787-812.
[103] G. H. Hardy, J. E. Littlewood, and G. Polya (1934). Inequalities.
Cambridge University Press, Cambridge, England.
[104] F. Hausdorff (1914). Grundzuge der Mengenlehre. Chelsea, New York.
Reprinted by Chelsea, 1949.
[105] F. Hausdorff (1919). "Der Wertvorrat einer Bilinearform." Mathema-
tische Zeitschrift 3, 314-316.
[106] M. Haviv and L. van der Heyden (1984). "Perturbation Bounds for
the Stationary Probabilities of a Finite Markov Chain." Advances in
Applied Probability 16, 804-818.
[107] J. Z. Hearon and J. W. Evans (1968). "Differentiable Generalized
Inverses." Journal of Research of the National Bureau of Standards,
Series B 72, 109-113.
[108] H. V. Henderson and S. R. Searle (1981). "On Deriving the Inverse of
a Sum of Matrices." SIAM Review 23, 53-60.
[109] P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation
and Fields of Values of Nonnormal Matrices." Numerische Mathematik
4, 24-39.
[110] C. Hermite (1857). "Extrait d'une lettre de M. C. Hermite a M. Bor-
chardt sur l'invariabilite des carres positifs et des carres negatifs dans
la transformation des polynomes homogenes du second degre." Journal
fur die reine und angewandte Mathematik 53, 271-274.
[111] N. J. Higham (1987). "A Survey of Condition Number Estimation for
Triangular Matrices." SIAM Review 29, 575-596.
[112] N. J. Higham (1989). "Computing Error Bounds for Regression Prob-
lems." Manuscript to appear in the Proceedings of the AMS Confer-
ence on Measurement Error Models, Humboldt, CA.
[113] N. J. Higham (1989). "How Accurate is Gaussian Elimination?" Tech-
nical Report TR 89-1024, Department of Computer Science, Cornell
University. To appear in the proceedings of the 13th Dundee Biennial
Conference on Numerical Analysis.
[114] N. J. Higham and G. W. Stewart (1987). "Numerical Linear Algebra
in Statistical Computing." In A. Iserles and M. J. D. Powell, editors,
The State of the Art in Numerical Analysis, pages 41-57. Clarendon
Press, Oxford.
[115] A. Hirsch (1902). "Sur les racines d'une equation fondamentale (Ex-
trait d'une lettre de M. A. Hirsch a M. I. Bendixson)." Acta Mathe-
matica 25, 367-370.
[116] S. D. Hodges and P. G. Moore (1972). "Data Uncertainties and Least
Squares Regression." Applied Statistics 21, 185-195.
[117] A. J. Hoffman and H. W. Wielandt (1953). "The Variation of the
Spectrum of a Normal Matrix." Duke Mathematical Journal 20, 37-
39.
[118] O. Holder (1899). "Uber einen Mittelwertsatz." Gotting. Nachr., pages
38-47. Cited in [20].
[119] H. Hotelling (1933). "Analysis of a Complex of Statistical Variables
into Principal Components." Journal of Educational Psychology 24,
417-441 and 498-520.
[120] A. S. Householder (1958). "Unitary Triangularization of a Nonsym-
metric Matrix." Journal of the Association for Computing Machinery
5, 339-342.
[121] A. S. Householder (1964). The Theory of Matrices in Numerical Anal-
ysis. Dover Publishing, New York. Originally published by Ginn Blais-
dell.
[122] Vasile I. Istratescu (1981). Introduction to Linear Operator Theory.
Marcel Dekker, New York.
[123] C. G. J. Jacobi (1857, posthumous). "Uber eine elementare Trans-
formation eines in Bezug jedes von zwei Variablen-Systemen linearen
und homogenen Ausdrucks." Journal fur die reine und angewandte
Mathematik 53, 265-270.
[124] C. Jordan (1870). Traite des Substitutions et des Equations Alge-
briques. Paris. Cited in [150].
[125] C. Jordan (1874). "Memoire sur les formes bilineaires." Journal de
Mathematiques Pures et Appliquees, Deuxieme Serie 19, 35-54.
[126] C. Jordan (1875). "Essai sur la geometrie a n dimensions." Bulletin
de la Societe Mathematique 3, 103-174.
[127] P. Jordan and J. von Neumann (1935). "On Inner Products in Linear
Metric Spaces." Annals of Mathematics 36, 719-723.
[128] B. Kagstrom and A. Ruhe (1980). "An Algorithm for Numerical Com-
putation of the Jordan Normal Form of a Complex Matrix." Transac-
tions on Mathematical Software 6, 398-419.
[129] W. Kahan (1966). "Numerical Linear Algebra." Canadian Mathemat-
ical Bulletin 9, 757-801.
[130] W. Kahan (1967). "Inclusion Theorems for Clusters of Eigenvalues
of Hermitian Matrices." Technical report, Computer Science Depart-
ment, University of Toronto.
[131] W. Kahan (1972). "Conserving Confluence Curbs Ill-Conditioning."
Technical Report 6, Computer Science Department, University of Cal-
ifornia, Berkeley.
[132] W. Kahan (1973). "Every $n \times n$ Matrix $Z$ with Real Spectra Satisfies
$\|Z - Z^*\| \le \|Z + Z^*\|(\log n + 0.038)$." Proceedings of the American
Mathematical Society 39, 235-241.
[133] W. Kahan (1975). "Spectra of Nearly Hermitian Matrices." Proceed-
ings of the American Mathematical Society 48, 11-17.
[134] W. Kahan, B. N. Parlett, and E. Jiang (1982). "Residual Bounds on
Approximate Eigensystems of Nonnormal Matrices." SIAM Journal
on Numerical Analysis 19, 470-484.
[135] T. Kato (1966). Perturbation Theory for Linear Operators. Springer
Verlag, New York.
[136] D. Konig (1916). "Uber Graphen und ihre Anwendung auf Determi-
nantentheorie und Mengenlehre." Mathematische Annalen 77, 453-
465.
[137] M. G. Krein and M. A. Krasnoselski (1947). "Fundamental Theo-
rems Concerning the Extension of Hermitian Operators and Some of
Their Applications to the Theory of Orthogonal Polynomials and the
Moment Problem." Uspekhi Mat. Nauk. 2. In Russian. Cited in [27].
[138] M. G. Krein, M. A. Krasnoselski, and D. P. Milman (1948). "Con-
cerning the Deficiency Numbers of Linear Operators in Banach Space
and Some Geometric Questions." Sbornik Trudov Inst. A. N. Ukr. S.
S. R. 11. In Russian. Cited in [27].
[139] L. Kronecker (1890). "Algebraische Reduction der Schaaren bilinearer
Formen." Sitzungsberichte der Koniglich Preussischen Akademie der
Wissenschaften zu Berlin, pages 1225-1237.
[140] Peter Lancaster and Miron Tismenetski (1985). The Theory of Matri-
ces. Academic Press, New York.
[141] P. S. Laplace (1820). Theorie analytique des probabilites (3rd ed.),
premier supplement: Sur l'application du calcul des probabilites a la
philosophie naturelle. Oeuvres, v. 7. Gauthier-Villars. Supplement pub-
lished before 1820.
[142] C. L. Lawson and R. J. Hanson (1974). Solving Least Squares Problems.
Prentice Hall, Englewood Cliffs, New Jersey.
[143] A. M. Legendre (1805). Nouvelles methodes pour la determination des
orbites des cometes. Courcier, Paris. Cited in [219].
[144] N. J. Lehmann (1963). "Optimale Eigenwerteinschliessungen." Nu-
merische Mathematik 5, 246-272.
[145] N. J. Lehmann (1966). "Zur Verwendung optimaler Eigenwerteingren-
zungen bei der Losung symmetrischer Matrizenaufgaben." Numerische
Mathematik 8, 42-55.
[146] L. Levy (1881). "Sur la possibilite de l'equilibre electrique." Comptes
Rendus de l'Academie des Sciences, Paris 93, 706-708.
[147] V. B. Lidskii (1950). "The Proper Values of the Sum and Product
of Symmetric Matrices." Doklady Akademii Nauk SSSR 75, 769-772.
In Russian. Translation by C. Benster available from the National
Translation Center of the Library of Congress.
[148] A. Loewy (1898). "Sur les formes quadratiques definies a
indeterminees conjuguees de M. Hermite." C. R. Acad. Sci. Paris
123, 168-171. Cited in [150, p. 79].
[149] Qi-keng Lu (1963). "The Elliptic Geometry of Extended Space." Chi-
nese Mathematics 4, 54-69. Translation of an article appearing in Acta
Mathematica Sinica, 13 (1963).
[150] C. C. Mac Duffee (1946). The Theory of Matrices. Chelsea, New York.
[151] M. Marcus (1960). Basic Theorems in Matrix Theory. Applied Math-
ematics Series #57. National Bureau of Standards, Washington, D.C.
[152] M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix
Inequalities. Allyn and Bacon, Boston.
[153] J. L. Massera and J. J. Schaffer (1958). "Linear differential equations
and functional analysis I." Annals of Math. 67, 517-573. Cited in [27].
[154] H. I. Medley and R. S. Varga (1968). "On Smallest Isolated Ger-
schgorin Disks for Eigenvalues. III." Numerische Mathematik 11, 361-
369.
[155] C. Meyer and G" W. Stewart (1988). "Derivatives and Perturbations
of Eigenvectors." SIAM Joumal on Numerical Analysis 25, 679-691.
[156] H. Minkowski (1896). Geometrie der Zahlen. I. B. G. Teubner, Leipzig.
Cited in [20].
[157] H. Minkowski (1911, posthumous). "Theorie der konvexen Korper,
insbesondere Begrundung ihres Oberflachenbegriffs." In David Hilbert,
editor, Minkowski Abhandlung. Teubner Verlag.
[158] L. Mirsky (1960). "Symmetric Gauge Functions and Unitarily Invariant
Norms." Quarterly Journal of Mathematics 11, 50-59.
[159] L. Mirsky (1963). "Results and Problems in the Theory of Doubly
Stochastic Matrices." Zeitschrift fur Wahrscheinlichkeitstheorie und
verwandte Gebiete 1, 319-334.
[160] D. S. Mitrinovic (1970). Analytic Inequalities. Springer, New York.
[161] C. Moler and G. W. Stewart (1973). "An Algorithm for Generalized
Matrix Eigenvalue Problems." SIAM Journal on Numerical Analysis
10, 241-256.
[162] E. H. Moore (1920). "On the Reciprocal of the General Algebraic
Matrix." Bulletin of the American Mathematical Society 26, 394-395.
Abstract.
[163] M. Z. Nashed and L. B. Rall (1976). "Annotated Bibliography on
Generalized Inverses and Applications." In M. Z. Nashed, editor, Gen-
eralized Inverses and Applications, pages 771-1041. Academic Press,
New York.
[164] W. Oettli and W. Prager (1964). "Compatibility of Approximate So-
lution of Linear Equations with Given Error Bounds for Coefficients
and Right-Hand Sides." Numerische Mathematik 6, 405-409.
[165] D. P. O'Leary (1989). "On Bounds for Scaled Projections and Pseudo-
Inverses." To appear in Linear Algebra and Its Applications.
[166] J. M. Ortega and W. C. Rheinboldt (1970). Iterative Solution of Non-
linear Equations in Several Variables. Academic Press, New York.
[167] A. Ostrowski (1951). "Ueber das Nichtverschwinden einer Klasse von
Determinanten und die Lokalisierung der charakteristischen Wurzeln
von Matrizen." Compositio Mathematica 9, 209-226.
[168] A. Ostrowski (1952). "Sur quelques applications des fonctions convexes
et concaves au sens de I. Schur." Journal de Mathematiques Pures et
Appliquees 11 7, 253-292.
[169] A. Ostrowski (1957). "Uber die Stetigkeit von charakteristischen
Wurzeln in Abhangigkeit von den Matrizenelementen." Jahresbericht
der Deutschen Mathematiker-Vereinigung 60, 40-42.
[170] D. V. Ouellette (1981). "Schur Complement and Statistics." Linear
Algebra and Its Applications 36, 187-295.
[171] C. C. Paige (1979). "Computer Solution and Perturbation Analysis of
Generalized Linear Least Squares Problems." Mathematics of Com-
putation 33, 171-184.
[172] C. C. Paige (1979). "Fast Numerically Stable Computations for Gen-
eralized Least Squares Problems." SIAM J. on Numerical Analysis
16, 165-171.
[173] C. C. Paige (1984). "A Note on a Result of Sun Ji-guang: Sensitivity
of the CS and GSV Decomposition." SIAM Journal on Numerical
Analysis 21, 186-191.
[174] C. C. Paige and M. A. Saunders (1981). "Toward a Generalized Singu-
lar Value Decomposition." SIAM Journal on Numerical Analysis 18,
398-405.
[175] B. N. Parlett (1980). The Symmetric Eigenvalue Problem. Prentice-
Hall, Englewood Cliffs, New Jersey.
[176] M. Pavel-Parvu and A. Korganoff (1969). "Iteration Functions for
Solving Polynomial Equations." In B. Dejon and P. Henrici, editors,
Constructive Aspects of the Fundamental Theorem of Algebra. John
Wiley, New York.
[177] G. Peano (1888). "Integration par series des equations differentielles
lineaires." Mathematische Annalen 32, 450-456.
[178] R. Penrose (1955). "A Generalized Inverse for Matrices." Proceedings
of the Cambridge Philosophical Society 51, 406-413.
[179] R. Penrose (1956). "On Best Approximate Solutions of Linear Matrix
Equations." Proceedings of the Cambridge Philosophical Society 52,
17-19.
[180] V. Pereyra (1969). "Stability of General Systems of Linear Equations."
Aequationes Mathematicae 2, 194-206.
[181] E. Picard (1910). "Sur un theoreme general relatif aux equations
integrales de premiere espece et sur quelques problemes de physique
mathematique." Rendiconti del Circolo Matematico di Palermo 25,
79-97.
[182] L. Qi (1984). "Some Simple Estimates for Singular Values of a Matrix."
Linear Algebra and Its Applications 56, 105-119.
[183] Lord Rayleigh (J. W. Strutt) (1899). "On the Calculation of the Fre-
quency of Vibration of a System in its Gravest Mode, with an Exam-
ple from Hydrodynamics." The Philosophical Magazine 47, 556-572.
Cited in [175].
[184] F. Riesz and B. Sz.-Nagy (1955). Functional Analysis. Ungar, New
York. L. F. Boron, Translator.
[185] J. L. Rigal and J. Gaches (1967). "On the Compatibility of a Given
Solution with the Data of a Linear System." Journal of the Association
for Computing Machinery 14, 543-548.
[186] W. Ritz (1909). "Uber eine neue Methode zur Losung gewisser Varia-
tionsprobleme der mathematischen Physik." Journal fur die reine und
angewandte Mathematik 135, 1-61.
[187] H. Rohrbach (1931). "Bemerkungen zu einem Determinantensatz von
Minkowski." Jahresbericht der Deutschen Mathematiker-Vereinigung
40, 49-53.
[188] M. Rosenblum (1956). "On the Operator Equation BX - XA = Q."
Duke Mathematical Journal 23, 263-269.
[189] A. Ruhe (1970). "An Algorithm for Numerical Determination of the
Structure of a General Matrix." BIT 10, 196-216.
[190] A. Ruhe (1970). "Perturbation Bounds for Means of Eigenvalues and
Invariant Subspaces." BIT 10, 343-354.
[191] H. Rutishauser (1955). "Une methode pour la determination des
valeurs propres d'une matrice." Comptes Rendus de l'Academie des
Sciences, Paris 240, 34-36.
[192] E. Schmidt (1907). "Zur Theorie der linearen und nichtlinearen In-
tegralgleichungen. I. Teil: Entwicklung willkurlicher Funktionen nach
Systemen vorgeschriebener." Mathematische Annalen 63, 433-476.
[193] I. Schur (1909). "Uber die charakteristischen Wurzeln einer linearen
Substitution mit einer Anwendung auf die Theorie der Integralgleich-
ungen." Mathematische Annalen 66, 448-510.
[194] P. J. Schweitzer (1968). "Perturbation Theory and Finite Markov
Chains." Journal of Applied Probability 5, 401-413.
[195] C. J. Scriba (1973). "Carl Gustav Jacob Jacobi." In C. C. Gillispie, ed-
itor, Dictionary of Scientific Biography, VII. Charles Scribner's Sons.
[196] G. A. F. Seber (1977). Linear Regression Analysis. John Wiley, New
York.
[197] R. D. Skeel (1979). "Scaling for Numerical Stability in Gaussian Elim-
ination." Journal of the Association for Computing Machinery 26,
494-526.
[198] F. Smithies (1937). "The Eigen-values and Singular Values of Integral
Equations." Proceedings of the London Mathematical Society 43, 255-
279.
[199] G. W. Stewart (1969). "On the Continuity of the Generalized Inverse."
SIAM Journal on Applied Mathematics 17, 33-45.
[200] G. W. Stewart (1971). "Error Bounds for Approximate Invariant Sub-
spaces of Closed Linear Operators." SIAM Journal on Numerical Anal-
ysis 8, 796-808.
[201] G. W. Stewart (1972). "On the Sensitivity of the Eigenvalue Problem
$Ax = \lambda Bx$." SIAM Journal on Numerical Analysis 4, 669-686.
[202] G" W" Stewart (1973)" "Error and Perturbation Bounds for Subs paces
Associated with Certain Eigenvalue Problems." SIAM Review 15,
727 764.
G. W. Stewart (1974). Introduction to Matrix Computations. Aca-
demic Press, New York
G. W. Stewart (1975). "Gen;chgorin Theory for the Generalized Eigen-
value Problem AJ: = >.13:1:." Mathematics of Computa.tion 29,600 606.
G. W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Pro-
jections, and Linear Least Squares Problems." SIAM Review 19, 634-
662.
[216] G. W. Stewart (1989). "On Scaled Projections and Pseudo-Inverses."
Linear Algebm and Its Applications 112, 189 194.
[217] G. W. Stewart (1989). "Perturbation Theory and Least Squares with
Errors in the Variables." Technical Report UMIACS- TR-89-97, CS-
TR 2326, Department of Computer Science, Unive;sity of Maryland.
To appear in the Proceedings of the AMS Conference on Measurement
Error Models, Humboldt, California.
[218] G. W. Stewart (1989). "Two Simple Residual Bounds for the Eigenval-
ues of Hermitian Matrices." Technical Report CS- TR 2364, Depart-
ment of Computer Science, University of Maryland.
[219] S. M" Stigler (1986). The History of Statistics. Harvard University
Press, Cambridge, Massachusetts.
[220] J.-G. Sun (1979). "A Theorem on the Perturbation of Generalized
Eigenvalues." Report on the Conference Numerical Mathematics,
Guangzhou, China.
[221] J.-G. Sun (1980). "Invariant Subspaces and Generalized Invariant Sub-
spaces (II)"" Math. Numer. Sinica 2, 113123. Cited in [67].
[222] J.-G. Sun (1982). "A Note on Stewart's Theorem for Definite Matrix
Pairs." Linear Algebm and Its Applciatio1/.S 48, 3:31 339.
[22:3] J.-G. Sun (1983). "Perturbation Analysis for the Generalized Eigen-
value Problem and the Generalized Singular Value Problem." In
13. Kagstrom and A. Ruhe, editors, Matrix Pencils, pages 221- 244.
Springer Verlag, New York.
[224] J.-G. Sun (1983). "Perturbation Analysis for the Generalized Singular
Value Decomposition." SIAM Journal on Nume7"ical Analysis 20,611--
625.
[225] J.-G. Sun (1983). "Perturbation Bounds for Eigenspaces of a Definite
Matrix Pair." Numf7'ische Mathematik 41, 321 343.
[226] J.-G. Sun (1984). "Estimation of the Separation of Two Matrices."
Journal of Computational Mathematics 2, 189 200.
[227] J.-G. Sun (1984). "On the Perturbation of the Eigenvalues of a Normal
Matrix." Math. Numer. Sinca 6, 334-336.
[228] J .-G. Sun (1985). "Gerschgorin Type Theorem and the Perturbation of
the Eigenvalues of a Singular Pencil." Math. Nmner. Sinica 7,253"" 264.
In Chinese. Translation in Chinese Journal of Numerical Mathematics
and Applications 10 (1988) 113"
[20:3]
[204]
[205]
[20G]
G. W. Stewart (1977). "Research Development and UNPACK." In
.J. R. Rice, editor, Mathematical Software III, pages 1 14. Academic
Press, New York.
G. W. Stewart (1979). "The Effects of Rounding Error on an Al-
gorithm for Downdating a Cholesky Factorization." Journal of the
Institute for Mathematics and Applications 23, 203-213.
G. W. Stewart (1979). "A Note on the Perturbation of Singular Val-
ues"" Linear Algeb7'a and Its Applications 28, 213 216.
G. W. Stewart (l!)82)" "Colllpntinl!: the CS Decomposition of it Parti-
tioned Orthogonal Matrix." Numerische Mathematik 40, 297 306.
G. W. Stewart (1984). "On the Invariance of Perturbed Null Vectors
under Column Scaling." N117nrrische Mathematik 44, 6165.
G. W. Stewart (1984). "Rank Degeneracy." SIAM Journal on Scien-
tific and Statistical Computing 5, 403413.
G. W. Stewart (1984). "A Second Order Perturbation Expansion for
Small Singular Values." Linear Algebm and Its Applications 56, 231
235.
G. W. Stewart (1987). "Collincarity and Least Squares Regression."
Statistical Scirnce 2, 68 -100.
G. W. Stewart (1987). "Invariant Subspaces and Capital Punishment."
Technical Report TR-1923, Department of Computer Science, Univer-
sity of Maryland.
G. W. Stewart (1988). "Stochastic Perturbation Theory." Technical
Report CS-TR-2129, Department of Computer Science, University of
Maryland. To appear in SIAM Review.
[207]
[208]
[209]
[210]
[211]
[212]
[213]
[214]
[215]
REFERENCES
[229] J.-G. Sun (1987). Matrix Perturbation Analysis. Academic Press,
Beijing. In Chinese.
[230] J.-G. Sun (1988). "A Note on Simple Non-Zero Singular Values."
Journal of Computational Mathematics 6, 258-266.
[231] J.-G. Sun (1989). "A Note on Local Behavior of Multiple Eigenvalues."
SIAM Journal on Matrix Analysis and Applications 10, 533-541.
[232] C. A. Swanson (1961). "An Inequality for Linear Transformations
with Eigenvalues." Bulletin of the American Mathematical Society 67,
607-608.
[233] J. J. Sylvester (1852). "A Demonstration of the Theorem that Every
Homogeneous Quadratic Polynomial is Reducible by Real Orthogonal
Substitutions to the Form of a Sum of Positive and Negative Squares."
Philosophical Magazine 2, 138-142.
[234] J. J. Sylvester (1853). "On a Theory of the Syzygetic Relations &c."
Phil. Trans., pages 481-84. Cited in [39].
[235] J. J. Sylvester (1889). "Sur la réduction biorthogonale d'une forme
linéo-linéaire à sa forme canonique." Comptes Rendus de l'Académie
des Sciences, Paris 108, 651-653.
[236] J. J. Sylvester (1890). "On the Reduction of a Bilinear Quantic of the
nth Order to the Form of a Sum of n Products by a Double Orthogonal
Substitution." Messenger of Mathematics 19, 42-46.
[237] Béla Sz.-Nagy (1960). Extensions of Linear Transformations in Hilbert
Space which Extend beyond This Space. Ungar, New York. An
appendix to Riesz and Sz.-Nagy [184] issued as a separate pamphlet.
[238] O. Taussky (1948). "Bounds for Characteristic Roots of Matrices."
Duke Mathematical Journal 15, 1043-1044.
[239] O. Taussky (1949). "A Recurring Theorem on Determinants."
American Mathematical Monthly 46, 672-675.
[240] J. Todd (1950). "The Condition of a Certain Matrix." Proceedings of
the Cambridge Philosophical Society 46, 116-118.
[241] O. Toeplitz (1918). "Das algebraische Analogon zu einem Satze von
Fejér." Mathematische Zeitschrift 2, 187-197.
[242] A. M. Turing (1948). "Rounding-off Errors in Matrix Processes." The
Quarterly Journal of Mechanics and Applied Mathematics 1, 287-308.
[243] F. Uhlig (1979). "A Recurring Theorem about Pairs of Quadratic
Forms and Extensions: A Survey." Linear Algebra and Its Applications
25, 219-237.
[244] A. van der Sluis (1969). "Condition Numbers and Equilibration of
Matrices." Numerische Mathematik 14, 14-23.
[245] A. van der Sluis (1975). "Stability of the Solutions of Linear Least
Squares Problems." Numerische Mathematik 23, 241-254.
[246] B. L. van der Waerden (1927). "Ein Satz über Klasseneinteilungen von
endlichen Mengen." Abhandlungen aus dem Mathematischen Seminar
der Hamburgischen Universität 5, 185-188.
[247] C. F. Van Loan (1975). "A General Matrix Eigenvalue Algorithm."
SIAM Journal on Numerical Analysis 12, 819-834.
[248] C. F. Van Loan (1976). "Generalizing the Singular Value
Decomposition." SIAM Journal on Numerical Analysis 13, 76-83.
[249] C. F. Van Loan (1985). "On the Method of Weighting for Equality
Constrained Least Squares." SIAM Journal on Numerical Analysis
22, 851-864.
[250] J. M. Varah (1967). "The Computation of Bounds for the Invariant
Subspaces of a General Matrix Operator." Technical Report CS 66,
Computer Science Department, Stanford University.
[251] J. M. Varah (1970). "Computing Invariant Subspaces of a General
Matrix When the Eigensystem Is Poorly Determined." Mathematics
of Computation 24, 137-149.
[252] R. S. Varga (1962). Matrix Iterative Analysis. Prentice-Hall,
Englewood Cliffs, New Jersey.
[253] R. S. Varga (1964). "On Smallest Isolated Gerschgorin Disks for
Eigenvalues." Numerische Mathematik 6, 366-376.
[254] J. von Neumann (1937). "Some Matrix-Inequalities and Metrization of
Matrix-Space." Tomsk. Univ. Rev. 1, 286-300. In [255, v.4, pp.205-219].
[255] J. von Neumann (1962). Collected Works (A. H. Taub, editor).
Pergamon, New York.
[256] J. von Neumann and H. H. Goldstine (1947). "Numerical Inverting
of Matrices of High Order." Bulletin of the American Mathematical
Society 53, 1021-1099.
[257] J. H. M. Wedderburn (1934). Lectures on Matrices. American
Mathematical Society Colloquium Publications, V. XVII. American
Mathematical Society, New York.
[258] P.-Å. Wedin (1969). "On Pseudoinverses of Perturbed Matrices."
Technical report, Department of Computer Science, Lund University.
[259] P.-Å. Wedin (1972). "Perturbation Bounds in Connection with
Singular Value Decomposition." BIT 12, 99-111.
[260] P.-Å. Wedin (1973). "Perturbation Theory for Pseudo-Inverses." BIT
13, 217-232.
[261] P.-Å. Wedin (1983). "On Angles Between Subspaces." In B. Kågström
and A. Ruhe, editors, Matrix Pencils, pages 263-285. Springer, New
York.
[262] P.-Å. Wedin (1987). "Perturbation Theory and Condition Numbers
for Generalized and Constrained Least Squares Problems." Technical
Report S-901-87, Institute of Information Processing, University of
Umeå.
[263] K. Weierstrass (1867). "Zur Theorie der bilinearen und quadratischen
Formen." Monatsh. Akad. Wiss. Berlin, pages 310-38. Cited in [81].
[264] H. F. Weinberger (1974). Variational Methods for Eigenvalue
Approximation. Society for Industrial and Applied Mathematics,
Philadelphia. Cited in [175].
[265] H. Weyl (1912). "Das asymptotische Verteilungsgesetz der Eigenwerte
linearer partieller Differentialgleichungen (mit einer Anwendung auf
die Theorie der Hohlraumstrahlung)." Mathematische Annalen 71,
441-479.
[266] H. Weyl (1949). "Inequalities between the Two Kinds of Eigenvalues
of a Linear Transformation." Proceedings of the National Academy of
Sciences 35, 408-411.
[267] H. W. Wielandt (1955). "An Extremum Property of Sums of
Eigenvalues." Proceedings of the American Mathematical Society 6,
106-110.
[268] N. Wiener (1922). "Limit in Terms of Continuous Transformations."
Bulletin de la Société Mathématique de France 50, 119-134.
[269] J. H. Wilkinson (1965). The Algebraic Eigenvalue Problem. Clarendon
Press, Oxford, England.
[270] J. H. Wilkinson (1971). "Modern Error Analysis." SIAM Review 13,
548-568.
[271] J. H. Wilkinson (1972). "Note on Matrices with a Very Ill-Conditioned
Eigenproblem." Numerische Mathematik 19, 176-178.
[272] J. H. Wilkinson (1979). "Kronecker's Canonical Form and the QZ
Algorithm." Linear Algebra and Its Applications 28, 285-303.
[273] J. H. Wilkinson and G. H. Golub (1976). "Ill-Conditioned
Eigensystems and the Computation of the Jordan Canonical Form."
SIAM Review 18, 578-619.
[274] H. Wittmeyer (1936). "Einfluß der Änderung einer Matrix auf die
Lösung des zugehörigen Gleichungssystems, sowie auf die
charakteristischen Zahlen und die Eigenvektoren." Zeitschrift für
angewandte Mathematik und Mechanik 16, 287-300.
[275] T. Yamamoto (1980). "Error Bounds for Computed Eigenvalues and
Eigenvectors." Numerische Mathematik 34, 189-199.
[276] D. M. Young (1971). Iterative Solution of Large Linear Systems.
Academic Press, New York.
[277] Z. Zhang (198). "On the Perturbation of the Eigenvalues of a
Non-Defective Matrix." Math. Numer. Sinica 6, 106-108. In Chinese.
Notation
R  The set of real numbers  1
R^n  The set of real n-dimensional vectors  1
R^{m×n}  The set of real m × n matrices  1
C  The set of complex numbers  1
C^n  The set of complex n-dimensional vectors  1
C^{m×n}  The set of complex m × n matrices  1
I (I_n)  The identity matrix (of order n)  2
1  The vector (1, ..., 1)^T  2
1_i  The ith unit vector  2
A^T  The transpose of A  2
A^H  The conjugate transpose of A  2
A^{-1}  The inverse of A  2
A^{-T}, A^{-H}  The inverse transpose and conjugate transpose of A  2
X + Y  The set {x + y : x ∈ X, y ∈ Y}. Other operations on sets are defined similarly  2
R(A)  The column space of A  2
N(A)  The null space of A  2
rank(A)  The rank of A  2
dim(X)  The dimension of the subspace X  2
det(A)  The determinant of A  2
trace(A)  The trace of A  2
||x||_2  The 2-norm of x  3
|A|  The matrix whose elements are the absolute values of the elements of A  3
A > B  A is component-wise greater than B. Other relations are defined similarly  3
dd  Formal definition; implicit definition  3
diag(δ_1, ..., δ_n)  A diagonal matrix  4
R(A)^⊥, X^⊥  The orthogonal complement of R(A), X  8
P_A, P_X  The orthogonal projection onto R(A), X  10
P_A^⊥, P_X^⊥  The complementary projectors I - P_A, I - P_X  10
L(A)  The set of eigenvalues of A  14
φ_A(λ)  The characteristic polynomial of A  15
A^{1/2}, A^{-1/2}  The square root of A and its inverse  20
J_k(λ)  A Jordan block of order k  20
F(A)  The field of values of A  23
H(L)  The convex hull of L  24
A ⊗ B  The Kronecker or tensor product of A and B  30
S(A)  The set of singular values of A  31
||A||_2  The spectral norm of A  33
inf_2(A)  The smallest singular value of A  33
Θ(X, Y)  The matrix of canonical angles between X and Y  43
∠(x, y)  The angle between x and y  45
||x||_1  The 1-norm of x  51
||x||_∞  The ∞-norm of x  51
||x||_p  The Hölder p-norm of x  51
||A||_F  The Frobenius norm of A  65
ρ(A)  The spectral radius of A  66
||A||_p  The Hölder p-norm of A  69
||A||_1  The 1-norm or column-sum norm of A  69
||A||_∞  The ∞-norm or row-sum norm of A  69
||·||_Φ  The unitarily invariant norm defined by the symmetric gauge function Φ  76
x ≻ y  x majorizes y  81
Φ_k  Fan's symmetric gauge functions  87
C_ℓ^n (R_ℓ^n)  The set of ℓ-dimensional subspaces of C^n (R^n)  90
ρ_ν(X, Y)  The ν-gap between X and Y  91
A^{(i,j,k)}  A generalized inverse satisfying Penrose's conditions i, j, and k  102
A^†  The pseudo-inverse of A  102
ae(α, α̃), re(α, α̃)  The absolute and relative errors in α̃  115
ae(A, Ã), re(A, Ã)  The absolute and relative errors in Ã  116
κ(A)  The condition number of A: ||A|| ||A^†||  119
κ_BS(A)  The Bauer-Skeel condition number of A  128
sv_A(Ã)  The spectral variation of Ã with respect to A  167
hd(A, Ã)  The Hausdorff distance between the eigenvalues of A and Ã  167
md(A, Ã)  The matching distance between the eigenvalues of A and Ã  167
Δ_ν(A)  The ν-departure of A from normality  171
md_2(A, Ã)  The 2-norm matching distance between the eigenvalues of A and Ã  189
inertia(A)  The inertia of A  196
sep(L, M)  The separation of the spectra of L and M  231
(α, β), (A)  Generalized eigenvalues  273
γ(A, B)  Crawford's number for the definite pair (A, B)  282
χ((α, β), (γ, δ))  The chordal distance between (α, β) and (γ, δ)  283
ρ_D  The chordal metric for definite pairs  288
ρ_L, ρ_R  The left and right equivalence metrics for regular pairs  288
||(P, Q)||_F  The combined Frobenius norm  307
dif[(A, B), (C, D)]  The difference between the spectra of (A, B) and (C, D)  307
Index
2-norm 51, 59, 71, 72
as largest singular value 33, 69
consistency with Frobenius norm 66
matrix 2-norm 69
properties 36, 51, 70
relation to the 1-norm and ∞-norm 55
symmetric gauge function 79
unitary invariance 52, 60, 72, 74
vector 2-norm 3
Abdelmalek, N. N. 163
absolute and relative error
in individual elements 128
matrix 116, 117, 119, 134
limitations 117
properties 116
scalar 115
properties 115
absolute value 49
acute perturbation 136-140, 151, 152
continuity of pseudo-inverse 140
definition 139
in reduced form 139
acute subspaces 151, 152, 255
Afriat, S. N. 45
Amir-Moez, A. R. 209
approximation by matrix of lower rank (see Schmidt-Mirsky theorem) 208
Arioli, M. 163
arithmetic-geometric mean inequality 61
artificial ill-conditioning 125, 193
Autonne, L. 35, 36
backward perturbation (see under linear system, least squares, etc.) 128
Banach space 60, 98
Banach, S. 60
Bateman, H. 35
Bauer-Fike theorem 171, 181, 192, 294, 300
Bauer-Skeel theorem 127
Bauer, F. L. 133, 177, 194, 195
Baumgärtel, H. 176
Beckenbach, E. F. 60
Bellman, R. 4, 60
Beltrami, E. 34
Ben-Israel, A. 152
Bendixson-Hirsch-Toeplitz theorem 30
Bendixson, I. 30
Bergman, P. G. 108
Berkson, E. 98
Bhatia, R. 176, 194, 227, 228
Birkhoff's theorem 50, 85, 87, 190, 193, 211
Birkhoff, G. D. 88
Bjerhammar, A. 108, 109
Björck, Å. 163
Borchardt, C. W. 209
Bunch, J. R. 136
Bunyakovski 60
canonical angle 43, 45, 94, 98, 99, 226, 232, 240, 250, 255, 260, 323
basis metric 95
computation 43, 45
gap function 92
pairs of projections 43
variational characterization 45
canonical bases 40
Cauchy's interlacing theorem 196-198, 209
Cauchy inequality 5, 60
generalized 67
Cauchy sequence 63, 64, 99
Cauchy, A. L. 60, 209
Cayley-Hamilton theorem 27
characteristic equation 15
characteristic polynomial 15
Chatelin, F. 4
Chebyshev, P. L. 11
Cholesky factor 13
chordal metric 283, 284, 290, 314
column space 2, 4
column sum norm (see norm, matrix 1-norm) 70
companion matrix 28
complete space 63, 64, 99
condition estimation 133
condition number (see under matrix inverse, eigenvalue, etc.) 118
congruence transformation 196
consistency
between norms (see norm, consistency) 66
foolish 26
contraction matrix 40
Courant-Fischer theorem (see Fischer's theorem) 209
Courant, R. 209
Crawford, C. R. 290, 324
CS decomposition 37-40, 45
computation 45
existence 37
generalized singular value decomposition 47
perturbation theory 324
Davis, C. 45, 46, 151, 194, 227, 228, 244, 258, 259
Decell, H. P. 152
defective matrix 16
Jordan form 21
definite matrix pair (see matrix pair (definite)) 281
Demmel, J. 133, 136, 177
Dennis, J. E. 135
departure from normality 171, 172, 177, 178
Desplanques, J. 186
determinant as poor measure of condition 122
diagonalizable matrix 21, 28
perturbation of eigenvalues 192, 215-217
diagonally dominant matrix 186-188
diagonal matrix 3, 5
block 4
dif 307, 309, 311, 312
bounds for definite matrix pair 319
dilation 209, 211
direct rotation 46
doubly stochastic matrix 83, 85, 88, 195
Birkhoff's theorem 85
definition 81
Drazin, M. P. 108, 113
Duff, I. S. 163
Dulmage, L. 88
Eckart-Young theorem (see Schmidt-Mirsky theorem) 210
Eckart, C. 35, 210
eigenpair 14
eigenproblem 26
eigenspace (see under matrix pair) 303
eigensystem 20
eigenvalue 14
algebraic multiplicity 15, 16
backward perturbation 175
optimal 176
Cauchy's interlacing theorem (q.v.) 196
complex 15
condition number 186, 188, 192, 226, 240
continuity 166, 167, 176, 178, 244
defective 16, 176
Fischer's theorem (q.v.) 27
geometric multiplicity 15, 16
Gerschgorin's theorem (q.v.) 181
Gerschgorin disks (q.v.) 181
Hausdorff distance 167-169, 177, 178
inclusion region 195, 210
matching distance 167-169, 174, 177, 178, 217
md_2 189
matrix pair (q.v.) 271
multiple 15, 26, 27
nomenclature 14, 26
of matrix functions 29
perturbation theory 105, 176, 203, 241
Bauer-Fike theorem (q.v.) 171
diagonalizable matrix 192
Elsner's theorem (q.v.) 168
generalized Rayleigh quotient 249
Henrici's theorem (q.v.) 172
Hermitian matrix 258, 263
Hoffman-Wielandt theorem (q.v.) 189
matrices similar to a Hermitian matrix 215, 216
matrices similar to a normal matrix 216
Mirsky's theorem (q.v.) 205
non-Hermitian perturbations of Hermitian matrices 212-214, 217
normal matrix 192, 195
orthogonal matrix 195
Ostrowski-Elsner theorem (q.v.) 170
simple eigenvalue 183
Weyl's theorem (q.v.) 203
Rayleigh quotient (q.v.) 185
residual bound 171, 176, 191, 205-207, 209, 211, 235, 248, 254-257
set of eigenvalues 26
simple 15, 29, 183
differentiability 185
spectral variation 167-169, 177
eigenvector 14
backward perturbation 179
condition 219
condition number 241
left 15
nonuniqueness 89, 219, 220
perturbation theory 240, 241, 244
Hermitian matrix 258, 263
stochastic matrix 241, 244, 246
residual bound 211
right 15
simple eigenvalue 29
elliptic norm 109
Elsner's theorem 167, 168, 170, 181
Elsner, L. 177, 178, 311
error (see absolute and relative error) 115
Euclidean norm (also see 2-norm) 3, 53
Evans, J. W. 152
exponential of a matrix 73
Faddeeva, V. N. 71
Fan's theorem 50, 81, 86, 254
Fan, K. 88
Feingold, D. B. 188
Feller, W. 10
field of values 23, 24, 27
convex hull of eigenvalues 24
convexity of 23
definition 23
of a Hermitian matrix 28
of a normal matrix 24
Fike, C. T. 177, 194
first order approximation (also see under least squares, eigenvalue, etc.) 131, 134, 292
Fischer's theorem 27, 196, 198, 201, 209, 281, 289
Fischer, E. 209, 289
Forsythe, G. E. 10
Francis, J. G. F. 11
Frobenius norm 65, 71, 110, 131, 135, 172, 177, 180, 247, 258
consistency 65, 69, 71
consistency with 2-norm 66
relation to eigenvalues 72
symmetric gauge function 79
unitary invariance 71, 72, 74
Frobenius, F. G. 71, 88
full rank factorization 12, 32, 105
Gaches, J. 134
Gantmacher, F. R. 4
gap function (see under metrics for subspaces) 90
Gastinel, N. 71, 133
Gatlinburg Conferences 71
Gaussian elimination 132
Gauss, C. F. 108-110, 134
generalized eigenvalue problem (see matrix pair) 271
generalized inverse 102
(i, j, k)-inverse 102
discontinuity 108
Drazin inverse 108, 113, 241
from singular value decomposition 103, 104, 110
group inverse 241
limitations 108
projections 110
generalized singular value decomposition 46
perturbation theory 324
Gerschgorin's theorem 177, 181, 186, 187, 203
block variant 188
compared with Elsner's theorem 181
generalized (see under matrix pair (regular)) 291
Gerschgorin disks 181, 186, 187
irreducible matrix 188
isolated 187
reduction by diagonal similarity 182-187
Gerschgorin, S. A. 177, 186
Gohberg, I. 227
Golub, G. H. 4, 151, 152, 155, 163
Gragg, W. B. 133
Gram-Schmidt algorithm 11, 12
modified 12
Gram, J. P. 11
Hadamard's inequality 8, 14, 168, 177
Hahn-Banach theorem 57, 63
Hahn, H. 60
Hall's theorem 84, 88, 89, 170, 178
Hall, P. 88, 89
Halmos, P. R. 46
Halperin, I. 88
Hanson, R. J. 152, 163
Hardy-Littlewood-Pólya theorem 81, 87
Hardy, G. H. 88
Hausdorff distance (see under eigenvalue) 167
Hausdorff, F. 27, 177
Hearon, J. Z. 152
Henrici's theorem 172, 174, 177
Henrici, P. 177, 178
Hermite, C. 209
Hermitian matrix 3, 5
Cauchy's interlacing theorem (q.v.) 196
eigenvalues 19
bounds 30
of sums 25
field of values 28
Fischer's theorem (q.v.) 27
perturbation of eigenvalues
Mirsky's theorem (q.v.) 204
perturbation of eigenvalues 203, 258, 263
generalized Rayleigh quotient 249
matrices similar to a Hermitian matrix 215, 216
non-Hermitian perturbations 212-214, 217
positive definite perturbation 203
Weyl's theorem (q.v.) 203
perturbation of eigenvectors 258, 263
perturbation of invariant subspaces 244
residual bound for eigenvalues 205-207
residual bounds for eigenvalues 248, 254-257
residual bounds for invariant subspaces 249-254
first sin Θ theorem 250, 258
nonorthonormal basis 251
second sin Θ theorem 251, 255, 258
tan Θ theorem 253, 258, 259
skew Hermitian matrix 5
spectral decomposition 19, 226
sums of 27
Hessian matrix 134
Higham, N. J. 133, 164
Hilbert space 64, 98
Hirsch, H. 30
Hoffman-Wielandt theorem 189, 193, 205, 213, 218, 257
generalizations 193, 194
limitations 191
Hoffman, A. J. 88, 193
Holbrook, J. R. A. 194
Hölder's inequality 61
Hölder norms 192
Hölder, O. 61
Hotelling, H. 35
Householder transformation 5, 6, 10
Householder, A. S. 4, 6, 10, 11, 71, 176, 195
idempotent matrix 28
inertia of a matrix 196
inf_2
in terms of the inverse matrix 36
in terms of the pseudo-inverse 110
inner product 53, 62
invariant subspace 21, 22
approximate (see invariant subspace, residual bound) 230
backward perturbation 175, 178-180
optimal 176
characterization 220, 227
complementary 222, 225
complex conjugate eigenvalues 29
definition 22
left 221, 225
normalization 239, 244
perturbation theory 229, 236-240, 244, 254
canonical angles 237
first sin Θ theorem 250, 258
Hermitian matrix 244
second sin Θ theorem 251, 255, 258
tan Θ theorem 253, 258, 259
reduced form of matrix 221
representation of matrix 22, 220, 231, 235, 237
condition number 240
residual bound 174, 206, 229-236, 246, 249-254
canonical angles 232
nonorthonormal basis 251
sensitivity 21, 90, 219
simple 220, 221, 226, 227, 231, 235, 238
existence of complementary invariant subspace 225
reduction to block diagonal form 224
spectral projection 114, 225, 240
canonical angles 226
spectral resolution 223-228, 237, 244
Sylvester's equation (q.v.) 222
inverse matrix 102
asymptotic forms and derivatives 130-132
condition 17
condition number 119, 127, 133
artificial ill-conditioning 122-124
distance from singularity 120, 121, 133
in the 2-norm 121
optimal 133, 135, 193
relation to determinant 134
relation to eigenvalues 122, 134, 135
significant digits in inverse 120
left and right inverses 134
perturbation theory 117-124
linear system 124
random perturbations 131
well-conditioned 120
irreducible matrix 186, 188
Jacobian matrix 134
Jacobi, C. G. J. 209
Jiang, E. 178, 180
Jordan-Wielandt matrix 32, 34, 35, 259
Jordan block 20, 28, 174, 180, 186, 300
function of 73
powers of 28
Jordan canonical form 20, 21, 26, 174, 227, 280
associated invariant subspaces 21
computation 27
Drazin generalized inverse 113
limitations 21, 227
principal vector 21
Jordan, C. 26, 34, 35, 45, 289
Jordan, P. 63
Kahan, W. 27, 45, 46, 133, 151, 178, 180, 209, 211, 217, 218, 244, 258, 259
Kato, T. 4, 176, 178, 244
König, D. 88
Korganoff, A. 152
Krasnoselski, M. A. 98
Krein, M. G. 98
Kronecker product 30, 228, 258
Kronecker, L. 289
Lancaster, P. 227
Laplace, P. S. 11
Lawson, C. L. 152, 163
least squares 10, 11, 101
asymptotic forms and derivatives 162
backward perturbation 160-163
Björck's theorem 162
condition
condition number 156, 163
reflected by solution 156-158, 163
square of κ 158, 163
constrained 109, 112
cross-product matrix
backward perturbation 161
elliptic norm 109
errors in the variables 163
bias 267
expanded equations 107, 161, 163
Gauss-Markov theorem 110
Gauss, C. F. 108
measurement error models 163
normal equations 107
numerical methods 152, 163
perturbation theory 156-160
Björck's theorem 158
errors in a column 163
perturbation in A 157
perturbation in b 156
perturbation of the residual vector 160
structured perturbation 158, 163
priority dispute between Gauss and Legendre 109
reduced form 155
regression diagnostics 163
residual vector 107
solution by pseudo-inverse 107
statisticians' notation 109
Legendre, A. M. 109
Levy, L. 186
Lidskii, V. B. 209, 210
linear functional 56
linear system 101, 114
artificial ill-conditioning 125
asymptotic forms and derivatives 130-132
backward perturbation 128-130, 133, 135, 136
Oettli-Prager theorem 130
Rigal-Gaches theorem 128
structured 129
condition number 125, 127
artificial ill-conditioning 128
Bauer-Skeel 128, 133
reflected by solution 126
perturbation theory 124-128, 132
Bauer-Skeel theorem 127
component-wise bounds 125, 126
from inverse matrix 124
perturbation in matrix 124
perturbation in the right-hand side 126, 127
structured perturbation 127, 128
residual bound 128-130
structured perturbation 133
LINPACK 133
Littlewood, J. E. 88
Loewy, A. 27
LR algorithm 11
Lu, Q.-k. 99
MacDuffee, C. C. 4
majorization 81, 88, 89
Marcus, M. 4
matching distance (see under eigenvalue) 167
matrix function 20
exponential 73
Jordan block 73
Neumann series 73
power series 73
rational 29
spectral resolution 229
matrix norm (see under norm) 64
matrix pair
characteristic equation 274
eigenvalue 273
(α, β) notation 273
eigenvector 273
equivalence 276
generalized Schur form 277
Hermitian 281
infinite eigenvalues 271, 273
linear transformations of matrix pairs 275
metrics 284-288
limitations 288
nonsingular B 274
numerical methods 274
reduction to ordinary eigenvalue problem 274
scaling problems 272
singular 273, 274, 289
systems of differential equations 289
matrix pair (definite) 281-283, 289
bounds for dif 319
definition 282
diagonalizability 283
eigenangle 314
perturbation theory 324
variational characterization 314
eigenangles 324
eigenvector 318
failure of definition in real case 290
matching distance 324
perturbation of eigenspaces 317-322
condition number 322
perturbation of eigenvalues 314-317
condition number 313, 316, 324
limitations of the theory 316
perturbation of eigenvectors 322
positive definite B 281, 282, 289
Fischer's theorem 281
projective metric 285-287, 290, 314
right and left eigenspaces 317
sin Θ theorems 323, 324
spectral resolution 318
normalized 318
matrix pair (regular) 273, 274
approximate eigenspace 306-309
continuity of eigenvalues 291
deflating subspace 311
diagonalizable pair 280, 281, 297
eigenspace 312
backward perturbation 309
characterizations 304
complementary 306
definition 303
eigenvectors 312
simple 305, 306
eigenvalue
simple 277
generalized Bauer-Fike theorem 294, 301, 311
generalized Henrici theorem 311
generalized Hoffman-Wielandt theorem 311
generalized Schur form 276, 289
real pair 290
Gerschgorin theory 294-300
diagonal similarity 297
generalized Gerschgorin theorem 295
simplification of bounds 296
infinite eigenvalues 274
left projective metric 288, 290
normal pair 311
number of eigenvalues 275
perturbation of eigenspaces 309-311
perturbation of eigenvalues 311
condition number 294, 300, 303
diagonalizable pair 300-303
first order approximation 292
first order error bounds 293
generalized Bauer-Fike theorem 301
multiple eigenvalue 297-300
spectral variation 302
perturbation of eigenvectors 312
recovery of bounds for ordinary eigenvalue problem 294, 297
right projective metric 288, 290
spectral resolution 306, 310
Weierstrass canonical form 280, 290
matrix pencil (also see matrix pair) 209, 271
rectangular 272
singular 271
McIntosh, A. 194, 227, 228
metric 53
from a norm 62
pseudo-metric 62
metrics for subspaces 50, 90
H-metric 96, 98, 99
basis metric 95, 98, 99
completeness 99
distance from a vector to a subspace 90, 99
gap function 90-94, 98, 99
canonical angles 92
definition 91
equivalence of gap functions 91
projections 93
gap topology 91, 93, 94
projection metrics 93, 94
Schaffer's metric 99
unitarily invariant metrics 94-98
failure to generate the gap topology 94
unitary invariance 99
Meyer, C. 244
Milman, D. P. 98
Minc, H. 4
Minkowski's inequality 61
Minkowski, H. 59, 61
Mirsky's theorem 194, 204, 206, 208, 209
Mirsky, L. 71, 72, 194, 209, 210
Moler, C. B. 290
Moore-Penrose generalized inverse (see pseudo-inverse) 102
Moore, E. H. 108, 109
Moré, J. J. 135
Nashed, M. Z. 108
nearly singular matrix 120
Neumann series 73
nilpotent matrix 29
nondefective matrix (also see diagonalizable matrix) 21, 28
norm
2-norm (q.v.) 3
absolute norm 52, 72, 76
and spectral radius 73
combining norms 61
consistency 65, 67, 71
definition 66
failure 65
family of consistent norms 69
unitarily invariant norms 80
vector norm consistent with a matrix norm 66
convexity and norms 59, 63
dual 57, 59, 67
dual of dual 58
elementary properties 51
elliptic 53, 62, 111
equivalence of norms 54, 55, 59, 65, 72
failure 64
Frobenius norm (q.v.) 65
generated by an inner product 62
generated by linear transformation 53
generated by positive definite matrix 53, 62
Hilbert-Schmidt norm 210
Hölder norms 51, 57, 60, 61, 64, 69, 121, 134
infinite dimensional spaces 71
limits and norms 55, 65
matrix ∞-norm 69, 71, 72
matrix 1-norm 69, 71, 72
matrix norm 49, 64
of a linear functional 56
operator norm 67, 68, 72
consistency 67, 68
polar 59
spectral radius and norms 67, 72, 73
subordinate norm (see norm, operator norm) 68
unitarily invariant (q.v.) 50
vector ∞-norm 51, 55, 69
vector 1-norm 51, 55
vector norm 49, 50
normalizable matrix (see diagonalizable matrix) 189
normal matrix 3, 5, 171, 191, 194
condition number 134
departure from normality (q.v.) 171
eigenvectors 19
field of values 24
perturbation of eigenvalues 192, 195
Hoffman-Wielandt theorem (q.v.) 189
matrices similar to a normal matrix 216
residual bound 191
Schur decomposition 18
null space 2
O'Leary, D. P. 112
Oettli-Prager theorem 130, 161
Oettli, W. 133
orthogonal matrix 3, 195
perturbation of eigenvalues 195
orthonormal basis 8
Ostrowski-Elsner theorem 170
Ostrowski, A. 88, 177, 187
Paige, C. C. 45, 46, 99, 111, 324
Parlett, B. N. 4, 178, 180, 209
Pavel-Parvu, M. 152
Peano, G. 71
Penrose's conditions 102, 110
Penrose, R. 108, 109, 151
Pereyra, V. 152, 155, 163
permutation matrix 3, 83, 85
permutation vector 83-85
Picard, E. 35
polar decomposition 36
Pólya, G. 88
positive definite matrix 3, 5, 27, 73, 74
condition number 122
norm generated by 53
positive semi-definite matrix 3, 5
square root 20
powers of a matrix 73
Prager, W. 133
principal vector 21
projection (oblique) 11, 14, 152
generalized inverse 110
spectral projection 114
with respect to an inner product 111
projection (orthogonal) 9
acute perturbation 137-140, 153
as Hermitian idempotent 10
asymptotic forms and derivatives 154
canonical angles 43
complementary 10
condition number 154
continuity 153
generalized inverse 110
least squares 10
perturbation of products 141
perturbation theory 153, 154
pseudo-inverse 106
reduced form 153
projection (with respect to a norm) 91
pseudo-inverse 101, 102
application to least squares 107
asymptotic forms and derivatives 150-152
Bjerhammar's characterization 109
condition number 146, 149, 163
distance from matrix of lower rank 152
continuity 136, 140, 146, 151
counterexamples 105
elementary properties 104
elliptic 109, 111
existence and uniqueness 104, 110
expressions for perturbed pseudo-inverse 142
full rank case 108
Gauss, C. F. 108
minimality 110
Moore's characterization 109
nonacute perturbations 140
orthogonal projections 106
perturbation theory 140-151
acute perturbations 146-150
general results 140-146
Wedin's bounds 142-146
QR decomposition 110
reduced form 137
scaled 111
weighted 111
Pythagorean equality 10
Qi, L. 187
QR algorithm 11, 18
QR decomposition 6-8, 11, 30
existence 7
pseudo-inverse 110
uniqueness 13
with pivoting 11
QR factorization 8, 13
generalized singular value decomposition 47
partitioned 13
quasi-Newton method 134
Rall, L. B. 108
random perturbation 131, 163
Rayleigh-Ritz approximation 207, 209
Rayleigh quotient 185, 241
generalized 240, 244, 248, 249
Rayleigh, Lord (J. W. Strutt) 210
residual bounds (see under linear system, eigenvalue, etc.) 128
Riesz, F. 60
Rigal-Gaches theorem 128
Rigal, J. L. 134
right inverse 110
Ritz vectors 210
Ritz, W. 210
Rodman, L. 227
Rohrbach, H. 186
Rosenblum, M. 227, 228
Rouché's theorem 167, 176
rounding-error analysis 132, 133
rounding error 274
row sum norm (see norm, matrix ∞-norm) 70
Ruhe, A. 244
Saunders, M. A. 45, 46
Schaffer, J. J. 99
Schmidt-Mirsky theorem 208, 210
Schmidt, E. 11, 35, 209, 210
Schur complement 13
Schur decomposition 17-20, 26, 28, 171, 222
existence 17
of a real matrix 26, 29
of normal matrix 18
uniqueness 26
Schur, I. 13, 26, 71
Schwarz 60
sep 244
continuity 234, 236
definition 231
Hermitian matrices 247, 258
properties 245
relation to separation of eigenvalues 233, 247, 258
set operations 2
Sherman-Morrison-Woodbury formula 5
similarity transformation 16
ill-conditioned 17, 21
unitary 17
singular subspace 259
residual bound 260-262, 266, 267
Wedin's <I>E> theorems 260, 262, 267
singular value 31