/
Текст
Universitext
Editors (North America): J.H. Ewing, F.W. Gehring, and P.R. Halmos
Berger: Geometry I, II (two volumes)
Bliedtner/Hansen: Potential Theory
Bloss/Bleeker: Topology and Analysis
Chandrasekharan: Classical Fourier Transforms
Charlap: Bierbach Groups and Flat Manifolds
Chern: Complex Manifolds Without Potential Theory
Cohn: A Classical Invitation to Algebraic Numbers and Class Fields
Curtis: Abstract Linear Algebra
Curtis: Matrix Groups
van Dalen: Logic and Structure
Devlin: Fundamentals of Contemporary Set Theory
Edwards: A Formal Background to Mathematics I a/b
Edwards: A Formal Background to Mathematics II a/b
Emery: Stochastic Calculus
Fukhs/Rokhlin: Beginner's Course in Topology
Frauenthal: Mathematical Modeling in Epidemiology
Gardiner: A First Course in Group Theory
Gar ding/Tambour: Algebra for Computer Science
Godbillon: Dynamical Systems on Surfaces
Goldblatt: Orthogonality and Spacetime Geometry
Humi/Miller: Second Course in Order Ordinary Differential Equations
Hurwitz/Kritikos: Lectures on Number Theory
Iverson: Cohomology of Sheaves
Kelly/Matthews: The Non-Euclidean Hyperbolic Plane
Kostrikin: Introduction to Algebra
Krasnoselskii/Pekrovskii: Systems with Hysteresis
Luecking/Rubel: Complex Analysis: A Functional Analysis Approach
Marcus: Number Fields
McCarthy: Introduction to Arithmetical Functions
Meyer: Essential Mathematics for Applied Fields
Mines/Richman/Ruitenburg: A Course in Constructive Algebra
Moise: Introductory Problem Courses in Analysis and Topology
Montesinos: Classical Tesselations and Three Manifolds
Nikulin/Shafarevich: Geometries and Group
Oskendal: Stochastic Differential Equations
Rees: Notes on Geometry
Reisel: Elementary Theory of Metric Spaces
Rey: Introduction to Robust and Quasi-Robust Statistical Methods
Rickart: Natural Function Algebras
Rotman: Galois Theory
Samelson: Notes on Lie Algebra
Smith: Power Series From a Computational Point of View
Smorynski: Self-Reference and Modal Logic
Stroock: An Introduction to the Theory of Large Deviation!
Sunder: An Invitation to von Neumann Algebras
Tondeur: Foliations on Riemannian Manifolds
Verhulst: Nonlinear Differential Equulioni und Dyilttfflliil IjfVlimi
Zaanen: Continuity, Integration mul I'ouilti Thtuiy
Morton L. Curtis
Abstract
Linear Algebra
With Revisions by Paul Place
Sprlngtr-Verlag
New York Berlin Heidelberg
London Purls Tokyo Hong Kong
Morton L. Curtis (1921-1989)
Department of Mathematics
Rice University
Houston, TX 77251
USA
Editorial Board
(North America):
J.H. Ewing F.W. Gehring
Department of Mathematics Department of Mathematics
Indiana University University of Michigan
Bloomington, IN 47405 Ann Arbor, MI 48109
USA USA
P.R. Halmos
Department of Mathematics
Santa Clara University
Santa Clara, CA 95053
USA
AMS Subject Classifications (1980): 15-01, 11R52
Library of Congress Cataloging-in-Publication Data
Curtis, Morton Landers, 1921-
Abstract linear algebra / Morton L. Curtis ; with revisions by
Paul Place.
p. cm — (Universitext)
Includes bibliographical references.
ISBN 0-387-97263-3 (alk. paper)
1. Algebras, Linear. I. Place, Paul. II. Title.
QA184.C873 1990
512'.5-dc20 90-33058
Printed on acid-free paper
® 1990 Springer-Verlag New York Inc.
All rights reserved. This work may not be translated or copied in whole or in part without
the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue,
New York, NY 10010, USA), except for brief excerpts in connection with reviews or
scholarly analysis. Use in connection with any form of information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication,
even if the former are not especially identified, is not to be taken as a sign that such names,
as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used
freely by anyone.
Photocomposed copy prepared using the author's LaTeX file.
Printed and bound by R.R. Donnelley & Sons, Harrisonburg, Virginia,
Printed in the United States of America.
987654321
ISBN 0-387-97263-3 Springcr-Vcrlag New York Berlin Htldf lh»i*
ISBN 3-540-97263-3 Springer-Verlag Berlin Hsldillt«f NfW Iff!
Preface
The unexpected and untimely death of the author, and my good friend,
Morton Curtis on February 4, 1989 occurred only a short time after the
first complete version of this manuscript had been returned by the typist.
His intentions with this work were clear: he saw a way of organizing the
subject of Linear Algebra around a topic which was not in the general
purview and in doing so enhancing both the topic and its audience. Indeed
it was characteristic of his approach to mathematics to seek out the elegant •
ideas, to make them clear and concise, and to teach them with enthusiasm
to the widest group possible. It gives me satisfaction to have been able to
help insure that this project be completed. That others might gain some
enjoyment of mathematics through this book would be a most suitable
monument to his memory.
The goal to which this work leads is the Theorem of Hurwitz—that
the only normed algebras over the real numbers are the real numbers,
the complex numbers, the quaternions, and the octonions. It is, to my
knowledge, the only presentation of this material at an "elementary level."
It begins from scratch and develops the standard topics of Linear Algebra
one would expect from a first course on the subject. It is intended as a text
for such a course directed toward students with mathematical leanings; it
stresses the complete logical development of the subject while suppressing
mention of its many applications to other fields. I think it would also be a
valuable reference for mathematicians in general.
Much of the beginning presentation is standard; where options exist,
the choice has tended toward the more mathematically sophisticated—
particularly where this choice makes a good case for its superiority as,
for example, in the treatment of determinants through the introduction of
exterior algebras.
The task of preparing the original draft for publication was taken on
by my student, Paul Place. In consultation with me, he approached the
job of reworking the text to insure clarity, consistency, and completeness
with admirable dedication. His efforts include: the inclusion of exercises,
rewriting of sectionH IF, HE, IIIC, HID, and large parts of Chapters IV and
V,. addition of ioctioun HIE and IVA , and all the other details inherent
VI
Preface
in such a project. The reader will surely benefit from the care he put into
polishing this book.
Our thanks also go to Vivian Choi for her excellent work in Tj^K-ing all
phases of this manuscript.
John Hempel
Houston, Texas
December 1989
Introduction
This book covers most of the basic results in linear algebra, but in some
ways it is somewhat more ambitious. The main such feature is in Chapter
V in which the Hurwitz Theorem is proved. This result has been known
since early in this century, but the proofs have just recently been simplified
enough for a textbook at this level. The treatment which gave me the
courage to include it is due to Reese Harvey and Blaine Lawson ([3]). Our
treatment is just a detailed amplification of their appendix.
Another feature is the introduction of exterior algebras to use in defining
determinants. This has been standard for years in more advanced courses,
but is not yet standard in beginning linear algebra texts. It greatly
simplifies the, usually quite messy, more standard treatments of determinants.
Also, we prove the Cay ley-Hamilton Theorem in a general setting. This
only necessitates a consideration of polynomials over a not-necessarily
commutative ring (namely, the ring ofnxn matrices). Since noncommutative
algebras will be a way of life for us in Chapter V, it doesn't hurt to make
an acquaintance in Chapter III.
All of this is written as a mathematics course—not an applied
mathematics course. It is to be hoped that the reader can read and understand
the mathematics without applications. Linear algebra has lots and lots of
important and interesting applications to many subjects as well as to other
things in mathematics. Try to accept it for now as interesting mathematics.
«-*
Contents
Preface
Introduction
Chapter 0. Algebraic Preliminaries
Chapter I. Vector Spaces and Linear Maps
A. Vector Spaces
B. Linear Maps
C. Bases, Dimension
D. Direct Sums, Quotients
E. Eigenvectors and Eigenvalues (Part i)
F. Dual Spaces
Chapter II. Matrices and Determinants
A. Matrices
B. Algebras
C. Determinants, the Laplace Expansion
D. Inverses, Systems of Equations
E. Eigenvalues (Part ii)
Chapter III. Rings and Polynomials
A. Rings
B. Polynomials
C. Cay ley-Hamilton Theorem
D. Spectral Theorems
E. Jordan Form
Chapter IV. Inner Product Spaces
A. Rn as a Model, Bilinear Forms
B. Real Injur Product Spaces, Normed Vector Spaces
, C. CompliM Imwr Product Spaces
V
vii
1
9
9
15
20
28
37
41
47
47
56
61
71
76
81
81
86
94
96
102
111
111
116
123
X
Contents
D. Orthogonal and Unitary Groups 129
E. Stable Subspaces for Unitary and Orthogonal Groups 133
Chapter V. Normed Algebras 141
A. The Normed Algebras R and C 141
B. Some General Results, Quaternions 145
C. Alternative and Division Algebras 154
D. Cayley-Dickson Process, Hurwitz Theorem 156
Bibliography 163
Index 165
Chapter 0
Algebraic Preliminaries
A field is an algebraic object consisting of a set with two operations on
it, addition and multiplication, which are required to satisfy certain
conditions. To avoid a certain amount of repetition in stating these conditions
(and thus to also make the conditions easier to remember), we first talk
about sets with only one operation.
Definition A group G is a set G with one operation ((a, b) —► ab)
satisfying:
(i) the operation is associative
a(bc) = (ab)c,
(ii) there exists an identity element e € G such that for each a G G
ae = ea = a, and
(iii) for each a G G there is an inverse element a"1 such that
aa~x — e = a~xa.
Note that, because of (ii), the empty set cannot be a group. But given
a set with exactly one element (a singleton) 4t has precisely one operation
on it, and that operation makes it into a group, a trivial or zero group.
Now conditions (ii) and (iii) leave open certain possibilities. Namely,
there may be more than one identity element, and an element of G may
have more than one inverse element. Actually, neither of these can happen.
Proposition 1 A group G has exactly one identity element and each a 6 G
hm emctly om'invgrm.
2
0. Algebraic Preliminaries
Proof: Suppose e and / are identity elements in G. Then fe = e since /
is an identity element, and fe = / since e is an identity element.
Next suppose that 6, c are both inverse elements for a. Then
b — eb = (ca)b = c(ab) = ce = c. I
Examples
(1) The set Z of integers is a group under addition. The operation
is associative, zero is the identity element, and —a is the inverse
of a.
(2) Z is not a group under multiplication. The operation is
associative and 1 acts as identity, but only 1 and —1 have
multiplicative inverses in Z.
(3) The set Q of rational numbers is a group under addition.
(4) The set Q — {0} (i.e., all nonzero rationals) is a group under
multiplication. (Verify this.)
(5) R+ : The set of all real numbers x > 0 forms a group under
multiplication. (Verify.)
(6) Let Rn be the set of all ordered n-tuples
x = (x1,x2,...,xn) '
of real numbers. Given x = (xi,..., xn), y = (yi,..., xn),
define x -f y = {x\ -f yi,..., xn + yn). This operation on Rn
is associative, 0 = (0,..., 0) acts as identity, and the inverse of
x = (a?i,..., xn) is —x = (—a?i,..., —xn). This makes Rn into
a group.
We will be primarily interested in this course in groups which are
abelian.
Definition A group G is abelian group if always
ab = ba.
In the examples above, all groups are abelian.
Now we are ready to define field.
Definition A set k with an operation of addition (a+ 6) and an operation
of multiplication (ab) is a field if the following aoildltloni are Hatinfiecl.
(i) k is an abelian group under + (ili it» Identity olcurumt for +
is denoted by 0).
0. Algebraic Preliminaries
3
(ii) Multiplication distributes over addition
a(b 4- c) = ab + ac.
(iii) k — {0} is an abelian group under multiplication (the identity
element is written as 1).
Note that in a field k, the additive identity 0 is a "multiplicative anni-
hilator." That is, for any a G k we have aO = 0.
Proof: aO = a(0 + 0) (since 0 is the additive identity) = aO + aO (by (ii)).
By (i), aO must have some additive inverse, and adding it to both sides of
aO = aO + aO
gives 0 = aO. I
Recall that we could not have an empty group, but could have one with
only one element. For a field we need at least two elements, 0 and 1. (These
could not be equal since 0 annihilates 1, but 1 times 1 equals l.)1 There is
a field with just these two elements:
Addition Table Multiplication Table
+
0
1
0
0
1
1
1
0
X
0
1
0
0
0
1
0
1
Exercise. Prove that these operations cannot be anything else.
Proposition 2 Let k be a field. If x,y G k and x ^ 0 and y ^ 0; then
xy ^ 0.
Proof: We will show the contrapositive, that is, if x ^ 0 but xy — 0, then
y = 0.
Now x/0 implies (by (iii)) that x~x exists (with x~xx — 1). Then
y = (x~1x)y = x~1(xy) = x_10 = 0. I
The property described in Proposition 2 is "no divisors of zero"\ i.e.,
we cannot have x ^ 0 and 2/^0, but xy — 0. So a field has no divisors of
zero.
But, additivcly, a field can have "torsion." For example, the two-
element field has 1 + 1 = 0. We can define a three-element field (elements:
0,1,2) by
^Show that If i = 0 wo are reduced to one trivial operation and really have just the
ssero group.,
4
0. Algebraic Preliminaries
Addition Table Multiplication Table
X
0
1
2
0
0
0
0
1
0
1
2
2
0
2
1
+
0
1
2
0
0
1
2
1
1
2
0
2
2
0
1
(Verify that this is a field.) Note that in this field 1 + 1 + 1 = 2 + 1 = 0.
Definition The characteristic A of a field k is the least natural number
such that
1 + 1 + ... + 1 = 0.
v v '
A
If no such natural number exists, we say k has characteristic zero (strange
terminology?).
There will be cases in which we will want to have 1 + 1^0. So from
now on when we say "a field fc" we will mean "a field k with 1 + 1 ^ 0."
Definition If k is a field and s is a subset of fc, then s is a subfield of k if
the operations of k make s into a field. Then k is called an extension field
of s.
Let p{x) = anxn + an-ixn~1 + ... + a\x + ao be a polynomial with
coefficients ai in a field k. We say that r G k is a root of p(x) if
p(r) = anrn + an_irn_1 + ... + axr + a0 = 0.
The field k is algebraically closed if every polynomial with coefficients in
k has a root r in fc. For example, the field R is not algebraically closed
because p(x) = x2 + 1 has coefficients in R but there is no real number r
such that p(r) = r2 + 1 = 0.
However, the field C of complex numbers (which we are just now going
to introduce) is algebraically closed. This fact is known as "the fundamental
theorem of algebra"
The remainder of this introductory chapter is devoted to a classical and
very important extension field C of the field R of real numbers.
The Field C on R2
Example (6) of a group was Rn. In particular, for n = 2 we have
x = (xi,x2), y= (2/1,2/2) with
x + y = (xi+yi,X2+1/2)1
identity 0 = (0,0) and the inverse -a? ■* (-&ij^$i) for x. Wo consider
R C R2 by assigning to r € R the point (f,0) ill Wo easily hoc that
this makes R into a subgroup of R3, (Wrlti out ft formal definition of
0. Algebraic Preliminaries
5
"subgroup.") Can we define multiplication on R2 so that R2 becomes a
field with R C R^ as a subfield?
Surely the most obvious try is
xy= (zi2/i,X22fc).
This operation does distribute over addition
x(y + z) = {x1,x2)(y1+z1,y2 + z2)
= (x1(y1+z1),x2(y2 + Z2))
= (xiyi + x1zllx2y2 + x2z2)
— xy + xz.
The operation is easily seen to be associative and commutative. It has an
identity (1,1). But inverses are a problem. Indeed, if this gave a field then
(by Proposition 2) we would have no divisors of zero, but
(1,0)(0,1) = (0,0) = 0 (the additive zero of R2).
Another way of stating the difficulty here is to note that R^ — {0}
is not a group under this multiplication. (Because, for example, (1,0) ^
0, but if (1,0) had a multiplicative inverse x = {x\,x2) we would have
(l,0)(xi,x2) = (1,1). Whereas 0x2 = 0^1.)
But there is a multiplication on R^ which makes it into a field. In
Section" A of Chapter V we will show that the following formula is "forced
on us."
(*) (a, 6)(c, d) = (ac — bd, ad + be)
In the following set of exercises one shows that (*) does the job.
Exercises
(1) A subset of H of a group G is a subgroup of G if the operation
on G makes H into a group. Prove that H C G is a subgroup
if and only if
(i) e e H, and *
(ii) if a, be H, then afc"1 € H.
0. Algebraic Preliminaries
(2) Given three distinct objects {a,b,c} let G be the set of all
permutations (= one-to-one maps) of {a, 6, c}, (e.g., a -> c,
c —> a, b —> b is a permutation). Define an operation on G to
be iteration (first one permutation and then the other). Show
that this makes G into a group. Show that this group is not
abelian.
(3) For a positive integer ra, let
Zm = {0,l,...,m-1}.
Define: + : Zm x Zm -* Zm
mult: Zm x Zm —> Zm
by taking answers modulo m (e.g., Zg = {0,1,2,3,4,5} and
(3) (5) = 15 modulo 6 = 3). Show that Zm has no divisors of
zero <^misa prime.
(4) Show that (1,0) acts as identity for (*).
(5) Show that (*) extends the multiplication on R (R = {(r, 0)|r G
R} C R2).
(6) Show that (*) makes R into a field (the field C of complex
numbers).
(7) Write (a, b) as a + bi and treat these as polynomials in i with
the condition that i2 = — 1. Show this gives (*).
(8) Define the conjugate of a = a + bi e C to be a = a — bi. (Note
that conjugation is the identity map on R.) Prove that for
a, (3 G C we have
(i) (a + /3) = a + /3, and
(ii) (a/3) = a/3.
Also show that aa is a nonnegative real number, and aa = 0
exactly when a = 0. (The real number |a| = y/afi. is called
the absolute value of a.) Show that if a is real (i.e., 6 = 0),
then |a| is just the usual absolute value of a real number. (The
definition of \r\ for r G R is Vr2.)
(9) Let p(x) be a polynomial in the indeterminate x with real
coefficients. Show that for any a E C we have
Use this to prove that if p(x) (rt&l eoefftalcmtii) hm a complex
root a (i.e., p(a) = 0), then S li §lio ft root,
0. Algebraic Preliminaries
7
(10) A polynomial p(x) over a field k is monic if the highest power
of x has coefficient 1. Let p(x) be monic and let r G k. Show
that if p(x) is divided by x — r, then the remainder is p(r).
(Hint: Write p(x) = xn + an-\xn~x + ... + a\x + ao and do
long division by x — r.) Show that this remains true if p(x) has
coefficients in R and r G C.
(11) Prom (10), we see that if p(x) is monic with real coefficients and
a G C is a root of p(#), then x—a divides p(x). Now prove that
a monic polynomial p{x) with real coefficients can be factored
into linear and quadratic factors. (Hint: Let «i,..., sm be all
of the real roots of p(x). Then
p(x) = (x-si)...(x- sm)q(x)
where q(x) has real coefficients but no real roots. For each
complex root a of q(x) we have factors (x — a) and {x — a).
Show that (x — a){x — a) has real coefficients.)
(12) Show that {a + 6\/3 | a, b G Q} is a field (a subfield of the field
R), but {a + &\/3 | a, 6 G Z} is not a field.
Chapter I
Vector Spaces and Linear
Maps
A. Vector Spaces
We introduce the concept of a vector space by using the plane R2 as a
model. R2 is the set of all ordered pairs (a, b) of real numbers. (The fact
that we take ordered pairs means, for example, that (1,2) ^ (2,1).) We
define an operation + of addition on R2 by
(1) • (a,6) + (c,d) = (a + c,6 + d).
This operation is associative (since addition of real numbers is associative).
The ordered pair 0 = (0,0) acts as identity for +, and relative to this
identity, (—a, —b) is the inverse of (a, b). Thus R2 becomes a group, and
it is abelian since the operation is clearly commutative.
We next define a second operation in which we use a real number and
an element of R2 to produce another element of R2. Namely, if r G R
and (a, b) G R we set
(2) r(a,b) = (ra,rb).
Terminology:
"We call elements of R2 vectors. We call elements of R scalars. We call
the operation (1) vector addition and the operation (2) scalar
multiplication.
Definition A real vector space V is a set V of elements (called vectors)
mid two operation*:
10
I. Vector Spaces and Linear Maps
vector + vector = vector, and
(scalar) (vector) = vector.
These operations must satisfy the following properties:
(i) V is an abelian group under + (identity = 0),
(ii) lv = v, and
(iii) the two operations are related by
(a) r(v + w) = rv + rw,
(b) (r + s)v = rv + sv, and
(c) (rs)v = r(sv),
where v, w are vectors; and r, s are scalars.
It is left as a routine exercise to verify that R^ with the operations (1) and
(2) is a real vector space. Also, what we did for R , we can clearly do for
R** ( = all ordered triples of real numbers) and, more generally, for Rn,
n = 1,2,3,... .
But these Rn spaces are not the only real vector spaces. For example,
let a, b G R and let V consist of all twice differentiable functions / which
satisfy
T^ + 4 + V-o-
dxA ax
We define operations
(/ +0)0*0 = f(x) + g(x), and
(rf)(x) = r(f(x))
and easily verify that V is a real vector space.
By replacing R by a field k we get the concept of a vector space over
the field k.
Examples
(1) The field k is a vector space over itself. The vector addition is
just addition in k and scalar multiplication is just
multiplication in k.
(2) kn = n-tuples of elements of k. This is a vector space over k
just as Rn is a real vector space.
(3) Let V be the set of all continuous functions from [0,1] to R.
We define
(f + 9)(t) = /(*)+f(t). Midfor reR,
(rf)(t) - r/(t).
A. Vector Spaces
11
Since /, g continuous implies that / + g is continuous and rf
is continuous, V becomes a real vector space.
Definition If V is a vector space over k and r\,..., rm G k and v\,..., vm G
V, then the vector
r-iv-i + ... + rmvm
is called a linear combination of v\,..., vm.
Definition Let V be a vector space over a field k. A subset W of V is
called a subspace (or linear subspace) if the operations of V make W into
a vector space over k.
Proposition 1 A subset W of a vector space V is a subspace <=$■ any linear
combination of elements of W is in W.
Proof: => If W is a subspace, it is closed under vector addition and scalar
multiplication and hence closed under taking linear combinations.
<= This is obvious. I
Examples
(1) Let D be all real-valued twice differentiate functions defined
on [0,1]. Given P,Q in D, let W be all solutions y{x) of the
differential equation y" + P(x)y' + Q(x)y = 0. Then W is a
subspace of D.
(2) Let R°° be the set of all sequences {a^} of real numbers. We
add coordinatewise and multiply by a real number coordinate-
wise. Let W C R°° be all sequences {a^} such that J2al *s a
convergent series. Then W is a subspace (proved in §B,
Chapter IV).
(3) Let P be the set of all polynomials with real coefficients. Let
Pn be those of degree < n. Then Pn is a subspace of P.
Proposition 2IfV is a vector space and {Wa}ae^ is any collection of
subspaces, then
W= f]Wa
is a subspace.
Proof: Exercise. I
Definition If V is a vector space and A is any subset of V, then Span
(A) \h the intersection of all subspaces of V which contain A.
By propositions 1 and 2 we see that Span (A) is a subspace of V equal
to all finite? linear combinations of vectors in A
12
I. Vector Spaces and Linear Maps
Definition ACV generates (or spans) V if Span (A) = V.
Examples
(1) A = {(n,0)|n EZ}CR2 does not generate R2.
(2) A = {(1,0), (0,1)} C R2 does generate R2.
(3) A = {(M2,*3)!* e R} C R3 generates R3.
(1) and (2) should be easy. But, at this stage of our development of this
subject, (3) should not be obvious. But we already do know how to prove
it and we proceed to (somewhat lengthily) do so.
Let (a,b,c) be an arbitrary element of R . We want to show that
(a, 6, c) is some linear combination of elements of A. We will take
u = (1,1,1)
v = (2,4,8)
w = (3,9,27)
as elements of A and we assert that the arbitrary element (a, 6, c) of R** is
a linear combination of these three elements of A. We seek real numbers
a, 0,7
such that
au + f3v + 7W = (a, 6, c).
Now
au = (a, a, a)
f3v = (2/3,4/W)
7iy = (37,97,277).
So we must have
(1) a + 20 + 37 = a
(2) a + 40 + 97 = 6
(3) a + 80 + 277 = a.
You do know how to solve these equations for a, 0,7. For example, solve
(1) for a (a = a - 20 - 37) and plug this into (2) and (3). This gives .
(10 a + 20 + 67 = b
(2') o + 60 + 247 = c.
A. Vector Spaces
13
Next we solve the first for /3 and plug that into the second. (2/3 = 6—0—67,
and putting that in the second gives
a + 3(6 - a - 67) + 247 = c .)
This last equation is easily solved for 7 to give
67 = c — 36 + 2a or
_ c - 36 + 2a
7 - 6 '
Thus for any (a, 6, c) € R we can find an appropriate 7. But then we can
plug this 7 back into V and 2' to get
(1") a + 2/3 + (c-36+2a) = 6
(2") a + 6/3 + 4(c-36 + 2a) = c .
Prom (1") we have 2/3 = 46 — 3a — c and plugging this and the value for 7
into (1) will give a.
This example was designed to show that this elementary method of
"solve-and-plug-in" is really a powerful method in linear algebra. It will
settle questions which are not at all intuitively obvious. Like: the set
A = {(t,t2,t3)\t eR} cr3
does generate R .
However powerful the "solve-and-plug-in" method is, it can be tedious
if we keep doing it for every problem. So we will develop general theorems
which will give us answers without each time resorting to "solve-and-plug-
in."
(The example A = {(£, £2, t3)\t G R} is called the twisted cubic, and our
"nolve-and-plug-in" method is often called the "method of elimination")
Exercises
(1) Show that {/ : R —> R|/ continuous} with operations
(f + g)(t) = f(t) + g(t)
(rf)(t) = rf(t)
is a vector space. Call it V.
(i) Let to S R and let
W = {fev\f(t0) = o}.
Show that W is a subspace of V.
14
I. Vector Spaces and Linear Maps
(ii) Let C/ = {/E V\f{t2) = (/(0)2 for all t e R}.
Show that U is not a subspace of V.
(iii) Let X — {/ : R —> R|/ differentiate} and show
that X is a subspace of V.
(2) If (7, W are subspaces of the vector space V, show that the sum
of [/ and W
U + w = {u + w\ue U, weW}
is also a subspace of V.
(3) If U and W are subspaces of V, show that UUW need not be
a subspace. However, if U U W is a subspace, show that either
UCW otWCU.
(4) Let R°° be the vector space of all real sequences and let W C
R°° be all sequences with only a finite number of nonzero com-
ponents. Show that W is a subspace of R°°.
(5) Show that the set {1, (t - 1), (t - l)2, (t - l)3} generates P3 the
vector space of polynomials of degree < 3.
(6) Suppose A and B are subsets of the vector space V; show that
if A C B, then Span(A) C Span(B).
(7) Consider 2x2 square arrays of real numbers (called matrices).
We denote the totality of these by
M2(R) = |7 ac bAUb,c,deR\.
We make M2(R) into a vector space (over R) by denning
e f \ ( a + e & + /
a b \
c d J
1 g hi \ c + g d + h
and
a b \ _ ( ra rb
c d J y re rd
(You will note that this is just the vector space R4 of ordered
quadruples of real numbers. But we will later find good
reasons for sometimes writing these ordered quadruples as square
arrays.)
-05)
€ M2(R) is diagonal If 6 * 0 = c. Show
that the set D C M2(R) of diagonal matrices ia a subspace of
M2(R). Do the same for th© sst T of upper triangular matrices
(c = 0).
B. Linear Maps
15
(8) Generalize (8) to 3 x 3 matrices M3(R).
(9) A = I . 1 G M2 (R) is singular if ad — be = 0; otherwise
it is nonsingular. Show that the set 5 of singular matrices in
M2(R) is not a subspace of M2(R). Show that the set NS of
nonsingular matrices in M2(R) is not a subspace of M2(R) .
B. Linear Maps
For two vector spaces V, W (over the same field fc), the kind of function
from V to W of most importance to us is a linear map.
Definition 0 : V —► W is linear if it respects linear combinations; i.e.,
for any a, 6 € fc and #, y e V we have
(3) 0(ax + 6j/) = acj)(x) + 60(</).
Exercise. We define the identity map I : V —> V by 7(f) = v for every
v € V. Show that J is linear.
Proposition 3 0 : V -> W is linear &
(i) <f>(x + y) = 0(x) H- 0(j/), and
(ii) 0(aai) = a<j>(x).
Proof:. => (i) follows from (3) by taking a = 1 = b and (ii) follows from
(3) by taking 6 = 0.
<ss 0(ax + fa/) = 0(ax) + 0(fa/) by (i), and this equals acj){x) + b(j){y) by
(ii). I
Note that if <j> is linear, then (f)(0) = 0 (since <j>(0) = <j>(0 + 0) = </>(0) +
0(0)).
(1) Suppose 0 : R —> R is linear. Then 0 is completely determined
by knowing 0(1). (If 0(1) = a, then 0(6) = 0(61) = 60(1) =
ba.)
(2) The same proof shows that a linear map 0 of R into any (real)
vector space is determined by knowing 0(1).
(3) Let D be the set of all real-valued C°° functions defined on R
(i.e., functions with derivatives of all orders). Then D is a real
vector ipace and wo have a map
a\ D-+D
16
I. Vector Spaces and Linear Maps
defined by letting
a(f) = /', the derivative of /.
Then a is a linear map.
(4) With D as in (3), we define
0: D-+D
by 0(/)O) = /* f(t)dt. Then 9 is linear.
Definition Let 0: V —> W be linear. The image of 0 is the set of all <j>(v)
for v G V. We denote the image of 0 by 4>(V) or by im 0. The preimage
of A C W, written 0-1(A), is the set of all vectors in V which map into A
under 0.
The kernel of 0, denoted by ker 0, is the set of all v G V such that
<j>{v) = 0. The linear map 0 is surjective (or epic) if 0(F) = W; it is
injective (or monic) if ker 0 = 0. A linear map 0 which is both surjective
and injective is an isomorphism. If there is an isomorphism from V to W,
we say that they are isomorphic and write V = W.
Exercise. Show that the linear map 0 is injective O 0 is one-to-one (i.e.,
if 0(i>) = 0(u>), then v = w).
Note that 0-1 is a function from subsets of W to subsets of V\ however,
it is not a function from 4>(V) to V in the usual sense unless 0 is monic (i.e.,
if ker 0 ^ {0} and w G im 0 (so there is a v G V such that <j>(v) = u>), then
(j)~1(w) is not well-defined since it consists of the set {v + v'\vf G ker 0}).
Examples
(1) 0 : R —> R2 defined by 0(x) = (x,x) is linear and is injective,
but not surjective.
(2) 0: R^ —> R defined by 0(x, y) = x+y is linear and surjective,
but not injective.
(3) Let 0 : R^ —* R^ be linear. Denote 0(1,0) by (a,c) and
0(0,1) by (ft, d), i.e.,
0(1,0) = (a, c), 0(0,1) = (6, d).
We assert that 0 in example (3) is an isomorphism 4=> ad — be ^ 0.
We can prove this by our "solve-and-plug-in" method discussed in Section
A. This is a bit tedious, but we will do it in hopes of motivating our
development of the general theory of matrices and determinants.
We have 0 : R2 —► R2 linear, and
#1,0) « (a,c), 0(0,1)-(6,d).
B. Linear Maps
17
Suppose ad — bc — 0. We show that <j> is not injective (and hence can't be
an isomorphism).
If both a and c are zero, then 0(1,0) = (0,0) and we are done. If
not, then one, say a, is nonzero. Then (—1,1) ^ (0,0), but we claim
that </>(-£, 1) = (0,0). Well, (-^,1) = -|(1,0) + (0,1), so #-i,l) =
-±^(l,0) + ^(0,l) = -i(a,c) + (6,d) = (-6,-^) + (6,d) = (0,=^) =
(0,0). A similar argument works with 6, d replacing a,c. Thus we have
proved <\> an isomorphism =>ad — bc^0 (by proving ad — bc = 0=>4> not
an isomorphism).
Next we need to show ad — be ^ 0 => (i) <j> is injective and (ii) 0 is
iurjective.
To prove (i), we suppose
(4) 0(aM/) = (O,O)
and prove that (x,y) = (0,0). As before, we have 0(1,0) = (a,c) and
#0,1)-(M). Then (4) gives
(1) ax-\-by = 0, and
(2) cx + % = 0.
Since ad — be ^ 0 one of a, 6 must be nonzero. Suppose a ^ 0. We solve
(1) for # and plug into (2) to get
/ cb \ „ f-cb + ad\
y(--+d)=0 or ^_-_j=o.
Since R has no divisors of 0 and ~cb+ad ^ 0, we conclude that y = 0.
Similarly, 6 and d can't both be zero so we can solve (1) or (2) for y and
plug into the other to get x — 0. Thus 0 is injective.
The proof that 0 is surjective goes in a similar way. Given (z, w) G R
we want (x,y) such that </>(x,y) — (z,w). We can solve ax + fa/ = z for
either x or y and plug into ex + dy = w, solve and plug the answer back
into ax + by = z.
Proposition 4 If (f> : V —* W is a linear map which is an isomorphism,
then 0""1 is also linear (and thus is also an isomorphism).
Proof: Let a, b G k and x,y € W. Since 0 is one-to-one, there exist unique
elements x',y' of V such that <j){x') = x,<f>(yf) — y. Then ^ox' 4- fa/) =
ti,j' +.6y, so that
<f>'l(ax + by) = ax' 4- by' = a^""1^) 4- b<p'l(y),
proving linearity of </>~l. I
W© conclude thin section with a brief discussion of the Algebra of En-
domorphismB of a vector space.
18
I. Vector Spaces and Linear Maps
The Algebra End(F)
Definition An algebra is a vector space W which also has a multiplication
WxW -> W
(v,w) —> vw
which distributes over vector addition
u(v + w) = uv 4- UW, and
(v 4- w)u = w + wii
and satisfies v(aw) = a(vw) = (av)w for a E k. The algebra has a nm£
element if there is an element leW which acts as a left and right
multiplicative identity. It is associative if its multiplication is associative.
Examples
(1) R is an algebra with unit element.
(2) C is an algebra with unit element.
Indeed, any field is an algebra with unit element, but an algebra with
unit element need not be a field because some nonzero elements in an
algebra may not have multiplicative inverses. This is true of the algebra
End(F)
which we are about to define. This algebra End(V) plays a central role in
the study of linear algebra.
Definition A linear map of a vector space V into itself is called an en-
domorphism (or operator) of V. We denote by End(F) the set of all endo-
morphisms of V.
We first define operations on End(V) to make it into a vector space
over k (V being a vector space over the field k). Let 0, ^ G End(V). We
define <\> 4- ^ by
(4-) 04-t/OO) = <l>(v) + ij)(v) forallveV.
For a G k and (f) G End(V) we define acf) by
(.) (a(f)){v) = a(<t>(v)) for all v € V.
The operation (4-) is easily seen to be associative and commutative. *
The zero element of End(V) is the endomorpfalim lending all v € V to the
zero element in V. Relative to this zero in IndfK), tha additive inverse
of cj) e End(V) is -<fi defined by (-0)(v) » *=0(v). Clearly 0 + (-0) in
II. Linear Maps
19
the zero endomorphism. The other properties of (+) and (•) are routinely
verified so that End(F) is indeed a vector space over k.
Next, we want to make End(V) into an algebra. For 0, i/j G End(F)
their product 0-0 is defined by 4>^{v) = 0(,0(i>)). That is, we just take their
composition
v JUV A+y
v —► xj}(v) —>(f)(tp(v)).
N v '
(We will use the notation 0-0 and 0 o -0 interchangeably.) 0-0 is linear
because
{<p%jj){av + bw) = (f)(a^(v) 4- btp(w)) since i\) is linear
= a<t>{^){v)) 4- b(j>(il)(w)) since 0 is linear.
Alio our multiplication distributes over addition. For let 0,0,-0 £
End(V) and v £ V. Then
{0(0 +VOX*) = 0((0 + ^))
= 0(0(u) 4- ^(v)) = 04>{v) 4- 0V>(^)
= (00 + 0^).
Thus End(F) is an algebra. This algebra does have a unit element, i.e.,
the identity map of V to V. We denote this map by 1,
1 : V -> V,
and we clearly have 01 = 10 = 0 for all 0 G End(V).
It in convenient to use 02 to denote 00, etc. Thus p(0) makes sense if
p($) \n any polynomial with coefficients from k.
Definition Let A be an algebra with unit element 1. Then any a G A is
failed h ?/m£ if there exists b e A such that a& = 1 and ba = 1. (That is,
fi r /I is a unit if it has a multiplicative inverse b.) Clearly then b is also a
unit; furthermore, this b is unique. (Because if also ac = ca — 1, then
b — lb = (ca)b = c(ab) = c\ — c.)
Proposition 5 Let A be an associative algebra with unit element 1. Then
thr 8ft U of all units in A is a group under multiplication.
Proof; By hypothesis, multiplication is associative, 1 acts as identity and,
by definition, each u € U has a multiplicative inverse. I
Since our multiplication in End(V) is associative (composition is
associativa by itn very definition) we have a group of unite* in End(V). Wo
denote it by GL{V) aud call it the general linear group ov©r V.
20
I. Vector Spaces and Linear Maps
Proposition 6 <\> e End(V) is a unit <$ <j) is an isomorphism.
Proof: => <\> is a unit so there exists V € End(F) so that c/yip = 1 and
ipcf) = 1. Now 0-0 = 1 implies 0 is surjective (and ip is injective) and
i/j(f) = 1 implies 0 is injective (and i/j is surjective) (see exercises).
<= If 0 is an isomorphism, take i/j = (p'1 to see that 0 is a unit. I
Thus, GL(V) consists of those endomorphisms of V which are actually
isomorphisms of V.
Exercises
(1) Let V —► W be linear. Show that ker 0 is a subspace of V
and 4>{V) is a subspace of W.
(2) Give specific linear maps 0, i/j : R2 —> R2 such that 0-0 ^ i/j(f).
(3) Let F -^-> W -^-> F be linear maps such that V ^ V is an
isomorphism. Show that <\> is injective and i\) is surjective.
(4) A linear map V —► V is idempotent if pp = p. Show that if p
is idempotent then p acts as the identity on p(V). (Such linear
maps are called projections: p projects V onto p(V).)
(5) Show that <\>: R2 —> R2 given by 0(x, ?/) = (x, x + y) is linear
and so is ip : R3 —> R given by i/j(x,y,z) = z. Show that
0 : R2 -+ R given by 0(x, ?/) = xy is not linear.
(6) Decide whether or not <\> : R2 —> R2 given by 0(x, y) = (x +
?/, 2x—y) is an isomorphism. If it is, find a formula for (f>~1(u^ v)
and show that <\> o 0~1(u, v) = (u, v) and 0"1 o 0(x, y) = (x, y).
(7) If 0 : V —> W is linear, show that (f)(—v) = —</)(v) for any
(8) If 0:R2-+R2 is defined as in exercise (6), what is p(0) where
p(x) = x2 - 2x + 1?
C. Bases, Dimension
Let V be a vector space over a field k. For a (finite) linear combination of
vectors
ai^i + «2^2 + ... + amt;m (ai € fc, vt E V)
we make a convention: namely, if i ^ j then ^ ^ Vj. That is, vectors with
different subscripts must be different vectori. W© can always put a finite
C. Bases, Dimension
21
linear combination in this form — if a given vector occurs two or more
times, we sum up its coefficients and then write it just once with this sum
as coefficient.
Definition Let S be a subset of V. The set S is linearly independent
if no two distinct finite linear combinations of elements of S can be equal
vectors.
For example, if S = {u, v, w} is linearly independent and
au 4- bv + cw = a'u + b'v 4- cfw,
then it must be that a = a', 6 = b', c = c'. That is, if we don't have a = a',
b = 6', and c = c', then
au -f bv 4- ciu and a'u 4- b'v 4- c'w
must be different vectors in V.
Lemma If S is linearly independent and
aiVi 4-... 4- amVm = 0 (vi 's in 5),
Men a\ = a2 = ... = am = 0.
Proof: We have
ai^i 4-... 4- amvm = 0 = (toi 4-... 4- 0vm. I
Definition If S C V is not linearly independent, then it is said to be
Hnmrly dependent
Proposition 7
(i) IfOeS, then S is linearly dependent.
(ii) If S is linearly dependent and S C T C V, £/ien T zs linearly
dependent.
(iii) // 5 25 linearly independent and U C S C V, then U is
linearly independent.
Proof:
(i) 1 • 0 = 0 • 0 (two distinct, but equal, linear combinations).
(ii) We have a\V\ + ... 4- amvm = 0, with v% £ S and not all a*
being zero. Clearly we have the same relation in T since all
Vi € T,
22
I. Vector Spaces and Linear Maps
(iii) This is the contrapositive of (ii). I
Suppose S = {^i,..., vp} is a finite set of vectors in V. How can we
proceed to determine if S is linearly dependent or independent?
If vi = 0, we are done (5 is linearly dependent). If V\ ^ 0, then its
span is a nonzero subspace of V.
Span(^i) = {tv\\t £ k}.
If any of i?2,..., vp lie in this span (e.g., Vj = to^i), then {vi,Vj} is linearly
dependent, and hence so is 5. If none of i?2,..., vp are in Span(vi), consider
Span(vi,V2). Show that if one of V3,..., vp is in Span(vi, V2), then S is
linearly dependent. If not, look at
Span^i,^,^)-
If any of V4,... , vp lie in this span, we have that S is linearly
independent, etc. If vp is not in Span(vi,..., vp_i), we have that S is linearly
independent.
Definition A set T is infinite if T contains a subset U which can be put
into a one-to-one correspondence with N = {1,2,3,...}. (Of course, we
may have U = T.) If T is not infinite it is finite.
Definition A vector space V is infinite-dimensional if it contains an
infinite linearly independent subset. Otherwise V is finite-dimensional.
Equivalently, we can define V to be finite-dimensional if it is spanned by
some finite subset.
Examples
(1) R°° = all sequences of real numbers (Example (2) of IA after
Proposition 1). This vector space is infinite-dimensional. For
let
S = {(1,0,0,...), (0,1,0,0,...), (0,0,1,0,...),...}.
Then S is infinite (let (1,0,0,...) <-+ 1, (0,1,0,...) <-+
2, (0,0,1,0,...) <-► 3, etc.) and linearly independent.
(2) R^ is finite-dimensional. Indeed, any three vectors in R^ form
a linearly dependent set. (See (i) of Theorem 1 later in this *
section.)
Definition A set S in a vector space V is a basi$ for V if:
Span(S) = V; and
S ie linearly indepsadent.
C. Bases, Dimension
23
Example
Theset{ei = (l,0,...,0),e2 = (0,l,0,...,0),...,en = (0,0,...,l)}is
a basis for Rn or, more generally, for kn (called the standard basis for kn).
Proposition 8 S CV is a basis for V O Each element ofV is a unique
linear combination of elements of S.
Proof: => Since S spans F, any v G V is a linear combination of elements
of S. Since S is linearly independent, this linear combination must be
unique.
<= The hypothesis implies both spanning and linear independence. I
Proposition 9 Let {vi,..., vn} be a basis for V, (defined over a field k).
If W is a vector space over k and w\,..., wn are any n vectors in W, then
there exists exactly one linear map
4>:V-*W
audi that 4>{v%) = wi for i = 1,..., n.
Proof; v G V may be uniquely written as
v = aiV\ + ... + anvn,
m If <f> is to be linear we must have
(6) . (ft(v) = ai(j)(vi) + ... + an(j)(vn) = aiwi + ... + anwn.
Conversely, if 4> is defined by (5), it is linear and 4>(vi) = Wi. I
Proposition 10 If A = {i?i,... ,i?m} spans V, then some subset of A is a
hamx for V.
Pivof: Let A be the set of all linearly independent sets in A. Let n be the
maximal number of elements in any set in A. By changing notation, we
rimy asnume that: {^i,..., vn} is linearly independent, but if n < i < m,
ili^u {V\,..., vn, Vi} is linearly dependent. Thus we have a relation
aivi + ... + anvn + alvl = 0
Willi not all coefficients zero. If a* = 0, we contradict the linear inde- *
puitc'Umcn of {v[,...yVn}. So we can solve for V{ (vi = —^v\ — ... -
|*Hj„) to nhow that v% € Span(vi,... ,vn). Thus vn+i,... ,vm are all in
gpaii(ti] f,.., vn) so {vi,..., vn} is a basis for V. I
24
I. Vector Spaces and Linear Maps
Theorem 1 Let {^i,..., vn} be a basis for V. Then
(i) any wi,... ,wn, iun+i in V are linearly dependent, and
(ii) any U\,..., itn-i fail to span V.
Proof: (For (i) we really just need that {^i,..., vn} spans V.) We have
w\ = anVi + ... + ainvn
wn+! = an+i,ii;i 4-... 4- an+i)nvn.
If w\ = 0, we are done. If w\ ^ 0, some a\j must be nonzero and we
can solve the first equation for Vj (in terms of the other v's and wi) and
plug this into the remaining n equations.
The second equation has w\,W2 and all but one of the v's in it. If
W2 = 0 or if wi and w<i have the only nonzero coefficients, then {wi, W2} is
linearly dependent. If not we can solve for some v and plug this into the
remaining n — 1 equations.
By now the pattern of the argument should be clear. Either {wi,W2, ^3}
is linearly dependent or we can solve for some v and plug that into the
remaining equations. After n steps we run out of v's and have a nontrivial
linear dependence relation among wi,... ,wn+i-
The argument for (ii) is similar. If iti,... ,itn-i did span V we would
have
vi = cntti 4 ... 4ci5n_ittn_i
Vn = Cni^i 4 ... + Cn>n_iUn_i.
Now fi 7^ 0, so we can solve the first equation for some it and plug in the
other equations. We wind up with a nontrivial linear dependence relation
among vi,..., vn , contradicting the hypothesis. I
Prom Proposition 10 and Theorem 1, we see that any finite-dimensional
vector space V has a finite basis and that any two bases for V must have the
same number of elements. In fact, any infinite-dimensional vector space V
also has a basis (called a Hamel basis) and any two such bases for V can be
put into one-to-one correspondence; we will show existence of bases after
we have discussed Zorn's Lemma in section IF.
Definition The number of elements in a basis for the finite-dimensional
vector space V is the dimension of V, written dim V.
C. Bases, Dimension
25
Proposition 11 Let W be a subspace of the finite-dimensional vector space
V. Then any basis for W can be extended to a basis for V.
Proof: Let {wi,... ,wp} be a basis for W. (This must be finite or we
would have an infinite linearly independent set in V.) Then {wi,..., wp}
is linearly independent in V. If Span(u>i,..., wp) — V, we are done (since
Span(u>i,..., wp) = W so V = W).
If not, we take vi,..., Vk in V — W so that
Span(wi,...,Wp,t;i,...,t;jfe) = V.
We consider linearly independent subsets of {u>i,..., wp, t>i,..., Vk} and
proceed, as in Proposition 10, to get a basis for V containing {u>i,..., wp}.
i
This clearly implies dim W < dim V.
Now suppose V and W are finite-dimensional vector spaces over a field
k) and
4>: V-+W
is a linear map. We have seen that the kernel of <f>
ker (f) = {v e V\<t>(v) = 0}
1§ a subspace of V and the image of 0
im cf) = {(f)(v)\v e V}
i» H iubspace of W. We now prove an important relation involving dimen-
llf)IIN. •
Thaorem 2 (The Rank Theorem)
dim V = dim (ker 0) + dim(im 0).
Proof: Let {i>i,..., vr} Q V be a basis for ker 0, and {u>i,..., u>s} C W
bp a, basis for im <\>. Choose tti,..., us e V such that <j>{ui) = wi (we can do
thin by tho definition of im 0 = 4>(V)). We claim that {t>i,..., vr, tti,..., tts}
ii a basis for V. This will prove the theorem.
First we will show {i>i,..., vr, tti,..., its} is linearly independent. Sup-
\umv ai?^ «f \~arvr + b\Ui -\ \-bsus = 0. Since vi,... ,vr are in ker 0,
fhip Implies
c/)(aifi + ... + arvr + 6itti + ... + bsus) = 0(0) = 0
0(&iui + ... + b8u8)
b\<t>(ui) + ... + ba<l>(ua)
II
hw% + .,, + 6i^i.
26
I. Vector Spaces and Linear Maps
Since wi,... ,ws are linearly independent in W,
hiw\ + ... + b8w8 = 0
implies 61,..., bs are all zero. But then
a\V\ + ... + arvr = 0
so a\,..., ar are all zero. This proves
{vi,...,vr, uu...,us}
is linearly independent.
It remains to prove this set of vectors spans V. Let x G V. Then
<t>(x) e 4>{V) so
(f)(x) = tiWi + . . . + tsWs.
Set u = t\Ui +... + tsus. Since (f>(ui) — W{, we see that (f)(x—u) = 0. Thus
x — u is in ker 0 and is a unique linear combination of v\,..., vr
x — u — rri\V\ + ... + mrvr , so
x = miVi + ... + mrvr + t\Ui + ... + tsus. I
The rank theorem has two important corollaries.
Corollary 1 Let V he a finite-dimensional vector space and (j): V —> V be
a linear map. If (f> is monic, then <\> is an isomorphism. If <\> is epic, then
(f) is an isomorphism.
Proof: If (f> is monic, the rank theorem implies that dim (f>(V) = dim V, so
4>(V) = v.
If (f> is epic, 4>{V) = V, and the rank theorem implies that dim (ker (f>) =
0. Thus (f) is monic. I
Corollary 2 Let V and W he finite-dimensional vector spaces of the same
dimension. Then V = W.
Proof: Let {vi,..., vn} be a basis for V and let {wi,..., wn} be a basis
for W. By Proposition 9, there is a linear map 0 : V —+ W such that
<t>(vi) = W{ for i = 1,... ,n. Note that 0 is epic since {wi,... ,wn} spans
W.
Now the rank theorem implies that ker (j) — {0}, so <\> is also monic.
Hence (j) is an isomorphism. I
Corollary 2 says that any n-dimensional vector space V over the field k
is essentially kn since we get an isomorphism
V^kn
once we choose a basis for V (we already have the standard basis for fen).
C. Bases, Dimension
27
Exercises
(1) Check that
a: N-+N
defined by a(n) = 2n gives a one-to-one correspondence
between the set of all natural numbers and the set of all even
natural numbers. Use this to prove that:
If a set S is infinite, then it can be put in a one-to-one
correspondence with a proper subset of itself.
(2) Determine whether or not {(1,1,0), (2,0,-1), (-3,1,1)} is a
basis for R .
(3) Let (f> : V —> W be linear. Suppose that vi,..., vp G V are
such that (f>{v\),..., 4>{vp) are linearly independent in W. Show
that vi,..., vp are linearly independent.
(4) Let (f> G End(V) for a finite-dimensional vector space V. Prove
that
cf) is monic <3> (f) is epic <3> <\> is an isomorphism.
(5) Show that Pn = {polynomials with real coefficients of degree
< n} is an (n + 1)-dimensional subspace of the infinite-dimen
sional vector space of all real polynomials.
(6) Let Vbea vector space over a field k and let £/, W be finite-
dimensional subspaces of V. Prove that both U + W = {u +
w\u G i7, w G W} and U D W are finite-dimensional subspaces
of V, and
dim(U + W) + dim(U n W) = dim U + dim W .
(Mint: We can find bases:
{vu...,vp} for t/HW
{fi,... ,fp, ui,.. .,uq} for 17
{vi,...,vp, wi,...,iur} for W.
■ Show that {v\,... , i>p, ui,... , ug, u>i,... ,uv} is a basis for
U + W.)
(7) Show that the set of real numbers R is a vector space over the
rational numbers Q. Show that R is not finite-dimensional as
a vector space over Q.
28
I. Vector Spaces and Linear Maps
(8) Show that the set of complex numbers C is a two-dimensional
vector space over R.
(9) Define two subspaces C/,y of a vector space W to be
complementary if (i) U fl V = {0}, and (ii) U + V = W. Prove that
[7,V are complementary <£=£► each w G W may be uniquely
represented as w = u + v, u €U, v eV.
(10) Let {ei,e2,e3} be the standard basis for R^ and defiine (f> :
R3 -> R by <£(ei) = 1, 0(e2) = 2, <£(e3) = -1. Determine the
subspaces ker 0 and im 0, and verify the rank theorem in this
case.
(11) (f) : V —> V is nilpotent of order 2 if (fxf) is the zero endomor-
phism. Now composition of two such endomorphisms need not
be nilpotent of order 2. Find <\>,ty : R^ —> R , each nilpotent
of order 2, whose composition is idempotent.
D. Direct Sums, Quotients
In this section we discuss a generalization of the way we constructed the
vector space R^ from R. If V, W are vector spaces over a field fc, we will
construct a new vector space V 0 W called the (external) direct sum of V
and W. The set of vectors in V 0 W is just the set of all ordered pairs
(v,w) with v EV and w € W. We add two such pairs coordinatewise
(v, w) + (x, j/) = (v + x, w + y)
and for a scalar r £ k we define
r(v, t/;) = (rv, rt/;).
The fact that these operations make V 0 W into a vector space is trivial.
It satisfies the axioms for a vector space since V and W do.
The zero vector in V 0 W is (0,0), the first vector being the zero in V,
the second that in W.
Just as we can include R into R^ as either the x-axis or y-axis, and
we can project R^ onto either axis, we have four naturally defined linear
maps.
D. Direct Sums, Quotients
29
V
V
V®W
w
w
k(v) = (v,0)
i2(w) = (Q,ty)
Pi(x,y) =x
p2{x,y) =y
W© call ii^2 inclusions and pi,P2 projections. Clearly pi o^ is the identity
map on V and p^ o i2 is the identity map on W. It follows that i\,i*i are
Itljocfcive (linear maps) and Pi,P2 are surjective (linear maps). Note also
thai P2 oil and pi oi2 are zero maps. Actually, the image of i\ is the kernel
of pa and the image of i<i is the kernel of pi.
Proposition 12 If {vi,... ,vn} zs a feasis for V and {wi,... ,wm} is a
buni* for W then
{(vu 0),..., (vn, 0), (0, iyi),..., (0, wm)}
U a basis forV ®W.
Pwof; Given (v,w) <G V © W we have
v = aivi + ... + anvn and
w = biWi + ... + bmWm uniquely.
Tlinn (u,u/) = ai(vi,0) + ... + an(vn,0) + bi(Q,wi) + ... + bm(Q,wm), also
utihjudy I
CWnllary dim(V © W) = dimF + dim W
NpmI wn consider when a vector space is, "somehow," the direct sum of
iWn of Itw Nuhnpaces. So suppose A and B are subspaces of the vector space
V (6vt»r #). Then A and 5 are vector spaces over k and we can construct
A$ B SB above. We use the operation of vector addition in V to define a
Hi up
rj: A&B-+V,
tfUhb) — aH-6,
30
I. Vector Spaces and Linear Maps
Proposition 13
(i) 7] is a linear map.
(ii) rj is infective O Af)B = {0}.
(iii) rj is surjective <3> A\J B spans V.
Proof:
(i) rj(r(ai,bi)+s(a2,b2)) = v(rai + sa2,r61 + 562) by definition of
operations in A ® B. By definition of 77 this is
rax + sa2 + rbi + sb2 = n/(ai, &i) + srj(a2,b2),
proving rj is linear.
(ii) <= We suppose Af)B = {0} and 77(0,6) = 0. That is a + 6 = 0.
This implies a = —6 G B and 6 == —a G i so a = 0,5 = 0.
=> (Contrapositive proof) Suppose c e An B. If c 7^ 0, we
have ry(c, 0) = c and 77(0, c) = c but (c,0) ^ (0,c). Thus 77 is
not monic.
(iii) <= Given v G V, it is a finite linear combination of ai,..., a^
in A and 61,..., 6m in £?. Add up (with coefficients) the part
in A to get a G A and similarly get b e B. Then
77(a, 6) = a + 6 = 7j.
=> This is obvious. I
Definition When 77 (above) is an isomorphism, we say that V is an
internal direct sum of A and B.
By Proposition 13, we conclude that V is the internal direct sum of A
and B exactly when we can express every v G V uniquely as v = a + 6
where a G A, 6 G £?.
Examples
(1) R^ is the internal direct sum of
A = {(x,0)GR2} and
B = {(0,7/) GR2}.
(2) R^ is the internal direct sum of A (as in (1)) and C = {(#, x) G
R2} because A n C = {0} and A U C spans R2. (Given
(p, (?) G R , we have
(?,«) = (P-0.O) + («,g).)
£,4 eC
D. Direct Sums, Quotients
31
There is some useful terminology for situations where we have three or
more vector spaces and linear maps like, e.g.,
(6) U -^ V ^-* W.
Definition We say the sequence (6) is exact at V if
imp = ker a .
Let us denote the trivial ( = one-element) vector space by 0. Then if
wo assert that
in exact at {/, we are simply asserting that
p is monic.
Ill Hilary, if we assert that
1« ixact at W, we are simply asserting that
a is epic.
Definition If
(T) . 0->U-*>V-Z>W-+0
li M\ exact nequence of vector spaces (i.e., exact at {/, at V, and at W), we
nay that (7) .spZite if there is a linear map 7 (called a splitting)
v 2- w
Stir11 that (7 © 7 in the identity on W.
Theorem 3 For finite-dimensional vector spaces, the exact sequence (7)
fjMf * MplU,
Pft«»/; OIioohc a basis (2:1,..., zm} for W. By Proposition 9, given any
m v« turn |/i,..., j^m in V there is exactly one linear map W —> V sending
*i to fy( i -*= I,..., m, So if we choose
Vi € flr"1^) i = l,...,m
(which Wfc can do since a is epic) and we set
7(*«) s M »
32
I. Vector SpacciH and Lincwir Maps
then the resulting linear map clearly has a o 7 = id . I
Corollary Given the exact sequence
of finite-dimensional vector spaces and any splitting 7 : W —► V, then V
is represented as an internal direct sum
V = p{U)®y{W).
Proof: Let {a?i,..., xn} be a basis for U and {^i,..., zm} be a basis for
W. Since p(U) f)j(W) = {0} (by exactness at V), it will suffice to show
that
{p(xi),..., p(xn),7(*i),..., 7(^m)}
is a basis for V. Since p is injective, {p(a?i),... ,p(xn)} is linearly
independent in V. Also {7(^1),...,7(^m)} is linearly independent in V
(since their images under a are linearly independent in 7(W^)). Since
p(U)r)-f(W) = {0}, {p(xi),...,p(xn), 7(zi),...,7(zm)} is linearly
independent in V.
By the rank theorem, dim V = n + m so that set is indeed a basis. I
Quotients
Let V be a vector space over a field k and let W be a subspace of V.
We use W to define an equivalence relation on V. Namely, for v,w G V,
we define v ~ w if v — w E W. Now
(i) v ~ v, since v-u = 0G W.
(ii) v ~ w => w ~ v, since v — w G W =^ w — v = — (v—w) G W.
(hi) v ~ w and w ~ u => v ~ it, since v — u = (v — w) + (w — u)
and VF is closed under vector addition.
Thus ~ is an equivalence relation.
The equivalence relation ~ divides V into disjoint equivalence classes
which are called cosets.
Proposition 14 Let C be any one of these equivalence classes and let v
be any element of C. Then
C = W + v = {w + v\w G W}.
D. Direct Sums, Quotients
33
Proof: By definition
C = {u\u ~ v} = {u\u — ve W}.
For such a u, u — v = w E W so that u = w 4- v. So the proposition is
clear. I
Thus each equivalence class is just UW translated by adding a fixed
vector to each element of W." We have a schematic picture
We denote the collection of equivalence classes by V/W and will now
make V/W into a vector space, called the quotient space of V modulo W.
Definition Let Ci, C% be two equivalence classes. Choose any v G C\ and
any w G C% and define C\ + C<i to be the equivalence class containing the
vector v + w. For any equivalence class C and any r G k we choose any
u G C and define rC to be the equivalence class containing the vector ru.
Proposition 15 These operations are well-defined, i.e., they are
independent of our choices.
Proof: Let v' G C\ and w' G C2. We need to show that the class containing
v + w is also the class containing v' + w'. But v — v' = w\ G W and
w — w' = W2 G W, so
(y + w) — (v' 4-1//) = wi 4- W2 £ W"-
Thus addition is well-defined. If v! (as well as u) is in C, then u — u' — w
and ru — ru1 = r(ii-u') = ru; € W, so scalar multiplication is well-defined.
1
34
I. Vector Spaces and Linear Maps
Proposition 16 With these operations V/W becomes a vector space over
k.
Proof: Exercise. I
Exercise. If dimF = n and dimVF = m, show that d\mV/W = n —
m. (Hint: Let {wi,... ,wm} be a basis for W and extend this to a basis
{wi,..., wm, i>m+i, • • •, vn} for V. If Vi denotes the coset W + Vi, show
that {vm+i,..., vn} is a basis for V/W.)
Proposition 17 Let r\ : V —> V/W be the function obtained by assigning
to v G V the equivalence class r](v) G V/W to which v belongs. Then r\ is
a linear map (which is clearly epic).
Proof: r](av 4- bw) is the equivalence class to which av 4- bw belongs. This
is (by definition of 4- in V/W) the sum of the classes containing av and bw,
and this equals (a times the class of v) 4- (b times the class of w). That is,
r](av 4- bw) = arj(v) 4- br)(w),
proving linearity. I
Let U, V be vector spaces over k and 0 : U —> V be a linear map. We
know that ker 0 is a subspace of U and (j>(U) is a subspace of V (and hence
is a vector space).
Theorem 4
ker <
4>(U).
Proof: We have
U
V
u
ker <f>
where i/j is defined as follows: for u € U,rj(u) =u + ker (f> and we set
i/>(ri(u)) = (f>(u).
We claim: if; is well-defined, ^ is linear, ^ is monic, and
U
$
ker ^
= 4>(U).
D. Direct Sums, Quotients
35
If u + ker (f> = u' 4- ker 0, then r](u) = ?/(?/) so ip(ri(u)) = ^(^(u')) and
^ is well-defined.
i/j(ar](u) + bq(y!)) = ip(r](au + buf)) (since r/ is linear)
= (j>(au 4- bu;) (by definition of xjj)
= a<t>(u) 4- b<t>{v!) {(f) is linear)
.j = ai/j(r](u)) 4- bi/j(r](u')) (definition of ip).
Thus ip is linear.
Clearly ^ is monic, for if 0 = i/j(r](u)) = (j>(u), then u G ker 0 and
^) = 0inKih-
Finally, if v = 0(tt) then i/j(r](u)) = v, so that ipL U ) = </>(U). I
Exercises
(1) Show that V is the internal direct sum of the subspaces A and
B <^ every v E V can be written uniquely as v — a 4- b where
a G A, b G B (i.e., A and B are complementary).
(2) Generalize the notions of external and internal direct sums to
three or more summands. (Hint: The characterization of
internal direct sum given in exercise (1) is easier to extend.
Exercise (4), below, shows how to extend the definition given after
Propositon 13.)
(3) HU and W are subspaces of the vector space V, show that
(u + w)/w^u/(unw).
(4) Let Vi,...,Vn be subspaces of the vector space V and suppose
that V = V-i 4- • • • 4- Vn. Show that if Vi n (Vi 4- ■ • ■ 4- Vi-x 4-
Vi+i H V Vn) = {0} for i G {1,..., n}, then V is the internal
direct sum of V\,..., Vn.
(5) Let Wbea subspace of the finite-dimensional vector space V.
Show that there is a subspace U of V such that V = U 0 W.
(Hint: Consider the exact sequence
0 -► W -► V -► V/W -► 0
of finite-dimensional vector spaces.)
(6) Let W bo the subspace of R3 spanned by {(1,1,0), (1,0, -1)}.
Find a Huhnpnco U C R3 such that R3 a [/ © W.
36
I. Vector Spaces and Linear Maps
(7) Let V be finite-dimensional and suppose Vi,..., Vm are sub-
spaces of V. Furthermore, suppose that B{ is a basis of Vi for
each i. Show that V = Vi 0 • • • 0 Vm <=> B = Bx U • • • U Bm is
a basis of V.
The Linear Geometry of R .
In the real vector space R , a line is a coset of a 1-dimensional subspace
of R3 (i.e., any line L is L = v + V where V is a one-dimensional subspace
of R3 and v G R ). Similarly, a plane in R3 is a coset of a two-dimensional
subset of R . A point is a coset of the (trivial) zero-dimensional subspace
{0} of R . The dimension of a coset S is the dimension of the
corresponding subspace.
Note that the collection S of all cosets of R3 is partially ordered by
inclusion.
Proposition ( Prove) Let {Sa}aeA be a collection of cosets in R . Prove
that Qj6Y4 g is either a coset of R3 or is empty.
Definition Let Si,S2 be two cosets in R . Their join is the intersection
of all cosets S G S such that Si C S and #2 Q S, written Si J#2-
Prove the following.
(1) Let V be a subspace of R . Then
v + V = w + V<^wev + V<3>-v + weV.
(2) Ifw + W = v + V, then W = V.
(3) (w + W)J(v + V) =w + [-w + v] + W + V, where [-a + 6]
denotes the one-dimensional subspace generated by the vector
(-a + b).
(4) (w + W)Ci(v + V)^fb&(-w + v)eW + V.
Use (3) and (4) to prove.
Proposition Let S = w + W, T = v + V. Then SnT = 0 <£> dim(5JT) =
dim(TV + V) + l.
Definition w 4- W and v + V are parallel if W C V or V C W. (Note
that a point is parallel to every coset).
Prove the following theorem.
Theorem Let 5, T be two cosets in R . Then
(1) S C T => dim 5 < dimT, and equality implies S m T.
E. Eigenvectors and Eigenvalues
37
(2) IfSnT is not empty, then
dim(SJT) + dim(5 n T) = dim S + dim T .
(3) If S fl T t^ 0, £/ien 5, T are parallel <3> one contains the other.
(4) IfSHT = 0, tfien 5,T are para/ZeZ *»
dim(5 JT) = max{dim 5, dim T} + 1.
Now prove the following theorem.
Theorem
(1) The join of two distinct points is a line.
(2) The intersection of two nonparallel planes is a line.
(3) The join of two lines with a point of intersection is a plane.
(4) The intersection of two coplanar nonparallel lines is a point.
(5) The join of two distinct parallel lines is a plane.
(6) The join of a point with a line not containing this point is a
plane.
(7) The intersection of a plane with a line not parallel to it is a
point.
E. Eigenvectors and Eigenvalues
(Part i)
In this section we are interested in linear maps of a vector space V into
itself (i.e., elements of End(V)).
Definition Let <\> : V —> V be linear. A subspace W of V is said to be
4>-stable if
cj)(W) C W.
Note that if W is 0-stable, then the restriction of 0 to W is a linear
map (f>\ w '• W —> W.
38
I. Vector Spaces and Linear Maps
Examples
(1) Let D be the real vector space of C°° functions from [0,1]
into R. The set E C D of elementary functions are those
representable in closed form (i.e., start with #, logx, e^cosx,
and arccos x, and then construct new functions from these by
using the standard algebraic operations and composition).
It is not hard to verify that E is a subspace of D. We looked
earlier at the two linear maps
e
D ^ D
a
defined by a(f) = /' (i.e., differentiation) and
*(/)(*)= fXf(t)dt.
JO
The subspace E is a-stable, but not 0-stable. By basic formulas
for differentiation, if / is elementary, then so is its derivative /'.
But if, for example, /(t)=y/2 sin2 t+cos2 t, then / is elementary,
but 9(f)(x) = Jq f{t)dt is not an elementary function. (It is
an elliptic function.)
(2) A linear map p : V —> V is said to be idem/potent if pop = p. For
such a linear map, p(V) is p-stable. Indeed suppose y G p{V),
i.e., y = p(v) for some v G V. Then
p(y) = P(p(v)) = p(v) = y
so that p restricted to p(V) is the identity map.
(3) Let (j): R2 -> R2 be denned by
(j){x1,x2) = (xi + x2, xi + x2).
Then the diagonal D = {(x,x) G R } is </>-stable because
</)(x,x) = (2x,2x) eD.
(4) Let D be the real vector space of C°° functions and let 0 :
D -^ D be differentiation. Let W = {rex|r G R}. This is a
one-dimensional linear subspace of D which is 0-stable. Indeed,
(j) is the identity map on W.
The idea of finding 0-stable subspaces W of V is to study (j> by
observing its action on the 0-stable subspaces. The point is that it is easier
to understand linear mappings on lower-dimensional spaces. In particular,
we noted earlier that a linear map a : k —► k (k a field) is eomplH.t*ly
E. Eigenvectors and Eigenvalues
39
determined by knowing its value on one element of fc, e.g., a(l). So we
make the following definition.
Definition A nonzero vector v G V is an eigenvector for 0 : V —> V if
Span(i>) is 0-stable.
A more customary way of phrasing this definition is to say that v 7^ 0
is an eigenvector for <j> if there exists some A G k such that
4>(v) = Xv.
Notice that in this statement the choice A = 0 is not excluded. Thus
every nonzero vector v G ker <\> is an eigenvector for (/>. If v is an eigenvector
for <j> then the (unique) A such that
cf)(v) = Xv
is called an eigenvalue for <j> corresponding to v.
Proposition 18 Ifv is an eigenvector for (j), andr is any nonzero element
of k, then rv is also an eigenvector for (j) with the same eigenvalue as v.
Proof: Let A be the eigenvalue of v. Then
(j)(rv) = r(j)(v) = r(Xv) = X(rv). I
This suggests that, for a linear map <j>: V —+ V, the notion of eigenvalue
may be more fundamental than the notion of eigenvector. So for A G k we
let
V{X) = {ve V\(j){v) = Xv}.
Then each V(A) contains the zero vector, and each nonzero vector in V(A)
is an eigenvector of 0 having A as eigenvalue. Note that V(0) = ker </>. Also
note that each V(A) is a subspace of V called the eigenspace belonging to A.
(Of course, we expect that most V(A) will consist of the zero vector only.)
The dimension of V(A) is the geometric multiplicity of A.
Proposition 19 J/A,/i G k and A^/i, then
v(x)nv(v) = {o}.
Proof: live V{X) n V(ijl), then
(f){v) = Xv = fJLV
ho that
(A - fj)v = 0.
Thus if A # & then v = 0. I
40
I. Vector Spaces and Linear Maps
An Important Example:
There exists a linear map <\>: Rr —> R^ such that V(X) = {0} for each
AeR.
Let ei = (1,0) and e2 — (0,1). Define <\> by <\>(e,\) = e2,4>(e2) = —e\. If
v = aei + be2 is in V(A), then
4>(v) = a0(ei) 4- b(f>(e2) = Xv = X(ae1 4- &e2)
so ae2 4-6(—ei) = Aaei 4-A6e2-
Thus Afr = a and Aa = -6. So A(A6) = Aa = -b. Since A2 ^ -1 in R
we must have 6 = 0. But then v = aei and 0(f) = a</>(ei) = ae2 = Aaei
implying that a = 0 as well as 6 = 0. So t> = 0. The point is that <j> rotates
the plane R^ by 90° and the only vector which goes to a scalar multiple
of itself is the zero vector.
Let (j) e End(F) and V 6 GL(V) (C End(F)).
Theorem 5 If v is an eigenvector for (j) with eigenvalue X, then ij)(y) is
an eigenvector for ijxfrip"1 with eigenvalue A.
Proof: ^0^""1(^(t;)) = ip(j)(v) = i/j(Xv) = Xi/j(v). I
Observation:
Suppose V has a basis {t>i,..., vn} consisting of eigenvectors of <j>. If
Ai,..., An are the corresponding eigenvalues, then <j> is simply described.
If v = a\V\ 4-... 4- anvn then
(j)(v) = aiAiVi 4-... 4- anXnvn.
Exercises
(1) Show that V(A) is, indeed, a subspace of V for each A G k.
(2) Let {ei, e2} be the standard basis for R . Find the eigenvalues
and corresponding eigenvectors for <j> : R^ —> R^ denned by
(j>(e\) = 2ei 4- e2, 4>{e2) = 2ei 4- 3e2. What is the geometric
multiplicity of each eigenvalue?
(3) Let {ei, e2, 63} be the standard basis for R . Find the
eigenvalues and corresponding eigenvectors for <\> : R** —> R** defined
by 4>{ei) = ei, 4>{e2) = ei + e2, 0(e3) = e3. What is the
geometric multiplicity of each eigenvalue?
(4) Recall that we defined <j>: R —> R in the important example
by <j>(e\) = e2, ^(^2) = —ei. Now suppose that we consider
{ei, 62} as a basis for C2 and define <\> : C2 —> C2 by 0(ei) = 62,
0(^2) = —ei- Does this change matters? That is, does <f> have
any eigenvalues now, and if so, what are the corresponding
v{xys?
F. Dual Spaces
41
(5) Given 0 G End(F), show that 0 is an eigenvalue for 0 <=>
ker 0 ^ {0}.
(6) Suppose that A is an eigenvalue for the isomorphism 0 G End(F).
Show that A-1 is an eigenvalue for (f)~1.
(7) Given 0, i\) G End(F), show that 0-0 and -00 have the same
eigenvalues. (Hint: consider the cases A = 0 and A 7^ 0
separately)
(8) Suppose that the vector space V is the direct sum of the sub-
spaces Vi,..., Vm. Also, suppose that we have endomorphisms
01, • • •, 0m where fa G End(V^). Given v G V, we can write it
(uniquely) as v = V\ H h vm where Vi G Vi. If we define
0 = fa © • • • © 0m
by fav) = fa(vi) + • • • + 4>m(vm), show that 0 is an endomor-
phism of V. Furthermore, show that each Vi is 0-stable and
that 0| v; = 0i-
(9) Suppose we have 0 G End(F) and 0-stable subspaces Vi,..., V"m
such that F = Vi © • • • © Vm. Show that 0(F) = faV{) © • • • ©
0(^m).
F. Dual Spaces
Let V be a vector space over a field fc, and let V* be the set of all linear
maps
0 : V -* ft.
Then F* becomes a vector space by defining
(rfa)(v) = r0(i>) and
(0 + ^)(v) = 0(v)+^(v)
for r G ft and 0, -0 G V*.
Definition This vector space V* is called the dual space of V, and its
elements are called junctionals.
Examples
(1) Let V = fcn and let 0 : fcn —> fc be projection on the ith factor;
i.e.,
0(a?i,...,a?n) = a:< •
Than 0 in a functional.
42
I. Vector Spaces and Linear Maps
(2) Let V be finite-dimensional and choose a basis {^i,..., vn} so
that v G V can be written as v = £it>i H \- xnvn for xi G k.
Then for any (ai,..., an) G fcn we get a functional defined by
0(v) = aixi H h anxn .
In Proposition 20, we shall see that the converse is also true.
That is, any <j> G V* can be written in this form once we choose
a basis for V.
(3) Let V = C([0,1]) be the vector space of real-valued continuous
functions on [0,1]. Then
4>{f) = / f(t)dt
Jo
is a functional.
(4) Let V be any vector space (possibly infinite-dimensional) and
let B be a basis for V. If we choose v e B, then there is a
unique functional v* such that v*(v) = 1 and v*(w) = 0 for
any w e B — {v}.
Exercise. Verify these examples.
Proposition 20 If V is finite-dimensional, then dim(F*) = dim(F) so
thatV^V*.
Proof: Choose a basis a = {t>i,..., vn} for V and define vj,..., t>* G F*
by
<(^) = <^ .
We claim that a* = {vj,... ,i>*} is a basis of V* (called the dual basis of
a).
First, a* spans V*. For if 0 G V*, then we have
0 = ^(viK + • • • + 0KX
(just apply both sides of this expression to any vt to get (j){vl)).
Next, a* is linearly independent. For if we have scalars ci,..., cn such
that
civj + • • • + cn< = 0eT,
then 0 = {c\v\ + • • • + cnv^)(vi) = ca for each z.
We get an isomorphism i/ja : V —> V* by defining
F. Dual Spaces
43
We remark that the isomorphism V = V* depends strongly on our
choice of basis (i.e., a different basis /3 for V would lead to a different dual
basis j3* and a different isomorphism -0/3)• In fact, there is no obvious way
to identify V to V* unless we choose a basis.
Since we use the term "dual space", it seems reasonable to ask if,
somehow, V is also the dual of V*. We will see that when V is finite-dimensional,
we have a natural way to identify V with the dual space (V*)* of V*. Here,
"natural" means°"basis free." That is, our isomorphism of V onto (V*)*
will not require us to choose a basis. We still have a natural map of V into
(V*)* when V is infinite-dimensional, but it is only monic in this case.
We now define our map
p:V-+(V*)*
as follows. Given v G V, we want p(v) to be a functional on V*. So given
4> G V*, we define
p{v){(j)) = (j){v) .
Exercise. Show that p(v) is, indeed, a functional on V*.
Proposition 21 p : V —> (V*)* is a linear map.
Proof: Choose v±, t>2 G V and r G k and </> G V*. Then we have
p(rvi)(0) = (j){rvi) = r0(vi) = rp(vi)(0)
and
p(vi H- V2)(0) = 0(vi H- V2) = 0(vi) H- 0(^2) = p(vi)(0) + p(v2){(j))
= (P(V1) + P(V2))(0).
Since 0 is arbitrary, we must have
p(rvi) = rp(vi) and p(vi + V2) = p(vi) + p{v2) . I
We are going to show that the linear map
p : V -► (F*)*
is monic (even when V is infinite-dimensional). This will use a logical
principle known as Zorn's lemma, so we make a slight detour to discuss
that.
A nonempty set S is partially ordered if for some pairs (a, b) G S x 5,
there is a relation, written a <b, which satisfies the properties
(1) a < a,
(2) if a < 6 and 6 < c, then a < c, and
44
I. Vector Spaces and Linear Maps
(3) if a < b and b < a, then a = b.
The elements a,b E S are said to be comparable if either a < b or b < a.
Note that we do not require all pairs of elements to be comparable in a
partially ordered set; however, if a, b G S are always comparable, then S is
simply ordered (or linearly ordered or totally ordered).
A partially ordered set S has a maximal element ao if a < ao for all
a G S which are comparable to ao- An upper bound for the nonempty
subset T C S is an element bo £ S such that b < bo for all b eT.
Zorn's Lemma: Let S be a partially ordered set, and suppose that each
simply ordered subset of S has an upper bound in S. Then S contains a
maximal element.
Now let V be a vector space over the field k. We will use Zorn's lemma to
show that any nonzero element uG^is part of a basis for V. (In particular,
V always has a basis, as claimed earlier.) Let S be the collection of all
subsets of V which are linearly independent and contain v. We partially
order S by inclusion: for a, b G S we write a < b to mean a C b.
Now let T be a simply ordered subset of S, and let U be the union of all
the elements (sets) in T. Clearly v G U, and we claim that U G S, which
will show that U is an upper bound for T. So we just need to show that U
is a linearly independent set. Suppose m,..., um are elements of U with
a\U\ + • • • + dmUm = 0 where a\,..., am G k. Then, since all U\,..., um
are elements of some r E T, and every r G T is a linearly independent set
in 5, we must have all ai = 0.
Thus 5 satisfies the hypotheses of Zorn's lemma, and we conclude that
S has a maximal element So G 5. Now s0 is a linearly independent set in
V and it contains v. If $o does not span V, we can choose w e V outside
the span of So- But then Sq U {w} is a linearly independent set containing
v, contradicting the maximality of s0. Thus we have proven
Theorem 6 Let V be a vector space over k and suppose v G V is any
nonzero element. Then V has a basis containing v as one element.
Corollary Given the nonzero element v G V, there exists a linear map
4> :V —► k such that 4>(v) — 1.
Proof: Take a basis for V containing v and apply example (4). I
Proposition 22 p : V —► (V*)* is monic.
Proof: Suppose p is not monic. Then for some nonzero v € V, we have
p(v) = 0. This means p(v) is the zero map V* —> fc; that is, for any <f> € V*
we have p{v){4>) = 0(v) = 0. But, by the corollary, some 0 mapi V to 1, a
contradiction. 1
F. Dual Spaces
45
Proposition 23 IfVis finite-dimensional, then p is an isomorphism.
Proof: We know that dim V = dim(V*) = dim((V*)*), so the monic map
p must also be epic. I
Exercises
(1) Show that the set of integers Z equipped with the standard
order is simply ordered but has no maximal element.
(2) Consider the subset of R2
S = {(xux2)\x2<0}
and define a relation on S by (^i,^) < (2/1,2/2) <=> #1 = V\
and X2 <V2- Show that S is partially ordered by this relation.
Show also that S has infinitely many maximal elements.
(3) Given the basis {(1,0,0), (1,-1,0), (2,0,1)} of R3, find its
dual basis.
(4) Generalize Theorem 6 as follows: let V be a vector space over
k and let A be a linearly independent subset of V. Show that
A can be extended to a basis for V.
(5) Let W be a subspace of the vector space V. If 0 G W*, show
that we can find a <j> G V* such that <\>
= 0.
w
(6) Given the subset W of the vector space V, we say that </> G V*
annihilates W if <j>(w) = 0 for every w G W. We call
A(W) = {(j) G V* I (j) annihilates W}
the annihilator of W.
Show that A(W) is a subspace of V*. Show also that if W C
W, then A(W0 C A(W')-
(7) If W is a subspace of the finite-dimensional vector space V,
show that
W* ^ F*/A(W) .
Conclude that dim(A(W)) = dimF - dim W\ (Hint: Define a
map
ip : V* -* W*
by ^»(0) =01^ and show that ip is surjective with kcr ip =
A(W), Now apply Theorem 4.)
46
I. Vector Spaces and Linear Maps
(8) Use exercise (7) to show that W = A(A(W)). Note that we
must identify V and (V*)* for this to make sense. (Hint: First
show that W C A(A(W)).)
(9) Let U and W be subspaces of the (not necessarily
finite-dimensional) vector space V. Show that A(U + W) = A(U)f)A(W).
If V is finite-dimensional, show also that A(U f) W) — A(U) +
A(W).
(10) If V = U © W, show that V* = A{W) 0 A(U).
Chapter II
Matrices and
Determinants
A. Matrices
We have already noticed that a linear map <j> : k —> k is completely
determined by its value on one element of fc, e.g., on 1. Also we noted that a
linear map
(j): k2 -> k
is determined by 0(1,0) and 0(0,1). We will now see how a linear map
(j)'.V -+ W
(V, W being vector spaces over k of dimensions n, m) is determined by nm
elements of k.
Let {t>i,..., vn} be a basis for V and {w\,..., wm} be a basis for W.
Then (by Proposition 8 of Chapter I) each <j>{v\) is a unique linear
combination of wi,..., wm. We write this as
(1) <j>(v%) = anwi + ai2w2 H h aimwm .
Also, we know that <\> is uniquely determined by the n vectors (in W)
</>(i>i),..., <j>(vn). Thus the rectangular array
(2) M{4>)
of nm elements of k determines 0, as long as we know the bases {v\,..., vn },
{wi,..., wm). Thta rectangular array of elements of & is called a matrix
( an
Q>21
\ dnl
«12 •
&22 •
^n2 •
• • 0>lm \
• • A2m
&nm /
48
II. Matrices and Determinants
and the one shown has n rows and m columns', it is called annxm matrix;
and M((/>) is called the matrix representation of <j> with respect to the bases
{vi,..., vn} and {wi,..., wm}. We sometimes write
M(4) = (<*y)
meaning that we will denote the element in the ith row and jth column by
aij, and we call it the i, j entry of (a^).
With the given bases for V and W and the matrix M(<j>) we can describe
<j> quite explicitly. Given x G V, we have
x = X1V1 H h £n^n uniquely.
Since 0 is linear, we have
(j){x) = xi0(vi) H h x„0(v„).
Then, by using (1), we have
(j)(x) = xi(aniyi H h aimwm) H h x„(oniiui H h anmwm)
= {xian + x2a2i H h xn«ni)^i 4- • • •
4" v^i^i7?T, 4 • • • 4* xnanrn)wrn .
In particular, we see that the coefficient of Wj in the unique expression for
(f)(x) is
(3) yj = xxaxj 4- x2a2j H h xnanj.
This is the element of k obtained by multiplying the row (a?i, x2,... #n) by
/ aij \
a2j ,
term-by-term and then adding up.
the column
\ anj J
Examples
First we give some simple examples where V = W = R2 and we us© the
basis {ex = (1,0), e2 = (0,1)} for both V and W.
A. Matrices
49
(j) M((j>) Remarks
(1) 0(ei) = 0,0(e2) = 0 ( n ) The zero linear map
(2) 4>(ei) = e2,0(e2) = 0 ( ) A nilpotent map
(3) 0(ci) = O,0(c2) = ci ( 1 0 ' NilP°tent
(4) 0(ei) = 0,0(e2) = e2 ( ) Idempotent
(5) 0(ei) = ei + e2, <£(e2) = ei - e2
0 1
1 1
1-1
(6) 0(ei) = e2,0(e2) = -ei ( ) rotation by 7r/2
(7) 0(ei) = e2,0(e2) = ei (in) reflection in line
(8) 0(ei) = — ei,0(e2) = — e2 ( ] reflection in origin
(9) 0(ei) = ei H- e2,0(e2) = ei H- e2
1 1
1 1
Exercise. Which of these nine are isomorphisms?
An example in V = W = R3: let
{vi = (1,0,0), t* = (1,1,1), v3 = (0,1,0)}
be the basis for V and
{w! = (1,0,0), w2 = (0,1,0), w3 = (0,0,1)}
be the basis for W. Define <j> by
(j){vi) = wi + w2, (j){v2) = w2, (j){v3) = u>i - iu3.
Exercise. Write out M(</>) and try to decide if <j> is an isomorphism.
We have seen that given bases for V and W, a linear map <\> determines
a matrix M{(j>) (which will, in turn, determine <j>). Suppose now we are
given bases {vi,... ,vn} for V and {u>i, ... ,wm} for W, and we are also
given an n x m matrix
Then (2^) determines a linear map i/j : V —► W by
V>(v<) = *u^l + ' ' § + ZimWm.
50
II. Matrices and Determinants
This leads to an interesting point. Suppose we have vector spaces
V, W, U of dimensions n, m, p respectively and have linear maps
Then the composition if; o <j> is a linear map from V toU. If we have chosen
bases
{vi,...,vn} for V
{wi,...,wm} for W
{tti,... ,ixp} for (7,
then we have matrices M(</>), M(/0), and M(/0 o 0) relative to these bases.
Since 4> and ^ determine i/jocf), it should be that M(</>) and M(^) determine
M{il)o(j)). We now explore how this works.
Let M(<f>) = (dij)
M{i>) = (bij)
M(^o(j)) = (cij).
Then we have
<f>(vi) = anW! H h aimi<;m
il)(wt) = bnui H h b^pUp
(lpO(f))(Vi) = CnUi-\ \-CipUp.
Using the first two expressions we can calculate the coefficient Cij of Uj
in the third expression. We have
il>(<f>(vi)) = a^Oi) H h airn^(wm)
and we see that
(4) c^- = anhj + a^j H h ctimbmj •
That is, we get the ij entry in Mfyotfii) by multiplying the i**1 row of M(</>)
term-by-term with the jth column of M(ip) and summing the resulting m
terms.
So we use this as our definition of how to multiply annxm matrix with
anmxp matrix to obtain annxp matrix, i.e., (a^) • (b^) = (cij) where
Cij is given by (4).
Example
r 23\ in) - (-im)
\0-4 7/ \-3 3/ \-33 21 /
2x3 3x2 2x2
n = 2, m = 3, p = 2
A. Matrices
51
We have also proven the following.
Theorem With <\> and i/j as above, we have
M{^ ocj)) = M{(j)) • M(V>) .
Notice that formula (3) for calculating cj){x) is just a special case of
matrix multiplication. We had bases {vi,...,vn} for V and {wi,... , wm}
for W. Using this basis for V we can think of
x = XiVx H h xnvn
as a 1 x n matrix (xi,X25...,xn) called the coordinate vector of x with
respect to the basis {vi,... , wn}- Then (3) tells us we get <j>(x) (relative to
{wu...,wm}) as the 1 xm matrix (2/1,2/2, ••• ,2/m) •
If VF — fc, then 0 : V —> fc is simply a functional on V and we have
(j){x) = (Xi,...,Xn)
as in example (2) of section IF.
/ 0,1
An Important Observation:
By now, it has probably become apparent that there is a slight
discrepancy in our notation; i.e., we write functions on the left (e.g., x 1—► </>(#)),
but when we represent a linear map by a matrix, we write it on the right
(e.g., (a?i,..., xn) 1—► (a?i,..., xn)M((/>)). This difference is due to the fact
that we will want to consider linear maps over the quaternions in Chapter
V. We will see that the quaternions are noncommutative, so our choice to
define scalar multiplication on the left (i.e., (scalar)(vector)) forces us to
write a linear map as matrix multiplication on the right.
The only real difficulty caused by this choice arises when we compose
maps: if we want to find M(ip o 0), we must compute M(</>) • M(^) instead
ofM(V;)-M(0).
It is left as an exercise to show that this difficulty disappears if we write
vectors as column matrices and multiply by the matrix of a linear map on
the loft.
52
II. Matrices and Determinants
Square Matrices
Let V be an n-dimensional vector space. We will be considering linear
maps V —> V, i.e., elements of End(F); these correspond to square (n x n)
matrices.
Proposition 1 Let {vi,..., vn} and {wi,..., wn} be any two bases for V.
Then the linear map (j): V —> V defined by (j){vi) = Wi is an isomorphism.
Proof: By the rank theorem, if <j> is monic, it will also be epic and hence
an isomorphism. So suppose <j>(y) = 0. Write v = a\V\ + ... + vnvn and
then
0 = (j)(v) = ai(j){vi) H h an(j){vn) — a\wx H V anwn.
Since {wi,..., wn} is a basis, each ai = 0 and thus v = 0. I
Notation:
If we use the same basis a = {vi,..., vn} for V as domain and V as
range, we denote the matrix for p G End(F) by Ma(p).
If / : V —> V is the identity map, then for any basis a we have Ma(I) =
(6ij) where
6ij = 1 if i = j , and
6ij =0 if i^j .
This is called the identity matrix I. Clearly Iop = poI = pior any
p e End(V), and for any n xn matrix A, AI = IA = A.
Definition The n x n matrix A is a nonsingular if there exists annxn
matrix B such that AB = £?A = /; B is called the inverse of A and is
usually written as A-1.
Proposition 2 If p E End(F) zs an isomorphism, then any Ma(p) has an
inverse, namely, Ma(p~1).
Proof: I = Ma(p~1op)=Ma(p)Ma(p~1). I
Proposition 3 A matrix A is nonsingular <^ any corresponding (j) is an
isomorphism.
Proof: <= comes from Proposition 2.
=> If A = Ma(<j>) has an inverse A~x = B = (bij), where a = {i>i,..., vn}
is some basis for V, we define i/; : V —> V by
^(^t) = &ilVl + • • • + fein^n
and note that %j) o 0 and 0 o -0 are represented by L I
A. Matrices
53
Definition Two n x n matrices A, B are similar (or conjugate) if there
exists a nonsingular n x n matrix C such that
B = CAC~1.
It is an exercise to show that similarity is an equivalence relation. (It
seems that "conjugate" is a much better word than "similar" for this
relationship.)
An obvious question to ask at this point is the following: how does the
matrix representing a given endomorphism of V change when we choose a
different basis for VI Fortunately, this is easy to answer.
Proposition 4 Let (j) e End(Vr). Let A = Ma(0) and B = Mp(<j>) for
two bases a = {vi,..., vn}, (3 = {wi,..., wn}. Then A and B are similar
matrices.
Proof: If A = (a^) and B = (^), then by (1) we have
n
<l>(wi) = Ylbi3WJ
n
fc=i
Define the matrix C = (cZJ) by
n
Wj = 2_.CjkVk (change-of-basis matrix),
fc=i
and notice that C is nonsingular by Proposition 3 (show this). We calculate
<j)(w%) two ways:
n n n
</>K) = Ylb%3W3 =^2b%3^2C3kVk = Yl [ Ylb%3C3k I Vk
3 = 1 3 = 1 k=l k=l \j=l
n I n
sr v
^(^t) = <\> (Yl Cz3vj) = Yl^iYl aikVk
U = l / 3 = 1 \k=l /
n i n
YL Y2cvaJk)Vk
k=l \j=l
This shows
BC = CA or that
B = CAC-1 . |
54
II. Matrices and Determinants
Suppose / is a function whose domain is the set ofnxn matrices. If / is
constant on equivalence classes of similar matrices (i.e., f(A) = f(CAC~1)
for every nonsingular matrix C), then we get an induced function on
End(V) by Proposition 4. That is, given 0 G End(V), we define /(</>)
to be f(Ma(4>)) where a is any basis for V.
An important example of such a function is the determinant function,
which we shall study shortly. Another such is the trace function.
Example
We define the trace of the n x n matrix A = (a^-) to be the sum of its
diagonal elements, i.e.,
trace A = an + a22 H h ctnn •
It is left as an exercise to show that similar matrices have the same trace.
Exercises
(1) Verify that similarity is an equivalence relation. (Note that if
X and Y are nonsingular, then so is their product XY.)
(2) Given the linear map 4> : V —> W and bases a = {vi,..., vn}
for V and (3 = {u>i,..., wm} for W, we have
(j){vi) = anW! H h airnwm
as before; let
*M(0)
/ an «2i ••* am \
«12 «22 * * * «n2
\ Q>lm Q>2m ' ' ' Q"nm /
and note that the ith column of lM{(j)) is the ith row of M(</>)
(see equation (2)). Given the bases a and ft, we will see that
*M(</>) is the matrix associated to </> when we write vectors as
column matrices.
(i) Given x — x\V\ H \-xnvn G V, we can write this
as the n x 1 matrix : . Similarly, we can
\ %n /
(Vl
write <f)(x) = y = yiWi + • • • + ymwm m \
\ Vm
A. Matrices
55
Now show that
/ xi
lM{(t>) :
\ xn
(ii) Given the linear maps
V»
show that we have
*M(^ o 0) = *M(^) • *M(0)
once we have chosen bases for V, W, and J7.
(3) Define <\> : R^ —> R^ by </>(xi,X2) = {2%i — 3^2, —x\ — 8x2),
and determine Ma (</>), Mp{cj)) where a is the standard basis and
j3 = {Vl = (1,1),V2 = (-1,2)}. Find C such that M^(0) =
The Rank of a Matrix.
Given the nxm matrix A = (a^) with entries form the field fc, we denote
the rows of A by Ai,..., An and the columns of A by A1,..., Am, i.e., Ai
is the vector (oji, ..., a»m) G fcm and A-7 is the vector \aij,..., anj-) G fcn.
Definition The row space of A is the subspace of km generated by
{Ai,.. 0 An}, and the column space of A is the subspace of kn generated
by {A1,..., Am}. The row (column) rank of A is the dimension of the row
(column) space of A.
An interesting fact is that the row and column ranks of A are equal.
Thus we simply refer to the rank of A. It is also true that similar matrices
have the same rank, so we define the rank of 0 G End(F) to be the rank of
any matrix which represents <j>.
More Exercises
(4) Show that the row rank and column rank of any matrix A are
equal. (Hint: Choose a maximal linearly independent subset
{Si,..., Br} of {Ai,..., An} and write
(*){
M = cn-Bi H h clrBr
^ An — cnii?i + • • • + cnrt>r
for sealant cy € fc. Now note that the jth components on the
Mt hand nido of (*) combine to give A7.)
56
II. Matrices and Determinants
(5) Show that if the square matrices A, B are similar, then they
have the same rank. (Hint: Think of the nxn matrix A as an
element of End(fcn) and observe that the rank of A is just the
dimension of im (A).)
What
is the rank of the following matrices?
(i) A=\
(ii) A=\
(-:
'I
0 N
3,
0
1
-1
i
i)
(7) Given the matrix A, if we delete some (or possibly none) of its
rows and/or columns, then we obtain a submatrix of A. For
example,
<-(!!)
is the square submatrix that we get from A in exercise (6i)
when we delete its second row. Show that r is the rank of
the matrix A o A has an r x r square submatrix which is
nonsingular, but any larger square submatrix of A is singular.
(Note that this will be easy to prove once we have developed
the theory of elementary row operations.)
(8) Given nxn matrices A and £?, show that trace(AB) = trace(£?A).
Now show that similar matrices have the same trace.
B. Algebras
Let V be a vector space over a field k. If V also has a multiplication on it
which behaves well with respect to the vector space operations, then V is
called an algebra (see Section B of Chapter I). Precisely, the multiplication
must distribute over vector addition
(5) v(w + u) = vw + vu and (w + u)v = wv + uv
and for scalar multiplication we need
(6) v(rw) = r(vw) = (rv)w for r G k.
Note that the zero vector 0 of V is a "multiplicative annihilator" because
Ov = (0 + 0)v = Ov + CH;
B. Alegbras
57
and, since V is an additive group, this implies Ov = 0.
The algebra V is associative if we always have v(wu) = (vw)u. It is
commutative if we always have vw = wv. The algebra has a unit element
if there is a 1 G V such that lv = vl = v for each v eV. (Note that such
an element must be unique. See Proposition 1 of Chapter 0.)
Example °
Let Mn(k) denote the set of all n x n matrices over the field k. We first
make Mn(k) into a vector space by defining:
for A = (dij) and B = (bij) in Mn(fc), let
A + B = (atj 4- 6»j) , and
for A = (a^-) and r G fc, we set
rA = (ra,ij) .
2
A little thought will make it clear that this is just the vector space kn of
all ordered n2~tuples of elements of k. We are just writing the n2-tuples as
square arrays.
In the previous section we saw how to multiply n x n matrices and we
readily verify
A(B + C) = AB + AC
(B + C)A = BA + CA
A(rB) = r(AB) = (rA)B,
so that Mn(k) becomes an algebra. Also it is associative since matrix
multiplication corresponds to composition of linear maps which is manifestly
associative. But Mn{k) is not commutative (for n > 1). For example,
whereas
(??)(:!!)-(! J)-
This algebra has a unit element, namely, the identity matrix
/l ... 0
\o ... i
(*«)
% = {I
if
if
i = J
i + 3
58
II. Matrices and Determinants
This clearly satisfies IA = AI = A for any A G Mn(k). The algebras
Mn(k) are matrix algebras over k.
We have seen that once we choose a basis a = {t>i,... ,i>n} for the
n-dimensional vector space V over the field fc, we get an isomorphism
V^kn
which takes x = x\Vi-\ \-xnvn GFto its coordinate vector (xi,..., xn) G
kn. This choice of basis also gives a one-to-one correspondence between
endomorphisms <j> G End(F) and n x n matrices Ma((/>) G Mn(fc); this
correspondence is an (algebra) isomorphism by exercise (2) of section A.
Unfortunately, this isomosphism is not "natural" since it depends on a.
(By Proposition 3, we know how a different choice of basis affects this
isomorphism.)
Henceforth, we will sometimes use <j> and Ma((/>) interchangeably
(especially when V = kn and a is the standard basis). We have already observed
the potential difficulty in doing this (see the "important observation" in
section A).
Example
We have already made the plane Rr into a vector space. Now we make
it into an algebra by defining
(7) (a,6)(c,d) = (ac,W).
It is easy to verify that (7) does make R2 into an algebra. Also this algebra
is associative and commutative and it has unit element (1,1). But (7) does
not make R^ into a field, because we have seen that fields have no divisors
of zero whereas (7) implies (1,0)(0,1) = (0,0).
We saw in Chapter 0 that a "better" multiplication exists on R2;
namely
(8) (a, b)(c, d) = (ac — bd,ad + bc).
This is associative and commutative, has unit element (1,0), and is a field.
Example: The exterior algebra on R**
We will introduce here the exterior algebra A on R . (In the next
chapter we will define it for all Rn.)
As a vector space, A is going to be the direct sum of four vector spaces
(9) A = A°0A1eA20A3.
A0 is just the vector space R with basis {1}. A1 is the vector space R**
with the standard basis {ei,e2,e3}.
Now A2 is going to be another three-dimensional real vector space and
we are going to write the basis elements as
{ei Ac2,ci Ae3,e2 Ac3}.
B. Alegbras
59
Finally, A3 is going to be a one-dimensional real vector space with basis
element {ei A e2 A e3}.
Our clumsy notation for basis elements (at least for A2 and A3) is in
preparation for making the vector space
A = A0 0 A1 e A2 © A3
into an algebra. Now A is an eight-dimensional real vector space with basis
(10) {l,ei,e2,e3,ei Ae2,ei Ae3,e2 Ae3,ei Ae2 A e3}
consisting of "four kinds" of elements.
We are going to define a multiplication on A which will distribute over
vector addition and thus it will suffice to define multiplication of basis
elements. Our algebra is also going to be associative.
(i) First we decree that the basis element 1 for A°(= R) is to be
the unit element for the multiplication.
(ii) Next we decide that the product of two elements in A1 is to be
in A2. Precisely, we take:
the product of e\ and e2 is to be e\ A e2,
the product of e\ and e3 is to be e\ A e3, and
the product of e2 and e3 is to be e2 A e3.
(iii) We also need to know the product of e2 and ei, e\ and ei, etc.
We set
(11) ej Aei = -(e{ Ae,)
for all i,j — even for i = j.
In particular, e$ A ei = — (ei A ei) and this must be the zero vector (in
A2) since 2 ^ 0 in R.
So the multiplication of basis elements in A1 is given by
ei
e2
^3
ei
0
-(ei Ae2)
-(ei Ae3)
e2
ei Ae2
0
-(e2 Ae3)
^3
ei Ae3
e2 Ae3
0
and using distributivity we can calculate the product of any two elements
of A1.
The general rule (11) also gives us guidance in determining other
products. For example, suppose we want to multiply e3 G A1 with e\ Ae2 G A2.
We write the product as e3 A (e\ Ae2), and we are insisting on associativity,
so we can write this as e$ A eA A e2. By (11), this equals
-ei A 63 A eg = ei A 62 A 63,
60
II. Matrices and Determinants
As another example consider (ei A 62) A (e2 A 63). This can be written
ei A (e2 A e2) A e% = 0, since e2 A e2 = 0.
Finally, let's do a fairly general example:
(a0 4- axei + a3e3 + ai3ei Ae3 + a123ei A e2 A e3)
A(62e2 4- bi2ei A e2 4- 613ei A e3)
= a>ob2e2 4- ^0^1261 A e2 4- ao^i3ei A e3
4- aib2ei A e2 4 0 4 0
4- a362e3 A e2 4 a3&i2e3 A ex A e2 4- 0
4- a13&2ei A e3 A e2 4 0 4- 0
4-04-04-0
= a>ob2e2 4- (a0&i2 4- ai^)ei A e2 4- ao&i3<3i A e3
- a3b2e2 A e3 4 {a3b12 - ai362)ei A e2 A e3.
You are asked to do a few more in the exercises.
Note that the product of an element in Ap and an element in Aq will
be an element in hP+v. (Of course Ap+q = {0} for p 4 q > 3.)
Note that in our definition of algebra we did not require the
multiplication to be associative or commutative. Given elements x,y, z of an algebra
A, we define their associator by
(12) [x,y,z] = {xy)z-x(yz).
Clearly A is associative <=> all associators are zero. For two elements x, y
of A we define their commutator by
(13) [x,y]=xy-yx.
A is commutative <=> all commutators are zero.
Recall that the algebra End(V) is associative but not commutative.
From End(F) we construct a new kind of algebra, a Lie Algebra, by defining
the product of x,y G End(F) to be their commutator
[x,y] = xy-yx.
Clearly
(14) [y,x] = -[x,y].
So, in particular, [x, x] = 0 for all x G End(F).
This algebra fails to be associative, but satisfies the Jacobi identity
(15) [x, [?/, z] ] + [?/, [z, x) ] + [z, [x, y] ] = 0,
the proof of which is a routine exercise.
Theorem 1 Let A be an associative algebra. Then A with the new
multiplication
[x, y] = xy - yx
is a Lie Algebra (i.e., [x,y] = — [l/i$c] arad Wie Jacobi identity is aatiiifird).
C. Determinants, the Laplace Extension
61
Example
There is an important three-dimensional Lie Algebra. Let x = (1,0,0),
y = (0,1,0), h = (0,0,1) in R3. Define
[h, x] = 2x, [/i, 2/] = -2?/, [x, 2/] = ft, etc.
(This is the Lie Algebra associated with the Lie group whose elements are
quaternians of unit length.)
Exercises
(1) Compute the following products:
(i) (e2 A ei) A (e2 A 63) (Note that this can be done two
different ways: either using (11) or the fact that
A4 = WO
(ii) (ei 4- e2 A e3) A (ei + e2 A e3)
(iii) (3e2 - 2ex A e2 + 4e2 A e3) A (2 - 5ex)
(2) Prom (11), we concluded that e$ A e$ = 0. Given any uieA,
can we conclude that w A w = 0? (Hint: See (ii) above.)
(3) Prove Theorem 1, i.e., show that the new multiplication
satisfies (5), (6), and (15).
C. Determinants, the Laplace Expansion
We will now define the exterior algebra A "over" Rn (for n = 3, see Section
B). We fix n. As a vector space, A will be a direct sum
(16) A = A0 © A1 © ... © An.
Just as before, A0 = R with basis {1}. Similarly, A1 = Rn with basis
{ei, e2,..., en}. Then A2 will be a real vector space with basis {el A e311 <
i < j < n}. A3 will have basis {e^ A ej A e^|l < i < j < k < n}, etc.
Finally, An will have basis {ei A e2 A ... A en}.
n\
We note that the dimension of Ap (p = 0,1,..., n) is —-—-—r-.
p\(n — p)\
Exercise. Show that dim(A) = 2n.
For the algebra structure we insist (just as we did for n — 3) that:
(i) multiplication is to be associative,
(ii) multiplication is to distribute over addition, and
62
II. Matrices and Determinants
(iii) for all i, j, e$ A e3- = —(e^ A e^).
Just as for n = 3, we see that these conditions suffice to define the
multiplication. If x G Ap and y € A9, then x Ay E Ap+q (of course if p + g > n,
then A*>+<* = {0}).
If in e^ A ei2 A ... A eim any two are equal, then this product is zero.
For by (iii), we can move the two equal ones together by sign changes and
then (again by (iii)) the product is zero.
Proposition 5 Ifx,y G A1 are linearly dependent, then x A y = 0.
Proof: For any x,y € A1 we can write x = x\e\ + ••• + xnen, y —
yi^i H h 2/nen and, using (i), (ii), and (iii), we have
xAy = ^2xiVjei A eo = 5Z(^2/j ~ ^jZ/tJet A e,.
hj i<3
Now suppose that for some r E R, y — rx. Then each yi — rxi and
x Ay = 2_l(xirx3 ~~ xjrxi)ei A ej =0. I
Corollary For xgA1,xAx = 0.
Proposition 6 J/t>i, t>2,..., t>m G A1 are linearly dependent, then fi Af2 A
... A vm = 0.
Proof: We have 0 = ait>i H + ami>m with not all a; zero. We may as
well say a\ ^ 0. Then
vi = b2f2 H h ^m^m (with 62 = , etc.).
ai
So we have
vi A V2 A - • • A vm = &2i>2 A V2 A v% A • • • A vm
+ 631>3 A 1>2 A V3 A • • • A Vm
+ &m^m A V2 A V3 A • • • A Vm .
Each term has a repeated Vj and is thus zero. I
It is important to notice that the contrapositive of Proposition 6 is a
criterion for linear independence: if Vi,... ,vm € Rn(= A1) satisfy v\ A
• • • A vm -fi 0, then vi,..., vm are linearly independent.
C. Determinants, the Laplace Extension
63
We are now ready to discuss determinants of matrices. In Rn, ei,..., en
are ordered n-tuples and can be considered to be 1 x n matrices (or "row
vectors"). So given annxn real matrix A, the matrix product
eiA
is again a 1 x n matrix which we see is just the ith row of the matrix A.
Thus each a A e Rn = A1. It follows that
e\A A • • • A enA
lies in the one-dimensional vector space An and thus is some real multiple
of the standard basis vector e\ A e2 A • • • A en for An, i.e.,
(17) e-iA A • • • A enA = (det A)e\ A • • • A en.
Definition The real number det A is called the determinant of the real
n x n matrix A.
Example
We calculate det A for A =
e\A = ae\ + 6e2, e2A = cei + de2
exANe^A = (aei -\-be2) A (cei +^e2)
= ad ei A e2 + 6c e2 A ei
= (ad — 6c)ei A e2.
So det I , I = ad — be.
\c d J
Notice that (17) not only defines det A, but (using (i), (ii), and (iii))
also furnishes a means for calculating det A.
Proposition 7
(i) If I is the n x n identity matrix, then
det/= 1 .
(ii) If the rows of A are linearly dependent, then
det A = 0 .
::)
(iii) For r € R, det (rA) = rn(det ^4).
64
II. Matrices and Determinants
Proof:
(i) The rows of I are e\,..., en so
ei A • • • A en = (det X)e\ A • • • A en.
(ii) By Proposition 6, e\ A A • • • A enA = 0.
(iii) The rows of rA are re\A,..., renA and
reiA A • • • A renA = rn(eiA A • • • A enA) = rn(det A)e\ A • • • A e„. I
Let C be an n x n real matrix. Then we know how C operates on A1
and we now define an operation of C on a product v\ A • • • A vp (^ E A1)
by
(18) (vi A ■ • ■ A vp)C = v-iC A • • • A vpC .
Now we get a very important property of
det : Mn(R) -> R.
Proposition 8 Let A,B be n x n real matrices. Then
(19) det(AB) = (det A) (det B).
Proof: By definition
e-i(AB) A • • • A en(AB) = det(AB)ei A • • • A en.
Also
ei(AB) A • • • A en(A£) = {erA A • • • A enA)£ (by (18))
= (det A)(ci A • • • A en)£ (by (17))
= (det A)(eiJ5 A • • • A enB) = (det A)(det B)ei A■••A en . I
A very useful property of det is:
Proposition 9
(20) A is nonsingular <=> det A -=f=- 0.
Proof: =$> By hypothesis A"1 exists. So 1 = det I = det(^L4-1) =
(det A)(det A"1), proving det A ^ 0.
<= By (ii) of Proposition 7, the rows of A (which are the images of
ei,..., en under the linear map A) are linearly independent. Thus A is an
isomorphism of Rn and A"1 exists. |
Recall that two nxn matrices A, B are similar if there is a nonsingular
nx n matrix C such that B = CAC~l.
C. Determinants, the Laplace Extension
65
Proposition 10 det : Mn(R) —> R is constant on equivalence classes of
similar matrices.
Proof: If B = CAC~\ then det (B) = det {CAC-1) = det (C) det {A) det (C"1)
and (det C)(detC-1) = det (CC'1) = det J = 1. Thus
det £ = det A . I
Observation:
Notice that our development of the determinant function has only used
the fact that R is a field — we have not used any properties unique to the
real numbers. Thus, for any field fc, we get a determinant function
det : Mn(k) —> k
which is constant on equivalence classes of similar matrices, etc.
Exercises
(1) A matrix D = (d^) G Mn(k) is diagonal if dtj = 0 whenever
i^j. Show that if D is diagonal, then det D = d\\d^2 • • • dnn.
(2) A matrix A = (a^) is (weakly) upper triangular if a^- = 0
whenever i> j. Show that this implies det A = ai 1*222 •. • ann-
A is lower triangular if a^ = 0 whenever z < j. Show that this
also implies det A = an ... ann. (A is called triangular if it is
either upper or lower triangular.)
(3) The transpose lA of matrix A = (a^) is lA = (a^). (That is,
the rows of lA are the columns of A.) Show that t(tA) = A and
that det* A = det A.
(4) Define the (real) general linear group to be all nonsingular
matrices A e Mn(R); i.e.,
GL(n,R) = {Ae Afn(R) | det A 7^ 0} .
Show that GL(n, R) is, indeed, a group. Note that GL(n, R)
is simply GL(Rn).
(5) Let
^11 &12 ai3
i4 = I tt21 &22 &23
&31 &32 a33
66 II. Matrices and Determinants
Show that
det A = an det (°22 °23 )- fll2 det ( °21 °23 )
V a32 033 J \ a3i a33 )
+ a13detfa21 a22V
(This is a special case of the "Laplace expansion.")
The Laplace Expansion
Recall that our definition of det A for an n x n matrix A is
(21) e-iA A • • • A enA = (det A)er A • • • A en .
We can rearrange things in this product to get some formulas for det A in
terms of some determinants of (n — 1) x (n — 1) matrices. This can be useful
sometimes in calculating det A, and can be very useful in getting inductive
proofs of certain theorems about determinants.
First off, let us rewrite (21) as
(anei H h alnen) A • • • A (anle-i H h annen) = (det A)e-i A • • • A en
Next we do the case n = 3 to make the procedure clear without too much
notation.
(anei + ai2e2 + ^13^3) A (a21ei + ^22^2 + ^23^3) A (031^1 + ^32^2 + a33e3)
= anei A (a2iei + a22e2 + ^23^3) A (031^1 + ^32^2 + a33e3)
+ a\2e2 A (a2iei + a22e2 + 023^3) A (a31ei + ^32^2 + a33e3)
+ ai3e3 A (a2\e\ + a22e2 + a2^e^) A (a31ei + ^32^2 + a33e3)
= anei A (a22e2 + ^23^3) A {a<$2e2 + a33e3)
+ ai2e2 A (021^1 + ^23^3) A (031^1 + ^33^3)
+ ai3e3 A (a2iei + ^22^2) A (a3iei + a32e2)
= ^11^22^33 ei A e2 A e3 + 011023^32 ei A e3 A e2
+ ^12^21^33 e2 A ei A e3 + ^12^23^31 e2 A e3 A ei
+ ai3a2ia32 e3 A ei A e2 + ^13^22^31 e3 A e2 A ex
= {o^ii{a22a33 - a23a32)
- ct12(a2iass - a2Sa3i)
+ ^13(021^32 ~~ a22fl3i)} ei A e2 A e3
> (*)
C. Determinants, the Laplace Extension
67
Let Bij be the 2x2 matrix obtained from A by deleting the ith row
and jth column. Then we have proved that
det A = an det Bn — a\2 det B\2 + ai3 det £?i3 .
For annxn matrix A, the definition of B^ (an (n — 1) x (n — 1) matrix)
is similar.
Proposition 11 For an n x n matrix A, we have
(22) det A = an det Bn - ai2 det B12 + • • • + (-l)n+1am det £in.
Proof: Same as for n = 3. I
Equation (22) is the Laplace expansion about the first row of A. More
generally, we can calculate det A by computing the Laplace expansion about
the ith row of A, i.e.,
det A = {-lY^an det Ba + • • • + {-l)i+nain det £in .
Proposition 12 For an n x n matrix A we have
(23) det A = an det Bn - a2i det B21 + • • • + (-l)n+1ani det J3nl.
(This is the Laplace expansion about the first column of A. We can also
calculate det A using the Laplace expansion about the jth column of A.
Exercise. Write out the appropriate formula.)
Proof:. Again the general case should be clear from the computation for
n = 3.
We rewrite (*) as
(anei A (a22e2 + ^23^3) A (a32e2 + a33e3)
+ (^12^2 + ai3e3) A (a2iei) A (a32e2 + a33e3)
+ (^12^2 + ai3e3) A {a22e2 + a23e3) A (a3iei)
and this equals
^11^22^33 ei A e2 A e3 + aiia23a32 ei A e3 A e2
+ ^2iai2a33 e2 A ei A e3 + a2iai3a32 e3 A ei A e2
+ «3i«i2^23 e2 A e3 A ei + a3iai3a22 e3 A e2 A e\
= {an(a22a33 - a23as2) - a21(a12a33 - a13a32)
+ a3i(ai2a23 - a13a22)} e\ A e2 A e3 .
Thus
deti4 = an dot JBu — a2i detB2i + a3i detB3i. I
II. Matrices and Determinants
Exercises
(1) Recall that we define the transpose lA = faij) of an n x n
matrix A = (a^) by *a^- = a^. (Definition: A is symmetric if
A =tA.) Prove the following.
(i) *(S4) = A, *(A + B) =*A +*B.
(ii) A +lA is symmetric.
(iii) *(AB) =*BS4.
(iv) A$4 is symmetric.
(v) det 54 = det A.
(Hint: Use the Laplace expansion about the first
row of A and the first column of 54. Then use
induction.)
(vi) If the nonsingular matrix A is symmetric, then A-1
is also symmetric.
{ A C \
(2) Show that the determinant of ( R J is (det A) (det B),
where A and B and square submatrices. Generalize this to
calculating the determinant of
Mi \
V 0 Am J
where Ai,..., Am are square submatrices.
Elementary Row Operations.
(3) Let A = (a^) be an element of Mn(k). The following three
operations are called elementary row operations.
E2
E'3
Interchange the ith row and jth row
Multiply a row by a nonzero scalar r £ k
Replace ith row by (ith row) + (jth row)
Two matrices M, AT are row equivalent if AT can be obtained
from M by a finite sequence of elementary row operations.
Show that this is an equivalence relation.
(4) Let £3 be the operation: E2 and then E'z. Show that Eu E%,E^
still give the same equivalence relation.
C. Determinants, the Laplace Extension
69
(5) Show that row equivalent matrices have the same row space.
(6) An n x n matrix A = (a^) is in echelon form if the number of
zeros preceding the first nonzero entry in a row increases row
by row, stopping this increase only when arriving at an all zero
row (then succeeding rows must be all zeros). The first nonzero
elements in rows of an echelon matrix are called distinguished
elements. Prove that any matrix can be reduced to echelon
form by an appropriate sequence of elementary row operations
^1,^2,^3-
(7) Show that r is the rank of the n x n matrix A ^ A has a
nonsingular r x r submatrix, but any larger square submatrix
of A is singular.
(8) An echelon matrix A is row reduced if its distinguished elements
satisfy:
(i) each is the only nonzero element in its column, and
(ii) each is equal to 1.
Prove that any echelon matrix can be put in row reduced form
by an appropriate sequence of Ei,E2,E%.
(9) Let A be an n x n matrix. Show that det A is changed by
Ei,E2, and E% in the following manner:
(i) det(E1(A)) = -detA,
(ii) det(E2(A)) = r(det A), and
(hi) det (ES(A)) = detA
(10) An nxn matrix e is an elementary matrix if e can be obtained
from the nxn identity matrix I by a single Ei £ {Ei,E2,E$}.
Show that we can perform an elementary row operation on a
matrix by multiplying by the appropriate elementary matrix.
Now show that if A is an invertible (= nonsingular) nxn
matrix and A is row reduced to the echelon matrix B by
elementary row operations (i.e., there are elementary nxn matrices
ei, e2,..., eq such that B = eq ... e2£\A), then we must have
B = I. Conclude that invertible nxn matrices are precisely
those which are products of elementary matrices.
Given A € Mr,(fe), note that Ei,E2,E% make evaluating det A easier,
since we can simplify A to the matrix B with £q, I?2, E3 (being careful to
keep track of how those changes relate det A to det B): we can either row
reduce A in which cblhg B is a diagonal matrix (and det B is simply the
70
II. Matrices and Determinants
product of its diagonal elements); or we can make all (but possibly one)
elements in a column of B equal to 0, and then calculate det B by using
the Laplace expansion about this column.
Example
/ 2 0
If A = \ 12
V-l 4
third row to obtain
, then subtract twice the second row from the
B
We have applied the operation E% to obtain £?, so det A = det B. We can
calculate det B by expanding about the second column:
det B = 2 det
2 1
-3 -8
= 2(-16-(-3)) = -26.
More Exercises
(11) Compute det A for the following matrices.
(i) A-
(ii) A:
(iii) A =
(12) If we replace "row" by "column" in exercise (3), we get the
elementary column operations on A £ Mn(k). Give
appropriate versions of exercises (3)-(10) where the notions of row
equivalence, etc., have been replaced by the notions of column
equivalence, etc. Now prove these revised exercises. (Hint for
(9): recall that det*A = det A.)
(13) Some of the exercises (3)-(10) generalize to the case ofnxm
matrices. Decide which of these exercise can be generalized,
give appropriate versions, and than prove them,
D. Inverses, Systems of Equations
71
D. Inverses, Systems of Equations
For A = {cbij) € Mn(fc), we know that A has an inverse A-1 <3> det A ^ 0.
But this does not tell us how to calculate A-1 when we know it exists. In
this section we determine a way to calculate A"1.
For the i,j entry a^ of A, we define its cofactor to be
^•-(-l^'det^
where, as before, Bij is the submatrix of A that we get when we delete the
ith row and jth column of A. Then we define an n x n matrix C — (cij) by
C ='(6«) .
that is, c%j = bji. This matrix C is called the classical adjoint of A, and we
will denote it by adj A.
Example
n = 2
^ = / aH a12 \
V a21 «22 /
■Bn = a22, #12=021, B21 = ai2, -822=011.
Then adj A = ( 22 12 1. Consider the product
' A(adjA)=( aiia22~ai2a21 °
V V ° -021012 + 011022
So A(adj A) = (det A)/. Similarly, (adj A)A = (det A)I. So, at least in
the case n — 2, when det i^Owe get A-1 easily from adj A. This is true
in general.
Theorem 2 A (adj A) = (adj A)A = (det A)/.
Corollary //det A 7^ 0, tfien A"1 - (h^Wj ^-
Proof: Now Theorem 2 makes an assertion about the matrix product
AC = A (adj A), and formula (22) says that the 1,1 entry in this product
is det A. To prove the theorem, we must show that the remaining diagonal
entries in AC are det A and all nondiagonal entries are zero.
First we calculate the Laplace expansion about the ith row of A to get
det A = (-l),+1Oii det Ba + • • • 4- {-l)i+nain det B,{
(24) = aubu +--- + ainbin
in
72
II. Matrices and Determinants
which is just the i, i entry in AC.
Now suppose i 7^ fc, and we write an expression similar to the right
hand side of (24):
(25)
{-l)k+1an det Bkl + • • • + {-l)k+nain det Bkn.
We recognize (25) as the i, k entry of AC. But it is also det D where D is
the matrix obtained by replacing the kth (^ ith) row of A by the ith row
(and leaving the ith row intact). Of course, since D has two identical rows,
det D = 0 by (ii) of Proposition 7.
The proof that CA = I is almost exactly the same (starting off with
(23) instead of (22)). So we have proven Theorem 2, and the corollary tells
us A~x whenever det A ^ 0. I
Exercise. Calculate (25) for n — 3, i = 1, k = 2. Show it gives the 1,2 term
of AC and it also gives
an «12 «13
det | an ai2 ai3
«31 &32 «33
Systems of Equations
Let fcbea field of characteristic zero. Let A be an n x m matrix over
k and let y G km. We are concerned with the questions:
(26)
(a) Does there exist xGfcn such that
xA = yl
(b) When such an x does exist, is it unique?
We can write (26) as
' an
(x1,x2,...,xn)
&nl
or even
(*)
an^i +
fllm^l +
&lra
(2/l,2/2,...,2/m)
+ «nl^n = 2/1
~r OjnrnXn — Z/r?
The a jo are the coefficients and the unknowns. We have m
equations in n unknowns. Note that if y = 0, then we always have the
trivial solution x = 0.
D. Inverses, Systems of Equations
73
Now, using standard bases for kn and fcm, we have
(26r) A:kn -+km linear.
The answer to (a) is: yes <=> y G A(kn). (Notice that in (26), we
have matrix multiplication so we write A on the right; but in (26'), we are
considering A to be a linear map so we write A on the left, as in A{kn).)
So if n < m, theje will be some t/'s not in the image A(kn) and xA = y
will have no solution for such y's.
The answer to (b) is: yes <3> ker A = {0}. That is, if ker A ^ {0}, then
no solution can be unique; for if xA = y and v G ker A, then (x + v)A =
xA + 0 = y. Conversely, if x, x' G kn are distinct elements such that
xA = x'A = ?/, then (x — x')A = y — y = 0 so that x — x' e ker A. In
particular, if n > m, we must have ker A ^ {0}.
So the interesting case is n = m. In this case, if det A ^ 0, then A is an
isomorphism and there is a unique solution. If det A = 0, then the answer
to both questions is no.
The previous discussion is interesting from a theoretical point of view,
but it sheds little light on how to actually solve the system (*), if a solution
exists.
Recall that in section IA we briefly discussed the method of elimination
when we solved a system of three equations in three unknowns. We now
extend this method to the general system (*) by using elementary row
operations to simplify the system. (The elementary row operations E\, E<i, E%
are discussed at length in the exercises of the previous section.)
We begin by writing (*) as an m x (n + 1) matrix
/an ... ani
\ &lra • • • Q"nm
and then putting this matrix into row reduced form using E\,E2, and E$
(by exercise (13), we can always do this). Prom this new matrix, we can
simply "read off" the solution, if it exists.
Some examples should suffice to illustrate this technique.
Examples
(1) Starting with the system
3xi + 2x2 = 2
2xi + x2 = -1 ,
w© write this as
/ 3 2 | 2 \
\,2 1 |-1 j
V\ \
Vm J
II. Matrices and Determinants
Applying the elementary row operations to this matrix yields
the row reduced matrix
1 0 |-4
0 1 | 7
Thus the (unique) solution is (xi,x2) = (—4,7). Note that we
expected this situation since
det ( I j]=3-4 = -1^0.
(2) Starting with the system
xi + x2 = 3
2xi + 3x2 = 4
x\ — 2x2 = -2 ,
we write this as
which is row equivalent to
The third row of this matrix tells us that 0 = Oxi + 0^2 = 1;
since this is clearly impossible, we conclude that no solution
exists.
(3) Starting with the system
xi - x2 + 2x3 = 1
2xi - x2 + 3x3 = -1 ,
we obtain the row reduced matrix
/ 1 0 1 | -2
^ 0 1 -1 | -3
Thus we seek (xi,X2,#3) such that
X\ + xs = -2
x2 - a?3 = -3 ,
D. Inverses, Systems of Equations
75
or equivalently,
xi = -x3 - 2
X2 = ^3 — 3 .
So any choice of xs E R will determine xi and X2 so that
(xi,x2,x3) is a solution.
It is an interesting fact that these examples exhaust all possibilities for
the number of solutions to (*); i.e., (*) either has zero, one, or infinitely
many solutions.
Exercises
(1) Find the inverses of the following matrices by using the
corollary to Theorem 2.
(i) A
-(-ii)
1 0 -1
(ii) A= ( 2 1 0
0 -1 1
(2) If A is a nonsingular n x n matrix, show that A~x can also be
computed as follows. Form the n x 2n matrix
(A | /)
and row reduce this to
(/ | B) .
Then B = A~1.
(Hint: We know that A is the product of elementary matrices
by a previous exercise.)
(3) Find A'1 where
A =
(4) Solve the following systems of equations or indicate that no
solution exists.
( 2
-2
1
V-i
5
-3
3
-6
-3
2
-2
4
-2\
-5
2
3/
,,, 2x[ + 3a?2 = -1
1 ' hxi + 7x2 = 0
76
II. Matrices and Determinants
Xi — 2^2 + 3^3 = 1
2xx + x2 + 5x3 = -2
xi- x2 = 4
2xi + x2 = 5
—xi — 3x2 = —6
Homogeneous Systems.
Let A be an n x m matrix over k and consider the homogeneous system
of linear equations
xA = 0
where x G kn. We know that x = 0 is a solution; what else can we say
about the solution space
S = {x G kn | xA = 0} ?
(5) Show that S is a subsapce of kn.
(6) Let W denote the row space of A, and show how we can
consider x G S to be an element of the annihilator of W. Now
show that S ^ A(W).
(7) Show that dimS = dimfcn - dimW = n - rank(A). (Hint:
Recall exercise (7) of section IF.)
(8) What is the dimension of the solution space for the following
homogeneous system?
3x + 2y - z = 0
x +2 = 0
- 2x -32/ + 2z = 0
E. Eigenvalues
(Part ii)
Let V be a finite-dimensional vector space over a field k and let <\> G End(V).
Recall that A G fc is an eigenvalue for <\> if there exists a nonzero vector v G V
such that
4>(v) = At>.
Another way of saying this is to define i/j\ :V —> V by i/)\(v) = </>(i>) — A^.
Then ^ is an endomorphism of V and clearly
(27) A is an eigenvalue of <j) <=> ker ip\ ^ 0.
(ii)
(iii)
E. Eigenvalues
77
Examples
(df=r2, ,=(; j).
For A E R and v = a e\ + 6 e2 consider
^a(v) = 0(f) - Av.
Since </>(ei) = e2 and </>(e2) = 0, we get
i/>\(v) = oe2- X(aex + be2) = (-Aa)ei + (a - A6)e2.
If ^a(^) is to be 0, we must have — Xa — 0 and hence a = 0
or A = 0. If A ^ 0, then a = 0 and A6 = 0 so 6 = 0, and
we conclude that v = 0. Thus 0 has no nonzero eigenvalues.
This is not surprising because <j> is nilpotent (of order two) (i.e.,
cf) o cf) = (j)2 — I J the zero endomorphism). So if
(f)(v) = Xv, O = 0o (f)(v) = 0(Av) = X(/)(v) = A2t>
and this cannot happen for v ^ 0 unless A = 0. The same
argument works for nilpotent endomorphisms of higher order.1
Exercise. Show that A = 0 is, indeed, an eigenvalue for <\> (in
example (1)) and that e2 is the only corresponding eigenvector
(up to scalar multiplication).
(2)V = K?, ,= (g J).
Now </>(ei) = 0 and </>(e2) = e2. So if A ^ 0 and v = a e\ + 6 e2
and we evaluate i/j\(v), we get ^a(^ ) — (—Aa)ei + (6 — A6)e2.
If this is to be zero, we must have a = 0. If v ^ 0, then 6^0
and (1 — X)b = 0 so we must have A = 1. The eigenvector is
any nonzero scalar times e2. Here <j> is idempotent (<j> o </> = </>)
and does have exactly one nonzero eigenvalue.
Exercise. Show that A = 0 is also an eigenvalue for <j> (in
example (2)) and find a corresponding eigenvector.
(3) y = R2, *=(_J J).
Proceeding as in (1) and (2), we get
t/)\(v) = (~b - Xa)ei + (a — A6)e2.
lt,g., if $a ■ 0 ftnd ${v) = At;, than 0 = $3(w) as X^v again implying A ■ 0.
78
II. Matrices and Determinants
If this is to be zero, then a = Xb and so
0 = b + Xa = b + X2b = 6(1 + A2).
Thus 6 = 0 and a = 0, and <\> has no (real) eigenvalue. This is
not surprising since 0 is a rotation of \ radians.
Exercise. If we pass to the extension field C of R, then we
have V = C2 and A G C. For <\> (as in example (3)), find
two distinct eigenvalues in this extension field. What are two
corresponding eigenvectors?
f X a b
(4) F = R3, AgR*=R-{0}, 0=1 0 A c | . We also
\ 0 0 A
assume a, 6, c G R*.
/ 0 a b \
Now ip\ = 4> — XI = I 0 0 c J, and we see that e3 e
\ 0 0 0 /
ker i/j\. Thus by (27), A is an eigenvalue for (/>. Note that
tf =
and thus, although ip\(e2) = (0,0, c) ^ (0,0,0), we have ifr^fa) =
(0,0,0). So ker Va c ker ^l- We have
0 £ ker Va c ker ^a c ker V>a = r3 •
Span(e3) Span(e2, e3) Span(ei, e2, e3)
Exercise. Show that if we allow A = 0 (in example (4)), then
<j> is nilpotent of order 3.
In example (2) above I V — R , 0=1 n ) ) we have that the only
nonzero eigenvalue is 1, and we find that if v = aeiH-6e2, then ip\{v) = —ae\
and thus v G ker i/ji <& a = 0.
Now
= 0(-aei)-(-aei)
= 0-faei.
So in example (2), ker ipi = ker V>i(= Span (e2)) and the dimensions of
the kernels of ip™ stabilize at 1. In example (4), the dimensions Increased
right up to th© dimension of F(= Br).
E. Eigenvalues
79
Definition The algebraic multiplicity of an eigenvalue A of an endomor-
phism (f>: V —> V is the maximum dimension of the subspaces 0 C ker i/>\ C
ker ifc C • •.. Since we are assuming V is finite-dimensional, this will be an
integer < dim V. If the maximum dimension is zero, this just means that
A is not an eigenvalue of <j>.
Recall that back in Section E of Chapter I, for each A G k we defined a
subspace V(A) of V consisting of all v G V which had A as an eigenvalue.
We proved that If /i ^ A, then V(/i) D V(X) = {0}. In our present notation,
V(X) = ker i/j\, and now we sometimes have bigger spaces (i.e., ker ip\)
associated with A. We will now show that for A ^ /i, these spaces also
intersect only in zero.
To simplify our notation, we fix </>, A, and /i (where A ^ /i). Then we
write Ki for ker i/>lx and Lj for ker ift where i,j = 1,2, Note that we
have 0 C Kx C K2 C • • • C Ki C • • • and 0 C Lx C L2 C • • • C Lj C • •..
We now claim that Ki fl Lj = {0} for each i,j.
Proof: Since i/>\ = (j) — AI and ^a IS ^ne zero endomorphism on Ki, we
have that (0 — A J)2 = 0 on Ki. Similarly, (0 — /i/)-7 =0on Lj.
Note that (since A ^ /i) the polynomials (x — X)1 and (x — /i)-7 have
no common factor. We will see in section B of the next chapter that the
previous statement has the following consequence: we can find polynomials
a(x), b(x) such that
(28) 1 = a(x)(x - Xy + b(x){x - fx)j .
Now suppose we have v G KiC\L3. We wish to show that v must be 0.
By (28), we have
V = I(V) = [a(0)(0-Air + K0)(0-^)J]W
= a(4>)(4>-\iy(v) + b(4>)(4>-iiiy(v)
= 0 + 0 = 0. I
Note that everything we have said in this section about 0 G End(F)
applies equally to A G Mn(k); e.g., A G k is an eigenvalue of the n x n
matrix A if there is a nonzero x G kn such that xA = Ax. We will soon see
an easy way to calculate the eigenvalues of A.
Exercises
(1) Given any A e k, show that </> commutes with x/j\ = 4> - XL
(2) Show thai Ki - ker ^^ is 0-stable.
II. Matrices and Determinants
(3) Let Vi,...,vm be nonzero eigenvectors for <\> G End(V)
corresponding to the distinct eigenvalues Ai,..., Am. Show that
Vi,..., vm are linearly independent.
(4) For V = R and A as below, find the eigenvalues of A along
with the geometric and algebraic multiplicities of each.
Vi 1 0
(i) A = I 0 Ax 0 I , Ax ^ A2
(ii) A
0
0
3
2
1
Ai
0
1
4
1
0
A2
1
2
3
Chapter III
Rings and Polynomials
A. Rings
A ring R is a generalization of a field. It has operations of addition and
multiplication, and R must be an abelian group under addition (just as
for a field). However, the only requirement for multiplication is that it
distribute over addition:
(1) a(b + c) = ab + ac, and (a + b)c = ac + bc .
Example
If G is an abelian group (with identity element 0), we can make G into
a ring by using the operation on G for addition and defining ab = 0 for all
a,beG.
Definition A ring R is associative if its multiplication is associative; it is
commutative if its multiplication is commutative; it has a unit element if it
has a 2-sided multiplicative identity (i.e., a 1 G R such that for any x G R
we have xl = lx = x). If R has a unit element, then a G R is called a unit
if there exists b G R such that ab = 1 = ba.
If R has a unit element, it is unique.
Examples
(1) The ring Z of integers is associative and commutative and has
a unit element.
(2) The ring M2(R) of 2 x 2 real matrices is associative and has a
unit element, but it is not commutative. The same is true of
Mn(k), for any n > 1 and any field k.
In Chapter V we will encounter some non-associative rings. But for this
chapter wt will assume that ring means associative ring.
82
III. Rings and Polynomials
Definition A subset 5 of a ring R is a subring of R if the operations on
R induce operations on S so that S becomes a ring.
Example
In the ring Z of integers, the subset 2Z of all even integers is a subring.
Definition A nonempty subset 5 of a ring R is an ideal if:
(i) r G R and s G S => rs G S and sr G 5, and
(ii) «,t G 5=> s-t G 5.
Let 5 be an ideal in i?. Since 5 is nonempty, there is some s G 5 and
thus s — 5 = 0 G 5. Thus every ideal contains zero. We also note that the
single element {0} is itself an ideal, ((ii) is obvious and (i) is just the proof
in Chapter 0 that a0 = 0.)
Suppose R has a unit element 1. Then any ideal 5 in R containing 1
must equal R. (By (i), r G R => rl{= r) G 5.) For a field &, only {0} and
k are ideals. Because if I is an ideal in &, either / = {0} or / contains a
nonzero element x. In the latter case, x has an inverse x~l G k and (by
(i)) x~xx = 1 G / implying I = k.
Definition If i?, T are rings, a map
(j):R->T
is a (ring) homomorphism if for all r, s G R we have
(i) 0(r + 5) = 0(r) + 0(5),
(ii) 0(rs) = 0(r)0(s), and
(iii) if i?, T have unit elements 1,1/ we also require that (f)(1) = V.
Note that a ring homomorphism sends the zero of R to the zero of T.
Proposition 1 If (f) \ R—±T is a ring homomorphism, then
(i) </>(R) is a subring of T, and
(ii) ker</> is an ideal in R.
Proof: If a, b G 0(iJ), then there exist elements x, y in R such that
</>(#) = a and </>(y) = b.
Then
0(x ± y) = a ± b implying a ± 6 G 0(i2), and
</>(a?y) = a& implying ab € $(J?).
A. Rings
83
This proves part (i).
For (ii), suppose x G ker </> and r G R. Then </>(rx) = (j)(r)0 = 0, so
rx G ker </>. Similarly, xr G ker </>. For x, y G ker 0, clearly x — y G ker 0.
i
Just as we did for vector spaces, we say that two rings i?, T are
isomorphic if there is a ring homomorphism <j> : R—* T which is both surjective
(i.e., (j){R) = T) and injective (i.e., ker <\> = {0}).
Note that in afty ring i?, any element a G R commutes with itself, so
a2 is well-defined. By associativity, aa2 = a2a so a3 is well-defined, etc., so
all powers of a a are well-defined. In particular, for A G Mn(k), all powers
A? are well-defined. The ring Mn{k) is also a vector space over the field &,
and we have seen that the dimension of this vector space is n2. Thus, of
the vectors
{I, A, A2, A3,...}
at most n2 of them can be linearly independent. This means that there is
a nontrivial linear combination of /, A, A2,..., An which equals the zero
matrix 0. That is, there exist ao,ai,... ,an2 G &, which are not all zero,
such that
(2) a0I + aiA + a2A2 -\ h aniAn2 = 0.
We rephrase this as follows: given A G Mn(k) there exists a nonzero
polynomial (with coefficients in k) of degree (at most) n2 which is satisfied by
A. For n = 2 and A = I n 12 ) this means there is a nonzero vector
V a2i a22 J
(&0,&1,&2,&3,&4) € k4
such that
b0I + 6i A + 62A2 + 63A3 + 64A4 =
Actually, it is possible to find a polynomial of degree n which A G Mn(k)
satisfies.
Definition The characteristic polynomial of A G Mn(k) is
(3) a(x) = det(xl - A)
which is a monic polynomial of degree n.
Strictly speaking, we have not defined the determinant of C G Mn{R)
unless R is a field; however, we can formally calculate detC, using the
Laplace expansion, whenever C has entries from a commutative ring.
In particular, deiC i« defined for C G Mn(k[x]). Taking C = xl - A,
we see that (3) mikes mmo for A € Mn(k). (Actually, (3) makes sense for
0 0
0 0
84
III. Rings and Polynomials
A G Mn{R), where R is a commutative ring with unit element, but we will
not need this much generality until Section C.)
Now the Cay ley-Hamilton Theorem states that &(A) is the n x n zero
matrix. In more detail,
xI-A
( x - an -ai2 ... -aln \
—a2i x — a22 • • • — «2n
\ —Q>n\ —a>n2 • • • x — ann J
and the determinant a(x) of this matrix is of the form
(4) a(x) = Xn + fcn-lZ71"1 + &n-2*n~2 + • • • + hx + b0 ,
and the assertion of the theorem becomes
An + &n_i A""1 + 6n_2An"2 + • • • + h A + 60/ = 0.
For n = 2 and A = ( n 12 ) the assertion is
V a2i «22 /
A2 - (an +CL22)A + (a11a22 - a12a2i)I = I q 0 ) '
which can be checked directly.
Note that <r(0) = det(-A) = (-l)ndetA, but cr(0) = b0 is also the
constant term of a(x). It is also an interesting fact that 6n_i = —trace A
Exercise. Use induction to show that 6n_i = —trace A and that a(x) is,
indeed, a monic polynomial of degree n.
Proposition 2 x — r is a factor of a(x) O r is an eigenvalue of A.
Proof: x — r is a factor of a(x) <^> a(r) = 0 <=> det(rl - A) = 0 <& rl - A
is singular <^> there is a v ^ 0 in kn such that v(r/ — A) = 0 <^> vA = rv.
1
Proposition 2 provides us with a convenient method for calculating the
eigenvalues of a square matrix.
Examples
( 2 2 \
(1) The eigenvalues of A = I I are the roots of
a(x) = det ( :E_22 s +2j ) = 0z-2)(z+l)-4 = (*-3)(z+2)
which are Ai = 3 and A2 = — 2. To get an eigenvector for A
corresponding to Ai = 3, note that we are looking for (rci, £2) £
R2 such that
(&i,&aM = 3(01,32)
A. Rings
85
or equivalently
(x1,x2)(3I-A) =0.
Thus (xi,x2) must satisfy
(xllx2)( _2 4 J = {xi -2x2,-2xi +4x2) = (0,0)
or, mpre simply, x\ = 2x2. If we choose x2 = 1, then we get
the eigenvector
(X!,X2) = (2,1).
Exercise. Find an eigenvector corresponding to the eigenvalue
A2 = -2.
(2) The eigenvalues of A = I n J are the roots of
a(x)=det(X1 'l^x' + l.
This polynomial does not have any real roots, but if we
extend to the complex numbers (i.e., think of A as an element of
M2(C)), then we see that the eigenvalues of A are Ai = i and
X2 = — i. Unfortunately, the eigenvectors for A must also be
complex.
Since any real polynomial can be factored completely over C, the
technique used in example (2) shows that every A e Mn(R) has an eigenvalue
(although it may be a complex number).
It is easy to show that similar matrices have equal characteristic
polynomials; so if V is a finite-dimensional vector space, then we define the
characteristic polynomial of <j> G End(T^) to be the characteristic
polynomial of Ma (</>), where a is any basis for V. This allows us to calculate the
eigenvalues of </> by calculating them for any Ma((f>).
Exercises
(1) Recall the definition of Zm (with the operations + and •) from
the exercises of Chapter 0. Show that Zm is a ring. Show that
this ring is a field <^misa prime number.
(2) Let R be a ring and choose any r, 5 E R. Show that r(—s) =
— (rs) = (-r)s and (—r)(—s) = rs. If R has a unit element,
show that it is unique and that (—l)r = —r, (—1)(—1) = 1.
(3) A commutative ring is called an integral domain if it has no
diviners of zoro. Show that a finite integral domain is a field.
86
III. Rings and Polynomials
(4) Define a map (j> : Z —► Zm by </>(n) = remainder after dividing
n by m. Show that 0 is a surjective (ring) homomorphism.
What is the kernel of </>?
(5) Find the eigenvalues of the following matrices. Also find a basis
for each associated eigenspace.
1 0 1
(ii) A = | 0 2 0
0 0 1
(iii) A
(6) Show that similar matrices have the same characteristic
polynomial.
(7) Show that the n x n matrices A and lA have the same
characteristic polynomial. If one of the square matrices A, B is
nonsingular, show that AB and BA have the same
characteristic polynomial. Is this still true if we allow them to both be
singular?
(8) Let V = R3 and define 0 G End(F) by
<£(ei) = 2ei + 2e2-e3
0(e2) = ei + e2 + e3
<Ke3) = ei - 2e2 + ^4 •
Find the eigenvalues of </> and find a basis for each associated
eigenspace.
B. Polynomials
First we will consider polynomials with coefficients in a commutative (and
associative) ring R with unit element. Then we will remove the commuta-
tivity assumption (since we will want to use R = Mn(k)) and see how this
modifies our results.
Definition A function
p:R~+ R
B. Polynomials
87
is a polynomial function if it is given by a formula of the sort
(5) p(x) = anxn + an_ixn-1 H f- axx + a0
where an,..., ao are elements of R which we call the coefficients of p.
By writing (5) as we have, we mean to imply that an ^ 0. The n for
this highest nonzero coefficient is the degree of the polynomial p and we
write this as degp. The polynomial p is monic if an = 1. The x is referred
to as the indeterminate (since it may be chosen to be any element of R)
and the aiX% are the terms of p.
Definition An element x0 G R is a rootoip if p(x0) = anXQ-\ }-aiXo +
a0 = 0.
Let R[x] denote the set of all polynomials in x with coefficients in R.
We will make R[x] into a commutative ring with unit element.
The unit element 1 in R[x] will be the polynomial in x with all
coefficients except ao being zero and ao = 1 G R. The zero 0 G R[x] is the
polynomial with all coefficients zero. (Note that we do not define the degree
of the zero polynomial.)
To add p(x) and q(x) in R[x] we simply add up all of their terms and
then put the results in the form (5). For example, if p(x) = 2x3 + x — 1
and q(x) = 7x4 - 3xs + x + 1, we get p(x) + q(x) = 7x4 - xs + 2x. This
operation is associative and commutative, the polynomial 0 acts as identity
element, and — p(x) is the additive inverse of p(x). Thus R[x] becomes an
abelian group under addition.
Since multiplication is to distribute over addition, we just need to define
the product of a term a^x1 in p(x) and a term bjX^ in q(x). We set
(6) (aix^ibjx*) = (oibj)^*
and extend by distributivity. This multiplication is associative and is
commutative (since R is). Clearly 1 G R[x] is the unit element.
Properties of Degree:
(a) deg(/ + g) < max(deg /, deg#)
(b) deg(fg) < degf + deg#
If the coefficient of the highest power is a unit, for / or for g, then
(c) deg(fg) = degf + deg#
Proof: (a) and (b) are straightforward. For (c), let f(x) = ao + a\x +
V anxn (an ^ 0) and g(x) = b0 + b\x H f- 6mxm (bm ^ 0). Then we
just need to show that the coefficient anbm of a?n+m is nonzero. But one
of an, bm is a unit. Say a,n m a unit. Then anbm = 0 gives a~lanbm = 0 or
bm = 0, a contradiction, I
88
III. Rings and Polynomials
Proposition 3 (Division Theorem) Given f(x),g(x) G R[x], with the
highest coefficient in g(x) being a unit, then there exist unique polynomials q{x)
and r(x) such that
(7) f(x) = q(x)g(x)+r(x)
with degr < deg# or r(x) = 0.
Proof: (Note that if deg / < deg #, the conclusion is trivial — take q(x) =
0 and r(x) = f(x).)
Let 5 = {f(x) — h(x)g(x)\h(x) G i?[x]}. If 0 G 5, then we are done;
otherwise, from S we select r(x) = f(x) — q{x)g{x) with the least possible
degree and suppose
r(x) = Co + c\X H h cpxp1 cp ^ 0, and
g(x) = b0 + bix H h 6mxm, bm a unit.
We will show that p>m contradicts the minimality of p = deg r.
So suppose p > m, and let
*(*) = r(a;) - cpb^x^mg{x).
Then
s(x) = /(a;) - q(x)g(x) - cpb^xp-mg(x)
= f(x)-(q(x)+cpb^xP-m)g(x)
showing s(x) G S. Now degs < degr since
s(x) = (c0 + cxx + • • • + cpxp) - cpb^l1xp-rn(bo + • • • + 6mxm)
and the pth powers cancel. Thus deg r < deg g, and we have also found q.
It remains to prove that q and r are unique. So suppose qg+r = q'g+r'
where q ^ g'. (Note that q ^ qf implies q — q' ^ 0 so that deg(g — g') > 0.)
Then (q - q')g = r' - r. So
(8) deg((<? - q')g) = deg(r' - r) < degg.
By property (c) of degree,
(9) deg((q - q')g) = deg(q - q') + deg g,
and (8) and (9) together force deg(g — q') < 0, giving a contradiction. Thus
q = q', and also r = r'. I
Proposition 4 Let
f(x) = a0 + aixH \-anxn,
g(x) = 60 + 6ia?H + 6mxm, and c £ R,
B. Polynomials
89
Then
0) (f9)(c) = a>o9(c) + aig(c)c H h ang(c)cn, and
(ii) (gf)(c) = g{c)a0 + cg(c)a1 -\ h cng(c)an.
Proof: Using distributivity we get
f(x)$x) = a0g(x) + a1g(x)x -\ h ang(x)xn, and
g(x)f(x) = g(x)a0 + xg{x)ax + • • • + zn#0z)an,
so the proposition follows. (We have written these in a somewhat strange
way in preparation for the case in which R is not necessarily commutative.)
I
Corollary 1 If g(c) = 0, then (fg)(c) = 0 and (gf)(c) = 0.
Corollary 2 If f{x) can be written as f{x) = g(x)(x — c)-\-r with c,r £ R,
then f{c) = r.
Corollary 3 (Factor Theorem) Let f(x) G R[x\. Then x — c is a factor of
f(x) * /(c) = 0.
Case of Noncommutative R
Suppose now that R is not necessarily commutative, and let p(x) =
ao + a>\x H f- anxn G R[x\. We define operations on R[x] as before and
R[x] becomes a ring which, of course, may not be commutative. Now for
c G R we define the value of p at c to be
(10) p(c) = a$ + aic^ h ancn,
but in addition we define a left value p\{c) by
(11) p\(c) = a0 + cai + c2a2 H h cnan.
(Alternatively, we could write p\(x) = a0 + ccai H f- xnan and evaluate
Px(x) at c.) Now Proposition 4 becomes
Proposition 4/
Let f{x) = ao + aicc H f- anxn and g{x) G i?[cc]. Tften for c £ R, we
have
(0 (/#)(c) = ao#(c) + ai^(c)c + • • • + ang{c)cn, and
(ii) {9f)x{c) = £A(c)a0 + c^A(c)ai + • • • + cngx(c)an.
Proof: Again, wi have
90
III. Rings and Polynomials
f(x)g(x) = aog(x) + a\g(x)x -\ f- ang(x)xn (recall our
definition of the product of terms of polynomials. Thus g(x)x =
xg(x), etc.)
and
g{x)f(x) = g{x)a0 + xg(x)a1 H h xng(x)an.
Thus the proposition follows from (10) and (11) I
Corollary 1'
(*) 9(c)=0^(/<7)(c) = 0(by(i)of4').
(**) 9\(c) = 0 =» (gf)x(c) = 0 (by (ii) of 4').
Corollary 2' (Remainder Theorem)
(i) If f(x) = g{x)(x -c) + r, then /(c) = r.
(ii) If f(x) = {x - c)g(x) + r, then fx(c) = r.
Proof: For (i), apply (*) to the product g{x){x — c). For (ii), apply (**)
to the product (x — c)g(x). I
Corollary 3' (Factor Theorem)
Let f(x) G R[x] and c G R. Then
(12) (x - c) is a right factor of f(x) O /(c) = 0.
(13) (x - c) is a left factor of f(x) <^> f\(c) = 0.
Polynomials Over a Field k
Definition If p(x),q(x) G k[x] we say that q divides p if there is some
nonconstant h(x) G k[x] such that p = qh.
Given two polynomials f(x),g(x) in k[x] we now find their greatest
common divisor.
Proposition 5 Given f(x),g(x) G k[x] (not both zero), there is a unique
polynomial d(x) satisfying:
(i) d(x) — a{x)f{x) H- b(x)g(x) (for some a(x),b(x) G k[x}),
(ii) d(x) is monic,
(iii) d(x) divides both f(x) andg(x), and
B. Polynomials
91
(iv) if s(x) divides both f{x) andg(x), then s(x) divides d(x).
Proof: Since we do not have both / and g zero, we have a nonzero
polynomial of the form (i), and we may require it to be monic since A; is a field.
Let d{x) be one of minimal degree. We claim that it then also satisfies (iii)
and (iv).
If d fails to divide /, we have
f(x) = q(x)d(x) + r(x) with degr < degd, and r ^ 0.
For some c £ &, cr is monic. We claim it is of the form (i) and it has degree
less than deg d, contradicting the minimality of deg d. Now r = f — qd and
d = af + bg, so
•cr = cf- cqd = cf - cq(af + bg) = (c - cqa)f - cqbg ,
and so cr is of the form (i). This proves d satisfies (iii).
(iv) follows easily from (i).
Finally, if we have d! with these same properties, then d and d! are
monic polynomials each dividing the other (by part (iv)) and thus they are
equal. I
Definition f(x),g(x) are relatively prime if their greatest common divisor
is 1. A polynomial / is irreducible if the only monic nonconstant polynomial
dividing / is a constant multiple of /.
Lemma 1 Suppose f is irreducible and f divides the product gh. Then f
divides g or f divides h.
Proof: If / does not divide g, then 1 = a(x)f(x)-\-b(x)g(x) (by Proposition
5). But then h = afh -f- bgh, and since / divides gh we conclude that /
divides h. I
As a corollary to this lemma, if an irreducible f(x) £ k[x] divides a
multiple product of elements of k[x], then it divides some one of them.
Proposition 6 Letp(x) be a nonconstant monic polynomial in k[x\. Then
p(x) is a unique (up to order) product of irreducible polynomials.
Proof: Suppose monic nonconstant polynomials which cannot be factored
into irreducible polynomials do exist. Let / be such a one of minimal
degree. If / is irreducible, then it is already factored; so we must have
f = gh with g and h monic and nonconstant. But deg/ = deg# -f deg ft
(by (c) of properties of degree) and thus g and h can be factored into
irreducible polynomials. But then / can also be factored.
Now suppose that there are polynomials which admit nonunique
factorizations. Then we have
(14) p s= ai.. .ar =s b\.. .b9
92
III. Rings and Polynomials
with each a* and bj monic and irreducible. Choose the least r for which
this happens. Since a\ divides b\... bSl it divides one of the bj; since bj is
irreducible, they are equal and can be cancelled and r was not minimal.
Thus r = s.
Finally, if the 6/s are not just a permutation of the a^s, we can choose
r(= s) minimal for this. Again, we can cancel a\ with some bj to obtain a
contradiction. I
Definition Given A G Mn(fc), we know that there is a monic polynomial
p(x) G k[x] such that p(A) = 0 (e.g., let p be the characteristic polynomial
of A) so there is such a polynomial of minimal degree. We call this the
minimal polynomial of A and denote it by m(x). (If V is finite-dimensional,
we define the minimal polynomial of </> G End(F) to be the minimal
polynomial of any matrix which represents </>. This is well-defined by property
(e) below.)
Properties of m(x):
(a) deg m < n
(b) m is unique
(c) m divides any polynomial which is satisfied by A (so, in
particular, m divides the characteristic polynomial a of A)
(d) m and a have the same irreducible factors
(e) similar matrices have the same minimal polynomial
Proof: (a) is obvious. For (b), suppose we have another such m'{x).
Since m and mf are both monic, m — mf must be of smaller degree; but
(m — m'){A) — m(A) — m'{A) = 0 which contradicts the minimality of
deg m = deg m' (unless m' = m).
For (c), suppose we have a nontrivial polynomial p(x) G k[x] such that
p(A) = 0. By the division theorem, there are polynomials g(x),r(x) such
that
p(x) = g{x)m{x) -f r{x)
where r = 0 or degr < degm. If r ^ 0, we get a contradiction since
r(A) = p(A) - g(A)m(A) = 0.
To prove (d), the following lemma will be useful. (Its proof is left as an
exercise.)
Lemma 2 cr(x) divides (m(x))n.
Now suppose that f(x) G k[x] is an irreducible polynomial. If / divides
m, then it also divides a (since m divides a by part (c)). On the other
hand, suppose that / divides a. Then / divides mn by the previous lemma;
but / is irreducible and so it must also divide m by Lemma 1.
The proof of (e) is left as exercise (2). 1
B. Polynomials
93
Exercises
(1) Prove Lemma 2.
(2) Given A G Mn{k) and f(x) G k[x], show that f{CAC'1) =
Cf(A)C~1 where C is any n x n invertible matrix. Conclude
that miCAC-1) =0&m{A) = 0.
(3) Show^hat x — 1 divides xn — 1.
(4) Given A G Mn(fc), let S C fc[x] be the set of all polynomials
f(x) such that /(A) = 0. Show that S is an ideal of k[x].
(5) Let f(x) G R[cc] be a monic irreducible polynomial of degree
two. Show that / can be written in the form
f(x) = (x-a)2 + b2
where a, b are real numbers with 6^0. Conversely, show that
any such / is irreducible.
(6) Suppose f(x) = anxn -f • • • + a\x -f ao is a polynomial with
integer coefficients, and suppose m/£ is a root of /, where m
and I are relatively prime integers. Show that m divides ao
and I divides an.
(7) Show that the polynomials x2 — x -f 4 and x3 — x -f 1 are
irreducible over the rational numbers. What about over R?
(8) Show that if R is an integral domain with unit element, then
any unit in R[x] is also a unit in R.
(9) Find the minimal polynomial of the following matrices:
(i) A =
(ii) A =
2
0
0
0
0
2
0/
(10) Suppose A =
B
0
0
C
where B, C are square submatrices.
Show that the minimal polynomial of A is the least common
multiple of the minimal polynomials for B and C. Now
generalize to the case where
/ A1 0 \
A =
\ 0 Am )
94
III. Rings and Polynomials
C. Cay ley-Hamilton Theorem
Let R be a commutative ring with unit element. Then R[x] is also
commutative. We consider two rings. First consider Mn(R[x]), that is, the
ring of all n x n matrices with entries from the ring R[x], For example,
/ 1 -f x 2x2 \
( „ 3 ] G M2(Z[x]). Next consider Mn(R)[x], that is, the ring of
polynomials with coefficients from the ring Mn(R) of n x n matrices with
entries from R. We claim that these rings are essentially the same.
Theorem 1 Mn(R[x]) ^ Mn(R)[x] .
Proof: First we need some notation: for f(x) = ao + a\X + f- anxn G
R[x], let rp{f{x)) = ap, where it is understood that rp(f(x)) = 0 if p > n.
Clearly rp{f{x) + g(x)) = rp(f(x)) + rp(g(x)); since we want a ring
isomorphism from Mn(R[x]) to Mn(R)[x], we also need to know rp{f{x)g{x)).
Lemma rp(f(x)(g(x)) = Y%=0ri(f(x))rp-i(g(x)).
The proof of this lemma is left as an exercise.
Now for F = {fij(x)) G Mn(R[x}), we define
rp(F) = (r^fijix)))
which is an element of Mn(R). Clearly rp(F + G) — rp(F) + rp(G) for
F,G e Mn(R[x]).
We are now ready to define a map from Mn{R[x\) to Mn(R)[x]: given
F G Mn(R[x\), define
a : Mn(R[x])-^ Mn(R)[x]
by a(F) = r0(F) + n(F)x + r2{F)x2 + ... (a finite sum).
It is plain that a(F+G) = a(F)+a(G), since rp{F+G) = rp{F)+rp{G).
To see that a(FG) = a(F)a(G) requires some work, however. We will show
that the (matrix) coefficient of xp is the same in a(FG) and a(F)a(G).
The coefficient of xp in a(F)a(G) is
C = (cy) = f>(J>p_,(G)
by the obvious extension of our lemma to the polynomial ring Mn(R)[x\.
The coefficient of xp in a(FG) is
D = (dy) = rp(FG) .
C. Cay ley-Hamilton Theorem
95
If F = (fij(x)) and G = (gij(x)), then the i,j entry of FG is (FG)ij =
/•i0ij 4- • ■ • 4- /in^nj. Thus
dij = (rp(FG))ij = rp(fngij H h /*n^nj)
= Yl(n{fn)rp-i{gij) 4- • • • 4- ri(fin)rp-i(gnj))
£=0
where the last equality follows from our lemma.
Now note that
v
Cij = Yl(ith row of T^F^ x tith column of rp-t(G))
£=0
P
= ^2{r£(fn)rp-£(gij) 4- • • • 4- r£(fin)rp-£(gnj))
£=0
which is just dij.
We have shown that a is a ring homomorphism; the proofs that it is
surjective and injective are left as an exercise. I
We continue with R (and hence R[x]) commutative and with unit
element. Recall that we defined the characteristic polynomial of A G Mn(R)
to be
(15) a(x) = det(xl- A).
We note that many important properties of the determinant function
still hold in this general setting; in particular, Theorem 2 of Chapter II is
still true, That is, if adj C denotes the classical adjoint of C G Mn(R),
then
(16) (adj C)C = (detC)I.
Applying this to C = xl — A gives
(adj (xl - A))(xl -A)= det(xl - A)I = a(x)I.
Now we can consider (by Theorem 1) the relation
(adj (xl - A))(xl -A) = a(x)I
to be in Mn(R)[x}. There, it says that xl — A is a right factor of a(x)I.
Thus, by Corollary 3', the value of a(x)I at x = A is zero. We have proven
the following important theorem.
Theorem 2 (Cayley-Hamilton Theorem) For A e Mn(R), we have
. (17) (j(yl) = ao/4-a1A+... + an_1An-1+An = 0
where ar(x) s* ao4-aj£-l \ran-ixrhm~i+xn is the characteristic polynomial
of A.
96
III. Rings and Polynomials
Exercises
(1) Prove the lemma in this section. (Hint: Try induction on
deg(fg).)
(2) Verify by direct computation that the following matrices are,
indeed, roots of their characteristic polynomials.
(i) A =
2
-1
0
0
0
1
0
4
3
2
1
0
0
0
-1
0
1
0
0
0
1
\
/
0
1
0
0
\
/
(ii)
(3) Prove that a (defined in the proof of Theorem 1) is surjective
and injective.
D. Spectral Theorems
(an application of the Cayley-Hamilton Theorem)
Let A G Mn(k) and let
a(x) = det(x/ - A) G k[x]
be its characteristic polynomial. We will see how to obtain information
about A (considering it as an element of End(/cn)) from information about
a.
Case I: Suppose
a(x) = {x- n)(x - r2)... {x - rn)
where Ti G k and no two of the T{ equal.
First we note that by Cayley-Hamilton we have
(18) 0 = (A - rJ)(A - r2I) • • • (A - rnI) .
Next, since the ?Vs are distinct, we can find ai,a2,... ,an G fe* such
that (for x not equal to any of the ?Vs) we have
(19) -7-r = + ••• + , or
a(x) x — r\ x - rn
D. Spectral Theorems
97
(20) 1 = ax ]J(x - rj) + • • • + an JJ (x - rd).
(This is just the technique of partial fractions from calculus.) Plugging in
A for x, we have
(21) / = oi H(A - rjI) + • • • + an JJ(^ - rjl).
O
Setting £?i = aiTT(A - r^J), we have
(22) 7 = ^1 + ^ + ... + ^.
Proposition 7 (Properties of the Ei)
(i) EiA = AEi (Each Ei commutes with A),
(ii) EiA = riEi,
(iii) For i ^ j, EiEj = 0 fTfte Ei's form an "orthogonal set"),
and
(iv) Ef = £?j f!Sacft £?i is idempotent,).
Proof:
(i) This follows since A(A — Vjl) = (A — TjI)A for each j.
.(ii) ^(A - r,7) = (aiH(A - rjI)){A - rj) = 0 by (18).
(iii) Note that for i ^ j, the product
a, ]J(A - r£I) J j aj H (A - rm/)
contains the right hand side of (18) as a factor,
(iv)
El = ElI = Et(Ei + ••• + £„) by (22)
= Ef by (iii). |
For i = 1,... ,n we let
V$ = £?i(fcn) .
Since Ei is idempotent we know that it acts on Vi as the identity. Note
that Ei ^ 0 since the minimal polynomial for A must contain the factor
x-Vi. Henm V5 f* {0} and so dim V^ > 0.
98
III. Rings and Polynomials
Proposition 8 (Spectral Theorem) Each Vi is one-dimensional,
kn = vi e • • • e vn ,
and the action of A onVi is multiplication by r^. Furthermore, if we denote
the restriction of A to Vi by A{, then the minimal polynomial of Ai is x—ri.
Proof: First off, the Vi do span kn. For let v G kn. Then
v = vl = v(E1 + • • • + En) e Vi + V2 + • • • + Vn.
Next let w = vEi and consider the action of A on w;: wA = vEiA =
vriEi = uvEi = riw. It follows that Vi is an eigenspace for A with
eigenvalue r$. Now suppose we have w £ V± Pi (V2 + - - + Vn) so that
w = V1E1 and w = V2E2 H -f vnEn. But then
w = WE1 = (v2E2 + • • • + vnEn)Ei = 0
by (iii) of Proposition 7. Similarly, we see that Vi f) (Vi H h VJ_i H- V^+i H-
• • • + Vn) = {0} for each i. It follows that kn = V\ 0 • • • 0 Vn by exercise
(4) of Section ID. Since dim Vi > 0 for each i, we must have dim VJ = 1.
The fact that x — r» is the minimal polynomial for A* follows from (ii)
of Proposition 7. I
So we see that Case I is very favorable. When a(x) splits into linear
factors x — Vi with no r» occurring more than once, we get a clear and
simple picture of A through the idempotents Ei,...,En. In Case I it is
also easy to calculate any polynomial p{x) in A because we have
Proposition 9 p(A) = p(r\)E\ H hp(rn)En.
Proof: By the division theorem and the remainder theorem, for each i we
have p(x) — p{vi) -f (x — ri)qi(x). By plugging A in for x and multiplying
on the right by Et, we have
p(A)Ei = p(ri)El + (A - r<J)(fc(A)Ei .
But (A - TiI)Ei = 0 (by (ii) of Proposition 7). So
p(A)Et = p(rt)Ei.
Thus
p(A) = p(A)(BiH-...+Sn)
= p(ri)(£i) + .--+p(rn)(£n)
as asserted. I
Case II: Suppose
(23) a(x) = det(a:/ - A) = (x - rx)mi ... (a? - f|)mi
D. Spectral Theorems
99
with ri,..., re distinct elements of the field k. So we are assuming again
that a(x) splits into linear factors but now there may be repetitions. (For
k = C we always have Case II and may sometimes have the subcase, Case
By Cayley-Hamilton, we have
(24) 0 = (A - nJ)mi ...(A- reI)m'.
O
Just as for Case I, we want to find idempotents Ei,...,E£ associated
with A by breaking up l/a(x) into an appropriate sum. The difference now
is that we need polynomials as numerators instead of just elements of k.
Proposition 10 If p(x),q(x) are in k[x] and are (nonconstant and)
relatively prime, then there exist unique polynomials a(x),b(x) in k[x] such
that ap -f bq = 1 and deg a < deg q and deg b < degp.
Proof: Since p, q are relatively prime, there exist ai, b\ G k[x] such that
axp + hq = 1.
Divide a\ by q to get a\ = qh-\- a with deg a < deg q. Set b = b\ -f hp
and then 1 = a\p -f b\q = (qh -f a)p -f (b — /ip)g = ap -f 6g. So for the
existence part of the proof, we are done if we can show deg b < degp.
Since we are in k[x], with k a field, we always have deg(fg) = deg / -f
deg#. So deg(bq) = deg 6 -f degq. But we also have
deg(bq) = deg(l — ap) < deg a -f degp < deg q -f degp
It follows that deg b < degp, proving existence.
Suppose we also have a'p -f b'q = 1 with deg a' < deg q and deg 6' <
degp. Then (a — a')p = — (b — 6')g and, since p and g are relatively prime,
we have that p divides b — b'. Since deg(6 — b') < degp, this forces b — b' = 0.
Similarly a — a' = 0 and Proposition 10 is proved. I
We are still considering Case II; i.e.,
a(x) = det(xl - A) = (x - n)™1 ... (x - r£)m*
with ri,..., r£ distinct elements of k.
Proposition 11 We have
(25) — = Ql(x) +... + Qi(x)
1 ; ct(x) (x-n)™* (x-r{)me
where 0 < dega^a?) < m* - 1 ^/or t = 1,... J),
100
III. Rings and Polynomials
Proof: We will establish (25) in the equivalent form
(26) 1 = aiOr) ]\{x - rj)m> + • • • + a£(x) JJ(a: - rj)m>
or equivalently, that
h(x) = ai(x) ]\(x - rj)m> +... + a£{x) Y[(x - rj)m> - 1
is the zero polynomial.
Now we choose i £ {1,2,..., £} and set
p[x) = Y[(x - rj)m> and q(x) = (x - r*)m\
These are nonconstant relatively prime polynomials so we can apply
Proposition 10. This gives unique polynomials a,i(x),bi(x) in k[x] such that
(27) ai(x)p(x) + bi(x)q(x) = 1
with degeii < rrii and degfr* < degp = deg cr — rrii. We use the di(x) in
(27) to define h(x).
Note that degh < deg a. Also note that q(x) divides a,i(x)p(x) — 1 (by
(27)).
Now i was arbitrary so we conclude that for (x — ri)m% divides h(x)
for each i. But this implies a(x) divides h(x). Since degh < deg cr, we
conclude that h(x) = 0 as we desired to prove. I
We can now substitute A for x in (26) to obtain
I = a1(A)]l(A-rjir>+... + a£(A)]l(A-rjir^
or letting Ei — ai{A)V\{A — rjl)™3, we again have
(28) / = £! + •••+£/
with Z?iZ?j = 0 for i ^ j, Z^A = AE*, and Ef = Ei. So we have again
decomposed I into a sum of "orthogonal" idempotents using A. However,
condition (ii) in Proposition 7 (namely, that (A — TiI)Ei = 0) must now be
replaced by
(29) (A-riI)m*Ei=Q.
So if we define V* = Ei(kn) as before, we have that VJ ^ {0} since the
minimal polynomial of A must contain the factor x - r*.
D. Spectral Theorems
101
Proposition 12 (Spectral Theorem) Each Vi is an mi-dimensional sub-
space which is A-stable,
kn = Vi 0 • • • 0 Vt ,
and the minimal polynomial of Ai — A\v is (x — Ti)Pt where 1 < pi < ra^.
In fact, the minimal polynomial of A is m(x) = (x — v\)Vl • • • (x — ri)Pe.
o
Proof: By (29), we conclude that the minimal polynomial of A{ must
be (x — ri)Pt with a < pi < m*. That m(x) = (x - ri)Pl • • • (x - rn)Vi
is the minimal polynomial of A is left as an exercise. The proof that
kn = Vi 0 • • • 0 Vi is almost identical to the proof in case I.
To see that Vi is A-stable, choose w £ V^ and note that there is some
v ekn such that w = vEi. Thus wA = (vEi)A = {vA)Ei G V*.
We will defer the proof that dim Vi = rrii until the next section. (It is
an easy consequence of the fact that we can put A into a "block diagonal
form.") |
Exercises
(1) Compute the idempotents for the following matrices. What is
the minimal polynomial of each Ai = A\v?
f 1
o
\°
(2
0
0
V °
0
2
0
0
3
0
0
0 ^
0
-1 t
0
0
-2
0
\
/
0 ^
0
0
3)
(i)
(ii)
(2) Based on the results of exercise (1), state a general result about
the idempotents of a diagonal matrix A.
(3) Let A G Mn(k) be the matrix with A G k on its diagonal, 1
on its superdiagonal, and zeros elsewhere. For example, with
n = 3
/ A 1 0
A= 0 A 1
\ 0 0 A
What arc the characteristic and minimal polynomials of AI
Show thfid A has exactly one eigenvector. What is the algebraic
multiplicity of the eigenvalue A?
102
III. Rings and Polynomials
(4) Let V be a finite-dimensional vector space and suppose V =
Vi 0 V2 where Vi, V2 are </>-stable subspaces for <\> € End(V). If
m,i(x) is the minimal polynomial of <j>\v , show that the minimal
polynomial of <j> is the least common multiple of m\(x) and
m2(x). Generalize.
E. Jordan Form
Given </> £ End(V) where V is finite-dimensional, it is to our advantage to
choose a basis a of V so that Ma(</>) is as simple as possible; this will make
the task of understanding </> easier.
We will look at this primarily as a matrix problem; i.e., given A £
Mn(k), we seek a matrix B which is similar to A and is as nice as possible.
Now suppose that the characteristic polynomial a of A factors
completely over k so that
(30) a(x) = (x~ n)mi • • • (x - r£)m*
as in case II of the previous section. As we observed before, this is always
true when k = C.
By Proposition 12, there are A-stable subspaces Vi,..., Vt such that
kn = Vx 0 • • • 0 Vt ,
and the minimal polynomial of A% — A\v is [x — ri)Pt where 1 < pi < ra$.
Also, the minimal polynomial of A is m(x) = (x — ri)Pl • • • (x — ri)Vi.
First we show that we can consider A{ to be a square submatrix of A.
Proposition 13 Suppose that (f) is an endomorphism of the vector space
V; also suppose that Wi,...,Wm are (f)-stable subspaces such that V =
W\ 0 • • • 0 Wm. Then we can find a basis aofV such that the matrix of
(j) is of the form
(31) Ma((f>) =
f Bl 0 \
Bo
\ 0 Bm J
where jE?i, ..., Bm are square submatrices. Furthermore, the matrix of </>\w
is B{.
Proof: Choose a basis {vj,..., v^} for W\. Since W\ is ^-stable, we have
(j)(v\) = bUv{ + - - + bin^
tlVnJ = fcnilv} + --- + 6nln1Vn
E. Jordan Form
103
So let B\ be the n\ x n\ matrix (6^).
Similarly, choose a basis {v\, ...,^J for M^2? etc. Then a = {vj,...,
v™ } is the desired basis for V. I
Corollary If A € Mn(k) satisfies (30), then there is a nonsingular matrix
C such that B = CAC'1 is of the form (31).
Exercise. Finish the proof of Proposition 12; i.e., show that dim Vi = m*.
(Hint: Show that the characteristic polynomial of A{ must be (x — r^)771*.)
The matrix B in the preceding corollary is simpler than A since it is a
block diagonal matrix (i.e., it has square submatrices on its diagonal and
zeros elsewhere), but we can do better.
Definition The matrix
Jn(A)=
(X 1
0 A
0 \
1
A/
€ Mn(k)
with A on its diagonal, 1 on its superdiagonal, and zeros elsewhere is called
the n x n Jordan block corresponding to A.
Theorem 3 Suppose that <j> € End(V) has (x — r)s as its minimal
polynomial. Then there is a basis for V in which the matrix of (j> is a block
diagonal matrix of the form
(32)
/ JsAt)
\ 0
JsAr)
0 \
Jsk(r) J
where s = si > s2 > • • • > Sfc > 1. Furthermore, this matrix is uniquely
determined by (j).
The proof of Theorem 3 is somewhat long-winded, so we will present it
in an appendix after the exercises.
Returning to the case of a matrix A £ Mn(k) whose characteristic
polynomial is cr(x) = (x — ri)mi • • • (x — re)171* and whose minimal polynomial
is m(x) = (x — ri)Pl • • • [x — 7y)p*, we see that a consequence of Proposition
13 and Theorem 3 is the following.
Corollary Th&re is a nonsingular matrix C mwh that B = CAC l is the
104
III. Rings and Polynomials
unique block diagonal matrix
(33)
B
( Bx
\ 0
#2
° \
Bi J
where Bi is the rriix ^» block diagonal matrix of the form (32) corresponding
to the eigenvalue r^; furthermore, Bi has a pi x pi block (in its upper left
hand corner) and none of its other blocks is larger.
Definition The matrix B in the preceding corollary is called the Jordan
form of A.
Examples
(1) A = I 1 j has characteristic polynomial o{x) — (x — l)2
and minimal polynomial m{x) = (x — l)2; so the Jordan form
of A must have exactly one 2x2 Jordan block corresponding
to the eigenvalue 1, i.e.,
B =
(2) A is said to be diagonalizable if its Jordan form is a diagonal
matrix. If we are in case I of the previous section, then A is
diagonalizable.
(3) Suppose the minimal polynomial of A £ M^R) is (x — 2)2.
Then A has two possible Jordan forms, namely,
/2
0
V
\
i
2/
or
/2
0
V
2/
(Here, a blank space indicates a zero.) In order to decide which
is actually the Jordan form of A, we need more information
about A.
We will now use the Jordan form of A to draw some interesting (and
important) conclusions.
Since rz is a root of the characteristic polynomial a(x) of A, it is also
an eigenvalue of A. Recall that we defined the algebraic multiplicity of
Ti to be the maximum dimension of the subspaces 0 C ker (A — r,/) C
ker (A-r?;J)2C....
E. Jordan Form
105
Proposition 14 The algebraic multiplicity of r^ is equal to rrii (the
multiplicity of ri as a root of a(x)).
Proof: Let B = CAC"1 be the Jordan form of A (as in (33)) so that J3*
is the rrii x rrii block corresponding to r^.
Now the only vectors which get mapped to 0 by B — r{I are precisely the
ones which get mapped to 0 by Bi — ul. (Exercise. Show this.) Similarly,
ker (B — r^/)-7 =Jker (Bi — r^/)-7 for all j > 1. But Bi — rj is nilpotent of
order ^ (i.e., (Bi — riI)Pt = 0) so the dimensions of the kernels of (B — Tiiy
stabilize at rrii.
Since dim(ker (A — rilY) = dim(ker (J3 —r^J)-7') for all j > 1, our result
follows. |
Recall that we defined the geometric multiplicity of the eigenvalue r^ to
be the dimension of the eigenspace belonging to 7>
Proposition 15 If Bi is the block corresponding to ri in the Jordan form
of A, then the geometric multiplicity of ri is equal to the number of Jordan
blocks occurring on the diagonal of Bi.
Proof: Exercise. (Hint: An equivalent formulation of this problem is to
show that the dimension of ker (Bi — r^J) is equal to the number of Jordan
blocks on the diagonal of Bi.) I
Note that if B = (bij) is a matrix in Jordan form, then its determinant
is easy to calculate:
detB = 611622 • • *6nn
since B is triangular.
For A G Mn(C), let Ai,...,An G C be its eigenvalues (there may
be repeats in this list). If B = CAC"1 is the Jordan form of A, then
the diagonal elements of B are precisely (some permutation of) Ai,..., An.
Since det A = det £?, we see that the determinant of A is simply the product
of its eigenvalues.
In fact, for A G Mn(R), the same is true: thinking of A as an element
of Mn(C), we can put it in Jordan form (over C), etc.
Similar arguments show that trace A = Ai H h An for A with real or
complex entries.
We summarize these paragraphs as
Proposition 16 Let Ai,...,An G C be the eigenvalues of A € Mn(k)
where k = Rork = C. Then det A = Ai • • • An and trace A = AiH hAn.
Exercises
(1) Find all possible Jordan forms for a real matrix whose
characteristic and minimal polynomials are as follows. (Could
Proposition 15 possibly be used to single out a particular Jordan
form?)
III. Rings and Polynomials
(i) a(x) = (x-l)\x-2)2 , m(x) = (x-l)2(x-2)2
(ii) a{x) = (x + 3)4(x-4)4 , m(x) = (x + 3)2(x-4)2
(iii) a(x) = (x + l)6 , m(x) = (x + l)2
(2) Find the Jordan form of the following matrices. Also calculate
det A and trace A.
(iv) A =
(v) A =
(3) Show that A £ Mn{k) is diagonahzable <^> A has n linearly
independent eigenvectors.
(4) Show that A £ Mn{k) is diagonahzable <^> the minimal
polynomial m(x) of A is a product of distinct linear factors (i.e.,
m(x) — (x — ri) • • • (x — ri) where ri?..., ra are distinct).
(5) Show that the Jordan block Jn(r) is not diagonahzable. Which
of the matrices in exercise (2) are diagonahzable?
(6) If A £ Mn(k) is nilpotent and nonzero, show that A is not
diagonahzable.
(7) If A £ Mn(C) satisfies Am — /, show that A is diagonahzable.
(Hint: Show that Jn(r)m is not diagonal unless n = 1.)
(8) Given the characteristic and minimal polynomials of some A £
Af3(C), show that these completely determine the Jordan form
of A
Appendix: Proof of Theorem 3
107
(9) Suppose that A £ Mn{k) is diagonalizable, and let Q be the
matrix whose rows are the n linearly independent eigenvectors
of A. Show that B = QAQ~~l is the Jordan form of A.
Appendix: Proof of Theorem 3
First we consider^ the special case where dim V = s and the minimal
polynomial of (f) is {x — r)s. Since {(f) — r)s_1 ^ 0, there is a vector v eV such
that (<l>-r)a-\v)±Q.
Observe that the set of s vectors
a = {v,{cj>-r){v),...,{4>-ry'\v)}
is linearly independent. For if
(34) a0v + ax{(f) - r){v) + • • • + as_i(</> - r)3'1^) = 0 ,
then applying {(f) — r)s_1 to both sides of (34) yields
ao(^-r)*-1(t;) = 0.
But then a$ = 0 since {(f) — r)s~l{v) ^ 0; and so (34) becomes
(340 ai{(f) - r)(v) + • • • + as_!(0 - r)5"1^) = 0 .
Applying {(f) — r)s~2 to both sides of (34') yields a\ = 0, etc. Hence
ao = • • • = as-i — 0 and so a is linearly independent.
So a is a basis for V. What is the matrix of (f) in this basis? Since
(f){v) — (r + (f) — r){v) = rv -f {(f) — r){v)
(/>{(/> — r){v) = {r -f (f> — r)(</> — r){v) — r(</> — r){v) + {(f) — r)2{v)
(j){(j) - r)*-1^) = r{(j) - r)*-1^) + r{(f) - r)s{v) = r{(j) - r)5"1^)
we have Ma{(f>) — Js{r).
Now we return to the general case where dimF = n and the minimal
polynomial of (f) is (x — r)s. Suppose that we can decompose V as a direct
sum of 0-stable subspaces
(35) V = Vx 0 • • • © Vk
where dim V* = S{ (remember that s\ must be s) and the minimal
polynomial of $\v is (a? - r)"'. Then by the simple case above and Proposition
13, we will d© done,
108
III. Rings and Polynomials
For convenience, we write
-0 — (j) — r .
We start off by finding V\ as in the simple case. Since -0s-1 ^ 0, there
is a ^i G V such that i^s~l{vi) ^ 0. Let V\ be the «i(= s)-dimensional
^-stable subspace
Vi =Span(vi,^(t;i),...,^a-1(wi))
and note that the minimial polynomial of (j>\v is (x — r)Sl.
If we can find a ^-stable subspace W such that
V = Vl®W ,
then (noting that the minimal polynomial of </>\w is (x — r)f where t < s)
by induction on n = dim V, we can find the appropriate </>-stable subspaces
V2,..., "54 such that
w = v2 e... e vk.
This will then give us (35).
To get W, we use induction on s. If 5 = 1, there is nothing to prove.
Our induction hypothesis is:
Suppose p G End(X) has minimal polynomial (x — r)s_1.
If we let Xi = Span(xi, (p — r)(xi),..., (p — r)s~~2(xi)) where
(p — r)s~2(xi) 7^ 0, then there is a p-stable subspace Y such
that X = Xi 0 Y.
Now let V = ip(V) and note that V is </>-stable, because for any t? G V
we have
4>{%l>{v)) = </>(</> - r)(v) = ((t>- r)(f)(v) = i/>(<l>(v)) G V' .
Also, the minimal polynomial of <j>\v, is (x — r)s-1. If we let u\ = ^(vi),
then
V/ - SpanK,VK),... ^s~2(u1)) = ^(Vi)
is a ^-stable subspace of V', so we have a </>-stable subspace V{ such that
V7 = V[ 0 V2'
by induction.
Now let
W = ^~1(^) = {^ e V I tl>{v) g V2'}
and note that
(36) V = Vi + W .
Appendix: Proof of Theorem 3
109
To see this, choose t? G V and observe that we can write i/>(v) as
l/;(v) =Wi+W2
where Wi G V/. But V{ = Span(^(vi),... ,-0s-1(vi)) so we can write
wi as
Wl = ai^(vi) H ha^-i^5-1^!)
o = ^i+-- + as_if-2(t;i))
= #*i)
where zi eVi. Thus
^(v) = -0(^i) + W2
so that i/j(v — Zi) — W2 G V^. But this implies that v — Z\ G TV'; so we have
v = zi + (v - zi) eVi + W
as claimed.
Unfortunately, we do not have Vi n W7 = {0}, so the sum (36) is not
direct. However, we can say that
(37) VinV2 = {0}.
Exercise. Show this. (Hint: If v G Vi n V2', then ^(v) G V/ n V2' = {0}.
Show that t/;(v) = 0 and v G Vi together imply that v G V7.)
So W contains the subspaces Vi H IV' and V2' which intersect trivially
(by (37)). Thus we can find another subspace V7 such that
w' = (Vi n W) e v2' e v3'.
Exercise. Show this.
Finally, we let
w = vi ® vi
and show that this is the subspace we desire.
Clearly, V\ D W = {0} since V2 and V7 are both disjoint from Vi.
Furthermore, Vi + W contains both Vi and W so that V = Vi+W. Thus
v = Vi e w.
To see that TV is ^-stable, choose w G W and note that i/>(w) G V2.
But then 0(u/) = (</> - r + r)(w) = i/;(w) + rw eVj + W = W.
We have shown that <\> can be represented by the matrix (32), but what
about uniqueness? Toward this end, the following lemma will be helpful.
Lemma Suppose that U is t-dimensional and that p e End(U) has minimal
polynomial a?*. Thm p^(U) has dimension t - j for each j <t.
110
III. Rings and Polynomials
Proof: Choose wG[/so that {u, p(u),..., pt-1(ix)} is a basis for [/. Then
{p>(u),... ,pt_1(ix)} is a basis for fP(U). I
Continuing the proof of uniqueness in Theorem 3, suppose that </> has
another block diagonal representation where the blocks are Jtl (r),..., Jte (r)
with ti > • • • > te > 1. (These blocks correspond to subspaces Wi,..., Wt
such that V = W\ 0 • • • 0 W^.) We claim that k = I and s» = U for each i.
If not, there is a least index j such that
S3 T1 tj ?
without loss of generality, we assume that Sj < tj.
As before, let i/j = </> — r and note that the minimal polynomial at ij)\w
is xt% (and for i/j\v it is xSt). By the previous lemma,
dim^Sj (Vi) = Si- Sj and dim^ (Wi) = U - Sj
for i < j.
By exercise (9) of Section IE, we have
i/>*> (V) = ^ (W1 0 • • • 0 We) = ^ (Wi) 0 • • • 0 V>Sj (W€)
and so
, . dim^Sj (V) > dim^Sj (Wi) + • • • + dim^Sj (Wj)
[ j = (ti-*;) + -" + (t;-*;).
Also ^ (Vi) = {0} for i > j, so we have
^ (y) = ^ (^ 0... 0 vi) = r° (vi) ©... 0 v>Sj ovj
which implies that
(39) dim^Sj (V) = (*i - Sj) + • • • + (^-i - *,-) .
Now Si = ti,..., 5^-i = tj-i by our choice of j, so we can rewrite (39)
as
(39') dim^ (V) = (ti - Sj) + • • • + (t,_i - s,) .
Comparing (38) and (39'), we conclude that
0 > tj - Sj
which contradicts the fact that Sj < tj. This completes the proof of
Theorem 3. I
Chapter IV
Inner Product Spaces
A. Rn as a Model, Bilinear Forms
For x = (£i,...,£n), y = (yi>--->yn) two points in Rn we define the
distance from x to y by
d(x, y) = y/{xx - yi)2 + • • • + (xn - yn)2.
Two obvious properties of d are:
(1) d(y,x) =d{x,y),
. (2) d(x, y) > 0, and d(x, y) = 0 <=> x = y.
A less obvious property is:
(3) For any x,y,z £ Rn we have
d(x,y) + d(y,z) > d(x,z).
Property (3) is called the triangle property. A function d : X x X -*H
satisfying (1), (2), and (3) is a metric on the set X.
To prove (3), it is convenient to introduce some related definitions. The
norm of x G Rn is
||z|| = d(3,0) = yjxl + --- + X*.
The function (,):RnxRn^R defined by
(1) {x,y) =x1y1 + ... + a;nyn
is called an inner product We will see that d and (,) are related by
(2) d(ar, y) = y/(x-y,x-y).
112
IV. Inner Product Spaces
Proposition 1 (,) has the following properties:
(i) (,) is symmetric; i.e., (x,y) = (y,x),
(ii) (x,y + z) = (x,y) + (x,z),
(ii') (x + y,z) = (x,z) + (y,z),
(iii) (x, ay) = a(x, y) for a G R,
(iii7) (ax, y) = a(x, y) for a G R,
(iv) ( ,) is positive definite; i.e., (x,x) is always > 0, and
(x,x) = 0<^x = 0 = (0,...,0),
(v) if {ei,..., en} is the standard basis for Rn, then
[el,ej) oy-j0 .f t^j and
(vi) (,) is nondegenerate; i.e., if (x, y) = 0 /or a// y, then x = 0.
Proof: (i), (ii), (ii7), (iii), (m7)? and (iv) are all routine to prove. Note
that (ii) and (iii) together say that (,) is linear in the second variable when
the first variable is held fixed. We have a similar statement from (ii7) and
(iii7). These four together mean that (,) is bilinear.
(v) is obvious. For (vi), suppose (x,y) = 0 for all y. Then, letting y — x
and using (iv), we see that x = 0. I
Now we prove
(3) d(x,y) = yj(x-y,x-y) = \\x - y\\.
Using bilinearity of (,), we have
(x-y,x-y) = (x,x)- (y,x) - (x,y) + (y,y)
= (%i - yi)2 + • • • + {xn - yn)2 = \\x - y\\2
= (d(x,y))*.
Now, using properties of (,), we will prove the Schwarz Inequality. This
will then give the triangle inequality for the metric d.
Proposition 2 (Schwarz Inequality) For any x,y G Rn, we have
(4) (x,y)2 < (x,x)(y,y).
A. Rn as a Model, Bilinear Forms
113
Proof: By property (iv) of (,), for any real number t we have
(x + ty,x + ty) > 0.
Using bilinearity and symmetry, this is
(5) (x,x) + 2(x,y)t+(yiy)t2>0.
This quadratic polynomial in t with real coefficients can have only one
minimum, so (5) shows it cannot have two distinct roots. It follows that
the discriminant cannot be > 0. That is,
(2(x,y))2-4(y,y)(x,x)<0.
This is equivalent to (4) and the Schwarz Inequality is proved. I
Now take x,y,z G Rn. We claim that d(cc, y) + d(y, z) > d(cc, z) holds.
Proof: We apply the Schwarz Inequality to x — y and y — z to get
(6) (x-y,y-z) < ^/(x -y,x-y) y/(y - z,y - z) = d(x, y)d(y, z)
We will show that (6) is equivalent to
(7) (d(x,y) + d(y,z))2>(d(x,z))2.
Rewriting (7) as
(x-y,x-y) + 2d(x, y)d(y, z) + (y - z,y - z) > (x - z,x - z),
and then expanding out and cancelling some terms gives
-2(3, y) + 2(y, y) - 2(y, z) + 2d(s, y)d{y, z) > -2(3, z)
or
- (s, y) + (y, y) - (y, *) + (&, z) + d(z, y)d(y, *) > o
or
d(x, y)d(y, z)>(x-y,y-z)
which is (6). I
Bilinear Forms
Definition Let V be a real vector space. A bilinear form on V is a
function f ;V x V —+R which satisfies
(i) f(u + v, w) = /(w,w) + /(v, w),
(!') /(u, v + u/) - /(u, v) + /(u, w), and
114
IV. Inner Product Spaces
(iii) f(au,v) = af(u,v) = f(u,av) for a G R.
The bilinear form is symmetric if
f(u,v) = f{v,u) for all u,vGV;
and the symmetric bilinear form / is positive definite if
f(v,v) > 0 for v ^0 .
Exercise. Show that /(?;, 0) = 0 = /(0, v) for any v €V.
Examples
(1) The inner product (,) on Rn is a positive definite symmetric
bilinear form.
(2) Given A = (a^) G Mn(R), we define a bilinear form on Rn as
follows:
/an ... aln \
ffay) = xA(ty) = (xll...1xn)
n
— / v aij%iyj -
\ ani
t^nn I
( yi \
\Vn J
The fact that / is bilinear follows from the fact that matrix
multiplication distributes over matrix addition.
We get the inner product of example (1) by taking A — I.
Note that / is symmetric *=> A is a symmetric matrix.
If we choose a basis a = {t?i,... ,t?n} for the finite-dimensional vector
space V, then we can say a great deal about any bilinear form / on V.
First we express u, v G V in terms of this basis (u = x\V\ -\ h xnvni v =
y\V\ + h ynvn) and then we use bilinearity of / to compute
n
f(u,v) = ^T Xiyjfiv^Vj) .
If we set azj = /(t^, Vj), then we see that / is completely determined by
the n2 numbers {a^}. Letting A be the n x n matrix (a^), an equivalent
way to write f(u,v) is
f{u,v) = (xi,...,xn)A
f Vi \
\ Vn J
A. Rn as a Model, Bilinear Forms
115
So once we choose a basis for V, all bilinear forms on V can be written as
in example (2). A is called the matrix representation of / with respect to
the basis {t?i,..., vn}.
A natural question to ask at this point concerns our choice of basis; i.e.,
if we had picked a different basis f3 = {w±,..., wn} for V, how would that
affect the matrix representation of /? We might expect the answer to be
the same as for endomorphisms (namely, that M#(</>) = CMa((f))C~1 for
some nonsingular>matrix C), but this is not quite right. If we let B be
the matrix representation of / with respect to the basis {wi,..., wn}, then
there is a nonsingular matrix C such that
B = CAlC .
A proof of this relationship is sketched in the exercises.
Exercises
(1) One can make Rn x Rn into a vector space by taking for (x, y)
the 2n-tuple (cci,..., xn, yi,..., yn)- Show that if this is done,
then
(,) : Rn x Rn -+ R
is not a linear map.
(2) Show that (x, y) = 0 for all x implies y = 0.
(3) Show that equality holds in the Schwarz Inequality exactly
when x and y are linearly dependent; i.e., (x, y)2 = (x, x) (y, y) ^
there is a real number r such that x = ry.
(4) Prove the following refinement of the Schwarz Inequality: for
any x = (zi,..., xn), y = (yi,..., yn) in Rn, we have
n
\(x,y)\ <^2\xtyi\ < \\x\\ \\y\\ .
(Hint: Observe that 2af3 < a2 + /?2 for any a, f3 G R, and set
a"iwr p~\\vw
Now sum these inequalities over i.)
(5) The rank of a bilinear form / is defined to be the rank of any
matrix representation of /. Show that this definition makes
aenso (i.e., if B = CAlC (C nonsingular), then A and B have
fche same rank),
116
IV. Inner Product Spaces
(6) Given a symmetric bilinear form / on the real vector space V,
the quadratic form associated to / is the function
g-.V-^R
defined by g(v) = f(v, v). Show that / and g are related by
f(u,v) = \(g(u + v)-g(u-v)) .
(7) If we define a quadratic form onF = R^ by
g(xi, X2) = x\ + 3xix2 - 2x\ ,
find the matrix of its associated symmetric bilinear form.
(8) Show that the real matrix A represents a symmetric bilinear
form <=> A is a symmetric matrix.
(9) Prove the relationship B = CAlC (where A, B are matrix
representations of the bilinear form / with respect to the bases
{vi,..., vn}, {wi,..., wn}) by filling in the details of the
following steps.
(i) Define C = (cij) by
n
i=i
and notice that this is just the change-of-basis
matrix occuring in the proof of Proposition 4 (Chapter
ii).
(ii) Express u^G^as
u = x1v1 H h xnvn = x[wi -\ h x'nwn
v = yivi H h ynvn = y[wi -\ h y'nwn
and show that x = (xi,... ,ccn) is equal to x'C =
(si,..., x'n)C. Similarly, y = y'C.
(iii) Use the result of (ii) to show that /(u, v) = xA^y) =
^(CA'CXV). Conclude that B = CAfC.
B. Real Inner Product Spaces,
Normed Vector Spaces
Let V be a real vector space. (V, (,)) is called a real inner product apace if
(,) is a positive definite symmetric bilinear form on V". As in section A, we
B. Real Inner Product Spaces, Normed Vector Spaces
117
define a norm on V by ||v|| = yj(v,v). Note that || ||2 is just the quadratic
form associated to (,). If it is understood that the field in question is R,
we will simply refer to (V, (,)) as an inner product space.
It is worth observing that our proof of the Schwarz Inequality holds for
any inner product space, finite-dimensional or not.
Example
If V is finiterjlimensional, it is easy to define an inner product on V.
Take any basis {v\,... ,vn} f°r V, define (vi,Vj) = Sij, and extend bi-
linearly. If x = x1v1 + • • • + xnvn, y = y1v1 + • • • + ynvn, then we get
(x,y) = X\y\ + • • • + xnyn. This is clearly positive definite, bilinear, and
symmetric.
Example
We will now examine an infinite-dimensional inner product space in
some detail. Let V be the set of all real sequences x = {xl\ such that
Yjx\ exists; i.e., V consists of all square-summable real sequences. We add
sequences and multiply by scalars in the obvious way; to see that the sum
of two square-summable sequences is square-summable, we define
fay) = ^%iVi ,
i=l
and show that this series converges absolutely. By the refinement of the
Schwarz Inequality occuring in the exercises of section A, we have
\xiyi\ H h \xnyn\ <
£*?
£»?
<
A / ^ i \ / j *i — \\ / j i \
E-
i=l
which holds for all n. But then the monotonic sequence of partial sums
Sn = |#i2/i | H 1- |^n2/n| is bounded and so converges.
Now suppose that {x{} and {yt} are square-summable. Then
Efo + y.)2 = E(^2 + 2xtyi+yt2)
and so {xz}-\- {yl} is also square-summable. It is trivial to see that if r G R
and {x{} G V, then {rxl} is square-summable. Hence V is a real vector
space.
Exercise. Show that (,) defined above for square-summable sequences is
positive definite, bilinear, and symmetric.
Thus wo havo an infinite-dimensional vector space with an inner product
on it. It is called f3*tipacr or Hilberi Space.
118
IV. Inner Product Spaces
Definition Let (V, ( ,)) be an inner product space. We say that two
vectors u^GV are orthogonal if (u, v) = 0. A set S of vectors in V is an
orthnormal set if each v G S is a nni£ vector (i.e., ||v|| = 1) and any two
vectors in S are orthogonal.
JScarapZe
The standard basis for Rn is an orthonormal set by (v) of Proposition
1.
Proposition 3 Any nontrivial finite-dimensional inner product space (V,
(,)) has an orthonormal basis.
Proof: First, we choose any basis {vi,...,vn} f°r V, and then we set
v[ = v\ and
Note that v[ and v'2 are orthogonal since
(v[,v>2) = (v^-^v'J
= (v'1,v2)-^(v'1,v'1)=0.
Also note that Vi,vf2 are nonzero (since {vi,v2} is linearly independent)
and v\,v2 are in the span of {v[, v2}.
More generally, define
(Vm,v'm-l) , _ _ (Vm,Vl) /
(^-1,^-1) m"1 '" K^i) *
and note that t?^ is orthogonal to v[,... 1v'm_1. Furthermore, v^ is not
the zero vector and vm G Span(v/1,..., vfm). If we let
Un = TRJI Vn »
then {ixi,... ,wn} is an orthnormal set of vectors which span V. This is
our desired basis. I
The technique used in the preceding proof to generate the orthonormal
basis {ixi,...,un} is called the Gram-Schmidt orthonormalization process.
Definition Let (V, (,)) be an inner product space. A linear map 0 : V —►
V is said to be orthogonal if for all u, v G V we have
(0(u),0(v)) = (u,v) .
Equivalently (when V is finite-dimensional), A G Mn(R) is an
orthogonal matrix if
{xA,yA)** {x,y)
B. Real Inner Product Spaces, Normed Vector Spaces
119
for all x,y e Rn. We denote the set ofnxn orthogonal matrices by 0(n).
Clearly an orthogonal map preserves lengths (= norms) of vectors,
because
\\<Kv)\\2 = {<Kv),4>(v)) = (v,v) = \\v\\*.
Strangely enough, the converse is true.
Proposition 4 If</>:V—>V preserves lengths, then it is orthogonal
o
Proof: Our hypothesis is:
(a) (<t>(v), <t>(v)) = (v,v) for all v.
Our conclusion is:
(b) (<l>(u),(/)(v)) = (u,v) for all u,v.
Statement (b) appears to be more general. But we will prove that
(a) => (b) by "polarizing the identity (a)" (a procedure we will also use in
our study of normed algebras).
We apply (a) to the vector u + v:
(cj)(u + v),</)(u + v)) = (u + v, u 4- v) = (u, v) + 2(u, v) + (v, v)
= {cj>{u),cj>{u)) + 2(u, v) + <0(t;), 0(t;)>,
and note that also
(<l>(u + v),<l>(u + v)) = (<l>(u)+<l>(v),<l>(u) + <l>(v))
= (d>(u), d>(u)) + 2{cj>{u)^{v)) + {cj>{v),cj>{v))
proving (b). I
We have discussed two different concepts of orthogonality (namely,
orthogonal maps and orthogonal vectors). These notions are quite closely
related.
Given the matrix A G Mn(R), note that Ri = eiA G Rn is the ith
row of A. If A is orthogonal, then (Ri,Rj) = (eiA,ejA) = {e^ej) = •
6ij. Said differently, the rows of A form an orthonormal basis for Rn.
(Exercise. Show that if S C Rn is any orthonormal set, then S is linearly
independent.)
The converse is also true (namely, that a matrix whose rows form an
orthonormal basis for Rn is orthogonal), but a proof of this fact will have
to wait until we know more about orthogonal matrices.
Normed Vector Spaces
We saw in section A that an inner produce (,) on Rn defines a norm
by i
(8) HxHv'M.
120
IV. Inner Product Spaces
Similarly, given the inner product space (V, (,)), we can define a norm on
V by (8). Notice that this norm inherits some properties from ( ,). For
example, for r G R and v G V, we have
(9) ||rv|| = yj(rv,rv) = Vr2 \/(v,v) = \r\ \\v\\ .
Also, positive definiteness of (,) tells us that
(10) IMI>0, and ||t;||=0^t; = 0.
Finally, the Schwarz Inequality gives
(ii) i|w+u|I<iHi + ihi
for any u, v G V. (To see this, square both sides to get
\\u + v\\2 = (u + v, u + v) = \\u\\2 + \\v\\2 + 2(<u, v) and
( ll«ll + ll^ll)2 = ll~ll2 + ll^ll2 + 2||~|| ||V|| ,
and note that the Schwarz Inequality is just \(u, v)\ < \\u\\ \\v\\.)
Definition Let V be a real vector space and suppose there is a function
|| || : V -* R satisfying properties (9), (10), and (11). Then we call (V, || ||)
a normed vector space.
Note that || || is a continuous function on V\ so if we have a sequence
of vectors {vn} converging to the vector v, then the sequence of numbers
{||vn||} converges to ||v||.
We have shown that every inner product space yields a normed space by
using (8). Conversely, given a norm on V, can we define an inner product
which will give us back our norm (via (8))? Unfortunately, the answer is
"no" in general; however, we do have the following
Lemma Let (V, || ||) be a normed vector space. Then || || comes from an
inner product <^> the norm satisfies the parallelogram law
\\u + u||2 - ||u - u||2 - 2( \\u\\2 + IMI2) for all u, v G V .
Proof: => Exercise.
<= If there is an inner product which induces || ||, then it must satisfy
the polar form
(12) <u,«> = i(||u + i,||2-||u-t;||2).
(This is just exercise (6) from the previous section.)
With this in mind, we define a function on V x V by (12) and show
that this is, indeed, an inner product which induces || ||.
It is clear that (,) is symmetric, positive definite, and that it gives back
|| || since
<«,«> = i(||t; + i;||a-||0||a) = J(2a|M|a) = H|a.
B. Real Inner Product Spaces, Normed Vector Spaces
121
Now we show that
(13) (u + v,w) = (u, w) + (v, w)
(and hence we get (u,v + w) = (u,v) + (%w) by symmetry). Expanding
the right hand side of (13) by using (12) gives
(u,w) + (v,w) = J(||u + w\\2 - \\u - w\\2) + j(||t; + w\\2 - ||v - w\\2)
^= \(\\U + U)\\2 + \\V + -L^H2) - l(\\u ~ U)\\2 + \\V - W\\2)
= l(\\u + w + v + w\\2 -\\u + w — v- w\\2)
— |(||w — w + v — w\\2 — \\u — w — v + w||2)
= U\\u + v + 2w\\2-\\u + v-2w\\2)
(u,w) + (v,w) = \(u + v,2w)
where the third equality holds by the parallelogram law, and the last holds
by (12). Setting v = 0 in this last equation yields
(14) (u,w) = \(u,2w) ,
and substituting u + v for u in (14) gives (13).
To show bilinearity of (,), we still must show that
(15) * (ru,v) = r(u,v) for any r £ R.
First suppose r £ N = {1,2,...} and observe that
(ru, v) = (u-\ + u , v) = (u, v) H + (u,v) — r(u, v) ,
v v ' v v '
r r
by using (13) repeatedly. Furthermore, note that
0 = (u + (—u), v) = (u, v) + {—% v)
so that (—u,v) — —(u,v). Hence (15) holds when r is an integer.
Now suppose that r = £ is any rational number (p, q integers with
qj^O). Then
<fu,i;> =p(±u,v) = (f) g^ti,*) = f^ti),*) = § <u,t;>
and so (15) holds for the rationals.
Finally, choose? any r € R and let {ri} be a sequence of rational numbers
converging to f. (Wo know such a sequence exists since the rationals are
122
IV. Inner Product Spaces
dense in the reals.) Note that {riu} is a sequence of vectors converging to
ru. Then we have
(ru, v) = (lim (ryu), v) = lim (r^, v)
= (lim Vi) (u, v) = r{u, v) .
Note that we can "pull the limit out" of (,) since || || is continuous. |
Now we consider an example of a normed vector space (V, || ||) in which
|| || does not come from any inner product on V.
Example
Let V = R2 and define || || by
||(xi,x2)|| = |o?i| + \x2\ .
It is left as an exercise to show that this is actually a norm on R .
If we let x = (1,0) andy = (0,1), then we see that ||a?+y||2- ||#-y||2 =
(|1| + |1|)2 - (|1| + | - 1|)2 = 0 whereas 2(||*||2 + ||y||2) = 2(|1| + |1|) = 4.
So this norm does not satisfy the parallelogram law.
Exercises
(1) Given any set of vectors W from the inner product space (V, (,)),
we define the orthogonal complement of W to be all vectors in
V which are orthogonal to every w € W; i.e., WL — {v G
V | (v, w) = 0 for all w G W}. Show that WL is a subspace of
V. If V is finite-dimensional and if W is a subspace of V, show
that V^ W® W±.
(2) Let W be the subspace of R3 spanned by u = (1,1,0) and
v = (1,0, —2). Find orthonormal bases for W and WL. Verify
that the union of these two bases is an orthonormal basis for
R».
(3) Let <j) be an orthogonal endomorphism on the finite-dimensional
inner product space (V, (,)), and suppose that W is a </>-stable
subspace of V. Show that
(i) <j>\w is orthogonal, and
(ii) WL is </>-stable. (Hint: <j) is nonsingular — a fact
we will prove later — so 0(W) = W.)
(4) Show that || || : R2 -> R defined by ||(xi,x2)|| = |a?i| + \x2\ is
a norm.
(5) Draw a picture in R^ to justify the name "parallelogram law,"
C. Complex Inner Product Spaces
123
(6) For V = Rn, show that the following define norms on Rn.
(i) ||(a:i,...,xn)\\p = (\Xl\* + • • • + |ajn|p)*, where 1 <
p < oo.
(ii) ||(a?i,...,a;n)||00 = max{|xi|,..., \xn\}.
Can you decide which of these norms comes from an inner
product?
(7) Let V be the infinite-dimensional vector space of real-valued
continuous functions on [—7r, tt]. Show that
(f,g)= f" f(t)g(t)dt
J—7T
defines an inner product on V. What is ||/|| if f(t) = t + 3?
Show that {1, sin t, sin(2t),..., cos t, cos(2t),...} is an
orthogonal subset of V.
C. Complex Inner Product Spaces
A complex vector space is simply a vector space whose scalar field is C.
An inner product on a complex vector space is much the same as the real
inner products we have already studied; however, there is one important
difference from the real case which arises because conjugation is a nontrivial
operation on C.
First we define a Hermitian inner product (,) on Cn: for x = (ai,...,
«n),2/ = (/?i,...,/?n)GCnwelet
(x,y) =aiJ3i-\ han/3n.
Proposition 5 (,) has the following properties.
(i) (,) is Hermitian symmetric; i.e., (x,y) = (y,x),
(ii) (x + y,z) = (x,z) + (y,z),
(ii') (x,y + z) = {x,y) + {x,z),
(iii) (ax,y) = a(x,y),
(Hi') (x,ay) = a(x,y),
(iv) (x,y) = (x,y),
(v) (,) is a positive definite; i.e., (x,x) is a real number > 0,
and {», x) » 0 «» x = (0,..., 0),
124
IV. Inner Product Spaces
(vi) (ei,ej) = 6ij where {ei,...,en} is the standard basis for
Cn, and
(vii) (,) is nondegenerate.
Proof: Exercise. (Hint for (v): Recall that if a = a + bi G C, then
aa = a2 + b2 G R.) I
Properties (ii) and (iii) say together that (,) is linear in the first variable,
while (ii') and (iii') say that (,) is conjugate linear in the second variable.
So our inner product on Cn is said to be Hermitian linear.
We could have defined our inner product on Cn by formula (1), and
then it would still be bilinear. Unfortunately, our inner product would no
longer be positive definite because aa is not always a real number if a G C.
Since positive definiteness is crucial for defining norms, we are willing to
sacrifice bilinearity.
Definition Let Fbea complex vector space. A Hermitian form on V is
a function / : V x V —► C which is Hermitian symmetric and Hermitian
linear; i.e.,
f(u,v) = f(v,u) , and
f(au + 0v, w) = a/(u, w) + /3f(v, w)
where u,v,w eV and a,/? G C.
If V is a complex vector space, then (V, (,)) is called a complex inner
product space if ( ,) is a positive definite Hermitian form. The Schwarz
Inequality still holds in complex inner product spaces, but we need to
modify its statement and proof for the present case.
Proposition 6 (Schwarz Inequality) For any u,v in the complex inner
product space (V, (,)), we have
\{u,v)\2 < (u,u)(v,v) .
Proof: We will use the fact that aa = \a\2.
First, observe that if (u, v) is real, then our previous proof works without
any problems.
If a = (u,v) is not real, then we have
u 1 (u,v)
(-,V) = -{U,V) = r = 1
a a (w,v)
so that (^,v) is real. Thus we have
a a a aa
so that aa = (u}u)(v}v). Since a = (u,t>), we have our result. I
C. Complex Inner Product Spaces
125
Given two vectors u, v from the complex inner product space (V, (,)),
we say they are orthogonal if (u^v) = 0 (just as we did in the real case).
The Gram-Schmidt process works exactly the same, too, so every finite-
dimensional complex inner product space has an orthonormal basis.
We define a norm on V by
(16) |M| = y/{^) .
O
It is left as an exercise to show that this norm inherits the same properties
from (,) as in the real case; i.e.,
(17) \\av\\ = |a| \\v\\ for a € C , v G V ,
(18) N|>0, and ||v|| = 0 O v = 0 , and
(19) ||w + v|| < Ml + H .
Hence the definition of a (complex) normed vector space is exactly the
same as before. Furthermore, it is still true that a norm on V comes from
an inner product (via (16)) O the norm satisfies the parallelogram law.
Definition Let (V, (,)) be a complex inner product space. A linear map
(j): V —► V is unitary if for any u, v G V we have
{(j){u),(f){v)) = (u,v).
Equivalently (when V is finite-dimensional), A G Mn(C) is a unitary matrix
if
(xA,yA) = (x,y)
for all x,y G Cn. We denote the set of n x n unitary matrices by U(n).
Proposition 7 U(n) is a group under matrix multiplication.
Proof: Matrix multiplication is associative, and clearly the identity matrix
is unitary. If A, B G U(ri), then (xAB,yAB) = (xA,yA) = (x,y), showing
that AB is also unitary.
Note that if A = (al3) G Mn(C), then we define the conjugate of A to
be
A = (ai:j) .
Since the operations of conjugation and taking transposes commute, the
symbol lA is unambiguous. It remains to prove that if A is unitary, then A
is an isomorphism and A"1 is also unitary. For this, and other purposes,
the following lemma is important.
Lemma 1 For A £ Mn(C) and x,y G Cn we have
(a?i4,y) = {x,y%A) .
126
IV. Inner Product Spaces
Proof: Let A = (a^) and calculate
xAj= (xiau H h xnanll..., xxaln H h xnann)
y *A = (yian H h ynain,..., yiani H 1- 2/nflnn) •
Thus we see that (xA, y) is equal to
)2/i H 1- (ziain H h xnann)yn ,
whereas the (x,y *A) is
#i0ii2/i H 1- aint/n) H 1- Snfanif/i H h annyn).
Inspection shows these two expressions contain exactly the same terms.
i
Corollary For A G Mn(R) and a?,y G Rn, Lemma 1 gives (xA,y) =
Continuing the proof of Proposition 7, we use Lemma 1 to show that
A G Mn(C) is unitary «=> *A = A-1.
By Lemma 1, (eiA^ejA) = (e*, (e^A) *A). Thus
(e*A, e^ A) = (e», e,) = <% <£> A *A = J.
So if A is unitary, then A-1 exists and is equal to tA. Furthermore,
(#A~~1,yA~"1) = (xA~1A,?/A"~1A) = (x^y). This shows A-1 is also unitary
and completes the proof that U(n) is a group. I
In the course of proving Proposition 7, we also showed that the rows of
a unitary matrix form an orthonormal basis for Cn.
Next we want to prove a result for unitary matrices like we did for
orthogonal matrices.
Proposition 8 If A G Mn(C) and A preserves lengths, then A is unitary.
Proof: To prove this theorem, the following lemma will be useful.
Lemma 2 If A preserves lengths, then
(etA,ejA) = -(ejA,e»A) for i^j.
Proof :
((e< + Cj)-A,(e< + ej)-A) = (ei + ej,ei + ej) = (e^e*) + (e^ty) » 2.
C. Complex Inner Product Spaces
127
But the left hand side also equals
(e» A + ej A, e{A + ejA) = (e»A, e* A) + (e* A, e. A) + (e. A, e*A) + (e3- A, e^-A)
= 2 + (e^A, e^A) + (e^-A, e»A).
This proves Lemma 2. I
Now we can prove Proposition 8. For i ^ j, let
u
a: = aei + /Jej
where a,/3 e C are not yet specified. Then
(20) (x, x> = (aei + /%, ae* + /Je^) = aa + 0 4- 0 + /3/3.
But, by Lemma 2, we also have
(x, x) — (xA, xA) = (aeiA + /?ej A, ae^A + /3ejA)
/91 x = aa(e*A, e»A) + a^e^A, e, A) + /?a(ej-A, e*A)
1 j +PP{e5A,e5A)
= aa + /3/3 + (a/3 - /3a) (e» A, e, A) .
Comparing (20) and (21) gives
(a^-/3a)(eiA,eiA) =0.
Letting a = i and /3 = 1 gives a/3 — /3a = 2i, and hence
(e» A, ey A) = 0 fori^j.
So for a: = xiei H h xnen? 2/ = 2/iei H h ynen we have
(xA, yA> = ((xiei H h xnen)A, (y1e1 H h ynen)A)
proving that A is unitary. I
Exercises
(1) Show that the norm defined by (16) inherits properties (17)-
(19) from the Hermitian inner product (,).
(2) Prove that a norm on a complex vector space comes from a
Hermitian inner product (via (16)) <& the norm satisfies the
parallelogram law. Give an example of a norm on C2 which
dofi not come from any inner product.
128
IV. Inner Product Spaces
(3) Let V be the complex vector space of all continuous complex-
valued functions on [0,1], Show that
</,»> = / f(tW)dt
Jo
defines an inner product on V. What is the norm of elt =
cos t + i sin tl
(4) Given A G Mn(C), show that
*(A) = (U).
(And so *A is unambiguous.)
(5) Let (V, ( ,)) be a complex inner product space, and suppose
</> G End(V') is unitary. If W is a 0-stable subspace of V, show
that W-1 is also 0-stable.
(6) The matrix A G Mn(C) is called Hermitian if
(Note that the set of Hermitian matrices contains the set of
symmetric matrices as a subset.) Show that the eigenvalues of
a Hermitian matrix are real numbers. (Hint: Consider (xA, x)
where x is an eigenvector for A.)
(7) The matrix A G Mn(C) is called skew-Hermitian if
tA = -A.
Show that the eigenvalues of a skew -Hermitian matrix are
purely imaginary. (We say a G C is purely imaginary if a = bi
where b G R.)
(8) The matrix A G Mn(C) is called positive definite if there is a
nonsingular matrix B G Mn(C) such that
A = BtB .
Show that the eigenvalues of a positive definite matrix are real
and positive. (Hint: Consider (xB,xB) where x is an
eigenvector for A.) Note that, in particular, this holds for A G Mn(R).
(9) Given the Hermitian matrix A G Mn(C), show that the
function / : Cn x Cn -» C defined by
f(x)y)^xA(1y)
D. Orthogonal and Unitary Groups
129
is a Hermitian form on Cn. Conversely, given the basis {?;i,...,
vn} for the complex vector space V, and given the Hermitian
form / on V, we get a matrix A — (a^) associated to / as above
(i.e., dij = f(vi,Vj)). Show that A is a Hermitian matrix. How
does A change if we choose a different basis for VI
For exercises (10)-(14), let (V, ( ,)) be an inner product space (real or
complex). 0
(10) Show that each v G V determines a functional </>v G V* defined
by
(j)v(w) = (w,v) .
However, i\)v defined by ipv(w) — (v,w) is not linear unless V
is a real vector space.
(11) If V is finite-dimensional and if <j> G V* is any functional, show
that there is a unique v G V such that <j>(w) = (w,v) for all
w G V.
(12) Suppose V is finite-dimensional and let ip G End(V). Use
exercise (11) to show that there is a unique endomorphism rp*
of V such that
(il>(u),v) = (u,il>*(v))
for all u, v G V. We call ip* the adjoint of ip. (Hint: Fix v ^V
so that (f) defined by </>(u) = (ip(u),v) is a functional on V.
Then there is a unique v' G V such that 0(u) = (u, v') for all
u eV. Define ip* by
^*(^) = ^.)
(13) Continuing exercise (12), choose an orthonormal basis a for V.
Show that if A = Ma(^), then *A = Ma(^*).
(14) Show that A G Mn(C) represents a unitary map (j) G End(V')
(with respect to an orthonormal basis of V) <=>• A is a unitary
matrix. State and prove an analogous result for orthogonal
maps/matrices.
(Thus we see that if (j) is orthogonal or unitary, then it is
nonsingular.)
D. Orthogonal and Unitary Groups
Recall that 0{n) is the sot of n x n orthogonal matrices. Because of the
corollary to Lemma I, our proof that U(n) is a group (whose operation is
130
IV. Inner Product Spaces
matrix multiplication) carries over almost verbatim to show that 0(n) is
also a multiplicative group. In fact, we can think of 0(n) as a subgroup of
U(n) where the inclusion is induced by the natural inclusion of R into C.
Proposition 9 For A £ Mn(H), the following statements are equivalent:
(i) A is orthogonal,
(ii) (eiA,ejA) = %
(iii) A maps an orthonormal basis (of Hn) to an orthonormal
basis,
(iv) the rows of A form an orthonormal basis
(v) the columns of A form an orthonormal basis, and
(vi) *A = A-1.
Proof: Exercise. I
In Proposition 9, if we make the obvious adjustments, then we get
an analogous characterization of unitary matrices. From the equivalence of
(iv) and (v), we conclude that A is orthogonal (unitary) &> *A is orthogonal
(unitary).
Proposition 10 If A € U(ri), then detA is a complex number of unit
ength. If A € 0(ri), then detA G {1, -1}.
Proof: For A unitary, we know that A-1 — tA. Thus
1 = det (J) = det{AA~x) = det Adet( *A) = detA detA
so that | det A\ = Vdet AdelTA = 1.
For A orthogonal, we know that A is also unitary (by the inclusion
0(n) C U(n)) so that | det A\ = 1. But det A is a real number, so we must
have det A = 1 or det A = -1. I
Definition The special orthogonal group is
SO{n) = {A e 0{n) | det A = 1}
(also called the rotation group). Similarly, the special unitary group is
SU(n) = {Ae U(n) | det A = 1} .
The fact that these are, indeed, groups is an easy exercise.
Definition We say that the matrix A G Mn(R) is skew-symmetric if
%A = -A.
D. Orthogonal and Unitary Groups
131
The matrix A G Mn(C) is skew-Hermitian if
*A = -A .
The set of all n x n skew-symmetric matrices is denoted by o{n), and the
skew-Hermitian matrices are denoted by u{n).
Note that o{n) and u{n) are not multiplicative groups. For example, if
then A is skew-symmetric, but A2 = I 1 J is not. However, o{n)
is a subspace of the n2-dimensional real vector space Mn(R). (Exercise.
Show this.)
It is not true that u{n) is a subspace of the complex vector space Mn(C)
(since a complex multiple of a skew-Hermitian matrix need not be skew-
Hermitian); but if we think of Mn(C) as a (2n)2-dimensional real vector
space, then u{n) is a (real) subspace. For example, if
then A is skew-Hermitian, and so is rA for any r G R, but
is not skew-Hermitian.
Proposition 11 The dimension of o(n) is n^n~ \ and the dimension of
u{n) is n2.
Proof: Exercise. (Hint: Bases for both spaces are easy to write down.)
i
Recall the definition of a Lie Algebra from Section IIB (i.e., an algebra
whose multiplication is anticommutative and satisfies the Jacobi identity).
We make the real vector spaces o(n) and u(n) into Lie Algebras by defining
[A, B] = AB-BA,
where AB denotes matrix multiplication. To see that [A,B] is skew-
symmetric whenever A, B € o(n), we observe that
*[A,B] = t{AB-BA)=\AB)-\BA)
= tBtA- WB = -(*A*B -lB *A)
- -((-A)(-B)-(-B)(-A)) = -[A,B\.
132
IV. Inner Product Spaces
Similarly, [A,B] E u(n) whenever 4,BG u{ri).
The groups 0{ri), SO(n), f/(n), and SU(n) are examples of Lie Groups.
(The definition of a Lie Group would take us too far afield, but we simply
remark that they are important in various branches of mathematics.) Now
o(n) is the Lie Algebra associated to the Lie Group 0(n) (and also to
SO(n)), and u{ri) is the Lie Algebra associated to U(n). The Lie Algebra
associated to SU(n) is
su(n) — {A E u(n) | trace A = 0} .
The idea of associating a Lie Algebra to a Lie Group is akin to
associating /' to / in calculus. We do this because linear objects are easier to
study than nonlinear ones, but we still get useful information about the
associated nonlinear objects.
Exercises
(1) State and prove a proposition (analogous to Proposition 9) for
unitary matrices.
(2) Show that if A = (a^) is an orthogonal or unitary matrix which
is triangular, than A is diagonal.
(3) Show that SO(n) and SU(n) are, indeed, subgroups of 0{n)
and f/(n), respectively.
(4) Show that SO(2) = [ ( _^f^ ™ * \ I 0 E r|; also, show
that J7(l) = {eie = cos0 + i sin# | 0 E R}. Now show that the
map <$> : 50(2) - U(l) denned by cf> ( _C^ ™* ) = e"
is a group isomorphism (i.e., </>(AB) = </>(A)</>(B), and there is a
map ^ : 17(1) -► SO(2) such that ^(e^e*') - ^(^)^(^2)
and ip o 0, 0 o t/j are identity maps). (Hint: Use the standard
trigonometric identities to show that eieiel°2 = el(ei+°2\)
(5) Show that if A E o{ri) (A E w(n)), then the diagonal elements
of A are zero (purely imaginary).
(6) Show how we can think of Mn(C) as a (2n)2-dimensional real
vector space. (Recall that we already did this for n = 1.) Now
show that u(n) is a subspace.
(7) Show that su(n) is a Lie Algebra and that its dimension as a
(real) vector space is n2 - 1.
E. Stable Subspaces for Unitary and Orthogonal Groups
133
(8) The special linear group is the set
SL{n, R) = {A G Mn(R) | det A = 1}
whose group operation is matrix multiplication. Show that this
is, indeed, a group and find a matrix A G SL(n, R) — 0(ri).
(9) Show^that
sl(n, R) = {Ae Mn(R) | trace A = 0}
is a Lie Algebra. (This is the Lie Algebra associated to the Lie
Group SX(n,R).)
Curves in Matrix Groups.
A curve in the real normed vector space V is a continuous map 7 :
'vfc ~f~ h) — /"y(c)
(a, b) —» V. For c G (a, 6), we say 7 is differentiable at c if lim -
exists, and we denote this limit by 7;(c).
Given the subset W C V, we say that 7 is a curve in W if j(t) G W for
every t G (a, 6). If G is a multiplicative group in V (e.g., V = Mn(R) or
Mn(C), and G = O(n) or C/(n), and if 7, a are two curves in G, then we
define their product curve in G by
(1a)(t)=1(t)a(t).
(10) If 7 and a are both differentiable at c G (a, 6), show that 7a is
also differentiable at c, and
(jcry(c)=j(c)a'(c)+j'(c)a(c).
(11) Let /? be a differentiable curve (3 : (—a, a) —► O(n) with (3(0) =
I. Show that /?'(()) is skew-symmetric. For (3 : (—a, a) —► f/(n)
with /?(0) = /, show that /?'(0) is skew-Hermitian.
(Exercise (11) should provide a hint as to how we associate a Lie Algebra
to a Lie Group.)
E. Stable Subspaces for Unitary and
Orthogonal Groups
We begin with two easy observations about the unitary matrix A e U(n).
(1) If th® aubapace W C Cn is A-stabk, then A\w is unitary.
134
IV. Inner Product Spaces
(2) If W CCn is A-stable, then the orthogonal complement of W
WL = {z e Cn | (z,y) = 0 for all yeW}
is also A-stable.
Proof: Note that A\w € End(W) since W is A-stable. Also
(xA,yA) = (x,y)
for all x,y € W so that A\w is unitary on W.
Now choose z € W1- and note that for any y 6 W we have
(zA,y) = (z,y'A) = (z,j/A-1).
But A is an isomorphism of W onto W so yA""1 € W. Thus (zA,y) = 0
for all y € W, showing that 2A € W-1-. I
Proposition 12 Given A € U{n), there exists an orthonormal basis for
Cn consisting of eigenvectors for A.
Proof: We show this by induction on n. For n = 1, the result is trivial; so
suppose n > 1. Since the characteristic polynomial of A factors completely,
A has at least one eigenvalue and hence an eigenvector v\. Then
1
IMI
is a unit vector.
Let W = Span(ui), so that Cn = ^® W1- by exercise (1) of Section
IV B. Note that dim WL = n — 1 (since dim W = 1), and A|wX is unitary
on W^ (by our observations). By induction, W^ has an orthonormal basis
{iX2, • • •, un} of eigenvectors for A\w±.
Thus {u\,..., wn} is the desired basis. I
Corollary A G 17(ra) is diagonalizable.
Exercise. Given A G U{n), show that there is a B G 17(n) such that
BAB"1 is diagonal.
It is tempting to assert that Proposition 12 should also hold for A G
O(n) since the observations (1) and (2) still hold for an orthogonal matrix.
But we have seen that A G 0(n) need not have an eigenvalue (in R). For
example,
*-(-? ;H2)
is not diagonalizable.
However, if A € Mn(R) is symmetric, then our observations still hold.
E. Stable Subspaces for Unitary and Orthogonal Groups
135
Exercise. State and prove appropriate versions of observations (1) and (2)
for a real symmetric matrix.
So if we can show that a symmetric matrix always has an
eigenvector, then the proof of Proposition 12 can easily be adjusted to prove the
following.
Proposition 13 If A £ Mn(R) is symmetric, then there is an orthonor-
mal basis for R/i> consisting of eigenvectors for A.
We show that a symmetric matrix has an eigenvector in the next two
propositions.
Proposition 14 Let A £ Mn(R) be symmetric and suppose A £ C is an
eigenvalue of A. (Thinking of A as an element of Mn(C), we know such a
X exists.) Then A £ R.
Proof: Let x £ Cn be an eigenvector for A corresponding to A. Using the
standard inner product on Cn, we see that
A(x, x) = (Ax, x) = (xA, x) = (cc, xlA) .
But A is real and symmetric, so lA = A. Thus
(x, x *A) = (x, xA) = (x, Xx) = A(cc, x)
and we see that
A(x, x) = X(x,x) .
Since x is an eigenvector, it is nonzero and so (x, x) ^ 0. Hence
A = A
implying that A is real. I
Even though this proposition shows that the eigenvalues of A are all
real, we cannot immediately conclude that the corresponding eigenvectors
are in Rn (because we had to work in Cn).
Proposition 15 The symmetric matrix A has an eigenvector in Rn.
Proof: Let A £ R be an eigenvalue of A and let z £ Cn be a corresponding
eigenvector. We can write z uniquely as
z — x -f iy
where x,y € Rn. (Exercise. Show this.) And so
Xx 4- iXy = Xz = zA = xA + iyA .
136
IV. Inner Product Spaces
Since A is real and since the real and imaginary parts of a vector are unique,
we must have
xA = Xx and yA = Xy .
But z is nonzero, so one of x or y must be nonzero. I
Exercise. Given the symmetric matrix A G Mn(R), show that there is a
B G 0(n) such that BAB"1 is diagonal.
Now we return to orthogonal matrices and show that even though
A=(cos9 -sin0\
\ sin^ cos0 J V '
is not diagonalizable (unless 0 = kit for k G Z), this is about the worst
case. That is
Proposition 16 Given A G 0(n), Rn contains A-stable subspaces of
dimensions one and/or two such that Rn is the internal direct sum of these
subspaces.
Proof: Again, we use induction on n. For n = 1, there is nothing to prove;
so assume n > 1.
Lemma There exists an A-stable subspace W C Rn with dimW G {1,2}.
Proof: B = A + M. is symmetric, so B has an eigenvector x G Rn (by
Proposition 15). Let A G R be the corresponding eigenvalue, and consider
the two vectors x, xA.
If x and xA are linearly dependent, then xA = rx for some r G R.
Thus
W = {tx 11 G R}
is A-stable and dim W = 1.
If x and xA are linearly independent, let
W = Span(x,xA) .
Note that
Xx — xB = xA + x lA = xA + xA'1
and so
XJ\. ~==- \XJ\. X •
Now choose w G W and write this as w = ax + bxA for some a, b G R.
But then
wA = (ax + focA)A = ax A + 6xA2
= axA + b(XxA - x) = (-&)# + (a + 6A)xA € W*
so that W is A-stabl©.
E. Stable Subspaces for Unitary and Orthogonal Groups
137
This proves the lemma. I
Returning to the proof of Proposition 16, we let W be an A-stable
subspace as in the lemma, and note that Rn = W ® W±.
By our observations, A\w± is orthogonal on W1- (which is of dimension
< n); so by induction, WL can be written as the direct sum of A-stable
subspaces of dimensions one and/or two. I
Given A G 0(n), Proposition 16 allows us to restrict our attention to
the A-stable subspace W C Rn which is of dimension one or two. So by
Proposition 13 of Section HIE, we only have to look at 1 x 1 and 2x2
orthogonal matrices.
A G O(l) must be (1) or (-1) since | detA\ = 1. What about A G 0(2)?
Exercise. Show that A G 0(2) can be written as
cos#
sin#
— sin#
cos#
)«(
— cos 9 sin 9
sin 9 cos 9
for some 9 G R.
Note that A
— cos9 sm# \ . ,. Tii. ..i
. n /i is diagonahzable since its charac-
sm# cos 9 J °
teristic polynomial is x2 — 1. (Exercise. Show this.) So A is similar to
(J -?)■
Combining Proposition 16 with these observations gives the following.
Corollary Given A G 0(ri), there is a nonsingular matrix C such that
CAC~X is a block diagonal matrix of the form
(22)
where M\
Mk =
I, M2
cos Ok — sin 9k
sin Ok cos Ok
V 0
-I,M3
Mo
M.
0 \
Mk J
cos 9s — sin 9s
sin 9s cos 9s
(1) Show that ( C0Sf
kn for k € Z.
Exercises
-sin0
CGS0
)'
is not diagonalizable unless 9 =
138
IV. Inner Product Spaces
(2) Show that
2
1
7?
-1
3^2
1
3
0
4
3\/2
2
A
V2
-1
3^
is orthogonal and find a matrix of the form (22) which is similar
to A
(3) A G Mn(C) is said to be a normal matrix if A fA = lAA (or
A G Mn(R) is normal if A1 A = MA). Note that Hermitian and
unitary matrices are special cases of normal matrices. Show
that
A~ \ 1 3 + 2* )
is normal.
(4) Suppose A G Mn(C) is normal. Show that
(i) A is Hermitian o the eigenvalues of A are real,
and
(ii) A is unitary o the eigenvlaues of A are all of
absolute value 1.
(5) Given A G Mn(C) and an A-stable subspace W C Cn, show
that W-1 is ^-stable.
(6) Let A G Mn(C) be normal. Show that
(i) ^ = 0^^1 = 0,
(ii) A — XI is normal for any A G C, and
(iii) if xA = Ax, then xtA = \x.
(7) If A G Mn(C) is normal, show that there is an orthonormal
basis for Cn consisting of eigenvectors for A. (Hint: Use exercises
(5) and (6) to prove appropriate versions of the observations
(1) and (2).)
(8) Show that any endomorphism of a finite-dimensional complex
inner product space can be represented by a triangular matrix.
(9) Suppose the nxn matrix A is normal. If x and y are
eigenvectors belonging to distinct eigenvalues of A, show that x and y
are orthogonal.
E. Stable Subspaces for Unitary and Orthogonal Groups
139
(10) Recall that GL(n,R) = {A e Mn(R) | det >1 ^ 0}. If P(n)
denotes the set oinxn positive definite matrices (see exercise
(8) of Section IV C), show that
GL{n,R) = 0(n) x P(n) .
That is, given A G GX(n,R), there exist B G 0(n) and C G
P(n)Dsuch that
A = BC.
Furthermore, B and C are unique. (Hint: TV — A lA is
symmetric, so there is an orthogonal matrix Q such that QNQ~X
is diagonal; but the diagonal elements are positive since N is
positive definite. Thus there is a (real) diagonal matrix S such
that S2 = QNQ-1. Let C = Q~lSQ.)
(11) Show that GL{n, C) = U(n) x P(n).
Ill I I I i. I Hi I I ||i< |.. mil iiiIMM M, HP i »l> II «||l||li lill'n ill . liiiUlli. PlUli I HUM INllUll |i||ilillM I'MBwin HHtfliiu ii'f i N if ffS^f|WHHi^WWWiWljlHWtltef,.f||l1P4l(M|*!ffll|iNhilHf ' I I
Chapter V
Normed Algebras
A. The Normed Algebras R and C
We have seen that normed vector spaces are the same as inner product
spaces precisely when the norm satisfies the parallelogram law.
We now begin our search for normed algebras. Let A be a finite-
dimensional real vector space and choose a basis {t?i,..., vn} for A.
Suppose that A has a multiplication which makes it an algebra (which is not
necessarily commutative or associative) with unit element v\ = 1.
If we define an inner product on A by (vi^Vj) = <5;j, then we get the
associated norm on A as usual; i.e.
We say that A is a normed algebra if
(1) \\uv\\ = |H| \\v\\ for all u,v e A .
Note that unlike the case of a normed vector space, we require the norm
on an algebra to come from an inner product.
We begin with the observation that R is a normed algebra. Its norm is
||z|| = y/x? and we easily check that we do have \\xy\\ = \\x\\ \\y\\. So R is
our first example.
Next we claim that the vector space R2 with its usual norm ||x|| =
\/x\ + x\ can be made into a normed algebra by an appropriate definition
of multiplication. Indeed, the multiplication will be unique in the following
sense. We consider
R={(x,0)}CR2
and we insist that our multiplication on R2 extend that on R; i.e., (a, 0)(ft,0)
= (aft, 0), and (1,0) is the unit element for the multiplication on R2. Then
there in only on© such multiplication making R2 into a normed algebra.
142
V. Normed Algebras
To see this, we take the standard basis {(1,0), (0,1)} for R^ and denote
these vectors by
1 = (1,0) and i = (0,1).
Then any (a, b) G R can be uniquely written as al + bi or, since 1 is
the multiplicative identity, as a + bi. We always want multiplication to
distribute over addition so we must have
(2) (a + bi)(c + di) = ac + bci + adi + bdi2 = ac+ (be + ad)i + bdi2
so we just need to define i2 so that (1) is satisfied. We have ||i|| =
Vl2 + 02 = 1 so that ||i|| ||i|| = ||i2|| = 1 by (1). This shows we can
write i2 = u + wi with w, w G R and u2 + w2 = 1. Then
1 = ||i3||2 = || (^ + wi)i\\2 = \\ui + w(u + wi)\\2
= \\uw + (u + w2)i\\2 = u2w2 + u2 + 2wif;2 + if;4 ,
and putting in u2 = 1 — w2 gives
1 = (l-w2)w2+ (l-w2) + 2uw2+wA
= it;2 - w4 + 1 - w2 + 2wk;2 + tf;4
= 1 + 2uw2 .
Thus we must have 2m*;2 = 0 so that u — 0 or w = 0.
If u = 0, then it; = 1 or it; = —1. If w = 1 (and w = 0), then i2 = i which
implies i = 1 (see Exercise (3)); i.e., (1,0) = (0,1), which is false. If u = 0
and w = —1, we get i2 = —i, implying that i = — 1, also a contradiction.
Thus we cannot have u = 0.
So W = 0 and i2 = 1 or i2 = —1. If i2 = 1, we get
0=||i2-l|| = ||i + l||||i-l|| = x/2V/2 = 2,
a contradiction. Thus i2 = — 1 and (2) becomes
(2') (a + bi)(c + di) = (ac - bd) + (ad + be) i.
Exercise. Show that this multiplication is associative, commutative,
extends that of R = {a + 0i} C R , and preserves norms. (Hint: Recall the
exercises of Chapter 0.)
The previous example is typical of the situation in which we are
interested. That is, if A is a normed algebra, can we extend A to a larger
normed algebra B such that the multiplication of B extends that of A and
the norm on B restricts to that on Al If such a B exists, we say that it
has A as a subalgebra.
An application of (2') to number theory is
A. The Normed Algebras R and C
143
Theorem Let S C Z be those integers which may be written as a sum of
two squares (e.g., 29 G S since 29 = (2)2 + (5)2>). Then S is multiplicatively
closed.
Proof: Suppose p — a2 + b2 and a = c2 -f d2 are in S.
We need to show pa G S. Set a = a + bi, f3 = c + di. Then we have
pa = (a2 + 62)(c2+d2) = ||a||2||/3||2 = ||a/3||2
o= (ac-6d)2 + (ad+6c)2. I
Of course, the normed algebra we found on R2 is just the field C of
complex numbers. So now we ask if we can extend this normed algebra on
C to a normed algebra on R . The answer is a resounding NO. In fact, we
cannot even extend the multiplication on C to an associative multiplication
on R . (Later we will get this result without assuming associativity.)
Suppose we can extend the multiplication on C to R . We extend the
basis {l,i} of C to the basis
{1-(1,0,0), i = (0,l,0), j = (0,0,1)} for R3
and note that we must have
ij = a + bi + cj with a,b,c G R.
If we assume that the extension of C to R** is an associative algebra, then
—J — ^3 — *(u) — ai — b + c(a + bi + cj) = (ac — b) + (a + bc)i + c2j
and equating coefficients of j gives — 1 = c2, contradicting c G R.
Next we turn our attention to R . We shall see that we can make R4
into a normed algebra which is associative but not commutative. Our proof
of this will be constructive; that is, we will use some properties of a general
normed algebra to decide how we should define a multiplication on R .
We use the basis
{1 = (1,0,0,0), t = (0,1,0,0), j = (0,0,1,0), fc = (0,0,0,1)}
along with the standard norm (i.e., if x = x\ + x2i + x3j + x±k then
\\x\\ = \Jx\ + x\ + x\ + x\ ). Also, we want C to be a subalgebra, so we
insist that i2 = —1.
Let R = Span(l) so that its orthogonal complement is
Rx = Span(i,j, k) .
Then any q 6 R can be uniquely written as q = a + (3 with a G R and
0 £ R1, Wa call a = Re(q) the real part of q and {3 — Im(g) the imaginary
part of q,
144
V. Normed Algebras
We define the conjugate of q = a + /? to be
q = a- f3 .
Then clearly
q = q,P = -/3, Re(q) = ^, and Im(q) = ^
Also, for <ji, g2 £ R we have
<?i + 92 = qi + <?2 •
Exercise. Show that these conjugation formulas hold.
We want to prove that for any q G R , we have
(3) qq=\\q
|2
It turns out to be fairly easy to prove (3) for any normed algebra, so
we will do this in the next section.
After proving (3) in the general setting, we will finish making R4 into a
normed algebra. We call this normed algebra the quaternions and denote it
by H. (The H is in honor of W.R. Hamilton, who invented the quaternions
in 1843.)
But first, we observe that if /? G Rx is any unit vector, then we have
w = M2 = i
by (3). But /3$ = /?(-/?) = -/32 so that
(4) /32 = -1 .
In particular, we must have j2 = k2 = — 1.
Next, we note that ij must be in Rx (we will see in Proposition 3 that
(xw,y) = (x,yw) for any x,y,w G R4, so (ij, 1) = (i,j) = -(i,j) = 0; so
by (4), we also have
(5) ijij = (ij)2 = -1
since ij is a unit vector. (At this stage, we are assuming that we can
construct an associative multiplication.) Multiplying (5) on the left by i
and on the right by j gives
ji = -ij .
Similarly, we see that jk = —kj and ki = —ik.
Consequently, H is not commutative.
B. Some General Results, Quartenions
145
Exercises
(1) Show that the unit element of any normed algebra is a unit
vector.
(2) Suppose that A is a normed algebra and that a G A is a unit
(i.e., there is a (unique) a"1 G A such that aa~l = a~~la = 1).
Show°that
iia-iHMr1-
(3) Suppose a and b are elements of a normed algebra. Show that
if ab = 0, then either a = 0 or b = 0. Use this to show that if
i2 = i, then i = 1 (hence i2 ^ i).
(4) What is wrong with the following "proof" that j and k are
linearly dependent? Since j2 = k2 = — 1, we have j2 — k2 = 0.
But x2—y2 factors as {x-{-y){x—y), so we get (j+k)(j — k) = 0.
By exercise (3), either j — —k or j = k.
(5) Read "Hamilton's Discovery of Quaternions" by B.L. van der
Waerden in Mathematics Magazine, vol. 49, #5 (1976), pp.
227-234.
B. Some General Results, Quaternions
Let A be a normed algebra with basis {t>i,... ,t>n} such that V\ = 1 and
the inner product on A is
(x,y) =Xiyi-\ \-xnyn
where x = ori^x H h xnvn, y = Z/i^i H h ynVn.
Recall that the commutator of # and y is
[x, y] = xy - yx,
and the associator of #, y, z is
[x,y,z] = (xy)z-x(yz).
As before, R = Span(l) and its orthogonal complement is R-1 = Span(t>2
,..., vn). Also, we can write x £ A uniquely as x — a + /3 with real part
a — Re(a?) € R and imaginary part (3 = Im(#) G Rx. We define the
conjugate of x to be #
146
V. Normed Algebras
and it is easy to show that
n n X+X X-X _
x = x, p = -/?, a = , p = , and x + y = x + y .
Notice that if x G R (<£> Im(:r) = 0), then we can consider x to be a
scalar. In fact, A need not be normed for this to hold.
Exercise. Let B be any real algebra with unit element. Show that a G
Span(l) can be considered to be a scalar.
Proposition 1 Suppose B is a real algebra with unit element (not
necessarily normed).
(i) For x,y G B, if one of them is real, then [x, y] = 0.
(ii) For x,y,z G B, if any one of them is real, then [x, y, z] = 0.
Proof: (i) and (ii) both follow from the preceding exercise and the
definition of an algebra. I
Prom now on, our algebras will be normed unless we state otherwise.
Proposition 2 For any x,y,w G A, we have
(i) (xw,yw) = (x,y)\\w\\2 {= (x,y)(w,w)), and
(ii) (wx,wy) = (x,y)|H|2.
Proof: We prove (i); (ii) can be proved similarly. First we obtain an
identity in x and w. We have ||aru;||2 = \\xw\\ \\xw\\ = \\x\\ \\w\\ \\x\\ \\w\\ =
\\x\\2\\w\\2 so that
(6) (xw,xw) = (x,x)(w,w) .
Now we prove (i) by polarizing the identity (6). We put x + y in (6) in
place of x to get
((x + y)w, (x + y)w) = (x + y, x + y) (w, w) .
We expand both inner products (by bilinearity and symmetry) to get
(xw,xw) +2(xw,yw) + (yw,yw) = ({x,x) +2(x,y) + (y,y)){w,w) .
Using (6) to cancel some terms in this last equation gives
2(xw,yw) = 2(x,y)(w,w) , proving (i).
Exercise. Prove (ii). I
Proposition 3 For any x,y,w € A} we have
B. Some General Results, Quartenions
147
(i) (xw,y) = (x,yw), and
(ii) (wx,y) = (x,wy).
Proof: Again, we prove (i).
Let w = c + 6 with c G R and 6 G Rx. Then
0 (xw,y) = (cx,y) + (x6,y), and
(x,yw) = {x,cy) + (x,yS).
Since (cx,y) = (x,cy), it suffices to prove (i) in the case that w G R-1 (so
that w = —w). With the assumption that w G R-1 we claim
(7) (x, y)(l + \\w\\2) = (x(l + w), y(l + w)) .
By Proposition 2, the right hand side of (7) equals
(x,y)(l + w, l + w) = (a:,y)((l, 1) + (1, w) + (w, 1) + («;,«;» .
But (l,w) = (w, 1) = 0, so this equals (x,y)(l + ||w||2) as claimed.
By proposition 2, we can rewrite (7) as
{x, y) + {xw, yw) = (x, y) + (xw, y) + (x, yw) + (xw, yw) .
It follows that (xw,y) = —(x^yw) = {x,y{—w)) = (x,yw), proving (i).
Exercise. Prove (ii). I
Next "we establish an important property of conjugation.
Proposition 4 For x,y G A, we have
xy = yx.
Proof: For any x,y,z,w G A we use (i) and (ii) of Proposition 3 repeatedly
to get
(zw,xy) = (z,xyw) = ((xy)z,w) = (xy,wz) = (y,x(wz))
= (y(wz),x) = (Wz,yx).
Now take u> = 1 to get (z,xy) = (z,yx). Since (,) is nondegenerate, this
proves Proposition 4. I
We are finally ready to prove formula (3) (i.e., xx = ||x||2 for any x G A).
* Proposition 5 For x,y € A, we have (x,y) = Re(xy) = ^(xy + yx) and
80
XX = \\X\\2 .
148
V. Normed Algebras
Proof: Recall that {t>i,..., vn} is our basis of A, and we write x = x\V\ +
h xnvni y = ytvt H h ynvn-
First we consider v^j when i ^ j. If j = 1, then t^- = vi € R1- ; so
assume j > 1. By Proposition 3,
(1^,1) = <Vi,t;j> = (vu-Vj) = -(vuvj) = 0 .
Thus v^- € R"1" whenever i ^ j.
Next we note that
INI2 = (Vi,Vi) = (IjViVi) = (Vi,Vi) = 1
by Proposition 3, so ^ is a unit vector. But then vfii is also a unit vector;
furthermore, v^i is real since
ww = ww = ww for any w e A
by Proposition 4. Thus v^ G {1, —1} for all i
For i > 1, we have £i = — u» so that — u? = t>^ G {1, —1}. If u? = 1,
then
0 = v? - 1 = (t* + 1)(^ - 1)
1
I
1
which contradicts the fact that neither V{ + 1 nor u$ — 1 is zero. Hence
v? = —1 if i > 1. I
Now we use these observations to calculate f
#
zy = {x1-\-x2v2-\ \-xnvn)(yi - y2v2 ynvn) f
= a?iyi + #22/2 H h #n2/n + Vi(x2v2 H h xnvn) - ^T Xiy^v^ Z
so that Re(#y) = xiyi H h o:nyn. But this is precisely (#,2/). The fact J
that Re(xy) = \{xy + yx) follows from the fact that Re(i<;) = ^(w + w) 5
and Proposition 4. 3
Finally, we see that 5
||#||2 = (x,x) — Rje(xx) — xx . I 5
Corollary If f5 G R-1 is a txmt vector, then 01 = — 1. Consequently, when ||
A is associative, V{Vj = — t^ /or distinct Vi,Vj £ {v2,... ,t>n}-
Proof: Exercise. I jl
Quaternions z
We have the basis {1, i, j, fc} for R4 and we know that =
ji as -y, jft as -ftj, and fci = -ifc
B. Some General Results, Quartenions
149
(assuming that our multiplication will be associative). But we still need to
define ij, jk, and ki.
Choose w G R and consider the map Lw : R4 -+ R4 defined by
Lw(x) = wx .
Lw is called left multiplication by w. This is a linear map since
o
Lw(ax + by) = w(ax + by) — awx + bwy = aLw(x) + bLw(y) .
That is, Lw G End(R4). Letting w = i, we see that £(-*) is the inverse of
Li since
£(.<) o Li(x) = L(_i){ix) = (~i)(ix) = x
and
^ oi(.i)(x) = a:;
so Li is an isomorphism of R . By Proposition 2, we have
(Li(x),Li(y)) = (x,y)\\i\\2 = (x,y)
so that Li is an orthogonal map (i.e., Li takes {l,a,j, &} to another or-
thonormal basis of R4). Now Li sends 1 to i and it sends i to —1. Thus
Li must take {j, &} to an orthonormal basis for Span(j, k). That is,
Li(j) — ij — aj -f bk with a, 6 G R
such that a2 + 62 = 1 and 6^0. (If 6 = 0, then we have aj G {j, — j} which
implies that i is real.) Similarly,
Li(k) = ik = cj + dA; with c, d G R ,
c2 + d2 — 1, and c ^ 0. Then we have
iL;(j) = -j = aij + ta&
= a(aj + &&) -f b(cj + dfe)
= (a2 + 6c)j H- (aft + bd)k .
Equating coefficients of j and A; yields
(8) a2 + bc = -l
and
(9) 6(a + d) = 0 .
Since 6^0, (9) becomes
150
V. Normed Algebras
Now Li is orthogonal, so we have
0 = (Li(j),Li{k)) = (aj + 6fc, cj + dk)
which implies
(10) ac + bd = 0 .
Substituting a — —d into (10) gives
(11) a(c-b) = 0.
Now we multiply (8) by c and multiply (10) by a to obtain
(80 a2c + 6c2 = -c
(100 a2c + a6d = 0.
Subtracting (100 from (80 yields
6(c2 - ad) = -c
or, since a = — d,
6(c2 + d2) = -c .
But c2 + d2 = 1, so we see that
(12) b = -c .
Prom (11), we know that either a — 0 or c = 6. If c = 6, then we get
c = 6 = — c by (12); but then c must be zero, a contradiction.
Hence 0 = a — —d and so b — — c G {1, -1}.
We choose b = 1 so that we have
ij — k and i/c = — j .
Multiplying ij = k on the right by j gives
so we have defined ij = k, jk = i, and ki = j. These last thr6e relations
area easy to remember using the diagram below.
i
k ^—^ j
Thus wa have the multiplication table
B. Some General Results, Quartenions
151
1
i
3
k
1
1
i
3
k
i
i
-1
-k
3
3
3
k
-1
—i
k
k
-3
i
-1
and using this table (along with distributivity), we can multiply any two
elements of R . °
For example, if u = a 4- bi 4- cj 4- dk and v = x 4- yi 4- zj 4- w&, then
f 1 ^ uv ~ (ax ~by — cz - dw) 4- (a$/ 4- &£ 4- cw — c?^)i
^ ' 4- (a^ 4- car 4- dy — 6ty)j 4- (aw + dx -i-bz — cy)k .
As we noted before, R4 equipped with this multiplication is denoted
by H and called the quaternions.
Exercise. Show that H is an associative normed algebra (which is not
commutative) if its multiplication is defined by the preceding table; also,
verify formula (13). Would we still get a normed algebra if we had decided
to define ij as —k instead of kl (Yes. In fact, this algebra is isomorphic to
H-)
So far, we have constructed the normed algebras RCCCH. Can we
extend H to a larger normed algebra? In fact we can, but we lose
associativity; we can make R8 into a normed algebra having H as a subalgebra.
This new normed algebra is called the octonions (or Cayley numbers) and
is denoted by O.
We will forgo a careful development of the octonians and simply define
an appropriate structure on R . (We remark that there are several such
structures, but they give isomorphic normed algebras.)
We use the standard basis {1 = ei, e2,..., eg} where 1 = (1,0,..., 0),
etc., and the standard norm on R . By the corollary after Proposition 5,
we know that we must have
e\ = -1 for i > 2 ;
also, we require that
e% ej — ej e<i
for i,j G {2,..., 8} with i ^ j. (This property does not follow from the
corollary. Why?) We define
e2e3 = e4, e2e5 = ee, e2e7 = -e8, e3e5 = e7,
ese® = e8, e4e5 = e%, e4e6 = -e>j
and we get 14 mora relations by cyclically permuting each set of three
subscript! (©.§., 0§§4 » H and e4e2 = e3, etc:.),
152
V. Normed Algebras
Thus we know how to multipy any two basis elements, so we can
multiply two elements of R° using distributivity.
Obviously, O is not commutative; to see that it is not even associative,
we calculate
^5 * e7 = e6e7 = —e4
and compare this to
^2 * e5e7 = e2e3 = e4 .
Exercises
(1) Show that the commutator [, ] : A x A —* A is a bilinear map
on the algebra A. Now show that the associator [,, ] is trilinear
on A.
(2) Show that if we represent R4 as C 0 C = {(a, (3) | a, /? G C}
and define
(a,P)(>y96) = (cn-6P,6a + py),
then this also gives H.
(3) Now represent R8 as H 0 H and use the same definition of
multiplication as in exercise (2). Show that this also gives O.
(4) Write out a formula, analogous to (13), for the product of two
octonions.
(5) Show that O is a normed algebra which contains H as a subal-
gebra. (Showing that O is normed will be easier after we have
established the Cayley-Dickson formula.)
(6) Let FCZbe those integers which may be represented as a sum
of four squares, and let E C Z be those integers which may be
represented as a sum of eight squares (e.g., 10 = l2 +12 -f 22 -f
22 GF). Show tht F and E are closed under multiplication.
(7) Given q = a -f bi 4- cj -f dk G H where a, b,c,d e R, show that
q and q are both roots of the real polynomial
p(x) = x2 — 2ax + \\q\\ .
(8) Consider the set of n-tuples of quaternions
This is not a vector space since H is not a field, but we can
still define the notions of scalar multiplication, addition, linear
B. Some General Results, Quartenions
153
maps, etc. For example, if a G H and q = (qi,... ,qn) G Hn,
then we define
aq = (aqi,...,aqn) .
Given a matrix of quaternions A G Mn(H), we define a map
<$>: Hn -♦ Hn by
<l>(Q) = (Qi,'-,Qn)A.
o
Show that this map is linear. Also, show that the map
\ Qn
is not linear since we decided to multiply by scalars on the left.
(9) Define an inner product on Hn by
(x,y) =xiyi + --+xnyn
where x = (xi,..., xn) and y = (yi,..., yn). Show that (,)
satisfies the following properties.
(i) (x, y) = (y, x)
(ii) (x + y, *) = (x, z) + (y, z)
(ii') (x, y 4- 2) = (x, y) 4- (x, 2)
(iii) (ax,y) = a(x,y)
(iii7) (x, ay) = (x, y)a
(iv) (x, x) is a real number > 0, and (x, x) = 0 O x = 0
(v) If {ei,..., en} is the standard "basis" for Hn (i.e.,
ei = (1,0,..., 0), etc.), then (e., ej) = Sy.
(vi) (,) is nondegenerate.
(10) For x,y G Hn and A G Afn(H), show that
(Xi4,y) = (x,y*A) .
(11) The matrix ^4 G Mn(H) is said to be symplectic if
(xA, yA> = (x, y>
for all x, y G Hn. Show that Sp(n) = {A € Mn(H)|j4 is symplectic}
is a group under matrix multiplication. Sp(n) is called the
symplectic group.
(12) Show that A is symplectic <$>
(Ax, Ax) = (x, x)
for all x € H.
154
V. Normed Algebras
C. Alternative and Division Algebras
Now we show that normed algebras are rather special. First they satisfy a
weak kind of associativity called "alternative," and second they are division
algebras.
Definition An algebra A is alternative if the associator [x, y, z] = (xy)z —
x(yz) is zero whenever two of {x,y, z} are equal.
Proposition 6 Any normed algebra A is alternative.
Proof: We need to show that [x, w, w] = 0, [w, x, w] = 0 and [w, w, x] = 0.
First we prove that for any x, w in A, we have
(14) [x,w,w] = 0 .
To see this, note that
[x, w, w] = (xw)w — x(ww)
= (xw)w — x\\w\\2
by Proposition 5. For any yGi, note that
((xw)w, y) = (xw, yw) by Proposition 3
= (x, y) 11 w 112 by Proposition 2
= <*|MI2,i/>.
But (,) is nondegenerate, so (xw)w = x\\w\\2 proving (14).
Quite similarly, one proves
(15) [w^w^x] = 0.
Now write w = c 4- 8 with c G R and 8 G Br1. Then
[x, w, w] = [x, c 4- <5, c 4- 8] = [x, c, c] 4- [x, c, 8} 4- [x, <5, c] 4- [x, <5,6]
since [, , ] is trilinear. The first three terms contain c G R and thus they
are zero by Proposition 1; thus
[x, w, w] = 0 ^ [x, 6,6] = 0 .
But 6 = —6, so that [x,6,8] = —[x,6,6] which is zero by (14). Similarly,
(15) implies that [w,w,x] = 0.
It remains to prove that [w, a?, w] = 0. Since [w, w) x] = 0, we know that
0 = [x + w}x + w,w]
C. Alternative and Division Algebras
155
and the right hand side is
[x, x, w] -f [x, w, w] 4- [w, w, w] 4- [w, x, w] .
But the first three terms are zero by (14) and (15). This proves [w, x, w] = 0
and completes the proof of Proposition 6. I
Definition An algebra A with unit element is a division algebra if each
nonzero a € A lias a unique left and right inverse (i.e., there is a unique
a~l G A such that aa~l = a~la = 1).
Proposition 7 Any normed algebra A is a division algebra. In fact, given
a, 6 G A with a ^ 0, the equations ax = b and xa = b can be solved uniquely
for x.
Proof: For a ^ 0, we have a-1 = tt^W since aa = \\a\\2 (by Proposition 5).
Now note that {ba)a = b(aa) by (14), so the solution of xa = b is
x = jj^w. Similarly, the solution of ax = b is x = v^m. I
We conclude this section with some important formulas for a general
normed algebra A. (Proofs are outlined in the exercises.)
Suppose x,y G A are orthogonal, i.e., (x,y) = 0. Then
(16) xy = -yx ,
(17) x(yw) = —y(xw) , and
(18) (wy)x — ~{wx)y
for an arbitrary w G A.
Exercises
(1) Using the fact that 2(x, y) = xy -f yx in a normed algebra A,
show that
2(x, y)w = [x, y, w] + [y, x, w] + x(yw) + y(xw)
for any x,y,w G A.
(2) Show that the formula in exercise (1) reduces to
2{x, y)w = x(yw) 4- y(xw)
by considering the associator [x 4- y, x + t/, u;].
(3) Now show that
2{#, j/)w = (wy)£ 4- (wx)y
for all a1,2/, w € X.
156
V. Normed Algebras
(4) Assuming that x and y are orthogonal, prove formulas (16),
(17), and (18).
(5) If the (not necessarily normed) algebra A is alternative, show
that [w, w, x] = 0 for any w, x G A. Similarly, [x, w, w] = 0 and
[w, x, w] = 0.
D. Cayley-Dickson Process,
Hurwitz Theorem
Let B be a normed algebra and suppose A is a subalgebra with 1 G A.
(Note that A is also a normed algebra.) Then RCA and so A1- C R-1.
Let e be a unit vector in A^. Then e G R-1 so that e2 = —1.
Proposition 8 Ae is orthogonal to A; i.e., Ae C A1-.
Proof: Since A is a normed algebra, we know that
xG A&xG A .
Now any element of Ae is of the form ae for some a G A. Thus for any
b G A, we have
(ae, 6) = (e, a&) = 0
since ab G A. Hence ae G A-1. I
Note that Ae is a subspace of the vector space B, so we can form the
sum
A + Ae .
By the previous proposition, AC\Ae = {0} so this sum is direct. That is,
A® Ae
is a subspace of B. In fact, we can make A © Ae into a subalgebra.
Theorem 1 A ® Ae is a subalgebra of B and the multiplication in A 0 Ae
is given by
(19) (a + be)(c + de) = (ac - db) + (da + bc)e
Proof: Clearly (19) implies that A ® Ae is a subalgebra, so we just need
to prove (19). Now
(a + be)(c + de) = ac-{- (be)(de) + a(de) + (6e)c ,
and the first term ac is already in the desired form. By Proposition 8, we
have (6, e) = 0, (d, e) = 0, (d, be) = 0, etc. So we can use the formulas (16),
(17), (18). Recall (17) says that for (x,y) = 0
x(gw) = -i/(£it;),
D. Cayley-Dickson Process, Hurwitz Theorem
157
Let x = be and y = d and w = e. Then
(6e)(de) = -d{(be)e) = ~d((-be)e) = -d(b(ee)) = -db
where the second equality follows from the fact that be G Ae C R-1, and
the third is true because B is alternative.
Next, by (16) and (17), we have
a(def= a(—de) = a(—ed) = a(ed) = —e(ad)
= e(da) = —(da)e = (da)e ;
and finally, by (18),
(be)c = —(bc)e = (bc)e . I
Exercise. Show that the conjugation on A 0 Ae is given by
a + 6e = a — be .
We now use Theorem 1 to motivate the Cayley-Dickson Process. Let
A be a finite-dimensional real algebra with unit element and a conjugation
such that § = x and xy = yx. We are not assuming that A is normed, but
we can define the real and imaginary parts of x G A by
Re(x) = -{x + x) and Im (x) = -(x — x) .
Zi Zi
(See exercise (1).)
We make the (external) direct sum B = A 0 A into an algebra with
unit element by defining
(a, 6)(c, d) = (ac — d&, da + 6c) ;
and we define conjugation on B by
(a7&) = (a, -b) .
Exercise. Show that this conjugation on B satisfies x — x and xy = yx.
(Thus we can repeat this construction, forming B 0 £?, etc.) What are the
real and imaginary parts of (a, 6) G Bl
We say that I? is obtained from A by the Cayley-Dickson Process.
So far, we have C = R0R, H = C0C, and O = H0H by using the
Cayley-Dickson Process (see (2') and the exercises of section B).
Now we regard A as a subalgebra of B = A 0 A by including it as the
first factor. If we choose a unit vector e from the second factor, then we
can write B as the internal direct sum
B = A 0 Ae
with multiplication given by (19).
158
V. Normed Algebras
Theorem 2 Suppose B = A(&Ae is obtained from A by the Cayley-Dickson
Process. Then
(i) B is commutative 4=> A = R,
(ii) B is associative ^ A is associative and commutative, and
(hi) B is alternative ^ A is associative.
Proof: Suppose x = a + ae, y = b + /?e, and z = c + ^e are elements of B
with a, a, 6, /?, c, 7 G A.
We will state some rather incredible-looking formulas, show that these
prove the theorem, and then show that they are true by using the Cayley-
Dickson formula (19).
(20) [x, y] = [a, b] + 2 Im (a/?) + 2(/3 Im (a) - a Im (b))e
(21][x,y,z] = [a,7/?] + [&,7a] + M<*] +
(a [6,c] +/? [a,c] +7 [a, 6] +a [A 7] + [a, 7] £ + 7 [a,0] )e
(22}x, x, y] = [a, /3, a] + [a, 6, a]e
Suppose A = R so that all of the terms in the right hand side of (20)
zero. Thus B is commutative.
(i) =>• (by contrapositive)
Suppose A^Rso that dim A > 2. Now R = Span(l) C A so that A
contains C = R0R by Theorem 1. But then B must contain H = C®C
so that B is not commutative.
(ii)«=
If A is commutative, then any commutator with both entries from A
will be zero. This is precisely the case in the right hand side of (21), so B
must be associative.
If B is associative, then A certainly is; furthermore, the associator
[x, 2/, z] is always zero. In particular, the A-part in (21)
[a,7/?] + [&,7a] + [c,0a]
is always zero. Let a,/3 G A be arbitrary, let 7 = 1, and let b = c = 0.
Then we see that [a, /?] = 0 so A is commutative.
(in) <=
If A is associative, then the right hand side of (22) vanishes. Thus
[x, x, y] = 0
for all x,y e B, Looking at the proof of Proposition 6, we see that this is
sufficient to prove that B is alternative,
D. Cayley-Dickson Process, Hurwitz Theorem
159
(iii) =*
If B is alternative, then [x, #, y] = 0 (see exercise (5) of the previous
section). By (22), we have [a,J3,a] = 0 for any a, /?, a G A so A is
associative.
Now we show that the formulas (20), (21), and (22) hold.
Using (19), we calculate
xy =& (a + ae)(b + /3e) = (ab - J3a) + (/3a + a6)e
and
yx = (b + /3e)(a + ae) = (6a - a/3) + (a6 + /?a)e .
Thus
xy - yx = ab - ba + (a/3 — /3a) + (/3(a — a) — a(b — b))e .
Recall that 2 Im (w) = w — ?D, so this last equation becomes (20).
To establish (21), we must assume that A is associative. (This is no
restriction since A is always associative in case (ii).)
Now [x,y,z] = (xy)z — x(yz) and using (19), we calculate
(xy)z = {(ab -jia) + (/3a + ab)_e}(c + je)
= (a6 - /?a)c — 7(/?a + a6)
+ {j(ab - j3a) + (/?a + ab)c}e
and
#(yz) = a(bc - j/3) - (fry + c/?)a
+ {(76 + f3c)a + a(c6 - /?7)}e .
So we have
(23)
r 1 _ / (ab)c ~ (Pa)c ~ 7(/?a) ~ 7M0
J 7(a6) - 7(/?a) + (/?a)c_+ (a6)c
+ 1 -(76)a-()9c)a-a(a)+a(j97) '6
Since A is associative, (ab)c — a(bc) = 0 in the A-part. Also, in the
A-part, we have
c(j3a) - (f3a)c = [c, (Ja] (terms 8 and 2)
a(j/3) — (7/3)a = [a, 7/?] (terms 6 and 3)
6(7(2) — (7(2)6 = [6,7a] (terms 7 and 4) .
Hence the ^4-part of (23) is the same as in (21).
For the Ae-part, we have
7(a6) - 7(6a) = 7[a, 6] (terms 1 and 5)
/3(ad) - 0(da) = /3[a, c] (terms 3 and 6)
a(ls) - a(Sb) « a[g, c] (terms 4 and 7) .
160
V. Normed Algebras
We are left with the terms a(/?7) — 7(f3a) which we rewrite as
afij — aj(3 + aj(3 — ja(3 + jaj3 — jfia
= a[&7] + [a,7]j9 + 7[a,j9],
so (21) holds.
To establish (22), we must assume that A is alternative (which is the
case in (iii)). We calculate
xx = (a + ae)(a — ae) = aa-haa
and so
(24) (xx)y = (aa + aa)b + (aa + aa)(3e .
Also,
xy = (a — ae)(b + /?e) = (a& + /?a) + (/?a - ab)e
so that
x(icy) = (a + ae){(a6 + /3a) + (/?a — ab)e}
= a(ab + J3a) — (a/5 — ba)a
+ {(/3a - a6)a + a(ba + a/?)}e
= a(a&) + a(J3a) — (aj3)a + (ba)a
+ {(/?a)a - (a6)a +_a(6a) -f a(a/?)}e
= (aa)6 + b(aa) — [a, /?, a]
+ {/?(aa) + (act)fi — [a, 6, a] }e
where the last equality follows from the fact that A is alternative.
Now note that ww G R for any w € A. (Exercise. Show this.) So we
have
/25x x(xy) = (aa + aa)b - [a, /?, a]_
^ ' + {(aa + aa)^ — [a, 6, a] }e .
Subtracting (25) from (24) gives (22) which finishes the proof of Theorem
2. 1
An important corollary of Theorem 2 is
Theorem 3 (Hurwitz Theorem) The only real normed algebras are R, C,
H; and O.
Proof: Let Bbea normed algebra and let R = Span(l). If B = R, then
we are done. If not, then Rx 7^ {0} so we can choose a unit vector i G Rx.
By Theorem 1, C = R 0 Hi is a subalgebra of B.
If B = C, then we are done; otherwise, we choose a unit vector j G C1.
Then H = C 0 Cj is a subalgebra of B.
If B = H, we are done. If not, we choose a unit vector e € H1 so that
O = H 0 He is a subalgebra of B.
D. Cayley-Dickson Process, Hurwitz Theorem
161
If O is not all of B, then B contains the subalgebra O 0 Or where
r G 0^~ is a unit vector. But B is a normed algebra so that O 0 Or is
also normed. Hence O 0 Or is alternative (by Proposition 6) so that O
must be associative by Theorem 2. Since O is not associative, we must
have B = O. I
Exercises
(1) Let A%e a finite-dimensional real algebra with unit element,
and choose a basis {^i,... ,vn} for A such that vi = 1. For
x = cci^i H + xnvn G A, define
X = Xi — X2V1 XnVn
and show that this conjugation satisfies x = x and xy = yx.
Also, show that Re(cc) = \{x + x) is, indeed, in Span(l) = R.
(2) Show that the Cayley-Dickson process works. That is, show
that the multiplication defined on B = A 0 A makes B into an
algebra with unit element.
(3) Consider the algebra B = O 0 O obtained via the Cayley-
Dickson process. Show that
||x||2 = xx
still holds, where || || is the standard norm on R16. Conclude
that B is a division algebra. However, given a, b G B, the
equation
ax = b
cannot always be solved uniquely for x since B is not
alternative.
(4) Let A be the subalgebra (with unit element) which is generated
by any two elements of O. Show that A is associative. (Hint:
If x, y G O generate A, write these as x = X\ + #2, y = yi + 2/2
where x\ = Re(#) and X2 = Im (x), etc. If A ^ R, consider
R 0 Re C A where e = pjir.)
(5) We have shown that xx = \\x\\2 and xy = yx hold in a general
normed algebra. Prove these directly for O (without assuming
that O is normed).
(6) Use exercises (3) and (4) to give a proof that O is a normed
algebra (i.e., satisfies (1)). This also shows that R, C, and H
ar© normed.
162
V. Normed Algebras
(7) Given cc, y, z G O, prove the identities
(xyx)z = x(y(xz)) ,
z(xyx) = ((zx)y)x , and
(xy)(zx) = x(yz)x .
Note that the expression xyx makes sense since O is
alternative. (Hint: If any two of {#,?/, z} are equal, then these
equations reduce to zero by exercise (3). Show that these
formulas are linear in y and z, so we can assume x,y, and z are
orthogonal. Now apply exercises (2) and (3) from the previous
section.)
Bibliography
[1] Morton L. Curtis, Matrix Groups, second edition, Universitext Series,
Springer-Verlag New York, Inc. (1984).
[2] Paul R. Halmos, Finite-Dimensional Vector Spaces, second edition, Un-
degraduate Texts in Mathematics Series, Springer-Verlag New York, Inc.
(1974).
[3] F. Reese Harvey and H. Blaine Lawson, Jr., "Calibrated Geometries,"
Acta. Math. 148 (1982), 47-157.
[4] I.N. Herstein, Topics in Algebra, second edition, John Wiley & Sons
(1975).
[5] Serge Lange, Linear Algebra, second edition, Addison -Wesley Publishing
Co., Inc. (1971).
[6] Seymour Lipschutz, Linear Algebra, Schaum's Outline Series, McGraw-
Hill Book Co. (1968).
Index,
abelian group, 2
absolute value, 6
adjoint, 129
algebra, 18, 56
alternative, 154
associative, 18, 57
commutative, 57
division, 155
normed, 155
algebraic multiplicity, 79, 105
algebraically closed field, 4
annihilator, 45
associator, 60, 145
basis, 22
bilinear form, 113
block diagonal matrix, 103
Cayley numbers, 151
Cayley-Dickson formula, 156
Cayley-Dickson Process, 157
Cayley-Hamilton Theorem, 84, 95
change-of-basis matrix, 53, 116
characteristic of field, 4
characteristic polynomial, 83, 95
classical adjoint, 71
coefficients of a polynomial, 87
coefficients of linear system, 72
cofactor, 71
column, 48
column space, 35
commutator, 60, 145
comparable, 44
complementary subspaces, 28
complex numbers, 6
conjugate, 53
in normed algebra, 145
of complex number, 6
of a matrix, 125
of quartenion, 144
coordinate vector, 51
coset, 32
curve, 133
differentiate, 133
product, 133
degree of polynomial, £7
determinant, 63
diagonal matrix, 14, 65
diagonalizable, 104, 106
dimension, 24, 36
direct sum
external, 28
internal, 30
distance, 111
divide, 90
Division Theorem, 88
divisor of zero, 3
dual basis, 42
dual space, 41
echelon form, 69
166
eigenspace, 39
eigenvalue, 39, 76. 84
eigenvector, 39
elementary column operations, 70
elementary matrix, 69
elementary row operations, 68
elimination, 73
endomorphism, 18
End(V0, 18
epic, 16
exact sequence, 31
extension field, 4
exterior algebra
on IR3, 58
on IRn, 61
Factor Theorem, 89, 90
field, 2
finite-dimensional, 22
finite set, 22
functional, 41, 120
fundamental theorem
of algebra, 4
general linear group, 19, 65
generates, 12
geometric multiplicity, 39, 105
Gram-Schmidt orthnormalization
process, 118
greatest common divisor, 90
group, 1
Hamel basis, 24
Hermitian form, 124
Hermitian inner product, 123
Hermitian linear, 124
Hermitian matrix, 128
Hermitian symmetric, 124
Hilbert Space, 117
Index
homogeneous system
of equations, 76
homomorphism, 82
Hurwitz Theorem, 160
i,j entry, 48
ideal, 82
idempotent, 77, 97
idempotent map, 20
identity map, 15
identity matrix, 52, 57
image, 16
imaginary part, 143, 145
indeterminate, 87
infinite-dimensional, 22
infinite set, 22
injective, 16
inner product, 111
inner product space
complex, 124
real, 116
integral domain, 85
inverse, 52
irreducible, 91
isomorphism of
vector spaces, 16
Jacobi identity, 60
join, 36
Jordan block, 103
Jordan form, 104
kernel, 16
^2-space, 117
Laplace expansion, 67
left multiplication, 149
left value, 89
Index
167
Lie Algebra, 60, 131
Lie Group, 132
line, 36
linear combination, 11
linear geometry of H3, 36
linear map, 15
linearly dependent, 21
linearly independent, 21
matrix, 14, 47
matrix algebra, 58
matrix multiplication, 50
matrix representation
of a bilinear form, 115
of a linear map, 48
maximal element, 44
method of elimination, 13
metric, 111
minimal polynomial, 92
monic map, 16
monic polynomial, 7, 87
natural map, 43
nilpotent, 28, 77
nondegenerate, 112
nonsingular matrix, 15
nonsingular, 52
norm, 111, 125
normal matrix, 138
normed algebra, 141
normed vector space, 120, 125
oct onions, 151
operator, 18
orthogonal, 118, 125
orthogonal complement, 122
orthogonal group, 130
orthogonal map, 118
orthogonal matrix, 118, 130
orthonormal, 118
orthonormal basis, 118
parallel, 36
parallelogram law, 120
partially ordered set, 43
permutation, 6
plane, 36
point, 36
polar form, 120
polynomial function, 87
positive definite bilinear form, 114
positive definite matrix, 128 .
preimage, 16
projection, 20
quadratic form, 116
quaternions, 144, 151
quotient space, 33
rank, 55, 115
Rank Theorem, 25
real inner product space, 116
real part, 143, 145
relatively prime, 91
Remainder Theorem, 90
ring, 81
associative, 81
commutative, 81
homomorphism, 82
root, 4, 87
rotation group, 130
row, 48
row equivalent, 68
row reduced, 69
row space, 55
scalar, 9
168
Index
Schwarz Inequality, 112, 115, 124
similar matrices, 53
simply ordered set, 44
singular matrix, 15
skew-Hermitian matrix, 128, 131
skew-symmetric matrix, 130, 131
span, 11, 12
special linear group, 133
special orthogonal group, 130
special unitary group, 130
Spectral Theorem, 98, 101
split exact sequence, 31
splitting, 31
square matrix, 52, 57
stable subspace, 37
standard basis, 23
subfield, 4
subgroup, 5
submatrix, 56
subring, 82
subspace, 11
sum of subspaces, 14
superdiagonal, 101
surjective, 16
symmetric bilinear form, 114
symmetric matrix, 68
symplectic group, 153
symplectic matrix, 153
system of equations, 72
terms of a polynomial, 87
torsion, 3
trace, 54
transpose, 65
triangle property, 111
triangular matrix, 65
trivial solution, 72
unit, 19, 81
unit element, 18, 57, 81
unit vector, 118
unitary group, 125
unitary map, 125
unitary matrix, 125, 130, 134
upper bound, 44
upper triangular, 14, 65
value, 89
vector, 9
vector space
over fc, 10
real, 9
Zorn's Lemma, 44
o