A Course in Real Analysis - John N. McDonald

Author: John N. McDonald
Tags: mathematics, mathematical analysis
ISBN: 0-12-045143-3
Year: 2005
                    A Course in
Real Analysis
John N. McDonald
Department of Mathematics
Arizona State University
Neil A. Weiss
Department of Mathematics
Arizona State University
Biographies by Carol A. Weiss
ACADEMIC PRESS
ISBN: 0-12-045143-3
Copyright © 2001, by Elsevier, All rights reserved.
Authorized English language reprint edition published by the Proprietor.
Reprint ISBN: 981-2592-99-7
Copyright © 2004 by Elsevier (Singapore) Pte Ltd. All rights reserved.
Elsevier (Singapore) Pte Ltd.
3 Killiney Road
#08-01 Winsland House I
Singapore 239519
Tel: (65) 6349-0200
Fax: (65) 6733-1817
First Published 2005
Printed in China by Beijing World Publishing Corporation under special
arrangement with Elsevier (Singapore) Pte Ltd. This edition is authorized
for sale in China only, excluding Hong Kong SAR and Taiwan.
Unauthorized export of this edition is a violation of the Copyright Act.
Violation of this Law is subject to Civil and Criminal Penalties.
Elsevier (Singapore) Pte Ltd.
To Pat and Carol

Contents
Preface	xiii
PART ONE □ Set Theory, Real Numbers, and Calculus
1	□ SET THEORY
Biography: Georg Cantor	2
1.1	Basic Definitions and Properties	3
1.2	Functions and Sets	12
1.3	Equivalence of Sets; Countability	20
1.4	Algebras, $\sigma$-Algebras, and Monotone Classes	26
2	□ THE REAL NUMBER SYSTEM AND CALCULUS
Biography: Georg Friedrich Bernhard Riemann	34
2.1	The Real Number System	35
2.2	Sequences of Real Numbers	43
2.3	Open and Closed Sets	57
2.4	Real-Valued Functions	65
2.5	The Cantor Set and Cantor Function	73
2.6	The Riemann Integral	81
PART TWO □ Measure, Integration, and Differentiation
3 □ LEBESGUE THEORY ON THE REAL LINE
Biography: Emile Felix-Edouard-Justin Borel	92
3.1 Borel Measurable Functions and Borel Sets	93
3.2 Lebesgue Outer Measure	103
3.3 Further Properties of Lebesgue Outer Measure	110
3.4 Lebesgue Measure	118
3.5 The Lebesgue Integral for Nonnegative Functions	128
3.6 Convergence Properties of the Lebesgue Integral for Nonnegative Functions	140
3.7 The General Lebesgue Integral	149
3.8 Lebesgue Almost Everywhere	161
4 □ MEASURE THEORY	
Biography: Henri Leon Lebesgue	166
4.1 Measure Spaces	167
4.2 Measurable Functions	174
4.3 The Abstract Lebesgue Integral for Nonnegative Functions	184
4.4 The General Abstract Lebesgue Integral	192
4.5 Convergence in Measure	203
4.6 Extensions to Measures	207
4.7 The Lebesgue-Stieltjes Integral	220
4.8 Product Measure Spaces	231
4.9 Iteration of Integrals in Product Measure Spaces	245
5	□ ELEMENTS OF PROBABILITY
Biography: Andrei Nikolaevich Kolmogorov	260
5.1	The Mathematical Model for Probability	262
5.2	Random Variables	274
5.3	Expectation of Random Variables	288
5.4	The Law of Large Numbers	301
6	□ DIFFERENTIATION
Biography: Johann Radon	314
6.1	Derivatives and Dini-Derivates	315
6.2	Functions of Bounded Variation	330
6.3	The Indefinite Lebesgue Integral	334
6.4	Absolutely Continuous Functions	342
6.5	Signed Measures	354
6.6	The Radon-Nikodym Theorem	364
6.7	Signed and Complex Measures	377
6.8	Decomposition of Measures	390
6.9	Measurable Transformations and the General Change-of-Variable Formula	402
PART THREE □ Topological, Metric, and Normed Spaces
7	□ ELEMENTS OF TOPOLOGICAL, METRIC, AND NORMED SPACES
Biography: Pavel Samuilovich Urysohn	410
7.1	Introduction to Topological Spaces	411
7.2	Metrics and Norms	419
7.3	Weak Topologies	427
7.4	Closed Sets, Convergence, and Completeness	431
7.5	Nets and Continuity	438
7.6	Separation Properties	447
7.7	Connected Sets	453
7.8	Separability, Second Countability, and Metrizability	459
7.9	Compact Metric Spaces	464
7.10	Compact Topological Spaces	471
7.11	Locally Compact Spaces	475
7.12	Function Spaces	481
8	□ COMPLETE SPACES, COMPACT SPACES, AND APPROXIMATION
Biography: Marshall Harvey Stone	492
8.1	The Baire Category Theorem	493
8.2	Contractions of Complete Metric Spaces	498
8.3	Compactness in the Space $C(\Omega, \Lambda)$	503
8.4	Compactness of Product Spaces	509
8.5	Approximation by Functions From a Lattice	513
8.6	Approximation by Functions From an Algebra	518
9	□ HILBERT SPACES AND THE CLASSICAL BANACH SPACES
Biography: David Hilbert	526
9.1	Preliminaries on Normed Spaces	527
9.2	Hilbert Spaces	533
9.3	Bases and Duality in Hilbert Spaces	545
9.4	$\mathcal{L}^p$-Spaces	553
9.5	Nonnegative Linear Functionals on $C(\Omega)$	563
9.6	The Dual Spaces of $C(\Omega)$ and $C_0(\Omega)$	573
10	□ BASIC THEORY OF NORMED AND LOCALLY CONVEX SPACES
Biography: Stefan Banach	578
10.1	The Hahn-Banach Theorem	579
10.2	Linear Operators on Banach Spaces	590
10.3	Topological Linear Spaces	597
10.4	Weak and Weak* Topologies	609
10.5	Compact Convex Sets	618
PART FOUR □ Harmonic Analysis and Dynamical Systems
11	□ ELEMENTS OF HARMONIC ANALYSIS
Biography: Ingrid Daubechies	634
11.1	Introduction to Fourier Series	636
11.2	Convergence of Fourier Series	643
11.3	The Fourier Transform	653
11.4	Fourier Transforms of Measures	662
11.5	$\mathcal{L}^2$-Theory of the Fourier Transform	672
11.6	Introduction to Wavelets	678
11.7	Orthonormal Wavelet Bases; The Wavelet Transform	684
12	□ MEASURABLE DYNAMICAL SYSTEMS
Biography: Claude Elwood Shannon	696
12.1	Introduction and Examples	697
12.2	Ergodic Theory	707
12.3	Isomorphism of Measurable Dynamical Systems; Entropy	715
12.4	The Kolmogorov-Sinai Theorem; Calculation of Entropy	723
Index	733

Preface
This is a book about real analysis, but it is not an ordinary real analysis
book. Written with the student in mind, this text incorporates pedagogical
techniques not often found in books at this level. The book is intended for
a one-year course in real analysis at the graduate level or the advanced
undergraduate level.
We bring over 50 years of combined teaching, research, and writing
experience to this project. The text material has been class tested several
times and has been used for independent study courses as well.
What Makes This Book Unique
This book contains many features that are unique for a real analysis text.
Here are a few.
Motivation of key concepts. All key concepts are motivated. The im-
portance of and rationale behind ideas such as measurable functions, mea-
surable sets, and Lebesgue integration are made transparent.
Detailed theoretical discussion. Detailed proofs of most results (i.e.,
lemmas, theorems, corollaries, and propositions) are provided. However,
to fully engage the reader, proofs or parts of proofs are often relegated to
the exercises.
Illustrative examples. Following most definitions and results, one or
more examples are presented that illustrate the concept or result in order
to solidify it in the reader’s mind and provide a concrete frame of reference.
This book contains approximately 200 examples, most of which consist of
several parts.
Abundant and varied exercises. The text contains over 1200 exercises,
not including parts, far more than other real analysis books. Furthermore,
the exercises vary widely with regard to application and level.
Applications. A diverse collection of applications appears throughout the
text, some as examples and others as entire sections or chapters. For in-
stance, applications to probability theory are ubiquitous. Other applica-
tions include those to Fourier analysis, wavelets, and measurable dynamical
systems.
Careful referencing. As an aid to effective use of the book, we have con-
sistently provided references (including page numbers) to definitions, exam-
ples, exercises, and results. Additionally, we have marked post-referenced
exercises with a star (★); we strongly recommend that all such exercises be
done by the reader.
Biographies. Each chapter begins with a brief biography of a famous
mathematician. Besides being of general interest, these biographies help
the reader obtain a perspective on how real analysis and its applications
have developed.
Organization
The text offers considerable flexibility in the choice of material to cover.
•	Chapters 1 and 2 present prerequisite material that may be review for
many but provides a common ground for all readers. At the option of the
instructor, these two chapters can be covered either briefly or in detail;
they can also be assigned to the students for independent reading.
•	Chapters 3 and 4 present the elements of measure and integration by
first discussing the Lebesgue theory on the line (Chapter 3) and then
the abstract theory (Chapter 4). This material is prerequisite to all
subsequent chapters.
•	Chapter 5 provides an introduction to the fundamentals of probability
theory, including the mathematical model for probability, random vari-
ables, expectation, and laws of large numbers. Although optional, this
chapter is recommended as it provides a myriad of examples and appli-
cations for other topics.
•	In Chapter 6 differentiation is discussed, both of functions and of mea-
sures. Topics examined include differentiability, bounded variation, and
absolute continuity of functions, and a thorough discussion of signed
and complex measures, the Radon-Nikodym theorem, decomposition of
measures, and measurable transformations.
•	Chapter 7 provides the fundamentals of topological and metric spaces.
This chapter can be covered relatively quickly when the students have
a background in topology from other courses. In addition to topics tra-
ditionally found in an introduction to topology, a discussion of weak
topologies and function spaces is included.
•	Completeness, compactness, and approximation comprise the topics for
Chapter 8. Examined therein are the Baire category theorem, contrac-
tions of complete metric spaces, compactness in function and product
spaces, and the Stone-Weierstrass theorem.
•	Presented in Chapter 9 are Hilbert spaces and the classical Banach
spaces. Among other things, bases and duality in Hilbert space, completeness
and duality of $\mathcal{L}^p$-spaces, and duality in spaces of continuous
functions are discussed.
•	The basic theory of normed and locally convex spaces is given in Chap-
ter 10. Topics include the Hahn-Banach theorem, linear operators on
Banach spaces, fundamental properties of locally convex spaces, and the
Krein-Milman theorem.
•	Chapter 11 provides applications of previous chapters to harmonic anal-
ysis. We examine the elements of Fourier series and transforms and
the $\mathcal{L}^2$-theory of the Fourier transform. In addition, an introduction to
wavelets and the wavelet transform is presented.
•	Chapter 12 examines measurable dynamical systems. This chapter re-
quires the one on probability (Chapter 5) and discusses ergodic theorems,
isomorphisms of measurable dynamical systems, and entropy.
The flowchart on the next page summarizes the preceding discussion
and depicts the interdependence among chapters. In the flowchart, the
prerequisites for a given chapter consist of all chapters having a path leading
to that chapter.
Acknowledgments
It is our pleasure to thank the following reviewers, whose comments and
suggestions were invaluable in finalizing the book:
Wilfrid Gangbo
Georgia Institute of Technology
Maria Girardi
University of South Carolina
Michael Klass
University of California, Berkeley
Bert Schreiber
Wayne State University
Bruce A. Barnes
University of Oregon
Dennis D. Berkey
Boston University
Courtney Coleman
Harvey Mudd College
Peter Duren
University of Michigan
Our very special thanks go to Bruce Barnes who undertook a detailed
reading of the entire manuscript and provided comments and suggestions
throughout. We also thank the many graduate students in our courses, past
and present, who furnished invaluable feedback; in particular, we would
like to express our appreciation to Mohammed Alhodaly, Hamed Alsulami,
Jimmy Mopecha, Lynn Tobin, and, especially, Jim Andrews, Trent Buskirk,
Menassie Ephrem, Ken Peterson, John Williams, and Xiangrong Yin.
We thank Arizona State University for its support and those chairs
of the ASU Mathematics Department who provided encouragement for
the project: Rosemary Renaut, Christian Ringhofer, Nevin Savage, and
William T. Trotter.
Our appreciation goes as well to Berthold Horn and Louis Vosloo of
Y&Y, Inc., for their TeX software package and consistent willingness to
provide technical support; to Amy Hendrickson of TeXnology Inc., for
perusing our TeX macros; to our copyeditors Carroll and Eugene Robinson;
and to our cover designer Richard Hannus of Hannus Design Associates.
Thanks to all of those at Academic Press for helping make this book a
reality, in particular, to Nicole Burnett, Bettina Carbonaro, Victor Curran,
Carla Daves, Linda Ratts Engelman, Julio Esperas, Amy Fulton, Pascha
Gerlinger, Charles Glaser, Anja Mutic-Blessing, Peter Renz, Bob Ross, and
Karen Wachs.
Finally, we would like to express our heartfelt thanks to Carol Weiss.
Apart from writing the text, she was involved in every aspect of development
and production. Moreover, Carol researched and wrote the biographies
and took on the task of typesetter using the TeX typesetting system.
Tempe, Arizona
J.N.M.
N.A.W.


PART ONE
□
Set Theory, Real Numbers,
and Calculus
Georg Cantor (1845-1918)
Georg Cantor was born on March 3, 1845, in
St. Petersburg, Russia. He received his doc-
torate in mathematics from the University of
Berlin in 1867, having studied under Weier-
strass, Kummer, and Kronecker. In 1869, he
accepted a teaching position at the University
of Halle and became a full professor in 1879.
Cantor wanted to obtain a professorship at
the University of Berlin, where both pay and prestige were higher, but
Kronecker, believing that much of Cantor's work (particularly his "trans-
finite numbers") was unsound, stood firmly in Cantor's path.
Others, however, acknowledged Cantor’s genius. Cantor was an hon-
orary member of the London Mathematical Society and received honorary
doctorates from both Christiania and St. Andrews. Hilbert said Cantor's
work was "... the finest product of mathematical genius and one of the
supreme achievements of purely intellectual human activity.”
Known as the founder of set theory, Cantor also made fundamental
contributions to classical analysis. Many concepts in modern mathemat-
ics bear his name, among which are Cantor series and Cantor sets; he
also developed the first usable definition of the continuum.
The controversy surrounding his work took a heavy toll on Cantor;
beginning in 1884, bouts of deep depression drove him often to a sani-
tarium. Georg Cantor died in a psychiatric clinic at the University of Halle
(where he had remained as a professor) on January 6, 1918.
1 □ Set Theory
In this chapter, we will introduce the fundamentals of set theory. Although
some readers may be familiar with much of the material, we present this
chapter as a way to provide a common ground for all readers of the text.
We will first discuss basic definitions and properties of sets. Next
we will explore relationships between functions and sets, discuss Cartesian
products, and introduce countability. Finally, we will examine algebras,
$\sigma$-algebras, and monotone classes, which are special collections of sets that play a
prominent role in analysis and measure theory.
1.1 BASIC DEFINITIONS AND PROPERTIES
A set is a collection of elements. If A is a set and x is an element (member,
point) of A, then we write $x \in A$; $x \notin A$ means that x is not an element
of A and, in general, we use a slash through a symbol to signify negation. The
symbol $\emptyset$ denotes the empty set, a set containing no elements.
Let A and B be sets. If every element of A is an element of B, then
A is said to be a subset of B, denoted $A \subset B$ or $B \supset A$. Two sets, A
and B, are equal if they contain the same elements; in other words, if
$A \subset B$ and $B \subset A$. If $A \subset B$ but $B \not\subset A$, then we say that A is a proper
subset of B.
EXAMPLE 1.1 Illustrates Sets and Subsets
In this text, the following sets play a fundamental role:
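$\mathbb{C}$ = collection of complex numbers
$\mathcal{R}$ = collection of real numbers
$\mathbb{Q}$ = collection of rational numbers
$\mathbb{Z}$ = collection of integers
and
$\mathcal{N}$ = collection of positive integers
Note that $\mathcal{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathcal{R} \subset \mathbb{C}$ or, equivalently, $\mathbb{C} \supset \mathcal{R} \supset \mathbb{Q} \supset \mathbb{Z} \supset \mathcal{N}$.
But $\mathbb{C} \not\subset \mathcal{R} \not\subset \mathbb{Q} \not\subset \mathbb{Z} \not\subset \mathcal{N}$ or, equivalently, $\mathcal{N} \not\supset \mathbb{Z} \not\supset \mathbb{Q} \not\supset \mathcal{R} \not\supset \mathbb{C}$. □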
We will use the notation $\{a\}$ to denote the set consisting of the element a;
$\{a, b\}$ to denote the set consisting of the elements a and b; $\{a, b, c\}$
to denote the set consisting of the elements a, b, and c; and so on.
Let $\Omega$ be a set. Subsets of $\Omega$ are frequently defined in terms of properties
that its elements must satisfy. If $P(x)$ is some proposition concerning x,
then $\{x \in \Omega : P(x)\}$ is the collection of elements $x \in \Omega$ such that $P(x)$ is
true. For example, $\{x \in \mathcal{N} : x^2 < 5\} = \{1, 2\}$. When no confusion is
possible, we will sometimes abbreviate $\{x \in \Omega : P(x)\}$ to $\{x : P(x)\}$.
Of particular importance in real analysis are intervals of real numbers.
The notation and terminology associated with these subsets of $\mathcal{R}$ are
presented in the following definition.
DEFINITION 1.1 Intervals of Real Numbers
Let a and b be real numbers such that $a < b$. The bounded intervals
with endpoints a and b are as follows:
$(a, b) = \{x \in \mathcal{R} : a < x < b\}$
$[a, b) = \{x \in \mathcal{R} : a \le x < b\}$
$(a, b] = \{x \in \mathcal{R} : a < x \le b\}$
$[a, b] = \{x \in \mathcal{R} : a \le x \le b\}$
The unbounded intervals are as follows:
$(a, \infty) = \{x \in \mathcal{R} : x > a\}$
$[a, \infty) = \{x \in \mathcal{R} : x \ge a\}$
$(-\infty, b) = \{x \in \mathcal{R} : x < b\}$
$(-\infty, b] = \{x \in \mathcal{R} : x \le b\}$
$(-\infty, \infty) = \mathcal{R}$
Complement, Intersection, and Union
We will now discuss three fundamental operations on sets: complement,
intersection, and union. In what follows, we will assume that all sets under
consideration are subsets of some fixed set $\Omega$, often referred to as the
universal set. The set of all subsets of $\Omega$ is called the power set of $\Omega$ and
is denoted by $\mathcal{P}(\Omega)$. Thus, $A \subset \Omega$ if and only if $A \in \mathcal{P}(\Omega)$.
Let A and B be subsets of $\Omega$. The complement of A, denoted $A^c$, is
the set of elements of $\Omega$ that do not belong to A. Thus,
$A^c = \{x : x \notin A\}$.
The intersection of A and B, denoted $A \cap B$, is the set of elements of $\Omega$
that belong to both A and B. Thus,
$A \cap B = \{x : x \in A \text{ and } x \in B\}$.
The union of A and B, denoted $A \cup B$, is the set of elements of $\Omega$ that
belong to either A or B (or both); in other words, those elements that
belong to at least one of A and B. Thus,
$A \cup B = \{x : x \in A \text{ or } x \in B\}$.
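The three operations can be tried out on small finite sets. The following Python sketch is only an illustration: the universal set `omega` and the subsets `a` and `b` are arbitrary choices that do not come from the text, and Python's built-in `set` type stands in for subsets of $\Omega$.

```python
# Illustrative finite universal set and two of its subsets (arbitrary choices).
omega = set(range(1, 11))      # {1, 2, ..., 10}
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}


def complement(s, universe):
    """Return the elements of the universe that do not belong to s."""
    return universe - s


a_complement = complement(a, omega)   # A^c
a_and_b = a & b                       # A intersect B
a_or_b = a | b                        # A union B

print(sorted(a_complement))
print(sorted(a_and_b))
print(sorted(a_or_b))
```

Note that the complement needs the universal set as an explicit argument, which mirrors the convention that all sets under consideration are subsets of a fixed $\Omega$.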
Two important relationships among the three set operations of union,
intersection, and complement are given in the following proposition, known
as De Morgan’s laws.
PROPOSITION 1.1 De Morgan’s Laws
Let A and B be subsets of $\Omega$. Then,
a)	$(A \cup B)^c = A^c \cap B^c$.
b)	$(A \cap B)^c = A^c \cup B^c$.
PROOF: We prove part (a) and leave the proof of part (b) as an exercise
for the reader. Suppose $x \in (A \cup B)^c$. Then $x \notin A \cup B$ so that $x \notin A$ and
$x \notin B$. But then $x \in A^c$ and $x \in B^c$, which implies that $x \in A^c \cap B^c$.
Thus, $(A \cup B)^c \subset A^c \cap B^c$.
Conversely, suppose $x \in A^c \cap B^c$. Then $x \in A^c$ and $x \in B^c$ so that
$x \notin A$ and $x \notin B$. But then $x \notin A \cup B$, which implies that $x \in (A \cup B)^c$.
Thus, $A^c \cap B^c \subset (A \cup B)^c$.
We have now shown that $(A \cup B)^c \subset A^c \cap B^c$ and $A^c \cap B^c \subset (A \cup B)^c$.
This means that $(A \cup B)^c = A^c \cap B^c$.
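As a sanity check on De Morgan's laws, both identities can be tested by brute force over every pair of subsets of a small universal set. The sketch below assumes a four-element universe chosen purely for illustration; the helper names `power_set` and `de_morgan_holds` are hypothetical. A finite check of this kind illustrates the statement but is no substitute for the proof.

```python
from itertools import combinations


def power_set(universe):
    """Return all subsets of a finite set as a list of frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def de_morgan_holds(a, b, universe):
    """Check both De Morgan identities for one pair of subsets."""
    union_law = universe - (a | b) == (universe - a) & (universe - b)
    intersection_law = universe - (a & b) == (universe - a) | (universe - b)
    return union_law and intersection_law


omega = frozenset({1, 2, 3, 4})       # illustrative universal set
subsets = power_set(omega)            # all 2^4 subsets
all_pairs_ok = all(de_morgan_holds(a, b, omega)
                   for a in subsets for b in subsets)
print(len(subsets), all_pairs_ok)
```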
The following proposition shows that intersection and union obey the
distributive laws. The proof is left to the reader as an exercise.
PROPOSITION 1.2 Distributive Laws
Let A, B, and C be subsets of $\Omega$. Then,
a)	$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.
b)	$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
Relative Complement and Symmetric Difference
Several set operations can be derived from the three basic operations of
complement, intersection, and union. Two of the most important such
operations are relative complement and symmetric difference. The
definitions of these two set operations follow.
DEFINITION 1.2 Relative Complement
Let A and B be subsets of $\Omega$. Then the complement of A relative
to B, denoted $B \setminus A$, is the set of all elements belonging to B that
do not belong to A. Thus,
$B \setminus A = \{x : x \in B \text{ and } x \notin A\}$.
In particular, we have that $A^c = \Omega \setminus A$; that is, the (absolute) complement
of A is the complement of A relative to $\Omega$.
Note: Clearly, we have $B \setminus A = B \cap A^c$.
DEFINITION 1.3 Symmetric Difference
Let A and B be subsets of $\Omega$. Then the symmetric difference of A
and B, denoted $A \triangle B$, is the set of all elements belonging to either A
or B but not both A and B. Thus,
$A \triangle B = \{x : x \in A \text{ or } x \in B, \text{ and } x \notin A \cap B\}$.
Note: It is easy to see that $A \triangle B = (A \setminus B) \cup (B \setminus A)$. We leave the
verification to the reader as an exercise.
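Relative complement and symmetric difference correspond to the `-` and `^` operators on Python sets. The sketch below uses two arbitrary example sets (an assumption for illustration only) and compares the definition of the symmetric difference with the formula in the preceding note. Agreement on one example is not a proof of the identity.

```python
# Arbitrary example sets, chosen only for illustration.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

relative_complement = b - a            # B \ A: in B but not in A
sym_diff = a ^ b                       # built-in symmetric difference

# Definition 1.3: in A or in B, but not in both.
by_definition = (a | b) - (a & b)
# Formula from the note: (A \ B) union (B \ A).
by_note = (a - b) | (b - a)

print(sorted(relative_complement))
print(sorted(sym_diff))
print(sym_diff == by_definition == by_note)
```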
More on Set Operations
Exercises 1.1 and 1.2 discuss several properties of union and intersection.
Among those properties are the following two:
$A \cap (B \cap C) = (A \cap B) \cap C$ and $A \cup (B \cup C) = (A \cup B) \cup C$.
The two sets in the first equality consist of all elements that belong to A, B,
and C, which we write as $A \cap B \cap C$. Thus,
$A \cap B \cap C = \{x : x \in A \text{ and } x \in B \text{ and } x \in C\}$.
The two sets in the second equality consist of all elements that belong to
at least one of A, B, and C, which we write as $A \cup B \cup C$. Thus,
$A \cup B \cup C = \{x : x \in A \text{ or } x \in B \text{ or } x \in C\}$.
We can generalize the notions of intersection and union to arbitrary
collections of sets.
DEFINITION 1.4 Intersection and Union
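Let $\mathcal{C}$ be a collection of subsets of $\Omega$, that is, $\mathcal{C} \subset \mathcal{P}(\Omega)$.
a)	The intersection of $\mathcal{C}$, denoted $\bigcap_{A \in \mathcal{C}} A$, is the set of elements of $\Omega$
that belong to each set in the collection $\mathcal{C}$. Thus,
$\bigcap_{A \in \mathcal{C}} A = \{x : x \in A \text{ for all } A \in \mathcal{C}\}$.
b)	The union of $\mathcal{C}$, denoted $\bigcup_{A \in \mathcal{C}} A$, is the set of elements of $\Omega$ that
belong to at least one of the sets in the collection $\mathcal{C}$. Thus,
$\bigcup_{A \in \mathcal{C}} A = \{x : x \in A \text{ for some } A \in \mathcal{C}\}$.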
EXAMPLE 1.2 Illustrates Definition 1.4
Let $\Omega = \mathcal{R}$ and $\mathcal{C} = \{[0, 1/n] : n \in \mathcal{N}\}$. Then
$\bigcap_{A \in \mathcal{C}} A = \{0\}$ and $\bigcup_{A \in \mathcal{C}} A = [0, 1]$,
as the reader can easily verify. □
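For a finite collection of finite sets, Definition 1.4 can be computed directly. The sketch below is an illustration under that finiteness assumption (Example 1.2 itself involves infinitely many intervals and cannot be enumerated this way); the names `intersection_of` and `union_of` and the sample collection are hypothetical.

```python
from functools import reduce


def intersection_of(collection, universe):
    """Elements of the universe that belong to every set in the collection."""
    return reduce(lambda acc, s: acc & s, collection, set(universe))


def union_of(collection):
    """Elements that belong to at least one set in the collection."""
    return reduce(lambda acc, s: acc | s, collection, set())


omega = set(range(6))                          # illustrative universal set
collection = [{0, 1, 2, 3}, {0, 2, 4}, {0, 2, 5}]

common = intersection_of(collection, omega)
combined = union_of(collection)
print(sorted(common), sorted(combined))
```

Starting the intersection from the universal set means that an empty collection yields all of `omega`, which matches the definition: every element of $\Omega$ vacuously belongs to each set of an empty collection.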
De Morgan’s laws and the distributive laws hold for any collection of
subsets. These are stated in the following two propositions whose proofs
are left to the reader as exercises.
PROPOSITION 1.3 De Morgan’s Laws
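Let $\mathcal{C}$ be a collection of subsets of $\Omega$. Then,
a)	$\left(\bigcap_{A \in \mathcal{C}} A\right)^c = \bigcup_{A \in \mathcal{C}} A^c$.
b)	$\left(\bigcup_{A \in \mathcal{C}} A\right)^c = \bigcap_{A \in \mathcal{C}} A^c$.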
PROPOSITION 1.4 Distributive Laws
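Let $\mathcal{C}$ be a collection of subsets of $\Omega$ and B a subset of $\Omega$. Then,
a)	$B \cap \left(\bigcup_{A \in \mathcal{C}} A\right) = \bigcup_{A \in \mathcal{C}} (B \cap A)$.
b)	$B \cup \left(\bigcap_{A \in \mathcal{C}} A\right) = \bigcap_{A \in \mathcal{C}} (B \cup A)$.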
Indexed Collections of Sets
Suppose that I is a set and that to each $\iota \in I$ there corresponds a unique
subset $A_\iota$ of $\Omega$. Then we have an indexed collection of subsets of $\Omega$,
indexed by I. We denote such a collection by $\{A_\iota\}_{\iota \in I}$.
In case $I = \{1, 2, \ldots, N\}$, the indexed collection is denoted by $\{A_n\}_{n=1}^N$
and is called a finite sequence of sets. Similarly, if $I = \mathcal{N}$, the indexed
collection is denoted by $\{A_n\}_{n=1}^\infty$ and is called an infinite sequence of sets.
In both of these cases, we say that the indexed collection is a sequence
of sets, and we often write $\{A_n\}_n$ to represent either a finite or infinite
sequence of sets.
For an indexed collection of sets, $\{A_\iota\}_{\iota \in I}$, we denote the intersection
and union of the collection by $\bigcap_{\iota \in I} A_\iota$ and $\bigcup_{\iota \in I} A_\iota$, respectively. Thus,
$\bigcap_{\iota \in I} A_\iota = \{x : x \in A_\iota \text{ for all } \iota \in I\}$
and
$\bigcup_{\iota \in I} A_\iota = \{x : x \in A_\iota \text{ for some } \iota \in I\}$.
In case $I = \{1, 2, \ldots, N\}$, we use the notations $\bigcap_{n=1}^N A_n$ and $\bigcup_{n=1}^N A_n$,
respectively, for the intersection and union of the indexed collection. Similarly,
if $I = \mathcal{N}$, we use the notations $\bigcap_{n=1}^\infty A_n$ and $\bigcup_{n=1}^\infty A_n$, respectively,
for the intersection and union of the indexed collection. For example, if we
let $A_n = (0, 1/n]$ for each $n \in \mathcal{N}$, then
$\bigcap_{n=1}^\infty A_n = \emptyset$ and $\bigcup_{n=1}^\infty A_n = (0, 1]$.
Disjoint Collections of Sets
An essential concept in analysis is that of disjoint sets. Two sets are
disjoint if they have no elements in common. More generally, we have the
following definition.
DEFINITION 1.5 Disjoint and Pairwise Disjoint
Two subsets, A and B, of $\Omega$ are said to be disjoint if $A \cap B = \emptyset$.
A collection $\mathcal{C}$ of subsets of $\Omega$ is said to be pairwise disjoint if each
two distinct members of $\mathcal{C}$ are disjoint. If $\mathcal{C}$ is a pairwise disjoint
collection, we often say the members of $\mathcal{C}$ are pairwise disjoint sets.
An indexed collection, $\{A_\iota\}_{\iota \in I}$, of subsets of $\Omega$ is said to be pairwise
disjoint if $A_i \cap A_j = \emptyset$ whenever $i \ne j$. In case $I = \{1, 2, \ldots, N\}$ or
$I = \mathcal{N}$ and the indexed collection is pairwise disjoint, we say that we
have a pairwise disjoint sequence of subsets of $\Omega$.
EXAMPLE 1.3 Illustrates Definition 1.5
Let $\Omega = \mathcal{R}$.
a)	The sets $\mathbb{Z}$ and $(0, 1)$ are disjoint, since $\mathbb{Z} \cap (0, 1) = \emptyset$.
b)	The sets $\mathbb{Z}$ and $[0, 1]$ are not disjoint, since $\mathbb{Z} \cap [0, 1] = \{0, 1\} \ne \emptyset$.
c)	The indexed collection, $\{[n - 1, n)\}_{n=1}^\infty$, of subsets of $\mathcal{R}$ is pairwise
disjoint because
$[m - 1, m) \cap [n - 1, n) = \emptyset$, $m \ne n$.
d)	The indexed collection, $\{[n - 1, n]\}_{n=1}^\infty$, of subsets of $\mathcal{R}$ is not pairwise
disjoint, because, for instance, $[0, 1] \cap [1, 2] = \{1\} \ne \emptyset$. Note, however,
that the intersection of all the members of the collection is empty, that
is, $\bigcap_{n=1}^\infty [n - 1, n] = \emptyset$. This shows that for a collection of sets to be
pairwise disjoint it is not sufficient for the intersection of that collection
to be empty. Is it necessary? □
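The distinction drawn in part (d) can be reproduced with finite sets. In the sketch below, the integer sets are stand-ins chosen for illustration (they are not the intervals of the example), and the hypothetical helper `is_pairwise_disjoint` compares members by their position in the list, as one would for an indexed collection.

```python
from itertools import combinations


def is_pairwise_disjoint(sets):
    """True if every two members at distinct positions have no common element."""
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))


# Finite stand-in for part (c): blocks that share no points.
separated = [{0}, {1}, {2}]
# Finite stand-in for part (d): consecutive blocks that share an endpoint.
overlapping = [{0, 1}, {1, 2}, {2, 3}]

# Intersection of all members of the overlapping collection.
total_intersection = set.intersection(*overlapping)

print(is_pairwise_disjoint(separated))
print(is_pairwise_disjoint(overlapping))
print(total_intersection)
```

The last two results together show a collection whose overall intersection is empty even though it is not pairwise disjoint.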
EXERCISES 1.1
1.1	Let A, B, and C be subsets of $\Omega$. Prove each of the following.
a)	$A \cup B = B \cup A$
b)	$A \cup \emptyset = A$
c)	$A \cup (B \cup C) = (A \cup B) \cup C$
d)	$A \subset A \cup B$
e)	$A = A \cup B$ if and only if $B \subset A$
1.2	Let A, B, and C be subsets of $\Omega$. Prove each of the following.
a)	$A \cap B = B \cap A$
b)	$A \cap \emptyset = \emptyset$
c)	$A \cap (B \cap C) = (A \cap B) \cap C$
d)	$A \supset A \cap B$
e)	$A = A \cap B$ if and only if $B \supset A$
1.3	Let A and B be subsets of $\Omega$. Verify each of the following statements.
a)	$A = (A \cap B) \cup (A \cap B^c)$
b)	$A \cap B = \emptyset \Rightarrow A \subset B^c$
c)	$A \subset B \Rightarrow B^c \subset A^c$
1.4	Let A and B be subsets of $\Omega$. Prove that
a)	$A \setminus B = A \cap B^c$.
b)	$A \triangle B = (A \setminus B) \cup (B \setminus A)$.
1.5	Let A, B, and C be subsets of $\Omega$. Establish each of the following facts.
a)	$A \triangle (B \triangle C) = (A \triangle B) \triangle C$
b)	$A \triangle \Omega = A^c$
c)	$A \triangle \emptyset = A$
d)	$A \triangle A = \emptyset$
1.6	Let A, B, and C be subsets of $\Omega$.
a)	Prove that $A \cap (B \triangle C) = (A \cap B) \triangle (A \cap C)$.
b)	What is the relationship between $A \cup (B \triangle C)$ and $(A \cup B) \triangle (A \cup C)$?
c)	Precisely when does $A \cup (B \triangle C) = (A \cup B) \triangle (A \cup C)$?
1.7	Let A and B be subsets of $\Omega$. Show that $A = A \triangle B$ if and only if $B = \emptyset$.
1.8	Let $\{A_n\}_{n=1}^\infty$ be an infinite sequence of subsets of $\Omega$.
a)	Prove that
$\bigcup_{n=1}^\infty \bigcap_{k=n}^\infty A_k \subset \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty A_k$.
The set on the left is called the limit inferior of $\{A_n\}_{n=1}^\infty$ and is denoted
by $\liminf_{n \to \infty} A_n$; the set on the right is called the limit superior
of $\{A_n\}_{n=1}^\infty$ and is denoted by $\limsup_{n \to \infty} A_n$.
b)	Describe in words the limit inferior and limit superior of $\{A_n\}_{n=1}^\infty$, and
use that description to interpret the relation in part (a).
c)	Let $\Omega = \mathcal{R}$ and define
$A_n = [0, 1 + 1/n]$, if n is an even positive integer;
$A_n = [-1 - 1/n, 0]$, if n is an odd positive integer.
Determine $\liminf_{n \to \infty} A_n$ and $\limsup_{n \to \infty} A_n$.
1.9	Prove the general form of De Morgan’s laws, Proposition 1.3 on page 8.
1.10	Prove the general form of the distributive laws for sets, Proposition 1.4 on
page 8.
1.11	Let $\mathcal{C}$ be a collection of subsets of $\Omega$ and B a subset of $\Omega$. Prove each of the
following facts.
a)	If $B \subset \bigcup_{A \in \mathcal{C}} A$, then $B = \bigcup_{A \in \mathcal{C}} (A \cap B)$.
b)	If $\bigcup_{A \in \mathcal{C}} A = \Omega$, then $E = \bigcup_{A \in \mathcal{C}} (A \cap E)$ for each subset E of $\Omega$.
c)	If $\mathcal{C}$ is pairwise disjoint, then so is the collection $\{A \cap E : A \in \mathcal{C}\}$ for
each subset E of $\Omega$.
d)	We say that $\mathcal{C}$ is a partition of $\Omega$ if it is pairwise disjoint and its union
is $\Omega$. Conclude from parts (b) and (c) that if $\mathcal{C}$ is a partition of $\Omega$, then
each subset E of $\Omega$ can be expressed as a disjoint union of the collection
of sets $\{A \cap E : A \in \mathcal{C}\}$.
1.12	There is a slight distinction between the notions of pairwise disjoint for
nonindexed collections of sets and indexed collections of sets, namely, an
indexed collection of sets, $\{A_\iota\}_{\iota \in I}$, can fail to be pairwise disjoint even
though the collection, $\mathcal{C} = \{A_\iota : \iota \in I\}$, is pairwise disjoint. Provide an
example that illustrates this fact.
1.13	Give an example of a collection C of sets that is not pairwise disjoint, has at
least four members, and is such that any three distinct members of C have
an empty intersection.
1.2 FUNCTIONS AND SETS
Suppose that $\Omega$ and $\Lambda$ are sets. A function (mapping, transformation)
from $\Omega$ to $\Lambda$ is a rule that assigns to each element $x \in \Omega$ a unique element
$f(x) \in \Lambda$.† We call $f(x)$ the value of f at x or the image of x under f.
To indicate that f is a function from $\Omega$ to $\Lambda$, we often write $f\colon \Omega \to \Lambda$.
The set $\Omega$ is called the domain of f. The set $\{f(x) : x \in \Omega\}$ is called
the range of f. We note that, in general, the range of f will be a proper
subset of $\Lambda$. Two further concepts important in the study of functions are
given in Definition 1.6.
DEFINITION 1.6 One-to-One and Onto
Let f be a function from $\Omega$ to $\Lambda$.
a)	f is said to be one-to-one (or injective) if distinct elements of $\Omega$
have distinct images; that is, if $f(x_1) = f(x_2)$ implies that $x_1 = x_2$.
b)	f is said to be onto (or surjective) if each element of $\Lambda$ is the image
of some element of $\Omega$; that is, for each $y \in \Lambda$, there is an $x \in \Omega$
such that $y = f(x)$. Thus, f is onto if and only if the range of f
equals $\Lambda$.
If a function is both one-to-one and onto, then we can invert the func-
tion by using the rule that assigns to each element in the range the unique
element in the domain of which it is the image. More precisely, we have
the following definition.
† We will take an intuitive approach to functions; that is, we will not use the definition
based on ordered pairs.
DEFINITION 1.7 Inverse of a Function
Suppose that $f\colon \Omega \to \Lambda$ is one-to-one and onto. For $y \in \Lambda$, let $f^{-1}(y)$
be the unique $x \in \Omega$ such that $y = f(x)$. The function $f^{-1}\colon \Lambda \to \Omega$ so
defined is called the inverse of the function f.
EXAMPLE 1.4 Illustrates Definition 1.7
Define $f\colon [0, 1] \to [2, 5]$ by $f(x) = 3x^2 + 2$. Then f is one-to-one and onto.
As the reader can verify, the inverse of the function f, $f^{-1}\colon [2, 5] \to [0, 1]$,
is given by $f^{-1}(y) = \sqrt{(y - 2)/3}$. □
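The inverse in Example 1.4 can be checked numerically. The sketch below assumes floating-point arithmetic and a handful of sample points in $[0, 1]$ chosen for illustration; it confirms only that the two formulas undo each other at those points, up to rounding error.

```python
import math


def f(x):
    """The function of Example 1.4 on [0, 1]."""
    return 3 * x**2 + 2


def f_inverse(y):
    """Candidate inverse on [2, 5]."""
    return math.sqrt((y - 2) / 3)


# Sample points 0.0, 0.1, ..., 1.0 in the domain (illustrative choice).
samples = [k / 10 for k in range(11)]
round_trip_errors = [abs(f_inverse(f(x)) - x) for x in samples]
max_error = max(round_trip_errors)

print(f(0), f(1))          # endpoints of the range [2, 5]
print(max_error < 1e-12)
```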
Let f be a function from $\Omega$ to $\Lambda$ and g be a function from $\Lambda$ to $\Gamma$.
Then we can define a function from $\Omega$ to $\Gamma$ by first applying f and then
applying g to that result. Here is a formal definition.
DEFINITION 1.8 Composition of Functions
Let $f\colon \Omega \to \Lambda$ and $g\colon \Lambda \to \Gamma$. Then the composition of g with f,
denoted $g \circ f$, is the function $g \circ f\colon \Omega \to \Gamma$ defined by
$(g \circ f)(x) = g(f(x))$, $x \in \Omega$.
EXAMPLE 1.5 Illustrates Definition 1.8
Define $f\colon \mathcal{R} \to [0, \infty)$ by $f(x) = x^2$ and $g\colon [0, \infty) \to \mathcal{R}$ by $g(y) = \sqrt{y}$. Then
the composition of g with f, $g \circ f\colon \mathcal{R} \to \mathcal{R}$, is given by
$(g \circ f)(x) = g(f(x)) = g(x^2) = \sqrt{x^2} = |x|$.
In this case, we can also consider the composition going the other way, that
is, the composition of f with g, $f \circ g\colon [0, \infty) \to [0, \infty)$, which is given by
$(f \circ g)(y) = f(g(y)) = f(\sqrt{y}) = (\sqrt{y})^2 = y$. □
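Composition translates directly into a higher-order function. The sketch below is an illustration with a hypothetical helper `compose`; it evaluates both compositions from Example 1.5 at a few points. Floating-point arithmetic is assumed, so equalities hold exactly only for inputs such as perfect squares.

```python
import math


def compose(g, f):
    """Return the composition of g with f, that is, x -> g(f(x))."""
    return lambda x: g(f(x))


def f(x):
    """f(x) = x^2, defined for every real x."""
    return x * x


def g(y):
    """g(y) = sqrt(y), defined only for y >= 0."""
    return math.sqrt(y)


g_after_f = compose(g, f)   # x -> |x| on the reals
f_after_g = compose(f, g)   # y -> y on [0, infinity)

print(g_after_f(-3.0), abs(-3.0))
print(f_after_g(4.0))
```

The order of the arguments matters: `f_after_g` raises a `ValueError` for negative inputs because `g` is not defined there, while `g_after_f` accepts every real number.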
Sometimes we have a function defined on a set that we want to restrict
to a subset of that set. To be specific, suppose $f\colon \Omega \to \Lambda$ and that $A \subset \Omega$.
From f we can obtain a function from A to $\Lambda$, called the restriction of f
to A, denoted $f|_A$, and defined by $f|_A(x) = f(x)$ for $x \in A$.
Sequences and Subsequences
Sequences are an important class of functions. An infinite sequence is
a function whose domain is the set of positive integers, $\mathcal{N}$. If s is an
infinite sequence, then $s(n)$ is called the nth term of the sequence and is
usually denoted $s_n$. For ease in notation, we use $\{s_n\}_{n=1}^\infty$ to denote both
the infinite sequence whose nth term is $s_n$ and the range of the sequence,
that is, $\{s_n : n \in \mathcal{N}\}$; context will determine which meaning is intended.
A finite sequence of length N is a function whose domain is the
first N positive integers, $\{1, 2, \ldots, N\}$. As for infinite sequences, if s is a
finite sequence, then $s(n)$ is called the nth term of the sequence and is
usually denoted $s_n$. For ease in notation, we use $\{s_n\}_{n=1}^N$ to denote both
the finite sequence of length N whose nth term is $s_n$ and the range of the
sequence, that is, $\{s_n : n = 1, 2, \ldots, N\}$; context will determine which
meaning is intended.
We use the term sequence to refer to both infinite and finite sequences.
The notation $\{s_n\}_n$ represents a sequence that may be finite or
infinite and whose nth term is $s_n$. If the range of a sequence $\{s_n\}_n$ is a subset
of a set $\Omega$, then we say that $\{s_n\}_n$ is a sequence of $\Omega$ or a sequence
of elements of $\Omega$.
EXAMPLE 1.6 Illustrates Sequences
a) The sequence $\{3^{-n}\}_{n=1}^{\infty}$ is an infinite sequence of $\mathcal{R}$.
b) A sequence $\{A_n\}_n$ of subsets of a set $\Omega$, as defined on page 9, is a
sequence of $\mathcal{P}(\Omega)$, the set of all subsets of $\Omega$. □
Let $\{s_n\}_{n=1}^{\infty}$ be an infinite sequence and $\{n_k\}_{k=1}^{\infty}$ an infinite sequence
of positive integers such that $n_1 < n_2 < \cdots$. Then the sequence $\{s_{n_k}\}_{k=1}^{\infty}$
is said to be an (infinite) subsequence of $\{s_n\}_{n=1}^{\infty}$. We note that a subsequence
of a sequence is the composition of two functions.
To illustrate subsequences, let $\{s_n\}_{n=1}^{\infty}$ be the sequence in part (a) of
Example 1.6 and let $n_k = 2k$. Then $s_{n_k} = 3^{-n_k} = 3^{-2k} = 9^{-k}$. In other
words, the subsequence $\{s_{n_k}\}_{k=1}^{\infty}$ is the sequence $\{9^{-n}\}_{n=1}^{\infty}$.
Subsequences of finite sequences are defined similarly to those for in-
finite sequences. We leave the details to the reader.
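The remark that a subsequence is a composition of two functions can be made concrete. The sketch below, which is an illustration and not part of the original text, models the sequence $s_n = 3^{-n}$ and the index sequence $n_k = 2k$ as Python functions (the names `s`, `index`, and `subsequence` are assumptions chosen here) and uses exact rational arithmetic.

```python
from fractions import Fraction

def s(n):
    # n-th term of the sequence in Example 1.6(a): s_n = 3^(-n)
    return Fraction(1, 3 ** n)

def index(k):
    # strictly increasing sequence of positive integers: n_k = 2k
    return 2 * k

def subsequence(k):
    # k-th term of the subsequence, the composition (s o index)(k) = s_{n_k}
    return s(index(k))

first_terms = [subsequence(k) for k in range(1, 6)]

# Spot check (finitely many k only) that s_{n_k} = 9^(-k).
matches_nine = all(subsequence(k) == Fraction(1, 9 ** k) for k in range(1, 50))
```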
Images and Inverse Images
Let $f\colon \Omega \to \Lambda$. If $A \subset \Omega$, then we define
$$f(A) = \{\, f(x) : x \in A \,\},$$
called the image of $A$ under $f$.
If $B \subset \Lambda$, then we define
$$f^{-1}(B) = \{\, x \in \Omega : f(x) \in B \,\},$$
called the inverse image of $B$ under $f$.
The next two propositions relate set operations and functions. We
state the results in terms of indexed collections because that is generally
what we deal with.† The proofs of the propositions are left to the reader
as exercises.
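PROPOSITION 1.5
Let $f\colon \Omega \to \Lambda$, $A \subset \Omega$, and $\{A_\iota\}_{\iota \in I}$ an indexed collection of subsets of $\Omega$.
Then
a) $f\bigl(\bigcup_{\iota \in I} A_\iota\bigr) = \bigcup_{\iota \in I} f(A_\iota)$
and
b) $f\bigl(\bigcap_{\iota \in I} A_\iota\bigr) \subset \bigcap_{\iota \in I} f(A_\iota)$.
If $f$ is one-to-one, then
c) $f\bigl(\bigcap_{\iota \in I} A_\iota\bigr) = \bigcap_{\iota \in I} f(A_\iota)$
and
d) $f(A^c) \subset (f(A))^c$.
And, if $f$ is onto, then
e) $f(A^c) \supset (f(A))^c$.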
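PROPOSITION 1.6
Let $f\colon \Omega \to \Lambda$, $B \subset \Lambda$, and $\{B_\iota\}_{\iota \in I}$ an indexed collection of subsets of $\Lambda$.
Then
a) $f^{-1}\bigl(\bigcup_{\iota \in I} B_\iota\bigr) = \bigcup_{\iota \in I} f^{-1}(B_\iota)$,
b) $f^{-1}\bigl(\bigcap_{\iota \in I} B_\iota\bigr) = \bigcap_{\iota \in I} f^{-1}(B_\iota)$,
and
c) $f^{-1}(B^c) = (f^{-1}(B))^c$.
† Actually, we are not losing generality, as any collection of subsets is an indexed
collection that is indexed by the collection itself.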
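For finite sets, images and inverse images can be computed directly, which gives a quick way to see why the intersection statement for images is only an inclusion in general. The following is a small sketch under stated assumptions: the helper names `image` and `inverse_image`, the set `omega`, and the function $x \mapsto x^2$ are all chosen here for illustration.

```python
def image(f, A):
    """f(A) = { f(x) : x in A }."""
    return {f(x) for x in A}

def inverse_image(f, B, domain):
    """f^{-1}(B) = { x in domain : f(x) in B }."""
    return {x for x in domain if f(x) in B}

omega = {-2, -1, 0, 1, 2}
codomain = {0, 1, 2, 3, 4}

def f(x):
    # not one-to-one on omega, and not onto codomain
    return x * x

A1, A2 = {-2, -1}, {1, 2}

# Images always commute with unions.
union_ok = image(f, A1 | A2) == image(f, A1) | image(f, A2)

# For intersections only an inclusion holds; here it is proper
# because f is not one-to-one.
lhs = image(f, A1 & A2)             # image of the empty set
rhs = image(f, A1) & image(f, A2)   # contains 1 and 4
strict_inclusion = lhs < rhs

# Inverse images commute with unions, intersections, and complements.
B1, B2 = {0, 1}, {1, 4}
inv_union_ok = inverse_image(f, B1 | B2, omega) == (
    inverse_image(f, B1, omega) | inverse_image(f, B2, omega))
inv_inter_ok = inverse_image(f, B1 & B2, omega) == (
    inverse_image(f, B1, omega) & inverse_image(f, B2, omega))
inv_compl_ok = inverse_image(f, codomain - B1, omega) == (
    omega - inverse_image(f, B1, omega))
```

The asymmetry is the point of the example: inverse images respect all three set operations for every function, while images need extra hypotheses (one-to-one, onto) for intersections and complements.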
The Axiom of Choice and Zorn's Lemma
Many of the results that we will discuss in this text require more than the
axioms of elementary set theory. Rather, they depend in addition on an
axiom called the axiom of choice, which is independent of (i.e., cannot
be derived from) the axioms of elementary set theory.
Roughly speaking, the axiom of choice asserts that given a collection
of nonempty sets, it is possible to select an element from each set in the
collection. More precisely, we have the following statement.
Axiom of Choice
Suppose that $\mathcal{C}$ is a collection of nonempty sets. Then there exists a function
$f\colon \mathcal{C} \to \bigcup_{A \in \mathcal{C}} A$ such that $f(A) \in A$ for each $A \in \mathcal{C}$.
Although most mathematicians use the axiom of choice without hes-
itation, some employ it only when they cannot obtain a proof without it
and others consider it unacceptable. In this text, we will apply the axiom
of choice freely, both tacitly and explicitly.
There are several important equivalences to the axiom of choice. We
will discuss only one, namely, Zorn's lemma. In preparation for stating
Zorn’s lemma, we make the following definition.
DEFINITION 1.9 Partial Ordering; Partially Ordered Set
Let $\Omega$ be a set. A relation $\preceq$ on $\Omega$ is said to be a partial ordering if
for all $x, y, z \in \Omega$,
a) $x \preceq x$ [reflexive].
b) $x \preceq y$ and $y \preceq x$ implies $x = y$ [antisymmetric].
c) $x \preceq y$ and $y \preceq z$ implies $x \preceq z$ [transitive].
The pair $(\Omega, \preceq)$ is called a partially ordered set.
EXAMPLE 1.7 Illustrates Definition 1.9
a) We have that $\le$ is a partial ordering on $\mathcal{R}$ and, hence, $(\mathcal{R}, \le)$ is a
partially ordered set.
b) Let $\Omega$ be a set. Then $\subset$ is a partial ordering on $\mathcal{P}(\Omega)$ and, hence,
$(\mathcal{P}(\Omega), \subset)$ is a partially ordered set. □
Let $(\Omega, \preceq)$ be a partially ordered set. A subset $C$ of $\Omega$ is called a chain
if for each $x, y \in C$, either $x \preceq y$ or $y \preceq x$. An element $u \in \Omega$ is called
an upper bound for a subset $A$ of $\Omega$ if $x \preceq u$ for all $x \in A$. An element
$m \in \Omega$ is called a maximal element of $\Omega$ if $x \in \Omega$ and $m \preceq x$ implies
that $x = m$.
With the preceding definitions in mind, we can now state Zorn's lemma
which, as we mentioned earlier, is equivalent to the axiom of choice.†
Zorn’s Lemma
Let $(\Omega, \preceq)$ be a nonempty partially ordered set with the property that each
chain has an upper bound. Then $\Omega$ has a maximal element.
Applications of both the axiom of choice and Zorn’s lemma will appear
throughout the text.
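For a finite partially ordered set, the notions of chain and maximal element can be checked by brute force. The sketch below is an illustration only and makes its own choices: the poset is the collection of proper subsets of $\{1, 2, 3\}$ ordered by inclusion, and the names `poset`, `leq`, `is_chain`, and `is_maximal` are hypothetical. It also shows that a maximal element need not be unique.

```python
from itertools import combinations

base = {1, 2, 3}
# All proper subsets of base (sizes 0, 1, 2), stored as frozensets.
poset = [frozenset(c)
         for r in range(len(base))
         for c in combinations(sorted(base), r)]

def leq(x, y):
    # the ordering: set inclusion
    return x <= y

# The three axioms of a partial ordering, checked exhaustively.
reflexive = all(leq(x, x) for x in poset)
antisymmetric = all(x == y for x in poset for y in poset
                    if leq(x, y) and leq(y, x))
transitive = all(leq(x, z) for x in poset for y in poset for z in poset
                 if leq(x, y) and leq(y, z))

def is_chain(subset):
    # every two elements are comparable
    return all(leq(x, y) or leq(y, x) for x in subset for y in subset)

def is_maximal(m):
    # m is maximal if m <= x forces x == m
    return all(x == m for x in poset if leq(m, x))

chain = [frozenset(), frozenset({1}), frozenset({1, 2})]
maximal = {m for m in poset if is_maximal(m)}
```

Here every chain has an upper bound in the poset, and the maximal elements are exactly the three two-element subsets; none of them is a largest element.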
Cartesian Products
Next we will introduce Cartesian products. First we define the Cartesian
product of a finite number of sets.
DEFINITION 1.10 Cartesian Product of a Finite Number of Sets
Let $A$ and $B$ be two sets. Then the Cartesian product of $A$ and $B$
(in that order), denoted $A \times B$, is the set of all ordered pairs $(a, b)$,
where $a \in A$ and $b \in B$. Thus,
$$A \times B = \{\, (a, b) : a \in A,\ b \in B \,\}.$$
More generally, if $A_1$, $A_2$, ..., $A_n$ are sets, then the Cartesian product
of those $n$ sets, denoted $A_1 \times A_2 \times \cdots \times A_n$ or $\mathop{\times}_{k=1}^{n} A_k$, is the
set of all ordered $n$-tuples $(a_1, a_2, \ldots, a_n)$, where $a_k \in A_k$ for $k = 1, 2, \ldots, n$. Thus,
$$\mathop{\times}_{k=1}^{n} A_k = \{\, (a_1, a_2, \ldots, a_n) : a_k \in A_k,\ 1 \le k \le n \,\}.$$
† For a proof of the equivalence, see, for example, John L. Kelley's General Topology
(New York: Van Nostrand, 1955), p. 33.
An important special case of Cartesian product occurs when all of the
sets in the product are identical. If $A_k = A$ for $1 \le k \le n$, where $A$ is some
set, then we write $A^n$ for the Cartesian product. In other words,
$$A^n = A \times A \times \cdots \times A.$$
EXAMPLE 1.8 Illustrates Definition 1.10
a) If at least one of $A$ and $B$ is empty, then so is $A \times B$.
b) Let $\Gamma$ and $\Lambda$ be two sets, $A \subset \Gamma$ and $B \subset \Lambda$. Then the subset $A \times B$
of $\Gamma \times \Lambda$ is called a rectangle. Note that, in general, not every subset
of $\Gamma \times \Lambda$ is a rectangle.
c) The set $\mathcal{R}^n$ is called Euclidean $n$-space.
d) The set $C^n$ is called unitary $n$-space. □
We can generalize the Cartesian product to any collection of sets. This
is done as follows.
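DEFINITION 1.11 Cartesian Product of a Collection of Sets
Let $\{A_\iota\}_{\iota \in I}$ be an indexed collection of sets. The Cartesian product
of the collection, denoted $\mathop{\times}_{\iota \in I} A_\iota$, is the set of all functions $x$ on $I$
such that $x(\iota)$ is an element of $A_\iota$ for each $\iota \in I$. Thus,
$$\mathop{\times}_{\iota \in I} A_\iota = \Bigl\{\, x\colon I \to \bigcup_{\iota \in I} A_\iota \;:\; x(\iota) \in A_\iota,\ \iota \in I \,\Bigr\}.$$
We call $x(\iota)$ the $\iota$th coordinate of $x$ and usually denote it by $x_\iota$.
If $A_\iota = \emptyset$ for some $\iota \in I$, then $\mathop{\times}_{\iota \in I} A_\iota = \emptyset$. Conversely, in view of
the axiom of choice, if $A_\iota \ne \emptyset$ for all $\iota \in I$, then $\mathop{\times}_{\iota \in I} A_\iota \ne \emptyset$.
An important special case of Cartesian product occurs when all of the
sets in the product are identical. Suppose that $A_\iota = A$ for all $\iota \in I$, where
$A$ is some set. Then we write $A^I$ for the Cartesian product. Thus, $A^I$ is
the set of all functions from $I$ to $A$.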
EXAMPLE 1.9 Illustrates Definition 1.11
a) If $I = \{1, 2, \ldots, n\}$, then we use the notation $\mathop{\times}_{k=1}^{n} A_k$ for the Cartesian
product.
b) If $I = \{1, 2, \ldots, n\}$, then we write $A^n$ in place of $A^{\{1, 2, \ldots, n\}}$. Thus,
$A^n$ denotes the set of all sequences of length $n$ of elements of $A$.
c) If $I = \mathcal{N}$, we use the notation $\mathop{\times}_{n=1}^{\infty} A_n$ for the Cartesian product.
d) If $I = \mathcal{N}$, then we sometimes write $A^\infty$ in place of $A^{\mathcal{N}}$. Thus, $A^\infty$ denotes
the set of all infinite sequences of elements of $A$.
e) $\mathcal{R}^{[0,1]}$ is the set of all real-valued functions on $[0,1]$.
f) $C^{\mathcal{R}}$ is the set of all complex-valued functions on $\mathcal{R}$. □
We seemingly have two different definitions of the Cartesian product
of a finite number of sets, one given by Definition 1.10 and the other by
Definition 1.11. However, the appropriate identification shows that the
difference is only apparent.
Indeed, let $A_1$, $A_2$, ..., $A_n$ be $n$ sets. By Definition 1.10, $\mathop{\times}_{k=1}^{n} A_k$ is
the set of all ordered $n$-tuples $(a_1, a_2, \ldots, a_n)$, where $a_k \in A_k$ for $1 \le k \le n$.
On the other hand, by Definition 1.11, $\mathop{\times}_{k=1}^{n} A_k$ is the set of all functions
$x$ on $\{1, 2, \ldots, n\}$ such that $x_k \in A_k$ for $1 \le k \le n$. If we identify
each such function $x$ with the ordered $n$-tuple $(x_1, x_2, \ldots, x_n)$, then we
obtain a 1-1 correspondence between the Cartesian product $\mathop{\times}_{k=1}^{n} A_k$ as
defined in Definition 1.11 and the Cartesian product $\mathop{\times}_{k=1}^{n} A_k$ as defined
in Definition 1.10.
We will follow conventional notation and use the ordered $n$-tuple interpretation
of the Cartesian product of a finite number of sets. Thus, for
example, we construe $\mathcal{R}^n$ as the set of all ordered $n$-tuples of real numbers,
realizing, however, that it can also be interpreted as the set of all sequences
of length $n$ of real numbers.
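The identification of the two definitions can be imitated for small finite sets. In the sketch below, which is an illustration with hypothetical names (`A`, `functions`, `to_tuple`), ordered tuples come from `itertools.product`, while the "functions on the index set" of Definition 1.11 are modeled as dictionaries built independently, one index at a time.

```python
from itertools import product

# An indexed collection A_1, A_2, A_3 (index set I = {1, 2, 3}).
A = {1: {"a", "b"}, 2: {0, 1}, 3: {"x"}}

# Definition 1.10 view: ordered n-tuples (a_1, a_2, a_3).
tuples = set(product(*(A[k] for k in sorted(A))))

def functions(indexed):
    """All dicts x with x[k] in indexed[k] for every index k
    (the Definition 1.11 view, with a dict standing in for a function)."""
    result = [{}]
    for k in sorted(indexed):
        result = [{**x, k: a} for x in result for a in indexed[k]]
    return result

def to_tuple(x):
    # identify the function x with the tuple (x_1, x_2, ..., x_n)
    return tuple(x[k] for k in sorted(x))

funcs = functions(A)
identified = {to_tuple(x) for x in funcs}

# If one factor is empty, there are no such functions at all.
empty_product = functions({1: {"a"}, 2: set()})
```

The map `to_tuple` is one-to-one on `funcs` and its range is exactly `tuples`, which is the 1-1 correspondence described above in miniature.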
EXERCISES 1.2
1.14 Suppose that $f\colon \Omega \to \Lambda$ is one-to-one and onto. Prove that $f^{-1}(f(x)) = x$
for all $x \in \Omega$ and $f(f^{-1}(y)) = y$ for all $y \in \Lambda$.
1.15 Let $f\colon \Omega \to \Lambda$.
a) Prove that $f$ is one-to-one if and only if there is a function $g\colon \Lambda \to \Omega$
such that $(g \circ f)(x) = x$ for all $x \in \Omega$.
b) Prove that $f$ is onto if and only if there is a function $g\colon \Lambda \to \Omega$ such that
$(f \circ g)(y) = y$ for all $y \in \Lambda$. Hint: The axiom of choice is needed for the
"only if" part.
c) Suppose there is a function $g\colon \Lambda \to \Omega$ such that $(g \circ f)(x) = x$ for all
$x \in \Omega$ and $(f \circ g)(y) = y$ for all $y \in \Lambda$. Prove that $g = f^{-1}$.
1.16 Let $\{s_n\}_{n=1}^{\infty}$ be an infinite sequence of elements of $\Omega$ and $\{s_{n_k}\}_{k=1}^{\infty}$ a subsequence
of $\{s_n\}_{n=1}^{\infty}$. Interpret the subsequence as a composition of two
functions.
1.17 Suppose that $f\colon \Omega \to \Lambda$ is one-to-one and onto. Show that for $B \subset \Lambda$, the
two definitions of $f^{-1}(B)$ are consistent; that is, the image of $B$ under $f^{-1}$
equals the inverse image of $B$ under $f$.
1.18 Prove Proposition 1.5 on page 15.
1.19 Refer to Proposition 1.5 on page 15.
a) Show that the assumption of one-to-one cannot be dropped for parts (c)
and (d).
b) Show that the assumption of onto cannot be dropped for part (e).
1.20 Prove Proposition 1.6 beginning on page 15.
1.21 Let $f\colon \Omega \to \Lambda$, $A \subset \Omega$, and $B \subset \Lambda$.
a) Show that $f(f^{-1}(B)) \subset B$ and that equality holds if $f$ is onto.
b) Show that $f^{-1}(f(A)) \supset A$ and that equality holds if $f$ is one-to-one.
1.22 Show that the axiom of choice is equivalent to the following statement: If
$\{A_\iota\}_{\iota \in I}$ is any indexed collection of nonempty sets, then $\mathop{\times}_{\iota \in I} A_\iota \ne \emptyset$.
1.23 Let $\Omega$ be a nonempty set. Construct a one-to-one function from $\mathcal{P}(\Omega)$
onto $\{0, 1\}^{\Omega}$.
1.24 Let $\Omega$ be a nonempty set. Prove that there is no function from $\Omega$ onto $\mathcal{P}(\Omega)$.
Hint: Suppose to the contrary that such a function, say, $f$, exists. Let
$A = \{\, x \in \Omega : x \notin f(x) \,\}$.
1.3	EQUIVALENCE OF SETS; COUNTABILITY
We see from Proposition 1.5 on page 15 that if $f\colon \Omega \to \Lambda$ is one-to-one
and onto, then, from a set theoretic point of view, $\Omega$ and $\Lambda$ are equivalent
because, under those circumstances, the set operations are preserved by $f$.
Thus we can think of $f$ as simply renaming the elements of $\Omega$ according to
the rule $x \to f(x)$. If $f\colon \Omega \to \Lambda$ is one-to-one and onto, then it is called a
1-1 correspondence (or bijective function).
Keeping the previous paragraph in mind, we now define set equivalence.
Suppose that $A$ and $B$ are any two sets. Let us write $A \sim B$ if there
is a 1-1 correspondence from $A$ to $B$. We leave it as an exercise for the
reader to show that for any three sets, $A$, $B$, and $C$,
• $A \sim A$ [reflexive].
• $A \sim B$ implies $B \sim A$ [symmetric].
• $A \sim B$ and $B \sim C$ implies $A \sim C$ [transitive].
In view of these facts, we make the following definition.
DEFINITION 1.12 Equivalence of Sets
Two sets are said to be equivalent if there is a 1-1 correspondence
from one to the other.
Finite, Infinite, Countable, and Uncountable Sets
Using the concept of equivalence of two sets, we can now present definitions
regarding the “size” of a set in the sense of how many elements it contains.
DEFINITION 1.13 Finite, Infinite, Countable, and Uncountable
Let A be a set. We say that
a) $A$ is finite if it is either empty or equivalent to the first $N$ positive
integers for some $N \in \mathcal{N}$. In the former case, $A$ is said to consist
of 0 elements and, in the latter case, $N$ elements.
b) $A$ is infinite if it is not finite.
c) $A$ is countably infinite if it is equivalent to $\mathcal{N}$.
d)	A is countable if it is either finite or countably infinite.
e)	A is uncountable if it is not countable.
EXAMPLE 1.10 Illustrates Definition 1.13
a) The set of all integers, $Z$, is countably infinite. Indeed, the function
$f\colon \mathcal{N} \to Z$ defined by
$$f(n) = \begin{cases} n/2, & n \text{ even;} \\ -(n-1)/2, & n \text{ odd,} \end{cases}$$
is a 1-1 correspondence from $\mathcal{N}$ to $Z$.
b) Any (nondegenerate) interval of $\mathcal{R}$ is uncountable. One proof of this fact
is presented in Exercise 1.26 and another in Exercise 3.34 on page 126.
In particular, $\mathcal{R}$ is uncountable.
c) Define $f\colon \mathcal{N}^2 \to \mathcal{N}$ by $f(m,n) = 2^{m-1}(2n - 1)$. Then it can be shown
(see Exercise 1.27) that $f$ is a 1-1 correspondence. Consequently, $\mathcal{N}^2$ is
countably infinite and, hence, countable. □
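Both correspondences in Example 1.10 are easy to experiment with. The following sketch is illustrative only; the function names `to_integer`, `pair`, and `unpair` are assumptions, and `unpair` is one way to invert $f(m,n) = 2^{m-1}(2n-1)$ by stripping factors of 2. The checks cover finitely many values and do not replace the proof requested in Exercise 1.27.

```python
def to_integer(n):
    # Example 1.10(a): n/2 for n even, -(n - 1)/2 for n odd
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)

def pair(m, n):
    # Example 1.10(c): f(m, n) = 2^(m-1) * (2n - 1)
    return 2 ** (m - 1) * (2 * n - 1)

def unpair(k):
    # Write k = 2^(m-1) * (odd number) and read off (m, n).
    m = 1
    while k % 2 == 0:
        k //= 2
        m += 1
    return m, (k + 1) // 2

# The first 11 positive integers are sent onto -5, ..., 5.
integers_hit = sorted(to_integer(n) for n in range(1, 12))

# unpair inverts pair on an initial segment of the positive integers.
round_trip_ok = all(pair(*unpair(k)) == k for k in range(1, 1001))

# pair takes distinct values on a 20-by-20 grid of pairs.
grid_values = [pair(m, n) for m in range(1, 21) for n in range(1, 21)]
injective_on_grid = len(set(grid_values)) == len(grid_values)
```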
We can express countability in terms of sequences. If $A$ is countably
infinite, then, by definition, there is a 1-1 correspondence $f\colon \mathcal{N} \to A$. If we
let $s_n = f(n)$, then the infinite sequence $\{s_n\}_{n=1}^{\infty}$ is called an enumeration
of $A$. Similarly, if $A$ is a finite nonempty set, then, by definition, there is an
$N \in \mathcal{N}$ and a 1-1 correspondence $f\colon \{1, 2, \ldots, N\} \to A$. If we let $s_n = f(n)$,
then the finite sequence $\{s_n\}_{n=1}^{N}$ is called an enumeration of $A$.
In particular, we see that if a set is countably infinite, then it is the
range of an infinite sequence (but not conversely); and that if a set is finite
and nonempty, then it is the range of a finite sequence (and conversely).
The following proposition is also quite useful.
PROPOSITION 1.7
A nonempty set is countable if and only if it is the range of an infinite
sequence.
PROOF: Suppose $A$ is countable. Then, by definition, it is either finite
or countably infinite. If it is countably infinite, then, by definition, it is
equivalent to $\mathcal{N}$, which means there is a one-to-one and onto function, $f$,
from $\mathcal{N}$ to $A$. Letting $s_n = f(n)$, we have that $A$ is the range of the infinite
sequence $\{s_n\}_{n=1}^{\infty}$.
If $A$ is finite (and nonempty), then, by definition, there is an $N \in \mathcal{N}$
such that $A$ is equivalent to the first $N$ positive integers. Let $g$ be a
one-to-one and onto function from $\{1, 2, \ldots, N\}$ to $A$. Select $x \in A$ and
define the infinite sequence $s$ by $s_n = g(n)$ for $n = 1, 2, \ldots, N$, and $s_n = x$
for $n > N$. Then $A$ is the range of the infinite sequence $\{s_n\}_{n=1}^{\infty}$.
Conversely, suppose $A$ is the range of an infinite sequence, $\{s_n\}_{n=1}^{\infty}$.
We claim that $A$ is countable. If $A$ is finite, there is nothing to prove. So,
assume that $A$ is infinite. We will construct a 1-1 correspondence from $\mathcal{N}$
to $A$, thereby proving that $A$ is countably infinite and, hence, countable.
Let $n_1 = 1$. Since $A$ is infinite, $A \setminus \{s_{n_1}\} \ne \emptyset$. Therefore, because the
range of $\{s_n\}_{n=1}^{\infty}$ is $A$, the set $\{\, n \in \mathcal{N} : s_n \ne s_1 \,\}$ is not empty. Denote
by $n_2$ the smallest integer in that set.† Note that $n_1 < n_2$.
Proceeding inductively, note again that since $A$ is infinite, we have
that $A \setminus \{s_{n_1}, s_{n_2}, \ldots, s_{n_k}\} \ne \emptyset$. Therefore, as the range of $\{s_n\}_{n=1}^{\infty}$ is $A$,
the set $\{\, n \in \mathcal{N} : s_n \ne s_{n_j},\ 1 \le j \le k \,\}$ is not empty. Denote by $n_{k+1}$ the
smallest integer in that set and note that $n_k < n_{k+1}$.
We claim that the function $f\colon \mathcal{N} \to A$ defined by $f(k) = s_{n_k}$ is a
1-1 correspondence. By construction, $f$ is one-to-one. So it remains to
show that $f$ is onto. Let $x \in A$. Since the range of $\{s_n\}_{n=1}^{\infty}$ is $A$, the set
$\{\, n \in \mathcal{N} : s_n = x \,\}$ is not empty. Let $m$ be the smallest integer in that
set. If $m = 1$, then $x = s_1 = s_{n_1} = f(1)$. Otherwise, let $k$ be the smallest
integer such that $m \le n_k$. Because $s_n \ne x = s_m$ for $n < m$, we have that
$s_m \ne s_{n_j}$ for $1 \le j \le k - 1$, which implies that $m \ge n_k$. Therefore, $m = n_k$
and, consequently, $x = s_{n_k} = f(k)$.
† Here and elsewhere in this proof we are using the well-ordering principle: Each
nonempty subset of positive integers has a smallest element.
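The inductive choice of the indices $n_1 < n_2 < \cdots$ in the proof amounts to keeping the first occurrence of each value of the sequence. The sketch below illustrates this under assumptions of its own: the generator name `distinct_indices` and the sample sequence `s` (each positive integer repeated three times) are hypothetical, and, as in the proof, the range of the sequence is assumed to be infinite (otherwise the generator would search forever after exhausting the range).

```python
from itertools import count, islice

def distinct_indices(s):
    """Yield n_1 < n_2 < ..., where n_1 = 1 and n_{k+1} is the least n
    with s(n) different from s(n_1), ..., s(n_k)."""
    seen = set()
    for n in count(1):
        value = s(n)
        if value not in seen:
            seen.add(value)
            yield n

def s(n):
    # sample sequence 1, 1, 1, 2, 2, 2, 3, 3, 3, ... with infinite range
    return (n + 2) // 3

first_indices = list(islice(distinct_indices(s), 5))

# f(k) = s_{n_k} lists each element of the range exactly once.
enumeration = [s(n) for n in first_indices]
```

Scanning the indices in increasing order is what guarantees that each new index is the smallest one whose term has not appeared before, which is exactly how $n_{k+1}$ is defined in the proof.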
Proposition 1.7 often provides an efficient method for proving that a
set is countable. The next two propositions illustrate that fact.
PROPOSITION 1.8
A subset of a countable set is countable.
PROOF: Let $A$ be a countable set and $B \subset A$. We claim that $B$ is countable.
If $B = \emptyset$, there is nothing to prove; so, assume $B \ne \emptyset$. This implies
that $A$ is nonempty and, hence, by Proposition 1.7, $A$ is the range of an
infinite sequence, $\{s_n\}_{n=1}^{\infty}$.
Choose $x \in B$ and let $t_n = s_n$ if $s_n \in B$ and $t_n = x$ if $s_n \notin B$. Then
$B$ is the range of the infinite sequence $\{t_n\}_{n=1}^{\infty}$. Applying Proposition 1.7
again, we conclude that $B$ is countable.
PROPOSITION 1.9
The image of a countable set is countable.
PROOF: Let $A$ be a countable set and $f$ a function defined on $A$. By
Proposition 1.7, $A$ is the range of an infinite sequence, $\{s_n\}_{n=1}^{\infty}$. For each
$n \in \mathcal{N}$, define $t_n = f(s_n)$.
Now, let $y \in f(A)$. Then there is an $x \in A$ such that $f(x) = y$. Since
$A$ is the range of the infinite sequence $\{s_n\}_{n=1}^{\infty}$, there is an $n \in \mathcal{N}$ such that
$s_n = x$. Therefore, $y = f(x) = f(s_n) = t_n$. This shows that $f(A)$ is the
range of the infinite sequence $\{t_n\}_{n=1}^{\infty}$. Hence, by Proposition 1.7, $f(A)$ is
countable.
PROPOSITION 1.10
A countable union of countable sets is countable.
PROOF: Let $\mathcal{C}$ be a countable collection of countable subsets of a set $\Omega$
and let $A = \bigcup_{C \in \mathcal{C}} C$. We must prove that $A$ is countable.
If $\mathcal{C}$ is empty, then its union is empty and hence countable. So, assume
that $\mathcal{C}$ is a nonempty collection. Without loss of generality, we can also
assume that each member of $\mathcal{C}$ is nonempty.
Since $\mathcal{C}$ is nonempty and countable, Proposition 1.7 implies that it is
the range of an infinite sequence $\{A_n\}_{n=1}^{\infty}$ and, since each member of $\mathcal{C}$ is
countable, Proposition 1.7 implies that each $A_n$ is the range of an infinite
sequence $\{x_{nm}\}_{m=1}^{\infty}$.
Now, define $f\colon \mathcal{N}^2 \to A$ by $f(m,n) = x_{nm}$ and note that $f$ is onto.
By Example 1.10(c) on page 21, $\mathcal{N}^2$ is countable. Therefore, by Proposition 1.9,
$A$ is countable, being the image of $\mathcal{N}^2$ under $f$.
In Example 1.10(c) we pointed out that $\mathcal{N}^2$, the Cartesian product
of $\mathcal{N}$ with itself, is countable. More generally, we have the following fact.
PROPOSITION 1.11
The Cartesian product of two countable sets is countable.
PROOF: Let $A$ and $B$ be two countable sets. If either $A$ or $B$ is empty,
then so is $A \times B$. So assume that both $A$ and $B$ are nonempty. By
Proposition 1.7, $A$ and $B$ are the ranges of infinite sequences, say, $\{a_n\}_{n=1}^{\infty}$
and $\{b_n\}_{n=1}^{\infty}$. Define $f\colon \mathcal{N}^2 \to A \times B$ by $f(m,n) = (a_m, b_n)$. Then $f$ is
onto and, consequently, because $\mathcal{N}^2$ is countable, Proposition 1.9 implies
that $A \times B$ is countable.
We can easily extend Proposition 1.11 to any finite number of sets.
This will be explored in the exercises.
PROPOSITION 1.12
The set Q of rational numbers is countable.
PROOF: Example 1.10(a) on page 21 shows that $Z$ is countable. Hence,
by Proposition 1.11, so is $Z \times \mathcal{N}$. Define $f\colon Z \times \mathcal{N} \to Q$ by $f(z,n) = z/n$.
Since $f$ is onto, Proposition 1.9 implies that $Q$ is countable.
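The chain of maps used in the proof, from $\mathcal{N}$ to $\mathcal{N}^2$ to $Z \times \mathcal{N}$ to $Q$, can be turned into an explicit listing of the rationals. The sketch below is one possible realization and not the authors' construction: the helper names are hypothetical, the map $\mathcal{N} \to \mathcal{N}^2$ is the inverse of the function in Example 1.10(c), and repetitions are discarded so that each rational is listed once.

```python
from fractions import Fraction
from itertools import count, islice

def unpair(k):
    # inverse of (m, n) -> 2^(m-1) * (2n - 1) from Example 1.10(c)
    m = 1
    while k % 2 == 0:
        k //= 2
        m += 1
    return m, (k + 1) // 2

def to_integer(n):
    # the 1-1 correspondence from N to Z of Example 1.10(a)
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)

def rationals():
    """Enumerate Q without repetition via N -> N^2 -> Z x N -> Q."""
    seen = set()
    for k in count(1):
        m, n = unpair(k)
        q = Fraction(to_integer(m), n)  # the map (z, n) -> z/n
        if q not in seen:
            seen.add(q)
            yield q

first_five = list(islice(rationals(), 5))
first_many = list(islice(rationals(), 224))
```

Discarding repetitions is the same first-occurrence idea used in the proof of Proposition 1.7; without it the listing would still have range $Q$ but would not be one-to-one.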
EXERCISES 1.3
1.25 If $A$ and $B$ are sets, write $A \sim B$ if there is a 1-1 correspondence from $A$
to $B$. Prove that $\sim$ is reflexive, symmetric, and transitive. In other words,
if $A$, $B$, and $C$ are sets, show that the following hold.
a) $A \sim A$ [reflexive]
b) $A \sim B$ implies $B \sim A$ [symmetric]
c) $A \sim B$ and $B \sim C$ implies $A \sim C$ [transitive]
★1.26 In this exercise, we will prove that any (nondegenerate) interval of $\mathcal{R}$ is
uncountable.
a) Show that the interval $[0,1)$ is uncountable. Hint: Suppose to the contrary
that $[0,1)$ is countable and let $\{x_n\}_{n=1}^{\infty}$ be an enumeration of its
elements. For each $n \in \mathcal{N}$, let $0.d_{n1}d_{n2}\ldots$ denote the unique decimal
expansion of $x_n$ that does not end in an infinite string of 9s.
Then consider the number $0.a_1a_2\ldots$, where $a_n = 1$ if $d_{nn} = 0$ and $a_n = 0$
otherwise.
b) Use part (a) to conclude that $(0,1)$ is uncountable.
c) Use part (b) to show that any bounded interval of the form $(a, b)$ is
uncountable. Hint: Construct a one-to-one and onto function from $(0,1)$
to $(a, b)$.
d) Use part (c) to conclude that any interval is uncountable. In particular,
$\mathcal{R}$ is uncountable.
1.27 Refer to Example 1.10(c) on page 21. Prove that the function $f\colon \mathcal{N}^2 \to \mathcal{N}$
defined by $f(m, n) = 2^{m-1}(2n - 1)$ is a 1-1 correspondence.
★1.28 Prove that any infinite set contains a countably infinite subset.
1.29 Let $A$ be a set. Prove that the following statements are equivalent.
a) $A$ is infinite.
b) There is a one-to-one function $f\colon A \to A$ that is not onto.
c) There is an onto function $g\colon A \to A$ that is not one-to-one.
1.30 Suppose that $f\colon A \to B$ is one-to-one and that $B$ is countable. Prove that
$A$ is countable.
1.31 Prove that the Cartesian product of a finite number of countable sets is
countable.
1.32 In Proposition 1.11, we proved that the Cartesian product of two countable
sets is countable and, in Exercise 1.31, we showed that the Cartesian product
of a finite number of countable sets is countable. Is it true, in general, that
the Cartesian product of a countable number of countable sets is countable?
★1.33 Let $\Omega$ be a set. A relation, $\equiv$, on $\Omega$ is said to be an equivalence relation
if for all $x, y, z \in \Omega$,
• $x \equiv x$ [reflexive]
• $x \equiv y$ implies $y \equiv x$ [symmetric]
• $x \equiv y$ and $y \equiv z$ implies $x \equiv z$ [transitive]
a) Give three examples of equivalence relations.
b) Give three examples of relations that are not equivalence relations.
1.34 Refer to Exercise 1.33. Let $\Omega$ be a nonempty set and $\equiv$ an equivalence
relation on $\Omega$. For each $x \in \Omega$, define $E_x = \{\, y \in \Omega : y \equiv x \,\}$. And let
$\mathcal{C} = \{\, E_x : x \in \Omega \,\}$. Each member of $\mathcal{C}$ is called an equivalence class of $\Omega$
under $\equiv$.
a) Show that for each $x, y \in \Omega$, either $E_x \cap E_y = \emptyset$ or $E_x = E_y$.
b) Prove that $\Omega = \bigcup_{A \in \mathcal{C}} A$.
c) Conclude that $\equiv$ partitions $\Omega$ into disjoint equivalence classes; that is,
$\Omega$ is a disjoint union of the equivalence classes under $\equiv$.
1.35 Let $a$ and $b$ be real numbers such that $a < b$. Prove that the intervals $(a, b)$
and $[a, b]$ are equivalent.
1.36 Prove the Schröder-Bernstein theorem: Suppose that $A$ and $B$ are sets
and that there are one-to-one functions $f\colon A \to B$ and $g\colon B \to A$. Then
$A \sim B$. Proceed as follows. Define
$$\tau(E) = \bigl(g(f(E)^c)\bigr)^c, \qquad E \subset A.$$
a) Show that if $E \subset F \subset A$, then $\tau(E) \subset \tau(F)$.
b) Let $\mathcal{C} = \{\, E \subset A : E \subset \tau(E) \,\}$ and set $G = \bigcup_{E \in \mathcal{C}} E$. Prove that
$\tau(G) = G$ and, hence, that $G^c = g(f(G)^c)$. In particular, $G^c$ is a subset
of the range of $g$.
c) Define $h\colon A \to B$ by
$$h(x) = \begin{cases} f(x), & \text{if } x \in G; \\ g^{-1}(x), & \text{if } x \in G^c. \end{cases}$$
Prove that $h$ is a 1-1 correspondence.
1.4 ALGEBRAS, σ-ALGEBRAS, AND MONOTONE CLASSES
In set theory, as elsewhere in mathematics, it is important to distinguish
collections that are closed under the relevant operations.† For example,
in linear algebra, the relevant operations are vector addition and scalar
multiplication. Subsets of vector spaces closed under those operations are
called subspaces and receive intensive study because of their significance.
Algebras
The three basic operations in set theory are union, intersection, and com-
plementation. A nonempty collection of sets closed under these operations
is called an algebra of sets. Thus, we make the following definition.
DEFINITION 1.14 Algebra of Sets
Let $\Omega$ be a set. A nonempty collection $\mathcal{A}_0$ of subsets of $\Omega$ is called an
algebra if the following two conditions are satisfied:
a) $A \in \mathcal{A}_0$ implies $A^c \in \mathcal{A}_0$.
b) $A, B \in \mathcal{A}_0$ implies $A \cup B \in \mathcal{A}_0$.
† Roughly speaking, a collection (set) $\mathcal{C}$ is closed under an operation if whenever the
operation is applied to elements of $\mathcal{C}$, the resulting element also belongs to $\mathcal{C}$.
Conspicuous by its absence in Definition 1.14 is closure under intersection.
However, it is easy to show that this property follows from the
two stated in the definition: an algebra is necessarily closed under intersection;
that is, if $\mathcal{A}_0$ is an algebra and $A, B \in \mathcal{A}_0$, then $A \cap B \in \mathcal{A}_0$. We
leave the proof of this fact to the reader as an exercise. We also leave it to
the reader to prove the following two facts:
• An algebra is closed under finite unions and intersections; that is, if $\mathcal{A}_0$ is
an algebra and $A_k \in \mathcal{A}_0$ for $k = 1, 2, \ldots, n$, then $\bigcup_{k=1}^{n} A_k \in \mathcal{A}_0$ and
$\bigcap_{k=1}^{n} A_k \in \mathcal{A}_0$.
• A nonempty collection of subsets of $\Omega$ is an algebra if it is closed under
complementation and intersection.
EXAMPLE 1.11 Illustrates Definition 1.14
Let $\Omega$ be a nonempty set. It is easy to see that each of the following is an
algebra of subsets of $\Omega$:
a) the power set, $\mathcal{P}(\Omega)$, that is, the set of all subsets of $\Omega$;
b) the trivial algebra, $\{\emptyset, \Omega\}$; and
c) $\{\emptyset, A, A^c, \Omega\}$, where $A$ is a nonempty proper subset of $\Omega$. □
Next we will prove that the union of a sequence of members of an
algebra can always be expressed as a disjoint union of members of the
algebra. More precisely, we have the following useful proposition.
PROPOSITION 1.13
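Let $\mathcal{A}_0$ be an algebra of subsets of $\Omega$ and $\{A_n\}_n$ a sequence of sets in $\mathcal{A}_0$
(i.e., $A_n \in \mathcal{A}_0$ for each $n$). Then there is a pairwise disjoint sequence $\{B_n\}_n$
of sets in $\mathcal{A}_0$ such that $\bigcup_n A_n = \bigcup_n B_n$.
PROOF: The proof uses a process that we will refer to informally as disjointizing.
Let $B_1 = A_1$ and, for $n \ge 2$, let $B_n = A_n \setminus \bigl(\bigcup_{k=1}^{n-1} A_k\bigr)$.
First we prove that $B_n \in \mathcal{A}_0$ for each $n$. Let $C_n = \bigcup_{k=1}^{n-1} A_k$. Since
$\mathcal{A}_0$ is an algebra, we have, in turn, that $C_n \in \mathcal{A}_0$ (because $\mathcal{A}_0$ is closed under
finite unions), $C_n^c \in \mathcal{A}_0$ (because $\mathcal{A}_0$ is closed under complementation), and
$B_n = A_n \setminus C_n = A_n \cap C_n^c \in \mathcal{A}_0$ (because $\mathcal{A}_0$ is closed under intersection).
The sets $B_n$ are pairwise disjoint: if $m < n$, then $B_m \subset A_m$, whereas
$B_n \cap A_m = \emptyset$ by the definition of $B_n$.
Next we show that $\bigcup_n A_n = \bigcup_n B_n$. Since $B_n \subset A_n$ for each $n$, it is
clear that $\bigcup_n A_n \supset \bigcup_n B_n$. To show the reverse inclusion, let $x \in \bigcup_n A_n$.
Then $x \in A_n$ for some $n$. Let $m$ be the smallest such $n$. If $m = 1$, then
$x \in A_1 = B_1$. If $m \ge 2$, we have $x \in A_m$ and $x \notin A_k$ for $k < m$. This
implies that $x \in A_m$ but $x \notin \bigcup_{k=1}^{m-1} A_k$; in other words,
$$x \in A_m \setminus \Bigl(\bigcup_{k=1}^{m-1} A_k\Bigr) = B_m \subset \bigcup_n B_n.$$
Thus, $\bigcup_n A_n \subset \bigcup_n B_n$.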
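For a finite list of finite sets, the disjointizing process in the proof is a few lines of code. The sketch below is an illustration; the function name `disjointize` and the sample sets are assumptions made here.

```python
def disjointize(sets):
    """Return B_1, B_2, ... with B_1 = A_1 and
    B_n = A_n minus (A_1 union ... union A_{n-1})."""
    result = []
    union_so_far = set()
    for a in sets:
        result.append(set(a) - union_so_far)
        union_so_far |= set(a)
    return result

A = [{1, 2}, {2, 3}, {1, 3, 4}, {4}]
B = disjointize(A)

# The B_n are pairwise disjoint and have the same union as the A_n.
pairwise_disjoint = all(B[i].isdisjoint(B[j])
                        for i in range(len(B)) for j in range(i + 1, len(B)))
same_union = set().union(*A) == set().union(*B)
```

Some of the resulting sets may be empty, as the last one is here; the proposition does not require the $B_n$ to be nonempty.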
It is useful to know that given a collection of subsets, there is a small-
est algebra containing the collection. We state this fact formally in the
following proposition whose proof is left to the reader as an exercise.
PROPOSITION 1.14
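Let $\mathcal{C}$ be a nonempty collection of subsets of $\Omega$. Then there is a smallest
algebra of subsets of $\Omega$ containing $\mathcal{C}$.
The smallest algebra containing a collection $\mathcal{C}$ of subsets of $\Omega$ is called
the algebra generated by $\mathcal{C}$ and is denoted $\mathcal{A}_0(\mathcal{C})$. Thus, $\mathcal{A}_0(\mathcal{C})$ is an
algebra of subsets of $\Omega$; $\mathcal{C} \subset \mathcal{A}_0(\mathcal{C})$; and if $\mathcal{A}_0$ is an algebra of subsets of $\Omega$
such that $\mathcal{C} \subset \mathcal{A}_0$, then $\mathcal{A}_0 \supset \mathcal{A}_0(\mathcal{C})$.
As a simple example, let $A$ be a nonempty proper subset of a set $\Omega$.
Then $\mathcal{A}_0(\{A\}) = \{\emptyset, A, A^c, \Omega\}$.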
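When $\Omega$ is finite, the algebra generated by a collection can be computed by repeatedly adding complements and pairwise unions until nothing new appears. This is a sketch under that finiteness assumption, with a hypothetical function name `generated_algebra`; it is a brute-force closure and not the intersection-of-all-algebras argument suggested for the general proof.

```python
def generated_algebra(omega, collection):
    """Smallest algebra of subsets of a finite set omega containing the
    (nonempty) collection, found by closing under complements and unions."""
    omega = frozenset(omega)
    algebra = {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        current = list(algebra)
        for a in current:
            candidates = [omega - a] + [a | b for b in current]
            for c in candidates:
                if c not in algebra:
                    algebra.add(c)
                    changed = True
    return algebra

omega = {1, 2, 3, 4}
from_one_set = generated_algebra(omega, [{1}])
from_two_sets = generated_algebra(omega, [{1}, {2}])
```

The loop terminates because a finite set has only finitely many subsets. With a single generator the result has four members, matching the simple example above; with the generators $\{1\}$ and $\{2\}$ the atoms are $\{1\}$, $\{2\}$, and $\{3, 4\}$, giving $2^3 = 8$ members.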
σ-Algebras
As we have seen, an algebra of sets is closed under finite unions (and intersections).
For the purposes of modern mathematics, a stronger condition
is usually required, namely, closure under countably-infinite unions (and
intersections). Hence, we make the following definition.
DEFINITION 1.15 σ-Algebra of Sets
Let $\Omega$ be a set. A nonempty collection $\mathcal{A}$ of subsets of $\Omega$ is called a
σ-algebra if the following two conditions are satisfied:
a) $A \in \mathcal{A}$ implies $A^c \in \mathcal{A}$.
b) $\{A_n\}_n \subset \mathcal{A}$ implies $\bigcup_n A_n \in \mathcal{A}$.
Using the same type of argument used for algebras, we can show that
a σ-algebra is necessarily closed under countable intersections; that is, if
$\mathcal{A}$ is a σ-algebra and $\{A_n\}_n \subset \mathcal{A}$, then $\bigcap_n A_n \in \mathcal{A}$. We leave the proof
of this fact to the reader as an exercise. We also leave it to the reader to
prove that a nonempty collection of subsets of $\Omega$ is a σ-algebra if it is closed
under complementation and countable intersections.
EXAMPLE 1.12 Illustrates Definition 1.15
a) Clearly, any σ-algebra is an algebra. However, the converse is not true.
See the exercises for several examples.
b) The three algebras given in Example 1.11 are also σ-algebras. □
Additional examples of σ-algebras are presented in the exercises. We
will also encounter several σ-algebras in future chapters; for instance, in
Chapter 3, we will discuss the σ-algebra of Borel sets and the σ-algebra of
Lebesgue measurable sets.
It is useful to know that given a collection of subsets, there is a smallest
σ-algebra containing the collection. We state this fact formally in the
following proposition whose proof is left to the reader as an exercise.
PROPOSITION 1.15
Let $\mathcal{C}$ be a nonempty collection of subsets of $\Omega$. Then there is a smallest
σ-algebra of subsets of $\Omega$ containing $\mathcal{C}$.
The smallest σ-algebra containing a collection $\mathcal{C}$ of subsets of $\Omega$ is
called the σ-algebra generated by $\mathcal{C}$ and is denoted $\mathcal{A}(\mathcal{C})$. Thus, $\mathcal{A}(\mathcal{C})$ is
a σ-algebra of subsets of $\Omega$; $\mathcal{C} \subset \mathcal{A}(\mathcal{C})$; and if $\mathcal{A}$ is a σ-algebra of subsets
of $\Omega$ such that $\mathcal{C} \subset \mathcal{A}$, then $\mathcal{A} \supset \mathcal{A}(\mathcal{C})$.
As a simple example, let $A$ be a nonempty proper subset of a set $\Omega$.
Then $\mathcal{A}(\{A\}) = \{\emptyset, A, A^c, \Omega\}$.
Monotone Classes and the Monotone Class Theorem
Besides algebras and a-algebras, we also need to consider monotone classes.
Here is the definition of a monotone class.
DEFINITION 1.16 Monotone Class
Let $\Omega$ be a set. A nonempty collection $\mathcal{D}$ of subsets of $\Omega$ is called a
monotone class if it satisfies the following two conditions:
a) $\{D_n\}_{n=1}^{\infty} \subset \mathcal{D}$ and $D_1 \subset D_2 \subset \cdots$ implies $\bigcup_{n=1}^{\infty} D_n \in \mathcal{D}$.
b) $\{D_n\}_{n=1}^{\infty} \subset \mathcal{D}$ and $D_1 \supset D_2 \supset \cdots$ implies $\bigcap_{n=1}^{\infty} D_n \in \mathcal{D}$.
Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of subsets of $\Omega$. If $A_1 \subset A_2 \subset \cdots$, then the
sequence is said to be monotone nondecreasing or, more simply, nondecreasing.
If $A_1 \supset A_2 \supset \cdots$, then the sequence is said to be monotone
nonincreasing or, more simply, nonincreasing. A sequence of subsets
is called monotone if it is either monotone nondecreasing or monotone
nonincreasing. Using this terminology, we see that a monotone class is a
collection of subsets that is closed under unions of nondecreasing sequences
and intersections of nonincreasing sequences.
EXAMPLE 1.13 Illustrates Definition 1.16
a) Any σ-algebra is a monotone class.
b) Let $A \subset \Omega$ and $\mathcal{D} = \{A\}$. Then, trivially, $\mathcal{D}$ is a monotone class. Note,
however, that it is not a σ-algebra. □
For us, the most important result regarding monotone classes is the
following theorem, known as the monotone class theorem.
THEOREM 1.1 Monotone Class Theorem
Let $\Omega$ be a set and $\mathcal{A}_0$ an algebra of subsets of $\Omega$. Let $\mathcal{D}$ be a collection
of subsets of $\Omega$ such that $\mathcal{D} \supset \mathcal{A}_0$ and $\mathcal{D}$ is a monotone class. Then
$\mathcal{D} \supset \mathcal{A}(\mathcal{A}_0)$, the σ-algebra generated by $\mathcal{A}_0$.
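PROOF: Let $\mathcal{F}$ be the smallest monotone class that contains $\mathcal{A}_0$. (Exercise 1.53
guarantees the existence of $\mathcal{F}$.) We claim that $\mathcal{F} = \mathcal{A}(\mathcal{A}_0)$. Since
every σ-algebra is a monotone class, we have $\mathcal{A}(\mathcal{A}_0) \supset \mathcal{F}$. If we can show
$\mathcal{F}$ is a σ-algebra, that will imply $\mathcal{A}(\mathcal{A}_0) \subset \mathcal{F}$, and the desired equality will
follow. First we show that $\mathcal{F}$ is an algebra.
Suppose $A \in \mathcal{A}_0$, and let $\mathcal{E} = \{\, F \in \mathcal{F} : A \cup F \in \mathcal{F} \,\}$. We will show that
$\mathcal{E}$ is a monotone class containing $\mathcal{A}_0$. Since $\mathcal{A}_0$ is an algebra and $\mathcal{F} \supset \mathcal{A}_0$, it
follows that $\mathcal{E} \supset \mathcal{A}_0$. Now suppose $\{E_n\}_n \subset \mathcal{E}$ and $E_1 \subset E_2 \subset \cdots$. Because
$\{E_n\}_n \subset \mathcal{F}$ and $\mathcal{F}$ is a monotone class, $\bigcup_n E_n \in \mathcal{F}$. And, because $\mathcal{F}$ is a
monotone class, $\{A \cup E_n\}_n \subset \mathcal{F}$, and $A \cup E_1 \subset A \cup E_2 \subset \cdots$, we have that
$A \cup \bigl(\bigcup_n E_n\bigr) = \bigcup_n (A \cup E_n) \in \mathcal{F}$. Therefore, $\bigcup_n E_n \in \mathcal{E}$. Similarly, $\mathcal{E}$ is
closed under intersections of nonincreasing sequences. So $\mathcal{E}$ is a monotone
class containing $\mathcal{A}_0$ and, consequently, $\mathcal{E} \supset \mathcal{F}$. But, by definition, $\mathcal{E} \subset \mathcal{F}$.
Thus, $\mathcal{E} = \mathcal{F}$. In other words, $A \cup F \in \mathcal{F}$ for all $A \in \mathcal{A}_0$ and $F \in \mathcal{F}$.
Now suppose $G \in \mathcal{F}$, and let $\mathcal{G} = \{\, F \in \mathcal{F} : F \cup G \in \mathcal{F} \,\}$. We will show
that $\mathcal{G}$ is a monotone class containing $\mathcal{A}_0$. From the previous paragraph,
we know that $\mathcal{G} \supset \mathcal{A}_0$ and, using the same argument as in that paragraph,
we can show that $\mathcal{G}$ is a monotone class. This implies that $\mathcal{G} = \mathcal{F}$. In other
words, $F \cup G \in \mathcal{F}$ for all $F, G \in \mathcal{F}$. Hence, $\mathcal{F}$ is closed under union.
Next we show that $\mathcal{F}$ is closed under complementation. To that end,
let $\mathcal{H} = \{\, F \in \mathcal{F} : F^c \in \mathcal{F} \,\}$. Because $\mathcal{A}_0$ is an algebra and $\mathcal{F} \supset \mathcal{A}_0$, it
follows that $\mathcal{H} \supset \mathcal{A}_0$. Also, because $\mathcal{F}$ is a monotone class, it is easy to see
that $\mathcal{H}$ is a monotone class. Therefore, $\mathcal{H} = \mathcal{F}$; that is, $\mathcal{F}$ is closed under
complementation.
We have now shown that $\mathcal{F}$ is an algebra of sets. To show it is a
σ-algebra, we need only prove that it is closed under countably-infinite
unions. So let $\{F_n\}_{n=1}^{\infty} \subset \mathcal{F}$. For $n \in \mathcal{N}$, let $E_n = \bigcup_{k=1}^{n} F_k$. Then
$E_1 \subset E_2 \subset \cdots$ and $\bigcup_{n=1}^{\infty} E_n = \bigcup_{n=1}^{\infty} F_n$. Since $\mathcal{F}$ is an algebra, $\{E_n\}_{n=1}^{\infty} \subset \mathcal{F}$
and, therefore, since $\mathcal{F}$ is a monotone class, $\bigcup_{n=1}^{\infty} F_n = \bigcup_{n=1}^{\infty} E_n \in \mathcal{F}$.
Hence, $\mathcal{F}$ is a σ-algebra.
We now know that $\mathcal{F} = \mathcal{A}(\mathcal{A}_0)$. That is, the smallest monotone class
that contains $\mathcal{A}_0$ is $\mathcal{A}(\mathcal{A}_0)$. Because $\mathcal{D}$ is a monotone class that contains $\mathcal{A}_0$,
it must be that $\mathcal{D} \supset \mathcal{A}(\mathcal{A}_0)$.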
In proving the monotone class theorem, we showed that the smallest
monotone class that contains an algebra of subsets is the σ-algebra generated
by the algebra. That result is important in its own right.
EXERCISES 1.4
1.37  Suppose that $\mathcal{A}_0$ is an algebra.
a)  Show that $\mathcal{A}_0$ is closed under intersection; that is, $A, B \in \mathcal{A}_0$ implies
$A \cap B \in \mathcal{A}_0$.
b)  Show that $\emptyset \in \mathcal{A}_0$.
1.38  Prove that an algebra is closed under finite unions and intersections.
1.39  Show that if a collection of subsets of $\Omega$ is closed under complementation
and intersection, then it is an algebra.
1.40  Let $\Omega$ be an infinite set and $\mathcal{D} = \{ A \subset \Omega : A \text{ is finite or } A^c \text{ is finite} \}$.
Prove that $\mathcal{D}$ is an algebra.
1.41  This exercise generalizes Example 1.11(c). Suppose that $\{A_k\}_{k=1}^{n}$ is a
pairwise disjoint finite sequence of nonempty subsets of $\Omega$ whose union is $\Omega$.
Let $\mathcal{D}$ be the collection of all finite (including empty) unions of members
of $\{A_k\}_{k=1}^{n}$. Prove that $\mathcal{D}$ is an algebra.
1.42  Let $\mathcal{C}$ denote the collection of all intervals of $\mathcal{R}$, including degenerate
intervals of the form $[a, a]$ and $(a, a)$. And let $\mathcal{D}$ be the collection of finite
disjoint unions of members of $\mathcal{C}$. Prove that $\mathcal{D}$ is an algebra. Hint: First
show that $\mathcal{D}$ is closed under intersection and then under complementation.
1.43  Let $\Omega$ be a set. A nonempty collection $\mathcal{S}$ of subsets of $\Omega$ is called a
semialgebra if the following conditions hold:
•  $A, B \in \mathcal{S}$ implies $A \cap B \in \mathcal{S}$.
•  $A \in \mathcal{S}$ implies that either $A^c = \emptyset$ or there is a pairwise disjoint finite
sequence $\{A_k\}_{k=1}^{n}$ of members of $\mathcal{S}$ such that $A^c = \bigcup_{k=1}^{n} A_k$.
In words, $\mathcal{S}$ is a semialgebra if it is closed under intersection and the
complement of each member of $\mathcal{S}$ is a finite (possibly empty) disjoint union of
members of $\mathcal{S}$.
a)  Show that any algebra is a semialgebra.
b)  Give two examples of semialgebras that are not algebras.
c)  Let $\{A_k\}_{k=1}^{n}$ be a pairwise disjoint finite sequence of nonempty subsets
of $\Omega$ whose union is $\Omega$. Set $\mathcal{S} = \{\emptyset\} \cup \{ A_k : 1 \le k \le n \}$. Prove that
$\mathcal{S}$ is a semialgebra.
d)  Let $\mathcal{C}$ denote the collection of all intervals of $\mathcal{R}$, including degenerate
intervals of the form $[a, a]$ and $(a, a)$. Show that $\mathcal{C}$ is a semialgebra.
e)  Let $\mathcal{S}$ be a semialgebra and $\mathcal{D}$ the collection consisting of the empty
set and all finite disjoint unions of members of $\mathcal{S}$. Prove that $\mathcal{D}$ is an
algebra. Hint: First show that $\mathcal{D}$ is closed under intersection and then
under complementation.
1.44  Prove Proposition 1.14: Let $\mathcal{C}$ be a nonempty collection of subsets of $\Omega$.
Then there is a smallest algebra of subsets of $\Omega$ containing $\mathcal{C}$. Hint: Consider
the collection of all algebras of subsets of $\Omega$ that contain $\mathcal{C}$.
1.45  Refer to Exercise 1.43. In part (e) of that exercise, we proved that the
collection $\mathcal{D}$ consisting of the empty set and all finite disjoint unions of
members of a semialgebra, $\mathcal{S}$, constitutes an algebra. Show that $\mathcal{D}$ is the
algebra generated by $\mathcal{S}$.
1.46  Prove each of the following facts.
a)  A $\sigma$-algebra is closed under countable intersections; that is, if $\mathcal{A}$ is a
$\sigma$-algebra and $\{A_n\}_n \subset \mathcal{A}$, then $\bigcap_n A_n \in \mathcal{A}$.
b)  A nonempty collection of subsets of $\Omega$ is a $\sigma$-algebra if it is closed under
complementation and countable intersections.
1.47  In this exercise, we will provide two examples of algebras that are not
$\sigma$-algebras.
a)  Prove that the collection $\mathcal{D}$ defined in Exercise 1.40, although an algebra,
is not a $\sigma$-algebra.
b)  Prove that the collection $\mathcal{D}$ defined in Exercise 1.42, although an algebra,
is not a $\sigma$-algebra.
1.48  Show that the collection $\mathcal{D}$ defined in Exercise 1.41 is a $\sigma$-algebra.
1.49  Suppose that $\{A_n\}_{n=1}^{\infty}$ is a pairwise disjoint sequence of nonempty subsets
of $\Omega$ whose union is $\Omega$.
a)  Prove that the collection $\mathcal{D}$ of countable (including empty) unions of
members of $\{A_n\}_{n=1}^{\infty}$ is a $\sigma$-algebra.
b)  Prove that the collection $\mathcal{E}$ of finite (including empty) unions of members
of $\{A_n\}_{n=1}^{\infty}$ is not an algebra and, hence, not a $\sigma$-algebra.
1.50  Prove Proposition 1.15 on page 29: Let $\mathcal{C}$ be a nonempty collection of subsets
of $\Omega$. Then there is a smallest $\sigma$-algebra of subsets of $\Omega$ containing $\mathcal{C}$. Hint:
Consider the collection of all $\sigma$-algebras of subsets of $\Omega$ that contain $\mathcal{C}$.
1.51  Refer to Exercise 1.49, where $\{A_n\}_{n=1}^{\infty}$ is a pairwise disjoint sequence of
nonempty subsets of $\Omega$ whose union is $\Omega$. In part (a) of that exercise,
we proved that the collection $\mathcal{D}$ of countable (including empty) unions of
members of $\{A_n\}_{n=1}^{\infty}$ is a $\sigma$-algebra. Show that $\mathcal{D}$ is the $\sigma$-algebra generated
by that sequence.
1.52  Let $\Omega$ be a set. Prove that an algebra of subsets of $\Omega$ is a $\sigma$-algebra if and
only if it is a monotone class.
1.53  Let $\mathcal{C}$ be a nonempty collection of subsets of $\Omega$. Prove that there is a smallest
monotone class of subsets of $\Omega$ containing $\mathcal{C}$.
Georg Friedrich Bernhard Riemann
(1826-1866)
Bernhard Riemann was born on September 17, 1826, in Breselenz,
Germany. In 1846, Riemann entered Göttingen University to study
theology. However, he soon persuaded his father to allow him to
switch to mathematics.
Despite the presence of Gauss, Göttingen had only a simple
mathematics curriculum, so, in 1847, Riemann enrolled at Berlin University,
where he was greatly influenced by both Jacobi and Dirichlet.
W. E. Weber's return to Göttingen University sparked an improvement
in the mathematical climate there and, in 1849, Riemann also returned to
Göttingen where, in 1851, he earned his PhD with his thesis on complex
function theory and Riemann surfaces.
Riemann continued his studies, submitting papers on Fourier series
and geometry to qualify to become an unpaid lecturer. The mathematical
tools that Riemann developed in his geometry paper were used by
Albert Einstein in his theory of relativity. Riemann's first lectures were on
partial differential equations in relation to physics. These brilliant lectures
were reprinted for 80 years after his death. At last, in 1857, Riemann was
appointed Assistant Professor (with pay!) at Göttingen.
In 1862, Riemann became quite ill and spent most of the next four
years trying to regain his health in the more hospitable climate of Italy.
But, in Selasca, Italy, on July 20, 1866, Riemann succumbed to
tuberculosis at the age of 39.
The Real Number System
and Calculus
As further preparation for our study of real analysis, we will present in this
chapter several topics often encountered in previous mathematics courses.
But, again, although some readers may be familiar with much of the ma-
terial, we present this chapter as a way to provide a common ground for
all readers of the text.
We will first discuss the real number system and the extended real
number system. Next we will investigate sequences of real numbers, ex-
ploring, in particular, cluster points and limits of such sequences. Then we
will introduce open and closed sets of real numbers and examine some of
their basic properties. In the final sections of this chapter, we will present
continuous functions and the Riemann integral with an eye toward reme-
dying some of the deficiencies experienced in trying to use these classical
concepts in modern analysis.
2.1 THE REAL NUMBER SYSTEM
Although it is mathematically satisfying to construct the real numbers
from “scratch,” such a construction would be an aside to the main thrust
of this text. Thus, we will not endeavor to present a construction of the
real numbers.† Instead, we will briefly review the main properties of the
real number system, specifically, three groups of axioms that together char-
acterize that system.
Axioms for the Real Number System
The first group of axioms for the real number system consists of the field
axioms. These axioms provide the basic properties of the real numbers
relative to the two binary operations of addition (+) and multiplication (•).
We will follow convention in using juxtaposition to indicate multiplication
when convenient.
Field Axioms
Let $x, y, z \in \mathcal{R}$. Then we have:
(F1) $x + y = y + x$ and $xy = yx$. (commutative)
(F2) $(x + y) + z = x + (y + z)$ and $(xy)z = x(yz)$. (associative)
(F3) $x(y + z) = xy + xz$. (distributive)
(F4) There exist $0, 1 \in \mathcal{R}$ with $0 \ne 1$, such that for each $x \in \mathcal{R}$, $x + 0 = x$
and $x \cdot 1 = x$. (identities)
(F5) For each $x \in \mathcal{R}$, there is a $-x \in \mathcal{R}$ such that $x + (-x) = 0$ and, if
$x \ne 0$, there is an $x^{-1} \in \mathcal{R}$ such that $x x^{-1} = 1$. (inverses)
Because of (F2), $x + y + z$ is defined unambiguously, as is any finite
sum; likewise, $xyz$ is defined unambiguously, as is any finite product. If
$x_1, x_2, \ldots, x_n$ are real numbers, then we use the following notation:
$$\sum_{k=1}^{n} x_k = x_1 + x_2 + \cdots + x_n$$
and
$$\prod_{k=1}^{n} x_k = x_1 x_2 \cdots x_n.$$
Also, regarding (F5), we will usually write $y - x$ for $y + (-x)$ and often
write $y/x$ or $\frac{y}{x}$ for $y x^{-1}$.
† Readers interested in a construction of the real numbers are referred to Cohen and
Ehrlich’s The Structure of the Real Number System (New York: D. Van Nostrand
Reinhold, 1963).
The second group of axioms consists of the order axioms. These
axioms provide the basic properties of the real numbers relative to the
less-than (<) ordering.
Order Axioms†
Let $x, y, z \in \mathcal{R}$. Then we have:
(O1) $x < y$ and $y < z$ implies $x < z$. (transitive)
(O2) $x < y$ implies $x + z < y + z$.
(O3) $x < y$ and $0 < z$ implies $xz < yz$.
(O4) Exactly one of $x = y$, $x < y$, and $y < x$ holds. (trichotomous)
Note: We will also employ the following notation: $x \le y$ means that $x < y$
or $x = y$; $x > y$ means that $y < x$; and $x \ge y$ means that $y \le x$.
The third group of axioms actually consists of one axiom, called the
completeness axiom or the least upper-bound axiom. In preparation
for stating that axiom, we first introduce some terminology.
Let $A$ be a nonempty subset of $\mathcal{R}$. A real number $u$ is called an upper
bound for $A$ if $x \le u$ for all $x \in A$. Note that not every subset of $\mathcal{R}$ has
an upper bound; for example, neither $\mathcal{R}$ nor $\mathcal{N}$ has an upper bound. If a
subset of $\mathcal{R}$ has an upper bound, then we say that it is bounded above.
A real number $u$ is called a least upper bound or supremum for $A$
if it is an upper bound for $A$ and is smaller than or equal to any other upper
bound for $A$. It is easy to see that a set can have at most one least upper
bound. Also, by definition, a necessary condition for a subset of $\mathcal{R}$ to have
a least upper bound is that it be bounded above. That this condition is
sufficient is the content of the completeness axiom.
Completeness Axiom
A nonempty subset of real numbers that is bounded above has a least upper
bound.
Let $A$ be a nonempty subset of $\mathcal{R}$ that is bounded above. Then each
of the following notations is used to denote the least upper bound of $A$:
$$\sup A, \quad \sup_{x \in A} x, \quad \text{or} \quad \sup\{ x : x \in A \}.$$
† The order axioms can also be stated in terms of the positive real numbers. See
Exercise 2.1.
Similarly, we can define lower bound and greatest lower bound: Let
$A$ be a nonempty subset of $\mathcal{R}$. A real number $\ell$ is called a lower bound
for $A$ if $x \ge \ell$ for all $x \in A$. Note that not every subset of $\mathcal{R}$ has a lower
bound; for example, neither $\mathcal{R}$ nor $Z$ has a lower bound. If a subset of $\mathcal{R}$
has a lower bound, then we say that it is bounded below.
A real number $\ell$ is called a greatest lower bound or infimum for $A$
if it is a lower bound for $A$ and is greater than or equal to any other lower
bound for $A$. It is easy to see that a set can have at most one greatest lower
bound. Also, by definition, a necessary condition for a subset of $\mathcal{R}$ to have
a greatest lower bound is that it be bounded below. That this condition
is sufficient is a consequence of the completeness axiom. In other words,
we have the following proposition whose proof is left to the reader as an
exercise. (See Exercise 2.4.)
PROPOSITION 2.1
A nonempty subset of real numbers that is bounded below has a greatest
lower bound.
Let $A$ be a nonempty subset of $\mathcal{R}$ that is bounded below. Then each
of the following notations is used to denote the greatest lower bound of $A$:
$$\inf A, \quad \inf_{x \in A} x, \quad \text{or} \quad \inf\{ x : x \in A \}.$$
EXAMPLE 2.1 Illustrates Least Upper Bound and Greatest Lower Bound
a) $\sup[0,1) = 1$ and $\inf[0,1) = 0$.
b) $\mathcal{N}$ has no least upper bound, but $\inf \mathcal{N} = 1$.
c) Let $A = \{ x : x^2 < 2 \}$. Then $\sup_{x \in A} x = \sqrt{2}$ and $\inf_{x \in A} x = -\sqrt{2}$.  □
An important consequence of the completeness axiom is that given any
real number, we can find a positive integer exceeding that number. In other
words, we have the following proposition, known as the Archimedean
principle.
PROPOSITION 2.2 Archimedean Principle
For each $x \in \mathcal{R}$, there is an $n \in \mathcal{N}$ such that $n > x$.
PROOF: Let $A = \{ m \in \mathcal{N} : m \le x \}$. If $A = \emptyset$, then $1 > x$ and we are
done. So, we can assume that $A$ is nonempty. By definition, $A$ is bounded
above by $x$ and, hence, by the completeness axiom, $A$ has a least upper
bound, say, $u$. Then $u - 1$ is not an upper bound for $A$ and, hence, there is
a $k \in A$ such that $k > u - 1$. Let $n = k + 1$ and note that $n \in \mathcal{N}$. Because
$n > u$ and $u$ is an upper bound for $A$, we have that $n \notin A$. And from this
last result and the fact that $n \in \mathcal{N}$, we conclude that $n > x$.
The next two propositions show that between any two real numbers
there is both an irrational number and a rational number. We will find
these two facts essential.
PROPOSITION 2.3 Density of the Irrational Numbers
Between any two real numbers there is an irrational number.
PROOF: Let $a, b \in \mathcal{R}$ with $a < b$. In Chapter 1, we noted that the
interval $(a, b)$ is uncountable. (See Exercise 1.26 for a proof.) Since the
set of rational numbers, $Q$, is countable (Proposition 1.12 on page 24), it
follows from Proposition 1.8 on page 23 that any subset of $Q$ is countable.
If $(a, b)$ contained no irrational number, then it would be an uncountable
subset of $Q$, which is impossible.
PROPOSITION 2.4 Density of the Rational Numbers
Between any two real numbers there is a rational number.
PROOF: Let $a, b \in \mathcal{R}$ with $a < b$. We first assume that $a > 0$. By the
Archimedean principle, there is an $n \in \mathcal{N}$ such that $n > (b - a)^{-1}$. Note
that $nb > nb - a > 1 + na - a \ge 1$; so, $nb > 1$.
Now, let $A = \{ k \in \mathcal{N} : k \ge nb \}$. By the Archimedean principle,
$A \ne \emptyset$ and, therefore, by the well-ordering principle, $A$ has a smallest
member, say, $j$. As $nb > 1$, $j \ge 2$. This, in turn, implies that $j - 1 \in \mathcal{N}$
and, consequently, because $j$ is the smallest member of $A$, we must have
$j - 1 < nb$. Letting $m = j - 1$, we have that
$$a = b - (b - a) < \frac{m + 1}{n} - \frac{1}{n} = \frac{m}{n} < b.$$
Letting $r = m/n$, we have that $r \in Q$ and $a < r < b$.
Next we remove the restriction that $a > 0$. Applying the Archimedean
principle, we choose an $n \in \mathcal{N}$ such that $n > -a$. Then $n + a > 0$ and, so, by
what we have already proved, there is an $r \in Q$ such that $n + a < r < n + b$.
Then $r - n \in Q$ and $a < r - n < b$.
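The proof of Proposition 2.4 is constructive, and its steps can be traced in a short computation. The following Python sketch is an illustration only. It assumes that the endpoints $a$ and $b$ are themselves rational, so that exact arithmetic with `fractions.Fraction` is available; the function name `rational_between` is hypothetical and does not come from the text.

```python
from fractions import Fraction
from math import ceil, floor


def rational_between(a, b):
    """Return a rational r with a < r < b, mirroring the proof of Proposition 2.4.

    Assumption: a and b are given as rationals (ints or Fractions) so that
    all arithmetic is exact.
    """
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")

    # Reduction step: pick an integer shift > -a so the left endpoint is positive.
    shift = floor(-a) + 1 if a <= 0 else 0
    a0, b0 = a + shift, b + shift

    # Archimedean principle: choose n > 1 / (b0 - a0).
    n = floor(1 / (b0 - a0)) + 1

    # j is the smallest positive integer with j >= n * b0; then m = j - 1 < n * b0.
    j = ceil(n * b0)
    m = j - 1

    # r = m / n lies strictly between a0 and b0; undo the shift.
    return Fraction(m, n) - shift


print(rational_between(Fraction(1, 3), Fraction(1, 2)))  # prints 3/7
```

For the endpoints $1/3$ and $1/2$ the construction gives $n = 7$, $j = 4$, and $m = 3$, so that $r = 3/7$.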
The Extended Real Number System
It is convenient to enlarge the set of real numbers to the extended real
numbers, which we denote by $\mathcal{R}^*$. This set is obtained by adding two
distinct symbols, $\infty$ and $-\infty$, to the real numbers; thus, $\mathcal{R}^* = \mathcal{R} \cup \{-\infty, \infty\}$.
We extend the usual ordering of $\mathcal{R}$ to $\mathcal{R}^*$ by defining $-\infty < \infty$ and
$-\infty < x < \infty$ for all $x \in \mathcal{R}$. We also extend the binary operations of
addition and multiplication to $\mathcal{R}^*$. In doing so, we make the convention
that, for $x \in \mathcal{R}^*$, $x - \infty = x + (-\infty)$ and $x - (-\infty) = x + \infty$ in the sense
that if one side of the equation is defined, then the other is defined likewise.
Now, for $x \in \mathcal{R}$, we define
$$x + \infty = \infty + x = \infty \quad\text{and}\quad x - \infty = -\infty + x = -\infty;$$
and
$$x \cdot \infty = \infty \cdot x = \infty \quad\text{and}\quad x \cdot (-\infty) = (-\infty) \cdot x = -\infty, \quad \text{if } x > 0;$$
$$x \cdot \infty = \infty \cdot x = -\infty \quad\text{and}\quad x \cdot (-\infty) = (-\infty) \cdot x = \infty, \quad \text{if } x < 0;$$
$$x \cdot \infty = \infty \cdot x = 0 \quad\text{and}\quad x \cdot (-\infty) = (-\infty) \cdot x = 0, \quad \text{if } x = 0.$$
Also, we define
$$\infty + \infty = \infty \quad\text{and}\quad -\infty - \infty = -\infty;$$
$$\infty \cdot \infty = \infty \quad\text{and}\quad (-\infty) \cdot (-\infty) = \infty;$$
and
$$\infty \cdot (-\infty) = (-\infty) \cdot \infty = -\infty.$$
The expressions $\infty - \infty$ and $-\infty + \infty$ are left undefined because they
cannot be defined in a way that is consistent with the rules of ordinary
addition and multiplication. See Exercise 2.10.
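The conventions just stated differ in one respect from ordinary floating-point arithmetic, where the product of zero and infinity is "not a number." The following Python sketch is a rough model only: it represents real numbers by floats and $\pm\infty$ by `math.inf` and `-math.inf`, and the helper names `ext_add` and `ext_mul` are hypothetical, introduced here for illustration.

```python
import math

INF = math.inf


def ext_add(x, y):
    """Sum in the extended reals; inf + (-inf) is left undefined."""
    if (x == INF and y == -INF) or (x == -INF and y == INF):
        raise ValueError("inf - inf is undefined")
    return x + y


def ext_mul(x, y):
    """Product in the extended reals, with the convention 0 * (+-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0  # plain float arithmetic would give nan for 0 * inf
    return x * y


print(ext_mul(0, INF), ext_mul(-2, INF), ext_add(5, -INF))
```

Raising an exception for $\infty - \infty$ reflects the decision to leave that expression undefined rather than to assign it a value.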
In Definition 1.1 on page 4, we defined intervals of $\mathcal{R}$. We can extend
that definition to intervals of $\mathcal{R}^*$ and, in fact, this extension reduces the
number of cases that need to be considered.
DEFINITION 2.1 Intervals of $\mathcal{R}^*$
Let $a$ and $b$ be extended real numbers such that $a < b$. Then the
intervals of $\mathcal{R}^*$ with endpoints $a$ and $b$ are as follows:
$(a, b) = \{ x \in \mathcal{R}^* : a < x < b \}$
$[a, b) = \{ x \in \mathcal{R}^* : a \le x < b \}$
$(a, b] = \{ x \in \mathcal{R}^* : a < x \le b \}$
$[a, b] = \{ x \in \mathcal{R}^* : a \le x \le b \}$
If $a$ and $b$ are both in $\mathcal{R}$, then these are the bounded intervals of $\mathcal{R}$,
as given in Definition 1.1. On the other hand, if either $a = -\infty$ or
$b = \infty$, then the preceding four sets are unbounded intervals.
Note that in $\mathcal{R}^*$, every set is bounded above by $\infty$. Thus, every
nonempty subset of $\mathcal{R}^*$ has a least upper bound: if it is bounded above
in $\mathcal{R}$, then its least upper bound is also in $\mathcal{R}$; if it is not bounded above in $\mathcal{R}$,
then its least upper bound is $\infty$. Since every member of $\mathcal{R}^*$ is vacuously an
upper bound for $\emptyset$, we see that the empty set also has a least upper bound
in $\mathcal{R}^*$, namely, $-\infty$. Similar remarks hold for greatest lower bounds. Thus,
we have the following proposition.
PROPOSITION 2.5
Every subset $A$ of $\mathcal{R}^*$ has both a least upper bound and greatest lower
bound. We have the following:
a)  If $A = \emptyset$, then $\sup A = -\infty$ and $\inf A = \infty$.
b)  If $A$ is bounded above in $\mathcal{R}$, then $\sup A \in \mathcal{R}$; otherwise, $\sup A = \infty$.
c)  If $A$ is bounded below in $\mathcal{R}$, then $\inf A \in \mathcal{R}$; otherwise, $\inf A = -\infty$.
EXAMPLE 2.2 Illustrates Proposition 2.5
a)  $\inf \mathcal{N} = 1$ and $\sup \mathcal{N} = \infty$.
b)  $\inf Z = -\infty$ and $\sup Z = \infty$.
c)  If $A = \{1, 2, 3, \infty\}$, then $\inf A = 1$ and $\sup A = \infty$.
d)  Suppose that $I$ is an interval in $\mathcal{R}^*$ with endpoints $a$ and $b$. Then (see
Exercise 2.11) we have $\inf I = a$ and $\sup I = b$.  □
EXERCISES 2.1
2.1  The order axioms for the real number system can also be stated in terms of
the positive real numbers as follows. Let $\mathcal{R}^+$ denote the subset of positive
real numbers. Then we have:
(O1') $x, y \in \mathcal{R}^+$ implies $x + y \in \mathcal{R}^+$.
(O2') $x, y \in \mathcal{R}^+$ implies $xy \in \mathcal{R}^+$.
(O3') $x \in \mathcal{R}^+$ implies $-x \notin \mathcal{R}^+$.
(O4') For each $x \in \mathcal{R}$, we have $x = 0$, $x \in \mathcal{R}^+$, or $-x \in \mathcal{R}^+$.
Prove that these four axioms are equivalent to the order axioms given on
page 37. Note: Assuming the order axioms given on page 37, we define
$\mathcal{R}^+ = \{ x : x > 0 \}$. On the other hand, assuming the order axioms in this
exercise, we define $x < y$ to mean $y - x \in \mathcal{R}^+$.
★2.2 The absolute value of a real number $x$, denoted $|x|$, is defined by
$$|x| = \begin{cases} x, & \text{if } x \ge 0; \\ -x, & \text{if } x < 0. \end{cases}$$
Let $x, y \in \mathcal{R}$. Prove each of the following facts.
a)  $|-x| = |x|$
b)  $|xy| = |x||y|$
c)  $|x + y| \le |x| + |y|$ [triangle inequality]
d)  $\bigl| |x| - |y| \bigr| \le |x - y|$
★2.3 For $x, y \in \mathcal{R}$, we define the maximum of $x$ and $y$ to be the larger of those
two numbers. We denote the maximum by $\max\{x, y\}$ or $x \vee y$. Thus,
$$x \vee y = \max\{x, y\} = \begin{cases} x, & \text{if } x \ge y; \\ y, & \text{if } x < y. \end{cases}$$
Similarly, we define the minimum of $x$ and $y$ to be the smaller of those
two numbers. We denote the minimum by $\min\{x, y\}$ or $x \wedge y$. Thus,
$$x \wedge y = \min\{x, y\} = \begin{cases} y, & \text{if } x \ge y; \\ x, & \text{if } x < y. \end{cases}$$
Let $x, y \in \mathcal{R}$. Referring to Exercise 2.2, prove each of the following facts.
a)  $|x| = x \vee (-x)$
b)  $x \vee y = \frac{1}{2}(x + y + |x - y|)$
c)  $x \wedge y = \frac{1}{2}(x + y - |x - y|)$
2.4  Suppose that $A$ is bounded below. Prove that $A$ has a greatest lower bound
and that, in fact, $\inf A = -\sup\{ -x : x \in A \}$.
2.5  Suppose that $A \subset B$. Prove the following.
a)  If $B$ is bounded above, then so is $A$ and $\sup A \le \sup B$.
b)  If $B$ is bounded below, then so is $A$ and $\inf A \ge \inf B$.
2.6  Suppose that $F$ is a finite nonempty subset of $\mathcal{R}$.
a)  Prove that $F$ is bounded above.
b)  Prove that $\sup F \in F$. (We call this element of $F$ the maximum of $F$
and denote it by $\max F$, $\max_{x \in F} x$, or $\max\{ x : x \in F \}$.)
c)  Referring to Exercise 2.3, show that if $F = \{x, y\}$, then $\max F = x \vee y$.
d)  Prove that $F$ is bounded below.
e)  Prove that $\inf F \in F$. (We call this element of $F$ the minimum of $F$
and denote it by $\min F$, $\min_{x \in F} x$, or $\min\{ x : x \in F \}$.)
f)  Referring to Exercise 2.3, show that if $F = \{x, y\}$, then $\min F = x \wedge y$.
2.7  Prove that any (nondegenerate) interval of real numbers contains infinitely
many irrational numbers, in fact, uncountably many.
2.8  Prove that any (nondegenerate) interval of real numbers contains infinitely
many rational numbers.
2.9  Let $x \in \mathcal{R}$ and set $A = \{ z \in Z : z \le x \}$.
a)  Prove that $A$ is nonempty.
b)  Explain why $A$ has a least upper bound.
c)  Prove that $\sup A \in A$ and, hence, that $\sup A$ is an integer.
d)  The integer $\sup\{ z \in Z : z \le x \}$ is called the greatest integer in $x$
and is denoted by $[x]$. Prove that $[x] \le x < [x] + 1$ or, equivalently, that
$x - 1 < [x] \le x$.
e)  The function $f : \mathcal{R} \to Z$ defined by $f(x) = [x]$ is called the greatest
integer function. Prove that for each $z \in Z$, $f(z + x) = z + f(x)$.
2.10  Show that $\infty - \infty$ cannot be defined in a way that is consistent with the
rules of ordinary addition and multiplication.
2.11  Prove that if $I$ is an interval in $\mathcal{R}^*$ with endpoints $a$ and $b$, then $\inf I = a$
and $\sup I = b$.
2.2	SEQUENCES OF REAL NUMBERS
Recall from Chapter 1 (see page 14) that an infinite sequence is a function
whose domain is the set of positive integers, $\mathcal{N}$. In this section, we will
study infinite sequences of real numbers. A sequence of real numbers is
a sequence whose range is a subset of $\mathcal{R}$. To begin, we recall the following
definition from calculus.
DEFINITION 2.2 Convergent Sequence; Limit
A sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers is said to converge to the real
number $x$ if for each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that $|x - x_n| < \epsilon$
whenever $n \ge N$. In other words, the sequence converges to $x$ if for
each $\epsilon > 0$, all but finitely many terms of the sequence lie within $\epsilon$
of $x$. The number $x$ is called the limit of the sequence $\{x_n\}_{n=1}^{\infty}$ and
we write
$$\lim_{n \to \infty} x_n = x \quad\text{or}\quad x_n \to x, \text{ as } n \to \infty.$$
If a sequence converges, we say that it has a limit.
The sequence $\{(n - 1)/n\}_{n=1}^{\infty}$ converges; in fact, its limit is 1, that is,
$\lim_{n \to \infty} (n - 1)/n = 1$. On the other hand, it is easy to find sequences of
real numbers that do not converge.
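To see the definition of convergence at work on the sequence $\{(n-1)/n\}$, note that the distance from the $n$th term to 1 is exactly $1/n$, so any positive integer $N$ exceeding $1/\epsilon$ serves as the index required by the definition. The Python sketch below illustrates this for one value of $\epsilon$; the function names `x` and `witness_N` are hypothetical, and the finite spot-check is only a sanity test, not a proof.

```python
from fractions import Fraction
from math import floor


def x(n):
    """n-th term of the sequence {(n - 1)/n}."""
    return Fraction(n - 1, n)


def witness_N(eps):
    """Return an N such that |1 - x(n)| < eps for every n >= N.

    Since |1 - x(n)| = 1/n, any integer N > 1/eps works.
    """
    return floor(1 / Fraction(eps)) + 1


eps = Fraction(1, 100)
N = witness_N(eps)
# Spot-check finitely many terms; the general claim rests on 1/n <= 1/N < eps.
assert all(abs(1 - x(n)) < eps for n in range(N, N + 1000))
print(N)  # prints 101
```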
Consider, for instance, the sequence $\{(-1)^n\}_{n=1}^{\infty}$. This sequence does
not converge because its terms oscillate between $-1$ and 1 and, hence, do
not approach any single number. The sequence $\{n^2\}_{n=1}^{\infty}$ also does not
converge but for an intrinsically different reason: its terms are becoming
indefinitely large and, hence, do not approach a real number. If we
allowed limits in $\mathcal{R}^*$, this latter sequence would converge to $\infty$.
It is convenient to permit convergence to extended real numbers and,
in fact, to allow the sequences themselves to contain extended real numbers
(i.e., to have range in $\mathcal{R}^*$). Here we will discuss convergence to extended real
numbers but will restrict ourselves to sequences of real numbers, leaving
the generalization to sequences of extended real numbers to the reader.
DEFINITION 2.3 Convergent Sequence (Extended Sense)
A sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers is said to converge in $\mathcal{R}^*$ if one
of the following three conditions holds:
a)  The sequence converges to a real number in the sense of
Definition 2.2. In this case, we say that the sequence converges in $\mathcal{R}$
or that the limit exists and is finite.
b)  For each $M \in \mathcal{R}$, there is an $N \in \mathcal{N}$ such that $x_n > M$ whenever
$n \ge N$. In this case, we say that the sequence converges to $\infty$
and write $\lim_{n \to \infty} x_n = \infty$.
c)  For each $M \in \mathcal{R}$, there is an $N \in \mathcal{N}$ such that $x_n < M$ whenever
$n \ge N$. In this case, we say that the sequence converges to $-\infty$
and write $\lim_{n \to \infty} x_n = -\infty$.
Sequences, such as $\{n^2\}_{n=1}^{\infty}$ or $\{(n + 1)/n\}_{n=1}^{\infty}$, whose terms never
decrease with increasing $n$ or never increase with increasing $n$, play an
important role in analysis. More generally, let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real
numbers. If $x_1 \le x_2 \le \cdots$, then the sequence is said to be nondecreasing.
If $x_1 \ge x_2 \ge \cdots$, then the sequence is said to be nonincreasing. A
sequence of real numbers is called monotone if it is either nondecreasing
or nonincreasing.
The next proposition, whose proof is left to the reader as an exercise,
shows that any monotone sequence of real numbers has a limit (in $\mathcal{R}^*$). In
stating this and other propositions, we use the terminology that a sequence
is bounded above if its range is bounded above, and that the least upper
bound of a sequence is the least upper bound of the range of the sequence.
Similarly, we say that a sequence is bounded below if its range is bounded
below, and that the greatest lower bound of a sequence is the greatest lower
bound of the range of the sequence.
PROPOSITION 2.6
Any monotone sequence of real numbers converges in $\mathcal{R}^*$. In fact, we have
the following:
a)  If $\{x_n\}_{n=1}^{\infty}$ is nondecreasing, then
$$\lim_{n \to \infty} x_n = \sup\{ x_n : n \in \mathcal{N} \}.$$
In particular, the limit exists and is finite if $\{x_n\}_{n=1}^{\infty}$ is bounded above
and is $\infty$ otherwise.
b)  If $\{x_n\}_{n=1}^{\infty}$ is nonincreasing, then
$$\lim_{n \to \infty} x_n = \inf\{ x_n : n \in \mathcal{N} \}.$$
In particular, the limit exists and is finite if $\{x_n\}_{n=1}^{\infty}$ is bounded below
and is $-\infty$ otherwise.
Cluster Points
By permitting sequences of real numbers to converge to extended real
numbers, we have dealt with one type of nonconvergence of sequences, namely,
when the terms of the sequence are becoming either indefinitely large or
indefinitely small. The other type of nonconvergence occurs when the terms
of the sequence do not approach any single number, either real or extended
real. To analyze sequences of this type, we introduce the concept of a
cluster point.
For a sequence of real numbers to converge to a real number $x$ requires
that for each $\epsilon > 0$, all but finitely many terms of the sequence lie within $\epsilon$
of $x$. Thus, we see that the terms of a sequence that converges in $\mathcal{R}$ are
“clustering” around the limit of the sequence and no other number.
If we consider again the sequence $\{(-1)^n\}_{n=1}^{\infty}$, we see that it does
not converge because some of the terms of the sequence are clustering
around $-1$ and some are clustering around 1. That is, for each $\epsilon > 0$,
infinitely many terms of the sequence lie within $\epsilon$ of $-1$ and infinitely
many lie within $\epsilon$ of 1. This leads us to the following definition.
DEFINITION 2.4 Cluster Point
Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers.
a)  A real number $x$ is said to be a cluster point of $\{x_n\}_{n=1}^{\infty}$ if for
each $\epsilon > 0$ and $N \in \mathcal{N}$, there is an $n \ge N$ such that $|x - x_n| < \epsilon$.
b)  $\infty$ is a cluster point of $\{x_n\}_{n=1}^{\infty}$ if for each $M \in \mathcal{R}$ and $N \in \mathcal{N}$,
there is an $n \ge N$ such that $x_n > M$.
c)  $-\infty$ is a cluster point of $\{x_n\}_{n=1}^{\infty}$ if for each $M \in \mathcal{R}$ and $N \in \mathcal{N}$,
there is an $n \ge N$ such that $x_n < M$.
Remark: Because we are restricting ourselves to sequences of real numbers,
the condition in part (b) of Definition 2.4 for $\infty$ to be a cluster point is
equivalent to the following condition: For each $M \in \mathcal{R}$, there is an $n \in \mathcal{N}$
such that $x_n > M$; and, similarly, the condition in part (c) of the definition
can be restated. However, it is better to use the definitions as stated in
Definition 2.4 because they generalize properly to sequences of extended
real numbers.
EXAMPLE 2.3 Illustrates Definition 2.4
a)  As the reader can easily verify, the sequence $\{(-1)^n\}_{n=1}^{\infty}$ has two cluster
points, namely, $-1$ and 1.
b)  Consider the sequence 2, 1, 0, 2, 2, 1/2, 2, 3, 2/3, 2, 4, 3/4, ..., that is,
$$x_n = \begin{cases} (n - 3)/n, & \text{if } n \equiv 0 \pmod 3; \\ 2, & \text{if } n \equiv 1 \pmod 3; \\ (n + 1)/3, & \text{if } n \equiv 2 \pmod 3. \end{cases}$$
This sequence has three cluster points, namely, 1, 2, and $\infty$.
c)  Let $\{r_n\}_{n=1}^{\infty}$ be an enumeration of the rational numbers. From the
density of the rational numbers (Proposition 2.4 on page 39), it follows that
every extended real number is a cluster point of the sequence $\{r_n\}_{n=1}^{\infty}$.
We leave the details to the reader.  □
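A finite computation cannot establish that a number is a cluster point, but it can make the claim in part (b) plausible. The Python sketch below takes the sequence of Example 2.3(b) to be $x_n = (n-3)/n$, $2$, or $(n+1)/3$ according as $n \equiv 0$, $1$, or $2 \pmod 3$, and counts how many of the first terms lie within $\epsilon = 1/10$ of 1 and of 2; the helper name `count_near` is hypothetical. The counts keep growing as more terms are examined, as does the largest term seen, which is consistent with 1, 2, and $\infty$ being cluster points.

```python
from fractions import Fraction


def x(n):
    """n-th term of the sequence in Example 2.3(b)."""
    if n % 3 == 0:
        return Fraction(n - 3, n)
    if n % 3 == 1:
        return Fraction(2)
    return Fraction(n + 1, 3)


def count_near(point, eps, upto):
    """Count indices n <= upto with |x(n) - point| < eps."""
    return sum(1 for n in range(1, upto + 1) if abs(x(n) - point) < eps)


eps = Fraction(1, 10)
for upto in (100, 1000, 10000):
    largest = max(x(n) for n in range(1, upto + 1))
    print(upto, count_near(1, eps, upto), count_near(2, eps, upto), largest)
```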
The cluster points of a sequence of real numbers can be characterized
as follows. (See Exercise 2.17.)
•  A real number $x$ is a cluster point if and only if for each $\epsilon > 0$, infinitely
many terms of the sequence are within $\epsilon$ of $x$.
•  $\infty$ is a cluster point of a sequence if and only if for each $M \in \mathcal{R}$, infinitely
many terms of the sequence exceed $M$ if and only if the sequence is
unbounded above.
•  $-\infty$ is a cluster point of a sequence if and only if for each $M \in \mathcal{R}$,
infinitely many terms of the sequence are smaller than $M$ if and only if
the sequence is unbounded below.
All three sequences in Example 2.3 have more than one cluster point
and none of those sequences converges. More generally, we have the following
proposition.
PROPOSITION 2.7
A convergent sequence has exactly one cluster point, namely, its limit.
Thus, a sequence having more than one cluster point cannot converge.
PROOF: Suppose that $\{x_n\}_{n=1}^{\infty}$ is a convergent sequence of real numbers,
say, $x_n \to x$, as $n \to \infty$. We will prove that $x$ is a cluster point of the
sequence and that it is the only cluster point of the sequence. In doing
so, we will assume that $x \in \mathcal{R}$ and leave the other two cases ($x = \infty$ and
$x = -\infty$) to the reader.
To verify that $x$ is a cluster point, let $\epsilon > 0$ and $N \in \mathcal{N}$. We must
find an $n \ge N$ such that $|x - x_n| < \epsilon$. But, in fact, since $x_n \to x$, there is
a $K \in \mathcal{N}$ such that $|x - x_n| < \epsilon$ for all $n \ge K$. Let $n = \max\{N, K\}$. Then
$n \ge N$ and, because $n \ge K$, we have $|x - x_n| < \epsilon$. Thus, $x$ is a cluster
point of $\{x_n\}_{n=1}^{\infty}$.
Now we show that no real number different from $x$ can be a cluster
point of $\{x_n\}_{n=1}^{\infty}$. Let $y \in \mathcal{R}$ and $y \ne x$. Let $\epsilon = |y - x|/2$. Choose $N \in \mathcal{N}$
such that $n \ge N$ implies $|x - x_n| < \epsilon$. Then for $n \ge N$, we have
$$|y - x_n| \ge |y - x| - |x - x_n| > 2\epsilon - \epsilon = \epsilon$$
and, consequently, $y$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$.
Next we show that $\infty$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$. Choose $N \in \mathcal{N}$
such that $|x - x_n| < 1$ for $n \ge N$. Then, letting $M = x + 1$, we have that
$x_n < M$ for $n \ge N$. Thus, $\infty$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$.
Finally we show that $-\infty$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$. Choose
$N \in \mathcal{N}$ such that $|x - x_n| < 1$ for $n \ge N$. Then, letting $M = x - 1$, we have
that $x_n > M$ for $n \ge N$. Thus, $-\infty$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$.
Limit Superior and Limit Inferior
Two of the most important concepts associated with infinite sequences of
real numbers are the limit superior and the limit inferior. Although a
sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers does not necessarily have a limit (even
in $\mathcal{R}^*$), it always has both a limit superior and limit inferior. As we will
see, these two extended real numbers are cluster points of the sequence, in
fact, the largest and smallest cluster points, respectively.
First we introduce some convenient notation. For a sequence $\{x_n\}_{n=1}^{\infty}$
of real numbers, we write
$$\inf_n x_n = \inf\{ x_n : n \in \mathcal{N} \},$$
$$\sup_n x_n = \sup\{ x_n : n \in \mathcal{N} \},$$
$$\sup_{k \ge n} x_k = \sup\{ x_k : k \ge n \},$$
$$\inf_{k \ge n} x_k = \inf\{ x_k : k \ge n \}.$$
DEFINITION 2.5 Limit Superior and Limit Inferior
Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers.
a)  The limit superior of the sequence is the extended real number
given by
$$\limsup_{n \to \infty} x_n = \inf_n \sup_{k \ge n} x_k.$$
b)  The limit inferior of the sequence is the extended real number
given by
$$\liminf_{n \to \infty} x_n = \sup_n \inf_{k \ge n} x_k.$$
Note: Notations for the limit superior and limit inferior other than the
ones presented in Definition 2.5 are commonly used. They are:
$$\limsup_{n \to \infty} x_n = \limsup x_n = \overline{\lim}_{n \to \infty}\, x_n = \overline{\lim}\, x_n$$
and
$$\liminf_{n \to \infty} x_n = \liminf x_n = \underline{\lim}_{n \to \infty}\, x_n = \underline{\lim}\, x_n.$$
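EXAMPLE 2.4 Illustrates Definition 2.5
Refer to Example 2.3 on page 46.
a)  Let $x_n = (-1)^n$. Then, for each $n \in \mathcal{N}$, we have $\sup_{k \ge n} x_k = 1$ and
$\inf_{k \ge n} x_k = -1$. Therefore,
$$\inf_n \sup_{k \ge n} x_k = 1 \quad\text{and}\quad \sup_n \inf_{k \ge n} x_k = -1.$$
In other words, $\limsup x_n = 1$ and $\liminf x_n = -1$.
b)  Consider the sequence 2, 1, 0, 2, 2, 1/2, 2, 3, 2/3, 2, 4, 3/4, ..., that is,
$$x_n = \begin{cases} (n - 3)/n, & \text{if } n \equiv 0 \pmod 3; \\ 2, & \text{if } n \equiv 1 \pmod 3; \\ (n + 1)/3, & \text{if } n \equiv 2 \pmod 3. \end{cases}$$
Then, for each $n \in \mathcal{N}$, we have $\sup_{k \ge n} x_k = \infty$ and
$$\inf_{k \ge n} x_k = \begin{cases} (n - 3)/n, & \text{if } n \equiv 0 \pmod 3; \\ (n - 1)/(n + 2), & \text{if } n \equiv 1 \pmod 3; \\ (n - 2)/(n + 1), & \text{if } n \equiv 2 \pmod 3. \end{cases}$$
Therefore,
$$\inf_n \sup_{k \ge n} x_k = \infty \quad\text{and}\quad \sup_n \inf_{k \ge n} x_k = 1.$$
In other words, $\limsup x_n = \infty$ and $\liminf x_n = 1$.
c)  Let $\{r_n\}_{n=1}^{\infty}$ be an enumeration of the rational numbers. Then, for
each $n \in \mathcal{N}$, we have $\sup_{k \ge n} r_k = \infty$ and $\inf_{k \ge n} r_k = -\infty$. Hence,
$$\inf_n \sup_{k \ge n} r_k = \infty \quad\text{and}\quad \sup_n \inf_{k \ge n} r_k = -\infty.$$
In other words, $\limsup r_n = \infty$ and $\liminf r_n = -\infty$.  □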
It is helpful to note that the sequences $\{y_n\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$ defined
by $y_n = \sup_{k \ge n} x_k$ and $z_n = \inf_{k \ge n} x_k$ are, respectively, nonincreasing
and nondecreasing. Consequently, by Proposition 2.6 on page 45, both
sequences are convergent, converging to, respectively,
$$\inf_n y_n = \inf_n \sup_{k \ge n} x_k = \limsup_{n \to \infty} x_n$$
and
$$\sup_n z_n = \sup_n \inf_{k \ge n} x_k = \liminf_{n \to \infty} x_n.$$
In other words,
$$\limsup_{n \to \infty} x_n = \lim_{n \to \infty} \sup_{k \ge n} x_k \quad\text{and}\quad \liminf_{n \to \infty} x_n = \lim_{n \to \infty} \inf_{k \ge n} x_k.$$
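The monotonicity of the tail suprema and tail infima can be observed numerically. The Python sketch below uses a hypothetical example sequence, $x_n = (-1)^n (1 + 1/n)$, which does not appear in the text. For this particular sequence the supremum and infimum of each tail are attained within its first two terms, so a window of two terms gives them exactly; for a general sequence a finite window would only give an approximation. The function names `tail_sup` and `tail_inf` are likewise assumptions made for illustration.

```python
from fractions import Fraction


def x(n):
    """Hypothetical example sequence x_n = (-1)^n (1 + 1/n)."""
    return (-1) ** n * (1 + Fraction(1, n))


def tail_sup(n, window=2):
    """sup of x_k over k >= n.

    For this particular sequence |x_k| decreases, so the supremum is
    attained within the first two tail terms and a finite window is exact.
    """
    return max(x(k) for k in range(n, n + window))


def tail_inf(n, window=2):
    """inf of x_k over k >= n (exact for this sequence, as above)."""
    return min(x(k) for k in range(n, n + window))


ys = [tail_sup(n) for n in range(1, 9)]  # nonincreasing, approaching 1
zs = [tail_inf(n) for n in range(1, 9)]  # nondecreasing, approaching -1
print(ys)
print(zs)
```

Here the tail suprema decrease toward 1 and the tail infima increase toward $-1$, in agreement with the limit superior being 1 and the limit inferior being $-1$ for this sequence.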
The next two propositions characterize the limit superior and limit
inferior of a sequence of real numbers, providing both mathematical and
intuitive interpretations. We will prove the first part of the first proposition
and leave the proofs of the remaining parts of both propositions to the
reader as exercises.
PROPOSITION 2.8
Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers. We have:
a) $\limsup x_n = x \in \mathcal{R}$ if and only if for each $\epsilon > 0$,
(i) there is an $N \in \mathcal{N}$ such that $x_n < x + \epsilon$ for $n \ge N$, and
(ii) for each $n \in \mathcal{N}$, there is an $m \ge n$ such that $x_m > x - \epsilon$;
in other words, if and only if for each $\epsilon > 0$, infinitely many terms of the
sequence are within $\epsilon$ of $x$ and only finitely many are greater than $x + \epsilon$.
b) $\limsup x_n = \infty$ if and only if for each $M \in \mathcal{R}$ and $N \in \mathcal{N}$, there is an
$n \ge N$ such that $x_n > M$; in other words, if and only if the sequence is
unbounded above.
c) $\limsup x_n = -\infty$ if and only if $\lim_{n\to\infty} x_n = -\infty$.
PROOF: We prove part (a) and leave the proofs of the remaining two parts
to the reader as Exercise 2.30. Let $y_n = \sup_{k\ge n} x_k$ and recall that $\{y_n\}_{n=1}^\infty$
is nonincreasing and converges to $\limsup x_n$.
Suppose that $x \in \mathcal{R}$ and $\limsup_{n\to\infty} x_n = x$. Then $y_n \ge x$ for all
$n \in \mathcal{N}$ and $y_n \to x$ as $n \to \infty$. Let $\epsilon > 0$ be given. Choose $N \in \mathcal{N}$ such
that $n \ge N$ implies $y_n - x < \epsilon$. Then, for $n \ge N$, $x_n \le y_n < x + \epsilon$. This
establishes (i). To establish (ii), we note that if $n \in \mathcal{N}$, then $\sup_{k\ge n} x_k =
y_n \ge x$ and, hence, $\sup_{k\ge n} x_k > x - \epsilon$. This means that $x - \epsilon$ is not an
upper bound for $\{ x_k : k \ge n \}$; in other words, $x_m > x - \epsilon$ for some $m \ge n$.
Conversely, suppose that for each $\epsilon > 0$, (i) and (ii) hold. We must
prove that $\limsup x_n = x$ or, equivalently, that $\lim_{n\to\infty} y_n = x$. Let $\epsilon > 0$
be given. By (i), we can choose $N \in \mathcal{N}$ such that $x_n < x + \epsilon$ for $n \ge N$.
This implies that, for $n \ge N$, $y_n = \sup_{k\ge n} x_k \le x + \epsilon$. By (ii), we know
that for each $n \in \mathcal{N}$, there is an $m \ge n$ such that $x_m > x - \epsilon$, which implies
that, for each $n \in \mathcal{N}$, $y_n = \sup_{k\ge n} x_k > x - \epsilon$. Thus, we have proved that,
for each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that $|y_n - x| \le \epsilon$ whenever $n \ge N$;
in other words, $\lim_{n\to\infty} y_n = x$. ∎
PROPOSITION 2.9
Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers. We have:
a) $\liminf x_n = x \in \mathcal{R}$ if and only if for each $\epsilon > 0$,
(i) there is an $N \in \mathcal{N}$ such that $x_n > x - \epsilon$ for $n \ge N$, and
(ii) for each $n \in \mathcal{N}$, there is an $m \ge n$ such that $x_m < x + \epsilon$;
in other words, if and only if for each $\epsilon > 0$, infinitely many terms of the
sequence are within $\epsilon$ of $x$ and only finitely many are less than $x - \epsilon$.
b) $\liminf x_n = -\infty$ if and only if for each $M \in \mathcal{R}$ and $N \in \mathcal{N}$, there is an
$n \ge N$ such that $x_n < M$; in other words, if and only if the sequence is
unbounded below.
c) $\liminf x_n = \infty$ if and only if $\lim_{n\to\infty} x_n = \infty$.
We mentioned earlier that the limit superior and limit inferior are,
respectively, the largest and smallest cluster points of a sequence. This is
illustrated by Examples 2.3 and 2.4 (pages 46 and 48) and is proved in our
next proposition.
PROPOSITION 2.10
Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers. Then,
a) $\limsup x_n$ is the largest cluster point of $\{x_n\}_{n=1}^\infty$.
b) $\liminf x_n$ is the smallest cluster point of $\{x_n\}_{n=1}^\infty$.
PROOF: We prove part (a) and leave the proof of part (b) to the reader as
an exercise. Let $x = \limsup x_n$. It follows immediately from the definition
of cluster point (Definition 2.4 on page 45) and Proposition 2.8 that $x$ is a
cluster point of $\{x_n\}_{n=1}^\infty$.
It remains to prove that $x$ is the largest cluster point of $\{x_n\}_{n=1}^\infty$. If
$x = \infty$, there is nothing to prove. If $x = -\infty$, then Proposition 2.8(c)
shows that $\lim_{n\to\infty} x_n = -\infty$. Therefore, by Proposition 2.7 on page 47,
$-\infty$ is the only cluster point of $\{x_n\}_{n=1}^\infty$ and, hence, the largest.
So, we can assume that $x \in \mathcal{R}$. By Proposition 2.8(a), only finitely
many terms of the sequence exceed $x + 1$; consequently, $\infty$ is not a cluster
point. It therefore remains to prove that if $y \in \mathcal{R}$ and $y > x$, then $y$ is not
a cluster point of $\{x_n\}_{n=1}^\infty$. Let $\epsilon = (y - x)/2$. Applying Proposition 2.8(a)
again, we know that only finitely many terms of $\{x_n\}_{n=1}^\infty$ exceed $x + \epsilon$ or,
equivalently, $y - \epsilon$. This shows that $y$ is not a cluster point. ∎
The following proposition is often useful. The sufficiency part of the
proposition enables us to prove that a sequence converges without explicitly
finding its limit, and the necessity part often makes it easy to show that a
sequence does not converge.
PROPOSITION 2.11
A necessary and sufficient condition for a sequence of real numbers to
converge is that its limit superior and limit inferior are equal. In such
cases, the sequence converges to the common value of the limit superior
and limit inferior.
PROOF: Suppose $\{x_n\}_{n=1}^\infty$ converges. Then, by Proposition 2.7, the sequence
has a unique cluster point, namely, its limit. Since the limit superior
and limit inferior are both cluster points (Proposition 2.10), they must both
equal the limit of the sequence and, hence, each other.
Conversely, suppose that $\limsup x_n = \liminf x_n$. Then, by Proposition 2.10,
$\{x_n\}_{n=1}^\infty$ has exactly one cluster point, namely, the common value
of the limit superior and limit inferior. Call that common value $x$. We claim
that $\lim_{n\to\infty} x_n = x$. If $x = -\infty$, the result is true by Proposition 2.8(c)
on page 50, whereas if $x = \infty$, the result is true by Proposition 2.9(c) on
page 50.
Hence, it remains to show that if $x \in \mathcal{R}$, then $\lim_{n\to\infty} x_n = x$. Let
$\epsilon > 0$. Then, by Proposition 2.8(a), there is an $N_1 \in \mathcal{N}$ such that $x_n < x + \epsilon$
for $n \ge N_1$ and, by Proposition 2.9(a), there is an $N_2 \in \mathcal{N}$ such that
$x_n > x - \epsilon$ for $n \ge N_2$. Set $N = \max\{N_1, N_2\}$. Then, for $n \ge N$, we have
that $x - \epsilon < x_n < x + \epsilon$, that is, $|x - x_n| < \epsilon$. ∎
Proposition 2.7 states that a convergent sequence has exactly one clus-
ter point, namely, its limit. It follows immediately from Propositions 2.10
and 2.11 that the converse is true. In other words, we have the following.
PROPOSITION 2.12
A sequence of real numbers converges if and only if it has exactly one
cluster point. In such cases, the limit of the sequence is the unique cluster
point.
Cauchy Sequences
Proposition 2.11 provides a criterion for determining whether a sequence of
real numbers converges. A special case of that criterion is that a sequence
converges in $\mathcal{R}$ if and only if its limit superior and limit inferior are equal
and finite.
Another criterion for determining whether the limit of a sequence exists
and is finite is the Cauchy criterion. Roughly speaking, a sequence is
a Cauchy sequence if the terms of the sequence become closer and closer
together as the sequence progresses. More precisely, we have the following
definition.
DEFINITION 2.6 Cauchy Sequence
A sequence $\{x_n\}_{n=1}^\infty$ of real numbers is called a Cauchy sequence if
for each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that $|x_n - x_m| < \epsilon$ whenever
$n, m \ge N$.
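To see the definition at work, consider the sequence $x_n = 1/n$, an example chosen here and not taken from the text. If $n, m \ge N$, then both terms lie in $(0, 1/N]$, so $|x_n - x_m| < 1/N$; hence any $N \ge 1/\epsilon$ satisfies the definition. The Python sketch below computes such an $N$ and spot-checks the inequality on a finite range of indices. The function names and the size of the range are assumptions, and a finite check of this kind illustrates the definition without proving anything.

```python
from fractions import Fraction
from math import ceil

def x(n):
    """Sample sequence x_n = 1/n."""
    return Fraction(1, n)

def cauchy_index(eps):
    """Return an N with |x_n - x_m| < eps for all n, m >= N (for x_n = 1/n).

    Both terms lie in (0, 1/N], so their difference is less than 1/N,
    and N >= 1/eps gives 1/N <= eps.
    """
    return ceil(1 / eps)

def check(eps, span=200):
    """Spot-check the Cauchy inequality for N <= n, m < N + span."""
    N = cauchy_index(eps)
    return all(abs(x(n) - x(m)) < eps
               for n in range(N, N + span)
               for m in range(N, N + span))

N_example = cauchy_index(Fraction(1, 100))
```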
With Definition 2.6 in mind, we now state and prove the Cauchy cri-
terion for convergence of sequences of real numbers.
THEOREM 2.1 Cauchy Criterion
A sequence of real numbers converges in $\mathcal{R}$ if and only if it is a Cauchy
sequence.
PROOF: Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers. Suppose that the
limit of the sequence exists and is finite, say, $x$. Let $\epsilon > 0$ be given. Then
we can choose $N \in \mathcal{N}$ such that $n \ge N$ implies $|x - x_n| < \epsilon/2$. Therefore,
if $n, m \ge N$, we have
$$|x_n - x_m| \le |x_n - x| + |x - x_m| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
Thus, $\{x_n\}_{n=1}^\infty$ is a Cauchy sequence.
Conversely, suppose that $\{x_n\}_{n=1}^\infty$ is a Cauchy sequence. Let $\epsilon > 0$
be given. Then we can choose $N \in \mathcal{N}$ such that $|x_n - x_m| < \epsilon$ whenever
$n, m \ge N$. In particular, taking $m = N$, we have that
$$x_N - \epsilon < x_n < x_N + \epsilon, \qquad n \ge N. \tag{2.1}$$
From (2.1) and Exercise 2.29(b) on page 55, we see that both $\limsup x_n$
and $\liminf x_n$ lie in the interval $[x_N - \epsilon, x_N + \epsilon]$. This shows that both
$\limsup x_n$ and $\liminf x_n$ are finite and that
$$0 \le \limsup x_n - \liminf x_n \le 2\epsilon.$$
As $\epsilon > 0$ was chosen arbitrarily, we conclude that $\liminf x_n = \limsup x_n$
and that their common value is a real number. So, by Proposition 2.11,
$\lim_{n\to\infty} x_n$ exists and is finite. ∎
EXERCISES 2.2
2.12	Prove that the limit of a sequence of real numbers, if it exists, must be
unique.
2.13 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be two sequences of real numbers whose limits
exist and are finite. Also, let $c \in \mathcal{R}$. Prove that each of the following holds.
a) $\lim_{n\to\infty} (x_n + y_n) = \lim_{n\to\infty} x_n + \lim_{n\to\infty} y_n$
b) $\lim_{n\to\infty} c x_n = c \cdot \lim_{n\to\infty} x_n$
c) $\lim_{n\to\infty} (x_n y_n) = \lim_{n\to\infty} x_n \cdot \lim_{n\to\infty} y_n$
2.14 Refer to Exercise 2.13. Decide under which conditions each of (a)-(c) holds
if convergence is allowed in the extended sense, that is, in $\mathcal{R}^*$.
2.15 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be two convergent sequences of real numbers such
that $x_n \le y_n$ for $n$ sufficiently large, that is, there is an $N \in \mathcal{N}$ such that
$x_n \le y_n$ for $n \ge N$. Prove that $\lim_{n\to\infty} x_n \le \lim_{n\to\infty} y_n$.
2.16	Prove Proposition 2.6 on page 45.
2.17 Refer to Definition 2.4 on page 45. Let $\{x_n\}_{n=1}^\infty$ be a sequence of real
numbers. Prove each of the following.
a) A real number $x$ is a cluster point of $\{x_n\}_{n=1}^\infty$ if and only if for each
$\epsilon > 0$, infinitely many terms of the sequence are within $\epsilon$ of $x$.
b) $\infty$ is a cluster point of $\{x_n\}_{n=1}^\infty$ if and only if for each $M \in \mathcal{R}$, infinitely
many terms of the sequence exceed $M$ if and only if the sequence is
unbounded above.
c) $-\infty$ is a cluster point of $\{x_n\}_{n=1}^\infty$ if and only if for each $M \in \mathcal{R}$, infinitely
many terms of the sequence are smaller than $M$ if and only if
the sequence is unbounded below.
2.18 Find the cluster points of each of the following sequences.
a) $\{1/n\}_{n=1}^\infty$
b) $\{1 + (-1)^n\}_{n=1}^\infty$
c) $\{\sin(n\pi/2)\}_{n=1}^\infty$
2.19 Consider the sequence $\{x_n\}_{n=1}^\infty$ defined by
$$x_n = \begin{cases} 1, & \text{if } n \equiv 0 \pmod 3; \\ 2, & \text{if } n \equiv 1 \pmod 3; \\ n + 2, & \text{if } n \equiv 2 \pmod 3. \end{cases}$$
Determine the cluster points of the sequence.
2.20 Consider the sequence $\{x_n\}_{n=1}^\infty$ defined by
$$x_n = \begin{cases} n, & \text{if } n \text{ is odd}; \\ (n-1)/n, & \text{if } n \text{ is even}. \end{cases}$$
Determine the cluster points of the sequence.
2.21 Let $\{r_n\}_{n=1}^\infty$ be an enumeration of the rational numbers. Prove that every
extended real number is a cluster point of this sequence.
2.22 Let $r$ be a rational number, say, $r = p/q$ where $p$ and $q$ are integers with
no common divisors. Define $x_n = nr - [nr]$, where $[x]$ denotes the greatest
integer in $x$. Determine the cluster points of $\{x_n\}_{n=1}^\infty$.
2.23 Let $c$ be an irrational number and define $x_n = nc - [nc]$, where $[x]$ denotes
the greatest integer in $x$. Determine the cluster points of $\{x_n\}_{n=1}^\infty$ by
proceeding as follows.
a) Show that the terms of the sequence are distinct, that is, $x_n = x_m$
implies $n = m$.
b) Prove that for each $\epsilon > 0$ and $N \in \mathcal{N}$, there is an $n \ge N$ such that
$0 < x_n < \epsilon$. Hint: Use the Archimedean principle to choose an $m \in \mathcal{N}$
such that $1/m < \epsilon$. For $1 \le k \le m$, let $I_k = [(k-1)/m, k/m)$ and note that
the $I_k$s are disjoint, their union is $[0,1)$, and each has length $1/m$. Now
consider $\{ x_j : j = 1, N+1, 2N+1, \ldots, mN+1 \}$ and observe that, by
part (a), this set consists of $m + 1$ distinct numbers in $[0,1)$.
c) Let $x \in [0,1)$. Prove that for each $\epsilon > 0$ and $N \in \mathcal{N}$, there is an $n \ge N$
such that $|x - x_n| < \epsilon$. Hint: Choose an $m \in \mathcal{N}$ such that $1/m < \epsilon$.
Apply part (b) to choose an $n \ge N$ such that $0 < x_n < 1/m$. Let $k$ be
the unique integer between 1 and $m$ such that $(k-1)/m \le x < k/m$.
Now let $\ell$ be the largest positive integer such that $\ell x_n < k/m$.
d) Obtain the cluster points of $\{x_n\}_{n=1}^\infty$.
2.24	Complete the proof of Proposition 2.7 on page 47 by showing that a sequence
converging to $\infty$ or $-\infty$ has that value as its unique cluster point.
2.25 Prove that $\inf_{k\ge n} x_k \le \sup_{k\ge m} x_k$ for all $n, m \in \mathcal{N}$, where $\{x_n\}_{n=1}^\infty$ is any
sequence of real numbers.
2.26 Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers and $c$ a real number. Show that
a) $\limsup (c + x_n) = c + \limsup x_n$.
b) $\liminf (c + x_n) = c + \liminf x_n$.
c) $\limsup (c x_n) = \begin{cases} c \limsup x_n, & \text{if } c > 0; \\ c \liminf x_n, & \text{if } c < 0. \end{cases}$
d) $\liminf (c x_n) = \begin{cases} c \liminf x_n, & \text{if } c > 0; \\ c \limsup x_n, & \text{if } c < 0. \end{cases}$
Note that as special cases of parts (c) and (d), we have
$$\limsup (-x_n) = -\liminf x_n \quad\text{and}\quad \liminf (-x_n) = -\limsup x_n.$$
2.27 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be sequences of real numbers. Verify that each
of the following holds, provided the right-hand side makes sense.
a) $\limsup (x_n + y_n) \le \limsup x_n + \limsup y_n$.
b) $\limsup (x_n + y_n) \ge \limsup x_n + \liminf y_n$.
c) $\liminf (x_n + y_n) \ge \liminf x_n + \liminf y_n$.
d) $\liminf (x_n + y_n) \le \limsup x_n + \liminf y_n$.
2.28 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be sequences of real numbers and assume that
$\lim_{n\to\infty} y_n$ exists and is finite. Prove that
a) $\limsup (x_n + y_n) = \limsup x_n + \lim y_n$.
b) $\liminf (x_n + y_n) = \liminf x_n + \lim y_n$.
2.29 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be sequences of real numbers. Suppose $x_n \le y_n$
for $n$ sufficiently large; that is, there is an $N \in \mathcal{N}$ such that $x_n \le y_n$
for $n \ge N$.
a) Prove that $\limsup x_n \le \limsup y_n$ and $\liminf x_n \le \liminf y_n$.
b) Suppose $a$ and $b$ are extended real numbers such that for $n$ sufficiently
large, $a \le x_n \le b$. Show that $a \le \liminf x_n \le \limsup x_n \le b$.
2.30	Refer to Proposition 2.8 on page 50. Prove parts (b) and (c).
2.31	Prove Proposition 2.9 on page 50.
★2.32 Prove that $\lim_{n\to\infty} x_n = x$ if and only if every subsequence of $\{x_n\}_{n=1}^\infty$ has
a subsequence that converges to $x$.
2.33 Prove that an extended real number is a cluster point of a sequence if and
only if the sequence has a subsequence converging to that number. Conclude
that the limit superior of a sequence is the limit of a subsequence of the
sequence and likewise for the limit inferior.
2.34 Provide an example of a sequence of real numbers that converges in $\mathcal{R}^*$ but
is not a Cauchy sequence.
★2.35 Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers. Define
$$a_n = \frac{x_1 + \cdots + x_n}{n} = \frac{1}{n} \sum_{k=1}^{n} x_k,$$
so that $a_n$ is the arithmetic mean of the first $n$ terms of $\{x_n\}_{n=1}^\infty$.
a) Prove that
$$\liminf_{n\to\infty} x_n \le \liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n \le \limsup_{n\to\infty} x_n.$$
b) Prove that if $\{x_n\}_{n=1}^\infty$ converges, then so does $\{a_n\}_{n=1}^\infty$ and, in fact,
$\lim_{n\to\infty} a_n = \lim_{n\to\infty} x_n$.
c) Show that the converse of part (b) fails.
2.36 In this exercise, we will discuss infinite series. Let $\{x_n\}_{n=1}^\infty$ be a sequence
of real numbers. The sequence $\{s_n\}_{n=1}^\infty$ defined by
$$s_n = \sum_{k=1}^{n} x_k, \qquad n \in \mathcal{N},$$
is called the sequence of partial sums of $\{x_n\}_{n=1}^\infty$. If the sequence $\{s_n\}_{n=1}^\infty$
converges to a real number, say, $s$, then we say that $\{x_n\}_{n=1}^\infty$ is summable
to $s$ or that the infinite series $\sum_{n=1}^\infty x_n$ converges to $s$, and we write
$$s = \sum_{n=1}^{\infty} x_n.$$
We also say that $s$ is the sum of the infinite series. If the sequence $\{s_n\}_{n=1}^\infty$
does not converge to a real number, then we say that $\{x_n\}_{n=1}^\infty$ is not
summable or that the infinite series $\sum_{n=1}^\infty x_n$ diverges. For brevity, we
often write $\sum x_n$ in place of $\sum_{n=1}^\infty x_n$.
a) Prove that if $x_n \ge 0$ for each $n \in \mathcal{N}$, then either $\lim_{n\to\infty} s_n = \infty$ or
$\sum x_n$ converges.
b) Show that if $\sum x_n$ converges, then $\lim_{n\to\infty} x_n = 0$.
c) Show that if $\sum x_n$ converges, then $\lim_{n\to\infty} \sum_{k=n}^{\infty} x_k = 0$.
d) Prove that if $\sum |x_n|$ converges, then so does $\sum x_n$. Hint: Use the
Cauchy criterion.
★2.37 In this exercise, we will consider generalized sums. Let $I$ be a nonempty
set and $\{x_\iota\}_{\iota \in I}$ an indexed collection of nonnegative real numbers, that is,
$x_\iota \ge 0$ for each $\iota \in I$. Define
$$\sum_{\iota \in I} x_\iota = \sup \Bigl\{ \sum_{\iota \in F} x_\iota : F \text{ finite},\ F \subset I \Bigr\}, \tag{2.2}$$
where each sum in the set on the right is the ordinary sum of a finite
collection of real numbers.
a) Suppose that $I = \{1, \ldots, n\}$. Show that $\sum_{\iota \in I} x_\iota = \sum_{k=1}^{n} x_k$, where the
term on the left is interpreted as in (2.2).
b) Suppose that $I = \mathcal{N}$. Show that $\sum_{\iota \in I} x_\iota = \sum_{n=1}^{\infty} x_n$, where the term
on the left is interpreted as in (2.2) and the term on the right as the sum
of the infinite series if it converges and $\infty$ otherwise.
c) Show that if $\sum_{\iota \in I} x_\iota < \infty$, then $\{ \iota \in I : x_\iota > 0 \}$ is countable. Note:
This result is often applied in the following form: If $f : \Omega \to [0, \infty)$ is
such that $\sum_{x \in \Omega} f(x) < \infty$, then $\{ x : f(x) > 0 \}$ is countable.
2.3 OPEN AND CLOSED SETS
In this section, we will discuss open and closed sets of real numbers. These
sets not only play a significant role in classical analysis but, as we will see
throughout this book, figure prominently in many areas of modern analysis.
We begin with the definition of an open set. Roughly speaking, a set $O$
of real numbers is open if for each $x \in O$, we can remain in $O$ by staying
sufficiently close to $x$. More precisely, we have the following definition.
DEFINITION 2.7 Open Set
A subset $O \subset \mathcal{R}$ is said to be an open set if for each $x \in O$, there is
an $r > 0$ such that $(x - r, x + r) \subset O$. In other words, $O$ is open if for
each $x \in O$, there is an $r > 0$ such that all numbers within $r$ of $x$ are
also members of $O$.
EXAMPLE 2.5 Illustrates Definition 2.7
a) Any interval of the form $(a, b)$, where $-\infty \le a < b \le \infty$, is an open set.
Therefore, such intervals are called open intervals.
b) The interval $(0,1]$ is not open, because $(1 - r, 1 + r) \not\subset (0,1]$ for all $r > 0$.
Similarly, neither $[0,1)$ nor $[0,1]$ is an open set.
c) Let $K$ be a nonempty countable subset of $\mathcal{R}$. Then $K$ is not open.
Indeed, a nonempty open set must contain an open interval and such
an interval is uncountable, as we know from Exercise 1.26 on page 24.
In particular, then, $\mathcal{N}$, $\mathcal{Z}$, and $\mathcal{Q}$ are not open, and no nonempty finite
set is open.
d) The set, $\mathcal{Q}^c$, of irrational numbers is not open. If it were, then it would
have to contain an open interval. Such an interval would contain no
rational numbers, which is impossible by Proposition 2.4 on page 39. □
Our next theorem displays three fundamental properties of the collec-
tion of open sets. As we will see in Section 7.1 (beginning on page 411),
these three properties are precisely the ones needed to generalize the con-
cept of open sets to other frameworks in the form of topological spaces.
THEOREM 2.2
a) $\mathcal{R}$ and $\emptyset$ are open sets.
b) If $A$ and $B$ are open sets, then so is $A \cap B$.
c) If $\{O_\iota\}_{\iota \in I}$ is a collection of open sets, then $\bigcup_{\iota \in I} O_\iota$ is open.
PROOF: The proof of (a) is trivial. For (b), suppose $A$ and $B$ are open
sets. We must show that $A \cap B$ is open. Let $x \in A \cap B$. Then $x \in A$
and $x \in B$. Since $A$ and $B$ are open, there exist $r_1, r_2 > 0$ such that
$(x - r_1, x + r_1) \subset A$ and $(x - r_2, x + r_2) \subset B$. Let $r = \min\{r_1, r_2\}$. Then we
have $(x - r, x + r) \subset A$ and $(x - r, x + r) \subset B$ so that $(x - r, x + r) \subset A \cap B$.
Hence, $A \cap B$ is open.
Now we prove (c). Suppose $\{O_\iota\}_{\iota \in I}$ is a collection of open sets and let
$O = \bigcup_{\iota \in I} O_\iota$. We must show that $O$ is open. Let $x \in O$. Then $x \in O_\iota$ for
some $\iota \in I$, say, $\iota_0$. Since $x \in O_{\iota_0}$ and $O_{\iota_0}$ is open, there is an $r > 0$ such
that $(x - r, x + r) \subset O_{\iota_0}$. Consequently, because $O_{\iota_0} \subset O$, we have that
$(x - r, x + r) \subset O$. Thus, $O$ is open. ∎
Theorem 2.2(b) shows that the intersection of two open sets is open. It
follows easily by induction that the intersection of a finite number of open
sets is open; that is, if $O_k$ is an open set for $k = 1, 2, \ldots, n$, then $\bigcap_{k=1}^{n} O_k$
is also an open set. However, the extension to arbitrary (even countable)
collections is not valid. Indeed, for each $n \in \mathcal{N}$, let $O_n = (-1/n, 1/n)$.
Then each $O_n$ is open but $\bigcap_{n=1}^{\infty} O_n = \{0\}$ is not open.
We have seen that if $a, b \in \mathcal{R}^*$ with $a < b$, then the interval $(a, b)$ is
an open set, called an open interval. It follows from Theorem 2.2(c) that
unions of collections of open intervals are open sets. As the next proposition
shows, all open sets are of this form.
PROPOSITION 2.13
Each open set $O$ is a countable union of disjoint open intervals. The representation
is unique in the sense that if $\mathcal{C}$ and $\mathcal{D}$ are two pairwise disjoint
collections of open intervals whose union is $O$, then $\mathcal{C} = \mathcal{D}$.
PROOF: Let $O$ be an open set. We first show that $O$ can be expressed
as a union of open intervals. The idea is this: For each $x \in O$, go as far
as possible in either direction from $x$ without leaving $O$; this will yield an
open interval containing $x$ and contained in $O$. The union of these open
intervals will equal $O$.
More formally, let $x \in O$. Define $A_x = \{ y : y < x \text{ and } (y, x) \subset O \}$
and $B_x = \{ z : z > x \text{ and } (x, z) \subset O \}$. The sets $A_x$ and $B_x$ are nonempty
because $O$ is open. Let $a_x = \inf A_x$ and $b_x = \sup B_x$.
Here are two properties that we will need. First, $a_x < x < b_x$. This is
true because if $y \in A_x$, then $a_x \le y < x$ and, so, $a_x < x$; similarly, $x < b_x$.
Second, $a_x, b_x \notin O$. To see this, suppose to the contrary that $a_x \in O$. Then
$(a_x - r, a_x + r) \subset O$ for some $r > 0$, and we can always choose $r < x - a_x$.
Since $a_x + r > a_x$, there is a $y \in A_x$ such that $y < a_x + r$ and, since $y \in A_x$,
$(y, x) \subset O$. It follows that $(a_x - r, x) = (a_x - r, a_x + r) \cup (y, x) \subset O$ and,
hence, that $a_x - r \in A_x$. But this is impossible because $a_x$ is a lower bound
for $A_x$. Thus, $a_x \notin O$ and, similarly, $b_x \notin O$.
Set $I_x = (a_x, b_x)$ and note that $x \in I_x$. We claim that $I_x \subset O$. Let
$u \in I_x$; then $a_x < u < b_x$. Thus, we can choose $y \in A_x$ and $z \in B_x$
such that $y < u < z$. If $u \le x$, then $u \in (y, x] \subset O$ and, if $u > x$, then
$u \in (x, z) \subset O$. Hence, $I_x \subset O$. We can now conclude that $\bigcup_{x \in O} I_x \subset O$.
On the other hand, as $x \in I_x$, $O \subset \bigcup_{x \in O} I_x$. Thus, $O = \bigcup_{x \in O} I_x$.
Next we show that either $I_x \cap I_y = \emptyset$ or $I_x = I_y$. So, suppose that
$I_x \cap I_y \ne \emptyset$. Then $a_x < b_y$ and $a_y < b_x$. Since $a_x \notin O$, we have $a_x \notin (a_y, b_y)$
and, so, $a_x \le a_y$. Similarly, $a_y \le a_x$. Thus, $a_x = a_y$. Likewise, $b_x = b_y$.
Hence, $I_x = I_y$.
Now, let $\mathcal{C} = \{ I_x : x \in O \}$. Then, as we have seen, $\mathcal{C}$ is pairwise disjoint
and $\bigcup_{A \in \mathcal{C}} A = O$. We claim that $\mathcal{C}$ is countable. Let $A \in \mathcal{C}$. Because
$A$ is an open interval, we can, by the density of the rational numbers, select
a rational number $r_A \in A$. Define $f : \mathcal{C} \to \mathcal{Q}$ by $f(A) = r_A$. This function
is one-to-one because $\mathcal{C}$ is pairwise disjoint. Hence, $\mathcal{C}$ is equivalent to a
subset of $\mathcal{Q}$ and, consequently, is countable.
We leave the proof of the uniqueness of the representation as an exercise
for the reader. ∎
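For a finite union of bounded open intervals, the disjoint open intervals of Proposition 2.13 can be computed by sorting and merging. The Python sketch below is an illustration under that finiteness assumption, with a hypothetical function name `components`; it is not the construction used in the proof. Note that $(a, b)$ and $(b, c)$ are not merged, because the point $b$ belongs to neither interval.

```python
def components(intervals):
    """Disjoint open intervals whose union equals the union of the given
    open intervals (a, b).

    Each interval is a pair (a, b) of real numbers; pairs with a >= b
    describe the empty set and are dropped.
    """
    nonempty = sorted((a, b) for a, b in intervals if a < b)
    result = []
    for a, b in nonempty:
        if result and a < result[-1][1]:
            # (a, b) overlaps the last component, so extend that component.
            last_a, last_b = result[-1]
            result[-1] = (last_a, max(last_b, b))
        else:
            # No earlier or later interval contains the right endpoint of
            # the last component, so a new component starts here.
            result.append((a, b))
    return result

parts = components([(0, 1), (1, 2), (0.5, 0.75), (3, 5), (4, 6), (7, 7)])
```

In this example the union splits into three components, with $(0,1)$ and $(1,2)$ kept apart since 1 is not in the union.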
Closed Sets
Open sets constitute an important class of sets. Another important class
of sets comprises the closed sets. To begin our discussion of closed sets, we
make the following definition.
DEFINITION 2.8 Limit Point, Closure
Let $E \subset \mathcal{R}$. A real number $x$ is called a limit point† of $E$ if for each
$\epsilon > 0$, there is a $y \in E$ such that $|y - x| < \epsilon$. The set of all limit points
of $E$, denoted $\overline{E}$, is called the closure of $E$.
It is easy to see that each of the following two conditions is equivalent
to $x$ being a limit point of $E$ (i.e., $x \in \overline{E}$).
• Each open interval containing $x$ contains a member of $E$; that is, if $I$ is
an open interval such that $x \in I$, then $I \cap E \ne \emptyset$.
• There is a sequence $\{x_n\}_{n=1}^\infty$ of elements of $E$ such that $\lim_{n\to\infty} x_n = x$;
thus, the terminology "$x$ is a limit point of $E$."
EXAMPLE 2.6 Illustrates Definition 2.8
We leave the verification of each part that follows to the reader.
a) $\overline{\mathcal{R}} = \mathcal{R}$ and $\overline{\emptyset} = \emptyset$.
b) Let $a, b \in \mathcal{R}$ with $a < b$. Then $\overline{(a, b)} = \overline{[a, b)} = \overline{(a, b]} = \overline{[a, b]} = [a, b]$.
c) $\overline{\mathcal{N}} = \mathcal{N}$ and $\overline{\mathcal{Z}} = \mathcal{Z}$.
d) $\overline{\mathcal{Q}} = \mathcal{R}$ and $\overline{\mathcal{Q}^c} = \mathcal{R}$.
e) If $A$ is a finite subset of $\mathcal{R}$, then $\overline{A} = A$. □
Note that every point of a set $E$ is a limit point of $E$, that is, $E \subset \overline{E}$.
However, the converse is not true: there may be limit points of $E$ that
do not belong to $E$. For instance, 1 is a limit point of $[0,1)$ but does not
belong to that set. If a set contains all its limit points, it is called closed.
† Some texts use the term point of closure instead of limit point and reserve the
term limit point for a related concept.
DEFINITION 2.9 Closed Set
A subset $F \subset \mathcal{R}$ is said to be a closed set if $\overline{F} = F$, that is, if
$F$ contains all its limit points.
EXAMPLE 2.7 Illustrates Definition 2.9
Referring to Example 2.6, we conclude the following:
a) $\mathcal{R}$ and $\emptyset$ are closed sets. But we also know from Theorem 2.2(a) that $\mathcal{R}$
and $\emptyset$ are open sets. We leave it to the reader as an exercise to show
that these are the only two subsets of $\mathcal{R}$ that are both open and closed.
(See Exercise 2.46.)
b) The intervals of $\mathcal{R}$ that are closed are those of the form $[a, b]$, $[a, \infty)$,
and $(-\infty, b]$, where $a, b \in \mathcal{R}$. Such intervals are called closed intervals.
Note: Intervals of the form $(a, b]$ and $[a, b)$, where $a, b \in \mathcal{R}$, are called
half-open intervals. Degenerate intervals of the form $[a, a]$ are closed
sets; degenerate intervals of the form $(a, a)$, $(a, a]$, and $[a, a)$ are empty
and, hence, both open and closed.
c) $\mathcal{N}$ and $\mathcal{Z}$ are closed.
d) Neither $\mathcal{Q}$ nor $\mathcal{Q}^c$ is closed.
e) Any finite subset of $\mathcal{R}$ is closed.
f) A set may be neither open nor closed; examples are $\mathcal{Q}$, $\mathcal{Q}^c$, and any
half-open interval. □
The fundamental relationship between open and closed sets is eluci-
dated by the following proposition.
PROPOSITION 2.14
A set is open if and only if its complement is closed or, equivalently, a set
is closed if and only if its complement is open.
PROOF: Suppose that $E^c$ is open. We will show that $E$ is closed by
proving that it contains all its limit points. So, assume that $x \notin E$, that is,
$x \in E^c$. Since $E^c$ is open, there is an $r > 0$ such that $(x - r, x + r) \subset E^c$.
But then $(x - r, x + r)$ is an open interval about $x$ containing no points
of $E$; hence, $x \notin \overline{E}$. We have therefore shown that $\overline{E} \subset E$, as required.
Conversely, suppose that $E$ is closed. If $x \in E^c$, then $x \notin E$ and,
consequently, since $E$ is closed, we have $x \notin \overline{E}$. Hence, there is an $\epsilon > 0$
such that $(x - \epsilon, x + \epsilon) \subset E^c$. We have thus shown that $E^c$ is open. ∎
Open and Closed Sets of a Subset of $\mathcal{R}$
Frequently, our "universal set" will be a proper subset of $\mathcal{R}$. Therefore, we
need to discuss open and closed sets of a subset $D \subset \mathcal{R}$.
DEFINITION 2.10 Open Set of D
Let $D \subset \mathcal{R}$. A subset $G \subset D$ is said to be open in $D$ if for each
$x \in G$, there is an $r > 0$ such that $(x - r, x + r) \cap D \subset G$. Thus, $G$ is
an open subset of $D$ if for each $x \in G$, there is an $r > 0$ such that all
numbers within $r$ of $x$ that are members of $D$ are also members of $G$.
EXAMPLE 2.8 Illustrates Definition 2.10
a) Let $D = [0,2]$. Then the interval $[0,1)$ is open in $D$. Note, however,
that it is not open in $\mathcal{R}$.
b) Let $D = [0,2]$. Then the interval $[0,1]$ is not open in $D$ because, for
each $r > 0$, we have $(1 - r, 1 + r) \cap [0,2] \not\subset [0,1]$.
c) Let $D = \mathcal{N}$. Then every subset $A \subset \mathcal{N}$ is open in $\mathcal{N}$. Indeed, if $n \in A$,
then $(n - \tfrac12, n + \tfrac12) \cap \mathcal{N} = \{n\} \subset A$. □
The following theorem provides the relationship between open sets of
a subset of $\mathcal{R}$ and open sets of $\mathcal{R}$.
THEOREM 2.3
Let $D \subset \mathcal{R}$. A set $G \subset D$ is open in $D$ if and only if there is an open set $O$
of $\mathcal{R}$ such that $G = D \cap O$. In other words, the open sets in $D$ are precisely
the open sets of $\mathcal{R}$ intersected with $D$.
PROOF: Suppose $G \subset D$ is open in $D$. Then, for each $x \in G$, there is
an open interval $I_x$ (open in $\mathcal{R}$) containing $x$ such that $I_x \cap D \subset G$. Let
$O = \bigcup_{x \in G} I_x$. Then, by Theorem 2.2(c), $O$ is open in $\mathcal{R}$. We will show
that $G = D \cap O$. If $x \in G$, then $x \in D$ and $x \in I_x \subset O$; thus, $G \subset D \cap O$.
On the other hand, since $I_x \cap D \subset G$ for all $x \in G$, we have
$$D \cap O = D \cap \Bigl( \bigcup_{x \in G} I_x \Bigr) = \bigcup_{x \in G} (I_x \cap D) \subset G.$$
Hence, $G = D \cap O$, as required.
Conversely, suppose $G = D \cap O$ for some open set $O$ of $\mathcal{R}$. If $x \in G$,
then $x \in O$ and, hence, there is an $r > 0$ such that $(x - r, x + r) \subset O$.
This, in turn, implies that $(x - r, x + r) \cap D \subset O \cap D = G$. Hence $G$ is
open in $D$. ∎
Limit points, closure, and closed sets in $D$ are defined in a way analogous
to that in $\mathcal{R}$. We leave the details to the reader in Exercise 2.52.
EXERCISES 2.3
2.38 Prove that the intersection of a finite number of open sets is open; that is,
if $O_k$ is an open set for $k = 1, 2, \ldots, n$, then $\bigcap_{k=1}^{n} O_k$ is also an open set.
2.39 Prove the uniqueness portion of Proposition 2.13 on page 59.
2.40 Prove Lindelöf's theorem: Let $\mathcal{O}$ be a collection of open sets. Then
there is a countable subcollection $\{O_n\}_n$ of $\mathcal{O}$ such that
$$\bigcup_{O \in \mathcal{O}} O = \bigcup_n O_n.$$
2.41 Let $E \subset \mathcal{R}$ and $x \in \mathcal{R}$. Prove that the following are equivalent.
a) $x \in \overline{E}$ (i.e., $x$ is a limit point of $E$).
b) Each open interval containing $x$ contains a member of $E$; that is, if $I$ is
an open interval such that $x \in I$, then $I \cap E \ne \emptyset$.
c) Each open set containing $x$ contains a member of $E$; that is, if $O$ is an
open set such that $x \in O$, then $O \cap E \ne \emptyset$.
d) There is a sequence $\{x_n\}_{n=1}^\infty$ of elements of $E$ such that $\lim_{n\to\infty} x_n = x$.
2.42 Refer to Example 2.6 on page 60. Verify each of the statements made in
that example.
2.43 Let $E \subset \mathcal{R}$. A real number $x$ is called an accumulation point of $E$ if
for each $\epsilon > 0$, there is a $y \in E$ such that $0 < |y - x| < \epsilon$. Prove that the
following are equivalent.
a) $x$ is an accumulation point of $E$.
b) Each open interval containing $x$ contains a member of $E$ different from $x$;
that is, if $I$ is an open interval such that $x \in I$, then $I \cap (E \setminus \{x\}) \ne \emptyset$.
c) $x \in \overline{E \setminus \{x\}}$.
d) There exists a sequence $\{x_n\}_{n=1}^\infty$ of distinct elements of $E$ such that
$\lim_{n\to\infty} x_n = x$.
2.44 Refer to Exercise 2.43. Let $E'$ denote the set of all accumulation points of
a set $E \subset \mathcal{R}$.
a) Prove that $E'$ is closed.
b) Prove that $\overline{E} = E \cup E'$.
★2.45 Refer to Exercise 2.43. Prove the Bolzano-Weierstrass theorem: Every
bounded infinite subset of real numbers has an accumulation point. Hint:
Use the fact that every infinite set contains a countably infinite subset
(Exercise 1.28 on page 25).
2.46 Prove that $\mathcal{R}$ and $\emptyset$ are the only two subsets of $\mathcal{R}$ that are both open and
closed. Hint: Let $A$ be a nonempty proper open subset of $\mathcal{R}$. Choose $x \in A$
and let $a_x$ and $b_x$ be as in the proof of Proposition 2.13 on page 59.
2.47 Let $A$ and $B$ be subsets of $\mathcal{R}$. Establish each of the following facts.
a) $\overline{A \cup B} = \overline{A} \cup \overline{B}$.
b) $\overline{A \cap B} \subset \overline{A} \cap \overline{B}$. Provide an example to show that the reverse inclusion
does not hold in general.
c) $\overline{A}$ is closed.
d) If $A$ and $B$ are closed, then so is $A \cup B$.
e) If $\{F_\iota\}_{\iota \in I}$ is a collection of closed sets, then $\bigcap_{\iota \in I} F_\iota$ is closed.
2.48 True or False: If $\{F_\iota\}_{\iota \in I}$ is a collection of closed sets, then $\bigcup_{\iota \in I} F_\iota$ is closed.
2.49 Let $A$ and $B$ be subsets of $\mathcal{R}$ and $\{A_\iota\}_{\iota \in I}$ be a collection of subsets of $\mathcal{R}$.
Establish each of the following facts.
a) If $A \subset B$, then $\overline{A} \subset \overline{B}$.
b) If $A \subset B \subset \overline{A}$, then $\overline{A} = \overline{B}$.
c) We have
$$\bigcup_{\iota \in I} \overline{A_\iota} \subset \overline{\bigcup_{\iota \in I} A_\iota}.$$
d) Referring to part (c), can we, in general, replace "$\subset$" by "$=$"? If not,
state a condition on $I$ that assures the replacement is valid.
2.50 Let $D \subset \mathcal{R}$.
a) Suppose that $D$ is an open subset of $\mathcal{R}$. Prove that a subset of $D$ is open
in $D$ if and only if it is open in $\mathcal{R}$.
b) Show that the result of part (a) fails to hold without the assumption
that $D$ is an open subset of $\mathcal{R}$.
c) Prove that a subset of $D$ is open in $D$ if it is open in $\mathcal{R}$.
2.51 Let $D \subset \mathcal{R}$. Prove that the collection of open sets of $D$ satisfies the three
properties listed in Theorem 2.2 (page 58). That is,
a) $D$ and $\emptyset$ are open in $D$.
b) If $A$ and $B$ are open in $D$, then so is $A \cap B$.
c) If $\{G_\iota\}_{\iota \in I}$ is a collection of sets open in $D$, then $\bigcup_{\iota \in I} G_\iota$ is open in $D$.
2.52 In this exercise, we will explore limit points, closure, and closed sets of a
subset of $\mathcal{R}$. Let $D \subset \mathcal{R}$ and $E \subset D$.
a) Define a limit point of $E$ in $D$; call such a limit point a $D$-limit point
of $E$.
b) Define the closure of $E$ in $D$; call it the $D$-closure of $E$.
c) Define "$E$ is closed in $D$."
d) Prove that $E$ is closed in $D$ if and only if $D \setminus E$ is open in $D$.
e) Prove that $E$ is closed in $D$ if and only if there is a closed set $F$ of $\mathcal{R}$
such that $E = D \cap F$.
2.4 REAL-VALUED FUNCTIONS
A real-valued function is a function whose range is a subset of $\mathcal{R}$. If
$f : \Omega \to \mathcal{R}$, then we say that $f$ is a real-valued function on $\Omega$. In this
section, we will discuss real-valued functions and several concepts associated
with them. Much of the section is concerned with real-valued functions
whose domains are a subset of $\mathcal{R}$.
We begin by defining algebraic operations on real-valued functions.
This is done pointwise as follows. Suppose that $f$ and $g$ are real-valued
functions on $\Omega$ and that $\alpha \in \mathcal{R}$. Then we define the functions $f + g$, $\alpha f$,
and $f \cdot g$ on $\Omega$ by
$$(f + g)(x) = f(x) + g(x),$$
$$(\alpha f)(x) = \alpha f(x),$$
$$(f \cdot g)(x) = f(x) g(x),$$
for each $x \in \Omega$.
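Pointwise operations translate directly into code. In the Python sketch below, real-valued functions on a set are modelled as callables, and the helper names `add`, `scale`, and `mul` are illustrative assumptions rather than notation from the text.

```python
def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    """Pointwise scalar multiple: (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)

def mul(f, g):
    """Pointwise product: (f . g)(x) = f(x) * g(x)."""
    return lambda x: f(x) * g(x)

f = lambda x: x + 1
g = lambda x: 2 * x

# h(x) = (x + 1)(2x) + 3(x + 1), built only from the three operations.
h = add(mul(f, g), scale(3, f))
```

Each operation returns a new function on the same domain, which is the sense in which the functions on a fixed set are closed under these operations.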
Continuous Functions
The most important functions in calculus are the continuous functions.
They play a prominent role in modern analysis as well. Roughly speaking, a
function $f$ is continuous at $x_0$ if $f(x)$ can be made arbitrarily close to $f(x_0)$
by taking $x$ sufficiently close to $x_0$. More precisely, we have the following
definition for a real-valued function defined on a subset of $\mathcal{R}$.
DEFINITION 2.11 Continuous Function
Let $D \subset \mathcal{R}$, $f : D \to \mathcal{R}$, and $x_0 \in D$. We say that $f$ is continuous
at $x_0$ if for each $\epsilon > 0$, there is a $\delta > 0$ such that $|f(x) - f(x_0)| < \epsilon$
whenever $x \in D$ and $|x - x_0| < \delta$. We say that $f$ is continuous
on $D$ if it is continuous at every point of $D$. We denote by $C(D)$ the
collection of all continuous functions on $D$.† For simplicity, and when
no confusion will arise, we often write $C$ for $C(\mathcal{R})$.
Note: If $f$ is not continuous at $x_0$, then we say that $f$ is discontinuous
at $x_0$ or that $x_0$ is a point of discontinuity of $f$.
† This notation is temporary and will be modified and generalized in Chapter 7.
EXAMPLE 2.9 Illustrates Definition 2.11
a) Let $D = (0, \infty)$ and define $f(x) = 1/x$. Then $f$ is continuous on $D$.
b) Let $D = \mathcal{R}$. Define $f(0) = 0$ and $f(x) = \sin(1/x)$ for $x \ne 0$. Then $f$ is
continuous except at 0.
c) Let $D = \mathcal{R}$ and define $f(x) = [x]$. Then $f$ is continuous except at points
of $\mathcal{Z}$.
d) Every function is continuous on $\mathcal{N}$. Indeed, let $f : \mathcal{N} \to \mathcal{R}$ and $x_0 \in \mathcal{N}$.
Then $|f(x) - f(x_0)| = 0$ whenever $x \in \mathcal{N}$ and $|x - x_0| < 1$. □
An important property of the continuous functions on a subset $D$ of $\mathcal{R}$
is that they form an algebra of functions. In other words, we have the
following theorem whose proof is left to the reader as an exercise.
THEOREM 2.4
Let $D \subset \mathcal{R}$. Then the collection $C(D)$ of continuous functions on $D$ is an
algebra of functions. That is, if $f, g \in C(D)$ and $\alpha \in \mathcal{R}$, then
a) $f + g \in C(D)$.
b) $\alpha f \in C(D)$.
c) $f \cdot g \in C(D)$.
Our next theorem provides a relationship between continuous functions
on D and the open sets in D.
THEOREM 2.5
Let $D \subset \mathcal{R}$ and $f\colon D \to \mathcal{R}$. Then $f$ is continuous on $D$ if and only if
$f^{-1}(O)$ is open in $D$ for each open set $O$ in $\mathcal{R}$.
PROOF: Suppose that $f$ is continuous on $D$. Let $O$ be an open set in $\mathcal{R}$
and $x_0 \in f^{-1}(O)$. Then $f(x_0) \in O$ and, so, because $O$ is open, there is an
$r > 0$ such that $(f(x_0) - r, f(x_0) + r) \subset O$. Since $f$ is continuous at $x_0$, there
is a $\delta > 0$ such that $|f(x) - f(x_0)| < r$ whenever $|x - x_0| < \delta$ and $x \in D$.
Therefore, if $x \in (x_0 - \delta, x_0 + \delta) \cap D$, then $f(x) \in (f(x_0) - r, f(x_0) + r) \subset O$
and, hence, $x \in f^{-1}(O)$. Consequently, we have found a $\delta > 0$ such that
$(x_0 - \delta, x_0 + \delta) \cap D \subset f^{-1}(O)$. It now follows that $f^{-1}(O)$ is open in $D$.
Conversely, suppose $f^{-1}(O)$ is open in $D$ for each open set $O$ in $\mathcal{R}$.
Let $x_0 \in D$. We will prove that $f$ is continuous at $x_0$. Let $\epsilon > 0$. The set
$(f(x_0) - \epsilon, f(x_0) + \epsilon)$ is open in $\mathcal{R}$ and, so, $G = f^{-1}((f(x_0) - \epsilon, f(x_0) + \epsilon))$
is open in $D$. Since $x_0 \in G$ and $G$ is open in $D$, there is an $r > 0$ such that
$(x_0 - r, x_0 + r) \cap D \subset G$. Thus, if $x \in D$ and $|x - x_0| < r$, then $x \in G$ and,
consequently, $f(x) \in (f(x_0) - \epsilon, f(x_0) + \epsilon)$, that is, $|f(x) - f(x_0)| < \epsilon$.
COROLLARY 2.1
A function $f\colon \mathcal{R} \to \mathcal{R}$ is continuous if and only if $f^{-1}(O)$ is open in $\mathcal{R}$
whenever $O$ is open in $\mathcal{R}$.
We can restate Theorem 2.5 as follows: A real-valued function on $D$
is continuous if and only if the inverse image of each open set in $\mathcal{R}$ is open
in $D$. This relationship between continuous functions and open sets is
significant because it provides a way for us to define continuity of functions
in very general settings, as we will see in Chapter 7.
Monotone Functions
Functions defined on an interval that never decrease as x increases or never
increase as x increases play a significant role in analysis.
DEFINITION 2.12 Monotone Function
Let $f$ be a real-valued function defined on an interval $I$ of real numbers.
Then $f$ is said to be
a) nondecreasing on $I$ if $f(x) \le f(y)$ whenever $x, y \in I$ and $x < y$.
b) nonincreasing on $I$ if $f(x) \ge f(y)$ whenever $x, y \in I$ and $x < y$.
c) monotone on $I$ if it is either nondecreasing or nonincreasing on $I$.
Note: Some authors use the term increasing in place of nondecreasing and
use the phrase strictly increasing to indicate that $f(x) < f(y)$ whenever
$x < y$. We will use both “increasing” and “strictly increasing” to describe
functions satisfying this latter condition, but will avoid both terms for
functions that do not increase in the strict sense. Thus, for us, each of
the terms “nondecreasing,” “strictly increasing,” and “increasing” applies
equally well to the function $f(x) = x^3$; but we would only use the term
“nondecreasing” to describe the function $f(x) = 1$. Similar remarks hold
for the three terms nonincreasing, strictly decreasing, and decreasing.
EXAMPLE 2.10 Illustrates Definition 2.12
a) The function $f(x) = e^x$ is nondecreasing on any interval. It is also
monotone and (strictly) increasing.
b) The function $f(x) = 1/x$ is nonincreasing on any interval not containing 0.
It is also monotone and (strictly) decreasing on any such interval.
c) The function $f(x) = \sin x$ is nondecreasing on $[-\pi/2, \pi/2]$ and nonincreasing
on $[\pi/2, 3\pi/2]$. However, it is not monotone on $[0, \pi]$.	□
Pointwise Limits
We know from Theorem 2.4 on page 66 that $C(D)$, the collection of real-valued
continuous functions on a set $D \subset \mathcal{R}$, is an algebra of functions; it
is closed under sums, multiples, and products. Being an algebra is a useful
and important property for a collection of functions to have.
Another desirable property for a collection of functions is that it be
closed under pointwise limits. This concept is relevant to any collection of
real-valued functions having a common domain, not just to those whose
domain is some subset of $\mathcal{R}$. To begin, we define pointwise convergence
of a sequence of real-valued functions.
DEFINITION 2.13 Pointwise Convergence
Let $\{f_n\}_{n=1}^\infty$ be a sequence of real-valued functions on a set $\Omega$, that
is, $f_n\colon \Omega \to \mathcal{R}$ for each $n \in \mathcal{N}$. Then we say that $\{f_n\}_{n=1}^\infty$ converges
pointwise on $\Omega$ if for each $x \in \Omega$, the sequence $\{f_n(x)\}_{n=1}^\infty$ of real
numbers converges in $\mathcal{R}$.
If $\{f_n\}_{n=1}^\infty$ converges pointwise on $\Omega$, then we can define a function
$f\colon \Omega \to \mathcal{R}$ by $f(x) = \lim_{n\to\infty} f_n(x)$. We say that the function $f$ is the
pointwise limit of the sequence of functions $\{f_n\}_{n=1}^\infty$ or that the sequence
of functions $\{f_n\}_{n=1}^\infty$ converges pointwise to the function $f$. We write
$f_n \to f$ pointwise to indicate pointwise convergence of $\{f_n\}_{n=1}^\infty$ to $f$.
EXAMPLE 2.11 Illustrates Definition 2.13
a) For each $n \in \mathcal{N}$, define $f_n\colon \mathcal{R} \to \mathcal{R}$ by $f_n(x) = (1 + x/n)^n$. Then $f_n \to f$
pointwise on $\mathcal{R}$, where $f(x) = e^x$.
b) Let $D \subset \mathcal{R}$ and define, for each $n \in \mathcal{N}$, $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x^n$. If
$D = [0, 1]$, then $f_n \to f$ pointwise, where
$$f(x) = \begin{cases} 0, & \text{if } 0 \le x < 1; \\ 1, & \text{if } x = 1. \end{cases}$$
However, the sequence of functions $\{f_n\}_{n=1}^\infty$ fails to converge pointwise
if $D = [-1, 1]$ since the sequence $\{(-1)^n\}_{n=1}^\infty$ does not converge; it
also fails to converge pointwise if $D = [0, 2]$ since, for instance, the
sequence $\{2^n\}_{n=1}^\infty$ does not converge in $\mathcal{R}$.
c) For each $n \in \mathcal{N}$, define $f_n\colon \mathcal{R} \to \mathcal{R}$ by
$$f_n(x) = \begin{cases} n^2 x, & \text{if } |x| \le 1/n; \\ 1/x, & \text{otherwise.} \end{cases}$$
Then $f_n \to f$ pointwise on $\mathcal{R}$, where $f(0) = 0$ and $f(x) = 1/x$ for $x \ne 0$.
d) Let $D \subset \mathcal{R}$ and define, for each $n \in \mathcal{N}$, $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x/n$.
Then $f_n \to 0$ pointwise on $D$, where 0 denotes the function identically
equal to 0, that is, $f(x) = 0$ for all $x \in D$.	□
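The behavior in part (b) can be observed numerically. The following Python sketch tabulates $x^n$ at a few fixed points of $[0, 1]$ for increasing $n$ and compares the values with the pointwise limit. The function names and the sample points are choices made for this illustration, and floating-point evaluation only suggests, and does not prove, the convergence.

```python
def f_n(n: int, x: float) -> float:
    """The n-th function of the sequence in part (b): f_n(x) = x**n."""
    return x ** n


def pointwise_limit(x: float) -> float:
    """Pointwise limit on [0, 1]: 0 for 0 <= x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


# For each fixed x, look at the real sequence f_1(x), f_10(x), f_100(x), ...
for x in (0.0, 0.5, 0.9, 1.0):
    values = [f_n(n, x) for n in (1, 10, 100, 1000)]
    print(x, values, pointwise_limit(x))
```

The point $x$ is held fixed while $n$ grows, which is exactly what pointwise convergence asks for: a separate sequence of real numbers for each $x$.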
DEFINITION 2.14 Closure Under Pointwise Limits
Let $\mathcal{F}$ be a collection of real-valued functions on $\Omega$. We say that
$\mathcal{F}$ is closed under pointwise limits if whenever $\{f_n\}_{n=1}^\infty \subset \mathcal{F}$ and
$f_n \to f$ pointwise on $\Omega$, then $f \in \mathcal{F}$.
Obviously, the collection of all real-valued functions on a set $\Omega$ is closed
under pointwise limits. In particular, the collection of all real-valued functions
on a subset $D$ of $\mathcal{R}$ is closed under pointwise limits. However, in
general, $C(D)$ is not closed under pointwise limits, as we see from parts (b)
and (c) of Example 2.11.
The fact that $C(D)$ is not generally closed under pointwise limits will
lead us naturally into a discussion of Borel measurable functions in Chapter 3.
For now, we introduce a type of convergence stronger than pointwise
convergence that does yield closure for $C(D)$. This type of convergence is
called uniform convergence, a concept that is relevant to any sequence
of real-valued functions having a common domain.
DEFINITION 2.15 Uniform Convergence
Let $\{f_n\}_{n=1}^\infty$ be a sequence of real-valued functions on a set $\Omega$. Then we
say that $\{f_n\}_{n=1}^\infty$ converges uniformly to the real-valued function $f$
on $\Omega$, if for each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that $n \ge N$ implies
$|f_n(x) - f(x)| < \epsilon$ for all $x \in \Omega$. We write $f_n \to f$ uniformly to
indicate uniform convergence of $\{f_n\}_{n=1}^\infty$ to $f$.
The “uniform” in uniform convergence refers to the fact that by taking
$n$ sufficiently large, $f_n(x)$ can be made arbitrarily close to $f(x)$ for all
$x \in \Omega$, that is, uniformly over $\Omega$.
Clearly, uniform convergence implies pointwise convergence. The converse
is not true, however, as Example 2.11(b) shows. Uniform convergence
also depends on $\Omega$. For example, let $f_n(x) = x/n$ and $f(x) = 0$. If
$\Omega = [0, 1]$, then $f_n \to f$ uniformly on $\Omega$. But, if $\Omega = \mathcal{R}$, then $f_n \not\to f$
uniformly on $\Omega$, although it does so pointwise.
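A rough numerical way to see the difference is to estimate $\sup_x |f_n(x) - f(x)|$, which must tend to 0 for uniform convergence. The Python sketch below does this on a finite grid; the helper names and the grid size are assumptions of this illustration, and a maximum over finitely many sample points is only a lower estimate of the true supremum.

```python
def sup_distance(f, g, points):
    """Largest value of |f(x) - g(x)| over the finitely many sample points."""
    return max(abs(f(x) - g(x)) for x in points)


def unit_grid(m=1000):
    """Equally spaced sample points 0, 1/m, ..., 1 in [0, 1]."""
    return [k / m for k in range(m + 1)]


def zero(x):
    return 0.0


def scaled(n):
    """f_n(x) = x / n."""
    return lambda x: x / n


def power(n):
    """f_n(x) = x**n, as in Example 2.11(b)."""
    return lambda x: x ** n


def power_limit(x):
    """Pointwise limit of x**n on [0, 1]."""
    return 1.0 if x == 1.0 else 0.0


for n in (1, 10, 100):
    on_unit_interval = sup_distance(scaled(n), zero, unit_grid())  # equals 1/n
    at_x_equal_n = abs(scaled(n)(float(n)))                        # equals 1 for every n
    power_gap = sup_distance(power(n), power_limit, unit_grid())   # stays near 1
    print(n, on_unit_interval, at_x_equal_n, power_gap)
```

On $[0, 1]$ the estimate for $x/n$ shrinks like $1/n$, whereas on the whole real line the point $x = n$ always gives the value 1, so no single $N$ works for $\epsilon = 1$. For $x^n$ the estimate stays close to 1, in line with the failure of uniform convergence on $[0, 1]$.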
The next proposition verifies our contention that $C(D)$ is closed under
uniform limits.
PROPOSITION 2.15
Let $D \subset \mathcal{R}$. Suppose that $\{f_n\}_{n=1}^\infty \subset C(D)$ and that $f_n \to f$ uniformly.
Then $f \in C(D)$.
PROOF: Let $x_0 \in D$ and $\epsilon > 0$. Because $f_n \to f$ uniformly, we can choose
$N \in \mathcal{N}$ such that $|f_N(x) - f(x)| < \epsilon/3$ for all $x \in D$. And, because
$f_N$ is continuous on $D$ and, hence, at $x_0$, we can choose $\delta > 0$ such that
$|f_N(x) - f_N(x_0)| < \epsilon/3$ whenever $x \in D$ and $|x - x_0| < \delta$. It follows that
whenever $x \in D$ and $|x - x_0| < \delta$, we have
$$|f(x) - f(x_0)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(x_0)| + |f_N(x_0) - f(x_0)|$$
$$< \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon.$$
Thus, $f$ is continuous on $D$.
Monotone Sequences of Functions
As we will see beginning in Chapter 3, it is important to consider mono-
tone sequences of functions. As for pointwise and uniform convergence,
this concept is relevant to any sequence of real-valued functions having a
common domain.
DEFINITION 2.16 Monotone Sequence of Functions
Let $\{f_n\}_{n=1}^\infty$ be a sequence of real-valued functions on a set $\Omega$. Then
we say that $\{f_n\}_{n=1}^\infty$ is
a)	nondecreasing if for each $x \in \Omega$, $\{f_n(x)\}_{n=1}^\infty$ is a nondecreasing
sequence of real numbers.
b)	nonincreasing if for each $x \in \Omega$, $\{f_n(x)\}_{n=1}^\infty$ is a nonincreasing
sequence of real numbers.
c)	monotone if it is either nondecreasing or nonincreasing.
EXAMPLE 2.12 Illustrates Definition 2.16
a) Let $D \subset \mathcal{R}$ and define $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x^n$. Then $\{f_n\}_{n=1}^\infty$ is
nonincreasing if $D = [0, 1]$, nondecreasing if $D = [1, 2]$, and not monotone
if $D = [0, 2]$ or $D = (-1, 0]$.
b)	Let $D \subset \mathcal{R}$ and define $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x/n$. Then $\{f_n\}_{n=1}^\infty$ is
nondecreasing if $D \subset (-\infty, 0]$, nonincreasing if $D \subset [0, \infty)$, but not
monotone if $D$ contains both positive and negative numbers.
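c)	Define $f_n\colon [0, 1] \to \mathcal{R}$ by
$$f_n(x) = \begin{cases} n^2 x, & \text{if } 0 \le x \le \frac{1}{2n}; \\ \ldots, & \text{if } \frac{1}{2n} < x < \frac{1}{n}; \\ 0, & \text{if } \frac{1}{n} \le x \le 1. \end{cases}$$
The sequence $\{f_n\}_{n=1}^\infty$ is not monotone.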
d)	Let $\{A_n\}_{n=1}^\infty$ be a sequence of subsets of a set $\Omega$. Define $f_n\colon \Omega \to \mathcal{R}$ by
$f_n(x) = 1$ if $x \in A_n$ and $f_n(x) = 0$ if $x \notin A_n$.
(i)	$\{f_n\}_{n=1}^\infty$ is a nondecreasing sequence of functions if and only if
$\{A_n\}_{n=1}^\infty$ is a (monotone) nondecreasing sequence of sets.
(ii)	$\{f_n\}_{n=1}^\infty$ is a nonincreasing sequence of functions if and only if
$\{A_n\}_{n=1}^\infty$ is a (monotone) nonincreasing sequence of sets. □
EXERCISES 2.4
2.53	Let $f$ and $g$ be real-valued functions on $\Omega$. Write the pointwise definitions
of $f \vee g$ (the maximum of $f$ and $g$) and $f \wedge g$ (the minimum of $f$ and $g$).
Refer to Exercise 2.3 on page 42.
2.54	Prove Theorem 2.4 on page 66.
2.55	Show that if $f \in C(D)$, then so is $|f|$.
2.56	Prove that $C(D)$ is closed under maximums and minimums. That is, if
$f, g \in C(D)$, then
a)	$f \vee g \in C(D)$. Hint: Use Exercise 2.55, Theorem 2.4 on page 66, and
Exercise 2.3(b) on page 42.
b)	$f \wedge g \in C(D)$. Hint: Use Exercise 2.55, Theorem 2.4 on page 66, and
Exercise 2.3(c) on page 42.
2.57	Verify each part of Example 2.10 on page 67.
2.58	Define $f\colon [0, 3] \to \mathcal{R}$ by $f(x) = 2$ if $1 < x < 2$ and $f(x) = 1$ otherwise. On
which (nondegenerate) subintervals of $[0, 3]$ is $f$
a) nondecreasing?
b)	strictly increasing?
c)	nonincreasing?
d)	strictly decreasing?
e)	monotone?
★2.59 Suppose that $f\colon (a, b) \to \mathcal{R}$ is nondecreasing. For $x \in (a, b)$, let
$$L_x = \{\, f(t) : a < t < x \,\} \quad\text{and}\quad R_x = \{\, f(t) : x < t < b \,\},$$
and define $f(x-) = \sup L_x$ and $f(x+) = \inf R_x$.
a)	Show that $f(x-) \le f(x) \le f(x+)$ for all $x \in (a, b)$.
b)	Prove that $f$ is continuous at $x$ if and only if $f(x-) = f(x+)$.
c)	Prove that $f$ has countably many discontinuities; that is, the set of points
at which $f$ is discontinuous is countable.
d)	Deduce that a nonincreasing function on $(a, b)$ has countably many discontinuities.
2.60 For each $n \in \mathcal{N}$, define $f_n\colon [0, 1] \to \mathcal{R}$ by $f_n(x) = x^n$. Also, define
$$f(x) = \begin{cases} 0, & \text{if } 0 \le x < 1; \\ 1, & \text{if } x = 1. \end{cases}$$
Prove that $f_n \to f$ pointwise, but not uniformly, on $[0, 1]$.
2.61	Let $D \subset \mathcal{R}$ and define, for each $n \in \mathcal{N}$, $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x/n$. Also,
define $f\colon D \to \mathcal{R}$ by $f(x) = 0$ for all $x \in D$.
a)	Show that if $D = [0, 1]$, then $f_n \to f$ uniformly.
b)	Show that if $D = \mathcal{R}$, then $f_n \not\to f$ uniformly.
c)	In part (b), is it possible for $\{f_n\}_{n=1}^\infty$ to converge uniformly to some
function? Explain your answer.
2.62	Verify each part of Example 2.12 on pages 70 and 71.
★2.63 In this exercise, we ask you to prove Dini’s theorem: Suppose that $\{f_n\}_{n=1}^\infty$
is a monotone sequence of continuous functions defined on a closed
bounded interval $[a, b]$. Further suppose that $\{f_n\}_{n=1}^\infty$ converges pointwise
to the continuous function $f$. Prove that $f_n \to f$ uniformly on $[a, b]$ by
applying the following steps.
a)	Explain why we can assume without loss of generality that $\{f_n\}_{n=1}^\infty$ is
nonincreasing and that $f = 0$.
b)	Let $\epsilon > 0$. For each $n \in \mathcal{N}$, set $O_n = \{\, x \in [a, b] : f_n(x) < \epsilon \,\}$. Show
that $\{O_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of open sets in $[a, b]$
whose union is $[a, b]$.
c)	Use part (b) to prove that there is an $N \in \mathcal{N}$ such that $O_N = [a, b]$.
Hint: Use the fact that if a closed bounded interval is a subset of the
union of a collection of open sets, then there is a finite subcollection of
that collection whose union also contains the interval. This result is a
special case of the Heine-Borel theorem.
d)	Use part (c) to conclude that $f_n \to 0$ uniformly.
2.64	Refer to Exercise 2.63. Show that the conclusion of Dini’s theorem does
not hold if we weaken the hypotheses of that theorem in any one of the
following ways:
a)	The interval on which the functions are defined is permitted to be any
closed interval.
b)	The interval on which the functions are defined is permitted to be any
bounded interval.
c)	The limiting function, $f$, is not restricted to being continuous.
d)	The monotonicity requirement is dropped.
2.65	Refer to Example 2.11(a) on page 68. Prove that the convergence is uniform
on any bounded subinterval of [0, oo). Hint: Apply the binomial theorem
and Dini’s theorem (Exercise 2.63).
2.66	Let $\{f_n\}_{n=1}^\infty$ be a sequence of real-valued functions on $\Omega$.
a)	We say that $\{f_n\}_{n=1}^\infty$ is pointwise Cauchy on $\Omega$, if for each $x \in \Omega$,
$\{f_n(x)\}_{n=1}^\infty$ is a Cauchy sequence. Prove that if $\{f_n\}_{n=1}^\infty$ is pointwise
Cauchy on $\Omega$, then it converges pointwise on $\Omega$.
b)	We say that $\{f_n\}_{n=1}^\infty$ is uniformly Cauchy on $\Omega$, if for each $\epsilon > 0$,
there is an $N \in \mathcal{N}$ such that $|f_n(x) - f_m(x)| < \epsilon$ whenever $m, n \ge N$
and $x \in \Omega$. Prove that if $\{f_n\}_{n=1}^\infty$ is uniformly Cauchy on $\Omega$, then it
converges uniformly on $\Omega$.
2.5	THE CANTOR SET AND CANTOR FUNCTION
We next introduce a set and function, called the Cantor set and Cantor
function, that will serve as useful examples and counterexamples throughout
the text.*
Before discussing the Cantor set, we present the following proposition
which states in part that for each integer $p \ge 2$, every number between 0
and 1 has a base-$p$ expansion. The proof is left to the reader as an exercise.
(See Exercise 2.67.)
PROPOSITION 2.16 Base-p Expansion
Let $p$ be an integer greater than 1. Then for each $x \in [0, 1]$, there is a
sequence $\{a_n\}_{n=1}^\infty$ of integers such that $0 \le a_n \le p - 1$ for all $n$ and
$$x = \sum_{n=1}^\infty \frac{a_n}{p^n} = \frac{a_1}{p} + \frac{a_2}{p^2} + \frac{a_3}{p^3} + \cdots. \tag{2.3}$$
The sequence $\{a_n\}_{n=1}^\infty$ is unique unless $x \ne 1$ and $x$ is of the form $q/p^m$ for
some $q, m \in \mathcal{N}$, in which case there are exactly two such sequences, one
having only finitely many nonzero terms and the other having only finitely
many terms different from $p - 1$.
Note: We use the notation
$$x = 0.a_1 a_2 a_3 \cdots \ (p) \tag{2.4}$$
as a shorthand for Eq. (2.3).
* The Cantor function and set are named in honor of Georg Cantor. See the biography
on page 2.
EXAMPLE 2.13 Illustrates Proposition 2.16
a)	For each $p \ge 2$, we have
$$0 = 0.000\ldots \ (p)$$
and
$$1 = 0.(p-1)(p-1)(p-1)\ldots \ (p).$$
b)	The number 1/2 has, respectively, the binary (p = 2), ternary (p = 3),
and decimal (p = 10) expansions given by
0.1000...	(2),
0.1111...	(3),
0.5000... (10).
As predicted by Proposition 2.16, 1/2 also has a second binary expansion
and a second decimal expansion. They are, respectively,
0.0111...	(2),
0.4999... (10).
But the ternary expansion of 1/2 is unique.
□
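The expansions in Example 2.13 can be generated mechanically. The Python sketch below uses the greedy rule suggested by the hint to Exercise 2.67, namely take $a_1 = [px]$, subtract, and repeat. The function name `base_p_digits` is an assumption of this illustration, exact rational arithmetic is used to avoid rounding, and the sketch is restricted to $0 \le x < 1$. The greedy rule always yields the expansion with finitely many nonzero terms when two expansions exist.

```python
from fractions import Fraction
from math import floor
from typing import List


def base_p_digits(x: Fraction, p: int, count: int) -> List[int]:
    """First `count` base-p digits of x in [0, 1), chosen greedily."""
    if p < 2 or not (0 <= x < 1):
        raise ValueError("need an integer p >= 2 and 0 <= x < 1")
    digits = []
    for _ in range(count):
        x *= p
        digit = floor(x)  # integer part of p*x, between 0 and p - 1
        digits.append(digit)
        x -= digit        # remainder stays in [0, 1)
    return digits


half = Fraction(1, 2)
for p in (2, 3, 10):
    print(p, base_p_digits(half, p, 6))
```

For $x = 1/2$ the digits produced are those of $0.1000\ldots\,(2)$, $0.1111\ldots\,(3)$, and $0.5000\ldots\,(10)$; the second binary and decimal expansions, which end in repeated $p - 1$, are not produced by this rule.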
The Cantor Set
We now construct the Cantor set. The Cantor set is a subset of [0,1]
obtained as follows.
Step 1: Delete the middle third open interval of [0,1], namely, (1/3,2/3).
See Fig. 2.1.
FIGURE 2.1 Set remaining after Step 1.
Step 2: After the first step, there remain two closed intervals, namely,
[0,1/3] and [2/3,1]. Delete the middle third open interval from each of
those two intervals, namely, (1/9,2/9) and (7/9,8/9). See Fig. 2.2.
FIGURE 2.2 Set remaining after Step 2 (interval endpoints 0, 1/9, 2/9, 1/3, 2/3, 7/9, 8/9, 1).
Step $n$: After the $(n-1)$st step, there remain $2^{n-1}$ closed intervals. Delete
from each of these the middle third open interval.
Continue this process inductively. For each $n \in \mathcal{N}$, let $G_n$ denote
the set removed at the $n$th step, $P_n$ the set remaining after the $n$th step,
$G = \bigcup_{n=1}^\infty G_n$, and $P = \bigcap_{n=1}^\infty P_n$. We have the following noteworthy facts:
•	$G_n$ is the union of $2^{n-1}$ disjoint open intervals, each of length $1/3^n$. In
particular, $G_n$ is an open set.
•	$P_n$ is the union of $2^n$ disjoint closed intervals, each of length $1/3^n$. In
particular, $P_n$ is a closed set.
•	$P = [0, 1] \setminus G$ and, so, $P \cap G = \emptyset$ and $P \cup G = [0, 1]$.
•	$G$ is the disjoint union of all removed open intervals, the sum of whose
lengths is $\sum_{n=1}^\infty 2^{n-1}/3^n = 1$. In particular, $G$ is an open set.
•	$P$ is a closed set, being the intersection of closed sets. It contains no
interval because, for each $n \in \mathcal{N}$, $P \subset P_n$, and $P_n$ contains no interval
whose length exceeds $1/3^n$. The set $P$ is called the Cantor set or,
sometimes, the Cantor ternary set.
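The first few stages of the construction, and the bookkeeping of lengths in the facts above, can be checked with a short computation. The following Python sketch is only an illustration: the function names `cantor_stage` and `removed_length` are assumptions made here, and exact fractions are used so that endpoints such as 7/9 are represented without rounding.

```python
from fractions import Fraction


def cantor_stage(n):
    """Closed intervals (left, right) whose union is the set remaining after step n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for left, right in intervals:
            third = (right - left) / 3
            # Keep the two outer closed thirds; the open middle third is deleted.
            next_intervals.append((left, left + third))
            next_intervals.append((right - third, right))
        intervals = next_intervals
    return intervals


def removed_length(n):
    """Total length removed in steps 1 through n: the sum of 2**(k-1) / 3**k."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, n + 1))


for n in (1, 2, 3):
    stage = cantor_stage(n)
    remaining = sum(right - left for left, right in stage)
    print(n, len(stage), remaining, removed_length(n))
```

At each stage the remaining length is $(2/3)^n$ and the removed length is $1 - (2/3)^n$, so the removed lengths increase to 1, in agreement with the sum computed in the text.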
As we have just seen, G, the complement in [0,1] of the Cantor set, is
a disjoint union of open intervals, the sum of whose lengths is 1. But the
length of [0,1] is also 1. Thus, from the point of view of length, the Cantor
set appears to be “small.” On the other hand, as we will see shortly, P is
uncountable, so that from a cardinality point of view, the Cantor set is
“large.” These, among other properties of the Cantor set, make it useful
for illustrating many subtle concepts.
We mentioned that the Cantor set is sometimes called the Cantor
ternary set. The reason for this will now be revealed. We begin with the
following lemma.
LEMMA 2.1
An interval $(a, b)$ is one of the $2^{n-1}$ open intervals removed from $[0, 1]$ at
the $n$th step in the construction of the Cantor set if and only if $a$ and $b$ are
of the form
$$a = 0.(2c_1)(2c_2)\ldots(2c_{n-1})1000\ldots \ (3),$$
$$b = 0.(2c_1)(2c_2)\ldots(2c_{n-1})2000\ldots \ (3), \tag{2.5}$$
where $c_k \in \{0, 1\}$ for $1 \le k \le n - 1$.
PROOF: We proceed by induction. At the first step ($n = 1$) of the
construction of the Cantor set, exactly one interval is removed, namely,
$(1/3, 2/3)$. And we have that
$$\tfrac{1}{3} = 0.1000\ldots \ (3),$$
$$\tfrac{2}{3} = 0.2000\ldots \ (3),$$
which is of the form (2.5) with $n = 1$.
Proceeding inductively, we note that an open interval $(a, b)$ is removed
at the $n$th step if and only if $a = r + 1/3^n$ and $b = r + 2/3^n$, where $r$ equals 0
or is the right endpoint of one of the open intervals removed on or before
the $(n-1)$st step. If $r = 0$, then
$$a = \frac{1}{3^n} = 0.\underbrace{00\ldots0}_{n-1 \text{ times}}1000\ldots \ (3),$$
$$b = \frac{2}{3^n} = 0.\underbrace{00\ldots0}_{n-1 \text{ times}}2000\ldots \ (3),$$
which is of the form (2.5) with $c_k = 0$ for $1 \le k \le n - 1$. Otherwise, by
the induction assumption, there is a positive integer $m \le n - 1$ such that
$r = 0.(2c_1)(2c_2)\ldots(2c_{m-1})2000\ldots \ (3)$, where $c_k \in \{0, 1\}$, $1 \le k \le n - 1$.
Then we have
$$a = r + \frac{1}{3^n} = 0.(2c_1)(2c_2)\ldots(2c_{m-1})2\underbrace{00\ldots0}_{n-m-1 \text{ times}}1000\ldots \ (3),$$
$$b = r + \frac{2}{3^n} = 0.(2c_1)(2c_2)\ldots(2c_{m-1})2\underbrace{00\ldots0}_{n-m-1 \text{ times}}2000\ldots \ (3),$$
which is of the form (2.5) with $c_m = 1$ and $c_k = 0$ for $m + 1 \le k \le n - 1$.
From Lemma 2.1, we can obtain the following proposition whose proof
is left to the reader as an exercise. See Exercise 2.69.
PROPOSITION 2.17
The Cantor set consists of all numbers in [0,1] that have a ternary expan-
sion without the digit 1.
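For rational numbers, the ternary description of the Cantor set can be turned into a finite test. The Python sketch below is one possible approach and not the method of the text: it repeatedly asks whether the current point lies in the open middle third, and otherwise rescales the outer third containing it back to $[0, 1]$, which amounts to choosing the next ternary digit to be 0 or 2. The function name `in_cantor_set` is an assumption of this illustration. Because a rational input can visit only finitely many values under this rescaling, the loop stops as soon as a value repeats.

```python
from fractions import Fraction


def in_cantor_set(x: Fraction) -> bool:
    """Decide whether the rational number x belongs to the Cantor set."""
    if not (0 <= x <= 1):
        return False
    one_third, two_thirds = Fraction(1, 3), Fraction(2, 3)
    seen = set()
    while x not in seen:
        seen.add(x)
        if one_third < x < two_thirds:
            return False  # x falls in a removed open middle third
        # Rescale [0, 1/3] or [2/3, 1] onto [0, 1] (ternary digit 0 or 2).
        x = 3 * x if x <= one_third else 3 * x - 2
    return True  # the digits repeat without ever requiring the digit 1


for x in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2), Fraction(7, 9)):
    print(x, in_cantor_set(x))
```

For instance, $1/4 = 0.020202\ldots\,(3)$ passes the test, $1/3 = 0.0222\ldots\,(3)$ passes because it has an expansion without the digit 1, and $1/2 = 0.111\ldots\,(3)$ fails.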
The Cantor Function
Using Proposition 2.17, we can now define the Cantor function, which is
a real-valued function on [0,1]. We begin by specifying its values on the
Cantor set, in other words, by defining a function $f\colon P \to \mathcal{R}$.
Let $x \in P$. By Proposition 2.17, $x$ has a (unique) ternary expansion
without the digit 1, say,
$$x = 0.(2c_1)(2c_2)\ldots \ (3),$$
where $c_n \in \{0, 1\}$ for each $n \in \mathcal{N}$. We define
$$f(x) = 0.c_1 c_2 \ldots \ (2).$$
PROPOSITION 2.18
Let $f\colon P \to \mathcal{R}$ be as defined in the preceding text. Then the range of $f$
is $[0, 1]$.
PROOF: It is clear from the definition of $f$ that its range is a subset
of $[0, 1]$. To show that it is onto, let $y \in [0, 1]$. Then, by Proposition 2.16
on page 73, we can write $y = 0.d_1 d_2 \ldots \ (2)$, where $d_n \in \{0, 1\}$ for each
$n \in \mathcal{N}$. Let $x = 0.(2d_1)(2d_2)\ldots \ (3)$. From Proposition 2.17, we know
that $x \in P$ and, by definition, $f(x) = y$.
COROLLARY 2.2
The Cantor set is uncountable.
PROOF: From Proposition 2.18, we have $f(P) = [0, 1]$. By Proposition 1.9
on page 23, the image of a countable set is countable. Thus, since $[0, 1]$ is
uncountable and is the image of $P$ under $f$, $P$ must be uncountable.
Next we extend $f$ to a function $\psi$ on $[0, 1]$. If $x \in P$, we define $\psi(x) = f(x)$.
If $x \in [0, 1] \setminus P$, then $x$ is in exactly one of the open intervals $(a, b)$
removed from $[0, 1]$ in the construction of the Cantor set. By Lemma 2.1
on page 76, there is an $n \in \mathcal{N}$ such that
$$a = 0.(2c_1)(2c_2)\ldots(2c_{n-1})1000\ldots \ (3),$$
$$b = 0.(2c_1)(2c_2)\ldots(2c_{n-1})2000\ldots \ (3),$$
where $c_k \in \{0, 1\}$ for $1 \le k \le n - 1$. Now note that
$$f(a) = 0.c_1 c_2 \ldots c_{n-1}0111\ldots \ (2),$$
$$f(b) = 0.c_1 c_2 \ldots c_{n-1}1000\ldots \ (2),$$
and, hence, $f(a) = f(b)$. We define $\psi(x)$ to be the common value of $f(a)$
and $f(b)$.
The function $\psi$ is called the Cantor function or Lebesgue singular
function. Its graph is sketched in Fig. 2.3.
FIGURE 2.3 Sketch of the Cantor function.
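The definition of the Cantor function translates into a digit-by-digit computation. The Python sketch below is an illustration under stated assumptions: the name `cantor_function` and the `depth` parameter are introduced here, the input is an exact rational in $[0, 1]$, and the ternary digits are generated greedily. If a digit 1 appears at position $n$, the point lies in the closure of an interval removed at step $n$, on which the function takes the constant value $0.c_1 \ldots c_{n-1}1\,(2)$. Otherwise each ternary digit $2c_k$ contributes the binary digit $c_k$, and stopping after `depth` digits leaves an error of at most $2^{-\text{depth}}$.

```python
from fractions import Fraction
from math import floor


def cantor_function(x: Fraction, depth: int = 60) -> Fraction:
    """Value of the Cantor function at a rational x in [0, 1], up to 2**(-depth)."""
    if not (0 <= x <= 1):
        raise ValueError("x must lie in [0, 1]")
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    weight = Fraction(1, 2)  # weight of the current binary digit
    for _ in range(depth):
        x *= 3
        digit = floor(x)     # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            # x is in the closure of a removed interval: the function is constant there.
            return value + weight
        value += weight * (digit // 2)  # ternary digit 2c becomes binary digit c
        weight /= 2
    return value


for x in (Fraction(1, 9), Fraction(1, 3), Fraction(1, 2), Fraction(7, 9)):
    print(x, cantor_function(x))
```

The values at 1/3, 1/2, and 2/3 all equal 1/2, reflecting that the function is constant on the closure of the first removed interval; at 1/4 = $0.0202\ldots\,(3)$ the computed value approximates $0.0101\ldots\,(2) = 1/3$.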
In the next two propositions, we state two important properties of the
Cantor function. One is that it is nondecreasing and the other is that it
is continuous. The proof of the first proposition is left to the reader as an
exercise. (See Exercise 2.72.)
PROPOSITION 2.19
The Cantor function is nondecreasing.
PROPOSITION 2.20
The Cantor function is continuous.
PROOF: We will prove that $\psi$ is continuous at each $x \in (0, 1)$ and leave
to the reader the proof of continuity at the endpoints of the interval. So,
assume that $x \in (0, 1)$. By Proposition 2.19, $\psi$ is nondecreasing. Let
$$L_x = \{\, \psi(t) : 0 < t < x \,\} \quad\text{and}\quad R_x = \{\, \psi(t) : x < t < 1 \,\},$$
and let $\psi(x-) = \sup L_x$ and $\psi(x+) = \inf R_x$.
To show that $\psi$ is continuous at $x$, it suffices, by Exercise 2.59 on
page 71, to prove that $\psi(x-) = \psi(x+)$. Suppose that this is not the case.
Then, again, by Exercise 2.59, either $\psi(x-) < \psi(x)$ or $\psi(x) < \psi(x+)$. We
will consider the case where the latter holds true, realizing that a similar
argument would ensue if the latter does not hold true but the former does.
As $\psi$ is nondecreasing, we have
$$0 = \psi(0) \le \psi(x) < \psi(x+) \le \psi(1) = 1.$$
Select $y \in (\psi(x), \psi(x+))$ and note that $0 < y < 1$. From Proposition 2.18
on page 77, it follows that the range of $\psi$ is $[0, 1]$; thus, there is a $z \in (0, 1)$
such that $\psi(z) = y$. Since $\psi(x) < y = \psi(z)$, we must have $x < z$ and, hence,
$\psi(x+) \le \psi(z)$. But we also have $\psi(z) = y < \psi(x+)$. Consequently, we
have reached a contradiction and, therefore, $\psi$ must be continuous at $x$.
EXERCISES 2.5
2.67	Prove Proposition 2.16 on page 73 by using the following steps. We can
assume that $x \in (0, 1)$. (Why?)
a)	Show that for each $n \in \mathcal{N}$, there are integers $a_1, a_2, \ldots, a_n$ with
$0 \le a_k \le p - 1$ for $1 \le k \le n$ and such that
$$\frac{a_1}{p} + \frac{a_2}{p^2} + \cdots + \frac{a_n}{p^n} \le x < \frac{a_1}{p} + \frac{a_2}{p^2} + \cdots + \frac{a_n}{p^n} + \frac{1}{p^n}.$$
Hint: Let $a_1 = [px]$ and note that $a_1/p \le x < (a_1 + 1)/p$. Also note
that, because $0 < x < 1$, we have $0 < px < p$ and, so, $0 \le a_1 \le p - 1$.
Now use induction.
b)	Use part (a) to show that there exists a sequence $\{a_n\}_{n=1}^\infty$ of integers
with $0 \le a_n \le p - 1$ for each $n \in \mathcal{N}$ and such that (2.3) holds.
c)	Show that if $x$ is of the form $q/p^m$, where $q, m \in \mathcal{N}$, then it has two
base-$p$ expansions, one having only finitely many nonzero terms and the
other having only finitely many terms different from $p - 1$. Hint: Use the
Euclidean algorithm to write $q = b_1 p^{m-1} + b_2 p^{m-2} + \cdots + b_{m-1} p + b_m$,
where $b_k \in Z$ and $0 \le b_k \le p - 1$ for $k = 1, 2, \ldots, m$.
d)	Prove that $x$ can have at most two different base-$p$ expansions and that,
if it has two, it is of the form $q/p^m$, where $q, m \in \mathcal{N}$. Hint: Assume $x$ has
two different base-$p$ expansions, say, $0.a_1 a_2 \ldots \ (p)$ and $0.b_1 b_2 \ldots \ (p)$.
Let $n$ be the first positive integer for which $a_n \ne b_n$ and assume without
loss of generality that $a_n < b_n$. Show that this implies that $a_n = b_n - 1$
and that, for $k \ge n + 1$, $a_k = p - 1$ and $b_k = 0$.
2.68	Refer to Example 2.13 on page 74. For each part that follows, explain why
we know from Proposition 2.16 that 1/2 has
a) two binary expansions,
b) two decimal expansions,
c) a unique ternary expansion.
2.69	Prove Proposition 2.17: The Cantor set consists of all numbers in $[0, 1]$ that
have a ternary expansion without the digit 1. Proceed as follows.
a)	Show that $x \in G_n$ if and only if each of its ternary expansions is of the
form
$$0.(2c_1)(2c_2)\ldots(2c_{n-1})1a_{n+1}a_{n+2}\ldots \ (3), \tag{2.6}$$
where $c_k \in \{0, 1\}$ for $1 \le k \le n - 1$ and not all the $a_k$s are 0 and not all
are 2.
b)	Use part (a) to show that $G$ consists of all numbers in $(0, 1)$ that require
a 1 in (each of) their ternary expansions.
c)	Use part (b) to conclude that Proposition 2.17 holds.
2.70	Refer to the notation introduced on page 75.
a)	Let $I$ be any one of the $2^n$ closed intervals whose union is $P_n$. Prove
that $I \cap P$ is uncountable.
b)	Prove that for each $x \in P$ and $\delta > 0$, the set $(x - \delta, x + \delta) \cap P$ is
uncountable.
2.71	Prove that the Cantor function, $\psi$, satisfies $\psi(x) = 2\psi(x/3)$ for all $x \in [0, 1]$.
2.72 Prove Proposition 2.19: The Cantor function, $\psi$, is a nondecreasing function
on $[0, 1]$. Hint: Let $x, y \in [0, 1]$ with $x < y$. You must show $\psi(x) \le \psi(y)$.
To accomplish that, consider cases depending on whether $x$ is a member of
the Cantor set and whether $y$ is a member of the Cantor set. First consider
the case where both $x$ and $y$ are members of the Cantor set.
2.73	Complete the proof of Proposition 2.20 on page 79 by showing that the
Cantor function is continuous at the endpoints of [0,1].
2.74	Let $\psi$ denote the Cantor function and define
$$D = \left\{ \frac{\psi(x + h) - \psi(x)}{h} : x \in [0, 1] \right\}.$$
Show that $\inf D = 0$ and $\sup D = \infty$.
2.75	Generalize the technique used in the proof of Proposition 2.20 on page 79 to
establish the following fact: If $f\colon [a, b] \to [c, d]$ is monotone and onto, then
$f$ is continuous on $[a, b]$.
2.6 THE RIEMANN INTEGRAL
In this section we will discuss the Riemann integral. We will define it
in a way that motivates the definition of the Lebesgue integral, which is
presented in Chapter 3.
In defining the Riemann integral, we need the concept of characteristic
function. The characteristic function of a set indicates which elements
are in the set and which are not. More precisely, we have the following
definition.
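DEFINITION 2.17 Characteristic Function
Let $\Omega$ be a set and $A \subset \Omega$. Then the characteristic function of $A$,
denoted $\chi_A$, is the real-valued function on $\Omega$ defined by
$$\chi_A(x) = \begin{cases} 1, & \text{if } x \in A; \\ 0, & \text{if } x \notin A. \end{cases}$$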
The Riemann Integral
First we define the integral of a step function on a closed and bounded
interval. A step function on an interval $[a, b]$ is a function of the form
$$h = \sum_{k=1}^n a_k \chi_{I_k}, \tag{2.7}$$
where $n \in \mathcal{N}$, $\{a_k\}_{k=1}^n$ is a sequence of real numbers, and $\{I_k\}_{k=1}^n$ is
a finite sequence of pairwise disjoint intervals whose union is $[a, b]$. We
permit degenerate intervals in this representation, that is, intervals of the
form $[c, c] = \{c\}$ or $[c, c) = (c, c] = (c, c) = \emptyset$.
Let us denote by $\ell(I)$ the length of an interval $I$, where the length
of a degenerate interval is defined to be 0. Then the integral of the step
function in (2.7) is defined by†
$$\int_a^b h(x)\,dx = \sum_{k=1}^n a_k \ell(I_k).$$
We leave it to the reader as an exercise to show that this definition of the
integral of a step function is well-posed. See Exercise 2.77.
† Note that we are defining the integral of a step function, not the Riemann integral
of a step function. We will see shortly, however, that the two integrals agree.
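For example, let $[a, b] = [0, 1]$ and define
$$h(x) = \begin{cases} 2, & \text{if } 0 \le x < 1/3; \\ 3, & \text{if } 1/3 \le x \le 1/2; \\ -4, & \text{if } 1/2 < x \le 1. \end{cases}$$
Then $h = 2\chi_{[0,1/3)} + 3\chi_{[1/3,1/2]} - 4\chi_{(1/2,1]}$ and we have
$$\int_0^1 h(x)\,dx = 2\left(\frac{1}{3} - 0\right) + 3\left(\frac{1}{2} - \frac{1}{3}\right) - 4\left(1 - \frac{1}{2}\right) = -\frac{5}{6}.$$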
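The integral of a step function is a finite sum and can be evaluated directly from a representation of the form (2.7). The following Python sketch is an illustration only; the name `step_integral` and the encoding of each term as a triple (value, left endpoint, right endpoint) are assumptions made here. Whether an endpoint belongs to an interval does not affect its length, so the encoding ignores it.

```python
from fractions import Fraction


def step_integral(pieces):
    """Sum of a_k * length(I_k) over pieces given as (a_k, left_k, right_k)."""
    return sum(value * (right - left) for value, left, right in pieces)


# The step function of the example: 2 on [0, 1/3), 3 on [1/3, 1/2], -4 on (1/2, 1].
h = [
    (2, Fraction(0), Fraction(1, 3)),
    (3, Fraction(1, 3), Fraction(1, 2)),
    (-4, Fraction(1, 2), Fraction(1)),
]

print(step_integral(h))
```

The three terms are $2/3$, $1/2$, and $-2$, whose sum is $-5/6$.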
Next we define the upper and lower Riemann integrals of a bounded
real-valued function. Let $f$ be a bounded real-valued function on $[a, b]$,
that is, $f\colon [a, b] \to \mathcal{R}$ and there is an $M \in \mathcal{R}$ such that $|f(x)| \le M$ for
all $x \in [a, b]$. Then the upper Riemann integral of $f$ over $[a, b]$ is
defined by
$$\overline{\int_a^b} f(x)\,dx = \inf\left\{ \int_a^b h(x)\,dx : h \text{ a step function and } h \ge f \right\}.$$
Similarly, the lower Riemann integral of $f$ over $[a, b]$ is defined by
$$\underline{\int_a^b} f(x)\,dx = \sup\left\{ \int_a^b h(x)\,dx : h \text{ a step function and } h \le f \right\}.$$
It is not too difficult to show that (see Exercise 2.79)
$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx.$$
If equality holds, then $f$ is said to be Riemann integrable over $[a, b]$.
DEFINITION 2.18 Riemann Integrable; Riemann Integral
Let $f$ be a bounded real-valued function on $[a, b]$. If
$$\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx,$$
then we say that $f$ is Riemann integrable over $[a, b]$. In this case,
the common value of the upper and lower Riemann integrals is called
the Riemann integral of $f$ over $[a, b]$ and is denoted by
$$\int_a^b f(x)\,dx.$$
We write $R([a, b])$ for the collection of all Riemann integrable functions
over $[a, b]$.
EXAMPLE 2.14 Illustrates Definition 2.18
a)	Let $f$ be a step function on $[a, b]$. Then, see Exercise 2.80,
$$\overline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx$$
and
$$\underline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx,$$
where each integral on the right is interpreted as the integral of a step
function. Thus, every step function on $[a, b]$ is Riemann integrable and,
moreover, its Riemann integral equals its integral as a step function.
b)	As we will discover later in this section, a continuous function on $[a, b]$
is Riemann integrable thereon.
c)	Define $f(0) = 0$ and $f(x) = 1/x^2$ for $x \ne 0$. Then $f$ is not Riemann
integrable over a (closed and bounded) interval containing 0, as it is
not bounded on such an interval. On the other hand, by part (b), $f$ is
Riemann integrable over a (closed and bounded) interval that does not
contain 0, because it is continuous on such an interval.
d)	Here is an example of a bounded function that is not Riemann integrable.
For $x \in [0, 1]$, define $f(x) = \chi_Q(x)$. Now, a step function $h$ that
dominates $f$ must satisfy $h(x) \ge 1$ except at a finite number of points.
Because 1 (the function identically equal to 1) is a step function that
dominates $f$, it follows that $\overline{\int_0^1} f(x)\,dx = 1$. Similarly, a step function $h$
that is dominated by $f$ must satisfy $h(x) \le 0$ except at a finite number
of points. Because 0 (the function identically equal to 0) is a step function
that is dominated by $f$, it follows that $\underline{\int_0^1} f(x)\,dx = 0$. Hence, $f$ is
not Riemann integrable over $[0, 1]$.	□
Basic Properties of the Riemann Integral
The following theorem provides some fundamental properties of the Riemann
integral. Its proof can be found in many advanced calculus and
introductory real analysis texts.†
THEOREM 2.6
Suppose that $f, g \in R([a, b])$ and that $\alpha \in \mathcal{R}$.
a)	If $a < c < b$, then $f \in R([a, c]) \cap R([c, b])$ and
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$
b)	If $f \le g$, then
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$
c)	We have $f + g \in R([a, b])$ and
$$\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.$$
d)	We have $\alpha f \in R([a, b])$ and
$$\int_a^b (\alpha f)(x)\,dx = \alpha \int_a^b f(x)\,dx.$$
e)	We have $|f| \in R([a, b])$ and
$$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx.$$
† See, for instance, Richard Goldberg’s Methods of Real Analysis, Section 7.4, 2d ed.
(New York: Wiley, 1976).
Characterization of Riemann Integrable Functions
We stated earlier that a continuous function is Riemann integrable. Al-
though there are noncontinuous functions that are Riemann integrable
(e.g., step functions), we will soon see that a function is Riemann inte-
grable if and only if it is “essentially continuous.”
To make “essentially continuous” precise, we introduce the concept
of sets of measure zero. Intuitively, these are sets without much content,
although they may be large in the sense of cardinality.
DEFINITION 2.19 Set of Measure Zero
A subset $E$ of $\mathcal{R}$ is said to have measure zero if for each $\epsilon > 0$,
there exists a sequence $\{I_n\}_n$ of open intervals such that $E \subset \bigcup_n I_n$
and $\sum_n \ell(I_n) < \epsilon$.
The next proposition shows that a countable union of sets of measure
zero also has measure zero. Its proof is left to the reader as an exercise.
PROPOSITION 2.21
Let $\{E_n\}_n$ be a sequence of subsets of $\mathcal{R}$ each having measure
zero. Then $\bigcup_n E_n$ has measure zero.
EXAMPLE 2.15 Illustrates Sets of Measure Zero
a)	A singleton set has measure zero; that is, if $x \in \mathcal{R}$, then $\{x\}$
has measure zero. Indeed, for $\epsilon > 0$,
$I_\epsilon = (x - \epsilon/4, x + \epsilon/4)$ is an open interval,
$\{x\} \subset I_\epsilon$, and $\ell(I_\epsilon) = \epsilon/2 < \epsilon$.
b)	It follows immediately from part (a) and Proposition 2.21 that a countable
subset of $\mathcal{R}$ has measure zero. In particular, a finite subset of
$\mathcal{R}$ has measure zero and $\mathcal{N}$, $\mathcal{Z}$, and $Q$ have
measure zero.
c)	The Cantor set has measure zero. (See Exercise 2.84.) Since the Cantor
set is uncountable (Corollary 2.2 on page 77), we see that although being
countable is a sufficient condition for a subset of $\mathcal{R}$ to have
measure zero, it is not necessary.
d)	A (nondegenerate) interval does not have measure zero. See
Exercise 2.86.	□
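The covering argument behind parts (a) and (b) can be checked by exact arithmetic. The following is only a sketch under our own assumptions: the helper names `enumerate_rationals` and `cover` are hypothetical, only the first 50 rationals of $[0,1]$ are treated, and the $n$th point is given an open interval of length $\epsilon/2^{n+1}$, one convenient choice among many.

```python
from fractions import Fraction

def enumerate_rationals(count):
    """Return the first `count` distinct rationals in [0, 1], ordered by denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(points, eps):
    """Give the n-th point (n = 1, 2, ...) an open interval of length eps / 2**(n + 1)."""
    intervals = []
    for n, x in enumerate(points, start=1):
        half = eps / 2 ** (n + 2)  # half-length of the n-th interval
        intervals.append((x - half, x + half))
    return intervals

eps = Fraction(1, 100)
points = enumerate_rationals(50)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)
```

Since the lengths form a geometric series with sum below $\epsilon/2$, the same recipe applied to a full enumeration of a countable set keeps the total length below $\epsilon$.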
We are now in a position to state a continuity-type characterization of
Riemann integrable functions. For a proof, see, for instance, Section 7.3 of
the text referenced in the footnote on page 84.
THEOREM 2.7
A bounded function on [a, b] is Riemann integrable if and only if the set of
points of discontinuity of the function has measure zero.
COROLLARY 2.3
If f is continuous on [a, b], then it is Riemann integrable thereon and
$$\int_a^b f(x)\,dx = \sup_h \int_a^b h(x)\,dx, \tag{2.8}$$
where the supremum is taken over all step functions h that are dominated
by f.
Equation (2.8) will serve as motivation when, in Chapter 3, we define
the Lebesgue integral of a measurable function.
EXAMPLE 2.16 Illustrates Theorem 2.7
a)	From Theorem 2.7 and Example 2.15(b), we see that a bounded function
on [a, b] having countably many discontinuities is Riemann integrable.
b)	Every monotone function on [a, b] is Riemann integrable. This follows
immediately from Exercise 2.59(c) on page 72 and part (a) here. □
Convergence Properties of the Riemann Integral
An important question in analysis is: If a sequence of functions converges
pointwise, can the integral and limit be interchanged? For Riemann
integration, this question can be stated as follows. Suppose that
$\{f_n\}_{n=1}^\infty$ is a sequence of Riemann integrable functions on $[a,b]$
that converges pointwise to a Riemann integrable function $f$. Is it true that
$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx? \tag{2.9}$$
The answer, in general, is no. For example, let $f = 0$ (the function
identically 0) and $f_n = n\chi_{(0,1/n)}$ for each $n \in \mathcal{N}$. Then
$f_n \to f$ pointwise on $[0,1]$. But $\int_0^1 f(x)\,dx = 0$ and
$\int_0^1 f_n(x)\,dx = 1$ for each $n \in \mathcal{N}$. Therefore,
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = 1 \ne 0 = \int_0^1 f(x)\,dx.$$
Even if each $f_n$ and $f$ are continuous, the limit and the integral cannot,
in general, be interchanged. As in the discussion of pointwise convergence
of continuous functions, the concept of uniform convergence plays an
important role here.
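The counterexample can be spot-checked with exact rational arithmetic. This is a sketch only; the function names are ours, and the sample point $x = 1/7$ is an arbitrary choice.

```python
from fractions import Fraction

def f_n(n, x):
    """n times the characteristic function of the open interval (0, 1/n)."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f_n(n):
    """Integral of the step function f_n over [0, 1]: the value n times the length 1/n."""
    return n * Fraction(1, n)

def tail_is_zero(x, start, stop):
    """Check that f_n(x) = 0 for every n with start <= n < stop."""
    return all(f_n(n, x) == 0 for n in range(start, stop))

x = Fraction(1, 7)
integrals = [integral_f_n(n) for n in range(1, 50)]   # every entry equals 1
vanishes_eventually = tail_is_zero(x, 7, 200)          # f_n(1/7) = 0 once n >= 7
```

For a fixed $x > 0$ the values $f_n(x)$ vanish as soon as $1/n \le x$, while the integrals stay equal to 1; that is exactly the failure of (2.9).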
THEOREM 2.8
Suppose $\{f_n\}_{n=1}^\infty$ is a sequence of Riemann integrable functions
on $[a,b]$ that converges uniformly to a function $f$. Then $f$ is Riemann
integrable over $[a,b]$ and
$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$
PROOF: That f is Riemann integrable is left to the reader as an exercise.
(See Exercise 2.91.)
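Let $\epsilon > 0$ be given. Choose $N \in \mathcal{N}$ such that
$|f(x) - f_n(x)| < \epsilon/(b-a)$ for all $x \in [a,b]$ whenever $n \ge N$.
Using Theorem 2.6 on page 84, we have, for $n \ge N$,
$$\left| \int_a^b f(x)\,dx - \int_a^b f_n(x)\,dx \right|
  = \left| \int_a^b (f - f_n)(x)\,dx \right|
  \le \int_a^b |f(x) - f_n(x)|\,dx
  \le \int_a^b \frac{\epsilon}{b-a}\,dx = \epsilon,$$
as required.	■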
As Theorem 2.8 shows, uniform convergence is a sufficient condition
for the interchange of limit and integral. It is not, however, a necessary
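condition. To see this, let $f_n(x) = x^n$ and
$$f(x) = \begin{cases} 1, & x = 1; \\ 0, & x \ne 1. \end{cases}$$
Then $f_n \to f$ pointwise, but not uniformly, on $[0,1]$. Moreover,
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \lim_{n\to\infty} \frac{1}{n+1} = 0 = \int_0^1 f(x)\,dx.$$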
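A small numerical sketch of this example follows. The helper names are ours, and the "witness" point $x_n = (1/2)^{1/n}$ is just one convenient way to see that $\sup_x |f_n(x) - f(x)|$ stays at least $1/2$; floating-point arithmetic is used there, so equalities hold only up to rounding.

```python
from fractions import Fraction

def integral_x_pow(n):
    """Exact value of the integral of x**n over [0, 1], namely 1/(n + 1)."""
    return Fraction(1, n + 1)

def witness(n):
    """A point x in (0, 1) with x**n = 1/2 up to rounding, so |f_n(x) - f(x)| is about 1/2."""
    return 0.5 ** (1.0 / n)

integrals = [integral_x_pow(n) for n in (1, 10, 100, 1000)]       # decreasing toward 0
gaps = [abs(witness(n) ** n - 0.0) for n in (1, 10, 100, 1000)]   # each about 1/2
```

The integrals tend to 0 even though the uniform distance between $f_n$ and $f$ does not, which is why uniform convergence is sufficient but not necessary here.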
We have seen three important consequences of uniform convergence.
It is a sufficient condition for
•	the limit of a sequence of continuous functions to be continuous (Propo-
sition 2.15 on page 70).
•	the limit of a sequence of Riemann integrable functions to be Riemann
integrable (Theorem 2.8).
•	the interchange of limit and integral (Theorem 2.8).
Although uniform convergence has these and other desirable conse-
quences, it is a very strong condition to place on a sequence of functions,
especially when the common domain is the entire real line. This fact and a
need to “integrate” non-Riemann integrable functions will lead us naturally
to Lebesgue measurable functions and the Lebesgue integral in Chapter 3.
EXERCISES 2.6
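2.76	Let $\Omega$ be a set and $A, B \subset \Omega$. Prove the following facts.
a)	$\chi_{A \cap B} = \chi_A \cdot \chi_B$.
b)	If $A \cap B = \emptyset$, then $\chi_{A \cup B} = \chi_A + \chi_B$.
c)	More generally than in part (b), if $\{C_n\}_n$ is a pairwise disjoint sequence
of subsets of $\Omega$, then $\chi_{\bigcup_n C_n} = \sum_n \chi_{C_n}$.
d)	Obtain a general formula for $\chi_{A \cup B}$.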
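2.77	Show that the definition of the Riemann integral of a step function is
well-posed: Suppose $h = \sum_{k=1}^n a_k \chi_{I_k} = \sum_{j=1}^m b_j \chi_{J_j}$,
where $n, m \in \mathcal{N}$, $\{a_k\}_{k=1}^n$ and $\{b_j\}_{j=1}^m$ are
sequences of real numbers, and $\{I_k\}_{k=1}^n$ and $\{J_j\}_{j=1}^m$ are
each a finite sequence of pairwise disjoint intervals whose union is $[a,b]$.
Prove that $\sum_{k=1}^n a_k \ell(I_k) = \sum_{j=1}^m b_j \ell(J_j)$. Hint: First
show $I_k \cap J_j \ne \emptyset$ implies that $a_k = b_j$. Then show
$a_k \ell(I_k) = a_k \sum_{j=1}^m \ell(I_k \cap J_j)$.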
2.78	Suppose that $g$ and $h$ are step functions on $[a,b]$ and that $g \le h$ on
$[a,b]$, that is, $g(x) \le h(x)$ for all $x \in [a,b]$. Prove that
$\int_a^b g(x)\,dx \le \int_a^b h(x)\,dx$, where each integral is interpreted
as the integral of a step function. Hint: See the hint given in Exercise 2.77.
2.79	Let $f$ be a bounded function on $[a,b]$. Prove that
$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx.$$
Hint: Use Exercise 2.78.
2.80	Let $f$ be a step function on $[a,b]$. Prove that
$$\overline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx$$
and
$$\underline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx,$$
where each integral on the right is interpreted as the integral of a step
function. Thus, a step function on $[a,b]$ is Riemann integrable thereon
and its Riemann integral equals its integral as a step function. Hint: Use
Exercise 2.78.
2.81	Prove that a real-valued function $f$ on $[a,b]$ is a step function if and
only if there is a partition $a = x_0 < x_1 < \cdots < x_n = b$ of $[a,b]$ and
real numbers $c_1, c_2, \ldots, c_n$ such that for each $1 \le k \le n$, we have
$f(x) = c_k$ for $x_{k-1} < x < x_k$.
2.82	In this exercise, you are asked to show that the definitions of the upper and
lower Riemann integrals, as presented on page 82, are equivalent to those
usually encountered in advanced calculus and introductory real analysis
courses. Let $f$ be a bounded real-valued function on $[a,b]$. For a partition
$a = x_0 < x_1 < \cdots < x_n = b$ of $[a,b]$, define, for $1 \le k \le n$,
$I_k = [x_{k-1}, x_k]$, $M(f, I_k) = \sup\{ f(x) : x \in I_k \}$, and
$m(f, I_k) = \inf\{ f(x) : x \in I_k \}$. Then define
$$U_a^b(f) = \inf\Bigl\{ \sum_{k=1}^n M(f, I_k)(x_k - x_{k-1}) : a = x_0 < x_1 < \cdots < x_n = b \Bigr\}$$
and
$$L_a^b(f) = \sup\Bigl\{ \sum_{k=1}^n m(f, I_k)(x_k - x_{k-1}) : a = x_0 < x_1 < \cdots < x_n = b \Bigr\}.$$
Prove that $U_a^b(f) = \overline{\int_a^b} f(x)\,dx$ and
$L_a^b(f) = \underline{\int_a^b} f(x)\,dx$. Hint: Use Exercise 2.81.
2.83	Prove Proposition 2.21 on page 85.
2.84	Prove that the Cantor set, $P$, has measure zero. Hint: Recall that
$P \subset P_n$ for each $n \in \mathcal{N}$, where $P_n$ is the set remaining
after the $n$th step in the construction of the Cantor set.
2.85	Show that a subset of a set of measure zero also has measure zero.
★2.86 Prove that a nondegenerate interval does not have measure zero by pro-
ceeding as follows.
a)	Let $a, b \in \mathcal{R}$ with $a < b$. Show that if $\{I_k\}_{k=1}^n$ is a
finite sequence of open intervals whose union contains $[a,b]$, then
$\sum_{k=1}^n \ell(I_k) > b - a$.
b)	Deduce from part (a) that if a < b, then [a, b] does not have measure
zero. Hint: Use the fact that if a closed bounded interval is a subset of
the union of a collection of open sets, then there is a finite subcollection
of that collection whose union also contains the interval. This result is
a special case of the Heine-Borel theorem.
c)	Conclude from part (b) that a nondegenerate interval does not have
measure zero.
2.87	Define $f\colon [0,1] \to \mathcal{R}$ by
$$f(x) = \begin{cases} 0, & \text{if } x = 0 \text{ or } x \in [0,1] \setminus Q; \\ 1/q, & \text{if } x \in (0,1] \cap Q \text{ and } x = p/q \text{ in lowest terms.} \end{cases}$$
a)	Show that the set of points of discontinuity of $f$ is $(0,1] \cap Q$.
b)	Deduce from part (a) that $f$ is Riemann integrable on $[0,1]$.
c)	Show that $\int_0^1 f(x)\,dx = 0$.
2.88	Find a function on [0,1] that has uncountably many points of discontinuity
but is Riemann integrable. Hint: Do something with the Cantor set.
2.89	Construct a sequence of continuous functions on [0,1] that converges point-
wise to a continuous function but for which the limit and integral cannot
be interchanged.
2.90	Refer to Exercise 2.89. Is it possible to find such a sequence of functions if
the sequence is required to be monotone?
2.91	Prove that if $\{f_n\}_{n=1}^\infty$ is a sequence of Riemann integrable
functions on $[a,b]$ that converges uniformly to a function $f$, then $f$ is
Riemann integrable. Proceed as follows.
a)	Show that $f$ is bounded.
b)	Show that the set of points of discontinuity of $f$ has measure zero. Hint:
Let $E_n$ denote the set of points of discontinuity of $f_n$ and set
$E = \bigcup_n E_n$. Show that $f$ is continuous at each point of
$[a,b] \setminus E$.
2.92	Let $\{f_n\}_{n=1}^\infty$ be a sequence of Riemann integrable functions on
$[0,1]$ that converges pointwise to the function $f$. Construct an example
showing that $f$ need not be Riemann integrable on $[0,1]$, even if
$\{f_n\}_{n=1}^\infty$ is monotone and $f$ is bounded.
PART TWO
□
Measure, Integration, and
Differentiation
Émile Félix-Édouard-Justin Borel
(1871-1956)
Émile Borel was born at Saint-Affrique, France,
on January 7, 1871. He exhibited a strong proclivity
for mathematics when he was very young
and was sent to a lycée at Montauban. In 1890,
he entered the École Polytechnique in Paris,
graduating in 1893. He received his doctorate
from the École Normale Supérieure in 1894.
Borel’s most important research was done
in the 1890s when he worked on probability, the infinitesimal calculus,
divergent series, and measure theory. In 1896, he provided the proof of Pi-
card’s theorem, a proof that mathematicians had been seeking for nearly
20 years. Although John von Neumann is credited as the founder of game
theory, Borel completed a series of papers on the subject between 1921
and 1927, thus being the first to define games of strategy.
After WW I, Borel developed an interest in politics, serving as Minister
of the Navy from 1925-1940. He was arrested and briefly imprisoned
by the Vichy regime in 1940, after which he worked in the Resistance.
His honors included the Resistance medal in 1945, the Croix de Guerre
in 1918, the Grand Cross of the Légion d'Honneur in 1950, and the first
gold medal of the Centre National de la Recherche Scientifique in 1955.
Borel was appointed to the faculty of the École Normale Supérieure
in 1896, held the Chair in Function Theory at the Sorbonne from 1909
until 1940, and was director and founding member of the Henri Poincaré
Institute from 1928 until his death on February 3, 1956, in Paris.
Lebesgue Theory on the
Real Line
In Chapter 2, we discussed open sets, continuous functions, and the
Riemann integral. Those classical concepts have served mathematics and
its applications well. However, for the purposes of modern mathematics, a
more general and sophisticated framework is required. In this chapter, we
will take the first steps toward obtaining that framework.
We will expand the collection of continuous functions to the collec-
tion of Borel measurable functions, the smallest algebra of functions that
contains the continuous functions and is closed under pointwise limits. In
doing so, we will be led to consider the collection of Borel sets, the smallest
$\sigma$-algebra of subsets of $\mathcal{R}$ that contains the open sets. Then we will
generalize the Riemann integral so that it applies to Borel measurable
functions. That generalization will take us to the development of Lebesgue
measure, Lebesgue measurable functions, and the Lebesgue integral.
3.1 BOREL MEASURABLE FUNCTIONS AND BOREL SETS
In the previous chapter, we showed that the collection of continuous, real-
valued functions forms an algebra but is not closed under pointwise limits.
Since this latter property is a crucial one in modern mathematical analysis,
we will enlarge the collection of continuous functions to a collection of
functions that is closed under pointwise limits.
Specifically, we will consider the smallest algebra of (real-valued) func-
tions that contains the continuous functions and is closed under pointwise
limits. Such an algebra of functions exists — it is the intersection of all
algebras of functions that contain the continuous functions and are closed
under pointwise limits.†
As we will see presently, the condition of being an algebra is superflu-
ous. That is, the smallest collection of functions that contains the continu-
ous functions and is closed under pointwise limits is necessarily an algebra
of functions. Thus, we make the following definition:
DEFINITION 3.1 Borel Measurable Functions
We denote by $\widehat{C}$ the smallest collection of real-valued functions on
$\mathcal{R}$ that contains the collection $C$ of continuous functions and is
closed under pointwise limits. The members of $\widehat{C}$ are called Borel
measurable functions.
THEOREM 3.1
The collection, $\widehat{C}$, of Borel measurable functions forms an algebra.
That is, if $f$ and $g$ are Borel measurable and $\alpha \in \mathcal{R}$, then
a)	$f + g$ is Borel measurable.
b)	$\alpha f$ is Borel measurable.
c)	$f \cdot g$ is Borel measurable.
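PROOF: We prove only part (a); parts (b) and (c) are left as exercises.
First of all, let $g \in C$ (the collection of continuous functions on
$\mathcal{R}$) and set
$$\mathcal{D} = \{ f \in \widehat{C} : f + g \in \widehat{C} \}.$$
If $f \in C$, then $f \in \widehat{C}$ and $f + g \in C \subset \widehat{C}$.
Thus, $\mathcal{D} \supset C$. Now suppose that
$\{f_n\}_{n=1}^\infty \subset \mathcal{D}$ and that $f_n \to f$ pointwise. Then
$f_n \in \widehat{C}$ and $f_n + g \in \widehat{C}$ for all $n \in \mathcal{N}$ and
$f_n + g \to f + g$ pointwise. Since $\widehat{C}$ is closed under pointwise
limits, we conclude that $f \in \widehat{C}$ and $f + g \in \widehat{C}$. Hence,
$f \in \mathcal{D}$. Therefore, we see that $\mathcal{D}$ is closed under
pointwise limits.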
† The forementioned intersection is not vacuous because the collection of all
real-valued functions is an algebra that contains the continuous functions and
is closed under pointwise limits.
The previous paragraph shows that $\mathcal{D}$ contains the continuous
functions and is closed under pointwise limits. Since, by definition,
$\widehat{C}$ is the smallest such collection of functions, it follows that
$\mathcal{D} \supset \widehat{C}$. But, by the definition of $\mathcal{D}$,
$\mathcal{D} \subset \widehat{C}$. Thus, $\mathcal{D} = \widehat{C}$; in other
words, $f + g \in \widehat{C}$ whenever $f \in \widehat{C}$ and $g \in C$.
Next, let $f \in \widehat{C}$ and set
$$\mathcal{E} = \{ g \in \widehat{C} : f + g \in \widehat{C} \}.$$
It follows from what we just proved that $\mathcal{E}$ contains the continuous
functions. Moreover, the same argument that we used to show that $\mathcal{D}$
is closed under pointwise limits shows that $\mathcal{E}$ is closed under
pointwise limits. Hence, $\mathcal{E} = \widehat{C}$; i.e.,
$f + g \in \widehat{C}$ whenever $f \in \widehat{C}$ and $g \in \widehat{C}$.	■
Borel Sets
In Chapter 2, we discovered that there is a natural association between
the continuous functions and the collection of open sets: A function is
continuous if and only if the inverse image of each open set is open. Now we
ask which collection of sets corresponds naturally to the Borel measurable
functions. As we will see, it is the collection of sets whose characteristic
functions are Borel measurable functions.
DEFINITION 3.2 Borel Sets
A set $B \subset \mathcal{R}$ is called a Borel set if its characteristic
function, $\chi_B$, is Borel measurable. The collection of all Borel sets is
denoted by $\mathcal{B}$. Thus,
$\mathcal{B} = \{ B \subset \mathcal{R} : \chi_B \in \widehat{C} \}$.
To begin, we will prove that the Borel sets form a $\sigma$-algebra of subsets
of $\mathcal{R}$. In order to accomplish this, we will need several lemmas. The proof
of the first lemma is left as an exercise for the reader (see Exercise 3.2).
LEMMA 3.1
Let $h$ denote the absolute value function, that is, $h(x) = |x|$,
$x \in \mathcal{R}$. Then there is a sequence, $\{p_n\}_{n=1}^\infty$, of
polynomials such that $p_n \to h$ pointwise.
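To get a feel for Lemma 3.1, one can experiment with an explicit polynomial sequence. The sketch below uses the classical recursion $p_0 = 0$, $p_{n+1}(x) = p_n(x) + \tfrac{1}{2}\bigl(x^2 - p_n(x)^2\bigr)$, which is not the Taylor-series route suggested in Exercise 3.2 and which we assume only on $[-1,1]$; there it increases to $|x|$ with error at most $2/(n+1)$. The function name `p` and the sampling grid are ours.

```python
def p(n, x):
    """Evaluate the n-th polynomial of the recursion at the point x."""
    value = 0.0
    for _ in range(n):
        value += (x * x - value * value) / 2
    return value

n = 40
grid = [k / 100 for k in range(-100, 101)]          # sample points in [-1, 1]
max_error = max(abs(x) - p(n, x) for x in grid)     # largest gap |x| - p_n(x) on the grid
```

This illustrates uniform approximation on a single compact interval only; passing from there to pointwise convergence on all of $\mathcal{R}$ is the content of the rescaling steps in Exercise 3.2.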
Next we introduce the notation used for the maximum and minimum
of two functions and prove that the Borel measurable functions are closed
under those two operations.
DEFINITION 3.3 Maximum and Minimum of Two Functions
Suppose that $f$ and $g$ are real-valued functions on $\mathcal{R}$. Then we
define $f \vee g = \max\{f, g\}$ and $f \wedge g = \min\{f, g\}$. That is,
$$(f \vee g)(x) = \max\{ f(x), g(x) \}$$
and
$$(f \wedge g)(x) = \min\{ f(x), g(x) \}.$$
LEMMA 3.2
If $f$ and $g$ are Borel measurable functions, then so are $f \vee g$ and
$f \wedge g$. More generally, if $f_1, f_2, \ldots, f_n$ are Borel measurable,
then so are $f_1 \vee \cdots \vee f_n$ and $f_1 \wedge \cdots \wedge f_n$.
PROOF: We first note that the following identities hold:
$$f \vee g = \tfrac{1}{2}\bigl( f + g + |f - g| \bigr) \tag{3.1}$$
and
$$f \wedge g = \tfrac{1}{2}\bigl( f + g - |f - g| \bigr). \tag{3.2}$$
Next we show that $|F| \in \widehat{C}$ whenever $F \in \widehat{C}$. Use
Lemma 3.1 to choose a sequence of polynomials, $\{p_n\}_{n=1}^\infty$, such that
$p_n(x) \to |x|$ for all $x \in \mathcal{R}$. Since $\widehat{C}$ is an algebra
of functions (Theorem 3.1), it follows that $p_n \circ F \in \widehat{C}$ for
all $n \in \mathcal{N}$. But $p_n \circ F \to |F|$ pointwise and, therefore, as
$\widehat{C}$ is closed under pointwise limits, $|F| \in \widehat{C}$.
Now suppose that $f, g \in \widehat{C}$. Then $|f - g| \in \widehat{C}$ (why?).
Using (3.1) and (3.2) and the fact that $\widehat{C}$ is an algebra, we deduce
that $f \vee g \in \widehat{C}$ and $f \wedge g \in \widehat{C}$. The remaining
conclusions of the lemma follow by employing mathematical induction.	■
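Identities (3.1) and (3.2) are easy to spot-check with exact arithmetic. In the sketch below the names `join` and `meet`, the two sample functions, and the sample points are all our own choices, not part of the text.

```python
from fractions import Fraction

def join(f, g):
    """Right-hand side of (3.1): (f + g + |f - g|) / 2, evaluated pointwise."""
    return lambda x: (f(x) + g(x) + abs(f(x) - g(x))) / 2

def meet(f, g):
    """Right-hand side of (3.2): (f + g - |f - g|) / 2, evaluated pointwise."""
    return lambda x: (f(x) + g(x) - abs(f(x) - g(x))) / 2

f = lambda x: x * x - 1
g = lambda x: 2 * x
samples = [Fraction(k, 4) for k in range(-12, 13)]

join_ok = all(join(f, g)(x) == max(f(x), g(x)) for x in samples)
meet_ok = all(meet(f, g)(x) == min(f(x), g(x)) for x in samples)
```

The point of the identities is that maximum and minimum are expressed through sums, scalar multiples, and absolute values, which are the operations already known to preserve Borel measurability.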
LEMMA 3.3
If $\{f_n\}_{n=1}^\infty$ is a sequence of Borel measurable functions with
$\{f_n(x)\}_{n=1}^\infty$ bounded for each $x \in \mathcal{R}$, then
$\sup_n f_n$ and $\inf_n f_n$ are Borel measurable.
PROOF: By Lemma 3.2, if $f_1, f_2, \ldots, f_n$ are Borel measurable, then so
are $f_1 \vee \cdots \vee f_n$ and $f_1 \wedge \cdots \wedge f_n$. But,
$$\sup_n f_n = \lim_{n\to\infty} f_1 \vee \cdots \vee f_n$$
and
$$\inf_n f_n = \lim_{n\to\infty} f_1 \wedge \cdots \wedge f_n.$$
The lemma now follows because $\widehat{C}$ is closed under pointwise limits.	■
Now that we have established Lemmas 3.1-3.3, we can prove that the
collection $\mathcal{B}$ of Borel sets is a $\sigma$-algebra of subsets of $\mathcal{R}$.
THEOREM 3.2
The collection of Borel sets,
$\mathcal{B} = \{ B \subset \mathcal{R} : \chi_B \in \widehat{C} \}$, is a
$\sigma$-algebra of subsets of $\mathcal{R}$.
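PROOF: We first show that the collection of Borel sets is closed under
complementation. Assume $B \in \mathcal{B}$. Then, by definition,
$\chi_B \in \widehat{C}$. Since $1 \in \widehat{C}$ and $\widehat{C}$ is an
algebra, we conclude that $1 - \chi_B \in \widehat{C}$. But
$1 - \chi_B = \chi_{B^c}$ and, consequently, $B^c \in \mathcal{B}$.
Next we show that the collection of Borel sets is closed under countable
unions. Suppose that $B_n \in \mathcal{B}$, for $n \in \mathcal{N}$. Then
$\chi_{B_n} \in \widehat{C}$ for $n \in \mathcal{N}$ and, therefore, by
Lemma 3.3, $\sup_n \chi_{B_n} \in \widehat{C}$. But
$\sup_n \chi_{B_n} = \chi_{\bigcup_{n=1}^\infty B_n}$. Hence,
$\bigcup_{n=1}^\infty B_n \in \mathcal{B}$.	■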
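The two identities for characteristic functions used in this proof can be checked on a toy example. The sketch assumes a small finite universe and a finite family of sets (so the supremum is simply a maximum); the name `chi` and the sample sets are ours.

```python
def chi(B):
    """Characteristic function of the set B."""
    return lambda x: 1 if x in B else 0

universe = range(12)
sets = [{0, 3, 6, 9}, {1, 3, 5}, set(), {9, 10}]
union = set().union(*sets)
complement = set(universe) - sets[0]

# 1 - chi_B agrees with the characteristic function of the complement of B
complement_ok = all(1 - chi(sets[0])(x) == chi(complement)(x) for x in universe)
# the pointwise maximum of the chi_{B_n} agrees with the characteristic function of the union
union_ok = all(max(chi(B)(x) for B in sets) == chi(union)(x) for x in universe)
```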
Further Properties of Borel Sets and
Borel Measurable Functions
It is left as an exercise for the reader to show that if $O$ is an open set,
then $\chi_O$ is a Borel measurable function. In other words, every open set is
a Borel set. We will prove shortly that, in fact, $\mathcal{B}$ is the smallest
$\sigma$-algebra that contains all the open sets.
But first, we will justify our contention that it is natural to associate
the Borel sets with the Borel measurable functions, as we do the open sets
with the continuous functions. Specifically, we will show that a function
is Borel measurable if and only if the inverse image of each open set is a
Borel set. In order to accomplish this, we will introduce another collection
of functions which, as we will see, turns out to be identical to the collection
of Borel measurable functions.
LEMMA 3.4
Let $\mathcal{F} = \{ f : f^{-1}(O) \in \mathcal{B} \text{ for all open sets } O \}$.
Then $\mathcal{F}$ contains the continuous functions.
PROOF: Suppose that $f$ is a continuous function on $\mathcal{R}$. Then, by
Theorem 2.5 on page 66, $f^{-1}(O)$ is open whenever $O$ is open. But every
open set is a Borel set (Exercise 3.3). Therefore, $f^{-1}(O) \in \mathcal{B}$
whenever $O$ is open. This shows that $\mathcal{F}$ contains the continuous
functions.	■
LEMMA 3.5
$f \in \mathcal{F}$ if either of the following conditions holds:
a)	For each $a \in \mathcal{R}$, $f^{-1}((-\infty, a)) \in \mathcal{B}$.
b)	For each $a \in \mathcal{R}$, $f^{-1}((a, \infty)) \in \mathcal{B}$.
PROOF: We will prove part (a). The proof of part (b) is similar and is
left as an exercise. So, suppose that $f$ satisfies the condition in part (a).
We claim that $f \in \mathcal{F}$; that is, $f^{-1}(O) \in \mathcal{B}$ for all
open sets, $O$. Set
$$\mathcal{A} = \{ A \subset \mathcal{R} : f^{-1}(A) \in \mathcal{B} \}.$$
Because $f^{-1}(A^c) = [f^{-1}(A)]^c$,
$f^{-1}\bigl(\bigcup_n A_n\bigr) = \bigcup_n f^{-1}(A_n)$, and $\mathcal{B}$
forms a $\sigma$-algebra, it follows that $\mathcal{A}$ is a $\sigma$-algebra.
Now, by assumption, $\mathcal{A}$ contains all sets of the form
$(-\infty, a)$, where $a \in \mathcal{R}$. If $a \in \mathcal{R}$, then we can
write $(-\infty, a] = \bigcap_{n=1}^\infty (-\infty, a + 1/n)$; therefore,
$(-\infty, a] \in \mathcal{A}$ because $\mathcal{A}$ is a $\sigma$-algebra. This
in turn implies that $(a, b) \in \mathcal{A}$ for each $a, b \in \mathcal{R}$,
since $(a, b) = (-\infty, b) \cap (-\infty, a]^c$. It now follows easily that
$\mathcal{A}$ contains all open intervals. But, by Proposition 2.13 on page 59,
every open set is a countable union of open intervals. Consequently,
$\mathcal{A}$ contains all open sets. This means that
$f^{-1}(O) \in \mathcal{B}$ for all open sets, $O$; that is,
$f \in \mathcal{F}$.	■
LEMMA 3.6
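$\mathcal{F}$ is closed under pointwise limits.
PROOF: Suppose that $\{g_n\}_{n=1}^\infty \subset \mathcal{F}$ and let
$g = \sup_n g_n$. Then, for $a \in \mathcal{R}$,
$g^{-1}((a, \infty)) = \bigcup_{n=1}^\infty g_n^{-1}((a, \infty)) \in \mathcal{B}$.
Therefore, by the previous lemma, $\sup_n g_n \in \mathcal{F}$. Similarly,
$\inf_n g_n \in \mathcal{F}$.
Now suppose that $\{f_n\}_{n=1}^\infty \subset \mathcal{F}$ and $f_n \to f$
pointwise. Then for each $x \in \mathcal{R}$, $\lim_{n\to\infty} f_n(x) = f(x)$
and so $\limsup_{n\to\infty} f_n(x) = f(x)$. Thus,
$f = \inf_n \{ \sup_{k \ge n} f_k \}$. Let $g_n = \sup_{k \ge n} f_k$. Then the
previous paragraph shows in turn that $g_n \in \mathcal{F}$ for all
$n \in \mathcal{N}$ and $\inf_n g_n \in \mathcal{F}$. Hence,
$f \in \mathcal{F}$.	■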
COROLLARY 3.1
$\mathcal{F}$ contains the Borel measurable functions.
PROOF: By Lemmas 3.4 and 3.6, $\mathcal{F}$ contains the continuous functions
and is closed under pointwise limits. Since the collection of Borel measurable
functions, $\widehat{C}$, is the smallest collection of functions that contains
the continuous functions and is closed under pointwise limits, it must be that
$\mathcal{F} \supset \widehat{C}$.	■
LEMMA 3.7
Let $f \in \mathcal{F}$. Then there is a sequence, $\{f_n\}_{n=1}^\infty$, of
Borel measurable functions such that $f_n \to f$ pointwise.
PROOF: First of all, note that if $a, b \in \mathcal{R}$, then
$f^{-1}([a, b)) \in \mathcal{B}$ (why?). For $n \in \mathcal{N}$, let
$$E_{nk} = \Bigl\{ x : \frac{k}{n} \le f(x) < \frac{k+1}{n} \Bigr\}
        = f^{-1}\Bigl( \Bigl[ \frac{k}{n}, \frac{k+1}{n} \Bigr) \Bigr)$$
for $k = 0, \pm 1, \pm 2, \ldots$. Then $E_{nk} \in \mathcal{B}$ and so
$\chi_{E_{nk}} \in \widehat{C}$. Since $\widehat{C}$ is an algebra of functions
and is closed under pointwise limits, the function
$$f_n = \sum_{k=-\infty}^{\infty} \frac{k}{n} \chi_{E_{nk}}
      = \lim_{k\to\infty} \sum_{j=-k}^{k} \frac{j}{n} \chi_{E_{nj}}$$
is in $\widehat{C}$. It is easy to see that $|f(x) - f_n(x)| < 1/n$ for all
$x \in \mathcal{R}$. Hence $f_n \to f$ pointwise (in fact, the convergence is
uniform).	■
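On the set $E_{nk}$ the function $f_n$ takes the value $k/n$, where $k$ is the integer part of $n f(x)$, so $f_n(x) = \lfloor n f(x) \rfloor / n$. The sketch below checks the resulting error bound with exact arithmetic; the helper name `approximant` and the sample function are ours.

```python
from fractions import Fraction
from math import floor

def approximant(f, n):
    """Return f_n, which equals k/n on E_nk = {x : k/n <= f(x) < (k + 1)/n}."""
    def f_n(x):
        k = floor(n * f(x))   # the unique k with x in E_nk
        return Fraction(k, n)
    return f_n

f = lambda x: x ** 3 - 2 * x
samples = [Fraction(k, 7) for k in range(-21, 22)]
n = 25
f_n = approximant(f, n)
errors = [f(x) - f_n(x) for x in samples]   # each lies in [0, 1/n)
```

The bound does not depend on $x$, which is the uniform convergence noted at the end of the proof.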
Using the preceding results, it is now evident that $\mathcal{F}$ and the collection
of Borel measurable functions are one and the same. Specifically, we have
the following theorem.
THEOREM 3.3
A function $f$ is Borel measurable if and only if the inverse image of each
open set under $f$ is a Borel set; that is, if and only if
$f^{-1}(O) \in \mathcal{B}$ for all open sets $O$.
PROOF: By Corollary 3.1, $\mathcal{F} \supset \widehat{C}$. Conversely, suppose
that $f \in \mathcal{F}$. Then, by Lemma 3.7, $f$ is the pointwise limit of
functions in $\widehat{C}$. Since $\widehat{C}$ is closed under pointwise
limits, this implies $f \in \widehat{C}$. Thus,
$\widehat{C} \supset \mathcal{F}$.	■
We mentioned earlier that the collection of Borel sets, B, is the smallest
$\sigma$-algebra of sets that contains the open sets. Here is a proof of that result.
THEOREM 3.4
The collection of Borel sets, $\mathcal{B}$, is the smallest $\sigma$-algebra of
subsets of $\mathcal{R}$ that contains all the open sets.
PROOF: We already know that $\mathcal{B}$ is a $\sigma$-algebra that contains
all the open sets. Let $\mathcal{A}$ be any $\sigma$-algebra that contains all
the open sets. We claim that $\mathcal{B} \subset \mathcal{A}$. Let
$\mathcal{G} = \{ f : f^{-1}(O) \in \mathcal{A} \text{ for all open sets } O \}$.
The arguments used for Lemmas 3.4 and 3.6 depend only on the fact that
$\mathcal{B}$ is a $\sigma$-algebra containing the open sets. It follows that
$\mathcal{G}$ contains the continuous functions and is closed under pointwise
limits; thus, $\mathcal{G} \supset \widehat{C}$. This last fact implies that
$\chi_B \in \mathcal{G}$ for all $B \in \mathcal{B}$. But then, for each
$B \in \mathcal{B}$, $B = \chi_B^{-1}((1/2, 3/2)) \in \mathcal{A}$. Thus
$\mathcal{B} \subset \mathcal{A}$.	■
Remarks: In many texts, the collection of Borel sets, B, is defined to be
the smallest $\sigma$-algebra of sets that contains the open sets; and a function
is defined to be Borel measurable if the inverse image of each open set is a
Borel set. As we see from Theorems 3.3 and 3.4, the definitions presented
here (Definitions 3.1 and 3.2) are equivalent to those. It seems more nat-
ural, though, to introduce the Borel measurable functions in a way that is
motivated by a defect in the collection of continuous functions; namely, the
defect of not being closed under pointwise limits. Once this introduction is
accomplished, however, it may indeed be easier to think of Borel measur-
able functions via Theorem 3.3 and Borel sets via Theorem 3.4. Moreover,
those characterizations will be used as a means to define Borel sets and
Borel measurable functions in more general contexts.
Here now are several examples that illustrate Borel measurable func-
tions and Borel sets. We have left some of the justifications as exercises for
the reader.
EXAMPLE 3.1 Illustrates Borel Measurable Functions and Borel Sets
a)	Because $\mathcal{B}$ is a $\sigma$-algebra containing the open sets, it
follows that all open sets, closed sets, and intervals are Borel sets.
b)	If $B$ is a countable set, then $B \in \mathcal{B}$; in particular,
$Q \in \mathcal{B}$. From this, it also follows that the set of irrational
numbers, $\mathcal{R} \setminus Q$, is in $\mathcal{B}$.
c)	By definition, any continuous function is Borel measurable.
d)	$\chi_Q$ is Borel measurable because $Q \in \mathcal{B}$. Indeed, if
$B \in \mathcal{B}$, then $\chi_B$ is Borel measurable by the definition of
$\mathcal{B}$.
e)	If $B_1, \ldots, B_n \in \mathcal{B}$ and $a_1, \ldots, a_n \in \mathcal{R}$,
then $f = \sum_{k=1}^n a_k \chi_{B_k}$ is Borel measurable. This follows
immediately from part (d) and the fact that $\widehat{C}$ is an algebra of
functions. In particular, all step functions are Borel measurable.
f)	Every monotone function is Borel measurable, as the reader can easily
verify by applying Lemma 3.5.
g)	A real-valued function $f$ on $\mathcal{R}$ that is 0 except on a countable
set is Borel measurable. Indeed, suppose $K$ is countable and $f(x) = 0$ for
$x \notin K$. Let $\{x_n\}_n$ be an enumeration of $K$. Then
$f = \sum_n f(x_n) \chi_{\{x_n\}}$. If $K$ is finite, then $f$ is Borel
measurable by parts (a) and (e). If $K$ is infinite, then $f$ is the pointwise
limit of the Borel measurable functions $\sum_{k=1}^n f(x_k) \chi_{\{x_k\}}$,
$n \in \mathcal{N}$, and, hence, is itself Borel measurable.	□
Borel Measurable Functions and Borel Sets
on Subsets of $\mathcal{R}$
We conclude this section with a brief discussion of Borel measurable functions
and Borel sets when the underlying space is some subset $D \subset \mathcal{R}$.
The pertinent definitions and theorems are obvious modifications of those
discussed earlier.
DEFINITION 3.4 Borel Measurable Functions on D
We denote by $\widehat{C}(D)$ the smallest collection of real-valued functions
on $D$ that contains the continuous functions on $D$ and is closed under
pointwise limits. The members of $\widehat{C}(D)$ are called Borel measurable
functions on $D$.
DEFINITION 3.5 Borel Sets of D
A set $B \subset D$ is called a Borel set of $D$ if its characteristic function
is a Borel measurable function on $D$. The collection of all Borel sets
of $D$ is denoted by $\mathcal{B}(D)$. Thus,
$\mathcal{B}(D) = \{ B \subset D : \chi_B \in \widehat{C}(D) \}$.
Using arguments similar to those used earlier, we can obtain the fol-
lowing theorems:
THEOREM 3.5
A function $f$ is Borel measurable on $D$ if and only if the inverse image of
each open set under $f$ is a Borel set in $D$, that is,
$f^{-1}(O) \in \mathcal{B}(D)$ for all open sets $O$.
THEOREM 3.6
The collection of Borel sets of $D$, $\mathcal{B}(D)$, is the smallest
$\sigma$-algebra of subsets of $D$ that contains all open sets in $D$.
An interesting and useful characterization of $\mathcal{B}(D)$ is given by the
following theorem. Note the analogy with open sets in D (see Theorem 2.3
on page 62).
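THEOREM 3.7
$B \in \mathcal{B}(D)$ if and only if there is an $A \in \mathcal{B}$ such that
$B = D \cap A$. That is, the Borel sets of $D$ are precisely the Borel sets
(of $\mathcal{R}$) intersected with $D$.
PROOF: Let $\mathcal{A} = \{ D \cap A : A \in \mathcal{B} \}$. We claim that
$\mathcal{A} = \mathcal{B}(D)$. It is easy to see that $\mathcal{A}$ is a
$\sigma$-algebra of subsets of $D$ and, since $\mathcal{B}$ contains all open
sets of $\mathcal{R}$, $\mathcal{A}$ contains all open sets of $D$. Thus, by
Theorem 3.6, $\mathcal{A} \supset \mathcal{B}(D)$.
Now, let $\mathcal{C}$ be any $\sigma$-algebra of subsets of $D$ that contains
the open sets of $D$ and set
$$\mathcal{D} = \{ A \in \mathcal{B} : D \cap A \in \mathcal{C} \}.$$
Then $\mathcal{D}$ contains the open sets of $\mathcal{R}$ because
$\mathcal{C}$ contains the open sets of $D$. Also, $\mathcal{D}$ is a
$\sigma$-algebra because $\mathcal{B}$ and $\mathcal{C}$ are. Consequently,
$\mathcal{D} \supset \mathcal{B}$. But, by definition,
$\mathcal{D} \subset \mathcal{B}$. Hence, $\mathcal{D} = \mathcal{B}$. It now
follows that $\mathcal{A} \subset \mathcal{C}$ and, since $\mathcal{C}$ was an
arbitrary $\sigma$-algebra of subsets of $D$ that contains the open sets, we
conclude that $\mathcal{A} \subset \mathcal{B}(D)$. This last result and the
previous paragraph show that $\mathcal{A} = \mathcal{B}(D)$.	■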
EXERCISES 3.1
3.1 Prove parts (b) and (c) of Theorem 3.1 on page 94.
★3.2 Let $h(x) = |x|$. Prove Lemma 3.1 on page 95 by proceeding as follows:
a)	Show that there exists a sequence of polynomials that converges uniformly
to $h$ on $[-1,1]$. Hint: Consider the Taylor series expansion for
$(1 - t)^{1/2}$ on $[0,1]$.
b)	Use part (a) to conclude that for each compact subset $K$ of $\mathcal{R}$,
there exists a sequence of polynomials that converges uniformly to $h$ on $K$.
Hint: If $b > 0$, we can write $|x| = b\,|x/b|$.
c)	Use part (b) to conclude that there exists a sequence of polynomials that
converges to $h$ uniformly on each compact subset of $\mathcal{R}$.
d)	Deduce Lemma 3.1 from part (c).
3.3	Prove that every open set is a Borel set by showing that for each open
set, $O$, $\chi_O$ is a Borel measurable function. Hint: Begin by showing that
$\chi_I$ is Borel measurable for each open interval $I$.
3.4	Verify part (b) of Lemma 3.5 on page 98.
3.5	Show that $f$ is Borel measurable if and only if $f^{-1}(B) \in \mathcal{B}$
for all Borel sets $B$.
3.6	Let $D$ be a dense subset of $\mathcal{R}$. Show that $f$ is Borel
measurable if either of the following conditions holds:
a)	For each $d \in D$, $f^{-1}((-\infty, d)) \in \mathcal{B}$.
b)	For each $d \in D$, $f^{-1}((d, \infty)) \in \mathcal{B}$.
3.7	Show that all closed sets and all intervals are Borel sets.
3.8	Prove that every monotone function is Borel measurable.
3.9	Prove Theorems 3.5-3.7.
3.10	Verify (3.1) and (3.2) on page 96.
3.11	Show that any countable subset of $\mathcal{R}$ is a Borel set.
3.12	For subsets $A$ and $B$ of $\mathcal{R}$, define
$$A + B = \{ a + b : a \in A \text{ and } b \in B \}.$$
Suppose that $B$ is a Borel set. Prove that $A + B$ is a Borel set if $A$ is
a) countable. b) open.
3.13	Most functions encountered in a calculus course can be obtained from the
identity function, i(x) = x, using the standard operations of algebra (sums,
products, quotients, and the extraction of roots) together with the operation
of passing to the limit in a sequence of functions. For example,
Explain why any function obtained using the forementioned operations is a
Borel measurable function.
3.2	LEBESGUE OUTER MEASURE
In the previous section, we enlarged our basic collection of functions from
the continuous functions to the Borel measurable functions. Although both
of those collections of functions are algebras, the latter collection has the
advantage of being closed under pointwise limits.
Our next goal is to extend the Riemann integral to an integral that
applies to all Borel measurable functions. The extension is not trivial
since there are Borel measurable functions that are not Riemann integrable.
Indeed, as we learned in Theorem 2.7 on page 86, a bounded function is
Riemann integrable if and only if it is continuous except on a set of measure
zero. There are certainly Borel measurable functions that do not satisfy
this last condition (e.g., $\chi_Q$).
Referring to Section 2.6, beginning on page 81, we see that the Riemann
integral is developed by first defining the integral of a step function,
$h(x) = \sum_{k=1}^n a_k \chi_{I_k}(x)$, on $[a,b]$ to be
$$\int_a^b h(x)\,dx = \sum_{k=1}^n a_k \ell(I_k),$$
where $\ell(I)$ denotes the length of an interval $I$. Therefore, the definition
of the Riemann integral ultimately depends on the concept of length, which
applies only to intervals of real numbers.
To obtain an integral that applies to all Borel measurable functions,
we proceed by analogy with the development of the Riemann integral.
Specifically, we must first define the integral of a Borel measurable function
of the form $s(x) = \sum_{k=1}^n a_k \chi_{B_k}(x)$, where the $B_k$'s are Borel
sets. If the $B_k$'s are intervals, then $s$ is a step function and we simply
define the integral to equal the Riemann integral,
$\sum_{k=1}^n a_k \ell(B_k)$. If the $B_k$'s are not intervals, then
what? It seems that we need to generalize the concept of length so that it
applies to arbitrary Borel sets.
The Definition of Lebesgue Outer Measure
The concept of length will be extended and replaced by that of measure.
As we will see, this is by no means a simple procedure. Let us denote
the required measure by the Greek letter $\mu$ and the collection of subsets
of $\mathcal{R}$ to which it applies by the letter $\mathcal{A}$. Subsets of
$\mathcal{R}$ that belong to $\mathcal{A}$ are called measurable sets. We will
now list some properties that $\mu$ and $\mathcal{A}$ should satisfy.
Since measure is to be a generalization of length, we require that the
measure of an interval be its length; that is, $\mu(I) = \ell(I)$ for all
intervals $I$. Also, for purely mathematical reasons, we require that
$\mathcal{A}$ be a $\sigma$-algebra; and as we want all Borel sets to be
measurable, we require that $\mathcal{A} \supset \mathcal{B}$.
Now, clearly, the measure of the union of two disjoint intervals should
be the sum of their lengths (measures). More generally, then, we require
that the measure of the union of two disjoint measurable sets be the sum
of their measures. That is, if $A, B \in \mathcal{A}$ and $A \cap B = \emptyset$, then
\[ \mu(A \cup B) = \mu(A) + \mu(B). \tag{3.3} \]
3.2 Lebesgue Outer Measure □ 105
Using mathematical induction, we can show that the previous condition
implies that if $A_1, A_2, \ldots, A_n \in \mathcal{A}$ and $A_i \cap A_j = \emptyset$ for $i \ne j$, then
\[ \mu\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) = \sum_{k=1}^{n} \mu(A_k). \tag{3.4} \]
This condition on $\mu$ is called finite additivity.
For purposes of modern mathematical analysis, we need to impose a
somewhat stronger condition on our measure than finite additivity; namely,
that (3.4) hold not only for finite collections of pairwise disjoint measurable
sets but also for countably infinite collections of pairwise disjoint measur-
able sets. This condition is called countable additivity.
In summary, if $\mu$ is the required generalization of length and $\mathcal{A}$ is the
collection of subsets of $\mathcal{R}$ that have a length in this extended sense, then
the following conditions should be satisfied:
(M1) $\mu(I) = \ell(I)$, for all intervals $I$.
(M2) $\mathcal{A}$ is a $\sigma$-algebra and $\mathcal{A} \supset \mathcal{B}$.
(M3) If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
\[ \mu\Bigl(\bigcup_n A_n\Bigr) = \sum_n \mu(A_n). \]
Conditions (M1)-(M3) provide us with the means for extending the
notion of length to all open sets. First, since every open set is a Borel set,
Condition (M2) implies that every open set should be measurable. Now, let
$O$ be an open set. Then $O$ is a countable union of disjoint open intervals,
say $O = \bigcup_n I_n$. Now applying, in turn, Conditions (M3) and (M1), we
infer that
\[ \mu(O) = \mu\Bigl(\bigcup_n I_n\Bigr) = \sum_n \mu(I_n) = \sum_n \ell(I_n). \]
So, we now see how to extend the notion of length to all open sets.
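The computation above can be illustrated for an open set given as a finite union of bounded open intervals. The following is only a minimal sketch: the function name `measure_of_open_set` and the representation of an interval as a pair `(a, b)` are assumptions made for illustration, and a general open set may of course have infinitely many component intervals.

```python
def measure_of_open_set(intervals):
    """Total length of a finite union of bounded open intervals (a, b).

    Overlapping intervals are merged into disjoint components first,
    and the component lengths are then summed, as in mu(O) = sum l(I_n).
    """
    # Discard empty intervals and sort by left endpoint.
    pieces = sorted((a, b) for a, b in intervals if a < b)
    components = []
    for a, b in pieces:
        if components and a < components[-1][1]:
            # (a, b) overlaps the current component: extend it.
            components[-1][1] = max(components[-1][1], b)
        else:
            # (a, b) starts a new disjoint component.
            components.append([a, b])
    return sum(right - left for left, right in components)
```

For example, `measure_of_open_set([(0, 2), (1, 3), (5, 6)])` treats the set as the disjoint union of (0, 3) and (5, 6).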
For sets that are more complicated than open sets, however, it is not
at all obvious what to do. In fact, defining a suitable measure for subsets
of $\mathcal{R}$ constituted a major problem for mathematicians until the beginning
of the twentieth century, when Henri Lebesgue found the key. His idea was
as follows: For a subset $A \subset \mathcal{R}$, consider all open sets that contain $A$ as
a subset. Then define the measure of $A$ to be the greatest lower bound of
the measures of all those open sets:
\[ \inf\{\, \mu(O) : O \text{ open},\ O \supset A \,\}. \tag{3.5} \]
With this definition, we “close down on $A$” or “come at $A$ from the outside,”
so we call this measure of $A$ its outer measure. Outer measure is defined
for all subsets of $\mathcal{R}$. But, as we will see, it is countably additive (i.e.,
satisfies Condition (M3)) only when restricted to a proper subcollection of
subsets of $\mathcal{R}$. Consequently, we will denote outer measure not by $\mu$, but
instead by $\lambda^*$.
Below we give a formal definition of outer measure. The definition
that we present does not use (3.5) but is equivalent to it.
DEFINITION 3.6 Lebesgue Outer Measure
For each subset $A \subset \mathcal{R}$, the Lebesgue outer measure of $A$, denoted
by $\lambda^*(A)$, is defined by
\[ \lambda^*(A) = \inf\Bigl\{\, \sum_n \ell(I_n) : \{I_n\}_n \text{ open intervals},\ \bigcup_n I_n \supset A \,\Bigr\}. \]
Note: A sequence of open intervals, $\{I_n\}_n$, appearing in Definition 3.6 can
be either a finite or infinite sequence.
Basic Properties of Lebesgue Outer Measure
Some basic properties of Lebesgue outer measure are proved in the next
two propositions.
PROPOSITION 3.1
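Lebesgue outer measure, $\lambda^*$, has the following properties:
a) $\lambda^*(A) \ge 0$, for all $A \subset \mathcal{R}$. (nonnegativity)
b) $\lambda^*(\emptyset) = 0$.
c) $A \subset B \Rightarrow \lambda^*(A) \le \lambda^*(B)$. (monotonicity)
d) $\lambda^*(x + A) = \lambda^*(A)$ for $x \in \mathcal{R}$, $A \subset \mathcal{R}$, where $x + A = \{\, x + y : y \in A \,\}$.
(translation invariance)
e) If $\{A_n\}_n$ is a sequence of subsets of $\mathcal{R}$, then
\[ \lambda^*\Bigl(\bigcup_n A_n\Bigr) \le \sum_n \lambda^*(A_n). \tag{3.6} \]
In particular, if $A, B \subset \mathcal{R}$, then $\lambda^*(A \cup B) \le \lambda^*(A) + \lambda^*(B)$. The
relation in (3.6) is called countable subadditivity.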
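PROOF: For each $E \subset \mathcal{R}$, let
\[ S_E = \Bigl\{\, \sum_n \ell(I_n) : \{I_n\}_n \text{ open intervals},\ \bigcup_n I_n \supset E \,\Bigr\}. \]
Then, by definition, $\lambda^*(E) = \inf\{\, x : x \in S_E \,\}$.
a) If $A \subset \mathcal{R}$, then $S_A \subset [0, \infty]$ so that $\lambda^*(A) = \inf\{\, x : x \in S_A \,\} \ge 0$.
b) For $\epsilon > 0$, the interval $I_\epsilon = (-\epsilon/2, \epsilon/2)$ contains $\emptyset$; so, $\epsilon = \ell(I_\epsilon) \in S_\emptyset$.
Hence, $\lambda^*(\emptyset) = \inf\{\, x : x \in S_\emptyset \,\} \le \epsilon$ for all $\epsilon > 0$. This implies that
$\lambda^*(\emptyset) = 0$.
c) Let $u \in S_B$. Then there is a sequence $\{I_n\}$ of open intervals such that
$B \subset \bigcup I_n$ and $u = \sum \ell(I_n)$. But $B \subset \bigcup I_n \Rightarrow A \subset \bigcup I_n \Rightarrow u \in S_A$.
Thus, $S_B \subset S_A$ and, consequently,
\[ \lambda^*(A) = \inf\{\, x : x \in S_A \,\} \le \inf\{\, x : x \in S_B \,\} = \lambda^*(B). \]
d) The proof of this part is left to the reader as an exercise.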
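e) If $\lambda^*(A_n) = \infty$ for some $n$, then, by part (c), $\lambda^*(\bigcup A_n) = \infty$; hence,
(3.6) holds. So, assume that $\lambda^*(A_n) < \infty$ for all $n$. Let $\epsilon > 0$ be
given. For each $n$, choose a sequence $\{I_{nk}\}_k$ of open intervals such that
$\bigcup_k I_{nk} \supset A_n$ and $\sum_k \ell(I_{nk}) < \lambda^*(A_n) + \epsilon/2^n$. Now, the collection of
intervals, $\{I_{nk}\}_{n,k}$, is countable because $\mathcal{N} \times \mathcal{N}$ is countable. Therefore,
because $\bigcup_{n,k} I_{nk} = \bigcup_n \bigl(\bigcup_k I_{nk}\bigr) \supset \bigcup_n A_n$, it must be that
\[ \lambda^*\Bigl(\bigcup_n A_n\Bigr) \le \sum_{n,k} \ell(I_{nk}) = \sum_n \sum_k \ell(I_{nk})
   \le \sum_n \Bigl(\lambda^*(A_n) + \frac{\epsilon}{2^n}\Bigr)
   \le \sum_n \lambda^*(A_n) + \epsilon. \]
As $\epsilon > 0$ was arbitrary, this proves that $\lambda^*(\bigcup_n A_n) \le \sum_n \lambda^*(A_n)$.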
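The allowances $\epsilon/2^n$ used in part (e) never add up to more than $\epsilon$, which is what makes the argument work. A minimal sketch of that bookkeeping in exact rational arithmetic follows; the helper name `epsilon_budget` and the sample values are assumptions chosen for illustration.

```python
from fractions import Fraction

def epsilon_budget(eps, count):
    """Return the allowances eps / 2**n for n = 1, ..., count."""
    return [eps / 2**n for n in range(1, count + 1)]

eps = Fraction(1, 10)
shares = epsilon_budget(eps, 20)
# The first 20 allowances sum to eps * (1 - 2**-20), which is below eps.
total = sum(shares)
```

However many sets $A_n$ are handled, the accumulated slack stays strictly below `eps`.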
As we have noted, the domain of $\lambda^*$ is $\mathcal{P}(\mathcal{R})$; that is, every subset
of $\mathcal{R}$ has an outer measure. Our question now is whether $\lambda^*$ is the desired
extension of length. That is, do Conditions (M1)-(M3) hold with $\mu = \lambda^*$
and $\mathcal{A} = \mathcal{P}(\mathcal{R})$? Certainly, Condition (M2) holds; and the next proposition
shows that Condition (M1) holds also.
PROPOSITION 3.2
The outer measure of an interval is its length. That is, $\lambda^*(I) = \ell(I)$ for
every interval $I$.
PROOF: First assume $I = [a, b]$, that is, that $I$ is a bounded and closed
interval. If $\epsilon > 0$, then $(a - \epsilon/2, b + \epsilon/2) \supset [a, b]$ and so
\[ \lambda^*([a, b]) \le \ell\bigl((a - \epsilon/2, b + \epsilon/2)\bigr) = b - a + \epsilon. \]
Thus, for any $\epsilon > 0$, $\lambda^*([a, b]) \le b - a + \epsilon$ and, hence, $\lambda^*([a, b]) \le b - a$.
Consequently, it remains to establish that $\lambda^*([a, b]) \ge b - a$. Let $\{I_n\}$
be a sequence of open intervals such that $\bigcup I_n \supset [a, b]$. We claim that
$\sum \ell(I_n) > b - a$. Since $\{I_n\}$ is an open cover for $[a, b]$, the Heine-Borel
theorem implies that there is a finite subcover, say $\{I_k\}_{k=1}^{N}$. Now, clearly,
$\sum_{k=1}^{N} \ell(I_k) \le \sum_n \ell(I_n)$. So, we need only show that $\sum_{k=1}^{N} \ell(I_k) > b - a$.
As $a \in [a, b]$, there must be an interval, say $J_1 = (a_1, b_1)$, in the
collection $\{I_k\}_{k=1}^{N}$ with $a_1 < a < b_1$. If $b < b_1$, then
\[ \sum_{k=1}^{N} \ell(I_k) \ge \ell(J_1) = b_1 - a_1 > b - a. \]
Otherwise, $b_1 \in [a, b]$, so there must be an interval, say $J_2 = (a_2, b_2)$, in
the collection $\{I_k\}_{k=1}^{N}$ with $a_2 < b_1 < b_2$. Note that, necessarily, $J_2 \ne J_1$.
If $b < b_2$, then
\[ \sum_{k=1}^{N} \ell(I_k) \ge \ell(J_1) + \ell(J_2) = (b_1 - a_1) + (b_2 - a_2)
   = (b_2 - a_1) + (b_1 - a_2) > b_2 - a_1 > b - a. \]
Otherwise, $b_2 \in [a, b]$, so there must be an interval, say $J_3 = (a_3, b_3)$, in
the collection $\{I_k\}_{k=1}^{N}$ such that $a_3 < b_2 < b_3$ and, necessarily, $J_3 \ne J_2$ and
$J_3 \ne J_1$.
This process can continue at most $N$ times. Consequently, there is an
$m \in \mathcal{N}$ with $m \le N$ such that $J_i = (a_i, b_i) \in \{I_k\}_{k=1}^{N}$ for $1 \le i \le m$ and
\[ a_1 < a, \quad a_2 < b_1 < b_2, \quad \ldots, \quad a_m < b_{m-1} < b_m, \quad b < b_m. \]
Therefore,
\[ \sum_{k=1}^{N} \ell(I_k) \ge \sum_{i=1}^{m} \ell(J_i) = (b_1 - a_1) + (b_2 - a_2) + \cdots + (b_m - a_m)
   = (b_m - a_1) + (b_1 - a_2) + (b_2 - a_3) + \cdots + (b_{m-1} - a_m)
   > b_m - a_1 > b - a. \]
Thus, if $\{I_n\}$ is a sequence of open intervals with $\bigcup I_n \supset [a, b]$, then
$\sum \ell(I_n) > b - a$. But then, by definition, $\lambda^*([a, b]) \ge b - a$. This last fact
and the previously established reverse inequality show $\lambda^*([a, b]) = b - a$.
Now, let $I$ be any finite (i.e., bounded) interval. Then for each $\epsilon > 0$,
there is a closed interval $J$ with $J \subset I$ and $\ell(I) < \ell(J) + \epsilon$ (why?). Thus,
$\ell(I) - \epsilon < \ell(J) = \lambda^*(J) \le \lambda^*(I)$. Since $\epsilon > 0$ was arbitrary, it follows that
$\lambda^*(I) \ge \ell(I)$. But, on the other hand, $\lambda^*(I) \le \lambda^*(\bar{I}) = \ell(\bar{I}) = \ell(I)$, so
that $\lambda^*(I) \le \ell(I)$.
Finally, if $I$ is an infinite (i.e., unbounded) interval, then for each real
number $M$, there is a closed interval, $J$, of length $M$ with $J \subset I$. It follows
that $\lambda^*(I) \ge \lambda^*(J) = \ell(J) = M$. Hence, $\lambda^*(I) = \infty$.
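The interval-selection step in the proof for $[a, b]$ is effectively an algorithm, and it can be sketched in code. The sketch below is an illustration only: the name `select_chain` is hypothetical, and where the proof picks any interval containing the current point, this version picks the one that reaches farthest to the right, which is one admissible choice.

```python
def select_chain(cover, a, b):
    """Pick J_1, ..., J_m from a finite open cover of [a, b].

    cover is a list of open intervals (left, right). Each chosen interval
    contains the right endpoint of the previous one (or the point a at the
    start), and the last one has right endpoint beyond b.
    """
    chain = []
    point = a  # the point that the next interval must contain
    while True:
        candidates = [(l, r) for (l, r) in cover if l < point < r]
        if not candidates:
            raise ValueError("the intervals do not cover [a, b]")
        # Assumption: take the candidate reaching farthest to the right.
        chosen = max(candidates, key=lambda interval: interval[1])
        chain.append(chosen)
        if b < chosen[1]:
            return chain
        point = chosen[1]  # this endpoint still lies in [a, b]

cover = [(-1, 2), (1, 4), (3, 6), (0, 1)]
chain = select_chain(cover, 0, 5)
total_length = sum(r - l for l, r in chain)
```

Since the right endpoints strictly increase, no interval is chosen twice, so the loop runs at most as many times as there are intervals in the cover; the total length of the chain exceeds $b - a$, as the proof asserts.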
We have seen that Conditions (M1) and (M2) are satisfied with $\mu = \lambda^*$
and $\mathcal{A} = \mathcal{P}(\mathcal{R})$. So our question now is: Does Condition (M3) hold with
$\mu = \lambda^*$ and $\mathcal{A} = \mathcal{P}(\mathcal{R})$? If the answer to this question were yes, then $\lambda^*$
would be the desired extension of length and every subset of $\mathcal{R}$ would be
measurable. The answer, however, is no! In fact, as we will discover in the
next section, $\lambda^*$ is not even finitely additive.
EXERCISES 3.2
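3.14 Let $I$ be any finite interval. Show that for each $\epsilon > 0$, there is a closed
interval $J$ with $J \subset I$ and $\ell(I) < \ell(J) + \epsilon$.
3.15 Prove part (d) of Proposition 3.1. That is, show that $\lambda^*(x + A) = \lambda^*(A)$
for $x \in \mathcal{R}$, $A \subset \mathcal{R}$, where $x + A = \{\, x + y : y \in A \,\}$.
3.16 Let $A$ be a set with $\lambda^*(A) < \infty$. Show that the function, $g$, defined by
$g(x) = \lambda^*(A \cap (-\infty, x])$ is uniformly continuous on $\mathcal{R}$.
3.17 Show that the Cantor set, $P$, has Lebesgue outer measure zero.
3.18 Let $E \subset \mathcal{R}$. Show that there is a sequence of open sets, $\{O_n\}_{n=1}^{\infty}$, such that
$O_1 \supset O_2 \supset \cdots \supset E$ and
\[ \lambda^*(E) = \lambda^*\Bigl(\bigcap_{n=1}^{\infty} O_n\Bigr) = \lim_{n \to \infty} \lambda^*(O_n). \]
3.19 For $A \subset \mathcal{R}$ and $b \in \mathcal{R}$, define $bA = \{\, ba : a \in A \,\}$. Show that
$\lambda^*(bA) = |b|\lambda^*(A)$.
3.20 Suppose that $f\colon \mathcal{R} \to \mathcal{R}$ is differentiable at each point of $\mathcal{R}$.
a) If $|f'(x)| \le 1$ for each $x \in \mathcal{R}$, prove that for each $A \subset \mathcal{R}$,
\[ \lambda^*(f(A)) \le \lambda^*(A). \]
Hint: Use the mean-value theorem.
b) Provide an example to show that the previous inequality may fail to hold
if $|f'(x)| > 1$ for some $x \in \mathcal{R}$.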
3.3 FURTHER PROPERTIES OF LEBESGUE
OUTER MEASURE
Recall that we are trying to extend the notion of length so that it applies
to all Borel sets. Specifically, we are searching for a function $\mu$ defined on
some collection, $\mathcal{A}$, of subsets of $\mathcal{R}$ such that
(M1) $\mu(I) = \ell(I)$, for all intervals $I$.
(M2) $\mathcal{A}$ is a $\sigma$-algebra and $\mathcal{A} \supset \mathcal{B}$.
(M3) If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
\[ \mu\Bigl(\bigcup_n A_n\Bigr) = \sum_n \mu(A_n). \]
In Section 3.2, we discovered that Conditions (M1) and (M2) are satisfied
with $\mu = \lambda^*$ (Lebesgue outer measure) and $\mathcal{A} = \mathcal{P}(\mathcal{R})$. We will
prove in this section that Condition (M3) does not hold with $\mu = \lambda^*$ and
$\mathcal{A} = \mathcal{P}(\mathcal{R})$.
In fact, we will show that even finite additivity does not obtain. That
is, it is possible to find disjoint subsets, $A$ and $B$, of $\mathcal{R}$ such that the
equation
\[ \lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B) \tag{3.7} \]
fails to hold. The idea is to choose $A$ and $B$ so that they are disjoint but
“sufficiently intermingled.”
Finite Additivity Properties of A*
It is best to begin by determining conditions on disjoint sets, $A$ and $B$,
that imply that (3.7) holds. Our first result is that if $A$ and $B$ are not only
disjoint but are a positive distance apart, then (3.7) is true. Before proving
that fact, we need some preliminary definitions and lemmas.
DEFINITION 3.7 Distance Between a Point and a Set or Two Sets
If $x \in \mathcal{R}$ and $E \subset \mathcal{R}$, then the distance from $x$ to $E$, denoted by
$d(x, E)$, is defined to be
\[ d(x, E) = \inf\{\, |y - x| : y \in E \,\}. \]
If $E$ and $F$ are subsets of $\mathcal{R}$, then the distance from $E$ to $F$, denoted
by $d(E, F)$, is defined to be
\[ d(E, F) = \inf\{\, |y - x| : y \in E,\ x \in F \,\}. \]
It is left as an exercise for the reader to show that (1) for fixed $E \subset \mathcal{R}$,
the function $d(x, E)$ is continuous, (2) $d(E, F) = \inf\{\, d(y, F) : y \in E \,\}$,
and (3) if $A \subset E$ and $B \subset F$, then $d(E, F) \le d(A, B)$.
LEMMA 3.8
Suppose that $I$ is a finite open interval and let $\delta > 0$ be given. Then there
are a finite number of open intervals, say $J_1, \ldots, J_n$, such that $\ell(J_k) < \delta$,
$1 \le k \le n$, $I \subset \bigcup_{k=1}^{n} J_k$, and $\sum_{k=1}^{n} \ell(J_k) < \ell(I) + \delta$.
PROOF: The proof is left as an exercise.
LEMMA 3.9
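Suppose that $A$ is a subset of $\mathcal{R}$ with $\lambda^*(A) < \infty$. Then for each $\epsilon, \delta > 0$,
there is a sequence $\{I_n\}$ of open intervals such that $\ell(I_n) < \delta$ for all $n$,
$\bigcup I_n \supset A$, and $\sum \ell(I_n) < \lambda^*(A) + \epsilon$.
PROOF: Given $\epsilon > 0$, there is a sequence $\{J_n\}$ of open intervals such
that $\bigcup J_n \supset A$ and $\sum \ell(J_n) < \lambda^*(A) + \epsilon/2$. By Lemma 3.8, for each $J_n$,
there are a finite number of open intervals, say $J_{n1}, J_{n2}, \ldots, J_{nk_n}$, with
$\ell(J_{nj}) < \delta$, $1 \le j \le k_n$, $\bigcup_{j=1}^{k_n} J_{nj} \supset J_n$, and $\sum_{j=1}^{k_n} \ell(J_{nj}) < \ell(J_n) + \epsilon/2^{n+1}$.
Now, the collection,
\[ \bigcup_n \{J_{nj}\}_{j=1}^{k_n} = \{J_{11}, J_{12}, \ldots, J_{1k_1}, J_{21}, J_{22}, \ldots, J_{2k_2}, \ldots\}, \]
is countable, being a countable union of finite collections. We have that
$\ell(J_{nj}) < \delta$, for each $n$ and $j$, and
\[ \sum_{n,j} \ell(J_{nj}) = \sum_n \sum_{j=1}^{k_n} \ell(J_{nj}) \le \sum_n \Bigl(\ell(J_n) + \frac{\epsilon}{2^{n+1}}\Bigr)
   \le \sum_n \ell(J_n) + \frac{\epsilon}{2} < \lambda^*(A) + \epsilon. \]
Also, $\bigcup_{n,j} J_{nj} = \bigcup_n \bigl(\bigcup_{j=1}^{k_n} J_{nj}\bigr) \supset \bigcup_n J_n \supset A$. If we reindex the collection,
$\{J_{11}, J_{12}, \ldots, J_{1k_1}, J_{21}, J_{22}, \ldots, J_{2k_2}, \ldots\}$, using a single subscript and
obtain $\{I_n\}_n$, then this sequence satisfies the conclusions of the lemma.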
THEOREM 3.8
Suppose that $A$ and $B$ are subsets of $\mathcal{R}$ that are a positive distance apart;
that is, $d(A, B) > 0$. Then
\[ \lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B). \]
PROOF: Let $\delta = d(A, B)$. If $\lambda^*(A \cup B) = \infty$, then it follows from
Proposition 3.1(e) that the conclusion of the theorem holds. So, assume that
$\lambda^*(A \cup B) < \infty$. Let $\epsilon > 0$ be given. By Lemma 3.9, there is a sequence
$\{I_n\}$ of open intervals such that $\ell(I_n) < \delta$ for all $n$, $\bigcup I_n \supset A \cup B$, and
\[ \sum \ell(I_n) < \lambda^*(A \cup B) + \epsilon. \]
Now, let $\{J_n\}$ denote the members of $\{I_n\}$ that contain a point of
$A$ and let $\{K_n\}$ denote the ones that do not contain a point of $A$. Since
$A \subset A \cup B \subset \bigcup I_n$, it follows that $A \subset \bigcup J_n$. Also, because $d(A, B) = \delta$
and $\ell(I_n) < \delta$ for all $n$, there can be no points of $B$ in any $J_n$. Therefore,
because $B \subset A \cup B \subset \bigcup I_n$, it must be that $B \subset \bigcup K_n$.
Using the definition of outer measure, we conclude that
\[ \lambda^*(A) + \lambda^*(B) \le \sum \ell(J_n) + \sum \ell(K_n) = \sum \ell(I_n) < \lambda^*(A \cup B) + \epsilon. \]
Because $\epsilon > 0$ was arbitrary, $\lambda^*(A) + \lambda^*(B) \le \lambda^*(A \cup B)$. The reverse
inequality is true by Proposition 3.1(e).
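The splitting of the cover into the $J_n$ and the $K_n$ can be tried out on a small example. The following is a minimal sketch with finite point sets standing in for $A$ and $B$; the function names and the sample data are assumptions made for illustration.

```python
from fractions import Fraction

def split_cover(cover, A):
    """Split open intervals into those meeting A (the J_n) and the rest (the K_n)."""
    meets_A = [iv for iv in cover if any(iv[0] < x < iv[1] for x in A)]
    misses_A = [iv for iv in cover if iv not in meets_A]
    return meets_A, misses_A

def contains(intervals, x):
    """True if x lies in at least one of the open intervals."""
    return any(left < x < right for left, right in intervals)

A = [Fraction(0), Fraction(1, 4)]
B = [Fraction(2), Fraction(9, 4)]
delta = min(abs(a - b) for a in A for b in B)   # d(A, B) = 7/4
cover = [(Fraction(-1, 2), Fraction(1, 2)),     # every length is 1 < delta
         (Fraction(1, 2), Fraction(3, 2)),
         (Fraction(3, 2), Fraction(5, 2))]
J, K = split_cover(cover, A)
```

Because every interval is shorter than `delta`, no interval that meets `A` can reach a point of `B`, so the points of `B` are covered by the intervals in `K` alone.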
We can, in fact, improve on Theorem 3.8. Indeed, suppose that $A$
and $B$ are two subsets of $\mathcal{R}$ with the property that there is an open set, $O$,
with $A \subset O$ and $B \subset O^c$. Then the conclusion of Theorem 3.8 obtains.
Roughly speaking, the reason is as follows. Since $O$ is open, it can
be written as a countable union of disjoint open intervals. Because the
points of $A$ must lie within these open intervals and the points of $B$ must
lie outside of them, the sets $A$ and $B$ cannot be too intermingled. Before
we can provide a rigorous proof of the improvement of Theorem 3.8, we
need two more lemmas.
LEMMA 3.10
Let $O$ be a proper open subset of $\mathcal{R}$ (i.e., $O$ is open, nonempty, and not
equal to $\mathcal{R}$). For each $n \in \mathcal{N}$, let
\[ O_n = \Bigl\{\, x : d(x, O^c) > \frac{1}{n} \,\Bigr\}. \]
Then,
a) $O_n$ is open and $O_n \subset O$ for all $n \in \mathcal{N}$.
b) $O_1 \subset O_2 \subset \cdots$ and $\bigcup_n O_n = O$.
c) If $O_n \ne \emptyset$, then $d(O_n, O^c) = 1/n$.
d) If $O_n \ne \emptyset$, then $d(O_n, O_{n+1}^c) = 1/n(n+1)$.
PROOF: The proofs of parts (a) and (b) are left to the reader.
c) Since $d(O_n, O^c) = \inf\{\, d(x, O^c) : x \in O_n \,\}$, we see that $d(O_n, O^c) \ge 1/n$.
To prove the reverse inequality, we first note that because $O$ is open, it
can be expressed as a countable union of disjoint open intervals, say the
intervals $\{I_j\}_j$.
Now, by assumption, $O_n \ne \emptyset$. This means that there is a $y \in O$
such that $d(y, O^c) > 1/n$. Since $y \in O$, there is a $k$ such that $y \in I_k$.
Clearly, the distance from $y$ to $O^c$ is the same as the distance from $y$
to the nearest endpoint of $I_k$. Therefore, if we write $I_k = (a_k, b_k)$, then
$y \in (a_k + 1/n, b_k - 1/n)$. It follows that $\emptyset \ne (a_k + 1/n, b_k - 1/n) \subset O_n$.
Note that at least one of the two numbers, $a_k$ and $b_k$, must be finite.
We will assume that $a_k$ is finite. (If $a_k$ is infinite, a similar argument
holds.)
Since $(a_k + 1/n, b_k - 1/n) \subset O_n$ and $a_k \in O^c$, we conclude by
applying Exercise 3.21(c) that
\[ d(O_n, O^c) \le d\Bigl(\Bigl(a_k + \frac{1}{n},\ b_k - \frac{1}{n}\Bigr), \{a_k\}\Bigr) = \frac{1}{n}. \]
This completes the proof of part (c).
d) We first show that $d(O_n, O_{n+1}^c) \ge 1/n(n+1)$. Suppose $y \in O_n$ and
$z \in O_{n+1}^c$. By definition, $d(y, O^c) > 1/n$ and $d(z, O^c) \le 1/(n+1)$. Let
$\epsilon > 0$ be given. Then there is a $w \in O^c$ with $|w - z| < 1/(n+1) + \epsilon$.
Also, $w \in O^c$ implies that $|w - y| > 1/n$. Thus,
\[ |z - y| \ge |w - y| - |w - z| > \frac{1}{n} - \frac{1}{n+1} - \epsilon = \frac{1}{n(n+1)} - \epsilon. \]
As $z$ and $y$ do not depend on $\epsilon$, we conclude that $|z - y| \ge 1/n(n+1)$.
Consequently, because $y \in O_n$, $z \in O_{n+1}^c$ were arbitrary, it follows that
$d(O_n, O_{n+1}^c) \ge 1/n(n+1)$.
To prove the reverse inequality, let $I_k = (a_k, b_k)$ be as in the proof
of part (c) and assume as before that $a_k$ is finite. Because $a_k \in O^c$,
$a_k + 1/(n+1) \in O_{n+1}^c$. Therefore, by Exercise 3.21(c),
\[ d(O_n, O_{n+1}^c) \le d\Bigl(\Bigl(a_k + \frac{1}{n},\ b_k - \frac{1}{n}\Bigr), \Bigl\{a_k + \frac{1}{n+1}\Bigr\}\Bigr) = \frac{1}{n(n+1)}. \]
This completes the proof of part (d).
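Part (d) can be checked by hand in the simplest case $O = (0, 1)$, where $O_n = (1/n, 1 - 1/n)$. The sketch below does this in exact arithmetic; the choice of $O$ and the helper names are assumptions made for illustration only.

```python
from fractions import Fraction

def inner_set(n):
    """O_n for O = (0, 1): the interval (1/n, 1 - 1/n), or None if empty."""
    left, right = Fraction(1, n), 1 - Fraction(1, n)
    return (left, right) if left < right else None

def gap_to_next_complement(n):
    """Distance from O_n to the complement of O_{n+1}, for O = (0, 1)."""
    O_n = inner_set(n)
    if O_n is None:
        return None
    # The complement of O_{n+1} is (-inf, 1/(n+1)] together with [1 - 1/(n+1), inf).
    left_gap = O_n[0] - Fraction(1, n + 1)
    right_gap = (1 - Fraction(1, n + 1)) - O_n[1]
    return min(left_gap, right_gap)
```

For every $n \ge 3$ (so that $O_n$ is nonempty), the computed gap equals $1/n(n+1)$, in agreement with the lemma.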
LEMMA 3.11
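Suppose that $A \subset \mathcal{R}$ and $\lambda^*(A) < \infty$. Assume there is a proper open
subset $O$ of $\mathcal{R}$ with $A \subset O$. Let $O_n = \{\, x : d(x, O^c) > 1/n \,\}$. Then
\[ \lambda^*(A) = \lim_{n \to \infty} \lambda^*(A \cap O_n). \]
PROOF: Let $A_n = A \cap O_n$. Then, by Lemma 3.10(b), $A_1 \subset A_2 \subset \cdots$
and, consequently, $\lambda^*(A_1) \le \lambda^*(A_2) \le \cdots$. Also, since $A_n \subset A$ for all $n$,
$\lambda^*(A_n) \le \lambda^*(A)$ for all $n$. By assumption, $\lambda^*(A) < \infty$. Thus, $\{\lambda^*(A_n)\}_n$ is
a monotone nondecreasing, bounded sequence of real numbers; and, hence,
converges to a real number, say $\alpha$. Clearly, $\alpha \le \lambda^*(A)$.
Now, let $B_n = A \setminus A_n$ and $C_n = A_{n+1} \setminus A_n$. Then $A = A_n \cup B_n$ and
$B_n = C_n \cup C_{n+1} \cup \cdots$. Thus,
\[ \lambda^*(A) \le \lambda^*(A_n) + \lambda^*(B_n) \tag{3.8} \]
and
\[ \lambda^*(B_n) \le \lambda^*(C_n) + \lambda^*(C_{n+1}) + \cdots. \tag{3.9} \]
Now, for $n \ge 2$, $A_{n+1} = A_n \cup C_n \supset A_{n-1} \cup C_n$, so that
\[ \lambda^*(A_{n-1} \cup C_n) \le \lambda^*(A_{n+1}). \tag{3.10} \]
Also, $A_{n-1} \subset O_{n-1}$ and $C_n \subset O_n^c$. So, by Lemma 3.10(d), $d(A_{n-1}, C_n) \ge
d(O_{n-1}, O_n^c) = 1/n(n-1) > 0$. Therefore, Theorem 3.8 implies that
\[ \lambda^*(A_{n-1} \cup C_n) = \lambda^*(A_{n-1}) + \lambda^*(C_n). \]
Using (3.10) and this last equation, we conclude that, for $n \ge 2$,
\[ \lambda^*(C_n) \le \lambda^*(A_{n+1}) - \lambda^*(A_{n-1}). \]
Then (3.9) implies
\[ \lambda^*(B_n) \le \sum_{k=1}^{\infty} [\lambda^*(A_{n+k}) - \lambda^*(A_{n+k-2})] \]
so that by (3.8)
\[ \lambda^*(A) \le \lambda^*(A_n) + \lambda^*(B_n) \le \lambda^*(A_n) + \sum_{k=1}^{\infty} [\lambda^*(A_{n+k}) - \lambda^*(A_{n+k-2})]. \]
But,
\[ \lambda^*(A_n) + \sum_{k=1}^{\infty} [\lambda^*(A_{n+k}) - \lambda^*(A_{n+k-2})]
   = \lim_{m \to \infty} \Bigl\{ \lambda^*(A_n) + \sum_{k=1}^{m} [\lambda^*(A_{n+k}) - \lambda^*(A_{n+k-2})] \Bigr\} \]
\[ = \lim_{m \to \infty} \{ -\lambda^*(A_{n-1}) + \lambda^*(A_{n+m-1}) + \lambda^*(A_{n+m}) \}
   = -\lambda^*(A_{n-1}) + 2\alpha. \]
Hence, we have shown that, for all $n \ge 2$, $\lambda^*(A) \le -\lambda^*(A_{n-1}) + 2\alpha$. Letting
$n \to \infty$ yields $\lambda^*(A) \le \alpha$. We have already noted that $\alpha \le \lambda^*(A)$. Thus,
$\lambda^*(A) = \alpha = \lim_{n \to \infty} \lambda^*(A_n)$.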
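The telescoping step near the end of the proof is easy to get wrong by one index, so it may help to check it numerically. In the sketch below, the sequence `a(j)` is a hypothetical stand-in for $\lambda^*(A_j)$ (any nondecreasing bounded sequence would do); it is not derived from an actual set $A$.

```python
from fractions import Fraction

def a(j):
    """Hypothetical nondecreasing bounded sequence standing in for lambda*(A_j)."""
    return 1 - Fraction(1, j)

def partial_sum(n, m):
    """a_n plus the first m terms of the series, exactly as written in the proof."""
    return a(n) + sum(a(n + k) - a(n + k - 2) for k in range(1, m + 1))

def closed_form(n, m):
    """The telescoped expression -a_{n-1} + a_{n+m-1} + a_{n+m}."""
    return -a(n - 1) + a(n + m - 1) + a(n + m)
```

Here the limit of `a(j)` is 1, so for fixed `n` the expression `closed_form(n, m)` approaches `-a(n - 1) + 2` as `m` grows, matching the term $-\lambda^*(A_{n-1}) + 2\alpha$ in the proof.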
THEOREM 3.9
Suppose that $A$ and $B$ are subsets of $\mathcal{R}$ with the property that there is an
open set $O$ with $A \subset O$ and $B \subset O^c$. Then
\[ \lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B). \]
PROOF: If either $\lambda^*(A)$ or $\lambda^*(B)$ is infinite, then the result is trivial. So,
assume that both are finite. If $O = \emptyset$, then $A = \emptyset$; and if $O = \mathcal{R}$, then
$B = \emptyset$. In either of those cases, the result is also trivial.
Hence, we can assume that $O$ is a proper open subset of $\mathcal{R}$. As before,
let $O_n = \{\, x : d(x, O^c) > 1/n \,\}$ and $A_n = A \cap O_n$. Because $A_n \subset O_n$ and
$B \subset O^c$, Lemma 3.10(c) implies that $d(A_n, B) \ge d(O_n, O^c) = 1/n$ and,
thus, by Theorem 3.8, $\lambda^*(A_n \cup B) = \lambda^*(A_n) + \lambda^*(B)$. Since $A_n \cup B \subset A \cup B$,
$\lambda^*(A_n \cup B) \le \lambda^*(A \cup B)$. Thus, for each $n \in \mathcal{N}$,
\[ \lambda^*(A \cup B) \ge \lambda^*(A_n \cup B) = \lambda^*(A_n) + \lambda^*(B). \]
Letting $n \to \infty$ and applying Lemma 3.11, we get that
\[ \lambda^*(A \cup B) \ge \lambda^*(A) + \lambda^*(B). \]
Proposition 3.1(e) shows that the reverse inequality holds.
Lebesgue Outer Measure Is Not Finitely Additive
We have now seen that, under certain conditions,
\[ \lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B) \tag{3.11} \]
for disjoint subsets, $A$ and $B$, of $\mathcal{R}$. Our next theorem, which we will state
and prove shortly, shows that (3.11) does not hold for every pair of disjoint
subsets, $A$ and $B$, of $\mathcal{R}$, that is, that $\lambda^*$ is not finitely additive.
In view of Theorem 3.9, it is clear that if (3.11) fails to hold for disjoint
subsets, $A$ and $B$, then those sets must be considerably intermingled. To
obtain this intermingling, we proceed as follows.
LEMMA 3.12
For $x, y \in \mathcal{R}$, define $x \sim y$ if and only if $y - x \in Q$. Then $\sim$ is an
equivalence relation and, hence, partitions $\mathcal{R}$ into disjoint equivalence classes.
Furthermore, there is a set $S \subset [0, 1)$ that contains exactly one element
from each equivalence class.
PROOF: That $\sim$ is an equivalence relation is left as an exercise for the
reader. By the axiom of choice (see page 16), we can select a set $T \subset \mathcal{R}$
that contains exactly one element from each equivalence class. Let us set
$S = \{\, x - [x] : x \in T \,\}$ where $[x]$ denotes the greatest integer in $x$. Because
for each $x$, $x - [x] \in [0, 1)$ and $x - [x] \sim x$, the proof is complete.
LEMMA 3.13
Let $S$ be the set defined in Lemma 3.12 and $W = (-1, 1) \cap Q$. Then
a) $\{S + r\}_{r \in Q}$ forms a collection of pairwise disjoint sets.
b) $(0, 1) \subset \bigcup_{r \in W} (S + r) \subset (-1, 2)$.
PROOF:
a) Suppose $q, r \in Q$ and $(S + q) \cap (S + r) \ne \emptyset$. Let $y \in (S + q) \cap (S + r)$.
Then there exist $u, v \in S$ such that $y = u + q = v + r$. Hence, $u \sim v$.
Since $S$ contains only one element from each equivalence class, we must
have $u = v$, which, in turn, implies $q = r$.
b) Let $x \in (0, 1)$. Then there is a $u \in S$ such that $x \sim u$. Put $r = x - u$.
Then $r \in Q$ and $x \in S + r$. Moreover, since $x \in (0, 1)$ and $S \subset [0, 1)$,
$-1 < r < 1$. Thus, $(0, 1) \subset \bigcup_{r \in W} (S + r)$. That $\bigcup_{r \in W} (S + r) \subset (-1, 2)$
follows immediately from the fact that $S \subset [0, 1)$.
Note: Lemma 3.13(a) shows that the sets, $\{S + r\}_{r \in Q}$, are pairwise disjoint.
They are also considerably intermingled as is shown in Exercise 3.27.
THEOREM 3.10
Lebesgue outer measure, $\lambda^*$, is not finitely additive.
PROOF: Suppose to the contrary that $\lambda^*$ is finitely additive. Let $\{q_n\}_{n=1}^{\infty}$
be an enumeration of the rationals in $(-1, 1)$ and set $E_n = S + q_n$. Using
the assumed finite additivity of $\lambda^*$, Proposition 3.2, Lemma 3.13, and
Proposition 3.1(c) and (e), we conclude that
\[ 1 = \lambda^*((0, 1)) \le \lambda^*\Bigl(\bigcup_{n=1}^{\infty} E_n\Bigr) \le \sum_{n=1}^{\infty} \lambda^*(E_n)
   = \lim_{n \to \infty} \sum_{k=1}^{n} \lambda^*(E_k) = \lim_{n \to \infty} \lambda^*\Bigl(\bigcup_{k=1}^{n} E_k\Bigr) \le \lambda^*((-1, 2)) = 3. \tag{3.12} \]
This shows $1 \le \lim_{n \to \infty} \sum_{k=1}^{n} \lambda^*(E_k) \le 3$. But, by Proposition 3.1(d),
\[ \sum_{k=1}^{n} \lambda^*(E_k) = \sum_{k=1}^{n} \lambda^*(S + q_k) = \sum_{k=1}^{n} \lambda^*(S) = n\lambda^*(S). \]
Consequently, $1 \le \lim_{n \to \infty} n\lambda^*(S) \le 3$, which is impossible (why?). Hence,
$\lambda^*$ is not finitely additive.
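One way to see why the final inequality is impossible: if $c = \lambda^*(S)$ were zero, then $nc = 0$ for all $n$ and the limit could not be at least 1; if $c > 0$, then $nc$ eventually exceeds 3. The sketch below locates the first such $n$ for a given positive rational $c$; the function name and the sample values are assumptions made for illustration.

```python
from fractions import Fraction
import math

def index_where_bound_fails(c, upper=3):
    """For c > 0, the least n with n * c > upper; None when c == 0."""
    if c == 0:
        return None  # n * c stays at 0, so the limit cannot be >= 1
    return math.floor(Fraction(upper) / c) + 1

n_small = index_where_bound_fails(Fraction(1, 1000))
n_other = index_where_bound_fails(Fraction(2, 7))
```

So no value of $c$ in $[0, \infty]$ is compatible with $1 \le \lim n c \le 3$.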
COROLLARY 3.2
Lebesgue outer measure, $\lambda^*$, is not countably additive. That is,
Condition (M3) does not hold with $\mu = \lambda^*$ and $\mathcal{A} = \mathcal{P}(\mathcal{R})$.
EXERCISES 3.3
3.21 Prove the following facts:
a) For fixed $E \subset \mathcal{R}$, the function $d(x, E)$ is continuous.
b) If $E$ and $F$ are subsets of $\mathcal{R}$, then $d(E, F) = \inf\{\, d(y, F) : y \in E \,\}$.
c) If $A \subset E$ and $B \subset F$, then $d(E, F) \le d(A, B)$.
d) $d(\bar{E}, \bar{F}) = d(E, F)$.
3.22 Prove the following facts:
a) Suppose that $F$ is a closed subset of $\mathcal{R}$, $K$ is a compact subset of $\mathcal{R}$, and
$F \cap K = \emptyset$. Then $d(F, K) > 0$.
b) Show that part (a) is not true if it is assumed only that $K$ is closed.
3.23 Prove Lemma 3.8 on page 111.
3.24 Verify parts (a) and (b) of Lemma 3.10 on page 113.
3.25 Suppose that $O$ is open. Prove that
\[ \lambda^*(W) = \lambda^*(W \cap O) + \lambda^*(W \cap O^c) \]
for all subsets $W$ of $\mathcal{R}$.
3.26 Define $x \sim y$ if and only if $x - y \in Q$. Show that $\sim$ is an equivalence
relation.
3.27 Let $N$ be a positive integer, $\{r_n\}_{n=1}^{\infty}$ an enumeration of $Q$, and $S$ as in
Lemma 3.12. For each $n \in \mathcal{N}$, define $S_n = S + r_n$. Prove that there is no
open set $O$ with the property that $\bigcup_{n=1}^{N} S_n \subset O$ and $\bigcup_{n=N+1}^{\infty} S_n \subset O^c$.
3.28 Suppose that $0 < a < b < 1$. Prove that it is possible to select the elements
of the set, $S$, in Lemma 3.12 so that $S \subset (a, b)$.
3.29 Provide a detailed justification for each step in (3.12) on page 117.
3.4	LEBESGUE MEASURE
For ease in reference, we repeat once more that we are searching for a
function $\mu$ defined on some collection, $\mathcal{A}$, of subsets of $\mathcal{R}$ such that
(M1) $\mu(I) = \ell(I)$, for all intervals $I$.
(M2) $\mathcal{A}$ is a $\sigma$-algebra and $\mathcal{A} \supset \mathcal{B}$.
(M3) If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
\[ \mu\Bigl(\bigcup_n A_n\Bigr) = \sum_n \mu(A_n). \]
We have seen that Conditions (M1) and (M2) hold with $\mu = \lambda^*$ and
$\mathcal{A} = \mathcal{P}(\mathcal{R})$, but that Condition (M3) does not (Corollary 3.2 on page 117).
Note, however, that we do not need to have our measure, $\mu$, defined for
all subsets of $\mathcal{R}$; Condition (M2) requires only that it be defined on a
$\sigma$-algebra, $\mathcal{A}$, of subsets of $\mathcal{R}$ that contains the Borel sets.
Thus, one way to get Condition (M3) to hold might be to restrict $\lambda^*$
to some proper subcollection of subsets of $\mathcal{R}$; that is, select $\mathcal{A}$ to be a
proper subset of $\mathcal{P}(\mathcal{R})$. And, to do that, we need to identify a criterion
for deciding whether a subset of $\mathcal{R}$ is measurable, that is, is a member
of $\mathcal{A}$. By Condition (M2), we must have $\mathcal{B} \subset \mathcal{A}$; so, in particular, $\mathcal{A}$ must
contain all open sets. Hence, the criterion we select must be satisfied by
all open sets.
The Caratheodory Criterion
Theorem 3.9 states that if $A$ and $B$ are subsets of $\mathcal{R}$ with the property that
there is an open set, $O$, with $A \subset O$ and $B \subset O^c$, then
\[ \lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B). \]
As a consequence of Theorem 3.9, we obtain the following proposition.
PROPOSITION 3.3
Let $O$ be an open set. Then
\[ \lambda^*(W) = \lambda^*(W \cap O) + \lambda^*(W \cap O^c) \tag{3.13} \]
for every subset $W$ of $\mathcal{R}$.
PROOF: For every subset $W$ of $\mathcal{R}$, we have $W = (W \cap O) \cup (W \cap O^c)$. Since
$W \cap O \subset O$ and $W \cap O^c \subset O^c$, we see that (3.13) is a simple consequence
of Theorem 3.9.
Equation (3.13) provides an additivity relation for Lebesgue outer measure
that is satisfied by all open sets. That relation shows the way to the
required criterion for deciding whether a subset of $\mathcal{R}$ is measurable.
DEFINITION 3.8 Caratheodory Criterion
A set $E \subset \mathcal{R}$ is said to satisfy the Caratheodory criterion if
\[ \lambda^*(W) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c) \tag{3.14} \]
for all subsets $W$ of $\mathcal{R}$. We denote by $\mathcal{M}$ the collection of all subsets
of $\mathcal{R}$ that satisfy the Caratheodory criterion.
Note: By Proposition 3.1(e), the inequality
\[ \lambda^*(W) \le \lambda^*(W \cap E) + \lambda^*(W \cap E^c) \]
always holds. Consequently, to prove that a subset $E$ of $\mathcal{R}$ is a member
of $\mathcal{M}$, it suffices to establish the inequality
\[ \lambda^*(W) \ge \lambda^*(W \cap E) + \lambda^*(W \cap E^c) \tag{3.15} \]
for all subsets $W$ of $\mathcal{R}$.
The next theorem demonstrates that Condition (M2) holds for the
collection, $\mathcal{M}$, of subsets of $\mathcal{R}$ that satisfy the Caratheodory criterion.
THEOREM 3.11
$\mathcal{M}$ is a $\sigma$-algebra and $\mathcal{M} \supset \mathcal{B}$.
PROOF: That $\mathcal{M}$ is closed under complementation is clear. First we prove
that $\mathcal{M}$ is closed under finite unions. So, assume $A, B \in \mathcal{M}$. We claim
that $A \cup B \in \mathcal{M}$. Let $W \subset \mathcal{R}$. Then, we must show that
\[ \lambda^*(W) \ge \lambda^*(W \cap (A \cup B)) + \lambda^*(W \cap (A \cup B)^c). \tag{3.16} \]
(See the note following Definition 3.8.)
Now, we can write $W \cap (A \cup B) = (W \cap A) \cup (W \cap A^c \cap B)$ and,
consequently, by the subadditivity of $\lambda^*$,
\[ \lambda^*(W \cap (A \cup B)) \le \lambda^*(W \cap A) + \lambda^*(W \cap A^c \cap B). \]
Therefore,
\[ \lambda^*(W \cap (A \cup B)) + \lambda^*(W \cap (A \cup B)^c)
   \le \lambda^*(W \cap A) + \lambda^*(W \cap A^c \cap B) + \lambda^*(W \cap (A \cup B)^c) \]
\[ = \lambda^*(W \cap A) + \bigl[\lambda^*((W \cap A^c) \cap B) + \lambda^*((W \cap A^c) \cap B^c)\bigr]. \]
Because $B \in \mathcal{M}$, the quantity between the square brackets in the previous
expression equals $\lambda^*(W \cap A^c)$. Thus,
\[ \lambda^*(W \cap (A \cup B)) + \lambda^*(W \cap (A \cup B)^c) \le \lambda^*(W \cap A) + \lambda^*(W \cap A^c). \]
This last sum equals $\lambda^*(W)$ because $A \in \mathcal{M}$. Hence, (3.16) holds. We have
now established that $\mathcal{M}$ is an algebra of sets.
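Next, we show that $\mathcal{M}$ is closed under countable unions. So, assume
$\{E_n\}_{n=1}^{\infty} \subset \mathcal{M}$. We must prove that $\bigcup_{n=1}^{\infty} E_n \in \mathcal{M}$. To begin,
we disjointize the sets $E_n$, $n = 1, 2, \ldots$. Let $A_1 = E_1$, $A_2 = E_2 \setminus E_1$,
$A_3 = E_3 \setminus (E_1 \cup E_2)$, and, in general, $A_n = E_n \setminus \bigl(\bigcup_{k=1}^{n-1} E_k\bigr)$. Then, see
Exercise 3.30, $A_i \cap A_j = \emptyset$, for $i \ne j$, and $\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} E_n$. Moreover,
because $\mathcal{M}$ is an algebra of sets and $E_n \in \mathcal{M}$ for $n \in \mathcal{N}$, it follows that
$A_n \in \mathcal{M}$ for $n \in \mathcal{N}$.
Now, let $W$ be any subset of $\mathcal{R}$ and set $E = \bigcup_{n=1}^{\infty} E_n = \bigcup_{n=1}^{\infty} A_n$. We
must show that $\lambda^*(W) \ge \lambda^*(W \cap E) + \lambda^*(W \cap E^c)$. By the subadditivity
of $\lambda^*$,
\[ \lambda^*(W \cap E) = \lambda^*\Bigl(\bigcup_{n=1}^{\infty} (W \cap A_n)\Bigr) \le \sum_{n=1}^{\infty} \lambda^*(W \cap A_n). \tag{3.17} \]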
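For each $n \in \mathcal{N}$, set $B_n = \bigcup_{k=1}^{n} A_k$. Then, because $\mathcal{M}$ is an algebra,
$B_n \in \mathcal{M}$ for all $n \in \mathcal{N}$. Consequently, for all $n$,
\[ \lambda^*(W) = \lambda^*(W \cap B_n) + \lambda^*(W \cap B_n^c). \tag{3.18} \]
Because $B_n \subset \bigcup_{m=1}^{\infty} A_m = E$, it follows that $E^c \subset B_n^c$. This last fact and
(3.18) imply that
\[ \lambda^*(W) \ge \lambda^*(W \cap B_n) + \lambda^*(W \cap E^c). \tag{3.19} \]
We will now prove by induction that for all $n \in \mathcal{N}$,
\[ \lambda^*(W \cap B_n) = \sum_{k=1}^{n} \lambda^*(W \cap A_k). \tag{3.20} \]
The equation holds trivially when $n = 1$. So, assume that it holds for $n$.
As $A_{n+1} \in \mathcal{M}$, we have
\[ \lambda^*(W \cap B_{n+1}) = \lambda^*((W \cap B_{n+1}) \cap A_{n+1}) + \lambda^*((W \cap B_{n+1}) \cap A_{n+1}^c). \tag{3.21} \]
Because the $A_k$s are pairwise disjoint, $W \cap B_{n+1} \cap A_{n+1} = W \cap A_{n+1}$ and
$W \cap B_{n+1} \cap A_{n+1}^c = W \cap B_n$. Thus, by (3.21) and the induction hypothesis,
\[ \lambda^*(W \cap B_{n+1}) = \lambda^*(W \cap A_{n+1}) + \lambda^*(W \cap B_n)
   = \lambda^*(W \cap A_{n+1}) + \sum_{k=1}^{n} \lambda^*(W \cap A_k) = \sum_{k=1}^{n+1} \lambda^*(W \cap A_k), \]
as required.
Employing (3.19) and (3.20), we conclude that
\[ \lambda^*(W) \ge \sum_{k=1}^{n} \lambda^*(W \cap A_k) + \lambda^*(W \cap E^c) \]
for all $n \in \mathcal{N}$ and, consequently,
\[ \lambda^*(W) \ge \sum_{n=1}^{\infty} \lambda^*(W \cap A_n) + \lambda^*(W \cap E^c). \]
Applying (3.17) to the previous inequality, we deduce that
\[ \lambda^*(W) \ge \lambda^*(W \cap E) + \lambda^*(W \cap E^c). \]
This shows $E \in \mathcal{M}$. We have now established that $\mathcal{M}$ is a $\sigma$-algebra.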
It remains to prove that $\mathcal{M} \supset \mathcal{B}$. By Proposition 3.3, $\mathcal{M}$ contains
all open sets and, as we have just seen, $\mathcal{M}$ is a $\sigma$-algebra. Consequently,
since $\mathcal{B}$ is the smallest $\sigma$-algebra that contains all open sets, it must be that
$\mathcal{M} \supset \mathcal{B}$.
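The disjointization used in the proof, $A_n = E_n \setminus \bigl(\bigcup_{k=1}^{n-1} E_k\bigr)$, can be tried on finite sets. The following is a minimal sketch; the function name `disjointize` and the sample sets are assumptions made for illustration, and finite Python sets merely stand in for subsets of the real line.

```python
def disjointize(sets):
    """Return A_1, A_2, ... with A_n = E_n minus the union of E_1, ..., E_{n-1}."""
    seen = set()      # union of the sets processed so far
    pieces = []
    for E in sets:
        pieces.append(set(E) - seen)
        seen |= set(E)
    return pieces

E = [{1, 2, 3}, {2, 3, 4}, {1, 5}, {4}]
A = disjointize(E)
```

The pieces in `A` are pairwise disjoint and have the same union as the original sets, which is the content of Exercise 3.30 in this finite setting.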
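Our next theorem demonstrates that Condition (M3) is satisfied when
Lebesgue outer measure, $\lambda^*$, is restricted to $\mathcal{M}$. We denote by $\lambda$ the
restriction of Lebesgue outer measure to $\mathcal{M}$; that is, $\lambda\colon \mathcal{M} \to [0, \infty]$ is defined
by $\lambda(E) = \lambda^*(E)$.
THEOREM 3.12
If $A_1, A_2, \ldots$ are in $\mathcal{M}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
\[ \lambda\Bigl(\bigcup_n A_n\Bigr) = \sum_n \lambda(A_n). \]
PROOF: We first prove that $\lambda$ is finitely additive on $\mathcal{M}$. So, let $A, B \in \mathcal{M}$
with $A \cap B = \emptyset$. Set $W = A \cup B$. Then $W \cap A = A$ and $W \cap A^c = B$.
Consequently, since $A \in \mathcal{M}$, we have by (3.14) that
\[ \lambda(A \cup B) = \lambda(W) = \lambda^*(W) = \lambda^*(W \cap A) + \lambda^*(W \cap A^c)
   = \lambda^*(A) + \lambda^*(B) = \lambda(A) + \lambda(B). \]
This shows that $\lambda$ is finitely additive.
Suppose now that $\{A_n\}_{n=1}^{\infty} \subset \mathcal{M}$ with $A_i \cap A_j = \emptyset$ for $i \ne j$. Using
the fact that $\lambda$ is finitely additive on $\mathcal{M}$ and the monotonicity of Lebesgue
outer measure, we conclude that
\[ \sum_{k=1}^{m} \lambda(A_k) = \lambda\Bigl(\bigcup_{k=1}^{m} A_k\Bigr) \le \lambda\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) \]
for all $m \in \mathcal{N}$. Letting $m \to \infty$ gives $\sum_{n=1}^{\infty} \lambda(A_n) \le \lambda\bigl(\bigcup_{n=1}^{\infty} A_n\bigr)$. The
reverse inequality obtains because of the countable subadditivity of Lebesgue
outer measure.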
Lebesgue Measurable Sets and Lebesgue Measure
From Proposition 3.2 on page 107 and Theorems 3.11 and 3.12, we see that
Conditions (M1)-(M3) are satisfied with $\mu = \lambda$ and $\mathcal{A} = \mathcal{M}$; that is,
(L1) $\lambda(I) = \ell(I)$, for all intervals $I$.
(L2) $\mathcal{M}$ is a $\sigma$-algebra and $\mathcal{M} \supset \mathcal{B}$.
(L3) If $A_1, A_2, \ldots$ are in $\mathcal{M}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
\[ \lambda\Bigl(\bigcup_n A_n\Bigr) = \sum_n \lambda(A_n). \]
Consequently, the set function, $\lambda\colon \mathcal{M} \to [0, \infty]$, is the required extension
of length. We will employ the following terminology:
DEFINITION 3.9 Lebesgue Measurable Sets and Lebesgue Measure
The members of $\mathcal{M}$ are called Lebesgue measurable sets. That is,
$E$ is a Lebesgue measurable set if and only if for every subset $W$ of $\mathcal{R}$,
\[ \lambda^*(W) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c). \]
The restriction of Lebesgue outer measure to $\mathcal{M}$ is denoted by $\lambda$ and
is called Lebesgue measure.
In the next few propositions, we will establish some additional prop-
erties of Lebesgue measure and Lebesgue measurable sets.
PROPOSITION 3.4
A subset of $\mathcal{R}$ with Lebesgue outer measure zero is a Lebesgue measurable
set; that is, $\lambda^*(E) = 0 \Rightarrow E \in \mathcal{M}$.
PROOF: Suppose that $\lambda^*(E) = 0$. Let $W$ be an arbitrary subset of $\mathcal{R}$.
Since $W \cap E \subset E$, we conclude from the monotonicity of Lebesgue outer
measure that $\lambda^*(W \cap E) \le \lambda^*(E) = 0$. Using the fact that $W \cap E^c \subset W$,
we now conclude that
\[ \lambda^*(W) \ge \lambda^*(W \cap E^c) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c). \]
This last inequality shows that $E \in \mathcal{M}$.
PROPOSITION 3.5
Every countable subset of $\mathcal{R}$ has Lebesgue measure zero.
PROOF: Let $E \subset \mathcal{R}$ be countable, say $E = \{x_n\}_{n=1}^{\infty}$. Then we can write
$E = \bigcup_{n=1}^{\infty} \{x_n\}$. Note that if $a \in \mathcal{R}$, then, by (L1), $\lambda(\{a\}) = \lambda([a, a]) =
a - a = 0$. Therefore, applying (L3), we conclude that
\[ \lambda(E) = \lambda\Bigl(\bigcup_{n=1}^{\infty} \{x_n\}\Bigr) = \sum_{n=1}^{\infty} \lambda(\{x_n\}) = 0, \]
as required.
The next proposition shows that the converse of Proposition 3.5 does
not hold.
PROPOSITION 3.6
The Cantor set, $P$, has Lebesgue measure zero.
PROOF: Let $G = [0, 1] \setminus P$. From Chapter 2 (page 59), we know that $G$
can be written as a countable union of disjoint open intervals, $\{I_n\}_{n=1}^{\infty}$,
with the property that $\sum_{n=1}^{\infty} \ell(I_n) = 1$. Hence, by (L3) and (L1),
\[ \lambda(G) = \sum_{n=1}^{\infty} \lambda(I_n) = \sum_{n=1}^{\infty} \ell(I_n) = 1. \]
Clearly, $P$ and $G$ are disjoint and $P \cup G = [0, 1]$. Therefore,
\[ 1 = \lambda(P) + \lambda(G) = \lambda(P) + 1, \]
which shows that $\lambda(P) = 0$.
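The fact that the removed intervals have total length 1 can be checked directly, assuming the standard middle-thirds construction in which stage $n$ removes $2^{n-1}$ open intervals, each of length $3^{-n}$. The helper name below is hypothetical, and the sketch only sums the lengths; it does not construct the intervals.

```python
from fractions import Fraction

def removed_length(stages):
    """Total length removed in the first `stages` steps of the middle-thirds construction."""
    return sum(Fraction(2**(n - 1), 3**n) for n in range(1, stages + 1))

after_ten = removed_length(10)
# The partial sums equal 1 - (2/3)**stages, which increases to 1.
remaining = 1 - after_ten
```

After `stages` steps the length not yet removed is $(2/3)^{\text{stages}}$, which tends to 0, consistent with $\lambda(P) = 0$.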
Another useful result is the following.
THEOREM 3.13
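Suppose that $\{E_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable sets with
$E_1 \subset E_2 \subset \cdots$. Then
\[ \lambda\Bigl(\bigcup_{n=1}^\infty E_n\Bigr) = \lim_{n\to\infty} \lambda(E_n). \]
PROOF: If $\lambda(E_n) = \infty$ for some $n$, then both sides of the previous equation
equal $\infty$. So, assume $\lambda(E_n) < \infty$ for all $n$.
To begin, we disjointize the $E_n$s. Let $A_1 = E_1$ and, for $n \ge 2$, let
$A_n = E_n \setminus E_{n-1}$. Then it is easy to see that $\{A_n\}_{n=1}^\infty \subset \mathcal{M}$, $A_i \cap A_j = \emptyset$
for $i \ne j$, and $\bigcup_{n=1}^\infty A_n = \bigcup_{n=1}^\infty E_n$. Therefore, by countable additivity,
\[ \lambda\Bigl(\bigcup_{n=1}^\infty E_n\Bigr) = \lambda\Bigl(\bigcup_{n=1}^\infty A_n\Bigr) = \sum_{n=1}^\infty \lambda(A_n). \]
Because $E_{n-1} \subset E_n$, we have $\lambda(A_n) = \lambda(E_n \setminus E_{n-1}) = \lambda(E_n) - \lambda(E_{n-1})$
for $n \ge 2$. Consequently,
\[ \lambda\Bigl(\bigcup_{n=1}^\infty E_n\Bigr) = \sum_{n=1}^\infty \lambda(A_n) = \lim_{n\to\infty} \sum_{k=1}^n \lambda(A_k)
   = \lim_{n\to\infty} \Bigl( \lambda(E_1) + \sum_{k=2}^n [\lambda(E_k) - \lambda(E_{k-1})] \Bigr) = \lim_{n\to\infty} \lambda(E_n), \]
as required.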
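The disjointizing step used in the proof can be tried out on finite sets. The Python sketch below is only an illustration on small finite sets (it says nothing about measurability), and the name `disjointize` is hypothetical. It forms each piece by removing everything that appeared in earlier sets; for a nested sequence this reduces to the difference of consecutive sets used above.

```python
def disjointize(sets):
    """Return pieces A_1, A_2, ... where A_n is E_n minus the union of E_1, ..., E_{n-1}."""
    seen = set()      # union of the sets processed so far
    pieces = []
    for e in sets:
        pieces.append(set(e) - seen)
        seen |= set(e)
    return pieces

E = [{1, 2}, {2, 3, 4}, {1, 4, 5}]
A = disjointize(E)
print(A)
```

The pieces are pairwise disjoint by construction and have the same union as the original sets, which is exactly what countable additivity needs.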
The Relation Between $\mathcal{B}$ and $\mathcal{M}$
We close this section by discussing the relationship between the collection
of Borel sets, $\mathcal{B}$, and the collection of Lebesgue measurable sets, $\mathcal{M}$. By
Theorem 3.11, $\mathcal{B} \subset \mathcal{M}$. The question now is: Does $\mathcal{B} = \mathcal{M}$? In other
words, is every Lebesgue measurable set a Borel set or are there Lebesgue
measurable sets that are not Borel sets?
It is not easy to answer that question. In fact, Lebesgue and Borel
argued the question without finding the answer. It turns out that the
answer to the question is no: there are Lebesgue measurable sets that are
not Borel sets. In other words, we have the following theorem:
THEOREM 3.14
The $\sigma$-algebra of Borel sets, $\mathcal{B}$, is a proper subcollection of the $\sigma$-algebra
of Lebesgue measurable sets, $\mathcal{M}$.
PROOF: See Exercise 3.50.
EXERCISES 3.4
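3.30 Let $\{E_n\}_{n=1}^\infty$ be any sequence of subsets of $\mathcal{R}$. Define $A_1 = E_1$, $A_2 = E_2 \setminus E_1$,
$A_3 = E_3 \setminus (E_1 \cup E_2)$, and, in general, $A_n = E_n \setminus \bigl(\bigcup_{k=1}^{n-1} E_k\bigr)$ for $n \ge 2$.
Prove that $A_i \cap A_j = \emptyset$, for $i \ne j$, and $\bigcup_{n=1}^\infty A_n = \bigcup_{n=1}^\infty E_n$.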
3.31 In Chapter 2, we introduced the concept of measure zero. Prove that this
concept is equivalent to that of Lebesgue measure zero. In other words,
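show that a subset $E \subset \mathcal{R}$ has measure zero in the sense of Definition 2.19
on page 85 if and only if $\lambda(E) = 0$.
★3.32 Verify that if $A \in \mathcal{M}$, $\lambda(A) = 0$, and $B \subset A$, then $B \in \mathcal{M}$ and $\lambda(B) = 0$.
3.33	Suppose that $A, B \in \mathcal{M}$ are such that $A \subset B$ and $\lambda(A) < \infty$. Show that
$\lambda(B \setminus A) = \lambda(B) - \lambda(A)$.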
3.34	Use properties of Lebesgue measure to supply a simple proof that any
(nondegenerate) interval of $\mathcal{R}$ is uncountable.
3.35	Suppose that $\{E_n\}_{n=1}^\infty \subset \mathcal{M}$ and that $E_1 \supset E_2 \supset \cdots$. Also suppose that
$\lambda(E_1) < \infty$. Prove that
\[ \lambda\Bigl(\bigcap_{n=1}^\infty E_n\Bigr) = \lim_{n\to\infty} \lambda(E_n). \]
Can the assumption that $\lambda(E_1) < \infty$ be dropped? Why?
3.36	Show that if $A, B \in \mathcal{M}$ and $\lambda(A \cap B) < \infty$, then
\[ \lambda(A \cup B) = \lambda(A) + \lambda(B) - \lambda(A \cap B). \]
3.37	Suppose $\lambda^*(A) = 0$.
a)	Show that for any set $B$, $\lambda^*(A \cup B) = \lambda^*(B)$.
b)	Show that if $A \cup B \in \mathcal{M}$, then $B \in \mathcal{M}$.
3.38	Find a sequence of pairwise disjoint sets, $\{A_n\}_{n=1}^\infty$, such that strict inequality
holds in the relation
\[ \lambda^*\Bigl(\bigcup_{n=1}^\infty A_n\Bigr) \le \sum_{n=1}^\infty \lambda^*(A_n). \]
Hint: Is $\{A_n\}_{n=1}^\infty \subset \mathcal{M}$ possible?
★3.39 If $0 < \alpha < 1$, construct a set, $P_\alpha$, in a manner similar to that in which
the Cantor set is constructed, except that at the $n$th step remove open
intervals of length $\alpha/3^n$ instead of $1/3^n$. Show that $P_\alpha$ is closed and that
$\lambda(P_\alpha) = 1 - \alpha > 0$.
3.40	Prove that there is a sequence of continuous functions,	on [0,1]
that converge pointwise to a function f £ B([0,1]). Hint: Use Exercise 3.39.
3.41	Prove that there is a Riemann integrable function on [0,1] that is not a
Borel measurable function. Hint: The proof of Theorem 3.14, which is
carried out in Exercise 3.50, shows there is a subset of the Cantor set that
is not a Borel set.
3.42	Suppose that $E \in \mathcal{M}$. Show that for each $\epsilon > 0$, there is an open set, $O$, with
$O \supset E$ and $\lambda(O \setminus E) < \epsilon$. Hint: First consider the case where $\lambda(E) < \infty$
and use the definition of Lebesgue outer measure.
★3.43 Suppose that $E \in \mathcal{M}$. Show that for each $\epsilon > 0$, there is a closed set, $F$,
with $F \subset E$ and $\lambda(E \setminus F) < \epsilon$.
★3.44 A set is called a $G_\delta$-set if it is the intersection of a countable number of
open sets; and a set is called an $F_\sigma$-set if it is the union of a countable
number of closed sets. Note that $G_\delta$-sets and $F_\sigma$-sets are Borel sets. Now
suppose that $E \in \mathcal{M}$.
a)	Show that there is a $G_\delta$-set, $G$, and an $F_\sigma$-set, $F$, such that $F \subset E \subset G$
and $\lambda(E \setminus F) = \lambda(G \setminus E) = 0$.
b)	Referring to part (a), deduce that $\lambda(F) = \lambda(E) = \lambda(G)$.
3.45	Let $E \in \mathcal{M}$. Prove that $\lambda(E) = \inf\{ \lambda(O) : O \supset E,\ O \text{ open} \}$.
3.46	Let $E \in \mathcal{M}$. Prove that $\lambda(E) = \sup\{ \lambda(K) : K \subset E,\ K \text{ compact} \}$.
3.47	Let $E \subset \mathcal{R}$.
a)	Suppose there is a Borel set, $B$, such that $B \subset E$ and $\lambda^*(E \setminus B) = 0$.
Show that $E \in \mathcal{M}$.
b)	Suppose that $\lambda^*(E) < \infty$ and that
\[ \lambda^*(E) = \sup\{ \lambda(F) : F \subset E,\ F \text{ closed} \} = \inf\{ \lambda(O) : O \supset E,\ O \text{ open} \}. \]
Show that $E \in \mathcal{M}$.
3.48	Suppose that $\{E_n\}_{n=1}^\infty$ is a sequence of pairwise disjoint Lebesgue measurable
sets. Prove that
for all subsets $A$ of $\mathcal{R}$.
3.49	Suppose that $E \in \mathcal{M}$ and that $\lambda(E) < \infty$. Show that for each $\epsilon > 0$, there
are a finite number of pairwise disjoint intervals, $I_1, I_2, \ldots, I_n$, such that
★3.50 Prove Theorem 3.14 on page 125. Proceed by establishing each of the
following facts:
a)	If $C \in \mathcal{M}$ and $x \in \mathcal{R}$, then $C + x \in \mathcal{M}$ and $\lambda(C + x) = \lambda(C)$.
b)	Let $S$ be the set defined in Lemma 3.12 on page 116. If $C \in \mathcal{M}$ and
$C \subset S$, then $\lambda(C) = 0$. Hint: Consider $\{ C + r : r \in (-1,1) \cap Q \}$.
c)	If $D \subset \mathcal{R}$ and $\lambda^*(D) > 0$, then there is a nonmeasurable subset of $D$.
Hint: Let $D_r = D \cap (S + r)$ for $r \in Q$. Use parts (a) and (b) to show
that if $D_r \in \mathcal{M}$, then $\lambda(D_r) = 0$.
d)	Define $f\colon [0,1] \to \mathcal{R}$ by $f(x) = x + \psi(x)$, where $\psi$ denotes the Cantor
function (see page 77). Then $f$ is a strictly increasing function and maps
$[0,1]$ onto $[0,2]$.
e)	The function, $g = f^{-1}$, is continuous and, hence, Borel measurable.
f)	$f$ maps the Cantor set onto a set, $A$, with $\lambda(A) = 1$.
g)	Let $E \subset A$ with $E \notin \mathcal{M}$. [Such an $E$ exists by parts (f) and (c).] Then
$f^{-1}(E) \in \mathcal{M}$ but $f^{-1}(E) \notin \mathcal{B}$.
3.51	Prove that the set $S$ defined in Lemma 3.12 on page 116 is not a Lebesgue
measurable set.
3.5 THE LEBESGUE INTEGRAL FOR
NONNEGATIVE FUNCTIONS
Recall that our reason for generalizing the concept of length to all Borel sets
is so that we can extend the Riemann integral to an integral that applies to
all Borel measurable functions. We have, in fact, generalized the concept
of length to all Lebesgue measurable sets. Consequently, we will be able to
extend the Riemann integral to an integral that applies to a much larger
collection of functions than the Borel measurable functions. We will call
that larger collection of functions the Lebesgue measurable functions.
Lebesgue Measurable Functions
There are two ways that we can approach the definition of Lebesgue mea-
surable functions. Here is the first approach: Taking our cue from the
development of the Riemann integral, we begin by defining the integral of
a function of the form $s(x) = \sum_k a_k \chi_{E_k}(x)$. If the $E_k$s are intervals, then $s$ is
a step function and we simply define the integral to equal the Riemann
integral, $\sum_k a_k \ell(E_k)$. But now that we have generalized the concept of
length, we can do much better. Provided only that the $E_k$s are Lebesgue
measurable sets, we define the integral to be $\sum_k a_k \lambda(E_k)$.
In particular, we see that every function of the form $\chi_E$, where $E \in \mathcal{M}$,
should be a Lebesgue measurable function; that is, should be integrable in
the extended sense. Since we want the collection of all Lebesgue measurable
functions to constitute an algebra and be closed under pointwise limits, we
make the following definition.
DEFINITION 3.10 Lebesgue Measurable Functions
We denote by $\mathcal{L}$ the smallest algebra of real-valued functions on $\mathcal{R}$
that contains all functions of the form $\chi_E$, $E \in \mathcal{M}$, and is closed under
pointwise limits. The members of $\mathcal{L}$ are called Lebesgue measurable
functions.
Our second approach to obtain the definition of Lebesgue measurable
functions is by analogy with a characterization of Borel measurable functions.
Specifically, as we know from Theorem 3.3 (page 99), a function $f$ is
Borel measurable if and only if the inverse image of each open set under $f$
is a Borel set; that is, if and only if $f^{-1}(O) \in \mathcal{B}$ for all open sets $O$. This
leads to the following definition of Lebesgue measurable functions:
DEFINITION 3.11 Lebesgue Measurable Function
A real-valued function $f$ on $\mathcal{R}$ is said to be a Lebesgue measurable
function if the inverse image of each open set under $f$ is a Lebesgue
measurable set; that is, if $f^{-1}(O) \in \mathcal{M}$ for all open sets $O$.
Note: For brevity, we will often indicate that a function is a Lebesgue
measurable function by saying that it is an $\mathcal{M}$-measurable function.
It really doesn’t matter whether we use Definition 3.10 or Defini-
tion 3.11 because the two definitions are equivalent (see Exercise 3.53).
But to be specific, we will take Definition 3.11 as our definition of Lebesgue
measurable functions.
Our next proposition, whose proof is similar to that of Lemma 3.5 on
page 98 and is left to the reader, provides some useful equivalent conditions
for a function to be Lebesgue measurable.
PROPOSITION 3.7
Let $f$ be a real-valued function on $\mathcal{R}$. Then the following statements are
equivalent:
a)	$f$ is $\mathcal{M}$-measurable.
b)	For each $a \in \mathcal{R}$, $f^{-1}((-\infty, a)) \in \mathcal{M}$.
c)	For each $a \in \mathcal{R}$, $f^{-1}((a, \infty)) \in \mathcal{M}$.
d)	For each $a \in \mathcal{R}$, $f^{-1}((-\infty, a]) \in \mathcal{M}$.
e)	For each $a \in \mathcal{R}$, $f^{-1}([a, \infty)) \in \mathcal{M}$.
Several important properties of Lebesgue measurable functions are
given in the next two theorems. We postpone the proofs of those theo-
rems until Chapter 4, where more general results will be established.
THEOREM 3.15
The collection of Lebesgue measurable functions forms an algebra. That
is, if $f$ and $g$ are $\mathcal{M}$-measurable and $\alpha \in \mathcal{R}$, then
a) $f + g$ is $\mathcal{M}$-measurable.
b) $f \cdot g$ is $\mathcal{M}$-measurable.
c) $\alpha f$ is $\mathcal{M}$-measurable.
THEOREM 3.16
Suppose that $f$ and $g$ are $\mathcal{M}$-measurable functions and that $\{f_n\}_{n=1}^\infty$ is
a sequence of $\mathcal{M}$-measurable functions that converges pointwise to a real-valued
function. Then
a) $f \vee g$ is $\mathcal{M}$-measurable.
b) $f \wedge g$ is $\mathcal{M}$-measurable.
c) $\lim_{n\to\infty} f_n$ is $\mathcal{M}$-measurable.
The Lebesgue Integral of a Nonnegative
Simple Function
We now begin our extension of the Riemann integral to an integral that
applies to all Lebesgue measurable functions. First we introduce a special
type of Lebesgue measurable function that generalizes the notion of step
functions.
DEFINITION 3.12 Simple Function and Canonical Representation
An $\mathcal{M}$-measurable function, $s$, is said to be a simple function if
it takes on only finitely many values; that is, if its range is a finite
set. Let $a_1, a_2, \ldots, a_n$ denote the distinct nonzero values of $s$ and,
for $1 \le k \le n$, set $A_k = \{ x : s(x) = a_k \}$. Then we can write
\[ s = \sum_{k=1}^n a_k \chi_{A_k}. \tag{3.22} \]
This is called the canonical representation of $s$.
It is easy to see that every step function is a simple function, but not
every simple function is a step function. Also, we leave it as an exercise
for the reader to show that the sets, $A_1, A_2, \ldots, A_n$, appearing in the
canonical representation of a simple function are Lebesgue measurable and
pairwise disjoint.
EXAMPLE 3.2 Illustrates Definition 3.12
The function, $s = 3\chi_{(0,2)} + 2\chi_{(1,3]}$, is a simple function. However, the
given representation is not canonical. In fact, the canonical representation
of $s$ is
\[ s = 3\chi_{(0,1]} + 5\chi_{(1,2)} + 2\chi_{[2,3]}, \]
as is easily verified.	□
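A quick numerical spot check of the example can be sketched in Python. It assumes the example is read as the function 3 times the indicator of (0,2) plus 2 times the indicator of (1,3], with canonical form taking the value 3 on (0,1], 5 on (1,2), and 2 on [2,3]; the function names are illustrative. Agreement at finitely many sample points is of course not a proof, only a consistency check.

```python
def s_given(x):
    """Noncanonical form: 3 on (0, 2) plus 2 on (1, 3] (assumed reading of the example)."""
    return 3 * (0 < x < 2) + 2 * (1 < x <= 3)

def s_canonical(x):
    """Canonical form: 3 on (0, 1], 5 on (1, 2), 2 on [2, 3]; the three sets are disjoint."""
    return 3 * (0 < x <= 1) + 5 * (1 < x < 2) + 2 * (2 <= x <= 3)

# Sample points chosen to hit every piece and every endpoint.
points = [-1, 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5]
values = [s_given(x) for x in points]
print(values)
print(all(s_given(x) == s_canonical(x) for x in points))
```

The canonical form lists each nonzero value exactly once, on the set where the function takes that value, which is why the overlap (1,2) of the two original intervals gets its own term with value 5.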
Here is the definition of the Lebesgue integral for a nonnegative simple
function. As already noted, this definition is a natural generalization of the
Riemann integral of a step function.
DEFINITION 3.13 Integral of a Nonnegative Simple Function
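Let $s$ be a nonnegative simple function with canonical representation,
$s = \sum_{k=1}^n a_k \chi_{A_k}$. Then the Lebesgue integral of $s$ over $\mathcal{R}$ is
defined by
\[ \int_{\mathcal{R}} s(x)\,d\lambda(x) = \sum_{k=1}^n a_k \lambda(A_k). \]
If $E \in \mathcal{M}$, then the Lebesgue integral of $s$ over $E$ is defined by
\[ \int_E s(x)\,d\lambda(x) = \int_{\mathcal{R}} \chi_E(x) s(x)\,d\lambda(x). \]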
The next proposition shows how we can obtain the Lebesgue integral of
a nonnegative simple function from a possibly noncanonical representation.
PROPOSITION 3.8
Let $s$ be a nonnegative simple function that can be expressed in the form,
$s = \sum_{k=1}^m b_k \chi_{B_k}$, where this representation is not necessarily canonical but
$B_k \in \mathcal{M}$ for $1 \le k \le m$ and $B_i \cap B_j = \emptyset$ for $i \ne j$. Then
\[ \int_{\mathcal{R}} s(x)\,d\lambda(x) = \sum_{k=1}^m b_k \lambda(B_k). \tag{3.23} \]
More generally, we have
\[ \int_E s(x)\,d\lambda(x) = \sum_{k=1}^m b_k \lambda(B_k \cap E) \tag{3.24} \]
for each $E \in \mathcal{M}$.
PROOF: Let $s = \sum_{i=1}^n a_i \chi_{A_i}$ be the canonical representation of $s$. Also,
set $a_0 = 0$ and $A_0 = \{x : s(x) = 0\}$. Because the $B_k$s are pairwise disjoint,
we know that for each $k = 1, 2, \ldots, m$, there is an $i$ ($0 \le i \le n$) such that
$b_k = a_i$. Let $D_i = \{k : b_k = a_i\}$. Then the $D_i$s are pairwise disjoint,
$\bigcup_{i=0}^n D_i = \{1, 2, \ldots, m\}$, and $A_i = \bigcup_{k \in D_i} B_k$ for $1 \le i \le n$. Consequently,
\[ \int_{\mathcal{R}} s(x)\,d\lambda(x) = \sum_{i=1}^n a_i \lambda(A_i) = \sum_{i=1}^n a_i \sum_{k \in D_i} \lambda(B_k)
   = \sum_{i=0}^n \sum_{k \in D_i} a_i \lambda(B_k) = \sum_{i=0}^n \sum_{k \in D_i} b_k \lambda(B_k) = \sum_{k=1}^m b_k \lambda(B_k). \]
Thus, (3.23) holds.
To establish (3.24), we first observe that
\[ \chi_E s = \sum_{k=1}^m b_k \chi_E \chi_{B_k} = \sum_{k=1}^m b_k \chi_{B_k \cap E}. \]
Applying (3.23), we now conclude that
\[ \int_E s(x)\,d\lambda(x) = \int_{\mathcal{R}} \chi_E(x) s(x)\,d\lambda(x) = \int_{\mathcal{R}} (\chi_E s)(x)\,d\lambda(x) = \sum_{k=1}^m b_k \lambda(B_k \cap E). \]
This completes the proof of Proposition 3.8.
We should point out that Definition 3.13 really does provide a generalization
of the Riemann integral of a step function; that is, the Lebesgue
integral of a step function equals its Riemann integral. Indeed, suppose
that $g$ is a step function on $[a, b]$, say $g = \sum_{k=1}^n a_k \chi_{I_k}$, where the $I_k$s are
pairwise disjoint subintervals of $[a, b]$. Then, by Proposition 3.8,
\[ \int_a^b g(x)\,dx = \sum_{k=1}^n a_k \ell(I_k) = \sum_{k=1}^n a_k \lambda(I_k)
   = \sum_{k=1}^n a_k \lambda(I_k \cap [a, b]) = \int_{[a,b]} g(x)\,d\lambda(x). \]
A technicality: We have defined the Lebesgue integral only for functions
whose domain is all of $\mathcal{R}$; but, the domain of the step function $g$ is only $[a, b]$.
To remedy this difficulty, define $g(x) = 0$ for $x \in [a, b]^c$.
EXAMPLE 3.3 Illustrates the Integral of a Nonnegative Simple Function
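a)	Let $s = 3\chi_{(-2,-1]} + 4\chi_{(-1,1)} + 8\chi_{\mathcal{N}}$. Then
\[ \int_{\mathcal{R}} s(x)\,d\lambda(x) = 3\lambda((-2,-1]) + 4\lambda((-1,1)) + 8\lambda(\mathcal{N}) = 3 \cdot 1 + 4 \cdot 2 + 8 \cdot 0 = 11. \]
b)	Let $s = \chi_{Q^c} = 1\chi_{Q^c}$. Then, by Proposition 3.8,
\[ \int_{[0,2]} s(x)\,d\lambda(x) = 1\lambda(Q^c \cap [0,2]) = 2. \]
c)	Let $f(x) = 1$; that is, $f = \chi_{\mathcal{R}}$. Then $\int_{\mathcal{R}} f(x)\,d\lambda(x) = \lambda(\mathcal{R}) = \infty$. Thus,
the Lebesgue integral of a nonnegative simple function can be $\infty$. □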
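The computations in this example all follow one pattern: multiply each value by the measure of its set and add. A minimal Python sketch of that pattern follows. It assumes the sets are pairwise disjoint and measurable, as Proposition 3.8 requires, and that their measures are supplied by hand; the name `integrate_simple` is hypothetical. The convention that a zero value times an infinite measure contributes zero is made explicit, because floating-point arithmetic would otherwise produce a NaN.

```python
import math

def integrate_simple(terms):
    """Sum of b_k * lambda(B_k) over (value, measure) pairs.

    Assumptions: the underlying sets B_k are pairwise disjoint measurable sets,
    values are nonnegative, and measures are nonnegative or math.inf.
    """
    total = 0.0
    for value, measure in terms:
        if value < 0 or measure < 0:
            raise ValueError("values and measures must be nonnegative")
        if value == 0 or measure == 0:
            continue  # convention: 0 times infinity counts as 0
        total += value * measure
    return total

part_a = integrate_simple([(3, 1), (4, 2), (8, 0)])   # part (a): a countable set has measure 0
part_b = integrate_simple([(1, 2)])                    # part (b): the irrationals in [0, 2]
part_c = integrate_simple([(1, math.inf)])             # part (c): the constant function 1
print(part_a, part_b, part_c)
```

The helper only does the bookkeeping; deciding the measure of each set (for instance, that a countable set has measure zero by Proposition 3.5) is still a matter for the theory.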
LEMMA 3.14
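Suppose that $s$ and $t$ are nonnegative simple functions and that $\alpha, \beta \ge 0$.
Then $\alpha s + \beta t$ is a nonnegative simple function and
\[ \int_E [\alpha s(x) + \beta t(x)]\,d\lambda(x) = \alpha \int_E s(x)\,d\lambda(x) + \beta \int_E t(x)\,d\lambda(x) \]
for each $E \in \mathcal{M}$.
PROOF: Let $s = \sum_{k=1}^n a_k \chi_{A_k}$ and $t = \sum_{j=1}^m b_j \chi_{B_j}$ be the canonical
representations of $s$ and $t$, respectively. Also, let $a_0 = 0$, $A_0 = \{ x : s(x) = 0 \}$,
$b_0 = 0$, and $B_0 = \{x : t(x) = 0\}$. For each $k$ ($0 \le k \le n$) and
$j$ ($0 \le j \le m$), set $C_{kj} = A_k \cap B_j$. Then the $C_{kj}$s are pairwise
disjoint, $s = \sum_{k=0}^n \sum_{j=0}^m a_k \chi_{C_{kj}}$, and $t = \sum_{k=0}^n \sum_{j=0}^m b_j \chi_{C_{kj}}$. Hence,
\[ \alpha s + \beta t = \sum_{k=0}^n \sum_{j=0}^m (\alpha a_k + \beta b_j) \chi_{C_{kj}}. \]
This last equation shows that $\alpha s + \beta t$ is a nonnegative simple function.
Moreover, by applying Proposition 3.8, we can deduce that
\[ \int_E [\alpha s(x) + \beta t(x)]\,d\lambda(x) = \int_E (\alpha s + \beta t)\,d\lambda
   = \sum_{k=0}^n \sum_{j=0}^m (\alpha a_k + \beta b_j) \lambda(C_{kj} \cap E) \]
\[ = \alpha \sum_{k=0}^n \sum_{j=0}^m a_k \lambda(C_{kj} \cap E) + \beta \sum_{k=0}^n \sum_{j=0}^m b_j \lambda(C_{kj} \cap E)
   = \alpha \int_E s\,d\lambda + \beta \int_E t\,d\lambda, \]
as required.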
The Lebesgue Integral of a Nonnegative
M-measurable Function
Next we will define the Lebesgue integral of a nonnegative Lebesgue mea-
surable function. Before doing so, however, it is useful for motivational
purposes to prove the following proposition:
PROPOSITION 3.9
a)	Let $f$ be a nonnegative $\mathcal{M}$-measurable function. Then there is a nondecreasing
sequence of nonnegative simple functions that converges pointwise
to $f$. In other words, there is a sequence, $\{s_n\}_{n=1}^\infty$, of nonnegative
simple functions such that, for all $x \in \mathcal{R}$, $s_1(x) \le s_2(x) \le \cdots$ and
$\lim_{n\to\infty} s_n(x) = f(x)$.
b)	If $\{s_n\}_{n=1}^\infty$ is a sequence of nonnegative simple functions that converges
pointwise to a real-valued function $f$, then $f$ is a nonnegative
$\mathcal{M}$-measurable function.
PROOF:
a)	For each $n \in \mathcal{N}$, set
\[ E_{nm} = \Bigl\{ x : \frac{m-1}{2^n} \le f(x) < \frac{m}{2^n} \Bigr\} \]
for $m = 1, 2, \ldots, n2^n$, and $E_n = \{x : f(x) \ge n\}$. As $f$ is Lebesgue
measurable, the sets $E_{nm}$, $E_n$ are Lebesgue measurable sets. Let
\[ s_n = \sum_{m=1}^{n2^n} \frac{m-1}{2^n} \chi_{E_{nm}} + n \chi_{E_n}. \]
Then $\{s_n\}_{n=1}^\infty$ is a sequence of nonnegative simple functions and, clearly,
$s_n \le f$ for all $n \in \mathcal{N}$. Also, by construction, $|f(x) - s_n(x)| < 2^{-n}$ as
soon as $n$ is large enough so that $f(x) < n$. Thus, $s_n \to f$ pointwise.
Next we show that $s_n \le s_{n+1}$ for all $n \in \mathcal{N}$. Let $x \in \mathcal{R}$. If
$x \in E_{nm}$ for some $m = 1, 2, \ldots, n2^n$, then $(m-1)2^{-n} \le f(x) < m2^{-n}$.
Therefore, either
\[ \frac{m-1}{2^n} \le f(x) < \frac{2m-1}{2^{n+1}} \qquad\text{or}\qquad \frac{2m-1}{2^{n+1}} \le f(x) < \frac{m}{2^n}. \]
In the former case, $s_n(x) = s_{n+1}(x) = (m-1)2^{-n}$ and, in the latter
case, $s_n(x) = (m-1)2^{-n} < (2m-1)2^{-(n+1)} = s_{n+1}(x)$. Consequently,
in either case, $s_n(x) \le s_{n+1}(x)$.
Finally, if $x \in E_n$, then $f(x) \ge n = (n2^{n+1})/2^{n+1}$. This implies
that $s_{n+1}(x) \ge (n2^{n+1})/2^{n+1} = n = s_n(x)$. This completes the proof
of part (a).
b)	This part follows immediately from Theorem 3.16(c) on page 130.
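The construction in part (a) can be evaluated pointwise with a few lines of Python. The sketch below computes the value of the nth approximating function at a point from the value of f there: it returns n where f is at least n, and otherwise rounds f down to the nearest multiple of 2^(-n). The name `simple_approx` is illustrative, and the sketch works with a single function value rather than with the sets themselves.

```python
import math

def simple_approx(fx: float, n: int) -> float:
    """Value s_n(x) of the n-th approximating simple function, given fx = f(x) >= 0."""
    if fx < 0:
        raise ValueError("f must be nonnegative")
    if fx >= n:
        return float(n)                        # x lies in E_n
    # x lies in E_nm with m - 1 = floor(2^n f(x)), so s_n(x) = (m - 1) / 2^n
    return math.floor(fx * 2 ** n) / 2 ** n

fx = 0.7
approximations = [simple_approx(fx, n) for n in range(1, 7)]
print(approximations)
```

For a fixed point the values never decrease as n grows, and once n exceeds f(x) the error is below 2^(-n), which is the content of the two claims proved above.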
Proposition 3.9 shows that the functions that can be approximated
by nonnegative simple functions are precisely the nonnegative Lebesgue
measurable functions. With that proposition in mind, we now define the
Lebesgue integral of an arbitrary nonnegative $\mathcal{M}$-measurable function.
DEFINITION 3.14 Lebesgue Integral of a Nonnegative Function
Let $f$ be a nonnegative $\mathcal{M}$-measurable function. Then the Lebesgue
integral of $f$ over $\mathcal{R}$ is defined by
\[ \int_{\mathcal{R}} f(x)\,d\lambda(x) = \sup_s \int_{\mathcal{R}} s(x)\,d\lambda(x), \tag{3.25} \]
where the supremum is taken over all nonnegative simple functions $s$
that are dominated by $f$. If $E \in \mathcal{M}$, then the Lebesgue integral
of $f$ over $E$ is defined by
\[ \int_E f(x)\,d\lambda(x) = \int_{\mathcal{R}} \chi_E(x) f(x)\,d\lambda(x). \tag{3.26} \]
Note that (3.25) makes sense for any nonnegative function, Lebesgue
measurable or not. Thus, we might ask: Why define the Lebesgue inte-
gral only for nonnegative Lebesgue measurable functions; why not define
it for any nonnegative function? The reason lies in the previous proposi-
tion, Proposition 3.9. For, if f is not Lebesgue measurable, then it cannot
be approximated by a sequence of nonnegative simple functions. Hence,
the quantity on the right-hand side of (3.25) will generally not reflect the
behavior of $f$.
We should mention that there are several widely used notations for the
Lebesgue integral: $\int_E f(x)\,d\lambda(x)$, $\int_E f(x)\,dx$, $\int_E f(x)\,\lambda(dx)$, and $\int_E f\,d\lambda$ all
denote the Lebesgue integral of f over E. By the way, we will refer to the
Lebesgue integral simply as “the integral” when there is no possibility of
confusion.
PROPOSITION 3.10
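Let $f$ and $g$ be nonnegative Lebesgue measurable functions, $\alpha \ge 0$, and
$E \in \mathcal{M}$. Then
a)	$f \le g \Rightarrow \int_E f\,d\lambda \le \int_E g\,d\lambda$.
b)	$A \subset E$ and $A \in \mathcal{M} \Rightarrow \int_A f\,d\lambda \le \int_E f\,d\lambda$.
c)	$f(x) = 0$ for all $x \in E \Rightarrow \int_E f\,d\lambda = 0$.
d)	$\lambda(E) = 0 \Rightarrow \int_E f\,d\lambda = 0$.
e)	$\int_E \alpha f\,d\lambda = \alpha \int_E f\,d\lambda$.
PROOF:
a)	Suppose that $s$ is a nonnegative simple function that is dominated by
$\chi_E f$. As $f \le g$, it follows that $s$ is also dominated by $\chi_E g$. Therefore,
\[ \int_{\mathcal{R}} \chi_E f\,d\lambda = \sup_{\substack{0 \le s \le \chi_E f \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda
   \le \sup_{\substack{0 \le s \le \chi_E g \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda = \int_{\mathcal{R}} \chi_E g\,d\lambda, \]
as required.
b)	If $A \subset E$, then $\chi_A f \le \chi_E f$. Thus, by part (a),
\[ \int_A f\,d\lambda = \int_{\mathcal{R}} \chi_A f\,d\lambda \le \int_{\mathcal{R}} \chi_E f\,d\lambda = \int_E f\,d\lambda, \]
as required.
c)	If $f(x) = 0$ for all $x \in E$, then $\chi_E f = 0$. Thus, the only nonnegative
simple function dominated by $\chi_E f$ is 0. Consequently, $\int_{\mathcal{R}} \chi_E f\,d\lambda = 0$;
that is, $\int_E f\,d\lambda = 0$.
d)	The function $\chi_E f$ is zero off of $E$. Therefore, any nonnegative simple
function, $s$, that is dominated by $\chi_E f$ must also be zero off of $E$. In
other words, $s(x) = 0$ for $x \in E^c$. Since $\lambda(E) = 0$, we have $\lambda(A) = 0$
for all subsets $A$ of $E$. It now follows that if $s$ is a nonnegative simple
function that is dominated by $\chi_E f$, then $\int_{\mathcal{R}} s\,d\lambda = 0$. Hence,
\[ \int_E f\,d\lambda = \sup_{\substack{0 \le s \le \chi_E f \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda = 0. \]
e)	If $\alpha = 0$, then there is nothing to prove. So, assume that $\alpha > 0$.
Clearly, the required result holds for simple functions. Now, let $s$ be a
nonnegative simple function that is dominated by $\chi_E \cdot (\alpha f)$. Then we
have $0 \le \alpha^{-1} s \le \chi_E f$ and, hence, by part (a),
\[ \alpha^{-1} \int_{\mathcal{R}} s\,d\lambda = \int_{\mathcal{R}} \alpha^{-1} s\,d\lambda \le \int_{\mathcal{R}} \chi_E f\,d\lambda = \int_E f\,d\lambda. \]
Thus, $\int_{\mathcal{R}} s\,d\lambda \le \alpha \int_E f\,d\lambda$ for each nonnegative simple function, $s$, that
is dominated by $\chi_E \cdot (\alpha f)$. This last fact implies that
\[ \int_E \alpha f\,d\lambda = \sup_{\substack{0 \le s \le \chi_E \cdot (\alpha f) \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda \le \alpha \int_E f\,d\lambda. \]
On the other hand, let $s$ be a nonnegative simple function that is dominated
by $\chi_E f$. Then we have $0 \le \alpha s \le \chi_E \cdot (\alpha f)$. Therefore, by
part (a),
\[ \alpha \int_{\mathcal{R}} s\,d\lambda = \int_{\mathcal{R}} \alpha s\,d\lambda \le \int_{\mathcal{R}} \chi_E \cdot (\alpha f)\,d\lambda = \int_E \alpha f\,d\lambda. \]
Thus, $\alpha \int_{\mathcal{R}} s\,d\lambda \le \int_E \alpha f\,d\lambda$ for each nonnegative simple function, $s$,
that is dominated by $\chi_E f$. Consequently,
\[ \alpha \int_E f\,d\lambda = \alpha \cdot \sup_{\substack{0 \le s \le \chi_E f \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda \le \int_E \alpha f\,d\lambda. \]
This completes the proof of part (e).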
EXERCISES 3.5
3.52	Let $\mathcal{G} = \{ \chi_E : E \in \mathcal{M} \}$. Show that $\mathcal{G}$ is closed under pointwise limits.
3.53	Prove that Definitions 3.10 and 3.11 (page 129) are equivalent by proceeding
as follows: Let
\[ \mathcal{F} = \{ f : f^{-1}(O) \in \mathcal{M} \text{ for all open sets } O \}. \]
We must prove that $\mathcal{F} = \mathcal{L}$.
a)	Why do we know that $\mathcal{F}$ is an algebra of functions and is closed under
pointwise limits? Hint: See Theorems 3.15 and 3.16 on page 130.
b)	Show that if $E \in \mathcal{M}$, then $\chi_E \in \mathcal{F}$.
c)	Deduce from parts (a) and (b) that $\mathcal{F} \supset \mathcal{L}$.
d)	Show that $\mathcal{F} \subset \mathcal{L}$ by employing a suitable modification of the proof given
in Lemma 3.7 on page 99.
3.54	Explain why every Borel measurable function is a Lebesgue measurable
function. Is the converse true? Why?
3.55	Prove Proposition 3.7 on page 129. Hint: Refer to the proof of Lemma 3.5
on page 98.
3.56	Suppose that $f$ is $\mathcal{M}$-measurable.
a)	Show that $f^{-1}(B) \in \mathcal{M}$ for all $B \in \mathcal{B}$.
b)	True or False: $f^{-1}(E) \in \mathcal{M}$ for all $E \in \mathcal{M}$. Hint: Refer to Exercise
3.50(g) on page 128.
3.57	Verify that if $f$ is $\mathcal{M}$-measurable, then $\{ x : f(x) = a \} \in \mathcal{M}$ for each $a \in \mathcal{R}$.
Show that the converse is not true. Hint: Let $K$ be a nonmeasurable set;
that is, $K \notin \mathcal{M}$. Construct a function, $h$, such that $\{x : h(x) = a\} \in \mathcal{M}$
for each $a \in \mathcal{R}$ and $h^{-1}((0, \infty)) = K$.
3.58	Show that if $f$ is $\mathcal{M}$-measurable, then so is $|f|$.
3.59	Show that every step function is a simple function but not every simple
function is a step function.
3.60	Suppose that the sets, $A_1, A_2, \ldots, A_n$, are the ones appearing in the canonical
representation of a simple function, $s$. Prove that those sets are Lebesgue
measurable and pairwise disjoint.
3.61	Theorem 3.16(c) on page 130 indicates that if $\{f_n\}_{n=1}^\infty$ is a sequence of
$\mathcal{M}$-measurable functions converging pointwise to $f$, then $f$ is $\mathcal{M}$-measurable.
What can be said if the family of functions is indexed by an uncountable
set? Specifically, suppose that $\{f_t\}_{t \in (0,\infty)}$ is a family of $\mathcal{M}$-measurable
functions that converges pointwise to $f$; that is, $\lim_{t\to\infty} f_t(x) = f(x)$ for
all $x \in \mathcal{R}$. Is $f$ necessarily $\mathcal{M}$-measurable?
3.62	Let $E$ be a Lebesgue measurable set with $\lambda(E) < \infty$. Suppose that $\{f_n\}_{n=1}^\infty$
is a sequence of Lebesgue measurable functions that converges pointwise
on $E$ to a function $f$. Prove that for each pair of positive numbers, $\epsilon$ and $\delta$,
there is an $N \in \mathcal{N}$ and a Lebesgue measurable set $A \subset E$ such that $\lambda(A) < \delta$
and $|f(x) - f_n(x)| < \epsilon$ for $x \in E \setminus A$ and $n \ge N$. Hint: Let $E_m = \{x \in E :
|f(x) - f_n(x)| \ge \epsilon \text{ for some } n \ge m\}$ and apply Exercise 3.35.
★3.63 Egorov's theorem: The following result shows that, in a certain sense,
pointwise convergence of measurable functions is close to being uniform
convergence: Let $E$ be a Lebesgue measurable set with $\lambda(E) < \infty$. Suppose
that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions that converges
pointwise on $E$ to a real-valued function $f$. Prove that for each $\delta > 0$, there
is a Lebesgue measurable set $B \subset E$ with $\lambda(B) < \delta$ such that $f_n \to f$
uniformly on $E \setminus B$. Hint: Apply Exercise 3.62 with $\epsilon$ replaced by $1/k$ and
$\delta$ replaced by $\delta/2^k$.
★3.64 Prove the following facts:
a)	Suppose that $F$ is a nonempty closed subset of $\mathcal{R}$, and $O$ is a proper open
subset of $\mathcal{R}$. Further suppose that $F \subset O$. Then there is a continuous
function, $f$, such that $f(\mathcal{R}) \subset [0,1]$, $f(F) = \{1\}$, and $f(O^c) = \{0\}$.
Hint: $f$ can be constructed from the functions, $d(\cdot, F)$ and $d(\cdot, O^c)$.
b)	Let $E \in \mathcal{M}$. Then there is a sequence of open sets, $\{O_n\}_{n=1}^\infty$, and a
sequence of closed sets, $\{F_n\}_{n=1}^\infty$, such that for all $n \in \mathcal{N}$, $F_n \subset E \subset O_n$,
$F_n \subset F_{n+1}$, $O_n \supset O_{n+1}$, and $\lambda\bigl((\bigcap_{n=1}^\infty O_n) \setminus (\bigcup_{n=1}^\infty F_n)\bigr) = 0$.
c)	Let $E \in \mathcal{M}$. Then there is a Lebesgue measurable set, $B$, with $\lambda(B) = 0$,
and a sequence of continuous functions, $\{g_n\}_{n=1}^\infty$, with $0 \le g_n \le 1$ for
all $n \in \mathcal{N}$, such that $\lim_{n\to\infty} g_n(x) = \chi_E(x)$ for each $x \in B^c$.
d)	Let $s$ be a simple function with $|s(x)| \le M$ for all $x \in \mathcal{R}$, where $M$ is a
real number. Then there is a sequence of continuous functions, $\{g_n\}_{n=1}^\infty$,
and a Lebesgue measurable set, $B$, such that $\lambda(B) = 0$, $|g_n(x)| \le M$ for
$x \in \mathcal{R}$ and $n \in \mathcal{N}$, and $\lim_{n\to\infty} g_n(x) = s(x)$ for $x \in B^c$.
e)	Let $f$ be a nonnegative $\mathcal{M}$-measurable function that is bounded by the
real number $M$ and vanishes outside of a finite interval, $[-L, L]$. Then
there is a Lebesgue measurable set $B \subset [-L, L]$ with $\lambda(B) = 0$ and a
sequence of continuous functions, $\{g_n\}_{n=1}^\infty$, such that $0 \le g_n(x) \le M$ for
$x \in \mathcal{R}$ and $n \in \mathcal{N}$, $g_n(x) = 0$ for $x \notin [-L - 1/n, L + 1/n]$ and $n \in \mathcal{N}$,
and $\lim_{n\to\infty} g_n(x) = f(x)$ for $x \in B^c$. Hint: Define
\[ E_{jk} = \Bigl\{ x \in [-L, L] : \frac{jM}{k} \le f(x) < \frac{(j+1)M}{k} \Bigr\} \]
and set $s_k = \sum_{j=0}^{k} \frac{jM}{k} \chi_{E_{jk}}$. Then $\{s_k\}_k$ is a sequence of simple functions
with $0 \le s_k \le f$ and $f - s_k \le M/k$. By part (d), there is a
sequence of continuous functions, $\{g_{nk}\}_{n=1}^\infty$, and a Lebesgue measurable
set, $C_k$, such that $\lambda(C_k) = 0$, $|g_{nk}| \le M$, and $\lim_{n\to\infty} g_{nk}(x) = s_k(x)$
for $x \notin C_k$. Furthermore, the $g_{nk}$s can be chosen so that they vanish
outside of $[-L - 1, L + 1]$. Now apply Exercises 3.62 and 3.63 to the
sequence $\{g_{nk}\}_{n=1}^\infty$.
f)	Let $f$ be a nonnegative $\mathcal{M}$-measurable function that is bounded by the
real number $M$. Then there is a Lebesgue measurable set, $B$, with
$\lambda(B) = 0$ and a sequence of continuous functions, $\{g_n\}_{n=1}^\infty$, such that
each $g_n$ vanishes outside a finite interval, $0 \le g_n(x) \le M$ for $x \in \mathcal{R}$ and
$n \in \mathcal{N}$, and $\lim_{n\to\infty} g_n(x) = f(x)$ for $x \in B^c$. Hint: Apply part (e) and
Exercise 3.62 to the function $f_n = \chi_{[-n,n]} f$.
g)	Let $f$ be a nonnegative $\mathcal{M}$-measurable function. Then there is a sequence
of nonnegative continuous functions, $\{g_n\}_{n=1}^\infty$, and a Lebesgue
measurable set, $B$, with $\lambda(B) = 0$, such that $\lim_{n\to\infty} g_n(x) = f(x)$ for
$x \in B^c$. Hint: Apply part (f) to the function $F = f/(1 + f)$.
3.65	Let $f$ be an $\mathcal{M}$-measurable function. Then there is a sequence of continuous
functions, $\{g_n\}_{n=1}^\infty$, and a Lebesgue measurable set $B$ with $\lambda(B) = 0$ such
that $\lim_{n\to\infty} g_n(x) = f(x)$ for $x \in B^c$. Hint: Use the fact that $f = f^+ - f^-$,
where $f^+ = f \vee 0$ and $f^- = -(f \wedge 0)$, and apply Exercise 3.64(g).
3.66	Lusin's theorem: The following result shows that, in a certain sense,
a measurable function is close to being a continuous function: Let $f$ be
a Lebesgue measurable function and $E$ a Lebesgue measurable set with
$\lambda(E) < \infty$. Assume that $\lambda(\{x \in E : f(x) = \pm\infty\}) = 0$. Prove that for
each $\epsilon > 0$, there is a Lebesgue measurable set $A \subset E$ with $\lambda(A) < \epsilon$ such
that $f$ is continuous on $E \setminus A$. Hint: Employ Exercises 3.65 and 3.63.
3.67	Suppose that $f$ is a nonnegative $\mathcal{M}$-measurable function and that $E \in \mathcal{M}$.
a) Let $c > 0$ and set $A_c = \{ x \in E : f(x) \ge c \}$. Prove that
\[ \lambda(A_c) \le \frac{1}{c} \int_E f\,d\lambda. \]
b) Let $A = \{ x \in E : f(x) > 0 \}$. Show that if $\int_E f\,d\lambda = 0$, then $\lambda(A) = 0$.
3.68	Suppose that $f$ is a nonnegative $\mathcal{M}$-measurable function and that $E \in \mathcal{M}$.
Let $A = \{ x \in E : f(x) = \infty \}$. Show that if $\int_E f\,d\lambda < \infty$, then $\lambda(A) = 0$.
3.6	CONVERGENCE PROPERTIES OF THE LEBESGUE
INTEGRAL FOR NONNEGATIVE FUNCTIONS
An important problem in mathematics is to determine when it is permis-
sible to interchange a limit and an integral. For example, suppose that
$\{f_n\}_{n=1}^\infty$ is a sequence of functions that converges pointwise. Under what
conditions can we conclude that
\[ \int \lim_{n\to\infty} f_n = \lim_{n\to\infty} \int f_n\,? \]
As we noted at the end of Chapter 2, one significant advantage of
the Lebesgue integral over the Riemann integral is that the interchange
of limit and integral can be justified under less restrictive conditions. In
this section and the next, we will develop theorems that provide sufficient
conditions for the interchange of those two operations.
Monotone Convergence Theorem
The first theorem that we will discuss is called the monotone convergence
theorem, or MCT for short. We begin with the following lemma.
LEMMA 3.15
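Suppose that $s$ is a nonnegative simple function and that $\{E_n\}_{n=1}^\infty$ is a
sequence of Lebesgue measurable sets with $E_1 \subset E_2 \subset \cdots$. Then,
\[ \int_{\bigcup_{n=1}^\infty E_n} s\,d\lambda = \lim_{n\to\infty} \int_{E_n} s\,d\lambda. \]
PROOF: For convenience, set $E = \bigcup_{n=1}^\infty E_n$. Since $s$ is a simple function,
we can write $s = \sum_{k=1}^m a_k \chi_{A_k}$. Then, by Proposition 3.8 on page 131, we
have for each $n \in \mathcal{N}$,
\[ \int_{E_n} s\,d\lambda = \sum_{k=1}^m a_k \lambda(A_k \cap E_n). \]
Now, for each $k = 1, 2, \ldots, m$, consider the sequence, $\{A_k \cap E_n\}_{n=1}^\infty$, of
Lebesgue measurable sets. Since $E_1 \subset E_2 \subset \cdots$ and $E = \bigcup_{n=1}^\infty E_n$, it
follows that $A_k \cap E_1 \subset A_k \cap E_2 \subset \cdots$ and $\bigcup_{n=1}^\infty (A_k \cap E_n) = A_k \cap E$.
Thus, by Theorem 3.13 on page 124, $\lim_{n\to\infty} \lambda(A_k \cap E_n) = \lambda(A_k \cap E)$, for
each $k$ ($1 \le k \le m$). Consequently,
\[ \lim_{n\to\infty} \int_{E_n} s\,d\lambda = \lim_{n\to\infty} \sum_{k=1}^m a_k \lambda(A_k \cap E_n) = \sum_{k=1}^m a_k \lim_{n\to\infty} \lambda(A_k \cap E_n)
   = \sum_{k=1}^m a_k \lambda(A_k \cap E) = \int_E s\,d\lambda. \]
This completes the proof of the lemma.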
Before we state and prove the monotone convergence theorem (MCT),
it will be useful to introduce two common conventions. First, if the integral
of a function $f$ is over all of $\mathcal{R}$, then the $\mathcal{R}$ is often omitted; in other words,
by convention, $\int f\,d\lambda = \int_{\mathcal{R}} f\,d\lambda$.
Second, we sometimes write $f_n \uparrow f$ to indicate that $\{f_n\}_{n=1}^\infty$ is a
monotone nondecreasing sequence of functions that converges pointwise to
the function $f$. And, likewise, we sometimes write $f_n \downarrow f$ to indicate that
$\{f_n\}_{n=1}^\infty$ is a monotone nonincreasing sequence of functions that converges
pointwise to the function $f$.
THEOREM 3.17 Monotone Convergence Theorem (MCT)
Suppose that $\{f_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of nonnegative
Lebesgue measurable functions that converges pointwise to a real-valued
function; in other words, for each $x \in \mathcal{R}$,
\[ 0 \le f_1(x) \le f_2(x) \le \cdots \le f_n(x) \le \cdots \]
and $\lim_{n\to\infty} f_n(x) < \infty$.† Then
\[ \int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda \]
for each $E \in \mathcal{M}$.
† Since, for each $x \in \mathcal{R}$, $\{f_n(x)\}_{n=1}^\infty$ is monotone nondecreasing, $\lim_{n\to\infty} f_n(x)$ exists
but it may be $\infty$. We assume here that the limit is finite for each $x \in \mathcal{R}$ although, as
we will learn in Chapter 4, the theorem is also true without that restriction.
PROOF: For convenience, set $f = \lim_{n\to\infty} f_n$. For each $E \in \mathcal{M}$, we have
$0 \le \chi_E f_n \uparrow \chi_E f$. Hence, it suffices to prove the theorem for $E = \mathcal{R}$.
Because $f_n \le f_{n+1}$ for all $n \in \mathcal{N}$, Proposition 3.10(a) on page 136 implies that
$\int f_n\,d\lambda \le \int f_{n+1}\,d\lambda$ for all $n \in \mathcal{N}$. Thus, $\lim_{n\to\infty} \int f_n\,d\lambda$ exists (possibly
infinite). Let $L = \lim_{n\to\infty} \int f_n\,d\lambda$.
We must show that $L = \int f\,d\lambda$. First, $f_n \le f$ for all $n \in \mathcal{N}$, so it
follows immediately that
\[ L = \lim_{n\to\infty} \int f_n\,d\lambda \le \int f\,d\lambda. \]
To establish the reverse inequality, let $0 < \alpha < 1$ and $s$ be a nonnegative
simple function dominated by $f$. Set $E_n = \{x : f_n(x) \ge \alpha s(x)\}$ for each
$n \in \mathcal{N}$. Since $f_1 \le f_2 \le \cdots$, it is clear that $E_1 \subset E_2 \subset \cdots$. Also, because
$0 < \alpha < 1$, $f_n \uparrow f$, and $0 \le s \le f$, it follows that $\bigcup_{n=1}^\infty E_n = \mathcal{R}$. Applying
Proposition 3.10(e) and Lemma 3.15, we conclude that
\[ \alpha \int_{\mathcal{R}} s\,d\lambda = \alpha \lim_{n\to\infty} \int_{E_n} s\,d\lambda = \lim_{n\to\infty} \int_{E_n} \alpha s\,d\lambda
   \le \limsup_{n\to\infty} \int_{E_n} f_n\,d\lambda \le \lim_{n\to\infty} \int_{\mathcal{R}} f_n\,d\lambda = L. \]
Consequently, $\int s\,d\lambda \le \alpha^{-1} L$ for each nonnegative simple function, $s$, that
is dominated by $f$. This implies that
\[ \int_{\mathcal{R}} f\,d\lambda = \sup_{\substack{0 \le s \le f \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda \le \alpha^{-1} L \]
for each $0 < \alpha < 1$. Letting $\alpha \uparrow 1$ yields $\int f\,d\lambda \le L$. This completes the
proof of the theorem.
Note: For a fixed $E \in \mathcal{M}$, the conclusion of the MCT remains valid if the
hypotheses are satisfied only on E. (See Exercise 3.72.)
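A small exact computation illustrates the monotone convergence theorem on a hypothetical example that is not taken from the text: let the nth function take the value 2^(-k) on the interval [k-1, k) for k = 1, ..., n and the value 0 elsewhere. These are nonnegative simple functions increasing pointwise to a real-valued limit, and by Proposition 3.8 the integral of the nth one is the sum of value times interval length. The Python sketch below (the name `integral_f_n` is illustrative) computes those integrals with exact fractions.

```python
from fractions import Fraction

def integral_f_n(n: int) -> Fraction:
    """Integral of the n-th function: sum over k = 1..n of 2**(-k) times length of [k-1, k)."""
    interval_length = 1
    return sum((Fraction(1, 2 ** k) * interval_length for k in range(1, n + 1)), Fraction(0))

integrals = [integral_f_n(n) for n in range(1, 11)]
print(integrals)
print(float(1 - integrals[-1]))   # gap to the limiting value 1
```

The integrals equal 1 - 2^(-n) and increase to 1, so the theorem gives the value 1 for the integral of the limit function, without any separate computation for the limit itself.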
Proposition 3.10 on page 136 lists several properties of the Lebesgue
integral for nonnegative functions. Conspicuous by its absence is the addi-
tivity property. By employing the MCT and Proposition 3.9 on page 134,
that important property can now be established.
PROPOSITION 3.11
Let $f$ and $g$ be nonnegative Lebesgue measurable functions. Then
\[ \int_E (f + g)\,d\lambda = \int_E f\,d\lambda + \int_E g\,d\lambda \]
for each $E \in \mathcal{M}$.
PROOF: We first observe that, by Lemma 3.14 (page 133), the additivity
property holds for simple functions. Next, we use Proposition 3.9 to select
sequences of nonnegative simple functions, $\{s_n\}_{n=1}^\infty$ and $\{t_n\}_{n=1}^\infty$, such that
$s_n \uparrow f$ and $t_n \uparrow g$. Noting that $s_n + t_n \uparrow f + g$, we can apply the MCT and
Lemma 3.14 to conclude that
\[ \int_E (f + g)\,d\lambda = \lim_{n\to\infty} \int_E (s_n + t_n)\,d\lambda
   = \lim_{n\to\infty} \int_E s_n\,d\lambda + \lim_{n\to\infty} \int_E t_n\,d\lambda = \int_E f\,d\lambda + \int_E g\,d\lambda, \]
as required.
By induction, it follows immediately from Proposition 3.11 that if
$\{f_k\}_{k=1}^n$ is a finite sequence of nonnegative $\mathcal{M}$-measurable functions, then
\[ \int_E \sum_{k=1}^n f_k\,d\lambda = \sum_{k=1}^n \int_E f_k\,d\lambda \tag{3.27} \]
for each $E \in \mathcal{M}$. However, with the aid of the MCT, we can prove the
following stronger result.
THEOREM 3.18
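Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative Lebesgue measurable
functions such that $\sum_{n=1}^\infty f_n$ converges to a real-valued function.† Then,
for each $E \in \mathcal{M}$,
\[
\int_E \sum_{n=1}^\infty f_n\,d\lambda = \sum_{n=1}^\infty \int_E f_n\,d\lambda.
\]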
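PROOF: For convenience, set $f = \sum_{n=1}^\infty f_n$ and let $g_n = \sum_{k=1}^n f_k$ for
each $n \in \mathcal{N}$. Then $\{g_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of
nonnegative Lebesgue measurable functions and $g_n \uparrow f$. Thus, by the
MCT and (3.27),
\[
\int_E f\,d\lambda = \lim_{n\to\infty} \int_E g_n\,d\lambda = \lim_{n\to\infty} \int_E \sum_{k=1}^n f_k\,d\lambda
= \lim_{n\to\infty} \sum_{k=1}^n \int_E f_k\,d\lambda = \sum_{n=1}^\infty \int_E f_n\,d\lambda,
\]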
as required.
COROLLARY 3.3
Let f be a nonnegative Lebesgue measurable function and {En}n be a
sequence of pairwise disjoint Lebesgue measurable sets. Then
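\[
\int_{\bigcup_n E_n} f\,d\lambda = \sum_n \int_{E_n} f\,d\lambda.
\]
In particular, if $A, B \in \mathcal{M}$ and $A \cap B = \emptyset$, then
\[
\int_{A \cup B} f\,d\lambda = \int_A f\,d\lambda + \int_B f\,d\lambda. \tag{3.28}
\]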
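PROOF: Because the $E_n$s are pairwise disjoint, $\chi_{\bigcup_n E_n} = \sum_n \chi_{E_n}$ and,
so, $\chi_{\bigcup_n E_n} \cdot f = \sum_n (\chi_{E_n} f)$. Therefore, by Theorem 3.18,
\[
\int_{\bigcup_n E_n} f\,d\lambda = \int_{\mathcal{R}} \chi_{\bigcup_n E_n} f\,d\lambda = \int_{\mathcal{R}} \sum_n (\chi_{E_n} f)\,d\lambda
= \sum_n \int_{\mathcal{R}} \chi_{E_n} f\,d\lambda = \sum_n \int_{E_n} f\,d\lambda.
\]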
The proof of Corollary 3.3 is now complete.	
† See the footnote on page 142.
Remark: Equation (3.28) generalizes the property of Riemann integrals
that we presented in Theorem 2.6(a) on page 84.
Further Convergence Properties
The MCT shows that it is permissible to interchange limit and integral for
monotone nondecreasing sequences of nonnegative Lebesgue measurable
functions. Two additional questions concerning integrals and sequences of
nonnegative functions come to mind.
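Question 1: Suppose that $\{f_n\}_{n=1}^\infty$ is a monotone nonincreasing sequence
of nonnegative Lebesgue measurable functions; in other words, for each
$x \in \mathcal{R}$,
\[
f_1(x) \ge f_2(x) \ge \cdots \ge f_n(x) \ge \cdots \ge 0.
\]
Is it true that the limit and integral can be interchanged, that is, does
\[
\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda \tag{3.29}
\]
hold?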
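The answer to Question 1 is no: in general, the limit and the integral
cannot be interchanged! For example, define $f_n(x) = |x|/n$ for each $x \in \mathcal{R}$
and $n \in \mathcal{N}$. Then $f_n \downarrow 0$ pointwise; thus, $\int_{\mathcal{R}} \lim_{n\to\infty} f_n\,d\lambda = 0$. But, it is
easy to see that $\int_{\mathcal{R}} f_n\,d\lambda = \infty$ for all $n \in \mathcal{N}$ and, so, $\lim_{n\to\infty} \int_{\mathcal{R}} f_n\,d\lambda = \infty$.
Consequently, (3.29) fails in this case.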
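As a rough numerical companion to this counterexample, the following Python sketch uses exact rational arithmetic. The helper names `f_n` and `truncated_integral` are illustrative and do not come from the text; the closed form $\int_{[-R,R]} |x|/n\,d\lambda(x) = R^2/n$ used below is an elementary calculus computation and is assumed here rather than derived.

```python
from fractions import Fraction

def f_n(n: int, x: Fraction) -> Fraction:
    """f_n(x) = |x| / n, the sequence from the counterexample."""
    return abs(x) / n

def truncated_integral(n: int, radius: int) -> Fraction:
    """Integral of f_n over [-radius, radius], which equals radius**2 / n."""
    return Fraction(radius * radius, n)

# Pointwise, f_n(x) shrinks toward 0 as n grows (x is held fixed).
x = Fraction(5)
print([f_n(n, x) for n in (1, 10, 100)])

# For a fixed n, the truncated integrals grow without bound as the radius
# grows, which is why the integral of f_n over the whole line is infinite.
n = 100
print([truncated_integral(n, r) for r in (10, 100, 1000)])
```

The sketch only illustrates the two competing behaviours: the values at a fixed point shrink, while the integral of each fixed $f_n$ over larger and larger intervals is unbounded.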
With an additional condition, however, we can answer Question 1 in
the affirmative. Specifically, we have the following theorem.
THEOREM 3.19
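Suppose that $\{f_n\}_{n=1}^\infty$ is a monotone nonincreasing sequence of nonnegative
Lebesgue measurable functions. Further suppose that $\int_{\mathcal{R}} f_1\,d\lambda < \infty$. Then
\[
\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda
\]
for each $E \in \mathcal{M}$.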
PROOF: For convenience, set $f = \lim_{n\to\infty} f_n$. As $\{f_n\}_{n=1}^\infty$ is monotone
nonincreasing, $\{f_1 - f_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of non-
negative Lebesgue measurable functions. Therefore, by the MCT,
\[
\int_E (f_1 - f)\,d\lambda = \lim_{n\to\infty} \int_E (f_1 - f_n)\,d\lambda.
\]
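Because $\int_{\mathcal{R}} f_1\,d\lambda < \infty$ and $f_n \le f_1$ for all $n \in \mathcal{N}$, Proposition 3.10 on
page 136 implies that $\int_E f_n\,d\lambda < \infty$ for $n \in \mathcal{N}$ and $E \in \mathcal{M}$. Also, by
Proposition 3.11,
\[
\int_E f_1\,d\lambda = \int_E \bigl((f_1 - f_n) + f_n\bigr)\,d\lambda = \int_E (f_1 - f_n)\,d\lambda + \int_E f_n\,d\lambda.
\]
Consequently, we see that $\int_E (f_1 - f_n)\,d\lambda = \int_E f_1\,d\lambda - \int_E f_n\,d\lambda$. This last
equality also holds when $f_n$ is replaced by $f$.
It now follows that
\[
\int_E f_1\,d\lambda - \int_E f\,d\lambda = \int_E f_1\,d\lambda - \lim_{n\to\infty} \int_E f_n\,d\lambda.
\]
Since all integrals in the previous equation are finite, the proof of the
theorem is now complete.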
Question 2: Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative Lebesgue
measurable functions that converges pointwise to a real-valued function.
Does a general relationship hold between the sequence, $\{\int_E f_n\,d\lambda\}_{n=1}^\infty$,
and the number, $\int_E \lim_{n\to\infty} f_n\,d\lambda$? Of course, $\lim_{n\to\infty} \int_E f_n\,d\lambda$ need not
exist and, so, (3.29) may not even make sense. The most that one can say
in general is given by the following theorem.
THEOREM 3.20 Fatou’s Lemma
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative Lebesgue measurable
functions that converges pointwise to a real-valued function. Then, for each
$E \in \mathcal{M}$,
\[
\int_E \lim_{n\to\infty} f_n\,d\lambda \le \liminf_{n\to\infty} \int_E f_n\,d\lambda. \tag{3.30}
\]
PROOF: For convenience, set $f = \lim_{n\to\infty} f_n$ and let $g_n = \inf_{k \ge n} f_k$ for
each $n \in \mathcal{N}$. Then $\{g_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of
nonnegative Lebesgue measurable functions and $g_n \uparrow f$ pointwise (why?).
Thus, by the MCT,
\[
\int_E f\,d\lambda = \lim_{n\to\infty} \int_E g_n\,d\lambda.
\]
However, since $g_n \le f_n$ for each $n \in \mathcal{N}$, it follows from Proposition 3.10(a)
that
\[
\lim_{n\to\infty} \int_E g_n\,d\lambda \le \liminf_{n\to\infty} \int_E f_n\,d\lambda.
\]
The proof of Fatou’s lemma is now complete.	
EXAMPLE 3.4 Illustrates Strict Inequality in Fatou's Lemma
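This example shows that the inequality in (3.30) cannot be replaced by an
equality. For each $n \in \mathcal{N}$, define
\[
f_n = \begin{cases} \chi_{[n,n+1]}, & n \text{ odd;} \\ \chi_{[n,n+2]}, & n \text{ even.} \end{cases}
\]
Then $f_n \to 0$ pointwise and, hence, in particular, $\int_{\mathcal{R}} \lim_{n\to\infty} f_n\,d\lambda = 0$.
But,
\[
\int_{\mathcal{R}} f_n\,d\lambda = \begin{cases} 1, & n \text{ odd;} \\ 2, & n \text{ even,} \end{cases}
\]
and, so, $\liminf_{n\to\infty} \int_{\mathcal{R}} f_n\,d\lambda = 1$. Thus, we see that
\[
\int_{\mathcal{R}} \lim_{n\to\infty} f_n\,d\lambda < \liminf_{n\to\infty} \int_{\mathcal{R}} f_n\,d\lambda.
\]
Consequently, the inequality in Fatou’s lemma cannot be replaced by an
equality.	□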
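The computation in Example 3.4 can be checked with a short Python sketch. The function names are illustrative, and a finite range of indices stands in for the tail of the sequence, so this is only a sanity check under that assumption, not a proof.

```python
def right_endpoint(n: int) -> int:
    """Right endpoint of the interval in Example 3.4: n+1 (n odd) or n+2 (n even)."""
    return n + 1 if n % 2 == 1 else n + 2

def f_n(n: int, x: float) -> int:
    """Value of f_n at x: the indicator of [n, right_endpoint(n)]."""
    return 1 if n <= x <= right_endpoint(n) else 0

def integral_f_n(n: int) -> int:
    """Integral of f_n over the real line: the length of its interval."""
    return right_endpoint(n) - n

# At a fixed point x, f_n(x) = 0 as soon as n > x, so the pointwise limit is 0.
x = 7.3  # arbitrary point chosen for illustration
tail_values = [f_n(n, x) for n in range(8, 200)]
print(max(tail_values))      # 0

# The integrals alternate between 1 and 2, so their lim inf is 1.
tail_integrals = [integral_f_n(n) for n in range(8, 200)]
print(min(tail_integrals))   # 1
```

The integral of the limit is 0 while the smallest value taken by the tail integrals is 1, matching the strict inequality in the example.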
EXERCISES 3.6
3.69 Let $f$ be a nonnegative Lebesgue measurable function. Show that
\[
\lim_{n\to\infty} \int_{[-n,n]} f\,d\lambda = \int_{\mathcal{R}} f\,d\lambda.
\]
3.70 Let $f$ be a nonnegative Lebesgue measurable function. For each $n \in \mathcal{N}$,
define $f_n = f \wedge n$. Prove that $\lim_{n\to\infty} \int_E f_n\,d\lambda = \int_E f\,d\lambda$ for each $E \in \mathcal{M}$.
*3.71 Prove that Lemma 3.15 holds for all nonnegative $\mathcal{M}$-measurable functions.
That is, if $f$ is a nonnegative Lebesgue measurable function and $\{E_n\}_{n=1}^\infty$
is a sequence of Lebesgue measurable sets with $E_1 \subset E_2 \subset \cdots$, then
\[
\int_{\bigcup_{n=1}^\infty E_n} f\,d\lambda = \lim_{n\to\infty} \int_{E_n} f\,d\lambda.
\]
3.72	Show that for a fixed $E \in \mathcal{M}$, the conclusion of the MCT remains valid if
the hypotheses are satisfied only on $E$. In other words, let $E$ be a Lebesgue
measurable set and $\{f_n\}_{n=1}^\infty$ a sequence of nonnegative Lebesgue measurable
functions that is monotone nondecreasing on $E$ and converges pointwise to
a real-valued function on $E$; that is, for each $x \in E$,
\[
0 \le f_1(x) \le f_2(x) \le \cdots \le f_n(x) \le \cdots
\]
and $\lim_{n\to\infty} f_n(x) < \infty$. Prove that
\[
\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda.
\]
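3.73	Provide an example where strict inequality holds in Fatou’s lemma and
where $\lim_{n\to\infty} \int_{\mathcal{R}} f_n\,d\lambda$ exists.
3.74	Let $\{f_n\}_{n=1}^\infty$ be a sequence of nonnegative $\mathcal{M}$-measurable functions such
that $f_n \to f$ pointwise and $\int_{\mathcal{R}} f_n\,d\lambda \to \int_{\mathcal{R}} f\,d\lambda < \infty$. Show that for each
$E \in \mathcal{M}$, $\int_E f_n\,d\lambda \to \int_E f\,d\lambda$. Hint: Use Fatou’s lemma and the inequality
$\limsup_{n\to\infty} (a_n + b_n) \ge \limsup_{n\to\infty} a_n + \liminf_{n\to\infty} b_n$.
3.75	Suppose $f$ is a nonnegative $\mathcal{M}$-measurable function and $\{E_n\}_{n=1}^\infty \subset \mathcal{M}$
with $E_1 \supset E_2 \supset \cdots$. Further suppose $\int_{E_1} f\,d\lambda < \infty$. Prove that
\[
\int_{\bigcap_{n=1}^\infty E_n} f\,d\lambda = \lim_{n\to\infty} \int_{E_n} f\,d\lambda.
\]
Hint: Apply Theorem 3.19.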
3.76	Supply a proof for the following improved version of Fatou’s lemma: Suppose
that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative Lebesgue measurable functions.
Then
\[
\int_E \liminf_{n\to\infty} f_n\,d\lambda \le \liminf_{n\to\infty} \int_E f_n\,d\lambda
\]
for each $E \in \mathcal{M}$.
*3.77 Suppose $f$ is a nonnegative $\mathcal{M}$-measurable function with $\int_{[0,\infty)} f\,d\lambda < \infty$.
Then we define the Laplace transform of $f$, denoted $F$, by
\[
F(t) = \int_{[0,\infty)} e^{-tx} f(x)\,d\lambda(x), \qquad t \ge 0.
\]
Show that
a)	$F$ is real valued.
b)	$F$ is continuous on $[0, \infty)$. Hint: First establish that $F$ is nonincreasing.
c)	$\lim_{t\to\infty} F(t) = 0$.
3.78	Establish the following results:
a)	If $O$ is an open set, then
\[
\lambda(O) = \sup\Bigl\{ \int_{\mathcal{R}} f\,d\lambda : 0 \le f \le \chi_O \text{ and } f \text{ continuous} \Bigr\}.
\]
Hint: Consider $f_n(x) = \bigl(d(x, O^c)/[1 + d(x, O^c)]\bigr)^{1/n}$, for $n \in \mathcal{N}$.
b)	If $F$ is a closed set, then
\[
\lambda(F) = \inf\Bigl\{ \int_{\mathcal{R}} f\,d\lambda : f \ge \chi_F \text{ and } f \text{ continuous} \Bigr\}.
\]
Hint: If $\lambda(F) < \infty$, select an appropriate open set, $O \supset F$, and consider
$f_n(x) = \bigl(d(x, O^c)/[d(x, O^c) + d(x, F)]\bigr)^n$, for $n \in \mathcal{N}$.
3.79	In the next section, we will see how to define the Lebesgue integral for
Lebesgue measurable functions that are not necessarily nonnegative. As-
suming that can be done, construct a sequence of Lebesgue measurable
functions for which the conclusion of Fatou’s lemma fails. Hint: A se-
quence consisting of characteristic functions and negatives of characteristic
functions will do the trick.
3.7	THE GENERAL LEBESGUE INTEGRAL
Up to this point, we have defined the Lebesgue integral only for nonnegative
Lebesgue measurable functions. In this section, we will define the Lebesgue
integral for arbitrary Lebesgue measurable functions and present some of
its most important properties.
Definition of the General Lebesgue Integral
Basically, the Lebesgue integral of an arbitrary $\mathcal{M}$-measurable function, $f$,
is obtained as follows: (1) express f as the difference of two nonnegative
functions and (2) define the Lebesgue integral of f to be the difference of
the Lebesgue integrals of the two nonnegative functions. To make this idea
precise, we begin by defining the positive and negative parts of a function.
DEFINITION 3.15 Positive and Negative Parts of a Function
Suppose that $f$ is a real-valued function. Then the positive part
of $f$, denoted by $f^+$, is defined by
\[
f^+ = f \vee 0 = \max\{f, 0\}
\]
and the negative part of $f$, denoted by $f^-$, is defined by
\[
f^- = -(f \wedge 0) = -\min\{f, 0\}.
\]
Note that both $f^+$ and $f^-$ are nonnegative functions. Proposition 3.12
states some other basic properties of the positive and negative parts of a
function. The proof of the proposition is left as an exercise for the reader.
PROPOSITION 3.12
Suppose that $f$ is a real-valued function on $\mathcal{R}$. Then
a)	$f = f^+ - f^-$.
b)	$|f| = f^+ + f^-$.
c)	If $f$ is Lebesgue measurable, then so are $f^+$ and $f^-$.
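The identities relating a function to its positive and negative parts are pointwise statements about real numbers, so they are easy to spot-check. The following Python sketch does so at a few sample points; the names `pos_part` and `neg_part` and the sample function are illustrative choices, not notation from the text.

```python
def pos_part(y: float) -> float:
    """Return y+ = max(y, 0)."""
    return max(y, 0.0)

def neg_part(y: float) -> float:
    """Return y- = -min(y, 0); note that this is nonnegative."""
    return -min(y, 0.0)

def f(x: float) -> float:
    # Hypothetical sample function, chosen only for illustration.
    return x * x - 4.0

for x in (-3.0, -2.0, -0.5, 0.0, 1.0, 2.0, 3.5):
    y = f(x)
    assert pos_part(y) >= 0 and neg_part(y) >= 0
    assert pos_part(y) - neg_part(y) == y        # f = f+ - f-
    assert pos_part(y) + neg_part(y) == abs(y)   # |f| = f+ + f-
```

At each point at most one of the two parts is nonzero, which is why both identities hold exactly even in floating-point arithmetic.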
We now see that if f is a Lebesgue measurable function, then it can
be expressed as the difference of two nonnegative Lebesgue measurable
functions; namely, $f = f^+ - f^-$. Consequently, it is quite natural to define
the Lebesgue integral of an arbitrary Lebesgue measurable function in the
following way:
DEFINITION 3.16 Lebesgue Integral; Lebesgue Integrable
Let $f$ be a Lebesgue measurable function and $E \in \mathcal{M}$. Then the
Lebesgue integral of $f$ over $E$ is defined by
\[
\int_E f(x)\,d\lambda(x) = \int_E f^+(x)\,d\lambda(x) - \int_E f^-(x)\,d\lambda(x) \tag{3.31}
\]
provided that the right-hand side makes sense; that is, at least one of
the integrals on the right-hand side of (3.31) is finite. In addition, we
say that $f$ is Lebesgue integrable over $E$ if both integrals on the
right-hand side of (3.31) are finite or, equivalently, if
\[
\int_E |f(x)|\,d\lambda(x) = \int_E f^+(x)\,d\lambda(x) + \int_E f^-(x)\,d\lambda(x) < \infty. \tag{3.32}
\]
If $f$ is Lebesgue integrable over $\mathcal{R}$, then we say that $f$ is Lebesgue
integrable.
EXAMPLE 3.5 Illustrates Definition 3.16
a) Let
\[
f(x) = \begin{cases} 1, & x \ge 0; \\ -1, & x < 0. \end{cases}
\]
Then $\int_{\mathcal{R}} f\,d\lambda$ is not defined. Indeed, $f^+ = \chi_{[0,\infty)}$ and $f^- = \chi_{(-\infty,0)}$, so
that both $\int_{\mathcal{R}} f^+\,d\lambda$ and $\int_{\mathcal{R}} f^-\,d\lambda$ are infinite. However, the Lebesgue
integral of $f$ is defined (and, in fact, $f$ is Lebesgue integrable) over
any Lebesgue measurable set with finite measure. For instance, if
$E = [-3, 4]$, then $\int_E f^+\,d\lambda = 4$ and $\int_E f^-\,d\lambda = 3$, so that we have
$\int_E f\,d\lambda = 4 - 3 = 1$ and $\int_E |f|\,d\lambda = 4 + 3 = 7 < \infty$.
b) We can generalize part (a): If $f$ is a bounded Lebesgue measurable
function, then $f$ is Lebesgue integrable over any measurable set, $E$,
with $\lambda(E) < \infty$. For, if $|f| \le L$, then by Proposition 3.10(a),
\[
\int_E |f|\,d\lambda \le \int_E L\,d\lambda = \int_{\mathcal{R}} L\chi_E\,d\lambda = L\lambda(E) < \infty.
\]
c) Let
\[
f(x) = \begin{cases} 2, & 0 < x < 1; \\ -3, & x \ge 1; \\ 0, & \text{elsewhere.} \end{cases}
\]
Then $f^+ = 2\chi_{(0,1)}$ and $f^- = 3\chi_{[1,\infty)}$ and, so, $\int_{\mathcal{R}} f^+\,d\lambda = 2$ and
$\int_{\mathcal{R}} f^-\,d\lambda = \infty$. This implies that $f$ is not Lebesgue integrable over $\mathcal{R}$ al-
though the Lebesgue integral is defined and $\int_{\mathcal{R}} f\,d\lambda = 2 - \infty = -\infty$. □
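The arithmetic in parts (a) and (c) of this example reduces to lengths of intervals, which the following Python sketch reproduces. It is only a bookkeeping aid: the helper `overlap_length` is an illustrative name, intervals are treated as closed (endpoints have measure zero, so this does not change any integral), and `math.inf` stands in for an infinite endpoint.

```python
import math

def overlap_length(a: float, b: float, c: float, d: float) -> float:
    """Length of the intersection of the intervals [a, b] and [c, d]."""
    return max(0.0, min(b, d) - max(a, c))

# Part (a): f = 1 on [0, inf) and -1 on (-inf, 0), integrated over E = [-3, 4].
pos_a = 1 * overlap_length(-3, 4, 0, math.inf)     # integral of f+ over E
neg_a = 1 * overlap_length(-3, 4, -math.inf, 0)    # integral of f- over E
print(pos_a - neg_a, pos_a + neg_a)                # integral of f and of |f|

# Part (c): f+ = 2 on (0, 1) and f- = 3 on [1, inf), integrated over the line.
pos_c = 2 * overlap_length(-math.inf, math.inf, 0, 1)
neg_c = 3 * overlap_length(-math.inf, math.inf, 1, math.inf)
print(pos_c, neg_c, pos_c - neg_c)
```

In part (c) only the negative part has an infinite integral, so the difference is a well-defined extended real number, as Definition 3.16 requires.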
Properties of the General Lebesgue Integral
The next theorem provides some important properties of Lebesgue integrable
functions. In proving this theorem, we will employ the following
lemma whose proof we leave as an exercise for the reader.
LEMMA 3.16
Suppose that $f$ is a Lebesgue measurable function and that $E \in \mathcal{M}$. Fur-
ther suppose that $f = f_1 - f_2$, where $f_1$ and $f_2$ are nonnegative and
Lebesgue integrable over $E$. Then $f$ is Lebesgue integrable over $E$ and
\[
\int_E f\,d\lambda = \int_E f_1\,d\lambda - \int_E f_2\,d\lambda.
\]
THEOREM 3.21
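Suppose that $f$ and $g$ are Lebesgue integrable over $E \in \mathcal{M}$ and that $\alpha \in \mathcal{R}$.
Then
a)	$f + g$ is Lebesgue integrable over $E$ and
\[
\int_E (f + g)\,d\lambda = \int_E f\,d\lambda + \int_E g\,d\lambda.
\]
b)	$\alpha f$ is Lebesgue integrable over $E$ and
\[
\int_E \alpha f\,d\lambda = \alpha \int_E f\,d\lambda.
\]
c)	$f \le g \Rightarrow \int_E f\,d\lambda \le \int_E g\,d\lambda$.
d)	$\bigl|\int_E f\,d\lambda\bigr| \le \int_E |f|\,d\lambda$.
e)	If $A$ and $B$ are measurable subsets of $E$ with $A \cap B = \emptyset$, then
\[
\int_{A \cup B} f\,d\lambda = \int_A f\,d\lambda + \int_B f\,d\lambda.
\]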
PROOF:
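a)	Since $|f + g| \le |f| + |g|$, it follows from Proposition 3.10(a) on page 136
and Proposition 3.11 on page 143 that $f + g$ is Lebesgue integrable
over $E$. Now, we have $f + g = (f^+ + g^+) - (f^- + g^-)$. Hence, by
Lemma 3.16 and Proposition 3.11, we conclude that
\[
\begin{aligned}
\int_E (f + g)\,d\lambda &= \int_E (f^+ + g^+)\,d\lambda - \int_E (f^- + g^-)\,d\lambda \\
&= \int_E f^+\,d\lambda + \int_E g^+\,d\lambda - \int_E f^-\,d\lambda - \int_E g^-\,d\lambda \\
&= \int_E f^+\,d\lambda - \int_E f^-\,d\lambda + \int_E g^+\,d\lambda - \int_E g^-\,d\lambda \\
&= \int_E f\,d\lambda + \int_E g\,d\lambda.
\end{aligned}
\]
b)	Since $|\alpha f| = |\alpha||f|$, Proposition 3.10(e) implies that $\alpha f$ is Lebesgue
integrable over $E$. If $\alpha \ge 0$, then $(\alpha f)^+ = \alpha f^+$ and $(\alpha f)^- = \alpha f^-$.
Thus, by Proposition 3.10(e) again,
\[
\int_E \alpha f\,d\lambda = \int_E \alpha f^+\,d\lambda - \int_E \alpha f^-\,d\lambda
= \alpha \int_E f^+\,d\lambda - \alpha \int_E f^-\,d\lambda = \alpha \int_E f\,d\lambda.
\]
If $\alpha < 0$, then $(\alpha f)^+ = -\alpha f^-$ and $(\alpha f)^- = -\alpha f^+$. Consequently, by
Proposition 3.10(e),
\[
\int_E \alpha f\,d\lambda = \int_E (-\alpha f^-)\,d\lambda - \int_E (-\alpha f^+)\,d\lambda
= \alpha \Bigl( \int_E f^+\,d\lambda - \int_E f^-\,d\lambda \Bigr) = \alpha \int_E f\,d\lambda.
\]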
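c)	$f \le g \Rightarrow g - f \ge 0 \Rightarrow \int_E (g - f)\,d\lambda \ge 0$ by Proposition 3.10(a). Now
applying parts (a) and (b), we deduce that
\[
\int_E g\,d\lambda - \int_E f\,d\lambda = \int_E (g - f)\,d\lambda \ge 0.
\]
In other words, $\int_E f\,d\lambda \le \int_E g\,d\lambda$.
d)	Because $f \le |f|$ and $-f \le |f|$, we can use parts (b) and (c) to conclude
that
\[
\int_E f\,d\lambda \le \int_E |f|\,d\lambda \quad\text{and}\quad -\int_E f\,d\lambda \le \int_E |f|\,d\lambda.
\]
These last two relations imply that $\bigl|\int_E f\,d\lambda\bigr| \le \int_E |f|\,d\lambda$.
e)	Since $A \subset E$, $|\chi_A f| \le |\chi_E f|$ and, so, $\int_A |f|\,d\lambda \le \int_E |f|\,d\lambda$. Thus, $f$ is
integrable over $A$ or, equivalently, $\chi_A f$ is integrable over $E$. Similarly,
$\chi_B f$ is integrable over $E$. However, because $A \cap B = \emptyset$, we have that
$\chi_{A \cup B} f = \chi_A f + \chi_B f$. Therefore, by part (a),
\[
\int_E \chi_{A \cup B} f\,d\lambda = \int_E \chi_A f\,d\lambda + \int_E \chi_B f\,d\lambda.
\]
Since $A$ and $B$ are subsets of $E$, the previous equation is equivalent to
\[
\int_{A \cup B} f\,d\lambda = \int_A f\,d\lambda + \int_B f\,d\lambda.
\]
This completes the proof of the theorem.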
Remark: Parts (a) and (b) of Theorem 3.21 together imply that if $\alpha, \beta \in \mathcal{R}$
and $f$ and $g$ are Lebesgue integrable over $E$, then
\[
\int_E (\alpha f + \beta g)\,d\lambda = \alpha \int_E f\,d\lambda + \beta \int_E g\,d\lambda.
\]
This is called the linearity property of the Lebesgue integral.
The next theorem, called the dominated convergence theorem, or DCT
for short, is one of the most important theorems in analysis. Like the mono-
tone convergence theorem, it gives sufficient conditions for the interchange
of limit and integral.
THEOREM 3.22 Dominated Convergence Theorem (DCT)
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions that
converges pointwise to a real-valued function. Further suppose that there
is a nonnegative Lebesgue integrable function, $g$, such that $|f_n| \le g$ for all
$n \in \mathcal{N}$. Then
\[
\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda
\]
for each $E \in \mathcal{M}$.
PROOF: For convenience, set $f = \lim_{n\to\infty} f_n$. Because $|f_n| \le g$ and $g$ is
Lebesgue integrable, it follows that $f, f_1, f_2, \ldots$ are Lebesgue integrable.
Now, $g - f_n \ge 0$ for all $n \in \mathcal{N}$ and $g - f_n \to g - f$ pointwise. Thus, by
Fatou’s lemma (page 146) and the linearity of the integral,
\[
\int_E g\,d\lambda - \int_E f\,d\lambda = \int_E (g - f)\,d\lambda \le \liminf_{n\to\infty} \int_E (g - f_n)\,d\lambda
= \int_E g\,d\lambda - \limsup_{n\to\infty} \int_E f_n\,d\lambda.
\]
Since the previous integrals are all finite, we conclude that
\[
\limsup_{n\to\infty} \int_E f_n\,d\lambda \le \int_E f\,d\lambda. \tag{3.33}
\]
On the other hand, we also have that $g + f_n \ge 0$ for all $n \in \mathcal{N}$ and
$g + f_n \to g + f$ pointwise. Applying Fatou’s lemma again, we obtain the
relations
\[
\int_E g\,d\lambda + \int_E f\,d\lambda = \int_E (g + f)\,d\lambda
\le \liminf_{n\to\infty} \int_E (g + f_n)\,d\lambda = \int_E g\,d\lambda + \liminf_{n\to\infty} \int_E f_n\,d\lambda
\]
or, in other words,
\[
\int_E f\,d\lambda \le \liminf_{n\to\infty} \int_E f_n\,d\lambda. \tag{3.34}
\]
From (3.33) and (3.34), we see that
\[
\limsup_{n\to\infty} \int_E f_n\,d\lambda = \liminf_{n\to\infty} \int_E f_n\,d\lambda = \int_E f\,d\lambda.
\]
This last fact implies that $\lim_{n\to\infty} \int_E f_n\,d\lambda$ exists and
\[
\int_E f\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda,
\]
as required.
Note: For a fixed $E \in \mathcal{M}$, the conclusion of the DCT remains valid if the
hypotheses are satisfied only on $E$. (See Exercise 3.87.)
EXAMPLE 3.6 Illustrates the DCT
a)	In general, the conclusion of the DCT may fail if there is no dominating
integrable function, $g$. For instance, let $f_n = n\chi_{(0,1/n)}$. Then $f_n \to 0$
pointwise. Moreover, $\int_{\mathcal{R}} f_n\,d\lambda = 1$ for all $n \in \mathcal{N}$. Thus,
\[
\int_{\mathcal{R}} \lim_{n\to\infty} f_n\,d\lambda = 0 \ne 1 = \lim_{n\to\infty} \int_{\mathcal{R}} f_n\,d\lambda.
\]
The problem here is that there is no integrable function that dominates
the sequence $\{f_n\}_{n=1}^\infty$.
b)	Let $f_n(x) = x^n \chi_{[0,1]}(x)$ for $x \in \mathcal{R}$, $n \in \mathcal{N}$. Then $f_n \to \chi_{\{1\}}$ pointwise.
Now, $|f_n| \le \chi_{[0,1]}$ for all $n \in \mathcal{N}$ and, clearly, $\chi_{[0,1]}$ is Lebesgue integrable.
Thus, by the DCT,
\[
\lim_{n\to\infty} \int_{[0,1]} x^n\,d\lambda(x) = \int_{[0,1]} \chi_{\{1\}}(x)\,d\lambda(x) = \lambda(\{1\}) = 0.
\]
Note: Theorem 3.23 on page 157 provides a simpler way to obtain this
result.	□
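Both parts of Example 3.6 can be tabulated exactly with rational arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, and the closed forms used (height times width for the spike in part (a), and $1/(n+1)$ for the integral of $x^n$ over $[0,1]$ in part (b)) are standard calculus facts assumed here rather than derived.

```python
from fractions import Fraction

def spike(n: int, x: Fraction) -> int:
    """Part (a): f_n(x) = n on the open interval (0, 1/n), and 0 elsewhere."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_spike(n: int) -> Fraction:
    """Integral of the spike: height n times width 1/n."""
    return n * Fraction(1, n)

def integral_power(n: int) -> Fraction:
    """Part (b): integral of x**n over [0, 1], which equals 1/(n + 1)."""
    return Fraction(1, n + 1)

x = Fraction(1, 100)  # a fixed point, chosen only for illustration
print([spike(n, x) for n in (10, 50, 100, 1000)])     # eventually 0 at this x
print([integral_spike(n) for n in (10, 50, 100)])     # constant, no domination
print([integral_power(n) for n in (1, 9, 99, 999)])   # dominated case, tends to 0
```

In part (a) the integrals stay constant while the functions vanish pointwise; in part (b) the dominating function $\chi_{[0,1]}$ is available and the integrals shrink toward the integral of the limit.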
There are many corollaries of the DCT. Two of the most important
are stated in what follows. The proofs of these two corollaries are left as
exercises for the reader.
COROLLARY 3.4
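Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions such
that $\sum_{n=1}^\infty |f_n|$ converges to a Lebesgue integrable function. Then $\sum_{n=1}^\infty f_n$
is Lebesgue integrable and
\[
\int_E \sum_{n=1}^\infty f_n\,d\lambda = \sum_{n=1}^\infty \int_E f_n\,d\lambda
\]
for each $E \in \mathcal{M}$.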
COROLLARY 3.5
Let $f$ be a Lebesgue integrable function and $\{E_n\}_n$ a sequence of pairwise
disjoint Lebesgue measurable sets. Then
\[
\int_{\bigcup_n E_n} f\,d\lambda = \sum_n \int_{E_n} f\,d\lambda.
\]
The Lebesgue Integral is an Extension
of the Riemann Integral
We will now establish that the Lebesgue integral is indeed an extension
of the Riemann integral. In other words, we will show that a Riemann
integrable function is also Lebesgue integrable and that the two integrals
are equal. First, we need the following lemma.
LEMMA 3.17
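Let $f$ be a bounded Lebesgue measurable function on $[a, b]$. Then $f$ is
Lebesgue integrable over $[a, b]$ and, moreover,
\[
\int_{[a,b]} f(x)\,d\lambda(x) = \sup_{\substack{s \le f \\ s\ \text{simple}}} \int_{[a,b]} s(x)\,d\lambda(x) \tag{3.35}
\]
and
\[
\int_{[a,b]} f(x)\,d\lambda(x) = \inf_{\substack{t \ge f \\ t\ \text{simple}}} \int_{[a,b]} t(x)\,d\lambda(x). \tag{3.36}
\]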
PROOF: Example 3.5(b) on page 151 shows that $f$ is Lebesgue integrable
over $[a, b]$. We will prove (3.35). The proof of (3.36) is similar and is left
as an exercise.
First note that if $s$ is a simple function with $s \le f$, then, by Theo-
rem 3.21(c) on page 152, $\int_{[a,b]} s\,d\lambda \le \int_{[a,b]} f\,d\lambda$. Consequently,
\[
\sup_{\substack{s \le f \\ s\ \text{simple}}} \int_{[a,b]} s(x)\,d\lambda(x) \le \int_{[a,b]} f(x)\,d\lambda(x).
\]
It remains to prove the reverse inequality. To accomplish that, we will
construct a sequence $\{s_n\}_{n=1}^\infty$ of simple functions with $s_n \le f$ for all $n \in \mathcal{N}$
and
\[
\int_{[a,b]} f\,d\lambda = \lim_{n\to\infty} \int_{[a,b]} s_n\,d\lambda. \tag{3.37}
\]
Set $L = \sup\{\, |f(x)| : x \in [a, b] \,\}$. Then $f + L$ is nonnegative and Lebesgue
measurable on $[a, b]$. Applying Proposition 3.9 (page 134), we obtain a
sequence $\{u_n\}_{n=1}^\infty$ of nonnegative simple functions such that $u_n \uparrow f + L$.
Setting $s_n = u_n - L$, we see that $\{s_n\}_{n=1}^\infty$ is a sequence of simple functions
such that $s_n \le f$ for all $n \in \mathcal{N}$ and $s_n \to f$ pointwise on $[a, b]$. Furthermore,
because $|s_n| \le L$ on $[a, b]$, the DCT implies that (3.37) holds.
THEOREM 3.23
Suppose that $f$ is Riemann integrable on $[a, b]$. Then $f$ is Lebesgue inte-
grable on $[a, b]$ and
\[
\int_{[a,b]} f(x)\,d\lambda(x) = \int_a^b f(x)\,dx.
\]
PROOF: To begin, we extend the domain of $f$ to all of $\mathcal{R}$ by defining
$f(x) = 0$ for $x \in [a, b]^c$. Now, since $f$ is Riemann integrable on $[a, b]$, it is
bounded thereon. So, to prove that $f$ is Lebesgue integrable on $[a, b]$, it
suffices to show that $f$ is Lebesgue measurable (why?).
Let $O$ be an open subset of $\mathcal{R}$. We must verify that $f^{-1}(O) \in \mathcal{M}$. Set
$E = \{\, x \in \mathcal{R} : f \text{ is discontinuous at } x \,\}$. We have
\[
f^{-1}(O) = \bigl(f^{-1}(O) \cap E\bigr) \cup \bigl(f^{-1}(O) \cap E^c\bigr). \tag{3.38}
\]
Clearly, $E \subset [a, b]$ and, consequently, by Theorem 2.7 on page 86, $\lambda(E) = 0$.
But, every subset of a set with Lebesgue measure zero is Lebesgue measur-
able (Exercise 3.32). Hence, the first intersection on the right of (3.38) is
a Lebesgue measurable set.
Next we show that the second intersection on the right of (3.38) is
a Lebesgue measurable set. To begin, note that $f^{-1}(O) \cap E^c = f|_{E^c}^{-1}(O)$.
Now, by the definition of $E$, the function $f|_{E^c}$ is continuous. Therefore,
by Theorem 2.5 on page 66, $f|_{E^c}^{-1}(O)$ is an open subset of $E^c$. In view
of Theorem 2.3 on page 62, there is an open subset $U$ of $\mathcal{R}$ such that
$f|_{E^c}^{-1}(O) = U \cap E^c$. Both sets in this last intersection are Lebesgue measur-
able (why?). Thus, $f^{-1}(O) \cap E^c \in \mathcal{M}$.
To complete the proof of the theorem, we must prove that the Riemann
and Lebesgue integrals of $f$ over $[a, b]$ are equal. First recall that every step
function is a simple function and that the Riemann and Lebesgue integrals
agree for step functions (because the Lebesgue measure of an interval is
the length of the interval). Applying Lemma 3.17 and the definition of the
Riemann integral, we now obtain that
\[
\begin{aligned}
\int_a^b f(x)\,dx &= \sup_{\substack{g \le f \\ g\ \text{step function}}} \int_a^b g(x)\,dx
\le \sup_{\substack{s \le f \\ s\ \text{simple}}} \int_{[a,b]} s(x)\,d\lambda(x) \\
&= \int_{[a,b]} f(x)\,d\lambda(x) = \inf_{\substack{t \ge f \\ t\ \text{simple}}} \int_{[a,b]} t(x)\,d\lambda(x) \\
&\le \inf_{\substack{h \ge f \\ h\ \text{step function}}} \int_a^b h(x)\,dx = \int_a^b f(x)\,dx.
\end{aligned}
\]
These relations imply that
\[
\int_a^b f(x)\,dx = \int_{[a,b]} f(x)\,d\lambda(x),
\]
as required.
We have now verified that the Lebesgue integral is indeed a general-
ization of the Riemann integral. Consequently, we will frequently denote
the Lebesgue integral of $f$ over $[a, b]$ by
\[
\int_a^b f(x)\,dx
\]
regardless of whether $f$ is Riemann integrable over $[a, b]$. In other words,
the notation for the Riemann integral is also used for the Lebesgue integral.
Moreover, as previously mentioned, we will often write $\int_E f(x)\,dx$ instead
of $\int_E f(x)\,d\lambda(x)$.
EXAMPLE 3.7 Illustrates the Lebesgue and Riemann Integrals
a)	By Theorem 3.23,
\[
\int_{[a,b]} x^n\,d\lambda(x) = \int_a^b x^n\,dx = \frac{b^{n+1} - a^{n+1}}{n + 1}.
\]
b)	Clearly, $\chi_Q$ is Lebesgue integrable over $[0, 1]$. However, it is not Rie-
mann integrable on $[0, 1]$ because it is discontinuous everywhere.
c)	Define $f(x) = 1/\sqrt{x}$, for $0 < x < 1$, and zero otherwise. Note that $f$ has
only two discontinuities and, hence, the set of points of discontinuity of $f$
has measure zero. But, $f$ is not Riemann integrable on $[0, 1]$ because it
is not bounded. It is, however, Lebesgue integrable on $[0, 1]$, as we will
now show. For each $n \in \mathcal{N}$, let $f_n = \chi_{[1/n,1]} f$. Then $f_n$ is Riemann
integrable on $[0, 1]$ and, so, by Theorem 3.23,
\[
\int_{[0,1]} f_n(x)\,d\lambda(x) = \int_0^1 f_n(x)\,dx = \int_{1/n}^1 \frac{dx}{\sqrt{x}} = 2 - \frac{2}{\sqrt{n}}.
\]
Now, $\{f_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of nonnegative
Lebesgue measurable functions and $f_n \to f$ pointwise. Applying the
MCT, we conclude that
\[
\int_{[0,1]} f(x)\,d\lambda(x) = \lim_{n\to\infty} \int_{[0,1]} f_n(x)\,d\lambda(x) = 2 < \infty.
\]
Hence, $f$ is Lebesgue integrable over $[0, 1]$.
□
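The truncation argument in part (c) can be followed numerically. In the Python sketch below, `closed_form` evaluates $2 - 2/\sqrt{n}$, and `midpoint_sum` is a crude midpoint-rule approximation included only as a cross-check; both names, and the choice of 10,000 subintervals, are assumptions made for this illustration.

```python
import math

def closed_form(n: int) -> float:
    """Integral of 1/sqrt(x) over [1/n, 1], namely 2 - 2/sqrt(n)."""
    return 2.0 - 2.0 / math.sqrt(n)

def midpoint_sum(n: int, pieces: int = 10_000) -> float:
    """Midpoint-rule approximation of the same integral (numerical cross-check)."""
    left = 1.0 / n
    width = (1.0 - left) / pieces
    return sum(width / math.sqrt(left + (k + 0.5) * width) for k in range(pieces))

# The truncated integrals increase with n and stay below 2, as the MCT predicts.
for n in (1, 4, 100, 10_000):
    print(n, closed_form(n))

# Cross-check the closed form against the midpoint rule for one value of n.
print(abs(midpoint_sum(4) - closed_form(4)))
```

The printed values approach 2 from below, in line with the value of the Lebesgue integral obtained in the example.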
EXERCISES 3.7
3.80	Prove Proposition 3.12 on page 150.
3.81	Determine the positive and negative parts of the following functions:
a) $\sin x$. b) $x^2 - 4$. c) $|x|$.
3.82	Prove Lemma 3.16 on page 151.
3.83	Show that if $f$ is Lebesgue integrable (over $\mathcal{R}$), then it is Lebesgue integrable
over $E$ for each $E \in \mathcal{M}$.
3.84	Prove Corollary 3.4 on page 155.
3.85	Prove Corollary 3.5 on page 156.
3.86	Suppose that $f$ is Lebesgue integrable over $E$ and that $\{E_n\}_{n=1}^\infty$ is a se-
quence of Lebesgue measurable sets with $E_1 \subset E_2 \subset \cdots$ and $\bigcup_{n=1}^\infty E_n = E$.
Prove that
\[
\int_E f\,d\lambda = \lim_{n\to\infty} \int_{E_n} f\,d\lambda.
\]
3.87	Let $E$ be a Lebesgue measurable set and $\{f_n\}_{n=1}^\infty$ a sequence of Lebesgue
measurable functions that converges pointwise on $E$ to a real-valued func-
tion. Suppose that $g$ is Lebesgue integrable over $E$ and that $|f_n(x)| \le g(x)$
for $n \in \mathcal{N}$, $x \in E$. Prove that
\[
\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda.
\]
3.88	Bounded convergence theorem (BCT): Let $E \in \mathcal{M}$ with $\lambda(E) < \infty$
and $\{f_n\}_{n=1}^\infty$ a sequence of Lebesgue measurable functions that converges
pointwise on $E$ to a real-valued function. Further suppose that there is an
$M \in \mathcal{R}$ such that $|f_n(x)| \le M$ for $n \in \mathcal{N}$, $x \in E$. Show that
\[
\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda.
\]
3.89	Construct an example where $|f_n| \le M$ for all $n \in \mathcal{N}$, $f_n \to f$ pointwise but
$\int_{\mathcal{R}} f_n\,d\lambda \not\to \int_{\mathcal{R}} f\,d\lambda$. Why doesn’t this contradict the BCT?
3.90	Complete the proof of Lemma 3.17 on page 156 by establishing (3.36).
*3.91 Theorem 3.23 on page 157 shows that every Riemann integrable function is
also a Lebesgue integrable function. This refers only to the proper Riemann
integral. In this exercise, we will exhibit a function that has an improper
Riemann integral but is not Lebesgue integrable. Let
\[
f(x) = \begin{cases} \dfrac{\sin x}{x}, & x \ne 0; \\ 1, & x = 0. \end{cases}
\]
Show that
a)	$f$ has an improper Riemann integral over $\mathcal{R}$ equal to $\pi$.
b)	$f$ is Lebesgue measurable.
c)	$f$ is not Lebesgue integrable over $\mathcal{R}$.
3.92	Show that, if $f$ is Lebesgue integrable (over $\mathcal{R}$) and the improper Riemann
integral exists, then $\int_{\mathcal{R}} f(x)\,d\lambda(x) = \int_{-\infty}^{\infty} f(x)\,dx$.
3.93	Prove that the results of Exercise 3.77 on page 148 remain valid if f is
Lebesgue integrable over [0, oo).
3.94	Let $\{f_n\}_{n=1}^\infty$ and $\{g_n\}_{n=1}^\infty$ be sequences of Lebesgue measurable functions
and $E \in \mathcal{M}$. Suppose that on $E$, $|f_n| \le g_n$, $f_n \to f$, and $g_n \to g$. Fur-
ther suppose that $g, g_1, g_2, \ldots$ are Lebesgue integrable over $E$ and that
$\int_E g_n\,d\lambda \to \int_E g\,d\lambda$. Prove that $\int_E f_n\,d\lambda \to \int_E f\,d\lambda$.
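3.95	For $E \subset \mathcal{R}$ and $a \in \mathcal{R}$, let $E + a = \{\, x + a : x \in E \,\}$ and $aE = \{\, ax : x \in E \,\}$.
Suppose that $f$ is a Lebesgue integrable function.
a)	Show that
\[
\int_{\mathcal{R}} f(x + a)\,d\lambda(x) = \int_{\mathcal{R}} f(x)\,d\lambda(x)
\]
and, if $a \ne 0$,
\[
\int_{\mathcal{R}} f(ax)\,d\lambda(x) = \frac{1}{|a|} \int_{\mathcal{R}} f(x)\,d\lambda(x).
\]
Hint: Start with the case $f = \chi_A$, where $A \in \mathcal{M}$.
b)	Show that, for $E \in \mathcal{M}$,
\[
\int_E f(x + a)\,d\lambda(x) = \int_{E+a} f(x)\,d\lambda(x)
\]
and, if $a \ne 0$,
\[
\int_E f(ax)\,d\lambda(x) = \frac{1}{|a|} \int_{aE} f(x)\,d\lambda(x).
\]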
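3.96	Consider a function $F\colon \mathcal{R} \times I \to \mathcal{R}$, where $I$ is a nonempty open interval.
Suppose that $\partial F/\partial t$ exists at each point of $\mathcal{R} \times I$, $F(\cdot, t_0)$ is Lebesgue
integrable for some $t_0 \in I$, and there is a Lebesgue integrable function $G$
such that $\bigl|\frac{\partial F}{\partial t}(x, t)\bigr| \le G(x)$ for $x \in \mathcal{R}$ and $t \in I$. Prove that $F(\cdot, t)$ is
Lebesgue integrable for each $t \in I$ and that
\[
\frac{d}{dt} \int_{\mathcal{R}} F(x, t)\,d\lambda(x) = \int_{\mathcal{R}} \frac{\partial F}{\partial t}(x, t)\,d\lambda(x).
\]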
3.97	Consider a function $F\colon \mathcal{R} \times T \to \mathcal{R}$, where $T \subset \mathcal{R}$. Suppose that $F(\cdot, t)$ is
Lebesgue measurable for each $t \in T$ and that there is a Lebesgue integrable
function $g$ such that $|F(x, t)| \le g(x)$ for $x \in \mathcal{R}$ and $t \in T$. Establish the
following:
a)	If $F(x, \cdot)$ is continuous on $T$ for each $x \in \mathcal{R}$, then the function defined
on $T$ by $f(t) = \int_{\mathcal{R}} F(x, t)\,d\lambda(x)$ is continuous.
b)	If $T$ is an interval of the form $(b, \infty)$ and if $\lim_{t\to\infty} F(x, t)$ exists for each
$x \in \mathcal{R}$, then
\[
\lim_{t\to\infty} \int_{\mathcal{R}} F(x, t)\,d\lambda(x) = \int_{\mathcal{R}} \lim_{t\to\infty} F(x, t)\,d\lambda(x).
\]
3.98	Provide an example to show that the conditions given in the DCT are not
necessary for the interchange of limit and integral.
3.8	LEBESGUE ALMOST EVERYWHERE
Frequently, we are not concerned whether a certain property holds
everywhere as long as it holds “most places.” For example, in order for a
bounded function, $f$, to be Riemann integrable on $[a, b]$, it does not have
to be continuous everywhere on $[a, b]$; all that is required is that the set
of points at which $f$ is discontinuous have measure zero.
Consider, also, the sequence of functions, $f_n(x) = \chi_{[-1,1]}(x)\,x^n$, for
$n \in \mathcal{N}$. That sequence of functions does not converge pointwise on $\mathcal{R}$, but
it almost does. Indeed, $f_n(x) \to \chi_{\{1\}}(x)$ except when $x = -1$. As the
Lebesgue (or Riemann) integral is not affected by the value of a function
at a single point, the lack of convergence of $\{f_n\}_{n=1}^\infty$ at $x = -1$ should
really not disturb any convergence results involving the integral.
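The behaviour of this sequence at a few sample points can be tabulated with a short Python sketch. The function name `f_n` mirrors the text, while the sample points and indices are arbitrary choices made for illustration.

```python
def f_n(n: int, x: float) -> float:
    """f_n(x) = chi_[-1,1](x) * x**n."""
    return x ** n if -1.0 <= x <= 1.0 else 0.0

# Inside (-1, 1) the values shrink to 0, at x = 1 they are constantly 1,
# outside [-1, 1] they are 0, and at x = -1 they oscillate between 1 and -1.
for x in (-1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    print(x, [f_n(n, x) for n in (50, 51, 52, 53)])
```

Only the row for $x = -1$ fails to settle down, and the single point $\{-1\}$ has Lebesgue measure zero.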
In this section, we will define the concept of a property holding almost
everywhere and show that our previous results for the Lebesgue integral
remain valid when “everywhere” is replaced by “almost everywhere.”
DEFINITION 3.17 Lebesgue Almost Everywhere
A property is said to hold Lebesgue almost everywhere, or $\lambda$-ae
for short, if it holds except on a set of Lebesgue measure zero, that is,
except on a set $N$ with $\lambda(N) = 0$.
EXAMPLE 3.8 Illustrates Definition 3.17
a)	Two functions, $f$ and $g$, are equal Lebesgue almost everywhere, written
$f = g$ $\lambda$-ae, if $\lambda(\{\, x : g(x) \ne f(x) \,\}) = 0$.
b)	A sequence of functions, $\{f_n\}_{n=1}^\infty$, converges Lebesgue almost everywhere
to $f$, written $f_n \to f$ $\lambda$-ae, if $\lim_{n\to\infty} f_n(x) = f(x)$ except on a set of
Lebesgue measure zero. In other words, $f_n \to f$ $\lambda$-ae if and only if
$\lambda(\{\, x : \lim_{n\to\infty} f_n(x) \ne f(x) \,\}) = 0$.	□
Our first proposition demonstrates that a function equal almost every-
where to a Lebesgue measurable function is itself Lebesgue measurable.
PROPOSITION 3.13
Suppose that $f$ is a Lebesgue measurable function and that $g = f$ $\lambda$-ae.
Then $g$ is Lebesgue measurable.
PROOF: Set $B = \{\, x : g(x) = f(x) \,\}$ and let $O$ be an open set. We claim
that $g^{-1}(O) \in \mathcal{M}$. To begin, we write
\[
g^{-1}(O) = \bigl(g^{-1}(O) \cap B\bigr) \cup \bigl(g^{-1}(O) \cap B^c\bigr). \tag{3.39}
\]
We will show that both intersections on the right of (3.39) are Lebesgue
measurable sets. As $g = f$ on $B$, it follows that $g^{-1}(O) \cap B = f^{-1}(O) \cap B$.
However, this last intersection is a Lebesgue measurable set because $B \in \mathcal{M}$
(why?) and $f$ is Lebesgue measurable. Hence, the first intersection on the
right of (3.39) is a Lebesgue measurable set.
Now, by assumption, $\lambda(B^c) = 0$. Therefore, the second intersection
on the right of (3.39) is a Lebesgue measurable set because it is a subset
of a set having Lebesgue measure zero.
Our next result shows that the collection of Lebesgue measurable func-
tions is closed under almost-everywhere limits. More precisely, we have the
following proposition.
PROPOSITION 3.14
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions and
that $f_n \to f$ $\lambda$-ae. Then $f$ is a Lebesgue measurable function.
PROOF: Set $B = \{\, x : \lim_{n\to\infty} f_n(x) = f(x) \,\}$. Then $\lambda(B^c) = 0$. Let
$g_n = \chi_B f_n$ and $g = \chi_B f$. Then $\{g_n\}_{n=1}^\infty$ is a sequence of $\mathcal{M}$-measurable
functions and $g_n \to g$ pointwise. Hence, by Theorem 3.16(c) on page 130,
$g$ is $\mathcal{M}$-measurable. But $f = g$ $\lambda$-ae and, consequently, $f$ is $\mathcal{M}$-measurable
by Proposition 3.13.
Remark: We should point out that Propositions 3.13 and 3.14 are not
valid for Borel measurable functions. This is because subsets of Borel sets
of measure zero are not necessarily Borel sets. (See Exercise 3.99.)
Next, we will prove that the Lebesgue integral of a function is not
affected by changing its values on a set of measure zero.
PROPOSITION 3.15
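Let $f$ and $g$ be Lebesgue measurable functions with $f = g$ $\lambda$-ae. If $f$ is
Lebesgue integrable, then so is $g$ and, moreover, for each $E \in \mathcal{M}$,
\[
\int_E g\,d\lambda = \int_E f\,d\lambda.
\]
PROOF: Set $B = \{\, x : g(x) = f(x) \,\}$. Then, by assumption, $\lambda(B^c) = 0$.
Applying Corollary 3.3 on page 144 and Proposition 3.10(d) on page 136,
we find that
\[
\int_{\mathcal{R}} |g|\,d\lambda = \int_B |g|\,d\lambda + \int_{B^c} |g|\,d\lambda = \int_B |f|\,d\lambda \le \int_{\mathcal{R}} |f|\,d\lambda < \infty.
\]
Therefore, $g$ is Lebesgue integrable.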
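Now, let $E \in \mathcal{M}$. Then, by Theorem 3.21(e) on page 152,
\[
\begin{aligned}
\int_E g\,d\lambda &= \int_{E \cap B} g\,d\lambda + \int_{E \cap B^c} g\,d\lambda
= \int_{E \cap B} f\,d\lambda + \int_{E \cap B^c} g\,d\lambda \\
&= \int_E f\,d\lambda - \int_{E \cap B^c} f\,d\lambda + \int_{E \cap B^c} g\,d\lambda.
\end{aligned} \tag{3.40}
\]
We will complete the proof of the proposition by showing that the last
two integrals in (3.40) are zero. Employing Theorem 3.21(d) and Proposi-
tion 3.10(d), we deduce that
\[
\Bigl| \int_{E \cap B^c} f\,d\lambda \Bigr| \le \int_{E \cap B^c} |f|\,d\lambda = 0.
\]
Similarly, $\int_{E \cap B^c} g\,d\lambda = 0$.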
We often encounter functions that are only defined Lebesgue almost
everywhere. Since the integral of a Lebesgue measurable function is not
affected by its values on a set of measure zero, it is reasonable to make the
following definition.
DEFINITION 3.18 Integral of a Function Defined Almost Everywhere
Suppose that $f$ is a function defined Lebesgue almost everywhere; that
is, if $D$ is the domain of $f$, then $\lambda(D^c) = 0$. Further suppose that there
is a Lebesgue measurable function, $g$, such that $g(x) = f(x)$ for $x \in D$.
Then, for $E \in \mathcal{M}$, we define the Lebesgue integral of $f$ over $E$ by
\[
\int_E f\,d\lambda = \int_E g\,d\lambda
\]
provided that the integral on the right-hand side exists (i.e., the in-
tegrals of the positive and negative parts of $g$ over $E$ are not both
infinite).
Finally, we should point out that Fatou’s lemma and the DCT remain
valid if the hypothesis of pointwise convergence is replaced by convergence
$\lambda$-ae. The proofs are left as exercises for the reader.
EXERCISES 3.8
*3.99 Show that a subset of a Borel set of Lebesgue measure zero is not neces-
sarily a Borel set. Hint: Refer to Exercise 3.50 on page 127.
*3.100 Show that Proposition 3.13 (page 162) fails for Borel measurable functions.
3.101	For Lebesgue measurable functions, $f$ and $g$, define $f \sim g$ if and only if
$f = g$ $\lambda$-ae. Prove that $\sim$ is an equivalence relation.
3.102	Respond True or False to each of the following statements. Justify your
answer.
a) If $f$ is continuous $\lambda$-ae, then $f$ is equal to a continuous function $\lambda$-ae.
b) If $f$ is equal to a continuous function $\lambda$-ae, then $f$ is continuous $\lambda$-ae.
3.103	Let $\{f_n\}_{n=1}^\infty$ be a sequence of Lebesgue measurable functions such that
$\lim_{n\to\infty} f_n(x)$ exists $\lambda$-ae. Define
\[
f(x) = \begin{cases} \lim_{n\to\infty} f_n(x), & \text{if } \lim_{n\to\infty} f_n(x) \text{ exists;} \\ 0, & \text{otherwise.} \end{cases}
\]
Prove that $f$ is Lebesgue measurable.
3.104	Verify that Definition 3.18 is well posed. That is, assume $g$ and $h$ are
Lebesgue measurable functions that equal $f$ on its domain, $D$. Show that,
for $E \in \mathcal{M}$, either $\int_E h\,d\lambda = \int_E g\,d\lambda$ or neither integral exists.
3.105	Show that the DCT (page 154) remains valid if convergence pointwise is
replaced by convergence $\lambda$-ae. In other words, suppose that $\{f_n\}_{n=1}^\infty$ is a
sequence of Lebesgue measurable functions that converges $\lambda$-ae to a real-
valued function. Further suppose that there is a nonnegative Lebesgue
integrable function, $g$, such that $|f_n| \le g$ for all $n \in \mathcal{N}$. Prove that
\[
\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda
\]
for each $E \in \mathcal{M}$. Note: $\lim_{n\to\infty} f_n$ is not defined on all of $\mathcal{R}$ unless, of
course, $\{f_n\}_{n=1}^\infty$ converges everywhere.
3.106	Show that Fatou’s lemma (page 146) remains valid if convergence pointwise
is replaced by convergence $\lambda$-ae.
3.107	Verify that Egorov’s theorem, Exercise 3.63 on page 139, remains valid if
$f_n \to f$ $\lambda$-ae on $E$.
3.108	Let $f$ and $g$ be $\mathcal{M}$-measurable functions with $\int_{\mathcal{R}} |f - g|\,d\lambda = 0$. Prove that
$f = g$ $\lambda$-ae.
3.109	Show that, if $f$ is Lebesgue integrable and $\int_E f\,d\lambda = 0$ for each $E \in \mathcal{M}$,
then $f = 0$ $\lambda$-ae.
Henri Léon Lebesgue
(1875-1941)
Henri Lebesgue was born at Beauvais, France,
on June 28, 1875. He attended the École Normale
Supérieure in Paris between 1894 and
1897, where he was a student of Émile Borel.
He worked on his doctoral thesis between 1899
and 1902 while teaching mathematical science
at the lycée in Nancy and received his doctorate
from the Sorbonne in 1902.
Lebesgue did research in many different areas of mathematics, among
which were function theory, set theory, and the calculus of variations. His
and Émile Borel's work provided the foundation for the modern theory
of functions of a real variable.
Lebesgue’s interest in Riemannian integration and its associated problems
led to his creation of the Lebesgue integral in 1902. Not only has
the Lebesgue integral been important to the amplification of the theory
of trigonometric series, curve rectification, and calculus, but it has also
proved central to the development of measure theory.
Many honors were bestowed upon Lebesgue. Among these were the
Prix Houllevigue in 1912, the Prix Poncelet in 1914, and the Prix Saintour
in 1917. He was elected to the French Academy of Sciences in 1922
and to the Royal Society in 1934.
Lebesgue taught at the University of Rennes from 1902-1906; at the
University of Poitiers from 1906-1910; at the Sorbonne from 1910-1921;
and, finally, at the Collège de France. He died in Paris on July 26, 1941.
Measure Theory
In Chapter 3, the collection of continuous functions was expanded to the
collection of Borel measurable functions, the smallest algebra that contains
the continuous functions and is closed under pointwise limits. We then
extended the Riemann integral so that it applies to all Borel measurable
functions and, in doing so, we encountered Lebesgue measure, the collection
of Lebesgue measurable functions, and the Lebesgue integral.
We will discover, in this chapter, that the concepts and methods of
Chapter 3 lend themselves to considerable generalization with relatively
little effort and huge rewards. This generalized theory has extensive ap-
plications throughout mathematics and, as well, to a large variety of fields
outside of mathematics.
4.1 MEASURE SPACES
When we examine the definition of the Lebesgue integral carefully, we find
that it depends ultimately on the concept of measure. More precisely, the
mathematical framework requires a set, a $\sigma$-algebra of subsets, and a set
function that assigns to each set in the $\sigma$-algebra a nonnegative number (its
measure). In Chapter 3, this consisted, respectively, of $\mathcal{R}$, $\mathcal{M}$, and $\lambda$. But
we can abstract the mathematical framework to provide a broader setting
for the integral. We begin by considering the general concept of measure.
In developing Lebesgue measure, we imposed three conditions; namely,
Conditions (M1)-(M3) on page 105. The first two conditions are specific to
the generalization of length; but the third is not. In fact, Condition (M3),
the countable-additivity condition, is the primary property of an abstract
measure.
DEFINITION 4.1 Measure, Measurable Space, Measure Space
Let $\Omega$ be a set and $\mathcal{A}$ a $\sigma$-algebra of subsets of $\Omega$. A measure, $\mu$, on $\mathcal{A}$
is an extended real-valued function satisfying the following conditions:
a) $\mu(A) \ge 0$ for all $A \in \mathcal{A}$.
b) $\mu(\emptyset) = 0$.
c) If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$$\mu\Bigl(\bigcup_n A_n\Bigr) = \sum_n \mu(A_n).$$
The pair $(\Omega, \mathcal{A})$ is called a measurable space and the triple $(\Omega, \mathcal{A}, \mu)$
is called a measure space.
Note: We will often refer to members of $\mathcal{A}$ as $\mathcal{A}$-measurable sets.
We should point out the following fact: If $\mu$ satisfies (a) and (c) of
Definition 4.1, then it is a measure (i.e., also satisfies (b)) if and only if
there is an $A \in \mathcal{A}$ such that $\mu(A) < \infty$. We leave the proof of this fact to
the reader.
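One way to see this fact is outlined below. It is only a sketch, assuming the usual convention that sums of nonnegative extended real numbers are taken in $[0, \infty]$, and it is not necessarily the argument the authors have in mind.

```latex
% Sketch: assume (a) and (c) hold and that \mu(A) < \infty for some A \in \mathcal{A}.
% Apply (c) to the pairwise disjoint sequence A, \emptyset, \emptyset, \ldots
\[
  \mu(A) \;=\; \mu\bigl(A \cup \emptyset \cup \emptyset \cup \cdots\bigr)
         \;=\; \mu(A) + \sum_{n=2}^{\infty} \mu(\emptyset).
\]
% Since \mu(A) is finite, it can be subtracted from both sides:
\[
  \sum_{n=2}^{\infty} \mu(\emptyset) = 0
  \quad\Longrightarrow\quad
  \mu(\emptyset) = 0 \quad \text{(each term is } \ge 0 \text{ by (a))}.
\]
% Conversely, if \mu(\emptyset) = 0, then A = \emptyset satisfies \mu(A) < \infty.
```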
EXAMPLE 4.1 Illustrates Definition 4.1
a)	$(\mathcal{R}, \mathcal{M}, \lambda)$ is a measure space, the one that we studied in Chapter 3.
b)	$(\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$ is a measure space.
c)	Suppose that $(\Omega, \mathcal{A}, \mu)$ is a measure space and that $D \in \mathcal{A}$. Define
$\mathcal{A}_D = \{ D \cap A : A \in \mathcal{A} \}$ and $\mu_D = \mu|_{\mathcal{A}_D}$. Then $\mathcal{A}_D$ is a $\sigma$-algebra
of subsets of $D$, $\mu_D$ is a measure on $\mathcal{A}_D$, and, hence, $(D, \mathcal{A}_D, \mu_D)$ is a
measure space.
d)	Referring to part (c), let $\Omega = \mathcal{R}$, $\mathcal{A} = \mathcal{M}$, $\mu = \lambda$, and $D = [0,1]$.
Then $([0,1], \mathcal{M}_{[0,1]}, \lambda_{[0,1]})$ is a measure space. $\lambda_{[0,1]}$ is called Lebesgue
measure on $[0,1]$. More generally, if $D$ is any Lebesgue measurable
set, then $(D, \mathcal{M}_D, \lambda_D)$ is a measure space and $\lambda_D$ is called Lebesgue
measure on $D$.
e)	Refer to part (c). By Theorem 3.7 on page 102, if $D \in \mathcal{B}$, then we have
$\mathcal{B}_D = \mathcal{B}(D)$.
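f)	Let $\Omega$ be a nonempty set and $\mathcal{A} = \mathcal{P}(\Omega)$. Define $\mu$ on $\mathcal{A}$ by
$$\mu(E) = \begin{cases} N(E), & \text{if } E \text{ is finite;} \\ \infty, & \text{if } E \text{ is infinite,} \end{cases}$$
where $N(E)$ denotes the number of elements of $E$. Then $\mu$ is a measure
on $\mathcal{A}$ and is called counting measure.
g)	Let $\Omega = \mathcal{N}$, $\mathcal{A} = \mathcal{P}(\mathcal{N})$, and $\mu$ be counting measure on $\mathcal{A}$, as defined
in part (f). Then, for instance, $\mu(\mathcal{N}) = \infty$ and $\mu(\{1, 3\}) = 2$. We will
see later that $(\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$ is the appropriate measure space for the
analysis of infinite series.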
h)	Suppose that $(\Omega, \mathcal{A}, \mu)$ is a measure space. If $\mu(\Omega) = 1$, then $(\Omega, \mathcal{A}, \mu)$
is called a probability space and $\mu$ a probability measure. Furthermore,
$\mu$ is usually replaced by a $P$ (for probability). Two simple
examples are as follows:
(i) $([0,1], \mathcal{M}_{[0,1]}, \lambda_{[0,1]})$ is a probability space since $\lambda([0,1]) = 1$. It
is an appropriate measure space for analyzing the experiment of
selecting a number at random from the unit interval.
(ii) Consider the experiment of tossing a coin twice. The set of possible
outcomes for that experiment is $\Omega = \{\mathrm{HH}, \mathrm{HT}, \mathrm{TH}, \mathrm{TT}\}$ where, for
instance, HT denotes the outcome of a head on the first toss and
a tail on the second toss. Set $\mathcal{A} = \mathcal{P}(\Omega)$ and, for $E \in \mathcal{A}$, define
$P(E) = N(E)/4$ where, as before, $N(E)$ denotes the number of
elements of $E$. Then $(\Omega, \mathcal{A}, P)$ is a probability space, the appropriate
measure space to use when the coin is balanced (i.e., equally
likely to come up heads or tails). To illustrate: The probability
of getting at least one head in two tosses of a balanced coin is
$P(\{\mathrm{HH}, \mathrm{HT}, \mathrm{TH}\}) = 3/4$.
i) Let $\Omega$ be a nonempty set, $\{x_n\}_n$ a sequence of distinct elements of $\Omega$,
and $\{a_n\}_n$ a sequence of nonnegative numbers. For $E \subset \Omega$, define
$$\mu(E) = \sum_{x_n \in E} a_n, \tag{4.1}$$
where the notation, $\sum_{x_n \in E}$, means the sum over all indices, $n$, such that
$x_n \in E$. Then $\mu$ is a measure on $\mathcal{P}(\Omega)$ and, consequently, $(\Omega, \mathcal{P}(\Omega), \mu)$
is a measure space. Here are two special cases:
(i) If $\Omega$ is countable, $\{x_n\}_n$ is an enumeration of $\Omega$, and $a_n = 1$ for
all $n$, then the measure, $\mu$, defined in (4.1) is counting measure.
(ii) If the sequence, $\{x_n\}_n$, consists of only one element, say $x_0$, and if
$a_0 = 1$, then the measure, $\mu$, defined in (4.1) takes the form
$$\mu(E) = \begin{cases} 1, & \text{if } x_0 \in E; \\ 0, & \text{if } x_0 \notin E. \end{cases}$$
This measure is denoted by $\delta_{x_0}$ and is called the unit point mass
or Dirac measure concentrated at $x_0$. Note that $\delta_{x_0}$ is a
probability measure.
j) Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for each $x \in \Omega$. A
measure, $\mu$, on $\mathcal{A}$ is called discrete if there is a countable set $K \subset \Omega$
such that $\mu(K^c) = 0$. It is not too difficult to show that if $\mu$ is a discrete
measure, then we can write $\mu = \sum_{x \in K} \mu(\{x\})\,\delta_x$. See Exercises 4.6
and 4.19 for more on discrete measures.	□
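Several of the measures in Example 4.1 are easy to experiment with on finite sets. The following Python sketch is illustrative only: the function names (`counting_measure`, `dirac`, `weighted_measure`) and the dictionary of weights are assumptions made for this example, and sets are represented by ordinary finite Python sets.

```python
from fractions import Fraction

def counting_measure(E):
    """Counting measure of a finite Python set E, as in part (f)."""
    return len(E)

def dirac(x0):
    """Return the Dirac measure concentrated at x0, as in part (i)(ii)."""
    return lambda E: 1 if x0 in E else 0

def weighted_measure(weights):
    """Measure E -> sum of weights[x] over x in E, in the spirit of (4.1).

    `weights` is a hypothetical dict mapping the distinct points x_n to
    their nonnegative masses a_n.
    """
    return lambda E: sum(a for x, a in weights.items() if x in E)

# Two tosses of a balanced coin, part (h)(ii): P(E) = N(E)/4.
omega = {"HH", "HT", "TH", "TT"}
P = weighted_measure({w: Fraction(1, 4) for w in omega})

at_least_one_head = {"HH", "HT", "TH"}
print(counting_measure({1, 3}))      # 2
print(P(at_least_one_head))          # 3/4
print(P(omega))                      # 1
print(dirac("HH")(at_least_one_head), dirac("TT")(at_least_one_head))  # 1 0
```

Using exact fractions rather than floating-point numbers keeps identities such as $P(\Omega) = 1$ exact.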
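The following theorem provides some important properties of measures.
We leave the proof as an exercise for the reader.
THEOREM 4.1
Suppose that $(\Omega, \mathcal{A}, \mu)$ is a measure space and that $A$ and $B$ are
$\mathcal{A}$-measurable sets. Then the following hold:
a)	If $\mu(A) < \infty$ and $A \subset B$, then $\mu(B \setminus A) = \mu(B) - \mu(A)$.
b)	$A \subset B \Rightarrow \mu(A) \le \mu(B)$. (monotonicity)
c)	If $\{E_n\}_{n=1}^{\infty} \subset \mathcal{A}$ with $E_1 \supset E_2 \supset \cdots$ and $\mu(E_1) < \infty$, then
$$\mu\Bigl(\bigcap_{n=1}^{\infty} E_n\Bigr) = \lim_{n\to\infty} \mu(E_n).$$
d)	If $\{E_n\}_{n=1}^{\infty} \subset \mathcal{A}$ with $E_1 \subset E_2 \subset \cdots$, then
$$\mu\Bigl(\bigcup_{n=1}^{\infty} E_n\Bigr) = \lim_{n\to\infty} \mu(E_n).$$
e)	If $\{E_n\}_n \subset \mathcal{A}$, then
$$\mu\Bigl(\bigcup_n E_n\Bigr) \le \sum_n \mu(E_n).$$
This property is called countable subadditivity.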
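The finiteness hypothesis in part (c) cannot simply be dropped. The following worked example, which uses counting measure on the positive integers and is offered only as an illustration, shows what can go wrong.

```latex
% Counting measure \mu on (\mathcal{N}, \mathcal{P}(\mathcal{N})); tails E_n = \{n, n+1, n+2, \ldots\}.
\[
  E_1 \supset E_2 \supset \cdots, \qquad
  \mu(E_n) = \infty \ \text{for every } n, \qquad
  \bigcap_{n=1}^{\infty} E_n = \emptyset .
\]
\[
  \mu\Bigl(\bigcap_{n=1}^{\infty} E_n\Bigr) = \mu(\emptyset) = 0
  \;\ne\; \infty = \lim_{n\to\infty} \mu(E_n).
\]
```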
Almost Everywhere and Complete Measure Spaces
Recall from Section 3.8 that a property holds Lebesgue almost everywhere
($\lambda$-ae) if it holds except on a set of Lebesgue measure zero. That concept
can be generalized to apply to any measure space.
DEFINITION 4.2 Almost Everywhere
A property is said to hold $\mu$ almost everywhere, or $\mu$-ae for short,
if it holds except on a set of $\mu$-measure zero, that is, except on a set $N$
with $\mu(N) = 0$.
Note: Several terms are used synonymously for “almost everywhere.” Here
are a few: almost always, for almost all $x \in \Omega$, and, in probability
theory, almost surely, with probability one, and almost certainly.
Proposition 3.4 on page 123 implies that subsets of Lebesgue measur-
able sets of Lebesgue measure zero are also Lebesgue measurable sets. On
the other hand, Exercise 3.99 on page 165 indicates that there exist subsets
of Borel sets of Lebesgue measure zero that are not Borel sets.
Those two facts have relevance to almost-everywhere (ae) properties
of measurable functions. For instance, by Proposition 3.13 on page 162, if
$f$ is Lebesgue measurable and $g = f$ $\lambda$-ae, then $g$ is Lebesgue measurable.
However, as Exercise 3.100 on page 165 shows, that result is not true for
Borel measurable functions.
We now see that it is important to know whether subsets of sets of
measure zero are measurable sets. Hence, we make the following definition.
DEFINITION 4.3 Complete Measure Space
A measure space, $(\Omega, \mathcal{A}, \mu)$, is said to be complete if all subsets of
$\mathcal{A}$-measurable sets of $\mu$-measure zero are also $\mathcal{A}$-measurable; in other
words, if $A \in \mathcal{A}$ and $\mu(A) = 0$, then $B \in \mathcal{A}$ for all $B \subset A$.
Thus, $(\mathcal{R}, \mathcal{M}, \lambda)$ is a complete measure space while $(\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$ is not
a complete measure space.
The following theorem shows that any measure space can be extended
to a complete measure space. We leave the proof of the theorem as an
exercise for the reader.
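THEOREM 4.2
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Denote by $\overline{\mathcal{A}}$ the collection of all sets of
the form $B \cup A$ where $B \in \mathcal{A}$ and $A \subset C$ for some $C \in \mathcal{A}$ with $\mu(C) = 0$.
For such sets, define $\overline{\mu}(B \cup A) = \mu(B)$. Then $\overline{\mathcal{A}}$ is a $\sigma$-algebra, $\overline{\mu}$ is a
measure on $\overline{\mathcal{A}}$, and $(\Omega, \overline{\mathcal{A}}, \overline{\mu})$ is a complete measure space. Furthermore,
$\mathcal{A} \subset \overline{\mathcal{A}}$ and $\overline{\mu}|_{\mathcal{A}} = \mu$. $(\Omega, \overline{\mathcal{A}}, \overline{\mu})$ is called the completion of $(\Omega, \mathcal{A}, \mu)$.
It can be shown that the measure space, $(\mathcal{R}, \mathcal{M}, \lambda)$, is the completion
of the measure space, $(\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$. See Exercise 4.16.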
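On a finite set, the construction in Theorem 4.2 can be carried out by brute force. The Python sketch below is a toy illustration, not part of the text: the helper names `subsets` and `completion`, and the three-atom example space, are assumptions chosen for this example. It builds every set of the form $B \cup A$ described in the theorem and records $\overline{\mu}(B \cup A) = \mu(B)$.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of the finite collection s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def completion(sigma_algebra, mu):
    """Completion in the sense of Theorem 4.2, for a finite measure space.

    sigma_algebra: a set of frozensets (assumed to be a sigma-algebra).
    mu: dict mapping each member of sigma_algebra to its measure.
    Returns (completed collection, completed measure as a dict).
    """
    null_sets = [C for C in sigma_algebra if mu[C] == 0]
    # Every subset of some measurable set of measure zero.
    negligible = {N for C in null_sets for N in subsets(C)}
    completed_mu = {}
    for B in sigma_algebra:
        for N in negligible:
            completed_mu[B | N] = mu[B]   # mu_bar(B union N) = mu(B)
    return set(completed_mu), completed_mu

# Hypothetical example: Omega = {1, 2, 3, 4} with atoms {1}, {2, 3}, {4}.
atoms = {frozenset({1}): 1, frozenset({2, 3}): 0, frozenset({4}): 2}
sigma_algebra, mu = set(), {}
for blocks in subsets(atoms):             # every union of atoms
    A = frozenset().union(*blocks)
    sigma_algebra.add(A)
    mu[A] = sum(atoms[b] for b in blocks)

completed, mu_bar = completion(sigma_algebra, mu)
print(len(sigma_algebra), len(completed))   # 8 16
print(frozenset({2}) in sigma_algebra)      # False
print(mu_bar[frozenset({1, 3})])            # 1
```

Here the atom $\{2, 3\}$ has measure zero, so its subsets $\{2\}$ and $\{3\}$ become measurable in the completion, which in this example is the full power set.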
EXERCISES 4.1
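4.1	Suppose that $(\Omega, \mathcal{A}, \mu)$ is a measure space and that $D$ is an $\mathcal{A}$-measurable
set. Define $\mathcal{A}_D = \{ D \cap A : A \in \mathcal{A} \}$ and $\mu_D = \mu|_{\mathcal{A}_D}$. Show that $(D, \mathcal{A}_D, \mu_D)$
is a measure space.
4.2	Let $\Omega$ be a nonempty set and $\mathcal{A} = \mathcal{P}(\Omega)$. Define $\mu$ on $\mathcal{A}$ by
$$\mu(E) = \begin{cases} N(E), & \text{if } E \text{ is finite;} \\ \infty, & \text{if } E \text{ is infinite,} \end{cases}$$
where $N(E)$ denotes the number of elements of $E$. Prove that $\mu$ is a measure
on $\mathcal{A}$.
4.3	Consider the experiment of selecting a number at random from the closed
interval $[-1, 1]$.
a)	Construct an appropriate probability space for this experiment.
b)	Determine the probability that the number selected exceeds 0.5.
c)	Determine the probability that the number selected is rational.
4.4	Let $(\Omega, \mathcal{A})$ be a measurable space, $\mu$ and $\nu$ measures on $\mathcal{A}$, and $\alpha > 0$.
Define set functions, $\mu + \nu$ and $\alpha\mu$, on $\mathcal{A}$ by
$$(\mu + \nu)(A) = \mu(A) + \nu(A), \qquad (\alpha\mu)(A) = \alpha\mu(A).$$
a)	Show that $\mu + \nu$ is a measure on $\mathcal{A}$.
b)	Show that $\alpha\mu$ is a measure on $\mathcal{A}$.
4.5	Let $(\Omega, \mathcal{A})$ be a measurable space, $\{\mu_n\}_{n=1}^{\infty}$ a sequence of measures on $\mathcal{A}$,
and $\{\alpha_n\}_{n=1}^{\infty}$ a sequence of nonnegative real numbers. Define $\sum_{n=1}^{\infty} \alpha_n \mu_n$
on $\mathcal{A}$ by
$$\Bigl(\sum_{n=1}^{\infty} \alpha_n \mu_n\Bigr)(A) = \sum_{n=1}^{\infty} \alpha_n \mu_n(A).$$
Prove that $\sum_{n=1}^{\infty} \alpha_n \mu_n$ is a measure on $\mathcal{A}$.
★4.6 Refer to Example 4.1(j). Let $(\Omega, \mathcal{A})$ be a measurable space and suppose
that $\{x\} \in \mathcal{A}$ for each $x \in \Omega$. Show that a measure $\mu$ on $\mathcal{A}$ is discrete if and
only if there is a countable subset $K$ of $\Omega$ such that $\mu = \sum_{x \in K} \mu(\{x\})\,\delta_x$.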
4.7	Suppose that a balanced coin is tossed three times.
a)	Construct a probability space for this experiment in which each possible
outcome is equally likely.
b)	Determine the probability of obtaining exactly two heads.
c)	Express the probability measure, $P$, as a finite linear combination of
Dirac measures.
4.8	Let $\Omega$ be a nonempty set, $\{x_n\}_n$ a sequence of distinct elements of $\Omega$, and
$\{a_n\}_n$ a sequence of nonnegative real numbers. For $E \subset \Omega$, define
$$\mu(E) = \sum_{x_n \in E} a_n.$$
a)	Show that $\mu$ is a measure on $\mathcal{P}(\Omega)$.
b)	Interpret the $a_n$s in terms of the measure, $\mu$.
c)	Express $\mu$ as a linear combination of Dirac measures.
4.9	Suppose that two balanced dice are thrown.
a)	Construct a probability space for this experiment in which each possible
outcome is equally likely.
b)	Use part (a) to determine the probability that the sum of the dice is
seven or 11.
c)	Construct a probability space for this experiment in which the outcomes
consist of the possible sums of the two dice.
d)	Use part (c) to determine the probability that the sum of the dice is
seven or 11.
4.10	Prove Theorem 4.1.
4.11	Let $(\Omega, \mathcal{A})$ be a measurable space. A measure, $\mu$, on $\mathcal{A}$ is called a finite
measure if $\mu(\Omega) < \infty$. A measure space, $(\Omega, \mathcal{A}, \mu)$, is called a finite
measure space if $\mu$ is a finite measure. For a finite measure space, prove
the following:
a)	If $A$ and $B$ are $\mathcal{A}$-measurable sets, then
$$\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B).$$
b)	Generalize part (a) to an arbitrary finite number of $\mathcal{A}$-measurable sets.
4.12	Let $\{E_n\}_{n=1}^{\infty}$ be a sequence of $\mathcal{A}$-measurable sets. Prove that
$$\mu\Bigl(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} E_k\Bigr) \le \liminf_{n\to\infty} \mu(E_n).$$
4.13	Let $\{E_n\}_{n=1}^{\infty}$ be a sequence of $\mathcal{A}$-measurable sets with $\mu\bigl(\bigcup_{n=1}^{\infty} E_n\bigr) < \infty$.
Prove that
$$\mu\Bigl(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k\Bigr) \ge \limsup_{n\to\infty} \mu(E_n).$$
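★4.14 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\{E_n\}_{n=1}^{\infty}$ a sequence of $\mathcal{A}$-measurable
sets. Define $E = \{ x : x \in E_n \text{ for infinitely many } n \}$.
a)	Prove that $E = \bigcap_{n=1}^{\infty} \bigl(\bigcup_{k=n}^{\infty} E_k\bigr)$.
b)	Prove that $\sum_{n=1}^{\infty} \mu(E_n) < \infty \Rightarrow \mu(E) = 0$.
4.15	Prove Theorem 4.2.
4.16	Prove that the measure space, $(\mathcal{R}, \mathcal{M}, \lambda)$, is the completion of the measure
space, $(\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$. Use the following steps:
a)	Verify that $\overline{\mathcal{B}} \subset \mathcal{M}$ by employing Exercise 3.32 on page 126.
b)	Show that $\overline{\mathcal{B}} \supset \mathcal{M}$ by applying Exercise 3.44 on page 127.
c)	Prove that $\lambda = \overline{\lambda|_{\mathcal{B}}}$. Hint: Use the fact established in parts (a) and (b)
that $\mathcal{M} = \overline{\mathcal{B}}$.
★4.17 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $(\Omega, \mathcal{F}, \rho)$ is a complete
measure space with $\mathcal{F} \supset \mathcal{A}$ and $\rho|_{\mathcal{A}} = \mu$. Prove that $\mathcal{F} \supset \overline{\mathcal{A}}$ and that
$\rho|_{\overline{\mathcal{A}}} = \overline{\mu}$. Conclude that $(\Omega, \overline{\mathcal{A}}, \overline{\mu})$ is the smallest complete measure space
that contains $(\Omega, \mathcal{A}, \mu)$.
4.18 Let $f$ be a nonnegative $\mathcal{M}$-measurable function. Define $\mu_f$ on $\mathcal{M}$ by
$$\mu_f(E) = \int_E f\,d\lambda.$$
Prove that $\mu_f$ is a measure on $\mathcal{M}$.
4.19 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space such that $\{x\} \in \mathcal{A}$ for each $x \in \Omega$. An
element $x \in \Omega$ is said to be an atom of $\mu$ if $\mu(\{x\}) > 0$. Assume now that
$\mu$ is a finite measure, that is, $\mu(\Omega) < \infty$. Prove the following facts.
a) $\mu$ has only countably many atoms.
b) $\mu$ can be expressed uniquely as the sum of two measures, $\mu_c$ and $\mu_d$,
where $\mu_c$ has no atoms and $\mu_d$ is discrete. Moreover, we have that
$\mu_d = \sum_{x \in K} \mu(\{x\})\,\delta_x$, where $K$ is the set of atoms of $\mu$.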
4.2 MEASURABLE FUNCTIONS
The next step in developing the abstract Lebesgue integral is to introduce
the concept of measurability for functions defined on an abstract space. In
addition to real-valued functions, we will also consider complex-valued and
extended real-valued functions. We begin with real-valued functions.
Real-Valued Measurable Functions
Let $(\Omega, \mathcal{A})$ be a measurable space and $f: \Omega \to \mathcal{R}$. We want to specify when
$f$ is measurable. In the previous chapter, we discussed two kinds of measurable
functions: Borel measurable functions and Lebesgue measurable
functions. Recall that a real-valued function, $f$, is Borel measurable if and
only if $f^{-1}(O) \in \mathcal{B}$ for each open set $O \subset \mathcal{R}$ and it is Lebesgue measurable
if and only if $f^{-1}(O) \in \mathcal{M}$ for each open set $O \subset \mathcal{R}$. Hence, it is quite
natural to make the following definition.
DEFINITION 4.4 Real-Valued Measurable Function
Let $(\Omega, \mathcal{A})$ be a measurable space. A real-valued function $f$ on $\Omega$ is said
to be an $\mathcal{A}$-measurable function if the inverse image of each open
subset of $\mathcal{R}$ under $f$ is an $\mathcal{A}$-measurable set, that is, if $f^{-1}(O) \in \mathcal{A}$ for
all open sets $O \subset \mathcal{R}$.
EXAMPLE 4.2 Illustrates Definition 4.4
a)	Let $\Omega = \mathcal{R}$. Then, as we know from Chapter 3, the Borel measurable
functions are the $\mathcal{B}$-measurable functions and the Lebesgue measurable
functions are the $\mathcal{M}$-measurable functions.
b)	Let $(\Omega, \mathcal{A})$ be a measurable space, $D \in \mathcal{A}$, and $\mathcal{A}_D = \{ D \cap A : A \in \mathcal{A} \}$.
Then a function, $f: D \to \mathcal{R}$, is $\mathcal{A}_D$-measurable if and only if for each
open subset $O$ of $\mathcal{R}$, $f^{-1}(O)$ is of the form $D \cap A$ for some $A \in \mathcal{A}$.
c)	Let $\Omega$ be a nonempty set. Then every real-valued function on $\Omega$ is
$\mathcal{P}(\Omega)$-measurable. An important special case: If $\Omega = \mathcal{N}$, then $\mathcal{A}$ is usually
taken to be $\mathcal{P}(\mathcal{N})$; hence, all functions $f: \mathcal{N} \to \mathcal{R}$ are $\mathcal{A}$-measurable.
But functions on $\mathcal{N}$ are infinite sequences. Consequently, in this case,
the $\mathcal{A}$-measurable functions are precisely the infinite sequences. □
The following proposition provides some useful equivalent conditions
for a function to be $\mathcal{A}$-measurable. To prove the proposition, we proceed
in a manner similar to the proof of Lemma 3.5 on page 98.
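PROPOSITION 4.1
Let $(\Omega, \mathcal{A})$ be a measurable space and $f$ a real-valued function on $\Omega$. Then
the following statements are equivalent:
a)	$f$ is $\mathcal{A}$-measurable.
b)	For each $a \in \mathcal{R}$, $f^{-1}((-\infty, a)) \in \mathcal{A}$.
c)	For each $a \in \mathcal{R}$, $f^{-1}((a, \infty)) \in \mathcal{A}$.
d)	For each $a \in \mathcal{R}$, $f^{-1}((-\infty, a]) \in \mathcal{A}$.
e)	For each $a \in \mathcal{R}$, $f^{-1}([a, \infty)) \in \mathcal{A}$.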
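When $\Omega$ is finite, condition (c) can be checked mechanically, because only finitely many distinct sets $\{x : f(x) > a\}$ arise. The Python sketch below is an illustration under that assumption; the function name `is_measurable` and the three-point example space are hypothetical and not taken from the text.

```python
def is_measurable(f, omega, sigma_algebra):
    """Check criterion (c) of Proposition 4.1 on a finite space.

    f: dict mapping each point of omega to a real number.
    sigma_algebra: set of frozensets, assumed to be a sigma-algebra on omega.
    Because f takes finitely many values, the sets {x : f(x) > a} are
    exhausted by letting a run over the values of f together with one
    number below all of them.
    """
    values = sorted(set(f.values()))
    thresholds = [values[0] - 1] + values
    return all(
        frozenset(x for x in omega if f[x] > a) in sigma_algebra
        for a in thresholds
    )

omega = {1, 2, 3}
sigma_algebra = {frozenset(), frozenset({1, 2}), frozenset({3}), frozenset(omega)}

f = {1: 0.0, 2: 0.0, 3: 1.0}   # constant on {1, 2}
g = {1: 0.0, 2: 1.0, 3: 1.0}   # separates the points 1 and 2

print(is_measurable(f, omega, sigma_algebra))  # True
print(is_measurable(g, omega, sigma_algebra))  # False
```

The function `g` fails because $\{x : g(x) > 0\} = \{2, 3\}$ does not belong to the chosen $\sigma$-algebra.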
Theorem 4.3, which we prove next, gives several important properties
of real-valued $\mathcal{A}$-measurable functions. Note that Theorem 3.15 on page 130
is a special case.
THEOREM 4.3
Let $(\Omega, \mathcal{A})$ be a measurable space. Then the collection of real-valued
$\mathcal{A}$-measurable functions forms an algebra. In other words, if $f$ and $g$ are
$\mathcal{A}$-measurable and $\alpha \in \mathcal{R}$, then
a) $f + g$ is $\mathcal{A}$-measurable.
b) $\alpha f$ is $\mathcal{A}$-measurable.
c) $f \cdot g$ is $\mathcal{A}$-measurable.
PROOF:
a)	By Proposition 4.1, to prove that $f + g$ is $\mathcal{A}$-measurable, it suffices to
show that $\{ x : f(x) + g(x) > a \} \in \mathcal{A}$ for each $a \in \mathcal{R}$. Now,
$$\begin{aligned}
\{ x : f(x) + g(x) > a \} &= \{ x : f(x) > a - g(x) \} \\
&= \bigcup_{r \in Q} \{ x : f(x) > r > a - g(x) \} \\
&= \bigcup_{r \in Q} \bigl( f^{-1}((r, \infty)) \cap g^{-1}((a - r, \infty)) \bigr).
\end{aligned}$$
This last union is an $\mathcal{A}$-measurable set since $f$ and $g$ are $\mathcal{A}$-measurable
functions, $\mathcal{A}$ is a $\sigma$-algebra, and $Q$ is countable. Consequently, $f + g$ is
an $\mathcal{A}$-measurable function.
b)	If $\alpha = 0$, then $\alpha f = 0$, which is $\mathcal{A}$-measurable (why?). So, assume $\alpha \ne 0$
and let $O$ be any open set in $\mathcal{R}$. Then $\alpha^{-1}O = \{ \alpha^{-1}y : y \in O \}$ is open.
Therefore, because $f$ is $\mathcal{A}$-measurable, $(\alpha f)^{-1}(O) = f^{-1}(\alpha^{-1}O) \in \mathcal{A}$.
This proves that $\alpha f$ is $\mathcal{A}$-measurable.
c)	First we show that if $f$ is $\mathcal{A}$-measurable, then so is $f^2$. If $a < 0$, then
$(f^2)^{-1}((a, \infty)) = \Omega \in \mathcal{A}$. If $a \ge 0$, then we have
$$\begin{aligned}
(f^2)^{-1}((a, \infty)) &= \{ x : f(x)^2 > a \} \\
&= \{ x : f(x) > \sqrt{a} \} \cup \{ x : f(x) < -\sqrt{a} \} \\
&= f^{-1}((\sqrt{a}, \infty)) \cup f^{-1}((-\infty, -\sqrt{a})).
\end{aligned}$$
This last union is an $\mathcal{A}$-measurable set because $f$ is $\mathcal{A}$-measurable.
Hence, $f^2$ is an $\mathcal{A}$-measurable function whenever $f$ is.
Now, for any two functions, $f$ and $g$, we can write
$$f \cdot g = \tfrac{1}{4}\bigl((f + g)^2 - (f - g)^2\bigr).$$
Applying parts (a) and (b) of this theorem and the fact that the square
of an $\mathcal{A}$-measurable function is $\mathcal{A}$-measurable, we conclude that $f \cdot g$ is
an $\mathcal{A}$-measurable function.	■
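For readers who want to check the algebraic identity used in part (c) of the proof, the expansion is short. It is written out here only as a verification step.

```latex
% Expanding both squares pointwise confirms the identity used in part (c).
\[
  (f+g)^2 - (f-g)^2
  = \bigl(f^2 + 2fg + g^2\bigr) - \bigl(f^2 - 2fg + g^2\bigr)
  = 4fg,
\]
\[
  \text{so}\qquad f \cdot g = \tfrac{1}{4}\Bigl((f+g)^2 - (f-g)^2\Bigr).
\]
```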
We should emphasize that the measurability (or nonmeasurability) of
a function depends only on the $\sigma$-algebra, $\mathcal{A}$, of subsets of $\Omega$; that is, it has
nothing to do with a measure. Nonetheless, if $(\Omega, \mathcal{A}, P)$ is a probability
space, then the $\mathcal{A}$-measurable functions are called random variables.
Thus, an $\mathcal{A}$-measurable function is a random variable only when considered
in the context of a probability space. By the way, in probability
theory, random variables are usually denoted by uppercase italicized
English-alphabet letters that are near the end of the alphabet (e.g., $X$, $Y$,
and $Z$) instead of the more usual $f$, $g$, and $h$.
EXAMPLE 4.3 Illustrates Random Variables
Let $(\Omega, \mathcal{A}, P)$ be the probability space from subpart (ii) of Example 4.1(h)
on page 169. Define $X(\mathrm{HH}) = 2$, $X(\mathrm{HT}) = X(\mathrm{TH}) = 1$, and $X(\mathrm{TT}) = 0$.
Then $X: \Omega \to \mathcal{R}$ is a random variable. It indicates the number of heads
obtained when a balanced coin is tossed twice.	□
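The random variable of Example 4.3 is small enough to tabulate by hand, but a short Python sketch makes the link between $X$ and the probabilities $P(X^{-1}(\{k\}))$ explicit. The variable names below (`omega`, `P`, `distribution`) are illustrative choices, not notation from the text.

```python
from collections import defaultdict
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in omega}      # balanced coin: P(E) = N(E)/4

def X(w):
    """Number of heads in the outcome w (the random variable of Example 4.3)."""
    return w.count("H")

# Distribution of X: P(X = k) is the probability of the inverse image of {k}.
distribution = defaultdict(Fraction)
for w in omega:
    distribution[X(w)] += P[w]

for k in sorted(distribution):
    print(k, distribution[k])
```

Because $\mathcal{A} = \mathcal{P}(\Omega)$ here, every inverse image is automatically measurable, so the only work is adding up the probabilities of the outcomes in each inverse image.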
Our next result is a generalization of Proposition 3.13 on page 162 to
an arbitrary complete measure space. Its proof is essentially identical to
that of Proposition 3.13.
PROPOSITION 4.2
Suppose that $(\Omega, \mathcal{A}, \mu)$ is a complete measure space. If $f$ is $\mathcal{A}$-measurable
and $g = f$ $\mu$-ae, then $g$ is $\mathcal{A}$-measurable.
Complex-Valued Measurable Functions
In applying real analysis, we often encounter complex-valued functions.
This occurs, for instance, in Fourier analysis. We will denote the set of
all complex numbers by C. Here now is the definition of measurability for
complex-valued functions.
DEFINITION 4.5 Complex-Valued Measurable Function
Let $(\Omega, \mathcal{A})$ be a measurable space. A complex-valued function $f$ on $\Omega$
is said to be an $\mathcal{A}$-measurable function if the inverse image of each
open subset of $\mathbb{C}$ under $f$ is an $\mathcal{A}$-measurable set, that is, if $f^{-1}(O) \in \mathcal{A}$
for all open sets $O \subset \mathbb{C}$.
The following theorem provides a useful characterization of measura-
bility for complex-valued functions. We leave the proof of the theorem as
an exercise for the reader.
THEOREM 4.4
A complex-valued function $f$ on $\Omega$ is $\mathcal{A}$-measurable if and only if both its
real part, $\Re f$, and its imaginary part, $\Im f$, are (real-valued) $\mathcal{A}$-measurable
functions.
EXAMPLE 4.4 Illustrates Complex-Valued Measurable Functions
a)	Any real-valued $\mathcal{A}$-measurable function on $\Omega$ is also a complex-valued
$\mathcal{A}$-measurable function.
b)	Let $\Omega = \mathcal{R}$ and $\mathcal{A} = \mathcal{B}$. Define $f: \mathcal{R} \to \mathbb{C}$ by $f(x) = e^{ix}$. The real and
imaginary parts of $f(x)$ are $\cos x$ and $\sin x$, respectively. Since those
two functions are continuous, they are $\mathcal{B}$-measurable. Consequently, by
Theorem 4.4, $f$ is a complex-valued $\mathcal{B}$-measurable function.
c)	If $g$ and $h$ are real-valued $\mathcal{A}$-measurable functions, then, by Theorem 4.4,
the complex-valued function, $f = g + ih$, is also $\mathcal{A}$-measurable.
d)	Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of complex numbers and define $f: \mathcal{N} \to \mathbb{C}$ by
$f(n) = a_n$. Then $f$ is a complex-valued $\mathcal{P}(\mathcal{N})$-measurable function. □
Theorem 4.3 holds also for complex-valued $\mathcal{A}$-measurable functions.
That is, the collection of complex-valued $\mathcal{A}$-measurable functions forms a
(complex) algebra. See Exercise 4.32.
Extended Real-Valued Measurable Functions
In addition to real- and complex-valued functions, we frequently must deal
with extended real-valued functions, in other words, functions that
take values in $\mathcal{R}^* = \mathcal{R} \cup \{-\infty, \infty\}$. This is especially so when considering
suprema, infima, and limits. For instance, define
$$f_n(x) = \frac{n}{\sqrt{2\pi}}\, e^{-(nx)^2/2}$$
for $x \in \mathcal{R}$ and $n \in \mathcal{N}$. Then, as $n \to \infty$, $f_n(x) \to 0$ if $x \ne 0$ and
$f_n(0) \to \infty$. Consequently, the sequence, $\{f_n\}_{n=1}^{\infty}$, of real-valued functions
converges pointwise to the extended real-valued function, $f$, where
$$f(x) = \begin{cases} 0, & \text{if } x \ne 0; \\ \infty, & \text{if } x = 0. \end{cases}$$
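The behavior of this sequence can be seen numerically. The following Python sketch is just a quick check, with the sample points $x = 0$ and $x = 0.5$ chosen arbitrarily for illustration.

```python
import math

def f(n, x):
    """f_n(x) = n / sqrt(2*pi) * exp(-(n*x)**2 / 2)."""
    return n / math.sqrt(2 * math.pi) * math.exp(-((n * x) ** 2) / 2)

# Values at x = 0 grow without bound; values at x = 0.5 shrink toward 0.
for n in (1, 10, 100):
    print(n, f(n, 0.0), f(n, 0.5))
```

At $x = 0$ the exponential factor equals 1, so $f_n(0) = n/\sqrt{2\pi}$ grows linearly in $n$, while for fixed $x \ne 0$ the exponential decay dominates the factor $n$.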
Thus, we next consider measurability for extended real-valued functions.
Recall that, by definition, a real-valued function, $f$, is $\mathcal{A}$-measurable
if $f^{-1}(O) \in \mathcal{A}$ for all open sets $O \subset \mathcal{R}$. Also, by definition, a complex-valued
function, $f$, is $\mathcal{A}$-measurable if $f^{-1}(O) \in \mathcal{A}$ for all open sets $O \subset \mathbb{C}$.
Hence, once we identify the open sets of $\mathcal{R}^*$, we have a natural way to
define extended real-valued $\mathcal{A}$-measurable functions.
DEFINITION 4.6 Open Subsets of the Extended Real Numbers
A subset of $\mathcal{R}^*$ is said to be open if it can be expressed as a union of
intervals of the form $(a, b)$, $[-\infty, b)$, and $(a, \infty]$, where $a, b \in \mathcal{R}$.
DEFINITION 4.7 Extended Real-Valued Measurable Function
Let $(\Omega, \mathcal{A})$ be a measurable space. An extended real-valued function $f$
on $\Omega$ is said to be an $\mathcal{A}$-measurable function if the inverse image
of each open subset of $\mathcal{R}^*$ under $f$ is an $\mathcal{A}$-measurable set, that is, if
$f^{-1}(O) \in \mathcal{A}$ for all open sets $O \subset \mathcal{R}^*$.
The next proposition provides the analogue of Proposition 4.1 for ex-
tended real-valued functions. Its proof is left as an exercise.
PROPOSITION 4.3
Let $(\Omega, \mathcal{A})$ be a measurable space and $f$ an extended real-valued function
on $\Omega$. Then the following statements are equivalent:
a)	$f$ is $\mathcal{A}$-measurable.
b)	For each $a \in \mathcal{R}$, $f^{-1}([-\infty, a)) \in \mathcal{A}$.
c)	For each $a \in \mathcal{R}$, $f^{-1}((a, \infty]) \in \mathcal{A}$.
d)	For each $a \in \mathcal{R}$, $f^{-1}([-\infty, a]) \in \mathcal{A}$.
e)	For each $a \in \mathcal{R}$, $f^{-1}([a, \infty]) \in \mathcal{A}$.
Theorem 4.3 shows that the collection of real-valued $\mathcal{A}$-measurable
functions forms an algebra. In the case of extended real-valued functions,
if we adopt the convention that $\infty - \infty$ is some fixed extended real number,
then the collection of extended real-valued $\mathcal{A}$-measurable functions is
closed under addition, scalar multiplication, and multiplication. See
Exercises 4.39 and 4.40.
The next theorem shows that the collection of extended real-valued
$\mathcal{A}$-measurable functions is closed under maxima, minima, suprema, infima,
and pointwise limits. Note that Theorem 3.16 on page 130 is an immediate
consequence.
THEOREM 4.5
Suppose that $f$ and $g$ are extended real-valued $\mathcal{A}$-measurable functions and
that $\{f_n\}_{n=1}^{\infty}$ is a sequence of extended real-valued $\mathcal{A}$-measurable functions.
Then
a)	$f \vee g$ and $f \wedge g$ are $\mathcal{A}$-measurable.
b)	$\sup_n f_n$ and $\inf_n f_n$ are $\mathcal{A}$-measurable.
c)	$\limsup_{n\to\infty} f_n$ and $\liminf_{n\to\infty} f_n$ are $\mathcal{A}$-measurable.
d)	If $\{f_n\}_{n=1}^{\infty}$ converges pointwise, then $\lim_{n\to\infty} f_n$ is $\mathcal{A}$-measurable.
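PROOF:
a)	Let $h = f \vee g$ and $a \in \mathcal{R}$. Then $h^{-1}((a, \infty]) = f^{-1}((a, \infty]) \cup g^{-1}((a, \infty])$.
This union is in $\mathcal{A}$ because $f$ and $g$ are $\mathcal{A}$-measurable functions. Thus,
$f \vee g$ is $\mathcal{A}$-measurable. Similarly, $f \wedge g$ is $\mathcal{A}$-measurable.
b)	Let $h = \sup_n f_n$ and $a \in \mathcal{R}$. Then $h^{-1}((a, \infty]) = \bigcup_{n=1}^{\infty} f_n^{-1}((a, \infty])$.
This union is in $\mathcal{A}$ because each $f_n$ is an $\mathcal{A}$-measurable function. Hence,
we see that $\sup_n f_n$ is $\mathcal{A}$-measurable. Similarly, $\inf_n f_n$ is $\mathcal{A}$-measurable.
c)	Since $\limsup_{n\to\infty} f_n = \inf_n \sup_{k \ge n} f_k$, it follows from part (b) that
$\limsup_{n\to\infty} f_n$ is $\mathcal{A}$-measurable. Using an entirely similar argument,
we find that $\liminf_{n\to\infty} f_n$ is $\mathcal{A}$-measurable.
d)	If $\{f_n\}_{n=1}^{\infty}$ converges pointwise, then $\lim_{n\to\infty} f_n = \limsup_{n\to\infty} f_n$. So,
$\lim_{n\to\infty} f_n$ is $\mathcal{A}$-measurable by part (c).	■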
A common application of Theorem 4.5 occurs when $\{f_n\}_{n=1}^{\infty}$ is a sequence
of real-valued $\mathcal{A}$-measurable functions but one or more of $\inf_n f_n$,
$\sup_n f_n$, $\liminf_{n\to\infty} f_n$, $\limsup_{n\to\infty} f_n$, and $\lim_{n\to\infty} f_n$ are extended
real-valued $\mathcal{A}$-measurable functions.
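EXAMPLE 4.5 Illustrates Theorem 4.5
a)	Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$. Define
$$f_n(x) = \frac{n}{\sqrt{2\pi}}\, e^{-(nx)^2/2}$$
for $x \in \mathcal{R}$ and $n \in \mathcal{N}$. Then $\{f_n\}_{n=1}^{\infty}$ is a sequence of real-valued
$\mathcal{M}$-measurable functions and $f_n \to f$ pointwise, where
$$f(x) = \begin{cases} 0, & \text{if } x \ne 0; \\ \infty, & \text{if } x = 0. \end{cases}$$
By Theorem 4.5(d), $f$ is an extended real-valued $\mathcal{A}$-measurable function,
a fact that we can easily verify directly.
b)	Let $f$ be an extended real-valued $\mathcal{A}$-measurable function. By
Theorem 4.5(a), $|f|$ is $\mathcal{A}$-measurable since $|f| = f \vee (-f)$.	□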
Theorem 4.5(d) shows that if a sequence, $\{f_n\}_{n=1}^{\infty}$, of $\mathcal{A}$-measurable
functions converges pointwise to a function, $f$, then $f$ is an $\mathcal{A}$-measurable
function. What if the convergence is only almost everywhere? In general,
we cannot conclude that $f$ is $\mathcal{A}$-measurable; however, for complete measure
spaces we can.
PROPOSITION 4.4
Let $(\Omega, \mathcal{A}, \mu)$ be a complete measure space. Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence
of complex-valued or extended real-valued $\mathcal{A}$-measurable functions
and that $f_n \to f$ $\mu$-ae. Then $f$ is an $\mathcal{A}$-measurable function.
PROOF: The proof is essentially identical to that of Proposition 3.14 on
page 163 and is left to the reader.	■
EXERCISES 4.2
4.20	Prove Proposition 4.1 on page 175.
4.21	Let $(\Omega, \mathcal{A})$ be a measurable space and $f$ a real-valued function on $\Omega$. Prove
that $f$ is $\mathcal{A}$-measurable if and only if $f^{-1}(B) \in \mathcal{A}$ for each $B \in \mathcal{B}$.
4.22	Suppose that $(\Omega, \mathcal{A})$ is a measurable space and that $f: \Omega \to \mathcal{R}$ is an
$\mathcal{A}$-measurable function. Further suppose that $g: \mathcal{R} \to \mathcal{R}$ is a Borel measurable
function. Prove that $g \circ f$ is $\mathcal{A}$-measurable.
4.23	Let $D \in \mathcal{B}$. Show that C(D), the collection of Borel measurable functions
on $D$, is precisely the collection of $\mathcal{B}_D$-measurable functions.
4.24	Let $(\Omega, \mathcal{A})$ be a measurable space, $D \in \mathcal{A}$, and $\mathcal{A}_D = \{ D \cap A : A \in \mathcal{A} \}$.
a) If $f: \Omega \to \mathcal{R}$ is $\mathcal{A}$-measurable, show that $f|_D$ is $\mathcal{A}_D$-measurable.
b) Suppose that $g: D \to \mathcal{R}$ is $\mathcal{A}_D$-measurable. Define $f: \Omega \to \mathcal{R}$ by
$$f(x) = \begin{cases} g(x), & x \in D; \\ 0, & x \notin D. \end{cases}$$
Prove that $f$ is $\mathcal{A}$-measurable. (This shows that every $\mathcal{A}_D$-measurable
function can be extended to an $\mathcal{A}$-measurable function.)
4.25	Prove Proposition 4.2.
4.26	Provide an example to show that the hypothesis of completeness cannot be
omitted from Proposition 4.2.
4.27	If $O$ is an open subset of $\mathcal{R}$ and $\alpha$ is a nonzero real number, show that $\alpha^{-1}O$
is an open subset of $\mathcal{R}$.
4.28	Prove Theorem 4.4. Hint: Use the fact that each open set in $\mathbb{C}$ is a countable
union of open rectangles. [An open rectangle in $\mathbb{C}$ is a set of the form
$\{ u + iv \in \mathbb{C} : a < u < b,\ c < v < d \}$.]
4.29	Show that every real-valued $\mathcal{A}$-measurable function is a complex-valued
$\mathcal{A}$-measurable function.
4.30	The collection, $\mathcal{B}_2$, of Borel sets of $\mathbb{C}$ is defined to be the smallest $\sigma$-algebra
of subsets of $\mathbb{C}$ that contains all the open subsets of $\mathbb{C}$. Show that $f: \Omega \to \mathbb{C}$
is $\mathcal{A}$-measurable if and only if $f^{-1}(B) \in \mathcal{A}$ for all $B \in \mathcal{B}_2$.
4.31	Let $(\Omega, \mathcal{A}, P)$ be a probability space, $X$ a random variable on $\Omega$, and $t$ a
fixed real number. Define $g: \Omega \to \mathbb{C}$ by $g = e^{itX}$; that is, for each $x \in \Omega$,
$g(x) = e^{itX(x)}$. Prove that $g$ is $\mathcal{A}$-measurable. Is $g$ a random variable?
Explain your answer.
★4.32 Prove that the collection of complex-valued $\mathcal{A}$-measurable functions forms
a complex algebra. That is, if $f$ and $g$ are complex-valued $\mathcal{A}$-measurable
functions and $\alpha \in \mathbb{C}$, show that $f + g$, $\alpha f$, and $f \cdot g$ are complex-valued
$\mathcal{A}$-measurable functions.
4.33	Show that each open subset of $\mathcal{R}$ is also an open subset of $\mathcal{R}^*$.
4.34	Prove Proposition 4.3 on page 179. Hint: Show that each open set in $\mathcal{R}^*$
can be written as a countable union of intervals of the form $(a, b)$, $[-\infty, b)$,
and $(a, \infty]$, where $a, b \in \mathcal{R}$.
4.35	Show that $f: \Omega \to \mathcal{R}^*$ is $\mathcal{A}$-measurable if and only if (i) $f^{-1}(\{-\infty\})$ and
$f^{-1}(\{\infty\})$ are in $\mathcal{A}$ and (ii) $f^{-1}(B) \in \mathcal{A}$ for all $B \in \mathcal{B}$.
4.36	Show that every real-valued $\mathcal{A}$-measurable function is an extended
real-valued $\mathcal{A}$-measurable function.
4.37	Show that a set $O \subset \mathcal{R}$ is open in $\mathcal{R}$ if and only if there is an open subset
$U$ of $\mathcal{R}^*$ such that $O = \mathcal{R} \cap U$.
4.38	Suppose that $f$ and $g$ are extended real-valued $\mathcal{A}$-measurable functions.
Prove that the following three sets are $\mathcal{A}$-measurable:
a)	$\{ x : f(x) > g(x) \}$.
b)	$\{ x : f(x) \ge g(x) \}$.
c)	$\{ x : f(x) = g(x) \}$.
4.39	Suppose that $f$ and $g$ are extended real-valued $\mathcal{A}$-measurable functions and
that $\beta \in \mathcal{R}^*$. Set
$$E = \{ x : f(x) = \infty,\ g(x) = -\infty \} \cup \{ x : f(x) = -\infty,\ g(x) = \infty \}.$$
For $x \in E$, define $(f + g)(x) = \beta$; otherwise, define $(f + g)(x) = f(x) + g(x)$,
as usual. Prove that $f + g$ is $\mathcal{A}$-measurable.
4.40	With the convention established in the preceding exercise, prove that the
collection of extended real-valued $\mathcal{A}$-measurable functions is closed under
scalar multiplication and multiplication.
4.41	Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of extended real-valued $\mathcal{A}$-measurable
functions. Verify that $\{ x : \lim_{n\to\infty} f_n(x) \text{ exists} \}$ is an $\mathcal{A}$-measurable set.
4.42	Suppose $\{f_n\}_{n=1}^{\infty}$ is a sequence of complex-valued $\mathcal{A}$-measurable functions
that converges pointwise to a complex-valued function, $f$. Prove that $f$ is
$\mathcal{A}$-measurable.
4.43	Construct a sequence, $\{f_n\}_{n=1}^{\infty}$, of $\mathcal{A}$-measurable functions that converges
almost everywhere to a function, $f$, that is not $\mathcal{A}$-measurable. Hint: Take
$(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$ and do something with a non-Borel measurable subset
of the Cantor set.
4.44	Prove Proposition 4.4.
★4.45 Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of complex-valued $\mathcal{A}$-measurable functions.
Define
$$f(x) = \begin{cases} \lim_{n\to\infty} f_n(x), & \text{if } \lim_{n\to\infty} f_n(x) \text{ exists;} \\ 0, & \text{otherwise.} \end{cases}$$
Prove that $f$ is $\mathcal{A}$-measurable.
4.46	Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of complex-valued $\mathcal{A}$-measurable functions
and that $f_n \to g$ $\mu$-ae. Prove that there exists an $\mathcal{A}$-measurable
function, $f$, such that $f_n \to f$ $\mu$-ae. Note: $g$ need not be $\mathcal{A}$-measurable
unless, of course, $(\Omega, \mathcal{A}, \mu)$ is complete.
4.47	Suppose that $E$ is an open subset of $\mathbb{C}$ and that $g$ is a real-valued continuous
function on $E$. Further suppose that $f$ is a complex-valued $\mathcal{A}$-measurable
function on $\Omega$ with the range of $f$ being a subset of $E$. Prove that $g \circ f$ is a
real-valued $\mathcal{A}$-measurable function on $\Omega$. Repeat the proof if $E$ is a closed
subset of $\mathbb{C}$.
4.48	Suppose that $f: \Omega \to \mathbb{C}$ is $\mathcal{A}$-measurable. Verify that $f$ can be written
in the “polar” form, $f = Re^{i\Theta}$, where $R: \Omega \to [0, \infty)$ and $\Theta: \Omega \to \mathcal{R}$ are
$\mathcal{A}$-measurable functions.
4.3	THE ABSTRACT LEBESGUE INTEGRAL FOR
NONNEGATIVE FUNCTIONS
Now that we have discussed measure spaces and measurable functions,
we can proceed to develop the abstract Lebesgue integral, that is, the
Lebesgue integral on an arbitrary measure space, $(\Omega, \mathcal{A}, \mu)$. As we will see,
the development of the abstract Lebesgue integral is almost identical to
that of the Lebesgue integral on the real line, that is, on $(\mathcal{R}, \mathcal{M}, \lambda)$, given
in Chapter 3. Consequently, many of the proofs will be left to the reader.
Following the procedure in Chapter 3, we will first define the abstract
Lebesgue integral of a simple function, then of a nonnegative $\mathcal{A}$-measurable
function, and then of a real-valued $\mathcal{A}$-measurable function. In addition, we
will also define the abstract Lebesgue integral of extended real-valued and
complex-valued $\mathcal{A}$-measurable functions. Nonnegative functions will be
considered in this section and general functions in the next.
The Lebesgue Integral of a Nonnegative Simple Function
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. An $\mathcal{A}$-measurable function on $\Omega$ is called
a simple function if it takes on only finitely many values. More precisely,
we have the following definition.
DEFINITION 4.8 Simple Function and Canonical Representation
An $\mathcal{A}$-measurable function, $s$, is said to be a simple function if its
range is a finite set. Let $a_1, a_2, \ldots, a_n$ denote the distinct nonzero
values of $s$ and set $A_k = \{ x : s(x) = a_k \}$, $1 \le k \le n$. Then
\[ s = \sum_{k=1}^{n} a_k \chi_{A_k}. \]
This is called the canonical representation of $s$.
We leave it as an exercise for the reader to show that the sets, $A_1$, $A_2$,
..., $A_n$, appearing in the canonical representation of an $\mathcal{A}$-measurable
simple function, are $\mathcal{A}$-measurable and pairwise disjoint.
EXAMPLE 4.6 Illustrates Definition 4.8
a)	The Lebesgue measurable simple functions introduced in Chapter 3 are
$\mathcal{M}$-measurable simple functions in the sense of Definition 4.8.
b)	If $\Omega$ is a finite set, then every $\mathcal{A}$-measurable function is simple. □
In Definition 4.9, we give the definition of the abstract Lebesgue inte-
gral of a nonnegative $\mathcal{A}$-measurable simple function. It is a straightforward
generalization of the definition presented in Chapter 3 for the Lebesgue in-
tegral of a nonnegative Lebesgue measurable simple function.
DEFINITION 4.9 Integral of a Nonnegative Simple Function
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $s$ a nonnegative $\mathcal{A}$-measurable
simple function on $\Omega$ with canonical representation, $s = \sum_{k=1}^{n} a_k \chi_{A_k}$.
Then the (abstract) Lebesgue integral of $s$ over $\Omega$ with respect
to $\mu$ is defined by
\[ \int_\Omega s(x)\,d\mu(x) = \sum_{k=1}^{n} a_k \mu(A_k). \]
If $E \in \mathcal{A}$, then the (abstract) Lebesgue integral of $s$ over $E$ with
respect to $\mu$ is defined by
\[ \int_E s(x)\,d\mu(x) = \int_\Omega \chi_E(x) s(x)\,d\mu(x). \]
Note: The notations $\int_E s\,d\mu$ and $\int_E s(x)\,\mu(dx)$ are commonly used in place
of $\int_E s(x)\,d\mu(x)$.
The next proposition shows how we can obtain the abstract Lebesgue
integral of a nonnegative simple function from a possibly noncanonical
representation. The proof is identical to that of Proposition 3.8 on page 131.
PROPOSITION 4.5
Let $s$ be a nonnegative $\mathcal{A}$-measurable simple function that can be expressed
in the form, $s = \sum_{k=1}^{m} b_k \chi_{B_k}$, where this representation is not necessarily
canonical but $B_k \in \mathcal{A}$ for $1 \le k \le m$ and $B_i \cap B_j = \emptyset$ for $i \ne j$. Then
\[ \int_\Omega s(x)\,d\mu(x) = \sum_{k=1}^{m} b_k \mu(B_k). \]
More generally,
\[ \int_E s(x)\,d\mu(x) = \sum_{k=1}^{m} b_k \mu(B_k \cap E) \]
for each $E \in \mathcal{A}$.
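The formula in Proposition 4.5 can be checked by hand on a small example. The sketch below is only an illustration under stated assumptions: the four-point set, the point weights in `mu`, and the helper `integrate_simple` are hypothetical and do not come from the text. It evaluates $\sum_k b_k \mu(B_k \cap E)$ for a canonical and a noncanonical representation of the same simple function.

```python
from fractions import Fraction

def integrate_simple(terms, mu, E=None):
    """Return sum_k b_k * mu(B_k & E) for terms = [(b_k, B_k), ...].

    mu is a dict of point weights, so mu(A) is the sum of the weights in A.
    The sets B_k are assumed pairwise disjoint, as in Proposition 4.5.
    """
    total = Fraction(0)
    for b, B in terms:
        part = B if E is None else B & E
        total += b * sum((mu[x] for x in part), Fraction(0))
    return total

# Hypothetical measure on Omega = {1, 2, 3, 4}, given by point weights.
mu = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6), 4: Fraction(2)}

# s equals 5 on {1, 2, 3} and 7 on {4}: the canonical representation.
canonical = [(Fraction(5), frozenset({1, 2, 3})), (Fraction(7), frozenset({4}))]
# The same function with the level set {1, 2, 3} split into two disjoint pieces.
noncanonical = [(Fraction(5), frozenset({1})),
                (Fraction(5), frozenset({2, 3})),
                (Fraction(7), frozenset({4}))]

full_canonical = integrate_simple(canonical, mu)
full_noncanonical = integrate_simple(noncanonical, mu)
over_E = integrate_simple(noncanonical, mu, E=frozenset({2, 4}))
print(full_canonical, full_noncanonical, over_E)
```

Both representations give the same integral over the whole space, which is the content of the proposition in this small case.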
The following fact is proved in precisely the same way as Lemma 3.14
on page 133.
PROPOSITION 4.6
Suppose that $s$ and $t$ are nonnegative $\mathcal{A}$-measurable simple functions and
that $\alpha, \beta > 0$. Then $\alpha s + \beta t$ is a nonnegative $\mathcal{A}$-measurable simple function
and
\[ \int_E (\alpha s + \beta t)\,d\mu = \alpha \int_E s\,d\mu + \beta \int_E t\,d\mu \]
for each $E \in \mathcal{A}$.
The Lebesgue Integral of a Nonnegative
$\mathcal{A}$-measurable Function
The next thing on the agenda is the definition of the abstract Lebesgue inte-
gral for a nonnegative extended real-valued $\mathcal{A}$-measurable function. Propo-
sition 4.7 provides the motivation for that definition.
PROPOSITION 4.7
a)	Suppose that $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable
function on $\Omega$. Then there is a nondecreasing sequence of nonnega-
tive $\mathcal{A}$-measurable simple functions that converges pointwise to $f$. In
other words, there is a sequence, $\{s_n\}_{n=1}^\infty$, of nonnegative $\mathcal{A}$-measurable
simple functions such that, for all $x \in \Omega$, $s_1(x) \le s_2(x) \le \cdots$ and
$\lim_{n\to\infty} s_n(x) = f(x)$.
b)	If $\{s_n\}_{n=1}^\infty$ is a sequence of nonnegative $\mathcal{A}$-measurable simple functions
that converges pointwise on $\Omega$ to a function, $f$, then $f$ is a nonnegative
extended real-valued $\mathcal{A}$-measurable function.
PROOF: The proof is left as an exercise for the reader.
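Part (a) can be made concrete with the usual dyadic construction, $s_n = \min(2^{-n}\lfloor 2^n f \rfloor, n)$. This particular construction is an assumption here, since the text leaves the proof to the reader; the function name `s_n` and the sample values are hypothetical. The sketch evaluates the approximants at a few values of $f(x)$, including $\infty$, so that monotonicity and convergence can be inspected.

```python
import math

def s_n(f_value, n):
    """Value of the n-th dyadic simple approximant at a point where f = f_value >= 0."""
    if f_value >= n:                      # also covers f_value == math.inf
        return float(n)
    # Round f_value down to the nearest multiple of 2**-n.
    return math.floor(f_value * 2**n) / 2**n

# A few sample values of f(x), including an infinite one.
values = [0.0, 0.3, math.pi, 10.75, math.inf]
table = {v: [s_n(v, n) for n in range(1, 21)] for v in values}

for v, approximants in table.items():
    print(v, approximants[:4], "...", approximants[-1])
```

Each $s_n$ takes at most $n2^n + 1$ values, so it is simple, and it is measurable because its level sets are preimages of intervals under $f$.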
Proposition 4.7 shows that the functions that can be approximated by
nonnegative $\mathcal{A}$-measurable simple functions are precisely the nonnegative
extended real-valued $\mathcal{A}$-measurable functions. Thus, we make the following
definition.
DEFINITION 4.10 Lebesgue Integral of a Nonnegative Function
Let $f$ be a nonnegative extended real-valued $\mathcal{A}$-measurable function
on $\Omega$. Then the (abstract) Lebesgue integral of $f$ over $\Omega$ with
respect to $\mu$ is defined by
\[ \int_\Omega f(x)\,d\mu(x) = \sup_s \int_\Omega s(x)\,d\mu(x), \]
where the supremum is taken over all nonnegative $\mathcal{A}$-measurable sim-
ple functions that are dominated by $f$. If $E \in \mathcal{A}$, then the (abstract)
Lebesgue integral of $f$ over $E$ with respect to $\mu$ is defined by
\[ \int_E f(x)\,d\mu(x) = \int_\Omega \chi_E(x) f(x)\,d\mu(x). \]
Note: The abstract Lebesgue integral of a nonnegative $\mathcal{M}$-measurable func-
tion with respect to $\lambda$ is identical to its Lebesgue integral, as defined in
Chapter 3.
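Some of the more important properties of the abstract Lebesgue in-
tegral for nonnegative extended real-valued $\mathcal{A}$-measurable functions are
provided in Proposition 4.8. The proof is left as an exercise for the reader.
PROPOSITION 4.8
Let $f$ and $g$ be nonnegative extended real-valued $\mathcal{A}$-measurable functions
on $\Omega$, $\alpha > 0$, and $E \in \mathcal{A}$. Then
a)	$f \le g$ $\mu$-ae $\Rightarrow \int_E f\,d\mu \le \int_E g\,d\mu$.
b)	$B \subset E$ and $B \in \mathcal{A} \Rightarrow \int_B f\,d\mu \le \int_E f\,d\mu$.
c)	$f(x) = 0$ for all $x \in E \Rightarrow \int_E f\,d\mu = 0$.
d)	$\mu(E) = 0 \Rightarrow \int_E f\,d\mu = 0$.
e)	$\int_E \alpha f\,d\mu = \alpha \int_E f\,d\mu$.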
Convergence Properties of the Abstract Lebesgue Integral for
Nonnegative $\mathcal{A}$-measurable Functions
We now present two major convergence theorems for the abstract Lebesgue
integral of nonnegative extended real-valued $\mathcal{A}$-measurable functions: the
monotone convergence theorem (MCT) and Fatou's lemma. The proofs are
similar to those given in Section 3.6 (page 140 onward).
The MCT is stated first. Note that it applies to extended real-valued
$\mathcal{A}$-measurable functions as well as to real-valued $\mathcal{A}$-measurable functions.
THEOREM 4.6 Monotone Convergence Theorem (MCT)
Suppose that $\{f_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of nonnega-
tive extended real-valued $\mathcal{A}$-measurable functions. Then, for each $E \in \mathcal{A}$,
\[ \int_E \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty} \int_E f_n\,d\mu. \]
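COROLLARY 4.1
Let $f, g, f_1, f_2, \ldots$ be nonnegative extended real-valued $\mathcal{A}$-measurable
functions and let $E \in \mathcal{A}$. Then
a)	$\int_E (f + g)\,d\mu = \int_E f\,d\mu + \int_E g\,d\mu$.
b)	$\int_E \sum_{n=1}^\infty f_n\,d\mu = \sum_{n=1}^\infty \int_E f_n\,d\mu$.
c)	If $\{E_n\}_n \subset \mathcal{A}$ are pairwise disjoint, then $\int_{\bigcup_n E_n} f\,d\mu = \sum_n \int_{E_n} f\,d\mu$.
Proposition 4.8(e) and Corollary 4.1(a) together imply that if $f$ and $g$
are nonnegative extended real-valued $\mathcal{A}$-measurable functions and $\alpha, \beta > 0$,
then
\[ \int_\Omega (\alpha f + \beta g)\,d\mu = \alpha \int_\Omega f\,d\mu + \beta \int_\Omega g\,d\mu. \tag{4.2} \]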
Equation (4.2), Proposition 4.7, and the MCT are frequently used
together for "bootstrapping arguments." That is, suppose we want to
prove that a certain Lebesgue-integral property holds for all nonnegative
$\mathcal{A}$-measurable functions. To bootstrap, we employ three steps: First we
show that the property holds for characteristic functions of $\mathcal{A}$-measurable
sets; next we apply (4.2) to conclude that the property holds for nonnega-
tive simple functions; and then we use Proposition 4.7(a) and the MCT to
deduce that the property holds for all nonnegative $\mathcal{A}$-measurable functions.
Exercises 4.60 and 4.61 provide illustrations of bootstrapping.
Next we state Fatou's lemma. This version of Fatou's lemma not
only generalizes to arbitrary measure spaces the version presented in The-
orem 3.20 on page 146 but its hypotheses are less restrictive. Specifically,
it does not impose any convergence conditions on $\{f_n\}_{n=1}^\infty$.
THEOREM 4.7 Fatou's Lemma
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative extended real-valued
$\mathcal{A}$-measurable functions. Then, for each $E \in \mathcal{A}$,
\[ \int_E \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int_E f_n\,d\mu. \]
EXAMPLE 4.7 Illustrates the Abstract Lebesgue Integral
a)	Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f$ a nonnegative extended real-
valued $\mathcal{A}$-measurable function on $\Omega$. Suppose that $x_0 \in \Omega$ and that
$\{x_0\} \in \mathcal{A}$. We claim that
\[ \int_{\{x_0\}} f\,d\mu = f(x_0)\mu(\{x_0\}). \tag{4.3} \]
To see this, note that $\chi_{\{x_0\}} f$ is the simple function $f(x_0)\chi_{\{x_0\}}$ and,
hence, by Definition 4.9 on page 185,
\[ \int_{\{x_0\}} f\,d\mu = \int_\Omega \chi_{\{x_0\}} f\,d\mu = \int_\Omega f(x_0)\chi_{\{x_0\}}\,d\mu = f(x_0)\mu(\{x_0\}). \]
More generally, let $C = \{x_n\}_n$ be a countable subset of $\Omega$ such that
$\{x_n\} \in \mathcal{A}$ for each $n$. Then, by Corollary 4.1(c) and (4.3),
\[ \int_C f\,d\mu = \int_{\bigcup_n \{x_n\}} f\,d\mu = \sum_n \int_{\{x_n\}} f\,d\mu = \sum_n f(x_n)\mu(\{x_n\}). \tag{4.4} \]
b)	Consider the measure space $(\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$ where $\mu$ is counting measure
on $\mathcal{P}(\mathcal{N})$. Then, as we learned in Example 4.2(c), a nonnegative real-
valued $\mathcal{P}(\mathcal{N})$-measurable function, $f$, on $\mathcal{N}$ is a nonnegative infinite
sequence, $\{a_n\}_{n=1}^\infty$, where we have let $a_n = f(n)$. Thus, by (4.4),
\[ \int_{\mathcal{N}} f\,d\mu = \sum_{n=1}^\infty f(n)\mu(\{n\}) = \sum_{n=1}^\infty a_n. \]
Hence, we can apply abstract measure theory to study infinite series.
c)	Let $(\Omega, \mathcal{A}, P)$ be a probability space and $X$ a nonnegative random vari-
able. Then the abstract Lebesgue integral of $X$ over $\Omega$ with respect
to $P$ is called the mean (expectation, expected value) of $X$. The
mean of $X$ is denoted by $\mathcal{E}(X)$. Thus,
\[ \mathcal{E}(X) = \int_\Omega X\,dP. \]
For instance, consider the experiment of tossing a balanced coin twice.
An appropriate probability space for that experiment is $(\Omega, \mathcal{A}, P)$, where
$\Omega = \{\mathrm{HH}, \mathrm{HT}, \mathrm{TH}, \mathrm{TT}\}$, $\mathcal{A} = \mathcal{P}(\Omega)$ and, for $E \in \mathcal{A}$, $P(E) = N(E)/4$.
Let $X$ denote the number of heads obtained. Then, by (4.4), the mean
of $X$ equals
\[ \mathcal{E}(X) = \int_\Omega X\,dP = X(\mathrm{HH})P(\{\mathrm{HH}\}) + X(\mathrm{HT})P(\{\mathrm{HT}\}) + X(\mathrm{TH})P(\{\mathrm{TH}\}) + X(\mathrm{TT})P(\{\mathrm{TT}\}) \]
\[ = 2 \cdot \frac{1}{4} + 1 \cdot \frac{1}{4} + 1 \cdot \frac{1}{4} + 0 \cdot \frac{1}{4} = 1, \]
which is intuitively what it should be.
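d)	Let $\Omega$ be a set, $\{x_n\}_n$ a sequence of distinct elements of $\Omega$, and $\{b_n\}_n$
a sequence of nonnegative real numbers. For $E \subset \Omega$, define
\[ \mu(E) = \sum_{x_n \in E} b_n. \]
Then $\mu$ is a measure on $\mathcal{P}(\Omega)$. Let $f$ be a nonnegative function on $\Omega$
and set $C = \{x_n\}_n$. Then, by Corollary 4.1(c) on page 188, Proposi-
tion 4.8(d) on page 187, and (4.4),
\[ \int_\Omega f\,d\mu = \int_C f\,d\mu + \int_{C^c} f\,d\mu = \int_C f\,d\mu = \sum_n f(x_n)\mu(\{x_n\}) = \sum_n f(x_n) b_n. \tag{4.5} \]
We will employ (4.5) frequently.
e)	Let $\Omega$ be a set, $\mathcal{A} = \mathcal{P}(\Omega)$, and $\mu$ counting measure on $\mathcal{A}$. If $f$ is a
nonnegative function on $\Omega$, then
\[ \int_\Omega f\,d\mu = \sum_{x \in \Omega} f(x), \]
where $\sum_{x \in \Omega} f(x) = \sup \{ \sum_{x \in F} f(x) : F \text{ finite},\ F \subset \Omega \}$. The verifica-
tion of this is left to the reader.	□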
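The coin-tossing computation in part (c) is an instance of (4.4) on a four-point space, and it can be reproduced mechanically. The following is a minimal sketch; the variable names are illustrative and not from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a balanced coin: HH, HT, TH, TT.
omega = ["".join(toss) for toss in product("HT", repeat=2)]

# P({w}) = 1/4 for every outcome, so P(E) = N(E)/4.
P = {w: Fraction(1, len(omega)) for w in omega}

# X(w) = number of heads in the outcome w.
X = {w: w.count("H") for w in omega}

# Formula (4.4): the integral of X with respect to P is a weighted sum over points.
mean = sum(X[w] * P[w] for w in omega)
print(mean)
```

Replacing the weights in `P` by arbitrary nonnegative numbers $b_n$ turns the same sum into formula (4.5).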
EXERCISES 4.3
4.49	Establish that the sets appearing in the canonical representation of an
$\mathcal{A}$-measurable simple function are $\mathcal{A}$-measurable and pairwise disjoint.
4.50	Prove Proposition 4.7 on page 186. Hint: Refer to Proposition 3.9 on
page 134.
4.51	Suppose that $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable func-
tion on $\Omega$, $c > 0$, and $A_c = \{ x : f(x) > c \}$. Prove that
\[ \mu(A_c) \le \frac{1}{c} \int_\Omega f\,d\mu. \]
★4.52	Let $f$ be a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$
and $E \in \mathcal{A}$. Prove that $\int_E f\,d\mu = 0$ if and only if $f = 0$ $\mu$-ae on $E$.
★4.53	Suppose that $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable func-
tion on $\Omega$ and that $\int_\Omega f\,d\mu < \infty$. Show that $f$ is finite $\mu$-ae.
4.54	Prove Proposition 4.8 on page 187. Hint: Refer to Proposition 3.10 on
page 136.
4.55	Prove the MCT, Theorem 4.6 on page 188.
4.56	Show that for a fixed $E \in \mathcal{A}$, the conclusion of the MCT remains valid if
the hypotheses are satisfied only on $E$.
4.57	Prove Corollary 4.1 on page 188.
4.58	Suppose that $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable func-
tion on $\Omega$. Also, suppose that $\{E_n\}_{n=1}^\infty \subset \mathcal{A}$ with $E_1 \subset E_2 \subset \cdots$. Prove
that
\[ \int_{\bigcup_{n=1}^\infty E_n} f\,d\mu = \lim_{n\to\infty} \int_{E_n} f\,d\mu. \]
4.59	Prove Fatou's lemma, Theorem 4.7 on page 188.
4.60	Suppose that $(\Omega, \mathcal{A}, \mu)$ is a measure space, $D \in \mathcal{A}$, and $f$ is a nonnegative
extended real-valued $\mathcal{A}$-measurable function on $\Omega$. Let $(D, \mathcal{A}_D, \mu_D)$ be as
defined in Example 4.1(c) on page 168. Show that
\[ \int_D f\,d\mu = \int_D f_{|D}\,d\mu_D. \]
Hint: Use bootstrapping.
★4.61	Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $g$ a nonnegative $\mathcal{A}$-measurable function
on $\Omega$. For $E \in \mathcal{A}$, define
\[ \nu(E) = \int_E g\,d\mu. \]
a)	Show that $\nu$ is a measure on $\mathcal{A}$.
b)	Show that
\[ \int_\Omega f\,d\nu = \int_\Omega f g\,d\mu \]
for each nonnegative $\mathcal{A}$-measurable function, $f$. Hint: Bootstrap.
4.62	Let $\{a_{mn}\}_{m,n=1}^\infty$ be a double sequence of nonnegative numbers. Prove that
\[ \sum_{n=1}^\infty \sum_{m=1}^\infty a_{mn} = \sum_{m=1}^\infty \sum_{n=1}^\infty a_{mn}. \]
Hint: Refer to Example 4.7(b) on page 189.
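4.63	Let $f : \Omega \to [0,1]$ be an $\mathcal{A}$-measurable function.
a)	Prove that $\lim_{n\to\infty} \int_\Omega f^{1/n}\,d\mu = \mu(f^{-1}((0,1]))$.
b)	If $\mu(\Omega) < \infty$, prove that $\lim_{n\to\infty} \int_\Omega f^n\,d\mu = \mu(f^{-1}(\{1\}))$.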
4.4 THE GENERAL ABSTRACT LEBESGUE INTEGRAL
In the previous section, we discussed the abstract Lebesgue integral for
nonnegative extended real-valued $\mathcal{A}$-measurable functions. We will now
expand the definition of the abstract Lebesgue integral so that it applies
to $\mathcal{A}$-measurable functions that are not necessarily nonnegative. We begin
with extended real-valued functions.
Lebesgue Integral of an Extended Real-Valued Function
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. To define the abstract Lebesgue integral
of an extended real-valued $\mathcal{A}$-measurable function, $f$, on $\Omega$, we follow the
procedure used in Section 3.7 for defining the Lebesgue integral of a real-
valued Lebesgue measurable function on $\mathcal{R}$.
DEFINITION 4.11 Integral of an Extended Real-Valued Function
Let $f$ be an extended real-valued $\mathcal{A}$-measurable function on $\Omega$ and
$E \in \mathcal{A}$. Then the (abstract) Lebesgue integral of $f$ over $E$ with
respect to $\mu$ is defined by
\[ \int_E f\,d\mu = \int_E f^+\,d\mu - \int_E f^-\,d\mu \tag{4.6} \]
provided that the right-hand side makes sense; that is, at least one of
the integrals on the right-hand side of (4.6) is finite. Here $f^+ = f \vee 0$
and $f^- = -(f \wedge 0)$ denote the positive and negative parts of $f$, respec-
tively. In addition, we say that $f$ is Lebesgue integrable over $E$
if both integrals on the right-hand side of (4.6) are finite or, equiva-
lently, if
\[ \int_E |f|\,d\mu = \int_E f^+\,d\mu + \int_E f^-\,d\mu < \infty. \tag{4.7} \]
If $f$ is Lebesgue integrable over $\Omega$, then we say that $f$ is Lebesgue
integrable.
We should mention that if $f$ is Lebesgue integrable (over $\Omega$), then it
is Lebesgue integrable over every $E \in \mathcal{A}$. Here are some examples.
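EXAMPLE 4.8 Illustrates Definition 4.11
a)	Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $f(x) = x$. Then
\[ f^+(x) = \begin{cases} x, & x \ge 0; \\ 0, & x < 0. \end{cases} \qquad \text{and} \qquad f^-(x) = \begin{cases} 0, & x \ge 0; \\ -x, & x < 0. \end{cases} \]
(i)	If $E = \mathcal{R}$, then $\int_E f^+\,d\lambda = \int_E f^-\,d\lambda = \infty$. Hence, the integral,
$\int_E f\,d\lambda$, is not defined.
(ii)	If $E = [-1,2]$, then $\int_E f^+\,d\lambda = 2$ and $\int_E f^-\,d\lambda = 1/2$ so that
$\int_E f\,d\lambda = 2 - 1/2 = 3/2$. And, since $\int_E |f|\,d\lambda = 2 + 1/2 = 5/2 < \infty$,
we see that $f$ is Lebesgue integrable over $[-1,2]$.
(iii)	If $E = (-\infty, 1)$, then $\int_E f^+\,d\lambda = 1/2$ and $\int_E f^-\,d\lambda = \infty$ so that
$\int_E f\,d\lambda = 1/2 - \infty = -\infty$. However, as $\int_E |f|\,d\lambda = 1/2 + \infty = \infty$,
we see that $f$ is not Lebesgue integrable over $(-\infty, 1)$.
b)	Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$, where $\mu$ is counting measure on $\mathcal{P}(\mathcal{N})$.
Then real-valued $\mathcal{A}$-measurable functions are simply infinite sequences
of real numbers. Referring to Example 4.7(b) on page 189, we see that
a sequence of real numbers, $\{a_n\}_{n=1}^\infty$, is Lebesgue integrable (over $\mathcal{N}$) if
and only if
\[ \sum_{n=1}^\infty |a_n| < \infty, \tag{4.8} \]
that is, the series is absolutely convergent. For instance, the sequence,
$\{(-1)^n/n^p\}_{n=1}^\infty$, is Lebesgue integrable if and only if $p > 1$. Note that,
although $\sum_{n=1}^\infty (-1)^n/n$ converges, $\{(-1)^n/n\}_{n=1}^\infty$ is not Lebesgue inte-
grable as the series is not absolutely convergent.	□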
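The contrast in part (b) between convergence and absolute convergence can be seen numerically. The sketch below is only suggestive, since finite partial sums prove nothing about a limit, and the cutoff `N` is an arbitrary choice: the signed partial sum of $(-1)^n/n$ settles near $-\ln 2$, while the partial sum of $1/n$ keeps growing like $\ln N$.

```python
import math

N = 100_000  # arbitrary cutoff for the partial sums

# Partial sum of the conditionally convergent series sum (-1)^n / n.
signed = sum((-1) ** n / n for n in range(1, N + 1))

# Partial sum of the absolute values, i.e. the harmonic series.
absolute = sum(1 / n for n in range(1, N + 1))

print(signed, -math.log(2))   # the two numbers agree to several decimal places
print(absolute, math.log(N))  # the harmonic partial sum exceeds log(N)
```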
Lebesgue Integral of a Complex-Valued Function
Next we will define the abstract Lebesgue integral for complex-valued
$\mathcal{A}$-measurable functions. First some preliminaries.
DEFINITION 4.12 Modulus of a Complex-Valued Function
Let $f$ be a complex-valued function on $\Omega$. Then the modulus of $f$,
denoted by $|f|$, is defined to be the real-valued function
\[ |f| = \sqrt{(\Re f)^2 + (\Im f)^2}. \]
In other words, $|f|(x) = |f(x)|$, where $|f(x)|$ denotes the modulus of
the complex number $f(x)$.
The following two propositions will be required. We leave the proofs
as exercises for the reader.
PROPOSITION 4.9
Let $f$ be a complex-valued function on $\Omega$. Then
a)	$|f| \le |\Re f| + |\Im f|$.
b)	$|\Re f| \le |f|$ and $|\Im f| \le |f|$.
c)	$|f|$ is $\mathcal{A}$-measurable if $f$ is.
PROPOSITION 4.10
Let $f$ be a complex-valued $\mathcal{A}$-measurable function on $\Omega$ and $E \in \mathcal{A}$. Then
$|f|$ is Lebesgue integrable over $E$ if and only if both $\Re f$ and $\Im f$ are.
In view of Proposition 4.10 and the fact that $f = \Re f + i\,\Im f$, it is
reasonable to make the following definition.
DEFINITION 4.13 Integral of a Complex-Valued Function
Let $f$ be a complex-valued $\mathcal{A}$-measurable function on $\Omega$ and $E \in \mathcal{A}$.
We say that $f$ is Lebesgue integrable over $E$ with respect to $\mu$
if $|f|$ is Lebesgue integrable over $E$ with respect to $\mu$; that is,
\[ \int_E |f|\,d\mu < \infty. \]
In that case, the (abstract) Lebesgue integral of $f$ over $E$ with
respect to $\mu$ is defined by
\[ \int_E f\,d\mu = \int_E (\Re f)\,d\mu + i \int_E (\Im f)\,d\mu. \]
If $f$ is Lebesgue integrable over $\Omega$, then we say that $f$ is Lebesgue
integrable.
For a measure space, $(\Omega, \mathcal{A}, \mu)$, the collection of all complex-valued
Lebesgue integrable functions is denoted by $\mathcal{L}^1(\Omega, \mathcal{A}, \mu)$. When no con-
fusion will arise, we write $\mathcal{L}^1(\mu)$ for $\mathcal{L}^1(\Omega, \mathcal{A}, \mu)$.
EXAMPLE 4.9 Illustrates Definition 4.13
a)	Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $f(x) = e^{ix}/(1 + x^2)$. Then we have
$\Re f(x) = \cos x/(1 + x^2)$, $\Im f(x) = \sin x/(1 + x^2)$, and $|f(x)| = 1/(1 + x^2)$.
By Exercise 3.71 on page 147 and Theorem 3.23 on page 157,
\[ \int_{\mathcal{R}} |f(x)|\,d\lambda(x) = \int_{\mathcal{R}} (1 + x^2)^{-1}\,d\lambda(x) = \lim_{n\to\infty} \int_{[-n,n]} (1 + x^2)^{-1}\,d\lambda(x) \]
\[ = \lim_{n\to\infty} \int_{-n}^{n} \frac{dx}{1 + x^2} = \lim_{n\to\infty} \bigl(\arctan(n) - \arctan(-n)\bigr) = \pi < \infty. \]
Therefore, $f \in \mathcal{L}^1(\lambda)$.
b)	Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$, where $\mu$ is counting measure on $\mathcal{P}(\mathcal{N})$.
Then complex-valued $\mathcal{A}$-measurable functions are simply infinite se-
quences of complex numbers. Referring to Example 4.7(b) on page 189,
we see that a sequence of complex numbers, $\{a_n\}_{n=1}^\infty$, is in $\mathcal{L}^1(\mu)$ if
and only if the series, $\sum_{n=1}^\infty a_n$, converges absolutely. We point out
here that the notations, $\ell^1$ or $\ell^1(\mathcal{N})$, are generally used in place of
$\mathcal{L}^1(\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$.
c)	Let $(\Omega, \mathcal{A})$ be a measurable space. A measure, $\mu$, on $\mathcal{A}$ is said to be a
finite measure if $\mu(\Omega) < \infty$. If $\mu$ is a finite measure, then $(\Omega, \mathcal{A}, \mu)$
is called a finite measure space. For a finite measure space, each
bounded complex-valued $\mathcal{A}$-measurable function, $f$, is in $\mathcal{L}^1(\mu)$. Indeed,
if $|f| \le M$, then by Proposition 4.8(a) on page 187,
\[ \int_\Omega |f|\,d\mu \le \int_\Omega M\,d\mu = M\mu(\Omega) < \infty. \]
Note that boundedness is a sufficient but not necessary condition for
integrability. For instance, let $(\Omega, \mathcal{A}, \mu) = ((0,1), \mathcal{M}_{(0,1)}, \lambda_{(0,1)})$ and
$f(x) = x^{-1/2}$. Then $f$ is not bounded on $(0,1)$ but is in $\mathcal{L}^1(\lambda_{(0,1)})$.
d)	If $(\Omega, \mathcal{A}, P)$ is a probability space, then the integrable functions, that
is, members of $\mathcal{L}^1(P)$, are called random variables with finite mean or
finite expectation.	□
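The limit at the end of part (a) can be watched numerically. In the sketch below, the list of cutoffs is an arbitrary choice; it tabulates $\arctan(n) - \arctan(-n)$, which is the integral of $1/(1+x^2)$ over $[-n, n]$, and compares it with $\pi$.

```python
import math

# Integral of 1/(1 + x^2) over [-n, n], via the antiderivative arctan.
cutoffs = [1, 10, 100, 1000, 10**6]
truncated = [math.atan(n) - math.atan(-n) for n in cutoffs]

for n, value in zip(cutoffs, truncated):
    print(n, value, math.pi - value)   # the gap to pi shrinks roughly like 2/n
```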
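The following theorem, whose proof is left as an exercise, provides
some important properties of Lebesgue integrable functions.
THEOREM 4.8
Suppose that $f$ and $g$ are in $\mathcal{L}^1(\Omega, \mathcal{A}, \mu)$ and that $\alpha \in \mathbb{C}$. Then
a)	$f + g \in \mathcal{L}^1(\mu)$ and
\[ \int_\Omega (f + g)\,d\mu = \int_\Omega f\,d\mu + \int_\Omega g\,d\mu. \]
b)	$\alpha f \in \mathcal{L}^1(\mu)$ and
\[ \int_\Omega \alpha f\,d\mu = \alpha \int_\Omega f\,d\mu. \]
c)	If $f$ and $g$ are real-valued and $f \le g$ on $\Omega$, then $\int_\Omega f\,d\mu \le \int_\Omega g\,d\mu$.
d)	$\left| \int_\Omega f\,d\mu \right| \le \int_\Omega |f|\,d\mu$.
e)	$\mu(E) = 0 \Rightarrow \int_E f\,d\mu = 0$.
f)	If $A$ and $B$ are disjoint $\mathcal{A}$-measurable sets, then
\[ \int_{A \cup B} f\,d\mu = \int_A f\,d\mu + \int_B f\,d\mu. \]
Remark: Parts (a) and (b) of Theorem 4.8 together imply that if $\alpha, \beta \in \mathbb{C}$
and $f, g \in \mathcal{L}^1(\mu)$, then
\[ \int_\Omega (\alpha f + \beta g)\,d\mu = \alpha \int_\Omega f\,d\mu + \beta \int_\Omega g\,d\mu. \]
This is called the linearity property of the abstract Lebesgue integral.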
As mentioned in Section 3.8, we often encounter functions that are
only defined almost everywhere. Because the integral of an $\mathcal{A}$-measurable
function is not affected by its values on a set of measure zero, it is reasonable
to make the following definition.
DEFINITION 4.14 Integral of a Function Defined Almost Everywhere
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $f$ is a function de-
fined $\mu$-ae on $\Omega$; that is, if $D$ is the domain of $f$, then $\mu(D^c) = 0$.
Further suppose that there is an $\mathcal{A}$-measurable function, $g$, such that
$g(x) = f(x)$ for $x \in D$. Then, for $E \in \mathcal{A}$, we define the (abstract)
Lebesgue integral of $f$ over $E$ by
\[ \int_E f\,d\mu = \int_E g\,d\mu, \]
provided that the integral on the right-hand side exists.
Dominated Convergence Theorem
Theorem 3.22 on page 154 gives the dominated convergence theorem (DCT)
for real-valued functions on the measure space $(\mathcal{R}, \mathcal{M}, \lambda)$. In what follows,
we generalize the DCT so that it applies to complex-valued functions on
an arbitrary measure space $(\Omega, \mathcal{A}, \mu)$. Note that the version of the DCT
given here has weaker hypotheses than the one presented in Theorem 3.22.
THEOREM 4.9 Dominated Convergence Theorem (DCT)
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence
of complex-valued $\mathcal{A}$-measurable functions that converges $\mu$-ae. Further
suppose that there is a nonnegative Lebesgue integrable function, $g$, such
that $|f_n| \le g$ $\mu$-ae for each $n \in \mathcal{N}$. Then
\[ \int_E \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty} \int_E f_n\,d\mu \tag{4.9} \]
for each $E \in \mathcal{A}$.
PROOF: Without loss of generality, we can assume that, for each $n \in \mathcal{N}$,
$|f_n| \le g$ everywhere on $\Omega$ (why?). Define
\[ f(x) = \begin{cases} \lim_{n\to\infty} f_n(x), & \text{if } \lim_{n\to\infty} f_n(x) \text{ exists;} \\ 0, & \text{otherwise.} \end{cases} \]
Then, by Exercise 4.45 on page 183, $f$ is $\mathcal{A}$-measurable. Moreover, since
$\{f_n\}_{n=1}^\infty$ converges $\mu$-ae, $f_n \to f$ $\mu$-ae. From Definition 4.14, we see that
$\int_E \lim_{n\to\infty} f_n\,d\mu = \int_E f\,d\mu$ and, therefore, to prove (4.9) it suffices to prove
\[ \int_E f\,d\mu = \lim_{n\to\infty} \int_E f_n\,d\mu \tag{4.10} \]
for each $E \in \mathcal{A}$.
First suppose that each $f_n$ is real-valued. Then (4.10) can be proved
by employing Fatou's lemma (Theorem 4.7 on page 188) and the same
argument that was used in the proof of the DCT for the Lebesgue integral
on the real line (Theorem 3.22).
Next, we remove the restriction that each $f_n$ is real-valued. Note
that $\{|f - f_n|\}_{n=1}^\infty$ is a sequence of real-valued $\mathcal{A}$-measurable functions that
converges to 0 $\mu$-ae. Furthermore, for each $n \in \mathcal{N}$, we have $|f - f_n| \le
|f| + |f_n| \le 2g$, an integrable function. Consequently, by Theorem 4.8 and
the previous paragraph, as $n \to \infty$,
\[ \left| \int_E f\,d\mu - \int_E f_n\,d\mu \right| \le \int_E |f - f_n|\,d\mu \to \int_E 0\,d\mu = 0 \]
for each $E \in \mathcal{A}$. This completes the proof of the DCT.
Three of the many important corollaries of the DCT are given in what
follows. Several other corollaries are considered in the exercises.
COROLLARY 4.2
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable func-
tions such that
\[ \sum_{n=1}^\infty \int_\Omega |f_n|\,d\mu < \infty. \]
Then $\sum_{n=1}^\infty f_n$ converges $\mu$-ae and
\[ \int_E \sum_{n=1}^\infty f_n\,d\mu = \sum_{n=1}^\infty \int_E f_n\,d\mu \]
for each $E \in \mathcal{A}$.
PROOF: From Corollary 4.1(b) on page 188, we know that
\[ \int_\Omega \sum_{n=1}^\infty |f_n|\,d\mu = \sum_{n=1}^\infty \int_\Omega |f_n|\,d\mu. \]
By assumption, the sum on the right-hand side of the previous equation is
finite and, hence, so is the integral on the left-hand side. In other words,
if we set $g = \sum_{n=1}^\infty |f_n|$, then $g$ is Lebesgue integrable. From Exercise 4.53
on page 191, we conclude that $g$ is finite $\mu$-ae which, in turn, implies that
$\sum_{n=1}^\infty f_n$ converges $\mu$-ae.
Set $g_n = \sum_{k=1}^n f_k$. Then, for each $n \in \mathcal{N}$, $|g_n| \le g$ and, as we have
just seen, $\{g_n\}_{n=1}^\infty$ converges $\mu$-ae (to $\sum_{n=1}^\infty f_n$). Therefore, by the DCT
and Theorem 4.8(a),
\[ \int_E \sum_{n=1}^\infty f_n\,d\mu = \int_E \lim_{n\to\infty} g_n\,d\mu = \lim_{n\to\infty} \int_E g_n\,d\mu \]
\[ = \lim_{n\to\infty} \int_E \sum_{k=1}^n f_k\,d\mu = \lim_{n\to\infty} \sum_{k=1}^n \int_E f_k\,d\mu = \sum_{n=1}^\infty \int_E f_n\,d\mu \]
for each $E \in \mathcal{A}$.
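COROLLARY 4.3
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, $f \in \mathcal{L}^1(\mu)$, and $\{E_n\}_{n=1}^\infty$ a sequence of
$\mathcal{A}$-measurable sets with $E_1 \subset E_2 \subset \cdots$. Then
\[ \int_{\bigcup_{n=1}^\infty E_n} f\,d\mu = \lim_{n\to\infty} \int_{E_n} f\,d\mu. \]
PROOF: For convenience, let $E = \bigcup_{n=1}^\infty E_n$. It is easy to see that $\chi_{E_n} f \to
\chi_E f$ pointwise and that $|\chi_{E_n} f| \le |f| \in \mathcal{L}^1(\mu)$ for each $n \in \mathcal{N}$. Thus, by
the DCT,
\[ \int_E f\,d\mu = \int_\Omega \chi_E f\,d\mu = \lim_{n\to\infty} \int_\Omega \chi_{E_n} f\,d\mu = \lim_{n\to\infty} \int_{E_n} f\,d\mu, \]
as required.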
COROLLARY 4.4 Bounded Convergence Theorem
Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence
of uniformly bounded, complex-valued, $\mathcal{A}$-measurable functions that con-
verges $\mu$-ae. Then
\[ \int_E \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty} \int_E f_n\,d\mu \]
for each $E \in \mathcal{A}$.
PROOF: By assumption, there is a real number, $M$, such that $|f_n| \le M$
for all $n \in \mathcal{N}$. Because $(\Omega, \mathcal{A}, \mu)$ is a finite measure space, the function
$g(x) = M$ is Lebesgue integrable (why?). Applying the DCT completes
the proof.
EXAMPLE 4.10 Illustrates the DCT
a)	Suppose that for each $n \in \mathcal{N}$, $\{a_{nk}\}_{k=1}^\infty$ is a sequence of complex num-
bers and that $\lim_{n\to\infty} a_{nk} = a_k$ for each $k \in \mathcal{N}$. Further suppose
that there is a sequence of nonnegative numbers, $\{b_k\}_{k=1}^\infty$, such that
$\sum_{k=1}^\infty b_k < \infty$ and $|a_{nk}| \le b_k$ for $k, n \in \mathcal{N}$. We claim that
\[ \lim_{n\to\infty} \sum_{k=1}^\infty a_{nk} = \sum_{k=1}^\infty a_k. \tag{4.11} \]
Indeed, consider the measure space, $(\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$, where $\mu$ is counting
measure. Define $f_n(k) = a_{nk}$, $f(k) = a_k$, and $g(k) = b_k$. By assump-
tion, $g$ is integrable, $|f_n| \le g$ for all $n \in \mathcal{N}$, and $f_n \to f$ pointwise
on $\mathcal{N}$. Thus, by the DCT, $\int_{\mathcal{N}} f_n\,d\mu \to \int_{\mathcal{N}} f\,d\mu$ as $n \to \infty$. How-
ever, $\int_{\mathcal{N}} f_n\,d\mu = \sum_{k=1}^\infty a_{nk}$ and $\int_{\mathcal{N}} f\,d\mu = \sum_{k=1}^\infty a_k$ (see Exercise 4.73).
Thus, (4.11) holds.
Without a dominating integrable sequence, (4.11) may fail. For
instance, take $a_{nk} = \delta_{nk}$ and $a_k = 0$. Then $\lim_{n\to\infty} a_{nk} = a_k$ for each
$k \in \mathcal{N}$. But, as $\sum_{k=1}^\infty a_{nk} = 1$ for all $n \in \mathcal{N}$, we see that
\[ \lim_{n\to\infty} \sum_{k=1}^\infty a_{nk} = 1 \ne 0 = \sum_{k=1}^\infty a_k. \]
Therefore, (4.11) fails to hold.
b)	Let $(\Omega, \mathcal{A}, P)$ be a probability space and $X$ a real-valued random vari-
able having finite expectation, that is, $X \in \mathcal{L}^1(P)$. Define $f$ on $\mathcal{R}$
by $f(t) = \mathcal{E}(e^{itX})$. Note that the definition of $f$ makes sense because
$|e^{itX}| \le 1$. We claim that $f'(0) = i\mathcal{E}(X)$. To prove this, let $\{t_n\}_{n=1}^\infty$ be
an arbitrary sequence of nonzero real numbers that converges to 0. For
each $n \in \mathcal{N}$, define $Y_n = (e^{it_nX} - 1)/t_n$. Then (see Exercise 4.75)
\[ \frac{f(t_n) - f(0)}{t_n - 0} = \int_\Omega \frac{e^{it_nX} - 1}{t_n}\,dP = \int_\Omega Y_n\,dP. \tag{4.12} \]
Now, for $x \in \mathcal{R}$, we have $|e^{ix} - 1| \le |x|$ and, therefore, $|Y_n| \le |X|$ for
each $n \in \mathcal{N}$. As $Y_n \to iX$ pointwise on $\Omega$, we can apply the DCT to
conclude that
\[ \lim_{n\to\infty} \frac{f(t_n) - f(0)}{t_n - 0} = \lim_{n\to\infty} \int_\Omega Y_n\,dP = \int_\Omega iX\,dP = i\mathcal{E}(X). \]
Because $\{t_n\}_{n=1}^\infty$ is an arbitrary sequence of nonzero real numbers con-
verging to 0, it follows that $f'(0)$ exists and equals $i\mathcal{E}(X)$.	□
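The claim in part (b) can be checked on a finite probability space, where every expectation is a finite sum. The sketch below reuses the two-coin random variable of Example 4.7(c); that choice and the sequence of step sizes are assumptions made for illustration. It compares the difference quotient of $f(t) = \mathcal{E}(e^{itX})$ at 0 with $i\mathcal{E}(X)$.

```python
import cmath

# X = number of heads in two tosses of a balanced coin; each outcome has probability 1/4.
outcomes = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}
prob = 0.25

def char_fn(t):
    """f(t) = E(exp(i t X)), computed as a finite weighted sum."""
    return sum(prob * cmath.exp(1j * t * x) for x in outcomes.values())

mean = sum(prob * x for x in outcomes.values())

# Difference quotients (f(t_n) - f(0)) / (t_n - 0) along a sequence t_n -> 0.
steps = (1e-1, 1e-2, 1e-3, 1e-4)
quotients = [(char_fn(t) - char_fn(0.0)) / t for t in steps]
errors = [abs(q - 1j * mean) for q in quotients]

for t, q, e in zip(steps, quotients, errors):
    print(t, q, e)   # the error shrinks as t_n -> 0
```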
EXERCISES 4.4
4.64	Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$, where $\mu$ is counting measure on $\mathcal{P}(\mathcal{N})$. Define
$f(n) = (-1)^n/n$ for $n \in \mathcal{N}$. Is $\int_{\mathcal{N}} f\,d\mu$ defined? Explain your answer.
4.65	Prove Proposition 4.9 on page 194.
4.66	Prove Proposition 4.10 on page 194.
4.67	Let $f(x) = x^{-1/2}$. Show that $f \in \mathcal{L}^1((0,1), \mathcal{M}_{(0,1)}, \lambda_{(0,1)})$.
4.68	Prove Theorem 4.8 on page 196.
4.69	Prove that Definition 4.14 on page 197 is well-posed. In other words, assume
that $f$ is defined $\mu$-ae on $\Omega$ and that $g$ and $h$ are $\mathcal{A}$-measurable functions
that equal $f$ on its domain. Show that for $E \in \mathcal{A}$, either $\int_E g\,d\mu = \int_E h\,d\mu$
or neither integral exists.
4.70	Show that for a fixed $E \in \mathcal{A}$, the conclusion of the DCT remains valid if
the hypotheses are satisfied only on $E$.
4.71	State and prove a version of the DCT for extended real-valued $\mathcal{A}$-measurable
functions.
★4.72	Suppose that $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$. Further suppose that $\{E_n\}_{n=1}^\infty$ is a sequence
of pairwise disjoint $\mathcal{A}$-measurable sets. Prove that
\[ \int_{\bigcup_n E_n} f\,d\mu = \sum_n \int_{E_n} f\,d\mu. \]
★4.73	Let $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$ and $C = \{x_n\}_n$ a countable subset of $\Omega$ such that
$\{x_n\} \in \mathcal{A}$ for each $n$. Prove that
\[ \int_C f\,d\mu = \sum_n f(x_n)\mu(\{x_n\}). \]
Deduce that if $\{a_n\}_{n=1}^\infty \in \ell^1$, then
\[ \int_{\mathcal{N}} f\,d\mu = \sum_{n=1}^\infty a_n, \]
where $f(n) = a_n$.
4.74	Assume that $\sum_{k=1}^\infty a_k$ is a convergent series of nonnegative numbers and
that, for $n, k \in \mathcal{N}$, $b_{nk}$ are complex numbers with $|b_{nk}| \le M < \infty$. Also
assume that $\lim_{n\to\infty} b_{nk} = b_k$ for each $k \in \mathcal{N}$. Prove that
\[ \lim_{n\to\infty} \sum_{k=1}^\infty a_k b_{nk} = \sum_{k=1}^\infty a_k b_k. \]
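4.75	Provide a detailed justification of (4.12) on page 200.
4.76	Let $(\Omega, \mathcal{A}, P)$ be a probability space and $Y$ a real-valued random variable
taking on only finitely many values, say $y_1, y_2, \ldots, y_n$. Verify that
\[ \mathcal{E}(Y) = \sum_{k=1}^n y_k P(Y = y_k) \tag{4.13} \]
where, by convention, $\{Y = y\} = \{ x \in \Omega : Y(x) = y \}$. (Equation (4.13)
shows that the mean of a random variable, $Y$, taking on only finitely many
values is a weighted average of the values of $Y$, weighted according to their
probabilities.)
4.77	Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, $f \in \mathcal{L}^1(\mu)$, and $\{E_n\}_{n=1}^\infty$ a sequence of
$\mathcal{A}$-measurable sets with $E_1 \supset E_2 \supset \cdots$. Prove that
\[ \int_{\bigcap_{n=1}^\infty E_n} f\,d\mu = \lim_{n\to\infty} \int_{E_n} f\,d\mu. \]
4.78	Suppose that $f : [0,1] \times (0,1) \to \mathcal{R}$ is such that for each fixed $y \in (0,1)$, the
function, $f^{[y]}$, defined by $f^{[y]}(x) = f(x,y)$, is $\mathcal{M}_{[0,1]}$-measurable. Further
suppose that $\partial f/\partial y$ exists and is bounded on $[0,1] \times (0,1)$. Show that
\[ \frac{d}{dy} \int_0^1 f(x,y)\,dx = \int_0^1 \frac{\partial f}{\partial y}(x,y)\,dx. \]
4.79	Let $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$. Show that for each $\varepsilon > 0$, there is an $A \in \mathcal{A}$ with
$\mu(A) < \infty$ and $\int_{A^c} |f|\,d\mu < \varepsilon$.
★4.80	Suppose that $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$. Show that for each $\varepsilon > 0$, there is a $\delta > 0$
such that $\mu(E) < \delta \Rightarrow \int_E |f|\,d\mu < \varepsilon$.
★4.81	Let $f \in \mathcal{L}^1(\mathcal{R}, \mathcal{M}, \lambda)$. Then we define the Fourier transform of $f$, de-
noted $\hat{f}$, by
\[ \hat{f}(t) = \int_{\mathcal{R}} e^{-itx} f(x)\,d\lambda(x), \qquad t \in \mathcal{R}. \]
a)	Prove that $\hat{f}$ is continuous on $\mathcal{R}$.
b)	Prove that if $\int_{\mathcal{R}} |x f(x)|\,d\lambda(x) < \infty$, then $\hat{f}$ is differentiable on $\mathcal{R}$ and
\[ \hat{f}'(t) = \int_{\mathcal{R}} (-ix) e^{-itx} f(x)\,d\lambda(x), \qquad t \in \mathcal{R}. \]
★4.82	Suppose that $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$.
a)	Show that for each $\varepsilon > 0$, there is a bounded $\mathcal{A}$-measurable function, $g$,
such that $\int_\Omega |f - g|\,d\mu < \varepsilon$.
b)	Show that for each $\varepsilon > 0$, there is an $\mathcal{A}$-measurable simple function, $s$,
such that $\int_\Omega |f - s|\,d\mu < \varepsilon$.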
4.5 CONVERGENCE IN MEASURE
To this point, we have discussed three types of convergence for functions:
pointwise convergence, uniform convergence, and almost-everywhere con-
vergence. Another kind of convergence, important especially in probability
theory, is convergence in measure.† Here is the definition.
DEFINITION 4.15 Convergence in Measure
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\{f_n\}_{n=1}^\infty$ a sequence of complex-
valued $\mathcal{A}$-measurable functions on $\Omega$. Then $\{f_n\}_{n=1}^\infty$ is said to con-
verge in measure to the $\mathcal{A}$-measurable function $f$, if for each $\varepsilon > 0$,
\[ \lim_{n\to\infty} \mu(\{ x : |f(x) - f_n(x)| > \varepsilon \}) = 0. \]
We often write $f_n \overset{\mu}{\to} f$ to indicate convergence in measure. Thus,
$f_n \overset{\mu}{\to} f$ if the measure of the set where $f_n$ differs from $f$ by more than
any prescribed positive number tends to zero as $n \to \infty$.
A first question is whether there is a relationship between almost-
everywhere convergence and convergence in measure. The following exam-
ple shows that, generally speaking, there is no relationship.
† In probability theory, the terminology convergence in probability is used in place
of convergence in measure.
EXAMPLE 4.11 Illustrates Definition 4.15
a)	Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$. Set $f(x) = 0$ and $f_n(x) = x/n$ for $x \in \mathcal{R}$
and $n \in \mathcal{N}$. Then $f_n \to f$ pointwise and, hence, $\lambda$-ae. But $f_n \nrightarrow f$ in
measure. Indeed, for $\varepsilon > 0$,
\[ \{ x : |f(x) - f_n(x)| > \varepsilon \} = (-\infty, -n\varepsilon) \cup (n\varepsilon, \infty), \]
which has infinite Lebesgue measure for every $n \in \mathcal{N}$. Therefore, we
see that $\lambda(\{ x : |f(x) - f_n(x)| > \varepsilon \}) \nrightarrow 0$. Hence, almost-everywhere
convergence does not imply convergence in measure.
b)	Let $(\Omega, \mathcal{A}, \mu) = ([0,1], \mathcal{M}_{[0,1]}, \lambda_{[0,1]})$. Define $f_1 = \chi_{[0,1]}$, $f_2 = \chi_{[0,1/2]}$,
$f_3 = \chi_{[1/2,1]}$ and, in general, if $n = k + 2^j$, where $0 \le k < 2^j$, define
$f_n = \chi_{[k2^{-j},(k+1)2^{-j}]}$. Then for $\varepsilon > 0$,
\[ \mu(\{ x : |f_n(x)| > \varepsilon \}) \le \frac{2}{n} \to 0 \]
as $n \to \infty$. So, $f_n \overset{\mu}{\to} 0$. But, for each $x \in [0,1]$, the sequence,
$\{f_n(x)\}_{n=1}^\infty$, contains infinitely many 1s and infinitely many 0s. Thus,
$\{f_n\}_{n=1}^\infty$ converges for no $x \in [0,1]$ and, in particular, $f_n \nrightarrow 0$ $\lambda$-ae. □
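The sequence in part (b) is easy to generate, which makes both of its features visible: the supporting intervals shrink, yet they keep sweeping across $[0,1]$. The sketch below assumes the indexing $n = k + 2^j$ with $0 \le k < 2^j$ described in the example; the helper names and the sample point $x = 1/3$ are hypothetical choices.

```python
from fractions import Fraction

def interval(n):
    """Return (left, right) for the dyadic interval whose indicator is f_n."""
    j = n.bit_length() - 1          # largest j with 2**j <= n
    k = n - 2**j                    # so n = k + 2**j with 0 <= k < 2**j
    return Fraction(k, 2**j), Fraction(k + 1, 2**j)

def f(n, x):
    """Value of f_n at x, where f_n is the indicator of interval(n)."""
    left, right = interval(n)
    return 1 if left <= x <= right else 0

x = Fraction(1, 3)                  # sample point in [0, 1]
indices = range(1, 2**10)
values = [f(n, x) for n in indices]
lengths = [interval(n)[1] - interval(n)[0] for n in indices]

print(values[:15])                  # both 0s and 1s keep appearing
print(lengths[-1])                  # measure of the set where f_n is nonzero
```

Every block of indices $2^j \le n < 2^{j+1}$ covers $[0,1]$ once, which is why $f_n(x)$ returns to 1 in each block while the interval length $2^{-j}$ tends to 0.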
Example 4.11(a) shows that, in general, convergence almost every-
where does not imply convergence in measure. For finite measure spaces,
however, the implication is correct.
PROPOSITION 4.11
Suppose that $(\Omega, \mathcal{A}, \mu)$ is a finite measure space and that $\{f_n\}_{n=1}^\infty$ is a
sequence of complex-valued $\mathcal{A}$-measurable functions that converges $\mu$-ae to
the $\mathcal{A}$-measurable function $f$. Then $f_n \overset{\mu}{\to} f$.
PROOF: Let $B = \{ x : f_n(x) \nrightarrow f(x) \}$. Then, by assumption, $\mu(B) = 0$.
For $\varepsilon > 0$, define $E_n = \{ x : |f(x) - f_n(x)| > \varepsilon \}$ and $E = \bigcap_{n=1}^\infty \left( \bigcup_{k=n}^\infty E_k \right)$.
We must show that $\lim_{n\to\infty} \mu(E_n) = 0$.
Note that $x \in E$ if and only if $x \in E_n$ for infinitely many $n$. It
follows easily that $E \subset B$ and, hence, $\mu(E) = 0$. Because $\mu(\Omega) < \infty$ and
$\bigcup_{k=n}^\infty E_k \supset \bigcup_{k=n+1}^\infty E_k$ for each $n \in \mathcal{N}$, we conclude from Theorem 4.1(c)
on page 170 that
\[ \limsup_{n\to\infty} \mu(E_n) \le \lim_{n\to\infty} \mu\left( \bigcup_{k=n}^\infty E_k \right) = \mu(E) = 0. \]
Hence, $\lim_{n\to\infty} \mu(E_n) = 0$, as required.
As we discovered in Example 4.11(b), convergence in measure does not
imply almost-everywhere convergence. However, we do have the following
useful result.
PROPOSITION 4.12
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable func-
tions that converges in measure to the $\mathcal{A}$-measurable function $f$. Then
there is a subsequence, $\{f_{n_k}\}_{k=1}^\infty$, of $\{f_n\}_{n=1}^\infty$ such that $f_{n_k} \to f$ $\mu$-ae.
PROOF: We can, for each $k \in \mathcal{N}$, choose an $n_k \in \mathcal{N}$ such that
$$\mu\bigl(\{ x : |f(x) - f_{n_k}(x)| > 1/k \}\bigr) < 2^{-k}. \tag{4.14}$$
Furthermore, $\{n_k\}_{k=1}^\infty$ can be selected so that $n_1 < n_2 < \cdots$. Now let
$E_k = \{ x : |f(x) - f_{n_k}(x)| > k^{-1} \}$ and $E = \bigcap_{n=1}^\infty \bigl(\bigcup_{k=n}^\infty E_k\bigr)$. Note that
$x \in E$ if and only if $|f(x) - f_{n_k}(x)| > k^{-1}$ for infinitely many $k$.
From (4.14), we see that $\sum_{k=1}^\infty \mu(E_k) < \infty$ and, consequently, by
Exercise 4.14 on page 174, $\mu(E) = 0$. We claim that $f_{n_k} \to f$ on $E^c$. So
let $x \in E^c$ and $\epsilon > 0$ be given. Choose $k_1 \in \mathcal{N}$ so that $k_1^{-1} < \epsilon$. Since
$x \notin E$, it follows that there is a $k_2 \in \mathcal{N}$ such that $x \notin E_k$ for $k \ge k_2$.
Let $K = \max\{k_1, k_2\}$. Then we have that $|f(x) - f_{n_k}(x)| \le k^{-1} < \epsilon$ for
all $k \ge K$.
The DCT for Convergence in Measure
By employing Proposition 4.12, we can prove that the dominated con-
vergence theorem remains valid when almost-everywhere convergence is
replaced by convergence in measure. That is, we have the following result:
THEOREM 4.10
Let $(\Omega,\mathcal{A},\mu)$ be a measure space. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence
of complex-valued $\mathcal{A}$-measurable functions that converges in measure to
the $\mathcal{A}$-measurable function $f$. Further suppose that there is a nonnegative
Lebesgue integrable function, $g$, such that $|f_n| \le g$ $\mu$-ae for each $n \in \mathcal{N}$.
Then
$$\int_E f \, d\mu = \lim_{n\to\infty} \int_E f_n \, d\mu \tag{4.15}$$
for each $E \in \mathcal{A}$.
PROOF: Let $E \in \mathcal{A}$. To prove (4.15) it suffices, by Exercise 2.32 on
page 56, to show that every subsequence of $\{\int_E f_n \, d\mu\}_{n=1}^\infty$ has a subse-
quence that converges to $\int_E f \, d\mu$. So, let $\{n_k\}_{k=1}^\infty$ be a subsequence of $\mathcal{N}$.
Because $f_n \xrightarrow{\mu} f$, it is clear that $f_{n_k} \xrightarrow{\mu} f$. Applying Proposition 4.12, we
deduce that $\{f_{n_k}\}_{k=1}^\infty$ has a subsequence, $\{f_{n_{k_j}}\}_{j=1}^\infty$, with $f_{n_{k_j}} \to f$ $\mu$-ae.
Clearly, we have $|f_{n_{k_j}}| \le g$ $\mu$-ae for each $j \in \mathcal{N}$ and, hence, by the DCT
(Theorem 4.9 on page 197),
$$\int_E f \, d\mu = \lim_{j\to\infty} \int_E f_{n_{k_j}} \, d\mu.$$
This completes the proof.
EXERCISES 4.5
4.83 Show that if $f_n \xrightarrow{\mu} f$ and $f_n \xrightarrow{\mu} g$, then $f = g$ $\mu$-ae.
★4.84 Suppose that $f, f_1, f_2, \ldots$ are in $\mathcal{L}^1(\Omega,\mathcal{A},\mu)$ and that $\int_\Omega |f - f_n| \, d\mu \to 0$ as
$n \to \infty$. Show that $f_n \to f$ in measure.
4.85 Let $(\Omega,\mathcal{A},\mu)$ be a measure space. A sequence, $\{f_n\}_{n=1}^\infty$, of complex-valued
$\mathcal{A}$-measurable functions on $\Omega$ is said to converge almost uniformly to the
complex-valued $\mathcal{A}$-measurable function, $f$, if for each $\epsilon > 0$, there is a set
$A \in \mathcal{A}$ such that $\mu(A) < \epsilon$ and $f_n \to f$ uniformly on $A^c$.
a) Prove that almost-uniform convergence implies convergence in measure;
that is, if $f_n \to f$ almost uniformly, then $f_n \to f$ in measure.
b) Prove that almost-uniform convergence implies almost-everywhere con-
vergence; that is, if $f_n \to f$ almost uniformly, then $f_n \to f$ $\mu$-ae.
c) Does almost-uniform convergence imply pointwise convergence? Justify
your answer.
4.86 Provide a detailed justification for all statements in Example 4.11(b).
4.87 Let $f, g, f_1, f_2, \ldots$ be complex-valued $\mathcal{A}$-measurable functions. Suppose
that $f_n \to g$ $\mu$-ae and that $f_n \to f$ in measure. Prove that $f = g$ $\mu$-ae and,
hence, that $f_n \to f$ $\mu$-ae.
4.88 Fatou's lemma for convergence in measure: Suppose that $\{f_n\}_{n=1}^\infty$
is a sequence of nonnegative $\mathcal{A}$-measurable functions that converges in mea-
sure to $f$. Prove that
$$\int_E f \, d\mu \le \liminf_{n\to\infty} \int_E f_n \, d\mu$$
for each $E \in \mathcal{A}$. Hint: Select a subsequence of $\{\int_E f_n \, d\mu\}_{n=1}^\infty$ that con-
verges to $\liminf_{n\to\infty} \int_E f_n \, d\mu$.
4.89 Establish the following fact: If $\{f_n\}_{n=1}^\infty$ converges in measure, then it is also
Cauchy in measure, that is, for each $\epsilon > 0$, $\mu(\{ x : |f_n(x) - f_m(x)| > \epsilon \}) \to 0$
as $m, n \to \infty$.
4.90 Prove the following strengthened version of Proposition 4.12. Suppose that
$\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions that con-
verges in measure to $f$. Then there is a subsequence, $\{f_{n_k}\}_{k=1}^\infty$, of $\{f_n\}_{n=1}^\infty$
such that $f_{n_k} \to f$ almost uniformly. Hint: Show that there is a subse-
quence $\{n_k\}_{k=1}^\infty$ of $\mathcal{N}$ such that
$$\mu\bigl(\{ x : |f_{n_k}(x) - f_{n_{k+1}}(x)| > 2^{-k} \}\bigr) < 2^{-k}.$$
You will also need to apply the Weierstrass M-test.
4.91 Suppose that $(\Omega,\mathcal{A},\mu)$ is a measure space and that $f, f_1, f_2, \ldots$ are
complex-valued $\mathcal{A}$-measurable functions on $\Omega$. Show that
$$\Bigl\{ x : \lim_{n\to\infty} f_n(x) = f(x) \Bigr\} = \bigcap_{m=1}^\infty \Bigl( \bigcup_{n=1}^\infty \Bigl( \bigcap_{k=n}^\infty \bigl\{ x : |f(x) - f_k(x)| < \tfrac{1}{m} \bigr\} \Bigr) \Bigr).$$
★4.92 Suppose that $(\Omega,\mathcal{A},\mu)$ is a finite measure space and that $f, f_1, f_2, \ldots$ are
complex-valued $\mathcal{A}$-measurable functions on $\Omega$. Show that $f_n \to f$ $\mu$-ae if
and only if for each $\epsilon > 0$,
$$\lim_{n\to\infty} \mu\Bigl( \bigcup_{k=n}^\infty \{ x : |f(x) - f_k(x)| > \epsilon \} \Bigr) = 0.$$
Compare this equation with the definition of convergence in measure.
4.93 Suppose $(\Omega,\mathcal{A},\mu)$ is a finite measure space and $\{f_n\}_{n=1}^\infty$ is a sequence of
complex-valued $\mathcal{A}$-measurable functions that converges in measure to $f$.
Further suppose $g\colon \mathbb{C} \to \mathbb{C}$ is continuous. Prove that $g \circ f_n \to g \circ f$ in mea-
sure. Hint: For a given $\epsilon > 0$, let $a_n = \mu(\{ x : |g(f(x)) - g(f_n(x))| > \epsilon \})$.
Show that each subsequence of $\{a_n\}_{n=1}^\infty$ has a subsequence that converges
to 0.
4.94 Egorov's theorem: Suppose that $(\Omega,\mathcal{A},\mu)$ is a finite measure space and
that $f, f_1, f_2, \ldots$ are complex-valued $\mathcal{A}$-measurable functions on $\Omega$. Prove
that if $f_n \to f$ $\mu$-ae, then $f_n \to f$ almost uniformly.
4.6 EXTENSIONS TO MEASURES
In Chapter 3, the concept of length was extended and replaced by that of
measure. Specifically, we began with the collection of intervals and the set
function, $\ell$, that assigns to each interval its length. The problem was to
extend $\ell$ to a measure defined on a $\sigma$-algebra of subsets of $\mathcal{R}$ that contains
all intervals. We proceeded as follows: First we extended the concept of
length to all subsets of $\mathcal{R}$ by defining Lebesgue outer measure, $\lambda^*$:
$$\lambda^*(A) = \inf\Bigl\{ \sum_n \ell(I_n) : \{I_n\}_n \text{ open intervals},\ \bigcup_n I_n \supset A \Bigr\}. \tag{4.16}$$
Then we defined the Lebesgue measurable sets, $\mathcal{M}$, to be the collection of
subsets $E$ of $\mathcal{R}$ that satisfy
$$\lambda^*(W) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c) \tag{4.17}$$
for all $W \subset \mathcal{R}$. Finally, we proved that $\mathcal{M}$ is a $\sigma$-algebra containing all
intervals and that the set function, $\lambda = \lambda^*|_{\mathcal{M}}$, is a measure on $\mathcal{M}$ satisfying
$\lambda(I) = \ell(I)$ for all intervals $I$. Thus, Lebesgue measure, $\lambda$, provided the
required extension of length.
In this section, we will use our experience from Chapter 3 to handle
more general situations. Suppose then that $\Omega$ is a set, $\mathcal{C}$ is a nonempty
collection of subsets of $\Omega$, and $\iota$ is a nonnegative extended real-valued set
function on $\mathcal{C}$. Our two primary questions are:
Question 1: Can $\iota$ be extended to a measure on a $\sigma$-algebra containing $\mathcal{C}$?
Question 2: If such an extension exists, when is it unique?
We begin by considering Question 1.
Necessary Conditions; Semialgebras
First we will obtain some necessary conditions on $\iota$ for an affirmative answer
to Question 1. So, assume that $\iota$ can be extended to a measure, $\mu$, on a
$\sigma$-algebra, $\mathcal{A} \supset \mathcal{C}$. Then, by Definition 4.1 on page 168, Theorem 4.1 on
page 170, and the fact that $\mu$ is an extension of $\iota$, we must have
(E1) If $\emptyset \in \mathcal{C}$, then $\iota(\emptyset) = 0$.
(E2) If $\{C_k\}_{k=1}^n$ is a finite sequence of pairwise disjoint members of $\mathcal{C}$
whose union is in $\mathcal{C}$, then
$$\iota\Bigl(\bigcup_{k=1}^n C_k\Bigr) = \sum_{k=1}^n \iota(C_k).$$
(E3) If $C, C_1, C_2, \ldots$ are in $\mathcal{C}$ and $C \subset \bigcup_n C_n$, then
$$\iota(C) \le \sum_n \iota(C_n).$$
Conditions (E1)-(E3) are necessary conditions for the extension of $\iota$
to a measure on a $\sigma$-algebra containing $\mathcal{C}$. In other words, unless those
three conditions hold, such an extension is impossible. Remarkably, as we
will see, if $\mathcal{C}$ is a semialgebra (defined in what follows), then those three
conditions are also sufficient for the extension.
DEFINITION 4.16 Semialgebra of Subsets
Let $\Omega$ be a set. A nonempty collection, $\mathcal{C}$, of subsets of $\Omega$ is called a
semialgebra if the following conditions hold:
a) If $A, B \in \mathcal{C}$, then $A \cap B \in \mathcal{C}$.
b) If $C \in \mathcal{C}$, then there is a pairwise disjoint finite (possibly empty)
sequence of members of $\mathcal{C}$ whose union is $C^c$.
In words, $\mathcal{C}$ is a semialgebra if it is closed under intersection and the
complement of each member of $\mathcal{C}$ is a finite (possibly empty) disjoint
union of members of $\mathcal{C}$.
In what follows, we present a few examples of semialgebras. The jus-
tifications are left as exercises for the reader.
EXAMPLE 4.12 Illustrates Definition 4.16
a) Any algebra and, hence, any $\sigma$-algebra is a semialgebra.
b) Suppose that $\Omega$ is a finite set. Let $\mathcal{C}$ denote the collection of sets consist-
ing of the empty set and all singleton sets, that is, sets of the form $\{x\}$,
where $x \in \Omega$. Then $\mathcal{C}$ is a semialgebra.
c) Let $\mathcal{I}$ denote the collection of all intervals of $\mathcal{R}$, including intervals of
the form $(a, a)$ and $[a, a]$. Then $\mathcal{I}$ is a semialgebra of subsets of $\mathcal{R}$.
d) Let $\mathcal{I}_n$ denote the collection of all $n$-dimensional intervals in $\mathcal{R}^n$; that
is, all sets of the form $I_1 \times I_2 \times \cdots \times I_n$ where $I_j \in \mathcal{I}$ for $1 \le j \le n$.
Then $\mathcal{I}_n$ is a semialgebra of subsets of $\mathcal{R}^n$. □
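For a finite set $\Omega$, the two conditions of Definition 4.16 can be checked by brute force. The following is a minimal sketch, not part of the original text; the function names `is_disjoint_union` and `is_semialgebra` are illustrative assumptions, and the search is exponential in the size of the collection, so it is suitable only for very small examples such as Example 4.12(b).

```python
from itertools import combinations

def is_disjoint_union(target, family):
    """True if target is a union of pairwise disjoint members of family (empty union allowed)."""
    members = [s for s in family if s and s <= target]
    for r in range(len(members) + 1):
        for combo in combinations(members, r):
            union = frozenset().union(*combo)
            # With union == target, equal total size means the pieces do not overlap.
            if union == target and sum(len(s) for s in combo) == len(target):
                return True
    return False

def is_semialgebra(omega, family):
    """Check conditions (a) and (b) of the semialgebra definition on a finite set omega."""
    family = set(family)
    if not family:
        return False
    closed = all((a & b) in family for a in family for b in family)
    complements = all(is_disjoint_union(omega - c, family) for c in family)
    return closed and complements

omega = frozenset({1, 2, 3})
singletons = {frozenset()} | {frozenset({x}) for x in omega}
print(is_semialgebra(omega, singletons))
print(is_semialgebra(omega, {frozenset(), frozenset({1})}))
```

The second collection fails condition (b): the complement of the empty set is all of $\Omega$, which is not a disjoint union of members of that collection.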
Existence of an Extension
Suppose now that $\Omega$ is a set, $\mathcal{C}$ is a semialgebra of subsets of $\Omega$, and $\iota$ is
a nonnegative extended real-valued set function on $\mathcal{C}$ satisfying Condi-
tions (E1)-(E3). As we mentioned earlier, under those assumptions, there
exists an extension of $\iota$ to a measure, $\mu$, on a $\sigma$-algebra, $\mathcal{A}$, containing $\mathcal{C}$.
To obtain the extension, we will mimic the procedure used in Chapter 3
for extending the concept of length.
The first step is to extend $\iota$ to all subsets of $\Omega$ using (4.16) on page 207
as a guide. This is done in Definition 4.17.
DEFINITION 4.17 Outer Measure
Let $\Omega$ be a set, $\mathcal{C}$ a semialgebra of subsets of $\Omega$, and $\iota$ a nonnegative ex-
tended real-valued set function on $\mathcal{C}$ satisfying Conditions (E1)-(E3).
Then the set function, $\mu^*$, defined on $\mathcal{P}(\Omega)$ by $\mu^*(\emptyset) = 0$ and
$$\mu^*(A) = \inf\Bigl\{ \sum_n \iota(C_n) : \{C_n\}_n \subset \mathcal{C},\ \bigcup_n C_n \supset A \Bigr\}$$
for $A \ne \emptyset$, is called the outer measure induced by $\iota$ and $\mathcal{C}$.
The next example provides some illustrations of outer measure. The
details of verification are left to the reader as exercises.
EXAMPLE 4.13 Illustrates Definition 4.17
a) Suppose that $\Omega = \{x_1, x_2, \ldots, x_n\}$ is a finite set and $\{a_1, a_2, \ldots, a_n\}$ are
nonnegative real numbers. Let $\mathcal{C}$ denote the collection of sets consisting
of the empty set and all singleton sets. Define $\iota$ on $\mathcal{C}$ by $\iota(\emptyset) = 0$ and
$\iota(\{x_k\}) = a_k$ for $1 \le k \le n$. Then Conditions (E1)-(E3) hold and
$$\mu^*(A) = \sum_{x_k \in A} a_k$$
for each $A \subset \Omega$.
b) Let $\mathcal{I}$ denote the collection of all intervals of $\mathcal{R}$, including degenerate
intervals of the form $(a, a)$ and $[a, a]$. Take $\Omega = \mathcal{R}$, $\mathcal{C} = \mathcal{I}$, and $\iota = \ell$
(= length). Then Conditions (E1)-(E3) hold and $\mu^* = \lambda^*$; that is, the
outer measure induced by $\ell$ and $\mathcal{I}$ is Lebesgue outer measure.
c) Let $\mathcal{I}_n$ denote the collection of all $n$-dimensional intervals in $\mathcal{R}^n$. Take
$\Omega = \mathcal{R}^n$, $\mathcal{C} = \mathcal{I}_n$, and $\iota = \ell_n$ = volume; that is, for $I_1 \times I_2 \times \cdots \times I_n \in \mathcal{I}_n$,
$\ell_n(I_1 \times I_2 \times \cdots \times I_n) = \ell(I_1)\ell(I_2) \cdots \ell(I_n)$. Then Conditions (E1)-(E3)
hold. The outer measure induced by $\ell_n$ and $\mathcal{I}_n$ is called $n$-dimensional
Lebesgue outer measure and is denoted by $\lambda_n^*$. □
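The formula in Example 4.13(a) can be checked directly against Definition 4.17 on a small finite set. The following is a minimal sketch, not part of the original text; the function name `outer_measure` and the sample weights are illustrative assumptions. It searches only over covers by distinct singletons, which is enough here because the weights are nonnegative, so repeating a set or adding the empty set never lowers the total.

```python
from itertools import combinations

def outer_measure(A, weights):
    """Brute-force mu*(A) when C is the empty set plus the singletons of Omega = weights.keys()."""
    A = frozenset(A)
    if not A:
        return 0                      # mu*(empty set) = 0 by definition
    singletons = [frozenset({x}) for x in weights]
    best = None
    for r in range(1, len(singletons) + 1):
        for cover in combinations(singletons, r):
            if A <= frozenset().union(*cover):
                cost = sum(weights[next(iter(s))] for s in cover)
                best = cost if best is None else min(best, cost)
    return best

# Hypothetical weights a_k attached to the points x_k.
weights = {"x1": 2, "x2": 5, "x3": 0, "x4": 7}
print(outer_measure({"x1", "x4"}, weights))
```

For every subset $A$ of the sample set, the infimum over covers agrees with the sum of the weights of the points of $A$, as the example asserts.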
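Some basic properties of outer measure are provided by the following
proposition. Note that part (a) of the proposition shows that $\mu^*$ is indeed
an extension of $\iota$.
PROPOSITION 4.13
The outer measure, $\mu^*$, induced by $\iota$ and $\mathcal{C}$ satisfies
a) $\mu^*|_{\mathcal{C}} = \iota$; that is, $\mu^*(C) = \iota(C)$ for $C \in \mathcal{C}$.
b) $\mu^*(A) \ge 0$, for all $A \subset \Omega$. (nonnegativity)
c) $A \subset B \Rightarrow \mu^*(A) \le \mu^*(B)$. (monotonicity)
d) $\mu^*\bigl(\bigcup_n A_n\bigr) \le \sum_n \mu^*(A_n)$. (countable subadditivity)
PROOF: We leave the proofs of parts (b) and (c) as exercises.
a) Let $C \in \mathcal{C}$. If $C = \emptyset$, then, by Condition (E1) and Definition 4.17,
$\iota(\emptyset) = 0 = \mu^*(\emptyset)$. So, assume $C \ne \emptyset$. Since $\{C\} \subset \mathcal{C}$ and $C \supset C$, we
have $\mu^*(C) \le \iota(C)$. On the other hand, if $\{C_n\}_n \subset \mathcal{C}$ and $\bigcup_n C_n \supset C$,
then, by Condition (E3), $\iota(C) \le \sum_n \iota(C_n)$. Thus, $\iota(C) \le \mu^*(C)$.
d) If $\mu^*(A_n) = \infty$ for some $n$, then, by part (b), the required inequality
holds. So, we can assume that $\mu^*(A_n) < \infty$ for all $n$. Let $\epsilon > 0$ be
given. For each $n$, choose $\{C_{nk}\}_k \subset \mathcal{C}$ such that $\bigcup_k C_{nk} \supset A_n$ and
$\sum_k \iota(C_{nk}) < \mu^*(A_n) + \epsilon/2^n$. Then $\{C_{nk}\}_{n,k} \subset \mathcal{C}$, $\bigcup_{n,k} C_{nk} \supset \bigcup_n A_n$
and, therefore,
$$\mu^*\Bigl(\bigcup_n A_n\Bigr) \le \sum_{n,k} \iota(C_{nk}) = \sum_n \sum_k \iota(C_{nk}) \le \sum_n \Bigl(\mu^*(A_n) + \frac{\epsilon}{2^n}\Bigr) \le \sum_n \mu^*(A_n) + \epsilon.$$
Because $\epsilon > 0$ was arbitrarily chosen, $\mu^*\bigl(\bigcup_n A_n\bigr) \le \sum_n \mu^*(A_n)$.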
We have now completed the first step in obtaining the extension of $\iota$
to a measure on a $\sigma$-algebra containing $\mathcal{C}$; namely, the construction of the
outer measure, $\mu^*$, which is an extension of $\iota$ to all subsets of $\Omega$. The
second step is to restrict $\mu^*$ to an appropriate $\sigma$-algebra of subsets of $\Omega$ so
as to ensure countable additivity. Thus, with (4.17) on page 207 in mind,
we make the following definition.
DEFINITION 4.18 Measurable Sets
A set $E \subset \Omega$ is said to be $\mu^*$-measurable if
$$\mu^*(W) = \mu^*(W \cap E) + \mu^*(W \cap E^c) \tag{4.18}$$
for all subsets $W$ of $\Omega$. The collection of all $\mu^*$-measurable sets is
denoted by $\mathcal{A}$.
EXAMPLE 4.14 Illustrates Definition 4.18
a) Suppose that $\Omega = \{x_1, x_2, \ldots, x_n\}$ is a finite set and $\{a_1, a_2, \ldots, a_n\}$ are
nonnegative real numbers. Let $\mathcal{C}$ denote the collection of sets consisting
of the empty set and all singleton sets. Define $\iota$ on $\mathcal{C}$ by $\iota(\emptyset) = 0$ and
$\iota(\{x_k\}) = a_k$ for $1 \le k \le n$. Referring to Example 4.13(a), it is easy to
see that $\mathcal{A} = \mathcal{P}(\Omega)$. In other words, all subsets of $\Omega$ are $\mu^*$-measurable.
b) Take $\Omega = \mathcal{R}$, $\mathcal{C} = \mathcal{I}$, and $\iota = \ell$. Then, by Example 4.13(b), $\mu^* = \lambda^*$.
Hence, in this case, the $\mu^*$-measurable sets are the Lebesgue measurable
sets; that is, $\mathcal{A} = \mathcal{M}$.
c) Take $\Omega = \mathcal{R}^n$, $\mathcal{C} = \mathcal{I}_n$, and $\iota = \ell_n$ = volume. Then the $\mu^*$-measurable
(i.e., $\lambda_n^*$-measurable) sets are called n-dimensional Lebesgue mea-
surable sets and the collection of all such sets is denoted by $\mathcal{M}_n$. □
We claim that the set function, $\mu = \mu^*|_{\mathcal{A}}$, is the required extension of $\iota$.
To verify this, we must now establish three facts: $\mathcal{A} \supset \mathcal{C}$, $\mathcal{A}$ is a $\sigma$-algebra,
and $\mu$ is a measure on $\mathcal{A}$. The proofs of these facts are considered in the
following three propositions.
PROPOSITION 4.14
Every set $C \in \mathcal{C}$ is $\mu^*$-measurable. That is, $\mathcal{A} \supset \mathcal{C}$.
PROOF: Let $C \in \mathcal{C}$. We must show that (4.18) holds with $E = C$. Because
of countable subadditivity (Proposition 4.13(d)), it suffices to prove that
$$\mu^*(W) \ge \mu^*(W \cap C) + \mu^*(W \cap C^c). \tag{4.19}$$
If $C = \emptyset$, it is trivial. So assume $C \ne \emptyset$. If $\mu^*(W) = \infty$, then clearly
(4.19) holds. So, assume that $\mu^*(W) < \infty$. Let $\epsilon > 0$ be given. Choose
$\{C_n\}_n \subset \mathcal{C}$ such that $W \subset \bigcup_n C_n$ and
$$\sum_n \iota(C_n) < \mu^*(W) + \epsilon. \tag{4.20}$$
Now, $W \cap C \subset \bigcup_n (C_n \cap C)$ and, hence, by Proposition 4.13,
$$\mu^*(W \cap C) \le \sum_n \mu^*(C_n \cap C) = \sum_n \iota(C_n \cap C). \tag{4.21}$$
Also, we have $W \cap C^c \subset \bigcup_n (C_n \cap C^c)$ and, so, Proposition 4.13 implies that
$\mu^*(W \cap C^c) \le \sum_n \mu^*(C_n \cap C^c)$. Since $C \in \mathcal{C}$ and $\mathcal{C}$ is a semialgebra, there
exist a finite number of pairwise disjoint members of $\mathcal{C}$, say $A_1, \ldots, A_m$,
such that $C^c = \bigcup_{k=1}^m A_k$. Then, for each $n$, $C_n \cap C^c = \bigcup_{k=1}^m (C_n \cap A_k)$
and, therefore, $\mu^*(C_n \cap C^c) \le \sum_{k=1}^m \mu^*(C_n \cap A_k) = \sum_{k=1}^m \iota(C_n \cap A_k)$.
Consequently,
$$\mu^*(W \cap C^c) \le \sum_n \sum_{k=1}^m \iota(C_n \cap A_k). \tag{4.22}$$
Because of (4.21) and (4.22), we can conclude that
$$\mu^*(W \cap C) + \mu^*(W \cap C^c) \le \sum_n \iota(C_n \cap C) + \sum_n \sum_{k=1}^m \iota(C_n \cap A_k) = \sum_n \Bigl( \iota(C_n \cap C) + \sum_{k=1}^m \iota(C_n \cap A_k) \Bigr). \tag{4.23}$$
But, $C_n = (C_n \cap C) \cup (C_n \cap C^c) = (C_n \cap C) \cup \bigl(\bigcup_{k=1}^m (C_n \cap A_k)\bigr)$, which is
a finite disjoint union of members of $\mathcal{C}$. Thus, by Condition (E2),
$$\iota(C_n) = \iota(C_n \cap C) + \sum_{k=1}^m \iota(C_n \cap A_k). \tag{4.24}$$
Substituting the left-hand side of (4.24) for the right-hand side in (4.23)
and employing (4.20), we can conclude that
$$\mu^*(W \cap C) + \mu^*(W \cap C^c) \le \sum_n \iota(C_n) < \mu^*(W) + \epsilon.$$
As $\epsilon > 0$ was arbitrary, we see that (4.19) holds.
PROPOSITION 4.15
$\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$.
PROOF: The proof is a duplication of the one given for Theorem 3.11 on
page 120 with $\mathcal{M}$ replaced by $\mathcal{A}$ and $\lambda^*$ replaced by $\mu^*$.
PROPOSITION 4.16
Let $\mu = \mu^*|_{\mathcal{A}}$. Then $\mu$ is a measure on $\mathcal{A}$.
PROOF: Since, by definition, $\mu^*(\emptyset) = 0$, it follows that $\mu(\emptyset) = 0$. Also, by
Proposition 4.13(b), $\mu^*(A) \ge 0$ for all $A \subset \Omega$ and, hence, $\mu(A) \ge 0$ for all
$A \in \mathcal{A}$. To show that $\mu$ is countably additive, we duplicate the proof of
Theorem 3.12 on page 122, replacing $\mathcal{M}$ by $\mathcal{A}$, $\lambda^*$ by $\mu^*$, and $\lambda$ by $\mu$.
We have now established that $\mu$ is the required extension of $\iota$. As an
added bonus, it turns out that the measure space, $(\Omega,\mathcal{A},\mu)$, is complete.
To see this, let $A \in \mathcal{A}$ with $\mu(A) = 0$. We must show that if $B \subset A$, then
$B \in \mathcal{A}$. By the monotonicity of $\mu^*$, we have $\mu^*(B) \le \mu^*(A) = \mu(A) = 0$.
Therefore, $\mu^*(B) = 0$. Now, let $W \subset \Omega$. Then $\mu^*(W \cap B) \le \mu^*(B) = 0$
and $\mu^*(W \cap B^c) \le \mu^*(W)$. Thus,
$$\mu^*(W) \ge \mu^*(W \cap B^c) = \mu^*(W \cap B) + \mu^*(W \cap B^c),$$
which implies that $B \in \mathcal{A}$.
The results that we have obtained so far are summarized in the fol-
lowing theorem.
THEOREM 4.11 Extension Theorem
Suppose $\Omega$ is a set, $\mathcal{C}$ is a semialgebra of subsets of $\Omega$, and $\iota$ is a nonneg-
ative extended real-valued function on $\mathcal{C}$ satisfying Conditions (E1)-(E3)
on page 208. Let $\mu^*$ denote the outer measure induced by $\iota$ and $\mathcal{C}$, $\mathcal{A}$ the
collection of $\mu^*$-measurable sets, and $\mu = \mu^*|_{\mathcal{A}}$. Then $\mathcal{A}$ is a $\sigma$-algebra,
$\mathcal{A} \supset \mathcal{C}$, $\mu$ is a measure on $\mathcal{A}$, and $\mu|_{\mathcal{C}} = \iota$. Moreover, the measure space,
$(\Omega,\mathcal{A},\mu)$, is complete.
An important application of Theorem 4.11 is to n-dimensional Lebesgue
measure: Let $\Omega = \mathcal{R}^n$, $\mathcal{C} = \mathcal{I}_n$, and $\iota = \ell_n$ = volume. Then $\mu^* = \lambda_n^*$ and
$\mathcal{A} = \mathcal{M}_n$. The restriction of $\lambda_n^*$ to $\mathcal{M}_n$ is denoted by $\lambda_n$ and is called
n-dimensional Lebesgue measure.
Uniqueness of an Extension
Theorem 4.11 states, in particular, that $\iota$ has an extension to a measure on
a $\sigma$-algebra containing $\mathcal{C}$, thus answering Question 1 on page 208. Now we
will consider Question 2, the question of uniqueness: Under the assump-
tions of Theorem 4.11, is an extension of $\iota$ to a measure on a $\sigma$-algebra
containing $\mathcal{C}$ unique? In general, the answer to the uniqueness question is
no (see, for instance, Exercise 4.107). However, under certain conditions,
we can establish uniqueness results. We now proceed to do that.
To begin, we define two collections of subsets of $\Omega$ associated with $\mathcal{C}$:
$\mathcal{C}_\sigma$ denotes the collection of all subsets of $\Omega$ that are countable unions of
members of $\mathcal{C}$; in other words, $E \in \mathcal{C}_\sigma$ if and only if there exists $\{C_n\}_n \subset \mathcal{C}$
such that $E = \bigcup_n C_n$. $\mathcal{C}_{\sigma\delta}$ denotes the collection of all subsets of $\Omega$ that
are countable intersections of members of $\mathcal{C}_\sigma$; in other words, $F \in \mathcal{C}_{\sigma\delta}$ if
and only if there exists $\{E_n\}_n \subset \mathcal{C}_\sigma$ such that $F = \bigcap_n E_n$.
Next, we establish three lemmas that are required in order for us to
prove a uniqueness theorem.
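LEMMA 4.1
Let $A \subset \Omega$.
a) Given $\epsilon > 0$, there is an $E \in \mathcal{C}_\sigma$ with $E \supset A$ and $\mu^*(E) \le \mu^*(A) + \epsilon$.
b) There is an $F \in \mathcal{C}_{\sigma\delta}$ such that $F \supset A$ and $\mu^*(F) = \mu^*(A)$.
PROOF:
a) If $\mu^*(A) = \infty$, then also $\mu^*(\Omega) = \infty$. The required result now follows
because $\Omega \in \mathcal{C}_\sigma$. (Why?) So, assume that $\mu^*(A) < \infty$. Then there
exists $\{C_n\}_n \subset \mathcal{C}$ such that $\bigcup_n C_n \supset A$ and $\sum_n \iota(C_n) < \mu^*(A) + \epsilon$. Let
$E = \bigcup_n C_n$. Then $E \in \mathcal{C}_\sigma$, $E \supset A$, and
$$\mu^*(E) \le \sum_n \mu^*(C_n) = \sum_n \iota(C_n) < \mu^*(A) + \epsilon.$$
b) If $\mu^*(A) = \infty$, take $F = \Omega$. So, assume that $\mu^*(A) < \infty$. By part (a),
we can, for each $n \in \mathcal{N}$, choose $E_n \in \mathcal{C}_\sigma$ such that $E_n \supset A$ and
$\mu^*(E_n) \le \mu^*(A) + 1/n$. Let $F = \bigcap_{n=1}^\infty E_n$. Then $F \in \mathcal{C}_{\sigma\delta}$ and $F \supset A$. In
particular, then, $\mu^*(F) \ge \mu^*(A)$. On the other hand, because $F \subset E_n$,
we have $\mu^*(F) \le \mu^*(E_n) \le \mu^*(A) + 1/n$ for all $n \in \mathcal{N}$. Therefore,
$\mu^*(F) \le \mu^*(A)$.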
LEMMA 4.2
The algebra generated by $\mathcal{C}$, $\mathcal{A}_0(\mathcal{C})$, consists of the empty set and all finite
disjoint unions of members of $\mathcal{C}$.
PROOF: Let $\mathcal{D}$ denote the collection of sets consisting of the empty set and
all finite disjoint unions of members of $\mathcal{C}$. We must prove that $\mathcal{D} = \mathcal{A}_0(\mathcal{C})$.
Clearly any algebra of sets containing $\mathcal{C}$ must contain $\mathcal{D}$; so, $\mathcal{D} \subset \mathcal{A}_0(\mathcal{C})$.
To establish the reverse inclusion, it suffices to prove that $\mathcal{D}$ is an algebra,
because $\mathcal{A}_0(\mathcal{C})$ is the smallest algebra containing $\mathcal{C}$.
First we show that $\mathcal{D}$ is closed under finite intersections. So, suppose
$A \in \mathcal{D}$ and $B \in \mathcal{D}$. We claim that $A \cap B \in \mathcal{D}$. If either $A$ or $B$ is empty,
then $A \cap B = \emptyset \in \mathcal{D}$. So, assume neither $A$ nor $B$ is empty. Then there
exists a pairwise disjoint sequence, $\{A_i\}_{i=1}^m$, of members of $\mathcal{C}$ such that
$A = \bigcup_{i=1}^m A_i$ and a pairwise disjoint sequence, $\{B_j\}_{j=1}^n$, of members of $\mathcal{C}$
such that $B = \bigcup_{j=1}^n B_j$. Consequently,
$$A \cap B = \bigcup_{i=1}^m \Bigl( A_i \cap \Bigl( \bigcup_{j=1}^n B_j \Bigr) \Bigr) = \bigcup_{i=1}^m \bigcup_{j=1}^n (A_i \cap B_j).$$
Since $A_i, B_j \in \mathcal{C}$, we have $A_i \cap B_j \in \mathcal{C}$. Moreover, since the $A_i$s and $B_j$s
are each pairwise disjoint, so are the $(A_i \cap B_j)$s. Hence, $A \cap B$ is a finite
disjoint union of members of $\mathcal{C}$ and, consequently, is a member of $\mathcal{D}$.
Next, we show that $\mathcal{D}$ is closed under complementation. Assume that
$A \in \mathcal{D}$. If $A = \emptyset$, then $A^c = \Omega \in \mathcal{D}$. (Why?) If $A \ne \emptyset$, then there
exists a pairwise disjoint sequence, $\{A_i\}_{i=1}^m$, of members of $\mathcal{C}$ such that
$A = \bigcup_{i=1}^m A_i$. Since $A_i \in \mathcal{C}$, $A_i^c$ is a finite disjoint union of members of $\mathcal{C}$;
hence, $A_i^c \in \mathcal{D}$. From the previous paragraph, we know $\mathcal{D}$ is closed under
finite intersections. Thus, $A^c = \bigcap_{i=1}^m A_i^c \in \mathcal{D}$.
LEMMA 4.3
Let $E \in \mathcal{C}_\sigma$. Then $E$ can be written as a countable disjoint union of
members of $\mathcal{C}$.
PROOF: By definition, there exists $\{C_n\}_n \subset \mathcal{C}$ such that $E = \bigcup_n C_n$. In
particular, $\{C_n\}_n \subset \mathcal{A}_0(\mathcal{C})$. Let $D_1 = C_1$ and $D_n = C_n \setminus \bigcup_{k=1}^{n-1} C_k$ for $n \ge 2$.
Then the $D_n$s are pairwise disjoint and $E = \bigcup_n D_n$. Moreover, $D_n \in \mathcal{A}_0(\mathcal{C})$
for each $n$. Without loss of generality, we can assume that $D_n \ne \emptyset$ for all $n$.
Since $D_n \in \mathcal{A}_0(\mathcal{C})$, we know by Lemma 4.2 that there is a finite sequence,
$\{E_{nj}\}_{j=1}^{k_n}$, of pairwise disjoint members of $\mathcal{C}$ such that $D_n = \bigcup_{j=1}^{k_n} E_{nj}$. It
follows that $\{E_{11}, \ldots, E_{1k_1}, E_{21}, \ldots, E_{2k_2}, \ldots\}$ is a countable collection of
pairwise disjoint members of $\mathcal{C}$ whose union is $E$.
We are now in a position to prove a theorem that deals with the
question of uniqueness for an extension of $\iota$ to a measure on a $\sigma$-algebra
containing $\mathcal{C}$.
THEOREM 4.12
Let $\Omega$ be a set, $\mathcal{C}$ a semialgebra of subsets of $\Omega$, and $\iota$ a nonnegative
extended real-valued function on $\mathcal{C}$ satisfying Conditions (E1)-(E3) on
page 208. Suppose there is a sequence, $\{C_n\}_n$, of subsets of $\Omega$ such that
(E4) $\{C_n\}_n \subset \mathcal{C}$, $\bigcup_n C_n = \Omega$, and $\iota(C_n) < \infty$ for each $n$.
Then there exists a unique extension of $\iota$ to a measure on $\mathcal{A}(\mathcal{C})$, the
$\sigma$-algebra generated by $\mathcal{C}$.
PROOF: Let $\mu^*$ be the outer measure induced by $\iota$ and $\mathcal{C}$, $\mathcal{A}$ the collection
of $\mu^*$-measurable sets, and $\mu = \mu^*|_{\mathcal{A}}$. By Theorem 4.11, $\mathcal{A}$ is a $\sigma$-algebra,
$\mathcal{A} \supset \mathcal{C}$, $\mu$ is a measure on $\mathcal{A}$, and $\mu|_{\mathcal{C}} = \iota$. It follows that $\mathcal{A} \supset \mathcal{A}(\mathcal{C})$ and
that if we define $\nu = \mu|_{\mathcal{A}(\mathcal{C})}$, then $\nu$ is an extension of $\iota$ to $\mathcal{A}(\mathcal{C})$. Therefore,
the existence portion of the theorem is established.
It remains to prove the uniqueness portion of the theorem, that $\nu$ is
the only extension of $\iota$ to $\mathcal{A}(\mathcal{C})$. In other words, we must show that if $\tau$ is
a measure on $\mathcal{A}(\mathcal{C})$ with $\tau(C) = \iota(C)$ for all $C \in \mathcal{C}$, then
$$\tau(A) = \nu(A), \qquad A \in \mathcal{A}(\mathcal{C}). \tag{4.25}$$
In establishing (4.25), we will use the fact that $\mathcal{C}_\sigma \subset \mathcal{A}(\mathcal{C})$, which follows
because $\mathcal{A}(\mathcal{C})$ is a $\sigma$-algebra containing $\mathcal{C}$.
First, we will show that
$$\tau(E) = \nu(E), \qquad E \in \mathcal{C}_\sigma. \tag{4.26}$$
If $E \in \mathcal{C}_\sigma$, then, by Lemma 4.3, there exists $\{C_n\}_n \subset \mathcal{C}$ with $C_i \cap C_j = \emptyset$,
for $i \ne j$, such that $E = \bigcup_n C_n$. Consequently,
$$\tau(E) = \sum_n \tau(C_n) = \sum_n \iota(C_n) = \sum_n \nu(C_n) = \nu(E),$$
which establishes (4.26).
Next, we will show that
$$\tau(A) = \nu(A), \qquad A \in \mathcal{A}(\mathcal{C}),\ \nu(A) < \infty. \tag{4.27}$$
For a given $\epsilon > 0$, we can, by Lemma 4.1(a), select a set $E \in \mathcal{C}_\sigma$ such
that $E \supset A$ and $\mu^*(E) \le \mu^*(A) + \epsilon$ which, in this case, is equivalent to
$\nu(E) \le \nu(A) + \epsilon$. As $E \supset A$ and $E \in \mathcal{C}_\sigma$, we conclude from (4.26) that
$$\tau(A) \le \tau(E) = \nu(E) \le \nu(A) + \epsilon.$$
As $\epsilon > 0$ was arbitrary, we see that
$$\tau(A) \le \nu(A), \qquad A \in \mathcal{A}(\mathcal{C}),\ \nu(A) < \infty. \tag{4.28}$$
To prove the reverse inequality, we again select, for a given $\epsilon > 0$, a set
$E \in \mathcal{C}_\sigma$ such that $E \supset A$ and $\nu(E) \le \nu(A) + \epsilon$. Since $\nu(A) < \infty$, we
have $\nu(E \setminus A) = \nu(E) - \nu(A) \le \epsilon$. Applying (4.28) to $E \setminus A$, we obtain
$\tau(E \setminus A) \le \epsilon$. Hence, by (4.26), we can now conclude that
$$\nu(A) \le \nu(E) = \tau(E) = \tau(A) + \tau(E \setminus A) \le \tau(A) + \epsilon.$$
As $\epsilon > 0$ was arbitrary, we see that $\nu(A) \le \tau(A)$. This and (4.28) imply
that (4.27) holds.
It remains to establish (4.25) when $\nu(A) = \infty$. Let $\{C_n\}_n$ be as
in Condition (E4). By Exercise 4.106, we can assume that the $C_n$s are
pairwise disjoint. Now, $A = A \cap \Omega = \bigcup_n (A \cap C_n)$. Because
$$\nu(A \cap C_n) \le \nu(C_n) = \iota(C_n) < \infty,$$
(4.27) implies that $\nu(A \cap C_n) = \tau(A \cap C_n)$. Consequently,
$$\nu(A) = \sum_n \nu(A \cap C_n) = \sum_n \tau(A \cap C_n) = \tau(A).$$
The proof of the theorem is now complete.
Three particularly important consequences of Theorem 4.12 are given
here in Corollaries 4.5-4.7. We will refer to these corollaries frequently.
COROLLARY 4.5
Let $(\Omega,\mathcal{A},\mu)$ be a measure space. Suppose that $\mathcal{C}$ is a semialgebra of subsets
of $\Omega$ such that the $\sigma$-algebra generated by $\mathcal{C}$ is $\mathcal{A}$. Further suppose that
there is a sequence, $\{C_n\}_n \subset \mathcal{C}$, with $\bigcup_n C_n = \Omega$ and $\mu(C_n) < \infty$ for
each $n$. If $\nu$ is a measure on $\mathcal{A}$ such that $\nu(C) = \mu(C)$ for all $C \in \mathcal{C}$, then
$\nu = \mu$, that is, $\nu(A) = \mu(A)$ for all $A \in \mathcal{A}$.
PROOF: Let $\iota = \mu|_{\mathcal{C}}$ ($= \nu|_{\mathcal{C}}$). Since $\mu$ is a measure, it follows immediately
that Conditions (E1)-(E3) are satisfied by $\iota$ and $\mathcal{C}$. Also, by assumption,
Condition (E4) holds. Therefore, by Theorem 4.12, $\iota$ has a unique extension
to the $\sigma$-algebra generated by $\mathcal{C}$, which, by hypothesis, is $\mathcal{A}$. Since both $\mu$
and $\nu$ are extensions of $\iota$ to $\mathcal{A}$, it must be that $\nu = \mu$.
COROLLARY 4.6
Let $\mu$ and $\nu$ be two Borel measures (i.e., measures on $\mathcal{B}$) such that $\mu(I) < \infty$
for all finite intervals and $\mu(I) = \nu(I)$ for all $I \in \mathcal{I}$. Then $\mu = \nu$.
PROOF: By Exercise 4.98(b), $\mathcal{I}$ is a semialgebra and, by Exercise 4.108,
the $\sigma$-algebra generated by $\mathcal{I}$ is $\mathcal{B}$. We have $\mathcal{R} = \bigcup_{n=1}^\infty [-n, n]$ and, by
assumption, $\mu([-n, n]) < \infty$ for all $n \in \mathcal{N}$. The required result now
follows from Corollary 4.5.
COROLLARY 4.7
Let $(\Omega,\mathcal{A})$ be a measurable space and $\mathcal{C}$ a semialgebra of subsets of $\Omega$ such
that the $\sigma$-algebra generated by $\mathcal{C}$ is $\mathcal{A}$. If $\mu$ and $\nu$ are two finite measures
on $\mathcal{A}$ such that $\mu(C) = \nu(C)$ for all $C \in \mathcal{C}$, then $\mu = \nu$.
PROOF: By Exercise 4.105(a), $\Omega \in \mathcal{C}_\sigma$. As $\mu$ is a finite measure, we see
that all the assumptions of Corollary 4.5 are satisfied.
Remark: In Corollary 4.7, we really need only assume that at least one of
the measures, $\mu$ and $\nu$, is finite (why?).
EXERCISES 4.6
4.95 Provide the details showing that Conditions (E1)-(E3) on page 208 are
necessary for the extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{C}$.
★4.96 Let $\mathcal{C}$ denote the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ and $(c, \infty)$,
where $-\infty \le a \le b < \infty$ and $-\infty \le c < \infty$. Prove that $\mathcal{C}$ is a semialgebra
and that $\mathcal{A}(\mathcal{C}) = \mathcal{B}$.
4.97 Suppose that $\Omega = \{x_1, x_2, \ldots, x_n\}$ is a finite set and $\{a_1, a_2, \ldots, a_n\}$ are
nonnegative real numbers. Let $\mathcal{C}$ denote the collection of sets consisting of
the empty set and all singleton sets, that is, sets of the form $\{x\}$, where
$x \in \Omega$. Define $\iota$ on $\mathcal{C}$ by $\iota(\emptyset) = 0$ and $\iota(\{x_k\}) = a_k$ for $1 \le k \le n$.
a) Verify that Conditions (E1)-(E3) on page 208 hold.
b) Show that $\mathcal{C}$ is a semialgebra of subsets of $\Omega$.
4.98 Let $\mathcal{I}$ denote the collection of all intervals of $\mathcal{R}$, including degenerate in-
tervals of the form $(a, a)$ and $[a, a]$. Take $\Omega = \mathcal{R}$, $\mathcal{C} = \mathcal{I}$, and $\iota = \ell$
(= length).
a) Show that Conditions (E1)-(E3) on page 208 hold.
b) Show that $\mathcal{I}$ is a semialgebra of subsets of $\mathcal{R}$.
4.99 Let $\mathcal{I}_2$ denote the collection of all two-dimensional intervals in $\mathcal{R}^2$; that
is, all sets of the form $I_1 \times I_2$ where $I_j \in \mathcal{I}$ for $1 \le j \le 2$. Take $\Omega = \mathcal{R}^2$,
$\mathcal{C} = \mathcal{I}_2$, and $\iota = \ell_2$ = area; that is, for $I_1 \times I_2 \in \mathcal{I}_2$, $\ell_2(I_1 \times I_2) = \ell(I_1)\ell(I_2)$.
a) Show that Conditions (E1)-(E3) on page 208 hold.
b) Show that $\mathcal{I}_2$ is a semialgebra of subsets of $\mathcal{R}^2$.
4.100 Generalize Exercise 4.99 to n-dimensions.
4.101 Refer to Exercise 4.97. Prove that $\mu^* = \sum_{k=1}^n a_k \delta_{x_k}$; that is, prove that
$\mu^*(A) = \sum_{x_k \in A} a_k$ for each $A \subset \Omega$.
4.102 Refer to Exercise 4.98. Prove that the outer measure, $\mu^*$, induced by $\ell$
and $\mathcal{I}$ is Lebesgue outer measure.
4.103 Prove parts (b) and (c) of Proposition 4.13 on page 210.
4.104 Refer to Exercises 4.97 and 4.101. Establish that every subset of $\Omega$ is
$\mu^*$-measurable; that is, $\mathcal{A} = \mathcal{P}(\Omega)$.
4.105 Let $\mathcal{C}$ be a semialgebra of subsets of $\Omega$.
a) Prove that $\Omega \in \mathcal{C}_\sigma$.
b) Is it necessarily true that $\Omega \in \mathcal{C}$?
4.106 Suppose Condition (E4) on page 216 holds. Prove there exists $\{E_n\}_n \subset \mathcal{C}$
with $\bigcup_n E_n = \Omega$, $E_i \cap E_j = \emptyset$, for $i \ne j$, and $\iota(E_n) < \infty$ for each $n$.
4.107 Prove that Condition (E4) cannot be omitted as a hypothesis in Theo-
rem 4.12. Hint: Let $\mathcal{C}$ be as in Exercise 4.96, $\iota(\emptyset) = 0$, and $\iota(C) = \infty$ for
$C \in \mathcal{C}$ and $C \ne \emptyset$.
4.108 Let $\mathcal{I}$ be as in Exercise 4.98. Show that the $\sigma$-algebra generated by $\mathcal{I}$ is $\mathcal{B}$.
4.109 Let $\mathcal{I}$ be as in Exercise 4.98. Suppose that $g$ is a nonnegative Lebesgue
measurable function on $\mathcal{R}$ satisfying $\int_{[-n,n]} g \, d\lambda < \infty$ for each $n \in \mathcal{N}$.
Define $\iota$ on $\mathcal{I}$ by
$$\iota(C) = \int_C g \, d\lambda.$$
a) Verify that Conditions (E1)-(E3) are satisfied by $\iota$ and $\mathcal{I}$.
b) Show that there is a unique extension of $\iota$ to a measure, $\mu$, on $\mathcal{B}$ and
that $\mu(B) = \int_B g \, d\lambda$.
4.110 Suppose that $\mu$ and $\nu$ are two finite Borel measures with the property that
$\mu((-\infty, x]) = \nu((-\infty, x])$ for all $x \in \mathcal{R}$. Prove that $\mu = \nu$.
4.111 Can the finiteness assumption be dropped in Exercise 4.110? Explain.
4.112 Suppose that $\mu$ and $\nu$ are two Borel measures with the property that
$\mu((-\infty, x]) = \nu((-\infty, x]) < \infty$ for all $x \in \mathcal{R}$. Prove that $\mu = \nu$.
★4.113 Let $\Omega$ be a set, $\mathcal{C}$ a semialgebra of subsets of $\Omega$, and $\iota$ a nonnegative
extended real-valued function on $\mathcal{C}$ satisfying Conditions (E1)-(E4). Also,
let $\mu^*$ be the outer measure induced by $\iota$ and $\mathcal{C}$, $\mathcal{A}$ the collection of $\mu^*$-
measurable sets, and $\mu = \mu^*|_{\mathcal{A}}$. Suppose that $E \in \mathcal{A}$.
a) Show that there is an $A \in \mathcal{A}(\mathcal{C})$ with $A \supset E$ and $\mu(A \setminus E) = 0$. Hint:
First assume that $\mu(E) < \infty$ and employ Lemma 4.1.
b) Show that there is a $B \in \mathcal{A}(\mathcal{C})$ with $B \subset E$ and $\mu(E \setminus B) = 0$.
★4.114 Let $\Omega$ be a set, $\mathcal{C}$ a semialgebra of subsets of $\Omega$, and $\iota$ a nonnegative
extended real-valued function on $\mathcal{C}$ satisfying Conditions (E1)-(E4). Also,
let $\mu^*$ be the outer measure induced by $\iota$ and $\mathcal{C}$, $\mathcal{A}$ the collection of $\mu^*$-
measurable sets, $\mu = \mu^*|_{\mathcal{A}}$, and $\nu = \mu|_{\mathcal{A}(\mathcal{C})}$. Prove that $(\Omega,\mathcal{A},\mu)$ is the
completion of $(\Omega,\mathcal{A}(\mathcal{C}),\nu)$. Hint: Use Exercise 4.113 and Exercise 4.17 on
page 174.
4.115 Consider the measure space $(\mathcal{R},\mathcal{M},\lambda)$.
a) Can we deduce from Theorem 4.12 that $\lambda$ is the unique extension of
length to a measure on $\mathcal{M}$? Explain.
b) Prove that $\lambda$ is the unique extension of length to a measure on $\mathcal{M}$.
4.7 THE LEBESGUE-STIELTJES INTEGRAL
In the previous section, we developed existence and uniqueness theorems
for extensions to measures. Specifically, suppose that $\Omega$ is a set, $\mathcal{C}$ is a
semialgebra of subsets of $\Omega$, and $\iota$ is a nonnegative extended real-valued
function on $\mathcal{C}$. If Conditions (E1)-(E3) on page 208 hold, then there is an
extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{C}$; and if, in addi-
tion, Condition (E4) on page 216 holds, then an extension to the smallest
$\sigma$-algebra containing $\mathcal{C}$ is unique.
Two important applications of this theory are to the Lebesgue-Stieltjes
integral and to product measure spaces. We will discuss the former appli-
cation in this section and the latter in the next.
Distribution Function of a Finite Borel Measure
Recall that a measure, $\mu$, on the Borel sets, $\mathcal{B}$, is called a Borel mea-
sure and that such a measure is called finite if $\mu(\mathcal{R}) < \infty$. With these
conventions in mind, we make the following definition.
DEFINITION 4.19 Distribution Function of a Finite Borel Measure
Let $\mu$ be a finite Borel measure. Then the distribution function
of $\mu$, denoted $F_\mu$, is the real-valued function defined on $\mathcal{R}$ by
$$F_\mu(x) = \mu((-\infty, x]).$$
Note: We will sometimes omit the subscript $\mu$ in $F_\mu$, provided that no
confusion will arise.
Example 4.15 gives some illustrations of distribution functions. The
reader should supply the details of verification.
EXAMPLE 4.15 Illustrates Definition 4.19
a) Let $\mu = \lambda|_{\mathcal{B}}$. Then $\mu$ is a Borel measure but is not finite because
$\mu(\mathcal{R}) = \lambda(\mathcal{R}) = \infty$. Hence, we do not define the distribution function
of $\mu$.
b) For $B \in \mathcal{B}$, define $\mu(B) = \lambda(B \cap (0,1))$. Then $\mu$ is a Borel measure
and, as $\mu(\mathcal{R}) = \lambda((0,1)) = 1 < \infty$, it is a finite Borel measure. Its
distribution function, $F_\mu$, is easily seen to be
$$F_\mu(x) = \begin{cases} 0, & x < 0; \\ x, & 0 \le x < 1; \\ 1, & x \ge 1. \end{cases}$$
c) Recall that if $b \in \mathcal{R}$, then the set function,
$$\delta_b(E) = \begin{cases} 1, & b \in E; \\ 0, & b \notin E, \end{cases}$$
is a measure on $\mathcal{P}(\mathcal{R})$, called the Dirac measure concentrated at $b$. Let
$\mu = \delta_b$ restricted to $\mathcal{B}$. Then $\mu$ is a finite Borel measure and
$$F_\mu(x) = \begin{cases} 0, & x < b; \\ 1, & x \ge b. \end{cases}$$
d) Suppose that $\{a_n\}_{n=1}^\infty$ is a sequence of nonnegative real numbers with
$\sum_{n=1}^\infty a_n < \infty$. Define $\mu$ on $\mathcal{B}$ by
$$\mu(B) = \sum_{n \in B} a_n.$$
Then $\mu$ is a finite Borel measure whose distribution function is
$$F_\mu(x) = \sum_{n=1}^{[x]} a_n,$$
where $[x]$ denotes the greatest integer in $x$. □
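The distribution function in Example 4.15(d) is easy to tabulate. The following is a minimal sketch, not part of the original text; the function name `distribution_function` is an illustrative assumption, and the sequence is truncated to finitely many terms (that is, $a_n = 0$ beyond the list), which is also an assumption made only to keep the sum finite in code.

```python
import math
from fractions import Fraction

def distribution_function(a):
    """Return F(x) = sum of a_n over 1 <= n <= [x], where a[0] holds a_1, a[1] holds a_2, ..."""
    def F(x):
        top = min(math.floor(x), len(a))   # [x], capped at the number of stored terms
        return sum(a[:top]) if top >= 1 else 0
    return F

# Hypothetical weights a_n = 2**(-n) for n = 1, ..., 5.
a = [Fraction(1, 2**n) for n in range(1, 6)]
F = distribution_function(a)

for x in (-3.5, 0.5, 1, 2, 2.9, 100):
    print(x, F(x))
```

The step function jumps by $a_n$ at each integer $n$, takes the value at the jump from the right, vanishes for $x < 1$, and levels off at the total mass of the measure.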
Some of the more important properties of distribution functions are
presented in the next two propositions.
PROPOSITION 4.17
Let $\mu$ be a finite Borel measure and $F$ its distribution function. Then
a) $F$ is monotone nondecreasing.
b) $F$ is right continuous.
c) $F$ is bounded.
d) $\lim_{x \to -\infty} F(x) = 0$.
PROOF:
a) If $x < y$, then $(-\infty, x] \subset (-\infty, y]$ and, hence, by the monotonicity property of measures, $F(x) = \mu((-\infty, x]) \le \mu((-\infty, y]) = F(y)$.
b) Let $x \in \mathcal{R}$. Since $F$ is nondecreasing, $\lim_{y \downarrow x} F(y) = F(x+)$ exists. Now, $(-\infty, x + 1] \supset (-\infty, x + \tfrac{1}{2}] \supset \cdots$ and $\bigcap_{n=1}^\infty (-\infty, x + \tfrac{1}{n}] = (-\infty, x]$. Therefore, because $\mu$ is a finite Borel measure, we have, by Theorem 4.1(c) on page 170,
$$F(x) = \mu((-\infty, x]) = \lim_{n \to \infty} \mu\bigl((-\infty, x + \tfrac{1}{n}]\bigr) = \lim_{n \to \infty} F\bigl(x + \tfrac{1}{n}\bigr) = F(x+).$$
Hence, $F(x+) = F(x)$, that is, $F$ is right continuous at $x$. Because $x \in \mathcal{R}$ was arbitrarily chosen, we see that $F$ is right continuous.
c) As $\mu$ is a finite measure, we have $F(x) = \mu((-\infty, x]) \le \mu(\mathcal{R}) < \infty$, for each $x \in \mathcal{R}$. Hence $F$ is bounded by $\mu(\mathcal{R})$.
d) First note that, because $F$ is monotone, $\lim_{x \to -\infty} F(x)$ exists. Also, we have $\emptyset = \bigcap_{n=1}^\infty (-\infty, -n]$ and $(-\infty, -1] \supset (-\infty, -2] \supset \cdots$. Thus, since $\mu$ is a finite measure,
$$0 = \mu(\emptyset) = \lim_{n \to \infty} \mu((-\infty, -n]) = \lim_{n \to \infty} F(-n) = \lim_{x \to -\infty} F(x).$$
The last equality holds because $\lim_{x \to -\infty} F(x)$ exists.
Proposition 4.17(a) shows that $F_\mu$ is monotone nondecreasing. Hence, $F_\mu(x)$ has a limit as both $x \to -\infty$ and $x \to \infty$. We denote those limits by $F_\mu(-\infty)$ and $F_\mu(\infty)$, respectively. By Proposition 4.17(d), $F_\mu(-\infty) = 0$; and it is easy to prove that $F_\mu(\infty) = \mu(\mathcal{R})$.
PROPOSITION 4.18
Let $\mu$ be a finite Borel measure and $F$ its distribution function. Then, for $-\infty \le a \le b < \infty$,
$$\mu((a, b]) = F(b) - F(a) \tag{4.29}$$
and, for $-\infty \le c < \infty$,
$$\mu((c, \infty)) = F(\infty) - F(c). \tag{4.30}$$
PROOF: If $a = -\infty$, then, by Definition 4.19 and Proposition 4.17(d),
$$\mu((a, b]) = F(b) = F(b) - F(-\infty) = F(b) - F(a).$$
If $-\infty < a < \infty$, then, since $\mu$ is a finite measure, we have
$$\mu((a, b]) = \mu((-\infty, b]) - \mu((-\infty, a]) = F(b) - F(a).$$
This proves (4.29). To prove (4.30), note that
$$\mu((c, \infty)) = \mu(\mathcal{R}) - \mu((-\infty, c]) = F(\infty) - F(c),$$
as required.
Lebesgue-Stieltjes Measure
We now consider the following two important questions concerning a real-valued function, $F$, on $\mathcal{R}$:
Question 1: Under what conditions is F the distribution function of some
finite Borel measure?
Question 2: Can F be the distribution function for two different finite
Borel measures?
As we have just seen, a necessary condition for a real-valued function, $F$, on $\mathcal{R}$, to be the distribution function of some finite Borel measure is
that (a)-(d) of Proposition 4.17 hold. In other words, unless F satisfies the
properties listed in Proposition 4.17, it cannot possibly be the distribution
function of a finite Borel measure.
By employing Theorem 4.12 on page 216, we can show that the proper-
ties listed in Proposition 4.17 are not only necessary, but are also sufficient
for F to be the distribution function of some finite Borel measure. More-
over, using that same theorem, we can prove that the answer to Question 2
is no.
So, assume that $F$ satisfies (a)-(d) of Proposition 4.17. We will use $F$ to define a nonnegative set function, $\iota$, on a semialgebra, $\mathcal{C}$, of subsets of $\mathcal{R}$. Then we will prove that Conditions (E1)-(E4) of Section 4.6 hold for $\iota$ and $\mathcal{C}$. Finally, we will show that the measure, $\mu$, guaranteed by Theorem 4.12 is a finite Borel measure whose distribution function is $F$ and that $\mu$ is the only such measure.
To begin, let $\mathcal{C}$ denote the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ or $(c, \infty)$, where $-\infty \le a \le b < \infty$ and $-\infty \le c < \infty$. Then, by Exercise 4.96 on page 218, $\mathcal{C}$ is a semialgebra and $\mathcal{A}(\mathcal{C}) = \mathcal{B}$.
Next, we want to use $F$ to define a nonnegative set function, $\iota$, on $\mathcal{C}$ in such a way that if $\mu$ is an extension of $\iota$ to a measure on $\mathcal{B}$, then $F$ is the distribution function of $\mu$. Here $F(-\infty)$ and $F(\infty)$ denote the limits of $F$ at $-\infty$ and $\infty$; they exist because $F$ is monotone and bounded, and $F(-\infty) = 0$ by assumption. In view of (4.29) and (4.30), we see that $\iota$ should be defined on $\mathcal{C}$ as follows: For $-\infty \le a \le b < \infty$,
$$\iota((a, b]) = F(b) - F(a), \tag{4.31}$$
and, for $-\infty \le c < \infty$,
$$\iota((c, \infty)) = F(\infty) - F(c). \tag{4.32}$$
Note that $\iota$ is nonnegative because, by assumption, $F$ is nondecreasing.
Now we will verify that Conditions (E1)-(E4) hold for $\iota$ and $\mathcal{C}$. Using (4.31) with $b = a$, we see that $\iota(\emptyset) = \iota((a, a]) = F(a) - F(a) = 0$. Hence, Condition (E1) on page 208 holds. To verify Condition (E4) on page 216, we can, for instance, take $\{C_n\}_n$ to consist of the single set $(-\infty, \infty)$.
The validity of Conditions (E2) and (E3) for $\iota$ and $\mathcal{C}$ is established in Lemmas 4.4 and 4.5, respectively. In proving those lemmas, it is convenient to write $(c, \infty)$ as $(c, \omega]$, with the conventions that $F(\omega) = F(\infty)$, $\iota((c, \omega]) = F(\infty) - F(c)$, and $(c, \omega]^c = (-\infty, c]$. Using this notation, $\mathcal{C}$ consists of all sets of the form $(a, b]$, where either $-\infty \le a \le b < \infty$ or $a \ge -\infty$ and $b = \omega$.
LEMMA 4.4
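Suppose that $\{C_k\}_{k=1}^n$ is a finite sequence of pairwise disjoint members of $\mathcal{C}$ whose union is in $\mathcal{C}$. Then $\iota\bigl(\bigcup_{k=1}^n C_k\bigr) = \sum_{k=1}^n \iota(C_k)$.
PROOF: Set $C = \bigcup_{k=1}^n C_k$. Then, by assumption, $C \in \mathcal{C}$. So we can write $C = (a, b]$ and $C_k = (a_k, b_k]$, $1 \le k \le n$. Without loss of generality, we can assume that $a_1 \le a_2 \le \cdots \le a_n$. Since $\bigcup_{k=1}^n C_k = C$ and the $C_k$s are pairwise disjoint, $a = a_1 \le b_1 = a_2 \le b_2 = \cdots = a_{n-1} \le b_{n-1} = a_n$ and $b_n = b$. Hence, the sum telescopes:
$$\iota(C) = F(b) - F(a) = \sum_{k=1}^n \bigl(F(b_k) - F(a_k)\bigr) = \sum_{k=1}^n \iota(C_k),$$
as required.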
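The telescoping argument in the proof of Lemma 4.4 can be checked numerically. The Python sketch below is offered only as an illustration: the particular function `F` (zero for negative arguments, $1 - e^{-x}$ otherwise) and the cut points are assumptions chosen here, not taken from the text. It splits a half-open interval into adjacent half-open pieces and compares the value of the set function on the whole interval with the sum over the pieces.

```python
import math

def F(x):
    # Assumed sample function: nondecreasing, right continuous, bounded, F(-inf) = 0.
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def iota(a, b):
    # iota((a, b]) = F(b) - F(a), as in (4.31).
    return F(b) - F(a)

# Cut points a = c_0 < c_1 < ... < c_n = b; the pieces are (c_{k-1}, c_k].
cuts = [-1.0, 0.0, 0.25, 1.0, 3.0]
pieces = list(zip(cuts[:-1], cuts[1:]))

total = iota(cuts[0], cuts[-1])                             # iota of the union
sum_of_pieces = sum(iota(a_k, b_k) for a_k, b_k in pieces)  # sum over the pieces
```

Up to floating-point rounding, `total` and `sum_of_pieces` coincide, because each interior value $F(c_k)$ appears once with each sign.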
LEMMA 4.5
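Assume $C, C_1, C_2, \ldots$ are in $\mathcal{C}$ and $C \subset \bigcup_n C_n$. Then $\iota(C) \le \sum_n \iota(C_n)$.
PROOF: We can write $C_n = (a_n, b_n]$, for each $n$, and $C = (a, b]$. Assume first that $a, b \in \mathcal{R}$; that is, $C$ is a finite interval. Let $\epsilon > 0$ be given. Because $F$ is right continuous, we can choose a $\delta > 0$ such that $F(a + \delta) < F(a) + \epsilon/2$ and, for each $n$, a $\delta_n > 0$ such that $F(b_n + \delta_n) < F(b_n) + \epsilon/2^{n+1}$.
The interval $[a + \delta, b]$ is closed and bounded and, since $C \subset \bigcup_n C_n$, it follows that $[a + \delta, b] \subset \bigcup_n (a_n, b_n + \delta_n)$. Hence, by the Heine-Borel theorem, there is an $N \in \mathcal{N}$ such that $[a + \delta, b] \subset \bigcup_{n=1}^N (a_n, b_n + \delta_n)$. Set $I_n = (a_n, b_n + \delta_n)$. Arguing as in Proposition 3.2 on page 107, we find that there is an integer $m$, with $m \le N$, and a sequence of intervals $\{J_i\}_{i=1}^m$ such that $J_i = (c_i, d_i) \in \{I_n\}_{n=1}^N$, for $1 \le i \le m$, and
$$c_1 < a + \delta, \quad c_2 < d_1 \le d_2, \quad \ldots, \quad c_m < d_{m-1} \le b < d_m.$$
Because $F$ is nondecreasing and $\{J_i\}_{i=1}^m \subset \{I_n\}_{n=1}^N \subset \{(a_n, b_n + \delta_n)\}_n$, we conclude that
$$\begin{aligned}
F(b) - F(a + \delta) &\le F(d_m) - F(c_1) \\
&\le F(d_m) - F(c_1) + \bigl(F(d_1) - F(c_2)\bigr) + \cdots + \bigl(F(d_{m-1}) - F(c_m)\bigr) \\
&= \sum_{i=1}^m \bigl(F(d_i) - F(c_i)\bigr) \le \sum_n \bigl(F(b_n + \delta_n) - F(a_n)\bigr).
\end{aligned}$$
Consequently,
$$\begin{aligned}
F(b) - F(a) &\le F(b) - F(a + \delta) + \frac{\epsilon}{2} \\
&\le \sum_n \bigl(F(b_n + \delta_n) - F(a_n)\bigr) + \frac{\epsilon}{2} \\
&\le \sum_n \Bigl(F(b_n) + \frac{\epsilon}{2^{n+1}} - F(a_n)\Bigr) + \frac{\epsilon}{2} \\
&\le \sum_n \bigl(F(b_n) - F(a_n)\bigr) + \epsilon.
\end{aligned}$$
In other words, $\iota(C) \le \sum_n \iota(C_n) + \epsilon$. As $\epsilon > 0$ was arbitrary, we have $\iota(C) \le \sum_n \iota(C_n)$, as required.
The lemma has now been established when $C$ is a finite interval. The proof for the case where $C$ is an infinite interval is left as an exercise for the reader.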
We have now verified that Conditions (E1)-(E4) are satisfied by $\iota$ and $\mathcal{C}$. Using that fact, we can prove a theorem that answers Questions 1 and 2 on page 208.
THEOREM 4.13
Suppose that $F$ is a real-valued function on $\mathcal{R}$ satisfying (a)-(d) of Proposition 4.17 on page 222. Then there is a unique finite Borel measure having $F$ as its distribution function.
PROOF: Let $\mathcal{C}$ and $\iota$ be as defined earlier. Since Conditions (E1)-(E4) are satisfied by $\mathcal{C}$ and $\iota$, Theorem 4.12 implies that there is a unique extension of $\iota$ to a measure, $\mu$, on $\mathcal{A}(\mathcal{C}) = \mathcal{B}$. Using the fact that $\mu$ is an extension of $\iota$ and the relation (4.31), we conclude that, for each $x \in \mathcal{R}$,
$$\mu((-\infty, x]) = \iota((-\infty, x]) = F(x) - F(-\infty) = F(x).$$
Thus, $F$ is the distribution function of $\mu$. Moreover, $\mu$ is finite because, by (4.32) and the boundedness of $F$, $\mu(\mathcal{R}) = \iota((-\infty, \infty)) = F(\infty) - F(-\infty) = F(\infty) < \infty$.
Suppose that $\nu$ is also a finite Borel measure having $F$ as its distribution function. Then, by Proposition 4.18 and the definition of $\iota$, we see that $\nu_{|\mathcal{C}} = \iota$. Therefore, by the uniqueness of the extension of $\iota$, we must have $\nu = \mu$.
Theorem 4.13 reveals that the properties listed in Proposition 4.17 on page 222 are sufficient conditions for a real-valued function on $\mathcal{R}$ to be the distribution function of some finite Borel measure. Consequently, we make
the following definition.
DEFINITION 4.20 Distribution Function; Lebesgue-Stieltjes Measure
A real-valued function, $F$, on $\mathcal{R}$ is called a distribution function provided that the following conditions hold:
a) $F$ is monotone nondecreasing.
b) $F$ is right continuous.
c) $F$ is bounded.
d) $\lim_{x \to -\infty} F(x) = 0$.
For such a function, the unique finite Borel measure having F as its
distribution function is called the Lebesgue-Stieltjes measure cor-
responding to F.
The next example provides some illustrations of Theorem 4.13 and
Definition 4.20. The details of verification are left to the reader as exercises.
EXAMPLE 4.16 Illustrates Theorem 4.13 and Definition 4.20
a) Let $F$ be defined by
$$F(x) = \begin{cases} 0, & x < 0; \\ x, & 0 \le x < 1; \\ 1, & x \ge 1. \end{cases}$$
Then $F$ is bounded, nondecreasing, continuous, and $F(-\infty) = 0$. Consequently, by Theorem 4.13, there is a unique finite Borel measure having $F$ as its distribution function. Let $\mu$ be the Borel measure defined by $\mu(B) = \lambda(B \cap (0,1))$. Then, as we discovered in Example 4.15(b) on page 221, $\mu$ has $F$ as its distribution function. Hence, $\mu$ is the unique finite Borel measure having $F$ as its distribution function; in other words, $\mu$ is the Lebesgue-Stieltjes measure corresponding to $F$.
b) Let $g$ be a nonnegative Lebesgue integrable function on $\mathcal{R}$. Define $F$ on $\mathcal{R}$ by
$$F(x) = \int_{(-\infty, x]} g(t) \, d\lambda(t). \tag{4.33}$$
Then $F$ is nondecreasing, continuous, bounded, and $F(-\infty) = 0$. So, by Theorem 4.13, there is a unique finite Borel measure having $F$ as its distribution function. For $B \in \mathcal{B}$, define $\mu(B) = \int_B g \, d\lambda$. Then $\mu$ is a finite Borel measure and, clearly, $F$ is the distribution function of $\mu$. Consequently, $\mu$ is the Lebesgue-Stieltjes measure corresponding to $F$. □
The Lebesgue-Stieltjes Integral
Assume $F$ is a distribution function; that is, a real-valued function on $\mathcal{R}$ satisfying (a)-(d) of Proposition 4.17 on page 222. Then, as we know from Theorem 4.13, there is a unique finite Borel measure, $\mu$, having $F$ as its distribution function. Hence, it is natural to make the following definition.
DEFINITION 4.21 Lebesgue-Stieltjes Integral
Suppose that $F$ is a distribution function and that $\mu$ is the Lebesgue-Stieltjes measure corresponding to $F$. Let $f$ be a Borel measurable function and $B \in \mathcal{B}$. Then the Lebesgue-Stieltjes integral of $f$ over $B$ with respect to $F$ is defined to be
$$\int_B f(x) \, dF(x) = \int_B f(x) \, d\mu(x),$$
provided the integral on the right-hand side makes sense.
EXAMPLE 4.17 Illustrates Definition 4.21
a) Let
$$F(x) = \begin{cases} 0, & x < 0; \\ x, & 0 \le x < 1; \\ 1, & x \ge 1. \end{cases}$$
By Example 4.16(a), $F$ is a distribution function and the Lebesgue-Stieltjes measure corresponding to $F$ is given by $\mu(B) = \lambda(B \cap (0,1))$, $B \in \mathcal{B}$. Let $f$ be a Borel measurable function and $B \in \mathcal{B}$. Then the Lebesgue-Stieltjes integral of $f$ over $B$ equals
$$\int_B f \, dF = \int_B f \, d\mu = \int_{B \cap (0,1)} f \, d\lambda \tag{4.34}$$
provided the integral makes sense. To verify the last equality in (4.34), we apply the bootstrapping technique. The details are left to the reader as an exercise.
b) Let $g$ be a nonnegative Lebesgue integrable function. Define $F$ on $\mathcal{R}$ by
$$F(x) = \int_{(-\infty, x]} g(t) \, d\lambda(t).$$
By Example 4.16(b), $F$ is a distribution function and the Lebesgue-Stieltjes measure corresponding to $F$ is given by $\mu(B) = \int_B g \, d\lambda$, $B \in \mathcal{B}$. Let $f$ be a Borel measurable function and $B \in \mathcal{B}$. Then the Lebesgue-Stieltjes integral of $f$ over $B$ equals
$$\int_B f \, dF = \int_B f \, d\mu = \int_B f g \, d\lambda \tag{4.35}$$
provided the integral makes sense. To establish the last equality in (4.35), we proceed as follows. By Exercise 4.61 on page 191, the equality holds if $f$ is a nonnegative Borel measurable function. If $f$ is an extended real-valued Borel measurable function, write $f = f^+ - f^-$ and use the linearity of the abstract Lebesgue integral to conclude that (4.35) again holds. Finally, if $f$ is a complex-valued Borel measurable function, write $f = \Re f + i \Im f$ and apply the linearity of the abstract Lebesgue integral to again conclude that (4.35) holds.
Before leaving this example, we should point out that part (a) is a special case of part (b) with $g = \chi_{(0,1)}$. □
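Relation (4.35) can be illustrated numerically. The Python sketch below rests on assumptions made here for illustration only: the density $g(t) = 2t$ on $(0,1)$, the integrand $f(x) = x$, and the use of Stieltjes sums over a uniform partition as an approximation to the Lebesgue-Stieltjes integral (reasonable here because $f$ and $F$ are continuous). It compares an approximation of $\int f \, dF$ with a midpoint-rule approximation of $\int f g \, d\lambda$ over $(0, 1]$.

```python
def g(t):
    # Assumed density: g(t) = 2t on (0, 1) and zero elsewhere.
    return 2.0 * t if 0.0 < t < 1.0 else 0.0

def F(x):
    # F(x) = integral of g over (-inf, x], which equals 0, x^2, or 1.
    if x < 0.0:
        return 0.0
    return x * x if x < 1.0 else 1.0

def f(x):
    # Assumed integrand.
    return x

def stieltjes_sum(f, F, a, b, n):
    # Sum of f(midpoint) * (F(right) - F(left)) over n equal cells of (a, b].
    h = (b - a) / n
    return sum(
        f(a + (i + 0.5) * h) * (F(a + (i + 1) * h) - F(a + i * h))
        for i in range(n)
    )

def midpoint_rule(func, a, b, n):
    # Midpoint-rule approximation of the integral of func over [a, b].
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

lhs = stieltjes_sum(f, F, 0.0, 1.0, 2000)                     # approximates int f dF
rhs = midpoint_rule(lambda x: f(x) * g(x), 0.0, 1.0, 2000)    # approximates int f g dlambda
```

Both approximations are close to $\int_0^1 x \cdot 2x \, dx = 2/3$, in line with (4.35) for this choice of $f$ and $g$.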
EXERCISES 4.7
4.116	Provide the details for the illustrations given in parts (b)-(d) of Exam-
ple 4.15 on page 221.
4.117 Let $\{x_n\}_n$ be a sequence of distinct real numbers and $\{b_n\}_n$ a sequence of nonnegative real numbers with $\sum_n b_n < \infty$. Define $\mu$ on $\mathcal{B}$ by
$$\mu(B) = \sum_{x_n \in B} b_n.$$
a) Explain why $\mu$ is a finite Borel measure.
b) Determine the distribution function of $\mu$.
4.118 Define $\mu$ on $\mathcal{B}$ by $\mu(B) = \int_B \chi_{[0,\infty)}(x) \, x e^{-x} \, d\lambda(x)$.
a) Explain why $\mu$ is a finite Borel measure.
b) Determine the distribution function of $\mu$.
4.119 Let $\mu$ be a finite Borel measure. Prove that $F_\mu(\infty) = \mu(\mathcal{R})$.
4.120 Complete the proof of Lemma 4.5 on page 225. In other words, prove that Condition (E3) on page 208 is satisfied by $\iota$ and $\mathcal{C}$ when $C$ is an infinite interval. Hint: First assume $C = (-\infty, b]$, where $b < \infty$, and note that $\iota(C) = \lim_{x \to -\infty} \iota((x, b])$.
4.121	Verify all statements made in Example 4.16(a) on page 227.
4.122 Verify all statements made in Example 4.16(b) on page 227.
4.123 Verify all statements made in Example 4.17(a) on page 228.
4.124 Verify all statements made in Example 4.17(b) on page 228.
4.125 Define $F$ on $\mathcal{R}$ by
x < 0;
0 < x < 1;
1	< x < 2;
2	< x < 3;
x > 3.
a)	Show that F satisfies (a)-(d) of Proposition 4.17 on page 222.
b) Obtain the finite Borel measure, $\mu$, whose distribution function is $F$.
4.126 Suppose that $\{a_n\}_{n=1}^\infty$ is a sequence of nonnegative real numbers and that $\sum_{n=1}^\infty a_n < \infty$. Define $F$ on $\mathcal{R}$ by
$$F(x) = \sum_{n=1}^{[x]} a_n.$$
a) Show that $F$ is a distribution function, that is, satisfies (a)-(d) of Proposition 4.17.
b) Determine the Lebesgue-Stieltjes measure corresponding to $F$.
c) If $f$ is Borel measurable, determine $\int f \, dF$.
4.127 Generalize the previous exercise as follows: Suppose that $\{x_n\}_n$ is a sequence of real numbers and that $\{a_n\}_n$ is a sequence of nonnegative real numbers with $\sum_n a_n < \infty$. Define $F$ on $\mathcal{R}$ by
$$F(x) = \sum_{x_n \le x} a_n.$$
a) Show that $F$ is a distribution function, that is, satisfies (a)-(d) of Proposition 4.17.
b) Determine the Lebesgue-Stieltjes measure corresponding to $F$.
c) If $f$ is Borel measurable, determine $\int f \, dF$.
4.128 Let $a$ be a positive constant and define $F(x) = 1 - e^{-ax}$, for $x \ge 0$, and zero otherwise.
a) Show that $F$ satisfies (a)-(d) of Proposition 4.17.
b) Find a nonnegative Borel measurable function, $g$, such that
$$F(x) = \int_{-\infty}^x g(t) \, dt$$
for all $x \in \mathcal{R}$.
c) Determine the unique finite Borel measure that has $F$ as its distribution function.
d) Find $\int_{-\infty}^{\infty} x \, dF(x)$ and $\int_{-\infty}^{\infty} e^{itx} \, dF(x)$. Hint: Use Example 4.17(b).
4.129 Let $F: \mathcal{R} \to \mathcal{R}$ be defined by
$$F(x) = \begin{cases} 0, & x < -2; \\ (x + 2)/4, & -2 \le x < 2; \\ 1, & x \ge 2. \end{cases}$$
a) Show that $F$ is a distribution function.
b) Determine the Lebesgue-Stieltjes measure corresponding to $F$.
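c) Find $\int_{-\infty}^{\infty} x \, dF(x)$, $\int_{-\infty}^{\infty} x^2 \, dF(x)$, and $\int_{-\infty}^{\infty} e^{itx} \, dF(x)$ for $t \in \mathcal{R}$.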
4.130 Let $F: \mathcal{R} \to \mathcal{R}$ be defined by
n=0
where $a$ is a positive constant. Obtain $\int_{-\infty}^{\infty} x \, dF(x)$ and $\int_{-\infty}^{\infty} x^2 \, dF(x)$.
4.131 Suppose that $F$ is a distribution function. Further suppose that $F$ is differentiable on $\mathcal{R}$ and that $F' \in R([a, b])$ for all $a, b \in \mathcal{R}$. If $f$ is Borel measurable, show that
$$\int f(x) \, dF(x) = \int f(x) F'(x) \, d\lambda(x)$$
whenever the integral on the right-hand side makes sense. Hint: Use the fundamental theorem of calculus and Example 4.17(b) on page 228.
4.132 Let $\psi$ denote the Cantor function, as defined on page 77. Set
$$F(x) = \begin{cases} 0, & x < 0; \\ \psi(x), & 0 \le x < 1; \\ 1, & x \ge 1. \end{cases}$$
a) Show that $F$ is a distribution function.
b) Verify that $F' = 0$ $\lambda$-ae.
c) Prove that the conclusion of the previous exercise is not valid.
4.8 PRODUCT MEASURE SPACES
Our second application of the theory of extensions to measures, which we
developed in Section 4.6, will be to product measure spaces. In this section,
we will see how two measure spaces naturally give rise to a third measure
space, called the product measure space. To help motivate product measure
spaces, we consider the following example.
EXAMPLE 4.18 Motivates Product Measure
Note that in each of the illustrations below, a nonnegative set function is expressed in terms of a product of two measures.
a) Let $\lambda$ denote Lebesgue measure on $\mathcal{R}$. If $I$ and $J$ are two intervals in $\mathcal{R}$, then the Cartesian product, $I \times J$, is a rectangle in $\mathcal{R}^2$ ($= \mathcal{R} \times \mathcal{R}$) whose area can be expressed as
$$\operatorname{area}(I \times J) = \ell(I)\ell(J) = \lambda(I)\lambda(J).$$
b) Let $\Gamma$ and $\Lambda$ be two finite sets and $\mu$ and $\nu$ counting measure on $\Gamma$ and $\Lambda$, respectively. Also, as before, let $N(E)$ denote the number of elements of a finite set, $E$. If $A \subset \Gamma$ and $B \subset \Lambda$, then the number of elements of $A \times B$ can be expressed as
$$N(A \times B) = \mu(A)\nu(B),$$
as we know from the fundamental principle of counting. □
Existence of a Product Measure
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are two measure spaces. As usual, we let $\Gamma \times \Lambda$ denote the Cartesian product of $\Gamma$ with $\Lambda$:
$$\Gamma \times \Lambda = \{ (x, y) : x \in \Gamma \text{ and } y \in \Lambda \}.$$
Our first task is to prove the existence of a $\sigma$-algebra, $\mathcal{A}$, of subsets of $\Gamma \times \Lambda$ that contains all sets of the form $S \times T$, where $S \in \mathcal{S}$ and $T \in \mathcal{T}$, and a measure, $\omega$, on $\mathcal{A}$ such that
$$\omega(S \times T) = \mu(S)\nu(T). \tag{4.36}$$
This will be accomplished by applying the theory of extensions to measures. We begin with the following definition.
DEFINITION 4.22 Measurable Rectangles
Let $(\Gamma, \mathcal{S})$ and $(\Lambda, \mathcal{T})$ be measurable spaces. A subset of $\Gamma \times \Lambda$ of the form $S \times T$, where $S \in \mathcal{S}$ and $T \in \mathcal{T}$, is called a measurable rectangle. The collection of all measurable rectangles is denoted by $\mathcal{U}$.
Proposition 4.19 establishes that $\mathcal{U}$ is a semialgebra.
PROPOSITION 4.19
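The collection, $\mathcal{U}$, of all measurable rectangles is a semialgebra of subsets of $\Gamma \times \Lambda$.
PROOF: Let $A, B \in \mathcal{U}$. Then there are sets $S_1, S_2 \in \mathcal{S}$ and $T_1, T_2 \in \mathcal{T}$ such that $A = S_1 \times T_1$ and $B = S_2 \times T_2$. As $A \cap B = (S_1 \cap S_2) \times (T_1 \cap T_2)$ and $\mathcal{S}$ and $\mathcal{T}$ are $\sigma$-algebras, it follows that $A \cap B \in \mathcal{U}$. Hence, $\mathcal{U}$ is closed under finite intersections. Now let $C \in \mathcal{U}$ and choose $S \in \mathcal{S}$ and $T \in \mathcal{T}$ such that $C = S \times T$. Then it is easy to see that $C^c = (\Gamma \times T^c) \cup (S^c \times T)$, which is a finite disjoint union of members of $\mathcal{U}$.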
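The complement decomposition used in the proof is easy to test on finite sets. The following Python sketch is an illustration only; the particular sets `Gamma`, `Lam`, `S`, `T` and the helper `rect` are hypothetical choices, and counting measure is used on each factor as in Example 4.18(b). It checks that $(S \times T)^c$ is the disjoint union of $\Gamma \times T^c$ and $S^c \times T$ and that the product of the counting measures adds up over the two pieces.

```python
from itertools import product

# Assumed finite factor spaces and a measurable rectangle S x T.
Gamma = {1, 2, 3, 4}
Lam = {"a", "b", "c"}
S = {1, 2}
T = {"a"}

def rect(A, B):
    # The Cartesian product A x B as a set of ordered pairs.
    return set(product(A, B))

whole = rect(Gamma, Lam)
C = rect(S, T)
complement = whole - C

part1 = rect(Gamma, Lam - T)   # Gamma x T^c
part2 = rect(Gamma - S, T)     # S^c x T

def iota(A, B):
    # With counting measure on each factor, iota(A x B) = mu(A) * nu(B).
    return len(A) * len(B)

pieces_total = iota(Gamma, Lam - T) + iota(Gamma - S, T)
```

Here `part1` and `part2` are disjoint, their union is `complement`, and `pieces_total` equals the number of points in the complement, which is the finite additivity that Lemma 4.6 establishes in general.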
Next we define a nonnegative extended real-valued set function, $\iota$, on $\mathcal{U}$. In view of (4.36), this should be done as follows: For $S \in \mathcal{S}$ and $T \in \mathcal{T}$, define
$$\iota(S \times T) = \mu(S)\nu(T). \tag{4.37}$$
If we can verify that Conditions (E1)-(E3) on page 208 hold for $\iota$ and $\mathcal{U}$, then Theorem 4.11 on page 214 will ensure the existence of a $\sigma$-algebra, $\mathcal{A} \supset \mathcal{U}$, and a measure, $\omega$, on $\mathcal{A}$ satisfying (4.36), thereby completing our first task.
To verify Condition (E1), we note that $\emptyset \in \mathcal{U}$. In fact, $\emptyset = S \times T$ if and only if at least one of $S$ and $T$ is empty. But then, with the convention $0 \cdot \infty = 0$, we have $\iota(\emptyset) = \mu(S)\nu(T) = 0$, as required. The validity of Conditions (E2) and (E3) is established here in Lemmas 4.6 and 4.7, respectively.
LEMMA 4.6
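Suppose that $\{C_k\}_{k=1}^n$ is a finite sequence of pairwise disjoint members of $\mathcal{U}$ whose union is in $\mathcal{U}$. Then $\iota\bigl(\bigcup_{k=1}^n C_k\bigr) = \sum_{k=1}^n \iota(C_k)$.
PROOF: Set $C = \bigcup_{k=1}^n C_k$. Then, by assumption, we can write $C = S \times T$ and $C_k = S_k \times T_k$, for $1 \le k \le n$, where $S, S_k \in \mathcal{S}$ and $T, T_k \in \mathcal{T}$. Let $x \in S$ and set $N_x = \{ k : x \in S_k \}$. If $y \in T$, then $(x, y) \in C$ and so there is a $k$ such that $(x, y) \in S_k \times T_k$; thus, $y \in T_k$ for some $k \in N_x$. On the other hand, if $y \in T_k$ for some $k \in N_x$, then $(x, y) \in S_k \times T_k \subset S \times T$, so that $y \in T$. Hence, $T = \bigcup_{k \in N_x} T_k$ and, since the $C_k$s are pairwise disjoint, the sets $T_k$, $k \in N_x$, are also pairwise disjoint. Consequently, $\nu(T) = \sum_{k \in N_x} \nu(T_k)$. It follows (see Exercise 4.133) that
$$\nu(T)\chi_S(x) = \sum_{k=1}^n \nu(T_k)\chi_{S_k}(x) \tag{4.38}$$
for all $x \in \Gamma$. Therefore,
$$\begin{aligned}
\iota(C) = \mu(S)\nu(T) &= \int_\Gamma \nu(T)\chi_S(x) \, d\mu(x) = \int_\Gamma \sum_{k=1}^n \nu(T_k)\chi_{S_k}(x) \, d\mu(x) \\
&= \sum_{k=1}^n \int_\Gamma \nu(T_k)\chi_{S_k}(x) \, d\mu(x) = \sum_{k=1}^n \mu(S_k)\nu(T_k) = \sum_{k=1}^n \iota(C_k),
\end{aligned}$$
as required.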
LEMMA 4.7
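Assume $C, C_1, C_2, \ldots$ are in $\mathcal{U}$ and $C \subset \bigcup_n C_n$. Then $\iota(C) \le \sum_n \iota(C_n)$.
PROOF: We can write $C = S \times T$ and $C_n = S_n \times T_n$, where $S, S_n \in \mathcal{S}$ and $T, T_n \in \mathcal{T}$. Let $x \in S$ and set $N_x = \{ n : x \in S_n \}$. If $y \in T$, then $(x, y) \in C$ and so there is an $n \in N_x$ such that $(x, y) \in S_n \times T_n$. Therefore, $T \subset \bigcup_{n \in N_x} T_n$ and so $\nu(T) \le \sum_{n \in N_x} \nu(T_n)$. This implies that $\nu(T)\chi_S(x) \le \sum_n \nu(T_n)\chi_{S_n}(x)$ for all $x \in \Gamma$. Thus,
$$\begin{aligned}
\iota(C) = \mu(S)\nu(T) &= \int_\Gamma \nu(T)\chi_S(x) \, d\mu(x) \le \int_\Gamma \sum_n \nu(T_n)\chi_{S_n}(x) \, d\mu(x) \\
&= \sum_n \int_\Gamma \nu(T_n)\chi_{S_n}(x) \, d\mu(x) = \sum_n \mu(S_n)\nu(T_n) = \sum_n \iota(C_n),
\end{aligned}$$
as required.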
We have now verified that Conditions (E1)-(E3) are satisfied by $\iota$ and $\mathcal{U}$. Therefore, by Theorem 4.11 on page 214, we can deduce the following result, which completes our first task.
THEOREM 4.14
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are measure spaces. Let $\mathcal{U} = \{ S \times T : S \in \mathcal{S} \text{ and } T \in \mathcal{T} \}$ and define $\iota$ on $\mathcal{U}$ by $\iota(S \times T) = \mu(S)\nu(T)$. Then there exists an extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{U}$.
The Product Measure Space
In most of our work with product measure spaces, it will be necessary to impose a restriction on the factors, $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$, namely that they are $\sigma$-finite measure spaces.
DEFINITION 4.23 $\sigma$-finite Measure Space
A measure space, $(\Omega, \mathcal{A}, \mu)$, is called a $\sigma$-finite measure space if there is a sequence, $\{A_n\}_n$, of $\mathcal{A}$-measurable sets such that $\bigcup_n A_n = \Omega$ and $\mu(A_n) < \infty$ for each $n$.
EXAMPLE 4.19 Illustrates Definition 4.23
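a) $(\mathcal{R}, \mathcal{M}, \lambda)$ is $\sigma$-finite. Indeed, the sets, $A_n = [-n, n]$, $n \in \mathcal{N}$, satisfy $\bigcup_{n=1}^\infty A_n = \mathcal{R}$ and $\lambda(A_n) < \infty$ for each $n \in \mathcal{N}$.
b) Let $\gamma$ be counting measure on $\mathcal{P}(\mathcal{N})$. We have $\mathcal{N} = \bigcup_{n=1}^\infty \{n\}$ and $\gamma(\{n\}) = 1 < \infty$ for each $n \in \mathcal{N}$. Therefore, we see that $(\mathcal{N}, \mathcal{P}(\mathcal{N}), \gamma)$ is a $\sigma$-finite measure space.
c) Let $\Omega$ be a nonempty set and $\mathcal{A} = \{\Omega, \emptyset\}$. Define $\mu(\Omega) = \infty$ and $\mu(\emptyset) = 0$. Then $(\Omega, \mathcal{A}, \mu)$ is not a $\sigma$-finite measure space.
d) Clearly, any finite measure space is $\sigma$-finite. In particular, any probability space is a $\sigma$-finite measure space. □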
As you probably noted, the condition of $\sigma$-finiteness is quite similar to Condition (E4) on page 216. In fact, the next proposition shows that, for product measure spaces, there is an important relationship between the two conditions.
PROPOSITION 4.20
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are two $\sigma$-finite measure spaces. Let $\mathcal{U}$ be the semialgebra of measurable rectangles and $\iota$ the nonnegative extended real-valued set function on $\mathcal{U}$ as defined in (4.37) on page 232. Then Condition (E4) is satisfied by $\iota$ and $\mathcal{U}$.
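PROOF: By the $\sigma$-finiteness assumption, we can choose $\{S_n\}_n \subset \mathcal{S}$ and $\{T_n\}_n \subset \mathcal{T}$ such that $\mu(S_n) < \infty$ and $\nu(T_n) < \infty$, for all $n$, and $\Gamma = \bigcup_n S_n$ and $\Lambda = \bigcup_n T_n$. Let $A_n = \bigcup_{k=1}^n S_k$ and $B_n = \bigcup_{k=1}^n T_k$. Then $\{A_n\}_n \subset \mathcal{S}$ and $\{B_n\}_n \subset \mathcal{T}$; $\mu(A_n) < \infty$ and $\nu(B_n) < \infty$ for all $n$; and $\Gamma = \bigcup_n A_n$ and $\Lambda = \bigcup_n B_n$. Moreover, $A_1 \subset A_2 \subset \cdots$ and $B_1 \subset B_2 \subset \cdots$.
Let $C_n = A_n \times B_n$. We claim that $\{C_n\}_n$ is the required sequence of sets; that is, $\{C_n\}_n \subset \mathcal{U}$, $\bigcup_n C_n = \Gamma \times \Lambda$, and $\iota(C_n) < \infty$ for each $n$. The first and third properties of $\{C_n\}_n$ are obvious from the previous paragraph. To prove the second property, suppose that $(x, y) \in \Gamma \times \Lambda$. Since $x \in \Gamma$ and $y \in \Lambda$, there is an $n_1$ with $x \in A_{n_1}$, and an $n_2$ with $y \in B_{n_2}$. Let $n = \max\{n_1, n_2\}$. Then, because $\{A_n\}_n$ and $\{B_n\}_n$ are nondecreasing sequences of sets, we have $x \in A_n$ and $y \in B_n$ and, consequently, $(x, y) \in A_n \times B_n = C_n$. Thus, $\Gamma \times \Lambda = \bigcup_n C_n$.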
To summarize, we have now shown that Conditions (E1)-(E3) hold for $\iota$ and $\mathcal{U}$; and that, if $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are both $\sigma$-finite, then Condition (E4) holds as well. Therefore, on account of Theorem 4.12 on page 216, we have the following result.
THEOREM 4.15
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces. Let
$$\mathcal{U} = \{ S \times T : S \in \mathcal{S} \text{ and } T \in \mathcal{T} \}$$
and define $\iota$ on $\mathcal{U}$ by $\iota(S \times T) = \mu(S)\nu(T)$. Then there exists a unique extension of $\iota$ to a measure on the $\sigma$-algebra generated by $\mathcal{U}$.
Special notation and terminology are used for the extension of $\iota$, the $\sigma$-algebra generated by $\mathcal{U}$, and the resulting measure space. This is introduced in Definition 4.24.
DEFINITION 4.24 Product Measure Space
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces and let $\mathcal{U}$ and $\iota$ be as in Theorem 4.15. The $\sigma$-algebra generated by $\mathcal{U}$, the smallest $\sigma$-algebra containing all measurable rectangles, is called the product $\sigma$-algebra of $\mathcal{S}$ with $\mathcal{T}$ and is denoted by $\mathcal{S} \times \mathcal{T}$. The unique extension of $\iota$ to a measure on $\mathcal{S} \times \mathcal{T}$ is called the product measure of $\mu$ with $\nu$ and is denoted by $\mu \times \nu$. The measure space $(\Gamma \times \Lambda, \mathcal{S} \times \mathcal{T}, \mu \times \nu)$ is called the product measure space of $(\Gamma, \mathcal{S}, \mu)$ with $(\Lambda, \mathcal{T}, \nu)$.
Note: It is important to realize that $\mathcal{S} \times \mathcal{T}$ is a notation for the $\sigma$-algebra generated by $\mathcal{U}$ and is not the Cartesian product of the sets $\mathcal{S}$ and $\mathcal{T}$.
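EXAMPLE 4.20 Illustrates Definition 4.24
a) Let $(\Gamma, \mathcal{S}, \mu) = (\Lambda, \mathcal{T}, \nu) = (\mathcal{R}, \mathcal{M}, \lambda)$. Since $\mathcal{M}$ contains all intervals, any rectangle, $R \subset \mathcal{R}^2$, is a measurable rectangle. If $R = I \times J$, where $I$ and $J$ are intervals, then $(\lambda \times \lambda)(R) = \lambda(I)\lambda(J) = \operatorname{area}(R)$. So, $\lambda \times \lambda$ is a generalization of area to all $\mathcal{M} \times \mathcal{M}$-measurable sets.
b) Let $(\Gamma, \mathcal{S}, \mu) = (\Lambda, \mathcal{T}, \nu) = (\mathcal{N}, \mathcal{P}(\mathcal{N}), \gamma)$, where $\gamma$ is counting measure on $\mathcal{P}(\mathcal{N})$. As we know from Example 4.19(b), $(\mathcal{N}, \mathcal{P}(\mathcal{N}), \gamma)$ is a $\sigma$-finite measure space. We leave it as an exercise for the reader to show that the product $\sigma$-algebra of $\mathcal{P}(\mathcal{N})$ with $\mathcal{P}(\mathcal{N})$ consists of all subsets of $\mathcal{N} \times \mathcal{N}$; that is, $\mathcal{P}(\mathcal{N}) \times \mathcal{P}(\mathcal{N}) = \mathcal{P}(\mathcal{N} \times \mathcal{N})$. And, furthermore, the product measure of $\gamma$ with $\gamma$ is counting measure on $\mathcal{P}(\mathcal{N} \times \mathcal{N})$. In other words, the product measure space, $(\mathcal{N} \times \mathcal{N}, \mathcal{P}(\mathcal{N}) \times \mathcal{P}(\mathcal{N}), \gamma \times \gamma)$, is the measure space $(\mathcal{N} \times \mathcal{N}, \mathcal{P}(\mathcal{N} \times \mathcal{N}), \kappa)$, where $\kappa$ is counting measure on $\mathcal{P}(\mathcal{N} \times \mathcal{N})$.
c) If $(\Omega_1, \mathcal{A}_1, P_1)$ and $(\Omega_2, \mathcal{A}_2, P_2)$ are two probability spaces, then so is $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \times \mathcal{A}_2, P_1 \times P_2)$. As we will discover in Chapter 5, the product probability space is the appropriate mathematical model for the juxtaposition of two independent experiments. □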
Sections of Sets and Functions in Product Spaces
We learned in calculus that a double (Riemann) integral can be evaluated
as two iterated single integrals. Our next task is to prove a generalization
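of that result to product measure spaces; roughly speaking, a theorem of the following form: If $f: \Gamma \times \Lambda \to \mathcal{R}$ is $\mathcal{S} \times \mathcal{T}$-measurable, then
$$\int_{\Gamma \times \Lambda} f \, d(\mu \times \nu) = \int_\Gamma \left[ \int_\Lambda f(x, y) \, d\nu(y) \right] d\mu(x) = \int_\Lambda \left[ \int_\Gamma f(x, y) \, d\mu(x) \right] d\nu(y). \tag{4.39}$$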
In establishing (4.39), we must first show that it makes sense. For instance, we need to verify that if $f$ is an $\mathcal{S} \times \mathcal{T}$-measurable function on $\Gamma \times \Lambda$, then the function, $f_{[x]}$, defined on $\Lambda$ by $f_{[x]}(y) = f(x, y)$ is $\mathcal{T}$-measurable; that the function, $g$, defined on $\Gamma$ by $g(x) = \int_\Lambda f(x, y) \, d\nu(y)$ is $\mathcal{S}$-measurable; and so forth. To begin, we define the sections of a set.
DEFINITION 4.25 Sections of a Set in a Product Space
Suppose $A \subset \Gamma \times \Lambda$. Then the $\Gamma$-sections of $A$ and the $\Lambda$-sections of $A$ are defined, respectively, by
$$A_x = \{ y \in \Lambda : (x, y) \in A \}, \quad x \in \Gamma;$$
and
$$A^y = \{ x \in \Gamma : (x, y) \in A \}, \quad y \in \Lambda.$$
Note that each $\Gamma$-section is a subset of $\Lambda$ and that each $\Lambda$-section is a subset of $\Gamma$. Figure 4.1 provides a visual representation of a $\Gamma$-section.
FIGURE 4.1 A $\Gamma$-section.
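EXAMPLE 4.21 Illustrates Definition 4.25
a) Let $A = S \times T$, where $S \subset \Gamma$ and $T \subset \Lambda$. Then
$$A_x = \begin{cases} T, & \text{if } x \in S; \\ \emptyset, & \text{if } x \notin S, \end{cases}$$
and
$$A^y = \begin{cases} S, & \text{if } y \in T; \\ \emptyset, & \text{if } y \notin T. \end{cases}$$
b) Let $\Gamma = \Lambda = \mathcal{R}$ and $A = \{ (x, y) : x^2 + 4y^2 \le 4 \}$. Then
$$A_x = \left[ -\tfrac{1}{2}(4 - x^2)^{1/2}, \ \tfrac{1}{2}(4 - x^2)^{1/2} \right]$$
for $|x| \le 2$, and $A_x = \emptyset$, otherwise. Similarly,
$$A^y = \left[ -2(1 - y^2)^{1/2}, \ 2(1 - y^2)^{1/2} \right]$$
for $|y| \le 1$, and $A^y = \emptyset$, otherwise. □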
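The section formulas of part (b) can be compared with Definition 4.25 on a grid of points. The Python sketch below is only an illustration under assumptions made here: the set is taken to be the closed elliptical region $x^2 + 4y^2 \le 4$, an empty section is represented by `None`, and the function names are hypothetical.

```python
import math

def in_A(x, y):
    # Membership in A = {(x, y) : x^2 + 4 y^2 <= 4}.
    return x * x + 4.0 * y * y <= 4.0

def gamma_section(x):
    # A_x = [-(1/2) sqrt(4 - x^2), (1/2) sqrt(4 - x^2)] for |x| <= 2; empty otherwise.
    if abs(x) > 2.0:
        return None
    r = 0.5 * math.sqrt(4.0 - x * x)
    return (-r, r)

def lambda_section(y):
    # A^y = [-2 sqrt(1 - y^2), 2 sqrt(1 - y^2)] for |y| <= 1; empty otherwise.
    if abs(y) > 1.0:
        return None
    r = 2.0 * math.sqrt(1.0 - y * y)
    return (-r, r)

def contains(interval, t):
    # True when t lies in the closed interval; an empty section contains nothing.
    return interval is not None and interval[0] <= t <= interval[1]

def sections_match_definition(step=0.25, x_max=2.5, y_max=1.5):
    # Compare both section formulas with direct membership on a grid of multiples of step.
    xs = [k * step for k in range(-int(x_max / step), int(x_max / step) + 1)]
    ys = [k * step for k in range(-int(y_max / step), int(y_max / step) + 1)]
    return all(
        contains(gamma_section(x), y) == in_A(x, y)
        and contains(lambda_section(y), x) == in_A(x, y)
        for x in xs
        for y in ys
    )
```

The grid step of $0.25$ is chosen so that the only grid points on the boundary of the region are $(\pm 2, 0)$ and $(0, \pm 1)$, where the floating-point arithmetic is exact.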
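We next prove that sections of $\mathcal{S} \times \mathcal{T}$-measurable sets are themselves measurable. More precisely, we have the following proposition.
PROPOSITION 4.21
Suppose that $(\Gamma, \mathcal{S})$ and $(\Lambda, \mathcal{T})$ are measurable spaces and that $A \in \mathcal{S} \times \mathcal{T}$. Then
a) $A_x \in \mathcal{T}$ for all $x \in \Gamma$.
b) $A^y \in \mathcal{S}$ for all $y \in \Lambda$.
PROOF: We prove only (a). The proof of (b) is similar and is left as an exercise for the reader. Set
$$\mathcal{D} = \{ A \in \mathcal{S} \times \mathcal{T} : A_x \in \mathcal{T} \text{ for all } x \in \Gamma \}.$$
It follows immediately from Example 4.21(a) that $\mathcal{D}$ contains all measurable rectangles; that is, $\mathcal{D} \supset \mathcal{U}$. Because $\mathcal{S} \times \mathcal{T}$ is, by definition, the smallest $\sigma$-algebra containing $\mathcal{U}$, the proof will be complete once we show that $\mathcal{D}$ is a $\sigma$-algebra; because that will imply $\mathcal{D} = \mathcal{S} \times \mathcal{T}$ (why?).
So, assume that $A \in \mathcal{D}$. Then $A \in \mathcal{S} \times \mathcal{T}$ and $A_x \in \mathcal{T}$ for all $x \in \Gamma$. Therefore, $A^c \in \mathcal{S} \times \mathcal{T}$ and $(A_x)^c \in \mathcal{T}$ for all $x \in \Gamma$. But, $(A_x)^c = (A^c)_x$ and, hence, $A^c \in \mathcal{D}$. Now assume that $\{A_n\}_n \subset \mathcal{D}$. Then $\{A_n\}_n \subset \mathcal{S} \times \mathcal{T}$ and $\{(A_n)_x\}_n \subset \mathcal{T}$ for all $x \in \Gamma$. Thus, $\bigcup_n A_n \in \mathcal{S} \times \mathcal{T}$ and $\bigcup_n (A_n)_x \in \mathcal{T}$ for all $x \in \Gamma$. However, $\bigcup_n (A_n)_x = \bigl(\bigcup_n A_n\bigr)_x$ and, so, $\bigcup_n A_n \in \mathcal{D}$. Consequently, $\mathcal{D}$ is a $\sigma$-algebra.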
Having discussed sections of subsets of $\Gamma \times \Lambda$, we now move on to the consideration of sections of functions on $\Gamma \times \Lambda$. The sections of such functions are obtained by holding one of the two variables fixed.
DEFINITION 4.26 Sections of a Function on a Product Space
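Suppose that $f$ is a function on $\Gamma \times \Lambda$. Then the $\Gamma$-sections of $f$ and the $\Lambda$-sections of $f$ are defined, respectively, by
$$f_{[x]}(y) = f(x, y), \quad x \in \Gamma;$$
and
$$f^{[y]}(x) = f(x, y), \quad y \in \Lambda.$$
Note that each $\Gamma$-section of $f$ is a function on $\Lambda$ and that each $\Lambda$-section of $f$ is a function on $\Gamma$.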
EXAMPLE 4.22 Illustrates Definition 4.26
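Let $\Gamma = \mathcal{R}$ and $\Lambda = \mathcal{N}$. Define $f: \mathcal{R} \times \mathcal{N} \to \mathcal{R}$ by
$$f(x, y) = x^y + \frac{x^2}{y}.$$
Then $f_{[1/2]}: \mathcal{N} \to \mathcal{R}$ is given by $f_{[1/2]}(y) = 1/2^y + 1/4y$ and $f^{[2]}: \mathcal{R} \to \mathcal{R}$ is given by $f^{[2]}(x) = x^2 + x^2/2 = 3x^2/2$. □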
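In programming terms, taking a section of a function amounts to fixing one argument. The Python sketch below illustrates this for the function of the example, taken here to be $f(x, y) = x^y + x^2/y$, the formula consistent with the two sections computed above; the helper names `gamma_section` and `lambda_section` are hypothetical, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction

def f(x, y):
    # Assumed formula: f(x, y) = x^y + x^2 / y, for real x and positive integer y.
    return x ** y + x ** 2 / y

def gamma_section(f, x):
    # The Gamma-section at x: the function y -> f(x, y) on Lambda.
    return lambda y: f(x, y)

def lambda_section(f, y):
    # The Lambda-section at y: the function x -> f(x, y) on Gamma.
    return lambda x: f(x, y)

half = Fraction(1, 2)
f_half = gamma_section(f, half)   # y -> 1/2^y + 1/(4y)
f_two = lambda_section(f, 2)      # x -> x^2 + x^2/2 = 3x^2/2
```

For instance, `f_half(3)` is $1/8 + 1/12 = 5/24$, and `f_two` agrees with $x \mapsto 3x^2/2$ at every rational argument tried.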
Proposition 4.22, which we prove next, shows that sections of $\mathcal{S} \times \mathcal{T}$-measurable functions are themselves measurable functions.
PROPOSITION 4.22
Let $(\Gamma, \mathcal{S})$ and $(\Lambda, \mathcal{T})$ be measurable spaces. Suppose that $f$ is an extended real-valued or complex-valued $\mathcal{S} \times \mathcal{T}$-measurable function on $\Gamma \times \Lambda$. Then
a) $f_{[x]}$ is $\mathcal{T}$-measurable for all $x \in \Gamma$.
b) $f^{[y]}$ is $\mathcal{S}$-measurable for all $y \in \Lambda$.
PROOF: To prove part (a), let $x \in \Gamma$. We will employ the bootstrapping technique to show that $f_{[x]}$ is $\mathcal{T}$-measurable. So, assume first that $f = \chi_A$ where $A \in \mathcal{S} \times \mathcal{T}$. Then,
$$f_{[x]}(y) = f(x, y) = \begin{cases} 1, & (x, y) \in A; \\ 0, & (x, y) \notin A \end{cases} = \begin{cases} 1, & y \in A_x; \\ 0, & y \notin A_x \end{cases} = \chi_{A_x}(y).$$
Since $A \in \mathcal{S} \times \mathcal{T}$, we know by Proposition 4.21(a) that $A_x \in \mathcal{T}$ and, hence, that $\chi_{A_x}$ is $\mathcal{T}$-measurable. Next assume that $f$ is a nonnegative simple function, say $f = \sum_{k=1}^n a_k \chi_{A_k}$, where $A_k \in \mathcal{S} \times \mathcal{T}$ for $1 \le k \le n$. Then $f_{[x]} = \sum_{k=1}^n a_k \chi_{(A_k)_x}$ which is $\mathcal{T}$-measurable, being a linear combination of $\mathcal{T}$-measurable functions.
Now assume $f$ is a nonnegative extended real-valued $\mathcal{S} \times \mathcal{T}$-measurable function. Then, by Proposition 4.7(a) on page 186, there is a sequence, $\{s_n\}_{n=1}^\infty$, of nonnegative $\mathcal{S} \times \mathcal{T}$-measurable simple functions that converges pointwise to $f$ on $\Gamma \times \Lambda$. From the previous paragraph, we know that $\{(s_n)_{[x]}\}_{n=1}^\infty$ is a sequence of $\mathcal{T}$-measurable functions on $\Lambda$. Moreover, since $s_n \to f$ pointwise on $\Gamma \times \Lambda$, it is clear that $(s_n)_{[x]} \to f_{[x]}$ pointwise on $\Lambda$. Therefore, by Proposition 4.7(b), $f_{[x]}$ is $\mathcal{T}$-measurable.
Next assume that $f$ is an extended real-valued $\mathcal{S} \times \mathcal{T}$-measurable function. We write $f = f^+ - f^-$ and note that $f_{[x]} = (f^+)_{[x]} - (f^-)_{[x]}$. Using the result of the previous paragraph and the fact that the difference of two $\mathcal{T}$-measurable functions is $\mathcal{T}$-measurable, we conclude that $f_{[x]}$ is $\mathcal{T}$-measurable.
Finally, assume that $f$ is a complex-valued $\mathcal{S} \times \mathcal{T}$-measurable function. We write $f = \Re f + i \Im f$ and note that $f_{[x]} = (\Re f)_{[x]} + i(\Im f)_{[x]}$. Then we apply the result of the previous paragraph and the fact that a linear combination of $\mathcal{T}$-measurable functions is $\mathcal{T}$-measurable to conclude that $f_{[x]}$ is $\mathcal{T}$-measurable. This completes the proof of part (a). The proof of part (b) is similar and is left as an exercise.
In final preparation for our theorems on iterated integrals in product spaces, which we will consider in the next section, we prove the following two lemmas. Note the $\sigma$-finiteness assumptions in each lemma.
LEMMA 4.8
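Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are two $\sigma$-finite measure spaces. Then, for each $A \in \mathcal{S} \times \mathcal{T}$,
a) the function, $g$, defined on $\Gamma$ by $g(x) = \nu(A_x)$ is $\mathcal{S}$-measurable.
b) the function, $h$, defined on $\Lambda$ by $h(y) = \mu(A^y)$ is $\mathcal{T}$-measurable.
PROOF: Let
$$\mathcal{D} = \{ A \in \mathcal{S} \times \mathcal{T} : \text{(a) and (b) hold} \}.$$
We will show that $\mathcal{D} \supset \mathcal{A}_0(\mathcal{U})$ and that $\mathcal{D}$ is closed under monotone limits. It will then follow from the monotone class theorem, Theorem 1.1 on page 30, that $\mathcal{D} \supset \mathcal{S} \times \mathcal{T}$. Since, by definition, $\mathcal{D} \subset \mathcal{S} \times \mathcal{T}$, we will have $\mathcal{D} = \mathcal{S} \times \mathcal{T}$, as required.
We first establish that $\mathcal{D} \supset \mathcal{U}$. So, let $S \times T$ be a measurable rectangle. Then $(S \times T)_x = T$, if $x \in S$, and is empty otherwise. Consequently, we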
have g(x) = i/((S x T)x) = v(T)xs(x). Since S € S, g is 5-measurable.
Similarly, we find that h(y) = x T)37) =	is T-measurable.
So, T)^U.
Next we show that P D Ao(^). Let A € Aq(LT). Then, by Lemma 4.2
on page 215, A is a finite disjoint union of members of ZY, say A — |J£=1 Ль-
Now, Ax = Uk=i(^A:)a: and, because the A^s are pairwise disjoint, so are
the (Afc)zs for each fixed x € Г. Consequently, v(Ax) = ^Jk=i р(СЛ)я)-
Because Ak G W, the previous paragraph implies that gk(x) = v((Ak)x) is
5-measurable. Hence, g(x) = z/(Ax) = 52fc=i^fc(^) is 5-measurable, being
a sum of 5-measurable functions. Similarly, the function h(y) = //(A37) is
T-measurable. Thus, V D Aq(U).
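Next we prove that $\mathcal{D}$ is closed under nondecreasing limits; that is,
if $\{A_n\}_{n=1}^{\infty} \subset \mathcal{D}$ and $A_1 \subset A_2 \subset \cdots$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{D}$. We have
$(A_1)_x \subset (A_2)_x \subset \cdots$ and, hence, Theorem 4.1(d) on page 170 implies that
$$\nu\Bigl(\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr)_x\Bigr) = \nu\Bigl(\bigcup_{n=1}^{\infty} (A_n)_x\Bigr) = \lim_{n\to\infty} \nu\bigl((A_n)_x\bigr). \tag{4.40}$$
Since $\{A_n\}_{n=1}^{\infty} \subset \mathcal{D}$, the function, $g_n(x) = \nu((A_n)_x)$, is $\mathcal{S}$-measurable for
each $n \in \mathcal{N}$ and, by (4.40), $g_n(x) \to \nu\bigl((\bigcup_{n=1}^{\infty} A_n)_x\bigr)$ pointwise on $\Gamma$. So,
by Theorem 4.5(d) on page 180, $g(x) = \nu\bigl((\bigcup_{n=1}^{\infty} A_n)_x\bigr)$ is $\mathcal{S}$-measurable.
A similar argument shows that the function, $h(y) = \mu\bigl((\bigcup_{n=1}^{\infty} A_n)^y\bigr)$, is
$\mathcal{T}$-measurable. Hence, $\bigcup_{n=1}^{\infty} A_n \in \mathcal{D}$.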
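Finally, we must verify that $\mathcal{D}$ is closed under nonincreasing limits;
that is, if $\{A_n\}_{n=1}^{\infty} \subset \mathcal{D}$ and $A_1 \supset A_2 \supset \cdots$, then $\bigcap_{n=1}^{\infty} A_n \in \mathcal{D}$. Suppose
first that there exist $S \in \mathcal{S}$ and $T \in \mathcal{T}$ with $\mu(S) < \infty$ and $\nu(T) < \infty$
such that $A_1 \subset S \times T$. Then $(A_1)_x \subset T$ and $(A_1)^y \subset S$ and, consequently,
$\nu((A_1)_x)$ and $\mu((A_1)^y)$ are both finite. Applying Theorem 4.1(c) and an
argument similar to the one used in the preceding paragraph, we find that
$\bigcap_{n=1}^{\infty} A_n \in \mathcal{D}$.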
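To handle the general case, that is, no restriction on $A_1$, we must
invoke the $\sigma$-finiteness assumption. We can select nondecreasing sequences,
$\{S_k\}_k \subset \mathcal{S}$ and $\{T_k\}_k \subset \mathcal{T}$, such that $\Gamma = \bigcup_k S_k$ and $\Lambda = \bigcup_k T_k$, and, for
all $k$, $\mu(S_k) < \infty$ and $\nu(T_k) < \infty$. Define
$$\mathcal{E} = \{\, E \in \mathcal{S} \times \mathcal{T} : E \cap (S_k \times T_k) \in \mathcal{D} \text{ for all } k \,\}. \tag{4.41}$$
We leave it as an exercise for the reader to prove that $\mathcal{E} = \mathcal{S} \times \mathcal{T}$. (See
Exercise 4.144.)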
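Again, let $\{A_n\}_{n=1}^{\infty}$ be a nonincreasing sequence of members of $\mathcal{D}$, but
this time with no restriction on $A_1$. For convenience, set $A = \bigcap_{n=1}^{\infty} A_n$.
Then $A \in \mathcal{E}$ ($= \mathcal{S} \times \mathcal{T}$) and, thus, $A \cap (S_k \times T_k) \in \mathcal{D}$ for all $k$. The sequence,
$\{S_k \times T_k\}_k$, is nondecreasing because $\{S_k\}_k$ and $\{T_k\}_k$ are nondecreasing.
This, in turn, implies that the sequence, $\{A \cap (S_k \times T_k)\}_k$, is nondecreasing.
Since we have already shown that $\mathcal{D}$ is closed under nondecreasing limits,
we can conclude that $\bigcup_k \bigl(A \cap (S_k \times T_k)\bigr) \in \mathcal{D}$. But, $\bigcup_k (S_k \times T_k) = \Gamma \times \Lambda$
(why?) and, consequently,
$$A = A \cap (\Gamma \times \Lambda) = A \cap \Bigl(\bigcup_k (S_k \times T_k)\Bigr) = \bigcup_k \bigl(A \cap (S_k \times T_k)\bigr).$$
This proves that $A \in \mathcal{D}$.
We have now established that $\mathcal{D} \supset \mathcal{A}_0(\mathcal{U})$ and that $\mathcal{D}$ is closed under
monotone limits. Therefore, by the monotone class theorem, $\mathcal{D}$ contains
the $\sigma$-algebra generated by $\mathcal{A}_0(\mathcal{U})$, which is $\mathcal{S} \times \mathcal{T}$. Since, by definition,
$\mathcal{D} \subset \mathcal{S} \times \mathcal{T}$, we deduce that $\mathcal{D} = \mathcal{S} \times \mathcal{T}$, as required.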
LEMMA 4.9
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are two $\sigma$-finite measure spaces. Then,
for each $A \in \mathcal{S} \times \mathcal{T}$,
a) $(\mu \times \nu)(A) = \int_\Gamma \nu(A_x)\,d\mu(x)$.
b) $(\mu \times \nu)(A) = \int_\Lambda \mu(A^y)\,d\nu(y)$.
PROOF: We will prove part (a). The proof of part (b) is similar and is
left as an exercise for the reader. For $A \in \mathcal{S} \times \mathcal{T}$, define
$$\tau(A) = \int_\Gamma \nu(A_x)\,d\mu(x).$$
In view of Lemma 4.8, the integral exists because the function, $g$, defined
on $\Gamma$ by $g(x) = \nu(A_x)$ is a (nonnegative) $\mathcal{S}$-measurable function. We will
show that $\tau$ is a measure on $\mathcal{S} \times \mathcal{T}$ and that $\tau = \mu \times \nu$ on $\mathcal{U}$. This will imply,
by the uniqueness portion of Theorem 4.15 on page 235, that $\tau = \mu \times \nu$ on
$\mathcal{S} \times \mathcal{T}$, as required.
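Clearly, $\tau(A) \ge 0$, for all $A \in \mathcal{S} \times \mathcal{T}$, and $\tau(\emptyset) = 0$. Assume that $\{A_n\}_n$
is a sequence of pairwise disjoint members of $\mathcal{S} \times \mathcal{T}$. Then $\{(A_n)_x\}_n$ is a
sequence of pairwise disjoint members of $\mathcal{T}$. Consequently,
$$\tau\Bigl(\bigcup_n A_n\Bigr) = \int_\Gamma \nu\Bigl(\bigcup_n (A_n)_x\Bigr)\,d\mu(x) = \int_\Gamma \sum_n \nu\bigl((A_n)_x\bigr)\,d\mu(x) = \sum_n \int_\Gamma \nu\bigl((A_n)_x\bigr)\,d\mu(x) = \sum_n \tau(A_n).$$
Hence, $\tau$ is a measure on $\mathcal{S} \times \mathcal{T}$.
Now suppose that $S \times T$ is a measurable rectangle. Then
$$\tau(S \times T) = \int_\Gamma \nu\bigl((S \times T)_x\bigr)\,d\mu(x) = \int_\Gamma \nu(T)\chi_S(x)\,d\mu(x) = \mu(S)\nu(T) = (\mu \times \nu)(S \times T).$$
This shows that $\tau$ agrees with $\mu \times \nu$ on $\mathcal{U}$.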
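The section formulas of Lemma 4.9 can be checked by hand on a small example. The following Python sketch is only an illustration under stated assumptions: the two spaces are finite, every subset is measurable, and the measures are given by hypothetical point masses (the weights and the names `mu`, `nu`, `x_section`, and so on are invented for the example, in the spirit of Exercise 4.138). In that setting the product measure of a set is the sum of $\mu(\{x\})\nu(\{y\})$ over its points, and the sketch compares that value with the two integrals of section measures.

```python
from fractions import Fraction

# Hypothetical point masses on two finite spaces Gamma and Lambda (assumed data).
mu = {"x1": Fraction(1, 2), "x2": Fraction(3), "x3": Fraction(0)}
nu = {"y1": Fraction(2), "y2": Fraction(1, 4)}


def measure(weights, subset):
    """Measure of a subset under the point-mass measure given by `weights`."""
    return sum((weights[p] for p in subset), Fraction(0))


def x_section(A, x):
    """The section A_x = { y : (x, y) in A }."""
    return {y for (u, y) in A if u == x}


def y_section(A, y):
    """The section A^y = { x : (x, y) in A }."""
    return {x for (x, v) in A if v == y}


def product_measure(A):
    """(mu x nu)(A) on a finite space: sum of mu({x}) * nu({y}) over A."""
    return sum((mu[x] * nu[y] for (x, y) in A), Fraction(0))


def integrate_x_sections(A):
    """Integral over Gamma of nu(A_x) with respect to mu, as in part (a)."""
    return sum((measure(nu, x_section(A, x)) * mu[x] for x in mu), Fraction(0))


def integrate_y_sections(A):
    """Integral over Lambda of mu(A^y) with respect to nu, as in part (b)."""
    return sum((measure(mu, y_section(A, y)) * nu[y] for y in nu), Fraction(0))


# A set that is not a measurable rectangle.
A = {("x1", "y1"), ("x2", "y1"), ("x2", "y2")}
direct = product_measure(A)
via_x = integrate_x_sections(A)
via_y = integrate_y_sections(A)
print(direct, via_x, via_y)
```

Exact rational arithmetic is used so that the three values can be compared with `==` rather than up to rounding.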
EXERCISES 4.8
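4.133  Verify (4.38) on page 233. Hint: Show that if $\nu(T_k)\chi_{S_k}(x) > 0$ for some $k$,
then $x \in S$.
4.134  Let $\mu$ be counting measure on $\mathcal{P}(\mathcal{R})$. Show that $(\mathcal{R}, \mathcal{P}(\mathcal{R}), \mu)$ is not a
$\sigma$-finite measure space.
4.135  Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces. Prove
that the product measure space, $(\Gamma \times \Lambda, \mathcal{S} \times \mathcal{T}, \mu \times \nu)$, is $\sigma$-finite.
4.136  Show that $\mathcal{M} \times \mathcal{M}$ contains all open and closed subsets of $\mathcal{R}^2$; that is,
each open or closed subset of $\mathcal{R}^2$ is $\mathcal{M} \times \mathcal{M}$-measurable.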
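★4.137  Let $\gamma$ be counting measure on $\mathcal{P}(\mathcal{N})$.
a) Show that the product $\sigma$-algebra of $\mathcal{P}(\mathcal{N})$ with $\mathcal{P}(\mathcal{N})$ consists of all
subsets of $\mathcal{N} \times \mathcal{N}$; that is, $\mathcal{P}(\mathcal{N}) \times \mathcal{P}(\mathcal{N}) = \mathcal{P}(\mathcal{N} \times \mathcal{N})$. Hint: $\mathcal{N} \times \mathcal{N}$
is countable.
b) Show that the product measure of $\gamma$ with $\gamma$ is counting measure on
$\mathcal{P}(\mathcal{N}) \times \mathcal{P}(\mathcal{N})$ ($= \mathcal{P}(\mathcal{N} \times \mathcal{N})$).
4.138  Suppose that $\Gamma = \{x_1, x_2, \ldots, x_m\}$ and $\Lambda = \{y_1, y_2, \ldots, y_n\}$ are finite sets
and that $\{a_1, a_2, \ldots, a_m\}$ and $\{b_1, b_2, \ldots, b_n\}$ are two sets of nonnegative
numbers. Define $\mu$ on $\mathcal{P}(\Gamma)$ and $\nu$ on $\mathcal{P}(\Lambda)$ by $\mu(A) = \sum_{x_j \in A} a_j$ and
$\nu(B) = \sum_{y_k \in B} b_k$. Determine explicitly
a) the product $\sigma$-algebra, $\mathcal{P}(\Gamma) \times \mathcal{P}(\Lambda)$.
b) the product measure, $\mu \times \nu$.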
4.139  Suppose that $g$ is a complex-valued $\mathcal{S}$-measurable function on $\Gamma$ and that
$h$ is a complex-valued $\mathcal{T}$-measurable function on $\Lambda$. Define $f$ on $\Gamma \times \Lambda$ by
$f(x,y) = g(x)h(y)$. Show that $f$ is $\mathcal{S} \times \mathcal{T}$-measurable.
4.140  Let $\Gamma = \Lambda = \mathcal{R}$, and $A = \{\,(x,y) : 0 < y < x^2 \text{ and } x > 0\,\}$. Determine $A_x$
and $A^y$.
4.141  Prove part (b) of Proposition 4.21 on page 238.
4.142  Prove part (b) of Proposition 4.22 on page 239.
4.143  True or False:
a) If $A, B \subset \Gamma \times \Lambda$ are disjoint and $x \in \Gamma$, then $A_x$ and $B_x$ are disjoint.
b) If $S_1 \times T_1$ and $S_2 \times T_2$ are disjoint rectangles in $\Gamma \times \Lambda$, then $S_1$ and $S_2$
are disjoint.
c) If $S_1 \times T_1$ and $S_2 \times T_2$ are disjoint rectangles in $\Gamma \times \Lambda$, then either $S_1$
and $S_2$ are disjoint or $T_1$ and $T_2$ are disjoint.
4.144  Let $\mathcal{E}$ be defined as in (4.41) on page 241. Prove that $\mathcal{E} = \mathcal{S} \times \mathcal{T}$ by
employing the following steps.
a) Show that $\mathcal{E} \supset \mathcal{A}_0(\mathcal{U})$. Hint: $\mathcal{A}_0(\mathcal{U})$ is an algebra and $\mathcal{A}_0(\mathcal{U}) \subset \mathcal{D}$.
b) Show that $\mathcal{E}$ is closed under nondecreasing limits.
c) Show that $\mathcal{E}$ is closed under nonincreasing limits.
d) Conclude that $\mathcal{E} = \mathcal{S} \times \mathcal{T}$ by employing the monotone class theorem.
4.145  This exercise shows that $\sigma$-finiteness cannot be omitted as a hypothesis
in Lemma 4.8. Let $S$ be the set defined in Lemma 3.12 on page 116. By
Exercise 3.51, $S \notin \mathcal{M}$. Also, let $(\Gamma, \mathcal{S}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $(\Lambda, \mathcal{T}, \nu) =
(\mathcal{R}, \mathcal{P}(\mathcal{R}), \nu)$, where $\nu(T) = \gamma(T \cap S)$ and $\gamma$ is counting measure.
a) Let $Q_0 = Q \setminus \{0\}$ and define $A = \{\,(x, x + r) : x \in \mathcal{R},\ r \in Q_0\,\}$. Show
that $A \in \mathcal{M} \times \mathcal{P}(\mathcal{R})$. Hint: First show that the function, $f(x,y) = y - x$,
is $\mathcal{M} \times \mathcal{P}(\mathcal{R})$-measurable.
b) Show that $\nu(A_x) = \chi_{S^c}(x)$ and conclude that $x \mapsto \nu(A_x)$ is not
$\mathcal{M}$-measurable.
c) Why doesn’t the result in part (b) contradict Lemma 4.8?
4.146  Prove part (b) of Lemma 4.9 on page 242.
Exercises 4.147-4.151 should be completed by all readers who plan to cover the
probability material in Chapter 5.
★4.147  Denote by $\mathcal{B}_2$ the smallest $\sigma$-algebra of subsets of $\mathcal{R}^2$ that contains all open
sets of $\mathcal{R}^2$. Members of $\mathcal{B}_2$ are called two-dimensional Borel sets.
a) Show that $\mathcal{B}_2 = \mathcal{B} \times \mathcal{B}$.
b) A measure on $\mathcal{B}_2$ is called a two-dimensional Borel measure. Suppose
that $\mu$ and $\nu$ are finite two-dimensional Borel measures such that
$\mu(A \times B) = \nu(A \times B)$ for all $A, B \in \mathcal{B}$. Prove that $\mu = \nu$.
4.148  Let $\mathcal{I}_2$ denote the collection of all two-dimensional intervals in $\mathcal{R}^2$; that
is, all sets of the form $I \times J$ where $I, J \in \mathcal{I}$.
a) Show that the $\sigma$-algebra generated by $\mathcal{I}_2$ is $\mathcal{B}_2$; that is, $\sigma(\mathcal{I}_2) = \mathcal{B}_2$.
Hint: Use Exercise 4.147(a).
b) Let $\mu$ and $\nu$ be two-dimensional Borel measures such that $\mu(K) < \infty$
for all bounded two-dimensional intervals and $\mu(K) = \nu(K)$ for all
$K \in \mathcal{I}_2$. Prove that $\mu = \nu$.
4.149  Let $\mathcal{J}$ denote the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ and $(c, \infty)$,
where $-\infty \le a < b < \infty$ and $-\infty \le c < \infty$. Also, let $\mathcal{J}_2$ denote the
collection of all subsets of $\mathcal{R}^2$ of the form $I \times J$ where $I, J \in \mathcal{J}$. Prove
that $\mathcal{J}_2$ is a semialgebra and that the $\sigma$-algebra generated by $\mathcal{J}_2$ is $\mathcal{B}_2$.
4.150  Suppose that $\mu$ and $\nu$ are finite two-dimensional Borel measures such that
$\mu\bigl((-\infty, x] \times (-\infty, y]\bigr) = \nu\bigl((-\infty, x] \times (-\infty, y]\bigr)$ for all $x, y \in \mathcal{R}$. Prove
that $\mu = \nu$. Hint: It suffices to prove that $\mu = \nu$ on $\mathcal{J}_2$.
4.151  Let $\mu$ and $\nu$ be finite Borel measures and $\tau$ a two-dimensional Borel measure.
Suppose that $\tau\bigl((-\infty, x] \times (-\infty, y]\bigr) = \mu\bigl((-\infty, x]\bigr)\,\nu\bigl((-\infty, y]\bigr)$ for all
$x, y \in \mathcal{R}$. Prove that $\tau = \mu \times \nu$.
4.9 ITERATION OF INTEGRALS IN PRODUCT
MEASURE SPACES
In Section 4.8 we discussed product measure and product measure spaces.
Now we will learn how to evaluate integrals on product measure spaces by
iteration; that is, by the evaluation of two integrals on the factor spaces.
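We will present several theorems of this type. The first theorem is known
as Tonelli’s theorem.†
THEOREM 4.16 Tonelli’s Theorem
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces. Let $f$ be
a nonnegative extended real-valued $\mathcal{S} \times \mathcal{T}$-measurable function on $\Gamma \times \Lambda$.
Then
a) $f_{[x]}$ is $\mathcal{T}$-measurable for all $x \in \Gamma$.
b) $f^{[y]}$ is $\mathcal{S}$-measurable for all $y \in \Lambda$.
c) $g(x) = \int_\Lambda f(x,y)\,d\nu(y)$ is $\mathcal{S}$-measurable.
d) $h(y) = \int_\Gamma f(x,y)\,d\mu(x)$ is $\mathcal{T}$-measurable.
e) the equalities,
$$\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu) = \int_\Gamma \Bigl[\int_\Lambda f(x,y)\,d\nu(y)\Bigr] d\mu(x) = \int_\Lambda \Bigl[\int_\Gamma f(x,y)\,d\mu(x)\Bigr] d\nu(y),$$
hold.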
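† Some authors attribute this theorem to G. Fubini.
PROOF: Parts (a) and (b) are the contents of Proposition 4.22 on page 239.
It remains to verify parts (c)-(e). To begin, we will show that if $f_1, f_2,
\ldots, f_n$ are nonnegative $\mathcal{S} \times \mathcal{T}$-measurable functions on $\Gamma \times \Lambda$ that satisfy
(c)-(e) and $c_1, c_2, \ldots, c_n$ are nonnegative real numbers, then $\sum_{k=1}^{n} c_k f_k$
satisfies (c)-(e). It suffices to verify this for $n = 2$. Let $f = c_1 f_1 + c_2 f_2$.
Then, by the linearity of the Lebesgue integral, we have
$$\int_\Lambda f(x,y)\,d\nu(y) = c_1 \int_\Lambda f_1(x,y)\,d\nu(y) + c_2 \int_\Lambda f_2(x,y)\,d\nu(y).$$
Since a linear combination of measurable functions is measurable and, by
assumption, $f_1$ and $f_2$ satisfy (c), we conclude that $g(x) = \int_\Lambda f(x,y)\,d\nu(y)$
is $\mathcal{S}$-measurable. Similarly, $h(y) = \int_\Gamma f(x,y)\,d\mu(x)$ is $\mathcal{T}$-measurable. Now
using the linearity of the Lebesgue integral and the assumption that $f_1$
and $f_2$ satisfy (e), we get
$$\begin{aligned}
\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu)
&= c_1 \int_{\Gamma \times \Lambda} f_1\,d(\mu \times \nu) + c_2 \int_{\Gamma \times \Lambda} f_2\,d(\mu \times \nu) \\
&= c_1 \int_\Gamma \Bigl[\int_\Lambda f_1\,d\nu\Bigr] d\mu + c_2 \int_\Gamma \Bigl[\int_\Lambda f_2\,d\nu\Bigr] d\mu
 = \int_\Gamma \Bigl[\int_\Lambda f\,d\nu\Bigr] d\mu.
\end{aligned}$$
Hence, the first equation in (e) holds for $f$. A similar argument shows that
the second equation in (e) holds for $f$.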
We will now bootstrap to prove the theorem. If $f = \chi_A$, where $A \in
\mathcal{S} \times \mathcal{T}$, then $g(x) = \nu(A_x)$ and $h(y) = \mu(A^y)$. Therefore, by Lemma 4.8
on page 240, (c) and (d) hold; and, by Lemma 4.9 on page 242, (e) holds.
Hence, (c)-(e) are satisfied if $f$ is the characteristic function of a set in $\mathcal{S} \times \mathcal{T}$.
It now follows immediately from the previous paragraph that (c)-(e) hold
for nonnegative simple functions.
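If $f$ is a nonnegative extended real-valued $\mathcal{S} \times \mathcal{T}$-measurable function,
then, by Proposition 4.7(a) on page 186, we can choose a sequence, $\{s_n\}_{n=1}^{\infty}$,
of nonnegative simple functions such that $s_n \uparrow f$ pointwise on $\Gamma \times \Lambda$. It
follows that $(s_n)_{[x]} \uparrow f_{[x]}$ pointwise on $\Lambda$ and, so, by the MCT,
$$g(x) = \int_\Lambda f(x,y)\,d\nu(y) = \lim_{n\to\infty} \int_\Lambda s_n(x,y)\,d\nu(y).$$
This shows that $g$ is the pointwise limit of the $\mathcal{S}$-measurable functions,
$x \mapsto \int_\Lambda s_n(x,y)\,d\nu(y)$, $n \in \mathcal{N}$. Hence, $g$ is $\mathcal{S}$-measurable. Similarly, we find
that $h(y) = \int_\Gamma f(x,y)\,d\mu(x)$ is $\mathcal{T}$-measurable. Finally, employing the MCT
twice more yields
$$\begin{aligned}
\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu)
&= \lim_{n\to\infty} \int_{\Gamma \times \Lambda} s_n\,d(\mu \times \nu)
 = \lim_{n\to\infty} \int_\Gamma \Bigl[\int_\Lambda s_n\,d\nu\Bigr] d\mu \\
&= \int_\Gamma \Bigl[\lim_{n\to\infty} \int_\Lambda s_n\,d\nu\Bigr] d\mu
 = \int_\Gamma \Bigl[\int_\Lambda f\,d\nu\Bigr] d\mu.
\end{aligned}$$
This verifies the first equation in (e) and a similar argument establishes the
second equation in (e).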
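The last bootstrapping step rests on approximating a nonnegative function from below by simple functions. The Python sketch below shows one standard choice of such a sequence, namely $s_n = \min(\lfloor 2^n f \rfloor / 2^n,\ n)$; it is an assumption that this matches the construction behind Proposition 4.7(a), and the function name `simple_approx` and the sample values are invented for the illustration. The sketch works with the value $t = f(x,y)$ at a single point, since monotonicity and convergence are pointwise statements.

```python
import math


def simple_approx(t, n):
    """Value of s_n at a point where f takes the value t, with 0 <= t <= inf.

    s_n rounds t down to a multiple of 2**-n and caps the result at n, so each
    s_n takes only finitely many values (it is a simple function).
    """
    if t >= n:
        return float(n)
    return math.floor(t * 2**n) / 2**n


# Sample values of f, including the extended value +infinity.
values = [0.0, 0.3, math.pi, 10.75, math.inf]

# For each sample value, the sequence s_1(t), s_2(t), ..., s_29(t).
table = {t: [simple_approx(t, n) for n in range(1, 30)] for t in values}

for t, seq in table.items():
    print(t, seq[:5], "...", seq[-1])
```

Where $f$ is finite, the error $f - s_n$ is less than $2^{-n}$ as soon as $n$ exceeds the value of $f$; where $f = \infty$, $s_n = n$ increases without bound.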
Tonelli’s theorem deals with iterated integrals for nonnegative mea-
surable functions. In that case, there is no issue of the existence of the
integrals occurring in the theorem. Now we will consider the iteration of
integrals for complex-valued measurable functions. To ensure the existence
of the integrals involved, an integrability condition is imposed.
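THEOREM 4.17 Fubini’s Theorem
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces. Let $f$ be
a complex-valued $\mathcal{S} \times \mathcal{T}$-measurable function on $\Gamma \times \Lambda$ such that at least
one of the quantities,
(i) $\int_{\Gamma \times \Lambda} |f|\,d(\mu \times \nu)$
(ii) $\int_\Gamma \bigl[\int_\Lambda |f(x,y)|\,d\nu(y)\bigr] d\mu(x)$
(iii) $\int_\Lambda \bigl[\int_\Gamma |f(x,y)|\,d\mu(x)\bigr] d\nu(y)$
is finite. Then
a) $f_{[x]} \in \mathcal{L}^1(\nu)$ for $\mu$-almost all $x \in \Gamma$.
b) $f^{[y]} \in \mathcal{L}^1(\mu)$ for $\nu$-almost all $y \in \Lambda$.
c) $g(x) = \int_\Lambda f(x,y)\,d\nu(y)$ is defined $\mu$-ae and is in $\mathcal{L}^1(\mu)$.
d) $h(y) = \int_\Gamma f(x,y)\,d\mu(x)$ is defined $\nu$-ae and is in $\mathcal{L}^1(\nu)$.
e) the equalities,
$$\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu) = \int_\Gamma \Bigl[\int_\Lambda f(x,y)\,d\nu(y)\Bigr] d\mu(x) = \int_\Lambda \Bigl[\int_\Gamma f(x,y)\,d\mu(x)\Bigr] d\nu(y),$$
hold.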
PROOF: By Tonelli’s theorem, the three integrals, (i), (ii), and (iii), are
equal. Since, by assumption, at least one of the integrals is finite, they all
must be finite.
By Proposition 4.22 on page 239, $f_{[x]}$ is $\mathcal{T}$-measurable for all $x \in \Gamma$
and $f^{[y]}$ is $\mathcal{S}$-measurable for all $y \in \Lambda$. Assume now that $f$ is real-valued
and write $f = f^+ - f^-$. It will be convenient to let $f_1 = f^+$ and $f_2 = f^-$.
Because $0 \le f_j \le |f|$, it follows from (ii) that $\int_\Gamma \bigl[\int_\Lambda f_j\,d\nu\bigr] d\mu < \infty$, for
$j = 1, 2$. Consequently, by Exercise 4.53 on page 191,
$$\int_\Lambda f_j(x,y)\,d\nu(y) < \infty \quad \mu\text{-ae}, \qquad j = 1, 2. \tag{4.42}$$
Let $E = \{\, x \in \Gamma : \int_\Lambda f_j(x,y)\,d\nu(y) < \infty, \text{ for } j = 1 \text{ and } 2 \,\}$. Then, for
$x \in E$, both $(f_1)_{[x]}$ and $(f_2)_{[x]}$ are in $\mathcal{L}^1(\nu)$; hence, so is $f_{[x]}$. Since, by (4.42),
$\mu(E^c) = 0$, we see that (a) holds. A similar argument establishes (b).
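Next, for $j = 1$ and $2$, we define $g_j(x) = \chi_E(x) \int_\Lambda f_j(x,y)\,d\nu(y)$. Then
$g_j$ is real-valued and, by part (c) of Tonelli’s theorem, is $\mathcal{S}$-measurable.
Moreover, it follows immediately from (ii) that $\int_\Gamma g_j\,d\mu < \infty$. Consequently,
$g_j \in \mathcal{L}^1(\mu)$ for $j = 1$ and $2$. But, then, Theorem 4.8 on page 196
implies that $g_1 - g_2 \in \mathcal{L}^1(\mu)$. However, if $x \in E$,
$$g_1(x) - g_2(x) = \int_\Lambda f_1(x,y)\,d\nu(y) - \int_\Lambda f_2(x,y)\,d\nu(y) = \int_\Lambda f(x,y)\,d\nu(y).$$
Therefore, the function, $g(x) = \int_\Lambda f(x,y)\,d\nu(y)$, is defined $\mu$-ae and is
in $\mathcal{L}^1(\mu)$. This proves that (c) holds and a similar argument verifies (d).
Employing part (e) of Tonelli’s theorem, we deduce that
$$\begin{aligned}
\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu)
&= \int_{\Gamma \times \Lambda} f_1\,d(\mu \times \nu) - \int_{\Gamma \times \Lambda} f_2\,d(\mu \times \nu) \\
&= \int_\Gamma \Bigl[\int_\Lambda f_1\,d\nu\Bigr] d\mu - \int_\Gamma \Bigl[\int_\Lambda f_2\,d\nu\Bigr] d\mu \\
&= \int_\Gamma g_1\,d\mu - \int_\Gamma g_2\,d\mu = \int_\Gamma g\,d\mu = \int_\Gamma \Bigl[\int_\Lambda f\,d\nu\Bigr] d\mu.
\end{aligned}$$
This establishes the first equation in (e) and a similar argument verifies the
second equation in (e). We have now shown that Fubini’s theorem holds if
$f$ is real-valued. The verification for complex-valued $f$ is left to the reader
as an exercise.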
Example 4.23 provides applications, illustrations, and remarks about
the Tonelli and Fubini theorems.
EXAMPLE 4.23 Illustrates the Tonelli and Fubini Theorems
a) The following is a theorem from calculus: Suppose that $f$ is a real-valued
function of two variables, defined and continuous on the rectangle,
$R = [a,b] \times [c,d]$. Then $f$ is Riemann integrable on $R$ and
$$\iint_R f(x,y)\,dx\,dy = \int_a^b \Bigl[\int_c^d f(x,y)\,dy\Bigr] dx = \int_c^d \Bigl[\int_a^b f(x,y)\,dx\Bigr] dy. \tag{4.43}$$
This result can be proved by employing Fubini’s theorem, as outlined
in Exercise 4.167.
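b) Suppose that $\{a_{mn}\}_{m,n=1}^{\infty}$ is a double sequence of nonnegative real numbers.
Then
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{mn} = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{mn}. \tag{4.44}$$
To prove (4.44), let $(\Gamma, \mathcal{S}, \mu) = (\Lambda, \mathcal{T}, \nu) = (\mathcal{N}, \mathcal{P}(\mathcal{N}), \gamma)$ where $\gamma$ is
counting measure. Then, by Exercise 4.137 on page 243, the product
measure space is $(\mathcal{N} \times \mathcal{N}, \mathcal{P}(\mathcal{N} \times \mathcal{N}), \kappa)$, where $\kappa$ is counting measure
on $\mathcal{P}(\mathcal{N} \times \mathcal{N})$. Define $f\colon \mathcal{N} \times \mathcal{N} \to \mathcal{R}$ by $f(m,n) = a_{mn}$. Then, by
Example 4.7(b) on page 189 and Tonelli’s theorem,
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{mn} = \int_{\mathcal{N}} \Bigl[\int_{\mathcal{N}} f(m,n)\,d\gamma(n)\Bigr] d\gamma(m) = \int_{\mathcal{N}} \Bigl[\int_{\mathcal{N}} f(m,n)\,d\gamma(m)\Bigr] d\gamma(n) = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{mn}.$$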
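c) This part shows that the $\sigma$-finiteness hypothesis cannot be dropped in
Tonelli’s theorem. Let $\gamma$ be counting measure, $(\Gamma, \mathcal{S}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$,
$(\Lambda, \mathcal{T}, \nu) = (\mathcal{R}, \mathcal{P}(\mathcal{R}), \gamma)$, and $D = \{\,(x,y) : x = y\,\}$. Set $f = \chi_D$. We
claim that $f$ is $\mathcal{M} \times \mathcal{P}(\mathcal{R})$-measurable or, equivalently, $D \in \mathcal{M} \times \mathcal{P}(\mathcal{R})$.
Indeed, since $\mathcal{M} \times \mathcal{P}(\mathcal{R})$ is a $\sigma$-algebra and contains all rectangles (in
the geometric sense), it also contains all open sets in $\mathcal{R}^2$ and, hence,
all closed sets in $\mathcal{R}^2$. Clearly $D$ is a closed subset of $\mathcal{R}^2$. Now, it is
easy to see that $f_{[x]}(y) = \chi_{\{x\}}(y)$ and $f^{[y]}(x) = \chi_{\{y\}}(x)$. Consequently,
we have $\int_{\mathcal{R}} f(x,y)\,d\gamma(y) = \gamma(\{x\}) = 1$ for each $x \in \mathcal{R}$ and we have
$\int_{\mathcal{R}} f(x,y)\,d\lambda(x) = \lambda(\{y\}) = 0$ for each $y \in \mathcal{R}$. Hence,
$$\int_{\mathcal{R}} \Bigl[\int_{\mathcal{R}} f(x,y)\,d\gamma(y)\Bigr] d\lambda(x) = \infty \ne 0 = \int_{\mathcal{R}} \Bigl[\int_{\mathcal{R}} f(x,y)\,d\lambda(x)\Bigr] d\gamma(y).$$
Therefore, the second equation in part (e) of Tonelli’s theorem fails.
Note that the measure space, $(\mathcal{R}, \mathcal{P}(\mathcal{R}), \gamma)$, is not $\sigma$-finite.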
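d) In this part, we show that the integrability condition cannot be omitted
from Fubini’s theorem. Let $(\Gamma, \mathcal{S}, \mu) = (\Lambda, \mathcal{T}, \nu) = (\mathcal{Z}, \mathcal{P}(\mathcal{Z}), \gamma)$, where
$\mathcal{Z}$ is the set of integers and $\gamma$ is counting measure on $\mathcal{P}(\mathcal{Z})$. Clearly,
$(\mathcal{Z}, \mathcal{P}(\mathcal{Z}), \gamma)$ is $\sigma$-finite. Let $f\colon \mathcal{Z} \times \mathcal{Z} \to \mathcal{R}$ be defined by
$$f(x,y) = \begin{cases} x, & y = x; \\ -x, & y = x + 1; \\ 0, & \text{elsewhere.} \end{cases}$$
Then, $\int_{\mathcal{Z}} f(x,y)\,d\gamma(y) = \sum_y f(x,y) = x + (-x) = 0$, for each $x \in \mathcal{Z}$,
and, hence,
$$\int_{\mathcal{Z}} \Bigl[\int_{\mathcal{Z}} f(x,y)\,d\gamma(y)\Bigr] d\gamma(x) = \sum_x 0 = 0.$$
On the other hand, $\int_{\mathcal{Z}} f(x,y)\,d\gamma(x) = \sum_x f(x,y) = -(y-1) + y = 1$,
for each $y \in \mathcal{Z}$, and, consequently,
$$\int_{\mathcal{Z}} \Bigl[\int_{\mathcal{Z}} f(x,y)\,d\gamma(x)\Bigr] d\gamma(y) = \sum_y 1 = \infty.$$
Thus, the second equation in part (e) of Fubini’s theorem fails. Note
that here none of the integrals in (i)-(iii) of Fubini’s theorem is finite. □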
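The inner sums in part (d) are finite sums, so they can be checked directly. The Python sketch below is an illustration only: it evaluates the inner sums over a finite window of integers (the window `range(-50, 51)` and the helper names are arbitrary choices made for the example), which suffices because each section of $f$ vanishes off two points. It also prints the inner sums of $|f|$, whose growth indicates why the integrability condition of Fubini’s theorem fails.

```python
def f(x, y):
    """The function of Example 4.23(d) on pairs of integers."""
    if y == x:
        return x
    if y == x + 1:
        return -x
    return 0


def sum_over_y(x, g=f):
    """Inner sum over y for fixed x; g(x, .) vanishes off {x, x + 1}."""
    return sum(g(x, y) for y in range(x - 5, x + 6))


def sum_over_x(y, g=f):
    """Inner sum over x for fixed y; g(., y) vanishes off {y - 1, y}."""
    return sum(g(x, y) for x in range(y - 5, y + 6))


def abs_f(x, y):
    """Absolute value of f, used to test the integrability condition."""
    return abs(f(x, y))


window = range(-50, 51)
row_sums = [sum_over_y(x) for x in window]             # every entry is 0
col_sums = [sum_over_x(y) for y in window]             # every entry is 1
abs_row_sums = [sum_over_y(x, abs_f) for x in window]  # entries are 2|x|

print(set(row_sums), set(col_sums), max(abs_row_sums))
```

Summing the zeros over all $x$ gives 0, while summing the ones over all $y$ diverges, in agreement with the two iterated integrals computed in the example; the inner sums of $|f|$ equal $2|x|$, which is not summable over the integers.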
The Completion of the Product Measure Space
The product of two measure spaces may not be complete, even when each
factor space is complete. For instance, we know that the measure space,
$(\mathcal{R}, \mathcal{M}, \lambda)$, is complete. But the product of that measure space with itself,
$(\mathcal{R}^2, \mathcal{M} \times \mathcal{M}, \lambda \times \lambda)$, is not complete. Indeed, let $N$ be a non-Lebesgue
measurable set, $A = N \times \{0\}$, and $B = \mathcal{R} \times \{0\}$. Then $(\lambda \times \lambda)(B) = 0$,
$A \subset B$, but $A \notin \mathcal{M} \times \mathcal{M}$ because $A^0 = N \notin \mathcal{M}$ (see Proposition 4.21
on page 238). Consequently, we see that $(\mathcal{R}^2, \mathcal{M} \times \mathcal{M}, \lambda \times \lambda)$ is not a
complete measure space.
Recall from Theorem 4.2 on page 172 that, given a measure space,
$(\Omega, \mathcal{A}, \mu)$, there is a complete measure space, $(\Omega, \overline{\mathcal{A}}, \overline{\mu})$, called the completion
of $(\Omega, \mathcal{A}, \mu)$, such that $\overline{\mathcal{A}} \supset \mathcal{A}$ and $\overline{\mu}|_{\mathcal{A}} = \mu$. It is often more
appropriate to work with the completion of a product measure space than
the product measure space itself. An important example of this occurs in
classical analysis, as we now show.
We just discovered that the measure space, $(\mathcal{R}^2, \mathcal{M} \times \mathcal{M}, \lambda \times \lambda)$, is not
complete. This can cause difficulties. For instance, as Exercise 4.162 reveals,
a function can be Riemann integrable over a set $D \subset \mathcal{R}^2$ without
being $\mathcal{M} \times \mathcal{M}$-measurable. However, this cannot happen with the completion,
$(\mathcal{R}^2, \overline{\mathcal{M} \times \mathcal{M}}, \overline{\lambda \times \lambda})$. In fact, we have the following two-dimensional
analogue of Theorem 3.23 on page 157, whose proof is left as an exercise
for the reader.
THEOREM 4.18
Suppose that $f$ is Riemann integrable on $[a,b] \times [c,d]$. Then $f$ is Lebesgue
integrable on $[a,b] \times [c,d]$ with respect to $\overline{\lambda \times \lambda}$ and
$$\int_{[a,b] \times [c,d]} f(x,y)\,d\,\overline{\lambda \times \lambda}(x,y) = \int_a^b\!\!\int_c^d f(x,y)\,dx\,dy.$$
Note: Because of Theorem 4.18, we will often denote the integral on the left
of the previous equation by the integral on the right, regardless of whether
$f$ is Riemann integrable.
We should point out that the measure space, $(\mathcal{R}^2, \overline{\mathcal{M} \times \mathcal{M}}, \overline{\lambda \times \lambda})$,
is identical to the measure space, $(\mathcal{R}^2, \mathcal{M}_2, \lambda_2)$, discussed in Section 4.6
on page 214. In other words, $\overline{\lambda \times \lambda}$ is two-dimensional Lebesgue measure
and $\overline{\mathcal{M} \times \mathcal{M}}$ is the collection of two-dimensional Lebesgue measurable sets.
The verification of these facts is considered in Exercise 4.163.
We can derive analogues of Tonelli’s theorem and Fubini’s theorem for
the completion of a product measure space provided that the factor spaces
are complete and $\sigma$-finite. We begin with two lemmas.
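LEMMA 4.10
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f$ an $\overline{\mathcal{A}}$-measurable function on $\Omega$.
Then there exists an $\mathcal{A}$-measurable function, $\phi$, on $\Omega$ such that $\phi = f$ $\overline{\mu}$-ae.
PROOF: We employ the bootstrapping technique. So, first suppose that
$f = \chi_E$, where $E \in \overline{\mathcal{A}}$. By definition, we can select sets, $A$, $B$, and $C$,
such that $E = B \cup A$, where $B, C \in \mathcal{A}$, $A \subset C$, and $\mu(C) = 0$. Now
we define the function, $\phi = \chi_B$, and note that, because $B \in \mathcal{A}$, $\phi$ is
$\mathcal{A}$-measurable. Let $D = \{\, x : \phi(x) \ne f(x) \,\}$. Then $D = E \setminus B \subset A \subset C$.
Since $\overline{\mu}(C) = \mu(C) = 0$ and $D \subset C$, it follows by completeness that $D \in \overline{\mathcal{A}}$
and $\overline{\mu}(D) = 0$. Thus, $\phi = f$ $\overline{\mu}$-ae.
Suppose now that $f$ is a simple function, say $f = \sum_{k=1}^{n} a_k \chi_{A_k}$. Select
$\mathcal{A}$-measurable functions, $\phi_k$, such that $\phi_k = \chi_{A_k}$ $\overline{\mu}$-ae, for $1 \le k \le n$. If
$D_k = \{\, x : \phi_k(x) \ne \chi_{A_k}(x) \,\}$, then the set, $D = \bigcup_{k=1}^{n} D_k$, has $\overline{\mu}$-measure
zero and the $\mathcal{A}$-measurable function, $\phi = \sum_{k=1}^{n} a_k \phi_k$, equals $f$ on $D^c$.
If $f$ is nonnegative, choose a sequence, $\{s_n\}_{n=1}^{\infty}$, of nonnegative
$\overline{\mathcal{A}}$-measurable simple functions such that $s_n \uparrow f$ pointwise on $\Omega$. Then, using
the previous paragraph, select a sequence, $\{t_n\}_{n=1}^{\infty}$, of $\mathcal{A}$-measurable functions
such that, for each $n \in \mathcal{N}$, $t_n = s_n$ $\overline{\mu}$-ae. Define $\phi = \limsup_{n\to\infty} t_n$
and note that $\phi$ is $\mathcal{A}$-measurable. We claim that $\phi = f$ $\overline{\mu}$-ae. To prove this,
let $A_n = \{\, x : t_n(x) \ne s_n(x) \,\}$ and $A = \bigcup_{n=1}^{\infty} A_n$. Then $\overline{\mu}(A) = 0$ and, if
$x \notin A$,
$$\phi(x) = \limsup_{n\to\infty} t_n(x) = \lim_{n\to\infty} s_n(x) = f(x).$$
Consequently, $\phi = f$ $\overline{\mu}$-ae. We leave the remainder of the proof as an
exercise for the reader.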
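LEMMA 4.11
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are complete, $\sigma$-finite measure spaces.
If $\xi$ is an $\overline{\mathcal{S} \times \mathcal{T}}$-measurable function such that $\xi = 0$ $\overline{\mu \times \nu}$-ae, then
a) for $\mu$-almost all $x \in \Gamma$, $\xi_{[x]} = 0$ $\nu$-ae.
b) for $\nu$-almost all $y \in \Lambda$, $\xi^{[y]} = 0$ $\mu$-ae.
PROOF: Let $E = \{\,(x,y) : \xi(x,y) \ne 0\,\}$. Then $\overline{\mu \times \nu}(E) = 0$, by assumption.
We can select sets, $A$, $B$, and $C$, such that $E = B \cup A$, where
$B, C \in \mathcal{S} \times \mathcal{T}$, $A \subset C$, and $(\mu \times \nu)(C) = 0$. Since $\overline{\mu \times \nu}(E) = 0$, it is clear
that $(\mu \times \nu)(B) = 0$. Let $D = B \cup C$. Then $D \in \mathcal{S} \times \mathcal{T}$ and $(\mu \times \nu)(D) = 0$.
Consequently, by Lemma 4.9(a) on page 242, $\int_\Gamma \nu(D_x)\,d\mu(x) = 0$. This
implies that $\nu(D_x) = 0$ $\mu$-ae.
Set $N = \{\, x : \nu(D_x) \ne 0 \,\}$ and note that $\mu(N) = 0$. If $x \notin N$, then
$\nu(D_x) = 0$ and, since $E_x \subset D_x$ and $(\Lambda, \mathcal{T}, \nu)$ is complete, it follows that
$E_x \in \mathcal{T}$ and $\nu(E_x) = 0$. But $E_x = \{\, y : \xi_{[x]}(y) \ne 0 \,\}$ and, hence, we see
that $\xi_{[x]} = 0$ $\nu$-ae. Thus, part (a) holds and a similar argument establishes
the validity of part (b).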
We are now in a position to prove the analogues of Tonelli’s theorem
and Fubini’s theorem for the completion of a product measure space. The
former theorem is presented as Theorem 4.19 and the latter theorem is left
to the reader as an exercise.
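THEOREM 4.19
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are complete, $\sigma$-finite measure spaces.
Let $f$ be a nonnegative extended real-valued $\overline{\mathcal{S} \times \mathcal{T}}$-measurable function
on $\Gamma \times \Lambda$. Then
a) $f_{[x]}$ is $\mathcal{T}$-measurable for $\mu$-almost all $x \in \Gamma$.
b) $f^{[y]}$ is $\mathcal{S}$-measurable for $\nu$-almost all $y \in \Lambda$.
c) $g(x) = \int_\Lambda f(x,y)\,d\nu(y)$ is defined $\mu$-ae and is equal to an $\mathcal{S}$-measurable
function $\mu$-ae.
d) $h(y) = \int_\Gamma f(x,y)\,d\mu(x)$ is defined $\nu$-ae and is equal to a $\mathcal{T}$-measurable
function $\nu$-ae.
e) the equalities,
$$\int_{\Gamma \times \Lambda} f(x,y)\,d\,\overline{\mu \times \nu}(x,y) = \int_\Gamma \Bigl[\int_\Lambda f(x,y)\,d\nu(y)\Bigr] d\mu(x) = \int_\Lambda \Bigl[\int_\Gamma f(x,y)\,d\mu(x)\Bigr] d\nu(y),$$
hold.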
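PROOF: Choose, by Lemma 4.10, an $\mathcal{S} \times \mathcal{T}$-measurable function, $\phi$, such
that $\phi = f$ $\overline{\mu \times \nu}$-ae. Let $E = \{\,(x,y) : \phi(x,y) \ne f(x,y)\,\}$. Arguing as
in the proof of Lemma 4.11, we can select $D \in \mathcal{S} \times \mathcal{T}$ with $E \subset D$ and
$(\mu \times \nu)(D) = 0$. If we define $h = \chi_{D^c}\phi$ and $\xi = \chi_D f$, then $h$ is $\mathcal{S} \times \mathcal{T}$-measurable,
$f = h + \xi$, and $\xi = 0$ $\overline{\mu \times \nu}$-ae.
By part (a) of Tonelli’s theorem, $h_{[x]}$ is $\mathcal{T}$-measurable for $x \in \Gamma$. Since
$f_{[x]} = h_{[x]} + \xi_{[x]}$ and $\xi = 0$ $\overline{\mu \times \nu}$-ae, we can conclude from Lemma 4.11(a)
that, for $\mu$-almost all $x \in \Gamma$, $f_{[x]} = h_{[x]}$ $\nu$-ae. Consequently, because
$(\Lambda, \mathcal{T}, \nu)$ is complete, $f_{[x]}$ is $\mathcal{T}$-measurable for $\mu$-almost all $x \in \Gamma$. This
completes the proof of part (a) and a similar argument establishes part (b).
From part (c) of Tonelli’s theorem, the function, $\rho$, defined on $\Gamma$ by
$\rho(x) = \int_\Lambda h(x,y)\,d\nu(y)$ is $\mathcal{S}$-measurable. Let $A = \{\, x : f_{[x]} = h_{[x]}\ \nu\text{-ae} \,\}$.
By the previous paragraph, $\mu(A^c) = 0$. If $x \in A$, then $f_{[x]}$ is $\mathcal{T}$-measurable
and $\int_\Lambda f(x,y)\,d\nu(y) = \int_\Lambda h(x,y)\,d\nu(y)$. This verifies part (c). Similarly,
part (d) holds.
To establish the first equation in part (e), we apply Exercise 4.166,
part (e) of Tonelli’s theorem, and Definition 4.14 on page 197:
$$\int_{\Gamma \times \Lambda} f\,d\,\overline{\mu \times \nu} = \int_{\Gamma \times \Lambda} h\,d\,\overline{\mu \times \nu} = \int_{\Gamma \times \Lambda} h\,d(\mu \times \nu) = \int_\Gamma \Bigl[\int_\Lambda h\,d\nu\Bigr] d\mu = \int_\Gamma \Bigl[\int_\Lambda f\,d\nu\Bigr] d\mu.$$
A similar argument verifies the second equation in part (e).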
The Product of More Than Two Measure Spaces
Up to this point, we have only considered product measure spaces in which
there are two factors. Using similar techniques, we can develop the theory
of product measure spaces in which there are a finite number of factors.
We present only the highlights.
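THEOREM 4.20
Suppose that $(\Omega_k, \mathcal{A}_k, \mu_k)$, $1 \le k \le n$, are $\sigma$-finite measure spaces. Let
$\times_{k=1}^{n} \Omega_k = \{\,(x_1, \ldots, x_n) : x_k \in \Omega_k, \text{ for } 1 \le k \le n\,\}$, the Cartesian product
of $\Omega_1, \ldots, \Omega_n$. Also, let $\times_{k=1}^{n} \mathcal{A}_k$ denote the $\sigma$-algebra generated by
the $n$-dimensional measurable rectangles. Then there is a unique measure,
$\times_{k=1}^{n} \mu_k$, on $\times_{k=1}^{n} \mathcal{A}_k$ such that $\bigl(\times_{k=1}^{n} \mu_k\bigr)\bigl(\times_{k=1}^{n} A_k\bigr) = \prod_{k=1}^{n} \mu_k(A_k)$
for all $n$-dimensional measurable rectangles.
The $\sigma$-algebra, $\times_{k=1}^{n} \mathcal{A}_k$, is referred to as the product $\sigma$-algebra
of $\mathcal{A}_1, \ldots, \mathcal{A}_n$; the measure, $\times_{k=1}^{n} \mu_k$, as the product measure of
$\mu_1, \ldots, \mu_n$; and the measure space, $\bigl(\times_{k=1}^{n} \Omega_k, \times_{k=1}^{n} \mathcal{A}_k, \times_{k=1}^{n} \mu_k\bigr)$, as
the product measure space of $(\Omega_1, \mathcal{A}_1, \mu_1), \ldots, (\Omega_n, \mathcal{A}_n, \mu_n)$.
Tonelli’s theorem and Fubini’s theorem generalize to $n$-dimensional
product spaces. In particular, the integral of a nonnegative $\times_{k=1}^{n} \mathcal{A}_k$-measurable
function or a function in $\mathcal{L}^1\bigl(\times_{k=1}^{n} \mu_k\bigr)$ can be evaluated by
forming the iterated integrals in any order. For example, if $n = 3$ and
$f \in \mathcal{L}^1(\mu_1 \times \mu_2 \times \mu_3)$, then
$$\int_{\Omega_1 \times \Omega_2 \times \Omega_3} f\,d(\mu_1 \times \mu_2 \times \mu_3) = \int_{\Omega_{i_1}} \Bigl[\int_{\Omega_{i_2}} \Bigl[\int_{\Omega_{i_3}} f\,d\mu_{i_3}\Bigr] d\mu_{i_2}\Bigr] d\mu_{i_1}$$
for each permutation, $i_1, i_2, i_3$, of 1, 2, 3.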
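The claim that the iterated integrals can be formed in any order is easy to test in the simplest setting. The following Python sketch assumes three finite spaces carrying hypothetical point masses and an arbitrary real-valued test function (the weights, the function `f`, and the helper `iterated_integral` are all invented for the illustration). It computes the integral with respect to the product measure directly, as a weighted triple sum, and compares it with the iterated integral for each of the six orders of integration.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical point masses on three finite spaces Omega_1, Omega_2, Omega_3.
weights = [
    {0: Fraction(1, 2), 1: Fraction(2)},
    {0: Fraction(1), 1: Fraction(1, 3), 2: Fraction(5)},
    {0: Fraction(3, 4), 1: Fraction(1, 4)},
]


def f(x1, x2, x3):
    """An arbitrary real-valued test function; it takes both signs."""
    return Fraction(x1 - 2 * x2 + x3 * x3 + x1 * x2 * x3)


def iterated_integral(order):
    """Integrate f one variable at a time; order[0] is the outermost variable."""

    def integrate(remaining, point):
        if not remaining:
            return f(point[0], point[1], point[2])
        k, rest = remaining[0], remaining[1:]
        return sum(
            (weights[k][x] * integrate(rest, {**point, k: x}) for x in weights[k]),
            Fraction(0),
        )

    return integrate(tuple(order), {})


# Integral with respect to the product measure, as a weighted triple sum.
direct = sum(
    (
        weights[0][a] * weights[1][b] * weights[2][c] * f(a, b, c)
        for a, b, c in product(weights[0], weights[1], weights[2])
    ),
    Fraction(0),
)

results = {order: iterated_integral(order) for order in permutations(range(3))}
for order, value in results.items():
    print(order, value, value == direct)
```

On finite spaces every function is integrable, so agreement here reflects only the rearrangement of a finite sum; the content of the theorem is that the same conclusion survives for $\sigma$-finite spaces under the stated integrability hypothesis.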
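EXAMPLE 4.24 Illustrates the Product of Finitely Many Measure Spaces
a) Let $\mathcal{B}_n$ denote the $\sigma$-algebra generated by the open sets of $\mathcal{R}^n$. Members
of $\mathcal{B}_n$ are called $n$-dimensional Borel sets. It is not too difficult to
show that $\mathcal{B}_n = \mathcal{B} \times \cdots \times \mathcal{B}$.
b) It can be shown that $(\mathcal{R}^n, \overline{\mathcal{M} \times \cdots \times \mathcal{M}}, \overline{\lambda \times \cdots \times \lambda}) = (\mathcal{R}^n, \mathcal{M}_n, \lambda_n)$.
In other words, $\overline{\lambda \times \cdots \times \lambda}$ is $n$-dimensional Lebesgue measure and
$\overline{\mathcal{M} \times \cdots \times \mathcal{M}}$ is the collection of $n$-dimensional Lebesgue measurable
sets. We should also point out that Theorem 4.18 can be generalized
to arbitrary dimensions. That is, if $f$ is Riemann integrable on
$\times_{k=1}^{n} [a_k, b_k]$, then $f$ is Lebesgue integrable on $\times_{k=1}^{n} [a_k, b_k]$ with respect
to $\lambda_n$ and
$$\int_{\times_{k=1}^{n} [a_k, b_k]} f\,d\lambda_n = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.$$
The proof of this fact is left to the reader as an exercise.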
EXERCISES 4.9
Note: In some of the exercises, you will need the following two facts: (1) Let
$D \subset \mathcal{R}^2$. A subset of $D$ is open in $D$ if and only if it can be expressed as
the intersection of $D$ with an open subset of $\mathcal{R}^2$. (2) A function, $f\colon D \to \mathcal{R}$, is
continuous if and only if $f^{-1}(O)$ is open in $D$ for each open set $O \subset \mathcal{R}$.
4.152  Complete the proof of Fubini’s theorem; that is, assuming that the theorem
holds for real-valued functions, prove that it holds for complex-valued
functions. Hint: Write $f = \Re f + i\,\Im f$ and use Proposition 4.9(b), found
on page 194.
4.153  Suppose that $f$ is a Lebesgue measurable function on $\mathcal{R}$ such that
$$\int_{-\infty}^{\infty} |f(x)|\,dx < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} \frac{|f(x)|}{|x|}\,dx < \infty.$$
Define $G(x,y) = f(x)/(x^2 + y^2)$, if $(x,y) \ne (0,0)$, and zero otherwise.
a) Show that $G$ is $\mathcal{M} \times \mathcal{M}$-measurable.
b) Prove that $G \in \mathcal{L}^1(\lambda \times \lambda)$ and that
$$\int_{\mathcal{R}^2} G\,d(\lambda \times \lambda) = \pi \int_{-\infty}^{\infty} \frac{f(x)}{|x|}\,dx.$$
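4.154  Suppose that $\{a_{mn}\}_{m,n=1}^{\infty}$ is a double sequence of complex numbers for
which at least one of the quantities, $\sum_m \bigl(\sum_n |a_{mn}|\bigr)$, $\sum_n \bigl(\sum_m |a_{mn}|\bigr)$,
is finite. Then both quantities are finite and
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{mn} = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{mn}.$$
Hint: Use Exercise 4.73 on page 201.
4.155  Let $(\Gamma, \mathcal{S}, \mu) = (\Lambda, \mathcal{T}, \nu) = ([0,1], \mathcal{M}_{[0,1]}, \lambda_{[0,1]})$ and suppose $f \in \mathcal{L}^1(\mu \times \nu)$.
Prove that
4.156  Let $(\Gamma, \mathcal{S}, \mu) = (\Lambda, \mathcal{T}, \nu) = ([-1,1], \mathcal{M}_{[-1,1]}, \lambda_{[-1,1]})$ and $f$ be a continuous
function on $\Gamma \times \Lambda$. Prove that $f \in \mathcal{L}^1(\lambda_{[-1,1]} \times \lambda_{[-1,1]})$ and that, if
$D = \{\,(x,y) : x^2 + y^2 < 1\,\}$, then
$$\int_D f\,d(\lambda_{[-1,1]} \times \lambda_{[-1,1]}) = \int_{-1}^{1} \Bigl[\int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} f(x,y)\,dy\Bigr] dx.$$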
★4.157  This exercise introduces the convolution of two Borel measurable functions.
For convenience, we will write $\lambda|_{\mathcal{B}}$ simply as $\lambda$.
a) Let $f$ be a Borel measurable function on $\mathcal{R}$ and define $F$ on $\mathcal{R}^2$ by
$F(x,y) = f(x - y)$. Show that $F$ is $\mathcal{B}_2$-measurable. Hint: $(x,y) \mapsto x - y$
is continuous.
b) Suppose that $\xi$ is a Borel measurable function on $\mathcal{R}$. Prove that the
function, $\phi$, on $\mathcal{R}^2$ defined by $\phi(x,y) = \xi(y)$ is $\mathcal{B}_2$-measurable.
c) Suppose that $h$ is a nonnegative Borel measurable function. Show that
$$\int_{\mathcal{R}} h(x - y)\,d\lambda(x) = \int_{\mathcal{R}} h(x)\,d\lambda(x)$$
for each $y \in \mathcal{R}$. Hint: Bootstrap.
d) Suppose that $f, g \in \mathcal{L}^1(\mathcal{R}, \mathcal{B}, \lambda)$. Prove that the function, $f * g$, defined
on $\mathcal{R}$ by
$$(f * g)(x) = \int_{\mathcal{R}} f(x - y)g(y)\,d\lambda(y)$$
exists for $\lambda$-almost all $x \in \mathcal{R}$ and is in $\mathcal{L}^1(\mathcal{R}, \mathcal{B}, \lambda)$. The function, $f * g$,
is called the convolution of $f$ with $g$.
★4.158  This exercise introduces the convolution of two $\sigma$-finite Borel measures, $\mu$
and $\nu$.
a) If $E \in \mathcal{B}$, show that the set, $A = \{\,(x,y) : x + y \in E\,\}$, is a two-dimensional
Borel set, that is, is in $\mathcal{B}_2$.
b) Show that the functions, $g(x) = \nu(E - x)$ and $h(y) = \mu(E - y)$, are
Borel measurable.
c) Verify that
$$(\mu \times \nu)(A) = \int_{\mathcal{R}} \nu(E - x)\,d\mu(x) = \int_{\mathcal{R}} \mu(E - y)\,d\nu(y),$$
where $A$ is the set defined in part (a).
d) For $E \in \mathcal{B}$, define
$$(\mu * \nu)(E) = \int_{\mathcal{R}} \mu(E - y)\,d\nu(y).$$
Show that $\mu * \nu$ is a Borel measure. The measure $\mu * \nu$ is called the
convolution of $\mu$ with $\nu$. Part (c) shows that $\mu * \nu = \nu * \mu$.
e) If $f \in \mathcal{L}^1(\mathcal{R}, \mathcal{B}, \mu * \nu)$, prove that
$$\int_{\mathcal{R}^2} f(x + y)\,d(\mu \times \nu)(x,y) = \int_{\mathcal{R}} f(t)\,d(\mu * \nu)(t).$$
Hint: Bootstrap.
4.159  Let $\mu$ be a finite Borel measure. Define $\hat{\mu}\colon \mathcal{R} \to \mathbb{C}$ by
$$\hat{\mu}(s) = \int_{\mathcal{R}} e^{its}\,d\mu(t).$$
The function $\hat{\mu}$ is called the Fourier-Stieltjes transform of $\mu$.
a) Show that $\hat{\mu}$ is well-defined, that is, the integral exists for each $s \in \mathcal{R}$.
b) Let $\nu$ be a finite Borel measure. Prove that
$$\widehat{\mu * \nu}(s) = \hat{\mu}(s)\hat{\nu}(s),$$
where $\mu * \nu$ is the convolution of $\mu$ and $\nu$ as defined in Exercise 4.158.
Hint: Use Exercise 4.158(e) and Fubini’s theorem.
4.160  Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are two $\sigma$-finite measure spaces. Let
$\mathcal{U}$ be the semialgebra of measurable rectangles and $\iota$ be defined on $\mathcal{U}$ by
$\iota(S \times T) = \mu(S)\nu(T)$. Furthermore, let $(\Gamma \times \Lambda, \mathcal{A}, \tau)$ be the complete measure
space induced by $\mathcal{U}$ and $\iota$, as in Theorem 4.11. Prove that $\mathcal{A} = \overline{\mathcal{S} \times \mathcal{T}}$
and $\tau = \overline{\mu \times \nu}$.
4.161  Provide an example in which the measure space, $(\Gamma \times \Lambda, \mathcal{S} \times \mathcal{T}, \mu \times \nu)$, is
complete.
4.162  Construct a function $f$ on $\mathcal{R}^2$ that is Riemann integrable on $[0,1] \times [0,1]$
but is not $\mathcal{M} \times \mathcal{M}$-measurable. Hint: Do something with a non-Lebesgue
measurable set and use the fact that a function is Riemann integrable on
$[0,1] \times [0,1]$ if and only if the set of its points of discontinuity has two-dimensional
Lebesgue measure zero.
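4.163  Prove that $(\mathcal{R}^2, \overline{\mathcal{M} \times \mathcal{M}}, \overline{\lambda \times \lambda}) = (\mathcal{R}^2, \mathcal{M}_2, \lambda_2)$. Use the following steps:
a) Let $\mathcal{I}_2$ denote the collection of all sets of the form $I \times J$, where $I$ and $J$
are intervals of $\mathcal{R}$. Show that $\mathcal{M} \times \mathcal{M} \supset \sigma(\mathcal{I}_2)$ and that $\lambda \times \lambda$ agrees
with $\lambda_2$ on $\mathcal{I}_2$.
b) Use part (a) to conclude that $\overline{\mathcal{M} \times \mathcal{M}} \supset \mathcal{M}_2$ and $\overline{\lambda \times \lambda}|_{\mathcal{M}_2} = \lambda_2$. Hint:
Employ Theorem 4.12, Exercise 4.114 on page 220, and Exercise 4.17
on page 174.
c) Show that $\mathcal{M} \times \mathcal{M} \subset \mathcal{M}_2$ and that $\lambda_2$ agrees with $\lambda \times \lambda$ on $\mathcal{M} \times \mathcal{M}$.
Hint: First show that $B \times \mathcal{R} \in \mathcal{M}_2$ for all $B \in \mathcal{B}$ and then that
$E \times \mathcal{R} \in \mathcal{M}_2$ for all $E \in \mathcal{M}$.
d) Use part (c) to conclude that $\mathcal{M}_2 \supset \overline{\mathcal{M} \times \mathcal{M}}$ and $\lambda_2|_{\overline{\mathcal{M} \times \mathcal{M}}} = \overline{\lambda \times \lambda}$.
e) Deduce the required result.
4.164  Generalize the previous exercise to $n$ dimensions.
4.165  Complete the proof of Lemma 4.10 on page 251.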
4.166  Suppose that $(\Omega, \mathcal{A}, \mu)$ is a measure space and that $f$ is an $\mathcal{A}$-measurable
function on $\Omega$. Prove that
$$\int_\Omega f\,d\overline{\mu} = \int_\Omega f\,d\mu.$$
Hint: Bootstrap.
4.167  Establish the calculus theorem stated in Example 4.23(a) on page 248 by
proceeding as follows:
a) Define $h$ on $\mathcal{R}^2$ by $h(x,y) = f(x,y)$, if $(x,y) \in R$, and zero otherwise.
Show that $h$ is $\mathcal{M} \times \mathcal{M}$-measurable and is in $\mathcal{L}^1(\lambda \times \lambda)$.
b) Prove that
$$\iint_R f(x,y)\,dx\,dy = \int_R f\,d(\lambda \times \lambda),$$
where the integral on the left is a double Riemann integral. Hint: Use
Exercise 4.166.
c) Deduce (4.43) on page 248.
d) Does (4.43) remain valid if $f$ is assumed only to be Riemann integrable
on $[a,b] \times [c,d]$? Explain.
4.168  State and prove the analogue of Fubini’s theorem for the completion of a
product measure space.
4.169  Integration by parts: In this exercise, we will develop an integration
by parts formula for Lebesgue-Stieltjes integrals. We proceed using the
following steps:
a) Let $\mu$ be a finite Borel measure on $\mathcal{R}$ with distribution function, $F_\mu$.
Define $F_\mu(x-) = \sup\{\,F_\mu(t) : t < x\,\}$. Show that $F_\mu(x-) = \mu((-\infty, x))$
for each $x \in \mathcal{R}$.
b) Use part (a) to deduce that, for each $x \in \mathcal{R}$, $\mu(\{x\}) = F_\mu(x) - F_\mu(x-)$.
[Thus, $F_\mu$ is continuous at $x$ if and only if $x$ is not an atom of $\mu$.]
c) Let $\nu$ be a finite Borel measure having distribution function, $F_\nu$. Prove
that, for $a, b \in \mathcal{R}$,
$$\int_{(a,b]} F_\mu(x)\,d\nu(x) + \int_{(a,b]} F_\nu(x)\,d\mu(x) = F_\mu(b)F_\nu(b) - F_\mu(a)F_\nu(a) + \int_{(a,b]} \bigl(F_\mu(x) - F_\mu(x-)\bigr)\,d\nu(x).$$
Hint: Apply Tonelli’s theorem to show that
$$\int_{(a,b]} F_\mu(y)\,d\nu(y) + \int_{(a,b]} F_\nu(x)\,d\mu(x) = \int_{\mathcal{R}^2} H(x,y)\,d(\mu \times \nu)(x,y),$$
where $H(x,y) = \chi_{(-\infty,y]}(x)\chi_{(a,b]}(y) + \chi_{(-\infty,x]}(y)\chi_{(a,b]}(x)$. Then show
$$H(x,y) = \chi_{\{x\}}(y)\chi_{(a,b]}(y) + \chi_{(-\infty,a]}(x)\chi_{(a,b]}(y) + \chi_{(a,b]}(x)\chi_{(-\infty,a]}(y) + \chi_{(a,b]}(x)\chi_{(a,b]}(y).$$
4.170  Let $(\Omega_k, \mathcal{A}_k)$, $1 \le k \le n$, be measurable spaces. Denote by $\mathcal{U}$ the collection
of $n$-dimensional measurable rectangles; that is,
$$\mathcal{U} = \Bigl\{\, \times_{k=1}^{n} A_k : A_k \in \mathcal{A}_k,\ 1 \le k \le n \,\Bigr\}.$$
Prove that $\mathcal{U}$ is a semialgebra.
Exercises 4-^71-4-175 should be completed by all readers who plan to cover the
probability material in Chapter 5.
+4.171 Denote by Bn the smallest cr-algebra of subsets of 7Zn that contains all
open sets of 7dn.
a)	Show that Bn = В x • • • x B.
b)	A measure on Bn is called an n-dimensional Borel measure. Sup-
pose that /i and у are two finite n-dimensional Borel measures such
that X ^=1 Bk) = i/( X ^=1 Bk) for all Bi, B2, ..., Bn e B. Prove
that p = y\ that is, /z(B) = i/(B) for all В e Bn-
4.172 Let $\mathcal{I}_n$ denote the collection of all n-dimensional intervals in $\mathcal{R}^n$; that is,
all sets of the form $I_1 \times I_2 \times \cdots \times I_n$ where $I_j \in \mathcal{I}$ for $1 \le j \le n$.
a) Show that the $\sigma$-algebra generated by $\mathcal{I}_n$ is $\mathcal{B}_n$; that is, $\sigma(\mathcal{I}_n) = \mathcal{B}_n$.
Hint: Use Exercise 4.171(a).
b) Let $\mu$ and $\nu$ be two n-dimensional Borel measures such that $\mu(I) < \infty$
for all bounded n-dimensional intervals and $\mu(I) = \nu(I)$ for all $I \in \mathcal{I}_n$.
Prove that $\mu = \nu$.
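4.173 Let $\mathcal{J}$ denote the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ and $(c, \infty)$,
where $-\infty \le a < b < \infty$ and $-\infty \le c < \infty$. Also, let $\mathcal{J}_n$ denote the
collection of all subsets of $\mathcal{R}^n$ of the form $J_1 \times J_2 \times \cdots \times J_n$ where $J_k \in \mathcal{J}$
for $1 \le k \le n$. Prove that $\mathcal{J}_n$ is a semialgebra and that the $\sigma$-algebra
generated by $\mathcal{J}_n$ is $\mathcal{B}_n$.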
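★4.174 Suppose that $\mu$ and $\nu$ are two finite n-dimensional Borel measures such
that $\mu\bigl((-\infty, x_1] \times \cdots \times (-\infty, x_n]\bigr) = \nu\bigl((-\infty, x_1] \times \cdots \times (-\infty, x_n]\bigr)$ for all $x_1, x_2, \ldots, x_n \in \mathcal{R}$.
Prove that $\mu = \nu$. Hint: It suffices to prove that $\mu = \nu$ on $\mathcal{J}_n$.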
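★4.175 Let $\mu_1, \ldots, \mu_n$ be finite Borel measures and $\mu$ an n-dimensional Borel
measure. Suppose that $\mu\bigl((-\infty, x_1] \times \cdots \times (-\infty, x_n]\bigr) = \prod_{k=1}^{n} \mu_k\bigl((-\infty, x_k]\bigr)$ for all
$x_1, x_2, \ldots, x_n \in \mathcal{R}$. Prove that $\mu = \mu_1 \times \cdots \times \mu_n$.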
Andrei Nikolaevich Kolmogorov
(1903-1987)
Andrei Kolmogorov was born on April 25, 1903,
in Tambov, Russia. He attended Moscow State
University, graduating from there in 1925.
Kolmogorov's contributions to mathematics
encompass a formidable range of subjects. A
partial listing includes functions of a real vari-
able, trigonometric series, probability theory,
theory of algorithms, functional analysis, topol-
ogy, dynamical systems, information theory, and classical mechanics.
Kolmogorov revolutionized probability theory. He introduced the mod-
ern axiomatic approach to probability and proved many of the fundamen-
tal theorems that are a consequence of that approach. He also developed
two systems of partial differential equations that play a crucial role in the
theory of Markov processes.
In addition to his work in higher mathematics, Kolmogorov was inter-
ested in the mathematical education of schoolchildren. He was chairman
of the Commission for Mathematical Education under the Presidium of
the Academy of Sciences of the U.S.S.R. During that time, he was in-
strumental in the development of a new training program which was
incorporated into the Soviet schools.
Kolmogorov wrote many articles and books. The book
Introductory Real Analysis, co-authored with S. V. Fomin, provides, in
the bibliography, a listing of some of his publications. Kolmogorov was a
member of the faculty at Moscow State University until his death in 1987.
Elements of Probability
Probability is the mathematical discipline dealing with the analysis of ran-
dom phenomena. Intuitively, the probability of an event is a measure of
the likelihood of its occurrence — a probability near 0 indicates that the
event is unlikely to occur, whereas a probability near 1 suggests that the
event is likely to occur.
The origins of the theory of probability are usually taken to be in the
middle of the seventeenth century, although the basic concepts of proba-
bility date back to before the birth of Christ. With the development of the
natural sciences in the early 1900s, it became increasingly important for
probability to have a formal mathematical framework similar to that found
in other branches of mathematics such as geometry and abstract algebra.
Measure theory supplied the required framework.
In this chapter, we will introduce the elements of probability theory
based on the axiomatic development by Andrei Nikolaevich Kolmogorov.
The foundations will be presented in Sections 5.1-5.3. Then, as a first
application, we will examine several theorems, known collectively as laws
of large numbers, which comprise some of the most important results in
probability. We will return to further explore probability theory in other
chapters of the text.
5.1 THE MATHEMATICAL MODEL FOR PROBABILITY
In this section, we will develop the mathematical model for probability
based on the theory of measure discussed in Chapter 4. However, before
we begin with that development, it will be useful for motivational purposes
to provide an interpretation of the meaning of probability.
To that end, let us think of an event as some specified result that may
or may not occur when an experiment is performed; for example, a head
comes up (the event) when a coin is tossed (the experiment). The usual
interpretation of probability is the relative-frequency interpretation,
which construes the probability of an event to be the relative frequency of
its occurrence in a large number of repetitions of the experiment.
More formally, let E be an event and P(E) its probability. For n repe-
titions of the experiment, let n(E) denote the number of times that event E
occurs. The relative-frequency interpretation is that, for large n, the pro-
portion of times that event E occurs in the n repetitions of the experiment
will be approximately equal to the probability that event E occurs on any
particular trial:
\[ \frac{n(E)}{n} \approx P(E) \quad \text{for large } n. \tag{5.1} \]
To illustrate, consider the experiment of a single toss of a balanced
coin. Because the coin is balanced, we reason that there is a 50-50 chance
that the coin will come up heads (i.e., will land with heads facing up).
Thus, we attribute probability 0.5 to that event. The relative-frequency
interpretation is that in a large number of tosses of the coin, heads will come
up about half the time. We used a computer to perform two simulations
of tossing a balanced coin 100 times. The results are displayed in Figs. 5.1
and 5.2 and seem to corroborate the relative-frequency interpretation.
FIGURE 5.1
FIGURE 5.2
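The kind of simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `relative_frequency` and the use of Python's pseudorandom generator are ours, not the authors', and the output will not reproduce Figs. 5.1 and 5.2.

```python
import random

def relative_frequency(num_tosses, seed=None):
    """Simulate num_tosses tosses of a balanced coin and return n(E)/n,
    where E is the event 'heads' (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    # Each toss comes up heads with probability 0.5.
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses

if __name__ == "__main__":
    # The relative frequency should settle near 0.5 as n grows, as (5.1) suggests.
    for n in (100, 10_000, 100_000):
        print(n, relative_frequency(n, seed=1))
```

A fixed `seed` makes a run repeatable; different seeds give different sample paths, which is why two simulations of 100 tosses generally look different.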
We should emphasize that all attempts to use (5.1) as a definition of
probability have failed. Nonetheless, the relative-frequency interpretation
is invaluable for motivational purposes in the axiomatic development. Fur-
thermore, we shall see that once the axioms of probability are in place, a
mathematically precise version of (5.1) can be proved as a theorem.
Probability Spaces
Consider now an experiment whose outcome cannot be predicted with cer-
tainty beforehand. Such an experiment is called a random experiment.
The set of possible outcomes of the experiment is called the sample space
and is usually denoted by the English letter, $S$, or the Greek letter, $\Omega$; we
will use the latter notation.¹ The possible outcomes themselves are denoted
generically by the Greek letter, $\omega$.
Actually, we will permit as a sample space any set containing all the
possible outcomes of the experiment. This is because, a priori, it is some-
times difficult to know precisely the possible outcomes of an experiment.
For instance, consider the experiment of rolling a die once and observing
the number of dots on the face pointing up. The most natural choice for
the sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$. However, it is conceivable, because
of, say, an imperfection in the die, that four would never come up. The im-
portant factor in the choice of a sample space is that all possible outcomes
are included as elements, not that all elements are possible outcomes.
Associated with a random experiment is a collection of events, usually
called the event class, which we will denote by $\mathcal{A}$. The assumption is that
any specified event will either occur or not occur when the experiment is
performed. Each event, $E \in \mathcal{A}$, can be considered a subset of the sample
space; namely, the collection of outcomes that satisfy the conditions for the
occurrence of $E$. Using this identification between events and sets, we see
that an event, $E$, occurs if and only if the outcome of the experiment, $\omega$,
is a member of $E$, that is, $\omega \in E$.
We should point out that the empty set, $\emptyset$, corresponds to an event that
cannot occur and is called the impossible event. Two events, $A$ and $B$,
are called mutually exclusive if their joint occurrence is impossible; in
other words, if $A$ and $B$ are disjoint. More generally, if each pair of events
among a collection of events is mutually exclusive, then we say that the
events in the collection are pairwise mutually exclusive.
1 The term outcome space is more descriptive than “sample space,” but we will
adhere to the traditional terminology.
EXAMPLE 5.1 Illustrates Sample Spaces and Events
a)	Consider the experiment of tossing a coin three times. A sample space
for the experiment is $\Omega = \{$HHH, HHT, HTH, HTT, THH, THT, TTH, TTT$\}$
where, for instance, HTT denotes the outcome of a head on the first
toss and tails on the second and third tosses. Then, for instance, the
event, E, that the first two tosses are heads consists of the two out-
comes, HHH and HHT. In other words, E = {HHH, HHT}. Now, let F
be the event that exactly two of the three tosses are tails. Clearly, it
is not possible for both $E$ and $F$ to occur when the experiment is performed;
hence, $E$ and $F$ are mutually exclusive. We can see this fact set
theoretically by noting that $F = \{$HTT, THT, TTH$\}$ and, so, $E \cap F = \emptyset$.
b)	Suppose that, starting at 6:00 PM, we observe the elapsed time, in hours,
until the first patient arrives at a certain emergency room. For this
experiment, we can take the sample space to be the nonnegative real
numbers: $\Omega = [0, \infty)$. Then, for instance, the event, $E$, that the first
patient arrives between 6:15 and 6:30 PM, inclusive, consists of all real
numbers between 1/4 and 1/2, inclusive; that is, E = [1/4,1/2].	□
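Part (a) of the example can be checked mechanically. The following sketch, with variable names of our own choosing, enumerates the sample space for three coin tosses and confirms that the two events are disjoint.

```python
from itertools import product

# Sample space for three tosses: all strings of length 3 over {H, T}.
omega = {"".join(t) for t in product("HT", repeat=3)}

# E: the first two tosses are heads.
E = {w for w in omega if w[:2] == "HH"}
# F: exactly two of the three tosses are tails.
F = {w for w in omega if w.count("T") == 2}

print(sorted(E))       # ['HHH', 'HHT']
print(sorted(F))       # ['HTT', 'THT', 'TTH']
print(E.isdisjoint(F)) # True: E and F are mutually exclusive
```

Representing events as Python sets mirrors the identification of events with subsets of the sample space, so intersections and unions of events become ordinary set operations.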
Next we need to decide on what properties an event class, $\mathcal{A}$, must
have. First of all, if $E \in \mathcal{A}$ (i.e., $E$ is an event), then we can speak of
the occurrence or nonoccurrence of $E$. However, the nonoccurrence of $E$
is equivalent to the occurrence of the complement of $E$. Hence, if $E \in \mathcal{A}$,
then we require that $E^c \in \mathcal{A}$.
Suppose that $A, B \in \mathcal{A}$. Then we can speak of the occurrence of each
of the two events individually. Hence, it should be meaningful to speak of
the occurrence of at least one of the two events. But, the occurrence of
at least one of $A$ and $B$ is equivalent to the occurrence of the union of $A$
and $B$. Thus, if $A, B \in \mathcal{A}$, then we require that $A \cup B \in \mathcal{A}$; that is,
$\mathcal{A}$ should be closed under finite unions. For mathematical reasons, we will
impose the stronger requirement that $\mathcal{A}$ be closed under countable unions.
To summarize, we see that the event class, $\mathcal{A}$, should be closed under
complementation and countable unions. In other words, $\mathcal{A}$ should be a
$\sigma$-algebra of subsets of $\Omega$.
We now turn our attention to probability. In the axiomatic treatment
of probability, we assume that to each event, E, there corresponds a num-
ber, P(E), representing the probability that event E occurs. Thus, we can
think of P as a set function defined on the collection, A, of events. We
will employ the relative-frequency interpretation of probability in order to
delineate the properties required of P.
So, assume that the experiment is repeated a large number, $n$, of times.
Then, by (5.1), $P(E) \approx n(E)/n$ for each event, $E$. Clearly, $n(E)/n \ge 0$
and, consequently, we require that $P(E) \ge 0$ for each event, $E$. In other
words, probabilities should be nonnegative numbers, an obvious restriction.
Note also that since $\Omega$ contains all of the possible outcomes of the
experiment, it must occur every time the experiment is performed. Hence,
$n(\Omega)/n = 1$, which means that we should have $P(\Omega) = 1$, another obvious
condition. Further, since $\emptyset$ represents an impossibility, $n(\emptyset)/n = 0$, which
means that we should have $P(\emptyset) = 0$, again an obvious condition.
Finally, suppose that $A$ and $B$ are mutually exclusive (disjoint) events.
Then, we have $n(A \cup B) = n(A) + n(B)$ and, consequently, by (5.1),
\[ P(A \cup B) \approx \frac{n(A \cup B)}{n} = \frac{n(A) + n(B)}{n} = \frac{n(A)}{n} + \frac{n(B)}{n} \approx P(A) + P(B). \]
Hence, we require $P$ to be finitely additive. Again, for mathematical reasons,
we will impose the stronger condition of countable additivity. This
and the previous paragraph indicate that $P$ should be a probability measure
on the $\sigma$-algebra, $\mathcal{A}$, of events.
In summary, the mathematical model for a random experiment consists
of a set, $\Omega$, containing the possible outcomes of the experiment; a $\sigma$-algebra,
$\mathcal{A}$, of subsets of $\Omega$, representing the collection of events; and a probability
measure, $P$, on $\mathcal{A}$, where, for each $E \in \mathcal{A}$, $P(E)$ is interpreted as the
probability that event $E$ occurs. As we learned, in Example 4.1(h) on
page 169, the triple, $(\Omega, \mathcal{A}, P)$, is called a probability space.
DEFINITION 5.1 Probability Space
A probability space is a triple, $(\Omega, \mathcal{A}, P)$, where $\Omega$ is a set, $\mathcal{A}$ is a
$\sigma$-algebra of subsets of $\Omega$, and $P$ is a probability measure on $\mathcal{A}$.
The following examples illustrate the discussion of probability spaces.
We leave any remaining details as exercises for the reader.
EXAMPLE 5.2 Illustrates Definition 5.1
a)	Refer to Example 5.1(a) on page 264. In this case, we take $\mathcal{A} = \mathcal{P}(\Omega)$
so that every subset of $\Omega$ is an event. If the coin is balanced, then, by
symmetry, each possible outcome should be equally likely, implying that
each has probability 1/8. This, in turn, implies that the appropriate
probability measure is $P = \gamma/8$, where $\gamma$ is counting measure on $\mathcal{P}(\Omega)$.
In other words, for each $E \in \mathcal{A}$, $P(E) = N(E)/8$, where $N(E)$ denotes
the number of elements of $E$.
b)	Suppose that $\Omega$ is a countable sample space, that is, the experiment has
either a finite or countably infinite number of possible outcomes, say
$\omega_1, \omega_2, \ldots$. For a countable sample space, we always take $\mathcal{A} = \mathcal{P}(\Omega)$.
Let $p_n = P(\{\omega_n\})$. Then $P = \sum_n p_n \delta_{\omega_n}$; that is, for $E \in \mathcal{A}$,
\[ P(E) = \sum_{\omega_n \in E} p_n. \]
c)	As a special case of part (b), suppose that $\Omega$ is finite and that each
possible outcome is equally likely. Then we must have $p_n = 1/N(\Omega)$ for
$n = 1, 2, \ldots, N(\Omega)$ and, moreover,
\[ P(E) = \frac{N(E)}{N(\Omega)} \]
for each event, $E$. This probability model is often referred to as the
discrete uniform model. It can be used as the mathematical model
for selecting a point at random from the finite set $\Omega$.
d)	Suppose that $\Omega$ is a bounded Lebesgue measurable subset of $\mathcal{R}^n$ having
positive Lebesgue measure and let $\mathcal{A} = \{ \Omega \cap M : M \in \mathcal{M}_n \}$. For
$E \in \mathcal{A}$, define $P(E) = \lambda_n(E)/\lambda_n(\Omega)$. Then $(\Omega, \mathcal{A}, P)$ is a probability
space. This probability model is often referred to as the continuous
uniform model. It can be used as the mathematical model for selecting
a point at random from the set $\Omega$.	□
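As a small illustration of the discrete uniform model in part (c), the sketch below computes $N(E)/N(\Omega)$ with exact rational arithmetic. The function name `uniform_probability` and the die example are our own choices for illustration, not part of the text.

```python
from fractions import Fraction

def uniform_probability(event, omega):
    """Return P(E) = N(E)/N(Omega) for a finite sample space whose
    outcomes are equally likely (discrete uniform model)."""
    omega = set(omega)
    event = set(event)
    if not omega:
        raise ValueError("sample space must be nonempty")
    if not event <= omega:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(omega))

# One roll of a balanced die: probability that the roll is even.
die = range(1, 7)
print(uniform_probability({2, 4, 6}, die))  # 1/2
```

Using `Fraction` rather than floating point keeps identities such as $P(\Omega) = 1$ exact.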
Because a probability space is, in particular, a finite measure space,
we can immediately infer for probability spaces any properties of finite
measure spaces. For future reference, we list some of the more important
properties of probability measures in Proposition 5.1.
PROPOSITION 5.1
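Suppose that $(\Omega, \mathcal{A}, P)$ is a probability space and that $A$, $B$, and $E$ are
events, that is, $A, B, E \in \mathcal{A}$. Then the following hold:
a)	If $A \subset B$, then $P(B \setminus A) = P(B) - P(A)$.
b)	$P(E^c) = 1 - P(E)$.
c)	$A \subset B \Rightarrow P(A) \le P(B)$.
d)	$0 \le P(E) \le 1$.
e)	$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
f)	If $\{E_n\}_{n=1}^{\infty} \subset \mathcal{A}$ and $E_1 \supset E_2 \supset \cdots$, then
\[ P\Bigl( \bigcap_{n=1}^{\infty} E_n \Bigr) = \lim_{n \to \infty} P(E_n). \]
g)	If $\{E_n\}_{n=1}^{\infty} \subset \mathcal{A}$ and $E_1 \subset E_2 \subset \cdots$, then
\[ P\Bigl( \bigcup_{n=1}^{\infty} E_n \Bigr) = \lim_{n \to \infty} P(E_n). \]
h)	If $\{E_n\}_n \subset \mathcal{A}$, then
\[ P\Bigl( \bigcup_n E_n \Bigr) \le \sum_n P(E_n). \]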
In probability, this last property is called Boole’s inequality.
Conditional Probability
Frequently, we need to obtain the probability of an event, B, under the
condition that another event, A, has occurred. For instance, consider the
experiment of selecting an adult American at random. We might be inter-
ested in the probability that the person selected is a Democrat (event B).
But we also might want to know the probability that the person selected is
a Democrat assuming that the person selected is a female (event A). The
former probability, as we know, is denoted by $P(B)$. On the other hand,
the latter probability is denoted by $P(B \mid A)$, read “the probability of $B$
given $A$,” and is called the conditional probability of event $B$ given that
event $A$ has occurred.
More generally, we can refer to the relative-frequency interpretation of
probability in order to obtain a formal definition of conditional probability;
that is, a definition in terms of the original probability space, $(\Omega, \mathcal{A}, P)$.
So, assume that the experiment is repeated a large number, $n$, of times.
Let $E$ be an event with nonzero probability. Given that event $E$ occurs,
an event $F$ will occur if and only if event $E \cap F$ occurs. Consequently,
in the $n$ repetitions of the experiment, the relative frequency of occurrence
of event $F$ among those times in which event $E$ has occurred equals
$n(E \cap F)/n(E)$. But, by (5.1),
\[ \frac{n(E \cap F)}{n(E)} = \frac{n(E \cap F)/n}{n(E)/n} \approx \frac{P(E \cap F)}{P(E)}. \]
Therefore, we make the following definition:
DEFINITION 5.2 Conditional Probability
Let $(\Omega, \mathcal{A}, P)$ be a probability space and $E \in \mathcal{A}$ with $P(E) > 0$. Then,
for $F \in \mathcal{A}$, the conditional probability of event $F$ given that event $E$
has occurred is defined by
\[ P(F \mid E) = \frac{P(E \cap F)}{P(E)}. \]
EXAMPLE 5.3 Illustrates Definition 5.2
Refer to Examples 5.1(a) and 5.2(a). Suppose that a balanced coin is
tossed three times. Let $B$ denote the event that a total of two heads are
tossed and $A$ denote the event that the first toss is a head. We have
$A = \{$HHH, HHT, HTH, HTT$\}$ and $B = \{$HHT, HTH, THH$\}$. Consequently,
the conditional probability of event $B$ given that event $A$ has occurred is
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(\{\text{HHT, HTH}\})}{P(\{\text{HHH, HHT, HTH, HTT}\})} = \frac{2/8}{4/8} = 0.5. \]
Observe that the (unconditional) probability of event $B$ is
\[ P(B) = \frac{3}{8} = 0.375. \]
Hence, the information that event $A$ has occurred affects the probability
that event $B$ occurs.	□
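The computation in this example can be reproduced directly from Definition 5.2. The sketch below assumes the balanced-coin model in which $P(E) = N(E)/8$; the helper names `prob` and `cond_prob` are ours.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a balanced coin.
omega = {"".join(t) for t in product("HT", repeat=3)}

def prob(event):
    """P(E) = N(E)/8 under the equally-likely model."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(f, e):
    """P(F | E) = P(E ∩ F) / P(E); defined only when P(E) > 0."""
    if prob(e) == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(e & f) / prob(e)

A = {w for w in omega if w[0] == "H"}        # first toss is a head
B = {w for w in omega if w.count("H") == 2}  # exactly two heads in total

print(cond_prob(B, A))  # 1/2
print(prob(B))          # 3/8
```

The guard in `cond_prob` reflects the requirement $P(E) > 0$ in the definition.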
The next proposition, whose proof we leave as an exercise for the
reader, shows that, for fixed $E$, the set function $P(\,\cdot \mid E)$ is a probability
measure on A. That probability measure provides the likelihood of events
under the condition that event E has occurred.
PROPOSITION 5.2
Let $(\Omega, \mathcal{A}, P)$ be a probability space and $E \in \mathcal{A}$ with $P(E) > 0$. Define $P_E$
on $\mathcal{A}$ by $P_E(A) = P(A \mid E)$. Then $P_E$ is a probability measure.
Independent Events
Next we will define independence for events. Intuitively, event $F$ is independent
of event $E$ if the occurrence or nonoccurrence of event $E$ does not
affect the probability of $F$; that is, if $P(F \mid E) = P(F)$. In view of Definition 5.2,
this is equivalent to the condition that $P(E \cap F)/P(E) = P(F)$.
Clearing fractions yields the equation $P(E \cap F) = P(E)P(F)$. This last
equation has the advantages of symmetry and not requiring the event $E$ to
have positive probability. Hence, we make the following definition:
DEFINITION 5.3 Independent Events
Two events, $E$ and $F$, are said to be independent† if
\[ P(E \cap F) = P(E)P(F). \]
If E and F are not independent, then they are called dependent.
EXAMPLE 5.4 Illustrates Definition 5.3
Refer to Example 5.3. Suppose that a balanced coin is tossed three times.
Let $A$ denote the event that the first toss is a head, $B$ the event that a total
of two heads are tossed, and $C$ the event that the last two tosses are heads.
We have $P(A) = 4/8 = 0.5$, $P(B) = 3/8 = 0.375$, $P(C) = 2/8 = 0.25$,
$P(A \cap B) = 2/8 = 0.25$, and $P(A \cap C) = 1/8 = 0.125$. It follows that
$P(A \cap B) \ne P(A)P(B)$ and $P(A \cap C) = P(A)P(C)$. Hence, events $A$
and $B$ are dependent, while events $A$ and $C$ are independent.	□
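The independence checks in this example amount to comparing $P(E \cap F)$ with $P(E)P(F)$. A minimal sketch, again assuming the equally-likely model on three coin tosses and using helper names of our own:

```python
from fractions import Fraction
from itertools import product

omega = {"".join(t) for t in product("HT", repeat=3)}

def prob(event):
    """P(E) = N(E)/8 for three tosses of a balanced coin."""
    return Fraction(len(event & omega), len(omega))

def independent(e, f):
    """True exactly when P(E ∩ F) = P(E) P(F), as in Definition 5.3."""
    return prob(e & f) == prob(e) * prob(f)

A = {w for w in omega if w[0] == "H"}        # first toss is a head
B = {w for w in omega if w.count("H") == 2}  # a total of two heads
C = {w for w in omega if w[1:] == "HH"}      # last two tosses are heads

print(independent(A, B))  # False: 2/8 differs from (4/8)(3/8)
print(independent(A, C))  # True: 1/8 equals (4/8)(2/8)
```

Exact rational arithmetic matters here: with floating-point probabilities, an equality test of this kind could fail because of rounding.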
We have defined independence for two events. For more than two
events, we must be careful to distinguish between two types of independence,
pairwise independence and mutual independence. Events $A_1$, $A_2$,
..., $A_n$ are said to be pairwise independent if, for $i \ne j$, $A_i$ and $A_j$ are
independent in the sense of Definition 5.3. In probability theory, however,
the concept of mutual independence plays a more prominent role.
DEFINITION 5.4 Mutually Independent Events
Let $(\Omega, \mathcal{A}, P)$ be a probability space. Events $A_1$, $A_2$, ..., $A_n$ are said
to be mutually independent if for each subset $\{i_1, i_2, \ldots, i_m\}$ of
$\{1, 2, \ldots, n\}$, we have
\[ P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_m}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_m}). \]
The events of an arbitrary (not necessarily finite) collection are called
mutually independent if every finite number of them are mutually in-
dependent.
† The terms statistically independent, stochastically independent, and
probabilistically independent are also used.
Note: Although mutually independent events are pairwise independent,
the converse is not true. See Exercise 5.18(a).
One advantage of mutual independence over pairwise independence is
that, with mutual independence, events formed by set operations on disjoint
subcollections are also mutually independent. For example, if E, F, and G
are mutually independent events, then E U F and G are also independent
events.
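To see concretely how pairwise independence can fail to give mutual independence, consider the following sketch. The example (two tosses of a balanced coin, with the third event being "both tosses agree") is a standard illustration that we supply here; it is not taken from the text, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Two tosses of a balanced coin; every outcome has probability 1/4.
omega = {"".join(t) for t in product("HT", repeat=2)}

def prob(event):
    return Fraction(len(event & omega), len(omega))

def pairwise_independent(events):
    """Every pair satisfies P(E ∩ F) = P(E) P(F)."""
    return all(prob(e & f) == prob(e) * prob(f)
               for e, f in combinations(events, 2))

def mutually_independent(events):
    """The product rule of Definition 5.4 holds for every subcollection
    of two or more events."""
    for m in range(2, len(events) + 1):
        for sub in combinations(events, m):
            intersection = set(omega)
            product_of_probs = Fraction(1)
            for e in sub:
                intersection &= e
                product_of_probs *= prob(e)
            if prob(intersection) != product_of_probs:
                return False
    return True

A = {w for w in omega if w[0] == "H"}     # first toss is a head
B = {w for w in omega if w[1] == "H"}     # second toss is a head
C = {w for w in omega if w[0] == w[1]}    # both tosses agree

print(pairwise_independent([A, B, C]))  # True
print(mutually_independent([A, B, C]))  # False: P(A ∩ B ∩ C) = 1/4, not 1/8
```

In this example $A \cup B$ and $C$ are also dependent, since $P((A \cup B) \cap C) = 1/4$ while $P(A \cup B)P(C) = 3/8$; this shows that the closure property described above genuinely requires mutual independence.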
The following theorem plays a crucial role in many probabilistic arguments.
In interpreting the theorem, observe that for a sequence of events,
$\{A_n\}_{n=1}^{\infty}$, the event $\bigcap_{n=1}^{\infty} \bigl( \bigcup_{k=n}^{\infty} A_k \bigr)$ occurs if and only if infinitely many
of the $A_n$'s occur.
THEOREM 5.1 Borel-Cantelli Lemma
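Suppose that $(\Omega, \mathcal{A}, P)$ is a probability space and that $\{A_n\}_{n=1}^{\infty} \subset \mathcal{A}$.
a)	If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then
\[ P\Bigl( \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k \Bigr) = 0. \]
b)	If $A_1$, $A_2$, ... are mutually independent and $\sum_{n=1}^{\infty} P(A_n) = \infty$, then
\[ P\Bigl( \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k \Bigr) = 1. \]
PROOF: For convenience, set $E_n = \bigcup_{k=n}^{\infty} A_k$.
a)	We have $E_1 \supset E_2 \supset \cdots$ and $\bigcap_{n=1}^{\infty} E_n = \bigcap_{n=1}^{\infty} \bigl( \bigcup_{k=n}^{\infty} A_k \bigr)$. Applying
Proposition 5.1(f) and Boole’s inequality, we obtain that
\[ P\Bigl( \bigcap_{n=1}^{\infty} E_n \Bigr) = \lim_{n \to \infty} P(E_n) \le \lim_{n \to \infty} \sum_{k=n}^{\infty} P(A_k) = 0, \]
where the last equation holds because $\sum_{n=1}^{\infty} P(A_n) < \infty$.
b)	In this part, we will use the fact that, for $x \ge 0$, $e^{-x} \ge 1 - x$. Let $n \in \mathcal{N}$
be fixed but arbitrary. Applying Proposition 5.1(f) to the sequence of
events, $\bigcap_{k=n}^{m} A_k^c$ for $m = n, n+1, \ldots$, and using Exercise 5.20(b), we
get that
\[ P\Bigl( \bigcap_{k=n}^{\infty} A_k^c \Bigr) = \lim_{m \to \infty} P\Bigl( \bigcap_{k=n}^{m} A_k^c \Bigr)
   = \lim_{m \to \infty} \prod_{k=n}^{m} P(A_k^c) = \lim_{m \to \infty} \prod_{k=n}^{m} \bigl[ 1 - P(A_k) \bigr] \]
\[ \le \lim_{m \to \infty} \prod_{k=n}^{m} e^{-P(A_k)} = \lim_{m \to \infty} \exp\Bigl[ -\sum_{k=n}^{m} P(A_k) \Bigr] = 0, \]
where the last equality holds since $\sum_{k=n}^{\infty} P(A_k) = \infty$. Consequently,
since $E_n^c = \bigcap_{k=n}^{\infty} A_k^c$, we have $P(E_n) = 1$ for each $n \in \mathcal{N}$. The required
result now follows easily.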
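The key estimate in part (b), namely that the product of the factors $1 - P(A_k)$ is bounded above by the exponential of minus the sum of the $P(A_k)$, can be examined numerically. The sketch below is only an illustration: the two probability sequences, $P(A_k) = 1/k$ (divergent sum) and $P(A_k) = 1/k^2$ (convergent sum), and the function names are our own assumptions.

```python
import math

def tail_product(p, n, m):
    """prod_{k=n}^{m} (1 - p(k)): under mutual independence, the probability
    that none of A_n, ..., A_m occurs."""
    result = 1.0
    for k in range(n, m + 1):
        result *= 1.0 - p(k)
    return result

def exp_bound(p, n, m):
    """exp(-sum_{k=n}^{m} p(k)): the upper bound obtained from 1 - x <= e^{-x}."""
    return math.exp(-sum(p(k) for k in range(n, m + 1)))

def harmonic(k):
    return 1.0 / k          # sum diverges: hypothesis of part (b)

def inverse_square(k):
    return 1.0 / k ** 2     # sum converges: hypothesis of part (a)

for m in (10, 100, 1000):
    print(m,
          tail_product(harmonic, 2, m), exp_bound(harmonic, 2, m),
          tail_product(inverse_square, 2, m))
```

For $P(A_k) = 1/k$ the product telescopes to $1/m$ and tends to 0, in line with the proof; for $P(A_k) = 1/k^2$ it telescopes to $(m+1)/(2m)$ and stays near $1/2$, so the argument of part (b) does not apply to that sequence.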
EXERCISES 5.1
5.1	Marilyn vos Savant publishes a column in Parade magazine. A variation
of the following problem appeared in her column and caused tremendous
controversy among the mathematical community: On a game show, there
are three doors behind which there is one prize each. Two of the prizes are
worthless and one is valuable. A contestant selects one of the doors following
which the game-show host, who knows where the valuable prize lies, opens
one of the remaining two doors to reveal a worthless prize. The host then
offers the contestant the opportunity to change his selection. Should he
switch? Hint: Use the relative-frequency interpretation of probability.
5.2	Refer to Example 5.2 on page 265. Provide the details for parts (a)-(d) of
that example.
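5.3	Suppose that $(\Gamma, \mathcal{S}, \mu)$ is a measure space and that $\Omega \in \mathcal{S}$ is such that
$0 < \mu(\Omega) < \infty$. Let $\mathcal{A} = \mathcal{S}_\Omega$ and, for $E \in \mathcal{A}$, define $P(E) = \mu(E)/\mu(\Omega)$.
Show that $(\Omega, \mathcal{A}, P)$ is a probability space.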
5.4	Refer to Example 5.1(b) on page 264. As in the example, let $\Omega = [0, \infty)$
and set $\mathcal{A} = \mathcal{M}_{[0,\infty)}$. Experience shows that the probability is $1 - e^{-7t}$ that
the first patient arrives within $t$ hours of 6:00 PM.
a)	Prove that there exists a unique probability measure on A consistent
with the previous sentence.
b)	Determine explicitly the probability measure in part (a).
c)	Determine the probability that the first patient arrives between 6:15
and 6:30 PM.
5.5	Provide the proof for Proposition 5.1 on page 266. You may cite any theo-
rems from Chapter 4.
5.6	Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\{A_n\}_n$ a sequence of events with
$P(A_n) = 1$ for each $n$. Prove that $P\bigl( \bigcap_n A_n \bigr) = 1$.
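5.7	Use induction to prove the following generalization of Proposition 5.1(e): If
$E_1$, $E_2$, ..., $E_n$ are $n$ events, then
\[ P(E_1 \cup E_2 \cup \cdots \cup E_n) = \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} \cap E_{i_2}) + \cdots \]
\[ + (-1)^{k+1} \sum_{i_1 < i_2 < \cdots < i_k} P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k}) \]
\[ + \cdots + (-1)^{n+1} P(E_1 \cap E_2 \cap \cdots \cap E_n). \]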
5.8	Suppose that a coin has probability, p, of coming up heads, where 0 < p < 1.
Consider the experiment of tossing the coin until a head appears.
a)	Determine a sample space for this experiment.
b)	Assign probabilities to each of the possible outcomes.
c)	Construct a probability space for the experiment.
d)	Repeat parts (a)-(c) if p = 1.
e)	Repeat parts (a)-(c) if p = 0.
5.9	Consider the experiment of rolling two balanced dice.
a)	Construct a probability space for the experiment.
b)	Determine the probability of rolling doubles, that is, of both dice coming
up the same number.
c)	Use Definition 5.2 to obtain the conditional probability of rolling doubles
given that the sum of the dice is four.
d)	Solve part (c) without using Definition 5.2 but instead by constructing
a new sample space based upon the condition that the sum of the dice
is four.
5.10	Suppose that two cards are selected at random from an ordinary deck of
52 playing cards, where the first card selected is not replaced prior to the
drawing of the second card.
a)	Employ counting techniques to determine the number of possible out-
comes of the experiment.
b)	Use Definition 5.2 and counting techniques to obtain the conditional
probability that the second card selected is a heart given that the first
card selected is a heart.
c)	Solve part (b) without using Definition 5.2 but instead by constructing
a new sample space based on the condition that the first card selected
is a heart.
5.11	Refer to Exercise 5.4.
a)	Determine the probability that the first patient arrives after 6:15 PM.
b)	Determine the (conditional) probability that the first patient arrives af-
ter 6:15 PM given that the first arrival occurs after 6:10 PM.
5.12	Prove Proposition 5.2 on page 268.
★5.13 Let $(\Omega, \mathcal{A}, P)$ be a probability space. Suppose that $\{E_n\}_n$ is a sequence of
pairwise mutually exclusive events with $\bigcup_n E_n = \Omega$.
a)	Prove that, for each event $A$,
\[ P(A) = \sum_n P(E_n \cap A). \]
b)	Assuming also that $P(E_n) > 0$ for each $n$, prove the law of total
probability: For each event $A$,
\[ P(A) = \sum_n P(A \mid E_n)P(E_n) = \sum_n P_{E_n}(A)P(E_n). \]
c)	Assuming also that $P(A) > 0$, prove Bayes’ rule (named in honor of
the 18th century clergyman, Thomas Bayes): For each $k$,
\[ P(E_k \mid A) = \frac{P(A \mid E_k)P(E_k)}{\sum_n P(A \mid E_n)P(E_n)}. \]
5.14	This exercise considers some basic properties of independence.
a)	Show that if events $E$ and $F$ are both mutually exclusive and independent,
then either $P(E) = 0$ or $P(F) = 0$. Equivalently, two events with
positive probability cannot be both mutually exclusive and independent.
b)	Show that if event $E$ and event $F$ are independent and $E \subset F$, then
either $P(E) = 0$ or $P(F) = 1$.
5.15	Refer to Example 5.2(d) on page 266. Take $n = 1$ and $\Omega = [0,1]$. Suppose
that $[a, b]$ is a nonempty, proper subinterval of $[0,1]$. Determine all
subintervals of $[0,1]$ that are independent of $[a, b]$.
5.16	Suppose that a card is randomly selected from an ordinary deck of 52 playing
cards. Let $A$ denote the event that the card selected is a king, $B$ the event
that the card selected is a heart, and $C$ the event that the card selected is
a face card.
a)	Are events $A$ and $B$ independent?
b)	Are events $A$ and $C$ independent?
5.17	Refer to Example 5.2(d) on page 266.
a)	Let $\Omega = \{ (x, y) \in \mathcal{R}^2 : 0 < x, y < 2 \}$. Suppose that a point is selected
at random from $\Omega$. Let $A$ denote the event that the $x$-coordinate of
the point selected is at most one and let $B$ denote the event that the
$y$-coordinate of the point selected is at most 0.5. Determine whether $A$
and $B$ are independent events.
b)	Repeat part (a) if $\Omega = \{ (x, y) \in \mathcal{R}^2 : 0 < y < x < 2 \}$.
5.18	Suppose that two balanced dice, one orange and the other black, are rolled.
Let
A = event the orange die comes up even;
B = event the black die comes up even;
C = event the sum of the dice is even;
D = event the orange die comes up 1, 2, or 3;
E = event the orange die comes up 3, 4, or 5;
F = event the sum of the dice is 5.
a)	Show that the events, A, B, and C, are pairwise independent but not
mutually independent.
b)	Show that $A \cup B$ and $C$ are dependent events.
c)	Show that $P(D \cap E \cap F) = P(D)P(E)P(F)$ but that $D$, $E$, and $F$ are
not pairwise independent (and, hence, not mutually independent).
5.19	Prove that if $E$ and $F$ are independent events, then so are $E$ and $F^c$.
5.20	Suppose that $A_1$, $A_2$, ..., $A_n$ are mutually independent events.
a)	Prove that $A_1 \cup A_2 \cup \cdots \cup A_{n-1}$ and $A_n$ are independent events. Hint:
Use induction.
b)	Prove that $P\bigl( \bigcap_{k=1}^{n} A_k^c \bigr) = \prod_{k=1}^{n} P(A_k^c)$. Hint: Use induction, part (a),
and Exercise 5.19.
5.2 RANDOM VARIABLES
When a random experiment is performed, it is often some numerical quan-
tity associated with the outcome that is of interest, rather than the out-
come itself. For example, consider the classical (noncasino) game of craps
in which two balanced dice are rolled. Each possible outcome of the experiment
can be represented as an ordered pair of integers, $(i, j)$, where $i$
and $j$ are the number of dots showing on the two dice. But what is of
concern here is the sum, $i + j$, not the outcome, $(i, j)$, itself. Similarly, in
studying the relationship between height and weight, we might sample in-
dividuals from the population. Here we would be interested in the heights
and weights of the individuals selected, not the individuals themselves.
In the first example of the previous paragraph, we have a real-valued
function, sum of the two dice, defined on a sample space; and, in the second
example, a vector-valued function, (height, weight), defined on a sample
space. Traditionally, in probability, real-valued functions on a sample space
are called random variables and vector-valued functions on a sample space
are called random vectors. It is also traditional to denote random variables
and vectors by uppercase italicized English-alphabet letters near the end
of the alphabet.
Random Variables and Their Distributions
For a rigorous development of random variables and random vectors, we
need to be more precise. So, suppose that $(\Omega, \mathcal{A}, P)$ is a probability space
and that $X$ is a real-valued function on $\Omega$. Usually, we are interested in
the probability that $X$ takes on various values (e.g., the probability that
$X$ equals two, that $X$ exceeds 7.5). More generally, for each Borel set, $B$,
we want to know the probability that the value of $X$ is a member of $B$; that
is, $P(\{ \omega : X(\omega) \in B \})$. But, for that probability to exist, $\{ \omega : X(\omega) \in B \}$
must be an event. Hence, we make the following definition:
DEFINITION 5.5 Random Variable
Let $(\Omega, \mathcal{A}, P)$ be a probability space. A real-valued function, $X$, on $\Omega$
is called a random variable if $\{ \omega : X(\omega) \in B \} \in \mathcal{A}$ for each $B \in \mathcal{B}$.
Remark: From Exercise 4.21 on page 181, we know that a real-valued function
$f$ on $\Omega$ is $\mathcal{A}$-measurable if and only if $f^{-1}(B) \in \mathcal{A}$ for each $B \in \mathcal{B}$.
Thus, we see that random variables are just real-valued $\mathcal{A}$-measurable functions.
However, as we mentioned in Section 4.2, the term “random variable”
is used for measurable functions in the context of probability spaces, even
though the measurability (or nonmeasurability) of a function has nothing
at all to do with a measure.
In probability, we ordinarily employ the notation $\{X \in B\}$ in place
of the more common notations, $X^{-1}(B)$ or $\{ \omega : X(\omega) \in B \}$. The reason
is that the former notation is more suggestive. Also, for brevity, commas
usually replace intersection symbols in probability expressions involving
events defined in terms of random variables. For instance, we generally
write $P(X \in A, Y \in B)$ instead of $P(\{X \in A\} \cap \{Y \in B\})$.
One of the most important quantities affiliated with a random variable
is its probability distribution. Roughly speaking, the probability distribu-
tion of a random variable describes the probabilities associated with the
various values of the random variable. More precisely, we have:
DEFINITION 5.6 Probability Distribution
Let $X$ be a random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then
the probability distribution of $X$, denoted $\mu_X$, is the set function
on $\mathcal{B}$ defined by $\mu_X(B) = P(X \in B)$.
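As a concrete, finite illustration of this definition, the sketch below evaluates $\mu_X(B) = P(X \in B)$ for the sum of two balanced dice. It assumes the equally-likely model on the 36 ordered pairs and represents a Borel set $B$ simply as a finite collection of candidate values; both are simplifying assumptions made for illustration, and the names `X` and `mu_X` are ours.

```python
from fractions import Fraction
from itertools import product

# Sample space for two balanced dice: all ordered pairs (i, j), each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))

def X(w):
    """The random variable 'sum of the two dice'."""
    return w[0] + w[1]

def mu_X(B):
    """mu_X(B) = P(X in B), computed by counting the outcomes w with X(w) in B."""
    favorable = sum(1 for w in omega if X(w) in B)
    return Fraction(favorable, len(omega))

print(mu_X({7}))           # 1/6
print(mu_X({2, 3, 4}))     # 1/6
print(mu_X(range(2, 13)))  # 1
```

The last line reflects that the distribution assigns total mass 1 to the set of values the sum can take.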
The proof of the next proposition is left to the reader as an exercise.
PROPOSITION 5.3
Let $X$ be a random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then $\mu_X$
is a probability measure on $\mathcal{B}$.
In the following example, we will present some illustrations of random
variables and their probability distributions. The reader should supply the
required details of verification.
EXAMPLE 5.5 Illustrates Definition 5.6
a)  A random variable, $X$, is said to be a discrete random variable
if there is a countable set, $K$, such that $P(X \in K) = 1$. For such
a random variable, write $K = \{x_n\}_n$. Then the probability distribution
of $X$ is given by $\mu_X = \sum_n p_n \delta_{x_n}$, where $p_n = P(X = x_n)$. For
a discrete random variable, the function $p_X : \mathcal{R} \to [0,1]$, defined by
$p_X(x) = P(X = x)$, is called the probability mass function (pmf)
of $X$. Note that $p_X$ is zero on $K^c$ and that $p_X(x_n) = p_n$.
b)  Suppose that two balanced dice are rolled. An appropriate probability
space is obtained by taking $\Omega = \{(i, j) : i, j = 1, 2, \dots, 6\}$, $\mathcal{A} = \mathcal{P}(\Omega)$,
and $P$ equal to counting measure on $\Omega$ divided by 36. Let $X$ denote the sum of
the dice. Because $P(X \in \{2, 3, \dots, 12\}) = 1$, we see that $X$ is a discrete
random variable. The pmf of $X$ is
$$p_X(x) = \begin{cases} (x-1)/36, & x = 2, 3, \dots, 7; \\ (13-x)/36, & x = 8, 9, \dots, 12; \\ 0, & \text{otherwise.} \end{cases}$$
c)  A random variable, $X$, is said to be an absolutely continuous random
variable if there is a nonnegative Borel measurable function, $f$,
such that $\mu_X(B) = \int_B f \, d\lambda$ for all $B \in \mathcal{B}$.† For such a random variable,
we usually write $f = f_X$ and call $f_X$ the probability density
function (pdf) of $X$.
d)  Suppose that a number is selected at random from the interval $[0,1]$
and let $X$ denote the number obtained. Then, for $B \in \mathcal{B}$, we have
$\mu_X(B) = P(X \in B) = \lambda(B \cap [0,1]) = \int_B \chi_{[0,1]} \, d\lambda$. Hence, $X$ is an
absolutely continuous random variable with pdf $f_X = \chi_{[0,1]}$. Such a
random variable is said to have the uniform distribution on $[0,1]$.
† In elementary probability courses, absolutely continuous random variables are
usually referred to simply as continuous random variables. However, as we will see in
part (e), to be precise we need to include the adjective "absolutely."
5.2 Random Variables □ 277
e)  A random variable, $X$, is said to be a continuous random variable
if $P(X = x) = 0$ for all $x \in \mathcal{R}$. Note that if $X$ is a continuous random
variable, then $P(X \in K) = 0$ for each countable subset $K \subset \mathcal{R}$; thus,
a continuous random variable is not discrete and vice versa. Also, note
that an absolutely continuous random variable is a continuous random
variable. However, the converse is not true. See Exercise 5.28.
f)  There are random variables that are neither discrete nor continuous.
See, for instance, Exercise 5.32.  □
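As a check on part (b), the pmf of the sum of two balanced dice can be obtained by enumerating the 36 equally likely outcomes and compared with the closed form. The following is only a minimal sketch; the names `omega`, `pmf_by_enumeration`, and `pmf_by_formula` are ours and do not appear in the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for two balanced dice: all ordered pairs (i, j),
# each carrying probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)

def pmf_by_enumeration(x):
    """p_X(x) = P(X = x), where X(i, j) = i + j."""
    return sum((P for (i, j) in omega if i + j == x), Fraction(0))

def pmf_by_formula(x):
    """Closed form stated in Example 5.5(b)."""
    if x in range(2, 8):
        return Fraction(x - 1, 36)
    if x in range(8, 13):
        return Fraction(13 - x, 36)
    return Fraction(0)

# The two descriptions agree at every integer, including points outside {2, ..., 12}.
agree = all(pmf_by_enumeration(x) == pmf_by_formula(x) for x in range(0, 15))

# Total mass carried by the pmf on {2, ..., 12}.
total = sum(pmf_by_formula(x) for x in range(2, 13))
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation.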
Closely associated with the probability distribution of a random vari-
able is the probability distribution function. We define this next.
DEFINITION 5.7 Probability Distribution Function
Let $X$ be a random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then
the probability distribution function of $X$, denoted $F_X$, is the
real-valued function on $\mathcal{R}$ defined by $F_X(x) = P(X \le x)$.
Remark: From Definitions 5.6 and 5.7, we see immediately that, for a
random variable, $X$, the probability distribution and probability distribution
function are related by the equation $F_X(x) = \mu_X((-\infty, x])$. In other
words, the probability distribution function of $X$ is also the distribution
function of $\mu_X$, in the sense of Definition 4.19 on page 221.
EXAMPLE 5.6 Illustrates Definition 5.7
a)  For a discrete random variable, as described in Example 5.5(a), we have
$F_X(x) = \sum_{x_n \le x} p_n = \sum_{t \le x} p_X(t)$.
b)  For an absolutely continuous random variable, as described in Example 5.5(c),
we have $F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt$, where, in general, the integral
is a Lebesgue integral.  □
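Part (a) can be illustrated with the sum of two balanced dice from Example 5.5(b). The sketch below, whose names `pmf` and `cdf` are ours, builds the distribution function by summing the pmf over all points not exceeding $x$.

```python
from fractions import Fraction

# pmf of the sum of two balanced dice, written as (6 - |x - 7|)/36,
# which matches the two-branch formula of Example 5.5(b) on {2, ..., 12}.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

def cdf(x):
    """F_X(x) = sum of p_X(t) over all t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))
```

For instance, `cdf(7)` and `cdf(7.5)` coincide, reflecting that the distribution function of a discrete random variable is constant between consecutive points of positive mass.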
Clearly, two random variables having the same probability distribution
must also have the same probability distribution function. The converse is
also true, as the next theorem shows.
THEOREM 5.2
Two random variables having the same probability distribution function
have the same probability distribution; that is, $F_X = F_Y \Rightarrow \mu_X = \mu_Y$.
PROOF: Let $F = F_X = F_Y$. By assumption, both of the finite Borel
measures, $\mu_X$ and $\mu_Y$, have $F$ as their distribution function. Therefore,
by the uniqueness portion of Theorem 4.13 on page 226, we must have
$\mu_X = \mu_Y$.
Given a probability measure, $\mu$, on the Borel sets, or, equivalently, a
distribution function with $F(\infty) = 1$, does there exist a probability space
and a random variable defined thereon whose probability distribution is $\mu$?
The answer is yes! See Exercise 5.25.
Random Vectors and Their Distributions
Frequently, we are interested in two or more numerical quantities associated
with the outcome of a random experiment, for example, the height and
weight of a randomly selected individual. This leads to the notion of a
random vector or, equivalently, two or more random variables considered
simultaneously.
To begin our discussion of random vectors, we recall that $\mathcal{B}_n$ denotes
the $\sigma$-algebra generated by the open sets of $\mathcal{R}^n$ and that the members of $\mathcal{B}_n$
are termed n-dimensional Borel sets. In Exercise 4.171(a), we showed
that $\mathcal{B}_n = \mathcal{B} \times \cdots \times \mathcal{B}$; in other words, $\mathcal{B}_n$ is also the $\sigma$-algebra generated
by the n-dimensional Borel rectangles, that is, sets of the form $B_1 \times \cdots \times B_n$,
where $B_k \in \mathcal{B}$, $1 \le k \le n$. With these facts in mind, we now prove
Proposition 5.4.
PROPOSITION 5.4
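Let $X_1, \dots, X_n$ be $n$ random variables all defined on the same probability
space $(\Omega, \mathcal{A}, P)$. Then $\{\omega : (X_1(\omega), \dots, X_n(\omega)) \in B\} \in \mathcal{A}$ for all $B \in \mathcal{B}_n$.
PROOF: Let $B_k \in \mathcal{B}$, $1 \le k \le n$. Because each $X_k$ is a random variable,
we have $\{X_k \in B_k\} \in \mathcal{A}$ for $1 \le k \le n$. Therefore, because $\mathcal{A}$ is a
$\sigma$-algebra,
$$\{\omega : (X_1(\omega), \dots, X_n(\omega)) \in B_1 \times \cdots \times B_n\} = \bigcap_{k=1}^{n} \{X_k \in B_k\} \in \mathcal{A}. \tag{5.2}$$
Now, let
$$\mathcal{F} = \{B \in \mathcal{B}_n : \{\omega : (X_1(\omega), \dots, X_n(\omega)) \in B\} \in \mathcal{A}\}.$$
Since $\mathcal{B}_n$ and $\mathcal{A}$ are $\sigma$-algebras, so is $\mathcal{F}$. Furthermore, by (5.2), $\mathcal{F}$ contains
all n-dimensional Borel rectangles. Thus, $\mathcal{F} = \mathcal{B}_n$.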
In view of Proposition 5.4, we now make the following definition:
DEFINITION 5.8 Joint Probability Distribution
Let $X_1, \dots, X_n$ be $n$ random variables all defined on the same probability
space $(\Omega, \mathcal{A}, P)$. Then the joint probability distribution
of $X_1, \dots, X_n$, denoted $\mu_{X_1,\dots,X_n}$, is the set function on $\mathcal{B}_n$ defined
by $\mu_{X_1,\dots,X_n}(B) = P((X_1, \dots, X_n) \in B)$.
The proof of the following proposition is left to the reader.
PROPOSITION 5.5
Let $X_1, \dots, X_n$ be $n$ random variables all defined on the same probability
space $(\Omega, \mathcal{A}, P)$. Then $\mu_{X_1,\dots,X_n}$ is a probability measure on $\mathcal{B}_n$.
Here now are some examples of joint probability distributions. The
details of verification should be supplied by the reader.
EXAMPLE 5.7 Illustrates Definition 5.8
a)  Random variables, $X_1, \dots, X_n$, all defined on the same probability
space $(\Omega, \mathcal{A}, P)$, are said to be jointly discrete if there is a countable
set, $K \subset \mathcal{R}^n$, such that $P((X_1, \dots, X_n) \in K) = 1$. It is easy to see
that if $X_1, \dots, X_n$ are jointly discrete, then each $X_k$, $1 \le k \le n$, must
be a discrete random variable. The function $p_{X_1,\dots,X_n} : \mathcal{R}^n \to [0,1]$,
defined by $p_{X_1,\dots,X_n}(x_1, \dots, x_n) = P(X_1 = x_1, \dots, X_n = x_n)$, is called
the joint probability mass function (joint pmf) of $X_1, \dots, X_n$. In
this context, each individual pmf, $p_{X_k}$, $1 \le k \le n$, is called a marginal
probability mass function (marginal pmf).
b)  Random variables, $X_1, \dots, X_n$, all defined on the same probability
space $(\Omega, \mathcal{A}, P)$, are said to be jointly absolutely continuous if
there is a nonnegative $\mathcal{B}_n$-measurable function, $f$, on $\mathcal{R}^n$ such that
$\mu_{X_1,\dots,X_n}(B) = \int_B f \, d\lambda_n$ for all $B \in \mathcal{B}_n$. For such random variables,
we usually write $f = f_{X_1,\dots,X_n}$ and call $f_{X_1,\dots,X_n}$ the joint probability
density function (joint pdf) of $X_1, \dots, X_n$. It is not too difficult to
show that if $X_1, \dots, X_n$ are jointly absolutely continuous, then each $X_k$,
$1 \le k \le n$, must be absolutely continuous. In this context, each individual
pdf, $f_{X_k}$, $1 \le k \le n$, is called a marginal probability density
function (marginal pdf).  □
In analyzing jointly distributed random variables, it is useful to gener-
alize the concept of a probability distribution function to apply to several
random variables. This is done in Definition 5.9.
DEFINITION 5.9 Joint Probability Distribution Function
Let $X_1, \dots, X_n$ be $n$ random variables all defined on the same probability
space $(\Omega, \mathcal{A}, P)$. Then their joint probability distribution
function, denoted $F_{X_1,\dots,X_n}$, is the real-valued function on $\mathcal{R}^n$ defined
by $F_{X_1,\dots,X_n}(x_1, \dots, x_n) = P(X_1 \le x_1, \dots, X_n \le x_n)$.
Remark: From Definitions 5.8 and 5.9, we see that the joint probability
distribution and joint probability distribution function are related by the
equation $F_{X_1,\dots,X_n}(x_1, \dots, x_n) = \mu_{X_1,\dots,X_n}((-\infty, x_1] \times \cdots \times (-\infty, x_n])$.
By the previous remark, it is clear that if $X_1, \dots, X_n$ and $Y_1, \dots, Y_n$
have the same joint probability distribution, then they must also have the
same joint probability distribution function. That the converse is also true
is an immediate consequence of Exercise 4.174 on page 259.
THEOREM 5.3
Two random vectors having the same joint probability distribution function
have the same joint probability distribution; that is,
$$F_{X_1,\dots,X_n} = F_{Y_1,\dots,Y_n} \Rightarrow \mu_{X_1,\dots,X_n} = \mu_{Y_1,\dots,Y_n}.$$
Given a probability measure, $\mu$, on $\mathcal{B}_n$, does there exist a probability
space and random variables defined thereon whose joint probability distribution
is $\mu$? The answer is yes! See Exercise 5.44.
Independent Random Variables
Next we will discuss independence for random variables. Let us begin by
considering two random variables. Intuitively, two random variables are
independent if knowing the value of one of the variables does not affect the
probability distribution of the other random variable.
To be precise, two random variables, $X$ and $Y$, are called independent
if for each pair of Borel sets, $A$ and $B$, the events $\{X \in A\}$ and
$\{Y \in B\}$ are independent in the sense of Definition 5.3 on page 269; that
is, if $P(X \in A, Y \in B) = P(X \in A)P(Y \in B)$. More generally, we have
the following definition:
DEFINITION 5.10 Mutually Independent Random Variables
Random variables, $X_1, \dots, X_n$, all defined on the same probability
space $(\Omega, \mathcal{A}, P)$, are said to be mutually independent if
$$P(X_1 \in B_1, \dots, X_n \in B_n) = P(X_1 \in B_1) \cdots P(X_n \in B_n),$$
for all Borel sets $B_1, \dots, B_n$. The random variables of an infinite collection
are called mutually independent if the random variables of each
finite subcollection are mutually independent. In other words, if $I$ is
an infinite set, then the random variables $\{X_\iota\}_{\iota \in I}$ are mutually independent
if, for each $n \in \mathcal{N}$ and subset $\{\iota_1, \dots, \iota_n\} \subset I$, the $n$ random
variables $X_{\iota_1}, \dots, X_{\iota_n}$ are mutually independent.
We can also define pairwise independence for random variables: Random
variables $\{X_\iota\}_{\iota \in I}$, all defined on the same probability space, are said
to be pairwise independent if, for each pair of distinct elements $i, j \in I$,
the random variables $X_i$ and $X_j$ are independent. It is easy to see that mutually
independent random variables are pairwise independent. However,
the converse is not true. See Exercise 5.45(b).
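One standard way to see that the converse fails is sketched below. It is an illustration of our own choosing, not taken from the text: two fair coin flips together with their sum modulo 2. Because all three variables are jointly discrete, the sketch checks the product rule only at points, which is the criterion stated in Exercise 5.51. The names `omega`, `prob`, and `independent` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

# Assumed probability space: two fair coin flips, each outcome with probability 1/4.
omega = list(product((0, 1), repeat=2))
P = Fraction(1, 4)

# Three random variables: the two flips and their sum modulo 2.
X = [lambda w: w[0], lambda w: w[1], lambda w: (w[0] + w[1]) % 2]

def prob(event):
    """Probability of the event {w : event(w) is true}."""
    return sum((P for w in omega if event(w)), Fraction(0))

def independent(indices):
    """Check that the joint pmf of the chosen variables equals the product of marginals."""
    for values in product((0, 1), repeat=len(indices)):
        joint = prob(lambda w: all(X[i](w) == v for i, v in zip(indices, values)))
        marginals = Fraction(1)
        for i, v in zip(indices, values):
            marginals *= prob(lambda w, i=i, v=v: X[i](w) == v)
        if joint != marginals:
            return False
    return True

pairwise = all(independent((i, j)) for i in range(3) for j in range(i + 1, 3))
mutual = independent((0, 1, 2))
```

Here every pair passes the product rule while the triple does not: the third variable is determined by the first two.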
EXAMPLE 5.8 Illustrates Definition 5.10
Consider the experiment of rolling three balanced dice, say, one orange,
one green, and one black. Let $X_1$, $X_2$, and $X_3$ denote the number of dots
facing up on the orange, green, and black dice, respectively, and let $X_4$
denote the sum of the three dice. Then it is clear intuitively that $X_1$, $X_2$,
and $X_3$ are mutually independent but that $X_1$, $X_2$, $X_3$, and $X_4$ are not
even pairwise independent. The reader should justify these statements
mathematically.  □
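The statements of Example 5.8 can also be checked by brute-force enumeration of the 216 equally likely outcomes. The sketch below is a numerical check rather than the requested mathematical justification; it assumes the pointwise product criterion of Exercise 5.51 for jointly discrete random variables, and the helper `prob` is ours.

```python
from fractions import Fraction
from itertools import product

# Outcomes (orange, green, black), each with probability 1/216.
omega = list(product(range(1, 7), repeat=3))
P = Fraction(1, 216)

def prob(event):
    """Probability of the event {w : event(w) is true}."""
    return sum((P for w in omega if event(w)), Fraction(0))

# X1, X2, X3 are the individual dice; X4 is their sum.
X1 = lambda w: w[0]
X2 = lambda w: w[1]
X3 = lambda w: w[2]
X4 = lambda w: sum(w)

# The joint pmf of (X1, X2, X3) factors into the marginal pmfs at every point.
dice_factor = all(
    prob(lambda w: (X1(w), X2(w), X3(w)) == (a, b, c))
    == prob(lambda w: X1(w) == a) * prob(lambda w: X2(w) == b) * prob(lambda w: X3(w) == c)
    for a, b, c in product(range(1, 7), repeat=3)
)

# X1 and X4 fail the product rule: a sum of 18 forces the orange die to show 6.
joint = prob(lambda w: X1(w) == 1 and X4(w) == 18)
separate = prob(lambda w: X1(w) == 1) * prob(lambda w: X4(w) == 18)
```

By symmetry, the same comparison shows that $X_2$ and $X_4$, and likewise $X_3$ and $X_4$, are not independent.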
An important property of mutual independence is that functions of
disjoint subcollections of mutually independent random variables are also
mutually independent. That is, we have the following proposition:
PROPOSITION 5.6
Suppose that $X_1, \dots, X_n$ are mutually independent random variables and
that $n_j \in \mathcal{N}$, $1 \le j \le k$, with $n_1 < n_2 < \cdots < n_k = n$. Further
suppose that $f_1$ is $\mathcal{B}_{n_1}$-measurable, $f_2$ is $\mathcal{B}_{n_2 - n_1}$-measurable, ..., and $f_k$ is
$\mathcal{B}_{n_k - n_{k-1}}$-measurable. Then the random variables,
$$f_1(X_1, \dots, X_{n_1}),\ f_2(X_{n_1+1}, \dots, X_{n_2}),\ \dots,\ f_k(X_{n_{k-1}+1}, \dots, X_{n_k}),$$
are mutually independent.
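PROOF: We will prove the proposition in case $n_j = j$, $1 \le j \le k = n$. The
general case is left as an exercise for the reader. Let $B_j \in \mathcal{B}$ for $1 \le j \le n$.
For the special case, we have
$$\begin{aligned}
P(f_1(X_1) \in B_1, \dots, f_n(X_n) \in B_n)
&= P(X_1 \in f_1^{-1}(B_1), \dots, X_n \in f_n^{-1}(B_n)) \\
&= P(X_1 \in f_1^{-1}(B_1)) \cdots P(X_n \in f_n^{-1}(B_n)) \\
&= P(f_1(X_1) \in B_1) \cdots P(f_n(X_n) \in B_n),
\end{aligned}$$
as required.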
We will now obtain two equivalent conditions for the mutual indepen-
dence of random variables.
THEOREM 5.4
Suppose that $X_1, \dots, X_n$ are random variables all defined on the same
probability space $(\Omega, \mathcal{A}, P)$. Then $X_1, \dots, X_n$ are mutually independent
if and only if
$$\mu_{X_1,\dots,X_n} = \mu_{X_1} \times \cdots \times \mu_{X_n}; \tag{5.3}$$
that is, if and only if the joint probability distribution of $X_1, \dots, X_n$
is equal to the product measure induced by the $n$ marginal probability
distributions.
PROOF: Let $B_k$, for $1 \le k \le n$, be any $n$ Borel sets. Suppose first that
(5.3) holds. Then
$$\begin{aligned}
P(X_1 \in B_1, \dots, X_n \in B_n)
&= \mu_{X_1,\dots,X_n}(B_1 \times \cdots \times B_n) \\
&= (\mu_{X_1} \times \cdots \times \mu_{X_n})(B_1 \times \cdots \times B_n) \\
&= \prod_{k=1}^{n} \mu_{X_k}(B_k) = \prod_{k=1}^{n} P(X_k \in B_k).
\end{aligned}$$
Hence, $X_1, \dots, X_n$ are mutually independent.
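Conversely, suppose that $X_1, \dots, X_n$ are mutually independent. Then
we have
$$\begin{aligned}
\mu_{X_1,\dots,X_n}(B_1 \times \cdots \times B_n)
&= P(X_1 \in B_1, \dots, X_n \in B_n) = \prod_{k=1}^{n} P(X_k \in B_k) \\
&= \prod_{k=1}^{n} \mu_{X_k}(B_k) = (\mu_{X_1} \times \cdots \times \mu_{X_n})(B_1 \times \cdots \times B_n).
\end{aligned}$$
Thus, $\mu_{X_1,\dots,X_n}$ agrees with $\mu_{X_1} \times \cdots \times \mu_{X_n}$ on n-dimensional Borel rectangles.
Therefore, by Exercise 4.171(b) on page 259, $\mu_{X_1,\dots,X_n} = \mu_{X_1} \times \cdots \times \mu_{X_n}$.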
Our second equivalent condition for mutual independence is, in prac-
tice, easier to verify than the one given in Theorem 5.4.
THEOREM 5.5
Suppose that $X_1, \dots, X_n$ are random variables all defined on the same
probability space $(\Omega, \mathcal{A}, P)$. Then $X_1, \dots, X_n$ are mutually independent
if and only if for all $x_1, \dots, x_n \in \mathcal{R}$,
$$F_{X_1,\dots,X_n}(x_1, \dots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n); \tag{5.4}$$
in other words, if and only if the joint probability distribution function
of $X_1, \dots, X_n$ is equal to the product of the marginal probability distribution
functions.
PROOF: Let $x_1, \dots, x_n$ be any $n$ real numbers. Suppose first that $X_1,
\dots, X_n$ are mutually independent. Then
$$F_{X_1,\dots,X_n}(x_1, \dots, x_n) = P(X_1 \le x_1, \dots, X_n \le x_n) = \prod_{k=1}^{n} P(X_k \le x_k) = \prod_{k=1}^{n} F_{X_k}(x_k).$$
Hence, (5.4) holds. Conversely, suppose that (5.4) holds. Then we have
$\mu_{X_1,\dots,X_n}((-\infty, x_1] \times \cdots \times (-\infty, x_n]) = \prod_{k=1}^{n} \mu_{X_k}((-\infty, x_k])$. Thus, by Exercise 4.175
on page 259, $\mu_{X_1,\dots,X_n} = \mu_{X_1} \times \cdots \times \mu_{X_n}$ and, consequently, on account of Theorem 5.4,
$X_1, \dots, X_n$ are mutually independent.
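Condition (5.4) is easy to test numerically for discrete examples. The sketch below uses two balanced dice and compares the two individual dice (independent) with the minimum and maximum of the dice (the pair considered in Exercise 5.41). It assumes that checking (5.4) at the integer points 1 through 6 suffices here, because the distribution functions involved are step functions that change only at those points; the helper names are ours.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two balanced dice
P = Fraction(1, 36)

def prob(event):
    """Probability of the event {w : event(w) is true}."""
    return sum((P for w in omega if event(w)), Fraction(0))

def cdf_factorizes(U, V, grid):
    """Check F_{U,V}(u, v) == F_U(u) * F_V(v) at every grid point (u, v)."""
    return all(
        prob(lambda w: U(w) <= u and V(w) <= v)
        == prob(lambda w: U(w) <= u) * prob(lambda w: V(w) <= v)
        for u, v in product(grid, repeat=2)
    )

grid = range(1, 7)
first, second = (lambda w: w[0]), (lambda w: w[1])
smaller, larger = (lambda w: min(w)), (lambda w: max(w))

independent_pair = cdf_factorizes(first, second, grid)
dependent_pair = cdf_factorizes(smaller, larger, grid)
```

The second check already fails at the point $(1, 1)$: the joint value is $1/36$, whereas the product of the marginal values is $(11/36)(1/36)$.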
We should point out that special equivalent conditions for mutual inde-
pendence exist for jointly discrete and jointly absolutely continuous random
variables. See Exercises 5.51 and 5.53 for details.
EXERCISES 5.2
5.21	Prove Proposition 5.3 on page 276.
5.22	Provide the details of verification for parts (a), (b), (d), and (e) of Exam-
ple 5.5 on page 276.
5.23  Let $(\Omega, \mathcal{A}, P)$ be a probability space and $X$ a random variable defined
thereon. Respond True or False to each of the following statements. Justify
your answer.
a)  If $\Omega$ is countable, then $X$ is a discrete random variable.
b)  If the range of $X$ is countable, then $X$ is a discrete random variable.
c)  If $X$ is a discrete random variable, then the range of $X$ is countable.
5.24  Prove that $X$ is a continuous random variable if and only if its probability
distribution function, $F_X$, is a continuous function on $\mathcal{R}$. Hint: Refer to
Exercise 4.169(b).
5.25  Let $\mu$ be a probability measure on $\mathcal{B}$. Show that there exists a probability
space and a random variable defined thereon whose probability distribution
is $\mu$. Hint: Define an appropriate random variable on $(\mathcal{R}, \mathcal{B}, \mu)$.
5.26  Refer to Example 5.5(a).
a)  Assume $X$ is a discrete random variable with pmf, $p_X$. Let $\{x_n\}_n$
be a sequence of real numbers such that $P(X \in \{x_n\}_n) = 1$ and set
$p_n = p_X(x_n)$. Prove that $\{p_n\}_n$ is a sequence of nonnegative real numbers
whose sum is one.
b)  Conversely, suppose that $\{x_n\}_n$ is a sequence of real numbers and that
$\{p_n\}_n$ is a sequence of nonnegative real numbers whose sum is one.
Define $p(x) = p_n$, if $x = x_n$ for some $n$, and zero otherwise. Prove
that there is a discrete random variable, $X$, having $p$ as its pmf. Hint:
Employ Exercise 5.25.
5.27  Refer to Example 5.5(c).
a)  Assume $X$ is an absolutely continuous random variable with pdf, $f_X$.
Show that $\int_{\mathcal{R}} f_X \, d\lambda = 1$.
b)  Conversely, suppose that $f$ is a nonnegative Borel measurable function
such that $\int_{\mathcal{R}} f \, d\lambda = 1$. Prove that there is an absolutely continuous
random variable, $X$, having $f$ as its pdf. Hint: Employ Exercise 5.25.
★5.28  Let $\psi$ be the Cantor function and define $F$ on $\mathcal{R}$ by
$$F(x) = \begin{cases} 0, & x < 0; \\ \psi(x), & 0 \le x \le 1; \\ 1, & x > 1. \end{cases}$$
a)  Show that $F$ is the probability distribution function of a random variable, $X$.
b)  Prove that the random variable, $X$, in part (a) is continuous but not
absolutely continuous.
★5.29  An absolutely continuous random variable with pdf, $f(x) = (2\pi)^{-1/2} e^{-x^2/2}$,
is said to have the standard normal distribution. Suppose that $X$ has
the standard normal distribution and let $Y = X^2$.
a)  Obtain the probability distribution function of the random variable, $Y$,
in terms of that of $X$.
b)  Show that $Y$ is absolutely continuous and determine its pdf.
c)  Obtain the probability distribution of $Y$. (This probability distribution
is called the chi-square distribution with one degree of freedom.)
★5.30  Suppose that a number is selected at random from the interval $[\alpha, \beta]$ and
let $X$ denote the number obtained.
a)  Find the probability distribution function of the random variable $X$.
b)  Show that $X$ is absolutely continuous and determine its pdf.
c)  Determine the probability distribution of $X$. (This probability distribution
is called the uniform distribution on $[\alpha, \beta]$.)
5.31  Suppose that $X$ has the uniform distribution on $[0,1]$. Let $m \in \mathcal{N}$ and
define $Y = 1 + [mX]$, where $[x]$ denotes the greatest integer in $x$. Obtain
the pmf of $Y$.
5.32  Construct an example of a random variable, $X$, that is neither discrete
nor continuous. Hint: Let $Y$ have the uniform distribution on $[-1,1]$ and
set $X = Y^+$.
5.33  Suppose that a point is selected at random from the unit disk, that is, from
the set $D = \{(x, y) : x^2 + y^2 < 1\}$. Let $R$ denote the distance from the
origin to the point obtained.
a)	Find the probability distribution function of the random variable R.
b)	Show that R is absolutely continuous and determine its pdf.
c)	Determine the probability distribution of R.
5.34	Refer to Exercise 5.32. Obtain the probability distribution function of X.
5.35	Prove Proposition 5.5 on page 279.
5.36  Refer to Example 5.7(a) on page 279. Write $K = \{x_j\}_j$, where $x_j \in \mathcal{R}^n$
for each $j$. Determine the joint probability distribution of $X_1, \dots, X_n$.
5.37  Refer to Example 5.7 on page 279.
a)  Suppose that $X$ and $Y$ are jointly discrete random variables. Show that,
individually, $X$ and $Y$ are discrete random variables and determine their
(marginal) probability mass functions in terms of the joint pmf.
b)  Suppose that $X$ and $Y$ are jointly absolutely continuous random variables.
Show that, individually, $X$ and $Y$ are absolutely continuous random
variables and determine their (marginal) probability density functions
in terms of the joint pdf.
5.38  Refer to Example 5.7 on page 279. This exercise generalizes the previous
one from $n = 2$ to general $n$.
a)  In Example 5.7(a), show that each $X_k$ must be a discrete random variable
and obtain its (marginal) pmf in terms of the joint pmf.
b)  In Example 5.7(b), show that each $X_k$ must be an absolutely continuous
random variable and obtain its (marginal) pdf in terms of the joint pdf.
5.39  Respond True or False to each of the following. Justify your answers.
a)  If $X_1, \dots, X_n$ are discrete random variables all defined on the same
probability space, then they are jointly discrete.
b)  If $X_1, \dots, X_n$ are absolutely continuous random variables all defined on
the same probability space, then they are jointly absolutely continuous.
5.40  Suppose that $X_1, \dots, X_n$ are mutually independent, absolutely continuous
random variables. Prove that they are jointly absolutely continuous.
★5.41  Suppose that two balanced dice are rolled. Let $X$ and $Y$ be, respectively,
the minimum and the maximum of the two numbers observed.
a)  Show that $X$ and $Y$ are jointly discrete.
b)  Determine the joint pmf of $X$ and $Y$.
c)  Obtain the marginal pmf of $X$; of $Y$.
5.42  Suppose that a point is selected at random from the unit square, that is,
from the set $S = \{(x, y) : 0 < x, y < 1\}$. Let $X$ and $Y$ denote, respectively,
the $x$- and $y$-coordinates of the point obtained.
a)  Show that $X$ and $Y$ are jointly absolutely continuous.
b)  Determine the joint pdf of $X$ and $Y$.
c)  Obtain the marginal pdf of $X$; of $Y$.
★5.43  Repeat the previous exercise if $S$ is replaced by the unit disk, $D$.
5.44  Let $\mu$ be a probability measure on $\mathcal{B}_n$. Show that there exists a probability
space and random variables defined thereon whose joint probability
distribution is $\mu$.
5.45	This exercise examines the relationship between mutual independence and
pairwise independence of random variables.
a)  Suppose that $X_1, \dots, X_n$ are mutually independent random variables.
Prove that they are also pairwise independent.
b)	Construct an example to show that pairwise independence does not im-
ply mutual independence.
5.46	Provide a detailed verification for all statements made in Example 5.8 on
page 281.
5.47	Supply the proof for Proposition 5.6 in the general case.
★5.48  Consider an experiment having two possible outcomes, say, success, $s$, and
failure, $f$, with respective probabilities, $p$ and $q = 1 - p$. Suppose now that
the experiment is repeated independently a finite number of times. Such
repetitions are called Bernoulli trials in honor of James Bernoulli.
a)  Construct a probability space for a sequence of $n$ Bernoulli trials.
b)  Let $X$ denote the total number of successes in $n$ Bernoulli trials. Obtain
the pmf and probability distribution of the random variable $X$. (This
probability distribution is called the binomial distribution with parameters
$n$ and $p$.)
★5.49  Refer to Exercise 5.48. Suppose that, for each $n \in \mathcal{N}$, $X_n$ has a binomial
distribution with parameters $n$ and $\lambda/n$, where $\lambda$ is a positive constant.
a)  Prove that, for each nonnegative integer $k$,
$$\lim_{n \to \infty} P(X_n = k) = e^{-\lambda} \frac{\lambda^k}{k!}. \tag{5.5}$$
b)  Let $p_k$ denote the quantity on the right-hand side of (5.5). Show that
the function defined on $\mathcal{R}$ by $p(x) = p_k$, if $x = k$ for some nonnegative
integer $k$, and zero elsewhere, is the probability mass function of a random
variable. (The probability distribution of such a random variable
is called the Poisson distribution with parameter $\lambda$.)
5.50  Consider an experiment having a finite number, $r$, of possible outcomes,
say, $o_1, \dots, o_r$, with respective probabilities, $p_1, \dots, p_r$. Suppose now that
the experiment is repeated independently a finite number of times. Such
repetitions are called multinomial trials.
a)  Construct a probability space for a sequence of $n$ multinomial trials.
b)  For each $k$, $1 \le k \le r$, let $X_k$ denote the total number of times that outcome
$o_k$ occurs in the $n$ multinomial trials. Determine the joint pmf and
the joint probability distribution of the random variables $X_1, \dots, X_r$.
(This probability distribution is called the multinomial distribution
with parameters $n$ and $p_1, \dots, p_r$.)
c)  For each $k$, $1 \le k \le r$, determine the (marginal) probability distribution
of $X_k$. Hint: Reformulate the model so that each trial has only two
possible outcomes.
5.51  Suppose that $X_1, \dots, X_n$ are jointly discrete random variables. Prove that
they are mutually independent if and only if their joint probability mass
function is equal to the product of the marginal probability mass functions;
that is, if and only if $p_{X_1,\dots,X_n}(x_1, \dots, x_n) = p_{X_1}(x_1) \cdots p_{X_n}(x_n)$ for all
$x_1, \dots, x_n \in \mathcal{R}$.
5.52	Let X and Y be the random variables defined in Exercise 5.41. Apply
Exercise 5.51 to determine whether X and Y are independent.
5.53  Suppose that $X_1, \dots, X_n$ are jointly absolutely continuous random variables.
Prove that they are mutually independent if and only if the function,
$f$, defined on $\mathcal{R}^n$ by $f(x_1, \dots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$, is a joint
probability density function for $X_1, \dots, X_n$.
5.54  Apply Exercise 5.53 to determine whether the random variables, $X$ and $Y$,
are independent, where $X$ and $Y$ are as in
a)	Exercise 5.42.
b)	Exercise 5.43.
★5.55 Let X and Y be jointly absolutely continuous random variables with joint
pdf given by
fx,y(x,y) =-----*
where 0 < p < 1.
a)	Determine the marginal pdf of X and of Y.
b)	Show that X and Y are independent if and only if p = 0.
★5.56  Suppose that $X$ and $Y$ are independent random variables.
a)  Prove that $\mu_{X+Y} = \mu_X * \mu_Y$, where $*$ denotes convolution of measures,
as defined in Exercise 4.158 on page 256.
b)  If $X$ is absolutely continuous, prove that $X + Y$ is absolutely continuous
and has pdf, $f_{X+Y}(z) = \int_{\mathcal{R}} f_X(z - y) \, d\mu_Y(y)$.
c)  If both $X$ and $Y$ are absolutely continuous, prove that $f_{X+Y} = f_X * f_Y$,
where $*$ denotes convolution of functions, as defined in Exercise 4.157
on page 256.
5.3 EXPECTATION OF RANDOM VARIABLES
In this section, we will discuss the expectation of a random variable, a
concept that is central to the theory of probability and its applications.
To motivate the formal definition of expectation, we will first provide an
interpretation of its meaning. The most common interpretation is the
long-run-average interpretation, which construes the expectation of
a random variable to be the average value of the random variable in a large
number of independent observations.
More formally, let $X$ be a random variable on a probability space
$(\Omega, \mathcal{A}, P)$, and let $\mathcal{E}(X)$ denote its expectation. For $n$ independent repetitions
of the experiment, let $X_1, \dots, X_n$ represent the $n$ values of the
random variable, $X$. The long-run-average interpretation is that for large $n$,
the average value of $X_1, \dots, X_n$ will be approximately equal to $\mathcal{E}(X)$:
$$\frac{X_1 + \cdots + X_n}{n} \approx \mathcal{E}(X), \quad \text{for large } n. \tag{5.6}$$
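The long-run-average interpretation (5.6) can be illustrated by simulation. The sketch below is our own illustration: it repeatedly rolls two balanced dice and averages the observed sums, whose expectation is computed to be 7 in Example 5.9(a). The seed, the sample size, and the helper `roll_sum` are arbitrary choices made only to keep the run reproducible.

```python
import random

rng = random.Random(0)   # fixed seed so that the run is reproducible

def roll_sum():
    """One observation of X: the sum of two independently rolled balanced dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

n = 100_000
average = sum(roll_sum() for _ in range(n)) / n
print(average)   # close to 7 when n is large
```

Increasing `n` typically brings the average closer to 7, although (5.6) is an interpretation at this point and not yet a theorem.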
We will now employ (5.6) to motivate the formal definition of expectation
for a simple random variable, that is, a random variable that takes on
only finitely many values, say, $x_1, \dots, x_m$. So, assume that the experiment
is repeated independently a large number, $n$, of times. Then, in view of the
long-run-average interpretation of expectation and the relative-frequency
interpretation of probability, we have
$$\begin{aligned}
\mathcal{E}(X) &\approx \frac{X_1 + \cdots + X_n}{n} = \frac{x_1 \cdot n(\{X = x_1\}) + \cdots + x_m \cdot n(\{X = x_m\})}{n} \\
&= \sum_{k=1}^{m} x_k \cdot \frac{n(\{X = x_k\})}{n} \approx \sum_{k=1}^{m} x_k P(X = x_k),
\end{aligned} \tag{5.7}$$
where, as usual, n(E) denotes the number of times that an event E occurs
in n repetitions of the experiment.
Because of (5.7), we see that the expectation of a simple random variable,
$X$, should be defined by
$$\mathcal{E}(X) = \sum_{k=1}^{m} x_k P(X = x_k), \tag{5.8}$$
where $x_1, \dots, x_m$ are the possible values of $X$. But the quantity on the
right-hand side of (5.8) is the abstract Lebesgue integral of the simple random
variable, $X$, over $\Omega$ with respect to $P$. Generalizing now to arbitrary
random variables, we make the following definition:
DEFINITION 5.11 Expectation of a Random Variable
Let $X$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$. Then
the expectation of $X$, denoted $\mathcal{E}(X)$, is defined by
$$\mathcal{E}(X) = \int_{\Omega} X(\omega) \, dP(\omega), \tag{5.9}$$
provided the integral on the right-hand side exists. If $X \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$,
that is, the integral on the right-hand side of (5.9) exists and is finite,
then we say that $X$ has finite expectation.
Remark: Terms used synonymously for expectation are mean, expected
value, and first moment.
EXAMPLE 5.9 Illustrates Definition 5.11
a)  Suppose that two balanced dice are rolled and let $X$ denote the sum
of the dice. Note that $X$ is a simple random variable, taking on the
values 2, 3, ..., 12. And, as we found in Example 5.5(b) on page 276,
$$P(X = k) = \begin{cases} (k-1)/36, & k = 2, 3, \dots, 7; \\ (13-k)/36, & k = 8, 9, \dots, 12. \end{cases}$$
Therefore, the expectation of $X$ equals
$$\mathcal{E}(X) = \int_{\Omega} X \, dP = \sum_{k=2}^{12} k P(X = k) = 2 \cdot \frac{1}{36} + 3 \cdot \frac{2}{36} + \cdots + 12 \cdot \frac{1}{36} = 7.$$
b)  Suppose that a point is selected at random from the unit disk, that
is, from the set $D = \{(x, y) : x^2 + y^2 < 1\}$. Let $R$ denote the distance
from the origin to the point obtained. Referring to Example 5.2(d)
on page 266, we see that an appropriate probability space for the experiment
is $(\Omega, \mathcal{A}, P)$, where $\Omega = D$, $\mathcal{A} = \{D \cap M : M \in \mathcal{M}_2\}$, and
$P(A) = \pi^{-1} \lambda_2(A)$ for $A \in \mathcal{A}$. Referring to Theorem 4.18 on page 251,
we have
$$\mathcal{E}(R) = \int_{\Omega} R \, dP = \frac{1}{\pi} \int_{D} \sqrt{x^2 + y^2} \, d\lambda_2(x, y) = \frac{1}{\pi} \iint_{D} \sqrt{x^2 + y^2} \, dx \, dy = \frac{2}{3},$$
where the last equality is easily obtained using polar coordinates.  □
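Both values in Example 5.9 can be confirmed numerically. The sketch below, with variable names of our own, evaluates the sum in part (a) exactly and approximates the integral in part (b) after the passage to polar coordinates, which reduces it to $\int_0^1 2r^2 \, dr$. The midpoint rule and the number of subintervals are assumptions made for illustration only.

```python
from fractions import Fraction

# (a) Sum of two balanced dice: E(X) = sum over k of k * P(X = k),
# with P(X = k) = (6 - |k - 7|)/36 on {2, ..., 12}.
pmf = {k: Fraction(6 - abs(k - 7), 36) for k in range(2, 13)}
expected_sum = sum(k * p for k, p in pmf.items())

# (b) Distance from the origin for a point chosen at random from the unit disk.
# In polar coordinates, (1/pi) * (integral of r * r dr dtheta) over
# 0 <= r <= 1, 0 <= theta <= 2*pi equals the integral of 2 r^2 over [0, 1].
# Approximate that integral with a midpoint rule on m subintervals.
m = 10_000
expected_distance = sum(2 * ((j + 0.5) / m) ** 2 for j in range(m)) / m
```

The first quantity is exactly 7, and the second agrees with $2/3$ to well within the accuracy of the midpoint rule.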
Since the expectation of a random variable is, by definition, its abstract
Lebesgue integral, all properties of abstract Lebesgue integration apply
immediately to expectation, for example, linearity, MCT, DCT. On the
other hand, because a probability space is a finite measure space, with
total measure equal to one, expectation has properties that are not shared
by abstract Lebesgue integrals on arbitrary measure spaces. For instance,
a bounded random variable has finite expectation and the expectation of
a constant random variable is equal to the constant.
Expectation in Terms of Probability Distributions
All of the probabilistic information about a random variable, $X$, is contained
in its probability distribution, $\mu_X$. This indicates that we should be
able to express $\mathcal{E}(X)$, the expectation of $X$, in terms of $\mu_X$. As a matter
of fact, we can do considerably more.
THEOREM 5.6
Let $X$ be a random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then, for
each Borel-measurable function, $g$, on $\mathcal{R}$, we have
$$\mathcal{E}(g(X)) = \int_{\mathcal{R}} g(x) \, d\mu_X(x), \tag{5.10}$$
in the sense that if one side exists, then so does the other and they are
equal.† In particular,
$$\mathcal{E}(X) = \int_{\mathcal{R}} x \, d\mu_X(x). \tag{5.11}$$
PROOF: We employ the bootstrapping technique. So, suppose first that
$g = \chi_B$, where $B \in \mathcal{B}$. Then
$$\begin{aligned}
\mathcal{E}(g(X)) = \mathcal{E}(\chi_{\{X \in B\}}) &= \int_{\Omega} \chi_{\{X \in B\}} \, dP = P(X \in B) \\
&= \mu_X(B) = \int_{\mathcal{R}} \chi_B(x) \, d\mu_X(x) = \int_{\mathcal{R}} g(x) \, d\mu_X(x).
\end{aligned}$$
Hence, (5.10) holds for characteristic functions.
Next suppose that $g$ is a nonnegative $\mathcal{B}$-measurable simple function,
say, $g = \sum_{k=1}^{m} b_k \chi_{B_k}$. Noting that $g(X) = \sum_{k=1}^{m} b_k \chi_{\{X \in B_k\}}$, we can apply
the linearity property of abstract Lebesgue integrals and the result of the
previous paragraph to conclude that (5.10) again holds.
Now assume that $g$ is a nonnegative Borel measurable function. Using
Proposition 4.7(a) on page 186, we select a nondecreasing sequence of
nonnegative $\mathcal{B}$-measurable simple functions, $\{s_n\}_{n=1}^{\infty}$, converging pointwise
on $\mathcal{R}$ to $g$. Then it is easy to see that $\{s_n(X)\}_{n=1}^{\infty}$ is a nondecreasing sequence
of nonnegative random variables converging pointwise on $\Omega$ to $g(X)$.
Consequently, by the MCT (applied twice) and the result of the previous
paragraph, we have
$$\mathcal{E}(g(X)) = \lim_{n \to \infty} \mathcal{E}(s_n(X)) = \lim_{n \to \infty} \int_{\mathcal{R}} s_n(x) \, d\mu_X(x) = \int_{\mathcal{R}} g(x) \, d\mu_X(x).$$
The remainder of the proof proceeds in the usual way and is left as an
exercise for the reader.
† Recall that $g(X)$ is another notation for the composition, $g \circ X$, of $g$ with $X$.
It is important to note that Theorem 5.6 provides us with two methods
for obtaining the expectation of a function of a random variable. Specifically,
suppose that $Y$ is a random variable and that $h$ is a Borel measurable
function. Applying (5.10) with $X = Y$ and $g = h$, we have the formula
$$\mathcal{E}(h(Y)) = \int_{\mathcal{R}} h(x) \, d\mu_Y(x).$$
On the other hand, by using (5.10) with $X = h(Y)$ and $g$ the identity
function, we get the formula
$$\mathcal{E}(h(Y)) = \int_{\mathcal{R}} x \, d\mu_{h(Y)}(x).$$
Generally speaking, the first formula is easier to use because it avoids
having to determine the probability distribution of $h(Y)$. However, there
are cases when the second formula is more efficient.
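The two formulas can be compared on a small discrete example. The choice below is ours and purely illustrative: $Y$ is taken to be uniform on $\{-2, -1, 0, 1, 2\}$ and $h(y) = y^2$. For a discrete distribution, each integral reduces to a sum against the corresponding pmf.

```python
from fractions import Fraction
from collections import defaultdict

# Assumed example: Y is uniform on {-2, -1, 0, 1, 2} and h(y) = y**2.
pmf_Y = {y: Fraction(1, 5) for y in range(-2, 3)}
h = lambda y: y * y

# First formula: integrate h against the distribution of Y.
first = sum(h(y) * p for y, p in pmf_Y.items())

# Second formula: determine the distribution of h(Y), then integrate
# the identity function against it.
pmf_hY = defaultdict(Fraction)
for y, p in pmf_Y.items():
    pmf_hY[h(y)] += p
second = sum(x * p for x, p in pmf_hY.items())
```

The second route requires the extra step of collecting the mass that $h$ sends to each value, which is the work the first formula avoids.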
We should also point out that Theorem 5.6 implies that the expectation
of a function of a random variable depends only on the probability
distribution of the random variable. In other words, if $X$ and $Y$ have
the same probability distribution, then $\mathcal{E}(g(X)) = \mathcal{E}(g(Y))$ for all Borel
measurable functions, $g$.
EXAMPLE 5.10 Illustrates Theorem 5.6
a)	Refer to Example 5.5(a) on page 276. Suppose $X$ is a discrete random
variable with probability mass function, $p_X$. Let $\{x_n\}_n$ be such that
$P(X \in \{x_n\}_n) = 1$ and set $p_n = P(X = x_n)$. Then $\mu_X = \sum_n p_n \delta_{x_n}$
and, hence, by (5.10),
$$\mathcal{E}(g(X)) = \int_{\mathcal{R}} g(x)\,d\mu_X(x) = \sum_n g(x_n)p_n = \sum_x g(x)p_X(x),$$
for each Borel-measurable function, $g$.
b)	Refer to Example 5.5(c) on page 276. Suppose that $X$ is an absolutely
continuous random variable with probability density function, $f_X$. Then
$\mu_X(B) = \int_B f_X\,d\lambda$ and, hence, by (5.10),
$$\mathcal{E}(g(X)) = \int_{\mathcal{R}} g(x)\,d\mu_X(x) = \int_{\mathcal{R}} g(x)f_X(x)\,d\lambda(x),$$
for each Borel-measurable function, $g$.
5.3 Expectation of Random Variables □ 293
c)	Let $X$ be a random variable on $(\Omega,\mathcal{A},P)$ and $n \in \mathcal{N}$. If $X^n \in \mathcal{L}^1(P)$,
then we say that $X$ has a finite nth moment and we define the
nth moment of $X$ to be $\mathcal{E}(X^n)$. By Theorem 5.6, specifically, (5.10),
we have $\mathcal{E}(X^n) = \int_{\mathcal{R}} x^n\,d\mu_X(x)$. It can be shown that if $X$ has a finite
nth moment, then it has a finite moment of each order less than $n$. □
Next we will discuss a generalization of Theorem 5.6 to random vec-
tors. The proof of this generalization is essentially identical to that of
Theorem 5.6 and is left as an exercise for the reader.
THEOREM 5.7
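Let $X_1, \dots, X_n$ be random variables all defined on the same probability
space, $(\Omega,\mathcal{A},P)$. Then, for each $\mathcal{B}_n$-measurable function, $g$, on $\mathcal{R}^n$,
$$\mathcal{E}\bigl(g(X_1,\dots,X_n)\bigr) = \int_{\mathcal{R}^n} g(x_1,\dots,x_n)\,d\mu_{X_1,\dots,X_n}(x_1,\dots,x_n),$$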
in the sense that if one side exists, then so does the other and the two sides
are equal.
We will apply Theorem 5.7 to obtain an important result concerning
the expectation of the product of random variables. By the linearity prop-
erty of the abstract Lebesgue integral, we know that the expectation of
the sum of two random variables equals the sum of their expectations. Al-
though, in general, the expectation of the product of two random variables
does not equal the product of their expectations, we do have the following
result.
PROPOSITION 5.7
Suppose that $X$ and $Y$ are independent random variables having finite
expectations. Then $XY$ has finite expectation and $\mathcal{E}(XY) = \mathcal{E}(X)\mathcal{E}(Y)$.
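PROOF: First note that, on account of Theorem 5.6, we have
$$\int_{\mathcal{R}}\int_{\mathcal{R}} |xy|\,d\mu_Y(y)\,d\mu_X(x) = \int_{\mathcal{R}} |x|\,d\mu_X(x) \int_{\mathcal{R}} |y|\,d\mu_Y(y) = \mathcal{E}(|X|)\,\mathcal{E}(|Y|) < \infty.$$
Since $X$ and $Y$ are independent, Theorem 5.4 on page 282 implies that
$\mu_{X,Y} = \mu_X \times \mu_Y$. Therefore, applying Theorem 5.7, Fubini’s theorem, and
Theorem 5.6, we get
$$\begin{aligned}
\mathcal{E}(XY) &= \int_{\mathcal{R}^2} xy\,d\mu_{X,Y}(x,y) = \int_{\mathcal{R}} \left[ \int_{\mathcal{R}} xy\,d\mu_Y(y) \right] d\mu_X(x) \\
&= \int_{\mathcal{R}} x\,d\mu_X(x) \int_{\mathcal{R}} y\,d\mu_Y(y) = \mathcal{E}(X)\mathcal{E}(Y).
\end{aligned}$$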
This completes the proof.
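As a quick check of Proposition 5.7, here is a minimal sketch that assumes two fair dice (an illustrative choice, not taken from the text). It builds the product measure for the independent pair, evaluates both sides of the identity exactly, and then shows that the identity can fail in the dependent case $Y = X$.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Joint distribution of two independent fair dice: the product measure.
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def expect(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)   # independent pair (X, Y)
e_xx = expect(lambda x, y: x * x)   # dependent pair (X, X)

print(e_xy == e_x * e_y)   # product rule holds for the independent pair
print(e_xx == e_x * e_x)   # and fails when the second factor equals the first
```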
We can generalize Proposition 5.7 to n mutually independent ran-
dom variables. The proof can be accomplished either by employing the
n-dimensional version of Fubini’s theorem or by using induction, Proposi-
tion 5.6 on page 281, and Proposition 5.7.
COROLLARY 5.1
Suppose that $X_1, \dots, X_n$ are mutually independent random variables having
finite expectations. Then the random variable $\prod_{k=1}^{n} X_k$ also has finite
expectation and $\mathcal{E}\bigl(\prod_{k=1}^{n} X_k\bigr) = \prod_{k=1}^{n} \mathcal{E}(X_k)$.
Variance of a Random Variable
If $X$ is a random variable, then the expectation of $(X - \mathcal{E}(X))^2$ is of
particular importance. That quantity is called the variance of $X$.
DEFINITION 5.12 Variance of a Random Variable
Let $X$ be a random variable having finite expectation. Then we define
the variance of $X$, denoted $\operatorname{Var}(X)$, by
$$\operatorname{Var}(X) = \mathcal{E}\bigl((X - \mathcal{E}(X))^2\bigr).$$
If $\operatorname{Var}(X) < \infty$, then $X$ is said to have finite variance. The square
root of the variance of $X$ is called the standard deviation of $X$.
Note: We leave it as an exercise for the reader to prove that
$$\operatorname{Var}(X) = \mathcal{E}(X^2) - (\mathcal{E}(X))^2.$$
It is often simpler to compute the variance of a random variable by using
this latter formula. The formula also makes it clear that X has finite
variance if and only if it has a finite second moment.
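The agreement between the defining formula and the computing formula for the variance can be seen in a small exact calculation. This sketch assumes a single fair die as the random variable, which is an illustrative choice only.

```python
from fractions import Fraction

# Assumed example: X is the outcome of one fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x * x * p for x, p in pmf.items())

# Definition 5.12: expectation of the squared deviation from the mean.
var_by_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
# Computing formula: second moment minus the square of the mean.
var_by_formula = second_moment - mean ** 2

print(var_by_definition, var_by_formula)
```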
The variance of a random variable, X, is a measure of its dispersion
relative to the mean, being the expected value of the square of the distance
from $X$ to $\mathcal{E}(X)$. Thus, the smaller the variance, the less likely that $X$ will
take a value far from its mean. More precisely, we have the following fact:
PROPOSITION 5.8 Chebyshev’s Inequality
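Suppose that $X$ is a random variable defined on the probability space
$(\Omega, \mathcal{A}, P)$ and having finite variance. Then, for each $\epsilon > 0$,
$$P(|X - \mathcal{E}(X)| \ge \epsilon) \le \frac{\operatorname{Var}(X)}{\epsilon^2}. \tag{5.12}$$
PROOF: We have
$$\begin{aligned}
\operatorname{Var}(X) &= \mathcal{E}\bigl((X - \mathcal{E}(X))^2\bigr) = \int_{\Omega} (X - \mathcal{E}(X))^2\,dP \\
&\ge \int_{\{|X - \mathcal{E}(X)| \ge \epsilon\}} (X - \mathcal{E}(X))^2\,dP \\
&\ge \int_{\{|X - \mathcal{E}(X)| \ge \epsilon\}} \epsilon^2\,dP = \epsilon^2 P(|X - \mathcal{E}(X)| \ge \epsilon),
\end{aligned}$$
as required.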
Note: It is trivial that Chebyshev’s inequality also holds for random vari-
ables having finite expectation and infinite variance, but it is of little value
in that case.
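To see how the bound in Chebyshev's inequality compares with the actual tail probability, the sketch below again assumes a fair die (an illustrative choice) and tabulates both sides for a few hypothetical values of $\epsilon$. The bound is often far from sharp and can even exceed 1, in which case it carries no information.

```python
from fractions import Fraction

# Assumed example: X is the outcome of one fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

def tail_prob(eps):
    """Exact value of P(|X - E(X)| >= eps)."""
    return sum(p for x, p in pmf.items() if abs(x - mean) >= eps)

rows = []
for eps in (Fraction(1, 2), Fraction(3, 2), Fraction(5, 2), Fraction(3)):
    rows.append((eps, tail_prob(eps), var / eps ** 2))

for eps, prob, bound in rows:
    print(f"eps={eps}: P={prob}, Chebyshev bound={bound}")
```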
Although Chebyshev’s inequality is quite easy to prove, it is, nonethe-
less, indispensable as a tool in probability theory. The importance of
Chebyshev’s inequality is due to its universality—it holds for every ran-
dom variable (having finite expectation). And despite the fact that (5.12)
will usually not be sharp, it is the best that can be said in general. See
Exercise 5.73.
Variance of a Sum
Many probabilistic arguments require the analysis of the variance of a sum
of random variables. We will see this, for instance, in the next section when
we discuss laws of large numbers. To begin, it will be useful to make the
following definition.
DEFINITION 5.13 Covariance of Two Random Variables
Suppose that $X$ and $Y$ have finite variances and are defined on the
same probability space. Then the covariance of $X$ and $Y$, denoted
$\operatorname{Cov}(X, Y)$, is defined by
$$\operatorname{Cov}(X, Y) = \mathcal{E}\bigl((X - \mathcal{E}(X))(Y - \mathcal{E}(Y))\bigr).$$
The finite-variance assumption in Definition 5.13 assures the existence
of $\mathcal{E}(XY)$. This is a consequence of Cauchy’s inequality, which will be
proved in a more general setting in Section 9.2 (see Theorem 9.1).
Note that $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$. Also, it follows easily from properties
of expectation that
$$\operatorname{Cov}(X, Y) = \mathcal{E}(XY) - \mathcal{E}(X)\mathcal{E}(Y).$$
We now present a formula for the variance of the sum of a finite number
of random variables.
PROPOSITION 5.9
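Suppose that $X_1, \dots, X_n$ have finite variances and are defined on the same
probability space. Then $X_1 + \cdots + X_n$ has finite variance and
$$\operatorname{Var}\Bigl(\sum_{k=1}^{n} X_k\Bigr) = \sum_{k=1}^{n} \operatorname{Var}(X_k) + 2\sum_{i<j} \operatorname{Cov}(X_i, X_j). \tag{5.13}$$
PROOF: We have
$$\begin{aligned}
\operatorname{Var}\Bigl(\sum_{k=1}^{n} X_k\Bigr) &= \mathcal{E}\Bigl(\Bigl(\sum_{k=1}^{n} X_k - \mathcal{E}\Bigl(\sum_{k=1}^{n} X_k\Bigr)\Bigr)^2\Bigr) = \mathcal{E}\Bigl(\Bigl(\sum_{k=1}^{n} (X_k - \mathcal{E}(X_k))\Bigr)^2\Bigr) \\
&= \mathcal{E}\Bigl(\sum_{k=1}^{n} (X_k - \mathcal{E}(X_k))^2 + 2\sum_{i<j} (X_i - \mathcal{E}(X_i))(X_j - \mathcal{E}(X_j))\Bigr) \\
&= \sum_{k=1}^{n} \mathcal{E}\bigl((X_k - \mathcal{E}(X_k))^2\bigr) + 2\sum_{i<j} \mathcal{E}\bigl((X_i - \mathcal{E}(X_i))(X_j - \mathcal{E}(X_j))\bigr) \\
&= \sum_{k=1}^{n} \operatorname{Var}(X_k) + 2\sum_{i<j} \operatorname{Cov}(X_i, X_j),
\end{aligned}$$
as required.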
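Formula (5.13) can be verified exactly in the case $n = 2$. The following sketch assumes a hypothetical pair $(X, Y)$ that is uniformly distributed on four points of the plane; the points are chosen only so that the covariance is nonzero.

```python
from fractions import Fraction

# Assumed joint distribution: (X, Y) uniform on four points.
joint = {(0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
         (1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4)}

def expect(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

def var(f):
    """Variance of f(X, Y), via the computing formula."""
    return expect(lambda x, y: f(x, y) ** 2) - expect(f) ** 2

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
var_sum = var(lambda x, y: x + y)

print(var_sum, var_x + var_y + 2 * cov_xy)
```

Dropping the covariance term here would give 3/4 instead of 5/4, so the cross term cannot be ignored unless the covariance vanishes.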
From (5.13), we see that a significant simplification will occur in the
formula for the variance of the sum of random variables if the covariances
are all zero. This leads to the following definition and corollaries:
DEFINITION 5.14 Uncorrelated Random Variables
Suppose that X and Y have finite variances and are defined on the
same probability space. Then they are said to be uncorrelated if
$\operatorname{Cov}(X, Y) = 0$. Random variables $\{X_\iota\}_{\iota \in I}$ are called uncorrelated if,
for each pair of distinct elements, $i, j \in I$, the two random variables, $X_i$
and $X_j$, are uncorrelated.
Note: It follows immediately from Proposition 5.7 on page 293, that two
independent random variables with finite variances are uncorrelated. The
converse, however, is not true. See Exercise 5.80.
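A small discrete case shows why the converse fails. This is a minimal sketch, assuming $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$ (an illustrative pair, different from the one in Exercise 5.80): the covariance is zero, yet $Y$ is a function of $X$.

```python
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1} and Y = X**2.
pmf_X = {x: Fraction(1, 3) for x in (-1, 0, 1)}
joint = {(x, x * x): p for x, p in pmf_X.items()}

def expect(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P(X=0, Y=0) = P(X=0) P(Y=0).
p_joint = joint.get((0, 0), Fraction(0))
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)

print(cov, p_joint, p_x0 * p_y0)
```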
COROLLARY 5.2
If $X_1, \dots, X_n$ are uncorrelated random variables, then
$$\operatorname{Var}\Bigl(\sum_{k=1}^{n} X_k\Bigr) = \sum_{k=1}^{n} \operatorname{Var}(X_k).$$
COROLLARY 5.3
If $X_1, \dots, X_n$ are pairwise independent random variables with finite
variances, then
$$\operatorname{Var}\Bigl(\sum_{k=1}^{n} X_k\Bigr) = \sum_{k=1}^{n} \operatorname{Var}(X_k).$$
In particular, the previous equation holds for mutually independent random
variables with finite variances.
EXERCISES 5.3
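5.57	Let $\Omega$ be a finite set, $P$ a probability measure on $\mathcal{P}(\Omega)$, and $X$ a random
variable on $(\Omega, \mathcal{P}(\Omega), P)$. Show that $\mathcal{E}(X) = \sum_{\omega \in \Omega} X(\omega)P(\{\omega\})$, so that
$\mathcal{E}(X)$ is a weighted average of the values of $X$, weighted by probabilities.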
5.58	Suppose that a balanced coin is tossed three times. If X denotes the total
number of times the coin comes up heads, determine the expectation of the
random variable X.
5.59	Suppose that a point is selected at random from the unit square, that is,
from the set $S = \{(x, y) : 0 < x, y < 1\}$. Let $U$ denote the larger of the two
coordinates of the point obtained. Compute the expectation of the random
variable $U$.
5.60	Provide a detailed verification for parts (a) and (b) of Example 5.10.
5.61	Find the first two moments for a random variable having the
a)	uniform distribution on $[\alpha, \beta]$ (refer to Exercise 5.30 on page 285).
b)	standard normal distribution (refer to Exercise 5.29 on page 285).
5.62	Let $n \in \mathcal{N}$. Construct a random variable having a finite nth moment but
no finite moment of any higher order.
5.63	Suppose X is a random variable with finite nth moment. Prove that X has
a finite mth moment for all nonnegative integers, m < n.
★5.64 Suppose that $X$ is a nonnegative random variable and that $n \in \mathcal{N}$.
a)	Prove that $\mathcal{E}(X^n) = n \int_0^{\infty} x^{n-1} P(X > x)\,dx$. Hint: Express $x^n$ as an
integral and apply Tonelli’s theorem.
b)	If, in addition, $X$ is nonnegative-integer valued, deduce from part (a)
that $\mathcal{E}(X^n) = \sum_{k=1}^{\infty} (k^n - (k-1)^n) P(X \ge k)$.
5.65	Prove Theorem 5.7 on page 293.
5.66	Show that, in general, the expectation of the product of two random vari-
ables is not equal to the product of their expectations.
5.67	Prove Corollary 5.1 on page 294.
5.68	Suppose that a point is selected at random from the unit ball, that is, from
the set $B = \{(x, y, z) : x^2 + y^2 + z^2 < 1\}$. Let $X$, $Y$, $Z$, and $R$ denote,
respectively, the x-coordinate, y-coordinate, z-coordinate, and distance to
the origin of the point obtained.
a)	Determine $\mathcal{E}(R)$ by employing Theorem 5.7.
b)	Determine $\mathcal{E}(R)$ by first finding the probability distribution of $R$ and
then applying (5.11).
★5.69 This exercise examines some basic properties of variance. Let $c \in \mathcal{R}$ and
$X$ be a random variable with finite expectation. Prove that
a)	$\operatorname{Var}(X) = \mathcal{E}(X^2) - (\mathcal{E}(X))^2$.
b)	Var(cX) = c2Var(X).
c)	Var(c + X) = Var(X).
d)	Var(X) = 0 if and only if X is constant P-ae.
5.70	Let $Y$ have the uniform distribution on $[-1,1]$ and set $X = Y^+$. Obtain
the mean and standard deviation of $X$. Refer to Exercise 5.34.
5.71	Refer to Exercise 5.48 on page 286. Let X have the binomial distribution
with parameters n and p. Determine the mean and variance of X.
5.72	Refer to Exercise 5.49 on page 287. Let $X$ have the Poisson distribution
with parameter $\lambda$. Determine the mean and variance of $X$.
5.73	Construct an example where equality holds in Chebyshev’s inequality for
some $\epsilon > 0$.
5.74	The following result, known as Markov’s inequality, is due to the Russian
mathematician Andrey Andreyevich Markov (1856-1922): Suppose $X$ is a
nonnegative random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then, for
each $\epsilon > 0$, we have $P(X \ge \epsilon) \le \mathcal{E}(X)/\epsilon$.
a)	Prove Markov’s inequality.
b)	Deduce Chebyshev’s inequality from Markov’s inequality.
5.75	Let $X$ be a random variable on $(\Omega, \mathcal{A}, P)$ and suppose that $\phi$ is a function
on $\mathcal{R}$ that is positive, increasing on $(0, \infty)$, and satisfies $\phi(-x) = \phi(x)$.
Prove that, for each $\epsilon > 0$, $P(|X| \ge \epsilon) \le \mathcal{E}(\phi(X))/\phi(\epsilon)$.
5.76	This exercise investigates some basic properties of covariance. All random
variables are assumed to have finite variance and to be defined on the same
probability space.
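a)	Show that $\operatorname{Cov}(X, Y) = \mathcal{E}(XY) - \mathcal{E}(X)\mathcal{E}(Y)$.
b)	Let $a_i$, $1 \le i \le m$, and $b_j$, $1 \le j \le n$, be sequences of real numbers.
Prove that
$$\operatorname{Cov}\Bigl(\sum_{i=1}^{m} a_i X_i, \sum_{j=1}^{n} b_j Y_j\Bigr) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j \operatorname{Cov}(X_i, Y_j).$$
This is called the bilinearity property of covariance.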
5.77	Suppose that two balanced dice are rolled. Let $X$ and $Y$ denote, respectively,
the minimum and maximum of the two numbers observed. Determine
$\operatorname{Cov}(X, Y)$. Note: Refer to Exercise 5.41 on page 286.
5.78	Obtain the covariance of the two random variables in each part that follows:
a)	Suppose that a point is selected at random from the unit square, that is,
from the set $S = \{(x, y) : 0 < x, y < 1\}$. Let $X$ and $Y$ denote, respectively,
the x- and y-coordinates of the point obtained.
b) Repeat part (a) if S is replaced by the unit disk, D. Note: Refer to
Exercise 5.43 on page 286.
5.79	Refer to Exercise 5.55 on page 288. Determine Cov(X, Y).
5.80	Let $\Theta$ be uniformly distributed on $[0, 2\pi]$. Define $X = \cos\Theta$ and $Y = \sin\Theta$.
Show that $X$ and $Y$ are uncorrelated but not independent.
5.81	Redo Exercise 5.71 using the following steps: For $1 \le k \le n$, let $X_k$ be 1
or 0 according as the kth trial results in success or failure.
a)	Obtain $\mathcal{E}(X_k)$ and $\operatorname{Var}(X_k)$.
b)	Explain why $X_1, \dots, X_n$ are independent. (No work is required here!)
c)	Explain why $X = X_1 + \cdots + X_n$.
d)	Use parts (a)-(c) to find the mean and variance of X. Compare the work
done here with that in Exercise 5.71.
5.82	Suppose $X_1, \dots, X_n$ are mutually independent random variables having
identical probability distributions (such random variables are said to be
iid, short for “independent and identically distributed”). Further suppose
those random variables have finite variance and denote by $\mu$ and $\sigma^2$ their
common mean and variance, respectively. Set
$$\bar{X}_n = (X_1 + \cdots + X_n)/n.$$
Prove that
a)	$\mathcal{E}(\bar{X}_n) = \mu$. Did you use independence here?
b)	$\operatorname{Var}(\bar{X}_n) = \sigma^2/n$.
c)	$\mathcal{E}\bigl(\sum_{k=1}^{n} (X_k - \bar{X}_n)^2\bigr) = (n-1)\sigma^2$.
5.83	For a finite numerical population, $\{a_1, \dots, a_N\}$, the population mean, $\mu$,
and population standard deviation, $\sigma$, are defined by
$$\mu = \frac{1}{N} \sum_{i=1}^{N} a_i \quad\text{and}\quad \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (a_i - \mu)^2}.$$
Suppose that $n$ members of the population are selected at random, where
we assume that $n \le N$ if the sampling is done without replacement. Denote
by $X_k$, $1 \le k \le n$, the value of the kth member obtained. Set
$$\bar{X}_n = (X_1 + \cdots + X_n)/n.$$
Prove that
a)	$\mathcal{E}(X_k) = \mu$ and $\operatorname{Var}(X_k) = \sigma^2$, $1 \le k \le n$. Hint: The value of $X_k$ is
equally likely to be any of the $N$ population values.
b)	$\mathcal{E}(\bar{X}_n) = \mu$.
c)	$\operatorname{Var}(\bar{X}_n) = \sigma^2/n$, if the sampling is with replacement.
d)	$\operatorname{Var}(\bar{X}_n) = \frac{N-n}{N-1} \cdot \sigma^2/n$, if the sampling is without replacement.
★5.84 Let $(\Omega, \mathcal{A}, P)$ be a probability space and $E \in \mathcal{A}$ with $P(E) > 0$. For
$A \in \mathcal{A}$, define $P_E(A) = P(A \mid E)$. By Proposition 5.2 (see page 268), $P_E$ is
a probability measure on $\mathcal{A}$. Hence, we can define expectation with respect
to $P_E$. This is called the conditional expectation relative to $E$. Thus, by
definition, the conditional expectation relative to $E$ of a random variable $X$,
denoted by $\mathcal{E}(X \mid E)$ or $\mathcal{E}_E(X)$, is
$$\mathcal{E}(X \mid E) = \mathcal{E}_E(X) = \int_{\Omega} X(\omega)\,dP_E(\omega),$$
provided the right-hand side exists.
a)	Prove that $\mathcal{E}_E(X) = \int_E X(\omega)\,dP(\omega)/P(E)$.
b)	Use part (a) to interpret conditional expectation.
c)	Suppose that $\{E_n\}_n$ are pairwise mutually exclusive events having positive
probability and satisfying $\Omega = \bigcup_n E_n$. Further suppose that $X$ is
5.4 The Law of Large Numbers □ 301
a random variable having finite expectation. Prove that $\mathcal{E}_{E_n}(X)$ exists
and is finite for each $n$ and that
$$\mathcal{E}(X) = \sum_n \mathcal{E}(X \mid E_n)P(E_n) = \sum_n \mathcal{E}_{E_n}(X)P(E_n).$$
This is called the law of total expectation.
d)	Interpret the previous equation in words.
e)	Compare the law of total expectation with the law of total probability
(Exercise 5.13(b) on page 273). Precisely how are they related?
5.4 THE LAW OF LARGE NUMBERS
At the beginning of Section 5.3, we introduced the long-run-average inter-
pretation of expectation, (5.6) on page 288, in order to motivate the formal
definition of the expectation of a random variable. Now that we have made
that formal definition and established some basic properties of expectation,
it is natural to ask whether we can prove a mathematically precise version
of (5.6) as a theorem.
So, let $X$ be a random variable associated with some random experiment.
Suppose that the experiment is repeated independently and let $X_k$
represent the value of $X$ on the kth trial. More precisely, we assume that
$X_1, X_2, \dots$ are mutually independent random variables, all having the
same probability distribution as $X$.† Then we want to prove that, in some
sense,
$$\frac{X_1 + \cdots + X_n}{n} \to \mathcal{E}(X) \tag{5.14}$$
as $n \to \infty$. The question now is, in what sense do we take the convergence
in (5.14)? Naively, we might want the convergence to be pointwise. But
that is too much to expect, as the following example shows.
† The existence of such random variables is a consequence of the Kolmogorov
extension theorem. See, for example, Robert B. Ash’s Real Analysis and Probability
(Cambridge, MA: Academic Press, 1972).
EXAMPLE 5.11 Illustrates (5.14)
Consider the experiment of tossing a balanced coin once. Let $X = 1$ or 0
according to whether the coin comes up a head or a tail. As the coin is
balanced, $\mathcal{E}(X) = 0 \cdot (1/2) + 1 \cdot (1/2) = 1/2$. Here, repeating the experiment
independently means tossing the coin over and over again. Also, $X_k = 1$
or 0 according to whether the kth toss is a head or a tail. In this context,
(5.14) becomes
$$\frac{X_1 + \cdots + X_n}{n} \to \frac{1}{2} \tag{5.15}$$
as $n \to \infty$, which says simply that, in the long run, the coin comes up heads
half of the time. However, it is clear that (5.15) does not hold pointwise,
that is, for every possible infinite sequence of heads and tails. For instance,
if every toss comes up heads, then the limit is one, while if every toss comes
up tails, then the limit is zero.	□
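The contrast between a typical sequence of tosses and the exceptional all-heads or all-tails sequences can be sketched numerically. The code below is an illustration only: the seed, the number of tosses, and the helper `running_average` are arbitrary choices made here, and a finite simulation can suggest, but not prove, the limiting behavior.

```python
import random

def running_average(tosses):
    """Return the averages (x_1 + ... + x_n)/n for n = 1, ..., len(tosses)."""
    averages, total = [], 0
    for n, x in enumerate(tosses, start=1):
        total += x
        averages.append(total / n)
    return averages

rng = random.Random(0)      # fixed seed, chosen arbitrarily
n_tosses = 100_000          # arbitrary sample size

random_tosses = [rng.randint(0, 1) for _ in range(n_tosses)]
all_heads = [1] * n_tosses  # one of the exceptional sequences
all_tails = [0] * n_tosses  # another exceptional sequence

avg_random = running_average(random_tosses)
avg_heads = running_average(all_heads)
avg_tails = running_average(all_tails)

for n in (10, 1_000, 100_000):
    print(n, avg_random[n - 1], avg_heads[n - 1], avg_tails[n - 1])
```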
Example 5.11 shows that it is unreasonable to expect the convergence
in (5.14) to be pointwise. As a next best choice, we might try to prove
almost-everywhere convergence, and that is exactly what we will do.
Before proceeding, we recall from Section 4.1 that, in probability theory, the
terms, almost surely, with probability one, and almost certainly, are used
synonymously for “almost everywhere.”
Preliminaries
Several preliminary results will be needed in order to prove (5.14). We
begin with the following three lemmas. The proofs of the first two are left
to the reader as exercises.
LEMMA 5.1
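Let $\{c_{mn}\}_{m,n=1}^{\infty}$ be a double sequence of real numbers such that
•	for each $n \in \mathcal{N}$, $c_{mn} \to 0$ as $m \to \infty$, and
•	the sequence $\bigl\{\sum_{n=1}^{\infty} |c_{mn}|\bigr\}_{m=1}^{\infty}$ is bounded.
For a bounded sequence $\{y_n\}_{n=1}^{\infty}$ of real numbers, let $z_m = \sum_{n=1}^{\infty} c_{mn} y_n$
for $m \in \mathcal{N}$. Then
a)	$y_n \to 0$, as $n \to \infty$, $\Rightarrow$ $z_m \to 0$, as $m \to \infty$.
b)	$\sum_{n=1}^{\infty} c_{mn} \to 1$, as $m \to \infty$, and $y_n \to y$, as $n \to \infty$ $\Rightarrow$ $z_m \to y$,
as $m \to \infty$.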
LEMMA 5.2 Toeplitz’s Lemma
Suppose that $\sum_{n=1}^{\infty} a_n$ is a divergent series of positive real numbers and
that $\{s_n\}_{n=1}^{\infty}$ is a convergent sequence of real numbers. Then
$$\lim_{n\to\infty} \frac{\sum_{k=1}^{n} a_k s_k}{\sum_{k=1}^{n} a_k} = \lim_{n\to\infty} s_n.$$
LEMMA 5.3 Kronecker’s Lemma
Suppose that $\{b_n\}_{n=1}^{\infty}$ is an increasing sequence of positive real numbers
such that $\lim_{n\to\infty} b_n = \infty$. Further suppose that $\sum_{n=1}^{\infty} x_n$ is a convergent
series of real numbers. Then,
$$\lim_{n\to\infty} \frac{1}{b_n} \sum_{k=1}^{n} b_k x_k = 0.$$
PROOF: Define $x = \sum_{n=1}^{\infty} x_n$, $s_0 = 0$ and, for $n \in \mathcal{N}$, $s_n = \sum_{k=1}^{n} x_k$.
Also define $b_0 = 0$ and, for $n \in \mathcal{N}$, $a_n = b_n - b_{n-1}$. Using summation by
parts (see Exercise 5.87), we get that
$$\sum_{k=1}^{n} b_k x_k = b_n s_n - \sum_{k=1}^{n} s_{k-1} a_k. \tag{5.16}$$
Note that $b_n = \sum_{k=1}^{n} a_k$ and that $s_n \to x$ as $n \to \infty$. Hence, by (5.16) and
Toeplitz’s lemma,
$$\lim_{n\to\infty} \frac{1}{b_n} \sum_{k=1}^{n} b_k x_k = \lim_{n\to\infty} \left( s_n - \frac{\sum_{k=1}^{n} a_k s_{k-1}}{\sum_{k=1}^{n} a_k} \right) = x - x = 0,$$
as required.
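The summation-by-parts identity and the conclusion of Kronecker's lemma can both be checked on a concrete case. The sketch below assumes $b_n = n$ and $x_n = 1/n^2$ (illustrative choices, not from the text), verifies the identity exactly for $n = 50$, and then evaluates the Kronecker averages for a few values of $n$.

```python
from fractions import Fraction

def b(k):
    """Assumed increasing sequence b_k = k, with b_0 = 0."""
    return Fraction(k)

def x(k):
    """Assumed terms x_k = 1/k**2 of a convergent series."""
    return Fraction(1, k * k)

def a(k):
    """Increments a_k = b_k - b_{k-1}."""
    return b(k) - b(k - 1)

def kronecker_average(n):
    """Return (1/b_n) * sum_{k=1}^n b_k x_k."""
    return sum(b(k) * x(k) for k in range(1, n + 1)) / b(n)

# Exact check of the summation-by-parts identity for n = 50.
n = 50
s = [Fraction(0)]                    # s[0] = 0, s[k] = x_1 + ... + x_k
for k in range(1, n + 1):
    s.append(s[-1] + x(k))
lhs = sum(b(k) * x(k) for k in range(1, n + 1))
rhs = b(n) * s[n] - sum(s[k - 1] * a(k) for k in range(1, n + 1))

averages = [float(kronecker_average(m)) for m in (10, 100, 1000)]
print(lhs == rhs, averages)
```

With these choices the average equals $(1 + 1/2 + \cdots + 1/n)/n$, which tends to zero only slowly, so the printed values decrease but do not become small quickly.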
Next we will prove two propositions, both due to Kolmogorov.
PROPOSITION 5.10 Kolmogorov’s Inequality
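Let $X_1, \dots, X_n$ be mutually independent random variables, each with finite
variance. Set $S_j = X_1 + \cdots + X_j$, $1 \le j \le n$. Then, for each $\epsilon > 0$,
$$P\Bigl(\max_{1 \le j \le n} |S_j - \mathcal{E}(S_j)| \ge \epsilon\Bigr) \le \frac{\operatorname{Var}(S_n)}{\epsilon^2}.$$
PROOF: Without loss of generality, we can assume that $\mathcal{E}(X_j) = 0$ for
$j = 1, \dots, n$ (why?). Let $A = \{\max_{1 \le j \le n} |S_j| \ge \epsilon\}$ and, for $1 \le k \le n$,
$$A_k = \{|S_j| < \epsilon,\ j = 1, \dots, k-1;\ |S_k| \ge \epsilon\}.$$
Note that the $A_k$s are mutually exclusive and that $A = \bigcup_{k=1}^{n} A_k$. Now,
$$\operatorname{Var}(S_n) = \mathcal{E}(S_n^2) = \int_{\Omega} S_n^2\,dP \ge \int_{A} S_n^2\,dP = \sum_{k=1}^{n} \int_{A_k} S_n^2\,dP. \tag{5.17}$$
Let $Y_k = X_{k+1} + \cdots + X_n$. Then $S_n = S_k + Y_k$ and, hence,
$$\int_{A_k} S_n^2\,dP = \int_{A_k} S_k^2\,dP + 2\int_{A_k} S_k Y_k\,dP + \int_{A_k} Y_k^2\,dP. \tag{5.18}$$
Because $\chi_{A_k} S_k$ is a $\mathcal{B}_k$-measurable function of $X_1, \dots, X_k$, and $Y_k$ is
a $\mathcal{B}_{n-k}$-measurable function of $X_{k+1}, \dots, X_n$, we deduce from Proposition 5.6
on page 281 that $\chi_{A_k} S_k$ and $Y_k$ are independent random variables.
Thus, by Proposition 5.7 on page 293, and the fact that $\mathcal{E}(Y_k) = 0$,
$$\int_{A_k} S_k Y_k\,dP = \int_{\Omega} \chi_{A_k} S_k Y_k\,dP = \mathcal{E}(\chi_{A_k} S_k \cdot Y_k) = \mathcal{E}(\chi_{A_k} S_k)\mathcal{E}(Y_k) = 0.$$
This last equation and (5.18) imply $\int_{A_k} S_n^2\,dP \ge \int_{A_k} S_k^2\,dP \ge \epsilon^2 P(A_k)$.
Consequently, by (5.17),
$$\operatorname{Var}(S_n) \ge \epsilon^2 \sum_{k=1}^{n} P(A_k) = \epsilon^2 P(A).$$
This completes the proof of the proposition.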
Note: If n = 1, Kolmogorov’s inequality reduces to Chebyshev’s inequality.
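Kolmogorov's inequality controls the whole path of partial sums with the same bound that Chebyshev's inequality gives for the final sum alone. The following sketch assumes $n = 10$ independent steps taking the values $\pm 1$ with probability 1/2 each and $\epsilon = 4$ (illustrative choices), and computes the relevant probabilities exactly by enumerating all $2^{10}$ paths.

```python
from fractions import Fraction
from itertools import accumulate, product

n, eps = 10, 4   # assumed number of steps and threshold

# All 2**n equally likely sign sequences; each X_j has mean 0 and variance 1.
paths = list(product((-1, 1), repeat=n))

# Event for Kolmogorov's inequality: some partial sum reaches eps in absolute value.
hits_max = sum(1 for p in paths if max(abs(s) for s in accumulate(p)) >= eps)
# Event for Chebyshev's inequality: the final sum reaches eps in absolute value.
hits_end = sum(1 for p in paths if abs(sum(p)) >= eps)

prob_max = Fraction(hits_max, len(paths))
prob_end = Fraction(hits_end, len(paths))
bound = Fraction(n, eps ** 2)   # Var(S_n) / eps**2, with Var(S_n) = n

print(prob_end, prob_max, bound)
```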
PROPOSITION 5.11
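Suppose that $X_1, X_2, \dots$ are mutually independent random variables and
that $\sum_{n=1}^{\infty} \operatorname{Var}(X_n) < \infty$. Then $\sum_{n=1}^{\infty} (X_n - \mathcal{E}(X_n))$ converges P-ae.
PROOF: We can assume without loss of generality that $\mathcal{E}(X_n) = 0$ for all
$n \in \mathcal{N}$. Set $S_n = \sum_{k=1}^{n} X_k$. We want to prove that, with probability one,
$\{S_n\}_{n=1}^{\infty}$ converges. First we will show that, for each $\epsilon > 0$,
$$\lim_{m\to\infty} P\Bigl(\bigcup_{k=1}^{\infty} \{|S_{m+k} - S_m| \ge \epsilon\}\Bigr) = 0. \tag{5.19}$$
By Proposition 5.1(g) on page 267, we have, for each $m \in \mathcal{N}$,
$$\begin{aligned}
P\Bigl(\bigcup_{k=1}^{\infty} \{|S_{m+k} - S_m| \ge \epsilon\}\Bigr) &= \lim_{n\to\infty} P\Bigl(\bigcup_{k=1}^{n} \{|S_{m+k} - S_m| \ge \epsilon\}\Bigr) \\
&= \lim_{n\to\infty} P\Bigl(\max_{1 \le k \le n} |S_{m+k} - S_m| \ge \epsilon\Bigr).
\end{aligned} \tag{5.20}$$
For $1 \le k \le n$, let $Y_k = X_{m+k}$ and observe that
$$\sum_{j=1}^{k} Y_j = \sum_{j=m+1}^{m+k} X_j = S_{m+k} - S_m.$$
As $X_1, X_2, \dots$ are mutually independent, so are $Y_1, \dots, Y_n$. Applying
Kolmogorov’s inequality to the $Y_k$s, we get
$$P\Bigl(\max_{1 \le k \le n} |S_{m+k} - S_m| \ge \epsilon\Bigr) \le \frac{\operatorname{Var}(S_{m+n} - S_m)}{\epsilon^2} = \frac{1}{\epsilon^2} \sum_{k=m+1}^{m+n} \operatorname{Var}(X_k).$$
The previous relation and (5.20) imply that, for each $m \in \mathcal{N}$,
$$P\Bigl(\bigcup_{k=1}^{\infty} \{|S_{m+k} - S_m| \ge \epsilon\}\Bigr) \le \frac{1}{\epsilon^2} \sum_{k=m+1}^{\infty} \operatorname{Var}(X_k).$$
Letting $m \to \infty$ in this last relation and using $\sum_{k=1}^{\infty} \operatorname{Var}(X_k) < \infty$, we see
that (5.19) holds.
Now we can show that $\{S_n\}_{n=1}^{\infty}$ converges with probability one. Let
$E = \{\omega : \{S_n(\omega)\}_{n=1}^{\infty} \text{ does not converge}\}$. Then $\omega \in E$ if and only if
$\{S_n(\omega)\}_{n=1}^{\infty}$ is not a Cauchy sequence, which means there exists an $r \in \mathcal{N}$
such that, for each $n \in \mathcal{N}$, there is a $k \in \mathcal{N}$ with $|S_{n+k}(\omega) - S_n(\omega)| \ge r^{-1}$.
In other words,
$$E = \bigcup_{r=1}^{\infty} \Bigl(\bigcap_{n=1}^{\infty} \bigcup_{k=1}^{\infty} \bigl\{|S_{n+k} - S_n| \ge \tfrac{1}{r}\bigr\}\Bigr). \tag{5.21}$$
But, for each $r \in \mathcal{N}$, we have, by Proposition 5.1(f) and (5.19), that
$$\begin{aligned}
P\Bigl(\bigcap_{n=1}^{\infty} \bigcup_{k=1}^{\infty} \bigl\{|S_{n+k} - S_n| \ge \tfrac{1}{r}\bigr\}\Bigr) &= \lim_{m\to\infty} P\Bigl(\bigcap_{n=1}^{m} \bigcup_{k=1}^{\infty} \bigl\{|S_{n+k} - S_n| \ge \tfrac{1}{r}\bigr\}\Bigr) \\
&\le \lim_{m\to\infty} P\Bigl(\bigcup_{k=1}^{\infty} \bigl\{|S_{m+k} - S_m| \ge \tfrac{1}{r}\bigr\}\Bigr) = 0.
\end{aligned}$$
This last fact and (5.21) imply that $P(E) = 0$.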
The Strong Law of Large Numbers
Before proving our next theorem, we will introduce some additional terminology.
We say that the random variables, $X_1, X_2, \dots$, obey the strong
law of large numbers if there exists a sequence, $\{a_n\}_{n=1}^{\infty}$, of real numbers
and a sequence, $\{b_n\}_{n=1}^{\infty}$, of positive real numbers tending to infinity such
that, with probability one,
$$\lim_{n\to\infty} \frac{X_1 + \cdots + X_n - a_n}{b_n} = 0. \tag{5.22}$$
If the convergence in (5.22) is in probability (i.e., in P-measure), then we
say that Xi, X2, ... obey the weak law of large numbers. Because a
probability space is a finite measure space, Proposition 4.11 on page 204
implies that if a sequence of random variables obeys the strong law of large
numbers, then it obeys the weak law of large numbers.
The next result, also due to Kolmogorov, provides a sufficient condition
for a sequence of random variables to obey the strong law of large numbers.
THEOREM 5.8 Kolmogorov’s Strong Law of Large Numbers
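Let $X_1, X_2, \dots$ be mutually independent random variables with finite variances
and set $S_n = X_1 + \cdots + X_n$. Suppose that $\{b_n\}_{n=1}^{\infty}$ is an increasing
sequence of positive real numbers satisfying $\lim_{n\to\infty} b_n = \infty$ and
$$\sum_{n=1}^{\infty} \frac{\operatorname{Var}(X_n)}{b_n^2} < \infty.$$
Then, with probability one,
$$\lim_{n\to\infty} \frac{S_n - \mathcal{E}(S_n)}{b_n} = 0.$$
In other words, $X_1, X_2, \dots$ obey the strong law of large numbers with
$a_n = \mathcal{E}(S_n)$.
PROOF: For $n \in \mathcal{N}$, let $Y_n = (X_n - \mathcal{E}(X_n))/b_n$ and note that $\mathcal{E}(Y_n) = 0$.
In view of Exercise 5.69 on page 298,
$$\sum_{n=1}^{\infty} \operatorname{Var}(Y_n) = \sum_{n=1}^{\infty} \operatorname{Var}\Bigl(\frac{X_n - \mathcal{E}(X_n)}{b_n}\Bigr) = \sum_{n=1}^{\infty} \frac{\operatorname{Var}(X_n)}{b_n^2} < \infty.$$
Therefore, by Proposition 5.11, $\sum_{n=1}^{\infty} Y_n$ converges with probability one.
But, we have
$$\frac{S_n - \mathcal{E}(S_n)}{b_n} = \frac{1}{b_n} \sum_{k=1}^{n} (X_k - \mathcal{E}(X_k)) = \frac{1}{b_n} \sum_{k=1}^{n} b_k Y_k.$$
The required result now follows from Kronecker’s lemma.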
An immediate corollary of Kolmogorov’s strong law of large numbers
is the following result. Its proof is left to the reader as Exercise 5.88.
COROLLARY 5.4
Suppose that $X_1, X_2, \dots$ are mutually independent random variables with
common finite mean $\mu$ and variance $\sigma^2$. Then
$$\lim_{n\to\infty} \frac{X_1 + \cdots + X_n}{n} = \mu$$
with probability one.
In Kolmogorov’s strong law of large numbers and the foregoing corollary,
besides presuming that the random variables, $X_1, X_2, \dots$, are mutually
independent, we impose a restriction on their variances. If we assume
that $X_1, X_2, \dots$ all have the same probability distribution, then the
restriction on the variances can be eliminated. To prove this statement, we
first establish the following lemma.
LEMMA 5.4
Suppose that $X$ is a nonnegative random variable. Then
$$\sum_{n=1}^{\infty} P(X \ge n) \le \mathcal{E}(X) \le \sum_{n=0}^{\infty} P(X \ge n).$$
Thus, $X$ has finite expectation if and only if $\sum_{n=1}^{\infty} P(X \ge n) < \infty$.
PROOF: For $n \le x < n + 1$, $P(X \ge n + 1) \le P(X > x) \le P(X \ge n)$.
Integrating these inequalities from $n$ to $n + 1$ and summing the results, we
conclude that
$$\sum_{n=0}^{\infty} P(X \ge n + 1) \le \sum_{n=0}^{\infty} \int_{n}^{n+1} P(X > x)\,dx \le \sum_{n=0}^{\infty} P(X \ge n).$$
By Corollary 3.3 on page 144, the sum of integrals in the previous relation equals
$\int_0^{\infty} P(X > x)\,dx$. Applying Exercise 5.64(a) on page 298, we obtain the
required result.
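The two-sided estimate in Lemma 5.4 can be checked on a random variable that is not integer valued, so that both inequalities are strict. The sketch below assumes a hypothetical $X$ taking the values 1/2, 3/2, and 13/4 with probabilities 1/2, 1/4, and 1/4.

```python
import math
from fractions import Fraction

# Assumed example: a nonnegative random variable with three values.
pmf = {Fraction(1, 2): Fraction(1, 2),
       Fraction(3, 2): Fraction(1, 4),
       Fraction(13, 4): Fraction(1, 4)}

def tail(n):
    """Exact value of P(X >= n)."""
    return sum(p for x, p in pmf.items() if x >= n)

n_max = math.floor(max(pmf))   # tail(n) = 0 for every integer n > n_max
lower = sum(tail(n) for n in range(1, n_max + 1))
upper = sum(tail(n) for n in range(0, n_max + 1))
mean = sum(x * p for x, p in pmf.items())

print(lower, mean, upper)
```

The two bounds differ by exactly $P(X \ge 0) = 1$, so the lemma locates the expectation within an interval of length one.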
We will now prove the strong law of large numbers in the case where
the random variables, $X_1, X_2, \dots$, are mutually independent and have the
same probability distributions. Such random variables are said to be iid,
short for “independent and identically distributed.” Note that there is no
assumption made about the common variance of the $X_n$s; in particular,
the common variance may be infinite.
THEOREM 5.9 Strong Law of Large Numbers (iid Case)
Suppose that $X_1, X_2, \dots$ are mutually independent and identically distributed
random variables with finite mean, $\mu$. Then
$$\lim_{n\to\infty} \frac{X_1 + \cdots + X_n}{n} = \mu \tag{5.23}$$
with probability one.
PROOF: We can assume without loss of generality that $\mu = 0$. Because
$X_1, X_2, \dots$ are identically distributed and have finite mean, Lemma 5.4
implies that
$$\sum_{n=1}^{\infty} P(|X_n| \ge n) = \sum_{n=1}^{\infty} P(|X_1| \ge n) < \infty. \tag{5.24}$$
The idea of the proof is to truncate the $X_n$s and then apply Theorem 5.8.
Let $E = \bigcap_{n=1}^{\infty} \bigl(\bigcup_{k=n}^{\infty} \{|X_k| \ge k\}\bigr)$. Then (5.24) and part (a)
of the Borel-Cantelli lemma (page 270) imply that $P(E) = 0$. Define the
sequence of random variables, $Y_1, Y_2, \dots$, by
$$Y_n = X_n \chi_{\{|X_n| < n\}} = \begin{cases} X_n, & \text{if } |X_n| < n; \\ 0, & \text{if } |X_n| \ge n, \end{cases}$$
and note that, if $\omega \in E^c$, then $Y_n(\omega) = X_n(\omega)$ for $n$ sufficiently large.
Therefore, to establish (5.23), with $\mu = 0$, it suffices to prove that, with
probability one,
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} Y_k = 0. \tag{5.25}$$
Since $X_1, X_2, \dots$ have the same probability distribution, Theorem 5.6
on page 291 implies that $\mathcal{E}(Y_n) = \mathcal{E}(X_n \chi_{\{|X_n| < n\}}) = \mathcal{E}(X_1 \chi_{\{|X_1| < n\}})$.
Hence, by Corollary 4.3 on page 199,
$$\mathcal{E}(Y_n) = \int_{\{|X_1| < n\}} X_1\,dP \to \mathcal{E}(X_1) = \mu = 0,$$
as $n \to \infty$. This implies that $n^{-1} \sum_{k=1}^{n} \mathcal{E}(Y_k) \to 0$ as $n \to \infty$. Consequently,
proving (5.25) is equivalent to proving
$$\lim_{n\to\infty} \frac{\sum_{k=1}^{n} Y_k - \sum_{k=1}^{n} \mathcal{E}(Y_k)}{n} = 0,$$
with probability one; and, to accomplish that, we will verify that the $Y_n$s
satisfy the hypotheses of Theorem 5.8 with $b_n = n$.
As $X_1, X_2, \dots$ are mutually independent, so are $Y_1, Y_2, \dots$ (why?).
Furthermore, we have
$$\begin{aligned}
\sum_{n=1}^{\infty} \frac{\operatorname{Var}(Y_n)}{n^2} &\le \sum_{n=1}^{\infty} \frac{\mathcal{E}(Y_n^2)}{n^2} = \sum_{n=1}^{\infty} \frac{1}{n^2} \sum_{m=1}^{n} \int_{\{m-1 \le |X_1| < m\}} X_1^2\,dP \\
&= \sum_{m=1}^{\infty} \int_{\{m-1 \le |X_1| < m\}} X_1^2\,dP \sum_{n=m}^{\infty} \frac{1}{n^2} \\
&\le \sum_{m=1}^{\infty} m \int_{\{m-1 \le |X_1| < m\}} |X_1|\,dP \sum_{n=m}^{\infty} \frac{1}{n^2} \\
&\le 2 \sum_{m=1}^{\infty} \int_{\{m-1 \le |X_1| < m\}} |X_1|\,dP = 2\mathcal{E}(|X_1|) < \infty,
\end{aligned}$$
where, in the previous line, we have used the fact that, for $m \in \mathcal{N}$,
$\sum_{n=m}^{\infty} n^{-2} \le 2/m$ (see Exercise 5.92).
Theorem 5.9 indicates that the intuitive notion of expectation as the
long-run-average value of a random variable in repeated, independent
observations can be formulated and proved mathematically as a consequence
of the axioms of probability. A simple corollary of that theorem shows that
this is also true for the relative-frequency interpretation of probability.†
COROLLARY 5.5 Borel’s Strong Law of Large Numbers
Suppose that $E$ is an event associated with some random experiment and
let $p$ be its probability. Denote by $n(E)$ the number of times that event $E$
occurs in $n$ independent repetitions of the experiment. Then
$$\lim_{n\to\infty} \frac{n(E)}{n} = p$$
with probability one.
PROOF: For each $n \in \mathcal{N}$, define $X_n = 1$ or 0 according to whether event $E$
occurs or does not occur on the nth repetition of the experiment. Then
$n(E) = X_1 + \cdots + X_n$ and, as the repetitions of the experiment are
independent of one another, the random variables $X_1, X_2, \dots$, are iid. Their
common mean is $0 \cdot (1 - p) + 1 \cdot p = p$. The required result now follows
from Theorem 5.9.
We have concentrated our discussion in this section on the strong law
of large numbers. As we know, if a sequence of random variables obeys the
strong law of large numbers, then it must also obey the weak law of large
numbers. Nonetheless, the weak law is important in its own right because,
for example, it can be proved under weaker conditions than the strong law.
Several versions of the weak law will be considered in the exercises.
EXERCISES 5.4
Note: In the exercises below, we will use the notation $S_n = X_1 + \cdots + X_n$.
5.85	Prove Lemma 5.1 on page 302.
5.86	Prove Toeplitz’s lemma, Lemma 5.2 on page 302.
5.87	Prove the summation by parts formula, (5.16) on page 303. Hint: Write
$b_k = \sum_{j=1}^{k} a_j$ and interchange summations.
5.88	Prove Corollary 5.4 on page 307.
5.89	Describe, in words, the difference between the weak and strong laws of
large numbers for iid random variables having finite mean. Refer to Defini-
tion 4.15 on page 203 and Exercise 4.92 on page 207.
† Actually, the following corollary is also a corollary of Corollary 5.4.
5.90	The following result is known as Cantelli’s strong law of large numbers:
Suppose that $X_1, X_2, \dots$ are mutually independent random variables with
uniformly bounded fourth moments. Then $(S_n - \mathcal{E}(S_n))/n \to 0$, as $n \to \infty$,
with probability one.
a)	Deduce Cantelli’s strong law from Kolmogorov’s strong law.
b)	Prove Cantelli’s strong law without reference to Kolmogorov’s strong
law. Hint: Employ Exercise 5.75 on page 299 with $\phi(x) = x^4$, the
Borel-Cantelli lemma (page 270), and Exercise 4.92 (page 207).
5.91	Suppose that independent trials are performed in which an event, $E$, occurs
on the kth trial with probability $p_k$. Let $n(E)$ denote the number of times
that event $E$ occurs in the first $n$ trials. Show that, as $n \to \infty$,
$$\frac{n(E)}{n} - \frac{\sum_{k=1}^{n} p_k}{n} \to 0,$$
with probability one.
5.92	Prove that for $m \in \mathcal{N}$, $\sum_{n=m}^{\infty} \frac{1}{n^2} \le 2/m$.
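5.93	Let $X_1, X_2, \dots$ be iid with finite mean, $\mu$, and $f$ a bounded continuous
function on $\mathcal{R}$.
a)	Prove that
$$\lim_{n\to\infty} \mathcal{E}\Bigl(f\Bigl(\frac{X_1 + \cdots + X_n}{n}\Bigr)\Bigr) = f(\mu).$$
b)	Deduce from part (a) that, for each $t \in [0,1]$,
$$\lim_{n\to\infty} \sum_{k=0}^{n} f\Bigl(\frac{k}{n}\Bigr) \binom{n}{k} t^k (1-t)^{n-k} = f(t).$$
Hint: Refer to Exercise 5.81.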
5.94	Each number in $[0,1]$ has a decimal expansion and, except for numbers of
the form $m/10^n$, the expansion is unique. For definiteness, we will use the
unique terminating expansion for numbers of the form $m/10^n$. Now, let
$x \in [0,1]$ have decimal expansion $.x_1x_2\ldots$; that is, $x = \sum_{n=1}^{\infty} x_n/10^n$. For
each $n \in \mathcal{N}$ and $k \in \{0, 1, \dots, 9\}$, denote by $n_k(x)$ the number of the first
$n$ decimal digits of $x$ that equal $k$. Then $x$ is said to be a normal number
if $n_k(x)/n \to 1/10$ as $n \to \infty$ for all digits $k$; in other words, if the relative
frequency of occurrence of each decimal digit in $x$ is 1/10. In this exercise,
we will prove the following result due to Borel: Except for a Borel set of
Lebesgue measure zero, every number in $[0,1]$ is normal.
a)	Let $(\Omega, \mathcal{A}, P) = ([0,1], \mathcal{B}_{[0,1]}, \lambda_{[0,1]})$. Define the functions $Y_1, Y_2, \dots$
on $\Omega$ by $Y_n(x) = x_n$, where $x_n$ is the nth decimal digit of $x$. Prove that
$Y_1, Y_2, \dots$ are random variables, that is, are $\mathcal{B}_{[0,1]}$-measurable. Hint:
Note that
$$\{Y_n = k\} = \bigcup_{k_1=0}^{9} \cdots \bigcup_{k_{n-1}=0}^{9} \{Y_1 = k_1, \dots, Y_{n-1} = k_{n-1}, Y_n = k\}$$
and show that each set in the union is an interval.
b)	Prove that the random variables, $Y_1, Y_2, \dots$, are iid.
c)	Show that, for each decimal digit $k$, $\lim_{n\to\infty} n_k(x)/n = 1/10$ $\lambda$-ae. Hint:
Let $X_j^{(k)} = \chi_{\{Y_j = k\}}$, for $j = 1, 2, \dots$.
d)	Deduce that, except for a Borel set of Lebesgue measure zero, every
number in $[0,1]$ is normal.
5.95	Repeat Exercise 5.94 for binary instead of decimal expansions. Explain how
this provides a model for the random experiment of tossing a balanced coin
indefinitely.
5.96	Prove Markov’s weak law of large numbers: Suppose that $X_1, X_2, \dots$
are random variables all defined on the same probability space and having
finite variances. Further suppose that $\operatorname{Var}(X_1 + \cdots + X_n) = o(n^2)$. Then
$\lim_{n\to\infty} (S_n - \mathcal{E}(S_n))/n = 0$, in probability. That is, $X_1, X_2, \dots$ obey the
weak law of large numbers with $a_n = \mathcal{E}(S_n)$ and $b_n = n$.
5.97	Prove Chebyshev’s weak law of large numbers: Suppose $X_1, X_2, \dots$
are uncorrelated random variables having uniformly bounded variances.
Then $\lim_{n\to\infty} (S_n - \mathcal{E}(S_n))/n = 0$, in probability.
5.98	Establish the following generalization of Chebyshev’s weak law of large numbers:
Suppose that $X_1, X_2, \dots$ are random variables all defined on the same
probability space. Further suppose that they have uniformly bounded variances
and that
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \operatorname{Cov}(X_k, X_n) = 0. \tag{5.26}$$
a)	Prove that $\lim_{n\to\infty} (S_n - \mathcal{E}(S_n))/n = 0$, in probability.
b)	Random variables, $X_1, X_2, \dots$, are said to be asymptotically uncorrelated
if $\operatorname{Cov}(X_i, X_j) \to 0$ as $|i - j| \to \infty$. Prove that asymptotically
uncorrelated random variables with uniformly bounded variances satisfy
(5.26) and, hence, the weak law of large numbers.
5.99	A standard example of a series that is convergent but not absolutely
convergent is
\[ \sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n}. \tag{5.27} \]
Suppose, instead of (5.27), we consider a similar series $\sum_{n=1}^{\infty} \pm n^{-1}$, where
the signs are chosen at random. In other words, suppose we consider the
series
\[ \sum_{n=1}^{\infty} \frac{X_n}{n}, \tag{5.28} \]
where the $X_n$s are iid, taking the values $\pm 1$ each with probability 1/2.
a)	Show that the series in (5.28) converges with probability one.
b)	What can be said about convergence of the series $\sum_{n=1}^{\infty} a_n X_n$, when
$\{a_n\}$ is a sequence of real numbers with $\sum_{n=1}^{\infty} a_n^2 < \infty$?
5.100	Let $X_1$, $X_2$, ... be a sequence of mutually independent random variables
with $P(X_n = n^b) = P(X_n = -n^b) = 1/2$, where $b$ is a positive real number.
Prove the following:
a)	If $b < 1/2$, then $S_n/n \to 0$ with probability one.
b)	If $b > 1$, then $\limsup_{n\to\infty} |S_n|/n = \infty$ with probability one.
c)	Conclude that Theorem 5.8 fails if the hypothesis $\sum_{n=1}^{\infty} \operatorname{Var}(X_n)/n^2 < \infty$
is removed.
5.101	Let $X_1$, $X_2$, ... be iid random variables, and suppose that $\mathcal{E}(X_1^+) = \infty$ and
$\mathcal{E}(X_1^-) < \infty$. Prove that $S_n/n \to \infty$ with probability one.
Johann Radon
(1887-1956)
Johann Radon, born on December 16, 1887, in
Tetschen, Bohemia, began his formal schooling
at the age of 10 at the Gymnasium in Leitmeritz,
Bohemia. Eight years later, in 1905, he
enrolled at the University of Vienna to pursue
the study of mathematics and physics. Radon
presented his doctoral dissertation in 1910 on
the calculus of variations.
Radon taught at several universities between 1910 and 1919; he spent
a semester at the University of Göttingen, a year at the University of
Brünn, and time at the Technische Hochschule of Vienna and at the
University of Vienna. In 1919 he went to the University of Hamburg for
three years, moved subsequently to Greifswald, Erlangen, and then to
Breslau in 1928 where he remained until 1945. In 1947, he was elected
to the Austrian Academy of Sciences.
The calculus of variations continued to fascinate Radon because of
its many applications to analysis, geometry, and physics. He applied it
to differential geometry to discover Radon curves. Other work included
the combination of Lebesgue's and Stieltjes's theories of integration (the
development of the Radon integral), the Dirichlet problem of the loga-
rithmic potential (application of the Radon-Nikodym theorem), and the
development of the Radon transformation technique.
Radon spent the last nine years of his life as a full professor at the
University of Vienna in Vienna, Austria, where he died on May 25, 1956.
Differentiation
Up to this point, we have been concentrating on the theory of integra-
tion. In this chapter, we will study the theory of differentiation, both in
the classical sense of derivatives of functions and in an extended sense of
derivatives of measures.
We will prove Lebesgue’s remarkable theorem that any monotone func-
tion is differentiable almost everywhere. We will also introduce the concepts
of bounded variation and absolute continuity and use them to generalize
the two fundamental theorems of calculus to Lebesgue integration.
We will also extend the notion of measure to include those that are real-
valued and complex-valued, establish decomposition and representation
theorems for measures, prove and apply the famous Radon-Nikodym theo-
rem, and generalize the classical change-of-variable formula for integration.
6.1 DERIVATIVES AND DINI-DERIVATES
In this section, we will introduce derivatives and establish the fact that
any monotone function has a (finite) derivative almost everywhere. Note:
For brevity, we will use the phrase almost everywhere instead of Lebesgue
almost everywhere and use the notation ae instead of $\lambda$-ae. To begin, we
recall the following definition from elementary calculus.
DEFINITION 6.1 Derivative of a Real-Valued Function
A real-valued function $f$ defined in some open interval about $x \in \mathcal{R}$ is
said to be differentiable at $x$ if
\[ \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} \]
exists and is finite. In that case the limit is called the derivative of $f$
at $x$ and is denoted by $f'(x)$.†
For our study of differentiation, it is useful to introduce the concept
of the Dini-derivates of a function at a point. And, in order to do that, we
need the following definition.
DEFINITION 6.2 Lower and Upper Limits
Let $g$ be a real-valued function defined in a deleted interval about the
point $x$, that is, a set of the form $(c, d) \setminus \{x\}$, where $c < x < d$. Then
we define
\[ \limsup_{y\to x+} g(y) = \inf_{\delta>0}\, \sup_{0<y-x<\delta} g(y), \]
\[ \liminf_{y\to x+} g(y) = \sup_{\delta>0}\, \inf_{0<y-x<\delta} g(y), \]
\[ \limsup_{y\to x-} g(y) = \inf_{\delta>0}\, \sup_{0<x-y<\delta} g(y), \]
\[ \liminf_{y\to x-} g(y) = \sup_{\delta>0}\, \inf_{0<x-y<\delta} g(y). \]
These extended real numbers are called, respectively, the upper right,
lower right, upper left, and lower left limits of $g$ at $x$.
† If $\lim_{h\to 0} (f(x+h) - f(x))/h = \infty$, we will write $f'(x) = \infty$ but will not say that
$f$ is differentiable at $x$ and, similarly, if the limit is $-\infty$.
We introduce these lower and upper limits for the same reason that
we introduce the limit inferior and limit superior of sequences; namely,
although $\lim_{y\to x+} g(y)$, etc., may not exist, $\limsup_{y\to x+} g(y)$, etc., always
exist (in $\mathcal{R}^*$). We leave it as an exercise for the reader to prove that the
right-hand limit of $g$ at $x$, $\lim_{y\to x+} g(y)$, exists in $\mathcal{R}^*$ if and only if the
lower and upper right limits of $g$ at $x$ are equal; in that case, we denote the
right-hand limit by $g(x+)$. An analogous result holds for left-hand limits.
EXAMPLE 6.1 Illustrates Definition 6.2
Let $g(y) = \sin(1/y)$ for $y \ne 0$. It is easy to see that for each $\delta > 0$,
\[ \sup_{0<y<\delta} g(y) = 1 \quad\text{and}\quad \inf_{0<y<\delta} g(y) = -1. \]
Consequently, we have $\limsup_{y\to 0+} g(y) = 1$ and $\liminf_{y\to 0+} g(y) = -1$.
Similarly, $\limsup_{y\to 0-} g(y) = 1$ and $\liminf_{y\to 0-} g(y) = -1$.	□
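As a numerical aside on Example 6.1 (a sketch only, not part of the text's argument): inside every interval $(0, \delta)$ there are points where $\sin(1/y)$ equals 1 and points where it equals $-1$, namely $y = 1/(\pi/2 + 2\pi k)$ and $z = 1/(3\pi/2 + 2\pi k)$ for large enough $k$. The function name `extremal_points` is hypothetical and chosen only for this illustration.

```python
import math

def extremal_points(delta):
    """Return (y, z) in (0, delta) with sin(1/y) = 1 and sin(1/z) = -1, up to rounding."""
    # Smallest integer k >= 0 such that 1 / (pi/2 + 2*pi*k) < delta.
    k = max(0, math.floor((1.0 / delta - math.pi / 2) / (2 * math.pi)) + 1)
    y = 1.0 / (math.pi / 2 + 2 * math.pi * k)      # sin(1/y) = sin(pi/2 + 2*pi*k) = 1
    z = 1.0 / (3 * math.pi / 2 + 2 * math.pi * k)  # sin(1/z) = sin(3*pi/2 + 2*pi*k) = -1
    return y, z

for delta in (1.0, 1e-3, 1e-6):
    y, z = extremal_points(delta)
    print(delta, math.sin(1 / y), math.sin(1 / z))
```

However small $\delta$ is taken, the printed values stay at (approximately) 1 and $-1$, which is the content of the two displayed identities.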
Dini-Derivates
We now define the Dini-derivates of a real-valued function.
DEFINITION 6.3 Dini-Derivates
Let $f$ be a real-valued function defined in an open interval about the
point $x$. Set
\[ D^+f(x) = \limsup_{h\to 0+} \frac{f(x+h) - f(x)}{h}, \qquad D_+f(x) = \liminf_{h\to 0+} \frac{f(x+h) - f(x)}{h}, \]
\[ D^-f(x) = \limsup_{h\to 0-} \frac{f(x+h) - f(x)}{h}, \qquad D_-f(x) = \liminf_{h\to 0-} \frac{f(x+h) - f(x)}{h}. \]
These four extended real numbers are called the Dini-derivates of $f$
at $x$. They are, respectively, the upper right, lower right, upper
left, and lower left derivates.
It follows that $f$ is differentiable at $x$ if and only if all four of the
Dini-derivates are equal and finite. It also follows that
\[ f'_+(x) = \lim_{h\to 0+} \frac{f(x+h) - f(x)}{h} \]
exists in $\mathcal{R}^*$ if and only if $D^+f(x) = D_+f(x)$; and similarly for $f'_-(x)$.
EXAMPLE 6.2 Illustrates Definition 6.3
Let $f = \chi_Q$ and $x \in Q$. Then,
\[ \frac{f(x+h) - f(x)}{h} = \begin{cases} -1/h, & \text{if } h \notin Q; \\ 0, & \text{if } h \in Q. \end{cases} \tag{6.1} \]
It follows easily from (6.1) that for each $\delta > 0$,
\[ \sup_{0<h<\delta} \frac{f(x+h) - f(x)}{h} = 0 \quad\text{and}\quad \inf_{0<h<\delta} \frac{f(x+h) - f(x)}{h} = -\infty. \]
Therefore, $D^+f(x) = 0$ and $D_+f(x) = -\infty$. Similarly, we find that for
each $\delta > 0$,
\[ \sup_{-\delta<h<0} \frac{f(x+h) - f(x)}{h} = \infty \quad\text{and}\quad \inf_{-\delta<h<0} \frac{f(x+h) - f(x)}{h} = 0, \]
so that $D^-f(x) = \infty$ and $D_-f(x) = 0$.	□
An Everywhere-Continuous, Nowhere-Differentiable Function
It is an elementary fact proved in calculus that if f is differentiable at a
point x, then it is continuous at x. The converse of this fact fails. For
example, $f(x) = |x|$ is continuous but not differentiable at $x = 0$.
We now present a more striking example, namely, a function that is
continuous at every point of $\mathcal{R}$ but differentiable at no points of $\mathcal{R}$. The
idea is to construct a function that is everywhere continuous but oscillates
so wildly as to be nowhere differentiable.
EXAMPLE 6.3 A Continuous, Nowhere-Differentiable Function
Define $\phi(x)$ on [0,1] by
\[ \phi(x) = \begin{cases} x, & 0 \le x \le \tfrac12; \\ 1 - x, & \tfrac12 < x \le 1. \end{cases} \]
Extend $\phi$ to all of $\mathcal{R}$ via $\phi(x + k) = \phi(x)$ for $k \in Z$. See Fig. 6.1.
Next define the functions $u_n$, n = 0, 1, 2, ..., by $u_n(x) = \phi(4^n x)/4^n$,
as portrayed in Fig. 6.2. Note that, for each $n$, $u_n$ is continuous on $\mathcal{R}$.
FIGURE 6.2 Graphs of the $u_n$s.
Now consider the function $f$ defined for all $x \in \mathcal{R}$ by
\[ f(x) = \sum_{n=0}^{\infty} u_n(x). \]
For $x \in \mathcal{R}$, $|u_n(x)| \le 1/(2 \cdot 4^n)$ and so the series converges uniformly on $\mathcal{R}$.
Hence, $f$ is continuous on $\mathcal{R}$. Note also that for k = 0, 1, 2, ... and $n \ge k$,
\[ u_n(x \pm 4^{-k}) = \frac{\phi(4^n(x \pm 4^{-k}))}{4^n} = \frac{\phi(4^n x \pm 4^{n-k})}{4^n} = \frac{\phi(4^n x)}{4^n} = u_n(x) \tag{6.2} \]
for all $x \in \mathcal{R}$.
To show that $f$ is nowhere differentiable we consider two cases. Reference
to Fig. 6.2 will prove helpful during the discussion.
Case 1: $x$ is not of the form $m/4^k$, for some $m \in Z$ and $k \in \mathcal{N}$.
We will find a sequence $\{h_n\}_{n=1}^{\infty}$ such that $h_n \ne 0$ for all $n \in \mathcal{N}$, $h_n \to 0$,
but $(f(x + h_n) - f(x))/h_n$ does not have a limit. This will show that $f$ is
not differentiable at $x$.
We can assume without loss of generality that $0 < x < 1$. Then
$x$ lies in exactly one of the intervals (0,1/4), (1/4,1/2), (1/2,3/4), (3/4,1).
Hence, we can choose $h_1$ so that $|h_1| = 1/4$ and that $x + h_1 \in (0,1/2)$ if
$x \in (0,1/2)$ and $x + h_1 \in (1/2,1)$ if $x \in (1/2,1)$. It then follows from (6.2)
that
\[ \frac{f(x+h_1) - f(x)}{h_1} = \frac{\phi(x+h_1) - \phi(x)}{h_1} = \pm 1. \]
Next, $x$ also lies in exactly one of the intervals (0,1/16), (1/16,1/8), ...,
(15/16,1). Hence, we can choose $h_2$ so that $|h_2| = 1/4^2$ and that
$x + h_2 \in (0,1/8)$ if $x \in (0,1/8)$, $x + h_2 \in (1/8,1/4)$ if $x \in (1/8,1/4)$, ...,
$x + h_2 \in (7/8,1)$ if $x \in (7/8,1)$. It then follows from (6.2) that
\[ \frac{f(x+h_2) - f(x)}{h_2} = \begin{cases} \pm 2, & x \in \left(\frac{k-1}{8}, \frac{k}{8}\right),\ k = 1, 3, 6, 8; \\ 0, & x \in \left(\frac{k-1}{8}, \frac{k}{8}\right),\ k = 2, 4, 5, 7. \end{cases} \]
Continuing in this manner, we obtain a sequence $\{h_n\}_{n=1}^{\infty}$ such that
$|h_n| = 1/4^n$ and
\[ \frac{f(x+h_n) - f(x)}{h_n} = \begin{cases} \text{odd integer}, & n \text{ odd}; \\ \text{even integer}, & n \text{ even}. \end{cases} \]
Thus, $h_n \to 0$, but $\lim_{n\to\infty} (f(x + h_n) - f(x))/h_n$ does not exist. Hence,
$f$ is not differentiable at $x$.
Case 2: $x$ is of the form $m/4^k$, for some $m \in Z$ and $k \in \mathcal{N}$.
Let $h_n = 4^{-n}$. If $r \ge n$ then, by (6.2), $u_r(x + h_n) = u_r(x + 4^{-n}) = u_r(x)$
and, consequently,
\[ \frac{u_r(x+h_n) - u_r(x)}{h_n} = 0, \qquad r \ge n. \tag{6.3} \]
Now, let $n \in \mathcal{N}$ with $n > k$. If $k \le r \le n - 1$, then
\[ u_r(x) = u_r\!\left(\frac{m}{4^k}\right) = \phi(m 4^{r-k})/4^r = 0. \]
Moreover, because $0 < 4^{r-n} \le 1/4 < 1/2$, we have $\phi(4^{r-n}) = 4^{r-n}$, and
therefore,
\[ u_r(x + h_n) = u_r\!\left(\frac{m}{4^k} + h_n\right) = \phi\!\left(4^r\!\left(\frac{m}{4^k} + \frac{1}{4^n}\right)\right)\Big/4^r
= \phi(m 4^{r-k} + 4^{r-n})/4^r = \phi(4^{r-n})/4^r = 4^{-n}. \]
Consequently,
\[ \frac{u_r(x+h_n) - u_r(x)}{h_n} = 1, \qquad k \le r \le n-1. \tag{6.4} \]
Next note that for all $r$, $u_r$ has a right derivative at all points and so,
in particular, at $x$. Hence, it follows that
\[ \lim_{n\to\infty} \sum_{r=0}^{k-1} \left( \frac{u_r(x+h_n) - u_r(x)}{h_n} \right) \tag{6.5} \]
exists and is finite.
Now, for convenience, let $d_r = (u_r(x + h_n) - u_r(x))/h_n$. For $n > k$,
we have
\[ \frac{f(x+h_n) - f(x)}{h_n} = \sum_{r=0}^{k-1} d_r + \sum_{r=k}^{n-1} d_r + \sum_{r=n}^{\infty} d_r. \]
By (6.4), the second term equals $n - k$ and, by (6.3), the third term equals 0.
Hence, for $n > k$,
\[ \frac{f(x+h_n) - f(x)}{h_n} = \sum_{r=0}^{k-1} \left( \frac{u_r(x+h_n) - u_r(x)}{h_n} \right) + (n - k). \]
Applying (6.5), we can now conclude that
\[ \lim_{n\to\infty} \frac{f(x+h_n) - f(x)}{h_n} = \infty. \]
In particular, $f$ is not differentiable at $x$.	□
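The computation in Case 2 can be checked with exact rational arithmetic. The following is a minimal sketch, assuming the definitions of $\phi$, $u_r$, and $h_n = 4^{-n}$ from Example 6.3; the function names `phi`, `u`, and `difference_quotient` are hypothetical. Because (6.2) gives $u_r(x + 4^{-n}) = u_r(x)$ for every $r \ge n$, summing only the terms with $r < n$ yields the exact difference quotient of $f$, not an approximation.

```python
from fractions import Fraction
from math import floor

def phi(t):
    """Distance from t to the nearest integer: the periodic sawtooth of Example 6.3."""
    frac = t - floor(t)
    return min(frac, 1 - frac)

def u(r, x):
    """u_r(x) = phi(4^r x) / 4^r."""
    return phi(4**r * x) / 4**r

def difference_quotient(x, n):
    """Exact value of (f(x + h_n) - f(x)) / h_n with h_n = 4^(-n), for rational x.

    Terms with r >= n vanish by (6.2), so only r = 0, ..., n-1 are summed.
    """
    h = Fraction(1, 4**n)
    return sum((u(r, x + h) - u(r, x)) / h for r in range(n))

# x = 3/4 has the form m / 4^k with m = 3, k = 1.
for n in range(2, 8):
    print(n, difference_quotient(Fraction(3, 4), n))
```

For $x = 3/4$ the term $r = 0$ contributes $-1$ and each term with $1 \le r \le n-1$ contributes 1, so the quotient equals $n - 2$ and grows without bound, in line with the conclusion of Case 2.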
Vitali Covers
Example 6.3 shows that continuity is by no means sufficient for differentiability.
The function $f$ constructed in that example is everywhere continuous
but nowhere differentiable. Essentially, that function is
nowhere differentiable because it “oscillates vigorously.” In Section 6.2
we will show that functions that do not oscillate vigorously (in a sense to
be made precise) are differentiable almost everywhere.
Our next goal is to prove that a monotone function is differentiable
almost everywhere, a theorem due to Lebesgue. The proof we give uses the
concept of Vitali covers. Roughly speaking, a family, V, of closed intervals
is a Vitali cover of a set E of real numbers if every point of E is in arbitrarily
small intervals of V. More precisely, we have the following definition.
DEFINITION 6.4 Vitali Cover
Let $E \subset \mathcal{R}$. A family, $\mathcal{V}$, of nondegenerate closed intervals is said to
be a Vitali cover of $E$ if for each $x \in E$ and each $\delta > 0$ there is an
$I \in \mathcal{V}$ such that $x \in I$ and $\ell(I) < \delta$.
The following theorem, called the Vitali covering theorem, uses the
concept of Lebesgue outer measure $\lambda^*$, defined in Chapter 3 on page 106.
THEOREM 6.1 Vitali Covering Theorem
Let $E \subset \mathcal{R}$ with $\lambda^*(E) < \infty$ and suppose $\mathcal{V}$ is a Vitali cover of $E$. Then
for each $\epsilon > 0$ there is a finite disjoint collection $\{I_k\}_{k=1}^{n} \subset \mathcal{V}$ such that
\[ \lambda^*\Big(E \setminus \bigcup_{k=1}^{n} I_k\Big) < \epsilon. \]
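PROOF: Because $\lambda^*(E) < \infty$, we can choose an open set, $O$, such that
$O \supset E$, and $\lambda^*(O) < \infty$. Set $\mathcal{W} = \{ I \in \mathcal{V} : I \subset O \}$. Then $\mathcal{W}$ is a Vitali
cover for $E$. (See Exercise 6.12.)
The idea of the proof is this: Starting with some $I_1 \in \mathcal{W}$, select an
$I_2 \in \mathcal{W}$ as large as possible but missing $I_1$; then select an $I_3 \in \mathcal{W}$ as large as
possible but missing $I_1 \cup I_2$; and continue the process until $\lambda^*(E \setminus \bigcup_{k=1}^{n} I_k)$
becomes small.
So, let $I_1 \in \mathcal{W}$. If $E \subset I_1$, then we are done. Otherwise, let
$\delta_1 = \sup\{ \ell(I) : I \in \mathcal{W},\ I \cap I_1 = \emptyset \}$. Because $\mathcal{W}$ is a Vitali cover of $E$ and
$E \setminus I_1 \ne \emptyset$, it follows that $\delta_1 > 0$. Also, because $I \subset O$ for all $I \in \mathcal{W}$, it
follows that $\delta_1 \le \lambda^*(O) < \infty$. Hence we can choose $I_2 \in \mathcal{W}$ with $I_2 \cap I_1 = \emptyset$
and $\ell(I_2) > \delta_1/2$. Again, if $E \subset I_1 \cup I_2$, then we are done.
We now proceed inductively. Suppose $I_1, \ldots, I_n \in \mathcal{W}$ are pairwise
disjoint. If $E \subset \bigcup_{k=1}^{n} I_k$, then we are done. Otherwise, let $\delta_n = \sup\{ \ell(I) :
I \in \mathcal{W},\ I \cap I_k = \emptyset,\ 1 \le k \le n \}$. Since $\mathcal{W}$ is a Vitali cover of $E$ and
$E \setminus \bigcup_{k=1}^{n} I_k \ne \emptyset$, it follows that $\delta_n > 0$. Also, because $I \subset O$ for all $I \in \mathcal{W}$
and $\lambda^*(O) < \infty$, it follows that $\delta_n < \infty$. Hence, there is a member of $\mathcal{W}$,
say $I_{n+1}$, such that $I_{n+1} \cap I_k = \emptyset$, $1 \le k \le n$, and $\ell(I_{n+1}) > \delta_n/2$.
If this process terminates after a finite number of steps, then we are
done. Otherwise, it yields a sequence $\{I_n\}_{n=1}^{\infty}$ of pairwise disjoint members
of $\mathcal{W}$ such that $\ell(I_{n+1}) > \delta_n/2$ and $\sum_{n=1}^{\infty} \ell(I_n) < \infty$. Because $\sum_{n=1}^{\infty} \ell(I_n) < \infty$,
there is an $N \in \mathcal{N}$ such that
\[ \sum_{n=N+1}^{\infty} \ell(I_n) < \epsilon/5. \]
Set $A = E \setminus \bigcup_{n=1}^{N} I_n$. We claim that $\lambda^*(A) < \epsilon$.
Let $x \in A$. Then $x \notin \bigcup_{n=1}^{N} I_n$ and so $d(x, \bigcup_{n=1}^{N} I_n) = \delta > 0$. Because
$x \in E$ and $\mathcal{W}$ is a Vitali cover for $E$, there is an $I \in \mathcal{W}$ with $x \in I$ and
$\ell(I) < \delta$. It follows that $I \cap I_n = \emptyset$ for n = 1, 2, ..., N.
Now, there must be an $n \in \mathcal{N}$ with $I \cap I_n \ne \emptyset$. Suppose to the contrary.
Then for each $n \in \mathcal{N}$, $I \cap I_k = \emptyset$, $1 \le k \le n$. Applying the definition of $\delta_n$,
we get that for each $n \in \mathcal{N}$, $\ell(I) \le \delta_n < 2\ell(I_{n+1})$. But this is impossible
because $\ell(I) > 0$ and $\ell(I_{n+1}) \to 0$ as $n \to \infty$. Let $m = \min\{ n : I \cap I_n \ne \emptyset \}$;
note that $m > N$. Let $y_m$ be the midpoint of $I_m$. Then,
\[ |x - y_m| \le \ell(I) + \tfrac12 \ell(I_m) \le \delta_{m-1} + \tfrac12 \ell(I_m) < 2\ell(I_m) + \tfrac12 \ell(I_m) = \tfrac52 \ell(I_m). \]
Consequently, $x \in [y_m - \tfrac52 \ell(I_m),\ y_m + \tfrac52 \ell(I_m)] = J_m$.
Hence, if $x \in A$, there is an $m > N$ such that $x \in J_m$; in other words,
$A \subset \bigcup_{m=N+1}^{\infty} J_m$. It follows that
\[ \lambda^*(A) \le \sum_{m=N+1}^{\infty} \ell(J_m) = 5 \sum_{m=N+1}^{\infty} \ell(I_m) < \epsilon. \]
This completes the proof.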
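The selection step of the proof can be illustrated on a finite family of closed intervals. This is only a sketch under a simplifying assumption: for a finite family the supremum $\delta_n$ is attained, so a longest admissible interval is chosen outright rather than one of length greater than $\delta_n/2$. The names `greedy_disjoint` and `enlarge` are hypothetical.

```python
def greedy_disjoint(intervals):
    """Repeatedly pick a longest interval that misses everything picked so far.

    `intervals` is a list of (left, right) pairs with left < right, read as
    closed intervals [left, right].
    """
    chosen = []
    remaining = list(intervals)
    while remaining:
        best = max(remaining, key=lambda iv: iv[1] - iv[0])
        chosen.append(best)
        # Keep only the intervals that are disjoint from the one just chosen.
        remaining = [iv for iv in remaining if iv[1] < best[0] or iv[0] > best[1]]
    return chosen

def enlarge(iv, factor=5):
    """Interval with the same midpoint as iv and `factor` times its length (the J_m of the proof)."""
    mid = (iv[0] + iv[1]) / 2
    half = factor * (iv[1] - iv[0]) / 2
    return (mid - half, mid + half)

family = [(0, 1), (0.5, 3), (2.5, 4), (3.5, 3.8), (5, 5.5), (5.4, 6.4)]
picked = greedy_disjoint(family)
print(picked)
print([enlarge(iv) for iv in picked])
```

Every interval that is discarded meets a chosen interval that is at least as long, so it lies inside that chosen interval's fivefold enlargement. This mirrors the covering $A \subset \bigcup_{m=N+1}^{\infty} J_m$ obtained at the end of the proof.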
Differentiability of Monotone Functions
We are just about ready to prove Lebesgue’s famous theorem on the almost-
everywhere differentiability of monotone functions. First a lemma.
LEMMA 6.1
Let $f$ be a real-valued function on $(a, b)$. Then the set of points in $(a, b)$
where $f'_+(x)$ and $f'_-(x)$ exist (possibly $\pm\infty$) but are unequal is countable
and, hence, has Lebesgue measure zero.
PROOF: We show $\{ x \in (a, b) : f'_+(x) \text{ and } f'_-(x) \text{ exist and } f'_+(x) < f'_-(x) \}$
is countable. An analogous argument shows $\{ x \in (a, b) : f'_+(x) \text{ and } f'_-(x)
\text{ exist and } f'_+(x) > f'_-(x) \}$ is also countable.
Let $E = \{ x \in (a, b) : f'_+(x) \text{ and } f'_-(x) \text{ exist and } f'_+(x) < f'_-(x) \}$. We
will set up a one-to-one correspondence between $E$ and a countable set,
thus establishing the countability of $E$.
So, let $x \in E$. Choose $r_x \in Q$ such that $f'_+(x) < r_x < f'_-(x)$. By the
definitions of $f'_+(x)$ and $f'_-(x)$, we can choose rational numbers $s_x$ and $t_x$
such that $a < t_x < x < s_x < b$ and
\[ \frac{f(y) - f(x)}{y - x} < r_x, \quad x < y < s_x, \qquad\qquad \frac{f(y) - f(x)}{y - x} > r_x, \quad t_x < y < x, \]
or, in other words,
\[ f(y) - f(x) < r_x (y - x), \qquad t_x < y < s_x,\ y \ne x. \]
Now consider the mapping $\phi: E \to Q^3$ defined by $\phi(x) = (r_x, s_x, t_x)$.
Because $Q^3$ is countable, it will follow that $E$ is countable if we can prove
that $\phi$ is one-to-one.
Assume to the contrary that there exist $x, z \in E$ with $x \ne z$ and
$\phi(x) = \phi(z)$. Then $r_z = r_x$, $s_z = s_x$, and $t_z = t_x$. So,
\[ f(y) - f(x) < r_x (y - x), \qquad t_x < y < s_x,\ y \ne x, \]
\[ f(y) - f(z) < r_x (y - z), \qquad t_x < y < s_x,\ y \ne z. \]
Since $t_x = t_z < z < s_z = s_x$ and $z \ne x$, and $t_x < x < s_x$ and $x \ne z$, we
conclude that
\[ f(z) - f(x) < r_x (z - x), \]
\[ f(x) - f(z) < r_x (x - z), \]
which is impossible, because adding the two inequalities gives $0 < 0$.
THEOREM 6.2
Let $f$ be a monotone function on $[a, b]$. Then $f$ is differentiable almost
everywhere on $[a, b]$.
PROOF: The method of the proof is as follows. First we will show that
$\{ x \in (a, b) : D_+f(x) < D^+f(x) \}$ has measure zero; a similar argument
will show that $\{ x \in (a, b) : D_-f(x) < D^-f(x) \}$ has measure zero as
well. This will establish that $\{ x \in (a, b) : f'_+(x) \text{ and } f'_-(x) \text{ exist in } \mathcal{R}^* \}^c$
has measure zero. Then, by Lemma 6.1, we will be able to conclude that
$\{ x \in (a, b) : f'(x) \text{ exists in } \mathcal{R}^* \}^c$ has measure zero. Finally, we will show
that $\{ x \in (a, b) : f'(x) \text{ is infinite} \}$ has measure zero.
We can assume without loss of generality that $f$ is nondecreasing. Let
$E = \{ x \in (a, b) : D_+f(x) < D^+f(x) \}$. We will show that $\lambda^*(E) = 0$. For
each $r, s \in Q$ with $0 < r < s$, let
\[ E_{rs} = \{ x \in (a, b) : D_+f(x) < r < s < D^+f(x) \}. \]
Then $E = \bigcup \{ E_{rs} : r, s \in Q,\ 0 < r < s \}$, a countable union. If we can
prove $\lambda^*(E_{rs}) = 0$ for all $r, s \in Q$ with $0 < r < s$, then we will have
established that $\lambda^*(E) = 0$.
So, let $r, s \in Q$ with $0 < r < s$, and set $\alpha = \lambda^*(E_{rs})$. Let $\epsilon > 0$ be
given, and choose $O$ open with $O \supset E_{rs}$ and $\lambda(O) < \alpha + \epsilon$. If $x \in E_{rs}$, then
$D_+f(x) < r$. Consequently, for each $\delta > 0$, $\inf_{0<h<\delta} (f(x+h) - f(x))/h < r$
and, hence, there is an $h$, with $0 < h < \delta$, and $f(x + h) - f(x) < rh$.
Now, let $\mathcal{V}$ be the collection of all closed intervals of the form $[x, x + h]$,
where $h > 0$, $x \in E_{rs}$, $f(x+h) - f(x) < rh$, and $[x, x+h] \subset O \cap (a, b)$. Then
it follows from the previous paragraph that $\mathcal{V}$ is a Vitali cover of $E_{rs}$. Thus,
by the Vitali covering theorem, Theorem 6.1, there is a finite sequence
$I_1$, $I_2$, ..., $I_n$ of pairwise disjoint members of $\mathcal{V}$ such that
\[ \lambda^*\Big(E_{rs} \setminus \bigcup_{k=1}^{n} I_k\Big) < \epsilon. \]
Let $I_k = [x_k, x_k + h_k]$. We will need to work with the open intervals
$(x_k, x_k + h_k)$. Set $U = \bigcup_{k=1}^{n} (x_k, x_k + h_k)$ and note that
\[ \lambda^*(E_{rs} \setminus U) < \epsilon. \tag{6.6} \]
Also, because $U \subset O$, we have
\[ \sum_{k=1}^{n} h_k = \lambda(U) \le \lambda(O) < \alpha + \epsilon. \tag{6.7} \]
Then, by the definition of $\mathcal{V}$, we may conclude that
\[ \sum_{k=1}^{n} \big( f(x_k + h_k) - f(x_k) \big) < r \sum_{k=1}^{n} h_k < r(\alpha + \epsilon). \tag{6.8} \]
Next, assume that $y \in E_{rs} \cap U$. Then $D^+f(y) > s$ so that for each
$\delta > 0$ there is a $k$, with $0 < k < \delta$, such that $f(y + k) - f(y) > sk$. As
$U$ is open, $[y, y + k] \subset U$ for sufficiently small $k$. Consequently, if we let $\mathcal{W}$
be the collection of all closed intervals of the form $[y, y + k]$, where $k > 0$,
$y \in E_{rs} \cap U$, $f(y + k) - f(y) > sk$, and $[y, y + k] \subset U$, then $\mathcal{W}$ is a Vitali
cover of $E_{rs} \cap U$. Using the Vitali covering theorem again, we can choose
pairwise disjoint members of $\mathcal{W}$, say $J_1$, $J_2$, ..., $J_m$, such that
\[ \lambda^*\Big( (E_{rs} \cap U) \setminus \bigcup_{j=1}^{m} J_j \Big) < \epsilon. \tag{6.9} \]
From (6.6) and (6.9), we get
\[ \alpha = \lambda^*(E_{rs}) \le \lambda^*(E_{rs} \cap U) + \lambda^*(E_{rs} \setminus U) < \Big( \epsilon + \sum_{j=1}^{m} \ell(J_j) \Big) + \epsilon. \tag{6.10} \]
Setting $J_j = [y_j, y_j + k_j]$, we obtain from (6.10) and the definition of $\mathcal{W}$
that
\[ \sum_{j=1}^{m} \big( f(y_j + k_j) - f(y_j) \big) > s \sum_{j=1}^{m} k_j = s \sum_{j=1}^{m} \ell(J_j) > s(\alpha - 2\epsilon). \tag{6.11} \]
Now, $\bigcup_{j=1}^{m} [y_j, y_j + k_j] \subset U \subset \bigcup_{k=1}^{n} [x_k, x_k + h_k]$ and, therefore (see
Exercise 6.14),
\[ \sum_{j=1}^{m} \big( f(y_j + k_j) - f(y_j) \big) \le \sum_{k=1}^{n} \big( f(x_k + h_k) - f(x_k) \big). \]
This along with (6.8) and (6.11) imply that $s(\alpha - 2\epsilon) < r(\alpha + \epsilon)$. As $\epsilon > 0$
was arbitrary, it follows that $s\alpha \le r\alpha$. Because $r < s$ and $\alpha \ge 0$, we must
have $\alpha = 0$.
Thus, we have shown that $\{ x \in (a, b) : D_+f(x) < D^+f(x) \}$ has measure
zero. A similar argument shows that $\{ x \in (a, b) : D_-f(x) < D^-f(x) \}$
also has measure zero. Hence, $\{ x \in (a, b) : f'_+(x) \text{ and } f'_-(x) \text{ exist in } \mathcal{R}^* \}^c$
has measure zero. Applying Lemma 6.1, we conclude that $f'(x)$ exists for
$\lambda$-almost all $x \in (a, b)$, although its value may be infinite at some points.
Let $A = \{ x \in (a, b) : f'(x) = +\infty \}$. We wish to show that $\lambda(A) = 0$.
Let $x \in A$ and $N \in \mathcal{N}$. Then $D_+f(x) = \infty$ so that for each $\delta > 0$ there is
an $h$, with $0 < h < \delta$, such that
\[ f(x + h) - f(x) > Nh. \tag{6.12} \]
Consequently, if we let $\mathcal{U}$ be the collection of all closed intervals of the form
$[x, x + h]$, where $h > 0$, $x \in A$, $f(x + h) - f(x) > Nh$, and $[x, x + h] \subset (a, b)$,
then $\mathcal{U}$ is a Vitali cover of $A$. By the Vitali covering theorem, there exist
pairwise disjoint members of $\mathcal{U}$, say $I_1$, $I_2$, ..., $I_n$, such that
\[ \lambda^*\Big( A \setminus \bigcup_{k=1}^{n} I_k \Big) < \frac{1}{N}. \tag{6.13} \]
Set $I_k = [x_k, x_k + h_k]$. Then, by (6.12) and (6.13),
\[ N\lambda^*(A) < 1 + N\lambda^*\Big( \bigcup_{k=1}^{n} I_k \Big) = 1 + N \sum_{k=1}^{n} h_k
< 1 + \sum_{k=1}^{n} \big( f(x_k + h_k) - f(x_k) \big) \le 1 + f(b) - f(a), \]
where the last inequality follows from the fact that $f$ is nondecreasing and
$I_k \subset (a, b)$, $1 \le k \le n$. (See Exercise 6.13.) But, $N\lambda^*(A) < 1 + f(b) - f(a)$
for each $N \in \mathcal{N}$ implies that $\lambda^*(A) = 0$.
Derivatives of Complex-Valued Functions
We conclude this section by briefly discussing differentiation of complex-
valued functions of a real variable.
DEFINITION 6.5 Derivative of a Complex-Valued Function
A complex-valued function $f$ defined in an open interval about $x \in \mathcal{R}$
is said to be differentiable at $x$ if
\[ \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} \]
exists. In that case the limit is called the derivative of $f$ at $x$ and
is denoted by $f'(x)$.
The proof of the following proposition is left as an exercise for the
reader.
PROPOSITION 6.1
Let $f$ be a complex-valued function defined in an open interval about $x$.
Let $u = \Re f$ and $v = \Im f$. Then $f$ is differentiable at $x$ if and only if $u$ and $v$
are differentiable at $x$ and, in that case, $f'(x) = u'(x) + iv'(x)$.
EXERCISES 6.1
6.1	Show that $\lim_{y\to x+} g(y) = L \in \mathcal{R}^*$ if and only if for each sequence $\{y_n\}_{n=1}^{\infty}$
with $y_n > x$ and $y_n \to x$, $\lim_{n\to\infty} g(y_n) = L$. Establish a similar result for
left-hand limits.
6.2	Suppose that $\{y_n\}_{n=1}^{\infty}$ is a sequence with $y_n > x$ and $y_n \to x$. Show that
\[ \liminf_{y\to x+} g(y) \le \liminf_{n\to\infty} g(y_n) \le \limsup_{n\to\infty} g(y_n) \le \limsup_{y\to x+} g(y). \]
Establish a similar result for left-hand limits.
6.3	Show that $\lim_{y\to x+} g(y) = L \in \mathcal{R}^*$ if and only if
\[ \liminf_{y\to x+} g(y) = \limsup_{y\to x+} g(y) = L. \]
Establish a similar result for left-hand limits.
6.4	Prove that $\lim_{y\to x} g(y) = L \in \mathcal{R}^*$ if and only if
\[ \liminf_{y\to x-} g(y) = \limsup_{y\to x-} g(y) = \liminf_{y\to x+} g(y) = \limsup_{y\to x+} g(y) = L. \]
6.5	Find the Dini-derivates of $f = \chi_Q$ at $x$ if $x \notin Q$.
6.6	Set $f(0) = 0$ and $f(x) = x \sin(1/x)$ for $x \ne 0$. Find the Dini-derivates of $f$
at each $x \in \mathcal{R}$.
6.7	Suppose $f$ attains its minimum at a point $x$ and that $f$ is defined in an
open interval about $x$. Show that $D_+f(x) \ge 0 \ge D^-f(x)$.
Exercises 6.8-6.11 discuss differentiation of convex functions.
A real-valued function, $f$, on $(a, b)$ is called convex if
\[ f(cx + (1 - c)y) \le c f(x) + (1 - c) f(y), \]
for all $x, y \in (a, b)$ and $0 \le c \le 1$.
6.8	Prove that a convex function on $(a, b)$ is continuous thereon.
6.9	Let $f$ be a convex function on $(a, b)$. Prove that $f'_+(x)$ and $f'_-(x)$ exist for
all $x \in (a, b)$.
6.10	Let $f$ be a convex function on $(a, b)$. Prove that $f'_+$ and $f'_-$ are nondecreasing
on $(a, b)$.
6.11	Let $f$ be a convex function on $(a, b)$. Prove that $f'$ exists almost everywhere
on $(a, b)$.
6.12	Let $E \subset \mathcal{R}$ and $O$ an open set with $O \supset E$. Suppose $\mathcal{V}$ is a Vitali cover
of $E$. Show that $\mathcal{W} = \{ I \in \mathcal{V} : I \subset O \}$ is also a Vitali cover of $E$.
6.13	Suppose $f$ is nondecreasing on $[a, b]$. Let $I_k$, $1 \le k \le n$, be a sequence of
pairwise disjoint subintervals of $[a, b]$ having endpoints $a_k$ and $b_k$, $1 \le k \le n$.
Then
\[ \sum_{k=1}^{n} \big( f(b_k) - f(a_k) \big) \le f(b) - f(a). \]
6.14	Suppose $f$ is nondecreasing on $[a, b]$. Let $\{J_j\}_{j=1}^{m}$ and $\{I_k\}_{k=1}^{n}$ be two disjoint
sequences of closed subintervals of $[a, b]$ such that $\bigcup_{j=1}^{m} J_j \subset \bigcup_{k=1}^{n} I_k$.
Denote the left and right endpoints of $J_j$ by $a_j$ and $b_j$, respectively, and
those of $I_k$ by $c_k$ and $d_k$, respectively. Then
\[ \sum_{j=1}^{m} \big( f(b_j) - f(a_j) \big) \le \sum_{k=1}^{n} \big( f(d_k) - f(c_k) \big). \]
6.15	Suppose $f$ is nondecreasing on an interval $I$ having a nonempty interior.
a)	Show that at each interior point $x$, both $f(x+)$ and $f(x-)$ exist and that
\[ f(x+) = \inf\{ f(y) : y \in I \text{ and } y > x \}, \]
\[ f(x-) = \sup\{ f(y) : y \in I \text{ and } y < x \}. \]
Furthermore, verify that $f(x-) \le f(x) \le f(x+)$. Conclude that $f$ is
continuous at $x$ if and only if $f(x+) = f(x-)$.
b)	Formulate and prove appropriate analogues of the statements in part (a)
in the cases where $x$ is either a left or right endpoint of $I$.
c)	Suppose that $a, b \in I$ with $a < b$. Let $x_1$, $x_2$, ..., $x_n$ be points of $(a, b)$.
Prove that $\sum_{k=1}^{n} \big( f(x_k+) - f(x_k-) \big) \le f(b) - f(a)$.
d)	Deduce from the previous parts that the set of points in $I$ where $f$ is
discontinuous is countable.
6.16	Show that the derivative of the Cantor function equals zero almost every-
where on [0,1].
6.17	We know that the Cantor function, $\psi$, is nondecreasing on [0,1] and, from
the preceding exercise, $\psi' = 0$ ae on [0,1]. Although $\psi$ is nondecreasing and
maps [0,1] to [0,1], it is “usually constant” in the sense that it is constant
on each subinterval of the complement of the Cantor set. In this exercise,
the Cantor function is used to construct a strictly increasing continuous
function, $f$, on [0,1] such that $f' = 0$ ae on [0,1]. For each $n \in \mathcal{N}$ and
nonnegative integer $k < 2^n$, let
0,
fnk(x) = < t/j(2nx-k),
k 1,
Define the function, /, on [0,1] by
oo	2n —1
n=l k—Q
a)	Show that f is well-defined and continuous.
b)	Show that f is strictly increasing on [0,1].
c)	Prove that $f' = 0$ ae on [0,1].
6.18	Let $f$ be a real-valued function on $[a, b]$. Suppose $E \subset [a, b]$ and $f'$ exists
and is bounded on $E$, say, by $M$. Prove that $\lambda^*(f(E)) \le M\lambda^*(E)$.
6.19	Prove Proposition 6.1 on page 328.
6.2 FUNCTIONS OF BOUNDED VARIATION
In Section 6.1 we proved that every monotone function is differentiable
almost everywhere (Theorem 6.2). Furthermore, we stated that, not only
monotone functions, but any function that does not “oscillate vigorously”
is differentiable almost everywhere. Definition 6.6 makes precise the notion
of not “oscillating vigorously.”
DEFINITION 6.6 Total Variation; Bounded Variation
Let $f$ be a complex-valued function on $[a, b]$. The total variation
of $f$ over $[a, b]$, denoted by $V_a^b f$, is defined by
\[ V_a^b f = \sup\Big\{ \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| : a = x_0 < x_1 < \cdots < x_n = b \Big\}. \]
If $V_a^b f < \infty$, then $f$ is said to be of bounded variation on $[a, b]$.
EXAMPLE 6.4 Illustrates Definition 6.6
a)	Any monotone function on $[a, b]$ is of bounded variation. In fact, we
have $V_a^b f = f(b) - f(a)$, if $f$ is nondecreasing, and $V_a^b f = f(a) - f(b)$,
if $f$ is nonincreasing.
b)	Define $f$ on [0,1] by $f(0) = 0$ and $f(x) = x \sin(1/x)$, for $x \ne 0$. Then
$f$ is not of bounded variation on [0,1]. To see this, consider for each $n \in \mathcal{N}$
the partition of [0,1] given by
\[ 0 < \frac{2}{(4n+1)\pi} < \frac{2}{4n\pi} < \frac{2}{(4n-1)\pi} < \cdots < \frac{2}{2\pi} < \frac{2}{\pi} < 1. \]
In other words, $x_0 = 0$; $x_k = 2/((4n + 2 - k)\pi)$, k = 1, 2, ..., 4n + 1; and
$x_{4n+2} = 1$. Then,
\[ \sum_{k=1}^{4n+2} |f(x_k) - f(x_{k-1})| = \frac{2}{(4n+1)\pi} + \frac{2}{(4n+1)\pi} + \frac{2}{(4n-1)\pi} + \frac{2}{(4n-1)\pi}
+ \cdots + \frac{2}{3\pi} + \frac{2}{3\pi} + \frac{2}{\pi} + \Big( \sin 1 - \frac{2}{\pi} \Big) \]
and so,
\[ \sum_{k=1}^{4n+2} |f(x_k) - f(x_{k-1})| \ge \frac{2}{\pi} \sum_{k=3}^{4n+2} \frac{1}{k}. \]
As $\sum_{k=3}^{4n+2} k^{-1} \to \infty$ as $n \to \infty$, it follows that $V_0^1 f = \infty$.
□
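The growth of the variation sums in part (b) can be observed numerically. The sketch below assumes the partition $x_0 = 0$, $x_k = 2/((4n+2-k)\pi)$ for $1 \le k \le 4n+1$, $x_{4n+2} = 1$ from the example and uses floating-point arithmetic, so the values are approximate; the names `partition`, `variation_sum`, and `harmonic_bound` are hypothetical. The comparison quantity $(2/\pi)\sum_{k=3}^{4n+2} 1/k$ is a lower bound for the variation sum that diverges with $n$.

```python
import math

def f(x):
    """f(0) = 0 and f(x) = x sin(1/x) otherwise."""
    return 0.0 if x == 0 else x * math.sin(1.0 / x)

def partition(n):
    """The partition 0 < 2/((4n+1)pi) < ... < 2/pi < 1 of [0, 1]."""
    inner = [2.0 / ((4 * n + 2 - k) * math.pi) for k in range(1, 4 * n + 2)]
    return [0.0] + inner + [1.0]

def variation_sum(points):
    """Sum of |f(x_k) - f(x_{k-1})| over consecutive partition points."""
    return sum(abs(f(b) - f(a)) for a, b in zip(points, points[1:]))

def harmonic_bound(n):
    """Lower bound (2/pi) * sum_{k=3}^{4n+2} 1/k for the variation sum."""
    return (2.0 / math.pi) * sum(1.0 / k for k in range(3, 4 * n + 3))

for n in (1, 10, 100, 1000):
    print(n, variation_sum(partition(n)), harmonic_bound(n))
```

Both columns grow roughly like a constant times $\log n$, so no finite number bounds the variation sums.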
Our next goal is to prove that any function of bounded variation
on $[a, b]$ is differentiable almost everywhere on $[a, b]$. This is an immediate
consequence of two facts: that any monotone function is differentiable
almost everywhere (Theorem 6.2) and that any real-valued function
of bounded variation can be written as the difference of two nondecreasing
functions. This latter fact is the content of Theorem 6.3.
THEOREM 6.3
Let $f$ be a real-valued function of bounded variation on $[a, b]$. Then
\[ V_a^x f + V_x^y f = V_a^y f, \qquad a \le x \le y \le b. \tag{6.14} \]
Moreover, $f$ can be written as the difference of two nondecreasing functions
on $[a, b]$.
PROOF: Define $V_a^a f = 0$. First we prove (6.14). If $x = a$, there is nothing
to prove. So, assume $x > a$. Let $a = x_0 < x_1 < \cdots < x_n = y$ be a partition
of $[a, y]$ and set $k = \min\{ j : x \le x_j \}$. Then, by the definition of total
variation,
\[ \sum_{j=1}^{k-1} |f(x_j) - f(x_{j-1})| + |f(x) - f(x_{k-1})| \le V_a^x f \]
and
\[ |f(x_k) - f(x)| + \sum_{j=k+1}^{n} |f(x_j) - f(x_{j-1})| \le V_x^y f. \]
Consequently, by the triangle inequality,
\[ \sum_{j=1}^{n} |f(x_j) - f(x_{j-1})| \le \sum_{j=1}^{k-1} |f(x_j) - f(x_{j-1})| + |f(x) - f(x_{k-1})|
+ |f(x_k) - f(x)| + \sum_{j=k+1}^{n} |f(x_j) - f(x_{j-1})| \le V_a^x f + V_x^y f. \]
Because the preceding inequality has been established for any partition
of $[a, y]$, we have $V_a^y f \le V_a^x f + V_x^y f$.
To obtain the reverse inequality, let $a = x_0 < x_1 < \cdots < x_m = x$ and
$x = y_0 < y_1 < \cdots < y_n = y$ be partitions of $[a, x]$ and $[x, y]$, respectively.
Then
\[ a = x_0 < x_1 < \cdots < x_m = x = y_0 < y_1 < \cdots < y_n = y \]
is a partition of $[a, y]$. Therefore,
\[ V_a^y f \ge \sum_{j=1}^{m} |f(x_j) - f(x_{j-1})| + \sum_{k=1}^{n} |f(y_k) - f(y_{k-1})|. \]
Taking the supremum over partitions of $[a, x]$, we obtain that, for any
partition, $x = y_0 < y_1 < \cdots < y_n = y$ of $[x, y]$,
\[ V_a^y f \ge V_a^x f + \sum_{k=1}^{n} |f(y_k) - f(y_{k-1})|. \]
Taking the supremum over all partitions of $[x, y]$ yields $V_a^y f \ge V_a^x f + V_x^y f$.
This establishes (6.14).
Write $f(x) = V_a^x f - (V_a^x f - f(x))$. We claim that $V_a^x f$ and $V_a^x f - f(x)$
are nondecreasing functions on $[a, b]$. From (6.14) and the fact that $V_x^y f \ge 0$
for $x \le y$, we deduce that, as a function of $x$, $V_a^x f$ is nondecreasing on $[a, b]$.
It remains to prove that $V_a^x f - f(x)$ is nondecreasing on $[a, b]$. Let
$a \le x < y \le b$. By (6.14),
\[ (V_a^y f - f(y)) - (V_a^x f - f(x)) = V_x^y f - (f(y) - f(x)) \ge V_x^y f - |f(y) - f(x)| \ge 0, \]
where the last inequality holds because $x = x_0 < x_1 = y$ is a partition
of $[x, y]$.
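The decomposition in the proof has a simple discrete analogue that can be computed from samples of $f$. This is a sketch under a stated assumption: only the sample points are used as a partition, so the cumulative sums below are lower bounds for $V_a^x f$ rather than its exact values. The name `jordan_on_grid` is hypothetical.

```python
import math
from itertools import accumulate

def jordan_on_grid(values):
    """Given samples f(x_0), ..., f(x_n), return lists (p, q) with f(x_i) = p[i] - q[i].

    p[i] is the variation sum over the grid points up to x_i (discrete analogue
    of V_a^x f) and q[i] = p[i] - f(x_i) (discrete analogue of V_a^x f - f(x)).
    """
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    p = [0.0] + list(accumulate(steps))
    q = [pi - fi for pi, fi in zip(p, values)]
    return p, q

samples = [math.sin(0.1 * i) for i in range(100)]
p, q = jordan_on_grid(samples)
print(p[-1], q[-1], p[-1] - q[-1], samples[-1])
```

Each step increases `p` by $|f(x_i) - f(x_{i-1})|$ and `q` by $|f(x_i) - f(x_{i-1})| - (f(x_i) - f(x_{i-1})) \ge 0$, which is the same inequality used at the end of the proof.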
COROLLARY 6.1
Suppose $f$ is a complex-valued function of bounded variation on $[a, b]$.
Then $f$ can be written in the form
\[ f = (f_1 - f_2) + i(f_3 - f_4), \]
where $f_j$, $1 \le j \le 4$, are nondecreasing functions on $[a, b]$.
PROOF: Because $f$ is of bounded variation, so are $\Re f$ and $\Im f$. (See
Exercise 6.24.) Applying Theorem 6.3 to $\Re f$ and $\Im f$ completes the proof.
Since any monotone function on [a, b] is differentiable almost every-
where on [a, b] (Theorem 6.2) and every function of bounded variation
on [a, b] can be written as a linear combination of nondecreasing functions
(Corollary 6.1), we have the following theorem.
THEOREM 6.4
Any function of bounded variation on $[a, b]$ is differentiable almost
everywhere on $[a, b]$.
EXERCISES 6.2
6.20	Using only Definition 6.6, prove that if $f$ is of bounded variation on $[a, b]$,
then it is bounded thereon.
6.21	Define $f$ on [0,1] by $f(0) = 0$ and $f(x) = x^2 \sin(1/x)$, for $x \ne 0$. Show that
$f$ is of bounded variation on [0,1].
6.22	Define $f$ on [0,1] by $f(0) = 0$ and $f(x) = x^2 \sin(1/x^2)$, for $x \ne 0$. Show
that $f$ is not of bounded variation on [0,1].
6.23	Define $f$ on [0,1] by $f(0) = 0$ and $f(x) = x^{\alpha} \sin(1/x)$, for $x \ne 0$. Show that
$f$ is of bounded variation on [0,1] if and only if $\alpha > 1$.
6.24	Prove that $f$ is of bounded variation on $[a, b]$ if and only if $\Re f$ and $\Im f$ are
of bounded variation on $[a, b]$.
★6.25	Let $f$ and $g$ be complex-valued functions on $[a, b]$ and $\alpha \in C$. Prove that
a) $V_a^b(f + g) \le V_a^b f + V_a^b g$
b) $V_a^b(\alpha f) = |\alpha| V_a^b f$
6.26	Suppose $f$ is a function of bounded variation on $[a, b]$. Show that $f$ has only
a countable number of discontinuities.
6.27	Let $f$ be of bounded variation on $[a, b]$ and $D$ denote the set of points of $(a, b)$
at which $f$ is discontinuous. By Exercise 6.26, we can write $D = \{x_n\}_n$.
For each $n$, set $d_n = f(x_n+) - f(x_n-)$. Show that $\sum_n |d_n| \le V_a^b f$.
★6.28	Let $f$ and $g$ be of bounded variation on $[a, b]$. Show that
\[ V_a^b(fg) \le \big( \sup\{ |f(x)| : x \in [a, b] \} \big) V_a^b g + \big( \sup\{ |g(x)| : x \in [a, b] \} \big) V_a^b f. \]
Deduce that the product of two functions of bounded variation on $[a, b]$ is
also of bounded variation on $[a, b]$.
6.29	Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of real-valued functions on $[a, b]$ that converge
pointwise to the function $f$. Prove that $V_a^b f \le \liminf_{n\to\infty} V_a^b f_n$.
6.30	Suppose that $f: [a, b] \to [c, d]$ is monotone and that $g$ is of bounded variation
on $[c, d]$. Prove that $V_a^b(g \circ f) \le V_c^d g$.
6.31	Let $f$ be of bounded variation on $[a, b]$. If $f$ is continuous at $x_0 \in [a, b]$,
show that the function $g(x) = V_a^x f$ is also continuous at $x_0$.
6.32	Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions of bounded variation on $[a, b]$ such
that $\sum_n f_n(a)$ converges absolutely and $\sum_n V_a^b f_n < \infty$. Prove that
a) $\sum_n f_n(x)$ converges absolutely for each $x \in [a, b]$
b) $V_a^b\big( \sum_n f_n \big) \le \sum_n V_a^b f_n$
6.3 THE INDEFINITE LEBESGUE INTEGRAL
Recall that the two fundamental theorems of calculus for Riemann integra-
tion show that differentiation and integration are inverse operations. More
precisely, we have the following two facts.
First Fundamental Theorem of Calculus: Suppose $f$ is Riemann integrable
on $[a, b]$. Let
\[ F(x) = \int_a^x f(t)\,dt, \qquad a \le x \le b. \]
Then $F$ is differentiable at all points $x$ at which $f$ is continuous (hence,
almost everywhere by Theorem 2.7) and at such points $F'(x) = f(x)$. In
other words,
\[ \frac{d}{dx} \int_a^x f(t)\,dt = f(x) \tag{6.15} \]
at all continuity points of $f$.
Second Fundamental Theorem of Calculus: Suppose $f$ is defined on $[a, b]$
and $f'$ exists and is Riemann integrable on $[a, b]$. Then
\[ \int_a^x f'(t)\,dt = f(x) - f(a), \qquad a \le x \le b. \tag{6.16} \]
In this section, we will prove a generalization of (6.15) to Lebesgue
integration theory. Then, in the next section, we will characterize all func-
tions f for which (6.16) holds in the Lebesgue sense.
To begin, we introduce some useful abbreviations. We write $\mathcal{L}^1([a, b])$
for $\mathcal{L}^1([a, b], \mathcal{M}_{[a,b]}, \lambda_{[a,b]})$ and $\mathcal{L}^1(\mathcal{R})$ for $\mathcal{L}^1(\mathcal{R}, \mathcal{M}, \lambda)$. Recalling our
convention for using Riemann-integral notation for Lebesgue integrals, we have
that $f \in \mathcal{L}^1([a, b])$ means f is $\mathcal{M}_{[a,b]}$-measurable and $\int_a^b |f(x)|\,dx < \infty$;
$f \in \mathcal{L}^1(\mathcal{R})$ means f is $\mathcal{M}$-measurable and $\int_{-\infty}^{\infty} |f(x)|\,dx < \infty$. Moreover,
we will continue to use the phrase almost everywhere instead of Lebesgue
almost everywhere and the notation ae instead of $\lambda$-ae.
Our first goal is to prove that whenever $f \in \mathcal{L}^1([a, b])$, (6.15) holds
almost everywhere on [a, b]. Several preliminary results are needed to
establish that fact.
PROPOSITION 6.2
Suppose $f \in \mathcal{L}^1([a, b])$ and set
$$F(x) = \int_a^x f(t)\,dt, \qquad a \le x \le b.$$
Then F is continuous and of bounded variation on [a, b]. Moreover,
$$V_a^b F = \int_a^b |f(x)|\,dx. \tag{6.17}$$
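PROOF: Let $x \in [a, b]$. We will show that F is continuous at x. Let
$\{x_n\}_{n=1}^{\infty} \subset [a, b]$ be such that $x_n \to x$ as $n \to \infty$. Set $f_n = f\chi_{[a, x_n)}$. Note
that we have $f_n \to f\chi_{[a, x)}$ except possibly at x, so that $f_n \to f\chi_{[a, x)}$ ae.
Moreover, since $|f_n| \le |f| \in \mathcal{L}^1([a, b])$, the DCT implies that
$$\lim_{n\to\infty} F(x_n) = \lim_{n\to\infty} \int_a^{x_n} f(t)\,dt = \lim_{n\to\infty} \int_a^b f_n(t)\,dt = \int_a^b f(t)\chi_{[a, x)}(t)\,dt = \int_a^x f(t)\,dt = F(x).$$
Consequently, F is continuous at x.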
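Now we show that F is of bounded variation on [a, b]. Consider an
arbitrary partition $a = x_0 < x_1 < \cdots < x_n = b$ of [a, b]. Then
$$\sum_{k=1}^{n} |F(x_k) - F(x_{k-1})| = \sum_{k=1}^{n} \left| \int_a^{x_k} f(x)\,dx - \int_a^{x_{k-1}} f(x)\,dx \right| = \sum_{k=1}^{n} \left| \int_{x_{k-1}}^{x_k} f(x)\,dx \right| \le \sum_{k=1}^{n} \int_{x_{k-1}}^{x_k} |f(x)|\,dx = \int_a^b |f(x)|\,dx.$$
Hence,
$$V_a^b F \le \int_a^b |f(x)|\,dx. \tag{6.18}$$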
To establish the reverse of (6.18), we first consider the case where f is
a continuous complex-valued function on the interval [a, b]. Let $\epsilon > 0$ be
given. By the uniform continuity of f, there is a $\delta > 0$ such that $x, y \in [a, b]$
and $|x - y| < \delta$ implies $|f(x) - f(y)| < \epsilon/2(b - a)$.
Now, let $a = x_0 < x_1 < \cdots < x_n = b$ be a partition such that
$|x_{j+1} - x_j| < \delta$ for j = 0, 1, ..., n - 1. Then
$$|F(x_{j+1}) - F(x_j)| = \left| \int_{x_j}^{x_{j+1}} f(x)\,dx \right|$$
$$\ge \left| \int_{x_j}^{x_{j+1}} f(x_j)\,dx \right| - \int_{x_j}^{x_{j+1}} |f(x) - f(x_j)|\,dx$$
$$\ge \int_{x_j}^{x_{j+1}} |f(x_j)|\,dx - \frac{\epsilon}{2(b - a)}(x_{j+1} - x_j)$$
$$\ge \int_{x_j}^{x_{j+1}} |f(x)|\,dx - \int_{x_j}^{x_{j+1}} \bigl| |f(x)| - |f(x_j)| \bigr|\,dx - \frac{\epsilon}{2(b - a)}(x_{j+1} - x_j)$$
$$\ge \int_{x_j}^{x_{j+1}} |f(x)|\,dx - \frac{\epsilon}{b - a}(x_{j+1} - x_j). \tag{6.19}$$
Summing on both sides of (6.19) we obtain
$$V_a^b F \ge \sum_{j=0}^{n-1} |F(x_{j+1}) - F(x_j)| \ge \int_a^b |f(x)|\,dx - \epsilon.$$
As $\epsilon$ is an arbitrary positive number, we obtain the reverse of (6.18) in the
case where f is a continuous complex-valued function on [a, b].
To prove the reverse of (6.18) in the general case, let $f \in \mathcal{L}^1([a, b])$
and $\epsilon > 0$. Using Exercise 4.82 on page 202, we select a simple function s
such that $\int_a^b |f(x) - s(x)|\,dx < \epsilon/2$. Then applying Exercise 3.64(d) on
page 139 and the dominated convergence theorem, we choose a continuous
function $f_0$ such that $\int_a^b |s(x) - f_0(x)|\,dx < \epsilon/2$. It follows that
$$\int_a^b |f(x) - f_0(x)|\,dx < \epsilon. \tag{6.20}$$
Now let $F_0(x) = \int_a^x f_0(t)\,dt$. From Exercise 6.25 on page 334, we have
$$V_a^b F_0 - V_a^b(F_0 - F) \le V_a^b F.$$
It follows from (6.18) and (6.20) that
$$V_a^b F_0 - \epsilon < V_a^b F. \tag{6.21}$$
We have already proved the proposition in the case $F = F_0$. Thus, in view
of (6.20), (6.21), and the previous inequality, we conclude that
$$\int_a^b |f(x)|\,dx - \epsilon < \int_a^b |f(x)|\,dx - \int_a^b |f(x) - f_0(x)|\,dx \le \int_a^b |f_0(x)|\,dx = V_a^b F_0 < V_a^b F + \epsilon.$$
Because $\epsilon$ was arbitrarily chosen, we have proved the reverse of (6.18) in
the general case.
Remark: Since any continuous function on [a, b] is uniformly continuous
on [a, b] we have, in fact, that the function F in Proposition 6.2 is uniformly
continuous on [a, b].
A result analogous to Proposition 6.2 is valid for functions defined
on R. Specifically, we have the following fact whose proof is left as an
exercise for the reader.
PROPOSITION 6.3
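Suppose $f \in \mathcal{L}^1(\mathcal{R})$ and set
$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad -\infty < x < \infty.$$
Then F is continuous on $\mathcal{R}$ and is of bounded variation on every finite
closed interval. Moreover,
$$V_{-\infty}^{\infty} F = \int_{-\infty}^{\infty} |f(x)|\,dx, \tag{6.22}$$
where, by definition, $V_{-\infty}^{\infty} F = \lim_{n\to\infty} V_{-n}^{n} F$.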
EXAMPLE 6.5 Illustrates Proposition 6.2
a)  It is quite easy to prove directly that sin x is of bounded variation
on $[0, 2\pi]$. However, it is even easier to prove that fact by employing
Proposition 6.2. We just note that $\cos x \in \mathcal{L}^1([0, 2\pi])$ and that
$$\sin x = \int_0^x \cos t\,dt, \qquad 0 \le x \le 2\pi.$$
b)  Define F on [0,1] by $F(0) = 0$ and $F(x) = x\sin(1/x)$, for $x \ne 0$. Clearly,
F is continuous on [0,1]. However, as we discovered in Example 6.4(b)
on page 331, F is not of bounded variation on [0,1]. Consequently, by
Proposition 6.2, it is impossible to find a function $f \in \mathcal{L}^1([0,1])$ such
that $x\sin(1/x) = \int_0^x f(t)\,dt$ for $0 \le x \le 1$. In words, F is not the
indefinite integral of a Lebesgue integrable function on [0,1].
c)  The Cantor function, $\psi$, is continuous and of bounded variation on [0,1]
(because it is monotone). Exercise 5.28 on page 284 shows, and we will
show again, that $\psi$ is not the indefinite integral of a Lebesgue integrable
function on [0,1]. This proves that the converse of Proposition 6.2
fails.	□
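As a numerical sanity check on (6.17) in the setting of Example 6.5(a), the following Python sketch compares a partition sum for the variation of sin x on $[0, 2\pi]$ with a midpoint-rule approximation of $\int_0^{2\pi} |\cos t|\,dt$. The helper names and the choice of 4000 subintervals are assumptions made only for this illustration; a partition sum is merely a lower bound for the total variation, so this is a plausibility check and not a proof.

```python
import math

def variation_sum(func, a, b, n):
    """Sum of |func(x_k) - func(x_{k-1})| over the uniform partition of [a, b] into n pieces."""
    points = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(abs(func(points[k]) - func(points[k - 1])) for k in range(1, n + 1))

def midpoint_integral(func, a, b, n):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

# n is a multiple of 4, so the turning points pi/2 and 3*pi/2 are partition points.
partition_variation = variation_sum(math.sin, 0.0, 2 * math.pi, 4000)
integral_abs_cos = midpoint_integral(lambda t: abs(math.cos(t)), 0.0, 2 * math.pi, 4000)
print(partition_variation, integral_abs_cos)  # both values are close to 4
```

Both numbers agree with the exact value 4 to several decimal places, as (6.17) predicts for F = sin and f = cos.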
First Fundamental Theorem of Calculus
By Proposition 6.2 (page 335) and Theorem 6.4 (page 333), if $f \in \mathcal{L}^1([a, b])$
and F is the indefinite integral of f, then F is differentiable almost everywhere
on [a, b]. So we will have established the generalization of (6.15)
to Lebesgue integration theory once we show that $F'(x) = f(x)$ ae. To
accomplish that, we need the following two lemmas, whose proofs are left
to the reader. (See Exercises 6.38 and 6.39.)
LEMMA 6.2
If $f \in \mathcal{L}^1([a, b])$ and $\int_a^x f(t)\,dt = 0$ for all $x \in [a, b]$, then f = 0 almost
everywhere on [a, b].
LEMMA 6.3
Suppose g is defined in some open interval about $x \in \mathcal{R}$ and g is continuous
at x. Then
$$\lim_{h\to 0} \frac{1}{h} \int_x^{x+h} g(t)\,dt = g(x).$$
We are now in a position to establish the first fundamental theorem
of calculus for Lebesgue integration.
THEOREM 6.5 First Fundamental Theorem of Calculus
Suppose $f \in \mathcal{L}^1([a, b])$ and set
$$F(x) = \int_a^x f(t)\,dt, \qquad a \le x \le b.$$
Then F is differentiable almost everywhere on [a, b] and, in fact,
$$F'(x) = f(x) \tag{6.23}$$
for almost all $x \in [a, b]$.
PROOF: We have already observed that F is differentiable almost everywhere
on [a, b]. Hence, it remains to prove (6.23). We do this first for
bounded, nonnegative f. So, assume $0 \le f \le M$ on [a, b]. Extend the
domain of f (and, hence, of F) by setting $f(x) = 0$ for $x \notin [a, b]$. Let
$$f_n(t) = n\left[ F\left(t + \frac{1}{n}\right) - F(t) \right], \qquad a \le t \le b.$$
Note that $f_n \to F'$ ae. Because $f_n(t) = n\int_t^{t+1/n} f(s)\,ds$ and f is bounded
by M, we have $|f_n| \le M$ for all n. Applying the dominated convergence
theorem, we get that
$$\lim_{n\to\infty} \int_a^x f_n(t)\,dt = \int_a^x F'(t)\,dt, \qquad a \le x \le b.$$
On the other hand, by Lemma 6.3, we have for $a \le x \le b$,
$$\lim_{n\to\infty} \int_a^x f_n(t)\,dt = \lim_{n\to\infty} n \int_a^x \left[ F\left(t + \frac{1}{n}\right) - F(t) \right] dt = \lim_{n\to\infty} n \left[ \int_x^{x+1/n} F(t)\,dt - \int_a^{a+1/n} F(t)\,dt \right] = F(x) - F(a) = \int_a^x f(t)\,dt.$$
Note that this result does not require f to be bounded.
Consequently, we see that $\int_a^x f(t)\,dt = \int_a^x F'(t)\,dt$, for $a \le x \le b$, or
$\int_a^x (f(t) - F'(t))\,dt = 0$, for $a \le x \le b$. Thus, by Lemma 6.2, $F' = f$ ae.
This proves (6.23) in case f is nonnegative and bounded.
Next we will establish (6.23) for nonnegative f without the boundedness
condition. Let $f_n(t)$ be as before. Because f is nonnegative, so are
the $f_n$s. Applying Fatou's lemma and the previous displayed equation, we
get that
$$\int_a^x F'(t)\,dt \le \liminf_{n\to\infty} \int_a^x f_n(t)\,dt = \int_a^x f(t)\,dt. \tag{6.24}$$
We use the method of truncation to reduce this case to the bounded
case. For each $n \in \mathcal{N}$, let
$$g_n(t) = \begin{cases} f(t), & f(t) \le n; \\ n, & f(t) > n. \end{cases}$$
Then each $g_n$ is a nonnegative bounded measurable function. Consequently,
by what we already established for such functions, we conclude that for
almost all $x \in [a, b]$,
$$\frac{d}{dx} \int_a^x g_n(t)\,dt = g_n(x).$$
Now,
$$F(x) = \int_a^x f(t)\,dt = \int_a^x (f(t) - g_n(t))\,dt + \int_a^x g_n(t)\,dt.$$
The first term on the right-hand side is nondecreasing because $f \ge g_n$
and, hence, is differentiable almost everywhere and, clearly, its derivative
is nonnegative where it exists. Thus, for almost all $x \in [a, b]$,
$$F'(x) = \frac{d}{dx} \int_a^x (f(t) - g_n(t))\,dt + g_n(x) \ge g_n(x).$$
Because $g_n \uparrow f$ pointwise on [a, b], the monotone convergence theorem and
the previous inequality give
$$\int_a^x F'(t)\,dt \ge \lim_{n\to\infty} \int_a^x g_n(t)\,dt = \int_a^x f(t)\,dt. \tag{6.25}$$
It now follows from (6.24) and (6.25) that $\int_a^x F'(t)\,dt = \int_a^x f(t)\,dt$ for
$a \le x \le b$, and so, by Lemma 6.2, $F' = f$ ae on [a, b].
It remains to establish (6.23) without the nonnegativity assumption.
For $f \in \mathcal{L}^1([a, b])$, write $f = f_1 - f_2 + if_3 - if_4$ where $f_j \ge 0$, $1 \le j \le 4$.
Then,
$$F(x) = \int_a^x f_1(t)\,dt - \int_a^x f_2(t)\,dt + i\int_a^x f_3(t)\,dt - i\int_a^x f_4(t)\,dt.$$
It now follows from what we have proved for nonnegative functions that
$F' = f_1 - f_2 + if_3 - if_4 = f$ ae on [a, b].
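To see why the conclusion of Theorem 6.5 is stated only almost everywhere, the following Python sketch looks at a simple integrable function with one jump. The step function f, the closed form of its indefinite integral F, and the step size h are assumptions chosen for this illustration. Away from the jump the difference quotients of F reproduce f; at the jump the one-sided quotients disagree, so F is not differentiable there, and a single point has measure zero.

```python
def f(t):
    """Integrable step function on [0, 1] with a single jump at t = 1/2."""
    return 1.0 if t >= 0.5 else 0.0

def F(x):
    """Indefinite integral of f from 0 to x, written in closed form."""
    return max(0.0, x - 0.5)

def forward_quotient(x, h):
    return (F(x + h) - F(x)) / h

def backward_quotient(x, h):
    return (F(x) - F(x - h)) / h

h = 1e-6
for x in (0.25, 0.75):
    # Away from the jump both quotients are close to f(x).
    print(x, forward_quotient(x, h), backward_quotient(x, h), f(x))

# At the jump the one-sided quotients disagree (about 1 versus 0).
print(0.5, forward_quotient(0.5, h), backward_quotient(0.5, h))
```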
As a corollary of Theorem 6.5, we obtain the following result whose
proof is left as an exercise for the reader. (See Exercise 6.40.)
COROLLARY 6.2
Suppose $f \in \mathcal{L}^1(\mathcal{R})$ and set
$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad -\infty < x < \infty.$$
Then F is differentiable almost everywhere on $\mathcal{R}$ and, in fact, $F'(x) = f(x)$
for almost all $x \in \mathcal{R}$.
EXERCISES 6.3
6.33  Let $\{f_n\}_{n=1}^{\infty} \subset \mathcal{L}^1([a, b])$. For each $n \in \mathcal{N}$, set
$$F_n(x) = \int_a^x f_n(t)\,dt, \qquad a \le x \le b.$$
Suppose that $F_n \to 0$ ae. Does it follow that
a)  $f_n \to 0$ ae?
b)  $f_n \to 0$ in measure?
★6.34  Suppose $F_1$ and $F_2$ are disjoint bounded closed sets. Prove that there exist
disjoint open sets, $O_1$ and $O_2$, such that $O_1 \supset F_1$ and $O_2 \supset F_2$. Hint: Refer
to Exercises 3.21(a) and 3.22(a).
6.35	Prove Proposition 6.3 on page 338.
6.36	Give an example of a function that is of bounded variation on [0,1] but that
is not continuous on [0,1]. Can such a function be the indefinite integral of
a Lebesgue integrable function on [0,1]? Explain your answer.
6.37	Give an example of a function that is continuous on [0,1] but not of bounded
variation on [0,1]. Can such a function be the indefinite integral of a
Lebesgue integrable function on [0,1]? Explain your answer.
6.38  Prove Lemma 6.2 on page 339 by proceeding as follows.
a)  Explain why we can, without loss of generality, assume that f is real-valued.
b)  Show that if f is positive on a set of positive measure, then there is
a closed subset K of (a, b) such that $\int_K f(x)\,dx > 0$. Hint: Refer to
Exercise 4.52 on page 191 and Exercise 3.43 on page 127.
c)  Use part (b) to deduce that if f is positive on a set of positive measure,
then $\int_I f(x)\,dx \ne 0$ for some open interval $I \subset (a, b)$. Hint: Write
$O = (a, b) \setminus K$ and note that $\int_O f(x)\,dx = -\int_K f(x)\,dx$.
d)  Use part (c) to show that if f is positive on a set of positive measure,
then $\int_a^x f(t)\,dt \ne 0$ for some $x \in [a, b]$, contradicting a hypothesis of the
lemma. Conclude that the set where f is positive has measure zero.
e)  Explain why the set where f is negative must have measure zero.
6.39	Prove Lemma 6.3 on page 339.
6.40	Prove Corollary 6.2 on page 341.
6.4	ABSOLUTELY CONTINUOUS FUNCTIONS
We have now extended the first fundamental theorem of calculus to the
setting of Lebesgue integration theory. We might expect the generalization
of the second fundamental theorem of calculus to be as follows: Suppose f is
defined on [a, b] and $f'$ exists almost everywhere and is Lebesgue integrable
on [a, b]. Then
$$\int_a^x f'(t)\,dt = f(x) - f(a), \qquad a \le x \le b.$$
But this is not true in general.† For example, let $\psi$ be the Cantor
function. Then $\psi' = 0$ ae and so $\psi' \in \mathcal{L}^1([0,1])$. However,
$$\int_0^1 \psi'(t)\,dt = 0 \ne 1 = \psi(1) - \psi(0).$$
In fact, for all $x \in (0, 1]$, $\int_0^x \psi'(t)\,dt = 0 \ne \psi(x) - \psi(0)$.
Evidently then, a generalization of the second fundamental theorem of
calculus to Lebesgue integration theory requires more restrictive hypotheses
on f. In the next few pages, we will characterize the functions for which
that generalization holds. To begin, we give such functions a special name,
the rationale for which will become apparent once we characterize them.
DEFINITION 6.7 Absolutely Continuous Function on [a,b]
Suppose that f is defined on [a, b], $f'$ exists almost everywhere and is
Lebesgue integrable on [a, b], and
$$f(x) = f(a) + \int_a^x f'(t)\,dt, \qquad a \le x \le b.$$
Then f is said to be absolutely continuous on [a, b].
DEFINITION 6.8 Absolutely Continuous Function on $\mathcal{R}$
Suppose that f is defined on $\mathcal{R}$, $f'$ exists almost everywhere and is
Lebesgue integrable on $\mathcal{R}$, and
$$f(x) = \int_{-\infty}^{x} f'(t)\,dt, \qquad -\infty < x < \infty.$$
Then f is said to be absolutely continuous on $\mathcal{R}$.
† It is true, however, if in the previous paragraph, "$f'$ exists almost everywhere" is
replaced by "$f'$ exists everywhere." See Theorem 5 of John F. Randolph's Basic Real
and Abstract Analysis (New York: Academic Press, 1968), p. 424.
EXAMPLE 6.6 Illustrates Definitions 6.7 and 6.8
a)  We just saw that the Cantor function, $\psi$, is not absolutely continuous
on [0,1]. Because $\psi' = 0$ ae on [0,1], Theorem 6.5 shows the impossibility
of representing $\psi$ as an indefinite integral of an $\mathcal{L}^1([0,1])$ function.
For if $\psi(x) = \int_0^x g(t)\,dt$, $0 \le x \le 1$, then we must have $0 = \psi' = g$ ae,
implying $\psi(x) = 0$, $0 \le x \le 1$, which is not true.
b)  Theorem 6.5 on page 339, in particular (6.23), shows that if a function F
on [a, b] can be represented as the indefinite integral of some function
in $\mathcal{L}^1([a, b])$, then $F(x) = \int_a^x F'(t)\,dt$ for $a \le x \le b$, so that F is
absolutely continuous. Note: $F(a) = 0$.
c)  Corollary 6.2 (page 341) shows that if a function F on $\mathcal{R}$ can be
represented as the indefinite integral of some function in $\mathcal{L}^1(\mathcal{R})$, then
$F(x) = \int_{-\infty}^{x} F'(t)\,dt$ for $x \in \mathcal{R}$, so that F is absolutely continuous.
Note: $F(-\infty) = 0$.	□
The results of Examples 6.6(b) and (c) are summarized in the following
proposition.
PROPOSITION 6.4
a)  Let F be a function defined on [a, b] and suppose that there is a function
$f \in \mathcal{L}^1([a, b])$ such that $F(x) = \int_a^x f(t)\,dt$ for $a \le x \le b$. Then F is
absolutely continuous on [a, b].
b)  Let F be a function defined on $\mathcal{R}$ and suppose that there is a function
$f \in \mathcal{L}^1(\mathcal{R})$ such that $F(x) = \int_{-\infty}^{x} f(t)\,dt$ for $x \in \mathcal{R}$. Then F is
absolutely continuous on $\mathcal{R}$.
PROPOSITION 6.5
If f is absolutely continuous on [a, b], then it is continuous and of bounded
variation on [a, b]. Moreover,
$$V_a^b f = \int_a^b |f'(x)|\,dx. \tag{6.26}$$
PROOF: By assumption, $f' \in \mathcal{L}^1([a, b])$ and
$$f(x) = f(a) + \int_a^x f'(t)\,dt, \qquad a \le x \le b.$$
Proposition 6.2 (page 335) shows that the function $\int_a^x f'(t)\,dt$ is continuous
and of bounded variation on [a, b]. Obviously, constant functions are continuous
and of bounded variation. Hence, f is continuous and of bounded variation
on [a, b]. Equation (6.26) follows immediately from Proposition 6.2
once we note that for any function g and constant $\alpha$, $V_a^b(\alpha + g) = V_a^b g$.
EXAMPLE 6.7 Illustrates Proposition 6.5
Define f on [0,1] by $f(0) = 0$ and $f(x) = x\sin(1/x)$, for $x \ne 0$. It is easy to
see that f is continuous on [0,1]. In Example 6.4(b) on page 331, we learned
that f is not of bounded variation on [0,1]. Hence, by Proposition 6.5, it
is not absolutely continuous on [0,1].	□
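The failure of bounded variation in Example 6.7 can be observed numerically. The Python sketch below sums the increments of x sin(1/x) over the points $x_k = 2/((2k+1)\pi)$, where sin(1/x) alternates between +1 and -1. These sample points and the function names are assumptions chosen for this illustration, not taken from Example 6.4(b). Adjoining the endpoints 0 and 1 to obtain a partition of [0,1] can only increase the sum, so the unbounded growth seen here is consistent with f not being of bounded variation.

```python
import math

def f(x):
    """f(0) = 0 and f(x) = x * sin(1/x) for x != 0."""
    return 0.0 if x == 0 else x * math.sin(1.0 / x)

def oscillation_sum(n):
    """Sum of |f(x_k) - f(x_{k+1})| over x_k = 2 / ((2k + 1) * pi), k = 0, ..., n."""
    points = [2.0 / ((2 * k + 1) * math.pi) for k in range(n + 1)]
    return sum(abs(f(points[k]) - f(points[k + 1])) for k in range(n))

for n in (10, 100, 1000, 10000):
    # The sums keep growing, roughly like a constant times log(n).
    print(n, oscillation_sum(n))
```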
An Equivalent Condition for Absolute Continuity
Although, as we know from Proposition 6.5, continuity and bounded varia-
tion are necessary conditions for absolute continuity, they are not sufficient.
Indeed, the Cantor function is continuous and of bounded variation on [0,1]
but is not absolutely continuous.
In the next few pages, we will discover a continuity-type condition that
is equivalent to absolute continuity. As we have seen, any such continuity-
type condition must be stronger than ordinary continuity and, in fact, than
uniform continuity.
PROPOSITION 6.6
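Suppose f is absolutely continuous on [a, b]. Then for each $\epsilon > 0$, there is
a $\delta > 0$ such that if $\{(a_k, b_k)\}_{k=1}^{n}$ is any finite sequence of pairwise disjoint
subintervals of [a, b] with $\sum_{k=1}^{n}(b_k - a_k) < \delta$, then $\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \epsilon$.
PROOF: By assumption, $f' \in \mathcal{L}^1([a, b])$ and $f(x) = f(a) + \int_a^x f'(t)\,dt$
for $a \le x \le b$. It follows that
$$f(d) - f(c) = \int_c^d f'(t)\,dt, \qquad a \le c \le d \le b.$$
Now, let $\epsilon > 0$ be given. Applying Exercise 4.80 on page 202, we can
choose $\delta > 0$ such that if $E \subset [a, b]$ is measurable and $\lambda(E) < \delta$, then
$\int_E |f'(t)|\,dt < \epsilon$.
Let $\{(a_k, b_k)\}_{k=1}^{n}$ be any finite sequence of pairwise disjoint subintervals
of [a, b] such that $\sum_{k=1}^{n}(b_k - a_k) < \delta$. Set $E = \bigcup_{k=1}^{n}(a_k, b_k)$. Then
E is measurable, $E \subset [a, b]$, and $\lambda(E) < \delta$. Therefore,
$$\sum_{k=1}^{n} |f(b_k) - f(a_k)| = \sum_{k=1}^{n} \left| \int_{a_k}^{b_k} f'(t)\,dt \right| \le \sum_{k=1}^{n} \int_{a_k}^{b_k} |f'(t)|\,dt = \int_E |f'(t)|\,dt < \epsilon,$$
as desired.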
Proposition 6.6 implies that an absolutely continuous function on a
finite closed interval is necessarily uniformly continuous. However, the
converse is not true. Indeed, we know that any continuous function on a
finite closed interval is uniformly continuous, and we have already encoun-
tered several functions that are continuous but not absolutely continuous
on a finite closed interval.
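The Cantor function gives a concrete view of how the condition in Proposition 6.6 can fail for a continuous function of bounded variation. The Python sketch below uses exact rational arithmetic; the function names, the digit-based evaluation of the Cantor function, and the stages printed are assumptions made for this illustration. At stage n of the Cantor construction there are $2^n$ intervals of total length $(2/3)^n$, which tends to 0, yet the increments of the Cantor function over those intervals always add up to 1. Hence no $\delta$ works for any $\epsilon \le 1$.

```python
from fractions import Fraction

def cantor(x, depth=64):
    """Cantor function at a rational x in [0, 1], read off from the ternary digits of x."""
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        if x == 0:
            break
        x *= 3
        digit = x.numerator // x.denominator  # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            # First digit 1: the function takes the value reached at this digit.
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value

def stage_intervals(n):
    """Endpoints of the 2**n intervals left after n steps of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))
            refined.append((b - third, b))
        intervals = refined
    return intervals

for n in (1, 4, 8):
    intervals = stage_intervals(n)
    total_length = sum(b - a for a, b in intervals)
    total_increment = sum(cantor(b) - cantor(a) for a, b in intervals)
    print(n, float(total_length), total_increment)
```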
The question that now arises is whether the necessary condition for
absolute continuity in Proposition 6.6 is also sufficient. The answer is yes,
as Proposition 6.7 shows.
PROPOSITION 6.7
Suppose f is defined on [a, b] and for each $\epsilon > 0$, there is a $\delta > 0$ such
that if $\{(a_k, b_k)\}_{k=1}^{n}$ is any finite sequence of pairwise disjoint subintervals
of [a, b] with $\sum_{k=1}^{n}(b_k - a_k) < \delta$, then $\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \epsilon$. Then f is
absolutely continuous on [a, b].
To establish Proposition 6.7, we first prove the following two lemmas.
LEMMA 6.4
Suppose f satisfies the hypotheses of Proposition 6.7. Then f is of bounded
variation on [a, b] and $f' \in \mathcal{L}^1([a, b])$.
PROOF: By assumption, we can choose $\delta > 0$ so that if $\{(a_k, b_k)\}_{k=1}^{n}$
is any finite sequence of pairwise disjoint subintervals of [a, b] such that
$\sum_{k=1}^{n}(b_k - a_k) < \delta$, then $\sum_{k=1}^{n} |f(b_k) - f(a_k)| < 1$.
Choose $N \in \mathcal{N}$ so that $(b - a)/N < \delta$ and consider the partition of [a, b],
$a = x_0 < x_1 < \cdots < x_N = b$, that divides it into N intervals of length
$(b - a)/N$. If $x_{k-1} = y_0 < y_1 < \cdots < y_m = x_k$ is any partition of $[x_{k-1}, x_k]$,
then $\sum_{j=1}^{m}(y_j - y_{j-1}) = x_k - x_{k-1} < \delta$ and so $\sum_{j=1}^{m} |f(y_j) - f(y_{j-1})| < 1$.
Therefore, $V_{x_{k-1}}^{x_k} f \le 1$ for k = 1, 2, ..., N. Using (6.14) on page 332, we
can now conclude that
$$V_a^b f = \sum_{k=1}^{N} V_{x_{k-1}}^{x_k} f \le N.$$
Hence, f is of bounded variation on [a, b].
Next we must prove that $f' \in \mathcal{L}^1([a, b])$. Since f is of bounded variation
on [a, b], it is differentiable almost everywhere on [a, b]. To show that
$f'$ is measurable, we first extend the domain of f by setting $f(x) = f(b)$
for $x > b$. Then we note that because f is measurable, so is the function
$f_n(x) = n(f(x + 1/n) - f(x))$ for each $n \in \mathcal{N}$ (why?); and, because we
have $f_n(x) \to f'(x)$ ae, it follows from Proposition 3.14 on page 163 that
$f'$ is measurable.
Finally, we show that $f'$ is Lebesgue integrable. Since f is of bounded
variation, we can write $f = f_1 - f_2 + if_3 - if_4$, where $f_j$, $1 \le j \le 4$, are
nondecreasing.
Now, let g be any nondecreasing function on [a, b]. Then $g'$ exists ae
on [a, b] and so
$$\lim_{n\to\infty} \frac{g(x + 1/n) - g(x)}{1/n} = g'(x) \quad \text{ae}.$$
Define $g(x) = g(b)$ for $x > b$. The functions $n(g(x + 1/n) - g(x))$ are
nonnegative and hence, by Fatou's lemma,
$$\int_a^b g'(x)\,dx \le \liminf_{n\to\infty} n \int_a^b \left[ g\left(x + \frac{1}{n}\right) - g(x) \right] dx$$
$$= \liminf_{n\to\infty} n \left\{ \int_{a+1/n}^{b+1/n} g(t)\,dt - \int_a^b g(t)\,dt \right\}$$
$$= \liminf_{n\to\infty} n \left\{ \int_b^{b+1/n} g(t)\,dt - \int_a^{a+1/n} g(t)\,dt \right\}$$
$$= g(b) - \limsup_{n\to\infty} n \int_a^{a+1/n} g(t)\,dt.$$
Noting $n\int_a^{a+1/n} g(t)\,dt \ge g(a)$ for all $n \in \mathcal{N}$, we conclude that
$$\int_a^b g'(x)\,dx \le g(b) - g(a). \tag{6.27}$$
Returning to f, we have $f' = f_1' - f_2' + if_3' - if_4'$ almost everywhere,
so that $|f'| \le \sum_{j=1}^{4} f_j'$. Hence, by (6.27),
$$\int_a^b |f'(x)|\,dx \le \sum_{j=1}^{4} \bigl(f_j(b) - f_j(a)\bigr) < \infty.$$
This shows that $f' \in \mathcal{L}^1([a, b])$ and completes the proof of the lemma.
LEMMA 6.5
Suppose g satisfies the hypotheses of Proposition 6.7. If $g' = 0$ ae on [a, b],
then g is constant on [a, b].
PROOF: Let $x \in (a, b]$ be fixed but arbitrary. We will establish the lemma
by showing that $g(x) = g(a)$. By hypothesis, $g'(y) = 0$ for almost all
$y \in (a, x)$. Let $E = \{\, y \in (a, x) : g'(y) = 0 \,\}$ and note that $\lambda(E) = x - a$.
Let $\epsilon > 0$. Then, by assumption, we can choose $\delta > 0$ such that if
$\{(a_k, b_k)\}_{k=1}^{m}$ is any finite disjoint collection of subintervals of [a, b] with
the property that $\sum_{k=1}^{m}(b_k - a_k) < \delta$, then
$$\sum_{k=1}^{m} |g(b_k) - g(a_k)| < \frac{\epsilon}{2}. \tag{6.28}$$
If $y \in E$, then $\lim_{h\to 0+}(g(y + h) - g(y))/h = 0$ and, therefore, for
h sufficiently small,
$$\frac{|g(y + h) - g(y)|}{h} < \frac{\epsilon}{2(x - a)}. \tag{6.29}$$
Because $y \in E \subset (a, x)$, we also have $y + h \in (a, x)$ for h sufficiently small.
It follows that the collection, $\mathcal{V}$, of all closed intervals of the form
$[y, y + h]$, where $h > 0$, $y \in E$, $|g(y + h) - g(y)| < h\epsilon/2(x - a)$, and
$[y, y + h] \subset (a, x)$ is a Vitali cover of E. So, by the Vitali covering theorem,
there exist pairwise disjoint members of $\mathcal{V}$, say $[y_j, y_j + h_j]$, $1 \le j \le n$,
such that
$$\lambda\Bigl( E \setminus \bigcup_{j=1}^{n} [y_j, y_j + h_j] \Bigr) < \delta. \tag{6.30}$$
By relabeling, we can assume that $a < y_1 < y_2 < \cdots < y_n < x$.
Therefore, we have Fig. 6.3, to which it will be helpful to refer in the
ensuing discussion.
FIGURE 6.3  A number line showing the points $a < y_1 < y_1 + h_1 < y_2 < y_2 + h_2 < \cdots < y_n < y_n + h_n < x$
Now, from (6.30) and the fact that $\lambda(E) = x - a$, we can conclude
that $\sum_{j=1}^{n} h_j > x - a - \delta$. It follows that
$$(y_1 - a) + \sum_{j=1}^{n-1} \bigl(y_{j+1} - (y_j + h_j)\bigr) + \bigl(x - (y_n + h_n)\bigr) < \delta.$$
In other words, the sum of the lengths of the pairwise disjoint intervals
$(a, y_1)$, $(y_1 + h_1, y_2)$, ..., $(y_n + h_n, x)$ is less than $\delta$. Therefore, by (6.28),
$$|g(y_1) - g(a)| + \sum_{j=1}^{n-1} |g(y_{j+1}) - g(y_j + h_j)| + |g(x) - g(y_n + h_n)| < \frac{\epsilon}{2}.$$
On the other hand, by (6.29),
$$\sum_{k=1}^{n} |g(y_k + h_k) - g(y_k)| < \sum_{k=1}^{n} \frac{h_k \epsilon}{2(x - a)} \le \frac{\epsilon}{2}.$$
Consequently, by the previous two relations and the triangle inequality,
$$|g(x) - g(a)| \le |g(y_1) - g(a)| + |g(y_1 + h_1) - g(y_1)| + |g(y_2) - g(y_1 + h_1)| + |g(y_2 + h_2) - g(y_2)| + \cdots + |g(y_n + h_n) - g(y_n)| + |g(x) - g(y_n + h_n)| < \epsilon.$$
As $\epsilon > 0$ was chosen arbitrarily, this shows that $g(x) = g(a)$.
We can now prove Proposition 6.7 (page 346). Let f satisfy the hypotheses
of that proposition. From Lemma 6.4, we know $f' \in \mathcal{L}^1([a, b])$.
Now, set
$$F(x) = \int_a^x f'(t)\,dt, \qquad a \le x \le b.$$
Then $F' = f'$ ae on [a, b] (Theorem 6.5) and F is absolutely continuous
on [a, b] (Proposition 6.4(a)). The latter fact and Proposition 6.6 imply
that F satisfies the hypotheses of Proposition 6.7 and, consequently, so
does F - f.
But $(F - f)' = F' - f' = 0$ ae on [a, b] and, consequently, by Lemma 6.5,
F - f is constant on [a, b]. Because $F(a) = 0$, we can now conclude that
for all $x \in [a, b]$, $F(x) - f(x) = -f(a)$; or, in other words,
$$f(x) = f(a) + \int_a^x f'(t)\,dt, \qquad a \le x \le b.$$
This shows that f is absolutely continuous and completes the proof of
Proposition 6.7.
Second Fundamental Theorem of Calculus
We summarize Propositions 6.6 and 6.7 in the following theorem, which is
often referred to as the second fundamental theorem of calculus for
Lebesgue integration.
THEOREM 6.6 Second Fundamental Theorem of Calculus
Suppose f is defined on [a, b]. A necessary and sufficient condition for $f'$ to
exist almost everywhere and be Lebesgue integrable on [a, b], and for
$$\int_a^x f'(t)\,dt = f(x) - f(a), \qquad a \le x \le b,$$
is that for each $\epsilon > 0$ there is a $\delta > 0$ such that $\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \epsilon$
whenever $\{(a_k, b_k)\}_{k=1}^{n}$ is a finite sequence of pairwise disjoint subintervals
of [a, b] with $\sum_{k=1}^{n}(b_k - a_k) < \delta$.
We conclude this section by giving necessary and sufficient conditions
for a function to be absolutely continuous on TZ. The proof is left as an
exercise for the reader. (See Exercise 6.56.)
THEOREM 6.7
A function f is absolutely continuous on $\mathcal{R}$ if and only if it is absolutely
continuous on every finite closed interval, $V_{-\infty}^{\infty} f < \infty$, and $\lim_{x\to-\infty} f(x) = 0$.
EXERCISES 6.4
6.41	Establish the following facts.
a)  Suppose f is defined on [a, b] and $f'$ exists and is Riemann integrable
on [a, b]. Prove that f is absolutely continuous on [a, b]. Conclude that
f is absolutely continuous on [a, b] if $f'$ is continuous on [a, b].
b)  Suppose f is continuous on $\mathcal{R}$, $V_{-\infty}^{\infty} f < \infty$, and $\lim_{x\to-\infty} f(x) = 0$.
Further suppose there exists a finite number of points such that f is
absolutely continuous on any finite closed interval that contains none of
those points. Prove that f is absolutely continuous on $\mathcal{R}$.
c)  Suppose F is a continuous distribution function such that $F'$ exists and is
continuous except at a finite number of points. Prove that F is absolutely
continuous on $\mathcal{R}$.
6.42  Prove that $f(x) = \sqrt{x}$ is absolutely continuous on [0,1].
6.43  Show that if f and g are absolutely continuous on [a, b], then f + g is
absolutely continuous on [a, b].
6.44  Show that if f is absolutely continuous on [a, b] and $\alpha \in \mathbb{C}$, then $\alpha f$ is
absolutely continuous on [a, b].
6.45  Let $\alpha > 0$ and define $f\colon \mathcal{R} \to \mathcal{R}$ by $f(x) = e^{-\alpha|x|}$. Show that f is absolutely
continuous on $\mathcal{R}$.
6.46  Is $f(x) = 1$ absolutely continuous on [0,1]? On $\mathcal{R}$?
6.47  Define f on [0,1] by $f(0) = 0$ and $f(x) = x\sin(1/x)$, for $x \ne 0$. From
Example 6.7 on page 345, we know that f is not absolutely continuous
on [0,1]. Show that f is absolutely continuous on any interval [a, b] not
containing 0.
6.48  Define f on [0,1] by $f(0) = 0$ and $f(x) = x^{\alpha}\sin(1/x)$, for $x \ne 0$. Show that
f is absolutely continuous on [0,1] if and only if $\alpha > 1$.
6.49	Let f be real-valued and absolutely continuous on [a, b\. Prove that
a)  f takes sets of measure zero to sets of measure zero; that is, if $E \subset [a, b]$
and $\lambda(E) = 0$, then $\lambda(f(E)) = 0$.
b)  f takes measurable sets to measurable sets; that is, if $A \subset [a, b]$ is
measurable, then so is f(A). Hint: Choose an increasing sequence of
closed sets contained in A such that the set difference between A and
the union of the closed sets has measure zero. Next apply part (a) to
show that the set difference between the image of A and the union of
the images of the closed sets has measure zero.
6.50  Suppose $f\colon [a, b] \to [c, d]$ is absolutely continuous and monotone and g is
absolutely continuous on [c, d]. Prove that $g \circ f$ is absolutely continuous
on [a, b].
6.51  Proposition 6.5 on page 344 states in part that if f is absolutely continuous
on [a, b], then $V_a^b f = \int_a^b |f'(x)|\,dx$. Show that the hypothesis of absolute
continuity cannot be weakened by finding a continuous function of bounded
variation that does not satisfy the previous equation.
6.52  Give an example of a function f that is absolutely continuous on [0,1] but
is such that $f'$ is not Riemann integrable on [0,1].
6.53  Construct an absolutely continuous function on [0,1] that is strictly increasing
but whose derivative vanishes on a set of positive measure. Hint: Let
$P_\alpha$ be as in Exercise 3.39 on page 126 and set $f(x) = \int_0^x \chi_{P_\alpha^c}(t)\,dt$.
6.54  In establishing Lemma 6.4 on page 346, we proved that if f is of bounded
variation on [a, b], then $f' \in \mathcal{L}^1([a, b])$. Here is an alternate derivation of
that result.
a)  Show that f is Lebesgue measurable.
b)  Show that $f'$ is Lebesgue measurable.
c)  Prove that $\int_a^b |f'(x)|\,dx \le V_a^b f$ and, therefore, that $f'$ is Lebesgue
integrable on [a, b]. Hint: Use (6.14) on page 332.
6.55  A function, f, is said to be Lipschitzian on [a, b] if there is a constant M
such that
$$|f(x) - f(y)| \le M|x - y|, \qquad x, y \in [a, b].$$
a)  Show that if f has a bounded derivative on [a, b], then it is Lipschitzian
on [a, b].
b)  Prove that if f is Lipschitzian on [a, b], then it is absolutely continuous
thereon.
6.56	Prove Theorem 6.7. Hint: Use (6.26).
6.57  Integration by parts: Suppose f and g are absolutely continuous on [a, b].
Then $fg'$ and $f'g$ are in $\mathcal{L}^1([a, b])$ and
$$\int_a^b f(x)g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x)g(x)\,dx. \tag{6.31}$$
Establish this result by employing the following steps.
a)  Show that $fg'$ and $f'g$ are in $\mathcal{L}^1([a, b])$.
b)  Prove that fg is absolutely continuous on [a, b]. Hint: Show that the
hypotheses of Proposition 6.7 (page 346) are satisfied.
c)  Prove that (6.31) holds. Hint: Recall the product rule from elementary
calculus.
6.58  Give an example where the integration by parts formula fails in case f is
absolutely continuous and g is uniformly continuous and of bounded variation
but not absolutely continuous.
6.59  Let $h \in \mathcal{L}^1([a, b])$ and define
$$F(x) = \int_a^x h(t)(x - t)^n\,dt, \qquad a \le x \le b.$$
Show that F is n-times differentiable on [a, b], $F^{(n)}$ is absolutely continuous
on [a, b], and $F^{(n+1)} = n!\,h$ ae on [a, b]. Hint: Use induction and integration
by parts.
6.60  Taylor's theorem: Suppose f is defined on [a, b], f is n-times differentiable
on [a, b], and $f^{(n)}$ is absolutely continuous on [a, b]. Then, for $a \le x \le b$,
$$f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \frac{1}{n!}\int_a^x f^{(n+1)}(t)(x - t)^n\,dt. \tag{6.32}$$
Establish this result. Hint: Use induction on n, integration by parts, and
Exercise 6.41.
6.61  Show that the hypothesis of absolute continuity cannot be removed in the
version of Taylor's theorem given in the previous exercise.
6.62  Prove that the converse of Taylor's theorem is true: Suppose that f is
defined on [a, b] and that there are constants $a_0, a_1, \ldots, a_n$ and a function
$h \in \mathcal{L}^1([a, b])$ such that for $a \le x \le b$,
$$f(x) = a_0 + a_1(x - a) + \cdots + a_n(x - a)^n + \int_a^x h(t)(x - t)^n\,dt.$$
Then, on [a, b], f is n-times differentiable and $f^{(n)}$ is absolutely continuous.
Moreover, $a_k = f^{(k)}(a)/k!$, $0 \le k \le n$, and $h = f^{(n+1)}/n!$ ae on [a, b]. Hint:
Use Exercise 6.59.
★6.63  Integration by substitution: Suppose that g is a monotone and absolutely
continuous function on [a, b] with range [c, d] and that $f \in \mathcal{L}^1([c, d])$.
Then $(f \circ g)g' \in \mathcal{L}^1([a, b])$ and
$$\int_a^b f(g(x))|g'(x)|\,dx = \int_c^d f(y)\,dy.$$
Establish this result by employing the following steps.
a)  Show that, without loss of generality, we can assume that g is nondecreasing.
b)  Show that for each open set $O \subset [c, d]$,
$$\lambda(O) = \int_{g^{-1}(O)} g'(x)\,dx.$$
c)  Show that if D is a $G_\delta$-set, then D satisfies the equation in part (b).
d)  Let $H = \{\, x : g'(x) \ne 0 \,\}$. If $E \subset [c, d]$ has measure zero, prove that
$\lambda^*(g^{-1}(E) \cap H) = 0$ and, hence, that $g^{-1}(E) \cap H$ is measurable and has
measure zero.
e)  Prove that if $A \subset [c, d]$ is measurable, then so is $g^{-1}(A) \cap H$ and
$$\lambda(A) = \int_{g^{-1}(A) \cap H} g'(x)\,dx.$$
f)  Show that if f is a nonnegative measurable function on [c, d], then the
function $(f \circ g)g'$ is measurable on [a, b] and
$$\int_a^b f(g(x))g'(x)\,dx = \int_c^d f(y)\,dy.$$
g)  Show that if $f \in \mathcal{L}^1([c, d])$, then $(f \circ g)g' \in \mathcal{L}^1([a, b])$ and the equation
in part (f) holds.
6.64  Let f be the function in Exercise 6.53.
a)  Show that there is a set E of measure zero such that $f^{-1}(E)$ is not
measurable. Hint: Use the fact that any set of positive (outer) measure
contains a nonmeasurable set. Also, refer to Exercise 6.63(e).
b)  Show that the function $g = f^{-1}$ is not absolutely continuous. Hence, the
inverse of an absolutely continuous function, when it exists, need not be
absolutely continuous. Hint: Refer to Exercise 6.49(b).
6.5 SIGNED MEASURES
We discovered earlier (see, e.g., Exercise 4.61) that, if $(\Omega, \mathcal{A}, \mu)$ is a measure
space and f is a nonnegative extended real-valued $\mathcal{A}$-measurable function
on $\Omega$, then the set function
$$\nu(A) = \int_A f\,d\mu, \qquad A \in \mathcal{A}, \tag{6.33}$$
is a measure on $\mathcal{A}$.
What about the converse: If $(\Omega, \mathcal{A}, \mu)$ is a measure space and $\nu$ is
a measure on $\mathcal{A}$, can a nonnegative extended real-valued $\mathcal{A}$-measurable
function f be found such that (6.33) holds? It is easy to see that the
answer to this question is no!
For example, take $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $\nu = \delta_0$, the Dirac measure
concentrated at 0. Then a representation of the form (6.33) is impossible.
Indeed, if f is any nonnegative extended real-valued $\mathcal{M}$-measurable
function, then
$$\delta_0(\{0\}) = 1 \ne 0 = \int_{\{0\}} f\,d\lambda.$$
What goes wrong here is the following: There exists a set A, namely, the
set {0}, such that $\lambda(A) = 0$ but $\delta_0(A) \ne 0$, whereas, by Proposition 4.8(d),
(6.33) forces $\nu(A) = 0$ whenever $\mu(A) = 0$. In other words,
$$\mu(A) = 0 \;\Rightarrow\; \nu(A) = 0 \tag{6.34}$$
is a necessary condition for a measure $\nu$ to be representable as in (6.33).
Remarkably, in most important cases, (6.34) is also a sufficient condition
for that representation. That fact is called the Radon-Nikodym theorem
and will be proved in Section 6.6.
EXAMPLE 6.8 Illustrates (6.33) and (6.34)
a)	Let $\mathcal{I}$ denote the collection of all intervals $I \subset \mathcal{R}$, including degenerate
intervals of the form $(a,a)$ and $[a,a]$. Also, let $\nu$ be the measure on $\mathcal{B}$
such that $\nu(I) = 0$ if $I \subset (-\infty, 0)$ and $\nu(I) = e^{-3a} - e^{-3b}$ if $I$ has
endpoints $a$ and $b$, where $0 \le a \le b < \infty$; according to Corollary 4.6 on
page 218, $\nu$ is determined by these conditions. If we let $\mu = \lambda|_{\mathcal{B}}$, then
$\nu$ has a representation in the form (6.33), namely,
    $\nu(B) = \int_B f \, d\lambda, \quad B \in \mathcal{B},$
where $f(x) = 3e^{-3x}$ for $x > 0$, and $f(x) = 0$ otherwise. This is true
because the measure $\omega(B) = \int_B f \, d\lambda$ agrees with $\nu$ on intervals and so,
by Corollary 4.6, must equal $\nu$.
b)	Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $E \in \mathcal{M}$. Set $\nu(A) = \lambda(A \cap E)$, $A \in \mathcal{M}$.
Clearly (6.34) holds. Here it is obvious that $\nu$ can be represented in the
form (6.33), namely, $\nu(A) = \int_A \chi_E \, d\lambda$.
c)	Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$ and $\nu = \delta_0$. As we have seen, $\nu$ cannot be
represented in the form (6.33) if $\mu = \lambda$. However, if $\mu$ is counting
measure on $(\mathcal{R}, \mathcal{M})$, then $\nu$ does have such a representation, namely, with
$f = \chi_{\{0\}}$.
d)	Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{B})$ and $\psi$ be the Cantor function. Define
    $F(x) = \begin{cases} 0, & x < 0; \\ \psi(x), & 0 \le x \le 1; \\ 1, & x > 1. \end{cases}$
Note that $F$ is a distribution function, that is, it satisfies (a)-(d) of
Definition 4.20 on page 226. Therefore, by Theorem 4.13 on page 226,
there is a unique finite Borel measure, $\nu$, having $F$ as its distribution
function. We claim that $\nu$ has no representation in the form (6.33) if
$\mu = \lambda|_{\mathcal{B}}$. Suppose to the contrary that there is a nonnegative Borel
measurable function $f$ such that $\nu(A) = \int_A f \, d\lambda$, for $A \in \mathcal{B}$. Because
$\nu(\mathcal{R}) = 1$, $f \in \mathcal{L}^1(\mathcal{R})$; hence, if we let $g = f|_{[0,1]}$, then $g \in \mathcal{L}^1([0,1])$.
Moreover, for $0 \le x \le 1$,
    $\psi(x) = F(x) - F(0) = \nu((0,x]) = \int_{(0,x]} f \, d\lambda = \int_0^x g(t) \, dt.$
This implies that $\psi$ is absolutely continuous on $[0,1]$, which we know is
not true. So $\nu$ cannot be represented in the form (6.33) if $\mu = \lambda|_{\mathcal{B}}$. □
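The representation in part (a) can be checked numerically. The following is a minimal sketch, not part of the text: the function names and the midpoint rule are illustrative assumptions, and the check covers only intervals with endpoints $0 \le a \le b$.

```python
import math

def density(x):
    """Density f from Example 6.8(a): 3*exp(-3x) for x > 0, and 0 otherwise."""
    return 3.0 * math.exp(-3.0 * x) if x > 0 else 0.0

def nu_interval(a, b):
    """nu(I) for an interval I with endpoints 0 <= a <= b, as defined in the example."""
    return math.exp(-3.0 * a) - math.exp(-3.0 * b)

def integral_of_density(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b] (an assumed method)."""
    h = (b - a) / steps
    return sum(density(a + (i + 0.5) * h) for i in range(steps)) * h
```

For instance, `integral_of_density(0.5, 2.0)` and `nu_interval(0.5, 2.0)` should agree up to the discretization error of the midpoint rule, and both vanish on a degenerate interval.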
Signed Measures
In order to prove the Radon-Nikodym theorem, we need to introduce signed
measures. This is a simple generalization of measures where the nonneg-
ativity condition is dropped. Thus, any measure is a signed measure, but
not conversely.
DEFINITION 6.9 Signed Measure
Let $(\Omega, \mathcal{A})$ be a measurable space. A signed measure, $\nu$, on $\mathcal{A}$ is an
extended real-valued function satisfying the following conditions:
a)	$\nu(\emptyset) = 0$.
b)	If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
    $\nu\Bigl( \bigcup_n A_n \Bigr) = \sum_n \nu(A_n).$
Remark: The equation in (b) of Definition 6.9 is taken in the extended
sense; that is, any values in $\mathcal{R}^*$ are permitted. However, the right-hand
side of the equation must make sense: it must converge or it must diverge
to $\pm\infty$. In particular, $\nu$ cannot take on both $\infty$ and $-\infty$ as values; that
is, if $\nu(E) = \infty$ for some $E \in \mathcal{A}$, then $\nu(A) > -\infty$ for all $A \in \mathcal{A}$, and if
$\nu(E) = -\infty$ for some $E \in \mathcal{A}$, then $\nu(A) < \infty$ for all $A \in \mathcal{A}$.
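EXAMPLE 6.9 Illustrates Definition 6.9
a)	Let $(\Omega, \mathcal{A})$ be a measurable space, $\alpha_1, \alpha_2 \in \mathcal{R}$, and $\mu_1$ and $\mu_2$ measures
on $\mathcal{A}$, at least one of which is finite. Define $\alpha_1\mu_1 + \alpha_2\mu_2$ on $\mathcal{A}$ by
    $(\alpha_1\mu_1 + \alpha_2\mu_2)(A) = \alpha_1\mu_1(A) + \alpha_2\mu_2(A).$
It is easy to see that $\alpha_1\mu_1 + \alpha_2\mu_2$ is a signed measure on $\mathcal{A}$. Note that
if $\alpha_1, \alpha_2 \ge 0$, then $\alpha_1\mu_1 + \alpha_2\mu_2$ is a measure on $\mathcal{A}$.
b)	Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \mathcal{L}^1(\mu)$ be extended real-valued.
Define
    $\nu(A) = \int_A f \, d\mu, \quad A \in \mathcal{A}.$
By Exercise 4.72 on page 201, $\nu$ is a signed measure on $\mathcal{A}$. If $f$ is
nonnegative, then $\nu$ is a measure.	□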
The Hahn Decomposition Theorem
From Example 6.9(a) it follows that the difference of two measures on a
$\sigma$-algebra, at least one of which is finite, is a signed measure. Our next goal
is to show that the converse is also true: Any signed measure on a $\sigma$-algebra
can be expressed as the difference of two measures on that $\sigma$-algebra, at
least one of which is finite.
The idea is the following. Let $\nu$ be a signed measure on a $\sigma$-algebra $\mathcal{A}$.
Suppose we can find a set $D \in \mathcal{A}$ such that $\nu(E) \ge 0$ for each $\mathcal{A}$-measurable
subset $E$ of $D$ and such that $\nu(E) \le 0$ for each $\mathcal{A}$-measurable subset $E$ of $D^c$.
Then we can define set functions, $\nu^+$ and $\nu^-$, on $\mathcal{A}$ by $\nu^+(A) = \nu(A \cap D)$
and $\nu^-(A) = -\nu(A \cap D^c)$. And it is easy to see that $\nu^+$ and $\nu^-$ are
measures on $\mathcal{A}$ and that $\nu = \nu^+ - \nu^-$.
The existence of such a set $D$ is the substance of the Hahn decomposition
theorem, and the pair $(D, D^c)$ is called a Hahn decomposition
for $\nu$. We begin with the following definition.
DEFINITION 6.10 Positive and Negative Sets
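Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. A
set $P \in \mathcal{A}$ is called positive for $\nu$ if $\nu(E) \ge 0$ for all sets $E \in \mathcal{A}$
with $E \subset P$. A set $N \in \mathcal{A}$ is called negative for $\nu$ if $\nu(E) \le 0$ for all
sets $E \in \mathcal{A}$ with $E \subset N$.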
EXAMPLE 6.10 Illustrates Definition 6.10
a)	Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$, $\mu = \delta_0 + \delta_1$, and $\nu = \lambda - \mu$. Note that for $A \in \mathcal{M}$,
    $\nu(A) = \begin{cases} \lambda(A), & \text{if } \{0,1\} \cap A = \emptyset; \\ \lambda(A) - 1, & \text{if exactly one of } 0, 1 \text{ is in } A; \\ \lambda(A) - 2, & \text{if } \{0,1\} \subset A. \end{cases}$
The sets $\mathcal{R} \setminus \{0,1\}$ and $(1,2)$ are positive for $\nu$; the sets $\{0,1\}$ and $Q$ are
negative for $\nu$; the set $\{2,3,4,\ldots\}$ is both positive and negative for $\nu$;
and the set $[0,1]$ is neither positive nor negative for $\nu$, although we
have $\nu([0,1]) = -1$. In fact, the positive sets for $\nu$ are precisely those
$\mathcal{M}$-measurable sets containing neither 0 nor 1; and the negative sets
for $\nu$ are precisely those $\mathcal{M}$-measurable sets having Lebesgue measure
zero. Moreover, the pair consisting of $\mathcal{R} \setminus \{0,1\}$ and its complement is
a Hahn decomposition for $\nu$, as is the pair consisting of $\mathcal{R} \setminus \{0,1,2\}$ and
its complement, etc.
b)	Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a measure on $\mathcal{A}$. Then every
$\mathcal{A}$-measurable set is positive for $\nu$; the negative sets for $\nu$ are the
$\mathcal{A}$-measurable sets having $\nu$-measure zero. A Hahn decomposition for $\nu$
is $(\Omega, \emptyset)$.
c)	Any set that is both positive and negative for a signed measure $\nu$ must
have $\nu$-measure zero. However, there may be sets of $\nu$-measure zero that
are neither positive nor negative for $\nu$. For instance, if $\nu$ is the signed
measure defined in part (a), then the set $[0,1)$ has $\nu$-measure zero, but
is neither positive nor negative for $\nu$.	□
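The values quoted in parts (a) and (c) can be reproduced mechanically. The sketch below is only an illustration under stated assumptions: the function `nu` and its boolean endpoint flags are hypothetical names, and it evaluates $\nu = \lambda - \delta_0 - \delta_1$ on a single bounded interval only.

```python
def nu(a, b, left_closed, right_closed):
    """Evaluate nu = lambda - delta_0 - delta_1 on an interval with endpoints a <= b.

    The flags say whether each endpoint belongs to the interval; a degenerate
    interval (a == b) is nonempty only when both flags are True.
    """
    def contains(p):
        if a < p < b:
            return True
        if p == a and left_closed and (a < b or right_closed):
            return True
        if p == b and right_closed and (a < b or left_closed):
            return True
        return False

    length = b - a
    atoms = sum(1 for p in (0, 1) if contains(p))
    return length - atoms
```

With this helper, `nu(0, 1, True, True)` gives the value of $\nu([0,1])$ and `nu(0, 1, True, False)` the value of $\nu([0,1))$; the subsets $\{0\}$ and $(0,1)$ of $[0,1)$ have measures of opposite sign, which is why $[0,1)$ is neither positive nor negative.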
Using the terminology introduced in Definition 6.10, we see that in a
Hahn decomposition, $(D, D^c)$, the set $D$ is positive for $\nu$ and its
complement is negative for $\nu$. Thus, intuitively, the set $D$ should in some sense
be a maximal positive set.
In proving the Hahn decomposition theorem, we will need several lem-
mas. The first lemma shows that signed measures share important prop-
erties with measures. Its proof is left as an exercise for the reader.
LEMMA 6.6
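Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Then the
following hold:
a)	If $A, B \in \mathcal{A}$, $A \subset B$, and $|\nu(B)| < \infty$, then $|\nu(A)| < \infty$.
b)	If $\{E_n\}_{n=1}^\infty \subset \mathcal{A}$ with $E_1 \subset E_2 \subset \cdots$, then
    $\nu\Bigl( \bigcup_{n=1}^\infty E_n \Bigr) = \lim_{n\to\infty} \nu(E_n).$
c)	If $\{E_n\}_{n=1}^\infty \subset \mathcal{A}$ with $E_1 \supset E_2 \supset \cdots$ and $|\nu(E_1)| < \infty$, then
    $\nu\Bigl( \bigcap_{n=1}^\infty E_n \Bigr) = \lim_{n\to\infty} \nu(E_n).$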
LEMMA 6.7
A countable union of positive sets is positive.
PROOF: Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$.
Suppose $\{P_n\}_{n=1}^\infty \subset \mathcal{A}$ is a sequence of positive sets for $\nu$. We claim that
$\bigcup_{n=1}^\infty P_n$ is positive for $\nu$.
Let $E \subset \bigcup_{n=1}^\infty P_n$. We must show that $\nu(E) \ge 0$. To that end, we
"disjointize" $E$ as follows. Set $E_1 = E \cap P_1$ and, for $n \ge 2$,
    $E_n = E \cap P_n \setminus \bigcup_{k=1}^{n-1} P_k.$
Then the $E_n$s are pairwise disjoint and $\bigcup_n E_n = E$. Since $P_n$ is positive
for $\nu$ and $E_n \subset P_n$, we have $\nu(E_n) \ge 0$. Hence, $\nu(E) = \sum_n \nu(E_n) \ge 0$. ■
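The disjointization step in this proof can be illustrated on a finite set. The following is a sketch under assumptions not made in the text: $\Omega$ is finite, the signed measure is given by hypothetical point weights, and `disjointize` is an illustrative name.

```python
def nu(weights, S):
    """Signed measure of a finite set S, given point weights (an illustrative model)."""
    return sum(weights[x] for x in S)

def disjointize(E, positive_sets):
    """Return the pieces E_n = (E & P_n) minus (P_1 | ... | P_{n-1}) from the proof."""
    pieces, seen = [], set()
    for P in positive_sets:
        pieces.append((E & P) - seen)   # part of E in P_n not already covered
        seen |= P
    return pieces

# Hypothetical data: the point 4 carries negative weight and lies in no P_n.
weights = {1: 2, 2: 0, 3: 1, 4: -5, 5: 3}
positive_sets = [{1, 2}, {2, 3}, {3, 5}]
E = {1, 3, 5}
pieces = disjointize(E, positive_sets)
```

Each piece lies in a single positive set, so its measure is nonnegative, and the pieces add up to $E$; that is exactly the countable additivity argument used above, here with finitely many sets.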
The next lemma shows that any set of finite positive $\nu$-measure has a
subset of positive $\nu$-measure that is positive for $\nu$.
LEMMA 6.8
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Suppose
$A \in \mathcal{A}$ and $0 < \nu(A) < \infty$. Then there is an $\mathcal{A}$-measurable set $P \subset A$ that
is positive for $\nu$ and satisfies $\nu(P) > 0$.
PROOF: Note that by Lemma 6.6(a), any subset of $A$ has finite $\nu$-measure.
The idea of the proof is to keep extracting sets from $A$ that have negative
$\nu$-measure of large magnitude.
If $A$ is positive for $\nu$, we are done. Otherwise, $A$ contains a set with
negative $\nu$-measure. In that case, let $L_1 = \inf\{ \nu(E) : E \subset A \}$. By
assumption, $L_1 < 0$; so there is an $n \in \mathcal{N}$ such that $L_1 < -n^{-1}$. Let
$n_1$ be the smallest such $n$. By definition of $L_1$, there is an $A_1 \subset A$ such
that $\nu(A_1) < -n_1^{-1}$. Note that if $n < n_1$, then $L_1 \ge -n^{-1}$ and hence
$\nu(E) \ge -n^{-1}$ for all $E \subset A$.
Extract $A_1$ from $A$; that is, consider $A \setminus A_1$. Note that
    $\nu(A \setminus A_1) = \nu(A) - \nu(A_1) > \nu(A) + \frac{1}{n_1} > 0.$
If $A \setminus A_1$ is positive for $\nu$, we are done. Otherwise, $A \setminus A_1$ contains a set
with negative $\nu$-measure. In that case, let $L_2 = \inf\{ \nu(E) : E \subset A \setminus A_1 \}$.
By assumption, $L_2 < 0$; so there is an $n \in \mathcal{N}$ such that $L_2 < -n^{-1}$. Let
$n_2$ be the smallest such $n$. By definition of $L_2$, there is an $A_2 \subset A \setminus A_1$
such that $\nu(A_2) < -n_2^{-1}$. Note that if $n < n_2$, then $L_2 \ge -n^{-1}$ and hence
$\nu(E) \ge -n^{-1}$ for all $E \subset A \setminus A_1$.
Extract $A_2$ from $A \setminus A_1$; that is, consider $A \setminus A_1 \setminus A_2 = A \setminus (A_1 \cup A_2)$.
Note that
    $\nu(A \setminus (A_1 \cup A_2)) = \nu(A) - (\nu(A_1) + \nu(A_2)) > \nu(A) + \frac{1}{n_1} + \frac{1}{n_2} > 0.$
If this process terminates after a finite number of steps, we are done.
Otherwise, we obtain a sequence of pairwise disjoint subsets $\{A_k\}_{k=1}^\infty$ of $A$
and a sequence of positive integers $\{n_k\}_{k=1}^\infty$ such that for each $k \in \mathcal{N}$,
$A_k \subset A \setminus \bigcup_{j=1}^{k-1} A_j$, $\nu(A_k) < -n_k^{-1}$, and $n_k$ is the smallest positive integer $n$
for which there is a subset of $A \setminus \bigcup_{j=1}^{k-1} A_j$ having $\nu$-measure less than $-n^{-1}$.
Let $P = A \setminus \bigcup_{k=1}^\infty A_k$, and note that
    $\nu(P) = \nu(A) - \sum_{k=1}^\infty \nu(A_k) > \nu(A) + \sum_{k=1}^\infty \frac{1}{n_k} > 0.$
We claim that $P$ is positive for $\nu$. Suppose to the contrary that there is
a set $B \subset P$ with $\nu(B) < 0$. Since $\nu(A) < \infty$, Lemma 6.6(a) implies that
$\nu(P) < \infty$. Consequently, $\sum_{k=1}^\infty n_k^{-1} < \infty$ and so, in particular, $n_k \to \infty$
as $k \to \infty$. Since $n_k \to \infty$, there is a $k_0$ such that $(n_{k_0} - 1)^{-1} < -\nu(B)$ or,
in other words, $\nu(B) < -(n_{k_0} - 1)^{-1}$. But $B \subset P$ and so $B \subset A \setminus \bigcup_{j=1}^{k_0 - 1} A_j$.
This contradicts the minimality of $n_{k_0}$. Hence $P$ is positive for $\nu$. ■
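On a finite set the extraction procedure of this proof always terminates, so it can be run directly. The following is a brute-force sketch, not the authors' construction in general: it assumes a finite $\Omega$ with hypothetical rational point weights, enumerates all subsets, and picks as $A_k$ the first subset found with $\nu$-measure below $-1/n_k$ (the proof allows any such set).

```python
from fractions import Fraction
from itertools import combinations
from math import floor

def nu(weights, S):
    """Signed measure of a finite set S, given rational point weights."""
    return sum((weights[x] for x in S), Fraction(0))

def subsets(S):
    """All subsets of the finite set S, smallest first."""
    items = sorted(S)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def extract_positive_subset(weights, A):
    """Remove sets A_1, A_2, ... as in the proof of Lemma 6.8 until what is left is positive."""
    current = frozenset(A)
    removed = []
    while True:
        L = min(nu(weights, E) for E in subsets(current))  # infimum over subsets
        if L >= 0:
            return current, removed                        # current is positive
        n = floor(1 / -L) + 1        # smallest positive integer n with L < -1/n
        A_k = next(E for E in subsets(current) if nu(weights, E) < Fraction(-1, n))
        removed.append((n, A_k))
        current -= A_k

# Hypothetical weights with nu(A) = 3 - 2/3 - 1/3 + 1 = 3 > 0.
weights = {"a": Fraction(3), "b": Fraction(-2, 3), "c": Fraction(-1, 3), "d": Fraction(1)}
P, removed = extract_positive_subset(weights, weights.keys())
```

In this run the first step has $L_1 = -1$ and $n_1 = 2$, and the second has $L_2 = -1/3$ and $n_2 = 4$; the remaining set $P$ has positive measure and no subset of negative measure.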
THEOREM 6.8 Hahn Decomposition Theorem
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Then there
is a set $D \in \mathcal{A}$ such that $D$ is positive for $\nu$ and $D^c$ is negative for $\nu$. The
pair $(D, D^c)$ is called a Hahn decomposition for $\nu$.
PROOF: We can assume without loss of generality that $\nu$ does not take
on the value $\infty$. (Why can this be done?)
As we mentioned earlier, $D$ should in some sense be a maximal positive
set for $\nu$. With that idea in mind, let
    $p = \sup\{ \nu(P) : P \text{ positive for } \nu \}.$
Then we can choose a sequence $\{P_n\}_{n=1}^\infty$ of positive sets for $\nu$ such that
$\lim_{n\to\infty} \nu(P_n) = p$. Let $D_n = \bigcup_{k=1}^n P_k$ and $D = \bigcup_{n=1}^\infty D_n$. Applying
Lemma 6.7 twice we see, in turn, that $D_1$, $D_2$, ..., are positive for $\nu$ and
$D$ is positive for $\nu$.
To show that $(D, D^c)$ is a Hahn decomposition for $\nu$, it remains to
prove that $D^c$ is negative for $\nu$. To that end, we first note that because
$D_1 \subset D_2 \subset \cdots$, Lemma 6.6(b) implies that $\nu(D) = \lim_{n\to\infty} \nu(D_n)$. Also,
$D_n \supset P_n$ and so $D_n = P_n \cup (D_n \setminus P_n)$. Since $D_n \setminus P_n \subset D_n$ and $D_n$ is
positive, $\nu(D_n \setminus P_n) \ge 0$; hence, $\nu(D_n) \ge \nu(P_n)$. Recalling that $D$ is
positive, we have
    $p \ge \nu(D) = \lim_{n\to\infty} \nu(D_n) \ge \lim_{n\to\infty} \nu(P_n) = p.$
Consequently, $\nu(D) = p$. Since $\nu$ does not assume the value $\infty$, we must
have $p < \infty$.
Now, suppose that $D^c$ is not negative for $\nu$. Then there is a set $A \subset D^c$
with $\nu(A) > 0$ and, by assumption, $\nu(A) < \infty$. Thus, by Lemma 6.8,
$A$ contains a set $P$ that is positive for $\nu$ and has positive $\nu$-measure. Since
$P$ and $D$ are positive, so is $P \cup D$; and since $P \subset D^c$, $P \cap D = \emptyset$. Hence,
$P \cup D$ is positive for $\nu$ and
    $\nu(P \cup D) = \nu(P) + \nu(D) > p.$
This contradicts the definition of $p$. Hence $D^c$ is negative for $\nu$. ■
Is the Hahn decomposition for a signed measure unique? In general,
it is not. For example, consider the signed measure $\nu$ defined in
Example 6.10(a) on page 357. Then $(\mathcal{R} \setminus \{0,1\}, \{0,1\})$ and $(\mathcal{R} \setminus \{0,1,2\}, \{0,1,2\})$
are both Hahn decompositions for $\nu$. However, as Exercise 6.69 shows, if
$(D, D^c)$ and $(E, E^c)$ are two Hahn decompositions for a signed measure $\nu$,
then $D$ and $E$ differ by a set that contains only sets of $\nu$-measure zero, and
likewise for $D^c$ and $E^c$.
The Jordan Decomposition Theorem
Now that we have established the existence of a Hahn decomposition for
a signed measure, we can easily prove that any such measure can be ex-
pressed as the difference of two measures, a result known as the Jordan
decomposition theorem. Before doing so, we introduce the following
terminology.
DEFINITION 6.11 Mutually Singular Measures
Let $(\Omega, \mathcal{A})$ be a measurable space. Two measures, $\mu_1$ and $\mu_2$, on $\mathcal{A}$
are said to be mutually singular, denoted $\mu_1 \perp \mu_2$, if there is a set
$E \in \mathcal{A}$ such that $\mu_1(E^c) = 0$ and $\mu_2(E) = 0$.
Note that if $\mu_1 \perp \mu_2$, then $\mu_1$ and $\mu_2$ are supported by complementary
sets: For each $A \in \mathcal{A}$, $\mu_1(A) = \mu_1(A \cap E)$ and $\mu_2(A) = \mu_2(A \cap E^c)$.
EXAMPLE 6.11 Illustrates Definition 6.11
a)	Let $\lambda$ be Lebesgue measure and $\mu$ any discrete measure on $\mathcal{M}$, that is,
there is a countable set, $K$, such that $\mu(K^c) = 0$. Then $\mu \perp \lambda$.
b)	Let $\lambda$ be Lebesgue measure and $\mu$ be the Borel measure induced by the
Cantor function. Then, considered as Borel measures, $\mu \perp \lambda$. Indeed,
if $P$ is the Cantor set then $\mu(P^c) = 0$ and $\lambda(P) = 0$.
c)	Let $D = \{ (x, y) \in \mathcal{R}^2 : y = x \}$. Define $\nu$ by
    $\nu(B) = \lambda(\{ x \in \mathcal{R} : (x, x) \in B \}), \quad B \in \mathcal{B}_2.$
Then $\lambda_2(D) = 0$ and $\nu(D^c) = 0$, so that $\nu \perp \lambda_2$.	□
THEOREM 6.9 Jordan Decomposition Theorem
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Then
$\nu$ can be expressed uniquely as the difference of two mutually singular
measures, $\nu^+$ and $\nu^-$, on $\mathcal{A}$. The representation $\nu = \nu^+ - \nu^-$ is called the
Jordan decomposition of $\nu$.
PROOF: Let $(D, D^c)$ be a Hahn decomposition for $\nu$ and, for $A \in \mathcal{A}$,
define
    $\nu^+(A) = \nu(A \cap D)$ and $\nu^-(A) = -\nu(A \cap D^c).$
Clearly, $\nu = \nu^+ - \nu^-$ and, because $(D, D^c)$ is a Hahn decomposition for $\nu$,
$\nu^+$ and $\nu^-$ are nonnegative. Noting that $\nu^+(D^c) = 0$ and $\nu^-(D) = 0$, we
see that $\nu^+ \perp \nu^-$.
Now suppose we can write $\nu = \mu_1 - \mu_2$, where $\mu_1$ and $\mu_2$ are mutually
singular measures on $\mathcal{A}$. We must show that $\mu_1 = \nu^+$ and $\mu_2 = \nu^-$. Since
$\mu_1 \perp \mu_2$, we can choose $E \in \mathcal{A}$ such that $\mu_1(E^c) = 0$ and $\mu_2(E) = 0$.
We claim $(E, E^c)$ is a Hahn decomposition for $\nu$. Indeed, if $F \subset E$, then
$\mu_2(F) = 0$, so that $\nu(F) = \mu_1(F) \ge 0$; hence, $E$ is positive for $\nu$. On
the other hand, if $F \subset E^c$, then $\mu_1(F) = 0$, so that $\nu(F) = -\mu_2(F) \le 0$;
hence, $E^c$ is negative for $\nu$.
From Exercise 6.69, we have for all $A \in \mathcal{A}$ that $\nu(A \cap D) = \nu(A \cap E)$
and $\nu(A \cap D^c) = \nu(A \cap E^c)$. The former equality implies that for $A \in \mathcal{A}$,
    $\mu_1(A) = \mu_1(A \cap E) + \mu_1(A \cap E^c) = \mu_1(A \cap E)$
    $\phantom{\mu_1(A)} = \mu_1(A \cap E) - \mu_2(A \cap E) = \nu(A \cap E)$
    $\phantom{\mu_1(A)} = \nu(A \cap D) = \nu^+(A).$
Thus, $\mu_1 = \nu^+$. Similarly, $\mu_2 = \nu^-$. ■
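For a signed measure on a finite set, the Hahn and Jordan decompositions can be written down explicitly. The sketch below assumes a finite $\Omega$ with hypothetical rational point weights; the function names are illustrative, and taking $D$ to be the set of points of nonnegative weight is one valid choice of Hahn decomposition among possibly several.

```python
from fractions import Fraction
from itertools import combinations

def hahn_decomposition(weights):
    """Return (D, D^c) with D the set of points of nonnegative weight."""
    D = frozenset(x for x, w in weights.items() if w >= 0)
    return D, frozenset(weights) - D

def jordan_decomposition(weights):
    """Return the set functions nu, nu_plus, nu_minus built from the Hahn decomposition."""
    D, Dc = hahn_decomposition(weights)
    nu = lambda A: sum((weights[x] for x in A), Fraction(0))
    nu_plus = lambda A: nu(frozenset(A) & D)       # nu^+(A) = nu(A & D)
    nu_minus = lambda A: -nu(frozenset(A) & Dc)    # nu^-(A) = -nu(A & D^c)
    return nu, nu_plus, nu_minus

def sup_over_subsets(nu, A):
    """Brute-force sup{ nu(E) : E subset of A } for a finite set A."""
    items = sorted(A)
    return max(nu(c) for r in range(len(items) + 1) for c in combinations(items, r))

# Hypothetical weights on a four-point space.
weights = {"a": Fraction(3), "b": Fraction(-2), "c": Fraction(0), "d": Fraction(-1, 2)}
nu, nu_plus, nu_minus = jordan_decomposition(weights)
```

For every subset $A$ the identity $\nu(A) = \nu^+(A) - \nu^-(A)$ holds by construction, and `sup_over_subsets(nu, A)` agrees with `nu_plus(A)`, which is the finite case of the formula stated in Exercise 6.76.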
EXERCISES 6.5
6.65	Refer to Example 6.8(d) on page 355, where $\nu$ denotes the Borel measure
induced by the Cantor function. Show that there is no nonnegative Borel
measurable function, $f$, such that $\nu(B) = \int_B f \, d\lambda$, for $B \in \mathcal{B}$, by finding a
set $A \in \mathcal{B}$ such that $\lambda(A) = 0$ and $\nu(A) \ne 0$.
6.66	Define $\nu(E) = \int_E x \, dx$.
a)	Show that, for each positive real number $c$, the preceding definition yields
a signed measure on $\mathcal{M}_{[-c,c]}$.
b)	Is $\nu$ a signed measure on $\mathcal{M}$? Justify your answer.
6.67	Prove Lemma 6.6 on page 358.
6.68	Concerning the proof of the Hahn decomposition theorem:
a)	Why can we assume without loss of generality that $\nu$ does not take on
the value $\infty$?
b)	What happens if $p = 0$?
6.69	Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Suppose
$(D, D^c)$ and $(E, E^c)$ are two Hahn decompositions for $\nu$.
a)	Show that $D$ and $E$ differ by a set that contains only sets of $\nu$-measure
zero, that is, if $A \in \mathcal{A}$ and $A \subset (D \setminus E) \cup (E \setminus D)$, then $\nu(A) = 0$.
b)	Show that $D^c$ and $E^c$ differ by a set that contains only sets of $\nu$-measure
zero.
c)	Prove that $\nu(A \cap D) = \nu(A \cap E)$ and $\nu(A \cap D^c) = \nu(A \cap E^c)$ for all
$A \in \mathcal{A}$.
6.70	Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\nu$ and $\omega$ measures on $\mathcal{A}$. Show that if
$\nu \perp \mu$ and $\omega \perp \mu$, then $\nu + \omega \perp \mu$.
6.71	Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for each $x \in \Omega$. A
measure, $\mu$, on $\mathcal{A}$ is called continuous if it has no atoms, that is, $\mu(\{x\}) = 0$
for all $x \in \Omega$. If $\mu$ is a continuous measure on $\mathcal{A}$ and $\nu$ is a discrete measure
on $\mathcal{A}$, show that $\mu \perp \nu$.
6.72	Refer to Example 6.11(c) on page 362. For $B \in \mathcal{B}_2$, let
    $A = \{ x \in \mathcal{R} : (x, x) \in B \}.$
a)	Provide a geometric interpretation of the relation between $A$ and $B$.
b)	Prove that $A \in \mathcal{B}$.
6.73	Let $\lambda$ be Lebesgue measure and $\mu$ be the Borel measure induced by the
Cantor function. Find a Borel measure, $\omega$, such that $\omega \perp \lambda|_{\mathcal{B}}$ and $\omega \perp \mu$.
6.74	Provide an example to show that the uniqueness condition in Theorem 6.9
fails without the requirement of mutual singularity.
★6.75 This exercise will be useful as motivation for the proof of the
Radon-Nikodym theorem. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \mathcal{L}^1(\mu)$ be
extended real-valued. Define
    $\nu(A) = \int_A f \, d\mu, \quad A \in \mathcal{A}.$
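As we have seen, $\nu$ is a signed measure. Let $D = \{ x : f(x) > 0 \}$.
a)	Show that $(D, D^c)$ is a Hahn decomposition for $\nu$.
b)	Prove that for $A \in \mathcal{A}$,
    $\nu^+(A) = \int_A f^+ \, d\mu$ and $\nu^-(A) = \int_A f^- \, d\mu.$
6.76	Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Prove that
for each $A \in \mathcal{A}$,
    $\nu^+(A) = \sup\{ \nu(E) : E \in \mathcal{A},\ E \subset A \}$
and
    $\nu^-(A) = -\inf\{ \nu(E) : E \in \mathcal{A},\ E \subset A \}.$
6.77	Let $(\Omega, \mathcal{A}) = (\mathcal{N}, \mathcal{P}(\mathcal{N}))$ and $\{a_n\}_{n=1}^\infty$ a sequence of real numbers such that
$\sum_{n=1}^\infty |a_n| < \infty$. Define $\nu$ on $\mathcal{P}(\mathcal{N})$ by $\nu(A) = \sum_{n \in A} a_n$.
a)	Prove that $\nu$ is a signed measure.
b)	Determine $\nu^+$ and $\nu^-$.
6.78	Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu_1$ and $\nu_2$ measures on $\mathcal{A}$, at least
one of which is finite. Define $\nu = \nu_1 - \nu_2$. Prove that $\nu^+ \le \nu_1$ and $\nu^- \le \nu_2$.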
6.6 THE RADON-NIKODYM THEOREM
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\nu$ a measure on $\mathcal{A}$. We want to
determine when $\nu$ can be represented in the form
    $\nu(A) = \int_A f \, d\mu, \quad A \in \mathcal{A},$
for some nonnegative extended real-valued $\mathcal{A}$-measurable function $f$ on $\Omega$.
As we have seen, a necessary condition for such a representation is that
$\nu(A) = 0$ whenever $\mu(A) = 0$. The Radon-Nikodym theorem shows that,
subject to $\sigma$-finiteness restrictions, that condition is also sufficient for such
a representation. In the following definition, we give that condition a name.
DEFINITION 6.12 Absolutely Continuous Measures
Let $(\Omega, \mathcal{A})$ be a measurable space and $\mu$ and $\nu$ measures on $\mathcal{A}$. Then
$\nu$ is said to be absolutely continuous† with respect to $\mu$, denoted
$\nu \ll \mu$, if $\nu(A) = 0$ whenever $\mu(A) = 0$.
† The reason for the term "absolutely continuous" will become apparent shortly.
EXAMPLE 6.12 Illustrates Definition 6.12
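a)	As we have already noted, if $f$ is a nonnegative extended real-valued
$\mathcal{A}$-measurable function, then the measure
    $\nu(A) = \int_A f \, d\mu, \quad A \in \mathcal{A},$
is absolutely continuous with respect to $\mu$.
b)	Let $\nu$ be the Borel measure on $\mathcal{R}$ induced by the Cantor function. Then,
as Borel measures, no one of the measures $\delta_0$, $\nu$, and $\lambda$, is absolutely
continuous with respect to one of the others; that is, $\delta_0 \not\ll \lambda$, $\lambda \not\ll \nu$,
and so forth.
c)	Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$. We have $\delta_0 \ll \delta_0 + \delta_1$, but $\delta_0 + \delta_1 \not\ll \delta_0$.
d)	Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$. Define $f(x) = 0$ for $x < 0$, and $f(x) = e^{-x}$ for
$x \ge 0$. Set $\nu(A) = \int_A f \, d\lambda$ for $A \in \mathcal{M}$. Then $\nu \ll \lambda$, but $\lambda \not\ll \nu$. □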
THEOREM 6.10 Radon-Nikodym Theorem
Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu$ a $\sigma$-finite measure on $\mathcal{A}$.
If $\nu \ll \mu$, then there is a nonnegative extended real-valued $\mathcal{A}$-measurable
function, $f$, on $\Omega$ such that
    $\nu(A) = \int_A f \, d\mu, \quad A \in \mathcal{A}.$    (6.35)
Moreover, $f$ is unique in the sense that if $g$ is a nonnegative extended
real-valued $\mathcal{A}$-measurable function with $\nu(A) = \int_A g \, d\mu$ for all $A \in \mathcal{A}$, then
$g = f$ $\mu$-ae.†
Before proving the Radon-Nikodym theorem, let us consider the main
idea behind the proof. Suppose, say, that $\mu$ is a finite measure on $(\Omega, \mathcal{A})$
and that $\nu \ll \mu$. We want to show that (6.35) holds for an appropriately
chosen $f$. What would $f$ have to look like?
Let $\alpha > 0$ and note that $\nu - \alpha\mu$ is a signed measure. If $f$ is the required
(but unknown) function, then
    $(\nu - \alpha\mu)(A) = \int_A f \, d\mu - \alpha\mu(A) = \int_A (f - \alpha) \, d\mu.$
† Regarding the $\sigma$-finite conditions, stronger versions of the Radon-Nikodym theorem
are available. See, for example, (19.27) in Hewitt and Stromberg's Real and Abstract
Analysis (New York: Springer-Verlag, 1965), p. 318.
By Exercise 6.75 on page 363, it follows that if $D_\alpha = \{ x : f(x) > \alpha \}$, then
$(D_\alpha, D_\alpha^c)$ is a Hahn decomposition for $\nu - \alpha\mu$. Thinking now of $x$ as fixed
and $\alpha$ as varying, we have $x \in D_\alpha$ if and only if $\alpha < f(x)$ and, therefore,
    $f(x) = \sup\{ \alpha : \alpha < f(x) \} = \sup\{ \alpha : x \in D_\alpha \}.$
Thus the procedure for finding $f$ will be essentially as follows: For
each $\alpha > 0$, let $(D_\alpha, D_\alpha^c)$ be a Hahn decomposition for $\nu - \alpha\mu$. Define
$f(x) = \sup\{ \alpha : x \in D_\alpha \}$ for $x \in \Omega$. This should give a function, $f$, that
satisfies (6.35). We now present a formal proof of the Radon-Nikodym
theorem.
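The procedure just described can be tried out on a finite measure space, where every quantity is computable. The following is a sketch under assumptions that go beyond the text: $\Omega$ is finite, $\mu$ and $\nu$ are given by hypothetical rational point masses with $\mu(\{x\}) > 0$ for every $x$, the supremum is taken only over the grid $r = k/n$ with $0 < r \le r_{\max}$ rather than over all positive rationals, and $D_r$ is taken to be the set of points where the point mass of $\nu - r\mu$ is nonnegative.

```python
from fractions import Fraction

def hahn_positive_set(nu_w, mu_w, r):
    """D_r: points at which the point mass of nu - r*mu is nonnegative."""
    return {x for x in nu_w if nu_w[x] - r * mu_w[x] >= 0}

def rn_derivative_from_hahn(nu_w, mu_w, n, r_max):
    """f(x) = sup{ r : x in D_r } over the grid r = k/n, 0 < r <= r_max; 0 if no such r."""
    grid = [Fraction(k, n) for k in range(1, n * r_max + 1)]
    f = {x: Fraction(0) for x in nu_w}
    for r in grid:
        for x in hahn_positive_set(nu_w, mu_w, r):
            f[x] = max(f[x], r)
    return f

# Hypothetical point masses on a three-point space.
mu_w = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
nu_w = {"a": Fraction(1, 4), "b": Fraction(3, 4), "c": Fraction(0)}
f = rn_derivative_from_hahn(nu_w, mu_w, n=10, r_max=5)
```

On a finite space the Radon-Nikodym derivative is simply the ratio $\nu(\{x\})/\mu(\{x\})$, and the grid supremum recovers it to within $1/n$; the formal proof replaces the grid by all positive rationals.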
PROOF: We first assume that $\mu$ is a finite measure. For each positive
rational number, $r$, let $(D_r, D_r^c)$ be a Hahn decomposition for $\nu - r\mu$. Define
$f$ on $\Omega$ by
    $f(x) = \begin{cases} \sup\{ r \in Q : x \in D_r \}, & \text{if } x \in D_r \text{ for some } r; \\ 0, & \text{otherwise.} \end{cases}$
Clearly $f$ is a nonnegative extended real-valued function on $\Omega$. We
assert that $f$ is $\mathcal{A}$-measurable. To prove that, it suffices to show that
$f^{-1}([\alpha, \infty]) \in \mathcal{A}$ for each $\alpha \in \mathcal{R}$. For $\alpha \le 0$, the inverse image is $\Omega$.
Hence, we can assume $\alpha > 0$. We will show that for $\alpha > 0$,
    $f^{-1}([\alpha, \infty]) = \bigcap_{q < \alpha} \Bigl( \bigcup_{r > q} D_r \Bigr),$    (6.36)
where, here and until specified otherwise, $r$ and $q$ denote positive rational
numbers.
From the definition of $f$, we have that $f^{-1}((q, \infty]) = \bigcup_{r > q} D_r$, for
each $q$. Because $[\alpha, \infty] = \bigcap_{q < \alpha} (q, \infty]$, it follows that
    $f^{-1}([\alpha, \infty]) = \bigcap_{q < \alpha} f^{-1}((q, \infty]) = \bigcap_{q < \alpha} \Bigl( \bigcup_{r > q} D_r \Bigr)$
and so (6.36) holds.
Since $Q$ is countable and $D_r \in \mathcal{A}$ for each $r$, it follows from (6.36) that
$f^{-1}([\alpha, \infty]) \in \mathcal{A}$ for each $\alpha > 0$. Thus, $f$ is $\mathcal{A}$-measurable.
We next show that $f$ satisfies (6.35). To that end, let $A \in \mathcal{A}$ and, for
each pair of rational numbers, $\alpha$ and $\beta$, with $0 \le \alpha < \beta$, define
    $E = \{ x \in A : \alpha \le f(x) < \beta \}.$
We claim that
    $\alpha\mu(E) \le \nu(E) \le \beta\mu(E).$    (6.37)
We begin by establishing the first inequality in (6.37). If $\alpha = 0$,
that inequality is trivial; consequently, we assume that $\alpha > 0$. By (6.36),
$E \subset \{ x : f(x) \ge \alpha \} \subset \bigcup_{r > q} D_r$ for each $q < \alpha$. If we can show that,
for each $q < \alpha$, the latter set is positive for $\nu - q\mu$, then we will have
$(\nu - q\mu)(E) \ge 0$, that is, $\nu(E) \ge q\mu(E)$, for each $q < \alpha$, from which it
follows that $\nu(E) \ge \alpha\mu(E)$.
So, suppose $r > q$ and let $F \subset D_r$. Because $D_r$ is positive for $\nu - r\mu$,
we have
    $0 \le (\nu - r\mu)(F) = \nu(F) - r\mu(F) \le \nu(F) - q\mu(F) = (\nu - q\mu)(F).$
Hence, $(\nu - q\mu)(F) \ge 0$ for $F \subset D_r$ and, consequently, $D_r$ is positive for
$\nu - q\mu$. Lemma 6.7 now implies that $\bigcup_{r > q} D_r$ is positive for $\nu - q\mu$, as
required.
To establish the second inequality in (6.37), we first note that by
definition, if $f(x) < \beta$, then $x \notin D_\beta$ or, in other words, $\{ x : f(x) < \beta \} \subset D_\beta^c$.
Because $D_\beta^c$ is negative for $\nu - \beta\mu$ and $E \subset \{ x : f(x) < \beta \} \subset D_\beta^c$, it follows
that $(\nu - \beta\mu)(E) \le 0$. That is, $\nu(E) \le \beta\mu(E)$. We have now shown that
(6.37) holds.
To continue, we need to consider where $f$ is infinite on $A$. To that
end, let $H = \{ x : f(x) = \infty \}$. We will show that if $\mu(A \cap H) > 0$, then
$\nu(A \cap H) = \infty$. In doing so, we will use the already established fact that,
for each $q$, $\bigcup_{r > q} D_r$ is positive for $\nu - q\mu$.
Using the definition of $f$, we see that if $f(x) = \infty$, then for each $q$,
there is an $r > q$ such that $x \in D_r$. Thus $H \subset \bigcup_{r > q} D_r$ and, hence, $H$ is
positive for $\nu - q\mu$. Consequently, for each $q$, $(\nu - q\mu)(A \cap H) \ge 0$, that is,
$\nu(A \cap H) \ge q\mu(A \cap H)$. Hence, if $\mu(A \cap H) > 0$, then $\nu(A \cap H) = \infty$.
Now, assume that $\mu(A \cap H) > 0$. Then $\nu(A \cap H) = \infty$ and, hence,
$\nu(A) = \infty$. On the other hand, because $\mu(A \cap H) > 0$ and $f = \infty$ on $H$,
    $\int_A f \, d\mu \ge \int_{A \cap H} f \, d\mu = \infty.$
Thus, if $\mu(A \cap H) > 0$, then both sides of (6.35) equal $\infty$.
So, assume that $\mu(A \cap H) = 0$. Then, because $\nu \ll \mu$, we have
$\nu(A \cap H) = 0$. For each $n \in \mathcal{N}$, set
    $A_{n,k} = \Bigl\{ x \in A : \frac{k-1}{n} \le f(x) < \frac{k}{n} \Bigr\}$
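for $k = 1, 2, \ldots$. Then $A = \bigcup_{k=1}^\infty A_{n,k} \cup (A \cap H)$ and, since the $A_{n,k}$'s are
pairwise disjoint and $\nu(A \cap H) = 0$, it follows that $\nu(A) = \sum_{k=1}^\infty \nu(A_{n,k})$.
By (6.37),
    $\frac{k-1}{n} \mu(A_{n,k}) \le \nu(A_{n,k}) \le \frac{k}{n} \mu(A_{n,k})$
and, from the definition of $A_{n,k}$, we conclude that
    $\frac{k-1}{n} \mu(A_{n,k}) \le \int_{A_{n,k}} f \, d\mu \le \frac{k}{n} \mu(A_{n,k}).$
Therefore, for $n, k \in \mathcal{N}$,
    $\int_{A_{n,k}} f \, d\mu - \frac{1}{n} \mu(A_{n,k}) \le \nu(A_{n,k}) \le \int_{A_{n,k}} f \, d\mu + \frac{1}{n} \mu(A_{n,k}).$
Recalling that $\mu(A \cap H) = 0$, we obtain upon summing on $k$ that, for
each $n \in \mathcal{N}$,
    $\int_A f \, d\mu - \frac{1}{n} \mu(A) \le \nu(A) \le \int_A f \, d\mu + \frac{1}{n} \mu(A).$
Since we are assuming $\mu$ is finite, $\mu(A) < \infty$ and therefore, letting $n \to \infty$
in the previous display, we get that
    $\int_A f \, d\mu \le \nu(A) \le \int_A f \, d\mu.$
So again, (6.35) holds.
Suppose now that $\mu$ is a $\sigma$-finite measure. We can write $\Omega$ as a countable
disjoint union of $\mathcal{A}$-measurable sets, $\{E_n\}_n$, where $\mu(E_n) < \infty$ and
$\nu(E_n) < \infty$ for each $n \in \mathcal{N}$. (See Exercise 6.83.) Let $(E_n, \mathcal{A}_{E_n}, \mu_{E_n})$ be as
usual, and set $\mu_n = \mu|_{\mathcal{A}_{E_n}} = \mu_{E_n}$ and $\nu_n = \nu|_{\mathcal{A}_{E_n}}$.
We have $\mu_n(E_n) = \mu(E_n) < \infty$ and, since $\nu \ll \mu$, we have $\nu_n \ll \mu_n$.
Hence, by what we have proved for finite measures, there is a nonnegative
$\mathcal{A}_{E_n}$-measurable function $g_n$ such that
    $\nu_n(B) = \int_B g_n \, d\mu_n, \quad B \in \mathcal{A}_{E_n}.$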
For each $n \in \mathcal{N}$, define $f_n$ on $\Omega$ by $f_n(x) = g_n(x)$ if $x \in E_n$, and
$f_n(x) = 0$ otherwise. Then $f = \sum_{n=1}^\infty f_n$ is a nonnegative $\mathcal{A}$-measurable
function. Moreover, if $A \in \mathcal{A}$,
    $\nu(A) = \sum_{n=1}^\infty \nu(A \cap E_n) = \sum_{n=1}^\infty \nu_n(A \cap E_n) = \sum_{n=1}^\infty \int_{A \cap E_n} g_n \, d\mu_n$
    $\phantom{\nu(A)} = \sum_{n=1}^\infty \int_{A \cap E_n} f_n \, d\mu = \sum_{n=1}^\infty \int_{A \cap E_n} f \, d\mu = \int_A f \, d\mu.$
Thus, (6.35) holds.
It remains to prove uniqueness. So suppose that $g$ is also a nonnegative
extended real-valued $\mathcal{A}$-measurable function on $\Omega$ such that
    $\nu(A) = \int_A g \, d\mu, \quad A \in \mathcal{A}.$
We must show that $g = f$ $\mu$-ae.
Let $E = \{ x : f(x) > g(x) \}$. We claim that $\mu(E) = 0$. Let $\{E_n\}_n$ be
the sequence of sets defined in the preceding. We have, for each $n \in \mathcal{N}$,
    $\int_{E \cap E_n} f \, d\mu = \int_{E \cap E_n} g \, d\mu = \nu(E \cap E_n) < \infty.$
Thus, $f$ and $g$ are integrable over $E \cap E_n$ with respect to $\mu$ and,
consequently, so is $f - g$. Moreover,
    $\int_{E \cap E_n} (f - g) \, d\mu = \int_{E \cap E_n} f \, d\mu - \int_{E \cap E_n} g \, d\mu = 0.$
Since $f - g > 0$ on $E$ and, hence, on $E \cap E_n$, we must have $\mu(E \cap E_n) = 0$.
Consequently, $\mu(E) = \sum_n \mu(E \cap E_n) = 0$. A similar argument shows that
$\{ x : g(x) > f(x) \}$ has $\mu$-measure zero. Therefore, $g = f$ $\mu$-ae. ■
DEFINITION 6.13 Radon-Nikodym Derivative
The function $f$ given in the statement of the Radon-Nikodym theorem
is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ and
is denoted by $d\nu/d\mu$.
Remark: The Radon-Nikodym theorem shows that the Radon-Nikodym
derivative is determined only up to sets of $\mu$-measure zero.
EXAMPLE 6.13 Illustrates Definition 6.13
a)	Let $\alpha$ be a positive constant. Define $F(x) = 1 - e^{-\alpha x}$ for $x \ge 0$, and
$F(x) = 0$ otherwise; and let $\nu$ denote the unique Borel measure induced
by $F$. Define the Borel measure $\omega$ by
    $\omega(B) = \int_B f \, d\lambda, \quad B \in \mathcal{B},$
where $f(x) = \alpha e^{-\alpha x}$ for $x > 0$, and $f(x) = 0$ otherwise. Then it is
easy to see that $\omega$ has $F$ for its distribution function and, so, $\omega = \nu$.
Hence $\nu \ll \lambda$ and $d\nu/d\lambda = f$ $\lambda$-ae. Note that we also have, for example,
$d\nu/d\lambda = g$ $\lambda$-ae, where $g(x) = \alpha e^{-\alpha x}$ for $x \ge 0$, and $g(x) = 0$ otherwise,
because $g = f$ $\lambda$-ae.
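b)	Let $\mu$ be the measure on $\mathcal{P}(\mathcal{R})$ defined by $\mu(A) = \gamma(A \cap \mathcal{N})$, where
$\gamma$ is counting measure on $\mathcal{P}(\mathcal{R})$. Consider the measure $\nu = \sum_{n=1}^\infty 2^{-n} \delta_n$
on $\mathcal{P}(\mathcal{R})$, that is,
    $\nu(A) = \sum_{n \in A \cap \mathcal{N}} 2^{-n}.$
If $\mu(A) = 0$, then $A \cap \mathcal{N} = \emptyset$ and, therefore, $\nu(A) = 0$. Hence, $\nu \ll \mu$.
Let $f(x) = 1/2^x$ if $x \in \mathcal{N}$, and $f(x) = 0$ otherwise. We claim that
$d\nu/d\mu = f$ $\mu$-ae. Indeed, if $A \in \mathcal{P}(\mathcal{R})$, then
    $\int_A f \, d\mu = \int_{A \cap \mathcal{N}} f \, d\mu + \int_{A \cap \mathcal{N}^c} f \, d\mu = \int_{A \cap \mathcal{N}} f \, d\mu$
    $\phantom{\int_A f \, d\mu} = \int_{\mathcal{N}} f \chi_A \, d\mu = \sum_{n=1}^\infty \int_{\{n\}} f \chi_A \, d\mu = \sum_{n=1}^\infty f(n) \chi_A(n) \mu(\{n\})$
    $\phantom{\int_A f \, d\mu} = \sum_{n \in A \cap \mathcal{N}} 2^{-n} = \nu(A).$
Note that a nonnegative function $g$ on $\mathcal{R}$ is a Radon-Nikodym derivative
of $\nu$ with respect to $\mu$ if and only if $g = f$ on $\mathcal{N}$.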
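c)	Let $(\Omega, \mathcal{A}, P)$ be a probability space and $E \in \mathcal{A}$ such that $P(E) > 0$.
Recall that the conditional probability measure, $P_E$, corresponding to $E$
is defined by $P_E(A) = P(A \mid E) = P(E \cap A)/P(E)$. Clearly $P_E \ll P$.
A Radon-Nikodym derivative for $P_E$ with respect to $P$ is $\chi_E/P(E)$
because, for each $A \in \mathcal{A}$,
    $P_E(A) = \frac{P(E \cap A)}{P(E)} = \frac{1}{P(E)} \int_A \chi_E \, dP = \int_A \frac{\chi_E}{P(E)} \, dP.$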
d)	Refer to Example 5.5(c) on page 276. Let $X$ be an absolutely continuous
random variable on $(\Omega, \mathcal{A}, P)$ with probability density function $f_X$.
Then $\mu_X \ll \lambda$ (as Borel measures). Moreover, since by definition, $f_X$ is a
nonnegative Borel measurable function such that $\mu_X(B) = \int_B f_X \, d\lambda$ for
$B \in \mathcal{B}$, $f_X$ is a Radon-Nikodym derivative of $\mu_X$ with respect to $\lambda$. □
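The computation in part (b) can be spot-checked on finite sets. The following sketch is only an illustration: the helper names are hypothetical, only finite sets $A$ are handled, and a natural number is represented by a Python `int` that is at least 1.

```python
from fractions import Fraction

def is_natural(x):
    """True for the positive integers 1, 2, 3, ... (represented as Python ints)."""
    return isinstance(x, int) and x >= 1

def nu(A):
    """nu(A) = sum of 2^(-n) over the natural numbers n in the finite set A."""
    return sum((Fraction(1, 2**n) for n in A if is_natural(n)), Fraction(0))

def f(x):
    """Candidate Radon-Nikodym derivative: 2^(-x) on the naturals, 0 elsewhere."""
    return Fraction(1, 2**x) if is_natural(x) else Fraction(0)

def mu_point(x):
    """mu({x}) for mu(A) = gamma(A & N): 1 at each natural number, 0 elsewhere."""
    return 1 if is_natural(x) else 0

def integral_f_dmu(A):
    """Integral of f over a finite set A with respect to mu: sum of f(x) * mu({x})."""
    return sum((f(x) * mu_point(x) for x in A), Fraction(0))
```

For a set such as `{1, 3, 0.5, -2}`, only the points 1 and 3 carry $\mu$-mass, so both `nu` and `integral_f_dmu` reduce to $2^{-1} + 2^{-3}$; changing $f$ off the natural numbers would not affect the integral, in line with the closing remark of part (b).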
PROPOSITION 6.8
Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu$ a $\sigma$-finite measure on $\mathcal{A}$
such that $\nu \ll \mu$. If $g \in \mathcal{L}^1(\nu)$, then $g \, \frac{d\nu}{d\mu} \in \mathcal{L}^1(\mu)$ and
    $\int_\Omega g \, d\nu = \int_\Omega g \, \frac{d\nu}{d\mu} \, d\mu.$
PROOF: Exercise 4.61(b) on page 191 shows that the proposition is true if
$g$ is nonnegative. For real-valued $g$, write $g = g^+ - g^-$ and apply the result
for nonnegative functions twice. For complex-valued $g$, write $g = \Re g + i \Im g$
and apply the result for real-valued functions twice. ■
A Relation Between Absolutely Continuous Functions
and Absolutely Continuous Measures
In Section 6.4 we discussed absolutely continuous functions and, in this
section, we discussed absolutely continuous measures. A relation between
the two concepts is expressed in the following proposition.
PROPOSITION 6.9
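Let $\nu$ be a finite Borel measure and $F_\nu$ its distribution function. Then $\nu$ is
absolutely continuous with respect to Lebesgue measure if and only if $F_\nu$ is
absolutely continuous on $\mathcal{R}$. In this case, $d\nu/d\lambda = F_\nu'$ $\lambda$-ae.
PROOF: Suppose $\nu \ll \lambda$. Then, for $B \in \mathcal{B}$, $\nu(B) = \int_B (d\nu/d\lambda) \, d\lambda$. In
particular, for $B = (-\infty, x]$,
    $F_\nu(x) = \nu((-\infty, x]) = \int_{(-\infty, x]} \frac{d\nu}{d\lambda} \, d\lambda = \int_{-\infty}^x \frac{d\nu}{d\lambda}(t) \, dt.$
Because $F_\nu(\infty) = \nu(\mathcal{R}) < \infty$, it follows that $d\nu/d\lambda \in \mathcal{L}^1(\mathcal{R})$.
Consequently, by Proposition 6.4(b) on page 344, $F_\nu$ is absolutely continuous
on $\mathcal{R}$ and, by Corollary 6.2 on page 341, $F_\nu' = d\nu/d\lambda$ $\lambda$-ae.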
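Conversely, suppose that $F_\nu$ is absolutely continuous on $\mathcal{R}$. Then, by
definition, $F_\nu' \in \mathcal{L}^1(\mathcal{R})$ and $F_\nu(x) = \int_{-\infty}^x F_\nu'(t) \, dt$, $-\infty < x < \infty$. Define
    $\omega(B) = \int_B F_\nu' \, d\lambda, \quad B \in \mathcal{B}.$
Then $\omega$ has $F_\nu$ as its distribution function and, consequently, by
Theorem 4.13 (page 226), $\omega = \nu$. It follows that $\nu \ll \lambda$ and $d\nu/d\lambda = F_\nu'$ $\lambda$-ae. ■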
Conditional Probability Given a $\sigma$-Algebra
Let us recall the definition of conditional probability from Section 5.1:
Suppose that $(\Omega, \mathcal{A}, P)$ is a probability space and $F$ is an event having
positive probability, that is, $F \in \mathcal{A}$ and $P(F) > 0$. Then, for $E \in \mathcal{A}$,
the conditional probability of event $E$ given that event $F$ has occurred is
defined by
    $P(E \mid F) = \frac{P(E \cap F)}{P(F)}.$
We can generalize the notion of conditional probability by conditioning
on a $\sigma$-algebra instead of just an event. The idea is that if $\mathcal{G}$ is a $\sigma$-algebra
with $\mathcal{G} \subset \mathcal{A}$, then conditioning on $\mathcal{G}$ means that we know whether or not
each $G \in \mathcal{G}$ has occurred; and the conditional probability of an event $E$,
given $\mathcal{G}$, denoted $P(E \mid \mathcal{G})$, is the probability of $E$ computed with that
knowledge.
To see how to define $P(E \mid \mathcal{G})$, we consider the simplest nontrivial case.
Suppose that $F$ is an event with probability strictly between 0 and 1. Let
$\mathcal{G}$ be the $\sigma$-algebra generated by $F$, that is, the smallest $\sigma$-algebra
containing $F$; clearly, $\mathcal{G} = \{ \emptyset, F, F^c, \Omega \}$. Then, given $\mathcal{G}$, we know whether
or not $F$ has occurred, that is, whether $F$ has occurred or $F^c$ has
occurred. In the former case, $P(E \mid \mathcal{G}) = P(E \mid F)$ and, in the latter case,
$P(E \mid \mathcal{G}) = P(E \mid F^c)$. In other words,
    $P(E \mid \mathcal{G}) = P(E \mid F) \chi_F + P(E \mid F^c) \chi_{F^c}.$
Note that $P(E \mid \mathcal{G})$ is not only a random variable (i.e., is $\mathcal{A}$-measurable),
but is in fact $\mathcal{G}$-measurable. Furthermore, it is not too difficult to see that
    $P(G \cap E) = \int_G P(E \mid \mathcal{G}) \, dP, \quad G \in \mathcal{G}.$    (6.38)
We can use the Radon-Nikodym theorem to show the existence of
conditional probability given a $\sigma$-algebra in the general case. Specifically,
with (6.38) in mind, we have the following proposition.
PROPOSITION 6.10 Existence of Conditional Probability
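Let $(\Omega, \mathcal{A}, P)$ be a probability space, $E \in \mathcal{A}$, and $\mathcal{G}$ a $\sigma$-algebra with $\mathcal{G} \subset \mathcal{A}$.
Then there exists a nonnegative $\mathcal{G}$-measurable function, $P(E \mid \mathcal{G})$, such that
    $P(G \cap E) = \int_G P(E \mid \mathcal{G}) \, dP, \quad G \in \mathcal{G}.$    (6.39)
Moreover, such a function is unique $P$-ae and is called the conditional
probability of $E$ given $\mathcal{G}$.
PROOF: Define $\mu_E(G) = P(G \cap E)$, for $G \in \mathcal{G}$. Then $\mu_E$ is a finite
measure on $\mathcal{G}$ and $\mu_E \ll P$. The result now follows from the
Radon-Nikodym theorem. ■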
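The simplest nontrivial case discussed before the proposition, where $\mathcal{G} = \{ \emptyset, F, F^c, \Omega \}$, can be verified on a finite probability space. The sketch below assumes a finite $\Omega$ with hypothetical rational point probabilities and an event $F$ with $0 < P(F) < 1$; the function names and the fair-die data are illustrative.

```python
from fractions import Fraction

def prob(P, A):
    """Probability of a finite event A, given point probabilities P."""
    return sum((P[w] for w in A), Fraction(0))

def conditional_given_sigma_algebra(P, E, F):
    """P(E | G) as a function on Omega, where G is the sigma-algebra generated by F."""
    omega = frozenset(P)
    Fc = omega - F
    p_given_F = prob(P, E & F) / prob(P, F)
    p_given_Fc = prob(P, E & Fc) / prob(P, Fc)
    # Constant on F and on F^c, hence measurable with respect to G.
    return {w: (p_given_F if w in F else p_given_Fc) for w in omega}

def integral_over(P, h, G):
    """Integral of the function h over the event G with respect to P."""
    return sum((h[w] * P[w] for w in G), Fraction(0))

# Hypothetical example: a fair die, E = "even", F = {1, 2, 3}.
P = {w: Fraction(1, 6) for w in range(1, 7)}
E, F = {2, 4, 6}, {1, 2, 3}
h = conditional_given_sigma_algebra(P, E, F)
sigma_algebra = [set(), F, set(P) - F, set(P)]
```

Here `h` takes the value $P(E \mid F) = 1/3$ on $F$ and $P(E \mid F^c) = 2/3$ on $F^c$, and `integral_over(P, h, G)` equals `prob(P, G & E)` for each of the four sets $G$ in `sigma_algebra`, which is the identity (6.38).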
We will investigate further properties of conditional probability given
a $\sigma$-algebra in the exercises.
EXERCISES 6.6
6.79	This exercise provides an alternative for the definition of absolute
continuity of measures. Let $(\Omega, \mathcal{A})$ be a measurable space and $\mu$ and $\nu$ measures
on $\mathcal{A}$ with $\nu$ finite.
a)	Prove that a necessary and sufficient condition for $\nu \ll \mu$ is that for
each $\epsilon > 0$, there is a $\delta > 0$ such that $\nu(A) < \epsilon$, whenever $A \in \mathcal{A}$
and $\mu(A) < \delta$. Hint: For the necessity part, suppose to the contrary
that there is an $\epsilon > 0$ such that for each $\delta > 0$, there is an $A \in \mathcal{A}$ with
$\mu(A) < \delta$ and $\nu(A) \ge \epsilon$. For each $n \in \mathcal{N}$, let $A_n$ correspond to $\delta = 2^{-n}$.
b)	Show that weakening the finiteness condition on $\nu$ to $\sigma$-finiteness
invalidates the necessity portion of part (a).
6.80	Let $\nu$ be a finite Borel measure and $F_\nu$ its distribution function. Provide
an alternate proof of Proposition 6.9 (page 371) without relying on the
Radon-Nikodym theorem, but instead by using Theorem 6.6 (page 350),
Theorem 6.7 (page 350), and Exercise 6.79.
6.81	Find two measures, p and u, such that v <£ p, p v, and p JL v.
6.82  Suppose that $\mu$ and $\nu$ are measures on $(\Omega, \mathcal{A})$ such that $\mu \perp \nu$ and $\mu \ll \nu$.
What can you say about $\mu$?
6.83  Let $(\Omega, \mathcal{A})$ be a measurable space and $\mu$ and $\nu$ $\sigma$-finite measures on $\mathcal{A}$.
Show that $\Omega$ can be written as a countable disjoint union of $\mathcal{A}$-measurable
sets, $\{E_n\}_n$, where $\mu(E_n) < \infty$ and $\nu(E_n) < \infty$.
6.84  Let $\Omega$ be a nonempty set and $\mu$ counting measure on $\mathcal{P}(\Omega)$. Suppose $\nu$ is
a discrete measure on $\mathcal{P}(\Omega)$, that is, there is a countable set $K \subset \Omega$ such
that $\nu(K^c) = 0$.
a)  Show that $\nu \ll \mu$.
b)  Find $d\nu/d\mu$.
c)  Are the hypotheses of our version of the Radon-Nikodym theorem (Theorem 6.10 on page 365) necessarily satisfied in this problem?
6.85  Define $\nu$ on $(\mathcal{R}, \mathcal{M})$ by $\nu(A) = \lambda(A \cap [-2, 2])$. Show that $\nu \ll \lambda$ and
find $d\nu/d\lambda$.
6.86	Let /x belhe measure on P(7£)) defined by /x(A) = ClAf), where
7 is counting measure on	Let	be a sequence of nonnegative
real numbers and set v =	an8n. Show that v p and find du/dp.
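6.87  Refer to Example 5.5(a) on page 276. Let $X$ be a discrete random variable
on $(\Omega, \mathcal{A}, P)$ with probability mass function $p_X$.
a)  Show that $\mu_X \ll \mu$, where $\mu$ is counting measure on $(\mathcal{R}, \mathcal{B})$.
b)  Prove that $p_X$ is the unique Radon-Nikodym derivative of $\mu_X$ with
respect to $\mu$.
6.88  Provide an example showing that the $\sigma$-finiteness of $\mu$ cannot be dropped
as an hypothesis in the Radon-Nikodym theorem.
6.89  Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu_1$ and $\nu_2$ $\sigma$-finite measures
on $\mathcal{A}$. Assume $\nu_1 \ll \mu$ and $\nu_2 \ll \mu$. Prove that $\nu_1 + \nu_2 \ll \mu$ and
$$\frac{d(\nu_1 + \nu_2)}{d\mu} = \frac{d\nu_1}{d\mu} + \frac{d\nu_2}{d\mu} \qquad \mu\text{-ae.}$$
6.90  Let $(\Omega, \mathcal{A})$ be a measurable space and $\omega$, $\nu$, and $\mu$ $\sigma$-finite measures on $\mathcal{A}$
such that $\omega \ll \nu \ll \mu$. Show that $\omega \ll \mu$ and
$$\frac{d\omega}{d\mu} = \frac{d\omega}{d\nu}\,\frac{d\nu}{d\mu} \qquad \mu\text{-ae.}$$
6.91  Let $(\Omega, \mathcal{A})$ be a measurable space and $\mu$ and $\nu$ $\sigma$-finite measures on $\mathcal{A}$ such
that $\mu \ll \nu$ and $\nu \ll \mu$. Prove that
$$\frac{d\nu}{d\mu} = 1 \Big/ \frac{d\mu}{d\nu} \qquad \mu\text{-ae.}$$
6.92  Suppose that $\mu$ and $\nu$ are two $\sigma$-finite Borel measures. Recall that the
convolution of $\mu$ and $\nu$ is the measure defined by
$$(\mu * \nu)(B) = \int_{\mathcal{R}} \mu(B - y)\,d\nu(y), \qquad B \in \mathcal{B}.$$
a)  Show that if $\mu \ll \lambda$, then $\mu * \nu \ll \lambda$ and
$$\frac{d(\mu * \nu)}{d\lambda}(x) = \int_{\mathcal{R}} \frac{d\mu}{d\lambda}(x - y)\,d\nu(y), \qquad x \in \mathcal{R}.$$
Hint: Refer to Exercises 4.157 and 4.158 (page 256).
b)  Show that if both $\mu$ and $\nu$ are absolutely continuous with respect to
Lebesgue measure, then
$$\frac{d(\mu * \nu)}{d\lambda}(x) = \int_{\mathcal{R}} \frac{d\mu}{d\lambda}(x - y)\,\frac{d\nu}{d\lambda}(y)\,dy, \qquad x \in \mathcal{R}.$$
In words, the Radon-Nikodym derivative of the convolution of two measures
that are absolutely continuous with respect to Lebesgue measure
is the convolution of their Radon-Nikodym derivatives.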
In Exercises 6.93-6.100, $(\Omega, \mathcal{A}, P)$ is a probability space and $\mathcal{G}$ is a $\sigma$-algebra of
subsets of $\Omega$ with $\mathcal{G} \subset \mathcal{A}$.
6.93  Suppose that $F$ is an event with probability strictly between 0 and 1. Let
$\mathcal{G}$ be the $\sigma$-algebra generated by $F$, that is, $\mathcal{G} = \{\emptyset, F, F^c, \Omega\}$. Define
$$P(E \mid \mathcal{G}) = P(E \mid F)\,\chi_F + P(E \mid F^c)\,\chi_{F^c}.$$
a)  Show that $P(E \mid \mathcal{G})$ is $\mathcal{G}$-measurable.
b)  Prove that for each $G \in \mathcal{G}$, $P(G \cap E) = \int_G P(E \mid \mathcal{G})\,dP$.
6.94  Suppose that $\{F_n\}_n$ is a sequence of pairwise mutually exclusive events
each having positive probability and such that $\bigcup_n F_n = \Omega$. Let $\mathcal{G}$ be the
$\sigma$-algebra generated by $\{F_n\}_n$.
a)  Characterize the sets in $\mathcal{G}$.
b)  Prove that, with probability one, $P(E \mid \mathcal{G}) = \sum_n P(E \mid F_n)\,\chi_{F_n}$.
6.95  Suppose that $E \in \mathcal{G}$.
a)  What does your intuition tell you regarding $P(E \mid \mathcal{G})$?
b)  Prove your assertion in part (a).
6.96  Suppose that $\mathcal{G} = \{\Omega, \emptyset\}$.
a)  What does your intuition tell you regarding $P(E \mid \mathcal{G})$?
b)  Prove your assertion in part (a).
6.97  Establish that each of the following holds with probability one.
a)  $P(\Omega \mid \mathcal{G}) = 1$.
b)  For each $E \in \mathcal{A}$, $P(E \mid \mathcal{G}) \ge 0$.
c)  If $E_1, E_2, \ldots$ are in $\mathcal{A}$, with $E_i \cap E_j = \emptyset$ for $i \ne j$, then
$$P\Bigl(\bigcup_n E_n \Bigm| \mathcal{G}\Bigr) = \sum_n P(E_n \mid \mathcal{G}).$$
★6.98  Conditional probability given a random variable: An important
case of conditional probability given a $\sigma$-algebra is when the $\sigma$-algebra is
generated by a random variable, $X$. The $\sigma$-algebra generated by $X$, denoted
$\mathcal{A}(X)$, is by definition the smallest $\sigma$-algebra of subsets of $\Omega$ for which
$X$ is measurable. We define the conditional probability of $E$ given $X$,
denoted $P(E \mid X)$, to be the conditional probability of $E$ given $\mathcal{A}(X)$; that
is, by definition, $P(E \mid X) = P(E \mid \mathcal{A}(X))$.
a)  Show that $\mathcal{A}(X) = \{\, \{X \in B\} : B \in \mathcal{B} \,\}$.
b)  Prove that there is a nonnegative Borel measurable function, $\phi$, such
that $P(E \mid X) = \phi \circ X$, $P$-ae.
c)  Let $\phi$ be the function in part (b). For $x \in \mathcal{R}$, set $P(E \mid X = x) = \phi(x)$,
called the conditional probability of $E$ given $X = x$. Prove that
$$P(\{X \in B\} \cap E) = \int_B P(E \mid X = x)\,d\mu_X(x), \qquad B \in \mathcal{B},$$
where $\mu_X$ is the probability distribution of $X$. Hint: Use Theorem 5.6
on page 291.
d)  Prove that if $g$ is a nonnegative Borel measurable function such that
$$P(\{X \in B\} \cap E) = \int_B g(x)\,d\mu_X(x), \qquad B \in \mathcal{B},$$
then, for $\mu_X$-almost all $x$, $g(x) = P(E \mid X = x)$.
6.99  Refer to Example 5.7(a) on page 279. Suppose $X$ and $Y$ are jointly discrete
random variables with joint probability mass function $p_{X,Y}$. Define
$$p_{Y|X}(y \mid x) = \begin{cases} \dfrac{p_{X,Y}(x, y)}{p_X(x)}, & p_X(x) > 0; \\ 0, & \text{otherwise.} \end{cases}$$
a)  Prove that for each $y \in \mathcal{R}$,
$$P(Y = y \mid X = x) = p_{Y|X}(y \mid x)$$
for each possible value, $x$, of $X$. Hint: Use Exercise 6.98(d).
b)  Determine $P(Y = y \mid X)$.
6.100  Refer to Example 5.7(b) on page 279. Suppose $X$ and $Y$ are jointly absolutely
continuous random variables with joint probability density function $f_{X,Y}$. Define
$$f_{Y|X}(y \mid x) = \begin{cases} \dfrac{f_{X,Y}(x, y)}{f_X(x)}, & f_X(x) > 0; \\ 0, & \text{otherwise.} \end{cases}$$
a)  Prove that for each $C \in \mathcal{B}$,
$$P(Y \in C \mid X = x) = \int_C f_{Y|X}(y \mid x)\,dy$$
for $\mu_X$-almost all $x$. Hint: Use Exercise 6.98(d).
b)  Determine $P(Y \in C \mid X)$.
6.7 SIGNED AND COMPLEX MEASURES
In Section 6.5 we introduced the concept of a signed measure. Now, in
this section, we will further investigate signed measures and also introduce
the concept of complex measures. Recall that if $(\Omega, \mathcal{A})$ is a measurable
space, then a signed measure, $\nu$, on $\mathcal{A}$ is an extended real-valued function
satisfying the following two conditions:
•  $\nu(\emptyset) = 0$.
•  If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$$\nu\Bigl(\bigcup_n A_n\Bigr) = \sum_n \nu(A_n).$$
For this definition to make sense, $\nu$ cannot take on both $\infty$ and $-\infty$ as
values.
We proved the Jordan decomposition theorem: any signed measure
$\nu$ can be expressed uniquely as the difference of two mutually singular
measures, $\nu^+$ and $\nu^-$, on $\mathcal{A}$. The representation $\nu = \nu^+ - \nu^-$ is called the
Jordan decomposition of $\nu$. In fact, if $(D, D^c)$ is a Hahn decomposition
for $\nu$, then
$$\nu^+(A) = \nu(A \cap D) \quad\text{and}\quad \nu^-(A) = -\nu(A \cap D^c).$$
Now we will give names to the measures $\nu^+$ and $\nu^-$ and define yet
another measure corresponding to a signed measure.
DEFINITION 6.14 Variations of a Signed Measure
Suppose that $(\Omega, \mathcal{A})$ is a measurable space and that $\nu$ is a signed measure
on $\mathcal{A}$ with Jordan decomposition $\nu = \nu^+ - \nu^-$. Define
$$|\nu| = \nu^+ + \nu^-.$$
The measures $\nu^+$, $\nu^-$, and $|\nu|$ are called, respectively, the positive
variation, negative variation, and total variation of $\nu$. Note
that $|\nu|$ is a measure, and $|\nu| = \nu$ if $\nu$ is a measure.
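On a finite set, where a signed measure is determined by its point masses, the Hahn and Jordan decompositions and the total variation can be computed directly. The sketch below is only an illustration under that assumption; the set $\{a, b, c, d\}$ and the masses are made up, and `D` is one admissible choice of positive set.

```python
from fractions import Fraction

# Hypothetical signed measure on the finite set {a, b, c, d}, given by point masses.
point_mass = {"a": Fraction(3), "b": Fraction(-2), "c": Fraction(0), "d": Fraction(-1, 2)}

def nu(A):
    """nu(A) as the sum of the point masses in A."""
    return sum((point_mass[x] for x in A), Fraction(0))

# A Hahn decomposition (D, D^c): D carries the nonnegative mass.
D = {x for x, m in point_mass.items() if m >= 0}
Dc = set(point_mass) - D

def nu_plus(A):
    """Positive variation: nu^+(A) = nu(A ∩ D)."""
    return nu(set(A) & D)

def nu_minus(A):
    """Negative variation: nu^-(A) = -nu(A ∩ D^c)."""
    return -nu(set(A) & Dc)

def total_variation(A):
    """Total variation: |nu|(A) = nu^+(A) + nu^-(A)."""
    return nu_plus(A) + nu_minus(A)

A = {"a", "b", "d"}
print(nu(A), nu_plus(A), nu_minus(A), total_variation(A))
```

The point `c`, which has zero mass, could equally well be placed in $D^c$; this reflects the fact that a Hahn decomposition is not unique, while the Jordan decomposition is.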
Before proving our next result, we introduce the following terminology.
DEFINITION 6.15 Measurable Partition
Let $(\Omega, \mathcal{A})$ be a measurable space and $A \in \mathcal{A}$. A finite sequence,
$\{A_k\}_{k=1}^n$, of subsets of $\Omega$ is said to be a measurable partition of $A$
if the $A_k$s are $\mathcal{A}$-measurable, pairwise disjoint, and their union is $A$.
That is,
a) $A_k \in \mathcal{A}$, $k = 1, 2, \ldots, n$,
b) $A_i \cap A_j = \emptyset$ for $i \ne j$, and
c) $\bigcup_{k=1}^n A_k = A$.
The next proposition shows that the concept of the total variation of
a signed measure is similar to that of the total variation of a function.
Exercises 6.103 and 6.104 further explore the analogy.
PROPOSITION 6.11
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Then, for
each $A \in \mathcal{A}$,
$$|\nu|(A) = \sup\Bigl\{ \sum_{k=1}^n |\nu(A_k)| : \{A_k\}_{k=1}^n \text{ is a measurable partition of } A \Bigr\}.$$
PROOF: Let $\{A_k\}_{k=1}^n$ be a measurable partition of $A$. Then, because $|\nu|$ is
a measure, $|\nu|(A) = \sum_{k=1}^n |\nu|(A_k)$. From Exercise 6.101(b), we know that
$|\nu|(A_k) \ge |\nu(A_k)|$. Therefore, $|\nu|(A) \ge \sum_{k=1}^n |\nu(A_k)|$ and, so,
$$|\nu|(A) \ge \sup\Bigl\{ \sum_{k=1}^n |\nu(A_k)| : \{A_k\}_{k=1}^n \text{ is a measurable partition of } A \Bigr\}.$$
To prove the reverse inequality, let $(D, D^c)$ be a Hahn decomposition
for $\nu$, and set $A_1 = A \cap D$ and $A_2 = A \cap D^c$. Then $\{A_1, A_2\}$ is a measurable
partition of $A$ and we have $|\nu(A_1)| + |\nu(A_2)| = \nu^+(A) + \nu^-(A) = |\nu|(A)$.
Hence,
$$|\nu|(A) \le \sup\Bigl\{ \sum_{k=1}^n |\nu(A_k)| : \{A_k\}_{k=1}^n \text{ is a measurable partition of } A \Bigr\}.$$
This completes the proof.
Next we define the abstract Lebesgue integral of a measurable function
with respect to a signed measure. It should be clear that the following
definition is reasonable and natural.
DEFINITION 6.16 Integral with Respect to a Signed Measure
Suppose that $(\Omega, \mathcal{A})$ is a measurable space and that $\nu$ is a signed measure
on $\mathcal{A}$ with Jordan decomposition $\nu = \nu^+ - \nu^-$. Let $E \in \mathcal{A}$ and $f$ be an
extended real-valued or complex-valued $\mathcal{A}$-measurable function on $\Omega$.
Then the (abstract) Lebesgue integral of $f$ over $E$ with respect
to $\nu$ is defined by
$$\int_E f\,d\nu = \int_E f\,d\nu^+ - \int_E f\,d\nu^-,$$
provided the right-hand side makes sense.
EXAMPLE 6.14 Illustrates Definition 6.16
Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$ and $\nu = \lambda - \delta_0 - \delta_1$. Then $\nu^+ = \lambda$ and $\nu^- = \delta_0 + \delta_1$.
Let $f(x) = x^3 + 2$. We have
$$\int_{[0,1]} f\,d\nu = \int_0^1 (x^3 + 2)\,dx - \int_{[0,1]} (x^3 + 2)\,d(\delta_0 + \delta_1)(x) = \frac{9}{4} - (2 + 3) = -\frac{11}{4},$$
as the reader should verify.  □
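The verification asked for in Example 6.14 amounts to one polynomial integral and two point evaluations. The following sketch carries it out in exact arithmetic, assuming the example reads $\nu = \lambda - \delta_0 - \delta_1$ and $f(x) = x^3 + 2$ integrated over $[0, 1]$; the helper names are illustrative.

```python
from fractions import Fraction

def f(x):
    """The integrand f(x) = x^3 + 2."""
    return x**3 + 2

def antiderivative(x):
    """An antiderivative of x^3 + 2, namely x^4/4 + 2x."""
    x = Fraction(x)
    return x**4 / 4 + 2 * x

# Integral over [0, 1] with respect to nu^+ = lambda (Lebesgue measure).
integral_plus = antiderivative(1) - antiderivative(0)

# Integral over [0, 1] with respect to nu^- = delta_0 + delta_1:
# each Dirac measure evaluates f at its atom, and both atoms lie in [0, 1].
integral_minus = f(Fraction(0)) + f(Fraction(1))

# Definition 6.16: subtract the two pieces.
integral_nu = integral_plus - integral_minus
print(integral_plus, integral_minus, integral_nu)
```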
Complex Measures
Here now is the definition of a complex measure.
DEFINITION 6.17 Complex Measure
Let $(\Omega, \mathcal{A})$ be a measurable space. A complex measure, $\nu$, on $\mathcal{A}$ is
a complex-valued countably additive set function; that is,
a)  $\nu(A) \in \mathbb{C}$ for all $A \in \mathcal{A}$.
b)  If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$$\nu\Bigl(\bigcup_n A_n\Bigr) = \sum_n \nu(A_n).$$
Remark: Because, for a complex measure, $\nu(A) \in \mathbb{C}$ for each $A \in \mathcal{A}$, we see
that there are no sets of infinite $\nu$ measure. It follows easily from countable
additivity that $\nu(\emptyset) = 0$.
EXAMPLE 6.15 Illustrates Definition 6.17
a)  Any finite measure or any finite signed measure is a complex measure.
b)  Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. Define,
for $A \in \mathcal{A}$,
$$(\Re\nu)(A) = \Re(\nu(A)) \quad\text{and}\quad (\Im\nu)(A) = \Im(\nu(A)).$$
Then it is easy to see that $\Re\nu$ and $\Im\nu$ are finite signed measures on $\mathcal{A}$
and $\nu = \Re\nu + i\,\Im\nu$.
c)  Lebesgue measure is not a complex measure on $\mathcal{M}$. (Why?)
d)  Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu_k$, $1 \le k \le 4$, finite measures
on $\mathcal{A}$. Define
$$\nu = \nu_1 - \nu_2 + i\nu_3 - i\nu_4,$$
that is, $\nu(A) = \nu_1(A) - \nu_2(A) + i\nu_3(A) - i\nu_4(A)$ for $A \in \mathcal{A}$. Then $\nu$ is
a complex measure on $\mathcal{A}$. As we will see shortly, all complex measures
are of this form.
e)  Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \mathcal{L}^1(\mu)$. Define
$$\nu(A) = \int_A f\,d\mu, \qquad A \in \mathcal{A}.$$
Then, $\nu$ is a complex measure on $\mathcal{A}$.  □
Using the Jordan decomposition theorem, we can easily establish the
following theorem. Its verification is left to the reader as an exercise.
THEOREM 6.11
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. Then
there exist unique measures, $\nu_1^+$, $\nu_1^-$, $\nu_2^+$, $\nu_2^-$, such that $\nu_1^+ \perp \nu_1^-$, $\nu_2^+ \perp \nu_2^-$,
and $\nu = \nu_1^+ - \nu_1^- + i\nu_2^+ - i\nu_2^-$.
Next we want to define the total variation, $|\nu|$, of a complex measure $\nu$.
In view of Proposition 6.11 (page 378), we make the following definition.
DEFINITION 6.18 Total Variation of a Complex Measure
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. For
each $A \in \mathcal{A}$, define
$$|\nu|(A) = \sup\Bigl\{ \sum_{k=1}^n |\nu(A_k)| : \{A_k\}_{k=1}^n \text{ is a measurable partition of } A \Bigr\}.$$
The set function $|\nu|$ is called the total variation of $\nu$.
Because of Proposition 6.11, Definition 6.18 is consistent with the def-
inition of total variation of signed measures. The next proposition shows
that, as is the case for signed measures, the total variation of a complex
measure is a measure.
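PROPOSITION 6.12
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. Then
$|\nu|$ is a finite measure on $\mathcal{A}$.
PROOF: It follows from Definition 6.18 and Theorem 6.11 that
$$|\nu|(A) \le \nu_1^+(A) + \nu_1^-(A) + \nu_2^+(A) + \nu_2^-(A)$$
for all $A \in \mathcal{A}$. Since the right-hand side of the preceding inequality is finite,
we see that $|\nu|(A) < \infty$ for all $A \in \mathcal{A}$.
Clearly, $|\nu|(\emptyset) = 0$ and $|\nu|(A) \ge 0$ for $A \in \mathcal{A}$. Now, let $\{A_n\}_n$ be a
pairwise disjoint sequence of $\mathcal{A}$-measurable sets. We can, without loss of
generality, assume that the sequence is infinite. Set $A = \bigcup_{n=1}^\infty A_n$. We
claim that
$$|\nu|(A) = \sum_{n=1}^\infty |\nu|(A_n).$$
Let $\alpha < |\nu|(A)$. Then there is a measurable partition of $A$, say, $\{E_k\}_{k=1}^m$,
such that $\alpha < \sum_{k=1}^m |\nu(E_k)|$. But, by the countable additivity of $\nu$ and the triangle inequality,
$$\sum_{k=1}^m |\nu(E_k)| = \sum_{k=1}^m \Bigl|\sum_{n=1}^\infty \nu(E_k \cap A_n)\Bigr| \le \sum_{k=1}^m \sum_{n=1}^\infty |\nu(E_k \cap A_n)| = \sum_{n=1}^\infty \sum_{k=1}^m |\nu(E_k \cap A_n)| \le \sum_{n=1}^\infty |\nu|(A_n),$$
where the last inequality holds because, for each $n$, $\{E_k \cap A_n\}_{k=1}^m$ is a
measurable partition of $A_n$. Consequently, we have $\sum_{n=1}^\infty |\nu|(A_n) > \alpha$ for
each $\alpha < |\nu|(A)$. It now follows that $|\nu|(A) \le \sum_{n=1}^\infty |\nu|(A_n)$.
Next we prove the reverse inequality. Let $\epsilon > 0$ be given. For
each $n \in \mathcal{N}$ we can choose a measurable partition $\{E_{nk}\}_{k=1}^{k_n}$ of $A_n$ such
that $\sum_{k=1}^{k_n} |\nu(E_{nk})| > |\nu|(A_n) - \epsilon/2^n$. Let $N \in \mathcal{N}$ be fixed but arbitrary.
Then $\{E_{nk} : 1 \le k \le k_n,\ 1 \le n \le N\}$ is a measurable partition of
$\bigcup_{n=1}^N A_n$. Using Exercise 6.107, we obtain that
$$|\nu|(A) \ge |\nu|\Bigl(\bigcup_{n=1}^N A_n\Bigr) \ge \sum_{n=1}^N \sum_{k=1}^{k_n} |\nu(E_{nk})| > \sum_{n=1}^N \Bigl(|\nu|(A_n) - \frac{\epsilon}{2^n}\Bigr) = \sum_{n=1}^N |\nu|(A_n) - \sum_{n=1}^N \frac{\epsilon}{2^n}.$$
Letting $N \to \infty$ gives $|\nu|(A) \ge \sum_{n=1}^\infty |\nu|(A_n) - \epsilon$. As $\epsilon > 0$ was chosen
arbitrarily, we have $\sum_{n=1}^\infty |\nu|(A_n) \le |\nu|(A)$.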
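For a complex measure on a finite set, the supremum in Definition 6.18 runs over finitely many partitions and can be computed by brute force. The sketch below is an illustration only: the point masses are invented, and `partitions` is a helper written for this example. It evaluates the supremum and checks additivity of $|\nu|$ on two disjoint sets, as Proposition 6.12 asserts.

```python
import math

# Hypothetical complex measure on {a, b, c, d}, given by complex point masses.
mass = {"a": 1 + 1j, "b": -2 + 0j, "c": 3j, "d": 1 - 1j}

def nu(A):
    """nu(A) as the sum of the point masses in A."""
    return sum((mass[x] for x in A), 0j)

def partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller

def total_variation(A):
    """Supremum (here a maximum) of sum |nu(A_k)| over all partitions of A."""
    return max(sum(abs(nu(block)) for block in p) for p in partitions(sorted(A)))

A1, A2 = {"a", "b"}, {"c", "d"}
print(total_variation(A1), total_variation(A2), total_variation(A1 | A2))
```

In this finite setting the maximum is attained by the partition into singletons, so $|\nu|(A)$ is simply the sum of the moduli of the point masses in $A$; the brute-force search is kept only to mirror the definition.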
Radon-Nikodym Theorem for Complex Measures
Next, we will prove the Radon-Nikodym theorem for complex measures.
We begin with the following obvious extension of the definition of absolute
continuity.
DEFINITION 6.19 Absolutely Continuous Complex Measures
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\nu$ a complex measure on $\mathcal{A}$. Then
$\nu$ is said to be absolutely continuous with respect to $\mu$, denoted
$\nu \ll \mu$, if $\nu(A) = 0$ whenever $\mu(A) = 0$.
We have noted previously that if $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$, then the set function
$$\nu(A) = \int_A f\,d\mu, \qquad A \in \mathcal{A},$$
is a complex measure and, clearly, $\nu \ll \mu$. The following generalization
of the Radon-Nikodym theorem shows that, under $\sigma$-finite conditions, the
converse is true.
THEOREM 6.12 Radon-Nikodym Theorem for Complex Measures
Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu$ a complex measure on $\mathcal{A}$.
If $\nu \ll \mu$, then there is a function $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$ such that
$$\nu(A) = \int_A f\,d\mu, \qquad A \in \mathcal{A}. \tag{6.40}$$
Moreover, $f$ is unique in the sense that if $g$ is an $\mathcal{A}$-measurable function
with $\nu(A) = \int_A g\,d\mu$ for all $A \in \mathcal{A}$, then $g = f$ $\mu$-ae.
PROOF: First we assume that $\nu$ is a finite signed measure. Let $(D, D^c)$ be
a Hahn decomposition for $\nu$, and let $\nu^+$ and $\nu^-$ be the positive and negative
variations of $\nu$. Because $\nu$ is finite, so are $\nu^+$ and $\nu^-$.
Suppose $\mu(A) = 0$. Then $\mu(A \cap D) = 0$ and, therefore, because $\nu \ll \mu$,
we have $\nu^+(A) = \nu(A \cap D) = 0$. Hence, $\nu^+ \ll \mu$. Similarly, $\nu^- \ll \mu$.
Applying the Radon-Nikodym theorem (Theorem 6.10 on page 365), we
conclude that there exist nonnegative $\mathcal{A}$-measurable functions, $f_1$ and $f_2$,
such that for each $A \in \mathcal{A}$,
$$\nu^+(A) = \int_A f_1\,d\mu \quad\text{and}\quad \nu^-(A) = \int_A f_2\,d\mu.$$
Note that $f_1, f_2 \in \mathcal{L}^1(\mu)$ because $\nu^+$ and $\nu^-$ are finite measures. Letting
$f = f_1 - f_2$, we see that $f \in \mathcal{L}^1(\mu)$ and that (6.40) holds.
To prove uniqueness, suppose that $g$ is an $\mathcal{A}$-measurable function such
that $\nu(A) = \int_A g\,d\mu$ for each $A \in \mathcal{A}$. Since $\nu$ is a finite signed measure, $g \in \mathcal{L}^1(\mu)$
and, hence, so is $f - g$.
Now, let $E = \{\, x : f(x) > g(x) \,\}$. We have
$$\int_E (f - g)\,d\mu = \int_E f\,d\mu - \int_E g\,d\mu = \nu(E) - \nu(E) = 0.$$
Because $f - g > 0$ on $E$, it must be that $\mu(E) = 0$. Hence, $f \le g$ $\mu$-ae.
Similarly, $f \ge g$ $\mu$-ae. Consequently, $g = f$ $\mu$-ae.
Now suppose that $\nu$ is a complex measure. Applying what was just
proved to the finite signed measures $\Re\nu$ and $\Im\nu$, we find that there are
functions $g_1, g_2 \in \mathcal{L}^1(\mu)$ such that for each $A \in \mathcal{A}$,
$$(\Re\nu)(A) = \int_A g_1\,d\mu \quad\text{and}\quad (\Im\nu)(A) = \int_A g_2\,d\mu.$$
Letting $f = g_1 + ig_2$ yields a function in $\mathcal{L}^1(\mu)$ such that (6.40) holds.
Uniqueness follows easily from the uniqueness established in the preceding
for finite signed measures.
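When $\Omega$ is finite and every subset is measurable, the function $f$ in (6.40) can be written down explicitly as a ratio of point masses. The sketch below illustrates this under made-up data; `mu`, `nu_mass`, and the convention of setting $f = 0$ on $\mu$-null points are assumptions of the example (any value on a $\mu$-null set would serve, which is the "unique $\mu$-ae" clause in miniature).

```python
import cmath
from itertools import combinations

# Hypothetical measure mu and complex measure nu on {a, b, c, d}, via point masses.
mu = {"a": 2.0, "b": 0.5, "c": 0.0, "d": 4.0}
nu_mass = {"a": 1 + 2j, "b": -1j, "c": 0j, "d": -2 + 0j}

# Absolute continuity nu << mu: nu vanishes wherever mu does.
absolutely_continuous = all(nu_mass[x] == 0 for x in mu if mu[x] == 0)

# Candidate Radon-Nikodym derivative: ratio of point masses off the mu-null set.
f = {x: (nu_mass[x] / mu[x] if mu[x] > 0 else 0j) for x in mu}

def nu(A):
    """nu(A) as the sum of the point masses in A."""
    return sum((nu_mass[x] for x in A), 0j)

def integral(g, A):
    """Integral of g over A with respect to mu (a finite sum here)."""
    return sum((g[x] * mu[x] for x in A), 0j)

# Check (6.40) on every subset A of the four-point space.
points = sorted(mu)
subsets = [set(c) for r in range(len(points) + 1) for c in combinations(points, r)]
all_match = all(cmath.isclose(nu(A), integral(f, A), abs_tol=1e-12) for A in subsets)
print(absolutely_continuous, all_match)
```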
Conditional Expectation Given a $\sigma$-algebra
Let $(\Omega, \mathcal{A}, P)$ be a probability space and $F$ an event having positive probability,
that is, $F \in \mathcal{A}$ and $P(F) > 0$. Recall that the set function, $P_F$,
defined by
$$P_F(E) = P(E \mid F) = \frac{P(E \cap F)}{P(F)}, \qquad E \in \mathcal{A},$$
is a probability measure on $\mathcal{A}$. Hence, we can define expectation with
respect to $P_F$. This is called conditional expectation relative to $F$.
Thus, if $Y \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$, then the conditional expectation of $Y$ given
that event $F$ has occurred is defined by
$$\mathcal{E}(Y \mid F) = \int_\Omega Y\,dP_F.$$
According to Exercise 5.84(a) on page 300,
$$\mathcal{E}(Y \mid F) = \frac{1}{P(F)} \int_F Y\,dP.$$
We can generalize the notion of conditional expectation by conditioning
on a $\sigma$-algebra instead of just an event. The idea is that if $\mathcal{G}$ is
a $\sigma$-algebra with $\mathcal{G} \subset \mathcal{A}$, then conditioning on $\mathcal{G}$ means that we know
whether or not each $G \in \mathcal{G}$ has occurred; and the conditional expectation
of a random variable $Y$, given $\mathcal{G}$, denoted $\mathcal{E}(Y \mid \mathcal{G})$, is the expectation of $Y$
computed with that knowledge.
To see how to define $\mathcal{E}(Y \mid \mathcal{G})$, we consider the simplest nontrivial case.
Suppose that $F$ is an event with probability strictly between 0 and 1.
Let $\mathcal{G}$ be the $\sigma$-algebra generated by $F$, that is, the smallest $\sigma$-algebra
containing $F$; clearly, $\mathcal{G} = \{\emptyset, F, F^c, \Omega\}$. Then, given $\mathcal{G}$, we know whether
or not $F$ has occurred, that is, whether $F$ has occurred or $F^c$ has occurred.
In the former case, we have $\mathcal{E}(Y \mid \mathcal{G}) = \mathcal{E}(Y \mid F)$ and, in the latter case,
$\mathcal{E}(Y \mid \mathcal{G}) = \mathcal{E}(Y \mid F^c)$. In other words,
$$\mathcal{E}(Y \mid \mathcal{G}) = \mathcal{E}(Y \mid F)\,\chi_F + \mathcal{E}(Y \mid F^c)\,\chi_{F^c}.$$
Note that $\mathcal{E}(Y \mid \mathcal{G})$ is not only a random variable (i.e., is $\mathcal{A}$-measurable),
but is in fact $\mathcal{G}$-measurable. Furthermore, it is not too difficult to see that
$$\int_G Y\,dP = \int_G \mathcal{E}(Y \mid \mathcal{G})\,dP, \qquad G \in \mathcal{G}. \tag{6.41}$$
We can use the complex version of the Radon-Nikodym theorem to
show the existence of conditional expectation given a $\sigma$-algebra in the general
case. Keeping (6.41) in mind, we have the following proposition.
PROPOSITION 6.13 Existence of Conditional Expectation
Let $(\Omega, \mathcal{A}, P)$ be a probability space, $Y \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$, and $\mathcal{G}$ a $\sigma$-algebra
with $\mathcal{G} \subset \mathcal{A}$. Then there exists a $\mathcal{G}$-measurable function, $\mathcal{E}(Y \mid \mathcal{G})$, such
that
$$\int_G Y\,dP = \int_G \mathcal{E}(Y \mid \mathcal{G})\,dP, \qquad G \in \mathcal{G}. \tag{6.42}$$
Moreover, such a function is unique $P$-ae and is called the conditional
expectation of $Y$ given $\mathcal{G}$.
PROOF: Define $\nu_Y(G) = \int_G Y\,dP$, for $G \in \mathcal{G}$. Then $\nu_Y$ is a complex
measure on $\mathcal{G}$ and $\nu_Y \ll P$. The result now follows from the complex
version of the Radon-Nikodym theorem.
Although it follows trivially from (6.42), it is important to note that
$\mathcal{E}(\mathcal{E}(Y \mid \mathcal{G})) = \mathcal{E}(Y)$. We will investigate further properties of conditional
expectation given a $\sigma$-algebra in the exercises.
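On a finite probability space whose $\sigma$-algebra $\mathcal{G}$ is generated by a partition, conditional expectation reduces to averaging over each block. The sketch below is an illustration under assumed data (a fair die, $Y(\omega) = \omega^2$, and a three-block partition); it checks (6.42) on every set of $\mathcal{G}$ and the identity $\mathcal{E}(\mathcal{E}(Y \mid \mathcal{G})) = \mathcal{E}(Y)$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite model: one roll of a fair die, with Y(w) = w^2.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
Y = {w: Fraction(w * w) for w in omega}

# Partition generating the sigma-algebra G (an assumption of this example).
blocks = [{1, 2}, {3, 4, 5}, {6}]

def integral(Z, G):
    """Integral of Z over G with respect to P (a finite sum here)."""
    return sum((Z[w] * prob[w] for w in G), Fraction(0))

def cond_exp(Z, blocks):
    """E(Z | G): on each block F, the constant (1/P(F)) * integral of Z over F."""
    result = {}
    for F in blocks:
        pF = sum((prob[w] for w in F), Fraction(0))
        average = integral(Z, F) / pF
        for w in F:
            result[w] = average
    return result

EYG = cond_exp(Y, blocks)

# Every set in G is a union of blocks; list all of them.
sigma_G = [set().union(*combo)
           for r in range(len(blocks) + 1)
           for combo in combinations(blocks, r)]

# Compare the two sides of (6.42) for each G in sigma_G.
checks = [(integral(Y, G), integral(EYG, G)) for G in sigma_G]
print(EYG[1], EYG[4], EYG[6], integral(EYG, omega), integral(Y, omega))
```

Note that `EYG` is constant on each block, which is exactly what $\mathcal{G}$-measurability means in this setting.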
EXERCISES 6.7
6.101  Let $\nu$ be a signed measure on $(\Omega, \mathcal{A})$ and $\nu = \nu^+ - \nu^-$ its Jordan decomposition.
Show that for $A \in \mathcal{A}$,
a) $-\nu^-(A) \le \nu(A) \le \nu^+(A)$
b) $|\nu(A)| \le |\nu|(A)$
6.102  Let $\nu_1$ and $\nu_2$ be finite signed measures on $(\Omega, \mathcal{A})$. Prove that
$$|\nu_1 + \nu_2| \le |\nu_1| + |\nu_2|;$$
that is, $|\nu_1 + \nu_2|(A) \le |\nu_1|(A) + |\nu_2|(A)$ for each $A \in \mathcal{A}$.
6.103  Let $f$ be an extended real-valued Borel-measurable function, integrable
with respect to Lebesgue measure. Then $\nu(B) = \int_B f\,d\lambda$ is a finite signed
measure on $\mathcal{B}$. Define $F_\nu(x) = \nu((-\infty, x]) = \int_{-\infty}^x f(t)\,dt$. Prove that
$$|\nu|((a, b]) = V_a^b F_\nu, \qquad -\infty < a < b < \infty.$$
6.104  Exercise 6.103 can be generalized: Let $\nu$ be a finite signed measure on $\mathcal{B}$.
Define $F_\nu(x) = \nu((-\infty, x])$ for $x \in \mathcal{R}$. Then, it can be proved that
$$|\nu|((a, b]) = V_a^b F_\nu, \qquad -\infty < a < b < \infty.$$
Show that the preceding equation does not necessarily hold for other types
of intervals, even if $\nu$ is a measure.
6.105  Suppose that $(\Omega, \mathcal{A})$ is a measurable space and that $\nu$ is a signed measure
on $\mathcal{A}$ with Jordan decomposition $\nu = \nu^+ - \nu^-$.
a) Show that $f \in \mathcal{L}^1(|\nu|)$ if and only if $f \in \mathcal{L}^1(\nu^+) \cap \mathcal{L}^1(\nu^-)$.
b) Suppose that $f$ is $\mathcal{A}$-measurable and $|f| \le M$ on $\Omega$. Prove that for
each $E \in \mathcal{A}$,
$$\Bigl|\int_E f\,d\nu\Bigr| \le M \cdot |\nu|(E).$$
6.106  Prove Theorem 6.11 on page 380.
6.107  Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. Using
only Definition 6.18 on page 381, prove that $|\nu|$ is monotone; that is, if
$A, B \in \mathcal{A}$ and $A \subset B$, then $|\nu|(A) \le |\nu|(B)$.
6.108  Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$.
a)  Prove that $|\nu(A)| \le |\nu|(A)$ for each $A \in \mathcal{A}$.
b)  From part (a), we have $|\nu(A)| \le |\nu|(A)$ for each $A \in \mathcal{A}$. Prove that
$|\nu|$ is the smallest measure dominating $\nu$ in that sense; that is, if $\tau$ is a
measure on $\mathcal{A}$ such that $|\nu(A)| \le \tau(A)$ for each $A \in \mathcal{A}$, then $|\nu| \le \tau$.
6.109  Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{P}(\mathcal{R}))$ and $\nu = \delta_0 + i\delta_0$. Determine $|\nu|$.
6.110  Recall that the total variation of a signed measure, $\nu$, is $|\nu| = \nu^+ + \nu^-$,
where $\nu = \nu^+ - \nu^-$ is the Jordan decomposition of $\nu$. According to Theorem 6.11,
a complex measure can be uniquely decomposed into four measures
as $\nu = \nu_1^+ - \nu_1^- + i\nu_2^+ - i\nu_2^-$, where $\nu_1^+ \perp \nu_1^-$ and $\nu_2^+ \perp \nu_2^-$. Show
that it is not generally true that $|\nu|$ equals $\nu_1^+ + \nu_1^- + \nu_2^+ + \nu_2^-$.
6.111  Let $(\Omega, \mathcal{A})$ be a measurable space. Prove the following facts.
a)  If $\nu$ is a complex measure on $\mathcal{A}$ and $\alpha \in \mathbb{C}$, then $|\alpha\nu| = |\alpha||\nu|$.
b)  If $\nu_1$ and $\nu_2$ are complex measures on $\mathcal{A}$, then $|\nu_1 + \nu_2| \le |\nu_1| + |\nu_2|$.
★6.112  Let $(\Omega, \mathcal{A})$ be a measurable space. For each complex measure, $\nu$, on $\mathcal{A}$,
define $\|\nu\| = |\nu|(\Omega)$. Prove that
a)  $\|\nu_1 + \nu_2\| \le \|\nu_1\| + \|\nu_2\|$ for all complex measures $\nu_1$ and $\nu_2$ on $\mathcal{A}$.
b)  $\|\alpha\nu\| = |\alpha|\,\|\nu\|$ for all complex numbers $\alpha \in \mathbb{C}$ and all complex measures
$\nu$ on $\mathcal{A}$.
c)  $\|\nu\| = 0$ implies that $\nu = 0$.
6.113  Let $\nu$ be a complex measure on $(\Omega, \mathcal{A})$ and $\nu = \nu_1^+ - \nu_1^- + i\nu_2^+ - i\nu_2^-$
the decomposition of $\nu$ given by Theorem 6.11 on page 380. Show that
$f \in \mathcal{L}^1(|\nu|)$ if and only if $f \in \mathcal{L}^1(\nu_1^+) \cap \mathcal{L}^1(\nu_1^-) \cap \mathcal{L}^1(\nu_2^+) \cap \mathcal{L}^1(\nu_2^-)$.
6.114  Let $\nu$ be a complex measure on $(\Omega, \mathcal{A})$. If $f \in \mathcal{L}^1(|\nu|)$, define
$$\int_\Omega f\,d\nu = \int_\Omega f\,d\nu_1^+ - \int_\Omega f\,d\nu_1^- + i\int_\Omega f\,d\nu_2^+ - i\int_\Omega f\,d\nu_2^-.$$
(Exercise 6.113 shows that the right-hand side of the foregoing equation
makes sense.) Prove the following:
a)  If $f, g \in \mathcal{L}^1(|\nu|)$, then
$$\int_\Omega (f + g)\,d\nu = \int_\Omega f\,d\nu + \int_\Omega g\,d\nu.$$
b)	If a E C and f E then
★6.115 Refer to Exercise 6.114. Let (Q, Л,/z) be a measure space and f E £1(/z).
Define
fdp, A E A.
Prove that for <? E and A E Л,
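6.116  Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \mathcal{L}^1(\mu)$. Define
$$\nu(A) = \int_A f\,d\mu, \qquad A \in \mathcal{A}.$$
Prove that
$$|\nu|(A) = \int_A |f|\,d\mu, \qquad A \in \mathcal{A}.$$
Hint: To prove that $|\nu|(A) \ge \int_A |f|\,d\mu$, choose a sequence $\{s_n\}_{n=1}^\infty$ of
$\mathcal{A}$-measurable simple functions such that $s_n \to \chi_A \cdot \overline{\operatorname{sgn} f}$ pointwise on $\Omega$,
where $\overline{\phantom{x}}$ denotes complex conjugation and
$$(\operatorname{sgn} f)(x) = \begin{cases} f(x)/|f(x)|, & f(x) \ne 0; \\ 0, & f(x) = 0. \end{cases}$$
Show that each $s_n$ can be chosen so that $|s_n| \le 1$.
★6.117  Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$.
a)  Prove that there is an $\mathcal{A}$-measurable function $\phi$ on $\Omega$ such that $|\phi| = 1$
and
$$\nu(A) = \int_A \phi\,d|\nu|, \qquad A \in \mathcal{A}.$$
Hint: Use the Radon-Nikodym theorem and Exercise 6.116.
b)  Suppose that $f$ is $\mathcal{A}$-measurable and $|f| \le M$ on $\Omega$. Prove that for
each $E \in \mathcal{A}$,
$$\Bigl|\int_E f\,d\nu\Bigr| \le M \cdot |\nu|(E).$$
Hint: Refer to part (a) and Exercise 6.115.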
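In Exercises 6.118-6.129, $(\Omega, \mathcal{A}, P)$ is a probability space, $Y \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$, and
$\mathcal{G}$ is a $\sigma$-algebra of subsets of $\Omega$ with $\mathcal{G} \subset \mathcal{A}$.
6.118  Suppose that $F$ is an event with probability strictly between 0 and 1.
Let $\mathcal{G}$ be the $\sigma$-algebra generated by $F$, that is, $\mathcal{G} = \{\emptyset, F, F^c, \Omega\}$. Define
$$\mathcal{E}(Y \mid \mathcal{G}) = \mathcal{E}(Y \mid F)\,\chi_F + \mathcal{E}(Y \mid F^c)\,\chi_{F^c}.$$
a)  Show that $\mathcal{E}(Y \mid \mathcal{G})$ is $\mathcal{G}$-measurable.
b)  Prove that for each $G \in \mathcal{G}$, $\int_G Y\,dP = \int_G \mathcal{E}(Y \mid \mathcal{G})\,dP$. Hint: Refer to
Exercise 5.84 on page 300.
6.119  Suppose that $\{F_n\}_n$ is a sequence of pairwise mutually exclusive events
each having positive probability and such that $\bigcup_n F_n = \Omega$. Let $\mathcal{G}$ be the
$\sigma$-algebra generated by $\{F_n\}_n$.
a)  Characterize the sets in $\mathcal{G}$.
b)  Prove that, with probability one, $\mathcal{E}(Y \mid \mathcal{G}) = \sum_n \mathcal{E}(Y \mid F_n)\,\chi_{F_n}$.
6.120  Show that, for each $E \in \mathcal{A}$, $P(E \mid \mathcal{G}) = \mathcal{E}(\chi_E \mid \mathcal{G})$ $P$-ae. Thus, conditional
probability is a special case of conditional expectation.
6.121  Suppose that $Y$ is $\mathcal{G}$-measurable.
a)  What does your intuition tell you regarding $\mathcal{E}(Y \mid \mathcal{G})$?
b)  Prove your assertion in part (a).
6.122  Suppose that $\mathcal{G} = \{\Omega, \emptyset\}$.
a)  What does your intuition tell you regarding $\mathcal{E}(Y \mid \mathcal{G})$?
b)  Prove your assertion in part (a).
6.123  Let $Y, Y_1, Y_2 \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$ and $\alpha, \alpha_1, \alpha_2 \in \mathbb{C}$. Establish that each of the
following holds with probability one.
a)  If $Y = \alpha$, then $\mathcal{E}(Y \mid \mathcal{G}) = \alpha$.
b)  If $Y_1 \le Y_2$, then $\mathcal{E}(Y_1 \mid \mathcal{G}) \le \mathcal{E}(Y_2 \mid \mathcal{G})$.
c)  $\mathcal{E}(\alpha Y \mid \mathcal{G}) = \alpha\,\mathcal{E}(Y \mid \mathcal{G})$.
d)  $\mathcal{E}(Y_1 + Y_2 \mid \mathcal{G}) = \mathcal{E}(Y_1 \mid \mathcal{G}) + \mathcal{E}(Y_2 \mid \mathcal{G})$.
e)  $|\mathcal{E}(Y \mid \mathcal{G})| \le \mathcal{E}(|Y| \mid \mathcal{G})$.
6.124  Let $\{Y_n\}_{n=1}^\infty$ be a nondecreasing sequence of nonnegative, integrable random
variables converging to the integrable random variable $Y$. Prove that,
with probability one, $\lim_{n \to \infty} \mathcal{E}(Y_n \mid \mathcal{G}) = \mathcal{E}(Y \mid \mathcal{G})$.
6.125  Suppose $\mathcal{G}_1$ and $\mathcal{G}_2$ are $\sigma$-algebras such that $\mathcal{G}_1 \subset \mathcal{G}_2 \subset \mathcal{A}$. Prove that,
with probability one,
a) $\mathcal{E}(\mathcal{E}(Y \mid \mathcal{G}_1) \mid \mathcal{G}_2) = \mathcal{E}(Y \mid \mathcal{G}_1)$.
b) $\mathcal{E}(\mathcal{E}(Y \mid \mathcal{G}_2) \mid \mathcal{G}_1) = \mathcal{E}(Y \mid \mathcal{G}_1)$.
6.126  Suppose that $Z$ is $\mathcal{G}$-measurable and $ZY \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$. Prove that, with
probability one, $\mathcal{E}(ZY \mid \mathcal{G}) = Z\,\mathcal{E}(Y \mid \mathcal{G})$.
6.127  Conditional expectation given a random variable: A special case of
conditional expectation given a $\sigma$-algebra is when the $\sigma$-algebra is generated
by a random variable, $X$. Recall from Exercise 6.98 on page 375 that
the $\sigma$-algebra generated by $X$, denoted $\mathcal{A}(X)$, is the smallest $\sigma$-algebra
of subsets of $\Omega$ for which $X$ is measurable. Exercise 6.98(a) shows that
$\mathcal{A}(X) = \{\, \{X \in B\} : B \in \mathcal{B} \,\}$. If $Y \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$, then we define
the conditional expectation of $Y$ given $X$, denoted $\mathcal{E}(Y \mid X)$, to
be the conditional expectation of $Y$ given $\mathcal{A}(X)$; that is, by definition,
$\mathcal{E}(Y \mid X) = \mathcal{E}(Y \mid \mathcal{A}(X))$.
a)  Prove that there is a Borel measurable function, $\phi$, such that $\mathcal{E}(Y \mid X) = \phi \circ X$, $P$-ae. Hint: First assume $Y \ge 0$.
b)  Let $\phi$ be the function in part (a). For $x \in \mathcal{R}$, set $\mathcal{E}(Y \mid X = x) = \phi(x)$,
called the conditional expectation of $Y$ given $X = x$. Prove that
$$\int_{\{X \in B\}} Y\,dP = \int_B \mathcal{E}(Y \mid X = x)\,d\mu_X(x), \qquad B \in \mathcal{B},$$
where $\mu_X$ is the probability distribution of $X$. Hint: Use Theorem 5.6
on page 291.
c)  Prove that if $g$ is a Borel measurable function such that
$$\int_{\{X \in B\}} Y\,dP = \int_B g(x)\,d\mu_X(x), \qquad B \in \mathcal{B},$$
then, for $\mu_X$-almost all $x$, $g(x) = \mathcal{E}(Y \mid X = x)$.
6.128  Refer to Example 5.7(a) on page 279. Suppose $X$ and $Y$ are jointly discrete
random variables with joint probability mass function $p_{X,Y}$. Define
$$p_{Y|X}(y \mid x) = \begin{cases} \dfrac{p_{X,Y}(x, y)}{p_X(x)}, & p_X(x) > 0; \\ 0, & \text{otherwise.} \end{cases}$$
a)  Prove that
$$\mathcal{E}(Y \mid X = x) = \sum_y y\,p_{Y|X}(y \mid x)$$
for each possible value, $x$, of $X$. Hint: Use Theorem 5.7 on page 293
and Exercise 6.127(c).
b)  Determine $\mathcal{E}(Y \mid X)$.
6.129  Refer to Example 5.7(b) on page 279. Suppose $X$ and $Y$ are jointly absolutely
continuous random variables with joint probability density function $f_{X,Y}$. Define
$$f_{Y|X}(y \mid x) = \begin{cases} \dfrac{f_{X,Y}(x, y)}{f_X(x)}, & f_X(x) > 0; \\ 0, & \text{otherwise.} \end{cases}$$
a)  Prove that
$$\mathcal{E}(Y \mid X = x) = \int_{-\infty}^{\infty} y\,f_{Y|X}(y \mid x)\,dy$$
for $\mu_X$-almost all $x$. Hint: Use Theorem 5.7 on page 293 and Exercise 6.127(c).
b)  Determine $\mathcal{E}(Y \mid X)$.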
6.8 DECOMPOSITION OF MEASURES
In this section, we will study several results regarding the decomposition of
measures. Our first result, known as the Lebesgue decomposition theorem,
is a consequence of the Radon-Nikodym theorem.
THEOREM 6.13 Lebesgue Decomposition Theorem
Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu$ a $\sigma$-finite measure on $\mathcal{A}$.
Then there exist measures, $\nu_1$ and $\nu_2$, on $\mathcal{A}$ such that $\nu_1 \ll \mu$, $\nu_2 \perp \mu$,
and $\nu = \nu_1 + \nu_2$. Moreover, such a representation is unique. It is called the
Lebesgue decomposition of $\nu$ with respect to $\mu$.
PROOF: Clearly $\mu \ll \mu + \nu$, and $\mu$ and $\mu + \nu$ are $\sigma$-finite. Therefore, the
Radon-Nikodym theorem implies that there is a nonnegative $\mathcal{A}$-measurable
function $f$ on $\Omega$ such that
$$\mu(A) = \int_A f\,d(\mu + \nu), \qquad A \in \mathcal{A}.$$
Let $E = \{\, x : f(x) > 0 \,\}$. Obviously, then, $\mu(E^c) = 0$.
Define measures $\nu_1$ and $\nu_2$ on $\mathcal{A}$ by
$$\nu_1(A) = \nu(A \cap E) \quad\text{and}\quad \nu_2(A) = \nu(A \cap E^c).$$
Clearly, $\nu = \nu_1 + \nu_2$. Moreover, since $\nu_2(E) = 0$ and $\mu(E^c) = 0$, we see
that $\nu_2 \perp \mu$.
We claim that $\nu_1 \ll \mu$. So, suppose $\mu(A) = 0$. Then
$$\int_{A \cap E} f\,d(\mu + \nu) = \int_A f\,d(\mu + \nu) = \mu(A) = 0.$$
Because $f > 0$ on $A \cap E$, it must be that $(\mu + \nu)(A \cap E) = 0$ and, hence,
$\nu_1(A) = \nu(A \cap E)$ must equal zero.
It remains to prove uniqueness. Assume that $\nu = \omega_1 + \omega_2$, where
$\omega_1 \ll \mu$ and $\omega_2 \perp \mu$. We must show that $\omega_1 = \nu_1$ and $\omega_2 = \nu_2$.
Because $\nu_2 \perp \mu$ and $\omega_2 \perp \mu$, there exist sets $B, C \in \mathcal{A}$ such that
$\mu(B) = \mu(C) = 0$ and $\nu_2(B^c) = \omega_2(C^c) = 0$. In particular, then, any
subset of $B \cup C$ has $\mu$-measure zero and any subset of $B^c \cap C^c$ has $\nu_2$- and
$\omega_2$-measure zero. Since $\nu_1 \ll \mu$ and $\omega_1 \ll \mu$, it follows that any subset
of $B \cup C$ also has $\nu_1$- and $\omega_1$-measure zero. Thus, for $A \in \mathcal{A}$,
$$\begin{aligned}
\omega_2(A) &= \omega_2(A \cap (B \cup C)) + \omega_2(A \cap (B^c \cap C^c)) = \omega_2(A \cap (B \cup C)) \\
&= \omega_1(A \cap (B \cup C)) + \omega_2(A \cap (B \cup C)) = \nu(A \cap (B \cup C)) \\
&= \nu_1(A \cap (B \cup C)) + \nu_2(A \cap (B \cup C)) = \nu_2(A \cap (B \cup C)) \\
&= \nu_2(A \cap (B \cup C)) + \nu_2(A \cap (B^c \cap C^c)) = \nu_2(A).
\end{aligned}$$
Hence $\omega_2 = \nu_2$. A similar argument shows that $\omega_1 = \nu_1$.
EXAMPLE 6.16 Illustrates the Lebesgue Decomposition Theorem
Let $F\colon \mathcal{R} \to \mathcal{R}$ be defined by
$$F(x) = \begin{cases} 0, & x < 0; \\ 3 - e^{-x}, & 0 \le x < 1; \\ 4 - e^{-x}, & x \ge 1. \end{cases}$$
Note that $F$ is a distribution function; that is, $F$ is nondecreasing, right
continuous, bounded, and $F(x) \to 0$ as $x \to -\infty$. According to Theorem 4.13
on page 226, there is a unique finite Borel measure, $\nu$, having $F$
as its distribution function. We will obtain the Lebesgue decomposition
of $\nu$ with respect to $\lambda$ (considered a Borel measure). Define $f(x) = e^{-x}$ for
$x > 0$ and zero otherwise, and let
$$\nu_1(B) = \int_B f(t)\,dt, \qquad B \in \mathcal{B}.$$
Also, set $\nu_2 = 2\delta_0 + \delta_1$. Then, clearly, $\nu_1 \ll \lambda$ and $\nu_2 \perp \lambda$. A simple
calculation shows that $\nu_1 + \nu_2$ has $F$ as its distribution function, which
implies that $\nu = \nu_1 + \nu_2$. Therefore, we have found the unique Lebesgue
decomposition of $\nu$ with respect to $\lambda$. Example 6.20 provides an alternate
(and more straightforward) method for obtaining this decomposition.  □
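The "simple calculation" in Example 6.16 can be spot-checked numerically. The sketch below assumes that $F$ equals $0$ for $x < 0$, $3 - e^{-x}$ for $0 \le x < 1$, and $4 - e^{-x}$ for $x \ge 1$, and compares it with the distribution function of $\nu_1 + \nu_2$, where $\nu_1$ has density $e^{-x}$ on $(0, \infty)$ and $\nu_2 = 2\delta_0 + \delta_1$. The function names and sample points are illustrative choices.

```python
import math

def F(x):
    """The distribution function of Example 6.16 (as assumed above)."""
    if x < 0:
        return 0.0
    if x < 1:
        return 3.0 - math.exp(-x)
    return 4.0 - math.exp(-x)

def nu1_cdf(x):
    """nu_1((-inf, x]): integral of e^{-t} over (0, x], i.e. 1 - e^{-x} for x > 0."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def nu2_cdf(x):
    """nu_2((-inf, x]) for nu_2 = 2*delta_0 + delta_1."""
    return (2.0 if x >= 0 else 0.0) + (1.0 if x >= 1 else 0.0)

sample_points = [-1.0, -0.001, 0.0, 0.5, 0.999, 1.0, 2.0, 10.0]
agree = all(
    math.isclose(F(x), nu1_cdf(x) + nu2_cdf(x), abs_tol=1e-12)
    for x in sample_points
)
print(agree)
```

The jumps of $F$ at $0$ and $1$ (of sizes 2 and 1) are exactly the atoms of $\nu_2$, while the smooth part of $F$ is the distribution function of $\nu_1$.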
Further Decomposition of Measures
Suppose that $(\Omega, \mathcal{A}, \mu)$ is a $\sigma$-finite measure space such that $\{x\} \in \mathcal{A}$ for
all $x \in \Omega$. We will show that if $\nu$ is a $\sigma$-finite measure on $\mathcal{A}$, then it
can be decomposed into three mutually singular measures, of which one is
absolutely continuous with respect to $\mu$, one is singular with respect to $\mu$
and has no atoms, and one is singular with respect to $\mu$ and is discrete. To
begin, we recall the following definitions.
DEFINITION 6.20 Atoms
Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$ and
let $\nu$ be a measure on $\mathcal{A}$. An element $x \in \Omega$ is said to be an atom of $\nu$
if $\nu(\{x\}) > 0$.
DEFINITION 6.21 Continuous and Discrete Measures
Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$.
a)  A measure $\nu$ on $\mathcal{A}$ is said to be continuous if it has no atoms, that
is, $\nu(\{x\}) = 0$ for all $x \in \Omega$.
b)  A measure $\nu$ on $\mathcal{A}$ is said to be discrete if there is a countable
subset $K$ of $\Omega$ such that $\nu(K^c) = 0$.
EXAMPLE 6.17 Illustrates Definitions 6.20 and 6.21
a)  Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$. Lebesgue measure, $\lambda$, is continuous; the measure
$\delta_0 + \delta_1$ is discrete; and the measure $\lambda + \delta_0 + \delta_1$ is neither continuous nor
discrete. For the latter two measures, the set of atoms is $\{0, 1\}$.
b)  Refer to Example 5.5 on page 276. Let $X$ be a random variable on
$(\Omega, \mathcal{A}, P)$ and $\mu_X$ its probability distribution, that is,
$$\mu_X(B) = P(X \in B), \qquad B \in \mathcal{B}.$$
$X$ is discrete if and only if $\mu_X$ is a discrete measure on $\mathcal{B}$; $X$ is continuous
if and only if $\mu_X$ is a continuous measure on $\mathcal{B}$.  □
The following proposition was (essentially) proved in Exercise 4.6 on
page 172, but we state it formally here for completeness. The proof is left
to the reader.
PROPOSITION 6.14
Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$, and let
$\nu$ be a measure on $\mathcal{A}$. Then $\nu$ is discrete if and only if there is a countable
subset $K$ of $\Omega$ such that $\nu = \sum_{x \in K} \nu(\{x\})\,\delta_x$. We can always take $K$ to be
the set of atoms of $\nu$.
The next proposition shows that, under ст-finite conditions, we can
decompose a measure as the sum of a continuous and discrete measure.
6.8 Decomposition of Measures □ 393
PROPOSITION 6.15
Let (Q, A) be a measurable space such that {z} e A for all x € Q, and let
v be a cr-finite measure on A. Then there exist mutually singular mea-
sures, i/c and i/j, on A such that i/c is continuous, is discrete, and
v = vc + i/j- Moreover, such a representation is unique.
PROOF: First assume $\nu$ is finite. We claim $\nu$ has countably many atoms.
Let $F \subset \Omega$ be finite. Then $\sum_{x \in F} \nu(\{x\}) = \nu(F) \le \nu(\Omega)$. Taking the supremum
over all finite subsets of $\Omega$, we deduce that $\sum_{x \in \Omega} \nu(\{x\}) \le \nu(\Omega) < \infty$.
Consequently, by Exercise 2.37 on page 57, only countably many of the
$\nu(\{x\})$s are nonzero; that is, $\nu$ has only countably many atoms.
Let $K$ denote the set of atoms of $\nu$. As we have just seen, $K$ is
countable. Therefore, the measure $\nu_d = \sum_{x \in K} \nu(\{x\})\,\delta_x$ is discrete. Let
$\nu_c = \nu - \nu_d$. To show that $\nu_c$ is a measure on $\mathcal{A}$, it is enough to show that
it is nonnegative, because the other two conditions for being a measure
are clearly satisfied. Noting that $\nu(\{x\}) = \nu_d(\{x\})$ for each $x \in K$, we
conclude that $\nu$ and $\nu_d$ agree on all subsets of $K$. Let $A \in \mathcal{A}$. Then
$\nu(A) = \nu(A \cap K) + \nu(A \cap K^c) = \nu_d(A \cap K) + \nu(A \cap K^c)$
and
$\nu_d(A) = \nu_d(A \cap K) + \nu_d(A \cap K^c) = \nu_d(A \cap K)$.
Consequently,
$\nu_c(A) = \nu(A) - \nu_d(A) = \nu(A \cap K^c) \ge 0$.
Thus, $\nu_c$ is a measure.
Recalling that $K$ denotes the set of atoms of $\nu$, we can apply the
previous equation with $A = \{x\}$ to conclude that $\nu_c$ is continuous. Indeed,
if $x \in K$, then $\nu_c(\{x\}) = \nu(\emptyset) = 0$. On the other hand, if $x \notin K$, then it is
not an atom of $\nu$ and we have $\nu_c(\{x\}) = \nu(\{x\}) = 0$.
Now assume that $\nu$ is $\sigma$-finite. Select a countable collection, $\{E_n\}_n$, of
disjoint $\mathcal{A}$-measurable sets of finite $\nu$-measure whose union is $\Omega$. For each
$n \in \mathcal{N}$, define the measure $\nu_n$ on $\mathcal{A}$ by $\nu_n(A) = \nu(E_n \cap A)$. Then $\nu_n$ is
a finite measure on $\mathcal{A}$. Consequently, by what we just proved for finite
measures, we can write $\nu_n = \nu_{nc} + \nu_{nd}$, where $\nu_{nc}$ is continuous and $\nu_{nd}$ is
discrete; moreover, $\nu_{nd} = \sum_{x \in K_n} \nu_n(\{x\})\,\delta_x = \sum_{x \in K_n} \nu(\{x\})\,\delta_x$, where
$K_n$ denotes the set of atoms of $\nu_n$. Note that $K_n$ consists of those atoms
of $\nu$ lying in $E_n$.
Let $K = \bigcup_n K_n$. As each $K_n$ is countable, so is $K$; moreover, $K$ is
the set of atoms of $\nu$. Let $\nu_c = \sum_n \nu_{nc}$ and
$\nu_d = \sum_n \nu_{nd} = \sum_{x \in K} \nu(\{x\})\,\delta_x$.    (6.43)
394 □ Chapter 6 Differentiation
Because each $\nu_{nc}$ is continuous, so is $\nu_c$; and, because $K$ is countable, $\nu_d$ is
discrete. We have
$\nu = \sum_n \nu_n = \sum_n (\nu_{nc} + \nu_{nd}) = \sum_n \nu_{nc} + \sum_n \nu_{nd} = \nu_c + \nu_d$.
Moreover, because $\nu_c$ is continuous and $\nu_d$ is discrete, it follows easily that
$\nu_c \perp \nu_d$. (See Exercise 6.135.)
It remains to prove the uniqueness of the decomposition. So, assume
we have $\nu = \tau_c + \tau_d$, where $\tau_c$ is continuous and $\tau_d$ is discrete. By
Proposition 6.14 we can write
$\tau_d = \sum_{x \in C} \tau_d(\{x\})\,\delta_x$,    (6.44)
where $C$ is the collection of atoms of $\tau_d$. On the other hand, since $\tau_c$ is
continuous, we have $\tau_d(\{x\}) = \nu(\{x\})$ for all $x$. Therefore, the set of atoms
of $\tau_d$ is identical to that of $\nu$, namely, $K$. Therefore, $C = K$. Since $C = K$
and $\tau_d(\{x\}) = \nu(\{x\})$ for all $x$, we see from (6.43) and (6.44) that $\tau_d = \nu_d$.
Next note that because $\tau_c$ and $\nu_c$ are continuous and $K$ is countable,
$\tau_c(A \cap K) = \nu_c(A \cap K) = 0$ for all $A \in \mathcal{A}$. Consequently, for $A \in \mathcal{A}$,
$\tau_c(A) = \tau_c(A \cap K) + \tau_c(A \cap K^c) = \tau_c(A \cap K^c)$
$= \tau_c(A \cap K^c) + \tau_d(A \cap K^c) = \nu(A \cap K^c)$
$= \nu_c(A \cap K^c) + \nu_d(A \cap K^c) = \nu_c(A \cap K^c)$
$= \nu_c(A \cap K) + \nu_c(A \cap K^c) = \nu_c(A)$.
In other words, $\tau_c = \nu_c$.    ∎
THEOREM 6.14
Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$,
and let $\nu$ be a $\sigma$-finite measure on $\mathcal{A}$. Then there exist measures, $\nu_{ac}$, $\nu_{sc}$,
and $\nu_d$, having the following properties:
a)  $\nu_{ac} \ll \mu$, $\nu_{sc} \perp \mu$, and $\nu_d \perp \mu$
b)  $\nu_{sc}$ is continuous and $\nu_d$ is discrete
c)  $\nu = \nu_{ac} + \nu_{sc} + \nu_d$
Moreover, the representation in (c) is unique.
PROOF: By the Lebesgue decomposition theorem, there exist unique measures,
$\nu_1$ and $\nu_2$, on $\mathcal{A}$ such that $\nu_1 \ll \mu$, $\nu_2 \perp \mu$, and $\nu = \nu_1 + \nu_2$. Set
$\nu_{ac} = \nu_1$.
6.8 Decomposition of Measures □ 395
Because $\nu$ is $\sigma$-finite, so is $\nu_2$. Consequently, by Proposition 6.15, there
exist unique mutually singular measures, $\nu_c$ and $\nu_d$, on $\mathcal{A}$ such that $\nu_c$ is
continuous, $\nu_d$ is discrete, and $\nu_2 = \nu_c + \nu_d$. Setting $\nu_{sc} = \nu_c$, we have
$\nu = \nu_{ac} + \nu_{sc} + \nu_d$. So, (b) and (c) hold. Because $\nu_2 \perp \mu$, it is easy to see
that $\nu_{sc} \perp \mu$ and $\nu_d \perp \mu$. Thus, (a) holds.
It remains to prove uniqueness. So, suppose $\tau_{ac}$, $\tau_{sc}$, and $\tau_d$ are measures
on $\mathcal{A}$ that satisfy (a)-(c). Let $\tau_2 = \tau_{sc} + \tau_d$. Since $\tau_{sc} \perp \mu$ and
$\tau_d \perp \mu$, we have $\tau_2 \perp \mu$. Therefore, by the uniqueness part of the Lebesgue
decomposition theorem, $\tau_{ac} = \nu_{ac}$ and $\tau_2 = \nu_{sc} + \nu_d$. But then, by the
uniqueness part of Proposition 6.15, $\tau_{sc} = \nu_{sc}$ and $\tau_d = \nu_d$.    ∎
Remark: We leave it as an exercise for the reader to show that the measures
$\nu_{ac}$, $\nu_{sc}$, and $\nu_d$ in Theorem 6.14 are mutually singular.
EXAMPLE 6.18 Illustrates Theorem 6.14
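Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \mu)$, where $\mu$ is counting measure on $\mathcal{N}$, that is,
$\mu = \sum_{n=1}^{\infty} \delta_n$. Let $\nu = \delta_0 + \delta_1 + \lambda$. Then $\nu_{ac} = \delta_1$, $\nu_{sc} = \lambda$, and $\nu_d = \delta_0$.
Moreover, we have $d\nu_{ac}/d\mu = d\delta_1/d\mu = \chi_{\{1\}}$ $\mu$-ae. This simple example
shows that the absolutely continuous component of a measure need not be
a continuous measure.    □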
Decomposition of Finite Borel Measures
An important application of Theorem 6.14 is to the decomposition of finite
Borel measures on $\mathcal{R}$ with respect to Lebesgue measure (considered a Borel
measure), that is, where $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$ and $\nu$ is a finite Borel
measure. When there is no possibility of confusion, we will, as before,
write $\lambda$ in place of $\lambda|_{\mathcal{B}}$. First we have the following obvious corollary to
Theorem 6.14.
THEOREM 6.15 Decomposition Theorem for Finite Borel Measures
Let $\nu$ be a finite Borel measure on $\mathcal{R}$. Then $\nu$ can be decomposed uniquely
as the sum of three finite Borel measures, $\nu_{ac}$, $\nu_{sc}$, and $\nu_d$, where $\nu_{ac}$ is
absolutely continuous with respect to $\lambda$, $\nu_{sc}$ is continuous and singular
with respect to $\lambda$, and $\nu_d$ is discrete.
Our next task is to express the conclusions of Theorem 6.15 in terms
of distribution functions. First we state the following proposition relating
the atoms of a finite Borel measure to the discontinuities of its distribution
function. The proof is left as an exercise for the reader.
396 □ Chapter 6 Differentiation
PROPOSITION 6.16
Let $\nu$ be a finite Borel measure on $\mathcal{R}$ and $F_\nu$ its distribution function. Then
$\nu(\{x\}) = F_\nu(x) - F_\nu(x-)$, $x \in \mathcal{R}$.    (6.45)
Consequently, the set of atoms of $\nu$ is precisely the set of discontinuities
of $F_\nu$. In particular, $\nu$ is a continuous measure if and only if $F_\nu$ is a
continuous function.
EXAMPLE 6.19 Illustrates Proposition 6.16
a) Define
$F(x) = \begin{cases} 0, & x < 0; \\ \psi(x), & 0 \le x \le 1; \\ 1, & x > 1, \end{cases}$
where $\psi$ denotes the Cantor function. Let $\nu$ be the unique Borel measure
having $F$ as its distribution function. Because $F$ is continuous on $\mathcal{R}$,
Proposition 6.16 shows that $\nu$ is a continuous measure.
b) Let $\{x_n\}_n$ be a sequence of distinct real numbers and $\{a_n\}_n$ a sequence
of positive real numbers such that $\sum_n a_n < \infty$. Set $\nu = \sum_n a_n \delta_{x_n}$.
Then $\nu$ is a discrete measure and its set of atoms is $\{x_n\}_n$. The distribution
function of $\nu$ is given by $F_\nu(x) = \sum_{x_n \le x} a_n$. It follows from
Proposition 6.16 that $F_\nu$ is continuous except at $x_n$, n = 1, 2, ..., where it
has, respectively, a “jump discontinuity” of magnitude $a_n$, n = 1, 2, ... .
Note that if $\nu$ has only a finite number of atoms (i.e., $\{x_n\}_n$ and $\{a_n\}_n$
are finite sequences), then $F_\nu$ is a step function on $\mathcal{R}$.    □
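As a small numerical sketch of Example 6.19(b) and of (6.45), the following Python fragment builds the distribution function of a discrete measure with finitely many atoms and checks that the jump at each atom equals its mass. The helper name `distribution_function`, the sample atoms, and the small offset `eps` used to stand in for the left limit F(x-) are illustrative assumptions and are not part of the text.

```python
from fractions import Fraction

def distribution_function(atoms):
    """Return F(x) = sum of a_n over atoms x_n <= x, for a dict {x_n: a_n}."""
    def F(x):
        return sum((a for xn, a in atoms.items() if xn <= x), Fraction(0))
    return F

# Hypothetical sample measure: nu = (1/2) delta_0 + (1/4) delta_1 + (1/8) delta_{5/2}.
atoms = {
    Fraction(0): Fraction(1, 2),
    Fraction(1): Fraction(1, 4),
    Fraction(5, 2): Fraction(1, 8),
}
F = distribution_function(atoms)

# eps is smaller than the gap between distinct atoms, so F(x - eps) equals F(x-).
eps = Fraction(1, 10**6)

# Jump of F at each atom, to be compared with nu({x_n}) = a_n as in (6.45).
jumps = {xn: F(xn) - F(xn - eps) for xn in atoms}
```

Exact rational arithmetic is used so that the comparison between jumps and atom masses involves no rounding; with finitely many atoms, F is a step function, as noted at the end of the example.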
In view of Example 6.19(b), it is natural and reasonable to make the
following definition.
DEFINITION 6.22 Discrete Distribution Function
A distribution function, $F$, is said to be discrete if it can be expressed
in the form $F(x) = \sum_{x_n \le x} a_n$, where $\{x_n\}_n$ is a sequence of real
numbers and $\{a_n\}_n$ a sequence of positive real numbers with $\sum_n a_n < \infty$.
The following two propositions will also be required for our decom-
position of distribution functions. The proof of the first one is left to the
reader as an exercise.
6.8 Decomposition of Measures □ 397
PROPOSITION 6.17
A distribution function, F, is discrete if and only if the Lebesgue-Stieltjes
measure corresponding to F is a discrete measure.
PROPOSITION 6.18
Let $\nu$ be a finite Borel measure on $\mathcal{R}$ and $F_\nu$ its distribution function. Then
$\nu$ is singular with respect to Lebesgue measure if and only if $F_\nu' = 0$ $\lambda$-ae.
PROOF: For convenience, set $F = F_\nu$. Suppose that $\nu \perp \lambda$. Select $A \in \mathcal{B}$
such that $\nu(A) = 0$ and $\lambda(A^c) = 0$. Let $E = \{ x : F'(x) > 0 \}$. We have
$\int_E F'\,d\lambda = \int_{E \cap A} F'\,d\lambda + \int_{E \cap A^c} F'\,d\lambda = \int_{E \cap A} F'\,d\lambda \le \int_A F'\,d\lambda$.
We will show that the last integral in the previous display is zero. This will
imply that $\int_E F'\,d\lambda = 0$ which, in turn, implies that $\lambda(E) = 0$, as required.
Let $\mathcal{C}$ be the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ and $(c, \infty)$,
where $-\infty \le a \le b < \infty$ and $-\infty \le c < \infty$. Then, by Exercise 4.96, $\mathcal{C}$ is a
semialgebra and $\sigma(\mathcal{C}) = \mathcal{B}$.
Because $F$ is nondecreasing, we have by (6.27) on page 347 that
$\int_a^b F'\,d\lambda \le F(b) - F(a) = \nu((a, b])$,  $-\infty < a < b < \infty$.
From this it follows easily that
$\int_C F'\,d\lambda \le \nu(C)$,  $C \in \mathcal{C}$.    (6.46)
Let $\epsilon > 0$. By Lemma 4.1 (page 214) and Lemma 4.3 (page 216), we
can select a sequence, $\{C_n\}_n$, of pairwise disjoint members of $\mathcal{C}$ such that
$\bigcup_n C_n \supset A$ and $\nu(\bigcup_n C_n) < \nu(A) + \epsilon$. Because $\nu(A) = 0$ and the $C_n$s are
pairwise disjoint, we thus have $\sum_n \nu(C_n) < \epsilon$.
From Proposition 4.18 on page 223, we have $\nu((a, b]) = F(b) - F(a)$, for
$-\infty \le a < b \le \infty$, where we are using the convention that $(a, \infty] = (a, \infty)$.
Writing $C_n = (a_n, b_n]$ and referring to (6.46), we conclude that
$\int_A F'\,d\lambda \le \int_{\bigcup_n C_n} F'\,d\lambda = \sum_n \int_{C_n} F'\,d\lambda \le \sum_n \nu(C_n) < \epsilon$.
As $\epsilon > 0$ was chosen arbitrarily, it follows that $\int_A F'\,d\lambda = 0$.
398 □ Chapter 6 Differentiation
Conversely, suppose that $F' = 0$ $\lambda$-ae. By the Lebesgue decomposition
theorem, we can write $\nu = \nu_1 + \nu_2$, where $\nu_1 \ll \lambda$ and $\nu_2 \perp \lambda$. Because $\nu$ is
finite, so are $\nu_1$ and $\nu_2$. We must show that $\nu_1 = 0$.
By the Radon-Nikodym theorem, there is a nonnegative Lebesgue measurable
function, $f$, such that
$\nu_1(B) = \int_B f\,d\lambda$,  $B \in \mathcal{B}$,    (6.47)
and, because $\nu_1$ is finite, $f \in \mathcal{L}^1(\mathcal{R})$. By Corollary 6.2 on page 341, we have
$f = F_{\nu_1}'$ $\lambda$-ae. Also, by the necessity part of this proposition (proved in the
foregoing), we know that $F_{\nu_2}' = 0$ $\lambda$-ae. Noting that $F_\nu = F_{\nu_1} + F_{\nu_2}$, we
conclude that $0 = F_\nu' = F_{\nu_1}' + F_{\nu_2}' = F_{\nu_1}'$ $\lambda$-ae. Therefore, $f = F_{\nu_1}' = 0$ $\lambda$-ae,
which, by (6.47), implies that $\nu_1 = 0$.    ∎
THEOREM 6.16 Decomposition Theorem for Distribution Functions
Let $F$ be a distribution function. Then $F$ can be expressed uniquely in the
form
$F = F_{ac} + F_{sc} + F_d$,    (6.48)
where $F_{ac}$ is absolutely continuous, $F_{sc}$ is continuous and $F_{sc}' = 0$ $\lambda$-ae, and
$F_d$ is discrete. Moreover,
$F_{ac}' = F'$ $\lambda$-ae.    (6.49)
PROOF: Let $\nu$ be the Lebesgue-Stieltjes measure corresponding to $F$. Applying
Theorem 6.15, we can write $\nu = \nu_{ac} + \nu_{sc} + \nu_d$, where $\nu_{ac} \ll \lambda$,
$\nu_{sc} \perp \lambda$ and is continuous, and $\nu_d$ is discrete. Let $F_{ac}$, $F_{sc}$, and $F_d$ denote,
respectively, the distribution functions of $\nu_{ac}$, $\nu_{sc}$, and $\nu_d$. Then we have
$F = F_{ac} + F_{sc} + F_d$.
Since $\nu_{ac} \ll \lambda$, Proposition 6.9 on page 371 implies that $F_{ac}$ is absolutely
continuous; since $\nu_{sc} \perp \lambda$ and $\nu_d \perp \lambda$, Proposition 6.18 implies that
$F_{sc}' = F_d' = 0$ $\lambda$-ae. Also, because $\nu_{sc}$ is continuous, Proposition 6.16 on
page 396 implies that $F_{sc}$ is continuous; and, because $\nu_d$ is discrete,
Proposition 6.17 implies that $F_d$ is discrete. Because $F = F_{ac} + F_{sc} + F_d$ and
$F_{sc}' = F_d' = 0$ $\lambda$-ae, it follows immediately that $F_{ac}' = F'$ $\lambda$-ae.
It remains to establish uniqueness. So suppose $F = F_1 + F_2 + F_3$,
where $F_1$ is absolutely continuous, $F_2$ is continuous and $F_2' = 0$ $\lambda$-ae, and
$F_3$ is discrete. For $1 \le j \le 3$, let $\tau_j$ denote the Lebesgue-Stieltjes measure
corresponding to $F_j$.
6.8 Decomposition of Measures □ 399
Let $\mathcal{C}$ be the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ and $(c, \infty)$,
where $-\infty \le a \le b < \infty$ and $-\infty \le c < \infty$. Then, by Exercise 4.96 on
page 218, $\mathcal{C}$ is a semialgebra and $\sigma(\mathcal{C}) = \mathcal{B}$. As $F = F_1 + F_2 + F_3$, the
measure $\tau_1 + \tau_2 + \tau_3$ agrees with $\nu$ on $\mathcal{C}$ and, therefore, on $\mathcal{B}$; in other words,
$\nu = \tau_1 + \tau_2 + \tau_3$.
Since $F_1$ is absolutely continuous, Proposition 6.9 on page 371 implies
that $\tau_1 \ll \lambda$. Since $F_2$ is continuous, Proposition 6.16 (page 396) implies
that $\tau_2$ is a continuous measure and, since $F_2' = 0$ $\lambda$-ae, Proposition 6.18
(page 397) implies that $\tau_2 \perp \lambda$. And, because $F_3$ is a discrete distribution
function, Proposition 6.17 (page 397) implies that $\tau_3$ is a discrete measure.
It now follows from the uniqueness portion of Theorem 6.15 that we
have $\tau_1 = \nu_{ac}$, $\tau_2 = \nu_{sc}$, and $\tau_3 = \nu_d$. This implies that the corresponding
distribution functions are equal: $F_1 = F_{ac}$, $F_2 = F_{sc}$, and $F_3 = F_d$.    ∎
Theorem 6.16 provides a concrete method for determining the decom-
position of a distribution function, F, and, hence, of its corresponding
Lebesgue-Stieltjes measure, y. Specifically:
Step 1. Determine the derivative, $F'$, of $F$.
Step 2. $F_{ac}$ is the indefinite integral of $F'$, that is,
$F_{ac}(x) = \int_{-\infty}^{x} F'(t)\,dt$.
We have $\nu_{ac}(B) = \int_B F'\,d\lambda$.
Step 3. $F_d$ is obtained from the discontinuities of $F$, that is,
$F_d(x) = \sum_{x_n \le x} a_n$,
where $\{x_n\}_n$ denotes the set of discontinuities of $F$ and $\{a_n\}_n$ denotes the
corresponding magnitudes of the jumps, that is, $a_n = F(x_n) - F(x_n-)$.
(See Exercise 6.140.) We have $\nu_d = \sum_n a_n \delta_{x_n}$.
Step 4. $F_{sc} = F - F_{ac} - F_d$. We have $\nu_{sc} = \nu - \nu_{ac} - \nu_d$.
EXAMPLE 6.20 Illustrates Decomposition
Let $F\colon \mathcal{R} \to \mathcal{R}$ be defined by
$F(x) = \begin{cases} 0, & x < 0; \\ 3 - e^{-x}, & 0 \le x < 1; \\ 4 - e^{-x}, & x \ge 1. \end{cases}$
400 □ Chapter 6 Differentiation
Note that $F$ is a distribution function; that is, $F$ is nondecreasing, right
continuous, bounded, and $F(x) \to 0$ as $x \to -\infty$. We will apply the preceding
Steps 1-4 to decompose $F$ and its corresponding Lebesgue-Stieltjes
measure.
Step 1. We have
$F'(x) = \begin{cases} 0, & x < 0; \\ e^{-x}, & x > 0,\ x \ne 1. \end{cases}$
Step 2. We have
$F_{ac}(x) = \begin{cases} 0, & x < 0; \\ 1 - e^{-x}, & x \ge 0. \end{cases}$
Also, $\nu_{ac}(B) = \int_B F'\,d\lambda$.
Step 3. $F$ is discontinuous at $x = 0$ and $x = 1$. The magnitudes of the
jumps at those two points are, respectively, $F(0) - F(0-) = 2 - 0 = 2$ and
$F(1) - F(1-) = (4 - e^{-1}) - (3 - e^{-1}) = 1$. Therefore,
$F_d(x) = \begin{cases} 0, & x < 0; \\ 2, & 0 \le x < 1; \\ 3, & x \ge 1. \end{cases}$
Also, $\nu_d = 2\delta_0 + \delta_1$.
Step 4. From Steps 2 and 3 we see that $F_{ac} + F_d = F$. Consequently,
$F_{sc} = F - F_{ac} - F_d = 0$. We have $\nu_{sc} = \nu - \nu_{ac} - \nu_d = 0$.    □
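The four steps applied in Example 6.20 can be checked numerically. The following Python sketch encodes $F$, the absolutely continuous part $1 - e^{-x}$ (for $x \ge 0$), and the two jumps of sizes 2 and 1 at the points 0 and 1, then evaluates $F - F_{ac} - F_d$ at a few sample points. The function names, the sample points, and the tolerance are illustrative assumptions; the check is a floating-point sanity test, not a proof.

```python
import math

def F(x):
    """The distribution function of Example 6.20."""
    if x < 0:
        return 0.0
    if x < 1:
        return 3.0 - math.exp(-x)
    return 4.0 - math.exp(-x)

def F_ac(x):
    """Step 2: integral of F'(t) = exp(-t) over (0, x], and 0 for x < 0."""
    return 0.0 if x < 0 else 1.0 - math.exp(-x)

# Step 3: jump locations x_n and magnitudes a_n = F(x_n) - F(x_n-).
JUMPS = {0.0: 2.0, 1.0: 1.0}

def F_d(x):
    """Step 3: sum of the jump magnitudes at discontinuities x_n <= x."""
    return sum(a for xn, a in JUMPS.items() if xn <= x)

def F_sc(x):
    """Step 4: what remains after removing the other two parts."""
    return F(x) - F_ac(x) - F_d(x)

# Arbitrary sample points on both sides of each discontinuity.
sample_points = [-2.0, -0.5, 0.0, 0.3, 0.999, 1.0, 1.7, 10.0]
residuals = [F_sc(x) for x in sample_points]
```

Every entry of `residuals` should vanish up to rounding error, in agreement with the conclusion of Step 4 that the singular continuous part is zero.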
EXERCISES 6.8
6.130
Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{B}, \lambda)$. For $E \in \mathcal{M}$, define $\tau(E) = \lambda(E \cap [0, 1])$. Also
define
$F(x) = \begin{cases} 0, & x < 0; \\ \psi(x), & 0 \le x \le 1; \\ 1, & x > 1, \end{cases}$
where $\psi$ denotes the Cantor function, and let $\omega$ be the unique Borel measure
having $F$ as its distribution function. Finally, set $\nu = \tau + \omega + \delta_0 + \delta_1$.
Determine the Lebesgue decomposition of $\nu$ with respect to $\lambda$.
6.131	Let (П,Л, p) = (7£,Л4,/1), where p is counting measure on X, that is,
д =	6n. Set и =	4-Л. Determine the Lebesgue decomposition
of и with respect to p.
6.132	Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\nu_1$ and $\nu_2$ measures on $\mathcal{A}$. If $\nu_1 \ll \mu$
and $\nu_2 \perp \mu$, show that $\nu_1 \perp \nu_2$. Hence the two measures in the Lebesgue
decomposition of a measure are mutually singular.
6.8 Decomposition of Measures □ 401
In Exercises 6.133-6.135, $(\Omega, \mathcal{A})$ denotes a measurable space such that $\{x\} \in \mathcal{A}$
for all $x \in \Omega$.
6.133	What can you say about a measure, $\nu$, that is both discrete and continuous?
6.134	Find a measure, $\nu$, on a measurable space $(\Omega, \mathcal{A})$ such that every element
of $\Omega$ is an atom, but $\nu$ is not discrete.
6.135	Let $\mu$ and $\nu$ be measures on $(\Omega, \mathcal{A})$. Prove the following.
a)	If $\mu$ is discrete and $\nu$ is continuous, then $\mu \perp \nu$.
b)	If $\nu \ll \mu$ and $\mu$ is continuous, then so is $\nu$.
c)	If $\nu \ll \mu$ and $\mu$ is discrete, then so is $\nu$.
6.136	Show that the measures $\nu_{ac}$, $\nu_{sc}$, and $\nu_d$ in Theorem 6.14 are mutually
singular.
6.137	Let X be a random variable on $(\Omega, \mathcal{A}, P)$ and let $\mu_X$ denote its probability
distribution. Prove that $\mu_X$ can be expressed as a convex combination of
probability measures, $\mu_1$, $\mu_2$, and $\mu_3$, on $\mathcal{B}$, where $\mu_1 \ll \lambda$, $\mu_2 \perp \lambda$ and is
continuous, and $\mu_3$ is discrete. Convex combination means we can write
$\mu_X = \sum_{j=1}^{3} \alpha_j \mu_j$, where $\alpha_j \ge 0$, $1 \le j \le 3$, and $\sum_{j=1}^{3} \alpha_j = 1$.
6.138	Let $\nu$ be a finite Borel measure on $\mathcal{R}$ and $F_\nu$ its distribution function.
a)	Show that for each $x \in \mathcal{R}$, $\nu(\{x\}) = F_\nu(x) - F_\nu(x-)$.
b)	Prove that $x$ is an atom of $\nu$ if and only if $F_\nu$ is discontinuous at $x$.
6.139	Prove Proposition 6.17 on page 397.
6.140	Suppose that $F$ is a discrete distribution function, say, $F(x) = \sum_{x_n \le x} a_n$.
a) Show that $\{x_n\}_n$ constitutes the set of discontinuities of $F$.
b)	Prove that for each $n \in \mathcal{N}$, $F(x_n) - F(x_n-) = a_n$. This shows that
the magnitude of the discontinuity of $F$ at the point $x_n$ is $a_n$.
c)	Show that $F$ is constant on any interval not containing a discontinuity
point of $F$.
6.141	Let $\{r_n\}_{n=1}^{\infty}$ be an enumeration of the rational numbers and $\{a_n\}_{n=1}^{\infty}$ a
sequence of positive real numbers such that $\sum_{n=1}^{\infty} a_n < \infty$. Define the
function $F$ on $\mathcal{R}$ by $F(x) = \sum_{r_n \le x} a_n$.
a)	Prove that $F$ is continuous at every irrational number and discontinuous
at every rational number.
b)	Show that $F$ is strictly increasing.
c)	Prove that $F' = 0$ $\lambda$-ae.
6.142	Let $\nu$ be a finite Borel measure on $\mathcal{R}$ and $F_\nu$ its distribution function.
Let $\nu = \nu_1 + \nu_2$ be the Lebesgue decomposition of $\nu$ with respect to $\lambda$
(considered a Borel measure). Prove that $d\nu_1/d\lambda = F_\nu'$ $\lambda$-ae.
6.143	Let $\psi$ denote the Cantor function. Define
$F(x) = \begin{cases} 0, & x < 0; \\ 2 + x + \psi(x), & 0 \le x < 1; \\ 4 + x, & 1 \le x < 2; \\ 9, & x \ge 2. \end{cases}$
402 □ Chapter 6 Differentiation
a)	Decompose F into its absolutely continuous, singular continuous, and
discrete parts.
b)	Use part (a) to determine the decomposition of the Lebesgue-Stieltjes
measure, $\nu$, corresponding to $F$ into its absolutely continuous, singular
continuous, and discrete parts.
6.144	Refer to Exercise 5.30 on page 285. Let X be uniformly distributed
on $[a, b]$, and let $a < c < b$.
a)	Define $Y = \min\{c, X\}$. Determine the decomposition of $\mu_Y$ into its
absolutely continuous, singular continuous, and discrete parts.
b)	Define $Z = \max\{c, X\}$. Determine the decomposition of $\mu_Z$ into its
absolutely continuous, singular continuous, and discrete parts.
c)	Let X be an absolutely continuous random variable with probability
density function, $f_X$. Let $M$ be a positive real number, and define
$Y = \begin{cases} -M, & X < -M; \\ X, & |X| \le M; \\ M, & X > M. \end{cases}$
Determine the decomposition of $\mu_Y$ into its absolutely continuous, singular
continuous, and discrete parts.
6.145	Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}^2, \mathcal{B}_2, \lambda_2)$, and let $D = \{ (x, y) \in \mathcal{R}^2 : x^2 + y^2 < 1 \}$.
Define $\mu$, $\omega$, and $\tau$ on $\mathcal{B}_2$ as follows: $\mu(B) = \lambda(\{ x \in \mathcal{R} : (x, x) \in B \})$;
$\omega(B) = 1$ if $(0, 0) \in B$, and zero otherwise; and $\tau(B) = \lambda_2(B \cap D)$. Let
$\nu = \mu + \omega + \tau$. Determine the decomposition of $\nu$ into its absolutely
continuous, singular continuous, and discrete parts with respect to
two-dimensional Lebesgue measure.
6.9	MEASURABLE TRANSFORMATIONS AND THE
GENERAL CHANGE-OF-VARIABLE FORMULA
Recall the following change-of-variable formula from elementary calculus,
often referred to as integration by substitution.
Change-of-Variable Formula for Riemann Integration: Suppose that $g$ is
a continuously differentiable monotone function on $[a, b]$ with range $[c, d]$
and that $f$ is continuous on $[c, d]$. Then
$\int_a^b f(g(x))\,|g'(x)|\,dx = \int_c^d f(y)\,dy$.
In this section, we will generalize the change-of-variable formula by
applying the theory of measurable transformations. We begin with the
following definition.
6.9 Measurable Transformations □ 403
DEFINITION 6.23 Measurable Transformation
Let $(\Omega, \mathcal{A})$ and $(\Lambda, \mathcal{S})$ be measurable spaces. A mapping $T\colon \Omega \to \Lambda$ is
called a measurable transformation if $T^{-1}(S) \in \mathcal{A}$ for each $S \in \mathcal{S}$.
EXAMPLE 6.21 Illustrates Definition 6.23
a)	The measurable transformations from $(\mathcal{R}, \mathcal{B})$ to $(\mathcal{R}, \mathcal{B})$ are precisely the
(real-valued) Borel measurable functions.
b)	The measurable transformations from $(\mathcal{R}, \mathcal{M})$ to $(\mathcal{R}, \mathcal{B})$ are precisely
the real-valued Lebesgue measurable functions.
c)	More generally than in parts (a) and (b), let $(\Omega, \mathcal{A})$ be any measurable
space and $(\Lambda, \mathcal{S}) = (\mathcal{R}, \mathcal{B})$. Then the measurable transformations
from $(\Omega, \mathcal{A})$ to $(\mathcal{R}, \mathcal{B})$ coincide with the real-valued $\mathcal{A}$-measurable
functions.    □
Our next result shows that the composition of a measurable function
with a measurable transformation is a measurable function.
PROPOSITION 6.19
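Suppose that $T$ is a measurable transformation from $(\Omega, \mathcal{A})$ to $(\Lambda, \mathcal{S})$ and
that $f$ is an $\mathcal{S}$-measurable function on $\Lambda$. Then $f \circ T$ is an $\mathcal{A}$-measurable
function on $\Omega$.
PROOF: Let $O$ be open in $\mathcal{R}$ (in $\mathbb{C}$ if $f$ is complex-valued, in $\mathcal{R}^*$ if $f$ is
extended real-valued). Because $f$ is $\mathcal{S}$-measurable, $f^{-1}(O) \in \mathcal{S}$, and because
$T$ is a measurable transformation from $(\Omega, \mathcal{A})$ to $(\Lambda, \mathcal{S})$, $T^{-1}(f^{-1}(O)) \in \mathcal{A}$.
Thus, $(f \circ T)^{-1}(O) = T^{-1}(f^{-1}(O)) \in \mathcal{A}$ for each open set $O$. This shows
that $f \circ T$ is $\mathcal{A}$-measurable.    ∎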
From a measurable transformation and a measure on its domain space,
we get in a natural way a measure on the range space. This is the content
of the following proposition whose proof is left to the reader as an exercise.
PROPOSITION 6.20
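Let $T$ be a measurable transformation from $(\Omega, \mathcal{A})$ to $(\Lambda, \mathcal{S})$ and $\mu$ a measure
on $\mathcal{A}$. Define
$\mu \circ T^{-1}(S) = \mu(T^{-1}(S))$,  $S \in \mathcal{S}$.
Then $\mu \circ T^{-1}$ is a measure on $\mathcal{S}$, called the measure induced by $\mu$ and $T$.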
404 □ Chapter 6 Differentiation
EXAMPLE 6.22 Illustrates Proposition 6.20
Let $(\Omega, \mathcal{A}, P)$ be a probability space and X a random variable thereon.
According to Definition 5.6, the set function
$\mu_X(B) = P(X \in B)$,  $B \in \mathcal{B}$,
is the probability distribution of X. But note also that X is a measurable
transformation from $(\Omega, \mathcal{A})$ to $(\mathcal{R}, \mathcal{B})$ and that the measure induced by P
and X is
$P \circ X^{-1}(B) = P(X^{-1}(B)) = P(X \in B) = \mu_X(B)$,  $B \in \mathcal{B}$.
In other words, the measure induced by P and X is the probability
distribution of X.    □
The General Change of Variable Formula
With Propositions 6.19 and 6.20 in mind, we now prove the general change-
of-variable formula.
THEOREM 6.17 General Change of Variable Formula
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, $(\Lambda, \mathcal{S})$ a measurable space, and $T$ a
measurable transformation from $(\Omega, \mathcal{A})$ to $(\Lambda, \mathcal{S})$. Then, for any $\mathcal{S}$-measurable
function $f$ on $\Lambda$,
$\int_\Omega f \circ T(x)\,d\mu(x) = \int_\Lambda f(y)\,d\mu \circ T^{-1}(y)$,    (6.50)
in the sense that if one of the integrals in (6.50) exists, then so does the
other, and they are equal.
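PROOF: Suppose first that $f$ is the characteristic function of a set $S \in \mathcal{S}$.
Then,
$\int_\Omega f \circ T(x)\,d\mu(x) = \int_\Omega \chi_S(T(x))\,d\mu(x) = \int_\Omega \chi_{T^{-1}(S)}(x)\,d\mu(x)$
$= \mu(T^{-1}(S)) = \mu \circ T^{-1}(S) = \int_\Lambda \chi_S(y)\,d\mu \circ T^{-1}(y)$
$= \int_\Lambda f(y)\,d\mu \circ T^{-1}(y)$.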
6.9 Measurable Transformations □ 405
Hence (6.50) holds if $f$ is a characteristic function. It now follows easily
that (6.50) holds if $f$ is a nonnegative $\mathcal{S}$-measurable simple function.
If $f$ is a nonnegative extended real-valued $\mathcal{S}$-measurable function, select
a sequence $\{s_n\}_{n=1}^{\infty}$ of nonnegative $\mathcal{S}$-measurable simple functions such
that $s_n \uparrow f$ on $\Lambda$. Then $s_n \circ T \uparrow f \circ T$ on $\Omega$. Applying the monotone
convergence theorem twice, we get that
$\int_\Omega f \circ T(x)\,d\mu(x) = \lim_{n \to \infty} \int_\Omega s_n(T(x))\,d\mu(x)$
$= \lim_{n \to \infty} \int_\Lambda s_n(y)\,d\mu \circ T^{-1}(y) = \int_\Lambda f(y)\,d\mu \circ T^{-1}(y)$.
If $f$ is a complex-valued or extended real-valued $\mathcal{S}$-measurable function,
we proceed in the usual manner. That is, we decompose $f$ into a
linear combination of nonnegative $\mathcal{S}$-measurable functions and apply the
result of the previous paragraph to each component.    ∎
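On a finite set, where every subset is measurable and integrals reduce to weighted sums, formula (6.50) can be verified directly. The following Python sketch is an illustration under those assumptions: the space `mu`, the transformation `T`, and the function `f` are hypothetical choices made only for this check and do not come from the text.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite measure space: Omega = {0, ..., 5}, with mu({x}) = (x + 1)/21.
mu = {x: Fraction(x + 1, 21) for x in range(6)}

def T(x):
    """A transformation from Omega to Lambda = {0, 1, 2}."""
    return x % 3

def f(y):
    """A real-valued function on Lambda."""
    return y * y + 1

# Induced measure mu o T^{-1}: the mass of {y} is mu(T^{-1}({y})).
induced = defaultdict(Fraction)
for x, mass in mu.items():
    induced[T(x)] += mass

# Left side of (6.50): integral of f o T with respect to mu.
lhs = sum(f(T(x)) * mass for x, mass in mu.items())

# Right side of (6.50): integral of f with respect to mu o T^{-1}.
rhs = sum(f(y) * mass for y, mass in induced.items())
```

The two sums agree exactly because the right side merely regroups the terms of the left side according to the value of `T(x)`, which is the finite analogue of the characteristic-function step in the proof.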
As an immediate consequence of Theorem 6.17, we have the following
corollaries. Their proofs are left to the reader as exercises.
COROLLARY 6.3
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, $(\Lambda, \mathcal{S})$ a measurable space, $T$ a measurable
transformation from $(\Omega, \mathcal{A})$ to $(\Lambda, \mathcal{S})$, and $f$ an $\mathcal{S}$-measurable function on $\Lambda$.
Then, for each $S \in \mathcal{S}$,
$\int_{T^{-1}(S)} f \circ T(x)\,d\mu(x) = \int_S f(y)\,d\mu \circ T^{-1}(y)$,
in the sense that if one of the integrals exists, then so does the other, and
they are equal.
COROLLARY 6.4
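Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $(\Lambda, \mathcal{S}, \nu)$ a $\sigma$-finite measure space.
Suppose that $T$ is a measurable transformation from $(\Omega, \mathcal{A})$ to $(\Lambda, \mathcal{S})$ such
that $\mu \circ T^{-1} \ll \nu$ and $\mu \circ T^{-1}$ is $\sigma$-finite. Then, for any $\mathcal{S}$-measurable
function $f$ on $\Lambda$,
$\int_\Omega f \circ T(x)\,d\mu(x) = \int_\Lambda f(y)\,\frac{d(\mu \circ T^{-1})}{d\nu}(y)\,d\nu(y)$,
in the sense that if one of the integrals exists, then so does the other, and
they are equal.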
406 □ Chapter 6 Differentiation
EXERCISES 6.9
6.146	True or False: Every real-valued Lebesgue measurable function is a
measurable transformation from $(\mathcal{R}, \mathcal{M})$ to $(\mathcal{R}, \mathcal{M})$.
6.147	Prove Proposition 6.20 on page 403.
6.148	Let $(\Omega, \mathcal{A}, P)$ be a probability space and $X_1, \ldots, X_n$ random variables
thereon. Define $X\colon \Omega \to \mathcal{R}^n$ by $X(\omega) = (X_1(\omega), \ldots, X_n(\omega))$.
a) Prove that X is a measurable transformation from $(\Omega, \mathcal{A})$ to $(\mathcal{R}^n, \mathcal{B}_n)$.
b) Identify the measure induced by P and X.
6.149	Prove Corollary 6.3.
6.150	Prove Corollary 6.4.
6.151	Suppose that g is an absolutely continuous and monotone function on [a, b]
with range [c, d] and that f is Borel measurable and Lebesgue integrable
on [c, d]. Use the general change-of-variable formula (Theorem 6.17 on
page 404) to prove that
$\int_a^b f(g(x))\,|g'(x)|\,dx = \int_c^d f(y)\,dy$,
where both integrals are in the Lebesgue sense. Hint: Assume first that
g is nondecreasing. Let $\mu(B) = \int_B g'\,d\lambda$ for $B \in \mathcal{B}_{[a,b]}$ and show that, as
measures on $\mathcal{B}_{[c,d]}$, $\mu \circ g^{-1} = \lambda$.
6.152	Use Exercise 6.151 to establish the change-of-variable formula for Riemann
integration given on page 402.
6.153	Let X be an absolutely continuous random variable on $(\Omega, \mathcal{A}, P)$ with
probability density function $f_X$, and let $\phi$ be a strictly monotone function
on $\mathcal{R}$ whose inverse is absolutely continuous on $\mathcal{R}$. Prove that $Y = \phi \circ X$
is an absolutely continuous random variable on $(\Omega, \mathcal{A}, P)$ with probability
density function given by
$f_Y(y) = f_X(\phi^{-1}(y)) \left| \frac{d}{dy} \phi^{-1}(y) \right|$.
6.154	Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space, and let $\phi$ be a nonnegative
real-valued $\mathcal{A}$-measurable function on $\Omega$. For $x > 0$, set $G(x) = \mu(\phi^{-1}((x, \infty)))$.
Prove that
$\int_\Omega \phi\,d\mu = \int_0^\infty G(x)\,dx$.
6.155	Let $(\Omega, \mathcal{A}, P)$ be a probability space, and let X be a nonnegative random
variable thereon. Use Exercise 6.154 to prove that
$E(X) = \int_0^\infty (1 - F_X(x))\,dx$.
6.9 Measurable Transformations □ 407
6.156	Suppose $g$ is a real-valued Lebesgue measurable function such that if $B \in \mathcal{B}$
with $\lambda(B) = 0$, then $\lambda(g^{-1}(B)) = 0$; that is, the inverse image under $g$ of
any Borel set of Lebesgue measure zero has Lebesgue measure zero.
a)	Prove that if $E \in \mathcal{M}$ with $\lambda(E) = 0$, then $g^{-1}(E)$ has Lebesgue (outer)
measure zero, and hence is measurable; that is, the inverse image under
$g$ of any Lebesgue measurable set of Lebesgue measure zero is
Lebesgue measurable and has Lebesgue measure zero.
b)	Prove that $g^{-1}(E) \in \mathcal{M}$ for each $E \in \mathcal{M}$, so that $g$ is a measurable
transformation from $(\mathcal{R}, \mathcal{M})$ to $(\mathcal{R}, \mathcal{M})$.
6.157	Suppose that $g$ is a real-valued Borel measurable function such that, as
Borel measures, $\lambda \circ g^{-1} \ll \lambda$. Prove that $g$ is a measurable transformation
from $(\mathcal{R}, \mathcal{M})$ to $(\mathcal{R}, \mathcal{M})$. Hint: Exercise 6.156.
6.158	In each of the following parts, we have specified a real-valued Borel
measurable function, $g$. In each case, show that, as Borel measures, $\lambda \circ g^{-1} \ll \lambda$
and find, explicitly, $d(\lambda \circ g^{-1})/d\lambda$.
a)	$g(x) = x^2$
b)	$g(x) = x^3$
c)	$g(x) = e^x$
6.159	Let $\psi$ denote the Cantor function. Show that, as Borel measures on $[0, 1]$,
we have $\lambda \circ \psi^{-1} \perp \lambda$.
6.160	In Exercise 6.63 on page 353, we proved the following generalization of the
change-of-variable formula for Riemann integration: Suppose that $g$ is an
absolutely continuous and monotone function on $[a, b]$ with range $[c, d]$ and
that $f \in \mathcal{L}^1([c, d])$. Then $(f \circ g)g' \in \mathcal{L}^1([a, b])$ and
$\int_a^b f(g(x))\,|g'(x)|\,dx = \int_c^d f(y)\,dy$,
where both integrals are in the Lebesgue sense. Explain why this result
does not follow directly from the general change-of-variable formula.
6.161	Use the general change-of-variable formula to provide a proof of Theo-
rem 5.6 on page 291.

PART THREE
□
Topological, Metric, and
Normed Spaces
Pavel Samuilovich Urysohn
(1898-1924)
Pavel S. Urysohn was born in Odessa, Russia,
on February 3, 1898, the son of a financier.
In 1915, Urysohn enrolled at the University of
Moscow to study physics. However, influenced
by Egorov and Luzin, he soon began to
concentrate on mathematics.
Urysohn graduated in 1919, but remained
at the university to continue his studies. He
focused his early work on integral equations and other analysis
problems. In 1921, Urysohn was appointed assistant professor at the
University of Moscow, and Egorov supplied him with two problems that turned
Urysohn's attention to topology.
In 1922, Urysohn published papers on topology in the journal of the
Academie des Sciences and in Soviet and Polish journals. These papers
laid the foundations of the Soviet school of topology. His most famous
result is the ingenious lemma that bears his name.
The results of his work in abstract topology included a theorem on the
existence of a topological mapping of any normed space with a countable
base into Hilbert space. Urysohn presented, in his memoirs on Cantorian
varieties (published posthumously in 1925-26), an inductive definition of
dimensionality that became classical. The theory of dimensionality is also
known as the Urysohn-Menger theory.
Tragically, Urysohn's contributions were cut short when he drowned at
the age of 26 off the coast near Batz, France, on August 17, 1924.
410
Elements of Topological,
Metric, and Normed Spaces
In this chapter, we will introduce topological, metric, and normed spaces
and study some of their basic properties. Like most good ideas in mathe-
matics, the concept of a topological space can be approached from several
points of view. Since our perspective is from analysis, we will emphasize
the connections between topological spaces and the concepts of limit and
continuity.
7.1 INTRODUCTION TO TOPOLOGICAL SPACES
In this section, we will show how extending the limit concept can lead
naturally to the notion of a topological space. Suppose that we have a
function $f\colon \Omega \to \Lambda$. What kinds of structures on $\Omega$ and $\Lambda$ are needed in
order to make sense of the formula
$\lim_{x \to a} f(x) = b$?    (7.1)
In case $\Omega = \Lambda = \mathcal{R}$, (7.1) can be described verbally as follows: “$f(x)$ will
be near $b$ whenever $x$ is sufficiently near (but not equal to) $a$.” Of course,
411
412 □ Chapter 7 Elements of Topological, Metric, and Normed Spaces
“$f(x)$ being near $b$” means that $f(x)$ lies in some prescribed open interval $I$
centered at $b$ and “$x$ is sufficiently near $a$” means that $x$ lies in a certain
interval $J$ centered at $a$. Here, we seek to capture the idea of nearness in
terms of intervals. In general, our approach to (7.1) will be to consider
collections of subsets of $\Omega$ that have properties mimicking those of collections
of intervals having a common center.
Neighborhood Bases, Continuous Functions, and Open Sets
Consider an element $a \in \Omega$. We would like to define a collection $\mathfrak{N}_a$ of
subsets of $\Omega$ that can be thought of as “neighborhoods” of $a$. Certainly,
our collection should be nonempty and satisfy the condition:
$a \in N$ for all $N \in \mathfrak{N}_a$.    (7.2)
If we think of an element of $\Omega$ as being “near” $a$ if it belongs to some
member of the collection $\mathfrak{N}_a$, then at least some of the elements of the
intersection of two members of $\mathfrak{N}_a$ should also be “near” $a$. Thus, it is
reasonable to assume that the collection $\mathfrak{N}_a$ satisfies the condition:
If $N_1, N_2 \in \mathfrak{N}_a$, there exists $N_3 \in \mathfrak{N}_a$ such that $N_3 \subset N_1 \cap N_2$.    (7.3)
A nonempty collection $\mathfrak{N}_a$ of subsets of $\Omega$ satisfying (7.2) and (7.3) is
called a neighborhood basis at the point $a$. Members of $\mathfrak{N}_a$ are called
neighborhoods in $\mathfrak{N}_a$ or, simply, neighborhoods. Using the concept
of a neighborhood basis at a point, we make the following definition.
DEFINITION 7.1 Neighborhood Basis on a Set
A collection $\mathfrak{N}$ of subsets of a set $\Omega$ is said to be a neighborhood
basis on $\Omega$ if for each $a \in \Omega$, the collection $\{ N \in \mathfrak{N} : a \in N \}$ is a
neighborhood basis at the point $a$.
The next proposition provides an equivalent set of conditions for a
collection of subsets of Q to be a neighborhood basis on Q. Its proof is left
to the reader as an exercise.
PROPOSITION 7.1
A collection $\mathfrak{N}$ of subsets of a set $\Omega$ is a neighborhood basis on $\Omega$ if and
only if it satisfies the following two conditions:
7.1 Introduction to Topological Spaces □ 413
a)	$\Omega = \bigcup_{N \in \mathfrak{N}} N$.
b)	$N_1, N_2 \in \mathfrak{N}$ and $x \in N_1 \cap N_2$ implies there exists $N_3 \in \mathfrak{N}$ such that
$x \in N_3 \subset N_1 \cap N_2$.
EXAMPLE 7.1 Illustrates Neighborhood Bases
a)	The collection $\mathcal{I}$ of all open intervals of $\mathcal{R}$ is a neighborhood basis on $\mathcal{R}$.
And so is $\{ (x - r, x + r) : x \in \mathcal{R},\ r > 0 \}$.
b)	There are several natural neighborhood bases on $\mathcal{R}^2$. Two examples are
$\mathcal{I}_2 = \{ I \times J : I, J \in \mathcal{I} \}$
and
$\mathcal{D} = \{ D_r(a, b) : (a, b) \in \mathcal{R}^2,\ r > 0 \}$,
where $D_r(a, b) = \{ (x, y) : (x - a)^2 + (y - b)^2 < r^2 \}$. Other examples of
neighborhood bases on $\mathcal{R}^2$ can be found in the exercises at the end of
this section.    □
Using the notion of neighborhood basis, we can make sense of (7.1).
Suppose $\mathfrak{N}_a$ and $\mathfrak{M}_b$ are neighborhood bases at $a$ and $b$, respectively. Then
we will take (7.1) to mean that the following condition is satisfied: For each
$M \in \mathfrak{M}_b$, there exists $N \in \mathfrak{N}_a$ such that $f(N \setminus \{a\}) \subset M$.
In cases where (7.1) holds with $b = f(a)$, the function $f$ is said to
be continuous at $a$ with respect to the neighborhood bases $\mathfrak{N}_a$
and $\mathfrak{M}_b$. If $\mathfrak{N}$ and $\mathfrak{M}$ are neighborhood bases on $\Omega$ and $\Lambda$, respectively,
then $f$ is said to be continuous on $\Omega$ with respect to $\mathfrak{N}$ and $\mathfrak{M}$ if
it is continuous at each $a \in \Omega$ with respect to the neighborhood bases $\mathfrak{N}_a$
and $\mathfrak{M}_{f(a)}$, where
$\mathfrak{N}_a = \{ N \in \mathfrak{N} : a \in N \}$
and
$\mathfrak{M}_{f(a)} = \{ M \in \mathfrak{M} : f(a) \in M \}$.
When we have a neighborhood basis $\mathfrak{N}$ on $\Omega$, we can also generalize
the idea of an open set. Referring to Definition 2.7 on page 57, we define a
subset $O \subset \Omega$ to be open with respect to $\mathfrak{N}$ if for each $x \in O$, there is
an $N \in \mathfrak{N}$ such that $x \in N \subset O$. When it is clear from the context which
neighborhood basis we are using, we will say simply that $O$ is an open set.
414 □ Chapter 7 Elements of Topological, Metric, and Normed Spaces
We note that all of the sets belonging to $\mathfrak{N}$ are open with respect
to $\mathfrak{N}$. More generally, as the reader should verify, a subset of $\Omega$ is open
with respect to $\mathfrak{N}$ if and only if it is a union of members of $\mathfrak{N}$.
Theorem 2.2 on page 58 states three fundamental properties of the
collection of open subsets of $\mathcal{R}$. The next proposition, whose proof is left
to the reader, shows that those properties also hold for the collection of
subsets of $\Omega$ that are open with respect to a neighborhood basis.
PROPOSITION 7.2
Let $\mathfrak{N}$ be a neighborhood basis on the set $\Omega$. Then the open sets with
respect to $\mathfrak{N}$ satisfy the following conditions:
a)	The empty set and the set $\Omega$ are open.
b)	The union of any collection of open sets is an open set.
c)	The intersection of any finite collection of open sets is an open set.
Exercise 7.2 shows that the neighborhood bases $\mathcal{I}_2$ and $\mathcal{D}$, defined
in Example 7.1(b), determine the same collection of open subsets of $\mathcal{R}^2$.
It is easy to construct other examples where distinct neighborhood bases
determine the same collection of open sets.
The following proposition shows, however, that the property of continuity
for a function $f\colon \Omega \to \Lambda$, where $\Omega$ and $\Lambda$ have neighborhood bases $\mathfrak{N}$
and $\mathfrak{M}$, respectively, depends only on the open sets determined by $\mathfrak{N}$
and $\mathfrak{M}$. The proof of the proposition is left to the reader as an exercise.
PROPOSITION 7.3
Let $\mathfrak{N}$ and $\mathfrak{M}$ be neighborhood bases on $\Omega$ and $\Lambda$, respectively. Then a
function $f\colon \Omega \to \Lambda$ is continuous on $\Omega$ (with respect to $\mathfrak{N}$ and $\mathfrak{M}$) if and
only if $f^{-1}(O)$ is open in $\Omega$ with respect to $\mathfrak{N}$ whenever $O$ is open in $\Lambda$
with respect to $\mathfrak{M}$.
Topological Spaces and Continuous Functions
We note that Proposition 7.3 generalizes and is motivated by Theorem 2.5
on page 66. It also shows that, with respect to the concept of continuity, the
notion of open set is more fundamental than that of neighborhood basis
since two distinct neighborhood bases on a set can determine the same
collection of open sets and, hence, the same continuous functions. Thus,
we are led to formalize the concept of open set via the following definition.
7.1 Introduction to Topological Spaces □ 415
DEFINITION 7.2 Topology, Topological Space
Let $\Omega$ be a nonempty set. A collection $\mathcal{T}$ of subsets of $\Omega$ is said to be
a topology on $\Omega$ if it satisfies the following conditions:
a) $\emptyset, \Omega \in \mathcal{T}$.
b) $\mathcal{S} \subset \mathcal{T}$ implies $\bigcup_{O \in \mathcal{S}} O \in \mathcal{T}$.
c) $O_1, O_2 \in \mathcal{T}$ implies $O_1 \cap O_2 \in \mathcal{T}$.
If $\mathcal{T}$ is a topology on $\Omega$, then the pair $(\Omega, \mathcal{T})$ is called a topological
space; the members of $\mathcal{T}$ are called $\mathcal{T}$-open or, if there is no danger
of confusion, simply open.
Note: When the topology under consideration is clear from context, a
topological space $(\Omega, \mathcal{T})$ will usually be referred to simply as $\Omega$.
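For a finite set, conditions (a)-(c) of Definition 7.2 can be checked mechanically. The following Python sketch is only an illustration: the function name `is_topology` and the sample collections are assumptions introduced here, not notation from the text. It relies on the fact that, for a finite collection, closure under pairwise unions and pairwise intersections already gives closure under arbitrary unions and finite intersections.

```python
from itertools import combinations

def is_topology(omega, collection):
    """Check conditions (a)-(c) of Definition 7.2 for a finite set omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    # (a) the empty set and omega must belong to the collection
    if frozenset() not in sets or omega not in sets:
        return False
    # every member must be a subset of omega
    if any(not s <= omega for s in sets):
        return False
    # (b) and (c): pairwise closure suffices for a finite collection
    for a, b in combinations(sets, 2):
        if a | b not in sets or a & b not in sets:
            return False
    return True

omega = {1, 2, 3}
trivial = [set(), omega]                      # the topology of Example 7.2(c)
nested = [set(), {1}, {1, 2}, omega]          # a chain of open sets
not_a_topology = [set(), {1}, {2}, omega]     # {1} | {2} = {1, 2} is missing
```

Here `is_topology(omega, trivial)` and `is_topology(omega, nested)` hold, while `not_a_topology` fails condition (b).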
It follows from Proposition 7.2 that if $\mathfrak{N}$ is a neighborhood basis on
a set $\Omega$, then the subsets of $\Omega$ that are open with respect to $\mathfrak{N}$ constitute
a topology on $\Omega$, which we will call the topology determined by $\mathfrak{N}$.
On the other hand, if $\mathcal{T}$ is a topology on the set $\Omega$, then the collection
$\{ O \in \mathcal{T} : a \in O \}$ is a neighborhood basis at the point $a$ for each $a \in \Omega$,
$\mathcal{T}$ is a neighborhood basis on $\Omega$, and the topology determined by $\mathcal{T}$ is $\mathcal{T}$.
We also have the following definition.
DEFINITION 7.3 Neighborhood Basis for a Topological Space
Let $(\Omega, \mathcal{T})$ be a topological space. A collection $\mathfrak{N}$ of subsets of $\Omega$ is
said to be a neighborhood basis for $(\Omega, \mathcal{T})$ if the following two
conditions are satisfied:
a) $\mathfrak{N}$ is a neighborhood basis on $\Omega$.
b) The topology determined by $\mathfrak{N}$ is $\mathcal{T}$.
In such cases, we also say that the neighborhood basis $\mathfrak{N}$ induces or
determines $\mathcal{T}$.
The reader should verify that each of the following conditions is necessary
and sufficient for a collection $\mathfrak{N}$ of subsets of $\Omega$ to be a neighborhood
basis for $(\Omega, \mathcal{T})$.
• $\mathfrak{N} \subset \mathcal{T}$ and each open set (i.e., member of $\mathcal{T}$) is a union of members
of $\mathfrak{N}$.
• $\mathfrak{N} \subset \mathcal{T}$ and for each $O \in \mathcal{T}$ and each $x \in O$, there is an $N \in \mathfrak{N}$ such that
$x \in N \subset O$.
Motivated by Proposition 7.3 (page 414), we now extend the definition
of continuity to functions $f\colon \Omega \to \Lambda$, where $\Omega$ and $\Lambda$ are topological spaces.
DEFINITION 7.4 Continuous Functions on Topological Spaces
Let $\Omega$ and $\Lambda$ be topological spaces. A function $f\colon \Omega \to \Lambda$ is said to be
continuous if $f^{-1}(O)$ is open in $\Omega$ whenever $O$ is open in $\Lambda$.
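On finite spaces, Definition 7.4 reduces to a finite check of preimages. The sketch below is a hedged illustration; the helper names `preimage` and `is_continuous`, and the two small spaces, are assumptions made for this example. Functions are represented as Python dictionaries.

```python
def preimage(f, subset, domain):
    """Return f^{-1}(subset) for a function given as a dict on domain."""
    return frozenset(x for x in domain if f[x] in subset)

def is_continuous(f, domain, dom_topology, cod_topology):
    """Definition 7.4: the preimage of every open set is open."""
    dom_open = {frozenset(s) for s in dom_topology}
    return all(preimage(f, O, domain) in dom_open for O in cod_topology)

domain = {1, 2, 3}
dom_topology = [set(), {1}, {1, 2}, {1, 2, 3}]
cod_topology = [set(), {'a'}, {'a', 'b'}]      # a topology on {'a', 'b'}

f = {1: 'a', 2: 'a', 3: 'b'}   # preimage of {'a'} is {1, 2}, which is open
g = {1: 'b', 2: 'a', 3: 'a'}   # preimage of {'a'} is {2, 3}, which is not open
```

With these choices, `f` is continuous and `g` is not, although both are functions between the same two sets.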
EXAMPLE 7.2 Illustrates Topological Spaces and Continuous Functions
a) Let $\Omega = \mathcal{R}$, and $\mathcal{T}$ consist of the usual open sets as given by Definition 2.7
on page 57. Then, according to Corollary 2.1 on page 67, a real-valued
function on $\mathcal{R}$ is continuous in the sense of Definition 2.11 on page 65
if and only if it is continuous in the sense of Definition 7.4.
b) The neighborhood bases given in Example 7.1(b) determine the same
topology $\mathcal{T}$ on $\mathcal{R}^2$. With respect to $\mathcal{T}$ and the usual topology on $\mathcal{R}$,
the functions $f, g\colon \mathcal{R}^2 \to \mathcal{R}$ given by $f(x, y) = x$ and $g(x, y) = y$ are
continuous.
c) Let $\Omega$ be any set and $\mathcal{T} = \{\emptyset, \Omega\}$. Then $\mathcal{T}$ is a topology on $\Omega$, albeit not
an interesting one. Nevertheless, this topology is sometimes useful as an
illustrative example. It is not hard to show that a function $f\colon \Omega \to \mathcal{R}$ is
continuous with respect to the topology $\mathcal{T}$ if and only if it is constant.
d) Let $\Omega$ be any set. Then the collection $\mathcal{P}(\Omega)$ of all subsets of $\Omega$ is a
topology on $\Omega$ which is sometimes referred to as the discrete topology.
It is not hard to see that $\mathcal{P}(\Omega)$ is determined by the neighborhood basis
consisting of all the single-element subsets of $\Omega$. Also, it is obvious
that all functions from $\Omega$ to $\mathcal{R}$, or to any other topological space, are
continuous with respect to the discrete topology on $\Omega$.	□
Note: From now on, unless stated otherwise, we will assume that $\mathcal{R}$ is
equipped with the topology determined by the neighborhood basis of all
open intervals, which is the same topology as the one consisting of the open
sets of $\mathcal{R}$ as given by Definition 2.7 on page 57.
Relative Topologies and Continuous Functions
Given a topological space, we can produce still others by considering subsets
with topologies defined as follows. Let $(\Omega, \mathcal{T})$ be a topological space and
$D \subset \Omega$. Then it is easy to check that the collection $\{ D \cap O : O \in \mathcal{T} \}$ is a
topology on $D$, that is, it satisfies (a)-(c) of Definition 7.2. This topology
is given a special name.
DEFINITION 7.5 Relative Topology
Let $(\Omega, \mathcal{T})$ be a topological space and $D$ a subset of $\Omega$. The collection
of sets $\mathcal{T}_D = \{ D \cap O : O \in \mathcal{T} \}$ is a topology on $D$, called the relative
topology. Sets in $\mathcal{T}_D$ are said to be relatively open.
Remark: The reader should compare the definition of relatively open set
given here with that given for subsets of $\mathcal{R}$ in Chapter 2; specifically, see
Definition 2.10 and Theorem 2.3, both on page 62.
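As a small illustration of Definition 7.5, the relative topology on a subset of a finite topological space can be computed by intersecting each open set with the subset. The function name `relative_topology` and the sample space are assumptions introduced for this sketch.

```python
def relative_topology(topology, D):
    """Return { D & O : O in topology }, the relative topology on D."""
    D = frozenset(D)
    return {D & frozenset(O) for O in topology}

topology = [set(), {1}, {1, 2}, {1, 2, 3}]   # a topology on {1, 2, 3}
D = {2, 3}
T_D = relative_topology(topology, D)
# {1} meets D in the empty set, {1, 2} in {2}, and {1, 2, 3} in {2, 3}
```

Note that `{2}` is relatively open in `D` even though it is not open in the original space.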
Unless stated otherwise, when we say that a function is continuous
on a subset of a topological space, we will mean that it is continuous with
respect to the relative topology. For example, when we say a function is
continuous on the interval $[0, 1]$, we are assuming that $[0, 1]$ is equipped
with the relative topology inherited from $\mathcal{R}$.
We note that if $f\colon \Omega \to \Lambda$ is continuous and $D \subset \Omega$, then the function
$f|_D\colon D \to \Lambda$, the restriction of $f$ to $D$, is continuous with respect to the
relative topology on $D$.
Homeomorphic Topological Spaces
We conclude this section by considering what it means for two topological
spaces to be equivalent.
DEFINITION 7.6 Homeomorphic Spaces; Homeomorphism
Suppose that $(\Omega, \mathcal{T})$ and $(\Lambda, \mathcal{U})$ are topological spaces and $h\colon \Omega \to \Lambda$ is
a 1-1 correspondence. If $h^{-1}(U) \in \mathcal{T}$ for each $U \in \mathcal{U}$ and $h(O) \in \mathcal{U}$ for
each $O \in \mathcal{T}$, then we say that $(\Omega, \mathcal{T})$ and $(\Lambda, \mathcal{U})$ are homeomorphic
and call $h$ a homeomorphism.
We note that, if $h$ is a homeomorphism, then both $h$ and $h^{-1}$ are
continuous and, moreover, $U \mapsto h^{-1}(U)$ is a 1-1 correspondence from $\mathcal{U}$
to $\mathcal{T}$. Thus, homeomorphic spaces are equivalent as topological spaces.
EXAMPLE 7.3 Illustrates Definition 7.6
The function $f(x) = 2x$ is a homeomorphism of the interval $(0, 1)$ onto the
interval $(0, 2)$. Indeed, it can be shown that any two open intervals of $\mathcal{R}$ are
homeomorphic. On the other hand, $[0, 1]$ and $(0, 1)$ are not homeomorphic.
See Exercises 7.14-7.16.	□
EXERCISES 7.1
7.1 Let $\Omega$ be a nonempty set and suppose that for each $a \in \Omega$, $\mathfrak{N}_a$ is a
neighborhood basis at the point $a$. True or False: The collection $\mathfrak{N} = \bigcup_{a \in \Omega} \mathfrak{N}_a$
is a neighborhood basis on $\Omega$.
7.2 Refer to Example 7.1(b) on page 413. Show that the topologies determined
by the neighborhood bases $\mathcal{I}^2$ and $\mathcal{D}$ are identical.
7.3 Show that each of the following is a neighborhood basis on $\mathcal{R}^2$ and that
each determines the same topology as that in Exercise 7.2.
a) The collection
$\mathcal{L} = \{ L_r(a, b) : (a, b) \in \mathcal{R}^2,\ r > 0 \}$,
where $L_r(a, b) = \{ (x, y) : |x - a| + |y - b| < r \}$.
b) The collection
$\mathcal{M} = \{ M_r(a, b) : (a, b) \in \mathcal{R}^2,\ r > 0 \}$,
where $M_r(a, b) = \{ (x, y) : |x - a|^{1/2} + |y - b|^{1/2} < r \}$.
7.4 Let $\mathcal{I}$ denote the neighborhood basis on $\mathcal{R}$ consisting of all open intervals
and $\mathcal{T}$ the topology determined by $\mathcal{I}$. Show that $\mathcal{T}$ consists precisely of the
open sets of $\mathcal{R}$ as given by Definition 2.7 on page 57.
7.5 Prove Proposition 7.1 on page 412.
7.6 Prove Proposition 7.2 on page 414.
7.7 Prove Proposition 7.3 on page 414.
7.8 Let $\mathfrak{N} = \{ [a, b) : -\infty < a < b < \infty \}$.
a) Show that $\mathfrak{N}$ is a neighborhood basis on $\mathcal{R}$.
b) Let $\mathcal{T}$ be the topology determined by $\mathfrak{N}$. Give an example of a real-valued
function that is continuous with respect to $\mathcal{T}$ but not with respect to
the usual topology on $\mathcal{R}$.
7.9 A collection $\mathcal{S}$ of subsets of a set $\Omega$ is called a sub-basis on $\Omega$ if the
collection of finite intersections of members of $\mathcal{S}$ is a neighborhood basis
on $\Omega$. The topology determined by the neighborhood basis is also said to
be determined by the sub-basis $\mathcal{S}$.
a) Show that a collection $\mathcal{S}$ of subsets of $\Omega$ is a sub-basis on $\Omega$ if and only
if $\bigcup_{S \in \mathcal{S}} S = \Omega$.
b) Show that the topology determined by the basis $\mathcal{I}^2$ in Example 7.1(b)
is also determined by the sub-basis $\{ I \times \mathcal{R} : I \in \mathcal{I} \} \cup \{ \mathcal{R} \times I : I \in \mathcal{I} \}$.
7.2 Metrics and Norms □ 419
7.10 Verify the assertions made in parts (a)-(d) of Example 7.2 on page 416.
7.11 Let $\Omega$ be a set and $\mathcal{T} = \{ \emptyset \} \cup \{ U : U^c \text{ is finite} \}$. Show that $\mathcal{T}$ is a topology.
★ 7.12 Refer to Exercise 1.33 on page 25. Suppose $(\Omega, \mathcal{T})$ is a topological space
and $\equiv$ is an equivalence relation on $\Omega$. Let $\Omega/{\equiv}$ denote the corresponding
set of equivalence classes and, for each $x \in \Omega$, let $\langle x \rangle$ denote the equivalence
class containing $x$. For any subset $W$ of $\Omega/{\equiv}$, let $\widetilde{W} = \{ x \in \Omega : \langle x \rangle \in W \}$.
a) Show that $\mathcal{T}_{\equiv} = \{ W : \widetilde{W} \in \mathcal{T} \}$ is a topology on $\Omega/{\equiv}$. The topology $\mathcal{T}_{\equiv}$
is often called the quotient topology determined by $\equiv$.
b) Show that the function $p\colon \Omega \to \Omega/{\equiv}$ defined by $p(x) = \langle x \rangle$ is continuous
with respect to $\mathcal{T}$ and $\mathcal{T}_{\equiv}$.
7.13 Prove that "homeomorphic to" is an equivalence relation on the collection
of all topological spaces.
7.14 This exercise asks you to show that any two nonempty open intervals of $\mathcal{R}$
are homeomorphic.
a) Prove that $(0, 1)$ and $\mathcal{R}$ are homeomorphic.
b) Prove that any two nonempty open intervals are homeomorphic.
7.15 Show that if $h$ is a homeomorphism from $(a, b)$ onto $(c, d)$, then $h$ is either
strictly increasing or strictly decreasing.
7.16 Show that no two of the intervals $(0, 1)$, $[0, 1)$, and $[0, 1]$ are homeomorphic.
★ 7.17 Let $\mathfrak{T}$ be a collection of topologies on a set $\Omega$.
a) Show that $\bigcap_{\mathcal{T} \in \mathfrak{T}} \mathcal{T}$ is also a topology on $\Omega$.
b) Does the result in part (a) hold if intersection is replaced by union?
7.2 METRICS AND NORMS
In the previous section, we developed the notion of a neighborhood basis
as a way of expressing the concept of “nearness.” An alternative approach
to the idea of nearness is through a generalized concept of distance.
In the case of the real line, $\mathcal{R}$, we usually think of the distance, $d(x, y)$,
between the numbers $x$ and $y$ as being given by $d(x, y) = |x - y|$. Proofs
of many of the fundamental theorems of analysis on $\mathcal{R}$ make use of three
crucial properties of this distance function; namely, for all $x, y, z \in \mathcal{R}$,
(D1) $d(x, y) \ge 0$, with equality if and only if $x = y$.
(D2) $d(x, y) = d(y, x)$.
(D3) $d(x, z) \le d(x, y) + d(y, z)$.
Properties (D1)-(D3) are the model for the general notion of a distance
function or metric, which we will introduce in a moment. Of course,
(D1)-(D3) are derived from the properties of the absolute value function
(see Exercise 2.2 on page 42). The absolute value function is also the model
for another idea that we will introduce later in this section, namely, that
of a norm.
DEFINITION 7.7 Metric, Metric Space
Let $\Omega$ be a set. A function $\rho\colon \Omega \times \Omega \to \mathcal{R}$ is said to be a metric on $\Omega$
if it satisfies the following conditions for all $x, y, z \in \Omega$:
a) $\rho(x, y) \ge 0$, with equality if and only if $x = y$.
b) $\rho(x, y) = \rho(y, x)$.
c) $\rho(x, z) \le \rho(x, y) + \rho(y, z)$.
If $\rho$ is a metric on $\Omega$, then the pair $(\Omega, \rho)$ is called a metric space.
Note: When it is clear which metric is defined on $\Omega$, we will often suppress
the $\rho$ and simply write $\Omega$ for $(\Omega, \rho)$.
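Conditions (a)-(c) of Definition 7.7 can be spot-checked on a finite sample of points. The sketch below is only a sanity check under stated assumptions, not a proof that a function is a metric on the whole space; the name `is_metric_on` and the tolerance `tol` are choices made for this example.

```python
from itertools import product

def is_metric_on(points, rho, tol=1e-12):
    """Test conditions (a)-(c) of Definition 7.7 on a finite list of points."""
    for x, y in product(points, repeat=2):
        d = rho(x, y)
        if d < 0:                          # (a) nonnegativity
            return False
        if (d == 0) != (x == y):           # (a) zero exactly when x = y
            return False
        if abs(d - rho(y, x)) > tol:       # (b) symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:   # (c) triangle inequality
            return False
    return True

points = [-2.0, 0.0, 1.0, 3.5]
usual = lambda x, y: abs(x - y)        # the distance d(x, y) = |x - y|
squared = lambda x, y: (x - y) ** 2    # fails (c): take x = -2, y = 0, z = 3.5
```

On these sample points, `usual` passes all three conditions, while `squared` violates the triangle inequality since $30.25 > 4 + 12.25$.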
Normed Spaces
While the distance function on $\mathcal{R}$ given by $d(x, y) = |x - y|$ is the model
for the concept of a metric, it has an algebraic aspect that is not present in
Definition 7.7. We will combine algebraic and metric properties by adapting
the notion of distance function to the setting of a linear space. To begin,
we recall the definition of a linear space.
DEFINITION 7.8 Linear Space
A linear space (vector space) consists of a set $\Omega$, a field $F$,† and two
functions $+\colon \Omega \times \Omega \to \Omega$ and $\cdot\colon F \times \Omega \to \Omega$, where we denote $+(x, y)$ by
$x + y$ and $\cdot(\alpha, x)$ by $\alpha x$, such that the following conditions are satisfied
for all $x, y, z \in \Omega$ and $\alpha, \beta \in F$:
a) $x + y = y + x$.
b) $x + (y + z) = (x + y) + z$.
c) There exists a $0 \in \Omega$ such that $x + 0 = x$.
d) There exists $-x \in \Omega$ such that $x + (-x) = 0$.
e) $\alpha(\beta x) = (\alpha\beta)x$.
f) $\alpha(x + y) = \alpha x + \alpha y$.
g) $(\alpha + \beta)x = \alpha x + \beta x$.
h) $1x = x$.
The field $F$ is called the field of scalars, $+$ vector addition, and $\cdot$ scalar
multiplication.
† A field is a set along with two binary operations satisfying the field axioms
(F1)-(F5) on page 36. In this book, $F$ will always be either $\mathcal{R}$ or $\mathbb{C}$.
Note: On account of (b), sums of the form $x + y + \cdots + z$ are unambiguously
defined. Also, it is conventional to write $x - y$ for $x + (-y)$.
The space $\mathcal{R}^n$ is a linear space having $\mathcal{R}$ as its field of scalars, where
$+$ and $\cdot$ are defined by
$(x_1, x_2, \ldots, x_n) + (y_1, y_2, \ldots, y_n) = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$
and
$\alpha(x_1, x_2, \ldots, x_n) = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n)$.
Using analogous definitions, we can make $\mathbb{C}^n$ into a linear space having $\mathbb{C}$
as its scalar field.
A nonempty subset $D$ of $\Omega$ is called a (linear) subspace of $\Omega$ if
(1) $x, y \in D$ implies $x + y \in D$ and (2) $\alpha \in F$, $x \in D$ imply $\alpha x \in D$.
We observe that a subspace $D$ of a linear space $\Omega$ is itself a linear space,
where the operations of vector addition and scalar multiplication in $D$ are
the restrictions to $D$ of those operations in $\Omega$.
Often we will deal with linear spaces of real- or complex-valued functions
on a set. When we do so, the operations of vector addition ($+$) and
scalar multiplication ($\cdot$) will always be defined pointwise, as explained in
Section 2.4 on page 65.
Similarly, we also define the following operations pointwise: multiplication
of functions, $fg$; maximum of two real-valued functions, $f \vee g$;
minimum of two real-valued functions, $f \wedge g$; the real part of a complex-valued
function, $\Re f$; the imaginary part of a complex-valued function, $\Im f$;
the absolute value (modulus) of a complex-valued function, $|f|$; and the
complex conjugate of a complex-valued function, $\overline{f} = \Re f - i\,\Im f$. Also, a
function that is constantly equal to $\alpha$ is denoted simply by $\alpha$.
Now that we have recalled the definition of a linear space, we can
define a normed space. This is done as follows.
DEFINITION 7.9 Norm, Normed Space
Let $\Omega$ be a linear space having as its scalar field $F$ either $\mathcal{R}$ or $\mathbb{C}$. A
function $\| \ \|\colon \Omega \to \mathcal{R}$, whose value at $x$ is written as $\|x\|$, is said to
be a norm on $\Omega$ if it satisfies the following conditions for all $x, y \in \Omega$
and $\alpha \in F$:
a) $\|x\| \ge 0$, with equality if and only if $x = 0$.
b) $\|\alpha x\| = |\alpha|\,\|x\|$.
c) $\|x + y\| \le \|x\| + \|y\|$.
If $\| \ \|$ is a norm on $\Omega$, then the pair $(\Omega, \| \ \|)$ is called a normed space.
Note: When it is clear from context which norm is being considered, the
normed space $(\Omega, \| \ \|)$ will be indicated simply by $\Omega$.
It is easy to check that if $(\Omega, \| \ \|)$ is a normed space, then
$\rho(x, y) = \|x - y\|$
defines a metric on $\Omega$. We will call this the metric induced by the
norm $\| \ \|$. Hence, any normed space can also be viewed as a metric space;
indeed, the first examples of metric spaces that we consider arise from
norms. However, as we will see, there is still a need for the more general
theory of metric spaces to handle, among other cases, those metric spaces
where there is no underlying linear-space structure.
EXAMPLE 7.4 Euclidean n-Space Equipped with Various Norms
The space $\mathcal{R}^n$ of n-tuples of real numbers is a linear space with respect to
the operations of vector addition and multiplication by real scalars given
on page 421. Here are three naturally arising norms defined on $\mathcal{R}^n$:
$\|x\|_2 = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2}$,
$\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$,
$\|x\|_\infty = \max\{ |x_1|, |x_2|, \ldots, |x_n| \}$,
where $x = (x_1, x_2, \ldots, x_n)$.	□
EXAMPLE 7.5 Unitary n-Space Equipped with Various Norms
The set of complex numbers $\mathbb{C} = \{ x + iy : x, y \in \mathcal{R} \}$ with the usual
absolute value function (modulus) defined by $|x + iy| = (x^2 + y^2)^{1/2}$ is a
normed space, where the scalar field is also $\mathbb{C}$.
The space $\mathbb{C}^n$ of n-tuples of complex numbers is a linear space with
respect to the operations of vector addition and multiplication by complex
scalars given on page 421. We will abuse notation slightly by also using
$\| \ \|_2$, $\| \ \|_1$, and $\| \ \|_\infty$ to denote the norms defined, respectively, on $\mathbb{C}^n$ via:
$\|z\|_2 = (|z_1|^2 + |z_2|^2 + \cdots + |z_n|^2)^{1/2}$,
$\|z\|_1 = |z_1| + |z_2| + \cdots + |z_n|$,
$\|z\|_\infty = \max\{ |z_1|, |z_2|, \ldots, |z_n| \}$,
where $z = (z_1, z_2, \ldots, z_n)$.	□
EXAMPLE 7.6 Spaces of Measurable Functions
We will present three normed spaces of functions that are generalizations
of those given in Example 7.5. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space.
a) Recall from Section 4.4 that the set $\mathcal{L}^1(\mu)$ consists of all complex-valued
$\mathcal{A}$-measurable functions satisfying $\int_\Omega |f| \, d\mu < \infty$. Parts (a) and (b) of
Theorem 4.8 on page 196 show that $\mathcal{L}^1(\mu)$ is a linear space with scalar
field $\mathbb{C}$. Furthermore, if we identify two functions in $\mathcal{L}^1(\mu)$ whenever
they are equal $\mu$-ae and define
$\|f\|_1 = \int_\Omega |f| \, d\mu$,
then $\| \ \|_1$ is a norm on $\mathcal{L}^1(\mu)$, called the $\mathcal{L}^1$-norm.
b) A somewhat more difficult task is to show that if we again identify two
functions that are equal $\mu$-ae, then
$\|f\|_2 = \left( \int_\Omega |f|^2 \, d\mu \right)^{1/2}$
defines a norm on the linear space $\mathcal{L}^2(\mu)$ consisting of all complex-valued
$\mathcal{A}$-measurable functions such that $\int_\Omega |f|^2 \, d\mu < \infty$. The norm $\| \ \|_2$
is called the $\mathcal{L}^2$-norm.
c) Another important space of measurable functions is $\mathcal{L}^\infty(\mu)$. This space
consists of all complex-valued $\mathcal{A}$-measurable functions that satisfy the
following condition: There is a real number $M$ such that $|f| \le M$ $\mu$-ae.
Such functions are said to be essentially bounded. If we again identify
two functions that agree $\mu$-ae, then
$\|f\|_\infty = \inf\{ M : |f| \le M \ \mu\text{-ae} \}$
defines a norm on the linear space $\mathcal{L}^\infty(\mu)$. The norm $\| \ \|_\infty$ is called the
$\mathcal{L}^\infty$-norm or essential-supremum norm.
In case $\mu$ is n-dimensional Lebesgue measure restricted to a measurable
subset $\Omega$ of $\mathcal{R}^n$, we denote the spaces $\mathcal{L}^1(\mu)$, $\mathcal{L}^2(\mu)$, and $\mathcal{L}^\infty(\mu)$
by $\mathcal{L}^1(\Omega)$, $\mathcal{L}^2(\Omega)$, and $\mathcal{L}^\infty(\Omega)$, respectively. And when $\mu$ is counting measure
on some set $\Omega$, we denote the spaces $\mathcal{L}^1(\mu)$, $\mathcal{L}^2(\mu)$, and $\mathcal{L}^\infty(\mu)$ by $\ell^1(\Omega)$,
$\ell^2(\Omega)$, and $\ell^\infty(\Omega)$, respectively.	□
EXAMPLE 7.7 Metric Spaces That Are Not Normed Spaces
To see how metric spaces that are not normed spaces can arise naturally, we
first consider a simple way of constructing new metric spaces from existing
ones. Suppose that $(\Omega, \rho)$ is a metric space and that $D$ is a subset of $\Omega$. Then we
can define a metric $\rho_D$ on $D$ by restricting the function $\rho$ to $D \times D$. When
there is no danger of confusion, we will denote the metric space $(D, \rho_D)$
by $(D, \rho)$.
Now, suppose that $D$ is a subset, but not a linear subspace, of a normed
space and let $\rho$ be the metric induced by the norm. Then $(D, \rho)$ is a metric
space that is not a normed space.	□
Metric Spaces as Topological Spaces
We now show how metrics can be used to define topologies. Let $(\Omega, \rho)$ be
a metric space. For $x \in \Omega$ and $r > 0$, let $B_r^\rho(x) = \{ y \in \Omega : \rho(x, y) < r \}$.
We call $B_r^\rho(x)$ the open ball of radius $r$ centered at $x$. When the
metric with which we are dealing is given unambiguously, we write $B_r(x)$
for $B_r^\rho(x)$.
In case $\Omega = \mathcal{R}$ and $\rho = d$ (the metric induced by absolute value), we
have $B_r(x) = (x - r, x + r)$. The next proposition, whose proof is left to the
reader as an exercise, shows that, just as the collection of open intervals
$\{ (x - r, x + r) : x \in \mathcal{R},\ r > 0 \}$ is a neighborhood basis on $\mathcal{R}$, the collection
of open balls of a metric space $\Omega$ is a neighborhood basis on $\Omega$.
PROPOSITION 7.4
Let $(\Omega, \rho)$ be a metric space. Then the collection, $\{ B_r(x) : x \in \Omega,\ r > 0 \}$,
of open balls of $\Omega$ is a neighborhood basis on $\Omega$.
The neighborhood basis $\{ B_r(x) : x \in \Omega,\ r > 0 \}$ determines a topology
on $\Omega$, denoted by $\mathcal{T}_\rho$, which we call the topology induced by the
metric $\rho$. If the metric $\rho$ is itself induced by a norm $\| \ \|$, we also say that
$\mathcal{T}_\rho$ is the topology induced by the norm $\| \ \|$.
When we have a metric space (or a normed space), we assume, unless
stated otherwise, that it has the topology induced by the metric (or norm).
Thus, for example, suppose that $\Omega$ and $\Lambda$ are each either a metric, normed,
or topological space and that $f\colon \Omega \to \Lambda$. Then, when we say that $f$ is
continuous on $\Omega$, we mean, unless stated otherwise, that it is continuous
with respect to the induced topologies on $\Omega$ and $\Lambda$.
A topological space $(\Omega, \mathcal{T})$ is said to be metrizable if there is a metric
$\rho$ on $\Omega$ such that $\mathcal{T}_\rho = \mathcal{T}$. Later we will address the difficult problem
of determining when a topological space is metrizable. We will see that
even in cases where a topological space is metrizable, the metric may not
be defined by a simple usable formula.
When two metrics on the same set or two norms on the same linear
space induce the same topology, we say that they are equivalent.
EXAMPLE 7.8 Nonequivalent Norms
Consider the space $C([a, b])$ of continuous complex-valued functions on the
closed interval $[a, b]$.† $C([a, b])$ is a linear subspace of each of the spaces
$\mathcal{L}^1([a, b])$, $\mathcal{L}^2([a, b])$, and $\mathcal{L}^\infty([a, b])$; hence, it can be given any of the norms
$\| \ \|_1$, $\| \ \|_2$, and $\| \ \|_\infty$ defined in Example 7.6 on page 423. It is left to the
reader to show that no two of these norms on $C([a, b])$ are equivalent. □
† In the terminology of Definition 2.11 on page 65, $C([a, b])$ denotes the collection of
continuous real-valued functions on $[a, b]$. But, as we said in a footnote to that
definition, the notation introduced there was temporary.
The following proposition and its corollary provide useful equivalent
conditions for two metrics or norms to be equivalent. We leave the proof
of the corollary to the reader as an exercise.
PROPOSITION 7.5
Let $\rho$ and $\sigma$ be metrics defined on a set $\Omega$. Then $\rho$ and $\sigma$ are equivalent
if and only if the following condition is satisfied: for each $x \in \Omega$ and
$\epsilon > 0$, there are positive numbers $r$ and $s$ such that $B_s^\sigma(x) \subset B_\epsilon^\rho(x)$ and
$B_r^\rho(x) \subset B_\epsilon^\sigma(x)$.
PROOF: Suppose that the condition specified in the statement of this
proposition is satisfied. Let $O$ be open with respect to the metric $\rho$ and let
$x \in O$. Then there is an $\epsilon > 0$ such that $B_\epsilon^\rho(x) \subset O$. So, by assumption,
there is an $s > 0$ such that $B_s^\sigma(x) \subset B_\epsilon^\rho(x) \subset O$. Hence, $O$ is also open
with respect to $\sigma$. A similar argument shows that a set that is open with
respect to $\sigma$ is also open with respect to $\rho$.
Conversely, suppose $\rho$ and $\sigma$ are equivalent. Let $x \in \Omega$ and $\epsilon > 0$.
Then, since $B_\epsilon^\rho(x)$ is an open set containing $x$ in the topology induced
by $\rho$, it is also an open set containing $x$ in the topology induced by $\sigma$.
Thus, there is an $s > 0$ such that $B_s^\sigma(x) \subset B_\epsilon^\rho(x)$. A similar argument
shows that there is an $r > 0$ such that $B_r^\rho(x) \subset B_\epsilon^\sigma(x)$.
COROLLARY 7.1
Let $\| \ \|$ and $||| \ |||$ be norms on a linear space $\Omega$. Then $\| \ \|$ and $||| \ |||$ are
equivalent if and only if there are positive constants $A$ and $B$ such that
$A\|x\| \le |||x||| \le B\|x\|$
for all $x \in \Omega$.
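Corollary 7.1 also suggests how nonequivalence, as in Example 7.8, can be detected: it suffices to exhibit functions for which the ratio of the two norms is unbounded. The sketch below is a numerical illustration under assumptions made here (the interval $[0, 1]$, the tent functions $f_n(x) = \max(0, 1 - nx)$, and grid-based approximations of the norms); it is not the proof requested in Exercise 7.26.

```python
def tent(n):
    """Continuous function f_n(x) = max(0, 1 - n*x) on [0, 1]."""
    return lambda x: max(0.0, 1.0 - n * x)

def sup_norm(f, steps=10_000):
    # grid approximation of max |f| over [0, 1]
    return max(abs(f(k / steps)) for k in range(steps + 1))

def l1_norm(f, steps=10_000):
    # midpoint-rule approximation of the integral of |f| over [0, 1]
    h = 1.0 / steps
    return sum(abs(f((k + 0.5) * h)) for k in range(steps)) * h

ratios = []
for n in (1, 10, 100):
    f = tent(n)
    ratios.append(sup_norm(f) / l1_norm(f))   # the exact ratio is 2n
```

Each $f_n$ has sup norm 1 while its $\mathcal{L}^1$-norm is $1/(2n)$, so no constant $B$ can satisfy $\|f\|_\infty \le B\|f\|_1$ for all $f$ in $C([0, 1])$.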
Exercise 7.25 shows that the three norms on $\mathcal{R}^n$ defined in Example 7.4
on page 422 are equivalent, that is, they induce the same topology, $\mathcal{T}$.
Unless otherwise stated, we assume that each subset $D$ of $\mathcal{R}^n$ has the
relative topology $\mathcal{T}_D$. Similar comments hold for $\mathbb{C}^n$.
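One way to see the equivalence through Corollary 7.1 is the standard chain $\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n\|x\|_\infty$, which supplies the constants $A$ and $B$. The Python sketch below spot-checks this chain on random vectors; the function names, the sample size, and the tolerance are assumptions of this illustration, and a numerical check is of course not a proof.

```python
import math
import random

def norm_1(x):
    return sum(abs(t) for t in x)

def norm_2(x):
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    return max(abs(t) for t in x)

def chain_holds(x, tol=1e-12):
    """Check ||x||_inf <= ||x||_2 <= ||x||_1 <= n * ||x||_inf."""
    n = len(x)
    a, b, c = norm_inf(x), norm_2(x), norm_1(x)
    return a <= b + tol and b <= c + tol and c <= n * a + tol

random.seed(0)  # fixed seed so the sample is reproducible
samples = [[random.uniform(-5, 5) for _ in range(4)] for _ in range(1000)]
all_ok = all(chain_holds(x) for x in samples)
```

For instance, the vector $(3, -4, 0)$ has norms 7, 5, and 4 with respect to $\| \ \|_1$, $\| \ \|_2$, and $\| \ \|_\infty$.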
We conclude this section with a construction showing that every metric
is equivalent to a bounded metric.
PROPOSITION 7.6
Let $(\Omega, \rho)$ be a metric space. Then there is a bounded metric $\sigma$ on $\Omega$ such
that $\rho$ and $\sigma$ are equivalent.
PROOF: It can be shown (see Exercise 7.30) that the function $\sigma$ defined
on $\Omega \times \Omega$ by
$\sigma(x, y) = \dfrac{\rho(x, y)}{1 + \rho(x, y)}$
is a metric. Clearly, $\sigma(x, y) < 1$ for all $x, y \in \Omega$ and, so, $\sigma$ is bounded.
Now, since $\sigma(x, y) \le \rho(x, y)$, it follows that for each $x \in \Omega$ and $\epsilon > 0$,
$B_\epsilon^\rho(x) \subset B_\epsilon^\sigma(x)$. On the other hand, choosing $s = \epsilon/(1 + \epsilon)$ and using
$\rho(x, y) = \sigma(x, y)/(1 - \sigma(x, y))$, we find that $B_s^\sigma(x) \subset B_\epsilon^\rho(x)$. Thus, the
condition of Proposition 7.5 is satisfied by $\rho$ and $\sigma$.
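The construction in the proof can be tried out numerically. The sketch below takes the usual metric on the real line as $\rho$ (an assumption for illustration, as are the names `bounded_version`, `grid`, and the choice $\epsilon = 0.5$) and checks, on a grid of points, the inclusion $B_s^\sigma(0) \subset B_\epsilon^\rho(0)$ with $s = \epsilon/(1 + \epsilon)$.

```python
def bounded_version(rho):
    """Return sigma = rho / (1 + rho), the metric used in Proposition 7.6."""
    return lambda x, y: rho(x, y) / (1.0 + rho(x, y))

rho = lambda x, y: abs(x - y)      # usual metric on the real line
sigma = bounded_version(rho)

eps = 0.5
s = eps / (1.0 + eps)              # radius chosen in the proof
grid = [k / 100.0 for k in range(-300, 301)]

# every grid point in the sigma-ball of radius s about 0
# also lies in the rho-ball of radius eps about 0
inclusion_holds = all(rho(0.0, y) < eps for y in grid if sigma(0.0, y) < s)
```

Although $\rho$ is unbounded on the real line, every value of `sigma` stays below 1, which is the sense in which $\sigma$ is a bounded metric.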
EXERCISES 7.2
7.18 Let $(\Omega, \rho)$ be a metric space. Prove each of the following facts.
a) For $x, y, z \in \Omega$,
$|\rho(x, y) - \rho(z, y)| \le \rho(x, z)$.
b) For $x_1, x_2, \ldots, x_n \in \Omega$,
$\rho(x_1, x_n) \le \rho(x_1, x_2) + \rho(x_2, x_3) + \cdots + \rho(x_{n-1}, x_n)$.
7.3 Weak Topologies □ 427
7.19 Refer to Example 7.4. Verify that each of $\| \ \|_1$, $\| \ \|_2$, and $\| \ \|_\infty$ is a norm.
7.20 Refer to Example 7.5. Verify that each of $\| \ \|_1$, $\| \ \|_2$, and $\| \ \|_\infty$ is a norm.
7.21 Refer to Example 7.6. Verify that each of $\| \ \|_1$, $\| \ \|_2$, and $\| \ \|_\infty$ is a norm.
7.22 For each $x \in \mathcal{R}$, let $\langle x \rangle = |x|^{1/2}$.
a) Show that $\langle \ \rangle$ satisfies conditions (a) and (c) of Definition 7.9 but not
condition (b).
b) Show that, nevertheless, $\rho(x, y) = \langle x - y \rangle$ defines a metric on $\mathcal{R}$ that is
equivalent to the metric induced by the absolute value function.
7.23 Prove Proposition 7.4.
7.24 Prove Corollary 7.1.
7.25 Prove that the three norms defined in Example 7.4 are all equivalent.
7.26 Prove that no two of the norms in Example 7.8 are equivalent.
7.27 Let $\rho$ and $\sigma$ be metrics on $\Omega$. Show that each of the following is also
a metric on $\Omega$.
a) $\rho_1 = \rho + \sigma$.
b) $\rho_2 = (\rho^2 + \sigma^2)^{1/2}$.
c) $\rho_\infty = \max\{\rho, \sigma\}$.
7.28 Refer to Exercise 7.27. Show that any two of the three metrics, $\rho_1$, $\rho_2$,
and $\rho_\infty$, are equivalent.
7.29 Refer to Example 7.2(d) on page 416. Let $\mathcal{T}$ be the discrete topology on a
set $\Omega$. Show that $(\Omega, \mathcal{T})$ is metrizable.
★ 7.30 Refer to the definition of a metric (Definition 7.7 on page 420).
a) Show that if $\rho$ satisfies condition (c), then so does $\sigma = \rho/(1 + \rho)$.
b) Deduce that the function $\sigma$ in Proposition 7.6 is a metric.
7.31 Provide an example of a topological space that is not metrizable.
★ 7.32 Suppose that $(\Omega, \rho)$ and $(\Lambda, \sigma)$ are metric spaces and let $f\colon \Omega \to \Lambda$. Show
that $f$ is continuous on $\Omega$ if and only if for each $a \in \Omega$ and $\epsilon > 0$, there is a
$\delta > 0$ such that $\rho(x, a) < \delta$ implies $\sigma(f(x), f(a)) < \epsilon$.
7.3 WEAK TOPOLOGIES
While metric spaces are ubiquitous in analysis, there are natural ways in
which nonmetrizable spaces enter the subject. For example, we will see
later that nonmetrizable spaces often arise in the context of weak topologies
determined by families of functions. It is the concept of weak topology that
we introduce in this section.
If $\mathcal{T}$ and $\mathcal{U}$ are two topologies on a set $\Omega$ and $\mathcal{T} \subset \mathcal{U}$, then we say that
$\mathcal{T}$ is weaker than $\mathcal{U}$. If $\mathcal{T}$ is weaker than but not equal to $\mathcal{U}$, then $\mathcal{T}$ is
said to be strictly weaker than $\mathcal{U}$.
Let $\Omega$ be a nonempty set. Consider a family of functions $\mathcal{F}$ such that
for each $f \in \mathcal{F}$, $(\Lambda_f, \mathcal{T}_f)$ is a topological space and $f\colon \Omega \to \Lambda_f$. Can we
find a topology $\mathcal{T}$ on $\Omega$ such that $f$ is continuous with respect to $\mathcal{T}$ and $\mathcal{T}_f$
for each $f \in \mathcal{F}$? The answer to this question is "yes" because the discrete
topology (Example 7.2(d) on page 416) on $\Omega$ will always do the trick.
However, the discrete topology is of little interest because, with respect
to it, any function from $\Omega$ into a topological space is continuous. Therefore,
it is better to ask the following question: Of all the topologies on $\Omega$ with
respect to which each $f \in \mathcal{F}$ is continuous, is there a weakest one? The
answer to this question is based on the observation that if $\mathfrak{T}$ is a nonempty
collection of topologies on the set $\Omega$, then the intersection, $\bigcap_{\mathcal{T} \in \mathfrak{T}} \mathcal{T}$, is also
a topology on $\Omega$. (See Exercise 7.17(a) on page 419.)
DEFINITION 7.10 Weak Topology
Let $\Omega$ be a nonempty set. Consider a family of functions $\mathcal{F}$ such that
for each $f \in \mathcal{F}$, $(\Lambda_f, \mathcal{T}_f)$ is a topological space and $f\colon \Omega \to \Lambda_f$. Let
$\mathfrak{T}$ denote the collection of topologies on $\Omega$ with respect to which all
functions in $\mathcal{F}$ are continuous. Then the topology
$\mathcal{T}_{\mathcal{F}} = \bigcap_{\mathcal{T} \in \mathfrak{T}} \mathcal{T}$
is called the weak topology determined by the family $\mathcal{F}$.
We leave it to the reader as an exercise to prove that $\mathcal{T}_{\mathcal{F}}$ is the weakest
topology on $\Omega$ for which all $f \in \mathcal{F}$ are continuous. (See Exercise 7.34.)
Usually, when there is no possibility of confusion, functions that are
continuous with respect to $\mathcal{T}_{\mathcal{F}}$ are called weakly continuous. Furthermore,
$\mathcal{T}_{\mathcal{F}}$-open sets are called weakly open.
The following proposition provides a useful alternative way of looking
at the topology $\mathcal{T}_{\mathcal{F}}$.
PROPOSITION 7.7
Let $\Omega$ be a nonempty set. Consider a family of functions $\mathcal{F}$ such that for
each $f \in \mathcal{F}$, $(\Lambda_f, \mathcal{T}_f)$ is a topological space and $f\colon \Omega \to \Lambda_f$. Suppose that
the topology $\mathcal{T}_f$ is determined by a neighborhood basis $\mathfrak{N}_f$. Then sets of
the form
$\bigcap_{f \in P} f^{-1}(W_f)$	(7.4)
where $P$ is a finite subset of $\mathcal{F}$ and, for each $f \in P$, $W_f \in \mathfrak{N}_f$, form a
neighborhood basis that induces $\mathcal{T}_{\mathcal{F}}$.
PROOF: We first note that the collection of sets of the form (7.4) is a
neighborhood basis on $\Omega$. We need to show that the topology $\mathcal{T}$ determined
by that neighborhood basis is $\mathcal{T}_{\mathcal{F}}$.
Let $f \in \mathcal{F}$ and $O \in \mathcal{T}_f$. Then
$f^{-1}(O) = \bigcup_{W \in \mathfrak{N}_f,\ W \subset O} f^{-1}(W)$.
Because each $f^{-1}(W)$ belongs to $\mathcal{T}$, it follows that $f^{-1}(O) \in \mathcal{T}$. Thus,
every function in $\mathcal{F}$ is continuous with respect to $\mathcal{T}$. It follows that $\mathcal{T}_{\mathcal{F}}$ is
weaker than $\mathcal{T}$.
On the other hand, each set of the form (7.4) is weakly open, being the
intersection of finitely many weakly open sets. Consequently, $\mathcal{T}$ is weaker
than $\mathcal{T}_{\mathcal{F}}$.
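For finite sets, the description in Proposition 7.7 gives a direct way to compute a weak topology: take preimages of open sets, close under finite intersections to get a neighborhood basis, then close under unions. The sketch below is an illustration under assumptions made here: the names `close_under` and `weak_topology`, the two-point space with topology $\{\emptyset, \{1\}, \{0, 1\}\}$, and the two sample functions are all hypothetical, and each $\mathcal{T}_f$ is used as its own neighborhood basis.

```python
from itertools import combinations

def preimage(f, subset, domain):
    return frozenset(x for x in domain if f[x] in subset)

def close_under(op, sets):
    """Smallest collection containing sets and closed under the binary op."""
    sets = set(sets)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(sets), 2):
            c = op(a, b)
            if c not in sets:
                sets.add(c)
                changed = True
    return sets

def weak_topology(domain, family):
    """family is a list of (function-as-dict, topology on its codomain)."""
    domain = frozenset(domain)
    pre = {preimage(f, frozenset(O), domain) for f, top in family for O in top}
    # finite intersections of preimages: the sets of the form (7.4)
    basis = close_under(frozenset.intersection, pre | {domain})
    # the topology determined by that neighborhood basis: all unions
    return close_under(frozenset.union, basis | {frozenset()})

domain = {1, 2, 3, 4}
two_point = [set(), {1}, {0, 1}]            # a topology on {0, 1}
f = {1: 1, 2: 1, 3: 0, 4: 0}                # f^{-1}({1}) = {1, 2}
g = {1: 0, 2: 1, 3: 0, 4: 1}                # g^{-1}({1}) = {2, 4}
T_F = weak_topology(domain, [(f, two_point), (g, two_point)])
```

In this example the weak topology has six members and does not contain $\{3\}$, so it is strictly weaker than the discrete topology on the domain.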
EXAMPLE 7.9 Compares Weak and Metric Topologies
The space $C([a, b])$ of continuous complex-valued functions on $[a, b]$ is a
linear subspace of $\mathcal{L}^\infty([a, b])$. Thus, $C([a, b])$ is a normed space with
norm $\| \ \|_\infty$. Let $\mathcal{T}_\infty$ denote the topology induced by this norm.
For each $x \in [a, b]$, the complex-valued function on $C([a, b])$ defined by
$e_x(f) = f(x)$ satisfies the inequality $|e_x(f) - e_x(g)| \le \|f - g\|_\infty$. From this
inequality, it is easy to show that each function $e_x$ is continuous with respect
to the topology $\mathcal{T}_\infty$. It follows that the weak topology $\mathcal{T}_{\mathcal{F}}$ determined by
the family $\mathcal{F} = \{ e_x : x \in [a, b] \}$ is weaker than $\mathcal{T}_\infty$.
Is it possible that $\mathcal{T}_{\mathcal{F}} = \mathcal{T}_\infty$? Suppose the answer is yes. Then, in
particular, the open ball $B_1(0)$ is weakly open. Applying Proposition 7.7,
we see that there exist $x_1, x_2, \ldots, x_n \in [a, b]$, $w_1, w_2, \ldots, w_n \in \mathbb{C}$, and
positive numbers $\delta_1, \delta_2, \ldots, \delta_n$ such that
$\{ f : |f(x_j) - w_j| < \delta_j,\ j = 1, 2, \ldots, n \} \subset B_1(0)$.
However, it is easy to construct a function $g \in C([a, b])$ with $g(x_j) = w_j$
for $j = 1, 2, \ldots, n$ and $g(c) = 2$ for some $c \in [a, b] \setminus \{ x_1, x_2, \ldots, x_n \}$.
Clearly, $g$ is an element of the set on the left of the previous display but
cannot belong to $B_1(0)$, since $\|g\|_\infty \ge 2$. Hence, we have a contradiction.
Thus, $\mathcal{T}_{\mathcal{F}} \ne \mathcal{T}_\infty$ and we conclude that $\mathcal{T}_{\mathcal{F}}$ is strictly weaker than $\mathcal{T}_\infty$. □
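The function $g$ used in the argument can be taken to be piecewise linear. The following sketch builds such a function by linear interpolation; the name `interpolant`, the sample points $x_j$, the values $w_j$, and the extra point $c = 0.7$ are all hypothetical choices made for illustration, with $[a, b] = [0, 1]$ assumed.

```python
def interpolant(nodes):
    """Piecewise-linear function through the given {point: value} nodes,
    extended by constants to the left and right of the extreme nodes."""
    pts = sorted(nodes.items())          # sorted by the (distinct) points
    def g(x):
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return (1 - t) * y0 + t * y1
    return g

prescribed = {0.2: 0, 0.5: 0.3j, 0.9: -0.1}   # hypothetical x_j and w_j
c = 0.7                                        # a point different from every x_j
g = interpolant({**prescribed, c: 2})          # force g(c) = 2
```

The resulting `g` is continuous, matches every prescribed value $w_j$ at $x_j$, and satisfies $|g(c)| = 2$, so its sup norm is at least 2 and it lies outside $B_1(0)$.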
Product Topologies
Suppose that $\{(\Omega_\iota, \mathcal{T}_\iota)\}_{\iota \in I}$ is an indexed family of topological spaces. The
idea of a weak topology can be used to define a topology on the Cartesian
product $\Omega = \times_{\iota \in I}\, \Omega_\iota$. We recall from Definition 1.11 on page 18 that
each element $f \in \Omega$ is a function on $I$ such that $f(\iota) \in \Omega_\iota$ for each $\iota \in I$.
Furthermore, we know from the axiom of choice that $\Omega \ne \emptyset$ provided $\Omega_\iota \ne \emptyset$
for all $\iota \in I$.
DEFINITION 7.11 Product Topology
Let $\{(\Omega_\iota, \mathcal{T}_\iota)\}_{\iota \in I}$ be an indexed family of topological spaces and set
$\Omega = \times_{\iota \in I}\, \Omega_\iota$. The function $p_\iota$ defined by $p_\iota(f) = f(\iota)$ is called the $\iota$th
coordinate projection on $\Omega$. The weak topology on $\Omega$ determined by
the family of coordinate projections $\{ p_\iota : \iota \in I \}$ is called the product
topology. Thus, the product topology is the weakest topology for
which all coordinate projections are continuous.
Examples of product topologies are discussed in the exercises.
EXERCISES 7.3
7.33 Let $\mathfrak{T}$ be a collection of topologies on a set $\Omega$. Show that if a function $f$ is
continuous with respect to every member of $\mathfrak{T}$, then it is continuous with
respect to the intersection of $\mathfrak{T}$, that is, with respect to $\bigcap_{\mathcal{T} \in \mathfrak{T}} \mathcal{T}$.
7.34 Refer to Definition 7.10 on page 428. Prove that $\mathcal{T}_{\mathcal{F}}$ is the weakest topology
on $\Omega$ for which all $f \in \mathcal{F}$ are continuous. That is, prove the following:
a) Each $f \in \mathcal{F}$ is continuous with respect to $\mathcal{T}_{\mathcal{F}}$.
b) If $\mathcal{U}$ is a topology on $\Omega$ such that each $f \in \mathcal{F}$ is continuous with respect
to $\mathcal{U}$, then $\mathcal{T}_{\mathcal{F}}$ is weaker than $\mathcal{U}$.
7.35 Show that the function $L\colon C([a, b]) \to \mathbb{C}$ defined by $L(f) = \int_a^b f(x)\,dx$ is
not continuous with respect to the weak topology defined in Example 7.9
on page 429.
7.36 Refer to Example 7.9 on page 429. For $A \subset [a, b]$, let $\mathcal{F}_A = \{ e_x : x \in A \}$.
Show that if $A$ is a proper subset of $B$, then $\mathcal{T}_{\mathcal{F}_A}$ is strictly weaker than $\mathcal{T}_{\mathcal{F}_B}$.
7.37 In Example 7.9 on page 429, show that every $\mathcal{T}_{\mathcal{F}}$-open set that contains the
constant function 0 must also contain a nonzero linear subspace of $C([a, b])$.
7.38 Let $\Omega$ and $\Lambda$ be linear spaces having the same scalar field, $F$. A function
$L: \Omega \to \Lambda$ is called a linear mapping or linear operator if for all
$x, y \in \Omega$ and all scalars $\alpha \in F$,
$$L(x + y) = L(x) + L(y)$$
and
$$L(\alpha x) = \alpha L(x).$$
Suppose that $L: C([a,b]) \to \mathbb{C}$ is linear and continuous with respect to the
weak topology $\mathcal{T}_{\mathcal{F}}$ defined in Example 7.9 on page 429. Show that there are
finitely many points $x_1, x_2, \ldots, x_n \in [a,b]$ and constants $c_1, c_2, \ldots, c_n \in \mathbb{C}$
such that
$$L = c_1 e_{x_1} + c_2 e_{x_2} + \cdots + c_n e_{x_n},$$
where $e_x(f) = f(x)$. Hint: Find a finite set of points $\{x_1, \ldots, x_n\} \subset [a,b]$
such that if $g(x_j) = 0$ for $1 \le j \le n$, then $|L(g)| < 1$.
7.39	Refer to Exercise 7.12 on page 419. Show that if p is continuous with respect
to some topology U on Q/=, then U is weaker than T=.
7.40 Show that the product topology on $\mathbb{C}^n$ is the same as the topology defined
by any of the norms in Example 7.5 on page 422.
7.41 The space $\ell^2(\mathcal{N})$ is a subset of the Cartesian product $\mathbb{C}^{\mathcal{N}}$. Thus, $\ell^2(\mathcal{N})$ can
be given the relative product topology $\mathcal{T}$. Show that $\mathcal{T}$ is strictly weaker
than the topology induced by the norm $\|\ \|_2$.
7.42 Do Exercise 7.41 with $\ell^1(\mathcal{N})$ replacing $\ell^2(\mathcal{N})$ and $\|\ \|_1$ replacing $\|\ \|_2$.
7.43 Do Exercise 7.41 with $\ell^\infty(\mathcal{N})$ replacing $\ell^2(\mathcal{N})$ and $\|\ \|_\infty$ replacing $\|\ \|_2$.
7.44 Consider the Cantor set $P$ as defined on pages 74-75. Recall that each $x \in P$
has a unique ternary expansion of the form $x = \sum_{n=1}^\infty a_n(x) 3^{-n}$, where
$a_n(x) \in \{0,2\}$ for all $n \in \mathcal{N}$. Define $A: P \to \{0,2\}^{\mathcal{N}}$ by $A(x)_n = a_n(x)$.
Suppose that $\{0,2\}$ is given the discrete topology and $\{0,2\}^{\mathcal{N}}$ is given the
corresponding product topology. Prove that $A$ is continuous.
7.4 CLOSED SETS, CONVERGENCE, AND COMPLETENESS
In this section, we discuss closed sets and convergence in topological and
metric spaces and some related topics as well. We assume throughout that
$(\Omega, \mathcal{T})$ is a topological space.
DEFINITION 7.12 Closed Set
A subset $F$ of a topological space is said to be closed if $F^c$ is open.
Note: From Proposition 2.14 on page 61, we see that the closed subsets
of $\mathcal{R}$, as given by Definition 2.9 on page 61, are also closed in the sense of
Definition 7.12, and vice versa.
It follows immediately from Definition 7.12 and the definition of a
topology (Definition 7.2 on page 415) that the collection $\mathcal{C}$ of closed sets
satisfies the following conditions:
(C1) $\emptyset, \Omega \in \mathcal{C}$.
(C2) $\mathcal{E} \subset \mathcal{C}$ implies $\bigcap_{F \in \mathcal{E}} F \in \mathcal{C}$.
(C3) $F_1, F_2 \in \mathcal{C}$ implies $F_1 \cup F_2 \in \mathcal{C}$.
Conversely, if $\mathcal{C}$ is a collection of subsets of $\Omega$ satisfying (C1)-(C3), then
$\{ F^c : F \in \mathcal{C} \}$ is a topology on $\Omega$ for which $\mathcal{C}$ is the collection of closed sets.
A simple example of a closed subset of $\mathcal{R}$ is $[a,b]$, where $a < b$, because
$[a,b]^c = (-\infty, a) \cup (b, \infty)$. On the other hand, an interval of the form $[a,b)$
is not closed.
Limit Points, Closure, and Convergent Sequences
Next we define the limit points and closure of a set.
DEFINITION 7.13 Limit Point, Closure
Let $E$ be a subset of a topological space $\Omega$. A point $x \in \Omega$ is called a
limit point of $E$ if each open set containing $x$ intersects $E$; that is,
if $O$ is open and $x \in O$, then $O \cap E \ne \emptyset$. The set of all limit points
of $E$, denoted $\overline{E}$, is called the closure of $E$.
Note: If $\overline{E} = \Omega$ (i.e., every point of $\Omega$ is a limit point of $E$), then we say
that $E$ is dense in $\Omega$. Thus, we see that $E$ is dense in $\Omega$ if and only if it
has a nonempty intersection with every nonempty open set.
We leave it to the reader as an exercise to show that the following
properties hold:
• $\overline{E}$ is the intersection of all closed sets that contain $E$ and, hence, $\overline{E}$ is
the smallest closed set containing $E$.
• Let $\mathfrak{N}$ be a neighborhood basis that determines the topology. Then $x \in \overline{E}$
if and only if
$$x \in W \in \mathfrak{N} \text{ implies } W \cap E \ne \emptyset. \qquad (7.5)$$
Condition (7.5) suggests that we can interpret $\overline{E}$ as the set of points
of $\Omega$ that can be “approximated arbitrarily closely” by points of $E$. Thus,
in the case of the real line the rational numbers are dense since any real
number can be approximated arbitrarily closely by rational numbers.
The next proposition, whose proof is left to the reader as an exercise,
provides the basic properties of the closure operation.
PROPOSITION 7.8
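Let $A, B \subset \Omega$. Then
a) $\overline{A} = A$ if and only if $A$ is closed.
b) $\overline{\overline{A}} = \overline{A}$.
c) $\overline{A \cup B} = \overline{A} \cup \overline{B}$.
d) $A \subset B$ implies $\overline{A} \subset \overline{B}$.
e) $A \subset F$ and $F$ closed implies $\overline{A} \subset F$.
Remark: It follows easily from Proposition 7.8(c) that $\overline{\bigcup_{k=1}^n A_k} = \bigcup_{k=1}^n \overline{A_k}$
whenever $A_1, A_2, \ldots, A_n$ are subsets of $\Omega$. (See Exercise 7.49.)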
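For a finite topological space, the closure can be computed directly from Definition 7.13, and the properties listed in Proposition 7.8 can be checked by brute force. The following is a minimal sketch of such a check, not a proof; the four-point space `OMEGA`, the list `TOPOLOGY`, and all helper names are hypothetical choices made only for illustration.

```python
from itertools import chain, combinations

# Hypothetical example: a topology on the four-point set {0, 1, 2, 3}.
OMEGA = frozenset({0, 1, 2, 3})
TOPOLOGY = [frozenset(s) for s in ([], [0], [1], [0, 1], [0, 1, 2], [0, 1, 2, 3])]


def closure(e):
    """Limit points of e: points all of whose open neighborhoods meet e."""
    e = frozenset(e)
    return frozenset(x for x in OMEGA
                     if all(o & e for o in TOPOLOGY if x in o))


def is_closed(f):
    """A set is closed when its complement is open (Definition 7.12)."""
    return (OMEGA - frozenset(f)) in TOPOLOGY


def subsets(s):
    """All subsets of the finite set s."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def check_proposition_7_8():
    """Verify parts (a)-(e) for every pair of subsets of OMEGA."""
    all_sets = subsets(OMEGA)
    for a in all_sets:
        assert (closure(a) == a) == is_closed(a)              # part (a)
        assert closure(closure(a)) == closure(a)              # part (b)
        for b in all_sets:
            assert closure(a | b) == closure(a) | closure(b)  # part (c)
            if a <= b:
                assert closure(a) <= closure(b)               # part (d)
                if is_closed(b):
                    assert closure(a) <= b                    # part (e)
    return True
```

In this particular topology, `closure({0})` evaluates to the set {0, 2, 3}, which is the smallest closed set containing the point 0.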
When a topological space is metrizable, there is a useful characteriza-
tion of limit point in terms of sequences. To give that characterization, we
must first define convergence of a sequence in a topological space.
DEFINITION 7.14 Convergent Sequence in a Topological Space
Let $(\Omega, \mathcal{T})$ be a topological space. A sequence $\{x_n\}_{n=1}^\infty$ of points in $\Omega$ is
said to converge to the point $x \in \Omega$ if for each open set $O$ containing $x$,
there is an integer $N$ such that $x_n \in O$ whenever $n \ge N$. Convergence
of $\{x_n\}_{n=1}^\infty$ to $x$ is denoted by
$$\lim_{n \to \infty} x_n = x \quad\text{or}\quad x_n \to x$$
or, in case it is important to indicate the topology in which convergence
occurs, by $x_n \xrightarrow{\mathcal{T}} x$.
It is not hard to see that if the topology on $\Omega$ is induced by a metric $\rho$,
then $x_n \to x$ if and only if $\rho(x_n, x) \to 0$. In case the topology is the
weak topology determined by a family of functions $\mathcal{F}$, it can be shown that
$x_n \to x$ if and only if $f(x_n) \to f(x)$ for each $f \in \mathcal{F}$. (See Exercise 7.54.)
PROPOSITION 7.9
Suppose that the set $\Omega$ has the topology induced by a metric $\rho$. Let $E \subset \Omega$.
Then $x \in \overline{E}$ if and only if there exists a sequence $\{x_n\}_{n=1}^\infty$ of points of $E$
such that $\lim_{n \to \infty} x_n = x$.
PROOF: Suppose that $x \in \overline{E}$. Then, by (7.5), for each positive integer $n$
there exists an $x_n \in B_{1/n}(x) \cap E$. Because $\rho(x_n, x) < 1/n$, it follows
that $\lim_{n \to \infty} x_n = x$.
Conversely, suppose that $\{x_n\}_{n=1}^\infty$ is a sequence of elements of $E$ such
that $\lim_{n \to \infty} x_n = x$. Then, for each $\epsilon > 0$, there is a positive integer $N$
such that $\rho(x_n, x) < \epsilon$ whenever $n \ge N$. It follows that $B_\epsilon(x) \cap E \ne \emptyset$.
Thus, by the condition (7.5), we have that $x \in \overline{E}$. ∎
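The construction in the first half of the proof can be made concrete. The sketch below assumes $\Omega = \mathcal{R}$ with the usual metric, $E = Q$, and $x = \sqrt{2}$ (choices made only for illustration) and selects $x_n = \lfloor n\sqrt{2} \rfloor / n$, a rational number lying in $B_{1/n}(x) \cap E$. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def approximating_sequence(count):
    """Return rationals x_1, ..., x_count with x_n <= sqrt(2) < x_n + 1/n."""
    # isqrt(2 * n * n) is the integer part of n * sqrt(2), computed exactly.
    return [Fraction(isqrt(2 * n * n), n) for n in range(1, count + 1)]


def in_ball(x_n, n):
    """Check exactly, by squaring, that |x_n - sqrt(2)| < 1/n."""
    return x_n * x_n <= 2 < (x_n + Fraction(1, n)) ** 2


points = approximating_sequence(50)
all_inside = all(in_ball(x_n, n) for n, x_n in enumerate(points, start=1))
```

Because the membership test is carried out with exact rational arithmetic, no floating-point rounding enters the check that each $x_n$ lies within $1/n$ of $\sqrt{2}$.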
As a simple illustration of the previous proposition, consider the case
where $\Omega = \mathcal{R}$ and $E = Q$. Since every real number is the limit of a sequence
of rational numbers, it follows that $\overline{Q} = \mathcal{R}$. A more elaborate application
of Proposition 7.9 is discussed in the following example.
EXAMPLE 7.10 Illustrates a Nonmetrizable Space
Let $\Omega$ be the Cartesian product $\{0,1\}^{\mathcal{R}}$. Let $\{0,1\}$ have the discrete topology
and $\Omega$ the corresponding product topology. Recall that $\Omega$ is the set
consisting of all functions from $\mathcal{R}$ to $\{0,1\}$ and the product topology on $\Omega$
is the weak topology determined by the family of functions $\{ p_t : t \in \mathcal{R} \}$,
where $p_t(f) = f(t)$. The product topology is determined by the neighborhood
basis of sets of the form
$$\{ f \in \Omega : f(t_k) = a_k,\ k = 1, 2, \ldots, n \}, \qquad (7.6)$$
where $n \in \mathcal{N}$, $t_k \in \mathcal{R}$, and $a_k \in \{0,1\}$, $1 \le k \le n$. Consider the set
$$U = \{ f \in \Omega : f^{-1}(\{0\}) \text{ is countable} \}.$$
We claim that $U$ is dense in $\Omega$. Indeed, the intersection of $U$ with each set
of the form (7.6) contains the function $g$ defined by $g(t_k) = a_k$ for $k = 1, 2,
\ldots, n$ and $g(t) = 1$ for $t \in \mathcal{R} \setminus \{t_1, t_2, \ldots, t_n\}$. In particular, $U$ has a
nonempty intersection with every set in a neighborhood basis determining
the topology of $\Omega$. Hence $U$ is dense in $\Omega$.
We claim that $\Omega$ is not metrizable. Suppose to the contrary. Then,
by Proposition 7.9, there is a sequence $\{f_n\}_{n=1}^\infty \subset U$ converging to the
function on $\mathcal{R}$ that is identically 0. It follows from Exercise 7.54 that
$\lim_{n \to \infty} f_n(t) = 0$ for each $t \in \mathcal{R}$. But, for each $t$ in the complement
of the countable set $\bigcup_{n=1}^\infty f_n^{-1}(\{0\})$, we have $\lim_{n \to \infty} f_n(t) = 1$. This
contradiction shows that $\Omega$ is not metrizable. □
Completeness
There is a powerful extension of the notion of closed set, namely, the idea
of a complete set. Before we can give a formal definition of a complete set,
however, we require the following:
DEFINITION 7.15 Cauchy Sequence
A sequence $\{x_n\}_{n=1}^\infty$ in a metric space $(\Omega, \rho)$ is said to be a Cauchy
sequence if for each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that $\rho(x_n, x_m) < \epsilon$
whenever $n, m \ge N$.
In the space $\mathcal{R}$ with the usual metric, this definition of a Cauchy
sequence is exactly the same as Definition 2.6 on page 52. Cauchy sequences
in $\mathcal{R}$ always converge by Theorem 2.1 on page 53. In general, however,
Cauchy sequences may fail to converge. Metric spaces for which all Cauchy
sequences converge are called complete.
DEFINITION 7.16 Complete Metric Space, Complete Set
A metric space $(\Omega, \rho)$ is said to be complete if every Cauchy sequence
converges; that is, if $\{x_n\}_{n=1}^\infty$ is a Cauchy sequence of elements of $\Omega$,
then there exists an $x \in \Omega$ such that $\lim_{n \to \infty} x_n = x$. A subset $E \subset \Omega$
is called complete if $(E, \rho)$ is a complete metric space.
The real line, $\mathcal{R}$, provides an example of a complete metric space.
Many other examples will be encountered in the exercises in this section
and in the text and exercises of future sections.
Our next proposition, whose proof is left to the reader as an exercise,
relates the concepts of closed and complete.
PROPOSITION 7.10
Let $\Omega$ be a metric space and $E \subset \Omega$. Then the following hold.
a) If $E$ is complete, then it is closed.
b) If $\Omega$ is complete and $E$ is closed, then $E$ is complete.
The converse of Proposition 7.10(a) fails. Indeed, the interval (0,1]
is closed in the relative topology of the space (0,2); however, (0,1] is not
complete because the sequence $\{1/n\}_{n=1}^\infty$ is a Cauchy sequence in $(0,1]$ but
not convergent in $(0,1]$.
Interior of a Set
As we have seen, $\overline{E}$ is the smallest closed set containing $E$. Similarly, there
is a largest open set contained in $E$ defined as follows.
DEFINITION 7.17 Interior of a Set
Let $E$ be a subset of a topological space $\Omega$. A point $x \in \Omega$ is called an
interior point of $E$ if there is an open set $O$ such that $x \in O \subset E$.
The set of all interior points of $E$, denoted $E^\circ$, is called the interior
of $E$.
Remark: We note that the interior of a set may be empty. For example, if
we take $\Omega = \mathcal{R}$, then $Q^\circ = \emptyset$.
We leave it to the reader as an exercise to show that each of the
following properties holds:
• $E^\circ$ is the union of all open sets contained in $E$ and, hence, $E^\circ$ is the
largest open set contained in $E$.
• Let $\mathfrak{N}$ be a neighborhood basis that determines the topology. Then
$x \in E^\circ$ if and only if there is a $W \in \mathfrak{N}$ such that $x \in W \subset E$.
The following is the analogue of Proposition 7.8 for the interior of a
set. Its proof is left to the reader as an exercise. (See Exercise 7.63.)
PROPOSITION 7.11
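Let $A, B \subset \Omega$. Then
a) $A^\circ = A$ if and only if $A$ is open.
b) $(A^\circ)^\circ = A^\circ$.
c) $(A \cap B)^\circ = A^\circ \cap B^\circ$.
d) $A \subset B$ implies $A^\circ \subset B^\circ$.
e) $U \subset A$ and $U$ open implies $U \subset A^\circ$.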
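As with the closure, the interior is easy to compute in a finite topological space, and Proposition 7.11 can be spot-checked by enumeration. The sketch below is an illustration only; the space `OMEGA`, the list `TOPOLOGY`, and the helper names are hypothetical and not part of the text.

```python
from itertools import chain, combinations

# Hypothetical example: a topology on the four-point set {0, 1, 2, 3}.
OMEGA = frozenset({0, 1, 2, 3})
TOPOLOGY = [frozenset(s) for s in ([], [0], [1], [0, 1], [0, 1, 2], [0, 1, 2, 3])]


def interior(e):
    """Interior points of e: points x with x in O and O a subset of e, O open."""
    e = frozenset(e)
    return frozenset(x for x in e
                     if any(x in o and o <= e for o in TOPOLOGY))


def subsets(s):
    """All subsets of the finite set s."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def check_proposition_7_11():
    """Verify parts (a)-(e) for every pair of subsets of OMEGA."""
    all_sets = subsets(OMEGA)
    for a in all_sets:
        assert (interior(a) == a) == (a in TOPOLOGY)             # part (a)
        assert interior(interior(a)) == interior(a)              # part (b)
        for b in all_sets:
            assert interior(a & b) == interior(a) & interior(b)  # part (c)
            if a <= b:
                assert interior(a) <= interior(b)                # part (d)
                if a in TOPOLOGY:
                    assert a <= interior(b)                      # part (e)
    return True
```

For instance, `interior({2, 3})` is empty in this topology, since no nonempty open set is contained in {2, 3}.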
EXERCISES 7.4
7.45 Let $E$ be a subset of a topological space $\Omega$. Prove the following facts.
a) $\overline{E}$ is the intersection of all closed sets that contain $E$ and, hence, $\overline{E}$ is
the smallest closed set containing $E$.
b) Let $\mathfrak{N}$ be a neighborhood basis that determines the topology. Then
$x \in \overline{E}$ if and only if $x \in W \in \mathfrak{N}$ implies $W \cap E \ne \emptyset$.
7.46 Prove Proposition 7.8 on page 433.
★ 7.47 Let $(\Omega, \rho)$ be a metric space. For $x \in \Omega$ and $\emptyset \ne E \subset \Omega$, define
$$\rho(x, E) = \inf\{ \rho(x, y) : y \in E \}.$$
$\rho(x, E)$ is called the distance from $x$ to $E$. Prove the following:
a) There is a sequence $\{x_n\}_{n=1}^\infty \subset E$ such that $\lim_{n \to \infty} \rho(x, x_n) = \rho(x, E)$.
b) $\rho(x, E) = 0$ if and only if $x \in \overline{E}$.
c) $\rho(x, E) = \rho(x, \overline{E})$.
d) $|\rho(x_1, E) - \rho(x_2, E)| \le \rho(x_1, x_2)$.
e) The function $f: \Omega \to \mathcal{R}$ defined by $f(x) = \rho(x, E)$ is continuous.
f) Let $A$ and $B$ be disjoint closed nonempty subsets of $\Omega$. Define
$$f(x) = \frac{\rho(x, A)\,(1 + \rho(x, B))}{\rho(x, A) + (1 + \rho(x, A))\,\rho(x, B)}.$$
Prove that $f$ is continuous, $f(\Omega) \subset [0,1]$, $f(A) = \{0\}$, and $f(B) = \{1\}$.
★ 7.48 Consider an open ball $B_r(x)$ in a metric space $(\Omega, \rho)$.
a) Show that $\overline{B_r(x)} \subset \{ y : \rho(x, y) \le r \}$.
b) Show that equality holds in part (a) for the case of a normed space.
c) Give an example where the containment in part (a) is strict.
d) The set $\overline{B}_r(x) = \{ y : \rho(x, y) \le r \}$ is called the closed ball of radius $r$
centered at $x$. Verify that $\overline{B}_r(x)$ is a closed set.
7.49 Verify the formula $\overline{\bigcup_{k=1}^n A_k} = \bigcup_{k=1}^n \overline{A_k}$.
7.50 Suppose that $\Omega$ and $\Lambda$ are topological spaces and $f: \Omega \to \Lambda$. Show $f$ is
continuous if and only if $f^{-1}(F)$ is closed in $\Omega$ whenever $F$ is closed in $\Lambda$.
7.51 Suppose that $\Omega$ and $\Lambda$ are topological spaces and $f: \Omega \to \Lambda$. Show that $f$ is
continuous if and only if $f(\overline{A}) \subset \overline{f(A)}$ for all $A \subset \Omega$.
7.52 Suppose $\Omega$ and $\Lambda$ are topological spaces and $f: \Omega \to \Lambda$. If $f$ is continuous,
does it follow that $f(E)$ is closed (open) whenever $E$ is a closed (open)
subset of $\Omega$? Explain your answer.
★ 7.53 Suppose $\Omega$ and $\Lambda$ are topological spaces and $f: \Omega \to \Lambda$.
a) Show that the condition “$f(x_n) \to f(x)$ whenever $x_n \to x$” is necessary
for $f$ to be continuous.
b) Show that the condition in part (a) is sufficient when $\Omega$ is metrizable.
Hint: Refer to Exercise 7.32 on page 427.
7.54 Suppose that $\Omega$ has the weak topology determined by some set $\mathcal{F}$ of functions.
Let $\{x_n\}_{n=1}^\infty$ be a sequence of points of $\Omega$ and $x \in \Omega$. Show that
$x_n \to x$ if and only if $f(x_n) \to f(x)$ for all $f \in \mathcal{F}$. Hint: See Exercise 7.53.
★ 7.55 Show that a Cauchy sequence is convergent if it has a convergent subsequence.
7.56 Show that if a sequence in a metric space is convergent, then it is Cauchy.
7.57 Give an example of a nonconvergent Cauchy sequence.
7.58 Prove Proposition 7.10 on page 435.
★ 7.59 Show that $\mathcal{R}^n$ is complete in each of the norms defined in Example 7.4 on
page 422.
★ 7.60 Show that $\mathbb{C}^n$ is complete in each of the norms defined in Example 7.5 on
page 422.
7.61 Let $\Omega$ be a nonempty set. Show that each of the spaces $\ell^1(\Omega)$, $\ell^2(\Omega)$,
and $\ell^\infty(\Omega)$, described in Example 7.6 on page 423, is complete.
7.62 Let $E$ be a subset of a topological space $\Omega$. Prove the following facts.
a) $E^\circ$ is the union of all open sets contained in $E$ and, hence, $E^\circ$ is the
largest open set contained in $E$.
b) Let $\mathfrak{N}$ be a neighborhood basis that determines the topology. Then
$x \in E^\circ$ if and only if there is a $W \in \mathfrak{N}$ such that $x \in W \subset E$.
7.63 Prove Proposition 7.11 on page 436.
★ 7.64 Let $\Omega$ be a topological space. For $E \subset \Omega$, define $\partial E = \overline{E} \setminus E^\circ$. The set $\partial E$
is called the boundary of $E$. Prove the following:
a) $\partial E$ is closed.
b) $E$ is closed if and only if $\partial E \subset E$.
c) $(\partial E)^\circ = \emptyset$.
d) $\partial E = \overline{E} \cap \overline{E^c}$.
7.5 NETS AND CONTINUITY
Proposition 7.9 on page 434 describes the limit points of a subset of a
metric space in terms of convergent sequences. Example 7.10 on page 434,
on the other hand, shows that a similar characterization fails to be correct
for general topological spaces.
In this section, we present a generalization of sequences that is flexible
enough to permit a version of Proposition 7.9 to hold for general topolog-
ical spaces and provides as well an alternative method for characterizing
continuity. We first introduce the concept of a directed set.
DEFINITION 7.18 Directed Set
A directed set is a nonempty set $I$ together with a relation $\preceq$ having
the following properties:
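a) $\iota \preceq \iota$ for each $\iota \in I$.
b) $\iota_1 \preceq \iota_2$ and $\iota_2 \preceq \iota_3$ imply $\iota_1 \preceq \iota_3$.
c) $\iota_1, \iota_2 \in I$ implies there exists $\iota_3 \in I$ such that $\iota_1 \preceq \iota_3$ and $\iota_2 \preceq \iota_3$.
An element of a directed set is called an index.
Remark: It follows easily from Definition 7.18 that for each finite subset $J$
of $I$, there exists a $\kappa \in I$ such that $\iota \preceq \kappa$ for each $\iota \in J$.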
EXAMPLE 7.11 Illustrates Definition 7.18
a) A nonempty subset of real numbers with the order relation $\le$ is a directed
set. In particular, the set of integers greater than or equal to
some fixed integer is a directed set.
b) Let $\mathfrak{N}$ be a neighborhood basis and $\mathfrak{N}_x = \{ N \in \mathfrak{N} : x \in N \}$. For
$U, V \in \mathfrak{N}_x$, say that $U \preceq V$ if $U \supset V$. Then $\mathfrak{N}_x$ is a directed set with
respect to the relation $\preceq$, that is, with respect to $\supset$.
c) Let $\mathcal{P}_F(S)$ denote the collection of finite subsets of a set $S$. Then
$\mathcal{P}_F(S)$ is a directed set with respect to $\subset$. □
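For a finite index set, the three conditions of Definition 7.18 can be tested mechanically. The following sketch does this for the finite subsets of a small set ordered by inclusion, as in part (c), and for a family that fails condition (c). The function names `finite_subsets` and `is_directed` are hypothetical and introduced only for this illustration.

```python
from itertools import combinations


def finite_subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_directed(indices, precedes):
    """Test conditions (a)-(c) of Definition 7.18 on a finite index list."""
    if not indices:
        return False  # a directed set must be nonempty
    reflexive = all(precedes(i, i) for i in indices)
    transitive = all(precedes(i, k)
                     for i in indices for j in indices for k in indices
                     if precedes(i, j) and precedes(j, k))
    # Every pair of indices needs a common upper bound in the index set.
    bounded_pairs = all(any(precedes(i, k) and precedes(j, k) for k in indices)
                        for i in indices for j in indices)
    return reflexive and transitive and bounded_pairs


def inclusion(f, g):
    """The relation 'f is a subset of g'."""
    return f <= g
```

With these definitions, `is_directed(finite_subsets({1, 2, 3}), inclusion)` holds, whereas the two-element family consisting of {1} and {2} is not directed under inclusion because the pair has no upper bound within the family.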
DEFINITION 7.19 Net, Convergence of Nets
A net of points in a set $\Omega$ is a function $x$ from a directed set $I$ into $\Omega$.
The set $I$ is called the index set of the net. We write $x_\iota$ for $x(\iota)$ and
denote the net by $\{x_\iota\}_{\iota \in I}$.
When $(\Omega, \mathcal{T})$ is a topological space, a net $\{x_\iota\}_{\iota \in I}$ of points in $\Omega$ is said
to converge to the point $x \in \Omega$ if for each open set $O$ containing $x$,
there is an index $\iota_0 \in I$ such that $x_\iota \in O$ whenever $\iota_0 \preceq \iota$. Convergence
of $\{x_\iota\}_{\iota \in I}$ to $x$ is denoted by
$$\lim x_\iota = x \quad\text{or}\quad x_\iota \to x$$
or, in case it is important to indicate the topology in which convergence
occurs, by $x_\iota \xrightarrow{\mathcal{T}} x$.
EXAMPLE 7.12 Illustrates Definition 7.19
a) A sequence is a net with $I = \mathcal{N}$ and $\preceq\ =\ \le$. A sequence in a topological
space that converges to a point $x$ is also a net converging to $x$.
b) A slightly more general situation than in part (a) occurs when a sequence
of the form $\{x_n\}_{n=1}^\infty$ is replaced by a net indexed by the set
of integers $\{ j \in \mathcal{Z} : j \ge k \}$ for some integer $k$, where the relation
is the usual $\le$ ordering. Such nets are customarily also referred to as
sequences.
c) Consider a function $f: [a, \infty) \to \mathcal{R}$. Since $[a, \infty)$ is a directed set with respect
to the relation $\le$, the function $f$ can be viewed as a net $\{f_t\}_{t \in [a,\infty)}$
in $\mathcal{R}$, where $f_t = f(t)$. Furthermore, $\lim f_t = L$ if and only if for each
$\epsilon > 0$, there is a number $M$ such that $|f(t) - L| < \epsilon$ whenever $t \ge a \vee M$.
d) Refer to Example 7.11(b). Suppose that $x_U \in U$ for each $U \in \mathfrak{N}_x$. Then
the net $\{x_U\}_{U \in \mathfrak{N}_x}$ converges to $x$.
e) Let $f$ be a Riemann integrable function on $[a,b]$ and $\mathcal{S}$ the collection of
step functions on $[a,b]$ that are dominated by $f$. Then $\mathcal{S}$ is a directed
set with respect to the usual $\le$ ordering of functions. For each $h \in \mathcal{S}$,
let $y_h = \int_a^b h(x)\,dx$. Then $\{y_h\}_{h \in \mathcal{S}}$ is a net of real numbers and we have
that $\lim y_h = \int_a^b f(x)\,dx$. (See Section 2.6 starting on page 81.) □
Infinite Series and Infinite Sums
Using nets, we can now discuss infinite series and infinite sums in normed
linear spaces.
DEFINITION 7.20 Infinite Series and Sums in Normed Spaces
Let $\Omega$ be a normed space and $S$ an infinite subset of $\mathcal{Z}$ with the usual
$\le$ ordering. Suppose that $x_j \in \Omega$ for each $j \in S$.
a) Assume that $S = \{ j \in \mathcal{Z} : j \ge k \}$ for some integer $k$. Then the
expression
$$\sum_{j=k}^\infty x_j$$
is called an infinite series. Let $s_n = \sum_{j=k}^n x_j$ for each $n \in S$. If
the net $\{s_n\}_{n \in S}$ converges in $\Omega$ to $s$, then we say that the infinite
series $\sum_{j=k}^\infty x_j$ converges to $s$ and write $s = \sum_{j=k}^\infty x_j$. Otherwise,
we say that the infinite series fails to converge.
b) In general, the expression
$$\sum_{j \in S} x_j$$
is called an infinite sum. Let $s_F = \sum_{j \in F} x_j$ for each $F \in \mathcal{P}_F(S)$,
where $\mathcal{P}_F(S)$ denotes the collection of all finite subsets of $S$ with
the $\subset$ ordering. If the net $\{s_F\}_{F \in \mathcal{P}_F(S)}$ converges in $\Omega$ to $s$, then
we say that the infinite sum $\sum_{j \in S} x_j$ converges to $s$ and write
$s = \sum_{j \in S} x_j$. Otherwise, we say that the infinite sum fails to converge.
In the special case that $S$ is as in part (a), we denote the
infinite sum $\sum_{j \in S} x_j$ by $\sum_{j \ge k} x_j$.
Remark: If $\Omega = \mathcal{R}$ and $x_j \ge 0$ for each $j \in S$, then infinite sums are a
special case of generalized sums, discussed in Exercise 2.37 on page 57.
EXAMPLE 7.13 Illustrates Definition 7.20
a) Let $\Omega$ be a normed space, $S = \{ j \in \mathcal{Z} : j \ge k \}$ for some integer $k$, and
$x_j \in \Omega$ for each $j \in S$. We will show that convergence of the infinite sum
to $s$ implies convergence of the infinite series to $s$. Suppose that $\sum_{j \ge k} x_j$
converges to $s$. We must show that $\sum_{j=k}^\infty x_j$ also converges to $s$. Let
$\epsilon > 0$ and choose a finite subset $F_0$ of $S$ such that $\|s - s_F\| < \epsilon$ whenever
$F_0 \subset F$. Let $N = \max F_0$. Then for $n \ge N$, $F_0 \subset \{ j : k \le j \le n \}$;
hence, $\|s - s_n\| < \epsilon$. Thus, $\sum_{j=k}^\infty x_j$ converges to $s$.
b) Let $\Omega$ be a normed space, $S$ an infinite subset of $\mathcal{Z}$, and $x_j \in \Omega$ for
each $j$. It is not difficult to show that the infinite series $\sum_{j=k}^\infty x_j$ converges
if and only if the infinite series $\sum_{j=\ell}^\infty x_j$ converges, where $k < \ell$.
Similarly, the infinite sum $\sum_{j \in S} x_j$ converges if and only if the infinite
sum $\sum_{j \in S \setminus F} x_j$ converges, where $F$ is a finite subset of $S$.
c) By Exercise 7.67, the series $\sum_{j=1}^\infty 1/j$ fails to converge. In fact, we have
$\lim_{n \to \infty} \sum_{j=1}^n 1/j = \infty$.
d) In part (a), we showed that if $S = \{ j \in \mathcal{Z} : j \ge k \}$, then convergence of
an infinite sum to $s$ implies convergence of the corresponding infinite series
to $s$. Here we show that the converse is false. By Exercise 7.68, the
series $\sum_{j=1}^\infty (-1)^j/j$ converges. However, the infinite sum $\sum_{j \ge 1} (-1)^j/j$
fails to converge. Indeed, suppose $F_0$ is a finite subset of positive integers.
Let $N = \max F_0$ and set $F = F_0 \cup \{ 2j : N \le j \le n \}$. Then, by
part (c) of this example,
$$\sum_{j \in F} \frac{(-1)^j}{j} = \sum_{j \in F_0} \frac{(-1)^j}{j} + \frac{1}{2} \sum_{j=N}^n \frac{1}{j}$$
can be made arbitrarily large by choosing $n$ sufficiently large. Hence,
the infinite sum $\sum_{j \ge 1} (-1)^j/j$ fails to converge.
e) In part (a), we showed that if $S = \{ j \in \mathcal{Z} : j \ge k \}$, then convergence of
an infinite sum to $s$ implies convergence of the corresponding infinite series
to $s$ and, in part (d), we showed that the converse of that statement
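is false. We will now show that in the special case of $\Omega = \mathcal{R}$ and $x_j \ge 0$
for each $j \in S$, the converse is true. Thus, assume that $x_j \in [0, \infty)$ for
each $j \in S$ and that $\sum_{j=k}^\infty x_j$ converges to $s$. Let $\epsilon > 0$. Choose $N$ so
that $n \ge N$ implies $|s - \sum_{j=k}^n x_j| < \epsilon$. Let $F_0 = \{ j : k \le j \le N \}$.
Then $F \in \mathcal{P}_F(S)$ and $F_0 \subset F$ implies
$$s - \epsilon < \sum_{j=k}^N x_j \le \sum_{j \in F} x_j \le \sum_{j=k}^M x_j < s + \epsilon,$$
where $M = \max F$. Therefore, $\sum_{j \ge k} x_j$ converges to $s$.
f) Assume the $x_j$s belong to a normed space. If the series $\sum_{j=k}^\infty \|x_j\|$
converges, then $\sum_{j \ge k} x_j$ converges to $s$ if and only if $\sum_{j=k}^\infty x_j$ converges
to $s$. □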
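The contrast drawn in parts (c) and (d) can be observed numerically. The sketch below uses exact rational arithmetic to compare the partial sums $s_n$ of $\sum_{j=1}^\infty (-1)^j/j$ with the finite-subset sums $s_F$ for $F = F_0 \cup \{ 2j : N \le j \le n \}$. The function names and the particular choice $F_0 = \{1, 2, 3\}$ are hypothetical, made only for illustration. The comparison of $s_n$ with $-\ln 2$ relies on the standard value of the alternating harmonic series, which is not derived in the text.

```python
from fractions import Fraction
from math import log


def term(j):
    """The j-th term (-1)**j / j as an exact rational."""
    return Fraction((-1) ** j, j)


def partial_sum(n):
    """s_n: the sum of the terms with indices 1, ..., n."""
    return sum(term(j) for j in range(1, n + 1))


def enlarged_finite_sum(f0, n):
    """s_F for F = f0 together with {2j : N <= j <= n}, where N = max(f0)."""
    big_n = max(f0)
    f = set(f0) | {2 * j for j in range(big_n, n + 1)}
    return sum(term(j) for j in f)


f0 = {1, 2, 3}
# The identity displayed in part (d), checked exactly for n = 50.
lhs = enlarged_finite_sum(f0, 50)
rhs = sum(term(j) for j in f0) + Fraction(1, 2) * sum(Fraction(1, j) for j in range(3, 51))
identity_holds = lhs == rhs
```

The partial sums `partial_sum(n)` settle down near $-\ln 2$, while `enlarged_finite_sum(f0, n)` keeps growing with `n`, so no finite set $F_0$ can serve as the index required for convergence of the net $\{s_F\}$.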
Remark: When the $x_j$s are nonnegative real numbers, $\sum_{j=k}^\infty x_j$ fails to
converge if and only if the partial sums $s_n$ become arbitrarily large as $n$ increases,
that is, $\lim_{n \to \infty} s_n = \infty$. Consequently, in this case, we often
indicate convergence of $\sum_{j=k}^\infty x_j$ by $\sum_{j=k}^\infty x_j < \infty$ and lack of convergence
by $\sum_{j=k}^\infty x_j = \infty$.
Nets and Topological Properties
Using nets, we can generalize Proposition 7.9 on page 434 to arbitrary
topological spaces.
PROPOSITION 7.12
Let $E$ be a subset of a topological space $\Omega$. Then $x \in \overline{E}$ if and only if there
is a net $\{x_\iota\}_{\iota \in I}$ of points in $E$ such that $\lim x_\iota = x$.
PROOF: Suppose that $x \in \overline{E}$. Let $\mathcal{T}_x$ be the collection of all open sets
containing $x$. Then $\mathcal{T}_x$ is a directed set with the relation $\supset$. For each
$O \in \mathcal{T}_x$, we have $O \cap E \ne \emptyset$; using the axiom of choice, we select $x_O \in O \cap E$.
Then $\{x_O\}_{O \in \mathcal{T}_x}$ is a net of points in $E$ such that $\lim x_O = x$.
Conversely, suppose there is a net $\{x_\iota\}_{\iota \in I}$ of points in $E$ such that
$\lim x_\iota = x$. Then, for each open set $O$ containing $x$, we have $x_\iota \in O$ for
some index $\iota$. Because $x_\iota \in E$, it follows that $O \cap E \ne \emptyset$. Hence, $x \in \overline{E}$. ∎
We can also use nets to characterize continuity of functions. Before we
do so, however, it will be convenient to introduce the idea of a subnet. For
motivation, we note that a subsequence $\{x_{n_k}\}_{k=1}^\infty$ of a sequence $\{x_n\}_{n=1}^\infty$
is really the composition of the sequence (i.e., the function $x$ on $\mathcal{N}$) with
the strictly increasing function $n: \mathcal{N} \to \mathcal{N}$ defined by $n(k) = n_k$. Thus, we
have the following definition.
DEFINITION 7.21 Subnet
Let $\{x_\iota\}_{\iota \in I}$ be a net with order relation $\preceq$. A subnet of $\{x_\iota\}_{\iota \in I}$ is
a composition of that net (i.e., the function $x$ on $I$) with a function
$h: K \to I$, where $K$ is a directed set with order relation $\le$ such that
the following conditions are satisfied:
a) If $\kappa_1 \le \kappa_2$, then $h(\kappa_1) \preceq h(\kappa_2)$.
b) For each $\iota \in I$, there is a $\kappa \in K$ such that $\iota \preceq h(\kappa)$.
Usually we write $\iota_\kappa$ instead of $h(\kappa)$ and denote the subnet $\{x_{h(\kappa)}\}_{\kappa \in K}$
by $\{x_{\iota_\kappa}\}_{\kappa \in K}$.
Of course, a subsequence is also a subnet. Other examples of subnets
are considered in the exercises. We leave it to the reader as an exercise to
show that if a net converges to $x$, then so does every subnet of that net.
We now use nets to characterize continuity of functions.
THEOREM 7.1
Let $\Omega$ and $\Lambda$ be topological spaces and $f: \Omega \to \Lambda$. Then the following
conditions are equivalent:
a) For each $x \in \Omega$ and each open set $V$ containing $f(x)$, there is an open
set $U$ containing $x$ such that $f(U) \subset V$.
b) $f$ is continuous, that is, $f^{-1}(O)$ is open in $\Omega$ whenever $O$ is open in $\Lambda$.
c) $f^{-1}(F)$ is closed in $\Omega$ whenever $F$ is closed in $\Lambda$.
d) If $\{x_\iota\}_{\iota \in I}$ is a net converging to $x$, then $\{f(x_\iota)\}_{\iota \in I}$ has a subnet
converging to $f(x)$.
e) If $\{x_\iota\}_{\iota \in I}$ is a net converging to $x$, then $\{f(x_\iota)\}_{\iota \in I}$ converges to $f(x)$.
PROOF: The equivalence of (a) and (b) is shown by Proposition 7.3 on
page 414 and the observation that a topology is a neighborhood basis for
itself. The equivalence of (b) and (c) follows at once from the set-theoretic
identity $f^{-1}(F^c) = (f^{-1}(F))^c$. To complete the proof, it suffices
to establish the chain of implications (a) implies (e), (e) implies (d), and
(d) implies (c).
Suppose (a) holds and that $\{x_\iota\}_{\iota \in I}$ is a net in $\Omega$ such that $\lim x_\iota = x$.
Let $V$ be an open set containing $f(x)$. Then, by the continuity of $f$, there
is an open set $U$ containing $x$ such that $f(U) \subset V$. Since $\lim x_\iota = x$,
there is an index $\iota_0$ such that $x_\iota \in U$ whenever $\iota_0 \preceq \iota$. It follows that
$f(x_\iota) \in f(U) \subset V$ whenever $\iota_0 \preceq \iota$. Thus, $\lim f(x_\iota) = f(x)$.
Next, suppose that (e) holds and that $\{x_\iota\}_{\iota \in I}$ is a net in $\Omega$ such that
$\lim x_\iota = x$. Then $\lim f(x_\iota) = f(x)$. Because $\{f(x_\iota)\}_{\iota \in I}$ is a subnet of itself,
it follows that (d) holds.
Finally, suppose that (d) holds and that $F$ is a closed subset of $\Lambda$. We
will show that $f^{-1}(F)$ is closed by proving that $\overline{f^{-1}(F)} = f^{-1}(F)$. Let
$x \in \overline{f^{-1}(F)}$. By Proposition 7.12, there is a net $\{x_\iota\}_{\iota \in I}$ in $f^{-1}(F)$ converging
to $x$. It follows from (d) that $\{f(x_\iota)\}_{\iota \in I}$ has a subnet $\{f(x_{\iota_\kappa})\}_{\kappa \in K}$
converging to $f(x)$. So, by Proposition 7.12 again, $f(x) \in \overline{F} = F$ and,
hence, $x \in f^{-1}(F)$. We have shown that $\overline{f^{-1}(F)} \subset f^{-1}(F)$. Because the
reverse containment is trivial, we have established that $\overline{f^{-1}(F)} = f^{-1}(F)$.
Consequently, by Proposition 7.8(a) on page 433, $f^{-1}(F)$ is closed. ∎
Motivated by Theorem 7.1(a), we can define continuity at a point for a
function from one topological space to another. Let $\Omega$ and $\Lambda$ be topological
spaces and $f: \Omega \to \Lambda$. We say that $f$ is continuous at a point $x \in \Omega$
if for each open set $V$ containing $f(x)$, there is an open set $U$ containing $x$
such that $f(U) \subset V$. We see from Theorem 7.1 that $f$ is continuous if and
only if it is continuous at each point of $\Omega$.
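For maps between finite topological spaces, conditions (a), (b), and (c) of Theorem 7.1 can each be tested directly, and their agreement can be observed on every function between two small spaces. The sketch below is an illustration under stated assumptions: the spaces, their topologies, and the helper names are hypothetical, and the net conditions (d) and (e) are not modeled.

```python
from itertools import product

# Hypothetical finite spaces: a three-point space and a two-point space.
OMEGA = frozenset({0, 1, 2})
T_OMEGA = [frozenset(s) for s in ([], [0], [0, 1], [0, 1, 2])]
LAMBDA = frozenset({"a", "b"})
T_LAMBDA = [frozenset(s) for s in ([], ["a"], ["a", "b"])]


def preimage(f, s):
    """Inverse image of s under f, where f is a dict from OMEGA to LAMBDA."""
    return frozenset(x for x in OMEGA if f[x] in s)


def image(f, s):
    """Direct image of the set s under f."""
    return frozenset(f[x] for x in s)


def condition_a(f):
    """Pointwise condition: each open V containing f(x) admits an open U
    containing x with f(U) contained in V."""
    return all(any(x in u and image(f, u) <= v for u in T_OMEGA)
               for x in OMEGA for v in T_LAMBDA if f[x] in v)


def condition_b(f):
    """Inverse images of open sets are open."""
    return all(preimage(f, o) in T_OMEGA for o in T_LAMBDA)


def condition_c(f):
    """Inverse images of closed sets are closed."""
    closed_lambda = [LAMBDA - o for o in T_LAMBDA]
    return all((OMEGA - preimage(f, c)) in T_OMEGA for c in closed_lambda)


def all_functions():
    """Every function from OMEGA to LAMBDA, represented as a dict."""
    points = sorted(OMEGA)
    for values in product(sorted(LAMBDA), repeat=len(points)):
        yield dict(zip(points, values))


agree = all(condition_a(f) == condition_b(f) == condition_c(f)
            for f in all_functions())
```

For example, the map sending 0 and 1 to "a" and 2 to "b" satisfies all three conditions, while the map sending 0 to "b" and 1, 2 to "a" fails them, because the inverse image of the open set {"a"} is {1, 2}, which is not open in the first space.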
Criteria for Convergence of Nets
The two main types of topological spaces we have studied thus far are met-
ric spaces (including normed spaces) and spaces with weak topologies. For
each of these two types of topological spaces, we have a simple characteri-
zation of convergence of nets.
PROPOSITION 7.13
Let $(\Omega, \mathcal{T})$ be a topological space and let $\{x_\iota\}_{\iota \in I}$ be a net in $\Omega$.
a) If the topology $\mathcal{T}$ is induced by a metric $\rho$, then $\lim x_\iota = x$ if and only
if $\lim \rho(x_\iota, x) = 0$.
b) If the topology $\mathcal{T}$ is the weak topology determined by a family of functions
$\mathcal{F}$, then $\lim x_\iota = x$ if and only if $\lim f(x_\iota) = f(x)$ for each $f \in \mathcal{F}$.
PROOF:
a) Suppose that $\lim \rho(x_\iota, x) = 0$. If $O$ is an open set containing $x$, then
there is an $\epsilon > 0$ such that $B_\epsilon(x) \subset O$. Moreover, there is an index $\iota_0$
such that $\rho(x_\iota, x) < \epsilon$ for $\iota_0 \preceq \iota$. Thus, $x_\iota \in O$ for $\iota_0 \preceq \iota$. It follows
that $\lim x_\iota = x$. Conversely, suppose that $\lim x_\iota = x$. Then given $\epsilon > 0$,
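there is an index $\iota_0$ such that $x_\iota \in B_\epsilon(x)$ for $\iota_0 \preceq \iota$. Hence, $\rho(x_\iota, x) < \epsilon$
whenever $\iota_0 \preceq \iota$. Thus, $\lim \rho(x_\iota, x) = 0$.
b) Suppose that $\lim x_\iota = x$. Then, by Theorem 7.1(e), $\lim f(x_\iota) = f(x)$
for each $f \in \mathcal{F}$. Conversely, suppose that $\lim f(x_\iota) = f(x)$ for each
$f \in \mathcal{F}$. Let $O$ be an open set containing $x$. Then, by Proposition 7.7
on page 428, there exist $n \in \mathcal{N}$, $\{f_j\}_{j=1}^n \subset \mathcal{F}$, and $U_j \in \mathcal{T}_{f_j}$, $1 \le j \le n$,
such that
$$x \in \bigcap_{j=1}^n f_j^{-1}(U_j) \subset O.$$
Now, for each $j = 1, 2, \ldots, n$, since $f_j(x) \in U_j$ and $\lim f_j(x_\iota) = f_j(x)$,
there exists an index $\iota_j$ such that $f_j(x_\iota) \in U_j$ whenever $\iota_j \preceq \iota$. Because
$I$ is a directed set, there is an index $\iota_0$ such that $\iota_j \preceq \iota_0$ for each $j$.
Therefore, we have
$$x_\iota \in \bigcap_{j=1}^n f_j^{-1}(U_j) \subset O$$
whenever $\iota_0 \preceq \iota$. Thus, $\lim x_\iota = x$. ∎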
EXERCISES 7.5
7.65 Verify the assertions made in Example 7.12 on page 439.
In Exercises 7.66-7.74, we are using the notation introduced in Definition 7.20
on page 440.
7.66 Refer to Definition 7.20.
a) Show that an infinite series $\sum_{j=k}^\infty x_j$ converges if and only if the series
$\sum_{j=\ell}^\infty x_j$ converges, where $k < \ell$.
b) Show that an infinite sum $\sum_{j \in S} x_j$ converges if and only if the sum
$\sum_{j \in S \setminus F} x_j$ converges, where $F$ is a finite subset of $S$.
7.67 Show that $\sum_{j=1}^\infty 1/j$ fails to converge.
7.68 Show that $\sum_{j=1}^\infty (-1)^j/j$ converges.
7.69 Suppose $\sum_{j=k}^\infty \|x_j\| < \infty$. Show that $\sum_{j \ge k} x_j$ converges to $s$ if and only if
$\sum_{j=k}^\infty x_j$ converges to $s$.
★ 7.70 Assume the $z_j$s are complex numbers.
a) Prove that the infinite sum $\sum_{j \in S} z_j$ converges if there are nonnegative
real numbers $b_j$, $j \in S$, such that $|z_j| \le b_j$ for all but finitely many $j \in S$
and $\sum_{j \in S} b_j < \infty$.
b) Prove that $\sum_{j \in S} |z_j| < \infty$ implies $\sum_{j \in S} z_j$ converges.
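7.71 Let $\alpha$ and $\beta$ be scalars.
a) Show that if the infinite sums $\sum_{j \in S} x_j$ and $\sum_{j \in S} y_j$ converge, then so
does $\sum_{j \in S} (\alpha x_j + \beta y_j)$ and, moreover,
$$\sum_{j \in S} (\alpha x_j + \beta y_j) = \alpha \sum_{j \in S} x_j + \beta \sum_{j \in S} y_j.$$
b) Show that the results of part (a) remain valid for infinite series.
7.72 Let $S$ and $T$ be disjoint infinite subsets of integers. Suppose that two of the
three infinite sums, $\sum_{j \in S} x_j$, $\sum_{j \in T} x_j$, and $\sum_{j \in S \cup T} x_j$, are convergent.
Prove that all three are convergent and that
$$\sum_{j \in S \cup T} x_j = \sum_{j \in S} x_j + \sum_{j \in T} x_j.$$
7.73 Let $\sum_{j=k}^\infty x_j$ be a convergent infinite series in a normed space.
a) Prove that $\lim_{n \to \infty} x_n = 0$.
b) Prove that $\lim_{n \to \infty} \sum_{j=n}^\infty x_j = 0$.
7.74 Let $\sum_{j \in S} x_j$ be a convergent infinite sum in a normed space. Show that
the net $\left\{ \sum_{j \in S \setminus F} x_j \right\}_{F \in \mathcal{P}_F(S)}$ converges to 0.
7.75 Consider a sequence $\{a_n\}_{n=1}^\infty$ of real numbers. Show that the function
$f_t = a_{[t]}$, where $[\ ]$ denotes the greatest integer function and $t \in [1, \infty)$, is a
subnet of $\{a_n\}_{n=1}^\infty$. The order relation on $[1, \infty)$ is understood to be $\le$.
7.76 Consider a function $f: [1, \infty) \to \mathcal{R}$. Let $f_t = f(t)$. Then $\{f_t\}_{t \in [1,\infty)}$ is
a net in $\mathcal{R}$ with respect to the usual $\le$ ordering. Suppose that $\{t_n\}_{n=1}^\infty$
is a nondecreasing sequence in $[1, \infty)$. Show that $\{f_{t_n}\}_{n=1}^\infty$ is a subnet
of $\{f_t\}_{t \in [1,\infty)}$ if and only if $\lim_{n \to \infty} t_n = \infty$.
7.77 Suppose that $\{x_{\iota_\kappa}\}_{\kappa \in K}$ is a subnet of $\{x_\iota\}_{\iota \in I}$ and that $\{x_{\iota_{\kappa_\mu}}\}_{\mu \in M}$ is a
subnet of $\{x_{\iota_\kappa}\}_{\kappa \in K}$. Show that $\{x_{\iota_{\kappa_\mu}}\}_{\mu \in M}$ is a subnet of $\{x_\iota\}_{\iota \in I}$.
7.78 Let $\{x_\iota\}_{\iota \in I}$ be a net such that $\lim x_\iota = x$. Show that if $\{x_{\iota_\kappa}\}_{\kappa \in K}$ is a
subnet of $\{x_\iota\}_{\iota \in I}$, then $\lim x_{\iota_\kappa} = x$.
★ 7.79 Prove that a Cauchy sequence converges if and only if it has a convergent
subnet.
★ 7.80 Let $\mathcal{T}$ and $\mathcal{U}$ be topologies on a set $\Omega$. Show that $\mathcal{T}$ is weaker than $\mathcal{U}$ if and
only if $x_\iota \xrightarrow{\mathcal{U}} x$ implies $x_\iota \xrightarrow{\mathcal{T}} x$.
7.81 Let $\Omega$ be the Cartesian product of a family $\{\Omega_\kappa\}_{\kappa \in K}$ of topological spaces
and suppose that $\Omega$ is given the product topology. Let $\{x_\iota\}_{\iota \in I}$ be a net
in $\Omega$. Show that $\lim x_\iota = x$ if and only if $\lim p_\kappa(x_\iota) = p_\kappa(x)$ for each $\kappa \in K$.
7.82 Prove that the sum and product of continuous complex-valued functions on
a topological space are continuous.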
7.83 Suppose that $f$ is a complex-valued continuous function on a topological
space $\Omega$. Prove that the function $g$ defined on $f^{-1}(\{0\}^c)$ by $g(x) = 1/f(x)$
is continuous.
7.84 Let $\Omega$ and $\Lambda$ be topological spaces and $f: \Omega \to \Lambda$. Prove using nets that $f$ is
continuous if and only if $f(\overline{E}) \subset \overline{f(E)}$ for each $E \subset \Omega$.
★ 7.85 Let $f: \Omega \to \Lambda$ and $g: \Lambda \to \Gamma$ be continuous. Show that $g \circ f: \Omega \to \Gamma$ is
continuous. In words, the composition of continuous functions is continuous.
7.86 Let $\Omega$ be the Cartesian product of a family $\{\Omega_\kappa\}_{\kappa \in K}$ of topological spaces
and suppose that $\Omega$ is given the product topology. Let $\Lambda$ be a topological
space and $f: \Lambda \to \Omega$. Show that $f$ is continuous if and only if $p_\kappa \circ f$ is
continuous for each $\kappa \in K$.
★ 7.87 Let $\Omega$ be a topological space and $f: \Omega \to \mathcal{R}$. Prove that the following
conditions are equivalent.
a) $f$ is continuous.
b) $f^{-1}((-\infty, a))$ and $f^{-1}((a, \infty))$ are open for each $a \in \mathcal{R}$.
c) $f^{-1}((-\infty, a])$ and $f^{-1}([a, \infty))$ are closed for each $a \in \mathcal{R}$.
7.88 Let $\Omega$ be a set and $(\Lambda, \rho)$ a metric space. A net $\{f_\iota\}_{\iota \in I}$ of functions from $\Omega$
to $\Lambda$ is said to converge uniformly to the function $f$ if for each $\epsilon > 0$,
there is an index $\iota_0$ such that $\rho(f_\iota(x), f(x)) < \epsilon$ for all $x \in \Omega$ whenever
$\iota_0 \preceq \iota$. Show that if $\Omega$ is a topological space, $f_\iota$ is continuous for each $\iota \in I$,
and $\{f_\iota\}_{\iota \in I}$ converges uniformly to $f$, then $f$ is continuous.
★ 7.89 Let $\Omega$ be a topological space, $(\Lambda, \|\ \|)$ a complete normed space, and $S$ an
infinite subset of integers. Suppose that for each $j \in S$, $f_j: \Omega \to \Lambda$ is
continuous and there is a $b_j \in \mathcal{R}$ such that $\|f_j(x)\| \le b_j$ for all $x \in \Omega$.
Show that if $\sum_{j \in S} b_j < \infty$, then $f(x) = \sum_{j \in S} f_j(x)$ defines a continuous
function from $\Omega$ into $\Lambda$. Hint: See Exercise 7.88.
7.6 SEPARATION PROPERTIES
In this section, we take up the topic of separation in topological spaces.
Two subsets $A$ and $B$ of a topological space $\Omega$ are said to be separated if
there exist open sets $U$ and $V$ such that $A \subset U$, $B \subset V$, and $U \cap V = \emptyset$.
When it is important to emphasize the role of the sets $U$ and $V$, we will
say that $A$ and $B$ are separated by $U$ and $V$.
EXAMPLE 7.14 Illustrates Separated Sets
a)  Consider the normed space $(C([a,b]), \|\ \|_\infty)$ discussed in Example 7.8
on page 425. For $f \in C([a,b])$, we have $\|f\|_\infty = \sup\{\, |f(x)| : x \in [a,b] \,\}$.
(Why?) Thus, when two functions $f_1$ and $f_2$ are “close” with respect
to this norm, say, $\|f_1 - f_2\|_\infty < \delta$ for some small number $\delta$, it means
that $|f_1(x) - f_2(x)| < \delta$ for all $x \in [a,b]$, that is, the two functions are
uniformly close.
Suppose $E \subset C([a,b])$ and $f \in C([a,b])$. Then $f \in \overline{E}$ if and only if
for each $\epsilon > 0$, there exists a function $g \in E$ such that $\|f - g\|_\infty < \epsilon$;
in other words, $f \in \overline{E}$ if and only if it can be uniformly approximated
arbitrarily closely by members of $E$. On the other hand, $f \notin \overline{E}$ if and
only if there is an $\epsilon > 0$ such that $E \subset U_\epsilon = \{\, h : \|f - h\|_\infty > \epsilon \,\}$.
Because $\{f\} \subset V_\epsilon = \{\, h : \|f - h\|_\infty < \epsilon \,\}$, it follows that $f$ is not
uniformly approximable by members of $E$ if and only if $\{f\}$ and $E$ are
separated by the open sets $U_\epsilon$ and $V_\epsilon$ for some $\epsilon > 0$.
b)  Suppose that $A$ and $B$ are disjoint closed disks in the plane $\mathcal{R}^2$. Then
there is a line $L = \{\, (x,y) : ax + by = c \,\}$ such that $A$ and $B$ are separated
by the corresponding open half-planes $L_- = \{\, (x,y) : ax + by < c \,\}$ and
$L_+ = \{\, (x,y) : ax + by > c \,\}$. The topic of separation by half-spaces is
important in our subsequent study of normed linear spaces.
c)  Let $(\Omega, \rho)$ be a metric space and $A$ and $B$ disjoint closed subsets of $\Omega$.
By Exercise 7.47 on page 437, the function $f(x) = \rho(x, A) - \rho(x, B)$ is
continuous on $\Omega$. It follows that $A$ and $B$ are separated by the open
sets $f^{-1}((-\infty, 0))$ and $f^{-1}((0, \infty))$.  □
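Part (c) can be checked numerically in a simple case. The sketch below assumes the hypothetical choice $\Omega = \mathcal{R}$, $A = [0,1]$, $B = [2,3]$ (none of which comes from the text) and evaluates $f(x) = \rho(x,A) - \rho(x,B)$ at a few sample points; negative values mark points of $f^{-1}((-\infty,0))$ and positive values mark points of $f^{-1}((0,\infty))$.

```python
def dist_to_interval(x, lo, hi):
    """Distance from the real number x to the closed interval [lo, hi]."""
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0


def separating_function(x, A=(0.0, 1.0), B=(2.0, 3.0)):
    """f(x) = rho(x, A) - rho(x, B) for the assumed sets A = [0, 1], B = [2, 3]."""
    return dist_to_interval(x, *A) - dist_to_interval(x, *B)


samples = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
values = {x: separating_function(x) for x in samples}
in_U = [x for x in samples if values[x] < 0]  # sample points of f^{-1}((-inf, 0))
in_V = [x for x in samples if values[x] > 0]  # sample points of f^{-1}((0, inf))
```

In this instance every sample point of $A$ lands in the first open set and every sample point of $B$ in the second, while the midpoint 1.5 lies in neither.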
DEFINITION 7.22 Hausdorff Space, Normal Space
Let $\Omega$ be a topological space.
a)  $\Omega$ is said to be a Hausdorff space if distinct points are separated;
that is, $x \ne y$ implies that $\{x\}$ and $\{y\}$ are separated.
b)  $\Omega$ is said to be a normal space if disjoint closed sets are separated;
that is, $A$ and $B$ closed and $A \cap B = \emptyset$ implies that $A$ and $B$ are
separated.
While it is not true in general that a normal space is a Hausdorff
space (see Exercise 7.91), it is obvious that a normal space is Hausdorff if
all single element subsets are closed. A space with the property that all
single element subsets are closed is called a $T_1$-space. Hausdorff spaces
are always $T_1$-spaces. From now on, whenever we consider a topological
space, we will assume implicitly that it is a $T_1$-space.
Example 7.14(c) shows that all metric spaces are normal. And it is an
easy exercise to prove that all metric spaces are Hausdorff. Later we will
see examples of normal and Hausdorff spaces that are not metric spaces.
Existence of Continuous Functions
Given an arbitrary topological space $\Omega$, it is not at all clear that there
exist nonconstant, real-valued continuous functions on $\Omega$. However, as we
will see momentarily, normal spaces always possess an abundance of such
functions. First we need the following characterization of normal spaces.
PROPOSITION 7.14
A topological space $\Omega$ is normal if and only if for each closed set $F$ and
each open set $O$ with $F \subset O$, there exists an open set $W$ such that
$F \subset W \subset \overline{W} \subset O$.
PROOF: Suppose that $\Omega$ is normal. Let $F$ be closed, $O$ open, and $F \subset O$.
Then $F$ and $O^c$ are disjoint closed sets. It follows that there are open
sets $W$ and $U$ with $F \subset W$, $O^c \subset U$, and $U \cap W = \emptyset$. Because $W \subset U^c$
and $U^c$ is closed, it follows that $\overline{W} \subset U^c \subset (O^c)^c = O$.
To prove the converse, let $A$ and $B$ be disjoint closed sets. Taking
$F = A$ and $O = B^c$, there is, by assumption, an open set $W$ such that
$A \subset W \subset \overline{W} \subset B^c$. But $\overline{W}^c$ is open and contains $B$, and $W \cap \overline{W}^c = \emptyset$.
Thus, $A$ and $B$ are separated by $W$ and $\overline{W}^c$.
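The string of containments $F \subset W \subset \overline{W} \subset O$ in the statement of
Proposition 7.14 invites iteration: We can find open sets $U$ and $V$ such
that
$F \subset U \subset \overline{U} \subset W \subset \overline{W} \subset V \subset \overline{V} \subset O$.  (7.7)
To iterate further, we need better notation. A natural and judicious choice
is to use binary digits as follows: $W = W_{.10}$, $U = W_{.01}$, and $V = W_{.11}$.
Then (7.7) becomes
$F \subset W_{.01} \subset \overline{W}_{.01} \subset W_{.10} \subset \overline{W}_{.10} \subset W_{.11} \subset \overline{W}_{.11} \subset O$.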
This construction can now be carried on indefinitely to yield the following
lemma. The details of its proof are left to the reader as an exercise.
LEMMA 7.1
Suppose that $\Omega$ is a normal space. Let $F$ be closed, $O$ open, and $F \subset O$.
Furthermore, let $T$ denote the set of numbers in the interval $(0,1)$ that
have terminating binary expansions. Then there is a collection of open sets
$\{\, W_t : t \in T \,\}$ such that $t, s \in T$ and $t < s$ implies $F \subset W_t \subset \overline{W}_t \subset W_s \subset O$.
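The bookkeeping behind the index set $T$ can be sketched in a few lines. The code below is only an illustration of the labelling, not of the topological construction: the function names `stage`, `binary_label`, and `neighbours` are hypothetical. It lists the labels introduced at each stage of the iteration and, for a new label, the two previously constructed labels between which the new open set has to be inserted (with 0 and 1 standing in for $F$ and $O$).

```python
from fractions import Fraction


def stage(n):
    """Labels t = k / 2**n with k odd: the indices first introduced at stage n."""
    return [Fraction(k, 2 ** n) for k in range(1, 2 ** n, 2)]


def binary_label(t, n):
    """Binary expansion of t with n digits after the point, e.g. 1/4 with n = 2 gives '.01'."""
    return "." + format(int(t * 2 ** n), "0{}b".format(n))


def neighbours(t, n):
    """Labels just below and above a stage-n label among those built earlier."""
    step = Fraction(1, 2 ** n)
    return t - step, t + step


# Stage 1 introduces W_{.1}; stage 2 introduces W_{.01} and W_{.11}, and so on.
labels_stage_2 = [binary_label(t, 2) for t in stage(2)]
```

At stage $n$ one applies Proposition 7.14 once for each new label, with the closure of the lower neighbour as the closed set and the upper neighbour as the open set.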
Using Lemma 7.1, we can now construct nonconstant, continuous, real-
valued functions on normal spaces.
THEOREM 7.2 Urysohn’s Lemma
Let $A$ and $B$ be disjoint closed nonempty subsets of a normal space $\Omega$.
Then there is a continuous function $f\colon \Omega \to \mathcal{R}$ such that $f(\Omega) \subset [0,1]$,
$f(A) = \{0\}$, and $f(B) = \{1\}$.
PROOF: First we apply Lemma 7.1 with $F = A$ and $O = B^c$ to obtain a
collection of open sets $\{\, W_t : t \in T \,\}$ such that $t, s \in T$ and $t < s$ implies
$A \subset W_t \subset \overline{W}_t \subset W_s \subset B^c$. Also, we let $W_1 = \Omega$ and $T_0 = T \cup \{1\}$.
Now we define a function $f$ on $\Omega$ by
$f(x) = \inf\{\, t \in T_0 : x \in W_t \,\}$.
Clearly, $f$ takes values only in $[0,1]$. If $x \in A$, then $x \in W_t$ for each $t \in T$.
Because $T$ is dense in $[0,1]$, it follows that $f(x) = 0$. On the other hand, if
$x \in B$, then $\{\, t \in T_0 : x \in W_t \,\} = \{1\}$. Thus, $f(x) = 1$.
It remains to show that $f$ is continuous on $\Omega$. By Exercise 7.87 on
page 447, it is enough to prove that for each real number $s$, $f^{-1}((-\infty, s))$ is
open and $f^{-1}((-\infty, s])$ is closed. First note that
$$f^{-1}((-\infty,s)) = \begin{cases} \emptyset, & s \le 0, \\ \Omega, & s > 1, \end{cases} \qquad\text{and}\qquad f^{-1}((-\infty,s]) = \begin{cases} \emptyset, & s < 0, \\ \Omega, & s \ge 1. \end{cases}$$
Again using the fact that $T$ is dense in $[0,1]$, we have
$$f^{-1}((-\infty,s)) = \bigcup_{t<s} W_t, \qquad s \in (0,1],$$  (7.8)
and
$$f^{-1}((-\infty,s]) = \bigcap_{s<t} \overline{W}_t, \qquad s \in [0,1).$$  (7.9)
Equation (7.8) shows that for $s \in (0,1]$, $f^{-1}((-\infty, s))$ is open, being a
union of open sets. And (7.9) shows that for $s \in [0,1)$, $f^{-1}((-\infty, s])$ is
closed, being an intersection of closed sets.
We have now shown that for all $s \in \mathcal{R}$, $f^{-1}((-\infty, s))$ is open and
$f^{-1}((-\infty, s])$ is closed. Hence, $f$ is continuous.
Remark: Exercise 7.47(f) on page 437 provides a quick elementary proof
of the metric-space version of Urysohn’s lemma.
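For readers who want to see the metric-space case concretely, here is a minimal sketch. It assumes the standard formula $f(x) = \rho(x,A)/(\rho(x,A) + \rho(x,B))$; whether this is exactly the formula intended by Exercise 7.47(f) is an assumption, as is the choice $\Omega = \mathcal{R}$, $A = [0,1]$, $B = [2,3]$.

```python
def dist_to_interval(x, lo, hi):
    """Distance from x to the closed interval [lo, hi]."""
    return max(lo - x, x - hi, 0.0)


def urysohn(x, A=(0.0, 1.0), B=(2.0, 3.0)):
    """Assumed metric-space Urysohn function: 0 on A, 1 on B, values in [0, 1]."""
    dA = dist_to_interval(x, *A)
    dB = dist_to_interval(x, *B)
    # dA + dB > 0 because A and B are disjoint closed sets.
    return dA / (dA + dB)


grid = [i / 10 for i in range(-10, 41)]  # sample points of [-1, 4]
values = [urysohn(x) for x in grid]
```

The denominator never vanishes precisely because a point cannot have distance zero to both of two disjoint closed sets.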
Urysohn’s lemma is frequently used to obtain continuous approxima-
tions to characteristic functions of closed sets. Typically, one has a closed
subset $F$ of some normal space $\Omega$ and an open set $O$ containing $F$ that is
“close” to $F$ in some sense. Applying Urysohn’s lemma with $B = F$ and
$A = O^c$, we obtain a continuous function $f$ with values in $[0,1]$ that agrees
with the characteristic function of $F$ everywhere except possibly on $O \setminus F$.
When $\Omega = \mathcal{R}$, $F = [a,b]$, and $O = (a - \epsilon, b + \epsilon)$, this approximation
procedure is nicely illustrated by the continuous function that is 1 on $[a,b]$,
0 on $O^c$, and linear on each of the intervals $(a - \epsilon, a)$ and $(b, b + \epsilon)$. Later,
when we study spaces of continuous functions, we will rely heavily on the
approximation of characteristic functions.
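The piecewise linear function just described is easy to write down explicitly. The following sketch uses a hypothetical function name `trapezoid`; it returns 1 on $[a,b]$, 0 outside $(a-\epsilon, b+\epsilon)$, and interpolates linearly in between.

```python
def trapezoid(x, a, b, eps):
    """Continuous approximation to the characteristic function of [a, b]."""
    if a <= x <= b:
        return 1.0
    if x <= a - eps or x >= b + eps:
        return 0.0
    if x < a:
        return (x - (a - eps)) / eps  # rising edge on (a - eps, a)
    return ((b + eps) - x) / eps      # falling edge on (b, b + eps)


# Example with the assumed values a = 0, b = 1, eps = 0.5.
profile = [trapezoid(x, 0.0, 1.0, 0.5) for x in (-1.0, -0.25, 0.5, 1.25, 2.0)]
```

The function differs from the characteristic function of $[a,b]$ only on $O \setminus F$, the two intervals of length $\epsilon$.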
As a more immediate application of Urysohn’s lemma, we present the
following important result.
THEOREM 7.3 Tietze’s Extension Theorem
Let $\Omega$ be a normal space, $F$ a closed subset of $\Omega$, and $f\colon F \to \mathcal{R}$ a continu-
ous function. Then there exists a continuous function $\tilde{f}\colon \Omega \to \mathcal{R}$ such that
$\tilde{f}(x) = f(x)$ for each $x \in F$. Moreover, if $M = \sup\{\, |f(x)| : x \in F \,\} < \infty$,
then $\tilde{f}$ may be chosen such that $\sup\{\, |\tilde{f}(x)| : x \in \Omega \,\} = M$.
PROOF: If $M = 0$, the result is trivial. We next consider the case where
$M$ is finite and nonzero. Without loss of generality, we can assume $M = 1$.
(Why is that so?)
Because $f$ is continuous on $F$, the sets $A = f^{-1}((-\infty, -1/3])$ and
$B = f^{-1}([1/3, \infty))$ are relatively closed in $F$ and, because $F$ is a closed
subset of $\Omega$, $A$ and $B$ are also closed in $\Omega$. So, by Urysohn’s lemma, there is
a continuous function $g_1$ on $\Omega$ such that $g_1(\Omega) \subset [-1/3, 1/3]$, $g_1(x) = -1/3$
for all $x \in A$, and $g_1(x) = 1/3$ for all $x \in B$. It follows that the continuous
function $f_1$ defined on $F$ via $f_1 = f - g_1$ satisfies $|f_1(x)| \le 2/3$ for all $x \in F$.
Similarly, applying Urysohn’s lemma to the sets $A = f_1^{-1}((-\infty, -2/9])$
and $B = f_1^{-1}([2/9, \infty))$, we can obtain a continuous function $g_2$ on $\Omega$ such
that $g_2(\Omega) \subset [-2/9, 2/9]$, $g_2(x) = -2/9$ for all $x \in A$, and $g_2(x) = 2/9$
for all $x \in B$. It follows that the continuous function $f_2$ defined on $F$ via
$f_2 = f_1 - g_2 = f - (g_1 + g_2)$ satisfies $|f_2(x)| \le 4/9$ for all $x \in F$.
We now proceed inductively to construct a sequence $\{g_n\}_{n=1}^\infty$ of con-
tinuous functions on $\Omega$ such that $|g_n(x)| \le 2^{n-1}/3^n$ for all $x \in \Omega$ and
$$\Bigl| f(x) - \sum_{j=1}^{n} g_j(x) \Bigr| \le (2/3)^n, \qquad x \in F.$$
It follows from Exercise 7.89 on page 447, that the function $\tilde{f} = \sum_{n=1}^\infty g_n$
is continuous on $\Omega$. And the previous two inequalities show that $|\tilde{f}(x)| \le 1$
for each $x \in \Omega$ and that $\tilde{f}(x) = f(x)$ for each $x \in F$.
It remains to consider the case where $f$ is unbounded. To that end,
define $f_0 = \arctan f$. Since $|f_0(x)| < \pi/2$ for all $x \in F$, we can apply the
results just proved for bounded functions to obtain a continuous function
$g_0\colon \Omega \to \mathcal{R}$ such that $g_0(x) = f_0(x)$ for each $x \in F$. The function $\tilde{f} = \tan g_0$
is continuous on $\Omega$ and is such that $\tilde{f}(x) = f(x)$ for each $x \in F$.
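The inductive step of the bounded case can be imitated numerically. The sketch below is a discretized illustration under several assumptions that are not in the text: $\Omega = \mathcal{R}$, $F = [0,1] \cup [2,3]$ sampled on a finite grid, $f(x) = \sin 5x$, and each $g_n$ built from the metric-space Urysohn formula applied to the finite sets of grid points where the current remainder is at most $-c/3$ or at least $c/3$, with $c = (2/3)^{n-1}$.

```python
import math

# Assumed data: grid samples of F = [0, 1] U [2, 3] and a function with |f| <= 1.
F = [i / 50 for i in range(51)] + [2 + i / 50 for i in range(51)]
f = {x: math.sin(5 * x) for x in F}


def dist(x, pts):
    """Distance from x to a finite set of points (infinity if the set is empty)."""
    return min((abs(x - p) for p in pts), default=math.inf)


def urysohn_step(h, c):
    """Return g with |g| <= c/3, g = -c/3 where h <= -c/3 and g = c/3 where h >= c/3."""
    A = [x for x in F if h[x] <= -c / 3]
    B = [x for x in F if h[x] >= c / 3]

    def g(x):
        dA, dB = dist(x, A), dist(x, B)
        if math.isinf(dA) and math.isinf(dB):
            return 0.0          # both sets empty: no correction needed
        if math.isinf(dA):
            return c / 3        # only B is present
        if math.isinf(dB):
            return -c / 3       # only A is present
        return (c / 3) * (2 * dA / (dA + dB) - 1)

    return g


terms, errors = [], []
h = dict(f)  # remainder f - (g_1 + ... + g_n) on the grid of F
for n in range(1, 11):
    g = urysohn_step(h, (2 / 3) ** (n - 1))
    terms.append(g)
    h = {x: h[x] - g(x) for x in F}
    errors.append(max(abs(v) for v in h.values()))


def extension(x):
    """Partial sum g_1 + ... + g_10, defined on the whole real line."""
    return sum(g(x) for g in terms)
```

On the grid, the remainder after $n$ steps stays within $(2/3)^n$, mirroring the displayed inequality, and the partial sum is bounded by 1 everywhere.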
EXERCISES 7.6
7.90  Consider the subsets of $\mathcal{R}^2$ given by $A = \{\, (x,y) : x > 0,\ y \ge 1/x \,\}$ and
$B = \{\, (x,0) : x \ge 0 \,\}$.
a)	Show that A and В are disjoint closed sets that cannot be separated
by open half-planes in the sense of Example 7.14(b) on page 448.
b)	Find explicitly open sets U and V that separate A and B.
7.91	Provide an example of a normal space that is not Hausdorff. Hint: Refer
to Example 7.2(c) on page 416.
7.92	Show that all metric spaces are Hausdorff.
7.93  Let $\mathcal{T} = \{\emptyset\} \cup \{\, W \subset \mathcal{N} : W^c \text{ is a finite set} \,\}$ where, as usual, $\mathcal{N}$ denotes
the set of positive integers. Show that $\mathcal{T}$ is a topology on $\mathcal{N}$ and that
$(\mathcal{N}, \mathcal{T})$ is a $T_1$-space.
7.94  Refer to Exercise 7.93. Show that the topological space $(\mathcal{N}, \mathcal{T})$ is neither
a Hausdorff space nor a normal space.
7.95  Describe all continuous functions $f\colon \mathcal{N} \to \mathcal{R}$, where $\mathcal{N}$ is given the topol-
ogy $\mathcal{T}$ defined in Exercise 7.93.
7.96  Describe all convergent sequences in $\mathcal{N}$, where $\mathcal{N}$ is given the topology $\mathcal{T}$
defined in Exercise 7.93.
7.97  Prove that a normal $T_1$-space is a Hausdorff space.
7.98  Prove that a Hausdorff space is a $T_1$-space.
7.99  Prove that a topological space is Hausdorff if and only if convergent nets
have unique limits (i.e., $\lim x_\iota = x$ and $\lim x_\iota = y$ imply $x = y$).
7.100  Let $\Omega$ be a nonempty set and $\mathcal{T}_{\mathcal{F}}$ the weak topology on $\Omega$ determined by
a family of functions $\mathcal{F}$. Suppose that for each $f \in \mathcal{F}$, the space $f(\Omega)$ is
Hausdorff. Show that $(\Omega, \mathcal{T}_{\mathcal{F}})$ is a Hausdorff space if and only if $\mathcal{F}$ separates
the points of $\Omega$ (i.e., $x, y \in \Omega$ and $x \ne y$ imply that there exists an $f \in \mathcal{F}$
such that $f(x) \ne f(y)$).
7.101	Provide the details of the proof of Lemma 7.1 on page 449.
7.102  Let $S$ be a nonempty set. Formulate and prove a version of the Tietze
extension theorem where $\mathcal{R}$ is replaced by the Cartesian product $\mathcal{R}^S$ with
the product topology.
7.103  Show that Theorem 7.3 on page 451 is no longer valid if $F$ is assumed to
be open instead of closed.
7.104  Let $F$ be a closed subset of $\mathcal{R}$ and $f\colon F \to \mathcal{R}$ be continuous. From Propo-
sition 2.13 on page 59, we can write $F^c = \bigcup_{J\in\mathcal{S}} J$ where $\mathcal{S}$ is a count-
able collection of disjoint open intervals. Construct a continuous function
$g\colon \mathcal{R} \to \mathcal{R}$ that agrees with $f$ on $F$ and is linear on each interval $J \in \mathcal{S}$.
7.7 CONNECTED SETS
If $D$ is a subset of $\mathcal{R}$, then, except in trivial cases, the characteristic func-
tion $\chi_D$ is not continuous. There are, however, many topological spaces
that have nonconstant, continuous characteristic functions. For example,
if $\Omega = [0,1] \cup [2,3]$ is given the relative topology from $\mathcal{R}$, then $\chi_{[0,1]}$ is a
continuous function on $\Omega$. Such topological spaces are called disconnected.
DEFINITION 7.23 Disconnected and Connected Spaces
A topological space having at least one nonconstant, continuous char-
acteristic function is said to be disconnected. A topological space
that is not disconnected is said to be connected. A subset of a topo-
logical space is called (dis)connected if it is (dis)connected with respect
to the relative topology.
If $f$ is a nonconstant, continuous characteristic function on a topologi-
cal space $\Omega$, then $f^{-1}((-\infty, 1/2))$ and $f^{-1}((1/2, \infty))$ are disjoint nonempty
open sets whose union is $\Omega$. Thus, we see that each of the following condi-
tions is equivalent to a topological space, $\Omega$, being disconnected:
•  $\Omega$ can be decomposed into two disjoint nonempty open sets.
•  $\Omega$ contains a proper, nonempty subset that is both open and closed.
The following proposition provides yet another way of characterizing
disconnected sets.
PROPOSITION 7.15
A subset $D$ of a topological space $\Omega$ is disconnected if and only if there are
nonempty sets $A$ and $B$ such that $D = A \cup B$, $\overline{A} \cap B = \emptyset$, and $A \cap \overline{B} = \emptyset$.
PROOF: Suppose that $D$ is a disconnected subset of $\Omega$. Let $f$ be a non-
constant characteristic function on $D$ that is continuous with respect to
the relative topology. Because $A = f^{-1}((1/2, 3/2)) = f^{-1}(\{1\})$, we have
that $A$ is nonempty and both open and closed in the relative topology.
Similarly, $B = D \setminus A = f^{-1}(\{0\})$ is also relatively open, relatively closed,
and nonempty. It follows that there is a closed subset $F$ of $\Omega$ such that
$A = F \cap D$. Because $F \cap B = \emptyset$ and $\overline{A} \subset F$, we have $\overline{A} \cap B = \emptyset$. Similarly,
we have $A \cap \overline{B} = \emptyset$.
Conversely, suppose that there are nonempty sets $A$ and $B$ such that
$D = A \cup B$, $\overline{A} \cap B = \emptyset$, and $A \cap \overline{B} = \emptyset$. Then we have $A = D \cap \overline{A}$ and
$B = D \cap \overline{B}$. Thus, $A$ and $B$ are relatively closed. Since $B = D \setminus A$, $B$ is
also relatively open and, similarly, $A$ is relatively open. It follows easily
that the characteristic function $\chi_A$ is nonconstant on $D$ and continuous in
the relative topology. Consequently, $D$ is a disconnected subset of $\Omega$.
EXAMPLE 7.15 Illustrates Connected Sets
Let $\Omega$ be a topological space and $x \in \Omega$. Then it follows easily from
Proposition 7.15 that each singleton subset of $\Omega$ is connected. The Cantor
set provides an example of a topological space in which the only connected
subsets are singletons.	□
EXAMPLE 7.16 Illustrates Connected Subsets of $\mathcal{R}$
In this example, we will establish the fact that the connected subsets of $\mathcal{R}$
are precisely the intervals (including degenerate intervals).
Let $D$ be a connected subset of $\mathcal{R}$. If $D = \emptyset$, then it is also a degenerate
interval (e.g., $(x,x]$). If $D$ is a singleton set, $\{x\}$, then it is a degenerate
interval of the form $[x,x]$. So, assume that $D$ contains more than one
point. Let $a, b \in D$ with $a < b$ and let $c \in (a,b)$. If $c$ does not lie in $D$,
then the sets $A = D \cap (c, \infty)$ and $B = D \cap (-\infty, c)$ are relatively open,
disjoint, and their union is $D$. Thus, $D$ is disconnected, a contradiction.
Hence, the interval $(a,b)$ is contained in $D$ whenever $a$ and $b$ are elements
of $D$ with $a < b$. It follows immediately that $D$ is equal to $(\inf D, \sup D)$,
$(\inf D, \sup D]$, $[\inf D, \sup D)$, or $[\inf D, \sup D]$.
Conversely, suppose that $D$ is an interval. We claim that $D$ is con-
nected. Assume to the contrary. Then, by Proposition 7.15, there are
nonempty sets $A$ and $B$ such that $D = A \cup B$, $\overline{A} \cap B = \emptyset$, and $A \cap \overline{B} = \emptyset$.
Let $a \in A$ and $b \in B$, and assume without loss of generality that $a < b$.
Consider the set $C = \{\, x : [a,x] \subset A \,\}$. We note that $C \ne \emptyset$ because $a \in C$.
Because $b \notin A$, $C$ is bounded above by $b$. Thus, $u = \sup C$ is a real number
and $a \le u \le b$. There are three possibilities: $u \in A$, $u \in B$, or $u \notin D$.
The last possibility can be eliminated immediately because $a \le u \le b$ and
both $a$ and $b$ lie in the interval $D$.
Suppose $u \in A$. Because, for each $n \in \mathcal{N}$, $[a, u + 1/n]$ is not a subset
of $A$, it follows that we can find an element $b_n \in B \cap [u, u + 1/n]$. And,
since $\lim_{n\to\infty} b_n = u$, we have that $u \in A \cap \overline{B}$. But this contradicts the
assumption that $A \cap \overline{B} = \emptyset$. On the other hand, suppose $u \in B$. Then
$u > a$ and, so, $u - 1/n \in A$ for sufficiently large $n$. Consequently, because
$\lim_{n\to\infty} (u - 1/n) = u$, we have that $u \in \overline{A} \cap B$. But this contradicts the
assumption that $\overline{A} \cap B = \emptyset$.  □
One of the most useful properties of connected spaces is described in
the following theorem which, in words, states that the continuous image of
a connected space is connected.
THEOREM 7.4
Let $\Omega$ be a connected topological space and $f\colon \Omega \to \Lambda$ be a continuous
function. Then $f(\Omega)$ is a connected subset of $\Lambda$.
PROOF: Suppose to the contrary that $f(\Omega)$ is not connected. Then there is
a nonconstant, continuous characteristic function $g$ on $f(\Omega)$. It follows from
Exercise 7.85 on page 447 that the nonconstant characteristic function $g \circ f$
is continuous on $\Omega$. Thus, $\Omega$ is not connected, a contradiction.
Combining Theorem 7.4 and Example 7.16, we immediately obtain the
following two corollaries.
COROLLARY 7.2
Let f be a real-valued continuous function on a connected topological
space $\Omega$. Then $f(\Omega)$ is a (possibly degenerate) interval.
COROLLARY 7.3 Intermediate Value Theorem
Let $f$ be a real-valued continuous function on a closed bounded inter-
val $[a,b]$. Then for each number $y$ between $f(a)$ and $f(b)$, there is an
$x \in [a,b]$ such that $f(x) = y$.
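The intermediate value theorem is an existence statement, but for a concrete function the point $x$ can be located by bisection. The sketch below is a standard numerical technique, not something taken from the text; the function name `solve_ivt`, the tolerance, and the sample function $x^3 + x$ are all assumptions made for illustration.

```python
def solve_ivt(f, a, b, y, tol=1e-12):
    """Approximate an x in [a, b] with f(x) = y, assuming y lies between f(a) and f(b)."""
    g = lambda x: f(x) - y
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("y must lie between f(a) and f(b)")
    # Invariant: g(lo) and g(hi) never have the same strict sign.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# x**3 + x takes the values 0 and 10 at the endpoints of [0, 2], so it must take the value 5.
root = solve_ivt(lambda x: x ** 3 + x, 0.0, 2.0, 5.0)
```

Each halving keeps a sign change inside the current interval, which is exactly where the continuity of $f$ and the connectedness of $[a,b]$ enter.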
Arcwise Connected Spaces
Let $\Omega$ be a topological space and $p$ and $q$ points of $\Omega$. Then we say that
$p$ is connected to $q$ by an arc if there exist $a, b \in \mathcal{R}$ and a continuous
function $g\colon [a,b] \to \Omega$ such that $p = g(a)$ and $q = g(b)$. The set $A = g([a,b])$
is called an arc connecting $p$ to $q$. It is easy to show that the following
hold for all points $p, q, r \in \Omega$. (See Exercise 7.111.)
•  $p$ is connected to itself by an arc.
•  If $p$ is connected to $q$ by an arc, then $q$ is connected to $p$ by an arc.
•  If $p$ is connected to $q$ by an arc and $q$ is connected to $r$ by an arc, then
$p$ is connected to $r$ by an arc.
Note: In view of the second bulleted item, we can unambiguously use
phrases such as “$p$ and $q$ are connected by an arc” and “there is an arc
connecting $p$ and $q$.”
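The three bulleted properties correspond to three simple operations on parametrizations. The sketch below assumes, for convenience only, that every arc is parametrized on $[0,1]$ (any $[a,b]$ can be rescaled to this) and uses points of the plane as tuples; the helper names are hypothetical.

```python
def constant_arc(p):
    """Arc from p to itself."""
    return lambda t: p


def reverse(g):
    """If g runs from p to q, the result runs from q to p."""
    return lambda t: g(1 - t)


def concatenate(g, h):
    """If g runs from p to q and h from q to r (so g(1) == h(0)), the result runs from p to r."""
    return lambda t: g(2 * t) if t <= 0.5 else h(2 * t - 1)


def segment(p, q):
    """Straight-line arc from p to q in the plane."""
    return lambda t: tuple(pi + t * (qi - pi) for pi, qi in zip(p, q))


path = concatenate(segment((0, 0), (1, 0)), segment((1, 0), (1, 1)))
endpoints = (path(0), path(1))
```

Continuity of the concatenation at $t = 1/2$ relies on the two pieces agreeing there, which is the condition $g(1) = h(0)$.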
The space $\Omega$ is said to be arcwise connected if for every pair of
points $p, q \in \Omega$, there is an arc connecting $p$ and $q$. The next proposition
shows that arcwise connected spaces are always connected.
PROPOSITION 7.16
An arcwise connected topological space is connected.
PROOF: Suppose that $\Omega$ is arcwise connected but not connected. Let
$g$ be a nonconstant, continuous characteristic function on $\Omega$. Let $p$ and $q$
be points of $\Omega$ such that $g(p) = 0$ and $g(q) = 1$. As $\Omega$ is arcwise connected,
there is an interval $[a,b]$ and a continuous function $f\colon [a,b] \to \Omega$ such that
$f(a) = p$ and $f(b) = q$. It follows that $g \circ f$ is a nonconstant, continuous
characteristic function on $[a,b]$, implying that $[a,b]$ is disconnected. But,
by Example 7.16 on page 454, the interval $[a,b]$ is connected. Thus, we
have reached a contradiction. Hence, $\Omega$ must be connected.
The converse of Proposition 7.16 is false. (See Exercise 7.109.) There
is, however, a converse for open subsets of a normed linear space.
PROPOSITION 7.17
A connected open subset of a normed space is arcwise connected.
PROOF: Suppose that $D$ is a nonempty connected open subset of a normed space.
Let $p \in D$ and $W$ be the set of all points of $D$ that are connected to $p$ by
an arc in $D$. Since $p \in W$, $W$ is nonempty. We claim that $W$ is open. Let
$q \in W$. Then $q \in D$ and, hence, there is an $r > 0$ such that $B_r(q) \subset D$.
If $x \in B_r(q)$, then the arc $\{\, q + t(x - q) : 0 \le t \le 1 \,\}$ connects $q$ to $x$ and
lies inside $B_r(q)$. It follows that $p$ is connected to $x$ by an arc in $D$. Thus,
$B_r(q) \subset W$ and, hence, $W$ is open.
We also claim that $D \setminus W$ is open. Let $q \in D \setminus W$. Then $q \in D$ and,
so, as we discovered in the previous paragraph, there is an $r > 0$ such that
$B_r(q) \subset D$ and any point of $B_r(q)$ is connected to $q$ by an arc in $D$. If a
point of $B_r(q)$ were connected to $p$ by an arc in $D$, then there would be an
arc in $D$ connecting $p$ to $q$, contradicting the assumption that $q \in D \setminus W$.
Thus, $B_r(q) \subset D \setminus W$ and, hence, $D \setminus W$ is open.
We have shown that $W$ is both open and closed in $D$. Because $D$ is
connected and $W$ is nonempty, it follows that $D = W$. As any point of $D$
is connected to $p$ by an arc in $D$, any two points of $D$ must be connected
to each other by an arc in $D$. Hence, $D$ is arcwise connected.
Remark: A normed space $\Omega$ is always arcwise connected. Indeed, if $x \in \Omega$,
then the arc $\{\, tx : 0 \le t \le 1 \,\}$ connects 0 to $x$. Hence, any point of $\Omega$ is
connected to 0 by an arc and, so, any two points of $\Omega$ are connected to
each other by an arc.
Connected Components, Totally Disconnected Spaces
We will now discover how a topological space can be decomposed as the
union of a family of pairwise disjoint connected subsets. First we state
two propositions, whose proofs are left to the reader as Exercises 7.113
and 7.114.
PROPOSITION 7.18
Let $\mathcal{S}$ be a collection of connected subsets of a topological space $\Omega$. Suppose
that $D_1 \cap D_2 \ne \emptyset$ whenever $D_1, D_2 \in \mathcal{S}$. Then $\bigcup_{D\in\mathcal{S}} D$ is a connected
subset of $\Omega$.
PROPOSITION 7.19
Let $\Omega$ be a topological space and $A$ a connected subset of $\Omega$. Then $\overline{A}$ is
also connected.
Given a point $x$ in a topological space $\Omega$, we can apply Proposition 7.18
with $\mathcal{S}$ equal to the collection of all connected subsets of $\Omega$ containing $x$
to obtain a connected set $C_x$. The set $C_x$ is the largest connected sub-
set of $\Omega$ that contains $x$ and is called the connected component of $\Omega$
containing $x$.
THEOREM 7.5
Let $\Omega$ be a topological space.
a)  For each pair of elements $x, y \in \Omega$, either $C_x = C_y$ or $C_x \cap C_y = \emptyset$.
b)  For each $x \in \Omega$, $C_x$ is closed.
c)  $\Omega = \bigcup_{x\in\Omega} C_x$.
PROOF:
a)  If $C_x \cap C_y \ne \emptyset$, then, by Proposition 7.18, $C_x \cup C_y$ is connected. It
follows that $C_x \cup C_y \subset C_x$ and $C_x \cup C_y \subset C_y$. Hence, $C_x = C_y$.
b)  By Proposition 7.19, $\overline{C_x}$ is connected for each $x \in \Omega$. Hence, $\overline{C_x} \subset C_x$
and, so, $C_x$ is closed. Thus, (b) holds.
c)  The proof of (c) is trivial because $x \in C_x$ for all $x \in \Omega$.
A topological space $\Omega$ is said to be totally disconnected if all of its
connected components are single element sets.
EXAMPLE 7.17 Illustrates Totally Disconnected Spaces
a)	Any nonempty set is totally disconnected with respect to the discrete
topology.
b)  The set $Q$ of rational numbers, equipped with the relative topology
inherited from $\mathcal{R}$, is totally disconnected.
c)  The Cantor set $P$, equipped with the relative topology inherited from $\mathcal{R}$,
is totally disconnected.  □
EXERCISES 7.7
7.105	Show that a topological space is disconnected if and only if it has a subset
that is proper, nonempty, open, and closed.
7.106	Show that a continuous integer-valued function on a connected space must
be constant.
7.107  Refer to Exercise 7.64 on page 438. Let $\Omega$ be a topological space and
$A \subset \Omega$. Suppose $g\colon [0,1] \to \Omega$ is a continuous function such that $g(0) \in A$
and $g(1) \in A^c$. Show that there exists an $s \in [0,1]$ such that $g(s) \in \partial A$.
7.108  Provide the omitted details of Example 7.16 on page 454.
7.109  Give an example of a topological space that is connected but not arcwise
connected. Hint: Consider the following subset of $\mathcal{R}^2$:
$\{\, (0,y) \in \mathcal{R}^2 : -1 \le y \le 1 \,\} \cup \{\, (x,y) \in \mathcal{R}^2 : x > 0,\ y = \sin(1/x) \,\}$.
7.110  Consider the normed linear space $(C([a,b]), \|\ \|_\infty)$ from Example 7.8 on
page 425. Which of the following subsets of $C([a,b])$ are connected? Pro-
vide a proof in each case.
a)  $\{\, g : g \text{ is real-valued and never } 0 \,\}$,
b)  $\{\, g : g(x) > 0 \text{ for each } x \in [a,b] \,\}$,
c)  $\{\, g : g \text{ is never } 0 \text{ on } [a,b] \,\}$.
7.111  Let $p$, $q$, and $r$ be points of a topological space $\Omega$. Prove each of the
following:
a)  $p$ is connected to itself by an arc.
b)  If $p$ is connected to $q$ by an arc, then $q$ is connected to $p$ by an arc.
c)  If $p$ is connected to $q$ by an arc and $q$ is connected to $r$ by an arc, then
$p$ is connected to $r$ by an arc.
7.112  Let $\Omega$ be a topological space. For $x \in \Omega$, define the arcwise connected
component of $x$ by $A_x = \{\, y \in \Omega : y \text{ is connected to } x \text{ by an arc} \,\}$.
a)  Prove analogues of parts (a) and (c) of Theorem 7.5 on page 458 using
arcwise connected components in place of connected components.
b)  Show that the analogue of part (b) of Theorem 7.5 is false in general, but
is true if $\Omega$ is an open subset of a normed space and, so, in particular,
is true if $\Omega$ is a normed space.
7.113	Prove Proposition 7.18 on page 457.
7.114	Prove Proposition 7.19 on page 457.
7.115  Let $T$ denote the unit circle centered at 0 in the complex plane $\mathbb{C}$; let $C(T)$
be the space of complex-valued continuous functions defined on $T$ equipped
with the norm $\|f\| = \sup\{\, |f(z)| : z \in T \,\}$; and let $G$ be the set of non-
vanishing functions in $C(T)$.
a)	Show that G is open.
b)	Describe the connected component of the constant function 1.
c)	Describe the connected components of G.
7.116	The Cantor function restricted to the Cantor set is an example of a con-
tinuous function mapping a totally disconnected space onto a connected
space. Show that if $\Omega$ is a connected space, then there are no nonconstant
continuous functions from $\Omega$ into the Cantor set.
7.8 SEPARABILITY, SECOND COUNTABILITY,
AND METRIZABILITY
In this section, we will discuss separable spaces and a related class of spaces
known as second countable spaces. We will also prove a powerful theorem
that gives a sufficient condition for a topological space to be metrizable.
Separable Spaces
Recall that a subset $E$ of a topological space $\Omega$ is dense if $\overline{E} = \Omega$. A crucial
property of the space $\mathcal{R}$ of real numbers is that it contains a countable
subset that is dense; for example, the countable set $Q$ of rational numbers is
dense, as we know from Proposition 2.4 on page 39. Many of the topological
spaces of interest in analysis share with $\mathcal{R}$ the property of having subsets
that are both countable and dense. Such spaces are called separable.
DEFINITION 7.24 Separable Space
A topological space $\Omega$ is said to be separable if it contains a countable
dense subset; that is, if there is a set $E \subset \Omega$ such that $E$ is countable
and $\overline{E} = \Omega$.
EXAMPLE 7.18 Illustrates Definition 7.24
In this example, we use the notation $Q + iQ$ for the set of complex num-
bers having rational real and imaginary parts. We note that $Q + iQ$ is a
countable set. (Why?)
a)  Consider the space $\ell^1(\mathcal{N})$. For each $n \in \mathcal{N}$, the set
$A_n = \{\, f \in \ell^1(\mathcal{N}) : f(j) \in Q + iQ,\ 1 \le j \le n, \text{ and } f(j) = 0,\ j > n \,\}$
is countable. Hence, by Proposition 1.10 on page 23, $A = \bigcup_{n=1}^\infty A_n$
is also countable. It is left for Exercise 7.120 to show that $A$ is dense
in $\ell^1(\mathcal{N})$. Thus, $\ell^1(\mathcal{N})$ is separable.
b)  Consider the normed space $(C([a,b]), \|\ \|_\infty)$ discussed in Example 7.8 on
page 425. For each $n \in \mathcal{N}$, let $PL_n$ denote the set of $f \in C([a,b])$ with
the property that, for each $j = 0, 1, 2, \ldots, n-1$, the restriction of $f$ to
the subinterval $[a + j(b-a)/n,\ a + (j+1)(b-a)/n]$ is of the form $m_j x + b_j$,
where $m_j, b_j \in Q + iQ$. Each function in $PL_n$ is completely determined
by a $2n$-tuple of numbers in $Q + iQ$. Hence, $PL_n$ is a countable set and,
so, $PL = \bigcup_{n=1}^\infty PL_n$ is also countable. It is left for Exercise 7.121 to
show that $PL$ is dense in $C([a,b])$. Thus, $C([a,b])$ is separable.  □
EXAMPLE 7.19 A Nonseparable Metric Space
Consider the space $\ell^\infty([0,1])$. The family $\{\, \chi_{[0,t]} : t \in [0,1] \,\}$ of character-
istic functions satisfies the condition:
$B_{1/2}(\chi_{[0,t]}) \cap B_{1/2}(\chi_{[0,s]}) = \emptyset$ for $t \ne s$.  (7.10)
If $E$ is a dense subset of $\ell^\infty([0,1])$, then, for each $t \in [0,1]$, there is an
$f_t \in E \cap B_{1/2}(\chi_{[0,t]})$. By (7.10), no two $f_t$'s can coincide. Because the
collection $\{\, f_t : t \in [0,1] \,\}$ is uncountable, it follows that $E$ is not countable.
Consequently, $\ell^\infty([0,1])$ is not separable.  □
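Condition (7.10) rests on the fact that two distinct characteristic functions $\chi_{[0,s]}$ and $\chi_{[0,t]}$ are at sup-norm distance exactly 1, so no function can be within $1/2$ of both. A quick numerical check of that distance, on an assumed finite grid of $[0,1]$ and for a few assumed values of $s$ and $t$, might look as follows.

```python
def chi(t):
    """Characteristic function of the interval [0, t]."""
    return lambda x: 1.0 if 0 <= x <= t else 0.0


def sup_distance(f, g, grid):
    """Maximum of |f - g| over a finite grid (a lower bound for the sup norm)."""
    return max(abs(f(x) - g(x)) for x in grid)


grid = [i / 1000 for i in range(1001)]
ts = [0.1, 0.25, 0.5, 0.75, 1.0]
distances = {
    (s, t): sup_distance(chi(s), chi(t), grid)
    for s in ts for t in ts if s < t
}
```

Since the difference of two characteristic functions never exceeds 1 in absolute value, a grid value of 1 already pins down the sup norm.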
Second Countable Spaces
We know that the collection $\mathcal{I}$ of open intervals forms a neighborhood
basis determining the topology of $\mathcal{R}$. By considering intervals in $\mathcal{I}$ with
rational endpoints, we obtain a countable neighborhood basis determining
the topology of $\mathcal{R}$. There are many interesting spaces that, like $\mathcal{R}$, have
countable neighborhood bases. Such spaces are called second countable.
DEFINITION 7.25 Second Countable Space
A topological space is said to be second countable if it has a count-
able neighborhood basis.
The following proposition relates the concepts of second countable and
separable for topological spaces.
PROPOSITION 7.20
a) If a topological space is second countable, then it is separable.
b) If a metric space is separable, then it is second countable.
PROOF:
a)  Let $\mathfrak{N}$ be a countable neighborhood basis for a topological space $\Omega$. For
each nonempty $U \in \mathfrak{N}$, let $x_U \in U$. The set $\{\, x_U : U \in \mathfrak{N},\ U \ne \emptyset \,\}$ is
dense because it has a nonempty intersection with each nonempty open
set, and it is countable, because $\mathfrak{N}$ is countable. Thus, $\Omega$ is separable.
b)  Suppose that $(\Omega, \rho)$ is a metric space containing a countable dense sub-
set, say, $E = \{x_j\}_{j=1}^\infty$. Then the collection of open balls
$\mathfrak{M} = \{\, B_{1/k}(x_j) : j, k = 1, 2, \ldots \,\}$
is countable. And it is easy to show that $\mathfrak{M}$ is a neighborhood basis
on $\Omega$. (See Exercise 7.122.)
We claim that $\mathfrak{M}$ is a neighborhood basis for the topology induced
by $\rho$. Let $O$ be open with respect to the topology induced by $\rho$ and
let $x \in O$. Choose $\epsilon > 0$ so that $B_\epsilon(x) \subset O$ and let $k$ be a positive
integer such that $2/k < \epsilon$. Since $E$ is dense, there exists a $j$ such that
$x_j \in B_{1/k}(x)$. Then $x \in B_{1/k}(x_j)$ and
$B_{1/k}(x_j) \subset B_{2/k}(x) \subset B_\epsilon(x) \subset O$.
Thus, $\mathfrak{M}$ is a neighborhood basis for the topology induced by $\rho$ and, so,
$(\Omega, \rho)$ is second countable.
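The choice of $k$ and $x_j$ in part (b) can be made explicit when $\Omega = \mathcal{R}$ and $E$ is the set of rationals. The sketch below is an illustration under those assumptions; the function name `basis_ball` and the particular way of picking a nearby rational are not from the text. Exact rational arithmetic is used so that the strict inequalities are checked without rounding error.

```python
from fractions import Fraction
import math


def basis_ball(x, eps):
    """Return (center, k) with center rational, x in B_{1/k}(center), and B_{1/k}(center) inside B_eps(x)."""
    x, eps = Fraction(x), Fraction(eps)
    k = math.floor(2 / eps) + 1                  # smallest choice guaranteeing 2/k < eps
    center = Fraction(round(x * 2 * k), 2 * k)   # rational within 1/(4k) of x
    return center, k


center, k = basis_ball(math.sqrt(2), 0.01)
x = Fraction(math.sqrt(2))
radius = Fraction(1, k)
# Any y with |y - center| < 1/k satisfies |y - x| < 1/k + |center - x| < 2/k < eps.
worst_case = radius + abs(center - x)
```

The quantity `worst_case` bounds the distance from $x$ to any point of the small ball, which is the triangle-inequality step behind the displayed chain of inclusions.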
We next consider a consequence of second countability that will be
useful later when we study compactness. Let $E$ be a set. A collection $\mathcal{S}$
of sets such that $E \subset \bigcup_{S\in\mathcal{S}} S$ is called a covering of $E$. A subcollection
of $\mathcal{S}$ that is also a covering of $E$ is called a subcovering. If the members
of $\mathcal{S}$ are open in some topology, then $\mathcal{S}$ is called an open covering of $E$.
A topological space $\Omega$ is said to have the Lindelöf property if every open
covering of $\Omega$ has a countable subcovering.
PROPOSITION 7.21
A second countable topological space has the Lindelöf property.
PROOF: Suppose that $\Omega$ is a topological space with a countable neigh-
borhood basis $\{U_n\}_{n=1}^\infty$ and let $\mathcal{S}$ be an open covering of $\Omega$. For each
$x \in \Omega$, we can choose an $O_x \in \mathcal{S}$ and a positive integer $n_x$ such that
$x \in U_{n_x} \subset O_x$. The set of integers $B = \{\, n_x : x \in \Omega \,\}$ is countable, being
a subset of a countable set. For each $m \in B$, we can choose an $O_m \in \mathcal{S}$
such that $U_m \subset O_m$. It follows that $\Omega \subset \bigcup_{m\in B} U_m \subset \bigcup_{m\in B} O_m$. Thus,
$\{\, O_m : m \in B \,\}$ is a countable subcovering.
The converse of Proposition 7.21 is false in general, but it is true for
metric spaces. See Exercises 7.123-7.124.
Metrization
We conclude this section by stating and proving a theorem that provides
a simple pair of conditions that are sufficient for a topological space to be
metrizable.
THEOREM 7.6 Urysohn’s Metrization Theorem
A second countable, normal space is metrizable.
PROOF: Let (fl, T} be a normal space with countable neighborhood ba-
sis 91. Consider the countable set W = {(17, V) : U, V E У1 and U С V }.
We show first that, for each open set О and point x E O, there is a pair
(17, V) E W such that
xtUcUcVcO.	(7.11)
Indeed, since 91 is a neighborhood basis, we can find a V E 91 such that
x E V С O. Applying Proposition 7.14 on page 449 with F = {a;},
7.8 Separability, Second Countability, and Metrizability □ 463
we obtain an open set W such that x e W C W C VJ Again using
the assumption that 91 is a neighborhood basis, we can find a U 6 91
such that x E U c W. It follows that the pair ((7, V) belongs to W and
satisfies (7.11).
Let $\{(U_n, V_n)\}_n$ be an enumeration of $\mathcal{W}$ and apply Urysohn’s lemma
(Theorem 7.2 on page 450) to obtain, for each n, a continuous function
$f_n : \Omega \to [0,1]$ that vanishes on $\overline{U}_n$ and is constantly 1 on $V_n^c$. Using the
functions $\{f_n\}_n$, we define a function $\sigma$ on $\Omega \times \Omega$ by
$\sigma(x, y) = \sum_n 2^{-n} |f_n(x) - f_n(y)|$.
We claim that $\sigma$ is a metric. That $\sigma(x, x) = 0$, $\sigma(x, y) = \sigma(y, x)$, and
$\sigma(x, y) \le \sigma(x, z) + \sigma(z, y)$ are easily verified. Thus, it remains only to show
that $\sigma(x, y) > 0$ if $x \ne y$. Because $\{y\}$ is a closed set, (7.11) implies that
there is a k such that
$x \in U_k \subset \overline{U}_k \subset V_k \subset \Omega \setminus \{y\}$.
Thus, $f_k(x) = 0$ and $f_k(y) = 1$. So, $\sigma(x, y) \ge 2^{-k} |f_k(x) - f_k(y)| = 2^{-k} > 0$.
The last step of the proof is to show that the topology $\mathcal{T}_\sigma$ induced
by the metric $\sigma$ is the same as $\mathcal{T}$. For fixed $y \in \Omega$, consider the function
$g_y(x) = \sigma(x, y)$. It follows from Exercise 7.89 on page 447 that $g_y$ is
continuous with respect to the topology $\mathcal{T}$. Hence, for each r > 0, the ball
$B_r^\sigma(y) = g_y^{-1}((-\infty, r))$ is $\mathcal{T}$-open. Consequently, the topology $\mathcal{T}_\sigma$ is weaker
than $\mathcal{T}$.
To prove that $\mathcal{T}$ is weaker than $\mathcal{T}_\sigma$, it suffices to show that if $O$ is
$\mathcal{T}$-open and $x \in O$, then there exists an s > 0 such that $B_s^\sigma(x) \subset O$.
Referring to (7.11), we see that we can find a positive integer m such that
$x \in U_m \subset \overline{U}_m \subset V_m \subset O$. From $2^{-m} |f_m(x) - f_m(y)| \le \sigma(x, y)$, we deduce
that, for $y \in B_{2^{-m}}^\sigma(x)$, $f_m(y) = |f_m(x) - f_m(y)| \le 2^m \sigma(x, y) < 1$. Because
$f_m$ is constantly 1 on $V_m^c$, it follows that $B_{2^{-m}}^\sigma(x) \subset V_m \subset O$.
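The series defining the metric in the proof can be explored numerically. The following Python sketch evaluates a truncated version of the sum for a small family of functions on the real line and spot-checks the metric axioms on a few sample points. The family `fs`, the helper `clamp01`, and the sample points are assumptions made for this illustration; they are not the Urysohn functions produced in the proof.

```python
from itertools import product

def clamp01(t):
    """Clamp a real number to the interval [0, 1]."""
    return max(0.0, min(1.0, t))

def make_sigma(fs):
    """Return sigma(x, y) = sum over n of 2^(-n) * |f_n(x) - f_n(y)|, with n starting at 1."""
    def sigma(x, y):
        return sum(2.0 ** (-n) * abs(f(x) - f(y)) for n, f in enumerate(fs, start=1))
    return sigma

# Hypothetical family with values in [0, 1] that separates the points of [-2, 2].
fs = [lambda x, c=c: clamp01(x - c) for c in (-2.0, -1.0, 0.0, 1.0)]
sigma = make_sigma(fs)

def check_metric_axioms(points, tol=1e-12):
    """Spot-check the metric axioms for sigma on all triples from a finite sample."""
    for x, y, z in product(points, repeat=3):
        if sigma(x, x) != 0.0:
            return False
        if sigma(x, y) != sigma(y, x):
            return False
        if sigma(x, y) > sigma(x, z) + sigma(z, y) + tol:
            return False
        if x != y and sigma(x, y) <= 0.0:
            return False
    return True
```

Positivity for distinct points is the only axiom that depends on the choice of functions; in the proof it is supplied by (7.11), and in the sketch by the fact that the chosen family separates the sample points.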
The following corollary of Urysohn’s metrization theorem provides a
sufficient condition for a space with a weak topology to be metrizable. Its
proof is left to the reader as Exercise 7.127.
COROLLARY 7.4
Let $\Omega$ be a set equipped with the weak topology induced by a family of
functions $\mathcal{F}$ satisfying the following conditions:
a)	$\mathcal{F}$ is countable.
b)	If $x, y \in \Omega$ and $x \ne y$, then there is an $f \in \mathcal{F}$ such that $f(x) \ne f(y)$.
c)	$f(\Omega)$ is metrizable for each $f \in \mathcal{F}$.
Then $\Omega$ is metrizable.
† See the paragraph following Definition 7.22 on page 448.
EXERCISES 7.8
7.117 Show that the spaces $\mathcal{R}^n$ and $\mathbb{C}^n$ are separable.
7.118 Let E be a Lebesgue measurable subset of $\mathcal{R}$. Show that
a)	$\mathcal{L}^1(E)$ and $\mathcal{L}^2(E)$ are separable.
b)	$\mathcal{L}^\infty(E)$ is not separable except in the trivial case where E has Lebesgue
measure 0.
7.119 Work Exercise 7.118 with $\mathcal{R}$ replaced by $\mathcal{R}^n$.
7.120 Show that the set A in Example 7.18(a) on page 460 is dense in
7.121 Show that the set PL in Example 7.18(b) on page 460 is dense in C([a, b]).
7.122 Refer to the first paragraph in the proof of part (b) of Proposition 7.20 on
page 461. Show that $\mathfrak{M}$ is a neighborhood basis on $\Omega$.
7.123 Let $\mathcal{V}$ denote the topology on $\mathcal{R}$ determined by the neighborhood basis
consisting of all intervals of the form [a, b). Show that $(\mathcal{R}, \mathcal{V})$ is separable
and has the Lindelöf property but is not second countable.
7.124 Show that a metric space with the Lindelöf property is second countable.
★ 7.125 A topological space is called first countable if at each point of the space,
there is a countable neighborhood basis. Show that the space in
Exercise 7.123 is first countable.
7.126 Show that the topological space in Example 7.19 on page 460 fails to be
second countable.
7.127 Prove Corollary 7.4 on page 463.
7.128 Show that the conclusion of Corollary 7.4 on page 463 fails if the hypothesis
that F is countable is omitted.
7.9 COMPACT METRIC SPACES
The idea of compactness of a set of real numbers can be formulated in
several ways; for example, compactness as the Heine-Borel property (see
next page) or compactness in terms of the Bolzano-Weierstrass condition
(see Exercise 2.45 on page 63).
In this section, we will present a definition of compactness in the
context of metric spaces that reduces to the Heine-Borel property in the case
of the real line $\mathcal{R}$. We will also prove an important theorem that provides
several alternative characterizations of compactness.
DEFINITION 7.26 Compact Set, Compact Metric Space
A subset E of a metric space $\Omega$ is called compact if every open
covering of E has a finite subcovering. If $\Omega$ itself is compact, then it is
said to be a compact metric space.
In practice, we often verify that a space is compact not directly from
the definition but, rather, by using conditions equivalent to compactness.
These conditions are generalizations of various formulations of compactness
on the line. For example, the Heine-Borel theorem asserts that a subset
of $\mathcal{R}$ is compact if and only if it is closed and bounded. We will see that we
can get an appropriate generalization of the Heine-Borel theorem by using
the right analogues of the terms closed and bounded.
The following simple example shows that a condition more subtle than
“closed and bounded” is needed to extend the Heine-Borel theorem to
metric spaces. Let $\Omega = \mathbb{Q}$, $\rho(x, y) = |x - y|$, and $E = [t, t + 1] \cap \mathbb{Q}$,
where t is any irrational number. We note that although E is a closed and
bounded subset of $\mathbb{Q}$, the collection $\{ (t + 1/n, t + 1) \cap \mathbb{Q} \}_{n=1}^\infty$ is an open
covering of E without a finite subcovering.
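To make the failure of finite subcoverings concrete, here is a small Python sketch under the assumption $t = \sqrt{2}$ (any irrational t would do). For each N it produces a rational point of E that lies in none of the sets $(t + 1/n, t + 1) \cap \mathbb{Q}$ with $n \le N$. Comparisons with $\sqrt{2}$ are done exactly by squaring rationals. The function names are hypothetical and chosen for this illustration only.

```python
from fractions import Fraction
from math import isqrt

def greater_than_sqrt2(q):
    """Exact test of q > sqrt(2) for a rational q."""
    return q > 0 and q * q > 2

def in_E(q):
    """Membership of a rational q in E = [t, t + 1] with t = sqrt(2)."""
    return greater_than_sqrt2(q) and not greater_than_sqrt2(q - 1)

def in_cover_set(q, n):
    """Membership of a rational q in the interval (t + 1/n, t + 1)."""
    return greater_than_sqrt2(q - Fraction(1, n)) and not greater_than_sqrt2(q - 1)

def uncovered_point(N):
    """Return a rational q with t < q < t + 1/N, so q is in E but in no set with n <= N."""
    m = 2 * N
    # isqrt(2 * m * m) is the integer part of m * sqrt(2), so q exceeds sqrt(2) by at most 1/m.
    return Fraction(isqrt(2 * m * m) + 1, m)
```

Because a finite subcollection involves only finitely many indices n, taking N to be the largest of them shows that the subcollection misses `uncovered_point(N)`.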
It is also not clear what replacement for bounded is appropriate for
general metric spaces. A naive approach would call a subset E of a metric
space “bounded” if the set of distances between points of E is bounded. By
Proposition 7.6 on page 426, however, any metric is equivalent to a bounded
metric. Because a set that is compact with respect to a metric $\rho$ is also
compact with respect to an equivalent metric, it follows that imposing a
boundedness condition on the distances between points of a set will be
irrelevant to the problem of characterizing compactness.
We will show that one way to generalize the Heine-Borel theorem to
arbitrary metric spaces is to replace the term closed by complete and the
term bounded by what is called totally bounded.
DEFINITION 7.27 Totally Bounded Set
A subset E of a metric space $\Omega$ is said to be totally bounded if for
each $\epsilon > 0$, there exist finitely many points $x_1, \ldots, x_n$ of E such
that $E \subset \bigcup_{j=1}^n B_\epsilon(x_j)$.
We note that a compact subset E of a metric space $\Omega$ is totally bounded
because, for each $\epsilon > 0$, the collection of balls $\{ B_\epsilon(x) : x \in E \}$ is an open
covering of E. However, total boundedness is not by itself sufficient to
guarantee compactness, as can be seen by considering again the example
where $\Omega = \mathbb{Q}$, $\rho(x, y) = |x - y|$, and $E = \mathbb{Q} \cap [t, t + 1]$. The following
theorem shows, among other things, that total boundedness and completeness
together are equivalent to compactness.
THEOREM 7.7
For a nonempty subset E of a metric space $\Omega$, the following conditions are
equivalent:
a)	E is compact.
b)	If $\{F_n\}_{n=1}^\infty$ is a sequence of closed subsets of $\Omega$ such that for each
$N \in \mathcal{N}$, we have $E \cap (\bigcap_{n=1}^N F_n) \ne \emptyset$, then $E \cap (\bigcap_{n=1}^\infty F_n) \ne \emptyset$.
c)	Every sequence of points of E has a subsequence converging to a point
of E.
d)	E is complete and totally bounded.
PROOF:
(a) $\Rightarrow$ (b): Suppose that E is compact and that $\{F_n\}_{n=1}^\infty$ is a sequence of
closed sets such that $E \cap (\bigcap_{n=1}^N F_n) \ne \emptyset$ for each $N \in \mathcal{N}$. We must prove
that $E \cap (\bigcap_{n=1}^\infty F_n) \ne \emptyset$. Suppose to the contrary. Then
$E \subset (\bigcap_{n=1}^\infty F_n)^c = \bigcup_{n=1}^\infty F_n^c$.
Therefore, $\{F_n^c\}_{n=1}^\infty$ is an open covering of E. Because E is compact, we
have that $E \subset \bigcup_{n=1}^N F_n^c$ for some $N \in \mathcal{N}$. Thus, $E \cap (\bigcap_{n=1}^N F_n) = \emptyset$. This
contradiction shows that $E \cap (\bigcap_{n=1}^\infty F_n) \ne \emptyset$.
(b) $\Rightarrow$ (c): Suppose that E satisfies (b). Let $\{x_n\}_{n=1}^\infty$ be a sequence of
points of E. The sets $F_n = \overline{\{ x_k : k \ge n \}}$, $n \in \mathcal{N}$, satisfy the hypothesis
of (b) and, hence, $\bigcap_{n=1}^\infty F_n$ contains some point $x \in E$. We will find a
subsequence of $\{x_n\}_{n=1}^\infty$ that converges to x. As $x \in F_1$, there exists an
$n_1 \ge 1$ such that $\rho(x_{n_1}, x) < 1$. Suppose that integers $n_1 < n_2 < \cdots < n_k$
have been chosen such that $\rho(x_{n_j}, x) < 1/j$ for j = 1, 2, ..., k. Because
$x \in F_{n_k + 1}$, we can find an $n_{k+1} > n_k$ such that $\rho(x_{n_{k+1}}, x) < 1/(k + 1)$.
Thus, we have defined inductively a subsequence $\{x_{n_k}\}_{k=1}^\infty$ of $\{x_n\}_{n=1}^\infty$ that
converges to the point x of E.
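The inductive choice of indices in the step from (b) to (c) can be imitated in code. The Python sketch below assumes that the cluster point x is already known and that the sequence is given as a function of its index; it then searches forward for indices $n_1 < n_2 < \cdots$ with $\rho(x_{n_k}, x) < 1/k$. The sample sequence $x_n = (-1)^n (1 + 1/n)$ with cluster point 1 and the `search_limit` safeguard are assumptions made for this illustration.

```python
def extract_subsequence(x, limit, dist, steps, search_limit=10**6):
    """Choose indices n_1 < n_2 < ... < n_steps with dist(x(n_k), limit) < 1/k."""
    indices = []
    n = 0
    for k in range(1, steps + 1):
        n += 1  # force n_k to be strictly larger than n_{k-1}
        while dist(x(n), limit) >= 1.0 / k:
            n += 1
            if n > search_limit:
                raise ValueError("no suitable index found; is `limit` a cluster point?")
        indices.append(n)
    return indices

def sample_sequence(n):
    """Hypothetical example: x_n = (-1)^n (1 + 1/n), which clusters at 1 and at -1."""
    return (-1) ** n * (1 + 1 / n)

def abs_dist(a, b):
    """Usual metric on the real line."""
    return abs(a - b)

indices = extract_subsequence(sample_sequence, 1.0, abs_dist, 5)
```

For this sample sequence only the even-indexed terms approach 1, so the search skips every odd index.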
(c) $\Rightarrow$ (d): Suppose E satisfies (c). First we show that E is totally
bounded. Let $\epsilon > 0$. If $x_1 \in E$, then either $E \subset B_\epsilon(x_1)$ or there is an
$x_2 \in E \setminus B_\epsilon(x_1)$. In the former case, we have found an open ball of radius $\epsilon$
that covers E. In the latter case, we again have two possibilities: either
$E \subset B_\epsilon(x_1) \cup B_\epsilon(x_2)$ or there is an $x_3 \in E \setminus (B_\epsilon(x_1) \cup B_\epsilon(x_2))$.
Clearly, we can continue with this line of reasoning to obtain either a
finite collection of balls of radius $\epsilon$ covering E or a sequence $\{x_n\}_{n=1}^\infty \subset E$
satisfying $x_{n+1} \notin \bigcup_{j=1}^n B_\epsilon(x_j)$ for all $n \in \mathcal{N}$. The latter case, however,
contradicts (c) because it implies $\rho(x_n, x_m) \ge \epsilon$ for $m \ne n$ which, in turn,
implies that $\{x_n\}_{n=1}^\infty$ cannot have a convergent subsequence. Hence, E is
totally bounded.
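The selection procedure just described is a greedy algorithm, and for a finite sample of points it can be run directly. The following Python sketch is an illustration under that finiteness assumption (the proof itself works with an arbitrary set E); the function names are hypothetical. It picks a new center whenever the current point lies in none of the balls chosen so far.

```python
def greedy_epsilon_net(points, eps, dist):
    """Greedily pick centers so that every point lies within eps of some center."""
    centers = []
    for p in points:
        # p becomes a new center only if it is outside every ball B_eps(c) chosen so far.
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

def is_epsilon_net(points, centers, eps, dist):
    """Check that the open balls of radius eps about the centers cover the points."""
    return all(any(dist(p, c) < eps for c in centers) for p in points)

def abs_dist(a, b):
    """Usual metric on the real line."""
    return abs(a - b)

sample = [i / 10 for i in range(11)]          # eleven points of [0, 1]
centers = greedy_epsilon_net(sample, 0.25, abs_dist)
```

By construction, distinct centers are at distance at least eps from one another, which is exactly the property the proof uses to contradict (c) when the procedure fails to stop.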
Next we show that E is complete. But this follows easily from Exer-
cise 7.55 on page 438, which states that a Cauchy sequence with a conver-
gent subsequence is convergent.
(d) $\Rightarrow$ (c): We will use a famous argument due to Georg Cantor. Let
$\{x_n\}_{n=1}^\infty$ be a sequence of points of E. Since E is totally bounded, we can
find a finite number of open balls of radius 1/2 that cover E. It follows
that one of those balls must contain $x_n$ for infinitely many n. Hence, it is
possible to find a subsequence of $\{x_n\}_{n=1}^\infty$ whose terms are all contained in
a single ball. It is convenient to denote the nth term of this subsequence
by $x_{[1,n]}$. Then we have $\rho(x_{[1,n]}, x_{[1,m]}) < 1$ for $n, m \in \mathcal{N}$. Similarly,
by covering E with finitely many open balls of radius 1/4, we can find a
subsequence $\{x_{[2,n]}\}_{n=1}^\infty$ of $\{x_{[1,n]}\}_{n=1}^\infty$ such that $\rho(x_{[2,n]}, x_{[2,m]}) < 1/2$ for
$n, m \in \mathcal{N}$.
Continuing inductively, we obtain an infinite sequence of subsequences
$\{\{x_{[k,n]}\}_{n=1}^\infty\}_{k=1}^\infty$ such that $\{x_{[k+1,n]}\}_{n=1}^\infty$ is a subsequence of $\{x_{[k,n]}\}_{n=1}^\infty$
and $\rho(x_{[k,n]}, x_{[k,m]}) < 1/k$ for $m, n \in \mathcal{N}$. It follows that $\{x_{[n,n]}\}_{n=1}^\infty$ is a
subsequence of $\{x_n\}_{n=1}^\infty$ satisfying $\rho(x_{[k,k]}, x_{[j,j]}) < \max\{1/j, 1/k\}$. Thus,
$\{x_{[n,n]}\}_{n=1}^\infty$ is a Cauchy subsequence of $\{x_n\}_{n=1}^\infty$ and, so by completeness,
that subsequence converges to a point of E.
(d) $\Rightarrow$ (a): Let E satisfy (d). Suppose for the moment we can show that
$(E, \rho)$ is separable. Then, by Proposition 7.20 (page 461) and
Proposition 7.21 (page 462), E has the Lindelöf property. Thus, if $\mathcal{O}$ is an open
covering of E, then it has a countable subcovering $\{O_n\}_{n=1}^\infty$.
We claim that $\{O_n\}_{n=1}^\infty$ has a finite subcovering. For otherwise, we
can choose an element $x_n \in E \setminus \bigcup_{j=1}^n O_j$ for n = 1, 2, ... . Since we have
already shown that (c) and (d) are equivalent, it follows that $\{x_n\}_{n=1}^\infty$ has a
subsequence $\{x_{n_k}\}_{k=1}^\infty$ that converges to a point $x \in E$. Because $\{O_n\}_{n=1}^\infty$ is
a covering of E, we have $x \in O_m$ for some m. Because $\lim_{k \to \infty} x_{n_k} = x$
and, because $O_m$ is open, it follows that there is a k such that $n_k \ge m$ and
$x_{n_k} \in O_m$. But this is a contradiction as $x_{n_k} \in E \setminus \bigcup_{j=1}^{n_k} O_j \subset E \setminus O_m$.
To complete the proof of (d) $\Rightarrow$ (a), we need to show that $(E, \rho)$ is
separable. If k is a positive integer, then, because E is totally bounded,
there are points $x_{j,k} \in E$, $j = 1, 2, \ldots, m_k$, such that $E \subset \bigcup_{j=1}^{m_k} B_{1/k}(x_{j,k})$.
Let $A = \{ x_{j,k} : 1 \le j \le m_k,\ k \in \mathcal{N} \}$. Then A is countable.
Now let $B_\epsilon(x)$ be an open ball centered at an element $x \in E$. Then,
choosing $1/k < \epsilon$, we can find a j such that $\rho(x, x_{j,k}) < 1/k$. It follows
that $x_{j,k} \in B_\epsilon(x)$. Thus, every open ball around a point of E contains a
point of A. Hence, A is a countable dense subset of E and, so, $(E, \rho)$ is
separable.
In the last two paragraphs of the proof of Theorem 7.7, we established
the following result.
COROLLARY 7.5
A totally bounded metric space is separable.
EXAMPLE 7.20 Illustrates Theorem 7.7
Let $\|\ \|$ denote one of the norms defined in Example 7.4 (or 7.5) on page 422.
By Proposition 7.10 on page 435 and Exercise 7.59 (or 7.60) on page 438, a
subset E of $\mathcal{R}^n$ (or $\mathbb{C}^n$) is complete if and only if it is closed. Exercise 7.129
shows that E is totally bounded if and only if E is bounded, that is, if and
only if $\sup\{ \|x\| : x \in E \} < \infty$. From Theorem 7.7, we can now deduce
the classical Heine-Borel theorem: A subset of $\mathcal{R}^n$ (or $\mathbb{C}^n$) is compact if
and only if it is closed and bounded.	□
A set E in a normed space $(\Omega, \|\ \|)$ is called bounded if
$\sup\{ \|x\| : x \in E \} < \infty$.
Example 7.20 suggests that in a normed space, total boundedness might
be equivalent to boundedness. That this is not correct is shown by the
following example.
EXAMPLE 7.21 A Noncompact, Closed and Bounded Set
Refer to Exercise 7.48(d) on page 437. The closed unit ball $\overline{B}_1(0)$ in the
space $\ell^2(\mathcal{N})$ is closed and bounded. For each $n \in \mathcal{N}$, let $e_n(k) = 1$ if k = n,
and 0 if $k \ne n$. As $\|e_n - e_m\|_2 = \sqrt{2}$ for $n \ne m$, it follows that no ball
of radius 1/2 can contain more than one $e_n$. Thus, the sequence $\{e_n\}_{n=1}^\infty$
of elements of $\overline{B}_1(0)$ cannot be contained in a finite union of balls of
radius 1/2. Hence, $\overline{B}_1(0)$ is not totally bounded and, so, by Theorem 7.7(d),
is not compact.	□
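The key computation in Example 7.21, that distinct unit vectors are at distance $\sqrt{2}$, can be checked numerically. The Python sketch below works with finite truncations of the sequences $e_n$, which is an assumption of the illustration (the vectors of the example have infinitely many coordinates, all but one equal to 0, so truncation does not change the distances). The function names are hypothetical.

```python
import math

def unit_vector(n, length):
    """First `length` coordinates of e_n: 1 in position n, 0 elsewhere (positions start at 1)."""
    return [1.0 if k == n else 0.0 for k in range(1, length + 1)]

def l2_distance(x, y):
    """Euclidean distance between two finite coordinate lists of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pairwise_distances(count):
    """Distances between all pairs of distinct vectors among e_1, ..., e_count."""
    vectors = [unit_vector(n, count) for n in range(1, count + 1)]
    return [l2_distance(vectors[i], vectors[j])
            for i in range(count) for j in range(i + 1, count)]
```

Since any two points of a ball of radius 1/2 are less than 1 apart, and $\sqrt{2} > 1$, each such ball contains at most one of the vectors.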
Properties of Compact Metric Spaces
Next we discuss some useful properties of compact metric spaces. Proofs
will be left for the exercises.
DEFINITION 7.28 The Lebesgue Number of a Covering
Let $\mathcal{O}$ be an open covering of a metric space $(\Omega, \rho)$. A number $\lambda > 0$
is called a Lebesgue number of $\mathcal{O}$ if for each $x \in \Omega$, the ball $B_\lambda(x)$
is entirely contained in some member of $\mathcal{O}$.
THEOREM 7.8
Let $(\Omega, \rho)$ be a compact metric space. Then every open covering of $\Omega$ has
a Lebesgue number.
PROOF: See Exercise 7.137.	
DEFINITION 7.29 Uniformly Continuous Function
Let $(\Omega, \rho)$ and $(\Lambda, \sigma)$ be metric spaces. A function $f : \Omega \to \Lambda$ is called
uniformly continuous if for each $\epsilon > 0$, there is a $\delta > 0$ such that
$\sigma(f(x), f(y)) < \epsilon$ whenever $\rho(x, y) < \delta$.
Note: A crucial element of Definition 7.29 is that $\delta$ depends only on $\epsilon$. It
has no dependence on x and y.
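The role of the quantifiers in Definition 7.29 can be seen in a small numerical experiment. The Python sketch below uses two functions chosen for this illustration and not taken from the text: $f(x) = x^2$ on [0, 1], for which $\delta = \epsilon/2$ works at every point because $|x^2 - y^2| = |x - y|\,|x + y| \le 2|x - y|$, and $f(x) = 1/x$ on (0, 1), for which no single $\delta$ works even for $\epsilon = 1$.

```python
def witness_not_uniform(delta):
    """For f(x) = 1/x on (0, 1), return x, y with |x - y| < delta but |f(x) - f(y)| >= 1."""
    x = min(delta, 0.5)
    y = x / 2.0
    # |x - y| = x/2 < delta, while 1/y - 1/x = 1/x >= 2.
    return x, y

def square_is_controlled(eps, samples=200):
    """Check on a grid in [0, 1] that |x - y| < eps/2 forces |x^2 - y^2| < eps."""
    delta = eps / 2.0
    grid = [i / samples for i in range(samples + 1)]
    return all(abs(x * x - y * y) < eps
               for x in grid for y in grid if abs(x - y) < delta)
```

The grid check is only a finite spot check, not a proof. Note that (0, 1) is not compact, so the second example does not conflict with the statement that continuous functions on compact metric spaces are uniformly continuous.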
THEOREM 7.9
Suppose $(\Omega, \rho)$ and $(\Lambda, \sigma)$ are metric spaces, $\Omega$ is compact, and $f : \Omega \to \Lambda$
is continuous. Then f is uniformly continuous.
PROOF: See Exercise 7.138.	
EXERCISES 7.9
7.129	Consider $\mathcal{R}^n$ equipped with any one of the norms discussed in Example 7.4
on page 422.
a)	Show that a subset E of $\mathcal{R}^n$ is totally bounded if and only if it is
bounded, that is, if and only if the set of norms of elements of E is
bounded as a subset of $\mathcal{R}$.
b)	Show that part (a) holds when $\mathcal{R}^n$ is replaced by $\mathbb{C}^n$. (Refer to
Example 7.5 on page 422.)
7.130	In a metric space $\Omega$, let $\{x_n\}_{n=1}^\infty$ be a sequence such that $\lim_{n \to \infty} x_n = x$.
Show that the set $\{ x_n : n = 1, 2, \ldots \} \cup \{x\}$ is compact.
7.131	Compactness can also be expressed in terms of the Bolzano-Weierstrass
property. Let $(\Omega, \rho)$ be a metric space and $E \subset \Omega$. A point $x \in \Omega$ is called
an accumulation point of E if for each $\epsilon > 0$, there is a $y \in E$ such
that $0 < \rho(x, y) < \epsilon$. Prove that E is compact if and only if every infinite
subset of E has an accumulation point that is a member of E. Hint: Show
that this condition is equivalent to (c) of Theorem 7.7.
7.132	Let $y \in \ell^2(\mathcal{N})$ and $K = \{ x \in \ell^2(\mathcal{N}) : |x(j)| \le |y(j)| \text{ for each } j \in \mathcal{N} \}$.
Show that K is a compact subset of $\ell^2(\mathcal{N})$.
7.133	Refer to Exercise 7.47 on page 437. Let K be a compact subset of a metric
space $(\Omega, \rho)$ and let $x \in \Omega$. Show that there is an element $y \in K$ such that
$\rho(x, y) = \rho(x, K)$.
7.134	Refer to Exercise 7.47 on page 437. In a metric space $(\Omega, \rho)$, let F and K
be, respectively, closed and compact subsets such that $F \cap K = \emptyset$. Show
that $\rho(F, K) > 0$.
7.135	Consider the normed space $(C([a, b]), \|\ \|_\infty)$ discussed in Example 7.8 on
page 425. Show that the closed unit ball $\overline{B}_1(0)$ is not compact.
7.136	Suppose that $(\Omega, \rho)$ and $(\Lambda, \sigma)$ are metric spaces. Let $\Omega \times \Lambda$ be given the
product topology.
a)	Show that $\Omega \times \Lambda$ is metrizable.
b)	Show that if K and H are compact subsets of $\Omega$ and $\Lambda$, respectively,
then $K \times H$ is a compact subset of $\Omega \times \Lambda$.
7.137	Prove Theorem 7.8.
7.138	Prove Theorem 7.9.
7.139	Let $(\Omega, \rho)$ and $(\Lambda, \sigma)$ be metric spaces and let $f : \Omega \to \Lambda$ be continuous.
Show that if K is a compact subset of $\Omega$, then $f(K)$ is a compact subset
of $\Lambda$. In words, the continuous image of a compact space is compact.
7.140	Prove that a continuous real-valued function on a compact metric space
attains maximum and minimum values. Hint: See Exercise 7.139.
7.10 COMPACT TOPOLOGICAL SPACES
In Section 7.9, we examined compact metric spaces. We are now ready
to discuss compactness in the setting of arbitrary topological spaces. Our
main goal is to prove a generalization of Theorem 7.7 on page 466, following
which, we will derive some useful properties of compact topological spaces.
DEFINITION 7.30 Compact Set, Compact Topological Space
A subset E of a topological space $\Omega$ is called compact if every open
covering of E has a finite subcovering. If $\Omega$ itself is compact, then it
is said to be a compact topological space.
Remark: Certainly, any compact metric space satisfies Definition 7.30.
Later, we will give examples of nonmetrizable compact topological spaces.
We note that E is a compact subset of $\Omega$ if and only if E equipped
with the relative topology is a compact topological space. We observe also
that the union of a finite collection of compact sets is compact.
By studying conditions (a)-(d) in Theorem 7.7, we find that only (d)
involves the use of a metric in a crucial way — the conditions (a)-(c) have
natural generalizations to the setting of any topological space.
We can generalize condition (c) by passing from sequences to nets.
And we can generalize condition (b) by introducing the finite intersection
property: A collection $\mathcal{C}$ of subsets of a set $\Omega$ is said to have the finite
intersection property if the intersection of each finite subcollection of $\mathcal{C}$
is nonempty.
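For a finite collection of finite sets, the finite intersection property can be tested by brute force, which makes the definition easy to experiment with. The Python sketch below is an illustration under that finiteness assumption (the definition itself allows arbitrary collections); the function name is hypothetical.

```python
from itertools import combinations

def has_finite_intersection_property(collection):
    """Check that every nonempty finite subcollection has nonempty intersection."""
    sets = [frozenset(s) for s in collection]
    for r in range(1, len(sets) + 1):
        for sub in combinations(sets, r):
            if not frozenset.intersection(*sub):
                return False
    return True

# The sets in `pairwise_only` meet two at a time, but all three have no common point.
pairwise_only = [{1, 2}, {2, 3}, {1, 3}]
common_point = [{1, 2}, {2, 3}, {2, 4}]
```

The first sample collection shows that pairwise nonempty intersections are not enough: the property must hold for every finite subcollection.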
THEOREM 7.10
The following conditions on a topological space $\Omega$ are equivalent:
a) $\Omega$ is compact.
b) If a collection $\mathcal{C}$ of closed subsets of $\Omega$ has the finite intersection
property, then $\bigcap_{F \in \mathcal{C}} F \ne \emptyset$.
c) Every net in $\Omega$ has a convergent subnet.
PROOF: The equivalence of (a) and (b) is left for Exercise 7.141.
(b) $\Rightarrow$ (c): Suppose (b) holds. Let $\{x_\iota\}_{\iota \in I}$ be a net in $\Omega$ having index set $I$
with relation $\preceq$. For each index $\iota$, let $F_\iota = \overline{\{ x_\eta : \iota \preceq \eta \}}$. We claim that
the collection $\{ F_\iota : \iota \in I \}$ of closed subsets of $\Omega$ has the finite intersection
property. For, if $\iota_1, \iota_2, \ldots, \iota_n$ are indices, then, because $I$ is directed, there
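is an index $\iota_0$ such that $\iota_j \preceq \iota_0$ for each j = 1, 2, ..., n. It follows that
$F_{\iota_0} \subset F_{\iota_j}$ for each j and, so, $\bigcap_{j=1}^n F_{\iota_j} \ne \emptyset$. Hence, by (b), $\bigcap_{\iota \in I} F_\iota$ contains
an element x.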
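We will construct a subnet of $\{x_\iota\}_{\iota \in I}$ converging to x. Let $\mathfrak{N}$ denote
the collection of all open sets containing x. For each $U \in \mathfrak{N}$ and $\iota \in I$,
we have $\{ x_\eta : \iota \preceq \eta \} \cap U \ne \emptyset$. Applying the axiom of choice, we obtain
a function $f : \mathfrak{N} \times I \to I$ such that $\iota \preceq f(U, \iota)$ and $x_{f(U,\iota)} \in U$ for each
pair $(U, \iota)$. We define a relation $\trianglelefteq$ on $\mathfrak{N} \times I$ as follows:
$(U, \iota) \trianglelefteq (V, \eta)$ if $f(U, \iota) \preceq f(V, \eta)$ and $V \subset U$.
It is not hard to show that $\mathfrak{N} \times I$ is a directed set with respect to the
relation $\trianglelefteq$. (See Exercise 7.142.) Therefore, the net defined on $\mathfrak{N} \times I$ by
$y_{(U,\iota)} = x_{f(U,\iota)}$ is a subnet of $\{x_\iota\}_{\iota \in I}$.
All that remains is to show that $\lim y_{(U,\iota)} = x$. Given $W \in \mathfrak{N}$, we
choose any $\iota_0 \in I$. If $(W, \iota_0) \trianglelefteq (U, \iota)$, then $y_{(U,\iota)} = x_{f(U,\iota)} \in U \subset W$. It
follows from the definition of convergence of nets that $\lim y_{(U,\iota)} = x$.
(c) $\Rightarrow$ (b): Suppose that (c) holds and that $\mathcal{C}$ is a collection of closed sets
having the finite intersection property. If $\mathcal{C}^*$ is the collection consisting
of finite intersections of members of $\mathcal{C}$, then, clearly, $\bigcap_{C \in \mathcal{C}} C = \bigcap_{F \in \mathcal{C}^*} F$.
Thus, to show that (c) implies (b), it is enough to show that $\bigcap_{F \in \mathcal{C}^*} F \ne \emptyset$.
The collection $\mathcal{C}^*$ is a directed set with respect to the relation $\preceq$
defined by $F_1 \preceq F_2$ if $F_2 \subset F_1$. Applying the axiom of choice, we obtain a
net $\{x_F\}_{F \in \mathcal{C}^*}$, where $x_F \in F$ for each $F \in \mathcal{C}^*$.
From (c), we know that there is a subnet $\{x_{F_\kappa}\}_{\kappa \in K}$, with index set $K$
and corresponding relation $\trianglelefteq$, having a limit x. Given an $F \in \mathcal{C}^*$, there
is an $\eta \in K$ such that $F \preceq F_\eta$. Thus, $x_{F_\kappa} \in F$ when $\eta \trianglelefteq \kappa$. Because F is
closed, Proposition 7.12 on page 442 implies that $x \in F$. As F was chosen
arbitrarily from $\mathcal{C}^*$, we have that $x \in \bigcap_{F \in \mathcal{C}^*} F$.	■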
Properties of Compact Topological Spaces
From Theorem 7.10, we can derive one of the most useful properties of
compact spaces. In words, it states that the continuous image of a compact
space is compact.
THEOREM 7.11
Let $\Omega$ be a compact topological space and $f : \Omega \to \Lambda$ be a continuous
function. Then $f(\Omega)$ is a compact subset of $\Lambda$.
PROOF: Let $\{y_\iota\}_{\iota \in I}$ be a net in $f(\Omega)$. For each $\iota \in I$, we choose an
$x_\iota \in \Omega$ such that $f(x_\iota) = y_\iota$, thus obtaining a net $\{x_\iota\}_{\iota \in I}$ in $\Omega$. By
Theorem 7.10, there is a subnet $\{x_{\iota_\kappa}\}_{\kappa \in K}$ having a limit $x \in \Omega$. It follows
from Theorem 7.1 on page 443 that $\lim y_{\iota_\kappa} = \lim f(x_{\iota_\kappa}) = f(x)$. Noting
that $f(x) \in f(\Omega)$, we conclude by applying Theorem 7.10 again that $f(\Omega)$ is
compact.
The following corollary of Theorem 7.11 is left to the reader as an
exercise. (See Exercise 7.143.)
COROLLARY 7.6
If $\Omega$ is compact and f is a real-valued continuous function on $\Omega$, then there
exist points $x_1, x_2 \in \Omega$ such that $f(x_1) = \sup f(\Omega)$ and $f(x_2) = \inf f(\Omega)$.
Next, we discuss relationships between compactness and separation
properties. The first result is left to the reader as Exercise 7.144.
THEOREM 7.12
a)	A closed subset of a compact space is compact.
b)	A compact subset of a Hausdorff space is closed.
COROLLARY 7.7
Let $\Omega$ be a compact space and $\Lambda$ a Hausdorff space. Suppose that $f : \Omega \to \Lambda$
is continuous, one-to-one, and onto. Then $f^{-1}$ is continuous and, so, f is
a homeomorphism.
PROOF: According to Theorem 7.1 on page 443, it suffices to prove that
$(f^{-1})^{-1}(F) = f(F)$ is closed in $\Lambda$ when F is closed in $\Omega$. But, if F is closed
in $\Omega$, then, by Theorem 7.12(a), F is compact. Hence, $f(F)$ is compact
by Theorem 7.11. Applying Theorem 7.12(b), we conclude that $f(F)$ is
closed.
The following corollary is also left to the reader as an exercise. See
Exercise 7.145.
COROLLARY 7.8
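Let $\mathcal{T}$ and $\mathcal{U}$ be topologies on a set $\Omega$ such that $\mathcal{T}$ is weaker than $\mathcal{U}$. If
$(\Omega, \mathcal{T})$ is Hausdorff and $(\Omega, \mathcal{U})$ is compact, then $\mathcal{T} = \mathcal{U}$.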
THEOREM 7.13
A compact Hausdorff space is a normal space.
PROOF: Suppose that $\Omega$ is a compact Hausdorff space and that A and B
are disjoint closed subsets of $\Omega$. Because $\Omega$ is compact, Theorem 7.12(a)
implies that A and B are also compact.
We must find disjoint open sets U and V containing A and B,
respectively. Let b be a fixed, but arbitrary, element of B. Since $\Omega$ is a Hausdorff
space, we can, for each $a \in A$, find disjoint open sets $O_a$ and $P_a$ containing a
and b, respectively. The collection $\{ O_a : a \in A \}$ is an open covering of A.
As A is compact, there is a finite subcovering $\{ O_{a_j} : j = 1, 2, \ldots, m \}$.
Let $U_b = \bigcup_{j=1}^m O_{a_j}$ and $V_b = \bigcap_{j=1}^m P_{a_j}$. Then $U_b$ is an open set
containing A, $V_b$ is an open set containing b, and $U_b \cap V_b = \emptyset$. The open
covering $\{ V_b : b \in B \}$ of B has a finite subcovering $\{ V_{b_k} : k = 1, 2, \ldots, n \}$.
Let $V = \bigcup_{k=1}^n V_{b_k}$ and $U = \bigcap_{k=1}^n U_{b_k}$. Then U and V are disjoint open
sets containing A and B, respectively.
The next corollary follows immediately from Theorem 7.13 and Theo-
rem 7.6 on page 462.
COROLLARY 7.9
A second countable compact Hausdorff space is metrizable.
It is useful to note that Theorem 7.13 together with Urysohn’s lemma
(Theorem 7.2 on page 450) show that compact Hausdorff spaces carry an
abundance of real-valued continuous functions.
EXERCISES 7.10
7.141	Prove the equivalence of (a) and (b) in Theorem 7.10 on page 471.
7.142	Prove that the set $\mathfrak{N} \times I$, defined in the proof of (b) $\Rightarrow$ (c) in Theorem 7.10
on page 471, is directed with respect to the relation $\trianglelefteq$ defined there.
7.143	Prove Corollary 7.6 on page 473.
7.144	Prove Theorem 7.12 on page 473.
7.145	Prove Corollary 7.8 on page 473.
7.146	Let $\Omega$ be a compact Hausdorff space. Suppose there is a sequence $\{f_n\}_n$
of continuous real-valued functions on $\Omega$ having the following property:
If $x \ne y$, then there is an n such that $f_n(x) \ne f_n(y)$. Prove that $\Omega$ is
metrizable.
7.147	Refer to Exercise 7.125 on page 464. Show that, in a first countable com-
pact Hausdorff space, every sequence has a convergent subsequence.
7.148	Suppose that $\Omega$ and $\Lambda$ are compact spaces and that $\Omega \times \Lambda$ is given the
product topology. Show that $\Omega \times \Lambda$ is compact.
★ 7.149 Let $\Omega$ be a topological space. A function $f : \Omega \to [-\infty, \infty)$ is said to be
upper semicontinuous if $f^{-1}([-\infty, r))$ is open for each real number r;
a function g is said to be lower semicontinuous if $-g$ is upper
semicontinuous.
a)	Show that an upper semicontinuous function on a compact space is
bounded above and attains the sup of its range.
b)	Show that a lower semicontinuous function on a compact space is
bounded below and attains the inf of its range.
★ 7.150 Refer to Exercise 7.149. Suppose that f is an upper semicontinuous
function on a compact Hausdorff space $\Omega$.
a)	Prove that $f(x) = \inf\{ h(x) : h \text{ is continuous and } f \le h \}$ for each
$x \in \Omega$.
b)	State and prove an analogous result to part (a) for lower semicontinuous
functions.
7.151	Refer to Exercise 7.149, Definition 6.6 on page 331, and Example 7.8 on
page 425. Show that $f \mapsto V_a^b f$ defines a lower semicontinuous function on
the normed space $(C([a, b]), \|\ \|_\infty)$.
7.11	LOCALLY COMPACT SPACES
The space of real numbers, $\mathcal{R}$, is not compact. We can see this directly
by noting that the open covering $\{ (-n, n) : n \in \mathcal{N} \}$ of $\mathcal{R}$ has no finite
subcovering; or we can deduce it from the Heine-Borel theorem.
Although $\mathcal{R}$ is not compact, compactness plays an important role in
its analysis. This is because every element of $\mathcal{R}$ is contained in an open
set having compact closure. Many topological spaces share with $\mathcal{R}$ this
important property, which is called local compactness.
DEFINITION 7.31 Locally Compact Topological Space
A topological space $\Omega$ is said to be locally compact if for each $x \in \Omega$
there is an open set W such that $x \in W$ and $\overline{W}$ is compact.
It is not hard to see that the spaces $\mathcal{R}^n$ and $\mathbb{C}^n$ in Examples 7.4 and 7.5,
respectively, on page 422 are locally compact. The spaces $\mathcal{L}^1(\mu)$, $\mathcal{L}^2(\mu)$,
and $\mathcal{L}^\infty(\mu)$ of Example 7.6 on page 423 are not locally compact except in
certain special instances. (See Exercise 7.152.)
In most cases of interest, the property of local compactness appears in
conjunction with the Hausdorff property. The next several results provide
some important properties of locally compact Hausdorff spaces.
PROPOSITION 7.22
Let $O$ be an open subset of a locally compact Hausdorff space $\Omega$.
a)	If $x \in O$, then there exists an open set V such that $\overline{V}$ is compact and
$x \in V \subset \overline{V} \subset O$.
b)	If K is compact and $K \subset O$, then there is an open set W such that
$\overline{W}$ is compact and $K \subset W \subset \overline{W} \subset O$.
PROOF:
a)	Let W be an open set containing x such that $\overline{W}$ is compact. By
Theorem 7.13 on page 474, $\overline{W}$ equipped with the relative topology is a
normal space.
We note that, in the relative topology of the compact space $\overline{W}$,
$W \cap O$ is open and $\{x\}$ is closed. Hence, by Proposition 7.14 on page 449,
there is a set V having the following properties: $x \in V$, V is open in
the relative topology of $\overline{W}$, and the closure of V in the relative topology
of $\overline{W}$ is contained in $W \cap O$. Because $\overline{W}$ is closed in $\Omega$, it follows that
the closure of V in the relative topology of $\overline{W}$ coincides with its closure
in $\Omega$. Hence,
$x \in V \subset \overline{V} \subset W \cap O \subset O$.
The proof of (a) will be complete if we can show that V is open as a
subset of $\Omega$. By the definition of relative topology, there is an open
subset $U \subset \Omega$ with $V = U \cap \overline{W}$. Then
$V = V \cap W = U \cap \overline{W} \cap W = U \cap W$.
Thus, V is open in $\Omega$.
b)	By part (a) we can, for each $x \in K$, find an open set $V_x$ whose closure
is compact and satisfies $x \in V_x \subset \overline{V}_x \subset O$. Because $K \subset \bigcup_{x \in K} V_x$ and
K is compact, we can find finitely many points $x_1, x_2, \ldots, x_n$ of K such
that $K \subset \bigcup_{j=1}^n V_{x_j}$. Letting $W = \bigcup_{j=1}^n V_{x_j}$, we obtain
$K \subset W \subset \overline{W} = \bigcup_{j=1}^n \overline{V}_{x_j} \subset O$.
As $\overline{W}$ is a finite union of compact sets, it is compact.
Using Proposition 7.22, we can prove a version of Urysohn’s lemma
(Theorem 7.2 on page 450) for locally compact Hausdorff spaces.
THEOREM 7.14
Suppose that $\Omega$ is a locally compact Hausdorff space and that $O$ and K
are, respectively, open and compact subsets of $\Omega$ such that $K \subset O$. Then
there is a continuous function $f : \Omega \to [0,1]$ such that $f(x) = 1$ for $x \in K$
and $f(x) = 0$ for $x \in O^c$.
PROOF: By applying Proposition 7.22 twice, we obtain open sets $W_1$
and $W_2$ such that $\overline{W}_2$ is compact and
$K \subset W_1 \subset \overline{W}_1 \subset W_2 \subset \overline{W}_2 \subset O$.
By Theorem 7.12 on page 473, K is a closed subset of $\overline{W}_2$. Because the
space $\overline{W}_2$ equipped with the relative topology is normal (Theorem 7.13
on page 474), it follows from Urysohn’s lemma that there is a continuous
function $g : \overline{W}_2 \to [0,1]$ with g equal to 1 on K and 0 on $\overline{W}_2 \setminus W_1$.
We now define a function $f : \Omega \to [0,1]$ by letting f be equal to g on $\overline{W}_2$
and equal to 0 on $\Omega \setminus \overline{W}_2$. It is left as an exercise for the reader to show
that f is continuous on $\Omega$. (See Exercise 7.154.)
Theorem 7.14 is the basis of an important construction related to
coverings of compact subsets of locally compact Hausdorff spaces. To describe
this construction, it is helpful to introduce the following terminology. Let
f be a complex-valued function on a topological space $\Omega$. The closure of
the set of points where f is not 0 is called the support of f and is denoted
by supp f. Hence, $\operatorname{supp} f = \overline{\{ x \in \Omega : f(x) \ne 0 \}}$.
THEOREM 7.15 Partition of Unity
Let $\Omega$ be a locally compact Hausdorff space, K a compact subset of $\Omega$,
and $\mathcal{O}$ an open covering of K. Then there are finitely many continuous
real-valued functions $f_1, f_2, \ldots, f_n$ on $\Omega$ such that:
a) $f_j \ge 0$ for each j.
b)	For each j, there is an $O_j \in \mathcal{O}$ such that $\operatorname{supp} f_j \subset O_j$.
c)	$\sum_{j=1}^n f_j(x) = 1$ for each $x \in K$.
d)	$\sum_{j=1}^n f_j(x) \le 1$ for each $x \in \Omega$.
PROOF: For each $x \in K$ we choose $O_x \in \mathcal{O}$ such that $x \in O_x$. By
Proposition 7.22, there is an open set $V_x$ such that $\overline{V}_x$ is compact and
$x \in V_x \subset \overline{V}_x \subset O_x$. By Theorem 7.14, there is, for each $x \in K$, a
continuous function $g_x$ such that $0 \le g_x \le 1$, $g_x(x) = 1$, and $g_x(y) = 0$
for $y \in V_x^c$. We note that $\operatorname{supp} g_x \subset \overline{V}_x \subset O_x$.
Since $\{ g_x^{-1}((0, \infty)) : x \in K \}$ is an open covering of K, there are a finite
number of points $x_1, \ldots, x_n$ of K such that $K \subset \bigcup_{j=1}^n g_{x_j}^{-1}((0, \infty))$.
Hence, the function $g = \sum_{j=1}^n g_{x_j}$ is strictly positive on K.
By Corollary 7.6 on page 473, we have $a = \inf g(K) > 0$. The closed
set $F = g^{-1}((-\infty, a/2])$ is disjoint from K. Thus, again by Theorem 7.14,
we find that there is a continuous function h such that $0 \le h \le 1$, $h(x) = 0$
for $x \in K$, and $h(x) = 1$ for $x \in F$. Because the function $g + h$ is positive
everywhere on $\Omega$, it follows that the functions
$f_j = g_{x_j} (1 + h/n) / (g + h)$, $j = 1, 2, \ldots, n$,
are continuous. It is easy to check that the functions $f_1, f_2, \ldots, f_n$ satisfy
conditions (a)-(d).
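The construction in the proof can be carried out explicitly on the real line. The following Python sketch is a hedged illustration, not the proof itself: it assumes $K = [0, 1]$ with the two-set open covering $O_1 = (-1, 0.75)$, $O_2 = (0.25, 2)$, uses trapezoidal bump functions in place of the functions supplied by Theorem 7.14, and normalizes by $f_j = g_j (1 + h/n)/(g + h)$, one choice of normalization for which the sum equals 1 on K and is at most 1 elsewhere. All names and numerical choices are assumptions of the sketch.

```python
def trapezoid(x, a, b, c, d):
    """Continuous bump: 0 outside (a, d), 1 on [b, c], linear on [a, b] and [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# g_1 is supported in [-0.9, 0.7], inside O_1; g_2 is supported in [0.3, 1.9], inside O_2.
G_FUNCS = [
    lambda x: trapezoid(x, -0.9, -0.5, 0.5, 0.7),
    lambda x: trapezoid(x, 0.3, 0.5, 1.5, 1.9),
]
A = 1.0  # infimum of g over K = [0, 1] for these particular bumps

def g(x):
    """Sum of the bump functions; it is at least A on K."""
    return sum(gj(x) for gj in G_FUNCS)

def h(x):
    """Continuous, 0 where g >= A (in particular on K) and 1 where g <= A/2."""
    return min(1.0, max(0.0, 2.0 - 2.0 * g(x) / A))

def f(j, x):
    """j-th member of the partition of unity (j = 0 or 1 in this sketch)."""
    n = len(G_FUNCS)
    return G_FUNCS[j](x) * (1.0 + h(x) / n) / (g(x) + h(x))
```

The denominator never vanishes because h equals 1 wherever g is at most A/2, which mirrors the role of the set F in the proof.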
Theorem 7.14 can also be applied to extend Urysohn’s metrization
theorem (Theorem 7.6 on page 462) to locally compact Hausdorff spaces.
THEOREM 7.16
If a topological space is locally compact, Hausdorff, and second countable,
then it is metrizable.
PROOF: Let $\Omega$ be a locally compact Hausdorff space with a countable
neighborhood basis $\mathfrak{N}$. Let
$\mathcal{W} = \{ (U, V) : U, V \in \mathfrak{N},\ \overline{U} \text{ is compact, and } \overline{U} \subset V \}$.
We will show that given an open set $O$ and a point $x \in O$, there is a pair
$(U, V) \in \mathcal{W}$ such that $x \in U$ and $V \subset O$. Since $\mathfrak{N}$ is a neighborhood basis,
there is a $V \in \mathfrak{N}$ such that $x \in V \subset O$. By Proposition 7.22 on page 476,
there is an open set W such that $\overline{W}$ is compact and $x \in W \subset \overline{W} \subset V$.
We can now choose a $U \in \mathfrak{N}$ such that $x \in U \subset W$. It follows from
Theorem 7.12 on page 473 that $\overline{U}$ is compact. Thus, $(U, V) \in \mathcal{W}$.
The remainder of the proof is the same as the proof of Urysohn’s
metrization theorem, where Theorem 7.14 on page 477 is used as a
replacement for Urysohn’s lemma.
It is possible to extract from the proof of Theorem 7.16 the following
corollary whose proof is left to the reader as Exercise 7.159.
COROLLARY 7.10
Let $\Omega$ be a locally compact space. Then there is a neighborhood basis $\mathfrak{N}$
such that $\overline{U}$ is compact for each $U \in \mathfrak{N}$. Furthermore, if $\Omega$ is second
countable, then $\mathfrak{N}$ can be chosen to be countable.
Let $\Omega$ be a second countable topological space. Suppose that $\Omega$ has
a countable neighborhood basis $\mathfrak{N} = \{U_n\}_{n=1}^\infty$ such that the closure of
each $U_n$ is compact. Let $W_1 = U_1$. The sets in $\mathfrak{N}$ cover $\overline{U}_1$ and, so,
there is an integer $n_2 > 1$ such that $\overline{W}_1 \subset \bigcup_{j=1}^{n_2} U_j$. Let $W_2 = \bigcup_{j=1}^{n_2} U_j$.
Then $\overline{W}_2$ is compact and, hence, we can find an integer $n_3 > n_2$ such that
$\overline{W}_2 \subset \bigcup_{j=1}^{n_3} U_j$. Let $W_3 = \bigcup_{j=1}^{n_3} U_j$. Continuing in this fashion we obtain
an infinite sequence of sets satisfying the conditions delineated in the next
definition.
DEFINITION 7.32 Exhaustion
A sequence $\{W_n\}_{n=1}^{\infty}$ of subsets of a topological space $\Omega$ is called an
exhaustion if it satisfies the following conditions:
a)	Each $W_n$ is open.
b)	Each $\overline{W}_n$ is compact.
c)	$\overline{W}_n \subset W_{n+1}$ for each $n$.
d)	$\Omega = \bigcup_{n=1}^{\infty} W_n$.
Corollary 7.10 and the paragraph preceding Definition 7.32 show that
a second countable locally compact space has an exhaustion. Here are some
concrete examples of exhaustions.
EXAMPLE 7.22 Illustrates Definition 7.32
a)	$\{(-n, n)\}_{n=1}^{\infty}$ is an exhaustion of $\mathcal{R}$.
b)	$\{(1/n, 1 - 1/n)\}_{n=3}^{\infty}$ is an exhaustion of the interval $(0,1)$.
c)	$\{B_k(0)\}_{k=1}^{\infty}$ is an exhaustion of the normed space $(\mathcal{R}^n, \|\ \|_2)$ discussed
in Example 7.4 on page 422.	□
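As a quick sanity check of conditions (c) and (d) of Definition 7.32 for the intervals $W_n = (1/n, 1 - 1/n)$, the following minimal sketch uses exact rational arithmetic. The starting index $n = 3$ and the helper names `W`, `closure_inside_next`, and `first_index_containing` are assumptions made for illustration only.

```python
from fractions import Fraction

def W(n):
    """Endpoints (a, b) of the open interval W_n = (1/n, 1 - 1/n), for n >= 3."""
    return Fraction(1, n), 1 - Fraction(1, n)

def closure_inside_next(n):
    """Condition (c): the closure [a_n, b_n] lies inside the open interval W_{n+1}."""
    a, b = W(n)
    a_next, b_next = W(n + 1)
    return a_next < a and b < b_next

def first_index_containing(x):
    """Condition (d): the smallest n >= 3 with x in W_n, for a rational 0 < x < 1."""
    n = 3
    while not (W(n)[0] < x < W(n)[1]):
        n += 1
    return n
```

Every point of $(0,1)$ is eventually captured, but points close to the endpoints require a large index, which is why no single $W_n$ suffices.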
In the next section, exhaustions will be used to obtain metrization
results for certain spaces of functions.
Compactification
Our next theorem shows that it is possible to turn any locally compact
Hausdorff space into a compact space by the addition of a single point.
To see how, first consider the set $\mathcal{N}$ of positive integers with the discrete
topology. This space is locally compact, but not compact, and its compact
subsets coincide with its finite subsets. We would like to add a “point at
infinity,” denoted $\omega$, to this space to turn it into a compact space. The
problem is to find the right topology for the set $\mathcal{N} \cup \{\omega\}$.
To see what to do, we pass from the set $\mathcal{N}$ to the subset of real numbers
$E = \{1/n : n \in \mathcal{N}\}$ via the function $h(n) = 1/n$. Note that $\overline{E} = E \cup \{0\}$.
$E$ is a bounded set of real numbers but is not compact because it is not
closed. However, $E \cup \{0\}$ is closed and bounded and, hence, compact with
respect to the relative topology inherited from $\mathcal{R}$. We can easily describe
the open sets (in the relative topology) of $E \cup \{0\}$: Each subset of $E$ is open
and a subset $D$ of $E \cup \{0\}$ containing 0 is open if and only if its relative
complement $(E \cup \{0\}) \setminus D$ is a finite set.
If we now extend the function $h$ to $\mathcal{N} \cup \{\omega\}$ by $h(\omega) = 0$ and call a
subset $W$ of $\mathcal{N} \cup \{\omega\}$ open if $h(W)$ is open in $E \cup \{0\}$, we obtain a topology
making $\mathcal{N} \cup \{\omega\}$ into a compact space. The open sets of this topology on
$\mathcal{N} \cup \{\omega\}$ consist of all subsets of $\mathcal{N}$ as well as all complements of finite
subsets of $\mathcal{N}$.
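The reason this topology is compact can be sketched in code: any open cover must contain a set that contains $\omega$, that set misses only finitely many integers, and each missed integer needs just one more member of the cover. The sketch below is illustrative only. The tagged-tuple encoding of open sets, the label `OMEGA`, and the function names are assumptions, and open sets not containing $\omega$ are restricted to finite ones so that they can be stored.

```python
OMEGA = "omega"  # hypothetical label for the added point at infinity

def contains(open_set, point):
    """Membership test for an open set encoded as (kind, finite_set)."""
    kind, data = open_set
    if kind == "finite":          # a finite subset of the positive integers
        return point in data
    # kind == "cofinite": the complement of the finite set `data`; it contains omega
    return point == OMEGA or point not in data

def finite_subcover(cover):
    """Extract a finite subcover from a list of open sets covering N ∪ {omega}."""
    # Some member must contain omega; its complement is a finite set of integers.
    big = next(s for s in cover if contains(s, OMEGA))
    chosen = [big]
    for n in sorted(big[1]):      # the finitely many integers that `big` misses
        chosen.append(next(s for s in cover if contains(s, n)))
    return chosen

# A cover by 1000 singletons together with one neighborhood of omega.
cover = [("finite", frozenset({n})) for n in range(1, 1001)]
cover.append(("cofinite", frozenset({1, 2, 3, 4, 5})))
subcover = finite_subcover(cover)
```

Here the subcover consists of the neighborhood of $\omega$ and the five singletons it misses.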
As the next theorem shows, the construction that we just performed
can be generalized to arbitrary locally compact Hausdorff spaces. The
proof of the theorem is left to the reader as Exercise 7.160.
THEOREM 7.17 One-Point Compactification
Suppose that $(\Omega, \mathcal{T})$ is a locally compact Hausdorff space. Let $\omega$ be an
element not in $\Omega$ and set $\Omega^* = \Omega \cup \{\omega\}$. Let $\mathcal{T}^*$ denote the collection
of subsets of $\Omega^*$ that are either members of $\mathcal{T}$ or whose complements are
compact subsets of the space $(\Omega, \mathcal{T})$. Then $\mathcal{T}^*$ is a topology on $\Omega^*$ having
the following properties:
a) $(\Omega^*, \mathcal{T}^*)$ is compact.
b) $\mathcal{T}$ coincides with the relative topology $\{\Omega \cap W : W \in \mathcal{T}^*\}$.
c) $\Omega$ is open in the topological space $(\Omega^*, \mathcal{T}^*)$.
d) $\Omega$ is dense in $(\Omega^*, \mathcal{T}^*)$ unless $(\Omega, \mathcal{T})$ is compact.
The space $(\Omega^*, \mathcal{T}^*)$, constructed in Theorem 7.17, is called the one-point
compactification of $\Omega$.
EXERCISES 7.11
7.152	Show that the space $\ell^2(\Omega)$ of Example 7.6 on page 423 is locally compact
if and only if $\Omega$ is finite.
7.153	Suppose that $\Omega$ is a locally compact space, that $D \subset \Omega$, and that $\mathcal{T}_D$ is
the relative topology on $D$. Prove that $(D, \mathcal{T}_D)$ is locally compact if $D$ is
closed in $\Omega$.
7.154	Show that the function $f$ defined in the last paragraph of the proof of
Theorem 7.14 on page 477 is continuous and satisfies $f(K) = \{1\}$ and
$f(O^c) = \{0\}$.
7.155	Show that the functions $f_1, f_2, \ldots, f_n$ defined in the last paragraph of
the proof of Theorem 7.15 on page 477 satisfy conditions (a)-(d) of that
theorem.
7.156	State and prove a version of Tietze’s extension theorem (Theorem 7.3 on
page 451) for locally compact Hausdorff spaces.
7.157	Let $K$ be a compact subset of a locally compact Hausdorff space $\Omega$.
Show that there is a nonnegative continuous function $f$ on $\Omega$ such that
$K = f^{-1}(0)$ if and only if there is a sequence $\{G_n\}_{n=1}^{\infty}$ of open sets such
that $K = \bigcap_{n=1}^{\infty} G_n$.
7.158	Let $f$ be a complex-valued function on a set $\Omega$. A point $x_0 \in \Omega$ is said to
be a peak point of $f$ if $|f(x)| < |f(x_0)|$ for all $x \ne x_0$. Show that in a
metrizable locally compact space $\Omega$, each point is a peak point for some
complex-valued continuous function on $\Omega$.
7.159	Prove Corollary 7.10 on page 478.
7.160	Prove Theorem 7.17.
★ 7.161 Let $f$ be a continuous function from a locally compact Hausdorff space $\Omega$
into a metric space $(\Lambda, \rho)$. The collection $\mathcal{K}$ of compact subsets of $\Omega$ is
a directed set with respect to the relation $\subset$. Define $\lim_{x\to\omega} f(x) = y$
to mean that the net $\{\sup\{\rho(f(x), y) : x \in K^c\}\}_{K \in \mathcal{K}}$ converges to 0.
Show that $f$ is the restriction of a continuous function on the one-point
compactification of $\Omega$ if and only if $\lim_{x\to\omega} f(x)$ exists. Note: In case
$\Omega = \mathcal{R}^n$, $\lim_{x\to\omega} f(x) = y$ if and only if $\lim_{\|x\|\to\infty} f(x) = y$, where $\|\ \|$ is
any one of the norms defined in Example 7.4 on page 422.
7.162	Prove that the one-point compactification of $\mathcal{R}$ is homeomorphic to a circle
in $\mathcal{R}^2$.
7.163	Define a “two-point compactification” of $\mathcal{R}$ that is homeomorphic to the
interval $[-1,1]$.
7.164	Prove that the one-point compactification of the space of complex numbers
$\mathbb{C}$ is homeomorphic to the sphere $S = \{x \in \mathcal{R}^3 : \|x\|_2 = 1\}$. Hint:
Let $h(z) = (1 + |z|^2)^{-1}(2\,\Re z,\ 2\,\Im z,\ |z|^2 - 1)$ for $z \in \mathbb{C}$, and $h(\omega) = (0,0,1)$.
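The map in the hint to Exercise 7.164 can be explored numerically before attempting a proof. The sketch below is not a solution. It only checks, in floating point, that $h(z)$ lands on the unit sphere and that $h(z)$ approaches $(0,0,1)$ as $|z|$ grows. The helper names `norm2`, `dist`, and `NORTH_POLE` are assumptions made for illustration.

```python
import math

def h(z):
    """Map a complex number z to a point of R^3 using the formula from the hint."""
    s = abs(z) ** 2
    return (2 * z.real / (1 + s), 2 * z.imag / (1 + s), (s - 1) / (1 + s))

NORTH_POLE = (0.0, 0.0, 1.0)  # the value assigned to the point at infinity

def norm2(p):
    """Euclidean norm of a point of R^3."""
    return math.sqrt(sum(c * c for c in p))

def dist(p, q):
    """Euclidean distance between two points of R^3."""
    return norm2(tuple(a - b for a, b in zip(p, q)))
```

For instance, `h(0j)` is the south pole $(0,0,-1)$, and `dist(h(1e6 + 0j), NORTH_POLE)` is on the order of $10^{-6}$.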
7.12 FUNCTION SPACES
We will consider, in this section, what it means for a sequence of continuous
functions to converge. In particular, we will construct a topology $\mathcal{T}(\Omega,\Lambda)$
for the collection of continuous functions from a topological space $\Omega$ to a
metric space $(\Lambda, \rho)$ such that convergence of a sequence with respect to
$\mathcal{T}(\Omega,\Lambda)$ corresponds to uniform convergence on compact subsets. Related
notions of pointwise and uniform convergence will also be discussed.
For a sequence $\{f_n\}_{n=1}^{\infty}$ of functions from a topological space $\Omega$ to a
metric space $(\Lambda, \rho)$, there are several meanings that can be attached to the
expression
$\lim_{n\to\infty} f_n = f. \qquad (7.12)$
One simple way to define (7.12) is the following.
DEFINITION 7.33 Pointwise Convergence
A sequence $\{f_n\}_{n=1}^{\infty}$ of functions from a topological space $\Omega$ into a
metric space $(\Lambda, \rho)$ is said to converge pointwise to the function $f$
if for each $x \in \Omega$ and each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that
$\rho(f_n(x), f(x)) < \epsilon$ whenever $n \ge N$.
Pointwise convergence of a sequence of functions $\{f_n\}_{n=1}^{\infty}$ to a function
$f$ requires that the sequence $\{f_n(x)\}_{n=1}^{\infty}$ of elements of $\Lambda$ converges
to $f(x)$ for each $x \in \Omega$. A much more demanding mode of convergence is
as follows.
DEFINITION 7.34 Uniform Convergence
A sequence $\{f_n\}_{n=1}^{\infty}$ of functions from a topological space $\Omega$ into a
metric space $(\Lambda, \rho)$ is said to converge uniformly to the function $f$
if for each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that $\rho(f_n(x), f(x)) < \epsilon$ for
all $x \in \Omega$ whenever $n \ge N$.
The crucial difference between Definitions 7.33 and 7.34 is that, in the
latter, N may not depend on x whereas, in the former, it may.
For many applications, Definition 7.33 is too weak and Definition 7.34
is too strong. In this section, we will be concerned primarily with a mode
of convergence that is intermediate between pointwise and uniform
convergence. This mode of convergence is as follows.
DEFINITION 7.35 Uniform Convergence on Compact Subsets
A sequence $\{f_n\}_{n=1}^{\infty}$ of functions from a topological space $\Omega$ into a
metric space $(\Lambda, \rho)$ is said to converge uniformly on compact subsets
to the function $f$ if for each compact subset $K \subset \Omega$ and each $\epsilon > 0$,
there is an $N \in \mathcal{N}$ such that $\rho(f_n(x), f(x)) < \epsilon$ for all $x \in K$
whenever $n \ge N$.
EXAMPLE 7.23 Illustrates Definitions 7.33-7.35
Let $\Omega = (0,1)$, $\Lambda = \mathcal{R}$, and $f_n(x) = \sum_{k=0}^{n} x^k$. Then the sequence $\{f_n\}_{n=1}^{\infty}$
of functions converges both pointwise and uniformly on compact subsets to
the function $f(x) = 1/(1 - x)$. But, $\{f_n\}_{n=1}^{\infty}$ does not converge uniformly
to $f$.	□
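The contrast in Example 7.23 can be seen numerically. The sketch below assumes that the $f_n$ are the partial sums $1 + x + \cdots + x^n$ of the geometric series (an assumption, since any sequence with the stated limit behaves similarly) and estimates $\sup |f_n - f|$ on a grid. The function names are illustrative only.

```python
def f(x):
    """The limit function 1/(1 - x) on (0, 1)."""
    return 1.0 / (1.0 - x)

def f_n(n, x):
    """Partial sum 1 + x + ... + x^n (an assumed choice of f_n)."""
    return sum(x ** k for k in range(n + 1))

def sup_error(n, a, b, samples=1001):
    """Approximate the sup of |f_n - f| over [a, b] using a uniform grid."""
    grid = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f_n(n, x) - f(x)) for x in grid)
```

On the compact set $[0.1, 0.9]$ the error $x^{n+1}/(1-x)$ is at most $10 \cdot 0.9^{n+1}$, which tends to 0. On intervals reaching toward 1 the same error is large for every fixed $n$, so the convergence is not uniform on $(0,1)$.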
Next we introduce some notation that will be used throughout the
remainder of the text.
DEFINITION 7.36 Collection of Continuous Functions
Let $\Omega$ be a topological space and $\Lambda$ a metric space. Then we denote
by $C(\Omega,\Lambda)$ the collection of all continuous functions from $\Omega$ to $\Lambda$. In
case $\Lambda = \mathbb{C}$, we write $C(\Omega)$ for $C(\Omega,\Lambda)$; thus, $C(\Omega)$ is the collection
of all complex-valued continuous functions on $\Omega$.
We will construct a topology for $C(\Omega,\Lambda)$ such that convergence of
sequences in that topology is the same as uniform convergence on compact
subsets. To aid our construction, it will be helpful to have the following
notation. For $f, g \in C(\Omega,\Lambda)$ and $S \subset \Omega$, we let
$\rho_S(f,g) = \sup\{\rho(f(x), g(x)) : x \in S\}. \qquad (7.13)$
Thus, $\rho_S$ measures how (uniformly) close two functions are on $S$.
The following proposition, whose proof is left for Exercise 7.166, shows
that as a function from $C(\Omega,\Lambda) \times C(\Omega,\Lambda)$ to $[0,\infty]$, $\rho_S$ is almost a metric.
PROPOSITION 7.23
The function $\rho_S$, defined in (7.13), satisfies the following conditions for all
$f, g, h \in C(\Omega,\Lambda)$:
a)	$0 \le \rho_S(f,g) \le \infty$.
b)	$\rho_S(f,f) = 0$.
c)	$\rho_S(f,g) = \rho_S(g,f)$.
d)	$\rho_S(f,g) \le \rho_S(f,h) + \rho_S(h,g)$.
e)	If $S$ is compact, then $\rho_S(f,g) < \infty$.
f)	If $S$ is compact, then $|\rho_S(f,h) - \rho_S(g,h)| \le \rho_S(f,g)$.
g)	If $\Omega$ is compact, then $\rho_\Omega$ is a metric.
If $\rho_S(f,g) = 0$ for some $f \ne g$ or if $\rho_S(f,g) = \infty$ for some $f$ and $g$,
then $\rho_S$ is not a metric. When $S$ is a compact subset of $\Omega$, then, as
Proposition 7.23(e) shows, the latter obstacle cannot arise, but the former still
remains. (See Exercise 7.168.) Nevertheless, by considering the entire family
$\{\rho_K : K \text{ compact}\}$, we can produce a topology on $C(\Omega,\Lambda)$ that will be
the correct one for studying uniform convergence on compact subsets.
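To see concretely why a single $\rho_K$ fails to separate functions while the whole family does, here is a minimal sketch with $\Omega = \Lambda = \mathcal{R}$. Compact sets are replaced by finite samples of them, which is an approximation, and the functions `f`, `g` and the sample sets `K`, `L` are chosen purely for illustration.

```python
def rho_S(f, g, S, rho=lambda u, v: abs(u - v)):
    """Sup of rho(f(x), g(x)) over a finite sample S (a stand-in for a general set S)."""
    return max(rho(f(x), g(x)) for x in S)

# Two continuous functions on the real line that agree on [0, 1] but differ elsewhere.
f = lambda x: 0.0
g = lambda x: max(0.0, x - 1.0)

K = [i / 100 for i in range(101)]   # sample of the compact set [0, 1]
L = [i / 100 for i in range(301)]   # sample of the compact set [0, 3]
```

Here `rho_S(f, g, K)` is 0 although the functions differ, whereas the larger compact set detects the difference: `rho_S(f, g, L)` is 2.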
In the following definition, the notation $\rho_K(\cdot, g)$ represents the function
from $C(\Omega,\Lambda)$ to $[0,\infty)$ defined by $\rho_K(\cdot, g)(f) = \rho_K(f,g)$, where $K \subset \Omega$ is
compact and $g \in C(\Omega,\Lambda)$.
DEFINITION 7.37 Topology of Uniform Convergence on Compacts
The weak topology on $C(\Omega,\Lambda)$ determined by the family of functions
$\{\rho_K(\cdot, g) : K \text{ compact},\ g \in C(\Omega,\Lambda)\}$ is called the topology of uniform
convergence on compact subsets and is denoted $\mathcal{T}(\Omega,\Lambda)$.
Note: Whenever we work with a function space of the form $C(\Omega,\Lambda)$, we
will assume that it is equipped with the topology $\mathcal{T}(\Omega,\Lambda)$ unless explicitly
stated otherwise.
The next proposition shows that convergence in the topology $\mathcal{T}(\Omega,\Lambda)$
is exactly the same as uniform convergence on compact subsets.
PROPOSITION 7.24
Let $\{f_\iota\}_{\iota \in I}$ be a net of functions in $C(\Omega,\Lambda)$. Then $\{f_\iota\}_{\iota \in I}$ converges to $f$
if and only if
$\lim_\iota \rho_K(f_\iota, f) = 0 \qquad (7.14)$
for each compact subset $K \subset \Omega$.
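PROOF: By Proposition 7.13(b) on page 444, the net $\{f_\iota\}_{\iota \in I}$ converges
to $f$ if and only if
$\lim_\iota \rho_K(f_\iota, g) = \rho_K(f, g) \qquad (7.15)$
for each compact set $K$ and each function $g \in C(\Omega,\Lambda)$. The equivalence
of (7.14) and (7.15) now follows easily from parts (b) and (f) of
Proposition 7.23: taking $g = f$ in (7.15) and using (b) gives (7.14), and,
conversely, (f) gives $|\rho_K(f_\iota, g) - \rho_K(f, g)| \le \rho_K(f_\iota, f)$.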
Remark: Because $\rho(f(x), g(x)) \le \rho_K(f,g)$ for each $x \in K$, it is clear
from Proposition 7.24 that convergence of a sequence with respect to the
topology $\mathcal{T}(\Omega,\Lambda)$ corresponds to uniform convergence on compact subsets.
Although, in general, $\mathcal{T}(\Omega,\Lambda)$ is not metrizable, it is nevertheless
possible to define analogues of Cauchy sequences and a notion of completeness
for the space $C(\Omega,\Lambda)$.
DEFINITION 7.38 k-Cauchy Sequence; k-Complete
A sequence $\{f_n\}_{n=1}^{\infty}$ of functions in $C(\Omega,\Lambda)$ is said to be k-Cauchy
if for each compact subset $K$ and each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such
that $\rho_K(f_n, f_m) < \epsilon$ whenever $n, m \ge N$. If every k-Cauchy sequence
in $C(\Omega,\Lambda)$ converges, then $C(\Omega,\Lambda)$ is called k-complete.
Remark: The concept of a k-Cauchy sequence expresses the idea of a
sequence that is “uniformly Cauchy on compact subsets.”
The most interesting examples of spaces of the type $C(\Omega,\Lambda)$ occur
when $\Omega$ is a locally compact Hausdorff space. Theorem 7.19 describes
some properties of $C(\Omega,\Lambda)$ in this case. Before stating and proving that
theorem, however, we need some preliminary results.
THEOREM 7.18
Suppose that $\Omega$ is a topological space and that $(\Lambda, \rho)$ is a metric space. Let
$\{f_n\}_{n=1}^{\infty}$ be a sequence in $C(\Omega,\Lambda)$ that converges uniformly to a function $f$.
Then $f \in C(\Omega,\Lambda)$.
PROOF: Let $x_0 \in \Omega$ and $\epsilon > 0$ be given. To establish the continuity of $f$
at $x_0$, we will show that there is an open set $U$ containing $x_0$ such that
$\rho(f(x), f(x_0)) < \epsilon$ whenever $x \in U$. (See Theorem 7.1 on page 443.)
By uniform convergence, there is an $N$ such that $\rho(f_n(x), f(x)) < \epsilon/3$
for all $x \in \Omega$ whenever $n \ge N$. Because $f_N$ is a continuous function, there
is an open set $U$ containing $x_0$ such that $\rho(f_N(x), f_N(x_0)) < \epsilon/3$ whenever
$x \in U$. It follows that for $x \in U$,
$\rho(f(x), f(x_0)) \le \rho(f(x), f_N(x)) + \rho(f_N(x), f_N(x_0)) + \rho(f_N(x_0), f(x_0))$
$< \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon.$
Hence, $f$ is continuous at $x_0$.
LEMMA 7.2
Let $\Omega$ and $\Lambda$ be topological spaces and $f: \Omega \to \Lambda$. Suppose that for each
$x \in \Omega$, there is an open set $U_x$ containing $x$ such that $f|_{U_x}$ is continuous.
Then $f$ is continuous.
PROOF: The proof is left for Exercise 7.170.	
THEOREM 7.19
Suppose that $\Omega$ is a locally compact Hausdorff space and that $(\Lambda, \rho)$ is a
metric space.
a)	If $\Lambda$ is complete, then $C(\Omega,\Lambda)$ is k-complete.
b)	If $\Omega$ is second countable, then $C(\Omega,\Lambda)$ is metrizable.
c)	If $\Omega$ is second countable and $\Lambda$ is complete, then $C(\Omega,\Lambda)$ is complete.
d)	If $\Omega$ is compact, then the topology $\mathcal{T}(\Omega,\Lambda)$ is induced by the metric $\rho_\Omega$.
PROOF:
Proof of (a): Suppose that $\{f_n\}_{n=1}^{\infty}$ is a k-Cauchy sequence. Applying
Definition 7.38 with $K = \{x\}$, we find that the sequence $\{f_n(x)\}_{n=1}^{\infty}$ is
Cauchy in $\Lambda$ for each $x \in \Omega$. Because $\Lambda$ is complete, we conclude that for
each $x \in \Omega$, the sequence $\{f_n(x)\}_{n=1}^{\infty}$ converges in $\Lambda$.
Let the function $f: \Omega \to \Lambda$ be defined by $f(x) = \lim_{n\to\infty} f_n(x)$. We
will show that $f$ is continuous on $\Omega$. Let $x_0 \in \Omega$ and $W$ be an open set
containing $x_0$ such that $\overline{W}$ is compact. Let $\epsilon > 0$. We can choose $N$ such
that $\rho_{\overline{W}}(f_n, f_m) < \epsilon$ for $m, n \ge N$. For each $x \in \overline{W}$, we have
$\rho(f_n(x), f(x)) = \lim_{m\to\infty} \rho(f_n(x), f_m(x)) \le \limsup_{m\to\infty} \rho_{\overline{W}}(f_n, f_m) \le \epsilon \qquad (7.16)$
for $n \ge N$. It follows that the restrictions to $W$ of the functions $f_n$
converge uniformly to the restriction of $f$ to $W$. Hence, by Theorem 7.18 and
Lemma 7.2, $f$ is continuous on all of $\Omega$.
It remains to show that the sequence $\{f_n\}_{n=1}^{\infty}$ converges to $f$ with
respect to the topology $\mathcal{T}(\Omega,\Lambda)$. Replacing $\overline{W}$ by an arbitrary compact
subset $K$ in (7.16) and taking the supremum over $x \in K$, we get that
$\rho_K(f_n, f) \le \epsilon$ for $n \ge N$. Hence, by Proposition 7.24 on page 484, $\{f_n\}_{n=1}^{\infty}$
converges in $C(\Omega,\Lambda)$ to $f$. The proof of (a) is now complete.
Proof of (b): If $\Omega$ is second countable, then it has an exhaustion $\{W_n\}_{n=1}^{\infty}$.
(See Definition 7.32 on page 479 and the paragraph that follows that
definition.) Let $\rho_n = \rho_{\overline{W}_n}$. By Proposition 7.23(e) on page 484, $\rho_n$ is real-valued.
Let $\sigma = \sum_{n=1}^{\infty} 2^{-n} \rho_n/(1 + \rho_n)$. We claim that $\sigma$ is a metric on $C(\Omega,\Lambda)$.
That $\sigma$ is a real-valued function follows from Exercise 7.70 on page 445. By
Exercise 7.30 on page 427, $\sigma$ satisfies Definition 7.7(c) on page 420. That
$\sigma$ satisfies Definition 7.7(b) is clear, as are the facts that $\sigma$ is nonnegative
and satisfies $\sigma(f,f) = 0$ for each $f$. Thus, to prove that $\sigma$ is a metric, it
remains only to show that $\sigma(f,g) = 0$ implies $f = g$. If $\sigma(f,g) = 0$, then
$\rho_n(f,g)$ must vanish for each $n$. Hence, for each $n \in \mathcal{N}$, $f(x) = g(x)$ for
all $x \in W_n$. Because $\{W_n\}_{n=1}^{\infty}$ is an exhaustion, it follows that $f = g$.
Consequently, $\sigma$ is a metric.
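The weighted sum $\sum_n 2^{-n} \rho_n/(1+\rho_n)$ can be approximated numerically in a simple case. The sketch below takes $\Omega = \Lambda = \mathcal{R}$ with the exhaustion $W_n = (-n, n)$, estimates each $\rho_n$ on a grid over $[-n, n]$, and truncates the series. The grid size, the number of terms, and the function names are assumptions made for illustration only.

```python
def rho_n(f, g, n, samples=2001):
    """Approximate the sup of |f - g| over the closure [-n, n] of W_n = (-n, n)."""
    grid = [-n + 2 * n * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - g(x)) for x in grid)

def sigma(f, g, terms=40):
    """Partial sum of sum_{n>=1} 2^{-n} rho_n / (1 + rho_n); the omitted tail is below 2^{-terms}."""
    total = 0.0
    for n in range(1, terms + 1):
        r = rho_n(f, g, n)
        total += 2.0 ** (-n) * r / (1.0 + r)
    return total

zero = lambda x: 0.0
```

For example, the functions $x \mapsto x/k$ converge to 0 uniformly on compact subsets of $\mathcal{R}$ but not uniformly on $\mathcal{R}$. Correspondingly, `sigma(lambda x: x / k, zero)` shrinks as `k` grows, and every value of `sigma` stays below 1 because each summand is below $2^{-n}$.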
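Let $\mathcal{T}$ denote the topology on $C(\Omega,\Lambda)$ induced by the metric $\sigma$. We will
show that $\mathcal{T} = \mathcal{T}(\Omega,\Lambda)$. By the definition of the topology $\mathcal{T}(\Omega,\Lambda)$, for each
fixed $g \in C(\Omega,\Lambda)$ and $n \in \mathcal{N}$, the function $\rho_n(\cdot, g)$ is continuous with
respect to that topology. The sequence of sums $\sum_{j=1}^{n} 2^{-j} \rho_j(\cdot, g)/(1 + \rho_j(\cdot, g))$
converges uniformly on $C(\Omega,\Lambda)$ to the function $\sigma(\cdot, g)$. From Theorem 7.18,
we conclude that $\sigma(\cdot, g)$ is continuous with respect to $\mathcal{T}(\Omega,\Lambda)$. Because
$B_r^\sigma(g) = \sigma(\cdot, g)^{-1}((-\infty, r))$, it follows that every open ball $B_r^\sigma(g)$ is open
with respect to $\mathcal{T}(\Omega,\Lambda)$. Hence, every $\mathcal{T}$-open set is $\mathcal{T}(\Omega,\Lambda)$-open, that is,
$\mathcal{T}$ is weaker than $\mathcal{T}(\Omega,\Lambda)$.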
To complete the proof of (b), we must show that $\mathcal{T}(\Omega,\Lambda)$ is weaker
than $\mathcal{T}$. To do so, it suffices, by Exercise 7.80 on page 446, to show that if a
net $\{f_\iota\}_{\iota \in I}$ converges to $f$ with respect to the topology $\mathcal{T}$, then it converges
to $f$ with respect to the topology $\mathcal{T}(\Omega,\Lambda)$. Now, $\{f_\iota\}_{\iota \in I}$ converges to $f$
with respect to $\mathcal{T}$ if and only if $\lim_\iota \sigma(f_\iota, f) = 0$. Let $K$ be an arbitrary
compact subset. Then, since the sets in the exhaustion $\{W_n\}_{n=1}^{\infty}$ are an
open covering of $K$, there is an $m$ such that $K \subset W_m$. Thus,
$\rho_K(f_\iota, f) \le \rho_{\overline{W}_m}(f_\iota, f) = \rho_m(f_\iota, f).$
The inequality
$2^{-m} \rho_m(f_\iota, f)/(1 + \rho_m(f_\iota, f)) \le \sigma(f_\iota, f)$
implies that $\lim_\iota \rho_m(f_\iota, f) = 0$. Hence, $\lim_\iota \rho_K(f_\iota, f) = 0$. It now follows
from Proposition 7.24 on page 484 that $\{f_\iota\}_{\iota \in I}$ converges to $f$ with respect
to the topology $\mathcal{T}(\Omega,\Lambda)$. This completes the proof of (b).
Proof of (c): By (a) and the proof of (b), it suffices to show that a
sequence $\{f_n\}_{n=1}^{\infty}$ that is Cauchy with respect to the metric $\sigma$ is k-Cauchy.
Let $\epsilon > 0$ be given and $K$ a compact subset of $\Omega$. As before, we can choose
an $m$ such that $K \subset W_m$. It follows from the definition of $\sigma$ that
$2^{-m} \frac{\rho_K(f_n, f_p)}{1 + \rho_K(f_n, f_p)} \le 2^{-m} \frac{\rho_m(f_n, f_p)}{1 + \rho_m(f_n, f_p)} \le \sigma(f_n, f_p).$
If $N$ is large enough so that $\sigma(f_n, f_p) < 2^{-m} \epsilon/(1 + \epsilon)$ for $n, p \ge N$, we
obtain $\rho_K(f_n, f_p) < \epsilon$. Thus, $\{f_n\}_{n=1}^{\infty}$ is k-Cauchy. The proof of (c) is now
complete.
Proof of (d): The proof of part (d) is left for Exercise 7.171.
EXAMPLE 7.24 The Case $\Lambda = \mathbb{C}$
Let $\Omega$ be a locally compact Hausdorff space. Besides having a metric space
structure derived from the usual distance function $\rho(z,w) = |z - w|$, the
space $C(\Omega)$ has a linear-space structure, where addition and scalar
multiplication are defined pointwise.
We now consider the relationship between the linear-space structure
and the topology $\mathcal{T}(\Omega, \mathbb{C})$ which, for simplicity, we denote by $\mathcal{T}(\Omega)$. For
$S \subset \Omega$, let
$\|f\|_S = \rho_S(f, 0) = \sup\{|f(x)| : x \in S\}.$
We note that $\rho_S(f,g) = \|f - g\|_S$ and that $\|\ \|_S$ has the defining properties
of a norm except that $\|f\|_S$ can be $\infty$ for some functions and can be 0 for
functions that do not vanish identically.
If $\{f_\iota\}_{\iota \in I}$ and $\{g_\iota\}_{\iota \in I}$ are nets in $C(\Omega)$ converging to $f$ and $g$,
respectively, then, for each compact subset $K$ of $\Omega$,
$\rho_K(f_\iota + g_\iota, f + g) = \|f_\iota + g_\iota - f - g\|_K$
$\le \|f_\iota - f\|_K + \|g_\iota - g\|_K = \rho_K(f_\iota, f) + \rho_K(g_\iota, g).$
We now see from Proposition 7.24 on page 484 that $\{f_\iota + g_\iota\}_{\iota \in I}$ converges
to $f + g$. It follows that the operation of addition is continuous as a
function from $C(\Omega) \times C(\Omega)$ to $C(\Omega)$, where $C(\Omega) \times C(\Omega)$ is given the product
topology.
By a similar argument, we find that the operation of scalar
multiplication is continuous as a function from $\mathbb{C} \times C(\Omega)$ to $C(\Omega)$. But, actually,
more is true. Scalar multiplication of a function by a complex number is a
special case of pointwise multiplication of functions. If the product $fg$ of
two functions $f$ and $g$ in $C(\Omega)$ is defined pointwise, then $C(\Omega)$ becomes an
algebra of functions.† Furthermore, since the product operation on $C(\Omega)$
satisfies $\|fg\|_K \le \|f\|_K \|g\|_K$, it follows by an argument similar to the one
used to prove continuity of addition that the operation of multiplication is
continuous as a function from $C(\Omega) \times C(\Omega)$ to $C(\Omega)$.	□
EXAMPLE 7.25 The Case $\Omega$ Compact and $\Lambda = \mathbb{C}$
Refer to Example 7.24. If $\Omega$ is compact, then $\|\ \|_\Omega$ is a norm on $C(\Omega)$ called
the sup-norm, also known as the supremum norm or uniform norm.
Thus the sup-norm on $C(\Omega)$ is given by
$\|f\|_\Omega = \sup\{|f(x)| : x \in \Omega\}, \quad f \in C(\Omega). \qquad (7.17)$
The sup-norm induces the topology $\mathcal{T}(\Omega)$ and, moreover, $(C(\Omega), \|\ \|_\Omega)$ is
complete. Whenever we are considering a space of the form $C(\Omega)$ where
$\Omega$ is compact, we will assume that it is equipped with the sup-norm unless
explicitly stated otherwise.	□
EXAMPLE 7.26 The Case $\Omega$ Not Compact and $\Lambda = \mathbb{C}$
Refer to Example 7.24. If $\Omega$ is not compact, $\|\ \|_\Omega$ is still a norm on some
subspaces of $C(\Omega)$. Important instances are the following:
$C_c(\Omega) = \{f \in C(\Omega) : \operatorname{supp} f \text{ is compact}\}$
$C_0(\Omega) = \{f \in C(\Omega) : \lim_{x\to\omega} f(x) = 0\}$*
$C_b(\Omega) = \{f \in C(\Omega) : \|f\|_\Omega < \infty\}.$
The spaces $C_c(\Omega)$, $C_0(\Omega)$, and $C_b(\Omega)$ are called, respectively, the
continuous functions with compact support, continuous functions
vanishing at infinity, and bounded continuous functions.
$C_0(\Omega)$ is a closed subspace of $C_b(\Omega)$ with respect to the topology
induced by $\|\ \|_\Omega$. $C_c(\Omega)$ is a linear subspace of $C_0(\Omega)$ but it is not closed.
Indeed, it can be shown that $C_c(\Omega)$ is dense in $C_0(\Omega)$ with respect to the
topology induced by $\|\ \|_\Omega$. See Exercise 7.173.	□
† A linear space $L$ with a multiplication operation that satisfies $x(yz) = (xy)z$,
$x(y + z) = xy + xz$, $(x + y)z = xz + yz$, and $\alpha(xy) = (\alpha x)y = x(\alpha y)$ for all $x, y, z \in L$
and all scalars $\alpha$ is called an algebra.
* For the meaning of $\lim_{x\to\omega} f(x)$, see Exercise 7.161 on page 481.
For each $x \in \Omega$ define $e_x: C(\Omega,\Lambda) \to \Lambda$ by $e_x(f) = f(x)$. The weak
topology $\mathcal{T}_p(\Omega,\Lambda)$ determined by the family of functions $\{e_x : x \in \Omega\}$
is called the topology of pointwise convergence. In case $\Lambda = \mathbb{C}$,
the topology $\mathcal{T}_p(\Omega, \mathbb{C})$ is denoted by $\mathcal{T}_p(\Omega)$. Whereas each function $e_x$
is continuous with respect to $\mathcal{T}(\Omega,\Lambda)$, it follows that $\mathcal{T}_p(\Omega,\Lambda)$ is weaker
than $\mathcal{T}(\Omega,\Lambda)$.
The space $C(\Omega,\Lambda)$ is a subset of the Cartesian product $\Lambda^\Omega$. If $\Lambda^\Omega$ is
equipped with the product topology, then $\mathcal{T}_p(\Omega,\Lambda)$ is the relative topology
on $C(\Omega,\Lambda)$.
EXAMPLE 7.27 The Case $\Omega = \mathcal{N}$ and $\Lambda = \mathbb{C}$
The set of positive integers $\mathcal{N}$ equipped with the discrete topology is a
second countable, locally compact Hausdorff space. Thus, by Theorem 7.19
on page 486, $(C(\mathcal{N}), \mathcal{T}(\mathcal{N}))$ is metrizable and complete. As the compact
subsets of $\mathcal{N}$ are exactly the finite ones, it follows that $\mathcal{T}(\mathcal{N}) = \mathcal{T}_p(\mathcal{N})$.
The subspace $C_c(\mathcal{N})$ consists of all sequences of complex numbers that
are zero except for finitely many indices; the subspace $C_0(\mathcal{N})$, often denoted
in the literature by $c_0$, consists of all sequences of complex numbers that
converge to 0; and the subspace $C_b(\mathcal{N})$ coincides with the space $\ell^\infty(\mathcal{N})$ of
Example 7.6 on page 423. Note also that on $C_b(\mathcal{N})$, $\|\ \|_{\mathcal{N}}$ is just $\|\ \|_\infty$. □
EXAMPLE 7.28 Continuous Periodic Functions
Consider the subspace $P$ of $C(\mathcal{R})$ given by
$P = \{h \in C(\mathcal{R}) : h(x + 2\pi) = h(x) \text{ for all } x \in \mathcal{R}\}.$
Of course, $P$ is just the space of continuous functions having period $2\pi$. It
is easy to see that $P$ is a closed subspace of the normed space $C_b(\mathcal{R})$.
We will discuss a relationship between $P$ and the space $C(T)$, where
$T$ is the unit circle centered at 0 in the complex plane $\mathbb{C}$. For $f \in C(T)$,
define $J(f): \mathcal{R} \to \mathbb{C}$ by $J(f)(x) = f(\cos x + i \sin x)$. The function $J$ is a
one-to-one linear function from $C(T)$ onto $P$ satisfying $J(fg) = J(f)J(g)$
and $\|J(f)\|_{\mathcal{R}} = \|f\|_T$ for all $f, g \in C(T)$. (See Exercise 7.179.) Thus,
as normed spaces and as algebras, the spaces $P$ and $C(T)$ are essentially
copies of each other. It will be helpful later, when we study approximation
by trigonometric polynomials, to identify the spaces $P$ and $C(T)$ by means
of the correspondence $J$.	□
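The correspondence $J$ is easy to experiment with. The following minimal sketch represents a function on the unit circle as a Python function of a complex argument and pulls it back to the real line. The sample functions `f` and `g` are chosen purely for illustration, and the checks are numerical, not a proof of the properties asserted above.

```python
import math

def J(f):
    """Pull a function f on the unit circle T back to a 2*pi-periodic function on the real line."""
    return lambda x: f(complex(math.cos(x), math.sin(x)))

f = lambda z: z * z + 1          # sample continuous functions on T
g = lambda z: z.conjugate()
fg = lambda z: f(z) * g(z)       # pointwise product of f and g
```

Evaluating `J(f)(x + 2 * math.pi)` and `J(f)(x)` at a few points illustrates the periodicity, and comparing `J(fg)(x)` with `J(f)(x) * J(g)(x)` illustrates that $J$ respects products.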
EXERCISES 7.12
7.165	Show by examples that no two of the modes of convergence described by
Definitions 7.33, 7.34, and 7.35 are equivalent.
7.166	Prove Proposition 7.23 on page 484.
7.167	Give an example where $\rho_S(f,g) = \infty$.
7.168	Suppose that a Hausdorff space $\Omega$ is locally compact but not compact
and that $K$ is a compact subset of $\Omega$. Prove that $\rho_K$ cannot be a metric
on $C(\Omega)$.
7.169	Prove that $(C(\Omega,\Lambda), \mathcal{T}(\Omega,\Lambda))$ is always a Hausdorff space.
7.170	Prove Lemma 7.2 on page 486.
7.171	Prove part (d) of Theorem 7.19 on page 486.
7.172	Prove that $\|\ \|_\Omega$ is a norm on each of the spaces $C_c(\Omega)$, $C_0(\Omega)$, and $C_b(\Omega)$
defined in Example 7.26 on page 489.
★ 7.173 Suppose that a Hausdorff space $\Omega$ is locally compact but not compact.
Let $C_c(\Omega)$, $C_0(\Omega)$, and $C_b(\Omega)$ be given the norm $\|\ \|_\Omega$.
a)	Show that $C_0(\Omega)$ is closed in but not equal to $C_b(\Omega)$.
b)	Show that $C_c(\Omega)$ is dense in $C_0(\Omega)$.
c)	Prove that $C_b(\Omega)$ and $C_0(\Omega)$ are complete.
d)	Is $C_c(\Omega)$ complete? Justify your answer.
★ 7.174 For $x_0 \in \Omega$, define $e_{x_0}: C(\Omega,\Lambda) \to \Lambda$ by $e_{x_0}(f) = f(x_0)$.
a)	Show that $e_{x_0}$ is continuous with respect to $\mathcal{T}(\Omega,\Lambda)$.
b)	Conclude that $\mathcal{T}_p(\Omega,\Lambda)$ is weaker than $\mathcal{T}(\Omega,\Lambda)$.
7.175	Give an example where $\mathcal{T}_p(\Omega,\Lambda)$ is properly contained in $\mathcal{T}(\Omega,\Lambda)$.
7.176	Show that the relative topology on $C_0(\mathcal{N})$ determined by $\mathcal{T}(\mathcal{N})$ is strictly
weaker than the topology induced by $\|\ \|_{\mathcal{N}}$.
7.177	Let $(\Omega, \mathcal{T})$ be a locally compact Hausdorff space. Show that the weak
topology determined by the family of functions $C(\Omega)$ coincides with $\mathcal{T}$.
7.178	Let $\Omega$ be a compact Hausdorff space. Show that the following subsets
of $C(\Omega)$ are open:
a)	$\{f : gf = 1 \text{ for some } g \in C(\Omega)\}$.
b)	$\{f : f = e^h \text{ for some } h \in C(\Omega)\}$, where $e^h$ is defined by $e^h(x) = e^{h(x)}$.
Hint: Use the Taylor series expansions of $1/(1 - z)$ and $\log(1 - z)$.
7.179	Verify the asserted properties of the function J defined in Example 7.28.
Marshall Harvey Stone
(1903-1989)
Marshall Stone was born in New York City on
April 8, 1903. He entered Harvard in 1919 and
received his doctorate under G. D. Birkhoff
in 1926. Stone taught at Columbia, Yale, Harvard,
and Chicago. He retired from Chicago
in 1968, but accepted a position at the University
of Massachusetts where he worked full
time until 1973.
Influenced by Birkhoff and von Neumann, Stone obtained significant
results in the spectral theory of unbounded operators, Boolean algebras,
general topology, and rings of continuous functions. Perhaps most well
known is his striking generalization of Weierstrass’s theorem on polyno-
mial approximation. Four theorems are named for him: Stone’s represen-
tation theorems for one-parameter unitary groups and Boolean algebras,
the Čech-Stone compactification theorem, and the Stone-Weierstrass
approximation theorem.
Under Stone's leadership as chairman, the University of Chicago’s de-
partment of mathematics developed into a world center of mathematics
research. In 1950, after modernizing and upgrading the University of
Chicago's mathematics programs, Stone devoted himself to improving
the teaching of pre-university mathematics. He took a leading part in a
series of international conferences on mathematical education and served
as an active member of the School Mathematics Study Group.
Stone was traveling in Madras, India when he died on January 9, 1989.
Complete Spaces, Compact
Spaces, and Approximation
In this chapter, the ideas of Chapter 7 are used to prove some general theo-
rems of analysis in metric and topological spaces. Sections 8.1 and 8.2 deal
with two important consequences of the completeness property, namely, the
Baire category theorem and the contraction mapping principle. Sections 8.3
and 8.4 answer the following question for spaces of the form $C(\Omega,\Lambda)$ and
for product spaces, respectively: “When is a set of functions compact?” In
Section 8.5, we discuss generalizations of piecewise linear approximation of
functions to spaces of the form $C(\Omega, \mathcal{R})$ and, in Section 8.6, we consider
generalizations of polynomial approximation of functions to spaces of the
form $C(\Omega)$.
8.1 THE BAIRE CATEGORY THEOREM
Suppose we think of a subset of a topological space as being “thin” if its
closure has an empty interior. The Baire category theorem asserts that a
complete metric space cannot be expressed as the union of countably many
“thin” sets. Rather than use the term “thin,” we adopt the following more
widely used terminology.
DEFINITION 8.1 Nowhere Dense Set
A subset $E$ of a topological space $\Omega$ is said to be nowhere dense if
its closure has an empty interior, that is, if $(\overline{E})^\circ = \emptyset$.
EXAMPLE 8.1 Illustrates Definition 8.1
a)	It is easy to see that a set is nowhere dense if and only if the complement
of its closure is dense.
b)	Clearly, any finite subset of $\mathcal{R}$ is nowhere dense; in fact, any subset
of $\mathcal{R}$ whose closure is countable is nowhere dense. There are, of course,
countable subsets of $\mathcal{R}$ that are not nowhere dense (e.g., $Q$). On the
other hand, the Cantor set is an example of an uncountable set that is
nowhere dense.
c)	Any line segment in $\mathcal{R}^2$ is nowhere dense.	□
We now state and prove the Baire category theorem.
THEOREM 8.1 Baire Category Theorem
A complete metric space $(\Omega, \rho)$ is not the union of a countable collection
of nowhere dense subsets.
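PROOF: Let $\{S_n\}_{n=1}^{\infty}$ be a sequence of nowhere dense subsets of $\Omega$. We
must show that
$\Omega \ne \bigcup_{n=1}^{\infty} S_n. \qquad (8.1)$
To prove (8.1) it suffices to verify that $\bigcap_{n=1}^{\infty} G_n \ne \emptyset$, where $G_n = (\overline{S_n})^c$.
We will prove somewhat more, namely that $U \cap (\bigcap_{n=1}^{\infty} G_n) \ne \emptyset$, whenever
$U$ is a nonempty open subset of $\Omega$.
So, let $U$ be open and nonempty. We first note that, for each $n \in \mathcal{N}$,
$G_n$ is dense because $S_n$ is nowhere dense. Thus, there exists an element $x_1$
in the open set $U \cap G_1$. It follows that there is an $r_1 > 0$ such that
$\overline{B}_{r_1}(x_1) \subset U \cap G_1$. Since $G_2$ is dense and open, we can choose an element
$x_2 \in B_{r_1}(x_1) \cap G_2$ and an $r_2 > 0$ such that
$r_2 < r_1/2$ and $\overline{B}_{r_2}(x_2) \subset B_{r_1}(x_1) \cap G_2$.
Continuing inductively, we obtain a sequence $\{x_n\}_{n=1}^{\infty}$ in $\Omega$ and a
sequence of positive numbers $\{r_n\}_{n=1}^{\infty}$ such that for all $n \in \mathcal{N}$,
$r_{n+1} < r_n/2$ and $\overline{B}_{r_n}(x_n) \subset G_n \qquad (8.2)$
and
$\overline{B}_{r_{n+1}}(x_{n+1}) \subset B_{r_n}(x_n). \qquad (8.3)$
Because $\rho(x_{n+1}, x_n) < r_n \le 2^{-(n-1)} r_1$ for $n = 1, 2, \ldots$, it follows that
$\rho(x_{n+k}, x_n) \le \sum_{j=0}^{k-1} \rho(x_{n+j+1}, x_{n+j}) < \sum_{j=0}^{k-1} 2^{-(n+j-1)} r_1 < 2^{-n+2} r_1.$
Hence, $\{x_n\}_{n=1}^{\infty}$ is a Cauchy sequence and, so, its limit exists; call it $x$. It
follows from (8.2) and (8.3) that for $n \ge m$,
$x_n \in \overline{B}_{r_m}(x_m) \subset G_m.$
Because $\overline{B}_{r_m}(x_m)$ is closed, $x \in G_m$ for $m = 1, 2, \ldots$. As we also have
$x_n \in \overline{B}_{r_1}(x_1) \subset U$, we conclude that $x \in U$. Hence, $x \in U \cap (\bigcap_{n=1}^{\infty} G_n)$.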
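The nested-ball construction in the proof can be imitated on the real line. In the sketch below, each $G_n$ is the complement of a single rational $q_n$, the starting open set is $U = (0,1)$, and every step moves to a smaller ball whose closure avoids the next point. The rule for choosing the new center and the shrink factor $1/4$ are assumptions made for illustration, since the proof only requires $r_{n+1} < r_n/2$ and the two inclusions.

```python
from fractions import Fraction

def avoid(points, center=Fraction(1, 2), radius=Fraction(1, 2)):
    """Shrink a ball in R step by step so that its closure misses each listed point.

    Returns the list of (center, radius) pairs; ball n+1 has its closure inside ball n.
    """
    balls = [(center, radius)]
    for q in points:
        c, r = balls[-1]
        # Move to the half of the ball that lies away from q, then shrink.
        new_c = c + r / 2 if q <= c else c - r / 2
        balls.append((new_c, r / 4))
    return balls

# A finite stretch of an enumeration of the rationals in (0, 1), repeats included.
rationals = [Fraction(p, q) for q in range(2, 12) for p in range(1, q)]
balls = avoid(rationals)
```

After finitely many steps the last ball already misses every rational processed so far. In the full argument, completeness supplies the limit point, which then lies in every $G_n$.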
The following corollary is an immediate consequence of the proof of
Theorem 8.1 and is sometimes referred to as the Baire category theorem.
COROLLARY 8.1 Baire Category Theorem (Alternative Version)
Let $\{G_n\}_{n=1}^{\infty}$ be a sequence of dense open subsets of a complete metric
space. Then $\bigcap_{n=1}^{\infty} G_n$ is dense.
A subset of a topological space is said to be of the first category if
it can be expressed as a countable union of nowhere dense sets. A set that
is not of the first category is said to be of the second category.
Using this terminology, we can restate the Baire category theorem as
follows: A complete metric space is of the second category. It shows that
sets of the first category in a complete metric space are in a sense much
smaller than sets of the second category.
The Baire category theorem is frequently used as a tool for obtaining
existence results. One establishes the existence of an object of a certain
type by showing that the complement of the collection of such objects is
of first category in an appropriate complete metric space. Examples 8.2
and 8.3 illustrate this technique.
EXAMPLE 8.2 Illustrates the Baire Category Theorem
We know that $\mathcal{R}$ is a complete metric space. Whereas every single-element
set in $\mathcal{R}$ is nowhere dense and the set $Q$ of rational numbers is countable,
it follows that $Q$ is a set of the first category. Hence, the Baire category
theorem implies that $Q^c$ is nonempty, that is, irrational numbers exist. □
EXAMPLE 8.3 Illustrates the Baire Category Theorem
By Theorem 7.19 on page 486, the space $C([0,1], \mathcal{R})$ equipped with the
norm $\|\ \|_{[0,1]}$ is a complete metric space, where
$\|f\|_{[0,1]} = \sup\{|f(x)| : x \in [0,1]\}.$
We will use the Baire category theorem to show that “most” functions in
$C([0,1], \mathcal{R})$ vary erratically in the sense that they fail to be monotonic on
any nonempty subinterval. A stronger result of this type is developed in
Exercise 8.2.
Let $\mathcal{I}$ denote the collection of nonempty open subintervals of $[0,1]$
having rational endpoints. For $I \in \mathcal{I}$, let $U_I$ and $D_I$ denote, respectively,
the set of functions in $C([0,1], \mathcal{R})$ that are nondecreasing and nonincreasing
on $I$. If we let $F$ denote the set of functions in $C([0,1], \mathcal{R})$ that are
monotonic on some nonempty subinterval of $[0,1]$, then we have that
$F = \bigcup_{I \in \mathcal{I}} (U_I \cup D_I). \qquad (8.4)$
It is not hard to see that for each I G I, Uj and Dj are closed
in C([0,1],7£). If we can show that, for each I G T, Uj and Dj have
empty interiors, then it will follow from (8.4) that F is a set of the first
category. In particular then, the Baire category theorem will imply that
Fc is nonempty; that is, there are functions in C([0,1],7£) that fail to be
monotonic on any nonempty subinterval of [0,1].
We will prove that $(U_I)^\circ = \emptyset$. A similar proof shows that $(D_I)^\circ = \emptyset$.
Let $\epsilon > 0$ and $f \in U_I$. We choose a point $t \in I$ and $\delta > 0$ small enough so
that the function $f$ varies by less than $\epsilon/2$ on the interval $[t - \delta, t + \delta] \subset I$.
Define
$$g(x) = \begin{cases} f(x), & \text{if } x \in [0,1] \setminus [t - \delta, t + \delta]; \\ f(x) + \delta^{-2}\epsilon\,(\delta^2 - (x - t)^2), & \text{if } x \in [t - \delta, t + \delta]. \end{cases}$$
It is easy to see that $g$ is continuous and that $\|f - g\|_{[0,1]} = \epsilon$. On the other
hand, $g \notin U_I$ because $g(t)$ is greater than $g(t + \delta)$. This shows that every
ball around $f$ contains points outside $U_I$.	□
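The perturbation used in Example 8.3 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the particular choices $f(x) = x^2$, $t = 0.5$, $\delta = 0.01$, $\epsilon = 0.1$ and the helper name `bump_perturbation` are illustrative assumptions, not part of the text.

```python
import numpy as np

def bump_perturbation(f, t, delta, eps):
    """Return g with g = f + eps*(delta^2 - (x-t)^2)/delta^2 on [t-delta, t+delta], g = f elsewhere."""
    def g(x):
        x = np.asarray(x, dtype=float)
        inside = np.abs(x - t) <= delta
        bump = eps * (delta**2 - (x - t) ** 2) / delta**2
        return f(x) + np.where(inside, bump, 0.0)
    return g

# Hypothetical choice: f(x) = x^2 is nondecreasing on [0, 1] and varies by
# 0.02 < eps/2 on [t - delta, t + delta] = [0.49, 0.51].
f = lambda x: np.asarray(x, dtype=float) ** 2
t, delta, eps = 0.5, 0.01, 0.1
g = bump_perturbation(f, t, delta, eps)

xs = np.linspace(0.0, 1.0, 100001)
sup_dist = float(np.max(np.abs(f(xs) - g(xs))))   # approximates ||f - g|| on [0, 1]
fails_monotone = bool(g(t) > g(t + delta))        # g decreases somewhere inside I
```

Here `sup_dist` approximates the distance $\epsilon$ between $f$ and $g$, and `fails_monotone` records that $g$ is not nondecreasing on the subinterval.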
8.1 The Baire Category Theorem □ 497
We conclude this section with an important consequence of the Baire
category theorem called the uniform boundedness principle. We will
give some applications of the uniform boundedness principle in the exer-
cises. It will also prove useful later when we return to our study of normed
spaces.
THEOREM 8.2 Uniform Boundedness Principle
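Let $\Omega$ be a complete metric space and $\mathcal{F}$ a family of continuous functions
from $\Omega$ into a normed space $(\Lambda, \|\ \|)$. Suppose that for each $x \in \Omega$,
$$\sup\{\, \|f(x)\| : f \in \mathcal{F} \,\} < \infty. \tag{8.5}$$
Then there is an $M \in \mathcal{R}$ and a nonempty open set $O$ such that $\|f(u)\| \le M$
for all $u \in O$ and $f \in \mathcal{F}$.
PROOF: For each $n \in \mathcal{N}$ and $f \in \mathcal{F}$, the set $\{\, x : \|f(x)\| \le n \,\}$ is closed.
Thus,
$$E_n = \bigcap_{f \in \mathcal{F}} \{\, x : \|f(x)\| \le n \,\} = \{\, x : \|f(x)\| \le n \text{ for all } f \in \mathcal{F} \,\}$$
is closed. By (8.5), $\Omega = \bigcup_{n=1}^{\infty} E_n$. The Baire category theorem now implies
that there is an integer $N$ such that $(E_N)^\circ \ne \emptyset$. Thus, the assertion of the
theorem is verified with $M = N$ and $O = (E_N)^\circ$.	■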
EXERCISES 8.1
8.1	Let $E = \{\, f \in C([0,1]) : |f(x) - f(y)| \le |x - y| \text{ for all } x, y \in [0,1] \,\}$. Show
that $E$ is nowhere dense in $C([0,1])$.
8.2	For this exercise, we need to extend the definition of differentiability given in
Definition 6.1 on page 316. When we have a function $f$ defined on a closed
interval $[a, b]$, we say that $f$ is differentiable at the left endpoint $a$ if the limit
$$\lim_{h \to 0^+} \frac{f(a + h) - f(a)}{h}$$
exists and is finite. In that case the limit is called the derivative of $f$ at $a$
and is denoted by $f'(a)$. In a similar way, we define differentiability at the
right endpoint $b$.
a)	Let $D$ denote the set of functions in $C([0,1])$ that are differentiable at
some point of $[0,1]$. Show that $D$ is of the first category. Hint: Consider
the set of functions
$$D_n = \{\, f : |f(t) - f(x)| \le n|t - x| \text{ for all } t \in [0,1] \text{ for some } x \in [0,1] \,\}.$$
b)	Deduce from part (a) that there are functions in C([0,1]) that are not
differentiable at any point of [0,1].
8.3	Show that, as a subset of $(\mathcal{L}^1([0,1]), \|\ \|_1)$, $\mathcal{L}^2([0,1])$ is of the first category.
Hint: See Exercise 4.84 on page 206.
8.4	In this exercise, we will be considering functions from $\mathcal{R}$ to $\mathcal{R}$.
a)	Give an example of a function that is continuous at each irrational and
discontinuous at each rational. Hint: Consider the function that is 0 at
each irrational, 1 at 0, and $1/q$ at each rational of the form $p/q$, where
$q > 0$ and $p$ and $q$ have no common divisor except 1.
b)	Is there a function that is continuous at each rational and discontinuous
at each irrational? Hint: Associate with each function $f\colon \mathcal{R} \to \mathcal{R}$ functions
$u$ and $\ell$ defined by $u(x) = \inf\{\, \sup\{\, f(y) : |x - y| < \delta \,\} : \delta > 0 \,\}$ and
$\ell(x) = \sup\{\, \inf\{\, f(y) : |x - y| < \delta \,\} : \delta > 0 \,\}$. Consider the set where $\ell$ is
strictly less than $u$.
8.5	Show that there is no sequence $\{f_n\}_{n=1}^{\infty}$ in $C([0,1])$ that satisfies
$$\lim_{n \to \infty} f_n(x) = \begin{cases} 1, & \text{if $x$ is rational;} \\ 0, & \text{if $x$ is irrational.} \end{cases}$$
In Exercises 8.6 and 8.7, we will need the concept of a basis for a linear space. A
subset $S$ of a linear space $\Omega$ is said to be a basis for $\Omega$ if for each nonzero $x \in \Omega$,
there is a unique subset $\{x_1, x_2, \ldots, x_n\}$ of $S$ and a unique set of nonzero scalars
$\{a_1, a_2, \ldots, a_n\}$ such that $x = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$. It follows from Zorn’s
lemma that every linear space has a basis. The number of elements in a basis
is called the dimension of the linear space. A linear space is said to be finite
dimensional if it has a basis containing finitely many elements; otherwise, it is
said to be infinite dimensional.
8.6	Let $\Omega$ be a normed space and $D$ a proper, finite-dimensional, linear subspace
of $\Omega$.
a)	Show that $D$ is closed.
b)	Show that $D$ is nowhere dense.
8.7	Let $\Omega$ be a complete normed space and $S$ a basis for $\Omega$. Prove that $S$ is
either finite or uncountable. Hint: Refer to Exercise 8.6.
8.8	Let $\Omega$ be a normed space and $D$ a proper closed linear subspace of $\Omega$. Show
that $D$ is nowhere dense.
8.2	CONTRACTIONS OF COMPLETE METRIC SPACES
Let $f\colon \Omega \to \Omega$. An element $p \in \Omega$ is called a fixed point of $f$ if $f(p) = p$.
In this section, we will give a simple condition on $f$ that guarantees the
existence of a fixed point when $\Omega$ is complete. We will also give some
applications of that condition to differential and integral equations.
DEFINITION 8.2 Contraction
Let $(\Omega, \rho)$ be a metric space. A mapping $f\colon \Omega \to \Omega$ is called a
contraction if there is a constant $c \in [0,1)$ such that
$$\rho(f(x), f(y)) \le c\,\rho(x, y)$$
for all $x, y \in \Omega$.
EXAMPLE 8.4 Illustrates Definition 8.2
a)	It is obvious that contractions are always continuous functions.
b)	Suppose $a$ is a positive constant and define $f(x) = ax$ for $x \in [0,1]$.
If $a \le 1$, then $f\colon [0,1] \to [0,1]$. If $a < 1$, then $f$ is a contraction with
$c = a$. If $a = 1$, then $f$ is not a contraction, since
$$1 = |1 - 0| = |f(1) - f(0)| \le c\,|1 - 0| = c$$
implies $c \ge 1$. Note, however, that 0 is a fixed point of $f$ for all $a$.
c)	Suppose that $f\colon [0,1] \to [0,1]$ is continuous and has a derivative at each
point of $(0,1)$. Suppose further that
$$B = \sup\{\, |f'(x)| : x \in (0,1) \,\} < \infty.$$
By the mean value theorem, for $x, y \in [0,1]$ with $x < y$, we have that
$f(y) - f(x) = f'(t)(y - x)$ for some $t \in (x, y)$. It follows immediately
that $|f(y) - f(x)| \le B|y - x|$ for all $x, y \in [0,1]$. Hence, $f$ is a contraction
if $B < 1$.
d)	An interesting special case of part (c) is the cosine function. Because
the derivative of $\cos x$ is $-\sin x$ and
$$\sup\{\, |-\sin x| : x \in [0,1] \,\} = \sin 1 < \sin(\pi/2) = 1,$$
it follows from part (c) that the cosine function is a contraction of $[0,1]$.
e)	Let $\Omega = \{\, z \in \mathbb{C} : |z| \le 1 \,\}$, the closed unit disk, equipped with the usual
metric inherited from $\mathbb{C}$. Define $f\colon \Omega \to \Omega$ by $f(z) = az$, where $a$ is a
complex constant with $|a| \le 1$. It is easy to see that $f$ is a contraction
if and only if $|a| < 1$. Note, however, that 0 is a fixed point of $f$ for
all $a$.
f)	Let $\Omega = \{\, z \in \mathbb{C} : |z| < 1 \,\}$, the open unit disk, equipped with the usual
metric inherited from $\mathbb{C}$. Define $f\colon \Omega \to \Omega$ by $f(z) = (1 + z)/2$. Then
$f$ is a contraction with $c = 1/2$, but has no fixed point.	□
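Parts (d) and (f) of Example 8.4 can be explored numerically. The sketch below uses only the Python standard library; the helper name `lipschitz_estimate`, the grid size, and the iteration counts are assumptions made for illustration. It estimates the contraction constant of the cosine on $[0,1]$ from difference quotients, iterates the cosine toward its fixed point, and iterates $z \mapsto (1 + z)/2$ to show the iterates staying inside the open unit disk while approaching the boundary point 1.

```python
import math

def lipschitz_estimate(f, a, b, n=2000):
    """Largest difference quotient of f over consecutive points of a uniform grid on [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return max(abs(f(xs[i + 1]) - f(xs[i])) / (xs[i + 1] - xs[i]) for i in range(n))

# Part (d): by the mean value theorem every quotient is at most sin 1 < 1.
c_est = lipschitz_estimate(math.cos, 0.0, 1.0)

# Iterating the cosine from any start in [0, 1] approaches its fixed point.
x = 0.0
for _ in range(200):
    x = math.cos(x)

# Part (f): the iterates of z -> (1 + z)/2 stay in the open disk but tend to 1,
# which is not a point of the space.
z = 0.0
for _ in range(30):
    z = (1.0 + z) / 2.0
```

The last loop illustrates why completeness cannot be dropped: the only candidate limit of the iterates lies outside the open disk.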
THEOREM 8.3 Contraction Mapping Principle
Let $(\Omega, \rho)$ be a complete metric space and $f\colon \Omega \to \Omega$ a contraction. Then
$f$ has a unique fixed point. Furthermore, if $x$ is any point of $\Omega$, then
the sequence $\{x_n\}_{n=0}^{\infty}$ defined recursively by $x_0 = x$ and $x_{n+1} = f(x_n)$
converges to the unique fixed point of $f$.
PROOF: By Definition 8.2 there is a constant $c \in [0,1)$ such that
$$\rho(x_{n+1}, x_n) = \rho(f(x_n), f(x_{n-1})) \le c\,\rho(x_n, x_{n-1}),$$
for $n = 1, 2, \ldots$. It follows immediately that $\rho(x_{n+1}, x_n) \le c^n \rho(x_1, x)$.
Thus, if $1 \le n < m$, we have
$$\rho(x_m, x_n) \le \sum_{j=1}^{m-n} \rho(x_{n+j}, x_{n+j-1}) \le \sum_{j=1}^{m-n} c^{\,n+j-1} \rho(x_1, x) \le c^n \rho(x_1, x) \sum_{j=0}^{\infty} c^{\,j} = c^n \rho(x_1, x)/(1 - c).$$
Because $0 \le c < 1$, we have that $\{x_n\}_{n=0}^{\infty}$ is a Cauchy sequence. Hence,
by completeness, $p = \lim_{n \to \infty} x_n$ exists. Using the continuity of $f$, we
conclude that
$$f(p) = \lim_{n \to \infty} f(x_n) = \lim_{n \to \infty} x_{n+1} = p.$$
Thus, $p$ is a fixed point of $f$.
It remains to establish uniqueness. Let $q$ be a fixed point of $f$. Then
$\rho(p, q) = \rho(f(p), f(q)) \le c\,\rho(p, q)$. Since $c < 1$, it follows that $\rho(p, q) = 0$.
Hence, $p = q$.	■
We note from the proof of the contraction mapping principle that we
can obtain not only the existence of a unique fixed point, but a method for
approximating it and an error estimate. See Exercise 8.12.
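The approximation method and error estimate mentioned here can be sketched as follows. This is an illustrative sketch, not the book's algorithm: the function name `iterate_with_bound` is hypothetical, the metric is taken to be the absolute value on the real line, and the test case reuses the cosine on $[0,1]$ with $c = \sin 1$ from Example 8.4(d). The bound attached to each iterate is $c^n \rho(x_1, x)/(1 - c)$, the quantity appearing in the proof of Theorem 8.3.

```python
import math

def iterate_with_bound(f, x0, c, n_steps):
    """Return [(x_n, c**n * |x_1 - x_0| / (1 - c))] for n = 0, ..., n_steps."""
    d0 = abs(f(x0) - x0)            # rho(x_1, x) in the notation of the proof
    results, x = [], x0
    for n in range(n_steps + 1):
        results.append((x, c**n * d0 / (1.0 - c)))
        x = f(x)
    return results

c = math.sin(1.0)                   # contraction constant of cos on [0, 1]
table = iterate_with_bound(math.cos, 0.0, c, 40)

# Reference value of the fixed point, obtained by iterating much longer.
p = 0.0
for _ in range(500):
    p = math.cos(p)

within_bound = all(abs(x - p) <= bound for x, bound in table)
```

The bound is an a priori one: it is computed from the first step alone, so the number of iterations needed for a given accuracy can be decided before iterating.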
EXAMPLE 8.5 Illustrates the Contraction Mapping Principle
We will illustrate the contraction mapping principle by using it to obtain
an existence result for a certain class of integral equations. Let $K$ be a
real-valued Borel measurable function defined on the rectangle
$$I \times J = [x_0 - \alpha, x_0 + \alpha] \times [y_0 - \beta, y_0 + \beta],$$
where $\alpha, \beta > 0$. We make the following assumptions:
$$|K(x, y_1) - K(x, y_2)| \le A\,|y_1 - y_2| \tag{8.6}$$
for all $x \in I$ and any pair $y_1, y_2 \in J$, where $A$ is a constant, and
$$B = \sup\{\, |K(x, y)| : (x, y) \in I \times J \,\} < \infty. \tag{8.7}$$
We will show using the contraction mapping principle that the integral
equation
$$g(x) = y_0 + \int_{x_0}^{x} K(t, g(t))\,dt \tag{8.8}$$
has a unique solution $g \in C(I, \mathcal{R})$ if $\alpha A < 1$ and $\alpha B \le \beta$.
By Theorem 7.19 on page 486, $C(I, \mathcal{R})$ is a complete metric space with
respect to the norm $\|\ \|_I$. Hence, by Proposition 7.10 on page 435, the
closed ball $\overline{B}_\beta(y_0)$, centered at the constant function $y_0$, is also complete.
If $g \in \overline{B}_\beta(y_0)$, then the function $T(g)$ defined by
$$T(g)(x) = y_0 + \int_{x_0}^{x} K(t, g(t))\,dt \tag{8.9}$$
is continuous on $I$. (See Exercise 8.14.) Furthermore, since
$$|T(g)(x) - y_0| = \left| \int_{x_0}^{x} K(t, g(t))\,dt \right| \le \alpha B \le \beta,$$
it follows that $T(g) \in \overline{B}_\beta(y_0)$. Thus, the function $T$ defined by (8.9) carries
$\overline{B}_\beta(y_0)$ into itself.
We will show that (8.8) has a unique solution in $\overline{B}_\beta(y_0)$ by showing
that $T$ is a contraction. For $f, g \in \overline{B}_\beta(y_0)$ and $x \ge x_0$ we have, using (8.6),
$$|T(f)(x) - T(g)(x)| = \left| \int_{x_0}^{x} \bigl( K(t, f(t)) - K(t, g(t)) \bigr)\,dt \right| \le A \int_{x_0}^{x} |f(t) - g(t)|\,dt \le \alpha A\,\|f - g\|_I.$$
Similar inequalities hold if $x < x_0$. Hence, $\|T(f) - T(g)\|_I \le \alpha A\,\|f - g\|_I$
and, because $\alpha A < 1$, $T$ is a contraction on $\overline{B}_\beta(y_0)$.	□
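The iteration behind Example 8.5 can be imitated on a grid. The following sketch assumes NumPy; the helper name `picard_operator`, the trapezoid rule, and the test case $K(t, y) = y$, $x_0 = 0$, $y_0 = 1$, $\alpha = 1/2$, $\beta = 1$ are illustrative assumptions. For this choice $J = [0, 2]$, $A = 1$, and $B = 2$, so $\alpha A = 1/2 < 1$ and $\alpha B = 1 \le \beta$, and the solution of (8.8) is the exponential function.

```python
import numpy as np

def picard_operator(K, x0, y0, xs):
    """Discretized T(g)(x) = y0 + integral from x0 to x of K(t, g(t)) dt (trapezoid rule on xs)."""
    i0 = int(np.argmin(np.abs(xs - x0)))          # grid index of x0
    def T(g_vals):
        integrand = K(xs, g_vals)
        steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(xs)
        cumulative = np.concatenate(([0.0], np.cumsum(steps)))   # integral from xs[0]
        return y0 + cumulative - cumulative[i0]                   # re-base the integral at x0
    return T

x0, y0, alpha, beta = 0.0, 1.0, 0.5, 1.0          # hypothetical data for the sketch
xs = np.linspace(x0 - alpha, x0 + alpha, 2001)
T = picard_operator(lambda t, y: y, x0, y0, xs)

g_vals = np.full_like(xs, y0)                     # start from the constant function y0
for _ in range(30):
    g_vals = T(g_vals)

max_error = float(np.max(np.abs(g_vals - np.exp(xs))))
stays_in_ball = bool(np.max(np.abs(g_vals - y0)) <= beta)
```

The remaining error comes from two sources: the contraction factor $\alpha A$ raised to the number of iterations, and the quadrature error of the trapezoid rule.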
The conditions (8.6) and (8.7) appear rather restrictive. However,
they can be used effectively to obtain a fundamental existence result for
differential equations. See Exercise 8.16.
EXERCISES 8.2
8.9	Give an example of a complete metric space $(\Omega, \rho)$ and a map $f\colon \Omega \to \Omega$
that satisfies $\rho(f(x), f(y)) < \rho(x, y)$ but has no fixed point.
8.10	Define $f\colon [0,1] \to [0,1]$ by $f(x) = x^2$. Show that $f$ is not a contraction.
Note, however, that $f$ has two fixed points, 0 and 1.
8.11	Let $f\colon [0,1] \to [0,1]$ be continuous.
a)	Show that $f$ has a fixed point.
b)	Must $f$ have a unique fixed point?
8.12	Let $(\Omega, \rho)$, $f$, $\{x_n\}_{n=0}^{\infty}$, and $p$ be as in the proof of the contraction mapping
principle. Establish the error estimate $\rho(x_n, p) \le c^n \rho(x, f(x))/(1 - c)$.
8.13	Recall Newton’s method $x_{n+1} = x_n - G(x_n)/G'(x_n)$ for approximating the
roots of a function $G$. Suppose that $r$ is a root of $G$. Further suppose that
$\delta$ is a positive number such that $G'$ does not vanish on $[r - \delta, r + \delta]$ and
$$c = \sup\{\, |G(x)G''(x)/(G'(x))^2| : x \in [r - \delta, r + \delta] \,\} < 1.$$
Show that if the initial guess $x_0$ in Newton’s method is chosen from the
interval $(r - \delta, r + \delta)$, then the sequence $\{x_n\}_{n=0}^{\infty}$ converges to $r$ and satisfies
$$|x_n - r| \le \frac{c^n}{1 - c}\,|G(x_0)/G'(x_0)|.$$
Hint: See Exercise 8.12.
8.14	Refer to Example 8.5.
a)	Verify that $K(\cdot, g(\cdot))$ is Borel measurable if $g \in \overline{B}_\beta(y_0)$.
b)	Use part (a) to prove that T(g) is continuous on I.
8.15	Let $T\colon C([0,1]) \to C([0,1])$ be defined by $T(f)(x) = 1 + \int_0^x f(t)\,dt$.
a)	Show that $T$ is not a contraction.
b)	Show that $T \circ T$ is a contraction.
c)	Show by direct calculation that the sequence $\{f_n\}_{n=0}^{\infty}$ defined recursively
by $f_0 = 1$ and $f_{n+1} = T(f_n)$ converges in $C([0,1])$ to the solution of
$$f(x) = 1 + \int_0^x f(t)\,dt.$$
In Exercises 8.16 and 8.17 we use the following notation for partial derivatives.
Suppose $f$ is a real- or complex-valued function defined on some open subset
of $\mathcal{R}^n$ containing the point $x = (x_1, \ldots, x_n)$. When
$$\lim_{h \to 0} \frac{f(x_1 + h, x_2, \ldots, x_n) - f(x_1, x_2, \ldots, x_n)}{h}$$
exists and is finite, we denote it by $D_1 f(x)$ and call it the partial derivative of $f$
at $x$ with respect to $x_1$. Partial derivatives with respect to $x_2, x_3, \ldots, x_n$ are
defined similarly. We will make use of standard results about partial differentiation
from multivariable calculus.†
† See, for example, A. E. Taylor and W. R. Mann, Advanced Calculus, 3rd edition
(New York: Wiley, 1983).
8.16	Use Example 8.5 to establish the following existence theorem for differential
equations: Suppose that $f$ and $D_2 f$ are defined and continuous on some
open set containing the point $(x_0, y_0) \in \mathcal{R}^2$. Then there is a $\delta > 0$ and
a unique continuously differentiable function $g$ such that $g(x_0) = y_0$ and
$g'(x) = f(x, g(x))$ for $|x - x_0| < \delta$.
8.17	Show that the uniqueness part of Exercise 8.16 fails if the conditions on the
derivative $D_2 f$ are removed. Hint: Consider д' = Зрз /2.
8.3	COMPACTNESS IN THE SPACE $C(\Omega, \Lambda)$
We now take up the study of compactness in the space $(C(\Omega, \Lambda), \mathcal{T}(\Omega, \Lambda))$,
introduced in Section 7.12 beginning on page 481, where $\Omega$ is a topological
space and $(\Lambda, \rho)$ is a metric space. Under certain mild restrictions on $\Omega$, we
give useful necessary and sufficient conditions for a subset $D$ of $C(\Omega, \Lambda)$ to
be compact with respect to the topology $\mathcal{T}(\Omega, \Lambda)$ of uniform convergence
on compact subsets. But first we present a simple example.
EXAMPLE 8.6 Illustrates Compactness in Function Spaces
Consider the space $C(\mathcal{R})$ equipped with the topology $\mathcal{T}(\mathcal{R})$ of uniform
convergence on compact subsets.
a)	As for any topological space, any finite subset of $C(\mathcal{R})$ is compact.
b)	Let $f \in C(\mathcal{R})$ and define $g\colon [0,1] \to C(\mathcal{R})$ by $g(t)(x) = f(x + t)$. It is not
difficult to show that $g$ is continuous and, therefore, from Theorem 7.11
on page 472, $\{\, f(\cdot + t) : t \in [0,1] \,\}$ is a compact subset of $C(\mathcal{R})$.
c)	Let $f \in C(\mathcal{R})$. The set $\{\, f(\cdot + t) : t \in \mathcal{R} \,\}$ may fail to be a compact
subset of $C(\mathcal{R})$, as is the case when $f(x) = x$.	□
The construction of more elaborate examples of compactness in function
spaces requires some theory. We first develop a necessary condition
for the compactness of $D \subset C(\Omega, \Lambda)$ using the functions $\{\, e_x : x \in \Omega \,\}$.
We will use a result from Exercise 7.174 on page 491, which we now state
formally as a proposition.
PROPOSITION 8.1
For $x \in \Omega$, define $e_x\colon C(\Omega, \Lambda) \to \Lambda$ by $e_x(f) = f(x)$. Then $e_x$ is a continuous
function.
Applying Proposition 8.1 and Theorem 7.11 (page 472) we get the
following necessary condition for compactness of a subset of $C(\Omega, \Lambda)$.
PROPOSITION 8.2
If $D \subset C(\Omega, \Lambda)$ is compact, then $\{\, f(x) : f \in D \,\}$ is a compact subset of $\Lambda$
for each $x \in \Omega$.
Next, using Proposition 8.2 and Theorem 7.7 on page 466, we derive
another necessary condition for compactness of a subset of $C(\Omega, \Lambda)$.
PROPOSITION 8.3
Suppose that $\Omega$ is a locally compact Hausdorff space and $D$ is a compact
subset of $C(\Omega, \Lambda)$. Then, given $x \in \Omega$ and $\epsilon > 0$, there is an open set $W$
containing $x$ such that $\rho(f(x), f(y)) < \epsilon$ for all $y \in W$ and $f \in D$.
PROOF: First recall that for $f, g \in C(\Omega, \Lambda)$ and $S \subset \Omega$,
$$\rho_S(f, g) = \sup\{\, \rho(f(x), g(x)) : x \in S \,\}$$
and that $\mathcal{T}(\Omega, \Lambda)$ is the weak topology on $C(\Omega, \Lambda)$ determined by the family
of functions $\{\, \rho_K(\cdot, g) : K \text{ compact},\ g \in C(\Omega, \Lambda) \,\}$.
Now let $U$ be an open set containing $x$ such that $\overline{U}$ is compact. Then,
for each $h \in C(\Omega, \Lambda)$, the function $\rho_{\overline{U}}(\cdot, h)$ is continuous. It follows that the
collection of sets $\{\, J_f : f \in D \,\}$ where $J_f = \{\, g : \rho_{\overline{U}}(g, f) < \epsilon/3 \,\}$ is an open
covering of $D$. Hence, there are finitely many functions $f_1, f_2, \ldots, f_n \in D$
such that $D \subset \bigcup_{j=1}^{n} J_{f_j}$. Because there are only finitely many $f_j$s, we can
find an open set $W$ such that $x \in W \subset U$ and $\rho(f_j(x), f_j(y)) < \epsilon/3$ for
each $y \in W$ and $j = 1, 2, \ldots, n$.
If $f \in D$, we choose $k \in \{1, 2, \ldots, n\}$ such that $\rho_{\overline{U}}(f_k, f) < \epsilon/3$. Then
for each $y \in W$, we have
$$\rho(f(x), f(y)) \le \rho(f(x), f_k(x)) + \rho(f_k(x), f_k(y)) + \rho(f_k(y), f(y)) \le 2\rho_{\overline{U}}(f_k, f) + \rho(f_k(x), f_k(y)) < 2\epsilon/3 + \epsilon/3 = \epsilon,$$
as required.	■
The necessary condition derived in Proposition 8.3 is a natural exten-
sion of the notion of continuity. It is so important that it merits a formal
definition, which we now give. Note that the definition does not require $\Omega$
to be locally compact or Hausdorff.
DEFINITION 8.3 Equicontinuity
A subset $D$ of $C(\Omega, \Lambda)$ is said to be equicontinuous on $\Omega$ if for each
$x \in \Omega$ and $\epsilon > 0$, there is an open set $W$ containing $x$ such that
$\rho(f(x), f(y)) < \epsilon$ for all $y \in W$ and $f \in D$.
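A numerical comparison may help to make Definition 8.3 concrete. The sketch below assumes NumPy and takes $\Omega = \Lambda = \mathcal{R}$; the helper name `worst_oscillation` and the two finite families are hypothetical choices. The functions $\sin(ny)/n$ all have slope at most 1, so a single neighborhood works for the whole family, whereas for $\sin(ny)$ the oscillation near 0 stays close to 1 on the sampled neighborhoods. Because only finitely many $n$ are sampled, this is an illustration and not a proof.

```python
import numpy as np

def worst_oscillation(family, x, radius, samples=2001):
    """Sampled value of sup{ |f(x) - f(y)| : f in family, |y - x| <= radius }."""
    ys = np.linspace(x - radius, x + radius, samples)
    return max(float(np.max(np.abs(f(ys) - f(x)))) for f in family)

ns = range(1, 201)
steep = [lambda y, n=n: np.sin(n * y) for n in ns]        # slopes grow with n
tame = [lambda y, n=n: np.sin(n * y) / n for n in ns]     # all slopes bounded by 1

radii = [0.1, 0.01]
steep_osc = [worst_oscillation(steep, 0.0, r) for r in radii]
tame_osc = [worst_oscillation(tame, 0.0, r) for r in radii]
```

For the second family, the open set $W = (x - \epsilon, x + \epsilon)$ serves every member at once, which is exactly what the definition asks for.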
We will now show that under mild conditions on $\Omega$, the necessary
conditions derived in Propositions 8.2 and 8.3 are also sufficient.
THEOREM 8.4
Let $\Omega$ be a second countable locally compact Hausdorff space. A subset $D$
of $C(\Omega, \Lambda)$ is compact if and only if it is closed and satisfies the following
conditions:
a)	$\overline{\{\, f(x) : f \in D \,\}}$ is a compact subset of $\Lambda$ for each $x \in \Omega$.
b)	$D$ is equicontinuous.
PROOF: Propositions 8.2 and 8.3 already show that conditions (a) and (b)
are implied by the compactness of D.
Assume that (a) and (b) hold and that $D$ is closed. From Theorem 7.19
on page 486, we know that the space $C(\Omega, \Lambda)$ is metrizable. Hence, by
Theorem 7.7 on page 466, the compactness of $D$ will follow if we can show
that every sequence in $D$ has a subsequence that converges uniformly on
compact subsets of $\Omega$.
So let $\{f_n\}_{n=1}^{\infty} \subset D$. Since $\Omega$ is second countable, Proposition 7.20 on
page 461 implies that it contains a countable dense set, $E$. Suppose we can
find a subsequence $\{g_k\}_{k=1}^{\infty}$ of $\{f_n\}_{n=1}^{\infty}$ such that
$$g(y) = \lim_{k \to \infty} g_k(y) \tag{8.10}$$
exists for all $y \in E$. We will show that the limit in (8.10) exists for all
$x \in \Omega$, that the limit function $g$ is continuous, and that $\{g_k\}_{k=1}^{\infty}$ converges
uniformly on compact subsets of $\Omega$ to $g$.
Let $x \in \Omega$ and $\epsilon > 0$ be given. By (b) we can choose an open set $W$
containing $x$ such that
$$\rho(g_k(x), g_k(w)) < \epsilon/3 \tag{8.11}$$
for each $w \in W$ and $k \in \mathcal{N}$. As $E$ is dense in $\Omega$, there is a $y \in E \cap W$.
By (8.10) there is an $N \in \mathcal{N}$ such that $\rho(g_k(y), g_\ell(y)) < \epsilon/3$ whenever
$k, \ell \ge N$. It follows that
$$\rho(g_k(x), g_\ell(x)) \le \rho(g_k(x), g_k(y)) + \rho(g_k(y), g_\ell(y)) + \rho(g_\ell(y), g_\ell(x)) < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon \tag{8.12}$$
for $k, \ell \ge N$. Thus, the sequence $\{g_k(x)\}_{k=1}^{\infty}$ is Cauchy. Since $\{g_k(x)\}_{k=1}^{\infty}$ is
contained in the compact set $\overline{\{\, f(x) : f \in D \,\}}$, it follows from Theorem 7.7
that $\lim_{k \to \infty} g_k(x) = g(x)$ exists. Thus, (8.10) continues to hold if $y$ is
replaced by any $x \in \Omega$.
We will now verify that $g$ is continuous at each $x \in \Omega$. For each
$\epsilon > 0$, we choose an open set $W$ as in the previous paragraph. From the
inequalities
$$|\rho(g(x), g(w)) - \rho(g_k(x), g_k(w))| \le |\rho(g(x), g(w)) - \rho(g(x), g_k(w))| + |\rho(g(x), g_k(w)) - \rho(g_k(x), g_k(w))| \le \rho(g(w), g_k(w)) + \rho(g(x), g_k(x)),$$
it follows that $\lim_{k \to \infty} \rho(g_k(x), g_k(w)) = \rho(g(x), g(w))$. Consequently, we
can let $k$ pass to infinity in (8.11) to obtain $\rho(g(x), g(w)) \le \epsilon/3 < \epsilon$ for
each $w \in W$.
Next we show that for each $x \in \Omega$ and $\epsilon > 0$, there is an open set $O_x$
containing $x$ and an integer $k_x$ such that
$$\rho(g_k(w), g(w)) < \epsilon, \qquad w \in O_x,\ k \ge k_x. \tag{8.13}$$
Indeed, by the continuity of $g$ and (b), we can choose an open set $O_x$
containing $x$ such that for each $w \in O_x$, we have $\rho(g(w), g(x)) < \epsilon/3$ and
$\rho(g_k(w), g_k(x)) < \epsilon/3$ for all $k$. Because $g(x) = \lim_{k \to \infty} g_k(x)$, it follows
that there is an integer $k_x$ such that $\rho(g_k(x), g(x)) < \epsilon/3$ for $k \ge k_x$.
Thus, for each $w \in O_x$, we have
$$\rho(g_k(w), g(w)) \le \rho(g_k(w), g_k(x)) + \rho(g_k(x), g(x)) + \rho(g(x), g(w)) < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon,$$
whenever $k \ge k_x$ and, so, (8.13) holds.
Now, let $K$ be a compact subset of $\Omega$ and $\epsilon > 0$. Then we can cover $K$
with finitely many open sets $\{O_{x_j}\}_{j=1}^{m}$, each of which satisfies (8.13). Let
$k_0 = \max\{\, k_{x_j} : j = 1, 2, \ldots, m \,\}$. If $y \in K$, then $y \in O_{x_j}$ for some $j$ and,
hence, $\rho(g_k(y), g(y)) < \epsilon$ for $k \ge k_0$. Thus, we have shown that $\{g_k\}_{k=1}^{\infty}$
converges uniformly on compact subsets of $\Omega$ to $g$.
It now remains only to show that there is a subsequence $\{g_k\}_{k=1}^{\infty}$
of $\{f_n\}_{n=1}^{\infty}$ satisfying (8.10). We shall do this by adapting the diagonalization
argument used in the proof of (d) ⇒ (c) in Theorem 7.7 on page 466.
Let $\{y_n\}_{n=1}^{\infty}$ be an enumeration of the countable dense subset $E$ of $\Omega$
that we selected earlier. By (a) we can select a subsequence $\{f_{[1,n]}\}_{n=1}^{\infty}$
of $\{f_n\}_{n=1}^{\infty}$ such that $g(y_1) = \lim_{n \to \infty} f_{[1,n]}(y_1)$ exists. And, then again
by (a), we can find a subsequence $\{f_{[2,n]}\}_{n=1}^{\infty}$ of $\{f_{[1,n]}\}_{n=1}^{\infty}$ such that
$g(y_j) = \lim_{n \to \infty} f_{[2,n]}(y_j)$ exists for $j = 1, 2$. Continuing in this manner,
we obtain a sequence of subsequences $\{\{f_{[k,n]}\}_{n=1}^{\infty}\}_{k=1}^{\infty}$ such that
$g(y_j) = \lim_{n \to \infty} f_{[k,n]}(y_j)$ exists for $j = 1, 2, \ldots, k$. Letting $g_k = f_{[k,k]}$, for
each $k \in \mathcal{N}$, we now have a subsequence of $\{f_n\}_{n=1}^{\infty}$ that satisfies (8.10).	■
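The index bookkeeping in the diagonalization argument can be illustrated with a toy example. In the sketch below, which uses only the Python standard library, everything is a hypothetical stand-in: $f_n(y_j)$ is taken to be the $j$th binary digit of $n$, and the $k$th nested subsequence is chosen by hand as the multiples of $2^k$. In the proof, the existence of each nested subsequence comes from compactness (condition (a)); here the choice is explicit so that the code can run.

```python
def digit(n, j):
    """Toy value f_n(y_j): the j-th binary digit of n (j = 1 is the least significant)."""
    return (n >> (j - 1)) & 1

def stage(k, length):
    """First `length` indices of the k-th nested subsequence: the multiples of 2**k."""
    return [m * 2**k for m in range(1, length + 1)]

def diagonal(length):
    """Indices [k, k]: the k-th index of the k-th stage, for k = 1, ..., length."""
    return [stage(k, k)[k - 1] for k in range(1, length + 1)]

original_at_y1 = [digit(n, 1) for n in range(1, 7)]   # alternates, so it has no limit
diag = diagonal(8)                                    # indices of g_1, ..., g_8

# Along the diagonal, the values at y_j are constant from the j-th term onward.
settled = all(digit(diag[k - 1], j) == 0
              for j in range(1, 9) for k in range(j, 9))
```

Each stage is a subsequence of the previous one, and the diagonal indices are strictly increasing, so the diagonal is itself a subsequence of the original sequence that converges at every $y_j$.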
In the mathematical literature, the following variant of Theorem 8.4
is frequently cited.
THEOREM 8.5 Ascoli-Arzela Theorem
Let $\Omega$ be a separable topological space and $D$ a subset of $C(\Omega, \Lambda)$ that
satisfies the following conditions:
a)	$\{\, f(x) : f \in D \,\}$ is a compact subset of $\Lambda$ for each $x \in \Omega$.
b)	$D$ is equicontinuous.
Then every sequence in $D$ has a subsequence that converges uniformly on
compact subsets of $\Omega$.
PROOF: In the proof of the sufficiency part of Theorem 8.4, the second
countability and local compactness conditions are used only to ensure that
$C(\Omega, \Lambda)$ is metrizable. If we are just trying to show that every sequence
in $D$ has a subsequence that converges uniformly on compact sets, then all
that is required for the proof of Theorem 8.4 to remain valid is the existence
of a countable dense subset of $\Omega$.	■
Next we give a simple example of the use of Theorem 8.4. More elab-
orate applications are left to the exercises.
EXAMPLE 8.7 Illustrates Theorem 8.4
Let $\Omega = [a, b]$ and $\Lambda = \mathbb{C}$.
a)	The closed ball $\overline{B}_r(f)$ of radius $r > 0$ in $C([a, b])$ clearly satisfies
condition (a) of Theorem 8.4. On the other hand, Exercise 8.18 shows
that $\overline{B}_r(f)$ is not equicontinuous. Thus, $\overline{B}_r(f)$ is not a compact subset
of $C([a, b])$.
b)	Let $A$ and $B$ be nonnegative constants and $F$ consist of all functions
$f \in C([a, b])$ satisfying $|f(a)| \le A$, $f$ is differentiable on $(a, b)$, and
$|f'(x)| \le B$ for all $x \in (a, b)$. We claim that $D = \overline{F}$ is compact.
Indeed, if $x \in (a, b]$, then, by the mean value theorem from elementary
calculus, we have that $f(x) = f(a) + f'(t)(x - a)$ for some
$t \in (a, x)$. Setting $M = A + (b - a)B$, we see that $|f(x)| \le M$ for all
$f \in F$ and $x \in [a, b]$. It follows easily that $F$ and, hence, $D$, satisfies
Theorem 8.4(a).
If $x, y \in [a, b]$ and $x \ne y$, then another application of the mean
value theorem yields $|f(x) - f(y)| \le B|x - y|$. It follows immediately
that $F$ is equicontinuous and, hence, by Exercise 8.19, so is $D$. We can
now conclude from Theorem 8.4 that $D = \overline{F}$ is compact.	□
EXERCISES 8.3
Some of the exercises in this section use the concept of a compact function.
Let $\Omega$ and $\Lambda$ be topological spaces and $U \subset \Omega$. A function $f\colon U \to \Lambda$ is said to
be compact if $f(U)$ is a compact subset of $\Lambda$.
8.18	Show that the closed ball $\overline{B}_r(f)$ in $C([a, b])$ is not equicontinuous.
8.19	Show that if $F \subset C(\Omega, \Lambda)$ is equicontinuous, then $\overline{F}$ is also equicontinuous.
8.20	Let $g \in \mathcal{L}^1([0,1])$ and $F$ denote the set of all functions $f \in C([0,1])$ such
that $f(0) = 0$, $f$ is absolutely continuous, and $|f'| \le |g|$ $\lambda$-ae. Show that
$F$ is compact.
8.21	Let $\mathcal{N}^*$ denote the one-point compactification of the space $(\mathcal{N}, \mathcal{T}_d)$, where
$\mathcal{N}$ is the set of positive integers and $\mathcal{T}_d$ is the discrete topology. Prove
that a closed set $D \subset C(\mathcal{N}^*)$ is compact if and only if it satisfies the
following two conditions: (1) $\sup\{\, |f(\omega)| : f \in D \,\} < \infty$ and (2) there is a
sequence $\{b_n\}_{n=1}^{\infty}$ such that $\lim_{n \to \infty} b_n = 0$ and $|f(n) - f(\omega)| \le b_n$ for all
$f \in D$ and $n \in \mathcal{N}$.
8.22	Let $f(x) = \sin x + \sin \sqrt{2}\,x$. For $t \in \mathcal{R}$ define $f_t(x) = f(x - t)$.
a) Show that $\{\, f_t : t \in \mathcal{R} \,\}$ is a compact subset of $C(\mathcal{R})$.
b) Find a convergent subsequence of $\{f_{2\pi n}\}_{n=1}^{\infty}$.
8.23	Refer to Exercise 8.22. Give an example of a bounded function $f$ such that
$\{\, f_t : t \in \mathcal{R} \,\}$ is not a compact subset of $C(\mathcal{R})$.
8.24	Refer to Example 8.5 on page 500. Suppose we drop the assumptions that
(8.6) holds and $\alpha A < 1$, but still assume that (8.7) holds and $\alpha B \le \beta$.
a)	Show that $T$ still maps $\overline{B}_\beta(y_0)$ into itself.
b)	Show that $T$ is a compact function.
Exercises 8.25-8.28 require some knowledge of the theory of functions of a complex
variable.† In these exercises, $\Omega$ denotes an open subset of $\mathbb{C}$ and $H(\Omega)$ the
set of functions that are analytic on $\Omega$.
8.25	Show that $H(\Omega)$ is a closed subset of $C(\Omega)$. Hint: Use the Cauchy integral
formula.
8.26	Let $F \subset H(\Omega)$. Show that $F$ is compact if and only if it is closed in $C(\Omega)$
and for each $z_0 \in \Omega$ there is an $r > 0$ such that $B_r(z_0) \subset \Omega$ and
$$\sup\{\, |f(z)| : f \in F,\ z \in B_r(z_0) \,\} < \infty.$$
8.27	Let $\Omega = B_1(0)$ and $U = \{\, f \in H(\Omega) : |f| \le 1 \,\}$. Prove that for each $z \in \Omega$,
there is a $g \in U$ such that $|g'(z)| = \sup\{\, |f'(z)| : f \in U \,\}$.
8.28	Let $\Omega$ and $U$ be as in Exercise 8.27 and $0 < r < 1$. Consider the function
$T\colon U \to U$ defined by $T(f)(z) = f(rz)$. Prove that $T$ is a compact function.
8.4	COMPACTNESS OF PRODUCT SPACES
The Cartesian product of two compact intervals of $\mathcal{R}$ is a compact rectangle
in $\mathcal{R}^2$. Indeed, it is not hard to prove that $\Gamma \times \Lambda$ is a compact
space whenever $\Gamma$ and $\Lambda$ are compact spaces, as you are asked to show in
Exercise 8.29.
In this section, we prove a striking generalization of the aforementioned
fact, namely, that the Cartesian product of any collection of compact spaces
is compact. As a corollary to the main result, we obtain simple sufficient
conditions for compactness of spaces with weak topologies.
To begin, we briefly review two prerequisite concepts.
•	If $\{A_\iota\}_{\iota \in I}$ is an indexed collection of sets, then the Cartesian product of
the collection, denoted $\times_{\iota \in I} A_\iota$, is the set of all functions $x$ on $I$ such
that $x(\iota) \in A_\iota$ for each $\iota \in I$. We call $x(\iota)$ the $\iota$th coordinate of $x$ and
often denote it by $x_\iota$.
•	Let $\{(\Omega_\iota, \mathcal{T}_\iota)\}_{\iota \in I}$ be an indexed collection of topological spaces and set
$\Omega = \times_{\iota \in I} \Omega_\iota$. The function $p_\iota\colon \Omega \to \Omega_\iota$ defined by $p_\iota(x) = x(\iota)$ is called
the $\iota$th coordinate projection on $\Omega$. The weak topology on $\Omega$ determined
by the family of coordinate projections $\{\, p_\iota : \iota \in I \,\}$ is called the product
topology. Thus, the product topology is the weakest topology for which
all coordinate projections are continuous.
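The two concepts just reviewed can be modeled directly for a finite index set. The sketch below uses only the Python standard library; the names `cartesian_product`, `projection`, and `cylinder`, and the two small factor sets, are hypothetical. Each point of the product is represented as a dictionary, that is, a function on the index set, and each factor is assumed to carry the discrete topology so that every subset of a factor is open.

```python
from itertools import product as iter_product

def cartesian_product(factors):
    """All functions x on the index set with x(i) in factors[i], each stored as a dict."""
    indices = sorted(factors)
    return [dict(zip(indices, choice))
            for choice in iter_product(*(factors[i] for i in indices))]

def projection(i):
    """The i-th coordinate projection p_i(x) = x(i)."""
    return lambda x: x[i]

factors = {"a": [0, 1], "b": ["u", "v", "w"]}    # hypothetical index set {a, b}
points = cartesian_product(factors)
p_b = projection("b")

def cylinder(i, W):
    """Preimage of W under p_i; a subbasic open set of the product when W is open."""
    return [x for x in points if projection(i)(x) in W]

image_of_p_b = {p_b(x) for x in points}
```

Finite intersections of sets of the form `cylinder(i, W)` are the basic open sets of the product topology, which is the description used in the proof of Tychonoff's theorem.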
Now we present the main result of this section, known as Tychonoff’s
theorem.
† See, for example, L. Ahlfors, Complex Analysis (New York: McGraw-Hill, 1979).
THEOREM 8.6 Tychonoff’s Theorem
The Cartesian product of any family of compact spaces is compact. That
is, if $\{(\Omega_\iota, \mathcal{T}_\iota)\}_{\iota \in I}$ is an indexed collection of compact topological spaces, then
the Cartesian product $\Omega = \times_{\iota \in I} \Omega_\iota$ is compact with respect to the product
topology.
PROOF: By Theorem 7.10 on page 471 it suffices to show that any collection
of closed subsets of $\Omega$ having the finite intersection property has
a nonempty intersection. So, let $\mathcal{C}$ be a collection of closed subsets of $\Omega$
having the finite intersection property. We need to prove that
$$\bigcap_{F \in \mathcal{C}} F \ne \emptyset. \tag{8.14}$$
Let $\mathfrak{A}$ denote the family of all $\mathcal{A} \subset \mathcal{P}(\Omega)$ such that $\mathcal{A}$ has the finite
intersection property and $\mathcal{C} \subset \mathcal{A}$. We will use Zorn’s lemma (page 17)
to show that $\mathfrak{A}$ contains a maximal element with respect to the inclusion
ordering $\subset$.
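Suppose that $\mathfrak{C}$ is a nonempty chain in $\mathfrak{A}$. We claim that $\mathcal{U} = \bigcup_{\mathcal{A} \in \mathfrak{C}} \mathcal{A}$
is an upper bound for $\mathfrak{C}$. Clearly $\mathcal{A} \subset \mathcal{U}$ for all $\mathcal{A} \in \mathfrak{C}$. So, we need only
show that $\mathcal{U} \in \mathfrak{A}$. Because it is obvious that $\mathcal{C} \subset \mathcal{U}$, it remains to prove
that $\mathcal{U}$ has the finite intersection property.
Suppose $U_1, U_2, \ldots, U_n \in \mathcal{U}$. Then, for each $j$, $U_j \in \mathcal{A}_j$ for some
$\mathcal{A}_j \in \mathfrak{C}$. Since $\mathfrak{C}$ is a chain, there exists an $\mathcal{A} \in \mathfrak{C}$ such that $\mathcal{A}_j \subset \mathcal{A}$
for $j = 1, 2, \ldots, n$. It follows that $U_1, U_2, \ldots, U_n \in \mathcal{A}$ and, since $\mathcal{A}$ has
the finite intersection property, we conclude that $\bigcap_{j=1}^{n} U_j \ne \emptyset$. Thus, $\mathcal{U}$ is
an upper bound for $\mathfrak{C}$. Zorn’s lemma now implies that $\mathfrak{A}$ has a maximal
element, say, $\mathcal{A}^*$.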
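We claim that $\mathcal{A}^*$ has the following properties:
$$A_1, A_2, \ldots, A_n \in \mathcal{A}^* \ \Rightarrow\ \bigcap_{j=1}^{n} A_j \in \mathcal{A}^* \tag{8.15}$$
and
$$B \subset \Omega \text{ and } B \cap A \ne \emptyset \text{ for all } A \in \mathcal{A}^* \ \Rightarrow\ B \in \mathcal{A}^*. \tag{8.16}$$
To verify (8.15), let $A = \bigcap_{j=1}^{n} A_j$. We note that $\mathcal{A}^* \cup \{A\}$ has the finite
intersection property and that $\mathcal{C} \subset \mathcal{A}^* \cup \{A\}$. Because $\mathcal{A}^*$ is a maximal
element of $\mathfrak{A}$, we must have $\mathcal{A}^* \cup \{A\} = \mathcal{A}^*$. Hence, $A \in \mathcal{A}^*$.
To establish (8.16), we show that $\mathcal{A}^* \cup \{B\}$ has the finite intersection
property. It will then follow from the maximality of $\mathcal{A}^*$ that $B \in \mathcal{A}^*$. Let
$B_1, B_2, \ldots, B_m$ be distinct elements of $\mathcal{A}^* \cup \{B\}$. If $B_j \ne B$ for $j = 1, 2,
\ldots, m$, then $\bigcap_{j=1}^{m} B_j \ne \emptyset$ because $\mathcal{A}^*$ has the finite intersection property.
On the other hand, if $B_k = B$ for some $k$, then $\bigcap_{j \ne k} B_j \in \mathcal{A}^*$ by (8.15).
Thus,
$$\bigcap_{j=1}^{m} B_j = B \cap \Bigl( \bigcap_{j \ne k} B_j \Bigr) \ne \emptyset,$$
as required.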
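Let $\iota \in I$. For $\mathcal{E} \subset \mathcal{A}^*$, we have $p_\iota\bigl(\bigcap_{A \in \mathcal{E}} A\bigr) \subset \bigcap_{A \in \mathcal{E}} p_\iota(A)$. Hence,
the collection $\{\, \overline{p_\iota(A)} : A \in \mathcal{A}^* \,\}$ has the finite intersection property. From
the compactness of the space $\Omega_\iota$, we can conclude that $\bigcap_{A \in \mathcal{A}^*} \overline{p_\iota(A)} \ne \emptyset$.
It now follows from the axiom of choice (page 16) that there is an $x \in \Omega$
such that
$$p_\iota(x) \in \bigcap_{A \in \mathcal{A}^*} \overline{p_\iota(A)}$$
for each $\iota \in I$.
We will show that $x \in \bigcap_{F \in \mathcal{C}} F$ by proving that if $F \in \mathcal{C}$ and $W$ is an
open set containing $x$, then
$$W \cap F \ne \emptyset. \tag{8.17}$$
It will follow that $x \in \overline{F} = F$ for each $F \in \mathcal{C}$.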
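We recall that the product topology is determined by the neighborhood
basis consisting of finite intersections of sets of the form $p_\iota^{-1}(W_\iota)$, where
$W_\iota$ is an open subset of $\Omega_\iota$. Thus, to establish (8.17), it suffices to consider
the case where $W = \bigcap_{\iota \in I_0} p_\iota^{-1}(W_\iota)$ for some finite subset $I_0 \subset I$.
Let $\iota \in I_0$ and $A \in \mathcal{A}^*$. Because $p_\iota(x) \in \overline{p_\iota(A)}$ and $W_\iota$ is an open set
containing $p_\iota(x)$, it follows that $W_\iota \cap p_\iota(A) \ne \emptyset$. Thus, $p_\iota^{-1}(W_\iota) \cap A \ne \emptyset$ for
each $A \in \mathcal{A}^*$ and, hence, by (8.16), $p_\iota^{-1}(W_\iota) \in \mathcal{A}^*$. Therefore, since $F \in \mathcal{A}^*$
and $\mathcal{A}^*$ has the finite intersection property, $\bigl(\bigcap_{\iota \in I_0} p_\iota^{-1}(W_\iota)\bigr) \cap F \ne \emptyset$.	■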
Next we will apply Tychonoff’s theorem to the study of compactness in
the context of weak topologies. Specifically, we have the following corollary,
which will be useful later when we study weak topologies on linear spaces.
COROLLARY 8.2
Suppose that $\Omega$ has the weak topology determined by a family of functions
$\mathcal{F}$ and that the following conditions are satisfied:
a)	$f(\Omega)$ is compact for each $f \in \mathcal{F}$.
b)	If $x \ne y$, then $f(x) \ne f(y)$ for some $f \in \mathcal{F}$.
c)	If $\{f(x_\iota)\}_{\iota \in I}$ is a convergent net for each $f \in \mathcal{F}$, then there is an $x \in \Omega$
such that $\lim f(x_\iota) = f(x)$ for each $f \in \mathcal{F}$.
Then $\Omega$ is compact.
PROOF: Each $f \in \mathcal{F}$ maps $\Omega$ into a topological space $\Omega_f$. Condition (a)
asserts that $\Lambda_f = f(\Omega)$ is a compact subset of $\Omega_f$. It follows from Tychonoff’s
theorem that the product space $\Lambda = \times_{f \in \mathcal{F}} \Lambda_f$ is compact. Since
a closed subset of a compact space is compact (Theorem 7.12 on page 473),
we can establish the corollary by finding a homeomorphism from $\Omega$ onto a
closed subset of $\Lambda$.
Let $h\colon \Omega \to \Lambda$ be defined by $h(x)(f) = f(x)$. Condition (c) is equivalent
to the assertion that $h(\Omega)$ is a closed subset of $\Lambda$. That $h$ is continuous follows
from the definitions of product and weak topologies, Proposition 7.13
on page 444, and Theorem 7.1 on page 443. Condition (b) says that $h$ is
one-to-one.
It remains only to show that the inverse function $h^{-1}$ is continuous
on $h(\Omega)$. If $\{h(x_\iota)\}_{\iota \in I}$ is a net in $h(\Omega)$ converging to $h(x)$, then, by Proposition
7.13, $\{x_\iota\}_{\iota \in I}$ converges in $\Omega$ to $x$. It now follows from Theorem 7.1
that $h^{-1}$ is continuous.	■
Recall that if $A_\iota = A$ for each $\iota \in I$, where $A$ is some set, then
Cartesian products of the form $\times_{\iota \in I} A_\iota$ are usually denoted by $A^I$. Note
that $A^I$ is just the set of all functions from $I$ to $A$. In case $I$ is the set $\mathcal{N}$
of positive integers, $A^{\mathcal{N}}$ is the set of all sequences of elements of $A$. Often
a typical element of $A^{\mathcal{N}}$ is written in the form $x = (x_1, x_2, \ldots)$.
EXAMPLE 8.8 Illustrates Tychonoff’s Theorem
a)	Consider the space $\{0, 1\}$ endowed with the discrete topology and let $I$
be any index set. We write $2^I$ in place of $\{0, 1\}^I$. It follows from
Tychonoff’s theorem that $2^I$ is a compact space.
b)	If $[a, b]$ is a closed bounded interval, then it follows from the Heine-Borel
theorem and Tychonoff’s theorem that $[a, b]^I$ is compact for any index
set $I$.	□
EXERCISES 8.4
8.29 Give an alternative proof of Tychonoff’s theorem in case the index set has
only two elements. That is, without using Tychonoff’s theorem, show that
if $\Gamma$ and $\Lambda$ are compact, then so is $\Gamma \times \Lambda$ (in the product topology).
8.30 Let $(\Omega, \mathcal{T})$ be a compact Hausdorff space. Show that the topology $\mathcal{T}$ coincides
with the weak topology on $\Omega$ determined by the family of functions
$\mathcal{F} = C(\Omega)$.
8.31 Prove that the Cartesian product of a collection of Hausdorff spaces is a
Hausdorff space.
8.32 Let $\Omega$ be a compact Hausdorff space. Show that $\Omega$ is homeomorphic to a
closed subset of $[0,1]^I$ for some set $I$. Hint: Refer to Exercise 8.31.
8.33 A topological space $\Omega$ is said to be completely regular if it satisfies the
following two conditions:
•	$\Omega$ is a $T_1$-space.
•	Given a closed set $F$ and a point $x \in F^c$, there is a continuous function
$k\colon \Omega \to [0,1]$ such that $k(x) = 0$ and $k(F) = \{1\}$.
Let $\Omega$ be completely regular and set $\mathcal{F} = C(\Omega, [0,1])$. Define $h\colon \Omega \to [0,1]^{\mathcal{F}}$
by $h(x)(f) = f(x)$. Prove that $h$ is a homeomorphism of $\Omega$ onto $h(\Omega)$.
8.34 This exercise continues Exercise 8.33. Let $\beta(\Omega)$ denote the closure of $h(\Omega)$
in $[0,1]^{\mathcal{F}}$. By identifying $\Omega$ with $h(\Omega)$ via the map $h$ we may consider $\Omega$ a
dense subset of the compact Hausdorff space $\beta(\Omega)$. Thus, $\beta(\Omega)$ is a compactification
of $\Omega$, that is, a compact space containing $\Omega$ as a dense subspace.
The space $\beta(\Omega)$ is called the Stone-Čech compactification of $\Omega$.
a)	Prove that if $g\colon \Omega \to \mathcal{R}$ is continuous and bounded, then it has a continuous
extension to $\beta(\Omega)$.
b)	Does the one point compactification of Theorem 7.17 have the property
in part (a)?
Note: To appreciate the Stone-Čech compactification, it helps to consider
the continuous function $\sin(1/x)$ on the interval $(0, \infty)$.
8.35 This exercise continues Exercise 8.34. Show that $\beta(\Omega)$ is the largest compactification
of $\Omega$ in the following sense: If $\Lambda$ is a compact Hausdorff space
such that $\Omega \subset \Lambda$, $\overline{\Omega} = \Lambda$, and the topology of $\Omega$ is the same as its relative
topology inherited from $\Lambda$, then there is a continuous function $f\colon \beta(\Omega) \to \Lambda$
such that $f(x) = x$ for each $x \in \Omega$.
8.35 Show that if I is uncountable, then 2J is not metrizable.
8.3 f Prove that 2^ and the Cantor set are homeomorphic. Hint: Define h on 2^
by h((xi, x2, •••))=	2x„/3n.
8.38 Show that if Q is a compact metric space, then Q is homeomorphic to a
closed subset of [0, l]7^.
8.39 Find a continuous function from 2^ onto [0, l]7^.
8.40 This exercise involves the Cantor set.
a)	Let F be a nonempty closed subset of the Cantor set P. Show that there
is a continuous function r: P —♦ F such that r(t) = t for each t G F.
b)	Use part (a) and Exercises 8.37-8.39 to show that if Q is a compact
metric space, then there is a continuous function from P onto Q.
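As a numerical companion to the hint in Exercise 8.37, the following Python sketch evaluates a finite truncation of the map $h((x_1, x_2, \ldots)) = \sum_{n} 2x_n/3^n$. The function name `cantor_map` and the use of exact fractions are illustrative assumptions, not notation from the text, and a finite prefix only approximates the value of $h$ on an infinite sequence.

```python
from fractions import Fraction

def cantor_map(bits):
    """Partial sum of h(x) = sum_{n>=1} 2*x_n / 3**n for a finite 0/1 prefix."""
    if any(b not in (0, 1) for b in bits):
        raise ValueError("entries must be 0 or 1")
    # Each term has ternary digit 0 or 2, so the value lies in the Cantor set.
    return sum(Fraction(2 * b, 3 ** n) for n, b in enumerate(bits, start=1))

print(cantor_map([1, 0, 1]))  # 2/3 + 2/27
```

Because every ternary digit produced is 0 or 2, each partial sum is an endpoint of one of the intervals retained in the construction of the Cantor set.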
8.5 APPROXIMATION BY FUNCTIONS FROM A LATTICE
Recall that a real-valued function $g$ on an interval $J$ is piecewise linear
if there is a partition $a_0 < a_1 < a_2 < \cdots < a_n$ of $J$ such that on each
subinterval $(a_{j-1}, a_j)$ we have $g(t) = m_j t + b_j$ for some real numbers $m_j$
and $b_j$. Consider the following two problems of approximation of continuous
functions on the closed interval $[0,1]$.
Problem 1: Given $f \in C([0,1], \mathcal{R})$ and $\epsilon > 0$, find a continuous piecewise
linear function $g$ such that $|f(t) - g(t)| < \epsilon$ for each $t \in [0,1]$.
Problem 2: Given $f \in C([0,1], \mathcal{R})$ and $\epsilon > 0$, find a polynomial $p$ such that
$|f(t) - p(t)| < \epsilon$ for each $t \in [0,1]$.
Let us denote by $W$ the set of continuous piecewise linear functions
on $[0,1]$. And let us also denote by $P_r$ the set of polynomials with real
coefficients.† Then it is easy to see that Problems 1 and 2 can be solved if,
respectively, $f \in \overline{W}$ and $f \in \overline{P_r}$.
Motivated by Problem 1, we will prove in this section a general result
from which we obtain $\overline{W} = C([0,1], \mathcal{R})$ as a special case. In the next
section, we will prove a theorem that has $\overline{P_r} = C([0,1], \mathcal{R})$ as a special case.
It then follows that Problems 1 and 2 can be solved for each $f \in C([0,1], \mathcal{R})$.
The collection $W$ of continuous piecewise linear functions is a motivating
example for the following definition.
DEFINITION 8.4 Lattice of Functions
A collection $\mathcal{L}$ of real-valued functions on a set $\Omega$ is called a lattice if
it is closed under maximums and minimums. That is,
a) $f, g \in \mathcal{L}$ implies $f \vee g \in \mathcal{L}$.
b) $f, g \in \mathcal{L}$ implies $f \wedge g \in \mathcal{L}$.
EXAMPLE 8.9 Illustrates Definition 8.4
a)	The following are lattices of functions contained in $C([0,1], \mathcal{R})$:
(i)	$C([0,1], \mathcal{R})$ itself.
(ii)	The collection of nonnegative continuous functions on $[0,1]$.
(iii)	The collection $W$ of continuous piecewise linear functions on $[0,1]$.
b)	The collection $P_r$ of polynomials on $[0,1]$ with real coefficients is not a
lattice of functions.	□
† If we restrict a member of $P_r$ to any subset of $\mathcal{R}$, it is continuous thereon. For
convenience, we will abuse notation slightly and use $P_r$ to denote the collection of
polynomials with real coefficients considered as functions on any particular subset
of $\mathcal{R}$. Context will determine the appropriate subset.
The main result of this section, the Kakutani-Krein theorem, provides
a set of sufficient conditions for a lattice to be dense in $C(\Omega, \mathcal{R})$ when $\Omega$ is
a compact Hausdorff space. In order to prove that result, we first establish
the following theorem, which is important in its own right. You should
recall the notation $\|f\|_\Omega = \sup\{\, |f(x)| : x \in \Omega \,\}$.
THEOREM 8.7 Dini’s Theorem
Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{F} \subset C(\Omega, \mathcal{R})$ has the
following two properties:
a)	$f, g \in \mathcal{F}$ implies there is an $h \in \mathcal{F}$ such that $h \le f \wedge g$.
b)	The function $f_0$ defined by $f_0(x) = \inf\{\, f(x) : f \in \mathcal{F} \,\}$ is real-valued
and continuous.
Then given $\epsilon > 0$, there exists an $f \in \mathcal{F}$ such that $\|f - f_0\|_\Omega < \epsilon$.
PROOF: By the definition of $f_0$, for each $x \in \Omega$, there is a function $f_x \in \mathcal{F}$
such that $f_x(x) < f_0(x) + \epsilon$. Because $f_0$ is continuous, the sets
$$U_x = \{\, y : f_x(y) < f_0(y) + \epsilon \,\}, \qquad x \in \Omega,$$
constitute an open covering of $\Omega$. Hence, there are points $x_1, x_2, \ldots, x_n \in \Omega$
such that $\Omega = \bigcup_{j=1}^{n} U_{x_j}$.
Using (a), we can find an $f \in \mathcal{F}$ with $f \le f_{x_j}$ for $j = 1, 2, \ldots, n$.
Hence, for each $x \in \Omega$, we have
$$f_0(x) \le f(x) \le \min_{1 \le j \le n} f_{x_j}(x) < f_0(x) + \epsilon.$$
It follows at once that $\|f - f_0\|_\Omega < \epsilon$.
DEFINITION 8.5 Separation of Points
A collection $\mathcal{F}$ of functions on a set $\Omega$ is said to separate points of $\Omega$
if whenever $x$ and $y$ are distinct elements of $\Omega$, there is an $f \in \mathcal{F}$ such
that $f(x) \ne f(y)$.
EXAMPLE 8.10 Illustrates Definition 8.5
a)	If $\mathcal{F}$ contains a one-to-one function, then it separates points.
b)	$P_r$ separates the points of $[0,1]$ because it contains the identity function.
c)	$W$ separates the points of $[0,1]$ because it contains the identity function.
d)	Let $\mathcal{F}$ denote the polynomials on $[-1,1]$ containing only even powers,
that is, polynomials of the form $a_0 + a_1 x^2 + \cdots + a_n x^{2n}$. Then $\mathcal{F}$ does
not separate the points of $[-1,1]$ since $f(-1) = f(1)$ for all $f \in \mathcal{F}$.
e)	Suppose $\Omega$ is a topological space with the property that there is a
collection $\mathcal{F} \subset C(\Omega, \mathcal{R})$ separating the points of $\Omega$. Then $\Omega$ is a Hausdorff
space.
f)	The collection of functions $\{\sin x, \cos x\}$ separates points of $[0, 2\pi)$, but
it does not separate points of $[0, 2\pi]$.	□
THEOREM 8.8 Kakutani-Krein Theorem
Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{L} \subset C(\Omega, \mathcal{R})$ satisfies
the following conditions:
a) $\mathcal{L}$ is a lattice.
b) $\mathcal{L}$ separates points of $\Omega$.
c) $f \in \mathcal{L}$ and $c \in \mathcal{R}$ implies $cf \in \mathcal{L}$ and $f + c \in \mathcal{L}$.
Then $\overline{\mathcal{L}} = C(\Omega, \mathcal{R})$.
PROOF: For $g \in C(\Omega, \mathcal{R})$, let $\mathcal{L}_g = \{\, f \in \mathcal{L} : g \le f \,\}$. Because $\mathcal{L}$ is a
lattice, it follows that $\mathcal{L}_g$ satisfies condition (a) of Dini’s theorem. Therefore,
if we can show that
$$g(x) = \inf\{\, f(x) : f \in \mathcal{L}_g \,\} \tag{8.18}$$
for each $x \in \Omega$, then Dini’s theorem will imply the required result. We
will prove (8.18) by constructing for each $\epsilon > 0$, an $f_x \in \mathcal{L}_g$ such that
$f_x(x) = g(x) + \epsilon$.
To begin our construction, we show that for each pair of distinct points
$y, z \in \Omega$ and each pair of real numbers $a$ and $b$, there is an $h \in \mathcal{L}$ such that
$$h(y) = a \quad\text{and}\quad h(z) = b. \tag{8.19}$$
Using (b), we choose an $h_0 \in \mathcal{L}$ with $h_0(y) \ne h_0(z)$ and then, using (c), we
conclude that the function
$$h = (a - b)\,\frac{h_0 - h_0(z)}{h_0(y) - h_0(z)} + b$$
belongs to $\mathcal{L}$. It is easy to see that $h$ satisfies (8.19).
Next we consider the open set $O = \{\, y : g(y) < g(x) + \epsilon \,\}$. For each
$z \in O^c$, we can apply (8.19) to obtain an $h_z \in \mathcal{L}$ such that
$$h_z(z) = g(z) + \epsilon \quad\text{and}\quad h_z(x) = g(x) + \epsilon/2.$$
Let $V_z = \{\, y : h_z(y) > g(y) \,\}$. Then $\{\, V_z : z \in O^c \,\}$ is an open covering of
the compact set $O^c$. Therefore, there are points $z_1, z_2, \ldots, z_n \in O^c$ such
that $O^c \subset \bigcup_{j=1}^{n} V_{z_j}$.
Let
$$f_x = (g(x) + \epsilon) \vee h_{z_1} \vee h_{z_2} \vee \cdots \vee h_{z_n}.$$
It follows from (c) that $\mathcal{L}$ contains the constant function $g(x) + \epsilon$ and, hence,
by (a), we have $f_x \in \mathcal{L}$. If $y \in O^c$, then $y \in V_{z_j}$ for some $j \in \{1, 2, \ldots, n\}$
and, consequently, we have that $f_x(y) \ge h_{z_j}(y) > g(y)$. On the other hand,
if $y \in O$, then $f_x(y) \ge g(x) + \epsilon > g(y)$. It now follows that $f_x \in \mathcal{L}_g$. Finally,
we have that
$$f_x(x) = (g(x) + \epsilon) \vee (g(x) + \epsilon/2) \vee \cdots \vee (g(x) + \epsilon/2) = g(x) + \epsilon,$$
as required.
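The two-point interpolation step (8.19) of the preceding proof can be checked numerically. The Python sketch below builds $h = (a - b)(h_0 - h_0(z))/(h_0(y) - h_0(z)) + b$ from a function $h_0$ that separates $y$ and $z$. The name `two_point_interpolant` and the sample values are hypothetical, chosen only for illustration.

```python
def two_point_interpolant(h0, y, z, a, b):
    """Return h = (a - b) * (h0 - h0(z)) / (h0(y) - h0(z)) + b.

    h0 must satisfy h0(y) != h0(z); then h(y) == a and h(z) == b.
    Only scaling by constants and adding constants are applied to h0,
    which mirrors condition (c) of the Kakutani-Krein theorem.
    """
    denom = h0(y) - h0(z)
    if denom == 0:
        raise ValueError("h0 does not separate y and z")
    return lambda x: (a - b) * (h0(x) - h0(z)) / denom + b

# Illustration on [0, 1] with h0 the identity function.
h = two_point_interpolant(lambda t: t, y=0.25, z=0.75, a=3.0, b=-1.0)
print(h(0.25), h(0.75))
```

With these sample values the printed pair should be the prescribed values $a = 3$ and $b = -1$.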
EXAMPLE 8.11 Illustrates the Kakutani-Krein Theorem
It is easy to see that the collection $W$ of continuous piecewise linear
functions on $[0,1]$ satisfies the conditions of the Kakutani-Krein theorem.
Consequently, $\overline{W} = C([0,1], \mathcal{R})$. In other words, Problem 1 on page 514 can
be solved for each $f \in C([0,1], \mathcal{R})$.	□
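Problem 1 can also be explored numerically. The sketch below is not the argument of the text, which rests on the Kakutani-Krein theorem. It interpolates $f$ linearly at the nodes $j/n$ and doubles $n$ until a sampled estimate of $\sup |f - g|$ falls below $\epsilon$. The function names, the sampling grid, and the doubling strategy are assumptions made for illustration, and the sampled maximum is an estimate, not a proof of the bound.

```python
import math

def piecewise_linear(f, n):
    """Continuous piecewise linear interpolant of f on [0, 1] with nodes j/n."""
    values = [f(j / n) for j in range(n + 1)]
    def g(t):
        j = min(int(t * n), n - 1)   # index of the subinterval containing t
        s = t * n - j                # relative position in that subinterval
        return (1 - s) * values[j] + s * values[j + 1]
    return g

def sup_error(f, g, samples=2001):
    """Grid estimate of sup |f - g| on [0, 1]."""
    pts = (k / (samples - 1) for k in range(samples))
    return max(abs(f(t) - g(t)) for t in pts)

def approximate(f, eps, max_doublings=20):
    """Double the number of nodes until the sampled error is below eps."""
    n = 1
    for _ in range(max_doublings):
        g = piecewise_linear(f, n)
        if sup_error(f, g) < eps:
            return n, g
        n *= 2
    raise RuntimeError("tolerance not reached")

f = lambda t: math.sin(2 * math.pi * t)
n, g = approximate(f, 0.01)
print(n, sup_error(f, g))
```

For a twice continuously differentiable $f$, the standard interpolation bound $\max|f''|/(8n^2)$ explains why doubling $n$ terminates quickly here.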
The most important application of the Kakutani-Krein theorem comes
in the next section where it is used to prove the Stone-Weierstrass theorem.
EXERCISES 8.5
8.41	Show that Dini’s theorem fails if the assumption that $f_0$ is continuous is
dropped.
8.42	In Exercise 2.63 on page 72, we asked you to prove another version of Dini’s
theorem. Show that the theorem stated there is a special case of the Dini’s
theorem of this section (Theorem 8.7).
8.43	Verify that the collection $W$ of continuous piecewise linear functions on $[0,1]$
is a lattice.
8.44	Suppose that $\mathcal{L} \subset C(\Omega, \mathcal{R})$, where $\Omega$ is a compact Hausdorff space. Show
that if $\mathcal{L}$ is a lattice, then so is $\overline{\mathcal{L}}$.
8.45	Suppose that $\mathcal{L}$ is a linear subspace of $C(\Omega, \mathcal{R})$. Show that $\mathcal{L}$ is a lattice if
and only if $|f| \in \mathcal{L}$ whenever $f \in \mathcal{L}$.
8.46	Give an example showing that the Kakutani-Krein theorem fails if
conditions (b) and (c) are retained but (a) is dropped.
8.47	Give an example showing that the Kakutani-Krein theorem fails if
conditions (a) and (c) are retained but (b) is dropped.
8.48	Give an example showing that the Kakutani-Krein theorem fails if
conditions (a) and (b) are retained but (c) is dropped.
8.49	Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{L} \subset C(\Omega, \mathcal{R})$ satisfies
the following conditions:
•	$\mathcal{L}$ is closed.
•	$\mathcal{L}$ is a linear subspace of $C(\Omega, \mathcal{R})$.
•	$\mathcal{L}$ is a lattice.
•	$f \in \mathcal{L}$, $g \in C(\Omega, \mathcal{R})$, and $0 \le g \le f$ imply $g \in \mathcal{L}$.
Show that either $\mathcal{L} = C(\Omega, \mathcal{R})$ or there is a nonempty closed set $F \subset \Omega$
such that $\mathcal{L} \subset \{\, f \in C(\Omega, \mathcal{R}) : f(x) = 0 \text{ for each } x \in F \,\}$.
8.6 APPROXIMATION BY FUNCTIONS FROM AN ALGEBRA
In our study of measure theory, we found that the concept of an alge-
bra of functions is essential. Now we will see that this concept is also of
importance in the study of approximation by functions.
DEFINITION 8.6 Algebra of Functions
A collection $\mathcal{A}$ of real-valued or complex-valued functions on a set $\Omega$ is
called an algebra if it is closed under addition, scalar multiplication,
and multiplication. That is, if $f, g \in \mathcal{A}$ and $\alpha$ is a scalar, then
a) $f + g \in \mathcal{A}$.
b) $\alpha f \in \mathcal{A}$.
c) $f \cdot g \in \mathcal{A}$.
Theorem 4.3 (page 176) and Exercise 4.32 (page 182) show, respectively,
that the collection of real-valued and complex-valued functions
measurable with respect to a $\sigma$-algebra of subsets of a set $\Omega$ form algebras of
functions. In addition, Theorem 2.4 (page 66) shows that the collection
of real-valued continuous functions on a subset of $\mathcal{R}$ constitutes an
algebra of functions. It is this latter type of algebra, algebras of continuous
functions, that will be important to us in this section.
EXAMPLE 8.12 Illustrates Definition 8.6
a)	Let $\Omega$ be a topological space. Then the collection $C(\Omega, \mathcal{R})$ of real-valued
continuous functions on $\Omega$ is an algebra of functions.
b)	Let $\Omega$ be a topological space. Then the collection $C(\Omega)$ of complex-valued
continuous functions on $\Omega$ is an algebra of functions.
c)	Let $P_r$ denote the collection of polynomials with real coefficients viewed
as functions on the closed bounded interval $[a, b]$.† Clearly, $P_r$ is an
algebra in $C([a, b], \mathcal{R})$.
d)	Let $U_r$ denote the collection of trigonometric polynomials with real
coefficients viewed as functions on the closed bounded interval $[a, b]$. We
claim that $U_r$ is also an algebra in $C([a, b], \mathcal{R})$. To see this, recall that
a trigonometric polynomial $u$ with real coefficients is a function of the
form
$$u(t) = \sum_{j=0}^{n} (a_j \cos jt + b_j \sin jt), \tag{8.20}$$
where the $a_j$s and $b_j$s are real numbers. That $U_r$ is a linear subspace
of $C([a, b], \mathcal{R})$ is clear; that it is also closed under multiplication and,
hence, constitutes an algebra, follows from the trigonometric identities
$$2 \cos jt \cos kt = \cos(j + k)t + \cos(j - k)t,$$
$$2 \sin jt \sin kt = \cos(j - k)t - \cos(j + k)t,$$
and
$$2 \sin jt \cos kt = \sin(j + k)t + \sin(j - k)t.$$
e)	By considering functions of the form (8.20) where the $a_j$s and $b_j$s are
permitted to be complex numbers, we obtain the collection $U$ of complex
trigonometric polynomials. Rather than writing complex trigonometric
polynomials in the form (8.20), we will usually work with the equivalent
expression
$$u(t) = \sum_{j=-n}^{n} c_j e^{ijt}.$$
It is easy to check that, viewed as a subset of $C([a, b])$, $U$ is an algebra
of functions.
† See the footnote on page 514.
f)	Suppose $\Omega$ is a compact subset of $\mathbb{C}$. Let $P(\Omega)$ denote the collection
of functions in $C(\Omega)$ that are polynomials in $z$, that is, functions of
the form $p(z) = \sum_{j=0}^{n} a_j z^j$, where the $a_j$s are complex constants. It is
clear that $P(\Omega)$ is an algebra in $C(\Omega)$. Another algebra in $C(\Omega)$, which
we denote by $P^*(\Omega)$, consists of all polynomials in $z$ and $\bar{z}$, that is,
functions of the form
$$p(z, \bar{z}) = \sum_{j=0}^{n} \sum_{k=0}^{m} a_{jk} z^j \bar{z}^k,$$
where each $a_{jk} \in \mathbb{C}$. We will see later that the closure of $P^*(\Omega)$ is $C(\Omega)$.
g)	Refer to part (f). The case $\Omega = T$, where $T$ is the unit circle in the
complex plane, $\{\, z \in \mathbb{C} : |z| = 1 \,\}$, is of particular interest. It is not
hard to show that $P^*(T)$ consists of all functions of the form
$$q(z) = \sum_{j=-n}^{n} c_j z^j, \qquad z \in T,$$
where each $c_j \in \mathbb{C}$. There is a connection between $P^*(T)$ and the
collection $U$ of trigonometric polynomials, namely, $u \in U$ if and only if
$u(t) = q(e^{it})$ for some $q \in P^*(T)$.	□
Motivated by Problem 2 on page 514, we now take up the question of
when an algebra of functions $\mathcal{A} \subset C(\Omega, \mathcal{R})$ is dense in $C(\Omega, \mathcal{R})$. Later in
this section, we will consider the same question when $\mathcal{R}$ is replaced by $\mathbb{C}$.
It is a classical result due to Karl Weierstrass that every continuous
real-valued function on a closed bounded interval can be uniformly
approximated arbitrarily closely by polynomials. In the notation of the preceding
example, Weierstrass’s theorem states that $\overline{P_r} = C([a, b], \mathcal{R})$.
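One concrete way to see Weierstrass’s theorem at work on $[0,1]$ is through Bernstein polynomials, $B_n f(t) = \sum_{k=0}^{n} \binom{n}{k} f(k/n)\, t^k (1-t)^{n-k}$. This is a classical constructive route and is not the method used in this section, which obtains the result from the Stone-Weierstrass theorem. The Python sketch below is offered under that caveat; the function names and the test function $|t - 1/2|$ are illustrative assumptions.

```python
from math import comb

def bernstein(f, n):
    """Return the n-th Bernstein polynomial of f on [0, 1] as a callable."""
    weights = [comb(n, k) * f(k / n) for k in range(n + 1)]
    def p(t):
        return sum(w * t**k * (1 - t)**(n - k) for k, w in enumerate(weights))
    return p

def grid_sup_error(f, p, samples=201):
    """Grid estimate of sup |f - p| on [0, 1]."""
    pts = (k / (samples - 1) for k in range(samples))
    return max(abs(f(t) - p(t)) for t in pts)

f = lambda t: abs(t - 0.5)            # continuous but not differentiable at 1/2
err_20 = grid_sup_error(f, bernstein(f, 20))
err_200 = grid_sup_error(f, bernstein(f, 200))
print(err_20, err_200)
```

The sampled error should shrink as $n$ grows, although slowly: for a Lipschitz function such as this one the error of $B_n f$ is of order $n^{-1/2}$.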
Our next theorem, the Stone-Weierstrass theorem, is a far-reaching
generalization of the aforementioned result. The Stone-Weierstrass
theorem gives a set of sufficient conditions for an algebra of functions to be
dense in $C(\Omega, \mathcal{R})$ when $\Omega$ is a compact Hausdorff space. Its proof relies on
the following lemma whose verification was considered in Exercise 3.2 on
page 102.†
† Note that the lemma is actually a special case of the classical Weierstrass theorem
but can be proved using a rather straightforward argument.
LEMMA 8.1
For each $\epsilon > 0$, there is a polynomial $p$ such that $\bigl|\, |t| - p(t) \,\bigr| < \epsilon$ for all
$t \in [-1, 1]$.
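Lemma 8.1 can be illustrated with a standard recursive construction, which may or may not be the argument intended in Exercise 3.2: set $p_0 = 0$ and $p_{k+1}(t) = p_k(t) + (t^2 - p_k(t)^2)/2$. Each $p_k$ is a polynomial in $t$, and one can show that $0 \le |t| - p_k(t) < 2/(k+1)$ on $[-1,1]$. The Python sketch below evaluates $p_n(t)$ pointwise; the name `abs_approx` is hypothetical.

```python
def abs_approx(t, n):
    """Value at t of p_n, where p_0 = 0 and
    p_{k+1}(t) = p_k(t) + (t*t - p_k(t)**2) / 2.

    For t in [-1, 1] the values increase toward |t|.
    """
    p = 0.0
    for _ in range(n):
        p = p + (t * t - p * p) / 2
    return p

grid = [-1 + 2 * k / 1000 for k in range(1001)]
worst = max(abs(abs(t) - abs_approx(t, 200)) for t in grid)
print(worst)
```

The recursion is evaluated numerically here; expanding it symbolically would give the polynomial $p_n$ itself, whose degree doubles at each step.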
THEOREM 8.9 Stone-Weierstrass Theorem
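Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{A} \subset C(\Omega, \mathcal{R})$ satisfies
the following conditions:
a)	$\mathcal{A}$ is an algebra.
b)	$\mathcal{A}$ separates points of $\Omega$.
c)	$1 \in \mathcal{A}$.
Then $\overline{\mathcal{A}} = C(\Omega, \mathcal{R})$.
PROOF: We leave it to the reader as an exercise to show that because $\mathcal{A}$ is
an algebra of functions, so is $\overline{\mathcal{A}}$. If we can prove that $\overline{\mathcal{A}}$ is also a lattice,
then the verification will be complete on account of the Kakutani-Krein
theorem (page 516).
We first note that $f \vee g = (f + g + |f - g|)/2$ and $f \wedge g = (f + g - |f - g|)/2$.
Thus, to prove $\overline{\mathcal{A}}$ is a lattice, it suffices to show that
$$f \in \overline{\mathcal{A}} \;\Rightarrow\; |f| \in \overline{\mathcal{A}}. \tag{8.21}$$
If $f = 0$, then (8.21) is trivial. If $f \ne 0$, let $g = f/\|f\|_\Omega$ and observe that
$g \in \overline{\mathcal{A}}$. Given $\epsilon > 0$, we can apply Lemma 8.1 to obtain a polynomial $p$
such that $\bigl|\, |t| - p(t) \,\bigr| < \epsilon$ for all $t \in [-1, 1]$. Because the range of $g$ is contained
in $[-1, 1]$, it follows that
$$\bigl\|\, |g| - p \circ g \,\bigr\|_\Omega < \epsilon.$$
And because $p \circ g$ is a polynomial in powers of $g$ and $\overline{\mathcal{A}}$ is an algebra
containing the constant functions, we conclude that $p \circ g \in \overline{\mathcal{A}}$. Thus,
$|g|$ belongs to the closure of $\overline{\mathcal{A}}$, which is $\overline{\mathcal{A}}$ itself. Finally, because $f$ is a scalar
multiple of $g$ and $\overline{\mathcal{A}}$ is an algebra, it follows that $|f| \in \overline{\mathcal{A}}$.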
EXAMPLE 8.13 Illustrates the Stone-Weierstrass Theorem
Suppose $\Omega$ is a compact subset of $\mathcal{R}^n$. Let $P_r^n$ denote the set of polynomials
in $n$ variables with real coefficients. It is clear that, as a collection of
functions on $\Omega$, $P_r^n$ satisfies the hypotheses of the Stone-Weierstrass theorem.
It follows that any $f \in C(\Omega, \mathcal{R})$ can be approximated arbitrarily closely by
polynomials in $n$ variables with real coefficients.	□
EXAMPLE 8.14 Illustrates the Stone-Weierstrass Theorem
The collection of real-valued trigonometric polynomials $U_r$ is an algebra of
functions in $C([0, \pi], \mathcal{R})$ satisfying the hypotheses of the Stone-Weierstrass
theorem. Thus, $\overline{U_r} = C([0, \pi], \mathcal{R})$. As an algebra in $C([0, 2\pi], \mathcal{R})$,
however, $U_r$ does not satisfy condition (b) of the Stone-Weierstrass theorem
because $u(0) = u(2\pi)$ for each $u \in U_r$. If $g \in C([0, 2\pi], \mathcal{R})$ is such that
$g(0) \ne g(2\pi)$, then $g$ cannot be uniformly approximated arbitrarily closely
by trigonometric polynomials.	□
EXAMPLE 8.15 Illustrates the Stone-Weierstrass Theorem
Suppose $f \in C(\mathcal{R}, \mathcal{R})$ is periodic with period $2\pi$. Then, see Example 7.28
on page 490, $f(t) = g(e^{it})$ for some function $g \in C(T, \mathcal{R})$. It is easy to
verify that the algebra $P^*(T)$, defined in Example 8.12(g), is such that
$P^*(T) \cap C(T, \mathcal{R})$ is an algebra in $C(T, \mathcal{R})$ satisfying the hypotheses of
the Stone-Weierstrass theorem. Consequently, for each $\epsilon > 0$, there is a
$p \in P^*(T) \cap C(T, \mathcal{R})$ such that $\|g - p\|_T < \epsilon$. It follows that
$$|f(t) - p(e^{it})| < \epsilon, \qquad t \in \mathcal{R}.$$
Thus, we have proved the following important fact: Every continuous
real-valued function on $\mathcal{R}$ having period $2\pi$ can be uniformly approximated
arbitrarily closely by a trigonometric polynomial.	□
Complex Version of the Stone-Weierstrass Theorem
If $C(\Omega, \mathcal{R})$ is replaced by $C(\Omega)$, the hypotheses of the Stone-Weierstrass
theorem must be augmented in order to obtain an analogous result.
THEOREM 8.10 Stone-Weierstrass Theorem (Complex Version)
Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{A} \subset C(\Omega)$ satisfies the
following conditions:
a)	$\mathcal{A}$ is an algebra.
b)	$\mathcal{A}$ separates points of $\Omega$.
c)	$1 \in \mathcal{A}$.
d)	$f \in \mathcal{A} \Rightarrow \bar{f} \in \mathcal{A}$.
Then $\overline{\mathcal{A}} = C(\Omega)$.
PROOF: Let $\Re\mathcal{A}$ denote the set of real parts of functions in $\mathcal{A}$. Because
$$\Re\mathcal{A} = \{\, (f + \bar{f})/2 : f \in \mathcal{A} \,\},$$
it follows from the hypotheses of the theorem that $\Re\mathcal{A} \subset \mathcal{A}$ and that $\Re\mathcal{A}$ is
an algebra in $C(\Omega, \mathcal{R})$.
We claim that $\Re\mathcal{A}$ separates the points of $\Omega$. Let $x$ and $y$ be distinct
elements of $\Omega$. By condition (b), there is an $f \in \mathcal{A}$ such that $f(x) \ne f(y)$.
We note that either $\Re f(x) \ne \Re f(y)$ or $\Re(if(x)) \ne \Re(if(y))$. Because $\mathcal{A}$ is
an algebra, we have $if \in \mathcal{A}$. It follows that $\Re\mathcal{A}$ separates the points of $\Omega$.
Because $\Re\mathcal{A}$ satisfies the hypotheses of Theorem 8.9, we conclude that
$\overline{\Re\mathcal{A}} = C(\Omega, \mathcal{R})$. So, given $f \in C(\Omega)$ and $\epsilon > 0$, we can find $g, h \in \Re\mathcal{A}$ such
that $\|\Re f - g\|_\Omega < \epsilon/2$ and $\|\Im f - h\|_\Omega < \epsilon/2$. Thus, $\|f - g - ih\|_\Omega < \epsilon$. As
$\Re\mathcal{A} \subset \mathcal{A}$, it follows that $g + ih \in \mathcal{A}$.
EXAMPLE 8.16 Illustrates Theorem 8.10
Refer to Example 8.12(f). Suppose $\Omega$ is a compact subset of $\mathbb{C}$.
a)	The algebra $P^*(\Omega)$ satisfies the hypotheses of the complex version of
the Stone-Weierstrass theorem. Hence, $P^*(\Omega)$ is dense in $C(\Omega)$.
b)	In general, the algebra $P(\Omega)$ is not dense in $C(\Omega)$. To see this, consider
the case $\Omega = T$ and assume $\overline{P(T)} = C(T)$. Then there is a sequence
$\{p_n\}_{n=1}^{\infty} \subset P(T)$ such that $\{p_n(e^{it})\}_{n=1}^{\infty}$ converges to $e^{-it}$ uniformly for
$t \in [0, 2\pi]$. Consequently,
$$1 = \frac{1}{2\pi} \int_0^{2\pi} e^{it} e^{-it}\, dt = \lim_{n \to \infty} \frac{1}{2\pi} \int_0^{2\pi} e^{it} p_n(e^{it})\, dt.$$
However, as $\int_0^{2\pi} e^{ikt}\, dt = 0$ for $k \in \mathcal{N}$, it follows that the right-hand side
of the previous equation equals 0, a contradiction. Thus, $P(T)$ is not
dense in $C(T)$. Noting that $P(T)$ satisfies conditions (a), (b), and (c)
of the complex version of the Stone-Weierstrass theorem but not
condition (d), we see that this example shows that the theorem fails if
condition (d) is dropped from the hypotheses.	□
EXERCISES 8.6
In this exercise set, we assume throughout that $\Omega$ is a compact Hausdorff space.
8.50	Let $\mathcal{A}$ be an algebra in $C(\Omega, \mathcal{R})$ or $C(\Omega)$. Prove that $\overline{\mathcal{A}}$ is also an algebra.
8.51	Verify the relations
$$f \vee g = (f + g + |f - g|)/2$$
and
$$f \wedge g = (f + g - |f - g|)/2$$
used in the proof of the Stone-Weierstrass theorem.
8.52	Suppose $\mathcal{A}$ is an algebra in $C(\Omega, \mathcal{R})$. Show that if $f \in \mathcal{A}$ and $\alpha$ is a positive
constant, then $|f|^\alpha \in \overline{\mathcal{A}}$.
8.53	Show that Theorem 8.9 remains valid if condition (c) is replaced by the
following: There is a $g \in \mathcal{A}$ such that $g(x) \ne 0$ for each $x \in \Omega$. Hint: See
Exercises 8.50 and 8.52.
8.54	Show that if $\mathcal{A} \subset C(\Omega, \mathcal{R})$ satisfies conditions (a) and (b) of the
Stone-Weierstrass theorem and $\overline{\mathcal{A}} \ne C(\Omega, \mathcal{R})$, then there is a point $x \in \Omega$ such
that $f(x) = 0$ for each $f \in \mathcal{A}$. Hint: See Exercise 8.53.
8.55	A linear subspace $\mathcal{I}$ of $C(\Omega, \mathcal{R})$ is called an ideal if $f \in \mathcal{I}$ and $g \in C(\Omega, \mathcal{R})$
imply $g \cdot f \in \mathcal{I}$. Suppose that $\mathcal{I}$ is a proper closed ideal of $C(\Omega, \mathcal{R})$. Show
that there is a nonempty closed subset $F \subset \Omega$ such that
$$\mathcal{I} = \{\, f \in C(\Omega, \mathcal{R}) : f(x) = 0 \text{ for each } x \in F \,\}.$$
Hint: By Exercise 8.54, the set $F = \bigcap_{f \in \mathcal{I}} f^{-1}(\{0\}) \ne \emptyset$. Show that if
$g \in C(\Omega, \mathcal{R})$ vanishes on $F$, then there is an $f_0 \in \mathcal{I}$ such that $0 \le f_0 \le 1$ and
$f_0(y) > 0$ for all $y \notin g^{-1}(\{0\})$. Deduce from Exercise 8.52 that $f_0^{1/n} \in \mathcal{I}$.
Show that $\|g - g f_0^{1/n}\|_\Omega \to 0$ as $n \to \infty$.
8.56	Let $D$ be a dense subalgebra of $C(\Omega)$. Show that $D$ must separate the
points of $\Omega$.
8.57	Give an example showing that the complex version of the Stone-Weierstrass
theorem fails if condition (c) is dropped.
8.58	Show that the complex version of the Stone-Weierstrass theorem remains
valid if condition (c) is replaced by the following: There is a $g \in \mathcal{A}$ such
that $g(x) \ne 0$ for each $x \in \Omega$. Hint: See Exercises 8.54 and 8.55.
8.59	Suppose that $\mathcal{A} \subset C(\Omega)$ satisfies conditions (a), (c), and (d) of the complex
version of the Stone-Weierstrass theorem. Also suppose that $g \in \mathcal{A}$ and
that $h$ is a complex-valued function continuous on the range of $g$. Show
that $h \circ g \in \overline{\mathcal{A}}$.
8.60	Give an example showing that the complex version of the Stone-Weierstrass
theorem fails if instead of assuming that $\mathcal{A}$ is an algebra, we require only
the weaker condition that $\mathcal{A}$ is a linear subspace of $C(\Omega)$.
8.61	A linear subspace $\mathcal{I}$ of $C(\Omega)$ is said to be an ideal if $f \in \mathcal{I}$ and $g \in C(\Omega)$
imply $g \cdot f \in \mathcal{I}$. Suppose that $\mathcal{I}$ is a proper closed ideal of $C(\Omega)$ satisfying
the condition that $f \in \mathcal{I}$ implies $\bar{f} \in \mathcal{I}$. Prove there is a nonempty closed
subset $F \subset \Omega$ such that
$$\mathcal{I} = \{\, f \in C(\Omega) : f(x) = 0 \text{ for each } x \in F \,\}.$$
Hint: See the hints for Exercise 8.55.
8.62	Suppose that $\Gamma$ and $\Lambda$ are compact Hausdorff spaces. Let $\mathcal{A}$ denote the
collection of functions $f$ on $\Gamma \times \Lambda$ of the form $f(x, y) = \sum_{j=1}^{n} g_j(x) h_j(y)$,
where $n \in \mathcal{N}$ and $g_j \in C(\Gamma)$ and $h_j \in C(\Lambda)$ for $j = 1, 2, \ldots, n$. Show
that $\mathcal{A}$ is dense in $C(\Gamma \times \Lambda)$. Generalize this result to arbitrary Cartesian
products.
8.63	Let $h$ be a strictly increasing continuous function on $[a, b]$. Suppose that
$f \in C([a, b])$ satisfies the condition $\int_a^b h^n(t) f(t)\, dt = 0$ for $n = 0, 1, \ldots$.
Prove that $f = 0$.
★8.64 Suppose that $f \in \mathcal{L}^1([0, \infty))$ and that $\int_0^\infty e^{-tx} f(x)\, dx = 0$ for each $t > 0$.
Show that $f = 0$ ae.
David Hilbert
(1862-1943)
David Hilbert was born in Königsberg, Germany, on January 23, 1862.
He entered the University of Königsberg in 1880 and received
his doctorate there in 1885. In 1886, Hilbert
qualified as an unpaid lecturer at the University
of Königsberg and acted in this capacity
until 1892, when he replaced Adolf Hurwitz as
assistant professor. In 1895, he obtained a chair
at the University of Göttingen where he remained until he retired in 1930.
Hilbert's first work was on the theory of invariants. His activity moved
from algebraic forms to algebraic number theory, foundations of geom-
etry, analysis (including calculus of variations and integral equations),
theoretical physics, and, finally, to the foundations of mathematics. The
invention of the space that bears Hilbert's name grew from his work in
the field of integral equations.
The treatise, Der Zahlbericht, was begun in 1893 in partnership with
Minkowski. But Minkowski abandoned the project, and Hilbert reshaped
the information of algebraic number theory into a master work of mathe-
matical literature—for 50 years, Der Zahlbericht was the sacred canon of
algebraic number theory. Hilbert also wrote Grundlagen der Geometrie,
a text published in 1899 that reached its ninth edition in 1962.
In 1925, Hilbert contracted pernicious anemia, and although he recovered
from this illness, he did not resume his full scientific activity. Hilbert
died in Göttingen, Germany, on February 14, 1943.
Hilbert Spaces and the
Classical Banach Spaces
The theory of normed spaces applies ideas from linear algebra, geometry,
and topology to problems of analysis. In this chapter we will study in
detail the most important examples of normed spaces, namely, Hilbert
spaces and the classical Banach spaces. These spaces, which are natural
generalizations of Euclidean n-space, $\mathcal{R}^n$, and unitary n-space, $\mathbb{C}^n$, are
ubiquitous in analysis. The examples we study in this chapter also serve
to motivate some general theorems that appear in Chapter 10.
Section 9.1 discusses preliminaries on normed spaces; Sections 9.2
and 9.3 consider Hilbert spaces and bases and duality of Hilbert spaces;
Section 9.4 examines $\mathcal{L}^p$ spaces; and Sections 9.5 and 9.6 investigate
nonnegative linear functionals on $C(\Omega)$ and the dual spaces of $C(\Omega)$ and $C_0(\Omega)$.
9.1 PRELIMINARIES ON NORMED SPACES
In this section, we study some elementary properties of normed spaces.
Specifically, we examine the relationship between continuity and linearity
for mappings of a normed space. We also present a criterion for a normed
space to be complete.
In calculus, the following properties of derivative and integral are used
so often that their fundamental importance is indisputable:
$$(\alpha f + \beta g)'(t) = \alpha f'(t) + \beta g'(t)$$
and
$$\int_a^b (\alpha f + \beta g)(t)\, dt = \alpha \int_a^b f(t)\, dt + \beta \int_a^b g(t)\, dt.$$
These formulas show that differentiation and integration are linear
mappings on appropriate spaces of functions.
DEFINITION 9.1 Linear Mappings, Operators, and Functionals
Let $\Omega$ and $\Lambda$ be linear spaces with the same scalar field. A function
$L\colon \Omega \to \Lambda$ is said to be a linear mapping if for all $x, y \in \Omega$ and all
scalars $\alpha$ the following two conditions are satisfied:
a)	$L(x + y) = L(x) + L(y)$.
b)	$L(\alpha x) = \alpha L(x)$.
Linear mappings are also referred to as linear operators or linear
transformations; and in cases where $\Lambda$ is the scalar field, linear
mappings are usually called linear functionals.
It follows easily from Definition 9.1 that a linear mapping $L$ takes the
linear combination $\sum_{j=1}^{n} \alpha_j x_j$ to the linear combination $\sum_{j=1}^{n} \alpha_j L(x_j)$;
that is, for each $n \in \mathcal{N}$,
$$L\Bigl(\sum_{j=1}^{n} \alpha_j x_j\Bigr) = \sum_{j=1}^{n} \alpha_j L(x_j)$$
for all $x_1, x_2, \ldots, x_n \in \Omega$ and scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$.
EXAMPLE 9.1 Illustrates Definition 9.1
a)	Let $C_1([0,1])$ denote the collection of complex-valued functions on $[0,1]$
having everywhere defined and continuous derivatives. Then the
function $D\colon C_1([0,1]) \to C([0,1])$ defined by $D(f) = f'$ is a linear mapping.
b)	The function $J\colon C([0,1]) \to C([0,1])$ defined by $J(f)(x) = \int_0^x f(t)\, dt$ is
a linear operator.
c)	The function $\ell\colon C([0,1]) \to \mathbb{C}$ defined by $\ell(f) = \int_0^1 f(t)\, dt$ is a linear
functional.
d)	Let $A$ be an $m \times n$ real matrix. Then the function $T\colon \mathcal{R}^m \to \mathcal{R}^n$ defined
by $T(x) = xA$ is a linear mapping. Here $xA$ denotes the product of $x$
with $A$ as matrices, where $x$ is considered a $1 \times m$ matrix. These
mappings are the classical linear transformations studied in linear algebra.
Note that if $m = n$, then $T$ is a linear operator.	□
The next proposition, whose proof is left to the reader as Exercise 9.1,
considers the relationship between continuity and linearity of mappings of
normed spaces. In the statement of the proposition, as often elsewhere in
the text, we use the symbol || || as a generic norm.
PROPOSITION 9.1
Let $\Omega$ and $\Lambda$ be normed spaces with the same scalar field and $L\colon \Omega \to \Lambda$ a
linear mapping. Then the following are equivalent:
a)	$L$ is continuous.
b)	$L$ is continuous at some point of $\Omega$.
c)	$L$ is continuous at 0.
d)	$\sup\{\, \|L(x)\| : \|x\| \le 1 \,\} < \infty$.
e)	There is a constant $c$ such that $\|L(x)\| \le c\|x\|$ for all $x \in \Omega$.
Part (d) of Proposition 9.1 motivates the definition of a bounded linear
mapping, as given in Definition 9.2.
DEFINITION 9.2 Bounded Linear Mapping
Suppose that $\Omega$ and $\Lambda$ are normed spaces with the same scalar field
and that $L\colon \Omega \to \Lambda$ is a linear mapping. If
$$|||L||| = \sup\{\, \|L(x)\| : \|x\| \le 1 \,\} < \infty,$$
then $L$ is said to be a bounded linear mapping.
Proposition 9.1 shows that a linear mapping is bounded if and only if
it is continuous. Note that if $L$ is a bounded linear mapping on $\Omega$, then we
have $\|L(x)\| \le |||L|||\, \|x\|$ for all $x \in \Omega$.
EXAMPLE 9.2 Illustrates Definition 9.2
a)	Let $\Omega$ be a normed space and $I\colon \Omega \to \Omega$ be the identity function, that
is, $I(x) = x$ for all $x \in \Omega$. Then $I$ is a bounded linear operator and we
have $|||I||| = 1$; $I$ is called the identity operator on $\Omega$.
b)	The linear operator $J$ defined in Example 9.1(b) is bounded and, in
fact, it is easy to show that $|||J||| = 1$.
c)	The linear functional $\ell$ defined in Example 9.1(c) is also bounded and,
again, it is easy to show that $|||\ell||| = 1$.
d)	The linear mapping $D$ defined in Example 9.1(a) is not bounded if
$C_1([0,1])$ is given the norm $\|\ \|_{[0,1]}$. To see this, consider the sequence of
functions defined by $s_n(x) = \sin n\pi x$. Clearly, $\|s_n\|_{[0,1]} = 1$. However,
as $\|D(s_n)\|_{[0,1]} = n\pi$, it follows that $|||D||| = \infty$.	□
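The computation in part (d) can be checked numerically. The Python sketch below estimates the sup norms of $s_n(x) = \sin n\pi x$ and of its derivative on a grid of $[0,1]$. The helper names and the grid size are assumptions made for illustration, and a sampled maximum only estimates a supremum.

```python
import math

def sup_norm(f, samples=20001):
    """Grid estimate of sup |f| on [0, 1]."""
    return max(abs(f(k / (samples - 1))) for k in range(samples))

def s_n(n):
    """The function s_n(x) = sin(n*pi*x)."""
    return lambda x: math.sin(n * math.pi * x)

def ds_n(n):
    """Its derivative D(s_n)(x) = n*pi*cos(n*pi*x)."""
    return lambda x: n * math.pi * math.cos(n * math.pi * x)

# The ratio ||D(s_n)|| / ||s_n|| grows like n*pi, so no single constant c
# can satisfy ||D(f)|| <= c ||f|| for every f.
ratios = {n: sup_norm(ds_n(n)) / sup_norm(s_n(n)) for n in (1, 2, 4, 8)}
print(ratios)
```

The growth of these ratios without bound is exactly what the statement $|||D||| = \infty$ expresses.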
When $\Omega$ and $\Lambda$ are normed spaces with the same scalar field, the
collection of all bounded linear operators from $\Omega$ to $\Lambda$ is denoted by $B(\Omega, \Lambda)$.
If we define addition and scalar multiplication in $B(\Omega, \Lambda)$ by
$$(L_1 + L_2)(x) = L_1(x) + L_2(x) \quad\text{and}\quad (\alpha L_1)(x) = \alpha L_1(x),$$
then $B(\Omega, \Lambda)$ becomes a linear space. Furthermore, as the reader is asked
to show in Exercise 9.3, $|||\ |||$ defines a norm on $B(\Omega, \Lambda)$.
From now on, unless specified otherwise, we will abbreviate the normed
space $(B(\Omega, \Lambda), |||\ |||)$ by $B(\Omega, \Lambda)$. When $\Omega = \Lambda$, we usually denote $B(\Omega, \Lambda)$
by $B(\Omega)$; and when $\Lambda$ is the scalar field, $B(\Omega, \Lambda)$ is denoted by $\Omega^*$ and the
norm $|||\ |||$ by $\|\ \|_*$. This latter space has a special name.
DEFINITION 9.3 Dual Space
Let $\Omega$ be a normed space. Then the space $(\Omega^*, \|\ \|_*)$ of bounded linear
functionals on $\Omega$ is called the dual space of $\Omega$.
The following proposition, whose proof is left to the reader as
Exercise 9.6, provides a sufficient condition for the completeness of $B(\Omega, \Lambda)$.
PROPOSITION 9.2
Let $\Omega$ and $\Lambda$ be normed spaces. If $\Lambda$ is complete, then so is $B(\Omega, \Lambda)$. In
particular, the dual space $(\Omega^*, \|\ \|_*)$ is complete.
We will discover that in many notable cases it is possible to find a
concrete description of the dual of a normed space. For example, we will
prove later that $\ell \in C([0,1])^*$ if and only if there is a unique complex Borel
measure $\mu$ on $[0,1]$ such that $\ell(f) = \int f\, d\mu$ for all $f \in C([0,1])$.
Banach Spaces
For normed spaces, completeness is a property of such consequence that
those possessing it are called Banach spaces, after the noted mathematician
Stefan Banach. (See the biography at the beginning of Chapter 10 for more
on Banach.)
DEFINITION 9.4 Banach Space
A complete normed space is called a Banach space.
EXAMPLE 9.	3 Illustrates Definition 9.4
a)	Exercises 7.59 and 7.60 on page 438 show that $\mathcal{R}^n$ and $\mathbb{C}^n$ are Banach
spaces.
b)	By Proposition 9.2, $B(\Omega, \Lambda)$ is a Banach space whenever $\Lambda$ is; in
particular, $\Omega^*$ is always a Banach space.
c)	If $\Omega$ is a compact topological space, then $C(\Omega)$ is a Banach space.
d)	If $\Omega$ is locally compact but not compact, then Exercise 7.173(c) on
page 491 shows that $C_b(\Omega)$ and $C_0(\Omega)$ are Banach spaces.
e)	If $(\Omega, \mathcal{A}, \mu)$ is a measure space, then $\mathcal{L}^\infty(\mu)$ is a Banach space.	□
Our next proposition characterizes completeness in normed spaces in
terms of infinite series. First let us recall some concepts from Chapter 7.
If $\{x_n\}_{n=1}^\infty$ is a sequence of elements in a normed space $\Omega$, then the
expression $\sum_{n=1}^\infty x_n$ is called an infinite series. The sequence $\{s_n\}_{n=1}^\infty$ of
elements of $\Omega$ defined by $s_n = \sum_{k=1}^n x_k$ is called the associated sequence
of partial sums. We say the infinite series converges if the sequence of
partial sums converges, that is, if $\lim_{n\to\infty} s_n$ exists.
Closely related to the concept of convergence of series is the concept
of absolute convergence of series. If $\{x_n\}_{n=1}^\infty$ is a sequence of elements in a
normed space $\Omega$, then the infinite series $\sum_{n=1}^\infty x_n$ is said to be absolutely
convergent or to converge absolutely if $\sum_{n=1}^\infty \|x_n\| < \infty$. In the
normed space $\mathcal{R}$, a series of nonnegative terms converges if and only if
it converges absolutely. On the other hand, the series $\sum_{n=1}^\infty (-1)^n/n$
converges but does not converge absolutely.
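The contrast between convergence and absolute convergence can be seen numerically. The following Python sketch is an illustration only; the function name `partial_sums` is our own and not part of the text. It compares the partial sums of $\sum (-1)^n/n$ with those of $\sum 1/n$, the series of absolute values. The former settle near $-\ln 2$, whereas the latter keep growing, roughly like $\ln N$.

```python
import math

def partial_sums(n_terms):
    """Return (s, a): the n_terms-th partial sums of sum (-1)^n / n and of sum 1/n."""
    s = 0.0  # partial sum of the series itself
    a = 0.0  # partial sum of the absolute values (harmonic series)
    for n in range(1, n_terms + 1):
        term = (-1) ** n / n
        s += term
        a += abs(term)
    return s, a

for n_terms in (10, 1_000, 100_000):
    s, a = partial_sums(n_terms)
    # s approaches -ln 2, while a keeps growing (roughly like ln n_terms).
    print(n_terms, round(s, 6), round(a, 3), round(math.log(n_terms), 3))
```

A finite computation cannot prove divergence, of course; the output merely makes the two behaviors plausible.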
We learned in calculus that every absolutely convergent series of real
numbers converges. Proposition 9.3 shows that this property characterizes
Banach spaces.
532 □ Chapter 9 Hilbert Spaces and the Classical Banach Spaces
PROPOSITION 9.3
A normed space $\Omega$ is a Banach space if and only if every absolutely
convergent series in $\Omega$ converges.
PROOF: Suppose that $\Omega$ is a Banach space and let $\sum_{n=1}^\infty x_n$ be an
absolutely convergent series. Since the sequence of partial sums $s_n = \sum_{k=1}^n x_k$
satisfies $\|s_n - s_m\| \le \sum_{k=m+1}^n \|x_k\|$ for $m < n$, it follows that $\{s_n\}_{n=1}^\infty$ is a
Cauchy sequence. Therefore, by completeness, $\lim_{n\to\infty} s_n$ exists.
Conversely, suppose that every absolutely convergent series in $\Omega$
converges. Let $\{y_n\}_{n=1}^\infty$ be a Cauchy sequence. Taking into account
Exercise 7.79 on page 446, to prove that $\{y_n\}_{n=1}^\infty$ is convergent it suffices to show
that it has a convergent subsequence. By repeatedly applying the Cauchy
property, we obtain a subsequence $\{y_{n_k}\}_{k=1}^\infty$ such that $\|y_{n_{k+1}} - y_{n_k}\| < 2^{-k}$.
Let $x_1 = y_{n_1}$ and $x_k = y_{n_k} - y_{n_{k-1}}$ for $k \ge 2$. Then $\sum_{k=1}^\infty x_k$ converges
absolutely. Because $y_{n_k} = \sum_{j=1}^k x_j$, it follows that $\lim_{k\to\infty} y_{n_k}$ exists. ∎
EXERCISES 9.1
9.1	Prove Proposition 9.1.
9.2 Let $L \in B(\Omega, \Lambda)$, where $\Omega$ and $\Lambda$ are normed spaces and $\|\ \|$ represents the
norm on both spaces. Prove that
$$|||L||| = \sup\{\, \|L(x)\| : \|x\| < 1 \,\} = \sup\{\, \|L(x)\| : \|x\| = 1 \,\}.$$
9.3 Suppose that $\Omega$ and $\Lambda$ are normed spaces. Prove that $|||\ |||$ is a norm on the
space $B(\Omega, \Lambda)$.
9.4 Let $g \in C([0,1])$. Consider the linear operator $L_g : C([0,1]) \to C([0,1])$
defined by $L_g(f) = gf$. Show that $L_g$ is continuous and find $|||L_g|||$.
9.5 Show that each of the following is a continuous linear functional on $C([0,1])$
and find its norm:
a) $\ell(f) = f(0)$
b) $\ell(f) = \int_0^1 f(t) \, dt$
c) $\ell(f) = \int_0^1 f(t)h(t) \, dt$, where $h \in \mathcal{L}^1([0,1])$
9.6	Prove Proposition 9.2.
9.7 Let $C_1([0,1])$ be defined as in Example 9.1(a) on page 528.
a) Show that $C_1([0,1])$ is not a closed subspace of $C([0,1])$.
b) Conclude that $C_1([0,1])$ equipped with the norm $\|\ \|_{[0,1]}$ is not a Banach
space.
9.2 Hilbert Spaces □ 533
9.8 Show that the space $C_1([0,1])$ defined in Example 9.1(a) becomes a Banach
space if it is equipped with the norm $\|f\| = |f(0)| + \|f'\|_{[0,1]}$.
9.9 Refer to Example 7.6 on page 423. Let $\Omega$ be a nonempty set. Show that
the spaces $\ell^1(\Omega)$, $\ell^2(\Omega)$, and $\ell^\infty(\Omega)$ are all Banach spaces.
9.10	Prove that there exist discontinuous linear functionals on any infinite di-
mensional normed space.
9.11	This exercise shows that linear mappings on Euclidean n-space or unitary
n-space are automatically continuous.
a) Show that all linear functionals on $\mathbb{C}^n$ or $\mathcal{R}^n$ are continuous.
b) Show that all linear mappings from $\mathbb{C}^n$ or $\mathcal{R}^n$ into a normed space are
continuous.
9.12 Let S be a linear subspace of the normed space $\Omega$. Prove that if $S^\circ \ne \emptyset$,
then $S = \Omega$.
9.13 Let $\Gamma$ and $\Lambda$ be normed spaces. Define
$$\|(x, y)\|_1 = \|x\| + \|y\|,$$
$$\|(x, y)\|_2 = (\|x\|^2 + \|y\|^2)^{1/2},$$
and
$$\|(x, y)\|_\infty = \max\{\|x\|, \|y\|\}.$$
a) Prove that each of the three expressions defines a norm on the Cartesian
product space $\Gamma \times \Lambda$.
b) Prove that all three norms are equivalent.
9.14 Let $\|\ \|_1$ be the norm on $C([0,1])$ defined by
$$\|f\|_1 = \int_0^1 |f(t)| \, dt.$$
a) Show that $\|f\|_1 \le \|f\|_{[0,1]}$.
b) Are $\|\ \|_1$ and $\|\ \|_{[0,1]}$ equivalent?
9.2 HILBERT SPACES
Perhaps because they are such natural generalizations of the standard
Euclidean space $(\mathcal{R}^n, \|\ \|_2)$, Hilbert spaces appear more frequently in
mathematics than other Banach spaces. In addition to being intrinsically
important, the theory of Hilbert spaces also merits an extensive discussion
because it serves as a model for the general theory of Banach spaces. In
this section, we begin our treatment of Hilbert space theory.
DEFINITION 9.5 Inner Product, Inner Product Space
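Let X be a linear space with scalar field F either $\mathcal{R}$ or $\mathbb{C}$. An inner
product on X is a function $\langle\ ,\ \rangle : X \times X \to F$ satisfying the following
conditions for all $x, y, z \in X$ and $\alpha, \beta \in F$:
a) $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$.
b) $\langle x, y \rangle = \langle y, x \rangle$ if $F = \mathcal{R}$ or $\langle x, y \rangle = \overline{\langle y, x \rangle}$ if $F = \mathbb{C}$.
c) $\langle x, x \rangle \ge 0$.
d) $\langle x, x \rangle = 0$ if and only if $x = 0$.
If $\langle\ ,\ \rangle$ is an inner product on X, then the pair $(X, \langle\ ,\ \rangle)$ is called an
inner product space.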
Note: When it is clear from context which inner product is being
considered, the inner product space $(X, \langle\ ,\ \rangle)$ will be indicated simply by X.
And, although we usually denote an inner product by $\langle\ ,\ \rangle$, it is sometimes
convenient to have slight variations of this notation such as $\langle\ ,\ \rangle_2$ or $[\ ,\ ]$.
EXAMPLE 9.4 Illustrates Definition 9.5
a) $\mathbb{C}^n$ is an inner product space if we define
$$\langle z, w \rangle = \sum_{k=1}^n z_k \overline{w_k},$$
where $z = (z_1, \ldots, z_n)$ and $w = (w_1, \ldots, w_n)$.
b) $\mathcal{R}^n$ is an inner product space if we define
$$\langle x, y \rangle = \sum_{k=1}^n x_k y_k,$$
where $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$. This inner product is the
classical “dot product” encountered in vector-calculus courses.
When we consider $\mathbb{C}^n$ or $\mathcal{R}^n$ as an inner product space, we will assume
that the inner product is as in this example unless we state otherwise. □
THEOREM 9.1
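Let X be an inner product space. Then, for all $x, y \in X$,
a) $\langle x + y, x + y \rangle = \langle x, x \rangle + 2\Re\langle x, y \rangle + \langle y, y \rangle$.
b) $|\langle x, y \rangle|^2 \le \langle x, x \rangle \langle y, y \rangle$. (Cauchy’s inequality)
Moreover, if $y \ne 0$, then equality holds in (b) if and only if $x = \alpha y$ for some
scalar $\alpha$.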
PROOF:
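a) From Definition 9.5, we have
$$\begin{aligned}
\langle x + y, x + y \rangle &= \langle x, x + y \rangle + \langle y, x + y \rangle \\
&= \langle x, x \rangle + \langle x, y \rangle + \overline{\langle x, y \rangle} + \langle y, y \rangle \\
&= \langle x, x \rangle + 2\Re\langle x, y \rangle + \langle y, y \rangle,
\end{aligned} \qquad (9.1)$$
as required.
b) If in (9.1) we replace y by $-ty$ where t is a real scalar, then we obtain
the polynomial
$$p(t) = \langle x - ty, x - ty \rangle = \gamma + \beta t + \alpha t^2,$$
where $\alpha = \langle y, y \rangle$, $\beta = -2\Re\langle x, y \rangle$, $\gamma = \langle x, x \rangle$. By Definition 9.5(c), we
have $p(t) \ge 0$. It follows that p(t) has at most one real root. Thus,
$\beta^2 - 4\alpha\gamma \le 0$, that is,
$$(\Re\langle x, y \rangle)^2 \le \langle x, x \rangle \langle y, y \rangle. \qquad (9.2)$$
The proof of (b) is now complete in the case of real scalars. If the
scalar field is $\mathbb{C}$, we choose $\theta \in [0, 2\pi)$ so that $e^{i\theta}\langle x, y \rangle = |\langle x, y \rangle|$ and
use Definition 9.5 and (9.2) to obtain
$$|\langle x, y \rangle|^2 = (\Re\langle e^{i\theta}x, y \rangle)^2 \le \langle e^{i\theta}x, e^{i\theta}x \rangle \langle y, y \rangle
= e^{i\theta}e^{-i\theta}\langle x, x \rangle \langle y, y \rangle = \langle x, x \rangle \langle y, y \rangle.$$
Therefore, (b) holds in any case.
Suppose now that the scalar field is $\mathcal{R}$, $y \ne 0$, and that equality holds
in (b). Then the polynomial p(t) has a root at $t = -\beta/(2\alpha)$. It follows from
Definition 9.5(d) that $x = -(\beta/(2\alpha))y$. If the scalar field is $\mathbb{C}$, we choose $\theta$
as in the preceding. Then equality in (b) yields $e^{i\theta}x = -(\beta/(2\alpha))y$ by an
argument similar to that used in the real case. ∎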
We have referred to the inequality in part (b) of Theorem 9.1 as
Cauchy’s inequality. But it is also known as the Schwarz, Cauchy-Schwarz,
Bunyakovski, or Cauchy-Bunyakovski-Schwarz (CBS) inequality.
EXAMPLE 9.5 Illustrates Definition 9.5 and Theorem 9.1
Suppose $z_1, z_2, \ldots, z_n, w_1, w_2, \ldots, w_n \in \mathbb{C}$. Then it follows from
Theorem 9.1 and Example 9.4 that
$$\Big| \sum_{k=1}^n z_k \overline{w_k} \Big|^2 \le \Big( \sum_{k=1}^n |z_k|^2 \Big) \Big( \sum_{k=1}^n |w_k|^2 \Big).$$
This is Cauchy’s inequality for finite sequences of complex numbers. □
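As a numerical check of Cauchy's inequality for finite sequences, consider the following Python sketch. It is an illustration under our own naming (`inner` and `cauchy_gap` are hypothetical helpers, not notation from the text). It evaluates the difference between the two sides of the inequality for one pair of vectors in $\mathbb{C}^3$ and then for a pair in which one vector is a scalar multiple of the other, the case of equality in Theorem 9.1.

```python
def inner(z, w):
    """Inner product on C^n from Example 9.4(a): sum of z_k * conj(w_k)."""
    return sum(zk * wk.conjugate() for zk, wk in zip(z, w))

def cauchy_gap(z, w):
    """Return <z,z><w,w> - |<z,w>|^2, which Theorem 9.1(b) says is nonnegative."""
    return (inner(z, z) * inner(w, w)).real - abs(inner(z, w)) ** 2

z = [1 + 2j, -1j, 3 + 0j]
w = [2 - 1j, 1 + 1j, -2j]
print(cauchy_gap(z, w))                 # strictly positive: z is not a multiple of w

alpha = 2 - 3j
z_parallel = [alpha * wk for wk in w]   # z = alpha * w, the equality case
print(abs(cauchy_gap(z_parallel, w)))   # zero up to rounding error
```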
EXAMPLE 9.6 Illustrates Definition 9.5 and Theorem 9.1
Refer to Example 7.6(b) on page 423. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space.
Recall that $\mathcal{L}^2(\mu)$ consists of all complex-valued $\mathcal{A}$-measurable functions f
satisfying $\int_\Omega |f|^2 \, d\mu < \infty$. Also recall that we identify functions that are
equal $\mu$-ae. We will show that
$$\langle f, g \rangle = \int_\Omega f \overline{g} \, d\mu \qquad (9.3)$$
defines an inner product on $\mathcal{L}^2(\mu)$.
Because of properties of Lebesgue integration that we established in
Chapter 4, we need only prove that
$$\int_\Omega |f \overline{g}| \, d\mu < \infty, \qquad f, g \in \mathcal{L}^2(\mu). \qquad (9.4)$$
But this follows immediately from the simple inequality $2|fg| \le |f|^2 + |g|^2$.
From now on, whenever we consider $\mathcal{L}^2(\mu)$ in the context of inner
product spaces, we will always use the inner product defined by (9.3). □
EXAMPLE 9.7 Illustrates Definition 9.5 and Theorem 9.1
Let $(\Omega, \mathcal{A}, P)$ be a probability space. By Example 9.6, the function $\langle\ ,\ \rangle$
defined by $\langle X, Y \rangle = \mathcal{E}(XY)$ is an inner product on the space of all
random variables having finite variance where, again, we identify two random
variables that are equal with probability one. Note that
$$\mathrm{Cov}(X, Y) = \mathcal{E}\big((X - \mathcal{E}(X))(Y - \mathcal{E}(Y))\big) = \langle X - \mathcal{E}(X), Y - \mathcal{E}(Y) \rangle$$
and, in particular, $\mathrm{Var}(X) = \langle X - \mathcal{E}(X), X - \mathcal{E}(X) \rangle$.
The correlation coefficient of two random variables X and Y having
finite variance is defined by
$$\rho_{X,Y} = \mathrm{Cov}(X, Y) / \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}.$$
This quantity is used extensively in probability, statistics, and stochastic
processes. From Cauchy’s inequality, we see that $-1 \le \rho_{X,Y} \le 1$. □
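For a concrete instance, the following Python sketch computes the correlation coefficient on a finite probability space. The four-point sample space, its probabilities, and the helper names `expect` and `correlation` are assumptions made for illustration. The code computes the covariance as the inner product of the centered random variables and prints a correlation coefficient for a generic pair and for a pair in which Y is an increasing affine function of X.

```python
import math

def expect(values, probs):
    """Expectation of a random variable on a finite probability space."""
    return sum(v * p for v, p in zip(values, probs))

def correlation(xs, ys, probs):
    """Correlation coefficient Cov(X, Y) / sqrt(Var(X) Var(Y))."""
    mx, my = expect(xs, probs), expect(ys, probs)
    xc = [x - mx for x in xs]            # X - E(X)
    yc = [y - my for y in ys]            # Y - E(Y)
    cov = expect([a * b for a, b in zip(xc, yc)], probs)
    var_x = expect([a * a for a in xc], probs)
    var_y = expect([b * b for b in yc], probs)
    return cov / math.sqrt(var_x * var_y)

probs = [0.25, 0.25, 0.25, 0.25]         # four equally likely outcomes
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 1.0, 4.0, 3.0]
print(correlation(X, Y, probs))                        # lies in [-1, 1]
print(correlation(X, [3 * x + 1 for x in X], probs))   # equality case of Cauchy's inequality
```

The second value is an extreme one because the centered variables are then scalar multiples of each other, which is the case of equality in Theorem 9.1(b).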
COROLLARY 9.1
Let X be an inner product space. Define $\|\ \| : X \to \mathcal{R}$ by
$$\|x\| = \langle x, x \rangle^{1/2}.$$
Then the following hold.
a) The function $\|\ \|$ is a norm on X.
b) We have
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$$
for all $x, y \in X$.
c)	The inner product is continuous with respect to the product topology
induced on X x X by the norm || ||.
PROOF:
a)	Definition 7.9 on page 422 gives the three conditions for being a norm.
It is easy to check that || || satisfies the first two conditions. To verify
the third condition, we use Theorem 9.1 to conclude that
$$\|x + y\|^2 = \|x\|^2 + 2\Re\langle x, y \rangle + \|y\|^2
\le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$
This gives the required result.
b) Applying Theorem 9.1 again, we obtain that
$$\|x + y\|^2 = \|x\|^2 + 2\Re\langle x, y \rangle + \|y\|^2$$
and, replacing y by $-y$ in the previous equation, we get
$$\|x - y\|^2 = \|x\|^2 - 2\Re\langle x, y \rangle + \|y\|^2.$$
Adding corresponding sides of the two preceding equalities yields (b).
c) We leave the proof of part (c) to the reader as Exercise 9.15. ∎
In the future, we will assume that every inner product space is also
a normed space, equipped with the norm defined in Corollary 9.1. If an
inner product space is complete, it is called a Hilbert space in honor of the
mathematician David Hilbert. (See the biography at the beginning of this
chapter for more about Hilbert.)
DEFINITION 9.6 Hilbert Space
An inner product space that is complete with respect to its norm is
called a Hilbert space.
We already know that $\mathcal{R}^n$ and $\mathbb{C}^n$ are Hilbert spaces. Later in this
chapter we will prove that all spaces of the form $\mathcal{L}^2(\mu)$ are Hilbert spaces.
But for now we will content ourselves with knowing that $\mathcal{L}^2(\mu)$-type spaces
are inner product spaces, as we showed in Example 9.6.
Nearest Points
The standard Euclidean plane $(\mathcal{R}^2, \|\ \|_2)$ serves to illustrate an essential
property of Hilbert spaces that we will prove in Theorem 9.2. We know that
the linear subspaces of $\mathcal{R}^2$ are {(0,0)}, $\mathcal{R}^2$, and lines passing through (0,0).
If L is a line through (0,0) and if $x \in \mathcal{R}^2$, then the point of intersection, $y_0$,
of L and the line through x perpendicular to L is the unique point on L
that is nearest to x. What is important for us is that $y_0$ is completely
determined by the conditions
$$y_0 \in L \quad\text{and}\quad \langle x - y_0, y \rangle = 0 \text{ for all } y \in L,$$
as seen in Fig. 9.1.
This property of the Euclidean plane serves to motivate the following
important theorem about Hilbert spaces.
THEOREM 9.2
Let $\mathcal{H}$ be a Hilbert space and K a closed linear subspace of $\mathcal{H}$. For each
$x \in \mathcal{H}$ there is a unique point $y_0 \in K$ such that
$$\|x - y_0\| = \rho(x, K),$$
where $\rho(x, K) = \inf\{\, \|x - y\| : y \in K \,\}$. Furthermore, the point $y_0$ is
determined by the conditions
$$y_0 \in K \quad\text{and}\quad \langle x - y_0, y \rangle = 0 \text{ for all } y \in K. \qquad (9.5)$$
In other words, (9.5) determines the unique nearest point of K to x.
PROOF: We establish the theorem when the scalar field is $\mathbb{C}$; the proof
for real scalars is obtained by a slight modification. To begin, we select a
sequence $\{y_n\}_{n=1}^\infty \subset K$ such that $\lim_{n\to\infty} \|x - y_n\| = \rho(x, K)$. We claim
that $\{y_n\}_{n=1}^\infty$ is a Cauchy sequence. Replacing x by $x - y_n$ and y by $x - y_m$ in
Corollary 9.1(b), we obtain
$$4\|x - (y_n + y_m)/2\|^2 + \|y_n - y_m\|^2 = 2\|x - y_n\|^2 + 2\|x - y_m\|^2.$$
Since K is a linear subspace, $(y_n + y_m)/2 \in K$. It follows that
$$\|y_n - y_m\|^2 \le 2\|x - y_n\|^2 + 2\|x - y_m\|^2 - 4\rho(x, K)^2. \qquad (9.6)$$
Because the right-hand side of (9.6) tends to 0 as $n, m \to \infty$, we conclude
that $\{y_n\}_{n=1}^\infty$ is a Cauchy sequence.
By completeness, $y_0 = \lim_{n\to\infty} y_n$ exists and, because K is closed, we
have $y_0 \in K$. Moreover,
$$\|x - y_0\| = \lim_{n\to\infty} \|x - y_n\| = \rho(x, K).$$
To verify (9.5), it suffices to consider the case where $y \in K \setminus \{0\}$.
Suppose that $y_0$ is a point of K nearest to x. By Theorem 9.1(a), we have
$$\|x - y_0 - \alpha y\|^2 = \|x - y_0\|^2 - 2\Re\big(\overline{\alpha}\langle x - y_0, y \rangle\big) + |\alpha|^2\|y\|^2$$
for all scalars $\alpha$. Choosing $\alpha = \langle x - y_0, y \rangle / \|y\|^2$, we obtain
$$\|x - y_0 - \alpha y\|^2 = \|x - y_0\|^2 - |\langle x - y_0, y \rangle|^2 / \|y\|^2.$$
Because K is a linear subspace, it follows that $y_0 + \alpha y \in K$. Hence,
$$\|x - y_0\|^2 = \rho(x, K)^2 \le \|x - (y_0 + \alpha y)\|^2
= \|x - y_0\|^2 - |\langle x - y_0, y \rangle|^2 / \|y\|^2$$
and, consequently, $\langle x - y_0, y \rangle = 0$.
Suppose, on the other hand, that $y_0$ is an element of K that
satisfies (9.5). Then, for every $y \in K$,
$$\begin{aligned}
\|x - y\|^2 &= \|x - y_0 + y_0 - y\|^2 \\
&= \|x - y_0\|^2 + 2\Re\langle x - y_0, y_0 - y \rangle + \|y_0 - y\|^2 \\
&= \|x - y_0\|^2 + \|y_0 - y\|^2 \ge \|x - y_0\|^2.
\end{aligned} \qquad (9.7)$$
Thus, $y_0$ is a point of K nearest to x.
It remains to prove that $y_0$ is unique. Let $y_1$ be a point of K nearest
to x. Then, by (9.7),
$$\|x - y_0\|^2 = \|x - y_1\|^2 = \|x - y_0\|^2 + \|y_0 - y_1\|^2$$
and, therefore, $\|y_0 - y_1\|^2 = 0$. It follows that $y_0 = y_1$. ∎
EXAMPLE 9.8 Illustrates Theorem 9.2
a) Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be n points in the plane. In statistics
and other fields, it is important to find the straight line that best fits
the n points in the sense of minimizing the sum of squared errors. That
is, the problem is to find real numbers $\alpha$ and $\beta$ that minimize
$$\sum_{j=1}^n \big(y_j - (\alpha + \beta x_j)\big)^2.$$
The resulting line is called the least-squares line or regression line.
We can use Theorem 9.2 to obtain the regression line as follows.
Let $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$, $w = (1, 1, \ldots, 1)$, and
$K = \{\, aw + bx : a, b \in \mathcal{R} \,\}$. Finding the regression line is equivalent to
obtaining the element $y_0$ of K nearest to y. Writing $y_0 = \alpha w + \beta x$, we
apply (9.5) to get the equations
$$\langle \alpha w + \beta x, w \rangle = \langle y, w \rangle \quad\text{and}\quad \langle \alpha w + \beta x, x \rangle = \langle y, x \rangle$$
or, equivalently,
$$n\alpha + \beta \sum_{j=1}^n x_j = \sum_{j=1}^n y_j \quad\text{and}\quad \alpha \sum_{j=1}^n x_j + \beta \sum_{j=1}^n x_j^2 = \sum_{j=1}^n x_j y_j.$$
We thus have two linear equations in the two unknowns $\alpha$ and $\beta$. The
solution, which we leave to the reader, gives the slope and y-intercept
of the regression line.
b) Let $\mu$ be the measure on $[-1, 1]$ defined by $\mu(E) = \lambda(E)/2$. The quantity
$$\|f - g\|_2 = \Big( \frac{1}{2} \int_{-1}^{1} |f(x) - g(x)|^2 \, dx \Big)^{1/2}$$
can be thought of as the average distance between f and g. We will
use Theorem 9.2 to find the function of the form $g(x) = \alpha x + \beta$ that
minimizes the average distance to $f(x) = x^2$. The function g must
satisfy
$$\int_{-1}^{1} (x^2 - \alpha x - \beta)(\gamma x + \delta) \, dx = 0$$
for all $\gamma, \delta \in \mathbb{C}$. A calculation shows that $2(\delta - \alpha\gamma)/3 - 2\beta\delta = 0$ for all $\gamma$
and $\delta$. It follows that $\alpha = 0$ and $\beta = 1/3$. Thus, the best approximation
to $x^2$ of the form $\alpha x + \beta$ in the sense of the $\mathcal{L}^2(\mu)$-norm is the constant
function $g(x) = 1/3$.
c) Refer to Example 9.7. Let $(\Omega, \mathcal{A}, P)$ be a probability space and X a
random variable having finite variance. We will use Theorem 9.2 to
determine the constant c that minimizes $\mathcal{E}((X - c)^2)$. Applying (9.5)
to the subspace generated by the random variable 1, we obtain the
equation $\mathcal{E}((X - c)1) = 0$. Thus $c = \mathcal{E}(X)$ minimizes $\mathcal{E}((X - c)^2)$ and
we see that the minimum value is Var(X). □
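The two normal equations of part (a) are easily solved by machine. The following Python sketch is one possible way to do so; the function name `regression_line` and the sample data are our own. It solves the equations by Cramer's rule and then checks the orthogonality conditions (9.5), namely that the residual $y - (\alpha w + \beta x)$ is orthogonal to both w and x.

```python
def regression_line(xs, ys):
    """Solve the two normal equations of Example 9.8(a) for (alpha, beta).

    n*alpha      + beta*sum(x)   = sum(y)
    alpha*sum(x) + beta*sum(x^2) = sum(x*y)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx              # zero exactly when all x_j coincide
    if det == 0:
        raise ValueError("x-values must not all be equal")
    beta = (n * sxy - sx * sy) / det     # slope
    alpha = (sy * sxx - sx * sxy) / det  # intercept
    return alpha, beta

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0, 8.0]
alpha, beta = regression_line(xs, ys)

# The residual y - (alpha*w + beta*x) should be orthogonal to w and to x, as in (9.5).
residual = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
print(alpha, beta)
print(sum(residual), sum(r * x for r, x in zip(residual, xs)))
```

The determinant $n\sum x_j^2 - (\sum x_j)^2$ is nonnegative by Cauchy's inequality applied to w and x, and it vanishes only when x is a scalar multiple of w, in which case K is one-dimensional and the line is not determined.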
A close reading of the proof of Theorem 9.2 reveals that more than just
that theorem has been established. We did not fully use the assumption
that $\mathcal{H}$ is complete; rather, we only needed the completeness of the linear
subspace K. The assumption that K is a linear subspace of $\mathcal{H}$ can also be
relaxed.
Recall that a subset S of a linear space is said to be a convex set if for
all $x, y \in S$ and $0 \le \alpha \le 1$, we have $\alpha x + (1 - \alpha)y \in S$; in words, whenever
S contains two points, it also contains the entire line segment connecting
the two points. If C is a closed convex subset, but not necessarily a linear
subspace, of a Hilbert space $\mathcal{H}$, then we can still obtain a unique nearest
point. However, (9.5) is in general no longer valid. (See Exercise 9.22.)
Theorem 9.2 enables us to associate with each closed linear subspace K
of a Hilbert space $\mathcal{H}$ the function $P_K : \mathcal{H} \to \mathcal{H}$, where $P_K(x)$ is the point
of K nearest to x. The properties of the function $P_K$ are explored in
Exercise 9.26 where, in particular, it is shown that it is a bounded linear
operator on $\mathcal{H}$ having range K. The operator $P_K$ is often referred to as
the orthogonal projection of $\mathcal{H}$ onto K.
Orthogonality
Recall that the ordinary dot product on $\mathcal{R}^2$ satisfies $\langle x, y \rangle = \|x\|\|y\| \cos\theta$,
where $\theta$ is the angle between x and y. Thus, two vectors in $\mathcal{R}^2$ are
perpendicular if and only if their dot product is 0. Similarly, the condition
$\langle x, y \rangle = 0$ captures the notion of perpendicularity of two elements of a
general inner product space X. The term used for “perpendicular” in the
context of inner product spaces is orthogonal.
DEFINITION 9.7 Orthogonality
Let X be an inner product space. Two elements x and y of X are
said to be orthogonal if $\langle x, y \rangle = 0$. For a subset S of X, we define
the orthogonal complement of S, denoted $S^\perp$, to be the set of all
elements of X that are orthogonal to every element of S, that is,
$$S^\perp = \{\, y \in X : \langle x, y \rangle = 0 \text{ for all } x \in S \,\}.$$
EXAMPLE 9.9 Illustrates Definition 9.7
a) The elements (1,0) and (0,1) of $\mathcal{R}^2$ are orthogonal and the orthogonal
complement of {(1,0)} is $\{\, (0, y) : y \in \mathcal{R} \,\}$.
b) Recall that two random variables X and Y having finite variance are said
to be uncorrelated if Cov(X, Y) = 0. We see from Example 9.7 that two
random variables are uncorrelated if and only if $X - \mathcal{E}(X)$ and $Y - \mathcal{E}(Y)$
are orthogonal. □
It is left to the reader as Exercise 9.23 to prove that $S^\perp$ is always a
closed linear subspace. Moreover, it can be shown that in Hilbert spaces,
$(S^\perp)^\perp = \overline{\operatorname{span} S}$, as the reader is asked to verify in Exercise 9.25. Here we
are using span S to represent the span of S, that is, the linear subspace of
all finite linear combinations of elements of S, and the overbar denotes closure.
Our next result is a version of Theorem 9.2 that emphasizes the role of
the orthogonal complement. It also serves as the prototype for an important
theorem in the general theory of normed spaces that appears in Chapter 10.
THEOREM 9.3
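Let K be a proper closed linear subspace of the Hilbert space $\mathcal{H}$ and $x \in K^c$.
Then there exists a unique $z_0 \in K^\perp$ such that $\|z_0\| = 1$ and
$$\begin{aligned}
\rho(x, K) &= \inf\{\, \|x - y\| : y \in K \,\} \\
&= \sup\{\, |\langle x, z \rangle| : z \in K^\perp \text{ and } \|z\| \le 1 \,\} = \langle x, z_0 \rangle.
\end{aligned} \qquad (9.8)$$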
PROOF: Let $y_0$ be the nearest point of K to x. If $z \in K^\perp$ is such that
$\|z\| \le 1$, then, by the definition of $K^\perp$ and Theorem 9.1, we have
$$|\langle x, z \rangle| = |\langle x - y_0, z \rangle| \le \|x - y_0\|\|z\| \le \inf\{\, \|x - y\| : y \in K \,\}. \qquad (9.9)$$
It follows that $\inf\{\, \|x - y\| : y \in K \,\} \ge \sup\{\, |\langle x, z \rangle| : z \in K^\perp \text{ and } \|z\| \le 1 \,\}$.
Now we let $z_0 = (x - y_0)/\|x - y_0\|$. By (9.5), $z_0 \in K^\perp$ and, furthermore,
$$\begin{aligned}
\inf\{\, \|x - y\| : y \in K \,\} &= \|x - y_0\| = \langle x - y_0, z_0 \rangle = \langle x, z_0 \rangle \\
&\le \sup\{\, |\langle x, z \rangle| : z \in K^\perp \text{ and } \|z\| \le 1 \,\}.
\end{aligned} \qquad (9.10)$$
The equations in (9.8) now follow from (9.9) and (9.10). The uniqueness
of $z_0$ is left to the reader as Exercise 9.28. ∎
As a visual aid to understanding Theorem 9.3, we have constructed a
simple illustration of the theorem in Fig. 9.2.
EXERCISES 9.2
9.15	Prove part (c) of Corollary 9.1.
9.16 Let $(X, \|\ \|)$ be a normed space with scalar field $\mathcal{R}$.
a) Suppose the norm satisfies the identity in Corollary 9.1(b) on page 537.
Show that there is an inner product on X such that $\|\ \|$ is the induced
norm.
b) Repeat part (a) in case the scalar field is $\mathbb{C}$.
9.17 A semi inner product on a linear space X is a function $\langle\ ,\ \rangle : X \times X \to F$
satisfying conditions (a), (b) and (c) of Definition 9.5 on page 534 and the
following weakening of condition (d): $\langle x, x \rangle = 0$ if $x = 0$. Show that (a)
and (b) of Theorem 9.1 remain valid for semi inner products.
9.18 Let X be a linear space with inner product $\langle\ ,\ \rangle$ and $L : X \to X$ a linear
operator. Show that $[x, y] = \langle L(x), L(y) \rangle$ defines a semi inner product on X
in the sense of Exercise 9.17.
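9.19 Let $\Omega$ be a nonempty set. Prove that $\ell^2(\Omega)$ is a Hilbert space with respect
to the inner product given by
$$\langle f, g \rangle = \int_\Omega f \overline{g} \, d\mu,$$
where $\mu$ is counting measure on $\Omega$.
9.20 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Show that if $f \in \mathcal{L}^2(\mu)$, then there is
a sequence of simple functions $\{r_n\}_{n=1}^\infty \subset \mathcal{L}^2(\mu)$ such that as $n \to \infty$,
$\|f - r_n\|_2 \to 0$, $\|r_n\|_2 \to \|f\|_2$, and $r_n \to f$ $\mu$-ae.
9.21 Let $\{\mathcal{H}_n\}_{n=1}^\infty$ be a sequence of Hilbert spaces and set
$$\mathcal{H} = \Big\{\, \{x_n\}_{n=1}^\infty : x_n \in \mathcal{H}_n \text{ for each } n \text{ and } \sum_{n=1}^\infty \|x_n\|^2 < \infty \,\Big\}.$$
Denote by $\langle\ ,\ \rangle$ the inner product for each $\mathcal{H}_n$. Show that $\mathcal{H}$ is a Hilbert
space with respect to the inner product defined by $[x, y] = \sum_{n=1}^\infty \langle x_n, y_n \rangle$.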
9.22 Let C be a closed convex subset of a Hilbert space $\mathcal{H}$. Show that for each
$x \in \mathcal{H}$ there is a unique point $y_0 \in C$ such that $\|x - y_0\| = \rho(x, C)$.
9.23 Let S be a subset of an inner product space X. Show that $S^\perp$ is a closed
linear subspace of X.
9.24 Verify the following properties of orthogonal complements:
a) $A \subset B \Rightarrow B^\perp \subset A^\perp$.
b) $A^\perp = (\operatorname{span} A)^\perp$.
c) $D^\perp \cap E^\perp = (D \cup E)^\perp$.
9.25 Prove that in Hilbert spaces, $(A^\perp)^\perp = \overline{\operatorname{span} A}$.
★9.26 Let K be a closed linear subspace of a Hilbert space $\mathcal{H}$ and $P_K$ the associated
orthogonal projection. Verify the following properties.
a) $P_K$ is linear.
b) $\|P_K(x)\| \le \|x\|$, so that $P_K$ is continuous.
c) $P_K \circ P_K = P_K$.
d) $P_K^{-1}(\{0\}) = K^\perp$.
e) The range of $P_K$ is K.
f) $P_{K^\perp} = I - P_K$, where I is the identity operator on $\mathcal{H}$. (See Exercise 9.25.)
g) Deduce from part (f) that each $x \in \mathcal{H}$ can be written uniquely in the
form $x = y + y^\perp$, where $y \in K$ and $y^\perp \in K^\perp$.
9.3 Bases and Duality in Hilbert Spaces о 545
9.27 Let $y_0$ be a nonzero element of a Hilbert space $\mathcal{H}$ and set $K = \operatorname{span}\{y_0\}$.
Find an explicit formula for $P_K$.
9.28 Verify the uniqueness of $z_0$ in Theorem 9.3.
9.3 BASES AND DUALITY IN HILBERT SPACES
As we know, the concepts of linear independence and basis play an essen-
tial role in the theory of finite dimensional linear spaces. In the infinite
dimensional case, one can use Zorn’s lemma to prove the existence of a
Hamel basis (a maximal linearly independent set B) and then show
that every element of the space can be written uniquely as a finite linear
combination of members of B.
Hamel bases are of little use in analysis, however, because they gener-
ally cannot be obtained by a formula or constructive process. Fortunately,
in Hilbert spaces, there is an analogue of Hamel basis that is much better
suited to the needs of analysis. It is this notion of basis to which we now
turn our attention.
DEFINITION 9.8 Orthogonal Set; Orthonormal Set and Basis
Let $(X, \langle\ ,\ \rangle)$ be an inner product space. A subset $S \subset X$ is said to be
an orthogonal set if every two distinct elements of S are orthogonal,
that is, $\langle x, y \rangle = 0$ for all $x, y \in S$ with $x \ne y$. An orthogonal set S is
said to be an orthonormal set if $\|x\| = 1$ for each $x \in S$. If S is an
orthonormal set and is contained in no strictly larger orthonormal set,
then S is called an orthonormal basis, or simply a basis.
EXAMPLE 9.10 Illustrates Definition 9.8
a) The set of elements
{(1,0,0,...,0), (0,1,0,...,0), ..., (0,0,0,...,1)}
is an orthonormal set in $\mathbb{C}^n$. Clearly, it is also a basis.
b) Let $\Omega$ be a nonempty set. For each $x \in \Omega$, let $\delta_x$ denote the function
that is 1 at x and 0 at all other points of $\Omega$. Then $\{\, \delta_x : x \in \Omega \,\}$
is an orthonormal set in $\ell^2(\Omega)$. We will see later that it is also an
orthonormal basis.
c) For each $n \in \mathbb{Z}$, define $e_n(x) = (2\pi)^{-1/2} e^{inx}$. It is easy to see that the
collection of functions $\{\, e_n : n \in \mathbb{Z} \,\}$ is an orthonormal set in $\mathcal{L}^2([0, 2\pi])$.
Later we will show that it is an orthonormal basis as well. □
Our next theorem provides some fundamental properties of orthonor-
mal sets.
THEOREM 9.4
Let X be an inner product space and $E = \{e_1, e_2, \ldots, e_n\}$ a finite
orthonormal subset of X. Then the following hold.
a) E is linearly independent.
b) $\big\| \sum_{j=1}^n \alpha_j e_j \big\|^2 = \sum_{j=1}^n |\alpha_j|^2$ for any choice of scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$.
c) For each $x \in X$, we have $\sum_{j=1}^n |\langle x, e_j \rangle|^2 \le \|x\|^2$.
d) $x = \sum_{j=1}^n \langle x, e_j \rangle e_j$ for each $x \in \operatorname{span} E$.
e) span E is a complete subspace of X, in particular, a closed subset of X.
f) For each $x \in X$, the element $y_0 = \sum_{j=1}^n \langle x, e_j \rangle e_j$ is the unique nearest
point of span E to x, that is, it is the unique member y of span E
satisfying $\|x - y\| = \rho(x, \operatorname{span} E)$.
PROOF: The proofs of (a), (b) and (d) are left to the reader as
Exercise 9.30. To prove (c), let $x \in X$ and $y = \sum_{j=1}^n \langle x, e_j \rangle e_j$. By part (b), we
have $\|y\|^2 = \sum_{j=1}^n |\langle x, e_j \rangle|^2$. Also,
$$\langle x, y \rangle = \Big\langle x, \sum_{j=1}^n \langle x, e_j \rangle e_j \Big\rangle = \sum_{j=1}^n \overline{\langle x, e_j \rangle} \langle x, e_j \rangle = \sum_{j=1}^n |\langle x, e_j \rangle|^2.$$
Applying Theorem 9.1(a) on page 534, we now obtain that
$$0 \le \|x - y\|^2 = \|x\|^2 - 2\Re\langle x, y \rangle + \|y\|^2 = \|x\|^2 - \sum_{j=1}^n |\langle x, e_j \rangle|^2,$$
from which (c) follows immediately.
To prove (e), let $\{y_m\}_{m=1}^\infty$ be a Cauchy sequence in span E. From
Cauchy’s inequality, we have
$$|\langle y_m, e_k \rangle - \langle y_\ell, e_k \rangle| \le \|y_m - y_\ell\|.$$
Thus, $\{\langle y_m, e_k \rangle\}_{m=1}^\infty$ is a Cauchy sequence for k = 1, 2, ..., n. Applying
part (d) and using the completeness of the scalars, we conclude that the
limit
$$y = \lim_{m\to\infty} y_m = \sum_{k=1}^n \Big( \lim_{m\to\infty} \langle y_m, e_k \rangle \Big) e_k$$
exists. Clearly, $y \in \operatorname{span} E$. We have now shown that span E is complete.
Since a complete subset of a metric space is closed, it follows that span E is
closed in X.
Next we establish (f). By Theorem 9.2 on page 538 and the defining
properties of inner product, it is enough to show that $\langle x - y_0, e_k \rangle = 0$ for
k = 1, 2, ..., n. Using the fact that E is an orthonormal set, we get
$$\langle x - y_0, e_k \rangle = \langle x, e_k \rangle - \sum_{j=1}^n \langle x, e_j \rangle \langle e_j, e_k \rangle = \langle x, e_k \rangle - \langle x, e_k \rangle = 0,$$
as required. ∎
As an immediate consequence of Theorem 9.4(c), we get the following
important result, known as Bessel’s inequality. Refer to Exercise 2.37 on
page 57 for the meaning of the summation that occurs in that inequality.
COROLLARY 9.2 Bessel's Inequality
Let E be an orthonormal subset of an inner product space X. Then
$$\sum_{e \in E} |\langle x, e \rangle|^2 \le \|x\|^2$$
for all $x \in X$.
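Theorem 9.4(f) and Bessel's inequality lend themselves to a direct numerical check in $\mathbb{C}^n$. In the following Python sketch the orthonormal pair E, the vector x, and the helper names `inner` and `project` are chosen by us for illustration. The code forms $y_0 = \sum_j \langle x, e_j \rangle e_j$, confirms that $x - y_0$ is orthogonal to each $e_j$, and compares $\sum_j |\langle x, e_j \rangle|^2$ with $\|x\|^2$.

```python
def inner(x, y):
    """Inner product on C^n from Example 9.4(a)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def project(x, E):
    """Nearest point to x in span E, for a finite orthonormal list E (Theorem 9.4(f))."""
    y0 = [0j] * len(x)
    for e in E:
        c = inner(x, e)                       # coefficient <x, e_j>
        y0 = [yk + c * ek for yk, ek in zip(y0, e)]
    return y0

s = 2 ** -0.5
E = [[s, s, 0, 0], [s, -s, 0, 0]]             # orthonormal pair in C^4
x = [1 + 1j, 2, 3j, -1]

y0 = project(x, E)
residual = [a - b for a, b in zip(x, y0)]
bessel_sum = sum(abs(inner(x, e)) ** 2 for e in E)

print([abs(inner(residual, e)) for e in E])   # x - y0 is orthogonal to each e_j
print(bessel_sum, inner(x, x).real)           # Bessel: first value <= second
```

Equality in Bessel's inequality would hold here only if x itself belonged to span E, which is not the case for the vector chosen.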
EXAMPLE 9.11 Illustrates Theorem 9.4
In the space $\mathcal{L}^2([0, 2\pi])$, consider the linear subspace
$$U_n = \operatorname{span}\{\, e_k : -n \le k \le n \,\},$$
where $e_k(x) = (2\pi)^{-1/2} e^{ikx}$. From what we noted in Example 9.10(c),
$\{\, e_k : -n \le k \le n \,\}$ is an orthonormal set. It is clear that $U_n$ is the space of complex
trigonometric polynomials of degree at most n.
Let $f \in \mathcal{L}^2([0, 2\pi])$. Then, from Theorem 9.4(f), the nearest member
of $U_n$ to f is given by
$$s_n = \sum_{|k| \le n} \langle f, e_k \rangle e_k.$$
The number
$$\hat{f}(k) = (2\pi)^{-1/2} \langle f, e_k \rangle = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-ikx} \, dx$$
is called the kth Fourier coefficient of f. Thus, the best approximation,
$$s_n(x) = \sum_{|k| \le n} \langle f, e_k \rangle e_k(x) = \sum_{k=-n}^{n} \hat{f}(k) e^{ikx},$$
is the nth partial sum of the Fourier series $\sum_{k=-\infty}^{\infty} \hat{f}(k) e^{ikx}$ associated
with the function f. □
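Fourier coefficients can be approximated by numerical integration. The following Python sketch is offered only as an illustration; the choice $f(x) = x$, the midpoint rule, and the names `fourier_coefficient` and `partial_sum` are our own. For this f, integration by parts gives $\hat{f}(0) = \pi$ and $\hat{f}(k) = i/k$ for $k \ne 0$, which provides values against which the output can be compared.

```python
import cmath
import math

def fourier_coefficient(f, k, n_points=4000):
    """Approximate (1/(2*pi)) * integral over [0, 2*pi] of f(x) e^{-ikx} dx
    by the midpoint rule with n_points subintervals."""
    h = 2 * math.pi / n_points
    total = 0j
    for j in range(n_points):
        x = (j + 0.5) * h                     # midpoint of the j-th subinterval
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

def partial_sum(f, n, x):
    """Evaluate s_n(x) = sum over |k| <= n of fhat(k) e^{ikx}."""
    return sum(fourier_coefficient(f, k) * cmath.exp(1j * k * x)
               for k in range(-n, n + 1))

def f(x):
    return x                                  # sample function on [0, 2*pi]

# Integration by parts gives fhat(0) = pi and fhat(k) = i/k for k != 0.
for k in (0, 1, 2):
    print(k, fourier_coefficient(f, k))
print(partial_sum(f, 10, math.pi))            # approximates f(pi) = pi
```

The partial sums are best approximations in the $\mathcal{L}^2$ sense; agreement at an individual point such as $x = \pi$ is a feature of this particular example and is not asserted by Theorem 9.4.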
More examples of orthonormal sets can be found using the procedure
described in the proof of the following theorem.
THEOREM 9.5
Let $\{x_m\}_{m=1}^\infty$ be a sequence of elements in an inner product space X and
assume $x_1 \ne 0$. Then there is a countable orthonormal set $\{y_1, y_2, \ldots\}$ and
a nondecreasing sequence of integers $\{k(m)\}_{m=1}^\infty$ such that
$$\operatorname{span}\{x_1, x_2, \ldots, x_m\} = \operatorname{span}\{y_1, y_2, \ldots, y_{k(m)}\}$$
for each $m \in \mathcal{N}$.
PROOF: We outline an argument by mathematical induction leaving the
details for Exercise 9.31.
Let $y_1 = x_1/\|x_1\|$. Proceeding inductively, suppose $y_1, y_2, \ldots, y_{k(m)}$
have been chosen so that $\{y_1, y_2, \ldots, y_{k(m)}\}$ is an orthonormal set and
$$\operatorname{span}\{x_1, x_2, \ldots, x_m\} = \operatorname{span}\{y_1, y_2, \ldots, y_{k(m)}\}.$$
Define
$$v = x_{m+1} - \sum_{j=1}^{k(m)} \langle x_{m+1}, y_j \rangle y_j.$$
Then we find that v is orthogonal to $y_j$ for j = 1, 2, ..., k(m).
If v = 0, then $x_{m+1} \in \operatorname{span}\{y_1, y_2, \ldots, y_{k(m)}\}$ and, in this case, we
let k(m + 1) = k(m). If $v \ne 0$, we let k(m + 1) = k(m) + 1 and define
$y_{k(m+1)} = v/\|v\|$; then $\{y_1, y_2, \ldots, y_{k(m)}, y_{k(m+1)}\}$ is an orthonormal set
such that $\operatorname{span}\{x_1, x_2, \ldots, x_{m+1}\} = \operatorname{span}\{y_1, y_2, \ldots, y_{k(m+1)}\}$. ∎
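The inductive construction in the preceding proof can be carried out by machine for vectors in $\mathbb{C}^n$. The Python sketch below follows the steps of the proof; the names `inner` and `gram_schmidt` and the tolerance `tol` are assumptions of ours, the tolerance replacing the exact test v = 0, which floating-point arithmetic cannot decide reliably.

```python
import math

def inner(x, y):
    """Inner product on C^n from Example 9.4(a)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def gram_schmidt(xs, tol=1e-12):
    """Orthonormalize xs as in the proof of Theorem 9.5.

    Returns (ys, ks), where ys is the orthonormal list and ks[m-1] = k(m),
    so that span{x_1..x_m} = span{y_1..y_k(m)}.
    """
    ys, ks = [], []
    for x in xs:
        v = list(x)
        for y in ys:
            c = inner(x, y)                       # <x_{m+1}, y_j>
            v = [vk - c * yk for vk, yk in zip(v, y)]
        norm_v = math.sqrt(inner(v, v).real)
        if norm_v > tol:                          # v != 0: one new orthonormal vector
            ys.append([vk / norm_v for vk in v])
        ks.append(len(ys))                        # k(m+1) = k(m) or k(m) + 1
    return ys, ks

xs = [[1, 1, 0], [2, 2, 0], [1, 0, 1j]]           # x_2 is a multiple of x_1
ys, ks = gram_schmidt(xs)
print(ks)                                         # nondecreasing sequence k(m)
print([[round(abs(inner(a, b)), 12) for b in ys] for a in ys])  # identity matrix
```

Because the second vector is a multiple of the first, the step v = 0 occurs once and k(2) = k(1), which is exactly the situation the sequence k(m) is designed to record.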
The following theorem provides several equivalent conditions for an or-
thonormal set in a Hilbert space to be a basis. It also makes clear why bases
in the sense of Definition 9.8 are appropriate analogues of Hamel bases.
Before stating the theorem, we need to discuss generalized sums
in normed spaces. Let $\{x_\iota\}_{\iota \in I}$ be an indexed collection of elements of a
normed space. Then we say that the sum $\sum_{\iota \in I} x_\iota$ converges if there are
only countably many nonzero terms and if for every enumeration of these
terms, the resulting series converges to the same element.
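THEOREM 9.6
Let $\mathcal{H}$ be a Hilbert space and E an orthonormal subset of $\mathcal{H}$. Then the
following are equivalent:
a) E is a basis.
b) $\overline{\operatorname{span} E} = \mathcal{H}$.
c) $\langle x, e \rangle = 0$ for each $e \in E$ implies $x = 0$.
d) For each $x \in \mathcal{H}$, we have $x = \sum_{e \in E} \langle x, e \rangle e$.
e) $\|x\|^2 = \sum_{e \in E} |\langle x, e \rangle|^2$ for each $x \in \mathcal{H}$.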
PROOF:
(a) ⇒ (b): If $\overline{\operatorname{span} E} \ne \mathcal{H}$, then by Theorem 9.2 on page 538, we can find
a nonzero element $z \in (\overline{\operatorname{span} E})^\perp$. Let $e_0 = z/\|z\|$. We note that $E \cup \{e_0\}$
is orthonormal and properly contains E. Thus, E is not a basis.
(b) ⇒ (c): Suppose that $\langle x, e \rangle = 0$ for each $e \in E$. It follows from
the properties of an inner product that $\langle x, y \rangle = 0$ for each $y \in \operatorname{span} E$.
Using the continuity of the inner product, we conclude that x is orthogonal
to every element of $\overline{\operatorname{span} E}$, which by assumption equals $\mathcal{H}$. Therefore,
$\langle x, x \rangle = 0$ and, so, $x = 0$.
(c) ⇒ (d): It follows from Bessel’s inequality that $\sum_{e \in E} |\langle x, e \rangle|^2 < \infty$.
Using that fact and Exercise 2.37(c) on page 57, we conclude that the set
$E_0 = \{\, e \in E : \langle x, e \rangle \ne 0 \,\}$ is either countably infinite or finite. We will
deal with the former case; the latter one is handled in a similar manner.
Let $\{e_j\}_{j=1}^\infty$ be an enumeration of $E_0$ and define $x_n = \sum_{j=1}^n \langle x, e_j \rangle e_j$.
If $n < m$, then Theorem 9.4(b) implies that $\|x_n - x_m\|^2 = \sum_{j=n+1}^m |\langle x, e_j \rangle|^2$.
It now follows that $\{x_n\}_{n=1}^\infty$ is Cauchy and, therefore, converges to some
$y \in \mathcal{H}$. We claim that $y = x$. For each $e \in E$, we have
$$\langle x - y, e \rangle = \langle x, e \rangle - \sum_{j=1}^\infty \langle x, e_j \rangle \langle e_j, e \rangle. \qquad (9.11)$$
If e is not in $E_0$, then $\langle x, e \rangle = 0$ and $\langle e_j, e \rangle = 0$ for each j. If $e = e_k$ for
some k, then the right-hand side of (9.11) reduces to $\langle x, e_k \rangle - \langle x, e_k \rangle$. Thus,
$x - y$ is orthogonal to each element of E. It follows from (c) that $y = x$.
(d) ⇒ (e): It follows from (d) and the continuity of the inner product that
$$\|x\|^2 = \Big\langle \sum_{e \in E} \langle x, e \rangle e, x \Big\rangle = \sum_{e \in E} |\langle x, e \rangle|^2,$$
as required.
(e) ⇒ (a): If E is not a basis, we can find an element $e_0 \in \mathcal{H}$ such that
$\|e_0\| = 1$ and $\langle e_0, e \rangle = 0$ for each $e \in E$. Thus,
$$\|e_0\|^2 = 1 \ne 0 = \sum_{e \in E} |\langle e_0, e \rangle|^2.$$
This completes the proof of the theorem. ∎
EXAMPLE 9.12 Illustrates Theorem 9.6
Assume as known that $\mathcal{L}^2([0, 2\pi])$ is complete, a fact that will be proved
in the next section. We will show that the orthonormal set $\{\, e_n : n \in \mathbb{Z} \,\}$,
introduced in Example 9.10(c), is a basis for $\mathcal{L}^2([0, 2\pi])$. By Theorem 9.6,
it suffices to show that if $f \in \mathcal{L}^2([0, 2\pi])$ is such that
$$\int_0^{2\pi} f(x) e^{-inx} \, dx = 0, \qquad n \in \mathbb{Z}, \qquad (9.12)$$
then f = 0 ae.
It follows immediately from (9.12) that if p is a trigonometric
polynomial, then $\int_0^{2\pi} f(x) \overline{p(x)} \, dx = 0$. As the reader is asked to show in
Exercise 9.34, there is a sequence $\{p_n\}_{n=1}^\infty$ of trigonometric polynomials such
that $\lim_{n\to\infty} \|f - p_n\|_2 = 0$. Using the continuity of the inner product, we
conclude that
$$\int_0^{2\pi} f(x) \overline{f(x)} \, dx = \lim_{n\to\infty} \int_0^{2\pi} f(x) \overline{p_n(x)} \, dx = 0.$$
Hence, f vanishes ae.
Because $\{\, e_n : n \in \mathbb{Z} \,\}$ is a basis for $\mathcal{L}^2([0, 2\pi])$, Theorem 9.6 implies
that each function $f \in \mathcal{L}^2([0, 2\pi])$ has the Fourier series expansion
$$f(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx},$$
where the convergence is in $\mathcal{L}^2([0, 2\pi])$. □
Unless we know that a Hilbert space possesses a basis, Theorem 9.6 is
of little consequence. That every Hilbert space does in fact have a basis is
part of our next theorem.
THEOREM 9.7
Let $\mathcal{H}$ be a Hilbert space. Then the following hold.
a) $\mathcal{H}$ has a basis.
b) If E is a basis for a closed linear subspace K of $\mathcal{H}$, then there exists a
basis for $\mathcal{H}$ containing E as a subset.
c) $\mathcal{H}$ has a countable basis if and only if $\mathcal{H}$ is separable.
PROOF: We prove (a) and leave (b) and (c) to the reader as Exercises 9.35
and 9.36. Let $\mathcal{O}$ denote the collection of orthonormal subsets of $\mathcal{H}$, ordered
by $\subset$. Suppose that $\mathcal{C}$ is a chain of $\mathcal{O}$. Then $\bigcup_{O \in \mathcal{C}} O \in \mathcal{O}$ is an upper bound
for $\mathcal{C}$. Thus, we may apply Zorn’s lemma (page 17) to obtain a maximal
element of $\mathcal{O}$. ∎
The Dual of a Hilbert Space
Let y be an element of the Hilbert space $\mathcal{H}$. The mapping defined by
$$\ell(x) = \langle x, y \rangle, \qquad x \in \mathcal{H}, \qquad (9.13)$$
is a linear functional and satisfies $|\ell(x)| \le \|x\|\|y\|$. Thus, $\ell$ belongs to
the dual space $\mathcal{H}^*$. It is an important property of Hilbert spaces that all
continuous linear functionals are of the form (9.13).
THEOREM 9.8
Let $\mathcal{H}$ be a Hilbert space. Then $\ell \in \mathcal{H}^*$ if and only if there is a $y \in \mathcal{H}$ such
that $\ell(x) = \langle x, y\rangle$ for each $x \in \mathcal{H}$. Furthermore, $\|\ell\|_* = \|y\|$.
PROOF: We have already observed that functionals of the form (9.13)
belong to $\mathcal{H}^*$. Conversely, suppose that $\ell \in \mathcal{H}^*$. If $\ell$ is identically 0,
then (9.13) holds with $y = 0$. Otherwise, $K = \ell^{-1}(\{0\})$ is a proper closed
linear subspace of $\mathcal{H}$ and, consequently, $K^\perp$ contains at least one nonzero
element $z$. For each $x \in \mathcal{H}$, we have $\ell(\ell(z)x - \ell(x)z) = 0$, so that
$\ell(z)x - \ell(x)z \in K$. Thus,
\[ 0 = \langle \ell(z)x - \ell(x)z, z\rangle = \ell(z)\langle x, z\rangle - \ell(x)\langle z, z\rangle. \]
It follows that $\ell(x) = \langle x, y\rangle$, where $y = \bigl(\overline{\ell(z)}/\langle z, z\rangle\bigr) z$.
To find the norm of the linear functional $\ell$, we first apply Cauchy’s
inequality to get
\[ \|\ell\|_* = \sup\{ |\langle x, y\rangle| : \|x\| \le 1 \} \le \|y\|. \]
Thus, if $y = 0$, then, trivially, $\|\ell\|_* = \|y\|$. If $y \ne 0$, we choose $w = y/\|y\|$
in order to obtain $\|y\| = \langle w, y\rangle \le \|\ell\|_*$.
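The construction in the proof can be tried out in the finite-dimensional Hilbert space of complex 3-vectors. The following is only a sketch under stated assumptions: the coefficient vector `c`, the test vector `x`, and the helper names `inner` and `ell` are illustrative choices, not part of the text, and the inner product is taken to be linear in its first argument and conjugate-linear in its second.

```python
import math

def inner(x, y):
    """<x, y> = sum x_i * conj(y_i): linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

# Hypothetical functional on C^3: ell(x) = sum c_i x_i.
c = [1 + 2j, -0.5j, 3.0 + 0j]

def ell(x):
    return sum(ci * xi for ci, xi in zip(c, x))

# Any nonzero multiple of conj(c) is orthogonal to the kernel K of ell.
z = [(2 + 1j) * ci.conjugate() for ci in c]

# y = (conj(ell(z)) / <z, z>) z, as in the proof.
scale = ell(z).conjugate() / inner(z, z)
y = [scale * zi for zi in z]

x = [0.7 - 1j, 2 + 0.25j, -1.5j]
print(ell(x), inner(x, y))          # the two values agree up to rounding

# The norm of ell is attained at w = y / ||y||.
norm_y = math.sqrt(inner(y, y).real)
w = [yi / norm_y for yi in y]
print(abs(ell(w)), norm_y)          # both equal ||y|| up to rounding
```

Note that the conjugate on $\ell(z)$ is what makes $\langle x, y\rangle$ reproduce $\ell(x)$ when the scalars are complex.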
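Remark: If $E$ is a basis for a Hilbert space $\mathcal{H}$, then we can write a formula
for the element $y$ given in Theorem 9.8 in terms of the basis elements.
Indeed, noting that $\ell(e) = \langle e, y\rangle$, we have by Theorem 9.6 that
\[ y = \sum_{e \in E} \langle y, e\rangle e = \sum_{e \in E} \overline{\ell(e)}\, e. \]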
Theorem 9.8 is a prototype for results appearing in subsequent sections
where we find explicit formulas for bounded linear functionals on various
Banach spaces.
EXERCISES 9.3
9.29	Verify the assertions in parts (b) and (c) of Example 9.10.
9.30	Prove (a), (b), and (d) of Theorem 9.4.
9.31	Provide the details for the proof of Theorem 9.5.
9.32	In this exercise, E denotes an orthonormal set and H a Hilbert space.
a)	Show that if $e$ and $e'$ are distinct members of $E$, then $\|e - e'\|^2 = 2$.
b)	Show that if the closed unit ball $\overline{B}_1(0)$ of $\mathcal{H}$ is compact, then $\mathcal{H}$ is finite
dimensional.
9.33	Let $[a, b]$ be a closed bounded interval.
a)	Prove that the continuous functions are dense in $\mathcal{L}^2([a, b])$.
b)	Formulate and prove a similar result for unbounded intervals.
9.34	Prove that the trigonometric polynomials are dense in $\mathcal{L}^2([0,2\pi])$. Hint:
Refer to Exercise 9.33.
9.35	Prove part (b) of Theorem 9.7.
9.36	Prove part (c) of Theorem 9.7.
9.37	Let E be an orthonormal set of a Hilbert space H. Establish the following.
a)	$= \sum_{e \in E} \langle x, e\rangle e$ for all $x \in \mathcal{H}$.
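b)	$\rho(x, \operatorname{span} E)^2 = \|x\|^2 - \sum_{e \in E} |\langle x, e\rangle|^2$ for all $x \in \mathcal{H}$.
c)	If $a$ is a scalar-valued function on $E$ such that $\sum_{e \in E} |a(e)|^2 < \infty$, then
the sum $\sum_{e \in E} a(e) e$ converges.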
9.38	Refer to Theorem 9.5.
a)	Apply the technique used in the proof of that theorem to the subset
of $\mathcal{L}^2([-1,1])$ consisting of $1, x, x^2, \ldots$ to obtain an orthonormal set of
polynomials $L_0, L_1, \ldots$ Show that
\[ L_n(x) = (n + 1/2)^{1/2} (2^n n!)^{-1} \frac{d^n (x^2 - 1)^n}{dx^n}. \]
The polynomials $(2^n n!)^{-1} d^n (x^2 - 1)^n / dx^n$ are called Legendre polynomials.
b)	Show that $\{L_0, L_1, \ldots\}$ is a basis for $\mathcal{L}^2([-1,1])$.
9.39	The Haar functions are functions on $[0,1]$ defined as follows. $H_0(t) = 1$,
\[ H_1(t) = \begin{cases} 1, & t \in [0, 1/2], \\ -1, & t \in (1/2, 1], \end{cases} \]
and
\[ H_j(t) = \begin{cases} 2^{n/2} H_1(2^n t - j + 2^n), & t \in [-1 + j/2^n,\, -1 + (j+1)/2^n]; \\ 0, & \text{otherwise}, \end{cases} \]
for $2^n \le j < 2^{n+1}$. Show that the Haar functions form a basis for $\mathcal{L}^2([0,1])$.
9.40 Let $n \in \mathcal{N}$. Define a linear functional $S$ on $\mathcal{L}^2([0,2\pi])$ by
\[ S(f) = \sum_{k=-n}^{n} \hat{f}(k). \]
Find a function $g \in \mathcal{L}^2([0,2\pi])$ such that $S(f) = \int_0^{2\pi} f(x) g(x)\,dx$.
In Exercises 9.41-9.44, we will need the concepts of an isometric function and an
isomorphism of normed spaces. Let $\Omega$ and $\Lambda$ be normed spaces and $L\colon \Omega \to \Lambda$.
Then $L$ is said to be isometric (or to be an isometry) if $\|L(x)\| = \|x\|$ for each
$x \in \Omega$. It is said to be an isomorphism if it is linear, one-to-one, onto, and
continuous and $L^{-1}$ is also continuous.
9.41	Let $\mathcal{H}$ be a separable Hilbert space. Show that there is an isometric
isomorphism from $\mathcal{H}$ onto $\ell^2(\mathcal{N})$.
9.42	Let $\mathcal{H}$ be a Hilbert space. Show that there is an isometric isomorphism
from $\mathcal{H}$ onto $\ell^2(S)$ for some set $S$.
9.43	Prove that the function $g \mapsto \langle \cdot, g\rangle$ defines an isometric linear mapping
of $\mathcal{L}^2(\mu)$ onto $\mathcal{L}^2(\mu)^*$.
9.44	Show that there is no isometric isomorphism from $\mathcal{L}^2(\mathcal{R})$ onto $\mathcal{L}^1(\mathcal{R})$.
9.4 $\mathcal{L}^p$-SPACES
In Example 7.6 on page 423, we introduced three normed spaces of measurable
functions: $\mathcal{L}^1(\mu)$, $\mathcal{L}^2(\mu)$, and $\mathcal{L}^\infty(\mu)$. Now we will generalize to $\mathcal{L}^p(\mu)$,
where $p$ is any positive extended real number. These spaces are called
$\mathcal{L}^p$-spaces.
We will show that for $p \ge 1$, $\mathcal{L}^p(\mu)$ is a Banach space and will describe
its dual space in the spirit of Theorem 9.8 (page 551). The $\mathcal{L}^p$-spaces, along
with spaces of the form $C(\Omega)$ where $\Omega$ is a compact Hausdorff space, are
sometimes referred to in the literature as the classical Banach spaces.
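DEFINITION 9.9 $\mathcal{L}^p$-Spaces
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, $f$ a complex-valued $\mathcal{A}$-measurable
function on $\Omega$, and $0 < p \le \infty$.
•	For $0 < p < \infty$, we define
\[ \sigma_p(f) = \int_\Omega |f|^p\,d\mu \]
and
\[ \|f\|_p = \left( \int_\Omega |f|^p\,d\mu \right)^{1/p}. \]
•	For $p = \infty$, we define
\[ \|f\|_\infty = \inf\{ M : |f| \le M \ \mu\text{-ae} \}. \]
The collection of all complex-valued $\mathcal{A}$-measurable functions $f$ such
that $\|f\|_p < \infty$ is denoted $\mathcal{L}^p(\Omega, \mathcal{A}, \mu)$ or, when no confusion can
arise, simply $\mathcal{L}^p(\mu)$. The spaces $\mathcal{L}^p(\Omega, \mathcal{A}, \mu)$, $0 < p \le \infty$, are called
$\mathcal{L}^p$-spaces.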
Note: Under certain conditions, special notation is used for $\mathcal{L}^p$-spaces:
•	When $\mu$ is Lebesgue measure restricted to some Lebesgue measurable
subset $\Omega$ of $\mathcal{R}^n$, we write $\mathcal{L}^p(\Omega)$ for $\mathcal{L}^p(\mu)$.
•	When $\mu$ is counting measure on some set $\Omega$, we write $\ell^p(\Omega)$ for $\mathcal{L}^p(\mu)$
and, in the special case $\Omega = \mathcal{N}$, we sometimes write simply $\ell^p$.
As mentioned earlier, we identify functions that are equal $\mu$-ae. Keeping
that in mind, we will see later that $\|\ \|_p$ is a norm on the linear
space $\mathcal{L}^p(\mu)$ when $1 \le p \le \infty$. When $0 < p < 1$, the space $\mathcal{L}^p(\mu)$ is
still a linear space, but $\|\ \|_p$ is no longer a norm. Rather, in this case,
$\mathcal{L}^p(\mu)$ is a metric space with metric given by $\rho_p(f, g) = \sigma_p(f - g)$. See
Exercises 9.53-9.55.
EXAMPLE 9.13 Illustrates Definition 9.9
a)	Let $[a, b]$ be a closed bounded interval of $\mathcal{R}$ and $0 < p < \infty$. A complex-valued
Lebesgue measurable function $f$ on $[a, b]$ is in $\mathcal{L}^p([a, b])$ if and
only if $\int_a^b |f(x)|^p\,dx < \infty$.
b)	Let $\mu$ be counting measure on $\{1,2\}$. Then the space of real-valued
functions in $\ell^p(\{1,2\})$ can be identified with $\mathcal{R}^2$. We have
\[ \|(x_1, x_2)\|_p = \begin{cases} (|x_1|^p + |x_2|^p)^{1/p}, & 0 < p < \infty; \\ \max\{|x_1|, |x_2|\}, & p = \infty. \end{cases} \]
Figure 9.3 shows the unit “circles” centered at $(0,0)$ in the metric space
$(\mathcal{R}^2, \rho_{0.5})$ and in the normed space $(\mathcal{R}^2, \|\ \|_p)$ for $p = 1$, 2, 3, and $\infty$.
c)	Refer to Example 5.10(c) on page 293. Let $(\Omega, \mathcal{A}, P)$ be a probability
space. The random variables with finite $n$th moments are precisely those
in $\mathcal{L}^n(P)$.
d)	Let $\mu$ be counting measure on $\mathcal{N}$ and $0 < p < \infty$. A sequence $\{a_n\}_{n=1}^\infty$
of complex numbers is in $\ell^p$ if and only if $\sum_{n=1}^\infty |a_n|^p < \infty$.	□
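Part (b) of the example is easy to experiment with numerically. The following sketch, whose function names `p_norm` and `sigma_p` are our own and not notation from the text, evaluates the quantities for counting measure on a finite set and shows why the triangle inequality is lost for $p = 0.5$, while the metric quantity $\sigma_p$ still behaves subadditively on this pair of vectors.

```python
import math

def p_norm(x, p):
    """||x||_p for a finite sequence x (counting measure); p may be math.inf."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def sigma_p(x, p):
    """sigma_p(x) = sum |x_i|^p, the quantity that defines the metric when 0 < p < 1."""
    return sum(abs(t) ** p for t in x)

e1, e2 = (1.0, 0.0), (0.0, 1.0)
s = (1.0, 1.0)  # e1 + e2

for p in (0.5, 1, 2, 3, math.inf):
    lhs = p_norm(s, p)
    rhs = p_norm(e1, p) + p_norm(e2, p)
    # For p = 0.5 the left side is 4 and the right side is 2,
    # so the triangle inequality fails; for p >= 1 it holds.
    print(p, lhs, rhs, lhs <= rhs + 1e-12)

# sigma_p remains subadditive for p = 0.5 on this pair.
print(sigma_p(s, 0.5), sigma_p(e1, 0.5) + sigma_p(e2, 0.5))
```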
Our next proposition, whose proof is left to the reader as Exercise 9.45,
provides some basic properties of $\mathcal{L}^p$-spaces.
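PROPOSITION 9.4
Let $p$ be a positive extended real number. Then the following hold.
a)	$\|\alpha f\|_p = |\alpha| \|f\|_p$ for all $f \in \mathcal{L}^p(\mu)$ and scalars $\alpha$.
b)	$\mathcal{L}^p(\mu)$ is a linear space.
c)	For each $f \in \mathcal{L}^p(\mu)$, there exists a sequence of simple functions $\{s_n\}_{n=1}^\infty$
in $\mathcal{L}^p(\mu)$ such that, as $n \to \infty$, $s_n \to f$ $\mu$-ae, $\|f - s_n\|_p \to 0$, and
$\int_\Omega |s_n|^p\,d\mu \to \int_\Omega |f|^p\,d\mu$.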
In Section 9.2, we used Cauchy’s inequality to prove that an inner
product $\langle\ ,\ \rangle$ induces a norm via $\|x\| = \sqrt{\langle x, x\rangle}$. Similarly, we will use
Hölder’s inequality, a generalization of Cauchy’s inequality, to show that
$\|\ \|_p$ is a norm when $p \ge 1$.
THEOREM 9.9 Hölder’s Inequality
Let $1 \le p \le \infty$ and let $q$ be such that $1/p + 1/q = 1$. Then for any two
$\mathcal{A}$-measurable functions $f$ and $g$, we have
\[ \int_\Omega |fg|\,d\mu \le \|f\|_p \|g\|_q. \tag{9.14} \]
Furthermore, if $1 < p < \infty$, then equality holds in (9.14) if and only if
there are constants $\alpha$ and $\beta$ not both zero such that $\alpha |f|^p = \beta |g|^q$.
PROOF: Without loss of generality we can assume that $\|f\|_p$ and $\|g\|_q$ are
finite and nonzero. Suppose that $1 < p < \infty$. By the concavity of the
natural log function we have
\[ \ln |fg| = (1/p) \ln |f|^p + (1/q) \ln |g|^q \le \ln\bigl( (1/p)|f|^p + (1/q)|g|^q \bigr). \]
Thus,
\[ |fg| \le (1/p)|f|^p + (1/q)|g|^q. \tag{9.15} \]
If $\|f\|_p = \|g\|_q = 1$, it follows from (9.15) that
\[ \int_\Omega |fg|\,d\mu \le (1/p) \int_\Omega |f|^p\,d\mu + (1/q) \int_\Omega |g|^q\,d\mu = 1/p + 1/q = 1 \tag{9.16} \]
and, hence, (9.14) holds in that case. In general, we can replace $f$ and $g$
by $f/\|f\|_p$ and $g/\|g\|_q$, respectively, and use Proposition 9.4(a) and (9.16)
to obtain $(\|f\|_p \|g\|_q)^{-1} \int_\Omega |fg|\,d\mu \le 1$.
We leave the cases $p = 1$ and $p = \infty$ and the “Furthermore, ...” part
to the reader as Exercises 9.46-9.47.
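For counting measure on a finite set, (9.14) is an inequality between finite sums and can be checked directly. The sketch below is illustrative only: the vectors `f` and `g`, the exponent `p = 3`, and the helper `holder_gap` are assumptions made for the example. It also exhibits the equality case by choosing $|g| = |f|^{p-1}$, so that $|g|^q = |f|^p$.

```python
def p_norm(x, p):
    """||x||_p for a finite sequence and finite p > 0 (counting measure)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def holder_gap(f, g, p):
    """Return ||f||_p ||g||_q - sum |f_i g_i|, where 1/p + 1/q = 1 and 1 < p < inf."""
    q = p / (p - 1.0)
    lhs = sum(abs(a * b) for a, b in zip(f, g))
    return p_norm(f, p) * p_norm(g, q) - lhs

f = [1.0, -2.0, 0.5, 3.0]
g = [0.3, 1.5, -2.0, 0.1]
p = 3.0

gap = holder_gap(f, g, p)
print(gap)       # nonnegative, as (9.14) requires

# Equality case: |g|^q = |f|^p, obtained from |g| = |f|^(p-1) since (p-1)q = p.
g_eq = [abs(a) ** (p - 1.0) for a in f]
gap_eq = holder_gap(f, g_eq, p)
print(gap_eq)    # zero up to rounding
```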
THEOREM 9.10 Minkowski’s Inequality
Let $1 \le p \le \infty$. Then
\[ \|f + g\|_p \le \|f\|_p + \|g\|_p \]
for all $f, g \in \mathcal{L}^p(\mu)$.
PROOF: The case $p = 1$ follows immediately from $|f + g| \le |f| + |g|$ and
the case $p = \infty$ from the fact that if $|f| \le M_1$ $\mu$-ae and $|g| \le M_2$ $\mu$-ae,
then $|f + g| \le M_1 + M_2$ $\mu$-ae.
Suppose that $p \in (1, \infty)$ and let $q$ be defined via $1/p + 1/q = 1$. From
\[ |f + g|^p \le |f||f + g|^{p-1} + |g||f + g|^{p-1} \]
we get
\[ \|f + g\|_p^p \le \int_\Omega |f||f + g|^{p-1}\,d\mu + \int_\Omega |g||f + g|^{p-1}\,d\mu. \tag{9.17} \]
Noting that $qp - q = p$ and, hence, that
\[ \int_\Omega \bigl( |f + g|^{p-1} \bigr)^q\,d\mu = \int_\Omega |f + g|^{qp - q}\,d\mu = \|f + g\|_p^p, \]
it follows from (9.17) and Hölder’s inequality that
\[ \|f + g\|_p^p \le \|f\|_p \|f + g\|_p^{p/q} + \|g\|_p \|f + g\|_p^{p/q}. \]
Hence,
\[ \|f + g\|_p^{p - p/q} \le \|f\|_p + \|g\|_p. \]
Because $p - p/q = 1$, the proof is complete.
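As a sanity check on the statement (not a substitute for the proof), one can sample random vectors and exponents and record the smallest observed slack $\|f\|_p + \|g\|_p - \|f + g\|_p$. The sketch below assumes counting measure on a five-point set; the seed, the sampling ranges, and the names `p_norm` and `worst_slack` are arbitrary choices made for the illustration.

```python
import random

def p_norm(x, p):
    """||x||_p for a finite sequence and finite p > 0 (counting measure)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

random.seed(0)
worst_slack = float("inf")
for _ in range(1000):
    p = random.uniform(1.0, 6.0)
    f = [random.uniform(-1.0, 1.0) for _ in range(5)]
    g = [random.uniform(-1.0, 1.0) for _ in range(5)]
    total = [a + b for a, b in zip(f, g)]
    slack = p_norm(f, p) + p_norm(g, p) - p_norm(total, p)
    worst_slack = min(worst_slack, slack)

# Minkowski's inequality says the slack is never negative (up to rounding).
print(worst_slack)
```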
It follows from Proposition 9.4 and Theorem 9.10 that $\mathcal{L}^p(\mu)$ is a
normed space when $p \in [1, \infty]$. The next theorem shows that it is in
fact a Banach space.
THEOREM 9.11 Riesz’s Theorem
For $1 \le p \le \infty$, the normed space $(\mathcal{L}^p(\mu), \|\ \|_p)$ is a Banach space, that is,
a complete metric space in the metric induced by the norm $\|\ \|_p$.
PROOF: We leave the case $p = \infty$ to the reader as Exercise 9.51. By
Proposition 9.3 on page 532, it suffices to show that the series $\sum_{n=1}^\infty f_n$
converges with respect to the norm $\|\ \|_p$ whenever $\sum_{n=1}^\infty \|f_n\|_p < \infty$.
Consider the nondecreasing sequence of functions $g_n = \sum_{k=1}^n |f_k|$ and
set $g = \lim_{n\to\infty} g_n$. It follows immediately from Minkowski’s inequality
that $\int_\Omega g_n^p\,d\mu \le \bigl( \sum_{k=1}^n \|f_k\|_p \bigr)^p$. Applying the monotone convergence
theorem, we obtain
\[ \int_\Omega g^p\,d\mu \le \left( \sum_{n=1}^\infty \|f_n\|_p \right)^p < \infty. \]
Hence, $g$ must be finite $\mu$-ae.
It is easy to see that, whenever $g(x) < \infty$, the sequence of partial sums
$s_n(x) = \sum_{k=1}^n f_k(x)$ is Cauchy and, hence, convergent. Let
\[ s(x) = \begin{cases} \lim_{n\to\infty} s_n(x), & \text{if } g(x) < \infty; \\ 0, & \text{if } g(x) = \infty. \end{cases} \]
Then $s \in \mathcal{L}^p(\mu)$ because $\int_\Omega |s|^p\,d\mu \le \int_\Omega |g|^p\,d\mu < \infty$. Also, using the fact
that $|s - s_n|^p \le g^p$ $\mu$-ae and applying the dominated convergence theorem, we
get
\[ \lim_{n\to\infty} \|s - s_n\|_p^p = \lim_{n\to\infty} \int_\Omega |s - s_n|^p\,d\mu = 0. \]
We have now shown that the series $\sum_{n=1}^\infty f_n$ converges with respect to the
norm $\|\ \|_p$.
The Dual Space of $\mathcal{L}^p(\mu)$
We now take up the problem of describing the bounded linear functionals
on $\mathcal{L}^p(\mu)$. At this point, we restrict ourselves to the case where $1 < p < \infty$.
To begin, we observe that for $g \in \mathcal{L}^q(\mu)$, where $1/p + 1/q = 1$, the linear
functional defined by
\[ \ell(f) = \int_\Omega fg\,d\mu \tag{9.18} \]
is continuous on $\mathcal{L}^p(\mu)$. Indeed, by Hölder’s inequality, $|\ell(f)| \le \|f\|_p \|g\|_q$
and, therefore,
\[ \|\ell\|_* \le \|g\|_q. \tag{9.19} \]
We claim that equality holds in (9.19). If $g = 0$, there is nothing to prove.
So assume $\|g\|_q \ne 0$ and set
\[ s(x) = \begin{cases} \overline{g(x)}/|g(x)|, & \text{if } g(x) \ne 0; \\ 0, & \text{if } g(x) = 0. \end{cases} \]
Then the function $f_0 = s|g|^{q-1}/\|g\|_q^{q-1}$ satisfies
\[ \int_\Omega |f_0|^p\,d\mu = \int_\Omega |s|^p |g|^{pq-p}/\|g\|_q^{pq-p}\,d\mu = \int_\Omega |g|^q/\|g\|_q^q\,d\mu = 1. \]
Hence, $f_0 \in \mathcal{L}^p(\mu)$ and $\|f_0\|_p = 1$. Furthermore,
\[ \ell(f_0) = \frac{1}{\|g\|_q^{q-1}} \int_\Omega s|g|^{q-1} g\,d\mu = \frac{1}{\|g\|_q^{q-1}} \int_\Omega |g|^q\,d\mu = \|g\|_q. \]
It follows from this last equality and (9.19) that $\|\ell\|_* = \|g\|_q$.
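The norming function $f_0$ can be computed explicitly for counting measure on a finite set. The following sketch uses an arbitrary complex vector `g` and the exponent `p = 3`, both chosen only for illustration, and confirms numerically that $\|f_0\|_p = 1$ and $\ell(f_0) = \|g\|_q$.

```python
def p_norm(x, p):
    """||x||_p for a finite sequence and finite p > 0 (counting measure)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)                      # conjugate exponent, 1/p + 1/q = 1

g = [1 + 1j, -2.0 + 0j, 0j, 0.5j]      # hypothetical element of l^q on a 4-point set
gq = p_norm(g, q)

# s = conj(g)/|g| where g != 0, and 0 elsewhere.
s = [t.conjugate() / abs(t) if t != 0 else 0j for t in g]

# f0 = s |g|^(q-1) / ||g||_q^(q-1)
f0 = [si * abs(t) ** (q - 1.0) / gq ** (q - 1.0) for si, t in zip(s, g)]

ell_f0 = sum(a * b for a, b in zip(f0, g))   # l(f0) = sum f0 * g

print(p_norm(f0, p))    # 1 up to rounding
print(ell_f0, gq)       # l(f0) agrees with ||g||_q up to rounding
```

The conjugate in the definition of `s` matters here: it is what turns the product `s * g` into `|g|`.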
We have shown that functions in $\mathcal{L}^q(\mu)$ induce bounded linear functionals
on $\mathcal{L}^p(\mu)$ via the formula (9.18). Now the question is whether these
exhaust all bounded linear functionals on $\mathcal{L}^p(\mu)$. The following theorem
shows that the answer is yes!
THEOREM 9.12 Riesz Representation Theorem
Let $1 < p < \infty$ and $1/p + 1/q = 1$. Then $\ell \in \mathcal{L}^p(\mu)^*$ if and only if there
exists a unique $g \in \mathcal{L}^q(\mu)$ such that
\[ \ell(f) = \int_\Omega fg\,d\mu, \qquad f \in \mathcal{L}^p(\mu). \]
Furthermore, $g$ satisfies $\|\ell\|_* = \|g\|_q$.
PROOF: In view of our discussion directly before this theorem, we need
only prove necessity. So assume that $\ell \in \mathcal{L}^p(\mu)^*$. We will work under the
assumption that $(\Omega, \mathcal{A}, \mu)$ is a finite measure space and leave the general
case to the reader as Exercises 9.62-9.65. We also leave the proof of the
uniqueness of $g$ for Exercise 9.59.
Define the complex measure $\nu$ on $\mathcal{A}$ by $\nu(E) = \ell(\chi_E)$. If $\mu(E) = 0$,
then $\chi_E = 0$ $\mu$-ae and so $\nu(E) = \ell(\chi_E) = 0$. Thus $\nu$ is absolutely continuous
with respect to $\mu$. Applying the complex version of the Radon-Nikodym
theorem (page 383), we conclude that there exists a function $g \in \mathcal{L}^1(\mu)$ such
that
\[ \ell(\chi_E) = \int_E g\,d\mu, \qquad E \in \mathcal{A}. \]
By linearity, it follows that $\ell(\phi) = \int_\Omega \phi g\,d\mu$ for all ($\mathcal{A}$-measurable) simple
functions $\phi$. Thus, $\bigl| \int_\Omega \phi g\,d\mu \bigr| \le \|\ell\|_* \|\phi\|_p$ for all simple functions. Let
\[ s(x) = \begin{cases} \overline{g(x)}/|g(x)|, & \text{if } g(x) \ne 0; \\ 0, & \text{if } g(x) = 0. \end{cases} \]
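As the reader is asked to show in Exercise 9.60, we can find a sequence of
simple functions $\{\phi_n\}_{n=1}^\infty$ such that $|\phi_n| \le 1$ $\mu$-ae and $\phi_n \to s$ $\mu$-ae. For
each simple function $\phi$, we have
\[ \left| \int_\Omega \phi\phi_n g\,d\mu \right| \le \|\ell\|_* \|\phi\|_p \]
and, applying the dominated convergence theorem, we obtain
\[ \left| \int_\Omega \phi |g|\,d\mu \right| \le \|\ell\|_* \|\phi\|_p. \tag{9.20} \]
We will use (9.20) to show that $g$ belongs to the space $\mathcal{L}^q(\mu)$.
Let $n \in \mathcal{N}$ and $E_n = \{x : |g(x)| \le n\}$. The function $f_0 = \chi_{E_n} |g|^{q-1}$
belongs to $\mathcal{L}^p(\mu)$. Hence, by Proposition 9.4 on page 556, there is a
sequence $\{\phi_k\}_{k=1}^\infty$ of simple functions such that, as $k \to \infty$, $\phi_k \to f_0$ $\mu$-ae
and $\|\phi_k\|_p \to \|f_0\|_p$. Replacing $\phi_k$ by $\chi_{E_n} |\phi_k|$ if necessary, we may assume
that the $\phi_k$s are nonnegative and vanish outside of $E_n$. Using Fatou’s
lemma and (9.20), we obtain
\[ \int_{E_n} |g|^{q-1} |g|\,d\mu \le \liminf_{k\to\infty} \int_\Omega \phi_k |g|\,d\mu \le \|\ell\|_* \liminf_{k\to\infty} \|\phi_k\|_p = \|\ell\|_* \|f_0\|_p \]
and, hence, that
\[ \left( \int_{E_n} |g|^q\,d\mu \right)^{1/q} = \left( \int_{E_n} |g|^q\,d\mu \right)^{1 - 1/p} \le \|\ell\|_*. \]
Letting $n \to \infty$ and applying the monotone convergence theorem, we get
that $\|g\|_q \le \|\ell\|_*$. Thus, $g$ belongs to $\mathcal{L}^q(\mu)$.
Because $g \in \mathcal{L}^q(\mu)$, the function $\ell_g$ defined by $\ell_g(f) = \int_\Omega fg\,d\mu$ is
in $\mathcal{L}^p(\mu)^*$. As $\ell$ and $\ell_g$ agree on simple functions, Proposition 9.4 implies
that they are identical.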
Remark: If $p = 1$, Theorem 9.12 remains valid under the additional
assumption that $(\Omega, \mathcal{A}, \mu)$ is $\sigma$-finite, as the reader is asked to prove in
Exercise 9.61. An example given in Chapter 10 shows that Theorem 9.12 fails
when $p = \infty$.
In view of Theorem 9.12, we can write $\mathcal{L}^p(\mu)^* = \mathcal{L}^q(\mu)$, for $1 < p < \infty$,
and, in the $\sigma$-finite case, for $p = 1$. However, for $p = \infty$, we can assert only
that
\[ \mathcal{L}^\infty(\mu)^* \supset \mathcal{L}^1(\mu). \tag{9.21} \]
See Exercise 9.58.
EXAMPLE 9.14 Illustrates Theorem 9.12
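Refer to Example 9.11 on page 547. Let $x \in [0,2\pi]$ and $1 < p < \infty$. Define
the linear functional $\ell_x$ on $\mathcal{L}^p([0,2\pi])$ by
\[ \ell_x(f) = \sum_{k=-n}^{n} \hat{f}(k) e^{ikx}. \]
Of course, $\ell_x$ just gives the value at $x$ of the $n$th partial sum of the Fourier
series of $f$.
First we will show that $\ell_x$ is bounded and then we will find the function
$g \in \mathcal{L}^q([0,2\pi])$ guaranteed by Theorem 9.12. From Hölder’s inequality,
\[ |\hat{f}(k)| = \left| \frac{1}{2\pi} \int_0^{2\pi} f(y) e^{-iky}\,dy \right| \le \frac{\|f\|_p}{2\pi} \left( \int_0^{2\pi} |e^{-iky}|^q\,dy \right)^{1/q} = \|f\|_p (2\pi)^{-1/p}. \]
It follows at once that $|\ell_x(f)| \le (2n+1)(2\pi)^{-1/p} \|f\|_p$. Thus, $\ell_x$ is bounded.
Finally, we write
\[ \ell_x(f) = \sum_{k=-n}^{n} \frac{1}{2\pi} \int_0^{2\pi} f(y) e^{ik(x-y)}\,dy = \int_0^{2\pi} f(y) D_n(x - y)\,dy, \]
where
\[ D_n(t) = \frac{1}{2\pi} \sum_{k=-n}^{n} e^{ikt} = \begin{cases} \dfrac{\sin((n + 1/2)t)}{2\pi \sin(t/2)}, & t \notin 2\pi\mathcal{Z}; \\[1ex] \dfrac{2n+1}{2\pi}, & t \in 2\pi\mathcal{Z}. \end{cases} \]
Thus, the function $g$ guaranteed by Theorem 9.12 is $g(y) = D_n(x - y)$.	□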
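The two expressions for the kernel $D_n$ can be compared numerically. In the sketch below, the function names `dirichlet_sum` and `dirichlet_closed` and the tolerance used to detect $\sin(t/2) = 0$ are our own choices; the closed form is evaluated as $(2n+1)/(2\pi)$ at multiples of $2\pi$, which is the value of the sum there.

```python
import cmath
import math

def dirichlet_sum(n, t):
    """D_n(t) = (1/(2 pi)) * sum_{k=-n}^{n} e^{ikt}; the sum is real."""
    total = sum(cmath.exp(1j * k * t) for k in range(-n, n + 1))
    return total.real / (2.0 * math.pi)

def dirichlet_closed(n, t):
    """Closed form sin((n + 1/2) t) / (2 pi sin(t/2)), with the limit at t in 2 pi Z."""
    s = math.sin(t / 2.0)
    if abs(s) < 1e-12:
        return (2 * n + 1) / (2.0 * math.pi)
    return math.sin((n + 0.5) * t) / (2.0 * math.pi * s)

for n in (0, 1, 5):
    for t in (0.0, 0.3, 1.0, 2.5, -1.7):
        print(n, t, dirichlet_sum(n, t), dirichlet_closed(n, t))
```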
EXERCISES 9.4
9.45	Prove Proposition 9.4.
9.46	Prove the “Furthermore, ...” part of Hölder’s inequality.
9.47	Verify Hölder’s inequality for $p = 1$ and $p = \infty$.
9.48	Discuss the case of equality in (9.14) when $p = 1$ or $p = \infty$.
9.49	Suppose that $p, q \in (0, \infty]$.
a)	Let $r$ be such that $1/r = 1/p + 1/q$. Show that if $f \in \mathcal{L}^p(\mu)$ and
$g \in \mathcal{L}^q(\mu)$, then $fg \in \mathcal{L}^r(\mu)$ and $\|fg\|_r \le \|f\|_p \|g\|_q$.
b)	Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space. Show that if $0 < s < r < \infty$,
then $\mathcal{L}^r(\mu) \subset \mathcal{L}^s(\mu)$.
9.50	Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space. Show that for each $f \in \mathcal{L}^\infty(\mu)$,
$\|f\|_p \to \|f\|_\infty$ as $p \to \infty$.
9.51	Prove that the normed space $(\mathcal{L}^\infty(\mu), \|\ \|_\infty)$ is a Banach space.
9.52	Show that $(\mathcal{L}^p([0,1]), \|\ \|_p)$ is not an inner product space unless $p = 2$.
9.53	Show that $\|\ \|_p$ does not define a norm on $\mathcal{L}^p([0,1])$ when $0 < p < 1$.
★9.54 Refer to Definition 9.9 on page 554.
a)	Show that if $0 < p < 1$, then $\sigma_p(f + g) \le \sigma_p(f) + \sigma_p(g)$.
b)	Deduce that $\rho_p(f, g) = \sigma_p(f - g)$ defines a metric on $\mathcal{L}^p(\mu)$ for $0 < p < 1$.
9.55 Refer to Exercise 9.54. Show that if $0 < p < 1$, then $(\mathcal{L}^p(\mu), \rho_p)$ is a
complete metric space.
★9.56 Let $J$ be a nonempty interval in $\mathcal{R}$ and $0 < p < \infty$.
a)	Show that if $J$ is closed and bounded, then $C(J)$ is dense in $\mathcal{L}^p(J)$.
b)	Refer to Example 7.26 on page 489. Show that $C_c(J)$ is dense in $\mathcal{L}^p(J)$.
c)	Show that $C_c(J)$ is not dense in $\mathcal{L}^\infty(J)$.
9.57	Let $0 < p < \infty$. Prove that the trigonometric polynomials are dense
in $\mathcal{L}^p([0,2\pi])$.
9.58	The result of this exercise gives meaning to the relation (9.21) on page 560.
Prove that if $g \in \mathcal{L}^1(\mu)$, then $\ell(f) = \int_\Omega fg\,d\mu$ defines a bounded linear
functional on $\mathcal{L}^\infty(\mu)$ and that $\|\ell\|_* = \|g\|_1$.
9.59	Prove the uniqueness of the function $g$ in Theorem 9.12.
★9.60 Suppose that $f \in \mathcal{L}^\infty(\mu)$. Show that there exists a sequence of simple
functions $\{\phi_n\}_{n=1}^\infty$ such that $|\phi_n| \le \|f\|_\infty$ $\mu$-ae and $\lim_{n\to\infty} \phi_n = f$ $\mu$-ae.
★9.61 Prove Theorem 9.12 when $p = 1$ under the assumption that $(\Omega, \mathcal{A}, \mu)$ is a
$\sigma$-finite measure space.
In Exercises 9.62-9.65 we complete the proof of Theorem 9.12 by eliminating the
restriction $\mu(\Omega) < \infty$.
9.62	Suppose $(\Omega, \mathcal{A}, \mu)$ is a measure space. For $E \in \mathcal{A}$, define the measure $\mu_E$
on $\mathcal{A}$ by $\mu_E(A) = \mu(E \cap A)$.
a)	Show that $f \in \mathcal{L}^p(\mu_E)$ if and only if $\chi_E f \in \mathcal{L}^p(\mu)$.
b)	Show that if $\ell \in \mathcal{L}^p(\mu)^*$, then $\ell_E(f) = \ell(\chi_E f)$ defines a continuous linear
functional on $\mathcal{L}^p(\mu_E)$ and $\|\ell_E\|_* \le \|\ell\|_*$.
c)	If $\mu(E) < \infty$, show there is a unique function $g_E \in \mathcal{L}^q(\mu)$ such that
$g_E$ vanishes outside of $E$, $\ell_E(f) = \int_\Omega f g_E\,d\mu$ for each $f \in \mathcal{L}^p(\mu_E)$, and
$\|\ell_E\|_*^q = \int_\Omega |g_E|^q\,d\mu_E$.
9.63	Use Exercise 9.62 to prove Theorem 9.12 in case $(\Omega, \mathcal{A}, \mu)$ is $\sigma$-finite.
9.64	Let $(\Omega, \mathcal{A}, \mu)$ be an arbitrary measure space and $1 < p < \infty$. Show that
if $\ell \in \mathcal{L}^p(\mu)^*$, then there exists a sequence $\{\Omega_n\}_{n=1}^\infty$ of $\mathcal{A}$-measurable sets
such that $\mu(\Omega_n) < \infty$ for each $n \in \mathcal{N}$ and $\ell(\chi_A) = 0$ for each $A \in \mathcal{A}$ such
that $\mu(A) < \infty$ and $A \subset \bigl( \bigcup_{n=1}^\infty \Omega_n \bigr)^c$.
9.65	Use Exercises 9.62-9.64 to verify Theorem 9.12 for an arbitrary measure
space $(\Omega, \mathcal{A}, \mu)$.
9.5	NONNEGATIVE LINEAR FUNCTIONALS ON $C(\Omega)$
We have now characterized the dual spaces of Hilbert spaces (Theorem 9.8
on page 551) and $\mathcal{L}^p$-spaces (Theorem 9.12 on page 559). Our next task,
which we will begin in this section and complete in the following one, is to
characterize the dual spaces of $C(\Omega)$ and $C_0(\Omega)$.
We will see that the linear functional defined on $C([0,1])$ by
\[ \ell(f) = \int_0^1 f(x)\,dx = \int_{[0,1]} f\,d\lambda \]
is typical in the sense that all bounded linear functionals on $C(\Omega)$ arise
from integration with respect to some complex measure. Here we lay the
foundation for the general result by characterizing those that arise from
integration with respect to a (nonnegative) measure.
Borel Sets and Regular Borel Measures
In Chapter 3 we defined the collection $\mathcal{B}$ of Borel sets of $\mathcal{R}$. We showed in
Theorem 3.4 that $\mathcal{B}$ is the smallest $\sigma$-algebra of subsets of $\mathcal{R}$ that contains
the open sets of $\mathcal{R}$. This characterization allows us to extend the concept
of Borel sets to any topological space.
DEFINITION 9.10 Borel Set, Measure, and Measurable Function
Let $\Omega$ be a topological space. The smallest $\sigma$-algebra of subsets of $\Omega$
that contains all the open sets is denoted $\mathcal{B}(\Omega)$. We use the following
terminology:
•	Borel set: a member of $\mathcal{B}(\Omega)$.
•	Borel measurable function: a function measurable with respect
to $\mathcal{B}(\Omega)$.
•	Borel measure: a signed or complex measure on $\mathcal{B}(\Omega)$.
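EXAMPLE 9.15 Illustrates Definition 9.10
a)	$\mathcal{B}(\mathcal{R}) = \mathcal{B}$, as defined in Chapter 3.
b)	$\mathcal{B}(\mathcal{R}^2) = \mathcal{B}_2 = \mathcal{B} \times \mathcal{B}$, as discussed in Exercise 4.147 on page 244. More
generally, $\mathcal{B}(\mathcal{R}^n) = \mathcal{B}_n = \mathcal{B} \times \mathcal{B} \times \cdots \times \mathcal{B}$, as discussed in Exercise 4.171
on page 259.
c)	Let $\Omega$ be any set and $\mathcal{T} = \{\Omega, \emptyset\}$. Then $\mathcal{B}(\Omega) = \mathcal{T}$.
d)	Let $\Omega$ be any set and $\mathcal{T}$ be the discrete topology on $\Omega$. Then we have
that $\mathcal{B}(\Omega) = \mathcal{T} = \mathcal{P}(\Omega)$.
e)	Let $(\Omega, \mathcal{T})$ be a topological space. Then all functions in $C(\Omega)$ are Borel
measurable.	□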
To characterize the bounded linear functionals on $C(\Omega)$, we need the
concept of a regular Borel measure. We recommend that the reader review
the discussion of the total variation of a complex measure presented in
Section 6.7 starting on page 381.
DEFINITION 9.11 Regular Borel Measure
Let $\Omega$ be a locally compact Hausdorff space. A complex Borel measure
$\mu$ is said to be a regular Borel measure if for each $B \in \mathcal{B}(\Omega)$
and $\epsilon > 0$, there is a compact set $K$ and an open set $O$ such that
$K \subset B \subset O$ and $|\mu|(O \setminus K) < \epsilon$.
The collection of all regular Borel measures on $\Omega$ is denoted by $M(\Omega)$;
the real-valued and nonnegative regular Borel measures are denoted,
respectively, by $M_r(\Omega)$ and $M_+(\Omega)$.
Remark: Definition 9.11 requires that a regular Borel measure be finite
valued. Other definitions of regular Borel measure exist and some permit
certain extended real-valued measures, such as Lebesgue measure, to be
regular.
EXAMPLE 9.16 Illustrates Definition 9.11
a)	Lebesgue measure on $[0,1]$ is a regular Borel measure. In fact, Lebesgue
measure on any Borel set of finite Lebesgue measure is a regular Borel
measure.
b)	The Lebesgue-Stieltjes measure corresponding to a distribution function
on $\mathcal{R}$ is a regular Borel measure, as the reader is asked to establish in
Exercise 9.68.
c)	Let $\Omega$ be a locally compact Hausdorff space. For $x \in \Omega$, the Dirac
measure concentrated at $x$, restricted to the Borel sets of $\Omega$, is a regular
Borel measure. See Exercise 9.71.	□
Suppose that $\Omega$ is a locally compact Hausdorff space. The spaces $M(\Omega)$
and $M_r(\Omega)$ are, respectively, complex and real linear spaces, where the
operations of addition and scalar multiplication are defined by
\[ (\mu + \nu)(B) = \mu(B) + \nu(B) \quad\text{and}\quad (\alpha\mu)(B) = \alpha\mu(B). \]
Referring to Exercise 6.112 on page 386, we see that the linear spaces $M(\Omega)$
and $M_r(\Omega)$ are also normed spaces, where the norm is given by the total
variation, that is, $\|\mu\| = |\mu|(\Omega)$. Moreover, as the reader is asked to prove
in Exercise 9.66, $M(\Omega)$ and $M_r(\Omega)$ are Banach spaces with respect to the
norm $\|\ \|$.
If $F$ is a closed subset of $\Omega$, then any $\nu \in M(F)$ can be extended to a
regular Borel measure $\nu'$ on $\Omega$ by defining
\[ \nu'(B) = \nu(B \cap F), \qquad B \in \mathcal{B}(\Omega). \]
It is convenient to view $\nu$ as a measure on $\Omega$ by identifying it with $\nu'$. In
this way we can identify $M(F)$ with the linear subspace
\[ \{ \mu \in M(\Omega) : \mu(B) = 0 \text{ for all } B \in \mathcal{B}(\Omega) \text{ with } B \subset F^c \}. \]
Nonnegative Linear Functionals
From now on in this section, unless explicitly stated otherwise, we assume
that $\Omega$ is a compact Hausdorff space. If $\mu \in M(\Omega)$, then $\mu$ induces a linear
functional $\ell_\mu$ on the space $C(\Omega)$ via
\[ \ell_\mu(f) = \int_\Omega f\,d\mu, \qquad f \in C(\Omega). \]
That $\ell_\mu$ is a bounded linear functional follows from
\[ |\ell_\mu(f)| \le \|f\|_\Omega\, |\mu|(\Omega) = \|f\|_\Omega \|\mu\|, \]
where we have applied Exercise 6.117(b) on page 387.
In this section we will show that any linear functional on $C(\Omega)$ satisfying
a certain nonnegativity condition must be of the form $\ell_\mu$ for some
$\mu \in M_+(\Omega)$. In the next section, we will extend this result to all bounded
linear functionals on $C(\Omega)$ if $\Omega$ is a compact Hausdorff space and to $C_0(\Omega)$
if $\Omega$ is a locally compact Hausdorff space.
DEFINITION 9.12 Nonnegative Linear Functional
A linear functional $\ell$ on $C(\Omega)$ is said to be nonnegative if $\ell(f) \ge 0$
whenever $f \ge 0$.
As the reader is asked to show in Exercise 9.75, the linear functional
on $C(\Omega)$ induced by a regular Borel measure $\mu$ is nonnegative if and only
if $\mu$ is nonnegative.
The following theorem, whose proof is left to the reader in Exer-
cises 9.76-9.81, presents some basic properties of nonnegative linear func-
tionals.
THEOREM 9.13
Let $\Omega$ be a compact Hausdorff space.
a)	If $\ell$ is a nonnegative linear functional on $C(\Omega)$, then $\ell \in C(\Omega)^*$ and
$\|\ell\|_* = \ell(1)$.
b)	If $\ell \in C(\Omega)^*$ and $\ell(C(\Omega, \mathcal{R})) \subset \mathcal{R}$, then there exist nonnegative linear
functionals $\ell_+$ and $\ell_-$ such that $\|\ell\|_* = \ell_+(1) + \ell_-(1)$ and $\ell = \ell_+ - \ell_-$.
We have noted that a nonnegative regular Borel measure on $\Omega$ induces
a nonnegative linear functional on $C(\Omega)$. Our next theorem shows that all
nonnegative linear functionals on $C(\Omega)$ are of that type. There are two
main ideas in the proof of this result. One is the use of Urysohn’s lemma
to obtain suitable approximations to characteristic functions of closed sets.
The other is to mimic the construction of Lebesgue measure from Lebesgue
outer measure.
With regard to the latter, recall that the collection $\mathcal{M}$ of Lebesgue
measurable sets is defined using the Carathéodory criterion and Lebesgue
outer measure: $E \in \mathcal{M}$ if and only if
\[ \lambda^*(W) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c) \]
for all $W \subset \mathcal{R}$. Theorem 3.11 on page 120 shows that $\mathcal{M}$ is a $\sigma$-algebra. A
careful look at the proof reveals that it uses only the properties of Lebesgue
outer measure given in (a), (b), (c), and (e) of Proposition 3.1 on page 106.
In other words, we have already proved the following proposition.
PROPOSITION 9.5
Let $\Omega$ be a set and $\nu^*$ an extended real-valued function on $\mathcal{P}(\Omega)$ satisfying
the following conditions:
a)	$\nu^*(A) \ge 0$ for each $A \subset \Omega$.
b)	$\nu^*(\emptyset) = 0$.
c)	$A \subset B \Rightarrow \nu^*(A) \le \nu^*(B)$.
d)	$\{A_n\}_n \subset \mathcal{P}(\Omega) \Rightarrow \nu^*\bigl( \bigcup_n A_n \bigr) \le \sum_n \nu^*(A_n)$.
Then the collection of subsets $E$ of $\Omega$ satisfying
\[ \nu^*(W) = \nu^*(W \cap E) + \nu^*(W \cap E^c) \]
for all $W \subset \Omega$ is a $\sigma$-algebra. Members of this $\sigma$-algebra are referred to as
$\nu^*$-measurable sets.†
We now state and prove the main result of this section, known as the
Riesz-Markov theorem.
THEOREM 9.14 Riesz-Markov Theorem
Let $\Omega$ be a compact Hausdorff space and $\ell$ a nonnegative linear functional
on $C(\Omega)$. Then there exists a unique $\mu \in M_+(\Omega)$ such that
\[ \ell(f) = \int_\Omega f\,d\mu, \qquad f \in C(\Omega). \]
PROOF: We start by assigning a nonnegative number $\bar\mu(O)$ to each open
set $O$. If $O = \emptyset$, let $\bar\mu(O) = 0$; otherwise, let
\[ \bar\mu(O) = \sup\{ \ell(f) : 0 \le f \le 1 \text{ and } \operatorname{supp} f \subset O \}. \]
We note that $\bar\mu(O) \le \bar\mu(\Omega) = \ell(1)$ for all $O$. Next, for each $A \subset \Omega$, we
define
\[ \mu^*(A) = \inf\{ \bar\mu(O) : O \text{ open and } O \supset A \}. \]
Observe that $\mu^*(O) = \bar\mu(O)$ whenever $O$ is open.
We will show that $\mu^*$ satisfies the hypotheses of Proposition 9.5. Conditions
(a)-(c) follow easily from the definition of $\mu^*$. To verify condition (d),
we first show that if $\{O_n\}_{n=1}^\infty$ is a sequence of open subsets of $\Omega$,
then
\[ \bar\mu\left( \bigcup_{n=1}^\infty O_n \right) \le \sum_{n=1}^\infty \bar\mu(O_n). \tag{9.22} \]
† Proposition 4.13 on page 210 shows that the outer measure $\nu^*$ induced by an
appropriate set function on a semialgebra of subsets of a set $\Omega$ satisfies (a)-(d) of
Proposition 9.5. Thus the concept of $\nu^*$-measurability given here is the same as that
in Definition 4.18 on page 211.
Let $f$ be a continuous function satisfying $0 \le f \le 1$ and $\operatorname{supp} f \subset \bigcup_{n=1}^\infty O_n$.
Applying Theorem 7.15 on page 477 with $K = \operatorname{supp} f$, we obtain continuous
functions $f_1, f_2, \ldots, f_m$ satisfying
•	$0 \le f_j \le 1$, for each $j$,
•	$\sum_{j=1}^m f_j(x) = 1$ for $x \in \operatorname{supp} f$,
•	fj i> 311(1
•	for each $j$, there is an $m_j$ such that $\operatorname{supp} f_j \subset O_{m_j}$.
By replacing $f_j$ by $\sum_{m_k = m_j} f_k$ if necessary, we can assume that the
$m_j$s are distinct. It is clear that $f = \sum_{j=1}^m f f_j$ and, so, $\ell(f) = \sum_{j=1}^m \ell(f f_j)$.
As $\operatorname{supp} f f_j \subset O_{m_j}$, it follows that
\[ \ell(f) \le \sum_{n=1}^\infty \bar\mu(O_n). \tag{9.23} \]
Taking the supremum on the left-hand side of (9.23), we obtain (9.22). It
is now easy to check that $\mu^*$ satisfies condition (d) of Proposition 9.5, as
we ask the reader to verify in Exercise 9.82.
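We complete the proof of the theorem by showing successively that
•	all open sets are $\mu^*$-measurable,
•	$\mu = \mu^*|_{\mathcal{B}(\Omega)}$ is a regular Borel measure, and
•	$\ell(f) = \int_\Omega f\,d\mu$ for all $f \in C(\Omega)$.
To show that an arbitrary open set $O$ is $\mu^*$-measurable it suffices to
prove that
\[ \mu^*(A) \ge \mu^*(A \cap O) + \mu^*(A \cap O^c) \tag{9.24} \]
for all $A \subset \Omega$. Let $U$ be an open set containing $A$, $f$ a continuous function
satisfying $0 \le f \le 1$ and $\operatorname{supp} f \subset U \cap O$, and $V = U \cap (\operatorname{supp} f)^c$. If $g$ is a
continuous function satisfying $0 \le g \le 1$ and $\operatorname{supp} g \subset V$, then
\[ \operatorname{supp}(f + g) \subset \operatorname{supp} f \cup \operatorname{supp} g \subset U. \]
It follows that
\[ \bar\mu(U) \ge \ell(f) + \ell(g). \tag{9.25} \]
From (9.25) we deduce that
\[ \bar\mu(U) \ge \ell(f) + \bar\mu(V) \ge \ell(f) + \mu^*(A \cap O^c) \]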
and, therefore, that
\[ \bar\mu(U) \ge \bar\mu(U \cap O) + \mu^*(A \cap O^c) \ge \mu^*(A \cap O) + \mu^*(A \cap O^c). \]
As the open set $U$ containing $A$ was chosen arbitrarily, (9.24) holds.
Having shown that all open sets are $\mu^*$-measurable, we can invoke
Proposition 9.5 and Proposition 4.16 on page 213 to conclude that all Borel
sets are $\mu^*$-measurable and $\mu = \mu^*|_{\mathcal{B}(\Omega)}$ is a Borel measure. To show that
$\mu$ is regular, we first observe that, by the definition of $\mu^*$,
\[ \mu(B) = \inf\{ \mu(O) : O \text{ open and } O \supset B \}, \qquad B \in \mathcal{B}(\Omega). \tag{9.26} \]
Because $\mu(\Omega) = \ell(1) < \infty$, we have for each $B \in \mathcal{B}(\Omega)$ that
\[ \begin{aligned} \mu(B) &= \mu(\Omega) - \mu(B^c) \\ &= \mu(\Omega) - \inf\{ \mu(W) : W \text{ open},\ W \supset B^c \} \\ &= \sup\{ \mu(W^c) : W \text{ open},\ W \supset B^c \} \\ &= \sup\{ \mu(F) : F \text{ closed},\ F \subset B \}. \end{aligned} \tag{9.27} \]
It follows at once from (9.26) and (9.27) that $\mu$ is regular.
Finally, we must show that
\[ \ell(f) = \int_\Omega f\,d\mu, \qquad f \in C(\Omega). \tag{9.28} \]
Every function in $C(\Omega)$ is a linear combination of functions with values
in the interval $[0,1)$. Therefore, by the linearity of $\ell$, it suffices to establish
(9.28) in case $0 \le f < 1$.
Let $n \in \mathcal{N}$. For each integer $k$, $0 \le k \le n$, the sets $F_k = f^{-1}([k/n, \infty))$
and $U_k = f^{-1}(((k-1)/n, \infty))$ are closed and open, respectively. Moreover,
we have
\[ F_{k+1} \subset U_{k+1} \subset F_k \subset U_k \quad\text{and}\quad \Omega = \bigcup_{k=0}^{n-1} (F_k \setminus F_{k+1}). \]
If $F_k = \emptyset$, we set $g_k = 0$. Otherwise, we first invoke the regularity of $\mu$ to
choose an open set $V_k$ such that $F_k \subset V_k \subset U_k$ and $\sum_{k=0}^{n-1} \mu(V_k \setminus F_k) < 1$ and
then apply Proposition 7.14 on page 449 and Urysohn’s lemma on page 450
to obtain a continuous function $g_k$ such that $0 \le g_k \le 1$, $g_k(F_k) = \{1\}$,
and $\operatorname{supp} g_k \subset V_k$.
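Let $h = (1/n) \sum_{j=0}^{n-1} g_j$. We claim that $f \le h$. For each $x \in \Omega$, choose
the unique $k$ such that $x \in f^{-1}([k/n, (k+1)/n)) = F_k \setminus F_{k+1}$. If $0 \le j \le k$,
then $g_j(x) = 1$ because $F_k \subset F_j$; if $j > k + 1$, then $g_j(x) = 0$ because
$x \in F_{k+1}^c \subset U_{k+2}^c \subset U_j^c \subset V_j^c$. It follows that
\[ h(x) = (k+1)/n + (1/n) g_{k+1}(x) > f(x), \]
as required. Using the fact that $f \le h$ and the nonnegativity of $\ell$, we
obtain
\[ \begin{aligned} \ell(f) \le \ell(h) &= (1/n) \sum_{j=0}^{n-1} \ell(g_j) \le (1/n) \sum_{j=0}^{n-1} \mu(V_j) \\ &= (1/n) \sum_{j=0}^{n-1} \bigl( \mu(V_j \setminus F_j) + \mu(F_j) \bigr) \le 1/n + (1/n) \sum_{j=0}^{n-1} \mu(F_j). \end{aligned} \tag{9.29} \]
For $j = 0, 1, \ldots, n-1$, we can write $F_j = \bigcup_{k=j}^{n-1} (F_k \setminus F_{k+1})$ and, therefore,
\[ \mu(F_j) = \sum_{k=j}^{n-1} \mu(F_k \setminus F_{k+1}). \]
Applying (9.29), we get
\[ \begin{aligned} \ell(f) &\le 1/n + (1/n) \sum_{j=0}^{n-1} \sum_{k=j}^{n-1} \mu(F_k \setminus F_{k+1}) \\ &= 1/n + (1/n) \sum_{k=0}^{n-1} (k+1) \mu(F_k \setminus F_{k+1}) \\ &= 1/n + \mu(\Omega)/n + \sum_{k=0}^{n-1} (k/n) \mu(F_k \setminus F_{k+1}) \\ &= 1/n + \ell(1)/n + \int_\Omega \sum_{k=0}^{n-1} (k/n) \chi_{F_k \setminus F_{k+1}}\,d\mu \\ &\le (1 + \ell(1))/n + \int_\Omega f\,d\mu. \end{aligned} \]
Because $n$ was chosen arbitrarily, it follows that
\[ \ell(f) \le \int_\Omega f\,d\mu. \tag{9.30} \]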
We can replace $f$ by $(1 - f)/2$ in (9.30) to get
\[ \ell(1) - \ell(f) \le \mu(\Omega) - \int_\Omega f\,d\mu = \ell(1) - \int_\Omega f\,d\mu. \]
Thus (9.28) holds.
It remains only to prove the uniqueness of $\mu$, which we leave to the
reader as Exercise 9.83.
EXERCISES 9.5
9.66 Let $\Omega$ be a locally compact Hausdorff space. Show that $(M(\Omega), \|\ \|)$ and
$(M_r(\Omega), \|\ \|)$ are Banach spaces, where $\|\mu\| = |\mu|(\Omega)$.
9.67 Let $\Omega$ be a locally compact Hausdorff space. Show that if $\mu \in M(\Omega)$, then
$|\mu| \in M(\Omega)$.
★9.68 In this exercise, you are asked, among other things, to verify the statement
of Example 9.16(b).
a)	Prove that if a locally compact metric space $\Omega$ is the countable union of
compact subsets, then every complex Borel measure on $\Omega$ is regular.
b)	Show that the Lebesgue-Stieltjes measure associated with a distribution
function on $\mathcal{R}$ is a regular Borel measure.
9.69 Suppose $\Omega$ is locally compact and $\mu \in M_+(\Omega)$. Prove that $C_b(\Omega)$ is dense
in $\mathcal{L}^p(\mu)$ for $1 < p < \infty$.
9.70 Let $\mu \in M([0,1])$ satisfy
\[ \int_{[0,1]} x^n\,d\mu(x) = 0 \]
for $n = 0, 1, 2, \ldots$. Show that $\mu = 0$, that is, $\mu$ vanishes identically.
9.71	Suppose $\Omega$ is a locally compact Hausdorff space. Let $x \in \Omega$ and $\delta_x$ be
defined on $\mathcal{B}(\Omega)$ by
\[ \delta_x(B) = \begin{cases} 1, & \text{if } x \in B, \\ 0, & \text{if } x \notin B. \end{cases} \]
a)	Show that $\delta_x$ is a regular Borel measure.
b)	Determine $\int_\Omega f\,d\delta_x$ when $f \in C(\Omega)$.
9.72	Let $\delta_x$ be as in Exercise 9.71.
a)	Show that $\|\delta_x - \delta_y\| = 2$ when $x \ne y$.
b)	Deduce that $M(\Omega)$ is not separable if $\Omega$ is uncountable.
9.73	Show how to identify $M(\Omega)$ and $\ell^1(\Omega)$ when $\Omega$ is countable.
9.74	Let $\Omega$ be a locally compact Hausdorff space, $\mu \in M(\Omega)$, and $B \in \mathcal{B}(\Omega)$.
Prove that there are sets $F$ and $G$ such that $G$ is a countable intersection
of open sets, $F$ is a countable union of closed sets, $F \subset B \subset G$, and
$|\mu|(G \setminus F) = 0$.
9.75	Suppose that $\Omega$ is a compact Hausdorff space and that $\mu \in M(\Omega)$. Show
that $\ell_\mu(f) = \int_\Omega f\,d\mu$ defines a nonnegative linear functional on $C(\Omega)$ if and
only if $\mu \in M_+(\Omega)$.
Exercises 9.76-9.81 provide the proof of Theorem 9.13 on page 566.
9.76	Show that if $\ell$ is a nonnegative linear functional on $C(\Omega)$, then $\ell \in C(\Omega)^*$
and $\|\ell\|_* = \ell(1)$.
9.77	Suppose that $\ell$ satisfies the hypotheses of part (b) of Theorem 9.13. For
each nonnegative continuous function $f$ on $\Omega$, let
\[
\ell_+(f) = \sup\{\, \ell(g) : 0 \le g \le f,\ g \text{ continuous} \,\}.
\]
a)	Show that if $f_1$ and $f_2$ are nonnegative and continuous, then
$\ell_+(f_1 + f_2) = \ell_+(f_1) + \ell_+(f_2)$.
b)	Show that $0 \le f \le g$ implies $\ell_+(f) \le \ell_+(g)$.
c)	Show that $\ell_+(\alpha f) = \alpha\ell_+(f)$ whenever $f \ge 0$ and $\alpha$ is a nonnegative real
number.
9.78	Extend the function $\ell_+$ defined in Exercise 9.77 to all of $C(\Omega, \mathcal{R})$ by the
formula $\ell_+(f) = \ell_+(\|f\| + f) - \ell_+(\|f\|)$, where $\|f\| = \|f\|_\Omega$.
a)	Prove that this new definition of $\ell_+(f)$ agrees with the old one when $f$ is
nonnegative.
b)	Show that this extended $\ell_+$ is linear on the space $C(\Omega, \mathcal{R})$.
9.79	Extend the function $\ell_+$ defined in Exercise 9.78 to all of $C(\Omega)$ by the formula
$\ell_+(f) = \ell_+(\Re f) + i\ell_+(\Im f)$.
a)	Prove that this new definition of $\ell_+(f)$ agrees with the old one when $f$ is
real valued.
b)	Show that this extended function is linear and nonnegative.
9.80	Suppose that $\ell$ satisfies the hypotheses of part (b) of Theorem 9.13. Let
$\ell_- = \ell_+ - \ell$, where $\ell_+$ is defined as in Exercise 9.79. Show that $\ell_-$ is
nonnegative.
9.81	Suppose that $\ell$ satisfies the hypotheses of part (b) of Theorem 9.13. Let $\ell_+$
and $\ell_-$ be defined as in Exercise 9.80. Show that $\|\ell\|_* = \ell_+(1) + \ell_-(1)$.
Hint: If $0 \le g \le 1$, then $\|2g - 1\| \le 1$ and, so, $\|\ell\|_* \ge \ell(2g - 1) = 2\ell(g) - \ell(1)$.
9.82	Show that the set function $\mu^*$ defined in the proof of Theorem 9.14 satisfies
condition (d) of Proposition 9.5.
9.83	Prove the uniqueness part of Theorem 9.14.
9.6	THE DUAL SPACES OF $C(\Omega)$ AND $C_0(\Omega)$
In this section, we extend the Riesz-Markov theorem (page 567) to arbitrary
bounded linear functionals on $C(\Omega)$. We will also characterize the bounded
linear functionals on $C_0(\Omega)$ when $\Omega$ is a locally compact Hausdorff space.
These results show that we are justified in writing $C(\Omega)^* = M(\Omega)$ and
$C_0(\Omega)^* = M(\Omega)$ in the compact and locally compact cases, respectively.
LEMMA 9.1
Suppose that $\Omega$ is a compact Hausdorff space and $\mu \in M(\Omega)$. Further
suppose that $\phi$ is a complex-valued Borel measurable function such that
$|\phi| \le 1$ $|\mu|$-ae. Then there is a sequence $\{f_n\}_{n=1}^\infty$ of continuous functions
such that $\|f_n\|_\Omega \le 1$ for each $n$ and $\int_\Omega |f_n - \phi|\,d|\mu| \to 0$ as $n \to \infty$.
PROOF: By Exercise 9.60 on page 562, we can choose a sequence $\{\phi_n\}_{n=1}^\infty$
of Borel measurable simple functions such that $|\phi_n| \le 1$ $|\mu|$-ae for all $n$ and
$\lim_{n\to\infty} \phi_n = \phi$ $|\mu|$-ae. Applying the dominated convergence theorem, we
get that
\[
\lim_{n\to\infty} \int_\Omega |\phi_n - \phi|\,d|\mu| = 0. \tag{9.31}
\]
Let $n \in \mathcal{N}$ be fixed but arbitrary. We can write $\phi_n = \sum_{k=1}^m \alpha_k\chi_{E_k}$,
where $|\alpha_k| \le 1$ for each $k$ and the $E_k$s are pairwise disjoint Borel sets whose
union is $\Omega$. Using the regularity of $\mu$, we can find compact sets $F_k \subset E_k$
such that $|\mu|(E_k \setminus F_k) < 1/nm$ for $k = 1, 2, \ldots, m$.
For each $k$, we can write $\alpha_k = |\alpha_k|e^{i\theta_k}$, where $\theta_k \in [0, 2\pi)$. If $x \in F_k$,
define $u_0(x) = |\alpha_k|$ and $v_0(x) = \theta_k$. Since the $F_k$s are pairwise disjoint and
closed, the functions $u_0$ and $v_0$ are well-defined and continuous on $\bigcup_{k=1}^m F_k$
and, furthermore, $|u_0| \le 1$.
By Tietze's extension theorem (page 451), we can extend $u_0$ and $v_0$ to
continuous real-valued functions $u$ and $v$ on all of $\Omega$ such that $|u| \le 1$. Let
$f_n = ue^{iv}$. Then $f_n = \phi_n$ on $\bigcup_{k=1}^m F_k$ and $\|f_n\|_\Omega \le 1$. Moreover,
\[
\begin{aligned}
\int_\Omega |\phi_n - f_n|\,d|\mu| &= \sum_{k=1}^m \int_{E_k} |\alpha_k - f_n|\,d|\mu|\\
&\le \sum_{k=1}^m \int_{E_k \setminus F_k} \bigl(|\alpha_k| + |f_n|\bigr)\,d|\mu|\\
&\le \sum_{k=1}^m 2|\mu|(E_k \setminus F_k) < \sum_{k=1}^m 2/mn = 2/n.
\end{aligned}
\tag{9.32}
\]
It follows from (9.31) and (9.32) that $\lim_{n\to\infty} \int_\Omega |f_n - \phi|\,d|\mu| = 0$.
THEOREM 9.15 Riesz Representation Theorem
Let $\Omega$ be a compact Hausdorff space. Then $\ell \in C(\Omega)^*$ if and only if there
exists a $\mu \in M(\Omega)$ such that
\[
\ell(f) = \int_\Omega f\,d\mu, \qquad f \in C(\Omega). \tag{9.33}
\]
Furthermore, the measure $\mu$ is unique and satisfies
\[
\|\ell\|_* = |\mu|(\Omega). \tag{9.34}
\]
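PROOF: In the penultimate paragraph before Definition 9.12, we showed
that each $\mu \in M(\Omega)$ induces a bounded linear functional on $C(\Omega)$ via the
relation (9.33).
Conversely, suppose that $\ell \in C(\Omega)^*$. Define
\[
\ell_{re}(f) = \frac{1}{2}\Bigl(\ell(f) + \overline{\ell(\bar f)}\Bigr) \quad\text{and}\quad \ell_{im}(f) = \frac{1}{2i}\Bigl(\ell(f) - \overline{\ell(\bar f)}\Bigr).
\]
Then $\ell_{re}$ and $\ell_{im}$ satisfy $\ell = \ell_{re} + i\ell_{im}$ and the hypotheses of Theorem 9.13(b)
on page 566. Therefore, by the Riesz-Markov theorem, there
are measures $\mu_1, \mu_2, \mu_3, \mu_4 \in M_+(\Omega)$ such that
\[
\ell_{re}(f) = \int_\Omega f\,d\mu_1 - \int_\Omega f\,d\mu_2 \quad\text{and}\quad \ell_{im}(f) = \int_\Omega f\,d\mu_3 - \int_\Omega f\,d\mu_4
\]
for all $f \in C(\Omega)$. Thus, the measure $\mu = \mu_1 - \mu_2 + i(\mu_3 - \mu_4)$ belongs
to $M(\Omega)$ and satisfies (9.33).
To verify (9.34), we note first that
\[
\begin{aligned}
\|\ell\|_* &= \sup\{\, |\ell(f)| : \|f\|_\Omega \le 1 \,\}\\
&= \sup\Bigl\{\, \Bigl|\int_\Omega f\,d\mu\Bigr| : \|f\|_\Omega \le 1 \,\Bigr\}\\
&\le \sup\{\, \|f\|_\Omega\,|\mu|(\Omega) : \|f\|_\Omega \le 1 \,\} = |\mu|(\Omega).
\end{aligned}
\]
To prove the reverse inequality, we use Exercise 6.117 on page 387 to obtain
a Borel measurable complex-valued function $\phi$ such that $|\phi| = 1$ $|\mu|$-ae and
$\int_\Omega v\,d\mu = \int_\Omega v\phi\,d|\mu|$ for all $v \in \mathcal{L}^1(|\mu|)$. Applying Lemma 9.1 to $\bar\phi$, we
choose a sequence $\{f_n\}_{n=1}^\infty$ of continuous functions such that $\|f_n\|_\Omega \le 1$
and $\int_\Omega |f_n - \bar\phi|\,d|\mu| \to 0$. Since $\phi\bar\phi = 1$ $|\mu|$-ae, we have
\[
\Bigl|\int_\Omega f_n\,d\mu - |\mu|(\Omega)\Bigr| = \Bigl|\int_\Omega \phi(f_n - \bar\phi)\,d|\mu|\Bigr| \le \int_\Omega |f_n - \bar\phi|\,d|\mu|.
\]
It follows that $|\mu|(\Omega) \le \|\ell\|_*$ and, hence, (9.34) holds. The proof of uniqueness
is left to the reader as Exercise 9.84.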
The Case $\Omega$ Locally Compact
Next we extend Theorem 9.15 to locally compact, noncompact Hausdorff
spaces. In this case, we work with $C_0(\Omega)$ rather than $C(\Omega)$ because $\|\ \|_\Omega$ is
no longer a norm on $C(\Omega)$.
THEOREM 9.16 Riesz Representation Theorem
Let $\Omega$ be a locally compact, noncompact Hausdorff space. Then $\ell \in C_0(\Omega)^*$
if and only if there exists a $\mu \in M(\Omega)$ such that
\[
\ell(f) = \int_\Omega f\,d\mu, \qquad f \in C_0(\Omega). \tag{9.35}
\]
Furthermore, the measure $\mu$ is unique and satisfies $\|\ell\|_* = |\mu|(\Omega)$.
PROOF: Let $\ell \in C_0(\Omega)^*$. We will prove the existence of the measure $\mu$
satisfying (9.35), leaving the proofs of the remainder of the assertions to
the reader as Exercise 9.86.
Let $\Omega^* = \Omega \cup \{\omega\}$ denote the one-point compactification of $\Omega$, as
described in Theorem 7.17 on page 480, and define the function $L$ on $C(\Omega^*)$
by $L(g) = \ell(g|_\Omega - g(\omega))$. Clearly $L$ is linear. That it is also bounded follows
from
\[
|L(g)| = |\ell(g|_\Omega - g(\omega))| \le \|\ell\|_*\,\|g|_\Omega - g(\omega)\|_\Omega \le 2\|\ell\|_*\,\|g\|_{\Omega^*}.
\]
Hence, by Theorem 9.15, there is a measure $\mu^* \in M(\Omega^*)$ such that
\[
L(g) = \int_{\Omega^*} g\,d\mu^*, \qquad g \in C(\Omega^*).
\]
Letting $\mu = \mu^*|_{\mathcal{B}(\Omega)}$, we obtain
\[
L(g) = \int_\Omega g\,d\mu + g(\omega)\mu^*(\{\omega\}), \qquad g \in C(\Omega^*). \tag{9.36}
\]
Now let $f \in C_0(\Omega)$. By defining $f^*(x) = f(x)$ for $x \in \Omega$ and $f^*(\omega) = 0$,
we can extend $f$ to a function $f^* \in C(\Omega^*)$ having the same norm; indeed,
$C_0(\Omega)$ is the collection of restrictions to $\Omega$ of functions in $C(\Omega^*)$ that vanish
at $\omega$. We have by (9.36) that $\ell(f) = L(f^*) = \int_\Omega f\,d\mu$. The regularity of $\mu$
follows from Exercise 9.85.
Two simple but instructive illustrations of Theorem 9.16 are provided
in Example 9.17. In the next chapter, we will see more elaborate applica-
tions of the results of this section.
EXAMPLE 9.17 Illustrates Theorem 9.16
a)	When it is given the discrete topology, the set of positive integers $\mathcal{N}$
becomes a locally compact space. $C_0(\mathcal{N})$ is simply the collection of
all sequences $\{a_n\}_{n=1}^\infty$ of complex numbers such that $\lim_{n\to\infty} a_n = 0$.
By Exercise 9.73, we can identify $M(\mathcal{N})$ with $\ell^1(\mathcal{N})$ and so we can
write $C_0(\mathcal{N})^* = \ell^1(\mathcal{N})$. It follows from Theorem 9.16 that the bounded
linear functionals on $C_0(\mathcal{N})$ are of the form $\ell(a) = \sum_{n=1}^\infty a_nb_n$, for some
$b \in \ell^1(\mathcal{N})$ and, furthermore, that $\|\ell\|_* = \sum_{n=1}^\infty |b_n|$.
b)	Let $\Omega$ be a locally compact Hausdorff space and $x_0 \in \Omega$. Define the
function $\ell$ on $C_0(\Omega)$ by $\ell(f) = f(x_0)$. Clearly, $\ell \in C_0(\Omega)^*$ and $\|\ell\|_* \le 1$.
Since $f(x_0) = \int_\Omega f\,d\delta_{x_0}$, it follows from the uniqueness part of Theorem 9.16
that $\mu = \delta_{x_0}$. Moreover, $\|\ell\|_* = |\delta_{x_0}|(\Omega) = \delta_{x_0}(\Omega) = 1$.	□
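The norm formula in part (a) can be checked numerically. The following is a minimal sketch under stated assumptions, not part of the text: it uses real scalars, a hypothetical weight sequence $b_n = (-1)^n/2^n$ truncated at 40 terms, and test sequences $a$ that equal $\operatorname{sign}(b_n)$ up to a cutoff and vanish afterwards (so that $a \in C_0(\mathcal{N})$ and $\|a\| \le 1$). The values $\ell(a)$ increase toward $\sum |b_n|$ as the cutoff grows.

```python
# Real-scalar sketch of Example 9.17(a); the weights b_n below are a hypothetical choice.

def ell(a, b):
    """The functional ell(a) = sum_n a_n * b_n, truncated to the stored terms."""
    return sum(an * bn for an, bn in zip(a, b))


def l1_norm(b):
    """The l^1 norm sum_n |b_n| of the stored terms."""
    return sum(abs(bn) for bn in b)


def sign_sequence(b, cutoff):
    """a_n = sign(b_n) for the first `cutoff` indices and 0 afterwards.

    Such a sequence is finitely supported, so it tends to 0 and has sup norm <= 1.
    """
    signs = []
    for index, bn in enumerate(b):
        if index < cutoff and bn != 0:
            signs.append(1.0 if bn > 0 else -1.0)
        else:
            signs.append(0.0)
    return signs


cutoffs = (1, 5, 10, 40)
b = [(-1) ** n / 2 ** n for n in range(1, 41)]  # b_n = (-1)^n / 2^n, n = 1, ..., 40
values = [ell(sign_sequence(b, cutoff), b) for cutoff in cutoffs]
norm_b = l1_norm(b)

for cutoff, value in zip(cutoffs, values):
    print(f"cutoff={cutoff:2d}  ell(a)={value:.12f}  ||b||_1={norm_b:.12f}")
```

No test sequence of norm at most 1 can give a value above $\sum |b_n|$, and the sign sequences approach it, which is the content of $\|\ell\|_* = \sum_{n=1}^\infty |b_n|$ in this truncated setting.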
EXERCISES 9.6
9.84	Verify the uniqueness assertion in Theorem 9.15.
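9.85	Let $\Omega$ be a locally compact, noncompact Hausdorff space and $\Omega^* = \Omega \cup \{\infty\}$
its one-point compactification.
a)	Show that $\mathcal{B}(\Omega) \subset \mathcal{B}(\Omega^*)$.
b)	Show that $\mu \in M(\Omega)$ if and only if there exists $\mu^* \in M(\Omega^*)$ such that
$\mu^*(B) = \mu(B)$ for all $B \in \mathcal{B}(\Omega)$.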
9.86	Verify the assertions in Theorem 9.16 that we did not prove.
9.87	Refer to Exercises 7.149 and 7.150 on page 475. Let $\Omega$ be a compact Hausdorff
space, $g$ a lower-semicontinuous function on $\Omega$, and $\mu \in M_+(\Omega)$. Prove
that $\int_\Omega g\,d\mu = \sup\{\, \int_\Omega f\,d\mu : f \in C(\Omega) \text{ and } f \le g \,\}$.
9.88	Let $\Omega$ and $\Lambda$ be compact Hausdorff spaces, $\mu \in M(\Omega)$, and $G\colon \Omega \to \Lambda$ be
continuous.
a)	Show that there is a measure $\nu \in M(\Lambda)$ such that $\int_\Lambda f\,d\nu = \int_\Omega f \circ G\,d\mu$
for all $f \in C(\Lambda)$.
b)	Verify that $\nu = \mu \circ G^{-1}$, the measure induced by $\mu$ and $G$.
9.89	Define the linear functional $\ell$ on $C([0,1] \times [0,1])$ by $\ell(f) = \int_0^1 f(x,x)\,dx$.
Describe explicitly the measure $\mu$ that satisfies $\ell(f) = \int_{[0,1]\times[0,1]} f\,d\mu$. Hint:
Refer to Exercise 9.88.
9.90	In Exercise 4.158 on page 256, we defined the convolution product of two
nonnegative $\sigma$-finite Borel measures on $\mathcal{R}$. An alternative definition that
holds for any two (complex) Borel measures on $\mathcal{R}$ is given as follows. For
$\mu, \nu \in M(\mathcal{R})$, define the convolution product of $\mu$ and $\nu$ to be the unique
measure $\mu * \nu \in M(\mathcal{R})$ satisfying
\[
\int_{\mathcal{R}} f\,d\mu * \nu = \int_{\mathcal{R}}\int_{\mathcal{R}} f(x + y)\,d\mu(x)\,d\nu(y), \qquad f \in C_0(\mathcal{R}).
\]
Show that for $\mu, \nu \in M_+(\mathcal{R})$, this definition agrees with the one given in
Exercise 4.158(d) on page 256.
9.91	Refer to Exercise 9.90. For $\mu \in M(\mathcal{R})$, find $\mu * \delta_0$.
9.92	Let $\Omega$ be a locally compact Hausdorff space and $\nu \in M_+(\Omega)$. Denote
by $AC(\nu)$ the collection of measures in $M(\Omega)$ that are absolutely continuous
with respect to $\nu$. Prove that $AC(\nu)$ is a closed subspace of $M(\Omega)$.
9.93	Refer to Exercise 9.92. Show that $\mathcal{L}^1(\nu)$ is isometrically isomorphic to $AC(\nu)$
via the correspondence $f \to \nu_f$, where $\nu_f(B) = \int_B f\,d\nu$.
Stefan Banach
(1892-1945)
Stefan Banach was born in Krakow, Poland,
on March 30, 1892, the son of a railway official.
His parents gave Banach to a woman who
gave him her name. Banach graduated from
high school in Krakow in 1910. He supported
himself by tutoring for the last three years.
Although Banach attended the University of
Lvov, he was awarded his doctorate in mathematics
in 1919 under the unusual circumstance of not completing a university
education. Banach's thesis, "Sur les opérations dans les ensembles
abstraits et leur application aux équations intégrales," was published in
Fundamenta Mathematicae in 1922.
A professor at the University of Lvov from 1927, Banach was also
a member of the Polish and Ukrainian Academies of Science. With his
friend H. Steinhaus he founded the journal Studia Mathematica. Through
his writings and through his students, many of whom, like S. Mazur,
W. Orlicz, J. Schauder, and S. Ulam, became notable researchers, Banach
exerted enormous influence on mathematics. In his classic monograph,
"Théorie des opérations linéaires," he laid the foundations of modern
functional analysis. He also made important contributions to the theory
of measure and integration, to orthogonal series, and to general topology.
During the Nazi occupation of Lvov (1941-1944), Banach was forced
to work in a German infectious disease institute where his health was
broken. He died in Lvov less than a year later on August 31, 1945.
10
Basic Theory of Normed
and Locally Convex Spaces
In this chapter, we will develop the basic theory of normed spaces. We
will also present results on locally convex spaces, spaces that include the
normed spaces as well as interesting spaces like $C(\Omega)$, where $\Omega$ is locally
compact but not compact.
Section 10.1 discusses the Hahn-Banach theorem and some of its most
important consequences. In Section 10.2, we investigate linear transfor-
mations of Banach spaces. Section 10.3 introduces locally convex spaces
and examines some of their fundamental properties. Section 10.4 discusses
locally convex topologies on normed spaces and their duals. And, in Sec-
tion 10.6, we present the Krein-Milman theorem, a result about compact
convex subsets of locally convex spaces.
10.1 THE HAHN-BANACH THEOREM
We begin by introducing notation for a normed space that is suggested
by the duality theory of Hilbert spaces: each bounded linear functional $\ell$
on a Hilbert space $\mathcal{H}$ is of the form $\ell(x) = \langle x, y\rangle$ for some $y \in \mathcal{H}$. (See
Theorem 9.8 on page 551.)
Let $(\Omega, \|\ \|)$ be a normed space. For $x \in \Omega$ and $x^* \in \Omega^*$, we define
\[
\langle x, x^*\rangle = x^*(x).
\]
And when $A \subset \Omega$, we let
\[
A^\perp = \{\, x^* \in \Omega^* : \langle x, x^*\rangle = 0 \text{ for all } x \in A \,\}.
\]
As the reader is asked to verify in Exercise 10.1, $A^\perp$ is a closed linear
subspace of $\Omega^*$.
The notation we have just introduced and Theorem 9.3 on page 542
suggest the following conjecture:
Let $K$ be a closed linear subspace of the normed space $\Omega$ and let
$x \in \Omega$. Then there is an $x_0^* \in K^\perp$ such that $\|x_0^*\|_* \le 1$ and
\[
\begin{aligned}
\rho(x, K) &= \inf\{\, \|x - y\| : y \in K \,\}\\
&= \sup\{\, |\langle x, x^*\rangle| : x^* \in K^\perp \text{ and } \|x^*\|_* \le 1 \,\} = \langle x, x_0^*\rangle.
\end{aligned}
\]
Verifying this conjecture depends on being able to extend a linear functional
on $\operatorname{span}(\{x\} \cup K)$ to all of $\Omega$ without increasing its norm. This requires
a fundamental result that is the main topic of this section, namely, the
Hahn-Banach theorem.
THEOREM 10.1 Hahn-Banach Theorem
Let $V$ be a linear space with real scalars and $p$ a real-valued function on $V$
such that
\[
p(u + v) \le p(u) + p(v), \qquad u, v \in V
\]
and
\[
p(\alpha v) = \alpha p(v), \qquad v \in V,\ \alpha \ge 0.
\]
Suppose $V_0$ is a linear subspace of $V$ and $\ell_0$ is a linear functional on $V_0$
such that
\[
\ell_0(v) \le p(v), \qquad v \in V_0.
\]
Then there exists a linear functional $\ell$ on $V$ such that $\ell(v) = \ell_0(v)$ for each
$v \in V_0$ and $\ell(u) \le p(u)$ for each $u \in V$.
PROOF: If $V_0 = V$, there is nothing to prove. So, assume $V_0$ is a proper
subspace of $V$. We begin by enlarging only slightly the domain of the
functional $\ell_0$. Let $v_1 \in V_0^c$ and consider the linear subspace
\[
V_1 = \{\, \alpha v_1 + v : \alpha \in \mathcal{R},\ v \in V_0 \,\}.
\]
To define a linear functional $\ell_1$ on $V_1$ that agrees with $\ell_0$ on $V_0$ and satisfies
\[
\ell_1(\alpha v_1 + v) \le p(\alpha v_1 + v), \qquad \alpha \in \mathcal{R},\ v \in V_0, \tag{10.1}
\]
it suffices to assign a value $\beta$ to $\ell_1(v_1)$ such that
\[
\alpha\beta \le p(\alpha v_1 + v) - \ell_0(v), \qquad \alpha \in \mathcal{R},\ v \in V_0. \tag{10.2}
\]
Indeed, if we can find such a $\beta$, then the mapping $\ell_1\colon V_1 \to \mathcal{R}$ defined by
$\ell_1(\alpha v_1 + v) = \alpha\beta + \ell_0(v)$ will give the required extension, as the reader can
easily verify.
If $\alpha = 0$, then, by hypothesis, (10.2) holds for any choice of $\beta$. If
$\alpha > 0$, then (10.2) holds if and only if
\[
\beta \le \alpha^{-1}p(\alpha v_1 + v) - \alpha^{-1}\ell_0(v) = p(v_1 + \alpha^{-1}v) - \ell_0(\alpha^{-1}v)
\]
for each $v \in V_0$.
As $v$ varies over all of $V_0$, so does $\alpha^{-1}v$. Hence (10.2) holds for $\alpha > 0$
if and only if
\[
\beta \le \inf\{\, p(v_1 + u) - \ell_0(u) : u \in V_0 \,\}.
\]
Similarly, if $\alpha < 0$, then (10.2) holds if and only if
\[
-p(-v_1 - \alpha^{-1}v) + \ell_0(-\alpha^{-1}v) = \alpha^{-1}p(\alpha v_1 + v) - \alpha^{-1}\ell_0(v) \le \beta
\]
for each $v \in V_0$. Hence (10.2) holds for $\alpha < 0$ if and only if
\[
\sup\{\, -p(-v_1 + w) + \ell_0(w) : w \in V_0 \,\} \le \beta.
\]
It follows that we can choose a suitable value for $\ell_1(v_1)$ if
\[
\sup\{\, -p(-v_1 + w) + \ell_0(w) : w \in V_0 \,\} \le \inf\{\, p(v_1 + u) - \ell_0(u) : u \in V_0 \,\}. \tag{10.3}
\]
We will now verify (10.3). For $u, w \in V_0$ we have
\[
\begin{aligned}
\ell_0(u) + \ell_0(w) = \ell_0(u + w) &\le p(u + v_1 - v_1 + w)\\
&\le p(u + v_1) + p(-v_1 + w).
\end{aligned}
\]
Thus,
\[
-p(-v_1 + w) + \ell_0(w) \le p(v_1 + u) - \ell_0(u).
\]
Since $u$ and $w$ are arbitrary members of $V_0$, it follows that (10.3) is valid.†
Consider the collection $\mathcal{E}$ of pairs of the form $(L, W)$, where $W$ is a
linear subspace of $V$ containing $V_0$ and $L$ is a linear functional on $W$ that
agrees with $\ell_0$ on $V_0$ and satisfies $L(w) \le p(w)$ for each $w \in W$. We define
an order relation $\prec$ on $\mathcal{E}$ by $(L_1, W_1) \prec (L_2, W_2)$ if $W_1 \subset W_2$ and $L_2$ agrees
with $L_1$ on $W_1$.
As the reader is asked to verify in Exercise 10.2, $\prec$ is a partial ordering
and each chain in $\mathcal{E}$ has an upper bound. It follows from Zorn's
lemma (page 17) that $\mathcal{E}$ has a maximal element $(\ell_\omega, V_\omega)$ with respect to
the ordering $\prec$.
To complete the proof of the theorem, it suffices to show that $V_\omega = V$.
Suppose to the contrary that $V_\omega^c \ne \emptyset$ and let $v_\omega \in V_\omega^c$. Then we can apply
the argument used in the beginning of the proof with $V_\omega$ replacing $V_0$,
$\ell_\omega$ replacing $\ell_0$, and $v_\omega$ replacing $v_1$ to obtain a linear functional $L_\omega$ on
$\operatorname{span}(V_\omega \cup \{v_\omega\})$ such that $(L_\omega, \operatorname{span}(V_\omega \cup \{v_\omega\})) \in \mathcal{E}$ and
\[
(\ell_\omega, V_\omega) \prec (L_\omega, \operatorname{span}(V_\omega \cup \{v_\omega\})).
\]
Thus, we have reached a contradiction to the maximality of $(\ell_\omega, V_\omega)$. It
follows that $V_\omega = V$.
† Having succeeded in finding a suitable extension of the linear functional $\ell_0$ to the
subspace $V_1 = \operatorname{span}(V_0 \cup \{v_1\})$, we could now proceed inductively to find a sequence of
subspaces of the form $V_n = \operatorname{span}(V_0 \cup \{v_1, v_2, \ldots, v_n\})$ and corresponding linear
functionals $\ell_n$ that extend $\ell_0$ and satisfy $\ell_n(u) \le p(u)$, in the hope that the $V_n$s would
exhaust $V$. That this approach cannot work in general can be seen by considering a
space $V$ that is not the span of a countable set. Nevertheless, as the argument in the
proof shows, this idea becomes effective if we replace the inductive procedure by
"transfinite induction" based on Zorn's lemma.
EXAMPLE 10.1 Illustrates the Hahn-Banach Theorem
Let $\ell^\infty_{\mathcal{R}}(\mathcal{N})$ or, more briefly, $\ell^\infty_{\mathcal{R}}$, denote the real linear space consisting of
all bounded sequences of real numbers and set
\[
E = \{\, x \in \ell^\infty_{\mathcal{R}} : \lim_{n\to\infty} x_n \text{ exists} \,\}.
\]
Consider the linear functional $L_0$ on $E$ defined by $L_0(x) = \lim_{n\to\infty} x_n$. We
will use the Hahn-Banach theorem to extend $L_0$ to all of $\ell^\infty_{\mathcal{R}}$.
Define $p\colon \ell^\infty_{\mathcal{R}} \to \mathcal{R}$ by $p(x) = \limsup_{n\to\infty} n^{-1}\sum_{k=1}^n x_k$. Then $p$ satisfies
the hypotheses of the Hahn-Banach theorem and, according to part (b)
of Exercise 2.35 on page 56, $p(x) = L_0(x)$ for all $x \in E$. It follows that
there is a linear functional $L$ on $\ell^\infty_{\mathcal{R}}$ that agrees with $L_0$ on $E$ and satisfies
$L(x) \le p(x)$ for all $x \in \ell^\infty_{\mathcal{R}}$. The functional $L$ shares with $L_0$ the following
properties:
\[
\liminf_{n\to\infty} x_n \le L(x) \le \limsup_{n\to\infty} x_n \quad\text{and}\quad L(x) = L(\tilde x), \tag{10.4}
\]
where $\tilde x_n = x_{n+1}$ for each $n$. (See Exercise 10.3.) For this reason $L$ can be
thought of as assigning a "generalized limit" to any bounded sequence of
real numbers. Linear functionals on $\ell^\infty_{\mathcal{R}}$ satisfying (10.4) are called Banach
limits.	□
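The sublinear function $p$ of Example 10.1 is built from Cesàro means, and its behavior can be explored numerically. The sketch below is only an illustration under stated assumptions: it computes the finite mean $n^{-1}\sum_{k=1}^n x_k$ for one large $n$ (a stand-in for the limsup), and the two test sequences are hypothetical choices. A Banach limit itself comes from Zorn's lemma and is not computed here.

```python
# Finite Cesaro means as a numerical stand-in for p(x) = limsup n^{-1} sum_{k<=n} x_k.

def cesaro_mean(x, n):
    """Return n^{-1} * (x(1) + ... + x(n)) for a sequence given as a function of k >= 1."""
    return sum(x(k) for k in range(1, n + 1)) / n


def shift(x):
    """Return the shifted sequence k -> x(k + 1)."""
    return lambda k: x(k + 1)


def alternating(k):
    """Bounded but not convergent: -1, 1, -1, 1, ..."""
    return (-1.0) ** k


def convergent(k):
    """Converges to 1."""
    return 1.0 + 1.0 / k


n = 100_000
m_alt = cesaro_mean(alternating, n)
m_alt_shift = cesaro_mean(shift(alternating), n)
m_conv = cesaro_mean(convergent, n)

print(f"alternating:         {m_alt:.6f}")
print(f"alternating shifted: {m_alt_shift:.6f}")
print(f"convergent:          {m_conv:.6f}")
```

For the alternating sequence $x$, the Cesàro means tend to 0, so $p(x) = p(-x) = 0$; the inequalities $L(x) \le p(x)$ and $-L(x) = L(-x) \le p(-x)$ then force $L(x) = 0$ for any extension $L$ obtained as in the example, even though $\lim x_n$ does not exist.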
Next we present a version of the Hahn-Banach theorem that is valid
in the case of complex scalars.
THEOREM 10.2 Hahn-Banach Theorem (Complex Version)
Let $V$ be a linear space with complex scalars and $p$ a real-valued function
on $V$ such that
\[
p(u + v) \le p(u) + p(v), \qquad u, v \in V
\]
and
\[
p(\alpha v) = |\alpha|p(v), \qquad v \in V,\ \alpha \in \mathbb{C}.
\]
Suppose $V_0$ is a linear subspace of $V$ and $\ell_0$ is a linear functional on $V_0$
such that
\[
|\ell_0(v)| \le p(v), \qquad v \in V_0.
\]
Then there exists a linear functional $\ell$ on $V$ such that $\ell(v) = \ell_0(v)$ for each
$v \in V_0$ and $|\ell(u)| \le p(u)$ for each $u \in V$.
PROOF: As $\mathcal{R} \subset \mathbb{C}$, it follows that $V$ and $V_0$ are also linear spaces with
respect to $\mathcal{R}$. Furthermore, $\Re\ell_0$ is linear with respect to real scalars and
satisfies $\Re\ell_0(v) \le |\ell_0(v)| \le p(v)$ for each $v \in V_0$. Hence, we can apply
Theorem 10.1 to obtain a function $\ell_r$ on $V$ that is linear with respect to
real scalars and satisfies $\ell_r(v) = \Re\ell_0(v)$ for each $v \in V_0$ and $\ell_r(u) \le p(u)$
for each $u \in V$.
We note that $\Re\ell_0(iv) = -\Im\ell_0(v)$ and, so, $\ell_0(v) = \Re\ell_0(v) - i\Re\ell_0(iv)$.
Thus, the function $\ell$ defined on $V$ by $\ell(u) = \ell_r(u) - i\ell_r(iu)$ agrees with $\ell_0$
on $V_0$ and it is easy to see that $\ell$ is linear with respect to complex scalars.
We will complete the proof by showing that $|\ell(u)| \le p(u)$ for each
$u \in V$. Choosing a complex number $\alpha$ such that $|\alpha| = 1$ and $\alpha\ell(u) = |\ell(u)|$,
we have
\[
|\ell(u)| = \ell(\alpha u) = \ell_r(\alpha u) \le p(\alpha u) = |\alpha|p(u) = p(u),
\]
as required.
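The two algebraic facts used in this proof, namely that a complex-linear functional is recovered from its real part via $\ell(u) = \ell_r(u) - i\ell_r(iu)$ and that a unimodular $\alpha$ rotates $\ell(u)$ onto $|\ell(u)|$, can be checked numerically. The sketch below is an illustration only: it takes $V = \mathbb{C}^4$ and a functional given by a hypothetical random coefficient vector `c`; none of these names come from the text.

```python
import random

# Sketch on V = C^4 with a hypothetical functional ell(u) = sum_k c_k * u_k.

def ell(u, c):
    """A complex-linear functional on C^n with coefficient vector c."""
    return sum(ck * uk for ck, uk in zip(c, u))


def ell_r(u, c):
    """The real part of ell, which is linear with respect to real scalars."""
    return ell(u, c).real


def rebuilt(u, c):
    """Reconstruct ell from its real part: ell_r(u) - i * ell_r(i u)."""
    iu = [1j * uk for uk in u]
    return ell_r(u, c) - 1j * ell_r(iu, c)


random.seed(0)
c = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
u = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]

# Gap between ell(u) and the value rebuilt from the real part alone.
gap = abs(ell(u, c) - rebuilt(u, c))

# Rotation step: pick alpha with |alpha| = 1 and alpha * ell(u) = |ell(u)|.
value = ell(u, c)
alpha = value.conjugate() / abs(value) if value != 0 else 1.0
rotated = [alpha * uk for uk in u]
rotation_gap = abs(ell_r(rotated, c) - abs(value))

print(f"reconstruction gap: {gap:.2e}")
print(f"rotation gap:       {rotation_gap:.2e}")
```

Both gaps are at the level of floating-point rounding, which matches the identities $\ell = \ell_r - i\ell_r(i\,\cdot)$ and $|\ell(u)| = \ell_r(\alpha u)$ used above.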
COROLLARY 10.1
Let $S$ be a linear subspace of the normed space $\Omega$ and let $\ell \in S^*$. Then
there exists an $L \in \Omega^*$ such that $L|_S = \ell$ and $\|L\|_* = \|\ell\|_*$.
PROOF: Apply Theorem 10.2 with $p(x) = \|\ell\|_*\|x\|$.
Armed with Theorem 10.2, we can now prove the conjecture made
on page 580 in the paragraph prior to the statement of the Hahn-Banach
theorem.
THEOREM 10.3
Let $K$ be a closed linear subspace of the normed space $\Omega$ and let $x \in \Omega$.
Then there is an $x_0^* \in K^\perp$ such that $\|x_0^*\|_* \le 1$ and
\[
\begin{aligned}
\rho(x, K) &= \inf\{\, \|x - y\| : y \in K \,\}\\
&= \sup\{\, |\langle x, x^*\rangle| : x^* \in K^\perp \text{ and } \|x^*\|_* \le 1 \,\} = \langle x, x_0^*\rangle.
\end{aligned}
\]
PROOF: We will prove the theorem under the additional assumption that
$x \in K^c$, leaving to the reader the case $x \in K$ as Exercise 10.4.
Let $V_0 = \{\, \alpha x + y : \alpha \in \mathbb{C},\ y \in K \,\}$ and $\ell_0$ the linear functional defined
on $V_0$ by $\ell_0(\alpha x + y) = \alpha\rho(x, K)$. Then $|\ell_0(\alpha x + y)| \le \|\alpha x + y\|$. Thus,
$\ell_0$ satisfies the hypotheses of Theorem 10.2 with $p = \|\ \|$. It follows that
there is a linear functional $x_0^*$ on $\Omega$ satisfying
\[
x_0^*(\alpha x + y) = \alpha\rho(x, K), \qquad \alpha \in \mathbb{C},\ y \in K \tag{10.5}
\]
and
\[
|x_0^*(z)| \le \|z\|, \qquad z \in \Omega. \tag{10.6}
\]
The relations (10.5) and (10.6) show that we have $x_0^* \in K^\perp$, $\|x_0^*\|_* \le 1$,
and $x_0^*(x) = \rho(x, K)$.
Finally, let $x^* \in K^\perp$ with $\|x^*\|_* \le 1$. If $y \in K$, then we have
\[
|\langle x, x^*\rangle| = |x^*(x) - x^*(y)| \le \|x^*\|_*\|x - y\| \le \|x - y\|.
\]
Taking the infimum over $y \in K$, we get that $|\langle x, x^*\rangle| \le \rho(x, K)$. Therefore,
we have
\[
\sup\{\, |\langle x, x^*\rangle| : x^* \in K^\perp \text{ and } \|x^*\|_* \le 1 \,\} \le \rho(x, K) = \langle x, x_0^*\rangle.
\]
The reverse inequality is trivial.
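A small finite-dimensional instance makes the duality formula of Theorem 10.3 concrete. The sketch below is an illustration under stated assumptions, not taken from the text: it uses real scalars, the space $\mathcal{R}^2$ with the norm $\|v\| = |v_1| + |v_2|$ (whose dual norm is the maximum of the absolute values of the coefficients), the subspace $K = \{(t, t)\}$, and a hypothetical point $x = (3, -1.5)$. The annihilator $K^\perp$ consists of the functionals $v \mapsto c(v_1 - v_2)$, with dual norm $|c|$.

```python
from math import inf

# Real-scalar sketch in R^2 with the l^1 norm; K is the diagonal {(t, t)}.

def l1_norm(v):
    """The norm ||v|| = |v_1| + |v_2|."""
    return abs(v[0]) + abs(v[1])


def dist_to_diagonal(x, span=10.0, steps=20001):
    """Brute-force estimate of inf{ ||x - (t, t)|| : t real } over a grid of t values."""
    best = inf
    for i in range(steps):
        t = -span + 2.0 * span * i / (steps - 1)
        best = min(best, l1_norm((x[0] - t, x[1] - t)))
    return best


def pairing(v, c):
    """The functional v -> c * (v_1 - v_2); it vanishes on K and has dual norm |c|."""
    return c * (v[0] - v[1])


x = (3.0, -1.5)  # hypothetical test point
rho = dist_to_diagonal(x)
sup_value = max(abs(pairing(x, c)) for c in (-1.0, -0.5, 0.0, 0.5, 1.0))

print(f"distance to K:             {rho:.6f}")
print(f"sup over unit ball of K^⊥: {sup_value:.6f}")
```

Both quantities equal $|x_1 - x_2| = 4.5$, and the supremum is attained at $c = 1$, which plays the role of $x_0^*$.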
As our first application of Theorem 10.3, we use it to prove an attrac-
tive and useful symmetry between the norms || || and || ||*.
COROLLARY 10.2
Let $(\Omega, \|\ \|)$ be a normed space, $x_0 \in \Omega$, and $x_0^* \in \Omega^*$. Then
\[
\|x_0\| = \sup\{\, |\langle x_0, x^*\rangle| : \|x^*\|_* \le 1 \,\} \tag{10.7}
\]
and
\[
\|x_0^*\|_* = \sup\{\, |\langle x, x_0^*\rangle| : \|x\| \le 1 \,\}, \tag{10.8}
\]
and, moreover, the supremum in (10.7) is attained. In particular, if $x_0 \ne 0$,
then there is an $x^* \in \Omega^*$ having norm 1 such that $\|x_0\| = \langle x_0, x^*\rangle$.
PROOF: Equation (10.8) is just the definition of the norm of the bounded
linear functional $x_0^*$. To obtain (10.7), we apply Theorem 10.3 with $x = x_0$
and $K = \{0\}$ to obtain an $x_1^* \in \Omega^*$ having the properties $\|x_1^*\|_* \le 1$ and
$x_1^*(x_0) = \rho(x_0, \{0\}) = \|x_0\|$. It follows that
\[
\|x_0\| \le \sup\{\, |\langle x_0, x^*\rangle| : \|x^*\|_* \le 1 \,\}. \tag{10.9}
\]
On the other hand, we have $|\langle x_0, x^*\rangle| \le \|x_0\|$ whenever $\|x^*\|_* \le 1$. Consequently,
the reverse of (10.9) is also valid. Finally, if $x_0 \ne 0$, then since
$\langle x_0/\|x_0\|, x_1^*\rangle = 1$, we see that $\|x_1^*\|_* = 1$.
EXAMPLE 10.2 Illustrates Theorem 10.3
Let $\Omega$ be a compact Hausdorff space and $F$ a nonempty closed subset of $\Omega$.
We consider the linear subspace of $C(\Omega)$ defined by
\[
\mathcal{I}_F = \{\, f \in C(\Omega) : f(x) = 0 \text{ for each } x \in F \,\}.
\]
Theorem 9.15 on page 574 shows that we can identify $C(\Omega)^*$ with the
space $M(\Omega)$ of regular Borel measures on $\Omega$.
We claim that
\[
\mathcal{I}_F^\perp = \{\, \mu \in M(\Omega) : |\mu|(F^c) = 0 \,\}. \tag{10.10}
\]
That $\mathcal{I}_F^\perp$ contains the right-hand side of (10.10) follows easily from the
equation $\int_\Omega f\,d\mu = \int_F f\,d\mu + \int_{F^c} f\,d\mu$.
To show that $\mathcal{I}_F^\perp$ is contained in the right-hand side of (10.10), it
suffices to prove that $\mu(E) = 0$ whenever $\mu \in \mathcal{I}_F^\perp$ and $E$ is a closed subset
of $F^c$. Let $\epsilon > 0$ be given. By the regularity of the measure $\mu$ there is an
open set $O$ such that $E \subset O$ and $|\mu|(O \setminus E) < \epsilon$. Furthermore, we can
choose $O$ so that $O \subset F^c$.
Next, we use Urysohn's lemma to obtain a continuous function $g$ such
that $0 \le g \le 1$, $g(E) = \{1\}$, and $g(O^c) = \{0\}$. Then $g \in \mathcal{I}_F$ and, hence,
\[
0 = \int_\Omega g\,d\mu = \mu(E) + \int_{O \setminus E} g\,d\mu.
\]
It follows that
\[
|\mu(E)| = \Bigl|\int_{O \setminus E} g\,d\mu\Bigr| \le |\mu|(O \setminus E) < \epsilon.
\]
Thus, $\mu(E) = 0$.
Having established (10.10), we can now use Theorem 10.3 to assert
that for each $f \in C(\Omega)$, there is a regular Borel measure $\mu_0$ such that
$|\mu_0|(\Omega) \le 1$, $|\mu_0|(F^c) = 0$, and
\[
\rho(f, \mathcal{I}_F) = \sup\Bigl\{\, \Bigl|\int_\Omega f\,d\mu\Bigr| : |\mu|(\Omega) \le 1 \text{ and } |\mu|(F^c) = 0 \,\Bigr\} = \int_\Omega f\,d\mu_0.
\]
(See Exercise 10.12.)
It is also clear from (10.10) that $\mathcal{I}_F^\perp$ can be identified with $M(F)$ since
it consists of measures in $M(\Omega)$ that vanish on Borel subsets of $F^c$. □
EXAMPLE 10.3 Illustrates Corollary 10.2
a)	Consider the space $\mathcal{L}^p(\mu)$, where $1 < p < \infty$, and let $q$ be defined by
$1/p + 1/q = 1$. Applying the Riesz representation theorem for $\mathcal{L}^p$-spaces
(Theorem 9.12 on page 559) and Corollary 10.2, we obtain the following
fact: If $f \in \mathcal{L}^p(\mu)$, then there is a function $g \in \mathcal{L}^q(\mu)$ such that $\|g\|_q \le 1$
and $\|f\|_p = \int_\Omega fg\,d\mu$.
b)	We can use Corollary 10.2 to show that Theorem 9.12 cannot be extended
to the case $p = \infty$. To do that, consider the space $\ell^\infty$ and
define $x = \{x_n\}_{n=1}^\infty \in \ell^\infty$ by $x_n = n/(n + 1)$. Note that $\|x\|_\infty = 1$. If we
could extend Theorem 9.12 to hold for $p = \infty$, then we could apply
Corollary 10.2 to find a $y \in \ell^1$ such that
\[
\sum_{n=1}^\infty |y_n| = \|y\|_1 = 1 = \|x\|_\infty = \sum_{n=1}^\infty x_ny_n.
\]
However, this is impossible because the quantity on the right-hand side
is in modulus strictly less than the quantity on the left.	□
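The strict inequality behind part (b) is easy to see numerically for any particular $y$. The sketch below is an illustration only: the choice $y_n = 2^{-n}$, truncated at 60 terms, is a hypothetical example of an element of $\ell^1$ with norm (numerically) 1; it is one instance, not a proof for all $y$.

```python
# Sketch for Example 10.3(b): x_n = n/(n+1) has sup norm 1, yet it never "norms" y.

N = 60  # truncation level (assumption: the tail beyond N is negligible here)
x = [n / (n + 1) for n in range(1, N + 1)]
y = [2.0 ** (-n) for n in range(1, N + 1)]  # hypothetical y in l^1 with ||y||_1 close to 1

norm_y = sum(abs(t) for t in y)
pairing = sum(a * b for a, b in zip(x, y))

print(f"||y||_1       = {norm_y:.12f}")
print(f"sum x_n * y_n = {pairing:.12f}")
```

Because every $x_n$ is strictly less than 1, each term $x_ny_n$ falls short of $|y_n|$ whenever $y_n \ne 0$, so the pairing stays strictly below $\|y\|_1$.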
More on Duality
As we mentioned at the beginning of this section, the notation $\langle x, x^*\rangle$ is
motivated by Hilbert space theory. Specifically, if $\mathcal{H}$ is a Hilbert space,
then each bounded linear functional on $\mathcal{H}$ is of the form $\ell_y(x) = \langle x, y\rangle$ for
some $y \in \mathcal{H}$ and, moreover, $\|\ell_y\|_* = \|y\|$.
The correspondence $j\colon \mathcal{H} \to \mathcal{H}^*$ defined by $j(y) = \ell_y$ is, therefore,
onto and isometric. It is also almost, but not quite, linear. Indeed, we
have $j(y + z) = j(y) + j(z)$ and $j(\alpha y) = \bar\alpha j(y)$. Due to these properties
of $j$, we can use it to identify $\mathcal{H}^*$ and $\mathcal{H}$. When this identification is made,
it becomes clear that the notation
\[
A^\perp = \{\, x^* \in \Omega^* : \langle x, x^*\rangle = 0 \text{ for all } x \in A \,\}
\]
has the same meaning for Hilbert spaces as the notation for the orthogonal
complement of a set introduced in Definition 9.7 on page 542.
When $\Omega$ is a normed space, but not a Hilbert space, and $A \subset \Omega$, then
$A^\perp$ no longer resides in $\Omega$ but, rather, in $\Omega^*$. Thus, while the notation
$(A^\perp)^\perp$ makes sense in Hilbert spaces, it does not in general normed spaces.
One way to generalize the notation for a double orthogonal complement
from inner product spaces to general normed spaces is to define, for
$B \subset \Omega^*$,
\[
{}^\perp B = \{\, x \in \Omega : \langle x, x^*\rangle = 0 \text{ for all } x^* \in B \,\}.
\]
Then an analogue of the notation $(A^\perp)^\perp$ that makes sense in arbitrary
normed spaces is ${}^\perp(A^\perp)$.
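THEOREM 10.4
Let $\Omega$ be a normed space, $A \subset \Omega$, and $B \subset \Omega^*$. Then the following hold:
a) ${}^\perp B$ is a closed linear subspace of $\Omega$.
b) ${}^\perp(A^\perp) = \overline{\operatorname{span} A}$.
c) If $A^\perp = \{0\}$, then $\operatorname{span} A$ is dense in $\Omega$.
PROOF: The proof of (a) is left to the reader as Exercise 10.11 and (c) follows
immediately from (b). Thus, we move on to the verification of (b).
From (a), we know that ${}^\perp(A^\perp)$ is a closed linear subspace of $\Omega$. Therefore,
because $A \subset {}^\perp(A^\perp)$, we have $\overline{\operatorname{span} A} \subset {}^\perp(A^\perp)$.
To prove the reverse inclusion, let $x \in {}^\perp(A^\perp)$. Applying Theorem 10.3
with $K = \overline{\operatorname{span} A}$, we conclude that there exists an $x^* \in (\overline{\operatorname{span} A})^\perp$ such that
$\rho(x, \overline{\operatorname{span} A}) = \langle x, x^*\rangle$. Because $(\overline{\operatorname{span} A})^\perp \subset A^\perp$, it follows that $\langle x, x^*\rangle = 0$.
Thus, $x \in \overline{\operatorname{span} A}$.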
EXAMPLE 10.4 Illustrates Theorem 10.4
In this example we make use of the following result from the theory of
analytic functions: Let $g$ be a function that is analytic on a connected
open subset $O$ of $\mathbb{C}$. If there is a sequence $\{b_n\}_{n=1}^\infty$ of distinct elements
of $O$ such that $b = \lim_{n\to\infty} b_n$ exists and belongs to $O$ and $g(b_n) = 0$ for
each $n \in \mathcal{N}$, then $g$ vanishes identically on all of $O$.
Consider the space $C([a, b])$, where $a < b$ and $0 \notin [a, b]$. For $\alpha \in \mathbb{C} \setminus [a, b]$,
define $f_\alpha(x) = 1/(x - \alpha)$ and let $\{\alpha_n\}_{n=1}^\infty$ be a sequence of distinct elements
of $\mathbb{C} \setminus [a, b]$ with $\lim_{n\to\infty} \alpha_n = 0$. We will prove that $\operatorname{span}\{\, f_{\alpha_n} : n \in \mathcal{N} \,\}$ is
dense in $C([a, b])$ by showing that $\{\, f_{\alpha_n} : n \in \mathcal{N} \,\}^\perp = \{0\}$.
Suppose $\mu \in \{\, f_{\alpha_n} : n \in \mathcal{N} \,\}^\perp$ and let
\[
g(z) = \int_{[a,b]} \frac{1}{x - z}\,d\mu(x).
\]
Because $g$ is analytic on the open connected set $\mathbb{C} \setminus [a, b]$ and $g(\alpha_n) = 0$ for
$n \in \mathcal{N}$, we conclude that $g$ vanishes identically on $\mathbb{C} \setminus [a, b]$. This, in turn,
implies that the $n$th derivative
\[
g^{(n)}(z) = n!\int_{[a,b]} \frac{1}{(x - z)^{n+1}}\,d\mu(x)
\]
vanishes identically on $\mathbb{C} \setminus [a, b]$. In particular, then, $\int_{[a,b]} x^{-n}\,d\mu(x) = 0$
for each $n \in \mathcal{N}$.
It follows from the complex version of the Stone-Weierstrass theorem
(page 522) that the span of the functions $1, x^{-1}, x^{-2}, \ldots$ is dense
in $C([a, b])$. From this we can conclude that $\int_{[a,b]} x^{-1}f(x)\,d\mu(x) = 0$ for
each $f \in C([a, b])$ and, consequently, $\mu = 0$. We leave the details for the
reader as Exercise 10.13.	□
EXERCISES 10.1
10.1	Let $A$ be a subset of the normed space $\Omega$. Show that $A^\perp$ is a closed linear
subspace of $\Omega^*$.
10.2	In the proof of Theorem 10.1, verify that $\prec$ is a partial ordering and that
each chain of $\mathcal{E}$ has a $\prec$-upper bound.
10.3	Verify (10.4) in Example 10.1. Hint: See Exercise 2.35 on page 56.
10.4	Prove Theorem 10.3 in the case where $x \in K$.
★10.5 For a normed space $\Omega$, let $\Omega^{**}$ denote the dual space of the dual space,
that is, $(\Omega^*)^*$. Let $J\colon \Omega \to \Omega^{**}$ be defined by $\langle x^*, J(x)\rangle = \langle x, x^*\rangle$. Show
that $J$ is a linear isometry.
10.6 Show that the mapping $J$ defined in Exercise 10.5 is onto if $\Omega$ is a Hilbert
space.
10.7 Show that the mapping $J$ defined in Exercise 10.5 is onto if $\Omega = \mathcal{L}^p(\mu)$,
where $1 < p < \infty$.
10.8 Show that the mapping $J$ defined in Exercise 10.5 is not onto if $\Omega$ =
10.9 Prove that $\Omega$ is a separable space if $\Omega^*$ is separable.
10.10 Show that the converse of the assertion of Exercise 10.9 is false.
10.11 Prove part (a) of Theorem 10.4.
10.12 Show that in Example 10.2, the measure $\mu_0$ can be chosen to be $\alpha\delta_x$ for
some $x \in F$ and some constant $\alpha$ with $|\alpha| = 1$.
10.13 Refer to Example 10.4.
a)	Show that the function $g$ is analytic on $\mathbb{C} \setminus [a, b]$.
b)	Verify the formula $g^{(n)}(z) = n!\int_{[a,b]} 1/(x - z)^{n+1}\,d\mu(x)$.
c)	Prove that the span of the functions $1, x^{-1}, x^{-2}, \ldots$ is dense in $C([a, b])$.
d)	Use part (c) and the fact that $\int_{[a,b]} x^{-n}\,d\mu(x) = 0$ for each $n \in \mathcal{N}$ to
show that $\int_{[a,b]} x^{-1}f(x)\,d\mu(x) = 0$ for all $f \in C([a, b])$.
e)	Use part (d) to conclude that $\int_B x^{-1}\,d\mu(x) = 0$ for each $B \in \mathcal{B}([a, b])$.
f) Deduce from part (e) that $\mu = 0$.
10.2 LINEAR OPERATORS ON BANACH SPACES
Linear operators (mappings) appear in most branches of mathematics, but
especially in analysis, where differentiation and integration are basic pro-
cesses. In this section we present some important general results about
continuous (bounded) linear operators on Banach spaces.
The Open Mapping Theorem
Let $T$ be a continuous function from a topological space $\Omega$ to a topological
space $\Lambda$. Then, by definition, the inverse image of an open set under $T$ is
open, that is, $T^{-1}(U)$ is open in $\Omega$ whenever $U$ is open in $\Lambda$.
On the other hand, as the reader is asked to verify in Exercise 10.14,
it is not generally true that the image of an open set under $T$ is open, that
is, that $T(O)$ is open in $\Lambda$ whenever $O$ is open in $\Omega$. However, if $\Omega$ and $\Lambda$
are Banach spaces and $T$ is linear, continuous, and onto, then our next
theorem, called the open mapping theorem, shows that $T$ does carry
open sets to open sets.
We will employ the following notation. Suppose $A$ and $B$ are subsets
of a linear space and $\alpha$ is a scalar. Then
\[
A + B = \{\, x + y : x \in A,\ y \in B \,\}
\]
and
\[
\alpha A = \{\, \alpha x : x \in A \,\}.
\]
Furthermore, we define $A - B = A + (-1)B$ and $x + B = \{x\} + B$.
Note that, in a normed space $\Omega$, we have
\[
B_r(x) = x + rB_1(0)
\]
for all $x \in \Omega$ and $r > 0$.
THEOREM 10.5 Open Mapping Theorem
Let Q and Л be Banach spaces and T: Q —► Л be continuous, linear, and
onto. Then T(O) is open in A whenever О is open in Q.
PROOF: We claim that it suffices to prove that there is an б > 0 such that
B6(0) C^B^O)).	(10.11)
10.2 Linear Operators on Banach Spaces □ 591
Indeed, suppose that (10.11) holds. If О G Q is open, then for each x G O,
there is a 6 > 0 such that x + <5Bx(0) = В$(х) С O. Hence, by (10.11),
B6c(T(x)) = T(x) + <5eBx(0) = ВД + <5B€(0)
G Т(ж) + <5T(Bx(0)) = T(x + <5Bx(0)) C T(O).
As T(x) is an arbitrary point of T(O), it follows that T(Q) is open.
Therefore, to establish the theorem, we need only show that (10.11) is
valid. As a first step, we will prove that there is an e > 0 such that
ве(о)сад/2(о)).	(Ю.12)
Since fi - UXi п-®1/г(0), we have
Л = T(fi) = (j nT(B1/2(0)).
n=l
It follows from the Baire category theorem (page 494) that there exist
m G A/", т/o € A, and a > 0 such that Ва(уо) G mT(BX/2(0)) and, conse-
quently, that 2/o/^ + ^a/m(0) С Т{Вг/2^)) • Because y$/m G
we also have —y^/m G (—l)T(Bx/2(0)) = T(BX/2(0)). Thus,
^a/2m(0) = (l/2)Ba/m(0)
= -2/o/2m + (1/2) (2/0/m + Ba/m(0))
G (1/2)T(Bx/2(0)) + (1/2)T(Bx/2(0))
GT((1/2)Bx/2(0)) + (1/2)Bx/2(0))
CT(Bx/2(0j).
(See Exercise 10.15.) We have now verified (10.12) with e = a/2m.
Finally, we will derive (10.11) from (10.12). Let у G Be(0). By (10.12)
we can find an zx G Bx/2(0) such that \\y — T(xx)|| < e/2. As
2/-T(xx) G (l/2)Be(0) G (1/2)T(Bx/2(0)) = T(Bx/4(0)),
it follows that there exists rr2 G Bx/4(0) such that \\y—T(rrx)—T(x2)|| < e/4.
Proceeding in this fashion and using mathematical induction, we obtain a
sequence {zn}^=x of elements of Q such that
||xn|| < 2 n and
n
J=1
< e/2n.
592 □ Chapter 10 Basic Theory of Normed and Locally Convex Spaces
Because $\Omega$ is a Banach space, Proposition 9.3 on page 532 implies that the
series $\sum_{n=1}^{\infty} x_n$ converges, say, to $x$. Noting that
$\|x\| \le \sum_{n=1}^{\infty} \|x_n\| < \sum_{n=1}^{\infty} 2^{-n} = 1$,
we conclude that $x \in B_1(0)$. By the continuity of $T$, we have $T(x) = y$
and, so, $y \in T(B_1(0))$. We have shown that $B_\epsilon(0) \subset T(B_1(0))$.
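The step "by the continuity of $T$, we have $T(x) = y$" can be spelled out as follows. This is only a sketch of the omitted computation; it uses nothing beyond the estimate on the partial sums obtained in the induction above.

```latex
% The partial sums x_1 + ... + x_n converge to x, and T is continuous and linear, so
\[
  \| y - T(x) \|
  = \lim_{n\to\infty} \Bigl\| y - T\Bigl(\sum_{j=1}^{n} x_j\Bigr) \Bigr\|
  = \lim_{n\to\infty} \Bigl\| y - \sum_{j=1}^{n} T(x_j) \Bigr\|
  \le \lim_{n\to\infty} \frac{\epsilon}{2^{n}} = 0 .
\]
```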
COROLLARY 10.3
Suppose that the operator $T$ satisfies the hypotheses of the open mapping
theorem and is also one-to-one. Then the following hold:
a)	$T^{-1}$ is linear and continuous.
b)	$|||T|||^{-1} \le |||T^{-1}|||$.
c)	We have
$|||T^{-1}|||^{-1}\,\|x\| \le \|T(x)\| \le |||T|||\,\|x\|$
for all $x \in \Omega$.
PROOF: It is easy to see that the inverse function $T^{-1}\colon \Lambda \to \Omega$ is linear.
Moreover, as $(T^{-1})^{-1}(O) = T(O)$ is open in $\Lambda$ whenever $O$ is open in $\Omega$,
$T^{-1}$ is continuous. Thus, (a) holds.
By (a), we know that $|||T^{-1}||| < \infty$. For each $y \in \Lambda$, we have
$\|y\| = \|T(T^{-1}(y))\| \le |||T|||\,\|T^{-1}(y)\| \le |||T|||\,|||T^{-1}|||\,\|y\|$,
from which (b) follows immediately.
To obtain (c), we need only prove the first inequality. We observe that,
for each $x \in \Omega$,
$\|x\| = \|T^{-1}(T(x))\| \le |||T^{-1}|||\,\|T(x)\|$,
as required.
EXAMPLE 10.5 Illustrates Corollary 10.3
It follows from Exercises 3.77 and 8.64 on pages 148 and 525, respectively,
that the Laplace transform $L$ defined by
$L(f)(s) = \int_0^{\infty} e^{-sx} f(x)\,dx$, $s \ge 0$,
is a one-to-one linear operator from the Banach space $(\mathcal{L}^1([0,\infty)), \|\ \|_1)$
into the Banach space $(C_0([0,\infty)), \|\ \|_{[0,\infty)})$. Because
$|L(f)(s)| \le \int_0^{\infty} e^{-sx}\,|f(x)|\,dx \le \int_0^{\infty} |f(x)|\,dx$,
we have $\|L(f)\|_{[0,\infty)} \le \|f\|_1$. It follows that $L$ is continuous and $|||L||| \le 1$.
We will use Corollary 10.3 to show that $L$ is not onto, that is, there
are functions in $C_0([0,\infty))$ that are not Laplace transforms of functions
in $\mathcal{L}^1([0,\infty))$. Suppose to the contrary that $L$ is onto. By Corollary 10.3(c),
there is a positive constant $c$ such that
$c\,\|f\|_1 \le \|L(f)\|_{[0,\infty)}$	(10.13)
for all $f \in \mathcal{L}^1([0,\infty))$. Let $n \in \mathcal{N}$ and define
$f_n = \chi_{[n,n+1)} - \chi_{[n+1,n+2)}$.
Then $\|f_n\|_1 = 2$. Moreover,
$L(f_n)(s) = e^{-ns}(e^{-2s} - 2e^{-s} + 1)/s = s e^{-ns}((e^{-s} - 1)/s)^2$.
It is easy to check that the maximum of $((e^{-s} - 1)/s)^2$ is 1 and that the
maximum of $s e^{-ns}$ is $1/ne$. Thus, using (10.13) we obtain $2c \le 1/ne$. As
$n \in \mathcal{N}$ was chosen arbitrarily, we conclude that $c = 0$, a contradiction. □
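The estimate in Example 10.5 can be checked numerically. The following is a minimal sketch, not part of the text: the function names `laplace_fn` and `sup_norm_estimate` are our own, the closed form is the one derived in the example, and the supremum is only approximated on a finite grid.

```python
import math

def laplace_fn(n: int, s: float) -> float:
    """Closed form of L(f_n)(s) for f_n = chi_[n,n+1) - chi_[n+1,n+2), valid for s > 0."""
    return math.exp(-n * s) * (1.0 - math.exp(-s)) ** 2 / s

def sup_norm_estimate(n: int, grid_points: int = 20000, s_max: float = 20.0) -> float:
    """Approximate the sup over s in (0, s_max] of |L(f_n)(s)| on a uniform grid."""
    step = s_max / grid_points
    return max(laplace_fn(n, k * step) for k in range(1, grid_points + 1))

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        bound = 1.0 / (n * math.e)  # maximum of s*exp(-n*s), attained at s = 1/n
        print(n, sup_norm_estimate(n), bound)
```

In each row the estimated sup-norm of $L(f_n)$ stays below $1/ne$ while $\|f_n\|_1 = 2$, so no positive constant $c$ can satisfy (10.13) for every $n$.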
One can get a better appreciation for the power of the open mapping
theorem by trying to explicitly construct a function in $C_0([0,\infty))$ that is
not a Laplace transform of a function in $\mathcal{L}^1([0,\infty))$. We leave that to
the reader and, instead, present another interesting corollary of the open
mapping theorem.
COROLLARY 10.4
Let $\Omega$ be a linear space. Suppose that $\|\ \|$ and $\|\ \|_0$ are norms on $\Omega$
such that $(\Omega, \|\ \|)$ and $(\Omega, \|\ \|_0)$ are Banach spaces. If there is a positive
constant $\alpha$ such that
$\|x\| \le \alpha\|x\|_0$, $x \in \Omega$,	(10.14)
then there is a positive constant $\beta$ such that
$\|x\|_0 \le \beta\|x\|$, $x \in \Omega$.	(10.15)
PROOF: From (10.14), we see that the identity map
$I\colon (\Omega, \|\ \|_0) \to (\Omega, \|\ \|)$
is continuous. The relation (10.15) now follows from Corollary 10.3.
We note that Corollary 10.4 shows that the topology of a Banach
space $(\Omega, \|\ \|)$ cannot be strictly weaker than the topology induced by a
norm $\|\ \|_0$ for which $(\Omega, \|\ \|_0)$ is complete. And, as the reader is asked
to verify in the exercises, Corollary 10.4 can also be used to prove that
any finite dimensional normed space is isomorphic to either $\mathbb{C}^n$ or $\mathcal{R}^n$ for
some $n \in \mathcal{N}$.
The Closed Graph Theorem
Another application of Corollary 10.4 provides a condition equivalent to
continuity for linear operators on Banach spaces. Let $\Omega$ and $\Lambda$ be normed
spaces and $T\colon \Omega \to \Lambda$ a linear operator. Then, as we know from Exercise 7.53
on page 437, $T$ is continuous if and only if it satisfies
$\lim_{n\to\infty} x_n = x \;\Rightarrow\; \lim_{n\to\infty} T(x_n) = T(x)$.
A weaker condition on $T$ is that
$\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} T(x_n) = y \;\Rightarrow\; y = T(x)$.	(10.16)
A linear operator satisfying (10.16) is said to be closed. We use that
terminology because (10.16) is equivalent to the condition that the graph
of $T$, that is, the set $\{\,(x, T(x)) : x \in \Omega\,\}$, is a closed subset of the product
space $\Omega \times \Lambda$.
In our next theorem, called the closed graph theorem, we will prove
that, when $\Omega$ and $\Lambda$ are Banach spaces, not only is being closed a necessary
condition for a linear operator to be continuous, but it is also sufficient.
First, however, we present an example showing that, in general, a closed
linear operator need not be continuous.
EXAMPLE 10.6 A Discontinuous Closed Linear Operator
Let $\Omega = \{\, f : f' \in C([0,1]) \,\}$ and $\Lambda = C([0,1])$, both equipped with the
sup-norm, and let $D\colon \Omega \to \Lambda$ be the differentiation operator, $D(f) = f'$.
Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of functions in $\Omega$ such that $f_n \to f$ and
$D(f_n) \to g$. It follows from these assumptions and the second fundamental
theorem of calculus that $f(x) = f(0) + \int_0^x g(t)\,dt$. This, in turn, implies
that $D(f) = g$ and hence that $D$ is closed. But $D$ is not continuous, as can
be seen by considering the sequence $f_n(x) = \sin(nx)/n$.	□
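To see the discontinuity of $D$ concretely, one can compare sup-norms for the sequence $f_n(x) = \sin(nx)/n$. The sketch below is illustrative only: the helper names are ours, and the sup-norm on $[0,1]$ is approximated by sampling a uniform grid.

```python
import math

def sup_norm(func, grid_points: int = 10001) -> float:
    """Approximate the sup-norm on [0, 1] by sampling a uniform grid."""
    return max(abs(func(k / (grid_points - 1))) for k in range(grid_points))

def f(n: int):
    """f_n(x) = sin(n x) / n."""
    return lambda x: math.sin(n * x) / n

def df(n: int):
    """D(f_n)(x) = cos(n x), the derivative of f_n."""
    return lambda x: math.cos(n * x)

for n in (1, 10, 100):
    # ||f_n|| is at most 1/n, whereas ||D(f_n)|| = 1 (attained at x = 0).
    print(n, sup_norm(f(n)), sup_norm(df(n)))
```

Thus $f_n \to 0$ in the sup-norm while $\|D(f_n)\|$ stays equal to 1, so $D(f_n)$ does not converge to $D(0) = 0$.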
THEOREM 10.6 Closed Graph Theorem
Let $\Omega$ and $\Lambda$ be Banach spaces and $T\colon \Omega \to \Lambda$ a linear operator. If $T$ is
closed, then it is continuous.
PROOF: We define a second norm on $\Omega$ by
$\|x\|_0 = \|x\| + \|T(x)\|$
and show that $(\Omega, \|\ \|_0)$ is a Banach space. Let $\{x_n\}_{n=1}^{\infty}$ be a Cauchy
sequence with respect to $\|\ \|_0$. Because $\|x_n - x_m\| \le \|x_n - x_m\|_0$ and
$\|T(x_n) - T(x_m)\| \le \|x_n - x_m\|_0$, it follows that $\{x_n\}_{n=1}^{\infty}$ and $\{T(x_n)\}_{n=1}^{\infty}$
are Cauchy sequences in $(\Omega, \|\ \|)$ and $\Lambda$, respectively. Hence, $x = \lim x_n$
and $y = \lim T(x_n)$ both exist. Because, by assumption, $T$ is closed, we
have $y = T(x)$. It follows that
$\lim_{n\to\infty} \|x_n - x\|_0 = \lim_{n\to\infty} \|x_n - x\| + \lim_{n\to\infty} \|T(x_n) - T(x)\| = 0$.
Thus, $(\Omega, \|\ \|_0)$ is a Banach space.
Since $\|x\| \le \|x\|_0$ for all $x \in \Omega$, it follows from Corollary 10.4 that
there is a positive constant $\beta$ such that $\|T(x)\| \le \|x\|_0 \le \beta\|x\|$ for all
$x \in \Omega$. Hence, $T$ is continuous.
The Uniform Boundedness Principle for Linear Operators
A look at the proof of the open mapping theorem reveals that it is a con-
sequence of the Baire category theorem. We conclude this section with
another application of the Baire category theorem to the theory of linear
operators on Banach spaces.
THEOREM 10.7 Uniform Boundedness Principle for Linear Operators
Suppose that $\mathcal{T}$ is a collection of continuous linear operators from a Banach
space $\Omega$ into a normed space $\Lambda$. If $\sup\{\, \|T(x)\| : T \in \mathcal{T} \,\} < \infty$ for each
$x \in \Omega$, then $\sup\{\, |||T||| : T \in \mathcal{T} \,\} < \infty$.
PROOF: By Theorem 8.2 on page 497, there exists an $x_0 \in \Omega$ and a $\delta > 0$
such that
$M = \sup\{\, \|T(x)\| : T \in \mathcal{T},\ \|x - x_0\| < \delta \,\} < \infty$.
If $\|u\| < 1$, then
$\|T(u)\| \le \delta^{-1}\bigl(\|T(x_0 + \delta u)\| + \|T(x_0)\|\bigr) \le 2M\delta^{-1}$.
It follows that $|||T||| \le 2M\delta^{-1}$ for each $T \in \mathcal{T}$.
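The displayed inequality in the preceding proof rests on a short computation that is worth recording. The following sketch only uses the linearity of $T$ and the definition of $M$.

```latex
% For \|u\| < 1 the point x_0 + \delta u satisfies \|(x_0 + \delta u) - x_0\| = \delta\|u\| < \delta,
% so both x_0 + \delta u and x_0 lie in the ball over which M is computed.
\[
  T(u) = \delta^{-1}\bigl( T(x_0 + \delta u) - T(x_0) \bigr)
  \quad\Longrightarrow\quad
  \|T(u)\| \le \delta^{-1}\bigl( \|T(x_0 + \delta u)\| + \|T(x_0)\| \bigr)
  \le \delta^{-1}(M + M) = 2M\delta^{-1}.
\]
```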
EXERCISES 10.2
10.14	Let $\Omega$ and $\Lambda$ be Banach spaces.
a)	Provide an example of a linear and continuous mapping from $\Omega$ into $\Lambda$
that does not take open sets to open sets.
b)	Provide an example of a continuous mapping from $\Omega$ onto $\Lambda$ that does
not take open sets to open sets.
10.15	Show that if $A$ and $B$ are subsets of a normed linear space and $\alpha$ is a
nonzero scalar, then $\overline{\alpha A} = \alpha\overline{A}$ and $\overline{x + A} = x + \overline{A}$, $\overline{A} + \overline{B} \subset \overline{A + B}$, and
$\overline{A} - \overline{B} \subset \overline{A - B}$.
10.16	Provide an example of a continuous linear operator $T\colon \Omega \to \Lambda$, where $\Omega$
and $\Lambda$ are normed spaces, such that $T$ is one-to-one and onto but $T^{-1}$ is
not continuous.
Exercises 10.17 and 10.18 show that every finite dimensional normed space is
isomorphic to either $\mathbb{C}^n$ or $\mathcal{R}^n$ for some $n \in \mathcal{N}$. Recall that the dimension of a
linear space is the number of elements in a Hamel basis.
10.17	Let $\Omega$ be a finite dimensional normed space.
a)	Show that the dimension of $\Omega^*$ is at most the dimension of $\Omega$. Hint:
What is the dimension of the space of all linear functionals on $\Omega$?
b)	Use Exercise 10.5 on page 589 and Proposition 9.2 on page 530 to
deduce that $\Omega$ is complete.
10.18	Let $\Omega$ be a finite dimensional normed space and $\{x_1, x_2, \ldots, x_n\}$ a Hamel
basis for $\Omega$. Recall that each $x \in \Omega$ can be written uniquely in the form
$x = \sum_{j=1}^{n} a_j(x) x_j$, where the $a_j(x)$s are scalars. For $x \in \Omega$, define
$p(x) = \bigl( \sum_{j=1}^{n} |a_j(x)|^2 \bigr)^{1/2}$.
a)	Show that $p$ is a norm on $\Omega$.
b)	Show that there exist positive constants $\alpha$ and $\beta$ such that for each
$x \in \Omega$, we have $\alpha p(x) \le \|x\| \le \beta p(x)$.
c)	Deduce that $\Omega$ is isomorphic to $\mathbb{C}^n$ or to $\mathcal{R}^n$ in the case of complex or
real scalars, respectively.
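10.19	Let $\Omega$ be a normed linear space such that $\overline{B_1(0)}$ is compact. Prove that
$\Omega$ is finite dimensional. Hint: Find $x_1^*, x_2^*, \ldots, x_n^* \in \Omega^*$ such that
$\{\, x : \|x\| = 1 \,\} \subset \bigcup_{j=1}^{n} \{\, x : |\langle x, x_j^* \rangle| > 1/2 \,\}$
and consider the mapping $L(x) = (\langle x, x_1^* \rangle, \langle x, x_2^* \rangle, \ldots, \langle x, x_n^* \rangle)$.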
In Exercises 10.20-10.22, we consider projections of a normed space. Let $\Omega$ be a
normed space. A linear operator $P\colon \Omega \to \Omega$ is called a projection if both the
range of $P$ and $P^{-1}(\{0\})$ are closed and $P \circ P = P$. Exercise 9.26 on page 544
shows that orthogonal projections, as defined in Section 9.2 on page 541, are
projections in the sense defined here.
10.20	Show that if $\Omega$ is a Banach space, then all projections of $\Omega$ are continuous.
10.21	Show that if $P$ is a projection on $\Omega$, then $\|x - P(x)\| \ge \rho(x, \text{range } P)$ for
each $x \in \Omega$.
10.22	Let $K$ be a finite dimensional subspace of $\Omega$. Show that there is a projection
of $\Omega$ with range equal to $K$.
Exercises 10.23-10.27 elaborate on Example 9.11 on page 547. Let $S_n$ denote the
linear operator defined on $C([0,2\pi])$ by $S_n(f) = s_n$.
10.23	Refer to the definition of a projection given in the paragraph prior to
Exercise 10.20. Show that $S_n$ is a projection of $C([0,2\pi])$ having range
given by $\mathcal{U}_n = \text{span}\{\, e_k : -n \le k \le n \,\}$.
10.24	Show that
$\|S_n(f) - f\|_{[0,2\pi]} \ge \rho(f, \mathcal{U}_n) \ge |f(2\pi) - f(0)|/2$.
Deduce that $f(2\pi) = f(0)$ is a necessary condition for the uniform convergence
of the Fourier series $\sum_{k=-\infty}^{\infty} \hat{f}(k) e^{ikx}$ to the function $f(x)$.
10.25	Show that for each $f \in C([0,2\pi])$,
$S_n(f)(x) = (2\pi)^{-1} \int_0^{2\pi} f(t)\, \frac{\sin((n + 1/2)(x - t))}{\sin((x - t)/2)}\, dt$.
10.26	Let $C_p([0,2\pi]) = \{\, f \in C([0,2\pi]) : f(0) = f(2\pi) \,\}$. Show that
$\sup\{\, |S_n(f)(x)| : f \in C_p([0,2\pi]),\ \|f\|_{[0,2\pi]} \le 1 \,\}$
$= (2\pi)^{-1} \int_0^{2\pi} \Bigl| \frac{\sin((n + 1/2)(x - t))}{\sin((x - t)/2)} \Bigr|\, dt$.
★ 10.27 Show that there is a function in $C_p([0,2\pi])$ whose Fourier series diverges
at some point.
10.3 TOPOLOGICAL LINEAR SPACES
Let $\Omega$ be a locally compact Hausdorff space. We recall from Section 7.12
(see page 483) that for $S \subset \Omega$,
$\rho_S(f, g) = \sup\{\, |f(x) - g(x)| : x \in S \,\}$, $f, g \in C(\Omega)$.
And we also recall (see page 484) that the weak topology on $C(\Omega)$ determined
by the family of functions $\{\, \rho_K(\cdot, g) : K \text{ compact},\ g \in C(\Omega) \,\}$ is
called the topology of uniform convergence on compact sets and is denoted
by $\mathcal{T}(\Omega)$.
As we know, if $\Omega$ is compact, then
$\|f\|_\Omega = \rho_\Omega(f, 0) = \sup\{\, |f(x)| : x \in \Omega \,\}$, $f \in C(\Omega)$,
is a norm on the linear space $C(\Omega)$ that induces the topology $\mathcal{T}(\Omega)$.
On the other hand, if $\Omega$ is not compact, then the topology $\mathcal{T}(\Omega)$ on
the linear space $C(\Omega)$ is not induced by a norm. Indeed, suppose to the
contrary that $\mathcal{T}(\Omega)$ is induced by a norm $\|\ \|$. Because $\{\, f : \|f\| < 1 \,\}$ is
an open set containing 0, it follows from Proposition 7.7 on page 428 that
there exist $n \in \mathcal{N}$, compact sets $K_1, K_2, \ldots, K_n$, and positive numbers
$\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ such that
$\bigcap_{j=1}^{n} \{\, f : \rho_{K_j}(f, 0) < \epsilon_j \,\} \subset \{\, f : \|f\| < 1 \,\}$.	(10.17)
Because $\Omega$ is not compact, there is an $x_0 \notin \bigcup_{j=1}^{n} K_j$. Applying Theorem 7.14
on page 477, we choose $g \in C(\Omega)$ such that $g(x_0) = 1$ and $g(x) = 0$
for $x \in \bigcup_{j=1}^{n} K_j$. It now follows from (10.17) that $|\alpha|\,\|g\| = \|\alpha g\| < 1$
for each $\alpha \in \mathbb{C}$. On the other hand, because $g \ne 0$, we have $\|g\| \ne 0$. Having
reached a contradiction, we conclude that there is no norm inducing
the topology $\mathcal{T}(\Omega)$.
Nevertheless, it is clear that the topology and the linear structure
on $C(\Omega)$ are related. In this section, we develop a theory of topological
linear spaces that encompasses not only normed spaces, but interesting
spaces like $(C(\Omega), \mathcal{T}(\Omega))$ as well.
DEFINITION 10.1 Topological Linear Space
Let $\Omega$ be a linear space with scalar field $F$ and let $\mathcal{T}$ be a topology
on $\Omega$. Then we say that $(\Omega, \mathcal{T})$ is a topological linear space if the
operations of addition and scalar multiplication are continuous, that
is, if the functions $A\colon \Omega \times \Omega \to \Omega$ and $M\colon F \times \Omega \to \Omega$ defined by
$A(x, y) = x + y$ and $M(\alpha, x) = \alpha x$
are continuous.
EXAMPLE 10.7 Illustrates Definition 10.1
a)	Any normed space is a topological linear space.
b)	If $\Omega$ is a locally compact Hausdorff space, then $(C(\Omega), \mathcal{T}(\Omega))$ is a topological
linear space, as the reader is asked to verify in Exercise 10.28.
c)	Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. For $0 < p < 1$, the space $\mathcal{L}^p(\mu)$ is
a topological linear space with respect to the topology induced by the
metric $\rho_p$, where
$\rho_p(f, g) = \int_\Omega |f - g|^p\, d\mu$, $f, g \in \mathcal{L}^p(\mu)$.
See Exercise 10.40.	□
Unless there is a danger of ambiguity, we will write $\Omega$ for the topological
linear space $(\Omega, \mathcal{T})$. In what follows, we assume that the scalar field
of $\Omega$ is $\mathbb{C}$ unless we state otherwise. Our results are easily adapted to the
case of real scalars.
The next two propositions provide some basic properties of topological
linear spaces. Note that the second one shows that the topology of a topological
linear space is determined by a neighborhood basis at 0. We leave
the proofs of both propositions to the reader as Exercises 10.29 and 10.30.
PROPOSITION 10.1
Let $\Omega$ be a topological linear space.
a)	Suppose that $y \in \Omega$ and $\alpha$ is a nonzero scalar. Then the mappings
$T(x) = x + y$ and $S(x) = \alpha x$ are homeomorphisms of $\Omega$ onto itself.
b)	Suppose that $U$ is an open subset of $\Omega$, $A \subset \Omega$, and $\alpha$ is a nonzero
scalar. Then $A + U$ and $\alpha U$ are open subsets of $\Omega$.
PROPOSITION 10.2
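Let $\Omega$ be a topological linear space. Then there is a collection $\mathcal{W}$ of open
sets containing 0 having the following properties:
a)	$W_1, W_2 \in \mathcal{W} \Rightarrow W_3 \subset W_1 \cap W_2$ for some $W_3 \in \mathcal{W}$.
b)	$W \in \mathcal{W}$ and $x \in W \Rightarrow$ there is a $W_1 \in \mathcal{W}$ such that $x + W_1 \subset W$.
c)	$W \in \mathcal{W} \Rightarrow$ there is a $W_1 \in \mathcal{W}$ such that $W_1 + W_1 \subset W$.
d)	$W \in \mathcal{W} \Rightarrow$ there is a $W_1 \in \mathcal{W}$ and an $\epsilon > 0$ such that $\alpha W_1 \subset W$
whenever $|\alpha| < \epsilon$.
e)	$\{\, x + W : x \in \Omega,\ W \in \mathcal{W} \,\}$ is a basis for the open sets of $\Omega$.
Conversely, if $\mathcal{W}$ is a collection of subsets of a linear space $\Omega$ satisfying
(a)-(d) and $0 \in W$ for each $W \in \mathcal{W}$, then $\{\, x + W : x \in \Omega,\ W \in \mathcal{W} \,\}$ is a
basis for a topology $\mathcal{T}$ on $\Omega$ and $(\Omega, \mathcal{T})$ is a topological linear space.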
Locally Convex Topological Linear Spaces and Seminorms
When $\Omega$ is a normed space, we can take the collection $\mathcal{W}$ in Proposition 10.2
to be $\{\, B_r(0) : r > 0 \,\}$. Thus, in the important case of a normed space, the
sets in $\mathcal{W}$ can be assumed convex. This convexity property turns out to
be the key to a significant generalization of normed spaces, called locally
convex topological linear spaces.
DEFINITION 10.2 Locally Convex Topological Linear Space
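A topological linear space $\Omega$ is said to be locally convex if there is a
collection $\mathcal{W}$ of convex open sets containing 0 such that
a)	$W_1, W_2 \in \mathcal{W} \Rightarrow W_3 \subset W_1 \cap W_2$ for some $W_3 \in \mathcal{W}$.
b)	$W \in \mathcal{W}$ and $x \in W \Rightarrow$ there is a $W_1 \in \mathcal{W}$ such that $x + W_1 \subset W$.
c)	$W \in \mathcal{W} \Rightarrow$ there is a $W_1 \in \mathcal{W}$ such that $W_1 + W_1 \subset W$.
d)	$W \in \mathcal{W} \Rightarrow$ there is a $W_1 \in \mathcal{W}$ and an $\epsilon > 0$ such that $\alpha W_1 \subset W$
whenever $|\alpha| < \epsilon$.
e)	$\{\, x + W : x \in \Omega,\ W \in \mathcal{W} \,\}$ is a basis for the open sets of $\Omega$.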
EXAMPLE 10.8 Illustrates Definition 10.2
a)	Any normed space is a locally convex topological linear space.
b)	As we will soon see, if $\Omega$ is a locally compact Hausdorff space, then
$(C(\Omega), \mathcal{T}(\Omega))$ is a locally convex topological linear space.
c)	Exercise 10.41 shows that the space in Example 10.7(c) is not a locally
convex topological linear space.	□
Locally convex topological linear spaces are often defined in terms of
collections of objects called seminorms. A seminorm has the defining prop-
erties of a norm except that the seminorm of a nonzero element may be 0.
DEFINITION 10.3 Seminorm
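Let $\Omega$ be a linear space having as its scalar field $F$ either $\mathcal{R}$ or $\mathbb{C}$. A
function $\sigma\colon \Omega \to \mathcal{R}$ is said to be a seminorm on $\Omega$ if it satisfies the
following conditions for all $x, y \in \Omega$ and $\alpha \in F$:
a)	$\sigma(x) \ge 0$.
b)	$\sigma(0) = 0$.
c)	$\sigma(\alpha x) = |\alpha|\sigma(x)$.
d)	$\sigma(x + y) \le \sigma(x) + \sigma(y)$.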
Remark: Although condition (b) follows from condition (c), we have in-
cluded the former to retain the resemblance of a seminorm to a norm.
Let $\sigma$ be a seminorm on a linear space $\Omega$. Then for each $x \in \Omega$ and
$r > 0$, we define
$B_r^\sigma(x) = \{\, y : \sigma(x - y) < r \,\}$.
It is important to note that, by the defining properties of a seminorm, sets
of the form $B_r^\sigma(x)$ are convex.
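For completeness, here is a sketch of the convexity argument. It uses only conditions (c) and (d) of Definition 10.3, applied to two points $y, z$ of the ball and a scalar $t$ with $0 \le t \le 1$.

```latex
% Let y, z \in B_r^\sigma(x) and 0 \le t \le 1. Writing x = t x + (1 - t) x, we get
\[
  \sigma\bigl(x - (t y + (1 - t) z)\bigr)
  = \sigma\bigl(t (x - y) + (1 - t)(x - z)\bigr)
  \le t\,\sigma(x - y) + (1 - t)\,\sigma(x - z)
  < t r + (1 - t) r = r ,
\]
% so t y + (1 - t) z \in B_r^\sigma(x).
```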
EXAMPLE 10.9 Illustrates Definition 10.3
a)	Any norm is a seminorm.
b)	Let $\Omega$ be a locally compact, noncompact, Hausdorff space and $K$ a compact
subset of $\Omega$. The function $\|\ \|_K$ defined by
$\|f\|_K = \rho_K(f, 0) = \sup\{\, |f(x)| : x \in K \,\}$, $f \in C(\Omega)$,
is an example of a seminorm that is not a norm.
c)	If $\ell$ is a linear functional on a linear space $V$, then $|\ell|$ is a seminorm
on $V$.	□
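Part (b) of the example can be illustrated numerically in the special case $\Omega = \mathcal{R}$ and $K = [0,1]$. This is a sketch under those assumptions; the names `seminorm_K`, `bump_outside_K`, and `identity` are ours, and the supremum over $K$ is approximated on a uniform grid.

```python
def seminorm_K(func, a: float = 0.0, b: float = 1.0, grid_points: int = 1001) -> float:
    """Approximate ||f||_K = sup{ |f(x)| : x in K } for K = [a, b] on a uniform grid."""
    last = grid_points - 1
    return max(abs(func(a + (b - a) * k / last)) for k in range(grid_points))

def bump_outside_K(x: float) -> float:
    """A continuous function on the real line that vanishes on [0, 1] but is not identically 0."""
    return max(0.0, x - 1.0)

def identity(x: float) -> float:
    return x

print(seminorm_K(bump_outside_K))  # the seminorm vanishes ...
print(bump_outside_K(2.0))         # ... although the function itself is nonzero
print(seminorm_K(identity))
```

The function `bump_outside_K` is a nonzero element of $C(\mathcal{R})$ whose seminorm $\|\ \|_K$ is 0, which is exactly what prevents $\|\ \|_K$ from being a norm.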
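Let $\mathcal{S}$ be a collection of seminorms defined on a linear space $\Omega$. For
$\sigma \in \mathcal{S}$ and $x \in \Omega$, define $\sigma_x$ by
$\sigma_x(y) = \sigma(x + y)$, $y \in \Omega$.
Then we define the topology induced by $\mathcal{S}$ to be the weak topology on $\Omega$
determined by the family of functions $\{\, \sigma_x : x \in \Omega,\ \sigma \in \mathcal{S} \,\}$.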
PROPOSITION 10.3
Let $\Omega$ be a linear space having the topology $\mathcal{T}_\mathcal{S}$ induced by a family of
seminorms $\mathcal{S}$ and let $\mathcal{W}$ denote the collection of subsets of $\Omega$ consisting of
intersections of finitely many sets of the form $B_r^\sigma(0)$. Then
a)	$\{\, x + W : x \in \Omega,\ W \in \mathcal{W} \,\}$ is a basis for $\mathcal{T}_\mathcal{S}$.
b)	$(\Omega, \mathcal{T}_\mathcal{S})$ is a locally convex topological linear space.
PROOF: It can be shown that $\mathcal{W}$ satisfies the conditions (a)-(d) of Proposition 10.2.
The details are left to the reader as Exercise 10.31.
EXAMPLE 10.10 Topologies Induced by a Collection of Seminorms
a)	Let $(\Omega, \|\ \|)$ be a normed space. Then the topology induced by the
single-element collection $\mathcal{S} = \{\|\ \|\}$ is the same as the topology induced
by the norm $\|\ \|$.
b)	Let $\Omega$ be a locally compact Hausdorff space. Then it follows from Example 10.9(b)
and Proposition 10.3 that $(C(\Omega), \mathcal{T}(\Omega))$ is a locally convex
topological linear space.	□
PROPOSITION 10.4
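Let $\Omega$ be a linear space with a topology induced by a collection of seminorms
$\mathcal{S}$. Then the following hold:
a)	$\Omega$ is Hausdorff if and only if $x \ne 0 \Rightarrow \sigma(x) \ne 0$ for some $\sigma \in \mathcal{S}$.
b)	Suppose that $\{x_\iota\}_{\iota \in I}$ is a net in $\Omega$. Then $\lim x_\iota = x$ if and only if
$\lim \sigma(x_\iota - x) = 0$ for each $\sigma \in \mathcal{S}$.
PROOF:
a)	Suppose $\Omega$ is Hausdorff and let $x \ne 0$. Then, by Proposition 10.3, there
is an $\epsilon > 0$ and a $\sigma \in \mathcal{S}$ such that $x \notin B_\epsilon^\sigma(0)$. Hence, $\sigma(x) \ge \epsilon$. To prove
the converse, let $x$ and $y$ be distinct elements of $\Omega$. Then, by assumption,
there is an $\epsilon > 0$ and $\sigma \in \mathcal{S}$ such that $\sigma(x - y) = \epsilon$. It follows that
$x + B_{\epsilon/2}^\sigma(0)$ and $y + B_{\epsilon/2}^\sigma(0)$ are disjoint open sets containing $x$ and $y$,
respectively. Hence, $(\Omega, \mathcal{T}_\mathcal{S})$ is Hausdorff.
b)	By Proposition 7.13 on page 444, we know that $\lim x_\iota = x$ if and only if
$\lim \sigma_y(x_\iota) = \sigma_y(x)$ for each $y \in \Omega$ and $\sigma \in \mathcal{S}$. Suppose that $\lim x_\iota = x$
and let $\sigma \in \mathcal{S}$. Then, setting $y = -x$, we get
$\lim \sigma(x_\iota - x) = \lim \sigma_{-x}(x_\iota) = \sigma_{-x}(x) = \sigma(0) = 0$.
Conversely, suppose that $\lim \sigma(x_\iota - x) = 0$ for each $\sigma \in \mathcal{S}$. Then, using
condition (d) of Definition 10.3, we get
$\lim |\sigma_y(x_\iota) - \sigma_y(x)| \le \lim \sigma(x_\iota - x) = 0$
for all $y \in \Omega$ and $\sigma \in \mathcal{S}$. Thus, $\lim x_\iota = x$.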
For topologies induced by collections of seminorms, there is a nice ana-
logue of Proposition 9.1 on page 529. We present this as Proposition 10.5
and leave the proof to the reader as Exercise 10.32.
PROPOSITION 10.5
Let $\Omega$ and $\Lambda$ be linear spaces with the same scalar fields and having topologies
induced, respectively, by the collections of seminorms $\mathcal{S}_1$ and $\mathcal{S}_2$. For
a linear mapping $T\colon \Omega \to \Lambda$, the following are equivalent:
a)	$T$ is continuous.
b)	$T$ is continuous at 0.
c)	For each $\sigma \in \mathcal{S}_2$ there exist $\sigma_1, \sigma_2, \ldots, \sigma_n \in \mathcal{S}_1$ and a constant $\alpha$ such
that $\sigma(T(x)) \le \alpha \max\{\, \sigma_j(x) : j = 1, 2, \ldots, n \,\}$ for all $x \in \Omega$.
We observe that if $\Lambda = \mathbb{C}$ (with the usual topology), then the seminorm
$\sigma$ in Proposition 10.5 is just the modulus of a complex number.
Linear Functionals and Separation by Hyperplanes
Because continuous linear functionals play an important role in the the-
ory of normed spaces, one might expect that they would also be signifi-
cant in the theory of topological linear spaces. Surprisingly, though, there
are naturally arising examples of topological linear spaces having no con-
tinuous linear functionals other than the one that is identically 0. (See
Exercise 10.41.)
The situation becomes much more agreeable, however, if local convex-
ity is assumed. We will devote the remainder of this section to showing
that, in the locally convex case, there are an abundance of continuous lin-
ear functionals; indeed there are enough to separate elements from closed
convex subsets.
DEFINITION 10.4 Internal Point, Support Function
Let $V$ be a linear space and $A$ a convex subset of $V$.
a)	An element $u \in A$ is said to be an internal point of $A$ if for each
$v \in V$, there is an $\epsilon > 0$ such that $u + \alpha v \in A$ for all scalars $\alpha$ such
that $|\alpha| < \epsilon$.
b)	If 0 is an internal point of $A$, then the function $s_A$ defined on $V$ by
$s_A(v) = \inf\{\, r : r^{-1}v \in A,\ r > 0 \,\}$
is called the support function of $A$.
EXAMPLE 10.11 Illustrates Definition 10.4
Let $(\Omega, \|\ \|)$ be a normed space. Then 0 is an internal point of $B_1(0)$ and,
as the reader is asked to verify in Exercise 10.34, $s_{B_1(0)} = \|\ \|$.	□
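The identity $s_{B_1(0)} = \|\ \|$ can be checked numerically in the special case $\Omega = \mathcal{R}^2$ with the Euclidean norm. The sketch below is illustrative only: the function names are ours, the infimum in Definition 10.4(b) is approximated by bisection, and the vector is assumed to have norm well below the initial upper bound `r_hi`.

```python
import math

def in_open_unit_ball(v) -> bool:
    """Membership test for A = B_1(0) in R^2 with the Euclidean norm."""
    return math.hypot(v[0], v[1]) < 1.0

def support_function(v, member=in_open_unit_ball, r_hi: float = 1e6, iterations: int = 200) -> float:
    """Approximate s_A(v) = inf{ r > 0 : v/r in A } by bisection.

    Assumes A is convex with 0 as an internal point, so that the set of
    admissible r is an interval unbounded above, and that v/r_hi lies in A.
    """
    lo, hi = 0.0, r_hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid > 0.0 and member((v[0] / mid, v[1] / mid)):
            hi = mid   # mid is admissible, so the infimum is at most mid
        else:
            lo = mid   # mid is not admissible, so the infimum is at least mid
    return hi

v = (3.0, 4.0)
print(support_function(v), math.hypot(*v))
```

For $v = (3, 4)$ both printed numbers agree with $\|v\| = 5$ up to rounding. Replacing `member` by the membership test of another convex set with 0 as an internal point gives the support function of that set.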
Our next proposition, Proposition 10.6, shows that support functions
behave much like norms.
PROPOSITION 10.6
Let $V$ be a linear space and 0 an internal point of the convex set $A$. Then
a)	$s_A(\alpha v) = \alpha s_A(v)$ for all $v \in V$ and $\alpha \ge 0$.
b)	$s_A(v_1 + v_2) \le s_A(v_1) + s_A(v_2)$ for all $v_1, v_2 \in V$.
c)	$\{\, v : s_A(v) < 1 \,\} \subset A \subset \{\, v : s_A(v) \le 1 \,\}$.
PROOF: We leave the proofs of (a) and (c) to the reader in Exercise 10.35.
To prove (b), let $r_1$ and $r_2$ be positive numbers such that $r_1^{-1}v_1, r_2^{-1}v_2 \in A$.
As $A$ is convex, we have that
$(r_1 + r_2)^{-1}(v_1 + v_2) = (r_1/(r_1 + r_2))\,r_1^{-1}v_1 + (r_2/(r_1 + r_2))\,r_2^{-1}v_2 \in A$.
Hence, $s_A(v_1 + v_2) \le r_1 + r_2$. Taking infimums with respect to $r_1$ and $r_2$,
we obtain (b).
We will use support functions to study separation by a hyperplane.
To introduce that topic, let $E$ and $F$ be disjoint closed convex subsets of $\mathcal{R}^2$.
Then, as shown in Fig. 10.1, $E$ and $F$ can be separated by a line $L$ in the
following sense: Associated with $L$, there is a nontrivial linear functional $\ell$
on $\mathcal{R}^2$ and a real number $a$ such that $L = \ell^{-1}(\{a\})$, $E \subset \ell^{-1}((-\infty, a])$,
and $F \subset \ell^{-1}([a, \infty))$. Note that
$\sup\{\, \ell(v) : v \in E \,\} \le \inf\{\, \ell(u) : u \in F \,\}$
is a necessary condition for such a separation.
FIGURE 10.1 Separation by a hyperplane.
Similarly, disjoint closed convex subsets of $\mathcal{R}^3$ can be separated by a
plane. In what follows, we will generalize these two simple examples to
locally convex topological linear spaces.
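A concrete instance of separation by a line in the plane may help fix ideas. In the sketch below, $E$ and $F$ are taken to be the closed unit discs centered at $(-2, 0)$ and $(2, 0)$, and the separating functional is $\ell(x, y) = x$. These choices and all names are ours, made purely for illustration; the extremes of $\ell$ are estimated from sampled boundary points.

```python
import math

def disc_boundary(center, radius: float, count: int = 360):
    """Sample points on the boundary circle of a closed disc; a linear
    functional attains its extremes over the disc on this circle."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / count),
             cy + radius * math.sin(2 * math.pi * k / count))
            for k in range(count)]

def ell(v) -> float:
    """The linear functional l(x, y) = x on R^2 (an illustrative choice)."""
    return v[0]

E = disc_boundary((-2.0, 0.0), 1.0)
F = disc_boundary((2.0, 0.0), 1.0)
sup_E = max(ell(v) for v in E)
inf_F = min(ell(u) for u in F)
print(sup_E, inf_F)  # any a between these two values gives a separating line x = a
```

Here the supremum of $\ell$ over $E$ does not exceed its infimum over $F$, so, for instance, the line $x = 0$ separates the two discs in the sense described above.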
THEOREM 10.8
Let $V$ be a linear space with real scalars. Suppose that $A_1$ and $A_2$ are
nonempty disjoint convex subsets of $V$ and that $A_1$ has an internal point.
Then there is a nontrivial linear functional $\ell$ on $V$ such that
$\sup\{\, \ell(v) : v \in A_1 \,\} \le \inf\{\, \ell(u) : u \in A_2 \,\}$.	(10.18)
PROOF: Let $v_1$ be an internal point of $A_1$, $v_2$ be any point of $A_2$, and
$v_0 = v_2 - v_1$. Then it is easy to check that the set
$A = v_0 + A_1 - A_2$
is convex and contains 0 as an internal point.
We define a linear functional $\ell_0$ on the subspace $V_0 = \{\, \alpha v_0 : \alpha \in \mathcal{R} \,\}$
by $\ell_0(\alpha v_0) = \alpha$ and will show that $\ell_0$ is dominated on $V_0$ by $s_A$. Because
$A_1 \cap A_2 = \emptyset$, we have $v_0 \notin A$. Hence, by Proposition 10.6, $s_A(v_0) \ge 1$.
Using Proposition 10.6 again, we conclude that for $\alpha \ge 0$,
$\ell_0(\alpha v_0) = \alpha \le \alpha s_A(v_0) = s_A(\alpha v_0)$.
On the other hand, if $\alpha < 0$, then $\ell_0(\alpha v_0) \le s_A(\alpha v_0)$ is trivially true.
We can now invoke the Hahn-Banach theorem (page 580) to obtain a
linear functional $\ell$ on $V$ such that $\ell|_{V_0} = \ell_0$ and
$\ell(v) \le s_A(v)$, $v \in V$.
Because $\ell(v_0) = 1$, $\ell$ is not identically 0. Also, if $v \in A_1$ and $u \in A_2$, then
$v_0 + v - u \in A$ and, so, by Proposition 10.6,
$1 + \ell(v) - \ell(u) = \ell(v_0 + v - u) \le s_A(v_0 + v - u) \le 1$.
The inequality (10.18) now follows immediately.
Next, we would like to prove a version of Theorem 10.3 (page 584) for
topological linear spaces. To do so, we will need the following lemma.
LEMMA 10.1
Let $\ell$ be a linear functional on the topological linear space $\Omega$. If there is
an open set $W$ containing 0 such that $\sup\{\, \Re\ell(x) : x \in W \,\} < \infty$, then $\ell$ is
continuous.
PROOF: It suffices to show that if $x_0 \in \Omega$ and $\epsilon > 0$, then there is an open
set $U$ containing 0 such that
$\ell(x_0 + U) \subset \{\, z : |\ell(x_0) - z| < \epsilon \,\}$.
Let $b_0 = \sup\{\, \Re\ell(x) : x \in W \,\}$. Since $0 \in W$, it follows that $b_0 \ge 0$. Applying
Proposition 10.2(d), we choose a $\delta > 0$ and an open set $O$ containing 0
such that $\alpha O \subset W$ whenever $|\alpha| < \delta$. Let $b > b_0$ and set
$U = \epsilon b^{-1} \bigcup_{|\alpha| < \delta} \alpha O$.
Then $U$ is open and $e^{it}U \subset U$ for all $t \in \mathcal{R}$.
Let $y \in U$. Selecting $t$ so that $|\ell(y)| = e^{it}\ell(y)$ we get
$|\ell(y)| = e^{it}\ell(y) = \Re\ell(e^{it}y) = \epsilon b^{-1}\,\Re\ell(\epsilon^{-1} b e^{it} y) \le \epsilon b^{-1} b_0 < \epsilon$.
Hence, $|\ell(x_0) - \ell(x_0 + y)| < \epsilon$.
THEOREM 10.9
Let $F$ be a nonempty closed convex subset of a locally convex topological
linear space $\Omega$. If $x_0 \in F^c$, then there is a continuous linear functional $\ell$
on $\Omega$ such that $\Re\ell(x_0) < \inf\{\, \Re\ell(x) : x \in F \,\}$.
PROOF: The set $-x_0 + F$ is convex and closed and does not contain 0.
Since $\Omega$ is locally convex, there is a convex open set $U$ that contains 0 and
is disjoint from $-x_0 + F$. And because scalar multiplication is continuous,
the point 0 is an internal point of $U$. Thus, by Theorem 10.8, there is a
nontrivial linear functional $\ell$ on $\Omega$ such that
$\sup\{\, \Re\ell(u) : u \in U \,\} \le \inf\{\, \Re\ell(v) : v \in -x_0 + F \,\}$.
Applying Lemma 10.1, we conclude that $\ell$ is continuous.
Because $0 \in U$, we have $\sup\{\, \Re\ell(u) : u \in U \,\} \ge 0$. We claim that
this inequality is strict. Suppose to the contrary and let $z \in \Omega$. Then by
choosing $\epsilon > 0$ small enough, we get $e^{it}\epsilon z \in U$ for all $t \in \mathcal{R}$ and, hence,
$\Re\ell(e^{it}\epsilon z) \le 0$ for all $t \in \mathcal{R}$. Upon selecting $t$ so that $e^{it}\ell(z) = |\ell(z)|$, it
follows that $\ell(z) = 0$. We have shown that $\ell$ is identically 0, a contradiction.
From $\sup\{\, \Re\ell(u) : u \in U \,\} > 0$ and the previous displayed inequality,
we conclude that $\inf\{\, \Re\ell(v) : v \in -x_0 + F \,\} > 0$. Consequently, we have
$\Re\ell(x_0) < \inf\{\, \Re\ell(x) : x \in F \,\}$.
Remaining consistent with the notation used for normed spaces, we will
write $\Omega^*$ for the space of all continuous linear functionals on the topological
linear space $\Omega$ and $\langle x, x^* \rangle$ for $x^*(x)$ when $x \in \Omega$ and $x^* \in \Omega^*$. Theorem 10.9
shows that when $\Omega$ is locally convex and Hausdorff, $\Omega^*$ has enough members
to separate the points of $\Omega$. We refer the reader to Exercises 10.40-10.41 for
an example of a topological linear space where the only continuous linear
functional is identically zero.
When the convex set in Theorem 10.9 is a linear subspace of $\Omega$, we have
the following refinement, which is also a generalization of Theorem 10.3
on page 584.
COROLLARY 10.5
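Let $K$ be a closed linear subspace of the locally convex topological linear
space $\Omega$ and let $x \in K^c$. Then there is an $x_0^* \in K^\perp$ such that $\langle x, x_0^* \rangle > 0$.
PROOF: By Theorem 10.9, there is a continuous linear functional $x_1^*$ such
that $\Re\langle x, x_1^* \rangle < \inf\{\, \Re\langle y, x_1^* \rangle : y \in K \,\}$. Let $y \in K$ and $b > 0$. Choose $t$ so
that $e^{it}\langle y, x_1^* \rangle = |\langle y, x_1^* \rangle|$. Because $-be^{it}y \in K$, we have
$\Re\langle x, x_1^* \rangle < \Re\langle -be^{it}y, x_1^* \rangle = \Re(-be^{it}\langle y, x_1^* \rangle) = -b\,|\langle y, x_1^* \rangle|$.
As $b$ is an arbitrary positive number, it follows that $\langle y, x_1^* \rangle = 0$. If $s$ is
chosen so that $e^{is}\langle x, x_1^* \rangle = |\langle x, x_1^* \rangle|$, then the functional $x_0^* = e^{is}x_1^*$ satisfies
the assertions of the corollary.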
The next corollary shows that the point $x_0$ in Theorem 10.9 can be
replaced by a compact convex set disjoint from $F$.
COROLLARY 10.6
Let $\Omega$ be a locally convex topological linear space. Suppose that $F$ and $K$
are, respectively, nonempty closed and compact convex subsets of $\Omega$ such
that $F \cap K = \emptyset$. Then there is an $x^* \in \Omega^*$ such that
$\sup\{\, \Re\langle y, x^* \rangle : y \in K \,\} < \inf\{\, \Re\langle x, x^* \rangle : x \in F \,\}$.
PROOF: Clearly $F - K$ is convex; we claim that it is also closed. Let
$\{x_\iota - y_\iota\}_{\iota \in I}$ be a net in $F - K$ such that $\lim(x_\iota - y_\iota) = z$. Because $K$ is
compact, there is a subnet $\{y_{\iota_\eta}\}$ that converges to an element $y \in K$.
It follows that $\{x_{\iota_\eta}\}$ is a net in $F$ converging to $z + y$. Because $F$ is
closed, $z + y \in F$; hence, $z \in F - K$. We have now shown that $F - K$ is
closed.
Because $F$ and $K$ are disjoint, $0 \notin F - K$. Hence, we can apply
Theorem 10.9 to obtain a continuous linear functional $x^*$ such that
$0 = \Re\langle 0, x^* \rangle < \inf\{\, \Re\langle x - y, x^* \rangle : x \in F,\ y \in K \,\}$
$= \inf\{\, \Re\langle x, x^* \rangle : x \in F \,\} - \sup\{\, \Re\langle y, x^* \rangle : y \in K \,\}$,
as required.
In the remaining two sections of this chapter, we will see many appli-
cations of Theorem 10.9 and its corollaries. Here we content ourselves with
the following simple examples.
EXAMPLE 10.12 Illustrates Theorem 10.9
a)	Let $F$ be a nonempty closed convex subset of the normed space $\Omega$.
By Theorem 10.9, if $0 \notin F$, then there exists an $x^* \in \Omega^*$ such that
$\inf\{\, \Re\langle x, x^* \rangle : x \in F \,\} > 0$.
b)	Let $F = \{\, f \in C([0,1]) : \Re f(t) \ge t \,\}$. As $0 \notin F$, part (a) guarantees the
existence of an $x^* \in C([0,1])^*$ such that $\inf\{\, \Re\langle f, x^* \rangle : f \in F \,\} > 0$. The
continuous linear functional on $C([0,1])$ defined by $\langle g, x^* \rangle = \int_0^1 g(t)\,dt$
satisfies that condition. In fact, the infimum is 1/2.
c)	If we replace $C([0,1])$ by $C(\mathcal{R})$ in part (b), but use the same $F$ and $x^*$,
then we obtain an illustration of Theorem 10.9 for the case of a locally
convex topological linear space that is not a normed space.	□
EXERCISES 10.3
10.28	Let $\Omega$ be a locally compact Hausdorff space. Prove that $(C(\Omega), \mathcal{T}(\Omega))$ is a
topological linear space.
10.29	Prove Proposition 10.1.
10.30	Prove Proposition 10.2.
10.31	Prove Proposition 10.3.
10.32	Prove Proposition 10.5.
10.33	Let U be an open convex subset of a topological linear space. Show that
all points of U are internal points.
10.34	Show that in a normed space, the support function of the open unit ball
around 0 is equal to the norm.
10.35	Prove (a) and (c) of Proposition 10.6.
10.36	Let $\ell$ be a linear functional on a topological linear space $\Omega$. Prove that the
following conditions on $\ell$ are equivalent:
a)	$\ell$ is continuous.
b)	$\ell$ is continuous at some point of $\Omega$.
c)	$\sup\{\, \Re\ell(u) : u \in U \,\} < \infty$ for some nonempty open set $U$.
d)	$\inf\{\, \Re\ell(u) : u \in U \,\} > -\infty$ for some nonempty open set $U$.
e)	$\sup\{\, |\ell(u)| : u \in U \,\} < \infty$ for some nonempty open set $U$.
10.37	Show that Corollary 10.6 fails if the compactness assumption on $K$ is
replaced by the assumption that $K$ is closed.
10.38	Let $A$ and $B$ be subsets of a topological linear space $\Omega$. Show that if $A$ is
closed and $B$ is compact, then $A + B$ is closed.
10.39	Show that if a topological linear space is locally convex and $T_1$ (i.e., single-element
subsets are closed), then it is Hausdorff.
10.40	Consider the space $\mathcal{L}^p([0,1])$, where $0 < p < 1$. By Exercise 9.54 on
page 562, the function $\rho_p$ defined by
$\rho_p(f, g) = \sigma_p(f - g) = \int_{[0,1]} |f - g|^p\, d\lambda$, $f, g \in \mathcal{L}^p([0,1])$
is a metric on $\mathcal{L}^p([0,1])$. Show that $\mathcal{L}^p([0,1])$ is a topological linear space
with respect to the topology induced by $\rho_p$.
10.41	Refer to Exercise 10.40.
a)	Show that $(\mathcal{L}^p([0,1]))^*$ contains only the functional that is identically 0.
b)	Deduce that $\mathcal{L}^p([0,1])$ is not locally convex when $0 < p < 1$.
In Exercises 10.42-10.44, $C^\infty(\mathcal{R})$ denotes the space of complex-valued functions
having derivatives of all orders at each point of $\mathcal{R}$. For nonnegative integers $n$
and $m$, define
$\sigma_{n,m}(f) = \sup\{\, |t|^n\,|f^{(m)}(t)| : t \in \mathcal{R} \,\}$, $f \in C^\infty(\mathcal{R})$.
We will consider the space
$\mathcal{S}(\mathcal{R}) = \{\, f \in C^\infty(\mathcal{R}) : \sigma_{n,m}(f) < \infty,\ n, m = 0, 1, 2, \ldots \,\}$.
10.42	Let the notation be as in the previous paragraph.
a)	Show that $\mathcal{S}(\mathcal{R})$ is a linear space.
b)	Show that functions of the form $p(x)e^{-x^2}$, where $p(x)$ is a polynomial,
belong to $\mathcal{S}(\mathcal{R})$.
10.43	Let the notation be as in the foregoing.
a)	Show that $\{\, \sigma_{n,m} : n, m = 0, 1, \ldots \,\}$ is a family of seminorms inducing
a Hausdorff topology on $\mathcal{S}(\mathcal{R})$.
b)	Show that the linear operators $D(f) = f'$ and $M(f) = pf$, where $p$ is
a polynomial, are continuous with respect to this topology.
10.44	Call a subset $\mathcal{F} \subset \mathcal{S}(\mathcal{R})$ bounded if
$\sup\{\, \sigma_{n,m}(f) : f \in \mathcal{F} \,\} < \infty$
for each pair of nonnegative integers $n$ and $m$. Let $\mathcal{S}(\mathcal{R})$ have the topology
defined in Exercise 10.43. Prove the following version of the Heine-Borel
theorem: $\mathcal{F} \subset \mathcal{S}(\mathcal{R})$ is compact if and only if it is closed and bounded.
10.4	WEAK AND WEAK* TOPOLOGIES
In this section we will introduce and discuss the main properties of the
weak topology on a normed space and the weak* topology on its dual
space. Included will be an investigation of weak convergence and weak
610 □ Chapter 10 Basic Theory of Normed and Locally Convex Spaces
boundedness of sequences in a normed space and an important result about
weak* compactness.
Let $\ell$ be a linear functional on a linear space $V$. Then, as we have
seen, $|\ell|$ is a seminorm on $V$. Therefore, according to Proposition 10.3
on page 601, if $\mathcal{F}$ is a family of linear functionals on $V$, the collection of
seminorms $\{\, |\ell| : \ell \in \mathcal{F} \,\}$ induces a locally convex topology on $V$. The cases
where $V$ is a normed space or its dual are important enough to warrant a
special definition.
DEFINITION 10.5 Weak and Weak* Topologies
Let $\Omega$ be a normed space. For each $x \in \Omega$, define the linear functional
$\ell_x$ on $\Omega^*$ by $\ell_x(x^*) = x^*(x) = \langle x, x^* \rangle$. Then we use the following
terminology:
•	Weak topology: the topology on $\Omega$ induced by the collection of
seminorms $\{\, |x^*| : x^* \in \Omega^* \,\}$.
•	Weak* topology: the topology on $\Omega^*$ induced by the collection of
seminorms $\{\, |\ell_x| : x \in \Omega \,\}$.
Because we work with the norm topologies on $\Omega$ and $\Omega^*$ as well as
the weak and weak* topologies, it is useful to have a convenient way to
distinguish these topologies. When no modifier is used, we assume that
the topology is the norm topology. So, for instance, when we say that a
function is continuous, we mean that it is continuous with respect to the
norm topology, and when we say that a set is closed, we mean that it is
closed with respect to the norm topology.
On the other hand, we will employ the words weak and weakly to
indicate “with respect to the weak topology” and, similarly, use the term
weak* to indicate “with respect to the weak* topology.” Thus, for example,
a function that is continuous with respect to the weak* topology is called
weak* continuous and a set that is closed in the weak topology is called
weak closed or weakly closed.
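Let $\{x_\iota\}_{\iota \in I}$ and $\{x_\iota^*\}_{\iota \in I}$ be nets in $\Omega$ and $\Omega^*$, respectively. Then we
use the notation
$$\operatorname{wlim} x_\iota = x \quad\text{and}\quad \operatorname{w^*lim} x_\iota^* = x^* \tag{10.19}$$
to denote, respectively, that $\{x_\iota\}_{\iota \in I}$ weak converges (or converges weakly)
to $x$ and $\{x_\iota^*\}_{\iota \in I}$ weak* converges to $x^*$. We observe that (10.19) holds if
and only if
$$\lim \langle x_\iota, x^* \rangle = \langle x, x^* \rangle, \qquad x^* \in \Omega^*,$$
and
$$\lim \langle x, x_\iota^* \rangle = \langle x, x^* \rangle, \qquad x \in \Omega,$$
respectively.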
EXAMPLE 10.13 Illustrates Definition 10.5
a)	Let $\mathcal{H}$ be a Hilbert space. By Theorem 9.8 on page 551, each element
of $\mathcal{H}^*$ is of the form $\langle \cdot, y \rangle$ for some $y \in \mathcal{H}$. Hence, by associating $y$
with $\langle \cdot, y \rangle$, we can identify $\mathcal{H}$ with its dual space. A consequence of this
identification is that the weak and weak* topologies of a Hilbert space
coincide.
Now suppose $\mathcal{H}$ contains an infinite orthonormal sequence $\{e_n\}_{n=1}^\infty$.
Then, by Bessel's inequality, we have $\sum_{n=1}^\infty |\langle x, e_n \rangle|^2 \le \|x\|^2 < \infty$ for
each $x \in \mathcal{H}$ and, so, $\lim_{n\to\infty} \langle x, e_n \rangle = 0$ for each $x \in \mathcal{H}$. It follows
that $\operatorname{wlim} e_n = 0$; but, on the other hand, because $\|e_n\| = 1$ for all $n$,
the sequence $\{e_n\}_{n=1}^\infty$ cannot converge to 0 with respect to the norm
topology on $\mathcal{H}$.
This simple example shows that the norm topology on a normed
space can be strictly stronger than the weak topology. Indeed, as the
reader is asked to verify in Exercise 10.48, the weak (weak*) topology
coincides with the norm topology on $\Omega$ ($\Omega^*$) if and only if $\Omega$ is finite
dimensional.
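b)	Suppose that $\Omega$ is a compact Hausdorff space. In view of the Riesz
representation theorem (Theorem 9.15 on page 574), we know that the
dual space of $C(\Omega)$ can be identified with the space $M(\Omega)$ of regular
Borel measures on $\Omega$. Thus, for a sequence $\{f_n\}_{n=1}^\infty$ of $C(\Omega)$, we have
$\operatorname{wlim} f_n = f$ if and only if
$$\lim_{n\to\infty} \int_\Omega f_n \, d\mu = \int_\Omega f \, d\mu, \qquad \mu \in M(\Omega).$$
And, for a sequence $\{\mu_n\}_{n=1}^\infty$ of $M(\Omega)$, we have $\operatorname{w^*lim} \mu_n = \mu$ if and
only if
$$\lim_{n\to\infty} \int_\Omega f \, d\mu_n = \int_\Omega f \, d\mu, \qquad f \in C(\Omega).$$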
c)	Suppose that $\Omega$ is a locally compact Hausdorff space. In view of the
Riesz representation theorem (Theorem 9.16 on page 575), we know
that the dual space of $C_0(\Omega)$ can be identified with the space $M(\Omega)$ of
regular Borel measures on $\Omega$. Thus, the same results hold as in part (b)
provided we replace $C(\Omega)$ by $C_0(\Omega)$.
d)	A sequence $\{X_n\}_{n=1}^\infty$ of random variables (not necessarily defined on
the same probability space) is said to converge in distribution to
the random variable $X$ if the sequence $\{\mu_{X_n}\}_{n=1}^\infty$ converges to $\mu_X$ in
the weak* topology. We write $X_n \xrightarrow{d} X$ to indicate convergence in
distribution. Thus, we have $X_n \xrightarrow{d} X$ if and only if
$$\lim_{n\to\infty} \int_{\mathcal{R}} f \, d\mu_{X_n} = \int_{\mathcal{R}} f \, d\mu_X, \qquad f \in C_0(\mathcal{R}).$$
Actually, it can be shown that the previous limit holds for all $f \in C_b(\mathcal{R})$.
(See Exercise 10.59.) An equivalent condition for $X_n \xrightarrow{d} X$ is that for
each $x \in \mathcal{R}$ at which $F_X$ is continuous, $F_{X_n}(x) \to F_X(x)$ as $n \to \infty$.
The reader is asked to verify this in Exercise 10.61. In that exercise,
we also ask the reader to show that convergence in probability implies
convergence in distribution.
e)	We will show that a familiar example of convergence is really weak*
convergence in disguise. Consider the sequence of measures in $M([0,1])$
defined by $\mu_n = (1/n) \sum_{j=1}^n \delta_{j/n}$. Then, for each $f \in C([0,1])$, we have
$$\lim_{n\to\infty} \int f \, d\mu_n = \lim_{n\to\infty} (1/n) \sum_{j=1}^n f(j/n) = \int_0^1 f(x) \, dx.$$
It follows that the sequence $\{\mu_n\}_{n=1}^\infty$ converges in the weak* topology
to Lebesgue measure on [0,1].	□
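Part (e) can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `integrate_against_mu_n` and the sample functions are illustrative assumptions. It computes the integral of a continuous function against $\mu_n$, which is just a right-endpoint Riemann sum, and lets one compare it with the Lebesgue integral.

```python
import math


def integrate_against_mu_n(f, n):
    """Integral of f with respect to mu_n = (1/n) * sum_{j=1}^{n} delta_{j/n}."""
    return sum(f(j / n) for j in range(1, n + 1)) / n


def square(x):
    # Sample continuous function on [0, 1]; its Lebesgue integral is 1/3.
    return x * x


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The integral of cos over [0, 1] is sin(1).
        print(n, integrate_against_mu_n(square, n), integrate_against_mu_n(math.cos, n))
```

As $n$ grows, the printed values should approach $1/3$ and $\sin 1$, respectively. Checking finitely many test functions illustrates, but of course does not prove, weak* convergence.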
Our next theorem provides some fundamental properties of the weak
topology on a normed space.
THEOREM 10.10
Let $\Omega$ be a normed space.
a)	With respect to the weak topology, $\Omega$ is a locally convex Hausdorff
topological linear space.
b)	Weakly closed subsets of $\Omega$ are closed.
c)	Convex closed subsets of $\Omega$ are weakly closed.
PROOF:
a)	By definition, the weak topology is induced by the collection of
seminorms $\{\, |x^*| : x^* \in \Omega^* \,\}$. Hence, it follows from Proposition 10.3 on
page 601 that $\Omega$ is locally convex with respect to the weak topology.
That the weak topology is Hausdorff follows from Corollary 10.2 on
page 585 and Proposition 10.4 on page 602.
b)	This result follows immediately because the weak topology on $\Omega$ is
weaker than the norm topology on $\Omega$.
c)	Let $F$ be a nonempty closed convex subset of $\Omega$. We will prove that
$F^c$ is weakly open by showing that for each $x \in F^c$, there is a weakly
open set $W$ such that
$$x \in W \subset F^c. \tag{10.20}$$
By Theorem 10.9 on page 606, there exists an $x^* \in \Omega^*$ such that
$$\Re\langle x, x^* \rangle < d = \inf\{\, \Re\langle y, x^* \rangle : y \in F \,\}.$$
It follows that the weakly open set $W = \{\, w \in \Omega : \Re\langle w, x^* \rangle < d \,\}$
satisfies (10.20).
Bounded and Weakly Bounded Sets
A subset $E$ of a normed space $\Omega$ is said to be bounded if
$$\sup\{\, \|x\| : x \in E \,\} < \infty.$$
Our next definition provides a less restrictive notion of boundedness.
DEFINITION 10.6 Weakly Bounded Set
A subset $E$ of a normed space $\Omega$ is said to be weakly bounded if
$$\sup\{\, |\langle x, x^* \rangle| : x \in E \,\} < \infty$$
for each $x^* \in \Omega^*$.
The inequality $|\langle x, x^* \rangle| \le \|x^*\|_* \|x\|$ implies that bounded sets are
weakly bounded. Although not obvious, it is nevertheless true that weakly
bounded sets are bounded.
THEOREM 10.11
A subset of a normed space is weakly bounded if and only if it is bounded.
PROOF: We have already established sufficiency. To prove necessity, we
first note that each $x \in E$ determines a continuous linear functional $\ell_x$
on $\Omega^*$ via $\ell_x(x^*) = \langle x, x^* \rangle$. Because $E$ is weakly bounded,
$$\sup\{\, |\ell_x(x^*)| : x \in E \,\} = \sup\{\, |\langle x, x^* \rangle| : x \in E \,\} < \infty$$
for each $x^* \in \Omega^*$. Recalling that $\Omega^*$ is a Banach space, we apply the
uniform boundedness principle for linear operators (page 595) to conclude
that $\sup\{\, |||\ell_x||| : x \in E \,\} < \infty$. But, by Corollary 10.2 on page 585, we
know that $|||\ell_x||| = \|x\|$.
EXAMPLE 10.14 Illustrates Theorem 10.11
In this example we will use Theorem 10.11 to characterize the weakly
convergent sequences in the space $C(\Omega)$, where $\Omega$ is a compact Hausdorff space.
Specifically, we will show that a sequence of functions in $C(\Omega)$ converges
weakly if and only if it converges pointwise and is uniformly bounded.
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence in $C(\Omega)$ converging weakly to $f$.
Then
$$\lim_{n\to\infty} \int_\Omega f_n \, d\mu = \int_\Omega f \, d\mu \tag{10.21}$$
for each $\mu \in M(\Omega)$. Setting $\mu = \delta_x$, we obtain
$$\lim_{n\to\infty} f_n(x) = f(x), \qquad x \in \Omega. \tag{10.22}$$
Also, since weakly convergent sequences are weakly bounded, it follows
from Theorem 10.11 that
$$\sup\{\, \|f_n\|_\Omega : n \in \mathcal{N} \,\} < \infty. \tag{10.23}$$
Thus, (10.22) and (10.23) are necessary conditions for the weak convergence
of $\{f_n\}_{n=1}^\infty$ to $f$.
Next, we show that (10.22) and (10.23) together are sufficient
conditions for the weak convergence of $\{f_n\}_{n=1}^\infty$ to $f$. Let $\mu \in M(\Omega)$. Then,
because of (10.22), (10.23), and $|\mu|(\Omega) < \infty$, we can apply the Lebesgue
dominated convergence theorem to obtain (10.21).	□
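The criterion in Example 10.14 can be illustrated with a concrete sequence. The sketch below is not from the text: the choice $f_n(x) = nxe^{-nx}$ on $[0,1]$ and the helper names are assumptions made for illustration. This sequence tends to 0 pointwise and is uniformly bounded, so by the example it converges weakly to 0 in $C([0,1])$, yet its sup norm stays at $1/e$ (attained at $x = 1/n$), so it does not converge in norm.

```python
import math


def f(n, x):
    """f_n(x) = n * x * exp(-n * x), a continuous function on [0, 1]."""
    return n * x * math.exp(-n * x)


def sup_norm(n, grid_size=10_000):
    """Approximate the sup norm of f_n on [0, 1] by sampling a uniform grid."""
    return max(f(n, k / grid_size) for k in range(grid_size + 1))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The value at the fixed point x = 0.5 shrinks toward 0,
        # while the sampled sup norm stays close to 1/e.
        print(n, f(n, 0.5), sup_norm(n))
```

The grid is chosen so that the maximizer $x = 1/n$ is a sample point for the three values of $n$ used; for other $n$ the sampled maximum is only an approximation from below.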
Compactness in the Weak* Topology
One of the most important properties of the weak* topology is that the
closed unit ball is always weak* compact. This famous result is known as
Alaoglu’s theorem.
THEOREM 10.12 Alaoglu’s Theorem
In the dual space $\Omega^*$ of a normed space $\Omega$, the closed unit ball,
$$\overline{B}_1(0) = \{\, x^* \in \Omega^* : \|x^*\|_* \le 1 \,\},$$
is weak* compact.
PROOF: In Exercise 10.5 on page 589, we introduced the linear operator
$J : \Omega \to \Omega^{**}$ defined by $J(x)(x^*) = \langle x, x^* \rangle$. The relative weak* topology
on $\overline{B}_1(0)$ is just the weak topology induced by the family $\mathcal{F}$ of restrictions
of functions in $J(\Omega)$. We will establish the theorem by showing that the
family $\mathcal{F}$ satisfies the hypotheses of Corollary 8.2 on page 511.
If $x \in \Omega$ and $x^* \in \overline{B}_1(0)$, then $|J(x)(x^*)| = |\langle x, x^* \rangle| \le \|x\|$.
Consequently, $J(x)(\overline{B}_1(0))$ is a compact subset of $\mathbb{C}$ for each $x \in \Omega$. Thus,
$\mathcal{F}$ satisfies condition (a) of Corollary 8.2. If $x_1^*$ and $x_2^*$ are distinct
elements of $\overline{B}_1(0)$, then for some $y \in \Omega$,
$$J(y)(x_1^*) = \langle y, x_1^* \rangle \ne \langle y, x_2^* \rangle = J(y)(x_2^*).$$
Hence, condition (b) of Corollary 8.2 holds.
To verify condition (c) of Corollary 8.2, let $\{x_\iota^*\}_{\iota \in I}$ be a net in $\overline{B}_1(0)$
such that $\lim \langle x, x_\iota^* \rangle = \ell(x)$ exists for each $x \in \Omega$. Since
$$\ell(\alpha x + \beta y) = \lim \langle \alpha x + \beta y, x_\iota^* \rangle = \lim \bigl( \alpha \langle x, x_\iota^* \rangle + \beta \langle y, x_\iota^* \rangle \bigr) = \alpha \ell(x) + \beta \ell(y)$$
and $|\ell(x)| = \lim |\langle x, x_\iota^* \rangle| \le \|x\|$, it follows that $\ell$ is a continuous linear
functional on $\Omega$ with norm at most 1. Furthermore, $\ell$ is the weak* limit of
the net $\{x_\iota^*\}_{\iota \in I}$. Hence, condition (c) of Corollary 8.2 is satisfied.
COROLLARY 10.7
Every bounded net in the dual space $\Omega^*$ of a normed space $\Omega$ has a weak*
convergent subnet.
PROOF: A bounded net is contained in a closed ball $\overline{B}_r(0)$ for sufficiently
large $r$. Since $\overline{B}_r(0) = r\overline{B}_1(0)$, it follows that $\overline{B}_r(0)$ is also weak* compact.
An application of Theorem 7.10 on page 471 completes the proof.
In practice, Corollary 10.7 is often applied in an effort to obtain a
weak* convergent subsequence of a bounded sequence. But unless we know
that $\overline{B}_1(0)$ is metrizable with respect to the weak* topology, all that we
can assert is that a bounded sequence has a weak* convergent subnet. Our
next theorem shows that $\overline{B}_1(0)$ is weak* metrizable if $\Omega$ is separable.
THEOREM 10.13
Let $\Omega$ be a separable normed space. Then the following hold:
a)	$\overline{B}_1(0)$ is weak* metrizable.
b)	Every bounded sequence in $\Omega^*$ has a weak* convergent subsequence.
PROOF: We note, in view of Alaoglu's theorem and Theorem 7.7 on
page 466, that (b) follows from (a). We now outline the proof of (a),
leaving the details to the reader in Exercise 10.54.
Let $\{x_n\}_{n=1}^\infty$ be a sequence that is dense in the closed unit ball of $\Omega$. We
define a metric $\rho$ on $\overline{B}_1(0)$ by
$$\rho(x^*, y^*) = \sum_{n=1}^\infty 2^{-n} |\langle x_n, x^* \rangle - \langle x_n, y^* \rangle|.$$
For each $x^*$, the function $\rho(x^*, \cdot)$ is weak* continuous. It follows that
the open $\rho$-ball $B_r^\rho(x^*)$ is weak* open for each $r > 0$ and $x^* \in \overline{B}_1(0)$. Thus, the topology
induced by $\rho$ is weaker than the relative weak* topology. Since $\overline{B}_1(0)$ is
weak* compact, we conclude from Corollary 7.8 on page 473 that the
topology induced by $\rho$ coincides with the weak* topology.
EXAMPLE 10.15 Illustrates Theorem 10.13
Let $g \in \mathcal{L}^2([0,1] \times [0,1])$ and define $L: \mathcal{L}^2([0,1]) \to \mathcal{L}^2([0,1])$ by
$$L(f)(x) = \int_0^1 g(x,y) f(y) \, dy.$$
Then $L$ is a linear operator and satisfies
$$|||L||| \le \left( \int_0^1 \int_0^1 |g(x,y)|^2 \, dx \, dy \right)^{1/2}.$$
We will prove that there is an $f_0 \in \mathcal{L}^2([0,1])$ such that $\|L(f_0)\|_2 = |||L|||$.
Let $\{f_n\}_{n=1}^\infty$ be a sequence with $\|f_n\|_2 \le 1$ and $\lim_{n\to\infty} \|L(f_n)\|_2 = |||L|||$.
By Theorem 10.13, there is a subsequence $\{f_{n_j}\}_{j=1}^\infty$ converging weakly to
some $f_0$. Hence,
$$\lim_{j\to\infty} L(f_{n_j})(x) = \lim_{j\to\infty} \langle f_{n_j}, g(x,\cdot) \rangle = \langle f_0, g(x,\cdot) \rangle = L(f_0)(x)$$
for almost all $x$. By Cauchy's inequality,
$$|L(f_{n_j})(x)|^2 \le \int_0^1 |g(x,y)|^2 \, dy.$$
Thus, the Lebesgue dominated convergence theorem implies that
$$|||L|||^2 = \lim_{j\to\infty} \|L(f_{n_j})\|_2^2 = \|L(f_0)\|_2^2,$$
as required.	□
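A finite-dimensional analogue of Example 10.15 can be explored numerically. The sketch below is an illustration under stated assumptions, not the book's method: it takes the real kernel $g(x,y) = \min(x,y)$ (a hypothetical choice), discretizes $L$ on a midpoint grid, estimates the operator norm by power iteration, and compares it with the double-integral bound (the Frobenius norm of the discretized matrix). All function names are illustrative.

```python
import math


def kernel(x, y):
    # Hypothetical real-valued kernel g(x, y); any square-integrable kernel would do.
    return min(x, y)


def discretize(n):
    """Matrix h * g(x_i, y_j) on the midpoint grid; acts like L on grid samples."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return [[h * kernel(x, y) for y in pts] for x in pts]


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def euclidean_norm(v):
    return math.sqrt(sum(t * t for t in v))


def operator_norm_estimate(a, iterations=30):
    """Power iteration on A^T A; returns |A v| for the final unit vector v."""
    at = [list(col) for col in zip(*a)]
    v = [1.0] * len(a)
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))
        s = euclidean_norm(w)
        v = [t / s for t in w]
    return euclidean_norm(matvec(a, v))


def hilbert_schmidt_bound(a):
    """Discrete analogue of the double integral of |g|^2, square-rooted."""
    return math.sqrt(sum(aij * aij for row in a for aij in row))


if __name__ == "__main__":
    a = discretize(100)
    print(operator_norm_estimate(a), hilbert_schmidt_bound(a))
```

The final unit vector of the iteration plays the role of $f_0$: it (approximately) attains the norm. In finite dimensions attainment is automatic by compactness of the unit sphere; the point of the example in the text is that weak compactness rescues the argument in $\mathcal{L}^2([0,1])$.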
EXERCISES 10.4
10.45	Let $V$ be a linear space and $\mathcal{F}$ a family of linear functionals on $V$. Then
the collection of seminorms $S = \{\, |\ell| : \ell \in \mathcal{F} \,\}$ induces a locally convex
topology $\mathcal{T}_S$ on $V$. Show that $\mathcal{T}_S$ is the same as the weak topology $\mathcal{T}_{\mathcal{F}}$
determined by the family $\mathcal{F}$ in the sense of Definition 7.10 on page 428.
10.46	Construct a sequence in $C([0,1])$ such that $\|f_n\|_{[0,1]} = 1$ and $\operatorname{wlim} f_n = 0$.
10.47	Let $\Omega$ be an infinite dimensional normed space.
a)	Show that if $W$ is a weak open set containing 0, then $W$ contains an
infinite dimensional linear subspace of $\Omega$. Hint: Consider the linear
mapping $L: \Omega \to \mathbb{C}^n$ defined by $L(x) = (\langle x, x_1^* \rangle, \langle x, x_2^* \rangle, \ldots, \langle x, x_n^* \rangle)$ for
appropriate linear functionals $x_1^*, x_2^*, \ldots, x_n^* \in \Omega^*$.
b)	Show that if $U$ is a weak* open set containing 0, then $U$ contains an
infinite dimensional linear subspace of $\Omega^*$. Hint: Consider an appropriate
analogue of the hint for part (a).
10.48	Use Exercise 10.47 to prove the following facts.
a)	The norm and weak topologies are equal only for finite dimensional
spaces.
b)	The norm and weak* topologies are equal only for finite dimensional
spaces.
10.49	Consider the normed space $\ell^1$.
a)	Prove that if a sequence converges weakly, then it converges in the norm.
b)	Deduce from part (a) and Exercise 10.48 that $\ell^1$ with the weak topology
is not metrizable.
10.50	In the space $\ell^2$, let $e_n$ denote the sequence which is 1 at the $n$th position
and 0 elsewhere, and set $E = \{\, e_n + n e_m : m > n \ge 1 \,\}$.
a)	Show that $E$ is closed.
b)	Show that 0 is in the weak closure of $E$. Deduce that there is a net
in $E$ converging weakly to 0.
c)	Show that there is no sequence in $E$ that is weakly convergent to 0.
d)	Deduce from parts (b) and (c) that $\ell^2$ with the weak topology is not
metrizable.
★10.51	Show that if $\Omega$ is a compact or locally compact Hausdorff space, then
$M_+(\Omega)$ is weak* closed.
10.52	Suppose that $\Omega$ is a compact or locally compact Hausdorff space and that
$M(\Omega)$ is endowed with the weak* topology. Let $\Lambda: \Omega \to M(\Omega)$ be defined
by $\Lambda(x) = \delta_x$. Prove that $\Lambda$ is continuous.
★10.53	Let $\Omega$ be a Hausdorff space and set $P(\Omega) = \{\, \mu \in M_+(\Omega) : \mu(\Omega) = 1 \,\}$, the
collection of probability measures in $M(\Omega)$.
a)	Show that if $\Omega$ is compact, then $P(\Omega)$ is weak* compact.
b)	Show that if $\Omega$ is locally compact but not compact, then $P(\Omega)$ is not
weak* compact.
10.54	Provide the details of the proof of Theorem 10.13 on page 616.
10.55	Prove that in a separable Hilbert space, every bounded sequence has a
weakly convergent subsequence.
10.56	Consider the space $\mathcal{L}^p([a,b])$, where $1 < p < \infty$. Prove that every bounded
sequence has a weakly convergent subsequence.
10.57	Refer to Example 10.15 on page 616.
a)	Show that $L$ maps $\mathcal{L}^2([0,1])$ into $\mathcal{L}^2([0,1])$.
b)	Verify that
$$|||L||| \le \left( \int_0^1 \int_0^1 |g(x,y)|^2 \, dx \, dy \right)^{1/2}.$$
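10.58	Recall that $C_0(\mathcal{R})^* = M(\mathcal{R})$. Let $\{\mu_n\}_{n=1}^\infty$ be a sequence in $M_+(\mathcal{R})$ and
$\mu \in M_+(\mathcal{R})$.
a)	Show that if $\sup\{\, \mu_n(\mathcal{R}) : n \in \mathcal{N} \,\} < \infty$ and $\lim_{n\to\infty} F_{\mu_n}(x) = F_\mu(x)$
at every $x$ where $F_\mu$ is continuous, then $\operatorname{w^*lim} \mu_n = \mu$.
b)	Show by example that the converse of the statement in part (a) is false.
c)	Show that if $\operatorname{w^*lim} \mu_n = \mu$ and $\lim_{x\to-\infty} \sup\{\, F_{\mu_n}(x) : n \in \mathcal{N} \,\} = 0$,
then $\lim_{n\to\infty} F_{\mu_n}(x) = F_\mu(x)$ at every $x$ where $F_\mu$ is continuous.
★10.59	Let $\{\mu_n\}_{n=1}^\infty$ be a sequence in $M_+(\mathcal{R})$. Recall that $\operatorname{w^*lim} \mu_n = \mu$ means
that $\int_{\mathcal{R}} f \, d\mu_n \to \int_{\mathcal{R}} f \, d\mu$ for each $f \in C_0(\mathcal{R})$. Show that if $\operatorname{w^*lim} \mu_n = \mu$
and $\mu_n(\mathcal{R}) \to \mu(\mathcal{R})$, then $\int_{\mathcal{R}} f \, d\mu_n \to \int_{\mathcal{R}} f \, d\mu$ for each $f \in C_b(\mathcal{R})$. Hint:
First consider the case where $0 \le f \le 1$.
10.60	Suppose that $\{F_n\}_{n=1}^\infty$ is a sequence of distribution functions on $\mathcal{R}$ such
that $\sup\{\, F_n(\infty) : n \in \mathcal{N} \,\} < \infty$ and $\lim_{x\to-\infty} \sup\{\, F_n(x) : n \in \mathcal{N} \,\} = 0$.
Show that there is a subsequence $\{F_{n_k}\}_{k=1}^\infty$ and a distribution function $F$
such that $\lim_{k\to\infty} F_{n_k}(x) = F(x)$ at every $x$ where $F$ is continuous. This
result is a version of what is known as Helly's selection principle. Hint:
Observe that $C_0(\mathcal{R})$ is separable. Use Exercise 10.58(c).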
10.61	Refer to Example 10.13(d) on page 612. Let $X$ be a random variable and
$\{X_n\}_{n=1}^\infty$ a sequence of random variables.
a)	Show that $X_n \xrightarrow{d} X$ if and only if $F_{X_n}(x) \to F_X(x)$ at every $x$ where
$F_X$ is continuous. Hint: Use Exercises 10.58 and 10.59.
b)	Suppose $\{X_n\}_{n=1}^\infty$ and $X$ are all defined on the same probability space.
Prove that if $\{X_n\}_{n=1}^\infty$ converges to $X$ in probability, then $X_n \xrightarrow{d} X$.
Hint: Use the fact that functions in $C_0(\mathcal{R})$ are uniformly continuous
and Theorem 6.17 on page 404.
10.5 COMPACT CONVEX SETS
In this section, we will study subsets of locally convex topological linear
spaces that are both convex and compact. We will prove the Krein-Milman
theorem, a result that describes how compact convex sets are generated by
their irreducible elements. Additionally, we will give an application of the
Krein-Milman theorem to the trigonometric moment problem.
To begin, we introduce some simple geometric ideas. Let $v_1, v_2, \ldots, v_n$
be elements of a linear space and $\alpha_1, \alpha_2, \ldots, \alpha_n$ nonnegative scalars that
sum to 1. Then the sum
$$v = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n \tag{10.24}$$
is called a convex combination of the $v_j$'s. When $n = 2$, we see that
$v$ must lie on the line segment connecting $v_1$ and $v_2$. Thus, a set is convex
if it contains all convex combinations of any two of its elements. It is not
hard to show that a convex set contains all convex combinations of any
finite subset of its elements. (See Exercise 10.62.)
Some convex combinations are trivial, such as $v = \alpha_1 v + \alpha_2 v + \cdots + \alpha_n v$
or $v = 1v + 0v_2 + \cdots + 0v_n$. We say that a convex combination of the
form (10.24) is proper if there are at least two distinct indices $i$ and $j$
such that $\alpha_i$ and $\alpha_j$ are positive, and either $v \ne v_i$ or $v \ne v_j$.
An element of a convex set C is either a proper convex combination of
elements of C or it is not. The latter case, where the element is “irreducible”
with respect to convex combinations, is important enough to warrant the
following definition.
DEFINITION 10.7 Extreme Point
An element of a convex set C is called an extreme point of C if it
is not a proper convex combination of elements of C. We use ex C to
denote the set of all extreme points of C.
There are various useful equivalent conditions for an element of a
convex set to be an extreme point. In describing these conditions, we let
$[u, v]$ denote the closed line segment $\{\, (1-\alpha)u + \alpha v : \alpha \in [0,1] \,\}$ and
$(u, v)$ denote the open line segment $\{\, (1-\alpha)u + \alpha v : \alpha \in (0,1) \,\}$. We leave
it to the reader to show that each of the following conditions is equivalent
to $x$ being an extreme point of the convex set $C$.
•	If $x \in (u, v)$, where $u, v \in C$, then $x = u = v$.
•	If $x \in [u, v]$, where $u, v \in C$, then $x = u$ or $x = v$.
Closely related to the concept of an extreme point is that of a face, as
described in our next definition.
DEFINITION 10.8 Face of a Convex Set
Let $C$ be a convex set. A set $F \subset C$ is said to be a face of $C$ if it
satisfies the following two conditions:
a)	$F$ is convex.
b)	$\alpha u + (1-\alpha)v \in F$, where $0 < \alpha < 1$ and $u, v \in C$, implies $u, v \in F$.
EXAMPLE 10.16 Illustrates Definitions 10.7 and 10.8
Figure 10.2 shows a closed triangular region $T$ in the plane, $\mathcal{R}^2$. The
extreme points of the region are the vertices, $p_1$, $p_2$, and $p_3$; in symbols,
$\operatorname{ex} T = \{p_1, p_2, p_3\}$. The single element sets $\{p_1\}$, $\{p_2\}$, $\{p_3\}$, the
edges $[p_1, p_2]$, $[p_2, p_3]$, $[p_3, p_1]$, the set $T$ itself, and (vacuously) the empty
set are faces of $T$.	□
FIGURE 10.2 A triangular region.
We observe that in Example 10.16, every element of the triangular
region T is a convex combination of the extreme points. As we will see
shortly, this is close to being typical of compact convex subsets of Hausdorff
locally convex topological linear spaces in that every element of such a set
is “approximately” a convex combination of the extreme points.
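The observation that every point of a triangle is a convex combination of its vertices can be made concrete with barycentric coordinates. The following sketch is illustrative only: the function names and the sample triangle are assumptions, not taken from the text. It computes the weights for a point of a nondegenerate triangle in the plane and recombines them.

```python
def barycentric_weights(p1, p2, p3, x):
    """Return (a1, a2, a3) with x = a1*p1 + a2*p2 + a3*p3 and a1 + a2 + a3 = 1.

    Assumes the triangle p1, p2, p3 is nondegenerate (nonzero determinant).
    """
    det = (p2[1] - p3[1]) * (p1[0] - p3[0]) + (p3[0] - p2[0]) * (p1[1] - p3[1])
    a1 = ((p2[1] - p3[1]) * (x[0] - p3[0]) + (p3[0] - p2[0]) * (x[1] - p3[1])) / det
    a2 = ((p3[1] - p1[1]) * (x[0] - p3[0]) + (p1[0] - p3[0]) * (x[1] - p3[1])) / det
    return a1, a2, 1.0 - a1 - a2


def convex_combination(weights, points):
    """Form sum_i weights[i] * points[i] for points in the plane."""
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(2))


if __name__ == "__main__":
    vertices = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]   # hypothetical triangle
    weights = barycentric_weights(*vertices, (1.0, 1.0))
    print(weights, convex_combination(weights, vertices))
```

A point lies in the closed triangle exactly when all three weights are nonnegative; a vertex corresponds to one weight equal to 1, which is why the vertices admit no proper convex representation.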
Let $A$ be a subset of a linear space. The set of all possible convex
combinations of elements of $A$ is called the convex hull of $A$ and is
denoted $\operatorname{cov}(A)$. Referring to Example 10.16, we see that the line
segment $[p_1, p_2]$ is the convex hull of $\{p_1, p_2\}$ and that the triangular region $T$
is the convex hull of $\{p_1, p_2, p_3\}$.
In topological linear spaces, we write $\overline{\operatorname{cov}}(A)$ for $\overline{\operatorname{cov}(A)}$ and call this
set the closed convex hull of $A$. Proposition 10.7 provides some basic
properties of the convex hull and closed convex hull.
PROPOSITION 10.7
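Let $\Omega$ be a topological linear space and $A, B \subset \Omega$. Then the following hold:
a)	$\operatorname{cov}(A)$ is convex.
b)	If $B$ is convex and $A \subset B$, then $\operatorname{cov}(A) \subset B$.
c)	If $B$ is convex, then $\overline{B}$ is convex.
d)	If $B$ is closed and convex and $A \subset B$, then $\overline{\operatorname{cov}}(A) \subset B$.
e)	If $\Omega$ is locally convex, then
$$\overline{\operatorname{cov}}(A) = \{\, x \in \Omega : \Re\langle x, x^* \rangle \le \sup\{ \Re\langle y, x^* \rangle : y \in A \} \text{ for all } x^* \in \Omega^* \,\}.$$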
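PROOF: We leave the proofs of parts (a), (b), (c), and (d) to the reader
as Exercise 10.64. To prove part (e), we let, for each $x^* \in \Omega^*$,
$$b(x^*) = \sup\{\, \Re\langle y, x^* \rangle : y \in A \,\},$$
and
$$H_{x^*} = \{\, x \in \Omega : \Re\langle x, x^* \rangle \le b(x^*) \,\}.$$
We must show that $\overline{\operatorname{cov}}(A) = \bigcap_{x^* \in \Omega^*} H_{x^*}$.
Each $H_{x^*}$ is closed, because $x^*$ is continuous, and is convex, because
$x^*$ is linear. It follows from (d) that $\overline{\operatorname{cov}}(A) \subset H_{x^*}$ for each $x^* \in \Omega^*$.
Therefore, we have $\overline{\operatorname{cov}}(A) \subset \bigcap_{x^* \in \Omega^*} H_{x^*}$.
To prove the reverse containment, suppose that $x_0 \notin \overline{\operatorname{cov}}(A)$. By
Theorem 10.9 on page 606, there is an $x_1^* \in \Omega^*$ such that
$$\Re\langle x_0, x_1^* \rangle < \inf\{\, \Re\langle y, x_1^* \rangle : y \in \overline{\operatorname{cov}}(A) \,\}.$$
Letting $x_0^* = -x_1^*$, we obtain
$$b(x_0^*) \le \sup\{\, \Re\langle y, x_0^* \rangle : y \in \overline{\operatorname{cov}}(A) \,\} < \Re\langle x_0, x_0^* \rangle.$$
Hence, $x_0 \notin H_{x_0^*}$.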
We note the following consequences of Proposition 10.7.
•	$\operatorname{cov}(A)$ is the smallest convex set containing $A$.
•	If $\Omega$ is locally convex, then $\overline{\operatorname{cov}}(A)$ is the intersection of all closed half-spaces containing $A$.
The Krein-Milman Theorem
We are now ready to prove the main result of this section, the
Krein-Milman theorem.
THEOREM 10.14 Krein-Milman Theorem
Let $\Omega$ be a Hausdorff locally convex topological linear space. If $K$ is a
nonempty compact convex subset of $\Omega$, then $K = \overline{\operatorname{cov}}(\operatorname{ex} K)$. In particular,
we have that $\operatorname{ex} K \ne \emptyset$.
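PROOF: It suffices to consider the case of real scalars. Let $\mathfrak{F}(K)$ denote
the collection of nonempty closed faces of $K$. Then the following assertions
hold, as the reader is asked to verify in Exercise 10.65:
$$\mathcal{F} \subset \mathfrak{F}(K) \text{ and } \bigcap_{F \in \mathcal{F}} F \ne \emptyset \;\Rightarrow\; \bigcap_{F \in \mathcal{F}} F \in \mathfrak{F}(K) \tag{10.25}$$
$$F \in \mathfrak{F}(K) \;\Rightarrow\; \mathfrak{F}(F) \subset \mathfrak{F}(K) \tag{10.26}$$
$$F \in \mathfrak{F}(K) \;\Rightarrow\; \operatorname{ex} F \subset \operatorname{ex} K \tag{10.27}$$
$$x^* \in \Omega^* \;\Rightarrow\; \{\, x \in K : \langle x, x^* \rangle = \inf x^*(K) \,\} \in \mathfrak{F}(K) \tag{10.28}$$
The collection $\mathfrak{F}(K)$ is partially ordered by reverse inclusion $\supset$. If $\mathcal{F}$ is a
chain in $\mathfrak{F}(K)$, then, because $\mathcal{F}$ has the finite intersection property, the
intersection $F_1 = \bigcap_{F \in \mathcal{F}} F$ is nonempty; hence, by (10.25), $F_1 \in \mathfrak{F}(K)$.
Since $F \supset F_1$ for each $F \in \mathcal{F}$, we see that $\mathcal{F}$ has a $\supset$-upper bound. Thus,
by Zorn's lemma, there is a $\supset$-maximal nonempty closed face $F_0$.
We claim that $F_0$ has only one element. Suppose to the contrary.
Then Theorem 10.9 (page 606) implies the existence of an $x^* \in \Omega^*$ that
is nonconstant on $F_0$. It follows from (10.26) and (10.28) that the set
$\{\, x \in F_0 : \langle x, x^* \rangle = \inf x^*(F_0) \,\}$ is a nonempty closed face of $K$ that is
properly contained in $F_0$. This contradiction shows that $F_0 = \{x\}$ for
some $x \in K$. Because $F_0$ is a face, $x$ must be an extreme point of $K$.
Each $F \in \mathfrak{F}(K)$ is also a compact convex set and, consequently, by the
preceding argument, $\operatorname{ex} F \ne \emptyset$. It follows from (10.27) that each $F \in \mathfrak{F}(K)$
contains an extreme point of $K$.
We are now in a position to show that $K = \overline{\operatorname{cov}}(\operatorname{ex} K)$. Since $K$ is
closed and convex, we have $\overline{\operatorname{cov}}(\operatorname{ex} K) \subset K$. To prove the reverse
inclusion, suppose that $K \setminus \overline{\operatorname{cov}}(\operatorname{ex} K) \ne \emptyset$. Then Theorem 10.9 implies
that there is a $y^* \in \Omega^*$ such that $\inf y^*(K) < \inf y^*(\overline{\operatorname{cov}}(\operatorname{ex} K))$. Hence,
$\{\, x \in K : \langle x, y^* \rangle = \inf y^*(K) \,\}$ is a nonempty closed face of $K$ that is
disjoint from $\operatorname{ex} K$. Since every nonempty closed face of $K$ contains an extreme
point of $K$, we have reached a contradiction. Thus, $K = \overline{\operatorname{cov}}(\operatorname{ex} K)$.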
Some of the most important applications of the Krein-Milman theorem
involve optimization of linear functionals. Corollary 10.8 provides some
particulars.
COROLLARY 10.8
Suppose that $\Omega$ is a Hausdorff locally convex topological linear space and
that $x^* \in \Omega^*$. If $K$ is a nonempty compact convex subset of $\Omega$, then there
exist $x_1, x_2, x_3 \in \operatorname{ex} K$ such that
a)	$\Re\langle x_1, x^* \rangle = \inf\{\, \Re\langle y, x^* \rangle : y \in K \,\}$,
b)	$\Re\langle x_2, x^* \rangle = \sup\{\, \Re\langle y, x^* \rangle : y \in K \,\}$,
c)	$|\langle x_3, x^* \rangle| = \sup\{\, |\langle y, x^* \rangle| : y \in K \,\}$.
PROOF: We will prove part (a) and leave parts (b) and (c) for the reader
as Exercise 10.67. Let $a = \inf\{\, \Re\langle y, x^* \rangle : y \in K \,\}$. According to (10.28),
the set $F = \{\, x \in K : \Re\langle x, x^* \rangle = a \,\}$ is a nonempty closed face of $K$. Thus,
the Krein-Milman theorem implies that $\operatorname{ex} F \ne \emptyset$. Applying (10.27), we
conclude that $F$ contains at least one element of $\operatorname{ex} K$.
EXAMPLE 10.17 Illustrates the Krein-Milman Theorem
a)	Suppose that $K$ is a compact convex subset of $\mathcal{R}^n$. Given a real-valued
continuous function $g$ on $K$, it is often a difficult problem to find the
maximum value of $g$. In case $g$ is linear, however, this problem is
simplified by the relation $\sup g(K) = \sup g(\operatorname{ex} K)$. For example, if $K$ is the
triangle in Fig. 10.2, then the maximum on $K$ of a real-valued linear
functional is attained at one of the points of $\{p_1, p_2, p_3\}$. This
property of linear functionals on finite-dimensional compact convex sets is
fundamental to the subject of linear programming.
b)	Let $D = \{\, z \in \mathbb{C} : |z| < 1 \,\}$ and $H(D)$ the collection of analytic functions
on $D$, equipped with the relative topology inherited from $C(D)$. Each
function $f \in H(D)$ has a power series expansion $f(z) = \sum_{n=0}^\infty a_n(f) z^n$.
As the reader is asked to show in Exercise 10.68, the coefficients $a_n(f)$
define continuous linear functionals on $H(D)$. Hence, if $K$ is a compact
convex subset of $H(D)$ and $n \in \mathcal{N}$, then there exists a $g \in \operatorname{ex} K$ such
that $|a_n(g)| = \sup\{\, |a_n(f)| : f \in K \,\}$. This observation is useful in
complex-variable theory.	□
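The relation $\sup g(K) = \sup g(\operatorname{ex} K)$ from part (a) is easy to probe numerically in the plane. The sketch below is only an illustration under assumptions: the triangle, the functional coefficients, and all function names are hypothetical. It samples random points of a triangle and checks that a linear functional never exceeds its largest value at the three vertices.

```python
import random


def linear_functional(c, p):
    """Evaluate the linear functional p -> c[0]*p[0] + c[1]*p[1]."""
    return c[0] * p[0] + c[1] * p[1]


def random_point_in_triangle(vertices, rng):
    """Random convex combination of the three vertices."""
    a, b = rng.random(), rng.random()
    if a + b > 1.0:
        # Reflect so that the weights a, b, 1 - a - b are all nonnegative.
        a, b = 1.0 - a, 1.0 - b
    weights = (a, b, 1.0 - a - b)
    return tuple(sum(w * v[i] for w, v in zip(weights, vertices)) for i in range(2))


def max_over_vertices(c, vertices):
    return max(linear_functional(c, v) for v in vertices)


def max_over_samples(c, vertices, samples=10_000, seed=0):
    rng = random.Random(seed)
    return max(
        linear_functional(c, random_point_in_triangle(vertices, rng))
        for _ in range(samples)
    )


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]   # hypothetical vertices
    c = (1.0, 2.0)                                     # hypothetical functional
    print(max_over_vertices(c, triangle), max_over_samples(c, triangle))
```

This is the finite-dimensional fact behind vertex-based methods in linear programming: only the extreme points need to be examined to find the maximum of a linear objective over a compact convex polygon.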
Next we give a measure-theoretic version of the Krein-Milman theo-
rem. To begin, we define the concept of a representing measure.
DEFINITION 10.9 Representing Measure
Let $K$ be a compact convex subset of a topological linear space $\Omega$. A
measure $\mu \in M(K)$ is said to be a representing measure for the
element $x$ if $\mu$ is nonnegative, $\mu(K) = 1$, and
$$\langle x, x^* \rangle = \int_K \langle y, x^* \rangle \, d\mu(y)$$
for each $x^* \in \Omega^*$.
It is easy to see that in a Hausdorff topological linear space, $x$ can be
written as a convex combination
$$x = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$$
if and only if the measure
$$\mu = \alpha_1 \delta_{x_1} + \alpha_2 \delta_{x_2} + \cdots + \alpha_n \delta_{x_n}$$
is a representing measure for $x$. This suggests that a representing measure is
a kind of generalized convex combination. As we will now see, the
Krein-Milman theorem shows that every element of a compact convex subset
of a Hausdorff locally convex topological linear space is, in that sense, a
generalized convex combination of extreme points.
THEOREM 10.15 Krein-Milman Theorem (Measure-Theoretic Version)
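Let $\Omega$ be a Hausdorff locally convex topological linear space. If $K$ is a
nonempty compact convex subset of $\Omega$, then each $x \in K$ has a representing
measure $\mu$ such that
$$\mu(K \setminus \overline{\operatorname{ex} K}) = 0. \tag{10.29}$$
PROOF: Let $x \in K$. It follows from our first version of the Krein-Milman
theorem (Theorem 10.14) that $x$ is the limit of a net $\{x_\iota\}_{\iota \in I}$ contained
in $\operatorname{cov}(\operatorname{ex} K)$. Now each $x_\iota$ is a convex combination
$$x_\iota = \alpha_{\iota,1} x_{\iota,1} + \alpha_{\iota,2} x_{\iota,2} + \cdots + \alpha_{\iota,n_\iota} x_{\iota,n_\iota},$$
where $x_{\iota,1}, x_{\iota,2}, \ldots, x_{\iota,n_\iota} \in \operatorname{ex} K$. Hence, $x_\iota$ has the representing measure
$$\mu_\iota = \alpha_{\iota,1} \delta_{x_{\iota,1}} + \alpha_{\iota,2} \delta_{x_{\iota,2}} + \cdots + \alpha_{\iota,n_\iota} \delta_{x_{\iota,n_\iota}}.$$
By Alaoglu's theorem (page 615), the net of measures $\{\mu_\iota\}_{\iota \in I}$ has a
subnet $\{\mu_{\iota_\eta}\}_{\eta \in H}$ which is weak* convergent to a measure $\mu$ in $M(K)$. And,
by Exercise 10.53 on page 617, $\mu$ is a probability measure. That $\mu$ is a
representing measure for $x$ follows from the fact that
$$\langle x, x^* \rangle = \lim \langle x_{\iota_\eta}, x^* \rangle = \lim \int_K \langle y, x^* \rangle \, d\mu_{\iota_\eta}(y) = \int_K \langle y, x^* \rangle \, d\mu(y)$$
for each $x^* \in \Omega^*$.
It remains to verify (10.29). By the regularity of $\mu$, it is enough to
show that $\mu(F) = 0$ whenever $F$ is a compact subset of $K \setminus \overline{\operatorname{ex} K}$. We apply
Urysohn's lemma to obtain a continuous function $g: K \to [0,1]$ such that
$g$ vanishes on $\overline{\operatorname{ex} K}$ and is constantly 1 on $F$. Since each of the measures $\mu_{\iota_\eta}$
satisfies (10.29), it follows that $\mu(F) \le \int_K g \, d\mu = \lim \int_K g \, d\mu_{\iota_\eta} = 0$.
Remark: We emphasize that the essential point of Theorem 10.15 is not
simply the existence of a representing measure for each point of $K$, but
the existence of a representing measure concentrated on $\overline{\operatorname{ex} K}$, in the sense
of (10.29).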
The Trigonometric Moment Problem
The remainder of this section is devoted to a rather elaborate application
of the measure-theoretic version of the Krein-Milman theorem. We will use
that theorem to solve the classical trigonometric moment problem:†
Given a doubly infinite sequence of complex numbers $\{f(n)\}_{n=-\infty}^\infty$,
find necessary and sufficient conditions for the existence of a
measure $\mu$ such that
$$f(n) = \int_{[0,2\pi)} e^{int} \, d\mu(t), \qquad n \in \mathcal{Z}. \tag{10.30}$$
In what follows, we assume that the space $\ell^\infty(\mathcal{Z})$ of bounded sequences
is equipped with the weak* topology, where we recall that $\ell^\infty(\mathcal{Z})$ is the dual
space of $\ell^1(\mathcal{Z})$. (Refer to Exercise 9.61 on page 562.)
† See "Moments in Mathematics," H.J. Landau (ed.), Proceedings of Symposia in
Applied Mathematics, 37 (American Mathematical Society, 1987), for an introduction
to the voluminous literature on problems of this type.
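We first derive a necessary condition for the existence of a measure $\mu$
satisfying (10.30). Consider the complex numbers
$$\lambda_{-n}, \lambda_{-n+1}, \ldots, \lambda_{-1}, \lambda_0, \lambda_1, \ldots, \lambda_n.$$
It follows from (10.30) that
$$\sum_{j,k=-n}^{n} \lambda_j \overline{\lambda_k} f(j-k) = \int_{[0,2\pi)} \Bigl| \sum_{j=-n}^{n} \lambda_j e^{ijt} \Bigr|^2 \, d\mu(t).$$
Hence, we see that a necessary condition for the existence of a measure $\mu$
satisfying (10.30) is that
$$\sum_{j,k=-n}^{n} \lambda_j \overline{\lambda_k} f(j-k) \ge 0, \qquad \{\lambda_j\}_{j=-n}^{n} \subset \mathbb{C},\ n \in \mathcal{N}. \tag{10.31}$$
Sequences satisfying (10.31) are called nonnegative definite. So, we have
proved the necessity part of the following theorem.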
THEOREM 10.16
Let $\{f(n)\}_{n=-\infty}^\infty$ be a doubly infinite sequence of complex numbers. A
necessary and sufficient condition for the existence of a measure $\mu$ such
that
$$f(n) = \int_{[0,2\pi)} e^{int} \, d\mu(t), \qquad n \in \mathcal{Z},$$
is that $\{f(n)\}_{n=-\infty}^\infty$ is nonnegative definite.
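The necessity half of Theorem 10.16 can be checked numerically for a measure with finitely many atoms. The sketch below is illustrative, not part of the text: it assumes a discrete measure $\mu = \sum_k w_k \delta_{t_k}$ with nonnegative weights, and the function names are hypothetical. It evaluates the quadratic form $\sum_{j,k} \lambda_j \overline{\lambda_k} f(j-k)$ and compares it with the integral of $|\sum_j \lambda_j e^{ijt}|^2$.

```python
import cmath


def moments(weights, angles):
    """Return f with f(n) = sum_k weights[k] * exp(i * n * angles[k])."""
    def f(n):
        return sum(w * cmath.exp(1j * n * t) for w, t in zip(weights, angles))
    return f


def quadratic_form(f, lam):
    """sum over j, k of lam[j] * conj(lam[k]) * f(j - k); lam maps index -> complex."""
    return sum(lam[j] * lam[k].conjugate() * f(j - k) for j in lam for k in lam)


def integral_form(weights, angles, lam):
    """Integral of |sum_j lam[j] * exp(i*j*t)|^2 against the discrete measure."""
    return sum(
        w * abs(sum(c * cmath.exp(1j * j * t) for j, c in lam.items())) ** 2
        for w, t in zip(weights, angles)
    )


if __name__ == "__main__":
    weights = [0.5, 1.5, 2.0]          # hypothetical nonnegative point masses
    angles = [0.3, 2.0, 5.1]           # hypothetical atoms in [0, 2*pi)
    lam = {-1: 1 + 2j, 0: -0.5j, 1: 3.0 + 0j}
    f = moments(weights, angles)
    print(quadratic_form(f, lam), integral_form(weights, angles, lam))
```

The two printed values should agree up to rounding, with the quadratic form real and nonnegative. The converse direction, producing a measure from a nonnegative definite sequence, is the substantive part of the theorem and is not something a finite check can establish.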
To establish sufficiency in Theorem 10.16, we first introduce some
notation. Let $D$ and $D_1$ denote, respectively, the collection of nonnegative
definite sequences and the collection of nonnegative definite sequences that
are 1 at the 0th position. The following basic properties of $D$ and $D_1$ are
left to the reader as Exercise 10.72.
(P1) If $f \in D$, then $|f(n)| \le f(0)$ for all $n \in \mathcal{Z}$.
(P2) If $f \in D$, then $f(-n) = \overline{f(n)}$ for all $n \in \mathcal{Z}$.
(P3) $f \in D$ and $\alpha \ge 0 \Rightarrow \alpha f \in D$.
(P4) If $w \in \mathbb{C}$ with $|w| = 1$, then the sequence defined by $e_w(n) = w^n$
is in $D_1$.
(P5) If $w \in \mathbb{C}$ with $|w| = 1$, then $e_w \in \operatorname{ex} D_1$.
(P6) $D$ is a weak* closed subset of $\ell^\infty(\mathcal{Z})$.
(P7) $D_1$ is a convex, weak* compact subset of $\ell^\infty(\mathcal{Z})$.
(P8) If $T = \{\, z \in \mathbb{C} : |z| = 1 \,\}$ and $D_1$ has the weak* topology, then the
function $E: T \to D_1$ defined by $E(w) = e_w$ is continuous.
We note that (P1) and (P3) imply that every element of $D$ is a
nonnegative scalar multiple of an element of $D_1$. Therefore, we need only establish
sufficiency in Theorem 10.16 for elements of $D_1$.
The crucial assertion for what follows is that
$$\operatorname{ex} D_1 = E(T). \tag{10.32}$$
Suppose, for the moment, that we have established (10.32), and let $f \in D_1$.
By (P8) and (10.32), $\operatorname{ex} D_1$ is weak* closed. Applying Theorem 10.15, we
obtain a representing measure $\nu_f$ for $f$ such that $\nu_f(D_1 \setminus \operatorname{ex} D_1) = 0$. Thus,
we have
$$\ell(f) = \int_{\operatorname{ex} D_1} \ell(g) \, d\nu_f(g)$$
for each weak* continuous linear functional $\ell$.
Define $\tilde{E}: [0, 2\pi) \to \operatorname{ex} D_1$ by $\tilde{E}(t) = E(e^{it})$. Because $\tilde{E}$ is continuous,
one-to-one, and onto, the set function $\mu$ defined by
$$\mu(A) = \nu_f(\tilde{E}(A)), \qquad A \in \mathcal{B}([0, 2\pi)),$$
is a measure. It now follows from Theorem 6.17 on page 404 that
$$\ell(f) = \int_{[0,2\pi)} \ell(\tilde{E}(t)) \, d\mu(t).$$
If, for each $n \in \mathcal{Z}$, we apply the previous equation to the linear functional
that evaluates sequences at the integer $n$, we obtain
$$f(n) = \int_{[0,2\pi)} \tilde{E}(t)(n) \, d\mu(t) = \int_{[0,2\pi)} e^{int} \, d\mu(t), \qquad n \in \mathcal{Z},$$
as required.
It remains to show that (10.32) is valid. In view of (P5), we can
conclude that $E(T) \subset \operatorname{ex} D_1$, but the reverse inclusion is more difficult to
prove. Let $f \in \operatorname{ex} D_1$. For a fixed but arbitrary integer $m$, we define the
sequence
$$h(k) = i\bigl( f(k+m) f(-m) - f(k-m) f(m) \bigr), \qquad k \in \mathcal{Z}.$$
We claim that
$$f \pm h/2 \in D_1. \tag{10.33}$$
By (P3) and the fact that $h(0) = 0$, we see that proving (10.33) is equivalent
to showing that $2f \pm h \in D$.
Let $n \in \mathcal{N}$ and $\{\lambda_j\}_{j=-n}^{n} \subset \mathbb{C}$. Setting $\lambda_j = 0$ for $|j| > n$, the
condition for nonnegative definiteness can be expressed as
$$\sum_{j,k} \lambda_j \overline{\lambda_k} \bigl( 2f(j-k) \pm h(j-k) \bigr) \ge 0, \tag{10.34}$$
where $\sum_{j,k}$ indicates summation over all integers $j$ and $k$. Let $S$ denote
the left-hand side of (10.34) and note that $S$ is real since $h(k) = \overline{h(-k)}$ for
all $k \in \mathcal{Z}$. Then, using (10.31), (P1), and (P2), we obtain
$$S \ge \sum_{j,k} \lambda_j \overline{\lambda_k} f(j-k) \pm \sum_{j,k} i \lambda_j \overline{\lambda_k}\, \overline{f(m)} f(j-k+m) \mp \sum_{j,k} i f(m) \lambda_j \overline{\lambda_k} f(j-k-m) + \sum_{j,k} f(m) \lambda_j \overline{f(m) \lambda_k} f(j-k). \tag{10.35}$$
On the right-hand side of (10.35), we replace $k$ by $k+m$ in the second sum,
$j$ by $j+m$ in the third sum, and both $j$ by $j+m$ and $k$ by $k+m$ in the
fourth sum to obtain
$$S \ge \sum_{j,k} \lambda_j \overline{\lambda_k} f(j-k) \pm \sum_{j,k} i \lambda_j \overline{\lambda_{k+m} f(m)} f(j-k) \mp \sum_{j,k} i \lambda_{j+m} f(m) \overline{\lambda_k} f(j-k) + \sum_{j,k} f(m) \lambda_{j+m} \overline{f(m) \lambda_{k+m}} f(j-k)$$
$$= \sum_{j,k} \bigl( \lambda_j \mp i \lambda_{j+m} f(m) \bigr) \overline{\bigl( \lambda_k \mp i \lambda_{k+m} f(m) \bigr)} f(j-k).$$
Because $f$ is nonnegative definite, it follows that $S \ge 0$. Therefore, $2f \pm h$ is nonnegative definite and, so, (10.33) holds.
Now, we have
$$f = (1/2)(f + h/2) + (1/2)(f - h/2).$$
Because $f \in \operatorname{ex} \mathcal{D}_1$, it follows that $h = 0$ and, therefore,
$$f(k+m) f(-m) = f(k-m) f(m), \qquad k, m \in \mathbb{Z}. \tag{10.36}$$
We will use (10.36) to show that $f \in E(T)$.
First we observe that there is a smallest positive integer $n$ such that $f(n) \ne 0$. For otherwise, it follows from (P2) that
$$f(k) = \begin{cases} 1, & \text{if } k = 0; \\ 0, & \text{if } k \ne 0. \end{cases}$$
But, as the reader is asked to verify in Exercise 10.73, this sequence is not an extreme point of $\mathcal{D}_1$, a contradiction.
Let $b = |f(n)|$ and $f(n) = be^{it}$. Any integer has the form $m = sn + r$, where $s$ and $r$ are integers and $0 \le r < n$. Thus, from (P2) and (10.36), we have $f(sn+r)\,be^{-it} = f((s-2)n+r)\,be^{it}$ or, equivalently,
$$f(sn+r) = f((s-2)n+r)\,e^{2it}.$$
It follows that, when $s \ge 0$,
$$f(sn+r) = \begin{cases} e^{ist} f(r), & \text{if } s \text{ is even}; \\ e^{i(s+1)t} f(r-n), & \text{if } s \text{ is odd}. \end{cases}$$
Recalling that $f(q) = 0$ when $0 < q < n$ and using (P2) again, we obtain
$$f(sn+r) = \begin{cases} e^{ist}, & \text{if } s \text{ is even and } r = 0; \\ 0, & \text{if } 0 < r < n; \\ be^{ist}, & \text{if } s \text{ is odd and } r = 0 \end{cases} \tag{10.37}$$
for an arbitrary integer $s$.
Next we make use of a well-known property of roots of unity, namely,
$$n^{-1} \sum_{k=0}^{n-1} \omega^{km} = \begin{cases} 1, & \text{if } n \text{ divides } m; \\ 0, & \text{if } n \text{ does not divide } m, \end{cases} \tag{10.38}$$
where $\omega = e^{2\pi i/n}$. It follows from (10.37) and (10.38) that
$$f(m) = 2^{-1}(1+b)\, n^{-1} \sum_{k=0}^{n-1} \omega^{km} e^{imt/n} + 2^{-1}(1-b)\, n^{-1} \sum_{k=0}^{n-1} e^{im\pi/n} \omega^{km} e^{imt/n}.$$
Thus, we have written $f$ as a convex combination of sequences of the forms $(\omega^k e^{it/n})^m$ and $(e^{i\pi/n} \omega^k e^{it/n})^m$. Because $f \in \operatorname{ex} \mathcal{D}_1$, it follows that $b = n = 1$ and $f(m) = e^{imt}$. We have shown that $f \in E(T)$. Hence (10.32) is valid.
EXERCISES 10.5
10.62 Show that a convex set contains any convex combination of its elements.
10.63 Show that $x$ is an extreme point of the convex set $C$ if and only if it satisfies the following condition: $x = \alpha u + (1-\alpha)v$, where $0 < \alpha < 1$ and $u, v \in C$, implies $u = v = x$.
10.64 Prove parts (a)-(d) of Proposition 10.7 on page 621.
10.65 Prove assertions (10.25)-(10.28) on page 622.
10.66 Why does it suffice to prove the Krein-Milman theorem (Theorem 10.14 on page 622) in the case of real scalars?
10.67 Prove parts (b) and (c) of Corollary 10.8 on page 623.
10.6	8 Verify that the coefficient functionals in Example 10.17(b) on page 6g3 are
continuous.
10.69 Let $\Omega$ be a compact Hausdorff space and $P(\Omega)$ the collection of probability measures on $\Omega$, that is, the set $\{\mu \in M_+(\Omega) : \mu(\Omega) = 1\}$. By Exercise 10.53 on page 617, $P(\Omega)$ is weak* compact, and it is easy to see that $P(\Omega)$ is convex. Show that $\operatorname{ex} P(\Omega) = \{ \delta_x : x \in \Omega \}$.
10.70 Let $K$ be a compact convex subset of a locally convex topological linear space and set $P(K) = \{\mu \in M_+(K) : \mu(K) = 1\}$. Show that each $\mu \in P(K)$ is the representing measure for some point of $K$. Hint: See Exercise 10.69.
10.71 Consider the space $\mathcal{L}^1(\mathcal{R})$.
a) Show that $\overline{B}_1(0)$ has no extreme points.
b) Deduce that Theorem 9.12 on page 559 fails in the case $p = \infty$.
10.72 Prove assertions (P1)-(P8) on pages 626-627.
10.73 Define
$$f(n) = \begin{cases} 1, & \text{if } n = 0; \\ 0, & \text{if } n \ne 0. \end{cases}$$
Show that $f \notin \operatorname{ex} \mathcal{D}_1$.
10.74 Let $\Omega$ be a compact Hausdorff space and $f: \Omega \to \Omega$ be continuous. A measure $\mu \in M(\Omega)$ is said to be invariant with respect to $f$ if
$$\mu(f^{-1}(A)) = \mu(A), \qquad A \in \mathcal{B}(\Omega).$$
Let $\mathcal{I}_f$ denote the collection of all probability measures in $M(\Omega)$ that are invariant with respect to $f$.
a) Show that $\mathcal{I}_f$ is convex and weak* compact.
b) Suppose that $\mu \in \operatorname{ex} \mathcal{I}_f$. Prove that if $A \in \mathcal{B}(\Omega)$ and $f^{-1}(A) \subset A$, then $\mu(A) \in \{0, 1\}$. Measures satisfying this condition are called ergodic. Hint: Refer to Theorem 6.17 on page 404.

PART FOUR □ Harmonic Analysis and Dynamical Systems
Ingrid Daubechies (1954- )
Ingrid Daubechies was born in Belgium on August 17, 1954, and is now a naturalized U.S. citizen. She received her bachelor's and PhD degrees in physics (in 1975 and 1980) from the Free University in Brussels, Belgium, and remained there in a research position until 1987. From 1987 to 1994, she was a member of the technical staff at AT&T Bell Laboratories, taking leaves to spend six months in 1990 at the University of Michigan and two years, from 1991-1993, at Rutgers University.
Daubechies is an elected member of the National Academy of Sci-
ences and the American Academy of Arts and Sciences. She was awarded
the 1994 Leroy P. Steele prize for exposition for her book Ten Lectures on Wavelets and the 1997 Ruth Lyttle Satter Prize by the American Mathematical Society. Her many editorial activities include editor-in-chief for Applied and Computational Harmonic Analysis (Academic Press).
The simplest example of what is now known as a wavelet family was discovered in 1909 by A. Haar. The usefulness of Haar's wavelets is limited, however, because they are discontinuous. Daubechies made a major contribution to wavelet theory when she found generalizations of Haar wavelets that, in addition to being highly regular, are considerably more effective for representing functions.
Daubechies is currently at the Mathematics Department and the Pro-
gram in Applied and Computational Mathematics at Princeton University.
11 □ Elements of Harmonic Analysis
Much of the subject matter of real analysis emerged from attempts by
various mathematicians to deal with problems associated with the idea,
developed by Jean Baptiste Joseph Fourier, of expanding a function in a
series of the form
$$f(x) = a_0 + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx). \tag{11.1}$$
This expansion can be thought of as a decomposition of f into an infi-
nite sum of harmonic (oscillatory) terms. Thus one speaks of (11.1) as a
harmonic analysis of the function f.
In this chapter we will investigate the meaning of (11.1) and some
of its many variations using ideas that we have explored and results that
we have obtained in previous chapters. Sections 11.1 and 11.2 deal with
properties of Fourier series. In Sections 11.3 and 11.4, we investigate the
Fourier transform. Sections 11.4 and 11.5 are devoted to wavelet expan-
sions, analogues of Fourier expansions that have received much attention
in recent years.
11.1 INTRODUCTION TO FOURIER SERIES
Recall that a complex-valued function $f$ on $\mathcal{R}$ is said to be periodic with period $p$ if
$$f(x + p) = f(x), \qquad x \in \mathcal{R}.$$
In most cases of interest, there is a smallest positive period, called the basic period of $f$. (See Exercise 11.1.) The reciprocal of the basic period is called the frequency of the periodic function. For convenience, and because it involves no real loss of generality, we restrict our treatment to functions having period $2\pi$. (See Exercise 11.2.)
Harmonic analysis, often called Fourier analysis after its main founder, attempts to understand complicated periodic functions in terms of simple ones. Specifically, it was Fourier's idea to try to represent a function with period $2\pi$ as a series of the form (11.1). The formula (11.1) yields the function $f$ as a sum of simple oscillating terms, that is, sine and cosine terms whose frequencies form the arithmetic progression $1/2\pi$, $2/2\pi$, $3/2\pi$, .... The main purpose of this and the next section is to explore the meaning and ramifications of (11.1).
Using the formulas
$$2\cos x = e^{ix} + e^{-ix} \qquad \text{and} \qquad 2i \sin x = e^{ix} - e^{-ix},$$
we can recast (11.1) in the more compact form
$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}. \tag{11.2}$$
Assuming that the series converges rapidly enough so that the integral and sum can be interchanged, we obtain
$$\int_{-\pi}^{\pi} f(x) e^{-inx}\, dx = \sum_{k=-\infty}^{\infty} c_k \int_{-\pi}^{\pi} e^{i(k-n)x}\, dx.$$
The integral on the right equals $2\pi$ when $k = n$ and $0$ otherwise; hence, $c_n = (2\pi)^{-1} \int_{-\pi}^{\pi} f(x) e^{-inx}\, dx$. This shows how the coefficients $c_n$, $n = 0, \pm 1, \pm 2, \ldots$, can be calculated explicitly from the function $f$ and serves as motivation for the following definition.
DEFINITION 11.1 Fourier Coefficients, Transform, and Series
For $f \in \mathcal{L}^1([-\pi, \pi])$, the function $\hat{f}: \mathbb{Z} \to \mathbb{C}$ defined by
$$\hat{f}(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\, dx$$
is called the Fourier transform of $f$. We refer to the number $\hat{f}(n)$ as the nth Fourier coefficient of $f$ and to the expression
$$\sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx}$$
as the Fourier series of $f$.
The Fourier series of $f$ is said to converge at $x$ if the sequence of partial sums
$$\sum_{k=-n}^{n} \hat{f}(k) e^{ikx}, \qquad n \in \mathcal{N},$$
has a finite limit $s(x)$. Convergence to $s(x)$ will often be indicated by $s(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx}$. The Fourier series of $f$ is said to converge in the norm $\|\ \|$ to the function $s$ if the aforementioned partial sums converge to $s$ with respect to $\|\ \|$.
EXAMPLE 11.1 Illustrates Definition 11.1
a) Let $f(x) = \sin mx$, where $m \in \mathcal{N}$. Then $\hat{f}(\pm m) = \pm 1/2i$ and $\hat{f}(n) = 0$ otherwise. The Fourier series of $f$ converges to $f(x)$ for all $x$. In fact, the partial sums at $x$ equal $f(x)$ as soon as $n \ge m$.
b) Consider the function $f$ defined by
$$f(x) = \begin{cases} -\tfrac{1}{2}, & \text{if } x \in [-\pi, 0); \\ \tfrac{1}{2}, & \text{if } x \in [0, \pi]. \end{cases}$$
An easy calculation shows $\hat{f}(0) = 0$ and $\hat{f}(n) = i((-1)^n - 1)/2\pi n$ if $n \ne 0$. We will see later that the Fourier series of $f$ converges to $f(x)$ at points of $(-\pi, 0) \cup (0, \pi)$ and to $0$ at $-\pi$, $0$, and $\pi$. □
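The coefficient formula in Example 11.1(b) can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `step`, `fourier_coefficient`, and `closed_form` are hypothetical, and the integral in Definition 11.1 is approximated by a midpoint rule applied separately on $[-\pi, 0]$ and $[0, \pi]$ so that no sample straddles the jump.

```python
import cmath
import math

def step(x):
    """The function of Example 11.1(b): -1/2 on [-pi, 0), 1/2 on [0, pi]."""
    return -0.5 if x < 0 else 0.5

def fourier_coefficient(f, n, steps=2000):
    """Approximate (1/2pi) * integral over [-pi, pi] of f(x) e^{-inx} dx.

    Uses the midpoint rule on each half-interval (an illustrative choice).
    """
    total = 0j
    for a, b in ((-math.pi, 0.0), (0.0, math.pi)):
        h = (b - a) / steps
        for j in range(steps):
            x = a + (j + 0.5) * h
            total += f(x) * cmath.exp(-1j * n * x) * h
    return total / (2 * math.pi)

def closed_form(n):
    """The value i((-1)^n - 1)/(2 pi n) stated in the example (0 when n = 0)."""
    if n == 0:
        return 0j
    return 1j * ((-1) ** n - 1) / (2 * math.pi * n)

if __name__ == "__main__":
    for n in range(-3, 4):
        print(n, fourier_coefficient(step, n), closed_form(n))
```

The two columns printed should agree to several decimal places; the even-indexed coefficients vanish, as the factor $(-1)^n - 1$ predicts.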
The following theorem provides some basic properties of Fourier transforms of functions in $\mathcal{L}^1([-\pi, \pi])$.
THEOREM 11.1
Let $f \in \mathcal{L}^1([-\pi, \pi])$. Then
a) $\hat{f} \in C_0(\mathbb{Z})$.
b) $\|\hat{f}\|_\infty \le \|f\|_1/2\pi$.
c) $\hat{f} = 0 \Rightarrow f = 0$ ae.
PROOF: We leave the proof of part (a) to the reader as Exercise 11.3. Part (b) follows from the inequality
$$|\hat{f}(n)| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) e^{-inx}|\, dx = \frac{\|f\|_1}{2\pi}.$$
Example 9.12 on page 550 shows that part (c) holds whenever $f$ is a function in $\mathcal{L}^2([-\pi, \pi])$. The same argument remains valid, however, if $\mathcal{L}^2([-\pi, \pi])$ is replaced by $\mathcal{L}^1([-\pi, \pi])$ and $\|\ \|_2$ is replaced by $\|\ \|_1$.
Any function with period $2\pi$ is completely determined by its values on the interval $(-\pi, \pi]$. Conversely, a function $f$ defined on $(-\pi, \pi]$ extends uniquely to a periodic function on all of $\mathcal{R}$ via
$$f(x) = f(x - 2\pi k), \qquad x \in ((2k-1)\pi, (2k+1)\pi], \quad k \in \mathbb{Z}. \tag{11.3}$$
A continuous function $f$ on $[-\pi, \pi]$ extends to a continuous periodic function via (11.3) if and only if $f(-\pi) = f(\pi)$.
We will use $C_{2\pi}$ to denote the space of continuous complex-valued functions having period $2\pi$. And, unless stated otherwise, we assume that $C_{2\pi}$ is equipped with the supremum norm,
$$\|f\| = \|f\|_{C_{2\pi}} = \|f\|_\infty = \sup\{ |f(x)| : x \in [-\pi, \pi] \}.$$
The normed space $C_{2\pi}$ is identified via (11.3) with a closed subspace of $C([-\pi, \pi])$, hence it is a Banach space.
Similarly, for $1 \le p \le \infty$, $\mathcal{L}^p_{2\pi}$ denotes the space of complex-valued Lebesgue measurable functions on $\mathcal{R}$ with period $2\pi$ whose restrictions to $[-\pi, \pi]$ are in $\mathcal{L}^p([-\pi, \pi])$. We assume that $\mathcal{L}^p_{2\pi}$ has norm
$$\|f\|_p = \begin{cases} \left( \int_{-\pi}^{\pi} |f(x)|^p\, dx \right)^{1/p}, & \text{if } 1 \le p < \infty; \\ \inf\{ M : |f| \le M \text{ ae} \}, & \text{if } p = \infty. \end{cases}$$
Since $\mathcal{L}^p_{2\pi}$ is identified via (11.3) with the space $\mathcal{L}^p([-\pi, \pi])$, it is, according to Riesz's theorem (page 557), a Banach space.
For a function $f$ on $\mathcal{R}$, let $f_y$ denote the translated function defined by $f_y(x) = f(x - y)$. If $f \in \mathcal{L}^p_{2\pi}$, then so is $f_y$ and, likewise, if $f \in C_{2\pi}$, then so is $f_y$. Thus we say that the spaces $C_{2\pi}$ and $\mathcal{L}^p_{2\pi}$ are translation invariant. Furthermore, it follows from Exercise 11.2 that $\|f_y\|_p = \|f\|_p$.
Some important properties of the spaces $C_{2\pi}$ and $\mathcal{L}^p_{2\pi}$ are given in the following proposition. Its proof is left to the reader as Exercise 11.4.
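PROPOSITION 11.1
a) For $p \ge 1$, we have $C_{2\pi} \subset \mathcal{L}^\infty_{2\pi} \subset \mathcal{L}^p_{2\pi} \subset \mathcal{L}^1_{2\pi}$.
b) For $1 \le p < \infty$, $C_{2\pi}$ is dense in $\mathcal{L}^p_{2\pi}$ with respect to the norm $\|\ \|_p$.
c) For $1 \le p < \infty$, we have $(\mathcal{L}^p_{2\pi})^* = \mathcal{L}^q_{2\pi}$, where $1/p + 1/q = 1$. Specifically, $\ell$ is a continuous linear functional on $\mathcal{L}^p_{2\pi}$ if and only if there exists a function $g \in \mathcal{L}^q_{2\pi}$ such that
$$\ell(f) = \int_{-\pi}^{\pi} f(x) g(x)\, dx, \qquad f \in \mathcal{L}^p_{2\pi}.$$
Furthermore, $\|\ell\|_* = \|g\|_q$.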
d) For f G £^ and у € H, we have
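e) Each function in $C_{2\pi}$ is uniformly continuous on $\mathcal{R}$.
f) For $f \in \mathcal{L}^1_{2\pi}$ and $y \in \mathcal{R}$, we have
$$\widehat{f_y}(n) = e^{-iny} \hat{f}(n), \qquad n \in \mathbb{Z}.$$
For $f \in \mathcal{L}^1_{2\pi}$, let $S_n(f)$ denote the nth partial sum of the Fourier series of $f$:
$$S_n(f)(x) = \sum_{k=-n}^{n} \hat{f}(k) e^{ikx}. \tag{11.4}$$
It is easy to see that (11.4) defines a linear operator $S_n$ on $\mathcal{L}^1_{2\pi}$ with range in $C_{2\pi}$. Because $C_{2\pi} \subset \mathcal{L}^p_{2\pi} \subset \mathcal{L}^1_{2\pi}$, it follows that $S_n$ also maps $\mathcal{L}^p_{2\pi}$ into $\mathcal{L}^p_{2\pi}$. The following proposition describes some essential properties of $S_n$.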
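PROPOSITION 11.2
Define
$$D_n(x) = \sum_{k=-n}^{n} e^{ikx}, \qquad n = 0, 1, 2, \ldots.$$
Then the following hold.
a) We have
$$D_n(x) = \begin{cases} \dfrac{\sin((n + 1/2)x)}{\sin(x/2)}, & \text{if } x/2\pi \notin \mathbb{Z}; \\ 2n + 1, & \text{if } x/2\pi \in \mathbb{Z}. \end{cases}$$
b) $(2\pi)^{-1} \int_{-\pi}^{\pi} D_n(t)\, dt = 1$.
c) $D_n(-x) = D_n(x)$ for each $x \in \mathcal{R}$.
d) If $f \in \mathcal{L}^1_{2\pi}$, then $S_n(f)(x) = (2\pi)^{-1} \int_{-\pi}^{\pi} f(t) D_n(x - t)\, dt$ for each $x \in \mathcal{R}$.
e) If $f$ is a trigonometric polynomial, then $S_n(f) = f$ whenever $n \ge \deg f$.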
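PROOF: We prove part (a) and leave the remaining parts to the reader as Exercise 11.5. If $x/2\pi \in \mathbb{Z}$, then $e^{ikx} = 1$ for each integer $k$ and, consequently, $D_n(x) = 2n + 1$.
If $x/2\pi \notin \mathbb{Z}$, then using $\overline{e^{ikx}} = e^{-ikx}$ and the formula for a geometric sum, we have
$$\begin{aligned} D_n(x) &= \sum_{k=0}^{n} e^{ikx} + \sum_{k=1}^{n} e^{-ikx} = 2\,\Re\!\left( \sum_{k=0}^{n} e^{ikx} \right) - 1 = 2\,\Re\!\left( \frac{e^{i(n+1)x} - 1}{e^{ix} - 1} \right) - 1 \\ &= 2\,\Re\!\left( \frac{e^{i(n+1/2)x} - e^{-ix/2}}{e^{ix/2} - e^{-ix/2}} \right) - 1 = \frac{\sin((n + 1/2)x)}{\sin(x/2)}, \end{aligned}$$
as required.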
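As a quick sanity check of Proposition 11.2(a), the defining sum and the closed form can be compared numerically. This is an illustrative sketch only; the function names `dirichlet_sum` and `dirichlet_closed` and the tolerance used to detect multiples of $2\pi$ are assumptions made for the example.

```python
import cmath
import math

def dirichlet_sum(n, x):
    """D_n(x) computed directly from the definition: sum of e^{ikx}, |k| <= n."""
    return sum(cmath.exp(1j * k * x) for k in range(-n, n + 1)).real

def dirichlet_closed(n, x):
    """D_n(x) from the closed form in Proposition 11.2(a)."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:  # x is (numerically) a multiple of 2*pi
        return 2 * n + 1.0
    return math.sin((n + 0.5) * x) / s

if __name__ == "__main__":
    for n in (0, 1, 5):
        for x in (0.0, 0.3, -1.7, 3.0):
            print(n, x, dirichlet_sum(n, x), dirichlet_closed(n, x))
```

The imaginary parts of the terms cancel in pairs, which is why only the real part of the sum is kept.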
The function $D_n$, often called the Dirichlet kernel, changes sign more and more rapidly as $n$ increases. This is a main reason why so little can be said about the behavior of the sequence $\{S_n(f)\}_{n=0}^{\infty}$ of partial sums of the Fourier series of a function $f$ unless special conditions are imposed. However, the corresponding sequence of averages
$$A_n(f) = \frac{1}{n} \sum_{k=0}^{n-1} S_k(f) \tag{11.5}$$
satisfies a formula similar to Proposition 11.2(d), where $D_n$ is replaced by a nonnegative function. This tends to make $\{A_n(f)\}_{n=1}^{\infty}$ a more tractable sequence than $\{S_n(f)\}_{n=0}^{\infty}$.
Clearly, (11.5) defines a linear operator on $\mathcal{L}^1_{2\pi}$ with range in $C_{2\pi}$. Proposition 11.3 presents some basic properties of $A_n$.
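PROPOSITION 11.3
Define
$$F_n(x) = \frac{1}{n} \sum_{k=0}^{n-1} D_k(x), \qquad n \in \mathcal{N}.$$
Then the following hold.
a) We have
$$F_n(x) = \begin{cases} \dfrac{1}{n} \left( \dfrac{\sin(nx/2)}{\sin(x/2)} \right)^2, & \text{if } x/2\pi \notin \mathbb{Z}; \\ n, & \text{if } x/2\pi \in \mathbb{Z}. \end{cases}$$
b) $(2\pi)^{-1} \int_{-\pi}^{\pi} F_n(t)\, dt = 1$.
c) $F_n(-x) = F_n(x)$ for each $x \in \mathcal{R}$.
d) For each $\delta \in (0, \pi)$, $\lim_{n\to\infty} \sup\{ F_n(x) : \delta \le |x| \le \pi \} = 0$.
e) If $f \in \mathcal{L}^1_{2\pi}$, then $A_n(f)(x) = (2\pi)^{-1} \int_{-\pi}^{\pi} f(t) F_n(x - t)\, dt$ for each $x \in \mathcal{R}$.
f) If $1 \le p \le \infty$ and $f \in \mathcal{L}^p_{2\pi}$, then $\|A_n(f)\|_p \le \|f\|_p$.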
PROOF: The proofs of parts (a)-(e) are left to the reader as Exercise 11.6. To prove part (f), we argue as follows. If $p = \infty$, then, by part (e),
$$|A_n(f)(x)| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(t)| F_n(x - t)\, dt \le \frac{\|f\|_\infty}{2\pi} \int_{-\pi}^{\pi} F_n(x - t)\, dt$$
for each $x \in \mathcal{R}$. Applying Proposition 11.2, parts (c) and (b), we obtain $|A_n(f)(x)| \le \|f\|_\infty$ for each $x \in \mathcal{R}$ and so $\|A_n(f)\|_\infty \le \|f\|_\infty$.
Now suppose $1 \le p < \infty$ and let $1/p + 1/q = 1$. For $x \in \mathcal{R}$, define the Borel measure $\mu$ on $[-\pi, \pi]$ by $\mu(B) = (2\pi)^{-1} \int_B F_n(x - t)\, d\lambda(t)$. It follows from parts (a), (b), (c), and Exercise 11.2 that $\mu$ is nonnegative and $\mu([-\pi, \pi]) = 1$. Furthermore, by part (e) and Exercise 4.61 on page 191, $A_n(f)(x) = \int_{[-\pi,\pi]} f(t)\, d\mu(t)$. Hence, by Hölder's inequality (page 556),
$$|A_n(f)(x)| \le \left( \int_{[-\pi,\pi]} |f(t)|^p\, d\mu(t) \right)^{1/p}.$$
Applying Exercise 4.61 again and using Fubini's theorem, we obtain
$$\|A_n(f)\|_p^p \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(t)|^p \int_{-\pi}^{\pi} F_n(x - t)\, dx\, dt = \|f\|_p^p.$$
Thus, part (f) is established.
The function $F_n$ introduced in Proposition 11.3, often referred to as Fejér's kernel, plays an essential role in Fourier analysis.
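The properties of Fejér's kernel listed in Proposition 11.3 lend themselves to a numerical spot check. The sketch below is illustrative and not from the original text: it compares the average of Dirichlet kernels with the closed form in part (a) and samples the supremum in part (d) on a finite grid; all function names and the grid size are assumptions.

```python
import math

def dirichlet(n, x):
    """Closed form of the Dirichlet kernel D_n(x) (Proposition 11.2(a))."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return 2 * n + 1.0
    return math.sin((n + 0.5) * x) / s

def fejer_average(n, x):
    """F_n(x) from its definition: the average of D_0, ..., D_{n-1}."""
    return sum(dirichlet(k, x) for k in range(n)) / n

def fejer_closed(n, x):
    """F_n(x) from the closed form in Proposition 11.3(a)."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return float(n)
    return (math.sin(n * x / 2) / s) ** 2 / n

def tail_sup(n, delta, samples=400):
    """Grid approximation of sup{F_n(x) : delta <= x <= pi} (F_n is even)."""
    step = (math.pi - delta) / samples
    return max(fejer_closed(n, delta + j * step) for j in range(samples + 1))

if __name__ == "__main__":
    print(fejer_average(7, 0.9), fejer_closed(7, 0.9))
    print(tail_sup(20, 0.5), tail_sup(200, 0.5))
```

Since $\sin^2(nx/2) \le 1$, the closed form gives the elementary bound $F_n(x) \le 1/(n \sin^2(\delta/2))$ for $\delta \le |x| \le \pi$, which is one way to see part (d); the sampled suprema shrink accordingly as $n$ grows.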
We conclude this section by introducing another important function in harmonic analysis, namely, the sinc function, defined by
$$\operatorname{sinc} x = \begin{cases} \dfrac{\sin x}{x}, & \text{for } x \ne 0; \\ 1, & \text{for } x = 0. \end{cases} \tag{11.6}$$
As Exercise 3.91 on page 160 shows, sinc is not Lebesgue integrable over $\mathcal{R}$, yet its improper Riemann integral exists. In particular, we can assert that
$$\lim_{b\to\infty} \int_b^{\infty} \operatorname{sinc} x\, dx = 0. \tag{11.7}$$
We will find (11.7) useful in the next section.
EXERCISES 11.1
11.1 Let $f$ be a complex-valued measurable function on $\mathcal{R}$ such that there are arbitrarily small positive numbers $p$ satisfying $f(x + p) = f(x)$ for almost all $x \in \mathcal{R}$. Show that $f$ is constant ae.
★11.2 Let $f$ be a complex-valued function on $\mathcal{R}$.
a) Show that $f$ has period $p$ if and only if the function defined by $f(px/2\pi)$ has period $2\pi$.
b) Show that if $f$ has period $p > 0$ and is integrable over every bounded interval of $\mathcal{R}$, then $\int_x^{x+p} f(t)\, dt$ is independent of $x$.
11.3 Prove Theorem 11.1(a), often referred to as the Riemann-Lebesgue lemma. Hint: Start with the case where $f$ is the characteristic function of a subinterval of $[-\pi, \pi]$.
11.4 Prove Proposition 11.1. Hint: For part (b), see Exercise 9.56 on page 562 and, for part (d), refer to Exercise 11.2.
11.5 Complete the proof of Proposition 11.2.
11.6 Prove parts (a)-(e) of Proposition 11.3.
11.7 Let $f$ be a complex-valued function defined on $\mathcal{R}$ and satisfying the following conditions: $f$ is not identically 0, is continuous at 0, has period $2\pi$, and $f(x + y) = f(x)f(y)$ for all $x, y \in \mathcal{R}$. Show that $f(x) = e^{inx}$ for some integer $n$.
★11.8 For $f, g \in \mathcal{L}^1_{2\pi}$, let the function $f * g$ be defined by
$$(f * g)(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x - y) g(y)\, dy.$$
Show that $f * g$ is well-defined ae and belongs to $\mathcal{L}^1_{2\pi}$. The function $f * g$ is called the convolution of $f$ and $g$. Observe that $S_n(f) = f * D_n$ and $A_n(f) = f * F_n$.
In Exercises 11.9-11.12, $f * g$ denotes the convolution product introduced in Exercise 11.8.
★11.9 Verify that the convolution product is commutative and associative, that is, prove that
a) $f * g = g * f$.
b) $(f * g) * h = f * (g * h)$.
+11.10 Show that if g €	then f * g 6
★11.11 Show that if f 6	has a derivative at all points of f' 6 and
9 € then (f * g)f exists and equals (/') * g.
★11.12 Prove that $\widehat{f * g} = \hat{f}\hat{g}$.
Exercises 11.13-11.16 are concerned with extending the definition of the Fourier coefficient to measures. A measure $\mu \in M([-\pi, \pi])$ is said to be periodic if $\mu(\{-\pi\}) = \mu(\{\pi\})$. The set of all periodic measures is a closed subspace of the Banach space $M([-\pi, \pi])$. The Fourier coefficients of a measure $\mu \in M_{2\pi}$ are defined by the formula
$$\hat{\mu}(n) = \frac{1}{2\pi} \int_{[-\pi,\pi]} e^{-inx}\, d\mu(x), \qquad n \in \mathbb{Z}.$$
In Exercises 11.13-11.16, we assume that $\mu$ and $\nu$ are members of $M_{2\pi}$.
★11.13 Show that if $\mu \ll \lambda$ with Radon-Nikodym derivative $g$, then $\hat{\mu} = \hat{g}$.
11.14 Prove that $|\hat{\mu}(n)| \le |\mu|([-\pi, \pi])/2\pi$. Does $\hat{\mu}$ always lie in $C_0(\mathbb{Z})$? (See part (a) of Theorem 11.1.)
11.15 Set $S_n(\mu)(x) = \sum_{k=-n}^{n} \hat{\mu}(k) e^{ikx}$. Verify that $S_n(\delta_0) = D_n/2\pi$.
11.16 Show that if $\hat{\mu} = \hat{\nu}$, then $\mu = \nu$. Hint: See Example 8.15 on page 522.
11.2	CONVERGENCE OF FOURIER SERIES
For a particular function $f \in \mathcal{L}^1_{2\pi}$ and a number $x \in \mathcal{R}$, we pose the following two questions:
Question 1: Does the Fourier series of $f$ converge at $x$?
Question 2: If the answer to Question 1 is yes, does the Fourier series of $f$ at $x$ converge to $f(x)$?
Because these two questions are so broadly posed, one has to give the answer "not always" to both. Nevertheless, they serve as motivators for a host of interesting and useful results. In this section we present samples of two approaches to answering Questions 1 and 2, namely:
•	Narrow the class of functions under consideration.
•	Modify the convergence requirement.
The following theorem shows, among other things, that if the Fourier
coefficients of a function converge to 0 rapidly enough, then the Fourier
series converges to the function for almost all x.
THEOREM 11.2
Let $f \in \mathcal{L}^1_{2\pi}$. If $\hat{f} \in \ell^1(\mathbb{Z})$, then the Fourier series of $f$ converges uniformly to a continuous function $g$ such that $f = g$ ae. In particular, the Fourier series of $f$ converges to $f$ almost everywhere.
PROOF: Because $\sum_{n=-\infty}^{\infty} |\hat{f}(n)| < \infty$, we deduce from Exercise 7.89 on page 447 that the series
$$g(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx}$$
converges uniformly on $\mathcal{R}$ and that $g$ is continuous and has period $2\pi$. It follows that $\hat{g} = \hat{f}$ and, consequently, from Theorem 11.1 on page 638 that $g = f$ ae.
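EXAMPLE 11.2 Illustrates Theorem 11.2
Consider the function $f$ defined on $\mathcal{R}$ having period $2\pi$ and satisfying $f(x) = 1 - |x|/\pi$ for $x \in [-\pi, \pi]$. An easy calculation shows that $\hat{f}(0) = 1/2$ and
$$\hat{f}(n) = \frac{1 - (-1)^n}{\pi^2 n^2}, \qquad n \ne 0.$$
Because $\hat{f} \in \ell^1(\mathbb{Z})$, Theorem 11.2 implies that we have the expansion
$$\begin{aligned} 1 - \frac{|x|}{\pi} &= \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx} = \frac{1}{2} + \sum_{n=1}^{\infty} \frac{1 - (-1)^n}{\pi^2 n^2} \left( e^{inx} + e^{-inx} \right) \\ &= \frac{1}{2} + \frac{4}{\pi^2} \sum_{n=0}^{\infty} \frac{\cos((2n+1)x)}{(2n+1)^2}, \end{aligned}$$
where the series converges uniformly for $x \in [-\pi, \pi]$. Setting $x = 0$, we obtain the formula
$$\frac{\pi^2}{8} = \sum_{n=0}^{\infty} \frac{1}{(2n+1)^2}$$
as a special case. □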
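The expansion in Example 11.2 and its special case at $x = 0$ can be observed numerically. The following sketch is an illustration under stated assumptions: the function names and truncation lengths are arbitrary choices made here, and the comparison is only as accurate as the truncated series allows.

```python
import math

def odd_reciprocal_squares(terms):
    """Partial sum of 1/(2n+1)^2 for n = 0, ..., terms - 1."""
    return sum(1.0 / (2 * n + 1) ** 2 for n in range(terms))

def triangle_series(x, terms):
    """Truncation of 1/2 + (4/pi^2) * sum cos((2n+1)x)/(2n+1)^2."""
    tail = sum(math.cos((2 * n + 1) * x) / (2 * n + 1) ** 2 for n in range(terms))
    return 0.5 + 4.0 / math.pi ** 2 * tail

def triangle(x):
    """The function f(x) = 1 - |x|/pi on [-pi, pi]."""
    return 1.0 - abs(x) / math.pi

if __name__ == "__main__":
    print(odd_reciprocal_squares(100000), math.pi ** 2 / 8)
    for x in (0.0, 1.0, -2.5, math.pi):
        print(x, triangle_series(x, 2000), triangle(x))
```

Because the coefficients decay like $1/n^2$, the truncation error is uniform in $x$, in line with the uniform convergence asserted by Theorem 11.2.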
Our next theorem, due to Dirichlet, shows that the Fourier series of a
function of bounded variation converges everywhere and that it converges
almost everywhere to the function. The proof of Dirichlet’s theorem re-
quires the following lemma.
LEMMA 11.1
Suppose that $F$ is a right-continuous function of bounded variation defined on $[a, b]$. Then there exists a $\mu \in M([a, b])$ such that $F(t) - F(a) = \mu((a, t])$ for each $t \in [a, b]$.
PROOF: Suppose first that $F$ is nondecreasing and nonnegative. Then we can extend $F$ to a distribution function on $\mathcal{R}$ by defining $F(t) = 0$ for $t < a$, and $F(t) = F(b)$ for $t > b$. Applying Theorem 4.13 on page 226 to the extended version of $F$, we conclude that there is a finite Borel measure $\mu$ on $\mathcal{R}$ such that $F(t) - F(a) = \mu((a, t])$ for each $t \in (a, b]$. By Exercise 9.68 on page 571, $\mu$ is regular and, hence, the restriction of $\mu$ to Borel measurable subsets of $[a, b]$ is a regular Borel measure satisfying the assertion of the lemma.
To continue, we next assume that $F$ is real-valued. Then, according to Theorem 6.3 on page 332, we can write $F = g_1 - g_2$ where $g_1$ and $g_2$ are nondecreasing functions on $[a, b]$. Letting $\beta = g_1(a) \wedge g_2(a)$, we define $F_1(b) = g_1(b) - \beta$, $F_2(b) = g_2(b) - \beta$, and for $t \in [a, b)$, $F_1(t) = g_1(t+) - \beta$ and $F_2(t) = g_2(t+) - \beta$. Then $F_1$ and $F_2$ are nonnegative, nondecreasing, and right continuous on $[a, b]$, and we have $F = F_1 - F_2$. Therefore, there exist $\mu_1, \mu_2 \in M([a, b])$ such that $F(t) - F(a) = \mu_1((a, t]) - \mu_2((a, t])$ for each $t \in [a, b]$. It follows that the signed measure $\mu = \mu_1 - \mu_2$ satisfies the assertion of the lemma.
It remains to establish the lemma in case $F$ is complex valued. This is done by first noting that the real and imaginary parts of $F$ are real-valued, right-continuous functions of bounded variation and then applying what we just proved to the real and imaginary parts of $F$.
THEOREM 11.3 Dirichlet's Theorem
Suppose that $f \in \mathcal{L}^1_{2\pi}$ and is of bounded variation on the interval $[-\pi, \pi]$. Then
$$\frac{1}{2}\bigl(f(x+) + f(x-)\bigr) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx} \tag{11.8}$$
for each $x \in \mathcal{R}$. In particular, the Fourier series of $f$ converges to $f$ almost everywhere.
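PROOF: Because $f(x+) = f(x)$ for all but countably many $x$, if we replace $f(x)$ by $f(x+)$ at each $x$, the Fourier coefficients of the function are unaltered. Therefore, without loss of generality, we can assume that $f$ is right continuous on $\mathcal{R}$.
We first show that (11.8) holds when $x = 0$. Using Proposition 11.2 on page 640, we obtain
$$\begin{aligned} S_n(f)(0) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) D_n(t)\, dt \\ &= \frac{1}{2\pi} \int_0^{\pi} \bigl(f(t) - f(0)\bigr) D_n(t)\, dt + \frac{1}{2\pi} \int_{-\pi}^{0} \bigl(f(t) - f(0-)\bigr) D_n(t)\, dt \\ &\quad + \frac{f(0)}{2\pi} \int_0^{\pi} D_n(t)\, dt + \frac{f(0-)}{2\pi} \int_{-\pi}^{0} D_n(t)\, dt \\ &= \frac{1}{2\pi} \int_0^{\pi} \bigl(f(t) - f(0)\bigr) D_n(t)\, dt + \frac{1}{2\pi} \int_{-\pi}^{0} \bigl(f(t) - f(0-)\bigr) D_n(t)\, dt \\ &\quad + \frac{1}{2}\bigl(f(0) + f(0-)\bigr). \end{aligned} \tag{11.9}$$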
We will show that
$$\lim_{n\to\infty} \int_0^{\pi} \bigl(f(t) - f(0)\bigr) D_n(t)\, dt = 0. \tag{11.10}$$
Set $g(t) = (t/2) f(t)/\sin(t/2)$ for $t \in (0, \pi]$ and $g(0) = f(0)$. Clearly, $g$ is right continuous and, referring to Exercise 6.28 on page 334, we see that it has bounded variation over $[0, \pi]$. Hence, by Lemma 11.1, there is a regular Borel measure $\mu$ on $[0, \pi]$ such that
$$g(t) - g(0) = \mu((0, t]) = \int_{(0,t]} d\mu(x)$$
for $t \in [0, \pi]$.
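Applying Fubini's theorem, we obtain
$$\begin{aligned} \int_0^{\pi} \bigl(f(t) - f(0)\bigr) D_n(t)\, dt &= \int_0^{\pi} \bigl(g(t) - g(0)\bigr) \frac{\sin((n + 1/2)t)}{t/2}\, dt \\ &= \int_0^{\pi} \int_{(0,t]} \frac{\sin((n + 1/2)t)}{t/2}\, d\mu(x)\, dt \\ &= \int_{(0,\pi]} \int_x^{\pi} \frac{\sin((n + 1/2)t)}{t/2}\, dt\, d\mu(x) \\ &= 2 \int_{(0,\pi]} \int_{(n+1/2)x}^{(n+1/2)\pi} \operatorname{sinc} v\, dv\, d\mu(x). \end{aligned}$$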
It follows from (11.7) on page 642 that the sequence
$$\int_{(n+1/2)x}^{(n+1/2)\pi} \operatorname{sinc} v\, dv$$
is uniformly bounded and tends to 0 as $n \to \infty$ for $x \in (0, \pi]$. Therefore, by the dominated convergence theorem, (11.10) is satisfied. A similar argument applied to the function $f(-t)$ shows that
$$\lim_{n\to\infty} \int_{-\pi}^{0} \bigl(f(t) - f(0-)\bigr) D_n(t)\, dt = 0.$$
In view of (11.9), we conclude that
$$\lim_{n\to\infty} S_n(f)(0) = \frac{1}{2}\bigl(f(0) + f(0-)\bigr). \tag{11.11}$$
To complete the proof of (11.8), we apply (11.11) to the translated function $f_{-x}$. Using Proposition 11.2 on page 640 and Exercise 11.2 on page 642, we obtain
$$S_n(f_{-x})(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t + x) D_n(t)\, dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) D_n(t - x)\, dt = S_n(f)(x).$$
Thus, from (11.11) and Exercise 11.17, we have
$$\lim_{n\to\infty} S_n(f)(x) = \frac{1}{2}\bigl(f_{-x}(0) + f_{-x}(0-)\bigr) = \frac{1}{2}\bigl(f(x) + f(x-)\bigr),$$
as required. The last sentence in the statement of the theorem follows from the fact that a function of bounded variation has only countably many points of discontinuity.
EXAMPLE 11.3 Illustrates Dirichlet's Theorem
Refer to Example 11.1(b) on page 637. Clearly, $f$ is a right-continuous function of bounded variation and $f(0) = 1/2$ and $f(0-) = -1/2$. Considering the Fourier series of $f$ at $x = 0$, we have
$$\sum_{n=-\infty}^{\infty} \hat{f}(n) = \lim_{n\to\infty} \sum_{\substack{k=-n \\ k \ne 0}}^{n} \frac{i((-1)^k - 1)}{2\pi k} = 0 = \frac{1}{2}\bigl(f(0) + f(0-)\bigr),$$
as predicted by Dirichlet's theorem.
Theorems 11.2 and 11.3 are about pointwise convergence of Fourier series. Other interesting results can be obtained if we allow alternate modes of convergence. For example, if $f \in \mathcal{L}^2_{2\pi}$, then it follows from Example 9.12 on page 550 that
$$\lim_{n\to\infty} \|f - S_n(f)\|_2 = 0, \tag{11.12}$$
that is, the Fourier series of $f$ converges to $f$ with respect to the norm $\|\ \|_2$. The weaker formula
$$\lim_{n\to\infty} \|f - A_n(f)\|_2 = 0, \tag{11.13}$$
which follows immediately from (11.12), suggests still another way of looking at the convergence of Fourier series.
Intuitively, because averaging tends to diminish fluctuations, it should tend to make partial sums easier to handle. That this intuition is correct is borne out by the following theorem of which (11.13) is a special case.
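THEOREM 11.4
a) If $f \in C_{2\pi}$, then $\lim_{n\to\infty} \|f - A_n(f)\|_\infty = 0$.
b) For $1 \le p < \infty$, if $f \in \mathcal{L}^p_{2\pi}$, then $\lim_{n\to\infty} \|f - A_n(f)\|_p = 0$.
c) If $f \in \mathcal{L}^\infty_{2\pi}$, then the sequence $\{A_n(f)\}_{n=1}^{\infty}$ converges to $f$ in the weak* topology of $\mathcal{L}^\infty_{2\pi}$.
PROOF: We show first that for $f \in C_{2\pi}$,
$$\lim_{n\to\infty} A_n(f)(0) = f(0). \tag{11.14}$$
Later in the proof, we will generalize the argument to establish part (a). By Proposition 11.3 on page 641, we have
$$|A_n(f)(0) - f(0)| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(t) - f(0)| F_n(t)\, dt.$$
Given $\epsilon > 0$, there exists a $\delta \in (0, \pi)$ such that $|f(t) - f(0)| < \epsilon/2$ for $t \in (-\delta, \delta)$. Hence,
$$|A_n(f)(0) - f(0)| \le \frac{\epsilon}{2} \cdot \frac{1}{2\pi} \int_{-\delta}^{\delta} F_n(t)\, dt + \frac{1}{2\pi} \int_{\delta \le |t| \le \pi} |f(t) - f(0)| F_n(t)\, dt. \tag{11.15}$$
Since, by Proposition 11.3, $(2\pi)^{-1} \int_{-\delta}^{\delta} F_n(t)\, dt \le 1$, we have that
$$|A_n(f)(0) - f(0)| \le \frac{\epsilon}{2} + 2\|f\|_\infty \sup\{ F_n(t) : \delta \le |t| \le \pi \}.$$
Equation (11.14) now follows from Proposition 11.3(d).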
To complete the proof of part (a), we observe that, as $f$ is uniformly continuous on $\mathcal{R}$, the $\delta$ in the foregoing argument can be chosen so that whenever $t \in (-\delta, \delta)$, $|f_{-x}(t) - f_{-x}(0)| < \epsilon/2$ for all $x \in \mathcal{R}$. It follows that the inequality (11.15) is satisfied when $f$ is replaced by $f_{-x}$. Because $A_n(f_{-x})(0) = A_n(f)(x)$ and $\|f_{-x}\|_\infty = \|f\|_\infty$, we deduce that
$$|A_n(f)(x) - f(x)| \le \frac{\epsilon}{2} + 2\|f\|_\infty \sup\{ F_n(t) : \delta \le |t| \le \pi \}.$$
Hence, the sequence $\{A_n(f)\}_{n=1}^{\infty}$ converges uniformly to $f$. This verifies part (a) of the theorem.
To establish part (b), we make use of the density of $C_{2\pi}$ in $\mathcal{L}^p_{2\pi}$ and the inequality $\|g\|_p \le (2\pi)^{1/p} \|g\|_\infty$ for functions in $C_{2\pi}$. For $\epsilon > 0$, choose $g \in C_{2\pi}$ such that $\|f - g\|_p < \epsilon/3$. Then we have
$$\begin{aligned} \|A_n(f) - f\|_p &\le \|A_n(f - g)\|_p + \|A_n(g) - g\|_p + \|f - g\|_p \\ &\le \frac{2}{3}\epsilon + (2\pi)^{1/p} \|A_n(g) - g\|_\infty. \end{aligned}$$
It now follows from part (a) that $\|A_n(f) - f\|_p < \epsilon$ for sufficiently large $n$.
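For part (c), we must show that
$$\lim_{n\to\infty} \int_{-\pi}^{\pi} A_n(f)(x) g(x)\, dx = \int_{-\pi}^{\pi} f(x) g(x)\, dx \tag{11.16}$$
for all $g \in \mathcal{L}^1_{2\pi}$. By Proposition 11.3 and Fubini's theorem,
$$\int_{-\pi}^{\pi} A_n(f)(x) g(x)\, dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} f(t) g(x) F_n(x - t)\, dt\, dx = \int_{-\pi}^{\pi} f(t) A_n(g)(t)\, dt.$$
It follows that
$$\left| \int_{-\pi}^{\pi} A_n(f)(x) g(x)\, dx - \int_{-\pi}^{\pi} f(x) g(x)\, dx \right| = \left| \int_{-\pi}^{\pi} f(t) \bigl(A_n(g)(t) - g(t)\bigr)\, dt \right| \le \|f\|_\infty \|A_n(g) - g\|_1$$
and, hence, in view of part (b), we see that (11.16) holds.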
EXAMPLE 11.4 Illustrates Theorem 11.4
It follows from Exercise 10.27 on page 597 that there exists a function $f \in C_{2\pi}$ such that the sequence $\{S_n(f)(x_0)\}_{n=1}^{\infty}$ diverges for some $x_0$. Theorem 11.4 shows that by averaging the $S_n(f)$s we can remove the divergence at $x_0$ and get uniform convergence for all $x$.
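The smoothing effect of averaging can be seen numerically, though not on the function of Example 11.4, which is not given explicitly. The sketch below instead uses the step function of Example 11.1(b) as a stand-in (an assumption made for illustration): its partial sums reduce to $S_n(f)(x) = (2/\pi) \sum_{k \text{ odd},\, k \le n} \sin(kx)/k$, and the averages are formed exactly as in (11.5). Function names and grid size are hypothetical.

```python
import math

def partial_sum(n, x):
    """S_n(f)(x) for the step function of Example 11.1(b).

    Pairing the terms k and -k of the Fourier series gives
    (2/pi) * sin(kx)/k for odd k and 0 for even k.
    """
    return 2.0 / math.pi * sum(math.sin(k * x) / k for k in range(1, n + 1, 2))

def fejer_mean(n, x):
    """A_n(f)(x) = (1/n) * (S_0 + ... + S_{n-1})(x), as in (11.5)."""
    return sum(partial_sum(k, x) for k in range(n)) / n

def grid_max(func, n, samples=400):
    """Largest value of func(n, x) over a grid of points in (0, pi)."""
    return max(func(n, j * math.pi / samples) for j in range(1, samples))

if __name__ == "__main__":
    print("max S_40 on grid:", grid_max(partial_sum, 40))
    print("max A_40 on grid:", grid_max(fejer_mean, 40))
```

The partial sums overshoot the value $1/2$ near the jump, while the averages stay within $\|f\|_\infty = 1/2$, consistent with Proposition 11.3(f) for $p = \infty$.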
EXERCISES 11.2
11.17 Show that $f_{-x}(0+) = f(x+)$ and $f_{-x}(0-) = f(x-)$.
11.18 Explain why part (b) of Theorem 11.4 is false when $p = \infty$.
11.19 Localization of Fourier series.
a) Suppose $f \in \mathcal{L}^1_{2\pi}$ and that $f$ vanishes on some open interval $J \subset [-\pi, \pi]$. Show that the Fourier series of $f$ converges uniformly to 0 on compact subsets of $J$. Hint: Start with the case where $f$ is the characteristic function of an interval disjoint from $J$.
b) Deduce from part (a) that if $f, g \in \mathcal{L}^1_{2\pi}$ and $f = g$ on some open interval $J \subset [-\pi, \pi]$, then $\sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx}$ converges at a point $x \in J$ if and only if $\sum_{n=-\infty}^{\infty} \hat{g}(n) e^{inx}$ converges.
11.20 Suppose that $f^{(j)} \in C_{2\pi}$ for $j = 0, 1, \ldots, m-1$ and that $f^{(m-1)}$ is absolutely continuous.
a) Show that $|\hat{f}(n)| \le \|f^{(m)}\|_1/(2\pi |n|^m)$ for $n \ne 0$.
b) Deduce that if $m \ge 2$, then the Fourier series of $f$ converges uniformly to $f$.
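★11.21 In this exercise, we evaluate $\int_0^{\infty} \operatorname{sinc} x\, dx$.
a) Consider the function $f$ with period $2\pi$ satisfying
$$f(x) = \begin{cases} (\pi - x)/2, & \text{if } 0 < x \le \pi; \\ -(\pi + x)/2, & \text{if } -\pi \le x < 0. \end{cases}$$
Show that
$$\hat{f}(n) = \begin{cases} 0, & \text{if } n = 0; \\ (2in)^{-1}, & \text{if } n \ne 0. \end{cases}$$
Deduce that
$$\sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx} = \begin{cases} (\pi - x)/2, & \text{if } 0 < x < \pi; \\ 0, & \text{if } x = 0, \pi, -\pi; \\ -(\pi + x)/2, & \text{if } -\pi < x < 0. \end{cases}$$
b) Show that for $x \in [-\pi, \pi]$,
$$S_n(f)(x) = -\frac{x}{2} + \frac{1}{2} \int_0^x D_n(t)\, dt.$$
c) Show that for $x \in [-\pi, \pi]$,
$$S_n(f)(x) = -\frac{1}{2}x + \int_0^x \frac{\sin((n + 1/2)t)}{t}\, dt + q_n(x),$$
where $q_n(x)$ tends to 0 uniformly for $x \in [-\pi, \pi]$ as $n \to \infty$.
d) Deduce from part (c) that $\int_0^{\infty} \operatorname{sinc} v\, dv = \pi/2$.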
11.22 This exercise studies the behavior near 0 of the sequence of partial sums of the Fourier series of a function $f$. It is assumed that $f$ is right continuous and of bounded variation on $[-\pi, \pi]$. In what follows, $f_0$ denotes the function defined in Exercise 11.21. For $0 < \delta < \pi$, let
$$\omega_{n,\delta}(f) = \sup\{ S_n(f)(x) - S_n(f)(-x) : 0 < x < \delta \}$$
and $\omega_\delta(f) = \limsup_{n\to\infty} \omega_{n,\delta}(f)$.
a) For each nonnegative integer $n$, verify that the function
$$G_n(x) = \int_0^x \frac{\sin((n + 1/2)t)}{t}\, dt$$
has a local max and local min at $\pi/(n + 1/2)$ and $-\pi/(n + 1/2)$, respectively, and, moreover, that
$$G_n(\pi/(n + 1/2)) = \int_0^{\pi} \operatorname{sinc} t\, dt$$
and
$$G_n(-\pi/(n + 1/2)) = -\int_0^{\pi} \operatorname{sinc} t\, dt.$$
b) Deduce that $\lim_{\delta\to 0+} \omega_\delta(f_0) = 2 \int_0^{\pi} \operatorname{sinc} t\, dt$.
c) Show that if $f$ is continuous at 0, then $\lim_{\delta\to 0+} \omega_\delta(f) = 0$. Hint: Study the proof of Theorem 11.3 carefully. Use Exercise 11.21 to show that there is a constant $c$ such that $\left| \int_x^y D_n(t)\, dt \right| \le c$ for all $n$ and for all $x, y \in [-\pi, \pi]$.
d) Show that if $f$ is discontinuous at 0, then
$$\lim_{\delta\to 0+} \omega_\delta(f) = \frac{2}{\pi}\bigl(f(0) - f(0-)\bigr) \int_0^{\pi} \operatorname{sinc} t\, dt.$$
Hint: Consider $f - af_0$ for a suitable constant $a$ and use part (c). This property of the sequence of partial sums near a jump discontinuity is known as Gibbs' phenomenon.
11.23 Let $M_{2\pi}$ denote the space of periodic measures on $[-\pi, \pi]$, as defined in the paragraph preceding Exercise 11.13 on page 643. Verify that $M_{2\pi}$ can be identified with the dual space of $C_{2\pi}$ in the sense that every continuous linear functional on $C_{2\pi}$ is of the form $f \to \int f\, d\mu$ for some $\mu \in M_{2\pi}$.
11.24 Refer to Exercise 11.23. Suppose that $\mu \in M_{2\pi}$ and set
$$A_n(\mu)(x) = \frac{1}{n} \sum_{k=0}^{n-1} S_k(\mu)(x), \qquad n \in \mathcal{N},$$
where $S_n(\mu)(x) = \sum_{k=-n}^{n} \hat{\mu}(k) e^{ikx}$ and $\hat{\mu}(k) = (2\pi)^{-1} \int_{[-\pi,\pi]} e^{-ikx}\, d\mu(x)$. Define $\nu_n$ on $\mathcal{B}([-\pi, \pi])$ by $\nu_n(B) = \int_B A_n(\mu)(x)\, dx$. Prove that the sequence $\{\nu_n\}_{n=1}^{\infty}$ converges in the weak* topology of $M_{2\pi}$ to Lebesgue measure on $[-\pi, \pi]$.
Let I be a bounded interval with a nonempty interior. A sequence {an}“=1 of
elements of I is said to be uniformly distributed in I if
lim -N({keti:l<k<n, akeJ}) = ^
n—»oo П
for each subinterval J С I. Here N(E) denotes the number of elements of a
set E and £ denotes length. The idea of uniform distribution is that the relative
11.3 The Fourier Transform □ 653
frequency of points of the sequence lying in an interval is proportional to the
length of the interval.
11.25	The object of this exercise is to establish Weyl's criteria for uniform
distribution, which we do for $I = [-\pi, \pi]$. Prove that the following are
equivalent:
a)	$\{a_n\}_{n=1}^\infty$ is uniformly distributed in $[-\pi, \pi]$.
b)	$\lim_{n\to\infty} n^{-1} \sum_{k=1}^n f(a_k) = (2\pi)^{-1} \int_{-\pi}^{\pi} f(x)\,dx$ for each $f \in C([-\pi, \pi])$.
c)	$\lim_{n\to\infty} n^{-1} \sum_{k=1}^n e^{i a_k m} = 0$ for every nonzero integer m. Hint: If
part (b) holds, consider continuous functions f and g having the property
that $0 \le g \le \chi_J \le f$.
★ 11.26 Show that the sequence $\{nb - [nb]\}_{n=1}^\infty$ is uniformly distributed in [0,1] if
and only if b is irrational. Here [ ] denotes the greatest integer function.
11.3 THE FOURIER TRANSFORM
Fourier series expansions express $2\pi$-periodic functions in terms of the
oscillatory functions $e^{inx}$, $n \in \mathcal{Z}$, whose basic periods form the discrete set
$\{2\pi, 2\pi/2, 2\pi/3, \ldots\}$. In this section we discuss an analogous expansion
for certain nonperiodic functions in terms of oscillatory functions of the
form $e^{itx}$, where the parameter t is continuous rather than discrete and
summation is replaced by integration. Specifically, we have the following
definition.
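DEFINITION 11.2 Fourier Transform
For $f \in \mathcal{L}^1(\mathcal{R})$, the function $\hat{f}: \mathcal{R} \to \mathcal{C}$ defined by
$$\hat{f}(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\,e^{-itx}\,dx$$
is called the Fourier transform of f.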
We should point out the following facts:
• Definition 11.2 deviates from the one given in Exercise 4.81 on page 202
by the factor $(2\pi)^{-1/2}$. In fact, slightly different definitions appear in
various mathematical subfields, mostly for aesthetic reasons.
• The term Fourier transform is used in both Definition 11.1 (for periodic
functions) and Definition 11.2 (for $\mathcal{L}^1(\mathcal{R})$-functions). There is little room for
654 □ Chapter 11 Elements of Harmonic Analysis
confusion, however, because the Fourier transform of a function in $\mathcal{L}^1_{2\pi}$ is
a function on $\mathcal{Z}$, whereas that of a function in $\mathcal{L}^1(\mathcal{R})$ is a function on $\mathcal{R}$.
Moreover, the only function common to both $\mathcal{L}^1_{2\pi}$ and $\mathcal{L}^1(\mathcal{R})$ is the zero
function. The advantage of using common terminology is that it suggests
many important analogies between properties of the two transforms.
The following theorem, whose proof is left to the reader as Exercise 11.27,
provides some basic properties of the Fourier transform. One
of the properties employs the notation $f_{a,b}$ to represent the
translation-dilation of the function f. That is, we write
$$f_{a,b}(x) = \frac{1}{\sqrt{|a|}}\, f((x - b)/a)$$
for $a, b \in \mathcal{R}$ and $a \ne 0$.
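THEOREM 11.5
Let $f \in \mathcal{L}^1(\mathcal{R})$. Then the following hold:
a)	$\hat{f} \in C_0(\mathcal{R})$.
b)	$\|\hat{f}\|_\infty \le \|f\|_1/\sqrt{2\pi}$.
c)	The function $F: \mathcal{L}^1(\mathcal{R}) \to C_0(\mathcal{R})$ defined by $F(f) = \hat{f}$ is a continuous
linear mapping.
d)	For $a, b \in \mathcal{R}$ and $a \ne 0$, we have
$$\widehat{f_{a,b}}(t) = \sqrt{|a|}\, e^{-ibt} \hat{f}(at), \qquad t \in \mathcal{R}.$$
e)	If $\int_{-\infty}^{\infty} |x f(x)|\,dx < \infty$, then $\hat{f}\,'(t)$ exists for all $t \in \mathcal{R}$ and
$$\hat{f}\,'(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (-ix) f(x)\,e^{-itx}\,dx.$$
f)	If $f'$ exists ae and $f' \in \mathcal{L}^1(\mathcal{R})$, then $\widehat{f'}(t) = it\hat{f}(t)$.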
We observe that parts (a) and (b) of Theorem 11.5 are, respectively,
the analogues of parts (a) and (b) of Theorem 11.1 on page 638. Moreover,
for a = 1, part (d) of Theorem 11.5 is the analogue of part (f) of Proposition 11.1
on page 639. The properties of the Fourier transform described
in parts (e) and (f) of Theorem 11.5 have numerous applications to many
fields, including differential equations and probability theory.
EXAMPLE 11.5 Illustrates Definition 11.2 and Theorem 11.5
For c > 0, we have
$$\widehat{\chi_{[-c,c]}}(t) = \frac{1}{\sqrt{2\pi}} \int_{-c}^{c} e^{-itx}\,dx = \sqrt{2/\pi}\; c\, \operatorname{sinc}(ct).$$
Using this fact and Theorem 11.5(d), it is easy to obtain the Fourier transform
of any integrable step function.	□
EXAMPLE 11.6 Illustrates Definition 11.2 and Theorem 11.5
The function $g(x) = e^{-x^2/2}$, often called the Gaussian function, arises
in many areas of mathematics, including harmonic analysis, probability
theory, and statistics. We will prove that the Gaussian function is its own
Fourier transform, that is, $\hat{g} = g$. To accomplish this, we first note that
g satisfies the condition of Theorem 11.5(e) and, consequently,
$$\hat{g}\,'(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (-ix)\, e^{-x^2/2} e^{-itx}\,dx.$$
Applying integration by parts, we find that $\hat{g}\,'(t) = -t\hat{g}(t)$. This differential
equation has the solution $\hat{g}(t) = \hat{g}(0)\, e^{-t^2/2}$. As the reader is asked to verify
in Exercise 11.28,
$$\sqrt{2\pi}\, \hat{g}(0) = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}.$$
Hence, $\hat{g}(0) = 1$, as required.
□
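The conclusion of Example 11.6 can be checked numerically. The following Python sketch is an illustration only and is not part of the original text. It approximates the integral of Definition 11.2 by a midpoint rule on the truncated interval [-12, 12] and compares the result with g(t). The truncation width, the step count, and the names `fourier_transform`, `gaussian`, and `max_gaussian_error` are assumptions made for this sketch.

```python
import cmath
import math


def fourier_transform(f, t, half_width=12.0, steps=24000):
    """Midpoint-rule approximation of (1/sqrt(2*pi)) * integral f(x) e^{-itx} dx.

    The integral is truncated to [-half_width, half_width]; this is an
    assumption that is harmless only for rapidly decaying f.
    """
    h = 2.0 * half_width / steps
    total = 0.0 + 0.0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * t * x)
    return total * h / math.sqrt(2.0 * math.pi)


def gaussian(x):
    """The Gaussian function g(x) = exp(-x^2 / 2) of Example 11.6."""
    return math.exp(-x * x / 2.0)


def max_gaussian_error(points=(0.0, 0.5, 1.0, 2.0, 3.0)):
    """Largest deviation |g_hat(t) - g(t)| over a few sample points t."""
    return max(abs(fourier_transform(gaussian, t) - gaussian(t)) for t in points)
```

Calling `max_gaussian_error()` should return a value close to rounding error, which is consistent with $\hat{g} = g$ under the normalization of Definition 11.2.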
Convolution Products
In the theory of Fourier series, we find frequent appearance of convolution
products, that is, integrals of the form
$$(f * g)(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x - y)\, g(y)\,dy,$$
where $f, g \in \mathcal{L}^1_{2\pi}$. Some basic properties of convolution are examined in
Exercises 11.8-11.12 on pages 642-643. For instance, Exercise 11.12 shows
that convolution multiplication of periodic functions corresponds to ordinary
multiplication of Fourier coefficients.
In the theory of the Fourier transform, a similar notion of convolution
product for $\mathcal{L}^1(\mathcal{R})$-functions plays an essential role. We begin with the
following definition.
DEFINITION 11.3 Convolution of Functions
For $f, g \in \mathcal{L}^1(\mathcal{R})$, the function $f * g$ defined by
$$(f * g)(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x - y)\, g(y)\,dy, \qquad x \in \mathcal{R},$$
is called the convolution of f and g.
As with the definition of the Fourier transform, minor modifications
of the definition of convolution given in Definition 11.3 appear in various
mathematical subfields. In particular, the factor $(2\pi)^{-1/2}$ is often omitted,
as was done in Exercise 4.157(d) on page 256. From this point on, we will
use Definition 11.3.
The following theorem summarizes basic properties of the convolution
product. Its proof is left to the reader as Exercise 11.31.
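THEOREM 11.6
Let $f, g, h \in \mathcal{L}^1(\mathcal{R})$. Then the following hold:
a) $f * g \in \mathcal{L}^1(\mathcal{R})$ and, in fact, $\|f * g\|_1 \le \|f\|_1 \|g\|_1/\sqrt{2\pi}$.
b) $f * g = g * f$.
c) $(f * g) * h = f * (g * h)$.
d) $f * (g + h) = f * g + f * h$.
e) $\widehat{f * g} = \hat{f}\,\hat{g}$.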
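Part (e) of Theorem 11.6 can be illustrated on a case where everything is known in closed form. In the sketch below, which is not part of the original text, f is the characteristic function of [-1/2, 1/2]. By Example 11.5 with c = 1/2 its transform is $\sqrt{2/\pi}\,(1/2)\operatorname{sinc}(t/2)$, and with the normalization of Definition 11.3 the convolution f * f is the triangle $\max(0, 1 - |x|)/\sqrt{2\pi}$. The function names and the midpoint-rule step count are assumptions of this sketch.

```python
import cmath
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def sinc(x):
    """sinc(x) = sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def box_hat(t):
    """Transform of the indicator of [-1/2, 1/2] (Example 11.5 with c = 1/2)."""
    return math.sqrt(2.0 / math.pi) * 0.5 * sinc(0.5 * t)


def box_conv_box(x):
    """(f * f)(x) for the indicator f of [-1/2, 1/2], per Definition 11.3."""
    return max(0.0, 1.0 - abs(x)) / SQRT_2PI


def transform_of_convolution(t, steps=20000):
    """Midpoint-rule transform of f * f, which is supported on [-1, 1]."""
    h = 2.0 / steps
    total = 0.0 + 0.0j
    for k in range(steps):
        x = -1.0 + (k + 0.5) * h
        total += box_conv_box(x) * cmath.exp(-1j * t * x)
    return total * h / SQRT_2PI


def convolution_theorem_gap(t):
    """|(f * f)^(t) - f^(t)^2|, which Theorem 11.6(e) says is zero."""
    return abs(transform_of_convolution(t) - box_hat(t) ** 2)
```

The gap returned by `convolution_theorem_gap` should be at the level of the quadrature error for any real t. Note that the identity holds without an extra constant precisely because both Definition 11.2 and Definition 11.3 carry the factor $(2\pi)^{-1/2}$.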
EXAMPLE 11.7 Illustrates Definition 11.3
The integrals
$$I_T(f)(x) = \frac{1}{\sqrt{2\pi}} \int_{-T}^{T} \hat{f}(t)\, e^{itx}\,dt, \qquad T > 0,$$
are analogous to the partial sums of a Fourier series. We will show how to
express $I_T(f)$ as a convolution product. Using Fubini's theorem, we have
$$I_T(f)(x) = \frac{1}{2\pi} \int_{-T}^{T} e^{itx} \int_{-\infty}^{\infty} f(y)\, e^{-ity}\,dy\,dt
= \frac{1}{2\pi} \int_{-\infty}^{\infty} f(y) \int_{-T}^{T} e^{-it(y-x)}\,dt\,dy = (f * D_T)(x),$$
where $D_T(x) = \sqrt{2/\pi}\; T \operatorname{sinc}(Tx)$. The function $D_T$ is the continuous
analogue of the Dirichlet kernel.	□
Uniqueness and Inversion
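Based on an analogy with Fourier series, we might expect the formula
$$\lim_{T\to\infty} I_T(f)(x) = f(x) \qquad (11.17)$$
to hold, at least under some reasonable conditions on f. Indeed, the following
heuristic argument suggests that (11.17) is valid when f is continuous
at x. By Exercise 11.21 on page 651, we have
$$f(x) = \frac{1}{\pi} \int_{-\infty}^{\infty} f(x) \operatorname{sinc} y\,dy$$
and, by Example 11.7,
$$I_T(f)(x) = \frac{1}{\pi} \int_{-\infty}^{\infty} f(y) \operatorname{sinc}(T(x - y))\,d(Ty)
= \frac{1}{\pi} \int_{-\infty}^{\infty} f(x - y/T) \operatorname{sinc} y\,dy.$$
It follows that
$$I_T(f)(x) - f(x) = \frac{1}{\pi} \int_{-\infty}^{\infty} \bigl(f(x - y/T) - f(x)\bigr) \operatorname{sinc} y\,dy.$$
Hence,
$$\lim_{T\to\infty} \bigl(I_T(f)(x) - f(x)\bigr) = \frac{1}{\pi} \int_{-\infty}^{\infty} \lim_{T\to\infty} \bigl(f(x - y/T) - f(x)\bigr) \operatorname{sinc} y\,dy = 0.$$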
The obstacle to making this argument rigorous is that the function sinc does
not belong to $\mathcal{L}^1(\mathcal{R})$ and so the dominated convergence theorem cannot be
applied. A way around this obstruction is to pass from the integrals $I_T(f)$
to their averages.
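For $f \in \mathcal{L}^1(\mathcal{R})$, let
$$J_T(f) = \frac{1}{T} \int_0^T I_t(f)\,dt.$$
Integrals of the form $J_T(f)$, which are analogous to averages of partial
sums of Fourier series, make tractable substitutes for $I_T(f)$. Like $I_T(f)$,
the integral $J_T(f)$ is also a convolution product, as can be seen as follows.
By Fubini's theorem,
$$J_T(f)(x) = \frac{1}{T} \int_0^T I_t(f)(x)\,dt
= \frac{1}{\pi}\,\frac{1}{T} \int_0^T \int_{-\infty}^{\infty} f(y)\, t \operatorname{sinc}(t(x - y))\,dy\,dt$$
$$= \frac{1}{\pi} \int_{-\infty}^{\infty} f(y)\, \frac{1}{T} \int_0^T t \operatorname{sinc}(t(x - y))\,dt\,dy
= \frac{1}{\pi} \int_{-\infty}^{\infty} f(y)\, \frac{1 - \cos(T(x - y))}{T(x - y)^2}\,dy = (f * G_T)(x),$$
where
$$G_T(x) = \sqrt{\frac{2}{\pi}}\; \frac{1 - \cos(Tx)}{Tx^2} = \sqrt{\frac{2}{\pi}}\; \frac{\sin^2(Tx/2)}{Tx^2/2}.$$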
The function $G_T$ is the continuous analogue of the Fejér kernel. Three
of its essential properties are presented in Lemma 11.2. Parts (a) and (c)
of the lemma are obvious; part (b) is left to the reader as Exercise 11.33.
LEMMA 11.2
The function $G_T$ defined by the previous equation satisfies the following
conditions.
a)	$G_T \ge 0$.
b)	$(2\pi)^{-1/2} \int_{-\infty}^{\infty} G_T(x)\,dx = 1$.
c)	For each $\delta > 0$, $\lim_{T\to\infty} \int_{|x| \ge \delta} G_T(x)\,dx = 0$.
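Parts (b) and (c) of Lemma 11.2 lend themselves to a quick numerical sanity check. The sketch below is an illustration only and is not taken from the text. It uses the closed form $G_T(x) = \sqrt{2/\pi}\,\sin^2(Tx/2)/(Tx^2/2)$, integrates by a midpoint rule, and, because the kernel decays only like $1/x^2$, truncates the integral in (b) at a large cutoff. The cutoff, the step counts, and the function names are assumptions. For (c), it reports the fraction of the total mass lying in $[-\delta, \delta]$, which by (b) and (c) should approach 1 as T grows.

```python
import math


def fejer_type_kernel(x, T):
    """G_T(x) = sqrt(2/pi) * sin^2(T x / 2) / (T x^2 / 2), with G_T(0) = sqrt(2/pi) * T / 2."""
    if x == 0.0:
        return math.sqrt(2.0 / math.pi) * T / 2.0
    s = math.sin(T * x / 2.0)
    return math.sqrt(2.0 / math.pi) * 2.0 * s * s / (T * x * x)


def midpoint_integral(f, a, b, steps):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def normalized_mass(T, cutoff=2000.0, steps=400000):
    """(2*pi)^(-1/2) times the integral of G_T over [-cutoff, cutoff]."""
    mass = midpoint_integral(lambda x: fejer_type_kernel(x, T), -cutoff, cutoff, steps)
    return mass / math.sqrt(2.0 * math.pi)


def central_fraction(T, delta, steps=50000):
    """(2*pi)^(-1/2) times the integral of G_T over [-delta, delta]."""
    mass = midpoint_integral(lambda x: fejer_type_kernel(x, T), -delta, delta, steps)
    return mass / math.sqrt(2.0 * math.pi)
```

For example, `normalized_mass(5.0)` should be close to 1, and `central_fraction(T, 0.5)` should be closer to 1 for T = 1000 than for T = 10, reflecting the concentration of $G_T$ near the origin.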
Our next theorem presents results analogous to those given for Fourier
series in Theorem 11.4 on page 648.
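THEOREM 11.7
a)	If $f \in C_0(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, then $\lim_{T\to\infty} \|f - J_T(f)\|_\infty = 0$.
b)	If $f \in \mathcal{L}^1(\mathcal{R})$, then $\lim_{T\to\infty} \|f - J_T(f)\|_1 = 0$.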
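PROOF: To prove part (a), let $\epsilon > 0$. Since f is uniformly continuous, we
can choose $\delta > 0$ so that $|f(x - y) - f(x)| < \epsilon$ for all $y \in (-\delta, \delta)$ and $x \in \mathcal{R}$.
Then, by Lemma 11.2, we have
$$J_T(f)(x) - f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(y)\, G_T(x - y)\,dy - f(x)
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \bigl(f(x - y) - f(x)\bigr) G_T(y)\,dy.$$
Thus,
$$|J_T(f)(x) - f(x)| \le \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} |f(x - y) - f(x)|\, G_T(y)\,dy$$
$$\le \frac{\epsilon}{\sqrt{2\pi}} \int_{-\delta}^{\delta} G_T(y)\,dy + \sqrt{\frac{2}{\pi}}\, \|f\|_\infty \int_{|y| \ge \delta} G_T(y)\,dy$$
$$\le \epsilon + \sqrt{\frac{2}{\pi}}\, \|f\|_\infty \int_{|y| \ge \delta} G_T(y)\,dy.$$
Applying Lemma 11.2(c), we conclude that
$$\limsup_{T\to\infty}\, \sup_{x \in \mathcal{R}} |J_T(f)(x) - f(x)| \le \epsilon.$$
As $\epsilon > 0$ was arbitrarily chosen, we see that part (a) holds.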
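To establish part (b), we begin by using Fubini's theorem to conclude
that
$$\int_{-\infty}^{\infty} |J_T(f)(x) - f(x)|\,dx \le \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |f(x - y) - f(x)|\, G_T(y)\,dy\,dx$$
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |f(x - y) - f(x)|\, G_T(y)\,dx\,dy
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \|f_y - f\|_1\, G_T(y)\,dy,$$
where $f_y(x) = f(x - y)$.
The function $h(y) = \|f_y - f\|_1$ is bounded by $2\|f\|_1$ and, as we ask the
reader to verify in Exercise 11.34, is continuous at 0. It follows by the
argument used in part (a) that
$$\lim_{T\to\infty} \int_{-\infty}^{\infty} \|f_y - f\|_1\, G_T(y)\,dy = 0.$$
Thus, part (b) is proved.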
COROLLARY 11.1 Uniqueness Property of Fourier Transforms
If $f, g \in \mathcal{L}^1(\mathcal{R})$ and $\hat{f} = \hat{g}$, then f = g ae.
PROOF: If $\hat{f} = 0$, then $I_T(f)$ and, hence, $J_T(f)$ vanish for every T > 0.
Applying Theorem 11.7(b), we conclude that f = 0 ae. The corollary now
follows from the linearity of the Fourier transform.
Corollary 11.1 implies that an $\mathcal{L}^1(\mathcal{R})$-function is determined by its
Fourier transform in the sense that two functions having the same transform
must be identical almost everywhere. Theorem 11.8 gives a recipe for
recovering a function from its Fourier transform. Such recipes are referred
to as inversion theorems.
THEOREM 11.8
Suppose that both f and $\hat{f}$ belong to $\mathcal{L}^1(\mathcal{R})$. Then
$$f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \hat{f}(t)\, e^{itx}\,dt$$
for almost all $x \in \mathcal{R}$.
PROOF: Let
$$g(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \hat{f}(t)\, e^{itx}\,dt.$$
Because, by assumption, $\hat{f} \in \mathcal{L}^1(\mathcal{R})$, it follows that g is well defined for all
$x \in \mathcal{R}$. Using the dominated convergence theorem, we can write
$$g(x) = \lim_{T\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-T}^{T} \hat{f}(t)\, e^{itx}\,dt = \lim_{T\to\infty} I_T(f)(x).$$
From this we can conclude that $g(x) = \lim_{T\to\infty} J_T(f)(x)$, as the reader is
asked to verify in Exercise 11.35. In particular, we have shown that the
sequence $\{J_n(f)\}_{n=1}^\infty$ converges to g pointwise on $\mathcal{R}$.
Now, by Theorem 11.7(b), the sequence $\{J_n(f)\}_{n=1}^\infty$ converges to f in
the $\mathcal{L}^1(\mathcal{R})$-norm. Applying Exercise 4.84 on page 206 and Proposition 4.12
on page 204, we deduce that there is a subsequence of $\{J_n(f)\}_{n=1}^\infty$ that
converges to f almost everywhere. Consequently, f = g ae.
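The inversion formula of Theorem 11.8 can be tried on a function whose transform is known in closed form. The sketch below is an illustration that is not part of the original text. It uses $f(x) = e^{-|x|}$, for which a direct computation from Definition 11.2 gives $\hat{f}(t) = \sqrt{2/\pi}/(1 + t^2)$, so that both f and $\hat{f}$ are integrable. The inversion integral is truncated to a finite interval and evaluated by a midpoint rule. The cutoff, the step count, and the function names are assumptions of this sketch, and the truncation limits the accuracy to roughly three decimal places.

```python
import math


def two_sided_exponential(x):
    """f(x) = exp(-|x|), an integrable function with integrable transform."""
    return math.exp(-abs(x))


def two_sided_exponential_hat(t):
    """Closed-form transform of exp(-|x|) under Definition 11.2."""
    return math.sqrt(2.0 / math.pi) / (1.0 + t * t)


def inverse_transform(x, cutoff=400.0, steps=200000):
    """Midpoint-rule value of (1/sqrt(2*pi)) * integral f_hat(t) e^{itx} dt over [-cutoff, cutoff].

    Because f_hat is even, the imaginary part cancels and only cos(t*x) is needed.
    """
    h = 2.0 * cutoff / steps
    total = 0.0
    for k in range(steps):
        t = -cutoff + (k + 0.5) * h
        total += two_sided_exponential_hat(t) * math.cos(t * x)
    return total * h / math.sqrt(2.0 * math.pi)
```

Comparing `inverse_transform(x)` with `two_sided_exponential(x)` at a few points shows agreement up to the truncation error, which is largest at x = 0, where the integrand does not oscillate.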
Applying Theorem 11.8 and Theorem 11.5 on page 654, we obtain the
following corollary.
COROLLARY 11.2
If both f and $\hat{f}$ belong to $\mathcal{L}^1(\mathcal{R})$, then $\hat{\hat{f}}(x) = f(-x)$ for almost all $x \in \mathcal{R}$.
Furthermore, f is equal to a continuous function almost everywhere.
Although Theorem 11.8 is adequate for handling functions satisfying
certain mild restrictions, such as the ones given in Exercise 11.36, it is by no
means the last word on inversion of the Fourier transform. Indeed, Exam-
ple 11.5 on page 655 shows that the Fourier transform of the characteristic
function of an interval fails to be Lebesgue integrable.
EXERCISES 11.3
11.27	Prove Theorem 11.5 on page 654.
11.28	Show that $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$. Hint: Use polar coordinates to evaluate
the double integral $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy$.
11.29	Calculate the Fourier transform of the function $f(x) = e^{-(x-b)^2/a}$, where a
and b are real constants with a > 0.
11.30	The convolution product also appears in probability theory in a natural
way. Let X and Y be independent random variables having probability
density functions $f_X$ and $f_Y$, respectively.
a)	Show that the random variable X + Y has probability density function
given by $f_{X+Y} = \sqrt{2\pi}\, f_X * f_Y$.
b)	Explain the discrepancy between the result in part (a) and the one
obtained in Exercise 5.56(c) on page 288.
11.31	Prove Theorem 11.6 on page 656.
11.32	Prove that there is no identity for the convolution product, that is, there
does not exist a function $h \in \mathcal{L}^1(\mathcal{R})$ such that f = f * h for all $f \in \mathcal{L}^1(\mathcal{R})$.
Note, however, that Theorem 11.7 shows that $\lim_{T\to\infty} f * G_T = f$ for
all $f \in \mathcal{L}^1(\mathcal{R})$.
11.33	Show that
Hint: Use
$$\frac{1 - \cos(Tx)}{Tx^2} = \frac{1}{T} \int_0^T \frac{\sin(tx)}{x}\,dt$$
and Exercise 11.21 on page 651.
11.34	Let $1 \le p < \infty$ and $f \in \mathcal{L}^p(\mathcal{R})$. Show that the function h defined by
$h(y) = \|f_y - f\|_p$ is continuous on $\mathcal{R}$.
11.35	Let $f \in \mathcal{L}^1(\mathcal{R})$ and $x \in \mathcal{R}$. Suppose that $\lim_{T\to\infty} I_T(f)(x)$ exists and
equals, say, L. Prove that $\lim_{T\to\infty} J_T(f)(x)$ also exists and equals L.
11.36	Suppose that $f''$ exists and is finite everywhere and that $f, f', f'' \in \mathcal{L}^1(\mathcal{R})$.
Prove that $\hat{f} \in \mathcal{L}^1(\mathcal{R})$.
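11.37	Suppose that $f, f', f'' \in \mathcal{L}^1(\mathcal{R}) \cap C(\mathcal{R})$ and that there is a constant M
such that
$$|f(x)| \vee |f'(x)| \vee |f''(x)| \le \frac{M}{1 + x^2}, \qquad x \in \mathcal{R}.$$
a)	Prove the Poisson summation formula:
$$\sum_{k=-\infty}^{\infty} \hat{f}(k) = \sqrt{2\pi} \sum_{n=-\infty}^{\infty} f(2\pi n).$$
Hint: Consider the function $g(x) = \sum_{n=-\infty}^{\infty} f(x + 2\pi n)$.
b)	Use the Poisson summation formula to verify the Jacobi theta function
identity:
$$\sum_{n=-\infty}^{\infty} e^{-n^2/2t} = \sqrt{2\pi t} \sum_{n=-\infty}^{\infty} e^{-2\pi^2 n^2 t}, \qquad t > 0.$$
Hint: Refer to Exercise 11.29.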
In Exercises 11.38-11.40, $C^\infty(\mathcal{R})$ denotes the space of complex-valued functions
having derivatives of all orders at each point of $\mathcal{R}$. For nonnegative integers n
and m, define
$$\sigma_{n,m}(f) = \sup\{\, (1 + x^{2n})\,|f^{(m)}(x)| : x \in \mathcal{R} \,\}, \qquad f \in C^\infty(\mathcal{R}).$$
We will consider the linear space
$$S(\mathcal{R}) = \{\, f \in C^\infty(\mathcal{R}) : \sigma_{n,m}(f) < \infty,\ n, m = 0, 1, 2, \ldots \,\}$$
with the topology induced by the family of seminorms $\{\sigma_{n,m} : n, m = 0, 1, \ldots\}$.
11.38	Prove that $f \in S(\mathcal{R})$ and $g \in \mathcal{L}^1(\mathcal{R})$ imply $f * g \in S(\mathcal{R})$.
11.39	Prove that $f \in S(\mathcal{R})$ if and only if $\hat{f} \in S(\mathcal{R})$.
11.40	Prove that the linear operator $F: S(\mathcal{R}) \to S(\mathcal{R})$ defined by $F(f) = \hat{f}$ is
continuous, one-to-one, and onto.
11.4	FOURIER TRANSFORMS OF MEASURES
In this section we will extend the concept of Fourier transform from functions
in $\mathcal{L}^1(\mathcal{R})$ to measures in $M(\mathcal{R})$. As an application of Fourier transforms
of measures, we will obtain several interesting and important results
in probability theory, including the celebrated central limit theorem.
DEFINITION 11.4 Fourier Transform of a Measure
For $\mu \in M(\mathcal{R})$, the function $\hat{\mu}: \mathcal{R} \to \mathcal{C}$ defined by
$$\hat{\mu}(t) = \frac{1}{\sqrt{2\pi}} \int_{\mathcal{R}} e^{-itx}\,d\mu(x)$$
is called the Fourier transform of $\mu$.
EXAMPLE 11.8 Illustrates Definition 11.4
a)	The Radon-Nikodym theorem for complex measures (page 383) and
Exercise 6.115 on page 387 imply that if $\mu \ll \lambda$, then $\hat{\mu} = \widehat{d\mu/d\lambda}$.
b)	If $a \in \mathcal{R}$ and $\mu = \delta_a$, then $\hat{\mu}(t) = (2\pi)^{-1/2} e^{-iat}$.	□
Our next proposition, whose proof is left for Exercise 11.41, provides
some basic properties of Fourier transforms of measures.
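PROPOSITION 11.4
Let $\mu \in M(\mathcal{R})$. Then the following hold:
a)	$\hat{\mu} \in C(\mathcal{R})$.
b)	$|\hat{\mu}(t)| \le |\mu|(\mathcal{R})/\sqrt{2\pi}$.
c)	If $\mu(B) = \int_B f(x)\,d\lambda(x)$ for some $f \in \mathcal{L}^1(\mathcal{R})$, then $\hat{\mu} = \hat{f}$.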
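The integrals $I_T(f)$ and $J_T(f)$, defined in Section 11.3, play an important
role in the theory of Fourier transforms of $\mathcal{L}^1(\mathcal{R})$-functions. They
have natural analogues when f is replaced by a measure: If $\mu \in M(\mathcal{R})$,
we let
$$I_T(\mu)(x) = \frac{1}{\sqrt{2\pi}} \int_{-T}^{T} \hat{\mu}(t)\, e^{itx}\,dt, \qquad T > 0,$$
and
$$J_T(\mu)(x) = \frac{1}{T} \int_0^T I_t(\mu)(x)\,dt, \qquad T > 0.$$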
Using these integrals we can show that a measure is determined by its
Fourier transform. We begin with the following theorem whose proof is left
to the reader as Exercise 11.42.
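THEOREM 11.9
For $\mu \in M(\mathcal{R})$, define
$$\mu_T(B) = \int_B J_T(\mu)(x)\,d\lambda(x), \qquad B \in \mathcal{B}.$$
Then the following hold:
a) $\mu_T \in M(\mathcal{R})$.
b) $|\mu_T|(\mathcal{R}) \le |\mu|(\mathcal{R})$.
c) The net $\{\mu_T\}_{T \in (0,\infty)}$ converges in the weak* topology to $\mu$.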
COROLLARY 11.3 Uniqueness Property of Fourier Transforms
If $\mu, \nu \in M(\mathcal{R})$ and $\hat{\mu} = \hat{\nu}$, then $\mu = \nu$.
PROOF: If $\hat{\mu} = 0$, then $J_T(\mu)$ and, hence, $\mu_T$ vanish for every T > 0.
Applying part (c) of Theorem 11.9, we conclude that $\mu = 0$. The corollary
now follows from the linearity of the Fourier transform of a measure.
Corollary 11.3 implies that a measure is determined by its Fourier
transform in the sense that two measures with the same transform must
be identical. When $\mu$ is a probability measure, we can get still more information
about its relationship with $\hat{\mu}$.
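LEMMA 11.3
Suppose $\mu \in M_+(\mathcal{R})$ and $\mu(\mathcal{R}) = 1$. Then, for each c > 0, we have
$$\mu([-2c, 2c]) \ge \sqrt{2\pi}\, c \left| \int_{-1/c}^{1/c} \hat{\mu}(t)\,dt \right| - 1.$$
PROOF: Let b > 0. Then
$$\frac{\sqrt{2\pi}}{2b} \int_{-b}^{b} \hat{\mu}(t)\,dt = \int_{\mathcal{R}} \frac{1}{2b} \int_{-b}^{b} e^{-itx}\,dt\,d\mu(x) = \int_{\mathcal{R}} \operatorname{sinc}(bx)\,d\mu(x).$$
It is easy to see that $|\operatorname{sinc}(bx)| \le 1$ for all x and that $|\operatorname{sinc}(bx)| \le (2bc)^{-1}$
when $|x| > 2c$. It follows that
$$\frac{\sqrt{2\pi}}{2b} \left| \int_{-b}^{b} \hat{\mu}(t)\,dt \right| \le \mu([-2c, 2c]) + \frac{1}{2bc}\, \mu(\mathcal{R} \setminus [-2c, 2c]).$$
Taking b = 1/c and using $\mu(\mathcal{R}) = 1$, we get
$$\frac{\sqrt{2\pi}\, c}{2} \left| \int_{-1/c}^{1/c} \hat{\mu}(t)\,dt \right| \le \frac{1}{2} + \frac{1}{2}\, \mu([-2c, 2c]),$$
from which the assertion of the lemma follows immediately.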
Just as convolution of functions plays an important role in the theory
of Fourier transforms of functions, convolution of measures figures promi-
nently in the theory of Fourier transforms of measures.
DEFINITION 11.5 Convolution of Measures
For $\mu, \nu \in M(\mathcal{R})$, the Borel measure $\mu * \nu$ defined by
$$(\mu * \nu)(B) = \frac{1}{\sqrt{2\pi}} \int_{\mathcal{R}} \mu(B - x)\,d\nu(x), \qquad B \in \mathcal{B},$$
is called the convolution of $\mu$ and $\nu$.
As with the definition of convolution of functions, minor modifications
of the definition of convolution of measures given in Definition 11.5 appear
in various mathematical subfields. In particular, the factor $(2\pi)^{-1/2}$ is
often omitted, as was done in Exercise 4.158(d) on page 256. From this
point on, we will use Definition 11.5.
Our next proposition, whose proof is left for the reader as Exer-
cise 11.45, shows that convolution of measures (Definition 11.5) is consistent
with convolution of functions (Definition 11.3 on page 656).
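PROPOSITION 11.5
Let $f, g \in \mathcal{L}^1(\mathcal{R})$. Define measures $\mu$ and $\nu$ by
$$\mu(B) = \int_B f\,d\lambda \quad\text{and}\quad \nu(B) = \int_B g\,d\lambda, \qquad B \in \mathcal{B}.$$
Then
$$(\mu * \nu)(B) = \int_B (f * g)\,d\lambda, \qquad B \in \mathcal{B}.$$
Equivalently, if $\mu$ and $\nu$ are absolutely continuous with respect to Lebesgue
measure, then so is $\mu * \nu$ and, moreover, $d(\mu * \nu)/d\lambda = (d\mu/d\lambda) * (d\nu/d\lambda)$.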
Proposition 11.5 shows that convolution of functions corresponds to a
special case of convolution of measures. More examples of convolution of
measures are contained in Example 11.9.
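EXAMPLE 11.9 Illustrates Definition 11.5
a)	For each $\mu \in M(\mathcal{R})$, we have
$$(\delta_0 * \mu)(B) = \frac{1}{\sqrt{2\pi}} \int_{\mathcal{R}} \delta_0(B - x)\,d\mu(x) = \frac{1}{\sqrt{2\pi}}\, \mu(B), \qquad B \in \mathcal{B}.$$
In other words, $\delta_0 * \mu = (2\pi)^{-1/2} \mu$ for all $\mu \in M(\mathcal{R})$.
b)	Let X and Y be independent random variables. Then Exercise 5.56 on
page 288 implies that $\mu_{X+Y} = \sqrt{2\pi}\, \mu_X * \mu_Y$.	□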
The following analog of Theorem 11.6 on page 656 gives some basic
properties of convolution of measures. Its proof is left for Exercise 11.46.
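THEOREM 11.10
Let $\mu, \nu, \gamma \in M(\mathcal{R})$. Then the following hold:
a)	$\mu * \nu \in M(\mathcal{R})$.
b)	$\mu * \nu = \nu * \mu$.
c)	$(\mu * \nu) * \gamma = \mu * (\nu * \gamma)$.
d)	$\mu * (\nu + \gamma) = \mu * \nu + \mu * \gamma$.
e)	$\widehat{\mu * \nu} = \hat{\mu}\,\hat{\nu}$.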
Fourier Transforms in Probability Theory
Let X be a random variable. Then the function $\psi_X(t) = \mathcal{E}(e^{itX})$ is called the
characteristic function of the random variable X, not to be confused
with the characteristic function of a set. It is easy to see that
$$\psi_X(t) = \int_{\mathcal{R}} e^{itx}\,d\mu_X(x) = \sqrt{2\pi}\, \hat{\mu}_X(-t).$$
Instead of using the characteristic function $\psi_X$, as is usually done in probability
theory, we will work with the essentially equivalent Fourier transform $\hat{\mu}_X$.
Recall that a sequence $\{X_n\}_{n=1}^\infty$ of random variables is said to converge
in distribution to the random variable X, written $X_n \xrightarrow{d} X$, if the
sequence $\{\mu_{X_n}\}_{n=1}^\infty$ of measures converges to $\mu_X$ in the weak* topology.
Exercise 10.59 on page 618 shows that $X_n \xrightarrow{d} X$ if and only if
$$\lim_{n\to\infty} \int_{\mathcal{R}} f\,d\mu_{X_n} = \int_{\mathcal{R}} f\,d\mu_X, \qquad f \in C_b(\mathcal{R}).$$
Suppose that $X_n \xrightarrow{d} X$. Then it follows immediately from the previous
equation that
$$\lim_{n\to\infty} \hat{\mu}_{X_n}(t) = \hat{\mu}_X(t), \qquad t \in \mathcal{R}. \qquad (11.18)$$
In other words, convergence in distribution of a sequence of random variables
implies pointwise convergence of the Fourier transforms of the corresponding
distributions. The following important theorem, due to Paul
Lévy, provides a partial converse to this result.
THEOREM 11.11 Lévy's Theorem
Let $\{X_n\}_{n=1}^\infty$ be a sequence of random variables such that the sequence
$\{\hat{\mu}_{X_n}\}_{n=1}^\infty$ of Fourier transforms converges pointwise to a function h that is
continuous at t = 0. Then there is a random variable X such that $X_n \xrightarrow{d} X$
and $h = \hat{\mu}_X$.
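PROOF: Let $\{X_{n_k}\}_{k=1}^\infty$ be any subsequence of $\{X_n\}_{n=1}^\infty$ and $\mu_k = \mu_{X_{n_k}}$.
We will first show that $\{\mu_k\}_{k=1}^\infty$ has a subsequence $\{\mu_{k_j}\}_{j=1}^\infty$ that converges
in the weak* topology of $M(\mathcal{R})$ to a probability measure $\mu$. Because the
$\mu_k$s are probability measures, it follows from Theorem 10.13 on page 616
that there is a subsequence $\{\mu_{k_j}\}_{j=1}^\infty$ that converges in the weak* topology
of $M(\mathcal{R})$ to a regular Borel measure $\mu$ and, by Exercise 10.51 on page 617,
$\mu \in M_+(\mathcal{R})$.
We will show that $\mu$ is a probability measure. Let $\epsilon > 0$. Because h is
continuous at 0, there is a $\delta > 0$ such that $|h(t) - h(0)| < \epsilon$ for $|t| < \delta$.
Let c be a positive real number such that $c^{-1} < \delta$. Select a continuous
function g satisfying $0 \le g \le 1$, $g(x) = 1$ for $|x| \le 2c$, and $g(x) = 0$ for
$|x| \ge 2c + 1$. Applying Lemma 11.3, we have
$$\int_{\mathcal{R}} g(x)\,d\mu_{k_j}(x) \ge \mu_{k_j}([-2c, 2c]) \ge \sqrt{2\pi}\, c \left| \int_{-1/c}^{1/c} \hat{\mu}_{k_j}(t)\,dt \right| - 1. \qquad (11.19)$$
Letting $j \to \infty$ and using weak* convergence on the left-hand side of (11.19)
and dominated convergence on the right-hand side of (11.19), we obtain
$$\int_{\mathcal{R}} g(x)\,d\mu(x) \ge \sqrt{2\pi}\, c \left| \int_{-1/c}^{1/c} h(t)\,dt \right| - 1.$$
Using $[-1/c, 1/c] \subset (-\delta, \delta)$ and $h(0) = (2\pi)^{-1/2}$, we conclude that
$$\mu(\mathcal{R}) \ge \int_{\mathcal{R}} g(x)\,d\mu(x) \ge 1 - 2\sqrt{2\pi}\, \epsilon.$$
Because $\epsilon$ is an arbitrary positive number, it follows that $\mu(\mathcal{R}) \ge 1$. On
the other hand, if $f \in C_0(\mathcal{R})$ with $|f| \le 1$, then
$$\left| \int_{\mathcal{R}} f\,d\mu \right| = \lim_{j\to\infty} \left| \int_{\mathcal{R}} f\,d\mu_{k_j} \right| \le 1.$$
Applying the Riesz representation theorem (page 575), we deduce that
$\mu(\mathcal{R}) \le 1$. Thus, we have shown that $\mu(\mathcal{R}) = 1$.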
Next we apply Exercise 10.59 on page 618 to assert that for each
$f \in C_b(\mathcal{R})$, $\int_{\mathcal{R}} f\,d\mu_{k_j} \to \int_{\mathcal{R}} f\,d\mu$ as $j \to \infty$. Letting $f(x) = (2\pi)^{-1/2} e^{-itx}$,
we obtain that $h(t) = \hat{\mu}(t)$.
Now suppose $\{X_{m_k}\}_{k=1}^\infty$ is another subsequence of $\{X_n\}_{n=1}^\infty$. By the
preceding argument, there is a subsequence of $\{\mu_{X_{m_k}}\}_{k=1}^\infty$ that converges in
the weak* topology to a probability measure $\nu \in M(\mathcal{R})$ with $\hat{\nu}(t) = h(t)$.
Invoking the uniqueness property of Fourier transforms of measures
(Corollary 11.3), we conclude that $\nu = \mu$. Thus, we have shown that every
subsequence of $\{\mu_{X_n}\}_{n=1}^\infty$ has a subsequence converging weak* to the probability
measure $\mu$.
In a metric space, a sequence converges to a limit L if every subsequence
has a subsequence converging to L. Because the set of probability
measures in $M(\mathcal{R})$ is weak* metrizable, it follows that $\{\mu_{X_n}\}_{n=1}^\infty$ converges
in the weak* topology to the probability measure $\mu$.
To complete the proof, let X be the identity function on $\mathcal{R}$. Then, as
a random variable on the probability space $(\mathcal{R}, \mathcal{B}, \mu)$, we have that $\mu = \mu_X$
and, because w*-$\lim \mu_{X_n} = \mu$, we conclude that $X_n \xrightarrow{d} X$.
In view of Proposition 11.4(a) on page 663, the uniqueness property of
Fourier transforms, and Lévy's theorem, we obtain the following corollary.
COROLLARY 11.4
Let $X, X_1, X_2, \ldots$ be random variables. Then $X_n \xrightarrow{d} X$ if and only if
$\hat{\mu}_{X_n} \to \hat{\mu}_X$ pointwise as $n \to \infty$.
The Central Limit Theorem
The strong law of large numbers for sequences of independent and identi-
cally distributed (iid) random variables, Theorem 5.9 on page 308, is one
of the two most important theorems in probability theory. The other is
the central limit theorem. This remarkable and useful result states that
the partial sums of any sequence of iid random variables are asymptotically
normally distributed, provided only that the random variables have finite
variance.
We will use Lévy's theorem to prove the central limit theorem. But
first we require a lemma, the verification of which is left to the reader as
Exercise 11.50.
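LEMMA 11.4
Suppose that $\mu \in M_+(\mathcal{R})$ is such that $\mu(\mathcal{R}) = 1$ and $\int_{\mathcal{R}} x^2\,d\mu(x) < \infty$. Set
$m_1 = \int_{\mathcal{R}} x\,d\mu(x)$ and $m_2 = \int_{\mathcal{R}} x^2\,d\mu(x)$. Then
$$\sqrt{2\pi}\, \hat{\mu}(t) = 1 - i m_1 t - m_2 t^2/2 + a(t),$$
where $\lim_{t\to 0} a(t)/t^2 = 0$.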
THEOREM 11.12 Central Limit Theorem
Suppose $X_1, X_2, \ldots$ are mutually independent and identically distributed
random variables with mean m and finite variance $\sigma^2$. Let $S_n = \sum_{k=1}^n X_k$.
Then we have
$$\lim_{n\to\infty} P\left( a < \frac{S_n - nm}{\sqrt{n}\,\sigma} \le b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx$$
uniformly for all $-\infty < a < b < \infty$.
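PROOF: The reader is asked in Exercise 11.51 to show that we can without
loss of generality assume that m = 0 and $\sigma^2 = 1$. It follows from
Example 11.9(b) on page 666 that
$$\mu_{S_n} = (2\pi)^{(n-1)/2}\, \underbrace{\mu * \mu * \cdots * \mu}_{n\ \text{times}},$$
where $\mu$ denotes the common distribution of the $X_n$s. Let $Z_n = S_n/\sqrt{n}$.
Then, by Theorem 11.10(e) and Exercise 11.43,
$$\hat{\mu}_{Z_n}(t) = (2\pi)^{(n-1)/2} \left[ \hat{\mu}\!\left( \frac{t}{\sqrt{n}} \right) \right]^n.$$
Using Lemma 11.4, we get
$$\hat{\mu}_{Z_n}(t) = \frac{1}{\sqrt{2\pi}} \left[ 1 - \frac{t^2}{2n} + a\!\left( \frac{t}{\sqrt{n}} \right) \right]^n.$$
Consequently,
$$\lim_{n\to\infty} \hat{\mu}_{Z_n}(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2} = \frac{1}{\sqrt{2\pi}}\, g(t),$$
where g is the Gaussian function discussed in Example 11.6 on page 655.
By Lévy's theorem, the sequence $\{Z_n\}_{n=1}^\infty$ converges in distribution to
a random variable having the distribution $\nu(B) = (2\pi)^{-1/2} \int_B e^{-x^2/2}\,dx$.
Applying Exercise 10.59 on page 618, we conclude that
$$\lim_{n\to\infty} \int_{\mathcal{R}} f\,d\mu_{Z_n} = \int_{\mathcal{R}} f\,d\nu, \qquad f \in C_b(\mathcal{R}).$$
Let $0 < \epsilon < (b - a)/2$. Choose a continuous function $f_1$ such that
$0 \le f_1 \le 1$, $f_1(x) = 0$ for $x \notin (a, b)$, and $f_1(x) = 1$ for $x \in [a + \epsilon, b - \epsilon]$.
And choose a continuous function $f_2$ such that $0 \le f_2 \le 1$, $f_2(x) = 1$ for
$x \in [a, b]$, and $f_2(x) = 0$ for $x \notin (a - \epsilon, b + \epsilon)$. Then
$$\int_{\mathcal{R}} f_1(x)\,d\mu_{Z_n}(x) \le \mu_{Z_n}((a, b]) \le \int_{\mathcal{R}} f_2(x)\,d\mu_{Z_n}(x).$$
Thus,
$$\liminf_{n\to\infty} \mu_{Z_n}((a, b]) \ge \int_{\mathcal{R}} f_1(x)\,d\nu(x) \ge \frac{1}{\sqrt{2\pi}} \int_{a+\epsilon}^{b-\epsilon} e^{-x^2/2}\,dx$$
and
$$\limsup_{n\to\infty} \mu_{Z_n}((a, b]) \le \int_{\mathcal{R}} f_2(x)\,d\nu(x) \le \frac{1}{\sqrt{2\pi}} \int_{a-\epsilon}^{b+\epsilon} e^{-x^2/2}\,dx.$$
Because $\epsilon$ can be made arbitrarily small, it follows that
$$\frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx = \lim_{n\to\infty} \mu_{Z_n}((a, b]) = \lim_{n\to\infty} P\left( a < \frac{S_n}{\sqrt{n}} \le b \right),$$
as required. The uniformity in a and b follows from Exercise 11.52.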
EXAMPLE 11.10 Illustrates the Central Limit Theorem
As a consequence of the strong law of large numbers for iid random variables,
we proved, in Corollary 5.5, Borel's strong law of large numbers:
Suppose that E is an event associated with some random experiment and
let p be its probability. Denote by n(E) the number of times that event E
occurs in n independent repetitions of the experiment. Then, with probability
one, $\lim_{n\to\infty} n(E)/n = p$.
Similarly, we can obtain as a special case of the central limit theorem,
the following result known as the DeMoivre-Laplace theorem:
$$\lim_{n\to\infty} P\left( a < \frac{n(E) - np}{\sqrt{np(1 - p)}} \le b \right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx$$
uniformly for all $-\infty < a < b < \infty$. To prove this, define for each $n \in \mathcal{N}$,
$X_n = 1$ or 0 according to whether event E occurs or does not occur on
the nth repetition of the experiment. Then $X_1, X_2, \ldots$ are iid and have
common mean p and variance p(1 - p). Noting that $n(E) = X_1 + \cdots + X_n$, we
obtain the DeMoivre-Laplace theorem from the central limit theorem. □
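The DeMoivre-Laplace theorem can be observed without simulation, because the distribution of n(E) is binomial and its probabilities can be summed exactly. The following sketch is an illustration that is not part of the original text. It computes the left-hand probability for a finite n from the binomial probabilities (evaluated through `math.lgamma` to avoid overflow) and compares it with the normal integral expressed through `math.erf`. The function names and the sample values of n, p, a, and b are assumptions of this sketch.

```python
import math


def binomial_pmf(n, k, p):
    """P(n(E) = k) for n independent repetitions with success probability p, 0 < p < 1."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def standardized_binomial_probability(n, p, a, b):
    """P(a < (n(E) - n p) / sqrt(n p (1 - p)) <= b), summed exactly over k."""
    mean = n * p
    scale = math.sqrt(n * p * (1.0 - p))
    return sum(binomial_pmf(n, k, p)
               for k in range(n + 1)
               if a < (k - mean) / scale <= b)


def normal_probability(a, b):
    """(1/sqrt(2*pi)) * integral of exp(-x^2/2) over [a, b], via the error function."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(b) - cdf(a)
```

For instance, with n = 10000, p = 0.3, a = -1, and b = 1.5, the two quantities should differ by an amount of the order of $1/\sqrt{n}$, which is the size of a single binomial probability near the mean.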
EXERCISES 11.4
11.41	Prove Proposition 11.4 on page 663.
11.42	Prove Theorem 11.9 on page 664.
11.43	Establish the following facts.
a)	Let $\mu \in M(\mathcal{R})$ and $\nu(B) = \mu(a^{-1}(B - b))$, where a and b are constants
with $a \ne 0$. Show that $\hat{\nu}(t) = e^{-ibt} \hat{\mu}(at)$.
b)	Let X be a random variable and set Y = aX + b, where a and b are
constants with $a \ne 0$. Show that $\hat{\mu}_Y(t) = e^{-ibt} \hat{\mu}_X(at)$.
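11.44	Let $\mu \in M(\mathcal{R})$. Show that if $\int_{\mathcal{R}} |x|^k\,d|\mu|(x) < \infty$, where k is a positive
integer, then the kth derivative $\hat{\mu}^{(k)}$ exists and
$$\hat{\mu}^{(k)}(t) = \frac{1}{\sqrt{2\pi}} \int_{\mathcal{R}} (-ix)^k e^{-itx}\,d\mu(x), \qquad t \in \mathcal{R}.$$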
11.45	Prove Proposition 11.5 on page 665.
11.46	Prove Theorem 11.10 on page 666.
11.47	Let $\mu \in M(\mathcal{R})$. Show that $\hat{\mu}$ has period $2\pi$ if and only if $|\mu|(\mathcal{Z}^c) = 0$.
In the next two exercises, we borrow some terminology from communications
engineering. A measure $\mu \in M(\mathcal{R})$ is said to be time limited if it vanishes on
Borel subsets of $[-a, a]^c$ for some a > 0, and it is said to be band limited if
$\hat{\mu}$ vanishes outside of $[-b, b]$ for some b > 0.
11.48	Show that if $\mu$ is band limited, then there is an $f \in \mathcal{L}^1(\mathcal{R})$ such that
$\mu(B) = \int_B f\,d\lambda$ for all $B \in \mathcal{B}$. In other words, prove that every band-limited
measure is absolutely continuous with respect to Lebesgue measure.
11.49	Show that a measure that is both band limited and time limited must
vanish identically. Hint: Use the fact that if a function analytic on C
vanishes on a nonempty open interval, then it vanishes identically.
11.50	Prove Lemma 11.4 on page 669.
11.51	Show that it suffices to prove Theorem 11.12 in the case of zero mean and
unit variance.
11.52	Let $\{X_n\}_{n=1}^\infty$ be a sequence of random variables.
a)	Show that if $X_n \xrightarrow{d} X$, where X is a continuous random variable, then
$F_{X_n} \to F_X$ uniformly on $\mathcal{R}$. Hint: Show that for each $\epsilon > 0$, there
is a T > 0 such that $\mu_{X_n}([-T, T]^c) < \epsilon$ for all n. Use the uniform
continuity of $F_X$.
b)	Use part (a) to deduce the uniformity in the central limit theorem.
11.5 £2-THEORY OF THE FOURIER TRANSFORM
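Because $\mathcal{L}^2_{2\pi} \subset \mathcal{L}^1_{2\pi}$, Fourier coefficients are defined for functions in $\mathcal{L}^2_{2\pi}$.
Indeed, the theory of Fourier series for $\mathcal{L}^2_{2\pi}$-functions is particularly well
understood. In the sense of convergence in the norm of $\mathcal{L}^2_{2\pi}$, we have, for
$f \in \mathcal{L}^2_{2\pi}$, that
$$f(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n)\, e^{inx} \qquad (11.20)$$
and, furthermore,
$$\|f\|_2^2 = 2\pi \sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2. \qquad (11.21)$$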
Given the strong analogy between Fourier coefficients and the Fourier
transform, we would expect similar results to hold for functions in $\mathcal{L}^2(\mathcal{R})$
provided, of course, that the sums in (11.20) and (11.21) are replaced by
suitable integrals.
However, there is an immediate problem: Because $\mathcal{L}^2(\mathcal{R}) \not\subset \mathcal{L}^1(\mathcal{R})$,
the Fourier transform is not defined for all functions in $\mathcal{L}^2(\mathcal{R})$. To proceed
we must therefore first provide an appropriate definition of the Fourier
transform of such functions. In this section, we will see that the "correct"
definition leads naturally to extensions of (11.20) and (11.21).
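We begin by studying the integral
$$\int_{-\infty}^{\infty} |\hat{f}(t)|^2\,dt, \qquad f \in \mathcal{L}^1(\mathcal{R}). \qquad (11.22)$$
In particular, we would like to know when this integral is finite. By the
monotone convergence theorem, the finiteness of (11.22) is equivalent to
that of $\lim_{T\to\infty} \int_{-T}^{T} |\hat{f}(t)|^2\,dt$.
Referring to Example 11.5 on page 655 and applying Fubini's theorem,
we get that
$$\int_{-T}^{T} |\hat{f}(t)|^2\,dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, \overline{f(y)} \int_{-T}^{T} e^{-it(x-y)}\,dt\,dx\,dy$$
$$= \frac{1}{\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, \overline{f(y)}\; T \operatorname{sinc}(T(x - y))\,dx\,dy.$$
The presence of the term $\operatorname{sinc}(T(x - y))$ makes these integrals hard to
handle. Consequently, we will employ the averaging technique used in
previous sections.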
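We first note that
$$\frac{1}{T} \int_0^T \int_{-t}^{t} |\hat{f}(s)|^2\,ds\,dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, \overline{f(y)}\, G_T(x - y)\,dx\,dy, \qquad (11.23)$$
where
$$G_T(x) = \sqrt{\frac{2}{\pi}}\; \frac{1 - \cos(Tx)}{Tx^2} = \sqrt{\frac{2}{\pi}}\; \frac{\sin^2(Tx/2)}{Tx^2/2}.$$
Because, by Exercise 11.53,
$$\lim_{T\to\infty} \int_{-T}^{T} |\hat{f}(t)|^2\,dt = \lim_{T\to\infty} \frac{1}{T} \int_0^T \int_{-t}^{t} |\hat{f}(s)|^2\,ds\,dt,$$
it follows that finiteness in (11.22) is equivalent to that of the right-hand
side of the previous equation. Our strategy therefore is to examine finiteness
in (11.22) by working with the right-hand side of (11.23).
LEMMA 11.5
If $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, then $\hat{f} \in \mathcal{L}^2(\mathcal{R})$ and $\|\hat{f}\|_2 = \|f\|_2$.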
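PROOF: By Lemma 11.2 on page 658,
$$\|f\|_2^2 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \|f\|_2^2\, G_T(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y)\, \overline{f(y)}\, G_T(x)\,dx\,dy.$$
On the other hand, from (11.23) we have
$$\frac{1}{T} \int_0^T \int_{-t}^{t} |\hat{f}(s)|^2\,ds\,dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x + y)\, \overline{f(y)}\, G_T(x)\,dx\,dy.$$
Using Fubini's theorem, we get
$$\frac{1}{T} \int_0^T \int_{-t}^{t} |\hat{f}(s)|^2\,ds\,dt - \|f\|_2^2
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \bigl(f(x + y) - f(y)\bigr)\, \overline{f(y)}\, G_T(x)\,dy\,dx,$$
and applying Cauchy's inequality gives
$$\left| \frac{1}{T} \int_0^T \int_{-t}^{t} |\hat{f}(s)|^2\,ds\,dt - \|f\|_2^2 \right| \le \frac{1}{\sqrt{2\pi}}\, \|f\|_2 \int_{-\infty}^{\infty} \|f_{-x} - f\|_2\, G_T(x)\,dx.$$
We now proceed as in the proof of Theorem 11.7(b) on page 658 to show
that the right-hand side of the preceding inequality tends to 0 as $T \to \infty$.
Thus, we have
$$\int_{-\infty}^{\infty} |\hat{f}(t)|^2\,dt = \lim_{T\to\infty} \frac{1}{T} \int_0^T \int_{-t}^{t} |\hat{f}(s)|^2\,ds\,dt = \|f\|_2^2,$$
as required.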
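The norm identity of Lemma 11.5 can be checked on a concrete function. The sketch below is an illustration that is not part of the original text. It takes $f(x) = e^{-|x|}$, which lies in both $\mathcal{L}^1(\mathcal{R})$ and $\mathcal{L}^2(\mathcal{R})$ and has the closed-form transform $\hat{f}(t) = \sqrt{2/\pi}/(1 + t^2)$ under Definition 11.2, and compares the two squared norms by a midpoint rule on truncated intervals. The cutoffs, step counts, and function names are assumptions of this sketch.

```python
import math


def two_sided_exponential(x):
    """f(x) = exp(-|x|), which is in both L^1 and L^2 of the real line."""
    return math.exp(-abs(x))


def two_sided_exponential_hat(t):
    """Closed-form transform of exp(-|x|) under Definition 11.2."""
    return math.sqrt(2.0 / math.pi) / (1.0 + t * t)


def l2_norm_squared(fn, cutoff, steps):
    """Midpoint-rule approximation of the integral of |fn|^2 over [-cutoff, cutoff]."""
    h = 2.0 * cutoff / steps
    return h * sum(abs(fn(-cutoff + (k + 0.5) * h)) ** 2 for k in range(steps))


def plancherel_gap():
    """| ||f_hat||_2^2 - ||f||_2^2 | for f(x) = exp(-|x|); both norms equal 1 exactly."""
    norm_f = l2_norm_squared(two_sided_exponential, 30.0, 60000)
    norm_f_hat = l2_norm_squared(two_sided_exponential_hat, 200.0, 200000)
    return abs(norm_f_hat - norm_f)
```

Both squared norms should come out close to 1, so `plancherel_gap()` should be small, in agreement with $\|\hat{f}\|_2 = \|f\|_2$.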
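THEOREM 11.13 Plancherel's Theorem
There is a unique linear operator $\mathcal{F}: \mathcal{L}^2(\mathcal{R}) \to \mathcal{L}^2(\mathcal{R})$ with the following
properties:
a)	For each $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, we have $\mathcal{F}(f) = \hat{f}$ ae.
b)	For each $f \in \mathcal{L}^2(\mathcal{R})$, we have $\lim_{M\to\infty} \|\mathcal{F}(f) - \widehat{\tau_M f}\|_2 = 0$, where
$\tau_M f = \chi_{[-M,M]} f$.
c)	$\|\mathcal{F}(f)\|_2 = \|f\|_2$ for each $f \in \mathcal{L}^2(\mathcal{R})$.
d)	$\langle \mathcal{F}(f), \mathcal{F}(g) \rangle = \langle f, g \rangle$ for $f, g \in \mathcal{L}^2(\mathcal{R})$.
e)	$\mathcal{F}(\mathcal{F}(f))(x) = f(-x)$ ae for each $f \in \mathcal{L}^2(\mathcal{R})$.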
PROOF: Let f G £2(7£) and let {A/n}^=1 be a sequence of positive num-
bers tending to oo. Then we have {тмп/}™=1 С £2(7£) A £1(7^) and
limn^oo ||тд/п/ — f ||2 = 0. Let fn = TMnf. From Lemma 11.5, it fol-
lows that \\fn - fm||2 = \\fn - /mlh- Consequently, the sequence {fn}™=1
is Cauchy. Using the completeness of £2(7£), we now define
JT(f) = lim f^ = lim r^f.	(11.24)
n—+OO	n—*oo
As the reader is asked to show in Exercise 11.54, the limit in (11.24) is
independent of the particular sequence of Mns.
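If $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, then the dominated convergence theorem implies
that the sequence $\{\hat f_n\}_{n=1}^{\infty}$ converges pointwise to $\hat f$. Thus, part (a) holds.
That $\mathcal{F}$ is a linear operator can be seen as follows: Let $f, g \in \mathcal{L}^2(\mathcal{R})$
and $\alpha, \beta \in \mathcal{C}$, and, for each $n \in \mathcal{N}$, set $h_n = \tau_{M_n}\alpha f + \tau_{M_n}\beta g$. Using the
linearity of the Fourier transform on $\mathcal{L}^1(\mathcal{R})$, we have $\hat h_n = \alpha\,\widehat{\tau_{M_n} f} + \beta\,\widehat{\tau_{M_n} g}$.
Passing to the limit on $n$, we get $\mathcal{F}(\alpha f + \beta g) = \alpha\mathcal{F}(f) + \beta\mathcal{F}(g)$.
Part (b) follows from the definition of $\mathcal{F}(f)$ and Exercise 11.54. We
obtain part (c) from Lemma 11.5 via
$$\|\mathcal{F}(f)\|_2 = \lim_{n\to\infty} \|\widehat{\tau_{M_n} f}\|_2 = \lim_{n\to\infty} \|\tau_{M_n} f\|_2 = \|f\|_2.$$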
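The uniqueness of a linear operator satisfying parts (a) and (c) is a consequence
of the fact that $\mathcal{L}^1(\mathcal{R}) \cap \mathcal{L}^2(\mathcal{R})$ is a dense subset of $\mathcal{L}^2(\mathcal{R})$. Part (d)
is left to the reader as Exercise 11.55.
It remains to prove part (e). Since $\mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$ is dense in $\mathcal{L}^2(\mathcal{R})$,
it suffices to prove that $\mathcal{F}(\mathcal{F}(f)) = R(f)$ for all $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, where
$R(f)(x) = f(-x)$. So assume $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$. Then, as the reader is
asked to show in Exercise 11.55(b),
$$f * G_T \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R}) \quad\text{and}\quad \widehat{f * G_T} \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R}) \tag{11.25}$$
and
$$\lim_{T\to\infty} \|f * G_T - f\|_2 = 0. \tag{11.26}$$
Applying parts (a) and (c), (11.25), (11.26), and Theorem 11.8 on page 660,
we obtain
$$\begin{aligned}
0 = \lim_{T\to\infty} \|f * G_T - f\|_2 &= \lim_{T\to\infty} \|\mathcal{F}(\mathcal{F}(f * G_T)) - \mathcal{F}(\mathcal{F}(f))\|_2 \\
&= \lim_{T\to\infty} \|R(f * G_T) - \mathcal{F}(\mathcal{F}(f))\|_2 = \|R(f) - \mathcal{F}(\mathcal{F}(f))\|_2.
\end{aligned}$$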
The proof of Plancherel’s theorem is now complete.	
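The operator $\mathcal{F}$ given in Plancherel’s theorem extends the definition of
the Fourier transform to the space $\mathcal{L}^2(\mathcal{R})$. From now on we will call $\mathcal{F}(f)$
the Fourier transform of an $\mathcal{L}^2(\mathcal{R})$-function $f$ and write
$$\mathcal{F}(f) = \hat f, \qquad f \in \mathcal{L}^2(\mathcal{R}).$$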
Although strictly speaking this is an abuse of notation, we observe from
part (a) of Plancherel’s theorem that this notation is consistent with pre-
vious usage.
EXAMPLE 11.11 Illustrates Plancherel’s Theorem
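The function sinc is not in $\mathcal{L}^1(\mathcal{R})$, but it is in $\mathcal{L}^2(\mathcal{R})$. By Example 11.5 on
page 655 we have $\operatorname{sinc} t = \sqrt{\pi/2}\;\hat\chi_{[-1,1]}(t)$. Applying part (e) of Plancherel’s
theorem, we deduce that
$$\widehat{\operatorname{sinc}}(x) = \sqrt{\pi/2}\;\chi_{[-1,1]}(-x) = \sqrt{\pi/2}\;\chi_{[-1,1]}(x)$$
for almost all $x$. In particular, $\widehat{\operatorname{sinc}}$ is not continuous.	□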
The Fourier transform on $\mathcal{L}^2(\mathcal{R})$ retains some of the properties that it
has on $\mathcal{L}^1(\mathcal{R})$, but others are lost. Specifically, our next theorem shows
that part (d) of Theorem 11.5 remains valid for $\mathcal{L}^2(\mathcal{R})$-functions, as do
parts (e) and (f), provided we modify the notion of derivative. On the other
hand, the Fourier transform of an $\mathcal{L}^2(\mathcal{R})$-function need not be continuous,
as Example 11.11 shows.
Let $f \in \mathcal{L}^2(\mathcal{R})$. Then we say that $\phi \in \mathcal{L}^2(\mathcal{R})$ is the derivative of $f$
in the $\mathcal{L}^2$-sense, and write $\phi = f'$, if $\|(f_{-h} - f)/h - \phi\|_2 \to 0$ as $h \to 0$.
That this definition of derivative is close to the usual one can be seen from
Exercise 11.57(a).
THEOREM 11.14
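Let $f \in \mathcal{L}^2(\mathcal{R})$. Then the following hold:
a) For $a, b \in \mathcal{R}$ and $a \ne 0$, we have $\widehat{f_{a,b}}(t) = \sqrt{|a|}\, e^{-ibt} \hat f(at)$ ae.
b) If $\int_{-\infty}^{\infty} x^2 |f(x)|^2\,dx < \infty$, then we have $(\hat f)' = \hat g$ in the $\mathcal{L}^2$-sense, where
$g(x) = -ixf(x)$.
c) If $f'$ exists in the $\mathcal{L}^2$-sense, then $\widehat{f'}(t) = it\hat f(t)$ ae.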
PROOF: See Exercise 11.57.	
EXERCISES 11.5
11.53	Show that
$$\lim_{T\to\infty} \int_{-T}^{T} |f(t)|^2\,dt = \lim_{T\to\infty} \frac{1}{T}\int_0^T\!\int_{-t}^{t} |f(s)|^2\,ds\,dt.$$
11.54	Prove that the limit in (11.24) on page 675 is independent of the particular
sequence of $M_n$s tending to $\infty$.
11.55	Refer to Plancherel’s theorem (page 674).
a)	Prove part (d) of the theorem.
b)	Verify (11.25) and (11.26).
11.56	Show that the Fourier transform is onto $\mathcal{L}^2(\mathcal{R})$.
11.57	Establish the following.
a)	If $f \in \mathcal{L}^2(\mathcal{R})$ is an absolutely continuous function such that $f' \in \mathcal{L}^2(\mathcal{R})$,
then $f'$ is also the derivative of $f$ in the $\mathcal{L}^2$-sense.
b)	Theorem 11.14 on page 676.
11.58	Let $f$ be a continuous function in $\mathcal{L}^2(\mathcal{R})$ such that $\sum_{n=-\infty}^{\infty} |f(n)| < \infty$
and $\hat f(t) = 0$ for $|t| > \pi$.
a)	Show that, as a function on $[-\pi, \pi]$, $\hat f$ has the Fourier series expansion
$\hat f(t) = \sum_{n=-\infty}^{\infty} c_n e^{int}$, where $c_n = (2\pi)^{-1/2} f(-n)$.
b)	Show that $f(x) = \sum_{n=-\infty}^{\infty} f(n)\operatorname{sinc}(\pi(x - n))$ for each $x \in \mathcal{R}$.
c)	Use part (b) to prove the Shannon sampling theorem: Let $L$ be
a positive constant. Suppose that $g$ is a continuous function in $\mathcal{L}^2(\mathcal{R})$
such that $\sum_{n=-\infty}^{\infty} |g(n\pi/L)| < \infty$ and $\hat g(t) = 0$ for $|t| \ge L$. Then
$g(x) = \sum_{n=-\infty}^{\infty} g(n\pi/L)\operatorname{sinc}(Lx - \pi n)$. The Shannon sampling theorem
is used extensively in communications engineering.
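The sampling formula of Exercise 11.58(b) can be explored numerically. The following Python sketch is only an illustration and proves nothing. It assumes the convention sinc t = (sin t)/t with sinc 0 = 1, and it uses the hypothetical test function $f(x) = \operatorname{sinc}^2(\pi x/2)$, which is continuous, satisfies $\sum |f(n)| < \infty$, and whose Fourier transform is a triangle supported on $[-\pi, \pi]$. The names `f`, `sinc`, and `sampling_sum` are ours.

```python
import math

def sinc(t):
    """sinc t = sin(t)/t, with the value 1 at t = 0 (assumed convention)."""
    return 1.0 if t == 0.0 else math.sin(t) / t

def f(x):
    """Hypothetical band-limited test function sinc(pi*x/2)^2."""
    return sinc(math.pi * x / 2.0) ** 2

def sampling_sum(x, N):
    """Symmetric partial sum  sum_{|n| <= N} f(n) sinc(pi (x - n))."""
    return sum(f(n) * sinc(math.pi * (x - n)) for n in range(-N, N + 1))

# Compare f(x) with the truncated sampling series at a few non-integer points.
for x in (0.25, 1.5, 3.7):
    print(x, f(x), sampling_sum(x, 2000))
```

For this choice of $f$ the discarded terms are of order $n^{-3}$, so the truncated sum should agree with $f(x)$ to several decimal places.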
Exercises 11.59-11.66 consider an important class of special functions closely
related to the Fourier transform.
11.59	Let $\mu$ be the measure defined by $\mu(B) = \int_B e^{-x^2}\,d\lambda(x)$ and $\langle\ ,\ \rangle_\mu$ the inner
product induced by $\mu$.
a)	Verify that the space $\mathcal{L}^2(\mu)$ contains all polynomials.
b)	Apply the Gram-Schmidt orthogonalization technique (see Theorem 9.5
on page 548) to the sequence $1, x, x^2, \ldots$ to obtain a sequence of polynomials
$H_0, H_1, \ldots$, where $H_n$ is of degree $n$, that are orthonormal
with respect to $\langle\ ,\ \rangle_\mu$. The $H_n$s are often referred to as Hermite
polynomials.
c)	Deduce that any polynomial p of degree n can be written in the form
11.60	Refer to Exercise 11.59.
a)	Deduce that the functions $h_n(x) = H_n(x)e^{-x^2/2}$ constitute an orthonormal
sequence in $\mathcal{L}^2(\mathcal{R})$.
b)	Show that $k_n = \hat h_n$ takes the form $k_n(t) = K_n(t)e^{-t^2/2}$, where $K_n$ is a
polynomial of degree at most $n$.
11.61	Prove that, for each $n \in \mathcal{N}$, we have $K_n = a_n H_n$, where $a_n \in \{1, -1, i, -i\}$.
11.62	Let $\alpha_n$ denote the leading coefficient of $H_n$. Show that
$$h_n'(x) - x h_n(x) = -2\,\frac{\alpha_n}{\alpha_{n+1}}\, h_{n+1}(x).$$
11.63	Prove that
$$e^{-x^2/2} h_n(x) = c_n \frac{d^n}{dx^n} e^{-x^2},$$
where $c_n = (-1)^n \alpha_n 2^{-n}$.
11.64	Referring to Exercise 11.61, verify that $a_n = (-i)^n$.
11.65	Show that the collection of functions $\{h_0, h_1, h_2, \ldots\}$ forms an orthonormal
basis for $\mathcal{L}^2(\mathcal{R})$.
11.66	Show that the Fourier transform of a function $f \in \mathcal{L}^2(\mathcal{R})$ can be expressed
11.6	INTRODUCTION TO WAVELETS
The theory of Fourier series seeks expansions of the form
$$f(x) = \sum_{n=-\infty}^{\infty} \hat f(n) e^{inx}$$
that express the function $f$ as an infinite linear combination of dilations of
the basic oscillating function $E(x) = e^{ix}$.
Similarly, wavelet theory is concerned with expansions of the form
$$f(x) = \sum_{n,m=-\infty}^{\infty} c_{nm}\,\psi(a_n x + b_m) \tag{11.27}$$
that express $f$ as an infinite linear combination of translations-dilations of
a single function $\psi$ called a wavelet. Wavelet theory, however, unlike the
theory of Fourier series, emphasizes the case where $\psi$ is localized, that is,
$\psi$ vanishes or decays rapidly outside of some bounded interval.
This and the following section provide a brief introduction to the bur-
geoning theory of wavelets. We begin with a discussion of the family of
Haar wavelets. Motivated by the example of Haar wavelets, we will then
introduce the concept of a multiresolution analysis of £2(7£).
In our discussion of wavelets, we will restrict ourselves to functions
in the Hilbert space $\mathcal{L}^2(\mathcal{R})$. And when we consider convergence of expansions
of the form (11.27), we will always do so in the context of the usual
$\mathcal{L}^2(\mathcal{R})$-norm. It will therefore be unambiguous to drop the subscript on
that norm and to write $\langle\ ,\ \rangle$ for the usual inner product on $\mathcal{L}^2(\mathcal{R})$.
As a further restriction, we will only investigate expansions of the
form (11.27) in case $a_n = 2^{-n}$ and $b_m = -m$, where $n$ and $m$ vary over the
set $\mathcal{Z}$ of all integers. Double sums of the form $\sum_{n,m=-\infty}^{\infty}$ will be denoted
by $\sum_{n,m}$.
Wavelets and Haar Wavelets
In what follows, we will employ the notation
$$f_{(n,m)}(x) = f_{2^n,\,2^n m}(x) = 2^{-n/2} f(2^{-n}x - m).$$
It is important to note that if $f \in \mathcal{L}^2(\mathcal{R})$ and $\|f\| = 1$, then we have
$\|f_{(n,m)}\| = \|f\| = 1$ for all $n, m \in \mathcal{Z}$.
DEFINITION 11.6 Orthonormal Wavelet Basis, Wavelet
Let $\psi \in \mathcal{L}^2(\mathcal{R})$. If the collection of functions $\{\psi_{(n,m)} : n, m \in \mathcal{Z}\}$
is an orthonormal basis for $\mathcal{L}^2(\mathcal{R})$, then it is called an orthonormal
wavelet basis and the function $\psi$ is called a basic wavelet or, more
simply, a wavelet.
The following example introduces an important orthonormal wavelet
basis and illustrates some basic ideas of wavelet theory.
EXAMPLE 11.12 The Haar Wavelet
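For each $n \in \mathcal{Z}$, let $V_n$ denote the set of all functions in $\mathcal{L}^2(\mathcal{R})$ that are
constant on every interval of the form $[\ell 2^n, (\ell+1)2^n)$, where $\ell \in \mathcal{Z}$. Clearly,
we have $V_n \subset V_{n-1}$ for each $n \in \mathcal{Z}$. Moreover, as the reader is asked to
verify in Exercise 11.67, we have
$$V_n \text{ is a closed linear subspace of } \mathcal{L}^2(\mathcal{R}), \tag{11.28}$$
$$\overline{\bigcup_{n\in\mathcal{Z}} V_n} = \mathcal{L}^2(\mathcal{R}), \tag{11.29}$$
$$\bigcap_{n\in\mathcal{Z}} V_n = \{0\}. \tag{11.30}$$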
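Let $\varphi = \chi_{[0,1)}$. From Exercise 11.68,
$$V_n = \overline{\operatorname{span}}\{\varphi_{(n,m)} : m \in \mathcal{Z}\}. \tag{11.31}$$
Applying (11.29), it follows that
$$\overline{\operatorname{span}}\{\varphi_{(n,m)} : m, n \in \mathcal{Z}\} = \mathcal{L}^2(\mathcal{R}). \tag{11.32}$$
Also, for each $n \in \mathcal{Z}$, the family $\{\varphi_{(n,m)} : m \in \mathcal{Z}\}$ is orthonormal.
But although $\{\varphi_{(n,m)} : m, n \in \mathcal{Z}\}$ resembles an orthonormal wavelet
basis, it is not because it lacks orthogonality. Indeed, we have, for example,
that $\langle \varphi_{(n,0)}, \varphi_{(0,0)} \rangle \ne 0$. The problem is that the $\varphi_{(n,m)}$s are nonnegative-valued.
To produce an orthonormal wavelet basis, we will modify $\varphi$.
To that end, we define
$$h(x) = \begin{cases} 1, & \text{if } 0 \le x < 1/2; \\ -1, & \text{if } 1/2 \le x < 1; \\ 0, & \text{otherwise.} \end{cases}$$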
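Members of the family $B_h = \{h_{(n,m)} : n, m \in \mathcal{Z}\}$ are referred to as Haar
functions.
It is not difficult to show that $B_h$ is orthonormal; but verifying that
it is a basis for $\mathcal{L}^2(\mathcal{R})$ is somewhat more challenging. Suppose we can
prove that
$$\varphi \in \overline{\operatorname{span}}\, B_h. \tag{11.33}$$
Then, because
$$f \in \overline{\operatorname{span}}\, B_h \;\Rightarrow\; f_{(n,m)} \in \overline{\operatorname{span}}\, B_h, \qquad n, m \in \mathcal{Z}, \tag{11.34}$$
(see Exercise 11.70), it follows that
$$\overline{\operatorname{span}}\{\varphi_{(n,m)} : m, n \in \mathcal{Z}\} \subset \overline{\operatorname{span}}\, B_h.$$
Thus, by (11.32), $\overline{\operatorname{span}}\, B_h = \mathcal{L}^2(\mathcal{R})$ and, hence, $B_h$ is a basis.
It remains to verify (11.33), which we will do by proving that
$$\varphi = \sum_{n,m} \langle \varphi, h_{(n,m)} \rangle\, h_{(n,m)}. \tag{11.35}$$
As the reader is asked to show in Exercise 11.71,
$$\langle \varphi, h_{(n,m)} \rangle = \begin{cases} 2^{-n/2}, & \text{if } n > 0 \text{ and } m = 0; \\ 0, & \text{otherwise.} \end{cases}$$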
To establish (11.35), we first prove that
$$\varphi(x) = \sum_{n=1}^{\infty} 2^{-n} h(2^{-n}x), \qquad x \in \mathcal{R}. \tag{11.36}$$
Clearly, both sides of (11.36) are 0 if $x < 0$. For $x \in [0,1)$, we have
$2^{-n}x \in [0,1/2)$ for all $n \ge 1$ and, consequently, both the left- and right-hand
sides of (11.36) equal 1.
For $x \in [1,\infty)$, select $k \in \mathcal{N}$ such that $2^{k-1} \le x < 2^k$. Then we
have $2^{-n}x \in [0,1/2)$ for $n > k$, $2^{-k}x \in [1/2,1)$, and $2^{-n}x \in [1,\infty)$ for
$n < k$. Therefore, the right-hand side of (11.36) is $-2^{-k} + \sum_{n=k+1}^{\infty} 2^{-n} = 0$
which, of course, equals the left-hand side of (11.36). It now follows from
Proposition 9.3 on page 532 and the DCT that (11.35) holds.
We have now shown that the family of Haar functions constitutes an
orthonormal basis for $\mathcal{L}^2(\mathcal{R})$. Hence, it forms an orthonormal wavelet basis
and $h$ is a wavelet, called the Haar wavelet.	□
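The pointwise identity (11.36), $\varphi(x) = \sum_{n \ge 1} 2^{-n} h(2^{-n}x)$, can be checked with exact rational arithmetic. The Python sketch below is an illustration only; the function names `phi`, `h`, and `partial_sum` are ours. It evaluates the $N$th partial sum, which for $0 \le x < 2^{N-1}$ should differ from $\varphi(x)$ by exactly $2^{-N}$.

```python
from fractions import Fraction

def phi(x):
    """Characteristic function of [0, 1)."""
    return 1 if 0 <= x < 1 else 0

def h(x):
    """Haar wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 otherwise."""
    if 0 <= x < Fraction(1, 2):
        return 1
    if Fraction(1, 2) <= x < 1:
        return -1
    return 0

def partial_sum(x, N):
    """Exact value of sum_{n=1}^{N} 2^{-n} h(2^{-n} x)."""
    x = Fraction(x)
    return sum(Fraction(1, 2 ** n) * h(x / 2 ** n) for n in range(1, N + 1))

# phi(x) minus the N-th partial sum, for points in [0,1), [1,oo), and (-oo,0).
for x in (Fraction(1, 3), 5, -2):
    print(x, phi(x) - partial_sum(x, 10))
```

The remainder $2^{-N}$ on $[0, 2^{N})$ has $\mathcal{L}^2$-norm $2^{-N/2}$, which is consistent with convergence of the expansion in the $\mathcal{L}^2(\mathcal{R})$-norm.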
Multiresolution Analysis
Guided by the essential features of Example 11.12, we can establish a gen-
eral framework for constructing orthonormal wavelet bases. Specifically,
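we will work with a sequence $\{V_n\}_{n=-\infty}^{\infty}$ of closed subspaces of $\mathcal{L}^2(\mathcal{R})$ satisfying
the following conditions.
(M1) $\cdots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \subset \cdots$.
(M2) $\overline{\bigcup_{n\in\mathcal{Z}} V_n} = \mathcal{L}^2(\mathcal{R})$.
(M3) $\bigcap_{n\in\mathcal{Z}} V_n = \{0\}$.
(M4) $f \in V_n$ if and only if $f_{(-n,0)} \in V_0$.
(M5) $f \in V_0$ if and only if $f_{(0,m)} \in V_0$ for all $m \in \mathcal{Z}$.
(M6) There is a function $\varphi \in V_0$ such that $\{\varphi_{(0,m)} : m \in \mathcal{Z}\}$ is an
orthonormal basis for $V_0$.
A sequence $\{V_n\}_{n=-\infty}^{\infty}$ of closed subspaces of $\mathcal{L}^2(\mathcal{R})$ satisfying (M1)-(M6)
is said to be a multiresolution analysis of $\mathcal{L}^2(\mathcal{R})$.
As we observed in Example 11.12, if we let $V_n$ denote the collection
of $\mathcal{L}^2(\mathcal{R})$-functions that are constant on every interval of the form
$[\ell 2^n, (\ell+1)2^n)$, where $\ell \in \mathcal{Z}$, then $\{V_n\}_{n=-\infty}^{\infty}$ is a multiresolution analysis
with $\varphi = \chi_{[0,1)}$.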
In the general setting described by (M1)-(M6), the family of functions
$\{\varphi_{(n,m)} : m \in \mathcal{Z}\}$ is an orthonormal basis for $V_n$ for each $n$, but $\varphi_{(n,m)}$
and $\varphi_{(j,k)}$ may not be orthogonal if $n \ne j$. Rather, an orthonormal wavelet
basis can be constructed using $\varphi$, as we will show in the next section.
For the remainder of this chapter, we will assume that $\{V_n\}_{n=-\infty}^{\infty}$ is a
multiresolution analysis. The orthogonal projection of $\mathcal{L}^2(\mathcal{R})$ onto $V_n$ will
be denoted by $P_n$.
Our next lemma, whose proof is left to the reader as Exercise 11.72,
will be needed in our development of the theory of wavelets.
LEMMA 11.6
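Let $W_0 = V_0^{\perp} \cap V_{-1}$, $W_n = \{f_{(n,0)} : f \in W_0\}$, and $Q_n\colon \mathcal{L}^2(\mathcal{R}) \to W_n$ be
the orthogonal projection of $\mathcal{L}^2(\mathcal{R})$ onto $W_n$. Then the following hold.
a) If $\ell \ne n$, then $\langle f, g \rangle = 0$ for all $f \in W_\ell$ and $g \in W_n$.
b) $P_{n-1} = P_n + Q_n$.
c) For each $f \in \mathcal{L}^2(\mathcal{R})$, we have $f = \sum_{n=-\infty}^{\infty} Q_n(f)$, where the series
converges absolutely with respect to the $\mathcal{L}^2(\mathcal{R})$-norm.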
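Now, if $\{\psi_{(0,m)} : m \in \mathcal{Z}\}$ is an orthonormal basis for $W_0$, then
$\{\psi_{(n,m)} : m \in \mathcal{Z}\}$ is an orthonormal basis for $W_n$. Hence, by Theorem 9.6
on page 549, $Q_n(f) = \sum_{m=-\infty}^{\infty} \langle f, \psi_{(n,m)} \rangle \psi_{(n,m)}$ for each $f \in \mathcal{L}^2(\mathcal{R})$.
It follows from Lemma 11.6(c) that $f = \sum_{n,m} \langle f, \psi_{(n,m)} \rangle \psi_{(n,m)}$. On
the other hand, by Lemma 11.6(a), the family $\{\psi_{(n,m)} : n, m \in \mathcal{Z}\}$ is
orthonormal. Thus, $\{\psi_{(n,m)} : n, m \in \mathcal{Z}\}$ is an orthonormal wavelet basis
for $\mathcal{L}^2(\mathcal{R})$.
We have shown that if $\{\psi_{(0,m)} : m \in \mathcal{Z}\}$ is an orthonormal basis
for $W_0$, then $\{\psi_{(n,m)} : n, m \in \mathcal{Z}\}$ is an orthonormal wavelet basis
for $\mathcal{L}^2(\mathcal{R})$. We conclude this section by giving sufficient conditions for
$\{\psi_{(0,m)} : m \in \mathcal{Z}\}$ to be an orthonormal basis for $W_0$.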
PROPOSITION 11.6
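Suppose $\psi \in W_0$ is such that $\|\psi\| = 1$ and also satisfies the following two
conditions:
a)	For each $n \in \mathcal{Z} \setminus \{0\}$, we have $\int_{-\infty}^{\infty} e^{int} |\hat\psi(t)|^2\,dt = 0$.
b)	For each $f \in W_0$, there exists $F \in \mathcal{L}^2_{2\pi}$ such that $\hat f = F\hat\psi$.
Then $\{\psi_{(0,m)} : m \in \mathcal{Z}\}$ is an orthonormal basis for $W_0$ and, consequently,
$\{\psi_{(n,m)} : n, m \in \mathcal{Z}\}$ is an orthonormal wavelet basis for $\mathcal{L}^2(\mathcal{R})$.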
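PROOF: Applying condition (a), Plancherel’s theorem (page 674), and
Theorem 11.14 (page 676), we get that
$$\langle \psi_{(0,m)}, \psi_{(0,p)} \rangle = \int_{-\infty}^{\infty} e^{i(p-m)t} |\hat\psi(t)|^2\,dt = \begin{cases} 0, & \text{if } p \ne m; \\ 1, & \text{if } p = m. \end{cases}$$
Thus, $\{\psi_{(0,m)} : m \in \mathcal{Z}\}$ is orthonormal. We will show that it forms a basis
for $W_0$ by applying Theorem 9.6(c) on page 549.
Suppose that $f \in W_0$ and $\langle f, \psi_{(0,m)} \rangle = 0$ for each $m \in \mathcal{Z}$. Then, by
condition (b) and, again, Plancherel’s theorem and Theorem 11.14,
$$\int_{-\infty}^{\infty} e^{imt} F(t) |\hat\psi(t)|^2\,dt = \langle f, \psi_{(0,m)} \rangle = 0 \tag{11.37}$$
for each $m \in \mathcal{Z}$. Now, we have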
[°° eimtF(t)$(t)\2dt = V [	eimtF(t)\ij>(t)\2 dt
-<x>	l=_(x>
eimtF(t) 52 |^(« + (2^+1)тг)|2Л.
£=-oo
The function g(t) = F(t) Y^-oo
by Cauchy’s inequality,
\$(t + (2^ + 1)tt)|2 belongs to since,
52 |^(t + (2£+ l)7r)|2dt
£=-oo
‘OO
In view of (11.37), all Fourier coefficients of g vanish and, consequently,
Theorem 11.1 on page 638 implies that $g = 0$ ae. It follows that $\hat f = F\hat\psi$
vanishes ae on $\mathcal{R}$ and hence that $f = 0$ ae.
EXERCISES 11.6
11.67	Verify (11.28)-(11.30).
11.68	Refer to Example 11.12 on page 679.
a)	Prove that $V_n = \overline{\operatorname{span}}\{\varphi_{(n,m)} : m \in \mathcal{Z}\}$.
b)	Show that (a) holds for any multiresolution analysis of $\mathcal{L}^2(\mathcal{R})$.
11.69	Prove that the Haar functions form an orthonormal family.
11.70	Verify (11.34).
11.71	Show that
$$\langle \chi_{[0,1)}, h_{(n,m)} \rangle = \begin{cases} 2^{-n/2}, & \text{if } n > 0 \text{ and } m = 0; \\ 0, & \text{otherwise.} \end{cases}$$
11.72	Prove Lemma 11.6 on page 682.
11.73	Calculate the Fourier transforms of the Haar functions.
11.74	For a multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$, let $P_n$ denote the orthogonal
projection of $\mathcal{L}^2(\mathcal{R})$ onto $V_n$. Is $P_n \circ P_{n-1} = P_n$? Hint: See Exercise 9.26
on page 544.
11.75	Show that the Haar functions do not span a dense subspace of $\mathcal{L}^1(\mathcal{R})$.
11.76	Let $\{h_{(n,m)} : n, m \in \mathcal{Z}\}$ be the family of Haar functions. Define
$$f(x) = \begin{cases} 2x, & \text{for } 0 \le x < 1/2; \\ 2 - 2x, & \text{for } 1/2 \le x < 1; \\ 0, & \text{for } x \notin [0,1). \end{cases}$$
Sketch the graph of the partial sum 52n=-i
11.7	ORTHONORMAL WAVELET BASES; THE
WAVELET TRANSFORM
In this section, we continue our presentation of wavelet theory. Working
with a multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$, we will construct a function $\psi$
satisfying the conditions of Proposition 11.6 on page 682 and thereby obtain
an orthonormal wavelet basis for $\mathcal{L}^2(\mathcal{R})$. We will also introduce a
continuous version of the wavelet expansion.
Scaling Functions
Recall that a sequence $\{V_n\}_{n=-\infty}^{\infty}$ of closed subspaces of $\mathcal{L}^2(\mathcal{R})$ is called
a multiresolution analysis of $\mathcal{L}^2(\mathcal{R})$ if it satisfies (M1)-(M6) on page 681.
By (M6) there is a function $\varphi$ such that $\{\varphi_{(0,m)} : m \in \mathcal{Z}\}$ is an orthonormal
basis for $V_0$. We will call $\varphi$ a scaling function of the multiresolution
analysis $\{V_n\}_{n=-\infty}^{\infty}$. Properties of $\varphi$ are developed in the following lemmas.
LEMMA 11.7
Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$. Then
$$\sum_{\ell=-\infty}^{\infty} |\hat\varphi(t + 2\pi\ell)|^2 = \frac{1}{2\pi}$$
for almost all $t \in \mathcal{R}$.
PROOF: We outline the proof, leaving the details to the reader for Exercise 11.77.
Let $g(t) = \sum_{\ell=-\infty}^{\infty} |\hat\varphi(t + 2\pi\ell)|^2$. Then $g$ is an extended
real-valued function with period $2\pi$. That $g$ is finite almost everywhere
follows from $\int_{-\pi}^{\pi} g(t)\,dt = \|\hat\varphi\|^2 = 1$.
We now see that $g$ is integrable over $[-\pi, \pi]$. Using Plancherel’s theorem and
Theorem 11.14 on page 676, it can be shown that the Fourier coefficients $\hat g(k)$
vanish for $k \ne 0$. Thus, $g$ has the same Fourier coefficients as the function
that is constantly equal to $1/2\pi$. Applying Theorem 11.1(c) on page 638
now yields the required result.
LEMMA 11.8
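Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$. Then
the following hold.
a)	For almost all $x \in \mathcal{R}$, we have $\varphi(x) = \sqrt{2} \sum_{n=-\infty}^{\infty} \varphi_n\, \varphi(2x - n)$, where
$\varphi_n = \langle \varphi, \varphi_{(-1,n)} \rangle$. Moreover, $\sum_{n=-\infty}^{\infty} |\varphi_n|^2 = 1$.
b)	For almost all $t \in \mathcal{R}$, we have $\hat\varphi(t) = m_0(t/2)\, \hat\varphi(t/2)$, where
$m_0(t) = \frac{1}{\sqrt{2}} \sum_{n=-\infty}^{\infty} \varphi_n e^{-int}$.
PROOF: Again we only outline the proof, leaving the details for Exercise 11.78.
To prove (a) it suffices to show that $\{\varphi_{(-1,n)} : n \in \mathcal{Z}\}$ is an
orthonormal basis for the space $V_{-1}$. Applying the Fourier transform to
both sides of the equation for $\varphi$ given in part (a) leads to the verification
of (b).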
LEMMA 11.9
Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$ and
let $m_0$ be as in Lemma 11.8(b). Then $|m_0(t)|^2 + |m_0(t + \pi)|^2 = 1$ ae.
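PROOF: By Lemmas 11.7 and 11.8, we have
$$\begin{aligned}
\frac{1}{2\pi} &= \sum_{\ell=-\infty}^{\infty} |\hat\varphi(t + 2\pi\ell)|^2 \\
&= \sum_{\ell=-\infty}^{\infty} |\hat\varphi(t + 4\pi\ell)|^2 + \sum_{\ell=-\infty}^{\infty} |\hat\varphi(t + 2\pi(2\ell+1))|^2 \\
&= \sum_{\ell=-\infty}^{\infty} |\hat\varphi(t/2 + 2\pi\ell)|^2\, |m_0(t/2 + 2\pi\ell)|^2 \\
&\qquad + \sum_{\ell=-\infty}^{\infty} |\hat\varphi(t/2 + \pi + 2\pi\ell)|^2\, |m_0(t/2 + \pi + 2\pi\ell)|^2
\end{aligned}$$
for almost all $t$. Using Lemma 11.7 and the fact that $m_0$ has period $2\pi$, we
obtain that
$$\begin{aligned}
\frac{1}{2\pi} &= |m_0(t/2)|^2 \sum_{\ell=-\infty}^{\infty} |\hat\varphi(t/2 + 2\pi\ell)|^2 + |m_0(t/2 + \pi)|^2 \sum_{\ell=-\infty}^{\infty} |\hat\varphi(t/2 + \pi + 2\pi\ell)|^2 \\
&= \frac{1}{2\pi} |m_0(t/2)|^2 + \frac{1}{2\pi} |m_0(t/2 + \pi)|^2.
\end{aligned}$$
The assertion of the lemma now follows immediately.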
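For the Haar scaling function $\varphi = \chi_{[0,1)}$, the identity of Lemma 11.9 can be checked numerically. The Python sketch below rests on two assumptions that should be read as ours: first, that $m_0(t) = \frac{1}{\sqrt{2}} \sum_n \varphi_n e^{-int}$, which is what taking Fourier transforms in Lemma 11.8(a) produces; second, that the only nonzero Haar coefficients are $\varphi_0 = \varphi_1 = 1/\sqrt{2}$, obtained from $\varphi_n = \langle \varphi, \varphi_{(-1,n)} \rangle$. The names `m0`, `haar`, and `qmf_defect` are hypothetical.

```python
import cmath
import math

def m0(t, coeffs):
    """m_0(t) = (1/sqrt 2) * sum_n phi_n e^{-i n t}; coeffs maps n to phi_n."""
    return sum(c * cmath.exp(-1j * n * t) for n, c in coeffs.items()) / math.sqrt(2)

# Assumed scaling coefficients of the Haar scaling function chi_[0,1).
haar = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}

def qmf_defect(t, coeffs):
    """|m_0(t)|^2 + |m_0(t + pi)|^2 - 1, which Lemma 11.9 says is 0 ae."""
    return abs(m0(t, coeffs)) ** 2 + abs(m0(t + math.pi, coeffs)) ** 2 - 1.0

# Largest defect over a grid of sample points; expected to be at rounding level.
print(max(abs(qmf_defect(k * 0.1, haar)) for k in range(-40, 41)))
```

In this case $m_0(t) = (1 + e^{-it})/2$, so the two terms are $\cos^2(t/2)$ and $\sin^2(t/2)$.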
Our next lemma characterizes the action of the Fourier transform on
the space V-i.
LEMMA 11.10
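Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$ and
let $f \in V_{-1}$. Set $f_n = \langle f, \varphi_{(-1,n)} \rangle$. Then $\hat f(t) = m_f(t/2)\, \hat\varphi(t/2)$, where
$$m_f(t) = \frac{1}{\sqrt{2}} \sum_{n=-\infty}^{\infty} f_n e^{-int}. \tag{11.38}$$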
PROOF: Recall that $W_0 = V_0^{\perp} \cap V_{-1}$. Because $\{\varphi_{(-1,n)} : n \in \mathcal{Z}\}$ is an
orthonormal basis for $V_{-1}$, every $f \in W_0$ has the expansion
$$f = \sum_{n=-\infty}^{\infty} \langle f, \varphi_{(-1,n)} \rangle \varphi_{(-1,n)} = \sum_{n=-\infty}^{\infty} f_n \varphi_{(-1,n)}.$$
The required result now follows from a straightforward application of the
Fourier transform using Theorem 11.14 on page 676.
When the function $f$ belongs to the space $W_0$, more can be said about
the function $m_f$ given in (11.38).
LEMMA 11.11
Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$ and
let $m_0$ and $m_f$ be as in Lemmas 11.8 and 11.10, respectively. If $f \in W_0$,
then
$$\overline{m_0(t)}\, m_f(t) + \overline{m_0(t + \pi)}\, m_f(t + \pi) = 0$$
for almost all $t$.
PROOF: The proof is similar to that of Lemma 11.7. We sketch it here
and leave the details to the reader as Exercise 11.79. Let
$$G(t) = \sum_{\ell=-\infty}^{\infty} \hat f(t + 2\pi\ell)\, \overline{\hat\varphi(t + 2\pi\ell)}.$$
Then $G$ is integrable over $[-\pi, \pi]$ and its Fourier coefficients vanish; so, $G = 0$ ae.
Applying Lemmas 11.8 and 11.10 we have, for almost all $t$, that
$$0 = \sum_{\ell=-\infty}^{\infty} m_f(t/2 + \pi\ell)\, \overline{m_0(t/2 + \pi\ell)}\, |\hat\varphi(t/2 + \pi\ell)|^2.$$
The proof is completed by an argument similar to that used in the proof
of Lemma 11.9.
Next we have a formula that characterizes the Fourier transform of a
function in the space $W_0$.
LEMMA 11.12
Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$ and
let $m_0$ be as in Lemma 11.8. Then $f \in W_0$ if and only if there is a function
$F \in \mathcal{L}^2_{2\pi}$ such that
$$\hat f(t) = e^{it/2}\, \overline{m_0(t/2 + \pi)}\, \hat\varphi(t/2)\, F(t). \tag{11.39}$$
PROOF: Suppose that $f \in W_0$. Let
$$L(t) = \begin{cases} m_f(t)/\overline{m_0(t + \pi)}, & \text{if } m_0(t + \pi) \ne 0; \\ 0, & \text{if } m_0(t + \pi) = 0. \end{cases}$$
It follows from Lemmas 11.9 and 11.11 that
$$L(t) = -L(t + \pi) \tag{11.40}$$
and
$$m_f(t) = \overline{m_0(t + \pi)}\, L(t). \tag{11.41}$$
Now let $F(t) = e^{-it/2} L(t/2)$. Applying Lemma 11.10 and (11.41), we
deduce that
$$\hat f(t) = e^{it/2}\, \overline{m_0(t/2 + \pi)}\, \hat\varphi(t/2)\, F(t).$$
From (11.40), we see that $F$ has period $2\pi$. Also, it follows from the definition
of $F$ that $|F(t)|^2\, |m_0(t/2 + \pi)|^2 = |m_f(t/2)|^2$. Hence, by Lemma 11.9,
$$\begin{aligned}
|F(t)|^2 &= |F(t)|^2\, |m_0(t/2)|^2 + |F(t)|^2\, |m_0(t/2 + \pi)|^2 \\
&= |m_f(t/2 + \pi)|^2 + |m_f(t/2)|^2.
\end{aligned}$$
Consequently, by Theorem 9.6 on page 549, we have
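$$\begin{aligned}
\int_{-\pi}^{\pi} |F(t)|^2\,dt &= \int_{-\pi}^{\pi} \bigl(|m_f(t/2 + \pi)|^2 + |m_f(t/2)|^2\bigr)\,dt \\
&= 2\int_{-\pi}^{\pi} |m_f(t)|^2\,dt = 2\pi \sum_{n=-\infty}^{\infty} |f_n|^2 = 2\pi \|f\|^2 < \infty.
\end{aligned}$$
This shows that $F \in \mathcal{L}^2_{2\pi}$.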
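Conversely, suppose that $f$ satisfies (11.39) for some $F \in \mathcal{L}^2_{2\pi}$. We
then have a Fourier series expansion $F(t) = \sum_{n=-\infty}^{\infty} \hat F(n) e^{int}$, where
$\sum_{n=-\infty}^{\infty} |\hat F(n)|^2 < \infty$. Thus,
$$\hat f(t) = \sum_{n=-\infty}^{\infty} \hat F(n) e^{int}\, e^{it/2}\, \overline{m_0(t/2 + \pi)}\, \hat\varphi(t/2). \tag{11.42}$$
Applying Theorem 11.14 on page 676, we have
$$\begin{aligned}
e^{it/2}\, \overline{m_0(t/2 + \pi)}\, \hat\varphi(t/2) &= \frac{1}{\sqrt{2}} \sum_{k=-\infty}^{\infty} (-1)^{k-1}\, \overline{\varphi_{k-1}}\, e^{ikt/2}\, \hat\varphi(t/2) \\
&= \sum_{k=-\infty}^{\infty} (-1)^{k-1}\, \overline{\varphi_{k-1}}\, \hat\varphi_{(-1,-k)}(t),
\end{aligned}$$
where we recall from Lemma 11.8 that $\varphi_k = \langle \varphi, \varphi_{(-1,k)} \rangle$.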
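The series
$$\psi = \sum_{k=-\infty}^{\infty} (-1)^{k-1}\, \overline{\varphi_{k-1}}\, \varphi_{(-1,-k)}$$
defines a function in the space $V_{-1}$. By the continuity of the Fourier transform,
we have
$$\hat\psi(t) = e^{it/2}\, \overline{m_0(t/2 + \pi)}\, \hat\varphi(t/2). \tag{11.43}$$
Consequently, we can use Plancherel’s theorem and Theorem 11.14 to
rewrite (11.42) as
$$\hat f(t) = \sum_{n=-\infty}^{\infty} \hat F(n) e^{int}\, \hat\psi(t).$$
Thus, $\hat f$ is also the Fourier transform of the function $\sum_{n=-\infty}^{\infty} \hat F(n)\, \psi_{(0,-n)}$.
Applying the uniqueness property of Fourier transforms (Corollary 11.1 on
page 660), we conclude that
$$f = \sum_{n=-\infty}^{\infty} \hat F(n)\, \psi_{(0,-n)}.$$
Because $\psi_{(0,-n)} \in V_{-1}$ for each $n \in \mathcal{Z}$, it follows that $f \in V_{-1}$. To
complete the proof, we must show that $f \in V_0^{\perp}$. This will be accomplished
if we can prove that
$$\langle \psi_{(0,-n)}, \varphi_{(0,-n+m)} \rangle = \langle \psi, \varphi_{(0,m)} \rangle = 0 \tag{11.44}$$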
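for all $m \in \mathcal{Z}$. However, we have by Plancherel’s theorem, Lemmas 11.7
and 11.8, and (11.43) that
$$\begin{aligned}
\langle \psi, \varphi_{(0,m)} \rangle &= \langle \hat\psi, \hat\varphi_{(0,m)} \rangle \\
&= \int_{-\infty}^{\infty} e^{i(m+1/2)t}\, \overline{m_0(t/2 + \pi)\, m_0(t/2)}\, |\hat\varphi(t/2)|^2\,dt \\
&= \sum_{\ell=-\infty}^{\infty} 2\int_0^{2\pi} e^{i(2m+1)t}\, \overline{m_0(t + \pi)\, m_0(t)}\, |\hat\varphi(t + 2\pi\ell)|^2\,dt \\
&= \frac{1}{\pi} \int_0^{2\pi} e^{i(2m+1)t}\, \overline{m_0(t + \pi)\, m_0(t)}\,dt \\
&= \frac{1}{\pi} \int_0^{\pi} e^{i(2m+1)t}\, \overline{m_0(t + \pi)\, m_0(t)}\,dt \\
&\qquad + \frac{1}{\pi} \int_0^{\pi} e^{i(2m+1)(t+\pi)}\, \overline{m_0(t + 2\pi)\, m_0(t + \pi)}\,dt \\
&= 0.
\end{aligned}$$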
This verifies (11.44) and completes the proof of the lemma.
Construction of Orthonormal Wavelet Bases
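In the course of the proof of Lemma 11.12, we constructed the function
$$\psi = \sum_{k=-\infty}^{\infty} (-1)^{k-1}\, \overline{\varphi_{k-1}}\, \varphi_{(-1,-k)}.$$
As the next theorem shows, $\psi$ is a wavelet.
THEOREM 11.15
Let $\{V_n\}_{n=-\infty}^{\infty}$ be a multiresolution analysis of $\mathcal{L}^2(\mathcal{R})$ with scaling function
$\varphi$. Define
$$\psi = \sum_{k=-\infty}^{\infty} (-1)^{k-1}\, \overline{\varphi_{k-1}}\, \varphi_{(-1,-k)}, \tag{11.45}$$
where $\varphi_k = \langle \varphi, \varphi_{(-1,k)} \rangle$. Then $\{\psi_{(n,m)} : n, m \in \mathcal{Z}\}$ is an orthonormal
wavelet basis for $\mathcal{L}^2(\mathcal{R})$.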
PROOF: We will prove the theorem by verifying that $\psi$ satisfies the hypotheses
of Proposition 11.6 on page 682. To begin, we note that $\|\psi\| = 1$
because $\sum_{n=-\infty}^{\infty} |\varphi_n|^2 = 1$. Also, in the course of proving Lemma 11.12,
we actually established that $\psi \in W_0$. It now follows from (11.43) and
Lemma 11.12 that condition (b) of Proposition 11.6 is satisfied.
It remains to show that condition (a) of Proposition 11.6 holds. To
that end, we apply (11.43), Lemma 11.7, and Lemma 11.9 to obtain that
$$\begin{aligned}
\int_{-\infty}^{\infty} e^{int} |\hat\psi(t)|^2\,dt &= \int_{-\infty}^{\infty} e^{int}\, |m_0(t/2 + \pi)|^2\, |\hat\varphi(t/2)|^2\,dt \\
&= \sum_{\ell=-\infty}^{\infty} 2\int_0^{2\pi} e^{i2nt}\, |m_0(t + \pi)|^2\, |\hat\varphi(t + 2\pi\ell)|^2\,dt \\
&= \frac{1}{\pi} \int_0^{2\pi} e^{i2nt}\, |m_0(t + \pi)|^2\,dt \\
&= \frac{1}{\pi} \int_0^{\pi} e^{i2nt}\, |m_0(t + \pi)|^2\,dt + \frac{1}{\pi} \int_0^{\pi} e^{i2nt}\, |m_0(t + 2\pi)|^2\,dt \\
&= \frac{1}{\pi} \int_0^{\pi} e^{i2nt}\,dt = 0,
\end{aligned}$$
for $n \ne 0$.
EXAMPLE 11.13 Illustrates Theorem 11.15
Refer to Example 11.12 on page 679. If we apply Formula (11.45) to the
scaling function $\varphi = \chi_{[0,1)}$, we obtain the wavelet
$$\psi(x) = \varphi(2x + 1) - \varphi(2x + 2).$$
This wavelet is quite similar to the basic Haar wavelet, $h$. In fact, we
have $\psi_{(0,1)} = -h$. It follows that, in this case, the orthonormal wavelet
basis determined by $\psi$ consists of the Haar functions multiplied by the
factor $-1$.	□
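The claims of Example 11.13 lend themselves to an exact check, since all functions involved are constant on dyadic intervals. The Python sketch below is an illustration only; the helper `inner` and the grid are our own choices, and `inner` is exact only for real-valued functions that are constant on cells of width 1/4 inside $[-4, 4)$. It tests $\psi_{(0,1)} = -h$ on a grid and computes $\|\psi\|^2$ and the inner products of $\psi$ with the translates $\varphi_{(0,m)}(x) = \varphi(x - m)$.

```python
from fractions import Fraction

def phi(x):
    """Scaling function chi_[0,1)."""
    return 1 if 0 <= x < 1 else 0

def h(x):
    """Haar wavelet of Example 11.12."""
    if 0 <= x < Fraction(1, 2):
        return 1
    if Fraction(1, 2) <= x < 1:
        return -1
    return 0

def psi(x):
    """Wavelet of Example 11.13: phi(2x + 1) - phi(2x + 2)."""
    return phi(2 * x + 1) - phi(2 * x + 2)

def inner(f, g, lo=-4, hi=4, step=Fraction(1, 4)):
    """Exact integral of f*g over [lo, hi) for real step functions on the given cells."""
    cells = int((hi - lo) / step)
    return sum(f(lo + j * step) * g(lo + j * step) * step for j in range(cells))

grid = [Fraction(k, 16) for k in range(-48, 49)]
print(all(psi(x - 1) == -h(x) for x in grid))   # psi_(0,1) agrees with -h on the grid
print(inner(psi, psi))                           # squared norm of psi
print([inner(psi, lambda x, m=m: phi(x - m)) for m in range(-2, 3)])
```

The vanishing inner products reflect the requirement that $\psi$ lie in $V_0^{\perp}$.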
The Wavelet Transform
Next we introduce a continuous version of the discrete wavelet expansion.
To begin, we recall that for $a, b \in \mathcal{R}$ and $a \ne 0$, the function $\psi_{a,b}$ is the
translation-dilation of the function $\psi$:
$$\psi_{a,b}(x) = \frac{1}{\sqrt{|a|}}\, \psi((x - b)/a).$$
Here now is the definition of the wavelet transform.
DEFINITION 11.7 Wavelet Transform
Let $\psi$ be a fixed function in $\mathcal{L}^2(\mathcal{R}) \setminus \{0\}$. Then, for each $f \in \mathcal{L}^2(\mathcal{R})$,
the function $Wf\colon (0, \infty) \times \mathcal{R} \to \mathcal{C}$ defined by
$$Wf(a, b) = \langle f, \psi_{a,b} \rangle$$
is called the wavelet transform of $f$.
Note: Although the wavelet transform depends on the fixed function $\psi$, we
have retained the terminology found in the literature by writing $W$ instead
of $W_\psi$ and by using the terminology “the wavelet transform” instead of,
say, “the wavelet transform with respect to $\psi$.”
EXAMPLE 11.14 Illustrates Definition 11.7
Let Г > 0 and set ф =	From Plancherel’s theorem, we have
for each f G £2(7?.) that Wf(a,b) =	Referring to Example 11.5
on page 655, we conclude that
H7(a,&) =
f(t) sinc(afT)elbt dt.
oo
Replacing f by f and again applying Plancherel’s theorem, we obtain that
Ж/(а, b) =	I f(t) sinc(atT)eiM dt
’	=	J	f(x) sinc(axT)e~lbx dx.
If f is also in £x(7£), then we can use the dominated convergence theorem
to conclude that
If00
lim W= —7= / f(x)e ltxdx = f(t).
у2тг J-oo
Thus, we obtain the Fourier transform of f as a limiting case of a wavelet
transform.	□
If $\psi$ is a wavelet, that is, if $\{\psi_{(n,m)} : n, m \in \mathcal{Z}\}$ happens to be an
orthonormal wavelet basis for $\mathcal{L}^2(\mathcal{R})$, then $f$ can be recovered from its
wavelet transform via
$$f = \sum_{n,m} \langle f, \psi_{(n,m)} \rangle\, \psi_{(n,m)} = \sum_{n,m} Wf(2^n, 2^n m)\, \psi_{2^n,\,2^n m}.$$
This suggests, in general, the heuristic formula
$$f = \int_{(0,\infty)\times\mathcal{R}} Wf(a,b)\, \psi_{a,b}\, d\mu(a,b) \tag{11.46}$$
for recovering a function from its wavelet transform. In what follows, we
will show how sense can be made of (11.46) by choosing the measure $\mu$
appropriately and imposing mild restrictions on $\psi$.
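We begin by deriving the measure $\mu$. By Plancherel’s theorem and
Theorem 11.14 on page 676, we have
$$Wf(a,b) = \sqrt{a} \int_{-\infty}^{\infty} \hat f(t)\, \overline{\hat\psi(at)}\, e^{ibt}\,dt = \hat F(-b),$$
where $F(t) = \sqrt{2\pi a}\, \hat f(t)\, \overline{\hat\psi(at)}$. Again applying Plancherel’s theorem, we
obtain, for each $a > 0$, that
$$\int_{-\infty}^{\infty} |Wf(a,b)|^2\,db = \int_{-\infty}^{\infty} |\hat F(-b)|^2\,db = \int_{-\infty}^{\infty} |F(t)|^2\,dt = 2\pi a \int_{-\infty}^{\infty} |\hat f(t)|^2\, |\hat\psi(at)|^2\,dt.$$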
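Multiplying by $a^{-2}$ and integrating over $(0, \infty)$ yields
$$\begin{aligned}
\int_0^{\infty}\!\int_{-\infty}^{\infty} |Wf(a,b)|^2\, \frac{1}{a^2}\,db\,da
&= \int_{-\infty}^{\infty} 2\pi |\hat f(t)|^2 \int_0^{\infty} \frac{1}{a}\, |\hat\psi(at)|^2\,da\,dt \\
&= \int_0^{\infty} 2\pi |\hat f(t)|^2 \int_0^{\infty} \frac{1}{s}\, |\hat\psi(s)|^2\,ds\,dt
 - \int_{-\infty}^{0} 2\pi |\hat f(t)|^2 \int_{-\infty}^{0} \frac{1}{s}\, |\hat\psi(s)|^2\,ds\,dt.
\end{aligned} \tag{11.47}$$
We are now ready to impose a restriction on $\psi$, namely, that
$$-\int_{-\infty}^{0} \frac{1}{s}\, |\hat\psi(s)|^2\,ds = \int_0^{\infty} \frac{1}{s}\, |\hat\psi(s)|^2\,ds = C_\psi < \infty.$$
With this restriction on $\psi$, we can prove a theorem for the wavelet transform
that is analogous to Plancherel’s theorem.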
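THEOREM 11.16
Suppose that $\psi \in \mathcal{L}^2(\mathcal{R}) \setminus \{0\}$ and
$$-\int_{-\infty}^{0} \frac{1}{s}\, |\hat\psi(s)|^2\,ds = \int_0^{\infty} \frac{1}{s}\, |\hat\psi(s)|^2\,ds = C_\psi < \infty.$$
Define the Borel measure $\mu_0$ on $(0, \infty)$ via
$$\mu_0(B) = \frac{1}{2\pi C_\psi} \int_B a^{-2}\,da, \qquad B \in \mathcal{B}, \tag{11.48}$$
and let $\mu = \mu_0 \times \lambda$. Then the wavelet transform is a linear operator
from $\mathcal{L}^2(\mathcal{R})$ to $\mathcal{L}^2(\mu)$ that satisfies
$$\|Wf\|_{2,\mu} = \|f\|, \qquad f \in \mathcal{L}^2(\mathcal{R}), \tag{11.49}$$
where $\|\ \|_{2,\mu}$ denotes the $\mathcal{L}^2$-norm on $\mathcal{L}^2(\mu)$.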
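PROOF: It follows from (11.47), (11.48), and Plancherel’s theorem that
$$\begin{aligned}
\int_{(0,\infty)\times\mathcal{R}} |Wf(a,b)|^2\,d\mu(a,b) &= \frac{1}{2\pi C_\psi} \int_0^{\infty}\!\int_{-\infty}^{\infty} |Wf(a,b)|^2\, \frac{1}{a^2}\,db\,da \\
&= \int_{-\infty}^{\infty} |\hat f(t)|^2\,dt = \int_{-\infty}^{\infty} |f(x)|^2\,dx.
\end{aligned}$$
Thus, (11.49) is valid.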
Theorem 11.16 provides a likely candidate for the measure $\mu$ appearing
in the heuristic formula (11.46). Still, the problem of correctly interpreting
(11.46) remains. One possible approach, explored in Exercise 11.83, is
to show that under appropriate conditions
$$f(x) = \int_{(0,\infty)\times\mathcal{R}} Wf(a,b)\, \psi_{a,b}(x)\,d\mu(a,b)$$
for almost all $x$. A more subtle, but easier to prove, interpretation is
based on the following theorem whose verification is left to the reader as
Exercise 11.86.
THEOREM 11.17
Suppose that $\psi \in \mathcal{L}^2(\mathcal{R}) \setminus \{0\}$ and
$$-\int_{-\infty}^{0} \frac{1}{s}\, |\hat\psi(s)|^2\,ds = \int_0^{\infty} \frac{1}{s}\, |\hat\psi(s)|^2\,ds = C_\psi < \infty.$$
Then (11.46) is valid in the sense that
$$\langle f, g \rangle = \int_{(0,\infty)\times\mathcal{R}} Wf(a,b)\, \langle \psi_{a,b}, g \rangle\,d\mu(a,b), \qquad f, g \in \mathcal{L}^2(\mathcal{R}),$$
where $\mu$ is defined as in Theorem 11.16.
The theory of wavelets is an important and active research area. As a
starting point for the interested reader, we recommend the paper “Wavelet
transforms and orthonormal wavelet bases” by I. Daubechies (Proceedings
of Symposia in Applied Mathematics, Vol. 47, American Math. Soc., Prov-
idence, RI, 1993).
EXERCISES 11.7
11.77	Provide the details of the proof of Lemma 11.7 on page 685.
11.78	Provide the details of the proof of Lemma 11.8 on page 685.
11.79	Provide the details of the proof of Lemma 11.11 on page 687.
11.80	Show that (11.48) on page 693 is satisfied if $\psi$ is real-valued and $\hat\psi$ vanishes
in some open interval containing 0.
11.81	Show that (11.48) on page 693 is satisfied if $\psi$ is real-valued, $\hat\psi$ is continuous
in some open interval containing 0, $\hat\psi(0) = 0$, and $\hat\psi'(0)$ exists.
11.82	Show that (11.48) on page 693 is satisfied by the Haar wavelet, $h$, discussed
in Example 11.12 on page 679. Find $C_h$ in this case.
11.83	Suppose that $\psi$ satisfies (11.48), $\mu$ is defined as in Theorem 11.16, and
$f, \hat f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$. Prove that
$$f(x) = \int_{(0,\infty)\times\mathcal{R}} Wf(a,b)\, \psi_{a,b}(x)\,d\mu(a,b)$$
for almost all $x \in \mathcal{R}$.
11.84	Consider the Hermite function $h_1$ discussed in Exercises 11.59-11.66.
a)	Find $C_{h_1}$.
b)	Determine $Wh_0$.
11.85	Find a formula for $Wf_{c,d}$ in terms of $Wf$.
11.86	Prove Theorem 11.17.
Claude Elwood Shannon
(1916- )
Claude Elwood Shannon was born in Gaylord,
Michigan, on April 30, 1916. In 1936, he ob-
tained a bachelor's degree at the University of
Michigan; in 1940, he was awarded both a mas-
ter’s degree and a doctorate in mathematics at
the Massachusetts Institute of Technology.
After working as a National Research Fellow
at Princeton University for a year, he joined
the staff at Bell Telephone Laboratories in 1941. Shannon's charge at
Bell Labs was to determine the most efficient method of transmitting
information; his success in presenting the transmission of information as
precise mathematical theory has led to his being regarded as one of the
founders of information theory. Shannon related the relaying of information
to a binary system of yes/no choices, represented by a 1/0 binary
code, a representation still integral to computer design today.
Shannon published the book, The Mathematical Theory of Communication,
in 1949. In 1956, he accepted the position of Visiting Professor
of Electronic Communication at the Massachusetts Institute of Technology;
in 1957, Professor of Communications Science and Mathematics;
and in 1958, Donner Professor of Science.
In addition to communications engineering, Shannon's methods have
profoundly influenced several other sciences including statistics, engineer-
ing, biology, and physics. Dr. Shannon is now retired and resides in Cam-
bridge, Massachusetts.
12
Measurable Dynamical
Systems
In this chapter we will discuss the theory of measurable dynamical systems.
Section 12.1 introduces the theory by providing a motivating heuristic il-
lustration, stating the definition of a measurable dynamical system, and
presenting several examples. In Section 12.2 we discuss ergodicity and
prove the pointwise ergodic theorem. Section 12.3 examines isomorphisms
of measurable dynamical systems and introduces entropy. Then, in Sec-
tion 12.4, we investigate the entropy of a Bernoulli shift.
12.1 INTRODUCTION AND EXAMPLES
To introduce this chapter, we construct a simple heuristic model illustrating
the idea of a measurable dynamical system. Imagine a particle $p$ confined
in some compact region $\Omega \subset \mathcal{R}^3$.
Suppose that $p$ moves around inside $\Omega$ according to the following rule:
If $p$ is at $x$ at time $n$, it moves to $\varphi(x)$ at time $n + 1$, where $\varphi\colon \Omega \to \Omega$ is
a function that is independent of $n$. Although, according to this rule, the
particle is always moving in $\Omega$, the law governing its movement remains
constant for all time.
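For $A \subset \Omega$, let
$$\mu_n(A) = \begin{cases} 1, & \text{if } p \in A \text{ at time } n; \\ 0, & \text{otherwise.} \end{cases}$$
Then the expression
$$\mu(A) = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu_n(A)$$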
represents an average over time of the number of visits of the particle to
the set A, that is, the number of visits to A per unit time by the particle.
Let $\mathcal{A}$ denote the collection of subsets of $\Omega$ such that the previous limit
exists. Clearly, $\emptyset$ and $\Omega$ belong to $\mathcal{A}$, and it is easy to see that $\mu$ satisfies
the following conditions:
•	$\mu(A) \ge 0$ for all $A \in \mathcal{A}$.
•	$\mu(\emptyset) = 0$.
•	$\mu(\Omega) = 1$.
•	If $A, B \in \mathcal{A}$ are disjoint, then $\mu(A \cup B) = \mu(A) + \mu(B)$.
Consequently, we see that the triple $(\Omega, \mathcal{A}, \mu)$ resembles a probability space.
Suppose that, indeed,
$$(\Omega, \mathcal{A}, \mu) \text{ is a probability space.} \tag{12.1}$$
Because the particle is in $A$ at time $n$ if and only if it is in $\varphi^{-1}(A)$ at
time $n - 1$, we have $\mu_{n-1}(\varphi^{-1}(A)) = \mu_n(A)$. It follows that
$$A \in \mathcal{A} \;\Rightarrow\; \varphi^{-1}(A) \in \mathcal{A} \tag{12.2}$$
and
$$A \in \mathcal{A} \;\Rightarrow\; \mu(A) = \mu(\varphi^{-1}(A)). \tag{12.3}$$
Thus, a quadruple $(\Omega, \mathcal{A}, \mu, \varphi)$ satisfying (12.1)-(12.3) models the average
behavior of the simple particle motion described in the preceding. Formally,
we have the following definition.
DEFINITION 12.1 Invariant Measure, Measurable Dynamical System
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $\varphi\colon \Omega \to \Omega$ and that
$\varphi^{-1}(A) \in \mathcal{A}$ for all $A \in \mathcal{A}$. Then $\mu$ is said to be invariant with
respect to $\varphi$ if
$$\mu(A) = \mu(\varphi^{-1}(A)), \qquad A \in \mathcal{A}.$$
If $\mu$ is invariant with respect to $\varphi$ and is also a probability measure,
then the quadruple $(\Omega, \mathcal{A}, \mu, \varphi)$ is called a measurable dynamical
system.
In the remainder of this section, we will present a variety of examples
of measurable dynamical systems showing their relevance and importance.
EXAMPLE 12.1 Addition Modulo One
Let the operation $\dotplus$ be defined on [0,1) by
$$x \dotplus y = (x + y) \bmod 1 = \begin{cases} x + y, & \text{if } x + y < 1; \\ x + y - 1, & \text{if } x + y \ge 1. \end{cases}$$
For fixed $b \in [0,1)$, let $\varphi_b(x) = x \dotplus b$. Then $([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)}, \varphi_b)$ is a
measurable dynamical system.	□
EXAMPLE 12.2 Rotation Through an Angle
Let E be the map from [0,1) onto the unit circle T in the complex plane
defined by $E(x) = e^{2\pi i x}$ and let $\mathcal{A} = \{ A \subset T : E^{-1}(A) \in \mathcal{M}_{[0,1)} \}$. Define the
measure $\mu$ on $\mathcal{A}$ by $\mu(A) = \lambda(E^{-1}(A))$, so that $\mu$ is normalized arc-length
measure on T. Also, for fixed $b \in [0,1)$, define $\psi_b\colon T \to T$ by $\psi_b(z) = e^{2\pi i b} z$.
Then $(T, \mathcal{A}, \mu, \psi_b)$ is a measurable dynamical system. In a sense that will
be made precise later, this example is the same as the previous one. □
EXAMPLE 12.3 Multiplication by 2 Mod One
Let the mapping $\varphi_b$ in Example 12.1 be replaced by
$$\varphi(x) = 2x \bmod 1 = x \dotplus x.$$
As the reader is asked to verify in Exercise 12.1, $\varphi$ is measurable with
respect to $\mathcal{M}_{[0,1)}$ and $\lambda_{[0,1)}$ is invariant with respect to $\varphi$. Consequently,
$([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)}, \varphi)$ is a measurable dynamical system. It is interesting
to note that if $x \in [0,1)$ has the binary expansion $x = 0.x_1x_2x_3\ldots_{(2)}$, then
we have $\varphi(x) = 0.x_2x_3\ldots_{(2)}$.	□
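The remark about binary expansions can be checked on a concrete rational number. The following Python sketch is only an illustration: the function names `doubling_map` and `binary_digits` and the sample point 5/13 are our own choices, not part of the text. Exact rational arithmetic is used so that no rounding enters the comparison.

```python
from fractions import Fraction

def doubling_map(x):
    """phi(x) = 2x mod 1 on [0, 1), computed exactly on rationals."""
    y = 2 * x
    return y - 1 if y >= 1 else y

def binary_digits(x, count):
    """First `count` binary digits x_1, x_2, ... of x in [0, 1)."""
    digits = []
    for _ in range(count):
        x *= 2
        bit = 1 if x >= 1 else 0
        digits.append(bit)
        x -= bit
    return digits

# Hypothetical sample point, chosen only for illustration.
x = Fraction(5, 13)
digits_x = binary_digits(x, 12)
digits_phi_x = binary_digits(doubling_map(x), 11)
```

Comparing `digits_phi_x` with `digits_x[1:]` shows that applying the map deletes the first binary digit and shifts the remaining digits one place to the left.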
EXAMPLE 12.4 Bernoulli Schemes
Let $S = \{1, 2, \ldots, N\}$, where $N \ge 2$, and let $p = (p_1, p_2, \ldots, p_N)$, where
$p_j > 0$ for each $j \in S$ and $\sum_{j=1}^{N} p_j = 1$. The vector p defines a probability
measure $\mu_0$ on S via $\mu_0(\{j\}) = p_j$.
Recall that the Cartesian product $\Omega = S^{\mathcal{Z}}$ consists of all functions
on the integers having values in S, or, alternatively, all doubly infinite
sequences of elements of S. From $\mu_0$ we will construct a probability measure
$\mu$ on $\Omega$ by extending the development of product measure given in
Theorem 4.20 on page 254.
To begin, let F be a finite set of integers and a a function from F
into S. Then we define
$$C_{F,a} = \{ f \in \Omega : f(j) = a(j) \text{ for } j \in F \}.$$
Denote by $\mathcal{C}$ the collection of subsets of $\Omega$ consisting of $\emptyset$, $\Omega$, and all sets
of the form $C_{F,a}$.
Next we define a set function $\iota$ on $\mathcal{C}$ by letting $\iota(\emptyset) = 0$, $\iota(\Omega) = 1$, and
$$\iota(C_{F,a}) = \prod_{j \in F} \mu_0(\{a(j)\}).$$
Exercises 12.2 and 12.3 ask the reader to show that $\mathcal{C}$ is a semialgebra of
subsets of $\Omega$ and that $\iota$ satisfies conditions (E1)-(E3) on page 208 and
condition (E4) of Theorem 4.12 on page 216. Consequently, by that theorem,
$\iota$ extends uniquely to a probability measure $\mu$ on the $\sigma$-algebra $\mathcal{A}$ generated
by $\mathcal{C}$. We now have a probability space $(\Omega, \mathcal{A}, \mu)$.
Next we define the function $\varphi\colon \Omega \to \Omega$ by $\varphi(f)(j) = f(j + 1)$. If we
consider the elements of $\Omega$ as doubly infinite sequences, then the effect of $\varphi$
is to move each term of a sequence f one place to the left. For this reason,
the mapping $\varphi$ is often called a Bernoulli shift.
It is easy to see that
$$\varphi^{-1}(C_{F,a}) = C_{F^*,a^*}, \tag{12.4}$$
where $F^* = \{ j + 1 : j \in F \}$ and $a^*(j + 1) = a(j)$. It follows that the
$\sigma$-algebra $\{ A \subset \Omega : \varphi^{-1}(A) \in \mathcal{A} \}$ contains $\mathcal{C}$ and, hence, $\varphi^{-1}(A) \in \mathcal{A}$ for
each $A \in \mathcal{A}$.
We claim that the measure $\mu$ is invariant with respect to $\varphi$. Let the
measure $\nu$ be defined on $\mathcal{A}$ by $\nu(A) = \mu(\varphi^{-1}(A))$. By (12.4) we have
$$\nu(C_{F,a}) = \mu(C_{F^*,a^*}) = \prod_{j \in F^*} \mu_0(\{a^*(j)\}) = \prod_{j \in F} \mu_0(\{a(j)\}) = \mu(C_{F,a}).$$
Thus $\nu$ and $\mu$ agree on $\mathcal{C}$ and so, by Theorem 4.12, $\nu = \mu$. This means that
$\mu$ is invariant with respect to $\varphi$.
We have shown that $(\Omega, \mathcal{A}, \mu, \varphi)$ is a measurable dynamical system.
This system is known in the literature as a Bernoulli scheme and is
often denoted by $B(p_1, p_2, \ldots, p_N)$.	□
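As a small computational illustration of (12.4) and the invariance computation, the sketch below represents a cylinder set $C_{F,a}$ by a Python dictionary mapping each position in F to its prescribed symbol. The names `cylinder_measure` and `shift_preimage`, and the particular probability vector, are hypothetical choices made for this example only.

```python
from math import prod, isclose

def cylinder_measure(p, a):
    """mu(C_{F,a}) = product over j in F of p[a(j)].

    `p` maps each symbol of S to its probability; `a` maps each position
    j in F to the prescribed symbol a(j).
    """
    return prod(p[symbol] for symbol in a.values())

def shift_preimage(a):
    """Describe phi^{-1}(C_{F,a}) = C_{F*,a*}: F* = {j+1}, a*(j+1) = a(j)."""
    return {j + 1: symbol for j, symbol in a.items()}

# Illustrative data: S = {1, 2, 3} with p = (0.2, 0.3, 0.5).
p = {1: 0.2, 2: 0.3, 3: 0.5}
a = {-1: 3, 0: 1, 4: 2}          # F = {-1, 0, 4}

measure = cylinder_measure(p, a)
measure_of_preimage = cylinder_measure(p, shift_preimage(a))
```

The two values agree because the preimage only relabels the positions and leaves the prescribed symbols, and hence the product, unchanged.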
EXAMPLE 12.5 Continued Fraction Expansions
Decimal and binary expansions are familiar ways of representing real num-
bers. Less familiar but, nevertheless, useful and interesting is the expansion
of a real number as a continued fraction. This expansion is based on iteration
of the function defined on [0,1) by
$$\varphi(x) = \begin{cases} 1/x - \lfloor 1/x \rfloor, & \text{for } x \ne 0; \\ 0, & \text{for } x = 0, \end{cases}$$
where $\lfloor\ \rfloor$ denotes the greatest integer function.
We can express x in terms of $\varphi(x)$ by
$$x = \frac{1}{a(x) + \varphi(x)}, \tag{12.5}$$
where
$$a(x) = \begin{cases} \lfloor 1/x \rfloor, & \text{if } x \ne 0; \\ \infty, & \text{if } x = 0. \end{cases}$$
Replacing x by $\varphi(x)$ and substituting into the right-hand side of (12.5), we
obtain
$$x = \cfrac{1}{a(x) + \cfrac{1}{a(\varphi(x)) + \varphi(\varphi(x))}}.$$
Repeating this procedure, we obtain
$$x = \cfrac{1}{a(x) + \cfrac{1}{a(\varphi(x)) + \cfrac{1}{a(\varphi(\varphi(x))) + \cdots + \cfrac{1}{a(\varphi^{(n-1)}(x)) + \varphi^{(n)}(x)}}}},$$
where $\varphi^{(n)}$ indicates the nth iterate of $\varphi$. As the reader is asked to show
in Exercise 12.4, the sequence of quotients
$$x_n = \cfrac{1}{a(x) + \cfrac{1}{a(\varphi(x)) + \cfrac{1}{a(\varphi(\varphi(x))) + \cdots + \cfrac{1}{a(\varphi^{(n)}(x))}}}}$$
converges to x. Thus, we have the continued fraction expansion
$$x = \cfrac{1}{a(x) + \cfrac{1}{a(\varphi(x)) + \cfrac{1}{a(\varphi(\varphi(x))) + \cdots}}}. \tag{12.6}$$
Clearly, the mapping $\varphi$ is the key element for obtaining (12.6). We
will relate a measurable dynamical system to the continued fraction by
finding a probability measure $\mu$ on Borel subsets of [0,1) that is invariant
with respect to $\varphi$. To obtain $\mu$, we begin by deriving a necessary condition
for a Borel measure on [0,1) to be both invariant with respect to $\varphi$ and
absolutely continuous with respect to $\lambda_{[0,1)}$.
Suppose then that $\mu$ is a Borel measure on [0,1) that is invariant
with respect to $\varphi$ and absolutely continuous with respect to $\lambda_{[0,1)}$. Set
$g = d\mu/d\lambda$. Then, for each $t \in (0,1)$, we have
$$\int_{(0,t)} g(x)\,d\lambda(x) = \int_{\varphi^{-1}((0,t))} g(x)\,d\lambda(x).$$
Using
$$\varphi^{-1}((0,t)) = \bigcup_{k=1}^{\infty} \{ x : \lfloor 1/x \rfloor = k \text{ and } 0 < \varphi(x) < t \} = \bigcup_{k=1}^{\infty} \left( \frac{1}{k+t}, \frac{1}{k} \right),$$
we obtain
$$\int_{(0,t)} g(x)\,d\lambda(x) = \sum_{k=1}^{\infty} \int_{(1/(k+t),\,1/k)} g(x)\,d\lambda(x). \tag{12.7}$$
Ignoring questions of convergence, we differentiate both sides of (12.7) to
get the equation
$$g(t) = \sum_{k=1}^{\infty} g\!\left(\frac{1}{t+k}\right) \frac{1}{(t+k)^2}. \tag{12.8}$$
Equation (12.8) looks formidable. To find a solution, it is helpful to recast
it as a functional equation:
$$g(t) = g\!\left(\frac{1}{t+1}\right) \frac{1}{(t+1)^2} + \sum_{k=1}^{\infty} g\!\left(\frac{1}{t+k+1}\right) \frac{1}{(t+k+1)^2} = g\!\left(\frac{1}{t+1}\right) \frac{1}{(t+1)^2} + g(t+1). \tag{12.9}$$
The form of (12.9) suggests that we try to find solutions of the type
$g(t) = (t+1)^a$. Substituting for g(t) in (12.9) gives
$$(t+1)^a = \left(\frac{1}{t+1} + 1\right)^{a} \frac{1}{(t+1)^2} + (t+2)^a. \tag{12.10}$$
It is not hard to see that (12.10) is satisfied for all $t \in [0,1)$ if a = −1.
The preceding informal argument suggests that measures on [0,1) of
the form
$$\mu(B) = c \int_B \frac{1}{x+1}\,d\lambda(x)$$
are invariant with respect to the transformation $\varphi$. It is left for Exercise 12.5
to verify this suggestion formally. The choice $c = (\log 2)^{-1}$ yields
an invariant probability measure on [0,1).	□
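The objects in this example are easy to experiment with. The Python sketch below is an illustration under our own naming (`gauss_map`, `cf_digits`, `cf_value`, `gauss_measure`, and `gauss_measure_of_preimage` are not from the text). It computes the digits a(x), a(φ(x)), … of a rational x exactly, rebuilds x from them, and compares μ((0,t)) with a truncated sum for μ(φ⁻¹((0,t))) using the density 1/((x + 1) log 2). The truncation level `terms` is an arbitrary choice, so the comparison is only approximate.

```python
from fractions import Fraction
from math import floor, log

def gauss_map(x):
    """phi(x) = 1/x - floor(1/x) for x != 0, and phi(0) = 0."""
    if x == 0:
        return x
    return 1 / x - floor(1 / x)

def cf_digits(x, count):
    """Digits a(x), a(phi(x)), ...; stops early if an iterate reaches 0."""
    digits = []
    for _ in range(count):
        if x == 0:
            break
        digits.append(floor(1 / x))
        x = gauss_map(x)
    return digits

def cf_value(digits):
    """Evaluate 1/(a_0 + 1/(a_1 + ... + 1/a_n)) exactly."""
    value = Fraction(0)
    for a in reversed(digits):
        value = 1 / (a + value)
    return value

def gauss_measure(lo, hi):
    """mu((lo, hi)) for the density 1/((x + 1) log 2)."""
    return log((1 + hi) / (1 + lo)) / log(2)

def gauss_measure_of_preimage(t, terms=100_000):
    """Truncated sum of mu((1/(k+t), 1/k)) over k = 1, ..., terms."""
    return sum(gauss_measure(1 / (k + t), 1 / k) for k in range(1, terms + 1))

digits = cf_digits(Fraction(7, 30), 10)      # a rational has a finite expansion
rebuilt = cf_value(digits)
gap = abs(gauss_measure_of_preimage(0.3) - gauss_measure(0.0, 0.3))
```

For a rational starting point the iteration reaches 0 after finitely many steps, which is why `cf_digits` stops early. The quantity `gap` is small but not zero, because the infinite union in the text is replaced by a finite one.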
EXAMPLE 12.6 Hamiltonian Systems
Consider the system of differential equations
$$\frac{dq_j}{dt} = \frac{\partial H}{\partial p_j}, \qquad \frac{dp_j}{dt} = -\frac{\partial H}{\partial q_j}, \qquad j = 1, 2, \ldots, 3n, \tag{12.11}$$
where H is a function on $\mathcal{R}^{6n}$ of the form
$$H(p,q) = H(p_1, p_2, \ldots, p_{3n}, q_1, q_2, \ldots, q_{3n}) = \frac{1}{2} \sum_{j=1}^{3n} \frac{p_j^2}{m_j} + V(q_1, q_2, \ldots, q_{3n}).$$
Such systems of differential equations are important in mechanics where,
for $1 \le j \le n$, the vectors $(q_{3j-2}, q_{3j-1}, q_{3j})$ and $(p_{3j-2}, p_{3j-1}, p_{3j})$ represent,
respectively, the position and momentum of the jth of n particles
moving in $\mathcal{R}^3$. The term $\frac{1}{2} \sum_{j=1}^{3n} p_j^2/m_j$ gives the total kinetic energy of the
n particles and $V(q_1, q_2, \ldots, q_{3n})$ is the energy associated with interactions
of the n particles.
Assuming that V is reasonably well behaved, it follows from the general
theory of differential equations that, for each
$$(x,y) = (x_1, x_2, \ldots, x_{3n}, y_1, y_2, \ldots, y_{3n}) \in \mathcal{R}^{6n},$$
there is a unique solution
$$\sigma(t,x,y) = (p_1(t,x,y), \ldots, p_{3n}(t,x,y), q_1(t,x,y), \ldots, q_{3n}(t,x,y))$$
to the system that is defined for all t and satisfies $\sigma(0,x,y) = (x,y)$.
Also, under appropriate hypotheses on the function V, it can be shown
that, for j = 1, 2, ..., 3n, all second-order partial derivatives of the functions
$p_j(t,x,y)$ and $q_j(t,x,y)$ with respect to each of the variables $t, x_1,
\ldots, y_{3n}$ exist and are continuous.
For fixed t, the function $\varphi_t(x,y) = \sigma(t,x,y)$ maps $\mathcal{R}^{6n}$ into itself. We
will show that
$$\lambda_{6n}(E) = \lambda_{6n}(\varphi_t^{-1}(E)), \qquad E \in \mathcal{M}_{6n}. \tag{12.12}$$
This result is known as Liouville's theorem.
To obtain (12.12), we first apply the change of variable formula from
advanced calculus† to conclude that
$$\int_B |\det J_{\varphi_t}|\,d\lambda_{6n} = \int_{\varphi_t(B)} d\lambda_{6n}$$
whenever B is a Cartesian product of bounded intervals. And then we use
the fact that
$$\det J_{\varphi_t} = 1. \tag{12.13}$$
(See Exercise 12.6.)
Next we will combine (12.12) with an invariance property of H to
produce a measurable dynamical system. The property of H that we need is
$$H \circ \varphi_t = H. \tag{12.14}$$
† See, for instance, Protter and Morrey's A First Course in Real Analysis, 2nd edition
(New York: Springer-Verlag, 1991), p. 366.
To obtain (12.14), we use the chain rule and (12.11). We have
$$\frac{d}{dt} H(\varphi_t(x,y)) = \sum_{j=1}^{3n} \left( \frac{\partial H}{\partial p_j} \frac{dp_j}{dt} + \frac{\partial H}{\partial q_j} \frac{dq_j}{dt} \right) = \sum_{j=1}^{3n} \left( -\frac{\partial H}{\partial p_j} \frac{\partial H}{\partial q_j} + \frac{\partial H}{\partial q_j} \frac{\partial H}{\partial p_j} \right) = 0.$$
It follows that
$$H(\varphi_t(x,y)) = H(\varphi_0(x,y)) = H(x,y)$$
and, hence, (12.14) holds.
For each $c \in \mathcal{R}$, let $\Omega_c = H^{-1}((-\infty, c))$. Then, in view of (12.14),
we have $\varphi_t(\Omega_c) \subset \Omega_c$. Assuming, as in many applications, that $\Omega_c$ is a
bounded set with positive Lebesgue measure, we can define a probability
measure on Lebesgue measurable subsets of $\Omega_c$ by
$$\mu(E) = \frac{\lambda_{6n}(E)}{\lambda_{6n}(\Omega_c)}.$$
It follows from (12.12) that $\mu$ is invariant with respect to $\varphi_t$ and, consequently,
$(\Omega_c, \mathcal{M}_{\Omega_c}, \mu, \varphi_t)$ is a measurable dynamical system.	□
EXERCISES 12.1
12.1	Prove that the mapping $\varphi$ in Example 12.3 on page 699 is measurable with
respect to $\mathcal{M}_{[0,1)}$ and that $\lambda_{[0,1)}$ is invariant with respect to $\varphi$.
12.2	Show that the collection $\mathcal{C}$ defined in Example 12.4 on page 699 is a
semialgebra.
12.3	Show that the set function $\iota$ defined on the collection $\mathcal{C}$ of Example 12.4
satisfies conditions (E1)-(E3) on page 208 and condition (E4) on page 216.
12.4	Prove that the sequence $\{x_n\}_{n=1}^{\infty}$ defined in Example 12.5 on page 702
converges to x. Hint: If $x \in \mathcal{Q}$, show that $x_n = x$ for sufficiently large n. If
$x \notin \mathcal{Q}$, show that $x_n = p_n/q_n$, where $\{p_n\}_{n=-1}^{\infty}$ and $\{q_n\}_{n=-1}^{\infty}$ are sequences of
integers defined recursively by: $p_{-1} = 0$, $p_0 = 1$, and $p_n = a_n p_{n-1} + p_{n-2}$
for $n \ge 1$; $q_{-1} = 1$, $q_0 = a(x)$, and $q_n = a_n q_{n-1} + q_{n-2}$ for $n \ge 1$. Here
$a_n = a(\varphi^{(n)}(x))$.
12.5	Prove that the measure
$$\mu(B) = \int_B \frac{1}{x+1}\,d\lambda(x), \qquad B \in \mathcal{B}_{[0,1)},$$
is invariant with respect to the mapping $\varphi$ defined in Example 12.5 on
page 701.
12.6	Verify (12.13) on page 704. Hint: Show that $d(\det J_{\varphi_t})/dt = 0$.
★12.7 Suppose that $\Omega$ is a compact Hausdorff space and let $\varphi\colon \Omega \to \Omega$ be continuous.
Show that there is a regular Borel probability measure $\mu$ on $\Omega$ such
that $\mu(\varphi^{-1}(B)) = \mu(B)$ for all Borel subsets B of $\Omega$. Hint: Fix $\omega \in \Omega$ and
apply the Hahn-Banach theorem (page 580) using the subadditive function
$\sigma\colon C(\Omega, \mathcal{R}) \to \mathcal{R}$ defined by
$$\sigma(f) = \limsup_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\varphi^{(k)}(\omega)),$$
where $\varphi^{(k)}$ is the kth iterate of $\varphi$.
★12.8 Let $\Omega$ be a compact Hausdorff space and $\varphi\colon \Omega \to \Omega$ be continuous. Show
that the collection $\mathcal{I}(\varphi)$ of regular Borel probability measures on $\Omega$ that
are invariant with respect to $\varphi$ is weak* compact and convex.
12.9	Let $\varphi(x) = x^2$. Show that the only regular Borel probability measures
on [0,1] that are invariant with respect to the function $\varphi$ are those of the
form $c\delta_0 + (1 - c)\delta_1$, where $0 \le c \le 1$.
12.10	Let $\varphi\colon [0,1] \to [0,1]$ be absolutely continuous, strictly increasing, and onto.
Set $\psi = \varphi^{-1}$. Show that if $\mu$ is absolutely continuous with respect to $\lambda_{[0,1]}$
and invariant with respect to $\varphi$, then
for almost all $x \in [0,1]$.
12.11	Suppose that $\varphi\colon [0,1] \to [0,1]$ is continuously differentiable and that for
each $x \in [0,1]$, $\varphi^{-1}(\{x\})$ is a finite set. Let $\mu$ be absolutely continuous
with respect to $\lambda_{[0,1]}$ and set $g = d\mu/d\lambda$. Show that $\mu$ is invariant with
respect to $\varphi$ if and only if
g(y)

for almost all $x \in [0,1]$.
12.2 ERGODIC THEORY
Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system. Recall that for $n \in \mathcal{N}$,
$\varphi^{(n)}$ denotes the nth iterate of $\varphi$. We also define $\varphi^{(0)}$ to be the identity
function on $\Omega$.
For $x \in \Omega$, the sequence
$$x,\ \varphi(x),\ \varphi(\varphi(x)),\ \ldots,\ \varphi^{(n)}(x),\ \ldots,$$
called the orbit of x, describes the path of the point x as it moves in $\Omega$
under iterations of the mapping $\varphi$. Ergodic theory tries to find out as much
as possible about this sequence.
Oftentimes in applications, orbits cannot be observed directly, but
rather data are obtained in the form of numerical sequences
$$f(x),\ f(\varphi(x)),\ f(\varphi(\varphi(x))),\ \ldots,\ f(\varphi^{(n)}(x)),\ \ldots,$$
where f is some function defined on $\Omega$. In this section, we prove some general
results about the average behavior of the sequence $\{f(\varphi^{(n-1)}(x))\}_{n=1}^{\infty}$.
Specifically, we will first establish that for each $f \in \mathcal{L}^1(\mu)$, the limit
$$f^* = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ \varphi^{(k)}$$
exists $\mu$-ae. Then we will investigate the important case where $f^*$ is
constant $\mu$-ae for all $f \in \mathcal{L}^1(\mu)$.
THEOREM 12.1 Pointwise Ergodic Theorem
For each $f \in \mathcal{L}^1(\mu)$, the limit
$$f^* = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ \varphi^{(k)} \tag{12.15}$$
exists $\mu$-ae. Furthermore, $f^* \in \mathcal{L}^1(\mu)$ and satisfies
$$\int_\Omega f^*\,d\mu = \int_\Omega f\,d\mu. \tag{12.16}$$
PROOF:† We will prove the theorem in the special case $f = \chi_B$, leaving the
proof of the general case for Exercises 12.13 and 12.14.
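We begin by considering the number of visits to the set B among the
first n terms of the orbit of x, that is, $S_n(x) = \sum_{k=0}^{n-1} \chi_B \circ \varphi^{(k)}(x)$, and the
average number of visits $A_n(x) = S_n(x)/n$.
Suppose we can show that the functions $\overline{A}(x) = \limsup_{n\to\infty} A_n(x)$
and $\underline{A}(x) = \liminf_{n\to\infty} A_n(x)$ satisfy
$$\int_\Omega \overline{A}\,d\mu \le \mu(B) \quad\text{and}\quad \int_\Omega \underline{A}\,d\mu \ge \mu(B). \tag{12.17}$$
Then we would have $\int_\Omega (\overline{A} - \underline{A})\,d\mu \le 0$ and, because $\overline{A} - \underline{A} \ge 0$, it would
follow that the limit (12.15) exists $\mu$-ae and that (12.16) holds.
We proceed to verify (12.17). Our arguments will make use of the
following properties of the functions $\overline{A}$ and $\underline{A}$:
$$0 \le \underline{A} \le \overline{A} \le 1 \tag{12.18}$$
and
$$\overline{A} \circ \varphi = \overline{A} \quad\text{and}\quad \underline{A} \circ \varphi = \underline{A}. \tag{12.19}$$
(See Exercise 12.12.)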
To understand the proof of (12.17), it helps to think of the parameter n
as time. Then $A_n(x)$ represents the average number of visits of the orbit
of x to the set B by time n − 1.
Let $\epsilon > 0$ and let $\tau_\epsilon(x)$ denote the first time that the average number
of visits is at least $\overline{A}(x) - \epsilon$. Symbolically, we have
$$\tau_\epsilon(x) = \min\{ n \in \mathcal{N} : A_n(x) \ge \overline{A}(x) - \epsilon \}.$$
We observe that by (12.18), $\tau_\epsilon(x)$ is always a positive integer. From
$$\{ x : \tau_\epsilon(x) > c \} = \bigcap_{n \le c} \{ x : A_n(x) < \overline{A}(x) - \epsilon \},$$
it follows that $\tau_\epsilon$ is $\mathcal{A}$-measurable.
† This proof is adapted from one given by M. Keane, "Ergodic Theory and Subshifts
of Finite Type," in Ergodic Theory, Symbolic Dynamics and Hyperbolic Spaces, edited
by T. Bedford, M. Keane, and C. Series (Oxford, UK: Oxford University Press, 1991).
Keane's argument is based on ideas in T. Kamae, "A Simple Proof of the Ergodic
Theorem Using Non-Standard Analysis" (Israel J. of Math., 42, pp. 284-290, 1982).
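Either $\tau_\epsilon$ is essentially bounded or it is not, that is, either
$$\tau_\epsilon \in \mathcal{L}^\infty(\mu) \tag{12.20}$$
or
$$\tau_\epsilon \notin \mathcal{L}^\infty(\mu). \tag{12.21}$$
Suppose first that (12.20) holds. Then we can choose an integer M
such that
$$\mu(\tau_\epsilon^{-1}((M, \infty))) = 0. \tag{12.22}$$
For each $x \in \Omega$, we consider the sequence of integers
$$\tau_1(x) = \tau_\epsilon(x), \quad \tau_2(x) = \tau_\epsilon(\varphi^{(\tau_1(x))}(x)), \quad \tau_3(x) = \tau_\epsilon(\varphi^{(\tau_1(x) + \tau_2(x))}(x)), \quad \ldots.$$
It follows from (12.22) and the invariance of $\mu$ with respect to $\varphi$ that, for
$\mu$-almost all x, we have
$$\tau_j(x) \le M, \qquad j \in \mathcal{N}. \tag{12.23}$$
Suppose that x satisfies (12.23). In what follows, we will suppress the
dependence of $\tau_j$ on x. Let n be a positive integer greater than M and
let q be such that
$$\sigma_q \le n < \sigma_{q+1},$$
where we are using the notation $\sigma_q = \tau_1 + \tau_2 + \cdots + \tau_q$. Then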
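$$\begin{aligned} S_n(x) &\ge S_{\sigma_q}(x) \\ &= \sum_{k=0}^{\sigma_1 - 1} \chi_B \circ \varphi^{(k)}(x) + \sum_{k=\sigma_1}^{\sigma_2 - 1} \chi_B \circ \varphi^{(k)}(x) + \cdots + \sum_{k=\sigma_{q-1}}^{\sigma_q - 1} \chi_B \circ \varphi^{(k)}(x) \\ &= S_{\tau_1}(x) + S_{\tau_2}(\varphi^{(\sigma_1)}(x)) + \cdots + S_{\tau_q}(\varphi^{(\sigma_{q-1})}(x)). \end{aligned}$$
It follows from (12.19) and the definition of $\tau_\epsilon$ that
$$\begin{aligned} S_{\tau_1}(x) &\ge \tau_1(\overline{A}(x) - \epsilon), \\ S_{\tau_2}(\varphi^{(\sigma_1)}(x)) &\ge \tau_2(\overline{A}(\varphi^{(\sigma_1)}(x)) - \epsilon) = \tau_2(\overline{A}(x) - \epsilon), \\ &\ \,\vdots \\ S_{\tau_q}(\varphi^{(\sigma_{q-1})}(x)) &\ge \tau_q(\overline{A}(\varphi^{(\sigma_{q-1})}(x)) - \epsilon) = \tau_q(\overline{A}(x) - \epsilon). \end{aligned}$$
Hence,
$$S_n(x) \ge \sigma_q(\overline{A}(x) - \epsilon) \ge (n - \tau_{q+1})(\overline{A}(x) - \epsilon).$$
Applying (12.23) we conclude that, for $\mu$-almost all x, we have the inequality
$$S_n(x) \ge (n - M)(\overline{A}(x) - \epsilon). \tag{12.24}$$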
We have shown that (12.24) holds for $\mu$-almost all $x \in \Omega$. Integrating
both sides of (12.24) and using the invariance of $\mu$ with respect to $\varphi$, we
obtain
$$n\mu(B) = \sum_{k=0}^{n-1} \mu((\varphi^{(k)})^{-1}(B)) = \int_\Omega S_n(x)\,d\mu(x) \ge \int_\Omega (n - M)(\overline{A}(x) - \epsilon)\,d\mu(x).$$
Dividing by n and letting $n \to \infty$, we get
$$\mu(B) \ge \int_\Omega \overline{A}(x)\,d\mu(x) - \epsilon.$$
A similar argument shows that
$$\mu(B) \le \int_\Omega \underline{A}(x)\,d\mu(x) + \epsilon.$$
As $\epsilon > 0$ was chosen arbitrarily, we obtain (12.17). Thus, the proof of the
theorem is complete in case (12.20) holds.
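It remains to establish (12.17) in case (12.21) holds, that is, when $\tau_\epsilon$ is
not essentially bounded. The idea is to reduce the proof to the case where
(12.20) holds by slightly enlarging the set B.
Because $\tau_\epsilon$ is finite-valued, we can choose a positive integer M such
that $\mu(\tau_\epsilon^{-1}((M, \infty))) < \epsilon$. Now we set $B_\epsilon = B \cup \tau_\epsilon^{-1}((M, \infty))$,
$$S_n^\epsilon(x) = \sum_{k=0}^{n-1} \chi_{B_\epsilon} \circ \varphi^{(k)}(x),$$
$A_n^\epsilon(x) = S_n^\epsilon(x)/n$, and $\tilde{\tau}_\epsilon(x) = \min\{ n \in \mathcal{N} : A_n^\epsilon(x) \ge \overline{A}(x) - \epsilon \}$.
It follows immediately that $\tilde{\tau}_\epsilon \le \tau_\epsilon$. We claim that
$$\tilde{\tau}_\epsilon(x) \le M, \qquad x \in \Omega. \tag{12.25}$$
For, if $\tilde{\tau}_\epsilon(x) > M$, then $\tau_\epsilon(x) > M$. Hence, $A_1^\epsilon(x) = 1 \ge \overline{A}(x) - \epsilon$, but this
implies that $\tilde{\tau}_\epsilon(x) = 1 \le M$, a contradiction.
We can now apply the arguments used in the case (12.20) to obtain
$\mu(B_\epsilon) \ge \int_\Omega \overline{A}(x)\,d\mu(x) - \epsilon$. Therefore,
$$\mu(B) + \epsilon \ge \mu(B) + \mu(\tau_\epsilon^{-1}((M, \infty))) \ge \mu(B_\epsilon) \ge \int_\Omega \overline{A}(x)\,d\mu(x) - \epsilon. \tag{12.26}$$
By similar arguments, we obtain that
$$\mu(B) - \epsilon \le \int_\Omega \underline{A}(x)\,d\mu(x) + \epsilon. \tag{12.27}$$
From (12.26) and (12.27), we deduce that (12.17) holds.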
EXAMPLE 12.7 Illustrates the Pointwise Ergodic Theorem
Consider the Bernoulli scheme of Example 12.4 on page 699. Let $k \in \mathcal{N}$.
Define $F\colon \Omega \to \mathcal{R}$ by $F(f) = f(k)$. Because
$$F^{-1}(\{m\}) = \{ f \in \Omega : f(k) = m \} = C_{\{k\},m},$$
F is $\mathcal{A}$-measurable. Applying the pointwise ergodic theorem, we conclude
that the average
$$F^*(f) = \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(k + j)$$
exists for almost all $f \in \Omega$. We also have
$$\int_\Omega F^*\,d\mu = \int_\Omega F\,d\mu = \sum_{m=1}^{N} m\,\mu(C_{\{k\},m}) = \sum_{m=1}^{N} m\,p_m,$$
as is easily verified.	□
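A simulation can make this example concrete. Under the product measure μ the coordinates f(k), f(k + 1), … are independent with common distribution μ₀, so a pseudo-random sample of symbols can stand in for a typical f. The Python sketch below is only a numerical illustration: the function name `bernoulli_time_average`, the probability vector, the sample size, and the seed are all our own assumptions.

```python
import random

def bernoulli_time_average(p, n, seed=0):
    """Average of n consecutive coordinates of a pseudo-randomly sampled f.

    `p` lists p_1, ..., p_N; symbol m is drawn with probability p[m - 1].
    """
    rng = random.Random(seed)
    symbols = list(range(1, len(p) + 1))
    sample = rng.choices(symbols, weights=p, k=n)
    return sum(sample) / n

p = [0.2, 0.3, 0.5]                                           # illustrative p
space_average = sum(m * pm for m, pm in enumerate(p, start=1))  # sum of m * p_m
observed = bernoulli_time_average(p, n=20_000)
```

For this choice of p the space average is 2.3, and `observed` should lie close to it. The agreement is statistical rather than exact, and a finite simulation illustrates the theorem without proving anything.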
Ergodicity
Many interesting measurable dynamical systems have the property that
for each $f \in \mathcal{L}^1(\mu)$, the average, $f^*$, in the pointwise ergodic theorem is
constant almost everywhere. In Theorem 12.2 we will see that this property
is characterized by the following condition:
$$B \in \mathcal{A} \ \&\ B = \varphi^{-1}(B) \;\Rightarrow\; \mu(B) = 0 \text{ or } \mu(B) = 1. \tag{12.28}$$
To understand the meaning of (12.28), it helps to consider its negation.
Suppose $\Omega_1 \in \mathcal{A}$, $\Omega_1 = \varphi^{-1}(\Omega_1)$, and $0 < \mu(\Omega_1) < 1$. Let $\Omega_2 = \Omega \setminus \Omega_1$.
Then we also have $\Omega_2 \in \mathcal{A}$, $\Omega_2 = \varphi^{-1}(\Omega_2)$, and $0 < \mu(\Omega_2) < 1$.
For j = 1, 2, we define the $\sigma$-algebra $\mathcal{A}_j = \{ A \cap \Omega_j : A \in \mathcal{A} \}$
and a corresponding probability measure $\mu_j(A \cap \Omega_j) = \mu(A \cap \Omega_j)/\mu(\Omega_j)$.
Denoting by $\varphi_j$ the restriction of the mapping $\varphi$ to $\Omega_j$, we obtain the two
measurable dynamical systems $(\Omega_j, \mathcal{A}_j, \mu_j, \varphi_j)$, j = 1, 2.
For $x \in \Omega$, the orbit $\{\varphi^{(n-1)}(x)\}_{n=1}^{\infty}$ is contained in either $\Omega_1$ or $\Omega_2$.
Indeed, that orbit equals either $\{\varphi_1^{(n-1)}(x)\}_{n=1}^{\infty}$ or $\{\varphi_2^{(n-1)}(x)\}_{n=1}^{\infty}$. Thus,
we have complete information about the orbits of $(\Omega, \mathcal{A}, \mu, \varphi)$ if we have it
for each of the two smaller systems $(\Omega_j, \mathcal{A}_j, \mu_j, \varphi_j)$, j = 1, 2.
DEFINITION 12.2 Ergodicity
A measurable dynamical system $(\Omega, \mathcal{A}, \mu, \varphi)$ is called ergodic if
$$E \in \mathcal{A} \ \&\ E = \varphi^{-1}(E) \;\Rightarrow\; \mu(E) = 0 \text{ or } \mu(E) = 1.$$
Exercise 12.21 shows that the measurable dynamical system in Example 12.3
on page 699 is ergodic. Example 12.8, which we will present
shortly, shows that the measurable dynamical system in Example 12.2 on
page 699 is ergodic if and only if b is irrational.
In the proof of our next theorem, we will need to know that ergodicity
is equivalent to
$$E \in \mathcal{A} \ \&\ E \subset \varphi^{-1}(E) \;\Rightarrow\; \mu(E) = 0 \text{ or } \mu(E) = 1. \tag{12.29}$$
We leave the verification of this fact to the reader as Exercise 12.15.
THEOREM 12.2
Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system. Then the following are
equivalent:
a)	$(\Omega, \mathcal{A}, \mu, \varphi)$ is ergodic.
b)	For each $f \in \mathcal{L}^1(\mu)$, the average
$$f^* = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ \varphi^{(k)}$$
is constant $\mu$-ae.
c)	If $f \in \mathcal{L}^1(\mu)$ and $f \circ \varphi = f$ $\mu$-ae, then f is constant $\mu$-ae.
PROOF: The equivalence of (b) and (c) is left for Exercise 12.16. Suppose
(a) holds and $f \in \mathcal{L}^1(\mu)$ is such that $f \circ \varphi = f$ $\mu$-ae. To show that f is
constant $\mu$-ae, it suffices to consider the case where f is real-valued. Let
$D = \{ x \in \Omega : f(x) \ne f \circ \varphi(x) \}$. Then $\mu(D) = 0$. Letting $\varphi^{-k} = (\varphi^{(k)})^{-1}$,
we have from the invariance of $\mu$ that $\mu(\varphi^{-k}(D)) = 0$ for all k. Hence,
$$\mu\!\left( \bigcup_{k=0}^{\infty} \varphi^{-k}(D) \right) \le \sum_{k=0}^{\infty} \mu(\varphi^{-k}(D)) = 0.$$
Let $b \in \mathcal{R}$ and set $E = f^{-1}((-\infty, b)) \setminus \bigcup_{k=0}^{\infty} \varphi^{-k}(D)$. Then we have
$\mu(E) = \mu(f^{-1}((-\infty, b)))$ and $E \subset \varphi^{-1}(E)$. By (a) and (12.29), we know
that $\mu(f^{-1}((-\infty, b)))$ equals either 0 or 1. It is now immediate that f is
constant $\mu$-ae. Consequently, we see that (a) ⇒ (c).
Conversely, suppose (c) holds. Let $E \in \mathcal{A}$ be such that $E = \varphi^{-1}(E)$.
Then $\chi_E \circ \varphi = \chi_{\varphi^{-1}(E)} = \chi_E$. Hence, by (c), $\chi_E$ is constant $\mu$-ae. It
follows that $\mu(E)$ is either 0 or 1. Thus, we have shown that (c) ⇒ (a). ■
EXAMPLE 12.8 Illustrates Theorem 12.2
Using Theorem 12.2, we will now show that the measurable dynamical
system of Example 12.2 on page 699 is ergodic if and only if b is an irrational
number. Suppose $f \in \mathcal{L}^1(\mu)$ is such that $f \circ \psi_b = f$. Then the Fourier
coefficients of the function $g(x) = f(e^{ix})$ must satisfy $\hat{g}(n) = e^{2\pi i n b}\,\hat{g}(n)$.
If b is irrational, it follows that $\hat{g}(n) = 0$ for all nonzero integers n.
Thus, f is constant by Theorem 11.1 on page 638. Consequently, we see
that $(T, \mathcal{A}, \mu, \psi_b)$ is ergodic.
On the other hand, if b is rational, say, b = p/q, where p and q are
integers, then the function $f(z) = z^q$ is nonconstant and satisfies $f \circ \psi_b = f$.
Hence, $(T, \mathcal{A}, \mu, \psi_b)$ is not ergodic.	□
From the pointwise ergodic theorem and Theorem 12.2, we obtain the
following important corollary.
COROLLARY 12.1
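If $(\Omega, \mathcal{A}, \mu, \varphi)$ is ergodic, then for each $f \in \mathcal{L}^1(\mu)$,
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\varphi^{(k)}(x)) = \int_\Omega f\,d\mu$$
for almost all $x \in \Omega$.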
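The corollary can be illustrated numerically with the system of Example 12.1, which Example 12.8 and the isomorphism with Example 12.2 indicate is ergodic exactly when b is irrational. The Python sketch below is an illustration under our own assumptions: the names `time_average` and `indicator`, the starting point 0.1, the interval [0, 0.25), and the value b = √2 − 1 are arbitrary choices. The orbit point is computed as (x + kb) mod 1 directly, to avoid accumulating rounding error.

```python
from math import sqrt

def time_average(f, b, x, n):
    """(1/n) * sum_{k=0}^{n-1} f(phi_b^{(k)}(x)), where phi_b(x) = (x + b) mod 1."""
    return sum(f((x + k * b) % 1.0) for k in range(n)) / n

def indicator(lo, hi):
    """Characteristic function of the interval [lo, hi)."""
    return lambda y: 1.0 if lo <= y < hi else 0.0

chi = indicator(0.0, 0.25)          # its integral over [0, 1) is 0.25

irrational_avg = time_average(chi, sqrt(2) - 1, x=0.1, n=100_000)
rational_avg = time_average(chi, 0.5, x=0.1, n=1_000)
```

With the irrational b the time average is close to the space average 0.25. With b = 1/2 the orbit of 0.1 alternates between two points, and the time average is 0.5 instead, consistent with the failure of ergodicity for rational b. The corollary speaks of almost all x, so a single starting point is only suggestive.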
EXERCISES 12.2
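12.12	Verify (12.19).
12.13	Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system. Suppose $f \in \mathcal{L}^1(\mu)$
and $f \ge 0$ $\mu$-ae. Let $N \in \mathcal{N}$ and $\epsilon > 0$ be given. Set $S_n(f) = \sum_{k=0}^{n-1} f \circ \varphi^{(k)}$,
$A_n(f) = S_n(f)/n$, $\overline{A}(f) = \limsup A_n(f)$, and
$$\tau_\epsilon = \min\{ n \in \mathcal{N} : A_n(f) \ge \min\{N, \overline{A}(f) - \epsilon\} \}.$$
Show that
$$\int_\Omega f\,d\mu \ge \int_\Omega \min\{N, \overline{A}(f) - \epsilon\}\,d\mu,$$
if $\tau_\epsilon \in \mathcal{L}^\infty(\mu)$, and
$$\int_\Omega f\,d\mu \ge \int_\Omega \min\{N, \overline{A}(f) - \epsilon\}\,d\mu - \epsilon,$$
if $\tau_\epsilon \notin \mathcal{L}^\infty(\mu)$.
12.14	Use Exercise 12.13 to complete the proof of the pointwise ergodic theorem.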
12.15	Prove the equivalence of ergodicity and condition (12.29). Hint: Consider
я = П„>ои„>^-п(£)-
12.16	Prove the equivalence of (b) and (c) in Theorem 12.2 on page 712.
Exercises 12.17-12.20 are devoted to proving an $\mathcal{L}^2$-version of the pointwise
ergodic theorem.
12.17	Let V denote the collection of all $f \in \mathcal{L}^2(\mu)$ such that
$$f^* = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ \varphi^{(k)}$$
exists in the sense of convergence in the $\mathcal{L}^2(\mu)$-norm. Prove that V is a
closed linear subspace of $\mathcal{L}^2(\mu)$.
12.18	Let V be as in Exercise 12.17 and let
$$Y = \{ f \in \mathcal{L}^2(\mu) : f \circ \varphi = f \}.$$
Show that $Y \subset V$ and $P(f) = f^*$ for all $f \in V$, where $P\colon \mathcal{L}^2(\mu) \to Y$ is
the orthogonal projection.
12.19	Refer to Exercises 12.17 and 12.18. Let $Z = \{ f \circ \varphi - f : f \in \mathcal{L}^2(\mu) \}$.
Show that $Z \subset V$ and $P(Z) = \{0\}$.
12.20	Refer to Exercises 12.17-12.19. Show that $(Y + Z)^\perp = \{0\}$ and deduce that
$V = \mathcal{L}^2(\mu)$. This proves the $\mathcal{L}^2$-ergodic theorem: For each $f \in \mathcal{L}^2(\mu)$,
the limit
$$f^* = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ \varphi^{(k)}$$
exists in the sense of convergence in the $\mathcal{L}^2(\mu)$-norm. Hint: Show that
$h \in (Y + Z)^\perp \Rightarrow \langle h, f \rangle = \langle h, f \circ \varphi \rangle$.
12.21	Show that the measurable dynamical system of Example 12.3 on page 699
is ergodic by employing the following argument.
a)	Show that if $\varphi^{-1}(A) = A$, then $\lambda(A \cap I) = \lambda(A)\lambda(I)$, whenever I is a
subinterval of [0,1) of the form $I = [p/2^n, q/2^n)$ for integers p and q.
b)	Extend the result in part (a) to arbitrary subintervals of [0,1).
c)	Show that $\lambda(A) = \lambda(A)^2$.
12.22	Use the Fourier coefficients $c_n = \int_0^1 e^{-2\pi i n x} f(x)\,dx$, $n \in \mathcal{Z}$, to provide an
alternative verification to the one given in Exercise 12.21 showing that the
measurable dynamical system of Example 12.3 on page 699 is ergodic.
12.23	Show that if $(\Omega, \mathcal{A}, \mu_1, \varphi)$ and $(\Omega, \mathcal{A}, \mu_2, \varphi)$ are both ergodic, then either
$\mu_1 = \mu_2$ or $\mu_1 \perp \mu_2$.
12.24	Let $\Omega$ be a compact Hausdorff space, $\mathcal{A}$ be the collection of Borel subsets
of $\Omega$, and $\varphi\colon \Omega \to \Omega$ be a continuous function. Consider
$$\mathcal{I}(\Omega) = \{ \mu \in P(\Omega) : \mu(\varphi^{-1}(A)) = \mu(A) \text{ for all } A \in \mathcal{A} \}.$$
Show that $\mu$ is an extreme point of $\mathcal{I}(\Omega)$ if and only if $(\Omega, \mathcal{A}, \mu, \varphi)$ is ergodic.
Refer to Exercises 12.7 and 12.8 on page 706.
12.25	Let Q, A, and ip be as in Exercise 12.24. Show that for each v G Z(Q),
there is a regular Borel measure on the weak* closure of exZ(Q) such
that
[	( I fdp) сЕДм) = [ fdv, f G C(Q).
AxT(Q) \Jq /	Jn
Hint: See Theorem 10.15 on page 624.
12.3 ISOMORPHISM OF MEASURABLE DYNAMICAL
SYSTEMS; ENTROPY
This section is an introduction to some ideas motivated by the question:
“When are two measurable dynamical systems essentially the same?” First
we will give a definition of what it means for measurable dynamical systems
to be isomorphic. Then we will present a powerful tool for deciding when
two measurable dynamical systems are isomorphic, namely, entropy.
DEFINITION 12.3 Isomorphism of Measurable Dynamical Systems
Two measurable dynamical systems $(\Omega, \mathcal{A}, \mu, \varphi)$ and $(\Lambda, \mathcal{S}, \nu, \psi)$ are
said to be isomorphic if there are mappings $J\colon \Omega \to \Lambda$ and $K\colon \Lambda \to \Omega$
such that
a)	$J^{-1}(B) \in \mathcal{A}$ for each $B \in \mathcal{S}$,
b)	$K^{-1}(A) \in \mathcal{S}$ for each $A \in \mathcal{A}$,
c)	$\mu(J^{-1}(B)) = \nu(B)$ for each $B \in \mathcal{S}$,
d)	$\nu(K^{-1}(A)) = \mu(A)$ for each $A \in \mathcal{A}$,
e)	$J \circ \varphi = \psi \circ J$ $\mu$-ae,
f)	$K \circ \psi = \varphi \circ K$ $\nu$-ae,
g)	$K \circ J(x) = x$ $\mu$-ae,
h)	$J \circ K(y) = y$ $\nu$-ae.
Each of the mappings J and K is called an isomorphism.
As the reader is asked to verify in Exercise 12.27, the measurable
dynamical systems given in Examples 12.1 (page 699) and 12.2 (page 699)
are isomorphic via the mapping $E(x) = e^{2\pi i x}$ defined in the latter.
A more complicated example of a pair of isomorphic measurable dy-
namical systems is obtained by considering a so-called one-sided variation
of the Bernoulli scheme B(l/2,1/2).
EXAMPLE 12.9 Illustrates Definition 12.3
Refer to Example 12.4 on page 699. The construction of the Bernoulli
scheme is unaffected if the space $\Omega = S^{\mathcal{Z}}$ is replaced by $\Omega^+ = S^{\mathcal{N}}$. In the
case where S = {0,1} and $(p_1, p_2) = (1/2, 1/2)$, the measure $\mu$ is replaced
by the measure $\mu^+$ satisfying $\mu^+(C_{F,a}) = 2^{-N(F)}$ and the function $\varphi$ is replaced
by $\varphi^+((x_1, x_2, x_3, \ldots)) = (x_2, x_3, \ldots)$. It can be shown that the mapping
$J\colon \Omega^+ \to [0,1)$ defined by $J((x_1, x_2, x_3, \ldots)) = \sum_{j=1}^{\infty} x_j 2^{-j}$ if $x_j = 0$
for some j, and 0 otherwise, is an isomorphism of $(\Omega^+, \mathcal{A}^+, \mu^+, \varphi^+)$ onto
the measurable dynamical system $([0,1), \mathcal{B}_{[0,1)}, \lambda_{[0,1)}, \varphi)$ of Example 12.3
on page 699. See Exercise 12.28.	□
The idea of isomorphism immediately suggests the following problem:
Given two measurable dynamical systems, determine whether they
are isomorphic. A natural approach to this problem is to seek invariants
of measurable dynamical systems. An invariant of a measurable
dynamical system $(\Omega, \mathcal{A}, \mu, \varphi)$ is a number or property, $\mathcal{I}(\Omega, \mathcal{A}, \mu, \varphi)$, such
that if $(\Omega, \mathcal{A}, \mu, \varphi)$ and $(\Lambda, \mathcal{S}, \nu, \psi)$ are isomorphic, then $\mathcal{I}(\Omega, \mathcal{A}, \mu, \varphi)$ and
$\mathcal{I}(\Lambda, \mathcal{S}, \nu, \psi)$ are identical.
Here is a simple illustration of the use of invariants. As the reader is
asked to verify in Exercise 12.29, the property of being ergodic is an invariant
of a measurable dynamical system. From Example 12.8 on page 713,
we know that if $b \in \mathcal{Q}$ and $c \notin \mathcal{Q}$, then $(T, \mathcal{A}, \mu, \psi_b)$ is not ergodic and
$(T, \mathcal{A}, \mu, \psi_c)$ is ergodic. Therefore, those two measurable dynamical systems
are not isomorphic.
Entropy
The remainder of this section is devoted to a discussion of numerical mea-
sures of information. To motivate the pertinent ideas, we consider the
following “thought experiment.”
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space. Suppose that the distribution of
the location of a particle, p, in $\Omega$ is given by the probability measure $\mu$;
that is, for each $A \in \mathcal{A}$, the probability that p is in A equals $\mu(A)$. The
object of our experiment is to locate the position of p as closely as possible.
Let $\mathfrak{P}$ be a measurable partition of $(\Omega, \mathcal{A})$. Suppose that we can extract
information about the location of p by answering, for each $A \in \mathfrak{P}$, the
question: "Is p in A?" In other words, we can ascertain which element
of $\mathfrak{P}$ contains p.
Some partitions tell us more than others about the location of p. For
example, for the probability space $([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)})$, we expect more
information from $\mathfrak{P} = \{[0, 1/2), [1/2, 1)\}$ than from $\mathfrak{Q} = \{[0, 1/100), [1/100, 1)\}$.
This is because we are guaranteed that $\mathfrak{P}$ will reduce by 50% the measure
of the set where we have to look for p, whereas, unless we are lucky, $\mathfrak{Q}$ will
reduce it by only 1%.
To proceed rigorously, we need to assign a number to the amount of
information gained from a measurable partition. That number is called the
entropy of the measurable partition.
DEFINITION 12.4 Entropy of a Measurable Partition
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space and $\mathfrak{P}$ a measurable partition
of $(\Omega, \mathcal{A})$. Then the entropy of $\mathfrak{P}$, denoted by $H(\mathfrak{P})$, is defined by
$$H(\mathfrak{P}) = -\sum_{A \in \mathfrak{P}} \mu(A) \log \mu(A),$$
where we use the convention that 0 log 0 = 0.
At the end of this section, we will derive the formula for $H(\mathfrak{P})$ from
some plausible properties of a measure of information. For the present, we
content ourselves with the intuitively satisfying observation that in the case
of the probability space $([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)})$, the entropy of a two-element
partition
$$H(\{A, A^c\}) = -\lambda(A) \log \lambda(A) - (1 - \lambda(A)) \log(1 - \lambda(A))$$
is maximized when $\lambda(A) = \lambda(A^c) = 1/2$.
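The entropy of a finite partition depends only on the measures of its elements, so it is easy to compute. The short Python sketch below (the function name `entropy` and the grid of test values are our own) evaluates the two partitions of [0,1) discussed in the thought experiment and scans two-element partitions for the maximum.

```python
from math import log

def entropy(masses):
    """H = -sum of mu(A) log mu(A) over the partition, with 0 log 0 = 0."""
    return -sum(m * log(m) for m in masses if m > 0)

h_half = entropy([0.5, 0.5])         # the partition {[0,1/2), [1/2,1)}
h_skewed = entropy([0.01, 0.99])     # the partition {[0,1/100), [1/100,1)}

# Entropy of {A, A^c} on a grid of values t = lambda(A).
grid = [i / 100 for i in range(101)]
h_max = max(entropy([t, 1 - t]) for t in grid)
```

Here `h_half` equals log 2 and exceeds `h_skewed`, in agreement with the expectation that the first partition is the more informative one. On the grid, the maximum `h_max` is attained at t = 1/2.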
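To obtain the basic properties of entropy, we need to introduce the
concept of the refinement of a measurable partition. We say that a measurable
partition $\mathfrak{Q}$ is a refinement of the measurable partition $\mathfrak{P}$ and
write $\mathfrak{P} \ll \mathfrak{Q}$ if every element of $\mathfrak{P}$ is a union of elements of $\mathfrak{Q}$. For any
two measurable partitions $\mathfrak{P}$ and $\mathfrak{R}$, there is a smallest common refinement
given by $\mathfrak{P} \vee \mathfrak{R} = \{ A \cap B : A \in \mathfrak{P},\ B \in \mathfrak{R} \}$.
PROPOSITION 12.1
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space and $\mathfrak{P}$, $\mathfrak{Q}$, and $\mathfrak{R}$ be measurable partitions
of $(\Omega, \mathcal{A})$. Then the following hold:
a)	$\mathfrak{P} \ll \mathfrak{Q} \Rightarrow H(\mathfrak{P}) \le H(\mathfrak{Q})$.
b)	$H(\mathfrak{P} \vee \mathfrak{R}) \le H(\mathfrak{P}) + H(\mathfrak{R})$.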
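PROOF: To prove (a) we start by observing that each $A \in \mathfrak{P}$ is a disjoint
union of members of $\mathfrak{Q}$. Thus, for $\mu(A) > 0$, we have
$$\begin{aligned} -\mu(A) \log \mu(A) &= -\sum_{B \subset A,\ B \in \mathfrak{Q}} \mu(B) \log \mu(A) \\ &= -\sum_{B \subset A,\ B \in \mathfrak{Q}} \mu(B) \log \mu(B) + \sum_{B \subset A,\ B \in \mathfrak{Q}} \mu(B) \log \frac{\mu(B)}{\mu(A)} \\ &\le -\sum_{B \subset A,\ B \in \mathfrak{Q}} \mu(B) \log \mu(B). \end{aligned}$$
Summing over $A \in \mathfrak{P}$, we obtain that
$$H(\mathfrak{P}) = -\sum_{A \in \mathfrak{P}} \mu(A) \log \mu(A) \le -\sum_{A \in \mathfrak{P}} \sum_{B \subset A,\ B \in \mathfrak{Q}} \mu(B) \log \mu(B) = H(\mathfrak{Q}).$$
The proof of (b) is based on the fact that the function $g(t) = -t \log t$
is concave on [0,1]; that is, g satisfies
$$g\!\left( \sum_{j=1}^{n} \alpha_j t_j \right) \ge \sum_{j=1}^{n} \alpha_j g(t_j) \tag{12.30}$$
for all convex combinations of elements of [0,1]. Without loss of generality
we can assume that $\mu(C) > 0$ for all $C \in \mathfrak{R}$. Thus, we can write
$$\mu(A) = \sum_{C \in \mathfrak{R}} \mu(C)\,\frac{\mu(A \cap C)}{\mu(C)} \tag{12.31}$$
for each $A \in \mathfrak{P}$. It follows from (12.30) and (12.31) that
$$\begin{aligned} g(\mu(A)) &\ge -\sum_{C \in \mathfrak{R}} \mu(A \cap C) \log \frac{\mu(A \cap C)}{\mu(C)} \\ &= -\sum_{C \in \mathfrak{R}} \mu(A \cap C) \log \mu(A \cap C) + \sum_{C \in \mathfrak{R}} \mu(A \cap C) \log \mu(C). \end{aligned}$$
Summing over $A \in \mathfrak{P}$, we get
$$\begin{aligned} H(\mathfrak{P}) &\ge -\sum_{A \in \mathfrak{P}} \sum_{C \in \mathfrak{R}} \mu(A \cap C) \log \mu(A \cap C) + \sum_{A \in \mathfrak{P}} \sum_{C \in \mathfrak{R}} \mu(A \cap C) \log \mu(C) \\ &= -\sum_{A \in \mathfrak{P}} \sum_{C \in \mathfrak{R}} \mu(A \cap C) \log \mu(A \cap C) + \sum_{C \in \mathfrak{R}} \mu(C) \log \mu(C) \\ &= H(\mathfrak{P} \vee \mathfrak{R}) - H(\mathfrak{R}). \end{aligned}$$
Thus, (b) is proved. ■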
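Both parts of Proposition 12.1 can be checked numerically on partitions of [0,1) into intervals, with Lebesgue measure playing the role of μ. The Python sketch below rests on an assumption of our own: each partition is represented by its interior cut points, so the smallest common refinement of two such partitions is obtained by merging the cut points. The names `interval_partition` and `join` and the specific cut points are hypothetical.

```python
from math import log

def entropy(masses):
    """H = -sum of mu(A) log mu(A), with the convention 0 log 0 = 0."""
    return -sum(m * log(m) for m in masses if m > 0)

def interval_partition(cuts):
    """Lengths of the intervals of [0, 1) determined by interior cut points."""
    points = [0.0] + sorted(set(cuts)) + [1.0]
    return [hi - lo for lo, hi in zip(points, points[1:])]

def join(cuts_p, cuts_r):
    """Cut points of the smallest common refinement of two interval partitions."""
    return sorted(set(cuts_p) | set(cuts_r))

cuts_p = [0.5]            # P = {[0, 0.5), [0.5, 1)}
cuts_r = [0.2, 0.7]       # R = {[0, 0.2), [0.2, 0.7), [0.7, 1)}

h_p = entropy(interval_partition(cuts_p))
h_r = entropy(interval_partition(cuts_r))
h_join = entropy(interval_partition(join(cuts_p, cuts_r)))
```

Since the join refines both partitions, part (a) predicts that `h_p` and `h_r` are each at most `h_join`, and part (b) predicts that `h_join` is at most `h_p + h_r`. Both hold for these values.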
Entropy and Measurable Dynamical Systems
Up to this point, we have defined entropy for probability spaces $(\Omega, \mathcal{A}, \mu)$.
Now we introduce a dynamical aspect by considering the measurable dynamical
system $(\Omega, \mathcal{A}, \mu, \varphi)$. Suppose that we modify the "thought experiment"
introduced on page 717 by allowing the particle p to move according
to the following rule: If p is at x at time 0, then its position at time 1
is $\varphi(x)$, its position at time 2 is $\varphi^{(2)}(x)$, etc.
If we use a measurable partition $\mathfrak{P}$ to obtain information about the
location of p at time 0, then the measurable partition
$$\varphi^{-n}\mathfrak{P} = \{ (\varphi^{(n)})^{-1}(A) : A \in \mathfrak{P} \}$$
yields corresponding information about the particle's location at time n,
and the measurable partition
$$\mathfrak{P}^{(n)} = \mathfrak{P} \vee \varphi^{-1}\mathfrak{P} \vee \cdots \vee \varphi^{-(n-1)}\mathfrak{P}$$
yields corresponding information about the path of successive positions of p
at times 0 through n − 1 as it moves in $\Omega$ under the action of $\varphi$.
PROPOSITION 12.2
Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system and $\mathfrak{P}$ a measurable
partition of $(\Omega, \mathcal{A})$. Then the following hold:
a)	$H(\varphi^{-k}\mathfrak{P}) = H(\mathfrak{P})$.
b)	$H((\varphi^{-k}\mathfrak{P})^{(n)}) = H(\mathfrak{P}^{(n)})$.
c)	$H(\mathfrak{P}^{(n+m)}) \le H(\mathfrak{P}^{(n)}) + H(\mathfrak{P}^{(m)})$.
PROOF: Parts (a) and (b) follow immediately from the definition of the
entropy of a measurable partition and the invariance of $\mu$ with respect to $\varphi$.
To obtain (c), we begin with the observation $\mathfrak{P}^{(n+m)} = \mathfrak{P}^{(n)} \vee (\varphi^{-n}\mathfrak{P})^{(m)}$.
It follows from Proposition 12.1 that
$$H(\mathfrak{P}^{(n+m)}) \le H(\mathfrak{P}^{(n)}) + H((\varphi^{-n}\mathfrak{P})^{(m)}).$$
The assertion (c) is now an immediate consequence of (b).	■
Using Proposition 12.2(c) it can be shown that the limit
\[
H(\mathfrak{P}, \varphi) = \lim_{n \to \infty} \frac{H(\mathfrak{P}^{(n)})}{n}
\]
exists. (See Exercise 12.30.) We can think of $H(\mathfrak{P}, \varphi)$ as the time
average for the entropies associated with the measurable partitions
$\mathfrak{P}^{(n)}$. The quantity
\[
h(\varphi) = \sup\{\, H(\mathfrak{P}, \varphi) : \mathfrak{P} \text{ a partition of } (\Omega, \mathcal{A}) \,\},
\]
which can be viewed as the maximum amount of information that can be
extracted from the dynamical system per unit time, is called the entropy
of the measurable dynamical system $(\Omega, \mathcal{A}, \mu, \varphi)$. As the reader is
asked to show in Exercise 12.34, $h$ is an invariant of $(\Omega, \mathcal{A}, \mu, \varphi)$.
Calculation of the entropy of a measurable dynamical system is often
not an easy task. In the next section, though, we will find a formula for
the entropy of the Bernoulli scheme $B(p_1, p_2, \ldots, p_N)$.
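To make the partitions describing the path of the particle concrete, here is a small Python sketch on a finite system. Everything in it is an assumption chosen for illustration and is not taken from the text: the space is {0, ..., 5} with the uniform measure, the map `phi` is the rotation x -> x + 1 (mod 6), and the starting partition is {{0,1,2},{3,4,5}}. Two points lie in the same cell of the n-step partition exactly when their first n positions carry the same labels.

```python
import math

def entropy(partition, n_points):
    """Entropy of a partition of {0, ..., n_points - 1} under the uniform measure."""
    return -sum((len(c) / n_points) * math.log(len(c) / n_points) for c in partition)

def path_partition(label, phi, n_points, n):
    """Cells of the n-step partition: points grouped by the labels of
    x, phi(x), ..., phi applied n - 1 times to x."""
    cells = {}
    for x in range(n_points):
        itinerary, y = [], x
        for _ in range(n):
            itinerary.append(label(y))
            y = phi(y)
        cells.setdefault(tuple(itinerary), set()).add(x)
    return list(cells.values())

N_POINTS = 6

def phi(x):
    # Hypothetical measure-preserving map: a rotation of the six points.
    return (x + 1) % N_POINTS

def label(x):
    # Hypothetical partition P = {{0,1,2}, {3,4,5}}, encoded by a label per point.
    return 0 if x < 3 else 1

H = {n: entropy(path_partition(label, phi, N_POINTS, n), N_POINTS) for n in range(1, 9)}
for n in sorted(H):
    print(n, round(H[n], 4), round(H[n] / n, 4))
```

In this example the n-step partition already consists of singletons for n = 3, so its entropy stays at log 6 from then on and the ratio H/n tends to 0. The computed values also satisfy the subadditivity in Proposition 12.2(c).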
Motivating the Formula for the Entropy of a Partition
We will now motivate the formula for the entropy of a measurable partition
given in Definition 12.4 on page 717. Let us return to the “thought
experiment” discussed previously in this section. Recall that a particle $p$
is located in $\Omega$ according to the probability measure $\mu$. That is, for
each $A \in \mathcal{A}$, the probability that
\[ p \text{ is in } A   \qquad (12.32) \]
equals $\mu(A)$.
We would like to assign a numerical value $I(A)$ to the information
contained in the event (12.32). It seems reasonable to require that $I(A)$ be
a decreasing function of $\mu(A)$. In other words, the smaller the probability
of $A$, the greater the information that is obtained from the knowledge that
$p$ is in $A$. Thus, we should have a decreasing function $f$ defined on [0,1]
such that
\[ I(A) = f(\mu(A)).   \qquad (12.33) \]
Another plausible condition on $I(A)$ is that it should assign the value 0
to the sure event:
\[ I(\Omega) = f(1) = 0.   \qquad (12.34) \]
Equation (12.34) reflects the fact that knowing $p$ is in $\Omega$ provides no
information.
Our final condition on $I$ concerns the total information in two
independent events, say, $A$ and $B$. Knowing that one of the events occurs
provides no probabilistic information regarding the occurrence of the other
event. Therefore, there is no redundancy in the information imparted by
knowing that $p$ is in $A$ and the information imparted by knowing that $p$ is
in $B$. Consequently, the total information imparted by knowing that $p$ is in
$A \cap B$ is the aggregate of the individual information:
\[ I(A \cap B) = I(A) + I(B).   \qquad (12.35) \]
Combining (12.33)-(12.35) we obtain a decreasing function $f$ defined on
probabilities such that
\[ f(1) = 0 \quad\text{and}\quad f(st) = f(s) + f(t), \qquad s, t \in [0,1].   \qquad (12.36) \]
As the reader is asked to verify in Exercise 12.35, the only decreasing
functions on [0,1] that satisfy (12.36) are those of the form
\[ f(t) = -a\log t, \]
where $a$ is a positive constant.
For convenience, we choose $a = 1$ to arrive at the following definition
of the information content of a single event:
\[ I(A) = -\log\mu(A). \]
Now consider a measurable partition $\mathfrak{P} = \{A_1, A_2, \ldots, A_n\}$. The
discrete random variable
\[ X = \sum_{j=1}^{n} I(A_j)\,\chi_{A_j} \]
gives the information gained by knowing which element of $\mathfrak{P}$ contains $p$.
The expected value of $X$ is
\[
\mathcal{E}(X) = \int X \, d\mu = \sum_{j=1}^{n} I(A_j)\,\mu(A_j)
  = -\sum_{j=1}^{n} \mu(A_j)\log\mu(A_j) = H(\mathfrak{P}).
\]
Thus we see that the entropy of a measurable partition is the expected
amount of information gained by knowing which element of the measurable
partition contains the particle $p$.
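The motivation above can be illustrated with a few lines of Python. This is a minimal sketch under assumptions made here: the probabilities `s`, `t`, and the list `cell_probs` are hypothetical values, and the function name `information` is chosen for illustration. It checks the additivity condition (12.35) for a pair of independent events and then compares the expected information with the entropy formula.

```python
import math

def information(prob):
    """Information content -log(prob) of an event whose probability is prob > 0."""
    return -math.log(prob)

# Additivity for independent events: the probability of the intersection
# is the product s * t, so the information adds.
s, t = 0.5, 0.25
additive = math.isclose(information(s * t), information(s) + information(t))
print("f(st) = f(s) + f(t):", additive)

# Hypothetical measurable partition, described only by its cell probabilities.
cell_probs = [0.5, 0.25, 0.125, 0.125]

# Expected value of the random variable X that equals I(A_j) on the cell A_j.
expected_information = sum(p * information(p) for p in cell_probs)

# Entropy computed directly from the defining formula.
entropy = -sum(p * math.log(p) for p in cell_probs)

print(expected_information, entropy)
```

The two printed numbers agree, which is exactly the statement that the entropy is the expected information.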
EXERCISES 12.3
12.26	Prove that isomorphism of measurable dynamical systems is an equivalence
relation.
12.27	Show that the measurable dynamical systems in Examples 12.1 and 12.2
on page 699 are isomorphic.
12.28	Refer to Example 12.9 on page 716. Show that the measurable dynamical
system (Q+,A+,//+,<£+) is isomorphic to ([0,l),S[o,i),	where ip is
the mapping defined in Example 12.3 on page 699.
12.29	Prove that ergodicity is an invariant of a measurable dynamical system.
12.30	Suppose that $\{a_n\}_{n=1}^{\infty}$ is a sequence of real numbers satisfying the
subadditivity condition $a_{n+m} \le a_n + a_m$. Show that $\lim_{n\to\infty} a_n/n$
exists as a real number or, possibly, $-\infty$. Hint: Let $m$ be fixed, but
arbitrary. Each $n \in \mathcal{N}$ can be written as $n = \ell m + r$, where $\ell \ge 0$ and
$0 \le r < m$. Thus, $a_n \le \ell a_m + a_r$.
12.31	Consider the probability space $([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)})$. Show that among
all measurable partitions of $([0,1), \mathcal{M}_{[0,1)})$ having $n$ members, entropy is
maximized by the measurable partition $\{[(j-1)/n, j/n)\}_{j=1}^{n}$.
12.32	Let $(\Omega, \mathcal{A}, \mu)$ be a probability space. Show that if $\mathfrak{P}$ is a measurable
partition having $n$ elements, then $H(\mathfrak{P}) \le \log n$.
12.33	Let $\varphi$ be the identity function. Calculate the entropy of $(\Omega, \mathcal{A}, \mu, \varphi)$.
12.34	Prove that if $(\Omega, \mathcal{A}, \mu, \varphi)$ is isomorphic to $(\Lambda, \mathcal{S}, \nu, \psi)$, then $h(\varphi) = h(\psi)$.
12.35	Show that if $f : [0,1] \to [0,\infty]$ is nonincreasing and satisfies (12.36), then
it must be of the form $f(t) = -a\log t$.
The remaining exercises of this section consider an alternative approach to the
concept of the information in an event. As previously in this chapter, $(\Omega, \mathcal{A}, \mu)$
is a probability space.
12.36	Let $J$ be an information function on $\mathcal{A}$ of the form $J(A) = g(\mu(A))$, where
$g$ is a function defined on [0,1]. Suppose that there is also a conditional
information function defined by
\[
J(A \mid B) =
\begin{cases}
\mu(B)\, g(\mu(A \mid B)), & \text{if } \mu(B) > 0; \\
0, & \text{if } \mu(B) = 0.
\end{cases}
\]
Define the joint information function of $A$ and $B$ to be the sum of the
information in $B$ and the information in $A$ given that $B$ does not occur,
that is, $J(A, B) = J(B) + J(A \mid B^c)$. Suppose that the following conditions
are satisfied:
•	$\{\mu(B) : B \subset A\} = [0, \mu(A)]$.
•	$J(\Omega) = 0$.
•	$J(A, B) = J(B, A)$.
Show that $g$ satisfies the functional equation
\[
g(x) + (1 - x)\, g\!\left(\frac{y}{1 - x}\right) = g(y) + (1 - y)\, g\!\left(\frac{x}{1 - y}\right)
\]
for $x, y \in [0,1]$ and $x + y < 1$.
12.37	Let $g$ be as in Exercise 12.36.
a)	Show that $g(x) = g(1 - x)$.
b)	Deduce that $J(A) = J(A^c)$, that is, the information in $A$ is the same
as that in $A^c$. Observe that the information function $I$ discussed at the
end of this section fails to have this property.
12.38	Let g be as in Exercise 12.36.
a)	Assuming that g is twice continuously differentiable, show that it must
have the form
\[ g(x) = c\,\bigl(x\log x + (1 - x)\log(1 - x)\bigr) \]
for some constant $c$. Hint: Differentiate both sides of the equation in
Exercise 12.36, first with respect to $x$ and then with respect to $y$, and
then use the substitutions $u = y/(1 - x)$ and $v = x/(1 - y)$.
b)	Deduce that, in the case of two element partitions, using J (A) as a
measure of the information content of an event A leads to the same
definition of entropy as given in Definition 12.4.
12.4 THE KOLMOGOROV-SINAI THEOREM;
CALCULATION OF ENTROPY
Our goal in this section is to prove a theorem, due to Kolmogorov and
Sinai, that will enable us to calculate the entropy of the Bernoulli scheme
$B(p_1, p_2, \ldots, p_N)$.
We will need the following natural extension of the notion of the
entropy of a measurable partition. Suppose that $\mathfrak{P}$ and $\mathfrak{Q}$ are measurable
partitions of $(\Omega, \mathcal{A}, \mu)$. The conditional entropy of $\mathfrak{P}$ relative to $\mathfrak{Q}$ is
defined by
\[
H(\mathfrak{P} \mid \mathfrak{Q}) = -\sum_{B \in \mathfrak{Q}} \sum_{A \in \mathfrak{P}} \mu(B)\,\mu(A \mid B)\log\mu(A \mid B),
\]
where we assign the value 0 to a summand in which $\mu(B) = 0$.
PROPOSITION 12.3
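Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system and let $\mathfrak{P}$, $\mathfrak{Q}$, and $\mathfrak{R}$ be
measurable partitions of $(\Omega, \mathcal{A})$. Then the following hold:
a)	$H(\mathfrak{P} \mid \mathfrak{Q}) \le H(\mathfrak{P})$.
b)	$H(\mathfrak{P} \vee \mathfrak{Q}) = H(\mathfrak{Q}) + H(\mathfrak{P} \mid \mathfrak{Q})$.
c)	$H(\mathfrak{P} \vee \mathfrak{Q} \mid \mathfrak{R}) \le H(\mathfrak{P} \mid \mathfrak{R}) + H(\mathfrak{Q} \mid \mathfrak{R})$.
d)	$\mathfrak{P} \ll \mathfrak{Q} \Rightarrow H(\mathfrak{P} \mid \mathfrak{R}) \le H(\mathfrak{Q} \mid \mathfrak{R})$.
e)	$\mathfrak{Q} \ll \mathfrak{R} \Rightarrow H(\mathfrak{P} \mid \mathfrak{R}) \le H(\mathfrak{P} \mid \mathfrak{Q})$.
f)	$H(\varphi^{-1}\mathfrak{P} \mid \varphi^{-1}\mathfrak{Q}) = H(\mathfrak{P} \mid \mathfrak{Q})$.
g)	$H(\mathfrak{Q}, \varphi) \le H(\mathfrak{Q} \mid \mathfrak{P}) + H(\mathfrak{P}, \varphi)$.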
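PROOF: The proofs of (a)-(f) are left for Exercises 12.39-12.40. To
obtain (g), we argue as follows. By Proposition 12.1 on page 718 and (b)
and (c), we have
\[
\begin{aligned}
H(\mathfrak{Q}^{(n)}) &\le H(\mathfrak{Q}^{(n)} \vee \mathfrak{P}^{(n)})
   = H(\mathfrak{P}^{(n)}) + H(\mathfrak{Q}^{(n)} \mid \mathfrak{P}^{(n)}) \\
  &\le H(\mathfrak{P}^{(n)}) + \sum_{j=0}^{n-1} H(\varphi^{-j}\mathfrak{Q} \mid \mathfrak{P}^{(n)}).
\end{aligned}
\]
Using (e) and (f) we conclude that
\[
H(\mathfrak{Q}^{(n)}) \le H(\mathfrak{P}^{(n)}) + \sum_{j=0}^{n-1} H(\varphi^{-j}\mathfrak{Q} \mid \varphi^{-j}\mathfrak{P})
  \le H(\mathfrak{P}^{(n)}) + n H(\mathfrak{Q} \mid \mathfrak{P}).
\]
Recalling from page 720 that
\[
H(\mathfrak{Q}, \varphi) = \lim_{n \to \infty} \frac{H(\mathfrak{Q}^{(n)})}{n},   \qquad (12.37)
\]
we see that (g) holds. ■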
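The definition of conditional entropy and parts (a) and (b) of Proposition 12.3 can be tried out on a finite probability space. The Python sketch below is an illustration only: the six-point space `mu` and the partitions `P` and `Q` are hypothetical choices made here, and the helper names are not from the text.

```python
import math
from fractions import Fraction

def measure(cell, mu):
    """Measure of a cell, as an exact fraction."""
    return sum((mu[x] for x in cell), Fraction(0))

def entropy(partition, mu):
    """Entropy of a partition; cells of measure 0 contribute nothing."""
    total = 0.0
    for cell in partition:
        m = float(measure(cell, mu))
        if m > 0:
            total -= m * math.log(m)
    return total

def conditional_entropy(p, q, mu):
    """Conditional entropy of p relative to q, summing
    -mu(B) mu(A|B) log mu(A|B) over B in q and A in p."""
    total = 0.0
    for b in q:
        mb = measure(b, mu)
        if mb == 0:
            continue  # summands with mu(B) = 0 are assigned the value 0
        for a in p:
            cond = float(measure(a & b, mu) / mb)
            if cond > 0:
                total -= float(mb) * cond * math.log(cond)
    return total

def join(p, q):
    """Common refinement: all nonempty pairwise intersections."""
    return [a & b for a in p for b in q if a & b]

# Hypothetical probability space and partitions.
mu = {0: Fraction(1, 12), 1: Fraction(1, 6), 2: Fraction(1, 4),
      3: Fraction(1, 4), 4: Fraction(1, 6), 5: Fraction(1, 12)}
P = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
Q = [frozenset({0, 2, 4}), frozenset({1, 3, 5})]

h_p_given_q = conditional_entropy(P, Q, mu)
print("(a):", h_p_given_q <= entropy(P, mu))
print("(b):", math.isclose(entropy(join(P, Q), mu), entropy(Q, mu) + h_p_given_q))
```

Conditioning a partition on itself gives `conditional_entropy(P, P, mu) == 0`, in line with the remark that knowing the cell of `P` leaves no further information about `P`.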
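Next, we need a lemma about approximating $\sigma$-algebras by algebras
of sets. In stating the lemma, we recall the notation for the symmetric
difference of two sets: $E \,\Delta\, F = (E \setminus F) \cup (F \setminus E)$.
LEMMA 12.1
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space, $\mathcal{F} \subset \mathcal{A}$ an algebra of sets, and $\mathcal{E}$ the
smallest $\sigma$-algebra containing $\mathcal{F}$. Then, given $E \in \mathcal{E}$ and $\epsilon > 0$, there exists
an $F \in \mathcal{F}$ such that $\mu(E \,\Delta\, F) < \epsilon$.
PROOF: Let $\mathcal{G}$ denote the collection of all $G \in \mathcal{A}$ having the property that
there is a sequence $\{F_n\}_{n=1}^{\infty} \subset \mathcal{F}$ such that $\lim_{n\to\infty} \mu(G \,\Delta\, F_n) = 0$. As
the reader is asked to prove in Exercise 12.41, $\mathcal{G}$ is an algebra of sets.
The lemma will be established if we can show that $\mathcal{G}$ is actually a
$\sigma$-algebra. Let $\{G_n\}_{n=1}^{\infty}$ be a sequence of sets in $\mathcal{G}$. We must prove that
$\bigcup_{n=1}^{\infty} G_n \in \mathcal{G}$. First we disjointize the $G_n$s. Let $E_1 = G_1$ and, for $n \ge 2$,
let $E_n = G_n \setminus \bigcup_{k=1}^{n-1} G_k$. Because $\mathcal{G}$ is an algebra, $\{E_n\}_{n=1}^{\infty} \subset \mathcal{G}$; moreover,
we have $\bigcup_{n=1}^{\infty} E_n = \bigcup_{n=1}^{\infty} G_n$. Let $E = \bigcup_{n=1}^{\infty} E_n$.
Because $\mathcal{G}$ is an algebra, $\bigcup_{j=1}^{n} E_j \in \mathcal{G}$. It follows that for each $n \in \mathcal{N}$,
there is an $F_n \in \mathcal{F}$ such that $\mu\bigl((\bigcup_{j=1}^{n} E_j) \,\Delta\, F_n\bigr) < 1/n$. Now, we have
\[
E \,\Delta\, F_n \subset \Bigl(\bigcup_{j=n+1}^{\infty} E_j\Bigr) \cup \Bigl(\Bigl(\bigcup_{j=1}^{n} E_j\Bigr) \,\Delta\, F_n\Bigr).
\]
Hence,
\[
\mu(E \,\Delta\, F_n) \le \mu\Bigl(\bigcup_{j=n+1}^{\infty} E_j\Bigr) + \frac{1}{n}
  = \sum_{j=n+1}^{\infty} \mu(E_j) + \frac{1}{n}.
\]
Since $\sum_{n=1}^{\infty} \mu(E_n) \le 1$, we conclude that $\lim_{n\to\infty} \mu(E \,\Delta\, F_n) = 0$.
Consequently, $E \in \mathcal{G}$. ■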
For $A, B \in \mathcal{A}$, the expression $\mu(A \mid B)\log\mu(A \mid B)$ will be close to
zero if $\mu(A \mid B)$ is either close to zero or close to one. In other words,
$\mu(A \mid B)\log\mu(A \mid B)$ will be close to zero if $A$ and $B$ are either nearly
disjoint or nearly equal.
This observation makes it reasonable to consider the conditional
entropy, $H(\mathfrak{P} \mid \mathfrak{Q})$, a measure of closeness of the measurable partitions $\mathfrak{P}$
and $\mathfrak{Q}$. From this viewpoint, our next lemma concerns approximating one
measurable partition by another.
LEMMA 12.2
Let $\mathcal{F} \subset \mathcal{A}$ be an algebra of sets, $\mathcal{E}$ the smallest $\sigma$-algebra containing $\mathcal{F}$, and
$\mathfrak{P} \subset \mathcal{E}$ a measurable partition. Then for each $\epsilon > 0$, there is a measurable
partition $\mathfrak{Q} \subset \mathcal{F}$ such that $H(\mathfrak{P} \mid \mathfrak{Q}) < \epsilon$.
PROOF: We sketch the proof, using imprecise terms such as “small” and
“close,” leaving the details for Exercise 12.42. Let $\mathfrak{P} = \{A_1, A_2, \ldots, A_n\}$.
The main idea of the proof is to use Lemma 12.1 to approximate each $A_j$
by a $C_j \in \mathcal{F}$.
Let $\delta$ be a small positive number. By Lemma 12.1, we can find, for
each $j$, a set $C_j \in \mathcal{F}$ such that $\mu(A_j \,\Delta\, C_j) < \delta$. We will use the $C_j$s
to construct a measurable partition of $\Omega$. First, we disjointize the $C_j$s
by defining $B_j = C_j \setminus \bigcup_{k=1}^{j-1} C_k$. Then we obtain a measurable partition
$\mathfrak{Q} = \{B_1, B_2, \ldots, B_n, B_{n+1}\}$ by letting $B_{n+1} = \Omega \setminus \bigcup_{j=1}^{n} B_j$. Because $\mathcal{F}$ is
an algebra, it follows that $B_j \in \mathcal{F}$ for all $j$.
Now we consider the conditional entropy
\[
H(\mathfrak{P} \mid \mathfrak{Q}) = -\sum_{j=1}^{n} \sum_{k=1}^{n+1} \mu(B_k)\,\mu(A_j \mid B_k)\log\mu(A_j \mid B_k).
\]
On the right-hand side of the previous equation, the sum of the terms for
which $k = n + 1$ is dominated by $n\mu(B_{n+1})\log 2/2$. This latter expression
can be made small by choosing $\delta$ appropriately, because $|\mu(A_j) - \mu(B_j)|$ is
small for $1 \le j \le n$ and $\sum_{j=1}^{n} \mu(A_j) = 1$.
We use
\[
-\mu(B_k)\,\mu(A_j \mid B_k)\log\mu(A_j \mid B_k) \le -\mu(A_j \mid B_k)\log\mu(A_j \mid B_k)
\]
and the observation that $\mu(A_j \mid B_k)$ is close to 0, when $j \ne k$, and close
to 1, when $j = k$, to assert that the sum of the remaining terms of $H(\mathfrak{P} \mid \mathfrak{Q})$
is small when $\delta$ is sufficiently small. ■
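In the remainder of this section, we assume that $(\Omega, \mathcal{A}, \mu, \varphi)$ is a
measurable dynamical system. We also continue to use the convention that
$\varphi^{(0)}$ denotes the identity on $\Omega$.
LEMMA 12.3
Let $\mathfrak{P}$ be a measurable partition. Then $H(\mathfrak{P}^{(k)}, \varphi) = H(\mathfrak{P}, \varphi)$ for all $k \ge 1$.
PROOF: It is easy to check that $(\mathfrak{P}^{(k)})^{(n)} = \mathfrak{P}^{(k+n-1)}$. Hence,
\[
\begin{aligned}
H(\mathfrak{P}^{(k)}, \varphi) &= \lim_{n\to\infty} \frac{H((\mathfrak{P}^{(k)})^{(n)})}{n}
   = \lim_{n\to\infty} \frac{H(\mathfrak{P}^{(k+n-1)})}{n} \\
  &= \lim_{m\to\infty} \frac{m}{m - k + 1}\cdot\frac{H(\mathfrak{P}^{(m)})}{m} = H(\mathfrak{P}, \varphi),
\end{aligned}
\]
as required. ■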
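If $\varphi$ is a 1-1 correspondence and $(\Omega, \mathcal{A}, \mu, \varphi^{-1})$ is a measurable
dynamical system, then we say that $\varphi$ is invertible. In such cases, the notation
\[
\mathfrak{P}^{(m,n)} = \varphi^{-m}\mathfrak{P} \vee \varphi^{-(m+1)}\mathfrak{P} \vee \cdots \vee \varphi^{-n}\mathfrak{P}
\]
is meaningful for each pair of integers $n$, $m$ with $m \le n$.
LEMMA 12.4
If $\varphi$ is invertible and $\mathfrak{P}$ is a measurable partition, then
\[ H(\mathfrak{P}^{(m,n)}, \varphi) = H(\mathfrak{P}, \varphi) \]
for each pair of integers $n$, $m$ with $m \le n$.
PROOF: It is easy to see that $\mathfrak{P}^{(m,n)} = (\varphi^{-m}\mathfrak{P})^{(n-m+1)}$. Hence, by
Lemma 12.3, we have $H(\mathfrak{P}^{(m,n)}, \varphi) = H(\varphi^{-m}\mathfrak{P}, \varphi)$. Since $\mu$ is invariant
with respect to both $\varphi$ and $\varphi^{-1}$, it follows that $H(\varphi^{-m}\mathfrak{P}, \varphi) = H(\mathfrak{P}, \varphi)$. ■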
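Next we discuss the relationship between measurable partitions and
algebras of sets. Specifically, if $\varphi$ is invertible and $\mathfrak{P}$ is a measurable
partition, then for each $n \in \mathcal{N}$, the collection
\[
\mathcal{A}_n(\mathfrak{P}) = \{\, B \in \mathcal{A} : B \text{ is a union of members of } \mathfrak{P}^{(-n,n)} \,\}
\]
is an algebra of subsets of $\Omega$. Because $\mathcal{A}_n(\mathfrak{P}) \subset \mathcal{A}_{n+1}(\mathfrak{P})$, the collection
$\mathcal{A}_\infty(\mathfrak{P}) = \bigcup_{n=1}^{\infty} \mathcal{A}_n(\mathfrak{P})$ is also an algebra of subsets of $\Omega$. (See
Exercise 12.43.)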
We are now ready to state and prove the main result of this section,
which is known as the Kolmogorov-Sinai theorem. In doing so, we
recall that the entropy of the measurable dynamical system $(\Omega, \mathcal{A}, \mu, \varphi)$ is
defined by
\[
h(\varphi) = \sup\{\, H(\mathfrak{P}, \varphi) : \mathfrak{P} \text{ a partition of } (\Omega, \mathcal{A}) \,\}.
\]
THEOREM 12.3 Kolmogorov-Sinai Theorem
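Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system and assume that $\varphi$ is
invertible. Suppose that $\mathfrak{P}$ is a measurable partition of $(\Omega, \mathcal{A})$ such that
$\mathcal{A}$ is the smallest $\sigma$-algebra containing $\mathcal{A}_\infty(\mathfrak{P})$. Then $h(\varphi) = H(\mathfrak{P}, \varphi)$.
PROOF: By the definition of $h(\varphi)$, it suffices to prove that
\[ H(\mathfrak{Q}, \varphi) \le H(\mathfrak{P}, \varphi)   \qquad (12.38) \]
for each measurable partition $\mathfrak{Q}$. It follows from Proposition 12.3(g) on
page 724 that
\[
H(\mathfrak{Q}, \varphi) \le H(\mathfrak{Q} \mid \mathfrak{P}^{(-n,n)}) + H(\mathfrak{P}^{(-n,n)}, \varphi)
\]
for all $n \in \mathcal{N}$. Hence, by Lemma 12.4, we have
\[
H(\mathfrak{Q}, \varphi) \le H(\mathfrak{Q} \mid \mathfrak{P}^{(-n,n)}) + H(\mathfrak{P}, \varphi).   \qquad (12.39)
\]
Given $\epsilon > 0$, we can apply Lemma 12.2 to find a measurable partition $\mathfrak{R}$
such that $\mathfrak{R} \subset \mathcal{A}_\infty(\mathfrak{P})$ and $H(\mathfrak{Q} \mid \mathfrak{R}) < \epsilon$. Since $\mathfrak{R}$ is a finite collection, it
follows that $\mathfrak{R} \subset \mathcal{A}_n(\mathfrak{P})$ for some $n$. In particular, we have $\mathfrak{R} \ll \mathfrak{P}^{(-n,n)}$.
Applying Proposition 12.3(e), we get $H(\mathfrak{Q} \mid \mathfrak{P}^{(-n,n)}) \le H(\mathfrak{Q} \mid \mathfrak{R}) < \epsilon$.
Hence, by (12.39), we have $H(\mathfrak{Q}, \varphi) < \epsilon + H(\mathfrak{P}, \varphi)$. Since $\epsilon$ is an arbitrary
positive number, the assertion (12.38) follows and the proof is complete. ■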
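There is a version of the Kolmogorov-Sinai theorem that is valid when
$\varphi$ is not necessarily invertible. For a measurable partition $\mathfrak{P}$, let
\[
\tilde{\mathcal{A}}_n(\mathfrak{P}) = \{\, B \in \mathcal{A} : B \text{ is a union of members of } \mathfrak{P}^{(n)} \,\}
\]
and let $\tilde{\mathcal{A}}_\infty(\mathfrak{P}) = \bigcup_{n=1}^{\infty} \tilde{\mathcal{A}}_n(\mathfrak{P})$. If, in the proof of the Kolmogorov-Sinai
theorem, we replace $\mathcal{A}_n(\mathfrak{P})$ and $\mathcal{A}_\infty(\mathfrak{P})$ by $\tilde{\mathcal{A}}_n(\mathfrak{P})$ and $\tilde{\mathcal{A}}_\infty(\mathfrak{P})$,
respectively, we obtain a proof of the following theorem.
THEOREM 12.4
Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system. Suppose that $\mathfrak{P}$ is
a measurable partition of $(\Omega, \mathcal{A})$ such that $\mathcal{A}$ is the smallest $\sigma$-algebra
containing $\tilde{\mathcal{A}}_\infty(\mathfrak{P})$. Then $h(\varphi) = H(\mathfrak{P}, \varphi)$.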
EXAMPLE 12.10 Entropy of a Bernoulli Scheme
In this example, we apply the Kolmogorov-Sinai theorem to obtain the
entropy of the Bernoulli scheme $B(p_1, p_2, \ldots, p_N)$, first introduced in
Example 12.4 on page 699. Consider the measurable partition of $(\Omega, \mathcal{A})$ given
by $\mathfrak{P} = \{\, C_{\{0\},k} : k = 1, 2, \ldots, N \,\}$. The entropy of $\mathfrak{P}$ is
\[
H(\mathfrak{P}) = -\sum_{k=1}^{N} \mu(C_{\{0\},k})\log\mu(C_{\{0\},k}) = -\sum_{k=1}^{N} p_k\log p_k.
\]
We will now show that $\mathfrak{P}$ satisfies the hypothesis of the Kolmogorov-Sinai
theorem.
It is easy to see that $\varphi$ is invertible. We have $\varphi^{-1}(C_{\{0\},k}) = C_{\{1\},k}$;
more generally, $\varphi^{-\ell}(C_{\{0\},k}) = C_{\{\ell\},k}$ for every integer $\ell$. Therefore, a
typical element of $\mathfrak{P}^{(-m,m)}$ is of the form
\[
\bigcap_{\ell=-m}^{m} C_{\{\ell\},k_\ell} = C_{\{-m,-m+1,\ldots,m\},b},
\]
where $b(\ell) = k_\ell$ for $-m \le \ell \le m$.
We recall that, in this example, $\mathcal{A}$ is the $\sigma$-algebra generated by sets
of the form $C_{F,a}$, where $F$ is a finite set of integers and $a$ is a function
from $F$ into $\{1, 2, \ldots, N\}$. By choosing $m$ large enough, we can assume
that $F \subset \{-m, \ldots, m\}$. Hence, we can write $C_{F,a} = \bigcup_b C_{\{-m,\ldots,m\},b}$,
where the union is over all functions $b\colon \{-m, \ldots, m\} \to \{1, \ldots, N\}$ such
that $b(\ell) = a(\ell)$ for all $\ell \in F$.
It follows that $C_{F,a}$ belongs to $\mathcal{A}_m(\mathfrak{P})$ and this in turn implies that the
algebra $\mathcal{A}_\infty(\mathfrak{P})$ contains all sets of the form $C_{F,a}$. Thus, $\mathcal{A}$ is the smallest
$\sigma$-algebra containing $\mathcal{A}_\infty(\mathfrak{P})$.
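Next, we calculate $H(\mathfrak{P}, \varphi)$. The entropy of $\mathfrak{P}^{(m)}$ is
\[
\begin{aligned}
H(\mathfrak{P}^{(m)})
  &= -\sum_{k_0=1}^{N} \sum_{k_1=1}^{N} \cdots \sum_{k_{m-1}=1}^{N}
     \prod_{\ell=0}^{m-1} \mu(C_{\{\ell\},k_\ell}) \log \prod_{\ell=0}^{m-1} \mu(C_{\{\ell\},k_\ell}) \\
  &= -\sum_{k_0=1}^{N} \sum_{k_1=1}^{N} \cdots \sum_{k_{m-1}=1}^{N}
     \prod_{\ell=0}^{m-1} p_{k_\ell} \log \prod_{\ell=0}^{m-1} p_{k_\ell}.
\end{aligned}
\]
As the reader is asked to verify in Exercise 12.44, using $\sum_{k=1}^{N} p_k = 1$, it
can be shown that
\[
\sum_{k_0=1}^{N} \sum_{k_1=1}^{N} \cdots \sum_{k_{m-1}=1}^{N}
  \prod_{\ell=0}^{m-1} p_{k_\ell} \log \prod_{\ell=0}^{m-1} p_{k_\ell}
  = m \sum_{k=1}^{N} p_k\log p_k.
\]
Applying the Kolmogorov-Sinai theorem, we conclude that
\[
h(\varphi) = H(\mathfrak{P}, \varphi) = \lim_{m\to\infty} \frac{H(\mathfrak{P}^{(m)})}{m} = -\sum_{k=1}^{N} p_k\log p_k.
\]
Thus we see that the entropy of the Bernoulli scheme $B(p_1, p_2, \ldots, p_N)$
equals $-\sum_{k=1}^{N} p_k\log p_k$. □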
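The identity used in this example can be confirmed by brute force for small m. The following Python sketch is only a numerical illustration: the probability vector `p` is a hypothetical choice, and the function names are introduced here. It enumerates all words of length m, uses the product of the corresponding probabilities as the measure of each cell of the m-step partition, and compares the resulting entropy with m times the single-step entropy.

```python
import itertools
import math

def bernoulli_entropy(p):
    """Single-step entropy: minus the sum of p_k log p_k."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def block_entropy(p, m):
    """Entropy of the m-step partition, computed by summing over all words
    (k_0, ..., k_{m-1}); each cell has measure equal to the product of its p_k."""
    total = 0.0
    for word in itertools.product(p, repeat=m):
        w = math.prod(word)
        if w > 0:
            total -= w * math.log(w)
    return total

p = [0.5, 0.3, 0.2]  # hypothetical Bernoulli scheme B(0.5, 0.3, 0.2)

for m in range(1, 6):
    print(m, block_entropy(p, m) / m, bernoulli_entropy(p))
```

For each m the two printed values coincide up to rounding error, so the ratio in the limit defining the entropy is constant in m for this partition.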
Using Example 12.10 and the fact that entropy is an invariant of a
measurable dynamical system, we obtain the following: If two Bernoulli
schemes $B(p_1, p_2, \ldots, p_N)$ and $B(q_1, q_2, \ldots, q_M)$ are isomorphic, then
\[
\sum_{k=1}^{N} p_k\log p_k = \sum_{\ell=1}^{M} q_\ell\log q_\ell.   \qquad (12.40)
\]
Thus, for example, we see that $B(1/2, 1/2)$ and $B(1/3, 1/3, 1/3)$ are not
isomorphic because $\log 2 \ne \log 3$.
Actually, a stronger result exists regarding Bernoulli schemes, namely,
that $B(p_1, p_2, \ldots, p_N)$ and $B(q_1, q_2, \ldots, q_M)$ are isomorphic if and only if
(12.40) holds.†
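Condition (12.40) is easy to test numerically. The short Python sketch below is an illustration under assumptions made here: the function name `bernoulli_entropy` and the second pair of schemes are choices for this example and do not come from the text. It first reproduces the comparison of B(1/2, 1/2) with B(1/3, 1/3, 1/3) and then exhibits two schemes with different numbers of symbols whose entropies nevertheless agree.

```python
import math

def bernoulli_entropy(probs):
    """Entropy of a Bernoulli scheme: minus the sum of p_k log p_k."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# The pair from the text: entropies log 2 and log 3 differ,
# so the two schemes cannot be isomorphic.
h_two = bernoulli_entropy([1/2, 1/2])
h_three = bernoulli_entropy([1/3, 1/3, 1/3])
print("same entropy:", math.isclose(h_two, h_three))

# A hypothetical pair with 4 and 5 symbols; both entropies equal 2 log 2,
# so (12.40) holds for this pair.
h_a = bernoulli_entropy([1/4, 1/4, 1/4, 1/4])
h_b = bernoulli_entropy([1/2, 1/8, 1/8, 1/8, 1/8])
print("same entropy:", math.isclose(h_a, h_b))
```

For the second pair, equality of entropies alone does not follow from anything computed here to imply isomorphism; that conclusion rests on the stronger result quoted above.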
EXERCISES 12.4
12.39	Prove (a), (b), and (c) of Proposition 12.3 on page 724.
12.40	Prove (d), (e), and (f) of Proposition 12.3 on page 724.
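12.41	Suppose $(\Omega, \mathcal{A}, \mu)$ is a probability space and $\mathcal{F} \subset \mathcal{A}$ is an algebra of sets.
Let $\mathcal{G}$ denote the collection of all $E \in \mathcal{A}$ having the property that there is a
sequence $\{F_n\}_{n=1}^{\infty} \subset \mathcal{F}$ such that $\lim_{n\to\infty} \mu(E \,\Delta\, F_n) = 0$. Prove that $\mathcal{G}$ is
an algebra of subsets of $\Omega$.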
12.42	Provide the details for the proof of Lemma 12.2 on page 726.
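12.43	Prove that if $\{\mathcal{A}_n\}_{n=1}^{\infty}$ is a sequence of algebras of subsets of some set $\Omega$
such that $\mathcal{A}_n \subset \mathcal{A}_{n+1}$, then $\bigcup_{n=1}^{\infty} \mathcal{A}_n$ is also an algebra of subsets of $\Omega$.
12.44	Using $\sum_{k=1}^{N} p_k = 1$, show that
\[
\sum_{k_0=1}^{N} \sum_{k_1=1}^{N} \cdots \sum_{k_{m-1}=1}^{N}
  \prod_{\ell=0}^{m-1} p_{k_\ell} \log \prod_{\ell=0}^{m-1} p_{k_\ell}
  = m \sum_{k=1}^{N} p_k\log p_k.
\]
12.45	Show that if $(\Omega, \mathcal{A}, \mu, \varphi)$ has entropy $h(\varphi)$, then $h(\varphi^{(k)}) = k\,h(\varphi)$ when
$k \in \mathcal{N}$ and, if $\varphi$ is invertible, $h(\varphi^{(k)}) = |k|\,h(\varphi)$ for all $k \in \mathcal{Z}$.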
12.46	Refer to Example 12.1 on page 699. Show that $h(\varphi_b) = 0$ if $b$ is rational.
Hint: See Exercise 12.45.
† For a relatively short proof of this result, see M. Keane and M. Smorodinsky,
“Bernoulli Schemes of the Same Entropy are Finitarily Isomorphic” (Annals of
Math., 109, pp. 397-406, 1979).
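12.47	Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system and $\mathfrak{P}$ a measurable
partition of $(\Omega, \mathcal{A})$. Show that $H(\mathfrak{P}, \varphi) = \lim_{n\to\infty} H(\mathfrak{P} \mid (\varphi^{-1}\mathfrak{P})^{(n)})$.
Hint: Use Proposition 12.3 on page 724 to show that
\[
H(\mathfrak{P}^{(k+1)}) = H(\mathfrak{P} \mid (\varphi^{-1}\mathfrak{P})^{(k)}) + H(\mathfrak{P}^{(k)}).
\]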
12.48	Consider the measurable dynamical system in Example 12.1 on page 699
and assume $b$ is irrational. Let $\mathfrak{P} = \{[0, b), [b, 1)\}$,
An = {B 6 A : В is a union of members of	} , n 6 Af,
and Лоо = UXi Show that the smallest a-algebra containing Лоо(ф)
is the a-algebra of Borel subsets of [0,1). Hint: See Exercise 11.26 on
page 653.
12.49	Refer to Example 12.1 on page 699. Show that $h(\varphi_b) = 0$ if $b$ is irrational.
Hint: Use Exercises 12.47 and 12.48.

Index
Absolute continuity
equivalent conditions for, 350
Absolute value, 42
of a function, 421
Absolutely continuous, 343, 364
see also Absolutely continuous
random variable
Absolutely continuous function
on a finite closed interval, 343
on the real line, 343
relation to absolutely continuous
measures, 371
Absolutely continuous measure, 364
for complex measures, 382
relation to absolutely continuous
functions, 371
Absolutely continuous random
variable, 276, 371
probability density function of, 276
Absolutely convergent series, 531
Accumulation point, 63
in metric spaces, 470
Alaoglu’s theorem, 615
Algebra, 489
of functions, 66, 518
generated by a collection of sets, 28
Algebra of functions
continuous real-valued functions on a
subset of $\mathcal{R}$, 66
Algebra of sets, 26
Almost all, 171
Almost always, 171
Almost certainly, 171
Almost everywhere, 171
see also Lebesgue almost everywhere
Almost surely, 171
Almost-uniform convergence, 206
$\mathcal{A}$-measurable function
Lebesgue integral for nonnegative
extended real-valued
functions, 186
$\mathcal{A}$-measurable set, 168
Arc, 456
connecting two points, 456
Archimedean principle, 38
Arcwise connected component of a
point, 459
Arcwise connected space, 456
Ascoli-Arzela Theorem, 507
Associative laws, 36
Asymptotically uncorrelated, 312
Atom, 174, 392
Axiom of choice, 16
Baire category theorem, 494
alternative version, 495
Banach limits, 583
Banach space, 531
Banach, Stefan
biography, 578
Band-limited measure, 672
Basic period, 636
Basic wavelet, 679
Basis, 498, 545
Bayes, Thomas, 273
Bayes’ rule, 273
Bernoulli, James, 287
Bernoulli scheme, 699, 701
Bernoulli shift, 700
Bernoulli trials, 287
Bessel’s inequality, 547
Binomial distribution, 287
Bolzano-Weierstrass theorem, 63
Boole’s inequality, 267
Bootstrapping, 188
Borel-Cantelli lemma, 270
Borel, Emile
biography, 92
Borel measurable function, 563
Borel measurable functions on $\mathcal{R}$, 94
equivalent condition for, 99
Borel measurable functions on a subset
of $\mathcal{R}$, 101
equivalent condition for, 101
Borel measure, 220, 563
decomposition of, 395
finite, 220
n-dimensional, 259
regular, 564
two-dimensional, 244
Borel sets
n-dimensional, 254, 278
relation to Lebesgue measurable
sets, 125
of a topological space, 563
two-dimensional, 244
Borel sets of $\mathcal{R}$, 95
equivalent condition for, 100
as related to the Borel sets of a
subset of $\mathcal{R}$, 102
Borel sets of a subset of $\mathcal{R}$, 101
equivalent condition for, 102
as related to the Borel sets of $\mathcal{R}$, 102
Boundary
of a set, 438
Bounded
subset of a normed space, 613
weakly, 613
Bounded above
subset of real numbers, 37
Bounded below
subset of real numbers, 38
Bounded intervals, 4, 41
Bounded linear mapping, 529
Bounded set
in a normed space, 468
Bounded variation, 331
Canonical representation
of a simple function, 130, 184
Cantor function, 78
Cantor, Georg
biography, 2
Cantor set, 75
Cantor ternary set
see Cantor set
Caratheodory criterion, 119
Cartesian product
of a collection of sets, 18
of a finite number sets, 17
Cauchy criterion, 52
Cauchy sequence, 52
in a metric space, 435
pointwise, 73
uniform, 73
Cauchy’s inequality, 534
Central limit theorem, 669
Chain, 17
Characteristic function, 81
of a random variable, 666
Chebychev’s inequality, 295
Chi-square distribution, 285
Closed
under pointwise limits, 69
Closed ball, 437
Closed convex hull, 621
Closed graph theorem, 595
Closed interval, 61
Closed linear operator, 594
Closed set
in $\mathcal{R}$, 61
of a topological space, 432
Closure, 432
of a set of real numbers, 60
Cluster point
of a sequence of real numbers, 45
C(Q, A), 483
C0(Q), 489
Cb(Q), 489
Сс(П), 489
Commutative laws, 36
Compact function, 508
Compact metric space, 465
Compact set, 465, 471
Compact topological space, 471
Complement
relative, 6
of a set, 5
Complete measure space, 171
Complete metric space, 435
Complete set, 435
Complete subset
of a metric space, 435
Completely regular space, 513
Completeness axiom
for the real numbers, 37
Complex conjugate
of a function, 421
Complex measure, 379
absolute continuity of, 382
decomposition of, 380
Radon-Nikodym theorem for, 383
total variation of, 381
Composition
of two functions, 13
Conditional entropy, 724
Conditional expectation
existence of given a $\sigma$-algebra, 385
given a random variable, 388
given a $\sigma$-algebra, 384
relative to an event, 300, 384
Conditional probability, 267
existence of given a $\sigma$-algebra, 373
given a random variable, 375
given a $\sigma$-algebra, 372
Connected
by an arc, 455
set, 453
topological space, 453
Connected component, 457
Continuous
at a point, 444
uniformly, 469
weakly, 428
Continuous function
on a metric, normed, or topological
space, 425
real-valued function of a real
variable, 65
with respect to neighborhood
bases, 413
on a subset of a topological space, 417
on a topological space, 416
Continuous functions
bounded, 489
collection of from a topological space
to a metric space, 483
with compact support, 489
vanishing at infinity, 489
Continuous measure, 363, 392
Continuous random variable, 277
Continuous uniform model, 266
Contraction, 499
Contraction mapping principle, 500
Converge absolutely, 531
Convergence almost everywhere, 162
Convergence almost uniformly, 206
Convergence in distribution, 612, 666
Convergence in measure, 203
Convergence of nets, 439
Convergence in probability, 203
Convergent sequence
extended sense in $\mathcal{R}^*$, 44
of real numbers, 43
in a topological space, 433
Convex combination, 619
proper, 619
Convex function, 329
Convex hull, 620
Convex set, 541
extreme point of, 619
face of, 620
Convolution, 643, 656
of two Borel measurable
functions, 256
of two $\sigma$-finite Borel
measures, 256, 374
Convolutions of measures, 665
Coordinate, 18
Coordinate projection, 430
Correlation coefficient, 536
Countable
set, 21
Countable additivity, 105
Countable subadditivity, 106, 267
of Lebesgue outer measure, 106
of outer measure, 210
property of a measure, 170
Countably infinite
set, 21
Counting measure, 169
Covariance, 296, 299
bilinearity property of, 299
Covering, 462
Daubechies, Ingrid
biography, 634
Decomposition
of complex measures, 380
of finite Borel measures, 395
De Morgan’s Laws
for collections of sets, 8
for two sets, 5
DeMoivre-Laplace theorem, 671
Dense, 432
irrational numbers, 39
rational numbers, 39
Dependent events, 269
Derivative
of a complex-valued function, 328
of an indefinite integral, 339, 341
in the $\mathcal{L}^2$-sense, 676
of monotone functions, 325
Radon-Nikodym, 369
of a real-valued function, 316
Differentiability
Lebesgue theorem on for monotone
functions, 325
Dimension
of a linear space, 498
Dini-derivates, 317
Dini’s theorem, 72, 515
Dirac measure, 170
Directed set, 438
Dirichlet kernel, 640
Dirichlet’s theorem, 646
Disconnected
set, 453
topological space, 453
Discontinuity
point of, 65
Discontinuous
at a point, 65
Discrete distribution function, 396
Discrete measure, 170, 363, 392
Discrete random variable, 276, 374
probability mass function of, 276
Discrete topology, 416
Discrete uniform model, 266
Disjoint sets, 9
pairwise, 9
Distance
between a point and a set, 110
between two sets, 110
from a point to a set, 437
Distribution
convergence in, 612, 666
Distribution function, 226, 395
decomposition of, 395, 398
discrete, 396
of a finite Borel measure, 221
see also Probability distribution
function
Distributive law, 36
Distributive laws
for union and intersection, 6, $
Domain
of a function, 12
Dominated convergence theorem, 197
for convergence in measure, 205
for Lebesgue integration, 154
Dual space, 530
Egorov’s theorem, 139, 207
Empty set, 3
Entropy
of a measurable dynamical
system, 720
of a measurable partition, 717
Enumeration, 21
Equality of sets, 3
Equicontinuity, 505
Equicontinuous, 505
Equivalence
of sets, 21
Equivalence class, 25
Equivalence relation, 25
Equivalent, 21
metrics, 425
norms, 425
Ergodic, 712
Ergodic measure, 631
Ergodic theorem
in $\mathcal{L}^2$, 715
Ergodicity
of a measurable dynamical
system, 712
Essential-supremum norm, 423
Event
impossible, 263
occurrence of, 263
Event class, 263
Events
dependent, 269
independent, 269
mutually exclusive, 263
mutually independent, 269
pairwise independent, 269
pairwise mutually exclusive, 263
Exhaustion, 479
Expectation, 289
conditional, see Conditional
expectation
finite, 196, 289
law of total, 301
long-run-average interpretation
of, 288
in terms of probability
distributions, 290
Expectation of a random variable
see Mean of a random variable
Expected value
of a random variable, 289
Expected value of a random variable
see Mean of a random variable
Extended real numbers, 40
interval of, 40
Extended real-valued functions, 178
Extensions to measures, 207, 214
existence of, 214
uniqueness of, 216
Extreme point, 619
Face, 620
Fatou’s lemma
for the abstract Lebesgue
integral, 188
for convergence in measure, 206
for Lebesgue integration, 146, 148
Fejer’s kernel, 642
Field, 420
Field axioms
for the real numbers, 36
Finite
sequence, 14
set, 21
Finite additivity, 105
Finite Borel measure
distribution function of, 395
Finite expectation, 196
Finite intersection property, 471
Finite mean, 196
Finite measure, 173, 195
Finite measure space, 173, 195
Finite sequence, 14
of sets, 9
First category, 495
First countable space, 464
First fundamental theorem of calculus
for Lebesgue integration, 339
for Riemann integration, 335
First moment
of a random variable, 289
Fixed point, 498
Fourier coefficient, 548
Fourier coefficients, 637
of a periodic measure, 643
Fourier series, 548, 637
convergence in norm, 637
convergence at a point, 637
localization of, 650
Fourier series expansion, 550
Fourier-Stieltjes transform
of a finite Borel measure, 257
Fourier transform, 637, 653
definition of, 202
of an $\mathcal{L}^2$-function, 676
of a measure, 663
uniqueness property of, 660
Fourier transform of measures
uniqueness property of, 664
Frequency, 636
$F_\sigma$-set, 127
Fubini’s theorem, 247
Function, 12
absolute value of, 421
compact, 508
complex conjugate of, 421
domain of, 12
extended real-valued, 178
greatest integer, 43
inverse of, 13
monotone, 67
nondecreasing, 67
nonincreasing, 67
one-to-one, 12
onto, 12
peak point of, 481
range of, 12
real and imaginary parts of, 421
and set operations, 15
weakly continuous, 428
Functional
linear, 528
Functions
addition of, 421
algebra of, 518
composition of, 13
lattice of, 514
maximum of, 421
minimum of, 421
multiplication of, 421
Gaussian function, 655
$G_\delta$-set, 127
General change of variable formula, 404
Generalized sums, 57
in normed spaces, 548
Gibbs’ phenomenon, 652
Greatest integer
function, 43
in a real number, 43
Greatest lower bound, 38
Haar functions, 553, 680
Haar wavelet, 681
Hahn-Banach theorem, 580
complex version, 583
Hahn decomposition, 357
Hahn decomposition theorem, 357, 360
Half-open interval, 61
Hamel basis, 545
Hausdorff space, 448
Heine-Borel theorem, 72, 89, 465
for $\mathcal{R}^n$ or $\mathbb{C}^n$, 468
Helly’s selection principle, 618
Hermite polynomials, 677
Hilbert, David
biography, 526
Hilbert space, 537
Holder’s inequality, 556
Homeomorphic topological spaces, 417
Homeomorphism, 417
Hyperplane
separation by, 604
Ideal, 524
Identities
for the real number system, 36
Identity operator, 529
iid, 300, 308
Image
of a function, 12, 14
Impossible event, 263
Indefinite Lebesgue integral, 334
Independence, 280
mutual, 281
pairwise, 281
Independent events, 269
Index, 438
set, 439
Indexed collection of sets, 9
Infimum
of a subset of real numbers, 38
Infinite
set, 21
Infinite sequence, 9, 14
of sets, 9
Infinite series, 56
absolutely convergent, 531
converge absolutely, 531
in normed spaces, 440
Infinite sums
in normed spaces, 440
Inner product, 534
Inner product space, 534
Integration by parts
for Lebesgue integration, 352
for Lebesgue-Stieltjes integrals, 258
Integration by substitution
for Lebesgue integration, 353
for Riemann integration, 402
Interior, 436
Interior point, 436
Internal point, 603
Intersection
of a collection of sets, 7
of two sets, 5
Interval
closed, 61
in the extended real number
system, 40
half-open, 61
open, 58
Intervals, 4
bounded, 4, 41
unbounded, 5, 41
Invariant measure, 631, 698
Inverse
of a function, 13
Inverse image
of a function, 15
Inverses
for the real number system, 36
Inversion theorems, 660
Invertible
measure-preserving
transformation, 727
Irrational numbers
density of in $\mathcal{R}$, 39
Isometric, 553
Isometry, 553
Isomorphism
of a measurable dynamical
system, 716
Jacobi theta function identity, 662
Joint probability density function, 279
Joint probability distribution, 279
Joint probability distribution
function, 280
Joint probability mass function, 279
Jointly absolutely continuous random
variables, 279
joint probability density function
of, 279
Jointly discrete random variables, 279
joint probability mass function of, 279
Jordan decomposition theorem, 361, 362
Kakutani-Krein theorem, 516
$k$-Cauchy, 485
$k$-Complete, 485
Kolmogorov, A. N., 261
biography, 260
Kolmogorov extension theorem, 301
Kolmogorov-Sinai theorem, 728
Krein-Milman theorem, 622
measure-theoretic version, 624
Kronecker’s lemma, 303
ℒ^1(μ), 423
ℒ^1-norm, 423
ℒ^1(Ω), 424
L^2-ergodic theorem, 715
ℒ^2(μ), 423
ℒ^2-norm, 423
ℒ^2(Ω), 424
ℒ^∞(μ), 423
ℒ^∞-norm, 423
ℒ^∞(Ω), 424
ℒ^p-spaces, 554
ℓ^1(Ω), 424
ℓ^2(Ω), 424
ℓ^∞(Ω), 424
Laplace transform, 148
Lattice of functions, 514
Law of large numbers, 261
see Strong law of large numbers and
Weak law of large numbers
Law of total expectation, 301
Law of total probability, 273
Least upper bound, 37
Least upper-bound axiom, 37
Lebesgue almost everywhere, 162
Lebesgue decomposition, 390
Lebesgue decomposition theorem, 390
Lebesgue, Henri
biography, 166
Lebesgue integrable, 150, 193, 195
Lebesgue integral
of an arbitrary measurable
function, 150
of a complex-valued A-measurable
function, 194
of an extended real-valued
A-measurable function, 192
as an extension of the Riemann
integral, 157
of a function defined almost
everywhere, 164, 197
indefinite, 334
linearity property of, 153, 196
of a nonnegative extended real-valued
A-measurable function, 186
of a nonnegative measurable
function, 135
of a nonnegative simple
function, 131, 185
second fundamental theorem of
calculus for, 350
with respect to a signed measure, 379
Lebesgue integration
first fundamental theorem of calculus
for, 339
Lebesgue measurable function, 128, 129
Lebesgue integral of, 150
Lebesgue integral of for nonnegative
functions, 135
Lebesgue measurable set, 123
in R^n, 211
Lebesgue measurable sets
relation to Borel sets, 125
Lebesgue measure
definition of, 123
on a measurable subset of R, 169
in R^n, 214
Lebesgue number, 469
Lebesgue outer measure
basic properties of, 106
definition of, 106
finite additivity properties
of, 110, 116
in R^n, 210
Lebesgue singular function, 78
Lebesgue-Stieltjes integral, 227
Lebesgue-Stieltjes measure, 226
Legendre polynomials, 552
Lévy’s theorem, 667
Limit
with respect to neighborhood
bases, 413
of a sequence of real numbers, 43
Limit inferior
of a sequence of real numbers, 48
of a sequence of sets, 11
Limit point, 432
of a set of real numbers, 60
Limit superior
of a sequence of real numbers, 48
of a sequence of sets, 11
Lindelöf property, 462
Lindelöf’s theorem, 63
Linear combination, 528
Linear functional
nonnegative, 566
Linear mapping
bounded, 529
linear functional, 528
linear operator, 431, 528
Linear operator
closed, 594
uniform boundedness principle
for, 595
Linear space, 420
dimension of, 498
finite dimensional, 498
infinite dimensional, 498
Linear subspace, 421
Linearity property
of the Lebesgue integral, 153, 196
Liouville’s theorem, 704
Lipschitzian, 352
Locally compact space, 475
Locally convex topological linear
space, 600
Lower bound, 38
Lower Riemann integral, 82
Lower semicontinuous, 475
Lower and upper limits, 316
Lusin’s theorem, 140
Mapping
linear, 528
Marginal probability density
function, 279
Marginal probability mass function, 279
Markov, A. A., 299
Markov’s inequality, 299
Maximal element, 17
Maximum
of a finite set of real numbers, 42
of two functions, 96
of two real numbers, 42
Mean
finite, 196
of a population, 300
of a random variable, 289
Mean of a random variable, 189
Measurable dynamical system, 698
entropy of, 720
invariant of, 716
isomorphic, 716
Measurable function
complex-valued, 178
extended real-valued, 179
see also Lebesgue measurable
function
real-valued, 175
Measurable partition, 378
entropy of, 717
refinement of, 718
Measurable rectangle, 232
Measurable set, 104
in the context of outer measure, 211
see also Lebesgue measurable set
Measurable space, 168
Measurable transformation, 403
measure induced by, 403
Measure, 104, 168
atom of, 174
atoms of, 392
Borel, 220
complex, 379
continuous, 363, 392
counting, 169
Dirac, 170
discrete, 170, 363, 392
ergodic, 631
extension to, 207
finite, 173, 195
induced by a measurable
transformation, 403
invariant, 631, 698
Lebesgue decomposition of, 390
see also Lebesgue measure
Lebesgue-Stieltjes, 226
periodic, 643
probability, 169
product, 236
properties of, 170
representing, 624
signed, 356
Measure-preserving transformation
invertible, 727
Measures
mutually singular, 361
Measure space, 168
complete, 171
completion of, 172
finite, 173, 195
product, see Product measure space
σ-finite, 234
Measure zero, 85
Metric, 420
induced by a norm, 422
restricted, 424
Metric space, 420
compact, 465
complete, 435
Metrizable, 425
topological space, 425
Minimum
of a finite set of real numbers, 42
of two functions, 96
of two real numbers, 42
Minkowski’s inequality, 557
ℳ-measurable function
see Lebesgue measurable function
Modulus of a function, 194
Monotone, 30
sequence of functions, 70
sequence of real numbers, 44
sequence of sets, 30
Monotone class, 29
Monotone class theorem, 30
Monotone convergence theorem
for the abstract Lebesgue
integral, 188
for Lebesgue integration, 142, 147
Monotone function, 67
differentiability of, 325
Monotone nondecreasing
sequence of sets, 30
Monotone nonincreasing
sequence of sets, 30
Monotonicity
of Lebesgue outer measure, 106
of outer measure, 210
property of a measure, 170
Multinomial distribution, 287
Multinomial trials, 287
Multiresolution analysis, 681
Mutual independence, 281
Mutually exclusive events, 263
Mutually independent events, 269
Mutually singular measures, 361
Nearest point, 539
Negative part of a function, 149
Negative set, 357
Negative variation
of a signed measure, 377
Neighborhood, 412
Neighborhood basis
determining a topology, 415
inducing a topology, 415
at a point, 412
on a set, 412
for a topological space, 415
Net, 439
Nondecreasing
sequence of functions, 70
sequence of real numbers, 44
Nondecreasing function, 67
Nonincreasing
sequence of functions, 70
sequence of real numbers, 44
Nonincreasing function, 67
Nonnegative definite sequence, 626
Nonnegative linear functional, 566
Nonnegativity
of Lebesgue outer measure, 106
of outer measure, 210
Norm, 422
metric induced by, 422
Normal number, 311
Normal space, 448
Normed space, 422
infinite series in, 440
infinite sums in, 440
Nowhere dense, 494
nth moment, 293
finite, 293
ν*-measurable sets, 567
One-point compactification, 480
One-to-one, 12
1-1 correspondence, 20
Onto, 12
Open
relatively, 417
Open ball
induced by a metric, 424
Open covering, 462
Open interval, 58
Open mapping theorem, 590
Open set, 413, 415
of the extended real numbers, 179
in R, 57
with respect to a neighborhood
basis, 413
of a subset of R, 62
weakly, 428
Operator
linear, 528
Orbit
in a measurable dynamical
system, 707
Order axioms
for the real numbers, 37
Orthogonal complement, 542
Orthogonal elements, 542
Orthogonal projection, 541
Orthogonal set, 545
Orthonormal basis, 545
Orthonormal set, 545
Orthonormal wavelet basis, 679
Outcome space, 263
Outer measure, 106
induced, 209
see also Lebesgue outer measure
Lebesgue, see Lebesgue outer
measure
Pairwise disjoint, 9
sequence of sets, 10
Pairwise independence, 281
Pairwise independent events, 269
Pairwise mutually exclusive events, 263
Partial derivative, 502
Partial ordering, 16
Partially ordered set, 16
Partition, 11
Partition of unity, 477
Peak point, 481
Period, 636
Periodic
function, 636
measure, 643
Periodic function, 636
frequency of, 636
Periodic measure, 643
Fourier coefficients of, 643
Piecewise linear, 513
Plancherel’s theorem, 674
Point of discontinuity, 65
Pointwise convergence, 482
of a sequence of functions, 68
of a sequence of real-valued
functions, 68
Pointwise ergodic theorem, 707
Pointwise limit
of a sequence of functions, 68
Pointwise limits
closure under, 69
Poisson distribution, 287
Poisson summation formula, 662
Population mean, 300
Population standard deviation, 300
Positive part of a function, 149
Positive set, 357
property of, 358, 359
Positive variation
of a signed measure, 377
Power set, 5
Probabilistically independent, 269
see also Independent events
Probability
conditional, see Conditional
probability
relative-frequency interpretation
of, 262
Probability density function, 276
joint, 279
marginal, 279
as a Radon-Nikodym derivative, 371
Probability distribution, 275
binomial, 287
chi-square distribution, 285
joint, 279
multinomial, 287
Poisson, 287
standard normal, 285
uniform, 277, 285
Probability distribution function, 277
joint, 280
Probability mass function, 276
joint, 279
marginal, 279
as a Radon-Nikodym derivative, 374
Probability measure, 169
Probability space, 169, 265
Product measure, 236
n-dimensional, 254
Product measure space, 236
completion of, 250
of a finite number of factor
spaces, 253
n-dimensional, 254
Product σ-algebra, 236
n-dimensional, 254
Product topology, 430
Projection
in a normed space, 597
Proper convex combination, 619
Proper subset, 4
Quotient topology, 419
Radon-Nikodym derivative, 369
Radon-Nikodym theorem, 354
for complex measures, 383
Random experiment, 263
Random variable, 177, 275
absolutely continuous, 276, 371
characteristic function of, 666
continuous, 277
discrete, 276, 374
expectation of, 289
probability distribution of, 275
probability distribution function
of, 277
standard deviation of, 294
variance of, 294
Random variables
covariance of, see Covariance
independent, 280
joint probability distribution of, 279
joint probability distribution function
of, 280
jointly absolutely continuous, 279
jointly discrete, 279
mutually independent, 281
pairwise independent, 281
uncorrelated, 297
Random vector, 278
Range
of a function, 12
Rational numbers
density of in R, 39
Real numbers
completeness axiom for, 37
extended, 40
field axioms for, 36
least upper-bound axiom for, 37
order axioms for, 37
Real-valued function, 65
on a set, 65
Rectangle, 18
Refinement
of a measurable partition, 718
Regular Borel measures, 564
Relative complement, 6
Relative-frequency interpretation of
probability, 262
Relative topology, 417
Relatively open, 417
Representing measure, 624
Restriction of a function, 13
Riemann, Bernhard
biography, 34
Riemann integrable, 82
Riemann integral, 82
lower, 82
upper, 82
Riemann-Lebesgue lemma, 642
Riesz-Markov theorem, 567
Riesz representation theorem
for C(Q), 574
for C0(Q), 575
for ℒ^p-spaces, 559
Riesz’s theorem
on the completeness of ℒ^p-spaces, 557
Sample space, 263
Scaling function, 684
Schröder-Bernstein theorem, 26
Second category, 495
Second countable space, 461
Second fundamental theorem of
calculus
for Lebesgue integration, 350
for Riemann integration, 335
Sections
of a function on a product space, 239
of a set in a product space, 237
Semi inner product, 544
Semialgebra, 32, 208
Seminorm, 600
topology induced by a family of, 601
Separable space, 460
Separate points, 515
Separated sets, 447
Separation by a hyperplane, 604
Sequence, 14
convergent, 43
finite, 14
infinite, 14
nonnegative definite, 626
of real numbers, 43
of sets, 9
subsequence of, 14
term of, 14
Sequence of functions
monotone, 70
nondecreasing, 70
nonincreasing, 70
pointwise limit of, 68
Sequence of real numbers
limit inferior of, 48
limit superior of, 48
monotone, 44
nondecreasing, 44
nonincreasing, 44
Sequence of sets
monotone, 30
monotone nondecreasing, 30
monotone nonincreasing, 30
pairwise disjoint, 10
Set, 3
countable, 21
countably infinite, 21
empty, 3
finite, 21
infinite, 21
of measure zero, 85
uncountable, 21
Set operations
and functions, 15
Sets
algebra of, 26
disjoint, 9
equality, 3
equivalence of, 21
equivalent, 21
finite sequence of, 9
infinite sequence of, 9
limit inferior of, 11
limit superior of, 11
pairwise disjoint sequence of, 10
sequence of, 9
σ-algebra of, 28
Shannon, Claude E.
biography, 696
Shannon sampling theorem, 677
σ-algebra
generated by a collection of sets, 29
product, 236
σ-algebra of sets, 28
σ-finite measure space, 234
Signed measure, 356
Hahn decomposition for, 357, 360
Lebesgue integral with respect to, 379
negative variation of, 377
positive and negative sets for, 357
positive variation of, 377
total variation of, 377
variations of, 377
Signed measures
Jordan decomposition of, 361, 362
properties of, 358, 359
Simple function
canonical representation of, 130, 184
Lebesgue integral of, 131, 185
on Q, 184
on R, 130
Sine function, 642
Span, 542
Standard deviation
of a population, 300
of a random variable, 294
Standard normal distribution, 285
Statistically independent, 269
see also Independent events
Step function, 81
Stochastically independent, 269
see also Independent events
Stone-Čech compactification, 513
Stone, Marshall Harvey
biography, 492
Stone-Weierstrass theorem, 521
complex version, 522
Strictly weaker
relation between topologies, 428
Strong law of large numbers, 306
Cantelli’s, 311
iid case, 308
Kolmogorov’s, 306
Sub-basis, 418
Subcovering, 462
Subnet, 443
Subsequence, 14
Subset, 3
proper, 4
Subspace, 421
Sup-norm, 489
Support function, 603
Support of a function, 477
Supremum
of a subset of real numbers, 37
Supremum norm, 489
Symmetric difference, 7
Taylor’s theorem
for Lebesgue integration, 352
Term
of a sequence, 14
Tietze extension theorem, 451
Time-limited measure, 671
Toeplitz’s lemma, 302
T₁-space, 448
Tonelli’s theorem, 245
for the completion of a product
measure space, 252
Topological linear space, 598
locally convex, 600
Topological space, 415
compact, 471
completely regular, 513
first countable, 464
locally compact, 475
metrizable, 425
neighborhood basis for, 415
one-point compactification of, 480
second countable, 461
separable, 460
Topology, 415
determined by a basis, 415
determined by a neighborhood
basis, 415
determined by a subbasis, 418
discrete, 416
induced by a collection of
seminorms, 601
induced by a neighborhood basis, 415
induced by a norm, 424
quotient, 419
relative, 417
of uniform convergence on compact
subsets, 484
Topology of pointwise convergence, 490
Total variation, 331
of a complex measure, 381
of a signed measure, 377
Totally bounded, 465
Totally disconnected space, 458
Transitive
property of ordering of R, 37
Translated function, 639
Translation-dilation, 654
Translation invariance
of Lebesgue outer measure, 106
Translation invariant, 639
Triangle inequality
for real numbers, 42
Trichotomous
property of ordering of R, 37
Tychonoff’s theorem, 510
Unbounded intervals, 5, 41
Uncorrelated, 297
asymptotically, 312
Uncountable
set, 21
Uniform boundedness principle, 497
for linear operators, 595
Uniform convergence, 482
on compact subsets, 483, 484
of nets of functions, 447
of a sequence of functions, 69
Uniform distribution, 277, 285
Uniform norm, 489
Uniformly continuous, 469
Uniformly distributed sequences, 652
Union
of a collection of sets, 8
of two sets, 5
Unit point mass, 170
see also Dirac measure
Upper bound, 17, 37
Upper Riemann integral, 82
Upper semicontinuous, 475
Urysohn metrization theorem, 462
Urysohn, Pavel S.
biography, 410
Urysohn’s lemma, 450
Value
of a function, 12
Variance
finite, 294
of a random variable, 294
of a sum of random variables, 295
Vector space, 420
Vitali cover, 322
Vitali covering theorem, 322
Wavelet, 679
Haar, 681
Wavelet theory, 678
Wavelet transform, 691
Weak law of large numbers, 306
Chebychev’s, 312
Markov’s, 312
Weak topology, 428, 610
Weak* topology, 610
Weaker
relation between topologies, 428
Weakly bounded set, 613
Weakly continuous, 428
function, 428
Weakly open set, 428
Well-ordering principle, 22
Weyl’s criteria for uniform
distribution, 653
With probability one, 171
Zorn’s lemma, 17
Title: A Course in Real Analysis
Authors: J. N. McDonald, N. A. Weiss
Telephone: 010-64015659, 64038347
E-mail: kjsk@vip.sina.com
Format: 24mo; printed sheets: 32
Edition: 2005; second printing: 2006
ISBN: 7-5062-6573-7/O·430
Copyright registration: 01-2005-1286
Price: 89.00 yuan
Reprinted under license from Elsevier (Singapore) Pte Ltd.