Author: John N. McDonald

Tags: mathematics, mathematical analysis

ISBN: 0-12-045143-3

Year: 2005

Text
A Course in Real Analysis
John N. McDonald
Department of Mathematics
Arizona State University
Neil A. Weiss
Department of Mathematics
Arizona State University
Biographies by Carol A. Weiss
ACADEMIC PRESS

Nonlinear Fiber Optics 3rd ed. G. P. Agrawal ISBN 0-12-045143-3 Copyright © 2001, by Elsevier. All rights reserved.
Authorized English language reprint edition published by the Proprietor.
Reprint ISBN: 981-2592-99-7
Copyright © 2004 by Elsevier (Singapore) Pte Ltd. All rights reserved.
Elsevier (Singapore) Pte Ltd.
3 Killiney Road #08-01 Winsland House I
Singapore 239519
Tel: (65) 6349-0200
Fax: (65) 6733-1817
First Published 2005
Printed in China by Beijing World Publishing Corporation under special arrangement with Elsevier (Singapore) Pte Ltd. This edition is authorized for sale in China only, excluding Hong Kong SAR and Taiwan. Unauthorized export of this edition is a violation of the Copyright Act. Violation of this Law is subject to Civil and Criminal Penalties.
To Pat and Carol

Contents
Preface xiii
PART ONE □ Set Theory, Real Numbers, and Calculus
1 □ SET THEORY
Biography: Georg Cantor 2
1.1 Basic Definitions and Properties 3
1.2 Functions and Sets 12
1.3 Equivalence of Sets; Countability 20
1.4 Algebras, $\sigma$-Algebras, and Monotone Classes 26
2 □ THE REAL NUMBER SYSTEM AND CALCULUS
Biography: Georg Friedrich Bernhard Riemann 34
2.1 The Real Number System 35
2.2 Sequences of Real Numbers 43
2.3 Open and Closed Sets 57
2.4 Real-Valued Functions 65
2.5 The Cantor Set and Cantor Function 73
2.6 The Riemann Integral 81
PART TWO □ Measure, Integration, and Differentiation
3 □ LEBESGUE THEORY ON THE REAL LINE
Biography: Émile Félix-Édouard-Justin Borel 92
3.1 Borel Measurable Functions and Borel Sets 93
3.2 Lebesgue Outer Measure 103
3.3 Further Properties of Lebesgue Outer Measure 110
3.4 Lebesgue Measure 118
3.5 The Lebesgue Integral for Nonnegative Functions 128
3.6 Convergence Properties of the Lebesgue Integral for Nonnegative Functions 140
3.7 The General Lebesgue Integral 149
3.8 Lebesgue Almost Everywhere 161
4 □ MEASURE THEORY
Biography: Henri Léon Lebesgue 166
4.1 Measure Spaces 167
4.2 Measurable Functions 174
4.3 The Abstract Lebesgue Integral for Nonnegative Functions 184
4.4 The General Abstract Lebesgue Integral 192
4.5 Convergence in Measure 203
4.6 Extensions to Measures 207
4.7 The Lebesgue-Stieltjes Integral 220
4.8 Product Measure Spaces 231
4.9 Iteration of Integrals in Product Measure Spaces 245
5 □ ELEMENTS OF PROBABILITY
Biography: Andrei Nikolaevich Kolmogorov 260
5.1 The Mathematical Model for Probability 262
5.2 Random Variables 274
5.3 Expectation of Random Variables 288
5.4 The Law of Large Numbers 301
6 □ DIFFERENTIATION
Biography: Johann Radon 314
6.1 Derivatives and Dini-Derivates 315
6.2 Functions of Bounded Variation 330
6.3 The Indefinite Lebesgue Integral 334
6.4 Absolutely Continuous Functions 342
6.5 Signed Measures 354
6.6 The Radon-Nikodym Theorem 364
6.7 Signed and Complex Measures 377
6.8 Decomposition of Measures 390
6.9 Measurable Transformations and the General Change-of-Variable Formula 402
PART THREE □ Topological, Metric, and Normed Spaces
7 □ ELEMENTS OF TOPOLOGICAL, METRIC, AND NORMED SPACES
Biography: Pavel Samuilovich Urysohn 410
7.1 Introduction to Topological Spaces 411
7.2 Metrics and Norms 419
7.3 Weak Topologies 427
7.4 Closed Sets, Convergence, and Completeness 431
7.5 Nets and Continuity 438
7.6 Separation Properties 447
7.7 Connected Sets 453
7.8 Separability, Second Countability, and Metrizability 459
7.9 Compact Metric Spaces 464
7.10 Compact Topological Spaces 471
7.11 Locally Compact Spaces 475
7.12 Function Spaces 481
8 □ COMPLETE SPACES, COMPACT SPACES, AND APPROXIMATION
Biography: Marshall Harvey Stone 492
8.1 The Baire Category Theorem 493
8.2 Contractions of Complete Metric Spaces 498
8.3 Compactness in the Space $C(\Omega, \Lambda)$ 503
8.4 Compactness of Product Spaces 509
8.5 Approximation by Functions From a Lattice 513
8.6 Approximation by Functions From an Algebra 518
9 □ HILBERT SPACES AND THE CLASSICAL BANACH SPACES
Biography: David Hilbert 526
9.1 Preliminaries on Normed Spaces 527
9.2 Hilbert Spaces 533
9.3 Bases and Duality in Hilbert Spaces 545
9.4 $\mathcal{L}^p$-Spaces 553
9.5 Nonnegative Linear Functionals on $C(\Omega)$ 563
9.6 The Dual Spaces of $C(\Omega)$ and $C_0(\Omega)$ 573
10 □ BASIC THEORY OF NORMED AND LOCALLY CONVEX SPACES
Biography: Stefan Banach 578
10.1 The Hahn-Banach Theorem 579
10.2 Linear Operators on Banach Spaces 590
10.3 Topological Linear Spaces 597
10.4 Weak and Weak* Topologies 609
10.5 Compact Convex Sets 618
PART FOUR □ Harmonic Analysis and Dynamical Systems
11 □ ELEMENTS OF HARMONIC ANALYSIS
Biography: Ingrid Daubechies 634
11.1 Introduction to Fourier Series 636
11.2 Convergence of Fourier Series 643
11.3 The Fourier Transform 653
11.4 Fourier Transforms of Measures 662
11.5 $\mathcal{L}^2$-Theory of the Fourier Transform 672
11.6 Introduction to Wavelets 678
11.7 Orthonormal Wavelet Bases; The Wavelet Transform 684
12 □ MEASURABLE DYNAMICAL SYSTEMS
Biography: Claude Elwood Shannon 696
12.1 Introduction and Examples 697
12.2 Ergodic Theory 707
12.3 Isomorphism of Measurable Dynamical Systems; Entropy 715
12.4 The Kolmogorov-Sinai Theorem; Calculation of Entropy 723
Index 733

Preface
This is a book about real analysis, but it is not an ordinary real analysis book. Written with the student in mind, this text incorporates pedagogical techniques not often found in books at this level. The book is intended for a one-year course in real analysis at the graduate level or the advanced undergraduate level.
We bring over 50 years of combined teaching, research, and writing experience to this project. The text material has been class-tested several times and has been used for independent study courses as well.
What Makes This Book Unique
This book contains many features that are unique for a real analysis text. Here are a few.
Motivation of key concepts. All key concepts are motivated. The importance of and rationale behind ideas such as measurable functions, measurable sets, and Lebesgue integration are made transparent.
Detailed theoretical discussion. Detailed proofs of most results (i.e., lemmas, theorems, corollaries, and propositions) are provided. However,
to fully engage the reader, proofs or parts of proofs are often relegated to the exercises.
Illustrative examples. Following most definitions and results, one or more examples are presented that illustrate the concept or result in order to solidify it in the reader's mind and provide a concrete frame of reference. This book contains approximately 200 examples, most of which consist of several parts.
Abundant and varied exercises. The text contains over 1200 exercises, not including parts, far more than other real analysis books. Furthermore, the exercises vary widely with regard to application and level.
Applications. A diverse collection of applications appears throughout the text, some as examples and others as entire sections or chapters. For instance, applications to probability theory are ubiquitous. Other applications include those to Fourier analysis, wavelets, and measurable dynamical systems.
Careful referencing. As an aid to effective use of the book, we have consistently provided references (including page numbers) to definitions, examples, exercises, and results. Additionally, we have marked with a star (★) those exercises that are referenced later in the text; we strongly recommend that all such exercises be done by the reader.
Biographies. Each chapter begins with a brief biography of a famous mathematician. Besides being of general interest, these biographies help the reader obtain a perspective on how real analysis and its applications have developed.
Organization
The text offers considerable flexibility in the choice of material to cover.
• Chapters 1 and 2 present prerequisite material that may be review for many but provides a common ground for all readers. At the option of the instructor, these two chapters can be covered either briefly or in detail; they can also be assigned to the students for independent reading.
• Chapters 3 and 4 present the elements of measure and integration by first discussing the Lebesgue theory on the line (Chapter 3) and then the abstract theory (Chapter 4). This material is prerequisite to all subsequent chapters.
• Chapter 5 provides an introduction to the fundamentals of probability theory, including the mathematical model for probability, random variables, expectation, and laws of large numbers. Although optional, this chapter is recommended as it provides a myriad of examples and applications for other topics.
• In Chapter 6 differentiation is discussed, both of functions and of measures. Topics examined include differentiability, bounded variation, and absolute continuity of functions; the chapter also provides a thorough discussion of signed and complex measures, the Radon-Nikodym theorem, decomposition of measures, and measurable transformations.
• Chapter 7 provides the fundamentals of topological and metric spaces. This chapter can be covered relatively quickly when the students have a background in topology from other courses. In addition to topics traditionally found in an introduction to topology, a discussion of weak topologies and function spaces is included.
• Completeness, compactness, and approximation are the topics of Chapter 8. Examined therein are the Baire category theorem, contractions of complete metric spaces, compactness in function and product spaces, and the Stone-Weierstrass theorem.
• Presented in Chapter 9 are Hilbert spaces and the classical Banach spaces. Among other things, bases and duality in Hilbert space, completeness and duality of $\mathcal{L}^p$-spaces, and duality in spaces of continuous functions are discussed.
• The basic theory of normed and locally convex spaces is given in Chapter 10. Topics include the Hahn-Banach theorem, linear operators on Banach spaces, fundamental properties of locally convex spaces, and the Krein-Milman theorem.
• Chapter 11 provides applications of previous chapters to harmonic analysis. We examine the elements of Fourier series and transforms and the $\mathcal{L}^2$-theory of the Fourier transform. In addition, an introduction to wavelets and the wavelet transform is presented.
• Chapter 12 examines measurable dynamical systems. This chapter requires the one on probability (Chapter 5) and discusses ergodic theorems, isomorphisms of measurable dynamical systems, and entropy.
The flowchart on the next page summarizes the preceding discussion and depicts the interdependence among chapters. In the flowchart, the prerequisites for a given chapter consist of all chapters having a path leading to that chapter.
Acknowledgments
It is our pleasure to thank the following reviewers, whose comments and suggestions were invaluable in finalizing the book:
Wilfrid Gangbo, Georgia Institute of Technology
Maria Girardi, University of South Carolina
Michael Klass, University of California, Berkeley
Bert Schreiber, Wayne State University
Bruce A. Barnes, University of Oregon
Dennis D. Berkey, Boston University
Courtney Coleman, Harvey Mudd College
Peter Duren, University of Michigan
Our very special thanks go to Bruce Barnes, who undertook a detailed reading of the entire manuscript and provided comments and suggestions throughout.
We also thank the many graduate students in our courses, past and present, who furnished invaluable feedback; in particular, we would like to express our appreciation to Mohammed Alhodaly, Hamed Alsulami, Jimmy Mopecha, Lynn Tobin, and, especially, Jim Andrews, Trent Buskirk, Menassie Ephrem, Ken Peterson, John Williams, and Xiangrong Yin.
We thank Arizona State University for its support and those chairs of the ASU Mathematics Department who provided encouragement for the project: Rosemary Renaut, Christian Ringhofer, Nevin Savage, and William T. Trotter.
Our appreciation goes as well to Berthold Horn and Louis Vosloo of Y&Y, Inc., for their TeX software package and consistent willingness to provide technical support; to Amy Hendrickson of TeXnology Inc., for perusing our TeX macros; to our copyeditors Carroll and Eugene Robinson; and to our cover designer Richard Hannus of Hannus Design Associates.
Thanks to all of those at Academic Press for helping make this book a reality, in particular, to Nicole Burnett, Bettina Carbonaro, Victor Curran, Carla Daves, Linda Ratts Engelman, Julio Esperas, Amy Fulton, Pascha Gerlinger, Charles Glaser, Anja Mutic-Blessing, Peter Renz, Bob Ross, and Karen Wachs.
Finally, we would like to express our heartfelt thanks to Carol Weiss.
Except for writing the text, she was involved in every aspect of development and production. Moreover, Carol researched and wrote the biographies and took on the task of typesetter using the TeX typesetting system.
Tempe, Arizona
J.N.M.
N.A.W.

A Course in Real Analysis

PART ONE □ Set Theory, Real Numbers, and Calculus
Georg Cantor (1845-1918)
Georg Cantor was born on March 3, 1845, in St. Petersburg, Russia. He received his doctorate in mathematics from the University of Berlin in 1867, having studied under Weierstrass, Kummer, and Kronecker. In 1869, he accepted a teaching position at the University of Halle and became a full professor in 1879.
Cantor wanted to obtain a professorship at the University of Berlin, where both pay and prestige were higher, but Kronecker, believing that much of Cantor's work (particularly his "transfinite numbers") was unsound, stood firmly in Cantor's path.
Others, however, acknowledged Cantor's genius. Cantor was an honorary member of the London Mathematical Society and received honorary doctorates from both Christiania and St. Andrews. Hilbert said Cantor's work was "... the finest product of mathematical genius and one of the supreme achievements of purely intellectual human activity."
Known as the founder of set theory, Cantor also made fundamental contributions to classical analysis. Many concepts in modern mathematics bear his name, among which are Cantor series and Cantor sets; he also developed the first usable definition of the continuum.
The controversy surrounding his work took a heavy toll on Cantor; beginning in 1884, bouts of deep depression frequently drove him to a sanitarium. Georg Cantor died in a psychiatric clinic at the University of Halle (where he had remained as a professor) on January 6, 1918.
□ 1 □ Set Theory
In this chapter, we will introduce the fundamentals of set theory. Although some readers may be familiar with much of the material, we present this chapter as a way to provide a common ground for all readers of the text. We will first discuss basic definitions and properties of sets. Next we will explore relationships between functions and sets, discuss Cartesian products, and introduce countability. Finally, we will examine algebras, $\sigma$-algebras, and monotone classes, which are special collections of sets that play a prominent role in analysis and measure theory.
1.1 BASIC DEFINITIONS AND PROPERTIES
A set is a collection of elements. If $A$ is a set and $x$ is an element (member, point) of $A$, then we write $x \in A$; $x \notin A$ means that $x$ is not an element of $A$ and, in general, we use a slash through a symbol to signify negation. The symbol $\emptyset$ denotes the empty set, a set containing no elements.
Let $A$ and $B$ be sets. If every element of $A$ is an element of $B$, then $A$ is said to be a subset of $B$, denoted $A \subset B$ or $B \supset A$. Two sets, $A$ and $B$, are equal if they contain the same elements; in other words, if
$A \subset B$ and $B \subset A$. If $A \subset B$ but $B \not\subset A$, then we say that $A$ is a proper subset of $B$.
EXAMPLE 1.1 Illustrates Sets and Subsets
In this text, the following sets play a fundamental role:
$\mathcal{C}$ = collection of complex numbers
$\mathcal{R}$ = collection of real numbers
$\mathcal{Q}$ = collection of rational numbers
$\mathcal{Z}$ = collection of integers
and
$\mathcal{N}$ = collection of positive integers
Note that $\mathcal{N} \subset \mathcal{Z} \subset \mathcal{Q} \subset \mathcal{R} \subset \mathcal{C}$ or, equivalently, $\mathcal{C} \supset \mathcal{R} \supset \mathcal{Q} \supset \mathcal{Z} \supset \mathcal{N}$. But $\mathcal{C} \not\subset \mathcal{R} \not\subset \mathcal{Q} \not\subset \mathcal{Z} \not\subset \mathcal{N}$ or, equivalently, $\mathcal{N} \not\supset \mathcal{Z} \not\supset \mathcal{Q} \not\supset \mathcal{R} \not\supset \mathcal{C}$. □
We will use the notation $\{a\}$ to denote the set consisting of the element $a$; $\{a, b\}$ to denote the set consisting of the elements $a$ and $b$; $\{a, b, c\}$ to denote the set consisting of the elements $a$, $b$, and $c$; and so on.
Let $\Omega$ be a set. Subsets of $\Omega$ are frequently defined in terms of properties that their elements must satisfy. If $P(x)$ is some proposition concerning $x$, then $\{x \in \Omega : P(x)\}$ is the collection of elements $x \in \Omega$ such that $P(x)$ is true. For example, $\{x \in \mathcal{N} : x^2 < 5\} = \{1, 2\}$. When no confusion is possible, we will sometimes abbreviate $\{x \in \Omega : P(x)\}$ to $\{x : P(x)\}$.
Of particular importance in real analysis are intervals of real numbers. The notation and terminology associated with these subsets of $\mathcal{R}$ are presented in the following definition.
DEFINITION 1.1 Intervals of Real Numbers
Let $a$ and $b$ be real numbers such that $a < b$. The bounded intervals with endpoints $a$ and $b$ are as follows:
$(a, b) = \{x \in \mathcal{R} : a < x < b\}$
$[a, b) = \{x \in \mathcal{R} : a \le x < b\}$
$(a, b] = \{x \in \mathcal{R} : a < x \le b\}$
$[a, b] = \{x \in \mathcal{R} : a \le x \le b\}$
The unbounded intervals are as follows:
$(a, \infty) = \{x \in \mathcal{R} : x > a\}$
$[a, \infty) = \{x \in \mathcal{R} : x \ge a\}$
$(-\infty, b) = \{x \in \mathcal{R} : x < b\}$
$(-\infty, b] = \{x \in \mathcal{R} : x \le b\}$
$(-\infty, \infty) = \{x : x \in \mathcal{R}\} = \mathcal{R}$
Complement, Intersection, and Union
We will now discuss three fundamental operations on sets: complement, intersection, and union. In what follows, we will assume that all sets under consideration are subsets of some fixed set $\Omega$, often referred to as the universal set. The set of all subsets of $\Omega$ is called the power set of $\Omega$ and is denoted by $\mathcal{P}(\Omega)$. Thus, $A \subset \Omega$ if and only if $A \in \mathcal{P}(\Omega)$.
Let $A$ and $B$ be subsets of $\Omega$. The complement of $A$, denoted $A^c$, is the set of elements of $\Omega$ that do not belong to $A$. Thus, $A^c = \{x : x \notin A\}$.
The intersection of $A$ and $B$, denoted $A \cap B$, is the set of elements of $\Omega$ that belong to both $A$ and $B$. Thus, $A \cap B = \{x : x \in A \text{ and } x \in B\}$.
The union of $A$ and $B$, denoted $A \cup B$, is the set of elements of $\Omega$ that belong to either $A$ or $B$ (or both); in other words, those elements that belong to at least one of $A$ and $B$. Thus, $A \cup B = \{x : x \in A \text{ or } x \in B\}$.
Two important relationships among the three set operations of union, intersection, and complement are given in the following proposition, known as De Morgan's laws.
PROPOSITION 1.1 De Morgan's Laws
Let $A$ and $B$ be subsets of $\Omega$. Then,
a) $(A \cup B)^c = A^c \cap B^c$.
b) $(A \cap B)^c = A^c \cup B^c$.
PROOF: We prove part (a) and leave the proof of part (b) as an exercise for the reader.
Suppose $x \in (A \cup B)^c$. Then $x \notin A \cup B$ so that $x \notin A$ and $x \notin B$. But then $x \in A^c$ and $x \in B^c$, which implies that $x \in A^c \cap B^c$. Thus, $(A \cup B)^c \subset A^c \cap B^c$.
Conversely, suppose $x \in A^c \cap B^c$. Then $x \in A^c$ and $x \in B^c$ so that $x \notin A$ and $x \notin B$. But then $x \notin A \cup B$, which implies that $x \in (A \cup B)^c$. Thus, $A^c \cap B^c \subset (A \cup B)^c$.
We have now shown that $(A \cup B)^c \subset A^c \cap B^c$ and $A^c \cap B^c \subset (A \cup B)^c$. This means that $(A \cup B)^c = A^c \cap B^c$.
The following proposition shows that intersection and union obey the distributive laws. The proof is left to the reader as an exercise.
PROPOSITION 1.2 Distributive Laws
Let $A$, $B$, and $C$ be subsets of $\Omega$. Then,
a) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.
b) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
Relative Complement and Symmetric Difference
Several set operations can be derived from the three basic operations of complement, intersection, and union. Two of the most important such operations are relative complement and symmetric difference. The definitions of these two set operations follow.
DEFINITION 1.2 Relative Complement
Let $A$ and $B$ be subsets of $\Omega$. Then the complement of $A$ relative to $B$, denoted $B \setminus A$, is the set of all elements belonging to $B$ that do not belong to $A$. Thus, $B \setminus A = \{x : x \in B \text{ and } x \notin A\}$. In particular, we have that $A^c = \Omega \setminus A$; that is, the (absolute) complement of $A$ is the complement of $A$ relative to $\Omega$.
Note: Clearly, we have $B \setminus A = B \cap A^c$.
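As a sanity check (not a proof), Propositions 1.1 and 1.2 and the identity $B \setminus A = B \cap A^c$ can be verified exhaustively on a small finite universal set. The Python sketch below assumes a hypothetical four-element universe $\Omega = \{0, 1, 2, 3\}$, an illustrative choice not taken from the text, and uses Python's built-in set operators: `&` for intersection, `|` for union, and `-` for relative complement. The helper names `subsets`, `complement`, and `check_laws` are likewise hypothetical.

```python
from itertools import chain, combinations

# Hypothetical finite universal set, small enough that every triple of
# subsets can be enumerated (16 subsets, so 16**3 = 4096 triples).
OMEGA = frozenset(range(4))


def subsets(universe):
    """Return the power set of `universe` as a list of frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for c in chain.from_iterable(
                combinations(items, r) for r in range(len(items) + 1))]


def complement(a):
    """Absolute complement A^c, i.e. the elements of OMEGA not in A."""
    return OMEGA - a


def check_laws():
    """Assert Propositions 1.1 and 1.2 and B minus A = B ∩ A^c for all subsets."""
    power_set = subsets(OMEGA)
    for a in power_set:
        for b in power_set:
            # De Morgan's laws (Proposition 1.1)
            assert complement(a | b) == complement(a) & complement(b)
            assert complement(a & b) == complement(a) | complement(b)
            # Relative complement (Note after Definition 1.2)
            assert b - a == b & complement(a)
            for c in power_set:
                # Distributive laws (Proposition 1.2)
                assert a & (b | c) == (a & b) | (a & c)
                assert a | (b & c) == (a | b) & (a | c)
    return len(power_set)


print(check_laws())  # 16
```

An exhaustive check over a finite universe can only confirm the identities for that universe; the element-chasing proof above is what establishes them in general.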
DEFINITION 1.3 Symmetric Difference
Let $A$ and $B$ be subsets of $\Omega$. Then the symmetric difference of $A$ and $B$, denoted $A \Delta B$, is the set of all elements belonging to either $A$ or $B$ but not both $A$ and $B$. Thus, $A \Delta B = \{x : x \in A \text{ or } x \in B, \text{ and } x \notin A \cap B\}$.
Note: It is easy to see that $A \Delta B = (A \setminus B) \cup (B \setminus A)$. We leave the verification to the reader as an exercise.
More on Set Operations
Exercises 1.1 and 1.2 discuss several properties of union and intersection. Among those properties are the following two:
$A \cap (B \cap C) = (A \cap B) \cap C$ and $A \cup (B \cup C) = (A \cup B) \cup C$.
The two sets in the first equality consist of all elements that belong to $A$, $B$, and $C$, which we write as $A \cap B \cap C$. Thus, $A \cap B \cap C = \{x : x \in A \text{ and } x \in B \text{ and } x \in C\}$. The two sets in the second equality consist of all elements that belong to at least one of $A$, $B$, and $C$, which we write as $A \cup B \cup C$. Thus, $A \cup B \cup C = \{x : x \in A \text{ or } x \in B \text{ or } x \in C\}$.
We can generalize the notions of intersection and union to arbitrary collections of sets.
DEFINITION 1.4 Intersection and Union
Let $\mathcal{C}$ be a collection of subsets of $\Omega$, that is, $\mathcal{C} \subset \mathcal{P}(\Omega)$.
a) The intersection of $\mathcal{C}$, denoted $\bigcap_{A \in \mathcal{C}} A$, is the set of elements of $\Omega$ that belong to each set in the collection $\mathcal{C}$. Thus,
$\bigcap_{A \in \mathcal{C}} A = \{x : x \in A \text{ for all } A \in \mathcal{C}\}$.
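b) The union of $\mathcal{C}$, denoted $\bigcup_{A \in \mathcal{C}} A$, is the set of elements of $\Omega$ that belong to at least one of the sets in the collection $\mathcal{C}$. Thus,
$\bigcup_{A \in \mathcal{C}} A = \{x : x \in A \text{ for some } A \in \mathcal{C}\}$.
EXAMPLE 1.2 Illustrates Definition 1.4
Let $\Omega = \mathcal{R}$ and $\mathcal{C} = \{[0, 1/n] : n \in \mathcal{N}\}$. Then
$\bigcap_{A \in \mathcal{C}} A = \{0\}$ and $\bigcup_{A \in \mathcal{C}} A = [0, 1]$,
as the reader can easily verify. □
De Morgan's laws and the distributive laws hold for any collection of subsets. These are stated in the following two propositions, whose proofs are left to the reader as exercises.
PROPOSITION 1.3 De Morgan's Laws
Let $\mathcal{C}$ be a collection of subsets of $\Omega$. Then,
a) $\left(\bigcap_{A \in \mathcal{C}} A\right)^c = \bigcup_{A \in \mathcal{C}} A^c$.
b) $\left(\bigcup_{A \in \mathcal{C}} A\right)^c = \bigcap_{A \in \mathcal{C}} A^c$.
PROPOSITION 1.4 Distributive Laws
Let $\mathcal{C}$ be a collection of subsets of $\Omega$ and $B$ a subset of $\Omega$. Then,
a) $B \cap \left(\bigcup_{A \in \mathcal{C}} A\right) = \bigcup_{A \in \mathcal{C}} (B \cap A)$.
b) $B \cup \left(\bigcap_{A \in \mathcal{C}} A\right) = \bigcap_{A \in \mathcal{C}} (B \cup A)$.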
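When $\Omega$ is finite, the definitions of $\bigcap_{A \in \mathcal{C}} A$ and $\bigcup_{A \in \mathcal{C}} A$ translate almost word for word into code. The following Python sketch is illustrative only. It assumes a hypothetical universe $\Omega = \{0, 1, \ldots, 9\}$ and a finite collection of "initial segments" $\{0, 1, \ldots, m\}$ standing in, loosely, for the intervals of Example 1.2. It then spot-checks Propositions 1.3 and 1.4 for that one collection. The function names are assumptions, not notation from the text.

```python
# Hypothetical finite universal set (an illustrative choice).
OMEGA = frozenset(range(10))


def big_intersection(collection, omega=OMEGA):
    """Definition 1.4(a): {x in Omega : x in A for all A in the collection}."""
    return frozenset(x for x in omega if all(x in a for a in collection))


def big_union(collection, omega=OMEGA):
    """Definition 1.4(b): {x in Omega : x in A for some A in the collection}."""
    return frozenset(x for x in omega if any(x in a for a in collection))


def complement(a, omega=OMEGA):
    """Absolute complement relative to Omega."""
    return omega - a


# Finite collection of "initial segments" {0, 1, ..., m}, m = 1, ..., 5.
C = [frozenset(range(m + 1)) for m in range(1, 6)]
B = frozenset({3, 7, 9})

print(sorted(big_intersection(C)))  # [0, 1]
print(sorted(big_union(C)))         # [0, 1, 2, 3, 4, 5]

# Proposition 1.3 (De Morgan's laws) for this collection
assert complement(big_intersection(C)) == big_union([complement(a) for a in C])
assert complement(big_union(C)) == big_intersection([complement(a) for a in C])
# Proposition 1.4 (distributive laws) for this collection and B
assert B & big_union(C) == big_union([B & a for a in C])
assert B | big_intersection(C) == big_intersection([B | a for a in C])
```

Because the code filters $\Omega$ by the condition in Definition 1.4, an empty collection gives $\bigcap_{A \in \emptyset} A = \Omega$ (the condition "for all $A$" holds vacuously) and $\bigcup_{A \in \emptyset} A = \emptyset$.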
Indexed Collections of Sets
Suppose that $I$ is a set and that to each $\iota \in I$ there corresponds a unique subset $A_\iota$ of $\Omega$. Then we have an indexed collection of subsets of $\Omega$, indexed by $I$. We denote such a collection by $\{A_\iota\}_{\iota \in I}$. In case $I = \{1, 2, \ldots, N\}$, the indexed collection is denoted by $\{A_n\}_{n=1}^{N}$ and is called a finite sequence of sets. Similarly, if $I = \mathcal{N}$, the indexed collection is denoted by $\{A_n\}_{n=1}^{\infty}$ and is called an infinite sequence of sets. In both of these cases, we say that the indexed collection is a sequence of sets, and we often write $\{A_n\}_n$ to represent either a finite or infinite sequence of sets.
For an indexed collection of sets, we denote the intersection and union of the collection by $\bigcap_{\iota \in I} A_\iota$ and $\bigcup_{\iota \in I} A_\iota$, respectively. Thus,
$\bigcap_{\iota \in I} A_\iota = \{x : x \in A_\iota \text{ for all } \iota \in I\}$
and
$\bigcup_{\iota \in I} A_\iota = \{x : x \in A_\iota \text{ for some } \iota \in I\}$.
In case $I = \{1, 2, \ldots, N\}$, we use the notations $\bigcap_{n=1}^{N} A_n$ and $\bigcup_{n=1}^{N} A_n$, respectively, for the intersection and union of the indexed collection. Similarly, if $I = \mathcal{N}$, we use the notations $\bigcap_{n=1}^{\infty} A_n$ and $\bigcup_{n=1}^{\infty} A_n$, respectively, for the intersection and union of the indexed collection. For example, if we let $A_n = (0, 1/n]$ for each $n \in \mathcal{N}$, then
$\bigcap_{n=1}^{\infty} A_n = \emptyset$ and $\bigcup_{n=1}^{\infty} A_n = (0, 1]$.
Disjoint Collections of Sets
An essential concept in analysis is that of disjoint sets. Two sets are disjoint if they have no elements in common. More generally, we have the following definition.
DEFINITION 1.5 Disjoint and Pairwise Disjoint
Two subsets, $A$ and $B$, of $\Omega$ are said to be disjoint if $A \cap B = \emptyset$. A collection $\mathcal{C}$ of subsets of $\Omega$ is said to be pairwise disjoint if each two distinct members of $\mathcal{C}$ are disjoint. If $\mathcal{C}$ is a pairwise disjoint collection, we often say the members of $\mathcal{C}$ are pairwise disjoint sets.
An indexed collection, $\{A_\iota\}_{\iota \in I}$, of subsets of $\Omega$ is said to be pairwise disjoint if $A_i \cap A_j = \emptyset$ whenever $i, j \in I$ and $i \ne j$. In case $I = \{1, 2, \ldots, N\}$ or $I = \mathcal{N}$ and the indexed collection is pairwise disjoint, we say that we have a pairwise disjoint sequence of subsets of $\Omega$.
EXAMPLE 1.3 Illustrates Definition 1.5
Let $\Omega = \mathcal{R}$.
a) The sets $\mathcal{Z}$ and $(0, 1)$ are disjoint, since $\mathcal{Z} \cap (0, 1) = \emptyset$.
b) The sets $\mathcal{Z}$ and $[0, 1]$ are not disjoint, since $\mathcal{Z} \cap [0, 1] = \{0, 1\} \ne \emptyset$.
c) The indexed collection, $\{[n-1, n)\}_{n=1}^{\infty}$, of subsets of $\mathcal{R}$ is pairwise disjoint because $[m-1, m) \cap [n-1, n) = \emptyset$, $m \ne n$.
d) The indexed collection, $\{[n-1, n]\}_{n=1}^{\infty}$, of subsets of $\mathcal{R}$ is not pairwise disjoint, because, for instance, $[0, 1] \cap [1, 2] = \{1\} \ne \emptyset$. Note, however, that the intersection of all the members of the collection is empty, that is, $\bigcap_{n=1}^{\infty} [n-1, n] = \emptyset$. This shows that for a collection of sets to be pairwise disjoint it is not sufficient for the intersection of that collection to be empty. Is it necessary? □
EXERCISES 1.1
1.1 Let $A$, $B$, and $C$ be subsets of $\Omega$. Prove each of the following.
a) $A \cup B = B \cup A$
b) $A \cup \emptyset = A$
c) $A \cup (B \cup C) = (A \cup B) \cup C$
d) $A \subset A \cup B$
e) $A = A \cup B$ if and only if $B \subset A$
1.2 Let $A$, $B$, and $C$ be subsets of $\Omega$. Prove each of the following.
a) $A \cap B = B \cap A$
b) $A \cap \emptyset = \emptyset$
c) $A \cap (B \cap C) = (A \cap B) \cap C$
d) $A \supset A \cap B$
e) $A = A \cap B$ if and only if $B \supset A$
1.3 Let $A$ and $B$ be subsets of $\Omega$. Verify each of the following statements.
a) $A = (A \cap B) \cup (A \cap B^c)$
b) $A \cap B = \emptyset \Rightarrow A \subset B^c$
c) $A \subset B \Rightarrow B^c \subset A^c$
1.4 Let $A$ and $B$ be subsets of $\Omega$. Prove that
a) $A \setminus B = A \cap B^c$.
b) $A \Delta B = (A \setminus B) \cup (B \setminus A)$.
1.5 Let $A$, $B$, and $C$ be subsets of $\Omega$. Establish each of the following facts.
a) $A \Delta (B \Delta C) = (A \Delta B) \Delta C$
b) $A \Delta \Omega = A^c$
c) $A \Delta \emptyset = A$
d) $A \Delta A = \emptyset$
1.6 Let $A$, $B$, and $C$ be subsets of $\Omega$.
a) Prove that $A \cap (B \Delta C) = (A \cap B) \Delta (A \cap C)$.
b) What is the relationship between $A \cup (B \Delta C)$ and $(A \cup B) \Delta (A \cup C)$?
c) Precisely when does $A \cup (B \Delta C) = (A \cup B) \Delta (A \cup C)$?
1.7 Let $A$ and $B$ be subsets of $\Omega$. Show that $A = A \Delta B$ if and only if $B = \emptyset$.
1.8 Let $\{A_n\}_{n=1}^{\infty}$ be an infinite sequence of subsets of $\Omega$.
a) Prove that
$\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k \subset \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$.
The set on the left is called the limit inferior of $\{A_n\}_{n=1}^{\infty}$ and is denoted by $\liminf_{n \to \infty} A_n$; the set on the right is called the limit superior of $\{A_n\}_{n=1}^{\infty}$ and is denoted by $\limsup_{n \to \infty} A_n$.
b) Describe in words the limit inferior and limit superior of $\{A_n\}_{n=1}^{\infty}$, and use that description to interpret the relation in part (a).
c) Let $\Omega = \mathcal{R}$ and define
$A_n = \begin{cases} [0, 1 + 1/n], & \text{if } n \text{ is an even positive integer;} \\ [-1 - 1/n, 0], & \text{if } n \text{ is an odd positive integer.} \end{cases}$
Determine $\liminf_{n \to \infty} A_n$ and $\limsup_{n \to \infty} A_n$.
1.9 Prove the general form of De Morgan's laws, Proposition 1.3 on page 8.
1.10 Prove the general form of the distributive laws for sets, Proposition 1.4 on page 8.
1.11 Let $\mathcal{C}$ be a collection of subsets of $\Omega$ and $B$ a subset of $\Omega$. Prove each of the following facts.
a) If $B \subset \bigcup_{A \in \mathcal{C}} A$, then $B = \bigcup_{A \in \mathcal{C}} (A \cap B)$.
b) If $\bigcup_{A \in \mathcal{C}} A = \Omega$, then $E = \bigcup_{A \in \mathcal{C}} (A \cap E)$ for each subset $E$ of $\Omega$.
c) If $\mathcal{C}$ is pairwise disjoint, then so is the collection $\{A \cap E : A \in \mathcal{C}\}$ for each subset $E$ of $\Omega$.
d) We say that $\mathcal{C}$ is a partition of $\Omega$ if it is pairwise disjoint and its union is $\Omega$. Conclude from parts (b) and (c) that if $\mathcal{C}$ is a partition of $\Omega$, then each subset $E$ of $\Omega$ can be expressed as a disjoint union of the collection of sets $\{A \cap E : A \in \mathcal{C}\}$.
1.12 There is a slight distinction between the notions of pairwise disjoint for nonindexed collections of sets and indexed collections of sets, namely, an indexed collection of sets, $\{A_\iota\}_{\iota \in I}$, can fail to be pairwise disjoint even though the collection, $\mathcal{C} = \{A_\iota : \iota \in I\}$, is pairwise disjoint. Provide an example that illustrates this fact.
1.13 Give an example of a collection $\mathcal{C}$ of sets that is not pairwise disjoint, has at least four members, and is such that any three distinct members of $\mathcal{C}$ have an empty intersection.
1.2 FUNCTIONS AND SETS
Suppose that $\Omega$ and $\Lambda$ are sets. A function (mapping, transformation) from $\Omega$ to $\Lambda$ is a rule that assigns to each element $x \in \Omega$ a unique element $f(x) \in \Lambda$.† We call $f(x)$ the value of $f$ at $x$ or the image of $x$ under $f$. To indicate that $f$ is a function from $\Omega$ to $\Lambda$, we often write $f\colon \Omega \to \Lambda$. The set $\Omega$ is called the domain of $f$. The set $\{f(x) : x \in \Omega\}$ is called the range of $f$. We note that, in general, the range of $f$ may be a proper subset of $\Lambda$.
Two further concepts important in the study of functions are given in Definition 1.6.
DEFINITION 1.6 One-to-One and Onto
Let $f$ be a function from $\Omega$ to $\Lambda$.
a) $f$ is said to be one-to-one (or injective) if distinct elements of $\Omega$ have distinct images; that is, if $f(x_1) = f(x_2)$ implies that $x_1 = x_2$.
b) $f$ is said to be onto (or surjective) if each element of $\Lambda$ is the image of some element of $\Omega$; that is, for each $y \in \Lambda$, there is an $x \in \Omega$ such that $y = f(x)$. Thus, $f$ is onto if and only if the range of $f$ equals $\Lambda$.
If a function is both one-to-one and onto, then we can invert the function by using the rule that assigns to each element in the range the unique element in the domain of which it is the image. More precisely, we have the following definition.
† We will take an intuitive approach to functions; that is, we will not use the definition based on ordered pairs.
DEFINITION 1.7 Inverse of a Function
Suppose that $f\colon \Omega \to \Lambda$ is one-to-one and onto. For $y \in \Lambda$, let $f^{-1}(y)$ be the unique $x \in \Omega$ such that $y = f(x)$. The function $f^{-1}\colon \Lambda \to \Omega$ so defined is called the inverse of the function $f$.
EXAMPLE 1.4 Illustrates Definition 1.7
Define $f\colon [0, 1] \to [2, 5]$ by $f(x) = 3x^2 + 2$. Then $f$ is one-to-one and onto. As the reader can verify, the inverse of the function $f$, $f^{-1}\colon [2, 5] \to [0, 1]$, is given by $f^{-1}(y) = \sqrt{(y - 2)/3}$. □
Let $f$ be a function from $\Omega$ to $\Lambda$ and $g$ be a function from $\Lambda$ to $\Gamma$. Then we can define a function from $\Omega$ to $\Gamma$ by first applying $f$ and then applying $g$ to that result. Here is a formal definition.
DEFINITION 1.8 Composition of Functions
Let $f\colon \Omega \to \Lambda$ and $g\colon \Lambda \to \Gamma$. Then the composition of $g$ with $f$, denoted $g \circ f$, is the function $g \circ f\colon \Omega \to \Gamma$ defined by
$(g \circ f)(x) = g(f(x)), \quad x \in \Omega$.
EXAMPLE 1.5 Illustrates Definition 1.8
Define $f\colon \mathcal{R} \to [0, \infty)$ by $f(x) = x^2$ and $g\colon [0, \infty) \to \mathcal{R}$ by $g(y) = \sqrt{y}$. Then the composition of $g$ with $f$, $g \circ f\colon \mathcal{R} \to \mathcal{R}$, is given by
$(g \circ f)(x) = g(f(x)) = g(x^2) = \sqrt{x^2} = |x|$.
In this case, we can also consider the composition going the other way, that is, the composition of $f$ with $g$, $f \circ g\colon [0, \infty) \to [0, \infty)$, which is given by
$(f \circ g)(y) = f(g(y)) = f(\sqrt{y}) = (\sqrt{y})^2 = y$. □
Sometimes we have a function defined on a set that we want to restrict to a subset of that set. To be specific, suppose $f\colon \Omega \to \Lambda$ and that $A \subset \Omega$. From $f$ we can obtain a function from $A$ to $\Lambda$, called the restriction of $f$ to $A$, denoted $f|_A$, and defined by $f|_A(x) = f(x)$ for $x \in A$.
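Examples 1.4 and 1.5 can be spot-checked numerically. In the Python sketch below, `compose` mirrors Definition 1.8 by returning the function $x \mapsto g(f(x))$, and `restrict` tabulates a restriction $f|_A$ on a finite set $A$. These helper names and the sample points are assumptions made for illustration. Floating-point comparisons use `math.isclose`, so the checks confirm the formulas only approximately and only at the chosen points.

```python
import math


def compose(g, f):
    """Definition 1.8: return g ∘ f, the function x -> g(f(x))."""
    return lambda x: g(f(x))


def restrict(func, domain):
    """Tabulate the restriction func|_A on a finite set A as a dict."""
    return {x: func(x) for x in domain}


# Example 1.4: f(x) = 3x^2 + 2 on [0, 1] and its inverse on [2, 5].
def f(x):
    return 3 * x**2 + 2


def f_inv(y):
    return math.sqrt((y - 2) / 3)


# Example 1.5: squaring R -> [0, inf) and the square root [0, inf) -> R.
def square(x):
    return x**2


def root(y):
    return math.sqrt(y)


# Spot checks at a few sample points (floating point, so compare approximately).
for x in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert math.isclose(f_inv(f(x)), x, abs_tol=1e-12)
    assert math.isclose(f(f_inv(2 + 3 * x)), 2 + 3 * x)
for x in [-3.0, -0.5, 0.0, 2.0]:
    assert math.isclose(compose(root, square)(x), abs(x))  # (g ∘ f)(x) = |x|
for y in [0.0, 0.5, 4.0, 9.0]:
    assert math.isclose(compose(square, root)(y), y)       # (f ∘ g)(y) = y

print(restrict(square, [-2, -1, 0, 1, 2]))  # {-2: 4, -1: 1, 0: 0, 1: 1, 2: 4}
```

Note that `compose(root, square)` and `compose(square, root)` differ, matching the text's observation that $g \circ f$ and $f \circ g$ are different functions with different domains.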
Sequences and Subsequences
Sequences are an important class of functions. An infinite sequence is a function whose domain is the set of positive integers, $\mathcal{N}$. If $s$ is an infinite sequence, then $s(n)$ is called the $n$th term of the sequence and is usually denoted $s_n$. For ease in notation, we use $\{s_n\}_{n=1}^{\infty}$ to denote both the infinite sequence whose $n$th term is $s_n$ and the range of the sequence, that is, $\{s_n : n \in \mathcal{N}\}$; context will determine which meaning is intended.
A finite sequence of length $N$ is a function whose domain is the first $N$ positive integers, $\{1, 2, \ldots, N\}$. As for infinite sequences, if $s$ is a finite sequence, then $s(n)$ is called the $n$th term of the sequence and is usually denoted $s_n$. For ease in notation, we use $\{s_n\}_{n=1}^{N}$ to denote both the finite sequence of length $N$ whose $n$th term is $s_n$ and the range of the sequence, that is, $\{s_n : n = 1, 2, \ldots, N\}$; context will determine which meaning is intended.
We use the term sequence to refer to both infinite and finite sequences. The notation $\{s_n\}_n$ represents a sequence that may be finite or infinite and whose $n$th term is $s_n$. If the range of a sequence $\{s_n\}_n$ is a subset of a set $\Omega$, then we say that $\{s_n\}_n$ is a sequence of $\Omega$ or a sequence of elements of $\Omega$.
EXAMPLE 1.6 Illustrates Sequences
a) The sequence $\{3^{-n}\}_{n=1}^{\infty}$ is an infinite sequence of $\mathcal{R}$.
b) A sequence $\{A_n\}_n$ of subsets of a set $\Omega$, as defined on page 9, is a sequence of $\mathcal{P}(\Omega)$, the set of all subsets of $\Omega$. □
Let $\{s_n\}_{n=1}^{\infty}$ be an infinite sequence and $\{n_k\}_{k=1}^{\infty}$ an infinite sequence of positive integers such that $n_1 < n_2 < \cdots$. Then the sequence $\{s_{n_k}\}_{k=1}^{\infty}$ is said to be an (infinite) subsequence of $\{s_n\}_{n=1}^{\infty}$. We note that a subsequence of a sequence is the composition of two functions, namely, of the sequence $s$ with the function $k \mapsto n_k$.
To illustrate subsequences, let $\{s_n\}_{n=1}^{\infty}$ be the sequence in part (a) of Example 1.6 and let $n_k = 2k$. Then $s_{n_k} = 3^{-n_k} = 3^{-2k} = 9^{-k}$. In other words, the subsequence $\{s_{n_k}\}_{k=1}^{\infty}$ is the sequence $\{9^{-n}\}_{n=1}^{\infty}$.
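The remark that a subsequence is a composition can be made concrete with the sequence $s_n = 3^{-n}$ and indices $n_k = 2k$. The Python sketch below uses hypothetical function names. It relies on `fractions.Fraction` so that the terms are exact rational numbers rather than floating-point approximations, and it checks $s_{n_k} = 9^{-k}$ for the first several $k$.

```python
from fractions import Fraction


def s(n):
    """Example 1.6(a): the n-th term s_n = 3^(-n), as an exact fraction."""
    return Fraction(1, 3**n)


def n_k(k):
    """A strictly increasing sequence of positive integers: n_k = 2k."""
    return 2 * k


def subsequence(seq, idx):
    """The subsequence {seq(idx(k))}_k, i.e. the composition seq ∘ idx."""
    return lambda k: seq(idx(k))


sub = subsequence(s, n_k)
print([sub(k) for k in range(1, 4)])
# [Fraction(1, 9), Fraction(1, 81), Fraction(1, 729)]

# s_{n_k} = 3^(-2k) = 9^(-k), checked exactly for k = 1, ..., 49.
assert all(sub(k) == Fraction(1, 9**k) for k in range(1, 50))
```

Exact rational arithmetic is used here so that the equality test is meaningful; with floats, $3^{-2k}$ and $9^{-k}$ could differ in the last bits.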
Subsequences of finite sequences are defined similarly to those for infinite sequences. We leave the details to the reader.

Images and Inverse Images
Let $f\colon \Omega \to \Lambda$. If $A \subset \Omega$, then we define
$$f(A) = \{ f(x) : x \in A \},$$
called the image of $A$ under $f$.
If $B \subset \Lambda$, then we define
$$f^{-1}(B) = \{ x \in \Omega : f(x) \in B \},$$
called the inverse image of $B$ under $f$.

The next two propositions relate set operations and functions. We state the results in terms of indexed collections because that is generally what we deal with.† The proofs of the propositions are left to the reader as exercises.

PROPOSITION 1.5
Let $f\colon \Omega \to \Lambda$, $A \subset \Omega$, and $\{A_\iota\}_{\iota \in I}$ an indexed collection of subsets of $\Omega$. Then
a) $f\bigl(\bigcup_{\iota \in I} A_\iota\bigr) = \bigcup_{\iota \in I} f(A_\iota)$ and
b) $f\bigl(\bigcap_{\iota \in I} A_\iota\bigr) \subset \bigcap_{\iota \in I} f(A_\iota)$.
If $f$ is one-to-one, then
c) $f\bigl(\bigcap_{\iota \in I} A_\iota\bigr) = \bigcap_{\iota \in I} f(A_\iota)$ and
d) $f(A^c) \subset (f(A))^c$.
And, if $f$ is onto, then
e) $f(A^c) \supset (f(A))^c$.

PROPOSITION 1.6
Let $f\colon \Omega \to \Lambda$, $B \subset \Lambda$, and $\{B_\iota\}_{\iota \in I}$ an indexed collection of subsets of $\Lambda$. Then
a) $f^{-1}\bigl(\bigcup_{\iota \in I} B_\iota\bigr) = \bigcup_{\iota \in I} f^{-1}(B_\iota)$,
b) $f^{-1}\bigl(\bigcap_{\iota \in I} B_\iota\bigr) = \bigcap_{\iota \in I} f^{-1}(B_\iota)$, and
c) $f^{-1}(B^c) = \bigl(f^{-1}(B)\bigr)^c$.

† Actually, we are not losing generality, as any collection of subsets is an indexed collection that is indexed by the collection itself.

The Axiom of Choice and Zorn's Lemma
Many of the results that we will discuss in this text require more than the axioms of elementary set theory: they also depend on an axiom called the axiom of choice, which is independent of (i.e., cannot be derived from) the axioms of elementary set theory. Roughly speaking, the axiom of choice asserts that given a collection of nonempty sets, it is possible to select an element from each set in the collection. More precisely, we have the following statement.

Axiom of Choice
Suppose that $\mathcal{C}$ is a collection of nonempty sets. Then there exists a function $f\colon \mathcal{C} \to \bigcup_{A \in \mathcal{C}} A$ such that $f(A) \in A$ for each $A \in \mathcal{C}$.

Although most mathematicians use the axiom of choice without hesitation, some employ it only when they cannot obtain a proof without it, and others consider it unacceptable. In this text, we will apply the axiom of choice freely, both tacitly and explicitly.

There are several important statements equivalent to the axiom of choice. We will discuss only one, namely, Zorn's lemma. In preparation for stating Zorn's lemma, we make the following definition.

DEFINITION 1.9 Partial Ordering; Partially Ordered Set
Let $\Omega$ be a set. A relation $\prec$ on $\Omega$ is said to be a partial ordering if for all $x, y, z \in \Omega$,
a) $x \prec x$ [reflexive].
b) $x \prec y$ and $y \prec x$ implies $x = y$ [antisymmetric].
c) $x \prec y$ and $y \prec z$ implies $x \prec z$ [transitive].
The pair $(\Omega, \prec)$ is called a partially ordered set.
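On a finite set, conditions (a) through (c) of Definition 1.9 can be checked by brute force. The Python sketch below is illustrative only (the helper names `power_set` and `is_partial_ordering` are ours); it confirms that inclusion is a partial ordering on the eight subsets of $\{1, 2, 3\}$, anticipating Example 1.7(b), that strict inclusion is not (it fails reflexivity), and, as an extra example of our own, that divisibility is a partial ordering on $\{1, \dots, 12\}$. For infinite sets no such enumeration is possible, and the definition must be verified by proof.

```python
from itertools import chain, combinations

def power_set(omega):
    """All subsets of the finite set omega, as frozensets."""
    items = list(omega)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))]

def is_partial_ordering(elements, prec):
    """Brute-force check of Definition 1.9 (a)-(c) on a finite set."""
    elements = list(elements)
    reflexive = all(prec(x, x) for x in elements)
    antisymmetric = all(x == y for x in elements for y in elements
                        if prec(x, y) and prec(y, x))
    transitive = all(prec(x, z) for x in elements for y in elements
                     for z in elements if prec(x, y) and prec(y, z))
    return reflexive and antisymmetric and transitive

def subset_of(a, b):
    return a <= b  # inclusion for frozensets

P = power_set({1, 2, 3})

if __name__ == "__main__":
    print(len(P), is_partial_ordering(P, subset_of))                   # 8 True
    print(is_partial_ordering(P, lambda a, b: a < b))                  # False
    print(is_partial_ordering(range(1, 13), lambda a, b: b % a == 0))  # True
```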
EXAMPLE 1.7 Illustrates Definition 1.9
a) The relation $\le$ is a partial ordering on $\mathcal{R}$ and, hence, $(\mathcal{R}, \le)$ is a partially ordered set.
b) Let $\Omega$ be a set. Then $\subset$ is a partial ordering on $\mathcal{P}(\Omega)$ and, hence, $(\mathcal{P}(\Omega), \subset)$ is a partially ordered set. □

Let $(\Omega, \prec)$ be a partially ordered set. A subset $C$ of $\Omega$ is called a chain if for each $x, y \in C$, either $x \prec y$ or $y \prec x$. An element $u \in \Omega$ is called an upper bound for a subset $A$ of $\Omega$ if $x \prec u$ for all $x \in A$. An element $m \in \Omega$ is called a maximal element of $\Omega$ if $x \in \Omega$ and $m \prec x$ imply that $x = m$.

With the preceding definitions in mind, we can now state Zorn's lemma which, as we mentioned earlier, is equivalent to the axiom of choice.†

Zorn's Lemma
Let $(\Omega, \prec)$ be a nonempty partially ordered set with the property that each chain has an upper bound. Then $\Omega$ has a maximal element.

† For a proof of the equivalence, see, for example, John L. Kelley's General Topology (New York: Van Nostrand, 1955), p. 33.

Applications of both the axiom of choice and Zorn's lemma will appear throughout the text.

Cartesian Products
Next we will introduce Cartesian products. First we define the Cartesian product of a finite number of sets.

DEFINITION 1.10 Cartesian Product of a Finite Number of Sets
Let $A$ and $B$ be two sets. Then the Cartesian product of $A$ and $B$ (in that order), denoted $A \times B$, is the set of all ordered pairs $(a, b)$, where $a \in A$ and $b \in B$. Thus,
$$A \times B = \{ (a, b) : a \in A,\ b \in B \}.$$
More generally, if $A_1, A_2, \dots, A_n$ are sets, then the Cartesian product of those $n$ sets, denoted $A_1 \times A_2 \times \cdots \times A_n$ or $\times_{k=1}^{n} A_k$, is the set of all ordered $n$-tuples $(a_1, a_2, \dots, a_n)$, where $a_k \in A_k$ for $k = 1, 2, \dots, n$. Thus,
$$\times_{k=1}^{n} A_k = \{ (a_1, \dots, a_n) : a_k \in A_k,\ 1 \le k \le n \}.$$
An important special case of Cartesian product occurs when all of the sets in the product are identical. If $A_k = A$ for $1 \le k \le n$, where $A$ is some set, then we write $A^n$ for the Cartesian product. In other words, $A^n = A \times A \times \cdots \times A$.

EXAMPLE 1.8 Illustrates Definition 1.10
a) If at least one of $A$ and $B$ is empty, then so is $A \times B$.
b) Let $\Gamma$ and $\Lambda$ be two sets, $A \subset \Gamma$ and $B \subset \Lambda$. Then the subset $A \times B$ of $\Gamma \times \Lambda$ is called a rectangle. Note that, in general, not every subset of $\Gamma \times \Lambda$ is a rectangle.
c) The set $\mathcal{R}^n$ is called Euclidean $n$-space.
d) The set $\mathcal{C}^n$ is called unitary $n$-space. □

We can generalize the Cartesian product to any collection of sets. This is done as follows.

DEFINITION 1.11 Cartesian Product of a Collection of Sets
Let $\{A_\iota\}_{\iota \in I}$ be an indexed collection of sets. The Cartesian product of the collection, denoted $\times_{\iota \in I} A_\iota$, is the set of all functions $x$ on $I$ such that $x(\iota)$ is an element of $A_\iota$ for each $\iota \in I$. Thus,
$$\times_{\iota \in I} A_\iota = \Bigl\{ x\colon I \to \bigcup_{\iota \in I} A_\iota : x(\iota) \in A_\iota,\ \iota \in I \Bigr\}.$$
We call $x(\iota)$ the $\iota$th coordinate of $x$ and usually denote it by $x_\iota$. If $A_\iota = \emptyset$ for some $\iota \in I$, then $\times_{\iota \in I} A_\iota = \emptyset$. Conversely, in view of the axiom of choice, if $A_\iota \ne \emptyset$ for all $\iota \in I$, then $\times_{\iota \in I} A_\iota \ne \emptyset$.

An important special case of Cartesian product occurs when all of the sets in the product are identical. Suppose that $A_\iota = A$ for all $\iota \in I$, where $A$ is some set. Then we write $A^I$ for the Cartesian product. Thus, $A^I$ is the set of all functions from $I$ to $A$.
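Definition 1.11 treats an element of a product as a function on the index set. For finitely many finite sets this can be mirrored in Python by representing each element as a dictionary from indices to coordinates. The sketch below is an illustration under that finiteness assumption (the name `cartesian_product` and the sample family are ours); it uses the standard `itertools.product` to enumerate the choices, so no appeal to the axiom of choice is needed, and it shows that an empty factor gives an empty product, as noted after Definition 1.11.

```python
from itertools import product

def cartesian_product(family):
    """Cartesian product of a finite indexed family of finite sets.

    `family` maps each index i to a finite set A_i. Each element of the product
    is returned as a dict x (a function on the index set) with x[i] in A_i.
    """
    indices = list(family)
    factors = [sorted(family[i]) for i in indices]
    return [dict(zip(indices, choice)) for choice in product(*factors)]

family = {1: {"a", "b"}, 2: {0, 1, 2}, 3: {True}}
X = cartesian_product(family)

if __name__ == "__main__":
    print(len(X))  # 6 = 2 * 3 * 1
    # Identifying x with the tuple (x_1, x_2, x_3), as in the text:
    print(sorted(tuple(x[i] for i in (1, 2, 3)) for x in X)[:2])  # [('a', 0, True), ('a', 1, True)]
    print(cartesian_product({1: {"a"}, 2: set()}))  # [] since one factor is empty
```

The dictionary form corresponds to Definition 1.11 and the tuple form to Definition 1.10; converting between them is exactly the identification of a function $x$ on $\{1, \dots, n\}$ with the $n$-tuple $(x_1, \dots, x_n)$.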
EXAMPLE 1.9 Illustrates Definition 1.11
a) If $I = \{1, 2, \dots, n\}$, then we use the notation $\times_{k=1}^{n} A_k$ for the Cartesian product.
b) If $I = \{1, 2, \dots, n\}$, then we write $A^n$ in place of $A^{\{1, 2, \dots, n\}}$. Thus, $A^n$ denotes the set of all sequences of length $n$ of elements of $A$.
c) If $I = \mathcal{N}$, we use the notation $\times_{n=1}^{\infty} A_n$ for the Cartesian product.
d) If $I = \mathcal{N}$, then we sometimes write $A^\infty$ in place of $A^{\mathcal{N}}$. Thus, $A^\infty$ denotes the set of all infinite sequences of elements of $A$.
e) $\mathcal{R}^{[0,1]}$ is the set of all real-valued functions on $[0,1]$.
f) $\mathcal{C}^{\mathcal{R}}$ is the set of all complex-valued functions on $\mathcal{R}$. □

We seemingly have two different definitions of the Cartesian product of a finite number of sets, one given by Definition 1.10 and the other by Definition 1.11. However, the appropriate identification shows that the difference is only apparent. Indeed, let $A_1, A_2, \dots, A_n$ be $n$ sets. By Definition 1.10, $\times_{k=1}^{n} A_k$ is the set of all ordered $n$-tuples $(a_1, a_2, \dots, a_n)$, where $a_k \in A_k$ for $1 \le k \le n$. On the other hand, by Definition 1.11, $\times_{k=1}^{n} A_k$ is the set of all functions $x$ on $\{1, 2, \dots, n\}$ such that $x_k \in A_k$ for $1 \le k \le n$. If we identify each such function $x$ with the ordered $n$-tuple $(x_1, x_2, \dots, x_n)$, then we obtain a 1-1 correspondence between the Cartesian product $\times_{k=1}^{n} A_k$ as defined in Definition 1.11 and the Cartesian product $\times_{k=1}^{n} A_k$ as defined in Definition 1.10.

We will follow conventional notation and use the ordered $n$-tuple interpretation of the Cartesian product of a finite number of sets. Thus, for example, we construe $\mathcal{R}^n$ as the set of all ordered $n$-tuples of real numbers, realizing, however, that it can also be interpreted as the set of all sequences of length $n$ of real numbers.

EXERCISES 1.2
1.14 Suppose that $f\colon \Omega \to \Lambda$ is one-to-one and onto. Prove that $f^{-1}(f(x)) = x$ for all $x \in \Omega$ and $f(f^{-1}(y)) = y$ for all $y \in \Lambda$.
1.15 Let $f\colon \Omega \to \Lambda$.
a) Prove that $f$ is one-to-one if and only if there is a function $g\colon \Lambda \to \Omega$ such that $(g \circ f)(x) = x$ for all $x \in \Omega$.
b) Prove that $f$ is onto if and only if there is a function $g\colon \Lambda \to \Omega$ such that $(f \circ g)(y) = y$ for all $y \in \Lambda$. Hint: The axiom of choice is needed for the "only if" part.
c) Suppose there is a function $g\colon \Lambda \to \Omega$ such that $(g \circ f)(x) = x$ for all $x \in \Omega$ and $(f \circ g)(y) = y$ for all $y \in \Lambda$. Prove that $g = f^{-1}$.
1.16 Let $\{s_n\}_{n=1}^{\infty}$ be an infinite sequence of elements of $\Omega$ and $\{s_{n_k}\}_{k=1}^{\infty}$ a subsequence of $\{s_n\}_{n=1}^{\infty}$. Interpret the subsequence as a composition of two functions.
1.17 Suppose that $f\colon \Omega \to \Lambda$ is one-to-one and onto. Show that for $B \subset \Lambda$, the two definitions of $f^{-1}(B)$ are consistent; that is, the image of $B$ under $f^{-1}$ equals the inverse image of $B$ under $f$.
1.18 Prove Proposition 1.5 on page 15.
1.19 Refer to Proposition 1.5 on page 15.
a) Show that the assumption of one-to-one cannot be dropped for parts (c) and (d).
b) Show that the assumption of onto cannot be dropped for part (e).
1.20 Prove Proposition 1.6 beginning on page 15.
1.21 Let $f\colon \Omega \to \Lambda$, $A \subset \Omega$, and $B \subset \Lambda$.
a) Show that $f(f^{-1}(B)) \subset B$ and that equality holds if $f$ is onto.
b) Show that $f^{-1}(f(A)) \supset A$ and that equality holds if $f$ is one-to-one.
1.22 Show that the axiom of choice is equivalent to the following statement: If $\{A_\iota\}_{\iota \in I}$ is any indexed collection of nonempty sets, then $\times_{\iota \in I} A_\iota \ne \emptyset$.
1.23 Let $\Omega$ be a nonempty set. Construct a one-to-one function from $\mathcal{P}(\Omega)$ onto $\{0, 1\}^{\Omega}$.
1.24 Let $\Omega$ be a nonempty set. Prove that there is no function from $\Omega$ onto $\mathcal{P}(\Omega)$. Hint: Suppose to the contrary that such a function, say, $f$, exists. Let $A = \{ x \in \Omega : x \notin f(x) \}$.

1.3 EQUIVALENCE OF SETS; COUNTABILITY
We see from Proposition 1.5 on page 15 that if $f\colon \Omega \to \Lambda$ is one-to-one and onto, then, from a set-theoretic point of view, $\Omega$ and $\Lambda$ are equivalent because, under those circumstances, the set operations are preserved by $f$. Thus we can think of $f$ as simply renaming the elements of $\Omega$ according to the rule $x \mapsto f(x)$. If $f\colon \Omega \to \Lambda$ is one-to-one and onto, then it is called a 1-1 correspondence (or bijective function).

Keeping the previous paragraph in mind, we now define set equivalence. Suppose that $A$ and $B$ are any two sets. Let us write $A \sim B$ if there is a 1-1 correspondence from $A$ to $B$. We leave it as an exercise for the reader to show that for any three sets $A$, $B$, and $C$,
• $A \sim A$ [reflexive].
• $A \sim B$ implies $B \sim A$ [symmetric].
• $A \sim B$ and $B \sim C$ implies $A \sim C$ [transitive].
In view of these facts, we make the following definition.
DEFINITION 1.12 Equivalence of Sets
Two sets are said to be equivalent if there is a 1-1 correspondence from one to the other.

Finite, Infinite, Countable, and Uncountable Sets
Using the concept of equivalence of two sets, we can now present definitions regarding the "size" of a set in the sense of how many elements it contains.

DEFINITION 1.13 Finite, Infinite, Countable, and Uncountable
Let $A$ be a set. We say that
a) $A$ is finite if it is either empty or equivalent to the first $N$ positive integers for some $N \in \mathcal{N}$. In the former case, $A$ is said to consist of 0 elements and, in the latter case, $N$ elements.
b) $A$ is infinite if it is not finite.
c) $A$ is countably infinite if it is equivalent to $\mathcal{N}$.
d) $A$ is countable if it is either finite or countably infinite.
e) $A$ is uncountable if it is not countable.

EXAMPLE 1.10 Illustrates Definition 1.13
a) The set of all integers, $\mathcal{Z}$, is countably infinite. Indeed, the function $f\colon \mathcal{N} \to \mathcal{Z}$ defined by
$$f(n) = \begin{cases} n/2, & n \text{ even}; \\ -(n-1)/2, & n \text{ odd}, \end{cases}$$
is a 1-1 correspondence from $\mathcal{N}$ to $\mathcal{Z}$.
b) Any (nondegenerate) interval of $\mathcal{R}$ is uncountable. One proof of this fact is presented in Exercise 1.26 and another in Exercise 3.34 on page 126. In particular, $\mathcal{R}$ is uncountable.
c) Define $f\colon \mathcal{N}^2 \to \mathcal{N}$ by $f(m, n) = 2^{m-1}(2n - 1)$. Then it can be shown (see Exercise 1.27) that $f$ is a 1-1 correspondence. Consequently, $\mathcal{N}^2$ is countably infinite and, hence, countable. □

We can express countability in terms of sequences. If $A$ is countably infinite, then, by definition, there is a 1-1 correspondence $f\colon \mathcal{N} \to A$. If we let $s_n = f(n)$, then the infinite sequence $\{s_n\}_{n=1}^{\infty}$ is called an enumeration
of $A$. Similarly, if $A$ is a finite nonempty set, then, by definition, there is an $N \in \mathcal{N}$ and a 1-1 correspondence $f\colon \{1, 2, \dots, N\} \to A$. If we let $s_n = f(n)$, then the finite sequence $\{s_n\}_{n=1}^{N}$ is called an enumeration of $A$. In particular, we see that if a set is countably infinite, then it is the range of an infinite sequence (but not conversely); and that if a set is finite and nonempty, then it is the range of a finite sequence (and conversely). The following proposition is also quite useful.

PROPOSITION 1.7
A nonempty set is countable if and only if it is the range of an infinite sequence.

PROOF: Suppose $A$ is countable. Then, by definition, it is either finite or countably infinite. If it is countably infinite, then, by definition, it is equivalent to $\mathcal{N}$, which means there is a one-to-one and onto function, $f$, from $\mathcal{N}$ to $A$. Letting $s_n = f(n)$, we have that $A$ is the range of the infinite sequence $\{s_n\}_{n=1}^{\infty}$. If $A$ is finite (and nonempty), then, by definition, there is an $N \in \mathcal{N}$ such that $A$ is equivalent to the first $N$ positive integers. Let $g$ be a one-to-one and onto function from $\{1, 2, \dots, N\}$ to $A$. Select $x \in A$ and define the infinite sequence $s$ by $s_n = g(n)$ for $n = 1, 2, \dots, N$, and $s_n = x$ for $n > N$. Then $A$ is the range of the infinite sequence $\{s_n\}_{n=1}^{\infty}$.

Conversely, suppose $A$ is the range of an infinite sequence $\{s_n\}_{n=1}^{\infty}$. We claim that $A$ is countable. If $A$ is finite, there is nothing to prove. So, assume that $A$ is infinite. We will construct a 1-1 correspondence from $\mathcal{N}$ to $A$, thereby proving that $A$ is countably infinite and, hence, countable.

Let $n_1 = 1$. Since $A$ is infinite, $A \setminus \{s_{n_1}\} \ne \emptyset$. Therefore, because the range of $\{s_n\}_{n=1}^{\infty}$ is $A$, the set $\{ n \in \mathcal{N} : s_n \ne s_{n_1} \}$ is not empty. Denote by $n_2$ the smallest integer in that set.† Note that $n_1 < n_2$.

Proceeding inductively, note again that since $A$ is infinite, we have that $A \setminus \{s_{n_1}, s_{n_2}, \dots, s_{n_k}\} \ne \emptyset$. Therefore, as the range of $\{s_n\}_{n=1}^{\infty}$ is $A$, the set
$$\{ n \in \mathcal{N} : s_n \ne s_{n_j},\ 1 \le j \le k \}$$
is not empty. Denote by $n_{k+1}$ the smallest integer in that set and note that $n_k < n_{k+1}$.

We claim that the function $f\colon \mathcal{N} \to A$ defined by $f(k) = s_{n_k}$ is a 1-1 correspondence. By construction, $f$ is one-to-one. So it remains to show that $f$ is onto. Let $x \in A$. Since the range of $\{s_n\}_{n=1}^{\infty}$ is $A$, the set $\{ n \in \mathcal{N} : s_n = x \}$ is not empty. Let $m$ be the smallest integer in that set. If $m = 1$, then $x = s_1 = s_{n_1} = f(1)$. Otherwise, let $k$ be the smallest integer such that $m \le n_k$ (such a $k$ exists because the strictly increasing sequence of positive integers $\{n_k\}_{k=1}^{\infty}$ satisfies $n_k \ge k$, and $k \ge 2$ because $m > 1 = n_1$). Because $s_n \ne x = s_m$ for $n < m$, we have that $s_m \ne s_{n_j}$ for $1 \le j \le k - 1$; that is, $m$ belongs to the set whose smallest element is $n_k$, which implies that $m \ge n_k$. Therefore, $m = n_k$ and, consequently, $x = s_{n_k} = f(k)$.

† Here and elsewhere in this proof we are using the well-ordering principle: Each nonempty subset of positive integers has a smallest element.

Proposition 1.7 often provides an efficient method for proving that a set is countable. The next two propositions illustrate that fact.

PROPOSITION 1.8
A subset of a countable set is countable.

PROOF: Let $A$ be a countable set and $B \subset A$. We claim that $B$ is countable. If $B = \emptyset$, there is nothing to prove; so, assume $B \ne \emptyset$. This implies that $A$ is nonempty and, hence, by Proposition 1.7, $A$ is the range of an infinite sequence, $\{s_n\}_{n=1}^{\infty}$. Choose $x \in B$ and let $t_n = s_n$ if $s_n \in B$ and $t_n = x$ if $s_n \notin B$. Then $B$ is the range of the infinite sequence $\{t_n\}_{n=1}^{\infty}$. Applying Proposition 1.7 again, we conclude that $B$ is countable.

PROPOSITION 1.9
The image of a countable set is countable.

PROOF: Let $A$ be a countable set and $f$ a function defined on $A$. If $A = \emptyset$, then $f(A) = \emptyset$, which is countable; so, assume $A \ne \emptyset$. By Proposition 1.7, $A$ is the range of an infinite sequence, $\{s_n\}_{n=1}^{\infty}$. For each $n \in \mathcal{N}$, define $t_n = f(s_n)$. Now, let $y \in f(A)$. Then there is an $x \in A$ such that $f(x) = y$. Since $A$ is the range of the infinite sequence $\{s_n\}_{n=1}^{\infty}$, there is an $n \in \mathcal{N}$ such that $s_n = x$. Therefore, $y = f(x) = f(s_n) = t_n$. This shows that $f(A)$ is the range of the infinite sequence $\{t_n\}_{n=1}^{\infty}$. Hence, by Proposition 1.7, $f(A)$ is countable.

PROPOSITION 1.10
A countable union of countable sets is countable.

PROOF: Let $\mathcal{C}$ be a countable collection of countable subsets of a set $\Omega$ and let $A = \bigcup_{C \in \mathcal{C}} C$. We must prove that $A$ is countable. If $\mathcal{C}$ is empty, then its union is empty and hence countable. So, assume that $\mathcal{C}$ is a nonempty collection. Without loss of generality, we can also assume that each member of $\mathcal{C}$ is nonempty. Since $\mathcal{C}$ is nonempty and countable, Proposition 1.7 implies that it is the range of an infinite sequence $\{A_n\}_{n=1}^{\infty}$ and, since each member of $\mathcal{C}$ is
countable, Proposition 1.7 implies that each $A_m$ is the range of an infinite sequence $\{x_{mn}\}_{n=1}^{\infty}$. Now, define $f\colon \mathcal{N}^2 \to A$ by $f(m, n) = x_{mn}$ and note that $f$ is onto. By Example 1.10(c) on page 21, $\mathcal{N}^2$ is countable. Therefore, by Proposition 1.9, $A$ is countable, being the image of $\mathcal{N}^2$ under $f$.

In Example 1.10(c) we pointed out that $\mathcal{N}^2$, the Cartesian product of $\mathcal{N}$ with itself, is countable. More generally, we have the following fact.

PROPOSITION 1.11
The Cartesian product of two countable sets is countable.

PROOF: Let $A$ and $B$ be two countable sets. If either $A$ or $B$ is empty, then so is $A \times B$. So assume that both $A$ and $B$ are nonempty. By Proposition 1.7, $A$ and $B$ are the ranges of infinite sequences, say, $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$. Define $f\colon \mathcal{N}^2 \to A \times B$ by $f(m, n) = (a_m, b_n)$. Then $f$ is onto and, consequently, because $\mathcal{N}^2$ is countable, Proposition 1.9 implies that $A \times B$ is countable.

We can easily extend Proposition 1.11 to any finite number of sets. This will be explored in the exercises.

PROPOSITION 1.12
The set $\mathcal{Q}$ of rational numbers is countable.

PROOF: Example 1.10(a) on page 21 shows that $\mathcal{Z}$ is countable. Hence, by Proposition 1.11, so is $\mathcal{Z} \times \mathcal{N}$. Define $f\colon \mathcal{Z} \times \mathcal{N} \to \mathcal{Q}$ by $f(z, n) = z/n$. Since $f$ is onto, Proposition 1.9 implies that $\mathcal{Q}$ is countable.

EXERCISES 1.3
1.25 If $A$ and $B$ are sets, write $A \sim B$ if there is a 1-1 correspondence from $A$ to $B$. Prove that $\sim$ is reflexive, symmetric, and transitive. In other words, if $A$, $B$, and $C$ are sets, show that the following hold.
a) $A \sim A$ [reflexive]
b) $A \sim B$ implies $B \sim A$ [symmetric]
c) $A \sim B$ and $B \sim C$ implies $A \sim C$ [transitive]
★1.26 In this exercise, we will prove that any (nondegenerate) interval of $\mathcal{R}$ is uncountable.
a) Show that the interval $[0, 1)$ is uncountable. Hint: Suppose to the contrary that $[0, 1)$ is countable and let $\{x_n\}_{n=1}^{\infty}$ be an enumeration of its
elements. For each $n \in \mathcal{N}$, let $0.d_{n1}d_{n2}\ldots$ denote the unique decimal expansion of $x_n$ that does not end in an infinite string of 9s. Then consider the number $0.a_1a_2\ldots$, where $a_n = 1$ if $d_{nn} = 0$ and $a_n = 0$ otherwise.
b) Use part (a) to conclude that $(0, 1)$ is uncountable.
c) Use part (b) to show that any bounded interval of the form $(a, b)$ is uncountable. Hint: Construct a one-to-one and onto function from $(0, 1)$ to $(a, b)$.
d) Use part (c) to conclude that any (nondegenerate) interval is uncountable. In particular, $\mathcal{R}$ is uncountable.
1.27 Refer to Example 1.10(c) on page 21. Prove that the function $f\colon \mathcal{N}^2 \to \mathcal{N}$ defined by $f(m, n) = 2^{m-1}(2n - 1)$ is a 1-1 correspondence.
★1.28 Prove that any infinite set contains a countably infinite subset.
1.29 Let $A$ be a set. Prove that the following statements are equivalent.
a) $A$ is infinite.
b) There is a one-to-one function $f\colon A \to A$ that is not onto.
c) There is an onto function $g\colon A \to A$ that is not one-to-one.
1.30 Suppose that $f\colon A \to B$ is one-to-one and that $B$ is countable. Prove that $A$ is countable.
1.31 Prove that the Cartesian product of a finite number of countable sets is countable.
1.32 In Proposition 1.11, we proved that the Cartesian product of two countable sets is countable and, in Exercise 1.31, we showed that the Cartesian product of a finite number of countable sets is countable. Is it true, in general, that the Cartesian product of a countable number of countable sets is countable?
★1.33 Let $\Omega$ be a set. A relation $\equiv$ on $\Omega$ is said to be an equivalence relation if for all $x, y, z \in \Omega$,
• $x \equiv x$ [reflexive]
• $x \equiv y$ implies $y \equiv x$ [symmetric]
• $x \equiv y$ and $y \equiv z$ implies $x \equiv z$ [transitive]
a) Give three examples of equivalence relations.
b) Give three examples of relations that are not equivalence relations.
1.34 Refer to Exercise 1.33. Let $\Omega$ be a nonempty set and $\equiv$ an equivalence relation on $\Omega$. For each $x \in \Omega$, define
$$E_x = \{ y \in \Omega : y \equiv x \}.$$
And let $\mathcal{C} = \{ E_x : x \in \Omega \}$.
Each member of $\mathcal{C}$ is called an equivalence class of $\Omega$ under $\equiv$.
a) Show that for each $x, y \in \Omega$, either $E_x \cap E_y = \emptyset$ or $E_x = E_y$.
b) Prove that $\Omega = \bigcup_{E \in \mathcal{C}} E$.
c) Conclude that $\equiv$ partitions $\Omega$ into disjoint equivalence classes; that is, $\Omega$ is a disjoint union of the equivalence classes under $\equiv$.
1.35 Let $a$ and $b$ be real numbers such that $a < b$. Prove that the intervals $(a, b)$ and $[a, b]$ are equivalent.
1.36 Prove the Schröder-Bernstein theorem: Suppose that $A$ and $B$ are sets and that there are one-to-one functions $f\colon A \to B$ and $g\colon B \to A$. Then $A \sim B$. Proceed as follows. Define
$$\tau(E) = \bigl(g\bigl((f(E))^c\bigr)\bigr)^c, \qquad E \subset A.$$
a) Show that if $E \subset F \subset A$, then $\tau(E) \subset \tau(F)$.
b) Let $\mathcal{C} = \{ E \subset A : E \subset \tau(E) \}$ and set $G = \bigcup_{E \in \mathcal{C}} E$. Prove that $\tau(G) = G$ and, hence, that $G^c = g\bigl((f(G))^c\bigr)$. In particular, $G^c$ is a subset of the range of $g$.
c) Define $h\colon A \to B$ by
$$h(x) = \begin{cases} f(x), & x \in G; \\ g^{-1}(x), & x \in G^c. \end{cases}$$
Prove that $h$ is a 1-1 correspondence.

1.4 ALGEBRAS, σ-ALGEBRAS, AND MONOTONE CLASSES
In set theory, as elsewhere in mathematics, it is important to distinguish collections that are closed under the relevant operations.† For example, in linear algebra, the relevant operations are vector addition and scalar multiplication. Subsets of vector spaces closed under those operations are called subspaces and receive intensive study because of their significance.

Algebras
The three basic operations in set theory are union, intersection, and complementation. A nonempty collection of sets closed under these operations is called an algebra of sets. Thus, we make the following definition.

DEFINITION 1.14 Algebra of Sets
Let $\Omega$ be a set. A nonempty collection $\mathcal{A}_0$ of subsets of $\Omega$ is called an algebra if the following two conditions are satisfied:
a) $A \in \mathcal{A}_0$ implies $A^c \in \mathcal{A}_0$.
b) $A, B \in \mathcal{A}_0$ implies $A \cup B \in \mathcal{A}_0$.

† Roughly speaking, a collection (set) $\mathcal{C}$ is closed under an operation if whenever the operation is applied to elements of $\mathcal{C}$, the resulting element also belongs to $\mathcal{C}$.
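When $\Omega$ is finite, Definition 1.14 can be tested mechanically. The Python sketch below is only an illustration (the helper name `is_algebra` and the sample sets are ours); it checks nonemptiness and closure under complementation and pairwise union, and confirms that a collection of the form $\{\emptyset, A, A^c, \Omega\}$, as in Example 1.11(c), is an algebra for one particular $A$. Because the check enumerates every member and every pair, it applies only to finite collections of subsets of a finite set.

```python
from itertools import combinations

def is_algebra(omega, collection):
    """Brute-force check of Definition 1.14 for subsets of a finite set omega.

    `collection` is a set of frozensets, each of which should be a subset of omega.
    """
    omega = frozenset(omega)
    if not collection or any(not a <= omega for a in collection):
        return False
    closed_under_complement = all((omega - a) in collection for a in collection)
    closed_under_union = all((a | b) in collection
                             for a, b in combinations(collection, 2))
    return closed_under_complement and closed_under_union

omega = frozenset({1, 2, 3, 4})
A = frozenset({1, 2})
example = {frozenset(), A, omega - A, omega}

if __name__ == "__main__":
    print(is_algebra(omega, example))                               # True
    print(is_algebra(omega, {frozenset(), frozenset({1}), omega}))  # False: {1}^c missing
```

Closure under intersection is deliberately not tested here, since the text notes that it follows from the two conditions of Definition 1.14.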
Conspicuous by its absence in Definition 1.14 is closure under intersection. However, it is easy to show that this property follows from the two stated in the definition: an algebra is necessarily closed under intersection; that is, if $\mathcal{A}_0$ is an algebra and $A, B \in \mathcal{A}_0$, then $A \cap B \in \mathcal{A}_0$. We leave the proof of this fact to the reader as an exercise. We also leave it to the reader to prove the following two facts:
• An algebra is closed under finite unions and intersections; that is, if $\mathcal{A}_0$ is an algebra and $A_k \in \mathcal{A}_0$ for $k = 1, 2, \dots, n$, then $\bigcup_{k=1}^{n} A_k \in \mathcal{A}_0$ and $\bigcap_{k=1}^{n} A_k \in \mathcal{A}_0$.
• A nonempty collection of subsets of $\Omega$ is an algebra if it is closed under complementation and intersection.

EXAMPLE 1.11 Illustrates Definition 1.14
Let $\Omega$ be a nonempty set. It is easy to see that each of the following is an algebra of subsets of $\Omega$:
a) the power set, $\mathcal{P}(\Omega)$, that is, the set of all subsets of $\Omega$;
b) the trivial algebra, $\{\emptyset, \Omega\}$; and
c) $\{\emptyset, A, A^c, \Omega\}$, where $A$ is a nonempty proper subset of $\Omega$. □

Next we will prove that the union of a sequence of members of an algebra can always be expressed as a disjoint union of members of the algebra. More precisely, we have the following useful proposition.

PROPOSITION 1.13
Let $\mathcal{A}_0$ be an algebra of subsets of $\Omega$ and $\{A_n\}_n$ a sequence of sets in $\mathcal{A}_0$ (i.e., $A_n \in \mathcal{A}_0$ for each $n$). Then there is a pairwise disjoint sequence $\{B_n\}_n$ of sets in $\mathcal{A}_0$ such that $\bigcup_n A_n = \bigcup_n B_n$.

PROOF: The proof uses a process that we will refer to informally as disjointizing. Let $B_1 = A_1$ and, for $n \ge 2$, let
$$B_n = A_n \setminus \Bigl( \bigcup_{k=1}^{n-1} A_k \Bigr).$$
First we prove that $B_n \in \mathcal{A}_0$ for each $n$. Let $C_n = \bigcup_{k=1}^{n-1} A_k$. Since $\mathcal{A}_0$ is an algebra, we have, in turn, that $C_n \in \mathcal{A}_0$ (because $\mathcal{A}_0$ is closed under finite unions), $C_n^c \in \mathcal{A}_0$ (because $\mathcal{A}_0$ is closed under complementation), and $B_n = A_n \setminus C_n = A_n \cap C_n^c \in \mathcal{A}_0$ (because $\mathcal{A}_0$ is closed under intersection). The sets $B_n$ are also pairwise disjoint: if $m < n$, then $B_m \subset A_m \subset C_n$, whereas $B_n \subset C_n^c$, so $B_m \cap B_n = \emptyset$.

Next we show that $\bigcup_n A_n = \bigcup_n B_n$. Since $B_n \subset A_n$ for each $n$, it is clear that $\bigcup_n A_n \supset \bigcup_n B_n$.
To show the reverse inclusion, let $x \in \bigcup_n A_n$. Then $x \in A_n$ for some $n$. Let $m$ be the smallest such $n$. If $m = 1$, then $x \in A_1 = B_1$. If $m \ge 2$, we have $x \in A_m$ and $x \notin A_k$ for $k < m$. This
implies that $x \in A_m$ but $x \notin \bigcup_{k=1}^{m-1} A_k$; in other words,
$$x \in A_m \setminus \Bigl( \bigcup_{k=1}^{m-1} A_k \Bigr) = B_m \subset \bigcup_n B_n.$$
Thus, $\bigcup_n A_n \subset \bigcup_n B_n$.

It is useful to know that given a collection of subsets, there is a smallest algebra containing the collection. We state this fact formally in the following proposition, whose proof is left to the reader as an exercise.

PROPOSITION 1.14
Let $\mathcal{C}$ be a nonempty collection of subsets of $\Omega$. Then there is a smallest algebra of subsets of $\Omega$ containing $\mathcal{C}$.

The smallest algebra containing a collection $\mathcal{C}$ of subsets of $\Omega$ is called the algebra generated by $\mathcal{C}$ and is denoted $\mathcal{A}_0(\mathcal{C})$. Thus, $\mathcal{A}_0(\mathcal{C})$ is an algebra of subsets of $\Omega$; $\mathcal{C} \subset \mathcal{A}_0(\mathcal{C})$; and if $\mathcal{A}_0$ is an algebra of subsets of $\Omega$ such that $\mathcal{C} \subset \mathcal{A}_0$, then $\mathcal{A}_0 \supset \mathcal{A}_0(\mathcal{C})$. As a simple example, let $A$ be a nonempty proper subset of a set $\Omega$. Then $\mathcal{A}_0(\{A\}) = \{\emptyset, A, A^c, \Omega\}$.

σ-Algebras
As we have seen, an algebra of sets is closed under finite unions (and intersections). For the purposes of modern mathematics, a stronger condition is usually required, namely, closure under countably infinite unions (and intersections). Hence, we make the following definition.

DEFINITION 1.15 σ-Algebra of Sets
Let $\Omega$ be a set. A nonempty collection $\mathcal{A}$ of subsets of $\Omega$ is called a σ-algebra if the following two conditions are satisfied:
a) $A \in \mathcal{A}$ implies $A^c \in \mathcal{A}$.
b) $\{A_n\}_n \subset \mathcal{A}$ implies $\bigcup_n A_n \in \mathcal{A}$.

Using the same type of argument used for algebras, we can show that a σ-algebra is necessarily closed under countable intersections; that is, if $\mathcal{A}$ is a σ-algebra and $\{A_n\}_n \subset \mathcal{A}$, then $\bigcap_n A_n \in \mathcal{A}$. We leave the proof
of this fact to the reader as an exercise. We also leave it to the reader to prove that a nonempty collection of subsets of $\Omega$ is a σ-algebra if it is closed under complementation and countable intersections.

EXAMPLE 1.12 Illustrates Definition 1.15
a) Clearly, any σ-algebra is an algebra. However, the converse is not true. See the exercises for several examples.
b) The three algebras given in Example 1.11 are also σ-algebras. □

Additional examples of σ-algebras are presented in the exercises. We will also encounter several σ-algebras in future chapters; for instance, in Chapter 3, we will discuss the σ-algebra of Borel sets and the σ-algebra of Lebesgue measurable sets.

It is useful to know that given a collection of subsets, there is a smallest σ-algebra containing the collection. We state this fact formally in the following proposition, whose proof is left to the reader as an exercise.

PROPOSITION 1.15
Let $\mathcal{C}$ be a nonempty collection of subsets of $\Omega$. Then there is a smallest σ-algebra of subsets of $\Omega$ containing $\mathcal{C}$.

The smallest σ-algebra containing a collection $\mathcal{C}$ of subsets of $\Omega$ is called the σ-algebra generated by $\mathcal{C}$ and is denoted $\mathcal{A}(\mathcal{C})$. Thus, $\mathcal{A}(\mathcal{C})$ is a σ-algebra of subsets of $\Omega$; $\mathcal{C} \subset \mathcal{A}(\mathcal{C})$; and if $\mathcal{A}$ is a σ-algebra of subsets of $\Omega$ such that $\mathcal{C} \subset \mathcal{A}$, then $\mathcal{A} \supset \mathcal{A}(\mathcal{C})$. As a simple example, let $A$ be a nonempty proper subset of a set $\Omega$. Then $\mathcal{A}(\{A\}) = \{\emptyset, A, A^c, \Omega\}$.

Monotone Classes and the Monotone Class Theorem
Besides algebras and σ-algebras, we also need to consider monotone classes. Here is the definition of a monotone class.

DEFINITION 1.16 Monotone Class
Let $\Omega$ be a set. A nonempty collection $\mathcal{D}$ of subsets of $\Omega$ is called a monotone class if it satisfies the following two conditions:
a) $\{D_n\}_{n=1}^{\infty} \subset \mathcal{D}$ and $D_1 \subset D_2 \subset \cdots$ implies $\bigcup_{n=1}^{\infty} D_n \in \mathcal{D}$.
b) $\{D_n\}_{n=1}^{\infty} \subset \mathcal{D}$ and $D_1 \supset D_2 \supset \cdots$ implies $\bigcap_{n=1}^{\infty} D_n \in \mathcal{D}$.
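The disjointizing construction from the proof of Proposition 1.13 is easy to run on finite examples. In this Python sketch (illustrative only; the name `disjointize` and the sample sets are ours), the running union plays the role of $C_n = \bigcup_{k=1}^{n-1} A_k$, and the returned list gives $B_1, B_2, \ldots$ computed as $B_n = A_n \setminus C_n$.

```python
def disjointize(sets):
    """Return [B_1, B_2, ...] with B_1 = A_1 and B_n = A_n minus (A_1 u ... u A_{n-1})."""
    result = []
    running_union = set()  # plays the role of C_n = A_1 u ... u A_{n-1}
    for a in sets:
        a = set(a)
        result.append(a - running_union)
        running_union |= a
    return result

A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 6}]
B = disjointize(A)

if __name__ == "__main__":
    print(B)  # [{1, 2, 3}, {4}, {5}, {6}]
```

The output sets are pairwise disjoint, each $B_n$ is contained in $A_n$, and their union equals the union of the $A_n$, which are the three properties established in the proof.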
Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of subsets of $\Omega$. If $A_1 \subset A_2 \subset \cdots$, then the sequence is said to be monotone nondecreasing or, more simply, nondecreasing. If $A_1 \supset A_2 \supset \cdots$, then the sequence is said to be monotone nonincreasing or, more simply, nonincreasing. A sequence of subsets is called monotone if it is either monotone nondecreasing or monotone nonincreasing. Using this terminology, we see that a monotone class is a collection of subsets that is closed under unions of nondecreasing sequences and intersections of nonincreasing sequences.

EXAMPLE 1.13 Illustrates Definition 1.16
a) Any σ-algebra is a monotone class.
b) Let $A \subset \Omega$ and $\mathcal{D} = \{A\}$. Then, trivially, $\mathcal{D}$ is a monotone class. Note, however, that it is not a σ-algebra. □

For us, the most important result regarding monotone classes is the following theorem, known as the monotone class theorem.

THEOREM 1.1 Monotone Class Theorem
Let $\Omega$ be a set and $\mathcal{A}_0$ an algebra of subsets of $\Omega$. Let $\mathcal{D}$ be a collection of subsets of $\Omega$ such that $\mathcal{D} \supset \mathcal{A}_0$ and $\mathcal{D}$ is a monotone class. Then $\mathcal{D} \supset \mathcal{A}(\mathcal{A}_0)$, the σ-algebra generated by $\mathcal{A}_0$.

PROOF: Let $\mathcal{F}$ be the smallest monotone class that contains $\mathcal{A}_0$. (Exercise 1.53 guarantees the existence of $\mathcal{F}$.) We claim that $\mathcal{F} = \mathcal{A}(\mathcal{A}_0)$. Since every σ-algebra is a monotone class, we have $\mathcal{A}(\mathcal{A}_0) \supset \mathcal{F}$. If we can show that $\mathcal{F}$ is a σ-algebra, that will imply $\mathcal{A}(\mathcal{A}_0) \subset \mathcal{F}$, and the desired equality will follow.

First we show that $\mathcal{F}$ is an algebra. Suppose $A \in \mathcal{A}_0$, and let
$$\mathcal{E} = \{ F \in \mathcal{F} : A \cup F \in \mathcal{F} \}.$$
We will show that $\mathcal{E}$ is a monotone class containing $\mathcal{A}_0$. Since $\mathcal{A}_0$ is an algebra and $\mathcal{F} \supset \mathcal{A}_0$, it follows that $\mathcal{E} \supset \mathcal{A}_0$. Now suppose $\{E_n\}_n \subset \mathcal{E}$ and $E_1 \subset E_2 \subset \cdots$. Because $\{E_n\}_n \subset \mathcal{F}$ and $\mathcal{F}$ is a monotone class, $\bigcup_n E_n \in \mathcal{F}$. And, because $\mathcal{F}$ is a monotone class, $\{A \cup E_n\}_n \subset \mathcal{F}$, and $A \cup E_1 \subset A \cup E_2 \subset \cdots$, we have that $A \cup \bigl(\bigcup_n E_n\bigr) = \bigcup_n (A \cup E_n) \in \mathcal{F}$. Therefore, $\bigcup_n E_n \in \mathcal{E}$. Similarly, $\mathcal{E}$ is closed under intersections of nonincreasing sequences. So $\mathcal{E}$ is a monotone class containing $\mathcal{A}_0$ and, consequently, $\mathcal{E} \supset \mathcal{F}$.
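But, by definition, $\mathcal{E} \subset \mathcal{F}$. Thus, $\mathcal{E} = \mathcal{F}$. In other words, $A \cup F \in \mathcal{F}$ for all $A \in \mathcal{A}_0$ and $F \in \mathcal{F}$.

Now suppose $G \in \mathcal{F}$, and let $\mathcal{G} = \{ F \in \mathcal{F} : F \cup G \in \mathcal{F} \}$. We will show that $\mathcal{G}$ is a monotone class containing $\mathcal{A}_0$. From the previous paragraph, we know that $\mathcal{G} \supset \mathcal{A}_0$ and, using the same argument as in that paragraph, we can show that $\mathcal{G}$ is a monotone class. This implies that $\mathcal{G} = \mathcal{F}$. In other words, $F \cup G \in \mathcal{F}$ for all $F, G \in \mathcal{F}$. Hence, $\mathcal{F}$ is closed under union.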
Next we show that $\mathcal{F}$ is closed under complementation. To that end, let $\mathcal{H} = \{ F \in \mathcal{F} : F^c \in \mathcal{F} \}$. Because $\mathcal{A}_0$ is an algebra and $\mathcal{F} \supset \mathcal{A}_0$, it follows that $\mathcal{H} \supset \mathcal{A}_0$. Also, because $\mathcal{F}$ is a monotone class, it is easy to see that $\mathcal{H}$ is a monotone class (by De Morgan's laws, the complements of the members of a nondecreasing sequence form a nonincreasing sequence, and vice versa). Therefore, $\mathcal{H} = \mathcal{F}$; that is, $\mathcal{F}$ is closed under complementation.

We have now shown that $\mathcal{F}$ is an algebra of sets. To show that it is a σ-algebra, we need only prove that it is closed under countably infinite unions. So let $\{F_n\}_{n=1}^{\infty} \subset \mathcal{F}$. For $n \in \mathcal{N}$, let $E_n = \bigcup_{k=1}^{n} F_k$. Then $E_1 \subset E_2 \subset \cdots$ and $\bigcup_{n=1}^{\infty} E_n = \bigcup_{n=1}^{\infty} F_n$. Since $\mathcal{F}$ is an algebra, $\{E_n\}_{n=1}^{\infty} \subset \mathcal{F}$ and, therefore, since $\mathcal{F}$ is a monotone class, $\bigcup_{n=1}^{\infty} F_n = \bigcup_{n=1}^{\infty} E_n \in \mathcal{F}$. Hence, $\mathcal{F}$ is a σ-algebra.

We now know that $\mathcal{F} = \mathcal{A}(\mathcal{A}_0)$. That is, the smallest monotone class that contains $\mathcal{A}_0$ is $\mathcal{A}(\mathcal{A}_0)$. Because $\mathcal{D}$ is a monotone class that contains $\mathcal{A}_0$, it must be that $\mathcal{D} \supset \mathcal{A}(\mathcal{A}_0)$.

In proving the monotone class theorem, we showed that the smallest monotone class that contains an algebra of subsets is the σ-algebra generated by the algebra. That result is important in its own right.

EXERCISES 1.4
1.37 Suppose that $\mathcal{A}_0$ is an algebra.
a) Show that $\mathcal{A}_0$ is closed under intersection; that is, $A, B \in \mathcal{A}_0$ implies $A \cap B \in \mathcal{A}_0$.
b) Show that $\emptyset \in \mathcal{A}_0$.
1.38 Prove that an algebra is closed under finite unions and intersections.
1.39 Show that if a nonempty collection of subsets of $\Omega$ is closed under complementation and intersection, then it is an algebra.
1.40 Let $\Omega$ be an infinite set and $\mathcal{D} = \{ A \subset \Omega : A \text{ is finite or } A^c \text{ is finite} \}$. Prove that $\mathcal{D}$ is an algebra.
1.41 This exercise generalizes Example 1.11(c). Suppose that $\{A_k\}_{k=1}^{n}$ is a pairwise disjoint finite sequence of nonempty subsets of $\Omega$ whose union is $\Omega$. Let $\mathcal{D}$ be the collection of all finite (including empty) unions of members of $\{A_k\}_{k=1}^{n}$. Prove that $\mathcal{D}$ is an algebra.
1.42 Let $\mathcal{C}$ denote the collection of all intervals of $\mathcal{R}$, including degenerate intervals of the form $[a, a]$ and $(a, a)$.
And let $\mathcal{D}$ be the collection of finite disjoint unions of members of $\mathcal{C}$. Prove that $\mathcal{D}$ is an algebra. Hint: First show that $\mathcal{D}$ is closed under intersection and then under complementation.
1.43 Let $\Omega$ be a set. A nonempty collection $\mathcal{S}$ of subsets of $\Omega$ is called a semialgebra if the following conditions hold:
• $A, B \in \mathcal{S}$ implies $A \cap B \in \mathcal{S}$.
• $A \in \mathcal{S}$ implies that either $A^c = \emptyset$ or there is a pairwise disjoint finite sequence $\{A_k\}_{k=1}^{n}$ of members of $\mathcal{S}$ such that $A^c = \bigcup_{k=1}^{n} A_k$.
In words, $\mathcal{S}$ is a semialgebra if it is closed under intersection and the complement of each member of $\mathcal{S}$ is a finite (possibly empty) disjoint union of members of $\mathcal{S}$.
a) Show that any algebra is a semialgebra.
b) Give two examples of semialgebras that are not algebras.
c) Let $\{A_k\}_{k=1}^{n}$ be a pairwise disjoint finite sequence of nonempty subsets of $\Omega$ whose union is $\Omega$. Set $\mathcal{S} = \{\emptyset\} \cup \{ A_k : 1 \le k \le n \}$. Prove that $\mathcal{S}$ is a semialgebra.
d) Let $\mathcal{C}$ denote the collection of all intervals of $\mathcal{R}$, including degenerate intervals of the form $[a, a]$ and $(a, a)$. Show that $\mathcal{C}$ is a semialgebra.
e) Let $\mathcal{S}$ be a semialgebra and $\mathcal{D}$ the collection consisting of the empty set and all finite disjoint unions of members of $\mathcal{S}$. Prove that $\mathcal{D}$ is an algebra. Hint: First show that $\mathcal{D}$ is closed under intersection and then under complementation.
1.44 Prove Proposition 1.14: Let $\mathcal{C}$ be a nonempty collection of subsets of $\Omega$. Then there is a smallest algebra of subsets of $\Omega$ containing $\mathcal{C}$. Hint: Consider the collection of all algebras of subsets of $\Omega$ that contain $\mathcal{C}$.
1.45 Refer to Exercise 1.43. In part (e) of that exercise, we proved that the collection $\mathcal{D}$ consisting of the empty set and all finite disjoint unions of members of a semialgebra, $\mathcal{S}$, constitutes an algebra. Show that $\mathcal{D}$ is the algebra generated by $\mathcal{S}$.
1.46 Prove each of the following facts.
a) A σ-algebra is closed under countable intersections; that is, if $\mathcal{A}$ is a σ-algebra and $\{A_n\}_n \subset \mathcal{A}$, then $\bigcap_n A_n \in \mathcal{A}$.
b) A nonempty collection of subsets of $\Omega$ is a σ-algebra if it is closed under complementation and countable intersections.
1.47 In this exercise, we will provide two examples of algebras that are not σ-algebras.
a) Prove that the collection $\mathcal{D}$ defined in Exercise 1.40, although an algebra, is not a σ-algebra.
b) Prove that the collection $\mathcal{D}$ defined in Exercise 1.42, although an algebra, is not a σ-algebra.

1.48 Show that the collection $\mathcal{D}$ defined in Exercise 1.41 is a σ-algebra.

1.49 Suppose that $\{A_n\}_{n=1}^{\infty}$ is a pairwise disjoint sequence of nonempty subsets of $\Omega$ whose union is $\Omega$.
a) Prove that the collection $\mathcal{D}$ of countable (including empty) unions of members of $\{A_n\}_{n=1}^{\infty}$ is a σ-algebra.
1.4 Algebras, σ-Algebras, and Monotone Classes □ 33

b) Prove that the collection $\mathcal{E}$ of finite (including empty) unions of members of $\{A_n\}_{n=1}^{\infty}$ is not an algebra and, hence, not a σ-algebra.

1.50 Prove Proposition 1.15 on page 29: Let $\mathcal{C}$ be a nonempty collection of subsets of $\Omega$. Then there is a smallest σ-algebra of subsets of $\Omega$ containing $\mathcal{C}$. Hint: Consider the collection of all σ-algebras of subsets of $\Omega$ that contain $\mathcal{C}$.

1.51 Refer to Exercise 1.49, where $\{A_n\}_{n=1}^{\infty}$ is a pairwise disjoint sequence of nonempty subsets of $\Omega$ whose union is $\Omega$. In part (a) of that exercise, we proved that the collection $\mathcal{D}$ of countable (including empty) unions of members of $\{A_n\}_{n=1}^{\infty}$ is a σ-algebra. Show that $\mathcal{D}$ is the σ-algebra generated by that sequence.

1.52 Let $\Omega$ be a set. Prove that an algebra of subsets of $\Omega$ is a σ-algebra if and only if it is a monotone class.

1.53 Let $\mathcal{C}$ be a nonempty collection of subsets of $\Omega$. Prove that there is a smallest monotone class of subsets of $\Omega$ containing $\mathcal{C}$.
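For a finite set $\Omega$, the constructions in Exercises 1.43(c), 1.43(e), and 1.45 can be explored by brute force. The Python sketch below is only an illustration, not a proof, and the helper names `unions_of_partition` and `is_algebra` are our own. It assumes that $\Omega$ is finite and that the blocks $A_1, \dots, A_n$ are nonempty, pairwise disjoint, and cover $\Omega$; in that case the finite disjoint unions of members of $\mathcal{S} = \{\emptyset\} \cup \{A_k : 1 \le k \le n\}$ are exactly the unions of subfamilies of blocks.

```python
from itertools import combinations

# Illustrative helpers for a FINITE set omega (an assumption; the exercises
# allow arbitrary sets). Subsets are represented as frozensets.

def unions_of_partition(blocks):
    """All unions of subfamilies of `blocks` (the empty subfamily gives the empty set).

    `blocks` is assumed to be a list of nonempty, pairwise disjoint frozensets.
    """
    result = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            result.add(frozenset().union(*chosen))
    return result

def is_algebra(omega, family):
    """Check the algebra axioms for a finite family of subsets of omega."""
    if not family:
        return False
    for a in family:
        if omega - a not in family:      # closed under complementation
            return False
        for b in family:
            if a | b not in family:      # closed under finite unions
                return False
    return True

omega = frozenset(range(6))
blocks = [frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5})]

S = {frozenset()} | set(blocks)          # the semialgebra of Exercise 1.43(c)
D = unions_of_partition(blocks)          # the collection D of Exercise 1.43(e)

print(is_algebra(omega, S))              # False: the complement of {2} is not in S
print(len(D), is_algebra(omega, D))      # 8 True
```

Here $\mathcal{D}$ has $2^3 = 8$ members, one for each subfamily of the three blocks, while $\mathcal{S}$ itself fails to be an algebra.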
Georg Friedrich Bernhard Riemann (1826-1866)

Bernhard Riemann was born on September 17, 1826, in Breselenz, Germany. In 1846, Riemann entered Gottingen University to study theology. However, he soon persuaded his father to allow him to switch to mathematics.

Despite the presence of Gauss, Gottingen had only a simple mathematics curriculum, so, in 1847, Riemann enrolled at Berlin University, where he was greatly influenced by both Jacobi and Dirichlet. W. E. Weber's return to Gottingen University sparked an improvement in the mathematical climate there and, in 1849, Riemann also returned to Gottingen where, in 1851, he earned his PhD with his thesis on complex function theory and Riemann surfaces.

Riemann continued his studies, submitting papers on Fourier series and geometry to qualify to become an unpaid lecturer. The mathematical tools that Riemann developed in his geometry paper were used by Albert Einstein in his theory of relativity. Riemann's first lectures were on partial differential equations in relation to physics. These brilliant lectures were reprinted for 80 years after his death. At last, in 1857, Riemann was appointed Assistant Professor (with pay!) at Gottingen.

In 1862, Riemann became quite ill and spent most of the next four years trying to regain his health in the more hospitable climate of Italy. But, in Selasca, Italy, on July 20, 1866, Riemann succumbed to tuberculosis at the age of 39.
The Real Number System and Calculus

As further preparation for our study of real analysis, we will present in this chapter several topics often encountered in previous mathematics courses. But, again, although some readers may be familiar with much of the material, we present this chapter as a way to provide a common ground for all readers of the text.

We will first discuss the real number system and the extended real number system. Next we will investigate sequences of real numbers, exploring, in particular, cluster points and limits of such sequences. Then we will introduce open and closed sets of real numbers and examine some of their basic properties. In the final sections of this chapter, we will present continuous functions and the Riemann integral with an eye toward remedying some of the deficiencies experienced in trying to use these classical concepts in modern analysis.

2.1 THE REAL NUMBER SYSTEM

Although it is mathematically satisfying to construct the real numbers from “scratch,” such a construction would be an aside to the main thrust
of this text. Thus, we will not endeavor to present a construction of the real numbers.† Instead, we will briefly review the main properties of the real number system, specifically, three groups of axioms that together characterize that system.

Axioms for the Real Number System

The first group of axioms for the real number system consists of the field axioms. These axioms provide the basic properties of the real numbers relative to the two binary operations of addition ($+$) and multiplication ($\cdot$). We will follow convention in using juxtaposition to indicate multiplication when convenient.

Field Axioms Let $x, y, z \in \mathcal{R}$. Then we have:
(F1) $x + y = y + x$ and $xy = yx$. (commutative)
(F2) $(x + y) + z = x + (y + z)$ and $(xy)z = x(yz)$. (associative)
(F3) $x(y + z) = xy + xz$. (distributive)
(F4) There exist $0, 1 \in \mathcal{R}$ with $0 \ne 1$, such that for each $x \in \mathcal{R}$, $x + 0 = x$ and $x \cdot 1 = x$. (identities)
(F5) For each $x \in \mathcal{R}$, there is a $-x \in \mathcal{R}$ such that $x + (-x) = 0$ and, if $x \ne 0$, there is an $x^{-1} \in \mathcal{R}$ such that $xx^{-1} = 1$. (inverses)

Because of (F2), $x + y + z$ is defined unambiguously, as is any finite sum; likewise, $xyz$ is defined unambiguously, as is any finite product. If $x_1, x_2, \dots, x_n$ are real numbers, then we use the following notation:
$$\sum_{k=1}^{n} x_k = x_1 + x_2 + \cdots + x_n$$
and
$$\prod_{k=1}^{n} x_k = x_1 x_2 \cdots x_n.$$
Also, regarding (F5), we will usually write $y - x$ for $y + (-x)$ and often write $y/x$ or $\frac{y}{x}$ for $yx^{-1}$.

† Readers interested in a construction of the real numbers are referred to Cohen and Ehrlich's The Structure of the Real Number System (New York: D. Van Nostrand Reinhold, 1963).
The second group of axioms consists of the order axioms. These axioms provide the basic properties of the real numbers relative to the less-than ($<$) ordering.

Order Axioms† Let $x, y, z \in \mathcal{R}$. Then we have:
(O1) $x < y$ and $y < z$ implies $x < z$. (transitive)
(O2) $x < y$ implies $x + z < y + z$.
(O3) $x < y$ and $0 < z$ implies $xz < yz$.
(O4) Exactly one of $x = y$, $x < y$, and $y < x$ holds. (trichotomous)

Note: We will also employ the following notation: $x \le y$ means that $x < y$ or $x = y$; $x > y$ means that $y < x$; and $x \ge y$ means that $y \le x$.

The third group of axioms actually consists of one axiom, called the completeness axiom or the least upper-bound axiom. In preparation for stating that axiom, we first introduce some terminology.

Let $A$ be a nonempty subset of $\mathcal{R}$. A real number $u$ is called an upper bound for $A$ if $x \le u$ for all $x \in A$. Note that not every subset of $\mathcal{R}$ has an upper bound; for example, neither $\mathcal{R}$ nor $\mathcal{N}$ has an upper bound. If a subset of $\mathcal{R}$ has an upper bound, then we say that it is bounded above.

A real number $u$ is called a least upper bound or supremum for $A$ if it is an upper bound for $A$ and is smaller than or equal to any other upper bound for $A$. It is easy to see that a set can have at most one least upper bound. Also, by definition, a necessary condition for a subset of $\mathcal{R}$ to have a least upper bound is that it be bounded above. That this condition is sufficient is the content of the completeness axiom.

Completeness Axiom A nonempty subset of real numbers that is bounded above has a least upper bound.

Let $A$ be a nonempty subset of $\mathcal{R}$ that is bounded above. Then each of the following notations is used to denote the least upper bound of $A$:
$$\sup A, \qquad \sup_{x \in A} x, \qquad \text{or} \qquad \sup\{x : x \in A\}.$$

† The order axioms can also be stated in terms of the positive real numbers. See Exercise 2.1.
Similarly, we can define lower bound and greatest lower bound: Let $A$ be a nonempty subset of $\mathcal{R}$. A real number $\ell$ is called a lower bound for $A$ if $x \ge \ell$ for all $x \in A$. Note that not every subset of $\mathcal{R}$ has a lower bound; for example, neither $\mathcal{R}$ nor $\mathcal{Z}$ has a lower bound. If a subset of $\mathcal{R}$ has a lower bound, then we say that it is bounded below.

A real number $\ell$ is called a greatest lower bound or infimum for $A$ if it is a lower bound for $A$ and is greater than or equal to any other lower bound for $A$. It is easy to see that a set can have at most one greatest lower bound. Also, by definition, a necessary condition for a subset of $\mathcal{R}$ to have a greatest lower bound is that it be bounded below. That this condition is sufficient is a consequence of the completeness axiom. In other words, we have the following proposition whose proof is left to the reader as an exercise. (See Exercise 2.4.)

PROPOSITION 2.1
A nonempty subset of real numbers that is bounded below has a greatest lower bound.

Let $A$ be a nonempty subset of $\mathcal{R}$ that is bounded below. Then each of the following notations is used to denote the greatest lower bound of $A$:
$$\inf A, \qquad \inf_{x \in A} x, \qquad \text{or} \qquad \inf\{x : x \in A\}.$$

EXAMPLE 2.1 Illustrates Least Upper Bound and Greatest Lower Bound
a) $\sup [0, 1) = 1$ and $\inf [0, 1) = 0$.
b) $\mathcal{N}$ has no least upper bound, but $\inf \mathcal{N} = 1$.
c) Let $A = \{x : x^2 < 2\}$. Then $\sup_{x \in A} x = \sqrt{2}$ and $\inf_{x \in A} x = -\sqrt{2}$. □

An important consequence of the completeness axiom is that given any real number, we can find a positive integer exceeding that number. In other words, we have the following proposition, known as the Archimedean principle.

PROPOSITION 2.2 Archimedean Principle
For each $x \in \mathcal{R}$, there is an $n \in \mathcal{N}$ such that $n > x$.

PROOF: Let $A = \{m \in \mathcal{N} : m \le x\}$. If $A = \emptyset$, then $1 > x$ and we are done. So, we can assume that $A$ is nonempty. By definition, $A$ is bounded above by $x$ and, hence, by the completeness axiom, $A$ has a least upper bound, say, $u$.
Then $u - 1$ is not an upper bound for $A$ and, hence, there is
a $k \in A$ such that $k > u - 1$. Let $n = k + 1$ and note that $n \in \mathcal{N}$. Because $n > u$ and $u$ is an upper bound for $A$, we have that $n \notin A$. And from this last result and the fact that $n \in \mathcal{N}$, we conclude that $n > x$.

The next two propositions show that between any two real numbers there is both an irrational number and a rational number. We will find these two facts essential.

PROPOSITION 2.3 Density of the Irrational Numbers
Between any two real numbers there is an irrational number.

PROOF: Let $a, b \in \mathcal{R}$ with $a < b$. In Chapter 1, we noted that the interval $(a, b)$ is uncountable. (See Exercise 1.26 for a proof.) Since the set of rational numbers, $\mathcal{Q}$, is countable (Proposition 1.12 on page 24), it follows from Proposition 1.8 on page 23 that any subset of $\mathcal{Q}$ is countable. If $(a, b)$ contained no irrational number, then it would be an uncountable subset of $\mathcal{Q}$, which is impossible.

PROPOSITION 2.4 Density of the Rational Numbers
Between any two real numbers there is a rational number.

PROOF: Let $a, b \in \mathcal{R}$ with $a < b$. We first assume that $a > 0$. By the Archimedean principle, there is an $n \in \mathcal{N}$ such that $n > (b - a)^{-1}$. Note that $nb > n(b - a) > 1$; so, $nb > 1$. Now, let $A = \{k \in \mathcal{N} : k \ge nb\}$. By the Archimedean principle, $A \ne \emptyset$ and, therefore, by the well-ordering principle, $A$ has a smallest member, say, $j$. As $nb > 1$, $j \ge 2$. This, in turn, implies that $j - 1 \in \mathcal{N}$ and, consequently, because $j$ is the smallest member of $A$, we must have $j - 1 < nb$. Letting $m = j - 1$, we have that
$$a = b - (b - a) < \frac{m + 1}{n} - \frac{1}{n} = \frac{m}{n} < b.$$
Letting $r = m/n$, we have that $r \in \mathcal{Q}$ and $a < r < b$.

Next we remove the restriction that $a > 0$. Applying the Archimedean principle, we choose an $n \in \mathcal{N}$ such that $n > -a$. Then $n + a > 0$ and, so, by what we have already proved, there is an $r \in \mathcal{Q}$ such that $n + a < r < n + b$. Then $r - n \in \mathcal{Q}$ and $a < r - n < b$.
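The proof of Proposition 2.4 is constructive, so it can be traced in code. The following Python sketch (the function name `rational_between` is ours) follows the two steps of the proof with exact `fractions.Fraction` arithmetic: when $a > 0$ it takes $n > (b - a)^{-1}$ from the Archimedean principle and lets $j$ be the smallest positive integer with $j \ge nb$, which is $\lceil nb \rceil$; otherwise it first shifts both endpoints by a positive integer exceeding $-a$. It assumes rational inputs, since a computer cannot represent an arbitrary real number exactly.

```python
import math
from fractions import Fraction

def rational_between(a, b):
    """Return a rational r with a < r < b, following the proof of Proposition 2.4.

    a and b are assumed to be exact rationals (int or Fraction) with a < b.
    """
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    if a <= 0:
        # Second step of the proof: pick a positive integer n0 > -a,
        # solve the problem for (a + n0, b + n0), then shift back.
        n0 = math.floor(-a) + 1
        return rational_between(a + n0, b + n0) - n0
    # First step (a > 0): the Archimedean principle gives n > 1/(b - a).
    n = math.floor(1 / (b - a)) + 1
    # Smallest positive integer j with j >= n*b (well-ordering principle);
    # since n*b > 1, j >= 2, so m = j - 1 is a positive integer with m < n*b.
    j = math.ceil(n * b)
    m = j - 1
    return Fraction(m, n)

print(rational_between(Fraction(1, 3), Fraction(1, 2)))   # 3/7
print(rational_between(-2, Fraction(-3, 2)))             # -5/3
```

Because $j = \lceil nb \rceil$, the well-ordering step becomes a single ceiling. The value returned is one particular rational in $(a, b)$, not necessarily the one with the smallest denominator.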
The Extended Real Number System

It is convenient to enlarge the set of real numbers to the extended real numbers, which we denote by $\mathcal{R}^*$. This set is obtained by adding two distinct symbols, $\infty$ and $-\infty$, to the real numbers; thus, $\mathcal{R}^* = \mathcal{R} \cup \{-\infty, \infty\}$. We extend the usual ordering of $\mathcal{R}$ to $\mathcal{R}^*$ by defining $-\infty < \infty$ and $-\infty < x < \infty$ for all $x \in \mathcal{R}$.

We also extend the binary operations of addition and multiplication to $\mathcal{R}^*$. In doing so, we make the convention that, for $x \in \mathcal{R}^*$, $x - \infty = x + (-\infty)$ and $x - (-\infty) = x + \infty$, in the sense that if one side of the equation is defined, then the other is defined likewise. Now, for $x \in \mathcal{R}$, we define
$$x + \infty = \infty + x = \infty \quad\text{and}\quad x - \infty = -\infty + x = -\infty;$$
and
$$\begin{aligned}
&x \cdot \infty = \infty \cdot x = \infty \ \text{ and } \ x \cdot (-\infty) = (-\infty) \cdot x = -\infty, && \text{if } x > 0;\\
&x \cdot \infty = \infty \cdot x = -\infty \ \text{ and } \ x \cdot (-\infty) = (-\infty) \cdot x = \infty, && \text{if } x < 0;\\
&x \cdot \infty = \infty \cdot x = 0 \ \text{ and } \ x \cdot (-\infty) = (-\infty) \cdot x = 0, && \text{if } x = 0.
\end{aligned}$$
Also, we define $\infty + \infty = \infty$ and $-\infty - \infty = -\infty$; $\infty \cdot \infty = \infty$ and $(-\infty) \cdot (-\infty) = \infty$; and $\infty \cdot (-\infty) = (-\infty) \cdot \infty = -\infty$. The expressions $\infty - \infty$ and $-\infty + \infty$ are left undefined because they cannot be defined in a way that is consistent with the rules of ordinary addition and multiplication. See Exercise 2.10.

In Definition 1.1 on page 4, we defined intervals of $\mathcal{R}$. We can extend that definition to intervals of $\mathcal{R}^*$ and, in fact, this extension reduces the number of cases that need to be considered.

DEFINITION 2.1 Intervals of $\mathcal{R}^*$
Let $a$ and $b$ be extended real numbers such that $a < b$. Then the intervals of $\mathcal{R}^*$ with endpoints $a$ and $b$ are as follows:
$(a, b) = \{x \in \mathcal{R}^* : a < x < b\}$
$[a, b) = \{x \in \mathcal{R}^* : a \le x < b\}$
$(a, b] = \{x \in \mathcal{R}^* : a < x \le b\}$
$[a, b] = \{x \in \mathcal{R}^* : a \le x \le b\}$
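The arithmetic conventions for $\mathcal{R}^*$ stated before Definition 2.1 differ in one respect from IEEE floating-point arithmetic, which Python's `float` follows: there `0 * math.inf` is `nan`, whereas the text sets $0 \cdot (\pm\infty) = 0$. The sketch below (the helper names `ext_add` and `ext_mul` are ours) encodes the text's rules on top of floats and treats $\infty - \infty$ as undefined by raising an error.

```python
import math

INF = math.inf

def ext_add(x, y):
    """x + y in R*, following the text; inf + (-inf) is left undefined."""
    if {x, y} == {INF, -INF}:
        raise ValueError("inf - inf is undefined in R*")
    # All remaining cases agree with IEEE float addition.
    return x + y

def ext_mul(x, y):
    """x * y in R*, following the text's convention 0 * (+-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0
    # Nonzero cases: float multiplication already gives the sign rules
    # (e.g. -3 * inf = -inf, (-inf) * (-inf) = inf, inf * (-inf) = -inf).
    return x * y

print(0 * INF)             # nan  (IEEE behaviour)
print(ext_mul(0, INF))     # 0.0  (convention in the text)
print(ext_mul(-3, INF))    # -inf
print(ext_add(5, -INF))    # -inf
```

The convention $0 \cdot \infty = 0$ is the one that makes integrals of functions vanishing almost everywhere come out to zero later on, which is why a plain `float` product is not enough here.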
If $a$ and $b$ are both in $\mathcal{R}$, then the four intervals of Definition 2.1 are the bounded intervals of $\mathcal{R}$, as given in Definition 1.1. On the other hand, if either $a = -\infty$ or $b = \infty$, then those four sets are unbounded intervals.

Note that in $\mathcal{R}^*$, every set is bounded above by $\infty$. Thus, every nonempty subset of $\mathcal{R}^*$ has a least upper bound: if it is bounded above in $\mathcal{R}$, then its least upper bound is also in $\mathcal{R}$; if it is not bounded above in $\mathcal{R}$, then its least upper bound is $\infty$. Since every member of $\mathcal{R}^*$ is vacuously an upper bound for $\emptyset$, we see that the empty set also has a least upper bound in $\mathcal{R}^*$, namely, $-\infty$. Similar remarks hold for greatest lower bounds. Thus, we have the following proposition.

PROPOSITION 2.5
Every subset $A$ of $\mathcal{R}^*$ has both a least upper bound and a greatest lower bound. We have the following:
a) If $A = \emptyset$, then $\sup A = -\infty$ and $\inf A = \infty$.
b) If $A$ is bounded above in $\mathcal{R}$, then $\sup A \in \mathcal{R}$; otherwise, $\sup A = \infty$.
c) If $A$ is bounded below in $\mathcal{R}$, then $\inf A \in \mathcal{R}$; otherwise, $\inf A = -\infty$.

EXAMPLE 2.2 Illustrates Proposition 2.5
a) $\inf \mathcal{N} = 1$ and $\sup \mathcal{N} = \infty$.
b) $\inf \mathcal{Z} = -\infty$ and $\sup \mathcal{Z} = \infty$.
c) If $A = \{1, 2, 3, \infty\}$, then $\inf A = 1$ and $\sup A = \infty$.
d) Suppose that $I$ is an interval in $\mathcal{R}^*$ with endpoints $a$ and $b$. Then (see Exercise 2.11) we have $\inf I = a$ and $\sup I = b$. □

EXERCISES 2.1

2.1 The order axioms for the real number system can also be stated in terms of the positive real numbers as follows. Let $\mathcal{R}^+$ denote the subset of positive real numbers. Then we have:
(O1′) $x, y \in \mathcal{R}^+$ implies $x + y \in \mathcal{R}^+$.
(O2′) $x, y \in \mathcal{R}^+$ implies $xy \in \mathcal{R}^+$.
(O3′) $x \in \mathcal{R}^+$ implies $-x \notin \mathcal{R}^+$.
(O4′) For each $x \in \mathcal{R}$, we have $x = 0$, $x \in \mathcal{R}^+$, or $-x \in \mathcal{R}^+$.
Prove that these four axioms are equivalent to the order axioms given on page 37. Note: Assuming the order axioms given on page 37, we define $\mathcal{R}^+ = \{x : x > 0\}$. On the other hand, assuming the order axioms in this exercise, we define $x < y$ to mean $y - x \in \mathcal{R}^+$.
★2.2 The absolute value of a real number $x$, denoted $|x|$, is defined by
$$|x| = \begin{cases} x, & \text{if } x \ge 0;\\ -x, & \text{if } x < 0. \end{cases}$$
Let $x, y \in \mathcal{R}$. Prove each of the following facts.
a) $|-x| = |x|$
b) $|xy| = |x||y|$
c) $|x + y| \le |x| + |y|$ [triangle inequality]
d) $\bigl||x| - |y|\bigr| \le |x - y|$

★2.3 For $x, y \in \mathcal{R}$, we define the maximum of $x$ and $y$ to be the larger of those two numbers. We denote the maximum by $\max\{x, y\}$ or $x \vee y$. Thus,
$$x \vee y = \max\{x, y\} = \begin{cases} x, & \text{if } x \ge y;\\ y, & \text{if } x < y. \end{cases}$$
Similarly, we define the minimum of $x$ and $y$ to be the smaller of those two numbers. We denote the minimum by $\min\{x, y\}$ or $x \wedge y$. Thus,
$$x \wedge y = \min\{x, y\} = \begin{cases} y, & \text{if } x \ge y;\\ x, & \text{if } x < y. \end{cases}$$
Let $x, y \in \mathcal{R}$. Referring to Exercise 2.2, prove each of the following facts.
a) $|x| = x \vee (-x)$
b) $x \vee y = \frac{1}{2}(x + y + |x - y|)$
c) $x \wedge y = \frac{1}{2}(x + y - |x - y|)$

2.4 Suppose that $A$ is bounded below. Prove that $A$ has a greatest lower bound and that, in fact, $\inf A = -\sup\{-x : x \in A\}$.

2.5 Suppose that $A \subset B$. Prove the following.
a) If $B$ is bounded above, then so is $A$ and $\sup A \le \sup B$.
b) If $B$ is bounded below, then so is $A$ and $\inf A \ge \inf B$.

2.6 Suppose that $F$ is a finite nonempty subset of $\mathcal{R}$.
a) Prove that $F$ is bounded above.
b) Prove that $\sup F \in F$. (We call this element of $F$ the maximum of $F$ and denote it by $\max F$, $\max_{x \in F} x$, or $\max\{x : x \in F\}$.)
c) Referring to Exercise 2.3, show that if $F = \{x, y\}$, then $\max F = x \vee y$.
d) Prove that $F$ is bounded below.
e) Prove that $\inf F \in F$. (We call this element of $F$ the minimum of $F$ and denote it by $\min F$, $\min_{x \in F} x$, or $\min\{x : x \in F\}$.)
f) Referring to Exercise 2.3, show that if $F = \{x, y\}$, then $\min F = x \wedge y$.

2.7 Prove that any (nondegenerate) interval of real numbers contains infinitely many irrational numbers, in fact, uncountably many.

2.8 Prove that any (nondegenerate) interval of real numbers contains infinitely many rational numbers.
2.9 Let $x \in \mathcal{R}$ and set $A = \{z \in \mathcal{Z} : z \le x\}$.
a) Prove that $A$ is nonempty.
b) Explain why $A$ has a least upper bound.
c) Prove that $\sup A \in A$ and, hence, that $\sup A$ is an integer.
d) The integer $\sup\{z \in \mathcal{Z} : z \le x\}$ is called the greatest integer in $x$ and is denoted by $[x]$. Prove that $[x] \le x < [x] + 1$ or, equivalently, that $x - 1 < [x] \le x$.
e) The function $f : \mathcal{R} \to \mathcal{Z}$ defined by $f(x) = [x]$ is called the greatest integer function. Prove that for each $z \in \mathcal{Z}$, $f(z + x) = z + f(x)$.

2.10 Show that $\infty - \infty$ cannot be defined in a way that is consistent with the rules of ordinary addition and multiplication.

2.11 Prove that if $I$ is an interval in $\mathcal{R}^*$ with endpoints $a$ and $b$, then $\inf I = a$ and $\sup I = b$.

2.2 SEQUENCES OF REAL NUMBERS

Recall from Chapter 1 (see page 14) that an infinite sequence is a function whose domain is the set of positive integers, $\mathcal{N}$. In this section, we will study infinite sequences of real numbers. A sequence of real numbers is a sequence whose range is a subset of $\mathcal{R}$. To begin, we recall the following definition from calculus.

DEFINITION 2.2 Convergent Sequence; Limit
A sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers is said to converge to the real number $x$ if for each $\varepsilon > 0$, there is an $N \in \mathcal{N}$ such that $|x - x_n| < \varepsilon$ whenever $n \ge N$. In other words, the sequence converges to $x$ if for each $\varepsilon > 0$, all but finitely many terms of the sequence lie within $\varepsilon$ of $x$. The number $x$ is called the limit of the sequence $\{x_n\}_{n=1}^{\infty}$ and we write
$$\lim_{n \to \infty} x_n = x \qquad \text{or} \qquad x_n \to x \text{ as } n \to \infty.$$

If a sequence converges, we say that it has a limit. The sequence $\{(n - 1)/n\}_{n=1}^{\infty}$ converges; in fact, its limit is 1, that is, $\lim_{n \to \infty} (n - 1)/n = 1$. On the other hand, it is easy to find sequences of real numbers that do not converge. Consider, for instance, the sequence $\{(-1)^n\}_{n=1}^{\infty}$. This sequence does not converge because its terms oscillate between $-1$ and $1$ and, hence, do
not approach any single number. The sequence $\{n^2\}_{n=1}^{\infty}$ also does not converge, but for an intrinsically different reason: its terms are becoming indefinitely large and, hence, do not approach a real number. If we allowed limits in $\mathcal{R}^*$, this latter sequence would converge to $\infty$.

It is convenient to permit convergence to extended real numbers and, in fact, to allow the sequences themselves to contain extended real numbers (i.e., to have range $\mathcal{R}^*$). Here we will discuss convergence to extended real numbers but will restrict ourselves to sequences of real numbers, leaving the generalization to sequences of extended real numbers to the reader.

DEFINITION 2.3 Convergent Sequence (Extended Sense)
A sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers is said to converge in $\mathcal{R}^*$ if one of the following three conditions holds:
a) The sequence converges to a real number in the sense of Definition 2.2. In this case, we say that the sequence converges in $\mathcal{R}$ or that the limit exists and is finite.
b) For each $M \in \mathcal{R}$, there is an $N \in \mathcal{N}$ such that $x_n > M$ whenever $n \ge N$. In this case, we say that the sequence converges to $\infty$ and write $\lim_{n \to \infty} x_n = \infty$.
c) For each $M \in \mathcal{R}$, there is an $N \in \mathcal{N}$ such that $x_n < M$ whenever $n \ge N$. In this case, we say that the sequence converges to $-\infty$ and write $\lim_{n \to \infty} x_n = -\infty$.

Sequences, such as $\{n^2\}_{n=1}^{\infty}$ or $\{(n + 1)/n\}_{n=1}^{\infty}$, whose terms never decrease with increasing $n$ or never increase with increasing $n$, play an important role in analysis. More generally, let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers. If $x_1 \le x_2 \le \cdots$, then the sequence is said to be nondecreasing. If $x_1 \ge x_2 \ge \cdots$, then the sequence is said to be nonincreasing. A sequence of real numbers is called monotone if it is either nondecreasing or nonincreasing. The next proposition, whose proof is left to the reader as an exercise, shows that any monotone sequence of real numbers has a limit (in $\mathcal{R}^*$).
In stating this and other propositions, we use the terminology that a sequence is bounded above if its range is bounded above, and that the least upper bound of a sequence is the least upper bound of the range of the sequence. Similarly, we say that a sequence is bounded below if its range is bounded below, and that the greatest lower bound of a sequence is the greatest lower bound of the range of the sequence.
PROPOSITION 2.6
Any monotone sequence of real numbers converges in $\mathcal{R}^*$. In fact, we have the following:
a) If $\{x_n\}_{n=1}^{\infty}$ is nondecreasing, then
$$\lim_{n \to \infty} x_n = \sup\{x_n : n \in \mathcal{N}\}.$$
In particular, the limit exists and is finite if $\{x_n\}_{n=1}^{\infty}$ is bounded above and is $\infty$ otherwise.
b) If $\{x_n\}_{n=1}^{\infty}$ is nonincreasing, then
$$\lim_{n \to \infty} x_n = \inf\{x_n : n \in \mathcal{N}\}.$$
In particular, the limit exists and is finite if $\{x_n\}_{n=1}^{\infty}$ is bounded below and is $-\infty$ otherwise.

Cluster Points

By permitting sequences of real numbers to converge to extended real numbers, we have dealt with one type of nonconvergence of sequences, namely, when the terms of the sequence are becoming either indefinitely large or indefinitely small. The other type of nonconvergence occurs when the terms of the sequence do not approach any single number, either real or extended real. To analyze sequences of this type, we introduce the concept of a cluster point.

For a sequence of real numbers to converge to a real number $x$ requires that for each $\varepsilon > 0$, all but finitely many terms of the sequence lie within $\varepsilon$ of $x$. Thus, we see that the terms of a sequence that converges in $\mathcal{R}$ are “clustering” around the limit of the sequence and no other number. If we consider again the sequence $\{(-1)^n\}_{n=1}^{\infty}$, we see that it does not converge because some of the terms of the sequence are clustering around $-1$ and some are clustering around $1$. That is, for each $\varepsilon > 0$, infinitely many terms of the sequence lie within $\varepsilon$ of $-1$ and infinitely many lie within $\varepsilon$ of $1$. This leads us to the following definition.

DEFINITION 2.4 Cluster Point
Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers.
a) A real number $x$ is said to be a cluster point of $\{x_n\}_{n=1}^{\infty}$ if for each $\varepsilon > 0$ and $N \in \mathcal{N}$, there is an $n \ge N$ such that $|x - x_n| < \varepsilon$.
b) $\infty$ is a cluster point of $\{x_n\}_{n=1}^{\infty}$ if for each $M \in \mathcal{R}$ and $N \in \mathcal{N}$, there is an $n \ge N$ such that $x_n > M$.
c) $-\infty$ is a cluster point of $\{x_n\}_{n=1}^{\infty}$ if for each $M \in \mathcal{R}$ and $N \in \mathcal{N}$, there is an $n \ge N$ such that $x_n < M$.

Remark: Because we are restricting ourselves to sequences of real numbers, the condition in part (b) of Definition 2.4 for $\infty$ to be a cluster point is equivalent to the following condition: For each $M \in \mathcal{R}$, there is an $n \in \mathcal{N}$ such that $x_n > M$; and, similarly, the condition in part (c) of the definition can be restated. However, it is better to use the definitions as stated in Definition 2.4 because they generalize properly to sequences of extended real numbers.

EXAMPLE 2.3 Illustrates Definition 2.4
a) As the reader can easily verify, the sequence $\{(-1)^n\}_{n=1}^{\infty}$ has two cluster points, namely, $-1$ and $1$.
b) Consider the sequence $2, 1, 0, 2, 2, \frac{1}{2}, 2, 3, \frac{2}{3}, 2, 4, \frac{3}{4}, \dots$, that is,
$$x_n = \begin{cases} (n - 3)/n, & \text{if } n \equiv 0 \pmod{3};\\ 2, & \text{if } n \equiv 1 \pmod{3};\\ (n + 1)/3, & \text{if } n \equiv 2 \pmod{3}. \end{cases}$$
This sequence has three cluster points, namely, $1$, $2$, and $\infty$.
c) Let $\{r_n\}_{n=1}^{\infty}$ be an enumeration of the rational numbers. From the density of the rational numbers (Proposition 2.4 on page 39), it follows that every extended real number is a cluster point of the sequence $\{r_n\}_{n=1}^{\infty}$. We leave the details to the reader. □

The cluster points of a sequence of real numbers can be characterized as follows. (See Exercise 2.17.)
• A real number $x$ is a cluster point of a sequence if and only if for each $\varepsilon > 0$, infinitely many terms of the sequence are within $\varepsilon$ of $x$.
• $\infty$ is a cluster point of a sequence if and only if for each $M \in \mathcal{R}$, infinitely many terms of the sequence exceed $M$; this, in turn, holds if and only if the sequence is unbounded above.
• $-\infty$ is a cluster point of a sequence if and only if for each $M \in \mathcal{R}$, infinitely many terms of the sequence are smaller than $M$; this, in turn, holds if and only if the sequence is unbounded below.
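A finite computation cannot establish that a number is a cluster point, but it can make the characterization above concrete for the sequence of Example 2.3(b). In the Python sketch below (function names are ours), the number of terms among $x_1, \dots, x_N$ lying within $\varepsilon = 1/10$ of $1$ or of $2$, or exceeding the bound $M = 100$, keeps growing as $N$ grows, whereas $3/2$, which is not a cluster point, attracts no terms within $\varepsilon$.

```python
from fractions import Fraction

def x(n):
    """The n-th term (n >= 1) of the sequence in Example 2.3(b)."""
    if n % 3 == 0:
        return Fraction(n - 3, n)
    if n % 3 == 1:
        return Fraction(2)
    return Fraction(n + 1, 3)

def count_near(c, eps, n_max):
    """Number of indices n <= n_max with |x_n - c| < eps."""
    return sum(1 for n in range(1, n_max + 1) if abs(x(n) - c) < eps)

def count_above(m, n_max):
    """Number of indices n <= n_max with x_n > m."""
    return sum(1 for n in range(1, n_max + 1) if x(n) > m)

eps = Fraction(1, 10)
print([str(x(n)) for n in range(1, 13)])   # 2, 1, 0, 2, 2, 1/2, 2, 3, 2/3, 2, 4, 3/4
for n_max in (300, 600, 1200):
    # columns: N, near 1, near 2, above 100, near 3/2
    print(n_max, count_near(1, eps, n_max), count_near(2, eps, n_max),
          count_above(100, n_max), count_near(Fraction(3, 2), eps, n_max))
```

The growing counts only suggest "infinitely many"; the proof that $1$, $2$, and $\infty$ are cluster points still comes from the formula for $x_n$.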
All three sequences in Example 2.3 have more than one cluster point, and none of those sequences converges. More generally, we have the following proposition.

PROPOSITION 2.7
A convergent sequence has exactly one cluster point, namely, its limit. Thus, a sequence having more than one cluster point cannot converge.

PROOF: Suppose that $\{x_n\}_{n=1}^{\infty}$ is a convergent sequence of real numbers, say, $x_n \to x$ as $n \to \infty$. We will prove that $x$ is a cluster point of the sequence and that it is the only cluster point of the sequence. In doing so, we will assume that $x \in \mathcal{R}$ and leave the other two cases ($x = \infty$ and $x = -\infty$) to the reader.

To verify that $x$ is a cluster point, let $\varepsilon > 0$ and $N \in \mathcal{N}$. We must find an $n \ge N$ such that $|x - x_n| < \varepsilon$. But, in fact, since $x_n \to x$, there is a $K \in \mathcal{N}$ such that $|x - x_n| < \varepsilon$ for all $n \ge K$. Let $n = \max\{N, K\}$. Then $n \ge N$ and, because $n \ge K$, we have $|x - x_n| < \varepsilon$. Thus, $x$ is a cluster point of $\{x_n\}_{n=1}^{\infty}$.

Now we show that no real number different from $x$ can be a cluster point of $\{x_n\}_{n=1}^{\infty}$. Let $y \in \mathcal{R}$ and $y \ne x$. Let $\varepsilon = |y - x|/2$. Choose $N \in \mathcal{N}$ such that $n \ge N$ implies $|x - x_n| < \varepsilon$. Then for $n \ge N$, we have
$$|y - x_n| \ge |y - x| - |x - x_n| > 2\varepsilon - \varepsilon = \varepsilon$$
and, consequently, $y$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$.

Next we show that $\infty$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$. Choose $N \in \mathcal{N}$ such that $|x - x_n| < 1$ for $n \ge N$. Then, letting $M = x + 1$, we have that $x_n < M$ for $n \ge N$. Thus, $\infty$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$.

Finally we show that $-\infty$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$. Choose $N \in \mathcal{N}$ such that $|x - x_n| < 1$ for $n \ge N$. Then, letting $M = x - 1$, we have that $x_n > M$ for $n \ge N$. Thus, $-\infty$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$.

Limit Superior and Limit Inferior

Two of the most important concepts associated with infinite sequences of real numbers are the limit superior and the limit inferior. Although a sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers does not necessarily have a limit (even in $\mathcal{R}^*$), it always has both a limit superior and a limit inferior.
As we will see, these two extended real numbers are cluster points of the sequence, in fact, the largest and smallest cluster points, respectively.
First we introduce some convenient notation. For a sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers, we write
$$\inf_n x_n = \inf\{x_n : n \in \mathcal{N}\}, \qquad \sup_n x_n = \sup\{x_n : n \in \mathcal{N}\},$$
$$\sup_{k \ge n} x_k = \sup\{x_k : k \ge n\}, \qquad \inf_{k \ge n} x_k = \inf\{x_k : k \ge n\}.$$

DEFINITION 2.5 Limit Superior and Limit Inferior
Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers.
a) The limit superior of the sequence is the extended real number given by
$$\limsup_{n \to \infty} x_n = \inf_n \sup_{k \ge n} x_k.$$
b) The limit inferior of the sequence is the extended real number given by
$$\liminf_{n \to \infty} x_n = \sup_n \inf_{k \ge n} x_k.$$

Note: Notations for the limit superior and limit inferior other than the ones presented in Definition 2.5 are commonly used. They are:
$$\limsup_{n \to \infty} x_n = \limsup x_n = \overline{\lim_{n \to \infty}}\, x_n = \overline{\lim}\, x_n$$
and
$$\liminf_{n \to \infty} x_n = \liminf x_n = \underline{\lim_{n \to \infty}}\, x_n = \underline{\lim}\, x_n.$$

EXAMPLE 2.4 Illustrates Definition 2.5
Refer to Example 2.3 on page 46.
a) Let $x_n = (-1)^n$. Then, for each $n \in \mathcal{N}$, we have $\sup_{k \ge n} x_k = 1$ and $\inf_{k \ge n} x_k = -1$. Therefore,
$$\inf_n \sup_{k \ge n} x_k = 1 \qquad \text{and} \qquad \sup_n \inf_{k \ge n} x_k = -1.$$
In other words, $\limsup x_n = 1$ and $\liminf x_n = -1$.
b) Consider the sequence $2, 1, 0, 2, 2, \frac{1}{2}, 2, 3, \frac{2}{3}, 2, 4, \frac{3}{4}, \dots$, that is,
$$x_n = \begin{cases} (n - 3)/n, & \text{if } n \equiv 0 \pmod{3};\\ 2, & \text{if } n \equiv 1 \pmod{3};\\ (n + 1)/3, & \text{if } n \equiv 2 \pmod{3}. \end{cases}$$
Then, for each $n \in \mathcal{N}$, we have $\sup_{k \ge n} x_k = \infty$ and
$$\inf_{k \ge n} x_k = \begin{cases} (n - 3)/n, & \text{if } n \equiv 0 \pmod{3};\\ (n - 1)/(n + 2), & \text{if } n \equiv 1 \pmod{3};\\ (n - 2)/(n + 1), & \text{if } n \equiv 2 \pmod{3}. \end{cases}$$
Therefore,
$$\inf_n \sup_{k \ge n} x_k = \infty \qquad \text{and} \qquad \sup_n \inf_{k \ge n} x_k = 1.$$
In other words, $\limsup x_n = \infty$ and $\liminf x_n = 1$.
c) Let $\{r_n\}_{n=1}^{\infty}$ be an enumeration of the rational numbers. Then, for each $n \in \mathcal{N}$, we have $\sup_{k \ge n} r_k = \infty$ and $\inf_{k \ge n} r_k = -\infty$. Hence,
$$\inf_n \sup_{k \ge n} r_k = \infty \qquad \text{and} \qquad \sup_n \inf_{k \ge n} r_k = -\infty.$$
In other words, $\limsup r_n = \infty$ and $\liminf r_n = -\infty$. □

It is helpful to note that the sequences $\{y_n\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$ defined by $y_n = \sup_{k \ge n} x_k$ and $z_n = \inf_{k \ge n} x_k$ are, respectively, nonincreasing and nondecreasing. Consequently, by Proposition 2.6 on page 45, both sequences are convergent, converging to, respectively,
$$\inf_n y_n = \inf_n \sup_{k \ge n} x_k = \limsup_{n \to \infty} x_n$$
and
$$\sup_n z_n = \sup_n \inf_{k \ge n} x_k = \liminf_{n \to \infty} x_n.$$
In other words,
$$\limsup_{n \to \infty} x_n = \lim_{n \to \infty} \sup_{k \ge n} x_k \qquad \text{and} \qquad \liminf_{n \to \infty} x_n = \lim_{n \to \infty} \inf_{k \ge n} x_k.$$

The next two propositions characterize the limit superior and limit inferior of a sequence of real numbers, providing both mathematical and intuitive interpretations. We will prove the first part of the first proposition and leave the proofs of the remaining parts of both propositions to the reader as exercises.
PROPOSITION 2.8
Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers. We have:
a) $\limsup x_n = x \in \mathcal{R}$ if and only if for each $\varepsilon > 0$, (i) there is an $N \in \mathcal{N}$ such that $x_n < x + \varepsilon$ for $n \ge N$, and (ii) for each $n \in \mathcal{N}$, there is an $m \ge n$ such that $x_m > x - \varepsilon$; in other words, if and only if for each $\varepsilon > 0$, infinitely many terms of the sequence are within $\varepsilon$ of $x$ and only finitely many are greater than $x + \varepsilon$.
b) $\limsup x_n = \infty$ if and only if for each $M \in \mathcal{R}$ and $N \in \mathcal{N}$, there is an $n \ge N$ such that $x_n > M$; in other words, if and only if the sequence is unbounded above.
c) $\limsup x_n = -\infty$ if and only if $\lim_{n \to \infty} x_n = -\infty$.

PROOF: We prove part (a) and leave the proofs of the remaining two parts to the reader as Exercise 2.30. Let $y_n = \sup_{k \ge n} x_k$ and recall that $\{y_n\}_{n=1}^{\infty}$ is nonincreasing and converges to $\limsup x_n$.

Suppose that $x \in \mathcal{R}$ and $\limsup_{n \to \infty} x_n = x$. Then $y_n \ge x$ for all $n \in \mathcal{N}$ and $y_n \to x$ as $n \to \infty$. Let $\varepsilon > 0$ be given. Choose $N \in \mathcal{N}$ such that $n \ge N$ implies $y_n - x < \varepsilon$. Then, for $n \ge N$, $x_n \le y_n < x + \varepsilon$. This establishes (i). To establish (ii), we note that if $n \in \mathcal{N}$, then $\sup_{k \ge n} x_k = y_n \ge x$ and, hence, $\sup_{k \ge n} x_k > x - \varepsilon$. This means that $x - \varepsilon$ is not an upper bound for $\{x_k : k \ge n\}$; in other words, $x_m > x - \varepsilon$ for some $m \ge n$.

Conversely, suppose that for each $\varepsilon > 0$, (i) and (ii) hold. We must prove that $\limsup x_n = x$ or, equivalently, that $\lim_{n \to \infty} y_n = x$. Let $\varepsilon > 0$ be given. By (i), we can choose $N \in \mathcal{N}$ such that $x_n < x + \varepsilon$ for $n \ge N$. This implies that, for $n \ge N$, $y_n = \sup_{k \ge n} x_k \le x + \varepsilon$. By (ii), we know that for each $n \in \mathcal{N}$, there is an $m \ge n$ such that $x_m > x - \varepsilon$, which implies that, for each $n \in \mathcal{N}$, $y_n = \sup_{k \ge n} x_k > x - \varepsilon$. Thus, we have proved that, for each $\varepsilon > 0$, there is an $N \in \mathcal{N}$ such that $|y_n - x| \le \varepsilon$ whenever $n \ge N$; in other words, $\lim_{n \to \infty} y_n = x$.

PROPOSITION 2.9
Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers.
We have:
a) $\liminf x_n = x \in \mathcal{R}$ if and only if for each $\varepsilon > 0$, (i) there is an $N \in \mathcal{N}$ such that $x_n > x - \varepsilon$ for $n \ge N$, and (ii) for each $n \in \mathcal{N}$, there is an $m \ge n$ such that $x_m < x + \varepsilon$; in other words, if and only if for each $\varepsilon > 0$, infinitely many terms of the sequence are within $\varepsilon$ of $x$ and only finitely many are less than $x - \varepsilon$.
b) $\liminf x_n = -\infty$ if and only if for each $M \in \mathcal{R}$ and $N \in \mathcal{N}$, there is an $n \ge N$ such that $x_n < M$; in other words, if and only if the sequence is unbounded below.
c) $\liminf x_n = \infty$ if and only if $\lim_{n \to \infty} x_n = \infty$.
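Example 2.4(b) and Proposition 2.9(a) can be checked on finitely many indices. The Python sketch below (names ours) computes $\min\{x_n, \dots, x_{n+50}\}$ exactly with fractions. For this particular sequence that finite minimum already equals $\inf_{k \ge n} x_k$: the terms $(k - 3)/k$ increase with $k$ and stay below 1, while all other terms are at least 1, so the smallest term from index $n$ on sits at the first multiple of 3 that is at least $n$. The code compares the tail infima with the closed form of Example 2.4(b) and confirms that they are nondecreasing and creep up toward $\liminf x_n = 1$.

```python
from fractions import Fraction

def x(n):
    """The n-th term (n >= 1) of the sequence in Examples 2.3(b) and 2.4(b)."""
    if n % 3 == 0:
        return Fraction(n - 3, n)
    if n % 3 == 1:
        return Fraction(2)
    return Fraction(n + 1, 3)

def tail_inf(n, extra=50):
    """min{x_n, ..., x_{n+extra}}; for this sequence it equals inf_{k>=n} x_k once extra >= 2."""
    return min(x(k) for k in range(n, n + extra + 1))

def closed_form(n):
    """inf_{k>=n} x_k as given in Example 2.4(b)."""
    if n % 3 == 0:
        return Fraction(n - 3, n)
    if n % 3 == 1:
        return Fraction(n - 1, n + 2)
    return Fraction(n - 2, n + 1)

z = [tail_inf(n) for n in range(1, 301)]                         # z_1, ..., z_300
print(all(z[n - 1] == closed_form(n) for n in range(1, 301)))   # True
print(all(a <= b for a, b in zip(z, z[1:])))                     # True: nondecreasing
print(float(1 - z[-1]))                                          # gap to liminf x_n = 1
```

Every $z_n$ stays below 1 while the gap $1 - z_n$ shrinks, which is the picture behind Proposition 2.9(a): for each $\varepsilon > 0$ only finitely many terms fall below $1 - \varepsilon$.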
We mentioned earlier that the limit superior and limit inferior are, respectively, the largest and smallest cluster points of a sequence. This is illustrated by Examples 2.3 and 2.4 (pages 46 and 48) and is proved in our next proposition.

PROPOSITION 2.10
Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Then,
a) $\limsup x_n$ is the largest cluster point of $\{x_n\}_{n=1}^{\infty}$.
b) $\liminf x_n$ is the smallest cluster point of $\{x_n\}_{n=1}^{\infty}$.

PROOF: We prove part (a) and leave the proof of part (b) to the reader as an exercise. Let $x = \limsup x_n$. It follows immediately from the definition of cluster point (Definition 2.4 on page 45) and Proposition 2.8 that $x$ is a cluster point of $\{x_n\}_{n=1}^{\infty}$. It remains to prove that $x$ is the largest cluster point of $\{x_n\}_{n=1}^{\infty}$.

If $x = \infty$, there is nothing to prove. If $x = -\infty$, then Proposition 2.8(c) shows that $\lim_{n \to \infty} x_n = -\infty$. Therefore, by Proposition 2.7 on page 47, $-\infty$ is the only cluster point of $\{x_n\}_{n=1}^{\infty}$ and, hence, the largest. So, we can assume that $x \in \mathcal{R}$. By Proposition 2.8(a), only finitely many terms of the sequence exceed $x + 1$; consequently, $\infty$ is not a cluster point. It therefore remains to prove that if $y \in \mathcal{R}$ and $y > x$, then $y$ is not a cluster point of $\{x_n\}_{n=1}^{\infty}$. Let $\varepsilon = (y - x)/2$. Applying Proposition 2.8(a) again, we know that only finitely many terms of $\{x_n\}_{n=1}^{\infty}$ exceed $x + \varepsilon$ or, equivalently, $y - \varepsilon$. This shows that $y$ is not a cluster point.

The following proposition is often useful. The sufficiency part of the proposition enables us to prove that a sequence converges without explicitly finding its limit, and the necessity part often makes it easy to show that a sequence does not converge.

PROPOSITION 2.11
A necessary and sufficient condition for a sequence of real numbers to converge is that its limit superior and limit inferior are equal. In such cases, the sequence converges to the common value of the limit superior and limit inferior.

PROOF: Suppose $\{x_n\}_{n=1}^{\infty}$ converges.
Then, by Proposition 2.7, the sequence has a unique cluster point, namely, its limit. Since the limit superior and limit inferior are both cluster points (Proposition 2.10), they must both equal the limit of the sequence and, hence, each other.
52 □ Chapter 2 The Real Number System and Calculus

Conversely, suppose that $\limsup x_n = \liminf x_n$. Then, by Proposition 2.10, $\{x_n\}_{n=1}^\infty$ has exactly one cluster point, namely, the common value of the limit superior and limit inferior. Call that common value $x$. We claim that $\lim_{n\to\infty} x_n = x$. If $x = -\infty$, the result is true by Proposition 2.8(c) on page 50, whereas if $x = \infty$, the result is true by Proposition 2.9(c) on page 50. Hence, it remains to show that if $x \in \mathcal{R}$, then $\lim_{n\to\infty} x_n = x$. Let $\varepsilon > 0$. Then, by Proposition 2.8(a), there is an $N_1 \in \mathcal{N}$ such that $x_n < x + \varepsilon$ for $n \ge N_1$ and, by Proposition 2.9(a), there is an $N_2 \in \mathcal{N}$ such that $x_n > x - \varepsilon$ for $n \ge N_2$. Set $N = \max\{N_1, N_2\}$. Then, for $n \ge N$, we have that $x - \varepsilon < x_n < x + \varepsilon$, that is, $|x - x_n| < \varepsilon$.

Proposition 2.7 states that a convergent sequence has exactly one cluster point, namely, its limit. It follows immediately from Propositions 2.10 and 2.11 that the converse is true. In other words, we have the following.

PROPOSITION 2.12 A sequence of real numbers converges if and only if it has exactly one cluster point. In such cases, the limit of the sequence is the unique cluster point.

Cauchy Sequences

Proposition 2.11 provides a criterion for determining whether a sequence of real numbers converges. A special case of that criterion is that a sequence converges in $\mathcal{R}$ if and only if its limit superior and limit inferior are equal and finite. Another criterion for determining whether the limit of a sequence exists and is finite is the Cauchy criterion. Roughly speaking, a sequence is a Cauchy sequence if the terms of the sequence become closer and closer together as the sequence progresses. More precisely, we have the following definition.

DEFINITION 2.6 Cauchy Sequence A sequence $\{x_n\}_{n=1}^\infty$ of real numbers is called a Cauchy sequence if for each $\varepsilon > 0$, there is an $N \in \mathcal{N}$ such that $|x_n - x_m| < \varepsilon$ whenever $n, m \ge N$.
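Definition 2.6 quantifies over all $n, m \ge N$, which no finite computation can check; still, inspecting a finite window $N \le n, m \le M$ shows the idea. In the Python sketch below (helper names are hypothetical), the largest $|x_n - x_m|$ over a window is just the maximum minus the minimum. For the partial sums $s_n$ of $\sum 1/k$, the spread over $N \le n, m \le 2N$ is $s_{2N} - s_N \ge N \cdot \frac{1}{2N} = \frac12$ for every $N$, so that sequence is not Cauchy; for $\sum 1/k^2$ the corresponding spread is below $\frac1N - \frac1{2N}$, using $1/k^2 < 1/(k(k-1))$.

```python
from fractions import Fraction


def partial_sums(term, count):
    """Partial sums s_1, ..., s_count of the series whose k-th term is term(k)."""
    sums, total = [], Fraction(0)
    for k in range(1, count + 1):
        total += term(k)
        sums.append(total)
    return sums


def window_spread(seq, N, M):
    """max |x_n - x_m| over N <= n, m <= M, with 1-based indices.

    On a finite window the largest pairwise distance is simply max - min.
    """
    window = seq[N - 1:M]
    return max(window) - min(window)


harmonic = partial_sums(lambda k: Fraction(1, k), 400)     # s_n = 1 + 1/2 + ... + 1/n
squares = partial_sums(lambda k: Fraction(1, k * k), 400)  # s_n = 1 + 1/4 + ... + 1/n^2

print(float(window_spread(harmonic, 200, 400)))  # stays above 1/2
print(float(window_spread(squares, 200, 400)))   # below 1/400
```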
With Definition 2.6 in mind, we now state and prove the Cauchy criterion for convergence of sequences of real numbers.

THEOREM 2.1 Cauchy Criterion A sequence of real numbers converges in $\mathcal{R}$ if and only if it is a Cauchy sequence.

PROOF: Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers. Suppose that the limit of the sequence exists and is finite, say, $x$. Let $\varepsilon > 0$ be given. Then we can choose $N \in \mathcal{N}$ such that $n \ge N$ implies $|x - x_n| < \varepsilon/2$. Therefore, if $n, m \ge N$, we have
$$|x_n - x_m| \le |x_n - x| + |x - x_m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Thus, $\{x_n\}_{n=1}^\infty$ is a Cauchy sequence.

Conversely, suppose that $\{x_n\}_{n=1}^\infty$ is a Cauchy sequence. Let $\varepsilon > 0$ be given. Then we can choose $N \in \mathcal{N}$ such that $|x_n - x_m| < \varepsilon$ whenever $n, m \ge N$. In particular, we have that
$$x_N - \varepsilon < x_n < x_N + \varepsilon, \qquad n \ge N. \tag{2.1}$$
From (2.1) and Exercise 2.29(b) on page 55, we see that both $\limsup x_n$ and $\liminf x_n$ lie in the interval $[x_N - \varepsilon, x_N + \varepsilon]$. This shows that both $\limsup x_n$ and $\liminf x_n$ are finite and that $0 \le \limsup x_n - \liminf x_n \le 2\varepsilon$. As $\varepsilon > 0$ was chosen arbitrarily, we conclude that $\liminf x_n = \limsup x_n$ and that their common value is a real number. So, by Proposition 2.11, $\lim_{n\to\infty} x_n$ exists and is finite.

EXERCISES 2.2
2.12 Prove that the limit of a sequence of real numbers, if it exists, must be unique.
2.13 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be two sequences of real numbers whose limits exist and are finite. Also, let $c \in \mathcal{R}$. Prove that each of the following holds.
a) $\lim_{n\to\infty}(x_n + y_n) = \lim_{n\to\infty} x_n + \lim_{n\to\infty} y_n$
b) $\lim_{n\to\infty} c x_n = c \cdot \lim_{n\to\infty} x_n$
c) $\lim_{n\to\infty}(x_n y_n) = \lim_{n\to\infty} x_n \cdot \lim_{n\to\infty} y_n$
2.14 Refer to Exercise 2.13. Decide under which conditions each of (a)–(c) holds if convergence is allowed in the extended sense, that is, in $\mathcal{R}^*$.
2.15 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be two convergent sequences of real numbers such that $x_n \le y_n$ for $n$ sufficiently large, that is, there is an $N \in \mathcal{N}$ such that $x_n \le y_n$ for $n \ge N$. Prove that $\lim_{n\to\infty} x_n \le \lim_{n\to\infty} y_n$.
2.16 Prove Proposition 2.6 on page 45.
2.17 Refer to Definition 2.4 on page 45. Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers. Prove each of the following.
a) A real number $x$ is a cluster point of $\{x_n\}_{n=1}^\infty$ if and only if for each $\varepsilon > 0$, infinitely many terms of the sequence are within $\varepsilon$ of $x$.
b) $\infty$ is a cluster point of $\{x_n\}_{n=1}^\infty$ if and only if for each $M \in \mathcal{R}$, infinitely many terms of the sequence exceed $M$ if and only if the sequence is unbounded above.
c) $-\infty$ is a cluster point of $\{x_n\}_{n=1}^\infty$ if and only if for each $M \in \mathcal{R}$, infinitely many terms of the sequence are smaller than $M$ if and only if the sequence is unbounded below.
2.18 Find the cluster points of each of the following sequences.
a) $\{1/n\}_{n=1}^\infty$
b) $\{1 + (-1)^n\}_{n=1}^\infty$
c) $\{\sin(n\pi/2)\}_{n=1}^\infty$
2.19 Consider the sequence $\{x_n\}_{n=1}^\infty$ defined by
$$x_n = \begin{cases} 1, & \text{if } n \equiv 0 \pmod 3;\\ 2, & \text{if } n \equiv 1 \pmod 3;\\ n + 2, & \text{if } n \equiv 2 \pmod 3. \end{cases}$$
Determine the cluster points of the sequence.
2.20 Consider the sequence $\{x_n\}_{n=1}^\infty$ defined by
$$x_n = \begin{cases} n, & \text{if } n \text{ is odd};\\ (n-1)/n, & \text{if } n \text{ is even}. \end{cases}$$
Determine the cluster points of the sequence.
2.21 Let $\{r_n\}_{n=1}^\infty$ be an enumeration of the rational numbers. Prove that every extended real number is a cluster point of this sequence.
2.22 Let $r$ be a rational number, say, $r = p/q$ where $p$ and $q$ are integers with no common divisor other than $\pm 1$. Define $x_n = nr - [nr]$, where $[x]$ denotes the greatest integer in $x$. Determine the cluster points of $\{x_n\}_{n=1}^\infty$.
2.23 Let $c$ be an irrational number and define $x_n = nc - [nc]$, where $[x]$ denotes the greatest integer in $x$. Determine the cluster points of $\{x_n\}_{n=1}^\infty$ by proceeding as follows.
a) Show that the terms of the sequence are distinct, that is, $x_n = x_m$ implies $n = m$.
b) Prove that for each $\varepsilon > 0$ and $N \in \mathcal{N}$, there is an $n \ge N$ such that $0 < x_n < \varepsilon$. Hint: Use the Archimedean principle to choose an $m \in \mathcal{N}$ such that $1/m < \varepsilon$. For $1 \le k \le m$, let $I_k = [(k-1)/m, k/m)$ and note that the $I_k$s are disjoint, their union is $[0,1)$, and each has length $1/m$. Now consider $\{x_j : j = 1, N+1, 2N+1, \ldots, mN+1\}$ and observe that, by part (a), this set consists of $m+1$ distinct numbers in $[0,1)$.
c) Let $x \in [0,1)$. Prove that for each $\varepsilon > 0$ and $N \in \mathcal{N}$, there is an $n \ge N$ such that $|x - x_n| < \varepsilon$. Hint: Choose an $m \in \mathcal{N}$ such that $1/m < \varepsilon$. Apply part (b) to choose an $n \ge N$ such that $0 < x_n < 1/m$. Let $k$ be the unique integer between 1 and $m$ such that $(k-1)/m \le x < k/m$. Now let $\ell$ be the largest positive integer such that $\ell x_n < k/m$.
d) Obtain the cluster points of $\{x_n\}_{n=1}^\infty$.
2.24 Complete the proof of Proposition 2.7 on page 47 by showing that a sequence converging to $\infty$ or $-\infty$ has that value as its unique cluster point.
2.25 Prove that $\inf_{k \ge n} x_k \le \sup_{k \ge m} x_k$ for all $n, m \in \mathcal{N}$, where $\{x_n\}_{n=1}^\infty$ is any sequence of real numbers.
2.26 Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers and $c$ a real number. Show that
a) $\limsup(c + x_n) = c + \limsup x_n$.
b) $\liminf(c + x_n) = c + \liminf x_n$.
c) $\limsup(c x_n) = \begin{cases} c \limsup x_n, & \text{if } c \ge 0;\\ c \liminf x_n, & \text{if } c < 0. \end{cases}$
d) $\liminf(c x_n) = \begin{cases} c \liminf x_n, & \text{if } c \ge 0;\\ c \limsup x_n, & \text{if } c < 0. \end{cases}$
Note that as special cases of parts (c) and (d), we have $\limsup(-x_n) = -\liminf x_n$ and $\liminf(-x_n) = -\limsup x_n$.
2.27 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be sequences of real numbers. Verify that each of the following holds, provided the right-hand side makes sense.
a) $\limsup(x_n + y_n) \le \limsup x_n + \limsup y_n$.
b) $\limsup(x_n + y_n) \ge \limsup x_n + \liminf y_n$.
c) $\liminf(x_n + y_n) \ge \liminf x_n + \liminf y_n$.
d) $\liminf(x_n + y_n) \le \limsup x_n + \liminf y_n$.
2.28 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be sequences of real numbers and assume that $\lim_{n\to\infty} y_n$ exists and is finite.
Prove that
a) $\limsup(x_n + y_n) = \limsup x_n + \lim y_n$.
b) $\liminf(x_n + y_n) = \liminf x_n + \lim y_n$.
2.29 Let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be sequences of real numbers. Suppose $x_n \le y_n$ for $n$ sufficiently large; that is, there is an $N \in \mathcal{N}$ such that $x_n \le y_n$ for $n \ge N$.
a) Prove that $\limsup x_n \le \limsup y_n$ and $\liminf x_n \le \liminf y_n$.
b) Suppose $a$ and $b$ are extended real numbers such that for $n$ sufficiently large, $a \le x_n \le b$. Show that $a \le \liminf x_n \le \limsup x_n \le b$.
2.30 Refer to Proposition 2.8 on page 50. Prove parts (b) and (c).
2.31 Prove Proposition 2.9 on page 50.
★2.32 Prove that $\lim_{n\to\infty} x_n = x$ if and only if every subsequence of $\{x_n\}_{n=1}^\infty$ has a subsequence that converges to $x$.
2.33 Prove that an extended real number is a cluster point of a sequence if and only if the sequence has a subsequence converging to that number. Conclude that the limit superior of a sequence is the limit of a subsequence of the sequence and likewise for the limit inferior.
2.34 Provide an example of a sequence of real numbers that converges in $\mathcal{R}^*$ but is not a Cauchy sequence.
★2.35 Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers. Define
$$a_n = \frac{x_1 + \cdots + x_n}{n} = \frac{1}{n}\sum_{k=1}^{n} x_k,$$
so that $a_n$ is the arithmetic mean of the first $n$ terms of $\{x_n\}_{n=1}^\infty$.
a) Prove that $\liminf x_n \le \liminf a_n \le \limsup a_n \le \limsup x_n$.
b) Prove that if $\{x_n\}_{n=1}^\infty$ converges, then so does $\{a_n\}_{n=1}^\infty$ and, in fact, $\lim_{n\to\infty} a_n = \lim_{n\to\infty} x_n$.
c) Show that the converse of part (b) fails.
2.36 In this exercise, we will discuss infinite series. Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers. The sequence $\{s_n\}_{n=1}^\infty$ defined by
$$s_n = \sum_{k=1}^{n} x_k, \qquad n \in \mathcal{N},$$
is called the sequence of partial sums of $\{x_n\}_{n=1}^\infty$. If the sequence $\{s_n\}_{n=1}^\infty$ converges to a real number, say, $s$, then we say that $\{x_n\}_{n=1}^\infty$ is summable to $s$ or that the infinite series $\sum_{n=1}^\infty x_n$ converges to $s$, and we write
$$s = \sum_{n=1}^{\infty} x_n.$$
We also say that $s$ is the sum of the infinite series. If the sequence $\{s_n\}_{n=1}^\infty$ does not converge to a real number, then we say that $\{x_n\}_{n=1}^\infty$ is not summable or that the infinite series $\sum_{n=1}^\infty x_n$ diverges. For brevity, we often write $\sum x_n$ in place of $\sum_{n=1}^\infty x_n$.
a) Prove that if $x_n \ge 0$ for each $n \in \mathcal{N}$, then either $\lim_{n\to\infty} s_n = \infty$ or $\sum x_n$ converges.
b) Show that if $\sum x_n$ converges, then $\lim_{n\to\infty} x_n = 0$.
c) Show that if $\sum x_n$ converges, then $\lim_{n\to\infty} \sum_{k=n}^{\infty} x_k = 0$.
d) Prove that if $\sum |x_n|$ converges, then so does $\sum x_n$. Hint: Use the Cauchy criterion.
★2.37 In this exercise, we will consider generalized sums. Let $I$ be a nonempty set and $\{x_\iota\}_{\iota \in I}$ an indexed collection of nonnegative real numbers, that is, $x_\iota \ge 0$ for each $\iota \in I$. Define
$$\sum_{\iota \in I} x_\iota = \sup\Big\{ \sum_{\iota \in F} x_\iota : F \text{ finite},\ F \subset I \Big\}, \tag{2.2}$$
where each sum in the set on the right is the ordinary sum of a finite collection of real numbers.
a) Suppose that $I = \{1, \ldots, n\}$. Show that $\sum_{\iota \in I} x_\iota = \sum_{k=1}^{n} x_k$, where the term on the left is interpreted as in (2.2).
b) Suppose that $I = \mathcal{N}$. Show that $\sum_{\iota \in I} x_\iota = \sum_{n=1}^\infty x_n$, where the term on the left is interpreted as in (2.2) and the term on the right as the sum of the infinite series if it converges and $\infty$ otherwise.
c) Show that if $\sum_{\iota \in I} x_\iota < \infty$, then $\{\iota \in I : x_\iota > 0\}$ is countable. Note: This result is often applied in the following form: If $f\colon \Omega \to [0, \infty)$ is such that $\sum_{x \in \Omega} f(x) < \infty$, then $\{x : f(x) > 0\}$ is countable.

2.3 OPEN AND CLOSED SETS

In this section, we will discuss open and closed sets of real numbers. These sets not only play a significant role in classical analysis but, as we will see throughout this book, figure prominently in many areas of modern analysis.

We begin with the definition of an open set. Roughly speaking, a set $O$ of real numbers is open if for each $x \in O$, we can remain in $O$ by staying sufficiently close to $x$. More precisely, we have the following definition.

DEFINITION 2.7 Open Set A subset $O \subset \mathcal{R}$ is said to be an open set if for each $x \in O$, there is an $r > 0$ such that $(x - r, x + r) \subset O$.

In other words, $O$ is open if for each $x \in O$, there is an $r > 0$ such that all numbers within $r$ of $x$ are also members of $O$.
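Definition 2.7 asks, for each point of $O$, for a radius $r$ that keeps us inside $O$. When $O$ is presented as a finite union of open intervals, such an $r$ can be read off from any interval $(a, b)$ containing $x$, namely $r = \min\{x - a, b - x\}$. The Python sketch below (the function name is hypothetical; intervals are given as pairs, with `math.inf` allowed as an endpoint) returns such a witness radius, capped at 1 so that it is always a real number, or `None` when $x \notin O$. For the point 1 of $(0, 1]$, by contrast, no such radius exists, which is why that interval is not open.

```python
import math


def witness_radius(intervals, x):
    """Return some r > 0 with (x - r, x + r) inside the union of the open
    intervals (a, b) in `intervals`, or None if x is not in that union.

    Any single interval (a, b) containing x already yields the radius
    min(x - a, b - x); we keep the best one and cap it at 1.0 so the
    answer stays finite when both endpoints are infinite.
    """
    best = None
    for a, b in intervals:
        if a < x < b:
            r = min(x - a, b - x, 1.0)
            best = r if best is None else max(best, r)
    return best


O = [(0.0, 1.0), (2.0, math.inf)]
print(witness_radius(O, 0.25))  # 0.25
print(witness_radius(O, 1.0))   # None: 1 is not in O
print(witness_radius(O, 5.0))   # 1.0 (capped)
```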
EXAMPLE 2.5 Illustrates Definition 2.7
a) Any interval of the form $(a, b)$, where $-\infty \le a < b \le \infty$, is an open set. Therefore, such intervals are called open intervals.
b) The interval $(0, 1]$ is not open, because $(1 - r, 1 + r) \not\subset (0, 1]$ for all $r > 0$. Similarly, neither $[0, 1)$ nor $[0, 1]$ is an open set.
c) Let $K$ be a nonempty countable subset of $\mathcal{R}$. Then $K$ is not open. Indeed, a nonempty open set must contain an open interval and such an interval is uncountable, as we know from Exercise 1.26 on page 24. In particular, then, $\mathcal{N}$, $\mathcal{Z}$, and $\mathcal{Q}$ are not open, and no nonempty finite set is open.
d) The set $\mathcal{Q}^c$ of irrational numbers is not open. If it were, then it would have to contain an open interval. Such an interval would contain no rational numbers, which is impossible by Proposition 2.4 on page 39. □

Our next theorem displays three fundamental properties of the collection of open sets. As we will see in Section 7.1 (beginning on page 411), these three properties are precisely the ones needed to generalize the concept of open sets to other frameworks in the form of topological spaces.

THEOREM 2.2
a) $\mathcal{R}$ and $\emptyset$ are open sets.
b) If $A$ and $B$ are open sets, then so is $A \cap B$.
c) If $\{O_\iota\}_{\iota \in I}$ is a collection of open sets, then $\bigcup_{\iota \in I} O_\iota$ is open.

PROOF: The proof of (a) is trivial. For (b), suppose $A$ and $B$ are open sets. We must show that $A \cap B$ is open. Let $x \in A \cap B$. Then $x \in A$ and $x \in B$. Since $A$ and $B$ are open, there exist $r_1, r_2 > 0$ such that $(x - r_1, x + r_1) \subset A$ and $(x - r_2, x + r_2) \subset B$. Let $r = \min\{r_1, r_2\}$. Then we have $(x - r, x + r) \subset A$ and $(x - r, x + r) \subset B$ so that $(x - r, x + r) \subset A \cap B$. Hence, $A \cap B$ is open.

Now we prove (c). Suppose $\{O_\iota\}_{\iota \in I}$ is a collection of open sets and let $O = \bigcup_{\iota \in I} O_\iota$. We must show that $O$ is open. Let $x \in O$. Then $x \in O_\iota$ for some $\iota \in I$, say, $\iota_0$. Since $x \in O_{\iota_0}$ and $O_{\iota_0}$ is open, there is an $r > 0$ such that $(x - r, x + r) \subset O_{\iota_0}$. Consequently, because $O_{\iota_0} \subset O$, we have that $(x - r, x + r) \subset O$. Thus, $O$ is open.
Theorem 2.2(b) shows that the intersection of two open sets is open. It follows easily by induction that the intersection of a finite number of open sets is open; that is, if $O_k$ is an open set for $k = 1, 2, \ldots, n$, then $\bigcap_{k=1}^{n} O_k$ is also an open set. However, the extension to arbitrary (even countable)
collections is not valid. Indeed, for each $n \in \mathcal{N}$, let $O_n = (-1/n, 1/n)$. Then each $O_n$ is open but $\bigcap_{n=1}^{\infty} O_n = \{0\}$ is not open.

We have seen that if $a, b \in \mathcal{R}^*$ with $a < b$, then the interval $(a, b)$ is an open set, called an open interval. It follows from Theorem 2.2(c) that unions of collections of open intervals are open sets. As the next proposition shows, all open sets are of this form.

PROPOSITION 2.13 Each open set $O$ is a countable union of disjoint open intervals. The representation is unique in the sense that if $\mathcal{C}$ and $\mathcal{D}$ are two pairwise disjoint collections of open intervals whose union is $O$, then $\mathcal{C} = \mathcal{D}$.

PROOF: Let $O$ be an open set. We first show that $O$ can be expressed as a union of open intervals. The idea is this: For each $x \in O$, go as far as possible in either direction from $x$ without leaving $O$; this will yield an open interval containing $x$ and contained in $O$. The union of these open intervals will equal $O$.

More formally, let $x \in O$. Define $A_x = \{ y : y < x \text{ and } (y, x) \subset O \}$ and $B_x = \{ z : z > x \text{ and } (x, z) \subset O \}$. The sets $A_x$ and $B_x$ are nonempty because $O$ is open. Let $a_x = \inf A_x$ and $b_x = \sup B_x$. Here are two properties that we will need. First, $a_x < x < b_x$. This is true because if $y \in A_x$, then $a_x \le y < x$ and, so, $a_x < x$; similarly, $x < b_x$. Second, $a_x, b_x \notin O$. To see this, suppose to the contrary that $a_x \in O$. Then $(a_x - r, a_x + r) \subset O$ for some $r > 0$, and we can always choose $r < x - a_x$. Since $a_x + r > a_x$, there is a $y \in A_x$ such that $y < a_x + r$ and, since $y \in A_x$, $(y, x) \subset O$. It follows that $(a_x - r, x) = (a_x - r, a_x + r) \cup (y, x) \subset O$ and, hence, that $a_x - r \in A_x$. But this is impossible because $a_x$ is a lower bound for $A_x$. Thus, $a_x \notin O$ and, similarly, $b_x \notin O$.

Set $I_x = (a_x, b_x)$ and note that $x \in I_x$. We claim that $I_x \subset O$. Let $u \in I_x$; then $a_x < u < b_x$. Thus, we can choose $y \in A_x$ and $z \in B_x$ such that $y < u < z$. If $u \le x$, then $u \in (y, x] \subset O$ and, if $u > x$, then $u \in (x, z) \subset O$. Hence, $I_x \subset O$. We can now conclude that $\bigcup_{x \in O} I_x \subset O$.
On the other hand, as $x \in I_x$ for each $x \in O$, we have $O \subset \bigcup_{x \in O} I_x$. Thus, $O = \bigcup_{x \in O} I_x$.

Next we show that either $I_x \cap I_y = \emptyset$ or $I_x = I_y$. So, suppose that $I_x \cap I_y \ne \emptyset$. Then $a_x < b_y$ and $a_y < b_x$. Since $a_x \notin O$, we have $a_x \notin (a_y, b_y)$ and, so, $a_x \le a_y$. Similarly, $a_y \le a_x$. Thus, $a_x = a_y$. Likewise, $b_x = b_y$. Hence, $I_x = I_y$.

Now, let $\mathcal{C} = \{ I_x : x \in O \}$. Then, as we have seen, $\mathcal{C}$ is pairwise disjoint and $\bigcup_{A \in \mathcal{C}} A = O$. We claim that $\mathcal{C}$ is countable. Let $A \in \mathcal{C}$. Because $A$ is an open interval, we can, by the density of the rational numbers, select a rational number $r_A \in A$. Define $f\colon \mathcal{C} \to \mathcal{Q}$ by $f(A) = r_A$. This function
is one-to-one because $\mathcal{C}$ is pairwise disjoint. Hence, $\mathcal{C}$ is equivalent to a subset of $\mathcal{Q}$ and, consequently, is countable.

We leave the proof of the uniqueness of the representation as an exercise for the reader.

Closed Sets

Open sets constitute an important class of sets. Another important class of sets comprises the closed sets. To begin our discussion of closed sets, we make the following definition.

DEFINITION 2.8 Limit Point, Closure Let $E \subset \mathcal{R}$. A real number $x$ is called a limit point¹ of $E$ if for each $\varepsilon > 0$, there is a $y \in E$ such that $|y - x| < \varepsilon$. The set of all limit points of $E$, denoted $\overline{E}$, is called the closure of $E$.

It is easy to see that each of the following two conditions is equivalent to $x$ being a limit point of $E$ (i.e., $x \in \overline{E}$).
• Each open interval containing $x$ contains a member of $E$; that is, if $I$ is an open interval such that $x \in I$, then $I \cap E \ne \emptyset$.
• There is a sequence $\{x_n\}_{n=1}^\infty$ of elements of $E$ such that $\lim_{n\to\infty} x_n = x$; hence the terminology "$x$ is a limit point of $E$."

EXAMPLE 2.6 Illustrates Definition 2.8 We leave the verification of each part that follows to the reader.
a) $\overline{\mathcal{R}} = \mathcal{R}$ and $\overline{\emptyset} = \emptyset$.
b) Let $a, b \in \mathcal{R}$ with $a < b$. Then $\overline{(a, b)} = \overline{[a, b)} = \overline{(a, b]} = \overline{[a, b]} = [a, b]$.
c) $\overline{\mathcal{N}} = \mathcal{N}$ and $\overline{\mathcal{Z}} = \mathcal{Z}$.
d) $\overline{\mathcal{Q}} = \mathcal{R}$ and $\overline{\mathcal{Q}^c} = \mathcal{R}$.
e) If $A$ is a finite subset of $\mathcal{R}$, then $\overline{A} = A$. □

Note that every point of a set $E$ is a limit point of $E$, that is, $E \subset \overline{E}$. However, the reverse inclusion need not hold: there may be limit points of $E$ that do not belong to $E$. For instance, 1 is a limit point of $[0, 1)$ but does not belong to that set. If a set contains all its limit points, it is called closed.

¹ Some texts use the term point of closure instead of limit point and reserve the term limit point for a related concept.
DEFINITION 2.9 Closed Set A subset $F \subset \mathcal{R}$ is said to be a closed set if $\overline{F} = F$, that is, if $F$ contains all its limit points.

EXAMPLE 2.7 Illustrates Definition 2.9 Referring to Example 2.6, we conclude the following:
a) $\mathcal{R}$ and $\emptyset$ are closed sets. But we also know from Theorem 2.2(a) that $\mathcal{R}$ and $\emptyset$ are open sets. We leave it to the reader as an exercise to show that these are the only two subsets of $\mathcal{R}$ that are both open and closed. (See Exercise 2.46.)
b) The intervals of $\mathcal{R}$ that are closed are those of the form $[a, b]$, $[a, \infty)$, and $(-\infty, b]$, where $a, b \in \mathcal{R}$. Such intervals are called closed intervals. Note: Intervals of the form $(a, b]$ and $[a, b)$, where $a, b \in \mathcal{R}$, are called half-open intervals. Degenerate intervals of the form $[a, a]$ are closed sets; degenerate intervals of the form $(a, a)$, $(a, a]$, and $[a, a)$ are empty and, hence, both open and closed.
c) $\mathcal{N}$ and $\mathcal{Z}$ are closed.
d) Neither $\mathcal{Q}$ nor $\mathcal{Q}^c$ is closed.
e) Any finite subset of $\mathcal{R}$ is closed.
f) A set may be neither open nor closed; examples are $\mathcal{Q}$, $\mathcal{Q}^c$, and any half-open interval. □

The fundamental relationship between open and closed sets is elucidated by the following proposition.

PROPOSITION 2.14 A set is open if and only if its complement is closed or, equivalently, a set is closed if and only if its complement is open.

PROOF: Suppose that $E^c$ is open. We will show that $E$ is closed by proving that it contains all its limit points. So, assume that $x \notin E$, that is, $x \in E^c$. Since $E^c$ is open, there is an $r > 0$ such that $(x - r, x + r) \subset E^c$. But then $(x - r, x + r)$ is an open interval about $x$ containing no points of $E$; hence, $x \notin \overline{E}$. We have therefore shown that $\overline{E} \subset E$, as required.

Conversely, suppose that $E$ is closed. If $x \in E^c$, then $x \notin E$ and, consequently, since $E$ is closed, we have $x \notin \overline{E}$. Hence, there is an $\varepsilon > 0$ such that $(x - \varepsilon, x + \varepsilon) \subset E^c$. We have thus shown that $E^c$ is open.
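Propositions 2.13 and 2.14 can be made concrete when an open set is given as a finite union of open intervals: merging overlapping intervals produces the disjoint open intervals of Proposition 2.13, and what remains of $\mathcal{R}$ is a closed set, here a finite union of closed intervals and points. The Python sketch below assumes finite lists of pairs `(a, b)` with `a < b` (endpoints may be `math.inf` or `-math.inf`); the function names are our own, and the sketch does not cover general open sets, which may require countably many intervals.

```python
import math


def components(intervals):
    """Disjoint open intervals whose union equals the union of `intervals`.

    Each pair (a, b) with a < b stands for the open interval a < x < b, and
    endpoints may be -math.inf or math.inf. Intervals are merged only when
    they overlap: (0, 1) and (1, 2) stay separate because 1 is not covered.
    """
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:
            last_a, last_b = merged[-1]
            merged[-1] = (last_a, max(last_b, b))
        else:
            merged.append((a, b))
    return merged


def complement_pieces(comps):
    """Closed pieces of the complement of sorted, disjoint open intervals.

    A pair (c, d) stands for {x real : c <= x <= d}; an infinite endpoint
    means the piece is unbounded on that side, and (c, c) is the point {c}.
    """
    pieces = []
    left = -math.inf
    for a, b in comps:
        if left < a or (left == a and not math.isinf(a)):
            pieces.append((left, a))
        left = b
    if left != math.inf:
        pieces.append((left, math.inf))
    return pieces


O = [(3, 5), (0, 2), (1, 2.5), (2.5, 4), (6, math.inf)]
comps = components(O)
print(comps)                     # [(0, 2.5), (2.5, 5), (6, inf)]
print(complement_pieces(comps))  # [(-inf, 0), (2.5, 2.5), (5, 6)]
```

The complement here is $(-\infty, 0] \cup \{2.5\} \cup [5, 6]$; note that the touching intervals $(0, 2.5)$ and $(2.5, 5)$ stay separate, leaving the single point $2.5$ behind.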
Open and Closed Sets of a Subset of $\mathcal{R}$

Frequently, our "universal set" will be a proper subset of $\mathcal{R}$. Therefore, we need to discuss open and closed sets of a subset $D \subset \mathcal{R}$.

DEFINITION 2.10 Open Set of D Let $D \subset \mathcal{R}$. A subset $G \subset D$ is said to be open in $D$ if for each $x \in G$, there is an $r > 0$ such that $(x - r, x + r) \cap D \subset G$.

Thus, $G$ is an open subset of $D$ if for each $x \in G$, there is an $r > 0$ such that all numbers within $r$ of $x$ that are members of $D$ are also members of $G$.

EXAMPLE 2.8 Illustrates Definition 2.10
a) Let $D = [0, 2]$. Then the interval $[0, 1)$ is open in $D$. Note, however, that it is not open in $\mathcal{R}$.
b) Let $D = [0, 2]$. Then the interval $[0, 1]$ is not open in $D$ because, for each $r > 0$, we have $(1 - r, 1 + r) \cap [0, 2] \not\subset [0, 1]$.
c) Let $D = \mathcal{N}$. Then every subset $A \subset \mathcal{N}$ is open in $\mathcal{N}$. Indeed, if $n \in A$, then $(n - \tfrac12, n + \tfrac12) \cap \mathcal{N} = \{n\} \subset A$. □

The following theorem provides the relationship between open sets of a subset of $\mathcal{R}$ and open sets of $\mathcal{R}$.

THEOREM 2.3 Let $D \subset \mathcal{R}$. A set $G \subset D$ is open in $D$ if and only if there is an open set $O$ of $\mathcal{R}$ such that $G = D \cap O$. In other words, the open sets in $D$ are precisely the open sets of $\mathcal{R}$ intersected with $D$.

PROOF: Suppose $G \subset D$ is open in $D$. Then, for each $x \in G$, there is an open interval $I_x$ (open in $\mathcal{R}$) containing $x$ such that $I_x \cap D \subset G$. Let $O = \bigcup_{x \in G} I_x$. Then, by Theorem 2.2(c), $O$ is open in $\mathcal{R}$. We will show that $G = D \cap O$. If $x \in G$, then $x \in D$ and $x \in I_x \subset O$; thus, $G \subset D \cap O$. On the other hand, since $I_x \cap D \subset G$ for all $x \in G$, we have
$$D \cap O = D \cap \Big( \bigcup_{x \in G} I_x \Big) = \bigcup_{x \in G} (I_x \cap D) \subset G.$$
Hence, $G = D \cap O$, as required.

Conversely, suppose $G = D \cap O$ for some open set $O$ of $\mathcal{R}$. If $x \in G$, then $x \in O$ and, hence, there is an $r > 0$ such that $(x - r, x + r) \subset O$.
This, in turn, implies that $(x - r, x + r) \cap D \subset O \cap D = G$. Hence, $G$ is open in $D$.

Limit points, closure, and closed sets in $D$ are defined in a way analogous to that in $\mathcal{R}$. We leave the details to the reader in Exercise 2.52.

EXERCISES 2.3
2.38 Prove that the intersection of a finite number of open sets is open; that is, if $O_k$ is an open set for $k = 1, 2, \ldots, n$, then $\bigcap_{k=1}^{n} O_k$ is also an open set.
2.39 Prove the uniqueness portion of Proposition 2.13 on page 59.
2.40 Prove Lindelöf's theorem: Let $\mathcal{O}$ be a collection of open sets. Then there is a countable subcollection $\{O_n\}_n$ of $\mathcal{O}$ such that
$$\bigcup_{O \in \mathcal{O}} O = \bigcup_n O_n.$$
2.41 Let $E \subset \mathcal{R}$ and $x \in \mathcal{R}$. Prove that the following are equivalent.
a) $x \in \overline{E}$ (i.e., $x$ is a limit point of $E$).
b) Each open interval containing $x$ contains a member of $E$; that is, if $I$ is an open interval such that $x \in I$, then $I \cap E \ne \emptyset$.
c) Each open set containing $x$ contains a member of $E$; that is, if $O$ is an open set such that $x \in O$, then $O \cap E \ne \emptyset$.
d) There is a sequence $\{x_n\}_{n=1}^\infty$ of elements of $E$ such that $\lim_{n\to\infty} x_n = x$.
2.42 Refer to Example 2.6 on page 60. Verify each of the statements made in that example.
2.43 Let $E \subset \mathcal{R}$. A real number $x$ is called an accumulation point of $E$ if for each $\varepsilon > 0$, there is a $y \in E$ such that $0 < |y - x| < \varepsilon$. Prove that the following are equivalent.
a) $x$ is an accumulation point of $E$.
b) Each open interval containing $x$ contains a member of $E$ different from $x$; that is, if $I$ is an open interval such that $x \in I$, then $I \cap (E \setminus \{x\}) \ne \emptyset$.
c) $x \in \overline{E \setminus \{x\}}$.
d) There exists a sequence $\{x_n\}_{n=1}^\infty$ of distinct elements of $E$ such that $\lim_{n\to\infty} x_n = x$.
2.44 Refer to Exercise 2.43. Let $E'$ denote the set of all accumulation points of a set $E \subset \mathcal{R}$.
a) Prove that $E'$ is closed.
b) Prove that $\overline{E} = E \cup E'$.
★2.45 Refer to Exercise 2.43. Prove the Bolzano-Weierstrass theorem: Every bounded infinite subset of real numbers has an accumulation point.
Hint: Use the fact that every infinite set contains a countably infinite subset (Exercise 1.28 on page 25).
2.46 Prove that $\mathcal{R}$ and $\emptyset$ are the only two subsets of $\mathcal{R}$ that are both open and closed. Hint: Let $A$ be a nonempty proper open subset of $\mathcal{R}$. Choose $x \in A$ and let $a_x$ and $b_x$ be as in the proof of Proposition 2.13 on page 59.
2.47 Let $A$ and $B$ be subsets of $\mathcal{R}$. Establish each of the following facts.
a) $\overline{A \cup B} = \overline{A} \cup \overline{B}$.
b) $\overline{A \cap B} \subset \overline{A} \cap \overline{B}$. Provide an example to show that the reverse inclusion does not hold in general.
c) $\overline{A}$ is closed.
d) If $A$ and $B$ are closed, then so is $A \cup B$.
e) If $\{F_\iota\}_{\iota \in I}$ is a collection of closed sets, then $\bigcap_{\iota \in I} F_\iota$ is closed.
2.48 True or False: If $\{F_\iota\}_{\iota \in I}$ is a collection of closed sets, then $\bigcup_{\iota \in I} F_\iota$ is closed.
2.49 Let $A$ and $B$ be subsets of $\mathcal{R}$ and $\{A_\iota\}_{\iota \in I}$ be a collection of subsets of $\mathcal{R}$. Establish each of the following facts.
a) If $A \subset B$, then $\overline{A} \subset \overline{B}$.
b) If $A \subset B \subset \overline{A}$, then $\overline{A} = \overline{B}$.
c) We have $\bigcup_{\iota \in I} \overline{A_\iota} \subset \overline{\bigcup_{\iota \in I} A_\iota}$.
d) Referring to part (c), can we, in general, replace "$\subset$" by "$=$"? If not, state a condition on $I$ that assures the replacement is valid.
2.50 Let $D \subset \mathcal{R}$.
a) Suppose that $D$ is an open subset of $\mathcal{R}$. Prove that a subset of $D$ is open in $D$ if and only if it is open in $\mathcal{R}$.
b) Show that the result of part (a) fails to hold without the assumption that $D$ is an open subset of $\mathcal{R}$.
c) Prove that a subset of $D$ is open in $D$ if it is open in $\mathcal{R}$.
2.51 Let $D \subset \mathcal{R}$. Prove that the collection of open sets of $D$ satisfies the three properties listed in Theorem 2.2 (page 58). That is,
a) $D$ and $\emptyset$ are open in $D$.
b) If $A$ and $B$ are open in $D$, then so is $A \cap B$.
c) If $\{G_\iota\}_{\iota \in I}$ is a collection of sets open in $D$, then $\bigcup_{\iota \in I} G_\iota$ is open in $D$.
2.52 In this exercise, we will explore limit points, closure, and closed sets of a subset of $\mathcal{R}$. Let $D \subset \mathcal{R}$ and $E \subset D$.
a) Define a limit point of $E$ in $D$; call such a limit point a $D$-limit point of $E$.
b) Define the closure of $E$ in $D$; call it the $D$-closure of $E$.
c) Define "$E$ is closed in $D$."
d) Prove that $E$ is closed in $D$ if and only if $D \setminus E$ is open in $D$.
e) Prove that $E$ is closed in $D$ if and only if there is a closed set $F$ of $\mathcal{R}$ such that $E = D \cap F$.
2.4 REAL-VALUED FUNCTIONS

A real-valued function is a function whose range is a subset of $\mathcal{R}$. If $f\colon \Omega \to \mathcal{R}$, then we say that $f$ is a real-valued function on $\Omega$. In this section, we will discuss real-valued functions and several concepts associated with them. Much of the section is concerned with real-valued functions whose domains are a subset of $\mathcal{R}$.

We begin by defining algebraic operations on real-valued functions. This is done pointwise as follows. Suppose that $f$ and $g$ are real-valued functions on $\Omega$ and that $\alpha \in \mathcal{R}$. Then we define the functions $f + g$, $\alpha f$, and $f \cdot g$ on $\Omega$ by
$$(f + g)(x) = f(x) + g(x), \qquad (\alpha f)(x) = \alpha f(x), \qquad (f \cdot g)(x) = f(x) g(x),$$
for each $x \in \Omega$.

Continuous Functions

The most important functions in calculus are the continuous functions. They play a prominent role in modern analysis as well. Roughly speaking, a function $f$ is continuous at $x_0$ if $f(x)$ can be made arbitrarily close to $f(x_0)$ by taking $x$ sufficiently close to $x_0$. More precisely, we have the following definition for a real-valued function defined on a subset of $\mathcal{R}$.

DEFINITION 2.11 Continuous Function Let $D \subset \mathcal{R}$, $f\colon D \to \mathcal{R}$, and $x_0 \in D$. We say that $f$ is continuous at $x_0$ if for each $\varepsilon > 0$, there is a $\delta > 0$ such that $|f(x) - f(x_0)| < \varepsilon$ whenever $x \in D$ and $|x - x_0| < \delta$. We say that $f$ is continuous on $D$ if it is continuous at every point of $D$. We denote by $C(D)$ the collection of all continuous functions on $D$.† For simplicity, and when no confusion will arise, we often write $C$ for $C(\mathcal{R})$.

Note: If $f$ is not continuous at $x_0$, then we say that $f$ is discontinuous at $x_0$ or that $x_0$ is a point of discontinuity of $f$.

† This notation is temporary and will be modified and generalized in Chapter 7.
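To see Definition 2.11 at work on a concrete case, take $f(x) = 1/x$ on $D = (0, \infty)$ (the function of Example 2.9(a)). If $|x - x_0| < \delta \le x_0/2$, then $x > x_0/2$, so
$$\left| \frac{1}{x} - \frac{1}{x_0} \right| = \frac{|x - x_0|}{x x_0} < \frac{2\delta}{x_0^2},$$
and therefore $\delta = \min\{x_0/2,\ \varepsilon x_0^2/2\}$ works. The Python sketch below encodes this choice and spot-checks it at sample points; the function names are hypothetical, and a finite spot check is only a sanity test, not a proof. Note that this $\delta$ shrinks as $x_0$ approaches 0.

```python
def delta_for_reciprocal(x0, eps):
    """A delta that works in Definition 2.11 for f(x) = 1/x at a point x0 > 0."""
    # delta <= x0/2 keeps x > x0/2; delta <= eps*x0**2/2 then forces |1/x - 1/x0| < eps.
    return min(x0 / 2, eps * x0 ** 2 / 2)


def spot_check(x0, eps, samples=1000):
    """Test |1/x - 1/x0| < eps at evenly spaced points strictly inside (x0 - delta, x0 + delta)."""
    delta = delta_for_reciprocal(x0, eps)
    points = (x0 - delta + 2 * delta * i / samples for i in range(1, samples))
    return all(abs(1 / x - 1 / x0) < eps for x in points)


print(delta_for_reciprocal(1.0, 0.1))                # 0.05
print(spot_check(0.01, 0.5), spot_check(3.0, 1e-6))  # True True
```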
EXAMPLE 2.9 Illustrates Definition 2.11
a) Let $D = (0, \infty)$ and define $f(x) = 1/x$. Then $f$ is continuous on $D$.
b) Let $D = \mathcal{R}$. Define $f(0) = 0$ and $f(x) = \sin(1/x)$ for $x \ne 0$. Then $f$ is continuous except at 0.
c) Let $D = \mathcal{R}$ and define $f(x) = [x]$. Then $f$ is continuous except at points of $\mathcal{Z}$.
d) Every function is continuous on $\mathcal{N}$. Indeed, let $f\colon \mathcal{N} \to \mathcal{R}$ and $x_0 \in \mathcal{N}$. Then $|f(x) - f(x_0)| = 0$ whenever $x \in \mathcal{N}$ and $|x - x_0| < 1$. □

An important property of the continuous functions on a subset $D$ of $\mathcal{R}$ is that they form an algebra of functions. In other words, we have the following theorem whose proof is left to the reader as an exercise.

THEOREM 2.4 Let $D \subset \mathcal{R}$. Then the collection $C(D)$ of continuous functions on $D$ is an algebra of functions. That is, if $f, g \in C(D)$ and $\alpha \in \mathcal{R}$, then
a) $f + g \in C(D)$.
b) $\alpha f \in C(D)$.
c) $f \cdot g \in C(D)$.

Our next theorem provides a relationship between continuous functions on $D$ and the open sets in $D$.

THEOREM 2.5 Let $D \subset \mathcal{R}$ and $f\colon D \to \mathcal{R}$. Then $f$ is continuous on $D$ if and only if $f^{-1}(O)$ is open in $D$ for each open set $O$ in $\mathcal{R}$.

PROOF: Suppose that $f$ is continuous on $D$. Let $O$ be an open set in $\mathcal{R}$ and $x_0 \in f^{-1}(O)$. Then $f(x_0) \in O$ and, so, because $O$ is open, there is an $r > 0$ such that $(f(x_0) - r, f(x_0) + r) \subset O$. Since $f$ is continuous at $x_0$, there is a $\delta > 0$ such that $|f(x) - f(x_0)| < r$ whenever $|x - x_0| < \delta$ and $x \in D$. Therefore, if $x \in (x_0 - \delta, x_0 + \delta) \cap D$, then $f(x) \in (f(x_0) - r, f(x_0) + r) \subset O$ and, hence, $x \in f^{-1}(O)$. Consequently, we have found a $\delta > 0$ such that $(x_0 - \delta, x_0 + \delta) \cap D \subset f^{-1}(O)$. It now follows that $f^{-1}(O)$ is open in $D$.

Conversely, suppose $f^{-1}(O)$ is open in $D$ for each open set $O$ in $\mathcal{R}$. Let $x_0 \in D$. We will prove that $f$ is continuous at $x_0$. Let $\varepsilon > 0$. The set $(f(x_0) - \varepsilon, f(x_0) + \varepsilon)$ is open in $\mathcal{R}$ and, so, $G = f^{-1}\big((f(x_0) - \varepsilon, f(x_0) + \varepsilon)\big)$ is open in $D$. Since $x_0 \in G$ and $G$ is open in $D$, there is an $r > 0$ such that $(x_0 - r, x_0 + r) \cap D \subset G$.
Thus, if $x \in D$ and $|x - x_0| < r$, then $x \in G$ and, consequently, $f(x) \in (f(x_0) - \varepsilon, f(x_0) + \varepsilon)$, that is, $|f(x) - f(x_0)| < \varepsilon$.
COROLLARY 2.1 A function $f\colon \mathcal{R} \to \mathcal{R}$ is continuous if and only if $f^{-1}(O)$ is open in $\mathcal{R}$ whenever $O$ is open in $\mathcal{R}$.

We can restate Theorem 2.5 as follows: A real-valued function on $D$ is continuous if and only if the inverse image of each open set in $\mathcal{R}$ is open in $D$. This relationship between continuous functions and open sets is significant because it provides a way for us to define continuity of functions in very general settings, as we will see in Chapter 7.

Monotone Functions

Functions defined on an interval that never decrease as $x$ increases or never increase as $x$ increases play a significant role in analysis.

DEFINITION 2.12 Monotone Function Let $f$ be a real-valued function defined on an interval $I$ of real numbers. Then $f$ is said to be
a) nondecreasing on $I$ if $f(x) \le f(y)$ whenever $x, y \in I$ and $x < y$.
b) nonincreasing on $I$ if $f(x) \ge f(y)$ whenever $x, y \in I$ and $x < y$.
c) monotone on $I$ if it is either nondecreasing or nonincreasing on $I$.

Note: Some authors use the term increasing in place of nondecreasing and use the phrase strictly increasing to indicate that $f(x) < f(y)$ whenever $x < y$. We will use both “increasing” and “strictly increasing” to describe functions satisfying this latter condition, but will avoid both terms for functions that do not increase in the strict sense. Thus, for us, each of the terms “nondecreasing,” “strictly increasing,” and “increasing” applies equally well to the function $f(x) = x^3$; but we would only use the term “nondecreasing” to describe the function $f(x) = 1$. Similar remarks hold for the three terms nonincreasing, strictly decreasing, and decreasing.

EXAMPLE 2.10 Illustrates Definition 2.12
a) The function $f(x) = e^x$ is nondecreasing on any interval. It is also monotone and (strictly) increasing.
b) The function $f(x) = 1/x$ is nonincreasing on any interval not containing 0. It is also monotone and (strictly) decreasing on any such interval.
c) The function $f(x) = \sin x$ is nondecreasing on $[-\pi/2, \pi/2]$ and nonincreasing on $[\pi/2, 3\pi/2]$. However, it is not monotone on $[0, \pi]$. □
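Definition 2.12 can be probed numerically. The Python sketch below (the function name is hypothetical) samples a function on a uniform grid of $[a, b]$ and reports whether the sampled values are nondecreasing, nonincreasing, or neither. Such a check can only refute monotonicity; agreement on a grid does not prove it on the whole interval. In keeping with the convention in the Note above, a constant function is reported as nondecreasing. Applied to $\sin$, it agrees with part (c) of the preceding example.

```python
import math


def grid_direction(f, a, b, points=1001):
    """Classify f on a uniform grid of [a, b] as 'nondecreasing',
    'nonincreasing', or 'not monotone' (on that grid only)."""
    xs = [a + (b - a) * i / (points - 1) for i in range(points)]
    ys = [f(x) for x in xs]
    pairs = list(zip(ys, ys[1:]))
    if all(y1 <= y2 for y1, y2 in pairs):
        return "nondecreasing"   # checked first, so constants land here
    if all(y1 >= y2 for y1, y2 in pairs):
        return "nonincreasing"
    return "not monotone"


print(grid_direction(math.sin, -math.pi / 2, math.pi / 2))     # nondecreasing
print(grid_direction(math.sin, math.pi / 2, 3 * math.pi / 2))  # nonincreasing
print(grid_direction(math.sin, 0.0, math.pi))                  # not monotone
```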
Pointwise Limits

We know from Theorem 2.4 on page 66 that $C(D)$, the collection of real-valued continuous functions on a set $D \subset \mathcal{R}$, is an algebra of functions: it is closed under sums, multiples, and products. Being an algebra is a useful and important property for a collection of functions to have. Another desirable property for a collection of functions is that it be closed under pointwise limits. This concept is relevant to any collection of real-valued functions having a common domain, not just to those whose domain is some subset of $\mathcal{R}$. To begin, we define pointwise convergence of a sequence of real-valued functions.

DEFINITION 2.13 Pointwise Convergence Let $\{f_n\}_{n=1}^\infty$ be a sequence of real-valued functions on a set $\Omega$, that is, $f_n\colon \Omega \to \mathcal{R}$ for each $n \in \mathcal{N}$. Then we say that $\{f_n\}_{n=1}^\infty$ converges pointwise on $\Omega$ if for each $x \in \Omega$, the sequence $\{f_n(x)\}_{n=1}^\infty$ of real numbers converges in $\mathcal{R}$.

If $\{f_n\}_{n=1}^\infty$ converges pointwise on $\Omega$, then we can define a function $f\colon \Omega \to \mathcal{R}$ by $f(x) = \lim_{n\to\infty} f_n(x)$. We say that the function $f$ is the pointwise limit of the sequence of functions $\{f_n\}_{n=1}^\infty$ or that the sequence of functions $\{f_n\}_{n=1}^\infty$ converges pointwise to the function $f$. We write $f_n \to f$ pointwise to indicate pointwise convergence of $\{f_n\}_{n=1}^\infty$ to $f$.

EXAMPLE 2.11 Illustrates Definition 2.13
a) For each $n \in \mathcal{N}$, define $f_n\colon \mathcal{R} \to \mathcal{R}$ by $f_n(x) = (1 + x/n)^n$. Then $f_n \to f$ pointwise on $\mathcal{R}$, where $f(x) = e^x$.
b) Let $D \subset \mathcal{R}$ and define, for each $n \in \mathcal{N}$, $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x^n$. If $D = [0, 1]$, then $f_n \to f$ pointwise, where
$$f(x) = \begin{cases} 0, & \text{if } 0 \le x < 1;\\ 1, & \text{if } x = 1. \end{cases}$$
However, the sequence of functions fails to converge pointwise if $D = [-1, 1]$ since the sequence $\{(-1)^n\}_{n=1}^\infty$ does not converge; it also fails to converge pointwise if $D = [0, 2]$ since, for instance, the sequence $\{2^n\}_{n=1}^\infty$ does not converge in $\mathcal{R}$.
c) For each $n \in \mathcal{N}$, define $f_n\colon \mathcal{R} \to \mathcal{R}$ by
$$f_n(x) = \begin{cases} n^2 x, & \text{if } |x| \le 1/n;\\ 1/x, & \text{otherwise}. \end{cases}$$
2.4 Real-Valued Functions □ 69

Then $f_n \to f$ pointwise on $\mathcal{R}$, where $f(0) = 0$ and $f(x) = 1/x$ for $x \ne 0$.
d) Let $D \subset \mathcal{R}$ and define, for each $n \in \mathcal{N}$, $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x/n$. Then $f_n \to 0$ pointwise on $D$, where $0$ denotes the function identically equal to 0, that is, $f(x) = 0$ for all $x \in D$. □

DEFINITION 2.14 Closure Under Pointwise Limits
Let $\mathcal{F}$ be a collection of real-valued functions on $\Omega$. We say that $\mathcal{F}$ is closed under pointwise limits if whenever $\{f_n\}_{n=1}^{\infty} \subset \mathcal{F}$ and $f_n \to f$ pointwise on $\Omega$, then $f \in \mathcal{F}$.

Obviously, the collection of all real-valued functions on a set $\Omega$ is closed under pointwise limits. In particular, the collection of all real-valued functions on a subset $D$ of $\mathcal{R}$ is closed under pointwise limits. However, in general, $C(D)$ is not closed under pointwise limits, as we see from parts (b) and (c) of Example 2.11.

The fact that $C(D)$ is not generally closed under pointwise limits will lead us naturally into a discussion of Borel measurable functions in Chapter 3. For now, we introduce a type of convergence stronger than pointwise convergence that does yield closure for $C(D)$. This type of convergence is called uniform convergence, a concept that is relevant to any sequence of real-valued functions having a common domain.

DEFINITION 2.15 Uniform Convergence
Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of real-valued functions on a set $\Omega$. Then we say that $\{f_n\}_{n=1}^{\infty}$ converges uniformly to the real-valued function $f$ on $\Omega$ if for each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that $n \ge N$ implies $|f_n(x) - f(x)| < \epsilon$ for all $x \in \Omega$. We write $f_n \to f$ uniformly to indicate uniform convergence of $\{f_n\}_{n=1}^{\infty}$ to $f$.

The “uniform” in uniform convergence refers to the fact that by taking $n$ sufficiently large, $f_n(x)$ can be made arbitrarily close to $f(x)$ for all $x \in \Omega$ simultaneously, that is, uniformly over $\Omega$.

Clearly, uniform convergence implies pointwise convergence. The converse is not true, however, as Example 2.11(b) shows. Uniform convergence also depends on $\Omega$. For example, let $f_n(x) = x/n$ and $f(x) = 0$. If
$\Omega = [0,1]$, then $f_n \to f$ uniformly on $\Omega$. But if $\Omega = \mathcal{R}$, then $f_n \not\to f$ uniformly on $\Omega$, although $f_n \to f$ pointwise.

The next proposition verifies our contention that $C(D)$ is closed under uniform limits.

PROPOSITION 2.15
Let $D \subset \mathcal{R}$. Suppose that $\{f_n\}_{n=1}^{\infty} \subset C(D)$ and that $f_n \to f$ uniformly. Then $f \in C(D)$.

PROOF: Let $x_0 \in D$ and $\epsilon > 0$. Because $f_n \to f$ uniformly, we can choose $N \in \mathcal{N}$ such that $|f_N(x) - f(x)| < \epsilon/3$ for all $x \in D$. And, because $f_N$ is continuous on $D$ and, hence, at $x_0$, we can choose $\delta > 0$ such that $|f_N(x) - f_N(x_0)| < \epsilon/3$ whenever $x \in D$ and $|x - x_0| < \delta$. It follows that whenever $x \in D$ and $|x - x_0| < \delta$, we have
$$|f(x) - f(x_0)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(x_0)| + |f_N(x_0) - f(x_0)| < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon.$$
Thus, $f$ is continuous at $x_0$; since $x_0 \in D$ was arbitrary, $f$ is continuous on $D$.

Monotone Sequences of Functions

As we will see beginning in Chapter 3, it is important to consider monotone sequences of functions. As for pointwise and uniform convergence, this concept is relevant to any sequence of real-valued functions having a common domain.

DEFINITION 2.16 Monotone Sequence of Functions
Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of real-valued functions on a set $\Omega$. Then we say that $\{f_n\}_{n=1}^{\infty}$ is
a) nondecreasing if for each $x \in \Omega$, $\{f_n(x)\}_{n=1}^{\infty}$ is a nondecreasing sequence of real numbers.
b) nonincreasing if for each $x \in \Omega$, $\{f_n(x)\}_{n=1}^{\infty}$ is a nonincreasing sequence of real numbers.
c) monotone if it is either nondecreasing or nonincreasing.

EXAMPLE 2.12 Illustrates Definition 2.16
a) Let $D \subset \mathcal{R}$ and define $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x^n$. Then $\{f_n\}_{n=1}^{\infty}$ is nonincreasing if $D = [0,1]$, nondecreasing if $D = [1,2]$, and not monotone if $D = [0,2]$ or $D = (-1,0]$.
b) Let $D \subset \mathcal{R}$ and define $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x/n$. Then $\{f_n\}_{n=1}^{\infty}$ is nondecreasing if $D \subset (-\infty, 0]$, nonincreasing if $D \subset [0, \infty)$, but not monotone if $D$ contains both positive and negative numbers.
c) Define $f_n\colon [0,1] \to \mathcal{R}$ by 9 ТГХ 0. if 0 < x < Л-; if 277 < 271 71 if £ < x < 1. The sequence $\{f_n\}_{n=1}^{\infty}$ is not monotone.
d) Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of subsets of a set $\Omega$. Define $f_n\colon \Omega \to \mathcal{R}$ by $f_n(x) = 1$ if $x \in A_n$ and $f_n(x) = 0$ if $x \notin A_n$.
(i) $\{f_n\}_{n=1}^{\infty}$ is a nondecreasing sequence of functions if and only if $\{A_n\}_{n=1}^{\infty}$ is a (monotone) nondecreasing sequence of sets.
(ii) $\{f_n\}_{n=1}^{\infty}$ is a nonincreasing sequence of functions if and only if $\{A_n\}_{n=1}^{\infty}$ is a (monotone) nonincreasing sequence of sets. □

EXERCISES 2.4

2.53 Let $f$ and $g$ be real-valued functions on $\Omega$. Write the pointwise definitions of $f \vee g$ (the maximum of $f$ and $g$) and $f \wedge g$ (the minimum of $f$ and $g$). Refer to Exercise 2.3 on page 42.
2.54 Prove Theorem 2.4 on page 66.
2.55 Show that if $f \in C(D)$, then so is $|f|$.
2.56 Prove that $C(D)$ is closed under maximums and minimums. That is, if $f, g \in C(D)$, then
a) $f \vee g \in C(D)$. Hint: Use Exercise 2.55, Theorem 2.4 on page 66, and Exercise 2.3(b) on page 42.
b) $f \wedge g \in C(D)$. Hint: Use Exercise 2.55, Theorem 2.4 on page 66, and Exercise 2.3(c) on page 42.
2.57 Verify each part of Example 2.10 on page 67.
2.58 Define $f\colon [0,3] \to \mathcal{R}$ by $f(x) = 2$ if $1 < x < 2$ and $f(x) = 1$ otherwise. On which (nondegenerate) subintervals of $[0,3]$ is $f$
a) nondecreasing? b) strictly increasing? c) nonincreasing? d) strictly decreasing? e) monotone?
★2.59 Suppose that $f\colon (a,b) \to \mathcal{R}$ is nondecreasing. For $x \in (a,b)$, let $L_x = \{ f(t) : a < t < x \}$ and $R_x = \{ f(t) : x < t < b \}$, and define $f(x-) = \sup L_x$ and $f(x+) = \inf R_x$.
a) Show that $f(x-) \le f(x) \le f(x+)$ for all $x \in (a,b)$.
b) Prove that $f$ is continuous at $x$ if and only if $f(x-) = f(x+)$.
c) Prove that $f$ has countably many discontinuities; that is, the set of points at which $f$ is discontinuous is countable.
d) Deduce that a nonincreasing function on $(a,b)$ has countably many discontinuities.
2.60 For each $n \in \mathcal{N}$, define $f_n\colon [0,1] \to \mathcal{R}$ by $f_n(x) = x^n$. Also, define
$$f(x) = \begin{cases} 0, & \text{if } 0 \le x < 1;\\ 1, & \text{if } x = 1. \end{cases}$$
Prove that $f_n \to f$ pointwise, but not uniformly, on $[0,1]$.
2.61 Let $D \subset \mathcal{R}$ and define, for each $n \in \mathcal{N}$, $f_n\colon D \to \mathcal{R}$ by $f_n(x) = x/n$. Also, define $f\colon D \to \mathcal{R}$ by $f(x) = 0$ for all $x \in D$.
a) Show that if $D = [0,1]$, then $f_n \to f$ uniformly.
b) Show that if $D = \mathcal{R}$, then $f_n \not\to f$ uniformly.
c) In part (b), is it possible for $\{f_n\}_{n=1}^{\infty}$ to converge uniformly to some function? Explain your answer.
2.62 Verify each part of Example 2.12 on pages 70 and 71.
★2.63 In this exercise, we ask you to prove Dini’s theorem: Suppose that $\{f_n\}_{n=1}^{\infty}$ is a monotone sequence of continuous functions defined on a closed bounded interval $[a,b]$. Further suppose that $\{f_n\}_{n=1}^{\infty}$ converges pointwise to the continuous function $f$. Prove that $f_n \to f$ uniformly on $[a,b]$ by applying the following steps.
a) Explain why we can assume without loss of generality that $\{f_n\}_{n=1}^{\infty}$ is nonincreasing and that $f = 0$.
b) Let $\epsilon > 0$. For each $n \in \mathcal{N}$, set $O_n = \{ x \in [a,b] : f_n(x) < \epsilon \}$. Show that $\{O_n\}_{n=1}^{\infty}$ is a (monotone) nondecreasing sequence of open sets in $[a,b]$ whose union is $[a,b]$.
c) Use part (b) to prove that there is an $N_\epsilon \in \mathcal{N}$ such that $O_{N_\epsilon} = [a,b]$. Hint: Use the fact that if a closed bounded interval is a subset of the union of a collection of open sets, then there is a finite subcollection of that collection whose union also contains the interval. This result is a special case of the Heine-Borel theorem.
d) Use part (c) to conclude that $f_n \to 0$ uniformly.
2.64 Refer to Exercise 2.63.
Show that the conclusion of Dini’s theorem does not hold if we weaken the hypotheses of that theorem in any one of the following ways:
a) The interval on which the functions are defined is permitted to be any closed interval.
b) The interval on which the functions are defined is permitted to be any bounded interval.
c) The limiting function, $f$, is not restricted to being continuous.
d) The monotonicity requirement is dropped.
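As a small numerical companion to Example 2.11(b) and Exercises 2.60 and 2.61, the Python sketch below exhibits, for each $n$, a point where $f_n$ stays far from its pointwise limit. For $f_n(x) = x^n$, the point $x_n = 2^{-1/n} < 1$ satisfies $x_n^n = 1/2$, while $f(x_n) = 0$, so $\sup_{[0,1]} |f_n - f| \ge 1/2$ for every $n$. For $f_n(x) = x/n$, the distance to $0$ on $[0,1]$ is largest at $x = 1$ (namely $1/n$), whereas on $\mathcal{R}$ the point $x = n$ keeps the distance equal to 1. The helper names are our own, and the check uses floating point, so equalities hold only up to rounding.

```python
def f_pow(x, n):
    """f_n(x) = x**n, as in Example 2.11(b) and Exercise 2.60."""
    return x ** n

def f_lin(x, n):
    """f_n(x) = x/n, as in Example 2.11(d) and Exercise 2.61."""
    return x / n

def power_witness(n):
    """The point x_n = 2**(-1/n) in [0, 1), where x_n**n = 1/2.

    The pointwise limit f vanishes at every x < 1, so |f_n(x_n) - f(x_n)| = 1/2
    for every n: the convergence of x**n on [0, 1] cannot be uniform.
    """
    return 2.0 ** (-1.0 / n)

for n in (1, 10, 100, 1000):
    x = power_witness(n)
    # On [0, 1], |x/n| is largest at x = 1, giving 1/n (tends to 0: uniform).
    # On the whole line, x = n gives |f_n(n) - 0| = 1 for every n (not uniform).
    print(n, x < 1, round(f_pow(x, n), 9), f_lin(1, n), f_lin(n, n))
```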
2.5 The Cantor Set and Cantor Function □ 73

2.65 Refer to Example 2.11(a) on page 68. Prove that the convergence is uniform on any bounded subinterval of $[0, \infty)$. Hint: Apply the binomial theorem and Dini’s theorem (Exercise 2.63).
2.66 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of real-valued functions on $\Omega$.
a) We say that $\{f_n\}_{n=1}^{\infty}$ is pointwise Cauchy on $\Omega$ if for each $x \in \Omega$, $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence. Prove that if $\{f_n\}_{n=1}^{\infty}$ is pointwise Cauchy on $\Omega$, then it converges pointwise on $\Omega$.
b) We say that $\{f_n\}_{n=1}^{\infty}$ is uniformly Cauchy on $\Omega$ if for each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that $|f_n(x) - f_m(x)| < \epsilon$ whenever $m, n \ge N$ and $x \in \Omega$. Prove that if $\{f_n\}_{n=1}^{\infty}$ is uniformly Cauchy on $\Omega$, then it converges uniformly on $\Omega$.

2.5 THE CANTOR SET AND CANTOR FUNCTION

We next introduce a set and function, called the Cantor set and Cantor function, that will serve as useful examples and counterexamples throughout the text.* Before discussing the Cantor set, we present the following proposition, which states in part that for each integer $p \ge 2$, every number between 0 and 1 has a base-$p$ expansion. The proof is left to the reader as an exercise. (See Exercise 2.67.)

PROPOSITION 2.16 Base-p Expansion
Let $p$ be an integer greater than 1. Then for each $x \in [0,1]$, there is a sequence $\{a_n\}_{n=1}^{\infty}$ of integers such that $0 \le a_n \le p - 1$ for all $n$ and
$$x = \sum_{n=1}^{\infty} \frac{a_n}{p^n} = \frac{a_1}{p} + \frac{a_2}{p^2} + \frac{a_3}{p^3} + \cdots. \tag{2.3}$$
The sequence $\{a_n\}_{n=1}^{\infty}$ is unique unless $x \ne 1$ and $x$ is of the form $q/p^m$ for some $q, m \in \mathcal{N}$, in which case there are exactly two such sequences, one having only finitely many nonzero terms and the other having only finitely many terms different from $p - 1$.

Note: We use the notation
$$x = 0.a_1a_2a_3\ldots\ (p) \tag{2.4}$$
as a shorthand for Eq. (2.3).

* The Cantor function and set are named in honor of Georg Cantor. See the biography on page 2.
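The digit rule suggested in the hint to Exercise 2.67(a), namely $a_1 = [px]$ followed by the same step applied to the remainder $px - a_1$, can be sketched in Python as below. This is a minimal illustration, assuming exact rational input via the standard `fractions` module; the function name `base_p_digits` is ours. For $x = q/p^m$ this greedy rule returns the expansion with only finitely many nonzero digits (for instance $0.1000\ldots\ (2)$ rather than $0.0111\ldots\ (2)$ for $1/2$).

```python
from fractions import Fraction
import math

def base_p_digits(x, p, k):
    """First k digits a_1, ..., a_k of a base-p expansion of x in [0, 1).

    Greedy rule: multiply by p, take the integer part as the next digit,
    and continue with the fractional remainder.
    """
    if not (0 <= x < 1):
        raise ValueError("x must lie in [0, 1)")
    if p < 2:
        raise ValueError("p must be an integer greater than 1")
    digits = []
    for _ in range(k):
        x *= p
        a = math.floor(x)   # 0 <= a <= p - 1 because 0 <= x < p
        digits.append(a)
        x -= a              # the remainder again lies in [0, 1)
    return digits

half = Fraction(1, 2)
print(base_p_digits(half, 2, 4))    # [1, 0, 0, 0]
print(base_p_digits(half, 3, 4))    # [1, 1, 1, 1]
print(base_p_digits(half, 10, 4))   # [5, 0, 0, 0]
```

These match the expansions of $1/2$ listed in Example 2.13(b). Using `Fraction` rather than floats avoids rounding errors that would otherwise corrupt later digits.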
EXAMPLE 2.13 Illustrates Proposition 2.16
a) For each $p \ge 2$, we have $0 = 0.000\ldots\ (p)$ and $1 = 0.(p-1)(p-1)(p-1)\ldots\ (p)$.
b) The number $1/2$ has, respectively, the binary ($p = 2$), ternary ($p = 3$), and decimal ($p = 10$) expansions given by
$$0.1000\ldots\ (2), \qquad 0.1111\ldots\ (3), \qquad 0.5000\ldots\ (10).$$
As predicted by Proposition 2.16, $1/2$ also has a second binary expansion and a second decimal expansion. They are, respectively,
$$0.0111\ldots\ (2), \qquad 0.4999\ldots\ (10).$$
But the ternary expansion of $1/2$ is unique. □

The Cantor Set

We now construct the Cantor set. The Cantor set is a subset of $[0,1]$ obtained as follows.

Step 1: Delete the middle third open interval of $[0,1]$, namely, $(1/3, 2/3)$. See Fig. 2.1.

FIGURE 2.1 Set remaining after Step 1.

Step 2: After the first step, there remain two closed intervals, namely, $[0, 1/3]$ and $[2/3, 1]$. Delete the middle third open interval from each of those two intervals, namely, $(1/9, 2/9)$ and $(7/9, 8/9)$. See Fig. 2.2.
FIGURE 2.2 Set remaining after Step 2.

Step n: After the $(n-1)$st step, there remain $2^{n-1}$ closed intervals. Delete from each of these the middle third open interval.

Continue this process inductively. For each $n \in \mathcal{N}$, let $G_n$ denote the set removed at the $n$th step, $P_n$ the set remaining after the $n$th step, $G = \bigcup_{n=1}^{\infty} G_n$, and $P = \bigcap_{n=1}^{\infty} P_n$. We have the following noteworthy facts:
• $G_n$ is the union of $2^{n-1}$ disjoint open intervals, each of length $1/3^n$. In particular, $G_n$ is an open set.
• $P_n$ is the union of $2^n$ disjoint closed intervals, each of length $1/3^n$. In particular, $P_n$ is a closed set.
• $P = [0,1] \setminus G$ and, so, $P \cap G = \emptyset$ and $P \cup G = [0,1]$.
• $G$ is the disjoint union of all removed open intervals, the sum of whose lengths is $\sum_{n=1}^{\infty} 2^{n-1}/3^n = 1$. In particular, $G$ is an open set.
• $P$ is a closed set, being the intersection of closed sets. It contains no interval because, for each $n \in \mathcal{N}$, $P \subset P_n$, and $P_n$ contains no interval whose length exceeds $1/3^n$.

The set $P$ is called the Cantor set or, sometimes, the Cantor ternary set.

As we have just seen, $G$, the complement in $[0,1]$ of the Cantor set, is a disjoint union of open intervals, the sum of whose lengths is 1. But the length of $[0,1]$ is also 1. Thus, from the point of view of length, the Cantor set appears to be “small.” On the other hand, as we will see shortly, $P$ is uncountable, so that from a cardinality point of view, the Cantor set is “large.” These, among other properties of the Cantor set, make it useful for illustrating many subtle concepts.

We mentioned that the Cantor set is sometimes called the Cantor ternary set. The reason for this will now be revealed. We begin with the following lemma.
LEMMA 2.1
An interval $(a,b)$ is one of the $2^{n-1}$ open intervals removed from $[0,1]$ at the $n$th step in the construction of the Cantor set if and only if $a$ and $b$ are of the form
$$a = 0.(2c_1)(2c_2)\ldots(2c_{n-1})1000\ldots\ (3), \qquad b = 0.(2c_1)(2c_2)\ldots(2c_{n-1})2000\ldots\ (3), \tag{2.5}$$
where $c_k \in \{0,1\}$ for $1 \le k \le n-1$.

PROOF: We proceed by induction. At the first step ($n = 1$) of the construction of the Cantor set, exactly one interval is removed, namely, $(1/3, 2/3)$. And we have that
$$\tfrac{1}{3} = 0.1000\ldots\ (3), \qquad \tfrac{2}{3} = 0.2000\ldots\ (3),$$
which is of the form (2.5) with $n = 1$.

Proceeding inductively, we note that an open interval $(a,b)$ is removed at the $n$th step if and only if $a = r + 1/3^n$ and $b = r + 2/3^n$, where $r$ equals 0 or is the right endpoint of one of the open intervals removed on or before the $(n-1)$st step. If $r = 0$, then
$$a = \frac{1}{3^n} = 0.\underbrace{0\cdots0}_{n-1\text{ times}}1000\ldots\ (3), \qquad b = \frac{2}{3^n} = 0.\underbrace{0\cdots0}_{n-1\text{ times}}2000\ldots\ (3),$$
which is of the form (2.5) with $c_k = 0$ for $1 \le k \le n-1$. Otherwise, by the induction assumption, there is a positive integer $m \le n-1$ such that
$$r = 0.(2c_1)(2c_2)\ldots(2c_{m-1})2000\ldots\ (3),$$
where $c_k \in \{0,1\}$ for $1 \le k \le m-1$. Then we have
$$a = r + \frac{1}{3^n} = 0.(2c_1)(2c_2)\ldots(2c_{m-1})2\underbrace{0\cdots0}_{n-m-1\text{ times}}1000\ldots\ (3),$$
$$b = r + \frac{2}{3^n} = 0.(2c_1)(2c_2)\ldots(2c_{m-1})2\underbrace{0\cdots0}_{n-m-1\text{ times}}2000\ldots\ (3),$$
which is of the form (2.5) with $c_m = 1$ and $c_k = 0$ for $m+1 \le k \le n-1$.

From Lemma 2.1, we can obtain Proposition 2.17, whose proof is left to the reader as an exercise. See Exercise 2.69.
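The construction and Lemma 2.1 can be checked mechanically for the first few steps. The Python sketch below (helper names are ours; exact arithmetic via the standard `fractions` module is an implementation choice) removes middle thirds exactly as in Steps 1 through n, then verifies that each removed interval $(a,b)$ at step $n$ has length $1/3^n$ and that the greedy ternary digits of $a$ and $b$ have the form (2.5): digits in $\{0,2\}$ before position $n$, then a 1 (for $a$) or a 2 (for $b$), then zeros. It is a finite sanity check, not a proof.

```python
from fractions import Fraction
import math

def ternary_digits(x, k):
    """First k greedy ternary digits of x in [0, 1) (digit = floor(3x) at each step)."""
    out = []
    for _ in range(k):
        x *= 3
        a = math.floor(x)
        out.append(a)
        x -= a
    return out

def cantor_steps(n_steps):
    """Yield (n, removed): the open intervals (a, b) making up G_n, for n = 1..n_steps."""
    closed = [(Fraction(0), Fraction(1))]             # P_0 = [0, 1]
    for n in range(1, n_steps + 1):
        removed, kept = [], []
        for lo, hi in closed:
            third = (hi - lo) / 3
            removed.append((lo + third, hi - third))  # open middle third
            kept += [(lo, lo + third), (hi - third, hi)]
        closed = kept                                 # closed intervals of P_n
        yield n, removed

def check_lemma_2_1(n, a, b, extra=3):
    """True if a and b have the ternary forms (2.5) for step n."""
    da, db = ternary_digits(a, n + extra), ternary_digits(b, n + extra)
    prefix_ok = da[:n - 1] == db[:n - 1] and set(da[:n - 1]) <= {0, 2}
    return (prefix_ok and da[n - 1] == 1 and db[n - 1] == 2
            and not any(da[n:]) and not any(db[n:]))

for n, removed in cantor_steps(6):
    assert len(removed) == 2 ** (n - 1)
    assert all(b - a == Fraction(1, 3 ** n) for a, b in removed)
    assert all(check_lemma_2_1(n, a, b) for a, b in removed)
    print(n, [(str(a), str(b)) for a, b in removed[:2]])
```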
PROPOSITION 2.17
The Cantor set consists of all numbers in $[0,1]$ that have a ternary expansion without the digit 1.

The Cantor Function

Using Proposition 2.17, we can now define the Cantor function, which is a real-valued function on $[0,1]$. We begin by specifying its values on the Cantor set, in other words, by defining a function $f\colon P \to \mathcal{R}$.

Let $x \in P$. By Proposition 2.17, $x$ has a (unique) ternary expansion without the digit 1, say, $x = 0.(2c_1)(2c_2)\ldots\ (3)$, where $c_n \in \{0,1\}$ for each $n \in \mathcal{N}$. We define
$$f(x) = 0.c_1c_2\ldots\ (2).$$

PROPOSITION 2.18
Let $f\colon P \to \mathcal{R}$ be as defined in the preceding text. Then the range of $f$ is $[0,1]$.

PROOF: It is clear from the definition of $f$ that its range is a subset of $[0,1]$. To show that it is onto, let $y \in [0,1]$. Then, by Proposition 2.16 on page 73, we can write $y = 0.d_1d_2\ldots\ (2)$, where $d_n \in \{0,1\}$ for each $n \in \mathcal{N}$. Let $x = 0.(2d_1)(2d_2)\ldots\ (3)$. From Proposition 2.17, we know that $x \in P$ and, by definition, $f(x) = y$.

COROLLARY 2.2
The Cantor set is uncountable.

PROOF: From Proposition 2.18, we have $f(P) = [0,1]$. By Proposition 1.9 on page 23, the image of a countable set is countable. Thus, since $[0,1]$ is uncountable and is the image of $P$ under $f$, $P$ must be uncountable.

Next we extend $f$ to a function $\psi$ on $[0,1]$. If $x \in P$, we define $\psi(x) = f(x)$. If $x \in [0,1] \setminus P$, then $x$ is in exactly one of the open intervals $(a,b)$ removed from $[0,1]$ in the construction of the Cantor set. By Lemma 2.1 on page 76, there is an $n \in \mathcal{N}$ such that
$$a = 0.(2c_1)(2c_2)\ldots(2c_{n-1})1000\ldots\ (3), \qquad b = 0.(2c_1)(2c_2)\ldots(2c_{n-1})2000\ldots\ (3),$$
where $c_k \in \{0,1\}$ for $1 \le k \le n-1$. Now note that
$$f(a) = 0.c_1c_2\ldots c_{n-1}0111\ldots\ (2), \qquad f(b) = 0.c_1c_2\ldots c_{n-1}1000\ldots\ (2),$$
and, hence, $f(a) = f(b)$. (The value $f(a)$ is read from the expansion $a = 0.(2c_1)\ldots(2c_{n-1})0222\ldots\ (3)$, which has no digit 1.) We define $\psi(x)$ to be the common value of $f(a)$ and $f(b)$.

The function $\psi$ is called the Cantor function or Lebesgue singular function. Its graph is sketched in Fig. 2.3.

FIGURE 2.3 Sketch of the Cantor function.

In the next two propositions, we state two important properties of the Cantor function. One is that it is nondecreasing and the other is that it is continuous. The proof of the first proposition is left to the reader as an exercise. (See Exercise 2.72.)

PROPOSITION 2.19
The Cantor function is nondecreasing.
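The definition of $\psi$ translates into a short digit-by-digit procedure. The Python sketch below is our own illustration, assuming the input is rational (or a float, converted exactly by `Fraction`) and truncating after `k` ternary digits. Ternary digits 0 and 2 become binary digits 0 and 1. At the first ternary digit 1, $x$ lies in, or is the left endpoint of, an interval removed at that step (Lemma 2.1), where $\psi$ is constant, so the sketch appends a binary 1 and stops. The result is exact whenever a digit 1 is found or the ternary expansion terminates within `k` digits; otherwise it is within $2^{-k}$ of $\psi(x)$.

```python
from fractions import Fraction
import math

def cantor_function(x, k=60):
    """Approximate the Cantor function psi at x in [0, 1] using k ternary digits."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(k):
        x *= 3
        d = math.floor(x)          # next greedy ternary digit: 0, 1, or 2
        x -= d
        if d == 1:
            # x is in (a, b) or equals a for a removed interval (a, b):
            # psi there equals the prefix followed by a binary 1.
            return value + scale
        value += scale * (d // 2)  # ternary 0 -> binary 0, ternary 2 -> binary 1
        scale /= 2
    return value

for t in (Fraction(1, 9), Fraction(1, 3), Fraction(1, 2), Fraction(7, 9), Fraction(19, 27)):
    print(t, cantor_function(t))
```

The printed values $1/4$, $1/2$, $1/2$, $3/4$, $5/8$ agree with the plateau heights in Fig. 2.3 on the removed intervals containing these points.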
PROPOSITION 2.20
The Cantor function is continuous.

PROOF: We will prove that $\psi$ is continuous at each $x \in (0,1)$ and leave to the reader the proof of continuity at the endpoints of the interval. So, assume that $x \in (0,1)$. By Proposition 2.19, $\psi$ is nondecreasing. Let $L_x = \{ \psi(t) : 0 < t < x \}$ and $R_x = \{ \psi(t) : x < t < 1 \}$, and let $\psi(x-) = \sup L_x$ and $\psi(x+) = \inf R_x$. To show that $\psi$ is continuous at $x$, it suffices, by Exercise 2.59 on page 71, to prove that $\psi(x-) = \psi(x+)$. Suppose that this is not the case. Then, again by Exercise 2.59, either $\psi(x-) < \psi(x)$ or $\psi(x) < \psi(x+)$. We will consider the case where the latter holds true, realizing that a similar argument would ensue if the latter does not hold true but the former does.

As $\psi$ is nondecreasing, we have $0 = \psi(0) \le \psi(x) < \psi(x+) \le \psi(1) = 1$. Select $y \in (\psi(x), \psi(x+))$ and note that $0 < y < 1$. From Proposition 2.18 on page 77, it follows that the range of $\psi$ is $[0,1]$; thus, there is a $z \in (0,1)$ such that $\psi(z) = y$. Since $\psi(x) < y = \psi(z)$, we must have $x < z$ and, hence, $\psi(x+) \le \psi(z)$. But we also have $\psi(z) = y < \psi(x+)$. Consequently, we have reached a contradiction and, therefore, $\psi$ must be continuous at $x$.

EXERCISES 2.5

2.67 Prove Proposition 2.16 on page 73 by using the following steps. We can assume that $x \in (0,1)$. (Why?)
a) Show that for each $n \in \mathcal{N}$, there are integers $a_1, a_2, \ldots, a_n$ with $0 \le a_k \le p - 1$ for $1 \le k \le n$ and such that
$$\frac{a_1}{p} + \frac{a_2}{p^2} + \cdots + \frac{a_n}{p^n} \le x < \frac{a_1}{p} + \frac{a_2}{p^2} + \cdots + \frac{a_n}{p^n} + \frac{1}{p^n}.$$
Hint: Let $a_1 = [px]$, the greatest integer less than or equal to $px$, and note that $a_1/p \le x < (a_1 + 1)/p$. Also note that, because $0 < x < 1$, we have $0 < px < p$ and, so, $0 \le a_1 \le p - 1$. Now use induction.
b) Use part (a) to show that there exists a sequence $\{a_n\}_{n=1}^{\infty}$ of integers with $0 \le a_n \le p - 1$ for each $n \in \mathcal{N}$ and such that (2.3) holds.
c) Show that if $x$ is of the form $q/p^m$, where $q, m \in \mathcal{N}$, then it has two base-$p$ expansions, one having only finitely many nonzero terms and the other having only finitely many terms different from $p - 1$.
Hint: Use the Euclidean algorithm to write $q = b_1p^{m-1} + b_2p^{m-2} + \cdots + b_{m-1}p + b_m$, where $b_k \in \mathcal{Z}$ and $0 \le b_k \le p - 1$ for $k = 1, 2, \ldots, m$.
d) Prove that $x$ can have at most two different base-$p$ expansions and that, if it has two, it is of the form $q/p^m$, where $q, m \in \mathcal{N}$. Hint: Assume $x$ has two different base-$p$ expansions, say, $0.a_1a_2\ldots\ (p)$ and $0.b_1b_2\ldots\ (p)$. Let $n$ be the first positive integer for which $a_n \ne b_n$ and assume without loss of generality that $a_n < b_n$. Show that this implies that $a_n = b_n - 1$ and that, for $k \ge n + 1$, $a_k = p - 1$ and $b_k = 0$.
2.68 Refer to Example 2.13 on page 74. For each part that follows, explain why we know from Proposition 2.16 that $1/2$ has
a) two binary expansions,
b) two decimal expansions,
c) a unique ternary expansion.
2.69 Prove Proposition 2.17: The Cantor set consists of all numbers in $[0,1]$ that have a ternary expansion without the digit 1. Proceed as follows.
a) Show that $x \in G_n$ if and only if each of its ternary expansions is of the form
$$0.(2c_1)(2c_2)\ldots(2c_{n-1})1a_{n+1}a_{n+2}\ldots\ (3), \tag{2.6}$$
where $c_k \in \{0,1\}$ for $1 \le k \le n-1$ and not all the $a_k$'s are 0 and not all are 2.
b) Use part (a) to show that $G$ consists of all numbers in $(0,1)$ that require a 1 in (each of) their ternary expansions.
c) Use part (b) to conclude that Proposition 2.17 holds.
2.70 Refer to the notation introduced on page 75.
a) Let $I$ be any one of the $2^n$ closed intervals whose union is $P_n$. Prove that $I \cap P$ is uncountable.
b) Prove that for each $x \in P$ and $\delta > 0$, the set $(x - \delta, x + \delta) \cap P$ is uncountable.
2.71 Prove that the Cantor function, $\psi$, satisfies $\psi(x) = 2\psi(x/3)$ for all $x \in [0,1]$.
2.72 Prove Proposition 2.19: The Cantor function, $\psi$, is a nondecreasing function on $[0,1]$. Hint: Let $x, y \in [0,1]$ with $x < y$. You must show $\psi(x) \le \psi(y)$. To accomplish that, consider cases depending on whether $x$ is a member of the Cantor set and whether $y$ is a member of the Cantor set. First consider the case where both $x$ and $y$ are members of the Cantor set.
2.73 Complete the proof of Proposition 2.20 on page 79 by showing that the Cantor function is continuous at the endpoints of $[0,1]$.
2.74 Let $\psi$ denote the Cantor function and define
$$D = \left\{ \frac{\psi(x+h) - \psi(x)}{h} : x \in [0,1] \right\}.$$
Show that $\inf D = 0$ and $\sup D = \infty$.
2.75 Generalize the technique used in the proof of Proposition 2.20 on page 79 to establish the following fact: If $f\colon [a,b] \to [c,d]$ is monotone and onto, then $f$ is continuous on $[a,b]$.
2.6 The Riemann Integral □ 81

2.6 THE RIEMANN INTEGRAL

In this section we will discuss the Riemann integral. We will define it in a way that motivates the definition of the Lebesgue integral, which is presented in Chapter 3.

In defining the Riemann integral, we need the concept of characteristic function. The characteristic function of a set indicates which elements are in the set and which are not. More precisely, we have the following definition.

DEFINITION 2.17 Characteristic Function
Let $\Omega$ be a set and $A \subset \Omega$. Then the characteristic function of $A$, denoted $\chi_A$, is the real-valued function on $\Omega$ defined by
$$\chi_A(x) = \begin{cases} 1, & \text{if } x \in A;\\ 0, & \text{if } x \notin A. \end{cases}$$

The Riemann Integral

First we define the integral† of a step function on a closed and bounded interval. A step function on an interval $[a,b]$ is a function of the form
$$h = \sum_{k=1}^{n} a_k \chi_{I_k}, \tag{2.7}$$
where $n \in \mathcal{N}$, $\{a_k\}_{k=1}^{n}$ is a sequence of real numbers, and $\{I_k\}_{k=1}^{n}$ is a finite sequence of pairwise disjoint intervals whose union is $[a,b]$. We permit degenerate intervals in this representation, that is, intervals of the form $[c,c] = \{c\}$ or $[c,c) = (c,c] = (c,c) = \emptyset$.

Let us denote by $\ell(I)$ the length of an interval $I$, where the length of a degenerate interval is defined to be 0. Then the integral of the step function in (2.7) is defined by
$$\int_a^b h(x)\,dx = \sum_{k=1}^{n} a_k\,\ell(I_k).$$
We leave it to the reader as an exercise to show that this definition of the integral of a step function is well-posed. See Exercise 2.77.

† Note that we are defining the integral of a step function, not the Riemann integral of a step function. We will see shortly, however, that the two integrals agree.
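For example, let $[a,b] = [0,1]$ and define
$$h(x) = \begin{cases} 2, & \text{if } 0 \le x < 1/3;\\ 3, & \text{if } 1/3 \le x \le 1/2;\\ -4, & \text{if } 1/2 < x \le 1. \end{cases}$$
Then $h = 2\chi_{[0,1/3)} + 3\chi_{[1/3,1/2]} - 4\chi_{(1/2,1]}$ and we have
$$\int_0^1 h(x)\,dx = 2\left(\frac{1}{3} - 0\right) + 3\left(\frac{1}{2} - \frac{1}{3}\right) - 4\left(1 - \frac{1}{2}\right) = -\frac{5}{6}.$$

Next we define the upper and lower Riemann integrals of a bounded real-valued function. Let $f$ be a bounded real-valued function on $[a,b]$, that is, $f\colon [a,b] \to \mathcal{R}$ and there is an $M \in \mathcal{R}$ such that $|f(x)| \le M$ for all $x \in [a,b]$. Then the upper Riemann integral of $f$ over $[a,b]$ is defined by
$$\overline{\int_a^b} f(x)\,dx = \inf\left\{ \int_a^b h(x)\,dx : h \text{ a step function and } h \ge f \right\}.$$
Similarly, the lower Riemann integral of $f$ over $[a,b]$ is defined by
$$\underline{\int_a^b} f(x)\,dx = \sup\left\{ \int_a^b h(x)\,dx : h \text{ a step function and } h \le f \right\}.$$
It is not too difficult to show that (see Exercise 2.79)
$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx.$$
If equality holds, then $f$ is said to be Riemann integrable over $[a,b]$.

DEFINITION 2.18 Riemann Integrable; Riemann Integral
Let $f$ be a bounded real-valued function on $[a,b]$. If
$$\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx,$$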
then we say that $f$ is Riemann integrable over $[a,b]$. In this case, the common value of the upper and lower Riemann integrals is called the Riemann integral of $f$ over $[a,b]$ and is denoted by
$$\int_a^b f(x)\,dx.$$

We write $\mathcal{R}([a,b])$ for the collection of all Riemann integrable functions over $[a,b]$.

EXAMPLE 2.14 Illustrates Definition 2.18
a) Let $f$ be a step function on $[a,b]$. Then (see Exercise 2.80)
$$\overline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx \quad\text{and}\quad \underline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx,$$
where each integral on the right is interpreted as the integral of a step function. Thus, every step function on $[a,b]$ is Riemann integrable and, moreover, its Riemann integral equals its integral as a step function.
b) As we will discover later in this section, a continuous function on $[a,b]$ is Riemann integrable thereon.
c) Define $f(0) = 0$ and $f(x) = 1/x^2$ for $x \ne 0$. Then $f$ is not Riemann integrable over a (closed and bounded) interval containing 0, as it is not bounded on such an interval. On the other hand, by part (b), $f$ is Riemann integrable over a (closed and bounded) interval that does not contain 0, because it is continuous on such an interval.
d) Here is an example of a bounded function that is not Riemann integrable. For $x \in [0,1]$, define $f(x) = \chi_{\mathcal{Q}}(x)$. Now, a step function $h$ that dominates $f$ must satisfy $h(x) \ge 1$ except at a finite number of points. Because $1$ (the function identically equal to 1) is a step function that dominates $f$, it follows that $\overline{\int_0^1} f(x)\,dx = 1$. Similarly, a step function $h$ that is dominated by $f$ must satisfy $h(x) \le 0$ except at a finite number of points. Because $0$ (the function identically equal to 0) is a step function that is dominated by $f$, it follows that $\underline{\int_0^1} f(x)\,dx = 0$. Hence, $f$ is not Riemann integrable over $[0,1]$. □
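The integral of a step function only involves the values $a_k$ and the lengths $\ell(I_k)$, so it is easy to compute exactly. The Python sketch below is an illustration under our own conventions: a step function is stored as a list of triples `(a_k, left, right)`, and whether endpoints are open or closed is not recorded because it does not affect lengths. It recomputes the integral of the example $h$ on page 82 and, for the continuous nondecreasing function $f(x) = x^2$ on $[0,1]$, builds a step function below $f$ (left-endpoint values) and one above $f$ (right-endpoint values) on the uniform partition $x_k = k/n$. Their integrals bracket the lower and upper Riemann integrals, and they differ by exactly $1/n$, consistent with Corollary 2.3.

```python
from fractions import Fraction

def step_integral(pieces):
    """Integral of h = sum of a_k * chi_{I_k}; pieces is a list of (a_k, left, right)."""
    return sum(a * (right - left) for a, left, right in pieces)

# The step function h from the example on page 82.
h = [(2, Fraction(0), Fraction(1, 3)),
     (3, Fraction(1, 3), Fraction(1, 2)),
     (-4, Fraction(1, 2), Fraction(1))]

def lower_upper_step_integrals(n):
    """Integrals of step functions below and above f(x) = x**2 on [0, 1].

    f is nondecreasing, so on [x_{k-1}, x_k] its smallest value is f(x_{k-1})
    and its largest is f(x_k), with x_k = k/n.
    """
    pts = [Fraction(k, n) for k in range(n + 1)]
    lower = [(pts[k - 1] ** 2, pts[k - 1], pts[k]) for k in range(1, n + 1)]
    upper = [(pts[k] ** 2, pts[k - 1], pts[k]) for k in range(1, n + 1)]
    return step_integral(lower), step_integral(upper)

print(step_integral(h))               # -5/6
print(lower_upper_step_integrals(4))  # (Fraction(7, 32), Fraction(15, 32))
```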
Basic Properties of the Riemann Integral

The following theorem provides some fundamental properties of the Riemann integral. Its proof can be found in many advanced calculus and introductory real analysis texts.†

THEOREM 2.6
Suppose that $f, g \in \mathcal{R}([a,b])$ and that $\alpha \in \mathcal{R}$.
a) If $a < c < b$, then $f \in \mathcal{R}([a,c]) \cap \mathcal{R}([c,b])$ and
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$
b) If $f \le g$, then
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$
c) We have $f + g \in \mathcal{R}([a,b])$ and
$$\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.$$
d) We have $\alpha f \in \mathcal{R}([a,b])$ and
$$\int_a^b (\alpha f)(x)\,dx = \alpha \int_a^b f(x)\,dx.$$
e) We have $|f| \in \mathcal{R}([a,b])$ and
$$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx.$$

† See, for instance, Richard Goldberg’s Methods of Real Analysis, Section 7.4, 2d ed. (New York: Wiley, 1976).
Characterization of Riemann Integrable Functions

We stated earlier that a continuous function is Riemann integrable. Although there are noncontinuous functions that are Riemann integrable (e.g., step functions), we will soon see that a function is Riemann integrable if and only if it is “essentially continuous.”

To make “essentially continuous” precise, we introduce the concept of sets of measure zero. Intuitively, these are sets without much content, although they may be large in the sense of cardinality.

DEFINITION 2.19 Set of Measure Zero
A subset $E$ of $\mathcal{R}$ is said to have measure zero if for each $\epsilon > 0$, there exists a sequence $\{I_n\}_n$ of open intervals such that $E \subset \bigcup_n I_n$ and $\sum_n \ell(I_n) < \epsilon$.

The next proposition shows that a countable union of sets of measure zero also has measure zero. Its proof is left to the reader as an exercise.

PROPOSITION 2.21
Let $\{E_n\}_n$ be a sequence of subsets of $\mathcal{R}$ each having measure zero. Then $\bigcup_n E_n$ has measure zero.

EXAMPLE 2.15 Illustrates Sets of Measure Zero
a) A singleton set has measure zero; that is, if $x \in \mathcal{R}$, then $\{x\}$ has measure zero. Indeed, for $\epsilon > 0$, $I_\epsilon = (x - \epsilon/4, x + \epsilon/4)$ is an open interval, $\{x\} \subset I_\epsilon$, and $\ell(I_\epsilon) = \epsilon/2 < \epsilon$.
b) It follows immediately from part (a) and Proposition 2.21 that a countable subset of $\mathcal{R}$ has measure zero. In particular, a finite subset of $\mathcal{R}$ has measure zero, and $\mathcal{N}$, $\mathcal{Z}$, and $\mathcal{Q}$ have measure zero.
c) The Cantor set has measure zero. (See Exercise 2.84.) Since the Cantor set is uncountable (Corollary 2.2 on page 77), we see that although being countable is a sufficient condition for a subset of $\mathcal{R}$ to have measure zero, it is not necessary.
d) A (nondegenerate) interval does not have measure zero. See Exercise 2.86. □

We are now in a position to state a continuity-type characterization of Riemann integrable functions. For a proof, see, for instance, Section 7.3 of the text referenced in the footnote on page 84.
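Returning to Example 2.15(b), the usual direct argument that a countable set $\{x_1, x_2, \ldots\}$ has measure zero covers $x_n$ by an open interval of length $\epsilon/2^{n+1}$, so the total length is at most $\epsilon/2 < \epsilon$. The Python sketch below (function names are ours) applies this to an enumeration of the rationals in $[0,1]$. It can only inspect finitely many intervals; every finite stage has total length $\tfrac{\epsilon}{2}(1 - 2^{-N})$, which is what the infinite cover's bound rests on.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals_in_unit_interval():
    """Enumerate the rationals in [0, 1] without repetition: 0, 1, 1/2, 1/3, 2/3, 1/4, ..."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:      # p/q in lowest terms, so nothing repeats
                yield Fraction(p, q)
        q += 1

def cover(points, eps):
    """Center an open interval of length eps / 2**(n + 1) at the n-th point (n = 1, 2, ...)."""
    for n, x in enumerate(points, start=1):
        half = Fraction(eps) / 2 ** (n + 2)
        yield (x - half, x + half)

eps = Fraction(1, 100)
N = 1000
intervals = list(islice(cover(rationals_in_unit_interval(), eps), N))
total = sum(b - a for a, b in intervals)
# Each finite stage has total length eps/2 * (1 - 2**-N), below eps/2 < eps.
print(float(total), total < eps)
```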
THEOREM 2.7
A bounded function on $[a,b]$ is Riemann integrable if and only if the set of points of discontinuity of the function has measure zero.

COROLLARY 2.3
If $f$ is continuous on $[a,b]$, then it is Riemann integrable thereon and
$$\int_a^b f(x)\,dx = \sup_h \int_a^b h(x)\,dx, \tag{2.8}$$
where the supremum is taken over all step functions $h$ that are dominated by $f$.

Equation (2.8) will serve as motivation when, in Chapter 3, we define the Lebesgue integral of a measurable function.

EXAMPLE 2.16 Illustrates Theorem 2.7
a) From Theorem 2.7 and Example 2.15(b), we see that a bounded function on $[a,b]$ having countably many discontinuities is Riemann integrable.
b) Every monotone function on $[a,b]$ is Riemann integrable. This follows immediately from Exercise 2.59(c) on page 72 and part (a) here. □

Convergence Properties of the Riemann Integral

An important question in analysis is: If a sequence of functions converges pointwise, can the integral and limit be interchanged? For Riemann integration, this question can be stated as follows. Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of Riemann integrable functions on $[a,b]$ that converges pointwise to a Riemann integrable function $f$. Is it true that
$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx? \tag{2.9}$$
The answer, in general, is no. For example, let $f = 0$ (the function identically 0) and $f_n = n\chi_{(0,1/n)}$ for each $n \in \mathcal{N}$. Then $f_n \to f$ pointwise on $[0,1]$. But $\int_0^1 f(x)\,dx = 0$ and $\int_0^1 f_n(x)\,dx = 1$ for each $n \in \mathcal{N}$. Therefore,
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = 1 \ne 0 = \int_0^1 f(x)\,dx.$$
Even if each $f_n$ and $f$ are continuous, the limit and the integral cannot, in general, be interchanged. As in the discussion of pointwise convergence of continuous functions, the concept of uniform convergence plays an important role here.
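The counterexample $f_n = n\chi_{(0,1/n)}$ can be made concrete with a few lines of Python (a sketch; the helper names are ours). At any fixed $x \in (0,1]$, $f_n(x) \ne 0$ only while $nx < 1$, so the values are eventually 0, and $f_n(0) = 0$ for all $n$; yet each $f_n$ is a step function with integral $n \cdot \ell((0,1/n)) = 1$.

```python
from fractions import Fraction
import math

def f_n(n, x):
    """f_n = n * chi_(0, 1/n): the value n on the open interval (0, 1/n), 0 elsewhere."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f_n(n):
    """f_n is a step function, so its integral is n * length((0, 1/n)) = 1."""
    return n * Fraction(1, n)

def last_nonzero_index(x):
    """For x in (0, 1], f_n(x) != 0 exactly when n * x < 1; return the largest such n (0 if none)."""
    return math.ceil(1 / Fraction(x)) - 1

x = Fraction(3, 10)
print([f_n(n, x) for n in range(1, 8)])        # nonzero only while n < 10/3
print([integral_f_n(n) for n in range(1, 8)])  # every integral equals 1
print(last_nonzero_index(x))
```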
THEOREM 2.8
Suppose $\{f_n\}_{n=1}^{\infty}$ is a sequence of Riemann integrable functions on $[a,b]$ that converges uniformly to a function $f$. Then $f$ is Riemann integrable over $[a,b]$ and
$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$

PROOF: That $f$ is Riemann integrable is left to the reader as an exercise. (See Exercise 2.91.) Let $\epsilon > 0$ be given. Choose $N \in \mathcal{N}$ such that $|f(x) - f_n(x)| < \epsilon/(b-a)$ for all $x \in [a,b]$ whenever $n \ge N$. Using Theorem 2.6 on page 84, we have, for $n \ge N$,
$$\left| \int_a^b f(x)\,dx - \int_a^b f_n(x)\,dx \right| = \left| \int_a^b (f - f_n)(x)\,dx \right| \le \int_a^b |f(x) - f_n(x)|\,dx \le \int_a^b \frac{\epsilon}{b-a}\,dx = \epsilon,$$
as required.

As Theorem 2.8 shows, uniform convergence is a sufficient condition for the interchange of limit and integral. It is not, however, a necessary condition. To see this, let $f_n(x) = x^n$ and
$$f(x) = \begin{cases} 1, & \text{if } x = 1;\\ 0, & \text{if } x \ne 1. \end{cases}$$
Then $f_n \to f$ pointwise, but not uniformly, on $[0,1]$. Moreover,
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \lim_{n\to\infty} \frac{1}{n+1} = 0 = \int_0^1 f(x)\,dx.$$

We have seen three important consequences of uniform convergence. It is a sufficient condition for
• the limit of a sequence of continuous functions to be continuous (Proposition 2.15 on page 70).
• the limit of a sequence of Riemann integrable functions to be Riemann integrable (Theorem 2.8).
• the interchange of limit and integral (Theorem 2.8).
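For the example $f_n(x) = x^n$ just given, the value $\int_0^1 x^n\,dx = 1/(n+1)$ can be cross-checked using only step functions, in the spirit of Definition 2.18. The Python sketch below (our own helper, exact arithmetic via `fractions`) uses the uniform partition $x_k = k/m$. Because $x^n$ is nondecreasing on $[0,1]$, left-endpoint values give a step function below $x^n$ and right-endpoint values give one above it, so their integrals must bracket $1/(n+1)$; the bracket has width exactly $1/m$. Meanwhile the integrals themselves tend to 0, even though the convergence is not uniform (Exercise 2.60).

```python
from fractions import Fraction

def step_bounds_power(n, m):
    """Integrals of step functions below and above x**n on [0, 1] (partition x_k = k/m)."""
    lower = sum(Fraction(k - 1, m) ** n for k in range(1, m + 1)) / m
    upper = sum(Fraction(k, m) ** n for k in range(1, m + 1)) / m
    return lower, upper

for n in (1, 5, 25, 100):
    lo, up = step_bounds_power(n, 200)
    exact = Fraction(1, n + 1)   # the value of the integral used in the text
    assert lo <= exact <= up     # consistent with the step-function bounds
    print(n, float(lo), float(exact), float(up))
```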
Although uniform convergence has these and other desirable consequences, it is a very strong condition to place on a sequence of functions, especially when the common domain is the entire real line. This fact and a need to “integrate” non-Riemann integrable functions will lead us naturally to Lebesgue measurable functions and the Lebesgue integral in Chapter 3.

EXERCISES 2.6

2.76 Let $\Omega$ be a set and $A, B \subset \Omega$. Prove the following facts.
a) $\chi_{A \cap B} = \chi_A \cdot \chi_B$.
b) If $A \cap B = \emptyset$, then $\chi_{A \cup B} = \chi_A + \chi_B$.
c) More generally than in part (b), if $\{C_n\}_n$ is a pairwise disjoint sequence of subsets of $\Omega$, then $\chi_{\bigcup_n C_n} = \sum_n \chi_{C_n}$.
d) Obtain a general formula for $\chi_{A \cup B}$.
2.77 Show that the definition of the integral of a step function is well-posed: Suppose
$$h = \sum_{k=1}^{n} a_k \chi_{I_k} = \sum_{j=1}^{m} b_j \chi_{J_j},$$
where $n, m \in \mathcal{N}$, $\{a_k\}_{k=1}^{n}$ and $\{b_j\}_{j=1}^{m}$ are sequences of real numbers, and $\{I_k\}_{k=1}^{n}$ and $\{J_j\}_{j=1}^{m}$ are each a finite sequence of pairwise disjoint intervals whose union is $[a,b]$. Prove that $\sum_{k=1}^{n} a_k\,\ell(I_k) = \sum_{j=1}^{m} b_j\,\ell(J_j)$. Hint: First show that $I_k \cap J_j \ne \emptyset$ implies $a_k = b_j$. Then show $a_k\,\ell(I_k) = a_k \sum_{j=1}^{m} \ell(I_k \cap J_j)$.
2.78 Suppose that $g$ and $h$ are step functions on $[a,b]$ and that $g \le h$ on $[a,b]$, that is, $g(x) \le h(x)$ for all $x \in [a,b]$. Prove that $\int_a^b g(x)\,dx \le \int_a^b h(x)\,dx$, where each integral is interpreted as the integral of a step function. Hint: See the hint given in Exercise 2.77.
2.79 Let $f$ be a bounded function on $[a,b]$. Prove that
$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx.$$
Hint: Use Exercise 2.78.
2.80 Let $f$ be a step function on $[a,b]$. Prove that
$$\overline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx \quad\text{and}\quad \underline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx,$$
where each integral on the right is interpreted as the integral of a step function. Thus, a step function on $[a,b]$ is Riemann integrable thereon and its Riemann integral equals its integral as a step function. Hint: Use Exercise 2.78.
2.81 Prove that a real-valued function $f$ on $[a,b]$ is a step function if and only if there is a partition $a=x_0<x_1<\cdots<x_n=b$ of $[a,b]$ and real numbers $c_1,c_2,\dots,c_n$ such that for each $1\le k\le n$, we have $f(x)=c_k$ for $x_{k-1}<x<x_k$.
2.82 In this exercise, you are asked to show that the definitions of the upper and lower Riemann integrals, as presented on page 82, are equivalent to those usually encountered in advanced calculus and introductory real analysis courses. Let $f$ be a bounded real-valued function on $[a,b]$. For a partition $a=x_0<x_1<\cdots<x_n=b$ of $[a,b]$, define, for $1\le k\le n$, $I_k=[x_{k-1},x_k]$, $M(f,I_k)=\sup\{f(x):x\in I_k\}$, and $m(f,I_k)=\inf\{f(x):x\in I_k\}$. Then define
$$U_a^b f=\inf\Big\{\sum_{k=1}^{n}M(f,I_k)(x_k-x_{k-1}) : a=x_0<x_1<\cdots<x_n=b\Big\}$$
and
$$L_a^b f=\sup\Big\{\sum_{k=1}^{n}m(f,I_k)(x_k-x_{k-1}) : a=x_0<x_1<\cdots<x_n=b\Big\}.$$
Prove that $U_a^b f=\overline{\int_a^b} f(x)\,dx$ and $L_a^b f=\underline{\int_a^b} f(x)\,dx$. Hint: Use Exercise 2.81.
2.83 Prove Proposition 2.21 on page 85.
2.84 Prove that the Cantor set, $P$, has measure zero. Hint: Recall that $P\subset P_n$ for each $n\in\mathcal{N}$, where $P_n$ is the set remaining after the $n$th step in the construction of the Cantor set.
2.85 Show that a subset of a set of measure zero also has measure zero.
★2.86 Prove that a nondegenerate interval does not have measure zero by proceeding as follows.
a) Let $a,b\in\mathcal{R}$ with $a<b$. Show that if $\{I_k\}_{k=1}^{n}$ is a finite sequence of open intervals whose union contains $[a,b]$, then $\sum_{k=1}^{n}\ell(I_k)>b-a$.
b) Deduce from part (a) that if $a<b$, then $[a,b]$ does not have measure zero. Hint: Use the fact that if a closed bounded interval is a subset of the union of a collection of open sets, then there is a finite subcollection of that collection whose union also contains the interval. This result is a special case of the Heine-Borel theorem.
c) Conclude from part (b) that a nondegenerate interval does not have measure zero.
2.87 Define $f:[0,1]\to\mathcal{R}$ by
$$f(x)=\begin{cases}0, & \text{if } x=0 \text{ or } x\in[0,1]\setminus\mathcal{Q};\\ 1/q, & \text{if } x\in(0,1]\cap\mathcal{Q} \text{ and } x=p/q \text{ in lowest terms.}\end{cases}$$
a) Show that the set of points of discontinuity of $f$ is $(0,1]\cap\mathcal{Q}$.
b) Deduce from part (a) that $f$ is Riemann integrable on $[0,1]$.
c) Show that $\int_0^1 f(x)\,dx=0$.
2.88 Find a function on $[0,1]$ that has uncountably many points of discontinuity but is Riemann integrable. Hint: Do something with the Cantor set.
2.89 Construct a sequence of continuous functions on $[0,1]$ that converges pointwise to a continuous function but for which the limit and integral cannot be interchanged.
2.90 Refer to Exercise 2.89. Is it possible to find such a sequence of functions if the sequence is required to be monotone?
2.91 Prove that if $\{f_n\}_{n=1}^{\infty}$ is a sequence of Riemann integrable functions on $[a,b]$ that converges uniformly to a function $f$, then $f$ is Riemann integrable. Proceed as follows.
a) Show that $f$ is bounded.
b) Show that the set of points of discontinuity of $f$ has measure zero. Hint: Let $E_n$ denote the set of points of discontinuity of $f_n$ and set $E=\bigcup_{n=1}^{\infty}E_n$. Show that $f$ is continuous at each point of $[a,b]\setminus E$.
2.92 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of Riemann integrable functions on $[0,1]$ that converges pointwise to the function $f$. Construct an example showing that $f$ need not be Riemann integrable on $[0,1]$, even if $\{f_n\}_{n=1}^{\infty}$ is monotone and $f$ is bounded.
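For a monotone function, the suprema and infima in Exercise 2.82 are attained at the endpoints of each $I_k$, so the sums defining $U_a^b f$ and $L_a^b f$ are easy to compute. The Python sketch below makes two simplifying assumptions that are not part of the exercise. First, $f$ is nondecreasing, so that $M(f,I_k)=f(x_k)$ and $m(f,I_k)=f(x_{k-1})$ exactly. Second, the partitions are uniform. It uses exact rational arithmetic, so the gap between the upper and lower sums comes out as exactly $(f(b)-f(a))(b-a)/n$.

```python
from fractions import Fraction


def darboux_sums(f, a, b, n):
    """Upper and lower sums of a nondecreasing f over the uniform partition of [a, b] into n pieces.

    For nondecreasing f on I_k = [x_{k-1}, x_k]:  M(f, I_k) = f(x_k)  and  m(f, I_k) = f(x_{k-1}).
    """
    xs = [a + (b - a) * Fraction(k, n) for k in range(n + 1)]
    upper = sum(f(xs[k]) * (xs[k] - xs[k - 1]) for k in range(1, n + 1))
    lower = sum(f(xs[k - 1]) * (xs[k] - xs[k - 1]) for k in range(1, n + 1))
    return upper, lower


square = lambda x: x * x          # nondecreasing on [0, 1]; its integral there is 1/3
for n in (4, 40, 400):
    U, L = darboux_sums(square, Fraction(0), Fraction(1), n)
    print(n, float(U), float(L), U - L)   # here U - L equals 1/n exactly
```

Refining the partition squeezes $1/3$ between the two sums. This is the behavior that Exercise 2.82 identifies with the upper and lower integrals.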
PART TWO □ Measure, Integration, and Differentiation
Émile Félix-Édouard-Justin Borel (1871–1956)
Émile Borel was born at Saint-Affrique, France, on January 7, 1871. He exhibited a strong proclivity for mathematics when he was very young and was sent to a lycée at Montauban. In 1890, he entered the École Polytechnique in Paris, graduating in 1893. He received his doctorate from the École Normale Supérieure in 1894.
Borel's most important research was done in the 1890s, when he worked on probability, the infinitesimal calculus, divergent series, and measure theory. In 1896, he provided the proof of Picard's theorem, a proof that mathematicians had been seeking for nearly 20 years. Although John von Neumann is credited as the founder of game theory, Borel completed a series of papers on the subject between 1921 and 1927, thus being the first to define games of strategy.
After World War I, Borel developed an interest in politics, serving as Minister of the Navy from 1925 to 1940. He was arrested and briefly imprisoned by the Vichy regime in 1940, after which he worked in the Resistance. His honors included the Resistance Medal in 1945, the Croix de Guerre in 1918, the Grand Cross of the Légion d'Honneur in 1950, and the first gold medal of the Centre National de la Recherche Scientifique in 1955.
Borel was appointed to the faculty of the École Normale Supérieure in 1896. He held the Chair in Function Theory at the Sorbonne from 1909 until 1940, and he was director and founding member of the Henri Poincaré Institute from 1928 until his death on February 3, 1956, in Paris.
Lebesgue Theory on the Real Line

In Chapter 2, we discussed open sets, continuous functions, and the Riemann integral. Those classical concepts have served mathematics and its applications well. However, for the purposes of modern mathematics, a more general and sophisticated framework is required. In this chapter, we will take the first steps toward obtaining that framework.
We will expand the collection of continuous functions to the collection of Borel measurable functions, the smallest algebra of functions that contains the continuous functions and is closed under pointwise limits. In doing so, we will be led to consider the collection of Borel sets, the smallest $\sigma$-algebra of subsets of $\mathcal{R}$ that contains the open sets. Then we will generalize the Riemann integral so that it applies to Borel measurable functions. That generalization will take us to the development of Lebesgue measure, Lebesgue measurable functions, and the Lebesgue integral.

3.1 BOREL MEASURABLE FUNCTIONS AND BOREL SETS
In the previous chapter, we showed that the collection of continuous, real-valued functions forms an algebra but is not closed under pointwise limits. Since this latter property is a crucial one in modern mathematical analysis,
we will enlarge the collection of continuous functions to a collection of functions that is closed under pointwise limits.
Specifically, we will consider the smallest algebra of (real-valued) functions that contains the continuous functions and is closed under pointwise limits. Such an algebra of functions exists: it is the intersection of all algebras of functions that contain the continuous functions and are closed under pointwise limits.†
As we will see presently, the condition of being an algebra is superfluous. That is, the smallest collection of functions that contains the continuous functions and is closed under pointwise limits is necessarily an algebra of functions. Thus, we make the following definition.

DEFINITION 3.1 Borel Measurable Functions
We denote by $\mathcal{C}$ the smallest collection of real-valued functions on $\mathcal{R}$ that contains the collection of continuous functions and is closed under pointwise limits. The members of $\mathcal{C}$ are called Borel measurable functions.

THEOREM 3.1 The collection, $\mathcal{C}$, of Borel measurable functions forms an algebra. That is, if $f$ and $g$ are Borel measurable and $\alpha\in\mathcal{R}$, then
a) $f+g$ is Borel measurable.
b) $\alpha f$ is Borel measurable.
c) $f\cdot g$ is Borel measurable.

PROOF: We prove only part (a); parts (b) and (c) are left as exercises. First of all, let $g\in C$ (the collection of continuous functions on $\mathcal{R}$) and set
$$\mathcal{D}=\{f\in\mathcal{C}: f+g\in\mathcal{C}\}.$$
If $f\in C$, then $f\in\mathcal{C}$ and $f+g\in C\subset\mathcal{C}$. Thus, $\mathcal{D}\supset C$. Now suppose that $\{f_n\}_{n=1}^{\infty}\subset\mathcal{D}$ and that $f_n\to f$ pointwise. Then $f_n\in\mathcal{C}$ and $f_n+g\in\mathcal{C}$ for all $n\in\mathcal{N}$, and $f_n+g\to f+g$ pointwise. Since $\mathcal{C}$ is closed under pointwise limits, we conclude that $f\in\mathcal{C}$ and $f+g\in\mathcal{C}$. Hence, $f\in\mathcal{D}$. Therefore, $\mathcal{D}$ is closed under pointwise limits.

† The aforementioned intersection is taken over a nonempty family, because the collection of all real-valued functions is an algebra that contains the continuous functions and is closed under pointwise limits.
The previous paragraph shows that $\mathcal{D}$ contains the continuous functions and is closed under pointwise limits. Since, by definition, $\mathcal{C}$ is the smallest such collection of functions, it follows that $\mathcal{D}\supset\mathcal{C}$. But, by the definition of $\mathcal{D}$, $\mathcal{D}\subset\mathcal{C}$. Thus, $\mathcal{D}=\mathcal{C}$; in other words, $f+g\in\mathcal{C}$ whenever $f\in\mathcal{C}$ and $g\in C$.
Next, let $f\in\mathcal{C}$ and set
$$\mathcal{E}=\{g\in\mathcal{C}: f+g\in\mathcal{C}\}.$$
It follows from what we just proved that $\mathcal{E}$ contains the continuous functions. Moreover, the same argument that we used to show that $\mathcal{D}$ is closed under pointwise limits shows that $\mathcal{E}$ is closed under pointwise limits. Hence, $\mathcal{E}=\mathcal{C}$; i.e., $f+g\in\mathcal{C}$ whenever $f\in\mathcal{C}$ and $g\in\mathcal{C}$.

Borel Sets
In Chapter 2, we discovered that there is a natural association between the continuous functions and the collection of open sets: A function is continuous if and only if the inverse image of each open set is open. Now we ask which collection of sets corresponds naturally to the Borel measurable functions. As we will see, it is the collection of sets whose characteristic functions are Borel measurable functions.

DEFINITION 3.2 Borel Sets
A set $B\subset\mathcal{R}$ is called a Borel set if its characteristic function, $\chi_B$, is Borel measurable. The collection of all Borel sets is denoted by $\mathcal{B}$. Thus,
$$\mathcal{B}=\{B\subset\mathcal{R}:\chi_B\in\mathcal{C}\}.$$

To begin, we will prove that the Borel sets form a $\sigma$-algebra of subsets of $\mathcal{R}$. In order to accomplish this, we will need several lemmas. The proof of the first lemma is left as an exercise for the reader (see Exercise 3.2).

LEMMA 3.1 Let $h$ denote the absolute value function, that is, $h(x)=|x|$, $x\in\mathcal{R}$. Then there is a sequence, $\{p_n\}_{n=1}^{\infty}$, of polynomials such that $p_n\to h$ pointwise.

Next we introduce the notation used for the maximum and minimum of two functions and prove that the Borel measurable functions are closed under those two operations.
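Lemma 3.1 is left as Exercise 3.2, and the construction suggested there can be sketched numerically. For $|x|\le 1$, write $|x|=\sqrt{1-t}$ with $t=1-x^2\in[0,1]$. The partial sums of the binomial series $\sqrt{1-t}=\sum_{k\ge 0}\binom{1/2}{k}(-t)^k$ are then polynomials in $x$. The rescaling $|x|=b\,|x/b|$ handles the interval $[-b,b]$. The Python function `abs_poly` below evaluates $p_N(x)=b\sum_{k=0}^{N}\binom{1/2}{k}(-1)^k\bigl(1-(x/b)^2\bigr)^k$. The function names and the grid-based error check are illustrative choices, not taken from the text.

```python
def abs_poly(x, N, b=1.0):
    """Evaluate p_N(x) = b * sum_{k=0}^{N} c_k * (1 - (x/b)**2)**k, where c_k is the
    coefficient of t**k in the binomial series of sqrt(1 - t).  p_N is a polynomial in x."""
    t = 1.0 - (x / b) ** 2
    c, t_power, total = 1.0, 1.0, 1.0      # k = 0 term: c_0 = 1
    for k in range(1, N + 1):
        c *= (k - 1.5) / k                  # c_k = c_{k-1} * (k - 3/2) / k
        t_power *= t
        total += c * t_power
    return b * total


def max_error(N, b=1.0, samples=2001):
    """Largest |p_N(x) - |x|| over an evenly spaced grid on [-b, b] (the grid contains x = 0)."""
    grid = [-b + 2 * b * i / (samples - 1) for i in range(samples)]
    return max(abs(abs_poly(x, N, b) - abs(x)) for x in grid)


for N in (5, 50, 500):
    print(N, max_error(N))   # the error shrinks as N grows, roughly like N**-0.5
```

Every coefficient after the first is negative, so $p_N(x)-|x|=b\sum_{k>N}|c_k|t^k\ge 0$. This error is largest at $x=0$, where it equals $b\binom{2N}{N}/4^N$. The slow decay reflects the corner of $|x|$ at the origin. Pointwise convergence on all of $\mathcal{R}$, as Lemma 3.1 requires, then follows from parts (b)–(d) of Exercise 3.2.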
DEFINITION 3.3 Maximum and Minimum of Two Functions
Suppose that $f$ and $g$ are real-valued functions on $\mathcal{R}$. Then we define $f\vee g=\max\{f,g\}$ and $f\wedge g=\min\{f,g\}$. That is,
$$(f\vee g)(x)=\max\{f(x),g(x)\}\quad\text{and}\quad(f\wedge g)(x)=\min\{f(x),g(x)\}.$$

LEMMA 3.2 If $f$ and $g$ are Borel measurable functions, then so are $f\vee g$ and $f\wedge g$. More generally, if $f_1,f_2,\dots,f_n$ are Borel measurable, then so are $f_1\vee\cdots\vee f_n$ and $f_1\wedge\cdots\wedge f_n$.

PROOF: We first note that the following identities hold:
$$f\vee g=\tfrac12\bigl(f+g+|f-g|\bigr)\tag{3.1}$$
and
$$f\wedge g=\tfrac12\bigl(f+g-|f-g|\bigr).\tag{3.2}$$
Next we show that $|F|\in\mathcal{C}$ whenever $F\in\mathcal{C}$. Use Lemma 3.1 to choose a sequence of polynomials, $\{p_n\}_{n=1}^{\infty}$, such that $p_n(x)\to|x|$ for all $x\in\mathcal{R}$. Since $\mathcal{C}$ is an algebra of functions (Theorem 3.1), it follows that $p_n\circ F\in\mathcal{C}$ for all $n\in\mathcal{N}$. But $p_n\circ F\to|F|$ pointwise. Therefore, as $\mathcal{C}$ is closed under pointwise limits, $|F|\in\mathcal{C}$.
Now suppose that $f,g\in\mathcal{C}$. Then $|f-g|\in\mathcal{C}$ (why?). Using (3.1) and (3.2) and the fact that $\mathcal{C}$ is an algebra, we deduce that $f\vee g\in\mathcal{C}$ and $f\wedge g\in\mathcal{C}$. The remaining conclusions of the lemma follow by mathematical induction.

LEMMA 3.3 If $\{f_n\}_{n=1}^{\infty}$ is a sequence of Borel measurable functions with $\{f_n(x)\}_{n=1}^{\infty}$ bounded for each $x\in\mathcal{R}$, then $\sup_n f_n$ and $\inf_n f_n$ are Borel measurable.

PROOF: By Lemma 3.2, if $f_1,f_2,\dots,f_n$ are Borel measurable, then so are $f_1\vee\cdots\vee f_n$ and $f_1\wedge\cdots\wedge f_n$. But
$$\sup_n f_n=\lim_{n\to\infty}f_1\vee\cdots\vee f_n$$
and
$$\inf_n f_n=\lim_{n\to\infty}f_1\wedge\cdots\wedge f_n.$$
The lemma now follows because $\mathcal{C}$ is closed under pointwise limits.

Now that we have established Lemmas 3.1–3.3, we can prove that the collection $\mathcal{B}$ of Borel sets is a $\sigma$-algebra of subsets of $\mathcal{R}$.

THEOREM 3.2 The collection of Borel sets, $\mathcal{B}=\{B\subset\mathcal{R}:\chi_B\in\mathcal{C}\}$, is a $\sigma$-algebra of subsets of $\mathcal{R}$.

PROOF: We first show that the collection of Borel sets is closed under complementation. Assume $B\in\mathcal{B}$. Then, by definition, $\chi_B\in\mathcal{C}$. Since $1\in\mathcal{C}$ and $\mathcal{C}$ is an algebra, we conclude that $1-\chi_B\in\mathcal{C}$. But $1-\chi_B=\chi_{B^c}$ and, consequently, $B^c\in\mathcal{B}$.
Next we show that the collection of Borel sets is closed under countable unions. Suppose that $B_n\in\mathcal{B}$ for $n\in\mathcal{N}$. Then $\chi_{B_n}\in\mathcal{C}$ for $n\in\mathcal{N}$ and, therefore, by Lemma 3.3, $\sup_n\chi_{B_n}\in\mathcal{C}$. But $\sup_n\chi_{B_n}=\chi_{\bigcup_{n=1}^{\infty}B_n}$. Hence, $\bigcup_{n=1}^{\infty}B_n\in\mathcal{B}$.

Further Properties of Borel Sets and Borel Measurable Functions
It is left as an exercise for the reader to show that if $O$ is an open set, then $\chi_O$ is a Borel measurable function. In other words, every open set is a Borel set. We will prove shortly that, in fact, $\mathcal{B}$ is the smallest $\sigma$-algebra that contains all the open sets. But first, we will justify our contention that it is natural to associate the Borel sets with the Borel measurable functions, as we do the open sets with the continuous functions. Specifically, we will show that a function is Borel measurable if and only if the inverse image of each open set is a Borel set. In order to accomplish this, we will introduce another collection of functions that, as we will see, turns out to be identical to the collection of Borel measurable functions.

LEMMA 3.4 Let $\mathcal{F}=\{f: f^{-1}(O)\in\mathcal{B}\text{ for all open sets }O\}$. Then $\mathcal{F}$ contains the continuous functions.
PROOF: Suppose that $f$ is a continuous function on $\mathcal{R}$. Then, by Theorem 2.5 on page 66, $f^{-1}(O)$ is open whenever $O$ is open. But every open set is a Borel set (Exercise 3.3). Therefore, $f^{-1}(O)\in\mathcal{B}$ whenever $O$ is open. This shows that $\mathcal{F}$ contains the continuous functions.

LEMMA 3.5 $f\in\mathcal{F}$ if either of the following conditions holds:
a) For each $a\in\mathcal{R}$, $f^{-1}\bigl((-\infty,a)\bigr)\in\mathcal{B}$.
b) For each $a\in\mathcal{R}$, $f^{-1}\bigl((a,\infty)\bigr)\in\mathcal{B}$.

PROOF: We will prove part (a). The proof of part (b) is similar and is left as an exercise. So, suppose that $f$ satisfies the condition in part (a). We claim that $f\in\mathcal{F}$; that is, $f^{-1}(O)\in\mathcal{B}$ for all open sets $O$. Set
$$\mathcal{A}=\{A\subset\mathcal{R}: f^{-1}(A)\in\mathcal{B}\}.$$
Because $f^{-1}(A^c)=[f^{-1}(A)]^c$, $f^{-1}\bigl(\bigcup_n A_n\bigr)=\bigcup_n f^{-1}(A_n)$, and $\mathcal{B}$ forms a $\sigma$-algebra, it follows that $\mathcal{A}$ is a $\sigma$-algebra. Now, by assumption, $\mathcal{A}$ contains all sets of the form $(-\infty,a)$, where $a\in\mathcal{R}$. If $a\in\mathcal{R}$, then we can write $(-\infty,a]=\bigcap_{n=1}^{\infty}(-\infty,a+1/n)$; therefore, $(-\infty,a]\in\mathcal{A}$ because $\mathcal{A}$ is a $\sigma$-algebra. This in turn implies that $(a,b)\in\mathcal{A}$ for each $a,b\in\mathcal{R}$, since $(a,b)=(-\infty,b)\cap(-\infty,a]^c$. It now follows easily that $\mathcal{A}$ contains all open intervals, including the unbounded ones $(a,\infty)=(-\infty,a]^c$ and $\mathcal{R}$. But, by Proposition 2.13 on page 59, every open set is a countable union of open intervals. Consequently, $\mathcal{A}$ contains all open sets. This means that $f^{-1}(O)\in\mathcal{B}$ for all open sets $O$; that is, $f\in\mathcal{F}$.

LEMMA 3.6 $\mathcal{F}$ is closed under pointwise limits.

PROOF: Suppose that $\{g_n\}_{n=1}^{\infty}\subset\mathcal{F}$, with $\{g_n(x)\}_{n=1}^{\infty}$ bounded for each $x\in\mathcal{R}$, and let $g=\sup_n g_n$. Then, for $a\in\mathcal{R}$,
$$g^{-1}\bigl((a,\infty)\bigr)=\bigcup_{n=1}^{\infty}g_n^{-1}\bigl((a,\infty)\bigr)\in\mathcal{B}.$$
Therefore, by the previous lemma, $\sup_n g_n\in\mathcal{F}$. Similarly, $\inf_n g_n\in\mathcal{F}$.
Now suppose that $\{f_n\}_{n=1}^{\infty}\subset\mathcal{F}$ and $f_n\to f$ pointwise. Then for each $x\in\mathcal{R}$, $\lim_{n\to\infty}f_n(x)=f(x)$ and so $\limsup_{n\to\infty}f_n(x)=f(x)$. Thus, $f=\inf_n\{\sup_{k\ge n}f_k\}$. Let $g_n=\sup_{k\ge n}f_k$. Then the previous paragraph shows, in turn, that $g_n\in\mathcal{F}$ for all $n\in\mathcal{N}$ and that $\inf_n g_n\in\mathcal{F}$. Hence, $f\in\mathcal{F}$.

COROLLARY 3.1 $\mathcal{F}$ contains the Borel measurable functions.
PROOF: By Lemmas 3.4 and 3.6, $\mathcal{F}$ contains the continuous functions and is closed under pointwise limits. The collection of Borel measurable functions, $\mathcal{C}$, is the smallest collection of functions that contains the continuous functions and is closed under pointwise limits. Hence, it must be that $\mathcal{F}\supset\mathcal{C}$.

LEMMA 3.7 Let $f\in\mathcal{F}$. Then there is a sequence, $\{f_n\}_{n=1}^{\infty}$, of Borel measurable functions such that $f_n\to f$ pointwise.

PROOF: First of all, note that if $a,b\in\mathcal{R}$, then $f^{-1}([a,b))\in\mathcal{B}$ (why?). For $n\in\mathcal{N}$, let
$$E_{nk}=\Big\{x:\frac{k}{n}\le f(x)<\frac{k+1}{n}\Big\}=f^{-1}\Big(\Big[\frac{k}{n},\frac{k+1}{n}\Big)\Big)$$
for $k=0,\pm1,\pm2,\dots$. Then $E_{nk}\in\mathcal{B}$ and so $\chi_{E_{nk}}\in\mathcal{C}$. Since $\mathcal{C}$ is an algebra of functions and is closed under pointwise limits, the function
$$f_n=\sum_{k=-\infty}^{\infty}\frac{k}{n}\chi_{E_{nk}}=\lim_{k\to\infty}\sum_{j=-k}^{k}\frac{j}{n}\chi_{E_{nj}}$$
is in $\mathcal{C}$. It is easy to see that $|f(x)-f_n(x)|<1/n$ for all $x\in\mathcal{R}$. Hence $f_n\to f$ pointwise (in fact, the convergence is uniform).

Using the preceding results, it is now evident that $\mathcal{F}$ and the collection of Borel measurable functions are one and the same. Specifically, we have the following theorem.

THEOREM 3.3 A function $f$ is Borel measurable if and only if the inverse image of each open set under $f$ is a Borel set; that is, if and only if $f^{-1}(O)\in\mathcal{B}$ for all open sets $O$.

PROOF: By Corollary 3.1, $\mathcal{F}\supset\mathcal{C}$. Conversely, suppose that $f\in\mathcal{F}$. Then, by Lemma 3.7, $f$ is the pointwise limit of functions in $\mathcal{C}$. Since $\mathcal{C}$ is closed under pointwise limits, this implies $f\in\mathcal{C}$. Thus, $\mathcal{C}\supset\mathcal{F}$.

We mentioned earlier that the collection of Borel sets, $\mathcal{B}$, is the smallest $\sigma$-algebra of sets that contains the open sets. Here is a proof of that result.
THEOREM 3.4 The collection of Borel sets, $\mathcal{B}$, is the smallest $\sigma$-algebra of subsets of $\mathcal{R}$ that contains all the open sets.

PROOF: We already know that $\mathcal{B}$ is a $\sigma$-algebra that contains all the open sets. Let $\mathcal{A}$ be any $\sigma$-algebra that contains all the open sets. We claim that $\mathcal{B}\subset\mathcal{A}$. Let
$$\mathcal{G}=\{f: f^{-1}(O)\in\mathcal{A}\text{ for all open sets }O\}.$$
The arguments used for Lemmas 3.4 and 3.6 depend only on the fact that $\mathcal{B}$ is a $\sigma$-algebra containing the open sets. It follows that $\mathcal{G}$ contains the continuous functions and is closed under pointwise limits; thus, $\mathcal{G}\supset\mathcal{C}$. This last fact implies that $\chi_B\in\mathcal{G}$ for all $B\in\mathcal{B}$. But then, for each $B\in\mathcal{B}$, $B=\chi_B^{-1}\bigl((1/2,3/2)\bigr)\in\mathcal{A}$. Thus $\mathcal{B}\subset\mathcal{A}$.

Remarks: In many texts, the collection of Borel sets, $\mathcal{B}$, is defined to be the smallest $\sigma$-algebra of sets that contains the open sets, and a function is defined to be Borel measurable if the inverse image of each open set is a Borel set. As we see from Theorems 3.3 and 3.4, the definitions presented here (Definitions 3.1 and 3.2) are equivalent to those. It seems more natural, though, to introduce the Borel measurable functions in a way that is motivated by a defect in the collection of continuous functions, namely, the defect of not being closed under pointwise limits. Once this introduction is accomplished, however, it may indeed be easier to think of Borel measurable functions via Theorem 3.3 and Borel sets via Theorem 3.4. Moreover, those characterizations will be used as a means to define Borel sets and Borel measurable functions in more general contexts.
Here now are several examples that illustrate Borel measurable functions and Borel sets. We have left some of the justifications as exercises for the reader.

EXAMPLE 3.1 Illustrates Borel Measurable Functions and Borel Sets
a) Because $\mathcal{B}$ is a $\sigma$-algebra containing the open sets, it follows that all open sets, closed sets, and intervals are Borel sets.
b) If $B$ is a countable set, then $B\in\mathcal{B}$; in particular, $\mathcal{Q}\in\mathcal{B}$.
From this, it also follows that the set of irrational numbers, $\mathcal{R}\setminus\mathcal{Q}$, is in $\mathcal{B}$.
c) By definition, any continuous function is Borel measurable.
d) $\chi_{\mathcal{Q}}$ is Borel measurable because $\mathcal{Q}\in\mathcal{B}$. Indeed, if $B\in\mathcal{B}$, then $\chi_B$ is Borel measurable by the definition of $\mathcal{B}$.
e) If $B_1,\dots,B_n\in\mathcal{B}$ and $a_1,\dots,a_n\in\mathcal{R}$, then $f=\sum_{k=1}^{n}a_k\chi_{B_k}$ is Borel measurable. This follows immediately from part (d) and the fact that $\mathcal{C}$ is an algebra of functions. In particular, all step functions are Borel measurable.
f) Every monotone function is Borel measurable, as the reader can easily verify by applying Lemma 3.5.
g) A real-valued function $f$ on $\mathcal{R}$ that is 0 except on a countable set is Borel measurable. Indeed, suppose $K$ is countable and $f(x)=0$ for $x\notin K$. Let $\{x_n\}_n$ be an enumeration of $K$. Then $f=\sum_n f(x_n)\chi_{\{x_n\}}$. If $K$ is finite, then $f$ is Borel measurable by parts (a) and (e). If $K$ is infinite, then $f$ is the pointwise limit of the Borel measurable functions $\sum_{k=1}^{n}f(x_k)\chi_{\{x_k\}}$, $n\in\mathcal{N}$, and, hence, is itself Borel measurable. □

Borel Measurable Functions and Borel Sets on Subsets of $\mathcal{R}$
We conclude this section with a brief discussion of Borel measurable functions and Borel sets when the underlying space is some subset $D\subset\mathcal{R}$. The pertinent definitions and theorems are obvious modifications of those discussed earlier.

DEFINITION 3.4 Borel Measurable Functions on D
We denote by $\mathcal{C}(D)$ the smallest collection of real-valued functions on $D$ that contains the continuous functions on $D$ and is closed under pointwise limits. The members of $\mathcal{C}(D)$ are called Borel measurable functions on $D$.

DEFINITION 3.5 Borel Sets of D
A set $B\subset D$ is called a Borel set of $D$ if its characteristic function is a Borel measurable function on $D$. The collection of all Borel sets of $D$ is denoted by $\mathcal{B}(D)$. Thus,
$$\mathcal{B}(D)=\{B\subset D:\chi_B\in\mathcal{C}(D)\}.$$

Using arguments similar to those used earlier, we can obtain the following theorems.

THEOREM 3.5 A function $f$ is Borel measurable on $D$ if and only if the inverse image of each open set under $f$ is a Borel set of $D$, that is, $f^{-1}(O)\in\mathcal{B}(D)$ for all open sets $O$.
THEOREM 3.6 The collection of Borel sets of $D$, $\mathcal{B}(D)$, is the smallest $\sigma$-algebra of subsets of $D$ that contains all open sets in $D$.

An interesting and useful characterization of $\mathcal{B}(D)$ is given by the following theorem. Note the analogy with open sets in $D$ (see Theorem 2.3 on page 62).

THEOREM 3.7 $B\in\mathcal{B}(D)$ if and only if there is an $A\in\mathcal{B}$ such that $B=D\cap A$. That is, the Borel sets of $D$ are precisely the Borel sets (of $\mathcal{R}$) intersected with $D$.

PROOF: Let $\mathcal{A}=\{D\cap A: A\in\mathcal{B}\}$. We claim that $\mathcal{A}=\mathcal{B}(D)$. It is easy to see that $\mathcal{A}$ is a $\sigma$-algebra of subsets of $D$. Since $\mathcal{B}$ contains all open sets of $\mathcal{R}$, $\mathcal{A}$ contains all open sets of $D$. Thus, by Theorem 3.6, $\mathcal{A}\supset\mathcal{B}(D)$.
Now, let $\mathcal{S}$ be any $\sigma$-algebra of subsets of $D$ that contains the open sets of $D$ and set
$$\mathcal{P}=\{E\in\mathcal{B}: D\cap E\in\mathcal{S}\}.$$
Then $\mathcal{P}$ contains the open sets of $\mathcal{R}$ because $\mathcal{S}$ contains the open sets of $D$. Also, $\mathcal{P}$ is a $\sigma$-algebra because $\mathcal{B}$ and $\mathcal{S}$ are. Consequently, $\mathcal{P}\supset\mathcal{B}$. But, by definition, $\mathcal{P}\subset\mathcal{B}$. Hence, $\mathcal{P}=\mathcal{B}$. It now follows that $\mathcal{A}\subset\mathcal{S}$. Since $\mathcal{S}$ was an arbitrary $\sigma$-algebra of subsets of $D$ that contains the open sets of $D$, we conclude that $\mathcal{A}\subset\mathcal{B}(D)$. This last result and the previous paragraph show that $\mathcal{A}=\mathcal{B}(D)$.

EXERCISES 3.1
3.1 Prove parts (b) and (c) of Theorem 3.1 on page 94.
★3.2 Let $h(x)=|x|$. Prove Lemma 3.1 on page 95 by proceeding as follows.
a) Show that there exists a sequence of polynomials that converges uniformly to $h$ on $[-1,1]$. Hint: Consider the Taylor series expansion for $(1-t)^{1/2}$ on $[0,1]$.
b) Use part (a) to conclude that for each compact subset $K$ of $\mathcal{R}$, there exists a sequence of polynomials that converges uniformly to $h$ on $K$. Hint: If $b>0$, we can write $|x|=b\,|x/b|$.
c) Use part (b) to conclude that there exists a sequence of polynomials that converges to $h$ uniformly on each compact subset of $\mathcal{R}$.
d) Deduce Lemma 3.1 from part (c).
3.3 Prove that every open set is a Borel set by showing that for each open set $O$, $\chi_O$ is a Borel measurable function.
Hint: Begin by showing that $\chi_I$ is Borel measurable for each open interval $I$.
3.4 Verify part (b) of Lemma 3.5 on page 98.
3.5 Show that $f$ is Borel measurable if and only if $f^{-1}(B)\in\mathcal{B}$ for all Borel sets $B$.
3.6 Let $D$ be a dense subset of $\mathcal{R}$. Show that $f$ is Borel measurable if either of the following conditions holds:
a) For each $d\in D$, $f^{-1}\bigl((-\infty,d)\bigr)\in\mathcal{B}$.
b) For each $d\in D$, $f^{-1}\bigl((d,\infty)\bigr)\in\mathcal{B}$.
3.7 Show that all closed sets and all intervals are Borel sets.
3.8 Prove that every monotone function is Borel measurable.
3.9 Prove Theorems 3.5–3.7.
3.10 Verify (3.1) and (3.2) on page 96.
3.11 Show that any countable subset of $\mathcal{R}$ is a Borel set.
3.12 For subsets $A$ and $B$ of $\mathcal{R}$, define $A+B=\{a+b: a\in A\text{ and }b\in B\}$. Suppose that $B$ is a Borel set. Prove that $A+B$ is a Borel set if $A$ is
a) countable.
b) open.
3.13 Most functions encountered in a calculus course can be obtained from the identity function, $i(x)=x$, using the standard operations of algebra (sums, products, quotients, and the extraction of roots) together with the operation of passing to the limit in a sequence of functions. For example, … Explain why any function obtained using the aforementioned operations is a Borel measurable function.

3.2 LEBESGUE OUTER MEASURE
In the previous section, we enlarged our basic collection of functions from the continuous functions to the Borel measurable functions. Although both of those collections of functions are algebras, the latter collection has the advantage of being closed under pointwise limits.
Our next goal is to extend the Riemann integral to an integral that applies to all Borel measurable functions. The extension is not trivial, since there are Borel measurable functions that are not Riemann integrable. Indeed, as we learned in Theorem 2.7 on page 86, a bounded function is Riemann integrable if and only if it is continuous except on a set of measure
zero. There are certainly Borel measurable functions that do not satisfy this last condition (e.g., $\chi_{\mathcal{Q}}$).
Referring to Section 2.6, beginning on page 81, we see that the Riemann integral is developed by first defining the integral of a step function, $h(x)=\sum_{k=1}^{n}a_k\chi_{I_k}(x)$, on $[a,b]$ to be
$$\int_a^b h(x)\,dx=\sum_{k=1}^{n}a_k\ell(I_k),$$
where $\ell(I)$ denotes the length of an interval $I$. Therefore, the definition of the Riemann integral ultimately depends on the concept of length, which applies only to intervals of real numbers.
To obtain an integral that applies to all Borel measurable functions, we proceed by analogy with the development of the Riemann integral. Specifically, we must first define the integral of a Borel measurable function of the form $s(x)=\sum_{k=1}^{n}a_k\chi_{B_k}(x)$, where the $B_k$'s are Borel sets. If the $B_k$'s are intervals, then $s$ is a step function and we simply define the integral to equal the Riemann integral, $\sum_{k=1}^{n}a_k\ell(B_k)$. If the $B_k$'s are not intervals, then what? It seems that we need to generalize the concept of length so that it applies to arbitrary Borel sets.

The Definition of Lebesgue Outer Measure
The concept of length will be extended and replaced by that of measure. As we will see, this is by no means a simple procedure. Let us denote the required measure by the Greek letter $\mu$, and the collection of subsets of $\mathcal{R}$ to which it applies by the letter $\mathcal{A}$. Subsets of $\mathcal{R}$ that belong to $\mathcal{A}$ are called measurable sets. We will now list some properties that $\mu$ and $\mathcal{A}$ should satisfy.
Since measure is to be a generalization of length, we require that the measure of an interval be its length; that is, $\mu(I)=\ell(I)$ for all intervals $I$. Also, for purely mathematical reasons, we require that $\mathcal{A}$ be a $\sigma$-algebra; and as we want all Borel sets to be measurable, we require that $\mathcal{A}\supset\mathcal{B}$.
Now, clearly, the measure of the union of two disjoint intervals should be the sum of their lengths (measures).
More generally, then, we require that the measure of the union of two disjoint measurable sets be the sum of their measures. That is, if $A,B\in\mathcal{A}$ and $A\cap B=\emptyset$, then
$$\mu(A\cup B)=\mu(A)+\mu(B).\tag{3.3}$$
Using mathematical induction, we can show that the previous condition implies the following: if $A_1,A_2,\dots,A_n\in\mathcal{A}$ and $A_i\cap A_j=\emptyset$ for $i\ne j$, then
$$\mu\Big(\bigcup_{k=1}^{n}A_k\Big)=\sum_{k=1}^{n}\mu(A_k).\tag{3.4}$$
This condition on $\mu$ is called finite additivity.
For purposes of modern mathematical analysis, we need to impose a somewhat stronger condition on our measure than finite additivity: (3.4) must hold not only for finite collections of pairwise disjoint measurable sets but also for countably infinite collections of pairwise disjoint measurable sets. This condition is called countable additivity.
In summary, if $\mu$ is the required generalization of length and $\mathcal{A}$ is the collection of subsets of $\mathcal{R}$ that have a length in this extended sense, then the following conditions should be satisfied:
(M1) $\mu(I)=\ell(I)$, for all intervals $I$.
(M2) $\mathcal{A}$ is a $\sigma$-algebra and $\mathcal{A}\supset\mathcal{B}$.
(M3) If $A_1,A_2,\dots$ are in $\mathcal{A}$, with $A_i\cap A_j=\emptyset$ for $i\ne j$, then
$$\mu\Big(\bigcup_n A_n\Big)=\sum_n\mu(A_n).$$
Conditions (M1)–(M3) provide us with the means for extending the notion of length to all open sets. First, since every open set is a Borel set, Condition (M2) implies that every open set should be measurable. Now, let $O$ be an open set. Then $O$ is a countable union of disjoint open intervals, say $O=\bigcup_n I_n$. Now applying, in turn, Conditions (M3) and (M1), we infer that
$$\mu(O)=\mu\Big(\bigcup_n I_n\Big)=\sum_n\mu(I_n)=\sum_n\ell(I_n).$$
So, we now see how to extend the notion of length to all open sets. For sets that are more complicated than open sets, however, it is not at all obvious what to do. In fact, defining a suitable measure for subsets of $\mathcal{R}$ constituted a major problem for mathematicians until the beginning of the twentieth century, when Henri Lebesgue found the key. His idea was as follows: For a subset $A\subset\mathcal{R}$, consider all open sets that contain $A$ as a subset. Then define the measure of $A$ to be the greatest lower bound of the measures of all those open sets:
$$\inf\{\mu(O): O\text{ open},\ O\supset A\}.\tag{3.5}$$
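When an open set is given as a finite union of open intervals that may overlap, the disjoint intervals $I_n$ in $O=\bigcup_n I_n$ are its connected components. These can be found by sorting and merging. The Python sketch below computes $\sum_n\ell(I_n)$, the value that (M1) and (M3) force for $\mu(O)$. As an assumption made for simplicity, it handles only finitely many bounded intervals, each given as a pair `(a, b)` with `a < b`. The function names are illustrative.

```python
def components(intervals):
    """Disjoint open intervals whose union equals the union of the given open intervals.

    Each pair (a, b) with a < b stands for the open interval (a, b).  Intervals are merged
    only when they genuinely overlap (next start < current end): (0, 1) and (1, 2) stay
    separate, because the point 1 lies in neither of them.
    """
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def measure_of_open_set(intervals):
    """Sum of the lengths of the components, i.e. the value that (M1) and (M3) force for mu(O)."""
    return sum(b - a for a, b in components(intervals))


O = [(0, 2), (1, 3), (5, 6), (5.5, 5.8), (6, 7)]
print(components(O))           # [(0, 3), (5, 6), (6, 7)]
print(measure_of_open_set(O))  # 5
```

Simply adding the lengths of the given, overlapping intervals would give $2+2+1+0.3+1=6.3$. That is a valid upper bound for the measure but not the measure itself. Definition 3.6 below works in exactly this spirit: it takes the infimum of such sums over all covers.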
With this definition, we "close down on $A$" or "come at $A$ from the outside," so we call this measure of $A$ its outer measure. Outer measure is defined for all subsets of $\mathcal{R}$. But, as we will see, it is countably additive (i.e., satisfies Condition (M3)) only when restricted to a proper subcollection of subsets of $\mathcal{R}$. Consequently, we will denote outer measure not by $\mu$, but instead by $\lambda^*$.
Below we give a formal definition of outer measure. The definition that we present does not use (3.5) but is equivalent to it.

DEFINITION 3.6 Lebesgue Outer Measure
For each subset $A\subset\mathcal{R}$, the Lebesgue outer measure of $A$, denoted by $\lambda^*(A)$, is defined by
$$\lambda^*(A)=\inf\Big\{\sum_n\ell(I_n):\{I_n\}_n\text{ open intervals},\ \bigcup_n I_n\supset A\Big\}.$$

Note: A sequence of open intervals, $\{I_n\}_n$, appearing in Definition 3.6 can be either a finite or an infinite sequence.

Basic Properties of Lebesgue Outer Measure
Some basic properties of Lebesgue outer measure are proved in the next two propositions.

PROPOSITION 3.1 Lebesgue outer measure, $\lambda^*$, has the following properties:
a) $\lambda^*(A)\ge 0$, for all $A\subset\mathcal{R}$. (nonnegativity)
b) $\lambda^*(\emptyset)=0$.
c) $A\subset B\Rightarrow\lambda^*(A)\le\lambda^*(B)$. (monotonicity)
d) $\lambda^*(x+A)=\lambda^*(A)$ for $x\in\mathcal{R}$, $A\subset\mathcal{R}$, where $x+A=\{x+y: y\in A\}$. (translation invariance)
e) If $\{A_n\}_n$ is a sequence of subsets of $\mathcal{R}$, then
$$\lambda^*\Big(\bigcup_n A_n\Big)\le\sum_n\lambda^*(A_n).\tag{3.6}$$
In particular, if $A,B\subset\mathcal{R}$, then $\lambda^*(A\cup B)\le\lambda^*(A)+\lambda^*(B)$.
The relation in (3.6) is called countable subadditivity.
PROOF: For each $E\subset\mathcal{R}$, let
$$S_E=\Big\{\sum_n\ell(I_n):\{I_n\}_n\text{ open intervals},\ \bigcup_n I_n\supset E\Big\}.$$
Then, by definition, $\lambda^*(E)=\inf\{x: x\in S_E\}$.
a) If $A\subset\mathcal{R}$, then $S_A\subset[0,\infty]$, so that $\lambda^*(A)=\inf\{x: x\in S_A\}\ge 0$.
b) For $\epsilon>0$, the interval $I_\epsilon=(-\epsilon/2,\epsilon/2)$ contains $\emptyset$; so, $\epsilon=\ell(I_\epsilon)\in S_\emptyset$. Hence, $\lambda^*(\emptyset)=\inf\{x: x\in S_\emptyset\}\le\epsilon$ for all $\epsilon>0$. This implies that $\lambda^*(\emptyset)=0$.
c) Let $u\in S_B$. Then there is a sequence $\{I_n\}$ of open intervals such that $B\subset\bigcup I_n$ and $u=\sum\ell(I_n)$. But $B\subset\bigcup I_n\Rightarrow A\subset\bigcup I_n\Rightarrow u\in S_A$. Thus, $S_B\subset S_A$ and, consequently,
$$\lambda^*(A)=\inf\{x: x\in S_A\}\le\inf\{x: x\in S_B\}=\lambda^*(B).$$
d) The proof of this part is left to the reader as an exercise.
e) If $\lambda^*(A_n)=\infty$ for some $n$, then, by part (c), $\lambda^*(\bigcup A_n)=\infty$; hence, (3.6) holds. So, assume that $\lambda^*(A_n)<\infty$ for all $n$. Let $\epsilon>0$ be given. For each $n$, choose a sequence $\{I_{nk}\}_k$ of open intervals such that $\bigcup_k I_{nk}\supset A_n$ and $\sum_k\ell(I_{nk})<\lambda^*(A_n)+\epsilon/2^n$. Now, the collection of intervals, $\{I_{nk}\}_{n,k}$, is countable because $\mathcal{N}\times\mathcal{N}$ is countable. Therefore, because $\bigcup_{n,k}I_{nk}=\bigcup_n\bigl(\bigcup_k I_{nk}\bigr)\supset\bigcup_n A_n$, it must be that
$$\lambda^*\Big(\bigcup_n A_n\Big)\le\sum_{n,k}\ell(I_{nk})=\sum_n\sum_k\ell(I_{nk})\le\sum_n\Big(\lambda^*(A_n)+\frac{\epsilon}{2^n}\Big)\le\sum_n\lambda^*(A_n)+\epsilon.$$
As $\epsilon>0$ was arbitrary, this proves that $\lambda^*\bigl(\bigcup_n A_n\bigr)\le\sum_n\lambda^*(A_n)$.

As we have noted, the domain of $\lambda^*$ is $\mathcal{P}(\mathcal{R})$; that is, every subset of $\mathcal{R}$ has an outer measure. Our question now is whether $\lambda^*$ is the desired extension of length. That is, do Conditions (M1)–(M3) hold with $\mu=\lambda^*$ and $\mathcal{A}=\mathcal{P}(\mathcal{R})$? Certainly, Condition (M2) holds, and the next proposition shows that Condition (M1) holds also.

PROPOSITION 3.2 The outer measure of an interval is its length. That is, $\lambda^*(I)=\ell(I)$ for every interval $I$.
PROOF: First assume $I=[a,b]$, that is, that $I$ is a bounded and closed interval. If $\epsilon>0$, then $(a-\epsilon/2,\,b+\epsilon/2)\supset[a,b]$ and so $\lambda^*([a,b])\le\ell\bigl((a-\epsilon/2,\,b+\epsilon/2)\bigr)=b-a+\epsilon$. Thus, for any $\epsilon>0$, $\lambda^*([a,b])\le b-a+\epsilon$ and, hence, $\lambda^*([a,b])\le b-a$. Consequently, it remains to establish that $\lambda^*([a,b])\ge b-a$.
Let $\{I_n\}$ be a sequence of open intervals such that $\bigcup I_n\supset[a,b]$. We claim that $\sum\ell(I_n)\ge b-a$. Since $\{I_n\}$ is an open cover for $[a,b]$, the Heine-Borel theorem implies that there is a finite subcover, say $\{I_k\}_{k=1}^{N}$. Now, clearly, $\sum_{k=1}^{N}\ell(I_k)\le\sum_n\ell(I_n)$. So, we need only show that $\sum_{k=1}^{N}\ell(I_k)\ge b-a$.
As $a\in[a,b]$, there must be an interval, say $J_1=(a_1,b_1)$, in the collection $\{I_k\}_{k=1}^{N}$ with $a_1<a<b_1$. If $b<b_1$, then
$$\sum_{k=1}^{N}\ell(I_k)\ge\ell(J_1)=b_1-a_1>b-a.$$
Otherwise, $b_1\in[a,b]$, so there must be an interval, say $J_2=(a_2,b_2)$, in the collection $\{I_k\}_{k=1}^{N}$ with $a_2<b_1<b_2$. Note that, necessarily, $J_2\ne J_1$. If $b<b_2$, then
$$\sum_{k=1}^{N}\ell(I_k)\ge\ell(J_1)+\ell(J_2)=(b_1-a_1)+(b_2-a_2)=(b_2-a_1)+(b_1-a_2)>b_2-a_1>b-a.$$
Otherwise, $b_2\in[a,b]$, so there must be an interval, say $J_3=(a_3,b_3)$, in the collection $\{I_k\}_{k=1}^{N}$ such that $a_3<b_2<b_3$ and, necessarily, $J_3\ne J_2$ and $J_3\ne J_1$. This process can continue at most $N$ times. Consequently, there is an $m\in\mathcal{N}$ with $m\le N$ such that $J_i=(a_i,b_i)\in\{I_k\}_{k=1}^{N}$ for $1\le i\le m$ and
$$a_1<a,\quad a_2<b_1<b_2,\quad\dots,\quad a_m<b_{m-1}<b_m,\quad b<b_m.$$
Therefore,
$$\sum_{k=1}^{N}\ell(I_k)\ge\sum_{i=1}^{m}\ell(J_i)=(b_1-a_1)+(b_2-a_2)+\cdots+(b_m-a_m)$$
$$=(b_m-a_1)+(b_1-a_2)+(b_2-a_3)+\cdots+(b_{m-1}-a_m)>b_m-a_1>b-a.$$
Thus, if $\{I_n\}$ is a sequence of open intervals with $\bigcup I_n \supset [a, b]$, then $\sum \ell(I_n) \ge b - a$. But then, by definition, $\lambda^*([a, b]) \ge b - a$. This last fact and the previously established reverse inequality show that $\lambda^*([a, b]) = b - a$.

Now, let $I$ be any finite (i.e., bounded) interval. Then for each $\epsilon > 0$, there is a closed interval $J$ with $J \subset I$ and $\ell(I) < \ell(J) + \epsilon$ (why?). Thus, $\ell(I) - \epsilon < \ell(J) = \lambda^*(J) \le \lambda^*(I)$. Since $\epsilon > 0$ was arbitrary, it follows that $\lambda^*(I) \ge \ell(I)$. But, on the other hand, $\lambda^*(I) \le \lambda^*(\overline{I}) = \ell(\overline{I}) = \ell(I)$, so that $\lambda^*(I) \le \ell(I)$.

Finally, if $I$ is an infinite (i.e., unbounded) interval, then for each real number $M$, there is a closed interval $J$ of length $M$ with $J \subset I$. It follows that $\lambda^*(I) \ge \lambda^*(J) = \ell(J) = M$. Hence, $\lambda^*(I) = \infty = \ell(I)$.

We have seen that Conditions (M1) and (M2) are satisfied with $\mu = \lambda^*$ and $\mathcal{A} = \mathcal{P}(\mathcal{R})$. So our question now is: Does Condition (M3) hold with $\mu = \lambda^*$ and $\mathcal{A} = \mathcal{P}(\mathcal{R})$? If the answer to this question were yes, then $\lambda^*$ would be the desired extension of length and every subset of $\mathcal{R}$ would be measurable. The answer, however, is no! In fact, as we will discover in the next section, $\lambda^*$ is not even finitely additive.

EXERCISES 3.2
3.14 Let $I$ be any finite interval. Show that for each $\epsilon > 0$, there is a closed interval $J$ with $J \subset I$ and $\ell(I) < \ell(J) + \epsilon$.
3.15 Prove part (d) of Proposition 3.1. That is, show that $\lambda^*(x + A) = \lambda^*(A)$ for $x \in \mathcal{R}$, $A \subset \mathcal{R}$, where $x + A = \{ x + y : y \in A \}$.
3.16 Let $A$ be a set with $\lambda^*(A) < \infty$. Show that the function $g$ defined by $g(x) = \lambda^*(A \cap (-\infty, x])$ is uniformly continuous on $\mathcal{R}$.
3.17 Show that the Cantor set, $P$, has Lebesgue outer measure zero.
3.18 Let $E \subset \mathcal{R}$. Show that there is a sequence of open sets, $\{O_n\}_{n=1}^\infty$, such that $O_1 \supset O_2 \supset \cdots \supset E$ and $\lambda^*(E) = \lambda^*\big( \bigcap_{n=1}^\infty O_n \big) = \lim_{n\to\infty} \lambda^*(O_n)$.
3.19 For $A \subset \mathcal{R}$ and $b \in \mathcal{R}$, define $bA = \{ ba : a \in A \}$. Show that $\lambda^*(bA) = |b| \lambda^*(A)$.
3.20 Suppose that $f\colon \mathcal{R} \to \mathcal{R}$ is differentiable at each point of $\mathcal{R}$.
a) If $|f'(x)| \le 1$ for each $x \in \mathcal{R}$, prove that for each $A \subset \mathcal{R}$, $\lambda^*(f(A)) \le \lambda^*(A)$.
Hint: Use the mean-value theorem.
b) Provide an example to show that the previous inequality may fail to hold if $|f'(x)| > 1$ for some $x \in \mathcal{R}$.
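The chaining step in the proof of Proposition 3.2 is mechanical enough to run. The following is a minimal Python sketch, assuming the finite subcover is given as a list of open intervals `(lo, hi)` with exact `Fraction` endpoints; the names `chain` and `length` are ours, not the book's. It selects an interval $J_1$ containing $a$, then an interval containing the right endpoint of the previous one, and stops once a right endpoint exceeds $b$.

```python
from fractions import Fraction as F


def chain(cover, a, b):
    """Chaining step from the proof of Proposition 3.2.

    cover: finite list of open intervals (lo, hi), assumed to cover [a, b].
    Returns the chosen intervals J_1, ..., J_m in order.
    """
    point, chosen = a, []
    while True:
        # pick any interval of the cover that contains the current point
        J = next(((lo, hi) for lo, hi in cover if lo < point < hi), None)
        if J is None:
            raise ValueError(f"{point} is not covered")
        chosen.append(J)
        if b < J[1]:      # b < b_m: the chain reaches past b
            return chosen
        point = J[1]      # b_i lies in [a, b], so another interval contains it


def length(intervals):
    return sum(hi - lo for lo, hi in intervals)


cover = [(F(-1, 10), F(3, 10)), (F(1, 5), F(3, 5)),
         (F(1, 2), F(21, 20)), (F(1, 4), F(2, 5))]
J = chain(cover, F(0), F(1))
# sum over the chain >= b_m - a_1 > b - a, and the chain is part of the cover
assert length(J) >= J[-1][1] - J[0][0] > 1
assert length(cover) >= length(J)
```

Since the chosen intervals have strictly increasing right endpoints, no interval is picked twice, which is why the loop runs at most $N$ times, as in the proof.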
3.3 FURTHER PROPERTIES OF LEBESGUE OUTER MEASURE

Recall that we are trying to extend the notion of length so that it applies to all Borel sets. Specifically, we are searching for a function $\mu$ defined on some collection, $\mathcal{A}$, of subsets of $\mathcal{R}$ such that
(M1) $\mu(I) = \ell(I)$, for all intervals $I$.
(M2) $\mathcal{A}$ is a $\sigma$-algebra and $\mathcal{A} \supset \mathcal{B}$.
(M3) If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then $\mu\big( \bigcup_n A_n \big) = \sum_n \mu(A_n)$.

In Section 3.2, we discovered that Conditions (M1) and (M2) are satisfied with $\mu = \lambda^*$ (Lebesgue outer measure) and $\mathcal{A} = \mathcal{P}(\mathcal{R})$. We will prove in this section that Condition (M3) does not hold with $\mu = \lambda^*$ and $\mathcal{A} = \mathcal{P}(\mathcal{R})$. In fact, we will show that even finite additivity does not obtain. That is, it is possible to find disjoint subsets, $A$ and $B$, of $\mathcal{R}$ such that the equation
$$\lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B) \qquad (3.7)$$
fails to hold. The idea is to choose $A$ and $B$ so that they are disjoint but "sufficiently intermingled."

Finite Additivity Properties of $\lambda^*$
It is best to begin by determining conditions on disjoint sets, $A$ and $B$, that imply that (3.7) holds. Our first result is that if $A$ and $B$ are not only disjoint but are a positive distance apart, then (3.7) is true. Before proving that fact, we need some preliminary definitions and lemmas.

DEFINITION 3.7 Distance Between a Point and a Set or Two Sets
If $x \in \mathcal{R}$ and $E \subset \mathcal{R}$, then the distance from $x$ to $E$, denoted by $d(x, E)$, is defined to be
$$d(x, E) = \inf\{ |y - x| : y \in E \}.$$
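If $E$ and $F$ are subsets of $\mathcal{R}$, then the distance from $E$ to $F$, denoted by $d(E, F)$, is defined to be
$$d(E, F) = \inf\{ |y - x| : y \in E,\ x \in F \}.$$

It is left as an exercise for the reader to show that (1) for fixed $E \subset \mathcal{R}$, the function $d(x, E)$ is continuous, (2) $d(E, F) = \inf\{ d(y, F) : y \in E \}$, and (3) if $A \subset E$ and $B \subset F$, then $d(E, F) \le d(A, B)$.

LEMMA 3.8
Suppose that $I$ is a finite open interval and let $\delta > 0$ be given. Then there are a finite number of open intervals, say $J_1, \ldots, J_n$, such that $\ell(J_k) < \delta$ for $1 \le k \le n$, $I \subset \bigcup_{k=1}^n J_k$, and $\sum_{k=1}^n \ell(J_k) < \ell(I) + \delta$.

PROOF: The proof is left as an exercise.

LEMMA 3.9
Suppose that $A$ is a subset of $\mathcal{R}$ with $\lambda^*(A) < \infty$. Then for each $\epsilon, \delta > 0$, there is a sequence $\{I_n\}$ of open intervals such that $\ell(I_n) < \delta$ for all $n$, $\bigcup I_n \supset A$, and $\sum \ell(I_n) < \lambda^*(A) + \epsilon$.

PROOF: Given $\epsilon > 0$, there is a sequence $\{J_n\}$ of open intervals such that $\bigcup J_n \supset A$ and $\sum \ell(J_n) < \lambda^*(A) + \epsilon/2$; in particular, each $J_n$ is a finite interval. By Lemma 3.8, applied with $\min\{\delta, \epsilon/2^{n+1}\}$ in place of $\delta$, for each $J_n$ there are a finite number of open intervals, say $J_{n1}, J_{n2}, \ldots, J_{nk_n}$, with $\ell(J_{nj}) < \delta$ for $1 \le j \le k_n$, $\bigcup_{j=1}^{k_n} J_{nj} \supset J_n$, and $\sum_{j=1}^{k_n} \ell(J_{nj}) < \ell(J_n) + \epsilon/2^{n+1}$. Now, the collection
$$\bigcup_n \{ J_{nj} \}_{j=1}^{k_n} = \{ J_{11}, J_{12}, \ldots, J_{1k_1}, J_{21}, J_{22}, \ldots, J_{2k_2}, \ldots \}$$
is countable, being a countable union of finite collections. We have that $\ell(J_{nj}) < \delta$ for each $n$ and $j$, and
$$\sum_{n,j} \ell(J_{nj}) = \sum_n \sum_{j=1}^{k_n} \ell(J_{nj}) \le \sum_n \Big( \ell(J_n) + \frac{\epsilon}{2^{n+1}} \Big) \le \sum_n \ell(J_n) + \frac{\epsilon}{2} < \lambda^*(A) + \epsilon.$$
Also, $\bigcup_{n,j} J_{nj} = \bigcup_n \big( \bigcup_{j=1}^{k_n} J_{nj} \big) \supset \bigcup_n J_n \supset A$. If we reindex the collection $\{ J_{11}, J_{12}, \ldots, J_{1k_1}, J_{21}, J_{22}, \ldots, J_{2k_2}, \ldots \}$ using a single subscript and obtain $\{I_n\}_n$, then this sequence satisfies the conclusions of the lemma.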
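Lemma 3.8 is left as an exercise (Exercise 3.23). One possible construction is sketched below in Python, assuming $I = (c, d)$ is bounded; the helper names `fine_cover` and `covers` are hypothetical. It cuts $[c, d]$ into $N$ equal cells of length less than $\delta/2$ and widens each cell by $\eta = \delta/(8N)$ on both sides, so each widened cell has length less than $\delta/2 + \delta/4 < \delta$ and the total length is $(d - c) + \delta/4$.

```python
import math
from fractions import Fraction as F


def fine_cover(c, d, delta):
    """One construction for Lemma 3.8 with I = (c, d) bounded.

    Returns open intervals, each of length < delta, whose union contains [c, d],
    with total length (d - c) + delta/4 < (d - c) + delta.
    """
    c, d, delta = F(c), F(d), F(delta)
    N = math.floor((d - c) / (delta / 2)) + 1   # ensures (d - c)/N < delta/2
    step = (d - c) / N
    eta = delta / (8 * N)                        # widening on each side
    return [(c + i * step - eta, c + (i + 1) * step + eta) for i in range(N)]


def covers(intervals, c, d):
    """Sufficient check that the union of the open intervals contains [c, d]:
    consecutive intervals (sorted by left endpoint) overlap, and the chain
    extends past both c and d."""
    ivs = sorted(intervals)
    if not (ivs[0][0] < c and ivs[-1][1] > d):
        return False
    return all(ivs[i][1] > ivs[i + 1][0] for i in range(len(ivs) - 1))


J = fine_cover(0, 1, F(1, 10))
assert all(hi - lo < F(1, 10) for lo, hi in J)
assert sum(hi - lo for lo, hi in J) < 1 + F(1, 10)
assert covers(J, 0, 1)
```

Lemma 3.9 then applies a construction of this kind to each $J_n$, with $\min\{\delta, \epsilon/2^{n+1}\}$ playing the role of $\delta$.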
THEOREM 3.8
Suppose that $A$ and $B$ are subsets of $\mathcal{R}$ that are a positive distance apart; that is, $d(A, B) > 0$. Then $\lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B)$.

PROOF: Let $\delta = d(A, B)$. If $\lambda^*(A \cup B) = \infty$, then it follows from Proposition 3.1(e) that the conclusion of the theorem holds. So, assume that $\lambda^*(A \cup B) < \infty$. Let $\epsilon > 0$ be given. By Lemma 3.9, there is a sequence $\{I_n\}$ of open intervals such that $\ell(I_n) < \delta$ for all $n$, $\bigcup I_n \supset A \cup B$, and $\sum \ell(I_n) < \lambda^*(A \cup B) + \epsilon$. Now, let $\{J_n\}$ denote the members of $\{I_n\}$ that contain a point of $A$ and let $\{K_n\}$ denote the ones that do not contain a point of $A$. Since $A \subset A \cup B \subset \bigcup I_n$, it follows that $A \subset \bigcup J_n$. Also, because $d(A, B) = \delta$ and $\ell(I_n) < \delta$ for all $n$, there can be no points of $B$ in any $J_n$. Therefore, because $B \subset A \cup B \subset \bigcup I_n$, it must be that $B \subset \bigcup K_n$. Using the definition of outer measure, we conclude that
$$\lambda^*(A) + \lambda^*(B) \le \sum \ell(J_n) + \sum \ell(K_n) = \sum \ell(I_n) < \lambda^*(A \cup B) + \epsilon.$$
Because $\epsilon > 0$ was arbitrary, $\lambda^*(A) + \lambda^*(B) \le \lambda^*(A \cup B)$. The reverse inequality is true by Proposition 3.1(e).

We can, in fact, improve on Theorem 3.8. Indeed, suppose that $A$ and $B$ are two subsets of $\mathcal{R}$ with the property that there is an open set, $O$, with $A \subset O$ and $B \subset O^c$. Then the conclusion of Theorem 3.8 obtains. Roughly speaking, the reason is as follows. Since $O$ is open, it can be written as a countable union of disjoint open intervals. Because the points of $A$ must lie within these open intervals and the points of $B$ must lie outside of them, the sets $A$ and $B$ cannot be too intermingled. Before we can provide a rigorous proof of the improvement of Theorem 3.8, we need two more lemmas.

LEMMA 3.10
Let $O$ be a proper open subset of $\mathcal{R}$ (i.e., $O$ is open, nonempty, and not equal to $\mathcal{R}$). For each $n \in \mathcal{N}$, let
$$O_n = \Big\{ x : d(x, O^c) > \frac{1}{n} \Big\}.$$
Then,
a) $O_n$ is open and $O_n \subset O$ for all $n \in \mathcal{N}$.
b) $O_1 \subset O_2 \subset \cdots$ and $\bigcup_n O_n = O$.
c) If $O_n \ne \emptyset$, then $d(O_n, O^c) = 1/n$.
d) If $O_n \ne \emptyset$, then $d(O_n, O_{n+1}^c) = 1/\big(n(n+1)\big)$.

PROOF: The proofs of parts (a) and (b) are left to the reader.

c) Since $d(O_n, O^c) = \inf\{ d(x, O^c) : x \in O_n \}$, we see that $d(O_n, O^c) \ge 1/n$. To prove the reverse inequality, we first note that because $O$ is open, it can be expressed as a countable union of disjoint open intervals, say the intervals $\{I_j\}_j$. Now, by assumption, $O_n \ne \emptyset$. This means that there is a $y \in O$ such that $d(y, O^c) > 1/n$. Since $y \in O$, there is a $k$ such that $y \in I_k$. Clearly, the distance from $y$ to $O^c$ is the same as the distance from $y$ to the nearest endpoint of $I_k$. Therefore, if we write $I_k = (a_k, b_k)$, then $y \in (a_k + 1/n, b_k - 1/n)$. It follows that $\emptyset \ne (a_k + 1/n, b_k - 1/n) \subset O_n$. Note that at least one of the two numbers, $a_k$ and $b_k$, must be finite. We will assume that $a_k$ is finite. (If $a_k$ is infinite, then $b_k$ is finite and a similar argument holds.) Since $(a_k + 1/n, b_k - 1/n) \subset O_n$ and $a_k \in O^c$, we conclude by applying Exercise 3.21(c) that
$$d(O_n, O^c) \le d\Big( \Big(a_k + \frac{1}{n}, b_k - \frac{1}{n}\Big), \{a_k\} \Big) = \frac{1}{n}.$$
This completes the proof of part (c).

d) We first show that $d(O_n, O_{n+1}^c) \ge 1/\big(n(n+1)\big)$. Suppose $y \in O_n$ and $z \in O_{n+1}^c$. By definition, $d(y, O^c) > 1/n$ and $d(z, O^c) \le 1/(n+1)$. Let $\epsilon > 0$ be given. Then there is a $w \in O^c$ with $|w - z| < 1/(n+1) + \epsilon$. Also, $w \in O^c$ implies that $|w - y| > 1/n$. Thus,
$$|z - y| \ge |w - y| - |w - z| > \frac{1}{n} - \frac{1}{n+1} - \epsilon = \frac{1}{n(n+1)} - \epsilon.$$
As $z$ and $y$ do not depend on $\epsilon$, we conclude that $|z - y| \ge 1/\big(n(n+1)\big)$. Consequently, because $y \in O_n$ and $z \in O_{n+1}^c$ were arbitrary, it follows that $d(O_n, O_{n+1}^c) \ge 1/\big(n(n+1)\big)$.

To prove the reverse inequality, let $I_k = (a_k, b_k)$ be as in the proof of part (c) and assume as before that $a_k$ is finite.
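Because $a_k \in O^c$, we have $d\big(a_k + 1/(n+1), O^c\big) \le 1/(n+1)$, so $a_k + 1/(n+1) \in O_{n+1}^c$. Therefore, by Exercise 3.21(c),
$$d(O_n, O_{n+1}^c) \le d\Big( \Big(a_k + \frac{1}{n}, b_k - \frac{1}{n}\Big), \Big\{ a_k + \frac{1}{n+1} \Big\} \Big) = \frac{1}{n} - \frac{1}{n+1} = \frac{1}{n(n+1)}.$$
This completes the proof of part (d).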
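Lemma 3.10(d) can be checked with exact arithmetic when $O$ is a finite disjoint union of bounded open intervals. The Python sketch below assumes that setting (the general lemma allows countably many, possibly unbounded, components), and the names `shrink` and `gap` are hypothetical. Inside a component $(a, b)$ we have $d(x, O^c) = \min(x - a, b - x)$, so $O_n$ consists of the intervals $(a + 1/n, b - 1/n)$ with $b - a > 2/n$.

```python
from fractions import Fraction as F


def shrink(components, n):
    """Components of O_n = {x : d(x, O^c) > 1/n}, where O is a finite union of
    pairwise disjoint bounded open intervals (a, b)."""
    h = F(1, n)
    # inside a component (a, b), d(x, O^c) = min(x - a, b - x)
    return [(a + h, b - h) for a, b in components if b - a > 2 * h]


def gap(components, n):
    """d(O_n, O_{n+1}^c), or None if O_n is empty.

    Each component (c, d) of O_n lies inside one component (e, f) of O_{n+1};
    the endpoints e and f belong to O_{n+1}^c and are its nearest points.
    """
    inner, outer = shrink(components, n), shrink(components, n + 1)
    dists = [min(c - e, f - d)
             for c, d in inner for e, f in outer if e <= c and d <= f]
    return min(dists) if dists else None


O = [(F(0), F(1)), (F(2), F(5))]
for n in range(1, 6):
    g = gap(O, n)
    if g is not None:
        assert g == F(1, n * (n + 1))   # Lemma 3.10(d)
```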
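LEMMA 3.11
Suppose that $A \subset \mathcal{R}$ and $\lambda^*(A) < \infty$. Assume there is a proper open subset $O$ of $\mathcal{R}$ with $A \subset O$. Let $O_n = \{ x : d(x, O^c) > 1/n \}$. Then
$$\lambda^*(A) = \lim_{n\to\infty} \lambda^*(A \cap O_n).$$

PROOF: Let $A_n = A \cap O_n$. Then, by Lemma 3.10(b), $A_1 \subset A_2 \subset \cdots$ and, consequently, $\lambda^*(A_1) \le \lambda^*(A_2) \le \cdots$. Also, since $A_n \subset A$ for all $n$, $\lambda^*(A_n) \le \lambda^*(A)$ for all $n$. By assumption, $\lambda^*(A) < \infty$. Thus, $\{\lambda^*(A_n)\}_n$ is a monotone nondecreasing, bounded sequence of real numbers and, hence, converges to a real number, say $\alpha$. Clearly, $\alpha \le \lambda^*(A)$.

Now, let $B_n = A \setminus A_n$ and $C_n = A_{n+1} \setminus A_n$. Then $A = A_n \cup B_n$ and, since $\bigcup_n O_n = O \supset A$, $B_n = C_n \cup C_{n+1} \cup \cdots$. Thus,
$$\lambda^*(A) \le \lambda^*(A_n) + \lambda^*(B_n) \qquad (3.8)$$
and
$$\lambda^*(B_n) \le \lambda^*(C_n) + \lambda^*(C_{n+1}) + \cdots. \qquad (3.9)$$
Now, for $n \ge 2$, $A_{n+1} = A_n \cup C_n \supset A_{n-1} \cup C_n$, so that
$$\lambda^*(A_{n-1} \cup C_n) \le \lambda^*(A_{n+1}). \qquad (3.10)$$
Also, $A_{n-1} \subset O_{n-1}$ and $C_n \subset O_n^c$. So, if $A_{n-1} \ne \emptyset$, Lemma 3.10(d) gives
$$d(A_{n-1}, C_n) \ge d(O_{n-1}, O_n^c) = \frac{1}{n(n-1)} > 0.$$
Therefore, Theorem 3.8 implies that $\lambda^*(A_{n-1} \cup C_n) = \lambda^*(A_{n-1}) + \lambda^*(C_n)$ (an equation that is trivial when $A_{n-1} = \emptyset$). Using (3.10) and this last equation, we conclude that, for $n \ge 2$,
$$\lambda^*(C_n) \le \lambda^*(A_{n+1}) - \lambda^*(A_{n-1}).$$
Then (3.9) implies
$$\lambda^*(B_n) \le \sum_{k=1}^\infty \big[ \lambda^*(A_{n+k}) - \lambda^*(A_{n+k-2}) \big],$$
so that by (3.8)
$$\lambda^*(A) \le \lambda^*(A_n) + \lambda^*(B_n) \le \lambda^*(A_n) + \sum_{k=1}^\infty \big[ \lambda^*(A_{n+k}) - \lambda^*(A_{n+k-2}) \big].$$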
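But,
$$\lambda^*(A_n) + \sum_{k=1}^\infty \big[ \lambda^*(A_{n+k}) - \lambda^*(A_{n+k-2}) \big] = \lim_{m\to\infty} \Big\{ \lambda^*(A_n) + \sum_{k=1}^m \big[ \lambda^*(A_{n+k}) - \lambda^*(A_{n+k-2}) \big] \Big\}$$
$$= \lim_{m\to\infty} \big\{ -\lambda^*(A_{n-1}) + \lambda^*(A_{n+m-1}) + \lambda^*(A_{n+m}) \big\} = -\lambda^*(A_{n-1}) + 2\alpha.$$
Hence, we have shown that, for all $n \ge 2$, $\lambda^*(A) \le -\lambda^*(A_{n-1}) + 2\alpha$. Letting $n \to \infty$ yields $\lambda^*(A) \le -\alpha + 2\alpha = \alpha$. We have already noted that $\alpha \le \lambda^*(A)$. Thus, $\lambda^*(A) = \alpha = \lim_{n\to\infty} \lambda^*(A_n)$.

THEOREM 3.9
Suppose that $A$ and $B$ are subsets of $\mathcal{R}$ with the property that there is an open set $O$ with $A \subset O$ and $B \subset O^c$. Then $\lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B)$.

PROOF: If either $\lambda^*(A)$ or $\lambda^*(B)$ is infinite, then the result is trivial. So, assume that both are finite. If $O = \emptyset$, then $A = \emptyset$; and if $O = \mathcal{R}$, then $B = \emptyset$. In either of those cases, the result is also trivial. Hence, we can assume that $O$ is a proper open subset of $\mathcal{R}$. As before, let $O_n = \{ x : d(x, O^c) > 1/n \}$ and $A_n = A \cap O_n$. Because $A_n \subset O_n$ and $B \subset O^c$, Lemma 3.10(c) implies (when $A_n \ne \emptyset$) that $d(A_n, B) \ge d(O_n, O^c) = 1/n$ and, thus, by Theorem 3.8, $\lambda^*(A_n \cup B) = \lambda^*(A_n) + \lambda^*(B)$; this equation holds trivially when $A_n = \emptyset$. Since $A_n \cup B \subset A \cup B$, $\lambda^*(A_n \cup B) \le \lambda^*(A \cup B)$. Thus, for each $n \in \mathcal{N}$,
$$\lambda^*(A \cup B) \ge \lambda^*(A_n \cup B) = \lambda^*(A_n) + \lambda^*(B).$$
Letting $n \to \infty$ and applying Lemma 3.11, we get that $\lambda^*(A \cup B) \ge \lambda^*(A) + \lambda^*(B)$. Proposition 3.1(e) shows that the reverse inequality holds.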
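The telescoping step at the end of the proof of Lemma 3.11 can be checked mechanically. In the minimal Python sketch below, a hypothetical nondecreasing sequence $a_j = 1 - 2^{-j}$ (with limit $\alpha = 1$) stands in for $\lambda^*(A_j)$; the functions `lhs` and `rhs` compare the partial sums with their telescoped form $-a_{n-1} + a_{n+m-1} + a_{n+m}$.

```python
from fractions import Fraction as F


def lhs(a, n, m):
    """a(n) + sum_{k=1}^{m} [a(n+k) - a(n+k-2)], with a(j) playing lambda*(A_j)."""
    return a(n) + sum(a(n + k) - a(n + k - 2) for k in range(1, m + 1))


def rhs(a, n, m):
    """Telescoped form used in the proof of Lemma 3.11."""
    return -a(n - 1) + a(n + m - 1) + a(n + m)


def a(j):
    # hypothetical bounded nondecreasing sequence with limit alpha = 1
    return 1 - F(1, 2 ** j)


for n in range(2, 6):
    for m in range(1, 20):
        assert lhs(a, n, m) == rhs(a, n, m)

# letting m -> infinity gives -a(n-1) + 2*alpha
assert abs(rhs(a, 5, 60) - (-a(4) + 2)) < F(1, 10 ** 12)
```

Only the two last terms $a_{n+m-1}$ and $a_{n+m}$ survive the cancellation, which is where the factor $2\alpha$ in the proof comes from.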
Lebesgue Outer Measure Is Not Finitely Additive
We have now seen that, under certain conditions,
$$\lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B) \qquad (3.11)$$
for disjoint subsets, $A$ and $B$, of $\mathcal{R}$. Our next theorem, which we will state and prove shortly, shows that (3.11) does not hold for every pair of disjoint subsets, $A$ and $B$, of $\mathcal{R}$, that is, that $\lambda^*$ is not finitely additive. In view of Theorem 3.9, it is clear that if (3.11) fails to hold for disjoint subsets $A$ and $B$, then those sets must be considerably intermingled. To obtain this intermingling, we proceed as follows.

LEMMA 3.12
For $x, y \in \mathcal{R}$, define $x \sim y$ if and only if $y - x \in \mathcal{Q}$. Then $\sim$ is an equivalence relation and, hence, partitions $\mathcal{R}$ into disjoint equivalence classes. Furthermore, there is a set $S \subset [0, 1)$ that contains exactly one element from each equivalence class.

PROOF: That $\sim$ is an equivalence relation is left as an exercise for the reader. By the axiom of choice (see page 16), we can select a set $T \subset \mathcal{R}$ that contains exactly one element from each equivalence class. Let us set
$$S = \{ x - [x] : x \in T \},$$
where $[x]$ denotes the greatest integer in $x$. Because, for each $x$, $x - [x] \in [0, 1)$ and $x - [x] \sim x$, the proof is complete.

LEMMA 3.13
Let $S$ be the set defined in Lemma 3.12 and $W = (-1, 1) \cap \mathcal{Q}$. Then
a) $\{S + r\}_{r \in \mathcal{Q}}$ forms a collection of pairwise disjoint sets.
b) $(0, 1) \subset \bigcup_{r \in W} (S + r) \subset (-1, 2)$.

PROOF: a) Suppose $q, r \in \mathcal{Q}$ and $(S + q) \cap (S + r) \ne \emptyset$. Let $y \in (S + q) \cap (S + r)$. Then there exist $u, v \in S$ such that $y = u + q = v + r$. Hence, $u \sim v$. Since $S$ contains only one element from each equivalence class, we must have $u = v$, which, in turn, implies $q = r$.
b) Let $x \in (0, 1)$. Then there is a $u \in S$ such that $x \sim u$. Put $r = x - u$. Then $r \in \mathcal{Q}$ and $x \in S + r$. Moreover, since $x \in (0, 1)$ and $S \subset [0, 1)$, we have $-1 < r < 1$. Thus, $(0, 1) \subset \bigcup_{r \in W} (S + r)$. That $\bigcup_{r \in W} (S + r) \subset (-1, 2)$ follows immediately from the fact that $S \subset [0, 1)$.

Note: Lemma 3.13(a) shows that the sets $\{S + r\}_{r \in \mathcal{Q}}$ are pairwise disjoint.
They are also considerably intermingled as is shown in Exercise 3.27.
THEOREM 3.10
Lebesgue outer measure, $\lambda^*$, is not finitely additive.

PROOF: Suppose to the contrary that $\lambda^*$ is finitely additive. Let $\{q_n\}_{n=1}^\infty$ be an enumeration of the rationals in $(-1, 1)$ and set $E_n = S + q_n$. Using the assumed finite additivity of $\lambda^*$, Proposition 3.2, Lemma 3.13, and Proposition 3.1(c) and (e), we conclude that
$$1 = \lambda^*((0,1)) \le \lambda^*\Big( \bigcup_{n=1}^\infty E_n \Big) \le \sum_{n=1}^\infty \lambda^*(E_n) = \lim_{n\to\infty} \sum_{k=1}^n \lambda^*(E_k) = \lim_{n\to\infty} \lambda^*\Big( \bigcup_{k=1}^n E_k \Big) \le \lambda^*((-1,2)) = 3. \qquad (3.12)$$
This shows $1 \le \lim_{n\to\infty} \sum_{k=1}^n \lambda^*(E_k) \le 3$. But, by Proposition 3.1(d),
$$\sum_{k=1}^n \lambda^*(E_k) = \sum_{k=1}^n \lambda^*(S + q_k) = \sum_{k=1}^n \lambda^*(S) = n \lambda^*(S).$$
Consequently, $1 \le \lim_{n\to\infty} n \lambda^*(S) \le 3$, which is impossible (why?). Hence, $\lambda^*$ is not finitely additive.

COROLLARY 3.2
Lebesgue outer measure, $\lambda^*$, is not countably additive. That is, Condition (M3) does not hold with $\mu = \lambda^*$ and $\mathcal{A} = \mathcal{P}(\mathcal{R})$.

EXERCISES 3.3
3.21 Prove the following facts:
a) For fixed $E \subset \mathcal{R}$, the function $d(x, E)$ is continuous.
b) If $E$ and $F$ are subsets of $\mathcal{R}$, then $d(E, F) = \inf\{ d(y, F) : y \in E \}$.
c) If $A \subset E$ and $B \subset F$, then $d(E, F) \le d(A, B)$.
d) $d(\overline{E}, \overline{F}) = d(E, F)$.
3.22 Prove the following facts:
a) Suppose that $F$ is a closed subset of $\mathcal{R}$, $K$ is a compact subset of $\mathcal{R}$, and $F \cap K = \emptyset$. Then $d(F, K) > 0$.
b) Show that part (a) is not true if it is assumed only that $K$ is closed.
3.23 Prove Lemma 3.8 on page 111.
3.24 Verify parts (a) and (b) of Lemma 3.10 on page 113.
3.25 Suppose that $O$ is open. Prove that $\lambda^*(W) = \lambda^*(W \cap O) + \lambda^*(W \cap O^c)$ for all subsets $W$ of $\mathcal{R}$.
3.26 Define $x \sim y$ if and only if $x - y \in \mathcal{Q}$. Show that $\sim$ is an equivalence relation.
3.27 Let $N$ be a positive integer, $\{r_n\}_{n=1}^\infty$ an enumeration of $\mathcal{Q}$, and $S$ as in Lemma 3.12. For each $n \in \mathcal{N}$, define $S_n = S + r_n$. Prove that there is no open set $O$ with the property that $\bigcup_{n=1}^N S_n \subset O$ and $\bigcup_{n=N+1}^\infty S_n \subset O^c$.
3.28 Suppose that $0 < a < b < 1$. Prove that it is possible to select the elements of the set $S$ in Lemma 3.12 so that $S \subset (a, b)$.
3.29 Provide a detailed justification for each step in (3.12) on page 117.

3.4 LEBESGUE MEASURE

For ease in reference, we repeat once more that we are searching for a function $\mu$ defined on some collection, $\mathcal{A}$, of subsets of $\mathcal{R}$ such that
(M1) $\mu(I) = \ell(I)$, for all intervals $I$.
(M2) $\mathcal{A}$ is a $\sigma$-algebra and $\mathcal{A} \supset \mathcal{B}$.
(M3) If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then $\mu\big( \bigcup_n A_n \big) = \sum_n \mu(A_n)$.

We have seen that Conditions (M1) and (M2) hold with $\mu = \lambda^*$ and $\mathcal{A} = \mathcal{P}(\mathcal{R})$, but that Condition (M3) does not (Corollary 3.2 on page 117). Note, however, that we do not need to have our measure, $\mu$, defined for all subsets of $\mathcal{R}$; Condition (M2) requires only that it be defined on a $\sigma$-algebra, $\mathcal{A}$, of subsets of $\mathcal{R}$ that contains the Borel sets. Thus, one way to get Condition (M3) to hold might be to restrict $\lambda^*$ to some proper subcollection of subsets of $\mathcal{R}$; that is, select $\mathcal{A}$ to be a proper subset of $\mathcal{P}(\mathcal{R})$. To do that, we need to identify a criterion for deciding whether a subset of $\mathcal{R}$ is measurable, that is, is a member of $\mathcal{A}$. By Condition (M2), we must have $\mathcal{B} \subset \mathcal{A}$; so, in particular, $\mathcal{A}$ must contain all open sets. Hence, the criterion we select must be satisfied by all open sets.
The Carathéodory Criterion
Theorem 3.9 states that if $A$ and $B$ are subsets of $\mathcal{R}$ with the property that there is an open set, $O$, with $A \subset O$ and $B \subset O^c$, then $\lambda^*(A \cup B) = \lambda^*(A) + \lambda^*(B)$. As a consequence of Theorem 3.9, we obtain the following proposition.

PROPOSITION 3.3
Let $O$ be an open set. Then
$$\lambda^*(W) = \lambda^*(W \cap O) + \lambda^*(W \cap O^c) \qquad (3.13)$$
for every subset $W$ of $\mathcal{R}$.

PROOF: For every subset $W$ of $\mathcal{R}$, we have $W = (W \cap O) \cup (W \cap O^c)$. Since $W \cap O \subset O$ and $W \cap O^c \subset O^c$, we see that (3.13) is a simple consequence of Theorem 3.9.

Equation (3.13) provides an additivity relation for Lebesgue outer measure that is satisfied by all open sets. That relation shows the way to the required criterion for deciding whether a subset of $\mathcal{R}$ is measurable.

DEFINITION 3.8 Carathéodory Criterion
A set $E \subset \mathcal{R}$ is said to satisfy the Carathéodory criterion if
$$\lambda^*(W) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c) \qquad (3.14)$$
for all subsets $W$ of $\mathcal{R}$. We denote by $\mathcal{M}$ the collection of all subsets of $\mathcal{R}$ that satisfy the Carathéodory criterion.

Note: By Proposition 3.1(e), the inequality $\lambda^*(W) \le \lambda^*(W \cap E) + \lambda^*(W \cap E^c)$ always holds. Consequently, to prove that a subset $E$ of $\mathcal{R}$ is a member of $\mathcal{M}$, it suffices to establish the inequality
$$\lambda^*(W) \ge \lambda^*(W \cap E) + \lambda^*(W \cap E^c) \qquad (3.15)$$
for all subsets $W$ of $\mathcal{R}$.

The next theorem demonstrates that Condition (M2) holds for the collection, $\mathcal{M}$, of subsets of $\mathcal{R}$ that satisfy the Carathéodory criterion.
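THEOREM 3.11
$\mathcal{M}$ is a $\sigma$-algebra and $\mathcal{M} \supset \mathcal{B}$.

PROOF: That $\mathcal{M}$ is closed under complementation is clear. First we prove that $\mathcal{M}$ is closed under finite unions. So, assume $A, B \in \mathcal{M}$. We claim that $A \cup B \in \mathcal{M}$. Let $W \subset \mathcal{R}$. Then, we must show that
$$\lambda^*(W) \ge \lambda^*\big(W \cap (A \cup B)\big) + \lambda^*\big(W \cap (A \cup B)^c\big). \qquad (3.16)$$
(See the note following Definition 3.8.) Now, we can write $W \cap (A \cup B) = (W \cap A) \cup (W \cap A^c \cap B)$ and, consequently, by the subadditivity of $\lambda^*$,
$$\lambda^*\big(W \cap (A \cup B)\big) \le \lambda^*(W \cap A) + \lambda^*(W \cap A^c \cap B).$$
Therefore,
$$\lambda^*\big(W \cap (A \cup B)\big) + \lambda^*\big(W \cap (A \cup B)^c\big) \le \lambda^*(W \cap A) + \lambda^*(W \cap A^c \cap B) + \lambda^*\big(W \cap (A \cup B)^c\big)$$
$$= \lambda^*(W \cap A) + \big[ \lambda^*\big((W \cap A^c) \cap B\big) + \lambda^*\big((W \cap A^c) \cap B^c\big) \big].$$
Because $B \in \mathcal{M}$, the quantity between the square brackets in the previous expression equals $\lambda^*(W \cap A^c)$. Thus,
$$\lambda^*\big(W \cap (A \cup B)\big) + \lambda^*\big(W \cap (A \cup B)^c\big) \le \lambda^*(W \cap A) + \lambda^*(W \cap A^c).$$
This last sum equals $\lambda^*(W)$ because $A \in \mathcal{M}$. Hence, (3.16) holds. We have now established that $\mathcal{M}$ is an algebra of sets.

Next, we show that $\mathcal{M}$ is closed under countable unions. So, assume $\{E_n\}_{n=1}^\infty \subset \mathcal{M}$. We must prove that $\bigcup_{n=1}^\infty E_n \in \mathcal{M}$. To begin, we disjointize the sets $E_n$, $n = 1, 2, \ldots$. Let $A_1 = E_1$, $A_2 = E_2 \setminus E_1$, $A_3 = E_3 \setminus (E_1 \cup E_2)$, and, in general, $A_n = E_n \setminus \big( \bigcup_{k=1}^{n-1} E_k \big)$. Then (see Exercise 3.30) $A_i \cap A_j = \emptyset$ for $i \ne j$, and $\bigcup_{n=1}^\infty A_n = \bigcup_{n=1}^\infty E_n$. Moreover, because $\mathcal{M}$ is an algebra of sets and $E_n \in \mathcal{M}$ for $n \in \mathcal{N}$, it follows that $A_n \in \mathcal{M}$ for $n \in \mathcal{N}$.

Now, let $W$ be any subset of $\mathcal{R}$ and set $E = \bigcup_{n=1}^\infty E_n = \bigcup_{n=1}^\infty A_n$. We must show that $\lambda^*(W) \ge \lambda^*(W \cap E) + \lambda^*(W \cap E^c)$. By the subadditivity of $\lambda^*$,
$$\lambda^*(W \cap E) = \lambda^*\Big( \bigcup_{n=1}^\infty (W \cap A_n) \Big) \le \sum_{n=1}^\infty \lambda^*(W \cap A_n). \qquad (3.17)$$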
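For each $n \in \mathcal{N}$, set $B_n = \bigcup_{k=1}^n A_k$. Then, because $\mathcal{M}$ is an algebra, $B_n \in \mathcal{M}$ for all $n \in \mathcal{N}$. Consequently, for all $n$,
$$\lambda^*(W) = \lambda^*(W \cap B_n) + \lambda^*(W \cap B_n^c). \qquad (3.18)$$
Because $B_n \subset \bigcup_{m=1}^\infty A_m = E$, it follows that $E^c \subset B_n^c$. This last fact and (3.18) imply that
$$\lambda^*(W) \ge \lambda^*(W \cap B_n) + \lambda^*(W \cap E^c). \qquad (3.19)$$
We will now prove by induction that for all $n \in \mathcal{N}$,
$$\lambda^*(W \cap B_n) = \sum_{k=1}^n \lambda^*(W \cap A_k). \qquad (3.20)$$
The equation holds trivially when $n = 1$. So, assume that it holds for $n$. As $A_{n+1} \in \mathcal{M}$, we have
$$\lambda^*(W \cap B_{n+1}) = \lambda^*\big((W \cap B_{n+1}) \cap A_{n+1}\big) + \lambda^*\big((W \cap B_{n+1}) \cap A_{n+1}^c\big). \qquad (3.21)$$
Because the $A_k$s are pairwise disjoint, $W \cap B_{n+1} \cap A_{n+1} = W \cap A_{n+1}$ and $W \cap B_{n+1} \cap A_{n+1}^c = W \cap B_n$. Thus, by (3.21) and the induction hypothesis,
$$\lambda^*(W \cap B_{n+1}) = \lambda^*(W \cap A_{n+1}) + \lambda^*(W \cap B_n) = \lambda^*(W \cap A_{n+1}) + \sum_{k=1}^n \lambda^*(W \cap A_k) = \sum_{k=1}^{n+1} \lambda^*(W \cap A_k),$$
as required. Employing (3.19) and (3.20), we conclude that
$$\lambda^*(W) \ge \sum_{k=1}^n \lambda^*(W \cap A_k) + \lambda^*(W \cap E^c)$$
for all $n \in \mathcal{N}$ and, consequently,
$$\lambda^*(W) \ge \sum_{n=1}^\infty \lambda^*(W \cap A_n) + \lambda^*(W \cap E^c).$$
Applying (3.17) to the previous inequality, we deduce that
$$\lambda^*(W) \ge \lambda^*(W \cap E) + \lambda^*(W \cap E^c).$$
This shows $E \in \mathcal{M}$. We have now established that $\mathcal{M}$ is a $\sigma$-algebra.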
It remains to prove that $\mathcal{M} \supset \mathcal{B}$. By Proposition 3.3, $\mathcal{M}$ contains all open sets and, as we have just seen, $\mathcal{M}$ is a $\sigma$-algebra. Consequently, since $\mathcal{B}$ is the smallest $\sigma$-algebra that contains all open sets, it must be that $\mathcal{M} \supset \mathcal{B}$.

Our next theorem demonstrates that Condition (M3) is satisfied when Lebesgue outer measure, $\lambda^*$, is restricted to $\mathcal{M}$. We denote by $\lambda$ the restriction of Lebesgue outer measure to $\mathcal{M}$; that is, $\lambda\colon \mathcal{M} \to [0, \infty]$ is defined by $\lambda(B) = \lambda^*(B)$.

THEOREM 3.12
If $A_1, A_2, \ldots$ are in $\mathcal{M}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$$\lambda\Big( \bigcup_n A_n \Big) = \sum_n \lambda(A_n).$$

PROOF: We first prove that $\lambda$ is finitely additive on $\mathcal{M}$. So, let $A, B \in \mathcal{M}$ with $A \cap B = \emptyset$. Set $W = A \cup B$. Then $W \cap A = A$ and $W \cap A^c = B$. Consequently, since $A \in \mathcal{M}$, we have by (3.14) that
$$\lambda(A \cup B) = \lambda^*(W) = \lambda^*(W \cap A) + \lambda^*(W \cap A^c) = \lambda^*(A) + \lambda^*(B) = \lambda(A) + \lambda(B).$$
This shows that $\lambda$ is finitely additive.

Suppose now that $\{A_n\}_{n=1}^\infty \subset \mathcal{M}$ with $A_i \cap A_j = \emptyset$ for $i \ne j$. Using the fact that $\lambda$ is finitely additive on $\mathcal{M}$ and the monotonicity of Lebesgue outer measure, we conclude that
$$\sum_{k=1}^m \lambda(A_k) = \lambda\Big( \bigcup_{k=1}^m A_k \Big) \le \lambda\Big( \bigcup_{n=1}^\infty A_n \Big)$$
for all $m \in \mathcal{N}$. Letting $m \to \infty$ gives $\sum_{n=1}^\infty \lambda(A_n) \le \lambda\big( \bigcup_{n=1}^\infty A_n \big)$. The reverse inequality obtains because of the countable subadditivity of Lebesgue outer measure.

Lebesgue Measurable Sets and Lebesgue Measure
From Proposition 3.2 on page 107 and Theorems 3.11 and 3.12, we see that Conditions (M1)-(M3) are satisfied with $\mu = \lambda$ and $\mathcal{A} = \mathcal{M}$; that is,
(L1) $\lambda(I) = \ell(I)$, for all intervals $I$.
(L2) $\mathcal{M}$ is a $\sigma$-algebra and $\mathcal{M} \supset \mathcal{B}$.
(L3) If $A_1, A_2, \ldots$ are in $\mathcal{M}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then $\lambda\big( \bigcup_n A_n \big) = \sum_n \lambda(A_n)$.

Consequently, the set function $\lambda\colon \mathcal{M} \to [0, \infty]$ is the required extension of length. We will employ the following terminology:

DEFINITION 3.9 Lebesgue Measurable Sets and Lebesgue Measure
The members of $\mathcal{M}$ are called Lebesgue measurable sets. That is, $E$ is a Lebesgue measurable set if and only if for every subset $W$ of $\mathcal{R}$,
$$\lambda^*(W) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c).$$
The restriction of Lebesgue outer measure to $\mathcal{M}$ is denoted by $\lambda$ and is called Lebesgue measure.

In the next few propositions, we will establish some additional properties of Lebesgue measure and Lebesgue measurable sets.

PROPOSITION 3.4
A subset of $\mathcal{R}$ with Lebesgue outer measure zero is a Lebesgue measurable set; that is, $\lambda^*(E) = 0 \Rightarrow E \in \mathcal{M}$.

PROOF: Suppose that $\lambda^*(E) = 0$. Let $W$ be an arbitrary subset of $\mathcal{R}$. Since $W \cap E \subset E$, we conclude from the monotonicity of Lebesgue outer measure that $\lambda^*(W \cap E) \le \lambda^*(E) = 0$. Using the fact that $W \cap E^c \subset W$, we now conclude that
$$\lambda^*(W) \ge \lambda^*(W \cap E^c) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c).$$
This last inequality shows that $E \in \mathcal{M}$.

PROPOSITION 3.5
Every countable subset of $\mathcal{R}$ has Lebesgue measure zero.
PROOF: Let $E \subset \mathcal{R}$ be countable, say $E = \{x_n\}_{n=1}^\infty$ with the $x_n$ distinct. Then we can write $E = \bigcup_{n=1}^\infty \{x_n\}$. Note that if $a \in \mathcal{R}$, then, by (L1), $\lambda(\{a\}) = \lambda([a, a]) = a - a = 0$. Therefore, applying (L3), we conclude that
$$\lambda(E) = \lambda\Big( \bigcup_{n=1}^\infty \{x_n\} \Big) = \sum_{n=1}^\infty \lambda(\{x_n\}) = 0,$$
as required.

The next proposition shows that the converse of Proposition 3.5 does not hold, since the Cantor set is uncountable.

PROPOSITION 3.6
The Cantor set, $P$, has Lebesgue measure zero.

PROOF: Let $G = [0, 1] \setminus P$. From Chapter 2 (page 59), we know that $G$ can be written as a countable union of disjoint open intervals, $\{I_n\}_{n=1}^\infty$, with the property that $\sum_{n=1}^\infty \ell(I_n) = 1$. Hence, by (L3) and (L1),
$$\lambda(G) = \sum_{n=1}^\infty \lambda(I_n) = \sum_{n=1}^\infty \ell(I_n) = 1.$$
Clearly, $P$ and $G$ are disjoint and $P \cup G = [0, 1]$. Therefore, $1 = \lambda([0, 1]) = \lambda(P) + \lambda(G) = \lambda(P) + 1$, which shows that $\lambda(P) = 0$.

Another useful result is the following.

THEOREM 3.13
Suppose that $\{E_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable sets with $E_1 \subset E_2 \subset \cdots$. Then
$$\lambda\Big( \bigcup_{n=1}^\infty E_n \Big) = \lim_{n\to\infty} \lambda(E_n).$$

PROOF: If $\lambda(E_n) = \infty$ for some $n$, then both sides of the previous equation equal $\infty$. So, assume $\lambda(E_n) < \infty$ for all $n$.
To begin, we disjointize the $E_n$s. Let $A_1 = E_1$ and, for $n \ge 2$, let $A_n = E_n \setminus E_{n-1}$. Then it is easy to see that $\{A_n\}_{n=1}^\infty \subset \mathcal{M}$, $A_i \cap A_j = \emptyset$ for $i \ne j$, and $\bigcup_{n=1}^\infty A_n = \bigcup_{n=1}^\infty E_n$. Therefore, by countable additivity,
$$\lambda\Big( \bigcup_{n=1}^\infty E_n \Big) = \lambda\Big( \bigcup_{n=1}^\infty A_n \Big) = \sum_{n=1}^\infty \lambda(A_n).$$
Because $E_{n-1} \subset E_n$, we have $\lambda(A_n) = \lambda(E_n \setminus E_{n-1}) = \lambda(E_n) - \lambda(E_{n-1})$ for $n \ge 2$. Consequently,
$$\lambda\Big( \bigcup_{n=1}^\infty E_n \Big) = \sum_{n=1}^\infty \lambda(A_n) = \lim_{n\to\infty} \sum_{k=1}^n \lambda(A_k) = \lim_{n\to\infty} \Big( \lambda(E_1) + \sum_{k=2}^n \big[ \lambda(E_k) - \lambda(E_{k-1}) \big] \Big) = \lim_{n\to\infty} \lambda(E_n),$$
as required.

The Relation Between $\mathcal{B}$ and $\mathcal{M}$
We close this section by discussing the relationship between the collection of Borel sets, $\mathcal{B}$, and the collection of Lebesgue measurable sets, $\mathcal{M}$. By Theorem 3.11, $\mathcal{B} \subset \mathcal{M}$. The question now is: Does $\mathcal{B} = \mathcal{M}$? In other words, is every Lebesgue measurable set a Borel set, or are there Lebesgue measurable sets that are not Borel sets?

It is not easy to answer that question. In fact, Lebesgue and Borel argued the question without finding the answer. It turns out that the answer to the question is no: there are Lebesgue measurable sets that are not Borel sets. In other words, we have the following theorem:

THEOREM 3.14
The $\sigma$-algebra of Borel sets, $\mathcal{B}$, is a proper subcollection of the $\sigma$-algebra of Lebesgue measurable sets, $\mathcal{M}$.

PROOF: See Exercise 3.50.

EXERCISES 3.4
3.30 Let $\{E_n\}_{n=1}^\infty$ be any sequence of subsets of $\mathcal{R}$. Define $A_1 = E_1$, $A_2 = E_2 \setminus E_1$, $A_3 = E_3 \setminus (E_1 \cup E_2)$, and, in general, $A_n = E_n \setminus \big( \bigcup_{k=1}^{n-1} E_k \big)$ for $n \in \mathcal{N}$. Prove that $A_i \cap A_j = \emptyset$ for $i \ne j$, and $\bigcup_{n=1}^\infty A_n = \bigcup_{n=1}^\infty E_n$.
3.31 In Chapter 2, we introduced the concept of measure zero. Prove that this concept is equivalent to that of Lebesgue measure zero. In other words, show that a subset $E \subset \mathcal{R}$ has measure zero in the sense of Definition 2.19 on page 85 if and only if $\lambda(E) = 0$.
★3.32 Verify that if $A \in \mathcal{M}$, $\lambda(A) = 0$, and $B \subset A$, then $B \in \mathcal{M}$ and $\lambda(B) = 0$.
3.33 Suppose that $A, B \in \mathcal{M}$ are such that $A \subset B$ and $\lambda(A) < \infty$. Show that $\lambda(B \setminus A) = \lambda(B) - \lambda(A)$.
3.34 Use properties of Lebesgue measure to supply a simple proof that any (nondegenerate) interval of $\mathcal{R}$ is uncountable.
3.35 Suppose that $\{E_n\}_{n=1}^\infty \subset \mathcal{M}$ and that $E_1 \supset E_2 \supset \cdots$. Also suppose that $\lambda(E_1) < \infty$. Prove that $\lambda\big( \bigcap_{n=1}^\infty E_n \big) = \lim_{n\to\infty} \lambda(E_n)$. Can the assumption that $\lambda(E_1) < \infty$ be dropped? Why?
3.36 Show that if $A, B \in \mathcal{M}$ and $\lambda(A \cap B) < \infty$, then $\lambda(A \cup B) = \lambda(A) + \lambda(B) - \lambda(A \cap B)$.
3.37 Suppose $\lambda^*(A) = 0$.
a) Show that for any set $B$, $\lambda^*(A \cup B) = \lambda^*(B)$.
b) Show that if $A \cup B \in \mathcal{M}$, then $B \in \mathcal{M}$.
3.38 Find a sequence of pairwise disjoint sets, $\{A_n\}_{n=1}^\infty$, such that strict inequality holds in the relation
$$\lambda^*\Big( \bigcup_{n=1}^\infty A_n \Big) \le \sum_{n=1}^\infty \lambda^*(A_n).$$
Hint: Is $\{A_n\}_{n=1}^\infty \subset \mathcal{M}$ possible?
★3.39 If $0 < \alpha < 1$, construct a set, $P_\alpha$, in a manner similar to that in which the Cantor set is constructed, except that at the $n$th step remove open intervals of length $\alpha/3^n$ instead of $1/3^n$. Show that $P_\alpha$ is closed and that $\lambda(P_\alpha) = 1 - \alpha > 0$.
3.40 Prove that there is a sequence of continuous functions, $\{f_n\}_{n=1}^\infty$, on $[0, 1]$ that converges pointwise to a function $f$ that is not Riemann integrable on $[0, 1]$. Hint: Use Exercise 3.39.
3.41 Prove that there is a Riemann integrable function on $[0, 1]$ that is not a Borel measurable function. Hint: The proof of Theorem 3.14, which is carried out in Exercise 3.50, shows there is a subset of the Cantor set that is not a Borel set.
3.42 Suppose that $B \in \mathcal{M}$. Show that for each $\epsilon > 0$, there is an open set, $O$, with $O \supset B$ and $\lambda(O \setminus B) < \epsilon$. Hint: First consider the case where $\lambda(B) < \infty$ and use the definition of Lebesgue outer measure.
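★3.43 Suppose that $E \in \mathcal{M}$. Show that for each $\epsilon > 0$, there is a closed set, $F$, with $F \subset E$ and $\lambda(E \setminus F) < \epsilon$.
★3.44 A set is called a $G_\delta$-set if it is the intersection of a countable number of open sets; and a set is called an $F_\sigma$-set if it is the union of a countable number of closed sets. Note that $G_\delta$-sets and $F_\sigma$-sets are Borel sets. Now suppose that $E \in \mathcal{M}$.
a) Show that there is a $G_\delta$-set, $G$, and an $F_\sigma$-set, $F$, such that $F \subset E \subset G$ and $\lambda(E \setminus F) = \lambda(G \setminus E) = 0$.
b) Referring to part (a), deduce that $\lambda(F) = \lambda(E) = \lambda(G)$.
3.45 Let $E \in \mathcal{M}$. Prove that $\lambda(E) = \inf\{ \lambda(O) : O \supset E,\ O \text{ open} \}$.
3.46 Let $E \in \mathcal{M}$. Prove that $\lambda(E) = \sup\{ \lambda(K) : K \subset E,\ K \text{ compact} \}$.
3.47 Let $E \subset \mathcal{R}$.
a) Suppose there is a Borel set, $B$, such that $B \subset E$ and $\lambda^*(E \setminus B) = 0$. Show that $E \in \mathcal{M}$.
b) Suppose that $\lambda^*(E) < \infty$ and that
$$\lambda^*(E) = \sup\{ \lambda(F) : F \subset E,\ F \text{ closed} \} = \inf\{ \lambda(O) : O \supset E,\ O \text{ open} \}.$$
Show that $E \in \mathcal{M}$.
3.48 Suppose that $\{E_n\}_{n=1}^\infty$ is a sequence of pairwise disjoint Lebesgue measurable sets. Prove that
$$\lambda^*\Big( A \cap \bigcup_{n=1}^\infty E_n \Big) = \sum_{n=1}^\infty \lambda^*(A \cap E_n)$$
for all subsets $A$ of $\mathcal{R}$.
3.49 Suppose that $E \in \mathcal{M}$ and that $\lambda(E) < \infty$. Show that for each $\epsilon > 0$, there are a finite number of pairwise disjoint intervals, $I_1, I_2, \ldots, I_n$, such that
$$\lambda\Big( \Big( E \setminus \bigcup_{k=1}^n I_k \Big) \cup \Big( \bigcup_{k=1}^n I_k \setminus E \Big) \Big) < \epsilon.$$
★3.50 Prove Theorem 3.14 on page 125. Proceed by establishing each of the following facts:
a) If $C \in \mathcal{M}$ and $x \in \mathcal{R}$, then $C + x \in \mathcal{M}$ and $\lambda(C + x) = \lambda(C)$.
b) Let $S$ be the set defined in Lemma 3.12 on page 116. If $C \in \mathcal{M}$ and $C \subset S$, then $\lambda(C) = 0$. Hint: Consider $\{ C + r : r \in (-1, 1) \cap \mathcal{Q} \}$.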
c) If $D \subset \mathcal{R}$ and $\lambda^*(D) > 0$, then there is a nonmeasurable subset of $D$. Hint: Let $D_r = D \cap (S + r)$ for $r \in \mathcal{Q}$. Use parts (a) and (b) to show that if $D_r \in \mathcal{M}$, then $\lambda(D_r) = 0$.
d) Define $f\colon [0, 1] \to \mathcal{R}$ by $f(x) = x + \psi(x)$, where $\psi$ denotes the Cantor function (see page 77). Then $f$ is a strictly increasing function and maps $[0, 1]$ onto $[0, 2]$.
e) The function $g = f^{-1}$ is continuous and, hence, Borel measurable.
f) $f$ maps the Cantor set onto a set, $A$, with $\lambda(A) = 1$.
g) Let $E \subset A$ with $E \notin \mathcal{M}$. [Such an $E$ exists by parts (f) and (c).] Then $f^{-1}(E) \in \mathcal{M}$ but $f^{-1}(E) \notin \mathcal{B}$.
3.51 Prove that the set $S$ defined in Lemma 3.12 on page 116 is not a Lebesgue measurable set.

3.5 THE LEBESGUE INTEGRAL FOR NONNEGATIVE FUNCTIONS

Recall that our reason for generalizing the concept of length to all Borel sets is so that we can extend the Riemann integral to an integral that applies to all Borel measurable functions. We have, in fact, generalized the concept of length to all Lebesgue measurable sets. Consequently, we will be able to extend the Riemann integral to an integral that applies to a much larger collection of functions than the Borel measurable functions. We will call that larger collection of functions the Lebesgue measurable functions.

Lebesgue Measurable Functions
There are two ways that we can approach the definition of Lebesgue measurable functions. Here is the first approach: Taking our cue from the development of the Riemann integral, we begin by defining the integral of a function of the form $s(x) = \sum_{k=1}^n a_k \chi_{E_k}(x)$. If the $E_k$s are intervals, then $s$ is a step function and we simply define the integral to equal the Riemann integral, $\sum_{k=1}^n a_k \ell(E_k)$. But now that we have generalized the concept of length, we can do much better. Provided only that the $E_k$s are Lebesgue measurable sets, we define the integral to be $\sum_{k=1}^n a_k \lambda(E_k)$.
In particular, we see that every function of the form $\chi_E$, where $E \in \mathcal{M}$, should be a Lebesgue measurable function; that is, it should be integrable in the extended sense. Since we want the collection of all Lebesgue measurable functions to constitute an algebra and be closed under pointwise limits, we make the following definition.
DEFINITION 3.10 Lebesgue Measurable Functions
We denote by $\mathcal{L}$ the smallest algebra of real-valued functions on $\mathcal{R}$ that contains all functions of the form $\chi_E$, $E \in \mathcal{M}$, and is closed under pointwise limits. The members of $\mathcal{L}$ are called Lebesgue measurable functions.

Our second approach to obtain the definition of Lebesgue measurable functions is by analogy with a characterization of Borel measurable functions. Specifically, as we know from Theorem 3.3 (page 99), a function $f$ is Borel measurable if and only if the inverse image of each open set under $f$ is a Borel set; that is, if and only if $f^{-1}(O) \in \mathcal{B}$ for all open sets $O$. This leads to the following definition of Lebesgue measurable functions:

DEFINITION 3.11 Lebesgue Measurable Function
A real-valued function $f$ on $\mathcal{R}$ is said to be a Lebesgue measurable function if the inverse image of each open set under $f$ is a Lebesgue measurable set; that is, if $f^{-1}(O) \in \mathcal{M}$ for all open sets $O$.

Note: For brevity, we will often indicate that a function is a Lebesgue measurable function by saying that it is an $\mathcal{M}$-measurable function.

It really doesn't matter whether we use Definition 3.10 or Definition 3.11, because the two definitions are equivalent (see Exercise 3.53). But to be specific, we will take Definition 3.11 as our definition of Lebesgue measurable functions.

Our next proposition, whose proof is similar to that of Lemma 3.5 on page 98 and is left to the reader, provides some useful equivalent conditions for a function to be Lebesgue measurable.

PROPOSITION 3.7
Let $f$ be a real-valued function on $\mathcal{R}$. Then the following statements are equivalent:
a) $f$ is $\mathcal{M}$-measurable.
b) For each $a \in \mathcal{R}$, $f^{-1}((-\infty, a)) \in \mathcal{M}$.
c) For each $a \in \mathcal{R}$, $f^{-1}((a, \infty)) \in \mathcal{M}$.
d) For each $a \in \mathcal{R}$, $f^{-1}((-\infty, a]) \in \mathcal{M}$.
e) For each $a \in \mathcal{R}$, $f^{-1}([a, \infty)) \in \mathcal{M}$.
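For a step function, condition (c) of Proposition 3.7 can be seen concretely: $f^{-1}((a, \infty))$ is a finite union of intervals, hence a Borel set, hence in $\mathcal{M}$ by Theorem 3.11. The Python sketch below assumes the step function is given as pieces $(lo, hi, v)$ meaning $v\chi_{[lo, hi)}$ with pairwise disjoint half-open intervals and $f = 0$ elsewhere; the name `preimage_above` is hypothetical.

```python
import math


def preimage_above(pieces, a):
    """f^{-1}((a, oo)) for the step function f = sum_k v_k * chi_[lo_k, hi_k).

    pieces: list of (lo, hi, v) with pairwise disjoint [lo, hi); f = 0 elsewhere.
    Returns the preimage as a sorted list of intervals (lo, hi), read as [lo, hi),
    with -inf/inf marking unbounded ends.
    """
    pieces = sorted(pieces)
    out = [(lo, hi) for lo, hi, v in pieces if v > a]
    if a < 0:
        # f = 0 off the pieces, and 0 > a, so every gap belongs to the preimage too
        prev = -math.inf
        for lo, hi, _ in pieces:
            if prev < lo:
                out.append((prev, lo))
            prev = hi
        out.append((prev, math.inf))
    return sorted(out)


f = [(0, 1, 3), (1, 2, 5)]        # f = 3*chi_[0,1) + 5*chi_[1,2)
assert preimage_above(f, 4) == [(1, 2)]
assert preimage_above(f, 0) == [(0, 1), (1, 2)]
```

The only case needing care is $a < 0$, where the set on which $f$ vanishes also lies in the preimage.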
130 □ Chapter 3 Lebesgue Theory on the Real Line

Several important properties of Lebesgue measurable functions are given in the next two theorems. We postpone the proofs of those theorems until Chapter 4, where more general results will be established.

THEOREM 3.15
The collection of Lebesgue measurable functions forms an algebra. That is, if $f$ and $g$ are $\mathcal{M}$-measurable and $\alpha \in \mathcal{R}$, then
a) $f + g$ is $\mathcal{M}$-measurable.
b) $f \cdot g$ is $\mathcal{M}$-measurable.
c) $\alpha f$ is $\mathcal{M}$-measurable.

THEOREM 3.16
Suppose that $f$ and $g$ are $\mathcal{M}$-measurable functions and that $\{f_n\}_{n=1}^\infty$ is a sequence of $\mathcal{M}$-measurable functions that converges pointwise to a real-valued function. Then
a) $f \vee g$ is $\mathcal{M}$-measurable.
b) $f \wedge g$ is $\mathcal{M}$-measurable.
c) $\lim_{n\to\infty} f_n$ is $\mathcal{M}$-measurable.

The Lebesgue Integral of a Nonnegative Simple Function

We now begin our extension of the Riemann integral to an integral that applies to all Lebesgue measurable functions. First we introduce a special type of Lebesgue measurable function that generalizes the notion of step functions.

DEFINITION 3.12 Simple Function and Canonical Representation
An $\mathcal{M}$-measurable function $s$ is said to be a simple function if it takes on only finitely many values; that is, if its range is a finite set. Let $a_1, a_2, \ldots, a_n$ denote the distinct nonzero values of $s$ and, for $1 \le k \le n$, set $A_k = \{x : s(x) = a_k\}$. Then we can write
$$s = \sum_{k=1}^{n} a_k \chi_{A_k}. \qquad (3.22)$$
This is called the canonical representation of $s$.
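To make the passage from an overlapping representation to the canonical one concrete, here is a minimal Python sketch. It assumes, purely for illustration (the encoding is hypothetical and not the text's), that a simple function is given as a finite list of terms $c\,\chi_{[a,b)}$ with half-open intervals, so that every set involved is a finite union of intervals. The function `canonical` returns, for each distinct nonzero value $a_k$, the set $A_k = \{x : s(x) = a_k\}$ as a list of disjoint intervals.

```python
def canonical(terms):
    """Canonical representation of s = sum of c * chi_[a, b) over terms.

    terms: list of (c, (a, b)) pairs; the intervals may overlap.
    Returns {a_k: A_k}, one entry per distinct nonzero value a_k of s, where
    A_k = {x : s(x) = a_k} is a sorted list of disjoint [left, right) pieces.
    """
    # s is constant on each elementary piece between consecutive endpoints.
    cuts = sorted({p for _, (a, b) in terms for p in (a, b)})
    raw = {}
    for left, right in zip(cuts, cuts[1:]):
        value = sum(c for c, (a, b) in terms if a <= left and right <= b)
        if value != 0:
            raw.setdefault(value, []).append((left, right))
    # Merge adjacent pieces that carry the same value.
    result = {}
    for value, pieces in raw.items():
        merged = [pieces[0]]
        for a, b in pieces[1:]:
            if a == merged[-1][1]:
                merged[-1] = (merged[-1][0], b)
            else:
                merged.append((a, b))
        result[value] = merged
    return result


# s = 3*chi_[0,2) + 2*chi_[1,3): value 3 on [0,1), 5 on [1,2), 2 on [2,3)
print(canonical([(3, (0, 2)), (2, (1, 3))]))
# A_k need not be an interval: s = 2*chi_[0,1) + 2*chi_[3,4)
print(canonical([(2, (0, 1)), (2, (3, 4))]))
```

Because the elementary pieces are disjoint and each piece is assigned to exactly one value, the returned sets $A_k$ are pairwise disjoint, as the text asserts for canonical representations in general.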
It is easy to see that every step function is a simple function, but not every simple function is a step function. Also, we leave it as an exercise for the reader to show that the sets $A_1, A_2, \ldots, A_n$ appearing in the canonical representation of a simple function are Lebesgue measurable and pairwise disjoint.

EXAMPLE 3.2 Illustrates Definition 3.12
The function $s = 3\chi_{(0,2)} + 2\chi_{(1,3]}$ is a simple function. However, the given representation is not canonical. In fact, the canonical representation of $s$ is
$$s = 3\chi_{(0,1]} + 5\chi_{(1,2)} + 2\chi_{[2,3]},$$
as is easily verified. □

Here is the definition of the Lebesgue integral for a nonnegative simple function. As already noted, this definition is a natural generalization of the Riemann integral of a step function.

DEFINITION 3.13 Integral of a Nonnegative Simple Function
Let $s$ be a nonnegative simple function with canonical representation $s = \sum_{k=1}^{n} a_k \chi_{A_k}$. Then the Lebesgue integral of $s$ over $\mathcal{R}$ is defined by
$$\int_{\mathcal{R}} s(x)\,d\lambda(x) = \sum_{k=1}^{n} a_k \lambda(A_k).$$
If $E \in \mathcal{M}$, then the Lebesgue integral of $s$ over $E$ is defined by
$$\int_E s(x)\,d\lambda(x) = \int_{\mathcal{R}} \chi_E(x)s(x)\,d\lambda(x).$$

The next proposition shows how we can obtain the Lebesgue integral of a nonnegative simple function from a possibly noncanonical representation.

PROPOSITION 3.8
Let $s$ be a nonnegative simple function that can be expressed in the form $s = \sum_{k=1}^{m} b_k \chi_{B_k}$, where this representation is not necessarily canonical but
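$B_k \in \mathcal{M}$ for $1 \le k \le m$ and $B_i \cap B_j = \emptyset$ for $i \ne j$. Then
$$\int_{\mathcal{R}} s(x)\,d\lambda(x) = \sum_{k=1}^{m} b_k \lambda(B_k). \qquad (3.23)$$
More generally, we have
$$\int_E s(x)\,d\lambda(x) = \sum_{k=1}^{m} b_k \lambda(B_k \cap E) \qquad (3.24)$$
for each $E \in \mathcal{M}$.

PROOF: Let $s = \sum_{i=1}^{n} a_i \chi_{A_i}$ be the canonical representation of $s$. Also, set $a_0 = 0$ and $A_0 = \{x : s(x) = 0\}$. Terms with $B_k = \emptyset$ contribute nothing to either side of (3.23), so we may assume that each $B_k$ is nonempty. Because the $B_k$s are pairwise disjoint, we know that for each $k = 1, 2, \ldots, m$, there is an $i$ ($0 \le i \le n$) such that $b_k = a_i$. Let $D_i = \{k : b_k = a_i\}$. Then the $D_i$s are pairwise disjoint, $\bigcup_{i=0}^{n} D_i = \{1, 2, \ldots, m\}$, and $A_i = \bigcup_{k \in D_i} B_k$ for $1 \le i \le n$. Consequently,
$$\int_{\mathcal{R}} s(x)\,d\lambda(x) = \sum_{i=1}^{n} a_i \lambda(A_i) = \sum_{i=1}^{n} a_i \sum_{k \in D_i} \lambda(B_k) = \sum_{i=0}^{n} \sum_{k \in D_i} a_i \lambda(B_k) = \sum_{i=0}^{n} \sum_{k \in D_i} b_k \lambda(B_k) = \sum_{k=1}^{m} b_k \lambda(B_k).$$
Thus, (3.23) holds. To establish (3.24), we first observe that
$$\chi_E s = \sum_{k=1}^{m} b_k \chi_E \chi_{B_k} = \sum_{k=1}^{m} b_k \chi_{B_k \cap E}.$$
Applying (3.23), which is valid here because the sets $B_k \cap E$ are pairwise disjoint members of $\mathcal{M}$, we now conclude that
$$\int_E s(x)\,d\lambda(x) = \int_{\mathcal{R}} \chi_E(x)s(x)\,d\lambda(x) = \int_{\mathcal{R}} (\chi_E s)(x)\,d\lambda(x) = \sum_{k=1}^{m} b_k \lambda(B_k \cap E).$$
This completes the proof of Proposition 3.8.

We should point out that Definition 3.13 really does provide a generalization of the Riemann integral of a step function; that is, the Lebesgue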
integral of a step function equals its Riemann integral. Indeed, suppose that $g$ is a nonnegative step function on $[a, b]$, say $g = \sum_{k=1}^{n} a_k \chi_{I_k}$, where the $I_k$s are pairwise disjoint subintervals of $[a, b]$. Then, by Proposition 3.8,
$$\int_a^b g(x)\,dx = \sum_{k=1}^{n} a_k \ell(I_k) = \sum_{k=1}^{n} a_k \lambda(I_k) = \sum_{k=1}^{n} a_k \lambda(I_k \cap [a, b]) = \int_{[a,b]} g(x)\,d\lambda(x),$$
where $\ell(I_k)$ denotes the length of $I_k$.

A technicality: We have defined the Lebesgue integral only for functions whose domain is all of $\mathcal{R}$, but the domain of the step function $g$ is only $[a, b]$. To remedy this difficulty, define $g(x) = 0$ for $x \in [a, b]^c$.

EXAMPLE 3.3 Illustrates the Integral of a Nonnegative Simple Function
a) Let $s = 3\chi_{(-2,-1]} + 4\chi_{(-1,1)} + 8\chi_{\mathcal{N}}$. Then
$$\int_{\mathcal{R}} s(x)\,d\lambda(x) = 3\lambda((-2,-1]) + 4\lambda((-1,1)) + 8\lambda(\mathcal{N}) = 3 \cdot 1 + 4 \cdot 2 + 8 \cdot 0 = 11.$$
b) Let $s = \chi_{\mathcal{Q}^c} = 1\chi_{\mathcal{Q}^c}$. Then, by Proposition 3.8 and because the countable set $\mathcal{Q}$ has measure zero,
$$\int_{[0,2]} s(x)\,d\lambda(x) = 1\lambda(\mathcal{Q}^c \cap [0, 2]) = 2.$$
c) Let $f(x) = 1$; that is, $f = \chi_{\mathcal{R}}$. Then $\int_{\mathcal{R}} f(x)\,d\lambda(x) = \lambda(\mathcal{R}) = \infty$. Thus, the Lebesgue integral of a nonnegative simple function can be $\infty$. □

LEMMA 3.14
Suppose that $s$ and $t$ are nonnegative simple functions and that $\alpha, \beta \ge 0$. Then $\alpha s + \beta t$ is a nonnegative simple function and
$$\int_E [\alpha s(x) + \beta t(x)]\,d\lambda(x) = \alpha \int_E s(x)\,d\lambda(x) + \beta \int_E t(x)\,d\lambda(x)$$
for each $E \in \mathcal{M}$.

PROOF: Let $s = \sum_{k=1}^{n} a_k \chi_{A_k}$ and $t = \sum_{j=1}^{m} b_j \chi_{B_j}$ be the canonical representations of $s$ and $t$, respectively. Also, let $a_0 = 0$, $A_0 = \{x : s(x) = 0\}$, $b_0 = 0$, and $B_0 = \{x : t(x) = 0\}$. For each $k$ ($0 \le k \le n$) and
$j$ ($0 \le j \le m$), set $C_{kj} = A_k \cap B_j$. Then the $C_{kj}$s are pairwise disjoint Lebesgue measurable sets, $s = \sum_{k=0}^{n}\sum_{j=0}^{m} a_k \chi_{C_{kj}}$, and $t = \sum_{k=0}^{n}\sum_{j=0}^{m} b_j \chi_{C_{kj}}$. Hence,
$$\alpha s + \beta t = \sum_{k=0}^{n}\sum_{j=0}^{m} (\alpha a_k + \beta b_j)\chi_{C_{kj}}.$$
This last equation shows that $\alpha s + \beta t$ is a nonnegative simple function. Moreover, by applying Proposition 3.8, we can deduce that
$$\int_E [\alpha s(x) + \beta t(x)]\,d\lambda(x) = \int_E (\alpha s + \beta t)\,d\lambda = \sum_{k=0}^{n}\sum_{j=0}^{m} (\alpha a_k + \beta b_j)\lambda(C_{kj} \cap E)$$
$$= \alpha \sum_{k=0}^{n}\sum_{j=0}^{m} a_k \lambda(C_{kj} \cap E) + \beta \sum_{k=0}^{n}\sum_{j=0}^{m} b_j \lambda(C_{kj} \cap E) = \alpha \int_E s\,d\lambda + \beta \int_E t\,d\lambda,$$
as required.

The Lebesgue Integral of a Nonnegative $\mathcal{M}$-measurable Function

Next we will define the Lebesgue integral of a nonnegative Lebesgue measurable function. Before doing so, however, it is useful for motivational purposes to prove the following proposition:

PROPOSITION 3.9
a) Let $f$ be a nonnegative $\mathcal{M}$-measurable function. Then there is a nondecreasing sequence of nonnegative simple functions that converges pointwise to $f$. In other words, there is a sequence $\{s_n\}_{n=1}^\infty$ of nonnegative simple functions such that, for all $x \in \mathcal{R}$, $s_1(x) \le s_2(x) \le \cdots$ and $\lim_{n\to\infty} s_n(x) = f(x)$.
b) If $\{s_n\}_{n=1}^\infty$ is a sequence of nonnegative simple functions that converges pointwise to a real-valued function $f$, then $f$ is a nonnegative $\mathcal{M}$-measurable function.
PROOF: a) For each $n \in \mathcal{N}$, set
$$E_{nm} = \left\{x : \frac{m-1}{2^n} \le f(x) < \frac{m}{2^n}\right\}$$
for $m = 1, 2, \ldots, n2^n$, and $E_n = \{x : f(x) \ge n\}$. As $f$ is Lebesgue measurable, the sets $E_{nm}$ and $E_n$ are Lebesgue measurable sets. Let
$$s_n = \sum_{m=1}^{n2^n} \frac{m-1}{2^n}\chi_{E_{nm}} + n\chi_{E_n}.$$
Then $\{s_n\}_{n=1}^\infty$ is a sequence of nonnegative simple functions and, clearly, $s_n \le f$ for all $n \in \mathcal{N}$. Also, by construction, $|f(x) - s_n(x)| < 2^{-n}$ as soon as $n$ is large enough so that $f(x) < n$. Thus, $s_n \to f$ pointwise.

Next we show that $s_n \le s_{n+1}$ for all $n \in \mathcal{N}$. Let $x \in \mathcal{R}$. If $x \in E_{nm}$ for some $m = 1, 2, \ldots, n2^n$, then $(m-1)2^{-n} \le f(x) < m2^{-n}$. Therefore, either
$$\frac{m-1}{2^n} = \frac{2m-2}{2^{n+1}} \le f(x) < \frac{2m-1}{2^{n+1}} \qquad\text{or}\qquad \frac{2m-1}{2^{n+1}} \le f(x) < \frac{m}{2^n}.$$
In the former case, $s_n(x) = s_{n+1}(x) = (m-1)2^{-n}$ and, in the latter case, $s_n(x) = (m-1)2^{-n} < (2m-1)2^{-(n+1)} = s_{n+1}(x)$. Consequently, in either case, $s_n(x) \le s_{n+1}(x)$. Finally, if $x \in E_n$, then $f(x) \ge n = (n2^{n+1})/2^{n+1}$. This implies that $s_{n+1}(x) \ge (n2^{n+1})/2^{n+1} = n = s_n(x)$. This completes the proof of part (a).

b) This part follows immediately from Theorem 3.16(c) on page 130.

Proposition 3.9 shows that the functions that can be approximated by nonnegative simple functions are precisely the nonnegative Lebesgue measurable functions. With that proposition in mind, we now define the Lebesgue integral of an arbitrary nonnegative $\mathcal{M}$-measurable function.

DEFINITION 3.14 Lebesgue Integral of a Nonnegative Function
Let $f$ be a nonnegative $\mathcal{M}$-measurable function. Then the Lebesgue integral of $f$ over $\mathcal{R}$ is defined by
$$\int_{\mathcal{R}} f(x)\,d\lambda(x) = \sup_{s} \int_{\mathcal{R}} s(x)\,d\lambda(x), \qquad (3.25)$$
where the supremum is taken over all nonnegative simple functions $s$ that are dominated by $f$. If $E \in \mathcal{M}$, then the Lebesgue integral of $f$ over $E$ is defined by
$$\int_E f(x)\,d\lambda(x) = \int_{\mathcal{R}} \chi_E(x)f(x)\,d\lambda(x). \qquad (3.26)$$

Note that the right-hand side of (3.25) makes sense for any nonnegative function, Lebesgue measurable or not. Thus, we might ask: Why define the Lebesgue integral only for nonnegative Lebesgue measurable functions; why not define it for any nonnegative function? The reason lies in the previous proposition, Proposition 3.9. For, if $f$ is not Lebesgue measurable, then it cannot be approximated by a sequence of nonnegative simple functions. Hence, the quantity on the right-hand side of (3.25) will generally not reflect the behavior of $f$.

We should mention that there are several widely used notations for the Lebesgue integral: $\int_E f(x)\,d\lambda(x)$, $\int_E f(x)\,dx$, $\int_E f(x)\,\lambda(dx)$, and $\int_E f\,d\lambda$ all denote the Lebesgue integral of $f$ over $E$. By the way, we will refer to the Lebesgue integral simply as "the integral" when there is no possibility of confusion.

PROPOSITION 3.10
Let $f$ and $g$ be nonnegative Lebesgue measurable functions, $\alpha \ge 0$, and $E \in \mathcal{M}$. Then
a) $f \le g \Rightarrow \int_E f\,d\lambda \le \int_E g\,d\lambda$.
b) $A \subset E$ and $A \in \mathcal{M} \Rightarrow \int_A f\,d\lambda \le \int_E f\,d\lambda$.
c) $f(x) = 0$ for all $x \in E \Rightarrow \int_E f\,d\lambda = 0$.
d) $\lambda(E) = 0 \Rightarrow \int_E f\,d\lambda = 0$.
e) $\int_E \alpha f\,d\lambda = \alpha \int_E f\,d\lambda$.

PROOF: a) Suppose that $s$ is a nonnegative simple function that is dominated by $\chi_E f$. As $f \le g$, it follows that $s$ is also dominated by $\chi_E g$. Therefore,
$$\int_E f\,d\lambda = \int_{\mathcal{R}} \chi_E f\,d\lambda = \sup_{\substack{0 \le s \le \chi_E f \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda \le \sup_{\substack{0 \le s \le \chi_E g \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda = \int_{\mathcal{R}} \chi_E g\,d\lambda = \int_E g\,d\lambda,$$
as required.
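b) If $A \subset E$, then $\chi_A f \le \chi_E f$. Thus, by part (a),
$$\int_A f\,d\lambda = \int_{\mathcal{R}} \chi_A f\,d\lambda \le \int_{\mathcal{R}} \chi_E f\,d\lambda = \int_E f\,d\lambda,$$
as required.

c) If $f(x) = 0$ for all $x \in E$, then $\chi_E f = 0$. Thus, the only nonnegative simple function dominated by $\chi_E f$ is $0$. Consequently, $\int_{\mathcal{R}} \chi_E f\,d\lambda = 0$; that is, $\int_E f\,d\lambda = 0$.

d) The function $\chi_E f$ is zero off of $E$. Therefore, any nonnegative simple function $s$ that is dominated by $\chi_E f$ must also be zero off of $E$. In other words, $s(x) = 0$ for $x \in E^c$. Since $\lambda(E) = 0$, we have $\lambda(A) = 0$ for all subsets $A$ of $E$. It now follows that if $s$ is a nonnegative simple function that is dominated by $\chi_E f$, then $\int_{\mathcal{R}} s\,d\lambda = 0$. Hence,
$$\int_E f\,d\lambda = \sup_{\substack{0 \le s \le \chi_E f \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda = 0.$$

e) If $\alpha = 0$, then there is nothing to prove. So, assume that $\alpha > 0$. Clearly, the required result holds for simple functions. Now, let $s$ be a nonnegative simple function that is dominated by $\chi_E \cdot (\alpha f)$. Then we have $0 \le \alpha^{-1}s \le \chi_E f$ and, hence, by part (a),
$$\alpha^{-1}\int_{\mathcal{R}} s\,d\lambda = \int_{\mathcal{R}} \alpha^{-1}s\,d\lambda \le \int_{\mathcal{R}} \chi_E f\,d\lambda = \int_E f\,d\lambda.$$
Thus, $\int_{\mathcal{R}} s\,d\lambda \le \alpha \int_E f\,d\lambda$ for each nonnegative simple function $s$ that is dominated by $\chi_E \cdot (\alpha f)$. This last fact implies that
$$\int_E \alpha f\,d\lambda = \sup_{\substack{0 \le s \le \chi_E \cdot (\alpha f) \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda \le \alpha \int_E f\,d\lambda.$$
On the other hand, let $s$ be a nonnegative simple function that is dominated by $\chi_E f$. Then we have $0 \le \alpha s \le \chi_E \cdot (\alpha f)$. Therefore, by part (a),
$$\alpha \int_{\mathcal{R}} s\,d\lambda = \int_{\mathcal{R}} \alpha s\,d\lambda \le \int_{\mathcal{R}} \chi_E \cdot (\alpha f)\,d\lambda = \int_E \alpha f\,d\lambda.$$
Thus, $\alpha \int_{\mathcal{R}} s\,d\lambda \le \int_E \alpha f\,d\lambda$ for each nonnegative simple function $s$ that is dominated by $\chi_E f$. Consequently,
$$\alpha \int_E f\,d\lambda = \alpha \cdot \sup_{\substack{0 \le s \le \chi_E f \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda \le \int_E \alpha f\,d\lambda.$$
This completes the proof of part (e).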
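Proposition 3.8, the dyadic functions $s_n$ from the proof of Proposition 3.9, and the supremum in Definition 3.14 can be seen working together in a small computation. The Python sketch below is a hedged illustration, not part of the text: it assumes that the pieces $B_k$ and the set $E$ are half-open intervals $[a, b)$, so that $\lambda(B_k \cap E)$ is just an overlap length, and it uses the hypothetical test function $f(x) = x$ on $[0, 1)$ (and $0$ elsewhere). Exact rational arithmetic via `fractions.Fraction` avoids rounding.

```python
import math
from fractions import Fraction


def integral_simple(terms, E):
    """Formula (3.24): sum of b_k * lambda(B_k ∩ E) for pairwise disjoint B_k.

    terms: list of (b_k, (a, b)) meaning b_k * chi_[a, b); E = (c, d) means [c, d).
    """
    c, d = E
    return sum(bk * max(0, min(b, d) - max(a, c)) for bk, (a, b) in terms)


def s_n(f, n, x):
    """Value at x of the simple function s_n from the proof of Proposition 3.9.

    f must be nonnegative.
    """
    y = f(x)
    if y >= n:                                    # x lies in E_n
        return n
    return Fraction(math.floor(y * 2**n), 2**n)  # x lies in E_nm: value (m-1)/2^n


def dyadic_terms(n):
    """s_n for f(x) = x on [0, 1) as disjoint terms ((m-1)/2^n) * chi_[(m-1)/2^n, m/2^n).

    Since f < 1 <= n, the set E_n is empty and only m = 1, ..., 2^n can contribute;
    where f = 0 outside [0, 1), s_n is 0 and adds nothing to the integral.
    """
    h = Fraction(1, 2**n)
    return [((m - 1) * h, ((m - 1) * h, m * h)) for m in range(1, 2**n + 1)]


for n in range(1, 6):
    value = integral_simple(dyadic_terms(n), (0, 1))
    print(n, value, Fraction(1, 2) - value)   # gap to 1/2 is 2^-(n+1)
```

The integrals increase with $n$ and equal $(1 - 2^{-n})/2$, consistent with the expected value $\int_{[0,1)} x\,d\lambda(x) = 1/2$. The sketch only exhibits a nondecreasing family of lower bounds, so it illustrates, rather than proves, the value of the supremum in (3.25).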
EXERCISES 3.5

3.52 Let $\mathcal{G} = \{\chi_E : E \in \mathcal{M}\}$. Show that $\mathcal{G}$ is closed under pointwise limits.

3.53 Prove that Definitions 3.10 and 3.11 (page 129) are equivalent by proceeding as follows: Let $\mathcal{F} = \{f : f^{-1}(O) \in \mathcal{M}$ for all open sets $O\}$. We must prove that $\mathcal{F} = \mathcal{L}$.
a) Why do we know that $\mathcal{F}$ is an algebra of functions and is closed under pointwise limits? Hint: See Theorems 3.15 and 3.16 on page 130.
b) Show that if $E \in \mathcal{M}$, then $\chi_E \in \mathcal{F}$.
c) Deduce from parts (a) and (b) that $\mathcal{F} \supset \mathcal{L}$.
d) Show that $\mathcal{F} \subset \mathcal{L}$ by employing a suitable modification of the proof given in Lemma 3.7 on page 99.

3.54 Explain why every Borel measurable function is a Lebesgue measurable function. Is the converse true? Why?

3.55 Prove Proposition 3.7 on page 129. Hint: Refer to the proof of Lemma 3.5 on page 98.

3.56 Suppose that $f$ is $\mathcal{M}$-measurable.
a) Show that $f^{-1}(B) \in \mathcal{M}$ for all $B \in \mathcal{B}$.
b) True or False: $f^{-1}(E) \in \mathcal{M}$ for all $E \in \mathcal{M}$. Hint: Refer to Exercise 3.50(g) on page 128.

3.57 Verify that if $f$ is $\mathcal{M}$-measurable, then $\{x : f(x) = a\} \in \mathcal{M}$ for each $a \in \mathcal{R}$. Show that the converse is not true. Hint: Let $K$ be a nonmeasurable set; that is, $K \notin \mathcal{M}$. Construct a function $h$ such that $\{x : h(x) = a\} \in \mathcal{M}$ for each $a \in \mathcal{R}$ and $h^{-1}((0, \infty)) = K$.

3.58 Show that if $f$ is $\mathcal{M}$-measurable, then so is $|f|$.

3.59 Show that every step function is a simple function but not every simple function is a step function.

3.60 Suppose that the sets $A_1, A_2, \ldots, A_n$ are the ones appearing in the canonical representation of a simple function $s$. Prove that those sets are Lebesgue measurable and pairwise disjoint.

3.61 Theorem 3.16(c) on page 130 indicates that if $\{f_n\}_{n=1}^\infty$ is a sequence of $\mathcal{M}$-measurable functions converging pointwise to $f$, then $f$ is $\mathcal{M}$-measurable. What can be said if the family of functions is indexed by an uncountable set? Specifically, suppose that $\{f_t\}_{t \in (0,\infty)}$ is a family of $\mathcal{M}$-measurable functions that converges pointwise to $f$; that is, $\lim_{t\to\infty} f_t(x) = f(x)$ for all $x \in \mathcal{R}$.
Is $f$ necessarily $\mathcal{M}$-measurable?

3.62 Let $E$ be a Lebesgue measurable set with $\lambda(E) < \infty$. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions that converges pointwise on $E$ to a function $f$. Prove that for each pair of positive numbers $\varepsilon$ and $\delta$, there is an $N \in \mathcal{N}$ and a Lebesgue measurable set $A \subset E$ such that $\lambda(A) < \delta$
and $|f(x) - f_n(x)| < \varepsilon$ for $x \in E \setminus A$ and $n \ge N$. Hint: Let $E_m = \{x \in E : |f(x) - f_n(x)| \ge \varepsilon$ for some $n \ge m\}$ and apply Exercise 3.35.

★3.63 Egorov's theorem: The following result shows that, in a certain sense, pointwise convergence of measurable functions is close to being uniform convergence: Let $E$ be a Lebesgue measurable set with $\lambda(E) < \infty$. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions that converges pointwise on $E$ to a real-valued function $f$. Prove that for each $\delta > 0$, there is a Lebesgue measurable set $B \subset E$ with $\lambda(B) < \delta$ such that $f_n \to f$ uniformly on $E \setminus B$. Hint: Apply Exercise 3.62 with $\varepsilon$ replaced by $1/k$ and $\delta$ replaced by $\delta/2^k$.

★3.64 Prove the following facts:
a) Suppose that $F$ is a nonempty closed subset of $\mathcal{R}$ and $O$ is a proper open subset of $\mathcal{R}$. Further suppose that $F \subset O$. Then there is a continuous function $f$ such that $f(\mathcal{R}) \subset [0, 1]$, $f(F) = \{1\}$, and $f(O^c) = \{0\}$. Hint: $f$ can be constructed from the functions $d(\cdot, F)$ and $d(\cdot, O^c)$.
b) Let $E \in \mathcal{M}$. Then there is a sequence of open sets $\{O_n\}_{n=1}^\infty$ and a sequence of closed sets $\{F_n\}_{n=1}^\infty$ such that for all $n \in \mathcal{N}$, $F_n \subset E \subset O_n$, $F_n \subset F_{n+1}$, $O_n \supset O_{n+1}$, and $\lambda\big((\bigcap_{n=1}^\infty O_n) \setminus (\bigcup_{n=1}^\infty F_n)\big) = 0$.
c) Let $E \in \mathcal{M}$. Then there is a Lebesgue measurable set $B$ with $\lambda(B) = 0$, and a sequence of continuous functions $\{g_n\}_{n=1}^\infty$ with $0 \le g_n \le 1$ for all $n \in \mathcal{N}$, such that $\lim_{n\to\infty} g_n(x) = \chi_E(x)$ for each $x \in B^c$.
d) Let $s$ be a simple function with $|s(x)| \le M$ for all $x \in \mathcal{R}$, where $M$ is a real number. Then there is a sequence of continuous functions $\{g_n\}_{n=1}^\infty$ and a Lebesgue measurable set $B$ such that $\lambda(B) = 0$, $|g_n(x)| \le M$ for $x \in \mathcal{R}$ and $n \in \mathcal{N}$, and $\lim_{n\to\infty} g_n(x) = s(x)$ for $x \in B^c$.
e) Let $f$ be a nonnegative $\mathcal{M}$-measurable function that is bounded by the real number $M$ and vanishes outside of a finite interval $[-L, L]$.
Then there is a Lebesgue measurable set $B \subset [-L, L]$ with $\lambda(B) = 0$ and a sequence of continuous functions $\{g_n\}_{n=1}^\infty$ such that $0 \le g_n(x) \le M$ for $x \in \mathcal{R}$ and $n \in \mathcal{N}$, $g_n(x) = 0$ for $x \notin [-L - 1/n, L + 1/n]$ and $n \in \mathcal{N}$, and $\lim_{n\to\infty} g_n(x) = f(x)$ for $x \in B^c$. Hint: For $k \in \mathcal{N}$ and $j = 0, 1, \ldots, k$, define
$$E_{jk} = \left\{x \in [-L, L] : \frac{jM}{k} \le f(x) < \frac{(j+1)M}{k}\right\}$$
and set $s_k = \sum_{j=0}^{k} \frac{jM}{k}\chi_{E_{jk}}$. Then $\{s_k\}_{k=1}^\infty$ is a sequence of simple functions with $0 \le s_k \le f$ and $f - s_k \le M/k$. By part (d), there is a sequence of continuous functions $\{g_{nk}\}_{n=1}^\infty$ and a Lebesgue measurable set $C_k$ such that $\lambda(C_k) = 0$, $|g_{nk}| \le M$, and $\lim_{n\to\infty} g_{nk}(x) = s_k(x)$ for $x \notin C_k$. Furthermore, the $g_{nk}$s can be chosen so that they vanish outside of $[-L - 1, L + 1]$. Now apply Exercises 3.62 and 3.63 to the sequences $\{g_{nk}\}_{n=1}^\infty$.
f) Let $f$ be a nonnegative $\mathcal{M}$-measurable function that is bounded by the real number $M$. Then there is a Lebesgue measurable set $B$ with
$\lambda(B) = 0$ and a sequence of continuous functions $\{g_n\}_{n=1}^\infty$ such that each $g_n$ vanishes outside a finite interval, $0 \le g_n(x) \le M$ for $x \in \mathcal{R}$ and $n \in \mathcal{N}$, and $\lim_{n\to\infty} g_n(x) = f(x)$ for $x \in B^c$. Hint: Apply part (e) and Exercise 3.62 to the functions $f_n = \chi_{[-n,n]}f$.
g) Let $f$ be a nonnegative $\mathcal{M}$-measurable function. Then there is a sequence of nonnegative continuous functions $\{g_n\}_{n=1}^\infty$ and a Lebesgue measurable set $B$ with $\lambda(B) = 0$, such that $\lim_{n\to\infty} g_n(x) = f(x)$ for $x \in B^c$. Hint: Apply part (f) to the function $F = f/(1 + f)$.

3.65 Let $f$ be an $\mathcal{M}$-measurable function. Then there is a sequence of continuous functions $\{g_n\}_{n=1}^\infty$ and a Lebesgue measurable set $B$ with $\lambda(B) = 0$ such that $\lim_{n\to\infty} g_n(x) = f(x)$ for $x \in B^c$. Hint: Use the fact that $f = f^+ - f^-$, where $f^+ = f \vee 0$ and $f^- = -(f \wedge 0)$, and apply Exercise 3.64(g).

3.66 Lusin's theorem: The following result shows that, in a certain sense, a measurable function is close to being a continuous function: Let $f$ be a Lebesgue measurable function and $E$ a Lebesgue measurable set with $\lambda(E) < \infty$. Assume that $\lambda(\{x \in E : f(x) = \pm\infty\}) = 0$. Prove that for each $\varepsilon > 0$, there is a Lebesgue measurable set $A \subset E$ with $\lambda(A) < \varepsilon$ such that $f$ is continuous on $E \setminus A$. Hint: Employ Exercises 3.65 and 3.63.

3.67 Suppose that $f$ is a nonnegative $\mathcal{M}$-measurable function and that $E \in \mathcal{M}$.
a) Let $c > 0$ and set $A_c = \{x \in E : f(x) \ge c\}$. Prove that
$$\lambda(A_c) \le \frac{1}{c}\int_E f\,d\lambda.$$
b) Let $A = \{x \in E : f(x) > 0\}$. Show that if $\int_E f\,d\lambda = 0$, then $\lambda(A) = 0$.

3.68 Suppose that $f$ is a nonnegative $\mathcal{M}$-measurable function and that $E \in \mathcal{M}$. Let $A = \{x \in E : f(x) = \infty\}$. Show that if $\int_E f\,d\lambda < \infty$, then $\lambda(A) = 0$.

3.6 CONVERGENCE PROPERTIES OF THE LEBESGUE INTEGRAL FOR NONNEGATIVE FUNCTIONS

An important problem in mathematics is to determine when it is permissible to interchange a limit and an integral. For example, suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of functions that converges pointwise. Under what conditions can we conclude that
$$\int \lim_{n\to\infty} f_n = \lim_{n\to\infty} \int f_n\,?$$
As we noted at the end of Chapter 2, one significant advantage of the Lebesgue integral over the Riemann integral is that the interchange of limit and integral can be justified under less restrictive conditions. In this section and the next, we will develop theorems that provide sufficient conditions for the interchange of those two operations.
3.6 Convergence properties of the integral for nonnegative functions □ 141

Monotone Convergence Theorem

The first theorem that we will discuss is called the monotone convergence theorem, or MCT for short. We begin with the following lemma.

LEMMA 3.15
Suppose that $s$ is a nonnegative simple function and that $\{E_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable sets with $E_1 \subset E_2 \subset \cdots$. Then
$$\int_{\bigcup_{n=1}^\infty E_n} s\,d\lambda = \lim_{n\to\infty} \int_{E_n} s\,d\lambda.$$

PROOF: For convenience, set $E = \bigcup_{n=1}^\infty E_n$. Since $s$ is a simple function, we can write $s = \sum_{k=1}^{m} a_k \chi_{A_k}$ (its canonical representation). Then, by Proposition 3.8 on page 131, we have, for each $n \in \mathcal{N}$,
$$\int_{E_n} s\,d\lambda = \sum_{k=1}^{m} a_k \lambda(A_k \cap E_n).$$
Now, for each $k = 1, 2, \ldots, m$, consider the sequence $\{A_k \cap E_n\}_{n=1}^\infty$ of Lebesgue measurable sets. Since $E_1 \subset E_2 \subset \cdots$ and $E = \bigcup_{n=1}^\infty E_n$, it follows that $A_k \cap E_1 \subset A_k \cap E_2 \subset \cdots$ and $\bigcup_{n=1}^\infty (A_k \cap E_n) = A_k \cap E$. Thus, by Theorem 3.13 on page 124, $\lim_{n\to\infty} \lambda(A_k \cap E_n) = \lambda(A_k \cap E)$ for each $k$ ($1 \le k \le m$). Consequently,
$$\lim_{n\to\infty} \int_{E_n} s\,d\lambda = \lim_{n\to\infty} \sum_{k=1}^{m} a_k \lambda(A_k \cap E_n) = \sum_{k=1}^{m} a_k \lim_{n\to\infty} \lambda(A_k \cap E_n) = \sum_{k=1}^{m} a_k \lambda(A_k \cap E) = \int_E s\,d\lambda.$$
This completes the proof of the lemma.

Before we state and prove the monotone convergence theorem (MCT), it will be useful to introduce two common conventions. First, if the integral of a function $f$ is over all of $\mathcal{R}$, then the $\mathcal{R}$ is often omitted; in other words, by convention, $\int f\,d\lambda = \int_{\mathcal{R}} f\,d\lambda$. Second, we sometimes write $f_n \uparrow f$ to indicate that $\{f_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of functions that converges pointwise to the function $f$. And, likewise, we sometimes write $f_n \downarrow f$ to indicate that $\{f_n\}_{n=1}^\infty$ is a monotone nonincreasing sequence of functions that converges pointwise to the function $f$.
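Lemma 3.15 is a continuity-from-below statement, and for interval data it can be watched numerically. The following Python sketch is only an illustration under assumptions not in the text: the simple function $s = 3\chi_{[0,2)} + \chi_{[5,10)}$ and the sets $E_n = [-n, n)$ are hypothetical choices, and (3.24) is evaluated through overlap lengths of half-open intervals.

```python
import math


def integral_simple(terms, E):
    """Formula (3.24) for pairwise disjoint B_k = [a, b) and E = [c, d)."""
    c, d = E
    return sum(bk * max(0, min(b, d) - max(a, c)) for bk, (a, b) in terms)


s = [(3, (0, 2)), (1, (5, 10))]                                # s = 3*chi_[0,2) + chi_[5,10)
values = [integral_simple(s, (-n, n)) for n in range(1, 13)]   # E_n = [-n, n), nested
whole = integral_simple(s, (-math.inf, math.inf))              # the union of the E_n is R
print(values)   # nondecreasing in n
print(whole)    # 3*2 + 1*5 = 11
```

The values climb and then stay at $11$ once $E_n \supset [0, 10)$, matching $\int_{\mathcal{R}} s\,d\lambda = 3 \cdot 2 + 1 \cdot 5 = 11$ from Definition 3.13.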
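THEOREM 3.17 Monotone Convergence Theorem (MCT)
Suppose that $\{f_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of nonnegative Lebesgue measurable functions that converges pointwise to a real-valued function; in other words, for each $x \in \mathcal{R}$,
$$0 \le f_1(x) \le f_2(x) \le \cdots \le f_n(x) \le \cdots$$
and $\lim_{n\to\infty} f_n(x) < \infty$.† Then
$$\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda$$
for each $E \in \mathcal{M}$.

†Since, for each $x \in \mathcal{R}$, $\{f_n(x)\}_{n=1}^\infty$ is monotone nondecreasing, $\lim_{n\to\infty} f_n(x)$ exists, but it may be $\infty$. We assume here that the limit is finite for each $x \in \mathcal{R}$ although, as we will learn in Chapter 4, the theorem is also true without that restriction.

PROOF: For convenience, set $f = \lim_{n\to\infty} f_n$. For each $E \in \mathcal{M}$, we have $0 \le \chi_E f_n \uparrow \chi_E f$. Hence, it suffices to prove the theorem for $E = \mathcal{R}$. Because $f_n \le f_{n+1}$ for all $n \in \mathcal{N}$, Proposition 3.10(a) on page 136 implies that $\int f_n\,d\lambda \le \int f_{n+1}\,d\lambda$ for all $n \in \mathcal{N}$. Thus, $\lim_{n\to\infty} \int f_n\,d\lambda$ exists (possibly infinite). Let $L = \lim_{n\to\infty} \int f_n\,d\lambda$. We must show that $L = \int f\,d\lambda$.

First, $f_n \le f$ for all $n \in \mathcal{N}$, so it follows immediately from Proposition 3.10(a) that $\int f_n\,d\lambda \le \int f\,d\lambda$ for all $n$ and, hence, that $L \le \int f\,d\lambda$.

To establish the reverse inequality, let $0 < \alpha < 1$ and let $s$ be a nonnegative simple function dominated by $f$. Set $E_n = \{x : f_n(x) \ge \alpha s(x)\}$ for each $n \in \mathcal{N}$. Since $f_1 \le f_2 \le \cdots$, it is clear that $E_1 \subset E_2 \subset \cdots$. Also, because $0 < \alpha < 1$, $f_n \uparrow f$, and $0 \le s \le f$, it follows that $\bigcup_{n=1}^\infty E_n = \mathcal{R}$ (if $s(x) = 0$, then $x \in E_1$; if $s(x) > 0$, then $\alpha s(x) < f(x)$, so $f_n(x) \ge \alpha s(x)$ for all sufficiently large $n$). Applying Proposition 3.10(e) and Lemma 3.15, we conclude that
$$\alpha \int_{\mathcal{R}} s\,d\lambda = \alpha \lim_{n\to\infty} \int_{E_n} s\,d\lambda = \lim_{n\to\infty} \int_{E_n} \alpha s\,d\lambda \le \limsup_{n\to\infty} \int_{E_n} f_n\,d\lambda \le \lim_{n\to\infty} \int_{\mathcal{R}} f_n\,d\lambda = L.$$
Consequently, $\int_{\mathcal{R}} s\,d\lambda \le \alpha^{-1}L$ for each nonnegative simple function $s$ that is dominated by $f$. This implies that
$$\int_{\mathcal{R}} f\,d\lambda = \sup_{\substack{0 \le s \le f \\ s \text{ simple}}} \int_{\mathcal{R}} s\,d\lambda \le \alpha^{-1}L$$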
for each $0 < \alpha < 1$. Letting $\alpha \uparrow 1$ yields $\int f\,d\lambda \le L$. This completes the proof of the theorem.

Note: For a fixed $E \in \mathcal{M}$, the conclusion of the MCT remains valid if the hypotheses are satisfied only on $E$. (See Exercise 3.72.)

Proposition 3.10 on page 136 lists several properties of the Lebesgue integral for nonnegative functions. Conspicuous by its absence is the additivity property. By employing the MCT and Proposition 3.9 on page 134, we can now establish that important property.

PROPOSITION 3.11
Let $f$ and $g$ be nonnegative Lebesgue measurable functions. Then
$$\int_E (f + g)\,d\lambda = \int_E f\,d\lambda + \int_E g\,d\lambda$$
for each $E \in \mathcal{M}$.

PROOF: We first observe that, by Lemma 3.14 (page 133), the additivity property holds for simple functions. Next, we use Proposition 3.9 to select sequences of nonnegative simple functions $\{s_n\}_{n=1}^\infty$ and $\{t_n\}_{n=1}^\infty$ such that $s_n \uparrow f$ and $t_n \uparrow g$. Noting that $s_n + t_n \uparrow f + g$, we can apply the MCT and Lemma 3.14 to conclude that
$$\int_E (f + g)\,d\lambda = \lim_{n\to\infty} \int_E (s_n + t_n)\,d\lambda = \lim_{n\to\infty} \int_E s_n\,d\lambda + \lim_{n\to\infty} \int_E t_n\,d\lambda = \int_E f\,d\lambda + \int_E g\,d\lambda,$$
as required.

By induction, it follows immediately from Proposition 3.11 that if $\{f_k\}_{k=1}^n$ is a finite sequence of nonnegative $\mathcal{M}$-measurable functions, then
$$\int_E \sum_{k=1}^{n} f_k\,d\lambda = \sum_{k=1}^{n} \int_E f_k\,d\lambda \qquad (3.27)$$
for each $E \in \mathcal{M}$. However, with the aid of the MCT, we can prove the following stronger result.
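THEOREM 3.18
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative Lebesgue measurable functions such that $\sum_{n=1}^\infty f_n$ converges to a real-valued function.‡ Then, for each $E \in \mathcal{M}$,
$$\int_E \sum_{n=1}^\infty f_n\,d\lambda = \sum_{n=1}^\infty \int_E f_n\,d\lambda.$$

‡See the footnote on page 142.

PROOF: For convenience, set $f = \sum_{n=1}^\infty f_n$ and let $g_n = \sum_{k=1}^{n} f_k$ for each $n \in \mathcal{N}$. Then $\{g_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of nonnegative Lebesgue measurable functions and $g_n \uparrow f$. Thus, by the MCT and (3.27),
$$\int_E f\,d\lambda = \lim_{n\to\infty} \int_E g_n\,d\lambda = \lim_{n\to\infty} \int_E \sum_{k=1}^{n} f_k\,d\lambda = \lim_{n\to\infty} \sum_{k=1}^{n} \int_E f_k\,d\lambda = \sum_{n=1}^\infty \int_E f_n\,d\lambda,$$
as required.

COROLLARY 3.3
Let $f$ be a nonnegative Lebesgue measurable function and $\{E_n\}_n$ be a sequence of pairwise disjoint Lebesgue measurable sets. Then
$$\int_{\bigcup_n E_n} f\,d\lambda = \sum_n \int_{E_n} f\,d\lambda.$$
In particular, if $A, B \in \mathcal{M}$ and $A \cap B = \emptyset$, then
$$\int_{A \cup B} f\,d\lambda = \int_A f\,d\lambda + \int_B f\,d\lambda. \qquad (3.28)$$

PROOF: Because the $E_n$s are pairwise disjoint, $\chi_{\bigcup_n E_n} = \sum_n \chi_{E_n}$ and, so, $\chi_{\bigcup_n E_n} \cdot f = \sum_n (\chi_{E_n} f)$. Therefore, by Theorem 3.18,
$$\int_{\bigcup_n E_n} f\,d\lambda = \int \chi_{\bigcup_n E_n} f\,d\lambda = \int \sum_n (\chi_{E_n} f)\,d\lambda = \sum_n \int \chi_{E_n} f\,d\lambda = \sum_n \int_{E_n} f\,d\lambda.$$
The proof of Corollary 3.3 is now complete.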
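Here is a finite instance of Corollary 3.3 (equivalently, repeated use of (3.28)), computed under stated assumptions. In this Python sketch, the nonnegative simple function $f = 2\chi_{[0,3/2)} + 5\chi_{[3/2,4)}$ and the pairwise disjoint sets $E_n = [n-1, n)$, $n = 1, \ldots, 6$, are hypothetical choices; each integral is computed with (3.24) via overlap lengths.

```python
from fractions import Fraction


def integral_simple(terms, E):
    """Formula (3.24) for pairwise disjoint B_k = [a, b) and E = [c, d)."""
    c, d = E
    return sum(bk * max(0, min(b, d) - max(a, c)) for bk, (a, b) in terms)


f = [(2, (Fraction(0), Fraction(3, 2))), (5, (Fraction(3, 2), Fraction(4)))]
pieces = [integral_simple(f, (n - 1, n)) for n in range(1, 7)]   # over E_n = [n-1, n)
total = integral_simple(f, (0, 6))                               # over [0, 6), the union
print(pieces)
print(sum(pieces), total)   # both equal 31/2
```

The integrals over the disjoint pieces add up to the integral over their union, $2 \cdot \tfrac32 + 5 \cdot \tfrac52 = \tfrac{31}{2}$.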
Remark: Equation (3.28) generalizes the property of Riemann integrals that we presented in Theorem 2.6(a) on page 84.

Further Convergence Properties

The MCT shows that it is permissible to interchange limit and integral for monotone nondecreasing sequences of nonnegative Lebesgue measurable functions. Two additional questions concerning integrals and sequences of nonnegative functions come to mind.

Question 1: Suppose that $\{f_n\}_{n=1}^\infty$ is a monotone nonincreasing sequence of nonnegative Lebesgue measurable functions; in other words, for each $x \in \mathcal{R}$,
$$f_1(x) \ge f_2(x) \ge \cdots \ge f_n(x) \ge \cdots \ge 0.$$
Is it true that the limit and integral can be interchanged; that is, does
$$\int \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int f_n\,d\lambda\,? \qquad (3.29)$$

The answer to Question 1 is no: in general, the limit and the integral cannot be interchanged! For example, define $f_n(x) = |x|/n$ for each $x \in \mathcal{R}$ and $n \in \mathcal{N}$. Then $f_n \downarrow 0$ pointwise; thus, $\int \lim_{n\to\infty} f_n\,d\lambda = 0$. But $\int f_n\,d\lambda = \infty$ for all $n \in \mathcal{N}$: indeed, $f_n \ge \frac{1}{n}\chi_{\{x : |x| \ge 1\}}$, so, by Proposition 3.10(a), $\int f_n\,d\lambda \ge \frac{1}{n}\lambda(\{x : |x| \ge 1\}) = \infty$. Hence, $\lim_{n\to\infty} \int f_n\,d\lambda = \infty$. Consequently, (3.29) fails in this case.

With an additional condition, however, we can answer Question 1 in the affirmative. Specifically, we have the following theorem.

THEOREM 3.19
Suppose that $\{f_n\}_{n=1}^\infty$ is a monotone nonincreasing sequence of nonnegative Lebesgue measurable functions. Further suppose that $\int f_1\,d\lambda < \infty$. Then
$$\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda$$
for each $E \in \mathcal{M}$.

PROOF: For convenience, set $f = \lim_{n\to\infty} f_n$. As $\{f_n\}_{n=1}^\infty$ is monotone nonincreasing, $\{f_1 - f_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of nonnegative Lebesgue measurable functions. Therefore, by the MCT,
$$\int_E (f_1 - f)\,d\lambda = \lim_{n\to\infty} \int_E (f_1 - f_n)\,d\lambda.$$
Because $\int f_1\,d\lambda < \infty$ and $f_n \le f_1$ for all $n \in \mathcal{N}$, Proposition 3.10 on page 136 implies that $\int_E f_n\,d\lambda < \infty$ for $n \in \mathcal{N}$ and $E \in \mathcal{M}$. Also, by Proposition 3.11,
$$\int_E f_1\,d\lambda = \int_E ((f_1 - f_n) + f_n)\,d\lambda = \int_E (f_1 - f_n)\,d\lambda + \int_E f_n\,d\lambda.$$
Consequently, we see that $\int_E (f_1 - f_n)\,d\lambda = \int_E f_1\,d\lambda - \int_E f_n\,d\lambda$. This last equality also holds when $f_n$ is replaced by $f$. It now follows that
$$\int_E f_1\,d\lambda - \int_E f\,d\lambda = \int_E f_1\,d\lambda - \lim_{n\to\infty} \int_E f_n\,d\lambda.$$
Since all integrals in the previous equation are finite, the proof of the theorem is now complete.

Question 2: Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative Lebesgue measurable functions that converges pointwise to a real-valued function. Does a general relationship hold between the sequence $\{\int_E f_n\,d\lambda\}_{n=1}^\infty$ and the number $\int_E \lim_{n\to\infty} f_n\,d\lambda$? Of course, $\lim_{n\to\infty} \int_E f_n\,d\lambda$ need not exist and, so, (3.29) may not even make sense. The most that one can say in general is given by the following theorem.

THEOREM 3.20 Fatou's Lemma
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative Lebesgue measurable functions that converges pointwise to a real-valued function. Then, for each $E \in \mathcal{M}$,
$$\int_E \lim_{n\to\infty} f_n\,d\lambda \le \liminf_{n\to\infty} \int_E f_n\,d\lambda. \qquad (3.30)$$

PROOF: For convenience, set $f = \lim_{n\to\infty} f_n$ and let $g_n = \inf_{k \ge n} f_k$ for each $n \in \mathcal{N}$. Then $\{g_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of nonnegative Lebesgue measurable functions and $g_n \uparrow f$ pointwise (why?). Thus, by the MCT,
$$\int_E f\,d\lambda = \lim_{n\to\infty} \int_E g_n\,d\lambda.$$
However, since $g_n \le f_n$ for each $n \in \mathcal{N}$, it follows from Proposition 3.10(a) that
$$\lim_{n\to\infty} \int_E g_n\,d\lambda \le \liminf_{n\to\infty} \int_E f_n\,d\lambda.$$
The proof of Fatou's lemma is now complete.
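One standard way for the inequality in (3.30) to be strict is for the mass of $f_n$ to pile up near a single point. The Python sketch below checks this for the spike sequence $f_n = n\chi_{(0,1/n]}$, a hypothetical example not taken from the text. By Definition 3.13, each $\int f_n\,d\lambda = n \cdot \frac{1}{n} = 1$, while $f_n(x) = 0$ as soon as $n > 1/x$ (and always when $x \le 0$), so the pointwise limit is $0$.

```python
import math
from fractions import Fraction


def f(n, x):
    """Hypothetical spike sequence f_n = n * chi_(0, 1/n]."""
    return n if 0 < x <= Fraction(1, n) else 0


def integral_f(n):
    """Definition 3.13 with a single term: n * lambda((0, 1/n]) = n * (1/n)."""
    return n * Fraction(1, n)


def zero_from(x):
    """Smallest N such that f_n(x) = 0 for every n >= N."""
    return 1 if x <= 0 else math.floor(1 / x) + 1


integrals = [integral_f(n) for n in range(1, 101)]
samples = [Fraction(-1), Fraction(0), Fraction(1, 1000), Fraction(1, 2), Fraction(3)]
print(set(integrals))                        # every integral is 1, so the liminf is 1
print([(x, zero_from(x)) for x in samples])  # f_n(x) is eventually 0 at each sample
```

Hence $\int \lim_{n\to\infty} f_n\,d\lambda = 0 < 1 = \liminf_{n\to\infty} \int f_n\,d\lambda$; without monotonicity, nothing forces the mass of $f_n$ to survive in the limit.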
EXAMPLE 3.4 Illustrates Strict Inequality in Fatou's Lemma
This example shows that the inequality in (3.30) cannot be replaced by an equality. For each $n \in \mathcal{N}$, define
$$f_n = \begin{cases} \chi_{[n,n+1]}, & n \text{ odd}; \\ \chi_{[n,n+2]}, & n \text{ even}. \end{cases}$$
Then $f_n \to 0$ pointwise and, hence, in particular, $\int \lim_{n\to\infty} f_n\,d\lambda = 0$. But
$$\int f_n\,d\lambda = \begin{cases} 1, & n \text{ odd}; \\ 2, & n \text{ even}, \end{cases}$$
and, so, $\liminf_{n\to\infty} \int f_n\,d\lambda = 1$. Thus, we see that
$$\int \lim_{n\to\infty} f_n\,d\lambda < \liminf_{n\to\infty} \int f_n\,d\lambda.$$
Consequently, the inequality in Fatou's lemma cannot be replaced by an equality. □

EXERCISES 3.6

3.69 Let $f$ be a nonnegative Lebesgue measurable function. Show that
$$\lim_{n\to\infty} \int_{[-n,n]} f\,d\lambda = \int_{\mathcal{R}} f\,d\lambda.$$

3.70 Let $f$ be a nonnegative Lebesgue measurable function. For each $n \in \mathcal{N}$, define $f_n = f \wedge n$. Prove that $\lim_{n\to\infty} \int_E f_n\,d\lambda = \int_E f\,d\lambda$ for each $E \in \mathcal{M}$.

★3.71 Prove that Lemma 3.15 holds for all nonnegative $\mathcal{M}$-measurable functions. That is, if $f$ is a nonnegative Lebesgue measurable function and $\{E_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable sets with $E_1 \subset E_2 \subset \cdots$, then
$$\int_{\bigcup_{n=1}^\infty E_n} f\,d\lambda = \lim_{n\to\infty} \int_{E_n} f\,d\lambda.$$

3.72 Show that for a fixed $E \in \mathcal{M}$, the conclusion of the MCT remains valid if the hypotheses are satisfied only on $E$. In other words, let $E$ be a Lebesgue measurable set and $\{f_n\}_{n=1}^\infty$ a sequence of nonnegative Lebesgue measurable functions that is monotone nondecreasing on $E$ and converges pointwise to a real-valued function on $E$; that is, for each $x \in E$,
$$0 \le f_1(x) \le f_2(x) \le \cdots \le f_n(x) \le \cdots$$
and $\lim_{n\to\infty} f_n(x) < \infty$. Prove that
$$\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda.$$

3.73 Provide an example where strict inequality holds in Fatou's lemma and where $\lim_{n\to\infty} \int f_n\,d\lambda$ exists.

3.74 Let $\{f_n\}_{n=1}^\infty$ be a sequence of nonnegative $\mathcal{M}$-measurable functions such that $f_n \to f$ pointwise and $\int f_n\,d\lambda \to \int f\,d\lambda < \infty$. Show that for each $E \in \mathcal{M}$, $\int_E f_n\,d\lambda \to \int_E f\,d\lambda$. Hint: Use Fatou's lemma and the inequality $\limsup_{n\to\infty}(a_n + b_n) \ge \limsup_{n\to\infty} a_n + \liminf_{n\to\infty} b_n$.

3.75 Suppose $f$ is a nonnegative $\mathcal{M}$-measurable function and $\{E_n\}_{n=1}^\infty \subset \mathcal{M}$ with $E_1 \supset E_2 \supset \cdots$. Further suppose $\int_{E_1} f\,d\lambda < \infty$. Prove that
$$\int_{\bigcap_{n=1}^\infty E_n} f\,d\lambda = \lim_{n\to\infty} \int_{E_n} f\,d\lambda.$$
Hint: Apply Theorem 3.19.

3.76 Supply a proof for the following improved version of Fatou's lemma: Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative Lebesgue measurable functions. Then
$$\int_E \liminf_{n\to\infty} f_n\,d\lambda \le \liminf_{n\to\infty} \int_E f_n\,d\lambda$$
for each $E \in \mathcal{M}$.

★3.77 Suppose $f$ is a nonnegative $\mathcal{M}$-measurable function with $\int_{[0,\infty)} f\,d\lambda < \infty$. Then we define the Laplace transform of $f$, denoted $F$, by
$$F(t) = \int_{[0,\infty)} e^{-tx} f(x)\,d\lambda(x), \qquad t \ge 0.$$
Show that
a) $F$ is real valued.
b) $F$ is continuous on $[0, \infty)$. Hint: First establish that $F$ is nonincreasing.
c) $\lim_{t\to\infty} F(t) = 0$.

3.78 Establish the following results:
a) If $O$ is an open set, then $\lambda(O) = \sup\{\int f\,d\lambda : 0 \le f \le \chi_O$ and $f$ continuous$\}$. Hint: Consider $f_n(x) = \big(d(x, O^c)/[1 + d(x, O^c)]\big)^{1/n}$, for $n \in \mathcal{N}$.
3.7 The General Lebesgue Integral □ 149

b) If $F$ is a closed set, then $\lambda(F) = \inf\{\int f\,d\lambda : f \ge \chi_F$ and $f$ continuous$\}$. Hint: If $\lambda(F) < \infty$, select an appropriate open set $O \supset F$, and consider $f_n(x) = \big(d(x, O^c)/[d(x, O^c) + d(x, F)]\big)^n$, for $n \in \mathcal{N}$.

3.79 In the next section, we will see how to define the Lebesgue integral for Lebesgue measurable functions that are not necessarily nonnegative. Assuming that can be done, construct a sequence of Lebesgue measurable functions for which the conclusion of Fatou's lemma fails. Hint: A sequence consisting of characteristic functions and negatives of characteristic functions will do the trick.

3.7 THE GENERAL LEBESGUE INTEGRAL

Up to this point, we have defined the Lebesgue integral only for nonnegative Lebesgue measurable functions. In this section, we will define the Lebesgue integral for arbitrary Lebesgue measurable functions and present some of its most important properties.

Definition of the General Lebesgue Integral

Basically, the Lebesgue integral of an arbitrary $\mathcal{M}$-measurable function $f$ is obtained as follows: (1) express $f$ as the difference of two nonnegative functions and (2) define the Lebesgue integral of $f$ to be the difference of the Lebesgue integrals of the two nonnegative functions. To make this idea precise, we begin by defining the positive and negative parts of a function.

DEFINITION 3.15 Positive and Negative Parts of a Function
Suppose that $f$ is a real-valued function. Then the positive part of $f$, denoted by $f^+$, is defined by $f^+ = f \vee 0 = \max\{f, 0\}$ and the negative part of $f$, denoted by $f^-$, is defined by $f^- = -(f \wedge 0) = -\min\{f, 0\}$.

Note that both $f^+$ and $f^-$ are nonnegative functions. Proposition 3.12 states some other basic properties of the positive and negative parts of a function. The proof of the proposition is left as an exercise for the reader.
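The decomposition in Definition 3.15 is easy to express directly. This short Python sketch (the sample function $f(x) = x^3 - 2x$ is a hypothetical choice) builds $f^+$ and $f^-$ as new functions and checks, at integer sample points, the identities $f = f^+ - f^-$ and $|f| = f^+ + f^-$, together with the fact that $f^+$ and $f^-$ are never simultaneously nonzero.

```python
def positive_part(f):
    """f^+ = f ∨ 0 = max{f, 0}."""
    return lambda x: max(f(x), 0)


def negative_part(f):
    """f^- = -(f ∧ 0) = -min{f, 0}."""
    return lambda x: -min(f(x), 0)


f = lambda x: x**3 - 2 * x          # hypothetical sample function
fp, fm = positive_part(f), negative_part(f)
for x in range(-3, 4):
    # f = f^+ - f^-, |f| = f^+ + f^-, and f^+ * f^- = 0 at every point
    print(x, f(x), fp(x), fm(x))
```

Both parts are nonnegative by construction, which is what lets the integral of Definition 3.14 be applied to each of them separately.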
PROPOSITION 3.12
Suppose that $f$ is a real-valued function on $\mathcal R$. Then
a) $f = f^+ - f^-$.
b) $|f| = f^+ + f^-$.
c) If $f$ is Lebesgue measurable, then so are $f^+$ and $f^-$.

We now see that if $f$ is a Lebesgue measurable function, then it can be expressed as the difference of two nonnegative Lebesgue measurable functions; namely, $f = f^+ - f^-$. Consequently, it is quite natural to define the Lebesgue integral of an arbitrary Lebesgue measurable function in the following way:

DEFINITION 3.16 Lebesgue Integral; Lebesgue Integrable
Let $f$ be a Lebesgue measurable function and $E \in \mathcal M$. Then the Lebesgue integral of $f$ over $E$ is defined by
$$\int_E f(x)\,d\lambda(x) = \int_E f^+(x)\,d\lambda(x) - \int_E f^-(x)\,d\lambda(x) \qquad (3.31)$$
provided that the right-hand side makes sense; that is, at least one of the integrals on the right-hand side of (3.31) is finite. In addition, we say that $f$ is Lebesgue integrable over $E$ if both integrals on the right-hand side of (3.31) are finite or, equivalently, if
$$\int_E |f(x)|\,d\lambda(x) = \int_E f^+(x)\,d\lambda(x) + \int_E f^-(x)\,d\lambda(x) < \infty. \qquad (3.32)$$
If $f$ is Lebesgue integrable over $\mathcal R$, then we say that $f$ is Lebesgue integrable.

EXAMPLE 3.5 Illustrates Definition 3.16
a) Let
$$f(x) = \begin{cases} 1, & x \ge 0; \\ -1, & x < 0. \end{cases}$$
Then $\int_{\mathcal R} f\,d\lambda$ is not defined. Indeed, $f^+ = \chi_{[0,\infty)}$ and $f^- = \chi_{(-\infty,0)}$, so that both $\int_{\mathcal R} f^+\,d\lambda$ and $\int_{\mathcal R} f^-\,d\lambda$ are infinite. However, the Lebesgue integral of $f$ is defined (and, in fact, $f$ is Lebesgue integrable) over any Lebesgue measurable set with finite measure. For instance, if
$E = [-3, 4]$, then $\int_E f^+\,d\lambda = 4$ and $\int_E f^-\,d\lambda = 3$, so that we have $\int_E f\,d\lambda = 4 - 3 = 1$ and $\int_E |f|\,d\lambda = 4 + 3 = 7 < \infty$.

b) We can generalize part (a): If $f$ is a bounded Lebesgue measurable function, then $f$ is Lebesgue integrable over any measurable set $E$ with $\lambda(E) < \infty$. For, if $|f| \le L$, then by Proposition 3.10(a),
$$\int_E |f|\,d\lambda \le \int_E L\,d\lambda = \int_{\mathcal R} L\chi_E\,d\lambda = L\lambda(E) < \infty.$$

c) Let
$$f(x) = \begin{cases} 2, & 0 < x < 1; \\ -3, & x \ge 1; \\ 0, & \text{elsewhere.} \end{cases}$$
Then $f^+ = 2\chi_{(0,1)}$ and $f^- = 3\chi_{[1,\infty)}$ and, so, $\int_{\mathcal R} f^+\,d\lambda = 2$ and $\int_{\mathcal R} f^-\,d\lambda = \infty$. This implies that $f$ is not Lebesgue integrable over $\mathcal R$, although the Lebesgue integral is defined and $\int_{\mathcal R} f\,d\lambda = 2 - \infty = -\infty$. □

Properties of the General Lebesgue Integral

The next theorem provides some important properties of Lebesgue integrable functions. In proving this theorem, we will employ the following lemma, whose proof we leave as an exercise for the reader.

LEMMA 3.16
Suppose that $f$ is a Lebesgue measurable function and that $E \in \mathcal M$. Further suppose that $f = f_1 - f_2$, where $f_1$ and $f_2$ are nonnegative and Lebesgue integrable over $E$. Then $f$ is Lebesgue integrable over $E$ and
$$\int_E f\,d\lambda = \int_E f_1\,d\lambda - \int_E f_2\,d\lambda.$$

THEOREM 3.21
Suppose that $f$ and $g$ are Lebesgue integrable over $E \in \mathcal M$ and that $\alpha \in \mathcal R$. Then
a) $f + g$ is Lebesgue integrable over $E$ and
$$\int_E (f + g)\,d\lambda = \int_E f\,d\lambda + \int_E g\,d\lambda.$$
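b) $\alpha f$ is Lebesgue integrable over $E$ and $\int_E \alpha f\,d\lambda = \alpha \int_E f\,d\lambda$.
c) $f \le g \Rightarrow \int_E f\,d\lambda \le \int_E g\,d\lambda$.
d) $\left| \int_E f\,d\lambda \right| \le \int_E |f|\,d\lambda$.
e) If $A$ and $B$ are measurable subsets of $E$ with $A \cap B = \emptyset$, then
$$\int_{A \cup B} f\,d\lambda = \int_A f\,d\lambda + \int_B f\,d\lambda.$$

PROOF: a) Since $|f + g| \le |f| + |g|$, it follows from Proposition 3.10(a) on page 136 and Proposition 3.11 on page 143 that $f + g$ is Lebesgue integrable over $E$. Now, we have $f + g = (f^+ + g^+) - (f^- + g^-)$. Hence, by Lemma 3.16 and Proposition 3.11, we conclude that
$$\begin{aligned}
\int_E (f + g)\,d\lambda &= \int_E (f^+ + g^+)\,d\lambda - \int_E (f^- + g^-)\,d\lambda \\
&= \int_E f^+\,d\lambda + \int_E g^+\,d\lambda - \int_E f^-\,d\lambda - \int_E g^-\,d\lambda \\
&= \int_E f^+\,d\lambda - \int_E f^-\,d\lambda + \int_E g^+\,d\lambda - \int_E g^-\,d\lambda \\
&= \int_E f\,d\lambda + \int_E g\,d\lambda.
\end{aligned}$$

b) Since $|\alpha f| = |\alpha||f|$, Proposition 3.10(e) implies that $\alpha f$ is Lebesgue integrable over $E$. If $\alpha \ge 0$, then $(\alpha f)^+ = \alpha f^+$ and $(\alpha f)^- = \alpha f^-$. Thus, by Proposition 3.10(e) again,
$$\int_E \alpha f\,d\lambda = \int_E \alpha f^+\,d\lambda - \int_E \alpha f^-\,d\lambda = \alpha \int_E f^+\,d\lambda - \alpha \int_E f^-\,d\lambda = \alpha \int_E f\,d\lambda.$$
If $\alpha < 0$, then $(\alpha f)^+ = -\alpha f^-$ and $(\alpha f)^- = -\alpha f^+$. Consequently, by Proposition 3.10(e),
$$\int_E \alpha f\,d\lambda = \int_E (-\alpha f^-)\,d\lambda - \int_E (-\alpha f^+)\,d\lambda = \alpha\left( \int_E f^+\,d\lambda - \int_E f^-\,d\lambda \right) = \alpha \int_E f\,d\lambda.$$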
c) $f \le g \Rightarrow g - f \ge 0 \Rightarrow \int_E (g - f)\,d\lambda \ge 0$ by Proposition 3.10(a). Now applying parts (a) and (b), we deduce that
$$\int_E g\,d\lambda - \int_E f\,d\lambda = \int_E (g - f)\,d\lambda \ge 0.$$
In other words, $\int_E f\,d\lambda \le \int_E g\,d\lambda$.

d) Because $f \le |f|$ and $-f \le |f|$, we can use parts (b) and (c) to conclude that
$$\int_E f\,d\lambda \le \int_E |f|\,d\lambda \qquad \text{and} \qquad -\int_E f\,d\lambda \le \int_E |f|\,d\lambda.$$
These last two relations imply that $\left| \int_E f\,d\lambda \right| \le \int_E |f|\,d\lambda$.

e) Since $A \subset E$, $|\chi_A f| \le |\chi_E f|$ and, so, $\int_A |f|\,d\lambda \le \int_E |f|\,d\lambda$. Thus, $f$ is integrable over $A$ or, equivalently, $\chi_A f$ is integrable over $E$. Similarly, $\chi_B f$ is integrable over $E$. However, because $A \cap B = \emptyset$, we have that $\chi_{A \cup B} f = \chi_A f + \chi_B f$. Therefore, by part (a),
$$\int_E \chi_{A \cup B} f\,d\lambda = \int_E \chi_A f\,d\lambda + \int_E \chi_B f\,d\lambda.$$
Since $A$ and $B$ are subsets of $E$, the previous equation is equivalent to
$$\int_{A \cup B} f\,d\lambda = \int_A f\,d\lambda + \int_B f\,d\lambda.$$
This completes the proof of the theorem.

Remark: Parts (a) and (b) of Theorem 3.21 together imply that if $\alpha, \beta \in \mathcal R$ and $f$ and $g$ are Lebesgue integrable over $E$, then
$$\int_E (\alpha f + \beta g)\,d\lambda = \alpha \int_E f\,d\lambda + \beta \int_E g\,d\lambda.$$
This is called the linearity property of the Lebesgue integral.

The next theorem, called the dominated convergence theorem, or DCT for short, is one of the most important theorems in analysis. Like the monotone convergence theorem, it gives sufficient conditions for the interchange of limit and integral.
THEOREM 3.22 Dominated Convergence Theorem (DCT)
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions that converges pointwise to a real-valued function. Further suppose that there is a nonnegative Lebesgue integrable function $g$ such that $|f_n| \le g$ for all $n \in \mathcal N$. Then
$$\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda$$
for each $E \in \mathcal M$.

PROOF: For convenience, set $f = \lim_{n\to\infty} f_n$. Because $|f_n| \le g$ and $g$ is Lebesgue integrable, it follows that $f, f_1, f_2, \ldots$ are Lebesgue integrable. Now, $g - f_n \ge 0$ for all $n \in \mathcal N$ and $g - f_n \to g - f$ pointwise. Thus, by Fatou's lemma (page 146) and the linearity of the integral,
$$\int_E g\,d\lambda - \int_E f\,d\lambda = \int_E (g - f)\,d\lambda \le \liminf_{n\to\infty} \int_E (g - f_n)\,d\lambda = \int_E g\,d\lambda - \limsup_{n\to\infty} \int_E f_n\,d\lambda.$$
Since the previous integrals are all finite, we conclude that
$$\limsup_{n\to\infty} \int_E f_n\,d\lambda \le \int_E f\,d\lambda. \qquad (3.33)$$
On the other hand, we also have that $g + f_n \ge 0$ for all $n \in \mathcal N$ and $g + f_n \to g + f$ pointwise. Applying Fatou's lemma again, we obtain the relations
$$\int_E g\,d\lambda + \int_E f\,d\lambda = \int_E (g + f)\,d\lambda \le \liminf_{n\to\infty} \int_E (g + f_n)\,d\lambda = \int_E g\,d\lambda + \liminf_{n\to\infty} \int_E f_n\,d\lambda$$
or, in other words,
$$\int_E f\,d\lambda \le \liminf_{n\to\infty} \int_E f_n\,d\lambda. \qquad (3.34)$$
From (3.33) and (3.34), we see that
$$\limsup_{n\to\infty} \int_E f_n\,d\lambda = \liminf_{n\to\infty} \int_E f_n\,d\lambda = \int_E f\,d\lambda.$$
This last fact implies that $\lim_{n\to\infty} \int_E f_n\,d\lambda$ exists and
$$\int_E f\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda,$$
as required.

Note: For a fixed $E \in \mathcal M$, the conclusion of the DCT remains valid if the hypotheses are satisfied only on $E$. (See Exercise 3.87.)

EXAMPLE 3.6 Illustrates the DCT
a) In general, the conclusion of the DCT may fail if there is no dominating integrable function $g$. For instance, let $f_n = n\chi_{(0,1/n)}$. Then $f_n \to 0$ pointwise. Moreover, $\int f_n\,d\lambda = 1$ for all $n \in \mathcal N$. Thus,
$$\int \lim_{n\to\infty} f_n\,d\lambda = 0 \ne 1 = \lim_{n\to\infty} \int f_n\,d\lambda.$$
The problem here is that there is no integrable function that dominates the sequence $\{f_n\}_{n=1}^\infty$.

b) Let $f_n(x) = x^n \chi_{[0,1]}(x)$ for $x \in \mathcal R$, $n \in \mathcal N$. Then $f_n \to \chi_{\{1\}}$ pointwise. Now, $|f_n| \le \chi_{[0,1]}$ for all $n \in \mathcal N$ and, clearly, $\chi_{[0,1]}$ is Lebesgue integrable. Thus, by the DCT,
$$\lim_{n\to\infty} \int_{[0,1]} x^n\,d\lambda(x) = \int_{[0,1]} \chi_{\{1\}}(x)\,d\lambda(x) = \lambda(\{1\}) = 0.$$
Note: Theorem 3.23 on page 157 provides a simpler way to obtain this result. □

There are many corollaries of the DCT. Two of the most important are stated in what follows. The proofs of these two corollaries are left as exercises for the reader.

COROLLARY 3.4
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions such that $\sum_{n=1}^\infty |f_n|$ converges to a Lebesgue integrable function. Then $\sum_{n=1}^\infty f_n$ is Lebesgue integrable and
$$\int_E \sum_{n=1}^\infty f_n\,d\lambda = \sum_{n=1}^\infty \int_E f_n\,d\lambda$$
for each $E \in \mathcal M$.
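Returning to Example 3.6, the contrast between the undominated sequence $n\chi_{(0,1/n)}$ and the dominated sequence $x^n\chi_{[0,1]}$ can be seen with exact rational arithmetic. This Python sketch uses the closed forms $\int f_n\,d\lambda = n \cdot \lambda((0,1/n)) = 1$ and $\int_0^1 x^n\,dx = 1/(n+1)$ rather than any numerical quadrature; the helper names `spike`, `spike_integral`, and `power_integral` are our own, hypothetical labels.

```python
from fractions import Fraction


def spike(n, x):
    """f_n(x) = n * chi_(0, 1/n)(x), the undominated sequence of Example 3.6(a)."""
    return n if 0 < x < Fraction(1, n) else 0


def spike_integral(n):
    """Integral of f_n: the value n times lambda((0, 1/n)) = 1/n."""
    return n * Fraction(1, n)


def power_integral(n):
    """Integral of x**n over [0, 1]; by Theorem 3.23 it equals 1/(n + 1)."""
    return Fraction(1, n + 1)


ns = [1, 10, 100, 1000]

# Fixed x = 1/50: f_n(x) = 0 as soon as 1/n <= x, so f_n(x) -> 0.
print([spike(n, Fraction(1, 50)) for n in ns])  # [1, 10, 0, 0]

# Yet every integral of f_n equals 1, so the limit of the integrals is 1, not 0.
print(all(spike_integral(n) == 1 for n in ns))  # True

# Example 3.6(b): the dominated sequence x**n has integrals tending to 0 = lambda({1}).
print([power_integral(n) for n in ns])
```

Exact fractions avoid any suspicion that the value 1 in part (a) is a rounding artifact.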
COROLLARY 3.5
Let $f$ be a Lebesgue integrable function and $\{E_n\}_n$ a sequence of pairwise disjoint Lebesgue measurable sets. Then
$$\int_{\bigcup_n E_n} f\,d\lambda = \sum_n \int_{E_n} f\,d\lambda.$$

The Lebesgue Integral is an Extension of the Riemann Integral

We will now establish that the Lebesgue integral is indeed an extension of the Riemann integral. In other words, we will show that a Riemann integrable function is also Lebesgue integrable and that the two integrals are equal. First, we need the following lemma.

LEMMA 3.17
Let $f$ be a bounded Lebesgue measurable function on $[a, b]$. Then $f$ is Lebesgue integrable over $[a, b]$ and, moreover,
$$\int_{[a,b]} f(x)\,d\lambda(x) = \sup_{\substack{s \le f \\ s \text{ simple}}} \int_{[a,b]} s(x)\,d\lambda(x) \qquad (3.35)$$
and
$$\int_{[a,b]} f(x)\,d\lambda(x) = \inf_{\substack{t \ge f \\ t \text{ simple}}} \int_{[a,b]} t(x)\,d\lambda(x). \qquad (3.36)$$

PROOF: Example 3.5(b) on page 151 shows that $f$ is Lebesgue integrable over $[a, b]$. We will prove (3.35). The proof of (3.36) is similar and is left as an exercise.

First note that if $s$ is a simple function with $s \le f$, then, by Theorem 3.21(c) on page 152, $\int_{[a,b]} s\,d\lambda \le \int_{[a,b]} f\,d\lambda$. Consequently,
$$\sup_{\substack{s \le f \\ s \text{ simple}}} \int_{[a,b]} s(x)\,d\lambda(x) \le \int_{[a,b]} f(x)\,d\lambda(x).$$
It remains to prove the reverse inequality. To accomplish that, we will construct a sequence $\{s_n\}_{n=1}^\infty$ of simple functions with $s_n \le f$ for all $n \in \mathcal N$ and
$$\int_{[a,b]} f\,d\lambda = \lim_{n\to\infty} \int_{[a,b]} s_n\,d\lambda. \qquad (3.37)$$
Set $L = \sup\{ |f(x)| : x \in [a, b] \}$. Then $f + L$ is nonnegative and Lebesgue measurable on $[a, b]$. Applying Proposition 3.9 (page 134), we obtain a sequence $\{u_n\}_{n=1}^\infty$ of nonnegative simple functions such that $u_n \uparrow f + L$. Setting $s_n = u_n - L$, we see that $\{s_n\}_{n=1}^\infty$ is a sequence of simple functions such that $s_n \le f$ for all $n \in \mathcal N$ and $s_n \to f$ pointwise on $[a, b]$. Furthermore, because $|s_n| \le L$ on $[a, b]$, the DCT implies that (3.37) holds.

THEOREM 3.23
Suppose that $f$ is Riemann integrable on $[a, b]$. Then $f$ is Lebesgue integrable on $[a, b]$ and
$$\int_{[a,b]} f(x)\,d\lambda(x) = \int_a^b f(x)\,dx.$$

PROOF: To begin, we extend the domain of $f$ to all of $\mathcal R$ by defining $f(x) = 0$ for $x \in [a, b]^c$. Now, since $f$ is Riemann integrable on $[a, b]$, it is bounded thereon. So, to prove that $f$ is Lebesgue integrable on $[a, b]$, it suffices to show that $f$ is Lebesgue measurable (why?).

Let $O$ be an open subset of $\mathcal R$. We must verify that $f^{-1}(O) \in \mathcal M$. Set
$$E = \{ x \in \mathcal R : f \text{ is discontinuous at } x \}.$$
We have
$$f^{-1}(O) = \bigl(f^{-1}(O) \cap E\bigr) \cup \bigl(f^{-1}(O) \cap E^c\bigr). \qquad (3.38)$$
Clearly, $E \subset [a, b]$ and, consequently, by Theorem 2.7 on page 86, $\lambda(E) = 0$. But every subset of a set with Lebesgue measure zero is Lebesgue measurable (Exercise 3.32). Hence, the first intersection on the right of (3.38) is a Lebesgue measurable set.

Next we show that the second intersection on the right of (3.38) is a Lebesgue measurable set. To begin, note that $f^{-1}(O) \cap E^c = (f|_{E^c})^{-1}(O)$. Now, by the definition of $E$, the function $f|_{E^c}$ is continuous. Therefore, by Theorem 2.5 on page 66, $(f|_{E^c})^{-1}(O)$ is an open subset of $E^c$. In view of Theorem 2.3 on page 62, there is an open subset $U$ of $\mathcal R$ such that $(f|_{E^c})^{-1}(O) = U \cap E^c$. Both sets in this last intersection are Lebesgue measurable (why?). Thus, $f^{-1}(O) \cap E^c \in \mathcal M$.

To complete the proof of the theorem, we must prove that the Riemann and Lebesgue integrals of $f$ over $[a, b]$ are equal.
First recall that every step function is a simple function and that the Riemann and Lebesgue integrals agree for step functions (because the Lebesgue measure of an interval is the length of the interval). Applying Lemma 3.17 and the definition of the
Riemann integral, we now obtain that
$$\begin{aligned}
\int_a^b f(x)\,dx &= \sup_{\substack{g \le f \\ g \text{ step function}}} \int_a^b g(x)\,dx \le \sup_{\substack{s \le f \\ s \text{ simple}}} \int_{[a,b]} s(x)\,d\lambda(x) = \int_{[a,b]} f(x)\,d\lambda(x) \\
&= \inf_{\substack{t \ge f \\ t \text{ simple}}} \int_{[a,b]} t(x)\,d\lambda(x) \le \inf_{\substack{h \ge f \\ h \text{ step function}}} \int_a^b h(x)\,dx = \int_a^b f(x)\,dx.
\end{aligned}$$
These relations imply that
$$\int_a^b f(x)\,dx = \int_{[a,b]} f(x)\,d\lambda(x),$$
as required.

We have now verified that the Lebesgue integral is indeed a generalization of the Riemann integral. Consequently, we will frequently denote the Lebesgue integral of $f$ over $[a, b]$ by $\int_a^b f(x)\,dx$ regardless of whether $f$ is Riemann integrable over $[a, b]$. In other words, the notation for the Riemann integral is also used for the Lebesgue integral. Moreover, as previously mentioned, we will often write $\int_E f(x)\,dx$ instead of $\int_E f(x)\,d\lambda(x)$.

EXAMPLE 3.7 Illustrates the Lebesgue and Riemann Integrals
a) By Theorem 3.23,
$$\int_{[a,b]} x^n\,d\lambda(x) = \int_a^b x^n\,dx = \frac{b^{n+1} - a^{n+1}}{n + 1}.$$
b) Clearly, $\chi_{\mathcal Q}$ is Lebesgue integrable over $[0, 1]$. However, it is not Riemann integrable on $[0, 1]$ because it is discontinuous everywhere.
c) Define $f(x) = 1/\sqrt{x}$, for $0 < x < 1$, and zero otherwise. Note that $f$ has only two discontinuities and, hence, the set of points of discontinuity of $f$
has measure zero. But $f$ is not Riemann integrable on $[0, 1]$ because it is not bounded. It is, however, Lebesgue integrable on $[0, 1]$, as we will now show.

For each $n \in \mathcal N$, let $f_n = \chi_{[1/n,1]} f$. Then $f_n$ is Riemann integrable on $[0, 1]$ and, so, by Theorem 3.23,
$$\int_{[0,1]} f_n(x)\,d\lambda(x) = \int_0^1 f_n(x)\,dx = \int_{1/n}^1 \frac{dx}{\sqrt{x}} = 2 - \frac{2}{\sqrt{n}}.$$
Now, $\{f_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of nonnegative Lebesgue measurable functions and $f_n \to f$ pointwise. Applying the MCT, we conclude that
$$\int_{[0,1]} f(x)\,d\lambda(x) = \lim_{n\to\infty} \int_{[0,1]} f_n(x)\,d\lambda(x) = 2 < \infty.$$
Hence, $f$ is Lebesgue integrable over $[0, 1]$. □

EXERCISES 3.7

3.80 Prove Proposition 3.12 on page 150.

3.81 Determine the positive and negative parts of the following functions:
a) $\sin x$.
b) $x^2 - 4$.
c) $|x|$.

3.82 Prove Lemma 3.16 on page 151.

3.83 Show that if $f$ is Lebesgue integrable (over $\mathcal R$), then it is Lebesgue integrable over $E$ for each $E \in \mathcal M$.

3.84 Prove Corollary 3.4 on page 155.

3.85 Prove Corollary 3.5 on page 156.

3.86 Suppose that $f$ is Lebesgue integrable over $E$ and that $\{E_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable sets with $E_1 \subset E_2 \subset \cdots$ and $\bigcup_{n=1}^\infty E_n = E$. Prove that
$$\int_E f\,d\lambda = \lim_{n\to\infty} \int_{E_n} f\,d\lambda.$$

3.87 Let $E$ be a Lebesgue measurable set and $\{f_n\}_{n=1}^\infty$ a sequence of Lebesgue measurable functions that converges pointwise on $E$ to a real-valued function. Suppose that $g$ is Lebesgue integrable over $E$ and that $|f_n(x)| \le g(x)$ for $n \in \mathcal N$, $x \in E$. Prove that
$$\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda.$$
3.88 Bounded convergence theorem (BCT): Let $E \in \mathcal M$ with $\lambda(E) < \infty$ and $\{f_n\}_{n=1}^\infty$ a sequence of Lebesgue measurable functions that converges pointwise on $E$ to a real-valued function. Further suppose that there is an $M \in \mathcal R$ such that $|f_n(x)| \le M$ for $n \in \mathcal N$, $x \in E$. Show that
$$\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda.$$

3.89 Construct an example where $|f_n| \le M$ for all $n \in \mathcal N$, $f_n \to f$ pointwise, but $\int f_n\,d\lambda \not\to \int f\,d\lambda$. Why doesn't this contradict the BCT?

3.90 Complete the proof of Lemma 3.17 on page 156 by establishing (3.36).

*3.91 Theorem 3.23 on page 157 shows that every Riemann integrable function is also a Lebesgue integrable function. This refers only to the proper Riemann integral. In this exercise, we will exhibit a function that has an improper Riemann integral but is not Lebesgue integrable. Let
$$f(x) = \begin{cases} \dfrac{\sin x}{x}, & x \ne 0; \\ 1, & x = 0. \end{cases}$$
Show that
a) $f$ has an improper Riemann integral over $\mathcal R$ equal to $\pi$.
b) $f$ is Lebesgue measurable.
c) $f$ is not Lebesgue integrable over $\mathcal R$.

3.92 Show that, if $f$ is Lebesgue integrable (over $\mathcal R$) and the improper Riemann integral exists, then
$$\int_{\mathcal R} f(x)\,d\lambda(x) = \int_{-\infty}^{\infty} f(x)\,dx.$$

3.93 Prove that the results of Exercise 3.77 on page 148 remain valid if $f$ is Lebesgue integrable over $[0, \infty)$.

3.94 Let $\{f_n\}_{n=1}^\infty$ and $\{g_n\}_{n=1}^\infty$ be sequences of Lebesgue measurable functions and $E \in \mathcal M$. Suppose that on $E$, $|f_n| \le g_n$, $f_n \to f$, and $g_n \to g$. Further suppose that $g, g_1, g_2, \ldots$ are Lebesgue integrable over $E$ and that $\int_E g_n\,d\lambda \to \int_E g\,d\lambda$. Prove that $\int_E f_n\,d\lambda \to \int_E f\,d\lambda$.

3.95 For $E \subset \mathcal R$ and $a \in \mathcal R$, let $E + a = \{ x + a : x \in E \}$ and $aE = \{ ax : x \in E \}$. Suppose that $f$ is a Lebesgue integrable function.
a) Show that
$$\int_{\mathcal R} f(x + a)\,d\lambda(x) = \int_{\mathcal R} f(x)\,d\lambda(x)$$
and, if $a \ne 0$,
$$\int_{\mathcal R} f(ax)\,d\lambda(x) = \frac{1}{|a|} \int_{\mathcal R} f(x)\,d\lambda(x).$$
Hint: Start with the case $f = \chi_A$, where $A \in \mathcal M$.
b) Show that, for $E \in \mathcal M$,
$$\int_E f(x + a)\,d\lambda(x) = \int_{E + a} f(x)\,d\lambda(x)$$
and, if $a \ne 0$,
$$\int_E f(ax)\,d\lambda(x) = \frac{1}{|a|} \int_{aE} f(x)\,d\lambda(x).$$

3.96 Consider a function $F\colon \mathcal R \times I \to \mathcal R$, where $I$ is a nonempty open interval. Suppose that $\partial F/\partial t$ exists at each point of $\mathcal R \times I$, $F(\cdot, t_0)$ is Lebesgue integrable for some $t_0 \in I$, and there is a Lebesgue integrable function $G$ such that $\left| \frac{\partial F}{\partial t}(x, t) \right| \le G(x)$ for $x \in \mathcal R$ and $t \in I$. Prove that $F(\cdot, t)$ is Lebesgue integrable for each $t \in I$ and that
$$\frac{d}{dt} \int_{\mathcal R} F(x, t)\,d\lambda(x) = \int_{\mathcal R} \frac{\partial F}{\partial t}(x, t)\,d\lambda(x).$$

3.97 Consider a function $F\colon \mathcal R \times T \to \mathcal R$, where $T \subset \mathcal R$. Suppose that $F(\cdot, t)$ is Lebesgue measurable for each $t \in T$ and that there is a Lebesgue integrable function $g$ such that $|F(x, t)| \le g(x)$ for $x \in \mathcal R$ and $t \in T$. Establish the following:
a) If $F(x, \cdot)$ is continuous on $T$ for each $x \in \mathcal R$, then the function defined on $T$ by $f(t) = \int_{\mathcal R} F(x, t)\,d\lambda(x)$ is continuous.
b) If $T$ is an interval of the form $(b, \infty)$ and if $\lim_{t\to\infty} F(x, t)$ exists for each $x \in \mathcal R$, then
$$\lim_{t\to\infty} \int_{\mathcal R} F(x, t)\,d\lambda(x) = \int_{\mathcal R} \lim_{t\to\infty} F(x, t)\,d\lambda(x).$$

3.98 Provide an example to show that the conditions given in the DCT are not necessary for the interchange of limit and integral.

3.8 LEBESGUE ALMOST EVERYWHERE

Frequently, we are not concerned whether a certain property holds everywhere as long as it holds "most places." For example, in order for a bounded function $f$ to be Riemann integrable on $[a, b]$, it does not have to be continuous everywhere on $[a, b]$; all that is required is that the set of points at which $f$ is discontinuous have measure zero.

Consider, also, the sequence of functions $f_n(x) = \chi_{[-1,1]}(x)\,x^n$, for $n \in \mathcal N$. That sequence of functions does not converge pointwise on $\mathcal R$, but it almost does. Indeed, $f_n(x) \to \chi_{\{1\}}(x)$ except when $x = -1$. As the
Lebesgue (or Riemann) integral is not affected by the value of a function at a single point, the lack of convergence of $\{f_n\}_{n=1}^\infty$ at $x = -1$ should not disturb any convergence results involving the integral.

In this section, we will define the concept of a property holding almost everywhere and show that our previous results for the Lebesgue integral remain valid when "everywhere" is replaced by "almost everywhere."

DEFINITION 3.17 Lebesgue Almost Everywhere
A property is said to hold Lebesgue almost everywhere, or $\lambda$-ae for short, if it holds except on a set of Lebesgue measure zero, that is, except on a set $N$ with $\lambda(N) = 0$.

EXAMPLE 3.8 Illustrates Definition 3.17
a) Two functions $f$ and $g$ are equal Lebesgue almost everywhere, written $f = g$ $\lambda$-ae, if $\lambda(\{ x : g(x) \ne f(x) \}) = 0$.
b) A sequence of functions $\{f_n\}_{n=1}^\infty$ converges Lebesgue almost everywhere to $f$, written $f_n \to f$ $\lambda$-ae, if $\lim_{n\to\infty} f_n(x) = f(x)$ except on a set of Lebesgue measure zero. In other words, $f_n \to f$ $\lambda$-ae if and only if $\lambda(\{ x : \lim_{n\to\infty} f_n(x) \ne f(x) \}) = 0$. □

Our first proposition demonstrates that a function equal almost everywhere to a Lebesgue measurable function is itself Lebesgue measurable.

PROPOSITION 3.13
Suppose that $f$ is a Lebesgue measurable function and that $g = f$ $\lambda$-ae. Then $g$ is Lebesgue measurable.

PROOF: Set $B = \{ x : g(x) = f(x) \}$ and let $O$ be an open set. We claim that $g^{-1}(O) \in \mathcal M$. To begin, we write
$$g^{-1}(O) = \bigl(g^{-1}(O) \cap B\bigr) \cup \bigl(g^{-1}(O) \cap B^c\bigr). \qquad (3.39)$$
We will show that both intersections on the right of (3.39) are Lebesgue measurable sets.

As $g = f$ on $B$, it follows that $g^{-1}(O) \cap B = f^{-1}(O) \cap B$. However, this last intersection is a Lebesgue measurable set because $B \in \mathcal M$ (why?) and $f$ is Lebesgue measurable. Hence, the first intersection on the right of (3.39) is a Lebesgue measurable set.
Now, by assumption, $\lambda(B^c) = 0$. Therefore, the second intersection on the right of (3.39) is a Lebesgue measurable set because it is a subset of a set having Lebesgue measure zero.

Our next result shows that the collection of Lebesgue measurable functions is closed under almost-everywhere limits. More precisely, we have the following proposition.

PROPOSITION 3.14
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions and that $f_n \to f$ $\lambda$-ae. Then $f$ is a Lebesgue measurable function.

PROOF: Set $B = \{ x : \lim_{n\to\infty} f_n(x) = f(x) \}$. Then $\lambda(B^c) = 0$. Let $g_n = \chi_B f_n$ and $g = \chi_B f$. Then $\{g_n\}_{n=1}^\infty$ is a sequence of $\mathcal M$-measurable functions and $g_n \to g$ pointwise. Hence, by Theorem 3.16(c) on page 130, $g$ is $\mathcal M$-measurable. But $f = g$ $\lambda$-ae and, consequently, $f$ is $\mathcal M$-measurable by Proposition 3.13.

Remark: We should point out that Propositions 3.13 and 3.14 are not valid for Borel measurable functions. This is because subsets of Borel sets of measure zero are not necessarily Borel sets. (See Exercise 3.99.)

Next, we will prove that the Lebesgue integral of a function is not affected by changing its values on a set of measure zero.

PROPOSITION 3.15
Let $f$ and $g$ be Lebesgue measurable functions with $f = g$ $\lambda$-ae. If $f$ is Lebesgue integrable, then so is $g$ and, moreover, for each $E \in \mathcal M$,
$$\int_E g\,d\lambda = \int_E f\,d\lambda.$$

PROOF: Set $B = \{ x : g(x) = f(x) \}$. Then, by assumption, $\lambda(B^c) = 0$. Applying Corollary 3.3 on page 144 and Proposition 3.10(d) on page 136, we find that
$$\int |g|\,d\lambda = \int_B |g|\,d\lambda + \int_{B^c} |g|\,d\lambda = \int_B |f|\,d\lambda + 0 \le \int |f|\,d\lambda < \infty.$$
Therefore, $g$ is Lebesgue integrable.
Now, let $E \in \mathcal M$. Then, by Theorem 3.21(e) on page 152,
$$\begin{aligned}
\int_E f\,d\lambda &= \int_{E \cap B} f\,d\lambda + \int_{E \cap B^c} f\,d\lambda = \int_{E \cap B} g\,d\lambda + \int_{E \cap B^c} f\,d\lambda \\
&= \int_E g\,d\lambda - \int_{E \cap B^c} g\,d\lambda + \int_{E \cap B^c} f\,d\lambda. \qquad (3.40)
\end{aligned}$$
We will complete the proof of the proposition by showing that the last two integrals in (3.40) are zero. Employing Theorem 3.21(d) and Proposition 3.10(d), we deduce that
$$\left| \int_{E \cap B^c} f\,d\lambda \right| \le \int_{E \cap B^c} |f|\,d\lambda = 0.$$
Similarly, $\int_{E \cap B^c} g\,d\lambda = 0$.

We often encounter functions that are only defined Lebesgue almost everywhere. Since the integral of a Lebesgue measurable function is not affected by its values on a set of measure zero, it is reasonable to make the following definition.

DEFINITION 3.18 Integral of a Function Defined Almost Everywhere
Suppose that $f$ is a function defined Lebesgue almost everywhere; that is, if $D$ is the domain of $f$, then $\lambda(D^c) = 0$. Further suppose that there is a Lebesgue measurable function $g$ such that $g(x) = f(x)$ for $x \in D$. Then, for $E \in \mathcal M$, we define the Lebesgue integral of $f$ over $E$ by
$$\int_E f\,d\lambda = \int_E g\,d\lambda$$
provided that the integral on the right-hand side exists (i.e., the integrals of the positive and negative parts of $g$ over $E$ are not both infinite).

Finally, we should point out that Fatou's lemma and the DCT remain valid if the hypothesis of pointwise convergence is replaced by convergence $\lambda$-ae. The proofs are left as exercises for the reader.
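The motivating sequence from the start of this section, $f_n = \chi_{[-1,1]}x^n$, shows the almost-everywhere version of the DCT at work: $f_n \to \chi_{\{1\}}$ except at the single point $x = -1$, and $|f_n| \le \chi_{[-1,1]}$. The Python sketch below (helper names `f` and `integral` are ours) confirms that $f_n(-1)$ has no limit and computes the exact integrals $\int_{-1}^1 x^n\,dx = \bigl(1 - (-1)^{n+1}\bigr)/(n+1)$, which tend to $0 = \int \chi_{\{1\}}\,d\lambda$ as the a.e. DCT predicts.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = chi_[-1,1](x) * x**n, the sequence discussed at the start of Section 3.8."""
    return x**n if -1 <= x <= 1 else 0


def integral(n):
    """Exact value of the integral of f_n over R: (1 - (-1)**(n + 1)) / (n + 1)."""
    return Fraction(1 - (-1) ** (n + 1), n + 1)


# At x = -1 the values alternate, so f_n(-1) has no limit.
print([f(n, -1) for n in range(1, 7)])  # [-1, 1, -1, 1, -1, 1]

# At x = 1 the values are constantly 1; at |x| < 1 they tend to 0.
print(f(50, 1), float(f(50, Fraction(1, 2))))

# The integrals still tend to 0, the integral of chi_{1}, because lambda({-1}) = 0.
print([integral(n) for n in range(1, 7)])
```

The odd-indexed integrals vanish by symmetry and the even-indexed ones equal $2/(n+1)$, so both subsequences go to 0 despite the nonconvergence at $x = -1$.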
EXERCISES 3.8

*3.99 Show that a subset of a Borel set of Lebesgue measure zero is not necessarily a Borel set. Hint: Refer to Exercise 3.50 on page 127.

*3.100 Show that Proposition 3.13 (page 162) fails for Borel measurable functions.

3.101 For Lebesgue measurable functions $f$ and $g$, define $f \sim g$ if and only if $f = g$ $\lambda$-ae. Prove that $\sim$ is an equivalence relation.

3.102 Respond True or False to each of the following statements. Justify your answer.
a) If $f$ is continuous $\lambda$-ae, then $f$ is equal to a continuous function $\lambda$-ae.
b) If $f$ is equal to a continuous function $\lambda$-ae, then $f$ is continuous $\lambda$-ae.

3.103 Let $\{f_n\}_{n=1}^\infty$ be a sequence of Lebesgue measurable functions such that $\lim_{n\to\infty} f_n(x)$ exists $\lambda$-ae. Define
$$f(x) = \begin{cases} \lim_{n\to\infty} f_n(x), & \text{if } \lim_{n\to\infty} f_n(x) \text{ exists}; \\ 0, & \text{otherwise.} \end{cases}$$
Prove that $f$ is Lebesgue measurable.

3.104 Verify that Definition 3.18 is well posed. That is, assume $g$ and $h$ are Lebesgue measurable functions that equal $f$ on its domain $D$. Show that, for $E \in \mathcal M$, either $\int_E h\,d\lambda = \int_E g\,d\lambda$ or neither integral exists.

3.105 Show that the DCT (page 154) remains valid if convergence pointwise is replaced by convergence $\lambda$-ae. In other words, suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of Lebesgue measurable functions that converges $\lambda$-ae to a real-valued function. Further suppose that there is a nonnegative Lebesgue integrable function $g$ such that $|f_n| \le g$ for all $n \in \mathcal N$. Prove that
$$\int_E \lim_{n\to\infty} f_n\,d\lambda = \lim_{n\to\infty} \int_E f_n\,d\lambda$$
for each $E \in \mathcal M$. Note: $\lim_{n\to\infty} f_n$ is not defined on all of $\mathcal R$ unless, of course, $\{f_n\}_{n=1}^\infty$ converges everywhere.

3.106 Show that Fatou's lemma (page 146) remains valid if convergence pointwise is replaced by convergence $\lambda$-ae.

3.107 Verify that Egorov's theorem, Exercise 3.63 on page 139, remains valid if $f_n \to f$ $\lambda$-ae on $E$.

3.108 Let $f$ and $g$ be $\mathcal M$-measurable functions with $\int |f - g|\,d\lambda = 0$. Prove that $f = g$ $\lambda$-ae.

3.109 Show that, if $f$ is Lebesgue integrable and $\int_E f\,d\lambda = 0$ for each $E \in \mathcal M$, then $f = 0$ $\lambda$-ae.
Henri Léon Lebesgue (1875-1941)

Henri Lebesgue was born at Beauvais, France, on June 28, 1875. He attended the École Normale Supérieure in Paris between 1894 and 1897, where he was a student of Émile Borel. He worked on his doctoral thesis between 1899 and 1902 while teaching mathematical science at the lycée in Nancy and received his doctorate from the Sorbonne in 1902.

Lebesgue did research in many different areas of mathematics, among which were function theory, set theory, and the calculus of variations. His and Émile Borel's work provided the foundation for the modern theory of functions of a real variable.

Lebesgue's interest in Riemannian integration and its associated problems led to his creation of the Lebesgue integral in 1902. Not only has the Lebesgue integral been important to the amplification of the theory of trigonometric series, curve rectification, and calculus, but it has also proved central to the development of measure theory.

Many honors were bestowed upon Lebesgue. Among these were the Prix Houllevigue in 1912, the Prix Poncelet in 1914, and the Prix Saintour in 1917. He was elected to the French Academy of Sciences in 1922 and to the Royal Society in 1934.

Lebesgue taught at the University of Rennes from 1902 to 1906, at the University of Poitiers from 1906 to 1910, at the Sorbonne from 1910 to 1921, and, finally, at the Collège de France. He died in Paris on July 26, 1941.
4 □ Measure Theory

In Chapter 3, the collection of continuous functions was expanded to the collection of Borel measurable functions, the smallest algebra that contains the continuous functions and is closed under pointwise limits. We then extended the Riemann integral so that it applies to all Borel measurable functions and, in doing so, we encountered Lebesgue measure, the collection of Lebesgue measurable functions, and the Lebesgue integral.

We will discover, in this chapter, that the concepts and methods of Chapter 3 lend themselves to considerable generalization with relatively little effort and huge rewards. This generalized theory has extensive applications throughout mathematics and, as well, to a large variety of fields outside of mathematics.

4.1 MEASURE SPACES

When we examine the definition of the Lebesgue integral carefully, we find that it depends ultimately on the concept of measure. More precisely, the mathematical framework requires a set, a σ-algebra of subsets, and a set function that assigns to each set in the σ-algebra a nonnegative number (its
measure). In Chapter 3, this consisted, respectively, of $\mathcal R$, $\mathcal M$, and $\lambda$. But we can abstract the mathematical framework to provide a broader setting for the integral.

We begin by considering the general concept of measure. In developing Lebesgue measure, we imposed three conditions, namely, Conditions (M1)-(M3) on page 105. The first two conditions are specific to the generalization of length, but the third is not. In fact, Condition (M3), the countable-additivity condition, is the primary property of an abstract measure.

DEFINITION 4.1 Measure, Measurable Space, Measure Space
Let $\Omega$ be a set and $\mathcal A$ a σ-algebra of subsets of $\Omega$. A measure $\mu$ on $\mathcal A$ is an extended real-valued function satisfying the following conditions:
a) $\mu(A) \ge 0$ for all $A \in \mathcal A$.
b) $\mu(\emptyset) = 0$.
c) If $A_1, A_2, \ldots$ are in $\mathcal A$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$$\mu\Bigl(\bigcup_n A_n\Bigr) = \sum_n \mu(A_n).$$
The pair $(\Omega, \mathcal A)$ is called a measurable space and the triple $(\Omega, \mathcal A, \mu)$ is called a measure space.

Note: We will often refer to members of $\mathcal A$ as $\mathcal A$-measurable sets.

We should point out the following fact: If $\mu$ satisfies (a) and (c) of Definition 4.1, then it is a measure (i.e., also satisfies (b)) if and only if there is an $A \in \mathcal A$ such that $\mu(A) < \infty$. We leave the proof of this fact to the reader.

EXAMPLE 4.1 Illustrates Definition 4.1
a) $(\mathcal R, \mathcal M, \lambda)$ is a measure space, the one that we studied in Chapter 3.
b) $(\mathcal R, \mathcal B, \lambda|_{\mathcal B})$ is a measure space.
c) Suppose that $(\Omega, \mathcal A, \mu)$ is a measure space and that $D \in \mathcal A$. Define $\mathcal A_D = \{ D \cap A : A \in \mathcal A \}$ and $\mu_D = \mu|_{\mathcal A_D}$. Then $\mathcal A_D$ is a σ-algebra of subsets of $D$, $\mu_D$ is a measure on $\mathcal A_D$, and, hence, $(D, \mathcal A_D, \mu_D)$ is a measure space.
d) Referring to part (c), let $\Omega = \mathcal R$, $\mathcal A = \mathcal M$, $\mu = \lambda$, and $D = [0, 1]$. Then $([0,1], \mathcal M_{[0,1]}, \lambda_{[0,1]})$ is a measure space. $\lambda_{[0,1]}$ is called Lebesgue measure on $[0, 1]$. More generally, if $D$ is any Lebesgue measurable
set, then $(D, \mathcal M_D, \lambda_D)$ is a measure space and $\lambda_D$ is called Lebesgue measure on $D$.
e) Refer to part (c). By Theorem 3.7 on page 102, if $D \in \mathcal B$, then we have $\mathcal B_D = \mathcal B(D)$.
f) Let $\Omega$ be a nonempty set and $\mathcal A = \mathcal P(\Omega)$. Define $\mu$ on $\mathcal A$ by
$$\mu(E) = \begin{cases} N(E), & \text{if } E \text{ is finite}; \\ \infty, & \text{if } E \text{ is infinite}, \end{cases}$$
where $N(E)$ denotes the number of elements of $E$. Then $\mu$ is a measure on $\mathcal A$ and is called counting measure.
g) Let $\Omega = \mathcal N$, $\mathcal A = \mathcal P(\mathcal N)$, and $\mu$ be counting measure on $\mathcal A$, as defined in part (f). Then, for instance, $\mu(\mathcal N) = \infty$ and $\mu(\{1, 3\}) = 2$. We will see later that $(\mathcal N, \mathcal P(\mathcal N), \mu)$ is the appropriate measure space for the analysis of infinite series.
h) Suppose that $(\Omega, \mathcal A, \mu)$ is a measure space. If $\mu(\Omega) = 1$, then $(\Omega, \mathcal A, \mu)$ is called a probability space and $\mu$ a probability measure. Furthermore, $\mu$ is usually replaced by a $P$ (for probability). Two simple examples are as follows:
(i) $([0,1], \mathcal M_{[0,1]}, \lambda_{[0,1]})$ is a probability space since $\lambda([0,1]) = 1$. It is an appropriate measure space for analyzing the experiment of selecting a number at random from the unit interval.
(ii) Consider the experiment of tossing a coin twice. The set of possible outcomes for that experiment is $\Omega = \{HH, HT, TH, TT\}$ where, for instance, $HT$ denotes the outcome of a head on the first toss and a tail on the second toss. Set $\mathcal A = \mathcal P(\Omega)$ and, for $E \in \mathcal A$, define $P(E) = N(E)/4$ where, as before, $N(E)$ denotes the number of elements of $E$. Then $(\Omega, \mathcal A, P)$ is a probability space, the appropriate measure space to use when the coin is balanced (i.e., equally likely to come up heads or tails). To illustrate: The probability of getting at least one head in two tosses of a balanced coin is $P(\{HH, HT, TH\}) = 3/4$.
i) Let $\Omega$ be a nonempty set, $\{x_n\}_n$ a sequence of distinct elements of $\Omega$, and $\{a_n\}_n$ a sequence of nonnegative numbers. For $E \subset \Omega$, define
$$\mu(E) = \sum_{x_n \in E} a_n, \qquad (4.1)$$
where the notation $\sum_{x_n \in E}$ means the sum over all indices $n$ such that $x_n \in E$. Then $\mu$ is a measure on $\mathcal P(\Omega)$ and, consequently, $(\Omega, \mathcal P(\Omega), \mu)$ is a measure space.
Here are two special cases: (i) If $\Omega$ is countable, $\{x_n\}_n$ is an enumeration of $\Omega$, and $a_n = 1$ for all $n$, then the measure $\mu$ defined in (4.1) is counting measure.
(ii) If the sequence $\{x_n\}_n$ consists of only one element, say $x_0$, and if $a_0 = 1$, then the measure $\mu$ defined in (4.1) takes the form
$$\mu(E) = \begin{cases} 1, & \text{if } x_0 \in E; \\ 0, & \text{if } x_0 \notin E. \end{cases}$$
This measure is denoted by $\delta_{x_0}$ and is called the unit point mass or Dirac measure concentrated at $x_0$. Note that $\delta_{x_0}$ is a probability measure.
j) Let $(\Omega, \mathcal A)$ be a measurable space such that $\{x\} \in \mathcal A$ for each $x \in \Omega$. A measure $\mu$ on $\mathcal A$ is called discrete if there is a countable set $K \subset \Omega$ such that $\mu(K^c) = 0$. It is not too difficult to show that if $\mu$ is a discrete measure, then we can write $\mu = \sum_{x \in K} \mu(\{x\})\,\delta_x$. See Exercises 4.6 and 4.19 for more on discrete measures. □

The following theorem provides some important properties of measures. We leave the proof as an exercise for the reader.

THEOREM 4.1
Suppose that $(\Omega, \mathcal A, \mu)$ is a measure space and that $A$ and $B$ are $\mathcal A$-measurable sets. Then the following hold:
a) If $\mu(A) < \infty$ and $A \subset B$, then $\mu(B \setminus A) = \mu(B) - \mu(A)$.
b) $A \subset B \Rightarrow \mu(A) \le \mu(B)$. (monotonicity)
c) If $\{E_n\}_{n=1}^\infty \subset \mathcal A$ with $E_1 \supset E_2 \supset \cdots$ and $\mu(E_1) < \infty$, then
$$\mu\Bigl(\bigcap_{n=1}^\infty E_n\Bigr) = \lim_{n\to\infty} \mu(E_n).$$
d) If $\{E_n\}_{n=1}^\infty \subset \mathcal A$ with $E_1 \subset E_2 \subset \cdots$, then
$$\mu\Bigl(\bigcup_{n=1}^\infty E_n\Bigr) = \lim_{n\to\infty} \mu(E_n).$$
e) If $\{E_n\}_n \subset \mathcal A$, then
$$\mu\Bigl(\bigcup_n E_n\Bigr) \le \sum_n \mu(E_n).$$
This property is called countable subadditivity.
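When only finitely many points carry weight, the measure defined in (4.1) can be computed directly. The following Python sketch is a finite-support illustration only: the names `point_mass_measure` and `dirac` are our own hypothetical helpers, and the code cannot represent countably infinite supports or sets of measure $\infty$. It builds $\mu(E) = \sum_{x_n \in E} a_n$, recovers the coin-toss probability $P(\{HH, HT, TH\}) = 3/4$ of Example 4.1(h)(ii), and can be used to spot-check finite additivity and the monotonicity of Theorem 4.1(b).

```python
from fractions import Fraction
from itertools import product


def point_mass_measure(weights):
    """Return mu(E) = sum of weights[x] over the points x of E, as in (4.1).

    `weights` maps finitely many distinct points x_n to nonnegative a_n.
    """
    def mu(E):
        return sum((a for x, a in weights.items() if x in E), Fraction(0))
    return mu


def dirac(x0):
    """Unit point mass delta_{x0}: weight 1 at x0 and nothing elsewhere."""
    return point_mass_measure({x0: Fraction(1)})


# Example 4.1(h)(ii): a balanced coin tossed twice, each outcome with weight 1/4.
omega = {"".join(t) for t in product("HT", repeat=2)}  # {'HH', 'HT', 'TH', 'TT'}
P = point_mass_measure({w: Fraction(1, 4) for w in omega})
at_least_one_head = {w for w in omega if "H" in w}
print(P(at_least_one_head))  # 3/4

# Counting measure restricted to the finite set {1, ..., 10}: every weight equals 1.
count = point_mass_measure({k: 1 for k in range(1, 11)})
print(count({1, 3}))  # 2
```

Writing $P$ as a weighted sum of point masses also anticipates Example 4.1(j): here $P = \sum_{\omega} \tfrac14\,\delta_\omega$.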
Almost Everywhere and Complete Measure Spaces

Recall from Section 3.8 that a property holds Lebesgue almost everywhere ($\lambda$-ae) if it holds except on a set of Lebesgue measure zero. That concept can be generalized to apply to any measure space.

DEFINITION 4.2 Almost Everywhere
A property is said to hold $\mu$ almost everywhere, or $\mu$-ae for short, if it holds except on a set of $\mu$-measure zero, that is, except on a set $N$ with $\mu(N) = 0$.

Note: Several terms are used synonymously for "almost everywhere." Here are a few: almost always, for almost all $x \in \Omega$, and, in probability theory, almost surely, with probability one, and almost certainly.

Proposition 3.4 on page 123 implies that subsets of Lebesgue measurable sets of Lebesgue measure zero are also Lebesgue measurable sets. On the other hand, Exercise 3.99 on page 165 indicates that there exist subsets of Borel sets of Lebesgue measure zero that are not Borel sets. Those two facts have relevance to almost-everywhere (ae) properties of measurable functions. For instance, by Proposition 3.13 on page 162, if $f$ is Lebesgue measurable and $g = f$ $\lambda$-ae, then $g$ is Lebesgue measurable. However, as Exercise 3.100 on page 165 shows, that result is not true for Borel measurable functions.

We now see that it is important to know whether subsets of sets of measure zero are measurable sets. Hence, we make the following definition.

DEFINITION 4.3 Complete Measure Space
A measure space $(\Omega, \mathcal A, \mu)$ is said to be complete if all subsets of $\mathcal A$-measurable sets of $\mu$-measure zero are also $\mathcal A$-measurable; in other words, if $A \in \mathcal A$ and $\mu(A) = 0$, then $B \in \mathcal A$ for all $B \subset A$.

Thus, $(\mathcal R, \mathcal M, \lambda)$ is a complete measure space, while $(\mathcal R, \mathcal B, \lambda|_{\mathcal B})$ is not a complete measure space.

The following theorem shows that any measure space can be extended to a complete measure space. We leave the proof of the theorem as an exercise for the reader.
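THEOREM 4.2
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Denote by $\bar{\mathcal{A}}$ the collection of all sets of the form $B \cup A$, where $B \in \mathcal{A}$ and $A \subset C$ for some $C \in \mathcal{A}$ with $\mu(C) = 0$. For such sets, define $\bar{\mu}(B \cup A) = \mu(B)$. Then $\bar{\mathcal{A}}$ is a $\sigma$-algebra, $\bar{\mu}$ is a measure on $\bar{\mathcal{A}}$, and $(\Omega, \bar{\mathcal{A}}, \bar{\mu})$ is a complete measure space. Furthermore, $\mathcal{A} \subset \bar{\mathcal{A}}$ and $\bar{\mu}|_{\mathcal{A}} = \mu$.

$(\Omega, \bar{\mathcal{A}}, \bar{\mu})$ is called the completion of $(\Omega, \mathcal{A}, \mu)$. It can be shown that the measure space $(\mathcal{R}, \mathcal{M}, \lambda)$ is the completion of the measure space $(\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$. See Exercise 4.16.

EXERCISES 4.1

4.1 Suppose that $(\Omega, \mathcal{A}, \mu)$ is a measure space and that $D$ is an $\mathcal{A}$-measurable set. Define $\mathcal{A}_D = \{ D \cap A : A \in \mathcal{A} \}$ and $\mu_D = \mu|_{\mathcal{A}_D}$. Show that $(D, \mathcal{A}_D, \mu_D)$ is a measure space.

4.2 Let $\Omega$ be a nonempty set and $\mathcal{A} = \mathcal{P}(\Omega)$. Define $\mu$ on $\mathcal{A}$ by
$$\mu(E) = \begin{cases} N(E), & \text{if } E \text{ is finite}; \\ \infty, & \text{if } E \text{ is infinite}, \end{cases}$$
where $N(E)$ denotes the number of elements of $E$. Prove that $\mu$ is a measure on $\mathcal{A}$.

4.3 Consider the experiment of selecting a number at random from the closed interval $[-1, 1]$.
a) Construct an appropriate probability space for this experiment.
b) Determine the probability that the number selected exceeds 0.5.
c) Determine the probability that the number selected is rational.

4.4 Let $(\Omega, \mathcal{A})$ be a measurable space, $\mu$ and $\nu$ measures on $\mathcal{A}$, and $\alpha \ge 0$. Define set functions $\mu + \nu$ and $\alpha\mu$ on $\mathcal{A}$ by $(\mu + \nu)(A) = \mu(A) + \nu(A)$ and $(\alpha\mu)(A) = \alpha\mu(A)$.
a) Show that $\mu + \nu$ is a measure on $\mathcal{A}$.
b) Show that $\alpha\mu$ is a measure on $\mathcal{A}$.

4.5 Let $(\Omega, \mathcal{A})$ be a measurable space, $\{\mu_n\}_{n=1}^\infty$ a sequence of measures on $\mathcal{A}$, and $\{a_n\}_{n=1}^\infty$ a sequence of nonnegative real numbers. Define $\sum_{n=1}^\infty a_n\mu_n$ on $\mathcal{A}$ by
$$\Big(\sum_{n=1}^\infty a_n\mu_n\Big)(A) = \sum_{n=1}^\infty a_n\mu_n(A).$$
Prove that $\sum_{n=1}^\infty a_n\mu_n$ is a measure on $\mathcal{A}$.

★4.6 Refer to Example 4.1(j). Let $(\Omega, \mathcal{A})$ be a measurable space and suppose that $\{x\} \in \mathcal{A}$ for each $x \in \Omega$. Show that a measure $\mu$ on $\mathcal{A}$ is discrete if and only if there is a countable subset $K$ of $\Omega$ such that $\mu = \sum_{x \in K} \mu(\{x\})\,\delta_x$.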
4.7 Suppose that a balanced coin is tossed three times.
a) Construct a probability space for this experiment in which each possible outcome is equally likely.
b) Determine the probability of obtaining exactly two heads.
c) Express the probability measure, $P$, as a finite linear combination of Dirac measures.

4.8 Let $\Omega$ be a nonempty set, $\{x_n\}_n$ a sequence of distinct elements of $\Omega$, and $\{a_n\}_n$ a sequence of nonnegative real numbers. For $E \subset \Omega$, define
$$\mu(E) = \sum_{x_n \in E} a_n.$$
a) Show that $\mu$ is a measure on $\mathcal{P}(\Omega)$.
b) Interpret the $a_n$'s in terms of the measure, $\mu$.
c) Express $\mu$ as a linear combination of Dirac measures.

4.9 Suppose that two balanced dice are thrown.
a) Construct a probability space for this experiment in which each possible outcome is equally likely.
b) Use part (a) to determine the probability that the sum of the dice is seven or 11.
c) Construct a probability space for this experiment in which the outcomes consist of the possible sums of the two dice.
d) Use part (c) to determine the probability that the sum of the dice is seven or 11.

4.10 Prove Theorem 4.1.

4.11 Let $(\Omega, \mathcal{A})$ be a measurable space. A measure, $\mu$, on $\mathcal{A}$ is called a finite measure if $\mu(\Omega) < \infty$. A measure space, $(\Omega, \mathcal{A}, \mu)$, is called a finite measure space if $\mu$ is a finite measure. For a finite measure space, prove the following:
a) If $A$ and $B$ are $\mathcal{A}$-measurable sets, then $\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)$.
b) Generalize part (a) to an arbitrary finite number of $\mathcal{A}$-measurable sets.

4.12 Let $\{E_n\}_{n=1}^\infty$ be a sequence of $\mathcal{A}$-measurable sets. Prove that
$$\mu\Big(\liminf_{n\to\infty} E_n\Big) \le \liminf_{n\to\infty} \mu(E_n).$$

4.13 Let $\{E_n\}_{n=1}^\infty$ be a sequence of $\mathcal{A}$-measurable sets with $\mu\big(\bigcup_{n=1}^\infty E_n\big) < \infty$. Prove that
$$\mu\Big(\limsup_{n\to\infty} E_n\Big) \ge \limsup_{n\to\infty} \mu(E_n).$$
★4.14 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\{E_n\}_{n=1}^\infty$ a sequence of $\mathcal{A}$-measurable sets. Define $E = \{ x : x \in E_n \text{ for infinitely many } n \}$.
a) Prove that $E = \bigcap_{n=1}^\infty \big( \bigcup_{k=n}^\infty E_k \big)$.
b) Prove that $\sum_{n=1}^\infty \mu(E_n) < \infty \Rightarrow \mu(E) = 0$.

4.15 Prove Theorem 4.2.

4.16 Prove that the measure space $(\mathcal{R}, \mathcal{M}, \lambda)$ is the completion of the measure space $(\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$. Use the following steps:
a) Verify that $\bar{\mathcal{B}} \subset \mathcal{M}$ by employing Exercise 3.32 on page 126.
b) Show that $\bar{\mathcal{B}} \supset \mathcal{M}$ by applying Exercise 3.44 on page 127.
c) Prove that $\lambda = \overline{\lambda|_{\mathcal{B}}}$. Hint: Use the fact established in parts (a) and (b) that $\mathcal{M} = \bar{\mathcal{B}}$.

★4.17 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $(\Omega, \mathcal{F}, \rho)$ is a complete measure space with $\mathcal{F} \supset \mathcal{A}$ and $\rho|_{\mathcal{A}} = \mu$. Prove that $\mathcal{F} \supset \bar{\mathcal{A}}$ and that $\rho|_{\bar{\mathcal{A}}} = \bar{\mu}$. Conclude that $(\Omega, \bar{\mathcal{A}}, \bar{\mu})$ is the smallest complete measure space that contains $(\Omega, \mathcal{A}, \mu)$.

4.18 Let $f$ be a nonnegative $\mathcal{M}$-measurable function. Define $\mu_f$ on $\mathcal{M}$ by
$$\mu_f(E) = \int_E f\,d\lambda.$$
Prove that $\mu_f$ is a measure on $\mathcal{M}$.

4.19 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space such that $\{x\} \in \mathcal{A}$ for each $x \in \Omega$. An element $x \in \Omega$ is said to be an atom of $\mu$ if $\mu(\{x\}) > 0$. Assume now that $\mu$ is a finite measure, that is, $\mu(\Omega) < \infty$. Prove the following facts.
a) $\mu$ has only countably many atoms.
b) $\mu$ can be expressed uniquely as the sum of two measures, $\mu_c$ and $\mu_d$, where $\mu_c$ has no atoms and $\mu_d$ is discrete. Moreover, we have that $\mu_d = \sum_{x \in K} \mu(\{x\})\,\delta_x$, where $K$ is the set of atoms of $\mu$.

4.2 MEASURABLE FUNCTIONS

The next step in developing the abstract Lebesgue integral is to introduce the concept of measurability for functions defined on an abstract space. In addition to real-valued functions, we will also consider complex-valued and extended real-valued functions. We begin with real-valued functions.

Real-Valued Measurable Functions

Let $(\Omega, \mathcal{A})$ be a measurable space and $f\colon \Omega \to \mathcal{R}$. We want to specify when $f$ is measurable. In the previous chapter, we discussed two kinds of measurable functions: Borel measurable functions and Lebesgue measurable
functions. Recall that a real-valued function, $f$, is Borel measurable if and only if $f^{-1}(O) \in \mathcal{B}$ for each open set $O \subset \mathcal{R}$, and it is Lebesgue measurable if and only if $f^{-1}(O) \in \mathcal{M}$ for each open set $O \subset \mathcal{R}$. Hence, it is quite natural to make the following definition.

DEFINITION 4.4 Real-Valued Measurable Function
Let $(\Omega, \mathcal{A})$ be a measurable space. A real-valued function $f$ on $\Omega$ is said to be an $\mathcal{A}$-measurable function if the inverse image of each open subset of $\mathcal{R}$ under $f$ is an $\mathcal{A}$-measurable set, that is, if $f^{-1}(O) \in \mathcal{A}$ for all open sets $O \subset \mathcal{R}$.

EXAMPLE 4.2 Illustrates Definition 4.4
a) Let $\Omega = \mathcal{R}$. Then, as we know from Chapter 3, the Borel measurable functions are the $\mathcal{B}$-measurable functions and the Lebesgue measurable functions are the $\mathcal{M}$-measurable functions.
b) Let $(\Omega, \mathcal{A})$ be a measurable space, $D \in \mathcal{A}$, and $\mathcal{A}_D = \{ D \cap A : A \in \mathcal{A} \}$. Then a function $f\colon D \to \mathcal{R}$ is $\mathcal{A}_D$-measurable if and only if, for each open subset $O$ of $\mathcal{R}$, $f^{-1}(O)$ is of the form $D \cap A$ for some $A \in \mathcal{A}$.
c) Let $\Omega$ be a nonempty set. Then every real-valued function on $\Omega$ is $\mathcal{P}(\Omega)$-measurable. An important special case: if $\Omega = \mathcal{N}$, then $\mathcal{A}$ is usually taken to be $\mathcal{P}(\mathcal{N})$; hence, all functions $f\colon \mathcal{N} \to \mathcal{R}$ are $\mathcal{A}$-measurable. But real-valued functions on $\mathcal{N}$ are just infinite sequences of real numbers. Consequently, in this case, the $\mathcal{A}$-measurable functions are precisely the infinite sequences. □

The following proposition provides some useful equivalent conditions for a function to be $\mathcal{A}$-measurable. To prove the proposition, we proceed as in the proof of Lemma 3.5 on page 98.

PROPOSITION 4.1
Let $(\Omega, \mathcal{A})$ be a measurable space and $f$ a real-valued function on $\Omega$. Then the following statements are equivalent:
a) $f$ is $\mathcal{A}$-measurable.
b) For each $a \in \mathcal{R}$, $f^{-1}\big((-\infty, a)\big) \in \mathcal{A}$.
c) For each $a \in \mathcal{R}$, $f^{-1}\big((a, \infty)\big) \in \mathcal{A}$.
d) For each $a \in \mathcal{R}$, $f^{-1}\big((-\infty, a]\big) \in \mathcal{A}$.
e) For each $a \in \mathcal{R}$, $f^{-1}\big([a, \infty)\big) \in \mathcal{A}$.
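When $\Omega$ is finite and $\mathcal{A}$ is generated by a finite partition, condition (c) of Proposition 4.1 reduces to finitely many checks. For a function with finite range, the set $f^{-1}((a, \infty))$ changes only as $a$ passes a value of $f$. The Python sketch below assumes exactly that setting. `sigma_algebra_from_partition` and `is_measurable` are hypothetical helpers, not notation from the text.

```python
from itertools import combinations

def sigma_algebra_from_partition(blocks):
    """All unions of blocks of a finite partition of Omega.

    For a finite partition this is exactly the sigma-algebra the blocks generate.
    """
    sets = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            sets.add(frozenset().union(*combo))
    return sets

def is_measurable(f, omega, sigma):
    """Test criterion (c) of Proposition 4.1: f^{-1}((a, oo)) in sigma for all real a.

    Since Omega is finite, f^{-1}((a, oo)) takes only the forms obtained for
    a below min(f) or a equal to one of the values of f, so those suffice.
    """
    values = sorted({f(x) for x in omega})
    thresholds = [values[0] - 1] + values
    return all(frozenset(x for x in omega if f(x) > a) in sigma for a in thresholds)

omega = {1, 2, 3, 4}
sigma = sigma_algebra_from_partition([frozenset({1, 2}), frozenset({3, 4})])
f = lambda x: 0 if x <= 2 else 5   # constant on each block
g = lambda x: x                    # separates 1 from 2
print(is_measurable(f, omega, sigma), is_measurable(g, omega, sigma))
```

A function is measurable with respect to this $\mathcal{A}$ exactly when it is constant on each block of the partition, which is why `g` fails.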
Theorem 4.3, which we prove next, gives several important properties of real-valued $\mathcal{A}$-measurable functions. Note that Theorem 3.15 on page 130 is a special case.

THEOREM 4.3
Let $(\Omega, \mathcal{A})$ be a measurable space. Then the collection of real-valued $\mathcal{A}$-measurable functions forms an algebra. In other words, if $f$ and $g$ are $\mathcal{A}$-measurable and $\alpha \in \mathcal{R}$, then
a) $f + g$ is $\mathcal{A}$-measurable.
b) $\alpha f$ is $\mathcal{A}$-measurable.
c) $f \cdot g$ is $\mathcal{A}$-measurable.

PROOF: a) By Proposition 4.1, to prove that $f + g$ is $\mathcal{A}$-measurable, it suffices to show that $\{ x : f(x) + g(x) > a \} \in \mathcal{A}$ for each $a \in \mathcal{R}$. Now,
$$\{ x : f(x) + g(x) > a \} = \{ x : f(x) > a - g(x) \} = \bigcup_{r \in \mathcal{Q}} \{ x : f(x) > r > a - g(x) \} = \bigcup_{r \in \mathcal{Q}} \Big( f^{-1}\big((r, \infty)\big) \cap g^{-1}\big((a - r, \infty)\big) \Big).$$
This last union is an $\mathcal{A}$-measurable set since $f$ and $g$ are $\mathcal{A}$-measurable functions, $\mathcal{A}$ is a $\sigma$-algebra, and $\mathcal{Q}$ is countable. Consequently, $f + g$ is an $\mathcal{A}$-measurable function.

b) If $\alpha = 0$, then $\alpha f = 0$, which is $\mathcal{A}$-measurable (why?). So, assume $\alpha \ne 0$ and let $O$ be any open set in $\mathcal{R}$. Then $\alpha^{-1}O = \{ \alpha^{-1}y : y \in O \}$ is open. Therefore, because $f$ is $\mathcal{A}$-measurable, $(\alpha f)^{-1}(O) = f^{-1}(\alpha^{-1}O) \in \mathcal{A}$. This proves that $\alpha f$ is $\mathcal{A}$-measurable.

c) First we show that if $f$ is $\mathcal{A}$-measurable, then so is $f^2$. If $a < 0$, then $(f^2)^{-1}\big((a, \infty)\big) = \Omega \in \mathcal{A}$. If $a \ge 0$, then we have
$$(f^2)^{-1}\big((a, \infty)\big) = \{ x : f(x)^2 > a \} = \{ x : f(x) > \sqrt{a} \} \cup \{ x : f(x) < -\sqrt{a} \} = f^{-1}\big((\sqrt{a}, \infty)\big) \cup f^{-1}\big((-\infty, -\sqrt{a})\big).$$
This last union is an $\mathcal{A}$-measurable set because $f$ is $\mathcal{A}$-measurable. Hence, $f^2$ is an $\mathcal{A}$-measurable function whenever $f$ is.
Now, for any two functions, $f$ and $g$, we can write
$$f \cdot g = \tfrac{1}{4}\big( (f + g)^2 - (f - g)^2 \big).$$
Applying parts (a) and (b) of this theorem and the fact that the square of an $\mathcal{A}$-measurable function is $\mathcal{A}$-measurable, we conclude that $f \cdot g$ is an $\mathcal{A}$-measurable function.

We should emphasize that the measurability (or nonmeasurability) of a function depends only on the $\sigma$-algebra, $\mathcal{A}$, of subsets of $\Omega$; that is, it has nothing to do with a measure. Nonetheless, if $(\Omega, \mathcal{A}, P)$ is a probability space, then the $\mathcal{A}$-measurable functions are called random variables. Thus, an $\mathcal{A}$-measurable function is a random variable only when considered in the context of a probability space. By the way, in probability theory, random variables are usually denoted by uppercase italic letters near the end of the alphabet (e.g., $X$, $Y$, and $Z$) instead of the more usual $f$, $g$, and $h$.

EXAMPLE 4.3 Illustrates Random Variables
Let $(\Omega, \mathcal{A}, P)$ be the probability space from subpart (ii) of Example 4.1(h) on page 169. Define $X(hh) = 2$, $X(ht) = X(th) = 1$, and $X(tt) = 0$. Then $X\colon \Omega \to \mathcal{R}$ is a random variable. It gives the number of heads obtained when a balanced coin is tossed twice. □

Our next result generalizes Proposition 3.13 on page 162 to an arbitrary complete measure space. Its proof is essentially identical to that of Proposition 3.13.

PROPOSITION 4.2
Suppose that $(\Omega, \mathcal{A}, \mu)$ is a complete measure space. If $f$ is $\mathcal{A}$-measurable and $g = f$ $\mu$-ae, then $g$ is $\mathcal{A}$-measurable.

Complex-Valued Measurable Functions

In applying real analysis, we often encounter complex-valued functions. This occurs, for instance, in Fourier analysis. We will denote the set of all complex numbers by $\mathcal{C}$. Here now is the definition of measurability for complex-valued functions.
DEFINITION 4.5 Complex-Valued Measurable Function
Let $(\Omega, \mathcal{A})$ be a measurable space. A complex-valued function $f$ on $\Omega$ is said to be an $\mathcal{A}$-measurable function if the inverse image of each open subset of $\mathcal{C}$ under $f$ is an $\mathcal{A}$-measurable set, that is, if $f^{-1}(O) \in \mathcal{A}$ for all open sets $O \subset \mathcal{C}$.

The following theorem provides a useful characterization of measurability for complex-valued functions. We leave the proof of the theorem as an exercise for the reader.

THEOREM 4.4
A complex-valued function $f$ on $\Omega$ is $\mathcal{A}$-measurable if and only if both its real part, $\Re f$, and its imaginary part, $\Im f$, are (real-valued) $\mathcal{A}$-measurable functions.

EXAMPLE 4.4 Illustrates Complex-Valued Measurable Functions
a) Any real-valued $\mathcal{A}$-measurable function on $\Omega$ is also a complex-valued $\mathcal{A}$-measurable function.
b) Let $\Omega = \mathcal{R}$ and $\mathcal{A} = \mathcal{B}$. Define $f\colon \mathcal{R} \to \mathcal{C}$ by $f(x) = e^{ix}$. The real and imaginary parts of $f(x)$ are $\cos x$ and $\sin x$, respectively. Since those two functions are continuous, they are $\mathcal{B}$-measurable. Consequently, by Theorem 4.4, $f$ is a complex-valued $\mathcal{B}$-measurable function.
c) If $g$ and $h$ are real-valued $\mathcal{A}$-measurable functions, then, by Theorem 4.4, the complex-valued function $f = g + ih$ is also $\mathcal{A}$-measurable.
d) Let $\{a_n\}_{n=1}^\infty$ be a sequence of complex numbers and define $f\colon \mathcal{N} \to \mathcal{C}$ by $f(n) = a_n$. Then $f$ is a complex-valued $\mathcal{P}(\mathcal{N})$-measurable function. □

Theorem 4.3 holds also for complex-valued $\mathcal{A}$-measurable functions. That is, the collection of complex-valued $\mathcal{A}$-measurable functions forms a (complex) algebra. See Exercise 4.32.

Extended Real-Valued Measurable Functions

In addition to real- and complex-valued functions, we frequently must deal with extended real-valued functions, in other words, functions that take values in $\mathcal{R}^* = \mathcal{R} \cup \{-\infty, \infty\}$. This is especially so when considering suprema, infima, and limits. For instance, define
$$f_n(x) = \frac{n}{\sqrt{2\pi}}\, e^{-(nx)^2/2}$$
for $x \in \mathcal{R}$ and $n \in \mathcal{N}$. Then, as $n \to \infty$, $f_n(x) \to 0$ if $x \ne 0$ and $f_n(0) \to \infty$. Consequently, the sequence $\{f_n\}_{n=1}^\infty$ of real-valued functions converges pointwise to the extended real-valued function $f$, where
$$f(x) = \begin{cases} 0, & \text{if } x \ne 0; \\ \infty, & \text{if } x = 0. \end{cases}$$

Thus, we next consider measurability for extended real-valued functions. Recall that, by definition, a real-valued function $f$ is $\mathcal{A}$-measurable if $f^{-1}(O) \in \mathcal{A}$ for all open sets $O \subset \mathcal{R}$. Also, by definition, a complex-valued function $f$ is $\mathcal{A}$-measurable if $f^{-1}(O) \in \mathcal{A}$ for all open sets $O \subset \mathcal{C}$. Hence, once we identify the open sets of $\mathcal{R}^*$, we have a natural way to define extended real-valued $\mathcal{A}$-measurable functions.

DEFINITION 4.6 Open Subsets of the Extended Real Numbers
A subset of $\mathcal{R}^*$ is said to be open if it can be expressed as a union of intervals of the form $(a, b)$, $[-\infty, b)$, and $(a, \infty]$, where $a, b \in \mathcal{R}$.

DEFINITION 4.7 Extended Real-Valued Measurable Function
Let $(\Omega, \mathcal{A})$ be a measurable space. An extended real-valued function $f$ on $\Omega$ is said to be an $\mathcal{A}$-measurable function if the inverse image of each open subset of $\mathcal{R}^*$ under $f$ is an $\mathcal{A}$-measurable set, that is, if $f^{-1}(O) \in \mathcal{A}$ for all open sets $O \subset \mathcal{R}^*$.

The next proposition provides the analogue of Proposition 4.1 for extended real-valued functions. Its proof is left as an exercise.

PROPOSITION 4.3
Let $(\Omega, \mathcal{A})$ be a measurable space and $f$ an extended real-valued function on $\Omega$. Then the following statements are equivalent:
a) $f$ is $\mathcal{A}$-measurable.
b) For each $a \in \mathcal{R}$, $f^{-1}\big([-\infty, a)\big) \in \mathcal{A}$.
c) For each $a \in \mathcal{R}$, $f^{-1}\big((a, \infty]\big) \in \mathcal{A}$.
d) For each $a \in \mathcal{R}$, $f^{-1}\big([-\infty, a]\big) \in \mathcal{A}$.
e) For each $a \in \mathcal{R}$, $f^{-1}\big([a, \infty]\big) \in \mathcal{A}$.
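A quick numerical look at the sequence $f_n(x) = \frac{n}{\sqrt{2\pi}} e^{-(nx)^2/2}$ from the beginning of this discussion shows its two kinds of pointwise behavior. The Python sketch is only suggestive. Floating-point arithmetic cannot represent the limit value $\infty$, and the sample points $x = 0$ and $x = 0.1$ are arbitrary choices.

```python
import math

def f_n(n, x):
    """f_n(x) = n / sqrt(2*pi) * exp(-(n*x)**2 / 2)."""
    return n / math.sqrt(2 * math.pi) * math.exp(-((n * x) ** 2) / 2)

# At x = 0 the values equal n / sqrt(2*pi) and grow without bound;
# at x = 0.1 they tend to 0 (eventually underflowing to 0.0 in floating point).
for n in (1, 10, 100, 1000):
    print(n, f_n(n, 0.0), f_n(n, 0.1))
```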
Theorem 4.3 shows that the collection of real-valued $\mathcal{A}$-measurable functions forms an algebra. For extended real-valued functions, suppose we adopt the convention that $\infty - \infty$ is some fixed extended real number. Then the collection of extended real-valued $\mathcal{A}$-measurable functions is closed under addition, scalar multiplication, and multiplication. See Exercises 4.39 and 4.40.

The next theorem shows that the collection of extended real-valued $\mathcal{A}$-measurable functions is closed under maxima, minima, suprema, infima, and pointwise limits. Note that Theorem 3.16 on page 130 is an immediate consequence.

THEOREM 4.5
Suppose that $f$ and $g$ are extended real-valued $\mathcal{A}$-measurable functions and that $\{f_n\}_{n=1}^\infty$ is a sequence of extended real-valued $\mathcal{A}$-measurable functions. Then
a) $f \vee g$ and $f \wedge g$ are $\mathcal{A}$-measurable.
b) $\sup_n f_n$ and $\inf_n f_n$ are $\mathcal{A}$-measurable.
c) $\limsup_{n\to\infty} f_n$ and $\liminf_{n\to\infty} f_n$ are $\mathcal{A}$-measurable.
d) If $\{f_n\}_{n=1}^\infty$ converges pointwise, then $\lim_{n\to\infty} f_n$ is $\mathcal{A}$-measurable.

PROOF: a) Let $h = f \vee g$ and $a \in \mathcal{R}$. Then
$$h^{-1}\big((a, \infty]\big) = f^{-1}\big((a, \infty]\big) \cup g^{-1}\big((a, \infty]\big).$$
This union is in $\mathcal{A}$ because $f$ and $g$ are $\mathcal{A}$-measurable functions. Thus, $f \vee g$ is $\mathcal{A}$-measurable. Similarly, $f \wedge g$ is $\mathcal{A}$-measurable.

b) Let $h = \sup_n f_n$ and $a \in \mathcal{R}$. Then
$$h^{-1}\big((a, \infty]\big) = \bigcup_{n=1}^\infty f_n^{-1}\big((a, \infty]\big).$$
This union is in $\mathcal{A}$ because each $f_n$ is an $\mathcal{A}$-measurable function. Hence, we see that $\sup_n f_n$ is $\mathcal{A}$-measurable. Similarly, $\inf_n f_n$ is $\mathcal{A}$-measurable.

c) Since $\limsup_{n\to\infty} f_n = \inf_n \sup_{k \ge n} f_k$, it follows from part (b) that $\limsup_{n\to\infty} f_n$ is $\mathcal{A}$-measurable. Using an entirely similar argument, we find that $\liminf_{n\to\infty} f_n$ is $\mathcal{A}$-measurable.

d) If $\{f_n\}_{n=1}^\infty$ converges pointwise, then $\lim_{n\to\infty} f_n = \limsup_{n\to\infty} f_n$. So, $\lim_{n\to\infty} f_n$ is $\mathcal{A}$-measurable by part (c).

A common application of Theorem 4.5 occurs when $\{f_n\}_{n=1}^\infty$ is a sequence of real-valued $\mathcal{A}$-measurable functions but one or more of $\inf_n f_n$, $\sup_n f_n$, $\liminf_{n\to\infty} f_n$, $\limsup_{n\to\infty} f_n$, and $\lim_{n\to\infty} f_n$ are extended real-valued $\mathcal{A}$-measurable functions.
EXAMPLE 4.5 Illustrates Theorem 4.5
a) Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$. Define
$$f_n(x) = \frac{n}{\sqrt{2\pi}}\, e^{-(nx)^2/2}$$
for $x \in \mathcal{R}$ and $n \in \mathcal{N}$. Then $\{f_n\}_{n=1}^\infty$ is a sequence of real-valued $\mathcal{M}$-measurable functions and $f_n \to f$ pointwise, where
$$f(x) = \begin{cases} 0, & \text{if } x \ne 0; \\ \infty, & \text{if } x = 0. \end{cases}$$
By Theorem 4.5(d), $f$ is an extended real-valued $\mathcal{M}$-measurable function, a fact that we can easily verify directly.
b) Let $f$ be an extended real-valued $\mathcal{A}$-measurable function. By Theorem 4.5(a), $|f|$ is $\mathcal{A}$-measurable since $|f| = f \vee (-f)$. □

Theorem 4.5(d) shows that if a sequence $\{f_n\}_{n=1}^\infty$ of $\mathcal{A}$-measurable functions converges pointwise to a function $f$, then $f$ is an $\mathcal{A}$-measurable function. What if the convergence is only almost everywhere? In general, we cannot conclude that $f$ is $\mathcal{A}$-measurable; however, for complete measure spaces we can.

PROPOSITION 4.4
Let $(\Omega, \mathcal{A}, \mu)$ be a complete measure space. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued or extended real-valued $\mathcal{A}$-measurable functions and that $f_n \to f$ $\mu$-ae. Then $f$ is an $\mathcal{A}$-measurable function.

PROOF: The proof is essentially identical to that of Proposition 3.14 on page 163 and is left to the reader.

EXERCISES 4.2

4.20 Prove Proposition 4.1 on page 175.

4.21 Let $(\Omega, \mathcal{A})$ be a measurable space and $f$ a real-valued function on $\Omega$. Prove that $f$ is $\mathcal{A}$-measurable if and only if $f^{-1}(B) \in \mathcal{A}$ for each $B \in \mathcal{B}$.

4.22 Suppose that $(\Omega, \mathcal{A})$ is a measurable space and that $f\colon \Omega \to \mathcal{R}$ is an $\mathcal{A}$-measurable function. Further suppose that $g\colon \mathcal{R} \to \mathcal{R}$ is a Borel measurable function. Prove that $g \circ f$ is $\mathcal{A}$-measurable.

4.23 Let $D \in \mathcal{B}$. Show that $C(D)$, the collection of Borel measurable functions on $D$, is precisely the collection of $\mathcal{B}_D$-measurable functions.
4.24 Let $(\Omega, \mathcal{A})$ be a measurable space, $D \in \mathcal{A}$, and $\mathcal{A}_D = \{ D \cap A : A \in \mathcal{A} \}$.
a) If $f\colon \Omega \to \mathcal{R}$ is $\mathcal{A}$-measurable, show that $f|_D$ is $\mathcal{A}_D$-measurable.
b) Suppose that $g\colon D \to \mathcal{R}$ is $\mathcal{A}_D$-measurable. Define $f\colon \Omega \to \mathcal{R}$ by
$$f(x) = \begin{cases} g(x), & x \in D; \\ 0, & x \notin D. \end{cases}$$
Prove that $f$ is $\mathcal{A}$-measurable. (This shows that every $\mathcal{A}_D$-measurable function can be extended to an $\mathcal{A}$-measurable function.)

4.25 Prove Proposition 4.2.

4.26 Provide an example to show that the hypothesis of completeness cannot be omitted from Proposition 4.2.

4.27 If $O$ is an open subset of $\mathcal{R}$ and $\alpha$ is a nonzero real number, show that $\alpha^{-1}O$ is an open subset of $\mathcal{R}$.

4.28 Prove Theorem 4.4. Hint: Use the fact that each open set in $\mathcal{C}$ is a countable union of open rectangles. [An open rectangle in $\mathcal{C}$ is a set of the form $\{ u + iv \in \mathcal{C} : a < u < b,\ c < v < d \}$.]

4.29 Show that every real-valued $\mathcal{A}$-measurable function is a complex-valued $\mathcal{A}$-measurable function.

4.30 The collection, $\mathcal{B}_2$, of Borel sets of $\mathcal{C}$ is defined to be the smallest $\sigma$-algebra of subsets of $\mathcal{C}$ that contains all the open subsets of $\mathcal{C}$. Show that $f\colon \Omega \to \mathcal{C}$ is $\mathcal{A}$-measurable if and only if $f^{-1}(B) \in \mathcal{A}$ for all $B \in \mathcal{B}_2$.

4.31 Let $(\Omega, \mathcal{A}, P)$ be a probability space, $X$ a random variable on $\Omega$, and $t$ a fixed real number. Define $g\colon \Omega \to \mathcal{C}$ by $g = e^{itX}$; that is, for each $x \in \Omega$, $g(x) = e^{itX(x)}$. Prove that $g$ is $\mathcal{A}$-measurable. Is $g$ a random variable? Explain your answer.

★4.32 Prove that the collection of complex-valued $\mathcal{A}$-measurable functions forms a complex algebra. That is, if $f$ and $g$ are complex-valued $\mathcal{A}$-measurable functions and $\alpha \in \mathcal{C}$, show that $f + g$, $\alpha f$, and $f \cdot g$ are complex-valued $\mathcal{A}$-measurable functions.

4.33 Show that each open subset of $\mathcal{R}$ is also an open subset of $\mathcal{R}^*$.

4.34 Prove Proposition 4.3 on page 179. Hint: Show that each open set in $\mathcal{R}^*$ can be written as a countable union of intervals of the form $(a, b)$, $[-\infty, b)$, and $(a, \infty]$, where $a, b \in \mathcal{R}$.

4.35 Show that $f\colon \Omega \to \mathcal{R}^*$ is $\mathcal{A}$-measurable if and only if (i) $f^{-1}(\{-\infty\})$ and $f^{-1}(\{\infty\})$ are in $\mathcal{A}$ and (ii) $f^{-1}(B) \in \mathcal{A}$ for all $B \in \mathcal{B}$.
4.36 Show that every real-valued $\mathcal{A}$-measurable function is an extended real-valued $\mathcal{A}$-measurable function.

4.37 Show that a set $O \subset \mathcal{R}$ is open in $\mathcal{R}$ if and only if there is an open subset $U$ of $\mathcal{R}^*$ such that $O = \mathcal{R} \cap U$.
4.38 Suppose that $f$ and $g$ are extended real-valued $\mathcal{A}$-measurable functions. Prove that the following three sets are $\mathcal{A}$-measurable:
a) $\{ x : f(x) > g(x) \}$.
b) $\{ x : f(x) \ge g(x) \}$.
c) $\{ x : f(x) = g(x) \}$.

4.39 Suppose that $f$ and $g$ are extended real-valued $\mathcal{A}$-measurable functions and that $\beta \in \mathcal{R}^*$. Set
$$E = \{ x : f(x) = \infty,\ g(x) = -\infty \} \cup \{ x : f(x) = -\infty,\ g(x) = \infty \}.$$
For $x \in E$, define $(f + g)(x) = \beta$; otherwise, define $(f + g)(x) = f(x) + g(x)$, as usual. Prove that $f + g$ is $\mathcal{A}$-measurable.

4.40 With the convention established in the preceding exercise, prove that the collection of extended real-valued $\mathcal{A}$-measurable functions is closed under scalar multiplication and multiplication.

4.41 Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of extended real-valued $\mathcal{A}$-measurable functions. Verify that $\{ x : \lim_{n\to\infty} f_n(x) \text{ exists} \}$ is an $\mathcal{A}$-measurable set.

4.42 Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions that converges pointwise to a complex-valued function, $f$. Prove that $f$ is $\mathcal{A}$-measurable.

4.43 Construct a sequence $\{f_n\}_{n=1}^\infty$ of $\mathcal{A}$-measurable functions that converges almost everywhere to a function, $f$, that is not $\mathcal{A}$-measurable. Hint: Take $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$ and do something with a non-Borel subset of the Cantor set.

4.44 Prove Proposition 4.4.

★4.45 Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions. Define
$$f(x) = \begin{cases} \lim_{n\to\infty} f_n(x), & \text{if } \lim_{n\to\infty} f_n(x) \text{ exists}; \\ 0, & \text{otherwise}. \end{cases}$$
Prove that $f$ is $\mathcal{A}$-measurable.

4.46 Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions and that $f_n \to g$ $\mu$-ae. Prove that there exists an $\mathcal{A}$-measurable function, $f$, such that $f_n \to f$ $\mu$-ae. Note: $g$ need not be $\mathcal{A}$-measurable unless, of course, $(\Omega, \mathcal{A}, \mu)$ is complete.

4.47 Suppose that $E$ is an open subset of $\mathcal{C}$ and that $g$ is a real-valued continuous function on $E$. Further suppose that $f$ is a complex-valued $\mathcal{A}$-measurable function on $\Omega$ whose range is a subset of $E$. Prove that $g \circ f$ is a real-valued $\mathcal{A}$-measurable function on $\Omega$.
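Prove the same result when $E$ is a closed subset of $\mathcal{C}$.

4.48 Suppose that $f\colon \Omega \to \mathcal{C}$ is $\mathcal{A}$-measurable. Verify that $f$ can be written in the "polar" form $f = Re^{i\Theta}$, where $R\colon \Omega \to [0, \infty)$ and $\Theta\colon \Omega \to \mathcal{R}$ are $\mathcal{A}$-measurable functions.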
4.3 THE ABSTRACT LEBESGUE INTEGRAL FOR NONNEGATIVE FUNCTIONS

Now that we have discussed measure spaces and measurable functions, we can proceed to develop the abstract Lebesgue integral, that is, the Lebesgue integral on an arbitrary measure space, $(\Omega, \mathcal{A}, \mu)$. As we will see, the development of the abstract Lebesgue integral is almost identical to that of the Lebesgue integral on the real line, that is, on $(\mathcal{R}, \mathcal{M}, \lambda)$, given in Chapter 3. Consequently, many of the proofs will be left to the reader.

Following the procedure in Chapter 3, we will first define the abstract Lebesgue integral of a simple function, then of a nonnegative $\mathcal{A}$-measurable function, and then of a real-valued $\mathcal{A}$-measurable function. We will also define the abstract Lebesgue integral of extended real-valued and complex-valued $\mathcal{A}$-measurable functions. Nonnegative functions are treated in this section and general functions in the next.

The Lebesgue Integral of a Nonnegative Simple Function

Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. An $\mathcal{A}$-measurable function on $\Omega$ is called a simple function if it takes on only finitely many values. More precisely, we have the following definition.

DEFINITION 4.8 Simple Function and Canonical Representation
An $\mathcal{A}$-measurable function, $s$, is said to be a simple function if its range is a finite set. Let $a_1, a_2, \ldots, a_n$ denote the distinct nonzero values of $s$ and set $A_k = \{ x : s(x) = a_k \}$, $1 \le k \le n$. Then
$$s = \sum_{k=1}^n a_k \chi_{A_k}.$$
This is called the canonical representation of $s$.

We leave it as an exercise for the reader to show that the sets $A_1, A_2, \ldots, A_n$ appearing in the canonical representation of an $\mathcal{A}$-measurable simple function are $\mathcal{A}$-measurable and pairwise disjoint.

EXAMPLE 4.6 Illustrates Definition 4.8
a) The Lebesgue measurable simple functions introduced in Chapter 3 are $\mathcal{M}$-measurable simple functions in the sense of Definition 4.8.
b) If $\Omega$ is a finite set, then every $\mathcal{A}$-measurable function is simple. □
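On a finite set with $\mathcal{A} = \mathcal{P}(\Omega)$, every real-valued function is an $\mathcal{A}$-measurable simple function. Its canonical representation can therefore be computed by grouping points by value. The Python sketch below assumes that setting. The helper `canonical_representation` is hypothetical; it returns a dictionary that maps each distinct nonzero value $a_k$ to the set $A_k = \{x : s(x) = a_k\}$.

```python
from collections import defaultdict

def canonical_representation(s, omega):
    """Map each distinct nonzero value a_k of s to A_k = {x in omega : s(x) = a_k}.

    Points where s is zero are omitted, matching Definition 4.8.
    """
    rep = defaultdict(set)
    for x in omega:
        v = s(x)
        if v != 0:
            rep[v].add(x)
    return {a: frozenset(A) for a, A in rep.items()}

omega = range(6)
s = lambda x: x % 3
rep = canonical_representation(s, omega)
print(rep)
# Rebuild s from sum_k a_k * chi_{A_k} to confirm the representation.
print(all(sum(a * (x in A) for a, A in rep.items()) == s(x) for x in omega))
```

Because each point has exactly one value of $s$, the resulting sets $A_k$ are automatically pairwise disjoint.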
Definition 4.9 defines the abstract Lebesgue integral of a nonnegative $\mathcal{A}$-measurable simple function. It is a straightforward generalization of the definition presented in Chapter 3 for the Lebesgue integral of a nonnegative Lebesgue measurable simple function.

DEFINITION 4.9 Integral of a Nonnegative Simple Function
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $s$ a nonnegative $\mathcal{A}$-measurable simple function on $\Omega$ with canonical representation $s = \sum_{k=1}^n a_k \chi_{A_k}$. Then the (abstract) Lebesgue integral of $s$ over $\Omega$ with respect to $\mu$ is defined by
$$\int_\Omega s(x)\,d\mu(x) = \sum_{k=1}^n a_k\,\mu(A_k).$$
If $E \in \mathcal{A}$, then the (abstract) Lebesgue integral of $s$ over $E$ with respect to $\mu$ is defined by
$$\int_E s(x)\,d\mu(x) = \int_\Omega \chi_E(x)\,s(x)\,d\mu(x).$$

Note: The notations $\int_E s\,d\mu$ and $\int_E s(x)\,\mu(dx)$ are commonly used in place of $\int_E s(x)\,d\mu(x)$.

The next proposition shows how we can obtain the abstract Lebesgue integral of a nonnegative simple function from a possibly noncanonical representation. The proof is identical to that of Proposition 3.8 on page 131.

PROPOSITION 4.5
Let $s$ be a nonnegative $\mathcal{A}$-measurable simple function that can be expressed in the form $s = \sum_{k=1}^m b_k \chi_{B_k}$, where this representation is not necessarily canonical but $B_k \in \mathcal{A}$ for $1 \le k \le m$ and $B_i \cap B_j = \emptyset$ for $i \ne j$. Then
$$\int_\Omega s(x)\,d\mu(x) = \sum_{k=1}^m b_k\,\mu(B_k).$$
More generally,
$$\int_E s(x)\,d\mu(x) = \sum_{k=1}^m b_k\,\mu(B_k \cap E)$$
for each $E \in \mathcal{A}$.
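Proposition 4.5 can be checked numerically on a small example. The Python sketch below assumes a hypothetical four-point space with $\mathcal{A} = \mathcal{P}(\Omega)$ and point weights chosen only for illustration. The helper `integral_from_representation` evaluates $\sum_k b_k\,\mu(B_k \cap E)$ for a list of pairs $(b_k, B_k)$ with pairwise disjoint $B_k$. Applied to the canonical representation and to a noncanonical representation of the same simple function, it gives the same integral.

```python
from fractions import Fraction

# Hypothetical finite measure space: Omega = {1, 2, 3, 4}, A = P(Omega),
# with mu determined by point weights.
weights = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def mu(S):
    """mu(S) = sum of the weights of the points of S."""
    return sum(weights[x] for x in S)

def integral_from_representation(rep, mu, E=None):
    """Return sum_k b_k * mu(B_k ∩ E) for pairs (b_k, B_k) with pairwise disjoint B_k.

    With E=None, the integral is taken over all of Omega.
    """
    return sum(b * mu(B if E is None else B & E) for b, B in rep)

# The simple function s = 3 on {1, 2}, 5 on {3}, 0 on {4}, written two ways.
canonical = [(3, frozenset({1, 2})), (5, frozenset({3}))]
noncanonical = [(3, frozenset({1})), (3, frozenset({2})), (5, frozenset({3})), (0, frozenset({4}))]

total_canonical = integral_from_representation(canonical, mu)
total_noncanonical = integral_from_representation(noncanonical, mu)
print(total_canonical, total_noncanonical)
```

Both representations give $3 \cdot \tfrac34 + 5 \cdot \tfrac18 = \tfrac{23}{8}$.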
The following fact is proved in precisely the same way as Lemma 3.14 on page 133.

PROPOSITION 4.6
Suppose that $s$ and $t$ are nonnegative $\mathcal{A}$-measurable simple functions and that $\alpha, \beta \ge 0$. Then $\alpha s + \beta t$ is a nonnegative $\mathcal{A}$-measurable simple function and
$$\int_E (\alpha s + \beta t)\,d\mu = \alpha \int_E s\,d\mu + \beta \int_E t\,d\mu$$
for each $E \in \mathcal{A}$.

The Lebesgue Integral of a Nonnegative $\mathcal{A}$-measurable Function

The next thing on the agenda is the definition of the abstract Lebesgue integral for a nonnegative extended real-valued $\mathcal{A}$-measurable function. Proposition 4.7 provides the motivation for that definition.

PROPOSITION 4.7
a) Suppose that $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$. Then there is a nondecreasing sequence of nonnegative $\mathcal{A}$-measurable simple functions that converges pointwise to $f$. In other words, there is a sequence $\{s_n\}_{n=1}^\infty$ of nonnegative $\mathcal{A}$-measurable simple functions such that, for all $x \in \Omega$, $s_1(x) \le s_2(x) \le \cdots$ and $\lim_{n\to\infty} s_n(x) = f(x)$.
b) If $\{s_n\}_{n=1}^\infty$ is a sequence of nonnegative $\mathcal{A}$-measurable simple functions that converges pointwise on $\Omega$ to a function $f$, then $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable function.

PROOF: The proof is left as an exercise for the reader.

Proposition 4.7 shows that the functions that can be approximated by nonnegative $\mathcal{A}$-measurable simple functions are precisely the nonnegative extended real-valued $\mathcal{A}$-measurable functions. Thus, we make the following definition.

DEFINITION 4.10 Lebesgue Integral of a Nonnegative Function
Let $f$ be a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$. Then the (abstract) Lebesgue integral of $f$ over $\Omega$ with
respect to $\mu$ is defined by
$$\int_\Omega f(x)\,d\mu(x) = \sup \int_\Omega s(x)\,d\mu(x),$$
where the supremum is taken over all nonnegative $\mathcal{A}$-measurable simple functions $s$ that are dominated by $f$. If $E \in \mathcal{A}$, then the (abstract) Lebesgue integral of $f$ over $E$ with respect to $\mu$ is defined by
$$\int_E f(x)\,d\mu(x) = \int_\Omega \chi_E(x)\,f(x)\,d\mu(x).$$

Note: The abstract Lebesgue integral of a nonnegative $\mathcal{M}$-measurable function with respect to $\lambda$ is identical to its Lebesgue integral, as defined in Chapter 3.

Proposition 4.8 lists some of the more important properties of the abstract Lebesgue integral for nonnegative extended real-valued $\mathcal{A}$-measurable functions. The proof is left as an exercise for the reader.

PROPOSITION 4.8
Let $f$ and $g$ be nonnegative extended real-valued $\mathcal{A}$-measurable functions on $\Omega$, $\alpha \ge 0$, and $E \in \mathcal{A}$. Then
a) $f \le g$ $\mu$-ae $\Rightarrow \int_E f\,d\mu \le \int_E g\,d\mu$.
b) $B \subset E$ and $B \in \mathcal{A} \Rightarrow \int_B f\,d\mu \le \int_E f\,d\mu$.
c) $f(x) = 0$ for all $x \in E \Rightarrow \int_E f\,d\mu = 0$.
d) $\mu(E) = 0 \Rightarrow \int_E f\,d\mu = 0$.
e) $\int_E \alpha f\,d\mu = \alpha \int_E f\,d\mu$.

Convergence Properties of the Abstract Lebesgue Integral for Nonnegative $\mathcal{A}$-measurable Functions

We now present two major convergence theorems for the abstract Lebesgue integral of nonnegative extended real-valued $\mathcal{A}$-measurable functions: the monotone convergence theorem (MCT) and Fatou's lemma. The proofs are similar to those given in Section 3.6 (page 140 onward).

The MCT is stated first. Note that it applies to extended real-valued $\mathcal{A}$-measurable functions as well as to real-valued $\mathcal{A}$-measurable functions.
THEOREM 4.6 Monotone Convergence Theorem (MCT)
Suppose that $\{f_n\}_{n=1}^\infty$ is a monotone nondecreasing sequence of nonnegative extended real-valued $\mathcal{A}$-measurable functions. Then, for each $E \in \mathcal{A}$,
$$\int_E \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty} \int_E f_n\,d\mu.$$

COROLLARY 4.1
Let $f, g, f_1, f_2, \ldots$ be nonnegative extended real-valued $\mathcal{A}$-measurable functions and let $E \in \mathcal{A}$. Then
a) $\int_E (f + g)\,d\mu = \int_E f\,d\mu + \int_E g\,d\mu$.
b) $\int_E \sum_{n=1}^\infty f_n\,d\mu = \sum_{n=1}^\infty \int_E f_n\,d\mu$.
c) If $\{E_n\}_n \subset \mathcal{A}$ are pairwise disjoint, then $\int_{\bigcup_n E_n} f\,d\mu = \sum_n \int_{E_n} f\,d\mu$.

Proposition 4.8(e) and Corollary 4.1(a) together imply that if $f$ and $g$ are nonnegative extended real-valued $\mathcal{A}$-measurable functions and $\alpha, \beta \ge 0$, then
$$\int_\Omega (\alpha f + \beta g)\,d\mu = \alpha \int_\Omega f\,d\mu + \beta \int_\Omega g\,d\mu. \qquad (4.2)$$

Equation (4.2), Proposition 4.7, and the MCT are frequently used together for "bootstrapping arguments." That is, suppose we want to prove that a certain Lebesgue-integral property holds for all nonnegative $\mathcal{A}$-measurable functions. To bootstrap, we employ three steps. First, we show that the property holds for characteristic functions of $\mathcal{A}$-measurable sets. Next, we apply (4.2) to conclude that the property holds for nonnegative simple functions. Finally, we use Proposition 4.7(a) and the MCT to deduce that the property holds for all nonnegative $\mathcal{A}$-measurable functions. Exercises 4.60 and 4.61 illustrate bootstrapping.

Next we state Fatou's lemma. This version of Fatou's lemma not only generalizes the version presented in Theorem 3.20 on page 146 to arbitrary measure spaces but also has less restrictive hypotheses. Specifically, it does not impose any convergence conditions on $\{f_n\}_{n=1}^\infty$.

THEOREM 4.7 Fatou's Lemma
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative extended real-valued $\mathcal{A}$-measurable functions. Then, for each $E \in \mathcal{A}$,
$$\int_E \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int_E f_n\,d\mu.$$
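The inequality in Fatou's lemma can be strict. The illustration below is a standard one, not taken from the text. It uses $\Omega = \{0, 1\}$ with counting measure and lets $f_n$ alternate between $\chi_{\{0\}}$ and $\chi_{\{1\}}$. Then $\liminf_n f_n = 0$ everywhere, while $\int_\Omega f_n\,d\mu = 1$ for every $n$. The Python sketch relies on the sequence being 2-periodic in $n$, so that each liminf equals a minimum over one period.

```python
# Omega = {0, 1} with counting measure (each point has mass 1).
omega = [0, 1]
mu = {0: 1, 1: 1}

def f(n, x):
    """f_n = indicator of {0} for even n, indicator of {1} for odd n."""
    return 1 if x == n % 2 else 0

def integral(g):
    """Integral of a nonnegative function g over the finite space (Omega, mu)."""
    return sum(g(x) * mu[x] for x in omega)

period = range(2)  # f_n depends only on n mod 2, so liminf over n = min over one period

liminf_f = lambda x: min(f(n, x) for n in period)
lhs = integral(liminf_f)                                   # integral of liminf f_n
rhs = min(integral(lambda x, n=n: f(n, x)) for n in period)  # liminf of the integrals
print(lhs, rhs)
```

Here the left side of Fatou's inequality is 0 and the right side is 1.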
EXAMPLE 4.7 Illustrates the Abstract Lebesgue Integral
a) Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f$ a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$. Suppose that $x_0 \in \Omega$ and that $\{x_0\} \in \mathcal{A}$. We claim that
$$\int_{\{x_0\}} f\,d\mu = f(x_0)\,\mu(\{x_0\}). \qquad (4.3)$$
To see this, note that $\chi_{\{x_0\}} f$ is the simple function $f(x_0)\chi_{\{x_0\}}$ and, hence, by Definition 4.9 on page 185,
$$\int_{\{x_0\}} f\,d\mu = \int_\Omega \chi_{\{x_0\}} f\,d\mu = \int_\Omega f(x_0)\chi_{\{x_0\}}\,d\mu = f(x_0)\,\mu(\{x_0\}).$$
More generally, let $C = \{x_n\}_n$ be a countable subset of $\Omega$ such that $\{x_n\} \in \mathcal{A}$ for each $n$. Then, by Corollary 4.1(c) and (4.3),
$$\int_C f\,d\mu = \int_{\bigcup_n \{x_n\}} f\,d\mu = \sum_n \int_{\{x_n\}} f\,d\mu = \sum_n f(x_n)\,\mu(\{x_n\}). \qquad (4.4)$$
b) Consider the measure space $(\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$, where $\mu$ is counting measure on $\mathcal{P}(\mathcal{N})$. Then, as we learned in Example 4.2(c), a nonnegative real-valued $\mathcal{P}(\mathcal{N})$-measurable function $f$ on $\mathcal{N}$ is a nonnegative infinite sequence $\{a_n\}_{n=1}^\infty$, where we have let $a_n = f(n)$. Thus, by (4.4),
$$\int_{\mathcal{N}} f\,d\mu = \sum_{n=1}^\infty f(n)\,\mu(\{n\}) = \sum_{n=1}^\infty a_n.$$
Hence, we can apply abstract measure theory to study infinite series.
c) Let $(\Omega, \mathcal{A}, P)$ be a probability space and $X$ a nonnegative random variable. Then the abstract Lebesgue integral of $X$ over $\Omega$ with respect to $P$ is called the mean (expectation, expected value) of $X$. The mean of $X$ is denoted by $\mathcal{E}(X)$. Thus,
$$\mathcal{E}(X) = \int_\Omega X\,dP.$$
For instance, consider the experiment of tossing a balanced coin twice. An appropriate probability space for that experiment is $(\Omega, \mathcal{A}, P)$, where
$\Omega = \{hh, ht, th, tt\}$, $\mathcal{A} = \mathcal{P}(\Omega)$, and, for $E \in \mathcal{A}$, $P(E) = N(E)/4$. Let $X$ denote the number of heads obtained. Then, by (4.4), the mean of $X$ equals
$$\mathcal{E}(X) = \int_\Omega X\,dP = X(hh)P(\{hh\}) + X(ht)P(\{ht\}) + X(th)P(\{th\}) + X(tt)P(\{tt\}) = 2 \cdot \tfrac14 + 1 \cdot \tfrac14 + 1 \cdot \tfrac14 + 0 \cdot \tfrac14 = 1,$$
which is intuitively what it should be.
d) Let $\Omega$ be a set, $\{x_n\}_n$ a sequence of distinct elements of $\Omega$, and $\{b_n\}_n$ a sequence of nonnegative real numbers. For $E \subset \Omega$, define
$$\mu(E) = \sum_{x_n \in E} b_n.$$
Then $\mu$ is a measure on $\mathcal{P}(\Omega)$. Let $f$ be a nonnegative function on $\Omega$ and set $C = \{x_n\}_n$. Since $\mu(C^c) = 0$, Corollary 4.1(c) on page 188, Proposition 4.8(d) on page 187, and (4.4) give
$$\int_\Omega f\,d\mu = \int_C f\,d\mu + \int_{C^c} f\,d\mu = \int_C f\,d\mu = \sum_n f(x_n)\,\mu(\{x_n\}) = \sum_n f(x_n)\,b_n. \qquad (4.5)$$
We will employ (4.5) frequently.
e) Let $\Omega$ be a set, $\mathcal{A} = \mathcal{P}(\Omega)$, and $\mu$ counting measure on $\mathcal{A}$. If $f$ is a nonnegative function on $\Omega$, then
$$\int_\Omega f\,d\mu = \sum_{x \in \Omega} f(x),$$
where $\sum_{x \in \Omega} f(x) = \sup\big\{ \sum_{x \in F} f(x) : F \text{ finite},\ F \subset \Omega \big\}$. The verification of this is left to the reader. □

EXERCISES 4.3

4.49 Establish that the sets appearing in the canonical representation of an $\mathcal{A}$-measurable simple function are $\mathcal{A}$-measurable and pairwise disjoint.

4.50 Prove Proposition 4.7 on page 186. Hint: Refer to Proposition 3.9 on page 134.
4.51 Suppose that $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$, $c > 0$, and $A_c = \{x : f(x) \ge c\}$. Prove that
$$\mu(A_c) \le \frac{1}{c}\int_\Omega f\,d\mu.$$

★4.52 Let $f$ be a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$ and $E \in \mathcal{A}$. Prove that $\int_E f\,d\mu = 0$ if and only if $f = 0$ $\mu$-ae on $E$.

★4.53 Suppose that $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$ and that $\int_\Omega f\,d\mu < \infty$. Show that $f$ is finite $\mu$-ae.

4.54 Prove Proposition 4.8 on page 187. Hint: Refer to Proposition 3.10 on page 136.

4.55 Prove the MCT, Theorem 4.6 on page 188.

4.56 Show that for a fixed $E \in \mathcal{A}$, the conclusion of the MCT remains valid if the hypotheses are satisfied only on $E$.

4.57 Prove Corollary 4.1 on page 188.

4.58 Suppose that $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$. Also, suppose that $E_1, E_2, \ldots \in \mathcal{A}$ with $E_1 \subset E_2 \subset \cdots$. Prove that
$$\int_{\bigcup_{n=1}^\infty E_n} f\,d\mu = \lim_{n\to\infty}\int_{E_n} f\,d\mu.$$

4.59 Prove Fatou's lemma, Theorem 4.7 on page 188.

4.60 Suppose that $(\Omega, \mathcal{A}, \mu)$ is a measure space, $D \in \mathcal{A}$, and $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$. Let $(D, \mathcal{A}_D, \mu_D)$ be as defined in Example 4.1(c) on page 168. Show that
$$\int_D f\,d\mu = \int_D f_{|D}\,d\mu_D.$$
Hint: Use bootstrapping.

★4.61 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $g$ a nonnegative $\mathcal{A}$-measurable function on $\Omega$. For $E \in \mathcal{A}$, define
$$\nu(E) = \int_E g\,d\mu.$$
a) Show that $\nu$ is a measure on $\mathcal{A}$.
b) Show that
$$\int_\Omega f\,d\nu = \int_\Omega fg\,d\mu$$
for each nonnegative $\mathcal{A}$-measurable function, $f$. Hint: Bootstrap.
4.62 Let $\{a_{mn}\}_{m,n=1}^\infty$ be a double sequence of nonnegative numbers. Prove that
$$\sum_{n=1}^\infty \sum_{m=1}^\infty a_{mn} = \sum_{m=1}^\infty \sum_{n=1}^\infty a_{mn}.$$
Hint: Refer to Example 4.7(b) on page 189.

4.63 Let $f : \Omega \to [0,1]$ be an $\mathcal{A}$-measurable function.
a) Prove that $\lim_{n\to\infty}\int_\Omega f^{1/n}\,d\mu = \mu\bigl(f^{-1}((0,1])\bigr)$.
b) If $\mu(\Omega) < \infty$, prove that $\lim_{n\to\infty}\int_\Omega f^n\,d\mu = \mu\bigl(f^{-1}(\{1\})\bigr)$.

4.4 THE GENERAL ABSTRACT LEBESGUE INTEGRAL

In the previous section, we discussed the abstract Lebesgue integral for nonnegative extended real-valued $\mathcal{A}$-measurable functions. We will now expand the definition of the abstract Lebesgue integral so that it applies to $\mathcal{A}$-measurable functions that are not necessarily nonnegative. We begin with extended real-valued functions.

Lebesgue Integral of an Extended Real-Valued Function

Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. To define the abstract Lebesgue integral of an extended real-valued $\mathcal{A}$-measurable function, $f$, on $\Omega$, we follow the procedure used in Section 3.7 for defining the Lebesgue integral of a real-valued Lebesgue measurable function on $\mathcal{R}$.

DEFINITION 4.11 Integral of an Extended Real-Valued Function
Let $f$ be an extended real-valued $\mathcal{A}$-measurable function on $\Omega$ and $E \in \mathcal{A}$. Then the (abstract) Lebesgue integral of $f$ over $E$ with respect to $\mu$ is defined by
$$\int_E f\,d\mu = \int_E f^+\,d\mu - \int_E f^-\,d\mu, \tag{4.6}$$
provided that the right-hand side makes sense; that is, at least one of the integrals on the right-hand side of (4.6) is finite. Here $f^+ = f \vee 0$ and $f^- = -(f \wedge 0)$ denote the positive and negative parts of $f$, respectively. In addition, we say that $f$ is Lebesgue integrable over $E$
if both integrals on the right-hand side of (4.6) are finite or, equivalently, if
$$\int_E |f|\,d\mu = \int_E f^+\,d\mu + \int_E f^-\,d\mu < \infty. \tag{4.7}$$
If $f$ is Lebesgue integrable over $\Omega$, then we say that $f$ is Lebesgue integrable. We should mention that if $f$ is Lebesgue integrable (over $\Omega$), then it is Lebesgue integrable over every $E \in \mathcal{A}$. Here are some examples.

EXAMPLE 4.8 Illustrates Definition 4.11

a) Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $f(x) = x$. Then
$$f^+(x) = \begin{cases} x, & x \ge 0;\\ 0, & x < 0, \end{cases} \qquad f^-(x) = \begin{cases} 0, & x \ge 0;\\ -x, & x < 0. \end{cases}$$
(i) If $E = \mathcal{R}$, then $\int_{\mathcal{R}} f^+\,d\lambda = \int_{\mathcal{R}} f^-\,d\lambda = \infty$. Hence, the integral $\int_{\mathcal{R}} f\,d\lambda$ is not defined.
(ii) If $E = [-1,2]$, then $\int_E f^+\,d\lambda = 2$ and $\int_E f^-\,d\lambda = 1/2$, so that $\int_E f\,d\lambda = 2 - 1/2 = 3/2$. And, since $\int_E |f|\,d\lambda = 2 + 1/2 = 5/2 < \infty$, we see that $f$ is Lebesgue integrable over $[-1,2]$.
(iii) If $E = (-\infty, 1)$, then $\int_E f^+\,d\lambda = 1/2$ and $\int_E f^-\,d\lambda = \infty$, so that $\int_E f\,d\lambda = 1/2 - \infty = -\infty$. However, as $\int_E |f|\,d\lambda = 1/2 + \infty = \infty$, we see that $f$ is not Lebesgue integrable over $(-\infty, 1)$.

b) Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$, where $\mu$ is counting measure on $\mathcal{P}(\mathcal{N})$. Then real-valued $\mathcal{A}$-measurable functions are simply infinite sequences of real numbers. Referring to Example 4.7(b) on page 189, we see that a sequence of real numbers, $\{a_n\}_{n=1}^\infty$, is Lebesgue integrable (over $\mathcal{N}$) if and only if
$$\sum_{n=1}^\infty |a_n| < \infty, \tag{4.8}$$
that is, the series $\sum_{n=1}^\infty a_n$ is absolutely convergent. For instance, the sequence $\{(-1)^n/n^p\}_{n=1}^\infty$ is Lebesgue integrable if and only if $p > 1$. Note that, although $\sum_{n=1}^\infty (-1)^n/n$ converges, $\{(-1)^n/n\}_{n=1}^\infty$ is not Lebesgue integrable, as the series is not absolutely convergent. □
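For the concrete function $f(x) = x$ of Example 4.8(a), the case analysis in Definition 4.11 can be mirrored in a few lines of Python. This is only an illustrative sketch: the helper names are hypothetical, and the integrals of $f^+$ and $f^-$ over an interval are hard-coded from the closed form $\int_{[a,b]} x^+\,d\lambda = (\max(b,0)^2 - \max(a,0)^2)/2$ rather than computed by a general integration routine. Returning `None` stands for "the integral is not defined".

```python
import math

def pos_part_integral(a: float, b: float) -> float:
    """Integral of x^+ = max(x, 0) over the interval with endpoints a <= b.
    Endpoints may be -inf/inf; including or excluding them does not matter
    for Lebesgue measure."""
    lo, hi = max(a, 0.0), max(b, 0.0)
    if hi == math.inf:
        return math.inf
    return (hi * hi - lo * lo) / 2.0

def neg_part_integral(a: float, b: float) -> float:
    """Integral of x^- = max(-x, 0) over [a, b]; substituting y = -x turns it
    into the integral of y^+ over [-b, -a]."""
    return pos_part_integral(-b, -a)

def lebesgue_integral(a: float, b: float):
    """(4.6): int f^+ - int f^-, or None when both parts are infinite."""
    p, n = pos_part_integral(a, b), neg_part_integral(a, b)
    if p == math.inf and n == math.inf:
        return None  # the integral is not defined
    return p - n

def is_integrable(a: float, b: float) -> bool:
    """(4.7): f is integrable over E iff int f^+ + int f^- < infinity."""
    return pos_part_integral(a, b) + neg_part_integral(a, b) < math.inf

for a, b in [(-1.0, 2.0), (-math.inf, 1.0), (-math.inf, math.inf)]:
    print((a, b), lebesgue_integral(a, b), is_integrable(a, b))
```

The three calls correspond to cases (ii), (iii), and (i) of Example 4.8(a).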
Lebesgue Integral of a Complex-Valued Function

Next we will define the abstract Lebesgue integral for complex-valued $\mathcal{A}$-measurable functions. First some preliminaries.

DEFINITION 4.12 Modulus of a Complex-Valued Function
Let $f$ be a complex-valued function on $\Omega$. Then the modulus of $f$, denoted by $|f|$, is defined to be the real-valued function
$$|f| = \sqrt{(\Re f)^2 + (\Im f)^2}.$$
In other words, $|f|(x) = |f(x)|$, where $|f(x)|$ denotes the modulus of the complex number $f(x)$.

The following two propositions will be required. We leave the proofs as exercises for the reader.

PROPOSITION 4.9 Let $f$ be a complex-valued function on $\Omega$. Then
a) $|f| \le |\Re f| + |\Im f|$.
b) $|\Re f| \le |f|$ and $|\Im f| \le |f|$.
c) $|f|$ is $\mathcal{A}$-measurable if $f$ is.

PROPOSITION 4.10 Let $f$ be a complex-valued $\mathcal{A}$-measurable function on $\Omega$ and $E \in \mathcal{A}$. Then $|f|$ is Lebesgue integrable over $E$ if and only if both $\Re f$ and $\Im f$ are.

In view of Proposition 4.10 and the fact that $f = \Re f + i\,\Im f$, it is reasonable to make the following definition.

DEFINITION 4.13 Integral of a Complex-Valued Function
Let $f$ be a complex-valued $\mathcal{A}$-measurable function on $\Omega$ and $E \in \mathcal{A}$. We say that $f$ is Lebesgue integrable over $E$ with respect to $\mu$ if $|f|$ is Lebesgue integrable over $E$ with respect to $\mu$; that is,
$$\int_E |f|\,d\mu < \infty.$$
In that case, the (abstract) Lebesgue integral of $f$ over $E$ with respect to $\mu$ is defined by
$$\int_E f\,d\mu = \int_E (\Re f)\,d\mu + i\int_E (\Im f)\,d\mu.$$
If $f$ is Lebesgue integrable over $\Omega$, then we say that $f$ is Lebesgue integrable.

For a measure space, $(\Omega, \mathcal{A}, \mu)$, the collection of all complex-valued Lebesgue integrable functions is denoted by $\mathcal{L}^1(\Omega, \mathcal{A}, \mu)$. When no confusion will arise, we write $\mathcal{L}^1(\mu)$ for $\mathcal{L}^1(\Omega, \mathcal{A}, \mu)$.

EXAMPLE 4.9 Illustrates Definition 4.13

a) Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $f(x) = e^{ix}/(1+x^2)$. Then we have $\Re f(x) = \cos x/(1+x^2)$, $\Im f(x) = \sin x/(1+x^2)$, and $|f(x)| = 1/(1+x^2)$. By Exercise 3.71 on page 147 and Theorem 3.23 on page 157,
$$\int_{\mathcal{R}} |f(x)|\,d\lambda(x) = \int_{\mathcal{R}} (1+x^2)^{-1}\,d\lambda(x) = \lim_{n\to\infty}\int_{[-n,n]} (1+x^2)^{-1}\,d\lambda(x) = \lim_{n\to\infty}\int_{-n}^{n}\frac{dx}{1+x^2}$$
$$= \lim_{n\to\infty}\bigl(\arctan(n) - \arctan(-n)\bigr) = \pi < \infty.$$
Therefore, $f \in \mathcal{L}^1(\lambda)$.

b) Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$, where $\mu$ is counting measure on $\mathcal{P}(\mathcal{N})$. Then complex-valued $\mathcal{A}$-measurable functions are simply infinite sequences of complex numbers. Referring to Example 4.7(b) on page 189, we see that a sequence of complex numbers, $\{a_n\}_{n=1}^\infty$, is in $\mathcal{L}^1(\mu)$ if and only if the series $\sum_{n=1}^\infty a_n$ converges absolutely. We point out here that the notations $\ell^1$ or $\ell^1(\mathcal{N})$ are generally used in place of $\mathcal{L}^1(\mu)$.

c) Let $(\Omega, \mathcal{A})$ be a measurable space. A measure, $\mu$, on $\mathcal{A}$ is said to be a finite measure if $\mu(\Omega) < \infty$. If $\mu$ is a finite measure, then $(\Omega, \mathcal{A}, \mu)$ is called a finite measure space. For a finite measure space, each bounded complex-valued $\mathcal{A}$-measurable function, $f$, is in $\mathcal{L}^1(\mu)$. Indeed, if $|f| \le M$, then by Proposition 4.8(a) on page 187,
$$\int_\Omega |f|\,d\mu \le \int_\Omega M\,d\mu = M\mu(\Omega) < \infty.$$
Note that boundedness is a sufficient but not necessary condition for integrability. For instance, let $(\Omega, \mathcal{A}, \mu) = ((0,1), \mathcal{M}_{(0,1)}, \lambda_{(0,1)})$ and $f(x) = x^{-1/2}$. Then $f$ is not bounded on $(0,1)$ but is in $\mathcal{L}^1(\lambda_{(0,1)})$.

d) If $(\Omega, \mathcal{A}, P)$ is a probability space, then the integrable functions, that is, members of $\mathcal{L}^1(P)$, are called random variables with finite mean or finite expectation. □

The following theorem, whose proof is left as an exercise, provides some important properties of Lebesgue integrable functions.

THEOREM 4.8 Suppose that $f$ and $g$ are in $\mathcal{L}^1(\Omega, \mathcal{A}, \mu)$ and that $\alpha \in \mathcal{C}$. Then
a) $f + g \in \mathcal{L}^1(\mu)$ and $\int_\Omega (f+g)\,d\mu = \int_\Omega f\,d\mu + \int_\Omega g\,d\mu$.
b) $\alpha f \in \mathcal{L}^1(\mu)$ and $\int_\Omega \alpha f\,d\mu = \alpha\int_\Omega f\,d\mu$.
c) If $f$ and $g$ are real-valued and $f \le g$ on $\Omega$, then $\int_\Omega f\,d\mu \le \int_\Omega g\,d\mu$.
d) $\left|\int_\Omega f\,d\mu\right| \le \int_\Omega |f|\,d\mu$.
e) $\mu(E) = 0 \Rightarrow \int_E f\,d\mu = 0$.
f) If $A$ and $B$ are disjoint $\mathcal{A}$-measurable sets, then $\int_{A \cup B} f\,d\mu = \int_A f\,d\mu + \int_B f\,d\mu$.

Remark: Parts (a) and (b) of Theorem 4.8 together imply that if $\alpha, \beta \in \mathcal{C}$ and $f, g \in \mathcal{L}^1(\mu)$, then
$$\int_\Omega (\alpha f + \beta g)\,d\mu = \alpha\int_\Omega f\,d\mu + \beta\int_\Omega g\,d\mu.$$
This is called the linearity property of the abstract Lebesgue integral.

As mentioned in Section 3.8, we often encounter functions that are only defined almost everywhere. Because the integral of an $\mathcal{A}$-measurable function is not affected by its values on a set of measure zero, it is reasonable to make the following definition.
DEFINITION 4.14 Integral of a Function Defined Almost Everywhere
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $f$ is a function defined $\mu$-ae on $\Omega$; that is, if $D$ is the domain of $f$, then $\mu(D^c) = 0$. Further suppose that there is an $\mathcal{A}$-measurable function, $g$, such that $g(x) = f(x)$ for $x \in D$. Then, for $E \in \mathcal{A}$, we define the (abstract) Lebesgue integral of $f$ over $E$ by
$$\int_E f\,d\mu = \int_E g\,d\mu,$$
provided that the integral on the right-hand side exists.

Dominated Convergence Theorem

Theorem 3.22 on page 154 gives the dominated convergence theorem (DCT) for real-valued functions on the measure space $(\mathcal{R}, \mathcal{M}, \lambda)$. In what follows, we generalize the DCT so that it applies to complex-valued functions on an arbitrary measure space $(\Omega, \mathcal{A}, \mu)$. Note that the version of the DCT given here has weaker hypotheses than the one presented in Theorem 3.22.

THEOREM 4.9 Dominated Convergence Theorem (DCT)
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions that converges $\mu$-ae. Further suppose that there is a nonnegative Lebesgue integrable function, $g$, such that $|f_n| \le g$ $\mu$-ae for each $n \in \mathcal{N}$. Then
$$\int_E \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty}\int_E f_n\,d\mu \tag{4.9}$$
for each $E \in \mathcal{A}$.

PROOF: Without loss of generality, we can assume that, for each $n \in \mathcal{N}$, $|f_n| \le g$ everywhere on $\Omega$ (why?). Define
$$f(x) = \begin{cases} \lim_{n\to\infty} f_n(x), & \text{if } \lim_{n\to\infty} f_n(x) \text{ exists};\\ 0, & \text{otherwise}. \end{cases}$$
Then, by Exercise 4.45 on page 183, $f$ is $\mathcal{A}$-measurable. Moreover, since $\{f_n\}_{n=1}^\infty$ converges $\mu$-ae, $f_n \to f$ $\mu$-ae. From Definition 4.14, we see that $\int_E \lim_{n\to\infty} f_n\,d\mu = \int_E f\,d\mu$ and, therefore, to prove (4.9) it suffices to prove
$$\int_E f\,d\mu = \lim_{n\to\infty}\int_E f_n\,d\mu \tag{4.10}$$
for each $E \in \mathcal{A}$.
First suppose that each $f_n$ is real-valued. Then (4.10) can be proved by employing Fatou's lemma (Theorem 4.7 on page 188) and the same argument that was used in the proof of the DCT for the Lebesgue integral on the real line (Theorem 3.22).

Next, we remove the restriction that each $f_n$ is real-valued. Note that $\{|f - f_n|\}_{n=1}^\infty$ is a sequence of real-valued $\mathcal{A}$-measurable functions that converges to 0 $\mu$-ae. Furthermore, for each $n \in \mathcal{N}$, we have $|f - f_n| \le |f| + |f_n| \le 2g$, an integrable function. Consequently, by Theorem 4.8 and the previous paragraph, as $n \to \infty$,
$$\left|\int_E f\,d\mu - \int_E f_n\,d\mu\right| \le \int_E |f - f_n|\,d\mu \to \int_E 0\,d\mu = 0$$
for each $E \in \mathcal{A}$. This completes the proof of the DCT.

Three of the many important corollaries of the DCT are given in what follows. Several other corollaries are considered in the exercises.

COROLLARY 4.2 Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions such that
$$\sum_{n=1}^\infty \int_\Omega |f_n|\,d\mu < \infty.$$
Then $\sum_{n=1}^\infty f_n$ converges $\mu$-ae and
$$\int_E \sum_{n=1}^\infty f_n\,d\mu = \sum_{n=1}^\infty \int_E f_n\,d\mu$$
for each $E \in \mathcal{A}$.

PROOF: From Corollary 4.1(b) on page 188, we know that
$$\int_\Omega \sum_{n=1}^\infty |f_n|\,d\mu = \sum_{n=1}^\infty \int_\Omega |f_n|\,d\mu.$$
By assumption, the sum on the right-hand side of the previous equation is finite and, hence, so is the integral on the left-hand side. In other words, if we set $g = \sum_{n=1}^\infty |f_n|$, then $g$ is Lebesgue integrable. From Exercise 4.53 on page 191, we conclude that $g$ is finite $\mu$-ae which, in turn, implies that $\sum_{n=1}^\infty f_n$ converges $\mu$-ae.
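Set $g_n = \sum_{k=1}^n f_k$. Then, for each $n \in \mathcal{N}$, $|g_n| \le g$ and, as we have just seen, $\{g_n\}_{n=1}^\infty$ converges $\mu$-ae (to $\sum_{n=1}^\infty f_n$). Therefore, by the DCT and Theorem 4.8(a),
$$\int_E \sum_{n=1}^\infty f_n\,d\mu = \int_E \lim_{n\to\infty} g_n\,d\mu = \lim_{n\to\infty}\int_E g_n\,d\mu = \lim_{n\to\infty}\int_E \sum_{k=1}^n f_k\,d\mu = \lim_{n\to\infty}\sum_{k=1}^n \int_E f_k\,d\mu = \sum_{n=1}^\infty \int_E f_n\,d\mu$$
for each $E \in \mathcal{A}$.

COROLLARY 4.3 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, $f \in \mathcal{L}^1(\mu)$, and $\{E_n\}_{n=1}^\infty$ a sequence of $\mathcal{A}$-measurable sets with $E_1 \subset E_2 \subset \cdots$. Then
$$\int_{\bigcup_{n=1}^\infty E_n} f\,d\mu = \lim_{n\to\infty}\int_{E_n} f\,d\mu.$$

PROOF: For convenience, let $E = \bigcup_{n=1}^\infty E_n$. It is easy to see that $\chi_{E_n} f \to \chi_E f$ pointwise and that $|\chi_{E_n} f| \le |f| \in \mathcal{L}^1(\mu)$ for each $n \in \mathcal{N}$. Thus, by the DCT,
$$\int_E f\,d\mu = \int_\Omega \chi_E f\,d\mu = \lim_{n\to\infty}\int_\Omega \chi_{E_n} f\,d\mu = \lim_{n\to\infty}\int_{E_n} f\,d\mu,$$
as required.

COROLLARY 4.4 Bounded Convergence Theorem
Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of uniformly bounded, complex-valued, $\mathcal{A}$-measurable functions that converges $\mu$-ae. Then
$$\int_E \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty}\int_E f_n\,d\mu$$
for each $E \in \mathcal{A}$.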
PROOF: By assumption, there is a real number, $M$, such that $|f_n| \le M$ for all $n \in \mathcal{N}$. Because $(\Omega, \mathcal{A}, \mu)$ is a finite measure space, the function $g(x) = M$ is Lebesgue integrable (why?). Applying the DCT completes the proof.

EXAMPLE 4.10 Illustrates the DCT

a) Suppose that for each $n \in \mathcal{N}$, $\{a_{nk}\}_{k=1}^\infty$ is a sequence of complex numbers and that $\lim_{n\to\infty} a_{nk} = a_k$ for each $k \in \mathcal{N}$. Further suppose that there is a sequence of nonnegative numbers, $\{b_k\}_{k=1}^\infty$, such that $\sum_{k=1}^\infty b_k < \infty$ and $|a_{nk}| \le b_k$ for $k, n \in \mathcal{N}$. We claim that
$$\lim_{n\to\infty}\sum_{k=1}^\infty a_{nk} = \sum_{k=1}^\infty a_k. \tag{4.11}$$
Indeed, consider the measure space $(\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$, where $\mu$ is counting measure. Define $f_n(k) = a_{nk}$, $f(k) = a_k$, and $g(k) = b_k$. By assumption, $g$ is integrable, $|f_n| \le g$ for all $n \in \mathcal{N}$, and $f_n \to f$ pointwise on $\mathcal{N}$. Thus, by the DCT, $\int_{\mathcal{N}} f_n\,d\mu \to \int_{\mathcal{N}} f\,d\mu$ as $n \to \infty$. However, $\int_{\mathcal{N}} f_n\,d\mu = \sum_{k=1}^\infty a_{nk}$ and $\int_{\mathcal{N}} f\,d\mu = \sum_{k=1}^\infty a_k$ (see Exercise 4.73). Thus, (4.11) holds.

Without a dominating integrable sequence, (4.11) may fail. For instance, take $a_{nk} = \delta_{nk}$ and $a_k = 0$. Then $\lim_{n\to\infty} a_{nk} = a_k$ for each $k \in \mathcal{N}$. But, as $\sum_{k=1}^\infty a_{nk} = 1$ for all $n \in \mathcal{N}$, we see that
$$\lim_{n\to\infty}\sum_{k=1}^\infty a_{nk} = 1 \ne 0 = \sum_{k=1}^\infty a_k.$$
Therefore, (4.11) fails to hold.

b) Let $(\Omega, \mathcal{A}, P)$ be a probability space and $X$ a real-valued random variable having finite expectation, that is, $X \in \mathcal{L}^1(P)$. Define $f$ on $\mathcal{R}$ by $f(t) = \mathcal{E}(e^{itX})$. Note that the definition of $f$ makes sense because $|e^{itX}| \le 1$. We claim that $f'(0) = i\mathcal{E}(X)$. To prove this, let $\{t_n\}_{n=1}^\infty$ be an arbitrary sequence of nonzero real numbers that converges to 0. For each $n \in \mathcal{N}$, define $Y_n = (e^{it_nX} - 1)/t_n$. Then (see Exercise 4.75)
$$\frac{f(t_n) - f(0)}{t_n - 0} = \int_\Omega \frac{e^{it_nX} - 1}{t_n}\,dP = \int_\Omega Y_n\,dP. \tag{4.12}$$
Now, for $x \in \mathcal{R}$, we have $|e^{ix} - 1| \le |x|$ and, therefore, $|Y_n| \le |X|$ for each $n \in \mathcal{N}$. As $Y_n \to iX$ pointwise on $\Omega$, we can apply the DCT to conclude that
$$\lim_{n\to\infty}\frac{f(t_n) - f(0)}{t_n - 0} = \lim_{n\to\infty}\int_\Omega Y_n\,dP = \int_\Omega iX\,dP = i\mathcal{E}(X).$$
Because $\{t_n\}_{n=1}^\infty$ is an arbitrary sequence of nonzero real numbers converging to 0, it follows that $f'(0)$ exists and equals $i\mathcal{E}(X)$. □

EXERCISES 4.4

4.64 Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{N}, \mathcal{P}(\mathcal{N}), \mu)$, where $\mu$ is counting measure on $\mathcal{P}(\mathcal{N})$. Define $f(n) = (-1)^n/n$ for $n \in \mathcal{N}$. Is $\int_{\mathcal{N}} f\,d\mu$ defined? Explain your answer.

4.65 Prove Proposition 4.9 on page 194.

4.66 Prove Proposition 4.10 on page 194.

4.67 Let $f(x) = x^{-1/2}$. Show that $f \in \mathcal{L}^1((0,1), \mathcal{M}_{(0,1)}, \lambda_{(0,1)})$.

4.68 Prove Theorem 4.8 on page 196.

4.69 Prove that Definition 4.14 on page 197 is well-posed. In other words, assume that $f$ is defined $\mu$-ae on $\Omega$ and that $g$ and $h$ are $\mathcal{A}$-measurable functions that equal $f$ on its domain. Show that for $E \in \mathcal{A}$, either $\int_E g\,d\mu = \int_E h\,d\mu$ or neither integral exists.

4.70 Show that for a fixed $E \in \mathcal{A}$, the conclusion of the DCT remains valid if the hypotheses are satisfied only on $E$.

4.71 State and prove a version of the DCT for extended real-valued $\mathcal{A}$-measurable functions.

★4.72 Suppose that $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$. Further suppose that $\{E_n\}_{n=1}^\infty$ is a sequence of pairwise disjoint $\mathcal{A}$-measurable sets. Prove that
$$\int_{\bigcup_{n=1}^\infty E_n} f\,d\mu = \sum_{n=1}^\infty \int_{E_n} f\,d\mu.$$

★4.73 Let $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$ and $C = \{x_n\}_n$ a countable subset of $\Omega$ such that $\{x_n\} \in \mathcal{A}$ for each $n$. Prove that
$$\int_C f\,d\mu = \sum_n f(x_n)\,\mu(\{x_n\}).$$
Deduce that if $\{a_n\}_{n=1}^\infty \in \ell^1$, then
$$\int_{\mathcal{N}} f\,d\mu = \sum_{n=1}^\infty a_n,$$
where $f(n) = a_n$.

4.74 Assume that $\sum_{k=1}^\infty a_k$ is a convergent series of nonnegative numbers and that, for $n, k \in \mathcal{N}$, $b_{nk}$ are complex numbers with $|b_{nk}| \le M < \infty$. Also assume that $\lim_{n\to\infty} b_{nk} = b_k$ for each $k \in \mathcal{N}$. Prove that
$$\lim_{n\to\infty}\sum_{k=1}^\infty a_k b_{nk} = \sum_{k=1}^\infty a_k b_k.$$
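4.75 Provide a detailed justification of (4.12) on page 200.

4.76 Let $(\Omega, \mathcal{A}, P)$ be a probability space and $Y$ a real-valued random variable taking on only finitely many values, say $y_1, y_2, \ldots, y_n$. Verify that
$$\mathcal{E}(Y) = \sum_{k=1}^n y_k\,P(Y = y_k), \tag{4.13}$$
where, by convention, $\{Y = y\} = \{x \in \Omega : Y(x) = y\}$. (Equation (4.13) shows that the mean of a random variable, $Y$, taking on only finitely many values is a weighted average of the values of $Y$, weighted according to their probabilities.)

4.77 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, $f \in \mathcal{L}^1(\mu)$, and $\{E_n\}_{n=1}^\infty$ a sequence of $\mathcal{A}$-measurable sets with $E_1 \supset E_2 \supset \cdots$. Prove that
$$\int_{\bigcap_{n=1}^\infty E_n} f\,d\mu = \lim_{n\to\infty}\int_{E_n} f\,d\mu.$$

4.78 Suppose that $f : [0,1] \times (0,1) \to \mathcal{R}$ is such that for each fixed $y \in (0,1)$, the function $f^y$, defined by $f^y(x) = f(x,y)$, is $\mathcal{M}_{[0,1]}$-measurable. Further suppose that $\partial f/\partial y$ exists and is bounded on $[0,1] \times (0,1)$. Show that
$$\frac{d}{dy}\int_0^1 f(x,y)\,dx = \int_0^1 \frac{\partial f}{\partial y}(x,y)\,dx.$$

4.79 Let $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$. Show that for each $\epsilon > 0$, there is an $A \in \mathcal{A}$ with $\mu(A) < \infty$ and $\int_{A^c} |f|\,d\mu < \epsilon$.

★4.80 Suppose that $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$. Show that for each $\epsilon > 0$, there is a $\delta > 0$ such that $\mu(E) < \delta \Rightarrow \int_E |f|\,d\mu < \epsilon$.

★4.81 Let $f \in \mathcal{L}^1(\mathcal{R}, \mathcal{M}, \lambda)$. Then we define the Fourier transform of $f$, denoted $\hat f$, by
$$\hat f(t) = \int_{\mathcal{R}} e^{-itx} f(x)\,d\lambda(x), \qquad t \in \mathcal{R}.$$
a) Prove that $\hat f$ is continuous on $\mathcal{R}$.
b) Prove that if $\int_{\mathcal{R}} |xf(x)|\,d\lambda(x) < \infty$, then $\hat f$ is differentiable on $\mathcal{R}$ and
$$\hat f'(t) = \int_{\mathcal{R}} (-ix)e^{-itx} f(x)\,d\lambda(x), \qquad t \in \mathcal{R}.$$

★4.82 Suppose that $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$.
a) Show that for each $\epsilon > 0$, there is a bounded $\mathcal{A}$-measurable function, $g$, such that $\int_\Omega |f - g|\,d\mu < \epsilon$.
b) Show that for each $\epsilon > 0$, there is an $\mathcal{A}$-measurable simple function, $s$, such that $\int_\Omega |f - s|\,d\mu < \epsilon$.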
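Equation (4.13) and the claim $f'(0) = i\mathcal{E}(X)$ of Example 4.10(b) can both be checked numerically on the coin-toss space of Example 4.7(c). The sketch below is illustrative only: the names `expectation` and `char_fn` are hypothetical, exact fractions are used for the probabilities, and the floating-point difference quotient only approximates the limit that the DCT justifies. It also checks the domination $|Y_n| \le |X|$ used in the example.

```python
import cmath
from fractions import Fraction

# Coin-toss space of Example 4.7(c): Omega = {HH, HT, TH, TT}, P uniform.
OMEGA = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in OMEGA}

def X(w: str) -> int:
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

def expectation(Y) -> Fraction:
    """Mean of a finitely-valued Y via (4.13): sum of y_k * P(Y = y_k)."""
    dist = {}
    for w in OMEGA:
        dist[Y(w)] = dist.get(Y(w), Fraction(0)) + P[w]
    return sum((y * p for y, p in dist.items()), Fraction(0))

def char_fn(t: float) -> complex:
    """f(t) = E(e^{itX}), summed outcome by outcome as in (4.4)."""
    return sum(cmath.exp(1j * t * X(w)) * float(P[w]) for w in OMEGA)

EX = expectation(X)
print("E(X) =", EX)
for t in (1e-1, 1e-3, 1e-6):
    quotient = (char_fn(t) - char_fn(0.0)) / t   # approaches i * E(X) as t -> 0
    # Y = (e^{itX} - 1)/t is dominated by |X| on every outcome.
    dominated = all(abs(cmath.exp(1j * t * X(w)) - 1) / t <= X(w) + 1e-12
                    for w in OMEGA)
    print(t, quotient, dominated)
```

For this distribution the quotient differs from $i$ by roughly $\tfrac{3}{4}t$, so shrinking $t$ shows the convergence directly.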
4.5 CONVERGENCE IN MEASURE

To this point, we have discussed three types of convergence for functions: pointwise convergence, uniform convergence, and almost-everywhere convergence. Another kind of convergence, important especially in probability theory, is convergence in measure.† Here is the definition.

DEFINITION 4.15 Convergence in Measure
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\{f_n\}_{n=1}^\infty$ a sequence of complex-valued $\mathcal{A}$-measurable functions on $\Omega$. Then $\{f_n\}_{n=1}^\infty$ is said to converge in measure to the $\mathcal{A}$-measurable function $f$ if, for each $\epsilon > 0$,
$$\lim_{n\to\infty}\mu(\{x : |f(x) - f_n(x)| > \epsilon\}) = 0.$$
We often write $f_n \xrightarrow{\mu} f$ to indicate convergence in measure.

Thus, $f_n \xrightarrow{\mu} f$ if the measure of the set where $f_n$ differs from $f$ by more than any prescribed positive number tends to zero as $n \to \infty$.

A first question is whether there is a relationship between almost-everywhere convergence and convergence in measure. The following example shows that, generally speaking, there is no relationship.

EXAMPLE 4.11 Illustrates Definition 4.15

a) Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$. Set $f(x) = 0$ and $f_n(x) = x/n$ for $x \in \mathcal{R}$ and $n \in \mathcal{N}$. Then $f_n \to f$ pointwise and, hence, $\lambda$-ae. But $f_n \not\xrightarrow{\lambda} f$. Indeed, for $\epsilon > 0$,
$$\{x : |f(x) - f_n(x)| > \epsilon\} = (-\infty, -n\epsilon) \cup (n\epsilon, \infty),$$
which has infinite Lebesgue measure for every $n \in \mathcal{N}$. Therefore, we see that $\lambda(\{x : |f(x) - f_n(x)| > \epsilon\}) \not\to 0$. Hence, almost-everywhere convergence does not imply convergence in measure.

b) Let $(\Omega, \mathcal{A}, \mu) = ([0,1], \mathcal{M}_{[0,1]}, \lambda_{[0,1]})$. Define $f_1 = \chi_{[0,1]}$, $f_2 = \chi_{[0,1/2]}$, $f_3 = \chi_{[1/2,1]}$ and, in general, if $n = k + 2^j$, where $0 \le k < 2^j$, define

† In probability theory, the terminology convergence in probability is used in place of convergence in measure.
$$f_n = \chi_{[k2^{-j},\,(k+1)2^{-j}]}.$$
Then, for $\epsilon > 0$,
$$\mu(\{x : |f_n(x)| > \epsilon\}) \le \frac{2}{n} \to 0 \quad \text{as } n \to \infty.$$
So, $f_n \xrightarrow{\mu} 0$. But, for each $x \in [0,1]$, the sequence $\{f_n(x)\}_{n=1}^\infty$ contains infinitely many 1s and infinitely many 0s. Thus, $\{f_n\}_{n=1}^\infty$ converges for no $x \in [0,1]$ and, in particular, $f_n \not\to 0$ $\mu$-ae. □

Example 4.11(a) shows that, in general, convergence almost everywhere does not imply convergence in measure. For finite measure spaces, however, the implication is correct.

PROPOSITION 4.11 Suppose that $(\Omega, \mathcal{A}, \mu)$ is a finite measure space and that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions that converges $\mu$-ae to the $\mathcal{A}$-measurable function $f$. Then $f_n \xrightarrow{\mu} f$.

PROOF: Let $B = \{x : f_n(x) \not\to f(x)\}$. Then, by assumption, $\mu(B) = 0$. For $\epsilon > 0$, define $E_n = \{x : |f(x) - f_n(x)| > \epsilon\}$ and $E = \bigcap_{n=1}^\infty \bigl(\bigcup_{k=n}^\infty E_k\bigr)$. We must show that $\lim_{n\to\infty}\mu(E_n) = 0$. Note that $x \in E$ if and only if $x \in E_n$ for infinitely many $n$. It follows easily that $E \subset B$ and, hence, $\mu(E) = 0$. Because $\mu(\Omega) < \infty$ and $\bigcup_{k=n}^\infty E_k \supset \bigcup_{k=n+1}^\infty E_k$ for each $n \in \mathcal{N}$, we conclude from Theorem 4.1(c) on page 170 that
$$\limsup_{n\to\infty}\mu(E_n) \le \lim_{n\to\infty}\mu\Bigl(\bigcup_{k=n}^\infty E_k\Bigr) = \mu(E) = 0.$$
Hence, $\lim_{n\to\infty}\mu(E_n) = 0$, as required.

As we discovered in Example 4.11(b), convergence in measure does not imply almost-everywhere convergence. However, we do have the following useful result.

PROPOSITION 4.12 Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions that converges in measure to the $\mathcal{A}$-measurable function $f$. Then there is a subsequence, $\{f_{n_k}\}_{k=1}^\infty$, of $\{f_n\}_{n=1}^\infty$ such that $f_{n_k} \to f$ $\mu$-ae.
PROOF: We can, for each $k \in \mathcal{N}$, choose an $n_k \in \mathcal{N}$ such that
$$\mu\bigl(\{x : |f(x) - f_{n_k}(x)| > k^{-1}\}\bigr) < 2^{-k}. \tag{4.14}$$
Furthermore, $n_k$ can be selected so that $n_1 < n_2 < \cdots$. Now let $E_k = \{x : |f(x) - f_{n_k}(x)| > k^{-1}\}$ and $E = \bigcap_{n=1}^\infty \bigl(\bigcup_{k=n}^\infty E_k\bigr)$. Note that $x \in E$ if and only if $|f(x) - f_{n_k}(x)| > k^{-1}$ for infinitely many $k$. From (4.14), we see that $\sum_{k=1}^\infty \mu(E_k) < \infty$ and, consequently, by Exercise 4.14 on page 174, $\mu(E) = 0$. We claim that $f_{n_k} \to f$ on $E^c$. So let $x \in E^c$ and $\epsilon > 0$ be given. Choose $k_1 \in \mathcal{N}$ so that $k_1^{-1} < \epsilon$. Since $x \notin E$, it follows that there is a $k_2 \in \mathcal{N}$ such that $x \notin E_k$ for $k \ge k_2$. Let $K = \max\{k_1, k_2\}$. Then we have that $|f(x) - f_{n_k}(x)| \le k^{-1} < \epsilon$ for all $k \ge K$.

The DCT for Convergence in Measure

By employing Proposition 4.12, we can prove that the dominated convergence theorem remains valid when almost-everywhere convergence is replaced by convergence in measure. That is, we have the following result:

THEOREM 4.10 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions that converges in measure to the $\mathcal{A}$-measurable function $f$. Further suppose that there is a nonnegative Lebesgue integrable function, $g$, such that $|f_n| \le g$ $\mu$-ae for each $n \in \mathcal{N}$. Then
$$\int_E f\,d\mu = \lim_{n\to\infty}\int_E f_n\,d\mu \tag{4.15}$$
for each $E \in \mathcal{A}$.

PROOF: Let $E \in \mathcal{A}$. To prove (4.15) it suffices, by Exercise 2.32 on page 56, to show that every subsequence of $\{\int_E f_n\,d\mu\}_{n=1}^\infty$ has a subsequence that converges to $\int_E f\,d\mu$. So, let $\{n_k\}_{k=1}^\infty$ be a subsequence of $\mathcal{N}$. Since $f_n \xrightarrow{\mu} f$, it is clear that $f_{n_k} \xrightarrow{\mu} f$. Applying Proposition 4.12, we deduce that $\{f_{n_k}\}_{k=1}^\infty$ has a subsequence, $\{f_{n_{k_j}}\}_{j=1}^\infty$, with $f_{n_{k_j}} \to f$ $\mu$-ae. Clearly, we have $|f_{n_{k_j}}| \le g$ $\mu$-ae for each $j \in \mathcal{N}$ and, hence, by the DCT (Theorem 4.9 on page 197),
$$\int_E f\,d\mu = \lim_{j\to\infty}\int_E f_{n_{k_j}}\,d\mu.$$
This completes the proof.
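The sequence of Example 4.11(b) makes the interplay between Proposition 4.12 and Theorem 4.10 concrete. The following sketch (hypothetical function names, exact rational arithmetic via `fractions.Fraction`) computes the interval $[k2^{-j}, (k+1)2^{-j}]$ behind each $f_n$, checks the bound $\mu(\{|f_n| > \epsilon\}) \le 2/n$, shows that at the sample point $x = 3/10$ every dyadic block of indices contains both values 0 and 1, and follows the subsequence $n_j = 2^j$. That subsequence is one explicit choice for this example, not the one built in the proof of Proposition 4.12: since $f_{2^j} = \chi_{[0,2^{-j}]}$, it tends to 0 at every $x > 0$, hence $\mu$-ae.

```python
from fractions import Fraction

def interval(n: int) -> tuple[Fraction, Fraction]:
    """Endpoints of I_n, where f_n is the indicator of I_n (Example 4.11(b)):
    write n = k + 2**j with 0 <= k < 2**j; then I_n = [k/2**j, (k+1)/2**j]."""
    j = n.bit_length() - 1          # the unique j with 2**j <= n < 2**(j+1)
    k = n - 2 ** j
    return Fraction(k, 2 ** j), Fraction(k + 1, 2 ** j)

def f(n: int, x: Fraction) -> int:
    """Value f_n(x) of the n-th function of the sequence."""
    a, b = interval(n)
    return 1 if a <= x <= b else 0

def measure_where_large(n: int) -> Fraction:
    """lambda({x : |f_n(x)| > eps}) for any 0 < eps < 1, i.e. the length of I_n."""
    a, b = interval(n)
    return b - a

x = Fraction(3, 10)
# Convergence in measure: the exceptional set has length 2**-j <= 2/n.
print(all(measure_where_large(n) <= Fraction(2, n) for n in range(1, 1025)))
# No convergence at x: each block 2**j <= n < 2**(j+1) takes both values 0 and 1.
print([sorted({f(n, x) for n in range(2 ** j, 2 ** (j + 1))}) for j in range(1, 6)])
# Along n_j = 2**j, f_{n_j} is the indicator of [0, 2**-j], eventually 0 at x.
print([f(2 ** j, x) for j in range(0, 8)])
```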
EXERCISES 4.5

4.83 Show that if $f_n \xrightarrow{\mu} f$ and $f_n \xrightarrow{\mu} g$, then $f = g$ $\mu$-ae.

★4.84 Suppose that $f, f_1, f_2, \ldots$ are in $\mathcal{L}^1(\Omega, \mathcal{A}, \mu)$ and that $\int_\Omega |f - f_n|\,d\mu \to 0$ as $n \to \infty$. Show that $f_n \to f$ in measure.

4.85 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. A sequence, $\{f_n\}_{n=1}^\infty$, of complex-valued $\mathcal{A}$-measurable functions on $\Omega$ is said to converge almost uniformly to the complex-valued $\mathcal{A}$-measurable function, $f$, if for each $\epsilon > 0$, there is a set $A \in \mathcal{A}$ such that $\mu(A) < \epsilon$ and $f_n \to f$ uniformly on $A^c$.
a) Prove that almost-uniform convergence implies convergence in measure; that is, if $f_n \to f$ almost uniformly, then $f_n \to f$ in measure.
b) Prove that almost-uniform convergence implies almost-everywhere convergence; that is, if $f_n \to f$ almost uniformly, then $f_n \to f$ $\mu$-ae.
c) Does almost-uniform convergence imply pointwise convergence? Justify your answer.

4.86 Provide a detailed justification for all statements in Example 4.11(b).

4.87 Let $f, g, f_1, f_2, \ldots$ be complex-valued $\mathcal{A}$-measurable functions. Suppose that $f_n \to g$ $\mu$-ae and that $f_n \to f$ in measure. Prove that $f = g$ $\mu$-ae and, hence, that $f_n \to f$ $\mu$-ae.

4.88 Fatou's lemma for convergence in measure: Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of nonnegative $\mathcal{A}$-measurable functions that converges in measure to $f$. Prove that
$$\int_E f\,d\mu \le \liminf_{n\to\infty}\int_E f_n\,d\mu$$
for each $E \in \mathcal{A}$. Hint: Select a subsequence of $\{\int_E f_n\,d\mu\}_{n=1}^\infty$ that converges to $\liminf_{n\to\infty}\int_E f_n\,d\mu$.

4.89 Establish the following fact: If $\{f_n\}_{n=1}^\infty$ converges in measure, then it is also Cauchy in measure; that is, for each $\epsilon > 0$, $\mu(\{x : |f_n(x) - f_m(x)| > \epsilon\}) \to 0$ as $m, n \to \infty$.

4.90 Prove the following strengthened version of Proposition 4.12. Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions that converges in measure to $f$. Then there is a subsequence, $\{f_{n_k}\}_{k=1}^\infty$, of $\{f_n\}_{n=1}^\infty$ such that $f_{n_k} \to f$ almost uniformly. Hint: Show that there is a subsequence $\{n_k\}_{k=1}^\infty$ of $\mathcal{N}$ such that
$$\mu\bigl(\{x : |f_{n_k}(x) - f_{n_{k+1}}(x)| > 2^{-k}\}\bigr) < 2^{-k}.$$
You will also need to apply the Weierstrass M-test.
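4.91 Suppose that $(\Omega, \mathcal{A}, \mu)$ is a measure space and that $f, f_1, f_2, \ldots$ are complex-valued $\mathcal{A}$-measurable functions on $\Omega$. Show that
$$\{x : \lim_{n\to\infty} f_n(x) = f(x)\} = \bigcap_{m=1}^\infty\Bigl(\bigcup_{n=1}^\infty\Bigl(\bigcap_{k=n}^\infty \{x : |f(x) - f_k(x)| < 1/m\}\Bigr)\Bigr).$$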
★4.92 Suppose that $(\Omega, \mathcal{A}, \mu)$ is a finite measure space and that $f, f_1, f_2, \ldots$ are complex-valued $\mathcal{A}$-measurable functions on $\Omega$. Show that $f_n \to f$ $\mu$-ae if and only if, for each $\epsilon > 0$,
$$\lim_{n\to\infty}\mu\Bigl(\bigcup_{k=n}^\infty \{x : |f(x) - f_k(x)| > \epsilon\}\Bigr) = 0.$$
Compare this equation with the definition of convergence in measure.

4.93 Suppose $(\Omega, \mathcal{A}, \mu)$ is a finite measure space and $\{f_n\}_{n=1}^\infty$ is a sequence of complex-valued $\mathcal{A}$-measurable functions that converges in measure to $f$. Further suppose $g : \mathcal{C} \to \mathcal{C}$ is continuous. Prove that $g \circ f_n \to g \circ f$ in measure. Hint: For a given $\epsilon > 0$, let $a_n = \mu(\{x : |g(f(x)) - g(f_n(x))| > \epsilon\})$. Show that each subsequence of $\{a_n\}_{n=1}^\infty$ has a subsequence that converges to 0.

4.94 Egorov's theorem: Suppose that $(\Omega, \mathcal{A}, \mu)$ is a finite measure space and that $f, f_1, f_2, \ldots$ are complex-valued $\mathcal{A}$-measurable functions on $\Omega$. Prove that if $f_n \to f$ $\mu$-ae, then $f_n \to f$ almost uniformly.

4.6 EXTENSIONS TO MEASURES

In Chapter 3, the concept of length was extended and replaced by that of measure. Specifically, we began with the collection of intervals and the set function, $\ell$, that assigns to each interval its length. The problem was to extend $\ell$ to a measure defined on a $\sigma$-algebra of subsets of $\mathcal{R}$ that contains all intervals. We proceeded as follows: First we extended the concept of length to all subsets of $\mathcal{R}$ by defining Lebesgue outer measure, $\lambda^*$:
$$\lambda^*(A) = \inf\Bigl\{\sum_n \ell(I_n) : \{I_n\}_n \text{ open intervals},\ \bigcup_n I_n \supset A\Bigr\}. \tag{4.16}$$
Then we defined the Lebesgue measurable sets, $\mathcal{M}$, to be the collection of subsets $E$ of $\mathcal{R}$ that satisfy
$$\lambda^*(W) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c) \tag{4.17}$$
for all $W \subset \mathcal{R}$. Finally, we proved that $\mathcal{M}$ is a $\sigma$-algebra containing all intervals and that the set function $\lambda = \lambda^*_{|\mathcal{M}}$ is a measure on $\mathcal{M}$ satisfying $\lambda(I) = \ell(I)$ for all intervals $I$. Thus, Lebesgue measure, $\lambda$, provided the required extension of length.
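As a reminder of how the infimum in (4.16) is used in practice, here is the standard computation (recalled from the Chapter 3 setting as an illustration, not quoted from this section) showing that a countable set $A = \{a_1, a_2, \ldots\} \subset \mathcal{R}$ has Lebesgue outer measure zero. Given $\epsilon > 0$, cover each $a_n$ by the open interval $I_n = (a_n - \epsilon 2^{-n-1},\ a_n + \epsilon 2^{-n-1})$, whose length is $\ell(I_n) = \epsilon 2^{-n}$. Then $\{I_n\}_n$ is one of the covers admitted in (4.16), so
$$0 \le \lambda^*(A) \le \sum_{n=1}^\infty \ell(I_n) = \sum_{n=1}^\infty \epsilon\,2^{-n} = \epsilon,$$
and since $\epsilon > 0$ is arbitrary, $\lambda^*(A) = 0$. The same pattern (pick a cover whose total cost exceeds the infimum by at most $\epsilon$, then let $\epsilon \to 0$) drives the proofs of Proposition 4.13(d) and Proposition 4.14 below.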
In this section, we will use our experience from Chapter 3 to handle more general situations. Suppose then that $\Omega$ is a set, $\mathcal{C}$ is a nonempty collection of subsets of $\Omega$, and $\iota$ is a nonnegative extended real-valued set function on $\mathcal{C}$. Our two primary questions are:

Question 1: Can $\iota$ be extended to a measure on a $\sigma$-algebra containing $\mathcal{C}$?

Question 2: If such an extension exists, when is it unique?

We begin by considering Question 1.

Necessary Conditions; Semialgebras

First we will obtain some necessary conditions on $\iota$ for an affirmative answer to Question 1. So, assume that $\iota$ can be extended to a measure, $\mu$, on a $\sigma$-algebra, $\mathcal{A} \supset \mathcal{C}$. Then, by Definition 4.1 on page 168, Theorem 4.1 on page 170, and the fact that $\mu$ is an extension of $\iota$, we must have

(E1) If $\emptyset \in \mathcal{C}$, then $\iota(\emptyset) = 0$.

(E2) If $\{C_k\}_{k=1}^n$ is a finite sequence of pairwise disjoint members of $\mathcal{C}$ whose union is in $\mathcal{C}$, then
$$\iota\Bigl(\bigcup_{k=1}^n C_k\Bigr) = \sum_{k=1}^n \iota(C_k).$$

(E3) If $C, C_1, C_2, \ldots$ are in $\mathcal{C}$ and $C \subset \bigcup_n C_n$, then
$$\iota(C) \le \sum_n \iota(C_n).$$

Conditions (E1)-(E3) are necessary conditions for the extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{C}$. In other words, unless those three conditions hold, such an extension is impossible. Remarkably, as we will see, if $\mathcal{C}$ is a semialgebra (defined in what follows), then those three conditions are also sufficient for the extension.

DEFINITION 4.16 Semialgebra of Subsets
Let $\Omega$ be a set. A nonempty collection, $\mathcal{C}$, of subsets of $\Omega$ is called a semialgebra if the following conditions hold:
a) If $A, B \in \mathcal{C}$, then $A \cap B \in \mathcal{C}$.
b) If $C \in \mathcal{C}$, then there is a pairwise disjoint finite (possibly empty) sequence of members of $\mathcal{C}$ whose union is $C^c$.
In words, $\mathcal{C}$ is a semialgebra if it is closed under intersection and the complement of each member of $\mathcal{C}$ is a finite (possibly empty) disjoint union of members of $\mathcal{C}$.

In what follows, we present a few examples of semialgebras. The justifications are left as exercises for the reader.

EXAMPLE 4.12 Illustrates Definition 4.16
a) Any algebra and, hence, any $\sigma$-algebra is a semialgebra.
b) Suppose that $\Omega$ is a finite set. Let $\mathcal{C}$ denote the collection of sets consisting of the empty set and all singleton sets, that is, sets of the form $\{x\}$, where $x \in \Omega$. Then $\mathcal{C}$ is a semialgebra.
c) Let $\mathcal{I}$ denote the collection of all intervals of $\mathcal{R}$, including intervals of the form $(a, a)$ and $[a, a]$. Then $\mathcal{I}$ is a semialgebra of subsets of $\mathcal{R}$.
d) Let $\mathcal{I}_n$ denote the collection of all $n$-dimensional intervals in $\mathcal{R}^n$; that is, all sets of the form $I_1 \times I_2 \times \cdots \times I_n$, where $I_j \in \mathcal{I}$ for $1 \le j \le n$. Then $\mathcal{I}_n$ is a semialgebra of subsets of $\mathcal{R}^n$. □

Existence of an Extension

Suppose now that $\Omega$ is a set, $\mathcal{C}$ is a semialgebra of subsets of $\Omega$, and $\iota$ is a nonnegative extended real-valued set function on $\mathcal{C}$ satisfying Conditions (E1)-(E3). As we mentioned earlier, under those assumptions, there exists an extension of $\iota$ to a measure, $\mu$, on a $\sigma$-algebra, $\mathcal{A}$, containing $\mathcal{C}$. To obtain the extension, we will mimic the procedure used in Chapter 3 for extending the concept of length.

The first step is to extend $\iota$ to all subsets of $\Omega$ using (4.16) on page 207 as a guide. This is done in Definition 4.17.

DEFINITION 4.17 Outer Measure
Let $\Omega$ be a set, $\mathcal{C}$ a semialgebra of subsets of $\Omega$, and $\iota$ a nonnegative extended real-valued set function on $\mathcal{C}$ satisfying Conditions (E1)-(E3). Then the set function $\mu^*$, defined on $\mathcal{P}(\Omega)$ by $\mu^*(\emptyset) = 0$ and
$$\mu^*(A) = \inf\Bigl\{\sum_n \iota(C_n) : \{C_n\}_n \subset \mathcal{C},\ \bigcup_n C_n \supset A\Bigr\}$$
for $A \ne \emptyset$, is called the outer measure induced by $\iota$ and $\mathcal{C}$.
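Example 4.12(c) leaves its verification to the reader. As a sanity check on finitely many cases (not a proof), the sketch below encodes an interval by its endpoints and two endpoint-inclusion flags, an assumed representation with hypothetical names, and tests both conditions of Definition 4.16 at sample points: the intersection of two intervals is an interval, and the complement of an interval is a disjoint union of at most two intervals (all of $\mathcal{R}$ when the interval is empty, and nothing when it is $\mathcal{R}$ itself).

```python
import math
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Interval:
    """Interval of R with endpoints a, b (a = -inf or b = inf allowed) and flags
    saying whether each endpoint is included; infinite endpoints are never
    included. If a > b, or a == b without both ends closed, the set is empty."""
    a: float
    b: float
    left_closed: bool
    right_closed: bool

    def contains(self, x: float) -> bool:
        left_ok = x > self.a or (self.left_closed and x == self.a)
        right_ok = x < self.b or (self.right_closed and x == self.b)
        return left_ok and right_ok

    def is_empty(self) -> bool:
        return self.a > self.b or (
            self.a == self.b and not (self.left_closed and self.right_closed))

def intersect(I: Interval, J: Interval) -> Interval:
    """Definition 4.16(a): the intersection of two intervals is an interval."""
    if I.a > J.a:
        a, lc = I.a, I.left_closed
    elif J.a > I.a:
        a, lc = J.a, J.left_closed
    else:
        a, lc = I.a, I.left_closed and J.left_closed
    if I.b < J.b:
        b, rc = I.b, I.right_closed
    elif J.b < I.b:
        b, rc = J.b, J.right_closed
    else:
        b, rc = I.b, I.right_closed and J.right_closed
    return Interval(a, b, lc, rc)

def complement(I: Interval) -> list:
    """Definition 4.16(b): I^c as a disjoint union of at most two intervals."""
    if I.is_empty():
        return [Interval(-math.inf, math.inf, False, False)]
    pieces = [Interval(-math.inf, I.a, False, not I.left_closed),
              Interval(I.b, math.inf, not I.right_closed, False)]
    return [P for P in pieces if not P.is_empty()]

ends = [-math.inf, -1.0, 0.0, 2.0, math.inf]
samples = [-3.0, -1.0, -0.5, 0.0, 1.0, 2.0, 3.0]
intervals = [Interval(a, b, lc, rc)
             for a, b in product(ends, ends) if a <= b
             for lc, rc in product([False, True], repeat=2)
             if not (lc and math.isinf(a)) and not (rc and math.isinf(b))]

for I in intervals:
    pieces = complement(I)
    for x in samples:
        # each sample point lies in exactly one of I and the complement pieces
        assert I.contains(x) + sum(P.contains(x) for P in pieces) == 1
    for J in intervals:
        K = intersect(I, J)
        assert all(K.contains(x) == (I.contains(x) and J.contains(x)) for x in samples)
print(len(intervals), "intervals checked")
```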
The next example provides some illustrations of outer measure. The details of verification are left to the reader as exercises.

EXAMPLE 4.13 Illustrates Definition 4.17
a) Suppose that $\Omega = \{x_1, x_2, \ldots, x_n\}$ is a finite set and $a_1, a_2, \ldots, a_n$ are nonnegative real numbers. Let $\mathcal{C}$ denote the collection of sets consisting of the empty set and all singleton sets. Define $\iota$ on $\mathcal{C}$ by $\iota(\emptyset) = 0$ and $\iota(\{x_k\}) = a_k$ for $1 \le k \le n$. Then Conditions (E1)-(E3) hold and
$$\mu^*(A) = \sum_{x_k \in A} a_k$$
for each $A \subset \Omega$.
b) Let $\mathcal{I}$ denote the collection of all intervals of $\mathcal{R}$, including degenerate intervals of the form $(a, a)$ and $[a, a]$. Take $\Omega = \mathcal{R}$, $\mathcal{C} = \mathcal{I}$, and $\iota = \ell$ (= length). Then Conditions (E1)-(E3) hold and $\mu^* = \lambda^*$; that is, the outer measure induced by $\ell$ and $\mathcal{I}$ is Lebesgue outer measure.
c) Let $\mathcal{I}_n$ denote the collection of all $n$-dimensional intervals in $\mathcal{R}^n$. Take $\Omega = \mathcal{R}^n$, $\mathcal{C} = \mathcal{I}_n$, and $\iota = \ell_n$ = volume; that is, for $I_1 \times I_2 \times \cdots \times I_n \in \mathcal{I}_n$,
$$\ell_n(I_1 \times I_2 \times \cdots \times I_n) = \ell(I_1)\,\ell(I_2)\cdots\ell(I_n).$$
Then Conditions (E1)-(E3) hold. The outer measure induced by $\ell_n$ and $\mathcal{I}_n$ is called $n$-dimensional Lebesgue outer measure and is denoted by $\lambda_n^*$. □

Some basic properties of outer measure are provided by the following proposition. Note that part (a) of the proposition shows that $\mu^*$ is indeed an extension of $\iota$.

PROPOSITION 4.13 The outer measure, $\mu^*$, induced by $\iota$ and $\mathcal{C}$ satisfies
a) $\mu^*_{|\mathcal{C}} = \iota$; that is, $\mu^*(C) = \iota(C)$ for $C \in \mathcal{C}$.
b) $\mu^*(A) \ge 0$ for all $A \subset \Omega$. (nonnegativity)
c) $A \subset B \Rightarrow \mu^*(A) \le \mu^*(B)$. (monotonicity)
d) $\mu^*\bigl(\bigcup_n A_n\bigr) \le \sum_n \mu^*(A_n)$. (countable subadditivity)

PROOF: We leave the proofs of parts (b) and (c) as exercises.

a) Let $C \in \mathcal{C}$. If $C = \emptyset$, then, by Condition (E1) and Definition 4.17, $\iota(\emptyset) = 0 = \mu^*(\emptyset)$. So, assume $C \ne \emptyset$. Since $\{C\} \subset \mathcal{C}$ and $C \supset C$, we have $\mu^*(C) \le \iota(C)$. On the other hand, if $\{C_n\}_n \subset \mathcal{C}$ and $\bigcup_n C_n \supset C$, then, by Condition (E3), $\iota(C) \le \sum_n \iota(C_n)$. Thus, $\iota(C) \le \mu^*(C)$.

d) If $\mu^*(A_n) = \infty$ for some $n$, then, by part (b), the required inequality holds. So, we can assume that $\mu^*(A_n) < \infty$ for all $n$. Let $\epsilon > 0$ be
given. For each $n$, choose $\{C_{nk}\}_k \subset \mathcal{C}$ such that $\bigcup_k C_{nk} \supset A_n$ and $\sum_k \iota(C_{nk}) < \mu^*(A_n) + \epsilon/2^n$. Then $\{C_{nk}\}_{n,k} \subset \mathcal{C}$, $\bigcup_{n,k} C_{nk} \supset \bigcup_n A_n$ and, therefore,
$$\mu^*\Big(\bigcup_n A_n\Big) \le \sum_{n,k} \iota(C_{nk}) = \sum_n \sum_k \iota(C_{nk}) \le \sum_n \Big(\mu^*(A_n) + \frac{\epsilon}{2^n}\Big) = \sum_n \mu^*(A_n) + \epsilon.$$
Because $\epsilon > 0$ was arbitrarily chosen, $\mu^*\big(\bigcup_n A_n\big) \le \sum_n \mu^*(A_n)$.

We have now completed the first step in obtaining the extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{C}$; namely, the construction of the outer measure, $\mu^*$, which is an extension of $\iota$ to all subsets of $\Omega$. The second step is to restrict $\mu^*$ to an appropriate $\sigma$-algebra of subsets of $\Omega$ so as to ensure countable additivity. Thus, with (4.17) on page 207 in mind, we make the following definition.

DEFINITION 4.18 Measurable Sets
A set $E \subset \Omega$ is said to be $\mu^*$-measurable if
$$\mu^*(W) = \mu^*(W \cap E) + \mu^*(W \cap E^c) \qquad (4.18)$$
for all subsets $W$ of $\Omega$. The collection of all $\mu^*$-measurable sets is denoted by $\mathcal{A}$.

EXAMPLE 4.14 Illustrates Definition 4.18
a) Suppose that $\Omega = \{x_1, x_2, \ldots, x_n\}$ is a finite set and $a_1, a_2, \ldots, a_n$ are nonnegative real numbers. Let $\mathcal{C}$ denote the collection of sets consisting of the empty set and all singleton sets. Define $\iota$ on $\mathcal{C}$ by $\iota(\emptyset) = 0$ and $\iota(\{x_k\}) = a_k$ for $1 \le k \le n$. Referring to Example 4.13(a), it is easy to see that $\mathcal{A} = \mathcal{P}(\Omega)$. In other words, all subsets of $\Omega$ are $\mu^*$-measurable.
b) Take $\Omega = \mathcal{R}$, $\mathcal{C} = \mathcal{I}$, and $\iota = \ell$. Then, by Example 4.13(b), $\mu^* = \lambda^*$. Hence, in this case, the $\mu^*$-measurable sets are the Lebesgue measurable sets; that is, $\mathcal{A} = \mathcal{M}$.
c) Take $\Omega = \mathcal{R}^n$, $\mathcal{C} = \mathcal{I}_n$, and $\iota = \ell_n$ = volume. Then the $\mu^*$-measurable (i.e., $\lambda_n^*$-measurable) sets are called $n$-dimensional Lebesgue measurable sets and the collection of all such sets is denoted by $\mathcal{M}_n$. □
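Definition 4.18 can likewise be tested by brute force on a finite set, by checking (4.18) against every $W \subset \Omega$. The Python sketch below assumes a hypothetical coarse semialgebra on $\Omega = \{0, 1, 2\}$, namely $\mathcal{C} = \{\emptyset, \{0,1\}, \{2\}, \Omega\}$ with $\iota(\{0,1\}) = \iota(\{2\}) = 1/2$ and $\iota(\Omega) = 1$; these values satisfy (E1)-(E3). In contrast to Example 4.14(a), here $\mu^*(\{0\}) = \mu^*(\{1\}) = \mu^*(\{0,1\}) = 1/2$, so $\{0\}$ fails (4.18) with $W = \{0,1\}$, and the search should find only $\emptyset$, $\{2\}$, $\{0,1\}$, and $\Omega$. The helper names are illustrative, not from the text.

```python
import math
from fractions import Fraction
from itertools import chain, combinations

def subsets(items):
    """All subsets of a finite iterable, returned as frozensets."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def outer_measure(A, semialgebra, iota):
    """Brute-force mu*(A) from Definition 4.17 (finite semialgebra, iota >= 0)."""
    A = frozenset(A)
    if not A:
        return Fraction(0)  # mu*(empty set) = 0 by definition
    best = math.inf  # infimum over an empty collection of covers
    for family in subsets(semialgebra):
        if A <= frozenset().union(*family):  # the family covers A
            best = min(best, sum((iota[S] for S in family), Fraction(0)))
    return best

def is_measurable(E, omega, semialgebra, iota):
    """Test condition (4.18) for E against every subset W of omega."""
    E = frozenset(E)
    Ec = frozenset(omega) - E
    return all(
        outer_measure(W, semialgebra, iota)
        == outer_measure(W & E, semialgebra, iota) + outer_measure(W & Ec, semialgebra, iota)
        for W in subsets(omega))

# Hypothetical coarse semialgebra on {0, 1, 2}; iota satisfies (E1)-(E3).
omega = frozenset({0, 1, 2})
C = [frozenset(), frozenset({0, 1}), frozenset({2}), omega]
iota = {C[0]: Fraction(0), C[1]: Fraction(1, 2), C[2]: Fraction(1, 2), C[3]: Fraction(1)}

measurable = [set(E) for E in subsets(omega) if is_measurable(E, omega, C, iota)]
print(measurable)
```

The $\mu^*$-measurable sets found this way are exactly the members of the algebra generated by $\mathcal{C}$, so in this example $\mathcal{A}$ contains $\mathcal{C}$ but is strictly smaller than $\mathcal{P}(\Omega)$.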
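We claim that the set function, $\mu = \mu^*|_{\mathcal{A}}$, is the required extension of $\iota$. To verify this, we must now establish three facts: $\mathcal{A} \supset \mathcal{C}$, $\mathcal{A}$ is a $\sigma$-algebra, and $\mu$ is a measure on $\mathcal{A}$. The proofs of these facts are considered in the following three propositions.

PROPOSITION 4.14 Every set $C \in \mathcal{C}$ is $\mu^*$-measurable. That is, $\mathcal{A} \supset \mathcal{C}$.

PROOF: Let $C \in \mathcal{C}$. We must show that (4.18) holds with $E = C$. Because of countable subadditivity (Proposition 4.13(d)), it suffices to prove that
$$\mu^*(W) \ge \mu^*(W \cap C) + \mu^*(W \cap C^c). \qquad (4.19)$$
If $C = \emptyset$, (4.19) is trivial. So assume $C \neq \emptyset$. If $\mu^*(W) = \infty$, then clearly (4.19) holds. So, assume that $\mu^*(W) < \infty$. Let $\epsilon > 0$ be given. Choose $\{C_n\}_n \subset \mathcal{C}$ such that $W \subset \bigcup_n C_n$ and
$$\sum_n \iota(C_n) < \mu^*(W) + \epsilon. \qquad (4.20)$$
Now, $W \cap C \subset \bigcup_n (C_n \cap C)$ and, hence, by Proposition 4.13,
$$\mu^*(W \cap C) \le \sum_n \mu^*(C_n \cap C) = \sum_n \iota(C_n \cap C). \qquad (4.21)$$
Also, we have $W \cap C^c \subset \bigcup_n (C_n \cap C^c)$ and, so, Proposition 4.13 implies that $\mu^*(W \cap C^c) \le \sum_n \mu^*(C_n \cap C^c)$. Since $C \in \mathcal{C}$ and $\mathcal{C}$ is a semialgebra, there exist a finite number of pairwise disjoint members of $\mathcal{C}$, say $A_1, \ldots, A_m$, such that $C^c = \bigcup_{k=1}^m A_k$. Then, for each $n$, $C_n \cap C^c = \bigcup_{k=1}^m (C_n \cap A_k)$ and, therefore,
$$\mu^*(C_n \cap C^c) \le \sum_{k=1}^m \mu^*(C_n \cap A_k) = \sum_{k=1}^m \iota(C_n \cap A_k).$$
Consequently,
$$\mu^*(W \cap C^c) \le \sum_n \sum_{k=1}^m \iota(C_n \cap A_k). \qquad (4.22)$$
Because of (4.21) and (4.22), we can conclude that
$$\mu^*(W \cap C) + \mu^*(W \cap C^c) \le \sum_n \iota(C_n \cap C) + \sum_n \sum_{k=1}^m \iota(C_n \cap A_k) = \sum_n \Big( \iota(C_n \cap C) + \sum_{k=1}^m \iota(C_n \cap A_k) \Big). \qquad (4.23)$$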
But
$$C_n = (C_n \cap C) \cup (C_n \cap C^c) = (C_n \cap C) \cup \Big( \bigcup_{k=1}^m (C_n \cap A_k) \Big),$$
which is a finite disjoint union of members of $\mathcal{C}$. Thus, by Condition (E2),
$$\iota(C_n) = \iota(C_n \cap C) + \sum_{k=1}^m \iota(C_n \cap A_k). \qquad (4.24)$$
Substituting the left-hand side of (4.24) for the right-hand side in (4.23) and employing (4.20), we can conclude that
$$\mu^*(W \cap C) + \mu^*(W \cap C^c) \le \sum_n \iota(C_n) < \mu^*(W) + \epsilon.$$
As $\epsilon > 0$ was arbitrary, we see that (4.19) holds.

PROPOSITION 4.15 $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$.

PROOF: The proof is a duplication of the one given for Theorem 3.11 on page 120, with $\mathcal{M}$ replaced by $\mathcal{A}$ and $\lambda^*$ replaced by $\mu^*$.

PROPOSITION 4.16 Let $\mu = \mu^*|_{\mathcal{A}}$. Then $\mu$ is a measure on $\mathcal{A}$.

PROOF: Since, by definition, $\mu^*(\emptyset) = 0$, it follows that $\mu(\emptyset) = 0$. Also, by Proposition 4.13(b), $\mu^*(A) \ge 0$ for all $A \subset \Omega$ and, hence, $\mu(A) \ge 0$ for all $A \in \mathcal{A}$. To show that $\mu$ is countably additive, we duplicate the proof of Theorem 3.12 on page 122, replacing $\mathcal{M}$ by $\mathcal{A}$, $\lambda^*$ by $\mu^*$, and $\lambda$ by $\mu$.

We have now established that $\mu$ is the required extension of $\iota$. As an added bonus, it turns out that the measure space, $(\Omega, \mathcal{A}, \mu)$, is complete. To see this, let $A \in \mathcal{A}$ with $\mu(A) = 0$. We must show that if $B \subset A$, then $B \in \mathcal{A}$. By the monotonicity of $\mu^*$, we have $\mu^*(B) \le \mu^*(A) = \mu(A) = 0$. Therefore, $\mu^*(B) = 0$. Now, let $W \subset \Omega$. Then $\mu^*(W \cap B) \le \mu^*(B) = 0$ and $\mu^*(W \cap B^c) \le \mu^*(W)$. Thus,
$$\mu^*(W) \ge \mu^*(W \cap B^c) = \mu^*(W \cap B) + \mu^*(W \cap B^c),$$
which, together with the reverse inequality supplied by countable subadditivity, implies that $B \in \mathcal{A}$.

The results that we have obtained so far are summarized in the following theorem.
THEOREM 4.11 Extension Theorem
Suppose $\Omega$ is a set, $\mathcal{C}$ is a semialgebra of subsets of $\Omega$, and $\iota$ is a nonnegative extended real-valued function on $\mathcal{C}$ satisfying Conditions (E1)-(E3) on page 208. Let $\mu^*$ denote the outer measure induced by $\iota$ and $\mathcal{C}$, $\mathcal{A}$ the collection of $\mu^*$-measurable sets, and $\mu = \mu^*|_{\mathcal{A}}$. Then $\mathcal{A}$ is a $\sigma$-algebra, $\mathcal{A} \supset \mathcal{C}$, $\mu$ is a measure on $\mathcal{A}$, and $\mu|_{\mathcal{C}} = \iota$. Moreover, the measure space, $(\Omega, \mathcal{A}, \mu)$, is complete.

An important application of Theorem 4.11 is to $n$-dimensional Lebesgue measure: Let $\Omega = \mathcal{R}^n$, $\mathcal{C} = \mathcal{I}_n$, and $\iota = \ell_n$ = volume. Then $\mu^* = \lambda_n^*$ and $\mathcal{A} = \mathcal{M}_n$. The restriction of $\lambda_n^*$ to $\mathcal{M}_n$ is denoted by $\lambda_n$ and is called $n$-dimensional Lebesgue measure.

Uniqueness of an Extension

Theorem 4.11 states, in particular, that $\iota$ has an extension to a measure on a $\sigma$-algebra containing $\mathcal{C}$, thus answering Question 1 on page 208. Now we will consider Question 2, the question of uniqueness: Under the assumptions of Theorem 4.11, is an extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{C}$ unique? In general, the answer to the uniqueness question is no (see, for instance, Exercise 4.107). However, under certain conditions, we can establish uniqueness results. We now proceed to do that.

To begin, we define two collections of subsets of $\Omega$ associated with $\mathcal{C}$:

$\mathcal{C}_\sigma$ denotes the collection of all subsets of $\Omega$ that are countable unions of members of $\mathcal{C}$; in other words, $E \in \mathcal{C}_\sigma$ if and only if there exists $\{C_n\}_n \subset \mathcal{C}$ such that $E = \bigcup_n C_n$.

$\mathcal{C}_{\sigma\delta}$ denotes the collection of all subsets of $\Omega$ that are countable intersections of members of $\mathcal{C}_\sigma$; in other words, $F \in \mathcal{C}_{\sigma\delta}$ if and only if there exists $\{E_n\}_n \subset \mathcal{C}_\sigma$ such that $F = \bigcap_n E_n$.

Next, we establish three lemmas that are required in order for us to prove a uniqueness theorem.

LEMMA 4.1 Let $A \subset \Omega$.
a) Given $\epsilon > 0$, there is an $E \in \mathcal{C}_\sigma$ with $E \supset A$ and $\mu^*(E) \le \mu^*(A) + \epsilon$.
b) There is an $F \in \mathcal{C}_{\sigma\delta}$ such that $F \supset A$ and $\mu^*(F) = \mu^*(A)$.

PROOF: a) If $\mu^*(A) = \infty$, then also $\mu^*(\Omega) = \infty$. The required result now follows because $\Omega \in \mathcal{C}_\sigma$. (Why?)
So, assume that $\mu^*(A) < \infty$. Then there exists $\{C_n\}_n \subset \mathcal{C}$ such that $\bigcup_n C_n \supset A$ and $\sum_n \iota(C_n) < \mu^*(A) + \epsilon$. Let $E = \bigcup_n C_n$. Then $E \in \mathcal{C}_\sigma$, $E \supset A$, and
$$\mu^*(E) \le \sum_n \mu^*(C_n) = \sum_n \iota(C_n) < \mu^*(A) + \epsilon.$$
b) If $\mu^*(A) = \infty$, take $F = \Omega$. So, assume that $\mu^*(A) < \infty$. By part (a), we can, for each $n \in \mathcal{N}$, choose $E_n \in \mathcal{C}_\sigma$ such that $E_n \supset A$ and $\mu^*(E_n) < \mu^*(A) + 1/n$. Let $F = \bigcap_{n=1}^\infty E_n$. Then $F \in \mathcal{C}_{\sigma\delta}$ and $F \supset A$. In particular, then, $\mu^*(F) \ge \mu^*(A)$. On the other hand, because $F \subset E_n$, we have $\mu^*(F) \le \mu^*(E_n) < \mu^*(A) + 1/n$ for all $n \in \mathcal{N}$. Therefore, $\mu^*(F) \le \mu^*(A)$.

LEMMA 4.2 The algebra generated by $\mathcal{C}$, $\mathcal{A}_0(\mathcal{C})$, consists of the empty set and all finite disjoint unions of members of $\mathcal{C}$.

PROOF: Let $\mathcal{D}$ denote the collection of sets consisting of the empty set and all finite disjoint unions of members of $\mathcal{C}$. We must prove that $\mathcal{D} = \mathcal{A}_0(\mathcal{C})$. Clearly any algebra of sets containing $\mathcal{C}$ must contain $\mathcal{D}$; so, $\mathcal{D} \subset \mathcal{A}_0(\mathcal{C})$. To establish the reverse inclusion, it suffices to prove that $\mathcal{D}$ is an algebra, because $\mathcal{A}_0(\mathcal{C})$ is the smallest algebra containing $\mathcal{C}$.

First we show that $\mathcal{D}$ is closed under finite intersections. So, suppose $A \in \mathcal{D}$ and $B \in \mathcal{D}$. We claim that $A \cap B \in \mathcal{D}$. If either $A$ or $B$ is empty, then $A \cap B = \emptyset \in \mathcal{D}$. So, assume neither $A$ nor $B$ is empty. Then there exists a pairwise disjoint sequence, $\{A_i\}_{i=1}^m$, of members of $\mathcal{C}$ such that $A = \bigcup_{i=1}^m A_i$ and a pairwise disjoint sequence, $\{B_j\}_{j=1}^n$, of members of $\mathcal{C}$ such that $B = \bigcup_{j=1}^n B_j$. Consequently,
$$A \cap B = \bigcup_{i=1}^m \Big( A_i \cap \bigcup_{j=1}^n B_j \Big) = \bigcup_{i=1}^m \bigcup_{j=1}^n (A_i \cap B_j).$$
Since $A_i, B_j \in \mathcal{C}$, we have $A_i \cap B_j \in \mathcal{C}$. Moreover, since the $A_i$'s and $B_j$'s are each pairwise disjoint, so are the $(A_i \cap B_j)$'s. Hence, $A \cap B$ is a finite disjoint union of members of $\mathcal{C}$ and, consequently, is a member of $\mathcal{D}$.

Next, we show that $\mathcal{D}$ is closed under complementation. Assume that $A \in \mathcal{D}$. If $A = \emptyset$, then $A^c = \Omega \in \mathcal{D}$. (Why?) If $A \neq \emptyset$, then there exists a pairwise disjoint sequence, $\{A_i\}_{i=1}^m$, of members of $\mathcal{C}$ such that $A = \bigcup_{i=1}^m A_i$. Since $A_i \in \mathcal{C}$, $A_i^c$ is a finite disjoint union of members of $\mathcal{C}$; hence, $A_i^c \in \mathcal{D}$. From the previous paragraph, we know $\mathcal{D}$ is closed under finite intersections. Thus, $A^c = \bigcap_{i=1}^m A_i^c \in \mathcal{D}$.
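Lemma 4.2 can be checked mechanically on a small example. The Python sketch below assumes a hypothetical semialgebra on $\Omega = \{0, 1, \ldots, 5\}$ consisting of the empty set and the integer blocks $\{a, \ldots, b-1\}$ with $a < b$ and $a, b \in \{0, 2, 4, 6\}$, a discrete analogue of half-open intervals. It computes $\mathcal{A}_0(\mathcal{C})$ by repeatedly closing under complements and finite unions, computes the collection $\mathcal{D}$ from the proof, and compares the two.

```python
from itertools import combinations

OMEGA = frozenset(range(6))
CUTS = [0, 2, 4, 6]
# Hypothetical semialgebra: blocks {a, ..., b-1} with endpoints in CUTS, plus the empty set.
C = [frozenset()] + [frozenset(range(a, b)) for a, b in combinations(CUTS, 2)]

def generated_algebra(collection, omega):
    """Smallest algebra containing `collection`: close under complement and finite union."""
    algebra = set(collection) | {omega, frozenset()}
    while True:
        new = {omega - A for A in algebra} | {A | B for A in algebra for B in algebra}
        if new <= algebra:
            return algebra
        algebra |= new

def finite_disjoint_unions(collection):
    """The collection D of Lemma 4.2: the empty set and all finite disjoint unions."""
    members = [S for S in collection if S]
    result = {frozenset()}
    for r in range(1, len(members) + 1):
        for family in combinations(members, r):
            if all(not (S & T) for S, T in combinations(family, 2)):
                result.add(frozenset().union(*family))
    return result

print(generated_algebra(C, OMEGA) == finite_disjoint_unions(C))  # True
print(len(generated_algebra(C, OMEGA)))  # 8
```

Both collections consist of the $2^3 = 8$ unions of the blocks $\{0,1\}$, $\{2,3\}$, $\{4,5\}$, as the lemma predicts.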
LEMMA 4.3 Let $E \in \mathcal{C}_\sigma$. Then $E$ can be written as a countable disjoint union of members of $\mathcal{C}$.

PROOF: By definition, there exists $\{C_n\}_n \subset \mathcal{C}$ such that $E = \bigcup_n C_n$. In particular, $\{C_n\}_n \subset \mathcal{A}_0(\mathcal{C})$. Let $D_1 = C_1$ and $D_n = C_n \setminus \bigcup_{k=1}^{n-1} C_k$ for $n \ge 2$. Then the $D_n$'s are pairwise disjoint and $E = \bigcup_n D_n$. Moreover, $D_n \in \mathcal{A}_0(\mathcal{C})$ for each $n$. Without loss of generality, we can assume that $D_n \neq \emptyset$ for all $n$. Since $D_n \in \mathcal{A}_0(\mathcal{C})$, we know by Lemma 4.2 that there is a finite sequence, $\{E_{nj}\}_{j=1}^{k_n}$, of pairwise disjoint members of $\mathcal{C}$ such that $D_n = \bigcup_{j=1}^{k_n} E_{nj}$. It follows that $\{E_{11}, \ldots, E_{1k_1}, E_{21}, \ldots, E_{2k_2}, \ldots\}$ is a countable collection of pairwise disjoint members of $\mathcal{C}$ whose union is $E$.

We are now in a position to prove a theorem that deals with the question of uniqueness for an extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{C}$.

THEOREM 4.12 Let $\Omega$ be a set, $\mathcal{C}$ a semialgebra of subsets of $\Omega$, and $\iota$ a nonnegative extended real-valued function on $\mathcal{C}$ satisfying Conditions (E1)-(E3) on page 208. Suppose there is a sequence, $\{C_n\}_n$, of subsets of $\Omega$ such that
(E4) $\{C_n\}_n \subset \mathcal{C}$, $\bigcup_n C_n = \Omega$, and $\iota(C_n) < \infty$ for each $n$.
Then there exists a unique extension of $\iota$ to a measure on $\mathcal{A}(\mathcal{C})$, the $\sigma$-algebra generated by $\mathcal{C}$.

PROOF: Let $\mu^*$ be the outer measure induced by $\iota$ and $\mathcal{C}$, $\mathcal{A}$ the collection of $\mu^*$-measurable sets, and $\mu = \mu^*|_{\mathcal{A}}$. By Theorem 4.11, $\mathcal{A}$ is a $\sigma$-algebra, $\mathcal{A} \supset \mathcal{C}$, $\mu$ is a measure on $\mathcal{A}$, and $\mu|_{\mathcal{C}} = \iota$. It follows that $\mathcal{A} \supset \mathcal{A}(\mathcal{C})$ and that if we define $\nu = \mu|_{\mathcal{A}(\mathcal{C})}$, then $\nu$ is an extension of $\iota$ to $\mathcal{A}(\mathcal{C})$. Therefore, the existence portion of the theorem is established.

It remains to prove the uniqueness portion of the theorem, that $\nu$ is the only extension of $\iota$ to $\mathcal{A}(\mathcal{C})$. In other words, we must show that if $\tau$ is a measure on $\mathcal{A}(\mathcal{C})$ with $\tau(C) = \iota(C)$ for all $C \in \mathcal{C}$, then
$$\tau(A) = \nu(A), \qquad A \in \mathcal{A}(\mathcal{C}). \qquad (4.25)$$
In establishing (4.25), we will use the fact that $\mathcal{C}_\sigma \subset \mathcal{A}(\mathcal{C})$, which follows because $\mathcal{A}(\mathcal{C})$ is a $\sigma$-algebra containing $\mathcal{C}$. First, we will show that
$$\tau(E) = \nu(E), \qquad E \in \mathcal{C}_\sigma. \qquad (4.26)$$
If $E \in \mathcal{C}_\sigma$, then, by Lemma 4.3, there exists $\{C_n\}_n \subset \mathcal{C}$ with $C_i \cap C_j = \emptyset$ for $i \neq j$, such that $E = \bigcup_n C_n$. Consequently,
$$\tau(E) = \sum_n \tau(C_n) = \sum_n \iota(C_n) = \sum_n \nu(C_n) = \nu(E),$$
which establishes (4.26).

Next, we will show that
$$\tau(A) = \nu(A), \qquad A \in \mathcal{A}(\mathcal{C}),\ \nu(A) < \infty. \qquad (4.27)$$
For a given $\epsilon > 0$, we can, by Lemma 4.1(a), select a set $E \in \mathcal{C}_\sigma$ such that $E \supset A$ and $\mu^*(E) \le \mu^*(A) + \epsilon$, which, in this case, is equivalent to $\nu(E) \le \nu(A) + \epsilon$. As $E \supset A$ and $E \in \mathcal{C}_\sigma$, we conclude from (4.26) that $\tau(A) \le \tau(E) = \nu(E) \le \nu(A) + \epsilon$. As $\epsilon > 0$ was arbitrary, we see that
$$\tau(A) \le \nu(A), \qquad A \in \mathcal{A}(\mathcal{C}),\ \nu(A) < \infty. \qquad (4.28)$$
To prove the reverse inequality, we again select, for a given $\epsilon > 0$, a set $E \in \mathcal{C}_\sigma$ such that $E \supset A$ and $\nu(E) \le \nu(A) + \epsilon$. Since $\nu(A) < \infty$, we have $\nu(E \setminus A) = \nu(E) - \nu(A) \le \epsilon$. Applying (4.28) to $E \setminus A$, we obtain $\tau(E \setminus A) \le \epsilon$. Hence, by (4.26), we can now conclude that
$$\nu(A) \le \nu(E) = \tau(E) = \tau(A) + \tau(E \setminus A) \le \tau(A) + \epsilon.$$
As $\epsilon > 0$ was arbitrary, we see that $\nu(A) \le \tau(A)$. This and (4.28) imply that (4.27) holds.

It remains to establish (4.25) when $\nu(A) = \infty$. Let $\{C_n\}_n$ be as in Condition (E4). By Exercise 4.106, we can assume that the $C_n$'s are pairwise disjoint. Now, $A = A \cap \Omega = \bigcup_n (A \cap C_n)$. Because $\nu(A \cap C_n) \le \nu(C_n) = \iota(C_n) < \infty$, (4.27) implies that $\nu(A \cap C_n) = \tau(A \cap C_n)$. Consequently,
$$\nu(A) = \sum_n \nu(A \cap C_n) = \sum_n \tau(A \cap C_n) = \tau(A).$$
The proof of the theorem is now complete.

Three particularly important consequences of Theorem 4.12 are given here in Corollaries 4.5-4.7. We will refer to these corollaries frequently.
COROLLARY 4.5 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $\mathcal{C}$ is a semialgebra of subsets of $\Omega$ such that the $\sigma$-algebra generated by $\mathcal{C}$ is $\mathcal{A}$. Further suppose that there is a sequence, $\{C_n\}_n \subset \mathcal{C}$, with $\bigcup_n C_n = \Omega$ and $\mu(C_n) < \infty$ for each $n$. If $\nu$ is a measure on $\mathcal{A}$ such that $\nu(C) = \mu(C)$ for all $C \in \mathcal{C}$, then $\nu = \mu$, that is, $\nu(A) = \mu(A)$ for all $A \in \mathcal{A}$.

PROOF: Let $\iota = \mu|_{\mathcal{C}}$ $(= \nu|_{\mathcal{C}})$. Since $\mu$ is a measure, it follows immediately that Conditions (E1)-(E3) are satisfied by $\iota$ and $\mathcal{C}$. Also, by assumption, Condition (E4) holds. Therefore, by Theorem 4.12, $\iota$ has a unique extension to a measure on the $\sigma$-algebra generated by $\mathcal{C}$, which, by hypothesis, is $\mathcal{A}$. Since both $\mu$ and $\nu$ are extensions of $\iota$ to $\mathcal{A}$, it must be that $\nu = \mu$.

COROLLARY 4.6 Let $\mu$ and $\nu$ be two Borel measures (i.e., measures on $\mathcal{B}$) such that $\mu(I) < \infty$ for all finite intervals $I$ and $\mu(I) = \nu(I)$ for all $I \in \mathcal{I}$. Then $\mu = \nu$.

PROOF: By Exercise 4.98(b), $\mathcal{I}$ is a semialgebra and, by Exercise 4.108, the $\sigma$-algebra generated by $\mathcal{I}$ is $\mathcal{B}$. We have $\mathcal{R} = \bigcup_{n=1}^\infty [-n, n]$ and, by assumption, $\mu([-n, n]) < \infty$ for all $n \in \mathcal{N}$. The required result now follows from Corollary 4.5.

COROLLARY 4.7 Let $(\Omega, \mathcal{A})$ be a measurable space and $\mathcal{C}$ a semialgebra of subsets of $\Omega$ such that the $\sigma$-algebra generated by $\mathcal{C}$ is $\mathcal{A}$. If $\mu$ and $\nu$ are two finite measures on $\mathcal{A}$ such that $\mu(C) = \nu(C)$ for all $C \in \mathcal{C}$, then $\mu = \nu$.

PROOF: By Exercise 4.105(a), $\Omega \in \mathcal{C}_\sigma$. As $\mu$ is a finite measure, we see that all the assumptions of Corollary 4.5 are satisfied.

Remark: In Corollary 4.7, we really need only assume that at least one of the measures, $\mu$ and $\nu$, is finite (why?).

EXERCISES 4.6

4.95 Provide the details showing that Conditions (E1)-(E3) on page 208 are necessary for the extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{C}$.

★4.96 Let $\mathcal{C}$ denote the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ and $(c, \infty)$, where $-\infty \le a \le b < \infty$ and $-\infty \le c < \infty$. Prove that $\mathcal{C}$ is a semialgebra and that $\mathcal{A}(\mathcal{C}) = \mathcal{B}$.

4.97 Suppose that $\Omega = \{x_1, x_2, \ldots, x_n\}$ is a finite set and $a_1, a_2, \ldots, a_n$ are nonnegative real numbers. Let $\mathcal{C}$ denote the collection of sets consisting of
the empty set and all singleton sets, that is, sets of the form $\{x\}$, where $x \in \Omega$. Define $\iota$ on $\mathcal{C}$ by $\iota(\emptyset) = 0$ and $\iota(\{x_k\}) = a_k$ for $1 \le k \le n$.
a) Verify that Conditions (E1)-(E3) on page 208 hold.
b) Show that $\mathcal{C}$ is a semialgebra of subsets of $\Omega$.

4.98 Let $\mathcal{I}$ denote the collection of all intervals of $\mathcal{R}$, including degenerate intervals of the form $(a, a)$ and $[a, a]$. Take $\Omega = \mathcal{R}$, $\mathcal{C} = \mathcal{I}$, and $\iota = \ell$ (= length).
a) Show that Conditions (E1)-(E3) on page 208 hold.
b) Show that $\mathcal{I}$ is a semialgebra of subsets of $\mathcal{R}$.

4.99 Let $\mathcal{I}_2$ denote the collection of all two-dimensional intervals in $\mathcal{R}^2$; that is, all sets of the form $I_1 \times I_2$, where $I_j \in \mathcal{I}$ for $1 \le j \le 2$. Take $\Omega = \mathcal{R}^2$, $\mathcal{C} = \mathcal{I}_2$, and $\iota = \ell_2$ = area; that is, for $I_1 \times I_2 \in \mathcal{I}_2$, $\ell_2(I_1 \times I_2) = \ell(I_1)\,\ell(I_2)$.
a) Show that Conditions (E1)-(E3) on page 208 hold.
b) Show that $\mathcal{I}_2$ is a semialgebra of subsets of $\mathcal{R}^2$.

4.100 Generalize Exercise 4.99 to $n$ dimensions.

4.101 Refer to Exercise 4.97. Prove that $\mu^* = \sum_{k=1}^n a_k \delta_{x_k}$; that is, prove that $\mu^*(A) = \sum_{x_k \in A} a_k$ for each $A \subset \Omega$.

4.102 Refer to Exercise 4.98. Prove that the outer measure, $\mu^*$, induced by $\ell$ and $\mathcal{I}$ is Lebesgue outer measure.

4.103 Prove parts (b) and (c) of Proposition 4.13 on page 210.

4.104 Refer to Exercises 4.97 and 4.101. Establish that every subset of $\Omega$ is $\mu^*$-measurable; that is, $\mathcal{A} = \mathcal{P}(\Omega)$.

4.105 Let $\mathcal{C}$ be a semialgebra of subsets of $\Omega$.
a) Prove that $\Omega \in \mathcal{C}_\sigma$.
b) Is it necessarily true that $\Omega \in \mathcal{C}$?

4.106 Suppose Condition (E4) on page 216 holds. Prove there exists $\{E_n\}_n \subset \mathcal{C}$ with $\bigcup_n E_n = \Omega$, $E_i \cap E_j = \emptyset$ for $i \neq j$, and $\iota(E_n) < \infty$ for each $n$.

4.107 Prove that Condition (E4) cannot be omitted as a hypothesis in Theorem 4.12. Hint: Let $\mathcal{C}$ be as in Exercise 4.96, $\iota(\emptyset) = 0$, and $\iota(C) = \infty$ for $C \in \mathcal{C}$ with $C \neq \emptyset$.

4.108 Let $\mathcal{I}$ be as in Exercise 4.98. Show that the $\sigma$-algebra generated by $\mathcal{I}$ is $\mathcal{B}$.

4.109 Let $\mathcal{I}$ be as in Exercise 4.98. Suppose that $g$ is a nonnegative Lebesgue measurable function on $\mathcal{R}$ satisfying $\int_{[-n,n]} g\,d\lambda < \infty$ for each $n \in \mathcal{N}$. Define $\iota$ on $\mathcal{I}$ by $\iota(C) = \int_C g\,d\lambda$.
a) Verify that Conditions (E1)-(E3) are satisfied by $\iota$ and $\mathcal{I}$.
b) Show that there is a unique extension of $\iota$ to a measure, $\mu$, on $\mathcal{B}$ and that $\mu(B) = \int_B g\,d\lambda$.

4.110 Suppose that $\mu$ and $\nu$ are two finite Borel measures with the property that $\mu((-\infty, x]) = \nu((-\infty, x])$ for all $x \in \mathcal{R}$. Prove that $\mu = \nu$.
4.111 Can the finiteness assumption be dropped in Exercise 4.110? Explain.

4.112 Suppose that $\mu$ and $\nu$ are two Borel measures with the property that $\mu((-\infty, x]) = \nu((-\infty, x]) < \infty$ for all $x \in \mathcal{R}$. Prove that $\mu = \nu$.

★4.113 Let $\Omega$ be a set, $\mathcal{C}$ a semialgebra of subsets of $\Omega$, and $\iota$ a nonnegative extended real-valued function on $\mathcal{C}$ satisfying Conditions (E1)-(E4). Also, let $\mu^*$ be the outer measure induced by $\iota$ and $\mathcal{C}$, $\mathcal{A}$ the collection of $\mu^*$-measurable sets, and $\mu = \mu^*|_{\mathcal{A}}$. Suppose that $E \in \mathcal{A}$.
a) Show that there is an $A \in \mathcal{A}(\mathcal{C})$ with $A \supset E$ and $\mu(A \setminus E) = 0$. Hint: First assume that $\mu(E) < \infty$ and employ Lemma 4.1.
b) Show that there is a $B \in \mathcal{A}(\mathcal{C})$ with $B \subset E$ and $\mu(E \setminus B) = 0$.

★4.114 Let $\Omega$ be a set, $\mathcal{C}$ a semialgebra of subsets of $\Omega$, and $\iota$ a nonnegative extended real-valued function on $\mathcal{C}$ satisfying Conditions (E1)-(E4). Also, let $\mu^*$ be the outer measure induced by $\iota$ and $\mathcal{C}$, $\mathcal{A}$ the collection of $\mu^*$-measurable sets, $\mu = \mu^*|_{\mathcal{A}}$, and $\nu = \mu|_{\mathcal{A}(\mathcal{C})}$. Prove that $(\Omega, \mathcal{A}, \mu)$ is the completion of $(\Omega, \mathcal{A}(\mathcal{C}), \nu)$. Hint: Use Exercise 4.113 and Exercise 4.17 on page 174.

4.115 Consider the measure space $(\mathcal{R}, \mathcal{M}, \lambda)$.
a) Can we deduce from Theorem 4.12 that $\lambda$ is the unique extension of length to a measure on $\mathcal{M}$? Explain.
b) Prove that $\lambda$ is the unique extension of length to a measure on $\mathcal{M}$.

4.7 THE LEBESGUE-STIELTJES INTEGRAL

In the previous section, we developed existence and uniqueness theorems for extensions to measures. Specifically, suppose that $\Omega$ is a set, $\mathcal{C}$ is a semialgebra of subsets of $\Omega$, and $\iota$ is a nonnegative extended real-valued function on $\mathcal{C}$. If Conditions (E1)-(E3) on page 208 hold, then there is an extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{C}$; and if, in addition, Condition (E4) on page 216 holds, then an extension to the smallest $\sigma$-algebra containing $\mathcal{C}$ is unique.

Two important applications of this theory are to the Lebesgue-Stieltjes integral and to product measure spaces.
We will discuss the former application in this section and the latter in the next.

Distribution Function of a Finite Borel Measure

Recall that a measure, $\mu$, on the Borel sets, $\mathcal{B}$, is called a Borel measure and that such a measure is called finite if $\mu(\mathcal{R}) < \infty$. With these conventions in mind, we make the following definition.
DEFINITION 4.19 Distribution Function of a Finite Borel Measure
Let $\mu$ be a finite Borel measure. Then the distribution function of $\mu$, denoted $F_\mu$, is the real-valued function defined on $\mathcal{R}$ by
$$F_\mu(x) = \mu((-\infty, x]).$$

Note: We will sometimes omit the subscript $\mu$ in $F_\mu$, provided that no confusion will arise.

Example 4.15 gives some illustrations of distribution functions. The reader should supply the details of verification.

EXAMPLE 4.15 Illustrates Definition 4.19
a) Let $\mu = \lambda|_{\mathcal{B}}$. Then $\mu$ is a Borel measure but is not finite because $\mu(\mathcal{R}) = \lambda(\mathcal{R}) = \infty$. Hence, we do not define the distribution function of $\mu$.
b) For $B \in \mathcal{B}$, define $\mu(B) = \lambda(B \cap (0, 1))$. Then $\mu$ is a Borel measure and, as $\mu(\mathcal{R}) = \lambda((0, 1)) = 1 < \infty$, it is a finite Borel measure. Its distribution function, $F_\mu$, is easily seen to be
$$F_\mu(x) = \begin{cases} 0, & x < 0; \\ x, & 0 \le x < 1; \\ 1, & x \ge 1. \end{cases}$$
c) Recall that if $b \in \mathcal{R}$, then the set function, $\delta_b$, defined on $\mathcal{P}(\mathcal{R})$ by $\delta_b(A) = 1$ if $b \in A$ and $\delta_b(A) = 0$ if $b \notin A$, is a measure on $\mathcal{P}(\mathcal{R})$, called the Dirac measure concentrated at $b$. Let $\mu = \delta_b$ restricted to $\mathcal{B}$. Then $\mu$ is a finite Borel measure and
$$F_\mu(x) = \begin{cases} 0, & x < b; \\ 1, & x \ge b. \end{cases}$$
d) Suppose that $\{a_n\}_{n=1}^\infty$ is a sequence of nonnegative real numbers with $\sum_{n=1}^\infty a_n < \infty$. Define $\mu$ on $\mathcal{B}$ by
$$\mu(B) = \sum_{n \in B} a_n.$$
Then $\mu$ is a finite Borel measure whose distribution function is
$$F_\mu(x) = \sum_{n=1}^{[x]} a_n,$$
where $[x]$ denotes the greatest integer in $x$. □

Some of the more important properties of distribution functions are presented in the next two propositions.

PROPOSITION 4.17 Let $\mu$ be a finite Borel measure and $F$ its distribution function. Then
a) $F$ is monotone nondecreasing.
b) $F$ is right continuous.
c) $F$ is bounded.
d) $\lim_{x \to -\infty} F(x) = 0$.

PROOF: a) If $x < y$, then $(-\infty, x] \subset (-\infty, y]$ and, hence, by the monotonicity property of measures, $F(x) = \mu((-\infty, x]) \le \mu((-\infty, y]) = F(y)$.
b) Let $x \in \mathcal{R}$. Since $F$ is nondecreasing, $\lim_{y \to x^+} F(y) = F(x+)$ exists. Now, $(-\infty, x + 1] \supset (-\infty, x + \frac12] \supset \cdots$ and $\bigcap_{n=1}^\infty (-\infty, x + \frac1n] = (-\infty, x]$. Therefore, because $\mu$ is a finite Borel measure, we have, by Theorem 4.1(c) on page 170,
$$F(x) = \mu((-\infty, x]) = \lim_{n \to \infty} \mu\Big(\Big(-\infty, x + \frac1n\Big]\Big) = \lim_{n \to \infty} F\Big(x + \frac1n\Big) = F(x+).$$
Hence, $F(x+) = F(x)$, that is, $F$ is right continuous at $x$. Because $x \in \mathcal{R}$ was arbitrarily chosen, we see that $F$ is right continuous.
c) As $\mu$ is a finite measure, we have $F(x) = \mu((-\infty, x]) \le \mu(\mathcal{R}) < \infty$ for each $x \in \mathcal{R}$. Hence, $F$ is bounded by $\mu(\mathcal{R})$.
d) First note that, because $F$ is monotone, $\lim_{x \to -\infty} F(x)$ exists. Also, we have $\emptyset = \bigcap_{n=1}^\infty (-\infty, -n]$ and $(-\infty, -1] \supset (-\infty, -2] \supset \cdots$. Thus, since $\mu$ is a finite measure,
$$0 = \mu(\emptyset) = \lim_{n \to \infty} \mu((-\infty, -n]) = \lim_{n \to \infty} F(-n) = \lim_{x \to -\infty} F(x).$$
The last equality holds because $\lim_{x \to -\infty} F(x)$ exists.

Proposition 4.17(a) shows that $F_\mu$ is monotone nondecreasing. Hence, $F_\mu(x)$ has a limit both as $x \to -\infty$ and as $x \to \infty$. We denote those limits by $F_\mu(-\infty)$ and $F_\mu(\infty)$, respectively. By Proposition 4.17(d), $F_\mu(-\infty) = 0$; and it is easy to prove that $F_\mu(\infty) = \mu(\mathcal{R})$.
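For the discrete measure of Example 4.15(d), the jumps of $F_\mu$ make Proposition 4.17(b) easy to see: $F_\mu$ is continuous from the right at each positive integer but not from the left. The Python sketch below uses the hypothetical choice $a_n = 2^{-n}$ and exact rational arithmetic, so that no rounding obscures the jump.

```python
import math
from fractions import Fraction

def a(n):
    """Hypothetical weights a_n = 2^(-n); their sum is 1."""
    return Fraction(1, 2 ** n)

def F(x):
    """F(x) = a_1 + ... + a_[x], where [x] is the greatest integer in x (Example 4.15(d))."""
    return sum((a(n) for n in range(1, math.floor(x) + 1)), Fraction(0))

h = Fraction(1, 10**6)
# At the atom x = 3, values just to the right equal F(3) (right continuity),
# while values just to the left equal F(2) = F(3) - a_3.
print(F(3), F(3 + h), F(3 - h))  # 7/8 7/8 3/4
```

The size of the jump, $F(3) - F(3-) = a_3 = 1/8$, is the mass $\mu(\{3\})$; for $x < 1$ the sum is empty and $F(x) = 0$, consistent with Proposition 4.17(d).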
PROPOSITION 4.18 Let $\mu$ be a finite Borel measure and $F$ its distribution function. Then, for $-\infty \le a \le b < \infty$,
$$\mu((a, b]) = F(b) - F(a) \qquad (4.29)$$
and, for $-\infty \le c < \infty$,
$$\mu((c, \infty)) = F(\infty) - F(c). \qquad (4.30)$$

PROOF: If $a = -\infty$, then, by Definition 4.19 and Proposition 4.17(d),
$$\mu((a, b]) = F(b) = F(b) - F(-\infty) = F(b) - F(a).$$
If $-\infty < a < \infty$, then, since $\mu$ is a finite measure, we have
$$\mu((a, b]) = \mu((-\infty, b]) - \mu((-\infty, a]) = F(b) - F(a).$$
This proves (4.29). To prove (4.30), note that
$$\mu((c, \infty)) = \mu(\mathcal{R}) - \mu((-\infty, c]) = F(\infty) - F(c),$$
as required.

Lebesgue-Stieltjes Measure

We now consider the following two important questions concerning a real-valued function, $F$, on $\mathcal{R}$:
Question 1: Under what conditions is $F$ the distribution function of some finite Borel measure?
Question 2: Can $F$ be the distribution function of two different finite Borel measures?

As we have just seen, a necessary condition for a real-valued function, $F$, on $\mathcal{R}$ to be the distribution function of some finite Borel measure is that (a)-(d) of Proposition 4.17 hold. In other words, unless $F$ satisfies the properties listed in Proposition 4.17, it cannot possibly be the distribution function of a finite Borel measure.

By employing Theorem 4.12 on page 216, we can show that the properties listed in Proposition 4.17 are not only necessary but also sufficient for $F$ to be the distribution function of some finite Borel measure. Moreover, using that same theorem, we can prove that the answer to Question 2 is no.

So, assume that $F$ satisfies (a)-(d) of Proposition 4.17. We will use $F$ to define a nonnegative set function, $\iota$, on a semialgebra, $\mathcal{C}$, of subsets
of $\mathcal{R}$. Then we will prove that Conditions (E1)-(E4) of Section 4.6 hold for $\iota$ and $\mathcal{C}$. Finally, we will show that the measure, $\mu$, guaranteed by Theorem 4.12 is a finite Borel measure whose distribution function is $F$ and that $\mu$ is the only such measure.

To begin, let $\mathcal{C}$ denote the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ or $(c, \infty)$, where $-\infty \le a \le b < \infty$ and $-\infty \le c < \infty$. Then, by Exercise 4.96 on page 218, $\mathcal{C}$ is a semialgebra and $\mathcal{A}(\mathcal{C}) = \mathcal{B}$.

Next, we want to use $F$ to define a nonnegative set function, $\iota$, on $\mathcal{C}$ in such a way that if $\mu$ is an extension of $\iota$ to a measure on $\mathcal{B}$, then $F$ is the distribution function of $\mu$. In view of (4.29) and (4.30), we see that $\iota$ should be defined on $\mathcal{C}$ as follows: For $-\infty \le a \le b < \infty$,
$$\iota((a, b]) = F(b) - F(a), \qquad (4.31)$$
and, for $-\infty \le c < \infty$,
$$\iota((c, \infty)) = F(\infty) - F(c). \qquad (4.32)$$
Note that $\iota$ is nonnegative because, by assumption, $F$ is nondecreasing.

Now we will verify that Conditions (E1)-(E4) hold for $\iota$ and $\mathcal{C}$. Using (4.31) with $b = a$, we see that $\iota(\emptyset) = \iota((a, a]) = F(a) - F(a) = 0$. Hence, Condition (E1) on page 208 holds. To verify Condition (E4) on page 216, we can, for instance, take $\{C_n\}_n$ to consist of the single set $(-\infty, \infty)$.

The validity of Conditions (E2) and (E3) for $\iota$ and $\mathcal{C}$ is established in Lemmas 4.4 and 4.5, respectively. In proving those lemmas, it is convenient to write $(c, \infty)$ as $(c, \omega]$, with the conventions that $F(\omega) = F(\infty)$, $\iota((c, \omega]) = F(\infty) - F(c)$, and $(c, \omega]^c = (-\infty, c]$. Using this notation, $\mathcal{C}$ consists of all sets of the form $(a, b]$, where either $-\infty \le a \le b < \infty$, or $-\infty \le a < \infty$ and $b = \omega$.

LEMMA 4.4 Suppose that $\{C_k\}_{k=1}^n$ is a finite sequence of pairwise disjoint members of $\mathcal{C}$ whose union is in $\mathcal{C}$. Then $\iota\big(\bigcup_{k=1}^n C_k\big) = \sum_{k=1}^n \iota(C_k)$.

PROOF: Set $C = \bigcup_{k=1}^n C_k$. Then, by assumption, $C \in \mathcal{C}$. So we can write $C = (a, b]$ and $C_k = (a_k, b_k]$, $1 \le k \le n$. Without loss of generality, we can assume that the $C_k$'s are nonempty and that $a_1 \le a_2 \le \cdots \le a_n$. Since $\bigcup_{k=1}^n C_k = C$ and the $C_k$'s are pairwise disjoint,
$$a = a_1 < b_1 = a_2 < b_2 = \cdots = a_{n-1} < b_{n-1} = a_n \quad\text{and}\quad b_n = b.$$
Hence,
$$\iota(C) = F(b) - F(a) = \sum_{k=1}^n \big(F(b_k) - F(a_k)\big) = \sum_{k=1}^n \iota(C_k),$$
as required.
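The telescoping identity in the proof of Lemma 4.4 can be checked numerically. The Python sketch below uses a hypothetical distribution function that is linear on $[0, 1)$ and has a jump of $1/2$ at $x = 1$, and it splits $(-1, 3]$ into adjacent half-open pieces $(a_k, b_k]$ with $b_k = a_{k+1}$. Because each piece contains its right endpoint, the jump at 1 is counted exactly once, which is why the half-open intervals $(a, b]$ pair naturally with right continuity.

```python
from fractions import Fraction

def F(x):
    """A hypothetical distribution function: x/2 on [0, 1), jump of 1/2 at x = 1, then 1."""
    if x < 0:
        return Fraction(0)
    if x < 1:
        return Fraction(x) / 2
    return Fraction(1)

def iota(a, b):
    """iota((a, b]) = F(b) - F(a), as in (4.31)."""
    return F(b) - F(a)

# Split (-1, 3] at sorted cut points into adjacent pieces (a_k, b_k].
cuts = [Fraction(-1), Fraction(1, 4), Fraction(1), Fraction(5, 2), Fraction(3)]
pieces = list(zip(cuts[:-1], cuts[1:]))
print(sum(iota(p, q) for p, q in pieces), iota(cuts[0], cuts[-1]))  # 1 1
```

Here the piece $(1/4, 1]$ carries $7/8$, which includes the whole jump at 1, while the pieces to the right of 1 carry nothing.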
LEMMA 4.5 Assume $C, C_1, C_2, \ldots$ are in $\mathcal{C}$ and $C \subset \bigcup_n C_n$. Then $\iota(C) \le \sum_n \iota(C_n)$.

PROOF: We can write $C_n = (a_n, b_n]$, for each $n$, and $C = (a, b]$. Assume first that $a, b \in \mathcal{R}$; that is, $C$ is a finite interval. Let $\epsilon > 0$ be given. Because $F$ is right continuous, we can choose a $\delta > 0$ such that $F(a + \delta) < F(a) + \epsilon/2$ and, for each $n$, a $\delta_n > 0$ such that $F(b_n + \delta_n) < F(b_n) + \epsilon/2^{n+1}$. The interval $[a + \delta, b]$ is closed and bounded and, since $C \subset \bigcup_n C_n$, it follows that $[a + \delta, b] \subset \bigcup_n (a_n, b_n + \delta_n)$. Hence, by the Heine-Borel theorem, there is an $N \in \mathcal{N}$ such that $[a + \delta, b] \subset \bigcup_{n=1}^N (a_n, b_n + \delta_n)$. Set $I_n = (a_n, b_n + \delta_n)$. Arguing as in Proposition 3.2 on page 107, we find that there is an integer $m$, with $m \le N$, and a sequence of intervals $\{J_i\}_{i=1}^m$ such that $J_i = (c_i, d_i) \in \{I_n\}_{n=1}^N$, for $1 \le i \le m$, and
$$c_1 < a + \delta, \quad c_2 < d_1, \quad \ldots, \quad c_m < d_{m-1}, \quad b < d_m.$$
Because $F$ is nondecreasing and $\{J_i\}_{i=1}^m \subset \{I_n\}_{n=1}^N \subset \{(a_n, b_n + \delta_n)\}_n$, we conclude that
$$\begin{aligned} F(b) - F(a + \delta) &\le F(d_m) - F(c_1) \\ &\le F(d_m) - F(c_1) + \big(F(d_1) - F(c_2)\big) + \cdots + \big(F(d_{m-1}) - F(c_m)\big) \\ &= \sum_{i=1}^m \big(F(d_i) - F(c_i)\big) \le \sum_n \big(F(b_n + \delta_n) - F(a_n)\big). \end{aligned}$$
Consequently,
$$\begin{aligned} F(b) - F(a) &< F(b) - F(a + \delta) + \frac{\epsilon}{2} \le \sum_n \big(F(b_n + \delta_n) - F(a_n)\big) + \frac{\epsilon}{2} \\ &\le \sum_n \Big(F(b_n) + \frac{\epsilon}{2^{n+1}} - F(a_n)\Big) + \frac{\epsilon}{2} = \sum_n \big(F(b_n) - F(a_n)\big) + \epsilon. \end{aligned}$$
In other words, $\iota(C) \le \sum_n \iota(C_n) + \epsilon$. As $\epsilon > 0$ was arbitrary, we have $\iota(C) \le \sum_n \iota(C_n)$, as required.

The lemma has now been established when $C$ is a finite interval. The proof for the case where $C$ is an infinite interval is left as an exercise for the reader.

We have now verified that Conditions (E1)-(E4) are satisfied by $\iota$ and $\mathcal{C}$. Using that fact, we can prove a theorem that answers Questions 1 and 2 on page 223.
THEOREM 4.13 Suppose that $F$ is a real-valued function on $\mathcal{R}$ satisfying (a)-(d) of Proposition 4.17 on page 222. Then there is a unique finite Borel measure having $F$ as its distribution function.

PROOF: Let $\mathcal{C}$ and $\iota$ be as defined earlier. Since Conditions (E1)-(E4) are satisfied by $\mathcal{C}$ and $\iota$, Theorem 4.12 implies that there is a unique extension of $\iota$ to a measure, $\mu$, on $\mathcal{A}(\mathcal{C}) = \mathcal{B}$. The measure $\mu$ is finite because $\mu(\mathcal{R}) = \iota((-\infty, \omega]) = F(\infty) - F(-\infty) = F(\infty) < \infty$, $F$ being bounded. Using the fact that $\mu$ is an extension of $\iota$ and the relation (4.31), we conclude that, for each $x \in \mathcal{R}$,
$$\mu((-\infty, x]) = \iota((-\infty, x]) = F(x) - F(-\infty) = F(x).$$
Thus, $F$ is the distribution function of $\mu$.

Suppose that $\nu$ is also a finite Borel measure having $F$ as its distribution function. Then, by Proposition 4.18 and the definition of $\iota$, we see that $\nu|_{\mathcal{C}} = \iota$. Therefore, by the uniqueness of the extension of $\iota$, we must have $\nu = \mu$.

Theorem 4.13 reveals that the properties listed in Proposition 4.17 on page 222 are sufficient conditions for a real-valued function on $\mathcal{R}$ to be the distribution function of some finite Borel measure. Consequently, we make the following definition.

DEFINITION 4.20 Distribution Function; Lebesgue-Stieltjes Measure
A real-valued function, $F$, on $\mathcal{R}$ is called a distribution function provided that the following conditions hold:
a) $F$ is monotone nondecreasing.
b) $F$ is right continuous.
c) $F$ is bounded.
d) $\lim_{x \to -\infty} F(x) = 0$.
For such a function, the unique finite Borel measure having $F$ as its distribution function is called the Lebesgue-Stieltjes measure corresponding to $F$.

The next example provides some illustrations of Theorem 4.13 and Definition 4.20. The details of verification are left to the reader as exercises.
EXAMPLE 4.16 Illustrates Theorem 4.13 and Definition 4.20
a) Let $F$ be defined by
$$F(x) = \begin{cases} 0, & x < 0; \\ x, & 0 \le x < 1; \\ 1, & x \ge 1. \end{cases}$$
Then $F$ is bounded, nondecreasing, continuous, and $F(-\infty) = 0$. Consequently, by Theorem 4.13, there is a unique finite Borel measure having $F$ as its distribution function. Let $\mu$ be the Borel measure defined by $\mu(B) = \lambda(B \cap (0, 1))$. Then, as we discovered in Example 4.15(b) on page 221, $\mu$ has $F$ as its distribution function. Hence, $\mu$ is the unique finite Borel measure having $F$ as its distribution function; in other words, $\mu$ is the Lebesgue-Stieltjes measure corresponding to $F$.
b) Let $g$ be a nonnegative Lebesgue integrable function on $\mathcal{R}$. Define $F$ on $\mathcal{R}$ by
$$F(x) = \int_{(-\infty, x]} g(t)\,d\lambda(t). \qquad (4.33)$$
Then $F$ is nondecreasing, continuous, bounded, and $F(-\infty) = 0$. So, by Theorem 4.13, there is a unique finite Borel measure having $F$ as its distribution function. For $B \in \mathcal{B}$, define $\mu(B) = \int_B g\,d\lambda$. Then $\mu$ is a finite Borel measure and, clearly, $F$ is the distribution function of $\mu$. Consequently, $\mu$ is the Lebesgue-Stieltjes measure corresponding to $F$. □

The Lebesgue-Stieltjes Integral

Assume $F$ is a distribution function; that is, a real-valued function on $\mathcal{R}$ satisfying (a)-(d) of Proposition 4.17 on page 222. Then, as we know from Theorem 4.13, there is a unique finite Borel measure, $\mu$, having $F$ as its distribution function. Hence, it is natural to make the following definition.

DEFINITION 4.21 Lebesgue-Stieltjes Integral
Suppose that $F$ is a distribution function and that $\mu$ is the Lebesgue-Stieltjes measure corresponding to $F$. Let $f$ be a Borel measurable function and $B \in \mathcal{B}$. Then the Lebesgue-Stieltjes integral of $f$ over $B$ with respect to $F$ is defined to be
$$\int_B f(x)\,dF(x) = \int_B f(x)\,d\mu(x),$$
provided the integral on the right-hand side makes sense.
EXAMPLE 4.17 Illustrates Definition 4.21
a) Let
$$F(x) = \begin{cases} 0, & x < 0; \\ x, & 0 \le x < 1; \\ 1, & x \ge 1. \end{cases}$$
By Example 4.16(a), $F$ is a distribution function and the Lebesgue-Stieltjes measure corresponding to $F$ is given by $\mu(B) = \lambda(B \cap (0, 1))$, $B \in \mathcal{B}$. Let $f$ be a Borel measurable function and $B \in \mathcal{B}$. Then the Lebesgue-Stieltjes integral of $f$ over $B$ equals
$$\int_B f\,dF = \int_B f\,d\mu = \int_{B \cap (0,1)} f\,d\lambda, \qquad (4.34)$$
provided the integral makes sense. To verify the last equality in (4.34), we apply the bootstrapping technique. The details are left to the reader as an exercise.
b) Let $g$ be a nonnegative Lebesgue integrable function. Define $F$ on $\mathcal{R}$ by
$$F(x) = \int_{(-\infty, x]} g(t)\,d\lambda(t).$$
By Example 4.16(b), $F$ is a distribution function and the Lebesgue-Stieltjes measure corresponding to $F$ is given by $\mu(B) = \int_B g\,d\lambda$, $B \in \mathcal{B}$. Let $f$ be a Borel measurable function and $B \in \mathcal{B}$. Then the Lebesgue-Stieltjes integral of $f$ over $B$ equals
$$\int_B f\,dF = \int_B f\,d\mu = \int_B fg\,d\lambda, \qquad (4.35)$$
provided the integral makes sense. To establish the last equality in (4.35), we proceed as follows. By Exercise 4.61 on page 191, the equality holds if $f$ is a nonnegative Borel measurable function. If $f$ is an extended real-valued Borel measurable function, write $f = f^+ - f^-$ and use the linearity of the abstract Lebesgue integral to conclude that (4.35) again holds. Finally, if $f$ is a complex-valued Borel measurable function, write $f = \Re f + i\,\Im f$ and apply the linearity of the abstract Lebesgue integral to again conclude that (4.35) obtains.

Before leaving this example, we should point out that part (a) is a special case of part (b) with $g = \chi_{(0,1)}$. □
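Equation (4.35) can be checked numerically for a specific density. The Python sketch below assumes the hypothetical choice $g(t) = e^{-t}$ for $t \ge 0$ (and $g = 0$ otherwise) together with $f(x) = x$, for which both sides equal $\int_0^\infty x e^{-x}\,dx = 1$. It approximates $\int f\,dF$ by a Riemann-Stieltjes-type sum $\sum_i f(t_i)\,(F(t_{i+1}) - F(t_i))$ on the truncated interval $[0, 40]$ and approximates $\int fg\,d\lambda$ by the midpoint rule; both numerical schemes are stand-ins chosen for illustration and are not part of the text's development.

```python
import math

def F(x):
    """Distribution function (4.33) for the hypothetical density g(t) = exp(-t) on [0, inf)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def stieltjes_sum(f, F, lo, hi, n):
    """Approximate the integral of f dF over (lo, hi] by sum f(t_i) * (F(t_{i+1}) - F(t_i))."""
    h = (hi - lo) / n
    return sum(f(lo + i * h) * (F(lo + (i + 1) * h) - F(lo + i * h)) for i in range(n))

def midpoint_integral(fun, lo, hi, n):
    """Midpoint-rule approximation of the integral of fun over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(fun(lo + (i + 0.5) * h) for i in range(n))

def f(x):
    return x  # integrand; the exact value of both sides of (4.35) is 1

def g(t):
    return math.exp(-t)

lhs = stieltjes_sum(f, F, 0.0, 40.0, 200_000)                         # integral of f dF
rhs = midpoint_integral(lambda t: f(t) * g(t), 0.0, 40.0, 200_000)    # integral of f g d(lambda)
print(round(lhs, 3), round(rhs, 3))  # 1.0 1.0
```

The left-endpoint Stieltjes sum has an error of order of the mesh $h = 2 \times 10^{-4}$, while the midpoint rule is accurate to order $h^2$; the tail beyond 40 is negligible.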
EXERCISES 4.7

4.116 Provide the details for the illustrations given in parts (b)-(d) of Example 4.15 on page 221.

4.117 Let $\{x_n\}_n$ be a sequence of distinct real numbers and $\{b_n\}_n$ a sequence of nonnegative real numbers with $\sum_n b_n < \infty$. Define $\mu$ on $\mathcal{B}$ by
$$\mu(B) = \sum_{x_n \in B} b_n.$$
a) Explain why $\mu$ is a finite Borel measure.
b) Determine the distribution function of $\mu$.

4.118 Define $\mu$ on $\mathcal{B}$ by $\mu(B) = \int_B \chi_{[0,\infty)}(x)\,x e^{-x}\,d\lambda(x)$.
a) Explain why $\mu$ is a finite Borel measure.
b) Determine the distribution function of $\mu$.

4.119 Let $\mu$ be a finite Borel measure. Prove that $F_\mu(\infty) = \mu(\mathcal{R})$.

4.120 Complete the proof of Lemma 4.5 on page 225. In other words, prove that Condition (E3) on page 208 is satisfied by $\iota$ and $\mathcal{C}$ when $C$ is an infinite interval. Hint: First assume $C = (-\infty, b]$, where $b < \infty$, and note that $\iota(C) = \lim_{x \to -\infty} \iota((x, b])$.

4.121 Verify all statements made in Example 4.16(a) on page 227.

4.122 Verify all statements made in Example 4.16(b) on page 227.

4.123 Verify all statements made in Example 4.17(a) on page 228.

4.124 Verify all statements made in Example 4.17(b) on page 228.

4.125 Define $F$ on $\mathcal{R}$ by
$$F(x) = \begin{cases} \ldots, & x < 0; \\ \ldots, & 0 \le x < 1; \\ \ldots, & 1 \le x < 2; \\ \ldots, & 2 \le x < 3; \\ \ldots, & x \ge 3. \end{cases}$$
a) Show that $F$ satisfies (a)-(d) of Proposition 4.17 on page 222.
b) Obtain the finite Borel measure, $\mu$, whose distribution function is $F$.

4.126 Suppose that $\{a_n\}_{n=1}^\infty$ is a sequence of nonnegative real numbers and that $\sum_{n=1}^\infty a_n < \infty$. Define $F$ on $\mathcal{R}$ by
$$F(x) = \sum_{n=1}^{[x]} a_n.$$
a) Show that $F$ is a distribution function, that is, satisfies (a)-(d) of Proposition 4.17.
b) Determine the Lebesgue-Stieltjes measure corresponding to $F$.
c) If $f$ is Borel measurable, determine $\int f\,dF$.
4.127 Generalize the previous exercise as follows: Suppose that $\{x_n\}_n$ is a sequence of real numbers and that $\{a_n\}_n$ is a sequence of nonnegative real numbers with $\sum_n a_n < \infty$. Define $F$ on $\mathcal{R}$ by
$$F(x) = \sum_{x_n \le x} a_n.$$
a) Show that $F$ is a distribution function, that is, satisfies (a)-(d) of Proposition 4.17.
b) Determine the Lebesgue-Stieltjes measure corresponding to $F$.
c) If $f$ is Borel measurable, determine $\int f\,dF$.

4.128 Let $a$ be a positive constant and define $F(x) = 1 - e^{-ax}$ for $x > 0$, and zero otherwise.
a) Show that $F$ satisfies (a)-(d) of Proposition 4.17.
b) Find a nonnegative Borel measurable function, $g$, such that
$$F(x) = \int_{-\infty}^{x} g(t)\,dt$$
for all $x \in \mathcal{R}$.
c) Determine the unique finite Borel measure that has $F$ as its distribution function.
d) Find $\int_{\mathcal{R}} x\,dF(x)$ and $\int_{\mathcal{R}} e^{tx}\,dF(x)$. Hint: Use Example 4.17(b).

4.129 Let $F\colon \mathcal{R} \to \mathcal{R}$ be defined by
$$F(x) = \begin{cases} 0, & x < -2; \\ (x+2)/4, & -2 \le x < 2; \\ 1, & x \ge 2. \end{cases}$$
a) Show that $F$ is a distribution function.
b) Determine the Lebesgue-Stieltjes measure corresponding to $F$.
c) Find $\int_{\mathcal{R}} x\,dF(x)$, $\int_{\mathcal{R}} x^2\,dF(x)$, and $\int_{\mathcal{R}} e^{tx}\,dF(x)$ for $t \in \mathcal{R}$.

4.130 Let $F\colon \mathcal{R} \to \mathcal{R}$ be defined by
$$F(x) = \sum_{n=0}^{\ldots} \ldots,$$
where $a$ is a positive constant. Obtain $\int_{\mathcal{R}} x\,dF(x)$ and $\int_{\mathcal{R}} x^2\,dF(x)$.

4.131 Suppose that $F$ is a distribution function. Further suppose that $F$ is differentiable on $\mathcal{R}$ and that $F' \in R([a,b])$ for all $a, b \in \mathcal{R}$. If $f$ is Borel measurable, show that
$$\int f(x)\,dF(x) = \int f(x)F'(x)\,d\lambda(x)$$
whenever the integral on the right-hand side makes sense. Hint: Use the fundamental theorem of calculus and Example 4.17(b) on page 228.

4.132 Let $\psi$ denote the Cantor function, as defined on page 77. Set
$$F(x) = \begin{cases} 0, & x < 0; \\ \psi(x), & 0 \le x < 1; \\ 1, & x \ge 1. \end{cases}$$
a) Show that $F$ is a distribution function.
b) Verify that $F' = 0$ $\lambda$-ae.
c) Prove that the conclusion of the previous exercise is not valid.

4.8 PRODUCT MEASURE SPACES

Our second application of the theory of extensions to measures, which we developed in Section 4.6, will be to product measure spaces. In this section, we will see how two measure spaces naturally give rise to a third measure space, called the product measure space. To help motivate product measure spaces, we consider the following example.

EXAMPLE 4.18 Motivates Product Measure
Note that in each of the illustrations below, a nonnegative set function is expressed in terms of a product of two measures.

a) Let $\lambda$ denote Lebesgue measure on $\mathcal{R}$. If $I$ and $J$ are two intervals in $\mathcal{R}$, then the Cartesian product, $I \times J$, is a rectangle in $\mathcal{R}^2$ ($= \mathcal{R} \times \mathcal{R}$) whose area can be expressed as $\operatorname{area}(I \times J) = \ell(I)\ell(J) = \lambda(I)\lambda(J)$.

b) Let $\Gamma$ and $\Lambda$ be two finite sets and $\mu$ and $\nu$ counting measure on $\Gamma$ and $\Lambda$, respectively. Also, as before, let $N(E)$ denote the number of elements of a finite set, $E$. If $A \subset \Gamma$ and $B \subset \Lambda$, then the number of elements of $A \times B$ can be expressed as $N(A \times B) = \mu(A)\nu(B)$, as we know from the fundamental principle of counting. □
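Part (b) of Example 4.18 can be checked mechanically for small finite sets. The sketch below uses arbitrary sets chosen for illustration. It enumerates $A \times B$ with the standard-library `itertools.product` and confirms that $N(A \times B) = \mu(A)\nu(B)$ when $\mu$ and $\nu$ are counting measures. The helper names are hypothetical.

```python
from itertools import product


def counting_measure(E) -> int:
    """Counting measure of a finite set: its number of elements."""
    return len(set(E))


def check_rectangle_count(A, B) -> bool:
    """Check N(A x B) = N(A) * N(B) by listing every pair in A x B."""
    rectangle = set(product(A, B))  # all pairs (a, b) with a in A, b in B
    return counting_measure(rectangle) == counting_measure(A) * counting_measure(B)


if __name__ == "__main__":
    Gamma = {"a", "b", "c", "d"}
    Lam = {1, 2, 3}
    A, B = {"a", "c"}, {1, 2}
    assert A <= Gamma and B <= Lam  # A and B are subsets of the factor spaces
    print(counting_measure(set(product(A, B))))  # 4
    print(check_rectangle_count(A, B))           # True
```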
Existence of a Product Measure

Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are two measure spaces. As usual, we let $\Gamma \times \Lambda$ denote the Cartesian product of $\Gamma$ with $\Lambda$:
$$\Gamma \times \Lambda = \{ (x, y) : x \in \Gamma \text{ and } y \in \Lambda \}.$$
Our first task is to prove the existence of a $\sigma$-algebra, $\mathcal{A}$, of subsets of $\Gamma \times \Lambda$ that contains all sets of the form $S \times T$, where $S \in \mathcal{S}$ and $T \in \mathcal{T}$, and a measure, $\omega$, on $\mathcal{A}$ such that
$$\omega(S \times T) = \mu(S)\nu(T). \tag{4.36}$$
This will be accomplished by applying the theory of extensions to measures. We begin with the following definition.

DEFINITION 4.22 Measurable Rectangles
Let $(\Gamma, \mathcal{S})$ and $(\Lambda, \mathcal{T})$ be measurable spaces. A subset of $\Gamma \times \Lambda$ of the form $S \times T$, where $S \in \mathcal{S}$ and $T \in \mathcal{T}$, is called a measurable rectangle. The collection of all measurable rectangles is denoted by $\mathcal{U}$.

Proposition 4.19 establishes that $\mathcal{U}$ is a semialgebra.

PROPOSITION 4.19 The collection, $\mathcal{U}$, of all measurable rectangles is a semialgebra of subsets of $\Gamma \times \Lambda$.

PROOF: Let $A, B \in \mathcal{U}$. Then there are sets $S_1, S_2 \in \mathcal{S}$ and $T_1, T_2 \in \mathcal{T}$ such that $A = S_1 \times T_1$ and $B = S_2 \times T_2$. As $A \cap B = (S_1 \cap S_2) \times (T_1 \cap T_2)$ and $\mathcal{S}$ and $\mathcal{T}$ are $\sigma$-algebras, it follows that $A \cap B \in \mathcal{U}$. Hence, $\mathcal{U}$ is closed under finite intersections.

Now let $C \in \mathcal{U}$ and choose $S \in \mathcal{S}$ and $T \in \mathcal{T}$ such that $C = S \times T$. Then it is easy to see that $C^c = (\Gamma \times T^c) \cup (S^c \times T)$, which is a finite disjoint union of members of $\mathcal{U}$.

Next we define a nonnegative extended real-valued set function, $\iota$, on $\mathcal{U}$. In view of (4.36), this should be done as follows: For $S \in \mathcal{S}$ and $T \in \mathcal{T}$, define
$$\iota(S \times T) = \mu(S)\nu(T). \tag{4.37}$$
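The proof of Proposition 4.19 uses two set identities:

- $(S_1\times T_1)\cap(S_2\times T_2) = (S_1\cap S_2)\times(T_1\cap T_2)$.
- $(S\times T)^c = (\Gamma\times T^c)\cup(S^c\times T)$, where the two pieces are disjoint.

Both can be spot-checked on small finite spaces. The Python sketch below checks them exhaustively for every choice of subsets of a two-point $\Gamma$ and a three-point $\Lambda$. These spaces are arbitrary illustrative choices, and the helper names are hypothetical. This is a sanity check, not a proof.

```python
from itertools import chain, combinations, product


def subsets(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def rect(S, T):
    """The rectangle S x T as a set of pairs."""
    return set(product(S, T))


def check_semialgebra_identities(Gamma, Lam) -> bool:
    """Exhaustively check the identities used in Proposition 4.19."""
    full = rect(Gamma, Lam)
    for S1, T1, S2, T2 in product(subsets(Gamma), subsets(Lam),
                                  subsets(Gamma), subsets(Lam)):
        # The intersection of two rectangles is again a rectangle.
        if rect(S1, T1) & rect(S2, T2) != rect(S1 & S2, T1 & T2):
            return False
    for S, T in product(subsets(Gamma), subsets(Lam)):
        Sc, Tc = set(Gamma) - S, set(Lam) - T
        piece1, piece2 = rect(Gamma, Tc), rect(Sc, T)
        # The complement of S x T splits into two disjoint rectangles.
        if full - rect(S, T) != piece1 | piece2 or piece1 & piece2:
            return False
    return True


if __name__ == "__main__":
    print(check_semialgebra_identities({0, 1}, {"p", "q", "r"}))  # True
```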
If we can verify that Conditions (E1)-(E3) on page 208 hold for $\iota$ and $\mathcal{U}$, then Theorem 4.11 on page 214 will ensure the existence of a $\sigma$-algebra, $\mathcal{A} \supset \mathcal{U}$, and a measure, $\omega$, on $\mathcal{A}$ satisfying (4.36), thereby completing our first task.

To verify Condition (E1), we note that $\emptyset \in \mathcal{U}$. In fact, $\emptyset = S \times T$ if and only if at least one of $S$ and $T$ is empty. But then $\iota(\emptyset) = \mu(S)\nu(T) = 0$, as required. The validity of Conditions (E2) and (E3) is established here in Lemmas 4.6 and 4.7, respectively.

LEMMA 4.6 Suppose that $\{C_k\}_{k=1}^n$ is a finite sequence of pairwise disjoint members of $\mathcal{U}$ whose union is in $\mathcal{U}$. Then $\iota\big(\bigcup_{k=1}^n C_k\big) = \sum_{k=1}^n \iota(C_k)$.

PROOF: Set $C = \bigcup_{k=1}^n C_k$. Then, by assumption, we can write $C = S \times T$ and $C_k = S_k \times T_k$, for $1 \le k \le n$, where $S, S_k \in \mathcal{S}$ and $T, T_k \in \mathcal{T}$. Let $x \in S$ and set $N_x = \{ k : x \in S_k \}$. If $y \in T$, then $(x, y) \in C$ and so there is a $k$ such that $(x, y) \in S_k \times T_k$; thus, $y \in T_k$ for some $k \in N_x$. On the other hand, if $y \in T_k$ for some $k \in N_x$, then $(x, y) \in S_k \times T_k \subset S \times T$, so that $y \in T$. Hence, $T = \bigcup_{k \in N_x} T_k$ and, since the $C_k$s are pairwise disjoint, the sets $T_k$, $k \in N_x$, are also pairwise disjoint. Consequently, $\nu(T) = \sum_{k \in N_x} \nu(T_k)$. It follows (see Exercise 4.133) that
$$\nu(T)\chi_S(x) = \sum_{k=1}^n \nu(T_k)\chi_{S_k}(x) \tag{4.38}$$
for all $x \in \Gamma$. Therefore,
$$\iota(C) = \mu(S)\nu(T) = \int_\Gamma \nu(T)\chi_S(x)\,d\mu(x) = \int_\Gamma \sum_{k=1}^n \nu(T_k)\chi_{S_k}(x)\,d\mu(x) = \sum_{k=1}^n \mu(S_k)\nu(T_k) = \sum_{k=1}^n \iota(C_k),$$
as required.

LEMMA 4.7 Assume $C, C_1, C_2, \ldots$ are in $\mathcal{U}$ and $C \subset \bigcup_n C_n$. Then $\iota(C) \le \sum_n \iota(C_n)$.

PROOF: We can write $C = S \times T$ and $C_n = S_n \times T_n$, where $S, S_n \in \mathcal{S}$ and $T, T_n \in \mathcal{T}$. Let $x \in S$ and set $N_x = \{ n : x \in S_n \}$. If $y \in T$,
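then $(x, y) \in C$ and so there is an $n \in N_x$ such that $(x, y) \in S_n \times T_n$. Therefore, $T \subset \bigcup_{n \in N_x} T_n$ and so $\nu(T) \le \sum_{n \in N_x} \nu(T_n)$. This implies that
$$\nu(T)\chi_S(x) \le \sum_n \nu(T_n)\chi_{S_n}(x)$$
for all $x \in \Gamma$. Thus,
$$\iota(C) = \mu(S)\nu(T) = \int_\Gamma \nu(T)\chi_S(x)\,d\mu(x) \le \int_\Gamma \sum_n \nu(T_n)\chi_{S_n}(x)\,d\mu(x) = \sum_n \int_\Gamma \nu(T_n)\chi_{S_n}(x)\,d\mu(x) = \sum_n \mu(S_n)\nu(T_n) = \sum_n \iota(C_n),$$
as required.

We have now verified that Conditions (E1)-(E3) are satisfied by $\iota$ and $\mathcal{U}$. Therefore, by Theorem 4.11 on page 214, we can deduce the following result, which completes our first task.

THEOREM 4.14 Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are measure spaces. Let $\mathcal{U} = \{ S \times T : S \in \mathcal{S} \text{ and } T \in \mathcal{T} \}$ and define $\iota$ on $\mathcal{U}$ by $\iota(S \times T) = \mu(S)\nu(T)$. Then there exists an extension of $\iota$ to a measure on a $\sigma$-algebra containing $\mathcal{U}$.

The Product Measure Space

In most of our work with product measure spaces, it will be necessary to impose a restriction on the factors, $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$, namely, that they are $\sigma$-finite measure spaces.

DEFINITION 4.23 $\sigma$-finite Measure Space
A measure space, $(\Omega, \mathcal{A}, \mu)$, is called a $\sigma$-finite measure space if there is a sequence, $\{A_n\}_n$, of $\mathcal{A}$-measurable sets such that $\bigcup_n A_n = \Omega$ and $\mu(A_n) < \infty$ for each $n$.

EXAMPLE 4.19 Illustrates Definition 4.23

a) $(\mathcal{R}, \mathcal{M}, \lambda)$ is $\sigma$-finite. Indeed, the sets $A_n = [-n, n]$, $n \in \mathcal{N}$, satisfy $\bigcup_{n=1}^\infty A_n = \mathcal{R}$ and $\lambda(A_n) < \infty$ for each $n \in \mathcal{N}$.

b) Let $\gamma$ be counting measure on $\mathcal{P}(\mathcal{N})$. We have $\mathcal{N} = \bigcup_{n=1}^\infty \{n\}$ and $\gamma(\{n\}) = 1 < \infty$ for each $n \in \mathcal{N}$. Therefore, we see that $(\mathcal{N}, \mathcal{P}(\mathcal{N}), \gamma)$ is a $\sigma$-finite measure space.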
c) Let $\Omega$ be a nonempty set and $\mathcal{A} = \{\Omega, \emptyset\}$. Define $\mu(\Omega) = \infty$ and $\mu(\emptyset) = 0$. Then $(\Omega, \mathcal{A}, \mu)$ is not a $\sigma$-finite measure space.

d) Clearly, any finite measure space is $\sigma$-finite. In particular, any probability space is a $\sigma$-finite measure space. □

As you probably noted, the condition of $\sigma$-finiteness is quite similar to Condition (E4) on page 216. In fact, the next proposition shows that, for product measure spaces, there is an important relationship between the two conditions.

PROPOSITION 4.20 Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are two $\sigma$-finite measure spaces. Let $\mathcal{U}$ be the semialgebra of measurable rectangles and $\iota$ the nonnegative extended real-valued set function on $\mathcal{U}$ as defined in (4.37) on page 232. Then Condition (E4) is satisfied by $\iota$ and $\mathcal{U}$.

PROOF: By the $\sigma$-finiteness assumption, we can choose $\{S_n\}_n \subset \mathcal{S}$ and $\{T_n\}_n \subset \mathcal{T}$ such that $\mu(S_n) < \infty$ and $\nu(T_n) < \infty$, for all $n$, and $\Gamma = \bigcup_n S_n$ and $\Lambda = \bigcup_n T_n$. Let $A_n = \bigcup_{k=1}^n S_k$ and $B_n = \bigcup_{k=1}^n T_k$. Then $\{A_n\}_n \subset \mathcal{S}$ and $\{B_n\}_n \subset \mathcal{T}$; $\mu(A_n) < \infty$ and $\nu(B_n) < \infty$ for all $n$; and $\Gamma = \bigcup_n A_n$ and $\Lambda = \bigcup_n B_n$. Moreover, $A_1 \subset A_2 \subset \cdots$ and $B_1 \subset B_2 \subset \cdots$.

Let $C_n = A_n \times B_n$. We claim that $\{C_n\}_n$ is the required sequence of sets; that is, $\{C_n\}_n \subset \mathcal{U}$, $\bigcup_n C_n = \Gamma \times \Lambda$, and $\iota(C_n) < \infty$ for each $n$. The first and third properties of $\{C_n\}_n$ are obvious from the previous paragraph. To prove the second property, suppose that $(x, y) \in \Gamma \times \Lambda$. Since $x \in \Gamma$ and $y \in \Lambda$, there is an $n_1$ with $x \in A_{n_1}$, and an $n_2$ with $y \in B_{n_2}$. Let $n = \max\{n_1, n_2\}$. Then, because $\{A_n\}_n$ and $\{B_n\}_n$ are nondecreasing sequences of sets, we have $x \in A_n$ and $y \in B_n$ and, consequently, $(x, y) \in A_n \times B_n = C_n$. Thus, $\Gamma \times \Lambda = \bigcup_n C_n$.

To summarize, we have now shown that Conditions (E1)-(E3) hold for $\iota$ and $\mathcal{U}$; and that, if $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are both $\sigma$-finite, then Condition (E4) holds as well. Therefore, on account of Theorem 4.12 on page 216, we have the following result.

THEOREM 4.15 Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces.
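Let $\mathcal{U} = \{ S \times T : S \in \mathcal{S} \text{ and } T \in \mathcal{T} \}$ and define $\iota$ on $\mathcal{U}$ by $\iota(S \times T) = \mu(S)\nu(T)$. Then there exists a unique extension of $\iota$ to a measure on the $\sigma$-algebra generated by $\mathcal{U}$.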
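The construction in the proof of Proposition 4.20 can be made concrete in the Lebesgue case. Assume, for illustration only, that both factors are $(\mathcal{R}, \mathcal{M}, \lambda)$ with $S_k = T_k = [-k, k]$ as in Example 4.19(a). Then $A_n = B_n = [-n, n]$, $\iota(C_n) = (2n)^2 < \infty$, and a point $(x, y)$ lies in $C_n$ once $n = \max\{n_1, n_2\}$. The sketch below, with hypothetical helper names, checks these facts at a few sample points.

```python
import math


def lam_A(n: int) -> float:
    """lambda(A_n) for A_n = [-n, n], as in Example 4.19(a)."""
    return 2.0 * n


def iota_C(n: int) -> float:
    """iota(C_n) = lambda(A_n) * lambda(B_n) for C_n = [-n, n] x [-n, n]."""
    return lam_A(n) * lam_A(n)


def in_A(x: float, n: int) -> bool:
    """Membership test for A_n = [-n, n] (here B_n = A_n)."""
    return -n <= x <= n


def index_for_point(x: float, y: float) -> int:
    """Return n = max(n1, n2) with x in A_{n1} and y in B_{n2}, as in the proof."""
    n1 = max(1, math.ceil(abs(x)))
    n2 = max(1, math.ceil(abs(y)))
    return max(n1, n2)


if __name__ == "__main__":
    for x, y in [(0.0, 0.0), (-2.5, 0.3), (7.1, -7.9), (100.0, 1e-9)]:
        n = index_for_point(x, y)
        # Because A_n and B_n are nondecreasing, both coordinates lie in the n-th sets.
        assert in_A(x, n) and in_A(y, n)
        print(f"({x}, {y}) lies in C_{n}; iota(C_{n}) = {iota_C(n)}")
```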
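Special notation and terminology are used for the extension of $\iota$, the $\sigma$-algebra generated by $\mathcal{U}$, and the resulting measure space. This is introduced in Definition 4.24.

DEFINITION 4.24 Product Measure Space
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces and let $\mathcal{U}$ and $\iota$ be as in Theorem 4.15. The $\sigma$-algebra generated by $\mathcal{U}$, the smallest $\sigma$-algebra containing all measurable rectangles, is called the product $\sigma$-algebra of $\mathcal{S}$ with $\mathcal{T}$ and is denoted by $\mathcal{S} \times \mathcal{T}$. The unique extension of $\iota$ to a measure on $\mathcal{S} \times \mathcal{T}$ is called the product measure of $\mu$ with $\nu$ and is denoted by $\mu \times \nu$. The measure space $(\Gamma \times \Lambda, \mathcal{S} \times \mathcal{T}, \mu \times \nu)$ is called the product measure space of $(\Gamma, \mathcal{S}, \mu)$ with $(\Lambda, \mathcal{T}, \nu)$.

Note: It is important to realize that $\mathcal{S} \times \mathcal{T}$ is a notation for the $\sigma$-algebra generated by $\mathcal{U}$ and is not the Cartesian product of the sets $\mathcal{S}$ and $\mathcal{T}$.

EXAMPLE 4.20 Illustrates Definition 4.24

a) Let $(\Gamma, \mathcal{S}, \mu) = (\Lambda, \mathcal{T}, \nu) = (\mathcal{R}, \mathcal{M}, \lambda)$. Since $\mathcal{M}$ contains all intervals, any rectangle, $R \subset \mathcal{R}^2$, is a measurable rectangle. If $R = I \times J$, where $I$ and $J$ are intervals, then $(\lambda \times \lambda)(R) = \lambda(I)\lambda(J) = \operatorname{area}(R)$. So, $\lambda \times \lambda$ is a generalization of area to all $\mathcal{M} \times \mathcal{M}$-measurable sets.

b) Let $(\Gamma, \mathcal{S}, \mu) = (\Lambda, \mathcal{T}, \nu) = (\mathcal{N}, \mathcal{P}(\mathcal{N}), \gamma)$, where $\gamma$ is counting measure on $\mathcal{P}(\mathcal{N})$. As we know from Example 4.19(b), $(\mathcal{N}, \mathcal{P}(\mathcal{N}), \gamma)$ is a $\sigma$-finite measure space. We leave it as an exercise for the reader to show that the product $\sigma$-algebra of $\mathcal{P}(\mathcal{N})$ with $\mathcal{P}(\mathcal{N})$ consists of all subsets of $\mathcal{N} \times \mathcal{N}$; that is, $\mathcal{P}(\mathcal{N}) \times \mathcal{P}(\mathcal{N}) = \mathcal{P}(\mathcal{N} \times \mathcal{N})$. Furthermore, the product measure of $\gamma$ with $\gamma$ is counting measure on $\mathcal{P}(\mathcal{N} \times \mathcal{N})$. In other words, the product measure space, $(\mathcal{N} \times \mathcal{N}, \mathcal{P}(\mathcal{N}) \times \mathcal{P}(\mathcal{N}), \gamma \times \gamma)$, is the measure space $(\mathcal{N} \times \mathcal{N}, \mathcal{P}(\mathcal{N} \times \mathcal{N}), \kappa)$, where $\kappa$ is counting measure on $\mathcal{P}(\mathcal{N} \times \mathcal{N})$.

c) If $(\Omega_1, \mathcal{A}_1, P_1)$ and $(\Omega_2, \mathcal{A}_2, P_2)$ are two probability spaces, then so is $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \times \mathcal{A}_2, P_1 \times P_2)$.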
As we will discover in Chapter 5, the product probability space is the appropriate mathematical model for the juxtaposition of two independent experiments. □

Sections of Sets and Functions in Product Spaces

We learned in calculus that a double (Riemann) integral can be evaluated as two iterated single integrals. Our next task is to prove a generalization
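of that result to product measure spaces; roughly speaking, a theorem of the following form: If $f\colon \Gamma \times \Lambda \to \mathcal{R}$ is $\mathcal{S} \times \mathcal{T}$-measurable, then
$$\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu) = \int_\Gamma \left[ \int_\Lambda f(x, y)\,d\nu(y) \right] d\mu(x) = \int_\Lambda \left[ \int_\Gamma f(x, y)\,d\mu(x) \right] d\nu(y). \tag{4.39}$$
In establishing (4.39), we must first show that it makes sense. For instance, we need to verify that if $f$ is an $\mathcal{S} \times \mathcal{T}$-measurable function on $\Gamma \times \Lambda$, then the function, $f_{[x]}$, defined on $\Lambda$ by $f_{[x]}(y) = f(x, y)$ is $\mathcal{T}$-measurable; that the function, $g$, defined on $\Gamma$ by $g(x) = \int_\Lambda f(x, y)\,d\nu(y)$ is $\mathcal{S}$-measurable; and so forth.

To begin, we define the sections of a set.

DEFINITION 4.25 Sections of a Set in a Product Space
Suppose $A \subset \Gamma \times \Lambda$. Then the $\Gamma$-sections of $A$ and the $\Lambda$-sections of $A$ are defined, respectively, by
$$A_x = \{ y \in \Lambda : (x, y) \in A \}, \quad x \in \Gamma;$$
and
$$A^y = \{ x \in \Gamma : (x, y) \in A \}, \quad y \in \Lambda.$$
Note that each $\Gamma$-section is a subset of $\Lambda$ and that each $\Lambda$-section is a subset of $\Gamma$.

Figure 4.1 provides a visual representation of a $\Gamma$-section.

FIGURE 4.1 A $\Gamma$-section.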
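For a subset $A$ of a finite product $\Gamma \times \Lambda$, the sections of Definition 4.25 can be computed directly by filtering pairs. The sketch below uses function names of our own choosing. It returns $A_x$ and $A^y$ and also checks that, for a rectangle $S \times T$, the $\Gamma$-section is $T$ when $x \in S$ and empty otherwise.

```python
def gamma_section(A, x):
    """Gamma-section A_x = {y : (x, y) in A}; a subset of Lambda."""
    return {y for (u, y) in A if u == x}


def lambda_section(A, y):
    """Lambda-section A^y = {x : (x, y) in A}; a subset of Gamma."""
    return {x for (x, v) in A if v == y}


if __name__ == "__main__":
    A = {(1, "p"), (1, "q"), (2, "q")}
    print(sorted(gamma_section(A, 1)))     # ['p', 'q']
    print(sorted(lambda_section(A, "q")))  # [1, 2]
    print(gamma_section(A, 3))             # set()

    # For a rectangle S x T the Gamma-section is T when x is in S, else empty.
    S, T = {1, 2}, {"p"}
    rect = {(s, t) for s in S for t in T}
    print(gamma_section(rect, 1) == T, gamma_section(rect, 3) == set())  # True True
```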
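EXAMPLE 4.21 Illustrates Definition 4.25

a) Let $A = S \times T$, where $S \subset \Gamma$ and $T \subset \Lambda$. Then
$$A_x = \begin{cases} T, & \text{if } x \in S; \\ \emptyset, & \text{if } x \notin S; \end{cases} \qquad \text{and} \qquad A^y = \begin{cases} S, & \text{if } y \in T; \\ \emptyset, & \text{if } y \notin T. \end{cases}$$

b) Let $\Gamma = \Lambda = \mathcal{R}$ and $A = \{ (x, y) : x^2 + 4y^2 \le 4 \}$. Then
$$A_x = \left[ -\tfrac{1}{2}(4 - x^2)^{1/2},\ \tfrac{1}{2}(4 - x^2)^{1/2} \right]$$
for $|x| \le 2$, and $A_x = \emptyset$, otherwise. Similarly, $A^y = \left[ -2(1 - y^2)^{1/2},\ 2(1 - y^2)^{1/2} \right]$ for $|y| \le 1$, and $A^y = \emptyset$, otherwise. □

We next prove that sections of $\mathcal{S} \times \mathcal{T}$-measurable sets are themselves measurable. More precisely, we have the following proposition.

PROPOSITION 4.21 Suppose that $(\Gamma, \mathcal{S})$ and $(\Lambda, \mathcal{T})$ are measurable spaces and that $A \in \mathcal{S} \times \mathcal{T}$. Then
a) $A_x \in \mathcal{T}$ for all $x \in \Gamma$.
b) $A^y \in \mathcal{S}$ for all $y \in \Lambda$.

PROOF: We prove only (a). The proof of (b) is similar and is left as an exercise for the reader. Set
$$\mathcal{D} = \{ A \in \mathcal{S} \times \mathcal{T} : A_x \in \mathcal{T} \text{ for all } x \in \Gamma \}.$$
It follows immediately from Example 4.21(a) that $\mathcal{D}$ contains all measurable rectangles; that is, $\mathcal{D} \supset \mathcal{U}$. Because $\mathcal{S} \times \mathcal{T}$ is, by definition, the smallest $\sigma$-algebra containing $\mathcal{U}$, the proof will be complete once we show that $\mathcal{D}$ is a $\sigma$-algebra; because that will imply $\mathcal{D} = \mathcal{S} \times \mathcal{T}$ (why?).

So, assume that $A \in \mathcal{D}$. Then $A \in \mathcal{S} \times \mathcal{T}$ and $A_x \in \mathcal{T}$ for all $x \in \Gamma$. Therefore, $A^c \in \mathcal{S} \times \mathcal{T}$ and $(A_x)^c \in \mathcal{T}$ for all $x \in \Gamma$. But, $(A_x)^c = (A^c)_x$ and, hence, $A^c \in \mathcal{D}$. Now assume that $\{A_n\}_n \subset \mathcal{D}$. Then $\{A_n\}_n \subset \mathcal{S} \times \mathcal{T}$ and $\{(A_n)_x\}_n \subset \mathcal{T}$ for all $x \in \Gamma$. Thus, $\bigcup_n A_n \in \mathcal{S} \times \mathcal{T}$ and $\bigcup_n (A_n)_x \in \mathcal{T}$ for all $x \in \Gamma$. However, $\bigcup_n (A_n)_x = \big(\bigcup_n A_n\big)_x$ and, so, $\bigcup_n A_n \in \mathcal{D}$. Consequently, $\mathcal{D}$ is a $\sigma$-algebra.

Having discussed sections of subsets of $\Gamma \times \Lambda$, we now move on to the consideration of sections of functions on $\Gamma \times \Lambda$. The sections of such functions are obtained by holding one of the two variables fixed.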
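The proof of Proposition 4.21 rests on two identities:

- Sectioning commutes with complements: $(A^c)_x = (A_x)^c$, where the complement on the right is taken in $\Lambda$.
- Sectioning commutes with unions: $(\bigcup_n A_n)_x = \bigcup_n (A_n)_x$.

The Python sketch below checks the complement identity for every subset of a small product space. It checks the union identity for every pair of subsets, which is the finite analogue of the countable statement. The $2 \times 2$ space is an arbitrary choice and the helper names are hypothetical. This is a sanity check, not a proof.

```python
from itertools import chain, combinations, product


def all_subsets(X):
    """All subsets of the finite set X, as ordinary sets."""
    items = list(X)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def section(A, x):
    """Gamma-section A_x = {y : (x, y) in A}."""
    return {y for (u, y) in A if u == x}


def check_section_identities(Gamma, Lam) -> bool:
    """Check (A^c)_x = (A_x)^c and (A U B)_x = A_x U B_x on Gamma x Lam."""
    full = set(product(Gamma, Lam))
    subs = all_subsets(full)
    for A in subs:
        for x in Gamma:
            # Complement taken in Gamma x Lam on the left, in Lam on the right.
            if section(full - A, x) != set(Lam) - section(A, x):
                return False
    for A, B in product(subs, repeat=2):
        for x in Gamma:
            if section(A | B, x) != section(A, x) | section(B, x):
                return False
    return True


if __name__ == "__main__":
    print(check_section_identities({0, 1}, {0, 1}))  # True
```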
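DEFINITION 4.26 Sections of a Function on a Product Space
Suppose that $f$ is a function on $\Gamma \times \Lambda$. Then the $\Gamma$-sections of $f$ and the $\Lambda$-sections of $f$ are defined, respectively, by
$$f_{[x]}(y) = f(x, y), \quad x \in \Gamma;$$
and
$$f^{[y]}(x) = f(x, y), \quad y \in \Lambda.$$
Note that each $\Gamma$-section of $f$ is a function on $\Lambda$ and that each $\Lambda$-section of $f$ is a function on $\Gamma$.

EXAMPLE 4.22 Illustrates Definition 4.26
Let $\Gamma = \mathcal{R}$ and $\Lambda = \mathcal{N}$. Define $f\colon \mathcal{R} \times \mathcal{N} \to \mathcal{R}$ by
$$f(x, y) = x^y + \frac{x^2}{y}.$$
Then $f_{[1/2]}\colon \mathcal{N} \to \mathcal{R}$ is given by $f_{[1/2]}(y) = 1/2^y + 1/(4y)$ and $f^{[2]}\colon \mathcal{R} \to \mathcal{R}$ is given by $f^{[2]}(x) = x^2 + x^2/2 = 3x^2/2$. □

Proposition 4.22, which we prove next, shows that sections of $\mathcal{S} \times \mathcal{T}$-measurable functions are themselves measurable functions.

PROPOSITION 4.22 Let $(\Gamma, \mathcal{S})$ and $(\Lambda, \mathcal{T})$ be measurable spaces. Suppose that $f$ is an extended real-valued or complex-valued $\mathcal{S} \times \mathcal{T}$-measurable function on $\Gamma \times \Lambda$. Then
a) $f_{[x]}$ is $\mathcal{T}$-measurable for all $x \in \Gamma$.
b) $f^{[y]}$ is $\mathcal{S}$-measurable for all $y \in \Lambda$.

PROOF: To prove part (a), let $x \in \Gamma$. We will employ the bootstrapping technique to show that $f_{[x]}$ is $\mathcal{T}$-measurable. So, assume first that $f = \chi_A$, where $A \in \mathcal{S} \times \mathcal{T}$. Then
$$f_{[x]}(y) = f(x, y) = \begin{cases} 1, & (x, y) \in A; \\ 0, & (x, y) \notin A \end{cases} \; = \; \begin{cases} 1, & y \in A_x; \\ 0, & y \notin A_x \end{cases} \; = \; \chi_{A_x}(y).$$
Since $A \in \mathcal{S} \times \mathcal{T}$, we know by Proposition 4.21(a) that $A_x \in \mathcal{T}$ and, hence, that $\chi_{A_x}$ is $\mathcal{T}$-measurable. Next assume that $f$ is a nonnegative simple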
function, say $f = \sum_{k=1}^n a_k \chi_{A_k}$, where $A_k \in \mathcal{S} \times \mathcal{T}$ for $1 \le k \le n$. Then $f_{[x]} = \sum_{k=1}^n a_k \chi_{(A_k)_x}$, which is $\mathcal{T}$-measurable, being a linear combination of $\mathcal{T}$-measurable functions.

Now assume $f$ is a nonnegative extended real-valued $\mathcal{S} \times \mathcal{T}$-measurable function. Then, by Proposition 4.7(a) on page 186, there is a sequence, $\{s_n\}_{n=1}^\infty$, of nonnegative $\mathcal{S} \times \mathcal{T}$-measurable simple functions that converges pointwise to $f$ on $\Gamma \times \Lambda$. From the previous paragraph, we know that $\{(s_n)_{[x]}\}_{n=1}^\infty$ is a sequence of $\mathcal{T}$-measurable functions on $\Lambda$. Moreover, since $s_n \to f$ pointwise on $\Gamma \times \Lambda$, it is clear that $(s_n)_{[x]} \to f_{[x]}$ pointwise on $\Lambda$. Therefore, by Proposition 4.7(b), $f_{[x]}$ is $\mathcal{T}$-measurable.

Next assume that $f$ is an extended real-valued $\mathcal{S} \times \mathcal{T}$-measurable function. We write $f = f^+ - f^-$ and note that $f_{[x]} = (f^+)_{[x]} - (f^-)_{[x]}$. Using the result of the previous paragraph and the fact that the difference of two $\mathcal{T}$-measurable functions is $\mathcal{T}$-measurable, we conclude that $f_{[x]}$ is $\mathcal{T}$-measurable.

Finally, assume that $f$ is a complex-valued $\mathcal{S} \times \mathcal{T}$-measurable function. We write $f = \Re f + i\,\Im f$ and note that $f_{[x]} = (\Re f)_{[x]} + i(\Im f)_{[x]}$. Then we apply the result of the previous paragraph and the fact that a linear combination of $\mathcal{T}$-measurable functions is $\mathcal{T}$-measurable to conclude that $f_{[x]}$ is $\mathcal{T}$-measurable. This completes the proof of part (a). The proof of part (b) is similar and is left as an exercise.

In final preparation for our theorems on iterated integrals in product spaces, which we will consider in the next section, we prove the following two lemmas. Note the $\sigma$-finiteness assumptions in each lemma.

LEMMA 4.8 Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are two $\sigma$-finite measure spaces. Then, for each $A \in \mathcal{S} \times \mathcal{T}$,
a) the function, $g$, defined on $\Gamma$ by $g(x) = \nu(A_x)$ is $\mathcal{S}$-measurable.
b) the function, $h$, defined on $\Lambda$ by $h(y) = \mu(A^y)$ is $\mathcal{T}$-measurable.

PROOF: Let
$$\mathcal{D} = \{ A \in \mathcal{S} \times \mathcal{T} : \text{(a) and (b) hold} \}.$$
We will show that $\mathcal{D} \supset \mathcal{A}_0(\mathcal{U})$ and that $\mathcal{D}$ is closed under monotone limits.
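It will then follow from the monotone class theorem, Theorem 1.1 on page 30, that $\mathcal{D} \supset \mathcal{S} \times \mathcal{T}$. Since, by definition, $\mathcal{D} \subset \mathcal{S} \times \mathcal{T}$, we will have $\mathcal{D} = \mathcal{S} \times \mathcal{T}$, as required.

We first establish that $\mathcal{D} \supset \mathcal{U}$. So, let $S \times T$ be a measurable rectangle. Then $(S \times T)_x = T$ if $x \in S$, and is empty otherwise. Consequently, we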
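have $g(x) = \nu((S \times T)_x) = \nu(T)\chi_S(x)$. Since $S \in \mathcal{S}$, $g$ is $\mathcal{S}$-measurable. Similarly, we find that $h(y) = \mu((S \times T)^y) = \mu(S)\chi_T(y)$ is $\mathcal{T}$-measurable. So, $\mathcal{D} \supset \mathcal{U}$.

Next we show that $\mathcal{D} \supset \mathcal{A}_0(\mathcal{U})$. Let $A \in \mathcal{A}_0(\mathcal{U})$. Then, by Lemma 4.2 on page 215, $A$ is a finite disjoint union of members of $\mathcal{U}$, say $A = \bigcup_{k=1}^n A_k$. Now, $A_x = \bigcup_{k=1}^n (A_k)_x$ and, because the $A_k$s are pairwise disjoint, so are the $(A_k)_x$s for each fixed $x \in \Gamma$. Consequently, $\nu(A_x) = \sum_{k=1}^n \nu((A_k)_x)$. Because $A_k \in \mathcal{U}$, the previous paragraph implies that $g_k(x) = \nu((A_k)_x)$ is $\mathcal{S}$-measurable. Hence, $g(x) = \nu(A_x) = \sum_{k=1}^n g_k(x)$ is $\mathcal{S}$-measurable, being a sum of $\mathcal{S}$-measurable functions. Similarly, the function $h(y) = \mu(A^y)$ is $\mathcal{T}$-measurable. Thus, $\mathcal{D} \supset \mathcal{A}_0(\mathcal{U})$.

Next we prove that $\mathcal{D}$ is closed under nondecreasing limits; that is, if $\{A_n\}_{n=1}^\infty \subset \mathcal{D}$ and $A_1 \subset A_2 \subset \cdots$, then $\bigcup_{n=1}^\infty A_n \in \mathcal{D}$. We have $(A_1)_x \subset (A_2)_x \subset \cdots$ and, hence, Theorem 4.1(d) on page 170 implies that
$$\nu\Big( \Big( \bigcup_{n=1}^\infty A_n \Big)_x \Big) = \nu\Big( \bigcup_{n=1}^\infty (A_n)_x \Big) = \lim_{n \to \infty} \nu((A_n)_x). \tag{4.40}$$
Since $\{A_n\}_{n=1}^\infty \subset \mathcal{D}$, the function $g_n(x) = \nu((A_n)_x)$ is $\mathcal{S}$-measurable for each $n \in \mathcal{N}$ and, by (4.40), $g_n(x) \to \nu\big( (\bigcup_{n=1}^\infty A_n)_x \big)$ pointwise on $\Gamma$. So, by Theorem 4.5(d) on page 180, $g(x) = \nu\big( (\bigcup_{n=1}^\infty A_n)_x \big)$ is $\mathcal{S}$-measurable. A similar argument shows that the function $h(y) = \mu\big( (\bigcup_{n=1}^\infty A_n)^y \big)$ is $\mathcal{T}$-measurable. Hence, $\bigcup_{n=1}^\infty A_n \in \mathcal{D}$.

Finally, we must verify that $\mathcal{D}$ is closed under nonincreasing limits; that is, if $\{A_n\}_{n=1}^\infty \subset \mathcal{D}$ and $A_1 \supset A_2 \supset \cdots$, then $\bigcap_{n=1}^\infty A_n \in \mathcal{D}$. Suppose first that there exist $S \in \mathcal{S}$ and $T \in \mathcal{T}$ with $\mu(S) < \infty$ and $\nu(T) < \infty$ such that $A_1 \subset S \times T$. Then $(A_1)_x \subset T$ and $(A_1)^y \subset S$ and, consequently, $\nu((A_1)_x)$ and $\mu((A_1)^y)$ are both finite. Applying Theorem 4.1(c) and an argument similar to the one used in the preceding paragraph, we find that $\bigcap_{n=1}^\infty A_n \in \mathcal{D}$.

To handle the general case, that is, with no restriction on $A_1$, we must invoke the $\sigma$-finiteness assumption. We can select nondecreasing sequences, $\{S_k\}_k \subset \mathcal{S}$ and $\{T_k\}_k \subset \mathcal{T}$, such that $\Gamma = \bigcup_k S_k$ and $\Lambda = \bigcup_k T_k$, and, for all $k$, $\mu(S_k) < \infty$ and $\nu(T_k) < \infty$.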
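Define
$$\mathcal{E} = \{ E \in \mathcal{S} \times \mathcal{T} : E \cap (S_k \times T_k) \in \mathcal{D} \text{ for all } k \}. \tag{4.41}$$
We leave it as an exercise for the reader to prove that $\mathcal{E} = \mathcal{S} \times \mathcal{T}$. (See Exercise 4.144.) Again, let $\{A_n\}_{n=1}^\infty$ be a nonincreasing sequence of members of $\mathcal{D}$, but this time with no restriction on $A_1$. For convenience, set $A = \bigcap_{n=1}^\infty A_n$.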
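Then $A \in \mathcal{E}$ ($= \mathcal{S} \times \mathcal{T}$) and, thus, $A \cap (S_k \times T_k) \in \mathcal{D}$ for all $k$. The sequence $\{S_k \times T_k\}_k$ is nondecreasing because $\{S_k\}_k$ and $\{T_k\}_k$ are nondecreasing. This, in turn, implies that the sequence $\{A \cap (S_k \times T_k)\}_k$ is nondecreasing. Since we have already shown that $\mathcal{D}$ is closed under nondecreasing limits, we can conclude that
$$\bigcup_k \big( A \cap (S_k \times T_k) \big) \in \mathcal{D}.$$
But $\bigcup_k (S_k \times T_k) = \Gamma \times \Lambda$ (why?) and, consequently,
$$A = A \cap (\Gamma \times \Lambda) = A \cap \Big( \bigcup_k (S_k \times T_k) \Big) = \bigcup_k \big( A \cap (S_k \times T_k) \big).$$
This proves that $A \in \mathcal{D}$.

We have now established that $\mathcal{D} \supset \mathcal{A}_0(\mathcal{U})$ and that $\mathcal{D}$ is closed under monotone limits. Therefore, by the monotone class theorem, $\mathcal{D}$ contains the $\sigma$-algebra generated by $\mathcal{A}_0(\mathcal{U})$, which is $\mathcal{S} \times \mathcal{T}$. Since, by definition, $\mathcal{D} \subset \mathcal{S} \times \mathcal{T}$, we deduce that $\mathcal{D} = \mathcal{S} \times \mathcal{T}$, as required.

LEMMA 4.9 Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are two $\sigma$-finite measure spaces. Then, for each $A \in \mathcal{S} \times \mathcal{T}$,
a) $(\mu \times \nu)(A) = \int_\Gamma \nu(A_x)\,d\mu(x)$.
b) $(\mu \times \nu)(A) = \int_\Lambda \mu(A^y)\,d\nu(y)$.

PROOF: We will prove part (a). The proof of part (b) is similar and is left as an exercise for the reader. For $A \in \mathcal{S} \times \mathcal{T}$, define
$$\tau(A) = \int_\Gamma \nu(A_x)\,d\mu(x).$$
In view of Lemma 4.8, the integral exists because the function $g$ defined on $\Gamma$ by $g(x) = \nu(A_x)$ is a (nonnegative) $\mathcal{S}$-measurable function. We will show that $\tau$ is a measure on $\mathcal{S} \times \mathcal{T}$ and that $\tau = \mu \times \nu$ on $\mathcal{U}$. This will imply, by the uniqueness portion of Theorem 4.15 on page 235, that $\tau = \mu \times \nu$ on $\mathcal{S} \times \mathcal{T}$, as required.

Clearly, $\tau(A) \ge 0$ for all $A \in \mathcal{S} \times \mathcal{T}$, and $\tau(\emptyset) = 0$. Assume that $\{A_n\}_n$ is a sequence of pairwise disjoint members of $\mathcal{S} \times \mathcal{T}$. Then $\{(A_n)_x\}_n$ is a sequence of pairwise disjoint members of $\mathcal{T}$. Consequently,
$$\tau\Big( \bigcup_n A_n \Big) = \int_\Gamma \nu\Big( \bigcup_n (A_n)_x \Big)\,d\mu(x) = \int_\Gamma \sum_n \nu((A_n)_x)\,d\mu(x) = \sum_n \int_\Gamma \nu((A_n)_x)\,d\mu(x) = \sum_n \tau(A_n).$$
Hence, $\tau$ is a measure on $\mathcal{S} \times \mathcal{T}$.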
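Now suppose that $S \times T$ is a measurable rectangle. Then
$$\tau(S \times T) = \int_\Gamma \nu((S \times T)_x)\,d\mu(x) = \int_\Gamma \nu(T)\chi_S(x)\,d\mu(x) = \mu(S)\nu(T) = (\mu \times \nu)(S \times T).$$
This shows that $\tau$ agrees with $\mu \times \nu$ on $\mathcal{U}$.

EXERCISES 4.8

4.133 Verify (4.38) on page 233. Hint: Show that if $\nu(T_k)\chi_{S_k}(x) > 0$ for some $k$, then $x \in S$.

4.134 Let $\mu$ be counting measure on $\mathcal{P}(\mathcal{R})$. Show that $(\mathcal{R}, \mathcal{P}(\mathcal{R}), \mu)$ is not a $\sigma$-finite measure space.

4.135 Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces. Prove that the product measure space, $(\Gamma \times \Lambda, \mathcal{S} \times \mathcal{T}, \mu \times \nu)$, is $\sigma$-finite.

4.136 Show that $\mathcal{M} \times \mathcal{M}$ contains all open and closed subsets of $\mathcal{R}^2$; that is, each open or closed subset of $\mathcal{R}^2$ is $\mathcal{M} \times \mathcal{M}$-measurable.

★4.137 Let $\gamma$ be counting measure on $\mathcal{P}(\mathcal{N})$.
a) Show that the product $\sigma$-algebra of $\mathcal{P}(\mathcal{N})$ with $\mathcal{P}(\mathcal{N})$ consists of all subsets of $\mathcal{N} \times \mathcal{N}$; that is, $\mathcal{P}(\mathcal{N}) \times \mathcal{P}(\mathcal{N}) = \mathcal{P}(\mathcal{N} \times \mathcal{N})$. Hint: $\mathcal{N} \times \mathcal{N}$ is countable.
b) Show that the product measure of $\gamma$ with $\gamma$ is counting measure on $\mathcal{P}(\mathcal{N}) \times \mathcal{P}(\mathcal{N})$ ($= \mathcal{P}(\mathcal{N} \times \mathcal{N})$).

4.138 Suppose that $\Gamma = \{x_1, x_2, \ldots, x_m\}$ and $\Lambda = \{y_1, y_2, \ldots, y_n\}$ are finite sets and that $\{a_1, a_2, \ldots, a_m\}$ and $\{b_1, b_2, \ldots, b_n\}$ are two sets of nonnegative numbers. Define $\mu$ on $\mathcal{P}(\Gamma)$ and $\nu$ on $\mathcal{P}(\Lambda)$ by $\mu(A) = \sum_{x_j \in A} a_j$ and $\nu(B) = \sum_{y_k \in B} b_k$. Determine explicitly
a) the product $\sigma$-algebra, $\mathcal{P}(\Gamma) \times \mathcal{P}(\Lambda)$.
b) the product measure, $\mu \times \nu$.

4.139 Suppose that $g$ is a complex-valued $\mathcal{S}$-measurable function on $\Gamma$ and that $h$ is a complex-valued $\mathcal{T}$-measurable function on $\Lambda$. Define $f$ on $\Gamma \times \Lambda$ by $f(x, y) = g(x)h(y)$. Show that $f$ is $\mathcal{S} \times \mathcal{T}$-measurable.

4.140 Let $\Gamma = \Lambda = \mathcal{R}$ and $A = \{ (x, y) : 0 < y < x^2 \text{ and } x > 0 \}$. Determine $A_x$ and $A^y$.

4.141 Prove part (b) of Proposition 4.21 on page 238.

4.142 Prove part (b) of Proposition 4.22 on page 239.

4.143 True or False:
a) If $A, B \subset \Gamma \times \Lambda$ are disjoint and $x \in \Gamma$, then $A_x$ and $B_x$ are disjoint.
b) If $S_1 \times T_1$ and $S_2 \times T_2$ are disjoint rectangles in $\Gamma \times \Lambda$, then $S_1$ and $S_2$ are disjoint.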
c) If $S_1 \times T_1$ and $S_2 \times T_2$ are disjoint rectangles in $\Gamma \times \Lambda$, then either $S_1$ and $S_2$ are disjoint or $T_1$ and $T_2$ are disjoint.
4.144 Let $\mathcal{E}$ be defined as in (4.41) on page 241. Prove that $\mathcal{E} = \mathcal{S} \times \mathcal{T}$ by employing the following steps.
a) Show that $\mathcal{E} \supset \mathcal{A}_0(\mathcal{U})$. Hint: $\mathcal{A}_0(\mathcal{U})$ is an algebra and $\mathcal{A}_0(\mathcal{U}) \subset \mathcal{D}$.
b) Show that $\mathcal{E}$ is closed under nondecreasing limits.
c) Show that $\mathcal{E}$ is closed under nonincreasing limits.
d) Conclude that $\mathcal{E} = \mathcal{S} \times \mathcal{T}$ by employing the monotone class theorem.

4.145 This exercise shows that $\sigma$-finiteness cannot be omitted as a hypothesis in Lemma 4.8. Let $S$ be the set defined in Lemma 3.12 on page 116. By Exercise 3.51, $S \notin \mathcal{M}$. Also, let $(\Gamma, \mathcal{S}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $(\Lambda, \mathcal{T}, \nu) = (\mathcal{R}, \mathcal{P}(\mathcal{R}), \nu)$, where $\nu(T) = \gamma(T \cap S)$ and $\gamma$ is counting measure.
a) Let $\mathcal{Q}_0 = \mathcal{Q} \setminus \{0\}$ and define $A = \{ (x, x + r) : x \in \mathcal{R},\ r \in \mathcal{Q}_0 \}$. Show that $A \in \mathcal{M} \times \mathcal{P}(\mathcal{R})$. Hint: First show that the function $f(x, y) = y - x$ is $\mathcal{M} \times \mathcal{P}(\mathcal{R})$-measurable.
b) Show that $\nu(A_x) = \chi_{S^c}(x)$ and conclude that $x \mapsto \nu(A_x)$ is not $\mathcal{M}$-measurable.
c) Why doesn't the result in part (b) contradict Lemma 4.8?

4.146 Prove part (b) of Lemma 4.9 on page 242.

Exercises 4.147-4.151 should be completed by all readers who plan to cover the probability material in Chapter 5.

★4.147 Denote by $\mathcal{B}_2$ the smallest $\sigma$-algebra of subsets of $\mathcal{R}^2$ that contains all open sets of $\mathcal{R}^2$. Members of $\mathcal{B}_2$ are called two-dimensional Borel sets.
a) Show that $\mathcal{B}_2 = \mathcal{B} \times \mathcal{B}$.
b) A measure on $\mathcal{B}_2$ is called a two-dimensional Borel measure. Suppose that $\mu$ and $\nu$ are finite two-dimensional Borel measures such that $\mu(A \times B) = \nu(A \times B)$ for all $A, B \in \mathcal{B}$. Prove that $\mu = \nu$.

4.148 Let $\mathcal{I}_2$ denote the collection of all two-dimensional intervals in $\mathcal{R}^2$; that is, all sets of the form $I \times J$ where $I, J \in \mathcal{I}$.
a) Show that the $\sigma$-algebra generated by $\mathcal{I}_2$ is $\mathcal{B}_2$; that is, $\sigma(\mathcal{I}_2) = \mathcal{B}_2$. Hint: Use Exercise 4.147(a).
b) Let $\mu$ and $\nu$ be two-dimensional Borel measures such that $\mu(K) < \infty$ for all bounded two-dimensional intervals $K$ and $\mu(K) = \nu(K)$ for all $K \in \mathcal{I}_2$. Prove that $\mu = \nu$.
4.149 Let $\mathcal{J}$ denote the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ and $(c, \infty)$, where $-\infty < a < b < \infty$ and $-\infty < c < \infty$. Also, let $\mathcal{J}_2$ denote the collection of all subsets of $\mathcal{R}^2$ of the form $I \times J$ where $I, J \in \mathcal{J}$. Prove that $\mathcal{J}$ is a semialgebra and that the $\sigma$-algebra generated by $\mathcal{J}_2$ is $\mathcal{B}_2$.

4.150 Suppose that $\mu$ and $\nu$ are finite two-dimensional Borel measures such that $\mu((-\infty, x] \times (-\infty, y]) = \nu((-\infty, x] \times (-\infty, y])$ for all $x, y \in \mathcal{R}$. Prove that $\mu = \nu$. Hint: It suffices to prove that $\mu = \nu$ on $\mathcal{J}_2$.

4.151 Let $\mu$ and $\nu$ be finite Borel measures and $\tau$ a two-dimensional Borel measure. Suppose that $\tau((-\infty, x] \times (-\infty, y]) = \mu((-\infty, x])\,\nu((-\infty, y])$ for all $x, y \in \mathcal{R}$. Prove that $\tau = \mu \times \nu$.
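Exercises 4.149 to 4.151 concern two-dimensional Borel measures determined by their values on quadrants $(-\infty, x] \times (-\infty, y]$. A standard fact is useful here. Let $\tau$ be a finite measure on $\mathcal{B}_2$ and set $G(x, y) = \tau((-\infty, x] \times (-\infty, y])$. Then, for $a_1 < b_1$ and $a_2 < b_2$,
$$\tau((a_1, b_1] \times (a_2, b_2]) = G(b_1, b_2) - G(a_1, b_2) - G(b_1, a_2) + G(a_1, a_2).$$
The sketch below assumes the product form $G(x, y) = F_1(x)F_2(y)$ of Exercise 4.151. It takes $F_1$ from Example 4.16(a) and $F_2$ from Exercise 4.129. Using exact rational arithmetic, it checks that the formula gives $\mu((a_1, b_1])\,\nu((a_2, b_2])$. The names are illustrative only.

```python
from fractions import Fraction as Fr


def F1(x: Fr) -> Fr:
    """Distribution function of Example 4.16(a)."""
    if x < 0:
        return Fr(0)
    if x < 1:
        return x
    return Fr(1)


def F2(y: Fr) -> Fr:
    """Distribution function of Exercise 4.129."""
    if y < -2:
        return Fr(0)
    if y < 2:
        return (y + 2) / 4
    return Fr(1)


def G(x: Fr, y: Fr) -> Fr:
    """Assumed quadrant function tau((-inf, x] x (-inf, y]) = F1(x) * F2(y)."""
    return F1(x) * F2(y)


def rectangle_mass(a1: Fr, b1: Fr, a2: Fr, b2: Fr) -> Fr:
    """tau((a1, b1] x (a2, b2]) by inclusion-exclusion on quadrants."""
    return G(b1, b2) - G(a1, b2) - G(b1, a2) + G(a1, a2)


if __name__ == "__main__":
    a1, b1, a2, b2 = Fr(1, 4), Fr(3, 4), Fr(-1), Fr(1)
    lhs = rectangle_mass(a1, b1, a2, b2)
    rhs = (F1(b1) - F1(a1)) * (F2(b2) - F2(a2))
    print(lhs, rhs)  # 1/4 1/4
```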
4.9 ITERATION OF INTEGRALS IN PRODUCT MEASURE SPACES

In Section 4.8 we discussed product measure and product measure spaces. Now we will learn how to evaluate integrals on product measure spaces by iteration; that is, by the evaluation of two integrals on the factor spaces. We will present several theorems of this type. The first theorem is known as Tonelli's theorem.†

† Some authors attribute this theorem to G. Fubini.

THEOREM 4.16 Tonelli's Theorem
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces. Let $f$ be a nonnegative extended real-valued $\mathcal{S} \times \mathcal{T}$-measurable function on $\Gamma \times \Lambda$. Then
a) $f_{[x]}$ is $\mathcal{T}$-measurable for all $x \in \Gamma$.
b) $f^{[y]}$ is $\mathcal{S}$-measurable for all $y \in \Lambda$.
c) $g(x) = \int_\Lambda f(x, y)\,d\nu(y)$ is $\mathcal{S}$-measurable.
d) $h(y) = \int_\Gamma f(x, y)\,d\mu(x)$ is $\mathcal{T}$-measurable.
e) the equalities
$$\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu) = \int_\Gamma \left[ \int_\Lambda f\,d\nu \right] d\mu = \int_\Lambda \left[ \int_\Gamma f\,d\mu \right] d\nu$$
hold.

PROOF: Parts (a) and (b) are the contents of Proposition 4.22 on page 239. It remains to verify parts (c)-(e).

To begin, we will show that if $f_1, f_2, \ldots, f_n$ are nonnegative $\mathcal{S} \times \mathcal{T}$-measurable functions on $\Gamma \times \Lambda$ that satisfy (c)-(e) and $c_1, c_2, \ldots, c_n$ are nonnegative real numbers, then $\sum_{k=1}^n c_k f_k$ satisfies (c)-(e). It suffices to verify this for $n = 2$. Let $f = c_1 f_1 + c_2 f_2$. Then, by the linearity of the Lebesgue integral, we have
$$\int_\Lambda f(x, y)\,d\nu(y) = c_1 \int_\Lambda f_1(x, y)\,d\nu(y) + c_2 \int_\Lambda f_2(x, y)\,d\nu(y).$$
Since a linear combination of measurable functions is measurable and, by assumption, $f_1$ and $f_2$ satisfy (c), we conclude that $g(x) = \int_\Lambda f(x, y)\,d\nu(y)$
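is $\mathcal{S}$-measurable. Similarly, $h(y) = \int_\Gamma f(x, y)\,d\mu(x)$ is $\mathcal{T}$-measurable. Now using the linearity of the Lebesgue integral and the assumption that $f_1$ and $f_2$ satisfy (e), we get
$$\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu) = c_1 \int_{\Gamma \times \Lambda} f_1\,d(\mu \times \nu) + c_2 \int_{\Gamma \times \Lambda} f_2\,d(\mu \times \nu) = c_1 \int_\Gamma \left[ \int_\Lambda f_1\,d\nu \right] d\mu + c_2 \int_\Gamma \left[ \int_\Lambda f_2\,d\nu \right] d\mu = \int_\Gamma \left[ \int_\Lambda f\,d\nu \right] d\mu.$$
Hence, the first equation in (e) holds for $f$. A similar argument shows that the second equation in (e) holds for $f$.

We will now bootstrap to prove the theorem. If $f = \chi_A$, where $A \in \mathcal{S} \times \mathcal{T}$, then $g(x) = \nu(A_x)$ and $h(y) = \mu(A^y)$. Therefore, by Lemma 4.8 on page 240, (c) and (d) hold; and, by Lemma 4.9 on page 242, (e) holds. Hence, (c)-(e) are satisfied if $f$ is the characteristic function of a set in $\mathcal{S} \times \mathcal{T}$. It now follows immediately from the previous paragraph that (c)-(e) hold for nonnegative simple functions.

If $f$ is a nonnegative extended real-valued $\mathcal{S} \times \mathcal{T}$-measurable function, then, by Proposition 4.7(a) on page 186, we can choose a sequence, $\{s_n\}_{n=1}^\infty$, of nonnegative simple functions such that $s_n \uparrow f$ pointwise on $\Gamma \times \Lambda$. It follows that $(s_n)_{[x]} \uparrow f_{[x]}$ pointwise on $\Lambda$ and, so, by the MCT,
$$g(x) = \int_\Lambda f(x, y)\,d\nu(y) = \lim_{n \to \infty} \int_\Lambda s_n(x, y)\,d\nu(y).$$
This shows that $g$ is the pointwise limit of the $\mathcal{S}$-measurable functions $\int_\Lambda s_n(x, y)\,d\nu(y)$, $n \in \mathcal{N}$. Hence, $g$ is $\mathcal{S}$-measurable. Similarly, we find that $h(y) = \int_\Gamma f(x, y)\,d\mu(x)$ is $\mathcal{T}$-measurable. Finally, employing the MCT twice more yields
$$\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu) = \lim_{n \to \infty} \int_{\Gamma \times \Lambda} s_n\,d(\mu \times \nu) = \lim_{n \to \infty} \int_\Gamma \left[ \int_\Lambda s_n\,d\nu \right] d\mu = \int_\Gamma \left[ \int_\Lambda f\,d\nu \right] d\mu.$$
This verifies the first equation in (e) and a similar argument establishes the second equation in (e).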
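On finite sets with counting measure, which are $\sigma$-finite, the product measure is counting measure on $\Gamma \times \Lambda$. In that setting, part (e) of Tonelli's theorem reduces to a simple statement: the entries of a nonnegative array have the same total whether we sum over the product, rows first, or columns first. The Python sketch below computes the three quantities in (e) with exact rational arithmetic. The function $f$ and the index ranges are arbitrary illustrative choices, and the names are ours.

```python
from fractions import Fraction


def tonelli_quantities(f, Gamma, Lam):
    """The three integrals in Tonelli (e) when mu and nu are counting measures.

    On finite sets the product of counting measures is counting measure on
    Gamma x Lam, so each integral is a finite sum.
    """
    over_product = sum(f(x, y) for x in Gamma for y in Lam)
    inner_over_Lam = sum(sum(f(x, y) for y in Lam) for x in Gamma)    # integrate dnu first
    inner_over_Gamma = sum(sum(f(x, y) for x in Gamma) for y in Lam)  # integrate dmu first
    return over_product, inner_over_Lam, inner_over_Gamma


def f(m: int, n: int) -> Fraction:
    """An arbitrary nonnegative function on Gamma x Lam (illustrative choice)."""
    return Fraction(1, m * n + 1)


if __name__ == "__main__":
    values = tonelli_quantities(f, range(1, 6), range(1, 8))
    print(values[0] == values[1] == values[2])  # True
```

The finite case is only a sanity check. The theorem's real content lies in infinite and continuous settings, where nonnegativity and $\sigma$-finiteness are what make the interchange legitimate.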
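Tonelli's theorem deals with iterated integrals for nonnegative measurable functions. In that case, there is no issue of the existence of the integrals occurring in the theorem. Now we will consider the iteration of integrals for complex-valued measurable functions. To ensure the existence of the integrals involved, an integrability condition is imposed.

THEOREM 4.17 Fubini's Theorem
Suppose that $(\Gamma, \mathcal{S}, \mu)$ and $(\Lambda, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces. Let $f$ be a complex-valued $\mathcal{S} \times \mathcal{T}$-measurable function on $\Gamma \times \Lambda$ such that at least one of the quantities
$$\text{(i)}\ \int_{\Gamma \times \Lambda} |f|\,d(\mu \times \nu), \qquad \text{(ii)}\ \int_\Gamma \left[ \int_\Lambda |f|\,d\nu \right] d\mu, \qquad \text{(iii)}\ \int_\Lambda \left[ \int_\Gamma |f|\,d\mu \right] d\nu$$
is finite. Then
a) $f_{[x]} \in \mathcal{L}^1(\nu)$ for $\mu$-almost all $x \in \Gamma$.
b) $f^{[y]} \in \mathcal{L}^1(\mu)$ for $\nu$-almost all $y \in \Lambda$.
c) $g(x) = \int_\Lambda f(x, y)\,d\nu(y)$ is defined $\mu$-ae and is in $\mathcal{L}^1(\mu)$.
d) $h(y) = \int_\Gamma f(x, y)\,d\mu(x)$ is defined $\nu$-ae and is in $\mathcal{L}^1(\nu)$.
e) the equalities
$$\int_{\Gamma \times \Lambda} f\,d(\mu \times \nu) = \int_\Gamma \left[ \int_\Lambda f\,d\nu \right] d\mu = \int_\Lambda \left[ \int_\Gamma f\,d\mu \right] d\nu$$
hold.

PROOF: By Tonelli's theorem, the three integrals (i), (ii), and (iii) are equal. Since, by assumption, at least one of the integrals is finite, they all must be finite. By Proposition 4.22 on page 239, $f_{[x]}$ is $\mathcal{T}$-measurable for all $x \in \Gamma$ and $f^{[y]}$ is $\mathcal{S}$-measurable for all $y \in \Lambda$.

Assume now that $f$ is real-valued and write $f = f^+ - f^-$. It will be convenient to let $f_1 = f^+$ and $f_2 = f^-$. Because $0 \le f_j \le |f|$, it follows from (ii) that $\int_\Gamma \left[ \int_\Lambda f_j\,d\nu \right] d\mu < \infty$, for $j = 1, 2$. Consequently, by Exercise 4.53 on page 191,
$$\int_\Lambda f_j(x, y)\,d\nu(y) < \infty \quad \mu\text{-ae}, \qquad j = 1, 2. \tag{4.42}$$
Let $E = \{ x \in \Gamma : \int_\Lambda f_j(x, y)\,d\nu(y) < \infty \text{ for } j = 1 \text{ and } 2 \}$. Then, for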
$x \in E$, both $(f_1)_{[x]}$ and $(f_2)_{[x]}$ are in $\mathcal{L}^1(\nu)$; hence, so is $f_{[x]}$. Since, by (4.42), $\mu(E^c) = 0$, we see that (a) holds. A similar argument establishes (b).

Next, for $j = 1$ and $2$, we define $g_j(x) = \chi_E(x)\int_\Lambda f_j(x,y)\,d\nu(y)$. Then $g_j$ is real-valued and, by part (c) of Tonelli's theorem, is $\mathcal{S}$-measurable. Moreover, it follows immediately from (ii) that $\int_\Gamma g_j\,d\mu < \infty$. Consequently, $g_j \in \mathcal{L}^1(\mu)$ for $j = 1$ and $2$. But, then, Theorem 4.8 on page 196 implies that $g_1 - g_2 \in \mathcal{L}^1(\mu)$. However, if $x \in E$,

$$g_1(x) - g_2(x) = \int_\Lambda f_1(x,y)\,d\nu(y) - \int_\Lambda f_2(x,y)\,d\nu(y) = \int_\Lambda f(x,y)\,d\nu(y).$$

Therefore, the function $g(x) = \int_\Lambda f(x,y)\,d\nu(y)$ is defined $\mu$-ae and is in $\mathcal{L}^1(\mu)$. This proves that (c) holds, and a similar argument verifies (d).

Employing part (e) of Tonelli's theorem, we deduce that

$$\int_{\Gamma\times\Lambda} f\,d(\mu\times\nu) = \int_{\Gamma\times\Lambda} f_1\,d(\mu\times\nu) - \int_{\Gamma\times\Lambda} f_2\,d(\mu\times\nu) = \int_\Gamma g_1\,d\mu - \int_\Gamma g_2\,d\mu = \int_\Gamma g\,d\mu = \int_\Gamma\left[\int_\Lambda f\,d\nu\right]d\mu.$$

This establishes the first equation in (e), and a similar argument verifies the second equation in (e). We have now shown that Fubini's theorem holds if $f$ is real-valued. The verification for complex-valued $f$ is left to the reader as an exercise.

Example 4.23 provides applications, illustrations, and remarks about the Tonelli and Fubini theorems.

EXAMPLE 4.23 Illustrates the Tonelli and Fubini Theorems
a) The following is a theorem from calculus: Suppose that $f$ is a real-valued function of two variables, defined and continuous on the rectangle
$$R = \{(x,y) : a \le x \le b,\ c \le y \le d\}.$$
Then $f$ is Riemann integrable on $R$ and
$$\iint_R f(x,y)\,dx\,dy = \int_a^b\left[\int_c^d f(x,y)\,dy\right]dx = \int_c^d\left[\int_a^b f(x,y)\,dx\right]dy. \tag{4.43}$$
This result can be proved by employing Fubini's theorem, as outlined in Exercise 4.167.
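Formula (4.43) can be sanity-checked numerically. The Python sketch below is an illustration only: the continuous function $f(x,y) = xy^2 + \sin(x+y)$ on $R = [0,1]\times[0,2]$ is an arbitrary choice, and both iterated integrals are approximated with composite Simpson's rule on deliberately different grids, then compared with the exact value $4/3 + \sin 1 + \sin 2 - \sin 3$. It is a check, not a proof.

```python
import math
from typing import Callable


def simpson(g: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Composite Simpson's rule for the integral of g over [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def f(x: float, y: float) -> float:
    # An arbitrary continuous function on R = [0, 1] x [0, 2], chosen for illustration.
    return x * y**2 + math.sin(x + y)


a, b, c, d = 0.0, 1.0, 0.0, 2.0

# Integrate in y first, then in x.
dy_dx = simpson(lambda x: simpson(lambda y: f(x, y), c, d, 40), a, b, 60)
# Integrate in x first, then in y, on a different grid.
dx_dy = simpson(lambda y: simpson(lambda x: f(x, y), a, b, 30), c, d, 80)

exact = 4 / 3 + math.sin(1) + math.sin(2) - math.sin(3)
print(dy_dx, dx_dy, exact)
```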
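b) Suppose that $\{a_{mn}\}_{m,n=1}^\infty$ is a double sequence of nonnegative real numbers. Then
$$\sum_{m=1}^\infty\sum_{n=1}^\infty a_{mn} = \sum_{n=1}^\infty\sum_{m=1}^\infty a_{mn}. \tag{4.44}$$
To prove (4.44), let $(\Gamma,\mathcal{S},\mu) = (\Lambda,\mathcal{T},\nu) = (\mathcal{N},\mathcal{P}(\mathcal{N}),\gamma)$, where $\gamma$ is counting measure. Then, by Exercise 4.137 on page 243, the product measure space is $(\mathcal{N}\times\mathcal{N},\mathcal{P}(\mathcal{N}\times\mathcal{N}),\kappa)$, where $\kappa$ is counting measure on $\mathcal{P}(\mathcal{N}\times\mathcal{N})$. Define $f\colon\mathcal{N}\times\mathcal{N}\to\mathcal{R}$ by $f(m,n) = a_{mn}$. Then, by Example 4.7(b) on page 189 and Tonelli's theorem,
$$\sum_{m=1}^\infty\sum_{n=1}^\infty a_{mn} = \int_{\mathcal{N}}\left[\int_{\mathcal{N}} f(m,n)\,d\gamma(n)\right]d\gamma(m) = \int_{\mathcal{N}}\left[\int_{\mathcal{N}} f(m,n)\,d\gamma(m)\right]d\gamma(n) = \sum_{n=1}^\infty\sum_{m=1}^\infty a_{mn}.$$

c) This part shows that the σ-finiteness hypothesis cannot be dropped in Tonelli's theorem. Let $\gamma$ be counting measure, $(\Gamma,\mathcal{S},\mu) = (\mathcal{R},\mathcal{M},\lambda)$, $(\Lambda,\mathcal{T},\nu) = (\mathcal{R},\mathcal{P}(\mathcal{R}),\gamma)$, and $D = \{(x,y) : x = y\}$. Set $f = \chi_D$. We claim that $f$ is $\mathcal{M}\times\mathcal{P}(\mathcal{R})$-measurable or, equivalently, $D \in \mathcal{M}\times\mathcal{P}(\mathcal{R})$. Indeed, since $\mathcal{M}\times\mathcal{P}(\mathcal{R})$ is a σ-algebra and contains all rectangles (in the geometric sense), it also contains all open sets in $\mathcal{R}^2$ and, hence, all closed sets in $\mathcal{R}^2$. Clearly, $D$ is a closed subset of $\mathcal{R}^2$. Now, it is easy to see that $f_{[x]} = \chi_{\{x\}}$ and $f^{[y]} = \chi_{\{y\}}$. Consequently, we have $\int_{\mathcal{R}} f(x,y)\,d\gamma(y) = \gamma(\{x\}) = 1$ for each $x \in \mathcal{R}$, and we have $\int_{\mathcal{R}} f(x,y)\,d\lambda(x) = \lambda(\{y\}) = 0$ for each $y \in \mathcal{R}$. Hence,
$$\int_{\mathcal{R}}\left[\int_{\mathcal{R}} f(x,y)\,d\gamma(y)\right]d\lambda(x) = \infty \ne 0 = \int_{\mathcal{R}}\left[\int_{\mathcal{R}} f(x,y)\,d\lambda(x)\right]d\gamma(y).$$
Therefore, the second equation in part (e) of Tonelli's theorem fails. Note that the measure space $(\mathcal{R},\mathcal{P}(\mathcal{R}),\gamma)$ is not σ-finite.

d) In this part, we show that the integrability condition cannot be omitted from Fubini's theorem. Let $(\Gamma,\mathcal{S},\mu) = (\Lambda,\mathcal{T},\nu) = (\mathcal{Z},\mathcal{P}(\mathcal{Z}),\gamma)$, where $\mathcal{Z}$ is the set of integers and $\gamma$ is counting measure on $\mathcal{P}(\mathcal{Z})$. Clearly, $(\mathcal{Z},\mathcal{P}(\mathcal{Z}),\gamma)$ is σ-finite. Let $f\colon\mathcal{Z}\times\mathcal{Z}\to\mathcal{R}$ be defined by
$$f(x,y) = \begin{cases} x, & y = x;\\ -x, & y = x + 1;\\ 0, & \text{elsewhere.}\end{cases}$$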
Then $\int_{\mathcal{Z}} f(x,y)\,d\gamma(y) = \sum_y f(x,y) = x + (-x) = 0$ for each $x \in \mathcal{Z}$ and, hence,
$$\int_{\mathcal{Z}}\left[\int_{\mathcal{Z}} f(x,y)\,d\gamma(y)\right]d\gamma(x) = \sum_x 0 = 0.$$
On the other hand, $\int_{\mathcal{Z}} f(x,y)\,d\gamma(x) = \sum_x f(x,y) = -(y-1) + y = 1$ for each $y \in \mathcal{Z}$ and, consequently,
$$\int_{\mathcal{Z}}\left[\int_{\mathcal{Z}} f(x,y)\,d\gamma(x)\right]d\gamma(y) = \sum_y 1 = \infty.$$
Thus, the second equation in part (e) of Fubini's theorem fails. Note that here none of the integrals in (i)-(iii) of Fubini's theorem is finite. □

The Completion of the Product Measure Space

The product of two measure spaces may not be complete, even when each factor space is complete. For instance, we know that the measure space $(\mathcal{R},\mathcal{M},\lambda)$ is complete. But the product of that measure space with itself, $(\mathcal{R}^2,\mathcal{M}\times\mathcal{M},\lambda\times\lambda)$, is not complete. Indeed, let $N$ be a non-Lebesgue measurable set, $A = N\times\{0\}$, and $B = \mathcal{R}\times\{0\}$. Then $(\lambda\times\lambda)(B) = 0$ and $A \subset B$, but $A \notin \mathcal{M}\times\mathcal{M}$ because $A^0 = N \notin \mathcal{M}$ (see Proposition 4.21 on page 238). Consequently, $(\mathcal{R}^2,\mathcal{M}\times\mathcal{M},\lambda\times\lambda)$ is not a complete measure space.

Recall from Theorem 4.2 on page 172 that, given a measure space $(\Omega,\mathcal{A},\mu)$, there is a complete measure space $(\Omega,\bar{\mathcal{A}},\bar\mu)$, called the completion of $(\Omega,\mathcal{A},\mu)$, such that $\bar{\mathcal{A}} \supset \mathcal{A}$ and $\bar\mu|_{\mathcal{A}} = \mu$. It is often more appropriate to work with the completion of a product measure space than with the product measure space itself. An important example of this occurs in classical analysis, as we now show.

We just discovered that the measure space $(\mathcal{R}^2,\mathcal{M}\times\mathcal{M},\lambda\times\lambda)$ is not complete. This can cause difficulties. For instance, as Exercise 4.162 reveals, a function can be Riemann integrable over a set $D \subset \mathcal{R}^2$ without being $\mathcal{M}\times\mathcal{M}$-measurable. However, this cannot happen with the completion $(\mathcal{R}^2,\overline{\mathcal{M}\times\mathcal{M}},\overline{\lambda\times\lambda})$. In fact, we have the following two-dimensional analogue of Theorem 3.23 on page 157, whose proof is left as an exercise for the reader.
THEOREM 4.18
Suppose that $f$ is Riemann integrable on $[a,b]\times[c,d]$. Then $f$ is Lebesgue integrable on $[a,b]\times[c,d]$ with respect to $\overline{\lambda\times\lambda}$ and
$$\int_{[a,b]\times[c,d]} f(x,y)\,d\,\overline{\lambda\times\lambda}(x,y) = \int_a^b\!\!\int_c^d f(x,y)\,dx\,dy.$$

Note: Because of Theorem 4.18, we will often denote the integral on the left of the previous equation by the integral on the right, regardless of whether $f$ is Riemann integrable.

We should point out that the measure space $(\mathcal{R}^2,\overline{\mathcal{M}\times\mathcal{M}},\overline{\lambda\times\lambda})$ is identical to the measure space $(\mathcal{R}^2,\mathcal{M}_2,\lambda_2)$ discussed in Section 4.6 on page 214. In other words, $\overline{\lambda\times\lambda}$ is two-dimensional Lebesgue measure and $\overline{\mathcal{M}\times\mathcal{M}}$ is the collection of two-dimensional Lebesgue measurable sets. The verification of these facts is considered in Exercise 4.163.

We can derive analogues of Tonelli's theorem and Fubini's theorem for the completion of a product measure space provided that the factor spaces are complete and σ-finite. We begin with two lemmas.

LEMMA 4.10
Let $(\Omega,\mathcal{A},\mu)$ be a measure space and $f$ an $\bar{\mathcal{A}}$-measurable function on $\Omega$. Then there exists an $\mathcal{A}$-measurable function $\phi$ on $\Omega$ such that $\phi = f$ $\bar\mu$-ae.

PROOF: We employ the bootstrapping technique. So, first suppose that $f = \chi_E$, where $E \in \bar{\mathcal{A}}$. By definition, we can select sets $A$, $B$, and $C$ such that $E = B \cup A$, where $B, C \in \mathcal{A}$, $A \subset C$, and $\mu(C) = 0$. Now we define the function $\phi = \chi_B$ and note that, because $B \in \mathcal{A}$, $\phi$ is $\mathcal{A}$-measurable. Let $D = \{x : \phi(x) \ne f(x)\}$. Then $D = E \setminus B \subset A \subset C$. Since $\bar\mu(C) = \mu(C) = 0$ and $D \subset C$, it follows by completeness that $D \in \bar{\mathcal{A}}$ and $\bar\mu(D) = 0$. Thus, $\phi = f$ $\bar\mu$-ae.

Suppose now that $f$ is a simple function, say $f = \sum_{k=1}^n a_k\chi_{A_k}$. Select $\mathcal{A}$-measurable functions $\phi_k$ such that $\phi_k = \chi_{A_k}$ $\bar\mu$-ae, for $1 \le k \le n$. If $D_k = \{x : \phi_k(x) \ne \chi_{A_k}(x)\}$, then the set $D = \bigcup_{k=1}^n D_k$ has $\bar\mu$-measure zero, and the $\mathcal{A}$-measurable function $\phi = \sum_{k=1}^n a_k\phi_k$ equals $f$ on $D^c$.

If $f$ is nonnegative, choose a sequence $\{s_n\}_{n=1}^\infty$ of nonnegative $\bar{\mathcal{A}}$-measurable simple functions such that $s_n \uparrow f$ pointwise on $\Omega$.
Then, using the previous paragraph, select a sequence $\{t_n\}_{n=1}^\infty$ of $\mathcal{A}$-measurable functions such that, for each $n \in \mathcal{N}$, $t_n = s_n$ $\bar\mu$-ae. Define $\phi = \limsup_{n\to\infty} t_n$ and note that $\phi$ is $\mathcal{A}$-measurable. We claim that $\phi = f$ $\bar\mu$-ae. To prove this,
let $A_n = \{x : t_n(x) \ne s_n(x)\}$ and $A = \bigcup_{n=1}^\infty A_n$. Then $\bar\mu(A) = 0$ and, if $x \notin A$,
$$\phi(x) = \limsup_{n\to\infty} t_n(x) = \lim_{n\to\infty} s_n(x) = f(x).$$
Consequently, $\phi = f$ $\bar\mu$-ae. We leave the remainder of the proof as an exercise for the reader.

LEMMA 4.11
Suppose that $(\Gamma,\mathcal{S},\mu)$ and $(\Lambda,\mathcal{T},\nu)$ are complete, σ-finite measure spaces. If $\xi$ is an $\overline{\mathcal{S}\times\mathcal{T}}$-measurable function such that $\xi = 0$ $\overline{\mu\times\nu}$-ae, then
a) for $\mu$-almost all $x \in \Gamma$, $\xi_{[x]} = 0$ $\nu$-ae.
b) for $\nu$-almost all $y \in \Lambda$, $\xi^{[y]} = 0$ $\mu$-ae.

PROOF: Let $E = \{(x,y) : \xi(x,y) \ne 0\}$. Then $\overline{\mu\times\nu}(E) = 0$, by assumption. We can select sets $A$, $B$, and $C$ such that $E = B \cup A$, where $B, C \in \mathcal{S}\times\mathcal{T}$, $A \subset C$, and $(\mu\times\nu)(C) = 0$. Since $\overline{\mu\times\nu}(E) = 0$, it is clear that $(\mu\times\nu)(B) = 0$. Let $D = B \cup C$. Then $D \in \mathcal{S}\times\mathcal{T}$ and $(\mu\times\nu)(D) = 0$. Consequently, by Lemma 4.9(a) on page 242, $\int_\Gamma \nu(D_x)\,d\mu(x) = 0$. This implies that $\nu(D_x) = 0$ $\mu$-ae. Set $N = \{x : \nu(D_x) \ne 0\}$ and note that $\mu(N) = 0$. If $x \notin N$, then $\nu(D_x) = 0$ and, since $E_x \subset D_x$ and $(\Lambda,\mathcal{T},\nu)$ is complete, it follows that $E_x \in \mathcal{T}$ and $\nu(E_x) = 0$. But $E_x = \{y : \xi_{[x]}(y) \ne 0\}$ and, hence, we see that $\xi_{[x]} = 0$ $\nu$-ae. Thus, part (a) holds, and a similar argument establishes the validity of part (b).

We are now in a position to prove the analogues of Tonelli's theorem and Fubini's theorem for the completion of a product measure space. The former theorem is presented as Theorem 4.19, and the latter theorem is left to the reader as an exercise.

THEOREM 4.19
Suppose that $(\Gamma,\mathcal{S},\mu)$ and $(\Lambda,\mathcal{T},\nu)$ are complete, σ-finite measure spaces. Let $f$ be a nonnegative extended real-valued $\overline{\mathcal{S}\times\mathcal{T}}$-measurable function on $\Gamma\times\Lambda$. Then
a) $f_{[x]}$ is $\mathcal{T}$-measurable for $\mu$-almost all $x \in \Gamma$.
b) $f^{[y]}$ is $\mathcal{S}$-measurable for $\nu$-almost all $y \in \Lambda$.
c) $g(x) = \int_\Lambda f(x,y)\,d\nu(y)$ is defined $\mu$-ae and is equal to an $\mathcal{S}$-measurable function $\mu$-ae.
d) $h(y) = \int_\Gamma f(x,y)\,d\mu(x)$ is defined $\nu$-ae and is equal to a $\mathcal{T}$-measurable function $\nu$-ae.
e) the equalities,
$$\int_{\Gamma\times\Lambda} f(x,y)\,d\,\overline{\mu\times\nu}(x,y) = \int_\Gamma\left[\int_\Lambda f(x,y)\,d\nu(y)\right]d\mu(x) = \int_\Lambda\left[\int_\Gamma f(x,y)\,d\mu(x)\right]d\nu(y),$$
hold.

PROOF: Choose, by Lemma 4.10, an $\mathcal{S}\times\mathcal{T}$-measurable function $\phi$ such that $\phi = f$ $\overline{\mu\times\nu}$-ae. Let $E = \{(x,y) : \phi(x,y) \ne f(x,y)\}$. Arguing as in the proof of Lemma 4.11, we can select $D \in \mathcal{S}\times\mathcal{T}$ with $E \subset D$ and $(\mu\times\nu)(D) = 0$. If we define $h = \chi_{D^c}\phi$ and $\xi = \chi_D f$, then $h$ is $\mathcal{S}\times\mathcal{T}$-measurable, $f = h + \xi$, and $\xi = 0$ $\overline{\mu\times\nu}$-ae.

By part (a) of Tonelli's theorem, $h_{[x]}$ is $\mathcal{T}$-measurable for all $x \in \Gamma$. Since $f_{[x]} = h_{[x]} + \xi_{[x]}$ and $\xi = 0$ $\overline{\mu\times\nu}$-ae, we can conclude from Lemma 4.11(a) that, for $\mu$-almost all $x \in \Gamma$, $f_{[x]} = h_{[x]}$ $\nu$-ae. Consequently, because $(\Lambda,\mathcal{T},\nu)$ is complete, $f_{[x]}$ is $\mathcal{T}$-measurable for $\mu$-almost all $x \in \Gamma$. This completes the proof of part (a), and a similar argument establishes part (b).

From part (c) of Tonelli's theorem, the function $p$ defined on $\Gamma$ by $p(x) = \int_\Lambda h(x,y)\,d\nu(y)$ is $\mathcal{S}$-measurable. Let $A = \{x : f_{[x]} = h_{[x]}\ \nu\text{-ae}\}$. By the previous paragraph, $\mu(A^c) = 0$. If $x \in A$, then $f_{[x]}$ is $\mathcal{T}$-measurable and $\int_\Lambda f(x,y)\,d\nu(y) = \int_\Lambda h(x,y)\,d\nu(y) = p(x)$. This verifies part (c). Similarly, part (d) holds.

To establish the first equation in part (e), we apply Exercise 4.166, part (e) of Tonelli's theorem, and Definition 4.14 on page 197:
$$\int_{\Gamma\times\Lambda} f\,d\,\overline{\mu\times\nu} = \int_{\Gamma\times\Lambda} h\,d\,\overline{\mu\times\nu} = \int_{\Gamma\times\Lambda} h\,d(\mu\times\nu) = \int_\Gamma\left[\int_\Lambda h(x,y)\,d\nu(y)\right]d\mu(x) = \int_\Gamma p\,d\mu = \int_\Gamma\left[\int_\Lambda f(x,y)\,d\nu(y)\right]d\mu(x).$$
A similar argument verifies the second equation in part (e).

The Product of More Than Two Measure Spaces

Up to this point, we have considered only product measure spaces with two factors. Using similar techniques, we can develop the theory of product measure spaces with any finite number of factors. We present only the highlights.
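THEOREM 4.20
Suppose that $(\Omega_k,\mathcal{A}_k,\mu_k)$, $1 \le k \le n$, are σ-finite measure spaces. Let $\times_{k=1}^n\Omega_k = \{(x_1,\dots,x_n) : x_k \in \Omega_k \text{ for } 1 \le k \le n\}$, the Cartesian product of $\Omega_1,\dots,\Omega_n$. Also, let $\times_{k=1}^n\mathcal{A}_k$ denote the σ-algebra generated by the $n$-dimensional measurable rectangles. Then there is a unique measure, $\times_{k=1}^n\mu_k$, on $\times_{k=1}^n\mathcal{A}_k$ such that
$$\Bigl(\times_{k=1}^n\mu_k\Bigr)\Bigl(\times_{k=1}^n A_k\Bigr) = \prod_{k=1}^n\mu_k(A_k)$$
for all $n$-dimensional measurable rectangles $\times_{k=1}^n A_k$.

The σ-algebra $\times_{k=1}^n\mathcal{A}_k$ is referred to as the product σ-algebra of $\mathcal{A}_1,\dots,\mathcal{A}_n$; the measure $\times_{k=1}^n\mu_k$ as the product measure of $\mu_1,\dots,\mu_n$; and the measure space $(\times_{k=1}^n\Omega_k,\times_{k=1}^n\mathcal{A}_k,\times_{k=1}^n\mu_k)$ as the product measure space of $(\Omega_1,\mathcal{A}_1,\mu_1),\dots,(\Omega_n,\mathcal{A}_n,\mu_n)$.

Tonelli's theorem and Fubini's theorem generalize to $n$-dimensional product spaces. In particular, the integral of a nonnegative $\times_{k=1}^n\mathcal{A}_k$-measurable function or of a function in $\mathcal{L}^1(\times_{k=1}^n\mu_k)$ can be evaluated by forming the iterated integrals in any order. For example, if $n = 3$ and $f \in \mathcal{L}^1(\mu_1\times\mu_2\times\mu_3)$, then
$$\int_{\Omega_1\times\Omega_2\times\Omega_3} f\,d(\mu_1\times\mu_2\times\mu_3) = \int_{\Omega_{i_3}}\left[\int_{\Omega_{i_2}}\left[\int_{\Omega_{i_1}} f\,d\mu_{i_1}\right]d\mu_{i_2}\right]d\mu_{i_3}$$
for each permutation $i_1, i_2, i_3$ of $1, 2, 3$.

EXAMPLE 4.24 Illustrates the Product of Finitely Many Measure Spaces
a) Let $\mathcal{B}_n$ denote the σ-algebra generated by the open sets of $\mathcal{R}^n$. Members of $\mathcal{B}_n$ are called $n$-dimensional Borel sets. It is not too difficult to show that $\mathcal{B}_n = \mathcal{B}\times\cdots\times\mathcal{B}$.
b) It can be shown that $(\mathcal{R}^n,\overline{\mathcal{M}\times\cdots\times\mathcal{M}},\overline{\lambda\times\cdots\times\lambda}) = (\mathcal{R}^n,\mathcal{M}_n,\lambda_n)$. In other words, $\overline{\lambda\times\cdots\times\lambda}$ is $n$-dimensional Lebesgue measure and $\overline{\mathcal{M}\times\cdots\times\mathcal{M}}$ is the collection of $n$-dimensional Lebesgue measurable sets. We should also point out that Theorem 4.18 can be generalized to arbitrary dimensions. That is, if $f$ is Riemann integrable on $\times_{k=1}^n[a_k,b_k]$, then $f$ is Lebesgue integrable on $\times_{k=1}^n[a_k,b_k]$ with respect to $\lambda_n$ and
$$\int_{\times_{k=1}^n[a_k,b_k]} f\,d\lambda_n = \int_{a_1}^{b_1}\!\!\cdots\int_{a_n}^{b_n} f(x_1,\dots,x_n)\,dx_1\cdots dx_n.$$
The proof of this fact is left to the reader as an exercise.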
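When each factor is a finite set carrying a finitely supported measure, every function is integrable, the product measure of Theorem 4.20 gives each point the product of its masses, and every iterated integral is a finite sum. The Python sketch below uses hypothetical data and function names. It checks with exact rational arithmetic that all $3! = 6$ orders of integration agree with the product-measure integral, and that a measurable rectangle receives the product of the factor measures. It illustrates the statements and does not prove them for general σ-finite spaces.

```python
from fractions import Fraction
from itertools import permutations, product

# Three finitely supported measures, given by point masses (hypothetical example data).
mus = [
    {0: Fraction(1, 2), 1: Fraction(3, 2)},
    {"a": Fraction(2), "b": Fraction(1, 3), "c": Fraction(1)},
    {10: Fraction(1, 4), 20: Fraction(3, 4)},
]


def f(x1, x2, x3):
    # Arbitrary integrand; every function is integrable when the spaces are finite.
    return Fraction(x1 + 1) * (2 if x2 == "a" else -1) + Fraction(x3, 10)


def product_integral(g, measures):
    """Integral against the product measure: each point weighs the product of its masses."""
    total = Fraction(0)
    for point in product(*(m.keys() for m in measures)):
        weight = Fraction(1)
        for k, x in enumerate(point):
            weight *= measures[k][x]
        total += g(*point) * weight
    return total


def iterated_integral(g, measures, order):
    """Iterated integral in which coordinate order[0] is integrated first (innermost)."""
    def integrate(outermost_first, fixed):
        if not outermost_first:
            return g(*(fixed[k] for k in range(len(measures))))
        k, rest = outermost_first[0], outermost_first[1:]
        return sum((measures[k][x] * integrate(rest, {**fixed, k: x}) for x in measures[k]),
                   Fraction(0))
    return integrate(list(reversed(order)), {})


total = product_integral(f, mus)
for order in permutations(range(3)):
    assert iterated_integral(f, mus, order) == total

# Defining property of the product measure on the rectangle {1} x {a, c} x {10, 20}.
rect = [{1}, {"a", "c"}, {10, 20}]


def indicator(*point):
    return Fraction(int(all(x in s for x, s in zip(point, rect))))


print(product_integral(indicator, mus))  # prints 9/2, i.e. (3/2) * 3 * 1
```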
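EXERCISES 4.9

Note: In some of the exercises, you will need the following two facts: (1) Let $D \subset \mathcal{R}^2$. A subset of $D$ is open in $D$ if and only if it can be expressed as the intersection of $D$ with an open subset of $\mathcal{R}^2$. (2) A function $f\colon D \to \mathcal{R}$ is continuous if and only if $f^{-1}(O)$ is open in $D$ for each open set $O \subset \mathcal{R}$.

4.152 Complete the proof of Fubini's theorem; that is, assuming that the theorem holds for real-valued functions, prove that it holds for complex-valued functions. Hint: Write $f = \Re f + i\,\Im f$ and use Proposition 4.9(b), found on page 194.

4.153 Suppose that $f$ is a Lebesgue measurable function on $\mathcal{R}$ such that
$$\int_{-\infty}^{\infty}|f(x)|\,dx < \infty \quad\text{and}\quad \int_{-\infty}^{\infty}\frac{|f(x)|}{|x|}\,dx < \infty.$$
Define $G(x,y) = f(x)/(x^2 + y^2)$ if $(x,y) \ne (0,0)$, and zero otherwise.
a) Show that $G$ is $\mathcal{M}\times\mathcal{M}$-measurable.
b) Prove that $G \in \mathcal{L}^1(\lambda\times\lambda)$ and that
$$\int_{\mathcal{R}^2} G\,d(\lambda\times\lambda) = \pi\int_{-\infty}^{\infty}\frac{f(x)}{|x|}\,dx.$$

4.154 Suppose that $\{a_{mn}\}_{m,n=1}^\infty$ is a double sequence of complex numbers for which at least one of the quantities
$$\sum_{m=1}^\infty\sum_{n=1}^\infty|a_{mn}|, \qquad \sum_{n=1}^\infty\sum_{m=1}^\infty|a_{mn}|,$$
is finite. Then both quantities are finite and
$$\sum_{m=1}^\infty\sum_{n=1}^\infty a_{mn} = \sum_{n=1}^\infty\sum_{m=1}^\infty a_{mn}.$$
Hint: Use Exercise 4.73 on page 201.

4.155 Let $(\Gamma,\mathcal{S},\mu) = (\Lambda,\mathcal{T},\nu) = ([0,1],\mathcal{M}_{[0,1]},\lambda_{[0,1]})$ and suppose $f \in \mathcal{L}^1(\mu\times\nu)$. Prove that

4.156 Let $(\Gamma,\mathcal{S},\mu) = (\Lambda,\mathcal{T},\nu) = ([-1,1],\mathcal{M}_{[-1,1]},\lambda_{[-1,1]})$ and let $f$ be a continuous function on $\Gamma\times\Lambda$. Prove that $f \in \mathcal{L}^1(\lambda_{[-1,1]}\times\lambda_{[-1,1]})$ and that, if $D = \{(x,y) : x^2 + y^2 \le 1\}$, then
$$\int_D f\,d(\lambda_{[-1,1]}\times\lambda_{[-1,1]}) = \int_{-1}^{1}\left[\int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} f(x,y)\,dy\right]dx.$$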
+4.157 This exercise introduces the convolution of two Borel measurable functions. For convenience, we will write $\lambda|_{\mathcal{B}}$ simply as $\lambda$.
a) Let $f$ be a Borel measurable function on $\mathcal{R}$ and define $F$ on $\mathcal{R}^2$ by $F(x,y) = f(x - y)$. Show that $F$ is $\mathcal{B}_2$-measurable. Hint: $(x,y) \mapsto x - y$ is continuous.
b) Suppose that $\xi$ is a Borel measurable function on $\mathcal{R}$. Prove that the function $\phi$ on $\mathcal{R}^2$ defined by $\phi(x,y) = \xi(y)$ is $\mathcal{B}_2$-measurable.
c) Suppose that $h$ is a nonnegative Borel measurable function. Show that
$$\int_{\mathcal{R}} h(x - y)\,d\lambda(x) = \int_{\mathcal{R}} h(x)\,d\lambda(x)$$
for each $y \in \mathcal{R}$. Hint: Bootstrap.
d) Suppose that $f, g \in \mathcal{L}^1(\mathcal{R},\mathcal{B},\lambda)$. Prove that the function $f*g$, defined on $\mathcal{R}$ by
$$(f*g)(x) = \int_{\mathcal{R}} f(x - y)g(y)\,d\lambda(y),$$
exists for $\lambda$-almost all $x \in \mathcal{R}$ and is in $\mathcal{L}^1(\mathcal{R},\mathcal{B},\lambda)$. The function $f*g$ is called the convolution of $f$ with $g$.

+4.158 This exercise introduces the convolution of two σ-finite Borel measures, $\mu$ and $\nu$.
a) If $E \in \mathcal{B}$, show that the set $A = \{(x,y) : x + y \in E\}$ is a two-dimensional Borel set, that is, is in $\mathcal{B}_2$.
b) Show that the functions $g(x) = \nu(E - x)$ and $h(y) = \mu(E - y)$ are Borel measurable.
c) Verify that
$$(\mu\times\nu)(A) = \int_{\mathcal{R}}\nu(E - x)\,d\mu(x) = \int_{\mathcal{R}}\mu(E - y)\,d\nu(y),$$
where $A$ is the set defined in part (a).
d) For $E \in \mathcal{B}$, define
$$(\mu*\nu)(E) = \int_{\mathcal{R}}\mu(E - y)\,d\nu(y).$$
Show that $\mu*\nu$ is a Borel measure. The measure $\mu*\nu$ is called the convolution of $\mu$ with $\nu$. Part (c) shows that $\mu*\nu = \nu*\mu$.
e) If $f \in \mathcal{L}^1(\mathcal{R},\mathcal{B},\mu*\nu)$, prove that
$$\int_{\mathcal{R}^2} f(x + y)\,d(\mu\times\nu)(x,y) = \int_{\mathcal{R}} f(t)\,d(\mu*\nu)(t).$$
Hint: Bootstrap.
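For finitely supported measures, the definition in Exercise 4.158(d) reduces to a finite sum: $(\mu*\nu)(\{t\}) = \sum_{x+y=t}\mu(\{x\})\nu(\{y\})$. The Python sketch below stores a measure as a dict mapping points to masses (a representation assumed only for this illustration, with hypothetical data), builds $\mu*\nu$ this way, and checks it against $\int\mu(E - y)\,d\nu(y)$, the symmetry of part (c), and the identity of part (e) for $f(t) = t^2$.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(mu, nu):
    """Convolution of finitely supported measures given as {point: mass} dicts."""
    out = defaultdict(Fraction)
    for y, q in nu.items():
        for x, p in mu.items():
            out[x + y] += p * q  # the mass mu({x}) * nu({y}) lands at x + y
    return dict(out)


def measure_of(m, E):
    """m(E) for a finitely supported measure m and a finite set E."""
    return sum((mass for pt, mass in m.items() if pt in E), Fraction(0))


def conv_by_definition(mu, nu, E):
    """(mu * nu)(E) as the integral of mu(E - y) d nu(y), following Exercise 4.158(d)."""
    return sum((q * measure_of(mu, {e - y for e in E}) for y, q in nu.items()), Fraction(0))


mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
nu = {-1: Fraction(1, 3), 2: Fraction(2, 3)}
mn = convolve(mu, nu)

E = {0, 1, 2}
print(measure_of(mn, E), conv_by_definition(mu, nu, E))  # 1/2 1/2
print(convolve(mu, nu) == convolve(nu, mu))              # True: mu * nu = nu * mu


def f(t):
    return t * t


# Part (e): integrate f(x + y) against mu x nu, and f against mu * nu.
lhs = sum((f(x + y) * p * q for x, p in mu.items() for y, q in nu.items()), Fraction(0))
rhs = sum((f(t) * mass for t, mass in mn.items()), Fraction(0))
print(lhs == rhs)                                        # True
```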
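4.159 Let $\mu$ be a finite Borel measure. Define $\hat\mu\colon\mathcal{R}\to\mathcal{C}$ by
$$\hat\mu(s) = \int_{\mathcal{R}} e^{its}\,d\mu(t).$$
The function $\hat\mu$ is called the Fourier-Stieltjes transform of $\mu$.
a) Show that $\hat\mu$ is well defined, that is, the integral exists for each $s \in \mathcal{R}$.
b) Let $\nu$ be a finite Borel measure. Prove that $\widehat{\mu*\nu}(s) = \hat\mu(s)\hat\nu(s)$, where $\mu*\nu$ is the convolution of $\mu$ and $\nu$ as defined in Exercise 4.158. Hint: Use Exercise 4.158(e) and Fubini's theorem.

4.160 Suppose that $(\Gamma,\mathcal{S},\mu)$ and $(\Lambda,\mathcal{T},\nu)$ are two σ-finite measure spaces. Let $\mathcal{U}$ be the semialgebra of measurable rectangles and let $\iota$ be defined on $\mathcal{U}$ by $\iota(S\times T) = \mu(S)\nu(T)$. Furthermore, let $(\Gamma\times\Lambda,\mathcal{A},\tau)$ be the complete measure space induced by $\mathcal{U}$ and $\iota$, as in Theorem 4.11. Prove that $\mathcal{A} = \overline{\mathcal{S}\times\mathcal{T}}$ and $\tau = \overline{\mu\times\nu}$.

4.161 Provide an example in which the measure space $(\Gamma\times\Lambda,\mathcal{S}\times\mathcal{T},\mu\times\nu)$ is complete.

4.162 Construct a function $f$ on $\mathcal{R}^2$ that is Riemann integrable on $[0,1]\times[0,1]$ but is not $\mathcal{M}\times\mathcal{M}$-measurable. Hint: Do something with a non-Lebesgue measurable set and use the fact that a function is Riemann integrable on $[0,1]\times[0,1]$ if and only if it is bounded there and the set of its points of discontinuity has two-dimensional Lebesgue measure zero.

4.163 Prove that $(\mathcal{R}^2,\overline{\mathcal{M}\times\mathcal{M}},\overline{\lambda\times\lambda}) = (\mathcal{R}^2,\mathcal{M}_2,\lambda_2)$. Use the following steps:
a) Let $\mathcal{I}_2$ denote the collection of all sets of the form $I\times J$, where $I$ and $J$ are intervals of $\mathcal{R}$. Show that $\mathcal{M}\times\mathcal{M} \supset \sigma(\mathcal{I}_2)$ and that $\lambda\times\lambda$ agrees with $\lambda_2$ on $\mathcal{I}_2$.
b) Use part (a) to conclude that $\overline{\mathcal{M}\times\mathcal{M}} \supset \mathcal{M}_2$ and $\overline{\lambda\times\lambda}|_{\mathcal{M}_2} = \lambda_2$. Hint: Employ Theorem 4.12, Exercise 4.114 on page 220, and Exercise 4.17 on page 174.
c) Show that $\mathcal{M}\times\mathcal{M} \subset \mathcal{M}_2$ and that $\lambda_2$ agrees with $\lambda\times\lambda$ on $\mathcal{M}\times\mathcal{M}$. Hint: First show that $B\times\mathcal{R} \in \mathcal{M}_2$ for all $B \in \mathcal{B}$ and then that $E\times\mathcal{R} \in \mathcal{M}_2$ for all $E \in \mathcal{M}$.
d) Use part (c) to conclude that $\mathcal{M}_2 \supset \overline{\mathcal{M}\times\mathcal{M}}$ and $\lambda_2|_{\overline{\mathcal{M}\times\mathcal{M}}} = \overline{\lambda\times\lambda}$.
e) Deduce the required result.

4.164 Generalize the previous exercise to $n$ dimensions.

4.165 Complete the proof of Lemma 4.10 on page 251.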
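For a finite measure with finitely many atoms, the Fourier-Stieltjes transform of Exercise 4.159 is the finite sum $\hat\mu(s) = \sum_t e^{its}\mu(\{t\})$, and $|\hat\mu(s)| \le \mu(\mathcal{R})$, which is the content of part (a) in this special case. The Python sketch below (hypothetical data; measures stored as dicts mapping points to masses) checks part (b), $\widehat{\mu*\nu} = \hat\mu\,\hat\nu$, at a few values of $s$ up to floating-point rounding.

```python
import cmath
from collections import defaultdict


def fs_transform(mu, s):
    """Fourier-Stieltjes transform: the sum of e^{i t s} * mu({t}) over the atoms t of mu."""
    return sum(mass * cmath.exp(1j * t * s) for t, mass in mu.items())


def convolve(mu, nu):
    """Convolution of finitely supported measures: mass mu({x}) * nu({y}) placed at x + y."""
    out = defaultdict(float)
    for x, p in mu.items():
        for y, q in nu.items():
            out[x + y] += p * q
    return dict(out)


mu = {0.0: 0.5, 1.0: 0.25, 3.0: 0.25}
nu = {-1.0: 1 / 3, 2.5: 2 / 3}
mn = convolve(mu, nu)

for s in (0.0, 0.7, -2.3, 10.0):
    lhs = fs_transform(mn, s)
    rhs = fs_transform(mu, s) * fs_transform(nu, s)
    print(f"s = {s:5.1f}: |(mu*nu)^(s) - mu^(s) nu^(s)| = {abs(lhs - rhs):.1e}")
```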
4.166 Suppose that $(\Omega,\mathcal{A},\mu)$ is a measure space and that $f$ is an $\mathcal{A}$-measurable function on $\Omega$. Prove that
$$\int_\Omega f\,d\bar\mu = \int_\Omega f\,d\mu.$$
Hint: Bootstrap.
4.167 Establish the calculus theorem stated in Example 4.23(a) on page 248 by proceeding as follows:
a) Define $h$ on $\mathcal{R}^2$ by $h(x,y) = f(x,y)$ if $(x,y) \in R$, and zero otherwise. Show that $h$ is $\mathcal{M}\times\mathcal{M}$-measurable and is in $\mathcal{L}^1(\lambda\times\lambda)$.
b) Prove that
$$\iint_R f(x,y)\,dx\,dy = \int_R f\,d(\lambda\times\lambda),$$
where the integral on the left is a double Riemann integral. Hint: Use Exercise 4.166.
c) Deduce (4.43) on page 248.
d) Does (4.43) remain valid if $f$ is assumed only to be Riemann integrable on $[a,b]\times[c,d]$? Explain.

4.168 State and prove the analogue of Fubini's theorem for the completion of a product measure space.

4.169 Integration by parts: In this exercise, we will develop an integration-by-parts formula for Lebesgue-Stieltjes integrals. We proceed using the following steps:
a) Let $\mu$ be a finite Borel measure on $\mathcal{R}$ with distribution function $F_\mu$. Define $F_\mu(x-) = \sup\{F_\mu(t) : t < x\}$. Show that $F_\mu(x-) = \mu((-\infty,x))$ for each $x \in \mathcal{R}$.
b) Use part (a) to deduce that, for each $x \in \mathcal{R}$, $\mu(\{x\}) = F_\mu(x) - F_\mu(x-)$. [Thus, $F_\mu$ is continuous at $x$ if and only if $x$ is not an atom of $\mu$.]
c) Let $\nu$ be a finite Borel measure having distribution function $F_\nu$. Prove that, for $a, b \in \mathcal{R}$,
$$\int_{(a,b]} F_\mu(x)\,d\nu(x) + \int_{(a,b]} F_\nu(x)\,d\mu(x) = F_\mu(b)F_\nu(b) - F_\mu(a)F_\nu(a) + \int_{(a,b]}\bigl(F_\mu(x) - F_\mu(x-)\bigr)\,d\nu(x).$$
Hint: Apply Tonelli's theorem to show that
$$\int_{(a,b]} F_\mu(y)\,d\nu(y) + \int_{(a,b]} F_\nu(x)\,d\mu(x) = \int_{\mathcal{R}^2} H(x,y)\,d(\mu\times\nu)(x,y),$$
where $H(x,y) = \chi_{(-\infty,y]}(x)\chi_{(a,b]}(y) + \chi_{(-\infty,x]}(y)\chi_{(a,b]}(x)$. Then show that
$$H(x,y) = \chi_{\{y\}}(x)\chi_{(a,b]}(y) + \chi_{(-\infty,a]}(x)\chi_{(a,b]}(y) + \chi_{(a,b]}(x)\chi_{(-\infty,a]}(y) + \chi_{(a,b]}(x)\chi_{(a,b]}(y).$$

4.170 Let $(\Omega_k,\mathcal{A}_k)$, $1 \le k \le n$, be measurable spaces. Denote by $\mathcal{U}$ the collection of $n$-dimensional measurable rectangles; that is,
$$\mathcal{U} = \Bigl\{\times_{k=1}^n A_k : A_k \in \mathcal{A}_k,\ 1 \le k \le n\Bigr\}.$$
Prove that $\mathcal{U}$ is a semialgebra.
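The identity in Exercise 4.169(c) can be checked exactly when $\mu$ and $\nu$ are finite measures with finitely many atoms, since every integral is then a finite sum. The Python sketch below takes $F_\mu(x) = \mu((-\infty,x])$ as the distribution function and uses hypothetical measures that share atoms at 1 and 2, so the correction term $\int_{(a,b]}(F_\mu(x) - F_\mu(x-))\,d\nu(x)$ is nonzero. It is a consistency check, not a proof.

```python
from fractions import Fraction


def dist(m, x):
    """Distribution function F(x) = m((-inf, x]) of a finitely supported measure."""
    return sum((p for t, p in m.items() if t <= x), Fraction(0))


def dist_left(m, x):
    """Left limit F(x-) = m((-inf, x)), as in Exercise 4.169(a)."""
    return sum((p for t, p in m.items() if t < x), Fraction(0))


def integral_over(m, g, a, b):
    """Integral of g over (a, b] with respect to a finitely supported measure m."""
    return sum((g(t) * p for t, p in m.items() if a < t <= b), Fraction(0))


mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
nu = {1: Fraction(1, 3), 2: Fraction(1, 6), 5: Fraction(1, 2)}
a, b = Fraction(-1), Fraction(3)

lhs = (integral_over(nu, lambda x: dist(mu, x), a, b)
       + integral_over(mu, lambda x: dist(nu, x), a, b))
rhs = (dist(mu, b) * dist(nu, b) - dist(mu, a) * dist(nu, a)
       + integral_over(nu, lambda x: dist(mu, x) - dist_left(mu, x), a, b))
print(lhs, rhs)  # 5/8 5/8
```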
Exercises 4.171-4.175 should be completed by all readers who plan to cover the probability material in Chapter 5.

+4.171 Denote by $\mathcal{B}_n$ the smallest σ-algebra of subsets of $\mathcal{R}^n$ that contains all open sets of $\mathcal{R}^n$.
a) Show that $\mathcal{B}_n = \mathcal{B}\times\cdots\times\mathcal{B}$.
b) A measure on $\mathcal{B}_n$ is called an $n$-dimensional Borel measure. Suppose that $\mu$ and $\nu$ are two finite $n$-dimensional Borel measures such that $\mu(\times_{k=1}^n B_k) = \nu(\times_{k=1}^n B_k)$ for all $B_1, B_2, \dots, B_n \in \mathcal{B}$. Prove that $\mu = \nu$; that is, $\mu(B) = \nu(B)$ for all $B \in \mathcal{B}_n$.

4.172 Let $\mathcal{I}_n$ denote the collection of all $n$-dimensional intervals in $\mathcal{R}^n$; that is, all sets of the form $I_1\times I_2\times\cdots\times I_n$, where $I_j \in \mathcal{I}$ for $1 \le j \le n$.
a) Show that the σ-algebra generated by $\mathcal{I}_n$ is $\mathcal{B}_n$; that is, $\sigma(\mathcal{I}_n) = \mathcal{B}_n$. Hint: Use Exercise 4.171(a).
b) Let $\mu$ and $\nu$ be two $n$-dimensional Borel measures such that $\mu(I) < \infty$ for all bounded $n$-dimensional intervals and $\mu(I) = \nu(I)$ for all $I \in \mathcal{I}_n$. Prove that $\mu = \nu$.

4.173 Let $\mathcal{J}$ denote the collection of intervals of $\mathcal{R}$ of the form $(a,b]$ and $(c,\infty)$, where $-\infty \le a < b < \infty$ and $-\infty \le c < \infty$. Also, let $\mathcal{J}_n$ denote the collection of all subsets of $\mathcal{R}^n$ of the form $J_1\times J_2\times\cdots\times J_n$, where $J_k \in \mathcal{J}$ for $1 \le k \le n$. Prove that $\mathcal{J}_n$ is a semialgebra and that the σ-algebra generated by $\mathcal{J}_n$ is $\mathcal{B}_n$.

+4.174 Suppose that $\mu$ and $\nu$ are two finite $n$-dimensional Borel measures such that
$$\mu\Bigl(\times_{k=1}^n(-\infty,x_k]\Bigr) = \nu\Bigl(\times_{k=1}^n(-\infty,x_k]\Bigr)$$
for all $x_1, x_2, \dots, x_n \in \mathcal{R}$. Prove that $\mu = \nu$. Hint: It suffices to prove that $\mu = \nu$ on $\mathcal{J}_n$.

+4.175 Let $\mu_1, \dots, \mu_n$ be finite Borel measures and $\mu$ an $n$-dimensional Borel measure. Suppose that
$$\mu\Bigl(\times_{k=1}^n(-\infty,x_k]\Bigr) = \prod_{k=1}^n\mu_k\bigl((-\infty,x_k]\bigr)$$
for all $x_1, x_2, \dots, x_n \in \mathcal{R}$. Prove that $\mu = \times_{k=1}^n\mu_k$.
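The hint to Exercise 4.174 rests on the fact that the values $F(x_1,\dots,x_n) = \mu(\times_{k=1}^n(-\infty,x_k])$ determine $\mu$ on bounded boxes $\times_{k=1}^n(a_k,b_k]$ through inclusion-exclusion over the $2^n$ corners, with a minus sign for each lower endpoint used. The Python sketch below checks this for a hypothetical finitely supported two-dimensional measure; it illustrates only this step, not the full exercise.

```python
from fractions import Fraction
from itertools import product

# A finitely supported 2-dimensional measure (hypothetical data): {(x1, x2): mass}.
mu = {(0, 0): Fraction(1, 4), (1, 2): Fraction(1, 2),
      (2, 1): Fraction(1, 8), (3, 3): Fraction(1, 8)}


def joint_cdf(m, x):
    """F(x) = m((-inf, x_1] x ... x (-inf, x_n])."""
    return sum((p for pt, p in m.items() if all(t <= xi for t, xi in zip(pt, x))),
               Fraction(0))


def box_from_cdf(m, a, b):
    """m((a_1, b_1] x ... x (a_n, b_n]) recovered from F by inclusion-exclusion."""
    total = Fraction(0)
    for choice in product((0, 1), repeat=len(a)):
        corner = [bk if c else ak for c, ak, bk in zip(choice, a, b)]
        sign = (-1) ** (len(a) - sum(choice))  # one minus sign per lower endpoint a_k used
        total += sign * joint_cdf(m, corner)
    return total


def box_direct(m, a, b):
    """The same box measure computed directly from the point masses."""
    return sum((p for pt, p in m.items()
                if all(ak < t <= bk for t, ak, bk in zip(pt, a, b))), Fraction(0))


print(box_from_cdf(mu, (0, 0), (2, 2)), box_direct(mu, (0, 0), (2, 2)))  # 5/8 5/8
```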
Andrei Nikolaevich Kolmogorov (1903-1987)

Andrei Kolmogorov was born on April 25, 1903, in Tambov, Russia. He attended Moscow State University, graduating from there in 1925. Kolmogorov's contributions to mathematics encompass a formidable range of subjects. A partial listing includes functions of a real variable, trigonometric series, probability theory, theory of algorithms, functional analysis, topology, dynamical systems, information theory, and classical mechanics.

Kolmogorov revolutionized probability theory. He introduced the modern axiomatic approach to probability and proved many of the fundamental theorems that are a consequence of that approach. He also developed two systems of partial differential equations that play a crucial role in the theory of Markov processes.

In addition to his work in higher mathematics, Kolmogorov was interested in the mathematical education of schoolchildren. He was chairman of the Commission for Mathematical Education under the Presidium of the Academy of Sciences of the U.S.S.R. During that time, he was instrumental in the development of a new training program, which was incorporated into the Soviet schools.

Kolmogorov wrote many articles and books. The bibliography of the book Introductory Real Analysis, coauthored with S. V. Fomin, lists some of his publications.

Kolmogorov was a member of the faculty at Moscow State University until his death in 1987.
Elements of Probability

Probability is the mathematical discipline dealing with the analysis of random phenomena. Intuitively, the probability of an event is a measure of the likelihood of its occurrence: a probability near 0 indicates that the event is unlikely to occur, whereas a probability near 1 suggests that the event is likely to occur.

The origins of the theory of probability are usually taken to be in the middle of the seventeenth century, although the basic concepts of probability date back to before the birth of Christ. With the development of the natural sciences in the early 1900s, it became increasingly important for probability to have a formal mathematical framework similar to that found in other branches of mathematics such as geometry and abstract algebra. Measure theory supplied the required framework.

In this chapter, we will introduce the elements of probability theory based on the axiomatic development by Andrei Nikolaevich Kolmogorov. The foundations will be presented in Sections 5.1-5.3. Then, as a first application, we will examine several theorems, known collectively as laws of large numbers, which comprise some of the most important results in probability. We will return to explore probability theory further in other chapters of the text.
262 □ Chapter 5 Elements of Probability

5.1 THE MATHEMATICAL MODEL FOR PROBABILITY

In this section, we will develop the mathematical model for probability based on the theory of measure discussed in Chapter 4. However, before we begin with that development, it will be useful for motivational purposes to provide an interpretation of the meaning of probability. To that end, let us think of an event as some specified result that may or may not occur when an experiment is performed; for example, a head comes up (the event) when a coin is tossed (the experiment). The usual interpretation of probability is the relative-frequency interpretation, which construes the probability of an event to be the relative frequency of its occurrence in a large number of repetitions of the experiment.

More formally, let $E$ be an event and $P(E)$ its probability. For $n$ repetitions of the experiment, let $n(E)$ denote the number of times that event $E$ occurs. The relative-frequency interpretation is that, for large $n$, the proportion of times that event $E$ occurs in the $n$ repetitions of the experiment will be approximately equal to the probability that event $E$ occurs on any particular trial:
$$\frac{n(E)}{n} \approx P(E), \quad \text{for large } n. \tag{5.1}$$

To illustrate, consider the experiment of a single toss of a balanced coin. Because the coin is balanced, we reason that there is a 50-50 chance that the coin will come up heads (i.e., will land with heads facing up). Thus, we attribute probability 0.5 to that event. The relative-frequency interpretation is that in a large number of tosses of the coin, heads will come up about half the time. We used a computer to perform two simulations of tossing a balanced coin 100 times. The results are displayed in Figs. 5.1 and 5.2 and seem to corroborate the relative-frequency interpretation.

FIGURE 5.1
FIGURE 5.2
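A simulation in the spirit of Figs. 5.1 and 5.2 takes only a few lines. The Python sketch below records the running proportion $n(E)/n$ of heads, where $E$ is the event that a toss comes up heads, using Python's `random.Random` as the coin; the seeds are arbitrary, and the output is not meant to reproduce the figures in the text.

```python
import random
from typing import List, Optional


def running_proportions(n_tosses: int, p_heads: float = 0.5,
                        seed: Optional[int] = None) -> List[float]:
    """Return n(E)/n after each toss, where E is the event 'heads'."""
    rng = random.Random(seed)  # private generator, so each seed gives a reproducible run
    heads = 0
    proportions = []
    for n in range(1, n_tosses + 1):
        if rng.random() < p_heads:  # heads with probability p_heads
            heads += 1
        proportions.append(heads / n)
    return proportions


# Two independent runs of 100 tosses of a balanced coin.
for seed in (1, 2):
    props = running_proportions(100, seed=seed)
    print(f"run with seed {seed}: proportion of heads after 100 tosses = {props[-1]:.2f}")
```

Increasing `n_tosses` typically pulls the final proportion closer to 0.5, which is the behavior that (5.1) describes informally.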
5.1 The Mathematical Model for Probability □ 263

We should emphasize that all attempts to use (5.1) as a definition of probability have failed. Nonetheless, the relative-frequency interpretation is invaluable for motivational purposes in the axiomatic development. Furthermore, we shall see that once the axioms of probability are in place, a mathematically precise version of (5.1) can be proved as a theorem.

Probability Spaces

Consider now an experiment whose outcome cannot be predicted with certainty beforehand. Such an experiment is called a random experiment. The set of possible outcomes of the experiment is called the sample space and is usually denoted by the English letter $S$ or the Greek letter $\Omega$; we will use the latter notation.¹ The possible outcomes themselves are denoted generically by the Greek letter $\omega$.

Actually, we will permit as a sample space any set containing all the possible outcomes of the experiment. This is because, a priori, it is sometimes difficult to know precisely the possible outcomes of an experiment. For instance, consider the experiment of rolling a die once and observing the number of dots on the face pointing up. The most natural choice for the sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$. However, it is conceivable, because of, say, an imperfection in the die, that four would never come up. The important factor in the choice of a sample space is that all possible outcomes are included as elements, not that all elements are possible outcomes.

Associated with a random experiment is a collection of events, usually called the event class, which we will denote by $\mathcal{A}$. The assumption is that any specified event will either occur or not occur when the experiment is performed. Each event $E \in \mathcal{A}$ can be considered a subset of the sample space; namely, the collection of outcomes that satisfy the conditions for the occurrence of $E$.
Using this identification between events and sets, we see that an event $E$ occurs if and only if the outcome of the experiment, $\omega$, is a member of $E$, that is, $\omega \in E$.

We should point out that the empty set, $\emptyset$, corresponds to an event that cannot occur and is called the impossible event. Two events, $A$ and $B$, are called mutually exclusive if their joint occurrence is impossible; in other words, if $A$ and $B$ are disjoint. More generally, if each pair of events among a collection of events is mutually exclusive, then we say that the events in the collection are pairwise mutually exclusive.

¹ The term outcome space is more descriptive than "sample space," but we will adhere to the traditional terminology.
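The identification of events with subsets of $\Omega$ translates directly into set operations. The Python sketch below uses the die example above, $\Omega = \{1,\dots,6\}$; the event names and helper functions are our own, chosen for illustration. An event occurs exactly when the outcome belongs to it, the complement of an event is its nonoccurrence, and pairwise mutual exclusivity is pairwise disjointness.

```python
# Sample space for one roll of a die, and some events viewed as subsets of it.
omega = frozenset(range(1, 7))
even = frozenset({2, 4, 6})
odd = omega - even                 # complement of "even": the event that "even" does not occur
at_least_five = frozenset({5, 6})
impossible = frozenset()           # the empty set, i.e., the impossible event


def occurs(event, outcome):
    """An event occurs exactly when the observed outcome is a member of it."""
    return outcome in event


def pairwise_mutually_exclusive(*events):
    """True when every pair of the given events is disjoint."""
    return all(a.isdisjoint(b) for i, a in enumerate(events) for b in events[i + 1:])


outcome = 4
print(occurs(even, outcome), occurs(odd, outcome))       # True False
print(pairwise_mutually_exclusive(even, odd))            # True
print(pairwise_mutually_exclusive(even, at_least_five))  # False (both contain 6)
```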
EXAMPLE 5.1 Illustrates Sample Spaces and Events
a) Consider the experiment of tossing a coin three times. A sample space for the experiment is
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\},$$
where, for instance, HTT denotes the outcome of a head on the first toss and tails on the second and third tosses. Then, for instance, the event $E$ that the first two tosses are heads consists of the two outcomes HHH and HHT. In other words, $E = \{HHH, HHT\}$. Now, let $F$ be the event that exactly two of the three tosses are tails. Clearly, it is not possible for both $E$ and $F$ to occur when the experiment is performed; hence, $E$ and $F$ are mutually exclusive. We can see this fact set theoretically by noting that $F = \{HTT, THT, TTH\}$ and, so, $E \cap F = \emptyset$.
b) Suppose that, starting at 6:00 PM, we observe the elapsed time, in hours, until the first patient arrives at a certain emergency room. For this experiment, we can take the sample space to be the nonnegative real numbers: $\Omega = [0,\infty)$. Then, for instance, the event $E$ that the first patient arrives between 6:15 and 6:30 PM, inclusive, consists of all real numbers between 1/4 and 1/2, inclusive; that is, $E = [1/4, 1/2]$. □

Next we need to decide what properties an event class $\mathcal{A}$ must have. First of all, if $E \in \mathcal{A}$ (i.e., $E$ is an event), then we can speak of the occurrence or nonoccurrence of $E$. However, the nonoccurrence of $E$ is equivalent to the occurrence of the complement of $E$. Hence, if $E \in \mathcal{A}$, then we require that $E^c \in \mathcal{A}$.

Suppose that $A, B \in \mathcal{A}$. Then we can speak of the occurrence of each of the two events individually. Hence, it should be meaningful to speak of the occurrence of at least one of the two events. But the occurrence of at least one of $A$ and $B$ is equivalent to the occurrence of the union of $A$ and $B$. Thus, if $A, B \in \mathcal{A}$, then we require that $A \cup B \in \mathcal{A}$; that is, $\mathcal{A}$ should be closed under finite unions.
For mathematical reasons, we will impose the stronger requirement that $\mathcal{A}$ be closed under countable unions. To summarize, we see that the event class $\mathcal{A}$ should be closed under complementation and countable unions. In other words, $\mathcal{A}$ should be a σ-algebra of subsets of $\Omega$.

We now turn our attention to probability. In the axiomatic treatment of probability, we assume that to each event $E$ there corresponds a number $P(E)$, representing the probability that event $E$ occurs. Thus, we can think of $P$ as a set function defined on the collection $\mathcal{A}$ of events. We will employ the relative-frequency interpretation of probability in order to delineate the properties required of $P$.

So, assume that the experiment is repeated a large number, $n$, of times. Then, by (5.1), $P(E) \approx n(E)/n$ for each event $E$. Clearly, $n(E)/n \ge 0$
and, consequently, we require that $P(E) \ge 0$ for each event $E$. In other words, probabilities should be nonnegative numbers, an obvious restriction. Note also that since $\Omega$ contains all of the possible outcomes of the experiment, it must occur every time the experiment is performed. Hence, $n(\Omega)/n = 1$, which means that we should have $P(\Omega) = 1$, another obvious condition. Further, since $\emptyset$ represents an impossibility, $n(\emptyset)/n = 0$, which means that we should have $P(\emptyset) = 0$, again an obvious condition.

Finally, suppose that $A$ and $B$ are mutually exclusive (disjoint) events. Then we have $n(A \cup B) = n(A) + n(B)$ and, consequently, by (5.1),
$$P(A \cup B) \approx \frac{n(A \cup B)}{n} = \frac{n(A) + n(B)}{n} = \frac{n(A)}{n} + \frac{n(B)}{n} \approx P(A) + P(B).$$
Hence, we require $P$ to be finitely additive. Again, for mathematical reasons, we will impose the stronger condition of countable additivity. This and the previous paragraph indicate that $P$ should be a probability measure on the σ-algebra $\mathcal{A}$ of events.

In summary, the mathematical model for a random experiment consists of a set $\Omega$, containing the possible outcomes of the experiment; a σ-algebra $\mathcal{A}$ of subsets of $\Omega$, representing the collection of events; and a probability measure $P$ on $\mathcal{A}$, where, for each $E \in \mathcal{A}$, $P(E)$ is interpreted as the probability that event $E$ occurs. As we learned in Example 4.1(h) on page 169, the triple $(\Omega,\mathcal{A},P)$ is called a probability space.

DEFINITION 5.1 Probability Space
A probability space is a triple $(\Omega,\mathcal{A},P)$, where $\Omega$ is a set, $\mathcal{A}$ is a σ-algebra of subsets of $\Omega$, and $P$ is a probability measure on $\mathcal{A}$.

The following examples illustrate the discussion of probability spaces. We leave any remaining details as exercises for the reader.

EXAMPLE 5.2 Illustrates Definition 5.1
a) Refer to Example 5.1(a) on page 264. In this case, we take $\mathcal{A} = \mathcal{P}(\Omega)$ so that every subset of $\Omega$ is an event.
If the coin is balanced, then, by symmetry, each possible outcome should be equally likely, implying that each has probability 1/8. This, in turn, implies that the appropriate probability measure is $P = N/8$, where $N$ is counting measure on $\mathcal{P}(\Omega)$. In other words, for each $E \in \mathcal{A}$, $P(E) = N(E)/8$, where $N(E)$ denotes the number of elements of $E$.
b) Suppose that $\Omega$ is a countable sample space, that is, the experiment has either a finite or countably infinite number of possible outcomes, say $\omega_1, \omega_2, \ldots$. For a countable sample space, we always take $\mathcal{A} = \mathcal{P}(\Omega)$. Let $p_n = P(\{\omega_n\})$. Then $P = \sum_n p_n \delta_{\omega_n}$; that is, for $E \in \mathcal{A}$,
$$P(E) = \sum_{\omega_n \in E} p_n.$$
c) As a special case of part (b), suppose that $\Omega$ is finite and that each possible outcome is equally likely. Then we must have $p_n = 1/N(\Omega)$ for $n = 1, 2, \ldots, N(\Omega)$ and, moreover,
$$P(E) = \frac{N(E)}{N(\Omega)}$$
for each event, $E$. This probability model is often referred to as the discrete uniform model. It can be used as the mathematical model for selecting a point at random from the finite set $\Omega$.
d) Suppose that $\Omega$ is a bounded Lebesgue measurable subset of $\mathcal{R}^n$ having positive Lebesgue measure and let $\mathcal{A} = \{\, \Omega \cap M : M \in \mathcal{M}_n \,\}$. For $E \in \mathcal{A}$, define $P(E) = \lambda_n(E)/\lambda_n(\Omega)$. Then $(\Omega, \mathcal{A}, P)$ is a probability space. This probability model is often referred to as the continuous uniform model. It can be used as the mathematical model for selecting a point at random from the set $\Omega$. □

Because a probability space is, in particular, a finite measure space, every property of finite measure spaces holds immediately for probability spaces. For future reference, we list some of the more important properties of probability measures in Proposition 5.1.

PROPOSITION 5.1
Suppose that $(\Omega, \mathcal{A}, P)$ is a probability space and that $A$, $B$, and $E$ are events, that is, $A, B, E \in \mathcal{A}$. Then the following hold:
a) If $A \subset B$, then $P(B \setminus A) = P(B) - P(A)$.
b) $P(E^c) = 1 - P(E)$.
c) $A \subset B \Rightarrow P(A) \le P(B)$.
d) $0 \le P(E) \le 1$.
e) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
f) If $\{E_n\}_{n=1}^\infty \subset \mathcal{A}$ and $E_1 \supset E_2 \supset \cdots$, then $P\left(\bigcap_{n=1}^\infty E_n\right) = \lim_{n\to\infty} P(E_n)$.
g) If $\{E_n\}_{n=1}^\infty \subset \mathcal{A}$ and $E_1 \subset E_2 \subset \cdots$, then $P\left(\bigcup_{n=1}^\infty E_n\right) = \lim_{n\to\infty} P(E_n)$.
h) If $\{E_n\}_n \subset \mathcal{A}$, then $P\left(\bigcup_n E_n\right) \le \sum_n P(E_n)$.
In probability, this last property is called Boole's inequality.

Conditional Probability

Frequently, we need to obtain the probability of an event, $B$, under the condition that another event, $A$, has occurred. For instance, consider the experiment of selecting an adult American at random. We might be interested in the probability that the person selected is a Democrat (event $B$). But we also might want to know the probability that the person selected is a Democrat assuming that the person selected is a female (event $A$). The former probability, as we know, is denoted by $P(B)$. On the other hand, the latter probability is denoted by $P(B \mid A)$, read “the probability of $B$ given $A$,” and is called the conditional probability of event $B$ given that event $A$ has occurred.

More generally, we can refer to the relative-frequency interpretation of probability in order to obtain a formal definition of conditional probability, that is, a definition in terms of the original probability space, $(\Omega, \mathcal{A}, P)$. So, assume that the experiment is repeated a large number, $n$, of times. Let $E$ be an event with nonzero probability. Given that event $E$ occurs, an event $F$ will occur if and only if event $E \cap F$ occurs. Consequently, in the $n$ repetitions of the experiment, the relative frequency of occurrence of event $F$ among those times in which event $E$ has occurred equals $n(E \cap F)/n(E)$. But, by (5.1),
$$\frac{n(E \cap F)}{n(E)} = \frac{n(E \cap F)/n}{n(E)/n} \approx \frac{P(E \cap F)}{P(E)}.$$
Therefore, we make the following definition:

DEFINITION 5.2 Conditional Probability
Let $(\Omega, \mathcal{A}, P)$ be a probability space and $E \in \mathcal{A}$ with $P(E) > 0$. Then, for $F \in \mathcal{A}$, the conditional probability of event $F$ given that event $E$
has occurred is defined by
$$P(F \mid E) = \frac{P(E \cap F)}{P(E)}.$$

EXAMPLE 5.3 Illustrates Definition 5.2
Refer to Examples 5.1(a) and 5.2(a). Suppose that a balanced coin is tossed three times. Let $B$ denote the event that a total of two heads are tossed and $A$ denote the event that the first toss is a head. We have $A = \{HHH, HHT, HTH, HTT\}$ and $B = \{HHT, HTH, THH\}$. Consequently, the conditional probability of event $B$ given that event $A$ has occurred is
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(\{HHT, HTH\})}{P(\{HHH, HHT, HTH, HTT\})} = \frac{2/8}{4/8} = 0.5.$$
Observe that the (unconditional) probability of event $B$ is $P(B) = 3/8 = 0.375$. Hence, the information that event $A$ has occurred affects the probability that event $B$ occurs. □

The next proposition, whose proof we leave as an exercise for the reader, shows that, for fixed $E$, the set function $P(\cdot \mid E)$ is a probability measure on $\mathcal{A}$. That probability measure provides the likelihood of events under the condition that event $E$ has occurred.

PROPOSITION 5.2
Let $(\Omega, \mathcal{A}, P)$ be a probability space and $E \in \mathcal{A}$ with $P(E) > 0$. Define $P_E$ on $\mathcal{A}$ by $P_E(A) = P(A \mid E)$. Then $P_E$ is a probability measure.

Independent Events

Next we will define independence for events. Intuitively, event $F$ is independent of event $E$ if the occurrence or nonoccurrence of event $E$ does not affect the probability of $F$, that is, if $P(F \mid E) = P(F)$. In view of Definition 5.2, this is equivalent to the condition that $P(E \cap F)/P(E) = P(F)$. Clearing fractions yields the equation $P(E \cap F) = P(E)P(F)$. This last equation has the advantages of being symmetric in $E$ and $F$ and of not requiring the event $E$ to have positive probability. Hence, we make the following definition:
DEFINITION 5.3 Independent Events
Two events, $E$ and $F$, are said to be independent† if
$$P(E \cap F) = P(E)P(F).$$
If $E$ and $F$ are not independent, then they are called dependent.

EXAMPLE 5.4 Illustrates Definition 5.3
Refer to Example 5.3. Suppose that a balanced coin is tossed three times. Let $A$ denote the event that the first toss is a head, $B$ the event that a total of two heads are tossed, and $C$ the event that the last two tosses are heads. We have $P(A) = 4/8 = 0.5$, $P(B) = 3/8 = 0.375$, $P(C) = 2/8 = 0.25$, $P(A \cap B) = 2/8 = 0.25$, and $P(A \cap C) = 1/8 = 0.125$. It follows that $P(A \cap B) \ne P(A)P(B)$ and $P(A \cap C) = P(A)P(C)$. Hence, events $A$ and $B$ are dependent, while events $A$ and $C$ are independent. □

We have defined independence for two events. For more than two events, we must be careful to distinguish between two types of independence: pairwise independence and mutual independence. Events $A_1, A_2, \ldots, A_n$ are said to be pairwise independent if, for $i \ne j$, $A_i$ and $A_j$ are independent in the sense of Definition 5.3. In probability theory, however, the concept of mutual independence plays a more prominent role.

DEFINITION 5.4 Mutually Independent Events
Let $(\Omega, \mathcal{A}, P)$ be a probability space. Events $A_1, A_2, \ldots, A_n$ are said to be mutually independent if for each subset $\{i_1, i_2, \ldots, i_m\}$ of $\{1, 2, \ldots, n\}$, we have
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_m}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_m}).$$
The events of an arbitrary (not necessarily finite) collection are called mutually independent if every finite number of them are mutually independent.

† The terms statistically independent, stochastically independent, and probabilistically independent are also used.
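As a small numerical companion to Definitions 5.3 and 5.4, the following Python sketch checks both notions of independence by brute force on a finite, equally likely sample space (the discrete uniform model of Example 5.2(c)). The events are our own illustration, not taken from the text: toss a balanced coin twice and let $A$ be “first toss is a head,” $B$ be “second toss is a head,” and $C$ be “both tosses agree.” The helper names are hypothetical; exact arithmetic with `fractions.Fraction` keeps the equality tests meaningful.

```python
from fractions import Fraction
from itertools import combinations, product

# Two tosses of a balanced coin; all four outcomes are equally likely.
omega = ["".join(t) for t in product("HT", repeat=2)]

def prob(event):
    """Discrete uniform model: P(E) = N(E) / N(Omega)."""
    return Fraction(len(event), len(omega))

def product_rule_holds(events):
    """True if P(E_1 & ... & E_m) equals P(E_1)...P(E_m) for these events."""
    common = set(omega)
    rhs = Fraction(1)
    for e in events:
        common &= e
        rhs *= prob(e)
    return prob(common) == rhs

def pairwise_independent(events):
    return all(product_rule_holds(pair) for pair in combinations(events, 2))

def mutually_independent(events):
    # Definition 5.4: every subcollection must satisfy the product rule
    # (subcollections of size 1 hold trivially, so start at size 2).
    return all(product_rule_holds(sub)
               for m in range(2, len(events) + 1)
               for sub in combinations(events, m))

A = {w for w in omega if w[0] == "H"}    # first toss is a head
B = {w for w in omega if w[1] == "H"}    # second toss is a head
C = {w for w in omega if w[0] == w[1]}   # both tosses agree

print(pairwise_independent([A, B, C]))   # True
print(mutually_independent([A, B, C]))   # False
```

The three events satisfy every pairwise product rule, yet $P(A \cap B \cap C) = 1/4 \ne 1/8 = P(A)P(B)P(C)$, so the check fails on the subcollection of size three. Exercise 5.18(a) asks for an example of the same kind with dice.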
Note: Although mutually independent events are pairwise independent, the converse is not true. See Exercise 5.18(a).

One advantage of mutual independence over pairwise independence is that, with mutual independence, events formed by set operations on disjoint subcollections are also mutually independent. For example, if $E$, $F$, and $G$ are mutually independent events, then $E \cup F$ and $G$ are also independent events.

The following theorem plays a crucial role in many probabilistic arguments. In interpreting the theorem, observe that for a sequence of events, $\{A_n\}_{n=1}^\infty$, the event $\bigcap_{n=1}^\infty \left( \bigcup_{k=n}^\infty A_k \right)$ occurs if and only if infinitely many of the $A_n$s occur.

THEOREM 5.1 Borel-Cantelli Lemma
Suppose that $(\Omega, \mathcal{A}, P)$ is a probability space and that $\{A_n\}_{n=1}^\infty \subset \mathcal{A}$.
a) If $\sum_{n=1}^\infty P(A_n) < \infty$, then $P\left( \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty A_k \right) = 0$.
b) If $A_1, A_2, \ldots$ are mutually independent and $\sum_{n=1}^\infty P(A_n) = \infty$, then $P\left( \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty A_k \right) = 1$.

PROOF: For convenience, set $E_n = \bigcup_{k=n}^\infty A_k$.
a) We have $E_1 \supset E_2 \supset \cdots$ and $\bigcap_{n=1}^\infty E_n = \bigcap_{n=1}^\infty \left( \bigcup_{k=n}^\infty A_k \right)$. Applying Proposition 5.1(f) and Boole's inequality, we obtain
$$P\left( \bigcap_{n=1}^\infty E_n \right) = \lim_{n\to\infty} P(E_n) \le \lim_{n\to\infty} \sum_{k=n}^\infty P(A_k) = 0,$$
where the last equation holds because $\sum_{n=1}^\infty P(A_n) < \infty$.
b) In this part, we will use the fact that, for $x \ge 0$, $e^{-x} \ge 1 - x$. Let $n \in \mathcal{N}$ be fixed but arbitrary. Applying Proposition 5.1(f) to the sequence of events $\bigcap_{k=n}^m A_k^c$ for $m = n, n+1, \ldots$, and using Exercise 5.20(b), we
get that
$$\begin{aligned} P\left( \bigcap_{k=n}^\infty A_k^c \right) &= \lim_{m\to\infty} P\left( \bigcap_{k=n}^m A_k^c \right) = \lim_{m\to\infty} \prod_{k=n}^m P(A_k^c) = \lim_{m\to\infty} \prod_{k=n}^m \left[ 1 - P(A_k) \right] \\ &\le \lim_{m\to\infty} \prod_{k=n}^m e^{-P(A_k)} = \lim_{m\to\infty} \exp\left( -\sum_{k=n}^m P(A_k) \right) = 0, \end{aligned}$$
where the last equality holds since $\sum_{k=n}^\infty P(A_k) = \infty$. Consequently, because $E_n^c = \bigcap_{k=n}^\infty A_k^c$, we have $P(E_n) = 1$ for each $n \in \mathcal{N}$. The required result now follows easily.

EXERCISES 5.1

5.1 Marilyn vos Savant publishes a column in Parade magazine. A variation of the following problem appeared in her column and caused tremendous controversy among the mathematical community: On a game show, there are three doors, behind each of which there is one prize. Two of the prizes are worthless and one is valuable. A contestant selects one of the doors, following which the game-show host, who knows where the valuable prize lies, opens one of the remaining two doors to reveal a worthless prize. The host then offers the contestant the opportunity to change his selection. Should he switch? Hint: Use the relative-frequency interpretation of probability.

5.2 Refer to Example 5.2 on page 265. Provide the details for parts (a)-(d) of that example.

5.3 Suppose that $(\Gamma, \mathcal{S}, \mu)$ is a measure space and that $\Omega \in \mathcal{S}$ is such that $0 < \mu(\Omega) < \infty$. Let $\mathcal{A} = \mathcal{S}_\Omega$ and, for $E \in \mathcal{A}$, define $P(E) = \mu(E)/\mu(\Omega)$. Show that $(\Omega, \mathcal{A}, P)$ is a probability space.

5.4 Refer to Example 5.1(b) on page 264. As in the example, let $\Omega = [0, \infty)$ and set $\mathcal{A} = \mathcal{M}_{[0,\infty)}$. Experience shows that the probability is $1 - e^{-7t}$ that the first patient arrives within $t$ hours of 6:00 PM.
a) Prove that there exists a unique probability measure on $\mathcal{A}$ consistent with the previous sentence.
b) Determine explicitly the probability measure in part (a).
c) Determine the probability that the first patient arrives between 6:15 and 6:30 PM.

5.5 Provide the proof for Proposition 5.1 on page 266. You may cite any theorems from Chapter 4.

5.6 Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\{A_n\}_n$ a sequence of events with $P(A_n) = 1$ for each $n$. Prove that $P\left(\bigcap_n A_n\right) = 1$.
5.7 Use induction to prove the following generalization of Proposition 5.1(e): If $E_1, E_2, \ldots, E_n$ are $n$ events, then
$$\begin{aligned} P(E_1 \cup E_2 \cup \cdots \cup E_n) = {} & \sum_{i=1}^n P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} \cap E_{i_2}) + \cdots \\ & + (-1)^{k+1} \sum_{i_1 < i_2 < \cdots < i_k} P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k}) + \cdots \\ & + (-1)^{n+1} P(E_1 \cap E_2 \cap \cdots \cap E_n). \end{aligned}$$

5.8 Suppose that a coin has probability $p$ of coming up heads, where $0 < p < 1$. Consider the experiment of tossing the coin until a head appears.
a) Determine a sample space for this experiment.
b) Assign probabilities to each of the possible outcomes.
c) Construct a probability space for the experiment.
d) Repeat parts (a)-(c) if $p = 1$.
e) Repeat parts (a)-(c) if $p = 0$.

5.9 Consider the experiment of rolling two balanced dice.
a) Construct a probability space for the experiment.
b) Determine the probability of rolling doubles, that is, of both dice coming up the same number.
c) Use Definition 5.2 to obtain the conditional probability of rolling doubles given that the sum of the dice is four.
d) Solve part (c) without using Definition 5.2 but instead by constructing a new sample space based upon the condition that the sum of the dice is four.

5.10 Suppose that two cards are selected at random from an ordinary deck of 52 playing cards, where the first card selected is not replaced prior to the drawing of the second card.
a) Employ counting techniques to determine the number of possible outcomes of the experiment.
b) Use Definition 5.2 and counting techniques to obtain the conditional probability that the second card selected is a heart given that the first card selected is a heart.
c) Solve part (b) without using Definition 5.2 but instead by constructing a new sample space based on the condition that the first card selected is a heart.

5.11 Refer to Exercise 5.4.
a) Determine the probability that the first patient arrives after 6:15 PM.
b) Determine the (conditional) probability that the first patient arrives after 6:15 PM given that the first arrival occurs after 6:10 PM.

5.12 Prove Proposition 5.2 on page 268.
★5.13 Let $(\Omega, \mathcal{A}, P)$ be a probability space. Suppose that $\{E_n\}_n$ is a sequence of pairwise mutually exclusive events with $\bigcup_n E_n = \Omega$.
a) Prove that, for each event $A$,
$$P(A) = \sum_n P(E_n \cap A).$$
b) Assuming also that $P(E_n) > 0$ for each $n$, prove the law of total probability: For each event $A$,
$$P(A) = \sum_n P(A \mid E_n)P(E_n) = \sum_n P_{E_n}(A)P(E_n).$$
c) Assuming also that $P(A) > 0$, prove Bayes' rule (named in honor of the 18th century clergyman, Thomas Bayes): For each $k$,
$$P(E_k \mid A) = \frac{P(A \mid E_k)P(E_k)}{\sum_n P(A \mid E_n)P(E_n)}.$$

5.14 This exercise considers some basic properties of independence.
a) Show that if events $E$ and $F$ are both mutually exclusive and independent, then either $P(E) = 0$ or $P(F) = 0$. Equivalently, two events with positive probability cannot be both mutually exclusive and independent.
b) Show that if events $E$ and $F$ are independent and $E \subset F$, then either $P(E) = 0$ or $P(F) = 1$.

5.15 Refer to Example 5.2(d) on page 266. Take $n = 1$ and $\Omega = [0, 1]$. Suppose that $[a, b]$ is a nonempty, proper subinterval of $[0, 1]$. Determine all subintervals of $[0, 1]$ that are independent of $[a, b]$.

5.16 Suppose that a card is randomly selected from an ordinary deck of 52 playing cards. Let $A$ denote the event that the card selected is a king, $B$ the event that the card selected is a heart, and $C$ the event that the card selected is a face card.
a) Are events $A$ and $B$ independent?
b) Are events $A$ and $C$ independent?

5.17 Refer to Example 5.2(d) on page 266.
a) Let $\Omega = \{\, (x, y) \in \mathcal{R}^2 : 0 \le x, y \le 2 \,\}$. Suppose that a point is selected at random from $\Omega$. Let $A$ denote the event that the $x$-coordinate of the point selected is at most one and let $B$ denote the event that the $y$-coordinate of the point selected is at most 0.5. Determine whether $A$ and $B$ are independent events.
b) Repeat part (a) if $\Omega = \{\, (x, y) \in \mathcal{R}^2 : 0 \le y \le x \le 2 \,\}$.
5.18 Suppose that two balanced dice, one orange and the other black, are rolled. Let
$A$ = event the orange die comes up even;
$B$ = event the black die comes up even;
$C$ = event the sum of the dice is even;
$D$ = event the orange die comes up 1, 2, or 3;
$E$ = event the orange die comes up 3, 4, or 5;
$F$ = event the sum of the dice is 5.
a) Show that the events $A$, $B$, and $C$ are pairwise independent but not mutually independent.
b) Show that $A \cup B$ and $C$ are dependent events.
c) Show that $P(D \cap E \cap F) = P(D)P(E)P(F)$ but that $D$, $E$, and $F$ are not pairwise independent (and, hence, not mutually independent).

5.19 Prove that if $E$ and $F$ are independent events, then so are $E$ and $F^c$.

5.20 Suppose that $A_1, A_2, \ldots, A_n$ are mutually independent events.
a) Prove that $A_1 \cup A_2 \cup \cdots \cup A_{n-1}$ and $A_n$ are independent events. Hint: Use induction.
b) Prove that $P\left(\bigcap_{k=1}^n A_k^c\right) = \prod_{k=1}^n P(A_k^c)$. Hint: Use induction, part (a), and Exercise 5.19.

5.2 RANDOM VARIABLES

When a random experiment is performed, it is often some numerical quantity associated with the outcome that is of interest, rather than the outcome itself. For example, consider the classical (noncasino) game of craps in which two balanced dice are rolled. Each possible outcome of the experiment can be represented as an ordered pair of integers, $(i, j)$, where $i$ and $j$ are the numbers of dots showing on the two dice. But what is of concern here is the sum, $i + j$, not the outcome, $(i, j)$, itself. Similarly, in studying the relationship between height and weight, we might sample individuals from the population. Here we would be interested in the heights and weights of the individuals selected, not the individuals themselves.

In the first example of the previous paragraph, we have a real-valued function, the sum of the two dice, defined on a sample space; and, in the second example, a vector-valued function, (height, weight), defined on a sample space.
Traditionally, in probability, real-valued functions on a sample space are called random variables and vector-valued functions on a sample space are called random vectors. It is also traditional to denote random variables and vectors by uppercase italicized English-alphabet letters near the end of the alphabet.
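To make the craps example concrete, here is a minimal Python sketch in which a random variable is literally a function on the sample space. The names `omega`, `X`, and `preimage` are hypothetical; a finite set of values stands in for a Borel set $B$, which is harmless here because only the values 2 through 12 can occur.

```python
from itertools import product

# Sample space for two dice: ordered pairs (i, j) of dots showing.
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """The quantity of interest in craps: the sum i + j."""
    i, j = outcome
    return i + j

def preimage(rv, values):
    """The event {rv in values}: all outcomes that rv maps into values."""
    return {w for w in omega if rv(w) in values}

print(sorted(preimage(X, {7})))
# [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
```

Asking for the probability that the sum is 7 therefore amounts to measuring this set of six outcomes, which is why such preimages must be events for the probability to make sense.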
Random Variables and Their Distributions

For a rigorous development of random variables and random vectors, we need to be more precise. So, suppose that $(\Omega, \mathcal{A}, P)$ is a probability space and that $X$ is a real-valued function on $\Omega$. Usually, we are interested in the probability that $X$ takes on various values (e.g., the probability that $X$ equals two, or that $X$ exceeds 7.5). More generally, for each Borel set, $B$, we want to know the probability that the value of $X$ is a member of $B$, that is, $P(\{\, \omega : X(\omega) \in B \,\})$. But, for that probability to exist, $\{\, \omega : X(\omega) \in B \,\}$ must be an event. Hence, we make the following definition:

DEFINITION 5.5 Random Variable
Let $(\Omega, \mathcal{A}, P)$ be a probability space. A real-valued function, $X$, on $\Omega$ is called a random variable if $\{\, \omega : X(\omega) \in B \,\} \in \mathcal{A}$ for each $B \in \mathcal{B}$.

Remark: From Exercise 4.21 on page 181, we know that a real-valued function $f$ on $\Omega$ is $\mathcal{A}$-measurable if and only if $f^{-1}(B) \in \mathcal{A}$ for each $B \in \mathcal{B}$. Thus, random variables are just real-valued $\mathcal{A}$-measurable functions. However, as we mentioned in Section 4.2, the term “random variable” is used for measurable functions in the context of probability spaces, even though the measurability (or nonmeasurability) of a function has nothing at all to do with a measure.

In probability, we ordinarily employ the notation $\{X \in B\}$ in place of the more common notations, $X^{-1}(B)$ or $\{\, \omega : X(\omega) \in B \,\}$, because the former notation is more suggestive. Also, for brevity, commas usually replace intersection symbols in probability expressions involving events defined in terms of random variables. For instance, we generally write $P(X \in A, Y \in B)$ instead of $P(\{X \in A\} \cap \{Y \in B\})$.

One of the most important quantities affiliated with a random variable is its probability distribution. Roughly speaking, the probability distribution of a random variable describes the probabilities associated with the various values of the random variable.
More precisely, we have:

DEFINITION 5.6 Probability Distribution
Let $X$ be a random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then the probability distribution of $X$, denoted $\mu_X$, is the set function on $\mathcal{B}$ defined by $\mu_X(B) = P(X \in B)$.
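Definition 5.6 says that $\mu_X$ is obtained by pulling each Borel set back through $X$ and measuring the result with $P$. The following sketch computes $\mu_X$ for the sum of two balanced dice under $P = N/36$. As an assumption made only for illustration, a Borel set $B$ is represented by a Python predicate (a membership test on real numbers), so the same function answers “$X$ equals two” and “$X$ exceeds 7.5” from the discussion above.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two balanced dice

def P(event):
    """P = N/36, with N counting measure on subsets of omega."""
    return Fraction(len(event), len(omega))

def X(w):
    return w[0] + w[1]                         # sum of the dice

def mu_X(B):
    """Probability distribution of X: mu_X(B) = P(X in B).

    B is a predicate on real numbers standing in for a Borel set."""
    return P({w for w in omega if B(X(w))})

print(mu_X(lambda x: x == 2))     # 1/36
print(mu_X(lambda x: x > 7.5))    # 5/12
print(mu_X(lambda x: True))       # 1
```

The last line reflects $\mu_X(\mathcal{R}) = 1$, one of the facts needed to see that $\mu_X$ is a probability measure.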
The proof of the next proposition is left to the reader as an exercise.

PROPOSITION 5.3
Let $X$ be a random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then $\mu_X$ is a probability measure on $\mathcal{B}$.

In the following example, we present some illustrations of random variables and their probability distributions. The reader should supply the required details of verification.

EXAMPLE 5.5 Illustrates Definition 5.6
a) A random variable, $X$, is said to be a discrete random variable if there is a countable set, $K$, such that $P(X \in K) = 1$. For such a random variable, write $K = \{x_n\}_n$. Then the probability distribution of $X$ is given by $\mu_X = \sum_n p_n \delta_{x_n}$, where $p_n = P(X = x_n)$. For a discrete random variable, the function $p_X\colon \mathcal{R} \to [0, 1]$, defined by $p_X(x) = P(X = x)$, is called the probability mass function (pmf) of $X$. Note that $p_X$ is zero on $K^c$ and that $p_X(x_n) = p_n$.
b) Suppose that two balanced dice are rolled. An appropriate probability space is obtained by taking $\Omega = \{\, (i, j) : i, j = 1, 2, \ldots, 6 \,\}$, $\mathcal{A} = \mathcal{P}(\Omega)$, and $P = N/36$, where $N$ is counting measure. Let $X$ denote the sum of the dice. Because $P(X \in \{2, 3, \ldots, 12\}) = 1$, we see that $X$ is a discrete random variable. The pmf of $X$ is
$$p_X(x) = \begin{cases} (x-1)/36, & x = 2, 3, \ldots, 7; \\ (13-x)/36, & x = 8, 9, \ldots, 12; \\ 0, & \text{otherwise}. \end{cases}$$
c) A random variable, $X$, is said to be an absolutely continuous random variable if there is a nonnegative Borel measurable function, $f$, such that $\mu_X(B) = \int_B f \, d\lambda$ for all $B \in \mathcal{B}$.† For such a random variable, we usually write $f = f_X$ and call $f_X$ the probability density function (pdf) of $X$.
d) Suppose that a number is selected at random from the interval $[0, 1]$ and let $X$ denote the number obtained. Then, for $B \in \mathcal{B}$, we have $\mu_X(B) = P(X \in B) = \lambda(B \cap [0, 1]) = \int_B \chi_{[0,1]} \, d\lambda$. Hence, $X$ is an

† In elementary probability courses, absolutely continuous random variables are usually referred to simply as continuous random variables.
However, as we will see in part (e), to be precise we need to include the adjective “absolutely.”
absolutely continuous random variable with pdf $f_X = \chi_{[0,1]}$. Such a random variable is said to have the uniform distribution on $[0, 1]$.
e) A random variable, $X$, is said to be a continuous random variable if $P(X = x) = 0$ for all $x \in \mathcal{R}$. Note that if $X$ is a continuous random variable, then $P(X \in K) = 0$ for each countable subset $K \subset \mathcal{R}$; thus, a continuous random variable is not discrete and vice versa. Also, note that an absolutely continuous random variable is a continuous random variable. However, the converse is not true. See Exercise 5.28.
f) There are random variables that are neither discrete nor continuous. See, for instance, Exercise 5.32. □

Closely associated with the probability distribution of a random variable is the probability distribution function. We define this next.

DEFINITION 5.7 Probability Distribution Function
Let $X$ be a random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then the probability distribution function of $X$, denoted $F_X$, is the real-valued function on $\mathcal{R}$ defined by $F_X(x) = P(X \le x)$.

Remark: From Definitions 5.6 and 5.7, we see immediately that, for a random variable $X$, the probability distribution and probability distribution function are related by the equation $F_X(x) = \mu_X((-\infty, x])$. In other words, the probability distribution function of $X$ is also the distribution function of $\mu_X$, in the sense of Definition 4.19 on page 221.

EXAMPLE 5.6 Illustrates Definition 5.7
a) For a discrete random variable, as described in Example 5.5(a), we have $F_X(x) = \sum_{x_n \le x} p_n = \sum_{t \le x} p_X(t)$.
b) For an absolutely continuous random variable, as described in Example 5.5(c), we have $F_X(x) = \int_{-\infty}^x f_X(t) \, dt$, where, in general, the integral is a Lebesgue integral. □

Clearly, two random variables having the same probability distribution must also have the same probability distribution function. The converse is also true, as the next theorem shows.
THEOREM 5.2
Two random variables having the same probability distribution function have the same probability distribution; that is, $F_X = F_Y \Rightarrow \mu_X = \mu_Y$.
PROOF: Let $F = F_X = F_Y$. By assumption, both of the finite Borel measures, $\mu_X$ and $\mu_Y$, have $F$ as their distribution function. Therefore, by the uniqueness portion of Theorem 4.13 on page 226, we must have $\mu_X = \mu_Y$.

Given a probability measure, $\mu$, on the Borel sets or, equivalently, a distribution function $F$ with $F(\infty) = 1$, does there exist a probability space and a random variable defined thereon whose probability distribution is $\mu$? The answer is yes! See Exercise 5.25.

Random Vectors and Their Distributions

Frequently, we are interested in two or more numerical quantities associated with the outcome of a random experiment, for example, the height and weight of a randomly selected individual. This leads to the notion of a random vector or, equivalently, two or more random variables considered simultaneously.

To begin our discussion of random vectors, we recall that $\mathcal{B}_n$ denotes the σ-algebra generated by the open sets of $\mathcal{R}^n$ and that the members of $\mathcal{B}_n$ are termed $n$-dimensional Borel sets. In Exercise 4.171(a), we showed that $\mathcal{B}_n = \mathcal{B} \times \cdots \times \mathcal{B}$; in other words, $\mathcal{B}_n$ is also the σ-algebra generated by the $n$-dimensional Borel rectangles, that is, sets of the form $B_1 \times \cdots \times B_n$, where $B_k \in \mathcal{B}$, $1 \le k \le n$. With these facts in mind, we now prove Proposition 5.4.

PROPOSITION 5.4
Let $X_1, \ldots, X_n$ be $n$ random variables all defined on the same probability space $(\Omega, \mathcal{A}, P)$. Then $\{\, \omega : (X_1(\omega), \ldots, X_n(\omega)) \in B \,\} \in \mathcal{A}$ for all $B \in \mathcal{B}_n$.

PROOF: Let $B_k \in \mathcal{B}$, $1 \le k \le n$. Because each $X_k$ is a random variable, we have $\{X_k \in B_k\} \in \mathcal{A}$ for $1 \le k \le n$. Therefore, because $\mathcal{A}$ is a σ-algebra,
$$\{\, \omega : (X_1(\omega), \ldots, X_n(\omega)) \in B_1 \times \cdots \times B_n \,\} = \bigcap_{k=1}^n \{X_k \in B_k\} \in \mathcal{A}. \tag{5.2}$$
Now, let
$$\mathcal{F} = \{\, B \in \mathcal{B}_n : \{\, \omega : (X_1(\omega), \ldots, X_n(\omega)) \in B \,\} \in \mathcal{A} \,\}.$$
Since $\mathcal{B}_n$ and $\mathcal{A}$ are σ-algebras, so is $\mathcal{F}$. Furthermore, by (5.2), $\mathcal{F}$ contains all $n$-dimensional Borel rectangles. Thus, $\mathcal{F} = \mathcal{B}_n$.
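Identity (5.2) is the key step in the proof: the event that the vector lands in a rectangle is the intersection of the coordinate events. The following Python sketch checks that identity on the two-dice space for one rectangle. The choice $X_1$ = orange die, $X_2$ = sum of both dice, and the sets $B_1$, $B_2$ are hypothetical, picked only for illustration; a finite check like this illustrates the identity but does not replace the proof.

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # (orange, black)

def X1(w):
    return w[0]            # orange die

def X2(w):
    return w[0] + w[1]     # sum of the dice

def vector_event(B1, B2):
    """{ w : (X1(w), X2(w)) in B1 x B2 }, the left side of (5.2)."""
    rectangle = set(product(B1, B2))
    return {w for w in omega if (X1(w), X2(w)) in rectangle}

def coordinate_events(B1, B2):
    """{X1 in B1} intersected with {X2 in B2}, the right side of (5.2)."""
    return ({w for w in omega if X1(w) in B1}
            & {w for w in omega if X2(w) in B2})

B1, B2 = {1, 2}, {3, 4, 5}
print(vector_event(B1, B2) == coordinate_events(B1, B2))   # True
print(len(vector_event(B1, B2)))                           # 6
```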
In view of Proposition 5.4, we now make the following definition:

DEFINITION 5.8 Joint Probability Distribution
Let $X_1, \ldots, X_n$ be $n$ random variables all defined on the same probability space $(\Omega, \mathcal{A}, P)$. Then the joint probability distribution of $X_1, \ldots, X_n$, denoted $\mu_{X_1, \ldots, X_n}$, is the set function on $\mathcal{B}_n$ defined by
$$\mu_{X_1, \ldots, X_n}(B) = P((X_1, \ldots, X_n) \in B).$$

The proof of the following proposition is left to the reader.

PROPOSITION 5.5
Let $X_1, \ldots, X_n$ be $n$ random variables all defined on the same probability space $(\Omega, \mathcal{A}, P)$. Then $\mu_{X_1, \ldots, X_n}$ is a probability measure on $\mathcal{B}_n$.

Here now are some examples of joint probability distributions. The details of verification should be supplied by the reader.

EXAMPLE 5.7 Illustrates Definition 5.8
a) Random variables $X_1, \ldots, X_n$, all defined on the same probability space $(\Omega, \mathcal{A}, P)$, are said to be jointly discrete if there is a countable set $K \subset \mathcal{R}^n$ such that $P((X_1, \ldots, X_n) \in K) = 1$. It is easy to see that if $X_1, \ldots, X_n$ are jointly discrete, then each $X_k$, $1 \le k \le n$, must be a discrete random variable. The function $p_{X_1, \ldots, X_n}\colon \mathcal{R}^n \to [0, 1]$, defined by
$$p_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n),$$
is called the joint probability mass function (joint pmf) of $X_1, \ldots, X_n$. In this context, each individual pmf, $p_{X_k}$, $1 \le k \le n$, is called a marginal probability mass function (marginal pmf).
b) Random variables $X_1, \ldots, X_n$, all defined on the same probability space $(\Omega, \mathcal{A}, P)$, are said to be jointly absolutely continuous if there is a nonnegative $\mathcal{B}_n$-measurable function $f$ on $\mathcal{R}^n$ such that $\mu_{X_1, \ldots, X_n}(B) = \int_B f \, d\lambda_n$ for all $B \in \mathcal{B}_n$. For such random variables, we usually write $f = f_{X_1, \ldots, X_n}$ and call $f_{X_1, \ldots, X_n}$ the joint probability density function (joint pdf) of $X_1, \ldots, X_n$. It is not too difficult to show that if $X_1, \ldots, X_n$ are jointly absolutely continuous, then each $X_k$, $1 \le k \le n$, must be absolutely continuous. In this context, each individual pdf, $f_{X_k}$, $1 \le k \le n$, is called a marginal probability density function (marginal pdf). □
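For jointly discrete random variables, each marginal pmf can be recovered from the joint pmf by summing over the other coordinates (a standard consequence of countable additivity, stated here without proof). The sketch below tabulates the joint pmf of $X_1$ = first die and $X_2$ = sum for two balanced dice, then recovers both marginals; the pairing of variables and the helper name `marginal` are our own choices for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two balanced dice
p_outcome = Fraction(1, 36)

# Joint pmf of (X1, X2) = (first die, sum), stored only where it is positive.
joint = defaultdict(Fraction)
for i, j in omega:
    joint[(i, i + j)] += p_outcome

def marginal(joint_pmf, k):
    """Marginal pmf of coordinate k: sum the joint pmf over the other coordinate."""
    m = defaultdict(Fraction)
    for x, p in joint_pmf.items():
        m[x[k]] += p
    return dict(m)

p1 = marginal(joint, 0)
p2 = marginal(joint, 1)
print(joint[(3, 7)], p1[3], p2[7])   # 1/36 1/6 1/6
```

The marginal `p2` reproduces the pmf of the sum from Example 5.5(b), namely $\min(x-1, 13-x)/36$ for $x = 2, \ldots, 12$.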
In analyzing jointly distributed random variables, it is useful to generalize the concept of a probability distribution function to apply to several random variables. This is done in Definition 5.9.

DEFINITION 5.9 Joint Probability Distribution Function
Let $X_1, \ldots, X_n$ be $n$ random variables all defined on the same probability space $(\Omega, \mathcal{A}, P)$. Then their joint probability distribution function, denoted $F_{X_1, \ldots, X_n}$, is the real-valued function on $\mathcal{R}^n$ defined by
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n).$$

Remark: From Definitions 5.8 and 5.9, we see that the joint probability distribution and joint probability distribution function are related by the equation
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \mu_{X_1, \ldots, X_n}\big((-\infty, x_1] \times \cdots \times (-\infty, x_n]\big).$$

By the previous remark, it is clear that if $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ have the same joint probability distribution, then they must also have the same joint probability distribution function. That the converse is also true is an immediate consequence of Exercise 4.174 on page 259.

THEOREM 5.3
Two random vectors having the same joint probability distribution function have the same joint probability distribution; that is,
$$F_{X_1, \ldots, X_n} = F_{Y_1, \ldots, Y_n} \Rightarrow \mu_{X_1, \ldots, X_n} = \mu_{Y_1, \ldots, Y_n}.$$

Given a probability measure, $\mu$, on $\mathcal{B}_n$, does there exist a probability space and random variables defined thereon whose joint probability distribution is $\mu$? The answer is yes! See Exercise 5.44.

Independent Random Variables

Next we will discuss independence for random variables. Let us begin by considering two random variables. Intuitively, two random variables are independent if knowing the value of one of the variables does not affect the probability distribution of the other random variable. To be precise, two random variables, $X$ and $Y$, are called independent if for each pair of Borel sets, $A$ and $B$, the events $\{X \in A\}$ and $\{Y \in B\}$ are independent in the sense of Definition 5.3 on page 269, that is, if $P(X \in A, Y \in B) = P(X \in A)P(Y \in B)$.
More generally, we have the following definition:
DEFINITION 5.10 Mutually Independent Random Variables
Random variables $X_1, \ldots, X_n$, all defined on the same probability space $(\Omega, \mathcal{A}, P)$, are said to be mutually independent if
$$P(X_1 \in B_1, \ldots, X_n \in B_n) = P(X_1 \in B_1) \cdots P(X_n \in B_n)$$
for all Borel sets $B_1, \ldots, B_n$. The random variables of an infinite collection are called mutually independent if the random variables of each finite subcollection are mutually independent. In other words, if $I$ is an infinite set, then the random variables $\{X_\iota\}_{\iota \in I}$ are mutually independent if, for each $n \in \mathcal{N}$ and subset $\{\iota_1, \ldots, \iota_n\} \subset I$, the $n$ random variables $X_{\iota_1}, \ldots, X_{\iota_n}$ are mutually independent.

We can also define pairwise independence for random variables: Random variables $\{X_\iota\}_{\iota \in I}$, all defined on the same probability space, are said to be pairwise independent if, for each pair of distinct elements $i, j \in I$, the random variables $X_i$ and $X_j$ are independent. It is easy to see that mutually independent random variables are pairwise independent. However, the converse is not true. See Exercise 5.45(b).

EXAMPLE 5.8 Illustrates Definition 5.10
Consider the experiment of rolling three balanced dice, say, one orange, one green, and one black. Let $X_1$, $X_2$, and $X_3$ denote the number of dots facing up on the orange, green, and black dice, respectively, and let $X_4$ denote the sum of the three dice. Then it is clear intuitively that $X_1$, $X_2$, and $X_3$ are mutually independent but that $X_1$, $X_2$, $X_3$, and $X_4$ are not even pairwise independent. The reader should justify these statements mathematically. □

An important property of mutual independence is that functions of disjoint subcollections of mutually independent random variables are also mutually independent. That is, we have the following proposition:

PROPOSITION 5.6
Suppose that $X_1, \ldots, X_n$ are mutually independent random variables and that $n_j \in \mathcal{N}$, $1 \le j \le k$, with $n_1 < n_2 < \cdots < n_k = n$. Further
suppose that $f_1$ is $\mathcal{B}_{n_1}$-measurable, $f_2$ is $\mathcal{B}_{n_2 - n_1}$-measurable, ..., and $f_k$ is $\mathcal{B}_{n_k - n_{k-1}}$-measurable. Then the random variables
$$f_1(X_1, \ldots, X_{n_1}),\ f_2(X_{n_1+1}, \ldots, X_{n_2}),\ \ldots,\ f_k(X_{n_{k-1}+1}, \ldots, X_{n_k})$$
are mutually independent.

PROOF: We will prove the proposition in case $n_j = j$, $1 \le j \le k = n$. The general case is left as an exercise for the reader. Let $B_j \in \mathcal{B}$ for $1 \le j \le n$. For the special case, we have
$$\begin{aligned} P(f_1(X_1) \in B_1, \ldots, f_n(X_n) \in B_n) &= P(X_1 \in f_1^{-1}(B_1), \ldots, X_n \in f_n^{-1}(B_n)) \\ &= P(X_1 \in f_1^{-1}(B_1)) \cdots P(X_n \in f_n^{-1}(B_n)) \\ &= P(f_1(X_1) \in B_1) \cdots P(f_n(X_n) \in B_n), \end{aligned}$$
as required.

We will now obtain two equivalent conditions for the mutual independence of random variables.

THEOREM 5.4
Suppose that $X_1, \ldots, X_n$ are random variables all defined on the same probability space $(\Omega, \mathcal{A}, P)$. Then $X_1, \ldots, X_n$ are mutually independent if and only if
$$\mu_{X_1, \ldots, X_n} = \mu_{X_1} \times \cdots \times \mu_{X_n}; \tag{5.3}$$
that is, if and only if the joint probability distribution of $X_1, \ldots, X_n$ is equal to the product measure induced by the $n$ marginal probability distributions.

PROOF: Let $B_k$, for $1 \le k \le n$, be any $n$ Borel sets. Suppose first that (5.3) holds. Then
$$\begin{aligned} P(X_1 \in B_1, \ldots, X_n \in B_n) &= \mu_{X_1, \ldots, X_n}(B_1 \times \cdots \times B_n) = (\mu_{X_1} \times \cdots \times \mu_{X_n})(B_1 \times \cdots \times B_n) \\ &= \prod_{k=1}^n \mu_{X_k}(B_k) = \prod_{k=1}^n P(X_k \in B_k). \end{aligned}$$
Hence, $X_1, \ldots, X_n$ are mutually independent.
Conversely, suppose that $X_1, \ldots, X_n$ are mutually independent. Then we have
$$\begin{aligned} \mu_{X_1, \ldots, X_n}(B_1 \times \cdots \times B_n) &= P(X_1 \in B_1, \ldots, X_n \in B_n) = \prod_{k=1}^n P(X_k \in B_k) \\ &= \prod_{k=1}^n \mu_{X_k}(B_k) = (\mu_{X_1} \times \cdots \times \mu_{X_n})(B_1 \times \cdots \times B_n). \end{aligned}$$
Thus, $\mu_{X_1, \ldots, X_n}$ agrees with $\mu_{X_1} \times \cdots \times \mu_{X_n}$ on $n$-dimensional Borel rectangles. Therefore, by Exercise 4.171(b) on page 259, $\mu_{X_1, \ldots, X_n} = \mu_{X_1} \times \cdots \times \mu_{X_n}$.

Our second equivalent condition for mutual independence is, in practice, easier to verify than the one given in Theorem 5.4.

THEOREM 5.5
Suppose that $X_1, \ldots, X_n$ are random variables all defined on the same probability space $(\Omega, \mathcal{A}, P)$. Then $X_1, \ldots, X_n$ are mutually independent if and only if for all $x_1, \ldots, x_n \in \mathcal{R}$,
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n); \tag{5.4}$$
in other words, if and only if the joint probability distribution function of $X_1, \ldots, X_n$ is equal to the product of the marginal probability distribution functions.

PROOF: Let $x_1, \ldots, x_n$ be any $n$ real numbers. Suppose first that $X_1, \ldots, X_n$ are mutually independent. Then
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{k=1}^n P(X_k \le x_k) = \prod_{k=1}^n F_{X_k}(x_k).$$
Hence, (5.4) holds. Conversely, suppose that (5.4) holds. Then we have
$$\mu_{X_1, \ldots, X_n}\big((-\infty, x_1] \times \cdots \times (-\infty, x_n]\big) = \prod_{k=1}^n \mu_{X_k}((-\infty, x_k]).$$
Thus, by Exercise 4.175 on page 259, $\mu_{X_1, \ldots, X_n} = \mu_{X_1} \times \cdots \times \mu_{X_n}$ and, consequently, on account of Theorem 5.4, $X_1, \ldots, X_n$ are mutually independent.

We should point out that special equivalent conditions for mutual independence exist for jointly discrete and jointly absolutely continuous random variables. See Exercises 5.51 and 5.53 for details.
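Theorem 5.5 suggests a concrete test. For integer-valued random variables, each distribution function is constant on every interval $[m, m+1)$, so for values between 1 and 12 it suffices to check (5.4) at the integer points 0 through 12. The Python sketch below, written for the two-dice space with hypothetical helper names, confirms the factorization for the two dice themselves and shows it failing for the orange die and the sum; exact fractions are used so that equality tests are meaningful.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # (orange, black), each outcome 1/36

def P(pred):
    """Probability of the event { w : pred(w) } under P = N/36."""
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))

def cdf(rv):
    return lambda x: P(lambda w: rv(w) <= x)

def joint_cdf(rv1, rv2):
    return lambda x1, x2: P(lambda w: rv1(w) <= x1 and rv2(w) <= x2)

def cdf_factors(rv1, rv2, grid=range(0, 13)):
    """Check (5.4) for two integer-valued random variables on the given grid."""
    F1, F2, F12 = cdf(rv1), cdf(rv2), joint_cdf(rv1, rv2)
    return all(F12(a, b) == F1(a) * F2(b) for a in grid for b in grid)

def orange(w):
    return w[0]

def black(w):
    return w[1]

def total(w):
    return w[0] + w[1]

print(cdf_factors(orange, black))   # True: the two dice are independent
print(cdf_factors(orange, total))   # False, e.g. at (1, 2): 1/36 != 1/6 * 1/36
```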
EXERCISES 5.2
5.21 Prove Proposition 5.3 on page 276.
5.22 Provide the details of verification for parts (a), (b), (d), and (e) of Example 5.5 on page 276.
5.23 Let $(\Omega, \mathcal{A}, P)$ be a probability space and X a random variable defined thereon. Respond True or False to each of the following statements. Justify your answer.
a) If $\Omega$ is countable, then X is a discrete random variable.
b) If the range of X is countable, then X is a discrete random variable.
c) If X is a discrete random variable, then the range of X is countable.
5.24 Prove that X is a continuous random variable if and only if its probability distribution function, $F_X$, is a continuous function on $\mathcal{R}$. Hint: Refer to Exercise 4.169(b).
5.25 Let $\mu$ be a probability measure on $\mathcal{B}$. Show that there exists a probability space and a random variable defined thereon whose probability distribution is $\mu$. Hint: Define an appropriate random variable on $(\mathcal{R}, \mathcal{B}, \mu)$.
5.26 Refer to Example 5.5(a).
a) Assume X is a discrete random variable with pmf, $p_X$. Let $\{x_n\}_n$ be a sequence of real numbers such that $P(X \in \{x_n\}_n) = 1$ and set $p_n = p_X(x_n)$. Prove that $\{p_n\}_n$ is a sequence of nonnegative real numbers whose sum is one.
b) Conversely, suppose that $\{x_n\}_n$ is a sequence of real numbers and that $\{p_n\}_n$ is a sequence of nonnegative real numbers whose sum is one. Define $p(x) = p_n$ if $x = x_n$ for some n, and $p(x) = 0$ otherwise. Prove that there is a discrete random variable, X, having p as its pmf. Hint: Employ Exercise 5.25.
5.27 Refer to Example 5.5(c).
a) Assume X is an absolutely continuous random variable with pdf, $f_X$. Show that $\int_{\mathcal{R}} f_X \, d\lambda = 1$.
b) Conversely, suppose that f is a nonnegative Borel measurable function such that $\int_{\mathcal{R}} f \, d\lambda = 1$. Prove that there is an absolutely continuous random variable, X, having f as its pdf. Hint: Employ Exercise 5.25.
★5.28 Let $\varphi$ be the Cantor function and define F on $\mathcal{R}$ by
$$F(x) = \begin{cases} 0, & x < 0; \\ \varphi(x), & 0 \le x \le 1; \\ 1, & x > 1. \end{cases}$$
a) Show that F is the probability distribution function of a random variable, X.
b) Prove that the random variable X in part (a) is continuous but not absolutely continuous.
★5.29 An absolutely continuous random variable with pdf $f(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is said to have the standard normal distribution. Suppose that X has the standard normal distribution and let $Y = X^2$.
a) Obtain the probability distribution function of the random variable Y in terms of that of X.
b) Show that Y is absolutely continuous and determine its pdf.
c) Obtain the probability distribution of Y. (This probability distribution is called the chi-square distribution with one degree of freedom.)
★5.30 Suppose that a number is selected at random from the interval $[\alpha, \beta]$ and let X denote the number obtained.
a) Find the probability distribution function of the random variable X.
b) Show that X is absolutely continuous and determine its pdf.
c) Determine the probability distribution of X. (This probability distribution is called the uniform distribution on $[\alpha, \beta]$.)
5.31 Suppose that X has the uniform distribution on $[0, 1]$. Let $m \in \mathcal{N}$ and define $Y = 1 + [mX]$, where $[x]$ denotes the greatest integer in x. Obtain the pmf of Y.
5.32 Construct an example of a random variable, X, that is neither discrete nor continuous. Hint: Let Y have the uniform distribution on $[-1, 1]$ and set $X = Y^+$.
5.33 Suppose that a point is selected at random from the unit disk, that is, from the set $D = \{(x, y) : x^2 + y^2 < 1\}$. Let R denote the distance from the origin to the point obtained.
a) Find the probability distribution function of the random variable R.
b) Show that R is absolutely continuous and determine its pdf.
c) Determine the probability distribution of R.
5.34 Refer to Exercise 5.32. Obtain the probability distribution function of X.
5.35 Prove Proposition 5.5 on page 279.
5.36 Refer to Example 5.7(a) on page 279. Write $K = \ldots$, where $x_j \in \mathcal{R}^n$ for each j. Determine the joint probability distribution of $X_1, \ldots, X_n$.
5.37 Refer to Example 5.7 on page 279.
a) Suppose that X and Y are jointly discrete random variables. Show that, individually, X and Y are discrete random variables and determine their (marginal) probability mass functions in terms of the joint pmf.
b) Suppose that X and Y are jointly absolutely continuous random variables. Show that, individually, X and Y are absolutely continuous random variables and determine their (marginal) probability density functions in terms of the joint pdf.
5.38 Refer to Example 5.7 on page 279. This exercise generalizes the previous one from $n = 2$ to general n.
a) In Example 5.7(a), show that each $X_k$ must be a discrete random variable and obtain its (marginal) pmf in terms of the joint pmf.
b) In Example 5.7(b), show that each $X_k$ must be an absolutely continuous random variable and obtain its (marginal) pdf in terms of the joint pdf.
5.39 Respond True or False to each of the following. Justify your answers.
a) If $X_1, \ldots, X_n$ are discrete random variables all defined on the same probability space, then they are jointly discrete.
b) If $X_1, \ldots, X_n$ are absolutely continuous random variables all defined on the same probability space, then they are jointly absolutely continuous.
5.40 Suppose that $X_1, \ldots, X_n$ are mutually independent, absolutely continuous random variables. Prove that they are jointly absolutely continuous.
★5.41 Suppose that two balanced dice are rolled. Let X and Y be, respectively, the minimum and the maximum of the two numbers observed.
a) Show that X and Y are jointly discrete.
b) Determine the joint pmf of X and Y.
c) Obtain the marginal pmf of X; of Y.
5.42 Suppose that a point is selected at random from the unit square, that is, from the set $S = \{(x, y) : 0 < x, y < 1\}$. Let X and Y denote, respectively, the x- and y-coordinates of the point obtained.
a) Show that X and Y are jointly absolutely continuous.
b) Determine the joint pdf of X and Y.
c) Obtain the marginal pdf of X; of Y.
★5.43 Repeat the previous exercise if S is replaced by the unit disk, D.
5.44 Let $\mu$ be a probability measure on $\mathcal{B}_n$. Show that there exists a probability space and random variables defined thereon whose joint probability distribution is $\mu$.
5.45 This exercise examines the relationship between mutual independence and pairwise independence of random variables.
a) Suppose that $X_1, \ldots, X_n$ are mutually independent random variables. Prove that they are also pairwise independent.
b) Construct an example to show that pairwise independence does not imply mutual independence.
5.46 Provide a detailed verification for all statements made in Example 5.8 on page 281.
5.47 Supply the proof for Proposition 5.6 in the general case.
★5.48 Consider an experiment having two possible outcomes, say, success, s, and failure, f, with respective probabilities p and $q = 1 - p$. Suppose now that
the experiment is repeated independently a finite number of times. Such repetitions are called Bernoulli trials in honor of James Bernoulli.
a) Construct a probability space for a sequence of n Bernoulli trials.
b) Let X denote the total number of successes in n Bernoulli trials. Obtain the pmf and probability distribution of the random variable X. (This probability distribution is called the binomial distribution with parameters n and p.)
★5.49 Refer to Exercise 5.48. Suppose that, for each $n \in \mathcal{N}$, $X_n$ has a binomial distribution with parameters n and $\lambda/n$, where $\lambda$ is a positive constant.
a) Prove that, for each nonnegative integer k,
$$\lim_{n\to\infty} P(X_n = k) = e^{-\lambda}\frac{\lambda^k}{k!}. \qquad (5.5)$$
b) Let $p_k$ denote the quantity on the right-hand side of (5.5). Show that the function defined on $\mathcal{R}$ by $p(x) = p_k$ if $x = k$ for some nonnegative integer k, and $p(x) = 0$ elsewhere, is the probability mass function of a random variable. (The probability distribution of such a random variable is called the Poisson distribution with parameter $\lambda$.)
5.50 Consider an experiment having a finite number, r, of possible outcomes, say, $o_1, \ldots, o_r$, with respective probabilities $p_1, \ldots, p_r$. Suppose now that the experiment is repeated independently a finite number of times. Such repetitions are called multinomial trials.
a) Construct a probability space for a sequence of n multinomial trials.
b) For each k, $1 \le k \le r$, let $X_k$ denote the total number of times that outcome $o_k$ occurs in the n multinomial trials. Determine the joint pmf and the joint probability distribution of the random variables $X_1, \ldots, X_r$. (This probability distribution is called the multinomial distribution with parameters n and $p_1, \ldots, p_r$.)
c) For each k, $1 \le k \le r$, determine the (marginal) probability distribution of $X_k$. Hint: Reformulate the model so that each trial has only two possible outcomes.
5.51 Suppose that $X_1, \ldots, X_n$ are jointly discrete random variables.
Prove that they are mutually independent if and only if their joint probability mass function is equal to the product of the marginal probability mass functions; that is, if and only if $p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = p_{X_1}(x_1)\cdots p_{X_n}(x_n)$ for all $x_1, \ldots, x_n \in \mathcal{R}$.
5.52 Let X and Y be the random variables defined in Exercise 5.41. Apply Exercise 5.51 to determine whether X and Y are independent.
5.53 Suppose that $X_1, \ldots, X_n$ are jointly absolutely continuous random variables. Prove that they are mutually independent if and only if the function f, defined on $\mathcal{R}^n$ by $f(x_1,\ldots,x_n) = f_{X_1}(x_1)\cdots f_{X_n}(x_n)$, is a joint probability density function for $X_1, \ldots, X_n$.
5.54 Apply Exercise 5.53 to determine whether the random variables X and Y are independent, where X and Y are as in
a) Exercise 5.42.
b) Exercise 5.43.
★5.55 Let X and Y be jointly absolutely continuous random variables with joint pdf given by $f_{X,Y}(x, y) = \ldots$, where $0 \le \rho < 1$.
a) Determine the marginal pdf of X and of Y.
b) Show that X and Y are independent if and only if $\rho = 0$.
★5.56 Suppose that X and Y are independent random variables.
a) Prove that $\mu_{X+Y} = \mu_X * \mu_Y$, where $*$ denotes convolution of measures, as defined in Exercise 4.158 on page 256.
b) If X is absolutely continuous, prove that $X + Y$ is absolutely continuous and has pdf $f_{X+Y}(z) = \int_{\mathcal{R}} f_X(z - y)\, d\mu_Y(y)$.
c) If both X and Y are absolutely continuous, prove that $f_{X+Y} = f_X * f_Y$, where $*$ denotes convolution of functions, as defined in Exercise 4.157 on page 256.

5.3 EXPECTATION OF RANDOM VARIABLES

In this section, we will discuss the expectation of a random variable, a concept that is central to the theory of probability and its applications. To motivate the formal definition of expectation, we will first provide an interpretation of its meaning.

The most common interpretation is the long-run-average interpretation, which construes the expectation of a random variable to be the average value of the random variable in a large number of independent observations. More formally, let X be a random variable on a probability space $(\Omega, \mathcal{A}, P)$, and let $\mathcal{E}(X)$ denote its expectation. For n independent repetitions of the experiment, let $X_1, \ldots, X_n$ represent the n values of the random variable X. The long-run-average interpretation is that, for large n, the average value of $X_1, \ldots, X_n$ will be approximately equal to $\mathcal{E}(X)$:
$$\frac{X_1 + \cdots + X_n}{n} \approx \mathcal{E}(X), \quad \text{for large } n. \qquad (5.6)$$

We will now employ (5.6) to motivate the formal definition of expectation for a simple random variable, that is, a random variable that takes on only finitely many values, say, $x_1, \ldots, x_m$.
So, assume that the experiment
is repeated independently a large number, n, of times. Then, in view of the long-run-average interpretation of expectation and the relative-frequency interpretation of probability, we have
$$\mathcal{E}(X) \approx \frac{X_1 + \cdots + X_n}{n} = \sum_{k=1}^m x_k \frac{n(\{X = x_k\})}{n} \approx \sum_{k=1}^m x_k P(X = x_k), \qquad (5.7)$$
where, as usual, $n(E)$ denotes the number of times that an event E occurs in n repetitions of the experiment.

Because of (5.7), we see that the expectation of a simple random variable, X, should be defined by
$$\mathcal{E}(X) = \sum_{k=1}^m x_k P(X = x_k), \qquad (5.8)$$
where $x_1, \ldots, x_m$ are the possible values of X. But the quantity on the right-hand side of (5.8) is the abstract Lebesgue integral of the simple random variable X over $\Omega$ with respect to P. Generalizing now to arbitrary random variables, we make the following definition:

DEFINITION 5.11 Expectation of a Random Variable
Let X be a random variable on a probability space $(\Omega, \mathcal{A}, P)$. Then the expectation of X, denoted $\mathcal{E}(X)$, is defined by
$$\mathcal{E}(X) = \int_\Omega X(\omega)\, dP(\omega), \qquad (5.9)$$
provided the integral on the right-hand side exists. If $X \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$, that is, the integral on the right-hand side of (5.9) exists and is finite, then we say that X has finite expectation.

Remark: Terms used synonymously for expectation are mean, expected value, and first moment.
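The passage from (5.6) to (5.8) can be illustrated numerically. The following Python sketch, an illustration only, assumes a single roll of a balanced die as the experiment; it computes $\mathcal{E}(X)$ exactly from (5.8) and compares it with the average of many simulated independent rolls. The function names and the fixed seed are choices made here, and a simulated average only approximates $\mathcal{E}(X)$; it proves nothing.

```python
import random
from fractions import Fraction


def expectation_simple(pmf):
    """Formula (5.8): the sum of x_k * P(X = x_k) over the finitely many values of X."""
    return sum((x * p for x, p in pmf.items()), Fraction(0))


def long_run_average(sample, n, seed=0):
    """Left side of (5.6): the average of n independent observations of X."""
    rng = random.Random(seed)
    return sum(sample(rng) for _ in range(n)) / n


# A balanced die: X takes each of the values 1, ..., 6 with probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
exact = expectation_simple(die)


def roll(rng):
    """One observation of X."""
    return rng.randint(1, 6)


approx = long_run_average(roll, 100_000)
print(exact)   # 7/2
print(approx)  # a float close to 3.5; the exact digits depend on the seed
```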
EXAMPLE 5.9 Illustrates Definition 5.11
a) Suppose that two balanced dice are rolled and let X denote the sum of the dice. Note that X is a simple random variable, taking on the values 2, 3, ..., 12. And, as we found in Example 5.5(b) on page 276,
$$P(X = k) = \begin{cases} (k-1)/36, & k = 2, 3, \ldots, 7; \\ (13-k)/36, & k = 8, 9, \ldots, 12. \end{cases}$$
Therefore, the expectation of X equals
$$\mathcal{E}(X) = \int_\Omega X\, dP = \sum_{k=2}^{12} k P(X = k) = 2\cdot\frac{1}{36} + 3\cdot\frac{2}{36} + \cdots + 12\cdot\frac{1}{36} = 7.$$
b) Suppose that a point is selected at random from the unit disk, that is, from the set $D = \{(x, y) : x^2 + y^2 < 1\}$. Let R denote the distance from the origin to the point obtained. Referring to Example 5.2(d) on page 266, we see that an appropriate probability space for the experiment is $(\Omega, \mathcal{A}, P)$, where $\Omega = D$, $\mathcal{A} = \{D \cap M : M \in \mathcal{M}_2\}$, and $P(A) = \pi^{-1}\lambda_2(A)$ for $A \in \mathcal{A}$. Referring to Theorem 4.18 on page 251, we have
$$\mathcal{E}(R) = \int_\Omega R\, dP = \frac{1}{\pi}\int_D \sqrt{x^2 + y^2}\, d\lambda_2(x, y) = \frac{1}{\pi}\iint_D \sqrt{x^2 + y^2}\, dx\, dy = \frac{2}{3},$$
where the last equality is easily obtained using polar coordinates: $\frac{1}{\pi}\int_0^{2\pi}\int_0^1 r \cdot r\, dr\, d\theta = \frac{1}{\pi}\cdot 2\pi \cdot \frac{1}{3} = \frac{2}{3}$. □

Since the expectation of a random variable is, by definition, its abstract Lebesgue integral, all properties of abstract Lebesgue integration apply immediately to expectation, for example, linearity, the MCT, and the DCT. On the other hand, because a probability space is a finite measure space, with total measure equal to one, expectation has properties that are not shared by abstract Lebesgue integrals on arbitrary measure spaces. For instance, a bounded random variable has finite expectation, and the expectation of a constant random variable is equal to the constant.

Expectation in Terms of Probability Distributions
All of the probabilistic information about a random variable, X, is contained in its probability distribution, $\mu_X$. This indicates that we should be able to express $\mathcal{E}(X)$, the expectation of X, in terms of $\mu_X$. As a matter of fact, we can do considerably more.
THEOREM 5.6 Let X be a random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then, for each Borel measurable function g on $\mathcal{R}$, we have
$$\mathcal{E}(g(X)) = \int_{\mathcal{R}} g(x)\, d\mu_X(x), \qquad (5.10)$$
in the sense that if one side exists, then so does the other and they are equal.† In particular,
$$\mathcal{E}(X) = \int_{\mathcal{R}} x\, d\mu_X(x). \qquad (5.11)$$

PROOF: We employ the bootstrapping technique. So, suppose first that $g = \chi_B$, where $B \in \mathcal{B}$. Then
$$\mathcal{E}(g(X)) = \mathcal{E}\big(\chi_{\{X \in B\}}\big) = \int_\Omega \chi_{\{X \in B\}}\, dP = P(X \in B) = \mu_X(B) = \int_{\mathcal{R}} \chi_B(x)\, d\mu_X(x) = \int_{\mathcal{R}} g(x)\, d\mu_X(x).$$
Hence, (5.10) holds for characteristic functions.

Next suppose that g is a nonnegative $\mathcal{B}$-measurable simple function, say, $g = \sum_{k=1}^m b_k \chi_{B_k}$. Noting that $g(X) = \sum_{k=1}^m b_k \chi_{\{X \in B_k\}}$, we can apply the linearity property of abstract Lebesgue integrals and the result of the previous paragraph to conclude that (5.10) again holds.

Now assume that g is a nonnegative Borel measurable function. Using Proposition 4.7(a) on page 186, we select a nondecreasing sequence of nonnegative $\mathcal{B}$-measurable simple functions, $\{s_n\}_{n=1}^\infty$, converging pointwise on $\mathcal{R}$ to g. Then it is easy to see that $\{s_n(X)\}_{n=1}^\infty$ is a nondecreasing sequence of nonnegative random variables converging pointwise on $\Omega$ to $g(X)$. Consequently, by the MCT (applied twice) and the result of the previous paragraph, we have
$$\mathcal{E}(g(X)) = \lim_{n\to\infty} \mathcal{E}(s_n(X)) = \lim_{n\to\infty} \int_{\mathcal{R}} s_n(x)\, d\mu_X(x) = \int_{\mathcal{R}} g(x)\, d\mu_X(x).$$
The remainder of the proof proceeds in the usual way and is left as an exercise for the reader.

† Recall that $g(X)$ is another notation for the composition, $g \circ X$, of g with X.
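Theorem 5.6 can be seen concretely for the sum of two balanced dice from Example 5.9(a). The Python sketch below (illustrative only; the function names are hypothetical) evaluates $\mathcal{E}(g(X))$ in two ways: once as an integral over $\Omega$ with respect to P, the left side of (5.10), and once as an integral against $\mu_X$, which here is a weighted sum of point masses at 2, ..., 12, the right side of (5.10). Both computations use exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product(range(1, 7), repeat=2))  # uniform P on two balanced dice


def X(w):
    """The sum of the dice, as in Example 5.9(a)."""
    return w[0] + w[1]


def pmf_X(k):
    """P(X = k) for k = 2, ..., 12, as found in Example 5.9(a)."""
    return Fraction(k - 1, 36) if k <= 7 else Fraction(13 - k, 36)


def expect_over_omega(g):
    """Left side of (5.10): integrate g(X) over Omega with respect to P."""
    return sum((Fraction(g(X(w))) for w in OMEGA), Fraction(0)) / len(OMEGA)


def expect_via_distribution(g):
    """Right side of (5.10): integrate g against mu_X, point masses at 2, ..., 12."""
    return sum((g(k) * pmf_X(k) for k in range(2, 13)), Fraction(0))


def identity(x):
    return x


def square(x):
    return x * x


for g in (identity, square):
    print(expect_over_omega(g), expect_via_distribution(g))
# prints "7 7", then "329/6 329/6"
```

The second route needs only the 11 values of $\mu_X$ instead of the 36 points of $\Omega$, which is the practical content of the remark that expectation depends only on the distribution.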
It is important to note that Theorem 5.6 provides us with two methods for obtaining the expectation of a function of a random variable. Specifically, suppose that Y is a random variable and that h is a Borel measurable function. Applying (5.10) with $X = Y$ and $g = h$, we have the formula
$$\mathcal{E}(h(Y)) = \int_{\mathcal{R}} h(x)\, d\mu_Y(x).$$
On the other hand, by using (5.10) with $X = h(Y)$ and g the identity function, we get the formula
$$\mathcal{E}(h(Y)) = \int_{\mathcal{R}} x\, d\mu_{h(Y)}(x).$$
Generally speaking, the first formula is easier to use because it avoids having to determine the probability distribution of $h(Y)$. However, there are cases when the second formula is more efficient.

We should also point out that Theorem 5.6 implies that the expectation of a function of a random variable depends only on the probability distribution of the random variable. In other words, if X and Y have the same probability distribution, then $\mathcal{E}(g(X)) = \mathcal{E}(g(Y))$ for all Borel measurable functions g.

EXAMPLE 5.10 Illustrates Theorem 5.6
a) Refer to Example 5.5(a) on page 276. Suppose X is a discrete random variable with probability mass function, $p_X$. Let $\{x_n\}_n$ be such that $P(X \in \{x_n\}_n) = 1$ and set $p_n = P(X = x_n)$. Then $\mu_X = \sum_n p_n \delta_{x_n}$ and, hence, by (5.10),
$$\mathcal{E}(g(X)) = \int_{\mathcal{R}} g(x)\, d\mu_X(x) = \sum_n g(x_n) p_n = \sum_x g(x) p_X(x),$$
for each Borel measurable function g.
b) Refer to Example 5.5(c) on page 276. Suppose that X is an absolutely continuous random variable with probability density function, $f_X$. Then $\mu_X(B) = \int_B f_X\, d\lambda$ and, hence, by (5.10),
$$\mathcal{E}(g(X)) = \int_{\mathcal{R}} g(x)\, d\mu_X(x) = \int_{\mathcal{R}} g(x) f_X(x)\, d\lambda(x),$$
for each Borel measurable function g.
c) Let X be a random variable on $(\Omega, \mathcal{A}, P)$ and $n \in \mathcal{N}$. If $X^n \in \mathcal{L}^1(P)$, then we say that X has a finite nth moment and we define the nth moment of X to be $\mathcal{E}(X^n)$. By Theorem 5.6, specifically (5.10), we have $\mathcal{E}(X^n) = \int_{\mathcal{R}} x^n\, d\mu_X(x)$. It can be shown that if X has a finite nth moment, then it has a finite moment of each order less than n. □

Next we will discuss a generalization of Theorem 5.6 to random vectors. The proof of this generalization is essentially identical to that of Theorem 5.6 and is left as an exercise for the reader.

THEOREM 5.7 Let $X_1, \ldots, X_n$ be random variables all defined on the same probability space $(\Omega, \mathcal{A}, P)$. Then, for each $\mathcal{B}_n$-measurable function g on $\mathcal{R}^n$,
$$\mathcal{E}(g(X_1, \ldots, X_n)) = \int_{\mathcal{R}^n} g(x_1, \ldots, x_n)\, d\mu_{X_1,\ldots,X_n}(x_1, \ldots, x_n),$$
in the sense that if one side exists, then so does the other and the two sides are equal.

We will apply Theorem 5.7 to obtain an important result concerning the expectation of the product of random variables. By the linearity property of the abstract Lebesgue integral, we know that the expectation of the sum of two random variables having finite expectations equals the sum of their expectations. Although, in general, the expectation of the product of two random variables does not equal the product of their expectations, we do have the following result.

PROPOSITION 5.7 Suppose that X and Y are independent random variables having finite expectations. Then XY has finite expectation and $\mathcal{E}(XY) = \mathcal{E}(X)\mathcal{E}(Y)$.

PROOF: First note that, on account of Theorem 5.6, we have
$$\int_{\mathcal{R}}\left(\int_{\mathcal{R}} |xy|\, d\mu_Y(y)\right) d\mu_X(x) = \int_{\mathcal{R}} |x|\, d\mu_X(x) \int_{\mathcal{R}} |y|\, d\mu_Y(y) = \mathcal{E}(|X|)\,\mathcal{E}(|Y|) < \infty.$$
Since X and Y are independent, Theorem 5.4 on page 282 implies that $\mu_{X,Y} = \mu_X \times \mu_Y$. Therefore, applying Theorem 5.7, Fubini's theorem, and
Theorem 5.6, we get
$$\mathcal{E}(XY) = \int_{\mathcal{R}^2} xy\, d\mu_{X,Y}(x, y) = \int_{\mathcal{R}}\left(\int_{\mathcal{R}} xy\, d\mu_Y(y)\right) d\mu_X(x) = \int_{\mathcal{R}} x\, d\mu_X(x)\int_{\mathcal{R}} y\, d\mu_Y(y) = \mathcal{E}(X)\,\mathcal{E}(Y).$$
This completes the proof.

We can generalize Proposition 5.7 to n mutually independent random variables. The proof can be accomplished either by employing the n-dimensional version of Fubini's theorem or by using induction, Proposition 5.6 on page 281, and Proposition 5.7.

COROLLARY 5.1 Suppose that $X_1, \ldots, X_n$ are mutually independent random variables having finite expectations. Then the random variable $\prod_{k=1}^n X_k$ also has finite expectation and $\mathcal{E}\big(\prod_{k=1}^n X_k\big) = \prod_{k=1}^n \mathcal{E}(X_k)$.

Variance of a Random Variable
If X is a random variable, then the expectation of $(X - \mathcal{E}(X))^2$ is of particular importance. That quantity is called the variance of X.

DEFINITION 5.12 Variance of a Random Variable
Let X be a random variable having finite expectation. Then we define the variance of X, denoted $\operatorname{Var}(X)$, by
$$\operatorname{Var}(X) = \mathcal{E}\big((X - \mathcal{E}(X))^2\big).$$
If $\operatorname{Var}(X) < \infty$, then X is said to have finite variance. The square root of the variance of X is called the standard deviation of X.

Note: We leave it as an exercise for the reader to prove that $\operatorname{Var}(X) = \mathcal{E}(X^2) - (\mathcal{E}(X))^2$.
It is often simpler to compute the variance of a random variable by using this latter formula. The formula also makes it clear that X has finite variance if and only if it has a finite second moment.

The variance of a random variable, X, is a measure of its dispersion relative to the mean, being the expected value of the square of the distance from X to $\mathcal{E}(X)$. Thus, the smaller the variance, the less likely it is that X will take a value far from its mean. More precisely, we have the following fact:

PROPOSITION 5.8 Chebyshev's Inequality
Suppose that X is a random variable defined on the probability space $(\Omega, \mathcal{A}, P)$ and having finite variance. Then, for each $\epsilon > 0$,
$$P(|X - \mathcal{E}(X)| \ge \epsilon) \le \frac{\operatorname{Var}(X)}{\epsilon^2}. \qquad (5.12)$$

PROOF: We have
$$\begin{aligned}\operatorname{Var}(X) &= \mathcal{E}\big((X - \mathcal{E}(X))^2\big) = \int_\Omega (X - \mathcal{E}(X))^2\, dP \ge \int_{\{|X - \mathcal{E}(X)| \ge \epsilon\}} (X - \mathcal{E}(X))^2\, dP \\ &\ge \int_{\{|X - \mathcal{E}(X)| \ge \epsilon\}} \epsilon^2\, dP = \epsilon^2 P(|X - \mathcal{E}(X)| \ge \epsilon),\end{aligned}$$
as required.

Note: It is trivial that Chebyshev's inequality also holds for random variables having finite expectation and infinite variance, but it is of little value in that case.

Although Chebyshev's inequality is quite easy to prove, it is nonetheless indispensable as a tool in probability theory. The importance of Chebyshev's inequality is due to its universality: it holds for every random variable having finite expectation. And although (5.12) will usually not be sharp, it is the best that can be said in general; see Exercise 5.73.

Variance of a Sum
Many probabilistic arguments require the analysis of the variance of a sum of random variables. We will see this, for instance, in the next section when we discuss laws of large numbers. To begin, it will be useful to make the following definition.
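DEFINITION 5.13 Covariance of Two Random Variables
Suppose that X and Y have finite variances and are defined on the same probability space. Then the covariance of X and Y, denoted $\operatorname{Cov}(X, Y)$, is defined by
$$\operatorname{Cov}(X, Y) = \mathcal{E}\big((X - \mathcal{E}(X))(Y - \mathcal{E}(Y))\big).$$

The finite-variance assumption in Definition 5.13 assures the existence of $\mathcal{E}(XY)$. This is a consequence of Cauchy's inequality, which will be proved in a more general setting in Section 9.2 (see Theorem 9.1). Note that $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$. Also, it follows easily from properties of expectation that $\operatorname{Cov}(X, Y) = \mathcal{E}(XY) - \mathcal{E}(X)\mathcal{E}(Y)$.

We now present a formula for the variance of the sum of a finite number of random variables.

PROPOSITION 5.9 Suppose that $X_1, \ldots, X_n$ have finite variances and are defined on the same probability space. Then $X_1 + \cdots + X_n$ has finite variance and
$$\operatorname{Var}\Big(\sum_{k=1}^n X_k\Big) = \sum_{k=1}^n \operatorname{Var}(X_k) + 2\sum_{i<j} \operatorname{Cov}(X_i, X_j). \qquad (5.13)$$

PROOF: We have
$$\begin{aligned}\operatorname{Var}\Big(\sum_{k=1}^n X_k\Big) &= \mathcal{E}\Big(\Big(\sum_{k=1}^n X_k - \mathcal{E}\Big(\sum_{k=1}^n X_k\Big)\Big)^2\Big) = \mathcal{E}\Big(\Big(\sum_{k=1}^n (X_k - \mathcal{E}(X_k))\Big)^2\Big) \\ &= \mathcal{E}\Big(\sum_{k=1}^n (X_k - \mathcal{E}(X_k))^2 + 2\sum_{i<j} (X_i - \mathcal{E}(X_i))(X_j - \mathcal{E}(X_j))\Big) \\ &= \sum_{k=1}^n \operatorname{Var}(X_k) + 2\sum_{i<j} \operatorname{Cov}(X_i, X_j),\end{aligned}$$
as required.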
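Formula (5.13) is an exact identity, so it can be confirmed with rational arithmetic on a small probability space. In the Python sketch below, which is only an illustration, the space is the uniform space for two balanced dice, and the three random variables (the first die, the second die, and the larger of the two) are choices made here, not in the text. The larger value is correlated with each die, so the covariance terms in (5.13) do not all vanish.

```python
from fractions import Fraction
from itertools import combinations, product

OMEGA = list(product(range(1, 7), repeat=2))  # uniform P on two balanced dice


def E(f):
    """Expectation of the random variable f on the finite space OMEGA."""
    return sum((Fraction(f(w)) for w in OMEGA), Fraction(0)) / len(OMEGA)


def cov(f, g):
    """Cov(f, g) = E((f - E f)(g - E g)), as in Definition 5.13."""
    mf, mg = E(f), E(g)
    return E(lambda w: (f(w) - mf) * (g(w) - mg))


def var(f):
    """Var(f) = Cov(f, f)."""
    return cov(f, f)


def first(w):
    return w[0]


def second(w):
    return w[1]


def larger(w):
    return max(w)


Xs = [first, second, larger]  # the third variable depends on the first two


def total(w):
    return sum(X(w) for X in Xs)


lhs = var(total)
rhs = sum(var(X) for X in Xs) + 2 * sum(cov(Xi, Xj) for Xi, Xj in combinations(Xs, 2))
print(lhs, rhs, lhs == rhs)
```

Dropping `larger` from `Xs` leaves two independent dice, whose covariance is zero, so the formula then reduces to the sum of the variances.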
From (5.13), we see that a significant simplification will occur in the formula for the variance of the sum of random variables if the covariances are all zero. This leads to the following definition and corollaries:

DEFINITION 5.14 Uncorrelated Random Variables
Suppose that X and Y have finite variances and are defined on the same probability space. Then they are said to be uncorrelated if $\operatorname{Cov}(X, Y) = 0$. Random variables $\{X_i\}_{i \in I}$ are called uncorrelated if, for each pair of distinct elements $i, j \in I$, the two random variables $X_i$ and $X_j$ are uncorrelated.

Note: It follows immediately from Proposition 5.7 on page 293 that two independent random variables with finite variances are uncorrelated. The converse, however, is not true. See Exercise 5.80.

COROLLARY 5.2 If $X_1, \ldots, X_n$ are uncorrelated random variables, then
$$\operatorname{Var}\Big(\sum_{k=1}^n X_k\Big) = \sum_{k=1}^n \operatorname{Var}(X_k).$$

COROLLARY 5.3 If $X_1, \ldots, X_n$ are pairwise independent random variables with finite variances, then
$$\operatorname{Var}\Big(\sum_{k=1}^n X_k\Big) = \sum_{k=1}^n \operatorname{Var}(X_k).$$
In particular, the previous equation holds for mutually independent random variables with finite variances.

EXERCISES 5.3
5.57 Let $\Omega$ be a finite set, P a probability measure on $\mathcal{P}(\Omega)$, and X a random variable on $(\Omega, \mathcal{P}(\Omega), P)$. Show that $\mathcal{E}(X) = \sum_{\omega \in \Omega} X(\omega) P(\{\omega\})$, so that $\mathcal{E}(X)$ is a weighted average of the values of X, weighted by probabilities.
5.58 Suppose that a balanced coin is tossed three times. If X denotes the total number of times the coin comes up heads, determine the expectation of the random variable X.
5.59 Suppose that a point is selected at random from the unit square, that is, from the set $S = \{(x, y) : 0 < x, y < 1\}$. Let U denote the larger of the two coordinates of the point obtained. Compute the expectation of the random variable U.
5.60 Provide a detailed verification for parts (a) and (b) of Example 5.10.
5.61 Find the first two moments for a random variable having the
a) uniform distribution on $[\alpha, \beta]$ (refer to Exercise 5.30 on page 285).
b) standard normal distribution (refer to Exercise 5.29 on page 285).
5.62 Let $n \in \mathcal{N}$. Construct a random variable having a finite nth moment but no finite moment of any higher order.
5.63 Suppose X is a random variable with finite nth moment. Prove that X has a finite mth moment for all nonnegative integers $m < n$.
★5.64 Suppose that X is a nonnegative random variable and that $n \in \mathcal{N}$.
a) Prove that $\mathcal{E}(X^n) = n\int_0^\infty x^{n-1} P(X > x)\, dx$. Hint: Express $x^n$ as an integral and apply Tonelli's theorem.
b) If, in addition, X is nonnegative-integer valued, deduce from part (a) that $\mathcal{E}(X^n) = \sum_{k=1}^\infty \big(k^n - (k-1)^n\big) P(X \ge k)$.
5.65 Prove Theorem 5.7 on page 293.
5.66 Show that, in general, the expectation of the product of two random variables is not equal to the product of their expectations.
5.67 Prove Corollary 5.1 on page 294.
5.68 Suppose that a point is selected at random from the unit ball, that is, from the set $B = \{(x, y, z) : x^2 + y^2 + z^2 < 1\}$. Let X, Y, Z, and R denote, respectively, the x-coordinate, y-coordinate, z-coordinate, and distance to the origin of the point obtained.
a) Determine $\mathcal{E}(R)$ by employing Theorem 5.7.
b) Determine $\mathcal{E}(R)$ by first finding the probability distribution of R and then applying (5.11).
★5.69 This exercise examines some basic properties of variance. Let $c \in \mathcal{R}$ and X be a random variable with finite expectation. Prove that
a) $\operatorname{Var}(X) = \mathcal{E}(X^2) - (\mathcal{E}(X))^2$.
b) $\operatorname{Var}(cX) = c^2\operatorname{Var}(X)$.
c) $\operatorname{Var}(c + X) = \operatorname{Var}(X)$.
d) $\operatorname{Var}(X) = 0$ if and only if X is constant P-ae.
5.70 Let Y have the uniform distribution on $[-1, 1]$ and set $X = Y^+$. Obtain the mean and standard deviation of X. Refer to Exercise 5.34.
5.71 Refer to Exercise 5.48 on page 286. Let X have the binomial distribution with parameters n and p. Determine the mean and variance of X.
5.72 Refer to Exercise 5.49 on page 287. Let X have the Poisson distribution with parameter $\lambda$. Determine the mean and variance of X.
5.73 Construct an example where equality holds in Chebyshev's inequality for some $\epsilon > 0$.
5.74 The following result, known as Markov's inequality, is due to the Russian mathematician Andrey Andreyevich Markov (1856-1922): Suppose X is a nonnegative random variable on the probability space $(\Omega, \mathcal{A}, P)$. Then, for each $\epsilon > 0$, we have $P(X \ge \epsilon) \le \mathcal{E}(X)/\epsilon$.
a) Prove Markov's inequality.
b) Deduce Chebyshev's inequality from Markov's inequality.
5.75 Let X be a random variable on $(\Omega, \mathcal{A}, P)$ and suppose that $\varphi$ is a function on $\mathcal{R}$ that is positive, increasing on $(0, \infty)$, and satisfies $\varphi(-x) = \varphi(x)$. Prove that, for each $\epsilon > 0$, $P(|X| \ge \epsilon) \le \mathcal{E}(\varphi(X))/\varphi(\epsilon)$.
5.76 This exercise investigates some basic properties of covariance. All random variables are assumed to have finite variance and to be defined on the same probability space.
a) Show that $\operatorname{Cov}(X, Y) = \mathcal{E}(XY) - \mathcal{E}(X)\mathcal{E}(Y)$.
b) Let $a_i$, $1 \le i \le m$, and $b_j$, $1 \le j \le n$, be sequences of real numbers. Prove that
$$\operatorname{Cov}\Big(\sum_{i=1}^m a_i X_i, \sum_{j=1}^n b_j Y_j\Big) = \sum_{i=1}^m \sum_{j=1}^n a_i b_j \operatorname{Cov}(X_i, Y_j).$$
This is called the bilinearity property of covariance.
5.77 Suppose that two balanced dice are rolled. Let X and Y denote, respectively, the minimum and maximum of the two numbers observed. Determine $\operatorname{Cov}(X, Y)$. Note: Refer to Exercise 5.41 on page 286.
5.78 Obtain the covariance of the two random variables in each part that follows:
a) Suppose that a point is selected at random from the unit square, that is, from the set $S = \{(x, y) : 0 < x, y < 1\}$. Let X and Y denote, respectively, the x- and y-coordinates of the point obtained.
b) Repeat part (a) if S is replaced by the unit disk, D. Note: Refer to Exercise 5.43 on page 286.
5.79 Refer to Exercise 5.55 on page 288. Determine $\operatorname{Cov}(X, Y)$.
5.80 Let $\Theta$ be uniformly distributed on $[0, 2\pi]$. Define $X = \cos\Theta$ and $Y = \sin\Theta$. Show that X and Y are uncorrelated but not independent.
5.81 Redo Exercise 5.71 using the following steps: For $1 \le k \le n$, let $X_k$ be 1 or 0 according as the kth trial results in success or failure.
a) Obtain $\mathcal{E}(X_k)$ and $\operatorname{Var}(X_k)$.
b) Explain why $X_1, \ldots, X_n$ are independent. (No work is required here!)
c) Explain why $X = X_1 + \cdots + X_n$.
d) Use parts (a)-(c) to find the mean and variance of X. Compare the work done here with that in Exercise 5.71.
5.82 Suppose $X_1, \ldots, X_n$ are mutually independent random variables having identical probability distributions (such random variables are said to be iid, short for "independent and identically distributed"). Further suppose that those random variables have finite variance, and denote by $\mu$ and $\sigma^2$ their common mean and variance, respectively. Set $\bar{X}_n = (X_1 + \cdots + X_n)/n$. Prove that
a) $\mathcal{E}(\bar{X}_n) = \mu$. Did you use independence here?
b) $\operatorname{Var}(\bar{X}_n) = \sigma^2/n$.
c) $\mathcal{E}\big(\sum_{k=1}^n (X_k - \bar{X}_n)^2\big) = (n-1)\sigma^2$.
5.83 For a finite numerical population, $\{a_1, \ldots, a_N\}$, the population mean, $\mu$, and population standard deviation, $\sigma$, are defined by
$$\mu = \frac{1}{N}\sum_{i=1}^N a_i \quad\text{and}\quad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^N (a_i - \mu)^2}.$$
Suppose that n members of the population are selected at random, where we assume that $n \le N$ if the sampling is done without replacement. Denote by $X_k$, $1 \le k \le n$, the value of the kth member obtained. Set $\bar{X}_n = (X_1 + \cdots + X_n)/n$. Prove that
a) $\mathcal{E}(X_k) = \mu$ and $\operatorname{Var}(X_k) = \sigma^2$, $1 \le k \le n$. Hint: The value of $X_k$ is equally likely to be any of the N population values.
b) $\mathcal{E}(\bar{X}_n) = \mu$.
c) $\operatorname{Var}(\bar{X}_n) = \sigma^2/n$, if the sampling is with replacement.
d) $\operatorname{Var}(\bar{X}_n) = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}$, if the sampling is without replacement.
★5.84 Let $(\Omega, \mathcal{A}, P)$ be a probability space and $E \in \mathcal{A}$ with $P(E) > 0$. For $A \in \mathcal{A}$, define $P_E(A) = P(A \mid E)$. By Proposition 5.2 (see page 268), $P_E$ is a probability measure on $\mathcal{A}$. Hence, we can define expectation with respect to $P_E$. This is called the conditional expectation relative to E. Thus, by definition, the conditional expectation relative to E of a random variable X, denoted by $\mathcal{E}(X \mid E)$ or $\mathcal{E}_E(X)$, is
$$\mathcal{E}(X \mid E) = \mathcal{E}_E(X) = \int_\Omega X(\omega)\, dP_E(\omega),$$
provided the right-hand side exists.
a) Prove that $\mathcal{E}_E(X) = \frac{1}{P(E)}\int_E X(\omega)\, dP(\omega)$.
b) Use part (a) to interpret conditional expectation.
c) Suppose that $\{E_n\}_n$ are pairwise mutually exclusive events having positive probability and satisfying $\Omega = \bigcup_n E_n$. Further suppose that X is
5.4 The Law of Large Numbers □ 301 a random variable having finite expectation. Prove that £еп(Х) exists and is finite for each n and that £(X) = £(X | En)P(En) = 52 £En (X)P(En)- n n This is called the law of total expectation. d) Interpret the previous equation in words. e) Compare the law of total expectation with the law of total probability (Exercise 5.13(b) on page 273). Precisely how are they related? 5.4 THE LAW OF LARGE NUMBERS At the beginning of Section 5.3, we introduced the long-run-average inter- pretation of expectation, (5.6) on page 288, in order to motivate the formal definition of the expectation of a random variable. Now that we have made that formal definition and established some basic properties of expectation, it is natural to ask whether we can prove a mathematically precise version of (5.6) as a theorem. So, let X be a random variable associated with some random experi- ment. Suppose that the experiment is repeated independently and let Xk represent the value of X on the fcth trial. More precisely, we assume that Xi, X2, ••• are mutually independent random variables, all having the same probability distribution as ХЛ Then we want to prove that, in some sense, Xi + • • • + Xn .14) n as n —> oo. The question now is, in what sense do we take the convergence in (5.14)? Naively, we might want the convergence to be pointwise. But that is too much to expect, as the following example shows. EXAMPLE5.il Illustrates (5.14) Consider the experiment of tossing a balanced coin once. Let X = 1 or 0 according to whether the coin comes up a head or a tail. As the coin is balanced, £(X) = 0-(l/2) + l-(l/2) = 1/2. Here, repeating the experiment independently means tossing the coin over and over again. Also, Xk = 1 t The existence of such random variables is a consequence of the Kolmogorov extension theorem. See, for example, Robert B. Ash’s Real Analysis and Probability (Cambridge, MA: Academic Press, 1972).
or 0 according to whether the kth toss is a head or a tail. In this context, (5.14) becomes
$$\frac{X_1 + \cdots + X_n}{n} \to \frac{1}{2} \qquad (5.15)$$
as $n \to \infty$, which says simply that, in the long run, the coin comes up heads half of the time. However, it is clear that (5.15) does not hold pointwise, that is, for every possible infinite sequence of heads and tails. For instance, if every toss comes up heads, then the limit is one, while if every toss comes up tails, then the limit is zero. □

Example 5.11 shows that it is unreasonable to expect the convergence in (5.14) to be pointwise. As a next best choice, we might try to prove almost-everywhere convergence, and that is exactly what we will do. Before proceeding, we recall from Section 4.1 that, in probability theory, the terms almost surely, with probability one, and almost certainly are used synonymously for “almost everywhere.”

Preliminaries
Several preliminary results will be needed in order to prove (5.14). We begin with the following three lemmas. The proofs of the first two are left to the reader as exercises.

LEMMA 5.1
Let $\{c_{mn}\}_{m,n=1}^\infty$ be a double sequence of real numbers such that
• for each $n \in \mathcal{N}$, $c_{mn} \to 0$ as $m \to \infty$, and
• the sequence $\bigl\{\sum_{n=1}^\infty |c_{mn}|\bigr\}_{m=1}^\infty$ is bounded.
For a bounded sequence $\{y_n\}_{n=1}^\infty$ of real numbers, let $z_m = \sum_{n=1}^\infty c_{mn} y_n$ for $m \in \mathcal{N}$. Then
a) $y_n \to 0$ as $n \to \infty$ $\Rightarrow$ $z_m \to 0$ as $m \to \infty$.
b) $\sum_{n=1}^\infty c_{mn} \to 1$ as $m \to \infty$, and $y_n \to y$ as $n \to \infty$ $\Rightarrow$ $z_m \to y$ as $m \to \infty$.

LEMMA 5.2 Toeplitz's Lemma
Suppose that $\sum_n a_n$ is a divergent series of positive real numbers and that $\{s_n\}_{n=1}^\infty$ is a convergent sequence of real numbers. Then
$$\lim_{n\to\infty} \frac{\sum_{k=1}^n a_k s_k}{\sum_{k=1}^n a_k} = \lim_{n\to\infty} s_n.$$
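Toeplitz's lemma can be explored numerically. The Python sketch below is only an illustration, and the helper name `weighted_averages` and all weight choices are ours, not the text's. It forms the ratios in Lemma 5.2 for the convergent sequence $s_k = 2 + 1/k$ under two divergent weight sequences, $a_k = 1$ and $a_k = k$. It then shows why the divergence hypothesis cannot simply be dropped. With the summable weights $a_k = 2^{-k}$ and the sequence $t_1 = 0$, $t_k = 1$ for $k \ge 2$, the ratios settle near 1/2 instead of at $\lim t_k = 1$.

```python
def weighted_averages(a, s):
    """Return the ratios (a_1 s_1 + ... + a_n s_n) / (a_1 + ... + a_n), n = 1, 2, ..."""
    num = den = 0.0
    ratios = []
    for a_k, s_k in zip(a, s):
        num += a_k * s_k
        den += a_k
        ratios.append(num / den)
    return ratios

N = 10_000
s = [2.0 + 1.0 / k for k in range(1, N + 1)]  # s_k -> 2

# Divergent positive weights: the ratios tend to lim s_k = 2, as Lemma 5.2 asserts.
cesaro = weighted_averages([1.0] * N, s)                            # a_k = 1
linear = weighted_averages([float(k) for k in range(1, N + 1)], s)  # a_k = k

# Summable weights a_k = 2^(-k): the hypothesis fails, and so does the conclusion.
t = [0.0] + [1.0] * 59                                              # t_k -> 1
geometric = weighted_averages([0.5 ** k for k in range(1, 61)], t)

print(cesaro[-1], linear[-1], geometric[-1])
```

With $a_k = k$, a short hand computation shows that the nth ratio is exactly $2 + 2/(n+1)$. This gives a convenient check on the printed output.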
LEMMA 5.3 Kronecker's Lemma
Suppose that $\{b_n\}_{n=1}^\infty$ is an increasing sequence of positive real numbers such that $\lim_{n\to\infty} b_n = \infty$. Further suppose that $\sum_n x_n$ is a convergent series of real numbers. Then
$$\lim_{n\to\infty} \frac{1}{b_n}\sum_{k=1}^n b_k x_k = 0.$$

PROOF: Define $x = \sum_{n=1}^\infty x_n$, $s_0 = 0$ and, for $n \in \mathcal{N}$, $s_n = \sum_{k=1}^n x_k$. Also define $b_0 = 0$ and, for $n \in \mathcal{N}$, $a_n = b_n - b_{n-1}$. Using summation by parts (see Exercise 5.87), we get that
$$\sum_{k=1}^n b_k x_k = b_n s_n - \sum_{k=1}^n s_{k-1} a_k. \qquad (5.16)$$
Note that $b_n = \sum_{k=1}^n a_k$ and that $s_n \to x$ as $n \to \infty$. Hence, by (5.16) and Toeplitz's lemma,
$$\lim_{n\to\infty} \frac{1}{b_n}\sum_{k=1}^n b_k x_k = \lim_{n\to\infty}\left(s_n - \frac{\sum_{k=1}^n a_k s_{k-1}}{\sum_{k=1}^n a_k}\right) = x - x = 0,$$
as required.

Next we will prove two propositions, both due to Kolmogorov.

PROPOSITION 5.10 Kolmogorov's Inequality
Let $X_1, \ldots, X_n$ be mutually independent random variables, each with finite variance. Set $S_j = X_1 + \cdots + X_j$, $1 \le j \le n$. Then, for each $\epsilon > 0$,
$$P\Bigl(\max_{1 \le j \le n} |S_j - \mathcal{E}(S_j)| \ge \epsilon\Bigr) \le \frac{\operatorname{Var}(S_n)}{\epsilon^2}.$$

PROOF: Without loss of generality, we can assume that $\mathcal{E}(X_j) = 0$ for $j = 1, \ldots, n$ (why?). Let $A = \{\max_{1 \le j \le n} |S_j| \ge \epsilon\}$ and, for $1 \le k \le n$,
$$A_k = \{\,|S_j| < \epsilon,\ j = 1, \ldots, k-1;\ |S_k| \ge \epsilon\,\}.$$
Note that the $A_k$'s are mutually exclusive and that $A = \bigcup_{k=1}^n A_k$. Now,
$$\operatorname{Var}(S_n) = \mathcal{E}(S_n^2) = \int_\Omega S_n^2\,dP \ge \int_A S_n^2\,dP = \sum_{k=1}^n \int_{A_k} S_n^2\,dP. \qquad (5.17)$$
Let $Y_k = X_{k+1} + \cdots + X_n$. Then $S_n = S_k + Y_k$ and, hence,
$$\int_{A_k} S_n^2\,dP = \int_{A_k} S_k^2\,dP + 2\int_{A_k} S_k Y_k\,dP + \int_{A_k} Y_k^2\,dP. \qquad (5.18)$$
Because $\chi_{A_k} S_k$ is a $\mathcal{B}_k$-measurable function of $X_1, \ldots, X_k$, and $Y_k$ is a $\mathcal{B}_{n-k}$-measurable function of $X_{k+1}, \ldots, X_n$, we deduce from Proposition 5.6 on page 281 that $\chi_{A_k} S_k$ and $Y_k$ are independent random variables. Thus, by Proposition 5.7 on page 293, and the fact that $\mathcal{E}(Y_k) = 0$,
$$\int_{A_k} S_k Y_k\,dP = \int_\Omega \chi_{A_k} S_k Y_k\,dP = \mathcal{E}(\chi_{A_k} S_k \cdot Y_k) = \mathcal{E}(\chi_{A_k} S_k)\,\mathcal{E}(Y_k) = 0.$$
This last equation and (5.18) imply $\int_{A_k} S_n^2\,dP \ge \int_{A_k} S_k^2\,dP \ge \epsilon^2 P(A_k)$. Consequently, by (5.17),
$$\operatorname{Var}(S_n) \ge \epsilon^2 \sum_{k=1}^n P(A_k) = \epsilon^2 P(A).$$
This completes the proof of the proposition.

Note: If n = 1, Kolmogorov's inequality reduces to Chebyshev's inequality.

PROPOSITION 5.11
Suppose that $X_1, X_2, \ldots$ are mutually independent random variables and that $\sum_{n=1}^\infty \operatorname{Var}(X_n) < \infty$. Then $\sum_{n=1}^\infty \bigl(X_n - \mathcal{E}(X_n)\bigr)$ converges P-ae.

PROOF: We can assume without loss of generality that $\mathcal{E}(X_n) = 0$ for all $n \in \mathcal{N}$. Set $S_n = \sum_{k=1}^n X_k$. We want to prove that, with probability one, $\{S_n\}_{n=1}^\infty$ converges. First we will show that, for each $\epsilon > 0$,
$$\lim_{m\to\infty} P\Bigl(\bigcup_{k=1}^\infty \{|S_{m+k} - S_m| \ge \epsilon\}\Bigr) = 0. \qquad (5.19)$$
By Proposition 5.1(g) on page 267, we have, for each $m \in \mathcal{N}$,
$$P\Bigl(\bigcup_{k=1}^\infty \{|S_{m+k} - S_m| \ge \epsilon\}\Bigr) = \lim_{n\to\infty} P\Bigl(\bigcup_{k=1}^n \{|S_{m+k} - S_m| \ge \epsilon\}\Bigr) = \lim_{n\to\infty} P\Bigl(\max_{1 \le k \le n} |S_{m+k} - S_m| \ge \epsilon\Bigr). \qquad (5.20)$$
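For $1 \le k \le n$, let $Y_k = X_{m+k}$ and
$$T_k = \sum_{j=1}^k Y_j = \sum_{j=m+1}^{m+k} X_j = S_{m+k} - S_m.$$
As $X_1, X_2, \ldots$ are mutually independent, so are $Y_1, \ldots, Y_n$. Applying Kolmogorov's inequality to the $Y_k$'s, we get
$$P\Bigl(\max_{1 \le k \le n} |S_{m+k} - S_m| \ge \epsilon\Bigr) \le \frac{\operatorname{Var}(S_{m+n} - S_m)}{\epsilon^2} = \frac{1}{\epsilon^2}\sum_{k=m+1}^{m+n} \operatorname{Var}(X_k).$$
The previous relation and (5.20) imply that, for each $m \in \mathcal{N}$,
$$P\Bigl(\bigcup_{k=1}^\infty \{|S_{m+k} - S_m| \ge \epsilon\}\Bigr) \le \frac{1}{\epsilon^2}\sum_{k=m+1}^\infty \operatorname{Var}(X_k).$$
Letting $m \to \infty$ in this last relation and using $\sum_{k=1}^\infty \operatorname{Var}(X_k) < \infty$, we see that (5.19) holds.

Now we can show that $\{S_n\}_{n=1}^\infty$ converges with probability one. Let $E = \{\omega : \{S_n(\omega)\}_{n=1}^\infty \text{ does not converge}\}$. Then $\omega \in E$ if and only if $\{S_n(\omega)\}_{n=1}^\infty$ is not a Cauchy sequence, which means there exists an $r \in \mathcal{N}$ such that, for each $n \in \mathcal{N}$, there is a $k \in \mathcal{N}$ with $|S_{n+k}(\omega) - S_n(\omega)| \ge r^{-1}$. In other words,
$$E = \bigcup_{r=1}^\infty \Bigl(\bigcap_{n=1}^\infty \Bigl(\bigcup_{k=1}^\infty \{|S_{n+k} - S_n| \ge r^{-1}\}\Bigr)\Bigr). \qquad (5.21)$$
But, for each $r \in \mathcal{N}$, we have, by Proposition 5.1(f) and (5.19), that
$$P\Bigl(\bigcap_{n=1}^\infty \bigcup_{k=1}^\infty \{|S_{n+k} - S_n| \ge r^{-1}\}\Bigr) = \lim_{m\to\infty} P\Bigl(\bigcap_{n=1}^m \bigcup_{k=1}^\infty \{|S_{n+k} - S_n| \ge r^{-1}\}\Bigr) \le \lim_{m\to\infty} P\Bigl(\bigcup_{k=1}^\infty \{|S_{m+k} - S_m| \ge r^{-1}\}\Bigr) = 0.$$
This last fact and (5.21) imply that $P(E) = 0$.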
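The proof of Proposition 5.11 rests on Kolmogorov's inequality, so it may help to see that inequality checked exactly in a small case. The Python sketch below assumes fair ±1 steps, a choice made here for illustration and not taken from the text, so that $\mathcal{E}(X_j) = 0$, $\operatorname{Var}(S_n) = n$, and all $2^n$ sign patterns are equally likely. Probabilities can then be computed by enumeration rather than simulation. The helper name `kolmogorov_check` is hypothetical. The sketch also records $P(|S_n| \ge \epsilon)$, the quantity that Chebyshev's inequality controls. This shows that the event involving the maximum of the partial sums is larger, yet it obeys the same bound $\operatorname{Var}(S_n)/\epsilon^2$.

```python
from itertools import product

def kolmogorov_check(n, eps):
    """Exact probabilities for n independent fair +/-1 steps X_1, ..., X_n.

    Returns (P(max_j |S_j| >= eps), P(|S_n| >= eps), n / eps**2), where the last
    entry is the bound Var(S_n) / eps**2 of Proposition 5.10 (here Var(S_n) = n).
    """
    hits_max = hits_last = 0
    for signs in product((-1, 1), repeat=n):   # all 2**n equally likely outcomes
        partial = running_max = 0
        for x in signs:
            partial += x
            running_max = max(running_max, abs(partial))
        hits_max += running_max >= eps         # event {max_j |S_j| >= eps}
        hits_last += abs(partial) >= eps       # event {|S_n| >= eps}
    total = 2 ** n
    return hits_max / total, hits_last / total, n / eps ** 2

for eps in (3, 4, 6, 8):
    p_max, p_last, bound = kolmogorov_check(12, eps)
    print(f"eps={eps}: P(max|S_j|>=eps)={p_max:.4f}  "
          f"P(|S_n|>=eps)={p_last:.4f}  bound={bound:.4f}")
```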
The Strong Law of Large Numbers
Before proving our next theorem, we will introduce some additional terminology. We say that the random variables $X_1, X_2, \ldots$ obey the strong law of large numbers if there exist a sequence $\{a_n\}_{n=1}^\infty$ of real numbers and a sequence $\{b_n\}_{n=1}^\infty$ of positive real numbers tending to infinity such that, with probability one,
$$\lim_{n\to\infty} \frac{X_1 + \cdots + X_n - a_n}{b_n} = 0. \qquad (5.22)$$
If the convergence in (5.22) is in probability (i.e., in P-measure), then we say that $X_1, X_2, \ldots$ obey the weak law of large numbers. Because a probability space is a finite measure space, Proposition 4.11 on page 204 implies that if a sequence of random variables obeys the strong law of large numbers, then it obeys the weak law of large numbers.

The next result, also due to Kolmogorov, provides a sufficient condition for a sequence of random variables to obey the strong law of large numbers.

THEOREM 5.8 Kolmogorov's Strong Law of Large Numbers
Let $X_1, X_2, \ldots$ be mutually independent random variables with finite variances and set $S_n = X_1 + \cdots + X_n$. Suppose that $\{b_n\}_{n=1}^\infty$ is an increasing sequence of positive real numbers satisfying $\lim_{n\to\infty} b_n = \infty$ and $\sum_{n=1}^\infty \operatorname{Var}(X_n)/b_n^2 < \infty$. Then, with probability one,
$$\lim_{n\to\infty} \frac{S_n - \mathcal{E}(S_n)}{b_n} = 0.$$
In other words, $X_1, X_2, \ldots$ obey the strong law of large numbers with $a_n = \mathcal{E}(S_n)$.

PROOF: For $n \in \mathcal{N}$, let $Y_n = (X_n - \mathcal{E}(X_n))/b_n$ and note that $\mathcal{E}(Y_n) = 0$. In view of Exercise 5.69 on page 298,
$$\sum_{n=1}^\infty \operatorname{Var}(Y_n) = \sum_{n=1}^\infty \operatorname{Var}\Bigl(\frac{X_n - \mathcal{E}(X_n)}{b_n}\Bigr) = \sum_{n=1}^\infty \frac{\operatorname{Var}(X_n)}{b_n^2} < \infty.$$
Therefore, by Proposition 5.11, $\sum_{n=1}^\infty Y_n$ converges with probability one. But we have
$$\frac{S_n - \mathcal{E}(S_n)}{b_n} = \frac{1}{b_n}\sum_{k=1}^n b_k\,\frac{X_k - \mathcal{E}(X_k)}{b_k} = \frac{1}{b_n}\sum_{k=1}^n b_k Y_k.$$
The required result now follows from Kronecker's lemma.

An immediate corollary of Kolmogorov's strong law of large numbers is the following result. Its proof is left to the reader as Exercise 5.88.

COROLLARY 5.4
Suppose that $X_1, X_2, \ldots$ are mutually independent random variables with common finite mean $\mu$ and variance $\sigma^2$. Then
$$\lim_{n\to\infty} \frac{X_1 + \cdots + X_n}{n} = \mu$$
with probability one.

In Kolmogorov's strong law of large numbers and the foregoing corollary, besides presuming that the random variables $X_1, X_2, \ldots$ are mutually independent, we impose a restriction on their variances. If we assume that $X_1, X_2, \ldots$ all have the same probability distribution, then the restriction on the variances can be eliminated. To prove this statement, we first establish the following lemma.

LEMMA 5.4
Suppose that X is a nonnegative random variable. Then
$$\sum_{n=1}^\infty P(X \ge n) \le \mathcal{E}(X) \le \sum_{n=0}^\infty P(X \ge n).$$
Thus, X has finite expectation if and only if $\sum_{n=1}^\infty P(X \ge n) < \infty$.

PROOF: For $n \le x \le n+1$, $P(X \ge n+1) \le P(X \ge x) \le P(X \ge n)$. Integrating these inequalities from n to n + 1 and summing the results, we conclude that
$$\sum_{n=0}^\infty P(X \ge n+1) \le \sum_{n=0}^\infty \int_n^{n+1} P(X \ge x)\,dx \le \sum_{n=0}^\infty P(X \ge n).$$
By Corollary 3.3 on page 144, the middle term in the previous relation equals $\int_0^\infty P(X \ge x)\,dx$. Applying Exercise 5.64(a) on page 298, we obtain the required result.

We will now prove the strong law of large numbers in the case where the random variables $X_1, X_2, \ldots$ are mutually independent and have the same probability distribution. Such random variables are said to be iid, short for “independent and identically distributed.” Note that there is no assumption made about the common variance of the $X_n$'s; in particular, the common variance may be infinite.

THEOREM 5.9 Strong Law of Large Numbers (iid Case)
Suppose that $X_1, X_2, \ldots$ are mutually independent and identically distributed random variables with finite mean $\mu$. Then
$$\lim_{n\to\infty} \frac{X_1 + \cdots + X_n}{n} = \mu \qquad (5.23)$$
with probability one.

PROOF: We can assume without loss of generality that $\mu = 0$. Because $X_1, X_2, \ldots$ are identically distributed and have finite mean, Lemma 5.4 implies that
$$\sum_{n=1}^\infty P(|X_n| \ge n) = \sum_{n=1}^\infty P(|X_1| \ge n) < \infty. \qquad (5.24)$$
The idea of the proof is to truncate the $X_n$'s and then apply Theorem 5.8. Let $E = \bigcap_{n=1}^\infty \bigl(\bigcup_{k=n}^\infty \{|X_k| \ge k\}\bigr)$. Then (5.24) and part (a) of the Borel-Cantelli lemma (page 270) imply that $P(E) = 0$. Define the sequence of random variables $Y_1, Y_2, \ldots$ by
$$Y_n = X_n \chi_{\{|X_n| < n\}} = \begin{cases} X_n, & \text{if } |X_n| < n; \\ 0, & \text{if } |X_n| \ge n, \end{cases}$$
and note that, if $\omega \in E^c$, then $Y_n(\omega) = X_n(\omega)$ for n sufficiently large. Therefore, to establish (5.23), with $\mu = 0$, it suffices to prove that, with probability one,
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n Y_k = 0. \qquad (5.25)$$
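Since $X_1, X_2, \ldots$ have the same probability distribution, Theorem 5.6 on page 291 implies that $\mathcal{E}(Y_n) = \mathcal{E}(X_n \chi_{\{|X_n| < n\}}) = \mathcal{E}(X_1 \chi_{\{|X_1| < n\}})$. Hence, by Corollary 4.3 on page 199,
$$\mathcal{E}(Y_n) = \int_{\{|X_1| < n\}} X_1\,dP \to \mathcal{E}(X_1) = \mu = 0$$
as $n \to \infty$. This implies that $n^{-1}\sum_{k=1}^n \mathcal{E}(Y_k) \to 0$ as $n \to \infty$. Consequently, proving (5.25) is equivalent to proving
$$\lim_{n\to\infty} \frac{\sum_{k=1}^n Y_k - \sum_{k=1}^n \mathcal{E}(Y_k)}{n} = 0$$
with probability one; and, to accomplish that, we will verify that the $Y_n$'s satisfy the hypotheses of Theorem 5.8 with $b_n = n$.

As $X_1, X_2, \ldots$ are mutually independent, so are $Y_1, Y_2, \ldots$ (why?). Furthermore, we have
$$\sum_{n=1}^\infty \frac{\operatorname{Var}(Y_n)}{n^2} \le \sum_{n=1}^\infty \frac{\mathcal{E}(Y_n^2)}{n^2} = \sum_{n=1}^\infty \frac{1}{n^2}\int_{\{|X_1| < n\}} X_1^2\,dP = \sum_{n=1}^\infty \frac{1}{n^2}\sum_{m=1}^n \int_{\{m-1 \le |X_1| < m\}} X_1^2\,dP$$
$$= \sum_{m=1}^\infty \Bigl(\int_{\{m-1 \le |X_1| < m\}} X_1^2\,dP\Bigr)\sum_{n=m}^\infty \frac{1}{n^2} \le \sum_{m=1}^\infty m\Bigl(\int_{\{m-1 \le |X_1| < m\}} |X_1|\,dP\Bigr)\sum_{n=m}^\infty \frac{1}{n^2}$$
$$\le 2\sum_{m=1}^\infty \int_{\{m-1 \le |X_1| < m\}} |X_1|\,dP = 2\,\mathcal{E}(|X_1|) < \infty,$$
where, in the previous line, we have used the fact that, for $m \in \mathcal{N}$, $\sum_{n=m}^\infty n^{-2} \le 2/m$ (see Exercise 5.92).

Theorem 5.9 indicates that the intuitive notion of expectation as the long-run-average value of a random variable in repeated, independent observations can be formulated and proved mathematically as a consequence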
of the axioms of probability. A simple corollary of that theorem shows that this is also true for the relative-frequency interpretation of probability.†

COROLLARY 5.5 Borel's Strong Law of Large Numbers
Suppose that E is an event associated with some random experiment and let p be its probability. Denote by n(E) the number of times that event E occurs in n independent repetitions of the experiment. Then
$$\lim_{n\to\infty} \frac{n(E)}{n} = p$$
with probability one.

PROOF: For each $n \in \mathcal{N}$, define $X_n = 1$ or 0 according to whether event E occurs or does not occur on the nth repetition of the experiment. Then $n(E) = X_1 + \cdots + X_n$ and, as the repetitions of the experiment are independent of one another, the random variables $X_1, X_2, \ldots$ are iid. Their common mean is $\mu = 0\cdot(1-p) + 1\cdot p = p$. The required result now follows from Theorem 5.9.

We have concentrated our discussion in this section on the strong law of large numbers. As we know, if a sequence of random variables obeys the strong law of large numbers, then it must also obey the weak law of large numbers. Nonetheless, the weak law is important in its own right because, for example, it can be proved under weaker conditions than the strong law. Several versions of the weak law will be considered in the exercises.

EXERCISES 5.4
Note: In the exercises below, we will use the notation $S_n = X_1 + \cdots + X_n$.
5.85 Prove Lemma 5.1 on page 302.
5.86 Prove Toeplitz's lemma, Lemma 5.2 on page 302.
5.87 Prove the summation by parts formula, (5.16) on page 303. Hint: Write $b_k = \sum_{j=1}^k a_j$ and interchange summations.
5.88 Prove Corollary 5.4 on page 307.
5.89 Describe, in words, the difference between the weak and strong laws of large numbers for iid random variables having finite mean. Refer to Definition 4.15 on page 203 and Exercise 4.92 on page 207.

† Actually, the following corollary is also a corollary of Corollary 5.4.
5.90 The following result is known as Cantelli's strong law of large numbers: Suppose that $X_1, X_2, \ldots$ are mutually independent random variables with uniformly bounded fourth moments. Then $(S_n - \mathcal{E}(S_n))/n \to 0$, as $n \to \infty$, with probability one.
a) Deduce Cantelli's strong law from Kolmogorov's strong law.
b) Prove Cantelli's strong law without reference to Kolmogorov's strong law. Hint: Employ Exercise 5.75 on page 299 with $\phi(x) = x^4$, the Borel-Cantelli lemma (page 270), and Exercise 4.92 (page 207).
5.91 Suppose that independent trials are performed in which an event E occurs on the kth trial with probability $p_k$. Let n(E) denote the number of times that event E occurs in the first n trials. Show that, as $n \to \infty$,
$$\frac{n(E)}{n} - \frac{\sum_{k=1}^n p_k}{n} \to 0$$
with probability one.
5.92 Prove that, for $m \in \mathcal{N}$, $\sum_{n=m}^\infty 1/n^2 \le 2/m$.
5.93 Let $X_1, X_2, \ldots$ be iid with finite mean $\mu$, and let f be a bounded continuous function on $\mathcal{R}$.
a) Prove that $\lim_{n\to\infty} \mathcal{E}\Bigl(f\Bigl(\dfrac{X_1 + \cdots + X_n}{n}\Bigr)\Bigr) = f(\mu)$.
b) Deduce from part (a) that, for each $t \in [0,1]$,
$$\lim_{n\to\infty} \sum_{k=0}^n f\Bigl(\frac{k}{n}\Bigr)\binom{n}{k} t^k (1-t)^{n-k} = f(t).$$
Hint: Refer to Exercise 5.81.
5.94 Each number in [0,1] has a decimal expansion and, except for numbers of the form $m/10^n$, the expansion is unique. For definiteness, we will use the unique terminating expansion for numbers of the form $m/10^n$. Now, let $x \in [0,1]$ have decimal expansion $.x_1x_2\ldots$; that is, $x = \sum_{n=1}^\infty x_n/10^n$. For each $n \in \mathcal{N}$ and $k \in \{0, 1, \ldots, 9\}$, denote by $n_k(x)$ the number of the first n decimal digits of x that equal k. Then x is said to be a normal number if $n_k(x)/n \to 1/10$ as $n \to \infty$ for all digits k; in other words, if the relative frequency of occurrence of each decimal digit in x is 1/10. In this exercise, we will prove the following result due to Borel: Except for a Borel set of Lebesgue measure zero, every number in [0,1] is normal.
a) Let $(\Omega, \mathcal{A}, P) = ([0,1], \mathcal{B}_{[0,1]}, \lambda_{[0,1]})$. Define the functions $Y_1, Y_2, \ldots$ on $\Omega$ by $Y_n(x) = x_n$, where $x_n$ is the nth decimal digit of x. Prove that
$Y_1, Y_2, \ldots$ are random variables, that is, are $\mathcal{B}_{[0,1]}$-measurable. Hint: Note that
$$\{Y_n = k\} = \bigcup_{k_1=0}^9 \cdots \bigcup_{k_{n-1}=0}^9 \{Y_1 = k_1, \ldots, Y_{n-1} = k_{n-1}, Y_n = k\}$$
and show that each set in the union is an interval.
b) Prove that the random variables $Y_1, Y_2, \ldots$ are iid.
c) Show that, for each decimal digit k, $\lim_{n\to\infty} n_k(x)/n = 1/10$ λ-ae. Hint: Let $X_j^{(k)} = \chi_{\{Y_j = k\}}$, for $j = 1, 2, \ldots$.
d) Deduce that, except for a Borel set of Lebesgue measure zero, every number in [0,1] is normal.
5.95 Repeat Exercise 5.94 for binary instead of decimal expansions. Explain how this provides a model for the random experiment of tossing a balanced coin indefinitely.
5.96 Prove Markov's weak law of large numbers: Suppose that $X_1, X_2, \ldots$ are random variables all defined on the same probability space and having finite variances. Further suppose that $\operatorname{Var}(X_1 + \cdots + X_n) = o(n^2)$. Then $\lim_{n\to\infty} (S_n - \mathcal{E}(S_n))/n = 0$, in probability. That is, $X_1, X_2, \ldots$ obey the weak law of large numbers with $a_n = \mathcal{E}(S_n)$ and $b_n = n$.
5.97 Prove Chebyshev's weak law of large numbers: Suppose $X_1, X_2, \ldots$ are uncorrelated random variables having uniformly bounded variances. Then $\lim_{n\to\infty} (S_n - \mathcal{E}(S_n))/n = 0$, in probability.
5.98 Establish the following generalization of Chebyshev's weak law of large numbers: Suppose that $X_1, X_2, \ldots$ are random variables all defined on the same probability space. Further suppose that they have uniformly bounded variances and that
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n \operatorname{Cov}(X_k, X_n) = 0. \qquad (5.26)$$
a) Prove that $\lim_{n\to\infty} (S_n - \mathcal{E}(S_n))/n = 0$, in probability.
b) Random variables $X_1, X_2, \ldots$ are said to be asymptotically uncorrelated if $\operatorname{Cov}(X_i, X_j) \to 0$ as $|i - j| \to \infty$. Prove that asymptotically uncorrelated random variables with uniformly bounded variances satisfy (5.26) and, hence, the weak law of large numbers.
5.99 A standard example of a series that is convergent but not absolutely convergent is
$$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}. \qquad (5.27)$$
Suppose, instead of (5.27), we consider a similar series in $n^{-1}$, where the signs are chosen at random. In other words, suppose we consider the series
$$\sum_{n=1}^\infty \frac{X_n}{n}, \qquad (5.28)$$
where the $X_n$'s are iid, taking the values ±1 each with probability 1/2.
a) Show that the series in (5.28) converges with probability one.
b) What can be said about convergence of the series $\sum_{n=1}^\infty a_n X_n$ when $\{a_n\}$ is a sequence of real numbers with $\sum_{n=1}^\infty a_n^2 < \infty$?
5.100 Let $X_1, X_2, \ldots$ be a sequence of mutually independent random variables with $P(X_n = n^b) = P(X_n = -n^b) = 1/2$, where b is a positive real number. Prove the following:
a) If $b < 1/2$, then $S_n/n \to 0$ with probability one.
b) If $b > 1$, then $\limsup_{n\to\infty} |S_n|/n = \infty$ with probability one.
c) Conclude that Theorem 5.8 fails if the hypothesis $\sum_{n=1}^\infty \operatorname{Var}(X_n)/b_n^2 < \infty$ is removed.
5.101 Let $X_1, X_2, \ldots$ be iid random variables, and suppose that $\mathcal{E}(X_1^+) = \infty$ and $\mathcal{E}(X_1^-) < \infty$. Prove that $S_n/n \to \infty$ with probability one.
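Borel's strong law (Corollary 5.5), and the coin-tossing setting of Example 5.11 in particular, can be visualized by simulating one long run of independent trials. The Python sketch below is illustrative only. It uses the standard library's `random.Random` with a fixed seed so that runs are reproducible, and the function name `relative_frequencies` is hypothetical. A single simulated path cannot demonstrate an almost-sure statement. It merely displays the behavior that the theorem guarantees for every path outside a set of probability zero.

```python
import random

def relative_frequencies(p, n, seed=0):
    """Simulate n independent trials in which an event E has probability p and
    return the running relative frequencies n(E)/k for k = 1, ..., n."""
    rng = random.Random(seed)        # fixed seed makes the run reproducible
    occurrences = 0
    freqs = []
    for k in range(1, n + 1):
        if rng.random() < p:         # E occurs on the kth trial with probability p
            occurrences += 1
        freqs.append(occurrences / k)
    return freqs

coin = relative_frequencies(0.5, 100_000)   # balanced coin, as in Example 5.11
for k in (10, 100, 1_000, 10_000, 100_000):
    print(k, coin[k - 1])
```

Early relative frequencies can sit far from 1/2. By the last printed values they are typically within a few thousandths of it, consistent with the $O(1/\sqrt{n})$ size of the fluctuations.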
Johann Radon (1887-1956)

Johann Radon, born on December 16, 1887, in Tetschen, Bohemia, began his formal schooling at the age of 10 at the Gymnasium in Leitmaritz, Bohemia. Eight years later, in 1905, he enrolled at the University of Vienna to pursue the study of mathematics and physics. Radon presented his doctoral dissertation in 1910 on the calculus of variations.

Radon taught at several universities between 1910 and 1919; he spent a semester at the University of Göttingen, a year at the University of Brünn, and time at the Technische Hochschule of Vienna and at the University of Vienna. In 1919 he went to the University of Hamburg for three years, moved subsequently to Greifswald, Erlangen, and then to Breslau in 1928, where he remained until 1945. In 1947, he was elected to the Austrian Academy of Sciences.

The calculus of variations continued to fascinate Radon because of its many applications to analysis, geometry, and physics. He applied it to differential geometry to discover Radon curves. Other work included the combination of Lebesgue's and Stieltjes's theories of integration (the development of the Radon integral), the Dirichlet problem of the logarithmic potential (application of the Radon-Nikodym theorem), and the development of the Radon transformation technique.

Radon spent the last nine years of his life as a full professor at the University of Vienna in Vienna, Austria, where he died on May 25, 1956.
6 □ DIFFERENTIATION

Up to this point, we have been concentrating on the theory of integration. In this chapter, we will study the theory of differentiation, both in the classical sense of derivatives of functions and in an extended sense of derivatives of measures.

We will prove Lebesgue's remarkable theorem that any monotone function is differentiable almost everywhere. We will also introduce the concepts of bounded variation and absolute continuity and use them to generalize the two fundamental theorems of calculus to Lebesgue integration.

In addition, we will extend the notion of measure to include measures that are real-valued and complex-valued, establish decomposition and representation theorems for measures, prove and apply the famous Radon-Nikodym theorem, and generalize the classical change-of-variable formula for integration.

6.1 DERIVATIVES AND DINI-DERIVATES

In this section, we will introduce derivatives and establish the fact that any monotone function has a (finite) derivative almost everywhere.

Note: For brevity, we will use the phrase almost everywhere instead of Lebesgue
316 □ Chapter 6 Differentiation

almost everywhere and use the notation ae instead of λ-ae.

To begin, we recall the following definition from elementary calculus.

DEFINITION 6.1 Derivative of a Real-Valued Function
A real-valued function f defined in some open interval about $x \in \mathcal{R}$ is said to be differentiable at x if
$$\lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$$
exists and is finite. In that case the limit is called the derivative of f at x and is denoted by $f'(x)$.†

For our study of differentiation, it is useful to introduce the concept of the Dini-derivates of a function at a point. And, in order to do that, we need the following definition.

DEFINITION 6.2 Lower and Upper Limits
Let g be a real-valued function defined in a deleted interval about the point x, that is, a set of the form $(c, d) \setminus \{x\}$, where $c < x < d$. Then we define
$$\limsup_{y\to x^+} g(y) = \inf_{\delta > 0}\ \sup_{0 < y - x < \delta} g(y), \qquad \liminf_{y\to x^+} g(y) = \sup_{\delta > 0}\ \inf_{0 < y - x < \delta} g(y),$$
$$\limsup_{y\to x^-} g(y) = \inf_{\delta > 0}\ \sup_{0 < x - y < \delta} g(y), \qquad \liminf_{y\to x^-} g(y) = \sup_{\delta > 0}\ \inf_{0 < x - y < \delta} g(y).$$
These extended real numbers are called, respectively, the upper right, lower right, upper left, and lower left limits of g at x.

† If $\lim_{h\to 0} (f(x+h) - f(x))/h = \infty$, we will write $f'(x) = \infty$ but will not say that f is differentiable at x; similarly if the limit is $-\infty$.
6.1 Derivatives and Dini-Derivates □ 317

We introduce these lower and upper limits for the same reason that we introduce the limit inferior and limit superior of sequences; namely, although $\lim_{y\to x^+} g(y)$, etc., may not exist, $\limsup_{y\to x^+} g(y)$, etc., always exist (in $\mathcal{R}^*$). We leave it as an exercise for the reader to prove that the right-hand limit of g at x, $\lim_{y\to x^+} g(y)$, exists in $\mathcal{R}^*$ if and only if the lower and upper right limits of g at x are equal; in that case, we denote the right-hand limit by $g(x+)$. An analogous result holds for left-hand limits.

EXAMPLE 6.1 Illustrates Definition 6.2
Let $g(y) = \sin(1/y)$ for $y \ne 0$. It is easy to see that for each $\delta > 0$,
$$\sup_{0 < y < \delta} g(y) = 1 \qquad\text{and}\qquad \inf_{0 < y < \delta} g(y) = -1.$$
Consequently, we have $\limsup_{y\to 0^+} g(y) = 1$ and $\liminf_{y\to 0^+} g(y) = -1$. Similarly, $\limsup_{y\to 0^-} g(y) = 1$ and $\liminf_{y\to 0^-} g(y) = -1$. □

Dini-Derivates
We now define the Dini-derivates of a real-valued function.

DEFINITION 6.3 Dini-Derivates
Let f be a real-valued function defined in an open interval about the point x. Set
$$D^+f(x) = \limsup_{h\to 0^+} \frac{f(x+h) - f(x)}{h}, \qquad D_+f(x) = \liminf_{h\to 0^+} \frac{f(x+h) - f(x)}{h},$$
$$D^-f(x) = \limsup_{h\to 0^-} \frac{f(x+h) - f(x)}{h}, \qquad D_-f(x) = \liminf_{h\to 0^-} \frac{f(x+h) - f(x)}{h}.$$
These four extended real numbers are called the Dini-derivates of f at x. They are, respectively, the upper right, lower right, upper left, and lower left derivates.
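For a concrete feel for Definition 6.3, one can estimate Dini-derivates numerically. The Python sketch below makes two simplifying assumptions. First, the sup and inf over $0 < h < \delta$ are replaced by a max and min over a finite grid of values of h. Second, the limit as $\delta \to 0$ is replaced by a single small δ. The outputs are therefore estimates only, not a proof. The test function $f(x) = x\sin(1/x)$ with $f(0) = 0$ and the helper `approx_dini` are illustrative choices, not taken from the text. At $x = 0$ the difference quotient is $\sin(1/h)$ for $h > 0$ and $-\sin(1/|h|)$ for $h < 0$. Hence $D^+f(0) = D^-f(0) = 1$ and $D_+f(0) = D_-f(0) = -1$, and f is not differentiable at 0. At $x = 1/\pi$, by contrast, f is differentiable with $f'(1/\pi) = \sin\pi - \pi\cos\pi = \pi$, so all four estimates should be close to π.

```python
import math

def f(x):
    """f(x) = x sin(1/x) for x != 0, with f(0) = 0."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0

def approx_dini(func, x, delta=1e-2, samples=200_000):
    """Crude estimates of (D^+, D_+, D^-, D_-) of func at x.

    The sup/inf over 0 < h < delta is replaced by max/min over the grid
    h = delta*i/samples, i = 1..samples, and the limit as delta -> 0 by one
    fixed delta, so the results are estimates only.
    """
    hs = [delta * i / samples for i in range(1, samples + 1)]
    right = [(func(x + h) - func(x)) / h for h in hs]     # increments h > 0
    left = [(func(x - h) - func(x)) / (-h) for h in hs]   # increments -h < 0
    return max(right), min(right), max(left), min(left)

# At x = 0: upper derivates near 1, lower derivates near -1.
print(approx_dini(f, 0.0))
# At x = 1/pi: all four estimates near f'(1/pi) = pi.
print(approx_dini(f, 1 / math.pi, delta=1e-6, samples=1_000))
```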
It follows that f is differentiable at x if and only if all four of the Dini-derivates are equal and finite. It also follows that
$$f'_+(x) = \lim_{h\to 0^+} \frac{f(x+h) - f(x)}{h}$$
exists in $\mathcal{R}^*$ if and only if $D^+f(x) = D_+f(x)$; and similarly for $f'_-(x)$.

EXAMPLE 6.2 Illustrates Definition 6.3
Let $f = \chi_{\mathcal{Q}}$ and $x \in \mathcal{Q}$. Then
$$\frac{f(x+h) - f(x)}{h} = \begin{cases} -1/h, & \text{if } h \notin \mathcal{Q}; \\ 0, & \text{if } h \in \mathcal{Q}. \end{cases} \qquad (6.1)$$
It follows easily from (6.1) that for each $\delta > 0$,
$$\sup_{0 < h < \delta} \frac{f(x+h) - f(x)}{h} = 0 \qquad\text{and}\qquad \inf_{0 < h < \delta} \frac{f(x+h) - f(x)}{h} = -\infty.$$
Therefore, $D^+f(x) = 0$ and $D_+f(x) = -\infty$. Similarly, we find that for each $\delta > 0$,
$$\sup_{-\delta < h < 0} \frac{f(x+h) - f(x)}{h} = \infty \qquad\text{and}\qquad \inf_{-\delta < h < 0} \frac{f(x+h) - f(x)}{h} = 0,$$
so that $D^-f(x) = \infty$ and $D_-f(x) = 0$. □

An Everywhere-Continuous, Nowhere-Differentiable Function
It is an elementary fact proved in calculus that if f is differentiable at a point x, then it is continuous at x. The converse of this fact fails. For example, $f(x) = |x|$ is continuous but not differentiable at $x = 0$. We now present a more striking example, namely, a function that is continuous at every point of $\mathcal{R}$ but differentiable at no point of $\mathcal{R}$. The idea is to construct a function that is everywhere continuous but oscillates so wildly as to be nowhere differentiable.

EXAMPLE 6.3 A Continuous, Nowhere-Differentiable Function
Define $\phi(x)$ on [0,1] by
$$\phi(x) = \begin{cases} x, & 0 \le x \le \tfrac12; \\ 1 - x, & \tfrac12 < x \le 1. \end{cases}$$
Extend φ to all of $\mathcal{R}$ via $\phi(x + k) = \phi(x)$ for $k \in \mathcal{Z}$. See Fig. 6.1. Next define the functions $u_n$, $n = 0, 1, 2, \ldots$, by $u_n(x) = \phi(4^n x)/4^n$, as portrayed in Fig. 6.2. Note that, for each n, $u_n$ is continuous on $\mathcal{R}$.

FIGURE 6.2 Graphs of the $u_n$'s.
Now consider the function f defined for all $x \in \mathcal{R}$ by
$$f(x) = \sum_{n=0}^\infty u_n(x).$$
For $x \in \mathcal{R}$, $|u_n(x)| \le 1/(2\cdot 4^n)$ and so the series converges uniformly on $\mathcal{R}$. Hence, f is continuous on $\mathcal{R}$. Note also that for $k = 0, 1, 2, \ldots$ and $n \ge k$,
$$u_n(x \pm 4^{-k}) = \frac{\phi\bigl(4^n(x \pm 4^{-k})\bigr)}{4^n} = \frac{\phi(4^n x \pm 4^{n-k})}{4^n} = \frac{\phi(4^n x)}{4^n} = u_n(x) \qquad (6.2)$$
for all $x \in \mathcal{R}$.

To show that f is nowhere differentiable we consider two cases. Reference to Fig. 6.2 will prove helpful during the discussion.

Case 1: x is not of the form $m/4^k$ for some $m \in \mathcal{Z}$ and $k \in \mathcal{N}$.
We will find a sequence $\{h_n\}_{n=1}^\infty$ such that $h_n \ne 0$ for all $n \in \mathcal{N}$ and $h_n \to 0$, but $(f(x + h_n) - f(x))/h_n$ does not have a limit. This will show that f is not differentiable at x.

We can assume without loss of generality that $0 < x < 1$. Then x lies in exactly one of the intervals (0,1/4), (1/4,1/2), (1/2,3/4), (3/4,1). Hence, we can choose $h_1$ so that $|h_1| = 1/4$ and that $x + h_1 \in (0,1/2)$ if $x \in (0,1/2)$ and $x + h_1 \in (1/2,1)$ if $x \in (1/2,1)$. It then follows from (6.2) that
$$\frac{f(x + h_1) - f(x)}{h_1} = \frac{\phi(x + h_1) - \phi(x)}{h_1} = \begin{cases} 1, & x \in (0, \tfrac12); \\ -1, & x \in (\tfrac12, 1). \end{cases}$$
Next, x also lies in exactly one of the intervals (0,1/16), (1/16,1/8), ..., (15/16,1). Hence, we can choose $h_2$ so that $|h_2| = 1/4^2$ and that $x + h_2 \in (0,1/8)$ if $x \in (0,1/8)$, $x + h_2 \in (1/8,1/4)$ if $x \in (1/8,1/4)$, ..., $x + h_2 \in (7/8,1)$ if $x \in (7/8,1)$. It then follows from (6.2) that
$$\frac{f(x + h_2) - f(x)}{h_2} = \begin{cases} 2, & x \in \bigl(\frac{k-1}{8}, \frac{k}{8}\bigr),\ k = 1, 3; \\ 0, & x \in \bigl(\frac{k-1}{8}, \frac{k}{8}\bigr),\ k = 2, 4, 5, 7; \\ -2, & x \in \bigl(\frac{k-1}{8}, \frac{k}{8}\bigr),\ k = 6, 8. \end{cases}$$
Continuing in this manner, we obtain a sequence $\{h_n\}_{n=1}^\infty$ such that $|h_n| = 1/4^n$ and
$$\frac{f(x + h_n) - f(x)}{h_n} = \begin{cases} \text{odd integer}, & n \text{ odd}; \\ \text{even integer}, & n \text{ even}. \end{cases}$$
Thus, $h_n \to 0$, but $\lim_{n\to\infty} (f(x + h_n) - f(x))/h_n$ does not exist, because consecutive quotients are integers of opposite parity and so differ by at least 1. Hence, f is not differentiable at x.

Case 2: x is of the form $m/4^k$ for some $m \in \mathcal{Z}$ and $k \in \mathcal{N}$.
Let $h_n = 4^{-n}$. If $r \ge n$ then, by (6.2), $u_r(x + h_n) = u_r(x + 4^{-n}) = u_r(x)$ and, consequently,
$$\frac{u_r(x + h_n) - u_r(x)}{h_n} = 0, \qquad r \ge n. \qquad (6.3)$$
Now, let $n \in \mathcal{N}$ with $n > k$. If $k \le r \le n - 1$, then $u_r(x) = u_r(m/4^k) = \phi(m4^{r-k})/4^r = 0$. Moreover, because $0 < 4^{r-n} \le 1/4 < 1/2$, we have $\phi(4^{r-n}) = 4^{r-n}$, and therefore,
$$u_r(x + h_n) = u_r\Bigl(\frac{m}{4^k} + h_n\Bigr) = \phi\Bigl(4^r\Bigl(\frac{m}{4^k} + \frac{1}{4^n}\Bigr)\Bigr)\Big/4^r = \phi(m4^{r-k} + 4^{r-n})/4^r = \phi(4^{r-n})/4^r = 4^{-n}.$$
Consequently,
$$\frac{u_r(x + h_n) - u_r(x)}{h_n} = 1, \qquad k \le r \le n - 1. \qquad (6.4)$$
Next note that, for all r, $u_r$ has a right derivative at all points and so, in particular, at x. Hence, it follows that
$$\lim_{n\to\infty} \sum_{r=0}^{k-1} \frac{u_r(x + h_n) - u_r(x)}{h_n} \qquad (6.5)$$
exists and is finite. Now, for convenience, let $d_r = (u_r(x + h_n) - u_r(x))/h_n$. For $n > k$, we have
$$\frac{f(x + h_n) - f(x)}{h_n} = \sum_{r=0}^{k-1} d_r + \sum_{r=k}^{n-1} d_r + \sum_{r=n}^\infty d_r.$$
By (6.4), the second term equals $n - k$ and, by (6.3), the third term equals 0. Hence, for $n > k$,
$$\frac{f(x + h_n) - f(x)}{h_n} = \sum_{r=0}^{k-1} \frac{u_r(x + h_n) - u_r(x)}{h_n} + n - k.$$
Applying (6.5), we can now conclude that
$$\lim_{n\to\infty} \frac{f(x + h_n) - f(x)}{h_n} = \infty.$$
In particular, f is not differentiable at x. □

Vitali Covers
Example 6.3 shows that continuity is by no means sufficient for differentiability. The function f constructed in that example is everywhere continuous but nowhere differentiable. Essentially, that function is nowhere differentiable because it “oscillates vigorously.” In Section 6.2 we will show that functions that do not oscillate vigorously (in a sense to be made precise) are differentiable almost everywhere.

Our next goal is to prove that a monotone function is differentiable almost everywhere, a theorem due to Lebesgue. The proof we give uses the concept of Vitali covers. Roughly speaking, a family $\mathcal{V}$ of closed intervals is a Vitali cover of a set E of real numbers if every point of E is in arbitrarily small intervals of $\mathcal{V}$. More precisely, we have the following definition.

DEFINITION 6.4 Vitali Cover
Let $E \subset \mathcal{R}$. A family $\mathcal{V}$ of nondegenerate closed intervals is said to be a Vitali cover of E if for each $x \in E$ and each $\delta > 0$ there is an $I \in \mathcal{V}$ such that $x \in I$ and $\ell(I) < \delta$.

The following theorem, called the Vitali covering theorem, uses the concept of Lebesgue outer measure $\lambda^*$, defined in Chapter 3 on page 106.

THEOREM 6.1 Vitali Covering Theorem
Let $E \subset \mathcal{R}$ with $\lambda^*(E) < \infty$ and suppose $\mathcal{V}$ is a Vitali cover of E. Then for each $\epsilon > 0$ there is a finite disjoint collection $\{I_k\}_{k=1}^n \subset \mathcal{V}$ such that
$$\lambda^*\Bigl(E \setminus \bigcup_{k=1}^n I_k\Bigr) < \epsilon.$$
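Returning briefly to Example 6.3, the Case 2 computation can be checked with exact rational arithmetic. The Python sketch below uses the standard library's `fractions.Fraction`, so no rounding occurs; the function names `phi`, `u`, and `quotient` are our own. By (6.2), $u_r(x + 4^{-n}) = u_r(x)$ whenever $r \ge n$. Hence the difference quotient of the full series f with $h_n = 4^{-n}$ involves only $u_0, \ldots, u_{n-1}$ and can be computed exactly. For $x = 3/16$ (so $m = 3$, $k = 2$) one finds $d_0 = 1$ and $d_1 = -1$. The formula at the end of Case 2 then predicts the quotient $n - 2$ for $n > 2$, which grows without bound.

```python
import math
from fractions import Fraction

def phi(x):
    """The 1-periodic tent function of Example 6.3: distance from x to the nearest integer."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)

def u(n, x):
    """u_n(x) = phi(4**n * x) / 4**n."""
    return phi(4 ** n * x) / 4 ** n

def quotient(x, n):
    """Exact value of (f(x + h_n) - f(x)) / h_n with h_n = 4**(-n).

    By (6.2), u_r(x + 4**(-n)) = u_r(x) for r >= n, so only the terms
    u_0, ..., u_{n-1} contribute and the infinite series truncates exactly.
    """
    h = Fraction(1, 4 ** n)
    return sum((u(r, x + h) - u(r, x)) / h for r in range(n))

x = Fraction(3, 16)          # of the form m / 4**k with m = 3, k = 2 (Case 2)
print([quotient(x, n) for n in range(3, 9)])
```

Because every quantity is a `Fraction`, the printed quotients are exact integers rather than floating-point approximations of them.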
PROOF: Because $\lambda^*(E) < \infty$, we can choose an open set O such that $O \supset E$ and $\lambda^*(O) < \infty$. Set $\mathcal{W} = \{I \in \mathcal{V} : I \subset O\}$. Then $\mathcal{W}$ is a Vitali cover for E. (See Exercise 6.12.)

The idea of the proof is this: Starting with some $I_1 \in \mathcal{W}$, select an $I_2 \in \mathcal{W}$ as large as possible but missing $I_1$; then select an $I_3 \in \mathcal{W}$ as large as possible but missing $I_1 \cup I_2$; and continue the process until $\lambda^*\bigl(E \setminus \bigcup_{k=1}^n I_k\bigr)$ becomes small.

So, let $I_1 \in \mathcal{W}$. If $E \subset I_1$, then we are done. Otherwise, let
$$\delta_1 = \sup\{\,\ell(I) : I \in \mathcal{W},\ I \cap I_1 = \emptyset\,\}.$$
Because $\mathcal{W}$ is a Vitali cover of E and $E \setminus I_1 \ne \emptyset$, it follows that $\delta_1 > 0$. Also, because $I \subset O$ for all $I \in \mathcal{W}$, it follows that $\delta_1 \le \lambda^*(O) < \infty$. Hence we can choose $I_2 \in \mathcal{W}$ with $I_2 \cap I_1 = \emptyset$ and $\ell(I_2) > \delta_1/2$. Again, if $E \subset I_1 \cup I_2$, then we are done.

We now proceed inductively. Suppose $I_1, \ldots, I_n \in \mathcal{W}$ are pairwise disjoint. If $E \subset \bigcup_{k=1}^n I_k$, then we are done. Otherwise, let
$$\delta_n = \sup\{\,\ell(I) : I \in \mathcal{W},\ I \cap I_k = \emptyset,\ 1 \le k \le n\,\}.$$
Since $\mathcal{W}$ is a Vitali cover of E and $E \setminus \bigcup_{k=1}^n I_k \ne \emptyset$, it follows that $\delta_n > 0$. Also, because $I \subset O$ for all $I \in \mathcal{W}$ and $\lambda^*(O) < \infty$, it follows that $\delta_n < \infty$. Hence, there is a member of $\mathcal{W}$, say $I_{n+1}$, such that $I_{n+1} \cap I_k = \emptyset$, $1 \le k \le n$, and $\ell(I_{n+1}) > \delta_n/2$.

If this process terminates after a finite number of steps, then we are done. Otherwise, it yields a sequence $\{I_n\}_{n=1}^\infty$ of pairwise disjoint members of $\mathcal{W}$ such that $\ell(I_{n+1}) > \delta_n/2$ and $\sum_n \ell(I_n) < \infty$. Because $\sum_n \ell(I_n) < \infty$, there is an $N \in \mathcal{N}$ such that
$$\sum_{n=N+1}^\infty \ell(I_n) < \epsilon/5.$$
Set $A = E \setminus \bigcup_{n=1}^N I_n$. We claim that $\lambda^*(A) < \epsilon$.

Let $x \in A$. Then $x \notin \bigcup_{n=1}^N I_n$ and so $d\bigl(x, \bigcup_{n=1}^N I_n\bigr) = \delta > 0$. Because $x \in E$ and $\mathcal{W}$ is a Vitali cover for E, there is an $I \in \mathcal{W}$ with $x \in I$ and $\ell(I) < \delta$. It follows that $I \cap I_n = \emptyset$ for $n = 1, 2, \ldots, N$.

Now, there must be an $n \in \mathcal{N}$ with $I \cap I_n \ne \emptyset$. Suppose to the contrary. Then for each $n \in \mathcal{N}$, $I \cap I_k = \emptyset$, $1 \le k \le n$. Applying the definition of $\delta_n$, we get that for each $n \in \mathcal{N}$, $\ell(I) \le \delta_n < 2\ell(I_{n+1})$. But this is impossible because $\ell(I) > 0$ and $\ell(I_{n+1}) \to 0$ as $n \to \infty$.
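Let $m = \min\{\, n : I \cap I_n \ne \emptyset \,\}$; note that $m > N$. Let $y_m$ be the midpoint of $I_m$. Then
$$|x - y_m| \le \ell(I) + \tfrac12\ell(I_m) \le \delta_{m-1} + \tfrac12\ell(I_m) < 2\ell(I_m) + \tfrac12\ell(I_m) = \tfrac52\ell(I_m).$$
Consequently, $x \in \bigl[y_m - \tfrac52\ell(I_m),\ y_m + \tfrac52\ell(I_m)\bigr] = J_m$.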
Hence, if $x \in A$, there is an $m > N$ such that $x \in J_m$; in other words, $A \subset \bigcup_{m=N+1}^\infty J_m$. It follows that
$$\lambda^*(A) \le \sum_{m=N+1}^\infty \ell(J_m) = 5\sum_{m=N+1}^\infty \ell(I_m) < \epsilon.$$
This completes the proof.

Differentiability of Monotone Functions
We are just about ready to prove Lebesgue's famous theorem on the almost-everywhere differentiability of monotone functions. First a lemma.

LEMMA 6.1
Let f be a real-valued function on (a, b). Then the set of points in (a, b) where $f'_+(x)$ and $f'_-(x)$ exist (possibly ±∞) but are unequal is countable and, hence, has Lebesgue measure zero.

PROOF: We show that $\{x \in (a,b) : f'_+(x) \text{ and } f'_-(x) \text{ exist and } f'_+(x) < f'_-(x)\}$ is countable. An analogous argument shows that $\{x \in (a,b) : f'_+(x) \text{ and } f'_-(x) \text{ exist and } f'_+(x) > f'_-(x)\}$ is also countable.

Let $E = \{x \in (a,b) : f'_+(x) \text{ and } f'_-(x) \text{ exist and } f'_+(x) < f'_-(x)\}$. We will construct a one-to-one mapping of E into a countable set, thus establishing the countability of E.

So, let $x \in E$. Choose $r_x \in \mathcal{Q}$ such that $f'_+(x) < r_x < f'_-(x)$. By the definitions of $f'_+(x)$ and $f'_-(x)$, we can choose rational numbers $s_x$ and $t_x$ such that $a < t_x < x < s_x < b$ and
$$\frac{f(y) - f(x)}{y - x} < r_x, \quad x < y < s_x; \qquad \frac{f(y) - f(x)}{y - x} > r_x, \quad t_x < y < x;$$
or, in other words,
$$f(y) - f(x) < r_x(y - x), \qquad t_x < y < s_x,\ y \ne x.$$
Now consider the mapping $\phi : E \to \mathcal{Q}^3$ defined by $\phi(x) = (r_x, s_x, t_x)$. Because $\mathcal{Q}^3$ is countable, it will follow that E is countable if we can prove that φ is one-to-one.
Assume to the contrary that there exist $x, z \in E$ with $x \neq z$ and $\phi(x) = \phi(z)$. Then $r_z = r_x$, $s_z = s_x$, and $t_z = t_x$. So,
$$f(y) - f(x) < r_x (y - x), \qquad t_x < y < s_x,\ y \neq x,$$
$$f(y) - f(z) < r_x (y - z), \qquad t_x < y < s_x,\ y \neq z.$$
Since $t_x = t_z < z < s_z = s_x$ and $z \neq x$, and $t_x < x < s_x$ and $x \neq z$, we conclude that
$$f(z) - f(x) < r_x (z - x), \qquad f(x) - f(z) < r_x (x - z),$$
which is impossible (adding the two inequalities gives $0 < 0$).

THEOREM 6.2 Let $f$ be a monotone function on $[a,b]$. Then $f$ is differentiable almost everywhere on $[a,b]$.

PROOF: The method of the proof is as follows. First we will show that $\{ x \in (a,b) : D_+f(x) < D^+f(x) \}$ has measure zero; a similar argument will show that $\{ x \in (a,b) : D_-f(x) < D^-f(x) \}$ has measure zero as well. This will establish that $\{ x \in (a,b) : f'_+(x) \text{ and } f'_-(x) \text{ exist in } \mathcal{R}^* \}^c$ has measure zero. Then, by Lemma 6.1, we will be able to conclude that $\{ x \in (a,b) : f'(x) \text{ exists in } \mathcal{R}^* \}^c$ has measure zero. Finally, we will show that $\{ x \in (a,b) : f'(x) \text{ is infinite} \}$ has measure zero.

We can assume without loss of generality that $f$ is nondecreasing. Let $E = \{ x \in (a,b) : D_+f(x) < D^+f(x) \}$. We will show that $\lambda^*(E) = 0$. For each $r, s \in \mathcal{Q}$ with $0 < r < s$, let
$$E_{rs} = \{ x \in (a,b) : D_+f(x) < r < s < D^+f(x) \}.$$
Then $E = \bigcup \{ E_{rs} : r, s \in \mathcal{Q},\ 0 < r < s \}$, a countable union. If we can prove $\lambda^*(E_{rs}) = 0$ for all $r, s \in \mathcal{Q}$ with $0 < r < s$, then we will have established that $\lambda^*(E) = 0$.

So, let $r, s \in \mathcal{Q}$ with $0 < r < s$, and set $\alpha = \lambda^*(E_{rs})$. Let $\varepsilon > 0$ be given, and choose $O$ open with $O \supset E_{rs}$ and $\lambda(O) < \alpha + \varepsilon$. If $x \in E_{rs}$, then $D_+f(x) < r$. Consequently, for each $\delta > 0$,
$$\inf_{0 < h < \delta} \frac{f(x+h) - f(x)}{h} < r$$
and, hence, there is an $h$, with $0 < h < \delta$, such that $f(x+h) - f(x) < rh$. Now, let $\mathcal{V}$ be the collection of all closed intervals of the form $[x, x+h]$, where $h > 0$, $x \in E_{rs}$, $f(x+h) - f(x) < rh$, and $[x, x+h] \subset O \cap (a,b)$. Then it follows from the previous paragraph that $\mathcal{V}$ is a Vitali cover of $E_{rs}$. Thus,
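by the Vitali covering theorem, Theorem 6.1, there is a finite sequence $I_1, I_2, \dots, I_n$ of pairwise disjoint members of $\mathcal{V}$ such that
$$\lambda^*\Bigl(E_{rs} \setminus \bigcup_{k=1}^{n} I_k\Bigr) < \varepsilon.$$
Let $I_k = [x_k, x_k + h_k]$. We will need to work with the open intervals $(x_k, x_k + h_k)$. Set $U = \bigcup_{k=1}^{n} (x_k, x_k + h_k)$ and note that (since $U$ differs from $\bigcup_{k=1}^{n} I_k$ by finitely many points)
$$\lambda^*(E_{rs} \setminus U) < \varepsilon. \tag{6.6}$$
Also, because $U \subset O$, we have
$$\sum_{k=1}^{n} h_k = \lambda(U) \le \lambda(O) < \alpha + \varepsilon. \tag{6.7}$$
Then, by the definition of $\mathcal{V}$, we may conclude that
$$\sum_{k=1}^{n} \bigl(f(x_k + h_k) - f(x_k)\bigr) < r \sum_{k=1}^{n} h_k < r(\alpha + \varepsilon). \tag{6.8}$$
Next, assume that $y \in E_{rs} \cap U$. Then $D^+f(y) > s$ so that for each $\delta > 0$ there is a $k$, with $0 < k < \delta$, such that $f(y+k) - f(y) > sk$. As $U$ is open, $[y, y+k] \subset U$ for sufficiently small $k$. Consequently, if we let $\mathcal{W}$ be the collection of all closed intervals of the form $[y, y+k]$, where $k > 0$, $y \in E_{rs} \cap U$, $f(y+k) - f(y) > sk$, and $[y, y+k] \subset U$, then $\mathcal{W}$ is a Vitali cover of $E_{rs} \cap U$. Using the Vitali covering theorem again, we can choose pairwise disjoint members of $\mathcal{W}$, say $J_1, J_2, \dots, J_m$, such that
$$\lambda^*\Bigl((E_{rs} \cap U) \setminus \bigcup_{j=1}^{m} J_j\Bigr) < \varepsilon. \tag{6.9}$$
From (6.6) and (6.9), we get
$$\alpha = \lambda^*(E_{rs}) \le \lambda^*(E_{rs} \cap U) + \lambda^*(E_{rs} \setminus U) < \lambda\Bigl(\bigcup_{j=1}^{m} J_j\Bigr) + \varepsilon + \varepsilon. \tag{6.10}$$
Setting $J_j = [y_j, y_j + k_j]$, we obtain from (6.10) and the definition of $\mathcal{W}$ that
$$\sum_{j=1}^{m} \bigl(f(y_j + k_j) - f(y_j)\bigr) > s \sum_{j=1}^{m} k_j = s\,\lambda\Bigl(\bigcup_{j=1}^{m} J_j\Bigr) > s(\alpha - 2\varepsilon). \tag{6.11}$$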
Now, $\bigcup_{j=1}^{m} [y_j, y_j + k_j] \subset U \subset \bigcup_{k=1}^{n} [x_k, x_k + h_k]$ and, therefore (see Exercise 6.14),
$$\sum_{j=1}^{m} \bigl(f(y_j + k_j) - f(y_j)\bigr) \le \sum_{k=1}^{n} \bigl(f(x_k + h_k) - f(x_k)\bigr).$$
This, along with (6.8) and (6.11), implies that $s(\alpha - 2\varepsilon) < r(\alpha + \varepsilon)$. As $\varepsilon > 0$ was arbitrary, it follows that $s\alpha \le r\alpha$. Because $r < s$ and $\alpha \ge 0$, we must have $\alpha = 0$.

Thus, we have shown that $\{ x \in (a,b) : D_+f(x) < D^+f(x) \}$ has measure zero. A similar argument shows that $\{ x \in (a,b) : D_-f(x) < D^-f(x) \}$ also has measure zero. Hence, $\{ x \in (a,b) : f'_+(x) \text{ and } f'_-(x) \text{ exist in } \mathcal{R}^* \}^c$ has measure zero. Applying Lemma 6.1, we conclude that $f'(x)$ exists for $\lambda$-almost all $x \in (a,b)$, although its value may be infinite at some points.

Because $f$ is nondecreasing, an infinite derivative must equal $+\infty$. Let $A = \{ x \in (a,b) : f'(x) = +\infty \}$. We wish to show that $\lambda(A) = 0$. Let $x \in A$ and $N \in \mathcal{N}$. Then $D^+f(x) = \infty$ so that for each $\delta > 0$ there is an $h$, with $0 < h < \delta$, such that
$$f(x+h) - f(x) > Nh. \tag{6.12}$$
Consequently, if we let $\mathcal{U}$ be the collection of all closed intervals of the form $[x, x+h]$, where $h > 0$, $x \in A$, $f(x+h) - f(x) > Nh$, and $[x, x+h] \subset (a,b)$, then $\mathcal{U}$ is a Vitali cover of $A$. By the Vitali covering theorem, there exist pairwise disjoint members of $\mathcal{U}$, say $I_1, I_2, \dots, I_n$, such that
$$\lambda^*\Bigl(A \setminus \bigcup_{k=1}^{n} I_k\Bigr) < \frac{1}{N}. \tag{6.13}$$
Set $I_k = [x_k, x_k + h_k]$. Then, by (6.12) and (6.13),
$$N\lambda^*(A) < 1 + N\lambda\Bigl(\bigcup_{k=1}^{n} I_k\Bigr) = 1 + N \sum_{k=1}^{n} h_k \le 1 + \sum_{k=1}^{n} \bigl(f(x_k + h_k) - f(x_k)\bigr) \le 1 + f(b) - f(a),$$
where the last inequality follows from the fact that $f$ is nondecreasing and $I_k \subset (a,b)$, $1 \le k \le n$. (See Exercise 6.13.) But $N\lambda^*(A) < 1 + f(b) - f(a)$ for each $N \in \mathcal{N}$ implies that $\lambda^*(A) = 0$.
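The selection step in the proof of the Vitali covering theorem (take an interval as large as possible among those missing the intervals already chosen, then enlarge each chosen interval about its midpoint) can be tried out on a finite family of closed intervals. The Python sketch below is only a finite analogue under our own assumptions: the family is a finite list of `(left, right)` pairs, so a longest admissible interval always exists and the factor-1/2 slack used for $\delta_n$ in the proof is not needed. In that setting every member of the family lies in the threefold enlargement of some chosen interval, and a fortiori in the fivefold enlargement that plays the role of $J_m$. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [left, right] with left < right


def length(iv: Interval) -> float:
    return iv[1] - iv[0]


def disjoint(p: Interval, q: Interval) -> bool:
    # Closed intervals are disjoint iff one ends strictly before the other starts.
    return p[1] < q[0] or q[1] < p[0]


def greedy_disjoint(family: List[Interval]) -> List[Interval]:
    """Repeatedly pick a longest interval that misses every earlier pick.

    Scanning in order of decreasing length, the first interval that misses
    all current picks is a longest such interval, so one pass suffices.
    """
    chosen: List[Interval] = []
    for iv in sorted(family, key=length, reverse=True):
        if all(disjoint(iv, c) for c in chosen):
            chosen.append(iv)
    return chosen


def enlarge(iv: Interval, factor: float) -> Interval:
    # Same midpoint, length multiplied by `factor`.
    mid = (iv[0] + iv[1]) / 2
    half = factor * length(iv) / 2
    return (mid - half, mid + half)


def covered_by_enlargements(family: List[Interval], chosen: List[Interval],
                            factor: float = 5.0) -> bool:
    """True if every member of `family` lies inside some enlarged pick."""
    big = [enlarge(c, factor) for c in chosen]
    return all(any(b[0] <= iv[0] and iv[1] <= b[1] for b in big) for iv in family)


family = [(0, 1), (0.5, 2), (1.5, 1.8), (3, 3.2), (2.9, 3.05)]
chosen = greedy_disjoint(family)
print(chosen, covered_by_enlargements(family, chosen, 3.0))
```

An interval that is not picked was rejected because it meets an earlier, hence at least as long, pick $J$; every point of it is then within $\tfrac{3}{2}\ell(J)$ of the midpoint of $J$, which is why the enlargement test succeeds.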
Derivatives of Complex-Valued Functions

We conclude this section by briefly discussing differentiation of complex-valued functions of a real variable.

DEFINITION 6.5 Derivative of a Complex-Valued Function
A complex-valued function $f$ defined in an open interval about $x \in \mathcal{R}$ is said to be differentiable at $x$ if
$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
exists. In that case the limit is called the derivative of $f$ at $x$ and is denoted by $f'(x)$.

The proof of the following proposition is left as an exercise for the reader.

PROPOSITION 6.1 Let $f$ be a complex-valued function defined in an open interval about $x$. Let $u = \Re f$ and $v = \Im f$. Then $f$ is differentiable at $x$ if and only if $u$ and $v$ are differentiable at $x$ and, in that case, $f'(x) = u'(x) + i v'(x)$.

EXERCISES 6.1

6.1 Show that $\lim_{y \to x^+} g(y) = L \in \mathcal{R}^*$ if and only if for each sequence $\{y_n\}_{n=1}^{\infty}$ with $y_n > x$ and $y_n \to x$, $\lim_{n \to \infty} g(y_n) = L$. Establish a similar result for left-hand limits.

6.2 Suppose that $\{y_n\}_{n=1}^{\infty}$ is a sequence with $y_n > x$ and $y_n \to x$. Show that
$$\liminf_{y \to x^+} g(y) \le \liminf_{n \to \infty} g(y_n) \le \limsup_{n \to \infty} g(y_n) \le \limsup_{y \to x^+} g(y).$$
Establish a similar result for left-hand limits.

6.3 Show that $\lim_{y \to x^+} g(y) = L \in \mathcal{R}^*$ if and only if
$$\liminf_{y \to x^+} g(y) = \limsup_{y \to x^+} g(y) = L.$$
Establish a similar result for left-hand limits.

6.4 Prove that $\lim_{y \to x} g(y) = L \in \mathcal{R}^*$ if and only if
$$\liminf_{y \to x^-} g(y) = \limsup_{y \to x^-} g(y) = \liminf_{y \to x^+} g(y) = \limsup_{y \to x^+} g(y) = L.$$
6.5 Find the Dini-derivates of $f = \chi_{\mathcal{Q}}$ at $x$ if $x \notin \mathcal{Q}$.

6.6 Set $f(0) = 0$ and $f(x) = x \sin(1/x)$ for $x \neq 0$. Find the Dini-derivates of $f$ at each $x \in \mathcal{R}$.

6.7 Suppose $f$ attains its minimum at a point $x$ and that $f$ is defined in an open interval about $x$. Show that $D_+f(x) \ge 0 \ge D^-f(x)$.

Exercises 6.8–6.11 discuss differentiation of convex functions. A real-valued function, $f$, on $(a,b)$ is called convex if
$$f\bigl(cx + (1-c)y\bigr) \le c f(x) + (1-c) f(y)$$
for all $x, y \in (a,b)$ and $0 \le c \le 1$.

6.8 Prove that a convex function on $(a,b)$ is continuous thereon.

6.9 Let $f$ be a convex function on $(a,b)$. Prove that $f'_+(x)$ and $f'_-(x)$ exist for all $x \in (a,b)$.

6.10 Let $f$ be a convex function on $(a,b)$. Prove that $f'_+$ and $f'_-$ are nondecreasing on $(a,b)$.

6.11 Let $f$ be a convex function on $(a,b)$. Prove that $f'$ exists almost everywhere on $(a,b)$.

6.12 Let $E \subset \mathcal{R}$ and let $O$ be an open set with $O \supset E$. Suppose $\mathcal{V}$ is a Vitali cover of $E$. Show that $\mathcal{W} = \{ I \in \mathcal{V} : I \subset O \}$ is also a Vitali cover of $E$.

6.13 Suppose $f$ is nondecreasing on $[a,b]$. Let $I_k$, $1 \le k \le n$, be a sequence of pairwise disjoint subintervals of $[a,b]$ having left and right endpoints $a_k$ and $b_k$, $1 \le k \le n$. Show that
$$\sum_{k=1}^{n} \bigl(f(b_k) - f(a_k)\bigr) \le f(b) - f(a).$$

6.14 Suppose $f$ is nondecreasing on $[a,b]$. Let $\{J_j\}_{j=1}^{m}$ and $\{I_k\}_{k=1}^{n}$ be two sequences of pairwise disjoint closed subintervals of $[a,b]$ such that $\bigcup_{j=1}^{m} J_j \subset \bigcup_{k=1}^{n} I_k$. Denote the left and right endpoints of $J_j$ by $a_j$ and $b_j$, respectively, and those of $I_k$ by $c_k$ and $d_k$, respectively. Show that
$$\sum_{j=1}^{m} \bigl(f(b_j) - f(a_j)\bigr) \le \sum_{k=1}^{n} \bigl(f(d_k) - f(c_k)\bigr).$$

6.15 Suppose $f$ is nondecreasing on an interval $I$ having a nonempty interior.
a) Show that at each interior point $x$, both $f(x+)$ and $f(x-)$ exist and that
$$f(x+) = \inf\{ f(y) : y \in I \text{ and } y > x \}, \qquad f(x-) = \sup\{ f(y) : y \in I \text{ and } y < x \}.$$
Furthermore, verify that $f(x-) \le f(x) \le f(x+)$. Conclude that $f$ is continuous at $x$ if and only if $f(x+) = f(x-)$.
b) Formulate and prove appropriate analogues of the statements in part (a) in the cases where $x$ is either a left or right endpoint of $I$.
c) Suppose that $a, b \in I$ with $a < b$. Let $x_1, x_2, \dots, x_n$ be distinct points of $(a,b)$. Prove that $\sum_{k=1}^{n} \bigl(f(x_k+) - f(x_k-)\bigr) \le f(b) - f(a)$.
d) Deduce from the previous parts that the set of points in $I$ where $f$ is discontinuous is countable.

6.16 Show that the derivative of the Cantor function equals zero almost everywhere on $[0,1]$.

6.17 We know that the Cantor function, $\psi$, is nondecreasing on $[0,1]$ and, from the preceding exercise, $\psi' = 0$ ae on $[0,1]$. Although $\psi$ is nondecreasing and maps $[0,1]$ to $[0,1]$, it is "usually constant" in the sense that it is constant on each subinterval of the complement of the Cantor set. In this exercise, the Cantor function is used to construct a strictly increasing continuous function, $f$, on $[0,1]$ such that $f' = 0$ ae on $[0,1]$. For each $n \in \mathcal{N}$ and nonnegative integer $k < 2^n$, let
$$f_{nk}(x) = \begin{cases} 0, & 0 \le x \le k/2^n, \\ \psi(2^n x - k), & k/2^n < x < (k+1)/2^n, \\ 1, & (k+1)/2^n \le x \le 1. \end{cases}$$
Define the function, $f$, on $[0,1]$ by
$$f(x) = \sum_{n=1}^{\infty} \sum_{k=0}^{2^n - 1} \cdots f_{nk}(x).$$
a) Show that $f$ is well-defined and continuous.
b) Show that $f$ is strictly increasing on $[0,1]$.
c) Prove that $f' = 0$ ae on $[0,1]$.

6.18 Let $f$ be a real-valued function on $[a,b]$. Suppose $E \subset [a,b]$ and $f'$ exists and is bounded on $E$, say, by $M$. Prove that $\lambda^*\bigl(f(E)\bigr) \le M \lambda^*(E)$.

6.19 Prove Proposition 6.1 on page 328.

6.2 FUNCTIONS OF BOUNDED VARIATION

In Section 6.1 we proved that every monotone function is differentiable almost everywhere (Theorem 6.2). Furthermore, we stated that not only monotone functions but any function that does not "oscillate vigorously" is differentiable almost everywhere. Definition 6.6 makes precise the notion of not "oscillating vigorously."
DEFINITION 6.6 Total Variation; Bounded Variation
Let $f$ be a complex-valued function on $[a,b]$. The total variation of $f$ over $[a,b]$, denoted by $V_a^b f$, is defined by
$$V_a^b f = \sup\Bigl\{ \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| : a = x_0 < x_1 < \cdots < x_n = b \Bigr\}.$$
If $V_a^b f < \infty$, then $f$ is said to be of bounded variation on $[a,b]$.

EXAMPLE 6.4 Illustrates Definition 6.6
a) Any monotone function on $[a,b]$ is of bounded variation. In fact, we have $V_a^b f = f(b) - f(a)$ if $f$ is nondecreasing, and $V_a^b f = f(a) - f(b)$ if $f$ is nonincreasing.
b) Define $f$ on $[0,1]$ by $f(0) = 0$ and $f(x) = x \sin(1/x)$, for $x \neq 0$. Then $f$ is not of bounded variation on $[0,1]$. To see this, consider for each $n \in \mathcal{N}$ the partition of $[0,1]$ given by
$$0 < \frac{2}{(4n+1)\pi} < \frac{2}{4n\pi} < \frac{2}{(4n-1)\pi} < \cdots < \frac{2}{2\pi} < \frac{2}{\pi} < 1.$$
In other words, $x_0 = 0$; $x_k = 2/\bigl((4n + 2 - k)\pi\bigr)$, $k = 1, 2, \dots, 4n+1$; and $x_{4n+2} = 1$. Then
$$\sum_{k=1}^{4n+2} |f(x_k) - f(x_{k-1})| = \frac{2}{(4n+1)\pi} + \frac{2}{(4n+1)\pi} + \frac{2}{(4n-1)\pi} + \frac{2}{(4n-1)\pi} + \cdots + \frac{2}{3\pi} + \frac{2}{3\pi} + \frac{2}{\pi} + \Bigl(\sin 1 - \frac{2}{\pi}\Bigr)$$
and so
$$\sum_{k=1}^{4n+2} |f(x_k) - f(x_{k-1})| = \frac{4}{\pi} \sum_{k=1}^{2n} \frac{1}{2k+1} + \sin 1 > \frac{4}{\pi} \sum_{k=1}^{2n} \frac{1}{2k+1}.$$
As $\sum_{k=1}^{2n} 1/(2k+1) \to \infty$ as $n \to \infty$, it follows that $V_0^1 f = \infty$. □

Our next goal is to prove that any function of bounded variation on $[a,b]$ is differentiable almost everywhere on $[a,b]$. This is an immediate consequence of two facts: that any monotone function is differentiable almost everywhere (Theorem 6.2) and that any real-valued function of bounded variation can be written as the difference of two nondecreasing functions. This latter fact is the content of Theorem 6.3.
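Example 6.4(b) can be checked numerically. The Python sketch below (the function names are our own) builds the partition $x_0 = 0$, $x_k = 2/((4n+2-k)\pi)$, $x_{4n+2} = 1$, evaluates the partition sum $\sum |f(x_k) - f(x_{k-1})|$ for $f(x) = x\sin(1/x)$, and compares it with the closed form $\frac{4}{\pi}\sum_{k=1}^{2n} \frac{1}{2k+1} + \sin 1$. The two agree up to rounding error, and both grow without bound (logarithmically in $n$), which is why $V_0^1 f = \infty$.

```python
import math


def f(x: float) -> float:
    """f(0) = 0 and f(x) = x sin(1/x) for x != 0, as in Example 6.4(b)."""
    return 0.0 if x == 0 else x * math.sin(1.0 / x)


def example_partition(n: int) -> list:
    """x_0 = 0, x_k = 2/((4n + 2 - k) pi) for k = 1, ..., 4n + 1, and x_{4n+2} = 1."""
    pts = [0.0]
    pts += [2.0 / ((4 * n + 2 - k) * math.pi) for k in range(1, 4 * n + 2)]
    pts.append(1.0)
    return pts


def partition_sum(g, pts) -> float:
    """Sum of |g(x_k) - g(x_{k-1})| over consecutive partition points."""
    return sum(abs(g(right) - g(left)) for left, right in zip(pts, pts[1:]))


def closed_form(n: int) -> float:
    """(4/pi) * sum_{k=1}^{2n} 1/(2k+1) + sin 1, the exact value of the partition sum."""
    return 4.0 / math.pi * sum(1.0 / (2 * k + 1) for k in range(1, 2 * n + 1)) + math.sin(1.0)


for n in (1, 10, 100, 1000):
    print(n, partition_sum(f, example_partition(n)), closed_form(n))
```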
THEOREM 6.3 Let $f$ be a real-valued function of bounded variation on $[a,b]$. Then
$$V_a^x f + V_x^y f = V_a^y f, \qquad a \le x < y \le b. \tag{6.14}$$
Moreover, $f$ can be written as the difference of two nondecreasing functions on $[a,b]$.

PROOF: Define $V_a^a f = 0$. First we prove (6.14). If $x = a$, there is nothing to prove. So, assume $x > a$. Let $a = x_0 < x_1 < \cdots < x_n = y$ be a partition of $[a,y]$ and set $k = \min\{ j : x \le x_j \}$. Then, by the definition of total variation,
$$\sum_{j=1}^{k-1} |f(x_j) - f(x_{j-1})| + |f(x) - f(x_{k-1})| \le V_a^x f$$
and
$$|f(x_k) - f(x)| + \sum_{j=k+1}^{n} |f(x_j) - f(x_{j-1})| \le V_x^y f.$$
Consequently, by the triangle inequality,
$$\begin{aligned} \sum_{j=1}^{n} |f(x_j) - f(x_{j-1})| &\le \sum_{j=1}^{k-1} |f(x_j) - f(x_{j-1})| + |f(x) - f(x_{k-1})| \\ &\quad + |f(x_k) - f(x)| + \sum_{j=k+1}^{n} |f(x_j) - f(x_{j-1})| \\ &\le V_a^x f + V_x^y f. \end{aligned}$$
Because the preceding inequality has been established for any partition of $[a,y]$, we have $V_a^y f \le V_a^x f + V_x^y f$.

To obtain the reverse inequality, let $a = x_0 < x_1 < \cdots < x_m = x$ and $x = y_0 < y_1 < \cdots < y_n = y$ be partitions of $[a,x]$ and $[x,y]$, respectively. Then
$$a = x_0 < x_1 < \cdots < x_m = x = y_0 < y_1 < \cdots < y_n = y$$
is a partition of $[a,y]$. Therefore,
$$V_a^y f \ge \sum_{j=1}^{m} |f(x_j) - f(x_{j-1})| + \sum_{k=1}^{n} |f(y_k) - f(y_{k-1})|.$$
Taking the supremum over partitions of $[a,x]$, we obtain that, for any partition $x = y_0 < y_1 < \cdots < y_n = y$ of $[x,y]$,
$$V_a^y f \ge V_a^x f + \sum_{k=1}^{n} |f(y_k) - f(y_{k-1})|.$$
Taking the supremum over all partitions of $[x,y]$ yields $V_a^y f \ge V_a^x f + V_x^y f$. This establishes (6.14).

Write $f(x) = V_a^x f - \bigl(V_a^x f - f(x)\bigr)$. We claim that $V_a^x f$ and $V_a^x f - f(x)$ are nondecreasing functions of $x$ on $[a,b]$. From (6.14) and the fact that $V_x^y f \ge 0$ for $x < y$, we deduce that, as a function of $x$, $V_a^x f$ is nondecreasing on $[a,b]$. It remains to prove that $V_a^x f - f(x)$ is nondecreasing on $[a,b]$. Let $a \le x < y \le b$. By (6.14),
$$\bigl(V_a^y f - f(y)\bigr) - \bigl(V_a^x f - f(x)\bigr) = V_x^y f - \bigl(f(y) - f(x)\bigr) \ge V_x^y f - |f(y) - f(x)| \ge 0,$$
where the last inequality holds because $x = x_0 < x_1 = y$ is a partition of $[x,y]$.

COROLLARY 6.1 Suppose $f$ is a complex-valued function of bounded variation on $[a,b]$. Then $f$ can be written in the form $f = (f_1 - f_2) + i(f_3 - f_4)$, where $f_j$, $1 \le j \le 4$, are nondecreasing functions on $[a,b]$.

PROOF: Because $f$ is of bounded variation, so are $\Re f$ and $\Im f$. (See Exercise 6.24.) Applying Theorem 6.3 to $\Re f$ and $\Im f$ completes the proof.

Since any monotone function on $[a,b]$ is differentiable almost everywhere on $[a,b]$ (Theorem 6.2) and every function of bounded variation on $[a,b]$ can be written as a linear combination of nondecreasing functions (Corollary 6.1), we have the following theorem.

THEOREM 6.4 Any function of bounded variation on $[a,b]$ is differentiable almost everywhere on $[a,b]$.
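The decomposition in the proof of Theorem 6.3, $f(x) = V_a^x f - \bigl(V_a^x f - f(x)\bigr)$, can be imitated on a grid. In the Python sketch below (the names and the equally spaced grid are our own assumptions), the running sum of $|f(x_i) - f(x_{i-1})|$ over grid points is the total variation of the sampled values, which is only a lower approximation to $V_a^x f$. Both it and its difference with $f$ are nondecreasing, for the same reason as in the proof. For $f = \sin$ on $[0, 2\pi]$ the final running sum is close to 4, the total variation of $\sin$ there (equal to $\int_0^{2\pi} |\cos t|\,dt$).

```python
import math
from typing import Callable, List, Tuple


def jordan_on_grid(f: Callable[[float], float], a: float, b: float, n: int
                   ) -> Tuple[List[float], List[float], List[float]]:
    """Return grid points xs and lists p, q with f(xs[i]) == p[i] - q[i].

    p[i] is the variation of the sampled values f(xs[0]), ..., f(xs[i]),
    a discrete stand-in for V_a^x f; q[i] = p[i] - f(xs[i]).
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    p = [0.0]
    for y_prev, y in zip(ys, ys[1:]):
        p.append(p[-1] + abs(y - y_prev))
    q = [pk - yk for pk, yk in zip(p, ys)]
    return xs, p, q


def is_nondecreasing(vals: List[float], tol: float = 1e-12) -> bool:
    """True if each value is at least the previous one, up to a rounding tolerance."""
    return all(v2 >= v1 - tol for v1, v2 in zip(vals, vals[1:]))


xs, p, q = jordan_on_grid(math.sin, 0.0, 2 * math.pi, 10_000)
print(p[-1], is_nondecreasing(p), is_nondecreasing(q))  # p[-1] is close to 4
```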
EXERCISES 6.2

6.20 Using only Definition 6.6, prove that if $f$ is of bounded variation on $[a,b]$, then it is bounded thereon.

6.21 Define $f$ on $[0,1]$ by $f(0) = 0$ and $f(x) = x^2 \sin(1/x)$, for $x \neq 0$. Show that $f$ is of bounded variation on $[0,1]$.

6.22 Define $f$ on $[0,1]$ by $f(0) = 0$ and $f(x) = x^2 \sin(1/x^2)$, for $x \neq 0$. Show that $f$ is not of bounded variation on $[0,1]$.

6.23 Define $f$ on $[0,1]$ by $f(0) = 0$ and $f(x) = x^{\alpha} \sin(1/x)$, for $x \neq 0$. Show that $f$ is of bounded variation on $[0,1]$ if and only if $\alpha > 1$.

6.24 Prove that $f$ is of bounded variation on $[a,b]$ if and only if $\Re f$ and $\Im f$ are of bounded variation on $[a,b]$.

★6.25 Let $f$ and $g$ be complex-valued functions on $[a,b]$ and $\alpha \in \mathcal{C}$. Prove that
a) $V_a^b(f + g) \le V_a^b f + V_a^b g$;
b) $V_a^b(\alpha f) = |\alpha| V_a^b f$.

6.26 Suppose $f$ is a function of bounded variation on $[a,b]$. Show that $f$ has only a countable number of discontinuities.

6.27 Let $f$ be of bounded variation on $[a,b]$ and let $D$ denote the set of points of $(a,b)$ at which $f$ is discontinuous. By Exercise 6.26, we can write $D = \{x_n\}_n$. For each $n$, set $d_n = f(x_n+) - f(x_n-)$. Show that $\sum_n |d_n| \le V_a^b f$.

★6.28 Let $f$ and $g$ be of bounded variation on $[a,b]$. Show that
$$V_a^b(fg) \le \bigl(\sup\{ |f(x)| : x \in [a,b] \}\bigr) V_a^b g + \bigl(\sup\{ |g(x)| : x \in [a,b] \}\bigr) V_a^b f.$$
Deduce that the product of two functions of bounded variation on $[a,b]$ is also of bounded variation on $[a,b]$.

6.29 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of real-valued functions on $[a,b]$ that converges pointwise to the function $f$. Prove that $V_a^b f \le \liminf_{n \to \infty} V_a^b f_n$.

6.30 Suppose that $f\colon [a,b] \to [c,d]$ is monotone and that $g$ is of bounded variation on $[c,d]$. Prove that $V_a^b(g \circ f) \le V_c^d g$.

6.31 Let $f$ be of bounded variation on $[a,b]$. If $f$ is continuous at $x_0 \in [a,b]$, show that the function $g(x) = V_a^x f$ is also continuous at $x_0$.

6.32 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions of bounded variation on $[a,b]$ such that $\sum_n f_n(a)$ converges absolutely and $\sum_n V_a^b f_n < \infty$.
Prove that
a) $\sum_n f_n(x)$ converges absolutely for each $x \in [a,b]$;
b) $V_a^b\bigl(\sum_n f_n\bigr) \le \sum_n V_a^b f_n$.

6.3 THE INDEFINITE LEBESGUE INTEGRAL

Recall that the two fundamental theorems of calculus for Riemann integration show that differentiation and integration are inverse operations. More precisely, we have the following two facts.
First Fundamental Theorem of Calculus: Suppose $f$ is Riemann integrable on $[a,b]$. Let
$$F(x) = \int_a^x f(t)\,dt, \qquad a \le x \le b.$$
Then $F$ is differentiable at all points $x$ at which $f$ is continuous (hence, almost everywhere by Theorem 2.7) and at such points $F'(x) = f(x)$. In other words,
$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x) \tag{6.15}$$
at all continuity points of $f$.

Second Fundamental Theorem of Calculus: Suppose $f$ is defined on $[a,b]$ and $f'$ exists and is Riemann integrable on $[a,b]$. Then
$$\int_a^x f'(t)\,dt = f(x) - f(a), \qquad a \le x \le b. \tag{6.16}$$

In this section, we will prove a generalization of (6.15) to Lebesgue integration theory. Then, in the next section, we will characterize all functions $f$ for which (6.16) holds in the Lebesgue sense.

To begin, we introduce some useful abbreviations. We write $\mathcal{L}^1([a,b])$ for $\mathcal{L}^1([a,b], \mathcal{M}_{[a,b]}, \lambda_{[a,b]})$ and $\mathcal{L}^1(\mathcal{R})$ for $\mathcal{L}^1(\mathcal{R}, \mathcal{M}, \lambda)$. Recalling our convention for using Riemann-integral notation for Lebesgue integrals, we have that $f \in \mathcal{L}^1([a,b])$ means $f$ is $\mathcal{M}_{[a,b]}$-measurable and $\int_a^b |f(x)|\,dx < \infty$; $f \in \mathcal{L}^1(\mathcal{R})$ means $f$ is $\mathcal{M}$-measurable and $\int_{-\infty}^{\infty} |f(x)|\,dx < \infty$. Moreover, we will continue to use the phrase almost everywhere instead of Lebesgue almost everywhere and the notation ae instead of $\lambda$-ae.

Our first goal is to prove that whenever $f \in \mathcal{L}^1([a,b])$, (6.15) holds almost everywhere on $[a,b]$. Several preliminary results are needed to establish that fact.

PROPOSITION 6.2 Suppose $f \in \mathcal{L}^1([a,b])$ and set
$$F(x) = \int_a^x f(t)\,dt, \qquad a \le x \le b.$$
Then $F$ is continuous and of bounded variation on $[a,b]$. Moreover,
$$V_a^b F = \int_a^b |f(x)|\,dx. \tag{6.17}$$
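PROOF: Let $x \in [a,b]$. We will show that $F$ is continuous at $x$. Let $\{x_n\}_{n=1}^{\infty} \subset [a,b]$ be such that $x_n \to x$ as $n \to \infty$. Set $f_n = f\chi_{[a,x_n)}$. Note that $\chi_{[a,x_n)} \to \chi_{[a,x)}$ except possibly at $x$, so that $f_n \to f\chi_{[a,x)}$ ae. Moreover, since $|f_n| \le |f| \in \mathcal{L}^1([a,b])$, the DCT implies that
$$\lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} \int_a^{x_n} f(t)\,dt = \lim_{n \to \infty} \int_a^b f_n(t)\,dt = \int_a^b f(t)\chi_{[a,x)}(t)\,dt = F(x).$$
Consequently, $F$ is continuous at $x$.

Now we show that $F$ is of bounded variation on $[a,b]$. Consider an arbitrary partition $a = x_0 < x_1 < \cdots < x_n = b$ of $[a,b]$. Then
$$\sum_{k=1}^{n} |F(x_k) - F(x_{k-1})| = \sum_{k=1}^{n} \Bigl| \int_a^{x_k} f(x)\,dx - \int_a^{x_{k-1}} f(x)\,dx \Bigr| = \sum_{k=1}^{n} \Bigl| \int_{x_{k-1}}^{x_k} f(x)\,dx \Bigr| \le \sum_{k=1}^{n} \int_{x_{k-1}}^{x_k} |f(x)|\,dx = \int_a^b |f(x)|\,dx.$$
Hence,
$$V_a^b F \le \int_a^b |f(x)|\,dx. \tag{6.18}$$
To establish the reverse of (6.18), we first consider the case where $f$ is a continuous complex-valued function on the interval $[a,b]$. Let $\varepsilon > 0$ be given. By the uniform continuity of $f$, there is a $\delta > 0$ such that $x, y \in [a,b]$ and $|x - y| < \delta$ imply $|f(x) - f(y)| < \varepsilon / \bigl(2(b-a)\bigr)$.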
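Now, let $a = x_0 < x_1 < \cdots < x_n = b$ be a partition such that $|x_{j+1} - x_j| < \delta$ for $j = 0, 1, \dots, n-1$. Then
$$\begin{aligned} |F(x_{j+1}) - F(x_j)| &= \Bigl| \int_{x_j}^{x_{j+1}} f(x)\,dx \Bigr| \ge \Bigl| \int_{x_j}^{x_{j+1}} f(x_j)\,dx \Bigr| - \int_{x_j}^{x_{j+1}} |f(x) - f(x_j)|\,dx \\ &\ge \int_{x_j}^{x_{j+1}} |f(x_j)|\,dx - \frac{\varepsilon}{2(b-a)}(x_{j+1} - x_j) \\ &\ge \int_{x_j}^{x_{j+1}} |f(x)|\,dx - \int_{x_j}^{x_{j+1}} \bigl| |f(x)| - |f(x_j)| \bigr|\,dx - \frac{\varepsilon}{2(b-a)}(x_{j+1} - x_j) \\ &\ge \int_{x_j}^{x_{j+1}} |f(x)|\,dx - \frac{\varepsilon}{b-a}(x_{j+1} - x_j). \end{aligned} \tag{6.19}$$
Summing on both sides of (6.19), we obtain
$$V_a^b F \ge \sum_{j=0}^{n-1} |F(x_{j+1}) - F(x_j)| \ge \int_a^b |f(x)|\,dx - \varepsilon.$$
As $\varepsilon$ is an arbitrary positive number, we obtain the reverse of (6.18) in the case where $f$ is a continuous complex-valued function on $[a,b]$.

To prove the reverse of (6.18) in the general case, let $f \in \mathcal{L}^1([a,b])$ and $\varepsilon > 0$. Using Exercise 4.82 on page 202, we select a simple function $s$ such that $\int_a^b |f(x) - s(x)|\,dx < \varepsilon/2$. Then, applying Exercise 3.64(d) on page 139 and the dominated convergence theorem, we choose a continuous function $f_0$ such that $\int_a^b |s(x) - f_0(x)|\,dx < \varepsilon/2$. It follows that
$$\int_a^b |f(x) - f_0(x)|\,dx < \varepsilon. \tag{6.20}$$
Now let $F_0(x) = \int_a^x f_0(t)\,dt$. From Exercise 6.25 on page 334, we have
$$V_a^b F_0 - V_a^b (F_0 - F) \le V_a^b F.$$
Since $F_0 - F$ is the indefinite integral of $f_0 - f$, it follows from (6.18) and (6.20) that
$$V_a^b F_0 - \varepsilon \le V_a^b F. \tag{6.21}$$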
We have already proved the proposition in the case $F = F_0$. Thus, in view of (6.20), (6.21), and the previous inequality, we conclude that
$$\int_a^b |f(x)|\,dx - \varepsilon < \int_a^b |f(x)|\,dx - \int_a^b |f(x) - f_0(x)|\,dx \le \int_a^b |f_0(x)|\,dx = V_a^b F_0 \le V_a^b F + \varepsilon.$$
Because $\varepsilon$ was arbitrarily chosen, we have proved the reverse of (6.18) in the general case.

Remark: Since any continuous function on $[a,b]$ is uniformly continuous on $[a,b]$, we have, in fact, that the function $F$ in Proposition 6.2 is uniformly continuous on $[a,b]$.

A result analogous to Proposition 6.2 is valid for functions defined on $\mathcal{R}$. Specifically, we have the following fact, whose proof is left as an exercise for the reader.

PROPOSITION 6.3 Suppose $f \in \mathcal{L}^1(\mathcal{R})$ and set
$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad -\infty < x < \infty.$$
Then $F$ is continuous on $\mathcal{R}$ and is of bounded variation on every finite closed interval. Moreover,
$$V_{-\infty}^{\infty} F = \int_{-\infty}^{\infty} |f(x)|\,dx, \tag{6.22}$$
where, by definition, $V_{-\infty}^{\infty} F = \lim_{n \to \infty} V_{-n}^{n} F$.

EXAMPLE 6.5 Illustrates Proposition 6.2
a) It is quite easy to prove directly that $\sin x$ is of bounded variation on $[0, 2\pi]$. However, it is even easier to prove that fact by employing Proposition 6.2. We just note that $\cos x \in \mathcal{L}^1([0, 2\pi])$ and that
$$\sin x = \int_0^x \cos t\,dt, \qquad 0 \le x \le 2\pi.$$
b) Define $F$ on $[0,1]$ by $F(0) = 0$ and $F(x) = x \sin(1/x)$, for $x \neq 0$. Clearly, $F$ is continuous on $[0,1]$. However, as we discovered in Example 6.4(b)
on page 331, $F$ is not of bounded variation on $[0,1]$. Consequently, by Proposition 6.2, it is impossible to find a function $f \in \mathcal{L}^1([0,1])$ such that $x \sin(1/x) = \int_0^x f(t)\,dt$ for $0 < x \le 1$. In words, $F$ is not the indefinite integral of a Lebesgue integrable function on $[0,1]$.
c) The Cantor function, $\psi$, is continuous and of bounded variation on $[0,1]$ (because it is monotone). Exercise 5.28 on page 284 shows, and we will show again, that $\psi$ is not the indefinite integral of a Lebesgue integrable function on $[0,1]$. This proves that the converse of Proposition 6.2 fails. □

First Fundamental Theorem of Calculus

By Proposition 6.2 (page 335) and Theorem 6.4 (page 333), if $f \in \mathcal{L}^1([a,b])$ and $F$ is the indefinite integral of $f$, then $F$ is differentiable almost everywhere on $[a,b]$. So we will have established the generalization of (6.15) to Lebesgue integration theory once we show that $F'(x) = f(x)$ ae. To accomplish that, we need the following two lemmas, whose proofs are left to the reader. (See Exercises 6.38 and 6.39.)

LEMMA 6.2 If $f \in \mathcal{L}^1([a,b])$ and $\int_a^x f(t)\,dt = 0$ for all $x \in [a,b]$, then $f = 0$ almost everywhere on $[a,b]$.

LEMMA 6.3 Suppose $g$ is defined in some open interval about $x \in \mathcal{R}$ and $g$ is continuous at $x$. Then
$$\lim_{h \to 0} \frac{1}{h} \int_x^{x+h} g(t)\,dt = g(x).$$

We are now in a position to establish the first fundamental theorem of calculus for Lebesgue integration.

THEOREM 6.5 First Fundamental Theorem of Calculus
Suppose $f \in \mathcal{L}^1([a,b])$ and set
$$F(x) = \int_a^x f(t)\,dt, \qquad a \le x \le b.$$
Then $F$ is differentiable almost everywhere on $[a,b]$ and, in fact,
$$F'(x) = f(x) \tag{6.23}$$
for almost all $x \in [a,b]$.
PROOF: We have already observed that $F$ is differentiable almost everywhere on $[a,b]$. Hence, it remains to prove (6.23). We do this first for bounded, nonnegative $f$. So, assume $0 \le f \le M$ on $[a,b]$. Extend the domain of $f$ (and, hence, of $F$) by setting $f(x) = 0$ for $x \notin [a,b]$. For $a \le t \le b$ and $n \in \mathcal{N}$, let $f_n(t) = n\bigl(F(t + 1/n) - F(t)\bigr)$. Note that $f_n \to F'$ ae. Because $f_n(t) = n \int_t^{t+1/n} f(s)\,ds$ and $f$ is bounded by $M$, we have $|f_n| \le M$ for all $n$. Applying the dominated convergence theorem, we get that
$$\lim_{n \to \infty} \int_a^x f_n(t)\,dt = \int_a^x F'(t)\,dt, \qquad a \le x \le b.$$
On the other hand, by Lemma 6.3 (applied to the continuous function $F$), we have for $a \le x \le b$,
$$\begin{aligned} \lim_{n \to \infty} \int_a^x f_n(t)\,dt &= \lim_{n \to \infty} n \int_a^x \Bigl[ F\Bigl(t + \frac{1}{n}\Bigr) - F(t) \Bigr] dt \\ &= \lim_{n \to \infty} n \Bigl[ \int_x^{x + 1/n} F(t)\,dt - \int_a^{a + 1/n} F(t)\,dt \Bigr] = F(x) - F(a) = F(x). \end{aligned}$$
Note that this result does not require $f$ to be bounded. Consequently, we see that $\int_a^x f(t)\,dt = \int_a^x F'(t)\,dt$ for $a \le x \le b$, or $\int_a^x \bigl(f(t) - F'(t)\bigr)\,dt = 0$ for $a \le x \le b$. Thus, by Lemma 6.2, $F' = f$ ae. This proves (6.23) in case $f$ is nonnegative and bounded.

Next we will establish (6.23) for nonnegative $f$ without the boundedness condition. Let $f_n(t)$ be as before. Because $f$ is nonnegative, so are the $f_n$'s. Applying Fatou's lemma and the previous displayed equation, we get that
$$\int_a^x F'(t)\,dt \le \liminf_{n \to \infty} \int_a^x f_n(t)\,dt = \int_a^x f(t)\,dt. \tag{6.24}$$
We use the method of truncation to reduce this case to the bounded case. For each $n \in \mathcal{N}$, let
$$g_n(t) = \begin{cases} f(t), & f(t) \le n, \\ n, & f(t) > n. \end{cases}$$
Then each $g_n$ is a nonnegative bounded measurable function. Consequently, by what we already established for such functions, we conclude that for almost all $x \in [a,b]$,
$$\frac{d}{dx} \int_a^x g_n(t)\,dt = g_n(x).$$
Now,
$$F(x) = \int_a^x f(t)\,dt = \int_a^x \bigl(f(t) - g_n(t)\bigr)\,dt + \int_a^x g_n(t)\,dt.$$
The first term on the right-hand side is nondecreasing because $f \ge g_n$ and, hence, is differentiable almost everywhere and, clearly, its derivative is nonnegative where it exists. Thus, for almost all $x \in [a,b]$,
$$F'(x) = \frac{d}{dx} \int_a^x \bigl(f(t) - g_n(t)\bigr)\,dt + g_n(x) \ge g_n(x).$$
Because $g_n \uparrow f$ pointwise on $[a,b]$, the monotone convergence theorem and the previous inequality give
$$\int_a^x F'(t)\,dt \ge \lim_{n \to \infty} \int_a^x g_n(t)\,dt = \int_a^x f(t)\,dt. \tag{6.25}$$
It now follows from (6.24) and (6.25) that $\int_a^x F'(t)\,dt = \int_a^x f(t)\,dt$ for $a \le x \le b$, and so, by Lemma 6.2, $F' = f$ ae on $[a,b]$.

It remains to establish (6.23) without the nonnegativity assumption. For $f \in \mathcal{L}^1([a,b])$, write $f = f_1 - f_2 + i f_3 - i f_4$, where $f_j \in \mathcal{L}^1([a,b])$ and $f_j \ge 0$, $1 \le j \le 4$. Then
$$F(x) = \int_a^x f_1(t)\,dt - \int_a^x f_2(t)\,dt + i \int_a^x f_3(t)\,dt - i \int_a^x f_4(t)\,dt.$$
It now follows from what we have proved for nonnegative functions that $F' = f_1 - f_2 + i f_3 - i f_4 = f$ ae on $[a,b]$.

As a corollary of Theorem 6.5, we obtain the following result, whose proof is left as an exercise for the reader. (See Exercise 6.40.)

COROLLARY 6.2 Suppose $f \in \mathcal{L}^1(\mathcal{R})$ and set
$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad -\infty < x < \infty.$$
Then $F$ is differentiable almost everywhere on $\mathcal{R}$ and, in fact, $F'(x) = f(x)$ for almost all $x \in \mathcal{R}$.
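Theorem 6.5 can be illustrated numerically with two integrands whose indefinite integrals are known in closed form (our choice of examples, not the text's): $f(t) = t^{-1/2}$, which is unbounded but belongs to $\mathcal{L}^1([0,1])$, with $F(x) = 2\sqrt{x}$; and $f = \chi_{[0,1/2]}$, with $F(x) = \min(x, 1/2)$. The Python sketch below evaluates the forward quotients $n\bigl(F(t + 1/n) - F(t)\bigr)$ used in the proof, together with a matching backward quotient. They approach $f(t)$ at points such as $t = 1/4$ and $t = 0.3$; at the jump $t = 1/2$ of the step function the two one-sided quotients disagree, so $F'(1/2)$ does not exist, which is consistent with (6.23) because a single point has measure zero. The function names are hypothetical.

```python
import math
from typing import Callable


def F_sqrt(x: float) -> float:
    """F(x) = integral_0^x t**(-1/2) dt = 2*sqrt(x); the integrand is unbounded near 0."""
    return 2.0 * math.sqrt(x)


def F_step(x: float) -> float:
    """F(x) = integral_0^x chi_[0,1/2](t) dt = min(x, 1/2) for 0 <= x <= 1."""
    return min(x, 0.5)


def forward_quotient(F: Callable[[float], float], t: float, n: int) -> float:
    """f_n(t) = n*(F(t + 1/n) - F(t)), the quotient used in the proof of Theorem 6.5."""
    return n * (F(t + 1.0 / n) - F(t))


def backward_quotient(F: Callable[[float], float], t: float, n: int) -> float:
    """n*(F(t) - F(t - 1/n)), the left-hand counterpart."""
    return n * (F(t) - F(t - 1.0 / n))


for t in (0.25, 0.3, 0.5, 0.7):
    print(t,
          forward_quotient(F_sqrt, t, 10**6),     # approximately t**(-1/2)
          forward_quotient(F_step, t, 10**6),     # approximately chi_[0,1/2](t) away from 1/2
          backward_quotient(F_step, t, 10**6))
```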
EXERCISES 6.3

6.33 Let $\{f_n\}_{n=1}^{\infty} \subset \mathcal{L}^1([a,b])$. For each $n \in \mathcal{N}$, set
$$F_n(x) = \int_a^x f_n(t)\,dt, \qquad a \le x \le b.$$
Suppose that $F_n \to 0$ ae. Does it follow that
a) $f_n \to 0$ ae?
b) $f_n \to 0$ in measure?

★6.34 Suppose $F_1$ and $F_2$ are disjoint bounded closed sets. Prove that there exist disjoint open sets, $O_1$ and $O_2$, such that $O_1 \supset F_1$ and $O_2 \supset F_2$. Hint: Refer to Exercises 3.21(a) and 3.22(a).

6.35 Prove Proposition 6.3 on page 338.

6.36 Give an example of a function that is of bounded variation on $[0,1]$ but that is not continuous on $[0,1]$. Can such a function be the indefinite integral of a Lebesgue integrable function on $[0,1]$? Explain your answer.

6.37 Give an example of a function that is continuous on $[0,1]$ but not of bounded variation on $[0,1]$. Can such a function be the indefinite integral of a Lebesgue integrable function on $[0,1]$? Explain your answer.

6.38 Prove Lemma 6.2 on page 339 by proceeding as follows.
a) Explain why we can, without loss of generality, assume that $f$ is real-valued.
b) Show that if $f$ is positive on a set of positive measure, then there is a closed subset $K$ of $(a,b)$ such that $\int_K f(x)\,dx > 0$. Hint: Refer to Exercise 4.52 on page 191 and Exercise 3.43 on page 127.
c) Use part (b) to deduce that if $f$ is positive on a set of positive measure, then $\int_I f(x)\,dx \neq 0$ for some open interval $I \subset (a,b)$. Hint: Write $O = (a,b) \setminus K$ and note that $\int_O f(x)\,dx = -\int_K f(x)\,dx$.
d) Use part (c) to show that if $f$ is positive on a set of positive measure, then $\int_a^x f(t)\,dt \neq 0$ for some $x \in [a,b]$, contradicting a hypothesis of the lemma. Conclude that the set where $f$ is positive has measure zero.
e) Explain why the set where $f$ is negative must have measure zero.

6.39 Prove Lemma 6.3 on page 339.

6.40 Prove Corollary 6.2 on page 341.

6.4 ABSOLUTELY CONTINUOUS FUNCTIONS

We have now extended the first fundamental theorem of calculus to the setting of Lebesgue integration theory.
We might expect the generalization of the second fundamental theorem of calculus to be as follows: Suppose $f$ is defined on $[a,b]$ and $f'$ exists almost everywhere and is Lebesgue integrable on $[a,b]$. Then
$$\int_a^x f'(t)\,dt = f(x) - f(a), \qquad a \le x \le b.$$
But this is not true in general.† For example, let $\psi$ be the Cantor function. Then $\psi' = 0$ ae and so $\psi' \in \mathcal{L}^1([0,1])$. However,
$$\int_0^1 \psi'(t)\,dt = 0 \neq 1 = \psi(1) - \psi(0).$$
In fact, for all $x \in (0,1]$, $\int_0^x \psi'(t)\,dt = 0 \neq \psi(x) - \psi(0)$.

Evidently then, a generalization of the second fundamental theorem of calculus to Lebesgue integration theory requires more restrictive hypotheses on $f$. In the next few pages, we will characterize the functions for which that generalization holds. To begin, we give such functions a special name, the rationale for which will become apparent once we characterize them.

DEFINITION 6.7 Absolutely Continuous Function on $[a,b]$
Suppose that $f$ is defined on $[a,b]$, $f'$ exists almost everywhere and is Lebesgue integrable on $[a,b]$, and
$$f(x) = f(a) + \int_a^x f'(t)\,dt, \qquad a \le x \le b.$$
Then $f$ is said to be absolutely continuous on $[a,b]$.

DEFINITION 6.8 Absolutely Continuous Function on $\mathcal{R}$
Suppose that $f$ is defined on $\mathcal{R}$, $f'$ exists almost everywhere and is Lebesgue integrable on $\mathcal{R}$, and
$$f(x) = \int_{-\infty}^{x} f'(t)\,dt, \qquad -\infty < x < \infty.$$
Then $f$ is said to be absolutely continuous on $\mathcal{R}$.

† It is true, however, if in the previous paragraph "$f'$ exists almost everywhere" is replaced by "$f'$ exists everywhere." See Theorem 5 of John F. Randolph's Basic Real and Abstract Analysis (New York: Academic Press, 1968), p. 424.
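The Cantor-function counterexample can be made concrete. The Python sketch below uses exact rational arithmetic from the standard `fractions` module (the function names are our own) to evaluate $\psi$ at the endpoints of the $2^n$ closed intervals of length $3^{-n}$ that remain after $n$ middle-third removals. Their total length is $(2/3)^n$, which tends to 0, yet the increments of $\psi$ over them add up to $\psi(1) - \psi(0) = 1$ for every $n$: all of the growth of $\psi$ takes place on sets of arbitrarily small measure, which the indefinite integral of an $\mathcal{L}^1$ function cannot do (compare Proposition 6.6).

```python
from fractions import Fraction
from typing import List, Tuple


def cantor(x: Fraction, depth: int = 64) -> Fraction:
    """Cantor function psi(x), 0 <= x <= 1, read off from the first `depth`
    ternary digits of x; exact when x has a terminating ternary expansion of
    at most `depth` digits (true for the interval endpoints used below)."""
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:  # x lies in a removed middle third, where psi is constant
            return value + weight
        value += weight * (digit // 2)  # ternary digit 2 becomes binary digit 1
        weight /= 2
    return value


def stage_intervals(n: int) -> List[Tuple[Fraction, Fraction]]:
    """The 2**n closed intervals of length 3**-n left after n middle-third removals."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            refined += [(lo, lo + third), (hi - third, hi)]
        intervals = refined
    return intervals


for n in (1, 4, 8):
    ivs = stage_intervals(n)
    total_length = sum(hi - lo for lo, hi in ivs)
    total_increase = sum(cantor(hi) - cantor(lo) for lo, hi in ivs)
    print(n, total_length, total_increase)
```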
EXAMPLE 6.6 Illustrates Definitions 6.7 and 6.8
a) We just saw that the Cantor function, $\psi$, is not absolutely continuous on $[0,1]$. Because $\psi' = 0$ ae on $[0,1]$, Theorem 6.5 shows the impossibility of representing $\psi$ as an indefinite integral of an $\mathcal{L}^1([0,1])$ function. For if $\psi(x) = \int_0^x g(t)\,dt$, $0 \le x \le 1$, then we must have $0 = \psi' = g$ ae, implying $\psi(x) = 0$, $0 \le x \le 1$, which is not true.
b) Theorem 6.5 on page 339, in particular (6.23), shows that if a function $F$ on $[a,b]$ can be represented as the indefinite integral of some function in $\mathcal{L}^1([a,b])$, then $F(x) = \int_a^x F'(t)\,dt$ for $a \le x \le b$, so that $F$ is absolutely continuous. Note: $F(a) = 0$.
c) Corollary 6.2 (page 341) shows that if a function $F$ on $\mathcal{R}$ can be represented as the indefinite integral of some function in $\mathcal{L}^1(\mathcal{R})$, then $F(x) = \int_{-\infty}^{x} F'(t)\,dt$ for $x \in \mathcal{R}$, so that $F$ is absolutely continuous. Note: $F(-\infty) = 0$. □

The results of Examples 6.6(b) and (c) are summarized in the following proposition.

PROPOSITION 6.4
a) Let $F$ be a function defined on $[a,b]$ and suppose that there is a function $f \in \mathcal{L}^1([a,b])$ such that $F(x) = \int_a^x f(t)\,dt$ for $a \le x \le b$. Then $F$ is absolutely continuous on $[a,b]$.
b) Let $F$ be a function defined on $\mathcal{R}$ and suppose that there is a function $f \in \mathcal{L}^1(\mathcal{R})$ such that $F(x) = \int_{-\infty}^{x} f(t)\,dt$ for $x \in \mathcal{R}$. Then $F$ is absolutely continuous on $\mathcal{R}$.

PROPOSITION 6.5 If $f$ is absolutely continuous on $[a,b]$, then it is continuous and of bounded variation on $[a,b]$. Moreover,
$$V_a^b f = \int_a^b |f'(x)|\,dx. \tag{6.26}$$

PROOF: By assumption, $f' \in \mathcal{L}^1([a,b])$ and
$$f(x) = f(a) + \int_a^x f'(t)\,dt, \qquad a \le x \le b.$$
Proposition 6.2 (page 335) shows that the function $\int_a^x f'(t)\,dt$ is continuous and of bounded variation on $[a,b]$. Obviously, constant functions are continuous and of bounded variation. Hence, $f$ is continuous and of bounded variation on $[a,b]$. Equation (6.26) follows immediately from Proposition 6.2 once we note that for any function $g$ and constant $\alpha$, $V_a^b(\alpha + g) = V_a^b g$.

EXAMPLE 6.7 Illustrates Proposition 6.5
Define $f$ on $[0,1]$ by $f(0) = 0$ and $f(x) = x \sin(1/x)$, for $x \neq 0$. It is easy to see that $f$ is continuous on $[0,1]$. In Example 6.4(b) on page 331, we learned that $f$ is not of bounded variation on $[0,1]$. Hence, by Proposition 6.5, it is not absolutely continuous on $[0,1]$. □

An Equivalent Condition for Absolute Continuity

Although, as we know from Proposition 6.5, continuity and bounded variation are necessary conditions for absolute continuity, they are not sufficient. Indeed, the Cantor function is continuous and of bounded variation on $[0,1]$ but is not absolutely continuous.

In the next few pages, we will discover a continuity-type condition that is equivalent to absolute continuity. As we have seen, any such continuity-type condition must be stronger than ordinary continuity and, in fact, than uniform continuity.

PROPOSITION 6.6 Suppose $f$ is absolutely continuous on $[a,b]$. Then for each $\varepsilon > 0$, there is a $\delta > 0$ such that if $\{(a_k, b_k)\}_{k=1}^{n}$ is any finite sequence of pairwise disjoint subintervals of $[a,b]$ with $\sum_{k=1}^{n} (b_k - a_k) < \delta$, then $\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \varepsilon$.

PROOF: By assumption, $f' \in \mathcal{L}^1([a,b])$ and $f(x) = f(a) + \int_a^x f'(t)\,dt$ for $a \le x \le b$. It follows that
$$f(d) - f(c) = \int_c^d f'(t)\,dt, \qquad a \le c < d \le b.$$
Now, let $\varepsilon > 0$ be given. Applying Exercise 4.80 on page 202, we can choose $\delta > 0$ such that if $E \subset [a,b]$ is measurable and $\lambda(E) < \delta$, then $\int_E |f'(t)|\,dt < \varepsilon$. Let $\{(a_k, b_k)\}_{k=1}^{n}$ be any finite sequence of pairwise disjoint subintervals of $[a,b]$ such that $\sum_{k=1}^{n} (b_k - a_k) < \delta$. Set $E = \bigcup_{k=1}^{n} (a_k, b_k)$. Then
$E$ is measurable, $E \subset [a,b]$, and $\lambda(E) < \delta$. Therefore,
$$\sum_{k=1}^n |f(b_k) - f(a_k)| = \sum_{k=1}^n \left| \int_{a_k}^{b_k} f'(t)\,dt \right| \le \sum_{k=1}^n \int_{a_k}^{b_k} |f'(t)|\,dt = \int_E |f'(t)|\,dt < \epsilon,$$
as desired.

Proposition 6.6 implies that an absolutely continuous function on a finite closed interval is necessarily uniformly continuous. However, the converse is not true. Indeed, we know that any continuous function on a finite closed interval is uniformly continuous, and we have already encountered several functions that are continuous but not absolutely continuous on a finite closed interval.

The question that now arises is whether the necessary condition for absolute continuity in Proposition 6.6 is also sufficient. The answer is yes, as Proposition 6.7 shows.

PROPOSITION 6.7
Suppose $f$ is defined on $[a,b]$ and for each $\epsilon > 0$, there is a $\delta > 0$ such that if $\{(a_k,b_k)\}_{k=1}^n$ is any finite sequence of pairwise disjoint subintervals of $[a,b]$ with $\sum_{k=1}^n (b_k - a_k) < \delta$, then $\sum_{k=1}^n |f(b_k) - f(a_k)| < \epsilon$. Then $f$ is absolutely continuous on $[a,b]$.

To establish Proposition 6.7, we first prove the following two lemmas.

LEMMA 6.4
Suppose $f$ satisfies the hypotheses of Proposition 6.7. Then $f$ is of bounded variation on $[a,b]$ and $f' \in \mathcal{L}^1([a,b])$.

PROOF: By assumption, we can choose $\delta > 0$ so that if $\{(a_k,b_k)\}_{k=1}^n$ is any finite sequence of pairwise disjoint subintervals of $[a,b]$ such that $\sum_{k=1}^n (b_k - a_k) < \delta$, then $\sum_{k=1}^n |f(b_k) - f(a_k)| < 1$.

Choose $N \in \mathcal{N}$ so that $(b-a)/N < \delta$ and consider the partition of $[a,b]$, $a = x_0 < x_1 < \cdots < x_N = b$, that divides it into $N$ intervals of length $(b-a)/N$. If $x_{k-1} = y_0 < y_1 < \cdots < y_m = x_k$ is any partition of $[x_{k-1}, x_k]$, then $\sum_{j=1}^m (y_j - y_{j-1}) = x_k - x_{k-1} < \delta$ and so $\sum_{j=1}^m |f(y_j) - f(y_{j-1})| < 1$.
Therefore, $V_{x_{k-1}}^{x_k} f \le 1$ for $k = 1, 2, \ldots, N$. Using (6.14) on page 332, we can now conclude that
$$V_a^b f = \sum_{k=1}^N V_{x_{k-1}}^{x_k} f \le N < \infty.$$
Hence, $f$ is of bounded variation on $[a,b]$.

Next we must prove that $f' \in \mathcal{L}^1([a,b])$. Since $f$ is of bounded variation on $[a,b]$, it is differentiable almost everywhere on $[a,b]$. To show that $f'$ is measurable, we first extend the domain of $f$ by setting $f(x) = f(b)$ for $x > b$. Then we note that because $f$ is measurable, so is the function $f_n(x) = n\big(f(x + 1/n) - f(x)\big)$ for each $n \in \mathcal{N}$ (why?); and, because we have $f_n(x) \to f'(x)$ ae, it follows from Proposition 3.14 on page 163 that $f'$ is measurable.

Finally, we show that $f'$ is Lebesgue integrable. Since $f$ is of bounded variation, we can write $f = f_1 - f_2 + i f_3 - i f_4$, where $f_j$, $1 \le j \le 4$, are nondecreasing. Now, let $g$ be any nondecreasing function on $[a,b]$. Then $g'$ exists ae on $[a,b]$ and so
$$\lim_{n\to\infty} \frac{g(x + 1/n) - g(x)}{1/n} = g'(x) \quad \text{ae}.$$
Define $g(x) = g(b)$ for $x > b$. The functions $n\big(g(x + 1/n) - g(x)\big)$ are nonnegative and hence, by Fatou's lemma,
$$\begin{aligned}
\int_a^b g'(x)\,dx &\le \liminf_{n\to\infty} n \int_a^b \Big[ g\big(x + \tfrac{1}{n}\big) - g(x) \Big]\,dx \\
&= \liminf_{n\to\infty} n \left\{ \int_{a+1/n}^{b+1/n} g(t)\,dt - \int_a^b g(t)\,dt \right\} \\
&= \liminf_{n\to\infty} n \left\{ \int_b^{b+1/n} g(t)\,dt - \int_a^{a+1/n} g(t)\,dt \right\} = g(b) - \limsup_{n\to\infty} n \int_a^{a+1/n} g(t)\,dt.
\end{aligned}$$
Noting $n \int_a^{a+1/n} g(t)\,dt \ge g(a)$ for all $n \in \mathcal{N}$, we conclude that
$$\int_a^b g'(x)\,dx \le g(b) - g(a). \tag{6.27}$$
Returning to $f$, we have $f' = f_1' - f_2' + i f_3' - i f_4'$ almost everywhere, so that $|f'| \le \sum_{j=1}^4 f_j'$ ae. Hence, by (6.27),
$$\int_a^b |f'(x)|\,dx \le \sum_{j=1}^4 \int_a^b f_j'(x)\,dx \le \sum_{j=1}^4 \big(f_j(b) - f_j(a)\big) < \infty.$$
This shows that $f' \in \mathcal{L}^1([a,b])$ and completes the proof of the lemma.

LEMMA 6.5
Suppose $g$ satisfies the hypotheses of Proposition 6.7. If $g' = 0$ ae on $[a,b]$, then $g$ is constant on $[a,b]$.

PROOF: Let $x \in (a,b]$ be fixed but arbitrary. We will establish the lemma by showing that $g(x) = g(a)$. By hypothesis, $g'(y) = 0$ for almost all $y \in (a,x)$. Let $E = \{ y \in (a,x) : g'(y) = 0 \}$ and note that $\lambda(E) = x - a$.

Let $\epsilon > 0$. Then, by assumption, we can choose $\delta > 0$ such that if $\{(a_k,b_k)\}_{k=1}^m$ is any finite disjoint collection of subintervals of $[a,b]$ with the property that $\sum_{k=1}^m (b_k - a_k) < \delta$, then
$$\sum_{k=1}^m |g(b_k) - g(a_k)| < \frac{\epsilon}{2}. \tag{6.28}$$
If $y \in E$, then $\lim_{h\to 0^+} \big(g(y+h) - g(y)\big)/h = 0$ and, therefore, for $h$ sufficiently small,
$$|g(y+h) - g(y)| < \frac{h\epsilon}{2(x-a)}. \tag{6.29}$$
Because $y \in E \subset (a,x)$, we also have $y + h \in (a,x)$ for $h$ sufficiently small. It follows that the collection, $\mathcal{V}$, of all closed intervals of the form $[y, y+h]$, where $h > 0$, $y \in E$, $|g(y+h) - g(y)| < h\epsilon/2(x-a)$, and $[y, y+h] \subset (a,x)$, is a Vitali cover of $E$. So, by the Vitali covering theorem, there exist pairwise disjoint members of $\mathcal{V}$, say $[y_j, y_j + h_j]$, $1 \le j \le n$, such that
$$\lambda\Big( E \setminus \bigcup_{j=1}^n [y_j, y_j + h_j] \Big) < \delta. \tag{6.30}$$
By relabeling, we can assume that $a < y_1 < y_2 < \cdots < y_n < x$. This configuration is depicted in Fig. 6.3, to which it will be helpful to refer in the ensuing discussion.
FIGURE 6.3 (the points $a < y_1 < y_1 + h_1 < y_2 < y_2 + h_2 < \cdots < y_n < y_n + h_n < x$ marked on a line)

Now, from (6.30) and the fact that $\lambda(E) = x - a$, we can conclude that $\sum_{j=1}^n h_j > x - a - \delta$. It follows that
$$(y_1 - a) + \sum_{j=1}^{n-1} \big(y_{j+1} - (y_j + h_j)\big) + \big(x - (y_n + h_n)\big) < \delta.$$
In other words, the sum of the lengths of the pairwise disjoint intervals $(a, y_1), (y_1 + h_1, y_2), \ldots, (y_n + h_n, x)$ is less than $\delta$. Therefore, by (6.28),
$$|g(y_1) - g(a)| + \sum_{j=1}^{n-1} |g(y_{j+1}) - g(y_j + h_j)| + |g(x) - g(y_n + h_n)| < \frac{\epsilon}{2}.$$
On the other hand, by (6.29),
$$\sum_{k=1}^n |g(y_k + h_k) - g(y_k)| < \frac{\epsilon}{2(x-a)} \sum_{k=1}^n h_k \le \frac{\epsilon}{2}.$$
Consequently, by the previous two relations and the triangle inequality,
$$\begin{aligned}
|g(x) - g(a)| \le{}& |g(y_1) - g(a)| + |g(y_1 + h_1) - g(y_1)| + |g(y_2) - g(y_1 + h_1)| + |g(y_2 + h_2) - g(y_2)| \\
&+ \cdots + |g(y_n + h_n) - g(y_n)| + |g(x) - g(y_n + h_n)| < \epsilon.
\end{aligned}$$
As $\epsilon > 0$ was chosen arbitrarily, this shows that $g(x) = g(a)$.

We can now prove Proposition 6.7 (page 346). Let $f$ satisfy the hypotheses of that proposition. From Lemma 6.4, we know $f' \in \mathcal{L}^1([a,b])$. Now, set
$$F(x) = \int_a^x f'(t)\,dt, \qquad a \le x \le b.$$
Then $F' = f'$ ae on $[a,b]$ (Theorem 6.5) and $F$ is absolutely continuous on $[a,b]$ (Proposition 6.4(a)). The latter fact and Proposition 6.6 imply
that $F$ satisfies the hypotheses of Proposition 6.7 and, consequently, so does $F - f$. But $(F - f)' = F' - f' = 0$ ae on $[a,b]$ and, consequently, by Lemma 6.5, $F - f$ is constant on $[a,b]$. Because $F(a) = 0$, we can now conclude that for all $x \in [a,b]$, $F(x) - f(x) = -f(a)$; or, in other words,
$$f(x) = f(a) + \int_a^x f'(t)\,dt, \qquad a \le x \le b.$$
This shows that $f$ is absolutely continuous and completes the proof of Proposition 6.7.

Second Fundamental Theorem of Calculus

We summarize Propositions 6.6 and 6.7 in the following theorem, which is often referred to as the second fundamental theorem of calculus for Lebesgue integration.

THEOREM 6.6 Second Fundamental Theorem of Calculus
Suppose $f$ is defined on $[a,b]$. A necessary and sufficient condition for $f'$ to exist almost everywhere and be Lebesgue integrable on $[a,b]$, and for
$$\int_a^x f'(t)\,dt = f(x) - f(a), \qquad a \le x \le b,$$
is that for each $\epsilon > 0$ there is a $\delta > 0$ such that $\sum_{k=1}^n |f(b_k) - f(a_k)| < \epsilon$ whenever $\{(a_k,b_k)\}_{k=1}^n$ is a finite sequence of pairwise disjoint subintervals of $[a,b]$ with $\sum_{k=1}^n (b_k - a_k) < \delta$.

We conclude this section by giving necessary and sufficient conditions for a function to be absolutely continuous on $\mathcal{R}$. The proof is left as an exercise for the reader. (See Exercise 6.56.)

THEOREM 6.7
A function $f$ is absolutely continuous on $\mathcal{R}$ if and only if it is absolutely continuous on every finite closed interval, $V_{-\infty}^{\infty} f < \infty$, and $\lim_{x\to-\infty} f(x) = 0$.

EXERCISES 6.4

6.41 Establish the following facts.
a) Suppose $f$ is defined on $[a,b]$ and $f'$ exists and is Riemann integrable on $[a,b]$. Prove that $f$ is absolutely continuous on $[a,b]$. Conclude that $f$ is absolutely continuous on $[a,b]$ if $f'$ is continuous on $[a,b]$.
b) Suppose $f$ is continuous on $\mathcal{R}$, $V_{-\infty}^{\infty} f < \infty$, and $\lim_{x\to-\infty} f(x) = 0$. Further suppose there exists a finite number of points such that $f$ is absolutely continuous on any finite closed interval that contains none of those points. Prove that $f$ is absolutely continuous on $\mathcal{R}$.
c) Suppose $F$ is a continuous distribution function such that $F'$ exists and is continuous except at a finite number of points. Prove that $F$ is absolutely continuous on $\mathcal{R}$.

6.42 Prove that $f(x) = \sqrt{x}$ is absolutely continuous on $[0,1]$.

6.43 Show that if $f$ and $g$ are absolutely continuous on $[a,b]$, then $f + g$ is absolutely continuous on $[a,b]$.

6.44 Show that if $f$ is absolutely continuous on $[a,b]$ and $\alpha \in \mathcal{C}$, then $\alpha f$ is absolutely continuous on $[a,b]$.

6.45 Let $\alpha > 0$ and define $f: \mathcal{R} \to \mathcal{R}$ by $f(x) = e^{-\alpha|x|}$. Show that $f$ is absolutely continuous on $\mathcal{R}$.

6.46 Is $f(x) = 1$ absolutely continuous on $[0,1]$? On $\mathcal{R}$?

6.47 Define $f$ on $[0,1]$ by $f(0) = 0$ and $f(x) = x\sin(1/x)$ for $x \ne 0$. From Example 6.7 on page 345, we know that $f$ is not absolutely continuous on $[0,1]$. Show that $f$ is absolutely continuous on any interval $[a,b]$ not containing 0.

6.48 Define $f$ on $[0,1]$ by $f(0) = 0$ and $f(x) = x^\alpha \sin(1/x)$ for $x \ne 0$. Show that $f$ is absolutely continuous on $[0,1]$ if and only if $\alpha > 1$.

6.49 Let $f$ be real-valued and absolutely continuous on $[a,b]$. Prove that
a) $f$ takes sets of measure zero to sets of measure zero; that is, if $E \subset [a,b]$ and $\lambda(E) = 0$, then $\lambda(f(E)) = 0$.
b) $f$ takes measurable sets to measurable sets; that is, if $A \subset [a,b]$ is measurable, then so is $f(A)$. Hint: Choose an increasing sequence of closed sets contained in $A$ such that the set difference between $A$ and the union of the closed sets has measure zero. Next apply part (a) to show that the set difference between the image of $A$ and the union of the images of the closed sets has measure zero.

6.50 Suppose $f: [a,b] \to [c,d]$ is absolutely continuous and monotone and $g$ is absolutely continuous on $[c,d]$.
Prove that $g \circ f$ is absolutely continuous on $[a,b]$.

6.51 Proposition 6.5 on page 344 states in part that if $f$ is absolutely continuous on $[a,b]$, then $V_a^b f = \int_a^b |f'(x)|\,dx$. Show that the hypothesis of absolute continuity cannot be weakened by finding a continuous function of bounded variation that does not satisfy the previous equation.

6.52 Give an example of a function $f$ that is absolutely continuous on $[0,1]$ but is such that $f'$ is not Riemann integrable on $[0,1]$.

6.53 Construct an absolutely continuous function on $[0,1]$ that is strictly increasing but whose derivative vanishes on a set of positive measure. Hint: Let $P_\alpha$ be as in Exercise 3.39 on page 126 and set $f(x) = \int_0^x \chi_{P_\alpha^c}(t)\,dt$.
6.54 In establishing Lemma 6.4 on page 346, we proved that if $f$ is of bounded variation on $[a,b]$, then $f' \in \mathcal{L}^1([a,b])$. Here is an alternate derivation of that result.
a) Show that $f$ is Lebesgue measurable.
b) Show that $f'$ is Lebesgue measurable.
c) Prove that $\int_a^b |f'(x)|\,dx \le V_a^b f$ and, therefore, that $f'$ is Lebesgue integrable on $[a,b]$. Hint: Use (6.14) on page 332.

6.55 A function, $f$, is said to be Lipschitzian on $[a,b]$ if there is a constant $M$ such that
$$|f(x) - f(y)| \le M|x - y|, \qquad x, y \in [a,b].$$
a) Show that if $f$ has a bounded derivative on $[a,b]$, then it is Lipschitzian on $[a,b]$.
b) Prove that if $f$ is Lipschitzian on $[a,b]$, then it is absolutely continuous thereon.

6.56 Prove Theorem 6.7. Hint: Use (6.26).

6.57 Integration by parts: Suppose $f$ and $g$ are absolutely continuous on $[a,b]$. Then $fg'$ and $f'g$ are in $\mathcal{L}^1([a,b])$ and
$$\int_a^b f(x)g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x)g(x)\,dx. \tag{6.31}$$
Establish this result by employing the following steps.
a) Show that $fg'$ and $f'g$ are in $\mathcal{L}^1([a,b])$.
b) Prove that $fg$ is absolutely continuous on $[a,b]$. Hint: Show that the hypotheses of Proposition 6.7 (page 346) are satisfied.
c) Prove that (6.31) holds. Hint: Recall the product rule from elementary calculus.

6.58 Give an example where the integration by parts formula fails in case $f$ is absolutely continuous and $g$ is uniformly continuous and of bounded variation but not absolutely continuous.

6.59 Let $h \in \mathcal{L}^1([a,b])$ and define
$$F(x) = \int_a^x h(t)(x - t)^n\,dt, \qquad a \le x \le b.$$
Show that $F$ is $n$-times differentiable on $[a,b]$, $F^{(n)}$ is absolutely continuous on $[a,b]$, and $F^{(n+1)} = n!\,h$ ae on $[a,b]$. Hint: Use induction and integration by parts.

6.60 Taylor's theorem: Suppose $f$ is defined on $[a,b]$, $f$ is $n$-times differentiable on $[a,b]$, and $f^{(n)}$ is absolutely continuous on $[a,b]$. Then, for $a \le x \le b$,
$$f(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \frac{1}{n!}\int_a^x f^{(n+1)}(t)(x-t)^n\,dt. \tag{6.32}$$
Establish this result. Hint: Use induction on $n$, integration by parts, and Exercise 6.41.

6.61 Show that the hypothesis of absolute continuity cannot be removed in the version of Taylor's theorem given in the previous exercise.

6.62 Prove that the converse of Taylor's theorem is true: Suppose that $f$ is defined on $[a,b]$ and that there are constants $a_0, a_1, \ldots, a_n$ and a function $h \in \mathcal{L}^1([a,b])$ such that for $a \le x \le b$,
$$f(x) = a_0 + a_1(x-a) + \cdots + a_n(x-a)^n + \int_a^x h(t)(x-t)^n\,dt.$$
Then, on $[a,b]$, $f$ is $n$-times differentiable and $f^{(n)}$ is absolutely continuous. Moreover, $a_k = f^{(k)}(a)/k!$, $0 \le k \le n$, and $h = f^{(n+1)}/n!$ ae on $[a,b]$. Hint: Use Exercise 6.59.

★6.63 Integration by substitution: Suppose that $g$ is a monotone and absolutely continuous function on $[a,b]$ with range $[c,d]$ and that $f \in \mathcal{L}^1([c,d])$. Then $(f \circ g)g' \in \mathcal{L}^1([a,b])$ and
$$\int_a^b f(g(x))\,|g'(x)|\,dx = \int_c^d f(y)\,dy.$$
Establish this result by employing the following steps.
a) Show that, without loss of generality, we can assume that $g$ is nondecreasing.
b) Show that for each open set $O \subset [c,d]$, $\lambda(O) = \int_{g^{-1}(O)} g'(x)\,dx$.
c) Show that if $D$ is a $G_\delta$-set, then $D$ satisfies the equation in part (b).
d) Let $H = \{ x : g'(x) \ne 0 \}$. If $E \subset [c,d]$ has measure zero, prove that $\lambda^*\big(g^{-1}(E) \cap H\big) = 0$ and, hence, that $g^{-1}(E) \cap H$ is measurable and has measure zero.
e) Prove that if $A \subset [c,d]$ is measurable, then so is $g^{-1}(A) \cap H$ and $\lambda(A) = \int_{g^{-1}(A) \cap H} g'(x)\,dx$.
f) Show that if $f$ is a nonnegative measurable function on $[c,d]$, then the function $(f \circ g)g'$ is measurable on $[a,b]$ and
$$\int_a^b f(g(x))\,g'(x)\,dx = \int_c^d f(y)\,dy.$$
g) Show that if $f \in \mathcal{L}^1([c,d])$, then $(f \circ g)g' \in \mathcal{L}^1([a,b])$ and the equation in part (f) holds.

6.64 Let $f$ be the function in Exercise 6.53.
a) Show that there is a set $E$ of measure zero such that $f^{-1}(E)$ is not measurable. Hint: Use the fact that any set of positive (outer) measure contains a nonmeasurable set. Also, refer to Exercise 6.63(e).
b) Show that the function $g = f^{-1}$ is not absolutely continuous. Hence, the inverse of an absolutely continuous function, when it exists, need not be absolutely continuous. Hint: Refer to Exercise 6.49(b).

6.5 SIGNED MEASURES

We discovered earlier (see, e.g., Exercise 4.61) that, if $(\Omega, \mathcal{A}, \mu)$ is a measure space and $f$ is a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$, then the set function
$$\nu(A) = \int_A f\,d\mu, \qquad A \in \mathcal{A}, \tag{6.33}$$
is a measure on $\mathcal{A}$. What about the converse: If $(\Omega, \mathcal{A}, \mu)$ is a measure space and $\nu$ is a measure on $\mathcal{A}$, can a nonnegative extended real-valued $\mathcal{A}$-measurable function $f$ be found such that (6.33) holds?

It is easy to see that the answer to this question is no! For example, take $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $\nu = \delta_0$, the Dirac measure concentrated at 0. Then a representation of the form (6.33) is impossible. Indeed, if $f$ is any nonnegative extended real-valued $\mathcal{M}$-measurable function, then
$$\delta_0(\{0\}) = 1 \ne 0 = \int_{\{0\}} f\,d\lambda.$$

What goes wrong here is the following: There exists a set $A$, namely, the set $\{0\}$, such that $\lambda(A) = 0$ but $\delta_0(A) \ne 0$, whereas, by Proposition 4.8(d), (6.33) forces $\nu(A) = 0$ whenever $\mu(A) = 0$. In other words,
$$\mu(A) = 0 \;\Rightarrow\; \nu(A) = 0 \tag{6.34}$$
is a necessary condition for a measure $\nu$ to be representable as in (6.33). Remarkably, in most important cases, (6.34) is also a sufficient condition for that representation. That fact is called the Radon-Nikodym theorem and will be proved in Section 6.6.
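On a finite set $\Omega$ with $\mathcal{A} = \mathcal{P}(\Omega)$, every measure is determined by its point masses, the integral in (6.33) becomes a finite sum, and the necessity of (6.34) can be checked by brute force over all subsets. The Python sketch below is only an illustration under that finite assumption; the helper names (`subsets`, `density_measure`, `satisfies_6_34`) are hypothetical, and a point of $\mu$-mass zero plays the role that $\{0\}$ plays for Lebesgue measure.

```python
from itertools import chain, combinations


def subsets(points):
    """Return every subset of a finite set as a frozenset (the power set)."""
    pts = list(points)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))]


def measure_from_weights(weights):
    """mu(A) = sum of the point masses mu({x}) over x in A."""
    return lambda A: sum(weights[x] for x in A)


def density_measure(f, weights):
    """Finite version of (6.33): nu(A) = sum over x in A of f(x) * mu({x})."""
    return lambda A: sum(f[x] * weights[x] for x in A)


def satisfies_6_34(nu, mu, points):
    """Check (6.34) by brute force: mu(A) = 0 implies nu(A) = 0 for every subset A."""
    return all(nu(A) == 0 for A in subsets(points) if mu(A) == 0)


if __name__ == "__main__":
    points = [0, 1, 2]
    weights = {0: 0, 1: 1, 2: 2}             # the point 0 is mu-null, like {0} under lambda
    mu = measure_from_weights(weights)
    nu = density_measure({0: 5, 1: 3, 2: 4}, weights)
    print(satisfies_6_34(nu, mu, points))    # True
    delta0 = lambda A: 1 if 0 in A else 0    # Dirac measure concentrated at 0
    print(satisfies_6_34(delta0, mu, points))  # False
```

Any $\nu$ built by `density_measure` passes the check, while the Dirac measure at the $\mu$-null point fails it, mirroring the $\delta_0$ example above.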
EXAMPLE 6.8 Illustrates (6.33) and (6.34)

a) Let $\mathcal{I}$ denote the collection of all intervals $I \subset \mathcal{R}$, including degenerate intervals of the form $(a,a)$ and $[a,a]$. Also, let $\nu$ be the measure on $\mathcal{B}$ such that $\nu(I) = 0$ if $I \subset (-\infty, 0)$ and $\nu(I) = e^{-3a} - e^{-3b}$ if $I$ has endpoints $a$ and $b$, where $0 \le a \le b \le \infty$; according to Corollary 4.6 on page 218, $\nu$ is determined by these conditions. If we let $\mu = \lambda|_{\mathcal{B}}$, then $\nu$ has a representation in the form (6.33), namely,
$$\nu(B) = \int_B f\,d\lambda, \qquad B \in \mathcal{B},$$
where $f(x) = 3e^{-3x}$ for $x \ge 0$, and $f(x) = 0$ otherwise. This is true because the measure $\omega(B) = \int_B f\,d\lambda$ agrees with $\nu$ on intervals and so, by Corollary 4.6, must equal $\nu$.

b) Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \lambda)$ and $E \in \mathcal{M}$. Set $\nu(A) = \lambda(A \cap E)$, $A \in \mathcal{M}$. Clearly (6.34) holds. Here it is obvious that $\nu$ can be represented in the form (6.33), namely, $\nu(A) = \int_A \chi_E\,d\lambda$.

c) Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$ and $\nu = \delta_0$. As we have seen, $\nu$ cannot be represented in the form (6.33) if $\mu = \lambda$. However, if $\mu$ is counting measure on $(\mathcal{R}, \mathcal{M})$, then $\nu$ does have such a representation, namely, with $f = \chi_{\{0\}}$.

d) Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{B})$ and $\psi$ be the Cantor function. Define
$$F(x) = \begin{cases} 0, & x < 0; \\ \psi(x), & 0 \le x \le 1; \\ 1, & x > 1. \end{cases}$$
Note that $F$ is a distribution function, that is, it satisfies (a)-(d) of Definition 4.20 on page 226. Therefore, by Theorem 4.13 on page 226, there is a unique finite Borel measure, $\nu$, having $F$ as its distribution function. We claim that $\nu$ has no representation in the form (6.33) if $\mu = \lambda|_{\mathcal{B}}$. Suppose to the contrary that there is a nonnegative Borel measurable function $f$ such that $\nu(A) = \int_A f\,d\lambda$ for $A \in \mathcal{B}$. Because $\nu(\mathcal{R}) = 1$, $f \in \mathcal{L}^1(\mathcal{R})$; hence, if we let $g = f|_{[0,1]}$, then $g \in \mathcal{L}^1([0,1])$. Moreover, for $0 \le x \le 1$,
$$\psi(x) = F(x) - F(0) = \nu\big((0,x]\big) = \int_{(0,x]} f\,d\lambda = \int_0^x g(t)\,dt.$$
This implies that $\psi$ is absolutely continuous on $[0,1]$, which we know is not true. So $\nu$ cannot be represented in the form (6.33) if $\mu = \lambda|_{\mathcal{B}}$. □
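The failure of absolute continuity used in Example 6.8(d) can be seen concretely through the $\epsilon$-$\delta$ condition of Theorem 6.6. At stage $n$ of the Cantor construction, the $2^n$ closed intervals that remain have pairwise disjoint interiors and total length $(2/3)^n$, yet $\psi$ increases by a total of $1$ across them, so no $\delta > 0$ works for $\epsilon = 1$. The Python sketch below checks this with exact rational arithmetic; evaluating $\psi$ through ternary digits is a standard description of the Cantor function that we assume here, and the function names are hypothetical.

```python
from fractions import Fraction


def cantor(x):
    """Cantor function psi at a rational x in [0, 1], read off from ternary digits.

    Exact whenever the ternary expansion of x terminates (e.g. at the endpoints below)."""
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    result, scale = Fraction(0), Fraction(1, 2)
    for _ in range(200):            # cap on digits for non-terminating expansions
        x *= 3
        digit = int(x)              # floor, since x >= 0
        x -= digit
        if digit == 1:              # x falls in a removed middle third (or its left end): psi is constant there
            return result + scale
        if digit == 2:
            result += scale
        if x == 0:
            break
        scale /= 2
    return result


def cantor_intervals(n):
    """The 2**n closed intervals that remain at stage n of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined += [(a, a + third), (b - third, b)]
        intervals = refined
    return intervals


def stage_sums(n):
    """Total length of the stage-n intervals and the total increment of psi over them."""
    intervals = cantor_intervals(n)
    length = sum(b - a for a, b in intervals)
    increment = sum(abs(cantor(b) - cantor(a)) for a, b in intervals)
    return length, increment


if __name__ == "__main__":
    for n in (1, 5, 10):
        length, increment = stage_sums(n)
        print(n, float(length), increment)
```

Taking $n$ large makes the total length as small as desired while the total increment of $\psi$ stays equal to $1$.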
Signed Measures

In order to prove the Radon-Nikodym theorem, we need to introduce signed measures. This is a simple generalization of measures where the nonnegativity condition is dropped. Thus, any measure is a signed measure, but not conversely.

DEFINITION 6.9 Signed Measure
Let $(\Omega, \mathcal{A})$ be a measurable space. A signed measure, $\nu$, on $\mathcal{A}$ is an extended real-valued function satisfying the following conditions:
a) $\nu(\emptyset) = 0$.
b) If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$$\nu\Big( \bigcup_n A_n \Big) = \sum_n \nu(A_n).$$

Remark: The equation in (b) of Definition 6.9 is taken in the extended sense; that is, any values in $\mathcal{R}^*$ are permitted. However, the right-hand side of the equation must make sense: it must converge or it must diverge to $\pm\infty$. In particular, $\nu$ cannot take on both $\infty$ and $-\infty$ as values; that is, if $\nu(E) = \infty$ for some $E \in \mathcal{A}$, then $\nu(A) > -\infty$ for all $A \in \mathcal{A}$, and if $\nu(E) = -\infty$ for some $E \in \mathcal{A}$, then $\nu(A) < \infty$ for all $A \in \mathcal{A}$.

EXAMPLE 6.9 Illustrates Definition 6.9

a) Let $(\Omega, \mathcal{A})$ be a measurable space, $\alpha_1, \alpha_2 \in \mathcal{R}$, and $\mu_1$ and $\mu_2$ measures on $\mathcal{A}$, at least one of which is finite. Define $\alpha_1\mu_1 + \alpha_2\mu_2$ on $\mathcal{A}$ by
$$(\alpha_1\mu_1 + \alpha_2\mu_2)(A) = \alpha_1\mu_1(A) + \alpha_2\mu_2(A).$$
It is easy to see that $\alpha_1\mu_1 + \alpha_2\mu_2$ is a signed measure on $\mathcal{A}$. Note that if $\alpha_1, \alpha_2 \ge 0$, then $\alpha_1\mu_1 + \alpha_2\mu_2$ is a measure on $\mathcal{A}$.

b) Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \mathcal{L}^1(\mu)$ be extended real-valued. Define
$$\nu(A) = \int_A f\,d\mu, \qquad A \in \mathcal{A}.$$
By Exercise 4.72 on page 201, $\nu$ is a signed measure on $\mathcal{A}$. If $f$ is nonnegative, then $\nu$ is a measure. □
The Hahn Decomposition Theorem

From Example 6.9(a) it follows that the difference of two measures on a $\sigma$-algebra, at least one of which is finite, is a signed measure. Our next goal is to show that the converse is also true: Any signed measure on a $\sigma$-algebra can be expressed as the difference of two measures on that $\sigma$-algebra, at least one of which is finite.

The idea is the following. Let $\nu$ be a signed measure on a $\sigma$-algebra $\mathcal{A}$. Suppose we can find a set $D \in \mathcal{A}$ such that $\nu(E) \ge 0$ for each $\mathcal{A}$-measurable subset $E$ of $D$ and such that $\nu(E) \le 0$ for each $\mathcal{A}$-measurable subset $E$ of $D^c$. Then we can define set functions, $\nu^+$ and $\nu^-$, on $\mathcal{A}$ by $\nu^+(A) = \nu(A \cap D)$ and $\nu^-(A) = -\nu(A \cap D^c)$. And it is easy to see that $\nu^+$ and $\nu^-$ are measures on $\mathcal{A}$ and that $\nu = \nu^+ - \nu^-$.

The existence of such a set $D$ is the substance of the Hahn decomposition theorem, and the pair $(D, D^c)$ is called a Hahn decomposition for $\nu$. We begin with the following definition.

DEFINITION 6.10 Positive and Negative Sets
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. A set $P \in \mathcal{A}$ is called positive for $\nu$ if $\nu(E) \ge 0$ for all sets $E \in \mathcal{A}$ with $E \subset P$. A set $N \in \mathcal{A}$ is called negative for $\nu$ if $\nu(E) \le 0$ for all sets $E \in \mathcal{A}$ with $E \subset N$.

EXAMPLE 6.10 Illustrates Definition 6.10

a) Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$, $\mu = \delta_0 + \delta_1$, and $\nu = \lambda - \mu$. Note that for $A \in \mathcal{M}$,
$$\nu(A) = \begin{cases} \lambda(A), & \text{if } \{0,1\} \cap A = \emptyset; \\ \lambda(A) - 1, & \text{if exactly one of } 0, 1 \text{ is in } A; \\ \lambda(A) - 2, & \text{if } \{0,1\} \subset A. \end{cases}$$
The sets $\mathcal{R} \setminus \{0,1\}$ and $(1,2)$ are positive for $\nu$; the sets $\{0,1\}$ and $\mathcal{Q}$ are negative for $\nu$; the set $\{2,3,4,\ldots\}$ is both positive and negative for $\nu$; and the set $[0,1]$ is neither positive nor negative for $\nu$, although we have $\nu([0,1]) = -1$. In fact, the positive sets for $\nu$ are precisely those $\mathcal{M}$-measurable sets containing neither 0 nor 1; and the negative sets for $\nu$ are precisely those $\mathcal{M}$-measurable sets having Lebesgue measure zero.
Moreover, the pair consisting of $\mathcal{R} \setminus \{0,1\}$ and its complement is a Hahn decomposition for $\nu$, as is the pair consisting of $\mathcal{R} \setminus \{0,1,2\}$ and its complement, etc.
b) Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a measure on $\mathcal{A}$. Then every $\mathcal{A}$-measurable set is positive for $\nu$; the negative sets for $\nu$ are the $\mathcal{A}$-measurable sets having $\nu$-measure zero. A Hahn decomposition for $\nu$ is $(\Omega, \emptyset)$.

c) Any set that is both positive and negative for a signed measure $\nu$ must have $\nu$-measure zero. However, there may be sets of $\nu$-measure zero that are neither positive nor negative for $\nu$. For instance, if $\nu$ is the signed measure defined in part (a), then the set $[0,1)$ has $\nu$-measure zero, but is neither positive nor negative for $\nu$. □

Using the terminology introduced in Definition 6.10, we see that in a Hahn decomposition, $(D, D^c)$, the set $D$ is positive for $\nu$ and its complement is negative for $\nu$. Thus, intuitively, the set $D$ should in some sense be a maximal positive set.

In proving the Hahn decomposition theorem, we will need several lemmas. The first lemma shows that signed measures share important properties with measures. Its proof is left as an exercise for the reader.

LEMMA 6.6
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Then the following hold:
a) If $A, B \in \mathcal{A}$, $A \subset B$, and $|\nu(B)| < \infty$, then $|\nu(A)| < \infty$.
b) If $\{E_n\}_{n=1}^\infty \subset \mathcal{A}$ with $E_1 \subset E_2 \subset \cdots$, then
$$\nu\Big( \bigcup_{n=1}^\infty E_n \Big) = \lim_{n\to\infty} \nu(E_n).$$
c) If $\{E_n\}_{n=1}^\infty \subset \mathcal{A}$ with $E_1 \supset E_2 \supset \cdots$ and $|\nu(E_1)| < \infty$, then
$$\nu\Big( \bigcap_{n=1}^\infty E_n \Big) = \lim_{n\to\infty} \nu(E_n).$$

LEMMA 6.7
A countable union of positive sets is positive.

PROOF: Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Suppose $\{P_n\} \subset \mathcal{A}$ is a sequence of positive sets for $\nu$. We claim that $\bigcup_n P_n$ is positive for $\nu$.
Let $E \subset \bigcup_n P_n$ with $E \in \mathcal{A}$. We must show that $\nu(E) \ge 0$. To that end, we "disjointize" $E$ as follows. Set $E_1 = E \cap P_1$ and, for $n \ge 2$,
$$E_n = E \cap P_n \setminus \bigcup_{k=1}^{n-1} P_k.$$
Then the $E_n$s are pairwise disjoint and $\bigcup_n E_n = E$. Since $P_n$ is positive for $\nu$ and $E_n \subset P_n$, we have $\nu(E_n) \ge 0$. Hence, $\nu(E) = \sum_n \nu(E_n) \ge 0$.

The next lemma shows that any set of finite positive $\nu$-measure has a subset of positive $\nu$-measure that is positive for $\nu$.

LEMMA 6.8
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Suppose $A \in \mathcal{A}$ and $0 < \nu(A) < \infty$. Then there is an $\mathcal{A}$-measurable set $P \subset A$ that is positive for $\nu$ and satisfies $\nu(P) > 0$.

PROOF: Note that by Lemma 6.6(a), any subset of $A$ has finite $\nu$-measure. The idea of the proof is to keep extracting sets from $A$ that have negative $\nu$-measure of large magnitude.

If $A$ is positive for $\nu$, we are done. Otherwise, $A$ contains a set with negative $\nu$-measure. In that case, let $L_1 = \inf\{ \nu(E) : E \subset A \}$. By assumption, $L_1 < 0$; so there is an $n \in \mathcal{N}$ such that $L_1 < -n^{-1}$. Let $n_1$ be the smallest such $n$. By definition of $L_1$, there is an $A_1 \subset A$ such that $\nu(A_1) < -n_1^{-1}$. Note that if $n < n_1$, then $L_1 \ge -n^{-1}$ and hence $\nu(E) \ge -n^{-1}$ for all $E \subset A$.

Extract $A_1$ from $A$; that is, consider $A \setminus A_1$. Note that
$$\nu(A \setminus A_1) = \nu(A) - \nu(A_1) > \nu(A) + \frac{1}{n_1} > 0.$$
If $A \setminus A_1$ is positive for $\nu$, we are done. Otherwise, $A \setminus A_1$ contains a set with negative $\nu$-measure. In that case, let $L_2 = \inf\{ \nu(E) : E \subset A \setminus A_1 \}$. By assumption, $L_2 < 0$; so there is an $n \in \mathcal{N}$ such that $L_2 < -n^{-1}$. Let $n_2$ be the smallest such $n$. By definition of $L_2$, there is an $A_2 \subset A \setminus A_1$ such that $\nu(A_2) < -n_2^{-1}$. Note that if $n < n_2$, then $L_2 \ge -n^{-1}$ and hence $\nu(E) \ge -n^{-1}$ for all $E \subset A \setminus A_1$.

Extract $A_2$ from $A \setminus A_1$; that is, consider $A \setminus A_1 \setminus A_2 = A \setminus (A_1 \cup A_2)$. Note that
$$\nu\big(A \setminus (A_1 \cup A_2)\big) = \nu(A) - \big(\nu(A_1) + \nu(A_2)\big) > \nu(A) + \frac{1}{n_1} + \frac{1}{n_2} > 0.$$
If this process terminates after a finite number of steps, we are done. Otherwise, we obtain a sequence $\{A_k\}_{k=1}^\infty$ of pairwise disjoint subsets of $A$ and a sequence of positive integers $\{n_k\}_{k=1}^\infty$ such that for each $k \in \mathcal{N}$, $A_k \subset A \setminus \bigcup_{j=1}^{k-1} A_j$, $\nu(A_k) < -n_k^{-1}$, and $n_k$ is the smallest positive integer $n$ for which there is a subset of $A \setminus \bigcup_{j=1}^{k-1} A_j$ having $\nu$-measure less than $-n^{-1}$.

Let $P = A \setminus \bigcup_{k=1}^\infty A_k$, and note that
$$\nu(P) = \nu(A) - \sum_{k=1}^\infty \nu(A_k) > \nu(A) + \sum_{k=1}^\infty \frac{1}{n_k} > 0.$$
We claim that $P$ is positive for $\nu$. Suppose to the contrary that there is a set $B \subset P$ with $\nu(B) < 0$. Since $\nu(A) < \infty$, Lemma 6.6(a) implies that $\nu(P) < \infty$. Consequently, $\sum_{k=1}^\infty n_k^{-1} < \infty$ and so, in particular, $n_k \to \infty$ as $k \to \infty$. Since $n_k \to \infty$, there is a $k_0$ such that $(n_{k_0} - 1)^{-1} < -\nu(B)$ or, in other words, $\nu(B) < -(n_{k_0} - 1)^{-1}$. But $B \subset P$ and so $B \subset A \setminus \bigcup_{j=1}^{k_0 - 1} A_j$. This contradicts the minimality of $n_{k_0}$. Hence $P$ is positive for $\nu$.

THEOREM 6.8 Hahn Decomposition Theorem
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Then there is a set $D \in \mathcal{A}$ such that $D$ is positive for $\nu$ and $D^c$ is negative for $\nu$. The pair $(D, D^c)$ is called a Hahn decomposition for $\nu$.

PROOF: We can assume without loss of generality that $\nu$ does not take on the value $\infty$. (Why can this be done?) As we mentioned earlier, $D$ should in some sense be a maximal positive set for $\nu$. With that idea in mind, let
$$p = \sup\{ \nu(P) : P \text{ positive for } \nu \}.$$
Then we can choose a sequence $\{P_n\}_{n=1}^\infty$ of positive sets for $\nu$ such that $\lim_{n\to\infty} \nu(P_n) = p$. Let $D_n = \bigcup_{k=1}^n P_k$ and $D = \bigcup_{n=1}^\infty D_n$. Applying Lemma 6.7 twice we see, in turn, that $D_1, D_2, \ldots$ are positive for $\nu$ and $D$ is positive for $\nu$.

To show that $(D, D^c)$ is a Hahn decomposition for $\nu$, it remains to prove that $D^c$ is negative for $\nu$. To that end, we first note that because $D_1 \subset D_2 \subset \cdots$, Lemma 6.6(b) implies that $\nu(D) = \lim_{n\to\infty} \nu(D_n)$. Also, $D_n \supset P_n$ and so $D_n = P_n \cup (D_n \setminus P_n)$. Since $D_n \setminus P_n \subset D_n$ and $D_n$ is positive, $\nu(D_n \setminus P_n) \ge 0$; hence, $\nu(D_n) \ge \nu(P_n)$. Recalling that $D$ is positive, we have
$$p \ge \nu(D) = \lim_{n\to\infty} \nu(D_n) \ge \lim_{n\to\infty} \nu(P_n) = p.$$
Consequently, $\nu(D) = p$. Since $\nu$ does not assume the value $\infty$, we must have $p < \infty$.

Now, suppose that $D^c$ is not negative for $\nu$. Then there is a set $A \subset D^c$ with $\nu(A) > 0$ and, by assumption, $\nu(A) < \infty$. Thus, by Lemma 6.8, $A$ contains a set $P$ that is positive for $\nu$ and has positive $\nu$-measure. Since $P$ and $D$ are positive, so is $P \cup D$; and since $P \subset D^c$, $P \cap D = \emptyset$. Hence, $P \cup D$ is positive for $\nu$ and
$$\nu(P \cup D) = \nu(P) + \nu(D) > p.$$
This contradicts the definition of $p$. Hence $D^c$ is negative for $\nu$.

Is the Hahn decomposition for a signed measure unique? In general, it is not. For example, consider the signed measure $\nu$ defined in Example 6.10(a) on page 357. Then $(\mathcal{R} \setminus \{0,1\}, \{0,1\})$ and $(\mathcal{R} \setminus \{0,1,2\}, \{0,1,2\})$ are both Hahn decompositions for $\nu$. However, as Exercise 6.69 shows, if $(D, D^c)$ and $(E, E^c)$ are two Hahn decompositions for a signed measure $\nu$, then $D$ and $E$ differ by a set that contains only sets of $\nu$-measure zero, and likewise for $D^c$ and $E^c$.

The Jordan Decomposition Theorem

Now that we have established the existence of a Hahn decomposition for a signed measure, we can easily prove that any such measure can be expressed as the difference of two measures, a result known as the Jordan decomposition theorem. Before doing so, we introduce the following terminology.

DEFINITION 6.11 Mutually Singular Measures
Let $(\Omega, \mathcal{A})$ be a measurable space. Two measures, $\mu_1$ and $\mu_2$, on $\mathcal{A}$ are said to be mutually singular, denoted $\mu_1 \perp \mu_2$, if there is a set $E \in \mathcal{A}$ such that $\mu_1(E^c) = 0$ and $\mu_2(E) = 0$.

Note that if $\mu_1 \perp \mu_2$, then $\mu_1$ and $\mu_2$ are supported by complementary sets: For each $A \in \mathcal{A}$, $\mu_1(A) = \mu_1(A \cap E)$ and $\mu_2(A) = \mu_2(A \cap E^c)$.

EXAMPLE 6.11 Illustrates Definition 6.11

a) Let $\lambda$ be Lebesgue measure and $\mu$ any discrete measure on $\mathcal{M}$, that is, there is a countable set, $K$, such that $\mu(K^c) = 0$. Then $\mu \perp \lambda$.
b) Let $\lambda$ be Lebesgue measure and $\mu$ be the Borel measure induced by the Cantor function. Then, considered as Borel measures, $\mu \perp \lambda$. Indeed, if $P$ is the Cantor set, then $\mu(P^c) = 0$ and $\lambda(P) = 0$.

c) Let $D = \{ (x,y) \in \mathcal{R}^2 : y = x \}$. Define $\nu$ by
$$\nu(B) = \lambda\big(\{ x \in \mathcal{R} : (x,x) \in B \}\big), \qquad B \in \mathcal{B}_2.$$
Then $\lambda_2(D) = 0$ and $\nu(D^c) = 0$, so that $\nu \perp \lambda_2$. □

THEOREM 6.9 Jordan Decomposition Theorem
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Then $\nu$ can be expressed uniquely as the difference of two mutually singular measures, $\nu^+$ and $\nu^-$, on $\mathcal{A}$. The representation $\nu = \nu^+ - \nu^-$ is called the Jordan decomposition of $\nu$.

PROOF: Let $(D, D^c)$ be a Hahn decomposition for $\nu$ and, for $A \in \mathcal{A}$, define
$$\nu^+(A) = \nu(A \cap D) \quad \text{and} \quad \nu^-(A) = -\nu(A \cap D^c).$$
Clearly, $\nu = \nu^+ - \nu^-$ and, because $(D, D^c)$ is a Hahn decomposition for $\nu$, $\nu^+$ and $\nu^-$ are nonnegative. Noting that $\nu^+(D^c) = 0$ and $\nu^-(D) = 0$, we see that $\nu^+ \perp \nu^-$.

Now suppose we can write $\nu = \mu_1 - \mu_2$, where $\mu_1$ and $\mu_2$ are mutually singular measures on $\mathcal{A}$. We must show that $\mu_1 = \nu^+$ and $\mu_2 = \nu^-$. Since $\mu_1 \perp \mu_2$, we can choose $E \in \mathcal{A}$ such that $\mu_1(E^c) = 0$ and $\mu_2(E) = 0$. We claim $(E, E^c)$ is a Hahn decomposition for $\nu$. Indeed, if $F \subset E$, then $\mu_2(F) = 0$, so that $\nu(F) = \mu_1(F) \ge 0$; hence, $E$ is positive for $\nu$. On the other hand, if $F \subset E^c$, then $\mu_1(F) = 0$, so that $\nu(F) = -\mu_2(F) \le 0$; hence, $E^c$ is negative for $\nu$.

From Exercise 6.69, we have for all $A \in \mathcal{A}$ that $\nu(A \cap D) = \nu(A \cap E)$ and $\nu(A \cap D^c) = \nu(A \cap E^c)$. The former equality implies that for $A \in \mathcal{A}$,
$$\mu_1(A) = \mu_1(A \cap E) + \mu_1(A \cap E^c) = \mu_1(A \cap E) = \mu_1(A \cap E) - \mu_2(A \cap E) = \nu(A \cap E) = \nu(A \cap D) = \nu^+(A).$$
Thus, $\mu_1 = \nu^+$. Similarly, $\mu_2 = \nu^-$.
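On a finite set with $\mathcal{A} = \mathcal{P}(\Omega)$, a signed measure is given by real point weights, and the constructions above become very concrete: the points of nonnegative weight form a Hahn set $D$, and $\nu^+(A) = \nu(A \cap D)$, $\nu^-(A) = -\nu(A \cap D^c)$ as in the proof of Theorem 6.9. The Python sketch below is a finite illustration only, with hypothetical helper names. It verifies Definition 6.10 by brute force, and the usage block shows that moving a point of weight zero out of $D$ gives a second Hahn decomposition.

```python
from itertools import chain, combinations


def subsets(points):
    """Every subset of a finite set, as frozensets."""
    pts = list(points)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))]


def signed_measure(weights):
    """nu(A) = sum of the (possibly negative) point weights of A."""
    return lambda A: sum(weights[x] for x in A)


def is_positive(nu, S):
    """Definition 6.10: every subset E of S has nu(E) >= 0."""
    return all(nu(E) >= 0 for E in subsets(S))


def is_negative(nu, S):
    """Definition 6.10: every subset E of S has nu(E) <= 0."""
    return all(nu(E) <= 0 for E in subsets(S))


def hahn_jordan(weights):
    """A Hahn set D and the Jordan parts nu_plus, nu_minus built from it (proof of Theorem 6.9)."""
    nu = signed_measure(weights)
    D = frozenset(x for x, w in weights.items() if w >= 0)
    nu_plus = lambda A: nu(frozenset(A) & D)
    nu_minus = lambda A: -nu(frozenset(A) - D)
    return D, nu_plus, nu_minus


if __name__ == "__main__":
    w = {"a": 3, "b": -2, "c": 0, "d": -1, "e": 5}
    omega = frozenset(w)
    nu = signed_measure(w)
    D, nu_plus, nu_minus = hahn_jordan(w)
    print(sorted(D), is_positive(nu, D), is_negative(nu, omega - D))
    # The zero-weight point "c" can sit on either side: another Hahn decomposition.
    D2 = D - {"c"}
    print(is_positive(nu, D2), is_negative(nu, omega - D2))
```

The same brute-force helpers can be used to test the formulas of Exercise 6.76 on small examples.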
EXERCISES 6.5

6.65 Refer to Example 6.8(d) on page 355, where $\nu$ denotes the Borel measure induced by the Cantor function. Show that there is no nonnegative Borel measurable function, $f$, such that $\nu(B) = \int_B f\,d\lambda$ for $B \in \mathcal{B}$, by finding a set $A \in \mathcal{B}$ such that $\lambda(A) = 0$ and $\nu(A) \ne 0$.

6.66 Define $\nu(E) = \int_E x\,dx$.
a) Show that, for each positive real number $c$, the preceding definition yields a signed measure on $\mathcal{M}_{[-c,c]}$.
b) Is $\nu$ a signed measure on $\mathcal{M}$? Justify your answer.

6.67 Prove Lemma 6.6 on page 358.

6.68 Concerning the proof of the Hahn decomposition theorem:
a) Why can we assume without loss of generality that $\nu$ does not take on the value $\infty$?
b) What happens if $p = 0$?

6.69 Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Suppose $(D, D^c)$ and $(E, E^c)$ are two Hahn decompositions for $\nu$.
a) Show that $D$ and $E$ differ by a set that contains only sets of $\nu$-measure zero, that is, if $A \in \mathcal{A}$ and $A \subset (D \setminus E) \cup (E \setminus D)$, then $\nu(A) = 0$.
b) Show that $D^c$ and $E^c$ differ by a set that contains only sets of $\nu$-measure zero.
c) Prove that $\nu(A \cap D) = \nu(A \cap E)$ and $\nu(A \cap D^c) = \nu(A \cap E^c)$ for all $A \in \mathcal{A}$.

6.70 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\nu$ and $\omega$ measures on $\mathcal{A}$. Show that if $\nu \perp \mu$ and $\omega \perp \mu$, then $\nu + \omega \perp \mu$.

6.71 Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for each $x \in \Omega$. A measure, $\mu$, on $\mathcal{A}$ is called continuous if it has no atoms, that is, $\mu(\{x\}) = 0$ for all $x \in \Omega$. If $\mu$ is a continuous measure on $\mathcal{A}$ and $\nu$ is a discrete measure on $\mathcal{A}$, show that $\mu \perp \nu$.

6.72 Refer to Example 6.11(c) on page 362. For $B \in \mathcal{B}_2$, let $A = \{ x \in \mathcal{R} : (x,x) \in B \}$.
a) Provide a geometric interpretation of the relation between $A$ and $B$.
b) Prove that $A \in \mathcal{B}$.

6.73 Let $\lambda$ be Lebesgue measure and $\mu$ be the Borel measure induced by the Cantor function. Find a Borel measure, $\omega$, such that $\omega \perp \lambda|_{\mathcal{B}}$ and $\omega \perp \mu$.

6.74 Provide an example to show that the uniqueness condition in Theorem 6.9 fails without the requirement of mutual singularity.
★6.75 This exercise will be useful as motivation for the proof of the Radon-Nikodym theorem. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \mathcal{L}^1(\mu)$ be extended real-valued. Define
$$\nu(A) = \int_A f\,d\mu, \qquad A \in \mathcal{A}.$$
As we have seen, $\nu$ is a signed measure. Let $D = \{ x : f(x) > 0 \}$.
a) Show that $(D, D^c)$ is a Hahn decomposition for $\nu$.
b) Prove that for $A \in \mathcal{A}$, $\nu^+(A) = \int_A f^+\,d\mu$ and $\nu^-(A) = \int_A f^-\,d\mu$.

6.76 Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Prove that for each $A \in \mathcal{A}$,
$$\nu^+(A) = \sup\{ \nu(E) : E \in \mathcal{A},\ E \subset A \}$$
and
$$\nu^-(A) = -\inf\{ \nu(E) : E \in \mathcal{A},\ E \subset A \}.$$

6.77 Let $(\Omega, \mathcal{A}) = (\mathcal{N}, \mathcal{P}(\mathcal{N}))$ and $\{a_n\}_{n=1}^\infty$ a sequence of real numbers such that $\sum_{n=1}^\infty |a_n| < \infty$. Define $\nu$ on $\mathcal{P}(\mathcal{N})$ by $\nu(A) = \sum_{n \in A} a_n$.
a) Prove that $\nu$ is a signed measure.
b) Determine $\nu^+$ and $\nu^-$.

6.78 Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu_1$ and $\nu_2$ measures on $\mathcal{A}$, at least one of which is finite. Define $\nu = \nu_1 - \nu_2$. Prove that $\nu^+ \le \nu_1$ and $\nu^- \le \nu_2$.

6.6 THE RADON-NIKODYM THEOREM

Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\nu$ a measure on $\mathcal{A}$. We want to determine when $\nu$ can be represented in the form
$$\nu(A) = \int_A f\,d\mu, \qquad A \in \mathcal{A},$$
for some nonnegative extended real-valued $\mathcal{A}$-measurable function $f$ on $\Omega$. As we have seen, a necessary condition for such a representation is that $\nu(A) = 0$ whenever $\mu(A) = 0$. The Radon-Nikodym theorem shows that, subject to $\sigma$-finiteness restrictions, that condition is also sufficient for such a representation. In the following definition, we give that condition a name.

DEFINITION 6.12 Absolutely Continuous Measures
Let $(\Omega, \mathcal{A})$ be a measurable space and $\mu$ and $\nu$ measures on $\mathcal{A}$. Then $\nu$ is said to be absolutely continuous† with respect to $\mu$, denoted $\nu \ll \mu$, if $\nu(A) = 0$ whenever $\mu(A) = 0$.

† The reason for the term "absolutely continuous" will become apparent shortly.
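In the simplest setting, a finite set $\Omega$ with $\mathcal{A} = \mathcal{P}(\Omega)$ and measures given by nonnegative point masses, $\mu(A) = 0$ exactly when every point of $A$ has $\mu$-mass zero. So $\nu \ll \mu$ reduces to a check on singletons, and $f(x) = \nu(\{x\})/\mu(\{x\})$ (with, say, $f(x) = 0$ at $\mu$-null points) gives the representation $\nu(A) = \int_A f\,d\mu$. The Python sketch below covers only this finite case, not the general theorem; the function names are hypothetical, and the masses are assumed to be integers or `Fraction`s so that the division is exact.

```python
from fractions import Fraction


def abs_continuous(nu_w, mu_w):
    """nu << mu for point-mass measures on a finite set.

    With nonnegative masses, mu(A) = 0 exactly when every point of A is mu-null,
    so it suffices to check singletons."""
    return all(nu_w[x] == 0 for x in mu_w if mu_w[x] == 0)


def density(nu_w, mu_w):
    """A function f with nu(A) = sum over x in A of f(x) * mu({x}); f is 0 at mu-null points."""
    if not abs_continuous(nu_w, mu_w):
        raise ValueError("nu is not absolutely continuous with respect to mu")
    return {x: (Fraction(nu_w[x], mu_w[x]) if mu_w[x] != 0 else Fraction(0)) for x in mu_w}


if __name__ == "__main__":
    delta0, delta01 = {0: 1, 1: 0}, {0: 1, 1: 1}   # delta_0 and delta_0 + delta_1 on {0, 1}
    print(abs_continuous(delta0, delta01), abs_continuous(delta01, delta0))
    mu, nu = {"p": 2, "q": 0, "r": 4}, {"p": 6, "q": 0, "r": 1}
    f = density(nu, mu)
    A = {"p", "r"}
    # Finite form of nu(A) = integral over A of f d(mu).
    print(sum(nu[x] for x in A) == sum(f[x] * mu[x] for x in A))
```

With $\Omega = \{0,1\}$, the first printed line reports that $\delta_0 \ll \delta_0 + \delta_1$ holds while the reverse relation fails.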
EXAMPLE 6.12 Illustrates Definition 6.12
a) As we have already noted, let $f$ be a nonnegative extended real-valued $\mathcal{A}$-measurable function. Then the measure $\nu(A) = \int_A f\,d\mu$, $A \in \mathcal{A}$, is absolutely continuous with respect to $\mu$.
b) Let $\nu$ be the Borel measure on $\mathcal{R}$ induced by the Cantor function. Then, as Borel measures, none of $\delta_0$, $\nu$, and $\lambda$ is absolutely continuous with respect to any of the others. That is, $\delta_0 \not\ll \lambda$, $\lambda \not\ll \nu$, and so forth.
c) Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$. We have $\delta_0 \ll \delta_0 + \delta_1$, but $\delta_0 + \delta_1 \not\ll \delta_0$.
d) Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$. Define $f(x) = 0$ for $x < 0$, and $f(x) = e^{-x}$ for $x \ge 0$. Set $\nu(A) = \int_A f\,d\lambda$ for $A \in \mathcal{M}$. Then $\nu \ll \lambda$, but $\lambda \not\ll \nu$. □

THEOREM 6.10 Radon-Nikodym Theorem†
Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu$ a $\sigma$-finite measure on $\mathcal{A}$. If $\nu \ll \mu$, then there is a nonnegative extended real-valued $\mathcal{A}$-measurable function, $f$, on $\Omega$ such that
$$\nu(A) = \int_A f\,d\mu, \quad A \in \mathcal{A}. \qquad (6.35)$$
Moreover, $f$ is unique in the following sense. If $g$ is a nonnegative extended real-valued $\mathcal{A}$-measurable function with $\nu(A) = \int_A g\,d\mu$ for all $A \in \mathcal{A}$, then $g = f$ $\mu$-ae.

Before proving the Radon-Nikodym theorem, let us consider the main idea behind the proof. Suppose, say, that $\mu$ is a finite measure on $(\Omega, \mathcal{A})$ and that $\nu \ll \mu$. We want to show that (6.35) holds for an appropriately chosen $f$. What would $f$ have to look like? Let $\alpha > 0$ and note that $\nu - \alpha\mu$ is a signed measure. If $f$ is the required (but unknown) function, then
$$(\nu - \alpha\mu)(A) = \int_A f\,d\mu - \alpha\mu(A) = \int_A (f - \alpha)\,d\mu.$$

† Regarding the $\sigma$-finite conditions, stronger versions of the Radon-Nikodym theorem are available. See, for example, (19.27) in Hewitt and Stromberg's Real and Abstract Analysis (New York: Springer-Verlag, 1965), p. 318.
By Exercise 6.75 on page 363, if $D_\alpha = \{x : f(x) > \alpha\}$, then $(D_\alpha, D_\alpha^c)$ is a Hahn decomposition for $\nu - \alpha\mu$. Think now of $x$ as fixed and $\alpha$ as varying. Then $x \in D_\alpha$ if and only if $\alpha < f(x)$, and therefore
$$f(x) = \sup\{\alpha : \alpha < f(x)\} = \sup\{\alpha : x \in D_\alpha\}.$$
Thus the procedure for finding $f$ will be essentially as follows. For each $\alpha > 0$, let $(D_\alpha, D_\alpha^c)$ be a Hahn decomposition for $\nu - \alpha\mu$. Define $f(x) = \sup\{\alpha : x \in D_\alpha\}$ for $x \in \Omega$. This should give a function, $f$, that satisfies (6.35). We now present a formal proof of the Radon-Nikodym theorem.

PROOF: We first assume that $\mu$ is a finite measure. For each positive rational number, $r$, let $(D_r, D_r^c)$ be a Hahn decomposition for $\nu - r\mu$. Define $f$ on $\Omega$ by
$$f(x) = \begin{cases} \sup\{\, r \in \mathcal{Q} : x \in D_r \,\}, & \text{if } x \in D_r \text{ for some } r;\\ 0, & \text{otherwise.} \end{cases}$$
Clearly $f$ is a nonnegative extended real-valued function on $\Omega$. We assert that $f$ is $\mathcal{A}$-measurable. To prove that, it suffices to show that $f^{-1}([\alpha, \infty]) \in \mathcal{A}$ for each $\alpha \in \mathcal{R}$. For $\alpha \le 0$, the inverse image is $\Omega$. Hence, we can assume $\alpha > 0$. We will show that for $\alpha > 0$,
$$f^{-1}([\alpha, \infty]) = \bigcap_{q < \alpha} \Big( \bigcup_{r > q} D_r \Big), \qquad (6.36)$$
where, here and until specified otherwise, $r$ and $q$ denote positive rational numbers. From the definition of $f$, we have $f^{-1}((q, \infty]) = \bigcup_{r > q} D_r$ for each $q$. Because $[\alpha, \infty] = \bigcap_{q < \alpha} (q, \infty]$, it follows that
$$f^{-1}([\alpha, \infty]) = \bigcap_{q < \alpha} f^{-1}((q, \infty]) = \bigcap_{q < \alpha} \Big( \bigcup_{r > q} D_r \Big),$$
and so (6.36) holds. Since $\mathcal{Q}$ is countable and $D_r \in \mathcal{A}$ for each $r$, (6.36) gives $f^{-1}([\alpha, \infty]) \in \mathcal{A}$ for each $\alpha > 0$. Thus, $f$ is $\mathcal{A}$-measurable.

We next show that $f$ satisfies (6.35). To that end, let $A \in \mathcal{A}$. For each pair of rational numbers, $\alpha$ and $\beta$, with $0 \le \alpha < \beta$, define
$$E = \{x \in A : \alpha \le f(x) < \beta\}.$$
We claim that
$$\alpha\mu(E) \le \nu(E) \le \beta\mu(E). \qquad (6.37)$$
We begin by establishing the first inequality in (6.37). If $\alpha = 0$, that inequality is trivial, so we assume that $\alpha > 0$. By (6.36), $E \subset \{x : f(x) \ge \alpha\} \subset \bigcup_{r > q} D_r$ for each $q < \alpha$. Suppose we can show that, for each $q < \alpha$, the latter set is positive for $\nu - q\mu$. Then $(\nu - q\mu)(E) \ge 0$, that is, $\nu(E) \ge q\mu(E)$, for each $q < \alpha$, and it follows that $\nu(E) \ge \alpha\mu(E)$. So, suppose $r > q$ and let $F \subset D_r$. Because $D_r$ is positive for $\nu - r\mu$, we have
$$0 \le (\nu - r\mu)(F) = \nu(F) - r\mu(F) \le \nu(F) - q\mu(F) = (\nu - q\mu)(F).$$
Hence, $(\nu - q\mu)(F) \ge 0$ for $F \subset D_r$. Consequently, $D_r$ is positive for $\nu - q\mu$. Lemma 6.7 now implies that $\bigcup_{r > q} D_r$ is positive for $\nu - q\mu$, as required.

To establish the second inequality in (6.37), we first note the following. By definition, if $f(x) < \beta$, then $x \notin D_\beta$; in other words, $\{x : f(x) < \beta\} \subset D_\beta^c$. Because $D_\beta^c$ is negative for $\nu - \beta\mu$ and $E \subset \{x : f(x) < \beta\} \subset D_\beta^c$, it follows that $(\nu - \beta\mu)(E) \le 0$. That is, $\nu(E) \le \beta\mu(E)$. We have now shown that (6.37) holds.

To continue, we need to consider where $f$ is infinite on $A$. To that end, let $H = \{x : f(x) = \infty\}$. We will show that if $\mu(A \cap H) > 0$, then $\nu(A \cap H) = \infty$. In doing so, we will use the already established fact that, for each $q$, $\bigcup_{r > q} D_r$ is positive for $\nu - q\mu$. By the definition of $f$, if $f(x) = \infty$, then for each $q$ there is an $r > q$ such that $x \in D_r$. Thus $H \subset \bigcup_{r > q} D_r$, and hence $H$ is positive for $\nu - q\mu$. Consequently, for each $q$, $(\nu - q\mu)(A \cap H) \ge 0$, that is, $\nu(A \cap H) \ge q\mu(A \cap H)$. Hence, if $\mu(A \cap H) > 0$, then $\nu(A \cap H) = \infty$.

Now, assume that $\mu(A \cap H) > 0$. Then $\nu(A \cap H) = \infty$ and, hence, $\nu(A) = \infty$. On the other hand, because $\mu(A \cap H) > 0$ and $f = \infty$ on $H$,
$$\int_A f\,d\mu \ge \int_{A \cap H} f\,d\mu = \infty.$$
Thus, if $\mu(A \cap H) > 0$, then both sides of (6.35) equal $\infty$.

So, assume that $\mu(A \cap H) = 0$. Then, because $\nu \ll \mu$, we have $\nu(A \cap H) = 0$. For each $n \in \mathcal{N}$, set
$$A_{n,k} = \Big\{x \in A : \frac{k-1}{n} \le f(x) < \frac{k}{n}\Big\}$$
for $k = 1, 2, \ldots$. Then $A = \bigcup_{k=1}^\infty A_{n,k} \cup (A \cap H)$. The $A_{n,k}$'s are pairwise disjoint and $\nu(A \cap H) = 0$, so $\nu(A) = \sum_{k=1}^\infty \nu(A_{n,k})$. By (6.37),
$$\frac{k-1}{n}\,\mu(A_{n,k}) \le \nu(A_{n,k}) \le \frac{k}{n}\,\mu(A_{n,k}),$$
and, from the definition of $A_{n,k}$, we conclude that
$$\frac{k-1}{n}\,\mu(A_{n,k}) \le \int_{A_{n,k}} f\,d\mu \le \frac{k}{n}\,\mu(A_{n,k}).$$
Therefore, for each $k \in \mathcal{N}$,
$$\int_{A_{n,k}} f\,d\mu - \frac{1}{n}\,\mu(A_{n,k}) \le \nu(A_{n,k}) \le \int_{A_{n,k}} f\,d\mu + \frac{1}{n}\,\mu(A_{n,k}).$$
Recall that $\mu(A \cap H) = 0$. Summing on $k$, we obtain that, for each $n \in \mathcal{N}$,
$$\int_A f\,d\mu - \frac{1}{n}\,\mu(A) \le \nu(A) \le \int_A f\,d\mu + \frac{1}{n}\,\mu(A).$$
Since we are assuming $\mu$ is finite, $\mu(A) < \infty$. Letting $n \to \infty$ in the previous display, we get
$$\int_A f\,d\mu \le \nu(A) \le \int_A f\,d\mu.$$
So again, (6.35) holds.

Suppose now that $\mu$ is a $\sigma$-finite measure. We can write $\Omega$ as a countable disjoint union of $\mathcal{A}$-measurable sets, $\{E_n\}_n$, where $\mu(E_n) < \infty$ and $\nu(E_n) < \infty$ for each $n \in \mathcal{N}$. (See Exercise 6.83.) Let $(E_n, \mathcal{A}_{E_n}, \mu_{E_n})$ be as usual. Set $\mu_n = \mu_{E_n}$ and $\nu_n = \nu_{E_n}$, the restrictions of $\mu$ and $\nu$ to $\mathcal{A}_{E_n}$. We have $\mu_n(E_n) = \mu(E_n) < \infty$ and, since $\nu \ll \mu$, we have $\nu_n \ll \mu_n$. Hence, by what we have proved for finite measures, there is a nonnegative $\mathcal{A}_{E_n}$-measurable function $g_n$ such that
$$\nu_n(B) = \int_B g_n\,d\mu_n, \quad B \in \mathcal{A}_{E_n}.$$
For each $n \in \mathcal{N}$, define $f_n$ on $\Omega$ by $f_n(x) = g_n(x)$ if $x \in E_n$, and $f_n(x) = 0$ otherwise. Then $f = \sum_{n=1}^\infty f_n$ is a nonnegative $\mathcal{A}$-measurable function. Moreover, if $A \in \mathcal{A}$,
$$\nu(A) = \sum_{n=1}^\infty \nu(A \cap E_n) = \sum_{n=1}^\infty \nu_n(A \cap E_n) = \sum_{n=1}^\infty \int_{A \cap E_n} g_n\,d\mu_n = \sum_{n=1}^\infty \int_{A \cap E_n} f_n\,d\mu = \sum_{n=1}^\infty \int_{A \cap E_n} f\,d\mu = \int_A f\,d\mu.$$
Thus, (6.35) holds.

It remains to prove uniqueness. So suppose that $g$ is also a nonnegative extended real-valued $\mathcal{A}$-measurable function on $\Omega$ such that
$$\nu(A) = \int_A g\,d\mu, \quad A \in \mathcal{A}.$$
We must show that $g = f$ $\mu$-ae. Let $E = \{x : f(x) > g(x)\}$. We claim that $\mu(E) = 0$. Let $\{E_n\}_n$ be the sequence of sets defined in the preceding. We have, for each $n \in \mathcal{N}$,
$$\int_{E \cap E_n} f\,d\mu = \int_{E \cap E_n} g\,d\mu = \nu(E \cap E_n) < \infty.$$
Thus, $f$ and $g$ are integrable over $E \cap E_n$ with respect to $\mu$ and, consequently, so is $f - g$. Moreover,
$$\int_{E \cap E_n} (f - g)\,d\mu = \int_{E \cap E_n} f\,d\mu - \int_{E \cap E_n} g\,d\mu = 0.$$
Since $f - g > 0$ on $E$ and, hence, on $E \cap E_n$, we must have $\mu(E \cap E_n) = 0$. Consequently, $\mu(E) = \sum_{n=1}^\infty \mu(E \cap E_n) = 0$. A similar argument shows that $\{x : g(x) > f(x)\}$ has $\mu$-measure zero. Therefore, $g = f$ $\mu$-ae.

DEFINITION 6.13 Radon-Nikodym Derivative
The function $f$ given in the statement of the Radon-Nikodym theorem is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$. It is denoted by $d\nu/d\mu$.

Remark: The Radon-Nikodym theorem shows that the Radon-Nikodym derivative is unique only up to sets of $\mu$-measure zero.
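The construction in the proof, $f(x) = \sup\{r : x \in D_r\}$, can be run by hand on a finite space. The sketch below is illustrative and assumes the following:
- $\Omega$ is finite and every point has positive $\mu$-mass.
- The rationals are replaced by the finite grid $r = k/N$, $1 \le k \le r_{\max}N$, with $N$ = `n_grid`. The proof uses all positive rationals.
- $D_r$ is taken to be $\{x : \nu(\{x\}) - r\mu(\{x\}) > 0\}$. This is one valid Hahn decomposition for $\nu - r\mu$, not the only one.

All names are hypothetical. Because of the grid, the result can fall short of the exact derivative $\nu(\{x\})/\mu(\{x\})$ by at most $1/N$ at each point. As a consequence, $\int_A f\,d\mu \le \nu(A) \le \int_A f\,d\mu + \mu(A)/N$, which echoes the $1/n$ bounds in the proof.

```python
from fractions import Fraction

def hahn_positive_set(nu, mu, r):
    """Positive set D_r of a Hahn decomposition for nu - r*mu on a finite space.
    With point masses, {x : nu({x}) - r*mu({x}) > 0} is positive and its
    complement is negative."""
    return {x for x in mu if nu[x] - r * mu[x] > 0}

def rn_derivative_by_sup(nu, mu, n_grid, r_max):
    """Approximate f(x) = sup{ r : x in D_r } over the grid r = k/n_grid."""
    f = {x: Fraction(0) for x in mu}  # f = 0 where x lies in no D_r
    for k in range(1, r_max * n_grid + 1):
        r = Fraction(k, n_grid)
        for x in hahn_positive_set(nu, mu, r):
            f[x] = max(f[x], r)
    return f

# Illustrative measures on the three-point set {a, b, c}; here nu << mu.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
nu = {"a": Fraction(1, 3), "b": Fraction(1), "c": Fraction(0)}

exact = {x: nu[x] / mu[x] for x in mu}  # the pointwise ratio dnu/dmu
approx = rn_derivative_by_sup(nu, mu, n_grid=100, r_max=10)
print(exact)
print(approx)
```

With `n_grid=100`, the point "a" (exact value $2/3$) gets $66/100$, and "b" (exact value $4$) gets $399/100$. The point "c" lies in no $D_r$ and keeps the value $0$.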
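EXAMPLE 6.13 Illustrates Definition 6.13
a) Let $\alpha$ be a positive constant. Define $F(x) = 1 - e^{-\alpha x}$ for $x \ge 0$, and $F(x) = 0$ otherwise. Let $\nu$ denote the unique Borel measure induced by $F$. Define the Borel measure $\omega$ by
$$\omega(B) = \int_B f\,d\lambda, \quad B \in \mathcal{B},$$
where $f(x) = \alpha e^{-\alpha x}$ for $x \ge 0$, and $f(x) = 0$ otherwise. It is easy to see that $\omega$ has $F$ for its distribution function, so $\omega = \nu$. Hence $\nu \ll \lambda$ and $d\nu/d\lambda = f$ $\lambda$-ae. We also have, for example, $d\nu/d\lambda = g$ $\lambda$-ae, where $g(x) = \alpha e^{-\alpha x}$ for $x > 0$ and $g(x) = 0$ otherwise. This holds because $g = f$ $\lambda$-ae.

b) Let $\mu$ be the measure on $\mathcal{P}(\mathcal{R})$ defined by $\mu(A) = \gamma(A \cap \mathcal{N})$, where $\gamma$ is counting measure on $\mathcal{P}(\mathcal{R})$. Consider the measure $\nu = \sum_{n=1}^\infty 2^{-n}\delta_n$ on $\mathcal{P}(\mathcal{R})$, that is,
$$\nu(A) = \sum_{n \in A \cap \mathcal{N}} 2^{-n}.$$
If $\mu(A) = 0$, then $A \cap \mathcal{N} = \emptyset$ and, therefore, $\nu(A) = 0$. Hence, $\nu \ll \mu$. Let $f(x) = 1/2^x$ if $x \in \mathcal{N}$, and $f(x) = 0$ otherwise. We claim that $d\nu/d\mu = f$ $\mu$-ae. Indeed, if $A \in \mathcal{P}(\mathcal{R})$, then
$$\int_A f\,d\mu = \int_{A \cap \mathcal{N}} f\,d\mu + \int_{A \cap \mathcal{N}^c} f\,d\mu = \int_{\mathcal{N}} f\chi_A\,d\mu = \sum_{n=1}^\infty f(n)\chi_A(n)\gamma(\{n\}) = \sum_{n \in A \cap \mathcal{N}} \frac{1}{2^n} = \nu(A).$$
Note that a nonnegative function $g$ on $\mathcal{R}$ is a Radon-Nikodym derivative of $\nu$ with respect to $\mu$ if and only if $g = f$ on $\mathcal{N}$.

c) Let $(\Omega, \mathcal{A}, P)$ be a probability space and $E \in \mathcal{A}$ such that $P(E) > 0$. Recall that the conditional probability measure, $P_E$, corresponding to $E$ is defined by $P_E(A) = P(A \mid E) = P(E \cap A)/P(E)$. Clearly $P_E \ll P$. A Radon-Nikodym derivative for $P_E$ with respect to $P$ is $\chi_E/P(E)$ because, for each $A \in \mathcal{A}$,
$$\int_A \frac{\chi_E}{P(E)}\,dP = \frac{P(E \cap A)}{P(E)} = P_E(A).$$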
d) Refer to Example 5.5(c) on page 276. Let $X$ be an absolutely continuous random variable on $(\Omega, \mathcal{A}, P)$ with probability density function $f_X$. Then $\mu_X \ll \lambda$ (as Borel measures). By definition, $f_X$ is a nonnegative Borel measurable function such that $\mu_X(B) = \int_B f_X\,d\lambda$ for $B \in \mathcal{B}$. Hence $f_X$ is a Radon-Nikodym derivative of $\mu_X$ with respect to $\lambda$. □

PROPOSITION 6.8
Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu$ a $\sigma$-finite measure on $\mathcal{A}$ such that $\nu \ll \mu$. If $g \in \mathcal{L}^1(\nu)$, then $g\,\frac{d\nu}{d\mu} \in \mathcal{L}^1(\mu)$ and
$$\int_\Omega g\,d\nu = \int_\Omega g\,\frac{d\nu}{d\mu}\,d\mu.$$

PROOF: Exercise 4.61(b) on page 191 shows that the proposition is true if $g$ is nonnegative. For real-valued $g$, write $g = g^+ - g^-$ and apply the result for nonnegative functions twice. For complex-valued $g$, write $g = \Re g + i\Im g$ and apply the result for real-valued functions twice.

A Relation Between Absolutely Continuous Functions and Absolutely Continuous Measures

In Section 6.4 we discussed absolutely continuous functions, and in this section we discussed absolutely continuous measures. The following proposition expresses a relation between the two concepts.

PROPOSITION 6.9
Let $\nu$ be a finite Borel measure and $F_\nu$ its distribution function. Then $\nu$ is absolutely continuous with respect to Lebesgue measure if and only if $F_\nu$ is absolutely continuous on $\mathcal{R}$. In this case, $d\nu/d\lambda = F_\nu'$ $\lambda$-ae.

PROOF: Suppose $\nu \ll \lambda$. Then, for $B \in \mathcal{B}$, $\nu(B) = \int_B (d\nu/d\lambda)\,d\lambda$. In particular, for $B = (-\infty, x]$,
$$F_\nu(x) = \nu\big((-\infty, x]\big) = \int_{(-\infty, x]} \frac{d\nu}{d\lambda}\,d\lambda = \int_{-\infty}^x \frac{d\nu}{d\lambda}(t)\,dt.$$
Because $F_\nu(\infty) = \nu(\mathcal{R}) < \infty$, it follows that $d\nu/d\lambda \in \mathcal{L}^1(\mathcal{R})$. Consequently, $F_\nu$ is absolutely continuous on $\mathcal{R}$ by Proposition 6.4(b) on page 344. By Corollary 6.2 on page 341, $F_\nu' = d\nu/d\lambda$ $\lambda$-ae.
Conversely, suppose that $F_\nu$ is absolutely continuous on $\mathcal{R}$. Then, by definition, $F_\nu' \in \mathcal{L}^1(\mathcal{R})$ and
$$F_\nu(x) = \int_{-\infty}^x F_\nu'(t)\,dt, \quad -\infty < x < \infty.$$
Define
$$\omega(B) = \int_B F_\nu'\,d\lambda, \quad B \in \mathcal{B}.$$
Then $\omega$ has $F_\nu$ as its distribution function, so $\omega = \nu$ by Theorem 4.13 (page 226). It follows that $\nu \ll \lambda$ and $d\nu/d\lambda = F_\nu'$ $\lambda$-ae.

Conditional Probability Given a $\sigma$-Algebra

Let us recall the definition of conditional probability from Section 5.1. Suppose that $(\Omega, \mathcal{A}, P)$ is a probability space and $F$ is an event having positive probability, that is, $F \in \mathcal{A}$ and $P(F) > 0$. Then, for $E \in \mathcal{A}$, the conditional probability of event $E$ given that event $F$ has occurred is defined by
$$P(E \mid F) = \frac{P(F \cap E)}{P(F)}.$$

We can generalize the notion of conditional probability by conditioning on a $\sigma$-algebra instead of just an event. Let $\mathcal{G}$ be a $\sigma$-algebra with $\mathcal{G} \subset \mathcal{A}$. Conditioning on $\mathcal{G}$ means that we know, for each $G \in \mathcal{G}$, whether or not $G$ has occurred. The conditional probability of an event $E$ given $\mathcal{G}$, denoted $P(E \mid \mathcal{G})$, is the probability of $E$ computed with that knowledge.

To see how to define $P(E \mid \mathcal{G})$, we consider the simplest nontrivial case. Suppose that $F$ is an event with probability strictly between 0 and 1. Let $\mathcal{G}$ be the $\sigma$-algebra generated by $F$, that is, the smallest $\sigma$-algebra containing $F$; clearly, $\mathcal{G} = \{\emptyset, F, F^c, \Omega\}$. Then, given $\mathcal{G}$, we know whether or not $F$ has occurred, that is, whether $F$ or $F^c$ has occurred. In the former case, $P(E \mid \mathcal{G}) = P(E \mid F)$, and in the latter case, $P(E \mid \mathcal{G}) = P(E \mid F^c)$. In other words,
$$P(E \mid \mathcal{G}) = P(E \mid F)\chi_F + P(E \mid F^c)\chi_{F^c}.$$
Note that $P(E \mid \mathcal{G})$ is not only a random variable (i.e., $\mathcal{A}$-measurable) but is in fact $\mathcal{G}$-measurable. Furthermore, it is not too difficult to see that
$$P(G \cap E) = \int_G P(E \mid \mathcal{G})\,dP, \quad G \in \mathcal{G}. \qquad (6.38)$$
We can use the Radon-Nikodym theorem to show the existence of conditional probability given a $\sigma$-algebra in the general case. Specifically, with (6.38) in mind, we have the following proposition.
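Before turning to the general statement, the two-set case just described can be checked exactly on a small sample space. The Python sketch below is illustrative only. It makes the following choices, none of which come from the text:
- One roll of a fair die as the probability space.
- $E$ = "even" and $F = \{1, 2, 3\}$.
- Hypothetical helper names.

It builds $P(E \mid \mathcal{G}) = P(E \mid F)\chi_F + P(E \mid F^c)\chi_{F^c}$ and verifies (6.38) for each of the four sets in $\mathcal{G} = \{\emptyset, F, F^c, \Omega\}$, using exact rational arithmetic.

```python
from fractions import Fraction

# Illustrative finite probability space: one roll of a fair die.
omega = frozenset(range(1, 7))
p = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P(event) for a subset of omega."""
    return sum((p[w] for w in event), Fraction(0))

def cond_prob(e, f):
    """Elementary conditional probability P(E | F) = P(F ∩ E) / P(F), P(F) > 0."""
    return prob(f & e) / prob(f)

E = frozenset({2, 4, 6})  # the event "even"
F = frozenset({1, 2, 3})  # an event with 0 < P(F) < 1
Fc = omega - F

def p_E_given_G(w):
    """Value at w of P(E | G) = P(E | F) chi_F + P(E | F^c) chi_{F^c}."""
    return cond_prob(E, F) if w in F else cond_prob(E, Fc)

# G is the sigma-algebra generated by F.
G_sets = [frozenset(), F, Fc, omega]

def check_6_38():
    """Pairs (P(G ∩ E), integral over G of P(E | G) dP), one per G in G."""
    return [(prob(G & E), sum((p_E_given_G(w) * p[w] for w in G), Fraction(0)))
            for G in G_sets]

for lhs, rhs in check_6_38():
    print(lhs, rhs)
```

Here $P(E \mid F) = 1/3$ and $P(E \mid F^c) = 2/3$. Each printed pair has equal entries, as (6.38) requires.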
PROPOSITION 6.10 Existence of Conditional Probability
Let $(\Omega, \mathcal{A}, P)$ be a probability space, $E \in \mathcal{A}$, and $\mathcal{G}$ a $\sigma$-algebra with $\mathcal{G} \subset \mathcal{A}$. Then there exists a nonnegative $\mathcal{G}$-measurable function, $P(E \mid \mathcal{G})$, such that
$$P(G \cap E) = \int_G P(E \mid \mathcal{G})\,dP, \quad G \in \mathcal{G}. \qquad (6.39)$$
Moreover, such a function is unique $P$-ae. It is called the conditional probability of $E$ given $\mathcal{G}$.

PROOF: Define $\mu_E(G) = P(G \cap E)$ for $G \in \mathcal{G}$. Then $\mu_E$ is a finite measure on $\mathcal{G}$ and $\mu_E \ll P_{|\mathcal{G}}$, the restriction of $P$ to $\mathcal{G}$. The result now follows from the Radon-Nikodym theorem.

We will investigate further properties of conditional probability given a $\sigma$-algebra in the exercises.

EXERCISES 6.6

6.79 This exercise provides an alternative for the definition of absolute continuity of measures. Let $(\Omega, \mathcal{A})$ be a measurable space and $\mu$ and $\nu$ measures on $\mathcal{A}$ with $\nu$ finite.
a) Prove that the following is a necessary and sufficient condition for $\nu \ll \mu$: for each $\varepsilon > 0$, there is a $\delta > 0$ such that $\nu(A) < \varepsilon$ whenever $A \in \mathcal{A}$ and $\mu(A) < \delta$. Hint: For the necessity part, suppose to the contrary that there is an $\varepsilon > 0$ such that for each $\delta > 0$, there is an $A \in \mathcal{A}$ with $\mu(A) < \delta$ and $\nu(A) \ge \varepsilon$. For each $n \in \mathcal{N}$, let $A_n$ correspond to $\delta = 2^{-n}$.
b) Show that weakening the finiteness condition on $\nu$ to $\sigma$-finiteness invalidates the necessity portion of part (a).

6.80 Let $\nu$ be a finite Borel measure and $F_\nu$ its distribution function. Provide an alternate proof of Proposition 6.9 (page 371) that does not rely on the Radon-Nikodym theorem. Instead, use Theorem 6.6 (page 350), Theorem 6.7 (page 350), and Exercise 6.79.

6.81 Find two measures, $\mu$ and $\nu$, such that $\nu \not\ll \mu$, $\mu \not\ll \nu$, and $\mu \not\perp \nu$.

6.82 Suppose that $\mu$ and $\nu$ are measures on $(\Omega, \mathcal{A})$ such that $\mu \perp \nu$ and $\mu \ll \nu$. What can you say about $\mu$?

6.83 Let $(\Omega, \mathcal{A})$ be a measurable space and $\mu$ and $\nu$ $\sigma$-finite measures on $\mathcal{A}$. Show that $\Omega$ can be written as a countable disjoint union of $\mathcal{A}$-measurable sets, $\{E_n\}_n$, where $\mu(E_n) < \infty$ and $\nu(E_n) < \infty$.

6.84 Let $\Omega$ be a nonempty set and $\mu$ counting measure on $\mathcal{P}(\Omega)$.
Suppose $\nu$ is a discrete measure on $\mathcal{P}(\Omega)$, that is, there is a countable set $K \subset \Omega$ such that $\nu(K^c) = 0$.
a) Show that $\nu \ll \mu$.
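b) Find $d\nu/d\mu$.
c) Are the hypotheses of our version of the Radon-Nikodym theorem (Theorem 6.10 on page 365) necessarily satisfied in this problem?

6.85 Define $\nu$ on $(\mathcal{R}, \mathcal{M})$ by $\nu(A) = \lambda(A \cap [-2, 2])$. Show that $\nu \ll \lambda$ and find $d\nu/d\lambda$.

6.86 Let $\mu$ be the measure on $\mathcal{P}(\mathcal{R})$ defined by $\mu(A) = \gamma(A \cap \mathcal{N})$, where $\gamma$ is counting measure on $\mathcal{P}(\mathcal{R})$. Let $\{a_n\}_{n=1}^\infty$ be a sequence of nonnegative real numbers and set $\nu = \sum_{n=1}^\infty a_n\delta_n$. Show that $\nu \ll \mu$ and find $d\nu/d\mu$.

6.87 Refer to Example 5.5(a) on page 276. Let $X$ be a discrete random variable on $(\Omega, \mathcal{A}, P)$ with probability mass function $p_X$.
a) Show that $\mu_X \ll \mu$, where $\mu$ is counting measure on $(\mathcal{R}, \mathcal{B})$.
b) Prove that $p_X$ is the unique Radon-Nikodym derivative of $\mu_X$ with respect to $\mu$.

6.88 Provide an example showing that the $\sigma$-finiteness of $\mu$ cannot be dropped as a hypothesis in the Radon-Nikodym theorem.

6.89 Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu_1$ and $\nu_2$ $\sigma$-finite measures on $\mathcal{A}$. Assume $\nu_1 \ll \mu$ and $\nu_2 \ll \mu$. Prove that $\nu_1 + \nu_2 \ll \mu$ and
$$\frac{d(\nu_1 + \nu_2)}{d\mu} = \frac{d\nu_1}{d\mu} + \frac{d\nu_2}{d\mu} \quad \mu\text{-ae}.$$

6.90 Let $(\Omega, \mathcal{A})$ be a measurable space and $\omega$, $\nu$, and $\mu$ $\sigma$-finite measures on $\mathcal{A}$ such that $\omega \ll \nu \ll \mu$. Show that $\omega \ll \mu$ and
$$\frac{d\omega}{d\mu} = \frac{d\omega}{d\nu}\,\frac{d\nu}{d\mu} \quad \mu\text{-ae}.$$

6.91 Let $(\Omega, \mathcal{A})$ be a measurable space and $\mu$ and $\nu$ $\sigma$-finite measures on $\mathcal{A}$ such that $\mu \ll \nu$ and $\nu \ll \mu$. Prove that
$$\frac{d\nu}{d\mu} = 1 \Big/ \frac{d\mu}{d\nu} \quad \mu\text{-ae}.$$

6.92 Suppose that $\mu$ and $\nu$ are two $\sigma$-finite Borel measures. Recall that the convolution of $\mu$ and $\nu$ is the measure defined by
$$(\mu * \nu)(B) = \int_{\mathcal{R}} \mu(B - y)\,d\nu(y), \quad B \in \mathcal{B}.$$
a) Show that if $\mu \ll \lambda$, then $\mu * \nu \ll \lambda$ and
$$\frac{d(\mu * \nu)}{d\lambda}(x) = \int_{\mathcal{R}} \frac{d\mu}{d\lambda}(x - y)\,d\nu(y), \quad x \in \mathcal{R}.$$
Hint: Refer to Exercises 4.157 and 4.158 (page 256).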
b) Show that if both $\mu$ and $\nu$ are absolutely continuous with respect to Lebesgue measure, then
$$\frac{d(\mu * \nu)}{d\lambda}(x) = \int_{\mathcal{R}} \frac{d\mu}{d\lambda}(x - y)\,\frac{d\nu}{d\lambda}(y)\,dy, \quad x \in \mathcal{R}.$$
In words, if two measures are absolutely continuous with respect to Lebesgue measure, then the Radon-Nikodym derivative of their convolution is the convolution of their Radon-Nikodym derivatives.

In Exercises 6.93-6.100, $(\Omega, \mathcal{A}, P)$ is a probability space and $\mathcal{G}$ is a $\sigma$-algebra of subsets of $\Omega$ with $\mathcal{G} \subset \mathcal{A}$.

6.93 Suppose that $F$ is an event with probability strictly between 0 and 1. Let $\mathcal{G}$ be the $\sigma$-algebra generated by $F$, that is, $\mathcal{G} = \{\emptyset, F, F^c, \Omega\}$. Define $P(E \mid \mathcal{G}) = P(E \mid F)\chi_F + P(E \mid F^c)\chi_{F^c}$.
a) Show that $P(E \mid \mathcal{G})$ is $\mathcal{G}$-measurable.
b) Prove that for each $G \in \mathcal{G}$, $P(G \cap E) = \int_G P(E \mid \mathcal{G})\,dP$.

6.94 Suppose that $\{F_n\}_n$ is a sequence of pairwise mutually exclusive events, each having positive probability, such that $\bigcup_n F_n = \Omega$. Let $\mathcal{G}$ be the $\sigma$-algebra generated by $\{F_n\}_n$.
a) Characterize the sets in $\mathcal{G}$.
b) Prove that, with probability one, $P(E \mid \mathcal{G}) = \sum_n P(E \mid F_n)\chi_{F_n}$.

6.95 Suppose that $E \in \mathcal{G}$.
a) What does your intuition tell you regarding $P(E \mid \mathcal{G})$?
b) Prove your assertion in part (a).

6.96 Suppose that $\mathcal{G} = \{\Omega, \emptyset\}$.
a) What does your intuition tell you regarding $P(E \mid \mathcal{G})$?
b) Prove your assertion in part (a).

6.97 Establish that each of the following holds with probability one.
a) $P(\Omega \mid \mathcal{G}) = 1$.
b) For each $E \in \mathcal{A}$, $P(E \mid \mathcal{G}) \ge 0$.
c) If $E_1, E_2, \ldots$ are in $\mathcal{A}$, with $E_i \cap E_j = \emptyset$ for $i \neq j$, then
$$P\Big(\bigcup_n E_n \,\Big|\, \mathcal{G}\Big) = \sum_n P(E_n \mid \mathcal{G}).$$

★6.98 Conditional probability given a random variable: An important case of conditional probability given a $\sigma$-algebra arises when the $\sigma$-algebra is generated by a random variable, $X$. The $\sigma$-algebra generated by $X$, denoted $\mathcal{A}(X)$, is by definition the smallest $\sigma$-algebra of subsets of $\Omega$ for which $X$ is measurable. We define the conditional probability of $E$ given $X$, denoted $P(E \mid X)$, to be the conditional probability of $E$ given $\mathcal{A}(X)$. That is, by definition, $P(E \mid X) = P(E \mid \mathcal{A}(X))$.
a) Show that $\mathcal{A}(X) = \{\{X \in B\} : B \in \mathcal{B}\}$.
b) Prove that there is a nonnegative Borel measurable function, $\phi$, such that $P(E \mid X) = \phi \circ X$ $P$-ae.
c) Let $\phi$ be the function in part (b). For $x \in \mathcal{R}$, set $P(E \mid X = x) = \phi(x)$; this is called the conditional probability of $E$ given $X = x$. Prove that
$$P(\{X \in B\} \cap E) = \int_B P(E \mid X = x)\,d\mu_X(x), \quad B \in \mathcal{B},$$
where $\mu_X$ is the probability distribution of $X$. Hint: Use Theorem 5.6 on page 291.
d) Suppose $g$ is a nonnegative Borel measurable function such that
$$P(\{X \in B\} \cap E) = \int_B g(x)\,d\mu_X(x), \quad B \in \mathcal{B}.$$
Prove that $g(x) = P(E \mid X = x)$ for $\mu_X$-almost all $x$.

6.99 Refer to Example 5.7(a) on page 279. Suppose $X$ and $Y$ are jointly discrete random variables with joint probability mass function $p_{X,Y}$. Define
$$p_{Y|X}(y \mid x) = \begin{cases} \dfrac{p_{X,Y}(x, y)}{p_X(x)}, & p_X(x) > 0;\\ 0, & \text{otherwise.} \end{cases}$$
a) Prove that for each $y \in \mathcal{R}$, $P(Y = y \mid X = x) = p_{Y|X}(y \mid x)$ for each possible value, $x$, of $X$. Hint: Use Exercise 6.98(d).
b) Determine $P(Y = y \mid X)$.

6.100 Refer to Example 5.7(b) on page 279. Suppose $X$ and $Y$ are jointly absolutely continuous random variables with joint probability density function $f_{X,Y}$. Define
$$f_{Y|X}(y \mid x) = \begin{cases} \dfrac{f_{X,Y}(x, y)}{f_X(x)}, & f_X(x) > 0;\\ 0, & \text{otherwise.} \end{cases}$$
a) Prove that for each $C \in \mathcal{B}$,
$$P(Y \in C \mid X = x) = \int_C f_{Y|X}(y \mid x)\,dy$$
for $\mu_X$-almost all $x$. Hint: Use Exercise 6.98(d).
b) Determine $P(Y \in C \mid X)$.
6.7 SIGNED AND COMPLEX MEASURES

In Section 6.5 we introduced the concept of a signed measure. In this section, we will further investigate signed measures and also introduce the concept of complex measures.

Recall that if $(\Omega, \mathcal{A})$ is a measurable space, then a signed measure, $\nu$, on $\mathcal{A}$ is an extended real-valued function satisfying the following two conditions:
• $\nu(\emptyset) = 0$.
• If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \neq j$, then $\nu\big(\bigcup_n A_n\big) = \sum_n \nu(A_n)$.
For this definition to make sense, $\nu$ cannot take on both $\infty$ and $-\infty$ as values.

We proved the Jordan decomposition theorem: any signed measure $\nu$ can be expressed uniquely as the difference of two mutually singular measures, $\nu^+$ and $\nu^-$, on $\mathcal{A}$. The representation $\nu = \nu^+ - \nu^-$ is called the Jordan decomposition of $\nu$. In fact, if $(D, D^c)$ is a Hahn decomposition for $\nu$, then $\nu^+(A) = \nu(A \cap D)$ and $\nu^-(A) = -\nu(A \cap D^c)$. Now we will give names to the measures $\nu^+$ and $\nu^-$ and define yet another measure corresponding to a signed measure.

DEFINITION 6.14 Variations of a Signed Measure
Suppose that $(\Omega, \mathcal{A})$ is a measurable space and that $\nu$ is a signed measure on $\mathcal{A}$ with Jordan decomposition $\nu = \nu^+ - \nu^-$. Define $|\nu| = \nu^+ + \nu^-$. The measures $\nu^+$, $\nu^-$, and $|\nu|$ are called, respectively, the positive variation, negative variation, and total variation of $\nu$.

Note that $|\nu|$ is a measure, and $|\nu| = \nu$ if $\nu$ is a measure. Before proving our next result, we introduce the following terminology.
DEFINITION 6.15 Measurable Partition
Let $(\Omega, \mathcal{A})$ be a measurable space and $A \in \mathcal{A}$. A finite sequence, $\{A_k\}_{k=1}^n$, of subsets of $\Omega$ is said to be a measurable partition of $A$ if the $A_k$'s are $\mathcal{A}$-measurable, pairwise disjoint, and their union is $A$. That is,
a) $A_k \in \mathcal{A}$, $k = 1, 2, \ldots, n$,
b) $A_i \cap A_j = \emptyset$ for $i \neq j$, and
c) $\bigcup_{k=1}^n A_k = A$.

The next proposition shows that the total variation of a signed measure is similar to the total variation of a function. Exercises 6.103 and 6.104 further explore the analogy.

PROPOSITION 6.11
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a signed measure on $\mathcal{A}$. Then, for each $A \in \mathcal{A}$,
$$|\nu|(A) = \sup\Big\{ \sum_{k=1}^n |\nu(A_k)| : \{A_k\}_{k=1}^n \text{ is a measurable partition of } A \Big\}.$$

PROOF: Let $\{A_k\}_{k=1}^n$ be a measurable partition of $A$. Then, because $|\nu|$ is a measure, $|\nu|(A) = \sum_{k=1}^n |\nu|(A_k)$. From Exercise 6.101(b), we know that $|\nu|(A_k) \ge |\nu(A_k)|$. Therefore, $|\nu|(A) \ge \sum_{k=1}^n |\nu(A_k)|$, and so
$$|\nu|(A) \ge \sup\Big\{ \sum_{k=1}^n |\nu(A_k)| : \{A_k\}_{k=1}^n \text{ is a measurable partition of } A \Big\}.$$
To prove the reverse inequality, let $(D, D^c)$ be a Hahn decomposition for $\nu$, and set $A_1 = A \cap D$ and $A_2 = A \cap D^c$. Then $\{A_1, A_2\}$ is a measurable partition of $A$, and
$$|\nu(A_1)| + |\nu(A_2)| = \nu^+(A) + \nu^-(A) = |\nu|(A).$$
Hence,
$$|\nu|(A) \le \sup\Big\{ \sum_{k=1}^n |\nu(A_k)| : \{A_k\}_{k=1}^n \text{ is a measurable partition of } A \Big\}.$$
This completes the proof.

Next we define the abstract Lebesgue integral of a measurable function with respect to a signed measure. It should be clear that the following definition is reasonable and natural.
DEFINITION 6.16 Integral with Respect to a Signed Measure
Suppose that $(\Omega, \mathcal{A})$ is a measurable space and that $\nu$ is a signed measure on $\mathcal{A}$ with Jordan decomposition $\nu = \nu^+ - \nu^-$. Let $E \in \mathcal{A}$ and let $f$ be an extended real-valued or complex-valued $\mathcal{A}$-measurable function on $\Omega$. Then the (abstract) Lebesgue integral of $f$ over $E$ with respect to $\nu$ is defined by
$$\int_E f\,d\nu = \int_E f\,d\nu^+ - \int_E f\,d\nu^-,$$
provided the right-hand side makes sense.

EXAMPLE 6.14 Illustrates Definition 6.16
Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$ and $\nu = \lambda - \delta_0 - \delta_1$. Then $\nu^+ = \lambda$ and $\nu^- = \delta_0 + \delta_1$. Let $f(x) = x^3 + 2$. We have
$$\int_{[0,1]} f\,d\nu = \int_0^1 (x^3 + 2)\,dx - \int_{[0,1]} (x^3 + 2)\,d(\delta_0 + \delta_1)(x) = \frac{9}{4} - (2 + 3) = -\frac{11}{4},$$
as the reader should verify. □

Complex Measures

Here now is the definition of a complex measure.

DEFINITION 6.17 Complex Measure
Let $(\Omega, \mathcal{A})$ be a measurable space. A complex measure, $\nu$, on $\mathcal{A}$ is a complex-valued countably additive set function; that is,
a) $\nu(A) \in \mathcal{C}$ for all $A \in \mathcal{A}$.
b) If $A_1, A_2, \ldots$ are in $\mathcal{A}$, with $A_i \cap A_j = \emptyset$ for $i \neq j$, then $\nu\big(\bigcup_n A_n\big) = \sum_n \nu(A_n)$.

Remark: For a complex measure, $\nu(A) \in \mathcal{C}$ for each $A \in \mathcal{A}$, so there are no sets of infinite $\nu$-measure. It follows easily from countable additivity that $\nu(\emptyset) = 0$.

EXAMPLE 6.15 Illustrates Definition 6.17
a) Any finite measure or any finite signed measure is a complex measure.
b) Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. Define, for $A \in \mathcal{A}$, $(\Re\nu)(A) = \Re(\nu(A))$ and $(\Im\nu)(A) = \Im(\nu(A))$. Then it is easy to see that $\Re\nu$ and $\Im\nu$ are finite signed measures on $\mathcal{A}$ and $\nu = \Re\nu + i\Im\nu$.
c) Lebesgue measure is not a complex measure on $\mathcal{M}$. (Why?)
d) Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu_k$, $1 \le k \le 4$, finite measures on $\mathcal{A}$. Define $\nu = \nu_1 - \nu_2 + i\nu_3 - i\nu_4$, that is, $\nu(A) = \nu_1(A) - \nu_2(A) + i\nu_3(A) - i\nu_4(A)$ for $A \in \mathcal{A}$. Then $\nu$ is a complex measure on $\mathcal{A}$. As we will see shortly, all complex measures are of this form.
e) Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \mathcal{L}^1(\mu)$. Define
$$\nu(A) = \int_A f\,d\mu, \quad A \in \mathcal{A}.$$
Then $\nu$ is a complex measure on $\mathcal{A}$. □

Using the Jordan decomposition theorem, we can easily establish the following theorem. Its verification is left to the reader as an exercise.

THEOREM 6.11
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. Then there exist unique measures, $\nu_1^+$, $\nu_1^-$, $\nu_2^+$, $\nu_2^-$, such that $\nu_1^+ \perp \nu_1^-$, $\nu_2^+ \perp \nu_2^-$, and
$$\nu = \nu_1^+ - \nu_1^- + i\nu_2^+ - i\nu_2^-.$$

Next we want to define the total variation, $|\nu|$, of a complex measure $\nu$. In view of Proposition 6.11 (page 378), we make the following definition.
DEFINITION 6.18 Total Variation of a Complex Measure
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. For each $A \in \mathcal{A}$, define
$$|\nu|(A) = \sup\Big\{ \sum_{k=1}^n |\nu(A_k)| : \{A_k\}_{k=1}^n \text{ is a measurable partition of } A \Big\}.$$
The set function $|\nu|$ is called the total variation of $\nu$.

Because of Proposition 6.11, Definition 6.18 is consistent with the definition of total variation of signed measures. The next proposition shows that, as is the case for signed measures, the total variation of a complex measure is a measure.

PROPOSITION 6.12
Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. Then $|\nu|$ is a finite measure on $\mathcal{A}$.

PROOF: It follows from Definition 6.18 and Theorem 6.11 that
$$|\nu|(A) \le \nu_1^+(A) + \nu_1^-(A) + \nu_2^+(A) + \nu_2^-(A)$$
for all $A \in \mathcal{A}$. Since the right-hand side of the preceding inequality is finite, $|\nu|(A) < \infty$ for all $A \in \mathcal{A}$. Clearly, $|\nu|(\emptyset) = 0$ and $|\nu|(A) \ge 0$ for $A \in \mathcal{A}$.

Now, let $\{A_n\}_n$ be a pairwise disjoint sequence of $\mathcal{A}$-measurable sets. Without loss of generality, we can assume that the sequence is infinite. Set $A = \bigcup_{n=1}^\infty A_n$. We claim that
$$|\nu|(A) = \sum_{n=1}^\infty |\nu|(A_n).$$
Let $\alpha < |\nu|(A)$. Then there is a measurable partition of $A$, say, $\{E_k\}_{k=1}^m$, such that $\alpha < \sum_{k=1}^m |\nu(E_k)|$. But
$$\sum_{k=1}^m |\nu(E_k)| = \sum_{k=1}^m \Big| \sum_{n=1}^\infty \nu(E_k \cap A_n) \Big| \le \sum_{k=1}^m \sum_{n=1}^\infty |\nu(E_k \cap A_n)| = \sum_{n=1}^\infty \sum_{k=1}^m |\nu(E_k \cap A_n)| \le \sum_{n=1}^\infty |\nu|(A_n),$$
where the last inequality holds because, for each $n$, $\{E_k \cap A_n\}_{k=1}^m$ is a measurable partition of $A_n$. Consequently, $\sum_{n=1}^\infty |\nu|(A_n) > \alpha$ for each $\alpha < |\nu|(A)$. It now follows that $|\nu|(A) \le \sum_{n=1}^\infty |\nu|(A_n)$.

Next we prove the reverse inequality. Let $\varepsilon > 0$ be given. For each $n \in \mathcal{N}$ we can choose a measurable partition $\{E_{nk}\}_{k=1}^{k_n}$ of $A_n$ such that
$$\sum_{k=1}^{k_n} |\nu(E_{nk})| > |\nu|(A_n) - \varepsilon/2^n.$$
Let $N \in \mathcal{N}$ be fixed but arbitrary. Then $\{E_{nk} : 1 \le k \le k_n,\ 1 \le n \le N\}$ is a measurable partition of $\bigcup_{n=1}^N A_n$. Using Exercise 6.107, we obtain
$$|\nu|(A) \ge |\nu|\Big(\bigcup_{n=1}^N A_n\Big) \ge \sum_{n=1}^N \sum_{k=1}^{k_n} |\nu(E_{nk})| > \sum_{n=1}^N \Big( |\nu|(A_n) - \frac{\varepsilon}{2^n} \Big) = \sum_{n=1}^N |\nu|(A_n) - \sum_{n=1}^N \frac{\varepsilon}{2^n} \ge \sum_{n=1}^N |\nu|(A_n) - \varepsilon.$$
Letting $N \to \infty$ gives $|\nu|(A) \ge \sum_{n=1}^\infty |\nu|(A_n) - \varepsilon$. As $\varepsilon > 0$ was chosen arbitrarily, $\sum_{n=1}^\infty |\nu|(A_n) \le |\nu|(A)$.

Radon-Nikodym Theorem for Complex Measures

Next, we will prove the Radon-Nikodym theorem for complex measures. We begin with the following obvious extension of the definition of absolute continuity.

DEFINITION 6.19 Absolutely Continuous Complex Measures
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\nu$ a complex measure on $\mathcal{A}$. Then $\nu$ is said to be absolutely continuous with respect to $\mu$, denoted $\nu \ll \mu$, if $\nu(A) = 0$ whenever $\mu(A) = 0$.

We have noted previously that if $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$, then the set function
$$\nu(A) = \int_A f\,d\mu, \quad A \in \mathcal{A},$$
is a complex measure and, clearly, $\nu \ll \mu$. The following generalization of the Radon-Nikodym theorem shows that, under $\sigma$-finite conditions, the converse is true.
THEOREM 6.12 Radon-Nikodym Theorem for Complex Measures
Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu$ a complex measure on $\mathcal{A}$. If $\nu \ll \mu$, then there is a function $f \in \mathcal{L}^1(\Omega, \mathcal{A}, \mu)$ such that
$$\nu(A) = \int_A f\,d\mu, \quad A \in \mathcal{A}. \qquad (6.40)$$
Moreover, $f$ is unique in the following sense. If $g$ is an $\mathcal{A}$-measurable function with $\nu(A) = \int_A g\,d\mu$ for all $A \in \mathcal{A}$, then $g = f$ $\mu$-ae.

PROOF: First we assume that $\nu$ is a finite signed measure. Let $(D, D^c)$ be a Hahn decomposition for $\nu$, and let $\nu^+$ and $\nu^-$ be the positive and negative variations of $\nu$. Because $\nu$ is finite, so are $\nu^+$ and $\nu^-$. Suppose $\mu(A) = 0$. Then $\mu(A \cap D) = 0$. Because $\nu \ll \mu$, it follows that $\nu^+(A) = \nu(A \cap D) = 0$. Hence, $\nu^+ \ll \mu$. Similarly, $\nu^- \ll \mu$. Applying the Radon-Nikodym theorem (Theorem 6.10 on page 365), we conclude that there exist nonnegative $\mathcal{A}$-measurable functions, $f_1$ and $f_2$, such that for each $A \in \mathcal{A}$,
$$\nu^+(A) = \int_A f_1\,d\mu \quad\text{and}\quad \nu^-(A) = \int_A f_2\,d\mu.$$
Note that $f_1, f_2 \in \mathcal{L}^1(\mu)$ because $\nu^+$ and $\nu^-$ are finite measures. Let $f = f_1 - f_2$. Then $f \in \mathcal{L}^1(\mu)$ and (6.40) holds.

To prove uniqueness, suppose that $g$ is an $\mathcal{A}$-measurable function such that $\nu(A) = \int_A g\,d\mu$ for each $A \in \mathcal{A}$. Since $\nu$ is finite, $g \in \mathcal{L}^1(\mu)$, and hence so is $f - g$. Now, let $E = \{x : f(x) > g(x)\}$. We have
$$\int_E (f - g)\,d\mu = \int_E f\,d\mu - \int_E g\,d\mu = \nu(E) - \nu(E) = 0.$$
Because $f - g > 0$ on $E$, it must be that $\mu(E) = 0$. Hence, $f \le g$ $\mu$-ae. Similarly, $f \ge g$ $\mu$-ae. Consequently, $g = f$ $\mu$-ae.

Now suppose that $\nu$ is a complex measure. Apply what was just proved to the finite signed measures $\Re\nu$ and $\Im\nu$. This gives functions $g_1, g_2 \in \mathcal{L}^1(\mu)$ such that for each $A \in \mathcal{A}$,
$$(\Re\nu)(A) = \int_A g_1\,d\mu \quad\text{and}\quad (\Im\nu)(A) = \int_A g_2\,d\mu.$$
Letting $f = g_1 + ig_2$ yields a function in $\mathcal{L}^1(\mu)$ such that (6.40) holds. Uniqueness follows easily from the uniqueness established in the preceding for finite signed measures.
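Definition 6.18 and Proposition 6.11 describe the total variation as a supremum over measurable partitions. This can be seen concretely on a finite set $\Omega$ with $\mathcal{A} = \mathcal{P}(\Omega)$, where a signed or complex measure is determined by its point masses. The sketch below is illustrative only: the dict-of-point-masses representation and all names are assumptions. It computes the supremum by brute force over every partition of $A$ into nonempty blocks. Empty blocks are allowed by Definition 6.15, but they contribute $|\nu(\emptyset)| = 0$ and can be skipped. By the triangle inequality, the partition into singletons attains the maximum, so $|\nu|(A) = \sum_{x \in A} |\nu(\{x\})|$. In the signed case, this sum equals $\nu^+(A) + \nu^-(A)$.

```python
import math

def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give `first` a block of its own.
        yield [[first]] + partition

def total_variation(nu, A):
    """Brute-force sup over partitions {A_k} of A of sum |nu(A_k)|.
    `nu` maps points to real or complex point masses."""
    return max(sum(abs(sum(nu[x] for x in block)) for block in partition)
               for partition in set_partitions(list(A)))

signed = {0: 2.0, 1: -3.0, 2: 1.0}         # a finite signed measure
complex_nu = {0: 1 + 1j, 1: -2.0, 2: 1j}   # a complex measure

A = [0, 1, 2]
nu_plus = sum(v for v in signed.values() if v > 0)    # nu^+(A)
nu_minus = -sum(v for v in signed.values() if v < 0)  # nu^-(A)
print(total_variation(signed, A), nu_plus + nu_minus)
print(total_variation(complex_nu, A), sum(abs(v) for v in complex_nu.values()))
```

The enumeration grows like the Bell numbers, so this approach is only practical for very small sets. It is meant to make the supremum tangible, not to serve as an algorithm.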
Conditional Expectation Given a $\sigma$-Algebra

Let $(\Omega, \mathcal{A}, P)$ be a probability space and $F$ an event having positive probability, that is, $F \in \mathcal{A}$ and $P(F) > 0$. Recall that the set function $P_F$, defined by
$$P_F(E) = P(E \mid F) = \frac{P(F \cap E)}{P(F)}, \quad E \in \mathcal{A},$$
is a probability measure on $\mathcal{A}$. Hence, we can define expectation with respect to $P_F$; this is called conditional expectation relative to $F$. Thus, if $Y \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$, then the conditional expectation of $Y$ given that event $F$ has occurred is defined by
$$\mathcal{E}(Y \mid F) = \int_\Omega Y\,dP_F.$$
According to Exercise 5.84(a) on page 300,
$$\mathcal{E}(Y \mid F) = \frac{1}{P(F)} \int_F Y\,dP.$$

We can generalize the notion of conditional expectation by conditioning on a $\sigma$-algebra instead of just an event. Let $\mathcal{G}$ be a $\sigma$-algebra with $\mathcal{G} \subset \mathcal{A}$. Conditioning on $\mathcal{G}$ means that we know, for each $G \in \mathcal{G}$, whether or not $G$ has occurred. The conditional expectation of a random variable $Y$ given $\mathcal{G}$, denoted $\mathcal{E}(Y \mid \mathcal{G})$, is the expectation of $Y$ computed with that knowledge.

To see how to define $\mathcal{E}(Y \mid \mathcal{G})$, we consider the simplest nontrivial case. Suppose that $F$ is an event with probability strictly between 0 and 1. Let $\mathcal{G}$ be the $\sigma$-algebra generated by $F$, that is, the smallest $\sigma$-algebra containing $F$; clearly, $\mathcal{G} = \{\emptyset, F, F^c, \Omega\}$. Then, given $\mathcal{G}$, we know whether or not $F$ has occurred, that is, whether $F$ or $F^c$ has occurred. In the former case, $\mathcal{E}(Y \mid \mathcal{G}) = \mathcal{E}(Y \mid F)$, and in the latter case, $\mathcal{E}(Y \mid \mathcal{G}) = \mathcal{E}(Y \mid F^c)$. In other words,
$$\mathcal{E}(Y \mid \mathcal{G}) = \mathcal{E}(Y \mid F)\chi_F + \mathcal{E}(Y \mid F^c)\chi_{F^c}.$$
Note that $\mathcal{E}(Y \mid \mathcal{G})$ is not only a random variable (i.e., $\mathcal{A}$-measurable) but is in fact $\mathcal{G}$-measurable. Furthermore, it is not too difficult to see that
$$\int_G Y\,dP = \int_G \mathcal{E}(Y \mid \mathcal{G})\,dP, \quad G \in \mathcal{G}. \qquad (6.41)$$
We can use the complex version of the Radon-Nikodym theorem to show the existence of conditional expectation given a $\sigma$-algebra in the general case. Keeping (6.41) in mind, we have the following proposition.
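Identity (6.41) can be checked exactly in the two-set case on a small sample space before turning to the general statement. The Python sketch below is illustrative only. The following are choices made for the example, not taken from the text:
- One roll of a fair die as the probability space.
- The random variable $Y(\omega) = \omega^2$.
- The event $F = \{1, 2, 3\}$.
- The helper names.

It computes $\mathcal{E}(Y \mid F) = \frac{1}{P(F)}\int_F Y\,dP$ and the same quantity for $F^c$. It then forms $\mathcal{E}(Y \mid \mathcal{G})$ and compares both sides of (6.41) for each $G \in \{\emptyset, F, F^c, \Omega\}$. The case $G = \Omega$ says that $\mathcal{E}(\mathcal{E}(Y \mid \mathcal{G})) = \mathcal{E}(Y)$.

```python
from fractions import Fraction

# Illustrative finite probability space: a fair die, with Y(w) = w**2.
omega = frozenset(range(1, 7))
p = {w: Fraction(1, 6) for w in omega}
Y = {w: Fraction(w * w) for w in omega}

def integral(g, event):
    """Integral of the function g over `event` with respect to P."""
    return sum((g(w) * p[w] for w in event), Fraction(0))

def cond_exp_event(F):
    """E(Y | F) = (1 / P(F)) * (integral of Y over F), assuming P(F) > 0."""
    return integral(Y.get, F) / integral(lambda w: 1, F)

F = frozenset({1, 2, 3})
Fc = omega - F
E_Y_F, E_Y_Fc = cond_exp_event(F), cond_exp_event(Fc)

def cond_exp_given_G(w):
    """Value at w of E(Y | G) = E(Y | F) chi_F + E(Y | F^c) chi_{F^c}."""
    return E_Y_F if w in F else E_Y_Fc

# Both sides of (6.41) for each G in G = {emptyset, F, F^c, Omega}.
G_sets = [frozenset(), F, Fc, omega]
pairs = [(integral(Y.get, G), integral(cond_exp_given_G, G)) for G in G_sets]
for lhs, rhs in pairs:
    print(lhs, rhs)
```

In this instance $\mathcal{E}(Y \mid F) = 14/3$ and $\mathcal{E}(Y \mid F^c) = 77/3$. Both sides of (6.41) for $G = \Omega$ equal $91/6$.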
PROPOSITION 6.13 Existence of Conditional Expectation
Let $(\Omega, \mathcal{A}, P)$ be a probability space, $Y \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$, and $\mathcal{G}$ a $\sigma$-algebra with $\mathcal{G} \subset \mathcal{A}$. Then there exists a $\mathcal{G}$-measurable function, $\mathcal{E}(Y \mid \mathcal{G})$, such that
$$\int_G Y\,dP = \int_G \mathcal{E}(Y \mid \mathcal{G})\,dP, \quad G \in \mathcal{G}. \qquad (6.42)$$
Moreover, such a function is unique $P$-ae. It is called the conditional expectation of $Y$ given $\mathcal{G}$.

PROOF: Define $\mu_Y(G) = \int_G Y\,dP$ for $G \in \mathcal{G}$. Then $\mu_Y$ is a complex measure on $\mathcal{G}$ and $\mu_Y \ll P_{|\mathcal{G}}$. The result now follows from the complex version of the Radon-Nikodym theorem.

It follows trivially from (6.42), but it is important to note, that $\mathcal{E}(\mathcal{E}(Y \mid \mathcal{G})) = \mathcal{E}(Y)$. We will investigate further properties of conditional expectation given a $\sigma$-algebra in the exercises.

EXERCISES 6.7

6.101 Let $\nu$ be a signed measure on $(\Omega, \mathcal{A})$ and $\nu = \nu^+ - \nu^-$ its Jordan decomposition. Show that for $A \in \mathcal{A}$,
a) $-\nu^-(A) \le \nu(A) \le \nu^+(A)$,
b) $|\nu(A)| \le |\nu|(A)$.

6.102 Let $\nu_1$ and $\nu_2$ be finite signed measures on $(\Omega, \mathcal{A})$. Prove that $|\nu_1 + \nu_2| \le |\nu_1| + |\nu_2|$; that is, $|\nu_1 + \nu_2|(A) \le |\nu_1|(A) + |\nu_2|(A)$ for each $A \in \mathcal{A}$.

6.103 Let $f$ be an extended real-valued Borel measurable function, integrable with respect to Lebesgue measure. Then $\nu(B) = \int_B f\,d\lambda$ is a finite signed measure on $\mathcal{B}$. Define $F_\nu(x) = \nu((-\infty, x]) = \int_{-\infty}^x f(t)\,dt$. Prove that
$$|\nu|((a, b]) = V_a^b F_\nu, \quad -\infty < a < b < \infty.$$

6.104 Exercise 6.103 can be generalized. Let $\nu$ be a finite signed measure on $\mathcal{B}$, and define $F_\nu(x) = \nu((-\infty, x])$ for $x \in \mathcal{R}$. Then it can be proved that
$$|\nu|((a, b]) = V_a^b F_\nu, \quad -\infty < a < b < \infty.$$
Show that the preceding equation does not necessarily hold for other types of intervals, even if $\nu$ is a measure.

6.105 Suppose that $(\Omega, \mathcal{A})$ is a measurable space and that $\nu$ is a signed measure on $\mathcal{A}$ with Jordan decomposition $\nu = \nu^+ - \nu^-$.
a) Show that $f \in \mathcal{L}^1(|\nu|)$ if and only if $f \in \mathcal{L}^1(\nu^+) \cap \mathcal{L}^1(\nu^-)$.
b) Suppose that $f$ is $\mathcal{A}$-measurable and $|f| \le M$ on $\Omega$. Prove that for each $E \in \mathcal{A}$,
$$\left| \int_E f \, d\nu \right| \le M \cdot |\nu|(E).$$

6.106 Prove Theorem 6.11 on page 380.

6.107 Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$. Using only Definition 6.18 on page 381, prove that $|\nu|$ is monotone; that is, if $A, B \in \mathcal{A}$ and $A \subset B$, then $|\nu|(A) \le |\nu|(B)$.

6.108 Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$.
a) Prove that $|\nu(A)| \le |\nu|(A)$ for each $A \in \mathcal{A}$.
b) From part (a), we have $|\nu(A)| \le |\nu|(A)$ for each $A \in \mathcal{A}$. Prove that $|\nu|$ is the smallest measure dominating $\nu$ in that sense; that is, if $\tau$ is a measure on $\mathcal{A}$ such that $|\nu(A)| \le \tau(A)$ for each $A \in \mathcal{A}$, then $|\nu| \le \tau$.

6.109 Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{P}(\mathcal{R}))$ and $\nu = \delta_0 + i\delta_0$. Determine $|\nu|$.

6.110 Recall that the total variation of a signed measure $\nu$ is $|\nu| = \nu^+ + \nu^-$, where $\nu = \nu^+ - \nu^-$ is the Jordan decomposition of $\nu$. According to Theorem 6.11, a complex measure can be uniquely decomposed into four measures as $\nu = \nu_1^+ - \nu_1^- + i\nu_2^+ - i\nu_2^-$, where $\nu_1^+ \perp \nu_1^-$ and $\nu_2^+ \perp \nu_2^-$. Show that it is not generally true that $|\nu|$ equals $\nu_1^+ + \nu_1^- + \nu_2^+ + \nu_2^-$.

6.111 Let $(\Omega, \mathcal{A})$ be a measurable space. Prove the following facts.
a) If $\nu$ is a complex measure on $\mathcal{A}$ and $\alpha \in \mathcal{C}$, then $|\alpha\nu| = |\alpha|\,|\nu|$.
b) If $\nu_1$ and $\nu_2$ are complex measures on $\mathcal{A}$, then $|\nu_1 + \nu_2| \le |\nu_1| + |\nu_2|$.

★6.112 Let $(\Omega, \mathcal{A})$ be a measurable space. For each complex measure $\nu$ on $\mathcal{A}$, define $\|\nu\| = |\nu|(\Omega)$. Prove that
a) $\|\nu_1 + \nu_2\| \le \|\nu_1\| + \|\nu_2\|$ for all complex measures $\nu_1$ and $\nu_2$ on $\mathcal{A}$.
b) $\|\alpha\nu\| = |\alpha|\,\|\nu\|$ for all complex numbers $\alpha \in \mathcal{C}$ and all complex measures $\nu$ on $\mathcal{A}$.
c) $\|\nu\| = 0$ implies that $\nu = 0$.

6.113 Let $\nu$ be a complex measure on $(\Omega, \mathcal{A})$ and $\nu = \nu_1^+ - \nu_1^- + i\nu_2^+ - i\nu_2^-$ the decomposition of $\nu$ given by Theorem 6.11 on page 380. Show that $f \in \mathcal{L}^1(|\nu|)$ if and only if $f \in \bigcap_{j=1}^{2} \left( \mathcal{L}^1(\nu_j^+) \cap \mathcal{L}^1(\nu_j^-) \right)$.

6.114 Let $\nu$ be a complex measure on $(\Omega, \mathcal{A})$. If $f \in \mathcal{L}^1(|\nu|)$, define
$$\int_\Omega f \, d\nu = \int_\Omega f \, d\nu_1^+ - \int_\Omega f \, d\nu_1^- + i \int_\Omega f \, d\nu_2^+ - i \int_\Omega f \, d\nu_2^-.$$
(Exercise 6.113 shows that the right-hand side of the foregoing equation makes sense.) Prove the following:
a) If $f, g \in \mathcal{L}^1(|\nu|)$, then $\int_\Omega (f + g) \, d\nu = \int_\Omega f \, d\nu + \int_\Omega g \, d\nu$.
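b) If $\alpha \in \mathcal{C}$ and $f \in \mathcal{L}^1(|\nu|)$, then $\int_\Omega \alpha f \, d\nu = \alpha \int_\Omega f \, d\nu$.

★6.115 Refer to Exercise 6.114. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \mathcal{L}^1(\mu)$. Define
$$\nu(A) = \int_A f \, d\mu, \qquad A \in \mathcal{A}.$$
Prove that for $g \in \mathcal{L}^1(|\nu|)$ and $A \in \mathcal{A}$,
$$\int_A g \, d\nu = \int_A g f \, d\mu.$$

6.116 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \mathcal{L}^1(\mu)$. Define $\nu(A) = \int_A f \, d\mu$, $A \in \mathcal{A}$. Prove that
$$|\nu|(A) = \int_A |f| \, d\mu, \qquad A \in \mathcal{A}.$$
Hint: To prove that $|\nu|(A) \ge \int_A |f| \, d\mu$, choose a sequence $\{s_n\}_{n=1}^\infty$ of $\mathcal{A}$-measurable simple functions such that $s_n \to \chi_A \cdot \overline{\operatorname{sgn} f}$ pointwise on $\Omega$, where the bar denotes complex conjugation and
$$(\operatorname{sgn} f)(x) = \begin{cases} f(x)/|f(x)|, & f(x) \ne 0; \\ 0, & f(x) = 0. \end{cases}$$
Show that each $s_n$ can be chosen so that $|s_n| \le 1$.

★6.117 Let $(\Omega, \mathcal{A})$ be a measurable space and $\nu$ a complex measure on $\mathcal{A}$.
a) Prove that there is an $\mathcal{A}$-measurable function $\phi$ on $\Omega$ such that $|\phi| = 1$ and
$$\nu(A) = \int_A \phi \, d|\nu|, \qquad A \in \mathcal{A}.$$
Hint: Use the Radon-Nikodym theorem and Exercise 6.116.
b) Suppose that $f$ is $\mathcal{A}$-measurable and $|f| \le M$ on $\Omega$. Prove that for each $E \in \mathcal{A}$,
$$\left| \int_E f \, d\nu \right| \le M \cdot |\nu|(E).$$
Hint: Refer to part (a) and Exercise 6.115.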
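In Exercises 6.118-6.129, $(\Omega, \mathcal{A}, P)$ is a probability space, $Y \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$, and $\mathcal{G}$ is a σ-algebra of subsets of $\Omega$ with $\mathcal{G} \subset \mathcal{A}$.

6.118 Suppose that $F$ is an event with probability strictly between 0 and 1. Let $\mathcal{G}$ be the σ-algebra generated by $F$, that is, $\mathcal{G} = \{\emptyset, F, F^c, \Omega\}$. Define $\mathcal{E}(Y \mid \mathcal{G}) = \mathcal{E}(Y \mid F)\,\chi_F + \mathcal{E}(Y \mid F^c)\,\chi_{F^c}$.
a) Show that $\mathcal{E}(Y \mid \mathcal{G})$ is $\mathcal{G}$-measurable.
b) Prove that for each $G \in \mathcal{G}$, $\int_G Y \, dP = \int_G \mathcal{E}(Y \mid \mathcal{G}) \, dP$.
Hint: Refer to Exercise 5.84 on page 300.

6.119 Suppose that $\{F_n\}_n$ is a sequence of pairwise mutually exclusive events, each having positive probability, such that $\bigcup_n F_n = \Omega$. Let $\mathcal{G}$ be the σ-algebra generated by $\{F_n\}_n$.
a) Characterize the sets in $\mathcal{G}$.
b) Prove that, with probability one, $\mathcal{E}(Y \mid \mathcal{G}) = \sum_n \mathcal{E}(Y \mid F_n)\,\chi_{F_n}$.

6.120 Show that, for each $E \in \mathcal{A}$, $P(E \mid \mathcal{G}) = \mathcal{E}(\chi_E \mid \mathcal{G})$ $P$-ae. Thus, conditional probability is a special case of conditional expectation.

6.121 Suppose that $Y$ is $\mathcal{G}$-measurable.
a) What does your intuition tell you regarding $\mathcal{E}(Y \mid \mathcal{G})$?
b) Prove your assertion in part (a).

6.122 Suppose that $\mathcal{G} = \{\Omega, \emptyset\}$.
a) What does your intuition tell you regarding $\mathcal{E}(Y \mid \mathcal{G})$?
b) Prove your assertion in part (a).

6.123 Let $Y, Y_1, Y_2 \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$ and $\alpha, \alpha_1, \alpha_2 \in \mathcal{C}$. Establish that each of the following holds with probability one.
a) If $Y = \alpha$, then $\mathcal{E}(Y \mid \mathcal{G}) = \alpha$.
b) If $Y_1 \le Y_2$, then $\mathcal{E}(Y_1 \mid \mathcal{G}) \le \mathcal{E}(Y_2 \mid \mathcal{G})$.
c) $\mathcal{E}(\alpha Y \mid \mathcal{G}) = \alpha\,\mathcal{E}(Y \mid \mathcal{G})$.
d) $\mathcal{E}(Y_1 + Y_2 \mid \mathcal{G}) = \mathcal{E}(Y_1 \mid \mathcal{G}) + \mathcal{E}(Y_2 \mid \mathcal{G})$.
e) $|\mathcal{E}(Y \mid \mathcal{G})| \le \mathcal{E}(|Y| \mid \mathcal{G})$.

6.124 Let $\{Y_n\}_{n=1}^\infty$ be a nondecreasing sequence of nonnegative, integrable random variables converging to the integrable random variable $Y$. Prove that, with probability one, $\lim_{n \to \infty} \mathcal{E}(Y_n \mid \mathcal{G}) = \mathcal{E}(Y \mid \mathcal{G})$.

6.125 Suppose $\mathcal{G}_1$ and $\mathcal{G}_2$ are σ-algebras such that $\mathcal{G}_1 \subset \mathcal{G}_2 \subset \mathcal{A}$. Prove that, with probability one,
a) $\mathcal{E}(\mathcal{E}(Y \mid \mathcal{G}_1) \mid \mathcal{G}_2) = \mathcal{E}(Y \mid \mathcal{G}_1)$.
b) $\mathcal{E}(\mathcal{E}(Y \mid \mathcal{G}_2) \mid \mathcal{G}_1) = \mathcal{E}(Y \mid \mathcal{G}_1)$.

6.126 Suppose that $Z$ is $\mathcal{G}$-measurable and $ZY \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$. Prove that, with probability one, $\mathcal{E}(ZY \mid \mathcal{G}) = Z\,\mathcal{E}(Y \mid \mathcal{G})$.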
6.127 Conditional expectation given a random variable: A special case of conditional expectation given a σ-algebra is when the σ-algebra is generated by a random variable $X$. Recall from Exercise 6.98 on page 375 that the σ-algebra generated by $X$, denoted $\mathcal{A}(X)$, is the smallest σ-algebra of subsets of $\Omega$ for which $X$ is measurable. Exercise 6.98(a) shows that
$$\mathcal{A}(X) = \{\{X \in B\} : B \in \mathcal{B}\}.$$
If $Y \in \mathcal{L}^1(\Omega, \mathcal{A}, P)$, then we define the conditional expectation of $Y$ given $X$, denoted $\mathcal{E}(Y \mid X)$, to be the conditional expectation of $Y$ given $\mathcal{A}(X)$; that is, by definition,
$$\mathcal{E}(Y \mid X) = \mathcal{E}(Y \mid \mathcal{A}(X)).$$
a) Prove that there is a Borel measurable function $\phi$ such that $\mathcal{E}(Y \mid X) = \phi \circ X$ $P$-ae.
Hint: First assume $Y \ge 0$.
b) Let $\phi$ be the function in part (a). For $x \in \mathcal{R}$, set $\mathcal{E}(Y \mid X = x) = \phi(x)$, called the conditional expectation of $Y$ given $X = x$. Prove that
$$\int_{\{X \in B\}} Y \, dP = \int_B \mathcal{E}(Y \mid X = x) \, d\mu_X(x), \qquad B \in \mathcal{B},$$
where $\mu_X$ is the probability distribution of $X$.
Hint: Use Theorem 5.6 on page 291.
c) Prove that if $g$ is a Borel measurable function such that
$$\int_{\{X \in B\}} Y \, dP = \int_B g(x) \, d\mu_X(x), \qquad B \in \mathcal{B},$$
then, for $\mu_X$-almost all $x$, $g(x) = \mathcal{E}(Y \mid X = x)$.

6.128 Refer to Example 5.7(a) on page 279. Suppose $X$ and $Y$ are jointly discrete random variables with joint probability mass function $p_{X,Y}$. Define
$$p_{Y \mid X}(y \mid x) = \begin{cases} \dfrac{p_{X,Y}(x, y)}{p_X(x)}, & p_X(x) > 0; \\ 0, & \text{otherwise}. \end{cases}$$
a) Prove that $\mathcal{E}(Y \mid X = x) = \sum_y y\, p_{Y \mid X}(y \mid x)$ for each possible value $x$ of $X$.
Hint: Use Theorem 5.7 on page 293 and Exercise 6.127(c).
b) Determine $\mathcal{E}(Y \mid X)$.

6.129 Refer to Example 5.7(b) on page 279. Suppose $X$ and $Y$ are jointly absolutely continuous random variables with joint probability density function $f_{X,Y}$. Define
$$f_{Y \mid X}(y \mid x) = \begin{cases} \dfrac{f_{X,Y}(x, y)}{f_X(x)}, & f_X(x) > 0; \\ 0, & \text{otherwise}. \end{cases}$$
a) Prove that $\mathcal{E}(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f_{Y \mid X}(y \mid x) \, dy$ for $\mu_X$-almost all $x$.
Hint: Use Theorem 5.7 on page 293 and Exercise 6.127(c).
b) Determine $\mathcal{E}(Y \mid X)$.
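The construction of $\mathcal{E}(Y \mid \mathcal{G})$ can be checked by hand on a finite probability space. The following is a minimal sketch, assuming a fair die as $\Omega$, $Y(\omega) = \omega^2$, and a sub-σ-algebra $\mathcal{G}$ generated by a finite partition; these choices and the helper names are illustrative, not from the text. On each block $B$ of the partition, $\mathcal{E}(Y \mid \mathcal{G})$ is the constant $\frac{1}{P(B)}\int_B Y\,dP$, which is the two-block case $\{\emptyset, F, F^c, \Omega\}$ discussed before Proposition 6.13 and the setting of Exercise 6.119. The code checks the defining property (6.42) on every member of $\mathcal{G}$, using exact rational arithmetic so the equalities are exact.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space: a fair die with outcomes 1..6.
P = {w: Fraction(1, 6) for w in range(1, 7)}
Y = {w: Fraction(w * w) for w in P}            # Y(w) = w^2 (illustrative choice)

# G is generated by a partition of Omega into blocks of positive probability.
# With two blocks F and F^c this is the sigma-algebra {empty, F, F^c, Omega}.
blocks = [frozenset({1, 2}), frozenset({3, 4, 5, 6})]

def integral(f, G):
    """Integral of f over G with respect to P (a finite sum)."""
    return sum((f[w] * P[w] for w in G), Fraction(0))

def cond_exp(f, blocks):
    """E(f | G): on each block B it equals (1/P(B)) * integral of f over B."""
    out = {}
    for B in blocks:
        value = integral(f, B) / sum(P[w] for w in B)
        for w in B:
            out[w] = value
    return out

def members(blocks):
    """Every set in G is a union of blocks (the empty union is the empty set)."""
    for k in range(len(blocks) + 1):
        for combo in combinations(blocks, k):
            yield frozenset().union(*combo)

EYG = cond_exp(Y, blocks)
# Defining property (6.42): the integrals of Y and E(Y | G) agree on every G in G.
holds = all(integral(Y, G) == integral(EYG, G) for G in members(blocks))
print(EYG[1], EYG[3], holds)
```

Taking $G = \Omega$ in the check recovers $\mathcal{E}(\mathcal{E}(Y \mid \mathcal{G})) = \mathcal{E}Y$.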
6.8 DECOMPOSITION OF MEASURES

In this section, we will study several results regarding the decomposition of measures. Our first result, known as the Lebesgue decomposition theorem, is a consequence of the Radon-Nikodym theorem.

THEOREM 6.13 Lebesgue Decomposition Theorem
Let $(\Omega, \mathcal{A}, \mu)$ be a σ-finite measure space and $\nu$ a σ-finite measure on $\mathcal{A}$. Then there exist measures $\nu_1$ and $\nu_2$ on $\mathcal{A}$ such that $\nu_1 \ll \mu$, $\nu_2 \perp \mu$, and $\nu = \nu_1 + \nu_2$. Moreover, such a representation is unique. It is called the Lebesgue decomposition of $\nu$ with respect to $\mu$.

PROOF: Clearly $\mu \ll \mu + \nu$, and $\mu$ and $\mu + \nu$ are σ-finite. Therefore, the Radon-Nikodym theorem implies that there is a nonnegative $\mathcal{A}$-measurable function $f$ on $\Omega$ such that
$$\mu(A) = \int_A f \, d(\mu + \nu), \qquad A \in \mathcal{A}.$$
Let $E = \{x : f(x) > 0\}$. Obviously, then, $\mu(E^c) = 0$. Define measures $\nu_1$ and $\nu_2$ on $\mathcal{A}$ by $\nu_1(A) = \nu(A \cap E)$ and $\nu_2(A) = \nu(A \cap E^c)$. Clearly, $\nu = \nu_1 + \nu_2$. Moreover, since $\nu_2(E) = 0$ and $\mu(E^c) = 0$, we see that $\nu_2 \perp \mu$.

We claim that $\nu_1 \ll \mu$. So, suppose $\mu(A) = 0$. Then
$$\int_{A \cap E} f \, d(\mu + \nu) = \int_A f \, d(\mu + \nu) = \mu(A) = 0.$$
Because $f > 0$ on $A \cap E$, it must be that $(\mu + \nu)(A \cap E) = 0$ and, hence, $\nu_1(A) = \nu(A \cap E)$ must equal zero.

It remains to prove uniqueness. Assume that $\nu = \omega_1 + \omega_2$, where $\omega_1 \ll \mu$ and $\omega_2 \perp \mu$. We must show that $\omega_1 = \nu_1$ and $\omega_2 = \nu_2$. Because $\nu_2 \perp \mu$ and $\omega_2 \perp \mu$, there exist sets $B, C \in \mathcal{A}$ such that $\mu(B) = \mu(C) = 0$ and $\nu_2(B^c) = \omega_2(C^c) = 0$. In particular, then, any subset of $B \cup C$ has $\mu$-measure zero and any subset of $B^c \cap C^c$ has $\nu_2$- and $\omega_2$-measure zero. Since $\nu_1 \ll \mu$ and $\omega_1 \ll \mu$, it follows that any subset
of $B \cup C$ also has $\nu_1$- and $\omega_1$-measure zero. Thus, for $A \in \mathcal{A}$,
$$\begin{aligned}
\omega_2(A) &= \omega_2(A \cap (B \cup C)) + \omega_2(A \cap (B^c \cap C^c)) = \omega_2(A \cap (B \cup C)) \\
&= \omega_1(A \cap (B \cup C)) + \omega_2(A \cap (B \cup C)) = \nu(A \cap (B \cup C)) \\
&= \nu_1(A \cap (B \cup C)) + \nu_2(A \cap (B \cup C)) = \nu_2(A \cap (B \cup C)) \\
&= \nu_2(A \cap (B \cup C)) + \nu_2(A \cap (B^c \cap C^c)) = \nu_2(A).
\end{aligned}$$
Hence $\omega_2 = \nu_2$. A similar argument shows that $\omega_1 = \nu_1$.

EXAMPLE 6.16 Illustrates the Lebesgue Decomposition Theorem
Let $F : \mathcal{R} \to \mathcal{R}$ be defined by
$$F(x) = \begin{cases} 0, & x < 0; \\ 3 - e^{-x}, & 0 \le x < 1; \\ 4 - e^{-x}, & x \ge 1. \end{cases}$$
Note that $F$ is a distribution function; that is, $F$ is nondecreasing, right continuous, bounded, and $F(x) \to 0$ as $x \to -\infty$. According to Theorem 4.13 on page 226, there is a unique finite Borel measure $\nu$ having $F$ as its distribution function. We will obtain the Lebesgue decomposition of $\nu$ with respect to $\lambda$ (considered a Borel measure). Define $f(x) = e^{-x}$ for $x \ge 0$ and zero otherwise, and let
$$\nu_1(B) = \int_B f(t) \, dt, \qquad B \in \mathcal{B}.$$
Also, set $\nu_2 = 2\delta_0 + \delta_1$. Then, clearly, $\nu_1 \ll \lambda$ and $\nu_2 \perp \lambda$. A simple calculation shows that $\nu_1 + \nu_2$ has $F$ as its distribution function, which implies that $\nu = \nu_1 + \nu_2$. Therefore, we have found the unique Lebesgue decomposition of $\nu$ with respect to $\lambda$. Example 6.20 provides an alternate (and more straightforward) method for obtaining this decomposition. □

Further Decomposition of Measures

Suppose that $(\Omega, \mathcal{A}, \mu)$ is a σ-finite measure space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$. We will show that if $\nu$ is a σ-finite measure on $\mathcal{A}$, then it can be decomposed into three mutually singular measures, of which one is absolutely continuous with respect to $\mu$, one is singular with respect to $\mu$ and has no atoms, and one is singular with respect to $\mu$ and is discrete. To begin, we recall the following definitions.
DEFINITION 6.20 Atoms
Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$, and let $\nu$ be a measure on $\mathcal{A}$. An element $x \in \Omega$ is said to be an atom of $\nu$ if $\nu(\{x\}) > 0$.

DEFINITION 6.21 Continuous and Discrete Measures
Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$.
a) A measure $\nu$ on $\mathcal{A}$ is said to be continuous if it has no atoms, that is, $\nu(\{x\}) = 0$ for all $x \in \Omega$.
b) A measure $\nu$ on $\mathcal{A}$ is said to be discrete if there is a countable subset $K$ of $\Omega$ such that $\nu(K^c) = 0$.

EXAMPLE 6.17 Illustrates Definitions 6.20 and 6.21
a) Let $(\Omega, \mathcal{A}) = (\mathcal{R}, \mathcal{M})$. Lebesgue measure $\lambda$ is continuous; the measure $\delta_0 + \delta_1$ is discrete; and the measure $\lambda + \delta_0 + \delta_1$ is neither continuous nor discrete. For the latter two measures, the set of atoms is $\{0, 1\}$.
b) Refer to Example 5.5 on page 276. Let $X$ be a random variable on $(\Omega, \mathcal{A}, P)$ and $\mu_X$ its probability distribution, that is, $\mu_X(B) = P(X \in B)$, $B \in \mathcal{B}$. $X$ is discrete if and only if $\mu_X$ is a discrete measure on $\mathcal{B}$; $X$ is continuous if and only if $\mu_X$ is a continuous measure on $\mathcal{B}$. □

The following proposition was (essentially) proved in Exercise 4.6 on page 172, but we state it formally here for completeness. The proof is left to the reader.

PROPOSITION 6.14
Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$, and let $\nu$ be a measure on $\mathcal{A}$. Then $\nu$ is discrete if and only if there is a countable subset $K$ of $\Omega$ such that $\nu = \sum_{x \in K} \nu(\{x\})\,\delta_x$. We can always take $K$ to be the set of atoms of $\nu$.

The next proposition shows that, under σ-finite conditions, we can decompose a measure as the sum of a continuous and a discrete measure.
PROPOSITION 6.15
Let $(\Omega, \mathcal{A})$ be a measurable space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$, and let $\nu$ be a σ-finite measure on $\mathcal{A}$. Then there exist mutually singular measures $\nu_c$ and $\nu_d$ on $\mathcal{A}$ such that $\nu_c$ is continuous, $\nu_d$ is discrete, and $\nu = \nu_c + \nu_d$. Moreover, such a representation is unique.

PROOF: First assume $\nu$ is finite. We claim $\nu$ has countably many atoms. Let $F \subset \Omega$ be finite. Then $\sum_{x \in F} \nu(\{x\}) = \nu(F) \le \nu(\Omega)$. Taking the supremum over all finite subsets of $\Omega$, we deduce that $\sum_{x \in \Omega} \nu(\{x\}) \le \nu(\Omega) < \infty$. Consequently, by Exercise 2.37 on page 57, only countably many of the $\nu(\{x\})$s are nonzero; that is, $\nu$ has only countably many atoms.

Let $K$ denote the set of atoms of $\nu$. As we have just seen, $K$ is countable. Therefore, the measure $\nu_d = \sum_{x \in K} \nu(\{x\})\,\delta_x$ is discrete. Let $\nu_c = \nu - \nu_d$. To show that $\nu_c$ is a measure on $\mathcal{A}$, it is enough to show that it is nonnegative, because the other two conditions for being a measure are clearly satisfied. Noting that $\nu(\{x\}) = \nu_d(\{x\})$ for each $x \in K$, we conclude that $\nu$ and $\nu_d$ agree on all subsets of $K$. Let $A \in \mathcal{A}$. Then
$$\nu(A) = \nu(A \cap K) + \nu(A \cap K^c) = \nu_d(A \cap K) + \nu(A \cap K^c)$$
and
$$\nu_d(A) = \nu_d(A \cap K) + \nu_d(A \cap K^c) = \nu_d(A \cap K).$$
Consequently, $\nu_c(A) = \nu(A) - \nu_d(A) = \nu(A \cap K^c) \ge 0$. Thus, $\nu_c$ is a measure.

Recalling that $K$ denotes the set of atoms of $\nu$, we can apply the previous equation with $A = \{x\}$ to conclude that $\nu_c$ is continuous. Indeed, if $x \in K$, then $\nu_c(\{x\}) = \nu(\emptyset) = 0$. On the other hand, if $x \notin K$, then $x$ is not an atom of $\nu$ and we have $\nu_c(\{x\}) = \nu(\{x\}) = 0$.

Now assume that $\nu$ is σ-finite. Select a countable collection $\{E_n\}_n$ of disjoint $\mathcal{A}$-measurable sets of finite $\nu$-measure whose union is $\Omega$. For each $n \in \mathcal{N}$, define the measure $\nu_n$ on $\mathcal{A}$ by $\nu_n(A) = \nu(E_n \cap A)$. Then $\nu_n$ is a finite measure on $\mathcal{A}$. Consequently, by what we just proved for finite measures, we can write $\nu_n = \nu_{nc} + \nu_{nd}$, where $\nu_{nc}$ is continuous and $\nu_{nd}$ is discrete; moreover,
$$\nu_{nd} = \sum_{x \in K_n} \nu_n(\{x\})\,\delta_x = \sum_{x \in K_n} \nu(\{x\})\,\delta_x,$$
where $K_n$ denotes the set of atoms of $\nu_n$.
Note that $K_n$ consists of those atoms of $\nu$ lying in $E_n$. Let $K = \bigcup_n K_n$. As each $K_n$ is countable, so is $K$; moreover, $K$ is the set of atoms of $\nu$. Let $\nu_c = \sum_n \nu_{nc}$ and
$$\nu_d = \sum_n \nu_{nd} = \sum_{x \in K} \nu(\{x\})\,\delta_x. \tag{6.43}$$
Because each $\nu_{nc}$ is continuous, so is $\nu_c$; and, because $K$ is countable, $\nu_d$ is discrete. We have
$$\nu = \sum_n \nu_n = \sum_n (\nu_{nc} + \nu_{nd}) = \sum_n \nu_{nc} + \sum_n \nu_{nd} = \nu_c + \nu_d.$$
Moreover, because $\nu_c$ is continuous and $\nu_d$ is discrete, it follows easily that $\nu_c \perp \nu_d$. (See Exercise 6.135.)

It remains to prove the uniqueness of the decomposition. So, assume we have $\nu = \tau_c + \tau_d$, where $\tau_c$ is continuous and $\tau_d$ is discrete. By Proposition 6.14 we can write
$$\tau_d = \sum_{x \in C} \tau_d(\{x\})\,\delta_x, \tag{6.44}$$
where $C$ is the collection of atoms of $\tau_d$. On the other hand, since $\tau_c$ is continuous, we have $\tau_d(\{x\}) = \nu(\{x\})$ for all $x$. Therefore, the set of atoms of $\tau_d$ is identical to that of $\nu$, namely, $K$. Therefore, $C = K$. Since $C = K$ and $\tau_d(\{x\}) = \nu(\{x\})$ for all $x$, we see from (6.43) and (6.44) that $\tau_d = \nu_d$.

Next note that because $\tau_c$ and $\nu_c$ are continuous and $K$ is countable, $\tau_c(A \cap K) = \nu_c(A \cap K) = 0$ for all $A \in \mathcal{A}$. Consequently, for $A \in \mathcal{A}$,
$$\begin{aligned}
\tau_c(A) &= \tau_c(A \cap K) + \tau_c(A \cap K^c) = \tau_c(A \cap K^c) = \tau_c(A \cap K^c) + \tau_d(A \cap K^c) \\
&= \nu(A \cap K^c) = \nu_c(A \cap K^c) + \nu_d(A \cap K^c) = \nu_c(A \cap K^c) \\
&= \nu_c(A \cap K) + \nu_c(A \cap K^c) = \nu_c(A).
\end{aligned}$$
In other words, $\tau_c = \nu_c$.

THEOREM 6.14
Let $(\Omega, \mathcal{A}, \mu)$ be a σ-finite measure space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$, and let $\nu$ be a σ-finite measure on $\mathcal{A}$. Then there exist measures $\nu_{ac}$, $\nu_{sc}$, and $\nu_d$ having the following properties:
a) $\nu_{ac} \ll \mu$, $\nu_{sc} \perp \mu$, and $\nu_d \perp \mu$
b) $\nu_{sc}$ is continuous and $\nu_d$ is discrete
c) $\nu = \nu_{ac} + \nu_{sc} + \nu_d$
Moreover, the representation in (c) is unique.

PROOF: By the Lebesgue decomposition theorem, there exist unique measures $\nu_1$ and $\nu_2$ on $\mathcal{A}$ such that $\nu_1 \ll \mu$, $\nu_2 \perp \mu$, and $\nu = \nu_1 + \nu_2$. Set $\nu_{ac} = \nu_1$.
Because $\nu$ is σ-finite, so is $\nu_2$. Consequently, by Proposition 6.15, there exist unique mutually singular measures $\nu_c$ and $\nu_d$ on $\mathcal{A}$ such that $\nu_c$ is continuous, $\nu_d$ is discrete, and $\nu_2 = \nu_c + \nu_d$. Setting $\nu_{sc} = \nu_c$, we have $\nu = \nu_{ac} + \nu_{sc} + \nu_d$. So, (b) and (c) hold. Because $\nu_2 \perp \mu$, it is easy to see that $\nu_{sc} \perp \mu$ and $\nu_d \perp \mu$. Thus, (a) holds.

It remains to prove uniqueness. So, suppose $\tau_{ac}$, $\tau_{sc}$, and $\tau_d$ are measures on $\mathcal{A}$ that satisfy (a)-(c). Let $\tau_2 = \tau_{sc} + \tau_d$. Since $\tau_{sc} \perp \mu$ and $\tau_d \perp \mu$, we have $\tau_2 \perp \mu$. Therefore, by the uniqueness part of the Lebesgue decomposition theorem, $\tau_{ac} = \nu_{ac}$ and $\tau_2 = \nu_{sc} + \nu_d$. But then, by the uniqueness part of Proposition 6.15, $\tau_{sc} = \nu_{sc}$ and $\tau_d = \nu_d$.

Remark: We leave it as an exercise for the reader to show that the measures $\nu_{ac}$, $\nu_{sc}$, and $\nu_d$ in Theorem 6.14 are mutually singular.

EXAMPLE 6.18 Illustrates Theorem 6.14
Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \mu)$, where $\mu$ is counting measure on $\mathcal{N}$, that is, $\mu = \sum_{n=1}^\infty \delta_n$. Let $\nu = \delta_0 + \delta_1 + \lambda$. Then $\nu_{ac} = \delta_1$, $\nu_{sc} = \lambda$, and $\nu_d = \delta_0$. Moreover, we have $d\nu_{ac}/d\mu = d\delta_1/d\mu = \chi_{\{1\}}$ $\mu$-ae. This simple example shows that the absolutely continuous component of a measure need not be a continuous measure. □

Decomposition of Finite Borel Measures

An important application of Theorem 6.14 is to the decomposition of finite Borel measures on $\mathcal{R}$ with respect to Lebesgue measure (considered a Borel measure), that is, where $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{B}, \lambda|_{\mathcal{B}})$ and $\nu$ is a finite Borel measure. When there is no possibility of confusion, we will, as before, write $\lambda$ in place of $\lambda|_{\mathcal{B}}$. First we have the following obvious corollary to Theorem 6.14.

THEOREM 6.15 Decomposition Theorem for Finite Borel Measures
Let $\nu$ be a finite Borel measure on $\mathcal{R}$. Then $\nu$ can be decomposed uniquely as the sum of three finite Borel measures $\nu_{ac}$, $\nu_{sc}$, and $\nu_d$, where $\nu_{ac}$ is absolutely continuous with respect to $\lambda$, $\nu_{sc}$ is continuous and singular with respect to $\lambda$, and $\nu_d$ is discrete.
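For purely atomic measures, the construction in the proof of Theorem 6.13 can be carried out point by point. The following is a minimal sketch, assuming both measures are finite sums of point masses stored as dictionaries (an illustrative representation, not one used in the text). The Radon-Nikodym derivative $f = d\mu/d(\mu + \nu)$ becomes $\mu(\{x\})/(\mu(\{x\}) + \nu(\{x\}))$, and $\nu_1$ and $\nu_2$ are the restrictions of $\nu$ to $E = \{f > 0\}$ and to $E^c$. The masses loosely echo the atomic part of Example 6.18, with counting measure truncated to $\{1, 2, 3\}$ and a hypothetical extra atom at 5; a continuous part such as $\lambda$ cannot be represented this way.

```python
from fractions import Fraction

# Hypothetical purely atomic measures, given by their point masses.
# mu: counting measure on {1, 2, 3}; nu: delta_0 + delta_1 + (1/2) delta_5.
mu = {1: Fraction(1), 2: Fraction(1), 3: Fraction(1)}
nu = {0: Fraction(1), 1: Fraction(1), 5: Fraction(1, 2)}

def lebesgue_decomposition(mu, nu):
    """Split nu = nu1 + nu2 with nu1 << mu and nu2 singular to mu.

    Mirrors the proof of Theorem 6.13: f = d mu / d(mu + nu) equals
    mu({x}) / (mu({x}) + nu({x})) at each point, E = {f > 0},
    nu1 = nu restricted to E, and nu2 = nu restricted to E^c.
    Assumes every listed mass is positive, so no denominator vanishes.
    """
    points = set(mu) | set(nu)
    f = {x: mu.get(x, 0) / (mu.get(x, 0) + nu.get(x, 0)) for x in points}
    E = {x for x in points if f[x] > 0}
    nu1 = {x: m for x, m in nu.items() if x in E}
    nu2 = {x: m for x, m in nu.items() if x not in E}
    return nu1, nu2

nu1, nu2 = lebesgue_decomposition(mu, nu)
print(nu1, nu2)
```

As in Example 6.18, the atom at 1 is charged by $\mu$, so $\delta_1$ lands in the absolutely continuous part, while the atoms at 0 and 5 land in the singular part.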
Our next task is to express the conclusions of Theorem 6.15 in terms of distribution functions. First we state the following proposition relating the atoms of a finite Borel measure to the discontinuities of its distribution function. The proof is left as an exercise for the reader.
PROPOSITION 6.16
Let $\nu$ be a finite Borel measure on $\mathcal{R}$ and $F_\nu$ its distribution function. Then
$$\nu(\{x\}) = F_\nu(x) - F_\nu(x-), \qquad x \in \mathcal{R}. \tag{6.45}$$
Consequently, the set of atoms of $\nu$ is precisely the set of discontinuities of $F_\nu$. In particular, $\nu$ is a continuous measure if and only if $F_\nu$ is a continuous function.

EXAMPLE 6.19 Illustrates Proposition 6.16
a) Define
$$F(x) = \begin{cases} 0, & x < 0; \\ \psi(x), & 0 \le x \le 1; \\ 1, & x > 1, \end{cases}$$
where $\psi$ denotes the Cantor function. Let $\nu$ be the unique Borel measure having $F$ as its distribution function. Because $F$ is continuous on $\mathcal{R}$, Proposition 6.16 shows that $\nu$ is a continuous measure.
b) Let $\{x_n\}_n$ be a sequence of distinct real numbers and $\{a_n\}_n$ a sequence of positive real numbers such that $\sum_n a_n < \infty$. Set $\nu = \sum_n a_n \delta_{x_n}$. Then $\nu$ is a discrete measure and its set of atoms is $\{x_n\}_n$. The distribution function of $\nu$ is given by $F_\nu(x) = \sum_{x_n \le x} a_n$. It follows from Proposition 6.16 that $F_\nu$ is continuous except at $x_n$, $n = 1, 2, \ldots$, where it has, respectively, a "jump discontinuity" of magnitude $a_n$, $n = 1, 2, \ldots$. Note that if $\nu$ has only a finite number of atoms (i.e., $\{x_n\}_n$ and $\{a_n\}_n$ are finite sequences), then $F_\nu$ is a step function on $\mathcal{R}$. □

In view of Example 6.19(b), it is natural and reasonable to make the following definition.

DEFINITION 6.22 Discrete Distribution Function
A distribution function $F$ is said to be discrete if it can be expressed in the form $F(x) = \sum_{x_n \le x} a_n$, where $\{x_n\}_n$ is a sequence of real numbers and $\{a_n\}_n$ a sequence of positive real numbers with $\sum_n a_n < \infty$.

The following two propositions will also be required for our decomposition of distribution functions. The proof of the first one is left to the reader as an exercise.
PROPOSITION 6.17
A distribution function $F$ is discrete if and only if the Lebesgue-Stieltjes measure corresponding to $F$ is a discrete measure.

PROPOSITION 6.18
Let $\nu$ be a finite Borel measure on $\mathcal{R}$ and $F_\nu$ its distribution function. Then $\nu$ is singular with respect to Lebesgue measure if and only if $F_\nu' = 0$ $\lambda$-ae.

PROOF: For convenience, set $F = F_\nu$. Suppose that $\nu \perp \lambda$. Select $A \in \mathcal{B}$ such that $\nu(A) = 0$ and $\lambda(A^c) = 0$. Let $E = \{x : F'(x) > 0\}$. We have
$$\int_E F' \, d\lambda = \int_{E \cap A} F' \, d\lambda + \int_{E \cap A^c} F' \, d\lambda = \int_{E \cap A} F' \, d\lambda \le \int_A F' \, d\lambda.$$
We will show that the last integral in the previous display is zero. This will imply that $\int_E F' \, d\lambda = 0$ which, in turn, implies that $\lambda(E) = 0$, as required.

Let $\mathcal{C}$ be the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ and $(c, \infty)$, where $-\infty < a < b < \infty$ and $-\infty < c < \infty$. Then, by Exercise 4.96, $\mathcal{C}$ is a semialgebra and $\mathcal{A}(\mathcal{C}) = \mathcal{B}$. Because $F$ is nondecreasing, we have by (6.27) on page 347 that
$$\int_{(a, b]} F' \, d\lambda \le F(b) - F(a) = \nu((a, b]), \qquad -\infty < a < b < \infty.$$
From this it follows easily that
$$\int_C F' \, d\lambda \le \nu(C), \qquad C \in \mathcal{C}. \tag{6.46}$$
Let $\epsilon > 0$. By Lemma 4.1 (page 214) and Lemma 4.3 (page 216), we can select a sequence $\{C_n\}_n$ of pairwise disjoint members of $\mathcal{C}$ such that $\bigcup_n C_n \supset A$ and $\nu\left(\bigcup_n C_n\right) < \nu(A) + \epsilon$. Because $\nu(A) = 0$ and the $C_n$s are pairwise disjoint, we thus have $\sum_n \nu(C_n) < \epsilon$. From Proposition 4.18 on page 223, we have $\nu((a, b]) = F(b) - F(a)$ for $-\infty < a < b \le \infty$, where we are using the convention that $(a, \infty] = (a, \infty)$. Writing $C_n = (a_n, b_n]$ and referring to (6.46), we conclude that
$$\int_A F' \, d\lambda \le \int_{\bigcup_n C_n} F' \, d\lambda = \sum_n \int_{C_n} F' \, d\lambda \le \sum_n \nu(C_n) < \epsilon.$$
As $\epsilon > 0$ was chosen arbitrarily, it follows that $\int_A F' \, d\lambda = 0$.
Conversely, suppose that $F' = 0$ $\lambda$-ae. By the Lebesgue decomposition theorem, we can write $\nu = \nu_1 + \nu_2$, where $\nu_1 \ll \lambda$ and $\nu_2 \perp \lambda$. Because $\nu$ is finite, so are $\nu_1$ and $\nu_2$. We must show that $\nu_1 = 0$.

By the Radon-Nikodym theorem, there is a nonnegative Lebesgue measurable function $f$ such that
$$\nu_1(B) = \int_B f \, d\lambda, \qquad B \in \mathcal{B}, \tag{6.47}$$
and, because $\nu_1$ is finite, $f \in \mathcal{L}^1(\mathcal{R})$. By Corollary 6.2 on page 341, we have $f = F_{\nu_1}'$ $\lambda$-ae. Also, by the necessity part of this proposition (proved in the foregoing), we know that $F_{\nu_2}' = 0$ $\lambda$-ae. Noting that $F_\nu = F_{\nu_1} + F_{\nu_2}$, we conclude that $0 = F_\nu' = F_{\nu_1}' + F_{\nu_2}' = F_{\nu_1}'$ $\lambda$-ae. Therefore, $f = F_{\nu_1}' = 0$ $\lambda$-ae, which, by (6.47), implies that $\nu_1 = 0$.

THEOREM 6.16 Decomposition Theorem for Distribution Functions
Let $F$ be a distribution function. Then $F$ can be expressed uniquely in the form
$$F = F_{ac} + F_{sc} + F_d, \tag{6.48}$$
where $F_{ac}$ is absolutely continuous, $F_{sc}$ is continuous and $F_{sc}' = 0$ $\lambda$-ae, and $F_d$ is discrete. Moreover,
$$F_{ac}' = F' \quad \lambda\text{-ae}. \tag{6.49}$$

PROOF: Let $\nu$ be the Lebesgue-Stieltjes measure corresponding to $F$. Applying Theorem 6.15, we can write $\nu = \nu_{ac} + \nu_{sc} + \nu_d$, where $\nu_{ac} \ll \lambda$, $\nu_{sc} \perp \lambda$ and is continuous, and $\nu_d$ is discrete. Let $F_{ac}$, $F_{sc}$, and $F_d$ denote, respectively, the distribution functions of $\nu_{ac}$, $\nu_{sc}$, and $\nu_d$. Then we have $F = F_{ac} + F_{sc} + F_d$.

Since $\nu_{ac} \ll \lambda$, Proposition 6.9 on page 371 implies that $F_{ac}$ is absolutely continuous; since $\nu_{sc} \perp \lambda$ and $\nu_d \perp \lambda$, Proposition 6.18 implies that $F_{sc}' = F_d' = 0$ $\lambda$-ae. Also, because $\nu_{sc}$ is continuous, Proposition 6.16 on page 396 implies that $F_{sc}$ is continuous; and, because $\nu_d$ is discrete, Proposition 6.17 implies that $F_d$ is discrete. Because $F = F_{ac} + F_{sc} + F_d$ and $F_{sc}' = F_d' = 0$ $\lambda$-ae, it follows immediately that $F_{ac}' = F'$ $\lambda$-ae.

It remains to establish uniqueness. So suppose $F = F_1 + F_2 + F_3$, where $F_1$ is absolutely continuous, $F_2$ is continuous and $F_2' = 0$ $\lambda$-ae, and $F_3$ is discrete. For $1 \le j \le 3$, let $\tau_j$ denote the Lebesgue-Stieltjes measure corresponding to $F_j$.
Let $\mathcal{C}$ be the collection of intervals of $\mathcal{R}$ of the form $(a, b]$ and $(c, \infty)$, where $-\infty < a < b < \infty$ and $-\infty < c < \infty$. Then, by Exercise 4.96 on page 218, $\mathcal{C}$ is a semialgebra and $\mathcal{A}(\mathcal{C}) = \mathcal{B}$. As $F = F_1 + F_2 + F_3$, the measure $\tau_1 + \tau_2 + \tau_3$ agrees with $\nu$ on $\mathcal{C}$ and, therefore, on $\mathcal{B}$; in other words, $\nu = \tau_1 + \tau_2 + \tau_3$.

Since $F_1$ is absolutely continuous, Proposition 6.9 on page 371 implies that $\tau_1 \ll \lambda$. Since $F_2$ is continuous, Proposition 6.16 (page 396) implies that $\tau_2$ is a continuous measure and, since $F_2' = 0$ $\lambda$-ae, Proposition 6.18 (page 397) implies that $\tau_2 \perp \lambda$. And, because $F_3$ is a discrete distribution function, Proposition 6.17 (page 397) implies that $\tau_3$ is a discrete measure. It now follows from the uniqueness portion of Theorem 6.15 that $\tau_1 = \nu_{ac}$, $\tau_2 = \nu_{sc}$, and $\tau_3 = \nu_d$. This implies that the corresponding distribution functions are equal: $F_1 = F_{ac}$, $F_2 = F_{sc}$, and $F_3 = F_d$.

Theorem 6.16 provides a concrete method for determining the decomposition of a distribution function $F$ and, hence, of its corresponding Lebesgue-Stieltjes measure $\nu$. Specifically:

Step 1. Determine the derivative $F'$ of $F$.

Step 2. $F_{ac}$ is the indefinite integral of $F'$, that is,
$$F_{ac}(x) = \int_{-\infty}^x F'(t) \, dt.$$
We have $\nu_{ac}(B) = \int_B F' \, d\lambda$.

Step 3. $F_d$ is obtained from the discontinuities of $F$, that is,
$$F_d(x) = \sum_{x_n \le x} a_n,$$
where $\{x_n\}_n$ denotes the set of discontinuities of $F$ and $\{a_n\}_n$ denotes the corresponding magnitudes of the jumps, that is, $a_n = F(x_n) - F(x_n-)$. (See Exercise 6.140.) We have $\nu_d = \sum_n a_n \delta_{x_n}$.

Step 4. $F_{sc} = F - F_{ac} - F_d$. We have $\nu_{sc} = \nu - \nu_{ac} - \nu_d$.

EXAMPLE 6.20 Illustrates Decomposition
Let $F : \mathcal{R} \to \mathcal{R}$ be defined by
$$F(x) = \begin{cases} 0, & x < 0; \\ 3 - e^{-x}, & 0 \le x < 1; \\ 4 - e^{-x}, & x \ge 1. \end{cases}$$
Note that $F$ is a distribution function; that is, $F$ is nondecreasing, right continuous, bounded, and $F(x) \to 0$ as $x \to -\infty$. We will apply the preceding Steps 1-4 to decompose $F$ and its corresponding Lebesgue-Stieltjes measure.

Step 1. We have
$$F'(x) = \begin{cases} 0, & x < 0; \\ e^{-x}, & x > 0,\ x \ne 1. \end{cases}$$

Step 2. We have
$$F_{ac}(x) = \int_{-\infty}^x F'(t) \, dt = \begin{cases} 0, & x < 0; \\ 1 - e^{-x}, & x \ge 0. \end{cases}$$
Also, $\nu_{ac}(B) = \int_B F' \, d\lambda$.

Step 3. $F$ is discontinuous at $x = 0$ and $x = 1$. The magnitudes of the jumps at those two points are, respectively, $F(0) - F(0-) = 2 - 0 = 2$ and $F(1) - F(1-) = (4 - e^{-1}) - (3 - e^{-1}) = 1$. Therefore,
$$F_d(x) = \begin{cases} 0, & x < 0; \\ 2, & 0 \le x < 1; \\ 3, & x \ge 1. \end{cases}$$
Also, $\nu_d = 2\delta_0 + \delta_1$.

Step 4. From Steps 2 and 3 we see that $F_{ac} + F_d = F$. Consequently, $F_{sc} = F - F_{ac} - F_d = 0$. We have $\nu_{sc} = \nu - \nu_{ac} - \nu_d = 0$. □

EXERCISES 6.8

6.130 Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{B}, \lambda)$. For $E \in \mathcal{M}$, define $\tau(E) = \lambda(E \cap [0, 1])$. Also define
$$F(x) = \begin{cases} 0, & x < 0; \\ \psi(x), & 0 \le x \le 1; \\ 1, & x > 1, \end{cases}$$
where $\psi$ denotes the Cantor function, and let $\omega$ be the unique Borel measure having $F$ as its distribution function. Finally, set $\nu = \tau + \omega + \delta_0 + \delta_1$. Determine the Lebesgue decomposition of $\nu$ with respect to $\lambda$.

6.131 Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}, \mathcal{M}, \mu)$, where $\mu$ is counting measure on $\mathcal{N}$, that is, $\mu = \sum_{n=1}^\infty \delta_n$. Set $\nu = \ldots + \lambda$. Determine the Lebesgue decomposition of $\nu$ with respect to $\mu$.

6.132 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\nu_1$ and $\nu_2$ measures on $\mathcal{A}$. If $\nu_1 \ll \mu$ and $\nu_2 \perp \mu$, show that $\nu_1 \perp \nu_2$. Hence the two measures in the Lebesgue decomposition of a measure are mutually singular.
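Steps 1-4 can be sketched numerically for the distribution function of Example 6.20. This is a minimal sketch under stated assumptions, not a general algorithm: Step 1 (the derivative) is supplied by hand, the discontinuity points are assumed known (`JUMP_POINTS`), the left limit $F(x-)$ is approximated by $F(x - \varepsilon)$, and the integral in Step 2 uses a midpoint rule on $[-1, x]$, which is legitimate here only because $F' = 0$ on $(-\infty, 0)$. The names `F_prime`, `JUMP_POINTS`, and `LEFT_EPS` are illustrative, not from the text.

```python
import math

def F(x):
    """Distribution function of Example 6.20."""
    if x < 0:
        return 0.0
    if x < 1:
        return 3.0 - math.exp(-x)
    return 4.0 - math.exp(-x)

def F_prime(x):
    """Step 1, done by hand: F'(x) = 0 for x < 0 and e^{-x} for x > 0, x != 1."""
    return 0.0 if x < 0 else math.exp(-x)

JUMP_POINTS = [0.0, 1.0]   # assumed known: the discontinuities of F
LEFT_EPS = 1e-9            # F(x-) is approximated by F(x - LEFT_EPS)

def F_ac(x, lower=-1.0, n=20000):
    """Step 2: integral of F' from -infinity to x (midpoint rule).

    Assumes F' = 0 below `lower`, which holds here since F = 0 on (-inf, 0).
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    return h * sum(F_prime(lower + (i + 0.5) * h) for i in range(n))

def jumps():
    """Step 3: jump sizes a_n = F(x_n) - F(x_n-)."""
    return {p: F(p) - F(p - LEFT_EPS) for p in JUMP_POINTS}

def F_d(x):
    """Step 3: F_d(x) = sum of a_n over x_n <= x."""
    return sum(a for p, a in jumps().items() if p <= x)

def F_sc(x):
    """Step 4: F_sc = F - F_ac - F_d."""
    return F(x) - F_ac(x) - F_d(x)

for x in (-0.5, 0.25, 1.0, 3.0):
    print(f"x={x:5}: F_ac={F_ac(x):.6f}  F_d={F_d(x):.6f}  F_sc={F_sc(x):.2e}")
```

Up to discretization error, the output should agree with the worked example: $F_{ac}(x) \approx 1 - e^{-x}$ for $x \ge 0$, jumps of 2 at 0 and 1 at 1, and $F_{sc} \approx 0$.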
In Exercises 6.133-6.135, $(\Omega, \mathcal{A})$ denotes a measurable space such that $\{x\} \in \mathcal{A}$ for all $x \in \Omega$.

6.133 What can you say about a measure $\nu$ that is both discrete and continuous?

6.134 Find a measure $\nu$ on a measurable space $(\Omega, \mathcal{A})$ such that every element of $\Omega$ is an atom, but $\nu$ is not discrete.

6.135 Let $\mu$ and $\nu$ be measures on $(\Omega, \mathcal{A})$. Prove the following.
a) If $\mu$ is discrete and $\nu$ is continuous, then $\mu \perp \nu$.
b) If $\nu \ll \mu$ and $\mu$ is continuous, then so is $\nu$.
c) If $\nu \ll \mu$ and $\mu$ is discrete, then so is $\nu$.

6.136 Show that the measures $\nu_{ac}$, $\nu_{sc}$, and $\nu_d$ in Theorem 6.14 are mutually singular.

6.137 Let $X$ be a random variable on $(\Omega, \mathcal{A}, P)$ and let $\mu_X$ denote its probability distribution. Prove that $\mu_X$ can be expressed as a convex combination of probability measures $\mu_1$, $\mu_2$, and $\mu_3$ on $\mathcal{B}$, where $\mu_1 \ll \lambda$, $\mu_2 \perp \lambda$ and is continuous, and $\mu_3$ is discrete. Convex combination means we can write $\mu_X = \sum_{j=1}^3 \alpha_j \mu_j$, where $\alpha_j \ge 0$, $1 \le j \le 3$, and $\sum_{j=1}^3 \alpha_j = 1$.

6.138 Let $\nu$ be a finite Borel measure on $\mathcal{R}$ and $F_\nu$ its distribution function.
a) Show that for each $x \in \mathcal{R}$, $\nu(\{x\}) = F_\nu(x) - F_\nu(x-)$.
b) Prove that $x$ is an atom of $\nu$ if and only if $F_\nu$ is discontinuous at $x$.

6.139 Prove Proposition 6.17 on page 397.

6.140 Suppose that $F$ is a discrete distribution function, say, $F(x) = \sum_{x_n \le x} a_n$.
a) Show that $\{x_n\}_n$ constitutes the set of discontinuities of $F$.
b) Prove that for each $n \in \mathcal{N}$, $F(x_n) - F(x_n-) = a_n$. This shows that the magnitude of the discontinuity of $F$ at the point $x_n$ is $a_n$.
c) Show that $F$ is constant on any interval not containing a discontinuity point of $F$.

6.141 Let $\{r_n\}_{n=1}^\infty$ be an enumeration of the rational numbers and $\{a_n\}_{n=1}^\infty$ a sequence of positive real numbers such that $\sum_{n=1}^\infty a_n < \infty$. Define the function $F$ on $\mathcal{R}$ by $F(x) = \sum_{r_n \le x} a_n$.
a) Prove that $F$ is continuous at every irrational number and discontinuous at every rational number.
b) Show that $F$ is strictly increasing.
c) Prove that $F' = 0$ $\lambda$-ae.

6.142 Let $\nu$ be a finite Borel measure on $\mathcal{R}$ and $F_\nu$ its distribution function.
Let $\nu = \nu_1 + \nu_2$ be the Lebesgue decomposition of $\nu$ with respect to $\lambda$ (considered a Borel measure). Prove that $d\nu_1/d\lambda = F_\nu'$ $\lambda$-ae.

6.143 Let $\psi$ denote the Cantor function. Define
$$F(x) = \begin{cases} 0, & x < 0; \\ 2 + x + \psi(x), & 0 \le x < 1; \\ 4 + x, & 1 \le x < 2; \\ 9, & x \ge 2. \end{cases}$$
a) Decompose $F$ into its absolutely continuous, singular continuous, and discrete parts.
b) Use part (a) to determine the decomposition of the Lebesgue-Stieltjes measure $\nu$ corresponding to $F$ into its absolutely continuous, singular continuous, and discrete parts.

6.144 Refer to Exercise 5.30 on page 285. Let $X$ be uniformly distributed on $[a, b]$, and let $a < c < b$.
a) Define $Y = \min\{c, X\}$. Determine the decomposition of $\mu_Y$ into its absolutely continuous, singular continuous, and discrete parts.
b) Define $Z = \max\{c, X\}$. Determine the decomposition of $\mu_Z$ into its absolutely continuous, singular continuous, and discrete parts.
c) Let $X$ be an absolutely continuous random variable with probability density function $f_X$. Let $M$ be a positive real number, and define
$$Y = \begin{cases} -M, & X < -M; \\ X, & |X| \le M; \\ M, & X > M. \end{cases}$$
Determine the decomposition of $\mu_Y$ into its absolutely continuous, singular continuous, and discrete parts.

6.145 Let $(\Omega, \mathcal{A}, \mu) = (\mathcal{R}^2, \mathcal{B}_2, \lambda_2)$, and let $D = \{(x, y) \in \mathcal{R}^2 : x^2 + y^2 \le 1\}$. Define $\mu$, $\omega$, and $\tau$ on $\mathcal{B}_2$ as follows: $\mu(B) = \lambda(\{x \in \mathcal{R} : (x, x) \in B\})$; $\omega(B) = 1$ if $(0, 0) \in B$, and zero otherwise; and $\tau(B) = \lambda_2(B \cap D)$. Let $\nu = \mu + \omega + \tau$. Determine the decomposition of $\nu$ into its absolutely continuous, singular continuous, and discrete parts with respect to two-dimensional Lebesgue measure.

6.9 MEASURABLE TRANSFORMATIONS AND THE GENERAL CHANGE-OF-VARIABLE FORMULA

Recall the following change-of-variable formula from elementary calculus, often referred to as integration by substitution.

Change-of-Variable Formula for Riemann Integration: Suppose that $g$ is a continuously differentiable monotone function on $[a, b]$ with range $[c, d]$ and that $f$ is continuous on $[c, d]$. Then
$$\int_a^b f(g(x))\,|g'(x)| \, dx = \int_c^d f(y) \, dy.$$

In this section, we will generalize the change-of-variable formula by applying the theory of measurable transformations. We begin with the following definition.
DEFINITION 6.23 Measurable Transformation
Let $(\Omega, \mathcal{A})$ and $(\Lambda, \mathcal{S})$ be measurable spaces. A mapping $T : \Omega \to \Lambda$ is called a measurable transformation if $T^{-1}(S) \in \mathcal{A}$ for each $S \in \mathcal{S}$.

EXAMPLE 6.21 Illustrates Definition 6.23
a) The measurable transformations from $(\mathcal{R}, \mathcal{B})$ to $(\mathcal{R}, \mathcal{B})$ are precisely the (real-valued) Borel measurable functions.
b) The measurable transformations from $(\mathcal{R}, \mathcal{M})$ to $(\mathcal{R}, \mathcal{B})$ are precisely the real-valued Lebesgue measurable functions.
c) More generally than in parts (a) and (b), let $(\Omega, \mathcal{A})$ be any measurable space and $(\Lambda, \mathcal{S}) = (\mathcal{R}, \mathcal{B})$. Then the measurable transformations from $(\Omega, \mathcal{A})$ to $(\mathcal{R}, \mathcal{B})$ coincide with the real-valued $\mathcal{A}$-measurable functions. □

Our next result shows that the composition of a measurable function with a measurable transformation is a measurable function.

PROPOSITION 6.19
Suppose that $T$ is a measurable transformation from $(\Omega, \mathcal{A})$ to $(\Lambda, \mathcal{S})$ and that $f$ is an $\mathcal{S}$-measurable function on $\Lambda$. Then $f \circ T$ is an $\mathcal{A}$-measurable function on $\Omega$.

PROOF: Let $O$ be open in $\mathcal{R}$ (in $\mathcal{C}$ if $f$ is complex-valued, in $\mathcal{R}^*$ if $f$ is extended real-valued). Because $f$ is $\mathcal{S}$-measurable, $f^{-1}(O) \in \mathcal{S}$, and because $T$ is a measurable transformation from $(\Omega, \mathcal{A})$ to $(\Lambda, \mathcal{S})$, $T^{-1}(f^{-1}(O)) \in \mathcal{A}$. Thus, $(f \circ T)^{-1}(O) = T^{-1}(f^{-1}(O)) \in \mathcal{A}$ for each open set $O$. This shows that $f \circ T$ is $\mathcal{A}$-measurable.

From a measurable transformation and a measure on its domain space, we get in a natural way a measure on the range space. This is the content of the following proposition, whose proof is left to the reader as an exercise.

PROPOSITION 6.20
Let $T$ be a measurable transformation from $(\Omega, \mathcal{A})$ to $(\Lambda, \mathcal{S})$ and $\mu$ a measure on $\mathcal{A}$. Define
$$\mu \circ T^{-1}(S) = \mu(T^{-1}(S)), \qquad S \in \mathcal{S}.$$
Then $\mu \circ T^{-1}$ is a measure on $\mathcal{S}$, called the measure induced by $\mu$ and $T$.
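Proposition 6.20 can be made concrete on finite spaces. The following is a minimal sketch, assuming a finite $\Omega$ and $\Lambda$ with all subsets measurable (so every map $T$ is a measurable transformation); the point masses and the map `T` are hypothetical. It computes $\mu \circ T^{-1}$ by summing $\mu$ over preimages and checks additivity on disjoint pairs of sets, the finite analogue of countable additivity.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite measure space (Omega, P(Omega), mu) and a map T into Lambda.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(2)}
T = {"a": 0, "b": 1, "c": 0, "d": 2}            # T: Omega -> Lambda = {0, 1, 2}
LAMBDA = {0, 1, 2}

def preimage(S):
    """T^{-1}(S) as a set of points of Omega."""
    return {w for w in mu if T[w] in S}

def induced(S):
    """(mu o T^{-1})(S) = mu(T^{-1}(S))."""
    return sum((mu[w] for w in preimage(S)), Fraction(0))

# Every subset of Lambda is measurable here, so list them all.
subsets = [set(c) for k in range(len(LAMBDA) + 1)
           for c in combinations(sorted(LAMBDA), k)]

# Additivity on disjoint pairs (the finite analogue of countable additivity).
additive = all(induced(A | B) == induced(A) + induced(B)
               for A in subsets for B in subsets if not A & B)
print({y: induced({y}) for y in sorted(LAMBDA)}, additive)
```

When $\mu$ is a probability measure and $T$ is a random variable, the same computation produces its probability distribution.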
EXAMPLE 6.22 Illustrates Proposition 6.20
Let (Ω, 𝒜, P) be a probability space and X a random variable thereon. According to Definition 5.6, the set function
$$\mu_X(B) = P(X \in B), \qquad B \in \mathcal{B},$$
is the probability distribution of X. But note also that X is a measurable transformation from (Ω, 𝒜) to (ℛ, ℬ) and that the measure induced by P and X is
$$P\circ X^{-1}(B) = P\big(X^{-1}(B)\big) = P(X \in B) = \mu_X(B), \qquad B \in \mathcal{B}.$$
In other words, the measure induced by P and X is the probability distribution of X. □

The General Change-of-Variable Formula

With Propositions 6.19 and 6.20 in mind, we now prove the general change-of-variable formula.

THEOREM 6.17 General Change-of-Variable Formula
Let (Ω, 𝒜, μ) be a measure space, (Λ, 𝒮) a measurable space, and T a measurable transformation from (Ω, 𝒜) to (Λ, 𝒮). Then, for any 𝒮-measurable function f on Λ,
$$\int_\Omega f\circ T(x)\,d\mu(x) = \int_\Lambda f(y)\,d\mu\circ T^{-1}(y), \tag{6.50}$$
in the sense that if one of the integrals in (6.50) exists, then so does the other, and they are equal.

PROOF: Suppose first that f is the characteristic function of a set S ∈ 𝒮. Then
$$\begin{aligned}
\int_\Omega f\circ T(x)\,d\mu(x) &= \int_\Omega \chi_S(T(x))\,d\mu(x) = \int_\Omega \chi_{T^{-1}(S)}(x)\,d\mu(x) \\
&= \mu\big(T^{-1}(S)\big) = \mu\circ T^{-1}(S) = \int_\Lambda \chi_S(y)\,d\mu\circ T^{-1}(y) \\
&= \int_\Lambda f(y)\,d\mu\circ T^{-1}(y).
\end{aligned}$$
Hence (6.50) holds if f is a characteristic function. It now follows easily, by linearity, that (6.50) holds if f is a nonnegative 𝒮-measurable simple function.

If f is a nonnegative extended real-valued 𝒮-measurable function, select a sequence {s_n}_{n=1}^∞ of nonnegative 𝒮-measurable simple functions such that s_n ↑ f on Λ. Then s_n ∘ T ↑ f ∘ T on Ω. Applying the monotone convergence theorem twice, and the result for simple functions in the middle step, we get that
$$\int_\Omega f\circ T(x)\,d\mu(x) = \lim_{n\to\infty}\int_\Omega s_n(T(x))\,d\mu(x) = \lim_{n\to\infty}\int_\Lambda s_n(y)\,d\mu\circ T^{-1}(y) = \int_\Lambda f(y)\,d\mu\circ T^{-1}(y).$$

If f is a complex-valued or extended real-valued 𝒮-measurable function, we proceed in the usual manner. That is, we decompose f into a linear combination of nonnegative 𝒮-measurable functions and apply the result of the previous paragraph to each component.

As an immediate consequence of Theorem 6.17, we have the following corollaries. Their proofs are left to the reader as exercises.

COROLLARY 6.3
Let (Ω, 𝒜, μ) be a measure space, (Λ, 𝒮) a measurable space, T a measurable transformation from (Ω, 𝒜) to (Λ, 𝒮), and f an 𝒮-measurable function on Λ. Then, for each S ∈ 𝒮,
$$\int_{T^{-1}(S)} f\circ T(x)\,d\mu(x) = \int_S f(y)\,d\mu\circ T^{-1}(y),$$
in the sense that if one of the integrals exists, then so does the other, and they are equal.

COROLLARY 6.4
Let (Ω, 𝒜, μ) be a measure space and (Λ, 𝒮, ν) a σ-finite measure space. Suppose that T is a measurable transformation from (Ω, 𝒜) to (Λ, 𝒮) such that μ ∘ T⁻¹ ≪ ν and μ ∘ T⁻¹ is σ-finite. Then, for any 𝒮-measurable function f on Λ,
$$\int_\Omega f\circ T(x)\,d\mu(x) = \int_\Lambda f(y)\,\frac{d\mu\circ T^{-1}}{d\nu}(y)\,d\nu(y),$$
in the sense that if one of the integrals exists, then so does the other, and they are equal.
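Theorem 6.17 and Corollary 6.3 can be sanity-checked on a finite measure space, where every integral is a finite sum. The sketch below assumes Ω and Λ are finite with all subsets measurable; the measure, the transformation T(x) = x², and the function f are made-up illustrative data, and the check confirms particular instances rather than proving anything.

```python
from fractions import Fraction


def integrate(m, g):
    """Integral of g with respect to a finite point-mass measure m."""
    return sum(mass * g(x) for x, mass in m.items())


def pushforward(mu, T):
    """Induced measure mu o T^{-1}, as point masses on the range space."""
    nu = {}
    for x, mass in mu.items():
        y = T(x)
        nu[y] = nu.get(y, 0) + mass
    return nu


# Hypothetical data: a non-uniform measure on Omega = {-2, ..., 2},
# the transformation T(x) = x**2, and a function f on the range space.
mu = {-2: Fraction(1, 10), -1: Fraction(2, 10), 0: Fraction(3, 10),
      1: Fraction(1, 10), 2: Fraction(3, 10)}


def T(x):
    return x * x


def f(y):
    return 3 * y - 5


nu = pushforward(mu, T)
lhs = integrate(mu, lambda x: f(T(x)))   # left side of (6.50)
rhs = integrate(nu, f)                   # right side of (6.50)

# Corollary 6.3 with S = {1, 4}: integrate over T^{-1}(S) and over S.
S = {1, 4}
lhs_S = integrate({x: m for x, m in mu.items() if T(x) in S}, lambda x: f(T(x)))
rhs_S = integrate({y: m for y, m in nu.items() if y in S}, f)
print(lhs, rhs, lhs_S, rhs_S)  # 7/10 7/10 11/5 11/5
```

Exact rational arithmetic (`Fraction`) is used so that the two sides can be compared for equality without floating-point rounding.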
EXERCISES 6.9

6.146 True or False: Every real-valued Lebesgue measurable function is a measurable transformation from (ℛ, ℳ) to (ℛ, ℳ).
6.147 Prove Proposition 6.20 on page 403.
6.148 Let (Ω, 𝒜, P) be a probability space and X_1, ..., X_n random variables thereon. Define X: Ω → ℛⁿ by X(ω) = (X_1(ω), ..., X_n(ω)).
a) Prove that X is a measurable transformation from (Ω, 𝒜) to (ℛⁿ, ℬ_n).
b) Identify the measure induced by P and X.
6.149 Prove Corollary 6.3.
6.150 Prove Corollary 6.4.
6.151 Suppose that g is an absolutely continuous and monotone function on [a, b] with range [c, d] and that f is Borel measurable and Lebesgue integrable on [c, d]. Use the general change-of-variable formula (Theorem 6.17 on page 404) to prove that
$$\int_a^b f(g(x))\,|g'(x)|\,dx = \int_c^d f(y)\,dy,$$
where both integrals are in the Lebesgue sense. Hint: Assume first that g is nondecreasing. Let μ(B) = ∫_B g′ dλ for B ∈ ℬ_{[a,b]} and show that, as measures on ℬ_{[c,d]}, μ ∘ g⁻¹ = λ.
6.152 Use Exercise 6.151 to establish the change-of-variable formula for Riemann integration given on page 402.
6.153 Let X be an absolutely continuous random variable on (Ω, 𝒜, P) with probability density function f_X, and let φ be a strictly monotone function on ℛ whose inverse is absolutely continuous on ℛ. Prove that Y = φ ∘ X is an absolutely continuous random variable on (Ω, 𝒜, P) with probability density function given by
$$f_Y(y) = f_X\big(\phi^{-1}(y)\big)\left|\frac{d}{dy}\phi^{-1}(y)\right|.$$
6.154 Let (Ω, 𝒜, μ) be a finite measure space, and let φ be a nonnegative real-valued 𝒜-measurable function on Ω. For x ≥ 0, set G(x) = μ(φ⁻¹((x, ∞))). Prove that
$$\int_\Omega \phi\,d\mu = \int_0^\infty G(x)\,dx.$$
6.155 Let (Ω, 𝒜, P) be a probability space, and let X be a nonnegative random variable thereon. Use Exercise 6.154 to prove that
$$E(X) = \int_0^\infty \big(1 - F_X(x)\big)\,dx.$$
6.156 Suppose g is a real-valued Lebesgue measurable function such that if B ∈ ℬ with λ(B) = 0, then λ(g⁻¹(B)) = 0; that is, the inverse image under g of any Borel set of Lebesgue measure zero has Lebesgue measure zero.
a) Prove that if E ∈ ℳ with λ(E) = 0, then g⁻¹(E) has Lebesgue (outer) measure zero, and hence is measurable; that is, the inverse image under g of any Lebesgue measurable set of Lebesgue measure zero is Lebesgue measurable and has Lebesgue measure zero.
b) Prove that g⁻¹(E) ∈ ℳ for each E ∈ ℳ, so that g is a measurable transformation from (ℛ, ℳ) to (ℛ, ℳ).
6.157 Suppose that g is a real-valued Borel measurable function such that, as Borel measures, λ ∘ g⁻¹ ≪ λ. Prove that g is a measurable transformation from (ℛ, ℳ) to (ℛ, ℳ). Hint: Exercise 6.156.
6.158 In each of the following parts, we have specified a real-valued Borel measurable function, g. In each case, show that, as Borel measures, λ ∘ g⁻¹ ≪ λ and find, explicitly, d(λ ∘ g⁻¹)/dλ.
a) g(x) = x²
b) g(x) = x³
c) g(x) = eˣ
6.159 Let ψ denote the Cantor function. Show that, as Borel measures on [0, 1], we have λ ∘ ψ⁻¹ ⊥ λ.
6.160 In Exercise 6.63 on page 353, we proved the following generalization of the change-of-variable formula for Riemann integration: Suppose that g is an absolutely continuous and monotone function on [a, b] with range [c, d] and that f ∈ ℒ¹([c, d]). Then (f ∘ g)g′ ∈ ℒ¹([a, b]) and
$$\int_a^b f(g(x))\,|g'(x)|\,dx = \int_c^d f(y)\,dy,$$
where both integrals are in the Lebesgue sense. Explain why this result does not follow directly from the general change-of-variable formula.
6.161 Use the general change-of-variable formula to provide a proof of Theorem 5.6 on page 291.

PART THREE □ Topological, Metric, and Normed Spaces
Pavel Samuilovich Urysohn (1898-1924)

Pavel S. Urysohn was born in Odessa, Russia, on February 3, 1898, the son of a financier. Urysohn, in 1915, enrolled at the University of Moscow to study physics. However, influenced by Egorov and Luzin, he soon began to concentrate on mathematics. Urysohn graduated in 1919, but remained at the university to continue his studies. He focused his early work on integral equations and other analysis problems.

In 1921, Urysohn was appointed assistant professor at the University of Moscow, and Egorov supplied him with two problems that turned Urysohn's attention to topology. In 1922, Urysohn published papers on topology in the journal of the Academie des Sciences and in Soviet and Polish journals. These papers laid the foundations of the Soviet school of topology.

His most famous result is the ingenious lemma that bears his name. The results of his work in abstract topology included a theorem on the existence of a topological mapping of any normal space with a countable base into Hilbert space. Urysohn presented, in his memoirs on Cantorian varieties (published posthumously in 1925-26), an inductive definition of dimensionality that became classical. The theory of dimensionality is also known as the Urysohn-Menger theory.

Tragically, Urysohn's contributions were cut short when he drowned at the age of 26 off the coast near Batz, France, on August 17, 1924.
Elements of Topological, Metric, and Normed Spaces

In this chapter, we will introduce topological, metric, and normed spaces and study some of their basic properties. Like most good ideas in mathematics, the concept of a topological space can be approached from several points of view. Since our perspective is from analysis, we will emphasize the connections between topological spaces and the concepts of limit and continuity.

7.1 INTRODUCTION TO TOPOLOGICAL SPACES

In this section, we will show how extending the limit concept can lead naturally to the notion of a topological space. Suppose that we have a function f: Ω → Λ. What kinds of structures on Ω and Λ are needed in order to make sense of the following formula?
$$\lim_{x\to a} f(x) = b \tag{7.1}$$
In case Ω = Λ = ℛ, (7.1) can be described verbally as follows: "f(x) will be near b whenever x is sufficiently near (but not equal to) a." Of course,
412 □ Chapter 7 Elements of Topological, Metric, and Normed Spaces

"f(x) being near b" means that f(x) lies in some prescribed open interval I centered at b, and "x is sufficiently near a" means that x lies in a certain interval J centered at a. Here, we seek to capture the idea of nearness in terms of intervals. In general, our approach to (7.1) will be to consider collections of subsets of Ω that have properties mimicking those of collections of intervals having a common center.

Neighborhood Bases, Continuous Functions, and Open Sets

Consider an element a ∈ Ω. We would like to define a collection 𝔑_a of subsets of Ω that can be thought of as "neighborhoods" of a. Certainly, our collection should be nonempty and satisfy the condition:
$$a \in N \text{ for all } N \in \mathfrak{N}_a. \tag{7.2}$$
If we think of an element of Ω as being "near" a if it belongs to some member of the collection 𝔑_a, then at least some of the elements of the intersection of two members of 𝔑_a should also be "near" a. Thus, it is reasonable to assume that the collection 𝔑_a satisfies the condition:
$$\text{If } N_1, N_2 \in \mathfrak{N}_a, \text{ there exists } N_3 \in \mathfrak{N}_a \text{ such that } N_3 \subset N_1 \cap N_2. \tag{7.3}$$
A nonempty collection 𝔑_a of subsets of Ω satisfying (7.2) and (7.3) is called a neighborhood basis at the point a. Members of 𝔑_a are called neighborhoods of a or, simply, neighborhoods. Using the concept of a neighborhood basis at a point, we make the following definition.

DEFINITION 7.1 Neighborhood Basis on a Set
A collection 𝔑 of subsets of a set Ω is said to be a neighborhood basis on Ω if, for each a ∈ Ω, the collection {N ∈ 𝔑 : a ∈ N} is a neighborhood basis at the point a.

The next proposition provides an equivalent set of conditions for a collection of subsets of Ω to be a neighborhood basis on Ω. Its proof is left to the reader as an exercise.

PROPOSITION 7.1
A collection 𝔑 of subsets of a set Ω is a neighborhood basis on Ω if and only if it satisfies the following two conditions:
7.1 Introduction to Topological Spaces □ 413

a) Ω = ⋃_{N∈𝔑} N.
b) N_1, N_2 ∈ 𝔑 and x ∈ N_1 ∩ N_2 imply that there exists N_3 ∈ 𝔑 such that x ∈ N_3 ⊂ N_1 ∩ N_2.

EXAMPLE 7.1 Illustrates Neighborhood Bases
a) The collection ℐ of all open intervals of ℛ is a neighborhood basis on ℛ. And so is {(x − r, x + r) : x ∈ ℛ, r > 0}.
b) There are several natural neighborhood bases on ℛ². Two examples are
$$\mathcal{I}^2 = \{\, I \times J : I, J \in \mathcal{I} \,\} \quad\text{and}\quad \mathcal{D} = \{\, D_r(a,b) : (a,b) \in \mathcal{R}^2,\ r > 0 \,\},$$
where D_r(a, b) = {(x, y) : (x − a)² + (y − b)² < r²}. Other examples of neighborhood bases on ℛ² can be found in the exercises at the end of this section. □

Using the notion of neighborhood basis, we can make sense of (7.1). Suppose 𝔑_a and 𝔐_b are neighborhood bases at a and b, respectively. Then we will take (7.1) to mean that the following condition is satisfied: For each M ∈ 𝔐_b, there exists N ∈ 𝔑_a such that f(N \ {a}) ⊂ M.

In cases where (7.1) holds with b = f(a), the function f is said to be continuous at a with respect to the neighborhood bases 𝔑_a and 𝔐_{f(a)}. If 𝔑 and 𝔐 are neighborhood bases on Ω and Λ, respectively, then f is said to be continuous on Ω with respect to 𝔑 and 𝔐 if it is continuous at each a ∈ Ω with respect to the neighborhood bases 𝔑_a and 𝔐_{f(a)}, where
$$\mathfrak{N}_a = \{\, N \in \mathfrak{N} : a \in N \,\} \quad\text{and}\quad \mathfrak{M}_{f(a)} = \{\, M \in \mathfrak{M} : f(a) \in M \,\}.$$

When we have a neighborhood basis 𝔑 on Ω, we can also generalize the idea of an open set. Referring to Definition 2.7 on page 57, we define a subset O ⊂ Ω to be open with respect to 𝔑 if for each x ∈ O, there is an N ∈ 𝔑 such that x ∈ N ⊂ O. When it is clear from the context which neighborhood basis we are using, we will say simply that O is an open set.
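For a finite set Ω, Proposition 7.1 and the definition of an open set with respect to a neighborhood basis can be checked by brute force. The Python sketch below is written under that finiteness assumption; the helper names and the particular basis on {1, 2, 3, 4} are hypothetical. It verifies conditions (a) and (b), lists the open sets, and compares them with all unions of members of the basis.

```python
from itertools import chain, combinations


def subsets(collection):
    """All subsets of a finite collection, as frozensets."""
    items = list(collection)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def is_neighborhood_basis(basis, omega):
    """Conditions (a) and (b) of Proposition 7.1 for a finite set omega."""
    covers = frozenset().union(*basis) == omega
    refines = all(any(x in n3 and n3 <= n1 & n2 for n3 in basis)
                  for n1 in basis for n2 in basis for x in n1 & n2)
    return covers and refines


def is_open(O, basis):
    """Each x in O lies in some N in the basis with N contained in O."""
    return all(any(x in N and N <= O for N in basis) for x in O)


omega = frozenset({1, 2, 3, 4})
# A hypothetical neighborhood basis on omega.
basis = [frozenset({1}), frozenset({1, 2}), frozenset({3}),
         frozenset({3, 4}), omega]

open_sets = {O for O in subsets(omega) if is_open(O, basis)}
unions = {frozenset().union(*c) for c in subsets(basis)}
print(is_neighborhood_basis(basis, omega), len(open_sets), open_sets == unions)
# True 9 True
```

The set {2} is not open here: the only basis members containing 2 are {1, 2} and Ω, neither of which lies inside {2}.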
We note that all of the sets belonging to 𝔑 are open with respect to 𝔑. More generally, as the reader should verify, a subset of Ω is open with respect to 𝔑 if and only if it is a union of members of 𝔑.

Theorem 2.2 on page 58 states three fundamental properties of the collection of open subsets of ℛ. The next proposition, whose proof is left to the reader, shows that those properties also hold for the collection of subsets of Ω that are open with respect to a neighborhood basis.

PROPOSITION 7.2
Let 𝔑 be a neighborhood basis on the set Ω. Then the open sets with respect to 𝔑 satisfy the following conditions:
a) The empty set and the set Ω are open.
b) The union of any collection of open sets is an open set.
c) The intersection of any finite collection of open sets is an open set.

Exercise 7.2 shows that the neighborhood bases ℐ² and 𝒟, defined in Example 7.1(b), determine the same collection of open subsets of ℛ². It is easy to construct other examples where distinct neighborhood bases determine the same collection of open sets.

The following proposition shows, however, that the property of continuity for a function f: Ω → Λ, where Ω and Λ have neighborhood bases 𝔑 and 𝔐, respectively, depends only on the open sets determined by 𝔑 and 𝔐. The proof of the proposition is left to the reader as an exercise.

PROPOSITION 7.3
Let 𝔑 and 𝔐 be neighborhood bases on Ω and Λ, respectively. Then a function f: Ω → Λ is continuous on Ω (with respect to 𝔑 and 𝔐) if and only if f⁻¹(O) is open in Ω with respect to 𝔑 whenever O is open in Λ with respect to 𝔐.

Topological Spaces and Continuous Functions

We note that Proposition 7.3 generalizes and is motivated by Theorem 2.5 on page 66.
It also shows that, with respect to the concept of continuity, the notion of open set is more fundamental than that of neighborhood basis since two distinct neighborhood bases on a set can determine the same collection of open sets and, hence, the same continuous functions. Thus, we are led to formalize the concept of open set via the following definition.
DEFINITION 7.2 Topology, Topological Space
Let Ω be a nonempty set. A collection 𝒯 of subsets of Ω is said to be a topology on Ω if it satisfies the following conditions:
a) ∅, Ω ∈ 𝒯.
b) 𝒮 ⊂ 𝒯 implies ⋃_{O∈𝒮} O ∈ 𝒯.
c) O_1, O_2 ∈ 𝒯 implies O_1 ∩ O_2 ∈ 𝒯.
If 𝒯 is a topology on Ω, then the pair (Ω, 𝒯) is called a topological space; the members of 𝒯 are called 𝒯-open or, if there is no danger of confusion, simply open.

Note: When the topology under consideration is clear from context, a topological space (Ω, 𝒯) will usually be referred to simply as Ω.

It follows from Proposition 7.2 that if 𝔑 is a neighborhood basis on a set Ω, then the subsets of Ω that are open with respect to 𝔑 constitute a topology on Ω, which we will call the topology determined by 𝔑. On the other hand, if 𝒯 is a topology on the set Ω, then the collection {O ∈ 𝒯 : a ∈ O} is a neighborhood basis at the point a for each a ∈ Ω, 𝒯 is a neighborhood basis on Ω, and the topology determined by 𝒯 is 𝒯.

We also have the following definition.

DEFINITION 7.3 Neighborhood Basis for a Topological Space
Let (Ω, 𝒯) be a topological space. A collection 𝔑 of subsets of Ω is said to be a neighborhood basis for (Ω, 𝒯) if the following two conditions are satisfied:
a) 𝔑 is a neighborhood basis on Ω.
b) The topology determined by 𝔑 is 𝒯.
In such cases, we also say that the neighborhood basis 𝔑 induces or determines 𝒯.

The reader should verify that each of the following conditions is necessary and sufficient for a collection 𝔑 of subsets of Ω to be a neighborhood basis for (Ω, 𝒯).
• 𝔑 ⊂ 𝒯 and each open set (i.e., member of 𝒯) is a union of members of 𝔑.
• 𝔑 ⊂ 𝒯 and for each O ∈ 𝒯 and each x ∈ O, there is an N ∈ 𝔑 such that x ∈ N ⊂ O.
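When Ω is finite, both Definition 7.2 and the second criterion above reduce to finite checks. In the sketch below, which assumes a finite Ω and collections of frozensets that are subsets of Ω (the function names and examples are hypothetical), closure under arbitrary unions is tested through unions of pairs: every subcollection of a finite 𝒯 is finite, and the empty union is ∅, which condition (a) already requires.

```python
from itertools import combinations


def is_topology(T, omega):
    """Check Definition 7.2 for a finite collection T of frozensets on omega.

    Because T is finite, closure under arbitrary unions reduces to closure
    under unions of two members (the empty union is the empty set).
    """
    T = set(T)
    if frozenset() not in T or frozenset(omega) not in T:
        return False
    return all(a | b in T and a & b in T for a, b in combinations(T, 2))


def is_basis_for(basis, T):
    """Second bulleted criterion: basis lies inside T, and for each open O
    and each x in O some member N of the basis has x in N and N inside O."""
    T = set(T)
    return set(basis) <= T and all(
        any(x in N and N <= O for N in basis) for O in T for x in O)


omega = {1, 2, 3}
chain_topology = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset(omega)]
not_a_topology = [frozenset(), frozenset({1}), frozenset({2}), frozenset(omega)]
print(is_topology(chain_topology, omega), is_topology(not_a_topology, omega))  # True False
```

The second collection fails because {1} ∪ {2} = {1, 2} is missing, so condition (b) of Definition 7.2 is violated.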
Motivated by Proposition 7.3 (page 414), we now extend the definition of continuity to functions f: Ω → Λ, where Ω and Λ are topological spaces.

DEFINITION 7.4 Continuous Functions on Topological Spaces
Let Ω and Λ be topological spaces. A function f: Ω → Λ is said to be continuous if f⁻¹(O) is open in Ω whenever O is open in Λ.

EXAMPLE 7.2 Illustrates Topological Spaces and Continuous Functions
a) Let Ω = ℛ, and let 𝒯 consist of the usual open sets as given by Definition 2.7 on page 57. Then, according to Corollary 2.1 on page 67, a real-valued function on ℛ is continuous in the sense of Definition 2.11 on page 65 if and only if it is continuous in the sense of Definition 7.4.
b) The neighborhood bases given in Example 7.1(b) determine the same topology 𝒯 on ℛ². With respect to 𝒯 and the usual topology on ℛ, the functions f, g: ℛ² → ℛ given by f(x, y) = x and g(x, y) = y are continuous.
c) Let Ω be any set and 𝒯 = {∅, Ω}. Then 𝒯 is a topology on Ω, albeit not an interesting one. Nevertheless, this topology is sometimes useful as an illustrative example. It is not hard to show that a function f: Ω → ℛ is continuous with respect to the topology 𝒯 if and only if it is constant.
d) Let Ω be any set. Then the collection 𝒫(Ω) of all subsets of Ω is a topology on Ω, which is sometimes referred to as the discrete topology. It is not hard to see that 𝒫(Ω) is determined by the neighborhood basis consisting of all the single-element subsets of Ω. Also, it is obvious that all functions from Ω to ℛ, or to any other topological space, are continuous with respect to the discrete topology on Ω. □

Note: From now on, unless stated otherwise, we will assume that ℛ is equipped with the topology determined by the neighborhood basis of all open intervals, which is the same topology as the one consisting of the open sets of ℛ as given by Definition 2.7 on page 57.
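Definition 7.4 can be tested exhaustively when both spaces are finite. Example 7.2(c) uses ℛ as the target; as a finite stand-in (our own choice, not from the text), the sketch below uses the two-point set {0, 1} with its discrete topology. Among all maps from a three-point set, exactly the two constant maps are continuous from the topology {∅, Ω}, while every map is continuous from the discrete topology, in line with Example 7.2(d). All names are hypothetical.

```python
from itertools import chain, combinations, product


def powerset(s):
    """The discrete topology P(s) on a finite set s, as frozensets."""
    items = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))}


def preimage(f, O, omega):
    return frozenset(x for x in omega if f[x] in O)


def is_continuous(f, omega, T_omega, T_lam):
    """Definition 7.4: the preimage of every open set of the target is open."""
    return all(preimage(f, O, omega) in T_omega for O in T_lam)


omega = frozenset({"a", "b", "c"})
lam = frozenset({0, 1})
indiscrete = {frozenset(), omega}          # the topology of Example 7.2(c)
discrete_omega = powerset(omega)           # the topology of Example 7.2(d)
discrete_lam = powerset(lam)

# Every map from omega to {0, 1}, stored as a dict.
maps = [dict(zip(sorted(omega), values)) for values in product(sorted(lam), repeat=3)]
from_indiscrete = [f for f in maps if is_continuous(f, omega, indiscrete, discrete_lam)]
print(len(maps), len(from_indiscrete))   # 8 2
```

A nonconstant map fails because the preimage of {0} is then a nonempty proper subset of Ω, which is not open in {∅, Ω}.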
Relative Topologies and Continuous Functions

Given a topological space, we can produce still others by considering subsets with topologies defined as follows. Let (Ω, 𝒯) be a topological space and D ⊂ Ω. Then it is easy to check that the collection {D ∩ O : O ∈ 𝒯} is a topology on D, that is, it satisfies (a)-(c) of Definition 7.2. This topology is given a special name.
DEFINITION 7.5 Relative Topology
Let (Ω, 𝒯) be a topological space and D a subset of Ω. The collection of sets 𝒯_D = {D ∩ O : O ∈ 𝒯} is a topology on D, called the relative topology. Sets in 𝒯_D are said to be relatively open.

Remark: The reader should compare the definition of relatively open set given here with that given for subsets of ℛ in Chapter 2; specifically, see Definition 2.10 and Theorem 2.3, both on page 62.

Unless stated otherwise, when we say that a function is continuous on a subset of a topological space, we will mean that it is continuous with respect to the relative topology. For example, when we say a function is continuous on the interval [0, 1], we are assuming that [0, 1] is equipped with the relative topology inherited from ℛ.

We note that if f: Ω → Λ is continuous and D ⊂ Ω, then the function f|_D: D → Λ, the restriction of f to D, is continuous with respect to the relative topology on D.

Homeomorphic Topological Spaces

We conclude this section by considering what it means for two topological spaces to be equivalent.

DEFINITION 7.6 Homeomorphic Spaces; Homeomorphism
Suppose that (Ω, 𝒯) and (Λ, 𝒰) are topological spaces and h: Ω → Λ is a 1-1 correspondence. If h⁻¹(U) ∈ 𝒯 for each U ∈ 𝒰 and h(O) ∈ 𝒰 for each O ∈ 𝒯, then we say that (Ω, 𝒯) and (Λ, 𝒰) are homeomorphic and call h a homeomorphism.

We note that, if h is a homeomorphism, then both h and h⁻¹ are continuous and, moreover, U ↦ h⁻¹(U) is a 1-1 correspondence from 𝒰 to 𝒯. Thus, homeomorphic spaces are equivalent as topological spaces.

EXAMPLE 7.3 Illustrates Definition 7.6
The function f(x) = 2x is a homeomorphism of the interval (0, 1) onto the interval (0, 2). Indeed, it can be shown that any two nonempty open intervals of ℛ are homeomorphic. On the other hand, [0, 1] and (0, 1) are not homeomorphic. See Exercises 7.14-7.16. □
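Definitions 7.5 and 7.6 are also easy to check mechanically on finite spaces. The sketch below assumes finite sets, topologies given as sets of frozensets, and a map h stored as a dict; the spaces and helper names are hypothetical. It computes a relative topology and tests whether a given bijection carries open sets to open sets in both directions.

```python
def relative_topology(T, D):
    """Definition 7.5: T_D = {D ∩ O : O in T}, for finite data."""
    D = frozenset(D)
    return {D & O for O in T}


def is_homeomorphism(h, omega, T, lam, U):
    """Definition 7.6 for finite spaces; h is a dict defined on omega."""
    values = set(h.values())
    if set(h) != set(omega) or values != set(lam) or len(values) != len(h):
        return False  # h must be a 1-1 correspondence from omega onto lam
    inverse = {y: x for x, y in h.items()}

    def image(O):
        return frozenset(h[x] for x in O)

    def preimage(V):
        return frozenset(inverse[y] for y in V)

    return all(preimage(V) in T for V in U) and all(image(O) in U for O in T)


omega = {1, 2, 3}
T = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset(omega)}
lam = {"x", "y", "z"}
U = {frozenset(), frozenset({"z"}), frozenset({"z", "y"}), frozenset(lam)}

print(sorted(map(sorted, relative_topology(T, {2, 3}))))  # [[], [2], [2, 3]]
```

For instance, h = {1: "z", 2: "y", 3: "x"} is a homeomorphism of these two spaces, while h = {1: "x", 2: "y", 3: "z"} is a bijection that is not, since it sends the open set {1} to {"x"}, which is not in 𝒰.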
EXERCISES 7.1

7.1 Let Ω be a nonempty set and suppose that for each a ∈ Ω, 𝔑_a is a neighborhood basis at the point a. True or False: The collection 𝔑 = ⋃_{a∈Ω} 𝔑_a is a neighborhood basis on Ω.
7.2 Refer to Example 7.1(b) on page 413. Show that the topologies determined by the neighborhood bases ℐ² and 𝒟 are identical.
7.3 Show that each of the following is a neighborhood basis on ℛ² and that each determines the same topology as that in Exercise 7.2.
a) The collection ℒ = {L_r(a, b) : (a, b) ∈ ℛ², r > 0}, where L_r(a, b) = {(x, y) : |x − a| + |y − b| < r}.
b) The collection 𝔐 = {M_r(a, b) : (a, b) ∈ ℛ², r > 0}, where M_r(a, b) = {(x, y) : |x − a|^{1/2} + |y − b|^{1/2} < r}.
7.4 Let ℐ denote the neighborhood basis on ℛ consisting of all open intervals and 𝒯 the topology determined by ℐ. Show that 𝒯 consists precisely of the open sets of ℛ as given by Definition 2.7 on page 57.
7.5 Prove Proposition 7.1 on page 412.
7.6 Prove Proposition 7.2 on page 414.
7.7 Prove Proposition 7.3 on page 414.
7.8 Let 𝔑 = {[a, b) : −∞ < a < b < ∞}.
a) Show that 𝔑 is a neighborhood basis on ℛ.
b) Let 𝒯 be the topology determined by 𝔑. Give an example of a real-valued function that is continuous with respect to 𝒯 but not with respect to the usual topology on ℛ.
7.9 A collection 𝒮 of subsets of a set Ω is called a sub-basis on Ω if the collection of finite intersections of members of 𝒮 is a neighborhood basis on Ω. The topology determined by that neighborhood basis is also said to be determined by the sub-basis 𝒮.
a) Show that a collection 𝒮 of subsets of Ω is a sub-basis on Ω if and only if ⋃_{S∈𝒮} S = Ω.
b) Show that the topology determined by the basis ℐ² in Example 7.1(b) is also determined by the sub-basis {I × ℛ : I ∈ ℐ} ∪ {ℛ × I : I ∈ ℐ}.
7.2 Metrics and Norms □ 419

7.10 Verify the assertions made in parts (a)-(d) of Example 7.2 on page 416.
7.11 Let Ω be a set and 𝒯 = {∅} ∪ {U ⊂ Ω : U^c is finite}. Show that 𝒯 is a topology.
★ 7.12 Refer to Exercise 1.33 on page 25. Suppose (Ω, 𝒯) is a topological space and ≡ is an equivalence relation on Ω. Let Ω/≡ denote the corresponding set of equivalence classes and, for each x ∈ Ω, let ⟨x⟩ denote the equivalence class containing x. For any subset W of Ω/≡, let W̃ = ⋃_{E∈W} E, the union of the equivalence classes belonging to W.
a) Show that 𝒯_≡ = {W : W̃ ∈ 𝒯} is a topology on Ω/≡. The topology 𝒯_≡ is often called the quotient topology determined by ≡.
b) Show that the function p: Ω → Ω/≡ defined by p(x) = ⟨x⟩ is continuous with respect to 𝒯 and 𝒯_≡.
7.13 Prove that "homeomorphic to" is an equivalence relation on the collection of all topological spaces.
7.14 This exercise asks you to show that any two nonempty open intervals of ℛ are homeomorphic.
a) Prove that (0, 1) and ℛ are homeomorphic.
b) Prove that any two nonempty open intervals are homeomorphic.
7.15 Show that if h is a homeomorphism from (a, b) onto (c, d), then h is either strictly increasing or strictly decreasing.
7.16 Show that no two of the intervals (0, 1), [0, 1), and [0, 1] are homeomorphic.
★ 7.17 Let ℭ be a collection of topologies on a set Ω.
a) Show that ⋂_{𝒯∈ℭ} 𝒯 is also a topology on Ω.
b) Does the result in part (a) hold if intersection is replaced by union?

7.2 METRICS AND NORMS

In the previous section, we developed the notion of a neighborhood basis as a way of expressing the concept of "nearness." An alternative approach to the idea of nearness is through a generalized concept of distance. In the case of the real line, ℛ, we usually think of the distance, d(x, y), between the numbers x and y as being given by d(x, y) = |x − y|. Proofs of many of the fundamental theorems of analysis on ℛ make use of three crucial properties of this distance function; namely, for all x, y, z ∈ ℛ,
(D1) d(x, y) ≥ 0, with equality if and only if x = y.
(D2) d(x, y) = d(y, x).
(D3) d(x, z) ≤ d(x, y) + d(y, z).

Properties (D1)-(D3) are the model for the general notion of a distance function or metric, which we will introduce in a moment. Of course,
(D1)-(D3) are derived from the properties of the absolute value function (see Exercise 2.2 on page 42). The absolute value function is also the model for another idea that we will introduce later in this section, namely, that of a norm.

DEFINITION 7.7 Metric, Metric Space
Let Ω be a set. A function ρ: Ω × Ω → ℛ is said to be a metric on Ω if it satisfies the following conditions for all x, y, z ∈ Ω:
a) ρ(x, y) ≥ 0, with equality if and only if x = y.
b) ρ(x, y) = ρ(y, x).
c) ρ(x, z) ≤ ρ(x, y) + ρ(y, z).
If ρ is a metric on Ω, then the pair (Ω, ρ) is called a metric space.

Note: When it is clear which metric is defined on Ω, we will often suppress the ρ and simply write Ω for (Ω, ρ).

Normed Spaces

While the distance function on ℛ given by d(x, y) = |x − y| is the model for the concept of a metric, it has an algebraic aspect that is not present in Definition 7.7. We will combine algebraic and metric properties by adapting the notion of distance function to the setting of a linear space. To begin, we recall the definition of a linear space.

DEFINITION 7.8 Linear Space
A linear space (vector space) consists of a set Ω, a field F,† and two functions +: Ω × Ω → Ω and ·: F × Ω → Ω, where we denote +(x, y) by x + y and ·(α, x) by αx, such that the following conditions are satisfied for all x, y, z ∈ Ω and α, β ∈ F:
a) x + y = y + x.
b) x + (y + z) = (x + y) + z.
c) There exists a 0 ∈ Ω such that x + 0 = x.

† A field is a set along with two binary operations satisfying the field axioms (F1)-(F5) on page 36. In this book, F will always be either ℛ or 𝒞.
d) There exists −x ∈ Ω such that x + (−x) = 0.
e) α(βx) = (αβ)x.
f) α(x + y) = αx + αy.
g) (α + β)x = αx + βx.
h) 1x = x.
The field F is called the field of scalars, + vector addition, and · scalar multiplication.

Note: On account of (b), sums of the form x + y + ⋯ + z are unambiguously defined. Also, it is conventional to write x − y for x + (−y).

The space ℛⁿ is a linear space having ℛ as its field of scalars, where + and · are defined by
$$(x_1, x_2, \ldots, x_n) + (y_1, y_2, \ldots, y_n) = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$$
and
$$\alpha(x_1, x_2, \ldots, x_n) = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n).$$
Using analogous definitions, we can make 𝒞ⁿ into a linear space having 𝒞 as its scalar field.

A nonempty subset D of Ω is called a (linear) subspace of Ω if (1) x, y ∈ D implies x + y ∈ D and (2) α ∈ F, x ∈ D imply αx ∈ D. We observe that a subspace D of a linear space Ω is itself a linear space, where the operations of vector addition and scalar multiplication in D are the restrictions to D of those operations in Ω.

Often we will deal with linear spaces of real- or complex-valued functions on a set. When we do so, the operations of vector addition (+) and scalar multiplication (·) will always be defined pointwise, as explained in Section 2.4 on page 65.

Similarly, we also define the following operations pointwise: multiplication of functions, fg; maximum of two real-valued functions, f ∨ g; minimum of two real-valued functions, f ∧ g; the real part of a complex-valued function, ℜf; the imaginary part of a complex-valued function, ℑf; the absolute value (modulus) of a complex-valued function, |f|; and the complex conjugate of a complex-valued function, f̄ = ℜf − iℑf. Also, a function that is constantly equal to α is denoted simply by α.

Now that we have recalled the definition of a linear space, we can define a normed space. This is done as follows.
DEFINITION 7.9 Norm, Normed Space
Let Ω be a linear space having as its scalar field F either ℛ or 𝒞. A function ‖ ‖: Ω → ℛ, whose value at x is written as ‖x‖, is said to be a norm on Ω if it satisfies the following conditions for all x, y ∈ Ω and α ∈ F:
a) ‖x‖ ≥ 0, with equality if and only if x = 0.
b) ‖αx‖ = |α| ‖x‖.
c) ‖x + y‖ ≤ ‖x‖ + ‖y‖.
If ‖ ‖ is a norm on Ω, then the pair (Ω, ‖ ‖) is called a normed space.

Note: When it is clear from context which norm is being considered, the normed space (Ω, ‖ ‖) will be indicated simply by Ω.

It is easy to check that if (Ω, ‖ ‖) is a normed space, then
$$\rho(x, y) = \|x - y\|$$
defines a metric on Ω. We will call this the metric induced by the norm ‖ ‖. Hence, any normed space can also be viewed as a metric space; indeed, the first examples of metric spaces that we consider arise from norms. However, as we will see, there is still a need for the more general theory of metric spaces to handle, among other cases, those metric spaces where there is no underlying linear-space structure.

EXAMPLE 7.4 Euclidean n-Space Equipped with Various Norms
The space ℛⁿ of n-tuples of real numbers is a linear space with respect to the operations of vector addition and multiplication by real scalars given on page 421. Here are three naturally arising norms defined on ℛⁿ:
$$\|x\|_2 = \big(x_1^2 + x_2^2 + \cdots + x_n^2\big)^{1/2}, \qquad \|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|, \qquad \|x\|_\infty = \max\{|x_1|, |x_2|, \ldots, |x_n|\},$$
where x = (x_1, x_2, ..., x_n). □

EXAMPLE 7.5 Unitary n-Space Equipped with Various Norms
The set of complex numbers 𝒞 = {x + iy : x, y ∈ ℛ} with the usual absolute value function (modulus) defined by |x + iy| = (x² + y²)^{1/2} is a normed space, where the scalar field is also 𝒞.
The space 𝒞ⁿ of n-tuples of complex numbers is a linear space with respect to the operations of vector addition and multiplication by complex scalars given on page 421. We will abuse notation slightly by also using ‖ ‖_2, ‖ ‖_1, and ‖ ‖_∞ to denote the norms defined, respectively, on 𝒞ⁿ via:
$$\|z\|_2 = \big(|z_1|^2 + |z_2|^2 + \cdots + |z_n|^2\big)^{1/2}, \qquad \|z\|_1 = |z_1| + |z_2| + \cdots + |z_n|, \qquad \|z\|_\infty = \max\{|z_1|, |z_2|, \ldots, |z_n|\},$$
where z = (z_1, z_2, ..., z_n). □

EXAMPLE 7.6 Spaces of Measurable Functions
We will present three normed spaces of functions that are generalizations of those given in Example 7.5. Let (Ω, 𝒜, μ) be a measure space.
a) Recall from Section 4.4 that the set ℒ¹(μ) consists of all complex-valued 𝒜-measurable functions f satisfying ∫_Ω |f| dμ < ∞. Parts (a) and (b) of Theorem 4.8 on page 196 show that ℒ¹(μ) is a linear space with scalar field 𝒞. Furthermore, if we identify two functions in ℒ¹(μ) whenever they are equal μ-ae and define
$$\|f\|_1 = \int_\Omega |f|\,d\mu,$$
then ‖ ‖_1 is a norm on ℒ¹(μ), called the ℒ¹-norm.
b) A somewhat more difficult task is to show that if we again identify two functions that are equal μ-ae, then
$$\|f\|_2 = \left(\int_\Omega |f|^2\,d\mu\right)^{1/2}$$
defines a norm on the linear space ℒ²(μ) consisting of all complex-valued 𝒜-measurable functions f such that ∫_Ω |f|² dμ < ∞. The norm ‖ ‖_2 is called the ℒ²-norm.
c) Another important space of measurable functions is ℒ^∞(μ). This space consists of all complex-valued 𝒜-measurable functions f that satisfy the following condition: There is a real number M such that |f| ≤ M μ-ae. Such functions are said to be essentially bounded. If we again identify two functions that agree μ-ae, then
$$\|f\|_\infty = \inf\{\, M : |f| \le M \ \mu\text{-ae} \,\}$$
defines a norm on the linear space ℒ^∞(μ). The norm ‖ ‖_∞ is called the ℒ^∞-norm or essential-supremum norm.
In case μ is n-dimensional Lebesgue measure restricted to a measurable subset Ω of ℛⁿ, we denote the spaces ℒ¹(μ), ℒ²(μ), and ℒ^∞(μ) by ℒ¹(Ω), ℒ²(Ω), and ℒ^∞(Ω), respectively. And when μ is counting measure on some set Ω, we denote the spaces ℒ¹(μ), ℒ²(μ), and ℒ^∞(μ) by ℓ¹(Ω), ℓ²(Ω), and ℓ^∞(Ω), respectively. □

EXAMPLE 7.7 Metric Spaces That Are Not Normed Spaces
To see how metric spaces that are not normed spaces can arise naturally, we first consider a simple way of constructing new metric spaces from existing ones. Suppose that (Ω, ρ) is a metric space and that D is a subset of Ω. Then we can define a metric ρ_D on D by restricting the function ρ to D × D. When there is no danger of confusion, we will denote the metric space (D, ρ_D) by (D, ρ).

Now, suppose that D is a subset, but not a linear subspace, of a normed space and let ρ be the metric induced by the norm. Then (D, ρ) is a metric space that is not a normed space. □

Metric Spaces as Topological Spaces

We now show how metrics can be used to define topologies. Let (Ω, ρ) be a metric space. For x ∈ Ω and r > 0, let
$$B_r^\rho(x) = \{\, y \in \Omega : \rho(x, y) < r \,\}.$$
We call B_r^ρ(x) the open ball of radius r centered at x. When the metric with which we are dealing is given unambiguously, we write B_r(x) for B_r^ρ(x). In case Ω = ℛ and ρ = d (the metric induced by absolute value), we have B_r(x) = (x − r, x + r).

The next proposition, whose proof is left to the reader as an exercise, shows that, just as the collection of open intervals {(x − r, x + r) : x ∈ ℛ, r > 0} is a neighborhood basis on ℛ, the collection of open balls of a metric space Ω is a neighborhood basis on Ω.

PROPOSITION 7.4
Let (Ω, ρ) be a metric space. Then the collection, {B_r(x) : x ∈ Ω, r > 0}, of open balls of Ω is a neighborhood basis on Ω.

The neighborhood basis {B_r(x) : x ∈ Ω, r > 0} determines a topology on Ω, denoted by 𝒯_ρ, which we call the topology induced by the metric ρ.
If the metric $\rho$ is itself induced by a norm $\|\cdot\|$, we also say that $\mathcal{T}_\rho$ is the topology induced by the norm $\|\cdot\|$. When we have a metric space (or a normed space), we assume, unless stated otherwise, that it has the topology induced by the metric (or norm).
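As a quick numerical illustration of the norms on $\mathbb{C}^n$ and of the open balls they induce, the following Python sketch computes $\|z\|_1$, $\|z\|_2$, and $\|z\|_\infty$ for a tuple of complex numbers and tests membership in $B_r(x)$ for the induced metric $\rho(x, y) = \|x - y\|$. The function names are our own illustrative choices, and floating-point arithmetic stands in for exact complex arithmetic.

```python
import math


def norm1(z):
    """||z||_1 = |z_1| + |z_2| + ... + |z_n| for a tuple of complex numbers."""
    return sum(abs(c) for c in z)


def norm2(z):
    """||z||_2 = (|z_1|^2 + |z_2|^2 + ... + |z_n|^2)^(1/2)."""
    return math.sqrt(sum(abs(c) ** 2 for c in z))


def norm_inf(z):
    """||z||_inf = max{|z_1|, |z_2|, ..., |z_n|}."""
    return max(abs(c) for c in z)


def induced_metric(norm):
    """The metric rho(x, y) = ||x - y|| induced by a norm on C^n."""
    def rho(x, y):
        return norm(tuple(a - b for a, b in zip(x, y)))
    return rho


def in_open_ball(rho, center, r, y):
    """True exactly when y lies in B_r(center) = {y : rho(center, y) < r}."""
    return rho(center, y) < r


z = (3 + 4j, 0j, -1j)
print(norm1(z), norm2(z), norm_inf(z))

origin, y = (0j, 0j), (0.6 + 0j, 0.6 + 0j)
for name, norm in [("1", norm1), ("2", norm2), ("inf", norm_inf)]:
    print(name, in_open_ball(induced_metric(norm), origin, 1.0, y))
```

Here $y = (0.6, 0.6)$ lies in the unit ball about the origin for $\|\cdot\|_2$ and $\|\cdot\|_\infty$ but not for $\|\cdot\|_1$, since $\|y\|_1 = 1.2$. Balls for different norms have different shapes, which is why Proposition 7.5 compares metrics by nesting balls inside one another rather than by requiring the balls to coincide.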
Thus, for example, suppose that $\Omega$ and $\Lambda$ are each either a metric, normed, or topological space and that $f: \Omega \to \Lambda$. Then, when we say that $f$ is continuous on $\Omega$, we mean, unless stated otherwise, that it is continuous with respect to the induced topologies on $\Omega$ and $\Lambda$.

A topological space $(\Omega, \mathcal{T})$ is said to be metrizable if there is a metric $\rho$ on $\Omega$ such that $\mathcal{T}_\rho = \mathcal{T}$. Later we will address the difficult problem of determining when a topological space is metrizable. We will see that even in cases where a topological space is metrizable, the metric may not be defined by a simple usable formula.

When two metrics on the same set or two norms on the same linear space induce the same topology, we say that they are equivalent.

EXAMPLE 7.8 Nonequivalent Norms

Consider the space $C([a,b])$ of continuous complex-valued functions on the closed interval $[a,b]$.† $C([a,b])$ is a linear subspace of each of the spaces $\mathcal{L}^1([a,b])$, $\mathcal{L}^2([a,b])$, and $\mathcal{L}^\infty([a,b])$; hence, it can be given any of the norms $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ defined in Example 7.6 on page 423. It is left to the reader to show that no two of these norms on $C([a,b])$ are equivalent. □

The following proposition and its corollary provide useful equivalent conditions for two metrics or norms to be equivalent. We leave the proof of the corollary to the reader as an exercise.

PROPOSITION 7.5 Let $\rho$ and $\sigma$ be metrics defined on a set $\Omega$. Then $\rho$ and $\sigma$ are equivalent if and only if the following condition is satisfied: for each $x \in \Omega$ and $\epsilon > 0$, there are positive numbers $r$ and $s$ such that $B_s^\sigma(x) \subset B_\epsilon^\rho(x)$ and $B_r^\rho(x) \subset B_\epsilon^\sigma(x)$.

PROOF: Suppose that the condition specified in the statement of this proposition is satisfied. Let $O$ be open with respect to the metric $\rho$ and let $x \in O$. Then there is an $\epsilon > 0$ such that $B_\epsilon^\rho(x) \subset O$. So, by assumption, there is an $s > 0$ such that $B_s^\sigma(x) \subset B_\epsilon^\rho(x) \subset O$. Hence, $O$ is also open with respect to $\sigma$. A similar argument shows that a set that is open with respect to $\sigma$ is also open with respect to $\rho$.
† In the terminology of Definition 2.11 on page 65, $C([a,b])$ denotes the collection of continuous real-valued functions on $[a,b]$. But, as we said in a footnote to that definition, the notation introduced there was temporary.

Conversely, suppose $\rho$ and $\sigma$ are equivalent. Let $x \in \Omega$ and $\epsilon > 0$. Then, since $B_\epsilon^\rho(x)$ is an open set containing $x$ in the topology induced
by $\rho$, it is also an open set containing $x$ in the topology induced by $\sigma$. Thus, there is an $s > 0$ such that $B_s^\sigma(x) \subset B_\epsilon^\rho(x)$. A similar argument shows that there is an $r > 0$ such that $B_r^\rho(x) \subset B_\epsilon^\sigma(x)$.

COROLLARY 7.1 Let $\|\cdot\|$ and $|||\cdot|||$ be norms on a linear space $\Omega$. Then $\|\cdot\|$ and $|||\cdot|||$ are equivalent if and only if there are positive constants $A$ and $B$ such that
$$A\|x\| \le |||x||| \le B\|x\| \quad \text{for all } x \in \Omega.$$

Exercise 7.25 shows that the three norms on $\mathcal{R}^n$ defined in Example 7.4 on page 422 are equivalent, that is, they induce the same topology, $\mathcal{T}$. Unless otherwise stated, we assume that each subset $D$ of $\mathcal{R}^n$ has the relative topology $\mathcal{T}_D$. Similar comments hold for $\mathbb{C}^n$.

We conclude this section with a construction showing that every metric is equivalent to a bounded metric.

PROPOSITION 7.6 Let $(\Omega, \rho)$ be a metric space. Then there is a bounded metric $\sigma$ on $\Omega$ such that $\rho$ and $\sigma$ are equivalent.

PROOF: It can be shown (see Exercise 7.30) that the function $\sigma$ defined on $\Omega \times \Omega$ by
$$\sigma(x, y) = \frac{\rho(x, y)}{1 + \rho(x, y)}$$
is a metric. Clearly, $\sigma(x, y) < 1$ for all $x, y \in \Omega$ and, so, $\sigma$ is bounded. Now, since $\sigma(x, y) \le \rho(x, y)$, it follows that for each $x \in \Omega$ and $\epsilon > 0$, $B_\epsilon^\rho(x) \subset B_\epsilon^\sigma(x)$. On the other hand, choosing $s = \epsilon/(1 + \epsilon)$ and using $\rho(x, y) = \sigma(x, y)/(1 - \sigma(x, y))$, we find that $B_s^\sigma(x) \subset B_\epsilon^\rho(x)$. Thus, the condition of Proposition 7.5 is satisfied by $\rho$ and $\sigma$.

EXERCISES 7.2

7.18 Let $(\Omega, \rho)$ be a metric space. Prove each of the following facts.
a) For $x, y, z \in \Omega$, $|\rho(x, y) - \rho(z, y)| \le \rho(x, z)$.
b) For $x_1, x_2, \ldots, x_n \in \Omega$, $\rho(x_1, x_n) \le \rho(x_1, x_2) + \rho(x_2, x_3) + \cdots + \rho(x_{n-1}, x_n)$.
7.19 Refer to Example 7.4. Verify that each of $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ is a norm.
7.20 Refer to Example 7.5. Verify that each of $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ is a norm.
7.21 Refer to Example 7.6. Verify that each of $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ is a norm.
7.22 For each $x \in \mathcal{R}$, let $\langle x \rangle = |x|^{1/2}$.
a) Show that $\langle\,\cdot\,\rangle$ satisfies conditions (a) and (c) of Definition 7.9 but not condition (b).
b) Show that, nevertheless, $\rho(x, y) = \langle x - y \rangle$ defines a metric on $\mathcal{R}$ that is equivalent to the metric induced by the absolute value function.
7.23 Prove Proposition 7.4.
7.24 Prove Corollary 7.1.
7.25 Prove that the three norms defined in Example 7.4 are all equivalent.
7.26 Prove that no two of the norms in Example 7.8 are equivalent.
7.27 Let $\rho$ and $\sigma$ be metrics on $\Omega$. Show that each of the following is also a metric on $\Omega$.
a) $\rho_1 = \rho + \sigma$.
b) $\rho_2 = (\rho^2 + \sigma^2)^{1/2}$.
c) $\rho_\infty = \max\{\rho, \sigma\}$.
7.28 Refer to Exercise 7.27. Show that any two of the three metrics, $\rho_1$, $\rho_2$, and $\rho_\infty$, are equivalent.
7.29 Refer to Example 7.2(d) on page 416. Let $\mathcal{T}$ be the discrete topology on a set $\Omega$. Show that $(\Omega, \mathcal{T})$ is metrizable.
★7.30 Refer to the definition of a metric (Definition 7.7 on page 420).
a) Show that if $\rho$ satisfies condition (c), then so does $\theta = \rho/(1 + \rho)$.
b) Deduce that the function $\sigma$ in Proposition 7.6 is a metric.
7.31 Provide an example of a topological space that is not metrizable.
★7.32 Suppose that $(\Omega, \rho)$ and $(\Lambda, \sigma)$ are metric spaces and let $f: \Omega \to \Lambda$. Show that $f$ is continuous on $\Omega$ if and only if for each $a \in \Omega$ and $\epsilon > 0$, there is a $\delta > 0$ such that $\rho(x, a) < \delta$ implies $\sigma(f(x), f(a)) < \epsilon$.

7.3 WEAK TOPOLOGIES

While metric spaces are ubiquitous in analysis, there are natural ways in which nonmetrizable spaces enter the subject. For example, we will see later that nonmetrizable spaces often arise in the context of weak topologies determined by families of functions. It is the concept of weak topology that we introduce in this section.
If $\mathcal{T}$ and $\mathcal{U}$ are two topologies on a set $\Omega$ and $\mathcal{T} \subset \mathcal{U}$, then we say that $\mathcal{T}$ is weaker than $\mathcal{U}$. If $\mathcal{T}$ is weaker than but not equal to $\mathcal{U}$, then $\mathcal{T}$ is said to be strictly weaker than $\mathcal{U}$.

Let $\Omega$ be a nonempty set. Consider a family of functions $\mathcal{F}$ such that for each $f \in \mathcal{F}$, $(\Lambda_f, \mathcal{T}_f)$ is a topological space and $f: \Omega \to \Lambda_f$. Can we find a topology $\mathcal{T}$ on $\Omega$ such that $f$ is continuous with respect to $\mathcal{T}$ and $\mathcal{T}_f$ for each $f \in \mathcal{F}$? The answer to this question is “yes” because the discrete topology (Example 7.2(d) on page 416) on $\Omega$ will always do the trick. However, the discrete topology is of little interest because, with respect to it, any function from $\Omega$ into a topological space is continuous. Therefore, it is better to ask the following question: Of all the topologies on $\Omega$ with respect to which each $f \in \mathcal{F}$ is continuous, is there a weakest one?

The answer to this question is based on the observation that if $\mathcal{I}$ is a nonempty collection of topologies on the set $\Omega$, then the intersection, $\bigcap_{\mathcal{T} \in \mathcal{I}} \mathcal{T}$, is also a topology on $\Omega$. (See Exercise 7.17(a) on page 419.)

DEFINITION 7.10 Weak Topology

Let $\Omega$ be a nonempty set. Consider a family of functions $\mathcal{F}$ such that for each $f \in \mathcal{F}$, $(\Lambda_f, \mathcal{T}_f)$ is a topological space and $f: \Omega \to \Lambda_f$. Let $\mathcal{I}$ denote the collection of topologies on $\Omega$ with respect to which all functions in $\mathcal{F}$ are continuous. Then the topology
$$\mathcal{T}_\mathcal{F} = \bigcap_{\mathcal{T} \in \mathcal{I}} \mathcal{T}$$
is called the weak topology determined by the family $\mathcal{F}$.

We leave it to the reader as an exercise to prove that $\mathcal{T}_\mathcal{F}$ is the weakest topology on $\Omega$ for which all $f \in \mathcal{F}$ are continuous. (See Exercise 7.34.)

Usually, when there is no possibility of confusion, functions that are continuous with respect to $\mathcal{T}_\mathcal{F}$ are called weakly continuous. Furthermore, $\mathcal{T}_\mathcal{F}$-open sets are called weakly open.

The following proposition provides a useful alternative way of looking at the topology $\mathcal{T}_\mathcal{F}$.

PROPOSITION 7.7 Let $\Omega$ be a nonempty set.
Consider a family of functions $\mathcal{F}$ such that for each $f \in \mathcal{F}$, $(\Lambda_f, \mathcal{T}_f)$ is a topological space and $f: \Omega \to \Lambda_f$. Suppose that
the topology $\mathcal{T}_f$ is determined by a neighborhood basis $\mathfrak{N}_f$. Then sets of the form
$$\bigcap_{f \in \mathcal{P}} f^{-1}(W_f), \tag{7.4}$$
where $\mathcal{P}$ is a finite subset of $\mathcal{F}$ and, for each $f \in \mathcal{P}$, $W_f \in \mathfrak{N}_f$, form a neighborhood basis that induces $\mathcal{T}_\mathcal{F}$.

PROOF: We first note that the collection of sets of the form (7.4) is a neighborhood basis on $\Omega$. We need to show that the topology $\mathcal{T}$ determined by that neighborhood basis is $\mathcal{T}_\mathcal{F}$. Let $f \in \mathcal{F}$ and $O \in \mathcal{T}_f$. Then
$$f^{-1}(O) = \bigcup \{ f^{-1}(W) : W \in \mathfrak{N}_f,\ W \subset O \}.$$
Because each $f^{-1}(W)$ belongs to $\mathcal{T}$, it follows that $f^{-1}(O) \in \mathcal{T}$. Thus, every function in $\mathcal{F}$ is continuous with respect to $\mathcal{T}$. It follows that $\mathcal{T}_\mathcal{F}$ is weaker than $\mathcal{T}$. On the other hand, each set of the form (7.4) is weakly open, being the intersection of finitely many weakly open sets. Consequently, $\mathcal{T}$ is weaker than $\mathcal{T}_\mathcal{F}$.

EXAMPLE 7.9 Compares Weak and Metric Topologies

The space $C([a,b])$ of continuous complex-valued functions on $[a,b]$ is a linear subspace of $\mathcal{L}^\infty([a,b])$. Thus, $C([a,b])$ is a normed space with norm $\|\cdot\|_\infty$. Let $\mathcal{T}_\infty$ denote the topology induced by this norm. For each $x \in [a,b]$, the complex-valued function $e_x$ on $C([a,b])$ defined by $e_x(f) = f(x)$ satisfies the inequality $|e_x(f) - e_x(g)| \le \|f - g\|_\infty$. From this inequality, it is easy to show that each function $e_x$ is continuous with respect to the topology $\mathcal{T}_\infty$. It follows that the weak topology $\mathcal{T}_\mathcal{F}$ determined by the family $\mathcal{F} = \{ e_x : x \in [a,b] \}$ is weaker than $\mathcal{T}_\infty$.

Is it possible that $\mathcal{T}_\mathcal{F} = \mathcal{T}_\infty$? Suppose the answer is yes. Then, in particular, the open ball $B_1(0)$ is weakly open. Applying Proposition 7.7, we see that there exist $x_1, x_2, \ldots, x_n \in [a,b]$, $w_1, w_2, \ldots, w_n \in \mathbb{C}$, and positive numbers $\delta_1, \delta_2, \ldots, \delta_n$ such that
$$\{ f : |f(x_j) - w_j| < \delta_j,\ j = 1, 2, \ldots, n \} \subset B_1(0).$$
However, it is easy to construct a function $g \in C([a,b])$ with $g(x_j) = w_j$ for $j = 1, 2, \ldots, n$ and $g(c) = 2$ for some $c \in [a,b] \setminus \{x_1, x_2, \ldots, x_n\}$. Clearly, $g$ is an element of the set on the left of the previous display but cannot belong to $B_1(0)$, since $\|g\|_\infty \ge 2$. Hence, we have a contradiction.
Thus, $\mathcal{T}_\mathcal{F} \ne \mathcal{T}_\infty$ and we conclude that $\mathcal{T}_\mathcal{F}$ is strictly weaker than $\mathcal{T}_\infty$. □
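The function $g$ in Example 7.9 can be produced explicitly. One possible construction (an assumption of this sketch; the text only asserts that such a $g$ is easy to find) is to pick a point $c \in [a,b]$ different from every $x_j$ and take the piecewise-linear function through the points $(x_j, w_j)$ and $(c, 2)$, extended as a constant beyond the outermost nodes. The sketch assumes $a < b$ and that the $x_j$ are distinct; the helper names `piecewise_linear` and `witness` are hypothetical.

```python
def piecewise_linear(nodes):
    """Continuous piecewise-linear function through (t, value) pairs.

    The t's must be distinct reals; the values may be complex.  Outside the
    range of the t's the function is extended as a constant.
    """
    pts = sorted(nodes, key=lambda p: p[0])

    def g(x):
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
            if t0 <= x <= t1:
                lam = (x - t0) / (t1 - t0)       # 0 at t0, 1 at t1
                return (1 - lam) * v0 + lam * v1

    return g


def witness(a, b, xs, ws):
    """Return (g, c): g continuous on [a, b], g(x_j) = w_j, and g(c) = 2."""
    grid = sorted(set([a, b, *xs]))
    # The midpoint of two consecutive grid points differs from every x_j.
    c = (grid[0] + grid[1]) / 2
    g = piecewise_linear(list(zip(xs, ws)) + [(c, 2)])
    return g, c


xs = [0.25, 0.5, 1.0]
ws = [0.1, -0.3 + 0.2j, 0.0]
g, c = witness(0.0, 1.0, xs, ws)
print(c, [g(x) for x in xs], g(c))
```

Any $g$ built this way lies in the basic weak neighborhood from the proof, whatever the $\delta_j$ are, yet $\|g\|_\infty \ge |g(c)| = 2$, so $g \notin B_1(0)$.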
Product Topologies

Suppose that $\{(\Omega_\iota, \mathcal{T}_\iota)\}_{\iota \in I}$ is an indexed family of topological spaces. The idea of a weak topology can be used to define a topology on the Cartesian product $\Omega = \times_{\iota \in I} \Omega_\iota$. We recall from Definition 1.11 on page 18 that each element $f \in \Omega$ is a function on $I$ such that $f(\iota) \in \Omega_\iota$ for each $\iota \in I$. Furthermore, we know from the axiom of choice that $\Omega \ne \emptyset$ provided $\Omega_\iota \ne \emptyset$ for all $\iota \in I$.

DEFINITION 7.11 Product Topology

Let $\{(\Omega_\iota, \mathcal{T}_\iota)\}_{\iota \in I}$ be an indexed family of topological spaces and set $\Omega = \times_{\iota \in I} \Omega_\iota$. The function $p_\iota$ defined by $p_\iota(f) = f(\iota)$ is called the $\iota$th coordinate projection on $\Omega$. The weak topology on $\Omega$ determined by the family of coordinate projections $\{ p_\iota : \iota \in I \}$ is called the product topology.

Thus, the product topology is the weakest topology for which all coordinate projections are continuous. Examples of product topologies are discussed in the exercises.

EXERCISES 7.3

7.33 Let $\mathcal{I}$ be a collection of topologies on a set $\Omega$. Show that if a function $f$ is continuous with respect to every member of $\mathcal{I}$, then it is continuous with respect to the intersection of $\mathcal{I}$, that is, with respect to $\bigcap_{\mathcal{T} \in \mathcal{I}} \mathcal{T}$.
7.34 Refer to Definition 7.10 on page 428. Prove that $\mathcal{T}_\mathcal{F}$ is the weakest topology on $\Omega$ for which all $f \in \mathcal{F}$ are continuous. That is, prove the following:
a) Each $f \in \mathcal{F}$ is continuous with respect to $\mathcal{T}_\mathcal{F}$.
b) If $\mathcal{U}$ is a topology on $\Omega$ such that each $f \in \mathcal{F}$ is continuous with respect to $\mathcal{U}$, then $\mathcal{T}_\mathcal{F}$ is weaker than $\mathcal{U}$.
7.35 Show that the function $L: C([a,b]) \to \mathbb{C}$ defined by $L(f) = \int_a^b f(x)\,dx$ is not continuous with respect to the weak topology defined in Example 7.9 on page 429.
7.36 Refer to Example 7.9 on page 429. For $A \subset [a,b]$, let $\mathcal{F}_A = \{ e_x : x \in A \}$. Show that if $A$ is a proper subset of $B \subset [a,b]$, then $\mathcal{T}_{\mathcal{F}_A}$ is strictly weaker than $\mathcal{T}_{\mathcal{F}_B}$.
7.37 In Example 7.9 on page 429, show that every $\mathcal{T}_\mathcal{F}$-open set that contains the constant function 0 must also contain a nonzero linear subspace of $C([a,b])$.
7.38 Let $\Omega$ and $\Lambda$ be linear spaces having the same scalar field, $\mathbb{F}$. A function $L: \Omega \to \Lambda$ is called a linear mapping or linear operator if for all $x, y \in \Omega$ and all scalars $\alpha \in \mathbb{F}$,
$$L(x + y) = L(x) + L(y) \quad\text{and}\quad L(\alpha x) = \alpha L(x).$$
Suppose that $L: C([a,b]) \to \mathbb{C}$ is linear and continuous with respect to the weak topology $\mathcal{T}_\mathcal{F}$ defined in Example 7.9 on page 429. Show that there are finitely many points $x_1, x_2, \ldots, x_n \in [a,b]$ and constants $c_1, c_2, \ldots, c_n \in \mathbb{C}$ such that
$$L = c_1 e_{x_1} + c_2 e_{x_2} + \cdots + c_n e_{x_n},$$
where $e_x(f) = f(x)$. Hint: Find a finite set of points $\{x_1, \ldots, x_n\} \subset [a,b]$ such that if $g(x_j) = 0$ for $1 \le j \le n$, then $|L(g)| < 1$.
7.39 Refer to Exercise 7.12 on page 419. Show that if $p$ is continuous with respect to some topology $\mathcal{U}$ on $\Omega/\!\equiv$, then $\mathcal{U}$ is weaker than $\mathcal{T}_\equiv$.
7.40 Show that the product topology on $\mathbb{C}^n$ is the same as the topology defined by any of the norms in Example 7.5 on page 422.
7.41 The space $\ell^2(\mathcal{N})$ is a subset of the Cartesian product $\mathbb{C}^{\mathcal{N}}$. Thus, $\ell^2(\mathcal{N})$ can be given the relative product topology $\mathcal{T}$. Show that $\mathcal{T}$ is strictly weaker than the topology induced by the norm $\|\cdot\|_2$.
7.42 Do Exercise 7.41 with $\ell^1(\mathcal{N})$ replacing $\ell^2(\mathcal{N})$ and $\|\cdot\|_1$ replacing $\|\cdot\|_2$.
7.43 Do Exercise 7.41 with $\ell^\infty(\mathcal{N})$ replacing $\ell^2(\mathcal{N})$ and $\|\cdot\|_\infty$ replacing $\|\cdot\|_2$.
7.44 Consider the Cantor set $P$ as defined on pages 74-75. Recall that each $x \in P$ has a unique ternary expansion of the form $x = \sum_{n=1}^\infty a_n(x) 3^{-n}$, where $a_n(x) \in \{0, 2\}$ for all $n \in \mathcal{N}$. Define $A: P \to \{0, 2\}^{\mathcal{N}}$ by $A(x)_n = a_n(x)$. Suppose that $\{0, 2\}$ is given the discrete topology and $\{0, 2\}^{\mathcal{N}}$ is given the corresponding product topology. Prove that $A$ is continuous.

7.4 CLOSED SETS, CONVERGENCE, AND COMPLETENESS

In this section, we discuss closed sets and convergence in topological and metric spaces and some related topics as well. We assume throughout that $(\Omega, \mathcal{T})$ is a topological space.
DEFINITION 7.12 Closed Set

A subset $F$ of a topological space is said to be closed if $F^c$ is open.

Note: From Proposition 2.14 on page 61, we see that the closed subsets of $\mathcal{R}$, as given by Definition 2.9 on page 61, are also closed in the sense of Definition 7.12, and vice versa.

It follows immediately from Definition 7.12 and the definition of a topology (Definition 7.2 on page 415) that the collection $\mathcal{C}$ of closed sets satisfies the following conditions:
(C1) $\emptyset, \Omega \in \mathcal{C}$.
(C2) $\mathcal{S} \subset \mathcal{C}$ implies $\bigcap_{F \in \mathcal{S}} F \in \mathcal{C}$.
(C3) $F_1, F_2 \in \mathcal{C}$ implies $F_1 \cup F_2 \in \mathcal{C}$.
Conversely, if $\mathcal{C}$ is a collection of subsets of $\Omega$ satisfying (C1)-(C3), then $\{ F^c : F \in \mathcal{C} \}$ is a topology on $\Omega$ for which $\mathcal{C}$ is the collection of closed sets.

A simple example of a closed subset of $\mathcal{R}$ is $[a,b]$, where $a < b$, because $[a,b]^c = (-\infty, a) \cup (b, \infty)$. On the other hand, an interval of the form $[a,b)$ is not closed.

Limit Points, Closure, and Convergent Sequences

Next we define the limit points and closure of a set.

DEFINITION 7.13 Limit Point, Closure

Let $E$ be a subset of a topological space $\Omega$. A point $x \in \Omega$ is called a limit point of $E$ if each open set containing $x$ intersects $E$; that is, if $O$ is open and $x \in O$, then $O \cap E \ne \emptyset$. The set of all limit points of $E$, denoted $\overline{E}$, is called the closure of $E$.

Note: If $\overline{E} = \Omega$ (i.e., every point of $\Omega$ is a limit point of $E$), then we say that $E$ is dense in $\Omega$. Thus, we see that $E$ is dense in $\Omega$ if and only if it has a nonempty intersection with every nonempty open set.

We leave it to the reader as an exercise to show that the following properties hold:
• $\overline{E}$ is the intersection of all closed sets that contain $E$ and, hence, $\overline{E}$ is the smallest closed set containing $E$.
• Let $\mathfrak{N}$ be a neighborhood basis that determines the topology. Then
$$x \in \overline{E} \text{ if and only if } x \in W \in \mathfrak{N} \text{ implies } W \cap E \ne \emptyset. \tag{7.5}$$

Condition (7.5) suggests that we can interpret $\overline{E}$ as the set of points of $\Omega$ that can be “approximated arbitrarily closely” by points of $E$. Thus, in the case of the real line, the rational numbers are dense since any real number can be approximated arbitrarily closely by rational numbers.

The next proposition, whose proof is left to the reader as an exercise, provides the basic properties of the closure operation.

PROPOSITION 7.8 Let $A, B \subset \Omega$. Then
a) $\overline{A} = A$ if and only if $A$ is closed.
b) $\overline{\overline{A}} = \overline{A}$.
c) $\overline{A \cup B} = \overline{A} \cup \overline{B}$.
d) $A \subset B$ implies $\overline{A} \subset \overline{B}$.
e) $A \subset F$ and $F$ closed implies $\overline{A} \subset F$.

Remark: It follows easily from Proposition 7.8(c) that $\overline{\bigcup_{k=1}^n A_k} = \bigcup_{k=1}^n \overline{A_k}$ whenever $A_1, A_2, \ldots, A_n$ are subsets of $\Omega$. (See Exercise 7.49.)

When a topological space is metrizable, there is a useful characterization of limit point in terms of sequences. To give that characterization, we must first define convergence of a sequence in a topological space.

DEFINITION 7.14 Convergent Sequence in a Topological Space

Let $(\Omega, \mathcal{T})$ be a topological space. A sequence $\{x_n\}_{n=1}^\infty$ of points in $\Omega$ is said to converge to the point $x \in \Omega$ if for each open set $O$ containing $x$, there is an integer $N$ such that $x_n \in O$ whenever $n \ge N$. Convergence of $\{x_n\}_{n=1}^\infty$ to $x$ is denoted by
$$\lim_{n\to\infty} x_n = x \quad\text{or}\quad x_n \to x$$
or, in case it is important to indicate the topology in which convergence occurs, by $x_n \xrightarrow{\mathcal{T}} x$.

It is not hard to see that if the topology on $\Omega$ is induced by a metric $\rho$, then $x_n \to x$ if and only if $\rho(x_n, x) \to 0$. In case the topology is the weak topology determined by a family of functions $\mathcal{F}$, it can be shown that $x_n \to x$ if and only if $f(x_n) \to f(x)$ for each $f \in \mathcal{F}$. (See Exercise 7.54.)
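The gap between $\mathcal{T}_\mathcal{F}$ and $\mathcal{T}_\infty$ in Example 7.9 also shows up in sequences. Granting the characterization of weak convergence just stated (Exercise 7.54), a sequence in $C([a,b])$ converges in $\mathcal{T}_\mathcal{F}$ exactly when it converges pointwise. The Python sketch below uses a "moving tent" sequence on $[0,1]$, an illustration of our own choosing rather than one from the text: $f_n(x) = \max\{0, 1 - |nx - 1|\}$. Each $f_n(x) \to 0$, yet $\|f_n\|_\infty = f_n(1/n) = 1$ for every $n$, so $f_n \to 0$ in $\mathcal{T}_\mathcal{F}$ but not in $\mathcal{T}_\infty$. Exact rational arithmetic (`fractions.Fraction`) keeps the checks free of rounding error; the helper names are hypothetical.

```python
import math
from fractions import Fraction


def tent(n):
    """f_n(x) = max(0, 1 - |n x - 1|): continuous on [0, 1], with peak value 1
    at x = 1/n and f_n = 0 outside [0, 2/n]."""
    def f(x):
        return max(Fraction(0), 1 - abs(n * x - 1))
    return f


def vanishing_index(x):
    """Smallest N with f_n(x) = 0 for every n >= N, for a rational 0 < x <= 1."""
    # For x > 0, f_n(x) = 0 exactly when n x - 1 >= 1, that is, when n >= 2 / x.
    return math.ceil(Fraction(2) / x)


x = Fraction(3, 100)
N = vanishing_index(x)
print(N, tent(N - 1)(x), tent(N)(x))                     # values at x die out
print([tent(n)(Fraction(1, n)) for n in (1, 10, 1000)])  # peaks stay at 1
```

Pointwise convergence at every $x$ therefore coexists with $\|f_n - 0\|_\infty = 1$ for all $n$.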
PROPOSITION 7.9 Suppose that the set $\Omega$ has the topology induced by a metric $\rho$. Let $E \subset \Omega$. Then $x \in \overline{E}$ if and only if there exists a sequence $\{x_n\}_{n=1}^\infty$ of points of $E$ such that $\lim_{n\to\infty} x_n = x$.

PROOF: Suppose that $x \in \overline{E}$. Then, by (7.5), for each positive integer $n$ there exists an $x_n \in B_{1/n}(x) \cap E$. Because $\rho(x_n, x) < 1/n$, it follows that $\lim_{n\to\infty} x_n = x$.

Conversely, suppose that $\{x_n\}_{n=1}^\infty$ is a sequence of elements of $E$ such that $\lim_{n\to\infty} x_n = x$. Then, for each $\epsilon > 0$, there is a positive integer $N$ such that $\rho(x_n, x) < \epsilon$ whenever $n \ge N$. It follows that $B_\epsilon(x) \cap E \ne \emptyset$. Thus, by condition (7.5), we have that $x \in \overline{E}$.

As a simple illustration of the previous proposition, consider the case where $\Omega = \mathcal{R}$ and $E = \mathcal{Q}$. Since every real number is the limit of a sequence of rational numbers, it follows that $\overline{\mathcal{Q}} = \mathcal{R}$. A more elaborate application of Proposition 7.9 is discussed in the following example.

EXAMPLE 7.10 Illustrates a Nonmetrizable Space

Let $\Omega$ be the Cartesian product $\{0, 1\}^{\mathcal{R}}$. Let $\{0, 1\}$ have the discrete topology and $\Omega$ the corresponding product topology. Recall that $\Omega$ is the set consisting of all functions from $\mathcal{R}$ to $\{0, 1\}$ and the product topology on $\Omega$ is the weak topology determined by the family of functions $\{ p_t : t \in \mathcal{R} \}$, where $p_t(f) = f(t)$. The product topology is determined by the neighborhood basis of sets of the form
$$\{ f \in \Omega : f(t_k) = a_k,\ k = 1, 2, \ldots, n \}, \tag{7.6}$$
where $n \in \mathcal{N}$, $t_k \in \mathcal{R}$, and $a_k \in \{0, 1\}$ for $1 \le k \le n$. Consider the set
$$U = \{ f \in \Omega : f^{-1}(\{0\}) \text{ is countable} \}.$$
We claim that $U$ is dense in $\Omega$. Indeed, the intersection of $U$ with each set of the form (7.6) contains the function $g$ defined by $g(t_k) = a_k$ for $k = 1, 2, \ldots, n$ and $g(t) = 1$ for $t \in \mathcal{R} \setminus \{t_1, t_2, \ldots, t_n\}$. In particular, $U$ has a nonempty intersection with every set in a neighborhood basis determining the topology of $\Omega$. Hence $U$ is dense in $\Omega$.

We claim that $\Omega$ is not metrizable. Suppose to the contrary.
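Then, by Proposition 7.9, there is a sequence $\{f_n\}_{n=1}^\infty \subset U$ converging to the function on $\mathcal{R}$ that is identically 0. It follows from Exercise 7.54 that $\lim_{n\to\infty} f_n(t) = 0$ for each $t \in \mathcal{R}$. But, for each $t$ in the complement of the countable set $\bigcup_{n=1}^\infty f_n^{-1}(\{0\})$ (a complement that is nonempty because $\mathcal{R}$ is uncountable), we have $\lim_{n\to\infty} f_n(t) = 1$. This contradiction shows that $\Omega$ is not metrizable. □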
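The first half of the proof of Proposition 7.9 picks a point of $E$ in each ball $B_{1/n}(x)$. For $E = \mathcal{Q}$ and the particular point $x = \sqrt{2}$ (a choice made only for this illustration), such points can be written down exactly: $x_n = \lfloor n\sqrt{2} \rfloor / n$ satisfies $0 \le \sqrt{2} - x_n < 1/n$. The Python sketch below computes $\lfloor n\sqrt{2} \rfloor$ as the integer square root of $2n^2$ (`math.isqrt`, Python 3.8 or later) and certifies both inequalities with exact rational arithmetic, so no floating-point approximation of $\sqrt{2}$ is involved. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def rational_near_sqrt2(n):
    """Return x_n = floor(n * sqrt(2)) / n, a rational point of B_{1/n}(sqrt(2))."""
    m = isqrt(2 * n * n)          # m = floor(sqrt(2 n^2)) = floor(n sqrt(2))
    return Fraction(m, n)


def in_ball_around_sqrt2(q, n):
    """Exact test of 0 <= sqrt(2) - q < 1/n for a nonnegative rational q."""
    # For q >= 0: q <= sqrt(2) iff q^2 <= 2, and sqrt(2) < q + 1/n iff 2 < (q + 1/n)^2.
    return q * q <= 2 < (q + Fraction(1, n)) ** 2


for n in (1, 10, 1000):
    x_n = rational_near_sqrt2(n)
    print(n, x_n, in_ball_around_sqrt2(x_n, n))
```

Since $\rho(x_n, \sqrt{2}) < 1/n$, the sequence $\{x_n\}$ converges to $\sqrt{2}$, exactly as in the proof.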
Completeness

There is a powerful extension of the notion of closed set, namely, the idea of a complete set. Before we can give a formal definition of a complete set, however, we require the following:

DEFINITION 7.15 Cauchy Sequence

A sequence $\{x_n\}_{n=1}^\infty$ in a metric space $(\Omega, \rho)$ is said to be a Cauchy sequence if for each $\epsilon > 0$, there is an $N \in \mathcal{N}$ such that $\rho(x_n, x_m) < \epsilon$ whenever $n, m \ge N$.

In the space $\mathcal{R}$ with the usual metric, this definition of a Cauchy sequence is exactly the same as Definition 2.6 on page 52. Cauchy sequences in $\mathcal{R}$ always converge by Theorem 2.1 on page 53. In general, however, Cauchy sequences may fail to converge. Metric spaces for which all Cauchy sequences converge are called complete.

DEFINITION 7.16 Complete Metric Space, Complete Set

A metric space $(\Omega, \rho)$ is said to be complete if every Cauchy sequence converges; that is, if $\{x_n\}_{n=1}^\infty$ is a Cauchy sequence of elements of $\Omega$, then there exists an $x \in \Omega$ such that $\lim_{n\to\infty} x_n = x$. A subset $E \subset \Omega$ is called complete if $(E, \rho)$ is a complete metric space.

The real line, $\mathcal{R}$, provides an example of a complete metric space. Many other examples will be encountered in the exercises in this section and in the text and exercises of future sections.

Our next proposition, whose proof is left to the reader as an exercise, relates the concepts of closed and complete.

PROPOSITION 7.10 Let $\Omega$ be a metric space and $E \subset \Omega$. Then the following hold.
a) If $E$ is complete, then it is closed.
b) If $\Omega$ is complete and $E$ is closed, then $E$ is complete.

The converse of Proposition 7.10(a) fails. Indeed, the interval $(0, 1]$ is closed in the relative topology of the space $(0, 2)$; however, $(0, 1]$ is not
complete because the sequence $\{1/n\}_{n=1}^\infty$ is a Cauchy sequence in $(0, 1]$ but not convergent in $(0, 1]$.

Interior of a Set

As we have seen, $\overline{E}$ is the smallest closed set containing $E$. Similarly, there is a largest open set contained in $E$, defined as follows.

DEFINITION 7.17 Interior of a Set

Let $E$ be a subset of a topological space $\Omega$. A point $x \in \Omega$ is called an interior point of $E$ if there is an open set $O$ such that $x \in O \subset E$. The set of all interior points of $E$, denoted $E^\circ$, is called the interior of $E$.

Remark: We note that the interior of a set may be empty. For example, if we take $\Omega = \mathcal{R}$, then $\mathcal{Q}^\circ = \emptyset$.

We leave it to the reader as an exercise to show that each of the following properties holds:
• $E^\circ$ is the union of all open sets contained in $E$ and, hence, $E^\circ$ is the largest open set contained in $E$.
• Let $\mathfrak{N}$ be a neighborhood basis that determines the topology. Then $x \in E^\circ$ if and only if there is a $W \in \mathfrak{N}$ such that $x \in W \subset E$.

The following is the analogue of Proposition 7.8 for the interior of a set. Its proof is left to the reader as an exercise. (See Exercise 7.63.)

PROPOSITION 7.11 Let $A, B \subset \Omega$. Then
a) $A^\circ = A$ if and only if $A$ is open.
b) $(A^\circ)^\circ = A^\circ$.
c) $(A \cap B)^\circ = A^\circ \cap B^\circ$.
d) $A \subset B$ implies $A^\circ \subset B^\circ$.
e) $U \subset A$ and $U$ open implies $U \subset A^\circ$.

EXERCISES 7.4

7.45 Let $E$ be a subset of a topological space $\Omega$. Prove the following facts.
a) $\overline{E}$ is the intersection of all closed sets that contain $E$ and, hence, $\overline{E}$ is the smallest closed set containing $E$.
b) Let $\mathfrak{N}$ be a neighborhood basis that determines the topology. Then $x \in \overline{E}$ if and only if $x \in W \in \mathfrak{N}$ implies $W \cap E \ne \emptyset$.
7.46 Prove Proposition 7.8 on page 433.
★7.47 Let $(\Omega, \rho)$ be a metric space. For $x \in \Omega$ and $\emptyset \ne E \subset \Omega$, define
$$\rho(x, E) = \inf\{ \rho(x, y) : y \in E \}.$$
$\rho(x, E)$ is called the distance from $x$ to $E$. Prove the following:
a) There is a sequence $\{x_n\}_{n=1}^\infty \subset E$ such that $\lim_{n\to\infty} \rho(x, x_n) = \rho(x, E)$.
b) $\rho(x, E) = 0$ if and only if $x \in \overline{E}$.
c) $\rho(x, \overline{E}) = \rho(x, E)$.
d) $|\rho(x_1, E) - \rho(x_2, E)| \le \rho(x_1, x_2)$.
e) The function $f: \Omega \to \mathcal{R}$ defined by $f(x) = \rho(x, E)$ is continuous.
f) Let $A$ and $B$ be disjoint closed nonempty subsets of $\Omega$. Define
$$f(x) = \frac{\rho(x, A)\,\bigl(1 + \rho(x, B)\bigr)}{\rho(x, A) + \bigl(1 + \rho(x, A)\bigr)\,\rho(x, B)}.$$
Prove that $f$ is continuous, $f(\Omega) \subset [0, 1]$, $f(A) = \{0\}$, and $f(B) = \{1\}$.
★7.48 Consider an open ball $B_r(x)$ in a metric space $(\Omega, \rho)$.
a) Show that $\overline{B_r(x)} \subset \{ y : \rho(x, y) \le r \}$.
b) Show that equality holds in part (a) for the case of a normed space.
c) Give an example where the containment in part (a) is strict.
d) The set $\{ y : \rho(x, y) \le r \}$ is called the closed ball of radius $r$ centered at $x$. Verify that the closed ball is a closed set.
7.49 Verify the formula $\overline{\bigcup_{k=1}^n A_k} = \bigcup_{k=1}^n \overline{A_k}$.
7.50 Suppose that $\Omega$ and $\Lambda$ are topological spaces and $f: \Omega \to \Lambda$. Show that $f$ is continuous if and only if $f^{-1}(F)$ is closed in $\Omega$ whenever $F$ is closed in $\Lambda$.
7.51 Suppose that $\Omega$ and $\Lambda$ are topological spaces and $f: \Omega \to \Lambda$. Show that $f$ is continuous if and only if $f(\overline{A}) \subset \overline{f(A)}$ for all $A \subset \Omega$.
7.52 Suppose $\Omega$ and $\Lambda$ are topological spaces and $f: \Omega \to \Lambda$. If $f$ is continuous, does it follow that $f(E)$ is closed (open) whenever $E$ is a closed (open) subset of $\Omega$? Explain your answer.
★7.53 Suppose $\Omega$ and $\Lambda$ are topological spaces and $f: \Omega \to \Lambda$.
a) Show that the condition “$f(x_n) \to f(x)$ whenever $x_n \to x$” is necessary for $f$ to be continuous.
b) Show that the condition in part (a) is sufficient when $\Omega$ is metrizable. Hint: Refer to Exercise 7.32 on page 427.
7.54 Suppose that $\Omega$ has the weak topology determined by some set $\mathcal{F}$ of functions. Let $\{x_n\}_{n=1}^\infty$ be a sequence of points of $\Omega$ and $x \in \Omega$. Show that $x_n \to x$ if and only if $f(x_n) \to f(x)$ for all $f \in \mathcal{F}$. Hint: See Exercise 7.53.
★7.55 Show that a Cauchy sequence is convergent if it has a convergent subsequence.
7.56 Show that if a sequence in a metric space is convergent, then it is Cauchy.
7.57 Give an example of a nonconvergent Cauchy sequence.
7.58 Prove Proposition 7.10 on page 435.
★7.59 Show that $\mathcal{R}^n$ is complete in each of the norms defined in Example 7.4 on page 422.
★7.60 Show that $\mathbb{C}^n$ is complete in each of the norms defined in Example 7.5 on page 422.
7.61 Let $\Omega$ be a nonempty set. Show that each of the spaces $\ell^1(\Omega)$, $\ell^2(\Omega)$, and $\ell^\infty(\Omega)$, described in Example 7.6 on page 423, is complete.
7.62 Let $E$ be a subset of a topological space $\Omega$. Prove the following facts.
a) $E^\circ$ is the union of all open sets contained in $E$ and, hence, $E^\circ$ is the largest open set contained in $E$.
b) Let $\mathfrak{N}$ be a neighborhood basis that determines the topology. Then $x \in E^\circ$ if and only if there is a $W \in \mathfrak{N}$ such that $x \in W \subset E$.
7.63 Prove Proposition 7.11 on page 436.
★7.64 Let $\Omega$ be a topological space. For $E \subset \Omega$, define $\partial E = \overline{E} \setminus E^\circ$. The set $\partial E$ is called the boundary of $E$. Prove the following:
a) $\partial E$ is closed.
b) $E$ is closed if and only if $\partial E \subset E$.
c) $(\partial E)^\circ = \emptyset$.
d) $\partial E = \overline{E} \cap \overline{E^c}$.

7.5 NETS AND CONTINUITY

Proposition 7.9 on page 434 describes the limit points of a subset of a metric space in terms of convergent sequences. Example 7.10 on page 434, on the other hand, shows that a similar characterization fails to be correct for general topological spaces. In this section, we present a generalization of sequences that is flexible enough to permit a version of Proposition 7.9 to hold for general topological spaces and provides as well an alternative method for characterizing continuity.

We first introduce the concept of a directed set.

DEFINITION 7.18 Directed Set

A directed set is a nonempty set $I$ together with a relation $\preceq$ having the following properties:
a) $\iota \preceq \iota$ for each $\iota \in I$.
b) $\iota_1 \preceq \iota_2$ and $\iota_2 \preceq \iota_3$ imply $\iota_1 \preceq \iota_3$.
c) $\iota_1, \iota_2 \in I$ implies there exists $\iota_3 \in I$ such that $\iota_1 \preceq \iota_3$ and $\iota_2 \preceq \iota_3$.
An element of a directed set is called an index.

Remark: It follows easily from Definition 7.18 that for each finite subset $J$ of $I$, there exists a $\kappa \in I$ such that $\iota \preceq \kappa$ for each $\iota \in J$.

EXAMPLE 7.11 Illustrates Definition 7.18

a) A nonempty subset of real numbers with the order relation $\le$ is a directed set. In particular, the set of integers greater than or equal to some fixed integer is a directed set.
b) Let $\mathfrak{N}$ be a neighborhood basis and $\mathfrak{N}_x = \{ N \in \mathfrak{N} : x \in N \}$. For $U, V \in \mathfrak{N}_x$, say that $U \preceq V$ if $U \supset V$. Then $\mathfrak{N}_x$ is a directed set with respect to the relation $\preceq$, that is, with respect to $\supset$.
c) Let $\mathcal{P}_\mathcal{F}(S)$ denote the collection of finite subsets of a set $S$. Then $\mathcal{P}_\mathcal{F}(S)$ is a directed set with respect to $\subset$. □

DEFINITION 7.19 Net, Convergence of Nets

A net of points in a set $\Omega$ is a function $x$ from a directed set $I$ into $\Omega$. The set $I$ is called the index set of the net. We write $x_\iota$ for $x(\iota)$ and denote the net by $\{x_\iota\}_{\iota \in I}$.

When $(\Omega, \mathcal{T})$ is a topological space, a net $\{x_\iota\}_{\iota \in I}$ of points in $\Omega$ is said to converge to the point $x \in \Omega$ if for each open set $O$ containing $x$, there is an index $\iota_0 \in I$ such that $x_\iota \in O$ whenever $\iota_0 \preceq \iota$. Convergence of $\{x_\iota\}_{\iota \in I}$ to $x$ is denoted by
$$\lim x_\iota = x \quad\text{or}\quad x_\iota \to x$$
or, in case it is important to indicate the topology in which convergence occurs, by $x_\iota \xrightarrow{\mathcal{T}} x$.

EXAMPLE 7.12 Illustrates Definition 7.19

a) A sequence is a net with $I = \mathcal{N}$ and with $\preceq$ equal to $\le$. A sequence in a topological space that converges to a point $x$ is also a net converging to $x$.
b) A slightly more general situation than in part (a) occurs when a sequence of the form $\{x_n\}_{n=1}^\infty$ is replaced by a net indexed by the set
of integers $\{ j \in \mathcal{Z} : j \ge k \}$ for some integer $k$, where the relation $\preceq$ is the usual $\le$ ordering. Such nets are customarily also referred to as sequences.
c) Consider a function $f: [a, \infty) \to \mathcal{R}$. Since $[a, \infty)$ is a directed set with respect to the relation $\le$, the function $f$ can be viewed as a net $\{f_t\}_{t \in [a, \infty)}$ in $\mathcal{R}$, where $f_t = f(t)$. Furthermore, $\lim f_t = L$ if and only if for each $\epsilon > 0$, there is a number $M$ such that $|f(t) - L| < \epsilon$ whenever $t \ge a \vee M$.
d) Refer to Example 7.11(b). Suppose that $x_U \in U$ for each $U \in \mathfrak{N}_x$. Then the net $\{x_U\}_{U \in \mathfrak{N}_x}$ converges to $x$.
e) Let $f$ be a Riemann integrable function on $[a,b]$ and $\mathcal{S}$ the collection of step functions on $[a,b]$ that are dominated by $f$. Then $\mathcal{S}$ is a directed set with respect to the usual $\le$ ordering of functions. For each $h \in \mathcal{S}$, let $y_h = \int_a^b h(x)\,dx$. Then $\{y_h\}_{h \in \mathcal{S}}$ is a net of real numbers and we have that $\lim y_h = \int_a^b f(x)\,dx$. (See Section 2.6 starting on page 81.) □

Infinite Series and Infinite Sums

Using nets, we can now discuss infinite series and infinite sums in normed linear spaces.

DEFINITION 7.20 Infinite Series and Sums in Normed Spaces

Let $\Omega$ be a normed space and $S$ an infinite subset of $\mathcal{Z}$ with the usual $\le$ ordering. Suppose that $x_j \in \Omega$ for each $j \in S$.
a) Assume that $S = \{ j \in \mathcal{Z} : j \ge k \}$ for some integer $k$. Then the expression
$$\sum_{j=k}^\infty x_j$$
is called an infinite series. Let $s_n = \sum_{j=k}^n x_j$ for each $n \in S$. If the net $\{s_n\}_{n \in S}$ converges in $\Omega$ to $s$, then we say that the infinite series $\sum_{j=k}^\infty x_j$ converges to $s$ and write $s = \sum_{j=k}^\infty x_j$. Otherwise, we say that the infinite series fails to converge.
b) In general, the expression
$$\sum_{j \in S} x_j$$
is called an infinite sum. Let $s_F = \sum_{j \in F} x_j$ for each $F \in \mathcal{P}_\mathcal{F}(S)$, where $\mathcal{P}_\mathcal{F}(S)$ denotes the collection of all finite subsets of $S$ with the $\subset$ ordering. If the net $\{s_F\}_{F \in \mathcal{P}_\mathcal{F}(S)}$ converges in $\Omega$ to $s$, then
we say that the infinite sum $\sum_{j \in S} x_j$ converges to $s$ and write $s = \sum_{j \in S} x_j$. Otherwise, we say that the infinite sum fails to converge. In the special case that $S$ is as in part (a), we denote the infinite sum $\sum_{j \in S} x_j$ by $\sum_{j \ge k} x_j$.

Remark: If $\Omega = \mathcal{R}$ and $x_j \ge 0$ for each $j \in S$, then infinite sums are a special case of generalized sums, discussed in Exercise 2.37 on page 57.

EXAMPLE 7.13 Illustrates Definition 7.20

a) Let $\Omega$ be a normed space, $S = \{ j \in \mathcal{Z} : j \ge k \}$ for some integer $k$, and $x_j \in \Omega$ for each $j \in S$. We will show that convergence of the infinite sum to $s$ implies convergence of the infinite series to $s$. Suppose that $\sum_{j \ge k} x_j$ converges to $s$. We must show that $\sum_{j=k}^\infty x_j$ also converges to $s$. Let $\epsilon > 0$ and choose a finite subset $F_0$ of $S$ such that $\|s - s_F\| < \epsilon$ whenever $F_0 \subset F$. Let $N = \max F_0$. Then for $n \ge N$, $F_0 \subset \{ j : k \le j \le n \}$; hence, $\|s - s_n\| < \epsilon$. Thus, $\sum_{j=k}^\infty x_j$ converges to $s$.
b) Let $\Omega$ be a normed space, $S$ an infinite subset of $\mathcal{Z}$, and $x_j \in \Omega$ for each $j$. It is not difficult to show that the infinite series $\sum_{j=k}^\infty x_j$ converges if and only if the infinite series $\sum_{j=\ell}^\infty x_j$ converges, where $k < \ell$. Similarly, the infinite sum $\sum_{j \in S} x_j$ converges if and only if the infinite sum $\sum_{j \in S \setminus F} x_j$ converges, where $F$ is a finite subset of $S$.
c) By Exercise 7.67, the series $\sum_{j=1}^\infty 1/j$ does not converge. In fact, we have $\lim_{n\to\infty} \sum_{j=1}^n 1/j = \infty$.
d) In part (a), we showed that if $S = \{ j \in \mathcal{Z} : j \ge k \}$, then convergence of an infinite sum to $s$ implies convergence of the corresponding infinite series to $s$. Here we show that the converse is false. By Exercise 7.68, the series $\sum_{j=1}^\infty (-1)^j/j$ converges. However, the infinite sum $\sum_{j \ge 1} (-1)^j/j$ fails to converge. Indeed, suppose $F_0$ is a finite subset of positive integers. Let $N = \max F_0$ and set $F = F_0 \cup \{ 2j : N < j \le n \}$. Then, by part (c) of this example,
$$\sum_{j \in F} \frac{(-1)^j}{j} = \sum_{j \in F_0} \frac{(-1)^j}{j} + \frac{1}{2} \sum_{j=N+1}^{n} \frac{1}{j}$$
can be made arbitrarily large by choosing $n$ sufficiently large. Hence, the infinite sum fails to converge.
e) In part (a), we showed that if $S = \{j \in \mathbb{Z} : j \ge k\}$, then convergence of an infinite sum to $s$ implies convergence of the corresponding infinite series to $s$ and, in part (d), we showed that the converse of that statement
is false. We will now show that in the special case of $\Omega = \mathbb{R}$ and $x_j \ge 0$ for each $j \in S$, the converse is true. Thus, assume that $x_j \in [0, \infty)$ for each $j \in S$ and that $\sum_{j=k}^{\infty} x_j$ converges to $s$. Let $\epsilon > 0$. Choose $N$ so that $n \ge N$ implies $|s - \sum_{j=k}^{n} x_j| < \epsilon$. Let $F_0 = \{j : k \le j \le N\}$. Then $F_0 \in \mathcal{P}_f(S)$ and $F_0 \subset F$ implies
$$s - \epsilon < \sum_{j=k}^{N} x_j \le \sum_{j \in F} x_j \le \sum_{j=k}^{M} x_j < s + \epsilon,$$
where $M = \max F$. Therefore, $\sum_{j \ge k} x_j$ converges to $s$.
f) Assume the $x_j$'s belong to a normed space. If the series $\sum_{j=k}^{\infty} \|x_j\|$ converges, then $\sum_{j \ge k} x_j$ converges to $s$ if and only if $\sum_{j=k}^{\infty} x_j$ converges to $s$. □
Remark: When the $x_j$'s are nonnegative real numbers, $\sum_{j=k}^{\infty} x_j$ fails to converge if and only if the partial sums $s_n$ become arbitrarily large as $n$ increases, that is, $\lim_{n \to \infty} s_n = \infty$. Consequently, in this case, we often indicate convergence of $\sum_{j=k}^{\infty} x_j$ by $\sum_{j=k}^{\infty} x_j < \infty$ and lack of convergence by $\sum_{j=k}^{\infty} x_j = \infty$.
Nets and Topological Properties
Using nets, we can generalize Proposition 7.9 on page 434 to arbitrary topological spaces.
PROPOSITION 7.12
Let $E$ be a subset of a topological space $\Omega$. Then $x \in \overline{E}$ if and only if there is a net $\{x_\iota\}_{\iota \in I}$ of points in $E$ such that $\lim x_\iota = x$.
PROOF: Suppose that $x \in \overline{E}$. Let $\mathcal{T}_x$ be the collection of all open sets containing $x$. Then $\mathcal{T}_x$ is a directed set with the relation $\supset$. For each $O \in \mathcal{T}_x$, we have $O \cap E \ne \emptyset$; using the axiom of choice, we select $x_O \in O \cap E$. Then $\{x_O\}_{O \in \mathcal{T}_x}$ is a net of points in $E$ such that $\lim x_O = x$.
Conversely, suppose there is a net $\{x_\iota\}_{\iota \in I}$ of points in $E$ such that $\lim x_\iota = x$. Then, for each open set $O$ containing $x$, we have $x_\iota \in O$ for some index $\iota$. Because $x_\iota \in E$, it follows that $O \cap E \ne \emptyset$. Hence, $x \in \overline{E}$.
We can also use nets to characterize continuity of functions. Before we do so, however, it will be convenient to introduce the idea of a subnet. For motivation, we note that a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ of a sequence $\{x_n\}_{n=1}^{\infty}$
is really the composition of the sequence (i.e., the function $x$ on $\mathbb{N}$) with the strictly increasing function $n: \mathbb{N} \to \mathbb{N}$ defined by $n(k) = n_k$. Thus, we have the following definition.
DEFINITION 7.21 Subnet
Let $\{x_\iota\}_{\iota \in I}$ be a net with order relation $\preceq$. A subnet of $\{x_\iota\}_{\iota \in I}$ is a composition of that net (i.e., the function $x$ on $I$) with a function $h: K \to I$, where $K$ is a directed set with order relation $\le$ such that the following conditions are satisfied:
a) If $\kappa_1 \le \kappa_2$, then $h(\kappa_1) \preceq h(\kappa_2)$.
b) For each $\iota \in I$, there is a $\kappa \in K$ such that $\iota \preceq h(\kappa)$.
Usually we write $\iota_\kappa$ instead of $h(\kappa)$ and denote the subnet $\{x_{h(\kappa)}\}_{\kappa \in K}$ by $\{x_{\iota_\kappa}\}_{\kappa \in K}$.
Of course, a subsequence is also a subnet. Other examples of subnets are considered in the exercises. We leave it to the reader as an exercise to show that if a net converges to $x$, then so does every subnet of that net.
We now use nets to characterize continuity of functions.
THEOREM 7.1
Let $\Omega$ and $\Lambda$ be topological spaces and $f: \Omega \to \Lambda$. Then the following conditions are equivalent:
a) For each $x \in \Omega$ and each open set $V$ containing $f(x)$, there is an open set $U$ containing $x$ such that $f(U) \subset V$.
b) $f$ is continuous, that is, $f^{-1}(O)$ is open in $\Omega$ whenever $O$ is open in $\Lambda$.
c) $f^{-1}(F)$ is closed in $\Omega$ whenever $F$ is closed in $\Lambda$.
d) If $\{x_\iota\}_{\iota \in I}$ is a net converging to $x$, then $\{f(x_\iota)\}_{\iota \in I}$ has a subnet converging to $f(x)$.
e) If $\{x_\iota\}_{\iota \in I}$ is a net converging to $x$, then $\{f(x_\iota)\}_{\iota \in I}$ converges to $f(x)$.
PROOF: The equivalence of (a) and (b) is shown by Proposition 7.3 on page 414 and the observation that a topology is a neighborhood basis for itself. The equivalence of (b) and (c) follows at once from the set-theoretic identity $f^{-1}(F^c) = (f^{-1}(F))^c$. To complete the proof, it suffices to establish the chain of implications (a) implies (e), (e) implies (d), and (d) implies (c).
Suppose (a) holds and that $\{x_\iota\}_{\iota \in I}$ is a net in $\Omega$ such that $\lim x_\iota = x$. Let $V$ be an open set containing $f(x)$. Then, by (a), there
is an open set $U$ containing $x$ such that $f(U) \subset V$. Since $\lim x_\iota = x$, there is an index $\iota_0$ such that $x_\iota \in U$ whenever $\iota_0 \preceq \iota$. It follows that $f(x_\iota) \in f(U) \subset V$ whenever $\iota_0 \preceq \iota$. Thus, $\lim f(x_\iota) = f(x)$.
Next, suppose that (e) holds and that $\{x_\iota\}_{\iota \in I}$ is a net in $\Omega$ such that $\lim x_\iota = x$. Then $\lim f(x_\iota) = f(x)$. Because $\{f(x_\iota)\}_{\iota \in I}$ is a subnet of itself, it follows that (d) holds.
Finally, suppose that (d) holds and that $F$ is a closed subset of $\Lambda$. We will show that $f^{-1}(F)$ is closed by proving that $\overline{f^{-1}(F)} = f^{-1}(F)$. Let $x \in \overline{f^{-1}(F)}$. By Proposition 7.12, there is a net $\{x_\iota\}_{\iota \in I}$ in $f^{-1}(F)$ converging to $x$. It follows from (d) that $\{f(x_\iota)\}_{\iota \in I}$ has a subnet $\{f(x_{\iota_\kappa})\}_{\kappa \in K}$ converging to $f(x)$. Since this subnet consists of points of $F$, Proposition 7.12 again gives $f(x) \in \overline{F} = F$ and, hence, $x \in f^{-1}(F)$. We have shown that $\overline{f^{-1}(F)} \subset f^{-1}(F)$. Because the reverse containment is trivial, we have established that $\overline{f^{-1}(F)} = f^{-1}(F)$. Consequently, by Proposition 7.8(a) on page 433, $f^{-1}(F)$ is closed.
Motivated by Theorem 7.1(a), we can define continuity at a point for a function from one topological space to another. Let $\Omega$ and $\Lambda$ be topological spaces and $f: \Omega \to \Lambda$. We say that $f$ is continuous at a point $x \in \Omega$ if for each open set $V$ containing $f(x)$, there is an open set $U$ containing $x$ such that $f(U) \subset V$. We see from Theorem 7.1 that $f$ is continuous if and only if it is continuous at each point of $\Omega$.
Criteria for Convergence of Nets
The two main types of topological spaces we have studied thus far are metric spaces (including normed spaces) and spaces with weak topologies. For each of these two types of topological spaces, we have a simple characterization of convergence of nets.
PROPOSITION 7.13
Let $(\Omega, \mathcal{T})$ be a topological space and let $\{x_\iota\}_{\iota \in I}$ be a net in $\Omega$.
a) If the topology $\mathcal{T}$ is induced by a metric $\rho$, then $\lim x_\iota = x$ if and only if $\lim \rho(x_\iota, x) = 0$.
b) If the topology $\mathcal{T}$ is the weak topology determined by a family of functions $\mathcal{F}$, then $\lim x_\iota = x$ if and only if $\lim f(x_\iota) = f(x)$ for each $f \in \mathcal{F}$.
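PROOF: a) Suppose that $\lim \rho(x_\iota, x) = 0$. If $O$ is an open set containing $x$, then there is an $\epsilon > 0$ such that $B_\epsilon(x) \subset O$. Moreover, there is an index $\iota_0$ such that $\rho(x_\iota, x) < \epsilon$ for $\iota_0 \preceq \iota$. Thus, $x_\iota \in O$ for $\iota_0 \preceq \iota$. It follows that $\lim x_\iota = x$. Conversely, suppose that $\lim x_\iota = x$. Then given $\epsilon > 0$,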
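there is an index $\iota_0$ such that $x_\iota \in B_\epsilon(x)$ for $\iota_0 \preceq \iota$. Hence, $\rho(x_\iota, x) < \epsilon$ whenever $\iota_0 \preceq \iota$. Thus, $\lim \rho(x_\iota, x) = 0$.
b) Suppose that $\lim x_\iota = x$. Each $f \in \mathcal{F}$ is continuous with respect to the weak topology, so, by Theorem 7.1(e), $\lim f(x_\iota) = f(x)$ for each $f \in \mathcal{F}$. Conversely, suppose that $\lim f(x_\iota) = f(x)$ for each $f \in \mathcal{F}$. Let $O$ be an open set containing $x$. Then, by Proposition 7.7 on page 428, there exist $n \in \mathbb{N}$, $\{f_1, \dots, f_n\} \subset \mathcal{F}$, and $U_j \in \mathcal{T}_{f_j}$, $1 \le j \le n$, such that
$$x \in \bigcap_{j=1}^{n} f_j^{-1}(U_j) \subset O.$$
Now, for each $j = 1, 2, \dots, n$, since $f_j(x) \in U_j$ and $\lim f_j(x_\iota) = f_j(x)$, there exists an index $\iota_j$ such that $f_j(x_\iota) \in U_j$ whenever $\iota_j \preceq \iota$. Because $I$ is a directed set, there is an index $\iota_0$ such that $\iota_j \preceq \iota_0$ for each $j$. Therefore, we have
$$x_\iota \in \bigcap_{j=1}^{n} f_j^{-1}(U_j) \subset O$$
whenever $\iota_0 \preceq \iota$. Thus, $\lim x_\iota = x$.
EXERCISES 7.5
7.65 Verify the assertions made in Example 7.12 on page 439.
In Exercises 7.66-7.74, we are using the notation introduced in Definition 7.20 on page 440.
7.66 Refer to Definition 7.20.
a) Show that an infinite series $\sum_{j=k}^{\infty} x_j$ converges if and only if the series $\sum_{j=\ell}^{\infty} x_j$ converges, where $k < \ell$.
b) Show that an infinite sum $\sum_{j \in S} x_j$ converges if and only if the sum $\sum_{j \in S \setminus F} x_j$ converges, where $F$ is a finite subset of $S$.
7.67 Show that $\sum_{j=1}^{\infty} 1/j$ fails to converge.
7.68 Show that $\sum_{j=1}^{\infty} (-1)^j/j$ converges.
7.69 Suppose $\sum_{j=k}^{\infty} \|x_j\| < \infty$. Show that $\sum_{j \ge k} x_j$ converges to $s$ if and only if $\sum_{j=k}^{\infty} x_j$ converges to $s$.
★7.70 Assume the $z_j$'s are complex numbers.
a) Prove that the infinite sum $\sum_{j \in S} z_j$ converges if there are nonnegative real numbers $b_j$, $j \in S$, such that $|z_j| \le b_j$ for all but finitely many $j \in S$ and $\sum_{j \in S} b_j < \infty$.
b) Prove that $\sum_{j \in S} |z_j| < \infty$ implies $\sum_{j \in S} z_j$ converges.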
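7.71 Let $\alpha$ and $\beta$ be scalars.
a) Show that if the infinite sums $\sum_{j \in S} x_j$ and $\sum_{j \in S} y_j$ converge, then so does $\sum_{j \in S} (\alpha x_j + \beta y_j)$ and, moreover,
$$\sum_{j \in S} (\alpha x_j + \beta y_j) = \alpha \sum_{j \in S} x_j + \beta \sum_{j \in S} y_j.$$
b) Show that the results of part (a) remain valid for infinite series.
7.72 Let $S$ and $T$ be disjoint infinite subsets of integers. Suppose that two of the three infinite sums $\sum_{j \in S} x_j$, $\sum_{j \in T} x_j$, and $\sum_{j \in S \cup T} x_j$ are convergent. Prove that all three are convergent and that
$$\sum_{j \in S \cup T} x_j = \sum_{j \in S} x_j + \sum_{j \in T} x_j.$$
7.73 Let $\sum_{j=k}^{\infty} x_j$ be a convergent infinite series in a normed space.
a) Prove that $\lim_{n \to \infty} x_n = 0$.
b) Prove that $\lim_{n \to \infty} \sum_{j=n}^{\infty} x_j = 0$.
7.74 Let $\sum_{j \in S} x_j$ be a convergent infinite sum in a normed space. Show that the net
$$\Big\{ \sum_{j \in S \setminus F} x_j \Big\}_{F \in \mathcal{P}_f(S)}$$
converges to 0.
7.75 Consider a sequence $\{a_n\}_{n=1}^{\infty}$ of real numbers. Show that the function $f_t = a_{[t]}$, where $[\,\cdot\,]$ denotes the greatest integer function and $t \in [1, \infty)$, is a subnet of $\{a_n\}_{n=1}^{\infty}$. The order relation on $[1, \infty)$ is understood to be $\le$.
7.76 Consider a function $f: [1, \infty) \to \mathbb{R}$. Let $f_t = f(t)$. Then $\{f_t\}_{t \in [1, \infty)}$ is a net in $\mathbb{R}$ with respect to the usual $\le$ ordering. Suppose that $\{t_n\}_{n=1}^{\infty}$ is a nondecreasing sequence in $[1, \infty)$. Show that $\{f_{t_n}\}_{n=1}^{\infty}$ is a subnet of $\{f_t\}_{t \in [1, \infty)}$ if and only if $\lim_{n \to \infty} t_n = \infty$.
7.77 Suppose that $\{x_{\iota_\kappa}\}_{\kappa \in K}$ is a subnet of $\{x_\iota\}_{\iota \in I}$ and that $\{x_{\iota_{\kappa_\mu}}\}_{\mu \in M}$ is a subnet of $\{x_{\iota_\kappa}\}_{\kappa \in K}$. Show that $\{x_{\iota_{\kappa_\mu}}\}_{\mu \in M}$ is a subnet of $\{x_\iota\}_{\iota \in I}$.
7.78 Let $\{x_\iota\}_{\iota \in I}$ be a net such that $\lim x_\iota = x$. Show that if $\{x_{\iota_\kappa}\}_{\kappa \in K}$ is a subnet of $\{x_\iota\}_{\iota \in I}$, then $\lim x_{\iota_\kappa} = x$.
★7.79 Prove that a Cauchy sequence converges if and only if it has a convergent subnet.
★7.80 Let $\mathcal{T}$ and $\mathcal{U}$ be topologies on a set $\Omega$. Show that $\mathcal{T}$ is weaker than $\mathcal{U}$ if and only if $x_\iota \to x$ with respect to $\mathcal{U}$ implies $x_\iota \to x$ with respect to $\mathcal{T}$.
7.81 Let $\Omega$ be the Cartesian product of a family $\{\Omega_\kappa\}_{\kappa \in K}$ of topological spaces and suppose that $\Omega$ is given the product topology. Let $\{x_\iota\}_{\iota \in I}$ be a net in $\Omega$. Show that $\lim x_\iota = x$ if and only if $\lim p_\kappa(x_\iota) = p_\kappa(x)$ for each $\kappa \in K$.
7.82 Prove that the sum and product of continuous complex-valued functions on a topological space are continuous.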
7.83 Suppose that $f$ is a complex-valued continuous function on a topological space $\Omega$. Prove that the function $g$ defined on $f^{-1}(\{0\}^c)$ by $g(x) = 1/f(x)$ is continuous.
7.84 Let $\Omega$ and $\Lambda$ be topological spaces and $f: \Omega \to \Lambda$. Prove using nets that $f$ is continuous if and only if $f(\overline{E}) \subset \overline{f(E)}$ for each $E \subset \Omega$.
★7.85 Let $f: \Omega \to \Lambda$ and $g: \Lambda \to \Gamma$ be continuous. Show that $g \circ f: \Omega \to \Gamma$ is continuous. In words, the composition of continuous functions is continuous.
7.86 Let $\Omega$ be the Cartesian product of a family $\{\Omega_\kappa\}_{\kappa \in K}$ of topological spaces and suppose that $\Omega$ is given the product topology. Let $\Lambda$ be a topological space and $f: \Lambda \to \Omega$. Show that $f$ is continuous if and only if $p_\kappa \circ f$ is continuous for each $\kappa \in K$.
★7.87 Let $\Omega$ be a topological space and $f: \Omega \to \mathbb{R}$. Prove that the following conditions are equivalent.
a) $f$ is continuous.
b) $f^{-1}((-\infty, a))$ and $f^{-1}((a, \infty))$ are open for each $a \in \mathbb{R}$.
c) $f^{-1}((-\infty, a])$ and $f^{-1}([a, \infty))$ are closed for each $a \in \mathbb{R}$.
7.88 Let $\Omega$ be a set and $(\Lambda, \rho)$ a metric space. A net $\{f_\iota\}_{\iota \in I}$ of functions from $\Omega$ to $\Lambda$ is said to converge uniformly to the function $f$ if for each $\epsilon > 0$, there is an index $\iota_0$ such that $\rho(f_\iota(x), f(x)) < \epsilon$ for all $x \in \Omega$ whenever $\iota_0 \preceq \iota$. Show that if $\Omega$ is a topological space, $f_\iota$ is continuous for each $\iota \in I$, and $\{f_\iota\}_{\iota \in I}$ converges uniformly to $f$, then $f$ is continuous.
★7.89 Let $\Omega$ be a topological space, $(\Lambda, \|\cdot\|)$ a complete normed space, and $S$ an infinite subset of integers. Suppose that for each $j \in S$, $f_j: \Omega \to \Lambda$ is continuous and there is a $b_j \in \mathbb{R}$ such that $\|f_j(x)\| \le b_j$ for all $x \in \Omega$. Show that if $\sum_{j \in S} b_j < \infty$, then
$$f(x) = \sum_{j \in S} f_j(x)$$
defines a continuous function from $\Omega$ into $\Lambda$. Hint: See Exercise 7.88.
7.6 SEPARATION PROPERTIES
In this section, we take up the topic of separation in topological spaces. Two subsets $A$ and $B$ of a topological space $\Omega$ are said to be separated if there exist open sets $U$ and $V$ such that $A \subset U$, $B \subset V$, and $U \cap V = \emptyset$.
When it is important to emphasize the role of the sets $U$ and $V$, we will say that $A$ and $B$ are separated by $U$ and $V$.
EXAMPLE 7.14 Illustrates Separated Sets
a) Consider the normed space $(C([a, b]), \|\cdot\|_\infty)$ discussed in Example 7.8 on page 425. For $f \in C([a, b])$, we have $\|f\|_\infty = \sup\{|f(x)| : x \in [a, b]\}$. (Why?) Thus, when two functions $f_1$ and $f_2$ are “close” with respect to this norm, say, $\|f_1 - f_2\|_\infty < \delta$ for some small number $\delta$, it means
that $|f_1(x) - f_2(x)| < \delta$ for all $x \in [a, b]$, that is, the two functions are uniformly close. Suppose $E \subset C([a, b])$ and $f \in C([a, b])$. Then $f \in \overline{E}$ if and only if for each $\epsilon > 0$, there exists a function $g \in E$ such that $\|f - g\|_\infty < \epsilon$; in other words, $f \in \overline{E}$ if and only if it can be uniformly approximated arbitrarily closely by members of $E$. On the other hand, $f \notin \overline{E}$ if and only if there is an $\epsilon > 0$ such that $E \subset U_\epsilon = \{h : \|f - h\|_\infty > \epsilon\}$. Because $\{f\} \subset V_\epsilon = \{h : \|f - h\|_\infty < \epsilon\}$, it follows that $f$ is not uniformly approximable by members of $E$ if and only if $\{f\}$ and $E$ are separated by the open sets $U_\epsilon$ and $V_\epsilon$ for some $\epsilon > 0$.
b) Suppose that $A$ and $B$ are disjoint closed disks in the plane $\mathbb{R}^2$. Then there is a line $L = \{(x, y) : ax + by = c\}$ such that $A$ and $B$ are separated by the corresponding open half-planes $L_- = \{(x, y) : ax + by < c\}$ and $L_+ = \{(x, y) : ax + by > c\}$. The topic of separation by half-spaces is important in our subsequent study of normed linear spaces.
c) Let $(\Omega, \rho)$ be a metric space and $A$ and $B$ disjoint nonempty closed subsets of $\Omega$. By Exercise 7.47 on page 437, the function $f(x) = \rho(x, A) - \rho(x, B)$ is continuous on $\Omega$. Because $A$ and $B$ are closed and disjoint, $\rho(x, B) > 0$ for $x \in A$ and $\rho(x, A) > 0$ for $x \in B$, so $f < 0$ on $A$ and $f > 0$ on $B$. It follows that $A$ and $B$ are separated by the open sets $f^{-1}((-\infty, 0))$ and $f^{-1}((0, \infty))$. □
DEFINITION 7.22 Hausdorff Space, Normal Space
Let $\Omega$ be a topological space.
a) $\Omega$ is said to be a Hausdorff space if distinct points are separated; that is, $x \ne y$ implies that $\{x\}$ and $\{y\}$ are separated.
b) $\Omega$ is said to be a normal space if disjoint closed sets are separated; that is, $A$ and $B$ closed and $A \cap B = \emptyset$ implies that $A$ and $B$ are separated.
While it is not true in general that a normal space is a Hausdorff space (see Exercise 7.91), it is obvious that a normal space is Hausdorff if all single-element subsets are closed. A space with the property that all single-element subsets are closed is called a $T_1$-space. Hausdorff spaces are always $T_1$-spaces.
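For finite spaces the conditions of Definition 7.22 can be checked by brute force. The Python sketch below is an illustration only, not part of the text. It encodes a topology on a finite set as a collection of frozensets (the open sets), an encoding we assume for convenience, and tests the $T_1$, Hausdorff, and normal conditions directly from the definitions.

```python
from itertools import combinations

# Assumed encoding: a topology on a finite set is a collection of
# frozensets, namely its open sets.

def is_topology(points, opens):
    """Check the topology axioms; for a finite family, closure under
    pairwise unions and intersections gives closure under all of them."""
    opens = set(opens)
    if frozenset() not in opens or frozenset(points) not in opens:
        return False
    return all(u | v in opens and u & v in opens
               for u, v in combinations(opens, 2))

def closed_sets(points, opens):
    """Closed sets are the complements of open sets."""
    whole = frozenset(points)
    return {whole - u for u in opens}

def separated(a, b, opens):
    """True if some disjoint open sets U, V satisfy a ⊆ U and b ⊆ V."""
    return any(a <= u and b <= v and not (u & v)
               for u in opens for v in opens)

def is_t1(points, opens):
    """Every single-element subset is closed."""
    closed = closed_sets(points, opens)
    return all(frozenset({x}) in closed for x in points)

def is_hausdorff(points, opens):
    """Distinct points are separated."""
    return all(separated(frozenset({x}), frozenset({y}), opens)
               for x, y in combinations(points, 2))

def is_normal(points, opens):
    """Disjoint closed sets are separated."""
    closed = closed_sets(points, opens)
    return all(separated(a, b, opens)
               for a in closed for b in closed if not (a & b))
```

Take the two-point space with open sets $\emptyset$, $\{1\}$, $\{0, 1\}$. These checks classify it as normal but neither $T_1$ nor Hausdorff, which is consistent with the remark that normality alone does not force the Hausdorff property.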
From now on, whenever we consider a topological space, we will assume implicitly that it is a $T_1$-space. Example 7.14(c) shows that all metric spaces are normal. And it is an easy exercise to prove that all metric spaces are Hausdorff. Later we will see examples of normal and Hausdorff spaces that are not metric spaces.
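Example 7.14(c) can be made concrete when $A$ and $B$ are finite (hence closed) subsets of the plane with the Euclidean metric. The Python sketch below is illustrative only. Finiteness is our assumption, and it lets the infimum $\rho(x, A)$ be computed as a minimum. The sketch evaluates $f(x) = \rho(x, A) - \rho(x, B)$, whose sign tells which of the open sets $f^{-1}((-\infty, 0))$ and $f^{-1}((0, \infty))$ contains $x$. It also evaluates the commonly used function $\rho(x, A)/(\rho(x, A) + \rho(x, B))$, which is 0 on $A$ and 1 on $B$.

```python
import math

def dist_to_set(x, s):
    """rho(x, S) for a finite nonempty set S of points in the plane."""
    return min(math.dist(x, p) for p in s)

def separating_function(a, b):
    """f(x) = rho(x, A) - rho(x, B), as in Example 7.14(c)."""
    return lambda x: dist_to_set(x, a) - dist_to_set(x, b)

def urysohn_metric(a, b):
    """Continuous, values in [0, 1], equal to 0 on A and 1 on B.
    The denominator is positive because A and B are disjoint closed sets."""
    def g(x):
        da, db = dist_to_set(x, a), dist_to_set(x, b)
        return da / (da + db)
    return g
```

Points where $f(x) = 0$ lie in neither open set. This is harmless, because $A$ and $B$ need only be contained in the two sets.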
Existence of Continuous Functions
Given an arbitrary topological space $\Omega$, it is not at all clear that there exist nonconstant, real-valued continuous functions on $\Omega$. However, as we will see momentarily, normal spaces always possess an abundance of such functions. First we need the following characterization of normal spaces.
PROPOSITION 7.14
A topological space $\Omega$ is normal if and only if for each closed set $F$ and each open set $O$ with $F \subset O$, there exists an open set $W$ such that
$$F \subset W \subset \overline{W} \subset O.$$
PROOF: Suppose that $\Omega$ is normal. Let $F$ be closed, $O$ open, and $F \subset O$. Then $F$ and $O^c$ are disjoint closed sets. It follows that there are open sets $W$ and $U$ with $F \subset W$, $O^c \subset U$, and $U \cap W = \emptyset$. Because $W \subset U^c$ and $U^c$ is closed, it follows that $\overline{W} \subset U^c \subset (O^c)^c = O$.
To prove the converse, let $A$ and $B$ be disjoint closed sets. Taking $F = A$ and $O = B^c$, there is, by assumption, an open set $W$ such that $A \subset W \subset \overline{W} \subset B^c$. But $\overline{W}^c$ is open and contains $B$, and $W \cap \overline{W}^c = \emptyset$. Thus, $A$ and $B$ are separated by $W$ and $\overline{W}^c$.
The string of containments $F \subset W \subset \overline{W} \subset O$ in the statement of Proposition 7.14 invites iteration: We can find open sets $U$ and $V$ such that
$$F \subset U \subset \overline{U} \subset W \subset \overline{W} \subset V \subset \overline{V} \subset O. \tag{7.7}$$
To iterate further, we need better notation. A natural and judicious choice is to use binary digits as follows: $W = W_{.10}$, $U = W_{.01}$, and $V = W_{.11}$. Then (7.7) becomes
$$F \subset W_{.01} \subset \overline{W}_{.01} \subset W_{.10} \subset \overline{W}_{.10} \subset W_{.11} \subset \overline{W}_{.11} \subset O.$$
This construction can now be carried on indefinitely to yield the following lemma. The details of its proof are left to the reader as an exercise.
LEMMA 7.1
Suppose that $\Omega$ is a normal space. Let $F$ be closed, $O$ open, and $F \subset O$. Furthermore, let $T$ denote the set of numbers in the interval $(0, 1)$ that have terminating binary expansions. Then there is a collection of open sets $\{W_t : t \in T\}$ such that $t, s \in T$ and $t < s$ implies
$$F \subset W_t \subset \overline{W}_t \subset W_s \subset O.$$
Using Lemma 7.1, we can now construct nonconstant, continuous, real-valued functions on normal spaces.
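The step from the dyadic family of Lemma 7.1 to a function can be previewed on the real line. The Python sketch below rests on assumptions of our own choosing: $\Omega = \mathbb{R}$, $A = (-\infty, 0]$, $B = [1, \infty)$, and the hypothetical family $W_t = (-\infty, t)$. This family satisfies $A \subset W_t \subset \overline{W}_t \subset W_s \subset B^c$ whenever $t < s$. Restricting $t$ to the dyadic rationals $k/2^n$ for a fixed $n$ gives a truncation $f_n$ of $f(x) = \inf\{t \in T_0 : x \in W_t\}$ with $f_n - 2^{-n} \le f \le f_n$.

```python
from fractions import Fraction

def dyadic_levels(n):
    """Dyadic rationals k / 2**n in (0, 1); every t in T with at most
    n binary digits after the point appears in this list."""
    return [Fraction(k, 2 ** n) for k in range(1, 2 ** n)]

def in_W(x, t):
    """Hypothetical family on the real line: W_t = (-inf, t) and W_1 = R."""
    return t == 1 or x < t

def urysohn_approx(x, n):
    """f_n(x) = min{t in T_n ∪ {1} : x in W_t}, a truncation of the infimum."""
    candidates = dyadic_levels(n) + [Fraction(1)]
    return min(t for t in candidates if in_W(x, t))
```

For this particular family the limit $f$ is simply $\min(\max(x, 0), 1)$. In a general normal space no such formula is available, which is why the proof works with the open sets themselves.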
THEOREM 7.2 Urysohn’s Lemma
Let $A$ and $B$ be disjoint closed nonempty subsets of a normal space $\Omega$. Then there is a continuous function $f: \Omega \to \mathbb{R}$ such that $f(\Omega) \subset [0, 1]$, $f(A) = \{0\}$, and $f(B) = \{1\}$.
PROOF: First we apply Lemma 7.1 with $F = A$ and $O = B^c$ to obtain a collection of open sets $\{W_t : t \in T\}$ such that $t, s \in T$ and $t < s$ implies $A \subset W_t \subset \overline{W}_t \subset W_s \subset B^c$. Also, we let $W_1 = \Omega$ and $T_0 = T \cup \{1\}$. Now we define a function $f$ on $\Omega$ by
$$f(x) = \inf\{t \in T_0 : x \in W_t\}.$$
Clearly, $f$ takes values only in $[0, 1]$. If $x \in A$, then $x \in W_t$ for each $t \in T$. Because $T$ is dense in $[0, 1]$, it follows that $f(x) = 0$. On the other hand, if $x \in B$, then $\{t \in T_0 : x \in W_t\} = \{1\}$. Thus, $f(x) = 1$.
It remains to show that $f$ is continuous on $\Omega$. By Exercise 7.87 on page 447, it is enough to prove that for each real number $s$, $f^{-1}((-\infty, s))$ is open and $f^{-1}((-\infty, s])$ is closed (the latter makes $f^{-1}((s, \infty))$ open, as the complement of a closed set). First note that
$$f^{-1}((-\infty, s)) = \begin{cases} \emptyset, & s \le 0, \\ \Omega, & s > 1, \end{cases} \qquad f^{-1}((-\infty, s]) = \begin{cases} \emptyset, & s < 0, \\ \Omega, & s \ge 1. \end{cases}$$
Again using the fact that $T$ is dense in $[0, 1]$, we have
$$f^{-1}((-\infty, s)) = \bigcup_{t \in T,\ t < s} W_t, \qquad s \in (0, 1], \tag{7.8}$$
and
$$f^{-1}((-\infty, s]) = \bigcap_{t \in T,\ s < t} \overline{W}_t, \qquad s \in [0, 1). \tag{7.9}$$
Equation (7.8) shows that for $s \in (0, 1]$, $f^{-1}((-\infty, s))$ is open, being a union of open sets. And (7.9) shows that for $s \in [0, 1)$, $f^{-1}((-\infty, s])$ is closed, being an intersection of closed sets. We have now shown that for all $s \in \mathbb{R}$, $f^{-1}((-\infty, s))$ is open and $f^{-1}((-\infty, s])$ is closed. Hence, $f$ is continuous.
Remark: Exercise 7.47(f) on page 437 provides a quick elementary proof of the metric-space version of Urysohn’s lemma.
Urysohn’s lemma is frequently used to obtain continuous approximations to characteristic functions of closed sets. Typically, one has a closed subset $F$ of some normal space $\Omega$ and an open set $O$ containing $F$ that is
“close” to $F$ in some sense. Applying Urysohn’s lemma with $B = F$ and $A = O^c$, we obtain a continuous function $f$ with values in $[0, 1]$ that agrees with the characteristic function of $F$ everywhere except possibly on $O \setminus F$. When $\Omega = \mathbb{R}$, $F = [a, b]$, and $O = (a - \epsilon, b + \epsilon)$, this approximation procedure is nicely illustrated by the continuous function that is 1 on $[a, b]$, 0 on $O^c$, and linear on each of the intervals $(a - \epsilon, a)$ and $(b, b + \epsilon)$. Later, when we study spaces of continuous functions, we will rely heavily on the approximation of characteristic functions.
As a more immediate application of Urysohn’s lemma, we present the following important result.
THEOREM 7.3 Tietze’s Extension Theorem
Let $\Omega$ be a normal space, $F$ a closed subset of $\Omega$, and $f: F \to \mathbb{R}$ a continuous function. Then there exists a continuous function $\tilde{f}: \Omega \to \mathbb{R}$ such that $\tilde{f}(x) = f(x)$ for each $x \in F$. Moreover, if $M = \sup\{|f(x)| : x \in F\} < \infty$, then $\tilde{f}$ may be chosen such that $\sup\{|\tilde{f}(x)| : x \in \Omega\} = M$.
PROOF: If $M = 0$, the result is trivial. We next consider the case where $M$ is finite and nonzero. Without loss of generality, we can assume $M = 1$. (Why is that so?) Because $f$ is continuous on $F$, the sets $A = f^{-1}((-\infty, -1/3])$ and $B = f^{-1}([1/3, \infty))$ are relatively closed in $F$ and, because $F$ is a closed subset of $\Omega$, $A$ and $B$ are also closed in $\Omega$. So, by Urysohn’s lemma, there is a continuous function $g_1$ on $\Omega$ such that $g_1(\Omega) \subset [-1/3, 1/3]$, $g_1(x) = -1/3$ for all $x \in A$, and $g_1(x) = 1/3$ for all $x \in B$. (Take $g_1 = (2h - 1)/3$, where $h$ is the function Urysohn’s lemma provides for $A$ and $B$; if $A$ or $B$ is empty, a suitable constant function works.) It follows that the continuous function $f_1$ defined on $F$ via $f_1 = f - g_1$ satisfies $|f_1(x)| \le 2/3$ for all $x \in F$.
Similarly, applying Urysohn’s lemma to the sets $A = f_1^{-1}((-\infty, -2/9])$ and $B = f_1^{-1}([2/9, \infty))$, we can obtain a continuous function $g_2$ on $\Omega$ such that $g_2(\Omega) \subset [-2/9, 2/9]$, $g_2(x) = -2/9$ for all $x \in A$, and $g_2(x) = 2/9$ for all $x \in B$. It follows that the continuous function $f_2$ defined on $F$ via $f_2 = f_1 - g_2 = f - (g_1 + g_2)$ satisfies $|f_2(x)| \le 4/9$ for all $x \in F$.
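We now proceed inductively to construct a sequence $\{g_n\}_{n=1}^{\infty}$ of continuous functions on $\Omega$ such that $|g_n(x)| \le 2^{n-1}/3^n$ for all $x \in \Omega$ and
$$\Big| f(x) - \sum_{j=1}^{n} g_j(x) \Big| \le (2/3)^n, \qquad x \in F.$$
It follows from Exercise 7.89 on page 447 that the function $\tilde{f} = \sum_{n=1}^{\infty} g_n$ is continuous on $\Omega$. And the previous two inequalities show that $|\tilde{f}(x)| \le \sum_{n=1}^{\infty} 2^{n-1}/3^n = 1$ for each $x \in \Omega$ and that $\tilde{f}(x) = f(x)$ for each $x \in F$.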
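The inductive construction in the bounded case can be imitated numerically when $\Omega = \mathbb{R}$ and $F$ is a finite set. In the Python sketch below, an illustration rather than the general method, each function supplied by Urysohn’s lemma is replaced by the explicit metric formula $c\,(\rho(x, A) - \rho(x, B))/(\rho(x, A) + \rho(x, B))$. That formula equals $-c$ on $A$ and $c$ on $B$, and its absolute value is at most $c$. Both the finiteness of $F$ and this particular formula are our assumptions. Each step shrinks the bound on the residual $f - (g_1 + \dots + g_n)$ by the factor $2/3$.

```python
def dist(x, s):
    """Distance from the real number x to a finite nonempty set s."""
    return min(abs(x - p) for p in s)

def urysohn_step(A, B, c):
    """Continuous g on R with g = -c on A, g = c on B, and |g| <= c.
    A and B are disjoint finite sets; an empty set is handled by a constant."""
    if not A and not B:
        return lambda x: 0.0
    if not A:
        return lambda x: c
    if not B:
        return lambda x: -c
    def g(x):
        da, db = dist(x, A), dist(x, B)
        return c * (da - db) / (da + db)
    return g

def tietze_extension(F, f, steps=30):
    """F: finite list of reals; f: dict on F with |f(x)| <= 1.
    Returns g_1 + ... + g_steps, which is within (2/3)**steps of f on F."""
    gs = []
    residual = dict(f)   # residual = f - (g_1 + ... + g_n) on F
    r = 1.0              # invariant: |residual| <= r = (2/3)**n on F
    for _ in range(steps):
        A = [x for x in F if residual[x] <= -r / 3]
        B = [x for x in F if residual[x] >= r / 3]
        g = urysohn_step(A, B, r / 3)
        gs.append(g)
        residual = {x: residual[x] - g(x) for x in F}
        r *= 2 / 3
    return lambda x: sum(g(x) for g in gs)
```

Because $|g_n| \le \frac{1}{3}(2/3)^{n-1}$, the partial sum is bounded by 1 everywhere. This matches the bound $\sup |\tilde{f}| = M = 1$ in the theorem.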
It remains to consider the case where $f$ is unbounded. To that end, define $f_0 = \arctan f$. Since $|f_0(x)| < \pi/2$ for all $x \in F$, we can apply the results just proved for bounded functions to obtain a continuous function $g_0: \Omega \to \mathbb{R}$ such that $g_0(x) = f_0(x)$ for each $x \in F$ and $|g_0(x)| \le \pi/2$ for each $x \in \Omega$. The function $g_0$ may still take the values $\pm\pi/2$, where $\tan$ is undefined. However, the set $Z = \{x \in \Omega : |g_0(x)| = \pi/2\}$ is closed and disjoint from $F$. So, if $Z$ and $F$ are nonempty, Urysohn’s lemma provides a continuous $h: \Omega \to [0, 1]$ with $h = 0$ on $Z$ and $h = 1$ on $F$, and replacing $g_0$ by $h g_0$ we may assume that $|g_0(x)| < \pi/2$ for all $x \in \Omega$. The function $\tilde{f} = \tan g_0$ is then continuous on $\Omega$ and is such that $\tilde{f}(x) = f(x)$ for each $x \in F$.
EXERCISES 7.6
7.90 Consider the subsets of $\mathbb{R}^2$ given by $A = \{(x, y) : x > 0,\ y \ge 1/x\}$ and $B = \{(x, 0) : x \ge 0\}$.
a) Show that $A$ and $B$ are disjoint closed sets that cannot be separated by open half-planes in the sense of Example 7.14(b) on page 448.
b) Find explicitly open sets $U$ and $V$ that separate $A$ and $B$.
7.91 Provide an example of a normal space that is not Hausdorff. Hint: Refer to Example 7.2(c) on page 416.
7.92 Show that all metric spaces are Hausdorff.
7.93 Let $\mathcal{T} = \{\emptyset\} \cup \{W \subset \mathbb{N} : W^c \text{ is a finite set}\}$ where, as usual, $\mathbb{N}$ denotes the set of positive integers. Show that $\mathcal{T}$ is a topology on $\mathbb{N}$ and that $(\mathbb{N}, \mathcal{T})$ is a $T_1$-space.
7.94 Refer to Exercise 7.93. Show that the topological space $(\mathbb{N}, \mathcal{T})$ is neither a Hausdorff space nor a normal space.
7.95 Describe all continuous functions $f: \mathbb{N} \to \mathbb{R}$, where $\mathbb{N}$ is given the topology $\mathcal{T}$ defined in Exercise 7.93.
7.96 Describe all convergent sequences in $\mathbb{N}$, where $\mathbb{N}$ is given the topology $\mathcal{T}$ defined in Exercise 7.93.
7.97 Prove that a normal $T_1$-space is a Hausdorff space.
7.98 Prove that a Hausdorff space is a $T_1$-space.
7.99 Prove that a topological space is Hausdorff if and only if convergent nets have unique limits (i.e., $\lim x_\iota = x$ and $\lim x_\iota = y$ imply $x = y$).
7.100 Let $\Omega$ be a nonempty set and $\mathcal{T}$ the weak topology on $\Omega$ determined by a family of functions $\mathcal{F}$. Suppose that for each $f \in \mathcal{F}$, the space $f(\Omega)$ is Hausdorff. Show that $(\Omega, \mathcal{T})$ is a Hausdorff space if and only if $\mathcal{F}$ separates the points of $\Omega$ (i.e., $x, y \in \Omega$ and $x \ne y$ imply that there exists an $f \in \mathcal{F}$ such that $f(x) \ne f(y)$).
7.101 Provide the details of the proof of Lemma 7.1 on page 449.
7.102 Let $S$ be a nonempty set. Formulate and prove a version of the Tietze extension theorem where $\mathbb{R}$ is replaced by the Cartesian product $\mathbb{R}^S$ with the product topology.
7.103 Show that Theorem 7.3 on page 451 is no longer valid if $F$ is assumed to be open instead of closed.
7.104 Let $F$ be a closed subset of $\mathbb{R}$ and $f: F \to \mathbb{R}$ be continuous. From Proposition 2.13 on page 59, we can write $F^c = \bigcup_{J \in \mathcal{S}} J$, where $\mathcal{S}$ is a countable collection of disjoint open intervals. Construct a continuous function $g: \mathbb{R} \to \mathbb{R}$ that agrees with $f$ on $F$ and is linear on each interval $J \in \mathcal{S}$.
7.7 CONNECTED SETS
If $D$ is a subset of $\mathbb{R}$, then, except in trivial cases, the characteristic function $\chi_D$ is not continuous. There are, however, many topological spaces that have nonconstant, continuous characteristic functions. For example, if $\Omega = [0, 1] \cup [2, 3]$ is given the relative topology from $\mathbb{R}$, then $\chi_{[0,1]}$ is a continuous function on $\Omega$. Such topological spaces are called disconnected.
DEFINITION 7.23 Disconnected and Connected Spaces
A topological space having at least one nonconstant, continuous characteristic function is said to be disconnected. A topological space that is not disconnected is said to be connected. A subset of a topological space is called (dis)connected if it is (dis)connected with respect to the relative topology.
If $f$ is a nonconstant, continuous characteristic function on a topological space $\Omega$, then $f^{-1}((-\infty, 1/2))$ and $f^{-1}((1/2, \infty))$ are disjoint nonempty open sets whose union is $\Omega$. Thus, we see that each of the following conditions is equivalent to a topological space, $\Omega$, being disconnected:
• $\Omega$ can be decomposed into two disjoint nonempty open sets.
• $\Omega$ contains a proper, nonempty subset that is both open and closed.
The following proposition provides yet another way of characterizing disconnected sets.
PROPOSITION 7.15
A subset $D$ of a topological space $\Omega$ is disconnected if and only if there are nonempty sets $A$ and $B$ such that $D = A \cup B$, $\overline{A} \cap B = \emptyset$, and $A \cap \overline{B} = \emptyset$.
PROOF: Suppose that $D$ is a disconnected subset of $\Omega$. Let $f$ be a nonconstant characteristic function on $D$ that is continuous with respect to the relative topology. Because $A = f^{-1}((1/2, 3/2)) = f^{-1}(\{1\})$, we have
that $A$ is nonempty and both open and closed in the relative topology. Similarly, $B = D \setminus A = f^{-1}(\{0\})$ is also relatively open, relatively closed, and nonempty. It follows that there is a closed subset $F$ of $\Omega$ such that $A = F \cap D$. Because $F \cap B = \emptyset$ and $\overline{A} \subset F$, we have $\overline{A} \cap B = \emptyset$. Similarly, we have $A \cap \overline{B} = \emptyset$.
Conversely, suppose that there are nonempty sets $A$ and $B$ such that $D = A \cup B$, $\overline{A} \cap B = \emptyset$, and $A \cap \overline{B} = \emptyset$. Then we have $A = D \cap \overline{A}$ and $B = D \cap \overline{B}$. Thus, $A$ and $B$ are relatively closed. Since $B = D \setminus A$, $B$ is also relatively open and, similarly, $A$ is relatively open. It follows easily that the characteristic function $\chi_A$ is nonconstant on $D$ and continuous in the relative topology. Consequently, $D$ is a disconnected subset of $\Omega$.
EXAMPLE 7.15 Illustrates Connected Sets
Let $\Omega$ be a topological space and $x \in \Omega$. Then it follows easily from Proposition 7.15 that the singleton set $\{x\}$ is connected. The Cantor set provides an example of a topological space in which the only nonempty connected subsets are singletons. □
EXAMPLE 7.16 Illustrates Connected Subsets of $\mathbb{R}$
In this example, we will establish the fact that the connected subsets of $\mathbb{R}$ are precisely the intervals (including degenerate intervals). Let $D$ be a connected subset of $\mathbb{R}$. If $D = \emptyset$, then it is a degenerate interval (e.g., $(x, x]$). If $D$ is a singleton set, $\{x\}$, then it is a degenerate interval of the form $[x, x]$. So, assume that $D$ contains more than one point. Let $a, b \in D$ with $a < b$ and let $c \in (a, b)$. If $c$ does not lie in $D$, then the sets $A = D \cap (c, \infty)$ and $B = D \cap (-\infty, c)$ are relatively open, disjoint, and nonempty (they contain $b$ and $a$, respectively), and their union is $D$. Thus, $D$ is disconnected, a contradiction. Hence, the interval $(a, b)$ is contained in $D$ whenever $a$ and $b$ are elements of $D$ with $a < b$. It follows immediately that $D$ is equal to $(\inf D, \sup D)$, $(\inf D, \sup D]$, $[\inf D, \sup D)$, or $[\inf D, \sup D]$.
Conversely, suppose that $D$ is an interval. We claim that $D$ is connected.
Assume to the contrary. Then, by Proposition 7.15, there are nonempty sets $A$ and $B$ such that $D = A \cup B$, $\overline{A} \cap B = \emptyset$, and $A \cap \overline{B} = \emptyset$. Let $a \in A$ and $b \in B$, and assume without loss of generality that $a < b$. Consider the set $C = \{x : [a, x] \subset A\}$. We note that $C \ne \emptyset$ because $a \in C$. Because $b \notin A$, $C$ is bounded above by $b$. Thus, $u = \sup C$ is a real number and $a \le u \le b$. There are three possibilities: $u \in A$, $u \in B$, or $u \notin D$. The last possibility can be eliminated immediately because $a \le u \le b$ and both $a$ and $b$ lie in the interval $D$.
Suppose $u \in A$. Because, for each $n \in \mathbb{N}$, $[a, u + 1/n]$ is not a subset of $A$, it follows that we can find an element $b_n \in B \cap [u, u + 1/n]$. And, since $\lim_{n \to \infty} b_n = u$, we have that $u \in A \cap \overline{B}$. But this contradicts the assumption that $A \cap \overline{B} = \emptyset$. On the other hand, suppose $u \in B$. Then $u > a$ and, so, $u - 1/n \in A$ for sufficiently large $n$. Consequently, because $\lim_{n \to \infty} (u - 1/n) = u$, we have that $u \in \overline{A} \cap B$. But this contradicts the assumption that $\overline{A} \cap B = \emptyset$. □
One of the most useful properties of connected spaces is described in the following theorem which, in words, states that the continuous image of a connected space is connected.
THEOREM 7.4
Let $\Omega$ be a connected topological space and $f: \Omega \to \Lambda$ be a continuous function. Then $f(\Omega)$ is a connected subset of $\Lambda$.
PROOF: Suppose to the contrary that $f(\Omega)$ is not connected. Then there is a nonconstant, continuous characteristic function $g$ on $f(\Omega)$. It follows from Exercise 7.85 on page 447 that the nonconstant characteristic function $g \circ f$ is continuous on $\Omega$. Thus, $\Omega$ is not connected, a contradiction.
Combining Theorem 7.4 and Example 7.16, we immediately obtain the following two corollaries.
COROLLARY 7.2
Let $f$ be a real-valued continuous function on a connected topological space $\Omega$. Then $f(\Omega)$ is a (possibly degenerate) interval.
COROLLARY 7.3 Intermediate Value Theorem
Let $f$ be a real-valued continuous function on a closed bounded interval $[a, b]$. Then for each number $y$ between $f(a)$ and $f(b)$, there is an $x \in [a, b]$ such that $f(x) = y$.
Arcwise Connected Spaces
Let $\Omega$ be a topological space and $p$ and $q$ points of $\Omega$. Then we say that $p$ is connected to $q$ by an arc if there exist $a, b \in \mathbb{R}$ with $a \le b$ and a continuous function $g: [a, b] \to \Omega$ such that $p = g(a)$ and $q = g(b)$. The set $A = g([a, b])$
is called an arc connecting $p$ to $q$. It is easy to show that the following hold for all points $p, q, r \in \Omega$. (See Exercise 7.111.)
• $p$ is connected to itself by an arc.
• If $p$ is connected to $q$ by an arc, then $q$ is connected to $p$ by an arc.
• If $p$ is connected to $q$ by an arc and $q$ is connected to $r$ by an arc, then $p$ is connected to $r$ by an arc.
Note: In view of the second bulleted item, we can unambiguously use phrases such as “$p$ and $q$ are connected by an arc” and “there is an arc connecting $p$ and $q$.”
The space $\Omega$ is said to be arcwise connected if for every pair of points $p, q \in \Omega$, there is an arc connecting $p$ and $q$. The next proposition shows that arcwise connected spaces are always connected.
PROPOSITION 7.16
An arcwise connected topological space is connected.
PROOF: Suppose that $\Omega$ is arcwise connected but not connected. Let $g$ be a nonconstant, continuous characteristic function on $\Omega$. Let $p$ and $q$ be points of $\Omega$ such that $g(p) = 0$ and $g(q) = 1$. As $\Omega$ is arcwise connected, there is an interval $[a, b]$ and a continuous function $f: [a, b] \to \Omega$ such that $f(a) = p$ and $f(b) = q$. It follows that $g \circ f$ is a nonconstant, continuous characteristic function on $[a, b]$, implying that $[a, b]$ is disconnected. But, by Example 7.16 on page 454, the interval $[a, b]$ is connected. Thus, we have reached a contradiction. Hence, $\Omega$ must be connected.
The converse of Proposition 7.16 is false. (See Exercise 7.109.) There is, however, a converse for open subsets of a normed linear space.
PROPOSITION 7.17
A connected open subset of a normed space is arcwise connected.
PROOF: Suppose that $D$ is a nonempty connected open subset of a normed space. Let $p \in D$ and let $W$ be the set of all points of $D$ that are connected to $p$ by an arc in $D$. Since $p \in W$, $W$ is nonempty. We claim that $W$ is open. Let $q \in W$. Then $q \in D$ and, hence, there is an $r > 0$ such that $B_r(q) \subset D$. If $x \in B_r(q)$, then the arc $\{q + t(x - q) : 0 \le t \le 1\}$ connects $q$ to $x$ and lies inside $B_r(q)$. It follows that $p$ is connected to $x$ by an arc in $D$. Thus, $B_r(q) \subset W$ and, hence, $W$ is open.
We also claim that $D \setminus W$ is open. Let $q \in D \setminus W$. Then $q \in D$ and, so, as we discovered in the previous paragraph, there is an $r > 0$ such that $B_r(q) \subset D$ and any point of $B_r(q)$ is connected to $q$ by an arc in $D$. If a point of $B_r(q)$ were connected to $p$ by an arc in $D$, then there would be an arc in $D$ connecting $p$ to $q$, contradicting the assumption that $q \in D \setminus W$. Thus, $B_r(q) \subset D \setminus W$ and, hence, $D \setminus W$ is open.
We have shown that $W$ is both open and closed in $D$. Because $D$ is connected and $W$ is nonempty, it follows that $D = W$. As any point of $D$ is connected to $p$ by an arc in $D$, any two points of $D$ must be connected to each other by an arc in $D$. Hence, $D$ is arcwise connected.
Remark: A normed space $\Omega$ is always arcwise connected. Indeed, if $x \in \Omega$, then the arc $\{tx : 0 \le t \le 1\}$ connects $0$ to $x$. Hence, any point of $\Omega$ is connected to $0$ by an arc and, so, any two points of $\Omega$ are connected to each other by an arc.
Connected Components, Totally Disconnected Spaces
We will now discover how a topological space can be decomposed as the union of a family of pairwise disjoint connected subsets. First we state two propositions, whose proofs are left to the reader as Exercises 7.113 and 7.114.
PROPOSITION 7.18
Let $\mathcal{S}$ be a collection of connected subsets of a topological space $\Omega$. Suppose that $D_1 \cap D_2 \ne \emptyset$ whenever $D_1, D_2 \in \mathcal{S}$. Then $\bigcup_{D \in \mathcal{S}} D$ is a connected subset of $\Omega$.
PROPOSITION 7.19
Let $\Omega$ be a topological space and $A$ a connected subset of $\Omega$. Then $\overline{A}$ is also connected.
Given a point $x$ in a topological space $\Omega$, we can apply Proposition 7.18 with $\mathcal{S}$ equal to the collection of all connected subsets of $\Omega$ containing $x$ to obtain a connected set $C_x$. The set $C_x$ is the largest connected subset of $\Omega$ that contains $x$ and is called the connected component of $\Omega$ containing $x$.
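Components can be computed by hand when $\Omega$ is a finite union of closed bounded intervals in $\mathbb{R}$, such as $[0, 1] \cup [2, 3]$ from the beginning of this section. The method is to merge intervals that overlap or touch. Each merged interval is connected by Example 7.16. There are finitely many merged intervals, and they are closed and pairwise disjoint, so each is also open in $\Omega$ and is therefore a full component $C_x$. The Python sketch below is an illustration only. It assumes the intervals are given as pairs $(a, b)$ with $a \le b$.

```python
def components(intervals):
    """Connected components of a finite union of closed intervals [a, b]."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:   # overlaps or touches the last block
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return [tuple(m) for m in merged]

def component_of(x, intervals):
    """The component C_x of the union that contains x."""
    for a, b in components(intervals):
        if a <= x <= b:
            return (a, b)
    raise ValueError("x does not lie in the union")
```

For $[0, 1] \cup [0.5, 1.5] \cup [2, 3]$ this returns the two components $[0, 1.5]$ and $[2, 3]$.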
THEOREM 7.5 Let $\Omega$ be a topological space.
a) For each pair of elements $x, y \in \Omega$, either $C_x = C_y$ or $C_x \cap C_y = \emptyset$.
b) For each $x \in \Omega$, $C_x$ is closed.
c) $\Omega = \bigcup_{x \in \Omega} C_x$.

PROOF: a) If $C_x \cap C_y \neq \emptyset$, then, by Proposition 7.18, $C_x \cup C_y$ is connected. Since $C_x \cup C_y$ is a connected set containing both x and y, and $C_x$ and $C_y$ are the largest connected sets containing x and y, respectively, it follows that $C_x \cup C_y \subset C_x$ and $C_x \cup C_y \subset C_y$. Hence, $C_x = C_y$.
b) By Proposition 7.19, $\overline{C_x}$ is connected for each $x \in \Omega$. Since $\overline{C_x}$ contains x, we have $\overline{C_x} \subset C_x$ and, so, $C_x$ is closed. Thus, (b) holds.
c) The proof of (c) is trivial because $x \in C_x$ for all $x \in \Omega$.

A topological space $\Omega$ is said to be totally disconnected if all of its connected components are single-element sets.

EXAMPLE 7.17 Illustrates Totally Disconnected Spaces
a) Any nonempty set is totally disconnected with respect to the discrete topology.
b) The set $\mathcal{Q}$ of rational numbers, equipped with the relative topology inherited from $\mathcal{R}$, is totally disconnected.
c) The Cantor set $P$, equipped with the relative topology inherited from $\mathcal{R}$, is totally disconnected. □

EXERCISES 7.7

7.105 Show that a topological space is disconnected if and only if it has a subset that is proper, nonempty, open, and closed.
7.106 Show that a continuous integer-valued function on a connected space must be constant.
7.107 Refer to Exercise 7.64 on page 438. Let $\Omega$ be a topological space and $A \subset \Omega$. Suppose $g: [0, 1] \to \Omega$ is a continuous function such that $g(0) \in A$ and $g(1) \in A^c$. Show that there exists an $s \in [0, 1]$ such that $g(s) \in \partial A$.
7.108 Provide the omitted details of Example 7.16 on page 454.
7.109 Give an example of a topological space that is connected but not arcwise connected. Hint: Consider the following subset of $\mathcal{R}^2$: $\{ (0, y) \in \mathcal{R}^2 : -1 \le y \le 1 \} \cup \{ (x, y) \in \mathcal{R}^2 : x > 0,\ y = \sin(1/x) \}$.
7.110 Consider the normed linear space $(C([a, b]), \|\cdot\|_\infty)$ from Example 7.8 on page 425. Which of the following subsets of $C([a, b])$ are connected? Provide a proof in each case.
a) $\{ g : g \text{ is real-valued and never } 0 \}$,
b) $\{ g : g(x) > 0 \text{ for each } x \in [a, b] \}$,
c) $\{ g : g \text{ is never } 0 \text{ on } [a, b] \}$.
7.111 Let p, q, and r be points of a topological space $\Omega$. Prove each of the following:
a) p is connected to itself by an arc.
b) If p is connected to q by an arc, then q is connected to p by an arc.
c) If p is connected to q by an arc and q is connected to r by an arc, then p is connected to r by an arc.
7.112 Let $\Omega$ be a topological space. For $x \in \Omega$, define the arcwise connected component of x by $A_x = \{ y \in \Omega : y \text{ is connected to } x \text{ by an arc} \}$.
a) Prove analogues of parts (a) and (c) of Theorem 7.5 on page 458 using arcwise connected components in place of connected components.
b) Show that the analogue of part (b) of Theorem 7.5 is false in general, but is true if $\Omega$ is an open subset of a normed space and, so, in particular, is true if $\Omega$ is a normed space.
7.113 Prove Proposition 7.18 on page 457.
7.114 Prove Proposition 7.19 on page 457.
7.115 Let $T$ denote the unit circle centered at 0 in the complex plane $\mathcal{C}$; let $C(T)$ be the space of complex-valued continuous functions defined on $T$, equipped with the norm $\|f\| = \sup\{ |f(z)| : z \in T \}$; and let G be the set of nonvanishing functions in $C(T)$.
a) Show that G is open.
b) Describe the connected component of the constant function 1.
c) Describe the connected components of G.
7.116 The Cantor function restricted to the Cantor set is an example of a continuous function mapping a totally disconnected space onto a connected space. Show that if $\Omega$ is a connected space, then there are no nonconstant continuous functions from $\Omega$ into the Cantor set.

7.8 SEPARABILITY, SECOND COUNTABILITY, AND METRIZABILITY

In this section, we will discuss separable spaces and a related class of spaces known as second countable spaces. We will also prove a powerful theorem that gives a sufficient condition for a topological space to be metrizable.
Separable Spaces

Recall that a subset E of a topological space $\Omega$ is dense if $\overline{E} = \Omega$. A crucial property of the space $\mathcal{R}$ of real numbers is that it contains a countable subset that is dense; for example, the countable set $\mathcal{Q}$ of rational numbers is
dense, as we know from Proposition 2.4 on page 39. Many of the topological spaces of interest in analysis share with $\mathcal{R}$ the property of having subsets that are both countable and dense. Such spaces are called separable.

DEFINITION 7.24 Separable Space A topological space $\Omega$ is said to be separable if it contains a countable dense subset; that is, if there is a set $E \subset \Omega$ such that E is countable and $\overline{E} = \Omega$.

EXAMPLE 7.18 Illustrates Definition 7.24 In this example, we use the notation $\mathcal{Q} + i\mathcal{Q}$ for the set of complex numbers having rational real and imaginary parts. We note that $\mathcal{Q} + i\mathcal{Q}$ is a countable set. (Why?)
a) Consider the space $\ell^1(\mathcal{N})$. For each $n \in \mathcal{N}$, the set
$A_n = \{ f \in \ell^1(\mathcal{N}) : f(j) \in \mathcal{Q} + i\mathcal{Q} \text{ for } 1 \le j \le n, \text{ and } f(j) = 0 \text{ for } j > n \}$
is countable. Hence, by Proposition 1.10 on page 23, $A = \bigcup_{n=1}^\infty A_n$ is also countable. It is left for Exercise 7.120 to show that A is dense in $\ell^1(\mathcal{N})$. Thus, $\ell^1(\mathcal{N})$ is separable.
b) Consider the normed space $(C([a, b]), \|\cdot\|_\infty)$ discussed in Example 7.8 on page 425. For each $n \in \mathcal{N}$, let $PL_n$ denote the set of $f \in C([a, b])$ with the property that, for each $j = 0, 1, \ldots, n - 1$, the restriction of f to the subinterval $[a + j(b - a)/n,\ a + (j + 1)(b - a)/n]$ is of the form $m_j x + b_j$, where $m_j, b_j \in \mathcal{Q} + i\mathcal{Q}$. Each function in $PL_n$ is completely determined by a 2n-tuple of numbers in $\mathcal{Q} + i\mathcal{Q}$. Hence, $PL_n$ is a countable set and, so, $PL = \bigcup_{n=1}^\infty PL_n$ is also countable. It is left for Exercise 7.121 to show that PL is dense in $C([a, b])$. Thus, $C([a, b])$ is separable. □

EXAMPLE 7.19 A Nonseparable Metric Space Consider the space $\mathcal{L}^\infty([0, 1])$. For $0 \le s < t \le 1$, the functions $\chi_{[0,t]}$ and $\chi_{[0,s]}$ differ by 1 on the interval $(s, t]$, which has positive Lebesgue measure, so $\|\chi_{[0,t]} - \chi_{[0,s]}\|_\infty = 1$. Since a function lying in both $B_{1/2}(\chi_{[0,t]})$ and $B_{1/2}(\chi_{[0,s]})$ would force $\|\chi_{[0,t]} - \chi_{[0,s]}\|_\infty < 1/2 + 1/2 = 1$, the family $\{ \chi_{[0,t]} : t \in [0, 1] \}$ of characteristic functions satisfies the condition

$B_{1/2}(\chi_{[0,t]}) \cap B_{1/2}(\chi_{[0,s]}) = \emptyset$ for $t \neq s$. (7.10)

If E is a dense subset of $\mathcal{L}^\infty([0, 1])$, then, for each $t \in [0, 1]$, there is an $f_t \in E \cap B_{1/2}(\chi_{[0,t]})$. By (7.10), no two $f_t$'s can coincide. Because the collection $\{ f_t : t \in [0, 1] \}$ is uncountable, it follows that E is not countable.
Consequently, $\mathcal{L}^\infty([0, 1])$ is not separable. □
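Exercise 7.121 asks for a proof that PL is dense; the following Python sketch is only a numerical illustration of why the construction in Example 7.18(b) is plausible, not a substitute for that proof. It builds a continuous piecewise linear function with n equal pieces whose slopes and intercepts are exact rationals (using the standard-library `fractions.Fraction`), by interpolating rational approximations of a test function at the breakpoints, and then estimates the sup-norm distance on a sample grid. The function names, the test function $\sin$ on $[0, 1]$, and the rounding via `Fraction.limit_denominator` are our own choices; the sketch also restricts to real rational coefficients, whereas $PL_n$ allows coefficients in $\mathcal{Q} + i\mathcal{Q}$.

```python
import math
from fractions import Fraction


def rational_pl_approximation(f, a, b, n, max_den=10**6):
    """Return (nodes, pieces) for a continuous piecewise linear function on
    [a, b] with n equal pieces; on piece j it equals m_j * x + c_j, where m_j
    and c_j are exact rationals (c_j plays the role of the text's b_j)."""
    a, b = Fraction(a), Fraction(b)
    nodes = [a + j * (b - a) / n for j in range(n + 1)]
    # Rational approximations of f at the (rational) breakpoints.
    values = [Fraction(f(float(t))).limit_denominator(max_den) for t in nodes]
    pieces = []
    for j in range(n):
        m = (values[j + 1] - values[j]) / (nodes[j + 1] - nodes[j])
        c = values[j] - m * nodes[j]  # rational, since m, values[j], nodes[j] are
        pieces.append((m, c))
    return nodes, pieces


def evaluate(nodes, pieces, x):
    """Evaluate the piecewise linear function at a float x in [a, b]."""
    a, b, n = float(nodes[0]), float(nodes[-1]), len(pieces)
    j = min(int((x - a) / (b - a) * n), n - 1)  # index of the piece containing x
    m, c = pieces[j]
    return float(m) * x + float(c)


def sup_error(f, nodes, pieces, samples=10_000):
    """Sampled estimate of ||f - g||_inf (it can only underestimate the sup)."""
    a, b = float(nodes[0]), float(nodes[-1])
    xs = [a + k * (b - a) / samples for k in range(samples + 1)]
    return max(abs(f(x) - evaluate(nodes, pieces, x)) for x in xs)


if __name__ == "__main__":
    for n in (4, 16, 64):
        nodes, pieces = rational_pl_approximation(math.sin, 0, 1, n)
        print(n, sup_error(math.sin, nodes, pieces))
```

Because the slopes and intercepts are exact rationals, continuity at the breakpoints can be checked exactly in rational arithmetic; the sampled error is only a lower estimate of the true sup-norm distance, but it shrinks as n grows.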
Second Countable Spaces

We know that the collection $\mathcal{I}$ of open intervals forms a neighborhood basis determining the topology of $\mathcal{R}$. By considering intervals in $\mathcal{I}$ with rational endpoints, we obtain a countable neighborhood basis determining the topology of $\mathcal{R}$. There are many interesting spaces that, like $\mathcal{R}$, have countable neighborhood bases. Such spaces are called second countable.

DEFINITION 7.25 Second Countable Space A topological space is said to be second countable if it has a countable neighborhood basis.

The following proposition relates the concepts of second countable and separable for topological spaces.

PROPOSITION 7.20
a) If a topological space is second countable, then it is separable.
b) If a metric space is separable, then it is second countable.

PROOF: a) Let $\mathfrak{U}$ be a countable neighborhood basis for a topological space $\Omega$. For each nonempty $U \in \mathfrak{U}$, let $x_U \in U$. The set $\{ x_U : U \in \mathfrak{U},\ U \neq \emptyset \}$ is countable, because $\mathfrak{U}$ is countable, and it is dense because it meets each nonempty open set O: if $x \in O$, there is a $U \in \mathfrak{U}$ with $x \in U \subset O$, and then $x_U \in O$. Thus, $\Omega$ is separable.
b) Suppose that $(\Omega, \rho)$ is a metric space containing a countable dense subset, say, $E = \{x_j\}_{j=1}^\infty$. Then the collection of open balls $\mathfrak{M} = \{ B_{1/k}(x_j) : j, k = 1, 2, \ldots \}$ is countable. And it is easy to show that $\mathfrak{M}$ is a neighborhood basis on $\Omega$. (See Exercise 7.122.)

We claim that $\mathfrak{M}$ is a neighborhood basis for the topology induced by $\rho$. Let O be open with respect to the topology induced by $\rho$ and let $x \in O$. Choose $\varepsilon > 0$ so that $B_\varepsilon(x) \subset O$ and let k be a positive integer such that $2/k < \varepsilon$. Since E is dense, there exists a j such that $x_j \in B_{1/k}(x)$. Then $x \in B_{1/k}(x_j)$ and, by the triangle inequality,
$B_{1/k}(x_j) \subset B_{2/k}(x) \subset B_\varepsilon(x) \subset O$.
Thus, $\mathfrak{M}$ is a neighborhood basis for the topology induced by $\rho$ and, so, $(\Omega, \rho)$ is second countable.

We next consider a consequence of second countability that will be useful later when we study compactness. Let E be a set. A collection $\mathcal{S}$ of sets such that $E \subset \bigcup_{S \in \mathcal{S}} S$ is called a covering of E. A subcollection of $\mathcal{S}$ that is also a covering of E is called a subcovering. If the members of $\mathcal{S}$ are open in some topology, then $\mathcal{S}$ is called an open covering of E. A topological space $\Omega$ is said to have the Lindelöf property if every open covering of $\Omega$ has a countable subcovering.

PROPOSITION 7.21 A second countable topological space has the Lindelöf property.

PROOF: Suppose that $\Omega$ is a topological space with a countable neighborhood basis $\{U_n\}_n$ and let $\mathcal{S}$ be an open covering of $\Omega$. For each $x \in \Omega$, we can choose an $O_x \in \mathcal{S}$ and a positive integer $n_x$ such that $x \in U_{n_x} \subset O_x$. The set of integers $B = \{ n_x : x \in \Omega \}$ is countable, being a subset of a countable set. For each $m \in B$, we can choose an $O_m \in \mathcal{S}$ such that $U_m \subset O_m$. It follows that
$\Omega \subset \bigcup_{m \in B} U_m \subset \bigcup_{m \in B} O_m$.
Thus, $\{ O_m : m \in B \}$ is a countable subcovering.

The converse of Proposition 7.21 is false in general, but it is true for metric spaces. (See Exercises 7.123 and 7.124.)

Metrization

We conclude this section by stating and proving a theorem that provides a simple pair of conditions that are sufficient for a topological space to be metrizable.

THEOREM 7.6 Urysohn’s Metrization Theorem A second countable, normal space is metrizable.

PROOF: Let $(\Omega, \mathcal{T})$ be a normal space with countable neighborhood basis $\mathfrak{N}$. Consider the countable set
$\mathcal{W} = \{ (U, V) : U, V \in \mathfrak{N} \text{ and } \overline{U} \subset V \}$.
We show first that, for each open set O and point $x \in O$, there is a pair $(U, V) \in \mathcal{W}$ such that
$x \in U \subset \overline{U} \subset V \subset O$. (7.11)
Indeed, since $\mathfrak{N}$ is a neighborhood basis, we can find a $V \in \mathfrak{N}$ such that $x \in V \subset O$. Applying Proposition 7.14 on page 449 with $F = \{x\}$,
we obtain an open set W such that $x \in W \subset \overline{W} \subset V$.† Again using the assumption that $\mathfrak{N}$ is a neighborhood basis, we can find a $U \in \mathfrak{N}$ such that $x \in U \subset W$. Then $\overline{U} \subset \overline{W} \subset V$, so the pair $(U, V)$ belongs to $\mathcal{W}$ and satisfies (7.11).

Let $\{(U_n, V_n)\}_n$ be an enumeration of $\mathcal{W}$ and apply Urysohn’s lemma (Theorem 7.2 on page 450) to obtain, for each n, a continuous function $f_n: \Omega \to [0, 1]$ that vanishes on $\overline{U_n}$ and is constantly 1 on $V_n^c$. Using the functions $\{f_n\}_n$, we define a function $\sigma$ on $\Omega \times \Omega$ by
$\sigma(x, y) = \sum_n 2^{-n} |f_n(x) - f_n(y)|$,
the series converging because its nth term is at most $2^{-n}$.

We claim that $\sigma$ is a metric. That $\sigma(x, x) = 0$, $\sigma(x, y) = \sigma(y, x)$, and $\sigma(x, y) \le \sigma(x, z) + \sigma(z, y)$ are easily verified; the last follows by summing the inequalities $|f_n(x) - f_n(y)| \le |f_n(x) - f_n(z)| + |f_n(z) - f_n(y)|$ weighted by $2^{-n}$. Thus, it remains only to show that $\sigma(x, y) > 0$ if $x \neq y$. Because $\{y\}$ is a closed set, (7.11) implies that there is a k such that
$x \in U_k \subset \overline{U_k} \subset V_k \subset \Omega \setminus \{y\}$.
Thus, $f_k(x) = 0$ and $f_k(y) = 1$. So, $\sigma(x, y) \ge 2^{-k} |f_k(x) - f_k(y)| = 2^{-k} > 0$.

The last step of the proof is to show that the topology $\mathcal{T}_\sigma$ induced by the metric $\sigma$ is the same as $\mathcal{T}$. For fixed $y \in \Omega$, consider the function $g_y(x) = \sigma(x, y)$. It follows from Exercise 7.89 on page 447 that $g_y$ is continuous with respect to the topology $\mathcal{T}$. Hence, for each $r > 0$, the ball $B_r^\sigma(y) = g_y^{-1}((-\infty, r))$ is $\mathcal{T}$-open. Consequently, the topology $\mathcal{T}_\sigma$ is weaker than $\mathcal{T}$. To prove that $\mathcal{T}$ is weaker than $\mathcal{T}_\sigma$, it suffices to show that if O is $\mathcal{T}$-open and $x \in O$, then there exists an $s > 0$ such that $B_s^\sigma(x) \subset O$. Referring to (7.11), we see that we can find a positive integer m such that $x \in U_m \subset \overline{U_m} \subset V_m \subset O$. From $2^{-m} |f_m(x) - f_m(y)| \le \sigma(x, y)$ and $f_m(x) = 0$, we deduce that, for $y \in B_{2^{-m}}^\sigma(x)$,
$f_m(y) = |f_m(x) - f_m(y)| \le 2^m \sigma(x, y) < 1$.
Because $f_m$ is constantly 1 on $V_m^c$, it follows that $B_{2^{-m}}^\sigma(x) \subset V_m \subset O$.

The following corollary of Urysohn’s metrization theorem provides a sufficient condition for a space with a weak topology to be metrizable. Its proof is left to the reader as Exercise 7.127.
COROLLARY 7.4 Let $\Omega$ be a set equipped with the weak topology induced by a family of functions $\mathcal{F}$ satisfying the following conditions:

† See the paragraph following Definition 7.22 on page 448.
a) $\mathcal{F}$ is countable.
b) If $x, y \in \Omega$ and $x \neq y$, then there is an $f \in \mathcal{F}$ such that $f(x) \neq f(y)$.
c) $f(\Omega)$ is metrizable for each $f \in \mathcal{F}$.
Then $\Omega$ is metrizable.

EXERCISES 7.8

7.117 Show that the spaces $\mathcal{R}^n$ and $\mathcal{C}^n$ are separable.
7.118 Let E be a Lebesgue measurable subset of $\mathcal{R}$. Show that
a) $\mathcal{L}^1(E)$ and $\mathcal{L}^2(E)$ are separable.
b) $\mathcal{L}^\infty(E)$ is not separable except in the trivial case where E has Lebesgue measure 0.
7.119 Work Exercise 7.118 with $\mathcal{R}$ replaced by $\mathcal{R}^n$.
7.120 Show that the set A in Example 7.18(a) on page 460 is dense in $\ell^1(\mathcal{N})$.
7.121 Show that the set PL in Example 7.18(b) on page 460 is dense in $C([a, b])$.
7.122 Refer to the first paragraph in the proof of part (b) of Proposition 7.20 on page 461. Show that $\mathfrak{M}$ is a neighborhood basis on $\Omega$.
7.123 Let $\mathcal{V}$ denote the topology on $\mathcal{R}$ determined by the neighborhood basis consisting of all intervals of the form $[a, b)$. Show that $(\mathcal{R}, \mathcal{V})$ is separable and has the Lindelöf property but is not second countable.
7.124 Show that a metric space with the Lindelöf property is second countable.
★ 7.125 A topological space is called first countable if at each point of the space, there is a countable neighborhood basis. Show that the space in Exercise 7.123 is first countable.
7.126 Show that the topological space in Example 7.19 on page 460 fails to be second countable.
7.127 Prove Corollary 7.4 on page 463.
7.128 Show that the conclusion of Corollary 7.4 on page 463 fails if the hypothesis that $\mathcal{F}$ is countable is omitted.

7.9 COMPACT METRIC SPACES

The idea of compactness of a set of real numbers can be formulated in several ways: for example, compactness as the Heine-Borel property (see next page) or compactness in terms of the Bolzano-Weierstrass condition (see Exercise 2.45 on page 63). In this section, we will present a definition of compactness in the context of metric spaces that reduces to the Heine-Borel property in the case of the real line $\mathcal{R}$.
We will also prove an important theorem that provides several alternative characterizations of compactness.
DEFINITION 7.26 Compact Set, Compact Metric Space A subset E of a metric space $\Omega$ is called compact if every open covering of E has a finite subcovering. If $\Omega$ itself is compact, then it is said to be a compact metric space.

In practice, we often verify that a space is compact not directly from the definition but, rather, by using conditions equivalent to compactness. These conditions are generalizations of various formulations of compactness on the line. For example, the Heine-Borel theorem asserts that a subset of $\mathcal{R}$ is compact if and only if it is closed and bounded. We will see that we can get an appropriate generalization of the Heine-Borel theorem by using the right analogues of the terms closed and bounded.

The following simple example shows that a condition more subtle than “closed and bounded” is needed to extend the Heine-Borel theorem to metric spaces. Let $\Omega = \mathcal{Q}$, $\rho(x, y) = |x - y|$, and $E = [t, t + 1] \cap \mathcal{Q}$, where t is any irrational number. We note that although E is a closed and bounded subset of $\mathcal{Q}$, the collection $\{ (t + 1/n, t + 1) \cap \mathcal{Q} \}_{n=1}^\infty$ is an open covering of E without a finite subcovering: any finite subcollection misses the rational numbers in $(t, t + 1/N]$, where N is the largest index used.

It is also not clear what replacement for bounded is appropriate for general metric spaces. A naive approach would call a subset E of a metric space “bounded” if the set of distances between points of E is bounded. By Proposition 7.6 on page 426, however, any metric is equivalent to a bounded metric. Because a set that is compact with respect to a metric $\rho$ is also compact with respect to an equivalent metric, it follows that imposing a boundedness condition on the distances between points of a set is irrelevant to the problem of characterizing compactness.

We will show that one way to generalize the Heine-Borel theorem to arbitrary metric spaces is to replace the term closed by complete and the term bounded by what is called totally bounded.
DEFINITION 7.27 Totally Bounded Set A subset E of a metric space $\Omega$ is said to be totally bounded if for each $\varepsilon > 0$, there exist finitely many points $x_1, \ldots, x_n$ of E such that $E \subset \bigcup_{j=1}^n B_\varepsilon(x_j)$.
We note that a compact subset E of a metric space $\Omega$ is totally bounded because, for each $\varepsilon > 0$, the collection of balls $\{ B_\varepsilon(x) : x \in E \}$ is an open covering of E and, hence, has a finite subcovering. However, total boundedness is not by itself sufficient to guarantee compactness, as can be seen by considering again the example where $\Omega = \mathcal{Q}$, $\rho(x, y) = |x - y|$, and $E = \mathcal{Q} \cap [t, t + 1]$. The following theorem shows, among other things, that total boundedness and completeness together are equivalent to compactness.

THEOREM 7.7 For a nonempty subset E of a metric space $\Omega$, the following conditions are equivalent:
a) E is compact.
b) If $\{F_n\}_{n=1}^\infty$ is a sequence of closed subsets of $\Omega$ such that for each $N \in \mathcal{N}$ we have $E \cap \left( \bigcap_{n=1}^N F_n \right) \neq \emptyset$, then $E \cap \left( \bigcap_{n=1}^\infty F_n \right) \neq \emptyset$.
c) Every sequence of points of E has a subsequence converging to a point of E.
d) E is complete and totally bounded.

PROOF: (a) ⇒ (b): Suppose that E is compact and that $\{F_n\}_{n=1}^\infty$ is a sequence of closed sets such that $E \cap \left( \bigcap_{n=1}^N F_n \right) \neq \emptyset$ for each $N \in \mathcal{N}$. We must prove that $E \cap \left( \bigcap_{n=1}^\infty F_n \right) \neq \emptyset$. Suppose to the contrary. Then
$E \subset \left( \bigcap_{n=1}^\infty F_n \right)^c = \bigcup_{n=1}^\infty F_n^c$.
Therefore, $\{F_n^c\}_{n=1}^\infty$ is an open covering of E. Because E is compact, we have that $E \subset \bigcup_{n=1}^N F_n^c$ for some N. Thus, $E \cap \left( \bigcap_{n=1}^N F_n \right) = \emptyset$. This contradiction shows that $E \cap \left( \bigcap_{n=1}^\infty F_n \right) \neq \emptyset$.

(b) ⇒ (c): Suppose that E satisfies (b). Let $\{x_n\}_{n=1}^\infty$ be a sequence of points of E. The sets $F_n = \overline{\{ x_k : k \ge n \}}$, $n \in \mathcal{N}$, satisfy the hypothesis of (b) (indeed, $x_N \in E \cap \bigcap_{n=1}^N F_n$) and, hence, $E \cap \left( \bigcap_{n=1}^\infty F_n \right)$ contains some point x. We will find a subsequence of $\{x_n\}_{n=1}^\infty$ that converges to x. As $x \in F_1$, there exists an $n_1 \ge 1$ such that $\rho(x_{n_1}, x) < 1$. Suppose that integers $n_1 < n_2 < \cdots < n_k$ have been chosen such that $\rho(x_{n_j}, x) < 1/j$ for $j = 1, 2, \ldots, k$. Because $x \in F_{n_k + 1}$, we can find an $n_{k+1} > n_k$ such that $\rho(x_{n_{k+1}}, x) < 1/(k + 1)$. Thus, we have defined inductively a subsequence $\{x_{n_k}\}_{k=1}^\infty$ of $\{x_n\}_{n=1}^\infty$ that converges to the point x of E.
(c) ⇒ (d): Suppose E satisfies (c). First we show that E is totally bounded. Let $\varepsilon > 0$. If $x_1 \in E$, then either $E \subset B_\varepsilon(x_1)$ or there is an $x_2 \in E \setminus B_\varepsilon(x_1)$. In the former case, we have found an open ball of radius $\varepsilon$ that covers E. In the latter case, we again have two possibilities: either $E \subset B_\varepsilon(x_1) \cup B_\varepsilon(x_2)$ or there is an $x_3 \in E \setminus (B_\varepsilon(x_1) \cup B_\varepsilon(x_2))$. Clearly, we can continue with this line of reasoning to obtain either a finite collection of balls of radius $\varepsilon$ covering E or a sequence $\{x_n\}_{n=1}^\infty \subset E$ satisfying $x_{n+1} \notin \bigcup_{j=1}^n B_\varepsilon(x_j)$ for all $n \in \mathcal{N}$. The latter case, however, contradicts (c) because it implies $\rho(x_n, x_m) \ge \varepsilon$ for $m \neq n$ which, in turn, implies that $\{x_n\}_{n=1}^\infty$ cannot have a convergent subsequence. Hence, E is totally bounded.

Next we show that E is complete. If $\{x_n\}_{n=1}^\infty$ is a Cauchy sequence in E, then, by (c), it has a subsequence converging to a point of E. Exercise 7.55 on page 438, which states that a Cauchy sequence with a convergent subsequence is convergent, now shows that $\{x_n\}_{n=1}^\infty$ converges to that point of E.

(d) ⇒ (c): We will use a famous argument due to Georg Cantor. Let $\{x_n\}_{n=1}^\infty$ be a sequence of points of E. Since E is totally bounded, we can find a finite number of open balls of radius 1/2 that cover E. It follows that one of those balls must contain $x_n$ for infinitely many n. Hence, it is possible to find a subsequence of $\{x_n\}_{n=1}^\infty$ whose terms are all contained in a single ball. It is convenient to denote the nth term of this subsequence by $x_{[1,n]}$. Then we have $\rho(x_{[1,n]}, x_{[1,m]}) < 1$ for $n, m \in \mathcal{N}$. Similarly, by covering E with finitely many open balls of radius 1/4, we can find a subsequence $\{x_{[2,n]}\}_{n=1}^\infty$ of $\{x_{[1,n]}\}_{n=1}^\infty$ such that $\rho(x_{[2,n]}, x_{[2,m]}) < 1/2$ for $n, m \in \mathcal{N}$. Continuing inductively, we obtain an infinite sequence of subsequences $\{\{x_{[k,n]}\}_{n=1}^\infty\}_{k=1}^\infty$ such that $\{x_{[k+1,n]}\}_{n=1}^\infty$ is a subsequence of $\{x_{[k,n]}\}_{n=1}^\infty$ and $\rho(x_{[k,n]}, x_{[k,m]}) < 1/k$ for $m, n \in \mathcal{N}$. It follows that the diagonal sequence $\{x_{[n,n]}\}_{n=1}^\infty$ is a subsequence of $\{x_n\}_{n=1}^\infty$ satisfying $\rho(x_{[k,k]}, x_{[j,j]}) < \max\{1/j, 1/k\}$. Thus, $\{x_{[n,n]}\}_{n=1}^\infty$ is a Cauchy subsequence of $\{x_n\}_{n=1}^\infty$ and, so, by completeness, that subsequence converges to a point of E.
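For a finite set of points, the case analysis at the start of the proof of (c) ⇒ (d) becomes a greedy procedure: scan the points and keep any point lying outside the ε-balls around the points already kept. The Python sketch below illustrates this on a random sample from the unit square with the Euclidean metric; the function name `greedy_epsilon_net`, the sample, and $\varepsilon = 0.2$ are illustrative assumptions. The kept centers are pairwise at distance at least ε, exactly like the sequence $\{x_n\}$ in the proof, and their ε-balls cover the sample. For an infinite set E the scan need not stop, and condition (c) is what rules that out.

```python
import math
import random


def greedy_epsilon_net(points, eps, dist=math.dist):
    """Scan the points in order, keeping a point as a new center whenever it
    lies outside every eps-ball around the centers already kept."""
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)  # p is not in B_eps(c) for any earlier center c
    return centers


rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [(rng.random(), rng.random()) for _ in range(2000)]  # points of [0,1]^2
net = greedy_epsilon_net(sample, 0.2)
print(len(net), "centers")
```

Every point that is not kept was skipped only because it already lay within ε of an earlier center, which is why the ε-balls around the kept centers cover the whole sample.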
(d) ⇒ (a): Let E satisfy (d). Suppose for the moment we can show that $(E, \rho)$ is separable. Then, by Proposition 7.20 (page 461) and Proposition 7.21 (page 462), E has the Lindelöf property. Thus, if $\mathcal{O}$ is an open covering of E, then it has a countable subcovering $\{O_n\}_{n=1}^\infty$. We claim that $\{O_n\}_{n=1}^\infty$ has a finite subcovering. For otherwise, we can choose an element $x_n \in E \setminus \bigcup_{j=1}^n O_j$ for $n = 1, 2, \ldots$. Since we have already shown that (c) and (d) are equivalent, it follows that $\{x_n\}_{n=1}^\infty$ has a subsequence $\{x_{n_k}\}_{k=1}^\infty$ that converges to a point $x \in E$. Because $\{O_n\}_{n=1}^\infty$ is a covering of E, we have $x \in O_m$ for some m. Because $\lim_{k \to \infty} x_{n_k} = x$
and, because $O_m$ is open, it follows that there is a k such that $n_k > m$ and $x_{n_k} \in O_m$. But this is a contradiction as $x_{n_k} \in E \setminus \bigcup_{j=1}^{n_k} O_j \subset E \setminus O_m$.

To complete the proof of (d) ⇒ (a), we need to show that $(E, \rho)$ is separable. If k is a positive integer, then, because E is totally bounded, there are points $x_{j,k} \in E$, $j = 1, 2, \ldots, m_k$, such that $E \subset \bigcup_{j=1}^{m_k} B_{1/k}(x_{j,k})$. Let $A = \{ x_{j,k} : 1 \le j \le m_k,\ k \in \mathcal{N} \}$. Then A is countable. Now let $B_\varepsilon(x)$ be an open ball centered at an element $x \in E$. Then, choosing k with $1/k < \varepsilon$, we can find a j such that $\rho(x, x_{j,k}) < 1/k$. It follows that $x_{j,k} \in B_\varepsilon(x)$. Thus, every open ball around a point of E contains a point of A. Hence, A is a countable dense subset of E and, so, $(E, \rho)$ is separable.

In the last two paragraphs of the proof of Theorem 7.7, we established the following result.

COROLLARY 7.5 A totally bounded metric space is separable.

EXAMPLE 7.20 Illustrates Theorem 7.7 Let $\|\cdot\|$ denote one of the norms defined in Example 7.4 (or 7.5) on page 422. By Proposition 7.10 on page 435 and Exercise 7.59 (or 7.60) on page 438, a subset E of $\mathcal{R}^n$ (or $\mathcal{C}^n$) is complete if and only if it is closed. Exercise 7.129 shows that E is totally bounded if and only if E is bounded, that is, if and only if $\sup\{ \|x\| : x \in E \} < \infty$. From Theorem 7.7, we can now deduce the classical Heine-Borel theorem: A subset of $\mathcal{R}^n$ (or $\mathcal{C}^n$) is compact if and only if it is closed and bounded. □

A set E in a normed space $(\Omega, \|\cdot\|)$ is called bounded if $\sup\{ \|x\| : x \in E \} < \infty$. Example 7.20 suggests that in a normed space, total boundedness might be equivalent to boundedness. That this is not correct is shown by the following example.

EXAMPLE 7.21 A Noncompact, Closed and Bounded Set Refer to Exercise 7.48(d) on page 437. The closed unit ball $\overline{B}_1(0)$ in the space $\ell^2(\mathcal{N})$ is closed and bounded. For each $n \in \mathcal{N}$, let $e_n(k) = 1$ if $k = n$, and $e_n(k) = 0$ if $k \neq n$.
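As $\|e_n - e_m\|_2 = \sqrt{2}$ for $n \neq m$, it follows that no ball of radius 1/2 can contain more than one $e_n$: if $e_n, e_m \in B_{1/2}(y)$ with $n \neq m$, then $\sqrt{2} = \|e_n - e_m\|_2 \le \|e_n - y\|_2 + \|y - e_m\|_2 < 1$, which is impossible. Thus, the sequence $\{e_n\}_{n=1}^\infty$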
of elements of $\overline{B}_1(0)$ cannot be contained in a finite union of balls of radius 1/2. Hence, $\overline{B}_1(0)$ is not totally bounded and, so, by Theorem 7.7(d), is not compact. □

Properties of Compact Metric Spaces

Next we discuss some useful properties of compact metric spaces. Proofs will be left for the exercises.

DEFINITION 7.28 The Lebesgue Number of a Covering Let $\mathcal{O}$ be an open covering of a metric space $(\Omega, \rho)$. A number $\lambda > 0$ is called a Lebesgue number of $\mathcal{O}$ if for each $x \in \Omega$, the ball $B_\lambda(x)$ is entirely contained in some member of $\mathcal{O}$.

THEOREM 7.8 Let $(\Omega, \rho)$ be a compact metric space. Then every open covering of $\Omega$ has a Lebesgue number.

PROOF: See Exercise 7.137.

DEFINITION 7.29 Uniformly Continuous Function Let $(\Omega, \rho)$ and $(\Lambda, \sigma)$ be metric spaces. A function $f: \Omega \to \Lambda$ is called uniformly continuous if for each $\varepsilon > 0$, there is a $\delta > 0$ such that $\sigma(f(x), f(y)) < \varepsilon$ whenever $\rho(x, y) < \delta$.

Note: A crucial element of Definition 7.29 is that $\delta$ depends only on $\varepsilon$. It has no dependence on x and y.

THEOREM 7.9 Suppose $(\Omega, \rho)$ and $(\Lambda, \sigma)$ are metric spaces, $\Omega$ is compact, and $f: \Omega \to \Lambda$ is continuous. Then f is uniformly continuous.

PROOF: See Exercise 7.138.
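Definition 7.28 and Theorem 7.8 can be made concrete for a finite covering of $\Omega = [0, 1]$ (usual metric) by open intervals $(a_i, b_i)$. Put $r_i(x) = \min(x - a_i, b_i - x)$, where the term $x - a_i$ is replaced by $+\infty$ when $a_i < 0$ and $b_i - x$ by $+\infty$ when $b_i > 1$, and let $\phi(x) = \max_i r_i(x)$. If $r_i(x) \ge \lambda > 0$, then $B_\lambda(x) = (x - \lambda, x + \lambda) \cap [0, 1] \subset (a_i, b_i)$, so any $\lambda$ with $0 < \lambda \le \min_{[0,1]} \phi$ is a Lebesgue number. Since $\phi$ is continuous and piecewise linear with slopes $\pm 1$, its minimum is attained at 0, at 1, or at a crossing point $(a_i + b_j)/2$, so finitely many candidates suffice. The Python sketch below (the function name and the restriction to intervals covering $[0, 1]$ are our own illustrative assumptions) computes this minimum; a result that is not positive signals that the intervals do not cover $[0, 1]$.

```python
import math
from itertools import product


def lebesgue_number(intervals):
    """Given open intervals (a, b), return min over [0, 1] of
    phi(x) = max_i r_i(x).  A positive value is a Lebesgue number of the
    covering {(a, b) ∩ [0, 1]}; a value <= 0 means some point is uncovered."""

    def r(x, a, b):
        # Room inside (a, b) around x, ignoring endpoints outside [0, 1].
        left = math.inf if a < 0 else x - a
        right = math.inf if b > 1 else b - x
        return min(left, right)

    def phi(x):
        return max(r(x, a, b) for a, b in intervals)

    # phi is piecewise linear with slopes +1 and -1, so its minimum over [0, 1]
    # occurs at 0, at 1, or where a rising piece x - a_i meets a falling piece
    # b_j - x, i.e. at x = (a_i + b_j) / 2.
    candidates = {0.0, 1.0}
    for (a, _), (_, b) in product(intervals, repeat=2):
        mid = (a + b) / 2
        if 0 <= mid <= 1:
            candidates.add(mid)
    return min(phi(x) for x in candidates)


cover = [(-0.1, 0.4), (0.3, 0.7), (0.6, 1.1)]
print(lebesgue_number(cover))  # approximately 0.05, up to rounding
```

For this covering the worst points are 0.35 and 0.65, where two intervals overlap by only 0.1; there the largest ball fitting inside a single interval has radius 0.05.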
EXERCISES 7.9

7.129 Consider $\mathcal{R}^n$ equipped with any one of the norms discussed in Example 7.4 on page 422.
a) Show that a subset E of $\mathcal{R}^n$ is totally bounded if and only if it is bounded, that is, if and only if the set of norms of elements of E is bounded as a subset of $\mathcal{R}$.
b) Show that part (a) holds when $\mathcal{R}^n$ is replaced by $\mathcal{C}^n$. (Refer to Example 7.5 on page 422.)
7.130 In a metric space $\Omega$, let $\{x_n\}_{n=1}^\infty$ be a sequence such that $\lim_{n \to \infty} x_n = x$. Show that the set $\{ x_n : n = 1, 2, \ldots \} \cup \{x\}$ is compact.
7.131 Compactness can also be expressed in terms of the Bolzano-Weierstrass property. Let $(\Omega, \rho)$ be a metric space and $E \subset \Omega$. A point $x \in \Omega$ is called an accumulation point of E if for each $\varepsilon > 0$, there is a $y \in E$ such that $0 < \rho(x, y) < \varepsilon$. Prove that E is compact if and only if every infinite subset of E has an accumulation point that is a member of E. Hint: Show that this condition is equivalent to (c) of Theorem 7.7.
7.132 Let $y \in \ell^2(\mathcal{N})$ and $K = \{ x \in \ell^2(\mathcal{N}) : |x(j)| \le |y(j)| \text{ for each } j \in \mathcal{N} \}$. Show that K is a compact subset of $\ell^2(\mathcal{N})$.
7.133 Refer to Exercise 7.47 on page 437. Let K be a compact subset of a metric space $(\Omega, \rho)$ and let $x \in \Omega$. Show that there is an element $y \in K$ such that $\rho(x, y) = \rho(x, K)$.
7.134 Refer to Exercise 7.47 on page 437. In a metric space $(\Omega, \rho)$, let F and K be, respectively, closed and compact subsets such that $F \cap K = \emptyset$. Show that $\rho(F, K) > 0$.
7.135 Consider the normed space $(C([a, b]), \|\cdot\|_\infty)$ discussed in Example 7.8 on page 425. Show that the closed unit ball $\overline{B}_1(0)$ is not compact.
7.136 Suppose that $(\Omega, \rho)$ and $(\Lambda, \sigma)$ are metric spaces. Let $\Omega \times \Lambda$ be given the product topology.
a) Show that $\Omega \times \Lambda$ is metrizable.
b) Show that if K and H are compact subsets of $\Omega$ and $\Lambda$, respectively, then $K \times H$ is a compact subset of $\Omega \times \Lambda$.
7.137 Prove Theorem 7.8.
7.138 Prove Theorem 7.9.
7.139 Let $(\Omega, \rho)$ and $(\Lambda, \sigma)$ be metric spaces and let $f: \Omega \to \Lambda$ be continuous.
Show that if K is a compact subset of $\Omega$, then $f(K)$ is a compact subset of $\Lambda$. In words, the continuous image of a compact space is compact.
7.140 Prove that a continuous real-valued function on a compact metric space attains maximum and minimum values. Hint: See Exercise 7.139.
7.10 COMPACT TOPOLOGICAL SPACES

In Section 7.9, we examined compact metric spaces. We are now ready to discuss compactness in the setting of arbitrary topological spaces. Our main goal is to prove a generalization of Theorem 7.7 on page 466, following which we will derive some useful properties of compact topological spaces.

DEFINITION 7.30 Compact Set, Compact Topological Space A subset E of a topological space $\Omega$ is called compact if every open covering of E has a finite subcovering. If $\Omega$ itself is compact, then it is said to be a compact topological space.

Remark: Certainly, any compact metric space satisfies Definition 7.30. Later, we will give examples of nonmetrizable compact topological spaces.

We note that E is a compact subset of $\Omega$ if and only if E equipped with the relative topology is a compact topological space. We observe also that the union of a finite collection of compact sets is compact.

By studying conditions (a)-(d) in Theorem 7.7, we find that only (d) involves the use of a metric in a crucial way; conditions (a)-(c) have natural generalizations to the setting of any topological space. We can generalize condition (c) by passing from sequences to nets. And we can generalize condition (b) by introducing the finite intersection property: A collection $\mathcal{C}$ of subsets of a set $\Omega$ is said to have the finite intersection property if the intersection of each finite subcollection of $\mathcal{C}$ is nonempty.

THEOREM 7.10 The following conditions on a topological space $\Omega$ are equivalent:
a) $\Omega$ is compact.
b) If a collection $\mathcal{C}$ of closed subsets of $\Omega$ has the finite intersection property, then $\bigcap_{F \in \mathcal{C}} F \neq \emptyset$.
c) Every net in $\Omega$ has a convergent subnet.

PROOF: The equivalence of (a) and (b) is left for Exercise 7.141.

(b) ⇒ (c): Suppose (b) holds. Let $\{x_\iota\}_{\iota \in I}$ be a net in $\Omega$ having index set I with relation $\preceq$. For each index $\iota$, let $F_\iota = \overline{\{ x_\eta : \iota \preceq \eta \}}$.
We claim that the collection $\{ F_\iota : \iota \in I \}$ of closed subsets of $\Omega$ has the finite intersection property. For, if $\iota_1, \iota_2, \ldots, \iota_n$ are indices, then, because I is directed, there
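is an index $\iota_0$ such that $\iota_j \preceq \iota_0$ for each $j = 1, 2, \ldots, n$. It follows that $F_{\iota_0} \subset F_{\iota_j}$ for each j and, so, $\bigcap_{j=1}^n F_{\iota_j} \supset F_{\iota_0} \neq \emptyset$. Hence, by (b), $\bigcap_{\iota \in I} F_\iota$ contains an element x. We will construct a subnet of $\{x_\iota\}_{\iota \in I}$ converging to x.

Let $\mathfrak{N}$ denote the collection of all open sets containing x. For each $U \in \mathfrak{N}$ and $\iota \in I$, we have $\{ x_\eta : \iota \preceq \eta \} \cap U \neq \emptyset$, because x belongs to the closure $F_\iota$ of $\{ x_\eta : \iota \preceq \eta \}$. Applying the axiom of choice, we obtain a function $f: \mathfrak{N} \times I \to I$ such that $\iota \preceq f(U, \iota)$ and $x_{f(U,\iota)} \in U$ for each pair $(U, \iota)$. We define a relation $\trianglelefteq$ on $\mathfrak{N} \times I$ as follows: $(U, \iota) \trianglelefteq (V, \eta)$ if $f(U, \iota) \preceq f(V, \eta)$ and $V \subset U$. It is not hard to show that $\mathfrak{N} \times I$ is a directed set with respect to the relation $\trianglelefteq$. (See Exercise 7.142.) Therefore, the net defined on $\mathfrak{N} \times I$ by $y_{(U,\iota)} = x_{f(U,\iota)}$ is a subnet of $\{x_\iota\}_{\iota \in I}$. All that remains is to show that $\lim y_{(U,\iota)} = x$. Given $W \in \mathfrak{N}$, we choose any $\iota_0 \in I$. If $(W, \iota_0) \trianglelefteq (U, \iota)$, then $y_{(U,\iota)} = x_{f(U,\iota)} \in U \subset W$. It follows from the definition of convergence of nets that $\lim y_{(U,\iota)} = x$.

(c) ⇒ (b): Suppose that (c) holds and that $\mathcal{C}$ is a collection of closed sets having the finite intersection property. If $\mathcal{C}^*$ is the collection consisting of finite intersections of members of $\mathcal{C}$, then, clearly, $\bigcap_{C \in \mathcal{C}} C = \bigcap_{F \in \mathcal{C}^*} F$. Thus, to show that (c) implies (b), it is enough to show that $\bigcap_{F \in \mathcal{C}^*} F \neq \emptyset$.

The collection $\mathcal{C}^*$ is a directed set with respect to the relation $\preceq$ defined by $F_1 \preceq F_2$ if $F_2 \subset F_1$. Since each member of $\mathcal{C}^*$ is nonempty, the axiom of choice yields a net $\{x_F\}_{F \in \mathcal{C}^*}$, where $x_F \in F$ for each $F \in \mathcal{C}^*$. From (c), we know that there is a subnet $\{x_{F_\kappa}\}_{\kappa \in K}$, with index set K and corresponding relation $\trianglelefteq$, having a limit x. Given an $F \in \mathcal{C}^*$, there is an $\eta \in K$ such that $F \preceq F_\kappa$ whenever $\eta \trianglelefteq \kappa$. Thus, $x_{F_\kappa} \in F_\kappa \subset F$ when $\eta \trianglelefteq \kappa$. Because F is closed, Proposition 7.12 on page 442 implies that $x \in F$. As F was chosen arbitrarily from $\mathcal{C}^*$, we have that $x \in \bigcap_{F \in \mathcal{C}^*} F$.

Properties of Compact Topological Spaces

From Theorem 7.10, we can derive one of the most useful properties of compact spaces. In words, it states that the continuous image of a compact space is compact.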
THEOREM 7.11 Let $\Omega$ be a compact topological space and $f: \Omega \to \Lambda$ a continuous function. Then $f(\Omega)$ is a compact subset of $\Lambda$.
PROOF: Let $\{y_\iota\}_{\iota \in I}$ be a net in $f(\Omega)$. For each $\iota \in I$, we choose an $x_\iota \in \Omega$ such that $f(x_\iota) = y_\iota$, thus obtaining a net $\{x_\iota\}_{\iota \in I}$ in $\Omega$. By Theorem 7.10, there is a subnet $\{x_{\iota_\kappa}\}_{\kappa \in K}$ having a limit $x \in \Omega$. It follows from Theorem 7.1 on page 443 that
$\lim_\kappa y_{\iota_\kappa} = \lim_\kappa f(x_{\iota_\kappa}) = f(x)$.
Noting that $f(x) \in f(\Omega)$, we conclude by applying Theorem 7.10 again (to $f(\Omega)$ with its relative topology) that $f(\Omega)$ is compact.

The following corollary of Theorem 7.11 is left to the reader as an exercise. (See Exercise 7.143.)

COROLLARY 7.6 If $\Omega$ is compact and f is a real-valued continuous function on $\Omega$, then there exist points $x_1, x_2 \in \Omega$ such that $f(x_1) = \sup f(\Omega)$ and $f(x_2) = \inf f(\Omega)$.

Next, we discuss relationships between compactness and separation properties. The first result is left to the reader as Exercise 7.144.

THEOREM 7.12
a) A closed subset of a compact space is compact.
b) A compact subset of a Hausdorff space is closed.

COROLLARY 7.7 Let $\Omega$ be a compact space and $\Lambda$ a Hausdorff space. Suppose that $f: \Omega \to \Lambda$ is continuous, one-to-one, and onto. Then $f^{-1}$ is continuous and, so, f is a homeomorphism.

PROOF: According to Theorem 7.1 on page 443, it suffices to prove that $(f^{-1})^{-1}(F) = f(F)$ is closed in $\Lambda$ when F is closed in $\Omega$. But, if F is closed in $\Omega$, then, by Theorem 7.12(a), F is compact. Hence, $f(F)$ is compact by Theorem 7.11. Applying Theorem 7.12(b), we conclude that $f(F)$ is closed.

The following corollary is also left to the reader as an exercise. (See Exercise 7.145.)

COROLLARY 7.8 Let $\mathcal{T}$ and $\mathcal{U}$ be topologies on a set $\Omega$ such that $\mathcal{T}$ is weaker than $\mathcal{U}$. If $(\Omega, \mathcal{T})$ is Hausdorff and $(\Omega, \mathcal{U})$ is compact, then $\mathcal{T} = \mathcal{U}$.
THEOREM 7.13 A compact Hausdorff space is a normal space.

PROOF: Suppose that $\Omega$ is a compact Hausdorff space and that A and B are disjoint closed subsets of $\Omega$. Because $\Omega$ is compact, Theorem 7.12(a) implies that A and B are also compact. We must find disjoint open sets U and V containing A and B, respectively.

Let b be a fixed, but arbitrary, element of B. Since $\Omega$ is a Hausdorff space, we can, for each $a \in A$, find disjoint open sets $O_a$ and $P_a$ containing a and b, respectively. The collection $\{ O_a : a \in A \}$ is an open covering of A. As A is compact, there is a finite subcovering $\{ O_{a_j} : j = 1, 2, \ldots, m \}$. Let $U_b = \bigcup_{j=1}^m O_{a_j}$ and $V_b = \bigcap_{j=1}^m P_{a_j}$. Then $U_b$ is an open set containing A, $V_b$ is an open set containing b, and $U_b \cap V_b = \emptyset$, because $O_{a_j} \cap V_b \subset O_{a_j} \cap P_{a_j} = \emptyset$ for each j.

The open covering $\{ V_b : b \in B \}$ of B has a finite subcovering $\{ V_{b_k} : k = 1, 2, \ldots, n \}$. Let $V = \bigcup_{k=1}^n V_{b_k}$ and $U = \bigcap_{k=1}^n U_{b_k}$. Then U and V are open sets containing A and B, respectively, and they are disjoint because $U \cap V_{b_k} \subset U_{b_k} \cap V_{b_k} = \emptyset$ for each k.

The next corollary follows immediately from Theorem 7.13 and Theorem 7.6 on page 462.

COROLLARY 7.9 A second countable compact Hausdorff space is metrizable.

It is useful to note that Theorem 7.13 together with Urysohn’s lemma (Theorem 7.2 on page 450) shows that compact Hausdorff spaces carry an abundance of real-valued continuous functions.

EXERCISES 7.10

7.141 Prove the equivalence of (a) and (b) in Theorem 7.10 on page 471.
7.142 Prove that the set $\mathfrak{N} \times I$, defined in the proof of (b) ⇒ (c) in Theorem 7.10 on page 471, is directed with respect to the relation $\trianglelefteq$ defined there.
7.143 Prove Corollary 7.6 on page 473.
7.144 Prove Theorem 7.12 on page 473.
7.145 Prove Corollary 7.8 on page 473.
7.146 Let $\Omega$ be a compact Hausdorff space. Suppose there is a sequence $\{f_n\}_n$ of continuous real-valued functions on $\Omega$ having the following property: If $x \neq y$, then there is an n such that $f_n(x) \neq f_n(y)$. Prove that $\Omega$ is metrizable.
7.147 Refer to Exercise 7.125 on page 464.
Show that, in a first countable compact Hausdorff space, every sequence has a convergent subsequence.
7.148 Suppose that Ω and Λ are compact spaces and that Ω × Λ is given the product topology. Show that Ω × Λ is compact.
★7.149 Let Ω be a topological space. A function $f\colon \Omega \to [-\infty, \infty)$ is said to be upper semicontinuous if $f^{-1}([-\infty, r))$ is open for each real number r; a function g is said to be lower semicontinuous if −g is upper semicontinuous.
a) Show that an upper semicontinuous function on a compact space is bounded above and attains the sup of its range.
b) Show that a lower semicontinuous function on a compact space is bounded below and attains the inf of its range.
★7.150 Refer to Exercise 7.149. Suppose that f is an upper semicontinuous function on a compact Hausdorff space Ω.
a) Prove that $f(x) = \inf\{h(x) : h \text{ is continuous and } f \le h\}$ for each $x \in \Omega$.
b) State and prove an analogous result to part (a) for lower semicontinuous functions.
7.151 Refer to Exercise 7.149, Definition 6.6 on page 331, and Example 7.8 on page 425. Show that $f \mapsto V_a^b f$ defines a lower semicontinuous function on the normed space $(C([a,b]), \|\cdot\|_\infty)$.
7.11 LOCALLY COMPACT SPACES
The space of real numbers, $\mathcal{R}$, is not compact. We can see this directly by noting that the open covering $\{(-n, n) : n \in \mathcal{N}\}$ of $\mathcal{R}$ has no finite subcovering; or we can deduce it from the Heine–Borel theorem. Although $\mathcal{R}$ is not compact, compactness plays an important role in its analysis. This is because every element of $\mathcal{R}$ is contained in an open set having compact closure. Many topological spaces share with $\mathcal{R}$ this important property, which is called local compactness.
DEFINITION 7.31 Locally Compact Topological Space
A topological space Ω is said to be locally compact if for each $x \in \Omega$ there is an open set W such that $x \in W$ and $\overline{W}$ is compact.
It is not hard to see that the spaces $\mathcal{R}^n$ and $\mathcal{C}^n$ in Examples 7.4 and 7.5, respectively, on page 422 are locally compact. The spaces $\mathcal{L}^1(\mu)$,
$\mathcal{L}^2(\mu)$, and $\mathcal{L}^\infty(\mu)$ of Example 7.6 on page 423 are not locally compact except in certain special instances. (See Exercise 7.152.)
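A concrete instance helps to see why such spaces fail to be locally compact. The sketch below assumes the case of $\ell^2(\mathcal{N})$, the square-summable sequences, and stores only finitely many coordinates; the helper names are hypothetical. Every open neighborhood of 0 contains a closed ball of some radius r > 0, and the vectors $r e_n$ ($e_n$ the n-th unit sequence) lie in that ball while any two of them are exactly $r\sqrt{2}$ apart. Such a sequence has no convergent subsequence, so the ball is not compact, and neither is the closure of the neighborhood (a closed subset of a compact set would be compact).

```python
import math

def l2_distance(u, v):
    """l^2 distance between two finitely supported sequences given as lists."""
    n = max(len(u), len(v))
    u = list(u) + [0.0] * (n - len(u))  # pad with zeros
    v = list(v) + [0.0] * (n - len(v))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def scaled_basis_vector(n, r):
    """Return r * e_n (0-based index n) as a finite list."""
    vec = [0.0] * (n + 1)
    vec[n] = r
    return vec

r = 0.5  # radius of a small closed ball about 0; any r > 0 behaves the same way
vectors = [scaled_basis_vector(n, r) for n in range(10)]

# Every r * e_n has norm r, so it lies in the closed ball of radius r ...
norms = [l2_distance(v, []) for v in vectors]
# ... yet any two distinct ones are exactly r * sqrt(2) apart,
# so no subsequence can be Cauchy, let alone convergent.
gaps = [l2_distance(vectors[i], vectors[j]) for i in range(10) for j in range(i)]
```

The same computation works for any finite number of vectors; only the pairwise distance $r\sqrt{2}$ matters.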
In most cases of interest, the property of local compactness appears in conjunction with the Hausdorff property. The next several results provide some important properties of locally compact Hausdorff spaces.
PROPOSITION 7.22 Let O be an open subset of a locally compact Hausdorff space Ω.
a) If $x \in O$, then there exists an open set V such that $\overline{V}$ is compact and $x \in V \subset \overline{V} \subset O$.
b) If K is compact and $K \subset O$, then there is an open set W such that $\overline{W}$ is compact and $K \subset W \subset \overline{W} \subset O$.
PROOF:
a) Let W be an open set containing x such that $\overline{W}$ is compact. By Theorem 7.13 on page 474, $\overline{W}$ equipped with the relative topology is a normal space. We note that, in the relative topology of the compact space $\overline{W}$, $W \cap O$ is open and $\{x\}$ is closed. Hence, by Proposition 7.14 on page 449, there is a set V having the following properties: $x \in V$, V is open in the relative topology of $\overline{W}$, and the closure of V in the relative topology of $\overline{W}$ is contained in $W \cap O$. Because $\overline{W}$ is closed in Ω, it follows that the closure of V in the relative topology of $\overline{W}$ coincides with its closure $\overline{V}$ in Ω. Hence, $x \in V \subset \overline{V} \subset W \cap O \subset O$.
The proof of (a) will be complete if we can show that V is open as a subset of Ω. By the definition of relative topology, there is an open subset $U \subset \Omega$ with $V = U \cap \overline{W}$. Then $V = V \cap W = U \cap \overline{W} \cap W = U \cap W$. Thus, V is open in Ω.
b) By part (a) we can, for each $x \in K$, find an open set $V_x$ whose closure is compact and satisfies $x \in V_x \subset \overline{V_x} \subset O$. Because $K \subset \bigcup_{x\in K} V_x$ and K is compact, we can find finitely many points $x_1, x_2, \ldots, x_n$ of K such that $K \subset \bigcup_{j=1}^n V_{x_j}$. Letting $W = \bigcup_{j=1}^n V_{x_j}$, we obtain
$$K \subset W \subset \overline{W} = \bigcup_{j=1}^n \overline{V_{x_j}} \subset O.$$
As $\overline{W}$ is a finite union of compact sets, it is compact.
Using Proposition 7.22, we can prove a version of Urysohn's lemma (Theorem 7.2 on page 450) for locally compact Hausdorff spaces.
THEOREM 7.14 Suppose that Ω is a locally compact Hausdorff space and that O and K are, respectively, open and compact subsets of Ω such that $K \subset O$. Then there is a continuous function $f\colon \Omega \to [0,1]$ such that $f(x) = 1$ for $x \in K$ and $f(x) = 0$ for $x \in O^c$.
PROOF: By applying Proposition 7.22 twice, we obtain open sets $W_1$ and $W_2$ such that $\overline{W_2}$ is compact and $K \subset W_1 \subset \overline{W_1} \subset W_2 \subset \overline{W_2} \subset O$. By Theorem 7.12 on page 473, K is a closed subset of $\overline{W_2}$. Because the space $\overline{W_2}$ equipped with the relative topology is normal (Theorem 7.13 on page 474), it follows from Urysohn's lemma that there is a continuous function $g\colon \overline{W_2} \to [0,1]$ with g equal to 1 on K and 0 on $\overline{W_2} \setminus W_1$. We now define a function $f\colon \Omega \to [0,1]$ by letting f be equal to g on $\overline{W_2}$ and equal to 0 on $\Omega \setminus \overline{W_2}$. It is left as an exercise for the reader to show that f is continuous on Ω. (See Exercise 7.154.)
Theorem 7.14 is the basis of an important construction related to coverings of compact subsets of locally compact Hausdorff spaces. To describe this construction, it is helpful to introduce the following terminology. Let f be a complex-valued function on a topological space Ω. The closure of the set of points where f is not 0 is called the support of f and is denoted by supp f. Hence,
$$\operatorname{supp} f = \overline{\{x \in \Omega : f(x) \neq 0\}}.$$
THEOREM 7.15 Partition of Unity
Let Ω be a locally compact Hausdorff space, K a compact subset of Ω, and $\mathcal{O}$ an open covering of K. Then there are finitely many continuous real-valued functions $f_1, f_2, \ldots, f_n$ on Ω such that:
a) $f_j \ge 0$ for each j.
b) For each j, there is an $O_j \in \mathcal{O}$ such that $\operatorname{supp} f_j \subset O_j$.
c) $\sum_{j=1}^n f_j(x) = 1$ for each $x \in K$.
d) $\sum_{j=1}^n f_j(x) \le 1$ for each $x \in \Omega$.
PROOF: For each $x \in K$ we choose $O_x \in \mathcal{O}$ such that $x \in O_x$. By Proposition 7.22, there is an open set $V_x$ such that $\overline{V_x}$ is compact and $x \in V_x \subset \overline{V_x} \subset O_x$. By Theorem 7.14, there is, for each $x \in K$, a continuous function $g_x$ such that $0 \le g_x \le 1$, $g_x(x) = 1$, and $g_x(y) = 0$ for $y \in V_x^c$. We note that $\operatorname{supp} g_x \subset \overline{V_x} \subset O_x$.
Since $\{g_x^{-1}((0, \infty)) : x \in K\}$ is an open covering of K, there are a finite number of points $x_1, \ldots, x_n$ of K such that $K \subset \bigcup_{j=1}^n g_{x_j}^{-1}((0, \infty))$. Hence, the function $g = \sum_{j=1}^n g_{x_j}$ is strictly positive on K. By Corollary 7.6 on page 473, we have $a = \inf g(K) > 0$. The closed set $F = g^{-1}((-\infty, a/2])$ is disjoint from K. Thus, again by Theorem 7.14, we find that there is a continuous function h such that $0 \le h \le 1$, $h(x) = 0$ for $x \in K$, and $h(x) = 1$ for $x \in F$. Because the function $g + h$ is positive everywhere on Ω, it follows that the functions
$$f_j = \frac{g_{x_j}}{g + h}, \qquad j = 1, 2, \ldots, n,$$
are continuous. It is easy to check that the functions $f_1, f_2, \ldots, f_n$ satisfy conditions (a)–(d).
Theorem 7.14 can also be applied to extend Urysohn's metrization theorem (Theorem 7.6 on page 462) to locally compact Hausdorff spaces.
THEOREM 7.16 If a topological space is locally compact, Hausdorff, and second countable, then it is metrizable.
PROOF: Let Ω be a locally compact Hausdorff space with a countable neighborhood basis $\mathfrak{N}$. Let
$$\mathcal{W} = \{(U, V) : U, V \in \mathfrak{N},\ \overline{U} \text{ is compact, and } \overline{U} \subset V\}.$$
We will show that given an open set O and a point $x \in O$, there is a pair $(U, V) \in \mathcal{W}$ such that $x \in U$ and $V \subset O$. Since $\mathfrak{N}$ is a neighborhood basis, there is a $V \in \mathfrak{N}$ such that $x \in V \subset O$. By Proposition 7.22 on page 476, there is an open set W such that $\overline{W}$ is compact and $x \in W \subset \overline{W} \subset V$. We can now choose a $U \in \mathfrak{N}$ such that $x \in U \subset W$. It follows from Theorem 7.12 on page 473 that $\overline{U}$ is compact. Thus, $(U, V) \in \mathcal{W}$.
The remainder of the proof is the same as the proof of Urysohn's metrization theorem, where Theorem 7.14 on page 477 is used as a replacement for Urysohn's lemma.
It is possible to extract from the proof of Theorem 7.16 the following corollary, whose proof is left to the reader as Exercise 7.159.
COROLLARY 7.10 Let Ω be a locally compact space. Then there is a neighborhood basis $\mathfrak{N}$ such that $\overline{U}$ is compact for each $U \in \mathfrak{N}$.
Furthermore, if Ω is second countable, then $\mathfrak{N}$ can be chosen to be countable.
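On the real line, the construction in the proof of Theorem 7.15 can be carried out with explicit formulas. The following is a minimal sketch under stated assumptions, not a transcription of the proof: Ω = $\mathcal{R}$, K = [0, 1], and each member of the open cover is assumed to contain an interval [c − r, c + r] around one of the given centers c. The abstract functions $g_x$ from Theorem 7.14 are replaced by piecewise linear "tent" functions, and the Urysohn function h is replaced by an explicit piecewise linear function of g. The value $\inf g(K)$ is estimated on a grid and then halved as a safety margin. All function names are hypothetical.

```python
def tent(center, radius):
    """Continuous bump: 1 at `center`, positive exactly on (center - radius, center + radius)."""
    def g(x):
        return max(0.0, 1.0 - abs(x - center) / radius)
    return g

def partition_of_unity(centers, radius, K=(0.0, 1.0), grid=1001):
    """Sketch of the proof of Theorem 7.15 on the real line.

    Assumes each cover member contains [c - radius, c + radius] for some
    given center c, and that the tents together are positive on all of K.
    Returns the functions f_j = g_j / (g + h).
    """
    gs = [tent(c, radius) for c in centers]

    def g(x):
        return sum(gj(x) for gj in gs)

    # inf g(K) estimated on a finite grid; halving it leaves a safety margin
    lo, hi = K
    a = 0.5 * min(g(lo + (hi - lo) * k / (grid - 1)) for k in range(grid))
    if a <= 0.0:
        raise ValueError("the tents do not cover K")

    def h(x):
        # explicit stand-in for the Urysohn function of Theorem 7.14:
        # h = 0 where g >= a (in particular on K), h = 1 where g <= a/2
        return min(1.0, max(0.0, 2.0 - 2.0 * g(x) / a))

    def make_f(gj):
        # g + h > 0 everywhere, because h = 1 wherever g <= a/2
        return lambda x: gj(x) / (g(x) + h(x))

    return [make_f(gj) for gj in gs]

fs = partition_of_unity([0.0, 0.25, 0.5, 0.75, 1.0], radius=0.3)
```

Each $f_j$ vanishes outside its tent's interval, which gives condition (b). On K, h = 0, so the $f_j$ sum to g/g = 1. Off K, the sum is g/(g + h) ≤ 1.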
Let Ω be a second countable topological space. Suppose that Ω has a countable neighborhood basis $\mathfrak{N} = \{U_n\}_{n=1}^\infty$ such that the closure of each $U_n$ is compact. Let $W_1 = U_1$. The sets in $\mathfrak{N}$ cover the compact set $\overline{U_1}$ and, so, there is an integer $n_2 > 1$ such that $\overline{W_1} \subset \bigcup_{j=1}^{n_2} U_j$. Let $W_2 = \bigcup_{j=1}^{n_2} U_j$. Then $\overline{W_2}$ is compact and, hence, we can find an integer $n_3 > n_2$ such that $\overline{W_2} \subset \bigcup_{j=1}^{n_3} U_j$. Let $W_3 = \bigcup_{j=1}^{n_3} U_j$. Continuing in this fashion, we obtain an infinite sequence of sets satisfying the conditions delineated in the next definition.
DEFINITION 7.32 Exhaustion
A sequence $\{W_n\}_{n=1}^\infty$ of subsets of a topological space Ω is called an exhaustion if it satisfies the following conditions:
a) Each $W_n$ is open.
b) Each $\overline{W_n}$ is compact.
c) $\overline{W_n} \subset W_{n+1}$ for each n.
d) $\Omega = \bigcup_{n=1}^\infty W_n$.
Corollary 7.10 and the paragraph preceding Definition 7.32 show that a second countable locally compact space has an exhaustion. Here are some concrete examples of exhaustions.
EXAMPLE 7.22 Illustrates Definition 7.32
a) $\{(-n, n)\}_{n=1}^\infty$ is an exhaustion of $\mathcal{R}$.
b) $\{(1/n, 1 - 1/n)\}_{n=1}^\infty$ is an exhaustion of the interval (0, 1).
c) $\{B_k(0)\}_{k=1}^\infty$ is an exhaustion of the normed space $(\mathcal{R}^n, \|\cdot\|_2)$ discussed in Example 7.4 on page 422. □
In the next section, exhaustions will be used to obtain metrization results for certain spaces of functions.
Compactification
Our next theorem shows that it is possible to turn any locally compact Hausdorff space into a compact space by the addition of a single point. To see how, first consider the set $\mathcal{N}$ of positive integers with the discrete topology. This space is locally compact, but not compact, and its compact subsets coincide with its finite subsets. We would like to add a "point at infinity," denoted ω, to this space to turn it into a compact space. The problem is to find the right topology for the set $\mathcal{N} \cup \{\omega\}$.
To see what to do, we pass from the set $\mathcal{N}$ to the subset of real numbers $E = \{1/n : n \in \mathcal{N}\}$ via the function $h(n) = 1/n$. Note that $\overline{E} = E \cup \{0\}$. E is a bounded set of real numbers but is not compact because it is not closed. However, $E \cup \{0\}$ is closed and bounded and, hence, compact with respect to the relative topology inherited from $\mathcal{R}$. We can easily describe the open sets (in the relative topology) of $E \cup \{0\}$: Each subset of E is open, and a subset D of $E \cup \{0\}$ containing 0 is open if and only if its relative complement $(E \cup \{0\}) \setminus D$ is a finite set. If we now extend the function h to $\mathcal{N} \cup \{\omega\}$ by $h(\omega) = 0$ and call a subset W of $\mathcal{N} \cup \{\omega\}$ open if $h(W)$ is open in $E \cup \{0\}$, we obtain a topology making $\mathcal{N} \cup \{\omega\}$ into a compact space. The open sets of this topology on $\mathcal{N} \cup \{\omega\}$ consist of all subsets of $\mathcal{N}$ as well as all complements of finite subsets of $\mathcal{N}$.
As the next theorem shows, the construction that we just performed can be generalized to arbitrary locally compact Hausdorff spaces. The proof of the theorem is left to the reader as Exercise 7.160.
THEOREM 7.17 One-Point Compactification
Suppose that $(\Omega, \mathcal{T})$ is a locally compact Hausdorff space. Let ω be an element not in Ω and set $\Omega^* = \Omega \cup \{\omega\}$. Let $\mathcal{T}^*$ denote the collection of subsets of $\Omega^*$ that are either members of $\mathcal{T}$ or whose complements are compact subsets of the space $(\Omega, \mathcal{T})$. Then $\mathcal{T}^*$ is a topology on $\Omega^*$ having the following properties:
a) $(\Omega^*, \mathcal{T}^*)$ is compact.
b) $\mathcal{T}$ coincides with the relative topology $\{\Omega \cap W : W \in \mathcal{T}^*\}$.
c) Ω is open in the topological space $(\Omega^*, \mathcal{T}^*)$.
d) Ω is dense in $(\Omega^*, \mathcal{T}^*)$ unless $(\Omega, \mathcal{T})$ is compact.
The space $(\Omega^*, \mathcal{T}^*)$ constructed in Theorem 7.17 is called the one-point compactification of Ω.
EXERCISES 7.11
7.152 Show that the space $\ell^2(\Omega)$ of Example 7.6 on page 423 is locally compact if and only if Ω is finite.
7.153 Suppose that Ω is a locally compact space, that $D \subset \Omega$, and that $\mathcal{T}_D$ is the relative topology on D.
Prove that $(D, \mathcal{T}_D)$ is locally compact if D is closed in Ω.
7.154 Show that the function f defined in the last paragraph of the proof of Theorem 7.14 on page 477 is continuous and satisfies $f(K) = \{1\}$ and $f(O^c) = \{0\}$.
7.155 Show that the functions $f_1, f_2, \ldots, f_n$ defined in the last paragraph of the proof of Theorem 7.15 on page 477 satisfy conditions (a)–(d) of that theorem.
7.156 State and prove a version of Tietze's extension theorem (Theorem 7.3 on page 451) for locally compact Hausdorff spaces.
7.157 Let K be a compact subset of a locally compact Hausdorff space Ω. Show that there is a nonnegative continuous function f on Ω such that $K = f^{-1}(\{0\})$ if and only if there is a sequence $\{G_n\}_{n=1}^\infty$ of open sets such that $K = \bigcap_{n=1}^\infty G_n$.
7.158 Let f be a complex-valued function on a set Ω. A point $x_0 \in \Omega$ is said to be a peak point of f if $|f(x)| < |f(x_0)|$ for all $x \neq x_0$. Show that in a metrizable locally compact space Ω, each point is a peak point for some complex-valued continuous function on Ω.
7.159 Prove Corollary 7.10 on page 478.
7.160 Prove Theorem 7.17.
★7.161 Let f be a continuous function from a locally compact Hausdorff space Ω into a metric space $(\Lambda, \rho)$. The collection $\mathcal{K}$ of compact subsets of Ω is a directed set with respect to the relation ⊂. Define $\lim_{x\to\omega} f(x) = y$ to mean that the net $\{\sup\{\rho(f(x), y) : x \in K^c\}\}_{K\in\mathcal{K}}$ converges to 0. Show that f is the restriction of a continuous function on the one-point compactification of Ω if and only if $\lim_{x\to\omega} f(x)$ exists.
Note: In case $\Omega = \mathcal{R}^n$, $\lim_{x\to\omega} f(x) = y$ if and only if $\lim_{\|x\|\to\infty} f(x) = y$, where $\|\cdot\|$ is any one of the norms defined in Example 7.4 on page 422.
7.162 Prove that the one-point compactification of $\mathcal{R}$ is homeomorphic to a circle in $\mathcal{R}^2$.
7.163 Define a "two-point compactification" of $\mathcal{R}$ that is homeomorphic to the interval [−1, 1].
7.164 Prove that the one-point compactification of the space of complex numbers $\mathcal{C}$ is homeomorphic to the sphere $S = \{x \in \mathcal{R}^3 : \|x\|_2 = 1\}$. Hint: Let $h(z) = (1 + |z|^2)^{-1}(2\,\Re z,\ 2\,\Im z,\ |z|^2 - 1)$ for $z \in \mathcal{C}$, and $h(\omega) = (0, 0, 1)$.
7.12 FUNCTION SPACES
We will consider, in this section, what it means for a sequence of continuous functions to converge.
In particular, we will construct a topology $\mathcal{T}(\Omega, \Lambda)$ for the collection of continuous functions from a topological space Ω to a
metric space $(\Lambda, \rho)$ such that convergence of a sequence with respect to $\mathcal{T}(\Omega, \Lambda)$ corresponds to uniform convergence on compact subsets. Related notions of pointwise and uniform convergence will also be discussed.
For a sequence $\{f_n\}_{n=1}^\infty$ of functions from a topological space Ω to a metric space $(\Lambda, \rho)$, there are several meanings that can be attached to the expression
$$\lim_{n\to\infty} f_n = f. \qquad (7.12)$$
One simple way to define (7.12) is the following.
DEFINITION 7.33 Pointwise Convergence
A sequence $\{f_n\}_{n=1}^\infty$ of functions from a topological space Ω into a metric space $(\Lambda, \rho)$ is said to converge pointwise to the function f if for each $x \in \Omega$ and each $\varepsilon > 0$, there is an $N \in \mathcal{N}$ such that $\rho(f_n(x), f(x)) < \varepsilon$ whenever $n \ge N$.
Pointwise convergence of a sequence $\{f_n\}_{n=1}^\infty$ of functions to a function f requires that the sequence $\{f_n(x)\}_{n=1}^\infty$ of elements of Λ converge to $f(x)$ for each $x \in \Omega$. A much more demanding mode of convergence is as follows.
DEFINITION 7.34 Uniform Convergence
A sequence $\{f_n\}_{n=1}^\infty$ of functions from a topological space Ω into a metric space $(\Lambda, \rho)$ is said to converge uniformly to the function f if for each $\varepsilon > 0$, there is an $N \in \mathcal{N}$ such that $\rho(f_n(x), f(x)) < \varepsilon$ for all $x \in \Omega$ whenever $n \ge N$.
The crucial difference between Definitions 7.33 and 7.34 is that, in the latter, N may not depend on x whereas, in the former, it may. For many applications, Definition 7.33 is too weak and Definition 7.34 is too strong. In this section, we will be concerned primarily with a mode
of convergence that is intermediate between pointwise and uniform convergence. This mode of convergence is as follows.
DEFINITION 7.35 Uniform Convergence on Compact Subsets
A sequence $\{f_n\}_{n=1}^\infty$ of functions from a topological space Ω into a metric space $(\Lambda, \rho)$ is said to converge uniformly on compact subsets to the function f if for each compact subset $K \subset \Omega$ and each $\varepsilon > 0$, there is an $N \in \mathcal{N}$ such that $\rho(f_n(x), f(x)) < \varepsilon$ for all $x \in K$ whenever $n \ge N$.
EXAMPLE 7.23 Illustrates Definitions 7.33–7.35
Let Ω = (0, 1), $\Lambda = \mathcal{R}$, and $f_n(x) =$ •. Then the sequence of functions $\{f_n\}_{n=1}^\infty$ converges both pointwise and uniformly on compact subsets to the function $f(x) = 1/(1 - x)$. But $\{f_n\}_{n=1}^\infty$ does not converge uniformly to f. □
Next we introduce some notation that will be used throughout the remainder of the text.
DEFINITION 7.36 Collection of Continuous Functions
Let Ω be a topological space and Λ a metric space. Then we denote by $C(\Omega, \Lambda)$ the collection of all continuous functions from Ω to Λ. In case $\Lambda = \mathcal{C}$, we write $C(\Omega)$ for $C(\Omega, \mathcal{C})$; thus, $C(\Omega)$ is the collection of all complex-valued continuous functions on Ω.
We will construct a topology for $C(\Omega, \Lambda)$ such that convergence of sequences in that topology is the same as uniform convergence on compact subsets. To aid our construction, it will be helpful to have the following notation. For $f, g \in C(\Omega, \Lambda)$ and $S \subset \Omega$, we let
$$\rho_S(f, g) = \sup\{\rho(f(x), g(x)) : x \in S\}. \qquad (7.13)$$
Thus, $\rho_S$ measures how (uniformly) close two functions are on S. The following proposition, whose proof is left for Exercise 7.166, shows that, as a function from $C(\Omega, \Lambda) \times C(\Omega, \Lambda)$ to $[0, \infty]$, $\rho_S$ is almost a metric.
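The quantity $\rho_S$ in (7.13) also gives a quick numerical picture of Definitions 7.33–7.35. The sketch below does not reproduce the sequence of Example 7.23. As an assumed stand-in, it uses the geometric partial sums $s_n(x) = \sum_{k=0}^{n} x^k$, which also converge to $1/(1 - x)$ on (0, 1), with $f(x) - s_n(x) = x^{n+1}/(1 - x)$. It approximates $\rho_S$ by a maximum over a finite grid, so it illustrates the definitions rather than proving anything. The function names are hypothetical.

```python
def partial_sum(n):
    """Hypothetical illustrative sequence: s_n(x) = 1 + x + ... + x**n."""
    return lambda x: sum(x ** k for k in range(n + 1))

def limit(x):
    """The limit function f(x) = 1 / (1 - x)."""
    return 1.0 / (1.0 - x)

def rho_on_grid(f, g, a, b, points=2001):
    """Approximate rho_S(f, g) = sup{|f(x) - g(x)| : x in S} for S = [a, b].

    A maximum over finitely many points can only underestimate the supremum.
    """
    xs = [a + (b - a) * k / (points - 1) for k in range(points)]
    return max(abs(f(x) - g(x)) for x in xs)

ns = (10, 50, 200)
# On the compact set K = [0.1, 0.9] the distance rho_K(s_n, f) shrinks toward 0 ...
on_K = [rho_on_grid(partial_sum(n), limit, 0.1, 0.9) for n in ns]
# ... but on [0.1, 1 - 1e-6], which reaches toward the missing endpoint 1,
# it stays huge, reflecting the failure of uniform convergence on (0, 1).
near_1 = [rho_on_grid(partial_sum(n), limit, 0.1, 1 - 1e-6) for n in ns]
```

Because $x^{n+1}/(1 - x)$ increases on (0, 1), the grid maximum over [0.1, 0.9] is attained at 0.9. This matches the exact value $0.9^{n+1}/0.1$ up to rounding.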
PROPOSITION 7.23 The function $\rho_S$, defined in (7.13), satisfies the following conditions for all $f, g, h \in C(\Omega, \Lambda)$:
a) $0 \le \rho_S(f, g) \le \infty$.
b) $\rho_S(f, f) = 0$.
c) $\rho_S(f, g) = \rho_S(g, f)$.
d) $\rho_S(f, g) \le \rho_S(f, h) + \rho_S(h, g)$.
e) If S is compact, then $\rho_S(f, g) < \infty$.
f) If S is compact, then $|\rho_S(f, h) - \rho_S(g, h)| \le \rho_S(f, g)$.
g) If Ω is compact, then $\rho_\Omega$ is a metric.
If $\rho_S(f, g) = 0$ for some $f \neq g$ or if $\rho_S(f, g) = \infty$ for some f and g, then $\rho_S$ is not a metric. When S is a compact subset of Ω, then, as Proposition 7.23(e) shows, the latter obstacle cannot arise, but the former still remains. (See Exercise 7.168.) Nevertheless, by considering the entire family $\{\rho_K : K \text{ compact}\}$, we can produce a topology on $C(\Omega, \Lambda)$ that will be the correct one for studying uniform convergence on compact subsets. In the following definition, the notation $\rho_K(\cdot, g)$ represents the function from $C(\Omega, \Lambda)$ to $[0, \infty)$ defined by $\rho_K(\cdot, g)(f) = \rho_K(f, g)$, where $K \subset \Omega$ is compact and $g \in C(\Omega, \Lambda)$.
DEFINITION 7.37 Topology of Uniform Convergence on Compacts
The weak topology on $C(\Omega, \Lambda)$ determined by the family of functions $\{\rho_K(\cdot, g) : K \text{ compact},\ g \in C(\Omega, \Lambda)\}$ is called the topology of uniform convergence on compact subsets and is denoted $\mathcal{T}(\Omega, \Lambda)$.
Note: Whenever we work with a function space of the form $C(\Omega, \Lambda)$, we will assume that it is equipped with the topology $\mathcal{T}(\Omega, \Lambda)$ unless explicitly stated otherwise.
The next proposition shows that convergence in the topology $\mathcal{T}(\Omega, \Lambda)$ is exactly the same as uniform convergence on compact subsets.
PROPOSITION 7.24 Let $\{f_\iota\}_{\iota\in I}$ be a net of functions in $C(\Omega, \Lambda)$. Then $\{f_\iota\}_{\iota\in I}$ converges to f if and only if
$$\lim_\iota \rho_K(f_\iota, f) = 0 \qquad (7.14)$$
for each compact subset $K \subset \Omega$.
PROOF: By Proposition 7.13(b) on page 444, the net $\{f_\iota\}_{\iota\in I}$ converges to f if and only if
$$\lim_\iota \rho_K(f_\iota, g) = \rho_K(f, g) \qquad (7.15)$$
for each compact set K and each function $g \in C(\Omega, \Lambda)$. The equivalence of (7.14) and (7.15) now follows easily from parts (b) and (f) of Proposition 7.23.
Remark: Because $\rho(f(x), g(x)) \le \rho_K(f, g)$ for each $x \in K$, it is clear from Proposition 7.24 that convergence of a sequence with respect to the topology $\mathcal{T}(\Omega, \Lambda)$ corresponds to uniform convergence on compact subsets.
Although, in general, $\mathcal{T}(\Omega, \Lambda)$ is not metrizable, it is nevertheless possible to define analogues of Cauchy sequences and a notion of completeness for the space $C(\Omega, \Lambda)$.
DEFINITION 7.38 k-Cauchy Sequence; k-Complete
A sequence $\{f_n\}_{n=1}^\infty$ of functions in $C(\Omega, \Lambda)$ is said to be k-Cauchy if for each compact subset K and each $\varepsilon > 0$, there is an $N \in \mathcal{N}$ such that $\rho_K(f_n, f_m) < \varepsilon$ whenever $n, m \ge N$. If every k-Cauchy sequence in $C(\Omega, \Lambda)$ converges, then $C(\Omega, \Lambda)$ is called k-complete.
Remark: The concept of a k-Cauchy sequence expresses the idea of a sequence that is "uniformly Cauchy on compact subsets."
The most interesting examples of spaces of the type $C(\Omega, \Lambda)$ occur when Ω is a locally compact Hausdorff space. Theorem 7.19 describes some properties of $C(\Omega, \Lambda)$ in this case. Before stating and proving that theorem, however, we need some preliminary results.
THEOREM 7.18 Suppose that Ω is a topological space and that $(\Lambda, \rho)$ is a metric space. Let $\{f_n\}_{n=1}^\infty$ be a sequence in $C(\Omega, \Lambda)$ that converges uniformly to a function f. Then $f \in C(\Omega, \Lambda)$.
PROOF: Let $x_0 \in \Omega$ and $\varepsilon > 0$ be given. To establish the continuity of f at $x_0$, we will show that there is an open set U containing $x_0$ such that $\rho(f(x), f(x_0)) < \varepsilon$ whenever $x \in U$. (See Theorem 7.1 on page 443.) By uniform convergence, there is an N such that $\rho(f_n(x), f(x)) < \varepsilon/3$ for all $x \in \Omega$ whenever $n \ge N$. Because $f_N$ is a continuous function, there
is an open set U containing $x_0$ such that $\rho(f_N(x), f_N(x_0)) < \varepsilon/3$ whenever $x \in U$. It follows that for $x \in U$,
$$\rho(f(x), f(x_0)) \le \rho(f(x), f_N(x)) + \rho(f_N(x), f_N(x_0)) + \rho(f_N(x_0), f(x_0)) < \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon.$$
Hence, f is continuous at $x_0$.
LEMMA 7.2 Let Ω and Λ be topological spaces and $f\colon \Omega \to \Lambda$. Suppose that for each $x \in \Omega$, there is an open set $U_x$ containing x such that $f|_{U_x}$ is continuous. Then f is continuous.
PROOF: The proof is left for Exercise 7.170.
THEOREM 7.19 Suppose that Ω is a locally compact Hausdorff space and that $(\Lambda, \rho)$ is a metric space.
a) If Λ is complete, then $C(\Omega, \Lambda)$ is k-complete.
b) If Ω is second countable, then $C(\Omega, \Lambda)$ is metrizable.
c) If Ω is second countable and Λ is complete, then $C(\Omega, \Lambda)$ is complete.
d) If Ω is compact, then the topology $\mathcal{T}(\Omega, \Lambda)$ is induced by the metric $\rho_\Omega$.
PROOF: Proof of (a): Suppose that $\{f_n\}_{n=1}^\infty$ is a k-Cauchy sequence. Applying Definition 7.38 with $K = \{x\}$, we find that the sequence $\{f_n(x)\}_{n=1}^\infty$ is Cauchy in Λ for each $x \in \Omega$. Because Λ is complete, we conclude that for each $x \in \Omega$, the sequence $\{f_n(x)\}_{n=1}^\infty$ converges in Λ. Let the function $f\colon \Omega \to \Lambda$ be defined by $f(x) = \lim_{n\to\infty} f_n(x)$.
We will show that f is continuous on Ω. Let $x_0 \in \Omega$ and let W be an open set containing $x_0$ such that $\overline{W}$ is compact. Let $\varepsilon > 0$. We can choose N such that $\rho_{\overline{W}}(f_n, f_m) < \varepsilon$ for $m, n \ge N$. For each $x \in \overline{W}$, we have
$$\rho(f_n(x), f(x)) = \lim_{m\to\infty} \rho(f_n(x), f_m(x)) \le \limsup_{m\to\infty} \rho_{\overline{W}}(f_n, f_m) \le \varepsilon \qquad (7.16)$$
for $n \ge N$. It follows that the restrictions to $\overline{W}$ of the functions $f_n$ converge uniformly to the restriction of f to $\overline{W}$. Hence, by Theorem 7.18 and Lemma 7.2, f is continuous on all of Ω.
It remains to show that the sequence $\{f_n\}_{n=1}^\infty$ converges to f with respect to the topology $\mathcal{T}(\Omega, \Lambda)$. Replacing $\overline{W}$ by an arbitrary compact
subset K in (7.16) and taking the supremum over $x \in K$, we get that $\rho_K(f_n, f) \le \varepsilon$ for $n \ge N$. Hence, by Proposition 7.24 on page 484, $\{f_n\}_{n=1}^\infty$ converges in $\mathcal{T}(\Omega, \Lambda)$ to f. The proof of (a) is now complete.
Proof of (b): If Ω is second countable, then it has an exhaustion $\{W_n\}_{n=1}^\infty$. (See Definition 7.32 on page 479 and the paragraph that follows that definition.) Let $\rho_n = \rho_{\overline{W_n}}$. By Proposition 7.23(e) on page 484, $\rho_n$ is real-valued. Let
$$\sigma = \sum_{n=1}^\infty 2^{-n} \frac{\rho_n}{1 + \rho_n}.$$
We claim that σ is a metric on $C(\Omega, \Lambda)$. That σ is a real-valued function follows from Exercise 7.70 on page 445. By Exercise 7.30 on page 427, σ satisfies Definition 7.7(c) on page 420. That σ satisfies Definition 7.7(b) is clear, as are the facts that σ is nonnegative and satisfies $\sigma(f, f) = 0$ for each f. Thus, to prove that σ is a metric, it remains only to show that $\sigma(f, g) = 0$ implies $f = g$. If $\sigma(f, g) = 0$, then $\rho_n(f, g)$ must vanish for each n. Hence, for each $n \in \mathcal{N}$, $f(x) = g(x)$ for all $x \in W_n$. Because $\{W_n\}_{n=1}^\infty$ is an exhaustion, it follows that $f = g$. Consequently, σ is a metric.
Let $\mathcal{T}$ denote the topology on $C(\Omega, \Lambda)$ induced by the metric σ. We will show that $\mathcal{T} = \mathcal{T}(\Omega, \Lambda)$. By the definition of the topology $\mathcal{T}(\Omega, \Lambda)$, for each fixed $g \in C(\Omega, \Lambda)$ and $n \in \mathcal{N}$, the function $\rho_n(\cdot, g)$ is continuous with respect to that topology. The sequence of sums $\sum_{j=1}^n 2^{-j} \rho_j(\cdot, g)/(1 + \rho_j(\cdot, g))$ converges uniformly on $C(\Omega, \Lambda)$ to the function $\sigma(\cdot, g)$. From Theorem 7.18, we conclude that $\sigma(\cdot, g)$ is continuous with respect to $\mathcal{T}(\Omega, \Lambda)$. Because $B_r^\sigma(g) = \sigma(\cdot, g)^{-1}((-\infty, r))$, it follows that every open ball $B_r^\sigma(g)$ is open with respect to $\mathcal{T}(\Omega, \Lambda)$. Hence, every $\mathcal{T}$-open set is $\mathcal{T}(\Omega, \Lambda)$-open; that is, $\mathcal{T}$ is weaker than $\mathcal{T}(\Omega, \Lambda)$.
To complete the proof of (b), we must show that $\mathcal{T}(\Omega, \Lambda)$ is weaker than $\mathcal{T}$. To do so, it suffices, by Exercise 7.80 on page 446, to show that if a net $\{f_\iota\}_{\iota\in I}$ converges to f with respect to the topology $\mathcal{T}$, then it converges to f with respect to the topology $\mathcal{T}(\Omega, \Lambda)$.
Now, $\{f_\iota\}_{\iota\in I}$ converges to f with respect to $\mathcal{T}$ if and only if $\lim_\iota \sigma(f_\iota, f) = 0$. Let K be an arbitrary compact subset. Then, since the sets in the exhaustion $\{W_n\}_{n=1}^\infty$ are an open covering of K, there is an m such that $K \subset W_m$. Thus, $\rho_K(f_\iota, f) \le \rho_{\overline{W_m}}(f_\iota, f) = \rho_m(f_\iota, f)$. The inequality $2^{-m}\rho_m(f_\iota, f)/(1 + \rho_m(f_\iota, f)) \le \sigma(f_\iota, f)$ implies that $\lim_\iota \rho_m(f_\iota, f) = 0$. Hence, $\lim_\iota \rho_K(f_\iota, f) = 0$. It now follows from Proposition 7.24 on page 484 that $\{f_\iota\}_{\iota\in I}$ converges to f with respect to the topology $\mathcal{T}(\Omega, \Lambda)$. This completes the proof of (b).
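The metric σ from the proof of (b) can be evaluated approximately for concrete functions. The sketch below makes several assumptions: Ω = $\mathcal{R}$ with the exhaustion $W_n = (-n, n)$ of Example 7.22(a), each $\rho_n = \rho_{\overline{W_n}}$ approximated by a maximum over a finite grid of [−n, n], and the series truncated after a fixed number of terms. Each summand is below $2^{-n}$, so the omitted tail is below $2^{-\text{terms}}$. The function names are hypothetical.

```python
def rho_n(f, g, n, points_per_unit=200):
    """Grid approximation of rho_n(f, g) = sup{|f(x) - g(x)| : -n <= x <= n}."""
    steps = 2 * n * points_per_unit
    xs = (-n + 2.0 * n * k / steps for k in range(steps + 1))
    return max(abs(f(x) - g(x)) for x in xs)

def sigma(f, g, terms=20):
    """Truncated sigma(f, g) = sum_{n>=1} 2**-n * rho_n / (1 + rho_n).

    Each omitted summand is below 2**-n, so the tail after `terms` terms
    is below 2**-terms.
    """
    total = 0.0
    for n in range(1, terms + 1):
        r = rho_n(f, g, n)
        total += 2.0 ** -n * r / (1.0 + r)
    return total

def zero(x):
    return 0.0

# f_k(x) = x / k tends to 0 uniformly on compact sets but not uniformly on R
# (its sup over all of R is infinite for every k); sigma(f_k, 0) still tends to 0.
values = [sigma(lambda x, k=k: x / k, zero) for k in (1, 10, 100, 1000)]
```

Here $\rho_n(f_k, 0) = n/k$, so each summand $2^{-n}(n/k)/(1 + n/k)$ decreases in k. This is the behavior the proof of (b) exploits: σ-convergence sees every compact set through some $\overline{W_m}$.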
Proof of (c): By (a) and the proof of (b), it suffices to show that a sequence $\{f_n\}_{n=1}^\infty$ that is Cauchy with respect to the metric σ is k-Cauchy. Let $\varepsilon > 0$ be given and K a compact subset of Ω. As before, we can choose an m such that $K \subset W_m$. It follows from the definition of σ that
$$2^{-m} \frac{\rho_K(f_n, f_p)}{1 + \rho_K(f_n, f_p)} \le 2^{-m} \frac{\rho_m(f_n, f_p)}{1 + \rho_m(f_n, f_p)} \le \sigma(f_n, f_p).$$
If N is large enough so that $\sigma(f_n, f_p) < 2^{-m}\varepsilon/(1 + \varepsilon)$ for $n, p \ge N$, we obtain $\rho_K(f_n, f_p) < \varepsilon$. Thus, $\{f_n\}_{n=1}^\infty$ is k-Cauchy. The proof of (c) is now complete.
Proof of (d): The proof of part (d) is left for Exercise 7.171.
EXAMPLE 7.24 The Case $\Lambda = \mathcal{C}$
Let Ω be a locally compact Hausdorff space. Besides having a metric space structure derived from the usual distance function $\rho(z, w) = |z - w|$, the space $C(\Omega)$ has a linear-space structure, where addition and scalar multiplication are defined pointwise. We now consider the relationship between the linear-space structure and the topology $\mathcal{T}(\Omega, \mathcal{C})$ which, for simplicity, we denote by $\mathcal{T}(\Omega)$. For $S \subset \Omega$, let
$$\|f\|_S = \rho_S(f, 0) = \sup\{|f(x)| : x \in S\}.$$
We note that $\rho_S(f, g) = \|f - g\|_S$ and that $\|\cdot\|_S$ has the defining properties of a norm except that $\|f\|_S$ can be ∞ for some functions and can be 0 for functions that do not vanish identically.
If $\{f_\iota\}_{\iota\in I}$ and $\{g_\iota\}_{\iota\in I}$ are nets in $C(\Omega)$ converging to f and g, respectively, then, for each compact subset K of Ω,
$$\rho_K(f_\iota + g_\iota, f + g) = \|f_\iota + g_\iota - f - g\|_K \le \|f_\iota - f\|_K + \|g_\iota - g\|_K = \rho_K(f_\iota, f) + \rho_K(g_\iota, g).$$
We now see from Proposition 7.24 on page 484 that $\{f_\iota + g_\iota\}_{\iota\in I}$ converges to $f + g$. It follows that the operation of addition is continuous as a function from $C(\Omega) \times C(\Omega)$ to $C(\Omega)$, where $C(\Omega) \times C(\Omega)$ is given the product topology. By a similar argument, we find that the operation of scalar multiplication is continuous as a function from $\mathcal{C} \times C(\Omega)$ to $C(\Omega)$.
But, actually, more is true. Scalar multiplication of a function by a complex number is a special case of pointwise multiplication of functions.
If the product fg of two functions f and g in $C(\Omega)$ is defined pointwise, then $C(\Omega)$ becomes an
algebra of functions.† Furthermore, since the product operation on $C(\Omega)$ satisfies $\|fg\|_K \le \|f\|_K \|g\|_K$, it follows by an argument similar to the one used to prove continuity of addition that the operation of multiplication is continuous as a function from $C(\Omega) \times C(\Omega)$ to $C(\Omega)$. □
EXAMPLE 7.25 The Case Ω Compact and $\Lambda = \mathcal{C}$
Refer to Example 7.24. If Ω is compact, then $\|\cdot\|_\Omega$ is a norm on $C(\Omega)$ called the sup-norm, also known as the supremum norm or uniform norm. Thus the sup-norm on $C(\Omega)$ is given by
$$\|f\|_\Omega = \sup\{|f(x)| : x \in \Omega\}, \qquad f \in C(\Omega). \qquad (7.17)$$
The sup-norm induces the topology $\mathcal{T}(\Omega)$ and, moreover, $(C(\Omega), \|\cdot\|_\Omega)$ is complete. Whenever we are considering a space of the form $C(\Omega)$ where Ω is compact, we will assume that it is equipped with the sup-norm unless explicitly stated otherwise. □
EXAMPLE 7.26 The Case Ω Not Compact and $\Lambda = \mathcal{C}$
Refer to Example 7.24. If Ω is not compact, $\|\cdot\|_\Omega$ is still a norm on some subspaces of $C(\Omega)$. Important instances are the following:
$C_c(\Omega) = \{f \in C(\Omega) : \operatorname{supp} f \text{ is compact}\}$
$C_0(\Omega) = \{f \in C(\Omega) : \lim_{x\to\omega} f(x) = 0\}$*
$C_b(\Omega) = \{f \in C(\Omega) : \|f\|_\Omega < \infty\}$.
The spaces $C_c(\Omega)$, $C_0(\Omega)$, and $C_b(\Omega)$ are called, respectively, the continuous functions with compact support, continuous functions vanishing at infinity, and bounded continuous functions.
$C_0(\Omega)$ is a closed subspace of $C_b(\Omega)$ with respect to the topology induced by $\|\cdot\|_\Omega$. $C_c(\Omega)$ is a linear subspace of $C_0(\Omega)$, but it is not closed. Indeed, it can be shown that $C_c(\Omega)$ is dense in $C_0(\Omega)$ with respect to the topology induced by $\|\cdot\|_\Omega$. (See Exercise 7.173.) □
For each $x \in \Omega$ define $e_x\colon C(\Omega, \Lambda) \to \Lambda$ by $e_x(f) = f(x)$. The weak topology $\mathcal{T}_p(\Omega, \Lambda)$ determined by the family of functions $\{e_x : x \in \Omega\}$ is called the topology of pointwise convergence. In case $\Lambda = \mathcal{C}$, the topology $\mathcal{T}_p(\Omega, \mathcal{C})$ is denoted by $\mathcal{T}_p(\Omega)$. Because each function $e_x$ is continuous with respect to $\mathcal{T}(\Omega, \Lambda)$, it follows that $\mathcal{T}_p(\Omega, \Lambda)$ is weaker than $\mathcal{T}(\Omega, \Lambda)$. The space $C(\Omega, \Lambda)$ is a subset of the Cartesian product $\Lambda^\Omega$. If $\Lambda^\Omega$ is equipped with the product topology, then $\mathcal{T}_p(\Omega, \Lambda)$ is the relative topology on $C(\Omega, \Lambda)$.
† A linear space L with a multiplication operation that satisfies $x(yz) = (xy)z$, $x(y + z) = xy + xz$, $(x + y)z = xz + yz$, and $\alpha(xy) = (\alpha x)y = x(\alpha y)$ for all $x, y, z \in L$ and all scalars α is called an algebra.
* For the meaning of $\lim_{x\to\omega} f(x)$, see Exercise 7.161 on page 481.
EXAMPLE 7.27 The Case $\Omega = \mathcal{N}$ and $\Lambda = \mathcal{C}$
The set of positive integers $\mathcal{N}$ equipped with the discrete topology is a second countable, locally compact Hausdorff space. Thus, by Theorem 7.19 on page 486, $(C(\mathcal{N}), \mathcal{T}(\mathcal{N}))$ is metrizable and complete. As the compact subsets of $\mathcal{N}$ are exactly the finite ones, it follows that $\mathcal{T}(\mathcal{N}) = \mathcal{T}_p(\mathcal{N})$. The subspace $C_c(\mathcal{N})$ consists of all sequences of complex numbers that are zero except for finitely many indices; the subspace $C_0(\mathcal{N})$, often denoted in the literature by $c_0$, consists of all sequences of complex numbers that converge to 0; and the subspace $C_b(\mathcal{N})$ coincides with the space $\ell^\infty(\mathcal{N})$ of Example 7.6 on page 423. Note also that on $C_b(\mathcal{N})$, $\|\cdot\|_{\mathcal{N}}$ is just $\|\cdot\|_\infty$. □
EXAMPLE 7.28 Continuous Periodic Functions
Consider the subspace P of $C(\mathcal{R})$ given by
$$P = \{h \in C(\mathcal{R}) : h(x + 2\pi) = h(x) \text{ for all } x \in \mathcal{R}\}.$$
Of course, P is just the space of continuous functions having period 2π. It is easy to see that P is a closed subspace of the normed space $C_b(\mathcal{R})$. We will discuss a relationship between P and the space $C(T)$, where T is the unit circle centered at 0 in the complex plane $\mathcal{C}$. For $f \in C(T)$, define $J(f)\colon \mathcal{R} \to \mathcal{C}$ by $J(f)(x) = f(\cos x + i \sin x)$. The function J is a one-to-one linear function from $C(T)$ onto P satisfying $J(fg) = J(f)J(g)$ and $\|J(f)\|_{\mathcal{R}} = \|f\|_T$ for all $f, g \in C(T)$. (See Exercise 7.179.) Thus, as normed spaces and as algebras, the spaces P and $C(T)$ are essentially copies of each other.
It will be helpful later, when we study approximation by trigonometric polynomials, to identify the spaces P and $C(T)$ by means of the correspondence J. □
EXERCISES 7.12
7.165 Show by examples that no two of the modes of convergence described by Definitions 7.33, 7.34, and 7.35 are equivalent.
7.166 Prove Proposition 7.23 on page 484.
7.167 Give an example where $\rho_S(f, g) = \infty$.
7.168 Suppose that a Hausdorff space Ω is locally compact but not compact and that K is a compact subset of Ω. Prove that $\rho_K$ cannot be a metric on $C(\Omega)$.
7.169 Prove that $(C(\Omega, \Lambda), \mathcal{T}(\Omega, \Lambda))$ is always a Hausdorff space.
7.170 Prove Lemma 7.2 on page 486.
7.171 Prove part (d) of Theorem 7.19 on page 486.
7.172 Prove that $\|\cdot\|_\Omega$ is a norm on each of the spaces $C_c(\Omega)$, $C_0(\Omega)$, and $C_b(\Omega)$ defined in Example 7.26 on page 489.
★7.173 Suppose that a Hausdorff space Ω is locally compact but not compact. Let $C_c(\Omega)$, $C_0(\Omega)$, and $C_b(\Omega)$ be given the norm $\|\cdot\|_\Omega$.
a) Show that $C_0(\Omega)$ is closed in but not equal to $C_b(\Omega)$.
b) Show that $C_c(\Omega)$ is dense in $C_0(\Omega)$.
c) Prove that $C_b(\Omega)$ and $C_0(\Omega)$ are complete.
d) Is $C_c(\Omega)$ complete? Justify your answer.
★7.174 For $x_0 \in \Omega$, define $e_{x_0}\colon C(\Omega, \Lambda) \to \Lambda$ by $e_{x_0}(f) = f(x_0)$.
a) Show that $e_{x_0}$ is continuous with respect to $\mathcal{T}(\Omega, \Lambda)$.
b) Conclude that $\mathcal{T}_p(\Omega, \Lambda)$ is weaker than $\mathcal{T}(\Omega, \Lambda)$.
7.175 Give an example where $\mathcal{T}_p(\Omega, \Lambda)$ is properly contained in $\mathcal{T}(\Omega, \Lambda)$.
7.176 Show that the relative topology on $C_0(\mathcal{N})$ determined by $\mathcal{T}(\mathcal{N})$ is strictly weaker than the topology induced by $\|\cdot\|_{\mathcal{N}}$.
7.177 Let $(\Omega, \mathcal{T})$ be a locally compact Hausdorff space. Show that the weak topology determined by the family of functions $C(\Omega)$ coincides with $\mathcal{T}$.
7.178 Let Ω be a compact Hausdorff space. Show that the following subsets of $C(\Omega)$ are open:
a) $\{f : gf = 1 \text{ for some } g \in C(\Omega)\}$.
b) $\{f : f = e^h \text{ for some } h \in C(\Omega)\}$, where $e^h$ is defined by $e^h(x) = e^{h(x)}$.
Hint: Use the Taylor series expansions of $1/(1 - z)$ and $\log(1 - z)$.
7.179 Verify the asserted properties of the function J defined in Example 7.28.
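Exercise 7.173(b) asserts that $C_c(\Omega)$ is dense in $C_0(\Omega)$ under $\|\cdot\|_\Omega$. For Ω = $\mathcal{R}$, one common idea can be tried numerically: multiply f by a continuous cutoff $\varphi_n$ that equals 1 on [−n, n] and 0 outside [−n − 1, n + 1]. Then $f\varphi_n \in C_c(\mathcal{R})$ and $\|f - f\varphi_n\|_{\mathcal{R}} \le \sup_{|x| \ge n} |f(x)|$, which tends to 0 because f vanishes at infinity. The sketch below assumes the sample function $f(x) = 1/(1 + x^2)$ and approximates the supremum over a finite grid. The cutoff and all names are illustrative choices, not taken from the text.

```python
def cutoff(n):
    """Continuous phi_n: 1 on [-n, n], 0 outside [-n-1, n+1], linear in between."""
    def phi(x):
        return min(1.0, max(0.0, n + 1.0 - abs(x)))
    return phi

def f(x):
    """A sample element of C_0(R) whose support is not compact."""
    return 1.0 / (1.0 + x * x)

def sup_error(n, radius=1000.0, points=200001):
    """Grid approximation of ||f - f * phi_n|| over [-radius, radius].

    Outside that window |f| <= 1 / (1 + radius**2), so the window captures
    the supremum up to that small amount.
    """
    phi = cutoff(n)
    xs = (-radius + 2 * radius * k / (points - 1) for k in range(points))
    return max(abs(f(x) - f(x) * phi(x)) for x in xs)

errors = [sup_error(n) for n in (1, 3, 10, 30)]
```

Each error lies between $f(n+1)$ and $f(n)$: the difference $f(1 - \varphi_n)$ vanishes on [−n, n] and equals f outside [−n − 1, n + 1].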
Marshall Harvey Stone (1903–1989)
Marshall Stone was born in New York City on April 8, 1903. He entered Harvard in 1919 and received his doctorate under G. D. Birkhoff in 1926. Stone taught at Columbia, Yale, Harvard, and Chicago. He retired from Chicago in 1968, but accepted a position at the University of Massachusetts, where he worked full time until 1973.
Influenced by Birkhoff and von Neumann, Stone obtained significant results in the spectral theory of unbounded operators, Boolean algebras, general topology, and rings of continuous functions. Perhaps best known is his striking generalization of Weierstrass's theorem on polynomial approximation. Four theorems are named for him: Stone's representation theorems for one-parameter unitary groups and for Boolean algebras, the Čech–Stone compactification theorem, and the Stone–Weierstrass approximation theorem.
Under Stone's leadership as chairman, the University of Chicago's department of mathematics developed into a world center of mathematics research. In 1950, after modernizing and upgrading the University of Chicago's mathematics programs, Stone devoted himself to improving the teaching of pre-university mathematics. He took a leading part in a series of international conferences on mathematical education and served as an active member of the School Mathematics Study Group.
Stone was traveling in Madras, India, when he died on January 9, 1989.
Complete Spaces, Compact Spaces, and Approximation
In this chapter, the ideas of Chapter 7 are used to prove some general theorems of analysis in metric and topological spaces. Sections 8.1 and 8.2 deal with two important consequences of the completeness property, namely, the Baire category theorem and the contraction mapping principle. Sections 8.3 and 8.4 answer the following question for spaces of the form $C(\Omega,\Lambda)$ and for product spaces, respectively: "When is a set of functions compact?" In Section 8.5, we discuss generalizations of piecewise linear approximation of functions to spaces of the form $C(\Omega,\mathcal{R})$ and, in Section 8.6, we consider generalizations of polynomial approximation of functions to spaces of the form $C(\Omega)$.

8.1 THE BAIRE CATEGORY THEOREM
Suppose we think of a subset of a topological space as being "thin" if its closure has an empty interior. The Baire category theorem asserts that a complete metric space cannot be expressed as the union of countably many "thin" sets. Rather than use the term "thin," we adopt the following more widely used terminology.
494 □ Chapter 8 Complete Spaces, Compact Spaces, and Approximation
DEFINITION 8.1 Nowhere Dense Set
A subset $E$ of a topological space $\Omega$ is said to be nowhere dense if its closure has an empty interior, that is, if $(\overline{E})^\circ = \emptyset$.

EXAMPLE 8.1 Illustrates Definition 8.1
a) It is easy to see that a set is nowhere dense if and only if the complement of its closure is dense.
b) Clearly, any finite subset of $\mathcal{R}$ is nowhere dense; in fact, any subset of $\mathcal{R}$ whose closure is countable is nowhere dense. There are, of course, countable subsets of $\mathcal{R}$ that are not nowhere dense (e.g., $\mathcal{Q}$). On the other hand, the Cantor set is an example of an uncountable set that is nowhere dense.
c) Any line segment in $\mathcal{R}^2$ is nowhere dense. □

We now state and prove the Baire category theorem.

THEOREM 8.1 Baire Category Theorem
A complete metric space $(\Omega, \rho)$ is not the union of a countable collection of nowhere dense subsets.

PROOF: Let $\{S_n\}_{n=1}^\infty$ be a sequence of nowhere dense subsets of $\Omega$. We must show that
$$\bigcup_{n=1}^{\infty} S_n \neq \Omega. \tag{8.1}$$
To prove (8.1) it suffices to verify that $\bigcap_{n=1}^\infty G_n \neq \emptyset$, where $G_n = (\overline{S_n})^c$. We will prove somewhat more, namely, that $U \cap \bigl(\bigcap_{n=1}^\infty G_n\bigr) \neq \emptyset$ whenever $U$ is a nonempty open subset of $\Omega$. So, let $U$ be open and nonempty. We first note that, for each $n \in \mathcal{N}$, $G_n$ is open, and it is dense because $S_n$ is nowhere dense. Thus, there exists an element $x_1$ in the open set $U \cap G_1$. It follows that there is an $r_1 > 0$ such that $\overline{B_{r_1}(x_1)} \subset U \cap G_1$. Since $G_2$ is dense and open, we can choose an element $x_2 \in B_{r_1}(x_1) \cap G_2$ and an $r_2 > 0$ such that $r_2 < r_1/2$ and $\overline{B_{r_2}(x_2)} \subset B_{r_1}(x_1) \cap G_2$.
Continuing inductively, we obtain a sequence $\{x_n\}_{n=1}^\infty$ in $\Omega$ and a sequence $\{r_n\}_{n=1}^\infty$ of positive numbers such that, for all $n \in \mathcal{N}$, $r_{n+1} < r_n/2$ and
$$\overline{B_{r_n}(x_n)} \subset G_n \tag{8.2}$$
and
$$\overline{B_{r_{n+1}}(x_{n+1})} \subset B_{r_n}(x_n). \tag{8.3}$$
Because $\rho(x_{n+1}, x_n) < r_n \le 2^{-(n-1)} r_1$ for $n = 1, 2, \ldots$, it follows that
$$\rho(x_{n+k}, x_n) \le \sum_{j=0}^{k-1} \rho(x_{n+j+1}, x_{n+j}) < \sum_{j=0}^{k-1} 2^{-(n+j-1)} r_1 < 2^{-n+2} r_1.$$
Hence, $\{x_n\}_{n=1}^\infty$ is a Cauchy sequence and so, by completeness, its limit exists; call it $x$. It follows from (8.3) that $x_n \in B_{r_m}(x_m)$ for $n \ge m$; hence, $x \in \overline{B_{r_m}(x_m)}$ and so, by (8.2), $x \in G_m$. Thus, $x \in G_m$ for $m = 1, 2, \ldots$. As we also have $x \in \overline{B_{r_1}(x_1)} \subset U$, we conclude that $x \in U$. Hence, $x \in U \cap \bigl(\bigcap_{n=1}^\infty G_n\bigr)$.

The following corollary is an immediate consequence of the proof of Theorem 8.1 and is sometimes referred to as the Baire category theorem.

COROLLARY 8.1 Baire Category Theorem (Alternative Version)
Let $\{G_n\}_{n=1}^\infty$ be a sequence of dense open subsets of a complete metric space. Then $\bigcap_{n=1}^\infty G_n$ is dense.

A subset of a topological space is said to be of the first category if it can be expressed as a countable union of nowhere dense sets. A set that is not of the first category is said to be of the second category. Using this terminology, we can restate the Baire category theorem as follows: A complete metric space is of the second category. Thus, in a complete metric space, sets of the first category are, in a sense, much smaller than sets of the second category.

The Baire category theorem is frequently used as a tool for obtaining existence results. One establishes the existence of an object of a certain type by showing that the collection of objects not of that type is of the first category in an appropriate complete metric space. Examples 8.2 and 8.3 illustrate this technique.
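The nested-ball construction in the proof of Theorem 8.1 can be carried out explicitly in the special case $\Omega = \mathcal{R}$, $U = (0,1)$, and $S_n = \{q_n\}$, where $q_1, q_2, \ldots$ lists rationals in $[0,1]$ (the situation of Example 8.2). The Python sketch below is illustrative only: the helpers `enumerate_rationals` and `nested_intervals` are hypothetical names, only finitely many steps are performed, and the rule for choosing each new interval (move to the half of the current open interval that does not contain $q_n$ and quarter the radius) is just one choice compatible with (8.2), (8.3), and $r_{n+1} < r_n/2$.

```python
from fractions import Fraction

def enumerate_rationals(count):
    """Return the first `count` distinct rationals in [0, 1], listed by denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def nested_intervals(points, center=Fraction(1, 2), radius=Fraction(1, 2)):
    """Mimic the proof of Theorem 8.1 with S_n = {q_n}.

    Each new closed interval [c - r, c + r] lies inside the previous open
    interval, has radius r/4 < (previous radius)/2, and excludes q_n.
    """
    c, r = center, radius
    intervals = []
    for q in points:
        # Shift to the half of (c - r, c + r) that lies away from q.
        c = c - r / 2 if q >= c else c + r / 2
        r = r / 4
        intervals.append((c - r, c + r))
    return c, intervals

qs = enumerate_rationals(50)
x, intervals = nested_intervals(qs)
print(float(x))
print(all(not (lo <= q <= hi) for q, (lo, hi) in zip(qs, intervals)))  # True
print(x in qs)  # False
```

After any finite number of steps, the current center lies in every interval constructed so far and differs from each rational already processed, which is the finite analogue of the conclusion $x \in U \cap \bigl(\bigcap G_n\bigr)$.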
EXAMPLE 8.2 Illustrates the Baire Category Theorem
We know that $\mathcal{R}$ is a complete metric space. Since every single-element set in $\mathcal{R}$ is nowhere dense and the set $\mathcal{Q}$ of rational numbers is countable, it follows that $\mathcal{Q}$ is a set of the first category. Hence, the Baire category theorem implies that $\mathcal{Q}^c$ is nonempty, that is, irrational numbers exist. □

EXAMPLE 8.3 Illustrates the Baire Category Theorem
By Theorem 7.19 on page 486, the space $C([0,1],\mathcal{R})$ equipped with the norm $\|\cdot\|_{[0,1]}$ is a complete metric space, where $\|f\|_{[0,1]} = \sup\{ |f(x)| : x \in [0,1] \}$. We will use the Baire category theorem to show that "most" functions in $C([0,1],\mathcal{R})$ vary erratically in the sense that they fail to be monotonic on any nonempty subinterval. A stronger result of this type is developed in Exercise 8.2.
Let $\mathcal{I}$ denote the collection of nonempty open subintervals of $[0,1]$ having rational endpoints. For $I \in \mathcal{I}$, let $U_I$ and $D_I$ denote, respectively, the sets of functions in $C([0,1],\mathcal{R})$ that are nondecreasing and nonincreasing on $I$. If we let $F$ denote the set of functions in $C([0,1],\mathcal{R})$ that are monotonic on some nonempty subinterval of $[0,1]$, then we have that
$$F = \bigcup_{I \in \mathcal{I}} (U_I \cup D_I). \tag{8.4}$$
It is not hard to see that, for each $I \in \mathcal{I}$, $U_I$ and $D_I$ are closed in $C([0,1],\mathcal{R})$. If we can show that, for each $I \in \mathcal{I}$, $U_I$ and $D_I$ have empty interiors, then it will follow from (8.4) and the countability of $\mathcal{I}$ that $F$ is a set of the first category. In particular, the Baire category theorem will then imply that $F^c$ is nonempty; that is, there are functions in $C([0,1],\mathcal{R})$ that fail to be monotonic on any nonempty subinterval of $[0,1]$.
We will prove that $(U_I)^\circ = \emptyset$. A similar proof shows that $(D_I)^\circ = \emptyset$. Let $\epsilon > 0$ and $f \in U_I$. We choose a point $t \in I$ and a $\delta > 0$ small enough so that the function $f$ varies by less than $\epsilon/2$ on the interval $[t - \delta, t + \delta] \subset I$. Define
$$g(x) = \begin{cases} f(x), & \text{if } x \in [0,1] \setminus [t-\delta, t+\delta]; \\ f(x) + \delta^{-2}\epsilon\bigl(\delta^2 - (x-t)^2\bigr), & \text{if } x \in [t-\delta, t+\delta]. \end{cases}$$
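It is easy to see that $g$ is continuous and that $\|f - g\|_{[0,1]} = \epsilon$. On the other hand, $g \notin U_I$ because $g(t) = f(t) + \epsilon$ is greater than $g(t + \delta) = f(t + \delta)$. Since $\epsilon > 0$ was arbitrary, this shows that every ball around $f$ contains points outside $U_I$; hence, $(U_I)^\circ = \emptyset$. □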
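A quick numerical check of the perturbation used in Example 8.3 may help. The sketch below assumes the hypothetical choices $f(x) = x$ (which belongs to $U_I$ for $I = (0,1)$), $\epsilon = 0.1$, $t = 0.5$, and $\delta = 0.02$, so that $f$ varies by $0.04 < \epsilon/2$ on $[t - \delta, t + \delta]$. It evaluates on a finite grid, so the supremum norm is only approximated.

```python
def bump_perturbation(f, t, delta, eps):
    """Return the function g of Example 8.3: g = f off [t - delta, t + delta],
    and g = f + eps * (delta**2 - (x - t)**2) / delta**2 on that interval."""
    def g(x):
        if abs(x - t) <= delta:
            return f(x) + eps * (delta**2 - (x - t) ** 2) / delta**2
        return f(x)
    return g

f = lambda x: x          # hypothetical nondecreasing f on I = (0, 1)
eps = 0.1
t, delta = 0.5, 0.02     # f varies by 2 * delta = 0.04 < eps / 2 near t
g = bump_perturbation(f, t, delta, eps)

grid = [i / 10000 for i in range(10001)]
sup_dist = max(abs(f(x) - g(x)) for x in grid)
print(sup_dist)               # approximately eps, attained at x = t
print(g(t) > g(t + delta))    # True, so g is not nondecreasing on I
```

The bump has height exactly $\epsilon$ at $x = t$ and vanishes at $t \pm \delta$, which is why $g$ stays continuous while breaking monotonicity.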
We conclude this section with an important consequence of the Baire category theorem called the uniform boundedness principle. We will give some applications of the uniform boundedness principle in the exercises. It will also prove useful later when we return to our study of normed spaces.

THEOREM 8.2 Uniform Boundedness Principle
Let $\Omega$ be a complete metric space and $\mathcal{F}$ a family of continuous functions from $\Omega$ into a normed space $(\Lambda, \|\cdot\|)$. Suppose that, for each $x \in \Omega$,
$$\sup\{ \|f(x)\| : f \in \mathcal{F} \} < \infty. \tag{8.5}$$
Then there is an $M \in \mathcal{R}$ and a nonempty open set $O$ such that $\|f(u)\| \le M$ for all $u \in O$ and $f \in \mathcal{F}$.

PROOF: For each $n \in \mathcal{N}$ and $f \in \mathcal{F}$, the set $\{ x : \|f(x)\| \le n \}$ is closed. Thus,
$$E_n = \bigcap_{f \in \mathcal{F}} \{ x : \|f(x)\| \le n \} = \{ x : \|f(x)\| \le n \text{ for all } f \in \mathcal{F} \}$$
is closed. By (8.5), $\Omega = \bigcup_{n=1}^\infty E_n$. The Baire category theorem now implies that there is an integer $N$ such that $(E_N)^\circ \neq \emptyset$. Thus, the assertion of the theorem is verified with $M = N$ and $O = (E_N)^\circ$.

EXERCISES 8.1
8.1 Let $E = \{ f \in C([0,1]) : |f(x) - f(y)| \le |x - y| \text{ for all } x, y \in [0,1] \}$. Show that $E$ is nowhere dense in $C([0,1])$.
8.2 For this exercise, we need to extend the definition of differentiability given in Definition 6.1 on page 316. When we have a function $f$ defined on a closed interval $[a,b]$, we say that $f$ is differentiable at the left endpoint $a$ if the limit
$$\lim_{h \to 0^+} \frac{f(a+h) - f(a)}{h}$$
exists and is finite. In that case the limit is called the derivative of $f$ at $a$ and is denoted by $f'(a)$. In a similar way, we define differentiability at the right endpoint $b$.
a) Let $D$ denote the set of functions in $C([0,1])$ that are differentiable at some point of $[0,1]$. Show that $D$ is of the first category.
Hint: Consider the set of functions $D_n = \{ f : |f(t) - f(x)| \le n|t - x| \text{ for some } x \in [0,1] \text{ and all } t \in [0,1] \}$.
b) Deduce from part (a) that there are functions in $C([0,1])$ that are not differentiable at any point of $[0,1]$.
8.3 Show that, as a subset of $(\mathcal{L}^1([0,1]), \|\cdot\|_1)$, $\mathcal{L}^2([0,1])$ is of the first category. Hint: See Exercise 4.84 on page 206.
8.4 In this exercise, we will be considering functions from $\mathcal{R}$ to $\mathcal{R}$.
a) Give an example of a function that is continuous at each irrational and discontinuous at each rational. Hint: Consider the function that is 0 at each irrational, 1 at 0, and $1/q$ at each rational of the form $p/q$, where $q > 0$ and $p$ and $q$ have no common divisor except 1.
b) Is there a function that is continuous at each rational and discontinuous at each irrational? Hint: Associate with each function $f\colon \mathcal{R} \to \mathcal{R}$ functions $u$ and $\ell$ defined by $u(x) = \inf\{ \sup\{ f(y) : |x - y| < \delta \} : \delta > 0 \}$ and $\ell(x) = \sup\{ \inf\{ f(y) : |x - y| < \delta \} : \delta > 0 \}$. Consider the set where $\ell$ is strictly less than $u$.
8.5 Show that there is no sequence $\{f_n\}_{n=1}^\infty$ in $C([0,1])$ that satisfies
$$\lim_{n \to \infty} f_n(x) = \begin{cases} 1, & \text{if } x \text{ is rational;} \\ 0, & \text{if } x \text{ is irrational.} \end{cases}$$
In Exercises 8.6 and 8.7, we will need the concept of a basis for a linear space. A subset $S$ of a linear space $\Omega$ is said to be a basis for $\Omega$ if, for each nonzero $x \in \Omega$, there is a unique subset $\{x_1, x_2, \ldots, x_n\}$ of $S$ and a unique set of nonzero scalars $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ such that $x = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$. It follows from Zorn's lemma that every linear space has a basis. The number of elements in a basis is called the dimension of the linear space. A linear space is said to be finite dimensional if it has a basis containing finitely many elements; otherwise, it is said to be infinite dimensional.
8.6 Let $\Omega$ be a normed space and $D$ a proper, finite-dimensional, linear subspace of $\Omega$.
a) Show that $D$ is closed.
b) Show that $D$ is nowhere dense.
8.7 Let $\Omega$ be a complete normed space and $S$ a basis for $\Omega$. Prove that $S$ is either finite or uncountable. Hint: Refer to Exercise 8.6.
8.8 Let $\Omega$ be a normed space and $D$ a proper closed linear subspace of $\Omega$. Show that $D$ is nowhere dense.

8.2 CONTRACTIONS OF COMPLETE METRIC SPACES
Let $f\colon \Omega \to \Omega$. An element $p \in \Omega$ is called a fixed point of $f$ if $f(p) = p$. In this section, we will give a simple condition on $f$ that guarantees the existence of a fixed point when $\Omega$ is complete. We will also give some applications of that condition to differential and integral equations.
DEFINITION 8.2 Contraction
Let $(\Omega, \rho)$ be a metric space. A mapping $f\colon \Omega \to \Omega$ is called a contraction if there is a constant $c \in [0,1)$ such that $\rho(f(x), f(y)) \le c\,\rho(x,y)$ for all $x, y \in \Omega$.

EXAMPLE 8.4 Illustrates Definition 8.2
a) It is obvious that contractions are always continuous functions.
b) Suppose $a$ is a positive constant and define $f(x) = ax$ for $x \in [0,1]$. If $a \le 1$, then $f\colon [0,1] \to [0,1]$. If $a < 1$, then $f$ is a contraction with $c = a$. If $a = 1$, then $f$ is not a contraction, since $1 = |1 - 0| = |f(1) - f(0)| \le c|1 - 0| = c$ implies $c \ge 1$. Note, however, that 0 is a fixed point of $f$ for all $a$.
c) Suppose that $f\colon [0,1] \to [0,1]$ is continuous and has a derivative at each point of $(0,1)$. Suppose further that $B = \sup\{ |f'(x)| : x \in (0,1) \} < \infty$. By the mean value theorem, for $x, y \in [0,1]$ with $x < y$, we have that $f(y) - f(x) = f'(t)(y - x)$ for some $t \in (x,y)$. It follows immediately that $|f(y) - f(x)| \le B|y - x|$ for all $x, y \in [0,1]$. Hence, $f$ is a contraction if $B < 1$.
d) An interesting special case of part (c) is the cosine function. Since the derivative of $\cos x$ is $-\sin x$ and $\sup\{ |-\sin x| : x \in [0,1] \} = \sin 1 < \sin(\pi/2) = 1$, it follows from part (c) that the cosine function is a contraction of $[0,1]$.
e) Let $\Omega = \{ z \in \mathbb{C} : |z| \le 1 \}$, the closed unit disk, equipped with the usual metric inherited from $\mathbb{C}$. Define $f\colon \Omega \to \Omega$ by $f(z) = az$, where $a$ is a complex constant with $|a| \le 1$. It is easy to see that $f$ is a contraction if and only if $|a| < 1$. Note, however, that 0 is a fixed point of $f$ for all $a$.
f) Let $\Omega = \{ z \in \mathbb{C} : |z| < 1 \}$, the open unit disk, equipped with the usual metric inherited from $\mathbb{C}$. Define $f\colon \Omega \to \Omega$ by $f(z) = (1 + z)/2$. Then $f$ is a contraction with $c = 1/2$, but has no fixed point. □
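Part (d) can be explored numerically. Iterating the cosine function from $x_0 = 0$, as in Theorem 8.3 below, produces approximations of its fixed point in $[0,1]$, and the estimate $\rho(x_n, p) \le c^n \rho(x_1, x_0)/(1 - c)$, obtained by letting $m \to \infty$ in the proof of Theorem 8.3, can be compared with the observed errors. This Python sketch uses $c = \sin 1$ from part (d); the helper `iterate_contraction` is a hypothetical name, and the exact fixed point is replaced by the 60th iterate.

```python
import math

def iterate_contraction(f, x0, n):
    """Return [x0, f(x0), f(f(x0)), ...] with n applications of f."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

c = math.sin(1.0)             # contraction constant for cos on [0, 1], Example 8.4(d)
xs = iterate_contraction(math.cos, 0.0, 60)
p = xs[-1]                    # numerical stand-in for the fixed point
print(p)                      # roughly 0.739085
print(abs(math.cos(p) - p))   # tiny residual

# A priori bound: |x_n - p| <= c**n * |x_1 - x_0| / (1 - c)
bounds_ok = all(
    abs(xs[n] - p) <= c**n * abs(xs[1] - xs[0]) / (1 - c) + 1e-12
    for n in range(40)
)
print(bounds_ok)              # True
```

The observed errors shrink faster than the bound predicts, because $|\sin x|$ near the fixed point is about $0.67$, smaller than the global constant $\sin 1 \approx 0.84$.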
THEOREM 8.3 Contraction Mapping Principle
Let $(\Omega, \rho)$ be a complete metric space and $f\colon \Omega \to \Omega$ a contraction. Then $f$ has a unique fixed point. Furthermore, if $x$ is any point of $\Omega$, then the sequence $\{x_n\}_{n=0}^\infty$ defined recursively by $x_0 = x$ and $x_{n+1} = f(x_n)$ converges to the unique fixed point of $f$.

PROOF: By Definition 8.2 there is a constant $c \in [0,1)$ such that
$$\rho(x_{n+1}, x_n) = \rho(f(x_n), f(x_{n-1})) \le c\,\rho(x_n, x_{n-1}),$$
for $n = 1, 2, \ldots$. It follows immediately that $\rho(x_{n+1}, x_n) \le c^n \rho(x_1, x)$. Thus, if $1 \le n < m$, we have
$$\rho(x_m, x_n) \le \sum_{j=1}^{m-n} \rho(x_{n+j}, x_{n+j-1}) \le \sum_{j=1}^{m-n} c^{n+j-1} \rho(x_1, x) \le c^n \rho(x_1, x) \sum_{j=0}^{\infty} c^j = \frac{c^n \rho(x_1, x)}{1 - c}.$$
Because $0 \le c < 1$, we have that $\{x_n\}_{n=0}^\infty$ is a Cauchy sequence. Hence, by completeness, $p = \lim_{n\to\infty} x_n$ exists. Using the continuity of $f$, we conclude that
$$f(p) = \lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} x_{n+1} = p.$$
Thus, $p$ is a fixed point of $f$. It remains to establish uniqueness. Let $q$ be a fixed point of $f$. Then $\rho(p,q) = \rho(f(p), f(q)) \le c\,\rho(p,q)$. Since $c < 1$, it follows that $\rho(p,q) = 0$. Hence, $p = q$.

We note from the proof of the contraction mapping principle that we obtain not only the existence of a unique fixed point, but also a method for approximating it and an error estimate. See Exercise 8.12.

EXAMPLE 8.5 Illustrates the Contraction Mapping Principle
We will illustrate the contraction mapping principle by using it to obtain an existence result for a certain class of integral equations. Let $K$ be a real-valued Borel measurable function defined on the rectangle
$$I \times J = [x_0 - a, x_0 + a] \times [y_0 - \beta, y_0 + \beta],$$
where $a, \beta > 0$. We make the following assumptions:
$$|K(x, y_1) - K(x, y_2)| \le A|y_1 - y_2| \tag{8.6}$$
for all $x \in I$ and any pair $y_1, y_2 \in J$, where $A$ is a constant, and
$$B = \sup\{ |K(x,y)| : (x,y) \in I \times J \} < \infty. \tag{8.7}$$
We will show, using the contraction mapping principle, that the integral equation
$$g(x) = y_0 + \int_{x_0}^{x} K(t, g(t))\,dt \tag{8.8}$$
has a unique solution $g \in C(I,\mathcal{R})$ if $aA < 1$ and $aB \le \beta$.
By Theorem 7.19 on page 486, $C(I,\mathcal{R})$ is a complete metric space with respect to the norm $\|\cdot\|_I$. Hence, by Proposition 7.10 on page 435, the closed ball $\overline{B}_\beta(y_0) = \{ g \in C(I,\mathcal{R}) : \|g - y_0\|_I \le \beta \}$, centered at the constant function $y_0$, is also complete. If $g \in \overline{B}_\beta(y_0)$, then the function $T(g)$ defined by
$$T(g)(x) = y_0 + \int_{x_0}^{x} K(t, g(t))\,dt \tag{8.9}$$
is continuous on $I$. (See Exercise 8.14.) Furthermore, since
$$|T(g)(x) - y_0| = \left| \int_{x_0}^{x} K(t, g(t))\,dt \right| \le aB \le \beta,$$
it follows that $T(g) \in \overline{B}_\beta(y_0)$. Thus, the function $T$ defined by (8.9) carries $\overline{B}_\beta(y_0)$ into itself.
We will show that (8.8) has a unique solution in $\overline{B}_\beta(y_0)$ by showing that $T$ is a contraction. For $f, g \in \overline{B}_\beta(y_0)$ and $x \ge x_0$ we have, using (8.6),
$$|T(f)(x) - T(g)(x)| = \left| \int_{x_0}^{x} \bigl( K(t, f(t)) - K(t, g(t)) \bigr)\,dt \right| \le A \int_{x_0}^{x} |f(t) - g(t)|\,dt \le aA\|f - g\|_I.$$
Similar inequalities hold if $x < x_0$. Hence, $\|T(f) - T(g)\|_I \le aA\|f - g\|_I$ and, because $aA < 1$, $T$ is a contraction on $\overline{B}_\beta(y_0)$. □

The conditions (8.6) and (8.7) appear rather restrictive. However, they can be used effectively to obtain a fundamental existence result for differential equations. See Exercise 8.16.
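The contraction argument in Example 8.5 also suggests a computation: start from the constant function $y_0$ and apply $T$ repeatedly. The Python sketch below does this for the hypothetical data $K(t,y) = y$, $x_0 = 0$, $y_0 = 1$, $\beta = 1$, and $a = 0.5$, for which $A = 1$, $B = 2$, $aA < 1$, and $aB \le \beta$; the exact solution of (8.8) is then $g(x) = e^x$. Replacing the integral in (8.9) by a cumulative trapezoidal sum on a uniform grid is an assumption of the sketch and introduces a small discretization error; the function name `picard_iterates` is hypothetical.

```python
import math

def picard_iterates(K, x0, y0, a, n_grid=2001, iterations=30):
    """Iterate T(g)(x) = y0 + integral_{x0}^{x} K(t, g(t)) dt on
    I = [x0 - a, x0 + a], starting from the constant function y0.

    Integrals use the cumulative trapezoidal rule on a uniform grid;
    n_grid must be odd so that x0 is a grid point.
    """
    h = 2 * a / (n_grid - 1)
    xs = [x0 - a + i * h for i in range(n_grid)]
    mid = (n_grid - 1) // 2               # index of x0
    g = [float(y0)] * n_grid
    for _ in range(iterations):
        vals = [K(t, gt) for t, gt in zip(xs, g)]
        cum = [0.0] * n_grid              # integral from the left endpoint
        for i in range(1, n_grid):
            cum[i] = cum[i - 1] + 0.5 * h * (vals[i - 1] + vals[i])
        g = [y0 + c - cum[mid] for c in cum]  # shift so the integral starts at x0
    return xs, g

# Hypothetical data: K(t, y) = y, x0 = 0, y0 = 1, a = 0.5 (so aA = 0.5, aB = 1 = beta).
xs, g = picard_iterates(lambda t, y: y, 0.0, 1.0, 0.5)
err = max(abs(gi - math.exp(x)) for x, gi in zip(xs, g))
print(err)  # small: the exact solution here is g(x) = e**x
```

Because the trapezoidal weights are nonnegative and sum to $|x - x_0|$, the discretized map still satisfies the Lipschitz bound $aA$ in the sup norm and still maps the ball of radius $\beta$ into itself, so the iteration converges for the same reason as in the example.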
EXERCISES 8.2
8.9 Give an example of a complete metric space $(\Omega, \rho)$ and a map $f\colon \Omega \to \Omega$ that satisfies $\rho(f(x), f(y)) < \rho(x,y)$ for all $x \neq y$ but has no fixed point.
8.10 Define $f\colon [0,1] \to [0,1]$ by $f(x) = x^2$. Show that $f$ is not a contraction. Note, however, that $f$ has two fixed points, 0 and 1.
8.11 Let $f\colon [0,1] \to [0,1]$ be continuous.
a) Show that $f$ has a fixed point.
b) Must $f$ have a unique fixed point?
8.12 Let $(\Omega, \rho)$, $f$, $\{x_n\}_{n=0}^\infty$, and $p$ be as in the proof of the contraction mapping principle. Establish the error estimate $\rho(x_n, p) \le c^n \rho(x, f(x))/(1 - c)$.
8.13 Recall Newton's method $x_{n+1} = x_n - G(x_n)/G'(x_n)$ for approximating the roots of a function $G$. Suppose that $r$ is a root of $G$. Further suppose that $\delta$ is a positive number such that $G'$ does not vanish on $[r - \delta, r + \delta]$ and
$$c = \sup\bigl\{ |G(x)G''(x)/(G'(x))^2| : x \in [r - \delta, r + \delta] \bigr\} < 1.$$
Show that if the initial guess $x_0$ in Newton's method is chosen from the interval $(r - \delta, r + \delta)$, then the sequence $\{x_n\}_{n=0}^\infty$ converges to $r$ and satisfies
$$|x_n - r| \le \frac{c^n}{1 - c}\,|G(x_0)/G'(x_0)|.$$
Hint: See Exercise 8.12.
8.14 Refer to Example 8.5.
a) Verify that $K(\cdot, g(\cdot))$ is Borel measurable if $g \in \overline{B}_\beta(y_0)$.
b) Use part (a) to prove that $T(g)$ is continuous on $I$.
8.15 Let $T\colon C([0,1]) \to C([0,1])$ be defined by $T(f)(x) = 1 + \int_0^x f(t)\,dt$.
a) Show that $T$ is not a contraction.
b) Show that $T \circ T$ is a contraction.
c) Show by direct calculation that the sequence $\{f_n\}_{n=0}^\infty$ defined recursively by $f_0 = 1$ and $f_{n+1} = T(f_n)$ converges in $C([0,1])$ to the solution of $f(x) = 1 + \int_0^x f(t)\,dt$.
In Exercises 8.16 and 8.17 we use the following notation for partial derivatives. Suppose $f$ is a real- or complex-valued function defined on some open subset of $\mathcal{R}^n$ containing the point $x = (x_1, \ldots, x_n)$. When
$$\lim_{h \to 0} \frac{f(x_1 + h, x_2, \ldots, x_n) - f(x_1, x_2, \ldots, x_n)}{h}$$
exists and is finite, we denote it by $D_1 f(x)$ and call it the partial derivative of $f$ at $x$ with respect to $x_1$. Partial derivatives with respect to $x_2, x_3, \ldots, x_n$ are defined similarly. We will make use of standard results about partial differentiation from multivariable calculus.†
† See, for example, A. E. Taylor and W. R. Mann, Advanced Calculus, 3rd edition (New York: Wiley, 1983).
8.16 Use Example 8.5 to establish the following existence theorem for differential equations: Suppose that $f$ and $D_2 f$ are defined and continuous on some open set containing the point $(x_0, y_0) \in \mathcal{R}^2$. Then there is a $\delta > 0$ and a unique continuously differentiable function $g$ such that $g(x_0) = y_0$ and $g'(x) = f(x, g(x))$ for $|x - x_0| < \delta$.
8.17 Show that the uniqueness part of Exercise 8.16 fails if the conditions on the derivative $D_2 f$ are removed. Hint: Consider $y' = 3y^{2/3}$.

8.3 COMPACTNESS IN THE SPACE $C(\Omega,\Lambda)$
We now take up the study of compactness in the space $(C(\Omega,\Lambda), \mathcal{T}(\Omega,\Lambda))$, introduced in Section 7.12 beginning on page 481, where $\Omega$ is a topological space and $(\Lambda, \rho)$ is a metric space. Under certain mild restrictions on $\Omega$, we give useful necessary and sufficient conditions for a subset $D$ of $C(\Omega,\Lambda)$ to be compact with respect to the topology $\mathcal{T}(\Omega,\Lambda)$ of uniform convergence on compact subsets. But first we present a simple example.

EXAMPLE 8.6 Illustrates Compactness in Function Spaces
Consider the space $C(\mathcal{R})$ equipped with the topology $\mathcal{T}(\mathcal{R})$ of uniform convergence on compact subsets.
a) As for any topological space, any finite subset of $C(\mathcal{R})$ is compact.
b) Let $f \in C(\mathcal{R})$ and define $g\colon [0,1] \to C(\mathcal{R})$ by $g(t)(x) = f(x + t)$. It is not difficult to show that $g$ is continuous and, therefore, from Theorem 7.11 on page 472, $\{ f(\cdot + t) : t \in [0,1] \}$ is a compact subset of $C(\mathcal{R})$.
c) Let $f \in C(\mathcal{R})$. The set $\{ f(\cdot + t) : t \in \mathcal{R} \}$ may fail to be a compact subset of $C(\mathcal{R})$, as is the case when $f(x) = x$. □

The construction of more elaborate examples of compactness in function spaces requires some theory. We first develop a necessary condition for the compactness of $D \subset C(\Omega,\Lambda)$ using the functions $\{ e_x : x \in \Omega \}$. We will use a result from Exercise 7.174 on page 491, which we now state formally as a proposition.

PROPOSITION 8.1
For $x \in \Omega$, define $e_x\colon C(\Omega,\Lambda) \to \Lambda$ by $e_x(f) = f(x)$. Then $e_x$ is a continuous function.
Applying Proposition 8.1 and Theorem 7.11 (page 472), we get the following necessary condition for compactness of a subset of $C(\Omega,\Lambda)$.
PROPOSITION 8.2
If $D \subset C(\Omega,\Lambda)$ is compact, then $\{ f(x) : f \in D \}$ is a compact subset of $\Lambda$ for each $x \in \Omega$.

Next, using Proposition 8.2 and Theorem 7.7 on page 466, we derive another necessary condition for compactness of a subset of $C(\Omega,\Lambda)$.

PROPOSITION 8.3
Suppose that $\Omega$ is a locally compact Hausdorff space and $D$ is a compact subset of $C(\Omega,\Lambda)$. Then, given $x \in \Omega$ and $\epsilon > 0$, there is an open set $W$ containing $x$ such that $\rho(f(x), f(y)) < \epsilon$ for all $y \in W$ and $f \in D$.

PROOF: First recall that for $f, g \in C(\Omega,\Lambda)$ and $S \subset \Omega$,
$$\rho_S(f,g) = \sup\{ \rho(f(x), g(x)) : x \in S \}$$
and that $\mathcal{T}(\Omega,\Lambda)$ is the weak topology on $C(\Omega,\Lambda)$ determined by the family of functions $\{ \rho_K(\cdot, g) : K \text{ compact}, g \in C(\Omega,\Lambda) \}$. Now let $U$ be an open set containing $x$ such that $\overline{U}$ is compact. Then, for each $h \in C(\Omega,\Lambda)$, the function $\rho_{\overline{U}}(\cdot, h)$ is continuous. It follows that the collection of sets $\{ J_f : f \in D \}$, where $J_f = \{ g : \rho_{\overline{U}}(g, f) < \epsilon/3 \}$, is an open covering of $D$. Hence, there are finitely many functions $f_1, f_2, \ldots, f_n \in D$ such that $D \subset \bigcup_{j=1}^n J_{f_j}$. Because there are only finitely many $f_j$s, we can find an open set $W$ such that $x \in W \subset U$ and $\rho(f_j(x), f_j(y)) < \epsilon/3$ for each $y \in W$ and $j = 1, 2, \ldots, n$. If $f \in D$, we choose $k \in \{1, 2, \ldots, n\}$ such that $\rho_{\overline{U}}(f_k, f) < \epsilon/3$. Then, for each $y \in W$, we have
$$\rho(f(x), f(y)) \le \rho(f(x), f_k(x)) + \rho(f_k(x), f_k(y)) + \rho(f_k(y), f(y)) \le 2\rho_{\overline{U}}(f_k, f) + \rho(f_k(x), f_k(y)) < 2\epsilon/3 + \epsilon/3 = \epsilon,$$
as required.

The necessary condition derived in Proposition 8.3 is a natural extension of the notion of continuity. It is so important that it merits a formal definition, which we now give. Note that the definition does not require $\Omega$ to be locally compact or Hausdorff.
DEFINITION 8.3 Equicontinuity
A subset $D$ of $C(\Omega,\Lambda)$ is said to be equicontinuous on $\Omega$ if, for each $x \in \Omega$ and $\epsilon > 0$, there is an open set $W$ containing $x$ such that $\rho(f(x), f(y)) < \epsilon$ for all $y \in W$ and $f \in D$.

We will now show that, under mild conditions on $\Omega$, the necessary conditions derived in Propositions 8.2 and 8.3 are also sufficient.

THEOREM 8.4
Let $\Omega$ be a second countable locally compact Hausdorff space. A subset $D$ of $C(\Omega,\Lambda)$ is compact if and only if it is closed and satisfies the following conditions:
a) $\{ f(x) : f \in D \}$ is a compact subset of $\Lambda$ for each $x \in \Omega$.
b) $D$ is equicontinuous.

PROOF: Propositions 8.2 and 8.3 already show that conditions (a) and (b) are implied by the compactness of $D$. Assume that (a) and (b) hold and that $D$ is closed. From Theorem 7.19 on page 486, we know that the space $C(\Omega,\Lambda)$ is metrizable. Hence, by Theorem 7.7 on page 466, the compactness of $D$ will follow if we can show that every sequence in $D$ has a subsequence that converges uniformly on compact subsets of $\Omega$.
So let $\{f_n\}_{n=1}^\infty \subset D$. Since $\Omega$ is second countable, Proposition 7.20 on page 461 implies that it contains a countable dense set, $E$. Suppose we can find a subsequence $\{g_k\}_{k=1}^\infty$ of $\{f_n\}_{n=1}^\infty$ such that
$$g(y) = \lim_{k\to\infty} g_k(y) \tag{8.10}$$
exists for all $y \in E$. We will show that the limit in (8.10) exists for all $x \in \Omega$, that the limit function $g$ is continuous, and that $\{g_k\}_{k=1}^\infty$ converges uniformly on compact subsets of $\Omega$ to $g$.
Let $x \in \Omega$ and $\epsilon > 0$ be given. By (b) we can choose an open set $W$ containing $x$ such that
$$\rho(g_k(x), g_k(w)) < \epsilon/3 \tag{8.11}$$
for each $w \in W$ and $k \in \mathcal{N}$. As $E$ is dense in $\Omega$, there is a $y \in E \cap W$. By (8.10) there is an $N \in \mathcal{N}$ such that $\rho(g_k(y), g_\ell(y)) < \epsilon/3$ whenever
$k, \ell \ge N$. It follows that
$$\rho(g_k(x), g_\ell(x)) \le \rho(g_k(x), g_k(y)) + \rho(g_k(y), g_\ell(y)) + \rho(g_\ell(y), g_\ell(x)) < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon \tag{8.12}$$
for $k, \ell \ge N$. Thus, the sequence $\{g_k(x)\}_{k=1}^\infty$ is Cauchy. Since $\{g_k(x)\}_{k=1}^\infty$ is contained in the compact set $\{ f(x) : f \in D \}$, it follows from Theorem 7.7 that $\lim_{k\to\infty} g_k(x) = g(x)$ exists. Thus, (8.10) continues to hold if $y$ is replaced by any $x \in \Omega$.
We will now verify that $g$ is continuous at each $x \in \Omega$. For each $\epsilon > 0$, we choose an open set $W$ as in the previous paragraph. From the inequalities
$$|\rho(g(x), g(w)) - \rho(g_k(x), g_k(w))| \le |\rho(g(x), g(w)) - \rho(g(x), g_k(w))| + |\rho(g(x), g_k(w)) - \rho(g_k(x), g_k(w))| \le \rho(g(w), g_k(w)) + \rho(g(x), g_k(x)),$$
it follows that $\lim_{k\to\infty} \rho(g_k(x), g_k(w)) = \rho(g(x), g(w))$. Consequently, we can let $k$ tend to infinity in (8.11) to obtain $\rho(g(x), g(w)) \le \epsilon/3 < \epsilon$ for each $w \in W$.
Next we show that, for each $x \in \Omega$ and $\epsilon > 0$, there is an open set $O_x$ containing $x$ and an integer $k_x$ such that
$$\rho(g_k(w), g(w)) < \epsilon, \qquad w \in O_x,\ k \ge k_x. \tag{8.13}$$
Indeed, by the continuity of $g$ and (b), we can choose an open set $O_x$ containing $x$ such that, for each $w \in O_x$, we have $\rho(g(w), g(x)) < \epsilon/3$ and $\rho(g_k(w), g_k(x)) < \epsilon/3$ for all $k$. Because $g(x) = \lim_{k\to\infty} g_k(x)$, it follows that there is an integer $k_x$ such that $\rho(g_k(x), g(x)) < \epsilon/3$ for $k \ge k_x$. Thus, for each $w \in O_x$, we have
$$\rho(g_k(w), g(w)) \le \rho(g_k(w), g_k(x)) + \rho(g_k(x), g(x)) + \rho(g(x), g(w)) < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon,$$
whenever $k \ge k_x$ and, so, (8.13) holds.
Now, let $K$ be a compact subset of $\Omega$ and $\epsilon > 0$. Then we can cover $K$ with finitely many open sets $\{O_{x_j}\}_{j=1}^m$, each of which satisfies (8.13). Let $k_0 = \max\{ k_{x_j} : j = 1, 2, \ldots, m \}$. If $y \in K$, then $y \in O_{x_j}$ for some $j$ and,
hence, $\rho(g_k(y), g(y)) < \epsilon$ for $k \ge k_0$. Thus, we have shown that $\{g_k\}_{k=1}^\infty$ converges uniformly on compact subsets of $\Omega$ to $g$.
It now remains only to show that there is a subsequence $\{g_k\}_{k=1}^\infty$ of $\{f_n\}_{n=1}^\infty$ satisfying (8.10). We shall do this by adapting the diagonalization argument used in the proof of (d) ⇒ (c) in Theorem 7.7 on page 466. Let $\{y_n\}_{n=1}^\infty$ be an enumeration of the countable dense subset $E$ of $\Omega$ that we selected earlier. By (a) we can select a subsequence $\{f_{[1,n]}\}_{n=1}^\infty$ of $\{f_n\}_{n=1}^\infty$ such that $g(y_1) = \lim_{n\to\infty} f_{[1,n]}(y_1)$ exists. Then, again by (a), we can find a subsequence $\{f_{[2,n]}\}_{n=1}^\infty$ of $\{f_{[1,n]}\}_{n=1}^\infty$ such that $g(y_j) = \lim_{n\to\infty} f_{[2,n]}(y_j)$ exists for $j = 1, 2$. Continuing in this manner, we obtain, for $k = 1, 2, \ldots$, subsequences $\{f_{[k,n]}\}_{n=1}^\infty$, each a subsequence of the previous one, such that $g(y_j) = \lim_{n\to\infty} f_{[k,n]}(y_j)$ exists for $j = 1, 2, \ldots, k$. Letting $g_k = f_{[k,k]}$ for each $k \in \mathcal{N}$, we now have a subsequence of $\{f_n\}_{n=1}^\infty$ that satisfies (8.10).

In the mathematical literature, the following variant of Theorem 8.4 is frequently cited.

THEOREM 8.5 Ascoli–Arzelà Theorem
Let $\Omega$ be a separable topological space and $D$ a subset of $C(\Omega,\Lambda)$ that satisfies the following conditions:
a) $\{ f(x) : f \in D \}$ is a compact subset of $\Lambda$ for each $x \in \Omega$.
b) $D$ is equicontinuous.
Then every sequence in $D$ has a subsequence that converges uniformly on compact subsets of $\Omega$.

PROOF: In the proof of the sufficiency part of Theorem 8.4, the second countability and local compactness conditions are used only to ensure that $C(\Omega,\Lambda)$ is metrizable and that $\Omega$ has a countable dense subset. If we are just trying to show that every sequence in $D$ has a subsequence that converges uniformly on compact sets, then all that is required for the proof of Theorem 8.4 to remain valid is the existence of a countable dense subset of $\Omega$.

Next we give a simple example of the use of Theorem 8.4. More elaborate applications are left to the exercises.

EXAMPLE 8.7 Illustrates Theorem 8.4
Let $\Omega = [a,b]$ and $\Lambda = \mathbb{C}$.
a) The closed ball $\overline{B}_r(f)$ of radius $r > 0$ in $C([a,b])$ clearly satisfies condition (a) of Theorem 8.4. On the other hand, Exercise 8.18 shows that $\overline{B}_r(f)$ is not equicontinuous. Thus, $\overline{B}_r(f)$ is not a compact subset of $C([a,b])$.
b) Let $A$ and $B$ be nonnegative constants and let $F$ consist of all functions $f \in C([a,b])$ satisfying $|f(a)| \le A$, $f$ is differentiable on $(a,b)$, and $|f'(x)| \le B$ for all $x \in (a,b)$. We claim that $D = \overline{F}$ is compact. Indeed, if $x \in (a,b]$, then, by the mean value theorem from elementary calculus, we have that $f(x) = f(a) + f'(t)(x - a)$ for some $t \in (a,x)$. Setting $M = A + (b - a)B$, we see that $|f(x)| \le M$ for all $f \in F$ and $x \in [a,b]$. It follows easily that $F$ and, hence, $D$ satisfy Theorem 8.4(a). If $x, y \in [a,b]$ and $x \ne y$, then another application of the mean value theorem yields $|f(x) - f(y)| \le B|x - y|$. It follows immediately that $F$ is equicontinuous and, hence, by Exercise 8.19, so is $D$. We can now conclude from Theorem 8.4 that $D = \overline{F}$ is compact. □

EXERCISES 8.3
Some of the exercises in this section use the concept of a compact function. Let $\Omega$ and $\Lambda$ be topological spaces and $U \subset \Omega$. A function $f\colon U \to \Lambda$ is said to be compact if $\overline{f(U)}$ is a compact subset of $\Lambda$.
8.18 Show that the closed ball $\overline{B}_r(f)$ in $C([a,b])$ is not equicontinuous.
8.19 Show that if $F \subset C(\Omega,\Lambda)$ is equicontinuous, then $\overline{F}$ is also equicontinuous.
8.20 Let $g \in \mathcal{L}^1([0,1])$ and let $F$ denote the set of all functions $f \in C([0,1])$ such that $f(0) = 0$, $f$ is absolutely continuous, and $|f'| \le |g|$ $\lambda$-a.e. Show that $F$ is compact.
8.21 Let $\mathcal{N}^*$ denote the one-point compactification of the space $(\mathcal{N}, \mathcal{T}_d)$, where $\mathcal{N}$ is the set of positive integers and $\mathcal{T}_d$ is the discrete topology. Prove that a closed set $D \subset C(\mathcal{N}^*)$ is compact if and only if it satisfies the following two conditions: (1) $\sup\{ |f(\omega)| : f \in D \} < \infty$ and (2) there is a sequence $\{b_n\}_{n=1}^\infty$ such that $\lim_{n\to\infty} b_n = 0$ and $|f(n) - f(\omega)| \le b_n$ for all $f \in D$ and $n \in \mathcal{N}$.
8.22 Let $f(x) = \sin x + \sin \sqrt{2}x$. For $t \in \mathcal{R}$, define $f_t(x) = f(x - t)$.
a) Show that $\{ f_t : t \in \mathcal{R} \}$ is a compact subset of $C(\mathcal{R})$.
b) Find a convergent subsequence of $\{f_{2n\pi}\}_{n=1}^\infty$.
8.23 Refer to Exercise 8.22.
Give an example of a bounded function $f \in C(\mathcal{R})$ such that $\{ f_t : t \in \mathcal{R} \}$ is not a compact subset of $C(\mathcal{R})$.
8.24 Refer to Example 8.5 on page 500. Suppose we drop the assumptions that (8.6) holds and $aA < 1$, but still assume that (8.7) holds and $aB \le \beta$.
a) Show that $T$ still maps $\overline{B}_\beta(y_0)$ into itself.
b) Show that $T$ is a compact function.
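The contrast between parts (a) and (b) of Example 8.7 can be seen on two concrete families in the closed unit ball of $C([0,1])$: $f_k(y) = \sin(ky)$, whose derivatives are unbounded in $k$, and $f_k(y) = \sin(ky)/k$, whose derivatives are bounded by 1. The Python sketch below is only an illustration under these hypothetical choices (the helper `worst_member` is not from the text). For every $\delta$ it exhibits a member of the first family that jumps by 1 over $[0,\delta]$, so no single neighborhood $W$ of $0$ works in Definition 8.3, while the corresponding member of the second family moves by at most $\delta$.

```python
import math

def worst_member(delta):
    """For f_k(y) = sin(k y), k = 1, 2, ..., return k and y in [0, delta]
    with |f_k(y) - f_k(0)| = 1 (up to rounding)."""
    k = math.ceil(math.pi / (2 * delta))
    y = math.pi / (2 * k)  # y <= delta because k >= pi / (2 * delta)
    return k, y, abs(math.sin(k * y) - math.sin(0.0))

for delta in (0.1, 0.01, 0.001):
    k, y, jump = worst_member(delta)
    # sin(k y)/k has |derivative| <= 1, so it moves by at most y <= delta.
    tame_jump = abs(math.sin(k * y) / k)
    print(delta, k, round(jump, 12), tame_jump <= delta)
```

Each printed row shows a jump of 1 for the first family and `True` for the second, matching the bound $|f(x) - f(y)| \le B|x - y|$ used in Example 8.7(b) with $B = 1$.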
Exercises 8.25–8.28 require some knowledge of the theory of functions of a complex variable.† In these exercises, $\Omega$ denotes an open subset of $\mathbb{C}$ and $H(\Omega)$ the set of functions that are analytic on $\Omega$.
8.25 Show that $H(\Omega)$ is a closed subset of $C(\Omega)$. Hint: Use the Cauchy integral formula.
8.26 Let $F \subset H(\Omega)$. Show that $F$ is compact if and only if it is closed in $C(\Omega)$ and, for each $z_0 \in \Omega$, there is an $r > 0$ such that $B_r(z_0) \subset \Omega$ and $\sup\{ |f(z)| : f \in F, z \in B_r(z_0) \} < \infty$.
8.27 Let $\Omega = B_1(0)$ and $U = \{ f \in H(\Omega) : |f| \le 1 \}$. Prove that, for each $z \in \Omega$, there is a $g \in U$ such that $|g'(z)| = \sup\{ |f'(z)| : f \in U \}$.
8.28 Let $\Omega$ and $U$ be as in Exercise 8.27 and $0 < r < 1$. Consider the function $T\colon U \to U$ defined by $T(f)(z) = f(rz)$. Prove that $T$ is a compact function.

8.4 COMPACTNESS OF PRODUCT SPACES
The Cartesian product of two compact intervals of $\mathcal{R}$ is a compact rectangle in $\mathcal{R}^2$. Indeed, it is not hard to prove that $\Gamma \times \Lambda$ is a compact space whenever $\Gamma$ and $\Lambda$ are compact spaces, as you are asked to show in Exercise 8.29. In this section, we prove a striking generalization of the aforementioned fact, namely, that the Cartesian product of any collection of compact spaces is compact. As a corollary to the main result, we obtain simple sufficient conditions for compactness of spaces with weak topologies. To begin, we briefly review two prerequisite concepts.
• If $\{A_\iota\}_{\iota \in I}$ is an indexed collection of sets, then the Cartesian product of the collection, denoted $\times_{\iota \in I} A_\iota$, is the set of all functions $x$ on $I$ such that $x(\iota) \in A_\iota$ for each $\iota \in I$. We call $x(\iota)$ the $\iota$th coordinate of $x$ and often denote it by $x_\iota$.
• Let $\{(\Omega_\iota, \mathcal{T}_\iota)\}_{\iota \in I}$ be an indexed collection of topological spaces and set $\Omega = \times_{\iota \in I} \Omega_\iota$. The function $p_\iota\colon \Omega \to \Omega_\iota$ defined by $p_\iota(x) = x(\iota)$ is called the $\iota$th coordinate projection on $\Omega$. The weak topology on $\Omega$ determined by the family of coordinate projections $\{ p_\iota : \iota \in I \}$ is called the product topology.
Thus, the product topology is the weakest topology for which all coordinate projections are continuous.
Now we present the main result of this section, known as Tychonoff's theorem.
† See, for example, L. Ahlfors, Complex Analysis (New York: McGraw-Hill, 1979).
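THEOREM 8.6 Tychonoff's Theorem
The Cartesian product of any family of compact spaces is compact. That is, if $\{(\Omega_\iota, \mathcal{T}_\iota)\}_{\iota \in I}$ is an indexed collection of compact topological spaces, then the Cartesian product $\Omega = \times_{\iota \in I} \Omega_\iota$ is compact with respect to the product topology.
PROOF: By Theorem 7.10 on page 471, it suffices to show that any collection of closed subsets of $\Omega$ having the finite intersection property has a nonempty intersection. So, let $\mathcal{C}$ be a collection of closed subsets of $\Omega$ having the finite intersection property. We need to prove that
$$\bigcap_{F \in \mathcal{C}} F \neq \emptyset. \tag{8.14}$$
Let $\mathfrak{A}$ denote the family of all $\mathcal{A} \subset \mathcal{P}(\Omega)$ such that $\mathcal{A}$ has the finite intersection property and $\mathcal{C} \subset \mathcal{A}$. We will use Zorn's lemma (page 17) to show that $\mathfrak{A}$ contains a maximal element with respect to the inclusion ordering $\subset$.
Suppose that $\mathfrak{C}$ is a nonempty chain in $\mathfrak{A}$. We claim that $\mathcal{U} = \bigcup_{\mathcal{A} \in \mathfrak{C}} \mathcal{A}$ is an upper bound for $\mathfrak{C}$. Clearly $\mathcal{A} \subset \mathcal{U}$ for all $\mathcal{A} \in \mathfrak{C}$. So, we need only show that $\mathcal{U} \in \mathfrak{A}$. Because it is obvious that $\mathcal{C} \subset \mathcal{U}$, it remains to prove that $\mathcal{U}$ has the finite intersection property. Suppose $U_1, U_2, \ldots, U_n \in \mathcal{U}$. Then, for each $j$, $U_j \in \mathcal{A}_j$ for some $\mathcal{A}_j \in \mathfrak{C}$. Since $\mathfrak{C}$ is a chain, there exists an $\mathcal{A} \in \mathfrak{C}$ such that $\mathcal{A}_j \subset \mathcal{A}$ for $j = 1, 2, \ldots, n$. It follows that $U_1, U_2, \ldots, U_n \in \mathcal{A}$ and, since $\mathcal{A}$ has the finite intersection property, we conclude that $\bigcap_{j=1}^n U_j \neq \emptyset$. Thus, $\mathcal{U}$ is an upper bound for $\mathfrak{C}$.
Zorn's lemma now implies that $\mathfrak{A}$ has a maximal element, say, $\mathcal{A}^*$. We claim that $\mathcal{A}^*$ has the following properties:
$$A_1, A_2, \ldots, A_n \in \mathcal{A}^* \implies \bigcap_{j=1}^n A_j \in \mathcal{A}^* \tag{8.15}$$
and
$$B \subset \Omega \text{ and } B \cap A \neq \emptyset \text{ for all } A \in \mathcal{A}^* \implies B \in \mathcal{A}^*. \tag{8.16}$$
To verify (8.15), let $A = \bigcap_{j=1}^n A_j$. We note that $\mathcal{A}^* \cup \{A\}$ has the finite intersection property and that $\mathcal{C} \subset \mathcal{A}^* \cup \{A\}$. Because $\mathcal{A}^*$ is a maximal element of $\mathfrak{A}$, we must have $\mathcal{A}^* \cup \{A\} = \mathcal{A}^*$. Hence, $A \in \mathcal{A}^*$.
To establish (8.16), we show that $\mathcal{A}^* \cup \{B\}$ has the finite intersection property. It will then follow from the maximality of $\mathcal{A}^*$ that $B \in \mathcal{A}^*$. Let $B_1, B_2, \ldots, B_m$ be distinct elements of $\mathcal{A}^* \cup \{B\}$.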
If $B_j \neq B$ for $j = 1, 2, \ldots, m$, then $\bigcap_{j=1}^m B_j \neq \emptyset$ because $\mathcal{A}^*$ has the finite intersection property.
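On the other hand, if $B_k = B$ for some $k$, then $\bigcap_{j \neq k} B_j \in \mathcal{A}^*$ by (8.15). Thus,
$$\bigcap_{j=1}^m B_j = B \cap \Bigl( \bigcap_{j \neq k} B_j \Bigr) \neq \emptyset,$$
as required.
Let $\iota \in I$. For each finite $\mathcal{E} \subset \mathcal{A}^*$, we have $p_\iota\bigl( \bigcap_{A \in \mathcal{E}} A \bigr) \subset \bigcap_{A \in \mathcal{E}} p_\iota(A)$, and the left-hand side is nonempty because $\mathcal{A}^*$ has the finite intersection property. Hence, the collection $\{ \overline{p_\iota(A)} : A \in \mathcal{A}^* \}$ of closed subsets of $\Omega_\iota$ has the finite intersection property. From the compactness of the space $\Omega_\iota$, we can conclude that $\bigcap_{A \in \mathcal{A}^*} \overline{p_\iota(A)} \neq \emptyset$. It now follows from the axiom of choice (page 16) that there is an $x \in \Omega$ such that
$$p_\iota(x) \in \bigcap_{A \in \mathcal{A}^*} \overline{p_\iota(A)}$$
for each $\iota \in I$. We will show that $x \in \bigcap_{F \in \mathcal{C}} F$ by proving that if $F \in \mathcal{C}$ and $W$ is an open set containing $x$, then
$$W \cap F \neq \emptyset. \tag{8.17}$$
It will follow that $x \in \overline{F} = F$ for each $F \in \mathcal{C}$.
We recall that the product topology is determined by the neighborhood basis consisting of finite intersections of sets of the form $p_\iota^{-1}(W_\iota)$, where $W_\iota$ is an open subset of $\Omega_\iota$. Thus, to establish (8.17), it suffices to consider the case where $W = \bigcap_{\iota \in I_0} p_\iota^{-1}(W_\iota)$ for some finite subset $I_0 \subset I$. Let $\iota \in I_0$ and $A \in \mathcal{A}^*$. Because $p_\iota(x) \in \overline{p_\iota(A)}$ and $W_\iota$ is an open set containing $p_\iota(x)$, it follows that $W_\iota \cap p_\iota(A) \neq \emptyset$. Thus, $p_\iota^{-1}(W_\iota) \cap A \neq \emptyset$ for each $A \in \mathcal{A}^*$ and, hence, by (8.16), $p_\iota^{-1}(W_\iota) \in \mathcal{A}^*$. Therefore, since $F \in \mathcal{A}^*$ and $\mathcal{A}^*$ has the finite intersection property, $\bigl( \bigcap_{\iota \in I_0} p_\iota^{-1}(W_\iota) \bigr) \cap F \neq \emptyset$.
Next we will apply Tychonoff's theorem to the study of compactness in the context of weak topologies. Specifically, we have the following corollary, which will be useful later when we study weak topologies on linear spaces.
COROLLARY 8.2 Suppose that $\Omega$ has the weak topology determined by a family of functions $\mathcal{F}$ and that the following conditions are satisfied:
a) $f(\Omega)$ is compact for each $f \in \mathcal{F}$.
b) If $x \neq y$, then $f(x) \neq f(y)$ for some $f \in \mathcal{F}$.
c) If $\{f(x_\iota)\}_{\iota \in J}$ is a convergent net for each $f \in \mathcal{F}$, then there is an $x \in \Omega$ such that $\lim f(x_\iota) = f(x)$ for each $f \in \mathcal{F}$.
Then $\Omega$ is compact.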
PROOF: Each $f \in \mathcal{F}$ maps $\Omega$ into a topological space $\Omega_f$. Condition (a) asserts that $\Lambda_f = f(\Omega)$ is a compact subset of $\Omega_f$. It follows from Tychonoff's theorem that the product space $\Lambda = \times_{f \in \mathcal{F}} \Lambda_f$ is compact. Since a closed subset of a compact space is compact (Theorem 7.12 on page 473), we can establish the corollary by finding a homeomorphism from $\Omega$ onto a closed subset of $\Lambda$.
Let $h\colon \Omega \to \Lambda$ be defined by $h(x)(f) = f(x)$. Condition (c) is equivalent to the assertion that $h(\Omega)$ is a closed subset of $\Lambda$. That $h$ is continuous follows from the definitions of product and weak topologies, Proposition 7.13 on page 444, and Theorem 7.1 on page 443. Condition (b) says that $h$ is one-to-one. It remains only to show that the inverse function $h^{-1}$ is continuous on $h(\Omega)$. If $\{h(x_\iota)\}_{\iota \in I}$ is a net in $h(\Omega)$ converging to $h(x)$, then, by Proposition 7.13, $\{x_\iota\}_{\iota \in I}$ converges in $\Omega$ to $x$. It now follows from Theorem 7.1 that $h^{-1}$ is continuous.
Recall that if $A_\iota = A$ for each $\iota \in I$, where $A$ is some set, then Cartesian products of the form $\times_{\iota \in I} A_\iota$ are usually denoted by $A^I$. Note that $A^I$ is just the set of all functions from $I$ to $A$. In case $I$ is the set $\mathcal{N}$ of positive integers, $A^{\mathcal{N}}$ is the set of all sequences of elements of $A$. Often a typical element of $A^{\mathcal{N}}$ is written in the form $x = (x_1, x_2, \ldots)$.
EXAMPLE 8.8 Illustrates Tychonoff's Theorem
a) Consider the space $\{0, 1\}$ endowed with the discrete topology and let $I$ be any index set. We write $2^I$ in place of $\{0, 1\}^I$. It follows from Tychonoff's theorem that $2^I$ is a compact space.
b) If $[a, b]$ is a closed bounded interval, then it follows from the Heine-Borel theorem and Tychonoff's theorem that $[a, b]^I$ is compact for any index set $I$. □
EXERCISES 8.4
8.29 Give an alternative proof of Tychonoff's theorem in case the index set has only two elements. That is, without using Tychonoff's theorem, show that if $\Gamma$ and $\Lambda$ are compact, then so is $\Gamma \times \Lambda$ (in the product topology).
8.30 Let $(\Omega, \mathcal{T})$ be a compact Hausdorff space. Show that the topology $\mathcal{T}$ coincides with the weak topology on $\Omega$ determined by the family of functions $\mathcal{F} = C(\Omega)$.
8.31 Prove that the Cartesian product of a collection of Hausdorff spaces is a Hausdorff space.
8.32 Let $\Omega$ be a compact Hausdorff space. Show that $\Omega$ is homeomorphic to a closed subset of $[0,1]^I$ for some set $I$. Hint: Refer to Exercise 8.31.
8.33 A topological space $\Omega$ is said to be completely regular if it satisfies the following two conditions:
• $\Omega$ is a $T_1$-space.
• Given a closed set $F$ and a point $x \in F^c$, there is a continuous function $k\colon \Omega \to [0,1]$ such that $k(x) = 0$ and $k(F) = \{1\}$.
Let $\Omega$ be completely regular and set $\mathcal{F} = C(\Omega, [0,1])$. Define $h\colon \Omega \to [0,1]^{\mathcal{F}}$ by $h(x)(f) = f(x)$. Prove that $h$ is a homeomorphism of $\Omega$ onto $h(\Omega)$.
8.34 This exercise continues Exercise 8.33. Let $\beta(\Omega)$ denote the closure of $h(\Omega)$ in $[0,1]^{\mathcal{F}}$. By identifying $\Omega$ with $h(\Omega)$ via the map $h$, we may consider $\Omega$ a dense subset of the compact Hausdorff space $\beta(\Omega)$. Thus, $\beta(\Omega)$ is a compactification of $\Omega$, that is, a compact space containing $\Omega$ as a dense subspace. The space $\beta(\Omega)$ is called the Stone-Čech compactification of $\Omega$.
a) Prove that if $g\colon \Omega \to \mathcal{R}$ is continuous and bounded, then it has a continuous extension to $\beta(\Omega)$.
b) Does the one-point compactification of Theorem 7.17 have the property in part (a)?
Note: To appreciate the Stone-Čech compactification, it helps to consider the continuous function $\sin(1/x)$ on the interval $(0, \infty)$.
8.35 This exercise continues Exercise 8.34. Show that $\beta(\Omega)$ is the largest compactification of $\Omega$ in the following sense: If $\Lambda$ is a compact Hausdorff space such that $\Omega \subset \Lambda$, $\overline{\Omega} = \Lambda$, and the topology of $\Omega$ is the same as its relative topology inherited from $\Lambda$, then there is a continuous function $f\colon \beta(\Omega) \to \Lambda$ such that $f(x) = x$ for each $x \in \Omega$.
8.36 Show that if $I$ is uncountable, then $2^I$ is not metrizable.
8.37 Prove that $2^{\mathcal{N}}$ and the Cantor set are homeomorphic. Hint: Define $h$ on $2^{\mathcal{N}}$ by $h((x_1, x_2, \ldots)) = \sum_{n=1}^\infty 2x_n/3^n$.
8.38 Show that if $\Omega$ is a compact metric space, then $\Omega$ is homeomorphic to a closed subset of $[0,1]^{\mathcal{N}}$.
8.39 Find a continuous function from $2^{\mathcal{N}}$ onto $[0,1]^{\mathcal{N}}$.
8.40 This exercise involves the Cantor set.
a) Let $F$ be a nonempty closed subset of the Cantor set $P$. Show that there is a continuous function $r\colon P \to F$ such that $r(t) = t$ for each $t \in F$.
b) Use part (a) and Exercises 8.37-8.39 to show that if $\Omega$ is a compact metric space, then there is a continuous function from $P$ onto $\Omega$.
8.5 APPROXIMATION BY FUNCTIONS FROM A LATTICE
Recall that a real-valued function $g$ on an interval $J$ is piecewise linear if there is a partition $a_0 < a_1 < a_2 < \cdots < a_n$ of $J$ such that on each subinterval $[a_{j-1}, a_j]$ we have $g(t) = m_j t + b_j$ for some real numbers $m_j$
and $b_j$.
Consider the following two problems of approximation of continuous functions on the closed interval $[0,1]$.
Problem 1: Given $f \in C([0,1], \mathcal{R})$ and $\varepsilon > 0$, find a continuous piecewise linear function $g$ such that $|f(t) - g(t)| < \varepsilon$ for each $t \in [0,1]$.
Problem 2: Given $f \in C([0,1], \mathcal{R})$ and $\varepsilon > 0$, find a polynomial $p$ such that $|f(t) - p(t)| < \varepsilon$ for each $t \in [0,1]$.
Let us denote by $\mathcal{W}$ the set of continuous piecewise linear functions on $[0,1]$, and by $\mathcal{P}_{\mathcal{R}}$ the set of polynomials with real coefficients.† Then it is easy to see that Problems 1 and 2 can be solved if, respectively, $f \in \overline{\mathcal{W}}$ and $f \in \overline{\mathcal{P}_{\mathcal{R}}}$.
Motivated by Problem 1, we will prove in this section a general result from which we obtain $\overline{\mathcal{W}} = C([0,1], \mathcal{R})$ as a special case. In the next section, we will prove a theorem that has $\overline{\mathcal{P}_{\mathcal{R}}} = C([0,1], \mathcal{R})$ as a special case. It then follows that Problems 1 and 2 can be solved for each $f \in C([0,1], \mathcal{R})$.
The collection $\mathcal{W}$ of continuous piecewise linear functions is a motivating example for the following definition.
DEFINITION 8.4 Lattice of Functions
A collection $\mathcal{L}$ of real-valued functions on a set $\Omega$ is called a lattice if it is closed under maximums and minimums. That is,
a) $f, g \in \mathcal{L}$ implies $f \vee g \in \mathcal{L}$.
b) $f, g \in \mathcal{L}$ implies $f \wedge g \in \mathcal{L}$.
EXAMPLE 8.9 Illustrates Definition 8.4
a) The following are lattices of functions contained in $C([0,1], \mathcal{R})$:
(i) $C([0,1], \mathcal{R})$ itself.
(ii) The collection of nonnegative continuous functions on $[0,1]$.
(iii) The collection $\mathcal{W}$ of continuous piecewise linear functions on $[0,1]$.
b) The collection $\mathcal{P}_{\mathcal{R}}$ of polynomials on $[0,1]$ with real coefficients is not a lattice of functions. For example, $t \vee (1 - t)$ has a corner at $t = 1/2$, so it is not a polynomial. □
† If we restrict a member of $\mathcal{P}_{\mathcal{R}}$ to any subset of $\mathcal{R}$, it is continuous thereon. For convenience, we will abuse notation slightly and use $\mathcal{P}_{\mathcal{R}}$ to denote the collection of polynomials with real coefficients considered as functions on any particular subset of $\mathcal{R}$. Context will determine the appropriate subset.
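Problem 1 can be checked numerically in a concrete case. The Python sketch below is only an illustration and is not the method of this section (the Kakutani-Krein argument that follows is nonconstructive). It assumes $f$ is available as an ordinary Python function and takes $g \in \mathcal{W}$ to be the piecewise linear interpolant of $f$ at the uniform partition $t_j = j/n$. Because $f$ is uniformly continuous on $[0,1]$, the error $\max_t |f(t) - g(t)|$ tends to 0 as $n$ grows; the helper `sup_error` only estimates this maximum on a finite grid of sample points, so it gives evidence rather than a proof. The names `interpolant` and `sup_error`, and the sample function, are illustrative choices.

```python
import math


def interpolant(f, n):
    """Return the continuous piecewise linear g that agrees with f at the
    nodes t_j = j/n, j = 0, ..., n (so g belongs to the class W)."""
    nodes = [j / n for j in range(n + 1)]
    values = [f(t) for t in nodes]

    def g(t):
        # Index j of the subinterval [t_{j-1}, t_j] containing t, clamped to 1..n.
        j = min(max(math.ceil(t * n), 1), n)
        t0, t1 = nodes[j - 1], nodes[j]
        y0, y1 = values[j - 1], values[j]
        # On this subinterval g(t) = m_j t + b_j, written in point-slope form.
        return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

    return g


def sup_error(f, g, samples=10_000):
    """Estimate max |f(t) - g(t)| over [0, 1] on a uniform grid of sample points."""
    return max(abs(f(k / samples) - g(k / samples)) for k in range(samples + 1))


if __name__ == "__main__":
    f = lambda t: math.sin(2 * math.pi * t) + math.sqrt(t)
    for n in (4, 16, 64, 256):
        print(n, sup_error(f, interpolant(f, n)))
```

With this $f$ the estimated errors shrink as $n$ grows; the $\sqrt{t}$ term, which is not Lipschitz at 0, makes the decay slower (roughly like $1/(4\sqrt{n})$) than it would be for a smooth $f$.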
The main result of this section, the Kakutani-Krein theorem, provides a set of sufficient conditions for a lattice to be dense in $C(\Omega, \mathcal{R})$ when $\Omega$ is a compact Hausdorff space. In order to prove that result, we first establish the following theorem, which is important in its own right. You should recall the notation $\|f\|_\Omega = \sup\{ |f(x)| : x \in \Omega \}$.
THEOREM 8.7 Dini's Theorem
Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{F} \subset C(\Omega, \mathcal{R})$ has the following two properties:
a) $f, g \in \mathcal{F}$ implies there is an $h \in \mathcal{F}$ such that $h \le f \wedge g$.
b) The function $f_0$ defined by $f_0(x) = \inf\{ f(x) : f \in \mathcal{F} \}$ is real-valued and continuous.
Then given $\varepsilon > 0$, there exists an $f \in \mathcal{F}$ such that $\|f - f_0\|_\Omega < \varepsilon$.
PROOF: By the definition of $f_0$, for each $x \in \Omega$, there is a function $f_x \in \mathcal{F}$ such that $f_x(x) < f_0(x) + \varepsilon$. Because $f_0$ and $f_x$ are continuous, and $x$ belongs to the corresponding set below, the sets
$$U_x = \{ y : f_x(y) < f_0(y) + \varepsilon \}, \qquad x \in \Omega,$$
constitute an open covering of $\Omega$. Hence, there are points $x_1, x_2, \ldots, x_n \in \Omega$ such that $\Omega = \bigcup_{j=1}^n U_{x_j}$. Using (a), we can find an $f \in \mathcal{F}$ with $f \le f_{x_j}$ for $j = 1, 2, \ldots, n$. Hence, for each $x \in \Omega$, we have
$$f_0(x) \le f(x) \le \min_{1 \le j \le n} f_{x_j}(x) < f_0(x) + \varepsilon.$$
It follows at once that $\|f - f_0\|_\Omega < \varepsilon$.
DEFINITION 8.5 Separation of Points
A collection $\mathcal{F}$ of functions on a set $\Omega$ is said to separate points of $\Omega$ if whenever $x$ and $y$ are distinct elements of $\Omega$, there is an $f \in \mathcal{F}$ such that $f(x) \neq f(y)$.
EXAMPLE 8.10 Illustrates Definition 8.5
a) If $\mathcal{F}$ contains a one-to-one function, then it separates points.
b) $\mathcal{P}_{\mathcal{R}}$ separates the points of $[0,1]$ because it contains the identity function.
c) $\mathcal{W}$ separates the points of $[0,1]$ because it contains the identity function.
d) Let $\mathcal{F}$ denote the polynomials on $[-1,1]$ containing only even powers, that is, polynomials of the form $a_0 + a_1 x^2 + \cdots + a_n x^{2n}$. Then $\mathcal{F}$ does not separate the points of $[-1,1]$ since $f(-1) = f(1)$ for all $f \in \mathcal{F}$.
e) Suppose $\Omega$ is a topological space with the property that there is a collection $\mathcal{F} \subset C(\Omega, \mathcal{R})$ separating the points of $\Omega$. Then $\Omega$ is a Hausdorff space.
f) The collection of functions $\{\sin x, \cos x\}$ separates points of $[0, 2\pi)$, but it does not separate points of $[0, 2\pi]$. □
THEOREM 8.8 Kakutani-Krein Theorem
Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{L} \subset C(\Omega, \mathcal{R})$ satisfies the following conditions:
a) $\mathcal{L}$ is a lattice.
b) $\mathcal{L}$ separates points of $\Omega$.
c) $f \in \mathcal{L}$ and $c \in \mathcal{R}$ implies $cf \in \mathcal{L}$ and $f + c \in \mathcal{L}$.
Then $\overline{\mathcal{L}} = C(\Omega, \mathcal{R})$.
PROOF: For $g \in C(\Omega, \mathcal{R})$, let $\mathcal{L}_g = \{ f \in \mathcal{L} : g < f \}$. Because $\mathcal{L}$ is a lattice, it follows that $\mathcal{L}_g$ satisfies condition (a) of Dini's theorem. Therefore, if we can show that
$$g(x) = \inf\{ f(x) : f \in \mathcal{L}_g \} \tag{8.18}$$
for each $x \in \Omega$, then Dini's theorem will imply the required result. We will prove (8.18) by constructing, for each $\varepsilon > 0$, an $f_x \in \mathcal{L}_g$ such that $f_x(x) = g(x) + \varepsilon$.
To begin our construction, we show that for each pair of distinct points $y, z \in \Omega$ and each pair of real numbers $a$ and $b$, there is an $h \in \mathcal{L}$ such that
$$h(y) = a \quad \text{and} \quad h(z) = b. \tag{8.19}$$
Using (b), we choose an $h_0 \in \mathcal{L}$ with $h_0(y) \neq h_0(z)$ and then, using (c), we conclude that the function
$$h = (a - b)\,\frac{h_0 - h_0(z)}{h_0(y) - h_0(z)} + b$$
belongs to $\mathcal{L}$. It is easy to see that $h$ satisfies (8.19).
Next we consider the open set $O = \{ y : g(y) < g(x) + \varepsilon \}$. For each $z \in O^c$, we can apply (8.19) to obtain an $h_z \in \mathcal{L}$ such that
$$h_z(z) = g(z) + \varepsilon \quad \text{and} \quad h_z(x) = g(x) + \varepsilon/2.$$
Let $V_z = \{ y : h_z(y) > g(y) \}$. Then $\{ V_z : z \in O^c \}$ is an open covering of the compact set $O^c$. Therefore, there are points $z_1, z_2, \ldots, z_n \in O^c$ such that $O^c \subset \bigcup_{j=1}^n V_{z_j}$. Let
$$f_x = (g(x) + \varepsilon) \vee h_{z_1} \vee h_{z_2} \vee \cdots \vee h_{z_n}.$$
It follows from (c) that $\mathcal{L}$ contains the constant function $g(x) + \varepsilon$ and, hence, by (a), we have $f_x \in \mathcal{L}$. If $y \in O^c$, then $y \in V_{z_j}$ for some $j \in \{1, 2, \ldots, n\}$ and, consequently, we have that $f_x(y) \ge h_{z_j}(y) > g(y)$. On the other hand, if $y \in O$, then $f_x(y) \ge g(x) + \varepsilon > g(y)$. It now follows that $f_x \in \mathcal{L}_g$. Finally, we have that
$$f_x(x) = (g(x) + \varepsilon) \vee (g(x) + \varepsilon/2) \vee \cdots \vee (g(x) + \varepsilon/2) = g(x) + \varepsilon,$$
as required.
EXAMPLE 8.11 Illustrates the Kakutani-Krein Theorem
It is easy to see that the collection $\mathcal{W}$ of continuous piecewise linear functions on $[0,1]$ satisfies the conditions of the Kakutani-Krein theorem. Consequently, $\overline{\mathcal{W}} = C([0,1], \mathcal{R})$. In other words, Problem 1 on page 514 can be solved for each $f \in C([0,1], \mathcal{R})$. □
The most important application of the Kakutani-Krein theorem comes in the next section, where it is used to prove the Stone-Weierstrass theorem.
EXERCISES 8.5
8.41 Show that Dini's theorem fails if the assumption that $f_0$ is continuous is dropped.
8.42 In Exercise 2.63 on page 72, we asked you to prove another version of Dini's theorem. Show that the theorem stated there is a special case of the version of Dini's theorem proved in this section (Theorem 8.7).
8.43 Verify that the collection $\mathcal{W}$ of continuous piecewise linear functions on $[0,1]$ is a lattice.
8.44 Suppose that $\mathcal{L} \subset C(\Omega, \mathcal{R})$, where $\Omega$ is a compact Hausdorff space. Show that if $\mathcal{L}$ is a lattice, then so is $\overline{\mathcal{L}}$.
8.45 Suppose that $\mathcal{L}$ is a linear subspace of $C(\Omega, \mathcal{R})$. Show that $\mathcal{L}$ is a lattice if and only if $|f| \in \mathcal{L}$ whenever $f \in \mathcal{L}$.
8.46 Give an example showing that the Kakutani-Krein theorem fails if conditions (b) and (c) are retained but (a) is dropped.
8.47 Give an example showing that the Kakutani-Krein theorem fails if conditions (a) and (c) are retained but (b) is dropped.
8.48 Give an example showing that the Kakutani-Krein theorem fails if conditions (a) and (b) are retained but (c) is dropped.
8.49 Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{L} \subset C(\Omega, \mathcal{R})$ satisfies the following conditions:
• $\mathcal{L}$ is closed.
• $\mathcal{L}$ is a linear subspace of $C(\Omega, \mathcal{R})$.
• $\mathcal{L}$ is a lattice.
• $f \in \mathcal{L}$, $g \in C(\Omega, \mathcal{R})$, and $0 \le g \le f$ imply $g \in \mathcal{L}$.
Show that either $\mathcal{L} = C(\Omega, \mathcal{R})$ or there is a nonempty closed set $F \subset \Omega$ such that $\mathcal{L} \subset \{ f \in C(\Omega, \mathcal{R}) : f(x) = 0 \text{ for each } x \in F \}$.
8.6 APPROXIMATION BY FUNCTIONS FROM AN ALGEBRA
In our study of measure theory, we found that the concept of an algebra of functions is essential. Now we will see that this concept is also of importance in the study of approximation by functions.
DEFINITION 8.6 Algebra of Functions
A collection $\mathcal{A}$ of real-valued or complex-valued functions on a set $\Omega$ is called an algebra if it is closed under addition, scalar multiplication, and multiplication. That is, if $f, g \in \mathcal{A}$ and $\alpha$ is a scalar, then
a) $f + g \in \mathcal{A}$.
b) $\alpha f \in \mathcal{A}$.
c) $f \cdot g \in \mathcal{A}$.
Theorem 4.3 (page 176) and Exercise 4.32 (page 182) show, respectively, that the collections of real-valued and complex-valued functions measurable with respect to a $\sigma$-algebra of subsets of a set $\Omega$ form algebras of functions. In addition, Theorem 2.4 (page 66) shows that the collection of real-valued continuous functions on a subset of $\mathcal{R}$ constitutes an algebra of functions. It is this latter type of algebra (algebras of continuous functions) that will be important to us in this section.
EXAMPLE 8.12 Illustrates Definition 8.6
a) Let $\Omega$ be a topological space. Then the collection $C(\Omega, \mathcal{R})$ of real-valued continuous functions on $\Omega$ is an algebra of functions.
b) Let $\Omega$ be a topological space. Then the collection $C(\Omega)$ of complex-valued continuous functions on $\Omega$ is an algebra of functions.
c) Let $\mathcal{P}_{\mathcal{R}}$ denote the collection of polynomials with real coefficients viewed as functions on the closed bounded interval $[a, b]$.† Clearly, $\mathcal{P}_{\mathcal{R}}$ is an algebra in $C([a, b], \mathcal{R})$.
d) Let $\mathcal{U}_{\mathcal{R}}$ denote the collection of trigonometric polynomials with real coefficients viewed as functions on the closed bounded interval $[a, b]$. We claim that $\mathcal{U}_{\mathcal{R}}$ is also an algebra in $C([a, b], \mathcal{R})$. To see this, recall that a trigonometric polynomial $u$ with real coefficients is a function of the form
$$u(t) = \sum_{j=0}^n (a_j \cos jt + b_j \sin jt), \tag{8.20}$$
where the $a_j$'s and $b_j$'s are real numbers. That $\mathcal{U}_{\mathcal{R}}$ is a linear subspace of $C([a, b], \mathcal{R})$ is clear; that it is also closed under multiplication and, hence, constitutes an algebra, follows from the trigonometric identities
$$2 \cos jt \cos kt = \cos(j + k)t + \cos(j - k)t,$$
$$2 \sin jt \sin kt = \cos(j - k)t - \cos(j + k)t,$$
and
$$2 \sin jt \cos kt = \sin(j + k)t + \sin(j - k)t.$$
e) By considering functions of the form (8.20) where the $a_j$'s and $b_j$'s are permitted to be complex numbers, we obtain the collection $\mathcal{U}$ of complex trigonometric polynomials. Rather than writing complex trigonometric polynomials in the form (8.20), we will usually work with the equivalent expression
$$u(t) = \sum_{j=-n}^n c_j e^{ijt}.$$
It is easy to check that, viewed as a subset of $C([a, b])$, $\mathcal{U}$ is an algebra of functions.
† See the footnote on page 514.
f) Suppose $\Omega$ is a compact subset of $\mathcal{C}$. Let $P(\Omega)$ denote the collection of functions in $C(\Omega)$ that are polynomials in $z$, that is, functions of the form
$$p(z) = \sum_{j=0}^n a_j z^j,$$
where the $a_j$'s are complex constants. It is clear that $P(\Omega)$ is an algebra in $C(\Omega)$. Another algebra in $C(\Omega)$, which we denote by $P^*(\Omega)$, consists of all polynomials in $z$ and $\bar z$, that is, functions of the form
$$p(z, \bar z) = \sum_{j=0}^n \sum_{k=0}^n a_{jk} z^j \bar z^k,$$
where each $a_{jk} \in \mathcal{C}$. We will see later that the closure of $P^*(\Omega)$ is $C(\Omega)$.
g) Refer to part (f). The case $\Omega = T$, where $T$ is the unit circle in the complex plane, $\{ z \in \mathcal{C} : |z| = 1 \}$, is of particular interest. It is not hard to show that $P^*(T)$ consists of all functions of the form
$$q(z) = \sum_{j=-n}^n c_j z^j, \qquad z \in T,$$
where each $c_j \in \mathcal{C}$. There is a connection between $P^*(T)$ and the collection $\mathcal{U}$ of trigonometric polynomials, namely, $u \in \mathcal{U}$ if and only if $u(t) = q(e^{it})$ for some $q \in P^*(T)$. □
Motivated by Problem 2 on page 514, we now take up the question of when an algebra of functions $\mathcal{A} \subset C(\Omega, \mathcal{R})$ is dense in $C(\Omega, \mathcal{R})$. Later in this section, we will consider the same question when $\mathcal{R}$ is replaced by $\mathcal{C}$.
It is a classical result due to Karl Weierstrass that every continuous real-valued function on a closed bounded interval can be uniformly approximated arbitrarily closely by polynomials. In the notation of the preceding example, Weierstrass's theorem states that $\overline{\mathcal{P}_{\mathcal{R}}} = C([a, b], \mathcal{R})$.
Our next theorem, the Stone-Weierstrass theorem, is a far-reaching generalization of the aforementioned result. The Stone-Weierstrass theorem gives a set of sufficient conditions for an algebra of functions to be dense in $C(\Omega, \mathcal{R})$ when $\Omega$ is a compact Hausdorff space. Its proof relies on the following lemma, whose verification was considered in Exercise 3.2 on page 102.†
† Note that the lemma is actually a special case of the classical Weierstrass theorem but can be proved using a rather straightforward argument.
LEMMA 8.1 For each $\varepsilon > 0$, there is a polynomial $p$ such that $\bigl| |t| - p(t) \bigr| < \varepsilon$ for all $t \in [-1, 1]$.
THEOREM 8.9 Stone-Weierstrass Theorem
Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{A} \subset C(\Omega, \mathcal{R})$ satisfies the following conditions:
a) $\mathcal{A}$ is an algebra.
b) $\mathcal{A}$ separates points of $\Omega$.
c) $1 \in \mathcal{A}$.
Then $\overline{\mathcal{A}} = C(\Omega, \mathcal{R})$.
PROOF: We leave it to the reader as an exercise to show that because $\mathcal{A}$ is an algebra of functions, so is $\overline{\mathcal{A}}$. If we can prove that $\overline{\mathcal{A}}$ is also a lattice, then the verification will be complete on account of the Kakutani-Krein theorem (page 516).
We first note that $f \vee g = (f + g + |f - g|)/2$ and $f \wedge g = (f + g - |f - g|)/2$. Thus, to prove $\overline{\mathcal{A}}$ is a lattice, it suffices to show that
$$f \in \overline{\mathcal{A}} \implies |f| \in \overline{\mathcal{A}}. \tag{8.21}$$
If $f = 0$, then (8.21) is trivial. If $f \neq 0$, let $g = f/\|f\|_\Omega$ and observe that $g \in \overline{\mathcal{A}}$. Given $\varepsilon > 0$, we can apply Lemma 8.1 to obtain a polynomial $p$ such that $\bigl| |t| - p(t) \bigr| < \varepsilon$ for all $t \in [-1, 1]$. Because the range of $g$ is contained in $[-1, 1]$, it follows that
$$\bigl\| |g| - p \circ g \bigr\|_\Omega < \varepsilon.$$
And because $p \circ g$ is a polynomial in powers of $g$ and $\overline{\mathcal{A}}$ is an algebra containing the constant functions, we conclude that $p \circ g \in \overline{\mathcal{A}}$. Thus, $|g| \in \overline{\overline{\mathcal{A}}} = \overline{\mathcal{A}}$. Finally, because $f$ is a scalar multiple of $g$ and $\overline{\mathcal{A}}$ is an algebra, it follows that $|f| \in \overline{\mathcal{A}}$.
EXAMPLE 8.13 Illustrates the Stone-Weierstrass Theorem
Suppose $\Omega$ is a compact subset of $\mathcal{R}^n$. Let $\mathcal{P}_{\mathcal{R}}^n$ denote the set of polynomials in $n$ variables with real coefficients. It is clear that, as a collection of functions on $\Omega$, $\mathcal{P}_{\mathcal{R}}^n$ satisfies the hypotheses of the Stone-Weierstrass theorem. It follows that any $f \in C(\Omega, \mathcal{R})$ can be approximated arbitrarily closely by polynomials in $n$ variables with real coefficients. □
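Lemma 8.1 and the lattice identities used in the proof of Theorem 8.9 can be explored numerically. The Python sketch below uses one classical construction, which is an assumption here since the text leaves the lemma to Exercise 3.2: set $p_0 = 0$ and $p_{k+1}(t) = p_k(t) + (t^2 - p_k(t)^2)/2$. Each $p_k$ is a polynomial in $t$. Writing $s = |t| \in [0,1]$, induction gives $0 \le p_k \le s$ and $s - p_{k+1} = (s - p_k)\bigl(1 - (s + p_k)/2\bigr) \le (s - p_k)(1 - s/2)$, so $0 \le |t| - p_k(t) \le s(1 - s/2)^k \le 2/(k+1)$ on $[-1, 1]$. The function names `p`, `sup_error`, `vee`, and `wedge` are hypothetical helpers introduced only for this illustration.

```python
def p(k, t):
    """Evaluate p_k(t), where p_0 = 0 and p_{j+1} = p_j + (t^2 - p_j^2) / 2.

    Each p_k is a polynomial in t (of degree 2^k when k >= 1); the loop
    evaluates it at a single point instead of expanding its coefficients.
    """
    value = 0.0
    for _ in range(k):
        value += (t * t - value * value) / 2
    return value


def sup_error(k, samples=2001):
    """Estimate the max over t in [-1, 1] of |t| - p_k(t) on a uniform grid."""
    grid = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(abs(t) - p(k, t) for t in grid)


def vee(f, g):
    """Pointwise maximum f v g = (f + g + |f - g|) / 2, as in the proof of Theorem 8.9."""
    return lambda x: (f(x) + g(x) + abs(f(x) - g(x))) / 2


def wedge(f, g):
    """Pointwise minimum f ^ g = (f + g - |f - g|) / 2."""
    return lambda x: (f(x) + g(x) - abs(f(x) - g(x))) / 2


if __name__ == "__main__":
    for k in (1, 10, 100, 500):
        # Observed grid error printed next to the bound 2 / (k + 1).
        print(k, sup_error(k), 2 / (k + 1))
    # max(t, 1 - t) and min(t, 1 - t) at t = 0.2 are 0.8 and 0.2 (up to rounding).
    f = lambda t: t
    g = lambda t: 1 - t
    print(vee(f, g)(0.2), wedge(f, g)(0.2))
```

The degree of $p_k$ doubles at each step, which is why the sketch evaluates $p_k$ pointwise rather than storing its coefficients.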
EXAMPLE 8.14 Illustrates the Stone-Weierstrass Theorem
The collection of real-valued trigonometric polynomials $\mathcal{U}_{\mathcal{R}}$ is an algebra of functions in $C([0, \pi], \mathcal{R})$ satisfying the hypotheses of the Stone-Weierstrass theorem (for instance, $\cos t$ is one-to-one on $[0, \pi]$, so $\mathcal{U}_{\mathcal{R}}$ separates points there). Thus, $\overline{\mathcal{U}_{\mathcal{R}}} = C([0, \pi], \mathcal{R})$. As an algebra in $C([0, 2\pi], \mathcal{R})$, however, $\mathcal{U}_{\mathcal{R}}$ does not satisfy condition (b) of the Stone-Weierstrass theorem because $u(0) = u(2\pi)$ for each $u \in \mathcal{U}_{\mathcal{R}}$. If $g \in C([0, 2\pi], \mathcal{R})$ is such that $g(0) \neq g(2\pi)$, then $g$ cannot be uniformly approximated arbitrarily closely by trigonometric polynomials: for every $u \in \mathcal{U}_{\mathcal{R}}$, we have $|g(0) - g(2\pi)| \le |g(0) - u(0)| + |u(2\pi) - g(2\pi)| \le 2\|g - u\|_{[0, 2\pi]}$. □
EXAMPLE 8.15 Illustrates the Stone-Weierstrass Theorem
Suppose $f \in C(\mathcal{R}, \mathcal{R})$ is periodic with period $2\pi$. Then (see Example 7.28 on page 490) $f(t) = g(e^{it})$ for some function $g \in C(T, \mathcal{R})$. It is easy to verify that the algebra $P^*(T)$, defined in Example 8.12(g), is such that $P^*(T) \cap C(T, \mathcal{R})$ is an algebra in $C(T, \mathcal{R})$ satisfying the hypotheses of the Stone-Weierstrass theorem. Consequently, for each $\varepsilon > 0$, there is a $p \in P^*(T) \cap C(T, \mathcal{R})$ such that $\|g - p\|_T < \varepsilon$. It follows that
$$|f(t) - p(e^{it})| < \varepsilon, \qquad t \in \mathcal{R}.$$
Thus, we have proved the following important fact: Every continuous real-valued function on $\mathcal{R}$ having period $2\pi$ can be uniformly approximated arbitrarily closely by a trigonometric polynomial. □
Complex Version of the Stone-Weierstrass Theorem
If $C(\Omega, \mathcal{R})$ is replaced by $C(\Omega)$, the hypotheses of the Stone-Weierstrass theorem must be augmented in order to obtain an analogous result.
THEOREM 8.10 Stone-Weierstrass Theorem (Complex Version)
Let $\Omega$ be a compact Hausdorff space. Suppose that $\mathcal{A} \subset C(\Omega)$ satisfies the following conditions:
a) $\mathcal{A}$ is an algebra.
b) $\mathcal{A}$ separates points of $\Omega$.
c) $1 \in \mathcal{A}$.
d) $f \in \mathcal{A} \implies \bar f \in \mathcal{A}$.
Then $\overline{\mathcal{A}} = C(\Omega)$.
PROOF: Let $\Re\mathcal{A}$ denote the set of real parts of functions in $\mathcal{A}$. Because
$$\Re\mathcal{A} = \{ (f + \bar f)/2 : f \in \mathcal{A} \},$$
it follows from the hypotheses of the theorem that $\Re\mathcal{A} \subset \mathcal{A}$ and that $\Re\mathcal{A}$ is an algebra in $C(\Omega, \mathcal{R})$. We claim that $\Re\mathcal{A}$ separates the points of $\Omega$. Let $x$ and $y$ be distinct elements of $\Omega$. By condition (b), there is an $f \in \mathcal{A}$ such that $f(x) \neq f(y)$. We note that either $\Re f(x) \neq \Re f(y)$ or $\Re(if(x)) \neq \Re(if(y))$. Because $\mathcal{A}$ is an algebra, we have $if \in \mathcal{A}$. It follows that $\Re\mathcal{A}$ separates the points of $\Omega$.
Because $\Re\mathcal{A}$ satisfies the hypotheses of Theorem 8.9, we conclude that $\overline{\Re\mathcal{A}} = C(\Omega, \mathcal{R})$. So, given $f \in C(\Omega)$ and $\varepsilon > 0$, we can find $g, h \in \Re\mathcal{A}$ such that $\|\Re f - g\|_\Omega < \varepsilon/2$ and $\|\Im f - h\|_\Omega < \varepsilon/2$. Thus, $\|f - (g + ih)\|_\Omega < \varepsilon$. As $\Re\mathcal{A} \subset \mathcal{A}$, it follows that $g + ih \in \mathcal{A}$.
EXAMPLE 8.16 Illustrates Theorem 8.10
Refer to Example 8.12(f). Suppose $\Omega$ is a compact subset of $\mathcal{C}$.
a) The algebra $P^*(\Omega)$ satisfies the hypotheses of the complex version of the Stone-Weierstrass theorem. Hence, $P^*(\Omega)$ is dense in $C(\Omega)$.
b) In general, the algebra $P(\Omega)$ is not dense in $C(\Omega)$. To see this, consider the case $\Omega = T$ and assume $\overline{P(T)} = C(T)$. Then there is a sequence $\{p_n\}_{n=1}^\infty \subset P(T)$ such that $\{p_n(e^{it})\}_{n=1}^\infty$ converges to $e^{-it}$ uniformly for $t \in [0, 2\pi]$. Consequently,
$$1 = \frac{1}{2\pi} \int_0^{2\pi} e^{it} e^{-it}\,dt = \lim_{n \to \infty} \frac{1}{2\pi} \int_0^{2\pi} e^{it} p_n(e^{it})\,dt.$$
However, as $\int_0^{2\pi} e^{ikt}\,dt = 0$ for $k \in \mathcal{N}$, it follows that the right-hand side of the previous equation equals 0, a contradiction. Thus, $P(T)$ is not dense in $C(T)$. Noting that $P(T)$ satisfies conditions (a), (b), and (c) of the complex version of the Stone-Weierstrass theorem but not condition (d), we see that this example shows that the theorem fails if condition (d) is dropped from the hypotheses. □
EXERCISES 8.6
In this exercise set, we assume throughout that $\Omega$ is a compact Hausdorff space.
8.50 Let $\mathcal{A}$ be an algebra in $C(\Omega, \mathcal{R})$ or $C(\Omega)$. Prove that $\overline{\mathcal{A}}$ is also an algebra.
8.51 Verify the relations $f \vee g = (f + g + |f - g|)/2$ and $f \wedge g = (f + g - |f - g|)/2$ used in the proof of the Stone-Weierstrass theorem.
8.52 Suppose $\mathcal{A}$ is an algebra in $C(\Omega, \mathcal{R})$. Show that if $f \in \mathcal{A}$ and $\alpha$ is a positive constant, then $|f|^\alpha \in \overline{\mathcal{A}}$.
8.53 Show that Theorem 8.9 remains valid if condition (c) is replaced by the following: There is a $g \in \mathcal{A}$ such that $g(x) \neq 0$ for each $x \in \Omega$. Hint: See Exercises 8.50 and 8.52.
8.54 Show that if $\mathcal{A} \subset C(\Omega, \mathcal{R})$ satisfies conditions (a) and (b) of the Stone-Weierstrass theorem and $\overline{\mathcal{A}} \neq C(\Omega, \mathcal{R})$, then there is a point $x \in \Omega$ such that $f(x) = 0$ for each $f \in \mathcal{A}$. Hint: See Exercise 8.53.
8.55 A linear subspace $\mathcal{I}$ of $C(\Omega, \mathcal{R})$ is called an ideal if $f \in \mathcal{I}$ and $g \in C(\Omega, \mathcal{R})$ imply $g \cdot f \in \mathcal{I}$. Suppose that $\mathcal{I}$ is a proper closed ideal of $C(\Omega, \mathcal{R})$. Show that there is a nonempty closed subset $F \subset \Omega$ such that $\mathcal{I} = \{ f \in C(\Omega, \mathcal{R}) : f(x) = 0 \text{ for each } x \in F \}$. Hint: By Exercise 8.54, the set $F = \bigcap_{f \in \mathcal{I}} f^{-1}(\{0\}) \neq \emptyset$. Show that if $g \in C(\Omega, \mathcal{R})$ vanishes on $F$, then there is an $f_0 \in \mathcal{I}$ such that $0 \le f_0 \le 1$ and $f_0(y) > 0$ for all $y \notin g^{-1}(\{0\})$. Deduce from Exercise 8.52 that $f_0^{1/n} \in \mathcal{I}$. Show that $\|g - g f_0^{1/n}\|_\Omega \to 0$ as $n \to \infty$.
8.56 Let $\mathcal{D}$ be a dense subalgebra of $C(\Omega)$. Show that $\mathcal{D}$ must separate the points of $\Omega$.
8.57 Give an example showing that the complex version of the Stone-Weierstrass theorem fails if condition (c) is dropped.
8.58 Show that the complex version of the Stone-Weierstrass theorem remains valid if condition (c) is replaced by the following: There is a $g \in \mathcal{A}$ such that $g(x) \neq 0$ for each $x \in \Omega$. Hint: See Exercises 8.54 and 8.55.
8.59 Suppose that $\mathcal{A} \subset C(\Omega)$ satisfies conditions (a), (c), and (d) of the complex version of the Stone-Weierstrass theorem. Also suppose that $g \in \mathcal{A}$ and that $h$ is a complex-valued function continuous on the range of $g$. Show that $h \circ g \in \overline{\mathcal{A}}$.
8.60 Give an example showing that the complex version of the Stone-Weierstrass theorem fails if instead of assuming that $\mathcal{A}$ is an algebra, we require only the weaker condition that $\mathcal{A}$ is a linear subspace of $C(\Omega)$.
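8.61 A linear subspace $\mathcal{I}$ of $C(\Omega)$ is said to be an ideal if $f \in \mathcal{I}$ and $g \in C(\Omega)$ imply $g \cdot f \in \mathcal{I}$. Suppose that $\mathcal{I}$ is a proper closed ideal of $C(\Omega)$ satisfying the condition that $f \in \mathcal{I}$ implies $\bar f \in \mathcal{I}$. Prove there is a nonempty closed subset $F \subset \Omega$ such that $\mathcal{I} = \{ f \in C(\Omega) : f(x) = 0 \text{ for each } x \in F \}$. Hint: See the hints for Exercise 8.55.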
8.62 Suppose that $\Gamma$ and $\Lambda$ are compact Hausdorff spaces. Let $\mathcal{A}$ denote the collection of functions $f$ on $\Gamma \times \Lambda$ of the form
$$f(x, y) = \sum_{j=1}^n g_j(x) h_j(y),$$
where $n \in \mathcal{N}$ and $g_j \in C(\Gamma)$ and $h_j \in C(\Lambda)$ for $j = 1, 2, \ldots, n$. Show that $\mathcal{A}$ is dense in $C(\Gamma \times \Lambda)$. Generalize this result to arbitrary Cartesian products.
8.63 Let $h$ be a strictly increasing continuous function on $[a, b]$. Suppose that $f \in C([a, b])$ satisfies the condition $\int_a^b h^n(t) f(t)\,dt = 0$ for $n = 0, 1, \ldots$. Prove that $f = 0$.
★8.64 Suppose that $f \in \mathcal{L}^1([0, \infty))$ and that $\int_0^\infty e^{-tx} f(x)\,dx = 0$ for each $t > 0$. Show that $f = 0$ a.e.
David Hilbert (1862-1943)
David Hilbert was born in Königsberg, Germany, on January 23, 1862. He entered the University of Königsberg in 1880 and received his doctorate there in 1885. In 1886, Hilbert qualified as an unpaid lecturer at the University of Königsberg and acted in this capacity until 1892, when he replaced Adolf Hurwitz as assistant professor. In 1895, he obtained a chair at the University of Göttingen, where he remained until he retired in 1930.
Hilbert's first work was on the theory of invariants. His activity moved from algebraic forms to algebraic number theory, foundations of geometry, analysis (including calculus of variations and integral equations), theoretical physics, and, finally, to the foundations of mathematics. The invention of the space that bears Hilbert's name grew from his work in the field of integral equations.
The treatise Der Zahlbericht was begun in 1893 in partnership with Minkowski. But Minkowski abandoned the project, and Hilbert reshaped the information of algebraic number theory into a masterwork of mathematical literature; for 50 years, Der Zahlbericht was the sacred canon of algebraic number theory. Hilbert also wrote Grundlagen der Geometrie, a text published in 1899 that reached its ninth edition in 1962.
In 1925, Hilbert contracted pernicious anemia, and although he recovered from this illness, he did not resume his full scientific activity. Hilbert died in Göttingen, Germany, on February 14, 1943.
9 □ Hilbert Spaces and the Classical Banach Spaces
The theory of normed spaces applies ideas from linear algebra, geometry, and topology to problems of analysis. In this chapter we will study in detail the most important examples of normed spaces, namely, Hilbert spaces and the classical Banach spaces. These spaces, which are natural generalizations of Euclidean $n$-space, $\mathcal{R}^n$, and unitary $n$-space, $\mathcal{C}^n$, are ubiquitous in analysis. The examples we study in this chapter also serve to motivate some general theorems that appear in Chapter 10.
Section 9.1 discusses preliminaries on normed spaces; Sections 9.2 and 9.3 consider Hilbert spaces and bases and duality of Hilbert spaces; Section 9.4 examines $\mathcal{L}^p$ spaces; and Sections 9.5 and 9.6 investigate nonnegative linear functionals on $C(\Omega)$ and the dual spaces of $C(\Omega)$ and $C_0(\Omega)$.
9.1 PRELIMINARIES ON NORMED SPACES
In this section, we study some elementary properties of normed spaces. Specifically, we examine the relationship between continuity and linearity for mappings of a normed space. We also present a criterion for a normed space to be complete.
In calculus, the following properties of derivative and integral are used so often that their fundamental importance is indisputable:
$$(\alpha f + \beta g)'(t) = \alpha f'(t) + \beta g'(t)$$
and
$$\int_a^b (\alpha f + \beta g)(t)\,dt = \alpha\int_a^b f(t)\,dt + \beta\int_a^b g(t)\,dt.$$
These formulas show that differentiation and integration are linear mappings on appropriate spaces of functions.

DEFINITION 9.1 Linear Mappings, Operators, and Functionals
Let $\Omega$ and $\Lambda$ be linear spaces with the same scalar field. A function $L\colon \Omega \to \Lambda$ is said to be a linear mapping if for all $x, y \in \Omega$ and all scalars $\alpha$ the following two conditions are satisfied:
a) $L(x + y) = L(x) + L(y)$.
b) $L(\alpha x) = \alpha L(x)$.
Linear mappings are also referred to as linear operators or linear transformations; and in cases where $\Lambda$ is the scalar field, linear mappings are usually called linear functionals.

It follows easily from Definition 9.1 that a linear mapping $L$ takes the linear combination $\sum_{j=1}^n \alpha_j x_j$ to the linear combination $\sum_{j=1}^n \alpha_j L(x_j)$; that is,
$$L\Big(\sum_{j=1}^n \alpha_j x_j\Big) = \sum_{j=1}^n \alpha_j L(x_j)$$
for each $n \in \mathcal{N}$, for all $x_1, x_2, \ldots, x_n \in \Omega$ and scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$.

EXAMPLE 9.1 Illustrates Definition 9.1
a) Let $C_1([0,1])$ denote the collection of complex-valued functions on $[0,1]$ having everywhere defined and continuous derivatives. Then the function $D\colon C_1([0,1]) \to C([0,1])$ defined by $D(f) = f'$ is a linear mapping.
b) The function $J\colon C([0,1]) \to C([0,1])$ defined by $J(f)(x) = \int_0^x f(t)\,dt$ is a linear operator.
c) The function $\ell\colon C([0,1]) \to \mathcal{C}$ defined by $\ell(f) = \int_0^1 f(t)\,dt$ is a linear functional.
d) Let $A$ be an $m \times n$ real matrix. Then the function $T\colon \mathcal{R}^m \to \mathcal{R}^n$ defined by $T(x) = xA$ is a linear mapping. Here $xA$ denotes the product of $x$ with $A$ as matrices, where $x$ is considered a $1 \times m$ matrix. These mappings are the classical linear transformations studied in linear algebra. Note that if $m = n$, then $T$ is a linear operator. □

The next proposition, whose proof is left to the reader as Exercise 9.1, considers the relationship between continuity and linearity of mappings of normed spaces. In the statement of the proposition, as often elsewhere in the text, we use the symbol $\|\ \|$ as a generic norm.

PROPOSITION 9.1
Let $\Omega$ and $\Lambda$ be normed spaces with the same scalar field and $L\colon \Omega \to \Lambda$ a linear mapping. Then the following are equivalent:
a) $L$ is continuous.
b) $L$ is continuous at some point of $\Omega$.
c) $L$ is continuous at $0$.
d) $\sup\{\, \|L(x)\| : \|x\| \le 1 \,\} < \infty$.
e) There is a constant $c$ such that $\|L(x)\| \le c\|x\|$ for all $x \in \Omega$.

Part (d) of Proposition 9.1 motivates the definition of a bounded linear mapping, as given in Definition 9.2.

DEFINITION 9.2 Bounded Linear Mapping
Suppose that $\Omega$ and $\Lambda$ are normed spaces with the same scalar field and that $L\colon \Omega \to \Lambda$ is a linear mapping. If
$$|||L||| = \sup\{\, \|L(x)\| : \|x\| \le 1 \,\} < \infty,$$
then $L$ is said to be a bounded linear mapping.

Proposition 9.1 shows that a linear mapping is bounded if and only if it is continuous. Note that if $L$ is a bounded linear mapping on $\Omega$, then we have $\|L(x)\| \le |||L|||\,\|x\|$ for all $x \in \Omega$.

EXAMPLE 9.2 Illustrates Definition 9.2
a) Let $\Omega$ be a normed space and $I\colon \Omega \to \Omega$ be the identity function, that is, $I(x) = x$ for all $x \in \Omega$. Then $I$ is a bounded linear operator and we have $|||I||| = 1$; $I$ is called the identity operator on $\Omega$.
b) The linear operator $J$ defined in Example 9.1(b) is bounded and, in fact, it is easy to show that $|||J||| = 1$.
c) The linear functional $\ell$ defined in Example 9.1(c) is also bounded and, again, it is easy to show that $|||\ell||| = 1$.
d) The linear mapping $D$ defined in Example 9.1(a) is not bounded if $C_1([0,1])$ is given the norm $\|\ \|_{[0,1]}$. To see this, consider the sequence of functions defined by $s_n(x) = \sin n\pi x$. Clearly, $\|s_n\|_{[0,1]} = 1$. However, as $\|D(s_n)\|_{[0,1]} = n\pi$, it follows that $|||D||| = \infty$. □

When $\Omega$ and $\Lambda$ are normed spaces with the same scalar field, the collection of all bounded linear operators from $\Omega$ to $\Lambda$ is denoted by $B(\Omega, \Lambda)$. If we define addition and scalar multiplication in $B(\Omega, \Lambda)$ by $(L_1 + L_2)(x) = L_1(x) + L_2(x)$ and $(\alpha L_1)(x) = \alpha L_1(x)$, then $B(\Omega, \Lambda)$ becomes a linear space. Furthermore, as the reader is asked to show in Exercise 9.3, $|||\ |||$ defines a norm on $B(\Omega, \Lambda)$. From now on, unless specified otherwise, we will abbreviate the normed space $(B(\Omega, \Lambda), |||\ |||)$ by $B(\Omega, \Lambda)$. When $\Omega = \Lambda$, we usually denote $B(\Omega, \Lambda)$ by $B(\Omega)$; and when $\Lambda$ is the scalar field, $B(\Omega, \Lambda)$ is denoted by $\Omega^*$ and the norm $|||\ |||$ by $\|\ \|^*$. This latter space has a special name.

DEFINITION 9.3 Dual Space
Let $\Omega$ be a normed space. Then the space $(\Omega^*, \|\ \|^*)$ of bounded linear functionals on $\Omega$ is called the dual space of $\Omega$.

The following proposition, whose proof is left to the reader as Exercise 9.6, provides a sufficient condition for the completeness of $B(\Omega, \Lambda)$.

PROPOSITION 9.2
Let $\Omega$ and $\Lambda$ be normed spaces. If $\Lambda$ is complete, then so is $B(\Omega, \Lambda)$. In particular, the dual space $(\Omega^*, \|\ \|^*)$ is complete.

We will discover that in many notable cases it is possible to find a concrete description of the dual of a normed space. For example, we will prove later that $\ell \in C([0,1])^*$ if and only if there is a unique complex Borel measure $\mu$ on $[0,1]$ such that $\ell(f) = \int f\,d\mu$ for all $f \in C([0,1])$.
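The unboundedness of $D$ in Example 9.2(d) can be observed numerically. The following Python sketch is only an illustration: approximating the sup norm $\|\ \|_{[0,1]}$ by sampling a uniform grid is an assumption of the sketch, not part of the text, and the helper names `sup_norm_on_grid` and `ratio_for` are hypothetical. For $s_n(x) = \sin n\pi x$ it computes the ratio $\|D(s_n)\|_{[0,1]}/\|s_n\|_{[0,1]}$, which should be close to $n\pi$.

```python
import math

def sup_norm_on_grid(f, points=10001):
    """Approximate the sup norm of f on [0, 1] by sampling a uniform grid (an assumption)."""
    return max(abs(f(i / (points - 1))) for i in range(points))

def ratio_for(n):
    """Return ||D(s_n)|| / ||s_n|| for s_n(x) = sin(n*pi*x), where D(f) = f'."""
    def s_n(x):
        return math.sin(n * math.pi * x)

    def ds_n(x):
        # exact derivative of s_n
        return n * math.pi * math.cos(n * math.pi * x)

    return sup_norm_on_grid(ds_n) / sup_norm_on_grid(s_n)

for n in (1, 2, 5, 10, 50):
    print(n, ratio_for(n), n * math.pi)
```

Since the ratio grows like $n\pi$, no constant $c$ can satisfy $\|D(f)\| \le c\|f\|$ for all $f \in C_1([0,1])$, in line with Proposition 9.1(e).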
Banach Spaces

For normed spaces, completeness is a property of such consequence that those possessing it are called Banach spaces, after the noted mathematician Stefan Banach. (See the biography at the beginning of Chapter 10 for more on Banach.)

DEFINITION 9.4 Banach Space
A complete normed space is called a Banach space.

EXAMPLE 9.3 Illustrates Definition 9.4
a) Exercises 7.59 and 7.60 on page 438 show that $\mathcal{R}^n$ and $\mathcal{C}^n$ are Banach spaces.
b) By Proposition 9.2, $B(\Omega, \Lambda)$ is a Banach space whenever $\Lambda$ is; in particular, $\Omega^*$ is always a Banach space.
c) If $\Omega$ is a compact topological space, then $C(\Omega)$ is a Banach space.
d) If $\Omega$ is locally compact but not compact, then Exercise 7.173(c) on page 491 shows that $C_b(\Omega)$ and $C_0(\Omega)$ are Banach spaces.
e) If $(\Omega, \mathcal{A}, \mu)$ is a measure space, then $\mathcal{L}^\infty(\mu)$ is a Banach space. □

Our next proposition characterizes completeness in normed spaces in terms of infinite series. First let us recall some concepts from Chapter 7. If $\{x_n\}_{n=1}^\infty$ is a sequence of elements in a normed space $\Omega$, then the expression $\sum_{n=1}^\infty x_n$ is called an infinite series. The sequence $\{s_n\}_{n=1}^\infty$ of elements of $\Omega$ defined by $s_n = \sum_{k=1}^n x_k$ is called the associated sequence of partial sums. We say the infinite series converges if the sequence of partial sums converges, that is, if $\lim_{n\to\infty} s_n$ exists.

Closely related to the concept of convergence of series is the concept of absolute convergence of series. If $\{x_n\}_{n=1}^\infty$ is a sequence of elements in a normed space $\Omega$, then the infinite series $\sum_{n=1}^\infty x_n$ is said to be absolutely convergent or to converge absolutely if $\sum_{n=1}^\infty \|x_n\| < \infty$.

In the normed space $\mathcal{R}$, a series of nonnegative terms converges if and only if it converges absolutely. On the other hand, the series $\sum_{n=1}^\infty (-1)^n/n$ converges but does not converge absolutely. We learned in calculus that every absolutely convergent series of real numbers converges. Proposition 9.3 shows that this property characterizes Banach spaces.
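The gap between convergence and absolute convergence in $\mathcal{R}$ can be seen numerically. The Python sketch below (the helper names are hypothetical) computes partial sums of $\sum (-1)^n/n$ and of $\sum 1/n$. The first approach $-\ln 2$, a known value of this alternating series, with error at most $1/(N+1)$; the second grow without bound, roughly like $\ln N$.

```python
import math

def partial_sum(term, N):
    """Return s_N = term(1) + ... + term(N), summed with math.fsum for accuracy."""
    return math.fsum(term(n) for n in range(1, N + 1))

def alternating(n):
    return (-1) ** n / n    # terms of the series sum (-1)^n / n

def absolute(n):
    return 1 / n            # their absolute values

for N in (10, 1000, 100000):
    print(N, partial_sum(alternating, N), partial_sum(absolute, N), -math.log(2))
```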
PROPOSITION 9.3
A normed space $\Omega$ is a Banach space if and only if every absolutely convergent series in $\Omega$ converges.

PROOF: Suppose that $\Omega$ is a Banach space and let $\sum_{n=1}^\infty x_n$ be an absolutely convergent series. Since the sequence of partial sums $s_n = \sum_{k=1}^n x_k$ satisfies $\|s_n - s_m\| \le \sum_{k=m+1}^n \|x_k\|$ for $m < n$, it follows that $\{s_n\}_{n=1}^\infty$ is a Cauchy sequence. Therefore, by completeness, $\lim_{n\to\infty} s_n$ exists.

Conversely, suppose that every absolutely convergent series in $\Omega$ converges. Let $\{y_n\}_{n=1}^\infty$ be a Cauchy sequence. Taking into account Exercise 7.79 on page 446, to prove that $\{y_n\}_{n=1}^\infty$ converges it suffices to show that it has a convergent subsequence. By repeatedly applying the Cauchy property, we obtain a subsequence $\{y_{n_k}\}_{k=1}^\infty$ such that $\|y_{n_{k+1}} - y_{n_k}\| < 2^{-k}$. Let $x_1 = y_{n_1}$ and $x_k = y_{n_k} - y_{n_{k-1}}$ for $k \ge 2$. Then $\|x_k\| < 2^{-(k-1)}$ for $k \ge 2$, so $\sum_{k=1}^\infty x_k$ converges absolutely. Because $y_{n_k} = \sum_{j=1}^k x_j$, it follows that $\lim_{k\to\infty} y_{n_k}$ exists.

EXERCISES 9.1
9.1 Prove Proposition 9.1.
9.2 Let $L \in B(\Omega, \Lambda)$, where $\Omega$ and $\Lambda$ are normed spaces and $\|\ \|$ represents the norm on both spaces. Prove that
$$|||L||| = \sup\{\, \|L(x)\| : \|x\| < 1 \,\} = \sup\{\, \|L(x)\| : \|x\| = 1 \,\}.$$
9.3 Suppose that $\Omega$ and $\Lambda$ are normed spaces. Prove that $|||\ |||$ is a norm on the space $B(\Omega, \Lambda)$.
9.4 Let $g \in C([0,1])$. Consider the linear operator $L_g\colon C([0,1]) \to C([0,1])$ defined by $L_g(f) = gf$. Show that $L_g$ is continuous and find $|||L_g|||$.
9.5 Show that each of the following is a continuous linear functional on $C([0,1])$ and find its norm:
a) $\ell(f) = f(0)$
b) $\ell(f) = \int f(t)\,dt$
c) $\ell(f) = \int_0^1 f(t)h(t)\,dt$, where $h \in \mathcal{L}^1([0,1])$
9.6 Prove Proposition 9.2.
9.7 Let $C_1([0,1])$ be defined as in Example 9.1(a) on page 528.
a) Show that $C_1([0,1])$ is not a closed subspace of $C([0,1])$.
b) Conclude that $C_1([0,1])$ equipped with the norm $\|\ \|_{[0,1]}$ is not a Banach space.
9.8 Show that the space $C_1([0,1])$ defined in Example 9.1(a) becomes a Banach space if it is equipped with the norm $\|f\| = |f(0)| + \|f'\|_{[0,1]}$.
9.9 Refer to Example 7.6 on page 423. Let $\Omega$ be a nonempty set. Show that the spaces $\ell^1(\Omega)$, $\ell^2(\Omega)$, and $\ell^\infty(\Omega)$ are all Banach spaces.
9.10 Prove that there exist discontinuous linear functionals on any infinite dimensional normed space.
9.11 This exercise shows that linear mappings on Euclidean n-space or unitary n-space are automatically continuous.
a) Show that all linear functionals on $\mathcal{C}^n$ or $\mathcal{R}^n$ are continuous.
b) Show that all linear mappings from $\mathcal{C}^n$ or $\mathcal{R}^n$ into a normed space are continuous.
9.12 Let $S$ be a linear subspace of the normed space $\Omega$. Prove that if $S^\circ \ne \emptyset$, then $S = \Omega$.
9.13 Let $\Gamma$ and $\Lambda$ be normed spaces. Define
$$\|(x,y)\|_1 = \|x\| + \|y\|, \qquad \|(x,y)\|_2 = \big(\|x\|^2 + \|y\|^2\big)^{1/2}, \qquad \|(x,y)\|_\infty = \max\{\|x\|, \|y\|\}.$$
a) Prove that each of the three expressions defines a norm on the Cartesian product space $\Gamma \times \Lambda$.
b) Prove that all three norms are equivalent.
9.14 Let $\|\ \|_1$ be the norm on $C([0,1])$ defined by $\|f\|_1 = \int_0^1 |f(t)|\,dt$.
a) Show that $\|f\|_1 \le \|f\|_{[0,1]}$.
b) Are $\|\ \|_1$ and $\|\ \|_{[0,1]}$ equivalent?

9.2 HILBERT SPACES

Perhaps because they are such natural generalizations of the standard Euclidean space $(\mathcal{R}^n, \|\ \|_2)$, Hilbert spaces appear more frequently in mathematics than other Banach spaces. In addition to being intrinsically important, the theory of Hilbert spaces also merits an extensive discussion because it serves as a model for the general theory of Banach spaces. In this section, we begin our treatment of Hilbert space theory.
DEFINITION 9.5 Inner Product, Inner Product Space
Let $X$ be a linear space with scalar field $F$ either $\mathcal{R}$ or $\mathcal{C}$. An inner product on $X$ is a function $\langle\ ,\ \rangle\colon X \times X \to F$ satisfying the following conditions for all $x, y, z \in X$ and $\alpha, \beta \in F$:
a) $\langle \alpha x + \beta y, z\rangle = \alpha\langle x, z\rangle + \beta\langle y, z\rangle$.
b) $\langle x, y\rangle = \langle y, x\rangle$ if $F = \mathcal{R}$, or $\langle x, y\rangle = \overline{\langle y, x\rangle}$ if $F = \mathcal{C}$.
c) $\langle x, x\rangle \ge 0$.
d) $\langle x, x\rangle = 0$ if and only if $x = 0$.
If $\langle\ ,\ \rangle$ is an inner product on $X$, then the pair $(X, \langle\ ,\ \rangle)$ is called an inner product space.

Note: When it is clear from context which inner product is being considered, the inner product space $(X, \langle\ ,\ \rangle)$ will be indicated simply by $X$. And, although we usually denote an inner product by $\langle\ ,\ \rangle$, it is sometimes convenient to have slight variations of this notation such as $\langle\ ,\ \rangle_2$ or $[\ ,\ ]$.

EXAMPLE 9.4 Illustrates Definition 9.5
a) $\mathcal{C}^n$ is an inner product space if we define
$$\langle z, w\rangle = \sum_{k=1}^n z_k\overline{w_k},$$
where $z = (z_1, \ldots, z_n)$ and $w = (w_1, \ldots, w_n)$.
b) $\mathcal{R}^n$ is an inner product space if we define
$$\langle x, y\rangle = \sum_{k=1}^n x_k y_k,$$
where $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$. This inner product is the classical "dot product" encountered in vector-calculus courses.
When we consider $\mathcal{C}^n$ or $\mathcal{R}^n$ as an inner product space, we will assume that the inner product is as in this example unless we state otherwise. □

THEOREM 9.1
Let $X$ be an inner product space. Then, for all $x, y \in X$,
a) $\langle x + y, x + y\rangle = \langle x, x\rangle + 2\,\Re\langle x, y\rangle + \langle y, y\rangle$.
b) $|\langle x, y\rangle|^2 \le \langle x, x\rangle\langle y, y\rangle$. (Cauchy's inequality)
Moreover, if $y \ne 0$, then equality holds in (b) if and only if $x = \alpha y$ for some scalar $\alpha$.
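PROOF: a) From Definition 9.5, we have
$$\langle x + y, x + y\rangle = \langle x, x + y\rangle + \langle y, x + y\rangle = \langle x, x\rangle + \langle x, y\rangle + \overline{\langle x, y\rangle} + \langle y, y\rangle = \langle x, x\rangle + 2\,\Re\langle x, y\rangle + \langle y, y\rangle, \tag{9.1}$$
as required.

b) If in (9.1) we replace $y$ by $-ty$, where $t$ is a real scalar, then we obtain the polynomial
$$p(t) = \langle x - ty, x - ty\rangle = \gamma + \beta t + \alpha t^2,$$
where $\alpha = \langle y, y\rangle$, $\beta = -2\,\Re\langle x, y\rangle$, and $\gamma = \langle x, x\rangle$. By Definition 9.5(c), we have $p(t) \ge 0$. We may assume $y \ne 0$ (otherwise both sides of (b) are $0$), so that $\alpha > 0$ and $p$ is a quadratic polynomial. It follows that $p(t)$ has at most one real root. Thus, $\beta^2 - 4\alpha\gamma \le 0$, that is,
$$\big(\Re\langle x, y\rangle\big)^2 \le \langle x, x\rangle\langle y, y\rangle. \tag{9.2}$$
The proof of (b) is now complete in the case of real scalars. If the scalar field is $\mathcal{C}$, we choose $\theta \in [0, 2\pi)$ so that $e^{i\theta}\langle x, y\rangle = |\langle x, y\rangle|$ and use Definition 9.5 and (9.2) to obtain
$$|\langle x, y\rangle|^2 = \big(\Re\langle e^{i\theta}x, y\rangle\big)^2 \le \langle e^{i\theta}x, e^{i\theta}x\rangle\langle y, y\rangle = e^{i\theta}e^{-i\theta}\langle x, x\rangle\langle y, y\rangle = \langle x, x\rangle\langle y, y\rangle.$$
Therefore, (b) holds in any case.

Suppose now that the scalar field is $\mathcal{R}$, $y \ne 0$, and that equality holds in (b). Then the polynomial $p(t)$ has a root at $t = -\beta/(2\alpha)$. It follows from Definition 9.5(d) that $x = -(\beta/(2\alpha))y$. If the scalar field is $\mathcal{C}$, we choose $\theta$ as in the preceding. Then equality in (b) yields $e^{i\theta}x = -(\beta/(2\alpha))y$ by an argument similar to that used in the real case.

We have referred to the inequality in part (b) of Theorem 9.1 as Cauchy's inequality. But it is also known as the Schwarz, Cauchy-Schwarz, Bunyakovski, or Cauchy-Bunyakovski-Schwarz (CBS) inequality.

EXAMPLE 9.5 Illustrates Definition 9.5 and Theorem 9.1
Suppose $z_1, z_2, \ldots, z_n, w_1, w_2, \ldots, w_n \in \mathcal{C}$. Then it follows from Theorem 9.1 and Example 9.4 that
$$\Big|\sum_{k=1}^n z_k\overline{w_k}\Big|^2 \le \Big(\sum_{k=1}^n |z_k|^2\Big)\Big(\sum_{k=1}^n |w_k|^2\Big).$$
This is Cauchy's inequality for finite sequences of complex numbers. □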
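Example 9.5 can be spot-checked numerically. The following Python sketch uses the inner product of Example 9.4(a) on $\mathcal{C}^n$; the helper names and the randomly generated test vectors are illustrative assumptions. It computes the gap $\langle z, z\rangle\langle w, w\rangle - |\langle z, w\rangle|^2$, which Theorem 9.1(b) says is nonnegative, and checks that the gap vanishes (up to rounding) in the equality case $z = \alpha w$.

```python
import random

def inner(z, w):
    """Inner product on C^n from Example 9.4(a): sum of z_k * conj(w_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))

def cauchy_gap(z, w):
    """Return <z,z><w,w> - |<z,w>|^2, which Cauchy's inequality says is >= 0."""
    return inner(z, z).real * inner(w, w).real - abs(inner(z, w)) ** 2

rng = random.Random(0)   # fixed seed so the run is reproducible

def random_vector(n):
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

z, w = random_vector(5), random_vector(5)
print(cauchy_gap(z, w))                  # nonnegative

alpha = 2 - 3j
z_parallel = [alpha * b for b in w]      # z = alpha * w: the equality case
print(cauchy_gap(z_parallel, w))         # zero up to rounding error
```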
EXAMPLE 9.6 Illustrates Definition 9.5 and Theorem 9.1
Refer to Example 7.6(b) on page 423. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Recall that $\mathcal{L}^2(\mu)$ consists of all complex-valued $\mathcal{A}$-measurable functions $f$ satisfying $\int_\Omega |f|^2\,d\mu < \infty$. Also recall that we identify functions that are equal $\mu$-a.e. We will show that
$$\langle f, g\rangle = \int_\Omega f\overline{g}\,d\mu \tag{9.3}$$
defines an inner product on $\mathcal{L}^2(\mu)$. Because of properties of Lebesgue integration that we established in Chapter 4, we need only prove that
$$\int_\Omega |f\overline{g}|\,d\mu < \infty. \tag{9.4}$$
But this follows immediately from the simple inequality $2|fg| \le |f|^2 + |g|^2$. From now on, whenever we consider $\mathcal{L}^2(\mu)$ in the context of inner product spaces, we will always use the inner product defined by (9.3). □

EXAMPLE 9.7 Illustrates Definition 9.5 and Theorem 9.1
Let $(\Omega, \mathcal{A}, P)$ be a probability space. By Example 9.6, the function $\langle\ ,\ \rangle$ defined by $\langle X, Y\rangle = \mathcal{E}(XY)$ is an inner product on the space of all real-valued random variables having finite variance where, again, we identify two random variables that are equal with probability one. Note that
$$\operatorname{Cov}(X, Y) = \mathcal{E}\big((X - \mathcal{E}(X))(Y - \mathcal{E}(Y))\big) = \langle X - \mathcal{E}(X), Y - \mathcal{E}(Y)\rangle$$
and, in particular, $\operatorname{Var}(X) = \langle X - \mathcal{E}(X), X - \mathcal{E}(X)\rangle$.

The correlation coefficient of two random variables $X$ and $Y$ having finite variance is defined by $\rho_{X,Y} = \operatorname{Cov}(X, Y)/\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}$. This quantity is used extensively in probability, statistics, and stochastic processes. From Cauchy's inequality, we see that $-1 \le \rho_{X,Y} \le 1$. □

COROLLARY 9.1
Let $X$ be an inner product space. Define $\|\ \|\colon X \to \mathcal{R}$ by $\|x\| = \langle x, x\rangle^{1/2}$.
Then the following hold.
a) The function $\|\ \|$ is a norm on $X$.
b) We have $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$ for all $x, y \in X$.
c) The inner product is continuous with respect to the product topology induced on $X \times X$ by the norm $\|\ \|$.

PROOF: a) Definition 7.9 on page 422 gives the three conditions for being a norm. It is easy to check that $\|\ \|$ satisfies the first two conditions. To verify the third condition, we use Theorem 9.1 to conclude that
$$\|x + y\|^2 = \|x\|^2 + 2\,\Re\langle x, y\rangle + \|y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$
This gives the required result.
b) Applying Theorem 9.1 again, we obtain that $\|x + y\|^2 = \|x\|^2 + 2\,\Re\langle x, y\rangle + \|y\|^2$ and, replacing $y$ by $-y$ in the previous equation, we get $\|x - y\|^2 = \|x\|^2 - 2\,\Re\langle x, y\rangle + \|y\|^2$. Adding corresponding sides of the two preceding equalities yields (b).
c) We leave the proof of part (c) to the reader as Exercise 9.15.

In the future, we will assume that every inner product space is also a normed space, equipped with the norm defined in Corollary 9.1. If an inner product space is complete, it is called a Hilbert space in honor of the mathematician David Hilbert. (See the biography at the beginning of this chapter for more about Hilbert.)

DEFINITION 9.6 Hilbert Space
An inner product space that is complete with respect to its norm is called a Hilbert space.

We already know that $\mathcal{R}^n$ and $\mathcal{C}^n$ are Hilbert spaces. Later in this chapter we will prove that all spaces of the form $\mathcal{L}^2(\mu)$ are Hilbert spaces. But for now we will content ourselves with knowing that $\mathcal{L}^2(\mu)$-type spaces are inner product spaces, as we showed in Example 9.6.
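The identity in Corollary 9.1(b), the parallelogram law, is a test that a norm can fail (compare Exercise 9.16). The Python sketch below is an illustration only, with hypothetical helper names. It evaluates the defect $\|x + y\|^2 + \|x - y\|^2 - 2\|x\|^2 - 2\|y\|^2$ for the norm induced by the dot product of Example 9.4(b) and for the max norm $\max_k |x_k|$, which is a norm on $\mathcal{R}^n$ but, as the computation shows, does not satisfy the identity.

```python
import math

def norm2(x):
    """Norm induced by the dot product on R^n: ||x|| = <x, x>^(1/2)."""
    return math.sqrt(sum(t * t for t in x))

def norm_max(x):
    """The max norm on R^n."""
    return max(abs(t) for t in x)

def parallelogram_defect(norm, x, y):
    """Return ||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 for the given norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

x, y = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_defect(norm2, x, y))     # 0 up to rounding
print(parallelogram_defect(norm_max, x, y))  # -2.0
```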
Nearest Points

The standard Euclidean plane $(\mathcal{R}^2, \|\ \|_2)$ serves to illustrate an essential property of Hilbert spaces that we will prove in Theorem 9.2. We know that the linear subspaces of $\mathcal{R}^2$ are $\{(0,0)\}$, $\mathcal{R}^2$, and lines passing through $(0,0)$. If $L$ is a line through $(0,0)$ and if $x \in \mathcal{R}^2$, then the point of intersection, $y_0$, of $L$ and the line through $x$ perpendicular to $L$ is the unique point on $L$ that is nearest to $x$. What is important for us is that $y_0$ is completely determined by the conditions $y_0 \in L$ and $\langle x - y_0, y\rangle = 0$ for all $y \in L$, as seen in Fig. 9.1.

This property of the Euclidean plane serves to motivate the following important theorem about Hilbert spaces.

THEOREM 9.2
Let $\mathcal{H}$ be a Hilbert space and $K$ a closed linear subspace of $\mathcal{H}$. For each $x \in \mathcal{H}$ there is a unique point $y_0 \in K$ such that
$$\|x - y_0\| = \rho(x, K),$$
where $\rho(x, K) = \inf\{\, \|x - y\| : y \in K \,\}$. Furthermore, the point $y_0$ is determined by the conditions
$$y_0 \in K \quad\text{and}\quad \langle x - y_0, y\rangle = 0 \text{ for all } y \in K. \tag{9.5}$$
In other words, (9.5) determines the unique nearest point of $K$ to $x$.

PROOF: We establish the theorem when the scalar field is $\mathcal{C}$; the proof for real scalars is obtained by a slight modification. To begin, we select a sequence $\{y_n\}_{n=1}^\infty \subset K$ such that $\lim_{n\to\infty} \|x - y_n\| = \rho(x, K)$. We claim that $\{y_n\}_{n=1}^\infty$ is a Cauchy sequence. Replacing $x$ by $x - y_n$ and $y$ by $x - y_m$ in Corollary 9.1(b), we obtain
$$4\|x - (y_n + y_m)/2\|^2 + \|y_n - y_m\|^2 = 2\|x - y_n\|^2 + 2\|x - y_m\|^2.$$
Since $K$ is a linear subspace, $(y_n + y_m)/2 \in K$, so that $\|x - (y_n + y_m)/2\| \ge \rho(x, K)$. It follows that
$$\|y_n - y_m\|^2 \le 2\|x - y_n\|^2 + 2\|x - y_m\|^2 - 4\rho(x, K)^2. \tag{9.6}$$
Because the right-hand side of (9.6) tends to $0$ as $n, m \to \infty$, we conclude that $\{y_n\}_{n=1}^\infty$ is a Cauchy sequence. By completeness, $y_0 = \lim_{n\to\infty} y_n$ exists and, because $K$ is closed, we have $y_0 \in K$. Moreover,
$$\|x - y_0\| = \lim_{n\to\infty} \|x - y_n\| = \rho(x, K).$$

To verify (9.5), it suffices to consider the case where $y \in K \setminus \{0\}$. Suppose that $y_0$ is a point of $K$ nearest to $x$. By Theorem 9.1(a), we have
$$\|x - y_0 - \alpha y\|^2 = \|x - y_0\|^2 - 2\,\Re\big(\overline{\alpha}\langle x - y_0, y\rangle\big) + |\alpha|^2\|y\|^2$$
for all scalars $\alpha$. Choosing $\alpha = \langle x - y_0, y\rangle/\|y\|^2$, we obtain
$$\|x - y_0 - \alpha y\|^2 = \|x - y_0\|^2 - |\langle x - y_0, y\rangle|^2/\|y\|^2.$$
Because $K$ is a linear subspace, it follows that $y_0 + \alpha y \in K$. Hence,
$$\|x - y_0\|^2 = \rho(x, K)^2 \le \|x - (y_0 + \alpha y)\|^2 = \|x - y_0\|^2 - |\langle x - y_0, y\rangle|^2/\|y\|^2$$
and, consequently, $\langle x - y_0, y\rangle = 0$.
Suppose, on the other hand, that $y_0$ is an element of $K$ that satisfies (9.5). Then, for every $y \in K$,
$$\|x - y\|^2 = \|x - y_0 + y_0 - y\|^2 = \|x - y_0\|^2 + 2\,\Re\langle x - y_0, y_0 - y\rangle + \|y_0 - y\|^2 = \|x - y_0\|^2 + \|y_0 - y\|^2 \ge \|x - y_0\|^2. \tag{9.7}$$
Thus, $y_0$ is a point of $K$ nearest to $x$.

It remains to prove that $y_0$ is unique. Let $y_1$ be a point of $K$ nearest to $x$. Then, by (9.7),
$$\|x - y_0\|^2 = \|x - y_1\|^2 = \|x - y_0\|^2 + \|y_0 - y_1\|^2$$
and, therefore, $\|y_0 - y_1\|^2 = 0$. It follows that $y_0 = y_1$.

EXAMPLE 9.8 Illustrates Theorem 9.2
a) Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be $n$ points in the plane. In statistics and other fields, it is important to find the straight line that best fits the $n$ points in the sense of minimizing the sum of squared errors. That is, the problem is to find real numbers $\alpha$ and $\beta$ that minimize
$$\sum_{j=1}^n \big(y_j - (\alpha + \beta x_j)\big)^2.$$
The resulting line is called the least-squares line or regression line. We can use Theorem 9.2 to obtain the regression line as follows. Let $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$, $w = (1, 1, \ldots, 1)$, and $K = \{\, aw + bx : a, b \in \mathcal{R} \,\}$. Finding the regression line is equivalent to obtaining the element $y_0$ of $K$ nearest to $y$. Writing $y_0 = \alpha w + \beta x$, we apply (9.5) to get the equations
$$\langle \alpha w + \beta x, w\rangle = \langle y, w\rangle \quad\text{and}\quad \langle \alpha w + \beta x, x\rangle = \langle y, x\rangle$$
or, equivalently,
$$n\alpha + \beta\sum_{j=1}^n x_j = \sum_{j=1}^n y_j \quad\text{and}\quad \alpha\sum_{j=1}^n x_j + \beta\sum_{j=1}^n x_j^2 = \sum_{j=1}^n x_j y_j.$$
We thus have two linear equations in the two unknowns $\alpha$ and $\beta$. The solution, which we leave to the reader, gives the slope and $y$-intercept of the regression line.
b) Let $\mu$ be the measure on $[-1, 1]$ defined by $\mu(E) = \lambda(E)/2$. The quantity
$$\Big(\int_{[-1,1]} |f - g|^2\,d\mu\Big)^{1/2} = \Big(\frac{1}{2}\int_{-1}^1 |f(x) - g(x)|^2\,dx\Big)^{1/2}$$
can be thought of as the average distance between $f$ and $g$. We will use Theorem 9.2 to find the function of the form $g(x) = \alpha x + \beta$ that minimizes the average distance to $f(x) = x^2$. The function $g$ must satisfy
$$\int_{-1}^1 (x^2 - \alpha x - \beta)(\gamma x + \delta)\,dx = 0$$
for all $\gamma, \delta \in \mathcal{C}$. A calculation shows that $2(\delta - \alpha\gamma)/3 - 2\beta\delta = 0$ for all $\gamma$ and $\delta$. It follows that $\alpha = 0$ and $\beta = 1/3$. Thus, the best approximation to $x^2$ of the form $\alpha x + \beta$ in the sense of the $\mathcal{L}^2(\mu)$-norm is the constant function $g(x) = 1/3$.

c) Refer to Example 9.7. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $X$ a random variable having finite variance. We will use Theorem 9.2 to determine the constant $c$ that minimizes $\mathcal{E}((X - c)^2)$. Applying (9.5) to the subspace generated by the random variable $1$, we obtain the equation $\mathcal{E}((X - c)1) = 0$. Thus $c = \mathcal{E}(X)$ minimizes $\mathcal{E}((X - c)^2)$, and we see that the minimum value is $\operatorname{Var}(X)$. □

A close reading of the proof of Theorem 9.2 reveals that more than just that theorem has been established. We did not fully use the assumption that $\mathcal{H}$ is complete; rather, we only needed the completeness of the linear subspace $K$. The assumption that $K$ is a linear subspace of $\mathcal{H}$ can also be relaxed. Recall that a subset $S$ of a linear space is said to be a convex set if for all $x, y \in S$ and $0 \le \alpha \le 1$, we have $\alpha x + (1 - \alpha)y \in S$; in words, whenever $S$ contains two points, it also contains the entire line segment connecting the two points. If $C$ is a closed convex subset, but not necessarily a linear subspace, of a Hilbert space $\mathcal{H}$, then we can still obtain a unique nearest point. However, (9.5) is in general no longer valid. (See Exercise 9.22.)

Theorem 9.2 enables us to associate with each closed linear subspace $K$ of a Hilbert space $\mathcal{H}$ the function $P_K\colon \mathcal{H} \to \mathcal{H}$, where $P_K(x)$ is the point of $K$ nearest to $x$.
The properties of the function $P_K$ are explored in Exercise 9.26 where, in particular, it is shown that $P_K$ is a bounded linear operator on $\mathcal{H}$ having range $K$. The operator $P_K$ is often referred to as the orthogonal projection of $\mathcal{H}$ onto $K$.
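For a finite-dimensional subspace $K = \operatorname{span}\{b_1, \ldots, b_m\}$, condition (9.5) turns the computation of the nearest point $\sum_j c_j b_j$ into the linear system $\sum_j c_j\langle b_j, b_i\rangle = \langle x, b_i\rangle$, $i = 1, \ldots, m$, exactly as in Example 9.8(a). The Python sketch below assumes real scalars and linearly independent $b_j$ and uses exact `Fraction` arithmetic; the helper names `solve`, `best_coefficients`, and `poly_inner` and the four data points are hypothetical. It recomputes a regression line as in Example 9.8(a) and the best approximation $\alpha x + \beta$ to $x^2$ from Example 9.8(b), using $\int x^k\,d\mu = 1/(k+1)$ for even $k$ and $0$ for odd $k$ when $\mu = \lambda/2$ on $[-1, 1]$.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a c = b by Gauss-Jordan elimination (exact for Fractions).

    Assumes the matrix is invertible, as the Gram matrix of independent vectors is.
    """
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def best_coefficients(basis, target, inner):
    """Coefficients c_j of the point sum_j c_j b_j of span(basis) nearest to target.

    Imposes condition (9.5): <target - sum_j c_j b_j, b_i> = 0 for every b_i.
    """
    gram = [[inner(bj, bi) for bj in basis] for bi in basis]
    rhs = [inner(target, bi) for bi in basis]
    return solve(gram, rhs)

# Example 9.8(a): regression line for hypothetical data points (x_j, y_j).
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

xs = [Fraction(t) for t in (0, 1, 2, 3)]
ys = [Fraction(t) for t in (1, 2, 2, 4)]
w = [Fraction(1)] * len(xs)
alpha, beta = best_coefficients([w, xs], ys, dot)   # y0 = alpha*w + beta*x
print(alpha, beta)                                   # 9/10 9/10

# Example 9.8(b): polynomials as coefficient lists [c0, c1, ...], mu = Lebesgue/2 on [-1, 1].
def moment(k):
    return Fraction(1, k + 1) if k % 2 == 0 else Fraction(0)

def poly_inner(p, q):
    return sum(a * b * moment(i + j) for i, a in enumerate(p) for j, b in enumerate(q))

one = [Fraction(1)]
x_poly = [Fraction(0), Fraction(1)]
x_squared = [Fraction(0), Fraction(0), Fraction(1)]
beta_b, alpha_b = best_coefficients([one, x_poly], x_squared, poly_inner)
print(alpha_b, beta_b)                               # 0 1/3, that is, g(x) = 1/3
```

Exact rational arithmetic is used so that the printed coefficients can be compared directly with the values $\alpha = 0$, $\beta = 1/3$ found in Example 9.8(b); for large or ill-conditioned problems a floating-point least-squares routine would be the usual choice instead.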
Orthogonality

Recall that the ordinary dot product on $\mathcal{R}^2$ satisfies $\langle x, y\rangle = \|x\|\|y\|\cos\theta$, where $\theta$ is the angle between $x$ and $y$. Thus, two nonzero vectors in $\mathcal{R}^2$ are perpendicular if and only if their dot product is $0$. Similarly, the condition $\langle x, y\rangle = 0$ captures the notion of perpendicularity of two elements of a general inner product space $X$. The term used for "perpendicular" in the context of inner product spaces is orthogonal.

DEFINITION 9.7 Orthogonality
Let $X$ be an inner product space. Two elements $x$ and $y$ of $X$ are said to be orthogonal if $\langle x, y\rangle = 0$. For a subset $S$ of $X$, we define the orthogonal complement of $S$, denoted $S^\perp$, to be the set of all elements of $X$ that are orthogonal to every element of $S$, that is,
$$S^\perp = \{\, y \in X : \langle x, y\rangle = 0 \text{ for all } x \in S \,\}.$$

EXAMPLE 9.9 Illustrates Definition 9.7
a) The elements $(1,0)$ and $(0,1)$ of $\mathcal{R}^2$ are orthogonal and the orthogonal complement of $\{(1,0)\}$ is $\{\, (0, y) : y \in \mathcal{R} \,\}$.
b) Recall that two random variables $X$ and $Y$ having finite variance are said to be uncorrelated if $\operatorname{Cov}(X, Y) = 0$. We see from Example 9.7 that two random variables are uncorrelated if and only if $X - \mathcal{E}(X)$ and $Y - \mathcal{E}(Y)$ are orthogonal. □

It is left to the reader as Exercise 9.23 to prove that $S^\perp$ is always a closed linear subspace. Moreover, it can be shown that in Hilbert spaces, $(S^\perp)^\perp = \overline{\operatorname{span} S}$, as the reader is asked to verify in Exercise 9.25. Here we are using $\operatorname{span} S$ to represent the span of $S$, that is, the linear subspace of all finite linear combinations of elements of $S$, and the bar denotes closure.

Our next result is a version of Theorem 9.2 that emphasizes the role of the orthogonal complement. It also serves as the prototype for an important theorem in the general theory of normed spaces that appears in Chapter 10.

THEOREM 9.3
Let $K$ be a proper closed linear subspace of the Hilbert space $\mathcal{H}$ and $x \in K^c$.
Then there exists a unique $z_0 \in K^\perp$ such that $\|z_0\| = 1$ and
$$\rho(x, K) = \inf\{\, \|x - y\| : y \in K \,\} = \sup\{\, |\langle x, z\rangle| : z \in K^\perp \text{ and } \|z\| \le 1 \,\} = \langle x, z_0\rangle. \tag{9.8}$$
PROOF: Let $y_0$ be the nearest point of $K$ to $x$. If $z \in K^\perp$ is such that $\|z\| \le 1$, then, by the definition of $K^\perp$ and Theorem 9.1, we have
$$|\langle x, z\rangle| = |\langle x - y_0, z\rangle| \le \|x - y_0\|\|z\| \le \inf\{\, \|x - y\| : y \in K \,\}. \tag{9.9}$$
It follows that $\inf\{\, \|x - y\| : y \in K \,\} \ge \sup\{\, |\langle x, z\rangle| : z \in K^\perp \text{ and } \|z\| \le 1 \,\}$.

Now we let $z_0 = (x - y_0)/\|x - y_0\|$, which is defined because $x \notin K$ implies $x \ne y_0$. By (9.5), $z_0 \in K^\perp$ and, furthermore,
$$\inf\{\, \|x - y\| : y \in K \,\} = \|x - y_0\| = \langle x - y_0, z_0\rangle = \langle x, z_0\rangle \le \sup\{\, |\langle x, z\rangle| : z \in K^\perp \text{ and } \|z\| \le 1 \,\}. \tag{9.10}$$
The equations in (9.8) now follow from (9.9) and (9.10). The uniqueness of $z_0$ is left to the reader as Exercise 9.28.

As a visual aid to understanding Theorem 9.3, we have constructed a simple illustration of the theorem in Fig. 9.2.

EXERCISES 9.2
9.15 Prove part (c) of Corollary 9.1.
9.16 Let $(X, \|\ \|)$ be a normed space with scalar field $\mathcal{R}$.
a) Suppose the norm satisfies the identity in Corollary 9.1(b) on page 537. Show that there is an inner product on $X$ such that $\|\ \|$ is the induced norm.
b) Repeat part (a) in case the scalar field is $\mathcal{C}$.
9.17 A semi inner product on a linear space $X$ is a function $\langle\ ,\ \rangle\colon X \times X \to F$ satisfying conditions (a), (b), and (c) of Definition 9.5 on page 534 and the following weakening of condition (d): $\langle x, x\rangle = 0$ if $x = 0$. Show that (a) and (b) of Theorem 9.1 remain valid for semi inner products.
9.18 Let $X$ be a linear space with inner product $\langle\ ,\ \rangle$ and $L\colon X \to X$ a linear operator. Show that $[x, y] = \langle L(x), L(y)\rangle$ defines a semi inner product on $X$ in the sense of Exercise 9.17.
9.19 Let $\Omega$ be a nonempty set. Prove that $\ell^2(\Omega)$ is a Hilbert space with respect to the inner product given by
$$\langle f, g\rangle = \int_\Omega f\overline{g}\,d\mu,$$
where $\mu$ is counting measure on $\Omega$.
9.20 Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Show that if $f \in \mathcal{L}^2(\mu)$, then there is a sequence of simple functions $\{r_n\}_{n=1}^\infty \subset \mathcal{L}^2(\mu)$ such that, as $n \to \infty$, $\|f - r_n\|_2 \to 0$, $\|r_n\|_2 \to \|f\|_2$, and $r_n \to f$ $\mu$-a.e.
9.21 Let $\{\mathcal{H}_n\}_{n=1}^\infty$ be a sequence of Hilbert spaces and set
$$\mathcal{H} = \Big\{\, \{x_n\}_{n=1}^\infty : x_n \in \mathcal{H}_n \text{ and } \sum_{n=1}^\infty \|x_n\|^2 < \infty \,\Big\}.$$
Denote by $\langle\ ,\ \rangle$ the inner product for each $\mathcal{H}_n$. Show that $\mathcal{H}$ is a Hilbert space with respect to the inner product defined by $[x, y] = \sum_{n=1}^\infty \langle x_n, y_n\rangle$.
9.22 Let $C$ be a closed convex subset of a Hilbert space $\mathcal{H}$. Show that for each $x \in \mathcal{H}$ there is a unique point $y_0 \in C$ such that $\|x - y_0\| = \rho(x, C)$.
9.23 Let $S$ be a subset of an inner product space $X$. Show that $S^\perp$ is a closed linear subspace of $X$.
9.24 Verify the following properties of orthogonal complements:
a) $A \subset B \Rightarrow B^\perp \subset A^\perp$.
b) $A^\perp = (\operatorname{span} A)^\perp$.
c) $A^\perp \cap B^\perp = (A \cup B)^\perp$.
9.25 Prove that in Hilbert spaces, $(A^\perp)^\perp = \overline{\operatorname{span} A}$.
★9.26 Let $K$ be a closed linear subspace of a Hilbert space $\mathcal{H}$ and $P_K$ the associated orthogonal projection. Verify the following properties.
a) $P_K$ is linear.
b) $\|P_K(x)\| \le \|x\|$, so that $P_K$ is continuous.
c) $P_K \circ P_K = P_K$.
d) $P_K^{-1}(\{0\}) = K^\perp$.
e) The range of $P_K$ is $K$.
f) $P_{K^\perp} = I - P_K$, where $I$ is the identity operator on $\mathcal{H}$. (See Exercise 9.25.)
g) Deduce from part (f) that each $x \in \mathcal{H}$ can be written uniquely in the form $x = y + y^\perp$, where $y \in K$ and $y^\perp \in K^\perp$.
9.27 Let $y_0$ be a nonzero element of a Hilbert space $\mathcal{H}$ and set $K = \operatorname{span}\{y_0\}$. Find an explicit formula for $P_K$.
9.28 Verify the uniqueness of $z_0$ in Theorem 9.3.

9.3 BASES AND DUALITY IN HILBERT SPACES

As we know, the concepts of linear independence and basis play an essential role in the theory of finite dimensional linear spaces. In the infinite dimensional case, one can use Zorn's lemma to prove the existence of a Hamel basis (a maximal linearly independent set $B$) and then show that every element of the space can be written uniquely as a finite linear combination of members of $B$.

Hamel bases are of little use in analysis, however, because they generally cannot be obtained by a formula or constructive process. Fortunately, in Hilbert spaces, there is an analogue of Hamel basis that is much better suited to the needs of analysis. It is this notion of basis to which we now turn our attention.

DEFINITION 9.8 Orthogonal Set; Orthonormal Set and Basis
Let $(X, \langle\ ,\ \rangle)$ be an inner product space. A subset $S \subset X$ is said to be an orthogonal set if every two distinct elements of $S$ are orthogonal, that is, $\langle x, y\rangle = 0$ for all $x, y \in S$ with $x \ne y$. An orthogonal set $S$ is said to be an orthonormal set if $\|x\| = 1$ for each $x \in S$. If $S$ is an orthonormal set and is contained in no strictly larger orthonormal set, then $S$ is called an orthonormal basis, or simply a basis.

EXAMPLE 9.10 Illustrates Definition 9.8
a) The set of elements $\{(1,0,0,\ldots,0), (0,1,0,\ldots,0), \ldots, (0,0,0,\ldots,1)\}$ is an orthonormal set in $\mathcal{C}^n$. Clearly, it is also a basis.
b) Let $\Omega$ be a nonempty set. For each $x \in \Omega$, let $\delta_x$ denote the function that is $1$ at $x$ and $0$ at all other points of $\Omega$. Then $\{\, \delta_x : x \in \Omega \,\}$ is an orthonormal set in $\ell^2(\Omega)$. We will see later that it is also an orthonormal basis.
c) For each $n \in \mathcal{Z}$, define $e_n(x) = (2\pi)^{-1/2}e^{inx}$. It is easy to see that the collection of functions $\{\, e_n : n \in \mathcal{Z} \,\}$ is an orthonormal set in $\mathcal{L}^2([0, 2\pi])$. Later we will show that it is an orthonormal basis as well. □

Our next theorem provides some fundamental properties of orthonormal sets.

THEOREM 9.4
Let $X$ be an inner product space and $E = \{e_1, e_2, \ldots, e_n\}$ a finite orthonormal subset of $X$. Then the following hold.
a) $E$ is linearly independent.
b) $\big\|\sum_{j=1}^n \alpha_j e_j\big\|^2 = \sum_{j=1}^n |\alpha_j|^2$ for any choice of scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$.
c) For each $x \in X$, we have $\sum_{j=1}^n |\langle x, e_j\rangle|^2 \le \|x\|^2$.
d) $x = \sum_{j=1}^n \langle x, e_j\rangle e_j$ for each $x \in \operatorname{span} E$.
e) $\operatorname{span} E$ is a complete subspace of $X$; in particular, it is a closed subset of $X$.
f) For each $x \in X$, the element $y_0 = \sum_{j=1}^n \langle x, e_j\rangle e_j$ is the unique nearest point of $\operatorname{span} E$ to $x$, that is, it is the unique member $y$ of $\operatorname{span} E$ satisfying $\|x - y\| = \rho(x, \operatorname{span} E)$.

PROOF: The proofs of (a), (b), and (d) are left to the reader as Exercise 9.30. To prove (c), let $x \in X$ and $y = \sum_{j=1}^n \langle x, e_j\rangle e_j$. By part (b), we have $\|y\|^2 = \sum_{j=1}^n |\langle x, e_j\rangle|^2$. Also,
$$\langle x, y\rangle = \Big\langle x, \sum_{j=1}^n \langle x, e_j\rangle e_j\Big\rangle = \sum_{j=1}^n \overline{\langle x, e_j\rangle}\langle x, e_j\rangle = \sum_{j=1}^n |\langle x, e_j\rangle|^2.$$
Applying Theorem 9.1(a) on page 534, we now obtain that
$$0 \le \|x - y\|^2 = \|x\|^2 - 2\,\Re\langle x, y\rangle + \|y\|^2 = \|x\|^2 - \sum_{j=1}^n |\langle x, e_j\rangle|^2,$$
from which (c) follows immediately.

To prove (e), let $\{y_m\}_{m=1}^\infty$ be a Cauchy sequence in $\operatorname{span} E$. From Cauchy's inequality, we have $|\langle y_m, e_k\rangle - \langle y_\ell, e_k\rangle| \le \|y_m - y_\ell\|$. Thus, $\{\langle y_m, e_k\rangle\}_{m=1}^\infty$ is a Cauchy sequence for $k = 1, 2, \ldots, n$. Applying part (d) and using the completeness of the scalars, we conclude that the limit
$$y = \lim_{m\to\infty} y_m = \sum_{k=1}^n \Big(\lim_{m\to\infty} \langle y_m, e_k\rangle\Big)e_k$$
exists. Clearly, $y \in \operatorname{span} E$. We have now shown that $\operatorname{span} E$ is complete. Since a complete subset of a metric space is closed, it follows that $\operatorname{span} E$ is closed in $X$.

Next we establish (f). By Theorem 9.2 on page 538 (whose proof, as remarked after Example 9.8, uses only the completeness of the subspace, here supplied by (e)) and the defining properties of inner product, it is enough to show that $\langle x - y_0, e_k\rangle = 0$ for $k = 1, 2, \ldots, n$. Using the fact that $E$ is an orthonormal set, we get
$$\langle x - y_0, e_k\rangle = \langle x, e_k\rangle - \sum_{j=1}^n \langle x, e_j\rangle\langle e_j, e_k\rangle = \langle x, e_k\rangle - \langle x, e_k\rangle = 0,$$
as required.

As an immediate consequence of Theorem 9.4(c), we get the following important result, known as Bessel's inequality. Refer to Exercise 2.37 on page 57 for the meaning of the summation that occurs in that inequality.

COROLLARY 9.2 Bessel's Inequality
Let $E$ be an orthonormal subset of an inner product space $X$. Then
$$\sum_{e \in E} |\langle x, e\rangle|^2 \le \|x\|^2$$
for all $x \in X$.

EXAMPLE 9.11 Illustrates Theorem 9.4
In the space $\mathcal{L}^2([0, 2\pi])$, consider the linear subspace $U_n = \operatorname{span}\{\, e_k : -n \le k \le n \,\}$, where $e_k(x) = (2\pi)^{-1/2}e^{ikx}$. From what we noted in Example 9.10(c), $\{\, e_k : -n \le k \le n \,\}$ is an orthonormal set. It is clear that $U_n$ is the space of complex trigonometric polynomials of degree at most $n$. Let $f \in \mathcal{L}^2([0, 2\pi])$. Then, from Theorem 9.4(f), the nearest member of $U_n$ to $f$ is given by
$$s_n = \sum_{|k| \le n} \langle f, e_k\rangle e_k.$$
The number
$$\hat f(k) = (2\pi)^{-1/2}\langle f, e_k\rangle = \frac{1}{2\pi}\int_0^{2\pi} f(x)e^{-ikx}\,dx$$
is called the $k$th Fourier coefficient of $f$. Thus, the best approximation,
$$s_n(x) = \sum_{|k|\le n}\langle f, e_k\rangle e_k(x) = \sum_{k=-n}^{n} \hat f(k)e^{ikx},$$
is the $n$th partial sum of the Fourier series $\sum_{k=-\infty}^{\infty}\hat f(k)e^{ikx}$ associated with the function $f$. □

More examples of orthonormal sets can be found using the procedure described in the proof of the following theorem.

THEOREM 9.5
Let $\{x_m\}_{m=1}^\infty$ be a sequence of elements in an inner product space $X$ and assume $x_1 \ne 0$. Then there is a countable orthonormal set $\{y_1, y_2, \ldots\}$ and a nondecreasing sequence of integers $\{k(m)\}_{m=1}^\infty$ such that $\operatorname{span}\{x_1, x_2, \ldots, x_m\} = \operatorname{span}\{y_1, y_2, \ldots, y_{k(m)}\}$ for each $m \in \mathbb{N}$.

PROOF: We outline an argument by mathematical induction, leaving the details for Exercise 9.31. Let $y_1 = x_1/\|x_1\|$. Proceeding inductively, suppose $y_1, y_2, \ldots, y_{k(m)}$ have been chosen so that $\{y_1, y_2, \ldots, y_{k(m)}\}$ is an orthonormal set and $\operatorname{span}\{x_1, x_2, \ldots, x_m\} = \operatorname{span}\{y_1, y_2, \ldots, y_{k(m)}\}$. Define
$$v = x_{m+1} - \sum_{j=1}^{k(m)} \langle x_{m+1}, y_j\rangle y_j.$$
Then we find that $v$ is orthogonal to $y_j$ for $j = 1, 2, \ldots, k(m)$. If $v = 0$, then $x_{m+1} \in \operatorname{span}\{y_1, y_2, \ldots, y_{k(m)}\}$ and, in this case, we let $k(m+1) = k(m)$. If $v \ne 0$, we let $k(m+1) = k(m) + 1$ and define $y_{k(m+1)} = v/\|v\|$; then $\{y_1, y_2, \ldots, y_{k(m)}, y_{k(m+1)}\}$ is an orthonormal set such that $\operatorname{span}\{x_1, x_2, \ldots, x_{m+1}\} = \operatorname{span}\{y_1, y_2, \ldots, y_{k(m+1)}\}$.

The following theorem provides several equivalent conditions for an orthonormal set in a Hilbert space to be a basis. It also makes clear why bases in the sense of Definition 9.8 are appropriate analogues of Hamel bases. Before stating the theorem, we need to discuss generalized sums in normed spaces. Let $\{x_\iota\}_{\iota\in I}$ be an indexed collection of elements of a normed space. Then we say that the sum $\sum_{\iota\in I} x_\iota$ converges if there are only countably many nonzero terms and if for every enumeration of these terms, the resulting series converges to the same element.
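Returning to Theorem 9.5, its inductive step is the Gram-Schmidt procedure, with the extra bookkeeping that a vector already lying in the current span is skipped and $k(m)$ does not increase. The following is a minimal sketch in the finite-dimensional space $\mathbb{C}^n$, assuming vectors are Python lists and the inner product is $\langle x, y\rangle = \sum_i x_i\overline{y_i}$. The names `inner`, `norm`, and `gram_schmidt` and the tolerance `tol` (used in place of the exact test $v = 0$, because of floating-point rounding) are our own illustrative choices.

```python
import math


def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x).real)


def gram_schmidt(xs, tol=1e-12):
    """Orthonormalize xs following the inductive step in the proof of Theorem 9.5.

    Returns (ys, ks), where ys is an orthonormal list and ks[m] = k(m + 1), so that
    span{x_1, ..., x_{m+1}} = span{y_1, ..., y_{ks[m]}}.
    """
    ys, ks = [], []
    for x in xs:
        # v = x_{m+1} - sum_j <x_{m+1}, y_j> y_j
        coeffs = [inner(x, y) for y in ys]
        v = [complex(xi) for xi in x]
        for c, y in zip(coeffs, ys):
            v = [vi - c * yi for vi, yi in zip(v, y)]
        nv = norm(v)
        if nv > tol:
            # v != 0: normalize it and append, so k(m+1) = k(m) + 1
            ys.append([vi / nv for vi in v])
        # otherwise x already lies in the span and k(m+1) = k(m)
        ks.append(len(ys))
    return ys, ks
```

For example, `gram_schmidt([[1, 0, 0], [2, 0, 0], [1, 1, 0], [0, 0, 1j]])` skips the second vector, which is a multiple of the first, so the returned bookkeeping list is `[1, 1, 2, 3]`.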
THEOREM 9.6
Let $\mathcal{H}$ be a Hilbert space and $E$ an orthonormal subset of $\mathcal{H}$. Then the following are equivalent:
a) $E$ is a basis.
b) $\overline{\operatorname{span} E} = \mathcal{H}$.
c) $\langle x, e\rangle = 0$ for each $e \in E$ implies $x = 0$.
d) For each $x \in \mathcal{H}$, we have $x = \sum_{e\in E}\langle x, e\rangle e$.
e) $\|x\|^2 = \sum_{e\in E}|\langle x, e\rangle|^2$ for each $x \in \mathcal{H}$.

PROOF:
(a) ⇒ (b): If $\overline{\operatorname{span} E} \ne \mathcal{H}$, then by Theorem 9.2 on page 538, we can find a nonzero element $z \in (\overline{\operatorname{span} E})^\perp$. Let $e_0 = z/\|z\|$. We note that $E \cup \{e_0\}$ is orthonormal and properly contains $E$. Thus, $E$ is not a basis.

(b) ⇒ (c): Suppose that $\langle x, e\rangle = 0$ for each $e \in E$. It follows from the properties of an inner product that $\langle x, y\rangle = 0$ for each $y \in \operatorname{span} E$. Using the continuity of the inner product, we conclude that $x$ is orthogonal to every element of $\overline{\operatorname{span} E}$, which by assumption equals $\mathcal{H}$. Therefore, $\langle x, x\rangle = 0$ and, so, $x = 0$.

(c) ⇒ (d): It follows from Bessel's inequality that $\sum_{e\in E}|\langle x, e\rangle|^2 < \infty$. Using that fact and Exercise 2.37(c) on page 57, we conclude that the set $E_0 = \{ e \in E : \langle x, e\rangle \ne 0 \}$ is either countably infinite or finite. We will deal with the former case; the latter one is handled in a similar manner. Let $\{e_j\}_{j=1}^\infty$ be an enumeration of $E_0$ and define $x_n = \sum_{j=1}^n \langle x, e_j\rangle e_j$. If $n < m$, then Theorem 9.4(b) implies that $\|x_n - x_m\|^2 = \sum_{j=n+1}^m |\langle x, e_j\rangle|^2$. It now follows that $\{x_n\}_{n=1}^\infty$ is Cauchy and, therefore, converges to some $y \in \mathcal{H}$. We claim that $y = x$. For each $e \in E$, we have
$$\langle x - y, e\rangle = \langle x, e\rangle - \sum_{j=1}^\infty \langle x, e_j\rangle\langle e_j, e\rangle. \tag{9.11}$$
If $e$ is not in $E_0$, then $\langle x, e\rangle = 0$ and $\langle e_j, e\rangle = 0$ for each $j$. If $e = e_k$ for some $k$, then the right-hand side of (9.11) reduces to $\langle x, e_k\rangle - \langle x, e_k\rangle$. Thus, $x - y$ is orthogonal to each element of $E$. It follows from (c) that $y = x$.

(d) ⇒ (e): It follows from (d) and the continuity of the inner product that
$$\|x\|^2 = \Big\langle x, \sum_{e\in E}\langle x, e\rangle e\Big\rangle = \sum_{e\in E}|\langle x, e\rangle|^2,$$
as required.
(e) ⇒ (a): If $E$ is not a basis, we can find an element $e_0 \in \mathcal{H}$ such that $\|e_0\| = 1$ and $\langle e_0, e\rangle = 0$ for each $e \in E$. Thus,
$$\|e_0\|^2 = 1 \ne 0 = \sum_{e\in E}|\langle e_0, e\rangle|^2.$$
This completes the proof of the theorem.

EXAMPLE 9.12 Illustrates Theorem 9.6
Assume as known that $\mathcal{L}^2([0,2\pi])$ is complete, a fact that will be proved in the next section. We will show that the orthonormal set $\{ e_n : n \in \mathbb{Z} \}$, introduced in Example 9.10(c), is a basis for $\mathcal{L}^2([0,2\pi])$. By Theorem 9.6, it suffices to show that if $f \in \mathcal{L}^2([0,2\pi])$ is such that
$$\int_0^{2\pi} f(x)e^{-inx}\,dx = 0, \qquad n \in \mathbb{Z}, \tag{9.12}$$
then $f = 0$ ae.

It follows immediately from (9.12) that if $p$ is a trigonometric polynomial, then $\int_0^{2\pi} f(x)\overline{p(x)}\,dx = 0$. As the reader is asked to show in Exercise 9.34, there is a sequence $\{p_n\}_{n=1}^\infty$ of trigonometric polynomials such that $\lim_{n\to\infty}\|f - p_n\|_2 = 0$. Using the continuity of the inner product, we conclude that
$$\int_0^{2\pi} f(x)\overline{f(x)}\,dx = \lim_{n\to\infty}\int_0^{2\pi} f(x)\overline{p_n(x)}\,dx = 0.$$
Hence, $f$ vanishes ae.

Because $\{ e_n : n \in \mathbb{Z} \}$ is a basis for $\mathcal{L}^2([0,2\pi])$, Theorem 9.6 implies that each function $f \in \mathcal{L}^2([0,2\pi])$ has the Fourier series expansion
$$f(x) = \sum_{n=-\infty}^{\infty} \hat f(n)e^{inx},$$
where the convergence is in $\mathcal{L}^2([0,2\pi])$. □

Unless we know that a Hilbert space possesses a basis, Theorem 9.6 is of little consequence. That every Hilbert space does in fact have a basis is part of our next theorem.
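Examples 9.11 and 9.12 can be checked numerically. Since $\langle f, e_k\rangle = (2\pi)^{1/2}\hat f(k)$, Bessel's inequality reads $2\pi\sum_{|k|\le n}|\hat f(k)|^2 \le \|f\|_2^2$, and by Theorem 9.6(e) the left side tends to $\|f\|_2^2$ as $n \to \infty$. The sketch below approximates $\hat f(k)$ by the midpoint rule; the quadrature rule, the number of sample points, and the helper names are our own assumptions, not part of the text. For the test function $f(x) = x$ one computes by hand $\hat f(0) = \pi$, $\hat f(k) = i/k$ for $k \ne 0$, and $\|f\|_2^2 = (2\pi)^3/3$.

```python
import cmath
import math


def fourier_coefficient(f, k, num_points=20000):
    """Approximate hat f(k) = (1/2pi) * int_0^{2pi} f(x) e^{-ikx} dx by the midpoint rule."""
    h = 2 * math.pi / num_points
    total = 0j
    for j in range(num_points):
        x = (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)


def bessel_sum(f, n, num_points=20000):
    """Return sum_{|k|<=n} |<f, e_k>|^2 = 2*pi * sum_{|k|<=n} |hat f(k)|^2."""
    return 2 * math.pi * sum(
        abs(fourier_coefficient(f, k, num_points)) ** 2 for k in range(-n, n + 1)
    )


def partial_sum(f, n, x, num_points=20000):
    """Evaluate s_n(x) = sum_{k=-n}^{n} hat f(k) e^{ikx}, the nearest point of U_n to f."""
    return sum(
        fourier_coefficient(f, k, num_points) * cmath.exp(1j * k * x)
        for k in range(-n, n + 1)
    )
```

With `f = lambda x: x`, the values `bessel_sum(f, 2)` and `bessel_sum(f, 5)` increase toward, but stay below, $(2\pi)^3/3$, as Bessel's inequality requires.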
THEOREM 9.7
Let $\mathcal{H}$ be a Hilbert space. Then the following hold.
a) $\mathcal{H}$ has a basis.
b) If $E$ is a basis for a closed linear subspace $K$ of $\mathcal{H}$, then there exists a basis for $\mathcal{H}$ containing $E$ as a subset.
c) $\mathcal{H}$ has a countable basis if and only if $\mathcal{H}$ is separable.

PROOF: We prove (a) and leave (b) and (c) to the reader as Exercises 9.35 and 9.36. Let $\mathcal{O}$ denote the collection of orthonormal subsets of $\mathcal{H}$, ordered by $\subset$. Suppose that $\mathcal{C}$ is a chain of $\mathcal{O}$. Then $\bigcup_{O\in\mathcal{C}} O \in \mathcal{O}$ is an upper bound for $\mathcal{C}$. Thus, we may apply Zorn's lemma (page 17) to obtain a maximal element of $\mathcal{O}$.

The Dual of a Hilbert Space
Let $y$ be an element of the Hilbert space $\mathcal{H}$. The mapping $\ell$ defined by
$$\ell(x) = \langle x, y\rangle, \qquad x \in \mathcal{H}, \tag{9.13}$$
is a linear functional and satisfies $|\ell(x)| \le \|x\|\,\|y\|$. Thus, $\ell$ belongs to the dual space $\mathcal{H}^*$. It is an important property of Hilbert spaces that all continuous linear functionals are of the form (9.13).

THEOREM 9.8
Let $\mathcal{H}$ be a Hilbert space. Then $\ell \in \mathcal{H}^*$ if and only if there is a $y \in \mathcal{H}$ such that $\ell(x) = \langle x, y\rangle$ for each $x \in \mathcal{H}$. Furthermore, $\|\ell\|_* = \|y\|$.

PROOF: We have already observed that functionals of the form (9.13) belong to $\mathcal{H}^*$. Conversely, suppose that $\ell \in \mathcal{H}^*$. If $\ell$ is identically 0, then (9.13) holds with $y = 0$. Otherwise, $K = \ell^{-1}(\{0\})$ is a proper closed linear subspace of $\mathcal{H}$ and, consequently, $K^\perp$ contains at least one nonzero element $z$. For each $x \in \mathcal{H}$, we have $\ell(\ell(z)x - \ell(x)z) = 0$. Thus,
$$0 = \langle \ell(z)x - \ell(x)z, z\rangle = \ell(z)\langle x, z\rangle - \ell(x)\langle z, z\rangle.$$
It follows that $\ell(x) = \langle x, y\rangle$, where $y = \big(\overline{\ell(z)}/\langle z, z\rangle\big)z$.

To find the norm of the linear functional $\ell$, we first apply Cauchy's inequality to get
$$\|\ell\|_* = \sup\{ |\langle x, y\rangle| : \|x\| \le 1 \} \le \|y\|.$$
Thus, if $y = 0$, then, trivially, $\|\ell\|_* = \|y\|$. If $y \ne 0$, we choose $w = y/\|y\|$ in order to obtain $\|y\| = \langle w, y\rangle = \ell(w) \le \|\ell\|_*$.
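In the finite-dimensional Hilbert space $\mathbb{C}^n$ with the standard basis $e_1, \ldots, e_n$, Theorem 9.8 gives $\ell(e_j) = \langle e_j, y\rangle = \overline{y_j}$, so the representing vector is $y_j = \overline{\ell(e_j)}$. The sketch below, assuming the inner product $\langle x, y\rangle = \sum_i x_i\overline{y_i}$ and using our own illustrative function names, recovers $y$ from a functional given as a Python callable and confirms that $w = y/\|y\|$ attains $\|\ell\|_* = \|y\|$, as in the last step of the proof.

```python
import math


def inner(x, y):
    """<x, y> = sum x_i * conj(y_i): linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def riesz_representer(ell, n):
    """Return y in C^n with ell(x) = <x, y> for all x, using y_j = conj(ell(e_j))."""
    y = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # j-th standard basis vector
        y.append(complex(ell(e_j)).conjugate())
    return y
```

For instance, the functional `ell = lambda x: (2 - 1j) * x[0] + 3j * x[2]` on $\mathbb{C}^3$ is represented by `y = [2 + 1j, 0, -3j]`, and `ell(w)` equals $\|y\| = \sqrt{14}$ for `w = y / ||y||`.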
Remark: If $E$ is a basis for a Hilbert space $\mathcal{H}$, then we can write a formula for the element $y$ given in Theorem 9.8 in terms of the basis elements. Indeed, noting that $\ell(e) = \langle e, y\rangle$, we have by Theorem 9.6 that
$$y = \sum_{e\in E}\langle y, e\rangle e = \sum_{e\in E}\overline{\ell(e)}\,e.$$

Theorem 9.8 is a prototype for results appearing in subsequent sections where we find explicit formulas for bounded linear functionals on various Banach spaces.

EXERCISES 9.3
9.29 Verify the assertions in parts (b) and (c) of Example 9.10.
9.30 Prove (a), (b), and (d) of Theorem 9.4.
9.31 Provide the details for the proof of Theorem 9.5.
9.32 In this exercise, $E$ denotes an orthonormal set and $\mathcal{H}$ a Hilbert space.
a) Show that if $e$ and $e'$ are distinct members of $E$, then $\|e - e'\|^2 = 2$.
b) Show that if the closed unit ball $\overline{B}_1(0)$ of $\mathcal{H}$ is compact, then $\mathcal{H}$ is finite dimensional.
9.33 Let $[a,b]$ be a closed bounded interval.
a) Prove that the continuous functions are dense in $\mathcal{L}^2([a,b])$.
b) Formulate and prove a similar result for unbounded intervals.
9.34 Prove that the trigonometric polynomials are dense in $\mathcal{L}^2([0,2\pi])$. Hint: Refer to Exercise 9.33.
9.35 Prove part (b) of Theorem 9.7.
9.36 Prove part (c) of Theorem 9.7.
9.37 Let $E$ be an orthonormal set of a Hilbert space $\mathcal{H}$. Establish the following.
a) $\;\cdots = \sum_{e\in E}\langle x, e\rangle e$ for all $x \in \mathcal{H}$.
b) $\rho(x, \overline{\operatorname{span} E})^2 = \|x\|^2 - \sum_{e\in E}|\langle x, e\rangle|^2$ for all $x \in \mathcal{H}$.
c) If $\alpha$ is a scalar-valued function on $E$ such that $\sum_{e\in E}|\alpha(e)|^2 < \infty$, then the sum $\sum_{e\in E}\alpha(e)e$ converges.
9.38 Refer to Theorem 9.5.
a) Apply the technique used in the proof of that theorem to the subset of $\mathcal{L}^2([-1,1])$ consisting of $1, x, x^2, \ldots$ to obtain an orthonormal set of polynomials $L_0, L_1, \ldots$. Show that $L_n(x) = (n + 1/2)^{1/2}(2^n n!)^{-1}\,d^n(x^2 - 1)^n/dx^n$. The polynomials $(2^n n!)^{-1}\,d^n(x^2 - 1)^n/dx^n$ are called Legendre polynomials.
b) Show that $\{L_0, L_1, \ldots\}$ is a basis for $\mathcal{L}^2([-1,1])$.
9.39 The Haar functions are functions on $[0,1]$ defined as follows:
$$H_0(t) = 1, \quad t \in [0,1], \qquad H_1(t) = \begin{cases} 1, & t \in [0, 1/2], \\ -1, & t \in (1/2, 1], \end{cases}$$
and
$$H_j(t) = \begin{cases} 2^{n/2}H_1(2^n t - j + 2^n), & t \in [-1 + j/2^n,\, -1 + (j+1)/2^n], \\ 0, & \text{otherwise}, \end{cases}$$
for $2^n \le j < 2^{n+1}$. Show that the Haar functions form a basis for $\mathcal{L}^2([0,1])$.
9.40 Let $n \in \mathbb{N}$. Define a linear functional $S$ on $\mathcal{L}^2([0,2\pi])$ by
$$S(f) = \sum_{k=-n}^{n}\hat f(k).$$
Find a function $g \in \mathcal{L}^2([0,2\pi])$ such that $S(f) = \int_0^{2\pi} f(x)g(x)\,dx$.

In Exercises 9.41-9.44, we will need the concepts of an isometric function and an isomorphism of normed spaces. Let $\Omega$ and $\Lambda$ be normed spaces and $L\colon \Omega \to \Lambda$. Then $L$ is said to be isometric (or to be an isometry) if $\|L(x)\| = \|x\|$ for each $x \in \Omega$. It is said to be an isomorphism if it is linear, one-to-one, onto, and continuous and $L^{-1}$ is also continuous.
9.41 Let $\mathcal{H}$ be a separable Hilbert space. Show that there is an isometric isomorphism from $\mathcal{H}$ onto $\ell^2(\mathbb{N})$.
9.42 Let $\mathcal{H}$ be a Hilbert space. Show that there is an isometric isomorphism from $\mathcal{H}$ onto $\ell^2(S)$ for some set $S$.
9.43 Prove that the function $g \mapsto \langle \cdot, g\rangle$ defines an isometric linear mapping of $\mathcal{L}^2(\mu)$ onto $\mathcal{L}^2(\mu)^*$.
9.44 Show that there is no isometric isomorphism from $\mathcal{L}^2(\mathbb{R})$ onto $\mathcal{L}^1(\mathbb{R})$.

9.4 $\mathcal{L}^p$-SPACES
In Example 7.6 on page 423, we introduced three normed spaces of measurable functions: $\mathcal{L}^1(\mu)$, $\mathcal{L}^2(\mu)$, and $\mathcal{L}^\infty(\mu)$. Now we will generalize to $\mathcal{L}^p(\mu)$, where $p$ is any positive extended real number. These spaces are called $\mathcal{L}^p$-spaces. We will show that for $p \ge 1$, $\mathcal{L}^p(\mu)$ is a Banach space and will describe its dual space in the spirit of Theorem 9.8 (page 551). The $\mathcal{L}^p$-spaces, along with spaces of the form $C(\Omega)$ where $\Omega$ is a compact Hausdorff space, are sometimes referred to in the literature as the classical Banach spaces.
DEFINITION 9.9 $\mathcal{L}^p$-Spaces
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space, $f$ a complex-valued $\mathcal{A}$-measurable function on $\Omega$, and $0 < p \le \infty$.
• For $0 < p < \infty$, we define
$$\sigma_p(f) = \int_\Omega |f|^p\,d\mu \quad\text{and}\quad \|f\|_p = \sigma_p(f)^{1/p}.$$
• For $p = \infty$, we define
$$\|f\|_\infty = \inf\{ M : |f| \le M \ \mu\text{-ae} \}.$$
The collection of all complex-valued $\mathcal{A}$-measurable functions $f$ such that $\|f\|_p < \infty$ is denoted $\mathcal{L}^p(\Omega, \mathcal{A}, \mu)$ or, when no confusion can arise, simply $\mathcal{L}^p(\mu)$. The spaces $\mathcal{L}^p(\Omega, \mathcal{A}, \mu)$, $0 < p \le \infty$, are called $\mathcal{L}^p$-spaces.

Note: Under certain conditions, special notation is used for $\mathcal{L}^p$-spaces:
• When $\mu$ is Lebesgue measure restricted to some Lebesgue measurable subset $\Omega$ of $\mathbb{R}^n$, we write $\mathcal{L}^p(\Omega)$ for $\mathcal{L}^p(\mu)$.
• When $\mu$ is counting measure on some set $\Omega$, we write $\ell^p(\Omega)$ for $\mathcal{L}^p(\mu)$ and, in the special case $\Omega = \mathbb{N}$, we sometimes write simply $\ell^p$.

As mentioned earlier, we identify functions that are equal $\mu$-ae. Keeping that in mind, we will see later that $\|\cdot\|_p$ is a norm on the linear space $\mathcal{L}^p(\mu)$ when $1 \le p \le \infty$. When $0 < p < 1$, the space $\mathcal{L}^p(\mu)$ is still a linear space, but $\|\cdot\|_p$ is no longer a norm. Rather, in this case, $\mathcal{L}^p(\mu)$ is a metric space with metric given by $\rho_p(f, g) = \sigma_p(f - g)$. See Exercises 9.53-9.55.

EXAMPLE 9.13 Illustrates Definition 9.9
a) Let $[a,b]$ be a closed bounded interval of $\mathbb{R}$ and $0 < p < \infty$. A complex-valued Lebesgue measurable function $f$ on $[a,b]$ is in $\mathcal{L}^p([a,b])$ if and only if $\int_a^b |f(x)|^p\,dx < \infty$.
b) Let $\mu$ be counting measure on $\{1, 2\}$. Then the space of real-valued functions in $\ell^p(\{1,2\})$ can be identified with $\mathbb{R}^2$. We have
$$\|(x_1, x_2)\|_p = \begin{cases} (|x_1|^p + |x_2|^p)^{1/p}, & 0 < p < \infty; \\ \max\{|x_1|, |x_2|\}, & p = \infty. \end{cases}$$
Figure 9.3 shows the unit "circles" centered at $(0,0)$ in the metric space $(\mathbb{R}^2, \rho_{0.5})$ and in the normed space $(\mathbb{R}^2, \|\cdot\|_p)$ for $p = 1$, 2, 3, and $\infty$.
c) Refer to Example 5.10(c) on page 293. Let $(\Omega, \mathcal{A}, P)$ be a probability space. The random variables with finite $n$th moments are precisely those in $\mathcal{L}^n(P)$.
d) Let $\mu$ be counting measure on $\mathbb{N}$ and $0 < p < \infty$. A sequence $\{a_n\}_{n=1}^\infty$ of complex numbers is in $\ell^p$ if and only if $\sum_{n=1}^\infty |a_n|^p < \infty$. □

Our next proposition, whose proof is left to the reader as Exercise 9.45, provides some basic properties of $\mathcal{L}^p$-spaces.
PROPOSITION 9.4
Let $p$ be a positive extended real number. Then the following hold.
a) $\|\alpha f\|_p = |\alpha|\,\|f\|_p$ for all $f \in \mathcal{L}^p(\mu)$ and scalars $\alpha$.
b) $\mathcal{L}^p(\mu)$ is a linear space.
c) For each $f \in \mathcal{L}^p(\mu)$, there exists a sequence of simple functions $\{s_n\}_{n=1}^\infty$ in $\mathcal{L}^p(\mu)$ such that, as $n \to \infty$, $s_n \to f$ $\mu$-ae, $\|f - s_n\|_p \to 0$, and $\int_\Omega |s_n|^p\,d\mu \to \int_\Omega |f|^p\,d\mu$.

In Section 9.2, we used Cauchy's inequality to prove that an inner product $\langle\cdot,\cdot\rangle$ induces a norm via $\|x\| = \sqrt{\langle x, x\rangle}$. Similarly, we will use Hölder's inequality, a generalization of Cauchy's inequality, to show that $\|\cdot\|_p$ is a norm when $p \ge 1$.

THEOREM 9.9 Hölder's Inequality
Let $1 \le p \le \infty$ and let $q$ be such that $1/p + 1/q = 1$. Then for any two $\mathcal{A}$-measurable functions $f$ and $g$, we have
$$\int_\Omega |fg|\,d\mu \le \|f\|_p\|g\|_q. \tag{9.14}$$
Furthermore, if $1 < p < \infty$, then equality holds in (9.14) if and only if there are constants $\alpha$ and $\beta$, not both zero, such that $\alpha|f|^p = \beta|g|^q$.

PROOF: Without loss of generality we can assume that $\|f\|_p$ and $\|g\|_q$ are finite and nonzero. Suppose that $1 < p < \infty$. By the concavity of the natural log function we have
$$\ln|fg| = (1/p)\ln|f|^p + (1/q)\ln|g|^q \le \ln\big((1/p)|f|^p + (1/q)|g|^q\big).$$
Thus,
$$|fg| \le (1/p)|f|^p + (1/q)|g|^q. \tag{9.15}$$
If $\|f\|_p = \|g\|_q = 1$, it follows from (9.15) that
$$\int_\Omega |fg|\,d\mu \le (1/p)\int_\Omega |f|^p\,d\mu + (1/q)\int_\Omega |g|^q\,d\mu = 1/p + 1/q = 1 \tag{9.16}$$
and, hence, (9.14) holds in that case. In general, we can replace $f$ and $g$ by $f/\|f\|_p$ and $g/\|g\|_q$, respectively, and use Proposition 9.4(a) and (9.16) to obtain $(\|f\|_p\|g\|_q)^{-1}\int_\Omega |fg|\,d\mu \le 1$.

We leave the cases $p = 1$ and $p = \infty$ and the "Furthermore, ..." part to the reader as Exercises 9.46-9.47.
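Hölder's inequality and its equality case can be checked directly for counting measure on a finite set, where $\mathcal{L}^p(\mu) = \ell^p(\{0, \ldots, n-1\})$ and integrals are finite sums. The sketch below is only an illustration under that assumption; the helper names are our own. Taking $|g| = |f|^{p-1}$ gives $|g|^q = |f|^p$, which is the case $\alpha = \beta = 1$ of the "Furthermore, ..." part, so equality should hold up to rounding.

```python
import math


def p_norm(f, p):
    """||f||_p with respect to counting measure on {0, ..., len(f) - 1}."""
    if p == math.inf:
        return max(abs(v) for v in f)
    return sum(abs(v) ** p for v in f) ** (1 / p)


def conjugate_exponent(p):
    """q with 1/p + 1/q = 1 (q = inf when p = 1, and q = 1 when p = inf)."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1)


def holder_sides(f, g, p):
    """Return (int |fg| dmu, ||f||_p * ||g||_q) for counting measure."""
    q = conjugate_exponent(p)
    lhs = sum(abs(a * b) for a, b in zip(f, g))
    return lhs, p_norm(f, p) * p_norm(g, q)
```

For `f = [1, -2, 3j]` and `g = [abs(v) ** 2 for v in f]` with `p = 3`, both returned values equal 36 up to rounding; for a generic `g` the first value is strictly smaller.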
THEOREM 9.10 Minkowski's Inequality
Let $1 \le p \le \infty$. Then
$$\|f + g\|_p \le \|f\|_p + \|g\|_p$$
for all $f, g \in \mathcal{L}^p(\mu)$.

PROOF: The case $p = 1$ follows immediately from $|f + g| \le |f| + |g|$ and the case $p = \infty$ from the fact that if $|f| \le M_1$ $\mu$-ae and $|g| \le M_2$ $\mu$-ae, then $|f + g| \le M_1 + M_2$ $\mu$-ae. Suppose that $p \in (1, \infty)$ and let $q$ be defined via $1/p + 1/q = 1$. From
$$|f + g|^p \le |f|\,|f + g|^{p-1} + |g|\,|f + g|^{p-1},$$
we get
$$\|f + g\|_p^p \le \int_\Omega |f|\,|f + g|^{p-1}\,d\mu + \int_\Omega |g|\,|f + g|^{p-1}\,d\mu. \tag{9.17}$$
Noting that
$$\int_\Omega \big(|f + g|^{p-1}\big)^q\,d\mu = \int_\Omega |f + g|^{pq - q}\,d\mu = \|f + g\|_p^p,$$
it follows from (9.17) and Hölder's inequality that
$$\|f + g\|_p^p \le \|f\|_p\|f + g\|_p^{p/q} + \|g\|_p\|f + g\|_p^{p/q}.$$
Since $f + g \in \mathcal{L}^p(\mu)$ by Proposition 9.4(b), $\|f + g\|_p$ is finite, and we may assume that it is nonzero. Hence, dividing by $\|f + g\|_p^{p/q}$,
$$\|f + g\|_p^{p - p/q} \le \|f\|_p + \|g\|_p.$$
Since $p - p/q = 1$, the proof is complete.

It follows from Proposition 9.4 and Theorem 9.10 that $\mathcal{L}^p(\mu)$ is a normed space when $p \in [1, \infty]$. The next theorem shows that it is in fact a Banach space.

THEOREM 9.11 Riesz's Theorem
For $1 \le p \le \infty$, the normed space $(\mathcal{L}^p(\mu), \|\cdot\|_p)$ is a Banach space, that is, a complete metric space in the metric induced by the norm $\|\cdot\|_p$.
PROOF: We leave the case $p = \infty$ to the reader as Exercise 9.51. By Proposition 9.3 on page 532, it suffices to show that the series $\sum_{n=1}^\infty f_n$ converges with respect to the norm $\|\cdot\|_p$ whenever $\sum_{n=1}^\infty \|f_n\|_p < \infty$. Consider the nondecreasing sequence of functions $g_n = \sum_{k=1}^n |f_k|$ and set $g = \lim_{n\to\infty} g_n$. It follows immediately from Minkowski's inequality that $\int_\Omega g_n^p\,d\mu \le \big(\sum_{k=1}^n \|f_k\|_p\big)^p$. Applying the monotone convergence theorem, we obtain
$$\int_\Omega g^p\,d\mu \le \Big(\sum_{k=1}^\infty \|f_k\|_p\Big)^p < \infty.$$
Hence, $g$ must be finite $\mu$-ae. It is easy to see that, whenever $g(x) < \infty$, the sequence of partial sums $s_n(x) = \sum_{k=1}^n f_k(x)$ is Cauchy and, hence, convergent. Let
$$s(x) = \begin{cases} \lim_{n\to\infty} s_n(x), & \text{if } g(x) < \infty; \\ 0, & \text{if } g(x) = \infty. \end{cases}$$
Then $s \in \mathcal{L}^p(\mu)$ because $\int_\Omega |s|^p\,d\mu \le \int_\Omega g^p\,d\mu < \infty$. Also, using the fact that $|s - s_n|^p \le g^p$ and applying the dominated convergence theorem, we get
$$\lim_{n\to\infty}\|s - s_n\|_p^p = \lim_{n\to\infty}\int_\Omega |s - s_n|^p\,d\mu = 0.$$
We have now shown that the series $\sum_{n=1}^\infty f_n$ converges with respect to the norm $\|\cdot\|_p$.

The Dual Space of $\mathcal{L}^p(\mu)$
We now take up the problem of describing the bounded linear functionals on $\mathcal{L}^p(\mu)$. At this point, we restrict ourselves to the case where $1 < p < \infty$. To begin, we observe that for $g \in \mathcal{L}^q(\mu)$, where $1/p + 1/q = 1$, the linear functional defined by
$$\ell(f) = \int_\Omega fg\,d\mu \tag{9.18}$$
is continuous on $\mathcal{L}^p(\mu)$. Indeed, by Hölder's inequality, $|\ell(f)| \le \|f\|_p\|g\|_q$ and, therefore,
$$\|\ell\|_* \le \|g\|_q. \tag{9.19}$$
We claim that equality holds in (9.19). If $g = 0$, there is nothing to prove. So assume $\|g\|_q \ne 0$ and set
$$s(x) = \begin{cases} \overline{g(x)}/|g(x)|, & \text{if } g(x) \ne 0; \\ 0, & \text{if } g(x) = 0. \end{cases}$$
Then the function $f_0 = s|g|^{q-1}/\|g\|_q^{q-1}$ satisfies
$$\int_\Omega |f_0|^p\,d\mu = \int_\Omega |s|^p|g|^{pq-p}/\|g\|_q^{pq-p}\,d\mu = \int_\Omega |g|^q/\|g\|_q^q\,d\mu = 1.$$
Hence, $f_0 \in \mathcal{L}^p(\mu)$ and $\|f_0\|_p = 1$. Furthermore,
$$\ell(f_0) = \frac{1}{\|g\|_q^{q-1}}\int_\Omega s|g|^{q-1}g\,d\mu = \frac{1}{\|g\|_q^{q-1}}\int_\Omega |g|^q\,d\mu = \|g\|_q.$$
It follows from this last equality and (9.19) that $\|\ell\|_* = \|g\|_q$.

We have shown that functions in $\mathcal{L}^q(\mu)$ induce bounded linear functionals on $\mathcal{L}^p(\mu)$ via the formula (9.18). Now the question is whether these exhaust all bounded linear functionals on $\mathcal{L}^p(\mu)$. The following theorem shows that the answer is yes!

THEOREM 9.12 Riesz Representation Theorem
Let $1 < p < \infty$ and $1/p + 1/q = 1$. Then $\ell \in \mathcal{L}^p(\mu)^*$ if and only if there exists a unique $g \in \mathcal{L}^q(\mu)$ such that
$$\ell(f) = \int_\Omega fg\,d\mu, \qquad f \in \mathcal{L}^p(\mu).$$
Furthermore, $g$ satisfies $\|\ell\|_* = \|g\|_q$.

PROOF: In view of our discussion directly before this theorem, we need only prove necessity. So assume that $\ell \in \mathcal{L}^p(\mu)^*$. We will work under the assumption that $(\Omega, \mathcal{A}, \mu)$ is a finite measure space and leave the general case to the reader as Exercises 9.62-9.65. We also leave the proof of the uniqueness of $g$ for Exercise 9.59.

Define the complex measure $\nu$ on $\mathcal{A}$ by $\nu(E) = \ell(\chi_E)$. If $\mu(E) = 0$, then $\chi_E = 0$ $\mu$-ae and so $\nu(E) = \ell(\chi_E) = 0$. Thus $\nu$ is absolutely continuous with respect to $\mu$. Applying the complex version of the Radon-Nikodym theorem (page 383), we conclude that there exists a function $g \in \mathcal{L}^1(\mu)$ such that
$$\ell(\chi_E) = \int_E g\,d\mu, \qquad E \in \mathcal{A}.$$
By linearity, it follows that $\ell(\phi) = \int_\Omega \phi g\,d\mu$ for all ($\mathcal{A}$-measurable) simple functions $\phi$. Thus, $\big|\int_\Omega \phi g\,d\mu\big| \le \|\ell\|_*\|\phi\|_p$ for all simple functions $\phi$. Let
$$s(x) = \begin{cases} \overline{g(x)}/|g(x)|, & \text{if } g(x) \ne 0; \\ 0, & \text{if } g(x) = 0. \end{cases}$$
As the reader is asked to show in Exercise 9.60, we can find a sequence $\{\phi_n\}_{n=1}^\infty$ of simple functions such that $|\phi_n| \le 1$ $\mu$-ae and $\phi_n \to s$ $\mu$-ae. If $\psi$ is a nonnegative simple function, then $\psi\phi_n$ is simple and $|\psi\phi_n| \le \psi$ $\mu$-ae, so we have
$$\Big|\int_\Omega \psi\phi_n g\,d\mu\Big| \le \|\ell\|_*\|\psi\|_p$$
and, applying the dominated convergence theorem, we obtain
$$\int_\Omega \psi|g|\,d\mu \le \|\ell\|_*\|\psi\|_p. \tag{9.20}$$
We will use (9.20) to show that $g$ belongs to the space $\mathcal{L}^q(\mu)$. Let $n \in \mathbb{N}$ and $E_n = \{ x : |g(x)| \le n \}$. The function $f_0 = \chi_{E_n}|g|^{q-1}$ belongs to $\mathcal{L}^p(\mu)$. Hence, by Proposition 9.4 on page 556, there is a sequence of simple functions $\{\phi_k\}_{k=1}^\infty$ such that, as $k \to \infty$, $\phi_k \to f_0$ $\mu$-ae and $\|\phi_k\|_p \to \|f_0\|_p$. Replacing $\phi_k$ by $\chi_{E_n}|\phi_k|$ if necessary, we may assume that the $\phi_k$'s are nonnegative and vanish outside of $E_n$. Using Fatou's lemma and (9.20), we obtain
$$\int_{E_n} |g|^{q-1}|g|\,d\mu \le \liminf_{k\to\infty}\int_\Omega \phi_k|g|\,d\mu \le \|\ell\|_*\liminf_{k\to\infty}\|\phi_k\|_p = \|\ell\|_*\|f_0\|_p$$
and, hence, that
$$\Big(\int_{E_n} |g|^q\,d\mu\Big)^{1/q} = \Big(\int_{E_n} |g|^q\,d\mu\Big)^{1 - 1/p} \le \|\ell\|_*.$$
Letting $n \to \infty$ and applying the monotone convergence theorem, we get that $\|g\|_q \le \|\ell\|_*$. Thus, $g$ belongs to $\mathcal{L}^q(\mu)$.

Because $g \in \mathcal{L}^q(\mu)$, the function $\ell_g$ defined by $\ell_g(f) = \int_\Omega fg\,d\mu$ is in $\mathcal{L}^p(\mu)^*$. As $\ell$ and $\ell_g$ agree on simple functions, Proposition 9.4 implies that they are identical.

Remark: If $p = 1$, Theorem 9.12 remains valid under the additional assumption that $(\Omega, \mathcal{A}, \mu)$ is $\sigma$-finite, as the reader is asked to prove in Exercise 9.61. An example given in Chapter 10 shows that Theorem 9.12 fails when $p = \infty$.

In view of Theorem 9.12, we can write $\mathcal{L}^p(\mu)^* = \mathcal{L}^q(\mu)$, for $1 < p < \infty$, and, in the $\sigma$-finite case, for $p = 1$. However, for $p = \infty$, we can assert only that
$$\mathcal{L}^\infty(\mu)^* \supset \mathcal{L}^1(\mu). \tag{9.21}$$
See Exercise 9.58.
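The extremal function used just before Theorem 9.12 can be computed explicitly when $\mu$ is a finite measure made of point masses $\mu_i$ on a finite set. The sketch below assumes that setting (the weights, the test data, and the function names are our own illustrative choices). It builds $f_0 = s|g|^{q-1}/\|g\|_q^{q-1}$ with $s = \overline{g}/|g|$ where $g \ne 0$ and checks that $\|f_0\|_p = 1$ and $\ell(f_0) = \|g\|_q$, so the bound (9.19) is attained. Note that the pairing (9.18) uses no complex conjugation, which is why $s$ carries the conjugate.

```python
def lp_norm(f, mu, p):
    """||f||_p = (sum_i |f_i|^p mu_i)^(1/p) for a measure with point masses mu_i."""
    return sum(abs(v) ** p * m for v, m in zip(f, mu)) ** (1 / p)


def pairing(f, g, mu):
    """ell_g(f) = int f g dmu, with no complex conjugation, as in (9.18)."""
    return sum(a * b * m for a, b, m in zip(f, g, mu))


def extremal_function(g, mu, p):
    """f_0 = s |g|^(q-1) / ||g||_q^(q-1), with s = conj(g)/|g| where g != 0, else 0."""
    q = p / (p - 1)
    gq = lp_norm(g, mu, q)
    f0 = []
    for b in g:
        if b == 0:
            f0.append(0j)
        else:
            s = complex(b).conjugate() / abs(b)  # so that s * b = |b|
            f0.append(s * abs(b) ** (q - 1) / gq ** (q - 1))
    return f0
```

With `mu = [0.5, 1.0, 2.0, 0.25]`, `g = [1 + 1j, -2, 0, 3j]`, and `p = 3`, the function `f0 = extremal_function(g, mu, 3)` has unit $\mathcal{L}^3$ norm and `pairing(f0, g, mu)` agrees with `lp_norm(g, mu, 1.5)` up to rounding.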
EXAMPLE 9.14 Illustrates Theorem 9.12
Refer to Example 9.11 on page 547. Let $x \in [0,2\pi]$ and $1 < p < \infty$. Define the linear functional $\ell_x$ on $\mathcal{L}^p([0,2\pi])$ by
$$\ell_x(f) = \sum_{k=-n}^{n} \hat f(k)e^{ikx}.$$
Of course, $\ell_x$ just gives the value at $x$ of the $n$th partial sum of the Fourier series of $f$. First we will show that $\ell_x$ is bounded and then we will find the function $g \in \mathcal{L}^q([0,2\pi])$ guaranteed by Theorem 9.12. From Hölder's inequality,
$$|\hat f(k)| = \frac{1}{2\pi}\Big|\int_0^{2\pi} f(y)e^{-iky}\,dy\Big| \le \frac{1}{2\pi}\|f\|_p\Big(\int_0^{2\pi}|e^{-iky}|^q\,dy\Big)^{1/q} = \|f\|_p(2\pi)^{-1/p}.$$
It follows at once that $|\ell_x(f)| \le (2n + 1)(2\pi)^{-1/p}\|f\|_p$. Thus, $\ell_x$ is bounded. Finally, we write
$$\ell_x(f) = \sum_{k=-n}^{n}\frac{1}{2\pi}\int_0^{2\pi} f(y)e^{ik(x - y)}\,dy = \int_0^{2\pi} f(y)D_n(x - y)\,dy,$$
where
$$D_n(t) = \frac{1}{2\pi}\sum_{k=-n}^{n} e^{ikt} = \frac{1}{2\pi}\,\frac{\sin((n + 1/2)t)}{\sin(t/2)}$$
(for $t$ not a multiple of $2\pi$; at such $t$, $D_n(t) = (2n + 1)/(2\pi)$). Thus, the function $g$ guaranteed by Theorem 9.12 is $g(y) = D_n(x - y)$. □

EXERCISES 9.4
9.45 Prove Proposition 9.4.
9.46 Prove the "Furthermore, ..." part of Hölder's inequality.
9.47 Verify Hölder's inequality for $p = 1$ and $p = \infty$.
9.48 Discuss the case of equality in (9.14) when $p = 1$ or $p = \infty$.
9.49 Suppose that $p, q \in (0, \infty]$.
a) Let $r$ be such that $1/r = 1/p + 1/q$. Show that if $f \in \mathcal{L}^p(\mu)$ and $g \in \mathcal{L}^q(\mu)$, then $fg \in \mathcal{L}^r(\mu)$ and $\|fg\|_r \le \|f\|_p\|g\|_q$.
b) Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space. Show that if $0 < s < r \le \infty$, then $\mathcal{L}^r(\mu) \subset \mathcal{L}^s(\mu)$.
9.50 Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space. Show that for each $f \in \mathcal{L}^\infty(\mu)$, $\|f\|_p \to \|f\|_\infty$ as $p \to \infty$.
9.51 Prove that the normed space $(\mathcal{L}^\infty(\mu), \|\cdot\|_\infty)$ is a Banach space.
9.52 Show that $(\mathcal{L}^p([0,1]), \|\cdot\|_p)$ is not an inner product space unless $p = 2$.
9.53 Show that $\|\cdot\|_p$ does not define a norm on $\mathcal{L}^p([0,1])$ when $0 < p < 1$.
★9.54 Refer to Definition 9.9 on page 554.
a) Show that if $0 < p < 1$, then $\sigma_p(f + g) \le \sigma_p(f) + \sigma_p(g)$.
b) Deduce that $\rho_p(f, g) = \sigma_p(f - g)$ defines a metric on $\mathcal{L}^p(\mu)$ for $0 < p < 1$.
9.55 Refer to Exercise 9.54. Show that if $0 < p < 1$, then $(\mathcal{L}^p(\mu), \rho_p)$ is a complete metric space.
★9.56 Let $J$ be a nonempty interval in $\mathbb{R}$ and $0 < p < \infty$.
a) Show that if $J$ is closed and bounded, then $C(J)$ is dense in $\mathcal{L}^p(J)$.
b) Refer to Example 7.26 on page 489. Show that $C_c(J)$ is dense in $\mathcal{L}^p(J)$.
c) Show that $C_c(J)$ is not dense in $\mathcal{L}^\infty(J)$.
9.57 Let $0 < p < \infty$. Prove that the trigonometric polynomials are dense in $\mathcal{L}^p([0,2\pi])$.
9.58 The result of this exercise gives meaning to the relation (9.21) on page 560. Prove that if $g \in \mathcal{L}^1(\mu)$, then $\ell(f) = \int_\Omega fg\,d\mu$ defines a bounded linear functional on $\mathcal{L}^\infty(\mu)$ and that $\|\ell\|_* = \|g\|_1$.
9.59 Prove the uniqueness of the function $g$ in Theorem 9.12.
★9.60 Suppose that $f \in \mathcal{L}^\infty(\mu)$. Show that there exists a sequence of simple functions $\{\phi_n\}_{n=1}^\infty$ such that $|\phi_n| \le \|f\|_\infty$ $\mu$-ae and $\lim_{n\to\infty}\phi_n = f$ $\mu$-ae.
★9.61 Prove Theorem 9.12 when $p = 1$ under the assumption that $(\Omega, \mathcal{A}, \mu)$ is a $\sigma$-finite measure space.
In Exercises 9.62-9.65 we complete the proof of Theorem 9.12 by eliminating the restriction $\mu(\Omega) < \infty$.
9.62 Suppose $(\Omega, \mathcal{A}, \mu)$ is a measure space. For $E \in \mathcal{A}$, define the measure $\mu_E$ on $\mathcal{A}$ by $\mu_E(A) = \mu(E \cap A)$.
a) Show that $f \in \mathcal{L}^p(\mu_E)$ if and only if $\chi_E f \in \mathcal{L}^p(\mu)$.
b) Show that if $\ell \in \mathcal{L}^p(\mu)^*$, then $\ell_E(f) = \ell(\chi_E f)$ defines a continuous linear functional on $\mathcal{L}^p(\mu_E)$ and $\|\ell_E\|_* \le \|\ell\|_*$.
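c) If $\mu(E) < \infty$, show there is a unique function $g_E \in \mathcal{L}^q(\mu)$ such that $g_E$ vanishes outside of $E$, $\ell_E(f) = \int_\Omega f g_E\,d\mu$ for each $f \in \mathcal{L}^p(\mu_E)$, and $\|\ell_E\|_*^q = \int_\Omega |g_E|^q\,d\mu_E$.
9.63 Use Exercise 9.62 to prove Theorem 9.12 in case $(\Omega, \mathcal{A}, \mu)$ is $\sigma$-finite.
9.64 Let $(\Omega, \mathcal{A}, \mu)$ be an arbitrary measure space and $1 < p < \infty$. Show that if $\ell \in \mathcal{L}^p(\mu)^*$, then there exists a sequence $\{\Omega_n\}_{n=1}^\infty$ of $\mathcal{A}$-measurable sets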
such that $\mu(\Omega_n) < \infty$ for each $n \in \mathbb{N}$ and $\ell(\chi_A) = 0$ for each $A \in \mathcal{A}$ such that $\mu(A) < \infty$ and $A \subset \big(\bigcup_{n=1}^\infty \Omega_n\big)^c$.
9.65 Use Exercises 9.62-9.64 to verify Theorem 9.12 for an arbitrary measure space $(\Omega, \mathcal{A}, \mu)$.

9.5 NONNEGATIVE LINEAR FUNCTIONALS ON $C(\Omega)$
We have now characterized the dual spaces of Hilbert spaces (Theorem 9.8 on page 551) and $\mathcal{L}^p$-spaces (Theorem 9.12 on page 559). Our next task, which we will begin in this section and complete in the following one, is to characterize the dual spaces of $C(\Omega)$ and $C_0(\Omega)$. We will see that the linear functional defined on $C([0,1])$ by
$$\ell(f) = \int_0^1 f(x)\,dx = \int_{[0,1]} f\,d\lambda$$
is typical in the sense that all bounded linear functionals on $C(\Omega)$ arise from integration with respect to some complex measure. Here we lay the foundation for the general result by characterizing those that arise from integration with respect to a (nonnegative) measure.

Borel Sets and Regular Borel Measures
In Chapter 3 we defined the collection $\mathcal{B}$ of Borel sets of $\mathbb{R}$. We showed in Theorem 3.4 that $\mathcal{B}$ is the smallest $\sigma$-algebra of subsets of $\mathbb{R}$ that contains the open sets of $\mathbb{R}$. This characterization allows us to extend the concept of Borel sets to any topological space.

DEFINITION 9.10 Borel Set, Measure, and Measurable Function
Let $\Omega$ be a topological space. The smallest $\sigma$-algebra of subsets of $\Omega$ that contains all the open sets is denoted $\mathcal{B}(\Omega)$. We use the following terminology:
• Borel set: a member of $\mathcal{B}(\Omega)$.
• Borel measurable function: a function measurable with respect to $\mathcal{B}(\Omega)$.
• Borel measure: a signed or complex measure on $\mathcal{B}(\Omega)$.
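When $\Omega$ is a finite set, countable unions reduce to finite unions, so $\mathcal{B}(\Omega)$ can be computed by closing the family of open sets under complements and pairwise unions until nothing new appears (closure under intersections then follows by De Morgan's laws). The sketch below assumes a finite $\Omega$ whose topology is given as a list of open sets; the function name is our own. It returns $\{\emptyset, \Omega\}$ for the trivial topology, compare Example 9.15(c).

```python
from itertools import combinations


def generated_sigma_algebra(omega, open_sets):
    """Smallest sigma-algebra on the finite set omega containing open_sets.

    On a finite set, countable unions reduce to finite unions, so it suffices
    to close under complements and pairwise unions until nothing new appears.
    """
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(o) for o in open_sets}
    changed = True
    while changed:
        changed = False
        new_sets = {omega - a for a in family}  # complements
        new_sets |= {a | b for a, b in combinations(family, 2)}  # pairwise unions
        if not new_sets <= family:
            family |= new_sets
            changed = True
    return family
```

For $\Omega = \{1, 2, 3\}$ with the topology $\{\emptyset, \{1\}, \{1, 2\}, \Omega\}$, the result is the full power set (8 sets), even though the topology itself has only 4 members; with the topology $\{\emptyset, \{1\}, \Omega\}$ it is $\{\emptyset, \{1\}, \{2, 3\}, \Omega\}$.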
EXAMPLE 9.15 Illustrates Definition 9.10
a) $\mathcal{B}(\mathbb{R}) = \mathcal{B}$, as defined in Chapter 3.
b) $\mathcal{B}(\mathbb{R}^2) = \mathcal{B}_2 = \mathcal{B} \times \mathcal{B}$, as discussed in Exercise 4.147 on page 244. More generally, $\mathcal{B}(\mathbb{R}^n) = \mathcal{B}_n = \mathcal{B} \times \mathcal{B} \times \cdots \times \mathcal{B}$, as discussed in Exercise 4.171 on page 259.
c) Let $\Omega$ be any set and $\mathcal{T} = \{\Omega, \emptyset\}$. Then $\mathcal{B}(\Omega) = \mathcal{T}$.
d) Let $\Omega$ be any set and $\mathcal{T}$ be the discrete topology on $\Omega$. Then we have that $\mathcal{B}(\Omega) = \mathcal{T} = \mathcal{P}(\Omega)$.
e) Let $(\Omega, \mathcal{T})$ be a topological space. Then all functions in $C(\Omega)$ are Borel measurable. □

To characterize the bounded linear functionals on $C(\Omega)$, we need the concept of a regular Borel measure. We recommend that the reader review the discussion of the total variation of a complex measure presented in Section 6.7 starting on page 381.

DEFINITION 9.11 Regular Borel Measure
Let $\Omega$ be a locally compact Hausdorff space. A complex Borel measure $\mu$ is said to be a regular Borel measure if for each $B \in \mathcal{B}(\Omega)$ and $\epsilon > 0$, there is a compact set $K$ and an open set $O$ such that $K \subset B \subset O$ and $|\mu|(O \setminus K) < \epsilon$. The collection of all regular Borel measures on $\Omega$ is denoted by $M(\Omega)$; the real-valued and nonnegative regular Borel measures are denoted, respectively, by $M_r(\Omega)$ and $M_+(\Omega)$.

Remark: Definition 9.11 requires that a regular Borel measure be finite valued. Other definitions of regular Borel measure exist and some permit certain extended real-valued measures, such as Lebesgue measure, to be regular.

EXAMPLE 9.16 Illustrates Definition 9.11
a) Lebesgue measure on $[0,1]$ is a regular Borel measure. In fact, Lebesgue measure on any Borel set of finite Lebesgue measure is a regular Borel measure.
b) The Lebesgue-Stieltjes measure corresponding to a distribution function on $\mathbb{R}$ is a regular Borel measure, as the reader is asked to establish in Exercise 9.68.
c) Let $\Omega$ be a locally compact Hausdorff space. For $x \in \Omega$, the Dirac measure concentrated at $x$, restricted to the Borel sets of $\Omega$, is a regular Borel measure. See Exercise 9.71. □

Suppose that $\Omega$ is a locally compact Hausdorff space. The spaces $M(\Omega)$ and $M_r(\Omega)$ are, respectively, complex and real linear spaces, where the operations of addition and scalar multiplication are defined by $(\mu + \nu)(B) = \mu(B) + \nu(B)$ and $(\alpha\mu)(B) = \alpha\mu(B)$. Referring to Exercise 6.112 on page 386, we see that the linear spaces $M(\Omega)$ and $M_r(\Omega)$ are also normed spaces, where the norm is given by the total variation, that is, $\|\mu\| = |\mu|(\Omega)$. Moreover, as the reader is asked to prove in Exercise 9.66, $M(\Omega)$ and $M_r(\Omega)$ are Banach spaces with respect to the norm $\|\cdot\|$.

If $F$ is a closed subset of $\Omega$, then any $\nu \in M(F)$ can be extended to a regular Borel measure $\nu'$ on $\Omega$ by defining
$$\nu'(B) = \nu(B \cap F), \qquad B \in \mathcal{B}(\Omega).$$
It is convenient to view $\nu$ as a measure on $\Omega$ by identifying it with $\nu'$. In this way we can identify $M(F)$ with the linear subspace
$$\{ \mu \in M(\Omega) : \mu(B) = 0 \text{ for all } B \in \mathcal{B}(\Omega) \text{ with } B \subset F^c \}.$$

Nonnegative Linear Functionals
From now on in this section, unless explicitly stated otherwise, we assume that $\Omega$ is a compact Hausdorff space. If $\mu \in M(\Omega)$, then $\mu$ induces a linear functional $\ell_\mu$ on the space $C(\Omega)$ via
$$\ell_\mu(f) = \int_\Omega f\,d\mu, \qquad f \in C(\Omega).$$
That $\ell_\mu$ is a bounded linear functional follows from
$$|\ell_\mu(f)| \le \|f\|_u\,|\mu|(\Omega) = \|f\|_u\|\mu\|,$$
where we have applied Exercise 6.117(b) on page 387.

In this section we will show that any linear functional on $C(\Omega)$ satisfying a certain nonnegativity condition must be of the form $\ell_\mu$ for some $\mu \in M_+(\Omega)$. In the next section, we will extend this result to all bounded linear functionals on $C(\Omega)$ if $\Omega$ is a compact Hausdorff space and to $C_0(\Omega)$ if $\Omega$ is a locally compact Hausdorff space.
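The simplest compact Hausdorff space is a finite set with the discrete topology: every function on it is continuous, and a complex measure $\mu$ is determined by its point masses, with $\|\mu\| = |\mu|(\Omega) = \sum_x |\mu(\{x\})|$. In that setting the sketch below checks the bound $|\ell_\mu(f)| \le \|f\|_u\|\mu\|$ and shows that it is attained by $f(x) = \overline{\mu(\{x\})}/|\mu(\{x\})|$ (and $f(x) = 0$ where $\mu(\{x\}) = 0$), so that here $\|\ell_\mu\|_* = \|\mu\|$. The finite setting, the dictionary representation, and the function names are illustrative assumptions only.

```python
def ell_mu(f, mu):
    """ell_mu(f) = int f dmu, with mu given by its point masses mu[x]."""
    return sum(f[x] * mu[x] for x in mu)


def total_variation(mu):
    """||mu|| = |mu|(Omega) = sum of |mu({x})| on a finite discrete space."""
    return sum(abs(m) for m in mu.values())


def sup_norm(f):
    """||f||_u = max |f(x)| on a finite space."""
    return max(abs(v) for v in f.values())


def norming_function(mu):
    """f with |f| <= 1 and ell_mu(f) = ||mu||: conj(mu({x}))/|mu({x})|, or 0 if mu({x}) = 0."""
    return {x: (complex(m).conjugate() / abs(m) if m != 0 else 0) for x, m in mu.items()}
```

For a nonnegative `mu`, the constant function 1 is already norming, which matches the formula $\|\ell\|_* = \ell(1)$ for nonnegative functionals in Theorem 9.13(a).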
DEFINITION 9.12 Nonnegative Linear Functional
A linear functional $\ell$ on $C(\Omega)$ is said to be nonnegative if $\ell(f) \ge 0$ whenever $f \ge 0$.

As the reader is asked to show in Exercise 9.75, the linear functional on $C(\Omega)$ induced by a regular Borel measure $\mu$ is nonnegative if and only if $\mu$ is nonnegative.

The following theorem, whose proof is left to the reader in Exercises 9.76-9.81, presents some basic properties of nonnegative linear functionals.

THEOREM 9.13
Let $\Omega$ be a compact Hausdorff space.
a) If $\ell$ is a nonnegative linear functional on $C(\Omega)$, then $\ell \in C(\Omega)^*$ and $\|\ell\|_* = \ell(1)$.
b) If $\ell \in C(\Omega)^*$ and $\ell(C(\Omega, \mathbb{R})) \subset \mathbb{R}$, then there exist nonnegative linear functionals $\ell_+$ and $\ell_-$ such that $\|\ell\|_* = \ell_+(1) + \ell_-(1)$ and $\ell = \ell_+ - \ell_-$.

We have noted that a nonnegative regular Borel measure on $\Omega$ induces a nonnegative linear functional on $C(\Omega)$. Our next theorem shows that all nonnegative linear functionals on $C(\Omega)$ are of that type. There are two main ideas in the proof of this result. One is the use of Urysohn's lemma to obtain suitable approximations to characteristic functions of closed sets. The other is to mimic the construction of Lebesgue measure from Lebesgue outer measure. With regard to the latter, recall that the collection $\mathcal{M}$ of Lebesgue measurable sets is defined using the Carathéodory criterion and Lebesgue outer measure: $E \in \mathcal{M}$ if and only if
$$\lambda^*(W) = \lambda^*(W \cap E) + \lambda^*(W \cap E^c) \quad \text{for all } W \subset \mathbb{R}.$$
Theorem 3.11 on page 120 shows that $\mathcal{M}$ is a $\sigma$-algebra. A careful look at the proof reveals that it uses only the properties of Lebesgue outer measure given in (a), (b), (c), and (e) of Proposition 3.1 on page 106. In other words, we have already proved the following proposition.

PROPOSITION 9.5
Let $\Omega$ be a set and $\nu^*$ an extended real-valued function on $\mathcal{P}(\Omega)$ satisfying the following conditions:
a) $\nu^*(A) \ge 0$ for each $A \subset \Omega$.
b) $\nu^*(\emptyset) = 0$.
c) $A \subset B \Rightarrow \nu^*(A) \le \nu^*(B)$.
d) $\{A_n\}_n \subset \mathcal{P}(\Omega) \Rightarrow \nu^*\big(\bigcup_n A_n\big) \le \sum_n \nu^*(A_n)$.
Then the collection of subsets $E$ of $\Omega$ satisfying
$$\nu^*(W) = \nu^*(W \cap E) + \nu^*(W \cap E^c) \quad \text{for all } W \subset \Omega$$
is a $\sigma$-algebra. Members of this $\sigma$-algebra are referred to as $\nu^*$-measurable sets.†

We now state and prove the main result of this section, known as the Riesz-Markov theorem.

THEOREM 9.14 Riesz-Markov Theorem
Let $\Omega$ be a compact Hausdorff space and $\ell$ a nonnegative linear functional on $C(\Omega)$. Then there exists a unique $\mu \in M_+(\Omega)$ such that
$$\ell(f) = \int_\Omega f\,d\mu, \qquad f \in C(\Omega).$$

PROOF: We start by assigning a nonnegative number $\tilde\mu(O)$ to each open set $O$. If $O = \emptyset$, let $\tilde\mu(O) = 0$; otherwise, let
$$\tilde\mu(O) = \sup\{ \ell(f) : 0 \le f \le 1 \text{ and } \operatorname{supp} f \subset O \}.$$
We note that $\tilde\mu(O) \le \tilde\mu(\Omega) = \ell(1)$ for all $O$. Next, for each $A \subset \Omega$, we define
$$\mu^*(A) = \inf\{ \tilde\mu(O) : O \text{ open and } O \supset A \}.$$
Observe that $\mu^*(O) = \tilde\mu(O)$ whenever $O$ is open.

We will show that $\mu^*$ satisfies the hypotheses of Proposition 9.5. Conditions (a)-(c) follow easily from the definition of $\mu^*$. To verify condition (d), we first show that if $\{O_n\}_{n=1}^\infty$ is a sequence of open subsets of $\Omega$, then
$$\tilde\mu\Big(\bigcup_{n=1}^\infty O_n\Big) \le \sum_{n=1}^\infty \tilde\mu(O_n). \tag{9.22}$$

† Proposition 4.13 on page 210 shows that the outer measure $\nu^*$ induced by an appropriate set function on a semialgebra of subsets of a set $\Omega$ satisfies (a)-(d) of Proposition 9.5. Thus the concept of $\nu^*$-measurability given here is the same as that in Definition 4.18 on page 211.
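Let $f$ be a continuous function satisfying $0 \le f \le 1$ and $\operatorname{supp} f \subset \bigcup_{n=1}^\infty O_n$. Applying Theorem 7.15 on page 477 with $K = \operatorname{supp} f$, we obtain continuous functions $f_1, f_2, \dots, f_m$ satisfying
• $0 \le f_j \le 1$ for each $j$,
• $\sum_{j=1}^m f_j(x) = 1$ for $x \in \operatorname{supp} f$,
• … (this condition is illegible in the source), and
• for each $j$, there is an $m_j$ such that $\operatorname{supp} f_j \subset O_{m_j}$.

By replacing $f_j$ by $\sum_{k : m_k = m_j} f_k$ if necessary, we can assume that the $m_j$'s are distinct. It is clear that $f = \sum_{j=1}^m f f_j$ and, so, $\ell(f) = \sum_{j=1}^m \ell(f f_j)$. As $0 \le f f_j \le 1$ and $\operatorname{supp}(f f_j) \subset O_{m_j}$, we have $\ell(f f_j) \le \tilde\mu(O_{m_j})$, and since the $m_j$'s are distinct, it follows that
$$\ell(f) \le \sum_{j=1}^m \tilde\mu(O_{m_j}) \le \sum_{n=1}^\infty \tilde\mu(O_n). \tag{9.23}$$
Taking the supremum over all such $f$ on the left-hand side of (9.23), we obtain (9.22). It is now easy to check that $\mu^*$ satisfies condition (d) of Proposition 9.5, as we ask the reader to verify in Exercise 9.82.

We complete the proof of the theorem by showing successively that
• all open sets are $\mu^*$-measurable,
• $\mu = \mu^*|_{\mathcal{B}(\Omega)}$ is a regular Borel measure, and
• $\ell(f) = \int_\Omega f\,d\mu$ for all $f \in C(\Omega)$.

To show that an arbitrary open set $O$ is $\mu^*$-measurable, it suffices to prove that
$$\mu^*(A) \ge \mu^*(A \cap O) + \mu^*(A \cap O^c) \tag{9.24}$$
for all $A \subset \Omega$. Let $U$ be an open set containing $A$, $f$ a continuous function satisfying $0 \le f \le 1$ and $\operatorname{supp} f \subset U \cap O$, and $V = U \cap (\operatorname{supp} f)^c$. If $g$ is a continuous function satisfying $0 \le g \le 1$ and $\operatorname{supp} g \subset V$, then $f$ and $g$ have disjoint supports, so $0 \le f + g \le 1$, and $\operatorname{supp}(f + g) \subset \operatorname{supp} f \cup \operatorname{supp} g \subset U$. It follows that
$$\tilde\mu(U) \ge \ell(f) + \ell(g). \tag{9.25}$$
Taking the supremum over such $g$ in (9.25), and noting that $V \supset A \cap O^c$ because $\operatorname{supp} f \subset O$, we deduce that
$$\tilde\mu(U) \ge \ell(f) + \tilde\mu(V) \ge \ell(f) + \mu^*(A \cap O^c)$$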
and, therefore, that
$$\tilde\mu(U) \ge \tilde\mu(U \cap O) + \mu^*(A \cap O^c) \ge \mu^*(A \cap O) + \mu^*(A \cap O^c).$$
As the open set $U$ containing $A$ was chosen arbitrarily, (9.24) holds.

Having shown that all open sets are $\mu^*$-measurable, we can invoke Proposition 9.5 and Proposition 4.16 on page 213 to conclude that all Borel sets are $\mu^*$-measurable and $\mu = \mu^*|_{\mathcal{B}(\Omega)}$ is a Borel measure.

To show that $\mu$ is regular, we first observe that, by the definition of $\mu^*$,
$$\mu(B) = \inf\{\mu(O) : O \text{ open and } O \supset B\}, \qquad B \in \mathcal{B}(\Omega). \tag{9.26}$$
Because $\mu(\Omega) = \ell(1) < \infty$, we have for each $B \in \mathcal{B}(\Omega)$ that
$$\begin{aligned} \mu(B) &= \mu(\Omega) - \mu(B^c) = \mu(\Omega) - \inf\{\mu(W) : W \text{ open}, W \supset B^c\} \\ &= \sup\{\mu(W^c) : W \text{ open}, W \supset B^c\} \\ &= \sup\{\mu(F) : F \text{ closed}, F \subset B\}. \end{aligned} \tag{9.27}$$
It follows at once from (9.26) and (9.27) that $\mu$ is regular.

Finally, we must show that
$$\ell(f) = \int_\Omega f\,d\mu, \qquad f \in C(\Omega). \tag{9.28}$$
Every function in $C(\Omega)$ is a linear combination of functions with values in the interval $[0,1)$. Therefore, by the linearity of $\ell$, it suffices to establish (9.28) in case $0 \le f < 1$.

Let $n \in \mathcal{N}$. For each integer $k$, $0 \le k \le n$, the sets $F_k = f^{-1}([k/n, \infty))$ and $U_k = f^{-1}(((k-1)/n, \infty))$ are closed and open, respectively; note that $F_n = \emptyset$ because $f < 1$. Moreover, we have
$$F_{k+1} \subset U_{k+1} \subset F_k \subset U_k \quad\text{and}\quad \Omega = \bigcup_{k=0}^{n-1} (F_k \setminus F_{k+1}).$$
If $F_k = \emptyset$, we set $g_k = 0$. Otherwise, we first invoke the regularity of $\mu$ to choose an open set $V_k$ such that $F_k \subset V_k \subset U_k$ and $\sum_{k=0}^{n-1} \mu(V_k \setminus F_k) < 1$, and then apply Proposition 7.14 on page 449 and Urysohn's lemma on page 450 to obtain a continuous function $g_k$ such that $0 \le g_k \le 1$, $g_k(F_k) = \{1\}$, and $\operatorname{supp} g_k \subset V_k$.
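Let $h = (1/n) \sum_{j=0}^{n-1} g_j$. We claim that $f \le h$. For each $x \in \Omega$, choose the unique $k$ such that $x \in f^{-1}([k/n, (k+1)/n)) = F_k \setminus F_{k+1}$. If $0 \le j \le k$, then $g_j(x) = 1$ because $F_k \subset F_j$; if $j \ge k + 2$, then $g_j(x) = 0$ because $x \in F_{k+1}^c \subset U_{k+2}^c \subset U_j^c \subset V_j^c$. It follows that $h(x) = (k+1)/n + g_{k+1}(x)/n > f(x)$ (with $g_n$ interpreted as $0$), as required.

Using the fact that $f \le h$ and the nonnegativity of $\ell$, we obtain
$$\begin{aligned} \ell(f) &\le \ell(h) = \frac1n \sum_{j=0}^{n-1} \ell(g_j) \le \frac1n \sum_{j=0}^{n-1} \mu(V_j) \\ &= \frac1n \sum_{j=0}^{n-1} \big(\mu(V_j \setminus F_j) + \mu(F_j)\big) < \frac1n + \frac1n \sum_{j=0}^{n-1} \mu(F_j). \end{aligned} \tag{9.29}$$
For $j = 0, 1, \dots, n-1$, we can write $F_j = \bigcup_{k=j}^{n-1} (F_k \setminus F_{k+1})$ and, therefore,
$$\mu(F_j) = \sum_{k=j}^{n-1} \mu(F_k \setminus F_{k+1}).$$
Applying (9.29), we get
$$\begin{aligned} \ell(f) &< \frac1n + \frac1n \sum_{j=0}^{n-1} \sum_{k=j}^{n-1} \mu(F_k \setminus F_{k+1}) = \frac1n + \frac1n \sum_{k=0}^{n-1} (k+1)\,\mu(F_k \setminus F_{k+1}) \\ &= \frac1n + \frac{\mu(\Omega)}{n} + \sum_{k=0}^{n-1} \frac kn\,\mu(F_k \setminus F_{k+1}) \\ &= \frac1n + \frac{\ell(1)}{n} + \int_\Omega \sum_{k=0}^{n-1} \frac kn\,\chi_{F_k \setminus F_{k+1}}\,d\mu \\ &\le \frac{1 + \ell(1)}{n} + \int_\Omega f\,d\mu. \end{aligned}$$
Because $n$ was chosen arbitrarily, it follows that
$$\ell(f) \le \int_\Omega f\,d\mu. \tag{9.30}$$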
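The discretization behind (9.29) and (9.30) can be checked numerically in a simple case. The sketch below is only an illustration under assumptions of our own choosing: $\Omega = [0,1]$, $\mu$ is Lebesgue measure, and $f(x) = 0.9x^2$ (so $0 \le f < 1$ and $F_n = \emptyset$). It computes the step-function sum $\sum_k (k/n)\,\mu(F_k \setminus F_{k+1}) = (1/n)\sum_{k=1}^{n-1}\mu(F_k)$ and the same sum plus $\mu(\Omega)/n$, which bracket $\int f\,d\mu$ exactly as in the proof.

```python
import math

C = 0.9  # f(x) = C * x**2 on [0, 1]; C < 1 keeps f < 1, as the proof requires


def mu_F(k, n):
    """Lebesgue measure of F_k = {x in [0,1] : f(x) >= k/n}."""
    t = k / n
    if t > C:
        return 0.0                  # level k/n exceeds max f, so F_k is empty
    return 1.0 - math.sqrt(t / C)   # F_k = [sqrt(t/C), 1]


def layer_sums(n):
    """Return (lower, upper) with lower <= integral of f <= upper.

    lower = sum_k (k/n) mu(F_k minus F_{k+1}) = (1/n) sum_{k=1}^{n-1} mu(F_k)
    upper = lower + mu(Omega)/n, and mu(Omega) = 1 here.
    """
    lower = sum(mu_F(k, n) for k in range(1, n)) / n
    return lower, lower + 1.0 / n


exact = C / 3  # integral of 0.9 x^2 over [0, 1]
for n in (10, 100, 1000):
    lo, hi = layer_sums(n)
    print(n, lo, exact, hi)
```

The gap between the two bounds is exactly $\mu(\Omega)/n$, which is the term $(1 + \ell(1))/n$ (up to the extra $1/n$ from the choice of the $V_k$) that is sent to zero in the proof.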
We can replace $f$ by $(1-f)/2$ in (9.30) and multiply through by 2 to get
$$\ell(1) - \ell(f) \le \mu(\Omega) - \int_\Omega f\,d\mu = \ell(1) - \int_\Omega f\,d\mu.$$
Thus (9.28) holds. It remains only to prove the uniqueness of $\mu$, which we leave to the reader as Exercise 9.83.

EXERCISES 9.5
9.66 Let $\Omega$ be a locally compact Hausdorff space. Show that $(\mathcal{M}(\Omega), \|\ \|)$ and $(\mathcal{M}_r(\Omega), \|\ \|)$ are Banach spaces, where $\|\mu\| = |\mu|(\Omega)$.
9.67 Let $\Omega$ be a locally compact Hausdorff space. Show that if $\mu \in \mathcal{M}(\Omega)$, then $|\mu| \in \mathcal{M}(\Omega)$.
★9.68 In this exercise, you are asked, among other things, to verify the statement of Example 9.16(b).
a) Prove that if a locally compact metric space $\Omega$ is the countable union of compact subsets, then every complex Borel measure on $\Omega$ is regular.
b) Show that the Lebesgue-Stieltjes measure associated with a distribution function on $\mathcal{R}$ is a regular Borel measure.
9.69 Suppose $\Omega$ is locally compact and $\mu \in \mathcal{M}_+(\Omega)$. Prove that $C_b(\Omega)$ is dense in $\mathcal{L}^p(\mu)$ for $1 \le p < \infty$.
9.70 Let $\mu \in \mathcal{M}([0,1])$ satisfy
$$\int_{[0,1]} x^n\,d\mu(x) = 0$$
for $n = 0, 1, 2, \dots$. Show that $\mu = 0$, that is, $\mu$ vanishes identically.
9.71 Suppose $\Omega$ is a locally compact Hausdorff space. Let $x \in \Omega$ and $\delta_x$ be defined on $\mathcal{B}(\Omega)$ by
$$\delta_x(B) = \begin{cases} 1, & \text{if } x \in B, \\ 0, & \text{if } x \notin B. \end{cases}$$
a) Show that $\delta_x$ is a regular Borel measure.
b) Determine $\int_\Omega f\,d\delta_x$ when $f \in C(\Omega)$.
9.72 Let $\delta_x$ be as in Exercise 9.71.
a) Show that $\|\delta_x - \delta_y\| = 2$ when $x \ne y$.
b) Deduce that $\mathcal{M}(\Omega)$ is not separable if $\Omega$ is uncountable.
9.73 Show how to identify $\mathcal{M}(\Omega)$ and $\ell^1(\Omega)$ when $\Omega$ is countable.
9.74 Let $\Omega$ be a locally compact Hausdorff space, $\mu \in \mathcal{M}(\Omega)$, and $B \in \mathcal{B}(\Omega)$. Prove that there are sets $F$ and $G$ such that $G$ is a countable intersection
of open sets, $F$ is a countable union of closed sets, $F \subset B \subset G$, and $|\mu|(G \setminus F) = 0$.
9.75 Suppose that $\Omega$ is a compact Hausdorff space and that $\mu \in \mathcal{M}(\Omega)$. Show that $\ell_\mu(f) = \int_\Omega f\,d\mu$ defines a nonnegative linear functional on $C(\Omega)$ if and only if $\mu \in \mathcal{M}_+(\Omega)$.
Exercises 9.76-9.81 provide the proof of Theorem 9.13 on page 566.
9.76 Show that if $\ell$ is a nonnegative linear functional on $C(\Omega)$, then $\ell \in C(\Omega)^*$ and $\|\ell\|_* = \ell(1)$.
9.77 Suppose that $\ell$ satisfies the hypotheses of part (b) of Theorem 9.13. For each nonnegative continuous function $f$ on $\Omega$, let
$$\ell_+(f) = \sup\{\ell(g) : 0 \le g \le f,\ g \text{ continuous}\}.$$
a) Show that if $f_1$ and $f_2$ are nonnegative and continuous, then $\ell_+(f_1 + f_2) = \ell_+(f_1) + \ell_+(f_2)$.
b) Show that $0 \le f \le g$ implies $\ell_+(f) \le \ell_+(g)$.
c) Show that $\ell_+(af) = a\ell_+(f)$ whenever $f \ge 0$ and $a$ is a nonnegative real number.
9.78 Extend the function $\ell_+$ defined in Exercise 9.77 to all of $C(\Omega, \mathcal{R})$ by the formula
$$\ell_+(f) = \ell_+(\|f\| + f) - \ell_+(\|f\|),$$
where $\|f\| = \|f\|_\Omega$.
a) Prove that this new definition of $\ell_+(f)$ agrees with the old one when $f$ is nonnegative.
b) Show that this extended $\ell_+$ is linear on the space $C(\Omega, \mathcal{R})$.
9.79 Extend the function $\ell_+$ defined in Exercise 9.78 to all of $C(\Omega)$ by the formula
$$\ell_+(f) = \ell_+(\Re f) + i\,\ell_+(\Im f).$$
a) Prove that this new definition of $\ell_+(f)$ agrees with the old one when $f$ is real valued.
b) Show that this extended function is linear and nonnegative.
9.80 Suppose that $\ell$ satisfies the hypotheses of part (b) of Theorem 9.13. Let $\ell_- = \ell_+ - \ell$, where $\ell_+$ is defined as in Exercise 9.79. Show that $\ell_-$ is nonnegative.
9.81 Suppose that $\ell$ satisfies the hypotheses of part (b) of Theorem 9.13. Let $\ell_+$ and $\ell_-$ be defined as in Exercise 9.80. Show that $\|\ell\|_* = \ell_+(1) + \ell_-(1)$. Hint: If $0 \le g \le 1$, then $\|2g - 1\| \le 1$ and, so, $\|\ell\|_* \ge 2\ell(g) - \ell(1)$.
9.82 Show that the set function $\mu^*$ defined in the proof of Theorem 9.14 satisfies condition (d) of Proposition 9.5.
9.83 Prove the uniqueness part of Theorem 9.14.
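The construction of $\ell_+$ in Exercises 9.77-9.81 becomes very concrete when $\Omega$ is finite. The sketch below assumes (our choice, for illustration only) that $\Omega = \{0, 1, 2, 3\}$ with the discrete topology, so $C(\Omega, \mathcal{R}) = \mathcal{R}^4$ and $\ell(f) = \sum_i c_i f_i$ for a real vector $c$. It evaluates $\ell_+(f) = \sup\{\ell(g) : 0 \le g \le f\}$ by brute force over the vertices of the box $0 \le g \le f$ and checks the identity $\|\ell\|_* = \ell_+(1) + \ell_-(1)$ of Theorem 9.13(b).

```python
from itertools import product


def ell(c, f):
    """The functional l(f) = sum_i c_i f_i on C(Omega) = R^m (Omega finite, discrete)."""
    return sum(ci * fi for ci, fi in zip(c, f))


def ell_plus(c, f):
    """l_+(f) = sup{ l(g) : 0 <= g <= f } for f >= 0, by brute force.

    l(g) is linear in g and {g : 0 <= g <= f} is a box, so the supremum is
    attained at a vertex, i.e. with each g_i equal to 0 or f_i.
    """
    assert all(fi >= 0 for fi in f)
    return max(ell(c, [fi if keep else 0.0 for fi, keep in zip(f, choice)])
               for choice in product((False, True), repeat=len(f)))


def dual_norm(c):
    """||l||_* = sup{|l(f)| : |f_i| <= 1}; for real c it is attained at f_i = +-1."""
    return max(abs(ell(c, s)) for s in product((-1.0, 1.0), repeat=len(c)))


c = [0.5, -1.25, 2.0, -0.25]
one = [1.0] * len(c)
lp = ell_plus(c, one)          # l_+(1)
lm = lp - ell(c, one)          # l_-(1) = l_+(1) - l(1)
print(lp, lm, dual_norm(c))    # l_+(1) + l_-(1) equals ||l||_*
```

In this finite setting $\ell_+(f) = \sum_i \max(c_i, 0) f_i$ and $\ell_-(f) = \sum_i \max(-c_i, 0) f_i$, which is the Jordan decomposition of the measure $\sum_i c_i\delta_i$.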
9.6 The Dual Spaces of C(Ω) and C0(Ω) □ 573

9.6 THE DUAL SPACES OF C(Ω) AND C0(Ω)
In this section, we extend the Riesz-Markov theorem (page 567) to arbitrary bounded linear functionals on $C(\Omega)$. We will also characterize the bounded linear functionals on $C_0(\Omega)$ when $\Omega$ is a locally compact Hausdorff space. These results show that we are justified in writing $C(\Omega)^* = \mathcal{M}(\Omega)$ and $C_0(\Omega)^* = \mathcal{M}(\Omega)$ in the compact and locally compact cases, respectively.

LEMMA 9.1
Suppose that $\Omega$ is a compact Hausdorff space and $\mu \in \mathcal{M}(\Omega)$. Further suppose that $\phi$ is a complex-valued Borel measurable function such that $|\phi| \le 1$ $|\mu|$-ae. Then there is a sequence $\{f_n\}_{n=1}^\infty$ of continuous functions such that $\|f_n\|_\Omega \le 1$ for each $n$ and $\int_\Omega |f_n - \phi|\,d|\mu| \to 0$ as $n \to \infty$.

PROOF: By Exercise 9.60 on page 562, we can choose a sequence $\{\phi_n\}_{n=1}^\infty$ of Borel measurable simple functions such that $|\phi_n| \le 1$ $|\mu|$-ae for all $n$ and $\lim_{n\to\infty} \phi_n = \phi$ $|\mu|$-ae. Applying the dominated convergence theorem, we get that
$$\lim_{n\to\infty} \int_\Omega |\phi_n - \phi|\,d|\mu| = 0. \tag{9.31}$$
Let $n \in \mathcal{N}$ be fixed but arbitrary. We can write $\phi_n = \sum_{k=1}^m \alpha_k \chi_{E_k}$, where $|\alpha_k| \le 1$ for each $k$ and the $E_k$'s are pairwise disjoint Borel sets whose union is $\Omega$. Using the regularity of $\mu$, we can find compact sets $F_k \subset E_k$ such that $|\mu|(E_k \setminus F_k) < 1/(nm)$ for $k = 1, 2, \dots, m$.

For each $k$, we can write $\alpha_k = |\alpha_k| e^{i\theta_k}$, where $\theta_k \in [0, 2\pi)$. If $x \in F_k$, define $u_0(x) = |\alpha_k|$ and $v_0(x) = \theta_k$. Since the $F_k$'s are pairwise disjoint and closed, the functions $u_0$ and $v_0$ are well defined and continuous on $\bigcup_{k=1}^m F_k$ and, furthermore, $|u_0| \le 1$. By Tietze's extension theorem (page 451), we can extend $u_0$ and $v_0$ to continuous real-valued functions $u$ and $v$ on all of $\Omega$ such that $|u| \le 1$. Let $f_n = u e^{iv}$. Then $f_n = \phi_n$ on $\bigcup_{k=1}^m F_k$ and $\|f_n\|_\Omega \le 1$. Moreover,
$$\begin{aligned} \int_\Omega |\phi_n - f_n|\,d|\mu| &= \sum_{k=1}^m \int_{E_k \setminus F_k} |\alpha_k - f_n|\,d|\mu| \le \sum_{k=1}^m \int_{E_k \setminus F_k} \big(|\alpha_k| + |f_n|\big)\,d|\mu| \\ &\le \sum_{k=1}^m 2|\mu|(E_k \setminus F_k) < \sum_{k=1}^m \frac{2}{mn} = \frac2n. \end{aligned} \tag{9.32}$$
It follows from (9.31) and (9.32) that $\lim_{n\to\infty} \int_\Omega |f_n - \phi|\,d|\mu| = 0$.
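The conclusion of Lemma 9.1 can be seen in a hand-built example. The sketch below does not follow the proof's route through simple functions and Tietze's theorem; it simply assumes $\mu$ is Lebesgue measure on $[0,1]$ and $\phi = 1$ on $[0, 1/2)$, $\phi = -1$ on $[1/2, 1]$, and uses a continuous linear ramp $f_n$ with $|f_n| \le 1$. The exact error is $\int_0^1 |f_n - \phi|\,dx = 1/n$, which the midpoint rule reproduces.

```python
def phi(x):
    """A unimodular Borel function on [0,1] with a jump: +1 on [0,1/2), -1 on [1/2,1]."""
    return 1.0 if x < 0.5 else -1.0


def f_n(x, n):
    """Continuous approximant with |f_n| <= 1: equals phi except on [1/2 - 1/n, 1/2],
    where it ramps linearly from +1 down to -1."""
    a = 0.5 - 1.0 / n
    if x <= a:
        return 1.0
    if x >= 0.5:
        return -1.0
    return 1.0 - 2.0 * n * (x - a)   # 1 at x = a, -1 at x = 1/2


def l1_error(n, samples=100_000):
    """Midpoint-rule approximation of the integral of |f_n - phi| over [0,1]."""
    h = 1.0 / samples
    return h * sum(abs(f_n((i + 0.5) * h, n) - phi((i + 0.5) * h))
                   for i in range(samples))


for n in (4, 10, 50):
    print(n, l1_error(n), 1.0 / n)
```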
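THEOREM 9.15 Riesz Representation Theorem
Let $\Omega$ be a compact Hausdorff space. Then $\ell \in C(\Omega)^*$ if and only if there exists a $\mu \in \mathcal{M}(\Omega)$ such that
$$\ell(f) = \int_\Omega f\,d\mu, \qquad f \in C(\Omega). \tag{9.33}$$
Furthermore, the measure $\mu$ is unique and satisfies
$$\|\ell\|_* = |\mu|(\Omega). \tag{9.34}$$

PROOF: In the penultimate paragraph before Definition 9.12, we showed that each $\mu \in \mathcal{M}(\Omega)$ induces a bounded linear functional on $C(\Omega)$ via the relation (9.33).

Conversely, suppose that $\ell \in C(\Omega)^*$. Define
$$\ell_{\mathrm{re}}(f) = \tfrac12\big(\ell(f) + \overline{\ell(\bar f)}\big) \quad\text{and}\quad \ell_{\mathrm{im}}(f) = \tfrac{1}{2i}\big(\ell(f) - \overline{\ell(\bar f)}\big).$$
Then $\ell_{\mathrm{re}}$ and $\ell_{\mathrm{im}}$ satisfy $\ell = \ell_{\mathrm{re}} + i\ell_{\mathrm{im}}$ and the hypotheses of Theorem 9.13(b) on page 566. Therefore, by the Riesz-Markov theorem, there are measures $\mu_1, \mu_2, \mu_3, \mu_4 \in \mathcal{M}_+(\Omega)$ such that
$$\ell_{\mathrm{re}}(f) = \int_\Omega f\,d\mu_1 - \int_\Omega f\,d\mu_2 \quad\text{and}\quad \ell_{\mathrm{im}}(f) = \int_\Omega f\,d\mu_3 - \int_\Omega f\,d\mu_4$$
for all $f \in C(\Omega)$. Thus, the measure $\mu = \mu_1 - \mu_2 + i(\mu_3 - \mu_4)$ belongs to $\mathcal{M}(\Omega)$ and satisfies (9.33).

To verify (9.34), we note first that
$$\|\ell\|_* = \sup\{|\ell(f)| : \|f\|_\Omega \le 1\} = \sup\Big\{\Big|\int_\Omega f\,d\mu\Big| : \|f\|_\Omega \le 1\Big\} \le \sup\{\|f\|_\Omega\,|\mu|(\Omega) : \|f\|_\Omega \le 1\} = |\mu|(\Omega).$$
To prove the reverse inequality, we use Exercise 6.117 on page 387 to obtain a Borel measurable complex-valued function $\phi$ such that $|\phi| = 1$ $|\mu|$-ae and $\int_\Omega v\,d\mu = \int_\Omega v\phi\,d|\mu|$ for all $v \in \mathcal{L}^1(|\mu|)$. Applying Lemma 9.1 to $\bar\phi$, we choose a sequence $\{f_n\}_{n=1}^\infty$ of continuous functions such that $\|f_n\|_\Omega \le 1$ and $\int_\Omega |f_n - \bar\phi|\,d|\mu| \to 0$. We have
$$\Big|\int_\Omega f_n\,d\mu - |\mu|(\Omega)\Big| = \Big|\int_\Omega \phi\,(f_n - \bar\phi)\,d|\mu|\Big| \le \int_\Omega |f_n - \bar\phi|\,d|\mu|.$$
It follows that $|\mu|(\Omega) \le \|\ell\|_*$ and, hence, (9.34) holds. The proof of uniqueness is left to the reader as Exercise 9.84.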
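The norm identity (9.34) and the choice of near-optimal test functions in its proof can be checked on a finite space. The sketch assumes, for illustration only, that $\Omega = \{0, 1, 2, 3\}$ with the discrete topology and $\mu = \sum_i c_i\delta_i$ for complex weights $c_i$. Then $\phi = d\mu/d|\mu|$ is $c_i/|c_i|$, the continuous function $f = \bar\phi$ attains $\ell(f) = |\mu|(\Omega) = \sum_i |c_i|$ exactly, and random $f$ with $|f| \le 1$ never exceed it.

```python
import cmath
import random

c = [1 + 2j, -0.5j, 3 + 0j, -1 - 1j]  # mu = sum_i c[i] * delta_i on Omega = {0,1,2,3}


def ell(f):
    """l(f) = integral of f d mu = sum_i f(i) c_i."""
    return sum(ci * fi for ci, fi in zip(c, f))


total_variation = sum(abs(ci) for ci in c)   # |mu|(Omega)

# phi = d mu / d|mu| has phi(i) = c_i / |c_i|; f = conj(phi) has sup norm 1
f_best = [(ci / abs(ci)).conjugate() for ci in c]
print(ell(f_best), total_variation)

# any f with |f| <= 1 gives |l(f)| <= |mu|(Omega)
rng = random.Random(0)
for _ in range(1000):
    f = [cmath.rect(rng.random(), rng.uniform(0, 2 * cmath.pi)) for _ in c]
    assert abs(ell(f)) <= total_variation + 1e-12
```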
The Case Ω Locally Compact
Next we extend Theorem 9.15 to locally compact, noncompact Hausdorff spaces. In this case, we work with $C_0(\Omega)$ rather than $C(\Omega)$ because $\|\ \|_\Omega$ is no longer a norm on $C(\Omega)$.

THEOREM 9.16 Riesz Representation Theorem
Let $\Omega$ be a locally compact, noncompact Hausdorff space. Then $\ell \in C_0(\Omega)^*$ if and only if there exists a $\mu \in \mathcal{M}(\Omega)$ such that
$$\ell(f) = \int_\Omega f\,d\mu, \qquad f \in C_0(\Omega). \tag{9.35}$$
Furthermore, the measure $\mu$ is unique and satisfies $\|\ell\|_* = |\mu|(\Omega)$.

PROOF: Let $\ell \in C_0(\Omega)^*$. We will prove the existence of the measure $\mu$ satisfying (9.35), leaving the proofs of the remainder of the assertions to the reader as Exercise 9.86.

Let $\Omega^* = \Omega \cup \{\omega\}$ denote the one-point compactification of $\Omega$, as described in Theorem 7.17 on page 480, and define the function $L$ on $C(\Omega^*)$ by $L(g) = \ell(g|_\Omega - g(\omega))$. Clearly $L$ is linear. That it is also bounded follows from
$$|L(g)| = |\ell(g|_\Omega - g(\omega))| \le \|\ell\|_*\,\|g|_\Omega - g(\omega)\|_\Omega \le 2\|\ell\|_*\,\|g\|_{\Omega^*}.$$
Hence, by Theorem 9.15, there is a measure $\mu^* \in \mathcal{M}(\Omega^*)$ such that
$$L(g) = \int_{\Omega^*} g\,d\mu^*, \qquad g \in C(\Omega^*).$$
Letting $\mu = \mu^*|_{\mathcal{B}(\Omega)}$, we obtain
$$L(g) = \int_\Omega g\,d\mu + \mu^*(\{\omega\})\,g(\omega), \qquad g \in C(\Omega^*). \tag{9.36}$$
Now let $f \in C_0(\Omega)$. By defining $f^*(x) = f(x)$ for $x \in \Omega$ and $f^*(\omega) = 0$, we can extend $f$ to a function $f^* \in C(\Omega^*)$ having the same norm; indeed, $C_0(\Omega)$ is the collection of restrictions to $\Omega$ of functions in $C(\Omega^*)$ that vanish at $\omega$. We have by (9.36) that $\ell(f) = L(f^*) = \int_\Omega f\,d\mu$. The regularity of $\mu$ follows from Exercise 9.85.
Two simple but instructive illustrations of Theorem 9.16 are provided in Example 9.17. In the next chapter, we will see more elaborate applications of the results of this section.

EXAMPLE 9.17 Illustrates Theorem 9.16
a) When it is given the discrete topology, the set of positive integers $\mathcal{N}$ becomes a locally compact space. $C_0(\mathcal{N})$ is simply the collection of all sequences $\{a_n\}_{n=1}^\infty$ of complex numbers such that $\lim_{n\to\infty} a_n = 0$. By Exercise 9.73, we can identify $\mathcal{M}(\mathcal{N})$ with $\ell^1(\mathcal{N})$ and so we can write $C_0(\mathcal{N})^* = \ell^1(\mathcal{N})$. It follows from Theorem 9.16 that the bounded linear functionals on $C_0(\mathcal{N})$ are of the form $\ell(a) = \sum_{n=1}^\infty a_n b_n$ for some $b \in \ell^1(\mathcal{N})$ and, furthermore, that $\|\ell\|_* = \sum_{n=1}^\infty |b_n|$.
b) Let $\Omega$ be a locally compact Hausdorff space and $x_0 \in \Omega$. Define the function $\ell$ on $C_0(\Omega)$ by $\ell(f) = f(x_0)$. Clearly, $\ell \in C_0(\Omega)^*$ and $\|\ell\|_* \le 1$. Since $f(x_0) = \int_\Omega f\,d\delta_{x_0}$, it follows from the uniqueness part of Theorem 9.16 that $\mu = \delta_{x_0}$. Moreover, $\|\ell\|_* = |\delta_{x_0}|(\Omega) = \delta_{x_0}(\Omega) = 1$. □

EXERCISES 9.6
9.84 Verify the uniqueness assertion in Theorem 9.15.
9.85 Let $\Omega$ be a locally compact, noncompact Hausdorff space and $\Omega^* = \Omega \cup \{\omega\}$ its one-point compactification.
a) Show that $\mathcal{B}(\Omega) \subset \mathcal{B}(\Omega^*)$.
b) Show that $\mu \in \mathcal{M}(\Omega)$ if and only if there exists $\mu^* \in \mathcal{M}(\Omega^*)$ such that $\mu^*(B) = \mu(B)$ for all $B \in \mathcal{B}(\Omega)$.
9.86 Verify the assertions in Theorem 9.16 that we did not prove.
9.87 Refer to Exercises 7.149 and 7.150 on page 475. Let $\Omega$ be a compact Hausdorff space, $g$ a lower-semicontinuous function on $\Omega$, and $\mu \in \mathcal{M}_+(\Omega)$. Prove that $\int_\Omega g\,d\mu = \sup\{\int_\Omega f\,d\mu : f \in C(\Omega) \text{ and } f \le g\}$.
9.88 Let $\Omega$ and $\Lambda$ be compact Hausdorff spaces, $\mu \in \mathcal{M}(\Omega)$, and $G\colon \Omega \to \Lambda$ be continuous.
a) Show that there is a measure $\nu \in \mathcal{M}(\Lambda)$ such that
$$\int_\Lambda f\,d\nu = \int_\Omega f \circ G\,d\mu$$
for all $f \in C(\Lambda)$.
b) Verify that $\nu = \mu \circ G^{-1}$, the measure induced by $\mu$ and $G$.
9.89 Define the linear functional $\ell$ on $C([0,1] \times [0,1])$ by $\ell(f) = \int_0^1 f(x,x)\,dx$. Describe explicitly the measure $\mu$ that satisfies $\ell(f) = \int f\,d\mu$. Hint: Refer to Exercise 9.88.
9.90 In Exercise 4.158 on page 256, we defined the convolution product of two nonnegative $\sigma$-finite Borel measures on $\mathcal{R}$. An alternative definition that holds for any two (complex) Borel measures on $\mathcal{R}$ is given as follows. For
$\mu, \nu \in \mathcal{M}(\mathcal{R})$, define the convolution product of $\mu$ and $\nu$ to be the unique measure $\mu * \nu \in \mathcal{M}(\mathcal{R})$ satisfying
$$\int_{\mathcal{R}} f\,d(\mu * \nu) = \int_{\mathcal{R}} \int_{\mathcal{R}} f(x + y)\,d\mu(x)\,d\nu(y), \qquad f \in C_0(\mathcal{R}).$$
Show that for $\mu, \nu \in \mathcal{M}_+(\mathcal{R})$, this definition agrees with the one given in Exercise 4.158(d) on page 256.
9.91 Refer to Exercise 9.90. For $\mu \in \mathcal{M}(\mathcal{R})$, find $\mu * \delta_0$.
9.92 Let $\Omega$ be a locally compact Hausdorff space and $\nu \in \mathcal{M}_+(\Omega)$. Denote by $AC(\nu)$ the collection of measures in $\mathcal{M}(\Omega)$ that are absolutely continuous with respect to $\nu$. Prove that $AC(\nu)$ is a closed subspace of $\mathcal{M}(\Omega)$.
9.93 Refer to Exercise 9.92. Show that $\mathcal{L}^1(\nu)$ is isometrically isomorphic to $AC(\nu)$ via the correspondence $f \mapsto \nu_f$, where $\nu_f(E) = \int_E f\,d\nu$.
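Example 9.17(a) identifies $C_0(\mathcal{N})^*$ with $\ell^1(\mathcal{N})$ and gives $\|\ell\|_* = \sum_n |b_n|$. The sketch below takes the sample choice $b_n = (-1)^n/2^n$ (our assumption; then $\|b\|_1 = 1$) and evaluates $\ell$ on the truncated sign sequences $a^{(N)}$, which lie in $C_0(\mathcal{N})$ and have sup norm 1. Exact rational arithmetic shows $\ell(a^{(N)}) = 1 - 2^{-N}$, so the norm is approached but, since every $b_n \ne 0$, attaining it would require $|a_n| = 1$ for all $n$, which no element of $C_0(\mathcal{N})$ satisfies.

```python
from fractions import Fraction


def b(n):
    """Coefficients of l(a) = sum_n a_n b_n; here b_n = (-1)**n / 2**n, an element of l^1."""
    return Fraction((-1) ** n, 2 ** n)


def ell_truncated(a, N):
    """l(a) for a sequence a in c_0 that vanishes after index N."""
    return sum(a(n) * b(n) for n in range(1, N + 1))


def sign_seq(N):
    """a_n = sign(b_n) for n <= N and 0 afterwards: an element of c_0 with sup norm 1."""
    return lambda n: (1 if b(n) > 0 else -1) if n <= N else 0


for N in (1, 5, 20):
    print(N, ell_truncated(sign_seq(N), N))   # 1 - 2**-N, increasing to ||b||_1 = 1
```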
Stefan Banach (1892-1945)

Stefan Banach was born in Krakow, Poland, on March 30, 1892, the son of a railway official. His parents gave Banach to a woman who gave him her name. Banach graduated from high school in Krakow in 1910. He supported himself by tutoring for the last three years.

Although Banach attended the University of Lvov, he was awarded his doctorate in mathematics in 1919 under the unusual circumstance of not completing a university education. Banach's thesis, "Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales," was published in Fundamenta Mathematicae in 1922.

A professor at the University of Lvov from 1927, Banach was also a member of the Polish and Ukrainian Academies of Science. With his friend H. Steinhaus he founded the journal Studia Mathematica. Through his writings and through his students, many of whom, like S. Mazur, W. Orlicz, J. Schauder, and S. Ulam, became notable researchers, Banach exerted enormous influence on mathematics. In his classic monograph, "Théorie des opérations linéaires," he laid the foundations of modern functional analysis. He also made important contributions to the theory of measure and integration, to orthogonal series, and to general topology.

During the Nazi occupation of Lvov (1941-1944), Banach was forced to work in a German infectious disease institute where his health was broken. He died in Lvov less than a year later on August 31, 1945.
□ 10 Basic Theory of Normed and Locally Convex Spaces

In this chapter, we will develop the basic theory of normed spaces. We will also present results on locally convex spaces, spaces that include the normed spaces as well as interesting spaces like $C(\Omega)$, where $\Omega$ is locally compact but not compact.

Section 10.1 discusses the Hahn-Banach theorem and some of its most important consequences. In Section 10.2, we investigate linear transformations of Banach spaces. Section 10.3 introduces locally convex spaces and examines some of their fundamental properties. Section 10.4 discusses locally convex topologies on normed spaces and their duals. And, in Section 10.6, we present the Krein-Milman theorem, a result about compact convex subsets of locally convex spaces.

10.1 THE HAHN-BANACH THEOREM
We begin by introducing notation for a normed space that is suggested by the duality theory of Hilbert spaces: each bounded linear functional $\ell$ on a Hilbert space $\mathcal{H}$ is of the form $\ell(x) = \langle x, y\rangle$ for some $y \in \mathcal{H}$. (See Theorem 9.8 on page 551.)
580 □ Chapter 10 Basic Theory of Normed and Locally Convex Spaces

Let $(\Omega, \|\ \|)$ be a normed space. For $x \in \Omega$ and $x^* \in \Omega^*$, we define $\langle x, x^*\rangle = x^*(x)$. And when $A \subset \Omega$, we let
$$A^\perp = \{x^* \in \Omega^* : \langle x, x^*\rangle = 0 \text{ for all } x \in A\}.$$
As the reader is asked to verify in Exercise 10.1, $A^\perp$ is a closed linear subspace of $\Omega^*$.

The notation we have just introduced and Theorem 9.3 on page 542 suggest the following conjecture: Let $K$ be a closed linear subspace of the normed space $\Omega$ and let $x \in \Omega$. Then there is an $x_0^* \in K^\perp$ such that $\|x_0^*\|_* \le 1$ and
$$\rho(x, K) = \inf\{\|x - y\| : y \in K\} = \sup\{|\langle x, x^*\rangle| : x^* \in K^\perp \text{ and } \|x^*\|_* \le 1\} = \langle x, x_0^*\rangle.$$
Verifying this conjecture depends on being able to extend a linear functional on $\operatorname{span}(\{x\} \cup K)$ to all of $\Omega$ without increasing its norm. This requires a fundamental result that is the main topic of this section, namely, the Hahn-Banach theorem.

THEOREM 10.1 Hahn-Banach Theorem
Let $V$ be a linear space with real scalars and $p$ a real-valued function on $V$ such that
$$p(u + v) \le p(u) + p(v), \qquad u, v \in V$$
and
$$p(\alpha v) = \alpha p(v), \qquad v \in V,\ \alpha \ge 0.$$
Suppose $V_0$ is a linear subspace of $V$ and $\ell_0$ is a linear functional on $V_0$ such that
$$\ell_0(v) \le p(v), \qquad v \in V_0.$$
Then there exists a linear functional $\ell$ on $V$ such that $\ell(v) = \ell_0(v)$ for each $v \in V_0$ and $\ell(u) \le p(u)$ for each $u \in V$.
10.1 The Hahn-Banach Theorem □ 581

PROOF: If $V_0 = V$, there is nothing to prove. So, assume $V_0$ is a proper subspace of $V$. We begin by enlarging only slightly the domain of the functional $\ell_0$. Let $v_1 \in V_0^c$ and consider the linear subspace
$$V_1 = \{\alpha v_1 + v : \alpha \in \mathcal{R},\ v \in V_0\}.$$
To define a linear functional $\ell_1$ on $V_1$ that agrees with $\ell_0$ on $V_0$ and satisfies
$$\ell_1(\alpha v_1 + v) \le p(\alpha v_1 + v), \qquad \alpha \in \mathcal{R},\ v \in V_0, \tag{10.1}$$
it suffices to assign a value $\beta$ to $\ell_1(v_1)$ such that
$$\alpha\beta \le p(\alpha v_1 + v) - \ell_0(v), \qquad \alpha \in \mathcal{R},\ v \in V_0. \tag{10.2}$$
Indeed, if we can find such a $\beta$, then the mapping $\ell_1\colon V_1 \to \mathcal{R}$ defined by $\ell_1(\alpha v_1 + v) = \alpha\beta + \ell_0(v)$ will give the required extension, as the reader can easily verify.

If $\alpha = 0$, then, by hypothesis, (10.2) holds for any choice of $\beta$. If $\alpha > 0$, then (10.2) holds if and only if
$$\beta \le \alpha^{-1} p(\alpha v_1 + v) - \alpha^{-1}\ell_0(v) = p(v_1 + \alpha^{-1}v) - \ell_0(\alpha^{-1}v)$$
for each $v \in V_0$. As $v$ varies over all of $V_0$, so does $\alpha^{-1}v$. Hence (10.2) holds for $\alpha > 0$ if and only if
$$\beta \le \inf\{p(v_1 + u) - \ell_0(u) : u \in V_0\}.$$
Similarly, if $\alpha < 0$, then (10.2) holds if and only if
$$-p(-v_1 - \alpha^{-1}v) + \ell_0(-\alpha^{-1}v) = \alpha^{-1}p(\alpha v_1 + v) - \alpha^{-1}\ell_0(v) \le \beta$$
for each $v \in V_0$. Hence (10.2) holds for $\alpha < 0$ if and only if
$$\sup\{-p(-v_1 + w) + \ell_0(w) : w \in V_0\} \le \beta.$$
It follows that we can choose a suitable value for $\ell_1(v_1)$ if
$$\sup\{-p(-v_1 + w) + \ell_0(w) : w \in V_0\} \le \inf\{p(v_1 + u) - \ell_0(u) : u \in V_0\}. \tag{10.3}$$
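The one-step extension can be tried on a small example before the general verification of (10.3). The sketch assumes, for illustration only, $V = \mathcal{R}^2$, $p$ the $\ell^1$ norm $p(x) = |x_1| + |x_2|$ (which is sublinear), $V_0 = \operatorname{span}\{e_1\}$ with $\ell_0(te_1) = t$, and $v_1 = e_2$. Sampling $V_0$ on a grid, it estimates both sides of (10.3); by hand they equal $-1$ and $1$, and every $\beta \in [-1, 1]$ yields an extension $\ell_1(x) = x_1 + \beta x_2$ dominated by $p$.

```python
def p(x):
    """Sublinear functional on R^2: the l^1 norm."""
    return abs(x[0]) + abs(x[1])


def ell0(t):
    """l0 on V0 = span{e1}: l0(t*e1) = t, and l0 <= p on V0."""
    return t


def ell1(beta, x):
    """Candidate extension l1(alpha*v1 + v) = alpha*beta + l0(v) with v1 = e2."""
    return ell0(x[0]) + beta * x[1]


grid = [k / 100 for k in range(-1000, 1001)]   # sample points t e1 of V0

# left side of (10.3): sup over w in V0 of -p(-v1 + w) + l0(w), with -v1 + w = (t, -1)
lower = max(-p((t, -1.0)) + ell0(t) for t in grid)
# right side of (10.3): inf over u in V0 of p(v1 + u) - l0(u), with v1 + u = (t, 1)
upper = min(p((t, 1.0)) - ell0(t) for t in grid)
print(lower, upper)   # approximately -1 and 1: any beta in between works
```

A value outside the interval, such as $\beta = 1.5$, fails already at $x = e_2$, where $\ell_1(e_2) = 1.5 > p(e_2) = 1$.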
We will now verify (10.3). For $u, w \in V_0$ we have
$$\ell_0(u) + \ell_0(w) = \ell_0(u + w) \le p(u + v_1 - v_1 + w) \le p(u + v_1) + p(-v_1 + w).$$
Thus,
$$-p(-v_1 + w) + \ell_0(w) \le p(v_1 + u) - \ell_0(u).$$
Since $u$ and $w$ are arbitrary members of $V_0$, it follows that (10.3) is valid.†

Consider the collection $\mathcal{E}$ of pairs of the form $(L, W)$, where $W$ is a linear subspace of $V$ containing $V_0$ and $L$ is a linear functional on $W$ that agrees with $\ell_0$ on $V_0$ and satisfies $L(w) \le p(w)$ for each $w \in W$. We define an order relation $\prec$ on $\mathcal{E}$ by $(L_1, W_1) \prec (L_2, W_2)$ if $W_1 \subset W_2$ and $L_2$ agrees with $L_1$ on $W_1$. As the reader is asked to verify in Exercise 10.2, $\prec$ is a partial ordering and each chain in $\mathcal{E}$ has an upper bound. It follows from Zorn's lemma (page 17) that $\mathcal{E}$ has a maximal element $(\ell_\omega, V_\omega)$ with respect to the ordering $\prec$.

To complete the proof of the theorem, it suffices to show that $V_\omega = V$. Suppose to the contrary that $V_\omega^c \ne \emptyset$ and let $v_\omega \in V_\omega^c$. Then we can apply the argument used in the beginning of the proof with $V_\omega$ replacing $V_0$, $\ell_\omega$ replacing $\ell_0$, and $v_\omega$ replacing $v_1$ to obtain a linear functional $\tilde\ell_\omega$ on $\operatorname{span}(V_\omega \cup \{v_\omega\})$ such that $(\tilde\ell_\omega, \operatorname{span}(V_\omega \cup \{v_\omega\})) \in \mathcal{E}$ and $(\ell_\omega, V_\omega) \prec (\tilde\ell_\omega, \operatorname{span}(V_\omega \cup \{v_\omega\}))$, the two pairs being distinct. Thus, we have reached a contradiction to the maximality of $(\ell_\omega, V_\omega)$. It follows that $V_\omega = V$.

EXAMPLE 10.1 Illustrates the Hahn-Banach Theorem
Let $\ell^\infty_{\mathcal{R}}(\mathcal{N})$ or, more briefly, $\ell^\infty_{\mathcal{R}}$, denote the real linear space consisting of all bounded sequences of real numbers and set
$$E = \{x \in \ell^\infty_{\mathcal{R}} : \lim_{n\to\infty} x_n \text{ exists}\}.$$

† Having succeeded in finding a suitable extension of the linear functional $\ell_0$ to the subspace $V_1 = \operatorname{span}(V_0 \cup \{v_1\})$, we could now proceed inductively to find a sequence of subspaces of the form $V_n = \operatorname{span}(V_0 \cup \{v_1, v_2, \dots, v_n\})$ and corresponding linear functionals $\ell_n$ that extend $\ell_0$ and satisfy $\ell_n(u) \le p(u)$, in the hope that the $V_n$'s would exhaust $V$. That this approach cannot work in general can be seen by considering a space $V$ that is not the span of a countable set.
Nevertheless, as the following argument shows, this idea becomes effective if we replace the inductive procedure by “transfinite induction” based on Zorn’s lemma.
Consider the linear functional $L_0$ on $E$ defined by $L_0(x) = \lim_{n\to\infty} x_n$. We will use the Hahn-Banach theorem to extend $L_0$ to all of $\ell^\infty_{\mathcal{R}}$. Define $p\colon \ell^\infty_{\mathcal{R}} \to \mathcal{R}$ by
$$p(x) = \limsup_{n\to\infty} \frac1n \sum_{k=1}^n x_k.$$
Then $p$ satisfies the hypotheses of the Hahn-Banach theorem and, according to part (b) of Exercise 2.35 on page 56, $p(x) = L_0(x)$ for all $x \in E$. It follows that there is a linear functional $L$ on $\ell^\infty_{\mathcal{R}}$ that agrees with $L_0$ on $E$ and satisfies $L(x) \le p(x)$ for all $x \in \ell^\infty_{\mathcal{R}}$.

The functional $L$ shares with $L_0$ the following properties:
$$\liminf_{n\to\infty} x_n \le L(x) \le \limsup_{n\to\infty} x_n \quad\text{and}\quad L(x) = L(\tilde x), \tag{10.4}$$
where $\tilde x_n = x_{n+1}$ for each $n$. (See Exercise 10.3.) For this reason $L$ can be thought of as assigning a "generalized limit" to any bounded sequence of real numbers. Linear functionals on $\ell^\infty_{\mathcal{R}}$ satisfying (10.4) are called Banach limits. □

Next we present a version of the Hahn-Banach theorem that is valid in the case of complex scalars.

THEOREM 10.2 Hahn-Banach Theorem (Complex Version)
Let $V$ be a linear space with complex scalars and $p$ a real-valued function on $V$ such that
$$p(u + v) \le p(u) + p(v), \qquad u, v \in V$$
and
$$p(\alpha v) = |\alpha|\,p(v), \qquad v \in V,\ \alpha \in \mathcal{C}.$$
Suppose $V_0$ is a linear subspace of $V$ and $\ell_0$ is a linear functional on $V_0$ such that
$$|\ell_0(v)| \le p(v), \qquad v \in V_0.$$
Then there exists a linear functional $\ell$ on $V$ such that $\ell(v) = \ell_0(v)$ for each $v \in V_0$ and $|\ell(u)| \le p(u)$ for each $u \in V$.

PROOF: As $\mathcal{R} \subset \mathcal{C}$, it follows that $V$ and $V_0$ are also linear spaces with respect to $\mathcal{R}$. Furthermore, $\Re\ell_0$ is linear with respect to real scalars and satisfies $\Re\ell_0(v) \le |\ell_0(v)| \le p(v)$ for each $v \in V_0$. Hence, we can apply Theorem 10.1 to obtain a function $\ell_r$ on $V$ that is linear with respect to real scalars and satisfies $\ell_r(v) = \Re\ell_0(v)$ for each $v \in V_0$ and $\ell_r(u) \le p(u)$ for each $u \in V$.
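The proof recovers a complex-linear functional from its real part through the identity $\ell_0(v) = \Re\ell_0(v) - i\,\Re\ell_0(iv)$, which holds because $\Re\ell_0(iv) = \Re\big(i\ell_0(v)\big) = -\Im\ell_0(v)$. The short check below uses a sample functional on $\mathcal{C}^2$ of our own choosing, $\ell_0(v) = (2 - i)v_1 + 3i\,v_2$, and confirms both the identity and the complex homogeneity of the rebuilt functional.

```python
def ell0(v):
    """A sample complex-linear functional on C^2: l0(v) = (2 - 1j)*v[0] + 3j*v[1]."""
    return (2 - 1j) * v[0] + 3j * v[1]


def real_part(v):
    """l_r = Re l0, which is linear only over the reals."""
    return ell0(v).real


def rebuild(v):
    """Recover l0 from its real part: l0(v) = Re l0(v) - i Re l0(i v)."""
    iv = tuple(1j * z for z in v)
    return real_part(v) - 1j * real_part(iv)


v = (1 + 2j, -0.5 + 1j)
print(ell0(v), rebuild(v))   # the two values agree
```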
We note that $\Re\ell_0(iv) = -\Im\ell_0(v)$ and, so, $\ell_0(v) = \Re\ell_0(v) - i\,\Re\ell_0(iv)$. Thus, the function $\ell$ defined on $V$ by $\ell(u) = \ell_r(u) - i\ell_r(iu)$ agrees with $\ell_0$ on $V_0$, and it is easy to see that $\ell$ is linear with respect to complex scalars. We will complete the proof by showing that $|\ell(u)| \le p(u)$ for each $u \in V$. Choosing a complex number $\alpha$ such that $|\alpha| = 1$ and $\alpha\ell(u) = |\ell(u)|$, we have, since $\ell(\alpha u)$ is real and $\ell_r = \Re\ell$,
$$|\ell(u)| = \ell(\alpha u) = \ell_r(\alpha u) \le p(\alpha u) = |\alpha|\,p(u) = p(u),$$
as required.

COROLLARY 10.1
Let $S$ be a linear subspace of the normed space $\Omega$ and let $\ell \in S^*$. Then there exists an $L \in \Omega^*$ such that $L|_S = \ell$ and $\|L\|_* = \|\ell\|_*$.

PROOF: Apply Theorem 10.2 with $p(x) = \|\ell\|_*\,\|x\|$.

Armed with Theorem 10.2, we can now prove the conjecture made on page 580 in the paragraph prior to the statement of the Hahn-Banach theorem.

THEOREM 10.3
Let $K$ be a closed linear subspace of the normed space $\Omega$ and let $x \in \Omega$. Then there is an $x_0^* \in K^\perp$ such that $\|x_0^*\|_* \le 1$ and
$$\rho(x, K) = \inf\{\|x - y\| : y \in K\} = \sup\{|\langle x, x^*\rangle| : x^* \in K^\perp \text{ and } \|x^*\|_* \le 1\} = \langle x, x_0^*\rangle.$$

PROOF: We will prove the theorem under the additional assumption that $x \in K^c$, leaving to the reader the case $x \in K$ as Exercise 10.4. Let $V_0 = \{\alpha x + y : \alpha \in \mathcal{C},\ y \in K\}$ and $\ell_0$ the linear functional defined on $V_0$ by $\ell_0(\alpha x + y) = \alpha\rho(x, K)$. Then $|\ell_0(\alpha x + y)| \le \|\alpha x + y\|$; indeed, for $\alpha \ne 0$ we have $\|\alpha x + y\| = |\alpha|\,\|x + \alpha^{-1}y\| \ge |\alpha|\rho(x, K)$. Thus, $\ell_0$ satisfies the hypotheses of Theorem 10.2 with $p = \|\ \|$. It follows that there is a linear functional $x_0^*$ on $\Omega$ satisfying
$$x_0^*(\alpha x + y) = \alpha\rho(x, K), \qquad \alpha \in \mathcal{C},\ y \in K \tag{10.5}$$
and
$$|x_0^*(z)| \le \|z\|, \qquad z \in \Omega. \tag{10.6}$$
The relations (10.5) and (10.6) show that we have $x_0^* \in K^\perp$, $\|x_0^*\|_* \le 1$, and $x_0^*(x) = \rho(x, K)$. Finally, let $x^* \in K^\perp$ with $\|x^*\|_* \le 1$. If $y \in K$, then we have
$$|\langle x, x^*\rangle| = |x^*(x) - x^*(y)| \le \|x^*\|_*\,\|x - y\| \le \|x - y\|.$$
Taking the infimum over $y \in K$, we get that $|\langle x, x^*\rangle| \le \rho(x, K)$. Therefore, we have
$$\sup\{|\langle x, x^*\rangle| : x^* \in K^\perp \text{ and } \|x^*\|_* \le 1\} \le \rho(x, K) = \langle x, x_0^*\rangle.$$
The reverse inequality is trivial.

As our first application of Theorem 10.3, we use it to prove an attractive and useful symmetry between the norms $\|\ \|$ and $\|\ \|_*$.

COROLLARY 10.2
Let $(\Omega, \|\ \|)$ be a normed space, $x_0 \in \Omega$, and $x_0^* \in \Omega^*$. Then
$$\|x_0\| = \sup\{|\langle x_0, x^*\rangle| : \|x^*\|_* \le 1\} \tag{10.7}$$
and
$$\|x_0^*\|_* = \sup\{|\langle x, x_0^*\rangle| : \|x\| \le 1\}, \tag{10.8}$$
and, moreover, the supremum in (10.7) is attained. In particular, if $x_0 \ne 0$, then there is an $x^* \in \Omega^*$ having norm 1 such that $\|x_0\| = \langle x_0, x^*\rangle$.

PROOF: Equation (10.8) is just the definition of the norm of the bounded linear functional $x_0^*$. To obtain (10.7), we apply Theorem 10.3 with $x = x_0$ and $K = \{0\}$ to obtain an $x_1^* \in \Omega^*$ having the properties $\|x_1^*\|_* \le 1$ and $x_1^*(x_0) = \rho(x_0, \{0\}) = \|x_0\|$. It follows that
$$\|x_0\| \le \sup\{|\langle x_0, x^*\rangle| : \|x^*\|_* \le 1\}. \tag{10.9}$$
On the other hand, we have $|\langle x_0, x^*\rangle| \le \|x_0\|$ whenever $\|x^*\|_* \le 1$. Consequently, the reverse of (10.9) is also valid. Finally, since $\langle x_0/\|x_0\|, x_1^*\rangle = 1$, we see that $\|x_1^*\|_* = 1$.
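Theorem 10.3 and Corollary 10.2 can be checked by hand in two dimensions. The sketch assumes (our choice) $\Omega = \mathcal{R}^2$ with the sup norm, whose dual norm is the $\ell^1$ norm, and $K = \operatorname{span}\{(1,1)\}$. Then $K^\perp = \operatorname{span}\{(1,-1)\}$ and both sides of Theorem 10.3 equal $|x_1 - x_2|/2$: the infimum is found by ternary search on the convex function $t \mapsto \|x - t(1,1)\|_\infty$, and the supremum by scanning $s(1,-1)$ with $|s| \le 1/2$. The last function builds a norming functional as in Corollary 10.2 (the case $K = \{0\}$), assuming $x \ne 0$.

```python
def sup_norm(z):
    return max(abs(c) for c in z)


def dist_to_K(x, iters=200):
    """rho(x, K) for K = span{(1,1)} in (R^2, sup norm), by ternary search on the
    convex function t -> ||x - t(1,1)||_inf."""
    g = lambda t: sup_norm((x[0] - t, x[1] - t))
    lo, hi = min(x) - 1.0, max(x) + 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) <= g(m2):
            hi = m2
        else:
            lo = m1
    return g((lo + hi) / 2)


def dual_side(x, steps=1000):
    """sup |<x, y>| over y in K-perp with ||y||_1 <= 1.
    K-perp = span{(1,-1)} and ||s(1,-1)||_1 = 2|s| <= 1 means |s| <= 1/2."""
    return max(abs(s * (x[0] - x[1])) for s in
               (-0.5 + k / steps for k in range(steps + 1)))


def norming_functional(x):
    """Corollary 10.2 in (R^n, sup norm): y with ||y||_1 = 1 and <x, y> = ||x||_inf."""
    i = max(range(len(x)), key=lambda k: abs(x[k]))
    y = [0.0] * len(x)
    y[i] = 1.0 if x[i] >= 0 else -1.0
    return y


x = (3.0, -1.0)
print(dist_to_K(x), dual_side(x))   # both equal |3 - (-1)| / 2 = 2
```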
EXAMPLE 10.2 Illustrates Theorem 10.3
Let $\Omega$ be a compact Hausdorff space and $F$ a nonempty closed subset of $\Omega$. We consider the linear subspace of $C(\Omega)$ defined by
$$\mathcal{I}_F = \{f \in C(\Omega) : f(x) = 0 \text{ for each } x \in F\}.$$
Theorem 9.15 on page 574 shows that we can identify $C(\Omega)^*$ with the space $\mathcal{M}(\Omega)$ of regular Borel measures on $\Omega$. We claim that
$$\mathcal{I}_F^\perp = \{\mu \in \mathcal{M}(\Omega) : |\mu|(F^c) = 0\}. \tag{10.10}$$
That $\mathcal{I}_F^\perp$ contains the right-hand side of (10.10) follows easily from the equation $\int_\Omega f\,d\mu = \int_F f\,d\mu + \int_{F^c} f\,d\mu$. To show that $\mathcal{I}_F^\perp$ is contained in the right-hand side of (10.10), it suffices to prove that $\mu(E) = 0$ whenever $\mu \in \mathcal{I}_F^\perp$ and $E$ is a closed subset of $F^c$.

Let $\epsilon > 0$ be given. By the regularity of the measure $\mu$ there is an open set $O$ such that $E \subset O$ and $|\mu|(O \setminus E) < \epsilon$. Furthermore, we can choose $O$ so that $O \subset F^c$. Next, we use Urysohn's lemma to obtain a continuous function $g$ such that $0 \le g \le 1$, $g(E) = \{1\}$, and $g(O^c) = \{0\}$. Then $g \in \mathcal{I}_F$ and, hence,
$$0 = \int_\Omega g\,d\mu = \mu(E) + \int_{O \setminus E} g\,d\mu.$$
It follows that
$$|\mu(E)| = \Big|\int_{O \setminus E} g\,d\mu\Big| \le |\mu|(O \setminus E) < \epsilon.$$
Thus, $\mu(E) = 0$.

Having established (10.10), we can now use Theorem 10.3 to assert that for each $f \in C(\Omega)$, there is a regular Borel measure $\mu_0$ such that $|\mu_0|(\Omega) \le 1$, $|\mu_0|(F^c) = 0$, and
$$\rho(f, \mathcal{I}_F) = \sup\Big\{\Big|\int_\Omega f\,d\mu\Big| : |\mu|(\Omega) \le 1 \text{ and } |\mu|(F^c) = 0\Big\} = \int_\Omega f\,d\mu_0.$$
(See Exercise 10.12.) It is also clear from (10.10) that $\mathcal{I}_F^\perp$ can be identified with $\mathcal{M}(F)$ since it consists of measures in $\mathcal{M}(\Omega)$ that vanish on Borel subsets of $F^c$. □
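When $\Omega$ is finite with the discrete topology (an assumption made only for this illustration), Example 10.2 can be computed directly. The nearest element of $\mathcal{I}_F$ to $f$ is $g = f$ off $F$ and $g = 0$ on $F$, so $\rho(f, \mathcal{I}_F) = \max_{x\in F}|f(x)|$, and the supremum over measures concentrated on $F$ is attained by $\mu_0 = \alpha\delta_x$ with $x$ maximizing $|f|$ on $F$ and $|\alpha| = 1$, in line with Exercise 10.12.

```python
def dist_to_I_F(f, F):
    """rho(f, I_F) in sup norm on a finite discrete Omega = {0, ..., len(f)-1}.
    The nearest element of I_F is g = f off F and g = 0 on F."""
    g = [0 if i in F else fi for i, fi in enumerate(f)]
    return max(abs(fi - gi) for fi, gi in zip(f, g))


def optimal_measure(f, F):
    """mu_0 = alpha * delta_x with x in F maximizing |f| and |alpha| = 1,
    chosen so that the integral of f d mu_0 = alpha * f(x) = |f(x)|."""
    x = max(F, key=lambda i: abs(f[i]))
    alpha = abs(f[x]) / f[x] if f[x] != 0 else 1
    return x, alpha


f = [1 + 1j, -3, 0.5j, 2]
F = {0, 2, 3}
x, alpha = optimal_measure(f, F)
print(dist_to_I_F(f, F), x, alpha * f[x])   # 2.0, 3, 2.0
```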
EXAMPLE 10.3 Illustrates Corollary 10.2
a) Consider the space $\mathcal{L}^p(\mu)$, where $1 < p < \infty$, and let $q$ be defined by $1/p + 1/q = 1$. Applying the Riesz representation theorem for $\mathcal{L}^p$-spaces (Theorem 9.12 on page 559) and Corollary 10.2, we obtain the following fact: If $f \in \mathcal{L}^p(\mu)$, then there is a function $g \in \mathcal{L}^q(\mu)$ such that $\|g\|_q \le 1$ and $\|f\|_p = \int_\Omega fg\,d\mu$.
b) We can use Corollary 10.2 to show that Theorem 9.12 cannot be extended to the case $p = \infty$. To do that, consider the space $\ell^\infty$ and define $x = \{x_n\}_{n=1}^\infty \in \ell^\infty$ by $x_n = n/(n+1)$. Note that $\|x\|_\infty = 1$. If we could extend Theorem 9.12 to hold for $p = \infty$, then we could apply Corollary 10.2 to find a $y \in \ell^1$ such that
$$\sum_{n=1}^\infty |y_n| = \|y\|_1 = 1 = \|x\|_\infty = \sum_{n=1}^\infty \frac{n}{n+1}\,y_n.$$
However, this is impossible because the quantity on the right-hand side is in modulus strictly less than the quantity on the left. □

More on Duality
As we mentioned at the beginning of this section, the notation $\langle x, x^*\rangle$ is motivated by Hilbert space theory. Specifically, if $\mathcal{H}$ is a Hilbert space, then each bounded linear functional on $\mathcal{H}$ is of the form $\ell_y(x) = \langle x, y\rangle$ for some $y \in \mathcal{H}$ and, moreover, $\|\ell_y\|_* = \|y\|$. The correspondence $j\colon \mathcal{H} \to \mathcal{H}^*$ defined by $j(y) = \ell_y$ is, therefore, onto and isometric. It is also almost, but not quite, linear. Indeed, we have $j(y + z) = j(y) + j(z)$ and $j(\alpha y) = \bar\alpha\,j(y)$. Due to these properties of $j$, we can use it to identify $\mathcal{H}^*$ and $\mathcal{H}$. When this identification is made, it becomes clear that the notation
$$A^\perp = \{x^* \in \Omega^* : \langle x, x^*\rangle = 0 \text{ for all } x \in A\}$$
has the same meaning for Hilbert spaces as the notation for the orthogonal complement of a set introduced in Definition 9.7 on page 542.

When $\Omega$ is a normed space, but not a Hilbert space, and $A \subset \Omega$, then $A^\perp$ no longer resides in $\Omega$ but, rather, in $\Omega^*$. Thus, while the notation $(A^\perp)^\perp$ makes sense in Hilbert spaces, it does not in general normed spaces.
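The conjugate linearity of $j$ is easy to see numerically. The sketch assumes the Hilbert space $\mathcal{C}^2$ with the inner product $\langle x, y\rangle = \sum_k x_k\bar y_k$ (linear in the first slot, as in the text) and sample vectors of our own choosing; it checks that $j(\alpha y) = \bar\alpha\,j(y)$ and that $j(\alpha y) \ne \alpha\,j(y)$ in general.

```python
def inner(x, y):
    """Inner product on C^n, linear in the first slot: <x, y> = sum_k x_k conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def j(y):
    """j(y) = l_y, the bounded linear functional x -> <x, y>."""
    return lambda x: inner(x, y)


y = [1 + 1j, 2 - 0.5j]
x = [0.5j, -1 + 2j]
alpha = 2 + 3j
ay = [alpha * c for c in y]
print(j(ay)(x), alpha.conjugate() * j(y)(x))   # j(alpha y) = conj(alpha) j(y)
```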
One way to generalize the notation for a double orthogonal complement from inner product spaces to general normed spaces is to define, for $B \subset \Omega^*$,
$${}^\perp B = \{x \in \Omega : \langle x, x^*\rangle = 0 \text{ for all } x^* \in B\}.$$
Then an analogue of the notation $(A^\perp)^\perp$ that makes sense in arbitrary normed spaces is ${}^\perp(A^\perp)$.
THEOREM 10.4
Let $\Omega$ be a normed space, $A \subset \Omega$, and $B \subset \Omega^*$. Then the following hold:
a) ${}^\perp B$ is a closed linear subspace of $\Omega$.
b) ${}^\perp(A^\perp) = \overline{\operatorname{span} A}$.
c) If $A^\perp = \{0\}$, then $\operatorname{span} A$ is dense in $\Omega$.

PROOF: The proof of (a) is left to the reader as Exercise 10.11 and (c) follows immediately from (b). Thus, we move on to the verification of (b). From (a), we know that ${}^\perp(A^\perp)$ is a closed linear subspace of $\Omega$. Therefore, because $A \subset {}^\perp(A^\perp)$, we have $\overline{\operatorname{span} A} \subset {}^\perp(A^\perp)$. To prove the reverse inclusion, let $x \in {}^\perp(A^\perp)$. Applying Theorem 10.3 with $K = \overline{\operatorname{span} A}$, we conclude that there exists an $x^* \in (\overline{\operatorname{span} A})^\perp$ such that $\rho(x, \overline{\operatorname{span} A}) = \langle x, x^*\rangle$. Because $(\overline{\operatorname{span} A})^\perp \subset A^\perp$, it follows that $\langle x, x^*\rangle = 0$, so $\rho(x, \overline{\operatorname{span} A}) = 0$. Thus, $x \in \overline{\operatorname{span} A}$.

EXAMPLE 10.4 Illustrates Theorem 10.4
In this example we make use of the following result from the theory of analytic functions: Let $g$ be a function that is analytic on a connected open subset $O$ of $\mathcal{C}$. If there is a sequence $\{b_n\}_{n=1}^\infty$ of distinct elements of $O$ such that $b = \lim_{n\to\infty} b_n$ exists and belongs to $O$ and $g(b_n) = 0$ for each $n \in \mathcal{N}$, then $g$ vanishes identically on all of $O$.

Consider the space $C([a,b])$, where $a < b$ and $0 \notin [a,b]$. For $\alpha \in \mathcal{C} \setminus [a,b]$, define $f_\alpha(x) = 1/(x - \alpha)$ and let $\{\alpha_n\}_{n=1}^\infty$ be a sequence of distinct elements of $\mathcal{C} \setminus [a,b]$ with $\lim_{n\to\infty} \alpha_n = 0$. We will prove that $\operatorname{span}\{f_{\alpha_n} : n \in \mathcal{N}\}$ is dense in $C([a,b])$ by showing that $\{f_{\alpha_n} : n \in \mathcal{N}\}^\perp = \{0\}$.

Suppose $\mu \in \{f_{\alpha_n} : n \in \mathcal{N}\}^\perp$ and let
$$g(z) = \int_{[a,b]} \frac{d\mu(x)}{x - z}.$$
Because $g$ is analytic on the open connected set $\mathcal{C} \setminus [a,b]$ and $g(\alpha_n) = 0$ for $n \in \mathcal{N}$, we conclude that $g$ vanishes identically on $\mathcal{C} \setminus [a,b]$. This, in turn, implies that the $n$th derivative $g^{(n)}$ vanishes identically on $\mathcal{C} \setminus [a,b]$. In particular, then, $\int_{[a,b]} x^{-n}\,d\mu(x) = 0$ for each $n \in \mathcal{N}$.

It follows from the complex version of the Stone-Weierstrass theorem (page 522) that the span of the functions $1, x^{-1}, x^{-2}, \dots$ is dense
in $C([a,b])$. From this we can conclude that $\int_{[a,b]} x^{-1} f(x)\, d\mu(x) = 0$ for each $f \in C([a,b])$ and, consequently, $\mu = 0$. We leave the details for the reader as Exercise 10.13. □

EXERCISES 10.1
10.1 Let $A$ be a subset of the normed space $\Omega$. Show that $A^{\perp}$ is a closed linear subspace of $\Omega^*$.
10.2 In the proof of Theorem 10.1, verify that $\prec$ is a partial ordering and that each chain of $\mathcal{S}$ has a $\prec$-upper bound.
10.3 Verify (10.4) in Example 10.1. Hint: See Exercise 2.35 on page 56.
10.4 Prove Theorem 10.3 in the case where $x \in K$.
★10.5 For a normed space $\Omega$, let $\Omega^{**}$ denote the dual space of the dual space, that is, $(\Omega^*)^*$. Let $J: \Omega \to \Omega^{**}$ be defined by $\langle x^*, J(x) \rangle = \langle x, x^* \rangle$. Show that $J$ is a linear isometry.
10.6 Show that the mapping $J$ defined in Exercise 10.5 is onto if $\Omega$ is a Hilbert space.
10.7 Show that the mapping $J$ defined in Exercise 10.5 is onto if $\Omega = \mathcal{L}^p(\mu)$, where $1 < p < \infty$.
10.8 Show that the mapping $J$ defined in Exercise 10.5 is not onto if $\Omega =$
10.9 Prove that $\Omega$ is a separable space if $\Omega^*$ is separable.
10.10 Show that the converse of the assertion of Exercise 10.9 is false.
10.11 Prove part (a) of Theorem 10.4.
10.12 Show that in Example 10.2, the measure $\mu_0$ can be chosen to be $\alpha\delta_x$ for some $x \in F$ and some constant $\alpha$ with $|\alpha| = 1$.
10.13 Refer to Example 10.4.
a) Show that the function $g$ is analytic on $\mathbb{C} \setminus [a,b]$.
b) Verify the formula $g^{(n)}(z) = n! \int_{[a,b]} (x - z)^{-(n+1)}\, d\mu(x)$.
c) Prove that $\operatorname{span}\{ x^{-n} : n = 0, 1, 2, \ldots \}$ is dense in $C([a,b])$.
d) Use part (c) and the fact that $\int_{[a,b]} x^{-n}\, d\mu(x) = 0$ for each $n \in \mathcal{N}$ to show that $\int_{[a,b]} x^{-1} f(x)\, d\mu(x) = 0$ for all $f \in C([a,b])$.
e) Use part (d) to conclude that $\int_B x^{-1}\, d\mu(x) = 0$ for each $B \in \mathcal{B}([a,b])$.
f) Deduce from part (e) that $\mu = 0$.
10.2 LINEAR OPERATORS ON BANACH SPACES
Linear operators (mappings) appear in most branches of mathematics, but especially in analysis, where differentiation and integration are basic processes. In this section we present some important general results about continuous (bounded) linear operators on Banach spaces.

The Open Mapping Theorem
Let $T$ be a continuous function from a topological space $\Omega$ to a topological space $\Lambda$. Then, by definition, the inverse image of an open set under $T$ is open, that is, $T^{-1}(U)$ is open in $\Omega$ whenever $U$ is open in $\Lambda$. On the other hand, as the reader is asked to verify in Exercise 10.14, it is not generally true that the image of an open set under $T$ is open, that is, that $T(O)$ is open in $\Lambda$ whenever $O$ is open in $\Omega$. However, if $\Omega$ and $\Lambda$ are Banach spaces and $T$ is linear, continuous, and onto, then our next theorem, called the open mapping theorem, shows that $T$ does carry open sets to open sets.

We will employ the following notation. Suppose $A$ and $B$ are subsets of a linear space and $\alpha$ is a scalar. Then $A + B = \{ x + y : x \in A,\ y \in B \}$ and $\alpha A = \{ \alpha x : x \in A \}$. Furthermore, we define $A - B = A + (-1)B$ and $x + B = \{x\} + B$. Note that, in a normed space $\Omega$, we have $B_r(x) = x + rB_1(0)$ for all $x \in \Omega$ and $r > 0$.

THEOREM 10.5 Open Mapping Theorem
Let $\Omega$ and $\Lambda$ be Banach spaces and $T: \Omega \to \Lambda$ be continuous, linear, and onto. Then $T(O)$ is open in $\Lambda$ whenever $O$ is open in $\Omega$.

PROOF: We claim that it suffices to prove that there is an $\varepsilon > 0$ such that
$$B_\varepsilon(0) \subset T(B_1(0)). \tag{10.11}$$
Indeed, suppose that (10.11) holds. If $O \subset \Omega$ is open, then for each $x \in O$, there is a $\delta > 0$ such that $x + \delta B_1(0) = B_\delta(x) \subset O$. Hence, by (10.11),
$$B_{\delta\varepsilon}(T(x)) = T(x) + \delta\varepsilon B_1(0) = T(x) + \delta B_\varepsilon(0) \subset T(x) + \delta T(B_1(0)) = T(x + \delta B_1(0)) \subset T(O).$$
As $T(x)$ is an arbitrary point of $T(O)$, it follows that $T(O)$ is open. Therefore, to establish the theorem, we need only show that (10.11) is valid.

As a first step, we will prove that there is an $\varepsilon > 0$ such that
$$B_\varepsilon(0) \subset \overline{T(B_{1/2}(0))}. \tag{10.12}$$
Since $\Omega = \bigcup_{n=1}^{\infty} nB_{1/2}(0)$, we have
$$\Lambda = T(\Omega) = \bigcup_{n=1}^{\infty} nT(B_{1/2}(0)).$$
It follows from the Baire category theorem (page 494) that there exist $m \in \mathcal{N}$, $y_0 \in \Lambda$, and $a > 0$ such that $B_a(y_0) \subset \overline{mT(B_{1/2}(0))}$ and, consequently, that $y_0/m + B_{a/m}(0) \subset \overline{T(B_{1/2}(0))}$. Because $y_0/m \in \overline{T(B_{1/2}(0))}$, we also have $-y_0/m \in (-1)\overline{T(B_{1/2}(0))} = \overline{T(B_{1/2}(0))}$. Thus,
$$\begin{aligned} B_{a/2m}(0) &= (1/2)B_{a/m}(0) = -y_0/2m + (1/2)\bigl(y_0/m + B_{a/m}(0)\bigr) \\ &\subset (1/2)\overline{T(B_{1/2}(0))} + (1/2)\overline{T(B_{1/2}(0))} \\ &\subset \overline{T\bigl((1/2)B_{1/2}(0) + (1/2)B_{1/2}(0)\bigr)} \subset \overline{T(B_{1/2}(0))}. \end{aligned}$$
(See Exercise 10.15.) We have now verified (10.12) with $\varepsilon = a/2m$.

Finally, we will derive (10.11) from (10.12). Let $y \in B_\varepsilon(0)$. By (10.12) we can find an $x_1 \in B_{1/2}(0)$ such that $\|y - T(x_1)\| < \varepsilon/2$. As
$$y - T(x_1) \in (1/2)B_\varepsilon(0) \subset (1/2)\overline{T(B_{1/2}(0))} = \overline{T(B_{1/4}(0))},$$
it follows that there exists $x_2 \in B_{1/4}(0)$ such that $\|y - T(x_1) - T(x_2)\| < \varepsilon/4$. Proceeding in this fashion and using mathematical induction, we obtain a sequence $\{x_n\}_{n=1}^{\infty}$ of elements of $\Omega$ such that $\|x_n\| < 2^{-n}$ and
$$\Bigl\| y - \sum_{j=1}^{n} T(x_j) \Bigr\| < \varepsilon/2^n.$$
Because $\Omega$ is a Banach space, Proposition 9.3 on page 532 implies that the series $\sum_{n=1}^{\infty} x_n$ converges, say, to $x$. Noting that
$$\|x\| \le \sum_{n=1}^{\infty} \|x_n\| < \sum_{n=1}^{\infty} 2^{-n} = 1,$$
we conclude that $x \in B_1(0)$. By the continuity of $T$, we have $T(x) = y$ and, so, $y \in T(B_1(0))$. We have shown that $B_\varepsilon(0) \subset T(B_1(0))$.

COROLLARY 10.3
Suppose that the operator $T$ satisfies the hypotheses of the open mapping theorem and is also one-to-one. Then the following hold:
a) $T^{-1}$ is linear and continuous.
b) $|||T|||^{-1} \le |||T^{-1}|||$.
c) We have $|||T^{-1}|||^{-1}\|x\| \le \|T(x)\| \le |||T|||\,\|x\|$ for all $x \in \Omega$.

PROOF: It is easy to see that the inverse function $T^{-1}: \Lambda \to \Omega$ is linear. Moreover, as $(T^{-1})^{-1}(O) = T(O)$ is open in $\Lambda$ whenever $O$ is open in $\Omega$, $T^{-1}$ is continuous. Thus, (a) holds.

By (a), we know that $|||T^{-1}||| < \infty$. For each $y \in \Lambda$, we have
$$\|y\| = \|T(T^{-1}(y))\| \le |||T|||\,\|T^{-1}(y)\| \le |||T|||\,|||T^{-1}|||\,\|y\|,$$
from which (b) follows immediately.

To obtain (c), we need only prove the first inequality. We observe that, for each $x \in \Omega$,
$$\|x\| = \|T^{-1}(T(x))\| \le |||T^{-1}|||\,\|T(x)\|,$$
as required.

EXAMPLE 10.5 Illustrates Corollary 10.3
It follows from Exercises 3.77 and 8.64 on pages 148 and 525, respectively, that the Laplace transform $L$ defined by
$$L(f)(s) = \int_0^\infty e^{-sx} f(x)\, dx, \qquad s > 0,$$
is a one-to-one linear operator from the Banach space $(\mathcal{L}^1([0,\infty)), \|\ \|_1)$ into the Banach space $(C_0([0,\infty)), \|\ \|_{[0,\infty)})$. Because
$$|L(f)(s)| \le \int_0^\infty e^{-sx}|f(x)|\, dx \le \int_0^\infty |f(x)|\, dx,$$
we have $\|L(f)\|_{[0,\infty)} \le \|f\|_1$. It follows that $L$ is continuous and $|||L||| \le 1$.

We will use Corollary 10.3 to show that $L$ is not onto, that is, there are functions in $C_0([0,\infty))$ that are not Laplace transforms of functions in $\mathcal{L}^1([0,\infty))$. Suppose to the contrary that $L$ is onto. By Corollary 10.3(c), there is a positive constant $c$ such that
$$c\|f\|_1 \le \|L(f)\|_{[0,\infty)} \tag{10.13}$$
for all $f \in \mathcal{L}^1([0,\infty))$. Let $n \in \mathcal{N}$ and define $f_n = \chi_{[n,n+1)} - \chi_{[n+1,n+2)}$. Then $\|f_n\|_1 = 2$. Moreover,
$$L(f_n)(s) = e^{-ns}(e^{-2s} - 2e^{-s} + 1)/s = se^{-ns}\bigl((e^{-s} - 1)/s\bigr)^2.$$
It is easy to check that the supremum of $((e^{-s} - 1)/s)^2$ over $s > 0$ is 1 and that the maximum of $se^{-ns}$ is $1/ne$. Thus, using (10.13), we obtain $2c \le 1/ne$. As $n \in \mathcal{N}$ was chosen arbitrarily, we conclude that $c = 0$, a contradiction. □

One can get a better appreciation for the power of the open mapping theorem by trying to explicitly construct a function in $C_0([0,\infty))$ that is not a Laplace transform of a function in $\mathcal{L}^1([0,\infty))$. We leave that to the reader and, instead, present another interesting corollary of the open mapping theorem.

COROLLARY 10.4
Let $\Omega$ be a linear space. Suppose that $\|\ \|$ and $\|\ \|_0$ are norms on $\Omega$ such that $(\Omega, \|\ \|)$ and $(\Omega, \|\ \|_0)$ are Banach spaces. If there is a positive constant $\alpha$ such that
$$\|x\| \le \alpha\|x\|_0, \qquad x \in \Omega, \tag{10.14}$$
then there is a positive constant $\beta$ such that
$$\|x\|_0 \le \beta\|x\|, \qquad x \in \Omega. \tag{10.15}$$
PROOF: From (10.14), we see that the identity map $I: (\Omega, \|\ \|_0) \to (\Omega, \|\ \|)$ is continuous. The relation (10.15) now follows from Corollary 10.3.

We note that Corollary 10.4 shows that the topology of a Banach space $(\Omega, \|\ \|)$ cannot be strictly weaker than the topology induced by a norm $\|\ \|_0$ for which $(\Omega, \|\ \|_0)$ is complete. And, as the reader is asked to verify in the exercises, Corollary 10.4 can also be used to prove that any finite dimensional normed space is isomorphic to either $\mathbb{C}^n$ or $\mathcal{R}^n$ for some $n \in \mathcal{N}$.

The Closed Graph Theorem
Another application of Corollary 10.4 provides a condition equivalent to continuity for linear operators on Banach spaces. Let $\Omega$ and $\Lambda$ be normed spaces and $T: \Omega \to \Lambda$ a linear operator. Then, as we know from Exercise 7.53 on page 437, $T$ is continuous if and only if it satisfies
$$\lim_{n\to\infty} x_n = x \;\Rightarrow\; \lim_{n\to\infty} T(x_n) = T(x).$$
A weaker condition on $T$ is that
$$\lim_{n\to\infty} x_n = x \text{ and } \lim_{n\to\infty} T(x_n) = y \;\Rightarrow\; y = T(x). \tag{10.16}$$
A linear operator satisfying (10.16) is said to be closed. We use that terminology because (10.16) is equivalent to the condition that the graph of $T$, namely the set $\{ (x, T(x)) : x \in \Omega \}$, is a closed subset of the product space $\Omega \times \Lambda$. In our next theorem, called the closed graph theorem, we will prove that, when $\Omega$ and $\Lambda$ are Banach spaces, not only is being closed a necessary condition for a linear operator to be continuous, but it is also sufficient. First, however, we present an example showing that, in general, a closed linear operator need not be continuous.

EXAMPLE 10.6 A Discontinuous Closed Linear Operator
Let $\Omega = \{ f : f' \in C([0,1]) \}$ and $\Lambda = C([0,1])$, both equipped with the sup-norm, and let $D: \Omega \to \Lambda$ be the differentiation operator, $D(f) = f'$. Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of functions in $\Omega$ such that $f_n \to f$ and $D(f_n) \to g$. It follows from these assumptions and the second fundamental theorem of calculus (applied to each $f_n$, followed by passage to the uniform limit) that $f(x) = f(0) + \int_0^x g(t)\, dt$.
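This, in turn, implies that $D(f) = g$ and hence that $D$ is closed. But $D$ is not continuous, as can be seen by considering the sequence $f_n(x) = \sin(nx)/n$: we have $\|f_n\|_{[0,1]} \le 1/n$, whereas $\|D(f_n)\|_{[0,1]} = \|\cos(nx)\|_{[0,1]} = 1$ for every $n$. □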
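The failure of continuity in Example 10.6 can be observed numerically: the sup-norms of $f_n$ shrink like $1/n$, while those of $D(f_n)$ stay at 1, so no constant $c$ can satisfy $\|D(f)\| \le c\|f\|$. The sketch below approximates sup-norms on $[0,1]$ by sampling a uniform grid. This sampling approach is an assumption of the sketch, and it gives a lower bound for the true supremum.

```python
import math

def sup_norm(h, a=0.0, b=1.0, samples=10_001):
    """Approximate ||h||_[a,b] by the largest |h| over a uniform grid (a lower bound for the sup)."""
    step = (b - a) / (samples - 1)
    return max(abs(h(a + k * step)) for k in range(samples))

results = []
for n in (1, 10, 100, 1000):
    f_n = lambda x, n=n: math.sin(n * x) / n     # f_n(x) = sin(nx)/n
    Df_n = lambda x, n=n: math.cos(n * x)        # D(f_n) = f_n'
    results.append((n, sup_norm(f_n), sup_norm(Df_n)))

for n, norm_f, norm_Df in results:
    print(f"n={n:5d}  ||f_n|| ~ {norm_f:.5f}  ||D f_n|| ~ {norm_Df:.5f}  ratio ~ {norm_Df / norm_f:.1f}")
```

The ratio $\|D(f_n)\|/\|f_n\|$ grows at least linearly in $n$, which is the unboundedness of $D$ in numerical form.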
THEOREM 10.6 Closed Graph Theorem
Let $\Omega$ and $\Lambda$ be Banach spaces and $T: \Omega \to \Lambda$ a linear operator. If $T$ is closed, then it is continuous.

PROOF: We define a second norm on $\Omega$ by
$$\|x\|_0 = \|x\| + \|T(x)\|$$
and show that $(\Omega, \|\ \|_0)$ is a Banach space. Let $\{x_n\}_{n=1}^{\infty}$ be a Cauchy sequence with respect to $\|\ \|_0$. Because $\|x_n - x_m\| \le \|x_n - x_m\|_0$ and $\|T(x_n) - T(x_m)\| \le \|x_n - x_m\|_0$, it follows that $\{x_n\}_{n=1}^{\infty}$ and $\{T(x_n)\}_{n=1}^{\infty}$ are Cauchy sequences in $(\Omega, \|\ \|)$ and $\Lambda$, respectively. Hence, $x = \lim_n x_n$ and $y = \lim_n T(x_n)$ both exist. Because, by assumption, $T$ is closed, we have $y = T(x)$. It follows that
$$\lim_{n\to\infty} \|x_n - x\|_0 = \lim_{n\to\infty} \|x_n - x\| + \lim_{n\to\infty} \|T(x_n) - T(x)\| = 0.$$
Thus, $(\Omega, \|\ \|_0)$ is a Banach space. Since $\|x\| \le \|x\|_0$ for all $x \in \Omega$, it follows from Corollary 10.4 that there is a positive constant $\beta$ such that
$$\|T(x)\| \le \|x\|_0 \le \beta\|x\|$$
for all $x \in \Omega$. Hence, $T$ is continuous.

The Uniform Boundedness Principle for Linear Operators
A look at the proof of the open mapping theorem reveals that it is a consequence of the Baire category theorem. We conclude this section with another application of the Baire category theorem to the theory of linear operators on Banach spaces.

THEOREM 10.7 Uniform Boundedness Principle for Linear Operators
Suppose that $\mathcal{T}$ is a collection of continuous linear operators from a Banach space $\Omega$ into a normed space $\Lambda$. If $\sup\{ \|T(x)\| : T \in \mathcal{T} \} < \infty$ for each $x \in \Omega$, then $\sup\{ |||T||| : T \in \mathcal{T} \} < \infty$.

PROOF: By Theorem 8.2 on page 497, there exist an $x_0 \in \Omega$ and a $\delta > 0$ such that
$$M = \sup\{ \|T(x)\| : T \in \mathcal{T},\ \|x - x_0\| < \delta \} < \infty.$$
If $\|u\| < 1$, then
$$\|T(u)\| \le \delta^{-1}\bigl(\|T(x_0 + \delta u)\| + \|T(x_0)\|\bigr) \le 2M\delta^{-1}.$$
It follows that $|||T||| \le 2M\delta^{-1}$ for each $T \in \mathcal{T}$.
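Looking back at Example 10.5, both of its key estimates can be checked numerically: the closed form of $L(f_n)(s)$ and the bound $\|L(f_n)\|_{[0,\infty)} \le 1/ne$, which, combined with $\|f_n\|_1 = 2$, forces the constant $c$ in (10.13) to 0. The sketch below compares the closed form with a midpoint-rule approximation of the defining integral and then scans a grid of $s$ values. The grid, the step counts, and the function names are illustrative choices, not part of the text.

```python
import math

def L_fn_closed(n, s):
    """Closed form from Example 10.5: L(f_n)(s) = e^{-ns}(1 - e^{-s})^2 / s."""
    return math.exp(-n * s) * (1.0 - math.exp(-s)) ** 2 / s

def L_fn_midpoint(n, s, steps=20_000):
    """Midpoint-rule approximation of the integral of e^{-sx} f_n(x) over [n, n + 2],
    where f_n = chi_[n, n+1) - chi_[n+1, n+2)."""
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        x = n + (k + 0.5) * h
        total += (1.0 if x < n + 1 else -1.0) * math.exp(-s * x)
    return total * h

# The closed form agrees with direct integration at a few sample points.
max_err = max(abs(L_fn_closed(n, s) - L_fn_midpoint(n, s))
              for n in (1, 3) for s in (0.5, 1.0, 2.0))

# Grid estimate of ||L(f_n)|| on (0, 20], compared with the bound 1/(ne) used in the text.
grid = [k / 1000 for k in range(1, 20_001)]
sup_est = {n: max(L_fn_closed(n, s) for s in grid) for n in (1, 2, 5, 10)}
for n, value in sup_est.items():
    # ||f_n||_1 = 2, so (10.13) would force c <= ||L(f_n)|| / 2.
    print(n, value, 1 / (n * math.e), "c <=", value / 2)
print("max quadrature error:", max_err)
```

Since $L(f_n) \ge 0$, the largest grid value estimates the sup-norm from below. The printed upper bounds on $c$ shrink as $n$ grows, mirroring the contradiction in the text.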
EXERCISES 10.2
10.14 Let $\Omega$ and $\Lambda$ be Banach spaces.
a) Provide an example of a linear and continuous mapping from $\Omega$ into $\Lambda$ that does not take open sets to open sets.
b) Provide an example of a continuous mapping from $\Omega$ onto $\Lambda$ that does not take open sets to open sets.
10.15 Show that if $A$ and $B$ are subsets of a normed linear space and $\alpha$ is a nonzero scalar, then $\overline{\alpha A} = \alpha\overline{A}$ and $\overline{x + A} = x + \overline{A}$, $\overline{A} + \overline{B} \subset \overline{A + B}$, and $\overline{A} - \overline{B} \subset \overline{A - B}$.
10.16 Provide an example of a continuous linear operator $T: \Omega \to \Lambda$, where $\Omega$ and $\Lambda$ are normed spaces, such that $T$ is one-to-one and onto but $T^{-1}$ is not continuous.

Exercises 10.17 and 10.18 show that every finite dimensional normed space is isomorphic to either $\mathbb{C}^n$ or $\mathcal{R}^n$ for some $n \in \mathcal{N}$. Recall that the dimension of a linear space is the number of elements in a Hamel basis.
10.17 Let $\Omega$ be a finite dimensional normed space.
a) Show that the dimension of $\Omega^*$ is at most the dimension of $\Omega$. Hint: What is the dimension of the space of all linear functionals on $\Omega$?
b) Use Exercise 10.5 on page 589 and Proposition 9.2 on page 530 to deduce that $\Omega$ is complete.
10.18 Let $\Omega$ be a finite dimensional normed space and $\{x_1, x_2, \ldots, x_n\}$ a Hamel basis for $\Omega$. Recall that each $x \in \Omega$ can be written uniquely in the form $x = \sum_{j=1}^{n} a_j(x)x_j$, where the $a_j(x)$s are scalars. For $x \in \Omega$, define
$$p(x) = \Bigl( \sum_{j=1}^{n} |a_j(x)|^2 \Bigr)^{1/2}.$$
a) Show that $p$ is a norm on $\Omega$.
b) Show that there exist positive constants $\alpha$ and $\beta$ such that for each $x \in \Omega$, we have $\alpha p(x) \le \|x\| \le \beta p(x)$.
c) Deduce that $\Omega$ is isomorphic to $\mathbb{C}^n$ or to $\mathcal{R}^n$ in the case of complex or real scalars, respectively.
10.19 Let $\Omega$ be a normed linear space such that the closed unit ball $\overline{B_1(0)}$ is compact. Prove that $\Omega$ is finite dimensional. Hint: Find $x_1^*, x_2^*, \ldots, x_n^* \in \Omega^*$ such that
$$\{ x : \|x\| = 1 \} \subset \bigcup_{j=1}^{n} \{ x : |\langle x, x_j^* \rangle| > 1/2 \}$$
and consider the mapping $L(x) = (\langle x, x_1^* \rangle, \langle x, x_2^* \rangle, \ldots, \langle x, x_n^* \rangle)$.
In Exercises 10.20-10.22, we consider projections of a normed space. Let $\Omega$ be a normed space. A linear operator $P: \Omega \to \Omega$ is called a projection if both the range of $P$ and $P^{-1}(\{0\})$ are closed and $P \circ P = P$. Exercise 9.26 on page 544 shows that orthogonal projections, as defined in Section 9.2 on page 541, are projections in the sense defined here.
10.20 Show that if $\Omega$ is a Banach space, then all projections of $\Omega$ are continuous.
10.21 Show that if $P$ is a projection on $\Omega$, then $\|x - P(x)\| \ge \rho(x, \operatorname{range} P)$ for each $x \in \Omega$.
10.22 Let $K$ be a finite dimensional subspace of $\Omega$. Show that there is a projection of $\Omega$ with range equal to $K$.

Exercises 10.23-10.27 elaborate on Example 9.11 on page 547. Let $S_n$ denote the linear operator defined on $C([0,2\pi])$ by $S_n(f) = s_n$.
10.23 Refer to the definition of a projection given in the paragraph prior to Exercise 10.20. Show that $S_n$ is a projection of $C([0,2\pi])$ having range given by $\mathcal{U}_n = \operatorname{span}\{ e_k : -n \le k \le n \}$.
10.24 Show that
$$\|S_n(f) - f\|_{[0,2\pi]} \ge \rho(f, \mathcal{U}_n) \ge |f(2\pi) - f(0)|/2.$$
Deduce that $f(2\pi) = f(0)$ is a necessary condition for the uniform convergence of the Fourier series $\sum_{k=-\infty}^{\infty} \hat{f}(k)e^{ikx}$ to the function $f(x)$.
10.25 Show that for each $f \in C([0,2\pi])$,
$$S_n(f)(x) = (2\pi)^{-1} \int_0^{2\pi} f(t)\, \frac{\sin((n + 1/2)(x - t))}{\sin((x - t)/2)}\, dt.$$
10.26 Let $C_p([0,2\pi]) = \{ f \in C([0,2\pi]) : f(0) = f(2\pi) \}$. Show that
$$\sup\{ |S_n(f)(x)| : f \in C_p([0,2\pi]),\ \|f\|_{[0,2\pi]} \le 1 \} = (2\pi)^{-1} \int_0^{2\pi} \left| \frac{\sin((n + 1/2)(x - t))}{\sin((x - t)/2)} \right| dt.$$
★10.27 Show that there is a function in $C_p([0,2\pi])$ whose Fourier series diverges at some point.

10.3 TOPOLOGICAL LINEAR SPACES
Let $\Omega$ be a locally compact Hausdorff space. We recall from Section 7.12 (see page 483) that for $S \subset \Omega$,
$$p_S(f, g) = \sup\{ |f(x) - g(x)| : x \in S \}, \qquad f, g \in C(\Omega).$$
And we also recall (see page 484) that the weak topology on $C(\Omega)$ determined by the family of functions $\{ p_K(\cdot, g) : K \text{ compact},\ g \in C(\Omega) \}$ is called the topology of uniform convergence on compact sets and is denoted by $\mathcal{T}(\Omega)$. As we know, if $\Omega$ is compact, then
$$\|f\|_\Omega = p_\Omega(f, 0) = \sup\{ |f(x)| : x \in \Omega \}, \qquad f \in C(\Omega),$$
is a norm on the linear space $C(\Omega)$ that induces the topology $\mathcal{T}(\Omega)$. On the other hand, if $\Omega$ is not compact, then the topology $\mathcal{T}(\Omega)$ on the linear space $C(\Omega)$ is not induced by a norm. Indeed, suppose to the contrary that $\mathcal{T}(\Omega)$ is induced by a norm $\|\ \|$. Because $\{ f : \|f\| < 1 \}$ is an open set containing 0, it follows from Proposition 7.7 on page 428 that there exist $n \in \mathcal{N}$, compact sets $K_1, K_2, \ldots, K_n$, and positive numbers $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ such that
$$\bigcap_{j=1}^{n} \{ f : p_{K_j}(f, 0) < \varepsilon_j \} \subset \{ f : \|f\| < 1 \}. \tag{10.17}$$
Because $\Omega$ is not compact, there is an $x_0 \notin \bigcup_{j=1}^{n} K_j$. Applying Theorem 7.14 on page 477, we choose $g \in C(\Omega)$ such that $g(x_0) = 1$ and $g(x) = 0$ for $x \in \bigcup_{j=1}^{n} K_j$. It now follows from (10.17) that $|\alpha|\,\|g\| = \|\alpha g\| < 1$ for each $\alpha \in \mathbb{C}$. On the other hand, because $g \ne 0$, we have $\|g\| \ne 0$. Having reached a contradiction, we conclude that there is no norm inducing the topology $\mathcal{T}(\Omega)$.

Nevertheless, it is clear that the topology and the linear structure on $C(\Omega)$ are related. In this section, we develop a theory of topological linear spaces that encompasses not only normed spaces, but interesting spaces like $(C(\Omega), \mathcal{T}(\Omega))$ as well.

DEFINITION 10.1 Topological Linear Space
Let $\Omega$ be a linear space with scalar field $\mathbb{F}$ and let $\mathcal{T}$ be a topology on $\Omega$. Then we say that $(\Omega, \mathcal{T})$ is a topological linear space if the operations of addition and scalar multiplication are continuous, that is, if the functions $A: \Omega \times \Omega \to \Omega$ and $M: \mathbb{F} \times \Omega \to \Omega$ defined by
$$A(x, y) = x + y \quad\text{and}\quad M(\alpha, x) = \alpha x$$
are continuous.
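A quick way to see that $\mathcal{T}(\mathcal{R})$ differs from the sup-norm topology is to slide a bump off to infinity. The hypothetical tent functions $h_k(x) = \max(0, 1 - |x - k|)$ satisfy $p_K(h_k, 0) = 0$ once $K \subset [-R, R]$ and $k > R + 1$. Thus $p_K(h_k, 0) \to 0$ for every compact $K$, even though $\sup_x |h_k(x)| = 1$ for every $k$. The sketch below approximates $p_K$ by sampling a grid. That approximation is an assumption of the sketch, but it is harmless here because the functions are piecewise linear.

```python
def tent(k):
    """Hypothetical tent function h_k(x) = max(0, 1 - |x - k|); it is supported on [k - 1, k + 1]."""
    return lambda x: max(0.0, 1.0 - abs(x - k))

def p_K(f, g, a, b, samples=4001):
    """Grid approximation of p_K(f, g) = sup{|f(x) - g(x)| : x in K} for K = [a, b]."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step) - g(a + i * step)) for i in range(samples))

zero = lambda x: 0.0
R = 10.0                                   # K = [-R, R]; every compact subset of R lies in such a K
for k in (5, 12, 50, 1000):
    h = tent(k)
    # p_K(h_k, 0) vanishes once k > R + 1, while the global sup |h_k(k)| = 1 never shrinks.
    print(k, p_K(h, zero, -R, R), h(k))
```

No single sup-type norm can capture this behavior. This matches the argument above that $\mathcal{T}(\Omega)$ is not induced by any norm when $\Omega$ is not compact.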
EXAMPLE 10.7 Illustrates Definition 10.1
a) Any normed space is a topological linear space.
b) If $\Omega$ is a locally compact Hausdorff space, then $(C(\Omega), \mathcal{T}(\Omega))$ is a topological linear space, as the reader is asked to verify in Exercise 10.28.
c) Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. For $0 < p < 1$, the space $\mathcal{L}^p(\mu)$ is a topological linear space with respect to the topology induced by the metric $\rho_p$, where
$$\rho_p(f, g) = \int_\Omega |f - g|^p\, d\mu, \qquad f, g \in \mathcal{L}^p(\mu).$$
See Exercise 10.40. □

Unless there is a danger of ambiguity, we will write $\Omega$ for the topological linear space $(\Omega, \mathcal{T})$. In what follows, we assume that the scalar field of $\Omega$ is $\mathbb{C}$ unless we state otherwise. Our results are easily adapted to the case of real scalars.

The next two propositions provide some basic properties of topological linear spaces. Note that the second one shows that the topology of a topological linear space is determined by a neighborhood basis at 0. We leave the proofs of both propositions to the reader as Exercises 10.29 and 10.30.

PROPOSITION 10.1
Let $\Omega$ be a topological linear space.
a) Suppose that $y \in \Omega$ and $\alpha$ is a nonzero scalar. Then the mappings $T(x) = x + y$ and $S(x) = \alpha x$ are homeomorphisms of $\Omega$ onto itself.
b) Suppose that $U$ is an open subset of $\Omega$, $A \subset \Omega$, and $\alpha$ is a nonzero scalar. Then $A + U$ and $\alpha U$ are open subsets of $\Omega$.

PROPOSITION 10.2
Let $\Omega$ be a topological linear space. Then there is a collection $\mathcal{W}$ of open sets containing 0 having the following properties:
a) $W_1, W_2 \in \mathcal{W} \Rightarrow W_3 \subset W_1 \cap W_2$ for some $W_3 \in \mathcal{W}$.
b) $W \in \mathcal{W}$ and $x \in W \Rightarrow$ there is a $W_1 \in \mathcal{W}$ such that $x + W_1 \subset W$.
c) $W \in \mathcal{W} \Rightarrow$ there is a $W_1 \in \mathcal{W}$ such that $W_1 + W_1 \subset W$.
d) $W \in \mathcal{W} \Rightarrow$ there are a $W_1 \in \mathcal{W}$ and an $\varepsilon > 0$ such that $\alpha W_1 \subset W$ whenever $|\alpha| < \varepsilon$.
e) $\{ x + W : x \in \Omega,\ W \in \mathcal{W} \}$ is a basis for the open sets of $\Omega$.
Conversely, if $\mathcal{W}$ is a collection of subsets of a linear space $\Omega$ satisfying (a)-(d) and $0 \in W$ for each $W \in \mathcal{W}$, then $\{ x + W : x \in \Omega,\ W \in \mathcal{W} \}$ is a basis for a topology $\mathcal{T}$ on $\Omega$ and $(\Omega, \mathcal{T})$ is a topological linear space.
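For the metric $\rho_p$ of Example 10.7(c) with $0 < p < 1$, the triangle inequality rests on the elementary inequality $|a + b|^p \le |a|^p + |b|^p$. The same spaces also show that the $\rho_p$-balls about 0 need not be convex. In $\mathcal{L}^p([0,1])$, take $f = \chi_{[0,1/2]}$ and $g = \chi_{(1/2,1]}$. Then $\rho_p(f,0) = \rho_p(g,0) = 1/2$, while the midpoint $(f+g)/2$ equals $1/2$ on $[0,1]$, so $\rho_p((f+g)/2, 0) = 2^{-p} > 1/2$. The sketch below checks both facts for $p = 1/2$. It represents step functions by hypothetical `(length, value)` pieces, and both that representation and the random spot-check are illustrative assumptions, not a proof.

```python
import random

def rho_p_zero(pieces, p):
    """rho_p(h, 0) = integral over [0,1] of |h|^p for a step function given as (length, value) pieces."""
    return sum(length * abs(value) ** p for length, value in pieces)

p = 0.5
f = [(0.5, 1.0), (0.5, 0.0)]        # chi_[0,1/2]
g = [(0.5, 0.0), (0.5, 1.0)]        # chi_(1/2,1]
mid = [(0.5, 0.5), (0.5, 0.5)]      # (f + g)/2 = 1/2 on [0,1]
rho_f, rho_g, rho_mid = rho_p_zero(f, p), rho_p_zero(g, p), rho_p_zero(mid, p)
print(rho_f, rho_g, rho_mid)        # the midpoint lies farther from 0 than f and g do

# Spot-check |a + b|^p <= |a|^p + |b|^p, which gives rho_p its triangle inequality.
rng = random.Random(0)
worst = max(abs(a + b) ** p - (abs(a) ** p + abs(b) ** p)
            for a, b in ((rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(10_000)))
print(worst)
```

Any ball $\{h : \rho_p(h,0) < r\}$ with $1/2 < r \le 2^{-p}$ contains $f$ and $g$ but not their midpoint.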
Locally Convex Topological Linear Spaces and Seminorms
When $\Omega$ is a normed space, we can take the collection $\mathcal{W}$ in Proposition 10.2 to be $\{ B_r(0) : r > 0 \}$. Thus, in the important case of a normed space, the sets in $\mathcal{W}$ can be assumed convex. This convexity property turns out to be the key to a significant generalization of normed spaces, called locally convex topological linear spaces.

DEFINITION 10.2 Locally Convex Topological Linear Space
A topological linear space $\Omega$ is said to be locally convex if there is a collection $\mathcal{W}$ of convex open sets containing 0 such that
a) $W_1, W_2 \in \mathcal{W} \Rightarrow W_3 \subset W_1 \cap W_2$ for some $W_3 \in \mathcal{W}$.
b) $W \in \mathcal{W}$ and $x \in W \Rightarrow$ there is a $W_1 \in \mathcal{W}$ such that $x + W_1 \subset W$.
c) $W \in \mathcal{W} \Rightarrow$ there is a $W_1 \in \mathcal{W}$ such that $W_1 + W_1 \subset W$.
d) $W \in \mathcal{W} \Rightarrow$ there are a $W_1 \in \mathcal{W}$ and an $\varepsilon > 0$ such that $\alpha W_1 \subset W$ whenever $|\alpha| < \varepsilon$.
e) $\{ x + W : x \in \Omega,\ W \in \mathcal{W} \}$ is a basis for the open sets of $\Omega$.

EXAMPLE 10.8 Illustrates Definition 10.2
a) Any normed space is a locally convex topological linear space.
b) As we will soon see, if $\Omega$ is a locally compact Hausdorff space, then $(C(\Omega), \mathcal{T}(\Omega))$ is a locally convex topological linear space.
c) Exercise 10.41 shows that the space in Example 10.7(c) is not a locally convex topological linear space. □

Locally convex topological linear spaces are often defined in terms of collections of objects called seminorms. A seminorm has the defining properties of a norm except that the seminorm of a nonzero element may be 0.

DEFINITION 10.3 Seminorm
Let $\Omega$ be a linear space having as its scalar field $\mathbb{F}$ either $\mathcal{R}$ or $\mathbb{C}$. A function $\sigma: \Omega \to \mathcal{R}$ is said to be a seminorm on $\Omega$ if it satisfies the following conditions for all $x, y \in \Omega$ and $\alpha \in \mathbb{F}$:
a) $\sigma(x) \ge 0$.
b) $\sigma(0) = 0$.
c) $\sigma(\alpha x) = |\alpha|\sigma(x)$.
d) $\sigma(x + y) \le \sigma(x) + \sigma(y)$.
Remark: Although condition (b) follows from condition (c), we have included the former to retain the resemblance of a seminorm to a norm.

Let $\sigma$ be a seminorm on a linear space $\Omega$. Then for each $x \in \Omega$ and $r > 0$, we define
$$B_r^\sigma(x) = \{ y : \sigma(x - y) < r \}.$$
It is important to note that, by the defining properties of a seminorm, sets of the form $B_r^\sigma(x)$ are convex.

EXAMPLE 10.9 Illustrates Definition 10.3
a) Any norm is a seminorm.
b) Let $\Omega$ be a locally compact, noncompact Hausdorff space and $K$ a compact subset of $\Omega$. The function $\|\ \|_K$ defined by
$$\|f\|_K = p_K(f, 0) = \sup\{ |f(x)| : x \in K \}, \qquad f \in C(\Omega),$$
is an example of a seminorm that is not a norm.
c) If $\ell$ is a linear functional on a linear space $V$, then $|\ell|$ is a seminorm on $V$. □

Let $\mathcal{S}$ be a collection of seminorms defined on a linear space $\Omega$. For $\sigma \in \mathcal{S}$ and $x \in \Omega$, define $\sigma_x$ by
$$\sigma_x(y) = \sigma(x + y), \qquad y \in \Omega.$$
Then we define the topology induced by $\mathcal{S}$ to be the weak topology on $\Omega$ determined by the family of functions $\{ \sigma_x : x \in \Omega,\ \sigma \in \mathcal{S} \}$.

PROPOSITION 10.3
Let $\Omega$ be a linear space having the topology $\mathcal{T}_{\mathcal{S}}$ induced by a family of seminorms $\mathcal{S}$ and let $\mathcal{W}$ denote the collection of subsets of $\Omega$ consisting of intersections of finitely many sets of the form $B_r^\sigma(0)$. Then
a) $\{ x + W : x \in \Omega,\ W \in \mathcal{W} \}$ is a basis for $\mathcal{T}_{\mathcal{S}}$.
b) $(\Omega, \mathcal{T}_{\mathcal{S}})$ is a locally convex topological linear space.

PROOF: It can be shown that $\mathcal{W}$ satisfies the conditions (a)-(d) of Proposition 10.2. The details are left to the reader as Exercise 10.31.

EXAMPLE 10.10 Topologies Induced by a Collection of Seminorms
a) Let $(\Omega, \|\ \|)$ be a normed space. Then the topology induced by the single-element collection $\mathcal{S} = \{ \|\ \| \}$ is the same as the topology induced by the norm $\|\ \|$.
b) Let $\Omega$ be a locally compact Hausdorff space. Then it follows from Example 10.9(b) and Proposition 10.3 that $(C(\Omega), \mathcal{T}(\Omega))$ is a locally convex topological linear space. □
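Example 10.9(c) is easy to probe numerically. The sketch below spot-checks conditions (a)-(d) of Definition 10.3 for the seminorm $|\ell|$ induced by a hypothetical linear functional $\ell(x) = x_1 - 2x_2$ on $\mathcal{R}^3$, assuming real scalars. It also exhibits a nonzero vector with $|\ell(x)| = 0$, which is exactly what keeps $|\ell|$ from being a norm. Random sampling only tests the axioms at finitely many points; it does not prove them.

```python
import random

def ell(x):
    """Hypothetical linear functional on R^3: ell(x) = x1 - 2*x2."""
    return x[0] - 2.0 * x[1]

def sigma(x):
    """The seminorm |ell| of Example 10.9(c)."""
    return abs(ell(x))

rng = random.Random(1)
def rand_vec():
    return [rng.uniform(-10.0, 10.0) for _ in range(3)]

TOL = 1e-9   # slack for floating-point rounding
def axioms_hold(x, y, a):
    """Spot-check conditions (a), (c), (d) of Definition 10.3 at one sample."""
    cond_a = sigma(x) >= 0
    cond_c = abs(sigma([a * t for t in x]) - abs(a) * sigma(x)) <= TOL * (1.0 + abs(a) * sigma(x))
    cond_d = sigma([s + t for s, t in zip(x, y)]) <= sigma(x) + sigma(y) + TOL
    return cond_a and cond_c and cond_d

all_ok = sigma([0.0, 0.0, 0.0]) == 0.0 and all(            # condition (b), then (a), (c), (d)
    axioms_hold(rand_vec(), rand_vec(), rng.uniform(-10.0, 10.0)) for _ in range(5_000))

z = [2.0, 1.0, 5.0]   # z != 0 but ell(z) = 2 - 2*1 = 0, so sigma is not a norm
print(all_ok, sigma(z))
```

The same pattern, a seminorm that vanishes on a nonzero vector, is what happens with $\|\ \|_K$ in Example 10.9(b) for functions supported outside $K$.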
PROPOSITION 10.4
Let $\Omega$ be a linear space with a topology induced by a collection of seminorms $\mathcal{S}$. Then the following hold:
a) $\Omega$ is Hausdorff if and only if $x \ne 0 \Rightarrow \sigma(x) \ne 0$ for some $\sigma \in \mathcal{S}$.
b) Suppose that $\{x_\iota\}_{\iota \in I}$ is a net in $\Omega$. Then $\lim x_\iota = x$ if and only if $\lim \sigma(x_\iota - x) = 0$ for each $\sigma \in \mathcal{S}$.

PROOF: a) Suppose $\Omega$ is Hausdorff and let $x \ne 0$. Then, by Proposition 10.3, there are an $\varepsilon > 0$ and a $\sigma \in \mathcal{S}$ such that $x \notin B_\varepsilon^\sigma(0)$. Hence, $\sigma(x) \ge \varepsilon$. To prove the converse, let $x$ and $y$ be distinct elements of $\Omega$. Then, by assumption, there are an $\varepsilon > 0$ and a $\sigma \in \mathcal{S}$ such that $\sigma(x - y) = \varepsilon$. It follows that $x + B_{\varepsilon/2}^\sigma(0)$ and $y + B_{\varepsilon/2}^\sigma(0)$ are disjoint open sets containing $x$ and $y$, respectively. Hence, $(\Omega, \mathcal{T}_{\mathcal{S}})$ is Hausdorff.

b) By Proposition 7.13 on page 444, we know that $\lim x_\iota = x$ if and only if $\lim \sigma_y(x_\iota) = \sigma_y(x)$ for each $y \in \Omega$ and $\sigma \in \mathcal{S}$. Suppose that $\lim x_\iota = x$ and let $\sigma \in \mathcal{S}$. Then, setting $y = -x$, we get
$$\lim \sigma(x_\iota - x) = \lim \sigma_{-x}(x_\iota) = \sigma_{-x}(x) = \sigma(0) = 0.$$
Conversely, suppose that $\lim \sigma(x_\iota - x) = 0$ for each $\sigma \in \mathcal{S}$. Then, using condition (d) of Definition 10.3, we get
$$\lim |\sigma_y(x_\iota) - \sigma_y(x)| \le \lim \sigma(x_\iota - x) = 0$$
for all $y \in \Omega$ and $\sigma \in \mathcal{S}$. Thus, $\lim x_\iota = x$.

For topologies induced by collections of seminorms, there is a nice analogue of Proposition 9.1 on page 529. We present this as Proposition 10.5 and leave the proof to the reader as Exercise 10.32.

PROPOSITION 10.5
Let $\Omega$ and $\Lambda$ be linear spaces with the same scalar fields and having topologies induced, respectively, by the collections of seminorms $\mathcal{S}_1$ and $\mathcal{S}_2$. For a linear mapping $T: \Omega \to \Lambda$, the following are equivalent:
a) $T$ is continuous.
b) $T$ is continuous at 0.
c) For each $\sigma \in \mathcal{S}_2$ there exist $\sigma_1, \sigma_2, \ldots, \sigma_n \in \mathcal{S}_1$ and a constant $\alpha$ such that $\sigma(T(x)) \le \alpha \max\{ \sigma_j(x) : j = 1, 2, \ldots, n \}$ for all $x \in \Omega$.

We observe that if $\Lambda = \mathbb{C}$ (with the usual topology), then the seminorm $\sigma$ in Proposition 10.5 is just the modulus of a complex number.
Linear Functionals and Separation by Hyperplanes
Because continuous linear functionals play an important role in the theory of normed spaces, one might expect that they would also be significant in the theory of topological linear spaces. Surprisingly, though, there are naturally arising examples of topological linear spaces having no continuous linear functionals other than the one that is identically 0. (See Exercise 10.41.)

The situation becomes much more agreeable, however, if local convexity is assumed. We will devote the remainder of this section to showing that, in the locally convex case, there is an abundance of continuous linear functionals; indeed, there are enough to separate elements from closed convex subsets.

DEFINITION 10.4 Internal Point, Support Function
Let $V$ be a linear space and $A$ a convex subset of $V$.
a) An element $u \in A$ is said to be an internal point of $A$ if for each $v \in V$, there is an $\varepsilon > 0$ such that $u + \alpha v \in A$ for all scalars $\alpha$ such that $|\alpha| < \varepsilon$.
b) If 0 is an internal point of $A$, then the function $s_A$ defined on $V$ by
$$s_A(v) = \inf\{ r : r^{-1}v \in A,\ r > 0 \}$$
is called the support function of $A$.

EXAMPLE 10.11 Illustrates Definition 10.4
Let $(\Omega, \|\ \|)$ be a normed space. Then 0 is an internal point of $B_1(0)$ and, as the reader is asked to verify in Exercise 10.34, $s_{B_1(0)} = \|\ \|$. □

Our next proposition, Proposition 10.6, shows that support functions behave much like norms.

PROPOSITION 10.6
Let $V$ be a linear space and 0 an internal point of the convex set $A$. Then
a) $s_A(\alpha v) = \alpha s_A(v)$ for all $v \in V$ and $\alpha > 0$.
b) $s_A(v_1 + v_2) \le s_A(v_1) + s_A(v_2)$ for all $v_1, v_2 \in V$.
c) $\{ v : s_A(v) < 1 \} \subset A \subset \{ v : s_A(v) \le 1 \}$.
PROOF: We leave the proofs of (a) and (c) to the reader in Exercise 10.35. To prove (b), let $r_1$ and $r_2$ be positive numbers such that $r_1^{-1}v_1, r_2^{-1}v_2 \in A$. As $A$ is convex, we have that
$$(r_1 + r_2)^{-1}(v_1 + v_2) = \bigl(r_1/(r_1 + r_2)\bigr)r_1^{-1}v_1 + \bigl(r_2/(r_1 + r_2)\bigr)r_2^{-1}v_2 \in A.$$
Hence, $s_A(v_1 + v_2) \le r_1 + r_2$. Taking infimums with respect to $r_1$ and $r_2$, we obtain (b).

We will use support functions to study separation by a hyperplane. To introduce that topic, let $E$ and $F$ be disjoint closed convex subsets of $\mathcal{R}^2$. Then, as shown in Fig. 10.1, $E$ and $F$ can be separated by a line $L$ in the following sense: Associated with $L$, there are a nontrivial linear functional $\ell$ on $\mathcal{R}^2$ and a real number $a$ such that $L = \ell^{-1}(\{a\})$, $E \subset \ell^{-1}((-\infty, a])$, and $F \subset \ell^{-1}([a, \infty))$. Note that
$$\sup\{ \ell(v) : v \in E \} \le \inf\{ \ell(u) : u \in F \}$$
is a necessary condition for such a separation.

FIGURE 10.1 Separation by a hyperplane.

Similarly, disjoint closed convex subsets of $\mathcal{R}^3$ can be separated by a plane. In what follows, we will generalize these two simple examples to locally convex topological linear spaces.

THEOREM 10.8
Let $V$ be a linear space with real scalars. Suppose that $A_1$ and $A_2$ are nonempty disjoint convex subsets of $V$ and that $A_1$ has an internal point. Then there is a nontrivial linear functional $\ell$ on $V$ such that
$$\sup\{ \ell(v) : v \in A_1 \} \le \inf\{ \ell(u) : u \in A_2 \}. \tag{10.18}$$
PROOF: Let $v_1$ be an internal point of $A_1$, $v_2$ be any point of $A_2$, and $v_0 = v_2 - v_1$. Then it is easy to check that the set $A = v_0 + A_1 - A_2$ is convex and contains 0 as an internal point. We define a linear functional $\ell_0$ on the subspace $V_0 = \{ \alpha v_0 : \alpha \in \mathcal{R} \}$ by $\ell_0(\alpha v_0) = \alpha$ and will show that $\ell_0$ is dominated on $V_0$ by $s_A$. Because $A_1 \cap A_2 = \emptyset$, we have $v_0 \notin A$. Hence, by Proposition 10.6, $s_A(v_0) \ge 1$. Using Proposition 10.6 again, we conclude that for $\alpha > 0$,
$$\ell_0(\alpha v_0) = \alpha \le \alpha s_A(v_0) = s_A(\alpha v_0).$$
On the other hand, if $\alpha \le 0$, then $\ell_0(\alpha v_0) \le s_A(\alpha v_0)$ is trivially true. We can now invoke the Hahn-Banach theorem (page 580) to obtain a linear functional $\ell$ on $V$ such that $\ell|_{V_0} = \ell_0$ and
$$\ell(v) \le s_A(v), \qquad v \in V.$$
Because $\ell(v_0) = 1$, $\ell$ is not identically 0. Also, if $v \in A_1$ and $u \in A_2$, then $v_0 + v - u \in A$ and, so, by Proposition 10.6,
$$1 + \ell(v) - \ell(u) = \ell(v_0 + v - u) \le s_A(v_0 + v - u) \le 1.$$
The inequality (10.18) now follows immediately.

Next, we would like to prove a version of Theorem 10.3 (page 584) for topological linear spaces. To do so, we will need the following lemma.

LEMMA 10.1
Let $\ell$ be a linear functional on the topological linear space $\Omega$. If there is an open set $W$ containing 0 such that $\sup\{ \Re\ell(x) : x \in W \} < \infty$, then $\ell$ is continuous.

PROOF: It suffices to show that if $x_0 \in \Omega$ and $\varepsilon > 0$, then there is an open set $U$ containing 0 such that $\ell(x_0 + U) \subset \{ z : |\ell(x_0) - z| < \varepsilon \}$. Let $b_0 = \sup\{ \Re\ell(x) : x \in W \}$. Since $0 \in W$, it follows that $b_0 \ge 0$. Applying Proposition 10.2(d), we choose a $\delta > 0$ and an open set $O$ containing 0 such that $\alpha O \subset W$ whenever $|\alpha| < \delta$. Let $b > b_0$ and set
$$U = \varepsilon b^{-1} \bigcup_{|\alpha| < \delta} \alpha O.$$
Then $U$ is open and $e^{it}U \subset U$ for all $t \in \mathcal{R}$. Let $y \in U$. Selecting $t$ so that $|\ell(y)| = e^{it}\ell(y)$, we get
$$|\ell(y)| = e^{it}\ell(y) = \Re\ell(e^{it}y) = \varepsilon b^{-1}\,\Re\ell(\varepsilon^{-1}b\,e^{it}y) \le \varepsilon b^{-1}b_0 < \varepsilon.$$
Hence, $|\ell(x_0) - \ell(x_0 + y)| < \varepsilon$.

THEOREM 10.9
Let $F$ be a nonempty closed convex subset of a locally convex topological linear space $\Omega$. If $x_0 \in F^c$, then there is a continuous linear functional $\ell$ on $\Omega$ such that
$$\Re\ell(x_0) < \inf\{ \Re\ell(x) : x \in F \}.$$

PROOF: The set $-x_0 + F$ is convex and closed and does not contain 0. Since $\Omega$ is locally convex, there is a convex open set $U$ that contains 0 and is disjoint from $-x_0 + F$. And because scalar multiplication is continuous, the point 0 is an internal point of $U$. Thus, by Theorem 10.8, there is a nontrivial linear functional $\ell$ on $\Omega$ such that
$$\sup\{ \Re\ell(u) : u \in U \} \le \inf\{ \Re\ell(v) : v \in -x_0 + F \}.$$
Applying Lemma 10.1, we conclude that $\ell$ is continuous.

Because $0 \in U$, we have $\sup\{ \Re\ell(u) : u \in U \} \ge 0$. We claim that this inequality is strict. Suppose to the contrary and let $z \in \Omega$. Then by choosing $c > 0$ small enough, we get $e^{it}cz \in U$ for all $t \in \mathcal{R}$ and, hence, $\Re\ell(e^{it}cz) \le 0$ for all $t \in \mathcal{R}$. Upon selecting $t$ so that $e^{it}\ell(z) = |\ell(z)|$, it follows that $\ell(z) = 0$. We have shown that $\ell$ is identically 0, a contradiction. From $\sup\{ \Re\ell(u) : u \in U \} > 0$ and the previous displayed inequality, we conclude that $\inf\{ \Re\ell(v) : v \in -x_0 + F \} > 0$. Consequently, we have
$$\Re\ell(x_0) < \inf\{ \Re\ell(x) : x \in F \}.$$

Remaining consistent with the notation used for normed spaces, we will write $\Omega^*$ for the space of all continuous linear functionals on the topological linear space $\Omega$ and $\langle x, x^* \rangle$ for $x^*(x)$ when $x \in \Omega$ and $x^* \in \Omega^*$. Theorem 10.9 shows that when $\Omega$ is locally convex and Hausdorff, $\Omega^*$ has enough members to separate the points of $\Omega$. We refer the reader to Exercises 10.40-10.41 for an example of a topological linear space where the only continuous linear functional is identically zero.
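In $\mathcal{R}^2$ with the Euclidean inner product, a functional with the property promised by Theorem 10.9 can be written down explicitly. The route is different from the Hahn-Banach argument above. If $p$ is the point of the closed convex set $F$ nearest to $x_0$, then $\ell(x) = \langle x, p - x_0 \rangle$ satisfies $\ell(x_0) < \inf_F \ell = \ell(p) = \ell(x_0) + |p - x_0|^2$. The sketch below uses a hypothetical box $F = [1,3] \times [-1,2]$, for which the nearest point is obtained by clamping coordinates, and it checks the strict inequality on sampled points of $F$. This projection method relies on the inner-product structure and is only an illustration of the theorem's conclusion (real scalars, so $\Re\ell = \ell$), not the proof given in the text.

```python
import random

def project_onto_box(x, lo, hi):
    """Nearest point of the box [lo[0], hi[0]] x [lo[1], hi[1]] to x (coordinate-wise clamping)."""
    return [min(max(xi, a), b) for xi, a, b in zip(x, lo, hi)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

lo, hi = [1.0, -1.0], [3.0, 2.0]     # hypothetical closed convex set F
x0 = [-1.0, 4.0]                      # a point outside F
p = project_onto_box(x0, lo, hi)      # nearest point of F to x0
normal = [pi - xi for pi, xi in zip(p, x0)]   # ell(x) = <x, p - x0>

ell_x0 = dot(x0, normal)
inf_F = dot(p, normal)                # the infimum of ell over F is attained at p

rng = random.Random(2)
samples = [[rng.uniform(lo[0], hi[0]), rng.uniform(lo[1], hi[1])] for _ in range(10_000)]
min_sampled = min(dot(x, normal) for x in samples)
print(p, ell_x0, inf_F, min_sampled)
```

The gap $\inf_F \ell - \ell(x_0)$ equals $|p - x_0|^2$, the squared distance from $x_0$ to $F$. This is why the separation is strict whenever $x_0 \notin F$.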
When the convex set in Theorem 10.9 is a linear subspace of $\Omega$, we have the following refinement, which is also a generalization of Theorem 10.3 on page 584.
COROLLARY 10.5
Let $K$ be a closed linear subspace of the locally convex topological linear space $\Omega$ and let $x \in K^c$. Then there is an $x_0^* \in K^{\perp}$ such that $\langle x, x_0^* \rangle > 0$.

PROOF: By Theorem 10.9, there is a continuous linear functional $x_1^*$ such that
$$\Re\langle x, x_1^* \rangle < \inf\{ \Re\langle y, x_1^* \rangle : y \in K \}.$$
Let $y \in K$ and $b > 0$. Choose $t$ so that $e^{it}\langle y, x_1^* \rangle = |\langle y, x_1^* \rangle|$. Because $-be^{it}y \in K$, we have
$$\Re\langle x, x_1^* \rangle < \Re\langle -be^{it}y, x_1^* \rangle = \Re\bigl(-be^{it}\langle y, x_1^* \rangle\bigr) = -b|\langle y, x_1^* \rangle|.$$
As $b$ is an arbitrary positive number, it follows that $\langle y, x_1^* \rangle = 0$. Moreover, taking $y = 0$ shows that $\Re\langle x, x_1^* \rangle < 0$, so $\langle x, x_1^* \rangle \ne 0$. If $s$ is chosen so that $e^{is}\langle x, x_1^* \rangle = |\langle x, x_1^* \rangle|$, then the functional $x_0^* = e^{is}x_1^*$ satisfies the assertions of the corollary.

The next corollary shows that the point $x_0$ in Theorem 10.9 can be replaced by a compact convex set disjoint from $F$.

COROLLARY 10.6
Let $\Omega$ be a locally convex topological linear space. Suppose that $F$ and $K$ are, respectively, nonempty closed and compact convex subsets of $\Omega$ such that $F \cap K = \emptyset$. Then there is an $x^* \in \Omega^*$ such that
$$\sup\{ \Re\langle y, x^* \rangle : y \in K \} < \inf\{ \Re\langle x, x^* \rangle : x \in F \}.$$

PROOF: Clearly $F - K$ is convex; we claim that it is also closed. Let $\{x_\iota - y_\iota\}_{\iota \in I}$ be a net in $F - K$ such that $\lim(x_\iota - y_\iota) = z$. Because $K$ is compact, there is a subnet $\{y_{\iota_\eta}\}_{\eta \in \Gamma}$ that converges to an element $y \in K$. It follows that $\{x_{\iota_\eta}\}_{\eta \in \Gamma}$ is a net in $F$ converging to $z + y$. Because $F$ is closed, $z + y \in F$; hence, $z \in F - K$. We have now shown that $F - K$ is closed.

Because $F$ and $K$ are disjoint, $0 \notin F - K$. Hence, we can apply Theorem 10.9 to obtain a continuous linear functional $x^*$ such that
$$0 = \Re\langle 0, x^* \rangle < \inf\{ \Re\langle x - y, x^* \rangle : x \in F,\ y \in K \} = \inf\{ \Re\langle x, x^* \rangle : x \in F \} - \sup\{ \Re\langle y, x^* \rangle : y \in K \},$$
as required.

In the remaining two sections of this chapter, we will see many applications of Theorem 10.9 and its corollaries. Here we content ourselves with the following simple examples.
EXAMPLE 10.12 Illustrates Theorem 10.9
a) Let $F$ be a nonempty closed convex subset of the normed space $\Omega$. By Theorem 10.9, if $0 \notin F$, then there exists an $x^* \in \Omega^*$ such that $\inf\{\, \Re\langle x, x^* \rangle : x \in F \,\} > 0$.
b) Let $F = \{\, f \in C([0,1]) : \Re f(t) \ge t,\ t \in [0,1] \,\}$. As $0 \notin F$, part (a) guarantees the existence of an $x^* \in C([0,1])^*$ such that $\inf\{\, \Re\langle f, x^* \rangle : f \in F \,\} > 0$. The continuous linear functional on $C([0,1])$ defined by $\langle g, x^* \rangle = \int_0^1 g(t)\,dt$ satisfies that condition. In fact, the infimum is $1/2$, attained at $f(t) = t$.
c) If we replace $C([0,1])$ by $C(\mathcal{R})$ in part (b), but use the same $F$ and $x^*$, then we obtain an illustration of Theorem 10.9 for the case of a locally convex topological linear space that is not a normed space. □

EXERCISES 10.3
10.28 Let $\Omega$ be a locally compact Hausdorff space. Prove that $(C(\Omega), \mathcal{T}(\Omega))$ is a topological linear space.
10.29 Prove Proposition 10.1.
10.30 Prove Proposition 10.2.
10.31 Prove Proposition 10.3.
10.32 Prove Proposition 10.5.
10.33 Let $U$ be an open convex subset of a topological linear space. Show that all points of $U$ are internal points.
10.34 Show that in a normed space, the support function of the open unit ball around 0 is equal to the norm.
10.35 Prove (a) and (c) of Proposition 10.6.
10.36 Let $\ell$ be a linear functional on a topological linear space $\Omega$. Prove that the following conditions on $\ell$ are equivalent:
a) $\ell$ is continuous.
b) $\ell$ is continuous at some point of $\Omega$.
c) $\sup\{\, \Re\,\ell(u) : u \in U \,\} < \infty$ for some nonempty open set $U$.
d) $\inf\{\, \Re\,\ell(u) : u \in U \,\} > -\infty$ for some nonempty open set $U$.
e) $\sup\{\, |\ell(u)| : u \in U \,\} < \infty$ for some nonempty open set $U$.
10.37 Show that Corollary 10.6 fails if the compactness assumption on $K$ is replaced by the assumption that $K$ is closed.
10.38 Let $A$ and $B$ be subsets of a topological linear space $\Omega$. Show that if $A$ is closed and $B$ is compact, then $A + B$ is closed.
10.39 Show that if a topological linear space is locally convex and $T_1$ (i.e., single-element subsets are closed), then it is Hausdorff.
10.40 Consider the space $\mathcal{L}^p([0,1])$, where $0 < p < 1$. By Exercise 9.54 on page 562, the function $\rho_p$ defined by
$$\rho_p(f,g) = \sigma_p(f - g) = \int_{[0,1]} |f - g|^p\,d\lambda, \qquad f, g \in \mathcal{L}^p([0,1]),$$
is a metric on $\mathcal{L}^p([0,1])$. Show that $\mathcal{L}^p([0,1])$ is a topological linear space with respect to the topology induced by $\rho_p$.
10.41 Refer to Exercise 10.40.
a) Show that $(\mathcal{L}^p([0,1]))^*$ contains only the functional that is identically 0.
b) Deduce that $\mathcal{L}^p([0,1])$ is not locally convex when $0 < p < 1$.
In Exercises 10.42-10.44, $C^\infty(\mathcal{R})$ denotes the space of complex-valued functions having derivatives of all orders at each point of $\mathcal{R}$. For nonnegative integers $n$ and $m$, define
$$\sigma_{n,m}(f) = \sup\{\, |t|^n |f^{(m)}(t)| : t \in \mathcal{R} \,\}, \qquad f \in C^\infty(\mathcal{R}).$$
We will consider the space $\mathcal{S}(\mathcal{R}) = \{\, f \in C^\infty(\mathcal{R}) : \sigma_{n,m}(f) < \infty,\ n, m = 0, 1, 2, \ldots \,\}$.
10.42 Let the notation be as in the previous paragraph.
a) Show that $\mathcal{S}(\mathcal{R})$ is a linear space.
b) Show that functions of the form $p(x)e^{-x^2}$, where $p(x)$ is a polynomial, belong to $\mathcal{S}(\mathcal{R})$.
10.43 Let the notation be as in the foregoing.
a) Show that $\{\, \sigma_{n,m} : n, m = 0, 1, \ldots \,\}$ is a family of seminorms inducing a Hausdorff topology on $\mathcal{S}(\mathcal{R})$.
b) Show that the linear operators $D(f) = f'$ and $M(f) = pf$, where $p$ is a polynomial, are continuous with respect to this topology.
10.44 Call a subset $\mathcal{F} \subset \mathcal{S}(\mathcal{R})$ bounded if $\sup\{\, \sigma_{n,m}(f) : f \in \mathcal{F} \,\} < \infty$ for each pair of nonnegative integers $n$ and $m$. Let $\mathcal{S}(\mathcal{R})$ have the topology defined in Exercise 10.43. Prove the following version of the Heine-Borel theorem: $\mathcal{F} \subset \mathcal{S}(\mathcal{R})$ is compact if and only if it is closed and bounded.

10.4 WEAK AND WEAK* TOPOLOGIES
In this section we will introduce and discuss the main properties of the weak topology on a normed space and the weak* topology on its dual space. Included will be an investigation of weak convergence and weak
boundedness of sequences in a normed space and an important result about weak* compactness.
Let $\ell$ be a linear functional on a linear space $V$. Then, as we have seen, $|\ell|$ is a seminorm on $V$. Therefore, according to Proposition 10.3 on page 601, if $\mathcal{F}$ is a family of linear functionals on $V$, the collection of seminorms $\{\, |\ell| : \ell \in \mathcal{F} \,\}$ induces a locally convex topology on $V$. The cases where $V$ is a normed space or its dual are important enough to warrant a special definition.

DEFINITION 10.5 Weak and Weak* Topologies
Let $\Omega$ be a normed space. For each $x \in \Omega$, define the linear functional $\ell_x$ on $\Omega^*$ by $\ell_x(x^*) = x^*(x) = \langle x, x^* \rangle$. Then we use the following terminology:
• Weak topology: the topology on $\Omega$ induced by the collection of seminorms $\{\, |x^*| : x^* \in \Omega^* \,\}$.
• Weak* topology: the topology on $\Omega^*$ induced by the collection of seminorms $\{\, |\ell_x| : x \in \Omega \,\}$.

Because we work with the norm topologies on $\Omega$ and $\Omega^*$ as well as the weak and weak* topologies, it is useful to have a convenient way to distinguish these topologies. When no modifier is used, we assume that the topology is the norm topology. So, for instance, when we say that a function is continuous, we mean that it is continuous with respect to the norm topology, and when we say that a set is closed, we mean that it is closed with respect to the norm topology. On the other hand, we will employ the words weak and weakly to indicate "with respect to the weak topology" and, similarly, use the term weak* to indicate "with respect to the weak* topology." Thus, for example, a function that is continuous with respect to the weak* topology is called weak* continuous, and a set that is closed in the weak topology is called weak closed or weakly closed.
Let $\{x_\iota\}_{\iota \in I}$ and $\{x_\iota^*\}_{\iota \in I}$ be nets in $\Omega$ and $\Omega^*$, respectively.
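Then we use the notation
$$\text{w}\lim x_\iota = x \qquad\text{and}\qquad \text{w}^*\lim x_\iota^* = x^* \qquad (10.19)$$
to denote, respectively, that $\{x_\iota\}_{\iota \in I}$ weak converges (or converges weakly) to $x$ and $\{x_\iota^*\}_{\iota \in I}$ weak* converges to $x^*$. We observe that (10.19) holds if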
and only if
$$\lim \langle x_\iota, x^* \rangle = \langle x, x^* \rangle, \quad x^* \in \Omega^*, \qquad\text{and}\qquad \lim \langle x, x_\iota^* \rangle = \langle x, x^* \rangle, \quad x \in \Omega,$$
respectively.

EXAMPLE 10.13 Illustrates Definition 10.5
a) Let $\mathcal{H}$ be a Hilbert space. By Theorem 9.8 on page 551, each element of $\mathcal{H}^*$ is of the form $\langle \cdot, y \rangle$ for some $y \in \mathcal{H}$. Hence, by associating $y$ with $\langle \cdot, y \rangle$, we can identify $\mathcal{H}$ with its dual space. A consequence of this identification is that the weak and weak* topologies of a Hilbert space coincide. Now suppose $\mathcal{H}$ contains an infinite orthonormal sequence $\{e_n\}_{n=1}^\infty$. Then, by Bessel's inequality, we have
$$\sum_{n=1}^\infty |\langle x, e_n \rangle|^2 \le \|x\|^2 < \infty$$
for each $x \in \mathcal{H}$ and, so, $\lim_{n\to\infty} \langle x, e_n \rangle = 0$ for each $x \in \mathcal{H}$. It follows that $\text{w}\lim e_n = 0$; but, on the other hand, because $\|e_n\| = 1$ for all $n$, the sequence $\{e_n\}_{n=1}^\infty$ cannot converge to 0 with respect to the norm topology on $\mathcal{H}$. This simple example shows that the norm topology on a normed space can be strictly stronger than the weak topology. Indeed, as the reader is asked to verify in Exercise 10.48, the weak (weak*) topology coincides with the norm topology on $\Omega$ ($\Omega^*$) if and only if $\Omega$ is finite dimensional.
b) Suppose that $\Omega$ is a compact Hausdorff space. In view of the Riesz representation theorem (Theorem 9.15 on page 574), we know that the dual space of $C(\Omega)$ can be identified with the space $M(\Omega)$ of regular Borel measures on $\Omega$. Thus, for a sequence $\{f_n\}_{n=1}^\infty$ of $C(\Omega)$, we have $\text{w}\lim f_n = f$ if and only if
$$\lim_{n\to\infty} \int_\Omega f_n\,d\mu = \int_\Omega f\,d\mu, \qquad \mu \in M(\Omega).$$
And, for a sequence $\{\mu_n\}_{n=1}^\infty$ of $M(\Omega)$, we have $\text{w}^*\lim \mu_n = \mu$ if and only if
$$\lim_{n\to\infty} \int_\Omega f\,d\mu_n = \int_\Omega f\,d\mu, \qquad f \in C(\Omega).$$
c) Suppose that $\Omega$ is a locally compact Hausdorff space. In view of the Riesz representation theorem (Theorem 9.16 on page 575), we know
that the dual space of $C_0(\Omega)$ can be identified with the space $M(\Omega)$ of regular Borel measures on $\Omega$. Thus, the same results hold as in part (b) provided we replace $C(\Omega)$ by $C_0(\Omega)$.
d) A sequence $\{X_n\}_{n=1}^\infty$ of random variables (not necessarily defined on the same probability space) is said to converge in distribution to the random variable $X$ if the sequence $\{\mu_{X_n}\}_{n=1}^\infty$ converges to $\mu_X$ in the weak* topology. We write $X_n \xrightarrow{d} X$ to indicate convergence in distribution. Thus, we have $X_n \xrightarrow{d} X$ if and only if
$$\lim_{n\to\infty} \int_{\mathcal{R}} f\,d\mu_{X_n} = \int_{\mathcal{R}} f\,d\mu_X, \qquad f \in C_0(\mathcal{R}).$$
Actually, it can be shown that the previous limit holds for all $f \in C_b(\mathcal{R})$. (See Exercise 10.59.) An equivalent condition for $X_n \xrightarrow{d} X$ is that $F_{X_n}(x) \to F_X(x)$ as $n \to \infty$ for each $x \in \mathcal{R}$ at which $F_X$ is continuous. The reader is asked to verify this in Exercise 10.61. In that exercise, we also ask the reader to show that convergence in probability implies convergence in distribution.
e) We will show that a familiar example of convergence is really weak* convergence in disguise. Consider the sequence of measures in $M([0,1])$ defined by $\mu_n = (1/n)\sum_{j=1}^n \delta_{j/n}$. Then, for each $f \in C([0,1])$, we have
$$\lim_{n\to\infty} \int_{[0,1]} f\,d\mu_n = \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^n f(j/n) = \int_0^1 f(x)\,dx.$$
It follows that the sequence $\{\mu_n\}_{n=1}^\infty$ converges in the weak* topology to Lebesgue measure on $[0,1]$. □

Our next theorem provides some fundamental properties of the weak topology on a normed space.

THEOREM 10.10 Let $\Omega$ be a normed space.
a) With respect to the weak topology, $\Omega$ is a locally convex Hausdorff topological linear space.
b) Weakly closed subsets of $\Omega$ are closed.
c) Convex closed subsets of $\Omega$ are weakly closed.

PROOF: a) By definition, the weak topology is induced by the collection of seminorms $\{\, |x^*| : x^* \in \Omega^* \,\}$. Hence, it follows from Proposition 10.3 on
page 601 that $\Omega$ is locally convex with respect to the weak topology. That the weak topology is Hausdorff follows from Corollary 10.2 on page 585 and Proposition 10.4 on page 602.
b) This result follows immediately because the weak topology on $\Omega$ is weaker than the norm topology on $\Omega$.
c) Let $F$ be a nonempty closed convex subset of $\Omega$. We will prove that $F^c$ is weakly open by showing that, for each $x \in F^c$, there is a weakly open set $W$ such that
$$x \in W \subset F^c. \qquad (10.20)$$
By Theorem 10.9 on page 606, there exists an $x^* \in \Omega^*$ such that
$$\Re\langle x, x^* \rangle < d = \inf\{\, \Re\langle y, x^* \rangle : y \in F \,\}.$$
It follows that the weakly open set $W = \{\, w \in \Omega : \Re\langle w, x^* \rangle < d \,\}$ satisfies (10.20). ■

Bounded and Weakly Bounded Sets
A subset $E$ of a normed space $\Omega$ is said to be bounded if $\sup\{\, \|x\| : x \in E \,\} < \infty$. Our next definition provides a less restrictive notion of boundedness.

DEFINITION 10.6 Weakly Bounded Set
A subset $E$ of a normed space $\Omega$ is said to be weakly bounded if $\sup\{\, |\langle x, x^* \rangle| : x \in E \,\} < \infty$ for each $x^* \in \Omega^*$.

The inequality $|\langle x, x^* \rangle| \le \|x^*\|_* \|x\|$ implies that bounded sets are weakly bounded. Although not obvious, it is nevertheless true that weakly bounded sets are bounded.

THEOREM 10.11 A subset of a normed space is weakly bounded if and only if it is bounded.
PROOF: We have already established sufficiency. To prove necessity, we first note that each $x \in E$ determines a continuous linear functional $\ell_x$ on $\Omega^*$ via $\ell_x(x^*) = \langle x, x^* \rangle$. Because $E$ is weakly bounded,
$$\sup\{\, |\ell_x(x^*)| : x \in E \,\} = \sup\{\, |\langle x, x^* \rangle| : x \in E \,\} < \infty$$
for each $x^* \in \Omega^*$. Recalling that $\Omega^*$ is a Banach space, we apply the uniform boundedness principle for linear operators (page 595) to conclude that $\sup\{\, |||\ell_x||| : x \in E \,\} < \infty$. But, by Corollary 10.2 on page 585, we know that $|||\ell_x||| = \|x\|$. Hence, $\sup\{\, \|x\| : x \in E \,\} < \infty$. ■

EXAMPLE 10.14 Illustrates Theorem 10.11
In this example we will use Theorem 10.11 to characterize the weakly convergent sequences in the space $C(\Omega)$, where $\Omega$ is a compact Hausdorff space. Specifically, we will show that a sequence of functions in $C(\Omega)$ converges weakly to $f \in C(\Omega)$ if and only if it converges pointwise to $f$ and is uniformly bounded.
Suppose that $\{f_n\}_{n=1}^\infty$ is a sequence in $C(\Omega)$ converging weakly to $f$. Then
$$\lim_{n\to\infty} \int_\Omega f_n\,d\mu = \int_\Omega f\,d\mu \qquad (10.21)$$
for each $\mu \in M(\Omega)$. Setting $\mu = \delta_x$, we obtain
$$\lim_{n\to\infty} f_n(x) = f(x), \qquad x \in \Omega. \qquad (10.22)$$
Also, since weakly convergent sequences are weakly bounded, it follows from Theorem 10.11 that
$$\sup\{\, \|f_n\|_\Omega : n \in \mathcal{N} \,\} < \infty. \qquad (10.23)$$
Thus, (10.22) and (10.23) are necessary conditions for the weak convergence of $\{f_n\}_{n=1}^\infty$ to $f$.
Next, we show that (10.22) and (10.23) together are sufficient conditions for the weak convergence of $\{f_n\}_{n=1}^\infty$ to $f$. Let $\mu \in M(\Omega)$. Then, because of (10.22), (10.23), and $|\mu|(\Omega) < \infty$, we can apply the Lebesgue dominated convergence theorem to obtain (10.21). □

Compactness in the Weak* Topology
One of the most important properties of the weak* topology is that the closed unit ball is always weak* compact. This famous result is known as Alaoglu's theorem.
THEOREM 10.12 Alaoglu's Theorem
In the dual space $\Omega^*$ of a normed space $\Omega$, the closed unit ball,
$$B_1^*(0) = \{\, x^* \in \Omega^* : \|x^*\|_* \le 1 \,\},$$
is weak* compact.

PROOF: In Exercise 10.5 on page 589, we introduced the linear operator $J : \Omega \to \Omega^{**}$ defined by $J(x)(x^*) = \langle x, x^* \rangle$. The relative weak* topology on $B_1^*(0)$ is just the weak topology induced by the family $\mathcal{F}$ of restrictions to $B_1^*(0)$ of functions in $J(\Omega)$. We will establish the theorem by showing that the family $\mathcal{F}$ satisfies the hypotheses of Corollary 8.2 on page 511.
If $x \in \Omega$ and $x^* \in B_1^*(0)$, then $|J(x)(x^*)| = |\langle x, x^* \rangle| \le \|x\|$. Consequently, the closure of $J(x)(B_1^*(0))$ is a compact subset of $\mathcal{C}$ for each $x \in \Omega$. Thus, $\mathcal{F}$ satisfies condition (a) of Corollary 8.2. If $x_1^*$ and $x_2^*$ are distinct elements of $B_1^*(0)$, then for some $y \in \Omega$,
$$J(y)(x_1^*) = \langle y, x_1^* \rangle \ne \langle y, x_2^* \rangle = J(y)(x_2^*).$$
Hence, condition (b) of Corollary 8.2 holds. To verify condition (c) of Corollary 8.2, let $\{x_\iota^*\}_{\iota \in I}$ be a net in $B_1^*(0)$ such that $\lim \langle x, x_\iota^* \rangle = \ell(x)$ exists for each $x \in \Omega$. Since
$$\ell(\alpha x + \beta y) = \lim \langle \alpha x + \beta y, x_\iota^* \rangle = \lim\bigl(\alpha\langle x, x_\iota^* \rangle + \beta\langle y, x_\iota^* \rangle\bigr) = \alpha\ell(x) + \beta\ell(y)$$
and $|\ell(x)| = \lim |\langle x, x_\iota^* \rangle| \le \|x\|$, it follows that $\ell$ is a continuous linear functional on $\Omega$ with norm at most 1. Furthermore, $\ell$ is the weak* limit of the net $\{x_\iota^*\}_{\iota \in I}$. Hence, condition (c) of Corollary 8.2 is satisfied. ■

COROLLARY 10.7 Every bounded net in the dual space $\Omega^*$ of a normed space $\Omega$ has a weak* convergent subnet.

PROOF: A bounded net is contained in a closed ball $B_r^*(0)$ for sufficiently large $r$. Since $B_r^*(0) = rB_1^*(0)$, it follows that $B_r^*(0)$ is also weak* compact. An application of Theorem 7.10 on page 471 completes the proof. ■

In practice, Corollary 10.7 is often applied in an effort to obtain a weak* convergent subsequence of a bounded sequence. But unless we know that $B_1^*(0)$ is metrizable with respect to the weak* topology, all that we can assert is that a bounded sequence has a weak* convergent subnet. Our next theorem shows that $B_1^*(0)$ is weak* metrizable if $\Omega$ is separable.
THEOREM 10.13 Let $\Omega$ be a separable normed space. Then the following hold:
a) $B_1^*(0)$ is weak* metrizable.
b) Every bounded sequence in $\Omega^*$ has a weak* convergent subsequence.

PROOF: We note, in view of Alaoglu's theorem and Theorem 7.7 on page 466, that (b) follows from (a). We now outline the proof of (a), leaving the details to the reader in Exercise 10.54. Let $\{x_n\}_{n=1}^\infty$ be a sequence that is dense in the closed unit ball $B_1(0)$ of $\Omega$. We define a metric $\rho$ on $B_1^*(0)$ by
$$\rho(x^*, y^*) = \sum_{n=1}^\infty 2^{-n} |\langle x_n, x^* \rangle - \langle x_n, y^* \rangle|.$$
For each $x^*$, the function $\rho(x^*, \cdot)$ is weak* continuous. It follows that the open $\rho$-ball $B_r^\rho(x^*)$ is weak* open for each $r > 0$ and $x^* \in B_1^*(0)$. Thus, the topology induced by $\rho$ is weaker than the relative weak* topology. Since $B_1^*(0)$ is weak* compact, we conclude from Corollary 7.8 on page 473 that the topology induced by $\rho$ coincides with the weak* topology. ■

EXAMPLE 10.15 Illustrates Theorem 10.13
Let $g \in \mathcal{L}^2([0,1] \times [0,1])$ and define $L : \mathcal{L}^2([0,1]) \to \mathcal{L}^2([0,1])$ by
$$L(f)(x) = \int_0^1 g(x,y) f(y)\,dy.$$
Then $L$ is a linear operator and satisfies
$$|||L||| \le \left( \int_0^1 \int_0^1 |g(x,y)|^2\,dx\,dy \right)^{1/2}.$$
We will prove that there is an $f_0 \in \mathcal{L}^2([0,1])$ with $\|f_0\|_2 \le 1$ such that $\|L(f_0)\|_2 = |||L|||$.
Let $\{f_n\}_{n=1}^\infty$ be a sequence with $\|f_n\|_2 \le 1$ and $\lim_{n\to\infty} \|L(f_n)\|_2 = |||L|||$. By Theorem 10.13 (identifying $\mathcal{L}^2([0,1])$ with its dual as in Example 10.13(a)), there is a subsequence $\{f_{n_j}\}_{j=1}^\infty$ converging weakly to some $f_0$; since the closed unit ball is convex and closed, hence weakly closed by Theorem 10.10(c), we have $\|f_0\|_2 \le 1$. Hence,
$$\lim_{j\to\infty} L(f_{n_j})(x) = \lim_{j\to\infty} \langle f_{n_j}, g(x,\cdot) \rangle = \langle f_0, g(x,\cdot) \rangle = L(f_0)(x)$$
for almost all $x$. By Cauchy's inequality,
$$|L(f_{n_j})(x)|^2 \le \int_0^1 |g(x,y)|^2\,dy.$$
Thus, the Lebesgue dominated convergence theorem implies that
$$|||L|||^2 = \lim_{j\to\infty} \|L(f_{n_j})\|_2^2 = \|L(f_0)\|_2^2,$$
as required. □
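The inequality $|||L||| \le (\int_0^1\int_0^1 |g|^2)^{1/2}$ in Example 10.15 can be checked numerically for a concrete kernel. The sketch below is a rough illustration only: it assumes the hypothetical kernel $g(x,y) = \min(x,y)$ and a midpoint-rule discretization, neither of which appears in the text. Power iteration approximates the operator norm of the discretized (symmetric, positive definite) operator, and the Frobenius norm of the matrix approximates the double integral. For this kernel the largest eigenvalue of $L$ is $4/\pi^2 \approx 0.405$ and the right-hand side is $\sqrt{1/6} \approx 0.408$, so the bound is nearly attained.

```python
import math

def kernel(x, y):
    # hypothetical square-integrable kernel g(x, y) = min(x, y) on [0,1]^2
    return min(x, y)

def discretize(g, n):
    # midpoint rule: (L f)(x_i) ~ sum_j g(x_i, x_j) f(x_j) / n
    xs = [(i + 0.5) / n for i in range(n)]
    return [[g(xi, xj) / n for xj in xs] for xi in xs]

def operator_norm(a, iterations=60):
    # power iteration; for this symmetric positive definite matrix it converges
    # to the largest eigenvalue, which equals the operator (spectral) norm
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in a]
        estimate = math.sqrt(sum(t * t for t in w))  # ||A v|| with ||v|| = 1
        v = [t / estimate for t in w]
    return estimate

def hilbert_schmidt(a):
    # Frobenius norm; approximates (double integral of |g|^2)^(1/2)
    return math.sqrt(sum(t * t for row in a for t in row))

a = discretize(kernel, 120)
norm_estimate, hs = operator_norm(a), hilbert_schmidt(a)
print(norm_estimate, hs, 4 / math.pi ** 2, math.sqrt(1 / 6))
```

For a matrix the spectral norm never exceeds the Frobenius norm, which is the discrete counterpart of the inequality in the example.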
EXERCISES 10.4
10.45 Let $V$ be a linear space and $\mathcal{F}$ a family of linear functionals on $V$. Then the collection of seminorms $\mathcal{S} = \{\, |\ell| : \ell \in \mathcal{F} \,\}$ induces a locally convex topology $\mathcal{T}_{\mathcal{S}}$ on $V$. Show that $\mathcal{T}_{\mathcal{S}}$ is the same as the weak topology $\mathcal{T}_{\mathcal{F}}$ determined by the family $\mathcal{F}$ in the sense of Definition 7.10 on page 428.
10.46 Construct a sequence $\{f_n\}_{n=1}^\infty$ in $C([0,1])$ such that $\|f_n\|_{[0,1]} = 1$ and $\text{w}\lim f_n = 0$.
10.47 Let $\Omega$ be an infinite dimensional normed space.
a) Show that if $W$ is a weak open set containing 0, then $W$ contains an infinite dimensional linear subspace of $\Omega$. Hint: Consider the linear mapping $L : \Omega \to \mathcal{C}^n$ defined by $L(x) = (\langle x, x_1^* \rangle, \langle x, x_2^* \rangle, \ldots, \langle x, x_n^* \rangle)$ for appropriate linear functionals $x_1^*, x_2^*, \ldots, x_n^* \in \Omega^*$.
b) Show that if $U$ is a weak* open set containing 0, then $U$ contains an infinite dimensional linear subspace of $\Omega^*$. Hint: Consider an appropriate analogue of the hint for part (a).
10.48 Use Exercise 10.47 to prove the following facts.
a) The norm and weak topologies are equal only for finite dimensional spaces.
b) The norm and weak* topologies are equal only for finite dimensional spaces.
10.49 Consider the normed space $\ell^1$.
a) Prove that if a sequence converges weakly, then it converges in the norm.
b) Deduce from part (a) and Exercise 10.48 that $\ell^1$ with the weak topology is not metrizable.
10.50 In the space $\ell^2$, let $e_n$ denote the sequence which is 1 at the $n$th position and 0 elsewhere, and set $E = \{\, e_n + n e_m : m > n \ge 1 \,\}$.
a) Show that $E$ is closed.
b) Show that 0 is in the weak closure of $E$. Deduce that there is a net in $E$ converging weakly to 0.
c) Show that there is no sequence in $E$ that is weakly convergent to 0.
d) Deduce from parts (b) and (c) that $\ell^2$ with the weak topology is not metrizable.
★10.51 Show that if $\Omega$ is a compact or locally compact Hausdorff space, then $M^+(\Omega)$ is weak* closed.
10.52 Suppose that $\Omega$ is a compact or locally compact Hausdorff space and that $M(\Omega)$ is endowed with the weak* topology.
Let $\Delta : \Omega \to M(\Omega)$ be defined by $\Delta(x) = \delta_x$. Prove that $\Delta$ is continuous.
★10.53 Let $\Omega$ be a Hausdorff space and set $P(\Omega) = \{\, \mu \in M^+(\Omega) : \mu(\Omega) = 1 \,\}$, the collection of probability measures in $M(\Omega)$.
a) Show that if $\Omega$ is compact, then $P(\Omega)$ is weak* compact.
b) Show that if $\Omega$ is locally compact but not compact, then $P(\Omega)$ is not weak* compact.
10.54 Provide the details of the proof of Theorem 10.13 on page 616.
10.55 Prove that in a separable Hilbert space, every bounded sequence has a weakly convergent subsequence.
10.56 Consider the space $\mathcal{L}^p([a,b])$, where $1 < p < \infty$. Prove that every bounded sequence has a weakly convergent subsequence.
10.57 Refer to Example 10.15 on page 616.
a) Show that $L$ maps $\mathcal{L}^2([0,1])$ into $\mathcal{L}^2([0,1])$.
b) Verify that
$$|||L||| \le \left( \int_0^1 \int_0^1 |g(x,y)|^2\,dx\,dy \right)^{1/2}.$$
10.58 Recall that $C_0(\mathcal{R})^* = M(\mathcal{R})$. Let $\{\mu_n\}_{n=1}^\infty$ be a sequence in $M^+(\mathcal{R})$ and $\mu \in M^+(\mathcal{R})$.
a) Show that if $\sup\{\, \mu_n(\mathcal{R}) : n \in \mathcal{N} \,\} < \infty$ and $\lim_{n\to\infty} F_{\mu_n}(x) = F_\mu(x)$ at every $x$ where $F_\mu$ is continuous, then $\text{w}^*\lim \mu_n = \mu$.
b) Show by example that the converse of the statement in part (a) is false.
c) Show that if $\text{w}^*\lim \mu_n = \mu$ and $\lim_{x\to-\infty} \sup\{\, F_{\mu_n}(x) : n \in \mathcal{N} \,\} = 0$, then $\lim_{n\to\infty} F_{\mu_n}(x) = F_\mu(x)$ at every $x$ where $F_\mu$ is continuous.
★10.59 Let $\{\mu_n\}_{n=1}^\infty$ be a sequence in $M^+(\mathcal{R})$. Recall that $\text{w}^*\lim \mu_n = \mu$ means that $\int_{\mathcal{R}} f\,d\mu_n \to \int_{\mathcal{R}} f\,d\mu$ for each $f \in C_0(\mathcal{R})$. Show that if $\text{w}^*\lim \mu_n = \mu$ and $\mu_n(\mathcal{R}) \to \mu(\mathcal{R})$, then $\int_{\mathcal{R}} f\,d\mu_n \to \int_{\mathcal{R}} f\,d\mu$ for each $f \in C_b(\mathcal{R})$. Hint: First consider the case where $0 \le f \le 1$.
10.60 Suppose that $\{F_n\}_{n=1}^\infty$ is a sequence of distribution functions on $\mathcal{R}$ such that $\sup\{\, F_n(\infty) : n \in \mathcal{N} \,\} < \infty$ and $\lim_{x\to-\infty} \sup\{\, F_n(x) : n \in \mathcal{N} \,\} = 0$. Show that there is a subsequence $\{F_{n_k}\}_{k=1}^\infty$ and a distribution function $F$ such that $\lim_{k\to\infty} F_{n_k}(x) = F(x)$ at every $x$ where $F$ is continuous. This result is a version of what is known as Helly's selection principle. Hint: Observe that $C_0(\mathcal{R})$ is separable. Use Exercise 10.58(c).
10.61 Refer to Example 10.13(d) on page 612. Let $X$ be a random variable and $\{X_n\}_{n=1}^\infty$ a sequence of random variables.
a) Show that $X_n \xrightarrow{d} X$ if and only if $F_{X_n}(x) \to F_X(x)$ at every $x$ where $F_X$ is continuous. Hint: Use Exercises 10.58 and 10.59.
b) Suppose $\{X_n\}_{n=1}^\infty$ and $X$ are all defined on the same probability space. Prove that if $\{X_n\}_{n=1}^\infty$ converges to $X$ in probability, then $X_n \xrightarrow{d} X$. Hint: Use the fact that functions in $C_0(\mathcal{R})$ are uniformly continuous and Theorem 6.17 on page 404.
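Example 10.13(d) and Exercise 10.61 compare distribution functions only at continuity points of $F_X$. The sketch below shows why, for the hypothetical choice $X_n \equiv 1/n$ and $X \equiv 0$ (point masses, not taken from the text): the integrals of one test function in $C_0(\mathcal{R})$, here the assumed $e^{-x^2}$, converge, and $F_{X_n}(0.5) \to F_X(0.5)$, yet $F_{X_n}(0) = 0$ for every $n$ while $F_X(0) = 1$, and $0$ is a discontinuity point of $F_X$. Checking a single test function illustrates, but does not prove, weak* convergence.

```python
import math

def integrate(f, atoms):
    # integral of f against a discrete distribution given as [(point, probability)]
    return sum(prob * f(x) for x, prob in atoms)

def cdf(atoms, x):
    # distribution function F(x) = P(X <= x)
    return sum(prob for point, prob in atoms if point <= x)

def bump(x):
    # a test function in C_0(R): continuous and vanishing at +/- infinity
    return math.exp(-x * x)

limit = [(0.0, 1.0)]                    # X = 0 with probability 1
for n in (1, 10, 100, 1000):
    xn = [(1.0 / n, 1.0)]               # X_n = 1/n with probability 1
    print(n, integrate(bump, xn), cdf(xn, 0.0), cdf(xn, 0.5))
print("limit", integrate(bump, limit), cdf(limit, 0.0), cdf(limit, 0.5))
```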
10.5 COMPACT CONVEX SETS
In this section, we will study subsets of locally convex topological linear spaces that are both convex and compact. We will prove the Krein-Milman theorem, a result that describes how compact convex sets are generated by
their irreducible elements. Additionally, we will give an application of the Krein-Milman theorem to the trigonometric moment problem.
To begin, we introduce some simple geometric ideas. Let $v_1, v_2, \ldots, v_n$ be elements of a linear space and $\alpha_1, \alpha_2, \ldots, \alpha_n$ nonnegative scalars that sum to 1. Then the sum
$$v = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n \qquad (10.24)$$
is called a convex combination of the $v_i$'s. When $n = 2$, we see that $v$ must lie on the line segment connecting $v_1$ and $v_2$. Thus, a set is convex if it contains all convex combinations of any two of its elements. It is not hard to show that a convex set contains all convex combinations of any finite subset of its elements. (See Exercise 10.62.)
Some convex combinations are trivial, such as $v = \alpha_1 v + \alpha_2 v + \cdots + \alpha_n v$ or $v = 1v + 0v_2 + \cdots + 0v_n$. We say that a convex combination of the form (10.24) is proper if there are at least two distinct indices $i$ and $j$ such that $\alpha_i$ and $\alpha_j$ are positive, and either $v \ne v_i$ or $v \ne v_j$. An element of a convex set $C$ is either a proper convex combination of elements of $C$ or it is not. The latter case, where the element is "irreducible" with respect to convex combinations, is important enough to warrant the following definition.

DEFINITION 10.7 Extreme Point
An element of a convex set $C$ is called an extreme point of $C$ if it is not a proper convex combination of elements of $C$. We use $\operatorname{ex} C$ to denote the set of all extreme points of $C$.

There are various useful equivalent conditions for an element of a convex set to be an extreme point. In describing these conditions, we let $[u, v]$ denote the closed line segment $\{\, (1-\alpha)u + \alpha v : \alpha \in [0,1] \,\}$ and $(u, v)$ denote the open line segment $\{\, (1-\alpha)u + \alpha v : \alpha \in (0,1) \,\}$. We leave it to the reader to show that each of the following conditions is equivalent to $x$ being an extreme point of the convex set $C$.
• If $x \in (u, v)$, where $u, v \in C$, then $x = u = v$.
• If $x \in [u, v]$, where $u, v \in C$, then $x = u$ or $x = v$.
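The definition of a proper convex combination can be checked mechanically for finitely many points. The following sketch assumes points in $\mathcal{R}^2$ given as tuples and uses exact rational arithmetic from Python's `fractions` module, so that the equality tests in the definition are reliable; the helper names are hypothetical.

```python
from fractions import Fraction

def is_convex_combination(v, points, alphas):
    # alphas must be nonnegative, sum to 1, and reproduce v as in (10.24)
    if len(points) != len(alphas):
        return False
    if any(a < 0 for a in alphas) or sum(alphas) != 1:
        return False
    combo = tuple(sum(a * p[k] for a, p in zip(alphas, points)) for k in range(len(v)))
    return combo == tuple(v)

def is_proper(v, points, alphas):
    # proper: two distinct indices i, j with alpha_i, alpha_j > 0
    # and (v != v_i or v != v_j)
    if not is_convex_combination(v, points, alphas):
        return False
    v = tuple(v)
    positive = [i for i, a in enumerate(alphas) if a > 0]
    return any(v != tuple(points[i]) or v != tuple(points[j])
               for i in positive for j in positive if i != j)

half = Fraction(1, 2)
print(is_proper((1, 0), [(0, 0), (2, 0)], [half, half]))   # True: midpoint of a segment
print(is_proper((1, 0), [(1, 0), (5, 5)], [1, 0]))         # False: v = 1v + 0v_2
print(is_proper((1, 0), [(1, 0), (1, 0)], [half, half]))   # False: v = (1/2)v + (1/2)v
```

Note that this only tests one given representation; deciding whether a point is an extreme point requires ranging over all representations by elements of $C$.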
Closely related to the concept of an extreme point is that of a face, as described in our next definition.
DEFINITION 10.8 Face of a Convex Set
Let $C$ be a convex set. A set $F \subset C$ is said to be a face of $C$ if it satisfies the following two conditions:
a) $F$ is convex.
b) $\alpha u + (1-\alpha)v \in F$, where $0 < \alpha < 1$ and $u, v \in C$, implies $u, v \in F$.

EXAMPLE 10.16 Illustrates Definitions 10.7 and 10.8
Figure 10.2 shows a closed triangular region $T$ in the plane, $\mathcal{R}^2$. The extreme points of the region are the vertices, $p_1$, $p_2$, and $p_3$; in symbols, $\operatorname{ex} T = \{p_1, p_2, p_3\}$. The single-element sets $\{p_1\}$, $\{p_2\}$, $\{p_3\}$, the edges $[p_1, p_2]$, $[p_2, p_3]$, $[p_3, p_1]$, the set $T$ itself, and (vacuously) the empty set are faces of $T$. □

FIGURE 10.2 A triangular region.

We observe that in Example 10.16, every element of the triangular region $T$ is a convex combination of the extreme points. As we will see shortly, this is close to being typical of compact convex subsets of Hausdorff locally convex topological linear spaces, in that every element of such a set is "approximately" a convex combination of the extreme points.
Let $A$ be a subset of a linear space. The set of all possible convex combinations of elements of $A$ is called the convex hull of $A$ and is
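denoted $\operatorname{co}(A)$. Referring to Example 10.16, we see that the line segment $[p_1, p_2]$ is the convex hull of $\{p_1, p_2\}$ and that the triangular region $T$ is the convex hull of $\{p_1, p_2, p_3\}$.
In topological linear spaces, we write $\overline{\operatorname{co}}(A)$ for $\overline{\operatorname{co}(A)}$ and call this set the closed convex hull of $A$. Proposition 10.7 provides some basic properties of the convex hull and closed convex hull.

PROPOSITION 10.7 Let $\Omega$ be a topological linear space and $A, B \subset \Omega$. Then the following hold:
a) $\operatorname{co}(A)$ is convex.
b) If $B$ is convex and $A \subset B$, then $\operatorname{co}(A) \subset B$.
c) If $B$ is convex, then $\overline{B}$ is convex.
d) If $B$ is closed and convex and $A \subset B$, then $\overline{\operatorname{co}}(A) \subset B$.
e) If $\Omega$ is locally convex, then
$$\overline{\operatorname{co}}(A) = \{\, x \in \Omega : \Re\langle x, x^* \rangle \le \sup\{\, \Re\langle y, x^* \rangle : y \in A \,\} \text{ for all } x^* \in \Omega^* \,\}.$$

PROOF: We leave the proofs of parts (a), (b), (c), and (d) to the reader as Exercise 10.64. To prove part (e), we let, for each $x^* \in \Omega^*$,
$$b(x^*) = \sup\{\, \Re\langle y, x^* \rangle : y \in A \,\} \qquad\text{and}\qquad H_{x^*} = \{\, x \in \Omega : \Re\langle x, x^* \rangle \le b(x^*) \,\}.$$
We must show that $\overline{\operatorname{co}}(A) = \bigcap_{x^* \in \Omega^*} H_{x^*}$. Each $H_{x^*}$ is closed, because $x^*$ is continuous, and is convex, because $x^*$ is linear. Since $A \subset H_{x^*}$, it follows from (d) that $\overline{\operatorname{co}}(A) \subset H_{x^*}$ for each $x^* \in \Omega^*$. Therefore, we have $\overline{\operatorname{co}}(A) \subset \bigcap_{x^* \in \Omega^*} H_{x^*}$.
To prove the reverse containment, suppose that $x_0 \notin \overline{\operatorname{co}}(A)$. By Theorem 10.9 on page 606, there is an $x_1^* \in \Omega^*$ such that
$$\Re\langle x_0, x_1^* \rangle < \inf\{\, \Re\langle y, x_1^* \rangle : y \in \overline{\operatorname{co}}(A) \,\}.$$
Letting $x_0^* = -x_1^*$, we obtain
$$b(x_0^*) \le \sup\{\, \Re\langle y, x_0^* \rangle : y \in \overline{\operatorname{co}}(A) \,\} < \Re\langle x_0, x_0^* \rangle.$$
Hence, $x_0 \notin H_{x_0^*}$. ■

We note the following consequences of Proposition 10.7.
• $\operatorname{co}(A)$ is the smallest convex set containing $A$.
• When $\Omega$ is locally convex, $\overline{\operatorname{co}}(A)$ is the intersection of all closed half-spaces containing $A$.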
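Part (e) describes the closed convex hull as an intersection of closed half-spaces. In $\mathcal{R}^2$ with real scalars and a finite set $A$, this suggests a simple membership test. The sketch below is an approximation under stated assumptions: it samples only finitely many functionals $x \mapsto \langle x, u \rangle$ (unit vectors $u$ at evenly spaced angles), so a `False` answer is certain, while `True` means only that none of the sampled half-spaces excludes the point. The function names, the sample size, and the set $A$ are hypothetical.

```python
import math

def support(points, u):
    # b(u) = max { <y, u> : y in A } for a finite set A in R^2
    return max(p[0] * u[0] + p[1] * u[1] for p in points)

def in_closed_convex_hull(x, points, directions=3600, tol=1e-12):
    # Proposition 10.7(e) restricted to finitely many sampled functionals:
    # reject x as soon as some sampled u gives <x, u> > b(u)
    for k in range(directions):
        t = 2 * math.pi * k / directions
        u = (math.cos(t), math.sin(t))
        if x[0] * u[0] + x[1] * u[1] > support(points, u) + tol:
            return False   # certain: this half-space contains A but not x
    return True            # approximate: no sampled half-space excludes x

A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]       # hypothetical finite set
print(in_closed_convex_hull((0.2, 0.2), A))     # True: inside the triangle
print(in_closed_convex_hull((0.6, 0.6), A))     # False: beyond the edge x + y = 1
```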
The Krein-Milman Theorem
We are now ready to prove the main result of this section, the Krein-Milman theorem.

THEOREM 10.14 Krein-Milman Theorem
Let $\Omega$ be a Hausdorff locally convex topological linear space. If $K$ is a nonempty compact convex subset of $\Omega$, then $K = \overline{\operatorname{co}}(\operatorname{ex} K)$. In particular, we have that $\operatorname{ex} K \ne \emptyset$.

PROOF: It suffices to consider the case of real scalars. Let $\mathcal{F}(K)$ denote the collection of nonempty closed faces of $K$. Then the following assertions hold, as the reader is asked to verify in Exercise 10.65:
$$\mathcal{G} \subset \mathcal{F}(K) \text{ and } \bigcap_{F \in \mathcal{G}} F \ne \emptyset \implies \bigcap_{F \in \mathcal{G}} F \in \mathcal{F}(K) \qquad (10.25)$$
$$F \in \mathcal{F}(K) \implies \mathcal{F}(F) \subset \mathcal{F}(K) \qquad (10.26)$$
$$F \in \mathcal{F}(K) \implies \operatorname{ex} F \subset \operatorname{ex} K \qquad (10.27)$$
$$x^* \in \Omega^* \implies \{\, x \in K : \langle x, x^* \rangle = \inf x^*(K) \,\} \in \mathcal{F}(K) \qquad (10.28)$$
The collection $\mathcal{F}(K)$ is partially ordered by reverse inclusion $\supset$. If $\mathcal{G}$ is a chain in $\mathcal{F}(K)$, then, because $\mathcal{G}$ has the finite intersection property and $K$ is compact, the intersection $F_1 = \bigcap_{F \in \mathcal{G}} F$ is nonempty; hence, by (10.25), $F_1 \in \mathcal{F}(K)$. Since $F \supset F_1$ for each $F \in \mathcal{G}$, we see that $\mathcal{G}$ has a $\supset$-upper bound. Thus, by Zorn's lemma, there is a $\supset$-maximal nonempty closed face $F_0$.
We claim that $F_0$ has only one element. Suppose to the contrary. Then Theorem 10.9 (page 606) implies the existence of an $x^* \in \Omega^*$ that is nonconstant on $F_0$. It follows from (10.26) and (10.28) that the set $\{\, x \in F_0 : \langle x, x^* \rangle = \inf x^*(F_0) \,\}$ is a nonempty closed face of $K$ that is properly contained in $F_0$. This contradiction shows that $F_0 = \{x\}$ for some $x \in K$. Because $F_0$ is a face, $x$ must be an extreme point of $K$.
Each $F \in \mathcal{F}(K)$ is also a compact convex set and, consequently, by the preceding argument, $\operatorname{ex} F \ne \emptyset$. It follows from (10.27) that each $F \in \mathcal{F}(K)$ contains an extreme point of $K$.
We are now in a position to show that $K = \overline{\operatorname{co}}(\operatorname{ex} K)$. Since $K$ is closed and convex, we have $\overline{\operatorname{co}}(\operatorname{ex} K) \subset K$. To prove the reverse inclusion, suppose that $K \setminus \overline{\operatorname{co}}(\operatorname{ex} K) \ne \emptyset$. Then Theorem 10.9 implies that there is a $y^* \in \Omega^*$ such that $\inf y^*(K) < \inf y^*(\overline{\operatorname{co}}(\operatorname{ex} K))$. Hence, $\{\, x \in K : \langle x, y^* \rangle = \inf y^*(K) \,\}$ is a nonempty closed face of $K$ that is disjoint from $\operatorname{ex} K$.
Since every nonempty closed face of $K$ contains an extreme point of $K$, we have reached a contradiction. Thus, $K = \overline{\operatorname{co}}(\operatorname{ex} K)$. ■
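For a triangle in $\mathcal{R}^2$ the closure in $K = \overline{\operatorname{co}}(\operatorname{ex} K)$ is not needed: as observed after Example 10.16, every point of $T$ is an actual convex combination of $p_1, p_2, p_3$. A small sketch, assuming hypothetical vertices and exact rational arithmetic, computes the weights (barycentric coordinates) by Cramer's rule; a point lies in $T$ exactly when all three weights are nonnegative.

```python
from fractions import Fraction as Fr

def barycentric(p, a, b, c):
    """Weights (alpha, beta, gamma) with p = alpha*a + beta*b + gamma*c
    and alpha + beta + gamma = 1, found by Cramer's rule."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    beta = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    gamma = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return 1 - beta - gamma, beta, gamma

def in_triangle(p, a, b, c):
    # p is a convex combination of the vertices iff all weights are >= 0
    return all(w >= 0 for w in barycentric(p, a, b, c))

# hypothetical vertices p1, p2, p3
p1, p2, p3 = (Fr(0), Fr(0)), (Fr(4), Fr(0)), (Fr(0), Fr(4))
print(barycentric((Fr(1), Fr(1)), p1, p2, p3))   # weights 1/2, 1/4, 1/4
print(in_triangle((Fr(5), Fr(5)), p1, p2, p3))   # False
```

In infinite-dimensional spaces no such finite formula exists, and the closure in the Krein-Milman theorem cannot in general be dropped.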
Some of the most important applications of the Krein-Milman theorem involve optimization of linear functionals. Corollary 10.8 provides some particulars.

COROLLARY 10.8 Suppose that $\Omega$ is a Hausdorff locally convex topological linear space and that $x^* \in \Omega^*$. If $K$ is a nonempty compact convex subset of $\Omega$, then there exist $x_1, x_2, x_3 \in \operatorname{ex} K$ such that
a) $\Re\langle x_1, x^* \rangle = \inf\{\, \Re\langle y, x^* \rangle : y \in K \,\}$,
b) $\Re\langle x_2, x^* \rangle = \sup\{\, \Re\langle y, x^* \rangle : y \in K \,\}$,
c) $|\langle x_3, x^* \rangle| = \sup\{\, |\langle y, x^* \rangle| : y \in K \,\}$.

PROOF: We will prove part (a) and leave parts (b) and (c) for the reader as Exercise 10.67. Let $a = \inf\{\, \Re\langle y, x^* \rangle : y \in K \,\}$. According to (10.28), the set $F = \{\, x \in K : \Re\langle x, x^* \rangle = a \,\}$ is a nonempty closed face of $K$. Thus, the Krein-Milman theorem implies that $\operatorname{ex} F \ne \emptyset$. Applying (10.27), we conclude that $F$ contains at least one element of $\operatorname{ex} K$. ■

EXAMPLE 10.17 Illustrates the Krein-Milman Theorem
a) Suppose that $K$ is a compact convex subset of $\mathcal{R}^n$. Given a real-valued continuous function $g$ on $K$, it is often a difficult problem to find the maximum value of $g$. In case $g$ is linear, however, this problem is simplified by the relation $\sup g(K) = \sup g(\operatorname{ex} K)$. For example, if $K$ is the triangle in Fig. 10.2, then the maximum on $K$ of a real-valued linear functional is attained at one of the points of $\{p_1, p_2, p_3\}$. This property of linear functionals on finite-dimensional compact convex sets is fundamental to the subject of linear programming.
b) Let $D = \{\, z \in \mathcal{C} : |z| < 1 \,\}$ and $H(D)$ the collection of analytic functions on $D$, equipped with the relative topology inherited from $C(D)$. Each function $f \in H(D)$ has a power series expansion $f(z) = \sum_{n=0}^\infty a_n(f) z^n$. As the reader is asked to show in Exercise 10.68, the coefficients $a_n(f)$ define continuous linear functionals on $H(D)$. Hence, if $K$ is a nonempty compact convex subset of $H(D)$ and $n \in \mathcal{N}$, then there exists a $g \in \operatorname{ex} K$ such that $|a_n(g)| = \sup\{\, |a_n(f)| : f \in K \,\}$. This observation is useful in complex-variable theory.
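Part (a) can be illustrated numerically. The sketch below assumes a hypothetical triangle in $\mathcal{R}^2$ and a hypothetical linear objective; it draws random convex combinations of the vertices and confirms that none of them beats the best vertex, in line with $\sup g(K) = \sup g(\operatorname{ex} K)$. Sampling illustrates the relation; it does not replace the proof.

```python
import random

def g(u, p):
    # real-valued linear functional g(p) = <u, p> on R^2
    return u[0] * p[0] + u[1] * p[1]

def random_convex_combination(vertices, rng):
    # two sorted uniform cut points split [0, 1] into three weights summing to 1
    s, t = sorted((rng.random(), rng.random()))
    weights = (s, t - s, 1.0 - t)
    return tuple(sum(w * v[k] for w, v in zip(weights, vertices)) for k in range(2))

vertices = [(0.0, 0.0), (3.0, 1.0), (1.0, 2.0)]   # hypothetical triangle
u = (1.0, 1.0)                                     # hypothetical objective
rng = random.Random(0)
best_vertex = max(g(u, v) for v in vertices)
best_sample = max(g(u, random_convex_combination(vertices, rng)) for _ in range(10_000))
print(best_vertex, best_sample)
```

Linear programming solvers exploit the same fact by searching only among vertices of the feasible polytope.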
□

Next we give a measure-theoretic version of the Krein-Milman theorem. To begin, we define the concept of a representing measure.
DEFINITION 10.9 Representing Measure
Let $K$ be a compact convex subset of a topological linear space $\Omega$. A measure $\mu \in M(K)$ is said to be a representing measure for the element $x$ if $\mu$ is nonnegative, $\mu(K) = 1$, and
$$\langle x, x^* \rangle = \int_K \langle y, x^* \rangle\,d\mu(y)$$
for each $x^* \in \Omega^*$.

It is easy to see that in a Hausdorff locally convex topological linear space, $x$ can be written as a convex combination $x = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$ of elements of $K$ if and only if the measure $\mu = \alpha_1\delta_{x_1} + \alpha_2\delta_{x_2} + \cdots + \alpha_n\delta_{x_n}$ is a representing measure for $x$. This suggests that a representing measure is a kind of generalized convex combination. As we will now see, the Krein-Milman theorem shows that every element of a compact convex subset of a Hausdorff locally convex topological linear space is, in that sense, a generalized convex combination of extreme points.

THEOREM 10.15 Krein-Milman Theorem (Measure-Theoretic Version)
Let $\Omega$ be a Hausdorff locally convex topological linear space. If $K$ is a nonempty compact convex subset of $\Omega$, then each $x \in K$ has a representing measure $\mu$ such that
$$\mu\bigl(K \setminus \overline{\operatorname{ex} K}\bigr) = 0. \qquad (10.29)$$

PROOF: Let $x \in K$. It follows from our first version of the Krein-Milman theorem (Theorem 10.14) that $x$ is the limit of a net $\{x_\iota\}_{\iota \in I}$ contained in $\operatorname{co}(\operatorname{ex} K)$. Now each $x_\iota$ is a convex combination
$$x_\iota = \alpha_{\iota,1} x_{\iota,1} + \alpha_{\iota,2} x_{\iota,2} + \cdots + \alpha_{\iota,n_\iota} x_{\iota,n_\iota},$$
where $x_{\iota,1}, x_{\iota,2}, \ldots, x_{\iota,n_\iota} \in \operatorname{ex} K$. Hence, $x_\iota$ has the representing measure
$$\mu_\iota = \alpha_{\iota,1}\delta_{x_{\iota,1}} + \alpha_{\iota,2}\delta_{x_{\iota,2}} + \cdots + \alpha_{\iota,n_\iota}\delta_{x_{\iota,n_\iota}}.$$
By Alaoglu's theorem (page 615), the net of measures $\{\mu_\iota\}_{\iota \in I}$ has a subnet $\{\mu_{\iota_\eta}\}_{\eta \in T}$ which is weak* convergent to a measure $\mu$ in $M(K)$. And, by Exercise 10.53 on page 617, $\mu$ is a probability measure. That $\mu$ is a representing measure for $x$ follows from the fact that
$$\langle x, x^* \rangle = \lim \langle x_{\iota_\eta}, x^* \rangle = \lim \int_K \langle y, x^* \rangle\,d\mu_{\iota_\eta}(y) = \int_K \langle y, x^* \rangle\,d\mu(y)$$
for each $x^* \in \Omega^*$.
It remains to verify (10.29). By the regularity of $\mu$, it is enough to show that $\mu(F) = 0$ whenever $F$ is a compact subset of $K \setminus \overline{\operatorname{ex} K}$. We apply Urysohn's lemma to obtain a continuous function $g : K \to [0,1]$ such that $g$ vanishes on $\overline{\operatorname{ex} K}$ and is constantly 1 on $F$. Since each of the measures $\mu_{\iota_\eta}$ satisfies (10.29), it follows that
$$\mu(F) \le \int_K g\,d\mu = \lim \int_K g\,d\mu_{\iota_\eta} = 0.$$
■

Remark: We emphasize that the essential point of Theorem 10.15 is not simply the existence of a representing measure for each point of $K$, but the existence of a representing measure concentrated on $\overline{\operatorname{ex} K}$, in the sense of (10.29).

The Trigonometric Moment Problem
The remainder of this section is devoted to a rather elaborate application of the measure-theoretic version of the Krein-Milman theorem. We will use that theorem to solve the classical trigonometric moment problem:†
Given a doubly infinite sequence of complex numbers $\{f(n)\}_{n=-\infty}^\infty$, find necessary and sufficient conditions for the existence of a measure $\mu$ such that
$$f(n) = \int_{[0,2\pi)} e^{int}\,d\mu(t), \qquad n \in \mathcal{Z}. \qquad (10.30)$$
In what follows, we assume that the space $\ell^\infty(\mathcal{Z})$ of bounded sequences is equipped with the weak* topology, where we recall that $\ell^\infty(\mathcal{Z})$ is the dual space of $\ell^1(\mathcal{Z})$. (Refer to Exercise 9.61 on page 562.)

† See "Moments in Mathematics," H. J. Landau (ed.), Proceedings of Symposia in Applied Mathematics, 37 (American Mathematical Society, 1987), for an introduction to the voluminous literature on problems of this type.
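For a discrete measure the integrals in (10.30) are finite sums, which makes the positivity behind the trigonometric moment problem easy to test numerically. The sketch below assumes a hypothetical measure $\mu = \sum_k c_k\delta_{t_k}$ with three atoms and randomly chosen coefficients $\lambda_{-N}, \ldots, \lambda_N$ (with $N = 3$). It computes the moments $f(n)$ and compares $\sum_{j,k=-N}^{N} \lambda_j\overline{\lambda_k}\, f(j-k)$ with $\int |\sum_j \lambda_j e^{ijt}|^2\,d\mu(t)$; the two agree up to rounding, so the first is real and nonnegative.

```python
import cmath
import random

# hypothetical discrete measure on [0, 2*pi): mu = sum_k c_k * delta_{t_k}, c_k >= 0
atoms = [(0.3, 0.5), (2.0, 1.25), (4.4, 0.25)]    # (t_k, c_k)

def moment(n):
    # f(n) = integral of e^{int} d mu(t), a finite sum for a discrete measure
    return sum(c * cmath.exp(1j * n * t) for t, c in atoms)

def quadratic_form(lams):
    # sum over j, k in [-N, N] of lambda_j * conj(lambda_k) * f(j - k);
    # lams[j + N] holds lambda_j
    N = (len(lams) - 1) // 2
    idx = range(-N, N + 1)
    return sum(lams[j + N] * lams[k + N].conjugate() * moment(j - k)
               for j in idx for k in idx)

def integral_form(lams):
    # integral of |sum_j lambda_j e^{ijt}|^2 d mu(t)
    N = (len(lams) - 1) // 2
    return sum(c * abs(sum(lams[j + N] * cmath.exp(1j * j * t)
                           for j in range(-N, N + 1))) ** 2
               for t, c in atoms)

rng = random.Random(1)
lams = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(7)]  # N = 3
q = quadratic_form(lams)
print(q, integral_form(lams))   # equal up to rounding; imaginary part of q ~ 0
```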
We first derive a necessary condition for the existence of a measure $\mu$ satisfying (10.30). Consider the complex numbers $\lambda_{-n},\lambda_{-n+1},\ldots,\lambda_{-1},\lambda_0,\lambda_1,\ldots,\lambda_n$. It follows from (10.30) that
$$\sum_{j,k=-n}^{n}\lambda_j\overline{\lambda_k}\,f(j-k)=\int_{[0,2\pi)}\Bigl|\sum_{j=-n}^{n}\lambda_je^{ijt}\Bigr|^2\,d\mu(t).$$
Hence, we see that a necessary condition for the existence of a measure $\mu$ satisfying (10.30) is that
$$\sum_{j,k=-n}^{n}\lambda_j\overline{\lambda_k}\,f(j-k)\ge0,\qquad\{\lambda_j\}_{j=-n}^{n}\subset\mathbb{C},\ n\in\mathbb{N}. \tag{10.31}$$
Sequences satisfying (10.31) are called nonnegative definite. So, we have proved the necessity part of the following theorem.

THEOREM 10.16
Let $\{f(n)\}_{n=-\infty}^{\infty}$ be a doubly infinite sequence of complex numbers. A necessary and sufficient condition for the existence of a measure $\mu$ such that
$$f(n)=\int_{[0,2\pi)}e^{int}\,d\mu(t),\qquad n\in\mathbb{Z},$$
is that $\{f(n)\}_{n=-\infty}^{\infty}$ is nonnegative definite.

To establish sufficiency in Theorem 10.16, we first introduce some notation. Let $\mathbb{D}$ and $\mathbb{D}_1$ denote, respectively, the collection of nonnegative definite sequences and the collection of nonnegative definite sequences that are 1 at the 0th position. The following basic properties of $\mathbb{D}$ and $\mathbb{D}_1$ are left to the reader as Exercise 10.72.
(P1) If $f\in\mathbb{D}$, then $|f(n)|\le f(0)$ for all $n\in\mathbb{Z}$.
(P2) If $f\in\mathbb{D}$, then $f(-n)=\overline{f(n)}$ for all $n\in\mathbb{Z}$.
(P3) $f\in\mathbb{D}$ and $\alpha\ge0$ $\Rightarrow$ $\alpha f\in\mathbb{D}$.
(P4) If $w\in\mathbb{C}$ with $|w|=1$, then the sequence defined by $e_w(n)=w^n$ is in $\mathbb{D}_1$.
(P5) If $w\in\mathbb{C}$ with $|w|=1$, then $e_w\in\operatorname{ex}\mathbb{D}_1$.
(P6) $\mathbb{D}$ is a weak* closed subset of $\ell^\infty(\mathbb{Z})$.
(P7) $\mathbb{D}_1$ is a convex, weak* compact subset of $\ell^\infty(\mathbb{Z})$.
(P8) If $\mathbb{T}=\{z\in\mathbb{C}:|z|=1\}$ and $\mathbb{D}_1$ has the weak* topology, then the function $E\colon\mathbb{T}\to\mathbb{D}_1$ defined by $E(w)=e_w$ is continuous.

We note that (P1) and (P3) imply that every element of $\mathbb{D}$ is a nonnegative scalar multiple of an element of $\mathbb{D}_1$. Therefore, we need only establish sufficiency in Theorem 10.16 for elements of $\mathbb{D}_1$. The crucial assertion for what follows is that
$$\operatorname{ex}\mathbb{D}_1=E(\mathbb{T}). \tag{10.32}$$
Suppose, for the moment, that we have established (10.32), and let $f\in\mathbb{D}_1$. By (P8) and (10.32), $\operatorname{ex}\mathbb{D}_1$ is weak* closed. Applying Theorem 10.15, we obtain a representing measure $\nu_f$ for $f$ such that $\nu_f(\mathbb{D}_1\setminus\operatorname{ex}\mathbb{D}_1)=0$. Thus, we have
$$\ell(f)=\int_{\operatorname{ex}\mathbb{D}_1}\ell(g)\,d\nu_f(g)$$
for each weak* continuous linear functional $\ell$. Define $\tilde E\colon[0,2\pi)\to\operatorname{ex}\mathbb{D}_1$ by $\tilde E(t)=E(e^{it})$. Because $\tilde E$ is continuous, one-to-one, and onto, the set function $\mu$ defined by
$$\mu(A)=\nu_f(\tilde E(A)),\qquad A\in\mathcal{B}([0,2\pi)),$$
is a measure. It now follows from Theorem 6.17 on page 404 that
$$\ell(f)=\int_{[0,2\pi)}\ell(\tilde E(t))\,d\mu(t).$$
If, for each $n\in\mathbb{Z}$, we apply the previous equation to the linear functional that evaluates sequences at the integer $n$, we obtain
$$f(n)=\int_{[0,2\pi)}\tilde E(t)(n)\,d\mu(t)=\int_{[0,2\pi)}e^{int}\,d\mu(t),\qquad n\in\mathbb{Z},$$
as required.

It remains to show that (10.32) is valid. In view of (P5), we can conclude that $E(\mathbb{T})\subset\operatorname{ex}\mathbb{D}_1$, but the reverse inclusion is more difficult to
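prove. Let $f\in\operatorname{ex}\mathbb{D}_1$. For a fixed but arbitrary integer $m$, we define the sequence
$$h(k)=i\bigl(f(k+m)\overline{f(m)}-f(k-m)f(m)\bigr),\qquad k\in\mathbb{Z}.$$
We claim that
$$f\pm h/2\in\mathbb{D}_1. \tag{10.33}$$
By (P3) and the fact that $h(0)=0$, we see that proving (10.33) is equivalent to showing that $2f\pm h\in\mathbb{D}$.

Let $n\in\mathbb{N}$ and $\{\lambda_j\}_{j=-n}^{n}\subset\mathbb{C}$. Setting $\lambda_j=0$ for $|j|>n$, the condition for nonnegative definiteness can be expressed as
$$\sum_{j,k}\lambda_j\overline{\lambda_k}\bigl(2f(j-k)\pm h(j-k)\bigr)\ge0, \tag{10.34}$$
where $\sum_{j,k}$ indicates summation over all integers $j$ and $k$. Let $S$ denote the left-hand side of (10.34) and note that $S$ is real since $h(k)=\overline{h(-k)}$ for all $k\in\mathbb{Z}$. Then, using (10.31), (P1), and (P2), we obtain
$$\begin{aligned}S\ge{}&\sum_{j,k}\lambda_j\overline{\lambda_k}\,f(j-k)\pm\sum_{j,k}i\lambda_j\overline{\lambda_k}\,\overline{f(m)}\,f(j-k+m)\\&\mp\sum_{j,k}i\,f(m)\lambda_j\overline{\lambda_k}\,f(j-k-m)+\sum_{j,k}f(m)\lambda_j\overline{f(m)\lambda_k}\,f(j-k).\end{aligned} \tag{10.35}$$
On the right-hand side of (10.35), we replace $k$ by $k+m$ in the second sum, $j$ by $j+m$ in the third sum, and both $j$ by $j+m$ and $k$ by $k+m$ in the fourth sum to obtain
$$\begin{aligned}S\ge{}&\sum_{j,k}\lambda_j\overline{\lambda_k}\,f(j-k)\pm\sum_{j,k}i\lambda_j\overline{\lambda_{k+m}}\,\overline{f(m)}\,f(j-k)\\&\mp\sum_{j,k}i\lambda_{j+m}f(m)\overline{\lambda_k}\,f(j-k)+\sum_{j,k}f(m)\lambda_{j+m}\overline{f(m)\lambda_{k+m}}\,f(j-k)\\={}&\sum_{j,k}\bigl(\lambda_j\mp i\lambda_{j+m}f(m)\bigr)\overline{\bigl(\lambda_k\mp i\lambda_{k+m}f(m)\bigr)}\,f(j-k).\end{aligned}$$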
Because $f$ is nonnegative definite, it follows that $S\ge0$. Therefore, $2f\pm h$ is nonnegative definite and, so, (10.33) holds.

Now, we have $f=\tfrac12(f+h/2)+\tfrac12(f-h/2)$. Because $f\in\operatorname{ex}\mathbb{D}_1$, it follows that $h=0$ and, therefore,
$$f(k+m)f(-m)=f(k-m)f(m),\qquad k,m\in\mathbb{Z}. \tag{10.36}$$

We will use (10.36) to show that $f\in E(\mathbb{T})$. First we observe that there is a smallest positive integer $n$ such that $f(n)\ne0$. For otherwise, it follows from (P2) that
$$f(k)=\begin{cases}1,&\text{if }k=0;\\0,&\text{if }k\ne0.\end{cases}$$
But, as the reader is asked to verify in Exercise 10.73, this sequence is not an extreme point of $\mathbb{D}_1$, a contradiction.

Let $b=|f(n)|$ and $f(n)=be^{it}$. Any integer has the form $m=sn+r$, where $s$ and $r$ are integers and $0\le r<n$. Thus, from (P2) and (10.36), we have
$$f(sn+r)\,be^{-it}=f((s-2)n+r)\,be^{it}$$
or, equivalently,
$$f(sn+r)=f((s-2)n+r)\,e^{2it}.$$
It follows that, when $s\ge0$,
$$f(sn+r)=\begin{cases}e^{ist}f(r),&\text{if }s\text{ is even};\\e^{i(s-1)t}f(n+r),&\text{if }s\text{ is odd}.\end{cases}$$
Recalling that $f(q)=0$ when $0<q<n$ and using (P2) again, we obtain
$$f(sn+r)=\begin{cases}e^{ist},&\text{if }s\text{ is even and }r=0;\\0,&\text{if }0<r<n;\\be^{ist},&\text{if }s\text{ is odd and }r=0\end{cases} \tag{10.37}$$
for an arbitrary integer $s$.

Next we make use of a well-known property of roots of unity, namely,
$$\frac1n\sum_{k=0}^{n-1}\omega^{km}=\begin{cases}1,&\text{if }n\text{ divides }m;\\0,&\text{if }n\text{ does not divide }m,\end{cases} \tag{10.38}$$
where $\omega=e^{2\pi i/n}$. It follows from (10.37) and (10.38) that
$$f(m)=2^{-1}(1+b)\,n^{-1}\sum_{k=0}^{n-1}\omega^{km}e^{imt/n}+2^{-1}(1-b)\,n^{-1}\sum_{k=0}^{n-1}e^{im\pi/n}\omega^{km}e^{imt/n}.$$
Thus, we have written $f$ as a convex combination of sequences of the forms $\bigl(\omega^ke^{it/n}\bigr)^m$ and $\bigl(e^{i\pi/n}\omega^ke^{it/n}\bigr)^m$. Because $f\in\operatorname{ex}\mathbb{D}_1$, it follows that $b=n=1$ and $f(m)=e^{imt}$. We have shown that $f\in E(\mathbb{T})$. Hence (10.32) is valid.

EXERCISES 10.5
10.62 Show that a convex set contains any convex combination of its elements.
10.63 Show that $x$ is an extreme point of the convex set $C$ if and only if it satisfies the following condition: $x=\alpha u+(1-\alpha)v$, where $0<\alpha<1$ and $u,v\in C$, implies $u=v=x$.
10.64 Prove parts (a)–(d) of Proposition 10.7 on page 621.
10.65 Prove assertions (10.25)–(10.28) on page 622.
10.66 Why does it suffice to prove the Krein–Milman theorem (Theorem 10.14 on page 622) in the case of real scalars?
10.67 Prove parts (b) and (c) of Corollary 10.8 on page 623.
10.68 Verify that the coefficient functionals in Example 10.17(b) on page 623 are continuous.
10.69 Let $\Omega$ be a compact Hausdorff space and $P(\Omega)$ the collection of probability measures on $\Omega$, that is, the set $\{\mu\in M^+(\Omega):\mu(\Omega)=1\}$. By Exercise 10.53 on page 617, $P(\Omega)$ is weak* compact, and it is easy to see that $P(\Omega)$ is convex. Show that $\operatorname{ex}P(\Omega)=\{\delta_x:x\in\Omega\}$.
10.70 Let $K$ be a compact convex subset of a locally convex topological linear space and set $P(K)=\{\mu\in M^+(K):\mu(K)=1\}$. Show that each $\mu\in P(K)$ is the representing measure for some point of $K$. Hint: See Exercise 10.69.
10.71 Consider the space $\mathcal{L}^1(\mathbb{R})$.
a) Show that $B_1(0)$ has no extreme points.
b) Deduce that Theorem 9.12 on page 559 fails in the case $p=\infty$.
10.72 Prove assertions (P1)–(P8) on pages 626–627.
10.73 Define
$$f(n)=\begin{cases}1,&\text{if }n=0;\\0,&\text{if }n\ne0.\end{cases}$$
Show that $f\notin\operatorname{ex}\mathbb{D}_1$.
10.74 Let $\Omega$ be a compact Hausdorff space and $f\colon\Omega\to\Omega$ be continuous. A measure $\mu\in M(\Omega)$ is said to be invariant with respect to $f$ if
$$\mu(f^{-1}(A))=\mu(A),\qquad A\in\mathcal{B}(\Omega).$$
Let $\mathcal{I}_f$ denote the collection of all probability measures in $M(\Omega)$ that are invariant with respect to $f$.
a) Show that $\mathcal{I}_f$ is convex and weak* compact.
b) Suppose that $\mu\in\operatorname{ex}\mathcal{I}_f$. Prove that if $A\in\mathcal{B}(\Omega)$ and $f^{-1}(A)\subset A$, then $\mu(A)\in\{0,1\}$. Measures satisfying this condition are called ergodic. Hint: Refer to Theorem 6.17 on page 404.
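The necessity half of Theorem 10.16 rests on one identity. For the moments of a measure, the quadratic form in (10.31) equals an integral of a squared modulus, so it cannot be negative. The following is a minimal sketch in Python. It assumes a discrete measure $\mu=\sum_pw_p\delta_{t_p}$ on $[0,2\pi)$. The atoms and weights are arbitrary illustrative choices, and the helper names `moments`, `quadratic_form`, and `integral_form` are hypothetical. The sketch checks the identity numerically for random coefficients. It also shows that a sequence violating (P1) fails (10.31).

```python
import cmath
import random

def moments(atoms, weights, n):
    """n-th moment (10.30) of the discrete measure sum_p w_p * delta_{t_p}."""
    return sum(w * cmath.exp(1j * n * t) for t, w in zip(atoms, weights))

def quadratic_form(f, lam):
    """Left side of (10.31): sum over j, k of lam_j * conj(lam_k) * f(j - k).

    lam maps finitely many integer indices to complex coefficients;
    f is any callable defined on the integers.
    """
    return sum(lj * lk.conjugate() * f(j - k)
               for j, lj in lam.items() for k, lk in lam.items())

def integral_form(atoms, weights, lam):
    """Integral of |sum_j lam_j e^{ijt}|^2 against the same discrete measure."""
    return sum(w * abs(sum(l * cmath.exp(1j * j * t) for j, l in lam.items())) ** 2
               for t, w in zip(atoms, weights))

# Hypothetical example measure: three atoms in [0, 2*pi) with nonnegative weights.
atoms, weights = [0.3, 1.7, 4.0], [0.5, 0.2, 0.3]
f = lambda n: moments(atoms, weights, n)

random.seed(0)
for _ in range(50):
    lam = {j: complex(random.gauss(0, 1), random.gauss(0, 1)) for j in range(-3, 4)}
    q = quadratic_form(f, lam)
    assert abs(q - integral_form(atoms, weights, lam)) < 1e-9  # identity behind (10.31)
    assert q.real >= -1e-12                                     # so the form is >= 0

# A sequence with |g(1)| > g(0) violates (P1); the form is negative for lam_0 = 1, lam_1 = -1.
g = lambda n: {0: 1, 1: 2, -1: 2}.get(n, 0)
print(quadratic_form(g, {0: 1, 1: -1}))  # prints -2
```

Using a finitely supported measure keeps the integral an exact finite sum, so the comparison isolates the algebra of the necessity argument rather than quadrature error.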

PART FOUR □ Harmonic Analysis and Dynamical Systems
Ingrid Daubechies (1954–)

Ingrid Daubechies was born in Belgium on August 17, 1954, and is now a naturalized citizen of the United States. She received her undergraduate and PhD degrees in physics (in 1975 and 1980, respectively) from the Free University in Brussels, Belgium, and remained there in a research position until 1987. From 1987 to 1994, she was a member of the technical staff at AT&T Bell Laboratories, taking leaves to spend six months in 1990 at the University of Michigan and two years, from 1991 to 1993, at Rutgers University.

Daubechies is an elected member of the National Academy of Sciences and the American Academy of Arts and Sciences. She was awarded the 1994 Leroy P. Steele Prize for exposition for her book Ten Lectures on Wavelets and the 1997 Ruth Lyttle Satter Prize by the American Mathematical Society. Her many editorial activities include serving as editor-in-chief of Applied and Computational Harmonic Analysis (Academic Press).

The simplest example of what is now known as a wavelet family was discovered in 1909 by A. Haar. The usefulness of Haar's wavelets is limited, however, because they are discontinuous. Daubechies made a major contribution to wavelet theory when she found generalizations of Haar wavelets that, in addition to being highly regular, are considerably more effective for representing functions.

Daubechies is currently at the Mathematics Department and the Program in Applied and Computational Mathematics at Princeton University.
□ 11 □ Elements of Harmonic Analysis

Much of the subject matter of real analysis emerged from attempts by various mathematicians to deal with problems associated with the idea, developed by Jean Baptiste Joseph Fourier, of expanding a function in a series of the form
$$f(x)=a_0+\sum_{n=1}^{\infty}\bigl(a_n\cos nx+b_n\sin nx\bigr). \tag{11.1}$$
This expansion can be thought of as a decomposition of $f$ into an infinite sum of harmonic (oscillatory) terms. Thus one speaks of (11.1) as a harmonic analysis of the function $f$.

In this chapter we will investigate the meaning of (11.1) and some of its many variations, using ideas that we have explored and results that we have obtained in previous chapters. Sections 11.1 and 11.2 deal with properties of Fourier series. In Sections 11.3 and 11.4, we investigate the Fourier transform. The remaining sections of the chapter are devoted to wavelet expansions, analogues of Fourier expansions that have received much attention in recent years.
636 □ Chapter 11 Elements of Harmonic Analysis

11.1 INTRODUCTION TO FOURIER SERIES
Recall that a complex-valued function $f$ on $\mathbb{R}$ is said to be periodic with period $p$ if
$$f(x+p)=f(x),\qquad x\in\mathbb{R}.$$
In most cases of interest, there is a smallest positive period, called the basic period of $f$. (See Exercise 11.1.) The reciprocal of the basic period is called the frequency of the periodic function. For convenience, and because it involves no real loss of generality, we restrict our treatment to functions having period $2\pi$. (See Exercise 11.2.)

Harmonic analysis, often called Fourier analysis after its main founder, attempts to understand complicated periodic functions in terms of simple ones. Specifically, it was Fourier's idea to try to represent a function with period $2\pi$ as a series of the form (11.1). The formula (11.1) yields the function $f$ as a sum of simple oscillating terms, that is, sine and cosine terms whose frequencies form the arithmetic progression $1/2\pi,2/2\pi,3/2\pi,\ldots$. The main purpose of this and the next section is to explore the meaning and ramifications of (11.1).

Using the formulas $2\cos x=e^{ix}+e^{-ix}$ and $2i\sin x=e^{ix}-e^{-ix}$, we can recast (11.1) in the more compact form
$$f(x)=\sum_{n=-\infty}^{\infty}c_ne^{inx}. \tag{11.2}$$
Assuming that the series converges rapidly enough so that the integral and sum can be interchanged, we obtain
$$\int_{-\pi}^{\pi}f(x)e^{-inx}\,dx=\sum_{k=-\infty}^{\infty}c_k\int_{-\pi}^{\pi}e^{i(k-n)x}\,dx$$
and, since $\int_{-\pi}^{\pi}e^{i(k-n)x}\,dx$ equals $2\pi$ if $k=n$ and $0$ otherwise, that $c_n=(2\pi)^{-1}\int_{-\pi}^{\pi}f(x)e^{-inx}\,dx$. This shows how the coefficients $c_n$, $n=0,\pm1,\pm2,\ldots$, can be calculated explicitly from the function $f$ and serves as motivation for the following definition.
DEFINITION 11.1 Fourier Coefficients, Transform, and Series
For $f\in\mathcal{L}^1([-\pi,\pi])$, the function $\hat f\colon\mathbb{Z}\to\mathbb{C}$ defined by
$$\hat f(n)=\frac1{2\pi}\int_{-\pi}^{\pi}f(x)e^{-inx}\,dx$$
is called the Fourier transform of $f$. We refer to the number $\hat f(n)$ as the $n$th Fourier coefficient of $f$ and to the expression $\sum_{n=-\infty}^{\infty}\hat f(n)e^{inx}$ as the Fourier series of $f$. The Fourier series of $f$ is said to converge at $x$ if the sequence of partial sums
$$\sum_{k=-n}^{n}\hat f(k)e^{ikx},\qquad n\in\mathbb{N},$$
has a finite limit $s(x)$. Convergence to $s(x)$ will often be indicated by $s(x)=\sum_{n=-\infty}^{\infty}\hat f(n)e^{inx}$. The Fourier series of $f$ is said to converge in the norm $\|\cdot\|$ to the function $s$ if the aforementioned partial sums converge to $s$ with respect to $\|\cdot\|$.

EXAMPLE 11.1 Illustrates Definition 11.1
a) Let $f(x)=\sin mx$, where $m\in\mathbb{N}$. Then $\hat f(\pm m)=\pm1/2i$ and $\hat f(n)=0$ otherwise. The Fourier series of $f$ converges to $f(x)$ for all $x$. In fact, the partial sums at $x$ equal $f(x)$ as soon as $n\ge m$.
b) Consider the function $f$ defined by
$$f(x)=\begin{cases}-1/2,&\text{if }x\in[-\pi,0);\\1/2,&\text{if }x\in[0,\pi].\end{cases}$$
An easy calculation shows $\hat f(0)=0$ and $\hat f(n)=i\bigl((-1)^n-1\bigr)/2\pi n$ if $n\ne0$. We will see later that the Fourier series of $f$ converges to $f(x)$ at points of $(-\pi,0)\cup(0,\pi)$ and to 0 at $-\pi$, 0, and $\pi$. □

The following theorem provides some basic properties of Fourier transforms of functions in $\mathcal{L}^1([-\pi,\pi])$.
THEOREM 11.1
Let $f\in\mathcal{L}^1([-\pi,\pi])$. Then
a) $\hat f\in C_0(\mathbb{Z})$.
b) $\|\hat f\|_\infty\le\|f\|_1/2\pi$.
c) $\hat f=0\Rightarrow f=0$ ae.

PROOF: We leave the proof of part (a) to the reader as Exercise 11.3. Part (b) follows from the inequality
$$|\hat f(n)|\le\frac1{2\pi}\int_{-\pi}^{\pi}\bigl|f(x)e^{-inx}\bigr|\,dx=\frac{\|f\|_1}{2\pi}.$$
Example 9.12 on page 550 shows that part (c) holds whenever $f$ is a function in $\mathcal{L}^2([-\pi,\pi])$. The same argument remains valid, however, if $\mathcal{L}^2([-\pi,\pi])$ is replaced by $\mathcal{L}^1([-\pi,\pi])$ and $\|\cdot\|_2$ is replaced by $\|\cdot\|_1$.

Any function with period $2\pi$ is completely determined by its values on the interval $(-\pi,\pi]$. Conversely, a function $f$ defined on $(-\pi,\pi]$ extends uniquely to a periodic function on all of $\mathbb{R}$ via
$$f(x)=f(x-2\pi k),\qquad x\in\bigl((2k-1)\pi,(2k+1)\pi\bigr],\ k\in\mathbb{Z}. \tag{11.3}$$
A continuous function $f$ on $[-\pi,\pi]$ extends to a continuous periodic function via (11.3) if and only if $f(-\pi)=f(\pi)$.

We will use $C_{2\pi}$ to denote the space of continuous complex-valued functions having period $2\pi$. And, unless stated otherwise, we assume that $C_{2\pi}$ is equipped with the supremum norm,
$$\|f\|=\|f\|_{C_{2\pi}}=\|f\|_\infty=\sup\{|f(x)|:x\in[-\pi,\pi]\}.$$
The normed space $C_{2\pi}$ is identified via (11.3) with a closed subspace of $C([-\pi,\pi])$; hence it is a Banach space.

Similarly, for $1\le p\le\infty$, $\mathcal{L}^p_{2\pi}$ denotes the space of complex-valued Lebesgue measurable functions on $\mathbb{R}$ with period $2\pi$ whose restrictions to $[-\pi,\pi]$ are in $\mathcal{L}^p([-\pi,\pi])$. We assume that $\mathcal{L}^p_{2\pi}$ has norm
$$\|f\|_p=\begin{cases}\Bigl(\displaystyle\int_{-\pi}^{\pi}|f(x)|^p\,dx\Bigr)^{1/p},&\text{if }1\le p<\infty;\\\inf\{M:|f|\le M\text{ ae}\},&\text{if }p=\infty.\end{cases}$$
Since $\mathcal{L}^p_{2\pi}$ is identified via (11.3) with the space $\mathcal{L}^p([-\pi,\pi])$, it is, according to Riesz's theorem (page 557), a Banach space.
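The closed form in Example 11.1(b) and the bound of Theorem 11.1(b) can be checked numerically. The following is a minimal sketch in Python, assuming the composite midpoint rule as the quadrature for the integral in Definition 11.1. The function names `fourier_coefficient`, `step`, and `step_coefficient` are illustrative and not from the text. With an even number of subintervals, the jump at 0 falls on a grid boundary, so the only error comes from integrating the smooth factor $e^{-inx}$.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=20000):
    """Approximate hat f(n) = (1/2pi) * integral_{-pi}^{pi} f(x) e^{-inx} dx
    using the composite midpoint rule with `samples` equal subintervals."""
    h = 2 * math.pi / samples
    total = 0j
    for m in range(samples):
        x = -math.pi + (m + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2 * math.pi)

def step(x):
    """Function of Example 11.1(b): -1/2 on [-pi, 0) and 1/2 on [0, pi]."""
    return -0.5 if x < 0 else 0.5

def step_coefficient(n):
    """Closed form from Example 11.1(b)."""
    return 0j if n == 0 else 1j * ((-1) ** n - 1) / (2 * math.pi * n)

norm_1 = math.pi  # ||step||_1 = (1/2) * (2*pi)
for n in range(-5, 6):
    approx = fourier_coefficient(step, n)
    assert abs(approx - step_coefficient(n)) < 1e-6
    assert abs(approx) <= norm_1 / (2 * math.pi) + 1e-12  # Theorem 11.1(b)
```

The even coefficients vanish and the odd ones decay like $1/n$. This is consistent with part (a) of Theorem 11.1, and also with the fact that this $\hat f$ is not in $\ell^1(\mathbb{Z})$.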
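For a function $f$ on $\mathbb{R}$, let $f_y$ denote the translated function defined by $f_y(x)=f(x-y)$. If $f\in\mathcal{L}^p_{2\pi}$, then so is $f_y$ and, likewise, if $f\in C_{2\pi}$, then so is $f_y$. Thus we say that the spaces $C_{2\pi}$ and $\mathcal{L}^p_{2\pi}$ are translation invariant. Furthermore, it follows from Exercise 11.2 that $\|f_y\|_p=\|f\|_p$. Some important properties of the spaces $\mathcal{L}^p_{2\pi}$ and $C_{2\pi}$ are given in the following proposition. Its proof is left to the reader as Exercise 11.4.

PROPOSITION 11.1
a) For $p\ge1$, we have $C_{2\pi}\subset\mathcal{L}^\infty_{2\pi}\subset\mathcal{L}^p_{2\pi}\subset\mathcal{L}^1_{2\pi}$.
b) For $1\le p<\infty$, $C_{2\pi}$ is dense in $\mathcal{L}^p_{2\pi}$ with respect to the norm $\|\cdot\|_p$.
c) For $1\le p<\infty$, we have $(\mathcal{L}^p_{2\pi})^*=\mathcal{L}^q_{2\pi}$, where $1/p+1/q=1$. Specifically, $\ell$ is a continuous linear functional on $\mathcal{L}^p_{2\pi}$ if and only if there exists a function $g\in\mathcal{L}^q_{2\pi}$ such that
$$\ell(f)=\int_{-\pi}^{\pi}f(x)g(x)\,dx,\qquad f\in\mathcal{L}^p_{2\pi}.$$
Furthermore, $\|\ell\|_*=\|g\|_q$.
d) For $f\in\mathcal{L}^1_{2\pi}$ and $y\in\mathbb{R}$, we have $\displaystyle\int_{-\pi}^{\pi}f_y(x)\,dx=\int_{-\pi}^{\pi}f(x)\,dx$.
e) Each function in $C_{2\pi}$ is uniformly continuous on $\mathbb{R}$.
f) For $f\in\mathcal{L}^1_{2\pi}$ and $y\in\mathbb{R}$, we have $\widehat{f_y}(n)=e^{-iny}\hat f(n)$, $n\in\mathbb{Z}$.

For $f\in\mathcal{L}^1_{2\pi}$, let $S_n(f)$ denote the $n$th partial sum of the Fourier series of $f$:
$$S_n(f)(x)=\sum_{k=-n}^{n}\hat f(k)e^{ikx}. \tag{11.4}$$
It is easy to see that (11.4) defines a linear operator $S_n$ on $\mathcal{L}^1_{2\pi}$ with range in $C_{2\pi}$. Because $C_{2\pi}\subset\mathcal{L}^p_{2\pi}\subset\mathcal{L}^1_{2\pi}$, it follows that $S_n$ also maps $\mathcal{L}^p_{2\pi}$ into $\mathcal{L}^p_{2\pi}$. The following proposition describes some essential properties of $S_n$.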
PROPOSITION 11.2
Define
$$D_n(x)=\sum_{k=-n}^{n}e^{ikx},\qquad n=0,1,2,\ldots.$$
Then the following hold.
a) We have
$$D_n(x)=\begin{cases}\dfrac{\sin((n+1/2)x)}{\sin(x/2)},&\text{if }x/2\pi\notin\mathbb{Z};\\2n+1,&\text{if }x/2\pi\in\mathbb{Z}.\end{cases}$$
b) $(2\pi)^{-1}\int_{-\pi}^{\pi}D_n(t)\,dt=1$.
c) $D_n(-x)=D_n(x)$ for each $x\in\mathbb{R}$.
d) If $f\in\mathcal{L}^1_{2\pi}$, then $S_n(f)(x)=(2\pi)^{-1}\int_{-\pi}^{\pi}f(t)D_n(x-t)\,dt$ for each $x\in\mathbb{R}$.
e) If $f$ is a trigonometric polynomial, then $S_n(f)=f$ whenever $n\ge\deg f$.

PROOF: We prove part (a) and leave the remaining parts to the reader as Exercise 11.5. If $x/2\pi\in\mathbb{Z}$, then $e^{ikx}=1$ for each integer $k$ and, consequently, $D_n(x)=2n+1$. If $x/2\pi\notin\mathbb{Z}$, then using $\overline{e^{ikx}}=e^{-ikx}$ and the formula for a geometric sum, we have
$$\begin{aligned}D_n(x)&=\sum_{k=0}^{n}e^{ikx}+\sum_{k=1}^{n}e^{-ikx}=2\operatorname{Re}\Bigl(\sum_{k=0}^{n}e^{ikx}\Bigr)-1=2\operatorname{Re}\Bigl(\frac{e^{i(n+1)x}-1}{e^{ix}-1}\Bigr)-1\\&=2\operatorname{Re}\Bigl(\frac{e^{i(n+1/2)x}-e^{-ix/2}}{e^{ix/2}-e^{-ix/2}}\Bigr)-1=\frac{\sin((n+1/2)x)}{\sin(x/2)},\end{aligned}$$
as required.

The function $D_n$, often called the Dirichlet kernel, changes sign more and more rapidly as $n$ increases. This is a main reason why so little can be said about the behavior of the sequence of partial sums of the Fourier series of a function $f$ unless special conditions are imposed. However, the corresponding sequence of averages
$$A_n(f)=\frac1n\sum_{k=0}^{n-1}S_k(f) \tag{11.5}$$
satisfies a formula similar to Proposition 11.2(d), where $D_n$ is replaced by a nonnegative function. This tends to make $\{A_n(f)\}_{n=1}^{\infty}$ a more tractable sequence than $\{S_n(f)\}_{n=0}^{\infty}$. Clearly, (11.5) defines a linear operator $A_n$ on $\mathcal{L}^1_{2\pi}$ with range in $C_{2\pi}$. Proposition 11.3 presents some basic properties of $A_n$.
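Parts (a) and (b) of Proposition 11.2 can be spot-checked numerically. The following is a minimal sketch in Python. The names `dirichlet_sum`, `dirichlet_closed`, and `kernel_mean` are illustrative assumptions. The mean in part (b) is computed with the midpoint rule on $2\pi$-periodic data. That rule is exact, up to rounding, for trigonometric polynomials whose degree is below the number of sample points.

```python
import cmath
import math

def dirichlet_sum(n, x):
    """D_n(x) from its definition: sum of e^{ikx} for k = -n, ..., n.
    The imaginary parts cancel in pairs, so only the real part is kept."""
    return sum(cmath.exp(1j * k * x) for k in range(-n, n + 1)).real

def dirichlet_closed(n, x):
    """Closed form of Proposition 11.2(a)."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:  # x/(2*pi) is (numerically) an integer
        return float(2 * n + 1)
    return math.sin((n + 0.5) * x) / s

def kernel_mean(kernel, n, samples=1024):
    """(1/2pi) * integral of kernel(n, .) over [-pi, pi] by the midpoint rule."""
    h = 2 * math.pi / samples
    return sum(kernel(n, -math.pi + (m + 0.5) * h) for m in range(samples)) * h / (2 * math.pi)

for n in range(6):
    for x in (0.0, 0.1, 1.0, 2.5, -3.0, 2 * math.pi):
        assert abs(dirichlet_sum(n, x) - dirichlet_closed(n, x)) < 1e-9  # part (a)
    assert abs(kernel_mean(dirichlet_closed, n) - 1) < 1e-9              # part (b)
```

Evaluating `dirichlet_closed` on a fine grid also shows the sign changes mentioned above. The zeros $x=2\pi j/(2n+1)$ crowd together as $n$ grows.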
PROPOSITION 11.3
Define
$$F_n(x)=\frac1n\sum_{k=0}^{n-1}D_k(x),\qquad n\in\mathbb{N}.$$
Then the following hold.
a) We have
$$F_n(x)=\begin{cases}\dfrac1n\Bigl(\dfrac{\sin(nx/2)}{\sin(x/2)}\Bigr)^2,&\text{if }x/2\pi\notin\mathbb{Z};\\n,&\text{if }x/2\pi\in\mathbb{Z}.\end{cases}$$
b) $(2\pi)^{-1}\int_{-\pi}^{\pi}F_n(t)\,dt=1$.
c) $F_n(-x)=F_n(x)$ for each $x\in\mathbb{R}$.
d) For each $\delta\in(0,\pi)$, $\lim_{n\to\infty}\sup\{F_n(x):\delta\le|x|\le\pi\}=0$.
e) If $f\in\mathcal{L}^1_{2\pi}$, then $A_n(f)(x)=(2\pi)^{-1}\int_{-\pi}^{\pi}f(t)F_n(x-t)\,dt$ for each $x\in\mathbb{R}$.
f) If $1\le p\le\infty$ and $f\in\mathcal{L}^p_{2\pi}$, then $\|A_n(f)\|_p\le\|f\|_p$.

PROOF: The proofs of parts (a)–(e) are left to the reader as Exercise 11.6. To prove part (f), we argue as follows. If $p=\infty$, then, by part (e),
$$|A_n(f)(x)|\le\frac1{2\pi}\int_{-\pi}^{\pi}|f(t)|F_n(x-t)\,dt\le\frac{\|f\|_\infty}{2\pi}\int_{-\pi}^{\pi}F_n(x-t)\,dt$$
for each $x\in\mathbb{R}$. Applying parts (c) and (b) of this proposition, together with Exercise 11.2, we obtain $|A_n(f)(x)|\le\|f\|_\infty$ for each $x\in\mathbb{R}$ and so $\|A_n(f)\|_\infty\le\|f\|_\infty$.

Now suppose $1\le p<\infty$ and let $1/p+1/q=1$. For $x\in\mathbb{R}$, define the Borel measure $\mu$ on $[-\pi,\pi]$ by $\mu(B)=(2\pi)^{-1}\int_BF_n(x-t)\,d\lambda(t)$. It follows from parts (a), (b), (c), and Exercise 11.2 that $\mu$ is nonnegative and $\mu([-\pi,\pi])=1$. Furthermore, by part (e) and Exercise 4.61 on page 191,
$$A_n(f)(x)=\int_{[-\pi,\pi]}f(t)\,d\mu(t).$$
Hence, by Hölder's inequality (page 556),
$$|A_n(f)(x)|\le\Bigl(\int_{[-\pi,\pi]}|f(t)|^p\,d\mu(t)\Bigr)^{1/p}.$$
Applying Exercise 4.61 again and using Fubini's theorem, we obtain
$$\|A_n(f)\|_p^p\le\frac1{2\pi}\int_{-\pi}^{\pi}|f(t)|^p\int_{-\pi}^{\pi}F_n(x-t)\,dx\,dt=\|f\|_p^p.$$
Thus, part (f) is established.
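Several parts of Proposition 11.3 lend themselves to a quick numerical check. These are the closed form (a), nonnegativity, the normalization (b), and the decay (d). The following is a minimal sketch in Python. The helper names are hypothetical, and the grid and the value $\delta=0.5$ are arbitrary choices. The check of (d) uses the elementary bound $F_n(x)\le 1/\bigl(n\sin^2(\delta/2)\bigr)$ for $\delta\le|x|\le\pi$, which follows directly from (a). That bound is what makes part (d) hold.

```python
import math

def dirichlet(k, x):
    """D_k(x) = 1 + 2 * sum_{j=1}^{k} cos(jx), the real form of the defining sum."""
    return 1 + 2 * sum(math.cos(j * x) for j in range(1, k + 1))

def fejer_average(n, x):
    """F_n(x) as defined in Proposition 11.3: the mean of D_0, ..., D_{n-1}."""
    return sum(dirichlet(k, x) for k in range(n)) / n

def fejer_closed(n, x):
    """Closed form of Proposition 11.3(a)."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:  # x/(2*pi) is (numerically) an integer
        return float(n)
    return (math.sin(n * x / 2) / s) ** 2 / n

def period_mean(g, samples=1024):
    """(1/2pi) * midpoint-rule integral of g over [-pi, pi]."""
    h = 2 * math.pi / samples
    return sum(g(-math.pi + (m + 0.5) * h) for m in range(samples)) * h / (2 * math.pi)

xs = [-math.pi + 0.01 * i for i in range(629)]  # grid covering [-pi, pi]
for n in range(1, 8):
    assert all(abs(fejer_average(n, x) - fejer_closed(n, x)) < 1e-9 for x in xs)  # part (a)
    assert min(fejer_closed(n, x) for x in xs) >= 0                              # F_n >= 0
    assert abs(period_mean(lambda x: fejer_closed(n, x)) - 1) < 1e-9             # part (b)

# Part (d): on delta <= |x| <= pi, F_n(x) <= 1 / (n * sin(delta/2)^2), which tends to 0.
delta = 0.5
for n in (10, 100, 1000):
    tail = max(fejer_closed(n, x) for x in xs if abs(x) >= delta)
    assert tail <= 1 / (n * math.sin(delta / 2) ** 2) + 1e-12
```

In contrast with the Dirichlet kernel, every value of `fejer_closed` is nonnegative. This is exactly the property the proof of part (f) relies on when it treats $(2\pi)^{-1}F_n(x-t)\,dt$ as a probability measure.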
The function $F_n$ introduced in Proposition 11.3, often referred to as Fejér's kernel, plays an essential role in Fourier analysis.

We conclude this section by introducing another important function in harmonic analysis, namely, the sinc function, defined by
$$\operatorname{sinc}x=\begin{cases}\dfrac{\sin x}{x},&\text{for }x\ne0;\\1,&\text{for }x=0.\end{cases} \tag{11.6}$$
As Exercise 3.91 on page 160 shows, sinc is not Lebesgue integrable over $\mathbb{R}$, yet its improper Riemann integral exists. In particular, we can assert that
$$\lim_{b\to\infty}\int_b^\infty\operatorname{sinc}x\,dx=0. \tag{11.7}$$
We will find (11.7) useful in the next section.

EXERCISES 11.1
11.1 Let $f$ be a complex-valued measurable function on $\mathbb{R}$ such that there are arbitrarily small positive numbers $p$ satisfying $f(x+p)=f(x)$ for almost all $x\in\mathbb{R}$. Show that $f$ is constant ae.
★11.2 Let $f$ be a complex-valued function on $\mathbb{R}$.
a) Show that $f$ has period $p$ if and only if the function defined by $f(px/2\pi)$ has period $2\pi$.
b) Show that if $f$ has period $p>0$ and is integrable over every bounded interval of $\mathbb{R}$, then $\int_x^{x+p}f(t)\,dt$ is independent of $x$.
11.3 Prove Theorem 11.1(a), often referred to as the Riemann–Lebesgue lemma. Hint: Start with the case where $f$ is the characteristic function of a subinterval of $[-\pi,\pi]$.
11.4 Prove Proposition 11.1. Hint: For part (b), see Exercise 9.56 on page 562 and, for part (d), refer to Exercise 11.2.
11.5 Complete the proof of Proposition 11.2.
11.6 Prove parts (a)–(e) of Proposition 11.3.
11.7 Let $f$ be a complex-valued function defined on $\mathbb{R}$ and satisfying the following conditions: $f$ is not identically 0, is continuous at 0, has period $2\pi$, and $f(x+y)=f(x)f(y)$ for all $x,y\in\mathbb{R}$. Show that $f(x)=e^{inx}$ for some integer $n$.
★11.8 For $f,g\in\mathcal{L}^1_{2\pi}$, let the function $f*g$ be defined by
$$f*g(x)=\frac1{2\pi}\int_{-\pi}^{\pi}f(x-y)g(y)\,dy.$$
Show that $f*g$ is well-defined ae and belongs to $\mathcal{L}^1_{2\pi}$. $f*g$ is called the convolution of $f$ and $g$. Observe that $S_n(f)=f*D_n$ and $A_n(f)=f*F_n$.

In Exercises 11.9–11.12, $f*g$ denotes the convolution product introduced in Exercise 11.8.
★11.9 Verify that the convolution product is commutative and associative, that is, prove that
a) $f*g=g*f$.
b) $(f*g)*h=f*(g*h)$.
★11.10 Show that if $g\in C_{2\pi}$, then $f*g\in C_{2\pi}$.
★11.11 Show that if $f\in C_{2\pi}$ has a derivative at all points of $\mathbb{R}$, $f'\in C_{2\pi}$, and $g\in\mathcal{L}^1_{2\pi}$, then $(f*g)'$ exists and equals $(f')*g$.
★11.12 Prove that $\widehat{f*g}=\hat f\hat g$.

Exercises 11.13–11.16 are concerned with extending the definition of the Fourier coefficient to measures. A measure $\mu\in M([-\pi,\pi])$ is said to be periodic if $\mu(\{-\pi\})=\mu(\{\pi\})$. The set $M_{2\pi}$ of all periodic measures is a closed subspace of the Banach space $M([-\pi,\pi])$. The Fourier coefficients of a measure $\mu\in M_{2\pi}$ are defined by the formula
$$\hat\mu(n)=\frac1{2\pi}\int_{[-\pi,\pi]}e^{-inx}\,d\mu(x),\qquad n\in\mathbb{Z}.$$
In Exercises 11.13–11.16, we assume that $\mu$ and $\nu$ are members of $M_{2\pi}$.
★11.13 Show that if $\mu\ll\lambda$ with Radon–Nikodym derivative $g$, then $\hat\mu=\hat g$.
11.14 Prove that $|\hat\mu(n)|\le|\mu|([-\pi,\pi])/2\pi$. Does $\hat\mu$ always lie in $C_0(\mathbb{Z})$? (See part (a) of Theorem 11.1.)
11.15 Set $S_n(\mu)(x)=\sum_{k=-n}^{n}\hat\mu(k)e^{ikx}$. Verify that $S_n(\delta_0)=D_n/2\pi$.
11.16 Show that if $\hat\mu=\hat\nu$, then $\mu=\nu$. Hint: See Example 8.15 on page 522.

11.2 CONVERGENCE OF FOURIER SERIES
For a particular function $f\in\mathcal{L}^1_{2\pi}$ and a number $x\in\mathbb{R}$, we pose the following two questions:
Question 1: Does the Fourier series of $f$ converge at $x$?
Question 2: If the answer to Question 1 is yes, does the Fourier series of $f$ at $x$ converge to $f(x)$?
Because these two questions are so broadly posed, one has to give the answer "not always" to both. Nevertheless, they serve as motivators for a
host of interesting and useful results. In this section we present samples of two approaches to answering Questions 1 and 2, namely:
• Narrow the class of functions under consideration.
• Modify the convergence requirement.

The following theorem shows, among other things, that if the Fourier coefficients of a function converge to 0 rapidly enough, then the Fourier series converges to the function for almost all $x$.

THEOREM 11.2
Let $f\in\mathcal{L}^1_{2\pi}$. If $\hat f\in\ell^1(\mathbb{Z})$, then the Fourier series of $f$ converges uniformly to a continuous function $g$ such that $f=g$ ae. In particular, the Fourier series of $f$ converges to $f$ almost everywhere.

PROOF: Because $\sum_{n=-\infty}^{\infty}|\hat f(n)|<\infty$, we deduce from Exercise 7.89 on page 447 that the series
$$g(x)=\sum_{n=-\infty}^{\infty}\hat f(n)e^{inx}$$
converges uniformly on $\mathbb{R}$ and that $g$ is continuous and has period $2\pi$. It follows that $\hat g=\hat f$ and, consequently, from Theorem 11.1 on page 638 that $g=f$ ae.

EXAMPLE 11.2 Illustrates Theorem 11.2
Consider the function $f$ defined on $\mathbb{R}$ having period $2\pi$ and satisfying $f(x)=1-|x|/\pi$ for $x\in[-\pi,\pi]$. An easy calculation shows that $\hat f(0)=1/2$ and
$$\hat f(n)=\frac{1-(-1)^n}{\pi^2n^2},\qquad n\ne0.$$
Because $\hat f\in\ell^1(\mathbb{Z})$, Theorem 11.2 implies that we have the expansion
$$f(x)=\sum_{n=-\infty}^{\infty}\hat f(n)e^{inx}=\frac12+\sum_{n=1}^{\infty}\frac{1-(-1)^n}{\pi^2n^2}\bigl(e^{inx}+e^{-inx}\bigr)=\frac12+\frac4{\pi^2}\sum_{n=0}^{\infty}\frac{\cos((2n+1)x)}{(2n+1)^2},$$
where the series converges uniformly for $x\in[-\pi,\pi]$. Setting $x=0$, we obtain the formula
$$\frac{\pi^2}{8}=\sum_{n=0}^{\infty}\frac1{(2n+1)^2}$$
as a special case. □

Our next theorem, due to Dirichlet, shows that the Fourier series of a function of bounded variation converges everywhere and that it converges almost everywhere to the function. The proof of Dirichlet's theorem requires the following lemma.

LEMMA 11.1
Suppose that $F$ is a right-continuous function of bounded variation defined on $[a,b]$. Then there exists a $\mu\in M([a,b])$ such that $F(t)-F(a)=\mu((a,t])$ for each $t\in[a,b]$.

PROOF: Suppose first that $F$ is nondecreasing and nonnegative. Then we can extend $F$ to a distribution function on $\mathbb{R}$ by defining $F(t)=0$ for $t<a$, and $F(t)=F(b)$ for $t>b$. Applying Theorem 4.13 on page 226 to the extended version of $F$, we conclude that there is a finite Borel measure $\mu$ on $\mathbb{R}$ such that $F(t)-F(a)=\mu((a,t])$ for each $t\in(a,b]$. By Exercise 9.68 on page 571, $\mu$ is regular and, hence, the restriction of $\mu$ to Borel measurable subsets of $[a,b]$ is a regular Borel measure satisfying the assertion of the lemma.

To continue, we next assume that $F$ is real-valued. Then, according to Theorem 6.3 on page 332, we can write $F=g_1-g_2$, where $g_1$ and $g_2$ are nondecreasing functions on $[a,b]$. Letting $\beta=g_1(a)\wedge g_2(a)$, we define $F_1(b)=g_1(b)-\beta$, $F_2(b)=g_2(b)-\beta$, and, for $t\in[a,b)$, $F_1(t)=g_1(t+)-\beta$ and $F_2(t)=g_2(t+)-\beta$. Then $F_1$ and $F_2$ are nonnegative, nondecreasing, and right continuous on $[a,b]$, and we have $F=F_1-F_2$. Therefore, there exist $\mu_1,\mu_2\in M([a,b])$ such that $F(t)-F(a)=\mu_1((a,t])-\mu_2((a,t])$ for each $t\in[a,b]$. It follows that the signed measure $\mu=\mu_1-\mu_2$ satisfies the assertion of the lemma.

It remains to establish the lemma in case $F$ is complex valued. This is done by first noting that the real and imaginary parts of $F$ are real-valued, right-continuous functions of bounded variation and then applying what we just proved to the real and imaginary parts of $F$.
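The expansion in Example 11.2 and its special case $\pi^2/8=\sum 1/(2n+1)^2$ can be checked numerically. The following is a minimal sketch in Python, with illustrative function names. The tolerance comes from a simple tail estimate that is not in the text. Since $\sum_{n\ge N}(2n+1)^{-2}\le 1/(2(2N-1))$, the truncation error after $N$ terms is at most $(4/\pi^2)/(2(2N-1))$, uniformly in $x$. That uniform bound is the kind of control Theorem 11.2 guarantees.

```python
import math

def triangle(x):
    """2*pi-periodic function of Example 11.2, equal to 1 - |x|/pi on [-pi, pi]."""
    x = (x + math.pi) % (2 * math.pi) - math.pi  # reduce to [-pi, pi)
    return 1 - abs(x) / math.pi

def partial_sum(x, terms):
    """First `terms` terms of 1/2 + (4/pi^2) * sum cos((2n+1)x) / (2n+1)^2."""
    s = sum(math.cos((2 * n + 1) * x) / (2 * n + 1) ** 2 for n in range(terms))
    return 0.5 + 4 / math.pi ** 2 * s

# With N = 2000 terms the uniform tail bound is about 5e-5.
for x in (-7.0, -3.0, -1.0, 0.0, 0.5, 2.0, 3.1):
    assert abs(partial_sum(x, 2000) - triangle(x)) < 1e-4

# Setting x = 0 gives the special case pi^2 / 8 = sum 1 / (2n+1)^2.
approx = sum(1 / (2 * n + 1) ** 2 for n in range(2000))
assert abs(approx - math.pi ** 2 / 8) < 2e-4
```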
THEOREM 11.3 Dirichlet's Theorem
Suppose that $f\in\mathcal{L}^1_{2\pi}$ and is of bounded variation on the interval $[-\pi,\pi]$. Then
$$\frac12\bigl(f(x+)+f(x-)\bigr)=\sum_{n=-\infty}^{\infty}\hat f(n)e^{inx} \tag{11.8}$$
for each $x\in\mathbb{R}$. In particular, the Fourier series of $f$ converges to $f$ almost everywhere.

PROOF: Because $f(x+)=f(x)$ for all but countably many $x$, if we replace $f(x)$ by $f(x+)$ at each $x$, the Fourier coefficients of the function are unaltered. Therefore, without loss of generality, we can assume that $f$ is right continuous on $\mathbb{R}$.

We first show that (11.8) holds when $x=0$. Using Proposition 11.2 on page 640, we obtain
$$\begin{aligned}S_n(f)(0)&=\frac1{2\pi}\int_{-\pi}^{\pi}f(t)D_n(t)\,dt\\&=\frac1{2\pi}\int_0^{\pi}\bigl(f(t)-f(0)\bigr)D_n(t)\,dt+\frac1{2\pi}\int_{-\pi}^{0}\bigl(f(t)-f(0-)\bigr)D_n(t)\,dt\\&\qquad+\frac{f(0)}{2\pi}\int_0^{\pi}D_n(t)\,dt+\frac{f(0-)}{2\pi}\int_{-\pi}^{0}D_n(t)\,dt\\&=\frac1{2\pi}\int_0^{\pi}\bigl(f(t)-f(0)\bigr)D_n(t)\,dt+\frac1{2\pi}\int_{-\pi}^{0}\bigl(f(t)-f(0-)\bigr)D_n(t)\,dt+\frac12\bigl(f(0)+f(0-)\bigr).\end{aligned} \tag{11.9}$$
We will show that
$$\lim_{n\to\infty}\int_0^{\pi}\bigl(f(t)-f(0)\bigr)D_n(t)\,dt=0. \tag{11.10}$$
Set $g(t)=(t/2)\bigl(f(t)-f(0)\bigr)/\sin(t/2)$ for $t\in(0,\pi]$ and $g(0)=0$. Clearly, $g$ is right continuous, because $f$ is right continuous and $(t/2)/\sin(t/2)\to1$ as $t\to0+$. Referring to Exercise 6.28 on page 334, we see that $g$ has bounded variation over $[0,\pi]$. Hence, by Lemma 11.1, there is a regular
Borel measure $\mu$ on $[0,\pi]$ such that
$$g(t)-g(0)=\mu((0,t])=\int_{(0,t]}d\mu(x)$$
for $t\in[0,\pi]$. Applying Fubini's theorem, we obtain
$$\begin{aligned}\int_0^{\pi}\bigl(f(t)-f(0)\bigr)D_n(t)\,dt&=\int_0^{\pi}\bigl(g(t)-g(0)\bigr)\frac{\sin((n+1/2)t)}{t/2}\,dt=\int_0^{\pi}\int_{(0,t]}\frac{\sin((n+1/2)t)}{t/2}\,d\mu(x)\,dt\\&=\int_{(0,\pi]}\int_x^{\pi}\frac{\sin((n+1/2)t)}{t/2}\,dt\,d\mu(x)=2\int_{(0,\pi]}\int_{(n+1/2)x}^{(n+1/2)\pi}\operatorname{sinc}v\,dv\,d\mu(x).\end{aligned}$$
It follows from (11.7) on page 642 that the sequence
$$\Bigl\{\int_{(n+1/2)x}^{(n+1/2)\pi}\operatorname{sinc}v\,dv\Bigr\}_{n=1}^{\infty}$$
is uniformly bounded and tends to 0 as $n\to\infty$ for $x\in(0,\pi]$. Therefore, by the dominated convergence theorem, (11.10) is satisfied. A similar argument applied to the function $f(-t)$ shows that
$$\lim_{n\to\infty}\int_0^{\pi}\bigl(f(-t)-f(0-)\bigr)D_n(t)\,dt=0,$$
which, by the substitution $t\mapsto-t$ and the evenness of $D_n$, says that the second integral in (11.9) also tends to 0. In view of (11.9), we conclude that
$$\lim_{n\to\infty}S_n(f)(0)=\frac12\bigl(f(0)+f(0-)\bigr). \tag{11.11}$$
To complete the proof of (11.8), we apply (11.11) to the translated function $f_{-x}$. Using Proposition 11.2 on page 640 and Exercise 11.2 on page 642, we obtain
$$S_n(f_{-x})(0)=\frac1{2\pi}\int_{-\pi}^{\pi}f(t+x)D_n(t)\,dt=\frac1{2\pi}\int_{-\pi}^{\pi}f(t)D_n(t-x)\,dt=S_n(f)(x).$$
Thus, from (11.11) and Exercise 11.17, we have
$$\lim_{n\to\infty}S_n(f)(x)=\frac12\bigl(f_{-x}(0)+f_{-x}(0-)\bigr)=\frac12\bigl(f(x)+f(x-)\bigr),$$
as required. The last sentence in the statement of the theorem follows from the fact that a function of bounded variation has only countably many points of discontinuity.

EXAMPLE 11.3 Illustrates Dirichlet's Theorem
Refer to Example 11.1(b) on page 637. Clearly, $f$ is a right-continuous function of bounded variation and $f(0)=1/2$ and $f(0-)=-1/2$. Considering the Fourier series of $f$ at $x=0$, we have
$$\sum_{n=-\infty}^{\infty}\hat f(n)=\lim_{n\to\infty}\sum_{0<|k|\le n}\frac{i\bigl((-1)^k-1\bigr)}{2\pi k}=0=\frac12\bigl(f(0)+f(0-)\bigr),$$
as predicted by Dirichlet's theorem. □

Theorems 11.2 and 11.3 are about pointwise convergence of Fourier series. Other interesting results can be obtained if we allow alternate modes of convergence. For example, if $f\in\mathcal{L}^2_{2\pi}$, then it follows from Example 9.12 on page 550 that
$$\lim_{n\to\infty}\|f-S_n(f)\|_2=0, \tag{11.12}$$
that is, the Fourier series of $f$ converges to $f$ with respect to the norm $\|\cdot\|_2$. The weaker formula
$$\lim_{n\to\infty}\|f-A_n(f)\|_2=0, \tag{11.13}$$
which follows immediately from (11.12), suggests still another way of looking at the convergence of Fourier series. Intuitively, because averaging tends to diminish fluctuations, it should tend to make partial sums easier to handle. That this intuition is correct is borne out by the following theorem, of which (11.13) is a special case.

THEOREM 11.4
a) If $f\in C_{2\pi}$, then $\lim_{n\to\infty}\|f-A_n(f)\|_\infty=0$.
b) For $1\le p<\infty$, if $f\in\mathcal{L}^p_{2\pi}$, then $\lim_{n\to\infty}\|f-A_n(f)\|_p=0$.
c) If $f\in\mathcal{L}^\infty_{2\pi}$, then the sequence $\{A_n(f)\}_{n=1}^{\infty}$ converges to $f$ in the weak* topology of $\mathcal{L}^\infty_{2\pi}$.
PROOF: We show first that for $f\in C_{2\pi}$,
$$\lim_{n\to\infty}A_n(f)(0)=f(0). \tag{11.14}$$
Later in the proof, we will generalize the argument to establish part (a). By Proposition 11.3 on page 641, we have
$$|A_n(f)(0)-f(0)|\le\frac1{2\pi}\int_{-\pi}^{\pi}|f(t)-f(0)|F_n(t)\,dt.$$
Given $\varepsilon>0$, there exists a $\delta\in(0,\pi)$ such that $|f(t)-f(0)|<\varepsilon/2$ for $t\in(-\delta,\delta)$. Hence,
$$|A_n(f)(0)-f(0)|\le\frac{\varepsilon}{2}\cdot\frac1{2\pi}\int_{-\delta}^{\delta}F_n(t)\,dt+\frac1{2\pi}\int_{\delta\le|t|\le\pi}|f(t)-f(0)|F_n(t)\,dt. \tag{11.15}$$
Since, by Proposition 11.3, $(2\pi)^{-1}\int_{-\delta}^{\delta}F_n(t)\,dt\le1$, we have that
$$|A_n(f)(0)-f(0)|\le\frac{\varepsilon}{2}+2\|f\|_\infty\sup\{F_n(t):\delta\le|t|\le\pi\}.$$
Equation (11.14) now follows from Proposition 11.3(d).

To complete the proof of part (a), we observe that, as $f$ is uniformly continuous on $\mathbb{R}$, the $\delta$ in the foregoing argument can be chosen so that whenever $t\in(-\delta,\delta)$, $|f_{-x}(t)-f_{-x}(0)|<\varepsilon/2$ for all $x\in\mathbb{R}$. It follows that the inequality (11.15) is satisfied when $f$ is replaced by $f_{-x}$. Because $A_n(f_{-x})(0)=A_n(f)(x)$ and $\|f_{-x}\|_\infty=\|f\|_\infty$, we deduce that
$$|A_n(f)(x)-f(x)|\le\frac{\varepsilon}{2}+2\|f\|_\infty\sup\{F_n(t):\delta\le|t|\le\pi\}.$$
Hence, the sequence $\{A_n(f)\}_{n=1}^{\infty}$ converges uniformly to $f$. This verifies part (a) of the theorem.

To establish part (b), we make use of the density of $C_{2\pi}$ in $\mathcal{L}^p_{2\pi}$ and the inequality $\|g\|_p\le(2\pi)^{1/p}\|g\|_\infty$ for functions in $C_{2\pi}$. For $\varepsilon>0$, choose $g\in C_{2\pi}$ such that $\|f-g\|_p<\varepsilon/3$. Then we have
$$\|A_n(f)-f\|_p\le\|A_n(f-g)\|_p+\|A_n(g)-g\|_p+\|f-g\|_p<\frac{2\varepsilon}{3}+(2\pi)^{1/p}\|A_n(g)-g\|_\infty.$$
It now follows from part (a) that $\|A_n(f)-f\|_p<\varepsilon$ for sufficiently large $n$.

For part (c), we must show that
$$\lim_{n\to\infty}\int_{-\pi}^{\pi}A_n(f)(x)g(x)\,dx=\int_{-\pi}^{\pi}f(x)g(x)\,dx \tag{11.16}$$
for all $g\in\mathcal{L}^1_{2\pi}$. By Proposition 11.3 and Fubini's theorem,
$$\int_{-\pi}^{\pi}A_n(f)(x)g(x)\,dx=\frac1{2\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}f(t)g(x)F_n(x-t)\,dt\,dx=\int_{-\pi}^{\pi}f(t)A_n(g)(t)\,dt.$$
It follows that
$$\Bigl|\int_{-\pi}^{\pi}A_n(f)(x)g(x)\,dx-\int_{-\pi}^{\pi}f(x)g(x)\,dx\Bigr|=\Bigl|\int_{-\pi}^{\pi}f(t)\bigl(A_n(g)(t)-g(t)\bigr)\,dt\Bigr|\le\|f\|_\infty\|A_n(g)-g\|_1$$
and, hence, in view of part (b), we see that (11.16) holds.

EXAMPLE 11.4 Illustrates Theorem 11.4
It follows from Exercise 10.27 on page 597 that there exists a function $f\in C_{2\pi}$ such that the sequence $\{S_n(f)(x_0)\}_{n=1}^{\infty}$ diverges for some $x_0$. Theorem 11.4 shows that by averaging the $S_n(f)$s we can remove the divergence at $x_0$ and get uniform convergence for all $x$. □

EXERCISES 11.2
11.17 Show that $f_{-x}(0+)=f(x+)$ and $f_{-x}(0-)=f(x-)$.
11.18 Explain why part (b) of Theorem 11.4 is false when $p=\infty$.
11.19 Localization of Fourier series.
a) Suppose $f\in\mathcal{L}^1_{2\pi}$ and that $f$ vanishes on some open interval $J\subset[-\pi,\pi]$. Show that the Fourier series of $f$ converges uniformly to 0 on compact subsets of $J$. Hint: Start with the case where $f$ is the characteristic function of an interval disjoint from $J$.
b) Deduce from part (a) that if $f,g\in\mathcal{L}^1_{2\pi}$ and $f=g$ on some open interval $J\subset[-\pi,\pi]$, then $\sum\hat f(n)e^{inx}$ converges at a point $x\in J$ if and only if $\sum\hat g(n)e^{inx}$ converges.
11.20 Suppose that $f^{(j)}\in C_{2\pi}$ for $j=0,1,\ldots,m-1$ and that $f^{(m-1)}$ is absolutely continuous.
a) Show that $|\hat f(n)|\le\|f^{(m)}\|_1/(2\pi|n|^m)$ for $n\ne0$.
b) Deduce that if $m\ge2$, then the Fourier series of $f$ converges uniformly to $f$.
★11.21 In this exercise, we evaluate $\int_0^\infty\operatorname{sinc}x\,dx$.
a) Consider the function $f_0$ with period $2\pi$ satisfying
$$f_0(x)=\begin{cases}(\pi-x)/2,&\text{if }0\le x\le\pi;\\-(\pi+x)/2,&\text{if }-\pi\le x<0.\end{cases}$$
Show that
$$\hat f_0(n)=\begin{cases}0,&\text{if }n=0;\\(2in)^{-1},&\text{if }n\ne0.\end{cases}$$
Deduce that
$$\sum_{n=-\infty}^{\infty}\hat f_0(n)e^{inx}=\begin{cases}(\pi-x)/2,&\text{if }0<x<\pi;\\0,&\text{if }x=0,\pi,-\pi;\\-(\pi+x)/2,&\text{if }-\pi<x<0.\end{cases}$$
b) Show that for $x\in[-\pi,\pi]$,
$$S_n(f_0)(x)=-\frac x2+\frac12\int_0^xD_n(t)\,dt.$$
c) Show that for $x\in[-\pi,\pi]$,
$$S_n(f_0)(x)=-\frac x2+\int_0^x\frac{\sin((n+1/2)t)}{t}\,dt+q_n(x),$$
where $q_n(x)$ tends to 0 uniformly for $x\in[-\pi,\pi]$ as $n\to\infty$.
d) Deduce from part (c) that $\int_0^\infty\operatorname{sinc}v\,dv=\pi/2$.
11.22 This exercise studies the behavior near 0 of the sequence of partial sums of the Fourier series of a function $f$. It is assumed that $f$ is right continuous and of bounded variation on $[-\pi,\pi]$. In what follows, $f_0$ denotes the function defined in Exercise 11.21. For $0<\delta<\pi$, let
$$\omega_{n,\delta}(f)=\sup\{S_n(f)(x)-S_n(f)(-x):0<x<\delta\}$$
and $\omega_\delta(f)=\limsup_{n\to\infty}\omega_{n,\delta}(f)$.
a) For each nonnegative integer $n$, verify that the function
$$G_n(x)=\int_0^x\frac{\sin((n+1/2)t)}{t}\,dt$$
has a local max and local min at $\pi/(n + 1/2)$ and $-\pi/(n + 1/2)$, respectively, and, moreover, that
$$G_n\bigl(\pi/(n + 1/2)\bigr) = \int_0^\pi \operatorname{sinc} t\,dt \quad\text{and}\quad G_n\bigl(-\pi/(n + 1/2)\bigr) = -\int_0^\pi \operatorname{sinc} t\,dt.$$
b) Deduce that $\lim_{\delta\to 0+}\omega_\delta(f_0) = 2\int_0^\pi \operatorname{sinc} t\,dt$.
c) Show that if $f$ is continuous at 0, then $\lim_{\delta\to 0+}\omega_\delta(f) = 0$. Hint: Study the proof of Theorem 11.3 carefully. Use Exercise 11.21 to show that there is a constant $c$ such that $\left|\int_x^y D_n(t)\,dt\right| \le c$ for all $n$ and for all $x, y \in [-\pi, \pi]$.
d) Show that if $f$ is discontinuous at 0, then
$$\lim_{\delta\to 0+}\omega_\delta(f) = \frac{2}{\pi}\bigl(f(0) - f(0-)\bigr)\int_0^\pi \operatorname{sinc} t\,dt.$$
Hint: Consider $f - af_0$ for a suitable constant $a$ and use part (c).
This property of the sequence of partial sums near a jump discontinuity is known as Gibbs' phenomenon.
11.23 Let $M_{2\pi}$ denote the space of periodic measures on $[-\pi, \pi]$, as defined in the paragraph preceding Exercise 11.13 on page 643. Verify that $M_{2\pi}$ can be identified with the dual space of $C_{2\pi}$ in the sense that every continuous linear functional on $C_{2\pi}$ is of the form $f \mapsto \int f\,d\mu$ for some $\mu \in M_{2\pi}$.
11.24 Refer to Exercise 11.23. Suppose that $\mu \in M_{2\pi}$ and set
$$A_n(\mu)(x) = \frac{1}{n}\sum_{k=0}^{n-1} S_k(\mu)(x), \qquad n \in \mathcal{N},$$
where $S_n(\mu)(x) = \sum_{k=-n}^{n}\hat\mu(k)e^{ikx}$ and $\hat\mu(k) = (2\pi)^{-1}\int e^{-ikx}\,d\mu(x)$. Define $\nu_n$ on $\mathcal{B}([-\pi, \pi])$ by $\nu_n(B) = \int_B A_n(\mu)(x)\,dx$. Prove that the sequence $\{\nu_n\}_{n=1}^\infty$ converges in the weak* topology of $M_{2\pi}$ to $\mu$.

Let $I$ be a bounded interval with a nonempty interior. A sequence $\{a_n\}_{n=1}^\infty$ of elements of $I$ is said to be uniformly distributed in $I$ if
$$\lim_{n\to\infty}\frac{1}{n}N\bigl(\{\, k \in \mathcal{N} : 1 \le k \le n,\ a_k \in J \,\}\bigr) = \frac{\ell(J)}{\ell(I)}$$
for each subinterval $J \subset I$. Here $N(E)$ denotes the number of elements of a set $E$ and $\ell$ denotes length. The idea of uniform distribution is that the relative
frequency of points of the sequence lying in an interval is proportional to the length of the interval.
11.25 The object of this exercise is to establish Weyl's criteria for uniform distribution, which we do for $I = [-\pi, \pi]$. Prove that the following are equivalent:
a) $\{a_n\}_{n=1}^\infty$ is uniformly distributed in $[-\pi, \pi]$.
b) $\lim_{n\to\infty} n^{-1}\sum_{k=1}^{n} f(a_k) = (2\pi)^{-1}\int_{-\pi}^{\pi} f(x)\,dx$ for each $f \in C([-\pi, \pi])$.
c) $\lim_{n\to\infty} n^{-1}\sum_{k=1}^{n} e^{ima_k} = 0$ for every nonzero integer $m$.
Hint: If part (b) holds, consider continuous functions $f$ and $g$ having the property that $0 \le g \le \chi_J \le f$.
★ 11.26 Show that the sequence $\{nb - [nb]\}_{n=1}^\infty$ is uniformly distributed in $[0, 1]$ if and only if $b$ is irrational. Here $[\,\cdot\,]$ denotes the greatest integer function.

11.3 THE FOURIER TRANSFORM
Fourier series expansions express $2\pi$-periodic functions in terms of the oscillatory functions $e^{inx}$, $n \in \mathcal{Z}$, whose basic periods form the discrete set $\{2\pi, 2\pi/2, 2\pi/3, \ldots\}$. In this section we discuss an analogous expansion for certain nonperiodic functions in terms of oscillatory functions of the form $e^{itx}$, where the parameter $t$ is continuous rather than discrete and summation is replaced by integration. Specifically, we have the following definition.

DEFINITION 11.2 Fourier Transform
For $f \in \mathcal{L}^1(\mathcal{R})$, the function $\hat f: \mathcal{R} \to \mathcal{C}$ defined by
$$\hat f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{-itx}\,dx$$
is called the Fourier transform of $f$.

We should point out the following facts:
• Definition 11.2 deviates from the one given in Exercise 4.81 on page 202 by the factor $(2\pi)^{-1/2}$. In fact, slightly different definitions appear in various mathematical subfields, mostly for aesthetic reasons.
• The term Fourier transform is used in both Definition 11.1 (for periodic functions) and Definition 11.2 (for $\mathcal{L}^1(\mathcal{R})$-functions). There is little room for
confusion, however, because the Fourier transform of a function in $\mathcal{L}^1_{2\pi}$ is a function on $\mathcal{Z}$, whereas that of a function in $\mathcal{L}^1(\mathcal{R})$ is a function on $\mathcal{R}$. Moreover, the only function common to both $\mathcal{L}^1_{2\pi}$ and $\mathcal{L}^1(\mathcal{R})$ is the zero function. The advantage of using common terminology is that it suggests many important analogies between properties of the two transforms.

The following theorem, whose proof is left to the reader as Exercise 11.27, provides some basic properties of the Fourier transform. One of the properties employs the notation $f_{a,b}$ to represent the translation-dilation of the function $f$. That is, we write
$$f_{a,b}(x) = \frac{1}{\sqrt{|a|}}\,f\bigl((x - b)/a\bigr)$$
for $a, b \in \mathcal{R}$ and $a \neq 0$.

THEOREM 11.5
Let $f \in \mathcal{L}^1(\mathcal{R})$. Then the following hold:
a) $\hat f \in C_0(\mathcal{R})$.
b) $\|\hat f\|_\infty \le \|f\|_1/\sqrt{2\pi}$.
c) The function $\mathcal{F}: \mathcal{L}^1(\mathcal{R}) \to C_0(\mathcal{R})$ defined by $\mathcal{F}(f) = \hat f$ is a continuous linear mapping.
d) For $a, b \in \mathcal{R}$ and $a \neq 0$, we have $\widehat{f_{a,b}}(t) = \sqrt{|a|}\,e^{-ibt}\hat f(at)$, $t \in \mathcal{R}$.
e) If $\int_{-\infty}^{\infty}|xf(x)|\,dx < \infty$, then $\hat f\,'(t)$ exists for all $t \in \mathcal{R}$ and
$$\hat f\,'(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}(-ix)\,f(x)\,e^{-itx}\,dx.$$
f) If $f$ is absolutely continuous on every bounded interval (so that $f'$ exists ae) and $f' \in \mathcal{L}^1(\mathcal{R})$, then $\widehat{f'}(t) = it\hat f(t)$.

We observe that parts (a) and (b) of Theorem 11.5 are, respectively, the analogues of parts (a) and (b) of Theorem 11.1 on page 638. Moreover, for $a = 1$, part (d) of Theorem 11.5 is the analogue of part (f) of Proposition 11.1 on page 639. The properties of the Fourier transform described in parts (e) and (f) of Theorem 11.5 have numerous applications in many fields, including differential equations and probability theory.
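To make Definition 11.2 and Theorem 11.5(b),(d) concrete, the following minimal sketch approximates $\hat f(t)$ by a composite Simpson rule on a truncated interval $[-L, L]$ and compares $\widehat{f_{a,b}}(t)$ with $\sqrt{|a|}\,e^{-ibt}\hat f(at)$. It is a numerical illustration under our own assumptions, not part of the text: the test function $f(x) = xe^{-x^2/2}$, the cutoff $L = 20$, the grid size, and the helper names `fourier_transform` and `translate_dilate` are all hypothetical choices, and the truncation is harmless only because $f$ decays rapidly. For this $f$ we have $\|f\|_1 = 2$, so part (b) gives $|\hat f(t)| \le 2/\sqrt{2\pi}$, and the exact transform is $-it\,e^{-t^2/2}$ (a consequence of Example 11.6 and Theorem 11.5(e)).

```python
import cmath
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def fourier_transform(f, t, L=20.0, n=4000):
    """Approximate (2*pi)^(-1/2) * integral of f(x) e^{-itx} over [-L, L].

    Composite Simpson rule with n (even) subintervals; assumes f is
    negligible outside [-L, L].
    """
    h = 2 * L / n
    total = 0j
    for k in range(n + 1):
        x = -L + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * f(x) * cmath.exp(-1j * t * x)
    return total * h / 3 / SQRT_2PI


def translate_dilate(f, a, b):
    """Return the translation-dilation f_{a,b}(x) = f((x - b)/a) / sqrt(|a|)."""
    return lambda x: f((x - b) / a) / math.sqrt(abs(a))


def f(x):
    """Hypothetical test function x * exp(-x^2/2); its L^1 norm is 2."""
    return x * math.exp(-x * x / 2)


a, b = 2.0, 1.0
for t in (0.0, 0.7, 1.3):
    lhs = fourier_transform(translate_dilate(f, a, b), t)
    rhs = math.sqrt(abs(a)) * cmath.exp(-1j * b * t) * fourier_transform(f, a * t)
    print(f"t={t}: |lhs - rhs| = {abs(lhs - rhs):.2e}, "
          f"|f_hat(t)| = {abs(fourier_transform(f, t)):.4f} <= {2 / SQRT_2PI:.4f}")
```

Both sides of part (d) are computed with the same quadrature, so their agreement checks the identity rather than the accuracy of the rule itself; the comparison with the closed form $-it\,e^{-t^2/2}$ checks the latter.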
EXAMPLE 11.5 Illustrates Definition 11.2 and Theorem 11.5
For $c > 0$, we have
$$\widehat{\chi_{[-c,c]}}(t) = \frac{1}{\sqrt{2\pi}}\int_{-c}^{c} e^{-itx}\,dx = \sqrt{2/\pi}\;c\operatorname{sinc}(ct).$$
Using this fact and Theorem 11.5(d), it is easy to obtain the Fourier transform of any integrable step function. □

EXAMPLE 11.6 Illustrates Definition 11.2 and Theorem 11.5
The function $g(x) = e^{-x^2/2}$, often called the Gaussian function, arises in many areas of mathematics, including harmonic analysis, probability theory, and statistics. We will prove that the Gaussian function is its own Fourier transform, that is, $\hat g = g$. To accomplish this, we first note that $g$ satisfies the condition of Theorem 11.5(e) and, consequently,
$$\hat g\,'(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}(-ix)\,e^{-x^2/2}\,e^{-itx}\,dx.$$
Since $(-ix)e^{-x^2/2} = i\,g'(x)$, integration by parts gives $\hat g\,'(t) = -t\hat g(t)$. This differential equation has the solution $\hat g(t) = \hat g(0)e^{-t^2/2}$. As the reader is asked to verify in Exercise 11.28,
$$\sqrt{2\pi}\,\hat g(0) = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}.$$
Hence, $\hat g(0) = 1$, as required. □

Convolution Products
In the theory of Fourier series, we find frequent appearances of convolution products, that is, integrals of the form
$$(f * g)(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x - y)\,g(y)\,dy,$$
where $f, g \in \mathcal{L}^1_{2\pi}$. Some basic properties of convolution are examined in Exercises 11.8-11.12 on pages 642-643. For instance, Exercise 11.12 shows that convolution multiplication of periodic functions corresponds to ordinary multiplication of Fourier coefficients. In the theory of the Fourier transform, a similar notion of convolution product for $\mathcal{L}^1(\mathcal{R})$-functions plays an essential role. We begin with the following definition.
DEFINITION 11.3 Convolution of Functions
For $f, g \in \mathcal{L}^1(\mathcal{R})$, the function $f * g$ defined by
$$(f * g)(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x - y)\,g(y)\,dy, \qquad x \in \mathcal{R},$$
is called the convolution of $f$ and $g$.

As with the definition of the Fourier transform, minor modifications of the definition of convolution given in Definition 11.3 appear in various mathematical subfields. In particular, the factor $(2\pi)^{-1/2}$ is often omitted, as was done in Exercise 4.157(d) on page 256. From this point on, we will use Definition 11.3.

The following theorem summarizes basic properties of the convolution product. Its proof is left to the reader as Exercise 11.31.

THEOREM 11.6
Let $f, g, h \in \mathcal{L}^1(\mathcal{R})$. Then the following hold:
a) $f * g \in \mathcal{L}^1(\mathcal{R})$ and, in fact, $\|f * g\|_1 \le \|f\|_1\|g\|_1/\sqrt{2\pi}$.
b) $f * g = g * f$.
c) $(f * g) * h = f * (g * h)$.
d) $f * (g + h) = f * g + f * h$.
e) $\widehat{f * g} = \hat f\,\hat g$.

EXAMPLE 11.7 Illustrates Definition 11.3
The integrals
$$I_T(f)(x) = \frac{1}{\sqrt{2\pi}}\int_{-T}^{T}\hat f(t)\,e^{itx}\,dt, \qquad T > 0,$$
are analogous to the partial sums of a Fourier series. We will show how to express $I_T(f)$ as a convolution product. Using Fubini's theorem, we have
$$I_T(f)(x) = \frac{1}{2\pi}\int_{-T}^{T} e^{itx}\int_{-\infty}^{\infty} f(y)\,e^{-ity}\,dy\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(y)\int_{-T}^{T} e^{-it(y - x)}\,dt\,dy = (f * D_T)(x),$$
where $D_T(x) = \sqrt{2/\pi}\;T\operatorname{sinc}(Tx)$. The function $D_T$ is the continuous analogue of the Dirichlet kernel. □
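As a numerical sanity check on Example 11.7, the sketch below computes $I_T(g)(x)$ in two ways for the Gaussian $g(x) = e^{-x^2/2}$ of Example 11.6 (so that $\hat g = g$): directly from the defining integral, and as the convolution $(g * D_T)(x)$ of Definition 11.3 with $D_T(x) = \sqrt{2/\pi}\,T\operatorname{sinc}(Tx)$. The Simpson helper, the truncation of the convolution integral to $[-20, 20]$, the grid sizes, and the sample values of $T$ and $x$ are illustrative assumptions. Because $\hat g$ is even, the imaginary part of the inversion integral vanishes, so only the cosine part is integrated.

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def simpson(fun, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    s = fun(lo) + fun(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * fun(lo + k * h)
    return s * h / 3


def sinc(x):
    """Unnormalized sinc, sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def dirichlet_kernel(T, x):
    """Continuous Dirichlet kernel D_T(x) = sqrt(2/pi) * T * sinc(Tx)."""
    return math.sqrt(2 / math.pi) * T * sinc(T * x)


def g(x):
    """Gaussian function of Example 11.6; it is its own Fourier transform."""
    return math.exp(-x * x / 2)


def inversion_partial(T, x):
    """I_T(g)(x) = (2 pi)^(-1/2) * int_{-T}^{T} g_hat(t) e^{itx} dt with g_hat = g."""
    return simpson(lambda t: g(t) * math.cos(t * x), -T, T) / SQRT_2PI


def convolution_with_kernel(T, x, L=20.0):
    """(g * D_T)(x) per Definition 11.3, with the integral truncated to |y| <= L."""
    return simpson(lambda y: g(y) * dirichlet_kernel(T, x - y), -L, L, n=20000) / SQRT_2PI


for T in (0.5, 2.0, 8.0):
    print(T, inversion_partial(T, 1.0), convolution_with_kernel(T, 1.0), g(1.0))
```

The two columns agree for every $T$, and for large $T$ both approach $g(1)$, which is the behavior (11.17) anticipates for this well-behaved $g$.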
Uniqueness and Inversion
Based on an analogy with Fourier series, we might expect the formula
$$\lim_{T\to\infty} I_T(f)(x) = f(x) \tag{11.17}$$
to hold, at least under some reasonable conditions on $f$. Indeed, the following heuristic argument suggests that (11.17) is valid when $f$ is continuous at $x$. By Exercise 11.21 on page 651, we have
$$f(x) = \frac{1}{\pi}\int_{-\infty}^{\infty} f(x)\operatorname{sinc} y\,dy$$
and, by Example 11.7,
$$I_T(f)(x) = \frac{1}{\pi}\int_{-\infty}^{\infty} f(y)\operatorname{sinc}\bigl(T(x - y)\bigr)\,d(Ty) = \frac{1}{\pi}\int_{-\infty}^{\infty} f(x - y/T)\operatorname{sinc} y\,dy.$$
It follows that
$$I_T(f)(x) - f(x) = \frac{1}{\pi}\int_{-\infty}^{\infty}\bigl(f(x - y/T) - f(x)\bigr)\operatorname{sinc} y\,dy.$$
Hence,
$$\lim_{T\to\infty} I_T(f)(x) - f(x) = \frac{1}{\pi}\int_{-\infty}^{\infty}\lim_{T\to\infty}\bigl(f(x - y/T) - f(x)\bigr)\operatorname{sinc} y\,dy = 0.$$
The obstacle to making this argument rigorous is that the function sinc does not belong to $\mathcal{L}^1(\mathcal{R})$ and so the dominated convergence theorem cannot be applied. A way around this obstruction is to pass from the integrals $I_T(f)$ to their averages. For $f \in \mathcal{L}^1(\mathcal{R})$, let
$$J_T(f) = \frac{1}{T}\int_0^T I_t(f)\,dt.$$
Integrals of the form $J_T(f)$, which are analogous to averages of partial sums of Fourier series, make tractable substitutes for $I_T(f)$. Like $I_T(f)$,
the integral $J_T(f)$ is also a convolution product, as can be seen as follows. By Fubini's theorem,
$$J_T(f)(x) = \frac{1}{T}\int_0^T I_t(f)(x)\,dt = \frac{1}{\pi T}\int_0^T\int_{-\infty}^{\infty} f(y)\,t\operatorname{sinc}\bigl(t(x - y)\bigr)\,dy\,dt$$
$$= \frac{1}{\pi}\int_{-\infty}^{\infty} f(y)\,\frac{1}{T}\int_0^T t\operatorname{sinc}\bigl(t(x - y)\bigr)\,dt\,dy = \frac{1}{\pi}\int_{-\infty}^{\infty} f(y)\,\frac{1 - \cos(T(x - y))}{T(x - y)^2}\,dy = (f * G_T)(x),$$
where
$$G_T(x) = \sqrt{\frac{2}{\pi}}\;\frac{1 - \cos(Tx)}{Tx^2} = \sqrt{\frac{2}{\pi}}\;\frac{2\sin^2(Tx/2)}{Tx^2}.$$
The function $G_T$ is the continuous analogue of the Fejér kernel. Three of its essential properties are presented in Lemma 11.2. Parts (a) and (c) of the lemma are obvious; part (b) is left to the reader as Exercise 11.33.

LEMMA 11.2
The function $G_T$ defined by the previous equation satisfies the following conditions.
a) $G_T \ge 0$.
b) $(2\pi)^{-1/2}\int_{-\infty}^{\infty} G_T(x)\,dx = 1$.
c) For each $\delta > 0$, $\lim_{T\to\infty}\int_{|x|\ge\delta} G_T(x)\,dx = 0$.

Our next theorem presents results analogous to those given for Fourier series in Theorem 11.4 on page 648.

THEOREM 11.7
a) If $f \in C_0(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, then $\lim_{T\to\infty}\|f - J_T(f)\|_\infty = 0$.
b) If $f \in \mathcal{L}^1(\mathcal{R})$, then $\lim_{T\to\infty}\|f - J_T(f)\|_1 = 0$.

PROOF: To prove part (a), let $\epsilon > 0$. Since $f$, being in $C_0(\mathcal{R})$, is uniformly continuous, we can choose $\delta > 0$ so that $|f(x - y) - f(x)| < \epsilon$ for all $y \in (-\delta, \delta)$ and $x \in \mathcal{R}$.
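Then, by Lemma 11.2, we have
$$J_T(f)(x) - f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(y)\,G_T(x - y)\,dy - f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\bigl(f(x - y) - f(x)\bigr)\,G_T(y)\,dy.$$
Thus,
$$|J_T(f)(x) - f(x)| \le \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}|f(x - y) - f(x)|\,G_T(y)\,dy \le \frac{\epsilon}{\sqrt{2\pi}}\int_{-\delta}^{\delta} G_T(y)\,dy + \sqrt{\frac{2}{\pi}}\,\|f\|_\infty\int_{|y|\ge\delta} G_T(y)\,dy \le \epsilon + \sqrt{\frac{2}{\pi}}\,\|f\|_\infty\int_{|y|\ge\delta} G_T(y)\,dy.$$
Applying Lemma 11.2(c), we conclude that
$$\limsup_{T\to\infty}\,\sup_{x\in\mathcal{R}}|J_T(f)(x) - f(x)| \le \epsilon.$$
As $\epsilon > 0$ was arbitrarily chosen, we see that part (a) holds.

To establish part (b), we begin by using Fubini's theorem to conclude that
$$\int_{-\infty}^{\infty}|J_T(f)(x) - f(x)|\,dx \le \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|f(x - y) - f(x)|\,G_T(y)\,dy\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|f(x - y) - f(x)|\,G_T(y)\,dx\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\|f_y - f\|_1\,G_T(y)\,dy,$$
where $f_y(x) = f(x - y)$. The function $h(y) = \|f_y - f\|_1$ is bounded by $2\|f\|_1$ and, as we ask the reader to verify in Exercise 11.34, is continuous at 0. It follows by the argument used in part (a) that
$$\lim_{T\to\infty}\int_{-\infty}^{\infty}\|f_y - f\|_1\,G_T(y)\,dy = 0.$$
Thus, part (b) is proved.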
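Lemma 11.2 can also be observed numerically. The sketch below, an illustration under our own choices rather than part of the proof, integrates $G_T$ with a composite Simpson rule and compares the normalized mass $(2\pi)^{-1/2}\int_{-\delta}^{\delta} G_T$ with the elementary lower bound $1 - 4/(\pi T\delta)$. That bound follows from part (b), the substitution $u = Tx$ (which turns the tail into $\pi^{-1}\int_{|u|\ge T\delta}(1 - \cos u)u^{-2}\,du$), and $1 - \cos u \le 2$. The helper names, $\delta = 0.5$, the grid size, and the sample values of $T$ are assumptions. The kernel is evaluated in its equivalent form $\sqrt{2/\pi}\,(T/2)\operatorname{sinc}^2(Tx/2)$ to avoid cancellation in $1 - \cos(Tx)$ near $x = 0$.

```python
import math


def fejer_kernel(T, x):
    """G_T(x) = sqrt(2/pi) (1 - cos Tx)/(T x^2), written as sqrt(2/pi) (T/2) sinc^2(Tx/2)."""
    u = T * x / 2
    s = 1.0 if u == 0 else math.sin(u) / u
    return math.sqrt(2 / math.pi) * (T / 2) * s * s


def simpson(fun, lo, hi, n=20000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    s = fun(lo) + fun(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * fun(lo + k * h)
    return s * h / 3


def normalized_mass(T, lo, hi):
    """(2 pi)^(-1/2) times the integral of G_T over [lo, hi]."""
    return simpson(lambda x: fejer_kernel(T, x), lo, hi) / math.sqrt(2 * math.pi)


delta = 0.5
for T in (10.0, 100.0, 1000.0):
    inner = normalized_mass(T, -delta, delta)
    print(f"T={T:6.0f}: mass on [-delta, delta] = {inner:.6f}, "
          f"lower bound = {1 - 4 / (math.pi * T * delta):.6f}")
```

As $T$ grows, the mass concentrated on $[-\delta, \delta]$ approaches 1, which is exactly what parts (b) and (c) together assert.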
COROLLARY 11.1 Uniqueness Property of Fourier Transforms
If $f, g \in \mathcal{L}^1(\mathcal{R})$ and $\hat f = \hat g$, then $f = g$ ae.

PROOF: If $\hat f = 0$, then $I_T(f)$ and, hence, $J_T(f)$ vanish for every $T > 0$. Applying Theorem 11.7(b), we conclude that $f = 0$ ae. The corollary now follows from the linearity of the Fourier transform.

Corollary 11.1 implies that an $\mathcal{L}^1(\mathcal{R})$-function is determined by its Fourier transform in the sense that two functions having the same transform must be identical almost everywhere. Theorem 11.8 gives a recipe for recovering a function from its Fourier transform. Such recipes are referred to as inversion theorems.

THEOREM 11.8
Suppose that both $f$ and $\hat f$ belong to $\mathcal{L}^1(\mathcal{R})$. Then
$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat f(t)\,e^{itx}\,dt$$
for almost all $x \in \mathcal{R}$.

PROOF: Let
$$g(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat f(t)\,e^{itx}\,dt.$$
Because, by assumption, $\hat f \in \mathcal{L}^1(\mathcal{R})$, it follows that $g$ is well defined for all $x \in \mathcal{R}$. Using the dominated convergence theorem, we can write
$$g(x) = \lim_{T\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-T}^{T}\hat f(t)\,e^{itx}\,dt = \lim_{T\to\infty} I_T(f)(x).$$
From this we can conclude that $g(x) = \lim_{T\to\infty} J_T(f)(x)$, as the reader is asked to verify in Exercise 11.35. In particular, we have shown that the sequence $\{J_n(f)\}_{n=1}^\infty$ converges to $g$ pointwise on $\mathcal{R}$. Now, by Theorem 11.7(b), the sequence $\{J_n(f)\}_{n=1}^\infty$ converges to $f$ in the $\mathcal{L}^1(\mathcal{R})$-norm. Applying Exercise 4.84 on page 206 and Proposition 4.12 on page 204, we deduce that there is a subsequence of $\{J_n(f)\}_{n=1}^\infty$ that converges to $f$ almost everywhere. Consequently, $f = g$ ae.

Applying Theorem 11.8 and Theorem 11.5 on page 654, we obtain the following corollary.
COROLLARY 11.2
If both $f$ and $\hat f$ belong to $\mathcal{L}^1(\mathcal{R})$, then $\hat{\hat f}(x) = f(-x)$ for almost all $x \in \mathcal{R}$. Furthermore, $f$ is equal to a continuous function almost everywhere.

Although Theorem 11.8 is adequate for handling functions satisfying certain mild restrictions, such as the ones given in Exercise 11.36, it is by no means the last word on inversion of the Fourier transform. Indeed, Example 11.5 on page 655 shows that the Fourier transform of the characteristic function of an interval fails to be Lebesgue integrable.

EXERCISES 11.3
11.27 Prove Theorem 11.5 on page 654.
11.28 Show that $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$. Hint: Use polar coordinates to evaluate the double integral $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy$.
11.29 Calculate the Fourier transform of the function $f(x) = e^{-(x - b)^2/a}$, where $a$ and $b$ are real constants with $a > 0$.
11.30 The convolution product also appears in probability theory in a natural way. Let $X$ and $Y$ be independent random variables having probability density functions $f_X$ and $f_Y$, respectively.
a) Show that the random variable $X + Y$ has probability density function given by $f_{X+Y} = \sqrt{2\pi}\,f_X * f_Y$.
b) Explain the discrepancy between the result in part (a) and the one obtained in Exercise 5.56(c) on page 288.
11.31 Prove Theorem 11.6 on page 656.
11.32 Prove that there is no identity for the convolution product, that is, there does not exist a function $h \in \mathcal{L}^1(\mathcal{R})$ such that $f = f * h$ for all $f \in \mathcal{L}^1(\mathcal{R})$. Note, however, that Theorem 11.7 shows that $\lim_{T\to\infty} f * G_T = f$ (in the $\mathcal{L}^1(\mathcal{R})$-norm) for all $f \in \mathcal{L}^1(\mathcal{R})$.
11.33 Show that $(2\pi)^{-1/2}\int_{-\infty}^{\infty} G_T(x)\,dx = 1$. Hint: Use
$$\frac{1 - \cos(Tx)}{Tx^2} = \frac{1}{T}\int_0^T\frac{\sin(tx)}{x}\,dt$$
and Exercise 11.21 on page 651.
11.34 Let $1 \le p < \infty$ and $f \in \mathcal{L}^p(\mathcal{R})$. Show that the function $h$ defined by $h(y) = \|f_y - f\|_p$ is continuous on $\mathcal{R}$.
11.35 Let $f \in \mathcal{L}^1(\mathcal{R})$ and $x \in \mathcal{R}$. Suppose that $\lim_{T\to\infty} I_T(f)(x)$ exists and equals, say, $L$. Prove that $\lim_{T\to\infty} J_T(f)(x)$ also exists and equals $L$.
11.36 Suppose that $f''$ exists and is finite everywhere and that $f, f'' \in \mathcal{L}^1(\mathcal{R})$. Prove that $\hat f \in \mathcal{L}^1(\mathcal{R})$.
11.37 Suppose that $f \in \mathcal{L}^1(\mathcal{R}) \cap C(\mathcal{R})$ and that there is a constant $M$ such that
$$|f(x)| \vee |f'(x)| \vee |f''(x)| \le \frac{M}{1 + x^2}, \qquad x \in \mathcal{R}.$$
a) Prove the Poisson summation formula:
$$\sum_{k=-\infty}^{\infty}\hat f(k) = \sqrt{2\pi}\sum_{n=-\infty}^{\infty} f(2\pi n).$$
Hint: Consider the function $g(x) = \sum_{n=-\infty}^{\infty} f(x + 2\pi n)$.
b) Use the Poisson summation formula to verify the Jacobi theta function identity:
$$\sum_{n=-\infty}^{\infty} e^{-n^2/(2t)} = \sqrt{2\pi t}\sum_{n=-\infty}^{\infty} e^{-2\pi^2 n^2 t}, \qquad t > 0.$$
Hint: Refer to Exercise 11.29.

In Exercises 11.38-11.40, $C^\infty(\mathcal{R})$ denotes the space of complex-valued functions having derivatives of all orders at each point of $\mathcal{R}$. For nonnegative integers $n$ and $m$, define
$$\sigma_{n,m}(f) = \sup\{\, (1 + x^{2n})\,|f^{(m)}(x)| : x \in \mathcal{R} \,\}, \qquad f \in C^\infty(\mathcal{R}).$$
We will consider the linear space
$$\mathcal{S}(\mathcal{R}) = \{\, f \in C^\infty(\mathcal{R}) : \sigma_{n,m}(f) < \infty,\ n, m = 0, 1, 2, \ldots \,\}$$
with the topology induced by the family of seminorms $\{\sigma_{n,m} : n, m = 0, 1, \ldots\}$.
11.38 Prove that $f, g \in \mathcal{S}(\mathcal{R})$ imply $f * g \in \mathcal{S}(\mathcal{R})$.
11.39 Prove that $f \in \mathcal{S}(\mathcal{R})$ if and only if $\hat f \in \mathcal{S}(\mathcal{R})$.
11.40 Prove that the linear operator $\mathcal{F}: \mathcal{S}(\mathcal{R}) \to \mathcal{S}(\mathcal{R})$ defined by $\mathcal{F}(f) = \hat f$ is continuous, one-to-one, and onto.

11.4 FOURIER TRANSFORMS OF MEASURES
In this section we will extend the concept of Fourier transform from functions in $\mathcal{L}^1(\mathcal{R})$ to measures in $M(\mathcal{R})$. As an application of Fourier transforms of measures, we will obtain several interesting and important results in probability theory, including the celebrated central limit theorem.
DEFINITION 11.4 Fourier Transform of a Measure
For $\mu \in M(\mathcal{R})$, the function $\hat\mu: \mathcal{R} \to \mathcal{C}$ defined by
$$\hat\mu(t) = \frac{1}{\sqrt{2\pi}}\int_{\mathcal{R}} e^{-itx}\,d\mu(x)$$
is called the Fourier transform of $\mu$.

EXAMPLE 11.8 Illustrates Definition 11.4
a) The Radon-Nikodym theorem for complex measures (page 383) and Exercise 6.115 on page 387 imply that if $\mu \ll \lambda$, then $\hat\mu = \widehat{d\mu/d\lambda}$.
b) If $a \in \mathcal{R}$ and $\mu = \delta_a$, then $\hat\mu(t) = (2\pi)^{-1/2}e^{-iat}$. □

Our next proposition, whose proof is left for Exercise 11.41, provides some basic properties of Fourier transforms of measures.

PROPOSITION 11.4
Let $\mu \in M(\mathcal{R})$. Then the following hold:
a) $\hat\mu \in C(\mathcal{R})$.
b) $|\hat\mu(t)| \le |\mu|(\mathcal{R})/\sqrt{2\pi}$.
c) If $\mu(B) = \int_B f(x)\,d\lambda(x)$ for some $f \in \mathcal{L}^1(\mathcal{R})$, then $\hat\mu = \hat f$.

The integrals $I_T(f)$ and $J_T(f)$, defined in Section 11.3, play an important role in the theory of Fourier transforms of $\mathcal{L}^1(\mathcal{R})$-functions. They have natural analogues when $f$ is replaced by a measure: If $\mu \in M(\mathcal{R})$, we let
$$I_T(\mu)(x) = \frac{1}{\sqrt{2\pi}}\int_{-T}^{T}\hat\mu(t)\,e^{itx}\,dt, \qquad T > 0,$$
and
$$J_T(\mu)(x) = \frac{1}{T}\int_0^T I_t(\mu)(x)\,dt, \qquad T > 0.$$
Using these integrals we can show that a measure is determined by its Fourier transform. We begin with the following theorem, whose proof is left to the reader as Exercise 11.42.
THEOREM 11.9
For $\mu \in M(\mathcal{R})$, define
$$\mu_T(B) = \int_B J_T(\mu)(x)\,d\lambda(x), \qquad B \in \mathcal{B}.$$
Then the following hold:
a) $\mu_T \in M(\mathcal{R})$.
b) $|\mu_T|(\mathcal{R}) \le |\mu|(\mathcal{R})$.
c) The net $\{\mu_T\}_{T\in(0,\infty)}$ converges in the weak* topology to $\mu$.

COROLLARY 11.3 Uniqueness Property of Fourier Transforms
If $\mu, \nu \in M(\mathcal{R})$ and $\hat\mu = \hat\nu$, then $\mu = \nu$.

PROOF: If $\hat\mu = 0$, then $J_T(\mu)$ and, hence, $\mu_T$ vanish for every $T > 0$. Applying part (c) of Theorem 11.9, we conclude that $\mu = 0$. The corollary now follows from the linearity of the Fourier transform of a measure.

Corollary 11.3 implies that a measure is determined by its Fourier transform in the sense that two measures with the same transform must be identical. When $\mu$ is a probability measure, we can get still more information about its relationship with $\hat\mu$.

LEMMA 11.3
Suppose $\mu \in M_+(\mathcal{R})$ and $\mu(\mathcal{R}) = 1$. Then, for each $c > 0$, we have
$$\mu([-2c, 2c]) \ge \sqrt{2\pi}\,c\int_{-1/c}^{1/c}\hat\mu(t)\,dt - 1.$$

PROOF: Let $b > 0$. Then
$$\frac{\sqrt{2\pi}}{2b}\int_{-b}^{b}\hat\mu(t)\,dt = \int_{\mathcal{R}}\frac{1}{2b}\int_{-b}^{b} e^{-itx}\,dt\,d\mu(x) = \int_{\mathcal{R}}\operatorname{sinc}(bx)\,d\mu(x).$$
It is easy to see that $|\operatorname{sinc}(bx)| \le 1$ for all $x$ and that $|\operatorname{sinc}(bx)| \le (2bc)^{-1}$ when $|x| > 2c$. It follows that
$$\frac{\sqrt{2\pi}}{2b}\left|\int_{-b}^{b}\hat\mu(t)\,dt\right| \le \mu([-2c, 2c]) + \frac{1}{2bc}\,\mu\bigl([-2c, 2c]^c\bigr).$$
Taking $b = 1/c$ and using $\mu(\mathcal{R}) = 1$, we get
$$\frac{\sqrt{2\pi}\,c}{2}\left|\int_{-1/c}^{1/c}\hat\mu(t)\,dt\right| \le \frac{1}{2} + \frac{1}{2}\,\mu([-2c, 2c]),$$
from which, since the integral on the left is real (by the first display of the proof), the assertion of the lemma follows immediately.

Just as convolution of functions plays an important role in the theory of Fourier transforms of functions, convolution of measures figures prominently in the theory of Fourier transforms of measures.

DEFINITION 11.5 Convolution of Measures
For $\mu, \nu \in M(\mathcal{R})$, the Borel measure $\mu * \nu$ defined by
$$(\mu * \nu)(B) = \frac{1}{\sqrt{2\pi}}\int_{\mathcal{R}}\mu(B - x)\,d\nu(x), \qquad B \in \mathcal{B},$$
is called the convolution of $\mu$ and $\nu$.

As with the definition of convolution of functions, minor modifications of the definition of convolution of measures given in Definition 11.5 appear in various mathematical subfields. In particular, the factor $(2\pi)^{-1/2}$ is often omitted, as was done in Exercise 4.158(d) on page 256. From this point on, we will use Definition 11.5.

Our next proposition, whose proof is left to the reader as Exercise 11.45, shows that convolution of measures (Definition 11.5) is consistent with convolution of functions (Definition 11.3 on page 656).

PROPOSITION 11.5
Let $f, g \in \mathcal{L}^1(\mathcal{R})$. Define measures $\mu$ and $\nu$ by
$$\mu(B) = \int_B f\,d\lambda \quad\text{and}\quad \nu(B) = \int_B g\,d\lambda, \qquad B \in \mathcal{B}.$$
Then
$$(\mu * \nu)(B) = \int_B (f * g)\,d\lambda, \qquad B \in \mathcal{B}.$$
Equivalently, if $\mu$ and $\nu$ are absolutely continuous with respect to Lebesgue measure, then so is $\mu * \nu$ and, moreover, $d(\mu * \nu)/d\lambda = (d\mu/d\lambda) * (d\nu/d\lambda)$.

Proposition 11.5 shows that convolution of functions corresponds to a special case of convolution of measures. More examples of convolution of measures are contained in Example 11.9.
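For finitely supported measures, Definition 11.5 reduces to a finite double sum: if $\mu = \sum_y p_y\delta_y$ and $\nu = \sum_x q_x\delta_x$, then $\mu * \nu$ places mass $(2\pi)^{-1/2}\sum_{x+y=z} p_y q_x$ at $z$. The sketch below implements this, using a fair die as a hypothetical example measure, and checks three consequences numerically: the total mass is $\mu(\mathcal{R})\nu(\mathcal{R})/\sqrt{2\pi}$; rescaling $\mu * \mu$ by $\sqrt{2\pi}$ gives the familiar distribution of the sum of two independent dice; and the transform of Definition 11.4 turns the convolution into a product, as Theorem 11.10(e) asserts. Storing measures as Python dictionaries and the helper names `convolve` and `fourier` are our own choices.

```python
import cmath
import math
from collections import defaultdict

SQRT_2PI = math.sqrt(2 * math.pi)


def convolve(mu, nu):
    """Definition 11.5 for finitely supported measures stored as {point: mass}.

    (mu * nu)({z}) = (2 pi)^(-1/2) * sum of mu({y}) nu({x}) over y + x = z.
    """
    out = defaultdict(float)
    for y, p in mu.items():
        for x, q in nu.items():
            out[y + x] += p * q / SQRT_2PI
    return dict(out)


def fourier(mu, t):
    """Definition 11.4: mu_hat(t) = (2 pi)^(-1/2) * sum of mu({x}) e^{-itx}."""
    return sum(m * cmath.exp(-1j * t * x) for x, m in mu.items()) / SQRT_2PI


die = {k: 1 / 6 for k in range(1, 7)}            # hypothetical example measure
conv = convolve(die, die)
sum_of_two_dice = {z: SQRT_2PI * m for z, m in conv.items()}

print(sum(conv.values()) * SQRT_2PI)              # total mass times sqrt(2 pi)
print({z: round(36 * p) for z, p in sorted(sum_of_two_dice.items())})
print(abs(fourier(conv, 1.0) - fourier(die, 1.0) ** 2))
```

The factor $\sqrt{2\pi}$ in the rescaling is the same one that appears in Exercise 11.30(a) for densities; without it, the convolution would not be a probability measure.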
EXAMPLE 11.9 Illustrates Definition 11.5
a) For each $\mu \in M(\mathcal{R})$, we have
$$(\delta_0 * \mu)(B) = \frac{1}{\sqrt{2\pi}}\int_{\mathcal{R}}\delta_0(B - x)\,d\mu(x) = \frac{1}{\sqrt{2\pi}}\,\mu(B), \qquad B \in \mathcal{B}.$$
In other words, $\delta_0 * \mu = (2\pi)^{-1/2}\mu$ for all $\mu \in M(\mathcal{R})$.
b) Let $X$ and $Y$ be independent random variables. Then Exercise 5.56 on page 288 implies that $\mu_{X+Y} = \sqrt{2\pi}\,\mu_X * \mu_Y$. □

The following analogue of Theorem 11.6 on page 656 gives some basic properties of convolution of measures. Its proof is left for Exercise 11.46.

THEOREM 11.10
Let $\mu, \nu, \gamma \in M(\mathcal{R})$. Then the following hold:
a) $\mu * \nu \in M(\mathcal{R})$.
b) $\mu * \nu = \nu * \mu$.
c) $(\mu * \nu) * \gamma = \mu * (\nu * \gamma)$.
d) $\mu * (\nu + \gamma) = \mu * \nu + \mu * \gamma$.
e) $\widehat{\mu * \nu} = \hat\mu\,\hat\nu$.

Fourier Transforms in Probability Theory
Let $X$ be a random variable. Then the function $\psi_X(t) = \mathcal{E}(e^{itX})$ is called the characteristic function of the random variable $X$, not to be confused with the characteristic function of a set. It is easy to see that
$$\psi_X(t) = \int_{\mathcal{R}} e^{itx}\,d\mu_X(x) = \sqrt{2\pi}\,\hat\mu_X(-t).$$
Instead of using the characteristic function $\psi_X$, as is usually done in probability theory, we will work with the essentially equivalent Fourier transform $\hat\mu_X$.

Recall that a sequence $\{X_n\}_{n=1}^\infty$ of random variables is said to converge in distribution to the random variable $X$, written $X_n \xrightarrow{d} X$, if the sequence $\{\mu_{X_n}\}_{n=1}^\infty$ of measures converges to $\mu_X$ in the weak* topology. Exercise 10.59 on page 618 shows that $X_n \xrightarrow{d} X$ if and only if
$$\lim_{n\to\infty}\int_{\mathcal{R}} f\,d\mu_{X_n} = \int_{\mathcal{R}} f\,d\mu_X, \qquad f \in C_b(\mathcal{R}).$$
Suppose that $X_n \xrightarrow{d} X$. Then, applying the previous equation to (the real and imaginary parts of) $f(x) = (2\pi)^{-1/2}e^{-itx}$, it follows immediately that
$$\lim_{n\to\infty}\hat\mu_{X_n}(t) = \hat\mu_X(t), \qquad t \in \mathcal{R}. \tag{11.18}$$
In other words, convergence in distribution of a sequence of random variables implies pointwise convergence of the Fourier transforms of the corresponding distributions. The following important theorem, due to Paul Lévy, provides a partial converse to this result.

THEOREM 11.11 Lévy's Theorem
Let $\{X_n\}_{n=1}^\infty$ be a sequence of random variables such that the sequence $\{\hat\mu_{X_n}\}_{n=1}^\infty$ of Fourier transforms converges pointwise to a function $h$ that is continuous at $t = 0$. Then there is a random variable $X$ such that $X_n \xrightarrow{d} X$ and $h = \hat\mu_X$.

PROOF: Let $\{X_{n_k}\}_{k=1}^\infty$ be any subsequence of $\{X_n\}_{n=1}^\infty$ and $\mu_k = \mu_{X_{n_k}}$. We will first show that $\{\mu_k\}_{k=1}^\infty$ has a subsequence $\{\mu_{k_j}\}_{j=1}^\infty$ that converges in the weak* topology of $M(\mathcal{R})$ to a probability measure $\mu$.

Because the $\mu_k$s are probability measures, it follows from Theorem 10.13 on page 616 that there is a subsequence $\{\mu_{k_j}\}_{j=1}^\infty$ that converges in the weak* topology of $M(\mathcal{R})$ to a regular Borel measure $\mu$ and, by Exercise 10.51 on page 617, $\mu \in M_+(\mathcal{R})$. We will show that $\mu$ is a probability measure.

Let $\epsilon > 0$. Because $h$ is continuous at 0, there is a $\delta > 0$ such that $|h(t) - h(0)| < \epsilon$ for $|t| < \delta$. Let $c$ be a positive real number such that $c^{-1} < \delta$. Select a continuous function $g$ satisfying $0 \le g \le 1$, $g(x) = 1$ for $|x| \le 2c$, and $g(x) = 0$ for $|x| \ge 2c + 1$. Applying Lemma 11.3, we have
$$\int_{\mathcal{R}} g(x)\,d\mu_{k_j}(x) \ge \mu_{k_j}([-2c, 2c]) \ge \sqrt{2\pi}\,c\int_{-1/c}^{1/c}\hat\mu_{k_j}(t)\,dt - 1. \tag{11.19}$$
Letting $j \to \infty$ and using weak* convergence on the left-hand side of (11.19) and dominated convergence on the right-hand side of (11.19), we obtain
$$\int_{\mathcal{R}} g(x)\,d\mu(x) \ge \sqrt{2\pi}\,c\int_{-1/c}^{1/c} h(t)\,dt - 1.$$
Using $[-1/c, 1/c] \subset (-\delta, \delta)$ and $h(0) = (2\pi)^{-1/2}$, we conclude that
$$\mu(\mathcal{R}) \ge \int_{\mathcal{R}} g(x)\,d\mu(x) \ge 1 - 2\sqrt{2\pi}\,\epsilon.$$
Because $\epsilon$ is an arbitrary positive number, it follows that $\mu(\mathcal{R}) \ge 1$. On the other hand, if $f \in C_0(\mathcal{R})$ with $|f| \le 1$, then
$$\left|\int_{\mathcal{R}} f\,d\mu\right| = \lim_{j\to\infty}\left|\int_{\mathcal{R}} f\,d\mu_{k_j}\right| \le 1.$$
Applying the Riesz representation theorem (page 575), we deduce that $\mu(\mathcal{R}) \le 1$. Thus, we have shown that $\mu(\mathcal{R}) = 1$.

Next we apply Exercise 10.59 on page 618 to assert that for each $f \in C_b(\mathcal{R})$, $\int_{\mathcal{R}} f\,d\mu_{k_j} \to \int_{\mathcal{R}} f\,d\mu$ as $j \to \infty$. Letting $f(x) = (2\pi)^{-1/2}e^{-itx}$, we obtain that $h(t) = \hat\mu(t)$.

Now suppose $\{X_{m_k}\}_{k=1}^\infty$ is another subsequence of $\{X_n\}_{n=1}^\infty$. By the preceding argument, there is a subsequence of $\{\mu_{X_{m_k}}\}_{k=1}^\infty$ that converges in the weak* topology to a probability measure $\nu \in M(\mathcal{R})$ with $\hat\nu(t) = h(t)$. Invoking the uniqueness property of Fourier transforms of measures (Corollary 11.3), we conclude that $\nu = \mu$. Thus, we have shown that every subsequence of $\{\mu_{X_n}\}_{n=1}^\infty$ has a subsequence converging weak* to the probability measure $\mu$. In a metric space, a sequence converges to a limit $L$ if every subsequence has a subsequence converging to $L$. Because the set of probability measures in $M(\mathcal{R})$ is weak* metrizable, it follows that $\{\mu_{X_n}\}_{n=1}^\infty$ converges in the weak* topology to the probability measure $\mu$.

To complete the proof, let $X$ be the identity function on $\mathcal{R}$. Then, as a random variable on the probability space $(\mathcal{R}, \mathcal{B}, \mu)$, we have that $\mu = \mu_X$ and, because $\text{w*-}\lim\mu_{X_n} = \mu$, we conclude that $X_n \xrightarrow{d} X$. Moreover, $h = \hat\mu = \hat\mu_X$.

In view of Proposition 11.4(a) on page 663, the uniqueness property of Fourier transforms, and Lévy's theorem, we obtain the following corollary.

COROLLARY 11.4
Let $X, X_1, X_2, \ldots$ be random variables. Then $X_n \xrightarrow{d} X$ if and only if $\hat\mu_{X_n} \to \hat\mu_X$ pointwise as $n \to \infty$.

The Central Limit Theorem
The strong law of large numbers for sequences of independent and identically distributed (iid) random variables, Theorem 5.9 on page 308, is one
of the two most important theorems in probability theory. The other is the central limit theorem. This remarkable and useful result states that the partial sums of any sequence of iid random variables are asymptotically normally distributed, provided only that the random variables have finite variance.

We will use Lévy's theorem to prove the central limit theorem. But first we require a lemma, the verification of which is left to the reader as Exercise 11.50.

LEMMA 11.4
Suppose that $\mu \in M_+(\mathcal{R})$ is such that $\mu(\mathcal{R}) = 1$ and $\int_{\mathcal{R}} x^2\,d\mu(x) < \infty$. Set $m_1 = \int_{\mathcal{R}} x\,d\mu(x)$ and $m_2 = \int_{\mathcal{R}} x^2\,d\mu(x)$. Then
$$\sqrt{2\pi}\,\hat\mu(t) = 1 - im_1t - m_2t^2/2 + \alpha(t),$$
where $\lim_{t\to 0}\alpha(t)/t^2 = 0$.

THEOREM 11.12 Central Limit Theorem
Suppose $X_1, X_2, \ldots$ are mutually independent and identically distributed random variables with mean $m$ and finite, positive variance $\sigma^2$. Let $S_n = \sum_{k=1}^{n} X_k$. Then we have
$$\lim_{n\to\infty} P\left(a < \frac{S_n - nm}{\sqrt{n}\,\sigma} \le b\right) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx$$
uniformly for all $-\infty < a < b < \infty$.

PROOF: The reader is asked in Exercise 11.51 to show that we can, without loss of generality, assume that $m = 0$ and $\sigma^2 = 1$. It follows from Example 11.9(b) on page 666 that
$$\mu_{S_n} = (2\pi)^{(n-1)/2}\,\underbrace{\mu * \mu * \cdots * \mu}_{n\ \text{times}},$$
where $\mu$ denotes the common distribution of the $X_n$s. Let $Z_n = S_n/\sqrt{n}$. Then, by Theorem 11.10(e) and Exercise 11.43,
$$\hat\mu_{Z_n}(t) = (2\pi)^{(n-1)/2}\bigl(\hat\mu(t/\sqrt{n})\bigr)^n = \frac{1}{\sqrt{2\pi}}\Bigl(\sqrt{2\pi}\,\hat\mu(t/\sqrt{n})\Bigr)^n.$$
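Using Lemma 11.4, we get
$$\hat\mu_{Z_n}(t) = \frac{1}{\sqrt{2\pi}}\left[1 - \frac{t^2}{2n} + \alpha\Bigl(\frac{t}{\sqrt{n}}\Bigr)\right]^n.$$
Consequently,
$$\lim_{n\to\infty}\hat\mu_{Z_n}(t) = \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2} = \frac{1}{\sqrt{2\pi}}\,g(t),$$
where $g$ is the Gaussian function discussed in Example 11.6 on page 655. By Lévy's theorem, the sequence $\{Z_n\}_{n=1}^\infty$ converges in distribution to a random variable having the distribution $\nu(B) = (2\pi)^{-1/2}\int_B e^{-x^2/2}\,dx$. Applying Exercise 10.59 on page 618, we conclude that
$$\lim_{n\to\infty}\int_{\mathcal{R}} f\,d\mu_{Z_n} = \int_{\mathcal{R}} f\,d\nu, \qquad f \in C_b(\mathcal{R}).$$
Let $0 < \epsilon < (b - a)/2$. Choose a continuous function $f_1$ such that $0 \le f_1 \le 1$, $f_1(x) = 0$ for $x \notin (a, b)$, and $f_1(x) = 1$ for $x \in [a + \epsilon, b - \epsilon]$. And choose a continuous function $f_2$ such that $0 \le f_2 \le 1$, $f_2(x) = 1$ for $x \in [a, b]$, and $f_2(x) = 0$ for $x \notin (a - \epsilon, b + \epsilon)$. Then
$$\int_{\mathcal{R}} f_1(x)\,d\mu_{Z_n}(x) \le \mu_{Z_n}((a, b]) \le \int_{\mathcal{R}} f_2(x)\,d\mu_{Z_n}(x).$$
Thus,
$$\liminf_{n\to\infty}\mu_{Z_n}((a, b]) \ge \int_{\mathcal{R}} f_1(x)\,d\nu(x) \ge \frac{1}{\sqrt{2\pi}}\int_{a+\epsilon}^{b-\epsilon} e^{-x^2/2}\,dx$$
and
$$\limsup_{n\to\infty}\mu_{Z_n}((a, b]) \le \int_{\mathcal{R}} f_2(x)\,d\nu(x) \le \frac{1}{\sqrt{2\pi}}\int_{a-\epsilon}^{b+\epsilon} e^{-x^2/2}\,dx.$$
Because $\epsilon$ can be made arbitrarily small, it follows that
$$\frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx = \lim_{n\to\infty}\mu_{Z_n}((a, b]) = \lim_{n\to\infty} P\left(a < \frac{S_n}{\sqrt{n}} \le b\right),$$
as required. The uniformity in $a$ and $b$ follows from Exercise 11.52.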
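Both the key step of the proof, the convergence $\sqrt{2\pi}\,\hat\mu_{Z_n}(t) \to e^{-t^2/2}$, and its conclusion can be checked exactly in a simple case. The sketch below assumes, purely for illustration, that the $X_k$ take the values $\pm 1$ with probability $1/2$ each, so that $m = 0$, $\sigma^2 = 1$, and $\sqrt{2\pi}\,\hat\mu(t) = \cos t$. Then $\sqrt{2\pi}\,\hat\mu_{Z_n}(t) = \cos^n(t/\sqrt{n})$, and $P(a < S_n/\sqrt{n} \le b)$ is a finite binomial sum, which we compare with the normal integral written via the error function. The interval $(a, b] = (-1, 1]$, the values of $n$, and the helper names are assumptions.

```python
import math


def scaled_transform(t, n):
    """sqrt(2 pi) * mu_hat_{Z_n}(t) = cos(t / sqrt(n))**n for +-1 coin flips."""
    return math.cos(t / math.sqrt(n)) ** n


def exact_prob(n, a, b):
    """P(a < S_n / sqrt(n) <= b) with S_n = 2*B - n and B ~ Binomial(n, 1/2)."""
    root = math.sqrt(n)
    favorable = sum(math.comb(n, k) for k in range(n + 1)
                    if a < (2 * k - n) / root <= b)
    return favorable / 2 ** n  # exact integer ratio, rounded once to a float


def normal_prob(a, b):
    """(2 pi)^(-1/2) * integral of e^{-x^2/2} over [a, b], via math.erf."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))


for n in (100, 400, 1600):
    print(n, scaled_transform(1.0, n), math.exp(-0.5),
          exact_prob(n, -1.0, 1.0), normal_prob(-1.0, 1.0))
```

The transform converges quickly (the error is roughly $e^{-1/2}/(12n)$ at $t = 1$), while the probabilities approach the normal value more slowly because $S_n/\sqrt{n}$ lives on a lattice of spacing $2/\sqrt{n}$.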
EXAMPLE 11.10 Illustrates the Central Limit Theorem
As a consequence of the strong law of large numbers for iid random variables, we proved, in Corollary 5.5, Borel's strong law of large numbers: Suppose that $E$ is an event associated with some random experiment and let $p$ be its probability. Denote by $n(E)$ the number of times that event $E$ occurs in $n$ independent repetitions of the experiment. Then, with probability one, $\lim_{n\to\infty} n(E)/n = p$.

Similarly, we can obtain, as a special case of the central limit theorem, the following result, known as the DeMoivre-Laplace theorem: If $0 < p < 1$, then
$$\lim_{n\to\infty} P\left(a < \frac{n(E) - np}{\sqrt{np(1 - p)}} \le b\right) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx$$
uniformly for all $-\infty < a < b < \infty$. To prove this, define, for each $n \in \mathcal{N}$, $X_n = 1$ or $0$ according to whether event $E$ occurs or does not occur on the $n$th repetition of the experiment. Then $X_1, X_2, \ldots$ are iid and have common mean $p$ and variance $p(1 - p)$. Noting that $n(E) = X_1 + \cdots + X_n$, we obtain the DeMoivre-Laplace theorem from the central limit theorem. □

EXERCISES 11.4
11.41 Prove Proposition 11.4 on page 663.
11.42 Prove Theorem 11.9 on page 664.
11.43 Establish the following facts.
a) Let $\mu \in M(\mathcal{R})$ and $\nu(B) = \mu\bigl(a^{-1}(B - b)\bigr)$, where $a$ and $b$ are constants with $a \neq 0$. Show that $\hat\nu(t) = e^{-ibt}\hat\mu(at)$.
b) Let $X$ be a random variable and set $Y = aX + b$, where $a$ and $b$ are constants with $a \neq 0$. Show that $\hat\mu_Y(t) = e^{-ibt}\hat\mu_X(at)$.
11.44 Let $\mu \in M(\mathcal{R})$. Show that if $\int_{\mathcal{R}}|x|^k\,d|\mu|(x) < \infty$, where $k$ is a positive integer, then the $k$th derivative $\hat\mu^{(k)}$ exists and
$$\hat\mu^{(k)}(t) = \frac{1}{\sqrt{2\pi}}\int_{\mathcal{R}}(-ix)^k e^{-itx}\,d\mu(x), \qquad t \in \mathcal{R}.$$
11.45 Prove Proposition 11.5 on page 665.
11.46 Prove Theorem 11.10 on page 666.
11.47 Let $\mu \in M(\mathcal{R})$. Show that $\hat\mu$ has period $2\pi$ if and only if $|\mu|(\mathcal{Z}^c) = 0$.

In the next two exercises, we borrow some terminology from communications engineering. A measure $\mu \in M(\mathcal{R})$ is said to be time limited if it vanishes on
Borel subsets of $[-a, a]^c$ for some $a > 0$, and it is said to be band limited if $\hat\mu$ vanishes outside of $[-b, b]$ for some $b > 0$.
11.48 Show that if $\mu$ is band limited, then there is an $f \in \mathcal{L}^1(\mathcal{R})$ such that $\mu(B) = \int_B f\,d\lambda$ for all $B \in \mathcal{B}$. In other words, prove that every band-limited measure is absolutely continuous with respect to Lebesgue measure.
11.49 Show that a measure that is both band limited and time limited must vanish identically. Hint: Use the fact that if a function analytic on $\mathcal{C}$ vanishes on a nonempty open interval, then it vanishes identically.
11.50 Prove Lemma 11.4 on page 669.
11.51 Show that it suffices to prove Theorem 11.12 in the case of zero mean and unit variance.
11.52 Let $\{X_n\}_{n=1}^\infty$ be a sequence of random variables.
a) Show that if $X_n \xrightarrow{d} X$, where $X$ is a continuous random variable, then $F_{X_n} \to F_X$ uniformly on $\mathcal{R}$. Hint: Show that for each $\epsilon > 0$, there is a $T > 0$ such that $\mu_{X_n}([-T, T]^c) < \epsilon$ for all $n$. Use the uniform continuity of $F_X$.
b) Use part (a) to deduce the uniformity in the central limit theorem.

11.5 $\mathcal{L}^2$-THEORY OF THE FOURIER TRANSFORM
Because $\mathcal{L}^2_{2\pi} \subset \mathcal{L}^1_{2\pi}$, Fourier coefficients are defined for functions in $\mathcal{L}^2_{2\pi}$. Indeed, the theory of Fourier series for $\mathcal{L}^2_{2\pi}$-functions is particularly well understood. In the sense of convergence in the norm of $\mathcal{L}^2_{2\pi}$, we have, for $f \in \mathcal{L}^2_{2\pi}$, that
$$f(x) = \sum_{n=-\infty}^{\infty}\hat f(n)\,e^{inx} \tag{11.20}$$
and, furthermore,
$$\|f\|_2^2 = 2\pi\sum_{n=-\infty}^{\infty}|\hat f(n)|^2. \tag{11.21}$$
Given the strong analogy between Fourier coefficients and the Fourier transform, we would expect similar results to hold for functions in $\mathcal{L}^2(\mathcal{R})$, provided, of course, that the sums in (11.20) and (11.21) are replaced by suitable integrals. However, there is an immediate problem: Because $\mathcal{L}^2(\mathcal{R}) \not\subset \mathcal{L}^1(\mathcal{R})$, the Fourier transform is not defined for all functions in $\mathcal{L}^2(\mathcal{R})$. To proceed, we must therefore first provide an appropriate definition of the Fourier transform of such functions.
In this section, we will see that the “correct” definition leads naturally to extensions of (11.20) and (11.21).
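Identity (11.21) is easy to test on a concrete function. The sketch below takes $f(x) = x$ on $[-\pi, \pi]$ (a hypothetical choice), for which integration by parts gives $\hat f(n) = i(-1)^n/n$ for $n \neq 0$ and $\hat f(0) = 0$, while $\|f\|_2^2 = \int_{-\pi}^{\pi}x^2\,dx = 2\pi^3/3$. It confirms the coefficient formula by Simpson quadrature and shows that the truncated sums in (11.21) approach $\|f\|_2^2$, the remainder $2\pi\sum_{|n|>N}|\hat f(n)|^2$ being less than $4\pi/N$. By (11.20) and orthogonality, that remainder is also $\|f - S_N(f)\|_2^2$. The helper names and grid size are assumptions.

```python
import cmath
import math


def fourier_coefficient(f, n, m=2000):
    """Simpson approximation of (1/(2 pi)) * int_{-pi}^{pi} f(x) e^{-inx} dx."""
    h = 2 * math.pi / m
    total = 0j
    for k in range(m + 1):
        x = -math.pi + k * h
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)
        total += weight * f(x) * cmath.exp(-1j * n * x)
    return total * h / 3 / (2 * math.pi)


def identity(x):
    """Hypothetical test function f(x) = x on [-pi, pi]."""
    return x


norm_sq = 2 * math.pi ** 3 / 3  # ||f||_2^2 = integral of x^2 over [-pi, pi]


def parseval_partial(N):
    """2 pi * sum over 0 < |n| <= N of |f_hat(n)|^2, using |f_hat(n)|^2 = 1/n^2."""
    return 2 * math.pi * sum(2 / (n * n) for n in range(1, N + 1))


print(fourier_coefficient(identity, 3), 1j * (-1) ** 3 / 3)
for N in (10, 100, 1000):
    print(N, norm_sq - parseval_partial(N), 4 * math.pi / N)
```

The slow, order-$1/N$ decay of the remainder reflects the jump of the periodic extension of $f$ at $\pm\pi$.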
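We begin by studying the integral
$$\int_{-\infty}^{\infty}|\hat f(t)|^2\,dt, \qquad f \in \mathcal{L}^1(\mathcal{R}). \tag{11.22}$$
In particular, we would like to know when this integral is finite. By the monotone convergence theorem, the finiteness of (11.22) is equivalent to that of $\lim_{T\to\infty}\int_{-T}^{T}|\hat f(t)|^2\,dt$. Referring to Example 11.5 on page 655 and applying Fubini's theorem, we get that
$$\int_{-T}^{T}|\hat f(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x)\,\overline{f(y)}\int_{-T}^{T} e^{-it(x - y)}\,dt\,dx\,dy = \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x)\,\overline{f(y)}\;T\operatorname{sinc}\bigl(T(x - y)\bigr)\,dx\,dy.$$
The presence of the term $\operatorname{sinc}(T(x - y))$ makes these integrals hard to handle. Consequently, we will employ the averaging technique used in previous sections. We first note that
$$\frac{1}{T}\int_0^T\int_{-t}^{t}|\hat f(s)|^2\,ds\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x)\,\overline{f(y)}\,G_T(x - y)\,dx\,dy, \tag{11.23}$$
where
$$G_T(x) = \sqrt{\frac{2}{\pi}}\;\frac{1 - \cos(Tx)}{Tx^2} = \sqrt{\frac{2}{\pi}}\;\frac{2\sin^2(Tx/2)}{Tx^2}.$$
Because, by Exercise 11.53,
$$\lim_{T\to\infty}\int_{-T}^{T}|\hat f(t)|^2\,dt = \lim_{T\to\infty}\frac{1}{T}\int_0^T\int_{-t}^{t}|\hat f(s)|^2\,ds\,dt,$$
it follows that finiteness in (11.22) is equivalent to that of the right-hand side of the previous equation. Our strategy, therefore, is to examine finiteness in (11.22) by working with the right-hand side of (11.23).

LEMMA 11.5
If $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, then $\hat f \in \mathcal{L}^2(\mathcal{R})$ and $\|\hat f\|_2 = \|f\|_2$.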
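PROOF: By Lemma 11.2 on page 658,
$$\|f\|_2^2 = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\|f\|_2^2\,G_T(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(y)\overline{f(y)}\,G_T(x)\,dx\,dy.$$
On the other hand, from (11.23) we have
$$\frac{1}{T}\int_0^T\int_{-t}^{t}|\hat f(s)|^2\,ds\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x+y)\overline{f(y)}\,G_T(x)\,dx\,dy.$$
Using Fubini's theorem, we get
$$\frac{1}{T}\int_0^T\int_{-t}^{t}|\hat f(s)|^2\,ds\,dt - \|f\|_2^2 = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\bigl(f(x+y)-f(y)\bigr)\overline{f(y)}\,G_T(x)\,dy\,dx,$$
and applying Cauchy's inequality gives
$$\left|\frac{1}{T}\int_0^T\int_{-t}^{t}|\hat f(s)|^2\,ds\,dt - \|f\|_2^2\right| \le \frac{1}{\sqrt{2\pi}}\,\|f\|_2\int_{-\infty}^{\infty}\|f(\cdot+x)-f\|_2\,G_T(x)\,dx.$$
We now proceed as in the proof of Theorem 11.7(b) on page 658 to show that the right-hand side of the preceding inequality tends to 0 as $T \to \infty$. Thus, we have
$$\int_{-\infty}^{\infty}|\hat f(t)|^2\,dt = \lim_{T\to\infty}\frac{1}{T}\int_0^T\int_{-t}^{t}|\hat f(s)|^2\,ds\,dt = \|f\|_2^2,$$
as required.

THEOREM 11.13 Plancherel's Theorem
There is a unique linear operator $\mathcal{F}\colon \mathcal{L}^2(\mathcal{R}) \to \mathcal{L}^2(\mathcal{R})$ with the following properties:
a) For each $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, we have $\mathcal{F}(f) = \hat f$ ae.
b) For each $f \in \mathcal{L}^2(\mathcal{R})$, we have $\lim_{M\to\infty}\|\mathcal{F}(f) - (\tau_M f)^\wedge\|_2 = 0$, where $\tau_M f = f\chi_{[-M,M]}$.
c) $\|\mathcal{F}(f)\|_2 = \|f\|_2$ for each $f \in \mathcal{L}^2(\mathcal{R})$.
d) $\langle\mathcal{F}(f), \mathcal{F}(g)\rangle = \langle f, g\rangle$ for $f, g \in \mathcal{L}^2(\mathcal{R})$.
e) $\mathcal{F}(\mathcal{F}(f))(x) = f(-x)$ ae for each $f \in \mathcal{L}^2(\mathcal{R})$.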
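PROOF: Let $f \in \mathcal{L}^2(\mathcal{R})$ and let $\{M_n\}_{n=1}^{\infty}$ be a sequence of positive numbers tending to $\infty$. Then we have $\{\tau_{M_n}f\}_{n=1}^{\infty} \subset \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$ and $\lim_{n\to\infty}\|\tau_{M_n}f - f\|_2 = 0$. Let $f_n = \tau_{M_n}f$. From Lemma 11.5, it follows that $\|\hat f_n - \hat f_m\|_2 = \|f_n - f_m\|_2$. Consequently, the sequence $\{\hat f_n\}_{n=1}^{\infty}$ is Cauchy. Using the completeness of $\mathcal{L}^2(\mathcal{R})$, we now define
$$\mathcal{F}(f) = \lim_{n\to\infty}\hat f_n = \lim_{n\to\infty}(\tau_{M_n}f)^\wedge. \tag{11.24}$$
As the reader is asked to show in Exercise 11.54, the limit in (11.24) is independent of the particular sequence of $M_n$s.

If $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, then the dominated convergence theorem implies that the sequence $\{\hat f_n\}_{n=1}^{\infty}$ converges pointwise to $\hat f$. Thus, part (a) holds.

That $\mathcal{F}$ is a linear operator can be seen as follows: Let $f, g \in \mathcal{L}^2(\mathcal{R})$ and $\alpha, \beta \in \mathbb{C}$, and, for each $n \in \mathbb{N}$, set $h_n = \tau_{M_n}(\alpha f + \beta g) = \alpha\tau_{M_n}f + \beta\tau_{M_n}g$. Using the linearity of the Fourier transform on $\mathcal{L}^1(\mathcal{R})$, we have $\hat h_n = \alpha(\tau_{M_n}f)^\wedge + \beta(\tau_{M_n}g)^\wedge$. Passing to the limit on $n$, we get $\mathcal{F}(\alpha f + \beta g) = \alpha\mathcal{F}(f) + \beta\mathcal{F}(g)$.

Part (b) follows from the definition of $\mathcal{F}(f)$ and Exercise 11.54. We obtain part (c) from Lemma 11.5 via
$$\|\mathcal{F}(f)\|_2 = \lim_{n\to\infty}\|(\tau_{M_n}f)^\wedge\|_2 = \lim_{n\to\infty}\|\tau_{M_n}f\|_2 = \|f\|_2.$$
The uniqueness of a linear operator satisfying parts (a) and (c) is a consequence of the fact that $\mathcal{L}^1(\mathcal{R}) \cap \mathcal{L}^2(\mathcal{R})$ is a dense subset of $\mathcal{L}^2(\mathcal{R})$. Part (d) is left to the reader as Exercise 11.55.

It remains to prove part (e). Since $\mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$ is dense in $\mathcal{L}^2(\mathcal{R})$, it suffices to prove that $\mathcal{F}(\mathcal{F}(f)) = R(f)$ for all $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$, where $R(f)(x) = f(-x)$. So assume $f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$. Then, as the reader is asked to show in Exercise 11.55(b),
$$f * G_T \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R}) \quad\text{and}\quad (f * G_T)^\wedge \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R}) \tag{11.25}$$
and
$$\lim_{T\to\infty}\|f * G_T - f\|_2 = 0. \tag{11.26}$$
Applying parts (a) and (c), (11.25), (11.26), and Theorem 11.8 on page 660, we obtain
$$0 = \lim_{T\to\infty}\|f * G_T - f\|_2 = \lim_{T\to\infty}\|\mathcal{F}(\mathcal{F}(f * G_T)) - \mathcal{F}(\mathcal{F}(f))\|_2 = \lim_{T\to\infty}\|R(f * G_T) - \mathcal{F}(\mathcal{F}(f))\|_2 = \|R(f) - \mathcal{F}(\mathcal{F}(f))\|_2.$$
The proof of Plancherel's theorem is now complete.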
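The Python sketch below shows part (c) of Plancherel's theorem at work. It is an illustration, not part of the proof. It takes $f = \chi_{[0,1]} \in \mathcal{L}^1(\mathcal{R}) \cap \mathcal{L}^2(\mathcal{R})$, for which part (a) says that $\mathcal{F}(f) = \hat f$. We assume the normalization $\hat f(t) = (2\pi)^{-1/2}\int f(x)e^{-itx}\,dx$, which is consistent with Example 11.11. Under it, direct integration gives $\hat f(t) = (2\pi)^{-1/2}(1-e^{-it})/(it)$ for $t \ne 0$. The code approximates $\int_{-T}^{T}|\hat f(t)|^2\,dt$ by Simpson's rule. The cutoff $T$, the step count, and the function names are illustrative assumptions.

```python
import cmath
import math

def fhat_indicator_01(t):
    """Fourier transform (2π)^(-1/2) ∫_0^1 e^(-itx) dx of χ_[0,1]."""
    if t == 0:
        return complex(1 / math.sqrt(2 * math.pi))
    return (1 - cmath.exp(-1j * t)) / (1j * t * math.sqrt(2 * math.pi))

def truncated_energy(fhat, T=1000.0, steps=100_000):
    """Composite Simpson approximation of ∫_{-T}^{T} |fhat(t)|^2 dt (steps must be even)."""
    h = 2 * T / steps
    total = 0.0
    for k in range(steps + 1):
        t = -T + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 == 1 else 2)
        total += weight * abs(fhat(t)) ** 2
    return total * h / 3

energy = truncated_energy(fhat_indicator_01)
print(f"integral of |f^|^2 over [-T, T] ≈ {energy:.5f}   (||χ_[0,1]||_2^2 = 1)")
```

Since $|\hat f(t)|^2 = (1-\cos t)/(\pi t^2)$, the neglected tails contribute roughly $2/(\pi T) \approx 6\times 10^{-4}$ for $T = 1000$. The computed value therefore sits just below $\|f\|_2^2 = 1$.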
The operator $\mathcal{F}$ given in Plancherel's theorem extends the definition of the Fourier transform to the space $\mathcal{L}^2(\mathcal{R})$. From now on we will call $\mathcal{F}(f)$ the Fourier transform of an $\mathcal{L}^2(\mathcal{R})$-function $f$ and write
$$\mathcal{F}(f) = \hat f, \qquad f \in \mathcal{L}^2(\mathcal{R}).$$
Although, strictly speaking, this is an abuse of notation, we observe from part (a) of Plancherel's theorem that this notation is consistent with previous usage.

EXAMPLE 11.11 Illustrates Plancherel's Theorem
The function sinc is not in $\mathcal{L}^1(\mathcal{R})$, but it is in $\mathcal{L}^2(\mathcal{R})$. By Example 11.5 on page 655 we have $\operatorname{sinc} t = \sqrt{\pi/2}\,\hat\chi_{[-1,1]}(t)$. Applying part (e) of Plancherel's theorem, we deduce that
$$\widehat{\operatorname{sinc}}(x) = \sqrt{\pi/2}\,\chi_{[-1,1]}(-x) = \sqrt{\pi/2}\,\chi_{[-1,1]}(x)$$
for almost all $x$. In particular, $\widehat{\operatorname{sinc}}$ is not continuous. □

The Fourier transform on $\mathcal{L}^2(\mathcal{R})$ retains some of the properties that it has on $\mathcal{L}^1(\mathcal{R})$, but others are lost. Specifically, our next theorem shows that part (d) of Theorem 11.5 remains valid for $\mathcal{L}^2(\mathcal{R})$-functions, as do parts (e) and (f), provided we modify the notion of derivative. On the other hand, the Fourier transform of an $\mathcal{L}^2(\mathcal{R})$-function need not be continuous, as Example 11.11 shows.

Let $f \in \mathcal{L}^2(\mathcal{R})$. Then we say that $\phi \in \mathcal{L}^2(\mathcal{R})$ is the derivative of $f$ in the $\mathcal{L}^2$-sense, and write $\phi = f'$, if $\|(f(\cdot+h) - f)/h - \phi\|_2 \to 0$ as $h \to 0$. That this definition of derivative is close to the usual one can be seen from Exercise 11.57(a).

THEOREM 11.14 Let $f \in \mathcal{L}^2(\mathcal{R})$. Then the following hold:
a) For $a, b \in \mathcal{R}$ and $a \ne 0$, we have $\widehat{f_{a,b}}(t) = \sqrt{|a|}\,e^{-ibt}\hat f(at)$ ae.
b) If $\int_{-\infty}^{\infty}x^2|f(x)|^2\,dx < \infty$, then we have $(\hat f)' = \hat g$ in the $\mathcal{L}^2$-sense, where $g(x) = -ixf(x)$.
c) If $f'$ exists in the $\mathcal{L}^2$-sense, then $\widehat{f'}(t) = it\hat f(t)$ ae.

PROOF: See Exercise 11.57.
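The sampling series of Exercise 11.58(c) can be tried out numerically. The Python sketch below uses hypothetical helper names and assumes the convention $\operatorname{sinc} t = \sin t/t$ (with $\operatorname{sinc} 0 = 1$) used in Example 11.11. It truncates the series to $|n| \le N$ and uses the test signal $g(x) = \operatorname{sinc}^2(Lx/2)$. That $\hat g$ vanishes for $|t| \ge L$ is an assumption about this particular example, not something proved in the text. Up to a constant, $\hat g$ is the convolution of two indicator functions of $[-L/2, L/2]$, a triangle supported in $[-L, L]$.

```python
import math

def sinc(t):
    """sinc t = sin t / t, with sinc 0 = 1."""
    return 1.0 if t == 0 else math.sin(t) / t

def shannon_reconstruct(samples, L, x):
    """Evaluate the truncated series Σ_n g(nπ/L) sinc(Lx - πn).

    `samples` maps each integer n to the sample value g(nπ/L).
    """
    return sum(value * sinc(L * x - math.pi * n) for n, value in samples.items())

L = 3.0   # band limit: g^(t) = 0 for |t| >= L (assumed for the test signal)
N = 2000  # keep samples with |n| <= N

def g(x):
    """Hypothetical band-limited test signal sinc(Lx/2)^2."""
    return sinc(L * x / 2) ** 2

samples = {n: g(n * math.pi / L) for n in range(-N, N + 1)}

x = 0.37
approx = shannon_reconstruct(samples, L, x)
print(f"g({x}) = {g(x):.10f}, sampling series = {approx:.10f}")
```

Here the samples $g(n\pi/L)$ decay like $n^{-2}$, so truncating at $|n| \le N$ leaves an error that decays roughly like $N^{-2}$. Slowly decaying samples would call for a much larger $N$.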
EXERCISES 11.5

11.53 Show that
$$\lim_{T\to\infty}\int_{-T}^{T}|\hat f(t)|^2\,dt = \lim_{T\to\infty}\frac{1}{T}\int_0^T\int_{-t}^{t}|\hat f(s)|^2\,ds\,dt.$$

11.54 Prove that the limit in (11.24) on page 675 is independent of the particular sequence of $M_n$s tending to $\infty$.

11.55 Refer to Plancherel's theorem (page 674).
a) Prove part (d) of the theorem.
b) Verify (11.25) and (11.26).

11.56 Show that the Fourier transform is onto $\mathcal{L}^2(\mathcal{R})$.

11.57 Establish the following.
a) If $f \in \mathcal{L}^2(\mathcal{R})$ is an absolutely continuous function such that $f' \in \mathcal{L}^2(\mathcal{R})$, then $f'$ is also the derivative of $f$ in the $\mathcal{L}^2$-sense.
b) Theorem 11.14 on page 676.

11.58 Let $f$ be a continuous function in $\mathcal{L}^2(\mathcal{R})$ such that $\sum_{n=-\infty}^{\infty}|f(n)| < \infty$ and $\hat f(t) = 0$ for $|t| > \pi$.
a) Show that, as a function on $[-\pi,\pi]$, $\hat f$ has the Fourier series expansion $\hat f(t) = \sum_{n=-\infty}^{\infty}c_ne^{int}$, where $c_n = (2\pi)^{-1/2}f(-n)$.
b) Show that $f(x) = \sum_{n=-\infty}^{\infty}f(n)\operatorname{sinc}(\pi(x-n))$ for each $x \in \mathcal{R}$.
c) Use part (b) to prove the Shannon sampling theorem: Let $L$ be a positive constant. Suppose that $g$ is a continuous function in $\mathcal{L}^2(\mathcal{R})$ such that $\sum_{n=-\infty}^{\infty}|g(n\pi/L)| < \infty$ and $\hat g(t) = 0$ for $|t| \ge L$. Then $g(x) = \sum_{n=-\infty}^{\infty}g(n\pi/L)\operatorname{sinc}(Lx - \pi n)$. The Shannon sampling theorem is used extensively in communications engineering.

Exercises 11.59-11.66 consider an important class of special functions closely related to the Fourier transform.

11.59 Let $\mu$ be the measure defined by $\mu(B) = \int_B e^{-x^2}\,d\lambda(x)$ and $\langle\ ,\ \rangle_\mu$ the inner product induced by $\mu$.
a) Verify that the space $\mathcal{L}^2(\mu)$ contains all polynomials.
b) Apply the Gram-Schmidt orthogonalization technique (see Theorem 9.5 on page 548) to the sequence $1, x, x^2, \ldots$ to obtain a sequence of polynomials $H_0, H_1, \ldots$, where $H_n$ is of degree $n$, that are orthonormal with respect to $\langle\ ,\ \rangle_\mu$. The $H_n$s are often referred to as Hermite polynomials.
c) Deduce that any polynomial $p$ of degree $n$ can be written in the form

11.60 Refer to Exercise 11.59.
a) Deduce that the functions $h_n(x) = H_n(x)e^{-x^2/2}$ constitute an orthonormal sequence in $\mathcal{L}^2(\mathcal{R})$.
b) Show that $k_n = \hat h_n$ takes the form $k_n(t) = K_n(t)e^{-t^2/2}$, where $K_n$ is a polynomial of degree at most $n$.

11.61 Prove that, for each $n \in \mathbb{N}$, we have $K_n = \alpha_nH_n$, where $\alpha_n \in \{1, -1, i, -i\}$.

11.62 Let $a_n$ denote the leading coefficient of $H_n$. Show that
$$h_n'(x) - xh_n(x) = -\frac{2a_n}{a_{n+1}}\,h_{n+1}(x).$$

11.63 Prove that $e^{-x^2/2}h_n(x) = c_n\dfrac{d^n}{dx^n}e^{-x^2}$, where $c_n = (-1)^na_n2^{-n}$.

11.64 Referring to Exercise 11.61, verify that $\alpha_n = (-i)^n$.

11.65 Show that the collection of functions $\{h_0, h_1, h_2, \ldots\}$ forms an orthonormal basis for $\mathcal{L}^2(\mathcal{R})$.

11.66 Show that the Fourier transform of a function $f \in \mathcal{L}^2(\mathcal{R})$ can be expressed

11.6 INTRODUCTION TO WAVELETS

The theory of Fourier series seeks expansions of the form
$$f(x) = \sum_{n=-\infty}^{\infty}\hat f(n)e^{inx}$$
that express the function $f$ as an infinite linear combination of dilations of the basic oscillating function $E(x) = e^{ix}$. Similarly, wavelet theory is concerned with expansions of the form
$$f(x) = \sum_{n,m=-\infty}^{\infty}c_{nm}\,\psi(a_nx + b_m) \tag{11.27}$$
that express $f$ as an infinite linear combination of translations-dilations of a single function $\psi$ called a wavelet. Wavelet theory, however, unlike the theory of Fourier series, emphasizes the case where $\psi$ is localized, that is, $\psi$ vanishes or decays rapidly outside of some bounded interval.

This and the following section provide a brief introduction to the burgeoning theory of wavelets. We begin with a discussion of the family of Haar wavelets. Motivated by the example of Haar wavelets, we will then introduce the concept of a multiresolution analysis of $\mathcal{L}^2(\mathcal{R})$.
In our discussion of wavelets, we will restrict ourselves to functions in the Hilbert space $\mathcal{L}^2(\mathcal{R})$. When we consider convergence of expansions of the form (11.27), we will always do so in the context of the usual $\mathcal{L}^2(\mathcal{R})$-norm. It will therefore be unambiguous to drop the subscript on that norm and to write $\langle\ ,\ \rangle$ for the usual inner product on $\mathcal{L}^2(\mathcal{R})$.

As a further restriction, we will only investigate expansions of the form (11.27) in case $a_n = 2^{-n}$ and $b_m = -m$, where $n$ and $m$ vary over the set $\mathbb{Z}$ of all integers. Double sums of the form $\sum_{n,m=-\infty}^{\infty}$ will be denoted by $\sum_{n,m}$.

Wavelets and Haar Wavelets

In what follows, we will employ the notation
$$f_{(n,m)}(x) = f_{2^n,2^nm}(x) = 2^{-n/2}f(2^{-n}x - m).$$
It is important to note that if $f \in \mathcal{L}^2(\mathcal{R})$ and $\|f\| = 1$, then we have $\|f_{(n,m)}\| = \|f\| = 1$ for all $n, m \in \mathbb{Z}$.

DEFINITION 11.6 Orthonormal Wavelet Basis, Wavelet
Let $\psi \in \mathcal{L}^2(\mathcal{R})$. If the collection of functions $\{\psi_{(n,m)} : n, m \in \mathbb{Z}\}$ is an orthonormal basis for $\mathcal{L}^2(\mathcal{R})$, then it is called an orthonormal wavelet basis and the function $\psi$ is called a basic wavelet or, more simply, a wavelet.

The following example introduces an important orthonormal wavelet basis and illustrates some basic ideas of wavelet theory.

EXAMPLE 11.12 The Haar Wavelet
For each $n \in \mathbb{Z}$, let $V_n$ denote the set of all functions in $\mathcal{L}^2(\mathcal{R})$ that are constant on every interval of the form $[\ell 2^n, (\ell+1)2^n)$, where $\ell \in \mathbb{Z}$. Clearly, we have $V_n \subset V_{n-1}$ for each $n \in \mathbb{Z}$. Moreover, as the reader is asked to verify in Exercise 11.67, we have
$$V_n \text{ is a closed linear subspace of } \mathcal{L}^2(\mathcal{R}), \tag{11.28}$$
$$\overline{\bigcup_{n\in\mathbb{Z}}V_n} = \mathcal{L}^2(\mathcal{R}), \tag{11.29}$$
$$\bigcap_{n\in\mathbb{Z}}V_n = \{0\}. \tag{11.30}$$
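Let $\varphi = \chi_{[0,1)}$. From Exercise 11.68,
$$V_n = \overline{\operatorname{span}}\{\varphi_{(n,m)} : m \in \mathbb{Z}\}. \tag{11.31}$$
Applying (11.29), it follows that
$$\overline{\operatorname{span}}\{\varphi_{(n,m)} : m, n \in \mathbb{Z}\} = \mathcal{L}^2(\mathcal{R}). \tag{11.32}$$
Also, for each $n \in \mathbb{Z}$, the family $\{\varphi_{(n,m)} : m \in \mathbb{Z}\}$ is orthonormal. But although $\{\varphi_{(n,m)} : m, n \in \mathbb{Z}\}$ resembles an orthonormal wavelet basis, it is not one, because it lacks orthogonality. Indeed, we have, for example, that $\langle\varphi_{(n,0)}, \varphi_{(0,0)}\rangle \ne 0$ for $n \ne 0$. The problem is that the $\varphi_{(n,m)}$s are nonnegative-valued. To produce an orthonormal wavelet basis, we will modify $\varphi$. To that end, we define
$$h(x) = \begin{cases}1, & \text{if } 0 \le x < 1/2;\\ -1, & \text{if } 1/2 \le x < 1;\\ 0, & \text{otherwise.}\end{cases}$$
Members of the family $B_h = \{h_{(n,m)} : n, m \in \mathbb{Z}\}$ are referred to as Haar functions. It is not difficult to show that $B_h$ is orthonormal, but verifying that it is a basis for $\mathcal{L}^2(\mathcal{R})$ is somewhat more challenging.

Suppose we can prove that
$$\varphi \in \overline{\operatorname{span}}\,B_h. \tag{11.33}$$
Then, because
$$f \in \overline{\operatorname{span}}\,B_h \implies f_{(n,m)} \in \overline{\operatorname{span}}\,B_h, \qquad n, m \in \mathbb{Z} \tag{11.34}$$
(see Exercise 11.70), it follows that $\overline{\operatorname{span}}\{\varphi_{(n,m)} : m, n \in \mathbb{Z}\} \subset \overline{\operatorname{span}}\,B_h$. Thus, by (11.32), $\overline{\operatorname{span}}\,B_h = \mathcal{L}^2(\mathcal{R})$ and, hence, $B_h$ is a basis.

It remains to verify (11.33), which we will do by proving that
$$\varphi = \sum_{m,n}\langle\varphi, h_{(n,m)}\rangle h_{(n,m)}. \tag{11.35}$$
As the reader is asked to show in Exercise 11.71,
$$\langle\varphi, h_{(n,m)}\rangle = \begin{cases}2^{-n/2}, & \text{if } n > 0 \text{ and } m = 0;\\ 0, & \text{otherwise.}\end{cases}$$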
To establish (11.35), we first prove that
$$\varphi(x) = \sum_{n=1}^{\infty}2^{-n}h(2^{-n}x), \qquad x \in \mathcal{R}. \tag{11.36}$$
Clearly, both sides of (11.36) are 0 if $x < 0$. For $x \in [0,1)$, we have $2^{-n}x \in [0,1/2)$ for all $n \ge 1$ and, consequently, both the left- and right-hand sides of (11.36) equal 1. For $x \in [1,\infty)$, select $k \in \mathbb{N}$ such that $2^{k-1} \le x < 2^k$. Then we have $2^{-n}x \in [0,1/2)$ for $n > k$, $2^{-k}x \in [1/2,1)$, and $2^{-n}x \in [1,\infty)$ for $n < k$. Therefore, the right-hand side of (11.36) is
$$-2^{-k} + \sum_{n=k+1}^{\infty}2^{-n} = 0,$$
which, of course, equals the left-hand side of (11.36). It now follows from Proposition 9.3 on page 532 and the DCT that (11.35) holds.

We have now shown that the family of Haar functions constitutes an orthonormal basis for $\mathcal{L}^2(\mathcal{R})$. Hence, it forms an orthonormal wavelet basis, and $h$ is a wavelet, called the Haar wavelet. □

Multiresolution Analysis

Guided by the essential features of Example 11.12, we can establish a general framework for constructing orthonormal wavelet bases. Specifically, we will work with a sequence $\{V_n\}_{n=-\infty}^{\infty}$ of closed subspaces of $\mathcal{L}^2(\mathcal{R})$ satisfying the following conditions.

(M1) $\cdots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \subset \cdots$.
(M2) $\overline{\bigcup_{n\in\mathbb{Z}}V_n} = \mathcal{L}^2(\mathcal{R})$.
(M3) $\bigcap_{n\in\mathbb{Z}}V_n = \{0\}$.
(M4) $f \in V_n$ if and only if $f_{(-n,0)} \in V_0$.
(M5) $f \in V_0$ if and only if $f_{(0,m)} \in V_0$ for all $m \in \mathbb{Z}$.
(M6) There is a function $\varphi \in V_0$ such that $\{\varphi_{(0,m)} : m \in \mathbb{Z}\}$ is an orthonormal basis for $V_0$.

A sequence $\{V_n\}_{n=-\infty}^{\infty}$ of closed subspaces of $\mathcal{L}^2(\mathcal{R})$ satisfying (M1)-(M6) is said to be a multiresolution analysis of $\mathcal{L}^2(\mathcal{R})$.

As we observed in Example 11.12, if we let $V_n$ denote the collection of $\mathcal{L}^2(\mathcal{R})$-functions that are constant on every interval of the form $[\ell 2^n, (\ell+1)2^n)$, where $\ell \in \mathbb{Z}$, then $\{V_n\}_{n=-\infty}^{\infty}$ is a multiresolution analysis with $\varphi = \chi_{[0,1)}$.
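The identity (11.36), and with it the expansion (11.35), can be checked pointwise. In the following Python sketch the helper names are illustrative. `dilate_translate` implements $f_{(n,m)}(x) = 2^{-n/2}f(2^{-n}x - m)$. The partial sums $\sum_{n=1}^{N}2^{-n/2}h_{(n,0)}(x) = \sum_{n=1}^{N}2^{-n}h(2^{-n}x)$ use the coefficients $\langle\varphi, h_{(n,0)}\rangle = 2^{-n/2}$ stated in Exercise 11.71.

```python
def haar(x):
    """Haar wavelet h: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1:
        return -1.0
    return 0.0

def dilate_translate(f, n, m):
    """Return the function f_(n,m)(x) = 2^(-n/2) f(2^(-n) x - m)."""
    return lambda x: 2 ** (-n / 2) * f(2 ** (-n) * x - m)

def phi_partial_sum(x, N):
    """Σ_{n=1}^{N} <φ, h_(n,0)> h_(n,0)(x) with <φ, h_(n,0)> = 2^(-n/2) (Exercise 11.71)."""
    return sum(2 ** (-n / 2) * dilate_translate(haar, n, 0)(x) for n in range(1, N + 1))

for x in (-2.0, 0.25, 0.9, 1.0, 5.0, 37.5):
    print(f"x = {x:5}: partial sum with N = 40 is {phi_partial_sum(x, 40): .3e}")
```

By the case analysis in the text, the $N$th partial sum equals $1 - 2^{-N}$ on $[0,1)$. For $x \ge 1$ it equals $-2^{-N}$ once $N \ge k$, and it vanishes for $x < 0$. It therefore converges to $\chi_{[0,1)}$ at every point.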
In the general setting described by (M1)-(M6), the family of functions $\{\varphi_{(n,m)} : m \in \mathbb{Z}\}$ is an orthonormal basis for $V_n$ for each $n$, but $\varphi_{(n,m)}$ and $\varphi_{(j,k)}$ may not be orthogonal if $n \ne j$. Rather, an orthonormal wavelet basis can be constructed using $\varphi$, as we will show in the next section.

For the remainder of this chapter, we will assume that $\{V_n\}_{n=-\infty}^{\infty}$ is a multiresolution analysis. The orthogonal projection of $\mathcal{L}^2(\mathcal{R})$ onto $V_n$ will be denoted by $P_n$. Our next lemma, whose proof is left to the reader as Exercise 11.72, will be needed in our development of the theory of wavelets.

LEMMA 11.6 Let $W_0 = V_0^{\perp} \cap V_{-1}$, $W_n = \{f_{(n,0)} : f \in W_0\}$, and $Q_n\colon \mathcal{L}^2(\mathcal{R}) \to W_n$ be the orthogonal projection of $\mathcal{L}^2(\mathcal{R})$ onto $W_n$. Then the following hold.
a) If $\ell \ne n$, then $\langle f, g\rangle = 0$ for all $f \in W_\ell$ and $g \in W_n$.
b) $P_{n-1} = P_n + Q_n$.
c) For each $f \in \mathcal{L}^2(\mathcal{R})$, we have $f = \sum_{n=-\infty}^{\infty}Q_n(f)$, where the series converges absolutely with respect to the $\mathcal{L}^2(\mathcal{R})$-norm.

Now, if $\{\psi_{(0,m)} : m \in \mathbb{Z}\}$ is an orthonormal basis for $W_0$, then $\{\psi_{(n,m)} : m \in \mathbb{Z}\}$ is an orthonormal basis for $W_n$. Hence, by Theorem 9.6 on page 549,
$$Q_n(f) = \sum_{m=-\infty}^{\infty}\langle f, \psi_{(n,m)}\rangle\psi_{(n,m)}$$
for each $f \in \mathcal{L}^2(\mathcal{R})$. It follows from Lemma 11.6(c) that $f = \sum_{n,m}\langle f, \psi_{(n,m)}\rangle\psi_{(n,m)}$. On the other hand, by Lemma 11.6(a), the family $\{\psi_{(n,m)} : n, m \in \mathbb{Z}\}$ is orthonormal. Thus, $\{\psi_{(n,m)} : n, m \in \mathbb{Z}\}$ is an orthonormal wavelet basis for $\mathcal{L}^2(\mathcal{R})$.

We have shown that if $\{\psi_{(0,m)} : m \in \mathbb{Z}\}$ is an orthonormal basis for $W_0$, then $\{\psi_{(n,m)} : n, m \in \mathbb{Z}\}$ is an orthonormal wavelet basis for $\mathcal{L}^2(\mathcal{R})$. We conclude this section by giving sufficient conditions for $\{\psi_{(0,m)} : m \in \mathbb{Z}\}$ to be an orthonormal basis for $W_0$.

PROPOSITION 11.6 Suppose $\psi \in W_0$ is such that $\|\psi\| = 1$ and also satisfies the following two conditions:
a) For each $n \in \mathbb{Z} \setminus \{0\}$, we have $\int_{-\infty}^{\infty}e^{int}|\hat\psi(t)|^2\,dt = 0$.
b) For each $f \in W_0$, there exists $F \in \mathcal{L}^2_{2\pi}$ such that $\hat f = F\hat\psi$.
Then $\{\psi_{(0,m)} : m \in \mathbb{Z}\}$ is an orthonormal basis for $W_0$ and, consequently, $\{\psi_{(n,m)} : n, m \in \mathbb{Z}\}$ is an orthonormal wavelet basis for $\mathcal{L}^2(\mathcal{R})$.
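PROOF: Applying condition (a), Plancherel's theorem (page 674), and Theorem 11.14 (page 676), we get that
$$\langle\psi_{(0,m)}, \psi_{(0,n)}\rangle = \int_{-\infty}^{\infty}e^{i(n-m)t}|\hat\psi(t)|^2\,dt = \begin{cases}0, & m \ne n;\\ 1, & m = n.\end{cases}$$
Thus, $\{\psi_{(0,m)} : m \in \mathbb{Z}\}$ is orthonormal. We will show that it forms a basis for $W_0$ by applying Theorem 9.6(c) on page 549. Suppose that $f \in W_0$ and $\langle f, \psi_{(0,m)}\rangle = 0$ for each $m \in \mathbb{Z}$. Then, by condition (b) and, again, Plancherel's theorem and Theorem 11.14,
$$\int_{-\infty}^{\infty}e^{imt}F(t)|\hat\psi(t)|^2\,dt = \langle f, \psi_{(0,m)}\rangle = 0 \tag{11.37}$$
for each $m \in \mathbb{Z}$. Now, we have
$$\int_{-\infty}^{\infty}e^{imt}F(t)|\hat\psi(t)|^2\,dt = \sum_{\ell=-\infty}^{\infty}\int_{(2\ell-1)\pi}^{(2\ell+1)\pi}e^{imt}F(t)|\hat\psi(t)|^2\,dt = \int_{-\pi}^{\pi}e^{imt}F(t)\sum_{\ell=-\infty}^{\infty}|\hat\psi(t+2\ell\pi)|^2\,dt.$$
The function $g(t) = F(t)\sum_{\ell=-\infty}^{\infty}|\hat\psi(t+2\ell\pi)|^2$ belongs to $\mathcal{L}^1_{2\pi}$ since, by Cauchy's inequality,
$$\int_{-\pi}^{\pi}|g(t)|\,dt = \int_{-\infty}^{\infty}|F(t)\hat\psi(t)|\,|\hat\psi(t)|\,dt \le \|F\hat\psi\|_2\,\|\hat\psi\|_2 = \|f\|\,\|\psi\| < \infty.$$
In view of (11.37), all Fourier coefficients of $g$ vanish and, consequently, Theorem 11.1 on page 638 implies that $g = 0$ ae. It follows that $\hat f = F\hat\psi$ vanishes ae on $\mathcal{R}$ and hence that $f = 0$ ae.

EXERCISES 11.6

11.67 Verify (11.28)-(11.30).

11.68 Refer to Example 11.12 on page 679.
a) Prove that $V_n = \overline{\operatorname{span}}\{\varphi_{(n,m)} : m \in \mathbb{Z}\}$.
b) Show that (a) holds for any multiresolution analysis of $\mathcal{L}^2(\mathcal{R})$.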
11.69 Prove that the Haar functions form an orthonormal family.

11.70 Verify (11.34).

11.71 Show that
$$\langle\chi_{[0,1)}, h_{(n,m)}\rangle = \begin{cases}2^{-n/2}, & \text{if } n > 0 \text{ and } m = 0;\\ 0, & \text{otherwise.}\end{cases}$$

11.72 Prove Lemma 11.6 on page 682.

11.73 Calculate the Fourier transforms of the Haar functions.

11.74 For a multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$, let $P_n$ denote the orthogonal projection of $\mathcal{L}^2(\mathcal{R})$ onto $V_n$. Is $P_n \circ P_{n-1} = P_n$? Hint: See Exercise 9.26 on page 544.

11.75 Show that the Haar functions do not span a dense subspace of $\mathcal{L}^1(\mathcal{R})$.

11.76 Let $\{h_{(n,m)} : n, m \in \mathbb{Z}\}$ be the family of Haar functions. Define
$$f(x) = \begin{cases}2x, & \text{for } 0 \le x < 1/2;\\ 2-2x, & \text{for } 1/2 \le x < 1;\\ 0, & \text{for } x \notin [0,1).\end{cases}$$
Sketch the graph of the partial sum $\sum_{n=-1}\cdots$

11.7 ORTHONORMAL WAVELET BASES; THE WAVELET TRANSFORM

In this section, we continue our presentation of wavelet theory. Working with a multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$, we will construct a function $\psi$ satisfying the conditions of Proposition 11.6 on page 682 and thereby obtain an orthonormal wavelet basis for $\mathcal{L}^2(\mathcal{R})$. We will also introduce a continuous version of the wavelet expansion.

Scaling Functions

Recall that a sequence $\{V_n\}_{n=-\infty}^{\infty}$ of closed subspaces of $\mathcal{L}^2(\mathcal{R})$ is called a multiresolution analysis of $\mathcal{L}^2(\mathcal{R})$ if it satisfies (M1)-(M6) on page 681. By (M6) there is a function $\varphi$ such that $\{\varphi_{(0,m)} : m \in \mathbb{Z}\}$ is an orthonormal basis for $V_0$. We will call $\varphi$ a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$. Properties of $\varphi$ are developed in the following lemmas.
LEMMA 11.7 Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$. Then
$$\sum_{\ell=-\infty}^{\infty}|\hat\varphi(t+2\pi\ell)|^2 = \frac{1}{2\pi}$$
for almost all $t \in \mathcal{R}$.

PROOF: We outline the proof, leaving the details to the reader as Exercise 11.77. Let $g(t) = \sum_{\ell=-\infty}^{\infty}|\hat\varphi(t+2\pi\ell)|^2$. Then $g$ is an extended real-valued function with period $2\pi$. That $g$ is finite almost everywhere follows from $\int_{-\pi}^{\pi}g(t)\,dt = \|\hat\varphi\|^2 = 1$. We now see that $g \in \mathcal{L}^1_{2\pi}$. Using Plancherel's theorem and Theorem 11.14 on page 676, it can be shown that the Fourier coefficients $\hat g(k)$ vanish for $k \ne 0$. Thus, $g$ has the same Fourier coefficients as the function that is constantly equal to $1/2\pi$. Applying Theorem 11.1(c) on page 638 now yields the required result.

LEMMA 11.8 Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$. Then the following hold.
a) For almost all $x \in \mathcal{R}$, we have $\varphi(x) = \sqrt{2}\sum_{n=-\infty}^{\infty}p_n\varphi(2x-n)$, where $p_n = \langle\varphi, \varphi_{(-1,n)}\rangle$. Moreover, $\sum_{n=-\infty}^{\infty}|p_n|^2 = 1$.
b) For almost all $t \in \mathcal{R}$, we have $\hat\varphi(t) = m_0(t/2)\,\hat\varphi(t/2)$, where $m_0(t) = \dfrac{1}{\sqrt2}\sum_{n=-\infty}^{\infty}p_ne^{-int}$.

PROOF: Again we only outline the proof, leaving the details for Exercise 11.78. To prove (a), it suffices to show that $\{\varphi_{(-1,n)} : n \in \mathbb{Z}\}$ is an orthonormal basis for the space $V_{-1}$. Applying the Fourier transform to both sides of the equation for $\varphi$ given in part (a) leads to the verification of (b).

LEMMA 11.9 Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$ and let $m_0$ be as in Lemma 11.8(b). Then
$$|m_0(t)|^2 + |m_0(t+\pi)|^2 = 1 \quad\text{ae.}$$
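PROOF: By Lemmas 11.7 and 11.8, we have
$$\begin{aligned}\frac{1}{2\pi} &= \sum_{\ell=-\infty}^{\infty}|\hat\varphi(t+2\pi\ell)|^2 = \sum_{\ell=-\infty}^{\infty}|\hat\varphi(t+4\pi\ell)|^2 + \sum_{\ell=-\infty}^{\infty}|\hat\varphi(t+2\pi(2\ell+1))|^2\\ &= \sum_{\ell=-\infty}^{\infty}|\hat\varphi(t/2+2\pi\ell)|^2\,|m_0(t/2+2\pi\ell)|^2 + \sum_{\ell=-\infty}^{\infty}|\hat\varphi(t/2+\pi+2\pi\ell)|^2\,|m_0(t/2+\pi+2\pi\ell)|^2\end{aligned}$$
for almost all $t$. Using Lemma 11.7 and the fact that $m_0$ has period $2\pi$, we obtain that
$$\frac{1}{2\pi} = |m_0(t/2)|^2\sum_{\ell=-\infty}^{\infty}|\hat\varphi(t/2+2\pi\ell)|^2 + |m_0(t/2+\pi)|^2\sum_{\ell=-\infty}^{\infty}|\hat\varphi(t/2+\pi+2\pi\ell)|^2 = \frac{1}{2\pi}|m_0(t/2)|^2 + \frac{1}{2\pi}|m_0(t/2+\pi)|^2.$$
The assertion of the lemma now follows immediately.

Our next lemma characterizes the action of the Fourier transform on the space $V_{-1}$.

LEMMA 11.10 Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$ and let $f \in V_{-1}$. Set $f_n = \langle f, \varphi_{(-1,n)}\rangle$. Then $\hat f(t) = m_f(t/2)\,\hat\varphi(t/2)$, where
$$m_f(t) = \frac{1}{\sqrt2}\sum_{n=-\infty}^{\infty}f_ne^{-int}. \tag{11.38}$$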
PROOF: Recall that $W_0 = V_0^{\perp} \cap V_{-1}$. Because $\{\varphi_{(-1,n)} : n \in \mathbb{Z}\}$ is an orthonormal basis for $V_{-1}$, every $f \in V_{-1}$ (in particular, every $f \in W_0$) has the expansion
$$f = \sum_{n=-\infty}^{\infty}\langle f, \varphi_{(-1,n)}\rangle\varphi_{(-1,n)} = \sum_{n=-\infty}^{\infty}f_n\varphi_{(-1,n)}.$$
The required result now follows from a straightforward application of the Fourier transform using Theorem 11.14 on page 676.

When the function $f$ belongs to the space $W_0$, more can be said about the function $m_f$ given in (11.38).

LEMMA 11.11 Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$ and let $m_0$ and $m_f$ be as in Lemmas 11.8 and 11.10, respectively. If $f \in W_0$, then
$$\overline{m_0(t)}\,m_f(t) + \overline{m_0(t+\pi)}\,m_f(t+\pi) = 0$$
for almost all $t$.

PROOF: The proof is similar to that of Lemma 11.7. We sketch it here and leave the details to the reader as Exercise 11.79. Let
$$G(t) = \sum_{\ell=-\infty}^{\infty}\hat f(t+2\pi\ell)\,\overline{\hat\varphi(t+2\pi\ell)}.$$
Then $G \in \mathcal{L}^1_{2\pi}$ and its Fourier coefficients vanish; so, $G = 0$ ae. Applying Lemmas 11.8 and 11.10, we have, for almost all $t$, that
$$0 = \sum_{\ell=-\infty}^{\infty}m_f(t/2+\pi\ell)\,\overline{m_0(t/2+\pi\ell)}\,|\hat\varphi(t/2+\pi\ell)|^2.$$
The proof is completed by an argument similar to that used in the proof of Lemma 11.9.

Next we have a formula that characterizes the Fourier transform of a function in the space $W_0$.

LEMMA 11.12 Let $\varphi$ be a scaling function of the multiresolution analysis $\{V_n\}_{n=-\infty}^{\infty}$ and let $m_0$ be as in Lemma 11.8. Then $f \in W_0$ if and only if there is a function $F \in \mathcal{L}^2_{2\pi}$ such that
$$\hat f(t) = e^{it/2}\,\overline{m_0(t/2+\pi)}\,\hat\varphi(t/2)\,F(t). \tag{11.39}$$
PROOF: Suppose that $f \in W_0$. Let
$$L(t) = \begin{cases}m_f(t)\big/\overline{m_0(t+\pi)}, & \text{if } m_0(t+\pi) \ne 0;\\ 0, & \text{if } m_0(t+\pi) = 0.\end{cases}$$
It follows from Lemmas 11.9 and 11.11 that
$$L(t) = -L(t+\pi) \tag{11.40}$$
and
$$m_f(t) = \overline{m_0(t+\pi)}\,L(t). \tag{11.41}$$
Now let $F(t) = e^{-it/2}L(t/2)$. Applying Lemma 11.10 and (11.41), we deduce that
$$\hat f(t) = e^{it/2}\,\overline{m_0(t/2+\pi)}\,\hat\varphi(t/2)\,F(t).$$
From (11.40), we see that $F$ has period $2\pi$. Also, it follows from the definition of $F$ that $|F(t)|^2|m_0(t/2+\pi)|^2 = |m_f(t/2)|^2$. Hence, by Lemma 11.9,
$$|F(t)|^2 = |F(t)|^2|m_0(t/2)|^2 + |F(t)|^2|m_0(t/2+\pi)|^2 = |m_f(t/2+\pi)|^2 + |m_f(t/2)|^2.$$
Consequently, by Theorem 9.6 on page 549, we have
$$\int_{-\pi}^{\pi}|F(t)|^2\,dt = \int_{-\pi}^{\pi}\bigl(|m_f(t/2+\pi)|^2 + |m_f(t/2)|^2\bigr)\,dt = 2\int_{-\pi}^{\pi}|m_f(t)|^2\,dt = 2\pi\sum_{n=-\infty}^{\infty}|f_n|^2 = 2\pi\|f\|^2 < \infty.$$
This shows that $F \in \mathcal{L}^2_{2\pi}$.

Conversely, suppose that $f$ satisfies (11.39) for some $F \in \mathcal{L}^2_{2\pi}$. We then have a Fourier series expansion $F(t) = \sum_{n=-\infty}^{\infty}\hat F(n)e^{int}$, where $\sum_{n=-\infty}^{\infty}|\hat F(n)|^2 < \infty$. Thus,
$$\hat f(t) = \sum_{n=-\infty}^{\infty}\hat F(n)e^{int}e^{it/2}\,\overline{m_0(t/2+\pi)}\,\hat\varphi(t/2). \tag{11.42}$$
Applying Theorem 11.14 on page 676, we have
$$e^{it/2}\,\overline{m_0(t/2+\pi)}\,\hat\varphi(t/2) = \frac{1}{\sqrt2}\sum_{n=-\infty}^{\infty}(-1)^n\,\overline{p_n}\,e^{i(n+1)t/2}\,\hat\varphi(t/2) = \sum_{k=-\infty}^{\infty}(-1)^{k-1}\,\overline{p_{k-1}}\,\widehat{\varphi_{(-1,-k)}}(t),$$
where we recall from Lemma 11.8 that $p_k = \langle\varphi, \varphi_{(-1,k)}\rangle$.
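The series
$$\psi = \sum_{k=-\infty}^{\infty}(-1)^{k-1}\,\overline{p_{k-1}}\,\varphi_{(-1,-k)}$$
defines a function in the space $V_{-1}$. By the continuity of the Fourier transform, we have
$$\hat\psi(t) = e^{it/2}\,\overline{m_0(t/2+\pi)}\,\hat\varphi(t/2). \tag{11.43}$$
Consequently, we can use Plancherel's theorem and Theorem 11.14 to rewrite (11.42) as
$$\hat f(t) = \sum_{n=-\infty}^{\infty}\hat F(n)e^{int}\hat\psi(t).$$
Thus, $\hat f$ is also the Fourier transform of the function $\sum_{n=-\infty}^{\infty}\hat F(n)\psi_{(0,-n)}$. Applying the uniqueness property of Fourier transforms (Corollary 11.1 on page 660), we conclude that
$$f = \sum_{n=-\infty}^{\infty}\hat F(n)\psi_{(0,-n)}.$$
Because $\psi_{(0,-n)} \in V_{-1}$ for each $n \in \mathbb{Z}$, it follows that $f \in V_{-1}$.

To complete the proof, we must show that $f \in V_0^{\perp}$. This will be accomplished if we can prove that
$$\langle\psi_{(0,-n)}, \varphi_{(0,-n+m)}\rangle = \langle\psi, \varphi_{(0,m)}\rangle = 0 \tag{11.44}$$
for all $m \in \mathbb{Z}$. However, we have by Plancherel's theorem, Lemmas 11.7 and 11.8, and (11.43) that
$$\begin{aligned}\langle\psi, \varphi_{(0,m)}\rangle &= \langle\hat\psi, \widehat{\varphi_{(0,m)}}\rangle = \int_{-\infty}^{\infty}e^{i(m+1/2)t}\,\overline{m_0(t/2+\pi)}\,\overline{m_0(t/2)}\,|\hat\varphi(t/2)|^2\,dt\\ &= 2\sum_{\ell=-\infty}^{\infty}\int_0^{2\pi}e^{i(2m+1)t}\,\overline{m_0(t+\pi)}\,\overline{m_0(t)}\,|\hat\varphi(t+2\pi\ell)|^2\,dt = \frac{1}{\pi}\int_0^{2\pi}e^{i(2m+1)t}\,\overline{m_0(t+\pi)}\,\overline{m_0(t)}\,dt\\ &= \frac{1}{\pi}\int_0^{\pi}e^{i(2m+1)t}\,\overline{m_0(t+\pi)}\,\overline{m_0(t)}\,dt + \frac{1}{\pi}\int_0^{\pi}e^{i(2m+1)(t+\pi)}\,\overline{m_0(t+2\pi)}\,\overline{m_0(t+\pi)}\,dt = 0,\end{aligned}$$
because $e^{i(2m+1)\pi} = -1$ and $m_0$ has period $2\pi$. This verifies (11.44) and completes the proof of the lemma.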
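For the Haar multiresolution analysis of Example 11.12, Lemmas 11.7 through 11.9 can be checked numerically. With $\varphi = \chi_{[0,1)}$ we have $\varphi(x) = \varphi(2x) + \varphi(2x-1)$, so $p_0 = p_1 = 1/\sqrt2$ and all other $p_n$ vanish. The Python sketch below assumes $m_0(t) = 2^{-1/2}\sum_n p_ne^{-int}$. That is the function produced by taking Fourier transforms in $\varphi(x) = \sqrt2\sum_n p_n\varphi(2x-n)$ under the normalization $\hat f(t) = (2\pi)^{-1/2}\int f(x)e^{-itx}\,dx$, and it gives $m_0(t) = (1+e^{-it})/2$. The sketch truncates the periodized sum of Lemma 11.7 at an illustrative number of terms. The function names are hypothetical.

```python
import cmath
import math

SQRT_2PI = math.sqrt(2 * math.pi)

def phi_hat(t):
    """Fourier transform of the Haar scaling function χ_[0,1)."""
    if t == 0:
        return complex(1 / SQRT_2PI)
    return (1 - cmath.exp(-1j * t)) / (1j * t * SQRT_2PI)

# Refinement coefficients p_n = <φ, φ_(-1,n)> for φ = χ_[0,1): p_0 = p_1 = 1/√2.
p = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}

def m0(t):
    """m_0(t) = 2^(-1/2) Σ_n p_n e^(-int); equals (1 + e^(-it))/2 for Haar."""
    return sum(pn * cmath.exp(-1j * n * t) for n, pn in p.items()) / math.sqrt(2)

def periodized_energy(t, terms=20_000):
    """Truncation of Σ_ℓ |φ^(t + 2πℓ)|^2 to |ℓ| <= terms."""
    return sum(abs(phi_hat(t + 2 * math.pi * ell)) ** 2 for ell in range(-terms, terms + 1))

t = 0.7
print("Lemma 11.7 :", 2 * math.pi * periodized_energy(t))          # close to 1
print("Lemma 11.9 :", abs(m0(t)) ** 2 + abs(m0(t + math.pi)) ** 2)  # 1 up to rounding
print("Lemma 11.8b:", abs(phi_hat(t) - m0(t / 2) * phi_hat(t / 2)))  # 0 up to rounding
```

For the Haar case, $|m_0(t)|^2 = \cos^2(t/2)$ and $|m_0(t+\pi)|^2 = \sin^2(t/2)$. Lemma 11.9 therefore reduces to $\cos^2 + \sin^2 = 1$, and the code confirms it up to rounding.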
Construction of Orthonormal Wavelet Bases

In the course of the proof of Lemma 11.12, we constructed the function
$$\psi = \sum_{k=-\infty}^{\infty}(-1)^{k-1}\,\overline{p_{k-1}}\,\varphi_{(-1,-k)}.$$
As the next theorem shows, $\psi$ is a wavelet.

THEOREM 11.15 Let $\{V_n\}_{n=-\infty}^{\infty}$ be a multiresolution analysis of $\mathcal{L}^2(\mathcal{R})$ with scaling function $\varphi$. Define
$$\psi = \sum_{k=-\infty}^{\infty}(-1)^{k-1}\,\overline{p_{k-1}}\,\varphi_{(-1,-k)}, \tag{11.45}$$
where $p_k = \langle\varphi, \varphi_{(-1,k)}\rangle$. Then $\{\psi_{(n,m)} : n, m \in \mathbb{Z}\}$ is an orthonormal wavelet basis for $\mathcal{L}^2(\mathcal{R})$.

PROOF: We will prove the theorem by verifying that $\psi$ satisfies the hypotheses of Proposition 11.6 on page 682. To begin, we note that $\|\psi\| = 1$ because $\sum_{n=-\infty}^{\infty}|p_n|^2 = 1$. Also, in the course of proving Lemma 11.12, we actually established that $\psi \in W_0$. It now follows from (11.43) and Lemma 11.12 that condition (b) of Proposition 11.6 is satisfied.

It remains to show that condition (a) of Proposition 11.6 holds. To that end, we apply (11.43), Lemma 11.7, and Lemma 11.9 to obtain that
$$\begin{aligned}\int_{-\infty}^{\infty}e^{int}|\hat\psi(t)|^2\,dt &= \int_{-\infty}^{\infty}e^{int}|m_0(t/2+\pi)|^2\,|\hat\varphi(t/2)|^2\,dt = 2\sum_{\ell=-\infty}^{\infty}\int_0^{2\pi}e^{i2nt}|m_0(t+\pi)|^2\,|\hat\varphi(t+2\pi\ell)|^2\,dt\\ &= \frac{1}{\pi}\int_0^{2\pi}e^{i2nt}|m_0(t+\pi)|^2\,dt = \frac{1}{\pi}\int_0^{\pi}e^{i2nt}|m_0(t+\pi)|^2\,dt + \frac{1}{\pi}\int_0^{\pi}e^{i2nt}|m_0(t+2\pi)|^2\,dt\\ &= \frac{1}{\pi}\int_0^{\pi}e^{i2nt}\,dt = 0\end{aligned}$$
for $n \ne 0$.
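Formula (11.45) is easy to mechanize once the refinement coefficients $p_k = \langle\varphi, \varphi_{(-1,k)}\rangle$ are known. The Python sketch below reads (11.45) as $\psi = \sum_k(-1)^{k-1}\overline{p_{k-1}}\,\varphi_{(-1,-k)}$, the form produced in the proof of Lemma 11.12. It evaluates $\psi(x) = \sum_j c_j\sqrt2\,\varphi(2x-j)$ for a scaling function with finitely many nonzero $p_k$. The helper names and the restriction to finitely supported $\{p_k\}$ are assumptions made for the illustration. For the Haar scaling function, it should reproduce $\psi(x) = \varphi(2x+1) - \varphi(2x+2)$ and $\psi_{(0,1)} = -h$.

```python
import math

def phi(x):
    """Haar scaling function χ_[0,1)."""
    return 1.0 if 0 <= x < 1 else 0.0

def haar(x):
    """Haar wavelet h: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1:
        return -1.0
    return 0.0

def wavelet_coefficients(p):
    """Map {n: p_n} to {j: c_j} with ψ = Σ_j c_j φ_(-1,j), following (11.45).

    With k = n + 1, the term (-1)^(k-1) conj(p_(k-1)) φ_(-1,-k) contributes
    c_(-k) = (-1)^n conj(p_n).
    """
    return {-(n + 1): (-1) ** n * pn.conjugate() for n, pn in p.items()}

def psi(x, coeffs, scaling=phi):
    """Evaluate ψ(x) = Σ_j c_j φ_(-1,j)(x) = Σ_j c_j √2 φ(2x - j)."""
    return sum(c * math.sqrt(2) * scaling(2 * x - j) for j, c in coeffs.items())

haar_p = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}
coeffs = wavelet_coefficients(haar_p)
for x in (0.1, 0.3, 0.6, 0.9, 1.4):
    # ψ_(0,1)(x) = ψ(x - 1) should agree with -h(x)
    print(x, psi(x - 1, coeffs), -haar(x))
```

Here the coefficient dictionary has keys $-1$ and $-2$, matching $\varphi(2x+1)$ and $\varphi(2x+2)$. A scaling function with more nonzero $p_k$ simply contributes more terms.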
EXAMPLE 11.13 Illustrates Theorem 11.15
Refer to Example 11.12 on page 679. If we apply Formula (11.45) to the scaling function $\varphi = \chi_{[0,1)}$, we obtain the wavelet $\psi(x) = \varphi(2x+1) - \varphi(2x+2)$. This wavelet is quite similar to the basic Haar wavelet, $h$. In fact, we have $\psi_{(0,1)} = -h$. It follows that, in this case, the orthonormal wavelet basis determined by $\psi$ consists of the Haar functions multiplied by the factor $-1$. □

The Wavelet Transform

Next we introduce a continuous version of the discrete wavelet expansion. To begin, we recall that for $a, b \in \mathcal{R}$ and $a \ne 0$, the function $\psi_{a,b}$ is the translation-dilation of the function $\psi$ defined by
$$\psi_{a,b}(x) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{x-b}{a}\right).$$
Here now is the definition of the wavelet transform.

DEFINITION 11.7 Wavelet Transform
Let $\psi$ be a fixed function in $\mathcal{L}^2(\mathcal{R}) \setminus \{0\}$. Then, for each $f \in \mathcal{L}^2(\mathcal{R})$, the function $Wf\colon (0,\infty) \times \mathcal{R} \to \mathbb{C}$ defined by
$$Wf(a,b) = \langle f, \psi_{a,b}\rangle$$
is called the wavelet transform of $f$.

Note: Although the wavelet transform depends on the fixed function $\psi$, we have retained the terminology found in the literature by writing $W$ instead of $W_\psi$ and by using the terminology "the wavelet transform" instead of, say, "the wavelet transform with respect to $\psi$."

EXAMPLE 11.14 Illustrates Definition 11.7
Let $T > 0$ and set $\psi = \chi_{[-T,T]}$. From Plancherel's theorem and Theorem 11.14, we have for each $f \in \mathcal{L}^2(\mathcal{R})$ that
$$Wf(a,b) = \langle\hat f, \widehat{\psi_{a,b}}\rangle = \sqrt{a}\int_{-\infty}^{\infty}\hat f(t)\,\overline{\hat\psi(at)}\,e^{ibt}\,dt.$$
Referring to Example 11.5 on page 655, which gives $\hat\psi(t) = \sqrt{2/\pi}\,T\operatorname{sinc}(Tt)$, we conclude that
$$Wf(a,b) = T\sqrt{\frac{2a}{\pi}}\int_{-\infty}^{\infty}\hat f(t)\operatorname{sinc}(atT)\,e^{ibt}\,dt.$$
Replacing $f$ by $\hat f$ and again applying Plancherel's theorem, we obtain that
$$W\hat f(a,b) = T\sqrt{\frac{2a}{\pi}}\int_{-\infty}^{\infty}\hat{\hat f}(t)\operatorname{sinc}(atT)\,e^{ibt}\,dt = T\sqrt{\frac{2a}{\pi}}\int_{-\infty}^{\infty}f(x)\operatorname{sinc}(axT)\,e^{-ibx}\,dx.$$
If $f$ is also in $\mathcal{L}^1(\mathcal{R})$, then we can use the dominated convergence theorem to conclude that
$$\lim_{a\to0^+}\frac{1}{2T\sqrt{a}}\,W\hat f(a,t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)e^{-itx}\,dx = \hat f(t).$$
Thus, we obtain the Fourier transform of $f$ as a limiting case of a wavelet transform. □

If $\psi$ is a wavelet, that is, if $\{\psi_{(n,m)} : n, m \in \mathbb{Z}\}$ happens to be an orthonormal wavelet basis for $\mathcal{L}^2(\mathcal{R})$, then $f$ can be recovered from its wavelet transform via
$$f = \sum_{n,m}\langle f, \psi_{(n,m)}\rangle\psi_{(n,m)} = \sum_{n,m}Wf(2^n, 2^nm)\,\psi_{2^n,2^nm}.$$
This suggests, in general, the heuristic formula
$$f = \int_{(0,\infty)\times\mathcal{R}}Wf(a,b)\,\psi_{a,b}\,d\mu(a,b) \tag{11.46}$$
for recovering a function from its wavelet transform. In what follows, we will show how sense can be made of (11.46) by choosing the measure $\mu$ appropriately and imposing mild restrictions on $\psi$.

We begin by deriving the measure $\mu$. By Plancherel's theorem and Theorem 11.14 on page 676, we have
$$Wf(a,b) = \sqrt{a}\int_{-\infty}^{\infty}\hat f(t)\,\overline{\hat\psi(at)}\,e^{ibt}\,dt = \hat F(-b),$$
where $F(t) = \sqrt{2\pi a}\,\hat f(t)\,\overline{\hat\psi(at)}$. Again applying Plancherel's theorem, we obtain, for each $a > 0$, that
$$\int_{-\infty}^{\infty}|Wf(a,b)|^2\,db = \int_{-\infty}^{\infty}|\hat F(-b)|^2\,db = \int_{-\infty}^{\infty}|F(t)|^2\,dt = 2\pi a\int_{-\infty}^{\infty}|\hat f(t)|^2\,|\hat\psi(at)|^2\,dt.$$
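Multiplying by $a^{-2}$ and integrating over $(0,\infty)$ yields
$$\begin{aligned}\int_0^{\infty}\int_{-\infty}^{\infty}|Wf(a,b)|^2\frac{1}{a^2}\,db\,da &= \int_{-\infty}^{\infty}2\pi|\hat f(t)|^2\int_0^{\infty}\frac{1}{a}|\hat\psi(at)|^2\,da\,dt\\ &= \int_0^{\infty}2\pi|\hat f(t)|^2\int_0^{\infty}\frac{1}{s}|\hat\psi(s)|^2\,ds\,dt - \int_{-\infty}^{0}2\pi|\hat f(t)|^2\int_{-\infty}^{0}\frac{1}{s}|\hat\psi(s)|^2\,ds\,dt.\end{aligned} \tag{11.47}$$
We are now ready to impose a restriction on $\psi$, namely, that
$$-\int_{-\infty}^{0}\frac{1}{s}|\hat\psi(s)|^2\,ds = \int_0^{\infty}\frac{1}{s}|\hat\psi(s)|^2\,ds = C_\psi < \infty.$$
With this restriction on $\psi$, we can prove a theorem for the wavelet transform that is analogous to Plancherel's theorem.

THEOREM 11.16 Suppose that $\psi \in \mathcal{L}^2(\mathcal{R}) \setminus \{0\}$ and
$$-\int_{-\infty}^{0}\frac{1}{s}|\hat\psi(s)|^2\,ds = \int_0^{\infty}\frac{1}{s}|\hat\psi(s)|^2\,ds = C_\psi < \infty. \tag{11.48}$$
Define the Borel measure $\mu_0$ on $(0,\infty)$ via
$$\mu_0(B) = \frac{1}{2\pi C_\psi}\int_B a^{-2}\,d\lambda(a), \qquad B \in \mathcal{B},$$
and let $\mu = \mu_0 \times \lambda$. Then the wavelet transform is a linear operator from $\mathcal{L}^2(\mathcal{R})$ to $\mathcal{L}^2(\mu)$ that satisfies
$$\|Wf\|_{2,\mu} = \|f\|, \qquad f \in \mathcal{L}^2(\mathcal{R}), \tag{11.49}$$
where $\|\ \|_{2,\mu}$ denotes the $\mathcal{L}^2$-norm on $\mathcal{L}^2(\mu)$.

PROOF: It follows from (11.47), (11.48), and Plancherel's theorem that
$$\int_{(0,\infty)\times\mathcal{R}}|Wf(a,b)|^2\,d\mu(a,b) = \frac{1}{2\pi C_\psi}\int_0^{\infty}\int_{-\infty}^{\infty}|Wf(a,b)|^2\frac{1}{a^2}\,db\,da = \int_{-\infty}^{\infty}|\hat f(t)|^2\,dt = \int_{-\infty}^{\infty}|f(x)|^2\,dx.$$
Thus, (11.49) is valid.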
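Definition 11.7 can be made concrete with the Haar wavelet $h$ of Example 11.12. For $a > 0$, the function $h_{a,b}$ is supported on $[b, b+a)$. Take the hypothetical test function $f(x) = x\chi_{[0,10)}(x)$ with $[b, b+a) \subset [0,10)$. A direct computation gives $Wf(a,b) = a^{-1/2}\bigl(\int_b^{b+a/2}x\,dx - \int_{b+a/2}^{b+a}x\,dx\bigr) = -a^{3/2}/4$. The Python sketch below approximates $Wf(a,b) = \langle f, \psi_{a,b}\rangle$ with a midpoint rule. The function names, the integration window, and the step count are illustrative assumptions.

```python
import math

def haar(x):
    """Haar wavelet h: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1:
        return -1.0
    return 0.0

def wavelet_transform(f, psi, a, b, lo, hi, steps=20_000):
    """Midpoint-rule approximation of Wf(a, b) = ∫ f(x) conj(ψ_{a,b}(x)) dx.

    ψ_{a,b}(x) = |a|^(-1/2) ψ((x - b)/a); f is assumed to vanish outside [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f(x) * complex(psi((x - b) / a)).conjugate() / math.sqrt(abs(a))
    return total * h

def ramp(x):
    """Hypothetical test function f(x) = x on [0, 10), 0 elsewhere."""
    return x if 0 <= x < 10 else 0.0

for a, b in ((1.0, 2.0), (4.0, 1.0), (0.5, 7.25)):
    approx = wavelet_transform(ramp, haar, a, b, 0.0, 10.0)
    print(a, b, approx.real, -a ** 1.5 / 4)
```

In these examples the integrand is linear on each cell, and the breakpoints $b$, $b + a/2$, $b + a$ fall on cell boundaries. The midpoint rule is therefore exact here up to rounding.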
Theorem 11.16 provides a likely candidate for the measure $\mu$ appearing in the heuristic formula (11.46). Still, the problem of correctly interpreting (11.46) remains. One possible approach, explored in Exercise 11.83, is to show that under appropriate conditions
$$f(x) = \int_{(0,\infty)\times\mathcal{R}}Wf(a,b)\,\psi_{a,b}(x)\,d\mu(a,b)$$
for almost all $x$. A more subtle, but easier to prove, interpretation is based on the following theorem, whose verification is left to the reader as Exercise 11.86.

THEOREM 11.17 Suppose that $\psi \in \mathcal{L}^2(\mathcal{R}) \setminus \{0\}$ and
$$-\int_{-\infty}^{0}\frac{1}{s}|\hat\psi(s)|^2\,ds = \int_0^{\infty}\frac{1}{s}|\hat\psi(s)|^2\,ds = C_\psi < \infty.$$
Then (11.46) is valid in the sense that
$$\langle f, g\rangle = \int_{(0,\infty)\times\mathcal{R}}Wf(a,b)\,\langle\psi_{a,b}, g\rangle\,d\mu(a,b), \qquad f, g \in \mathcal{L}^2(\mathcal{R}),$$
where $\mu$ is defined as in Theorem 11.16.

The theory of wavelets is an important and active research area. As a starting point for the interested reader, we recommend the paper "Wavelet transforms and orthonormal wavelet bases" by I. Daubechies (Proceedings of Symposia in Applied Mathematics, Vol. 47, American Math. Soc., Providence, RI, 1993).

EXERCISES 11.7

11.77 Provide the details of the proof of Lemma 11.7 on page 685.

11.78 Provide the details of the proof of Lemma 11.8 on page 685.

11.79 Provide the details of the proof of Lemma 11.11 on page 687.

11.80 Show that (11.48) on page 693 is satisfied if $\psi$ is real-valued and $\hat\psi$ vanishes in some open interval containing 0.

11.81 Show that (11.48) on page 693 is satisfied if $\psi$ is real-valued, $\hat\psi$ is continuous in some open interval containing 0, $\hat\psi(0) = 0$, and $\hat\psi'(0)$ exists.
11.82 Show that (11.48) on page 693 is satisfied by the Haar wavelet, $h$, discussed in Example 11.12 on page 679. Find $C_h$ in this case.

11.83 Suppose that $\psi$ satisfies (11.48), $\mu$ is defined as in Theorem 11.16, and $f, \hat f \in \mathcal{L}^2(\mathcal{R}) \cap \mathcal{L}^1(\mathcal{R})$. Prove that
$$f(x) = \int_{(0,\infty)\times\mathcal{R}}Wf(a,b)\,\psi_{a,b}(x)\,d\mu(a,b)$$
for almost all $x \in \mathcal{R}$.

11.84 Consider the Hermite function $h_1$ discussed in Exercises 11.59-11.66.
a) Find $C_{h_1}$.
b) Determine $Wh_0$.

11.85 Find a formula for $Wf_{c,d}$ in terms of $Wf$.

11.86 Prove Theorem 11.17.
Claude Elwood Shannon (1916- )

Claude Elwood Shannon was born in Gaylord, Michigan, on April 30, 1916. In 1936, he obtained a bachelor's degree at the University of Michigan; in 1940, he was awarded both a master's degree and a doctorate in mathematics at the Massachusetts Institute of Technology. After working as a National Research Fellow at Princeton University for a year, he joined the staff at Bell Telephone Laboratories in 1941.

Shannon's charge at Bell Labs was to determine the most efficient method of transmitting information. His success in presenting the transmission of information as a precise mathematical theory has led to his being regarded as one of the founders of information theory. Shannon related the relaying of information to a binary system of yes/no choices, represented by a 1/0 binary code, a representation still integral to computer design today.

Shannon published the book The Mathematical Theory of Communication in 1949. In 1956, he accepted the position of Visiting Professor of Electronic Communication at the Massachusetts Institute of Technology; in 1957, he became Professor of Communications Science and Mathematics, and in 1958, Donner Professor of Science.

In addition to communications engineering, Shannon's methods have profoundly influenced several other sciences, including statistics, engineering, biology, and physics. Dr. Shannon is now retired and resides in Cambridge, Massachusetts.
12 Measurable Dynamical Systems

In this chapter we will discuss the theory of measurable dynamical systems. Section 12.1 introduces the theory in three steps: a motivating heuristic illustration, the definition of a measurable dynamical system, and several examples. In Section 12.2 we discuss ergodicity and prove the pointwise ergodic theorem. Section 12.3 examines isomorphisms of measurable dynamical systems and introduces entropy. Then, in Section 12.4, we investigate the entropy of a Bernoulli shift.

12.1 INTRODUCTION AND EXAMPLES

To introduce this chapter, we construct a simple heuristic model that illustrates the idea of a measurable dynamical system. Imagine a particle $p$ confined in some compact region $\Omega \subset \mathcal{R}^3$. Suppose that $p$ moves around inside $\Omega$ according to the following rule: If $p$ is at $x$ at time $n$, it moves to $\varphi(x)$ at time $n+1$, where $\varphi\colon \Omega \to \Omega$ is a function that does not depend on $n$. Under this rule the particle is always moving in $\Omega$, but the law governing its movement remains constant for all time.
For $A \subset \Omega$, let
$$\mu_n(A) = \begin{cases} 1, & \text{if } p \in A \text{ at time } n;\\ 0, & \text{otherwise.}\end{cases}$$
Then the expression
$$\mu(A) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} \mu_k(A)$$
represents an average over time of the number of visits of the particle to the set $A$, that is, the number of visits to $A$ per unit time by the particle. Let $\mathcal{A}$ denote the collection of subsets of $\Omega$ for which the previous limit exists. Clearly, $\emptyset$ and $\Omega$ belong to $\mathcal{A}$, and it is easy to see that $\mu$ satisfies the following conditions:
• $\mu(A) \ge 0$ for all $A \in \mathcal{A}$.
• $\mu(\emptyset) = 0$.
• $\mu(\Omega) = 1$.
• If $A, B \in \mathcal{A}$ are disjoint, then $\mu(A \cup B) = \mu(A) + \mu(B)$.
Consequently, the triple $(\Omega, \mathcal{A}, \mu)$ resembles a probability space. Suppose that, indeed,
$$(\Omega, \mathcal{A}, \mu) \text{ is a probability space.} \tag{12.1}$$
The particle is in $A$ at time $n$ if and only if it is in $\varphi^{-1}(A)$ at time $n-1$, so $\mu_{n-1}(\varphi^{-1}(A)) = \mu_n(A)$. It follows that
$$A \in \mathcal{A} \implies \varphi^{-1}(A) \in \mathcal{A} \tag{12.2}$$
and
$$A \in \mathcal{A} \implies \mu(A) = \mu(\varphi^{-1}(A)). \tag{12.3}$$
Thus, a quadruple $(\Omega, \mathcal{A}, \mu, \varphi)$ satisfying (12.1)-(12.3) models the average behavior of the simple particle motion described above. Formally, we have the following definition.

DEFINITION 12.1 Invariant Measure, Measurable Dynamical System
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $\varphi\colon \Omega \to \Omega$ and that $\varphi^{-1}(A) \in \mathcal{A}$ for all $A \in \mathcal{A}$. Then $\mu$ is said to be invariant with respect to $\varphi$ if
$$\mu(A) = \mu(\varphi^{-1}(A)), \qquad A \in \mathcal{A}.$$
If $\mu$ is invariant with respect to $\varphi$ and is also a probability measure, then the quadruple $(\Omega, \mathcal{A}, \mu, \varphi)$ is called a measurable dynamical system.

In the remainder of this section, we present a variety of examples of measurable dynamical systems that show their relevance and importance.

EXAMPLE 12.1 Addition Modulo One
Let the operation $\dotplus$ be defined on $[0,1)$ by
$$x \dotplus y = (x+y) \bmod 1 = \begin{cases} x+y, & \text{if } x+y < 1;\\ x+y-1, & \text{if } x+y \ge 1.\end{cases}$$
For fixed $b \in [0,1)$, let $\varphi_b(x) = x \dotplus b$. Then $([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)}, \varphi_b)$ is a measurable dynamical system. □

EXAMPLE 12.2 Rotation Through an Angle
Let $E$ be the map from $[0,1)$ onto the unit circle $\mathbb{T}$ in the complex plane defined by $E(x) = e^{2\pi i x}$, and let $\mathcal{A} = \{ A \subset \mathbb{T} : E^{-1}(A) \in \mathcal{M}_{[0,1)} \}$. Define the measure $\mu$ on $\mathcal{A}$ by $\mu(A) = \lambda(E^{-1}(A))$, so that $\mu$ is normalized arc-length measure on $\mathbb{T}$. Also, for fixed $b \in [0,1)$, define $\psi_b\colon \mathbb{T} \to \mathbb{T}$ by $\psi_b(z) = e^{2\pi i b} z$. Then $(\mathbb{T}, \mathcal{A}, \mu, \psi_b)$ is a measurable dynamical system. In a sense that will be made precise later, this example is the same as the previous one. □

EXAMPLE 12.3 Multiplication by 2 Mod One
Replace the mapping $\varphi_b$ in Example 12.1 by $\varphi(x) = 2x \bmod 1 = x \dotplus x$. As the reader is asked to verify in Exercise 12.1, $\varphi$ is measurable with respect to $\mathcal{M}_{[0,1)}$ and $\lambda_{[0,1)}$ is invariant with respect to $\varphi$. Consequently, $([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)}, \varphi)$ is a measurable dynamical system. Note that if $x \in [0,1)$ has the binary expansion $x = 0.x_1x_2x_3\ldots_{(2)}$, then $\varphi(x) = 0.x_2x_3\ldots_{(2)}$. □

EXAMPLE 12.4 Bernoulli Schemes
Let $S = \{1, 2, \ldots, N\}$, where $N \ge 2$, and let $p = (p_1, p_2, \ldots, p_N)$, where $p_j > 0$ for each $j \in S$ and $\sum_{j=1}^{N} p_j = 1$. The vector $p$ defines a probability measure $\mu_0$ on $S$ via $\mu_0(\{j\}) = p_j$.
Recall that the Cartesian product $\Omega = S^{\mathcal{Z}}$ consists of all functions on the integers having values in $S$. Equivalently, it consists of all doubly infinite sequences of elements of $S$. From $\mu_0$ we will construct a probability measure $\mu$ on $\Omega$ by extending the development of product measure given in Theorem 4.20 on page 254.

To begin, let $F$ be a finite set of integers and $a$ a function from $F$ into $S$. Then we define
$$C_{F,a} = \{ f \in \Omega : f(j) = a(j) \text{ for } j \in F \}.$$
Denote by $\mathcal{C}$ the collection of subsets of $\Omega$ consisting of $\emptyset$, $\Omega$, and all sets of the form $C_{F,a}$. Next we define a set function $\iota$ on $\mathcal{C}$ by letting $\iota(\emptyset) = 0$, $\iota(\Omega) = 1$, and
$$\iota(C_{F,a}) = \prod_{j\in F} \mu_0(\{a(j)\}).$$
Exercise 12.2 asks the reader to show that $\mathcal{C}$ is a semialgebra of subsets of $\Omega$. Exercise 12.3 asks the reader to show that $\iota$ satisfies conditions (E1)-(E3) on page 208 and condition (E4) of Theorem 4.12 on page 216. Consequently, by that theorem, $\iota$ extends uniquely to a probability measure $\mu$ on the $\sigma$-algebra $\mathcal{A}$ generated by $\mathcal{C}$. We now have a probability space $(\Omega, \mathcal{A}, \mu)$.

Next we define the function $\varphi\colon \Omega \to \Omega$ by $\varphi(f)(j) = f(j+1)$. If we regard the elements of $\Omega$ as doubly infinite sequences, then $\varphi$ moves each term of a sequence $f$ one place to the left. For this reason, the mapping $\varphi$ is often called a Bernoulli shift. It is easy to see that
$$\varphi^{-1}(C_{F,a}) = C_{F^*,a^*}, \tag{12.4}$$
where $F^* = \{ j+1 : j \in F \}$ and $a^*(j+1) = a(j)$. It follows that the $\sigma$-algebra $\{ A \subset \Omega : \varphi^{-1}(A) \in \mathcal{A} \}$ contains $\mathcal{C}$ and, hence, $\varphi^{-1}(A) \in \mathcal{A}$ for each $A \in \mathcal{A}$.

We claim that the measure $\mu$ is invariant with respect to $\varphi$. Let the measure $\nu$ be defined on $\mathcal{A}$ by $\nu(A) = \mu(\varphi^{-1}(A))$. By (12.4), we have
$$\nu(C_{F,a}) = \mu(C_{F^*,a^*}) = \prod_{j\in F^*} \mu_0(\{a^*(j)\}) = \prod_{j\in F} \mu_0(\{a(j)\}) = \mu(C_{F,a}).$$
Thus, $\nu$ and $\mu$ agree on $\mathcal{C}$ and so, by Theorem 4.12, $\nu = \mu$. This means that $\mu$ is invariant with respect to $\varphi$.
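The identity (12.4) and the product formula for $\iota$ are finite, combinatorial statements, so they can be checked mechanically. The Python sketch below is only an illustration and makes three assumptions of its own. A cylinder $C_{F,a}$ is represented as a dictionary `{j: a(j)}`. A point $f \in \Omega$ is represented by a finite window of its values. The probability vector $p = (1/5, 1/2, 3/10)$ is a hypothetical choice. Exact rational arithmetic (`fractions.Fraction`) avoids rounding.

```python
from fractions import Fraction
from math import prod


def cylinder_measure(cyl, p):
    """iota(C_{F,a}) = product over j in F of mu_0({a(j)}) = p[a(j)].

    A cylinder C_{F,a} is stored as a dict {j: a(j)} for j in the finite set F.
    """
    return prod((p[s] for s in cyl.values()), start=Fraction(1))


def shift_preimage(cyl):
    """(12.4): phi^{-1}(C_{F,a}) = C_{F*,a*}, F* = {j+1 : j in F}, a*(j+1) = a(j)."""
    return {j + 1: s for j, s in cyl.items()}


def shift(window):
    """Left shift on a finite window {j: f(j)} of a sequence: phi(f)(j) = f(j+1)."""
    return {j - 1: s for j, s in window.items()}


def in_cylinder(window, cyl):
    """True if the window agrees with a on F (the window must cover F)."""
    return all(window[j] == s for j, s in cyl.items())


# Hypothetical data: N = 3 and p = (1/5, 1/2, 3/10).
p = {1: Fraction(1, 5), 2: Fraction(1, 2), 3: Fraction(3, 10)}
cyl = {-2: 1, 0: 3, 5: 2}          # F = {-2, 0, 5}, a(-2) = 1, a(0) = 3, a(5) = 2
pre = shift_preimage(cyl)

print(pre)                                                  # {-1: 1, 1: 3, 6: 2}
print(cylinder_measure(cyl, p), cylinder_measure(pre, p))   # 3/100 3/100

# f lies in phi^{-1}(C_{F,a}) exactly when phi(f) lies in C_{F,a}.
window = {j: 1 for j in range(-5, 10)}
window[1], window[6] = 3, 2
print(in_cylinder(shift(window), cyl), in_cylinder(window, pre))  # True True
```

Finite windows are enough here because membership in a cylinder depends only on the finitely many coordinates in $F$.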
We have shown that $(\Omega, \mathcal{A}, \mu, \varphi)$ is a measurable dynamical system. This system is known in the literature as a Bernoulli scheme and is often denoted by $B(p_1, p_2, \ldots, p_N)$. □

EXAMPLE 12.5 Continued Fraction Expansions
Decimal and binary expansions are familiar ways of representing real numbers. The expansion of a real number as a continued fraction is less familiar but still useful and interesting. This expansion is based on iteration of the function $\varphi$ defined on $[0,1)$ by
$$\varphi(x) = \begin{cases} 1/x - \lfloor 1/x \rfloor, & \text{for } x \ne 0;\\ 0, & \text{for } x = 0,\end{cases}$$
where $\lfloor\ \rfloor$ denotes the greatest integer function. We can express $x$ in terms of $\varphi(x)$ by
$$x = \frac{1}{a(x) + \varphi(x)}, \tag{12.5}$$
where
$$a(x) = \begin{cases} \lfloor 1/x \rfloor, & \text{if } x \ne 0;\\ \infty, & \text{if } x = 0.\end{cases}$$
Replacing $x$ by $\varphi(x)$ and substituting into the right-hand side of (12.5), we obtain
$$x = \cfrac{1}{a(x) + \cfrac{1}{a(\varphi(x)) + \varphi(\varphi(x))}}.$$
Repeating this procedure, we obtain
$$x = \cfrac{1}{a(x) + \cfrac{1}{a(\varphi(x)) + \cfrac{1}{a(\varphi(\varphi(x))) + \cfrac{1}{\ddots + \cfrac{1}{a(\varphi^{(n-1)}(x)) + \varphi^{(n)}(x)}}}}},$$
where $\varphi^{(n)}$ indicates the $n$th iterate of $\varphi$. As the reader is asked to show in Exercise 12.4, the sequence of quotients
$$x_n = \cfrac{1}{a(x) + \cfrac{1}{a(\varphi(x)) + \cfrac{1}{a(\varphi(\varphi(x))) + \cfrac{1}{\ddots + \cfrac{1}{a(\varphi^{(n)}(x))}}}}}$$
converges to $x$. Thus, we have the continued fraction expansion
$$x = \cfrac{1}{a(x) + \cfrac{1}{a(\varphi(x)) + \cfrac{1}{a(\varphi(\varphi(x))) + \cdots}}} \tag{12.6}$$
Clearly, the mapping $\varphi$ is the key element for obtaining (12.6). We will relate a measurable dynamical system to the continued fraction expansion. To do so, we find a probability measure $\mu$ on the Borel subsets of $[0,1)$ that is invariant with respect to $\varphi$.

To obtain $\mu$, we begin by deriving a necessary condition for a Borel measure on $[0,1)$ to be both invariant with respect to $\varphi$ and absolutely continuous with respect to $\lambda_{[0,1)}$. Suppose, then, that $\mu$ is a Borel measure on $[0,1)$ with these two properties. Set $g = d\mu/d\lambda$. Then, for each $t \in (0,1)$, we have
$$\int_{(0,t)} g(x)\,d\lambda(x) = \int_{\varphi^{-1}((0,t))} g(x)\,d\lambda(x).$$
Using
$$\varphi^{-1}((0,t)) = \bigcup_{k=1}^{\infty} \{ x : \lfloor 1/x \rfloor = k \text{ and } 0 < \varphi(x) < t \} = \bigcup_{k=1}^{\infty} \left( \frac{1}{k+t}, \frac{1}{k} \right),$$
we obtain
$$\int_0^t g(x)\,dx = \sum_{k=1}^{\infty} \int_{1/(k+t)}^{1/k} g(x)\,dx. \tag{12.7}$$
Ignoring questions of convergence, we differentiate both sides of (12.7) to get the equation
$$g(t) = \sum_{k=1}^{\infty} g\!\left(\frac{1}{t+k}\right)\frac{1}{(t+k)^2}. \tag{12.8}$$
Equation (12.8) looks formidable. To find a solution, it helps to recast it as a functional equation:
$$g(t) = g\!\left(\frac{1}{t+1}\right)\frac{1}{(t+1)^2} + \sum_{k=1}^{\infty} g\!\left(\frac{1}{t+k+1}\right)\frac{1}{(t+k+1)^2} = g\!\left(\frac{1}{t+1}\right)\frac{1}{(t+1)^2} + g(t+1). \tag{12.9}$$
The form of (12.9) suggests that we look for solutions of the type $g(t) = (t+1)^{\alpha}$. Substituting for $g(t)$ in (12.9) gives
$$(t+1)^{\alpha} = \left(\frac{1}{t+1} + 1\right)^{\alpha}\frac{1}{(t+1)^2} + (t+2)^{\alpha}. \tag{12.10}$$
It is not hard to see that (12.10) is satisfied for all $t \in [0,1)$ if $\alpha = -1$.

The preceding informal argument suggests that measures on $[0,1)$ of the form
$$\mu(B) = c\int_B \frac{1}{x+1}\,d\lambda(x)$$
are invariant with respect to the transformation $\varphi$. It is left for Exercise 12.5 to verify this suggestion formally. The choice $c = (\log 2)^{-1}$ yields an invariant probability measure on $[0,1)$. □

EXAMPLE 12.6 Hamiltonian Systems
Consider the system of differential equations
$$\frac{dq_j}{dt} = \frac{\partial H}{\partial p_j}, \qquad \frac{dp_j}{dt} = -\frac{\partial H}{\partial q_j}, \qquad j = 1, 2, \ldots, 3n, \tag{12.11}$$
where $H$ is a function on $\mathcal{R}^{6n}$ of the form
$$H(p,q) = H(p_1, p_2, \ldots, p_{3n}, q_1, q_2, \ldots, q_{3n}) = \frac{1}{2}\sum_{j=1}^{3n} \frac{p_j^2}{m_j} + V(q_1, q_2, \ldots, q_{3n}).$$
Such systems of differential equations are important in mechanics. There, for $1 \le j \le n$, the vector $(q_{3j-2}, q_{3j-1}, q_{3j})$ represents the position and $(p_{3j-2}, p_{3j-1}, p_{3j})$ the momentum of the $j$th of $n$ particles moving in $\mathcal{R}^3$. The term $\frac{1}{2}\sum_{j=1}^{3n} p_j^2/m_j$ gives the total kinetic energy of the $n$ particles, and $V(q_1, q_2, \ldots, q_{3n})$ is the energy associated with interactions of the $n$ particles.

Assume that $V$ is reasonably well behaved. It then follows from the general theory of differential equations that, for each
$$(x,y) = (x_1, x_2, \ldots, x_{3n}, y_1, y_2, \ldots, y_{3n}) \in \mathcal{R}^{6n},$$
there is a unique solution
$$\alpha(t,x,y) = \bigl(p_1(t,x,y), \ldots, p_{3n}(t,x,y), q_1(t,x,y), \ldots, q_{3n}(t,x,y)\bigr)$$
to the system that is defined for all $t$ and satisfies $\alpha(0,x,y) = (x,y)$. Also, under appropriate hypotheses on the function $V$, it can be shown that, for $j = 1, 2, \ldots, 3n$, all second-order partial derivatives of the functions $p_j(t,x,y)$ and $q_j(t,x,y)$ with respect to each of the variables $t, x_1, \ldots, y_{3n}$ exist and are continuous.

For fixed $t$, the function $\varphi_t(x,y) = \alpha(t,x,y)$ maps $\mathcal{R}^{6n}$ into itself. We will show that
$$\lambda_{6n}(E) = \lambda_{6n}(\varphi_t^{-1}(E)), \qquad E \in \mathcal{M}_{6n}. \tag{12.12}$$
This result is known as Liouville's theorem. To obtain (12.12), we first apply the change of variable formula from advanced calculus† to conclude that
$$\int_B |\det J\varphi_t|\,d\lambda_{6n} = \int_{\varphi_t(B)} d\lambda_{6n}$$
whenever $B$ is a Cartesian product of bounded intervals. Then we use the fact that
$$\det J\varphi_t = 1. \tag{12.13}$$
(See Exercise 12.6.)

Next we will combine (12.12) with an invariance property of $H$ to produce a measurable dynamical system. The property of $H$ that we need is
$$H \circ \varphi_t = H. \tag{12.14}$$

† See, for instance, Protter and Morrey's A First Course in Real Analysis, 2nd edition (New York: Springer-Verlag, 1991), p. 366.
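Liouville's theorem (12.12), together with (12.13) and (12.14), can be illustrated numerically. The sketch below makes several assumptions that are not in the text. It uses one degree of freedom instead of $3n$. It uses the hypothetical Hamiltonian $H(p,q) = p^2/2 - \cos q$, a pendulum with $m = 1$ and $V(q) = -\cos q$. It replaces the exact flow $\varphi_t$ with the Störmer-Verlet (leapfrog) integrator. Each leapfrog step is a composition of shears, each of determinant 1, so the numerical map is itself area preserving. Energy, by contrast, is conserved only up to a small discretization error.

```python
import math


def dV(q):
    """Derivative of the hypothetical potential V(q) = -cos q (a pendulum)."""
    return math.sin(q)


def H(p, q):
    """H(p, q) = p^2/2 + V(q) with m = 1 and V(q) = -cos q."""
    return 0.5 * p * p - math.cos(q)


def flow(p, q, t, dt=1e-3):
    """Approximate phi_t(p, q) with Stormer-Verlet (leapfrog) steps for
    dq/dt = dH/dp = p and dp/dt = -dH/dq = -V'(q)."""
    for _ in range(round(t / dt)):
        p -= 0.5 * dt * dV(q)   # half kick
        q += dt * p             # drift
        p -= 0.5 * dt * dV(q)   # half kick
    return p, q


def jacobian_det(p, q, t, h=1e-5):
    """Determinant of the Jacobian of (p, q) -> flow(p, q, t), by central differences."""
    pp, qp = flow(p + h, q, t)
    pm, qm = flow(p - h, q, t)
    dP_dp, dQ_dp = (pp - pm) / (2 * h), (qp - qm) / (2 * h)
    pp, qp = flow(p, q + h, t)
    pm, qm = flow(p, q - h, t)
    dP_dq, dQ_dq = (pp - pm) / (2 * h), (qp - qm) / (2 * h)
    return dP_dp * dQ_dq - dP_dq * dQ_dp


p0, q0, t = 0.3, 0.5, 5.0
p1, q1 = flow(p0, q0, t)
print(jacobian_det(p0, q0, t))    # very close to 1, as in (12.13)
print(H(p1, q1) - H(p0, q0))      # small energy drift, cf. (12.14)
```

An exact flow would conserve $H$ exactly. Here the drift is a property of the integrator, not of (12.11).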
To obtain (12.14), we use the chain rule and (12.11). We have
$$\frac{d}{dt}H(\varphi_t(x,y)) = \sum_{j=1}^{3n}\left(\frac{\partial H}{\partial p_j}\frac{dp_j}{dt} + \frac{\partial H}{\partial q_j}\frac{dq_j}{dt}\right) = \sum_{j=1}^{3n}\left(-\frac{\partial H}{\partial p_j}\frac{\partial H}{\partial q_j} + \frac{\partial H}{\partial q_j}\frac{\partial H}{\partial p_j}\right) = 0.$$
It follows that $H(\varphi_t(x,y)) = H(\varphi_0(x,y)) = H(x,y)$ and, hence, (12.14) holds.

For each $c \in \mathcal{R}$, let $\Omega_c = H^{-1}((-\infty, c))$. Then, in view of (12.14), we have $\varphi_t(\Omega_c) \subset \Omega_c$. Assume, as in many applications, that $\Omega_c$ is a bounded set with positive Lebesgue measure. Then we can define a probability measure on the Lebesgue measurable subsets of $\Omega_c$ by
$$\mu(E) = \frac{\lambda_{6n}(E)}{\lambda_{6n}(\Omega_c)}.$$
It follows from (12.12) that $\mu$ is invariant with respect to $\varphi_t$ and, consequently, $(\Omega_c, \mathcal{M}_{\Omega_c}, \mu, \varphi_t)$ is a measurable dynamical system. □

EXERCISES 12.1

12.1 Prove that the mapping $\varphi$ in Example 12.3 on page 699 is measurable with respect to $\mathcal{M}_{[0,1)}$ and that $\lambda_{[0,1)}$ is invariant with respect to $\varphi$.

12.2 Show that the collection $\mathcal{C}$ defined in Example 12.4 on page 699 is a semialgebra.

12.3 Show that the set function $\iota$ defined on the collection $\mathcal{C}$ of Example 12.4 satisfies conditions (E1)-(E3) on page 208 and condition (E4) on page 216.

12.4 Prove that the sequence $\{x_n\}_{n=1}^{\infty}$ defined in Example 12.5 on page 702 converges to $x$. Hint: If $x \in \mathcal{Q}$, show that $x_n = x$ for sufficiently large $n$. If $x \notin \mathcal{Q}$, show that $x_n = p_n/q_n$, where $\{p_n\}_{n=1}^{\infty}$ and $\{q_n\}_{n=1}^{\infty}$ are sequences of integers defined recursively as follows: $p_{-1} = 0$, $p_0 = 1$, and $p_n = a_n p_{n-1} + p_{n-2}$ for $n \ge 1$; $q_{-1} = 1$, $q_0 = a(x)$, and $q_n = a_n q_{n-1} + q_{n-2}$ for $n \ge 1$. Here $a_n = a(\varphi^{(n)}(x))$.
12.5 Prove that the measure
$$m(B) = \int_B \frac{1}{x+1}\,d\lambda(x), \qquad B \in \mathcal{B}_{[0,1)},$$
is invariant with respect to the mapping $\varphi$ defined in Example 12.5 on page 701.

12.6 Verify (12.13) on page 704. Hint: Show that $d\bigl(\det J\varphi_t(x,y)\bigr)/dt = 0$.

★12.7 Suppose that $\Omega$ is a compact Hausdorff space and let $\varphi\colon \Omega \to \Omega$ be continuous. Show that there is a regular Borel probability measure $\mu$ on $\Omega$ such that $\mu(\varphi^{-1}(B)) = \mu(B)$ for all Borel subsets $B$ of $\Omega$. Hint: Fix $\omega \in \Omega$ and apply the Hahn-Banach theorem (page 580) using the subadditive function $\sigma\colon C(\Omega, \mathcal{R}) \to \mathcal{R}$ defined by
$$\sigma(f) = \limsup_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f(\varphi^{(k)}(\omega)),$$
where $\varphi^{(k)}$ is the $k$th iterate of $\varphi$.

★12.8 Let $\Omega$ be a compact Hausdorff space and $\varphi\colon \Omega \to \Omega$ be continuous. Show that the collection $I(\varphi)$ of regular Borel probability measures on $\Omega$ that are invariant with respect to $\varphi$ is weak* compact and convex.

12.9 Let $\varphi(x) = x^2$. Show that the only regular Borel probability measures on $[0,1]$ that are invariant with respect to the function $\varphi$ are those of the form $c\delta_0 + (1-c)\delta_1$, where $0 \le c \le 1$.

12.10 Let $\varphi\colon [0,1] \to [0,1]$ be absolutely continuous, strictly increasing, and onto. Set $\psi = \varphi^{-1}$. Show that if $\mu$ is absolutely continuous with respect to $\lambda_{[0,1]}$ and invariant with respect to $\varphi$, then
for almost all $x \in [0,1]$.

12.11 Suppose that $\varphi\colon [0,1] \to [0,1]$ is continuously differentiable and that, for each $x \in [0,1]$, $\varphi^{-1}(\{x\})$ is a finite set. Let $\mu$ be absolutely continuous with respect to $\lambda_{[0,1]}$ and set $g = d\mu/d\lambda$. Show that $\mu$ is invariant with respect to $\varphi$ if and only if
$g(y)$
for almost all $x \in [0,1]$.
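Three pieces of Example 12.5 can be explored with a few lines of Python: the continued fraction construction, the recursion in the hint to Exercise 12.4, and the invariance claimed in Exercise 12.5. This is a sketch, not a proof. The density $1/((\log 2)(1+x))$ is the normalized solution found in Example 12.5. The infinite sum over $k$ is truncated at a hypothetical cutoff `terms`. The invariance check is numerical and covers a single value of $t$.

```python
import math
from fractions import Fraction


def gauss_map(x):
    """phi(x) = 1/x - floor(1/x) for x != 0 and phi(0) = 0 (Example 12.5)."""
    return x if x == 0 else 1 / x - math.floor(1 / x)


def cf_digits(x, n):
    """Up to n partial quotients a(x), a(phi(x)), ...; stops once an iterate hits 0."""
    digits = []
    while len(digits) < n and x != 0:
        digits.append(math.floor(1 / x))
        x = gauss_map(x)
    return digits


def convergents(digits):
    """x_0, x_1, ... as p_n/q_n, using the recursion in the hint to Exercise 12.4."""
    p_prev, p = 0, 1            # p_{-1}, p_0
    q_prev, q = 1, digits[0]    # q_{-1}, q_0 = a(x)
    out = [Fraction(p, q)]
    for a in digits[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append(Fraction(p, q))
    return out


def gauss_measure(a, b):
    """mu((a, b)) for the density 1/((log 2)(1 + x)) suggested by (12.10)."""
    return math.log((1 + b) / (1 + a)) / math.log(2)


def preimage_measure(t, terms=100_000):
    """mu(phi^{-1}((0, t))) as the truncated sum over k of mu((1/(k+t), 1/k))."""
    return sum(gauss_measure(1 / (k + t), 1 / k) for k in range(1, terms + 1))


x = Fraction(13, 30)
print(cf_digits(x, 10))                # [2, 3, 4]
print(convergents(cf_digits(x, 10)))   # [Fraction(1, 2), Fraction(3, 7), Fraction(13, 30)]
print(gauss_measure(0, 0.3), preimage_measure(0.3))   # agree up to a truncation error near 1e-5

# Lebesgue measure is not invariant: lambda(phi^{-1}((0, 1/2))) is 2 - 2 log 2, not 1/2.
print(sum(1 / k - 1 / (k + 0.5) for k in range(1, 100_001)))
```

Exact `Fraction` input makes the digit expansion of a rational number terminate, as the hint to Exercise 12.4 predicts. With floating-point input, the digits become unreliable after a few dozen steps.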
12.2 ERGODIC THEORY

Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system. Recall that, for $n \in \mathcal{N}$, $\varphi^{(n)}$ denotes the $n$th iterate of $\varphi$. We also define $\varphi^{(0)}$ to be the identity function on $\Omega$. For $x \in \Omega$, the sequence
$$x,\ \varphi(x),\ \varphi(\varphi(x)),\ \ldots,\ \varphi^{(n)}(x),\ \ldots$$
is called the orbit of $x$. It describes the path of the point $x$ as it moves in $\Omega$ under iterations of the mapping $\varphi$. Ergodic theory tries to find out as much as possible about this sequence.

In applications, orbits often cannot be observed directly. Instead, data are obtained in the form of numerical sequences
$$f(x),\ f(\varphi(x)),\ f(\varphi(\varphi(x))),\ \ldots,\ f(\varphi^{(n)}(x)),\ \ldots,$$
where $f$ is some function defined on $\Omega$. In this section, we prove some general results about the average behavior of the sequence $\{f(\varphi^{(n-1)}(x))\}_{n=1}^{\infty}$. Specifically, we will first establish that, for each $f \in \mathcal{L}^1(\mu)$, the limit
$$f^* = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f \circ \varphi^{(k)}$$
exists $\mu$-ae. Then we will investigate the important case where $f^*$ is constant $\mu$-ae for all $f \in \mathcal{L}^1(\mu)$.

THEOREM 12.1 Pointwise Ergodic Theorem
For each $f \in \mathcal{L}^1(\mu)$, the limit
$$f^* = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f \circ \varphi^{(k)} \tag{12.15}$$
exists $\mu$-ae. Furthermore, $f^* \in \mathcal{L}^1(\mu)$ and satisfies
$$\int_{\Omega} f^*\,d\mu = \int_{\Omega} f\,d\mu. \tag{12.16}$$
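Theorem 12.1 does not say that $f^*$ is constant. A small exact computation makes this concrete. It assumes the non-ergodic rotation $\varphi_b$ of Example 12.1 with $b = 1/2$ and the hypothetical choice $f = \chi_{[0,1/4)}$. The computation shows $f^*$ taking two different values while (12.16) still holds.

```python
from fractions import Fraction


def rotate(x, b):
    """phi_b(x) = x +. b = (x + b) mod 1 (Example 12.1), exact on Fractions in [0, 1)."""
    y = x + b
    return y - 1 if y >= 1 else y


def time_average(f, x, b, n):
    """(1/n) * sum_{k=0}^{n-1} f(phi_b^{(k)}(x)), computed exactly."""
    total = 0
    for _ in range(n):
        total += f(x)
        x = rotate(x, b)
    return Fraction(total, n)


def f(x):
    """Indicator of the hypothetical set B = [0, 1/4)."""
    return 1 if x < Fraction(1, 4) else 0


b = Fraction(1, 2)                                  # rational, so the system is not ergodic
print(time_average(f, Fraction(1, 10), b, 1000))    # 1/2
print(time_average(f, Fraction(3, 10), b, 1000))    # 0

# Every orbit has period 2, so n = 2 already gives f*(x) exactly.  Averaging f*
# over a midpoint grid reproduces (12.16): the integral of f* equals lambda(B) = 1/4.
grid = [Fraction(2 * i + 1, 2000) for i in range(1000)]
print(sum(time_average(f, x, b, 2) for x in grid) / len(grid))   # 1/4
```

Here $f^* = 1/2$ on $[0,1/4) \cup [1/2,3/4)$ and $f^* = 0$ elsewhere, so $f^*$ is not constant. Its integral is nevertheless $1/4 = \int f\,d\lambda$.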
PROOF:† We will prove the theorem in the special case $f = \chi_B$ and leave the proof of the general case for Exercises 12.13 and 12.14. We begin by considering the number of visits to the set $B$ among the first $n$ terms of the orbit of $x$, that is,
$$S_n(x) = \sum_{k=0}^{n-1} \chi_B \circ \varphi^{(k)}(x),$$
and the average number of visits $A_n(x) = S_n(x)/n$.

Suppose we can show that the functions $\overline{A}(x) = \limsup_{n\to\infty} A_n(x)$ and $\underline{A}(x) = \liminf_{n\to\infty} A_n(x)$ satisfy
$$\int_{\Omega} \overline{A}\,d\mu \le \mu(B) \quad\text{and}\quad \int_{\Omega} \underline{A}\,d\mu \ge \mu(B). \tag{12.17}$$
Then we would have $\int_{\Omega} (\overline{A} - \underline{A})\,d\mu \le 0$. Because $\overline{A} - \underline{A} \ge 0$, it would follow that the limit (12.15) exists $\mu$-ae and that (12.16) holds.

We proceed to verify (12.17). Our arguments will make use of the following properties of the functions $\overline{A}$ and $\underline{A}$:
$$0 \le \underline{A} \le \overline{A} \le 1 \tag{12.18}$$
and
$$\overline{A} \circ \varphi = \overline{A} \quad\text{and}\quad \underline{A} \circ \varphi = \underline{A}. \tag{12.19}$$
(See Exercise 12.12.)

To understand the proof of (12.17), it helps to think of the parameter $n$ as time. Then $A_n(x)$ represents the average number of visits of the orbit of $x$ to the set $B$ by time $n-1$. Let $\epsilon > 0$ and let $\tau_\epsilon(x)$ denote the first time that the average number of visits exceeds $\overline{A}(x) - \epsilon$. Symbolically, we have
$$\tau_\epsilon(x) = \min\{ n \in \mathcal{N} : A_n(x) > \overline{A}(x) - \epsilon \}.$$
We observe that, by (12.18), $\tau_\epsilon(x)$ is always a positive integer. From
$$\{ x : \tau_\epsilon(x) > c \} = \bigcap_{n \le c} \{ x : A_n(x) \le \overline{A}(x) - \epsilon \},$$
it follows that $\tau_\epsilon$ is $\mathcal{A}$-measurable.

† This proof is adapted from one given by M. Keane, "Ergodic Theory and Subshifts of Finite Type," in Ergodic Theory, Symbolic Dynamics and Hyperbolic Spaces, edited by T. Bedford, M. Keane, and C. Series (Oxford, UK: Oxford University Press, 1991). Keane's argument is based on ideas in T. Kamae, "A Simple Proof of the Ergodic Theorem Using Non-Standard Analysis" (Israel J. of Math., 42, pp. 284-290, 1982).
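Either $\tau_\epsilon$ is essentially bounded or it is not; that is, either
$$\tau_\epsilon \in \mathcal{L}^{\infty}(\mu) \tag{12.20}$$
or
$$\tau_\epsilon \notin \mathcal{L}^{\infty}(\mu). \tag{12.21}$$
Suppose first that (12.20) holds. Then we can choose an integer $M$ such that
$$\mu\bigl(\tau_\epsilon^{-1}((M, \infty))\bigr) = 0. \tag{12.22}$$
For each $x \in \Omega$, we consider the sequence of integers
$$\tau_1(x) = \tau_\epsilon(x),\quad \tau_2(x) = \tau_\epsilon\bigl(\varphi^{(\tau_1(x))}(x)\bigr),\quad \tau_3(x) = \tau_\epsilon\bigl(\varphi^{(\tau_1(x)+\tau_2(x))}(x)\bigr),\ \ldots.$$
It follows from (12.22) and the invariance of $\mu$ with respect to $\varphi$ that, for $\mu$-almost all $x$, we have
$$\tau_j(x) \le M, \qquad j \in \mathcal{N}. \tag{12.23}$$
Suppose that $x$ satisfies (12.23). In what follows, we suppress the dependence of $\tau_j$ on $x$. Let $n$ be a positive integer greater than $M$ and let $q$ be such that
$$\sigma_q \le n < \sigma_{q+1},$$
where we are using the notation $\sigma_q = \tau_1 + \tau_2 + \cdots + \tau_q$. Then
$$S_n(x) \ge S_{\sigma_q}(x) = \sum_{k=0}^{\sigma_1-1} \chi_B \circ \varphi^{(k)}(x) + \sum_{k=\sigma_1}^{\sigma_2-1} \chi_B \circ \varphi^{(k)}(x) + \cdots + \sum_{k=\sigma_{q-1}}^{\sigma_q-1} \chi_B \circ \varphi^{(k)}(x)$$
$$= S_{\tau_1}(x) + S_{\tau_2}\bigl(\varphi^{(\sigma_1)}(x)\bigr) + \cdots + S_{\tau_q}\bigl(\varphi^{(\sigma_{q-1})}(x)\bigr).$$
It follows from (12.19) and the definition of $\tau_\epsilon$ that
$$S_{\tau_1}(x) > \tau_1\bigl(\overline{A}(x) - \epsilon\bigr),$$
$$S_{\tau_2}\bigl(\varphi^{(\sigma_1)}(x)\bigr) > \tau_2\bigl(\overline{A}(\varphi^{(\sigma_1)}(x)) - \epsilon\bigr) = \tau_2\bigl(\overline{A}(x) - \epsilon\bigr),$$
$$\vdots$$
$$S_{\tau_q}\bigl(\varphi^{(\sigma_{q-1})}(x)\bigr) > \tau_q\bigl(\overline{A}(\varphi^{(\sigma_{q-1})}(x)) - \epsilon\bigr) = \tau_q\bigl(\overline{A}(x) - \epsilon\bigr).$$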
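Hence,
$$S_n(x) \ge \sigma_q\bigl(\overline{A}(x) - \epsilon\bigr) \ge (n - \tau_{q+1})\bigl(\overline{A}(x) - \epsilon\bigr).$$
Applying (12.23), we conclude that, for $\mu$-almost all $x$, we have the inequality
$$S_n(x) \ge (n - M)\bigl(\overline{A}(x) - \epsilon\bigr). \tag{12.24}$$
We have shown that (12.24) holds for $\mu$-almost all $x \in \Omega$. Integrating both sides of (12.24) and using the invariance of $\mu$ with respect to $\varphi$, we obtain
$$n\mu(B) = \sum_{k=0}^{n-1} \mu\bigl((\varphi^{(k)})^{-1}(B)\bigr) = \int_{\Omega} S_n(x)\,d\mu(x) \ge \int_{\Omega} (n - M)\bigl(\overline{A}(x) - \epsilon\bigr)\,d\mu(x).$$
Dividing by $n$ and letting $n \to \infty$, we get
$$\mu(B) \ge \int_{\Omega} \overline{A}(x)\,d\mu(x) - \epsilon.$$
A similar argument shows that
$$\mu(B) \le \int_{\Omega} \underline{A}(x)\,d\mu(x) + \epsilon.$$
As $\epsilon > 0$ was chosen arbitrarily, we obtain (12.17). Thus, the proof of the theorem is complete in case (12.20) holds.

It remains to establish (12.17) in case (12.21) holds, that is, when $\tau_\epsilon$ is not essentially bounded. The idea is to reduce the proof to the case where (12.20) holds by slightly enlarging the set $B$. Because $\tau_\epsilon$ is finite-valued, we can choose a positive integer $M$ such that $\mu\bigl(\tau_\epsilon^{-1}((M, \infty))\bigr) < \epsilon$. Now we set
$$B_\epsilon = B \cup \tau_\epsilon^{-1}((M, \infty)), \qquad S_n^\epsilon(x) = \sum_{k=0}^{n-1} \chi_{B_\epsilon} \circ \varphi^{(k)}(x), \qquad A_n^\epsilon(x) = S_n^\epsilon(x)/n,$$
and
$$\tilde{\tau}_\epsilon(x) = \min\{ n \in \mathcal{N} : A_n^\epsilon(x) > \overline{A}(x) - \epsilon \}.$$
It follows immediately that $\tilde{\tau}_\epsilon \le \tau_\epsilon$. We claim that
$$\tilde{\tau}_\epsilon(x) \le M, \qquad x \in \Omega. \tag{12.25}$$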
For, if $\tilde{\tau}_\epsilon(x) > M$, then $\tau_\epsilon(x) > M$, so that $x \in B_\epsilon$. Hence, $A_1^\epsilon(x) = 1 > \overline{A}(x) - \epsilon$. But this implies that $\tilde{\tau}_\epsilon(x) = 1 \le M$, a contradiction.

We can now apply the arguments used in the case (12.20) to obtain $\mu(B_\epsilon) \ge \int_{\Omega} \overline{A}(x)\,d\mu(x) - \epsilon$. Therefore,
$$\mu(B) + \epsilon > \mu(B) + \mu\bigl(\tau_\epsilon^{-1}((M, \infty))\bigr) \ge \mu(B_\epsilon) \ge \int_{\Omega} \overline{A}(x)\,d\mu(x) - \epsilon. \tag{12.26}$$
By similar arguments, we obtain that
$$\mu(B) - \epsilon \le \int_{\Omega} \underline{A}(x)\,d\mu(x) + \epsilon. \tag{12.27}$$
From (12.26) and (12.27), we deduce that (12.17) holds.

EXAMPLE 12.7 Illustrates the Pointwise Ergodic Theorem
Consider the Bernoulli scheme of Example 12.4 on page 699. Let $k \in \mathcal{N}$. Define $F\colon \Omega \to \mathcal{R}$ by $F(f) = f(k)$. Because
$$F^{-1}(\{m\}) = \{ f \in \Omega : f(k) = m \} = C_{\{k\},m},$$
$F$ is $\mathcal{A}$-measurable. Applying the pointwise ergodic theorem, we conclude that the average
$$F^*(f) = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(k+j)$$
exists for almost all $f \in \Omega$. We also have
$$\int_{\Omega} F^*\,d\mu = \int_{\Omega} F\,d\mu = \sum_{m=1}^{N} m\,\mu(C_{\{k\},m}) = \sum_{m=1}^{N} m\,p_m,$$
as is easily verified. □

Ergodicity
Many interesting measurable dynamical systems have the property that, for each $f \in \mathcal{L}^1(\mu)$, the average $f^*$ in the pointwise ergodic theorem is constant almost everywhere. In Theorem 12.2 we will see that this property is characterized by the following condition:
$$B \in \mathcal{A} \ \&\ B = \varphi^{-1}(B) \implies \mu(B) = 0 \text{ or } \mu(B) = 1. \tag{12.28}$$
To understand the meaning of (12.28), it helps to consider its negation. Suppose $\Omega_1 \in \mathcal{A}$, $\Omega_1 = \varphi^{-1}(\Omega_1)$, and $0 < \mu(\Omega_1) < 1$. Let $\Omega_2 = \Omega \setminus \Omega_1$. Then we also have $\Omega_2 \in \mathcal{A}$, $\Omega_2 = \varphi^{-1}(\Omega_2)$, and $0 < \mu(\Omega_2) < 1$. For $j = 1, 2$, we define the $\sigma$-algebra $\mathcal{A}_j = \{ A \cap \Omega_j : A \in \mathcal{A} \}$ and a corresponding probability measure $\mu_j(A \cap \Omega_j) = \mu(A \cap \Omega_j)/\mu(\Omega_j)$. Denote by $\varphi_j$ the restriction of the mapping $\varphi$ to $\Omega_j$. We then obtain the two measurable dynamical systems $(\Omega_j, \mathcal{A}_j, \mu_j, \varphi_j)$, $j = 1, 2$. For $x \in \Omega$, the orbit $\{\varphi^{(n-1)}(x)\}_{n=1}^{\infty}$ is contained in either $\Omega_1$ or $\Omega_2$. Indeed, that orbit equals either $\{\varphi_1^{(n-1)}(x)\}_{n=1}^{\infty}$ or $\{\varphi_2^{(n-1)}(x)\}_{n=1}^{\infty}$. Thus, complete information about the orbits of each of the two smaller systems $(\Omega_j, \mathcal{A}_j, \mu_j, \varphi_j)$, $j = 1, 2$, gives complete information about the orbits of $(\Omega, \mathcal{A}, \mu, \varphi)$.

DEFINITION 12.2 Ergodicity
A measurable dynamical system $(\Omega, \mathcal{A}, \mu, \varphi)$ is called ergodic if
$$E \in \mathcal{A} \ \&\ E = \varphi^{-1}(E) \implies \mu(E) = 0 \text{ or } \mu(E) = 1.$$

Exercise 12.21 shows that the measurable dynamical system in Example 12.3 on page 699 is ergodic. Example 12.8, which we will present shortly, shows that the measurable dynamical system in Example 12.2 on page 699 is ergodic if and only if $b$ is irrational.

In the proof of our next theorem, we will need to know that ergodicity is equivalent to
$$E \in \mathcal{A} \ \&\ E \subset \varphi^{-1}(E) \implies \mu(E) = 0 \text{ or } \mu(E) = 1. \tag{12.29}$$
We leave the verification of this fact to the reader as Exercise 12.15.

THEOREM 12.2
Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system. Then the following are equivalent:
a) $(\Omega, \mathcal{A}, \mu, \varphi)$ is ergodic.
b) For each $f \in \mathcal{L}^1(\mu)$, the average
$$f^* = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f \circ \varphi^{(k)}$$
is constant $\mu$-ae.
c) If $f \in \mathcal{L}^1(\mu)$ and $f \circ \varphi = f$ $\mu$-ae, then $f$ is constant $\mu$-ae.
PROOF: The equivalence of (b) and (c) is left for Exercise 12.16. Suppose (a) holds and $f \in \mathcal{L}^1(\mu)$ is such that $f \circ \varphi = f$ $\mu$-ae. To show that $f$ is constant $\mu$-ae, it suffices to consider the case where $f$ is real-valued. Let $D = \{ x \in \Omega : f(x) \ne f \circ \varphi(x) \}$. Then $\mu(D) = 0$. Let $\varphi^{-k} = (\varphi^{(k)})^{-1}$. The invariance of $\mu$ then gives $\mu(\varphi^{-k}(D)) = 0$ for all $k$. Hence,
$$\mu\left(\bigcup_{k=0}^{\infty} \varphi^{-k}(D)\right) \le \sum_{k=0}^{\infty} \mu(\varphi^{-k}(D)) = 0.$$
Let $b \in \mathcal{R}$ and set $E = f^{-1}((-\infty, b)) \setminus \bigcup_{k=0}^{\infty} \varphi^{-k}(D)$. Then we have $\mu(E) = \mu(f^{-1}((-\infty, b)))$ and $E \subset \varphi^{-1}(E)$. By (a) and (12.29), we know that $\mu(f^{-1}((-\infty, b)))$ equals either 0 or 1. It is now immediate that $f$ is constant $\mu$-ae. Consequently, we see that (a) $\Rightarrow$ (c).

Conversely, suppose (c) holds. Let $E \in \mathcal{A}$ be such that $E = \varphi^{-1}(E)$. Then
$$\chi_E \circ \varphi = \chi_{\varphi^{-1}(E)} = \chi_E.$$
Hence, by (c), $\chi_E$ is constant $\mu$-ae. It follows that $\mu(E)$ is either 0 or 1. Thus, we have shown that (c) $\Rightarrow$ (a).

EXAMPLE 12.8 Illustrates Theorem 12.2
Using Theorem 12.2, we will now show that the measurable dynamical system of Example 12.2 on page 699 is ergodic if and only if $b$ is an irrational number. Suppose $f \in \mathcal{L}^1(\mu)$ is such that $f \circ \psi_b = f$. Then the Fourier coefficients of the function $g(x) = f(e^{ix})$ must satisfy
$$\hat{g}(n) = e^{2\pi i n b}\hat{g}(n).$$
If $b$ is irrational, it follows that $\hat{g}(n) = 0$ for all nonzero integers $n$. Thus, $f$ is constant by Theorem 11.1 on page 638. Consequently, we see that $(\mathbb{T}, \mathcal{A}, \mu, \psi_b)$ is ergodic.

On the other hand, if $b$ is rational, say, $b = p/q$, where $p$ and $q$ are integers, then the function $f(z) = z^q$ is nonconstant and satisfies $f \circ \psi_b = f$. Hence, $(\mathbb{T}, \mathcal{A}, \mu, \psi_b)$ is not ergodic. □

From the pointwise ergodic theorem and Theorem 12.2, we obtain the following important corollary.

COROLLARY 12.1
If $(\Omega, \mathcal{A}, \mu, \varphi)$ is ergodic, then, for each $f \in \mathcal{L}^1(\mu)$,
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f(\varphi^{(k)}(x)) = \int_{\Omega} f\,d\mu$$
for almost all $x \in \Omega$.
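Corollary 12.1 and Example 12.8 can be seen side by side in a floating-point experiment. The sketch makes three assumptions. It uses the rotation $x \mapsto (x + b) \bmod 1$ of Example 12.1, which the text later states is isomorphic to $\psi_b$. It uses the hypothetical set $B = [0, 1/3)$. It takes $b = (\sqrt5 - 1)/2$, the golden-ratio conjugate, as an irrational rotation number. Floating point only approximates the exact map, so this illustrates rather than verifies the corollary.

```python
import math


def birkhoff_average(f, x, b, n):
    """(1/n) * sum_{k=0}^{n-1} f(phi_b^{(k)}(x)) for the rotation x -> (x + b) mod 1,
    computed in floating point (so the map itself is only approximated)."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = (x + b) % 1.0
    return total / n


def f(x):
    """Indicator of the hypothetical set B = [0, 1/3); its integral is 1/3."""
    return 1.0 if x < 1 / 3 else 0.0


golden = (math.sqrt(5) - 1) / 2      # hypothetical irrational choice of b

for x0 in (0.0, 0.25, 0.7):
    print(x0, birkhoff_average(f, x0, golden, 100_000))   # each close to 1/3

print(birkhoff_average(f, 0.0, 0.5, 1000))   # 0.5: with b = 1/2 the limit depends on x
```

For the irrational rotation, every starting point gives approximately $\lambda(B) = 1/3$, as Corollary 12.1 predicts. For $b = 1/2$, the orbit of $0$ visits only $\{0, 1/2\}$, and the limit $1/2$ differs from $\lambda(B)$.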
EXERCISES 12.2

12.12 Verify (12.19).

12.13 Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system. Suppose $f \in \mathcal{L}^1(\mu)$ and $f \ge 0$ $\mu$-ae. Let $N \in \mathcal{N}$ and $\epsilon > 0$ be given. Set $S_n(f) = \sum_{k=0}^{n-1} f \circ \varphi^{(k)}$, $A_n(f) = S_n(f)/n$, $\overline{A}(f) = \limsup_{n\to\infty} A_n(f)$, and
$$\tau_\epsilon = \min\{ n \in \mathcal{N} : A_n(f) > \min\{N, \overline{A}(f) - \epsilon\} \}.$$
Show that
$$\int_{\Omega} f\,d\mu \ge \int_{\Omega} \min\{N, \overline{A}(f) - \epsilon\}\,d\mu$$
if $\tau_\epsilon \in \mathcal{L}^{\infty}(\mu)$, and
$$\int_{\Omega} f\,d\mu \ge \int_{\Omega} \min\{N, \overline{A}(f) - \epsilon\}\,d\mu - \epsilon$$
if $\tau_\epsilon \notin \mathcal{L}^{\infty}(\mu)$.

12.14 Use Exercise 12.13 to complete the proof of the pointwise ergodic theorem.

12.15 Prove the equivalence of ergodicity and condition (12.29). Hint: Consider $\bigcap_{n \ge 0} \bigcup_{m \ge n} \varphi^{-m}(E)$.

12.16 Prove the equivalence of (b) and (c) in Theorem 12.2 on page 712.

Exercises 12.17-12.20 are devoted to proving an $\mathcal{L}^2$-version of the pointwise ergodic theorem.

12.17 Let $V$ denote the collection of all $f \in \mathcal{L}^2(\mu)$ such that
$$f^* = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f \circ \varphi^{(k)}$$
exists in the sense of convergence in the $\mathcal{L}^2(\mu)$-norm. Prove that $V$ is a closed linear subspace of $\mathcal{L}^2(\mu)$.

12.18 Let $V$ be as in Exercise 12.17 and let $Y = \{ f \in \mathcal{L}^2(\mu) : f \circ \varphi = f \}$. Show that $Y \subset V$ and $P(f) = f^*$ for all $f \in V$, where $P\colon \mathcal{L}^2(\mu) \to Y$ is the orthogonal projection.

12.19 Refer to Exercises 12.17 and 12.18. Let $Z = \{ f \circ \varphi - f : f \in \mathcal{L}^2(\mu) \}$. Show that $Z \subset V$ and $P(Z) = \{0\}$.
12.20 Refer to Exercises 12.17-12.19. Show that $(Y + Z)^{\perp} = \{0\}$ and deduce that $V = \mathcal{L}^2(\mu)$. This proves the $\mathcal{L}^2$-ergodic theorem: For each $f \in \mathcal{L}^2(\mu)$, the limit
$$f^* = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f \circ \varphi^{(k)}$$
exists in the sense of convergence in the $\mathcal{L}^2(\mu)$-norm. Hint: Show that $h \in (Y + Z)^{\perp} \implies \langle h, f \rangle = \langle h, f \circ \varphi \rangle$.

12.21 Show that the measurable dynamical system of Example 12.3 on page 699 is ergodic by employing the following argument.
a) Show that if $\varphi^{-1}(A) = A$, then $\lambda(A \cap I) = \lambda(A)\lambda(I)$ whenever $I$ is a subinterval of $[0,1)$ of the form $I = [p/2^n, q/2^n)$ for integers $p$ and $q$.
b) Extend the result in part (a) to arbitrary subintervals of $[0,1)$.
c) Show that $\lambda(A) = \lambda(A)^2$.

12.22 Use the Fourier coefficients $c_n = \int_0^1 e^{-2\pi i n x} f(x)\,dx$, $n \in \mathcal{Z}$, to give a second proof, different from the one in Exercise 12.21, that the measurable dynamical system of Example 12.3 on page 699 is ergodic.

12.23 Show that if $(\Omega, \mathcal{A}, \mu_1, \varphi)$ and $(\Omega, \mathcal{A}, \mu_2, \varphi)$ are both ergodic, then either $\mu_1 = \mu_2$ or $\mu_1 \perp \mu_2$.

12.24 Let $\Omega$ be a compact Hausdorff space, $\mathcal{A}$ be the collection of Borel subsets of $\Omega$, and $\varphi\colon \Omega \to \Omega$ be a continuous function. Consider
$$I(\Omega) = \{ \mu \in P(\Omega) : \mu(\varphi^{-1}(A)) = \mu(A) \text{ for all } A \in \mathcal{A} \}.$$
Show that $\mu$ is an extreme point of $I(\Omega)$ if and only if $(\Omega, \mathcal{A}, \mu, \varphi)$ is ergodic. Refer to Exercises 12.7 and 12.8 on page 706.

12.25 Let $\Omega$, $\mathcal{A}$, and $\varphi$ be as in Exercise 12.24. Show that, for each $\nu \in I(\Omega)$, there is a regular Borel measure $\rho$ on the weak* closure of $\operatorname{ex} I(\Omega)$ such that
$$\int_{\overline{\operatorname{ex} I(\Omega)}} \left( \int_{\Omega} f\,d\mu \right) d\rho(\mu) = \int_{\Omega} f\,d\nu, \qquad f \in C(\Omega).$$
Hint: See Theorem 10.15 on page 624.

12.3 ISOMORPHISM OF MEASURABLE DYNAMICAL SYSTEMS; ENTROPY

This section introduces some ideas motivated by the question: "When are two measurable dynamical systems essentially the same?" First we will define what it means for measurable dynamical systems to be isomorphic.
Then we will present a powerful tool for deciding when two measurable dynamical systems are isomorphic, namely, entropy.
DEFINITION 12.3 Isomorphism of Measurable Dynamical Systems
Two measurable dynamical systems $(\Omega, \mathcal{A}, \mu, \varphi)$ and $(\Lambda, \mathcal{B}, \nu, \psi)$ are said to be isomorphic if there are mappings $J\colon \Omega \to \Lambda$ and $K\colon \Lambda \to \Omega$ such that
a) $J^{-1}(B) \in \mathcal{A}$ for each $B \in \mathcal{B}$,
b) $K^{-1}(A) \in \mathcal{B}$ for each $A \in \mathcal{A}$,
c) $\mu(J^{-1}(B)) = \nu(B)$ for each $B \in \mathcal{B}$,
d) $\nu(K^{-1}(A)) = \mu(A)$ for each $A \in \mathcal{A}$,
e) $J \circ \varphi = \psi \circ J$ $\mu$-ae,
f) $K \circ \psi = \varphi \circ K$ $\nu$-ae,
g) $K \circ J(x) = x$ $\mu$-ae,
h) $J \circ K(y) = y$ $\nu$-ae.
Each of the mappings $J$ and $K$ is called an isomorphism.

As the reader is asked to verify in Exercise 12.27, the measurable dynamical systems given in Examples 12.1 (page 699) and 12.2 (page 699) are isomorphic via the mapping $E(x) = e^{2\pi i x}$ defined in the latter.

A more complicated example of a pair of isomorphic measurable dynamical systems is obtained by considering a so-called one-sided variation of the Bernoulli scheme $B(1/2, 1/2)$.

EXAMPLE 12.9 Illustrates Definition 12.3
Refer to Example 12.4 on page 699. The construction of the Bernoulli scheme is unaffected if the space $\Omega = S^{\mathcal{Z}}$ is replaced by $\Omega^+ = S^{\mathcal{N}}$. Consider the case where $S = \{0, 1\}$ and $(p_1, p_2) = (1/2, 1/2)$. The measure $\mu$ is replaced by the measure $\mu^+$ satisfying $\mu^+(C_{F,a}) = 2^{-N(F)}$, where $N(F)$ denotes the number of elements of $F$. The function $\varphi$ is replaced by $\varphi^+((x_1, x_2, x_3, \ldots)) = (x_2, x_3, \ldots)$. Define the mapping $J\colon \Omega^+ \to [0,1)$ by
$$J((x_1, x_2, x_3, \ldots)) = \begin{cases} \sum_{j=1}^{\infty} x_j 2^{-j}, & \text{if } x_j = 0 \text{ for some } j;\\ 0, & \text{otherwise.}\end{cases}$$
It can be shown that $J$ is an isomorphism of $(\Omega^+, \mathcal{A}^+, \mu^+, \varphi^+)$ onto the measurable dynamical system $([0,1), \mathcal{B}_{[0,1)}, \lambda_{[0,1)}, \varphi)$ of Example 12.3 on page 699. See Exercise 12.28. □

The idea of isomorphism immediately suggests the following problem: Given two measurable dynamical systems, determine whether they are isomorphic. A natural approach to this problem is to seek invariants of measurable dynamical systems.
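An invariant of a measurable dynamical system $(\Omega, \mathcal{A}, \mu, \varphi)$ is a number or property, $I(\Omega, \mathcal{A}, \mu, \varphi)$, with the following property: if $(\Omega, \mathcal{A}, \mu, \varphi)$ and $(\Lambda, \mathcal{B}, \nu, \psi)$ are isomorphic, then $I(\Omega, \mathcal{A}, \mu, \varphi)$ and $I(\Lambda, \mathcal{B}, \nu, \psi)$ are identical.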
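For the map $J$ of Example 12.9, condition (e) of Definition 12.3 says that shifting a binary sequence corresponds to doubling mod 1. This is the binary-expansion remark of Example 12.3. The sketch below checks the condition exactly on all finite 0/1 words up to a hypothetical length of 10. Each word is read as followed by zeros, so some $x_j = 0$ and the first branch of the definition of $J$ applies. The sketch says nothing about the measure-theoretic conditions (a)-(d), (g), and (h).

```python
from fractions import Fraction
from itertools import product


def J(bits):
    """J((x_1, x_2, ...)) = sum_j x_j 2^{-j} for a finite 0/1 tuple, read as
    followed by zeros (so some x_j = 0 and J lands in [0, 1))."""
    return sum((Fraction(x, 2 ** j) for j, x in enumerate(bits, start=1)), Fraction(0))


def shift_plus(bits):
    """phi^+((x_1, x_2, x_3, ...)) = (x_2, x_3, ...)."""
    return bits[1:]


def doubling(x):
    """phi(x) = 2x mod 1 (Example 12.3), exact on Fractions in [0, 1)."""
    y = 2 * x
    return y - 1 if y >= 1 else y


w = (1, 0, 1, 1)
print(J(w), doubling(J(w)), J(shift_plus(w)))   # 11/16 3/8 3/8

# Condition (e) of Definition 12.3, J o phi^+ = phi o J, on every 0/1 word of length <= 10.
print(all(J(shift_plus(v)) == doubling(J(v))
          for n in range(1, 11) for v in product((0, 1), repeat=n)))   # True
```

Exact `Fraction` arithmetic is used because the floating-point doubling map collapses every orbit to $0$ after about 53 steps.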
Here is a simple illustration of the use of invariants. As the reader is asked to verify in Exercise 12.29, the property of being ergodic is an invariant of a measurable dynamical system. From Example 12.8 on page 713, we know that if $b \in \mathcal{Q}$ and $c \notin \mathcal{Q}$, then $(\mathbb{T}, \mathcal{A}, \mu, \psi_b)$ is not ergodic and $(\mathbb{T}, \mathcal{A}, \mu, \psi_c)$ is ergodic. Therefore, those two measurable dynamical systems are not isomorphic.

Entropy
The remainder of this section is devoted to a discussion of numerical measures of information. To motivate the pertinent ideas, we consider the following "thought experiment." Let $(\Omega, \mathcal{A}, \mu)$ be a probability space. Suppose that the distribution of the location of a particle, $p$, in $\Omega$ is given by the probability measure $\mu$; that is, for each $A \in \mathcal{A}$, the probability that $p$ is in $A$ equals $\mu(A)$. The object of our experiment is to locate the position of $p$ as closely as possible.

Let $\mathfrak{P}$ be a measurable partition of $(\Omega, \mathcal{A})$. Suppose that we can extract information about the location of $p$ by answering, for each $A \in \mathfrak{P}$, the question: "Is $p$ in $A$?" In other words, we can ascertain which element of $\mathfrak{P}$ contains $p$. Some partitions tell us more than others about the location of $p$. For example, for the probability space $([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)})$, we expect more information from $\mathfrak{P} = \{[0, 1/2), [1/2, 1)\}$ than from $\mathfrak{Q} = \{[0, 1/100), [1/100, 1)\}$. The reason is that $\mathfrak{P}$ is guaranteed to reduce by 50% the measure of the set where we have to look for $p$. Unless we are lucky, $\mathfrak{Q}$ will reduce it by only 1%.

To proceed rigorously, we need to assign a number to the amount of information gained from a measurable partition. That number is called the entropy of the measurable partition.

DEFINITION 12.4 Entropy of a Measurable Partition
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space and $\mathfrak{P}$ a measurable partition of $(\Omega, \mathcal{A})$. Then the entropy of $\mathfrak{P}$, denoted by $H(\mathfrak{P})$, is defined by
$$H(\mathfrak{P}) = -\sum_{A \in \mathfrak{P}} \mu(A) \log \mu(A),$$
where we use the convention that $0 \log 0 = 0$.
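Definition 12.4 and the comparison of $\mathfrak{P}$ and $\mathfrak{Q}$ above translate directly into code. This sketch assumes natural logarithms; a different base only rescales $H$. It represents a partition by the list of its cell masses. It also scans two-element partitions $\{A, A^c\}$ over a grid of values of $\lambda(A)$ to locate the maximum of $H$.

```python
import math


def entropy(masses):
    """H = -sum mu(A) log mu(A) over the cells of a partition (Definition 12.4),
    with the convention 0 log 0 = 0; natural logarithms are assumed here."""
    return -sum(m * math.log(m) for m in masses if m > 0)


H_P = entropy([1 / 2, 1 / 2])          # P = {[0, 1/2), [1/2, 1)}
H_Q = entropy([1 / 100, 99 / 100])     # Q = {[0, 1/100), [1/100, 1)}
print(H_P, H_Q)                        # log 2 = 0.693..., about 0.056

# Two-element partitions {A, A^c}: scan lambda(A) over a grid of [0, 1].
grid = [i / 1000 for i in range(1001)]
print(max(grid, key=lambda s: entropy([s, 1 - s])))   # 0.5
```

Skipping zero masses in the sum implements the convention $0 \log 0 = 0$ directly, and avoids evaluating $\log 0$.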
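At the end of this section, we will derive the formula for $H(\mathfrak{P})$ from some plausible properties of a measure of information. For now, we note an intuitively satisfying fact about the probability space $([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)})$. There, the entropy of a two-element partition,
$$H(\{A, A^c\}) = -\lambda(A)\log\lambda(A) - (1 - \lambda(A))\log(1 - \lambda(A)),$$
is maximized when $\lambda(A) = \lambda(A^c) = 1/2$.

To obtain the basic properties of entropy, we need to introduce the concept of the refinement of a measurable partition. We say that a measurable partition $\mathfrak{Q}$ is a refinement of the measurable partition $\mathfrak{P}$, and write $\mathfrak{P} \prec \mathfrak{Q}$, if every element of $\mathfrak{P}$ is a union of elements of $\mathfrak{Q}$. For any two measurable partitions $\mathfrak{P}$ and $\mathfrak{R}$, there is a smallest common refinement given by
$$\mathfrak{P} \vee \mathfrak{R} = \{ A \cap B : A \in \mathfrak{P},\ B \in \mathfrak{R} \}.$$

PROPOSITION 12.1
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space and let $\mathfrak{P}$, $\mathfrak{Q}$, and $\mathfrak{R}$ be measurable partitions of $(\Omega, \mathcal{A})$. Then the following hold:
a) $\mathfrak{P} \prec \mathfrak{Q} \implies H(\mathfrak{P}) \le H(\mathfrak{Q})$.
b) $H(\mathfrak{P} \vee \mathfrak{R}) \le H(\mathfrak{P}) + H(\mathfrak{R})$.

PROOF: To prove (a), we start by observing that each $A \in \mathfrak{P}$ is a disjoint union of members of $\mathfrak{Q}$. Thus, for $\mu(A) > 0$, we have
$$-\mu(A)\log\mu(A) = -\sum_{\substack{B \subset A\\ B \in \mathfrak{Q}}} \mu(B)\log\mu(A) = -\sum_{\substack{B \subset A\\ B \in \mathfrak{Q}}} \mu(B)\log\mu(B) + \sum_{\substack{B \subset A\\ B \in \mathfrak{Q}}} \mu(B)\log\frac{\mu(B)}{\mu(A)} \le -\sum_{\substack{B \subset A\\ B \in \mathfrak{Q}}} \mu(B)\log\mu(B).$$
Summing over $A \in \mathfrak{P}$, we obtain that
$$H(\mathfrak{P}) = -\sum_{A \in \mathfrak{P}} \mu(A)\log\mu(A) \le -\sum_{A \in \mathfrak{P}} \sum_{\substack{B \subset A\\ B \in \mathfrak{Q}}} \mu(B)\log\mu(B) = H(\mathfrak{Q}).$$
The proof of (b) is based on the fact that the function $g(t) = -t\log t$ is concave on $[0,1]$; that is, $g$ satisfies
$$\sum_{j=1}^{n} \alpha_j g(t_j) \le g\left(\sum_{j=1}^{n} \alpha_j t_j\right) \tag{12.30}$$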
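for all convex combinations of elements of $[0,1]$. Without loss of generality we can assume that $\mu(C) > 0$ for all $C \in \mathfrak{R}$. Thus, we can write
$$\mu(A) = \sum_{C \in \mathfrak{R}} \mu(C)\,\frac{\mu(A \cap C)}{\mu(C)} \qquad (12.31)$$
for each $A \in \mathfrak{P}$. It follows from (12.30) and (12.31) that
$$-\mu(A)\log\mu(A) \ge -\sum_{C \in \mathfrak{R}} \mu(C)\,\frac{\mu(A \cap C)}{\mu(C)}\log\frac{\mu(A \cap C)}{\mu(C)} = -\sum_{C \in \mathfrak{R}} \mu(A \cap C)\log\mu(A \cap C) + \sum_{C \in \mathfrak{R}} \mu(A \cap C)\log\mu(C).$$
Summing over $A \in \mathfrak{P}$, we get
$$H(\mathfrak{P}) \ge -\sum_{A \in \mathfrak{P}}\sum_{C \in \mathfrak{R}} \mu(A \cap C)\log\mu(A \cap C) + \sum_{A \in \mathfrak{P}}\sum_{C \in \mathfrak{R}} \mu(A \cap C)\log\mu(C)$$
$$= -\sum_{A \in \mathfrak{P}}\sum_{C \in \mathfrak{R}} \mu(A \cap C)\log\mu(A \cap C) + \sum_{C \in \mathfrak{R}} \mu(C)\log\mu(C) = H(\mathfrak{P} \vee \mathfrak{R}) - H(\mathfrak{R}).$$
Thus, (b) is proved.

Entropy and Measurable Dynamical Systems

Up to this point, we have defined entropy for probability spaces $(\Omega, \mathcal{A}, \mu)$. Now we introduce a dynamical aspect by considering the measurable dynamical system $(\Omega, \mathcal{A}, \mu, \varphi)$. Suppose that we modify the "thought experiment" introduced on page 717 by allowing the particle $p$ to move according to the following rule: If $p$ is at $x$ at time 0, then its position at time 1 is $\varphi(x)$, its position at time 2 is $\varphi^{(2)}(x)$, and so on. If we use a measurable partition $\mathfrak{P}$ to obtain information about the location of $p$ at time 0, then the measurable partition
$$\varphi^{-n}\mathfrak{P} = \{(\varphi^{(n)})^{-1}(A) : A \in \mathfrak{P}\}$$
yields corresponding information about the particle's location at time $n$. Likewise, the measurable partition
$$\mathfrak{P}^{(n)} = \mathfrak{P} \vee \varphi^{-1}\mathfrak{P} \vee \cdots \vee \varphi^{-(n-1)}\mathfrak{P}$$
yields corresponding information about the path of successive positions of $p$ at times 0 through $n - 1$ as it moves in $\Omega$ under the action of $\varphi$.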
PROPOSITION 12.2
Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system and $\mathfrak{P}$ a measurable partition of $(\Omega, \mathcal{A})$. Then the following hold:
a) $H(\varphi^{-k}\mathfrak{P}) = H(\mathfrak{P})$.
b) $H((\varphi^{-k}\mathfrak{P})^{(n)}) = H(\mathfrak{P}^{(n)})$.
c) $H(\mathfrak{P}^{(n+m)}) \le H(\mathfrak{P}^{(n)}) + H(\mathfrak{P}^{(m)})$.

PROOF: Parts (a) and (b) follow immediately from the definition of the entropy of a measurable partition and the invariance of $\mu$ with respect to $\varphi$. To obtain (c), we begin with the observation
$$\mathfrak{P}^{(n+m)} = \mathfrak{P}^{(n)} \vee (\varphi^{-n}\mathfrak{P})^{(m)}.$$
It follows from Proposition 12.1 that
$$H(\mathfrak{P}^{(n+m)}) \le H(\mathfrak{P}^{(n)}) + H((\varphi^{-n}\mathfrak{P})^{(m)}).$$
The assertion (c) is now an immediate consequence of (b).

Using Proposition 12.2(c), it can be shown that the limit
$$H(\mathfrak{P}, \varphi) = \lim_{n\to\infty} \frac{H(\mathfrak{P}^{(n)})}{n}$$
exists. (See Exercise 12.30.) We can think of $H(\mathfrak{P}, \varphi)$ as the time average of the entropies associated with the measurable partitions $\mathfrak{P}^{(n)}$. The quantity
$$h(\varphi) = \sup\{\, H(\mathfrak{P}, \varphi) : \mathfrak{P} \text{ a measurable partition of } (\Omega, \mathcal{A}) \,\}$$
is called the entropy of the measurable dynamical system $(\Omega, \mathcal{A}, \mu, \varphi)$. It can be viewed as the maximum amount of information that can be extracted from the dynamical system per unit time. As the reader is asked to show in Exercise 12.34, $h$ is an invariant of $(\Omega, \mathcal{A}, \mu, \varphi)$.

Calculating the entropy of a measurable dynamical system is often not an easy task. In the next section, though, we will find a formula for the entropy of the Bernoulli scheme $B(p_1, p_2, \ldots, p_N)$.

Motivating the Formula for the Entropy of a Partition

We will now motivate the formula for the entropy of a measurable partition given in Definition 12.4 on page 717. Let us return to the "thought experiment" discussed previously in this section. Recall that a particle is located in $\Omega$ according to the probability measure $\mu$. That is, for each $A \in \mathcal{A}$, the probability of the event
$$p \text{ is in } A \qquad (12.32)$$
equals $\mu(A)$.
We would like to assign a numerical value $I(A)$ to the information contained in the event (12.32). It seems reasonable to require that $I(A)$ be a decreasing function of $\mu(A)$. In other words, the smaller the probability of $A$, the greater the information that is obtained from the knowledge that $p$ is in $A$. Thus, we should have a decreasing function $f$ defined on $[0,1]$ such that
$$I(A) = f(\mu(A)). \qquad (12.33)$$
Another plausible condition on $I(A)$ is that it should assign the value 0 to the sure event:
$$I(\Omega) = f(1) = 0. \qquad (12.34)$$
Equation (12.34) reflects the fact that knowing $p$ is in $\Omega$ provides no information.

Our final condition on $I$ concerns the total information in two independent events, say, $A$ and $B$. Knowing that one of the events occurs provides no probabilistic information regarding the occurrence of the other event. Therefore, there is no redundancy in the information imparted by knowing that $p$ is in $A$ and the information imparted by knowing that $p$ is in $B$. Hence, the total information imparted by knowing that $p$ is in $A \cap B$ is the sum of the individual amounts of information:
$$I(A \cap B) = I(A) + I(B). \qquad (12.35)$$
Combining (12.33)-(12.35), we obtain a decreasing function $f$ defined on probabilities such that $f(1) = 0$ and
$$f(st) = f(s) + f(t), \qquad s, t \in [0,1]. \qquad (12.36)$$
As the reader is asked to verify in Exercise 12.35, the only decreasing functions on $[0,1]$ that satisfy (12.36) are those of the form $f(t) = -a\log t$, where $a$ is a positive constant. For convenience, we choose $a = 1$ to arrive at the following definition of the information content of a single event:
$$I(A) = -\log\mu(A).$$
Now consider a measurable partition $\mathfrak{P} = \{A_1, A_2, \ldots, A_n\}$. The discrete random variable
$$X = \sum_{j=1}^{n} I(A_j)\,\chi_{A_j}$$
gives the information gained by knowing which element of $\mathfrak{P}$ contains $p$. The expected value of $X$ is
$$\mathcal{E}(X) = \int_\Omega X\,d\mu = \sum_{j=1}^{n} \mu(A_j)\,I(A_j) = -\sum_{j=1}^{n} \mu(A_j)\log\mu(A_j) = H(\mathfrak{P}).$$
Thus we see that the entropy of a measurable partition is the expected amount of information gained by knowing which element of the measurable partition contains the particle $p$.

EXERCISES 12.3

12.26 Prove that isomorphism of measurable dynamical systems is an equivalence relation.

12.27 Show that the measurable dynamical systems in Examples 12.1 and 12.2 on page 699 are isomorphic.

12.28 Refer to Example 12.9 on page 716. Show that the measurable dynamical system $(\Omega^+, \mathcal{A}^+, \mu^+, \varphi^+)$ is isomorphic to $([0,1), \mathcal{B}_{[0,1)}, \lambda_{[0,1)}, \varphi)$, where $\varphi$ is the mapping defined in Example 12.3 on page 699.

12.29 Prove that ergodicity is an invariant of a measurable dynamical system.

12.30 Suppose that $\{a_n\}_{n=1}^\infty$ is a sequence of real numbers satisfying the subadditivity condition $a_{n+m} \le a_n + a_m$. Show that $\lim_{n\to\infty} a_n/n$ exists as a real number or, possibly, $-\infty$. Hint: Let $m$ be fixed, but arbitrary. Each $n \in \mathcal{N}$ can be written as $n = \ell m + r$, where $\ell \ge 0$ and $0 < r \le m$. Thus, $a_n \le \ell a_m + a_r$.

12.31 Consider the probability space $([0,1), \mathcal{M}_{[0,1)}, \lambda_{[0,1)})$. Show that among all measurable partitions of $([0,1), \mathcal{M}_{[0,1)})$ having $n$ members, entropy is maximized by the measurable partition $\{[(j-1)/n, j/n)\}_{j=1}^{n}$.

12.32 Let $(\Omega, \mathcal{A}, \mu)$ be a probability space. Show that if $\mathfrak{P}$ is a measurable partition having $n$ elements, then $H(\mathfrak{P}) \le \log n$.

12.33 Let $\varphi$ be the identity function. Calculate the entropy of $(\Omega, \mathcal{A}, \mu, \varphi)$.

12.34 Prove that if $(\Omega, \mathcal{A}, \mu, \varphi)$ is isomorphic to $(\Lambda, \mathcal{B}, \rho, \psi)$, then $h(\varphi) = h(\psi)$.

12.35 Show that if $f : [0,1] \to [0,\infty]$ is nonincreasing and satisfies (12.36), then it must be of the form $f(t) = -a\log t$.

The remaining exercises of this section consider an alternative approach to the concept of the information in an event. As previously in this chapter, $(\Omega, \mathcal{A}, \mu)$ is a probability space.
12.36 Let $J$ be an information function on $\mathcal{A}$ of the form $J(A) = g(\mu(A))$, where $g$ is a function defined on $[0,1]$. Suppose that there is also a conditional
12.4 The Kolmogorov-Sinai Theorem;Calculation of Entropy □ 723

information function defined by
$$J(A \mid B) = \begin{cases} \mu(B)\, g(\mu(A \mid B)), & \text{if } \mu(B) > 0;\\ 0, & \text{if } \mu(B) = 0. \end{cases}$$
Define the joint information function of $A$ and $B$ to be the sum of the information in $B$ and the information in $A$ given that $B$ does not occur; that is, $J(A, B) = J(B) + J(A \mid B^c)$. Suppose that the following conditions are satisfied:
• $\{\mu(B) : B \subset A\} = [0, \mu(A)]$.
• $J(\Omega) = 0$.
• $J(A, B) = J(B, A)$.
Show that $g$ satisfies the functional equation
$$g(x) + (1 - x)\,g\Big(\frac{y}{1 - x}\Big) = g(y) + (1 - y)\,g\Big(\frac{x}{1 - y}\Big)$$
for $x, y \in [0,1]$ and $x + y < 1$.

12.37 Let $g$ be as in Exercise 12.36.
a) Show that $g(x) = g(1 - x)$.
b) Deduce that $J(A) = J(A^c)$, that is, the information in $A$ is the same as that in $A^c$. Observe that the information function $I$ discussed at the end of this section fails to have this property.

12.38 Let $g$ be as in Exercise 12.36.
a) Assuming that $g$ is twice continuously differentiable, show that it must have the form $g(x) = c\,(x\log x + (1 - x)\log(1 - x))$ for some constant $c$. Hint: Differentiate both sides of the equation in Exercise 12.36, first with respect to $x$ and then with respect to $y$, and then use the substitutions $u = y/(1 - x)$ and $v = x/(1 - y)$.
b) Deduce that, in the case of two-element partitions, using $J(A)$ as a measure of the information content of an event $A$ leads to the same definition of entropy as given in Definition 12.4.

12.4 THE KOLMOGOROV-SINAI THEOREM; CALCULATION OF ENTROPY

Our goal in this section is to prove a theorem, due to Kolmogorov and Sinai, that will enable us to calculate the entropy of the Bernoulli scheme $B(p_1, p_2, \ldots, p_N)$.
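We will need the following natural extension of the notion of the entropy of a measurable partition. Suppose that $\mathfrak{P}$ and $\mathfrak{Q}$ are measurable partitions of $(\Omega, \mathcal{A}, \mu)$. The conditional entropy of $\mathfrak{P}$ relative to $\mathfrak{Q}$ is defined by
$$H(\mathfrak{P} \mid \mathfrak{Q}) = -\sum_{B \in \mathfrak{Q}}\sum_{A \in \mathfrak{P}} \mu(B)\,\mu(A \mid B)\log\mu(A \mid B),$$
where we assign the value 0 to a summand in which $\mu(B) = 0$.

PROPOSITION 12.3
Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system and let $\mathfrak{P}$, $\mathfrak{Q}$, and $\mathfrak{R}$ be measurable partitions of $(\Omega, \mathcal{A})$. Then the following hold:
a) $H(\mathfrak{P} \mid \mathfrak{Q}) \le H(\mathfrak{P})$.
b) $H(\mathfrak{P} \vee \mathfrak{Q}) = H(\mathfrak{Q}) + H(\mathfrak{P} \mid \mathfrak{Q})$.
c) $H(\mathfrak{P} \vee \mathfrak{Q} \mid \mathfrak{R}) \le H(\mathfrak{P} \mid \mathfrak{R}) + H(\mathfrak{Q} \mid \mathfrak{R})$.
d) $\mathfrak{P} \ll \mathfrak{Q} \Rightarrow H(\mathfrak{P} \mid \mathfrak{R}) \le H(\mathfrak{Q} \mid \mathfrak{R})$.
e) $\mathfrak{Q} \ll \mathfrak{R} \Rightarrow H(\mathfrak{P} \mid \mathfrak{R}) \le H(\mathfrak{P} \mid \mathfrak{Q})$.
f) $H(\varphi^{-1}\mathfrak{P} \mid \varphi^{-1}\mathfrak{Q}) = H(\mathfrak{P} \mid \mathfrak{Q})$.
g) $H(\mathfrak{Q}, \varphi) \le H(\mathfrak{Q} \mid \mathfrak{P}) + H(\mathfrak{P}, \varphi)$.

PROOF: The proofs of (a)-(f) are left for Exercises 12.39-12.40. To obtain (g), we argue as follows. By Proposition 12.1 on page 718 and (b) and (c), we have
$$H(\mathfrak{Q}^{(n)}) \le H(\mathfrak{Q}^{(n)} \vee \mathfrak{P}^{(n)}) = H(\mathfrak{P}^{(n)}) + H(\mathfrak{Q}^{(n)} \mid \mathfrak{P}^{(n)}) \le H(\mathfrak{P}^{(n)}) + \sum_{j=0}^{n-1} H(\varphi^{-j}\mathfrak{Q} \mid \mathfrak{P}^{(n)}).$$
Using (e) and (f), we conclude that
$$H(\mathfrak{Q}^{(n)}) \le H(\mathfrak{P}^{(n)}) + \sum_{j=0}^{n-1} H(\varphi^{-j}\mathfrak{Q} \mid \varphi^{-j}\mathfrak{P}) \le H(\mathfrak{P}^{(n)}) + n\,H(\mathfrak{Q} \mid \mathfrak{P}).$$
Recalling from page 720 that
$$H(\mathfrak{Q}, \varphi) = \lim_{n\to\infty} \frac{H(\mathfrak{Q}^{(n)})}{n}, \qquad (12.37)$$
we see that (g) holds.

Next, we need a lemma about approximating $\sigma$-algebras by algebras of sets. In stating the lemma, we recall the notation for the symmetric difference of two sets: $E \,\Delta\, F = (E \setminus F) \cup (F \setminus E)$.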
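The definition of conditional entropy and parts (a) and (b) of Proposition 12.3 can be checked numerically. The sketch below is illustrative only. It encodes two finite partitions $\mathfrak{P} = \{A_i\}$ and $\mathfrak{Q} = \{B_j\}$ by a hypothetical table `joint[i][j]` $= \mu(A_i \cap B_j)$, which is all the definition needs, and it uses the natural logarithm. Parts (a) and (b) together give $H(\mathfrak{P} \vee \mathfrak{Q}) \le H(\mathfrak{P}) + H(\mathfrak{Q})$, so the same numbers also illustrate Proposition 12.1(b).

```python
import math

def H(probs):
    """Entropy of a finite list of cell measures, with 0 log 0 = 0."""
    return -math.fsum(p * math.log(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(P | Q) for joint[i][j] = mu(A_i ∩ B_j), A_i in P, B_j in Q.

    Implements -sum_B sum_A mu(B) mu(A|B) log mu(A|B); a column B with
    mu(B) = 0 contributes 0, as in the definition.
    """
    terms = []
    for j in range(len(joint[0])):
        mu_B = math.fsum(row[j] for row in joint)
        if mu_B == 0:
            continue
        for row in joint:
            cond = row[j] / mu_B          # mu(A | B)
            if cond > 0:
                terms.append(mu_B * cond * math.log(cond))
    return -math.fsum(terms)

# A hypothetical pair of two-cell partitions, given by their joint measures.
joint = [[0.30, 0.10],
         [0.20, 0.40]]
H_P = H([sum(row) for row in joint])          # measures of the A_i
H_Q = H([sum(col) for col in zip(*joint)])    # measures of the B_j
H_PQ = H([x for row in joint for x in row])   # cells A_i ∩ B_j of P v Q
H_P_given_Q = conditional_entropy(joint)
print(H_P, H_Q, H_PQ, H_P_given_Q)
```

Consider a product table such as `[[0.06, 0.14], [0.24, 0.56]]`, in which the two partitions are independent. For it, `conditional_entropy` returns $H(\mathfrak{P})$ up to rounding, which is the equality case of (a).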
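LEMMA 12.1
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space, $\mathcal{F} \subset \mathcal{A}$ an algebra of sets, and $\mathcal{E}$ the smallest $\sigma$-algebra containing $\mathcal{F}$. Then, given $E \in \mathcal{E}$ and $\varepsilon > 0$, there exists an $F \in \mathcal{F}$ such that $\mu(E \,\Delta\, F) < \varepsilon$.

PROOF: Let $\mathcal{G}$ denote the collection of all $G \in \mathcal{A}$ having the property that there is a sequence $\{F_n\}_{n=1}^\infty \subset \mathcal{F}$ such that $\lim_{n\to\infty} \mu(G \,\Delta\, F_n) = 0$. As the reader is asked to prove in Exercise 12.41, $\mathcal{G}$ is an algebra of sets. Since $\mathcal{F} \subset \mathcal{G}$, the lemma will be established if we can show that $\mathcal{G}$ is actually a $\sigma$-algebra.

Let $\{G_n\}_{n=1}^\infty$ be a sequence of sets in $\mathcal{G}$. We must prove that $\bigcup_{n=1}^\infty G_n \in \mathcal{G}$. First we disjointize the $G_n$s. Let $E_1 = G_1$ and, for $n \ge 2$, let $E_n = G_n \setminus \bigcup_{k=1}^{n-1} G_k$. Because $\mathcal{G}$ is an algebra, $\{E_n\}_{n=1}^\infty \subset \mathcal{G}$; moreover, we have $\bigcup_{n=1}^\infty E_n = \bigcup_{n=1}^\infty G_n$. Let $E = \bigcup_{n=1}^\infty E_n$.

Because $\mathcal{G}$ is an algebra, $\bigcup_{j=1}^{n} E_j \in \mathcal{G}$. It follows that for each $n \in \mathcal{N}$, there is an $F_n \in \mathcal{F}$ such that $\mu\big((\bigcup_{j=1}^{n} E_j) \,\Delta\, F_n\big) < 1/n$. Now, we have
$$E \,\Delta\, F_n \subset \Big(\bigcup_{j=n+1}^{\infty} E_j\Big) \cup \Big(\Big(\bigcup_{j=1}^{n} E_j\Big) \,\Delta\, F_n\Big).$$
Hence,
$$\mu(E \,\Delta\, F_n) \le \sum_{j=n+1}^{\infty} \mu(E_j) + \frac{1}{n}.$$
Since $\sum_{n=1}^\infty \mu(E_n) \le 1$, we conclude that $\lim_{n\to\infty} \mu(E \,\Delta\, F_n) = 0$. Consequently, $E \in \mathcal{G}$.

For $A, B \in \mathcal{A}$, the expression $\mu(A \mid B)\log\mu(A \mid B)$ will be close to zero if $\mu(A \mid B)$ is either close to zero or close to one. In other words, $\mu(A \mid B)\log\mu(A \mid B)$ will be close to zero if $B$ is either nearly disjoint from $A$ or nearly contained in $A$.

This observation makes it reasonable to regard the conditional entropy $H(\mathfrak{P} \mid \mathfrak{Q})$ as a measure of closeness of the measurable partitions $\mathfrak{P}$ and $\mathfrak{Q}$. From this viewpoint, our next lemma concerns approximating one measurable partition by another.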
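Lemma 12.1 is easy to picture on $[0,1)$ with Lebesgue measure. The sketch below is our own illustration, not the construction in the proof. It takes $\mathcal{F}$ to be the algebra of finite unions of dyadic intervals $[j/2^n, (j+1)/2^n)$. It then approximates a set $E$ by rounding each endpoint down to the grid of mesh $2^{-n}$, where $E$ is a finite union of intervals with non-dyadic endpoints. Exact rational arithmetic (`fractions.Fraction`) keeps $\mu(E \,\Delta\, F)$ exact. For this particular $E$ the error is below $4 \cdot 2^{-n}$, because each of the four endpoints moves by less than $2^{-n}$. The lemma itself covers every $E$ in the generated $\sigma$-algebra, not just such simple sets.

```python
from fractions import Fraction
import math

# Sets are lists of pairwise disjoint half-open intervals (a, b), meaning [a, b).

def length(intervals):
    return sum((b - a for a, b in intervals), Fraction(0))

def overlap(I, J):
    """Lebesgue measure of (union of I) ∩ (union of J)."""
    return sum((max(Fraction(0), min(b1, b2) - max(a1, a2))
                for a1, b1 in I for a2, b2 in J), Fraction(0))

def dyadic_approx(intervals, n):
    """Round every endpoint down to the grid k/2^n. The result is a finite
    union of dyadic intervals, i.e. a member of the algebra F."""
    scale = 2 ** n
    approx = []
    for a, b in intervals:
        lo = Fraction(math.floor(a * scale), scale)
        hi = Fraction(math.floor(b * scale), scale)
        if lo < hi:                      # drop intervals that collapse
            approx.append((lo, hi))
    return approx

def sym_diff_measure(E, F):
    """mu(E Δ F) = mu(E) + mu(F) - 2 mu(E ∩ F)."""
    return length(E) + length(F) - 2 * overlap(E, F)

E = [(Fraction(1, 3), Fraction(1, 2)), (Fraction(2, 3), Fraction(5, 7))]
for n in (1, 4, 8, 16):
    print(n, float(sym_diff_measure(E, dyadic_approx(E, n))))
```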
LEMMA 12.2
Let $\mathcal{F} \subset \mathcal{A}$ be an algebra of sets, $\mathcal{E}$ the smallest $\sigma$-algebra containing $\mathcal{F}$, and $\mathfrak{P} \subset \mathcal{E}$ a measurable partition. Then for each $\varepsilon > 0$, there is a measurable partition $\mathfrak{Q} \subset \mathcal{F}$ such that $H(\mathfrak{P} \mid \mathfrak{Q}) < \varepsilon$.

PROOF: We sketch the proof, using imprecise terms such as "small" and "close," and leave the details for Exercise 12.42. Let $\mathfrak{P} = \{A_1, A_2, \ldots, A_n\}$. The main idea of the proof is to use Lemma 12.1 to approximate each $A_j$ by a set $C_j \in \mathcal{F}$.

Let $\delta$ be a small positive number. By Lemma 12.1, we can find, for each $j$, a set $C_j \in \mathcal{F}$ such that $\mu(A_j \,\Delta\, C_j) < \delta$. We will use the $C_j$s to construct a measurable partition of $\Omega$. First, we disjointize the $C_j$s by defining $B_j = C_j \setminus \bigcup_{k=1}^{j-1} C_k$. Then we obtain a measurable partition $\mathfrak{Q} = \{B_1, B_2, \ldots, B_n, B_{n+1}\}$ by letting $B_{n+1} = \Omega \setminus \bigcup_{j=1}^{n} B_j$. Because $\mathcal{F}$ is an algebra, it follows that $B_j \in \mathcal{F}$ for all $j$.

Now we consider the conditional entropy
$$H(\mathfrak{P} \mid \mathfrak{Q}) = -\sum_{j=1}^{n}\sum_{k=1}^{n+1} \mu(B_k)\,\mu(A_j \mid B_k)\log\mu(A_j \mid B_k).$$
On the right-hand side of the previous equation, the sum of the terms for which $k = n + 1$ is dominated by $n\mu(B_{n+1})/e$, because $-t\log t \le 1/e$ for $t \in [0,1]$. This bound can be made small by choosing $\delta$ appropriately, because $|\mu(A_j) - \mu(B_j)|$ is small for $1 \le j \le n$ and $\sum_{j=1}^{n} \mu(A_j) = 1$. We use
$$-\mu(B_k)\,\mu(A_j \mid B_k)\log\mu(A_j \mid B_k) \le -\mu(A_j \mid B_k)\log\mu(A_j \mid B_k)$$
together with the observation that $\mu(A_j \mid B_k)$ is close to 0 when $j \ne k$ and close to 1 when $j = k$. It follows that the sum of the remaining terms of $H(\mathfrak{P} \mid \mathfrak{Q})$ is small when $\delta$ is sufficiently small.

In the remainder of this section, we assume that $(\Omega, \mathcal{A}, \mu, \varphi)$ is a measurable dynamical system. We also continue to use the convention that $\varphi^{(0)}$ denotes the identity on $\Omega$.

LEMMA 12.3
Let $\mathfrak{P}$ be a measurable partition. Then $H(\mathfrak{P}^{(k)}, \varphi) = H(\mathfrak{P}, \varphi)$ for all $k \ge 1$.
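PROOF: It is easy to check that $(\mathfrak{P}^{(k)})^{(n)} = \mathfrak{P}^{(k+n-1)}$. Hence,
$$H(\mathfrak{P}^{(k)}, \varphi) = \lim_{n\to\infty} \frac{H(\mathfrak{P}^{(k+n-1)})}{n} = \lim_{n\to\infty} \frac{k+n-1}{n}\cdot\frac{H(\mathfrak{P}^{(k+n-1)})}{k+n-1} = H(\mathfrak{P}, \varphi),$$
as required.

If $\varphi$ is a one-to-one correspondence and $(\Omega, \mathcal{A}, \mu, \varphi^{-1})$ is a measurable dynamical system, then we say that $\varphi$ is invertible. In such cases, the notation
$$\mathfrak{P}^{(m,n)} = \varphi^{-m}\mathfrak{P} \vee \varphi^{-(m+1)}\mathfrak{P} \vee \cdots \vee \varphi^{-n}\mathfrak{P}$$
is meaningful for each pair of integers $n, m$ with $m \le n$.

LEMMA 12.4
If $\varphi$ is invertible and $\mathfrak{P}$ is a measurable partition, then $H(\mathfrak{P}^{(m,n)}, \varphi) = H(\mathfrak{P}, \varphi)$ for each pair of integers $n, m$ with $m \le n$.

PROOF: It is easy to see that $\mathfrak{P}^{(m,n)} = (\varphi^{-m}\mathfrak{P})^{(n-m+1)}$. Hence, by Lemma 12.3, we have $H(\mathfrak{P}^{(m,n)}, \varphi) = H(\varphi^{-m}\mathfrak{P}, \varphi)$. Since $\mu$ is invariant with respect to both $\varphi$ and $\varphi^{-1}$, it follows that $H(\varphi^{-m}\mathfrak{P}, \varphi) = H(\mathfrak{P}, \varphi)$.

Next we discuss the relationship between measurable partitions and algebras of sets. Specifically, if $\varphi$ is invertible and $\mathfrak{P}$ is a measurable partition, then for each $n \in \mathcal{N}$, the collection
$$\mathcal{A}_n(\mathfrak{P}) = \{\, B \in \mathcal{A} : B \text{ is a union of members of } \mathfrak{P}^{(-n,n)} \,\}$$
is an algebra of subsets of $\Omega$. Because $\mathcal{A}_n(\mathfrak{P}) \subset \mathcal{A}_{n+1}(\mathfrak{P})$, the collection $\mathcal{A}_\infty(\mathfrak{P}) = \bigcup_{n=1}^\infty \mathcal{A}_n(\mathfrak{P})$ is also an algebra of subsets of $\Omega$. (See Exercise 12.43.)

We are now ready to state and prove the main result of this section, which is known as the Kolmogorov-Sinai theorem. In doing so, we recall that the entropy of the measurable dynamical system $(\Omega, \mathcal{A}, \mu, \varphi)$ is defined by
$$h(\varphi) = \sup\{\, H(\mathfrak{P}, \varphi) : \mathfrak{P} \text{ a measurable partition of } (\Omega, \mathcal{A}) \,\}.$$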
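Before the theorem, it may help to see $\mathfrak{P}^{(n)}$ and $H(\mathfrak{P}, \varphi)$ computed exactly in one case. The sketch below uses the doubling map $\varphi(x) = 2x \bmod 1$ on $[0,1)$ with Lebesgue measure, together with $\mathfrak{P} = \{[0,1/2), [1/2,1)\}$. This is a hypothetical example chosen for illustration. The doubling map preserves Lebesgue measure but is not invertible, so it belongs to the setting of the non-invertible version stated after the theorem. Cells are stored as finite unions of half-open intervals with `Fraction` endpoints. Preimages use the two branches of $\varphi$, and joins are formed cell by cell as in $\mathfrak{P} \vee \mathfrak{R} = \{A \cap B\}$.

```python
from fractions import Fraction
import math

# Hypothetical example: phi(x) = 2x mod 1 on [0,1), which preserves Lebesgue
# measure. A cell is a list of disjoint intervals (a, c), meaning [a, c).

def preimage(cell):
    """phi^{-1}(cell): each [a, c) has one preimage piece per branch of phi."""
    pieces = []
    for a, c in cell:
        pieces.append((a / 2, c / 2))              # branch on [0, 1/2)
        pieces.append(((a + 1) / 2, (c + 1) / 2))  # branch on [1/2, 1)
    return pieces

def intersect(cell1, cell2):
    out = []
    for a1, c1 in cell1:
        for a2, c2 in cell2:
            lo, hi = max(a1, a2), min(c1, c2)
            if lo < hi:
                out.append((lo, hi))
    return out

def join(P, R):
    """P v R = {A ∩ B : A in P, B in R}, dropping empty intersections."""
    cells = (intersect(A, B) for A in P for B in R)
    return [cell for cell in cells if cell]

def entropy(P):
    measures = (float(sum(c - a for a, c in cell)) for cell in P)
    return -math.fsum(m * math.log(m) for m in measures if m > 0)

def refinement(P, n):
    """P^(n) = P v phi^{-1}P v ... v phi^{-(n-1)}P."""
    result, layer = P, P
    for _ in range(n - 1):
        layer = [preimage(A) for A in layer]   # phi^{-k}P from phi^{-(k-1)}P
        result = join(result, layer)
    return result

half = Fraction(1, 2)
P = [[(Fraction(0), half)], [(half, Fraction(1))]]
for n in range(1, 8):
    cells = refinement(P, n)
    print(n, len(cells), entropy(cells) / n)   # ratio H(P^(n))/n
```

For this choice, every cell of $\mathfrak{P}^{(n)}$ is a dyadic interval of length $2^{-n}$. Hence $H(\mathfrak{P}^{(n)}) = n\log 2$ and $H(\mathfrak{P}, \varphi) = \log 2$, and the sequence $H(\mathfrak{P}^{(n)})$ is subadditive, as Proposition 12.2(c) requires.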
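THEOREM 12.3 Kolmogorov-Sinai Theorem
Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system and assume that $\varphi$ is invertible. Suppose that $\mathfrak{P}$ is a measurable partition of $(\Omega, \mathcal{A})$ such that $\mathcal{A}$ is the smallest $\sigma$-algebra containing $\mathcal{A}_\infty(\mathfrak{P})$. Then $h(\varphi) = H(\mathfrak{P}, \varphi)$.

PROOF: By the definition of $h(\varphi)$, it suffices to prove that
$$H(\mathfrak{Q}, \varphi) \le H(\mathfrak{P}, \varphi) \qquad (12.38)$$
for each measurable partition $\mathfrak{Q}$. It follows from Proposition 12.3(g) on page 724 that
$$H(\mathfrak{Q}, \varphi) \le H(\mathfrak{Q} \mid \mathfrak{P}^{(-n,n)}) + H(\mathfrak{P}^{(-n,n)}, \varphi)$$
for all $n \in \mathcal{N}$. Hence, by Lemma 12.4, we have
$$H(\mathfrak{Q}, \varphi) \le H(\mathfrak{Q} \mid \mathfrak{P}^{(-n,n)}) + H(\mathfrak{P}, \varphi). \qquad (12.39)$$
Given $\varepsilon > 0$, we can apply Lemma 12.2 to find a measurable partition $\mathfrak{R}$ such that $\mathfrak{R} \subset \mathcal{A}_\infty(\mathfrak{P})$ and $H(\mathfrak{Q} \mid \mathfrak{R}) < \varepsilon$. Since $\mathfrak{R}$ is a finite collection, it follows that $\mathfrak{R} \subset \mathcal{A}_n(\mathfrak{P})$ for some $n$. In particular, we have $\mathfrak{R} \ll \mathfrak{P}^{(-n,n)}$. Applying Proposition 12.3(e), we get
$$H(\mathfrak{Q} \mid \mathfrak{P}^{(-n,n)}) \le H(\mathfrak{Q} \mid \mathfrak{R}) < \varepsilon.$$
Hence, by (12.39), we have $H(\mathfrak{Q}, \varphi) < \varepsilon + H(\mathfrak{P}, \varphi)$. Since $\varepsilon$ is an arbitrary positive number, the assertion (12.38) follows and the proof is complete.

There is a version of the Kolmogorov-Sinai theorem that is valid when $\varphi$ is not necessarily invertible. For a measurable partition $\mathfrak{P}$, let
$$\tilde{\mathcal{A}}_n(\mathfrak{P}) = \{\, B \in \mathcal{A} : B \text{ is a union of members of } \mathfrak{P}^{(n)} \,\}$$
and let $\tilde{\mathcal{A}}_\infty(\mathfrak{P}) = \bigcup_{n=1}^\infty \tilde{\mathcal{A}}_n(\mathfrak{P})$. If, in the proof of the Kolmogorov-Sinai theorem, we replace $\mathcal{A}_n(\mathfrak{P})$ and $\mathcal{A}_\infty(\mathfrak{P})$ by $\tilde{\mathcal{A}}_n(\mathfrak{P})$ and $\tilde{\mathcal{A}}_\infty(\mathfrak{P})$, respectively, we obtain a proof of the following theorem.

THEOREM 12.4
Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system. Suppose that $\mathfrak{P}$ is a measurable partition of $(\Omega, \mathcal{A})$ such that $\mathcal{A}$ is the smallest $\sigma$-algebra containing $\tilde{\mathcal{A}}_\infty(\mathfrak{P})$. Then $h(\varphi) = H(\mathfrak{P}, \varphi)$.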
EXAMPLE 12.10 Entropy of a Bernoulli Scheme
In this example, we apply the Kolmogorov-Sinai theorem to obtain the entropy of the Bernoulli scheme $B(p_1, p_2, \ldots, p_N)$, first introduced in Example 12.4 on page 699. Consider the measurable partition of $(\Omega, \mathcal{A})$ given by
$$\mathfrak{P} = \{\, C_{\{0\},k} : k = 1, 2, \ldots, N \,\}.$$
The entropy of $\mathfrak{P}$ is
$$H(\mathfrak{P}) = -\sum_{k=1}^{N} \mu(C_{\{0\},k})\log\mu(C_{\{0\},k}) = -\sum_{k=1}^{N} p_k\log p_k.$$
We will now show that $\mathfrak{P}$ satisfies the hypothesis of the Kolmogorov-Sinai theorem. It is easy to see that $\varphi$ is invertible. We have $\varphi^{-1}(C_{\{0\},k}) = C_{\{1\},k}$; more generally, $\varphi^{-\ell}(C_{\{0\},k}) = C_{\{\ell\},k}$ for every integer $\ell$. Therefore, a typical element of $\mathfrak{P}^{(-m,m)}$ is of the form
$$\bigcap_{\ell=-m}^{m} C_{\{\ell\},k_\ell} = C_{\{-m,-m+1,\ldots,m\},b},$$
where $b(\ell) = k_\ell$ for $-m \le \ell \le m$.

We recall that, in this example, $\mathcal{A}$ is the $\sigma$-algebra generated by sets of the form $C_{F,a}$, where $F$ is a finite set of integers and $a$ is a function from $F$ into $\{1, 2, \ldots, N\}$. By choosing $m$ large enough, we can assume that $F \subset \{-m, \ldots, m\}$. Hence, we can write
$$C_{F,a} = \bigcup C_{\{-m,\ldots,m\},b},$$
where the union is over all functions $b : \{-m, \ldots, m\} \to \{1, \ldots, N\}$ such that $b(\ell) = a(\ell)$ for all $\ell \in F$. It follows that $C_{F,a}$ belongs to $\mathcal{A}_m(\mathfrak{P})$, which in turn implies that the algebra $\mathcal{A}_\infty(\mathfrak{P})$ contains all sets of the form $C_{F,a}$. Thus, $\mathcal{A}$ is the smallest $\sigma$-algebra containing $\mathcal{A}_\infty(\mathfrak{P})$.

Next, we calculate $H(\mathfrak{P}, \varphi)$. The entropy of $\mathfrak{P}^{(m)}$ is
$$H(\mathfrak{P}^{(m)}) = -\sum_{k_0=1}^{N}\sum_{k_1=1}^{N}\cdots\sum_{k_{m-1}=1}^{N} \prod_{\ell=0}^{m-1}\mu(C_{\{\ell\},k_\ell})\,\log\prod_{\ell=0}^{m-1}\mu(C_{\{\ell\},k_\ell}) = -\sum_{k_0=1}^{N}\sum_{k_1=1}^{N}\cdots\sum_{k_{m-1}=1}^{N} \prod_{\ell=0}^{m-1}p_{k_\ell}\,\log\prod_{\ell=0}^{m-1}p_{k_\ell}.$$
As the reader is asked to verify in Exercise 12.44, using $\sum_{k=1}^{N} p_k = 1$, it can be shown that
$$\sum_{k_0=1}^{N}\sum_{k_1=1}^{N}\cdots\sum_{k_{m-1}=1}^{N} \prod_{\ell=0}^{m-1}p_{k_\ell}\,\log\prod_{\ell=0}^{m-1}p_{k_\ell} = m\sum_{k=1}^{N} p_k\log p_k.$$
Applying the Kolmogorov-Sinai theorem, we conclude that
$$h(\varphi) = H(\mathfrak{P}, \varphi) = \lim_{m\to\infty} \frac{H(\mathfrak{P}^{(m)})}{m} = -\sum_{k=1}^{N} p_k\log p_k.$$
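The identity of Exercise 12.44, and the entropy formula that results from it, can be sanity-checked by brute force. This Python sketch is an illustration, not a proof. It sums over all $N^m$ words $(k_0, \ldots, k_{m-1})$, assigns each cylinder the measure $p_{k_0}\cdots p_{k_{m-1}}$, and compares the result with $-m\sum_k p_k\log p_k$. It assumes every $p_k > 0$ and uses the natural logarithm. It also evaluates the entropy formula for a few schemes. $B(1/2,1/2)$ and $B(1/3,1/3,1/3)$ give $\log 2$ and $\log 3$, while $B(1/4,1/4,1/4,1/4)$ and $B(1/2,1/8,1/8,1/8,1/8)$ both give $\log 4$.

```python
import itertools
import math

def bernoulli_entropy(p):
    """-sum_k p_k log p_k, the entropy of B(p_1, ..., p_N)."""
    return -math.fsum(pk * math.log(pk) for pk in p if pk > 0)

def refined_entropy(p, m):
    """H(P^(m)) by brute force: one term per word (k_0, ..., k_{m-1}).

    Assumes every p_k > 0, so each cylinder set has positive measure.
    """
    terms = []
    for word in itertools.product(p, repeat=m):
        w = math.prod(word)           # measure of the cylinder set
        terms.append(w * math.log(w))
    return -math.fsum(terms)

for p in ([0.5, 0.5], [0.2, 0.3, 0.5]):
    for m in range(1, 7):
        # Exercise 12.44: H(P^(m)) = -m * sum_k p_k log p_k
        assert math.isclose(refined_entropy(p, m), m * bernoulli_entropy(p))

print(bernoulli_entropy([1/2, 1/2]), bernoulli_entropy([1/3] * 3))
print(bernoulli_entropy([1/4] * 4), bernoulli_entropy([1/2] + [1/8] * 4))
```

The code only compares numbers. Whether equal entropy forces isomorphism is the content of the stronger result quoted in the text that follows.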
Thus we see that the entropy of the Bernoulli scheme $B(p_1, p_2, \ldots, p_N)$ equals $-\sum_{k=1}^{N} p_k\log p_k$. □

Using Example 12.10 and the fact that entropy is an invariant of a measurable dynamical system, we obtain the following: If two Bernoulli schemes $B(p_1, p_2, \ldots, p_N)$ and $B(q_1, q_2, \ldots, q_M)$ are isomorphic, then
$$\sum_{k=1}^{N} p_k\log p_k = \sum_{\ell=1}^{M} q_\ell\log q_\ell. \qquad (12.40)$$
Thus, for example, we see that $B(1/2, 1/2)$ and $B(1/3, 1/3, 1/3)$ are not isomorphic because $\log 2 \ne \log 3$. Actually, a stronger result holds for Bernoulli schemes, namely, that $B(p_1, p_2, \ldots, p_N)$ and $B(q_1, q_2, \ldots, q_M)$ are isomorphic if and only if (12.40) holds.†

EXERCISES 12.4

12.39 Prove (a), (b), and (c) of Proposition 12.3 on page 724.

12.40 Prove (d), (e), and (f) of Proposition 12.3 on page 724.

12.41 Suppose $(\Omega, \mathcal{A}, \mu)$ is a probability space. Let $\mathcal{G}$ denote the collection of all $E \in \mathcal{A}$ having the property that there is a sequence $\{F_n\}_{n=1}^\infty \subset \mathcal{F}$ such that $\lim_{n\to\infty} \mu(E \,\Delta\, F_n) = 0$. Prove that $\mathcal{G}$ is an algebra of subsets of $\Omega$.

12.42 Provide the details for the proof of Lemma 12.2 on page 726.

12.43 Prove that if $\{\mathcal{A}_n\}_{n=1}^\infty$ is a sequence of algebras of subsets of some set $\Omega$ such that $\mathcal{A}_n \subset \mathcal{A}_{n+1}$, then $\bigcup_{n=1}^\infty \mathcal{A}_n$ is also an algebra of subsets of $\Omega$.

12.44 Using $\sum_{k=1}^{N} p_k = 1$, show that
$$\sum_{k_0=1}^{N}\sum_{k_1=1}^{N}\cdots\sum_{k_{m-1}=1}^{N} \prod_{\ell=0}^{m-1}p_{k_\ell}\,\log\prod_{\ell=0}^{m-1}p_{k_\ell} = m\sum_{k=1}^{N} p_k\log p_k.$$

12.45 Show that if $(\Omega, \mathcal{A}, \mu, \varphi)$ has entropy $h(\varphi)$, then $h(\varphi^{(k)}) = k\,h(\varphi)$ when $k \in \mathcal{N}$ and, if $\varphi$ is invertible, $h(\varphi^{(k)}) = |k|\,h(\varphi)$ for all $k \in \mathcal{Z}$.

12.46 Refer to Example 12.1 on page 699. Show that $h(\varphi_b) = 0$ if $b$ is rational. Hint: See Exercise 12.45.

† For a relatively short proof of this result, see M. Keane and M. Smorodinsky, "Bernoulli Schemes of the Same Entropy are Finitarily Isomorphic" (Annals of Math., 109, pp. 397-406, 1979).
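12.47 Let $(\Omega, \mathcal{A}, \mu, \varphi)$ be a measurable dynamical system and $\mathfrak{P}$ a measurable partition of $(\Omega, \mathcal{A})$. Show that $H(\mathfrak{P}, \varphi) = \lim_{n\to\infty} H(\mathfrak{P} \mid \varphi^{-1}(\mathfrak{P}^{(n)}))$. Hint: Use Proposition 12.3 on page 724 to show that $H(\mathfrak{P}^{(k+1)}) = H(\mathfrak{P} \mid \varphi^{-1}(\mathfrak{P}^{(k)})) + H(\mathfrak{P}^{(k)})$.

12.48 Consider the measurable dynamical system in Example 12.1 on page 699 and assume $b$ is irrational. Let $\mathfrak{P} = \{[0,b), [b,1)\}$,
$$\mathcal{A}_n = \{\, B \in \mathcal{A} : B \text{ is a union of members of } \mathfrak{P}^{(n)} \,\}, \qquad n \in \mathcal{N},$$
and $\mathcal{A}_\infty = \bigcup_{n=1}^\infty \mathcal{A}_n$. Show that the smallest $\sigma$-algebra containing $\mathcal{A}_\infty$ is the $\sigma$-algebra of Borel subsets of $[0,1)$. Hint: See Exercise 11.26 on page 653.

12.49 Refer to Example 12.1 on page 699. Show that $h(\varphi_b) = 0$ if $b$ is irrational. Hint: Use Exercises 12.47 and 12.48.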

Index

Absolute continuity
  equivalent conditions for, 350
Absolute value, 42
  of a function, 421
Absolutely continuous, 343, 364
  see also Absolutely continuous random variable
Absolutely continuous function
  on a finite closed interval, 343
  on the real line, 343
  relation to absolutely continuous measures, 371
Absolutely continuous measure, 364
  for complex measures, 382
  relation to absolutely continuous functions, 371
Absolutely continuous random variable, 276, 371
  probability density function of, 276
Absolutely convergent series, 531
Accumulation point, 63
  in metric spaces, 470
Alaoglu's theorem, 615
Algebra, 489
  of functions, 66, 518
  generated by a collection of sets, 28
Algebra of functions
  continuous real-valued functions on a subset of ℛ, 66
Algebra of sets, 26
Almost all, 171
Almost always, 171
Almost certainly, 171
Almost everywhere, 171
  see also Lebesgue almost everywhere
Almost surely, 171
Almost-uniform convergence, 206
𝒜-measurable function
  Lebesgue integral for nonnegative extended real-valued functions, 186
𝒜-measurable set, 168
Arc, 456
  connecting two points, 456
Archimedean principle, 38
Arcwise connected component of a point, 459
Arcwise connected space, 456
734 □ Index

Ascoli-Arzela Theorem, 507
Associative laws, 36
Asymptotically uncorrelated, 312
Atom, 174, 392
Axiom of choice, 16
Baire category theorem, 494
  alternative version, 495
Banach limits, 583
Banach space, 531
Banach, Stefan
  biography, 578
Band-limited measure, 672
Basic period, 636
Basic wavelet, 679
Basis, 498, 545
Bayes, Thomas, 273
Bayes' rule, 273
Bernoulli, James, 287
Bernoulli scheme, 699, 701
Bernoulli shift, 700
Bernoulli trials, 287
Bessel's inequality, 547
Binomial distribution, 287
Bolzano-Weierstrass theorem, 63
Boole's inequality, 267
Bootstrapping, 188
Borel-Cantelli lemma, 270
Borel, Emile
  biography, 92
Borel measurable function, 563
Borel measurable functions on ℛ, 94
  equivalent condition for, 99
Borel measurable functions on a subset of ℛ, 101
  equivalent condition for, 101
Borel measure, 220, 563
  decomposition of, 395
  finite, 220
  n-dimensional, 259
  regular, 564
  two-dimensional, 244
Borel sets
  n-dimensional, 254, 278
  relation to Lebesgue measurable sets, 125
  of a topological space, 563
  two-dimensional, 244
Borel sets of ℛ, 95
  equivalent condition for, 100
  as related to the Borel sets of a subset of ℛ, 102
Borel sets of a subset of ℛ, 101
  equivalent condition for, 102
  as related to the Borel sets of ℛ, 102
Boundary of a set, 438
Bounded
  subset of a normed space, 613
  weakly, 613
Bounded above
  subset of real numbers, 37
Bounded below
  subset of real numbers, 38
Bounded intervals, 4, 41
Bounded linear mapping, 529
Bounded set in a normed space, 468
Bounded variation, 331
Canonical representation of a simple function, 130, 184
Cantor function, 78
Cantor, Georg
  biography, 2
Cantor set, 75
Cantor ternary set
  see Cantor set
Caratheodory criterion, 119
Cartesian product
  of a collection of sets, 18
  of a finite number sets, 17
Cauchy criterion, 52
Cauchy sequence, 52
  in a metric space, 435
  pointwise, 73
  uniform, 73
Cauchy's inequality, 534
Central limit theorem, 669
Chain, 17
Characteristic function, 81
  of a random variable, 666
Chebychev's inequality, 295
Chi-square distribution, 285
Closed under pointwise limits, 69
Closed ball, 437
Closed convex hull, 621
Closed graph theorem, 595
Closed interval, 61
Closed linear operator, 594
Closed set
  in ℛ, 61
  of a topological space, 432
Closure, 432
  of a set of real numbers, 60
Cluster point
  of a sequence of real numbers, 45
Index □ 735

C(Ω, Λ), 483
C₀(Ω), 489
C_b(Ω), 489
C_c(Ω), 489
Commutative laws, 36
Compact function, 508
Compact metric space, 465
Compact set, 465, 471
Compact topological space, 471
Complement
  relative, 6
  of a set, 5
Complete measure space, 171
Complete metric space, 435
Complete set, 435
Complete subset of a metric space, 435
Completely regular space, 513
Completeness axiom for the real numbers, 37
Complex conjugate of a function, 421
Complex measure, 379
  absolute continuity of, 382
  decomposition of, 380
  Radon-Nikodym theorem for, 383
  total variation of, 381
Composition of two functions, 13
Conditional entropy, 724
Conditional expectation
  existence of
    given a σ-algebra, 385
    given a random variable, 388
  given a σ-algebra, 384
  relative to an event, 300, 384
Conditional probability, 267
  existence of
    given a σ-algebra, 373
    given a random variable, 375
  given a σ-algebra, 372
Connected
  by an arc, 455
  set, 453
  topological space, 453
Connected component, 457
Continuous
  at a point, 444
  uniformly, 469
  weakly, 428
Continuous function
  on a metric, normed, or topological space, 425
  real-valued function of a real variable, 65
  with respect to neighborhood bases, 413
  on a subset of a topological space, 417
  on a topological space, 416
Continuous functions
  bounded, 489
  collection of from a topological space to a metric space, 483
  with compact support, 489
  vanishing at infinity, 489
Continuous measure, 363, 392
Continuous random variable, 277
Continuous uniform model, 266
Contraction, 499
Contraction mapping principle, 500
Converge absolutely, 531
Convergence almost everywhere, 162
Convergence almost uniformly, 206
Convergence in distribution, 612, 666
Convergence in measure, 203
Convergence of nets, 439
Convergence in probability, 203
Convergent sequence
  extended sense in ℛ*, 44
  of real numbers, 43
  in a topological space, 433
Convex combination, 619
  proper, 619
Convex function, 329
Convex hull, 620
Convex set, 541
  extreme point of, 619
  face of, 620
Convolution, 643, 656
  of two Borel measurable functions, 256
  of two σ-finite Borel measures, 256, 374
Convolutions of measures, 665
Coordinate, 18
Coordinate projection, 430
Correlation coefficient, 536
Countable set, 21
Countable additivity, 105
Countable subadditivity, 106, 267
  of Lebesgue outer measure, 106
  of outer measure, 210
  property of a measure, 170
Countably infinite set, 21
Counting measure, 169
Covariance, 296, 299
  bilinearity property of, 299
Covering, 462
Daubechies, Ingrid
  biography, 634
Decomposition
  of complex measures, 380
  of finite Borel measures, 395
De Morgan's Laws
  for collections of sets, 8
  for two sets, 5
DeMoivre-Laplace theorem, 671
Dense, 432
  irrational numbers, 39
  rational numbers, 39
Dependent events, 269
Derivative
  of a complex-valued function, 328
  of an indefinite integral, 339, 341
  in the ℒ²-sense, 676
  of monotone functions, 325
  Radon-Nikodym, 369
  of a real-valued function, 316
Differentiability
  Lebesgue theorem on for monotone functions, 325
Dimension of a linear space, 498
Dini-derivates, 317
Dini's theorem, 72, 515
Dirac measure, 170
Directed set, 438
Dirichlet kernel, 640
Dirichlet's theorem, 646
Disconnected
  set, 453
  topological space, 453
Discontinuity
  point of, 65
Discontinuous at a point, 65
Discrete distribution function, 396
Discrete measure, 170, 363, 392
Discrete random variable, 276, 374
  probability mass function of, 276
Discrete topology, 416
Discrete uniform model, 266
Disjoint sets, 9
  pairwise, 9
Distance
  between a point and a set, 110
  between two sets, 110
  from a point to a set, 437
Distribution
  convergence in, 612, 666
Distribution function, 226, 395
  decomposition of, 395, 398
  discrete, 396
  of a finite Borel measure, 221
  see also Probability distribution function
Distributive law, 36
Distributive laws for union and intersection, 6, 8
Domain of a function, 12
Dominated convergence theorem, 197
  for convergence in measure, 205
  for Lebesgue integration, 154
Dual space, 530
Egorov's theorem, 139, 207
Empty set, 3
Entropy
  of a measurable dynamical system, 720
  of a measurable partition, 717
Equality of sets, 3
Equicontinuity, 505
Equicontinuous, 505
Equivalence of sets, 21
Equivalence class, 25
Equivalence relation, 25
Equivalent, 21
  metrics, 425
  norms, 425
Ergodic, 712
Ergodic measure, 631
Ergodic theorem in ℒ², 715
Ergodicity of a measurable dynamical system, 712
Essential-supremum norm, 423
Event
  impossible, 263
  occurrence of, 263
Event class, 263
Events
  dependent, 269
  independent, 269
  mutually exclusive, 263
  mutually independent, 269
  pairwise independent, 269
  pairwise mutually exclusive, 263
Exhaustion, 479
Expectation, 289
  conditional, see Conditional expectation
  finite, 196, 289
  law of total, 301
  long-run-average interpretation of, 288
  in terms of probability distributions, 290
Expectation of a random variable
  see Mean of a random variable
Expected value of a random variable, 289
Expected value of a random variable
  see Mean of a random variable
Extended real numbers, 40
  interval of, 40
Extended real-valued functions, 178
Extensions to measures, 207, 214
  existence of, 214
  uniqueness of, 216
Extreme point, 619
Face, 620
Fatou's lemma
  for the abstract Lebesgue integral, 188
  for convergence in measure, 206
  for Lebesgue integration, 146, 148
Fejer's kernel, 642
Field, 420
Field axioms for the real numbers, 36
Finite
  sequence, 14
  set, 21
Finite additivity, 105
Finite Borel measure
  distribution function of, 395
Finite expectation, 196
Finite intersection property, 471
Finite mean, 196
Finite measure, 173, 195
Finite measure space, 173, 195
Finite sequence, 14
  of sets, 9
First category, 495
First countable space, 464
First fundamental theorem of calculus
  for Lebesgue integration, 339
  for Riemann integration, 335
First moment of a random variable, 289
Fixed point, 498
Fourier coefficient, 548
Fourier coefficients, 637
  of a periodic measure, 643
Fourier series, 548, 637
  convergence in norm, 637
  convergence at a point, 637
  localization of, 650
Fourier series expansion, 550
Fourier-Stieltjes transform of a finite Borel measure, 257
Fourier transform, 637, 653
  definition of, 202
  of an ℒ²-function, 676
  of a measure, 663
  uniqueness property of, 660
Fourier transform of measures
  uniqueness property of, 664
Frequency, 636
F_σ-set, 127
Fubini's theorem, 247
Function, 12
  absolute value of, 421
  compact, 508
  complex conjugate of, 421
  domain of, 12
  extended real-valued, 178
  greatest integer, 43
  inverse of, 13
  monotone, 67
  nondecreasing, 67
  nonincreasing, 67
  one-to-one, 12
  onto, 12
  peak point of, 481
  range of, 12
  real and imaginary parts of, 421
  and set operations, 15
  weakly continuous, 428
Functional
  linear, 528
Functions
  addition of, 421
  algebra of, 518
  composition of, 13
  lattice of, 514
  maximum of, 421
  minimum of, 421
  multiplication of, 421
Gaussian function, 655
G_δ-set, 127
General change of variable formula, 404
Generalized sums, 57
  in normed spaces, 548
Gibbs' phenomenon, 652
Greatest integer
  function, 43
  in a real number, 43
Greatest lower bound, 38
Haar functions, 553, 680
Haar wavelet, 681
Hahn-Banach theorem, 580
  complex version, 583
Hahn decomposition, 357
Hahn decomposition theorem, 357, 360
Half-open interval, 61
Hamel basis, 545
Hausdorff space, 448
Heine-Borel theorem, 72, 89, 465
  for ℛⁿ or 𝒞ⁿ, 468
Helly's selection principle, 618
Hermite polynomials, 677
Hilbert, David
  biography, 526
Hilbert space, 537
Holder's inequality, 556
Homeomorphic topological spaces, 417
Homeomorphism, 417
Hyperplane
  separation by, 604
Ideal, 524
Identities for the real number system, 36
Identity operator, 529
iid, 300, 308
Image of a function, 12, 14
Impossible event, 263
Indefinite Lebesgue integral, 334
Independence, 280
  mutual, 281
  pairwise, 281
Independent events, 269
Index, 438
  set, 439
Indexed collection of sets, 9
Infimum of a subset of real numbers, 38
Infinite set, 21
Infinite sequence, 9, 14
  of sets, 9
Infinite series, 56
  absolutely convergent, 531
  converge absolutely, 531
  in normed spaces, 440
Infinite sums in normed spaces, 440
Inner product, 534
Inner product space, 534
Integration by parts
  for Lebesgue integration, 352
  for Lebesgue-Stieltjes integrals, 258
Integration by substitution
  for Lebesgue integration, 353
  for Riemann integration, 402
Interior, 436
Interior point, 436
Internal point, 603
Intersection
  of a collection of sets, 7
  of two sets, 5
Interval
  closed, 61
  in the extended real number system, 40
  half-open, 61
  open, 58
Intervals, 4
  bounded, 4, 41
  unbounded, 5, 41
Invariant measure, 631, 698
Inverse of a function, 13
Inverse image of a function, 15
Inverses for the real number system, 36
Inversion theorems, 660
Invertible measure-preserving transformation, 727
Irrational numbers
  density of in ℛ, 39
Isometric, 553
Isometry, 553
Isomorphism
  of a measurable dynamical system, 716
Jacobi theta function identity, 662 Joint probability density function, 279 Joint probability distribution, 279 Joint probability distribution function, 280 Joint probability mass function, 279 Jointly absolutely continuous random variables, 279 joint probability density function of, 279 Jointly discrete random variables, 279 joint probability mass function of, 279 Jordan decomposition theorem, 361, 362 Kakutani-Krein theorem, 516 k-Cauchy, 485 k-complete, 485 Kolmogorov, A. N., 261 biography, 260 Kolmogorov extension theorem, 301 Kolmogorov-Sinai theorem, 728 Krein-Milman theorem, 622 measure-theoretic version, 624 Kronecker’s lemma, 303 L^1(μ), 423 L^1-norm, 423 L^1(Ω), 424 L^2-ergodic theorem, 715 L^2(μ), 423 L^2-norm, 423 L^2(Ω), 424 L^∞(μ), 423 L^∞-norm, 423 L^∞(Ω), 424 L^p-spaces, 554 ℓ^1(Ω), 424 ℓ^2(Ω), 424 ℓ^∞(Ω), 424 Laplace transform, 148 Lattice of functions, 514 Law of large numbers, 261 see Strong law of large numbers and Weak law of large numbers Law of total expectation, 301 Law of total probability, 273 Least upper bound, 37 Least upper-bound axiom, 37 Lebesgue almost everywhere, 162 Lebesgue decomposition, 390 Lebesgue decomposition theorem, 390 Lebesgue, Henri biography, 166 Lebesgue integrable, 150, 193, 195 Lebesgue integral of an arbitrary measurable function, 150 of a complex-valued A-measurable function, 194 of an extended real-valued A-measurable function, 192 as an extension of the Riemann integral, 157 of a function defined almost everywhere, 164, 197 indefinite, 334 linearity property of, 153, 196 of a nonnegative extended real-valued A-measurable function, 186 of a nonnegative measurable function, 135 of a nonnegative simple function, 131, 185 second fundamental theorem of calculus for, 350 with respect to a signed measure, 379 Lebesgue integration first fundamental theorem of calculus for, 339 Lebesgue measurable function, 128, 129 Lebesgue integral of, 150 Lebesgue integral of for nonnegative functions, 135 Lebesgue measurable set, 123 in ℝ^n, 211 Lebesgue measurable sets relation to Borel sets, 125 Lebesgue measure definition of, 123 on a measurable subset of ℝ, 169 in ℝ^n, 214 Lebesgue number, 469 Lebesgue outer measure basic properties of, 106 definition of, 106 finite additivity properties of, 110, 116 in ℝ^n, 210 Lebesgue singular function, 78 Lebesgue-Stieltjes integral, 227 Lebesgue-Stieltjes measure, 226 Legendre polynomials, 552 Lévy’s theorem, 667 Limit with respect to neighborhood bases, 413 of a sequence of real numbers, 43
Limit inferior of a sequence of real numbers, 48 of a sequence of sets, 11 Limit point, 432 of a set of real numbers, 60 Limit superior of a sequence of real numbers, 48 of a sequence of sets, 11 Lindelöf property, 462 Lindelöf’s theorem, 63 Linear combination, 528 Linear functional nonnegative, 566 Linear mapping bounded, 529 linear functional, 528 linear operator, 431, 528 Linear operator closed, 594 uniform boundedness principle for, 595 Linear space, 420 dimension of, 498 finite dimensional, 498 infinite dimensional, 498 Linear subspace, 421 Linearity property of the Lebesgue integral, 153, 196 Liouville’s theorem, 704 Lipschitzian, 352 Locally compact space, 475 Locally convex topological linear space, 600 Lower bound, 38 Lower Riemann integral, 82 Lower semicontinuous, 475 Lower and upper limits, 316 Lusin’s theorem, 140 Mapping linear, 528 Marginal probability density function, 279 Marginal probability mass function, 279 Markov, A. A., 299 Markov’s inequality, 299 Maximal element, 17 Maximum of a finite set of real numbers, 42 of two functions, 96 of two real numbers, 42 Mean finite, 196 of a population, 300 of a random variable, 289 Mean of a random variable, 189 Measurable dynamical system, 698 entropy of, 720 invariant of, 716 isomorphic, 716 Measurable function complex-valued, 178 extended real-valued, 179 see also Lebesgue measurable function real-valued, 175 Measurable partition, 378 entropy of, 717 refinement of, 718 Measurable rectangle, 232 Measurable set, 104 in the context of outer measure, 211 see also Lebesgue measurable set Measurable space, 168 Measurable transformation, 403 measure induced by, 403 Measure, 104, 168 atom of, 174 atoms of, 392 Borel, 220 complex, 379 continuous, 363, 392 counting, 169 Dirac, 170 discrete, 170, 363, 392 ergodic, 631 extension to, 207 finite, 173, 195 induced by a measurable transformation, 403 invariant, 631, 698 Lebesgue decomposition of, 390 see also Lebesgue measure Lebesgue-Stieltjes, 226
periodic, 643 probability, 169 product, 236 properties of, 170 representing, 624 signed, 356 Measure-preserving transformation invertible, 727 Measures mutually singular, 361
Measure space, 168 complete, 171 completion of, 172 finite, 173, 195 product, see Product measure space σ-finite, 234 Measure zero, 85 Metric, 420 induced by a norm, 422 restricted, 424 Metric space, 420 compact, 465 complete, 435 Metrizable, 425 topological space, 425 Minimum of a finite set of real numbers, 42 of two functions, 96 of two real numbers, 42 Minkowski’s inequality, 557 ℳ-measurable function see Lebesgue measurable function Modulus of a function, 194 Monotone, 30 sequence of functions, 70 sequence of real numbers, 44 sequence of sets, 30 Monotone class, 29 Monotone class theorem, 30 Monotone convergence theorem for the abstract Lebesgue integral, 188 for Lebesgue integration, 142, 147 Monotone function, 67 differentiability of, 325 Monotone nondecreasing sequence of sets, 30 Monotone nonincreasing sequence of sets, 30 Monotonicity of Lebesgue outer measure, 106 of outer measure, 210 property of a measure, 170 Multinomial distribution, 287 Multinomial trials, 287 Multiresolution analysis, 681 Mutual independence, 281 Mutually exclusive events, 263 Mutually independent events, 269 Mutually singular measures, 361 Nearest point, 539 Negative part of a function, 149 Negative set, 357 Negative variation of a signed measure, 377 Neighborhood, 412 Neighborhood basis determining a topology, 415 inducing a topology, 415 at a point, 412 on a set, 412 for a topological space, 415 Net, 439 Nondecreasing sequence of functions, 70 sequence of real numbers, 44 Nondecreasing function, 67 Nonincreasing sequence of functions, 70 sequence of real numbers, 44 Nonincreasing function, 67 Nonnegative definite sequence, 626 Nonnegative linear functional, 566 Nonnegativity of Lebesgue outer measure, 106 of outer measure, 210 Norm, 422 metric induced by, 422 Normal number, 311 Normal space, 448 Normed space, 422 infinite series in, 440 infinite sums in, 440 Nowhere dense, 494 nth moment, 293 finite, 293 ν*-measurable sets, 567 One-point compactification, 480
One-to-one, 12 1-1 correspondence, 20 Onto, 12 Open relatively, 417 Open ball induced by a metric, 424 Open covering, 462 Open interval, 58 Open mapping theorem, 590
Open set, 413, 415 of the extended real numbers, 179 in 57 with respect to a neighborhood basis, 413 of a subset of ℝ, 62 weakly, 428 Operator linear, 528 Orbit in a measurable dynamical system, 707 Order axioms for the real numbers, 37 Orthogonal complement, 542 Orthogonal elements, 542 Orthogonal projection, 541 Orthogonal set, 545 Orthonormal basis, 545 Orthonormal set, 545 Orthonormal wavelet basis, 679 Outcome space, 263 Outer measure, 106 induced, 209 see also Lebesgue outer measure Lebesgue, see Lebesgue outer measure Pairwise disjoint, 9 sequence of sets, 10 Pairwise independence, 281 Pairwise independent events, 269 Pairwise mutually exclusive events, 263 Partial derivative, 502 Partial ordering, 16 Partially ordered set, 16 Partition, 11 Partition of unity, 477 Peak point, 481 Period, 636 Periodic function, 636 measure, 643 Periodic function, 636 frequency of, 636 Periodic measure, 643 Fourier coefficients of, 643 Piecewise linear, 513 Plancherel’s theorem, 674 Point of discontinuity, 65 Pointwise convergence, 482 of a sequence of functions, 68 of a sequence of real-valued functions, 68 Pointwise ergodic theorem, 707 Pointwise limit of a sequence of functions, 68 Pointwise limits closure under, 69 Poisson distribution, 287 Poisson summation formula, 662 Population mean, 300 Population standard deviation, 300 Positive part of a function, 149 Positive set, 357 property of, 358, 359 Positive variation of a signed measure, 377 Power set, 5 Probabilistically independent, 269 see also Independent events Probability conditional, see Conditional probability relative-frequency interpretation of, 262 Probability density function, 276 joint, 279 marginal, 279 as a Radon-Nikodym derivative, 371 Probability distribution, 275 binomial, 287 chi-square distribution, 285 joint, 279 multinomial, 287 Poisson, 287 standard normal, 285 uniform, 277, 285 Probability distribution function, 277 joint, 280 Probability mass function, 276 joint, 279 marginal, 279 as a Radon-Nikodym derivative, 374 Probability measure, 169 Probability space, 169, 265 Product measure, 236 n-dimensional, 254 Product measure space, 236 completion of, 250 of a finite number of factor spaces, 253 n-dimensional, 254 Product σ-algebra, 236 n-dimensional, 254 Product topology, 430
Projection in a normed space, 597 Proper convex combination, 619 Proper subset, 4 Quotient topology, 419 Radon-Nikodym derivative, 369 Radon-Nikodym theorem, 354 for complex measures, 383 Random experiment, 263 Random variable, 177, 275 absolutely continuous, 276, 371 characteristic function of, 666 continuous, 277 discrete, 276, 374 expectation of, 289 probability distribution of, 275 probability distribution function of, 277 standard deviation of, 294 variance of, 294 Random variables covariance of, see Covariance independent, 280 joint probability distribution of, 279 joint probability distribution function of, 280 jointly absolutely continuous, 279 jointly discrete, 279 mutually independent, 281 pairwise independent, 281 uncorrelated, 297 Random vector, 278 Range of a function, 12 Rational numbers density of in ℝ, 39 Real numbers completeness axiom for, 37 extended, 40 field axioms for, 36 least upper-bound axiom for, 37 order axioms for, 37 Real-valued function, 65 on a set, 65 Rectangle, 18 Refinement of a measurable partition, 718 Regular Borel measures, 564 Relative complement, 6 Relative-frequency interpretation of probability, 262 Relative topology, 417 Relatively open, 417 Representing measure, 624 Restriction of a function, 13 Riemann, Bernhard biography, 34 Riemann integrable, 82 Riemann integral, 82 lower, 82 upper, 82 Riemann-Lebesgue lemma, 642 Riesz-Markov theorem, 567 Riesz representation theorem for C(Ω), 574 for C_0(Ω), 575 for L^p-spaces, 559 Riesz’s theorem on the completeness of L^p-spaces, 557 Sample space, 263 Scaling function, 684 Schröder-Bernstein theorem, 26 Second category, 495 Second countable space, 461 Second fundamental theorem of calculus for Lebesgue integration, 350 for Riemann integration, 335 Sections of a function on a product space, 239 of a set in a product space, 237 Semi-inner product, 544 Semialgebra, 32, 208 Seminorm, 600 topology induced by a family of, 601 Separable space, 460 Separate points, 515 Separated sets, 447 Separation by a hyperplane, 604 Sequence, 14 convergent, 43 finite, 14 infinite, 14 nonnegative definite, 626 of real numbers, 43 of sets, 9 subsequence of, 14 term of, 14 Sequence of functions monotone, 70 nondecreasing, 70 nonincreasing, 70 pointwise limit of, 68
Sequence of real numbers limit inferior of, 48 limit superior of, 48 monotone, 44 nondecreasing, 44 nonincreasing, 44 Sequence of sets monotone, 30 monotone nondecreasing, 30 monotone nonincreasing, 30 pairwise disjoint, 10 Set, 3 countable, 21 countably infinite, 21 empty, 3 finite, 21 infinite, 21 of measure zero, 85 uncountable, 21 Set operations and functions, 15 Sets algebra of, 26 disjoint, 9 equality, 3 equivalence of, 21 equivalent, 21 finite sequence of, 9 infinite sequence of, 9 limit inferior of, 11 limit superior of, 11 pairwise disjoint sequence of, 10 sequence of, 9 σ-algebra of, 28 Shannon, Claude E. biography, 696 Shannon sampling theorem, 677 σ-algebra generated by a collection of sets, 29 product, 236 σ-algebra of sets, 28 σ-finite measure space, 234 Signed measure, 356 Hahn decomposition for, 357, 360 Lebesgue integral with respect to, 379 negative variation of, 377 positive and negative sets for, 357 positive variation of, 377 total variation of, 377 variations of, 377 Signed measures Jordan decomposition of, 361, 362 properties of, 358, 359 Simple function canonical representation of, 130, 184 Lebesgue integral of, 131, 185 on Ω, 184 on ℝ, 130 Sine function, 642 Span, 542 Standard deviation of a population, 300 of a random variable, 294 Standard normal distribution, 285 Statistically independent, 269 see also Independent events Step function, 81 Stochastically independent, 269 see also Independent events Stone-Čech compactification, 513 Stone, Marshall Harvey biography, 492 Stone-Weierstrass theorem, 521 complex version, 522 Strictly weaker relation between topologies, 428 Strong law of large numbers, 306 Cantelli’s, 311 iid case, 308 Kolmogorov’s, 306 Sub-basis, 418 Subcovering, 462 Subnet, 443 Subsequence, 14 Subset, 3 proper, 4 Subspace, 421 Sup-norm, 489 Support function, 603 Support of a function, 477 Supremum of a subset of real numbers, 37 Supremum norm, 489 Symmetric difference, 7 Taylor’s theorem for Lebesgue integration, 352 Term of a sequence, 14 Tietze extension theorem, 451 Time-limited measure, 671 Toeplitz’s lemma, 302 T_1-space, 448 Tonelli’s theorem, 245 for the completion of a product measure space, 252
Topological linear space, 598 locally convex, 600 Topological space, 415 compact, 471 completely regular, 513 first countable, 464 locally compact, 475 metrizable, 425 neighborhood basis for, 415 one-point compactification of, 480 second countable, 461 separable, 460 Topology, 415 determined by a basis, 415 determined by a neighborhood basis, 415 determined by a subbasis, 418 discrete, 416 induced by a collection of seminorms, 601 induced by a neighborhood basis, 415 induced by a norm, 424 quotient, 419 relative, 417 of uniform convergence on compact subsets, 484 Topology of pointwise convergence, 490 Total variation, 331 of a complex measure, 381 of a signed measure, 377 Totally bounded, 465 Totally disconnected space, 458 Transitive property of ordering of ℝ, 37 Translated function, 639 Translation-dilation, 654 Translation invariance of Lebesgue outer measure, 106 Translation invariant, 639 Triangle inequality for real numbers, 42 Trichotomous property of ordering of ℝ, 37 Tychonoff’s theorem, 510 Unbounded intervals, 5, 41 Uncorrelated, 297 asymptotically, 312 Uncountable set, 21 Uniform boundedness principle, 497 for linear operators, 595 Uniform convergence, 482 on compact subsets, 483, 484 of nets of functions, 447 of a sequence of functions, 69 Uniform distribution, 277, 285 Uniform norm, 489 Uniformly continuous, 469 Uniformly distributed sequences, 652 Union of a collection of sets, 8 of two sets, 5 Unit point mass, 170 see also Dirac measure Upper bound, 17, 37 Upper Riemann integral, 82 Upper semicontinuous, 475 Urysohn metrization theorem, 462 Urysohn, Pavel S.
biography, 410 Urysohn’s lemma, 450 Value of a function, 12 Variance finite, 294 of a random variable, 294 of a sum of random variables, 295 Vector space, 420 Vitali cover, 322 Vitali covering theorem, 322 Wavelet, 679 Haar, 681 Wavelet theory, 678 Wavelet transform, 691 Weak law of large numbers, 306 Chebychev’s, 312 Markov’s, 312 Weak topology, 428, 610 Weak* topology, 610 Weaker relation between topologies, 428 Weakly bounded set, 613 Weakly continuous, 428 function, 428 Weakly open set, 428 Well-ordering principle, 22 Weyl’s criteria for uniform distribution, 653 With probability one, 171 Zorn’s lemma, 17
A Course in Real Analysis, by J. N. McDonald and N. A. Weiss. Postal code 100010; tel. 010-64015659, 64038347; e-mail kjsk@vip.sina.com. First edition April 2005; second printing January 2006. ISBN 7-5062-6573-7/O·430. Copyright contract registration no. 01-2005-1286. Price: 89.00 yuan. This edition is reprinted under authorization from Elsevier (Singapore) Pte Ltd.